Machine Learning-Assisted False Positive Detection in Metabolite Identification Workflows
December 10, 2025
Abstract
Metabolite identification is a pivotal step in drug discovery and development, enabling the comprehensive analysis of drug-derived compounds within biological systems. However, the complexity of liquid chromatography–mass spectrometry data often produces numerous false positives, which complicates the identification of true metabolites. This study introduces a machine-learning-based approach to improve the accuracy of false positive detection in metabolite identification workflows. Drawing on expert knowledge, we develop a feature set for metabolite-related chromatographic peaks that integrates data from mass spectra, chromatographic signals, and kinetic profiles and distinguishes true from false positives with high accuracy. We validate the method with gradient-boosted decision tree classifiers on both publicly available and proprietary “real-world” data sets, including small molecules and new modalities. Our findings demonstrate that machine-learning-assisted techniques significantly reduce false positive identifications, thereby increasing the efficiency and accuracy of metabolite identification.
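To make the classification step concrete, the following is a minimal sketch of gradient boosting with depth-1 decision trees (stumps) applied to per-peak features. It is an illustration under stated assumptions, not the implementation used in this study: the feature names, the synthetic data generator, and all hyperparameters are hypothetical, and a production workflow would more likely rely on an established gradient boosting library. Label 1 denotes a true metabolite peak and label 0 a false positive.

```python
import math
import random

# Hypothetical feature names, chosen only to mirror the three data sources
# named in the abstract (mass spectra, chromatographic signal, kinetics).
FEATURES = ["mass_error_ppm", "isotope_score", "peak_shape_score", "kinetic_trend"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Fit a depth-1 regression tree to the residuals by minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are always non-empty.
        for thr in values[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, left_mean, right_mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value


def fit_boosting(X, y, n_rounds=20, learning_rate=0.3):
    """Gradient boosting for log loss: each stump fits the residuals y - p."""
    p = sum(y) / len(y)
    base = math.log(p / (1.0 - p))  # initial log-odds of the positive class
    scores = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    """Probability that a peak is a true metabolite (label 1)."""
    base, learning_rate, stumps = model
    return sigmoid(base + learning_rate * sum(stump_predict(s, row) for s in stumps))


def synthetic_peak(rng, is_true):
    """Invented distributions for illustration only; not real LC-MS data."""
    if is_true:
        return [abs(rng.gauss(0, 2)), rng.gauss(0.9, 0.05),
                rng.gauss(0.8, 0.1), rng.gauss(0.7, 0.2)]
    return [abs(rng.gauss(0, 8)), rng.gauss(0.5, 0.2),
            rng.gauss(0.4, 0.2), rng.gauss(0.0, 0.3)]


rng = random.Random(0)
labels = [i % 2 for i in range(160)]
peaks = [synthetic_peak(rng, label == 1) for label in labels]
X_train, y_train = peaks[:120], labels[:120]
X_test, y_test = peaks[120:], labels[120:]

model = fit_boosting(X_train, y_train)
predictions = [1 if predict_proba(model, row) >= 0.5 else 0 for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic peaks: {accuracy:.2f}")
```

Because the synthetic classes are deliberately well separated, the accuracy printed here says nothing about performance on real data. The sketch only shows how heterogeneous peak descriptors can be combined by an additive ensemble of shallow trees.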

