Uncertainty-Aware Site-of-Metabolism Prediction from Ambiguous LC–MS Metabolite Identification Data


June 2026, ASMS Conference

Ramon Adàlia¹,²; Ismael Zamora³

¹Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain; ²Lead Molecular Design, SL, Sant Cugat del Vallès, Spain; ³Mass Analytica, Sant Cugat del Vallès, Spain

Abstract

Introduction

Liquid chromatography–mass spectrometry (LC–MS) is the dominant technology for metabolite identification in early drug discovery, yet it frequently produces structurally ambiguous metabolites with multiple plausible sites of metabolism (SoMs). Although this ambiguity is well understood in LC–MS workflows, most computational SoM models require unambiguous, binary annotations and therefore cannot directly exploit discovery-stage data. As a result, a large fraction of routinely generated LC–MS metabolite information is excluded from predictive modeling. We present an uncertainty-aware modeling strategy that preserves LC–MS-derived structural ambiguity by encoding relative SoM plausibility, enabling direct use of metabolite identification data without requiring definitive structure elucidation.

Methods

Human liver microsome LC–MS metabolite identification data were processed with software to generate candidate metabolite structures consistent with observed mass shifts and fragmentation patterns. For each metabolite peak, atom-level soft labels were constructed by averaging SoM assignments across equally scoring structural hypotheses, yielding relative plausibility scores rather than binary labels. Metabolite peaks assigning nonzero labels to an excessively large fraction of atoms were filtered during label construction to control noise. Atom rankings were learned using a graph attention neural network trained with a pairwise ranking objective. Molecular graph features were augmented with atom-level reactivity scores from MetaSite7. Model performance was evaluated using ranking-based metrics on the soft-labeled dataset and top-2 accuracy on an independent benchmark with experimentally confirmed SoMs.
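
The label construction, noise filter, ranking objective, and top-k evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 ambiguity threshold, and the logistic form of the pairwise loss are assumptions chosen for illustration, and the graph attention network itself is omitted (the `scores` argument stands in for its per-atom outputs).

```python
import math
from typing import List, Sequence, Set


def soft_labels(hypotheses: Sequence[Set[int]], n_atoms: int) -> List[float]:
    """Average binary SoM assignments over equally scoring hypotheses.

    Each hypothesis is the set of atom indices it marks as a SoM.
    """
    labels = [0.0] * n_atoms
    for atoms in hypotheses:
        for idx in atoms:
            labels[idx] += 1.0 / len(hypotheses)
    return labels


def too_ambiguous(labels: Sequence[float], max_fraction: float = 0.5) -> bool:
    """Flag a peak whose nonzero labels cover too many atoms.

    The threshold value is an assumption, not taken from the abstract.
    """
    if not labels:
        return False
    nonzero = sum(1 for value in labels if value > 0)
    return nonzero / len(labels) > max_fraction


def pairwise_ranking_loss(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Mean logistic loss over atom pairs (i, j) with labels[i] > labels[j].

    The loss is small when more plausible atoms receive higher scores.
    """
    losses = [
        math.log1p(math.exp(-(scores[i] - scores[j])))
        for i in range(len(labels))
        for j in range(len(labels))
        if labels[i] > labels[j]
    ]
    return sum(losses) / len(losses) if losses else 0.0


def top_k_hit(scores: Sequence[float], true_soms: Set[int], k: int = 2) -> bool:
    """Return True if any confirmed SoM is among the k highest-scoring atoms."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return any(i in true_soms for i in ranked[:k])
```

For a six-atom molecule with two equally scoring hypotheses placing the modification on atom 2 or atom 3, `soft_labels([{2}, {3}], 6)` assigns 0.5 to each of those atoms and 0 elsewhere. Top-2 accuracy on a benchmark would then be the fraction of molecules for which `top_k_hit(..., k=2)` is true.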


Advancing spatial-omics through Pyxis, vendor-neutral software for ion mobility mass spectrometry imaging data analysis


November 25, 2025

Sara Tortorella, Sebastian Bessler, Giuseppe Arturi, Jens Soltwisch, Gabriele Cruciani

Abstract

Mass spectrometry imaging (MSI) is a powerful analytical technique for simultaneously measuring multiple analytes, and assigning their functional roles, directly from intact tissue sections. MSI acquisitions often produce highly complex datasets containing numerous isobaric species. Ion mobility (IM) spectrometry helps to unravel these datasets by providing an orthogonal separation, compensating for the lack of chromatographic separation in MSI. However, the rich, multidimensional data produced by IM-MSI experiments, combined with the lack of comprehensive software that supports the entire data analysis workflow, poses a major challenge that prevents the full exploitation of IM-MSI. Here, we discuss the benefits and challenges of IM-MSI data analysis in metabolomic applications. Finally, we introduce Pyxis, a novel, vendor-neutral software solution for IM-MSI data analysis, and demonstrate its capabilities on mouse kidney tissue and THP-1 monocyte data.

Machine Learning-Assisted False Positive Detection in Metabolite Identification Workflows


December 10, 2025

Ramon Adàlia, Paula Cifuentes, Joyce Liu, Lionel Cheruzel, Gemma Sanjuan, Tomàs Margalef, Ismael Zamora

Abstract

Metabolite identification is a pivotal step in drug discovery and development, enabling the comprehensive analysis of drug-derived compounds within biological systems. However, the complexity of liquid chromatography–mass spectrometry data often results in numerous false positives, complicating the identification of true metabolites. This study introduces a machine-learning-based approach to improve the accuracy of false positive detection in metabolite identification workflows. By incorporating expert knowledge, we develop a feature set for metabolite-related chromatographic peaks that distinguishes true from false positives with high accuracy, integrating data from mass spectra, chromatographic signals, and kinetic profiles. We validate this method using gradient-boosted decision tree classifiers on both publicly available and proprietary “real-world” data sets, including small molecules and new modalities. Our findings demonstrate that machine-learning-assisted techniques significantly reduce false positive identifications, thereby increasing the efficiency and accuracy of metabolite identification processes.
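
To make the classification step concrete, the following is a toy sketch of gradient boosting for binary classification, written with depth-1 trees (stumps) in plain Python. It is a simplified stand-in, not the study's classifier or feature set: the two features (a mass error in ppm and a peak-shape score), the toy data, and all hyperparameters are hypothetical, and a production workflow would more plausibly rely on an established gradient boosting library.

```python
import math
from typing import List, Sequence, Tuple

# (feature index, threshold, value if <= threshold, value if > threshold)
Stump = Tuple[int, float, float, float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_stump(X: Sequence[Sequence[float]], residuals: Sequence[float]) -> Stump:
    """Fit a least-squares regression stump to the current residuals."""
    mean = sum(residuals) / len(residuals)
    best: Stump = (0, float("inf"), mean, mean)  # fallback: constant prediction
    best_err = float("inf")
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best_err:
                best_err, best = err, (f, threshold, lm, rm)
    return best


def stump_value(stump: Stump, row: Sequence[float]) -> float:
    f, threshold, left_value, right_value = stump
    return left_value if row[f] <= threshold else right_value


def fit_boosting(X, y, n_rounds: int = 50, learning_rate: float = 0.3) -> List[Stump]:
    """Boost stumps on the negative gradient of the log loss (y - p)."""
    raw = [0.0] * len(X)
    model: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(ri) for yi, ri in zip(y, raw)]
        f, t, lv, rv = fit_stump(X, residuals)
        stump = (f, t, learning_rate * lv, learning_rate * rv)  # shrink the step
        model.append(stump)
        raw = [ri + stump_value(stump, row) for ri, row in zip(raw, X)]
    return model


def predict_proba(model: List[Stump], row: Sequence[float]) -> float:
    """Probability that a peak is a true metabolite (label 1)."""
    return sigmoid(sum(stump_value(stump, row) for stump in model))


# Hypothetical toy peaks: [mass error (ppm), peak-shape score]; 1 = true metabolite.
X_toy = [[0.5, 0.9], [1.0, 0.8], [0.8, 0.7], [6.0, 0.2], [7.5, 0.1], [5.0, 0.3]]
y_toy = [1, 1, 1, 0, 0, 0]
toy_model = fit_boosting(X_toy, y_toy)
```

Calling `predict_proba(toy_model, [0.7, 0.85])` scores a new peak; peaks whose probability falls below a chosen cutoff would be flagged as likely false positives for expert review.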

Prediction of peptide cleavage sites using protein language models and graph neural networks


October 30, 2025

Paula Cifuentes, Ramon Adàlia, Ismael Zamora

Abstract

The growing interest in using peptide molecules as therapeutic agents, driven by their high selectivity and efficacy, has become a significant trend in the pharmaceutical industry. However, their oral administration remains challenging because of their low bioavailability and their vulnerability to proteases, which cleave peptide bonds. To optimize peptide drug development, in silico tools based on machine learning algorithms have been developed for cleavage site prediction. These tools rely on manual feature extraction and are therefore limited in capturing complex peptide structures, especially those involving non-natural amino acids or cyclic peptides. This study presents two novel in silico approaches for cleavage site prediction. The first approach uses protein language models, specifically ESM-2, fine-tuned to leverage its learned peptide structure embeddings for accurate cleavage site prediction, eliminating the need for manual feature engineering. The second approach employs graph neural networks, representing peptides as hierarchical graphs at the atom and amino acid levels, which effectively handles cyclic peptide structures, including those containing non-natural amino acids. The applicability of this second approach is shown through a case study on a set of four cyclic peptides containing non-natural amino acids, comparing in silico predictions with experimental data.
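
As a rough illustration of the two-level representation used in the second approach, the sketch below builds the amino-acid-level bond list for a linear or head-to-tail cyclic peptide and maps atoms to their parent residues so that atom-level values can be pooled upward. All function names and the per-residue atom counts are hypothetical, and the actual graph construction and network are not described in the abstract.

```python
from typing import List, Sequence, Tuple


def peptide_bonds(residues: Sequence[str], cyclic: bool = False) -> List[Tuple[int, int]]:
    """Residue-level edges; each edge is a candidate cleavage site.

    A head-to-tail cyclic peptide gets one extra bond closing the ring.
    """
    bonds = [(i, i + 1) for i in range(len(residues) - 1)]
    if cyclic and len(residues) > 2:
        bonds.append((len(residues) - 1, 0))
    return bonds


def atom_to_residue(atom_counts: Sequence[int]) -> List[int]:
    """Map every atom index to the index of the residue that contains it."""
    mapping: List[int] = []
    for residue_idx, count in enumerate(atom_counts):
        mapping.extend([residue_idx] * count)
    return mapping


def pool_to_residues(
    atom_values: Sequence[float], mapping: Sequence[int], n_residues: int
) -> List[float]:
    """Mean-pool atom-level values into one value per residue."""
    sums = [0.0] * n_residues
    counts = [0] * n_residues
    for value, residue_idx in zip(atom_values, mapping):
        sums[residue_idx] += value
        counts[residue_idx] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

Because residues are identified only by position and by the atoms they contain, a non-natural residue such as `"Aib"` in `peptide_bonds(["A", "G", "Aib", "F"], cyclic=True)` needs no special treatment: the call returns four bonds, including the ring-closing bond `(3, 0)`.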

Scalable Peptide MRM Transition Prediction for High-Throughput Proteomics via Hashing-Based Sequence Encoding


Peptide analysis via Multiple Reaction Monitoring (MRM) is indispensable for quantification, biomarker validation, and drug development, yet its reliance on experimental transition optimization limits scalability. Current computational models, developed for small molecules, fail to address peptide-specific complexities such as sequence-dependent fragmentation and charge-state variability. We introduce a novel framework that combines hashing-based peptide fragment encoding with gradient-boosted decision trees to predict MRM transitions efficiently. This method eliminates bottlenecks in experimental workflows, enabling rapid, resource-efficient transition identification without compromising accuracy, a critical advancement for high-throughput proteomics pipelines.
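
One way to picture a hashing-based fragment encoding is the generic feature-hashing trick applied to fragment sequences, sketched below. This is an assumption about the general idea, not the published encoding: the choice of prefix (b-type) and suffix (y-type) fragments, the MD5-based bucket function, and the 64-bucket vector size are all illustrative.

```python
import hashlib
from typing import List


def fragment_sequences(peptide: str) -> List[str]:
    """Return prefix (b-type) and suffix (y-type) fragment sequences, tagged by series."""
    fragments = []
    for cut in range(1, len(peptide)):
        fragments.append("b:" + peptide[:cut])
        fragments.append("y:" + peptide[cut:])
    return fragments


def bucket(token: str, n_buckets: int) -> int:
    """Map a token to a stable bucket index in [0, n_buckets)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def hash_encode(peptide: str, n_buckets: int = 64) -> List[int]:
    """Encode a peptide as a fixed-length vector of hashed fragment counts."""
    vector = [0] * n_buckets
    for token in fragment_sequences(peptide):
        vector[bucket(token, n_buckets)] += 1
    return vector
```

The vector length is independent of peptide length and no fragment vocabulary has to be stored, which is what makes this kind of encoding attractive at scale. The trade-off is that distinct fragments can collide in the same bucket. A vector such as `hash_encode("PEPTIDE")` could then serve as input to a gradient-boosted decision tree model.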