Uncertainty-Aware Site-of-Metabolism Prediction from Ambiguous LC–MS Metabolite Identification Data

June 2026, ASMS Conference

Ramon Adàlia1, 2; Ismael Zamora3

1Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain; 2Lead Molecular Design, SL, Sant Cugat del Vallès, Spain; 3Mass Analytica, Sant Cugat del Vallès, Spain

Abstract

Introduction

Liquid chromatography–mass spectrometry (LC–MS) is the dominant technology for metabolite identification in early drug discovery, yet it frequently produces structurally ambiguous metabolites with multiple plausible sites of metabolism (SoMs). Although this ambiguity is well understood in LC–MS workflows, most computational SoM models require unambiguous, binary annotations and therefore cannot directly exploit discovery-stage data. As a result, a large fraction of routinely generated LC–MS metabolite information is excluded from predictive modeling. We present an uncertainty-aware modeling strategy that preserves LC–MS-derived structural ambiguity by encoding relative SoM plausibility, enabling direct use of metabolite identification data without requiring definitive structure elucidation.

Methods

Human liver microsome LC–MS metabolite identification data were processed with software to generate candidate metabolite structures consistent with the observed mass shifts and fragmentation patterns. For each metabolite peak, atom-level soft labels were constructed by averaging SoM assignments across equally scoring structural hypotheses, yielding relative plausibility scores rather than binary labels. To control noise, metabolite peaks that assigned nonzero labels to an excessively large fraction of atoms were excluded during label construction. Atom rankings were learned using a graph attention neural network trained with a pairwise ranking objective. Molecular graph features were augmented with atom-level reactivity scores from MetaSite7. Model performance was evaluated using ranking-based metrics on the soft-labeled dataset and top-2 accuracy on an independent benchmark with experimentally confirmed SoMs.
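The soft-label construction and peak filtering can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each equally scoring structural hypothesis is represented as a set of atom indices flagged as SoMs. The filtering cutoff `max_nonzero_fraction` (0.5 here) is a hypothetical value, because the abstract does not state the threshold used. The function name `soft_som_labels` is also hypothetical.

```python
from typing import List, Optional, Sequence, Set


def soft_som_labels(
    n_atoms: int,
    hypotheses: Sequence[Set[int]],
    max_nonzero_fraction: float = 0.5,  # hypothetical cutoff, not given in the abstract
) -> Optional[List[float]]:
    """Average binary SoM assignments over equally scoring structural hypotheses.

    Each hypothesis is a set of atom indices flagged as sites of metabolism.
    Returns per-atom plausibility scores in [0, 1], or None when the peak is
    filtered out (no hypotheses, or too many atoms receive a nonzero label).
    """
    if n_atoms <= 0 or not hypotheses:
        return None
    counts = [0.0] * n_atoms
    for som_atoms in hypotheses:
        for atom in som_atoms:
            counts[atom] += 1.0
    # Fraction of hypotheses in which each atom is a SoM.
    labels = [c / len(hypotheses) for c in counts]
    nonzero = sum(1 for v in labels if v > 0.0)
    if nonzero / n_atoms > max_nonzero_fraction:
        return None  # ambiguity spread over too many atoms: treat as noise
    return labels


# Usage: three equally scoring hypotheses for a six-atom molecule.
print(soft_som_labels(6, [{1}, {2}, {2, 3}]))
# → [0.0, 0.3333333333333333, 0.6666666666666666, 0.3333333333333333, 0.0, 0.0]
```

Averaging binary indicator vectors gives each atom the fraction of hypotheses in which it is a SoM. Atom 2 therefore ranks above atoms 1 and 3, even though no single structure is confirmed.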
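The abstract does not specify which pairwise ranking objective was used. As one common choice, the sketch below uses a RankNet-style logistic loss. For every ordered atom pair (i, j) where atom i has a higher soft label than atom j, it adds log(1 + exp(s_j - s_i)). In the described model, the scores s would come from the graph attention network. Here they are plain floats, and the function name `pairwise_ranking_loss` is hypothetical.

```python
import math
from typing import Sequence


def _softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def pairwise_ranking_loss(
    scores: Sequence[float],
    labels: Sequence[float],
    eps: float = 1e-9,
) -> float:
    """Mean logistic pairwise loss over atom pairs ordered by soft label.

    Only pairs whose soft labels differ by more than eps contribute, so atoms
    with equal plausibility impose no ordering constraint.
    """
    total = 0.0
    n_pairs = 0
    for si, yi in zip(scores, labels):
        for sj, yj in zip(scores, labels):
            if yi > yj + eps:
                # Penalize the model when the less plausible atom j outscores atom i.
                total += _softplus(sj - si)
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Usage: correctly ordered scores give a lower loss than reversed ones.
soft = [0.0, 1 / 3, 2 / 3]
print(pairwise_ranking_loss([0.0, 1.0, 2.0], soft))
print(pairwise_ranking_loss([2.0, 1.0, 0.0], soft))
```

A pairwise objective only asks the model to reproduce the relative order implied by the soft labels. This suits plausibility scores better than regressing on their absolute values.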
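Top-2 accuracy on the benchmark could be computed as in the sketch below. It assumes the common SoM-prediction convention that a molecule counts as correctly predicted when at least one experimentally confirmed SoM appears among its two highest-scoring atoms. The abstract does not define the metric further, so both this definition and the helper names `top_k_hit` and `top_k_accuracy` are assumptions.

```python
from typing import Sequence, Set, Tuple


def top_k_hit(scores: Sequence[float], true_soms: Set[int], k: int = 2) -> bool:
    """True if any of the k highest-scoring atoms is a confirmed SoM."""
    # Python's sort is stable, so tied scores are broken by lower atom index.
    ranked = sorted(range(len(scores)), key=lambda a: scores[a], reverse=True)
    return any(atom in true_soms for atom in ranked[:k])


def top_k_accuracy(
    molecules: Sequence[Tuple[Sequence[float], Set[int]]], k: int = 2
) -> float:
    """Fraction of molecules with at least one confirmed SoM in the top k."""
    if not molecules:
        raise ValueError("no molecules to evaluate")
    hits = sum(top_k_hit(scores, soms, k) for scores, soms in molecules)
    return hits / len(molecules)


# Usage: per-atom scores paired with experimentally confirmed SoM indices.
benchmark = [
    ([0.9, 0.8, 0.1], {1}),     # atom 1 ranked second: hit
    ([0.1, 0.2, 0.9], {0}),     # atom 0 ranked third: miss
    ([0.5, 0.4, 0.3], {0, 2}),  # atom 0 ranked first: hit
]
print(top_k_accuracy(benchmark))
```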
