Software-aided approach designed to analyze and predict cleavage sites for peptides

73rd ASMS Conference on Mass Spectrometry. June 2025

Paula Cifuentes1,2; Ramon Adalia1,2; Ismael Zamora2; Lisa O’Callaghan3; Richard Gundersdorf3

1Lead Molecular Design, S.L., Sant Cugat Del Valles, Spain. 2Mass Analytica, S.L., Sant Cugat Del Valles, Spain. 3Merck & Co., Inc., West Point, PA, USA

Abstract

Introduction

The growing interest in peptide molecules as therapeutic agents, driven by their high selectivity and efficacy, has become a significant trend in the pharmaceutical industry. However, oral administration remains a key challenge: peptide drugs have low bioavailability and are highly susceptible to proteases that cleave their peptide bonds. Identifying these cleavage sites and characterizing the resulting metabolites (MetID) is essential to understanding how peptides are metabolized. In-silico tools have been developed to predict peptide cleavage sites, but they face limitations such as limited applicability to unnatural amino acids, inability to process cyclic peptides, and lack of customization to user-specific data. These challenges highlight the need for further advancements in this area.

Methods

The methodology defines a new workflow that uses LC-MS data from peptide metabolic experiments, together with data from external sources, to predict potential cleavage sites in new candidate peptide drugs by means of a machine-learning model. The models use a transformer architecture with added mechanisms to encode graph structural data. Notably, these models eliminate the need for manual feature extraction, as they can predict peptide properties such as secondary structure and solvent accessibility. The methodology is designed to operate without structural constraints, accommodating linear and cyclic peptides and both natural and unnatural amino acids. Users can train the models with their own experimental data. The methodology was validated using experimental MetID data from over 100 individual peptides.
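As an illustration of the input side of such a workflow (not the actual model), a peptide can be represented as a graph of residues whose backbone bonds are the candidate cleavage sites. The following minimal Python sketch, with an assumed three-letter residue notation, shows how linear and cyclic topologies and unnatural residues (e.g. Aib) fit one representation:

```python
from dataclasses import dataclass

@dataclass
class Peptide:
    residues: list        # residue codes, natural or unnatural (e.g. "Aib")
    cyclic: bool = False  # head-to-tail macrocycle

    def backbone_bonds(self):
        """Enumerate candidate cleavage sites as (i, j) residue-index pairs."""
        n = len(self.residues)
        bonds = [(i, i + 1) for i in range(n - 1)]
        if self.cyclic:
            bonds.append((n - 1, 0))  # ring-closing bond of the macrocycle
        return bonds

# A cyclic hexapeptide containing one unnatural residue (Aib)
pep = Peptide(["Ala", "Aib", "Leu", "Pro", "Phe", "Gly"], cyclic=True)
print(pep.backbone_bonds())  # 6 bonds, including the (5, 0) ring-closing bond
```

A model without structural constraints can then score every bond in this list, regardless of whether the peptide is linear or cyclic.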

Preliminary data

Our machine-learning model demonstrated strong performance on an experimental dataset of 114 peptides incubated with a complex matrix of proteases, including cyclic structures with non-canonical amino acids. The model achieved a Hits@4 score of 2.74, indicating that, on average, 2.74 correct cleavage sites were identified within the top four predictions per peptide. It also achieved a precision of 91.30% for the top-ranked prediction, meaning the predicted cleavage site was correct in 91.30% of cases, and a mean average precision (MAP) of 84.56%, highlighting its effectiveness in ranking cleavage sites accurately across the dataset. Moreover, the model can be updated with new experimental MetID user data to further improve its performance through a self-learning approach, in which new expert-curated information is added to the model-building process without additional human intervention.
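The ranking metrics quoted above follow standard information-retrieval definitions. As a generic sketch (not the authors' evaluation code), Hits@k, precision at rank 1, and average precision can be computed over a toy ranked list of bond indices:

```python
def hits_at_k(ranked, true_sites, k=4):
    """Count correct cleavage sites among the top-k ranked predictions."""
    return sum(1 for s in ranked[:k] if s in true_sites)

def precision_at_1(ranked, true_sites):
    """1.0 if the top-ranked prediction is a true cleavage site, else 0.0."""
    return 1.0 if ranked and ranked[0] in true_sites else 0.0

def average_precision(ranked, true_sites):
    """Average of precision values at each rank where a true site appears."""
    hits, score = 0, 0.0
    for rank, site in enumerate(ranked, start=1):
        if site in true_sites:
            hits += 1
            score += hits / rank
    return score / max(len(true_sites), 1)

ranked = [3, 7, 1, 9]   # predicted bond indices, best first (toy data)
true = {3, 1, 5}        # experimentally observed cleavage sites (toy data)
print(hits_at_k(ranked, true))                      # 2
print(precision_at_1(ranked, true))                 # 1.0
print(round(average_precision(ranked, true), 3))    # 0.556
```

MAP is then the mean of `average_precision` over all peptides in the dataset.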

In addition, models were developed and trained on publicly available data for a selected number of proteases involved in peptide drug degradation. These models were optimized using 5-fold cross-validation and hyperparameter tuning, achieving F1 scores exceeding 95% and precisions of 98%, demonstrating their high accuracy and reliability. Compared to existing cleavage-site prediction models from the literature, our approach achieved an F1 score 60% higher, without the need for feature extraction or dataset-balancing techniques.
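For reference, 5-fold cross-validation and the F1 score used in this optimization can be sketched in plain Python; this is a generic illustration of the two concepts, not the pipeline used in the study:

```python
def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = cleaved)."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

folds = list(k_fold_indices(10, k=5))
print(len(folds))                                    # 5 train/test splits
print(round(f1_score([1, 0, 1, 1], [1, 0, 0, 1]), 3))  # 0.8
```

In a full run, the model is retrained on each training split, the F1 score is computed on the held-out split, and hyperparameters are chosen to maximize the mean across folds.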

This tool has the potential to significantly accelerate the development of peptide-based drugs by efficiently identifying cleavage sites, enabling more effective modifications to compound structures that enhance their stability, while reducing the time and cost associated with experimental validation.

 


 

 

LC-MS and High-Throughput Data Processing Solutions for Lipid Metabolic Tracing Using Bioorthogonal Click Chemistry

24 April 2025

Palina Nepachalovich, Stefano Bonciarelli, Gabriele Lombardi Bendoula, Jenny Desantis, Michela Eleuteri, Christoph Thiele, Laura Goracci, Maria Fedorova

Graphical Abstract

This study introduces an integrated analytical and bioinformatics platform for high-throughput tracing of lipid metabolism using bioorthogonal alkyne fatty acids and an optimized LC-MS workflow. Applied to human fibrosarcoma cells, the method traced fatty acid metabolism, revealing nuances in sphingolipid routing and metabolic bottlenecks and highlighting its potential for lipidomics research.

Abstract

Tracing lipid metabolism in mammalian cells presents a significant technological challenge due to the vast structural diversity of lipids involved in multiple metabolic routes. Bioorthogonal approaches based on click chemistry have revolutionized analytical performance in lipid tracing. When adapted for mass spectrometry (MS), they enable highly specific and sensitive analyses of lipid transformations at the lipidome scale. Here, we advance this approach by integrating liquid chromatography (LC) prior to MS detection and developing a software-assisted workflow for high-throughput data processing. LC separation resolved labeled and unmodified lipids, enabling qualitative and quantitative analysis of both lipidome fractions, as well as isomeric lipid species. Using synthetic standards and endogenously produced alkyne lipids, we characterized LC-MS behavior, including preferential adduct formation and the extent of in-source fragmentation. Specific fragmentation rules, derived from tandem MS experiments for 23 lipid subclasses, were implemented in Lipostar2 software for high-throughput annotation and quantification of labeled lipids. Applying this platform, we traced metabolic pathways of palmitic and oleic acid alkynes, revealing distinct lipid incorporation patterns and metabolic bottlenecks. Altogether, we provide an integrated analytical and bioinformatics platform for high-throughput tracing of lipid metabolism using an LC-MS workflow.
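The kind of diagnostic-fragment lookup that such fragmentation rules encode can be sketched generically. The `annotate` helper and tolerance below are illustrative assumptions, not Lipostar2's implementation; the 184.0733 value is the well-known phosphocholine head-group fragment shared by PC and SM species:

```python
def annotate(ms2_peaks, rules, tol=0.005):
    """Return lipid subclasses whose diagnostic fragment m/z is present
    in the MS/MS spectrum within an absolute m/z tolerance."""
    hits = []
    for subclass, diagnostic_mz in rules.items():
        if any(abs(peak - diagnostic_mz) <= tol for peak in ms2_peaks):
            hits.append(subclass)
    return hits

# Phosphocholine head-group fragment, diagnostic for PC and SM subclasses
RULES = {"PC": 184.0733, "SM": 184.0733}
print(annotate([184.0740, 503.32], RULES))  # ['PC', 'SM']
```

A production rule set would additionally encode neutral losses, adduct-specific behavior, and the alkyne-label mass shift for each of the 23 subclasses.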

 

Molecular Structure and Mass Spectral Data Quality–Driven Processing of High‐Resolution Mass Spectrometry for Quantitative Analysis

February 2025

Fabien Fontaine, Luca Morettoni, Ken Anderson, Bernard Choi, Ismael Zamora, Kevin P. Bateman

Abstract

Rationale

LC-MS-based quantification is traditionally performed using selected or multiple reaction monitoring (SRM/MRM) acquisition functions on triple quadrupole (QQQ) instruments, resulting in both high sensitivity and selectivity. This workflow requires a previously identified reaction, or transition, from a precursor ion to a fragment ion to be monitored to obtain the needed selectivity for the compound of interest. High-resolution mass spectrometry (HRMS) has long sought to be a viable alternative for quantitative workflows but has been unable to broadly compete, mainly due to the lack of suitable data processing software.

Methods

The approach we developed agnostically and automatically identifies all ions related to the compound being analyzed in both the MS and MS/MS data, acquired with data-dependent or data-independent methods. The algorithm automatically selects optimal parameters (ion extraction window, ions to sum, etc.) to provide the best overall method that meets the acceptance criteria defined by the user (accuracy/precision).
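As a simplified illustration of this kind of criterion-driven parameter selection, one could search candidate extraction-window widths for the narrowest one whose replicate precision meets a user threshold. The `pick_extraction_window` helper, the ppm widths, and the CV criterion below are hypothetical, not the actual algorithm:

```python
def pick_extraction_window(candidates, evaluate, max_cv=0.15):
    """Return the narrowest extraction-window width whose replicate
    coefficient of variation (CV) meets the precision criterion,
    or None if no candidate passes."""
    for width in sorted(candidates):
        if evaluate(width) <= max_cv:
            return width
    return None

# Toy evaluator: very narrow windows are noisy (high CV),
# wider windows integrate more signal but lose selectivity.
simulated_cv = {5: 0.30, 10: 0.12, 20: 0.08}  # width (ppm) -> observed CV
best = pick_extraction_window(simulated_cv, simulated_cv.get)
print(best)  # 10: narrowest width meeting the 15% CV criterion
```

The real algorithm optimizes several parameters jointly (window, ions to sum, etc.) against user-defined accuracy and precision targets, but the pass/fail-driven search structure is the same idea.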

Results

The results obtained are directly compared to QQQ data collected from the same set of samples and show that the automated HRMS approach is as good as and, in some cases, better than the traditional QQQ approach in terms of selectivity, sensitivity, and dynamic range.

Conclusions

This new methodology enables the use of generic methods for data collection for quantitative analysis using high-resolution mass spectrometry. With this approach, data collection is faster, and the processing algorithm provides quality equal to or better than the current QQQ methodology. This enables an overall reduction in cycle time and improved assay performance versus current HRMS-based quantitative analysis as well as traditional QQQ workflows.

Oniro: automation and standardization for workflow definition

The new Workflow definition guides the user through the steps to define experiments, starting from the acquisition sample list. By defining the workflow, new experiments are generated and launched directly on the Oniro server. Oniro directly controls MassMetaSite, MassChemSite, or WebQuant computations, reducing analysis time and minimizing errors from file handling. Once a workflow is defined, it can be saved and automatically applied to new sample lists.

Customizable spectra signal colors

The user can customize the colors displayed for different types of signals: match (shifted or non-shifted), mismatch, metabolite match, or unassigned m/z. The colors assigned to each signal type are also kept in the report analysis.