Software-aided approach designed to analyze and predict cleavage sites for peptides
73rd ASMS Conference on Mass Spectrometry. June 2025
Paula Cifuentes1,2; Ramon Adalia1,2; Ismael Zamora2; Lisa O’Callaghan3; Richard Gundersdorf3
1Lead Molecular Design, S.L., Sant Cugat Del Valles, Spain. 2Mass Analytica, S.L., Sant Cugat Del Valles, Spain. 3Merck & Co., Inc., West Point, PA, USA
Abstract
Introduction
The growing interest in peptides as therapeutic agents, driven by their high selectivity and efficacy, has become a significant trend in the pharmaceutical industry. However, oral administration remains a key challenge: peptide drugs have low bioavailability and are highly susceptible to proteases, which cleave their peptide bonds. Identifying these cleavage sites and characterizing the resulting metabolites (metabolite identification, MetID) is essential to understanding how peptides are metabolized. In silico tools have been developed to predict peptide cleavage sites, but they have limitations, such as limited applicability to unnatural amino acids, inability to process cyclic peptides, and lack of customization to user-specific data. These limitations highlight the need for further advances in this area.
Methods
The methodology defines a new workflow that uses LC-MS data from peptide metabolism experiments, together with data from external sources, to predict potential cleavage sites in new candidate peptide drugs using a machine-learning model. The models use a transformer architecture with added mechanisms to encode graph-structured data. Notably, these models eliminate the need for manual feature extraction, as they can predict peptide properties such as secondary structure and solvent accessibility. The methodology is designed to operate without structural constraints: it handles both linear and cyclic peptides, containing natural and unnatural amino acids. Users can train the models with their own experimental data. The methodology was validated using experimental MetID data from over 100 individual peptides.
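As an illustration of why a graph representation accommodates both linear and cyclic peptides, the following minimal sketch builds a residue-level graph in which each edge is a peptide bond and therefore a candidate cleavage site. The abstract does not describe the actual encoding used by the models; the function name `peptide_bond_graph`, the residue-level granularity, and the head-to-tail cyclization are assumptions made purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def peptide_bond_graph(residues: Sequence[str], cyclic: bool = False) -> Dict[str, list]:
    """Build a hypothetical residue-level graph for a peptide.

    Nodes are residue labels (natural or unnatural, treated alike).
    Edges are backbone peptide bonds, i.e. candidate cleavage sites.
    """
    n = len(residues)
    # Backbone bonds between consecutive residues.
    edges: List[Tuple[int, int]] = [(i, i + 1) for i in range(n - 1)]
    # Assumption: a cyclic peptide is closed head-to-tail by one extra bond.
    if cyclic and n > 2:
        edges.append((n - 1, 0))
    return {"nodes": list(residues), "edges": edges}


# "Aib" (2-aminoisobutyric acid) stands in for an unnatural amino acid.
linear = peptide_bond_graph(["Ala", "Gly", "Phe", "Aib"])
cyclic = peptide_bond_graph(["Ala", "Gly", "Phe", "Aib"], cyclic=True)
print(len(linear["edges"]), len(cyclic["edges"]))
```

In this simplified view, a linear peptide of n residues exposes n - 1 candidate bonds and a head-to-tail cyclic peptide exposes n, with no special handling required for non-canonical residues.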
Preliminary data
Our machine-learning model demonstrated strong performance on an experimental dataset of 114 peptides, including cyclic structures with non-canonical amino acids, incubated with a complex matrix of proteases. The model achieved a Hits@4 score of 2.74, meaning that, on average, 2.74 correct cleavage sites were identified within the top four predictions per peptide. The precision of the top-ranked prediction was 91.30%; that is, the highest-ranked predicted cleavage site was correct in 91.30% of cases. The model also achieved a mean average precision (MAP) of 84.56, highlighting its effectiveness in ranking cleavage sites accurately across the dataset. Moreover, the model can be updated with new experimental MetID user data to further improve its performance, through a self-learning approach in which new expert-curated information is added to the model-building process without human intervention.
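The ranking metrics reported above can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code: the function names, the toy data, and the convention of normalizing average precision by the number of true cleavage sites are assumptions, since the abstract does not give exact formulas.

```python
from typing import Dict, Sequence, Set


def hits_at_k(ranked: Sequence[int], true_sites: Set[int], k: int = 4) -> int:
    """Number of true cleavage sites among the top-k ranked predictions."""
    return sum(1 for site in ranked[:k] if site in true_sites)


def average_precision(ranked: Sequence[int], true_sites: Set[int]) -> float:
    """Average of precision values taken at each rank where a true site appears.

    Assumption: normalized by the number of true sites for the peptide.
    """
    hits = 0
    total = 0.0
    for rank, site in enumerate(ranked, start=1):
        if site in true_sites:
            hits += 1
            total += hits / rank
    return total / len(true_sites) if true_sites else 0.0


def evaluate(predictions: Sequence[Sequence[int]],
             truths: Sequence[Set[int]], k: int = 4) -> Dict[str, float]:
    """Average the per-peptide metrics over the dataset."""
    n = len(predictions)
    pairs = list(zip(predictions, truths))
    return {
        "hits_at_k": sum(hits_at_k(r, t, k) for r, t in pairs) / n,
        "precision_at_1": sum(1 for r, t in pairs if r and r[0] in t) / n,
        "map": sum(average_precision(r, t) for r, t in pairs) / n,
    }


# Toy data (hypothetical): bond indices ranked by predicted cleavage likelihood.
predictions = [[3, 7, 1, 5, 2], [2, 4, 6, 1]]
truths = [{3, 5}, {4}]
print(evaluate(predictions, truths))
```

Under these definitions, Hits@4 is a count averaged over peptides and so can exceed 1, whereas the top-ranked precision and MAP are bounded by 1 (or 100%).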
In addition, models were developed and trained on publicly available data for a selected set of proteases involved in peptide drug degradation. These models were optimized using 5-fold cross-validation and hyperparameter tuning, achieving F1 scores exceeding 95% and precisions of 98%, demonstrating their high accuracy and reliability. Compared with existing cleavage site prediction models from the literature, our approach achieved an F1 score 60% higher, without the need for feature extraction or dataset balancing techniques.
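For reference, the precision and F1 figures and the 5-fold splitting scheme mentioned above can be sketched in plain Python. The helper names and the interleaved fold assignment are assumptions for illustration; the abstract does not specify how folds were drawn or which library was used.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary metrics where label 1 marks a cleaved bond and 0 an intact one."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists for k folds (interleaved assignment)."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))
    return splits


# Each sample is held out exactly once across the five folds.
for train_idx, test_idx in k_fold_indices(10, k=5):
    print(test_idx)
```

Because F1 combines precision and recall, it remains informative when cleaved bonds are much rarer than intact ones, which is relevant when no dataset balancing is applied.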
This tool has the potential to significantly accelerate the development of peptide-based drugs by efficiently identifying cleavage sites, enabling more effective structural modifications that enhance stability while reducing the time and cost of experimental validation.