Prediction of peptide cleavage sites using protein language models and graph neural networks

October 30, 2025

Paula Cifuentes, Ramon Adàlia, Ismael Zamora

Abstract

Interest in peptides as therapeutic agents, driven by their high selectivity and efficacy, is growing across the pharmaceutical industry. However, oral administration remains challenging because of their low bioavailability and their vulnerability to proteases, which cleave peptide bonds. To support peptide drug development, in silico tools based on machine learning algorithms have been developed for cleavage site prediction. These tools rely on manual feature extraction, which limits their ability to capture complex peptide structures, especially cyclic peptides and peptides containing non-natural amino acids. This study presents two novel in silico approaches for cleavage site prediction. The first approach uses a protein language model, specifically ESM-2, fine-tuned to leverage its learned peptide structure embeddings for accurate cleavage site prediction, eliminating the need for manual feature engineering. The second approach employs graph neural networks that represent peptides as hierarchical graphs at the atom and amino acid levels, which allows them to handle cyclic peptide structures, including those containing non-natural amino acids. The applicability of this second approach is shown through a case study on a set of four cyclic peptides containing non-natural amino acids, in which in silico predictions are compared with experimental data.
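
As an illustration of the amino-acid level of such a graph, the following minimal Python sketch builds residue nodes and backbone peptide-bond edges, closing the ring for a head-to-tail cyclic peptide. It is not the authors' implementation: the names `PeptideBond` and `residue_graph` are hypothetical, the atom-level graph is omitted, and only head-to-tail cyclization is assumed. Each bond is a candidate cleavage site that a model could score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideBond:
    """One backbone peptide bond, i.e. one candidate cleavage site (hypothetical type)."""
    index: int           # position of the bond along the backbone
    residue_before: str  # residue on the N-terminal side of the bond
    residue_after: str   # residue on the C-terminal side of the bond


def residue_graph(residues, cyclic=False):
    """Return (nodes, bonds) for a residue-level peptide graph.

    Residues are plain labels, so non-natural amino acids (e.g. "Aib")
    need no special handling at this level of the sketch.
    """
    nodes = list(residues)
    bonds = [
        PeptideBond(i, nodes[i], nodes[i + 1])
        for i in range(len(nodes) - 1)
    ]
    # Assumption: a cyclic peptide is closed head to tail, which adds one
    # extra bond between the last and the first residue.
    if cyclic and len(nodes) > 2:
        bonds.append(PeptideBond(len(nodes) - 1, nodes[-1], nodes[0]))
    return nodes, bonds


if __name__ == "__main__":
    _, linear_bonds = residue_graph(["Ala", "Aib", "D-Phe", "Gly"])
    _, cyclic_bonds = residue_graph(["Ala", "Aib", "D-Phe", "Gly"], cyclic=True)
    print(len(linear_bonds), len(cyclic_bonds))
```

In this representation a linear peptide of n residues has n - 1 candidate bonds and its head-to-tail cyclic counterpart has n, which is one reason a graph view suits cyclic peptides better than a purely sequential one.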

Scalable Peptide MRM Transition Prediction for High-Throughput Proteomics via Hashing-Based Sequence Encoding

Peptide analysis via Multiple Reaction Monitoring (MRM) is indispensable for quantification, biomarker validation, and drug development, yet its reliance on experimental transition optimization limits scalability. Current computational models for small molecules fail to address peptide-specific complexities, such as sequence-dependent fragmentation and charge-state variability. We introduce a novel framework that combines hashing-based peptide fragment encoding with gradient-boosted decision trees to predict MRM transitions efficiently. This method eliminates bottlenecks in experimental workflows, enabling rapid, resource-efficient transition identification without compromising accuracy, a critical advancement for high-throughput proteomics pipelines.
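
The abstract does not specify the hashing scheme, so the following Python sketch shows only the general idea of hashing-based sequence encoding under stated assumptions: overlapping k-mers of the peptide sequence are hashed into a fixed number of buckets, producing a fixed-length count vector regardless of peptide length. The function name `hash_peptide` and the parameters `n_buckets` and `k` are hypothetical, and MD5 is used only as a stable, non-cryptographic bucket function.

```python
import hashlib


def hash_peptide(sequence, n_buckets=64, k=2):
    """Fold the k-mer counts of a peptide sequence into a fixed-length vector.

    Assumption: fragments are overlapping k-mers of one-letter residue codes.
    The scheme actually used in the work described above may differ.
    """
    vec = [0] * n_buckets
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # hashlib gives the same bucket in every run, unlike the built-in
        # hash(), which is salted per process for strings.
        digest = hashlib.md5(kmer.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % n_buckets
        vec[bucket] += 1
    return vec


if __name__ == "__main__":
    features = hash_peptide("PEPTIDE")
    print(len(features), sum(features))
```

A vector of this kind has the same length for every peptide, so it could be passed, together with descriptors such as precursor charge state, to a gradient-boosted decision tree model. Distinct k-mers can collide in the same bucket; a larger `n_buckets` reduces collisions at the cost of a wider feature vector.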