Prediction of peptide cleavage sites using protein language models and graph neural networks
October 30, 2025
Abstract
Interest in peptides as therapeutic agents, driven by their high selectivity and efficacy, is growing across the pharmaceutical industry. However, their oral administration remains challenging because of their low bioavailability and their vulnerability to proteases, which cleave peptide bonds. To optimize peptide drug development, in silico tools based on machine learning algorithms have been developed for cleavage site prediction. These tools rely on manual feature extraction and are limited in capturing complex peptide structures, especially cyclic peptides and peptides containing non-natural amino acids. This study presents two novel in silico approaches for cleavage site prediction. The first approach uses protein language models, specifically ESM-2, fine-tuned to leverage its learned peptide structure embeddings for accurate cleavage site prediction, eliminating the need for manual feature engineering. The second approach employs graph neural networks, representing peptides as hierarchical graphs at the atom and amino acid levels, which allows it to handle cyclic peptide structures, including those containing non-natural amino acids. The applicability of this second approach is shown through a case study on a set of four cyclic peptides containing non-natural amino acids, comparing in silico predictions with experimental data.
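To illustrate why a graph representation accommodates cyclic peptides and non-natural residues, the following minimal sketch builds only the amino acid level of such a graph. It is not the study's implementation: the class name `PeptideGraph`, its fields, and the example residue labels are hypothetical, and the atom-level graph and the neural network itself are omitted. Residues are stored as free-form labels, so a non-natural amino acid is just another node, and head-to-tail cyclization is a single extra edge.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PeptideGraph:
    """Residue-level peptide graph (hypothetical, for illustration only)."""

    residues: List[str]          # node labels; any string, natural or not
    cyclic: bool = False         # assume head-to-tail cyclization if True
    bonds: List[Tuple[int, int]] = field(init=False)

    def __post_init__(self) -> None:
        n = len(self.residues)
        # One edge per backbone peptide bond between consecutive residues.
        self.bonds = [(i, i + 1) for i in range(n - 1)]
        # Closing the ring adds one more bond, from the last residue to the first.
        if self.cyclic and n > 2:
            self.bonds.append((n - 1, 0))

    def candidate_sites(self) -> List[Tuple[str, str]]:
        """Return every peptide bond as a (residue, residue) pair.

        In this sketch each edge is a candidate cleavage site that a
        model would score; no scoring is performed here.
        """
        return [(self.residues[i], self.residues[j]) for i, j in self.bonds]


# Example: a four-residue cyclic peptide with made-up residue labels.
cyclic_peptide = PeptideGraph(["Ala", "Aib", "Phe", "D-Leu"], cyclic=True)
for left, right in cyclic_peptide.candidate_sites():
    print(f"{left}-{right}")
```

A linear peptide of n residues yields n - 1 candidate bonds, while its head-to-tail cyclic counterpart yields n, which a purely sequence-based encoding cannot express without additional conventions.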