Helmkit: fast and robust conversion of HELM notation to atomistic representations for large-scale macromolecular informatics

May 29, 2026

Ramon Adàlia, Gemma Sanjuan, Tomàs Margalef, Ismael Zamora.

Abstract

The Hierarchical Editing Language for Macromolecules (HELM) provides a powerful framework for representing complex biomolecules, including peptides, oligonucleotides, and hybrid constructs, but existing tools for converting HELM notations to atomistic models suffer from limitations in speed, scope, and robustness. We introduce helmkit, an open-source Python library that enables direct, high-throughput conversion of HELM strings to RDKit molecular objects. Designed for general macromolecular structures, helmkit supports peptides, nucleic acids, chemical linkers, and hybrids, while natively handling inline monomers, special characters in names, and automatic inference of missing attachment points. Its streamlined architecture, with minimal dependencies and built-in parallelization, achieves processing speeds of up to 5,000 HELM entities per second. Validation on large-scale datasets from PubChem (878,442 entries) and CycPeptMPDB (7,298 entries) demonstrates near-perfect accuracy, with helmkit successfully parsing structures that fail in other libraries. By facilitating efficient, scalable analysis of diverse macromolecules, helmkit advances computational workflows in drug discovery, virtual screening, and biomolecular engineering.
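To make the parsing problem concrete, the following is a minimal sketch of bracket-aware HELM tokenization, written with the Python standard library only. It is not helmkit's API or implementation: the function names `split_top_level` and `parse_simple_polymers` are hypothetical, and the sketch only extracts monomer tokens from the simple-polymer section without building RDKit molecules. It shows why inline monomers need care: an inline SMILES monomer may itself contain `[`, `]`, `|`, and `$`, so separators can only be honored at nesting depth zero.

```python
def split_top_level(text: str, sep: str) -> list[str]:
    """Split text on sep, ignoring separators nested inside [...] or {...}."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "[{":
            depth += 1
        elif ch in "]}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_simple_polymers(helm: str) -> dict[str, list[str]]:
    """Return {polymer_id: [monomer token, ...]} from the first HELM section.

    Hypothetical helper for illustration; monomer tokens keep their brackets.
    """
    # HELM sections are separated by '$'; the first one lists simple polymers.
    polymer_section = split_top_level(helm, "$")[0]
    polymers = {}
    for chunk in split_top_level(polymer_section, "|"):
        polymer_id, _, rest = chunk.partition("{")
        sequence = rest[: rest.rindex("}")]
        polymers[polymer_id] = split_top_level(sequence, ".")
    return polymers


# Two peptide chains joined by a connection (connection section is ignored here).
example = "PEPTIDE1{A.C.[dF]}|PEPTIDE2{G.C}$PEPTIDE1,PEPTIDE2,2:R3-2:R3$$$V2.0"
print(parse_simple_polymers(example))

# An inline SMILES monomer whose extension block contains '|' and '$'.
inline = "PEPTIDE1{A.[[*]N[C@@H](C)C([*])=O |$_R1;;;;;_R2;$|].G}$$$$V2.0"
print(parse_simple_polymers(inline))
```

A naive `str.split` on `$`, `|`, or `.` would break the second example apart inside the inline monomer, which is the kind of failure the abstract attributes to less robust parsers.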

Predicting enzymatic cleavage sites in cyclic peptides with non-canonical amino acids using a Graphormer model trained on MetID user data

April 25, 2026

Paula Cifuentes, Ramon Adàlia, Lisa A. Vasicek, Richard Gundersdorf, Abigail Wheeler, Ismael Zamora.

Abstract

Peptides are promising therapeutic agents because of their high selectivity and efficacy. However, their development is often limited by rapid enzymatic degradation, resulting in short half-lives. Chemical modifications such as cyclization, incorporation of D- or non-natural amino acids, and terminal modifications can improve peptide stability, yet their productive application requires prior identification of potential cleavage sites. Experimental determination of these sites is time-consuming, expensive, and may not fully capture the complexity of physiological environments. While computational approaches for cleavage site prediction exist, most are limited: they apply only to linear peptides composed of standard amino acids, have been tested only in single-enzyme systems, and cannot incorporate user-generated metabolite identification (MetID) data, restricting their utility for customized peptide design. To overcome these limitations, we present a workflow that integrates liquid chromatography–mass spectrometry (LC–MS) data from peptide metabolism studies with a Graphormer-based machine learning model to predict potential cleavage sites in peptides, including those with cycles and/or modified amino acids. The approach was evaluated using publicly available MEROPS datasets and MetID datasets from a leading pharmaceutical company, which included cyclic peptides with both natural and modified amino acids incubated in complex enzymatic matrices. The results show that the model achieves high precision in top-ranked cleavage site predictions, providing scientists with a practical tool that can help guide peptide drug design.

Software-Aided Prediction of Key Peptide Properties Using LC–MS Data

June 2026, ASMS Conference

Paula Cifuentes (1, 2, 3); Ramon Adàlia (2, 3, 4); Lisa A. Vasicek (5); Richard Gundersdorf (5); Abigail Wheeler (5); Paul Harradine (5); Ismael Zamora (3)

(1) Universitat Pompeu Fabra, Barcelona, Spain; (2) Lead Molecular Design, S.L., Sant Cugat del Vallès, Spain; (3) Mass Analytica, S.L., Sant Cugat del Vallès, Spain; (4) Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain; (5) Merck & Co., Inc., West Point, PA

Abstract

Introduction

Peptides have emerged as promising therapeutic agents due to their high specificity, favorable safety profiles, and cost-effective synthesis. However, their clinical development is limited by low oral bioavailability and short half-lives. These challenges arise from high clearance rates, poor solubility, limited membrane permeability, and reduced metabolic stability caused by peptidase activity and modulated by post-translational modifications. Deficiencies in any of these properties can significantly impact a peptide’s therapeutic efficacy. Consequently, in silico prediction tools have become increasingly important in the pharmaceutical industry, enabling early identification and elimination of unsuitable peptide drug candidates. Despite recent advances, existing tools are often limited to natural amino acids, cannot process cyclic peptides, and cannot be customized with user-specific experimental data, highlighting the need for further development.

Methods

The methodology defines a new workflow that integrates LC–MS data from peptide metabolism studies with a Graphormer-based machine learning model to predict five key peptide properties: potential cleavage sites, half-life, permeability, solvent accessibility, and post-translational modifications. The workflow imposes no structural constraints, so it accommodates cyclic peptides and modified amino acids. The models employ a transformer architecture with added mechanisms to encode graph structural information. Users can train the models on their own LC–MS experimental data for better alignment with specific peptides, and can update them continuously through a self-learning approach. The predictive models for the five selected endpoints were compared with state-of-the-art tools. Additionally, the cleavage-site and half-life models were validated using experimental MetID data from a pharmaceutical company.
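As an illustration of what "mechanisms to encode graph structural information" can look like, the sketch below shows the spatial encoding from the published Graphormer design: a bias indexed by shortest-path distance is added to each attention logit before the softmax. The abstract does not describe the authors' exact model, so everything here is an assumption for illustration: residues are treated as graph nodes, the bias values are made up rather than learned, and the function names are hypothetical.

```python
import math
from collections import deque


def shortest_path_distances(n_nodes, edges):
    """All-pairs shortest-path distances by BFS; -1 marks unreachable pairs."""
    adjacency = [[] for _ in range(n_nodes)]
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    dist = [[-1] * n_nodes for _ in range(n_nodes)]
    for source in range(n_nodes):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[source][v] < 0:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def softmax(row):
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(scores, dist, bias_by_distance):
    """Add a distance-indexed bias to each attention logit, then normalize rows.

    Distances beyond the table (or unreachable pairs) share the last bias entry.
    """
    last = len(bias_by_distance) - 1
    weights = []
    for i, row in enumerate(scores):
        logits = []
        for j, score in enumerate(row):
            d = dist[i][j]
            bucket = last if d < 0 or d > last else d
            logits.append(score + bias_by_distance[bucket])
        weights.append(softmax(logits))
    return weights


# A head-to-tail cyclic hexapeptide: residue i is bonded to residue (i + 1) % 6.
ring_edges = [(i, (i + 1) % 6) for i in range(6)]
dist = shortest_path_distances(6, ring_edges)
print(dist[0][5])  # 1, because ring closure makes the last residue a neighbor

uniform_scores = [[0.0] * 6 for _ in range(6)]
illustrative_bias = [0.0, -0.5, -1.0, -1.5]  # placeholder values, not learned
attention = biased_attention(uniform_scores, dist, illustrative_bias)
print(attention[0])
```

In a linear peptide, residues 0 and 5 would be five bonds apart. The ring closure brings them to distance one, and the distance bias lets attention reflect that, which a purely sequence-based positional encoding cannot do.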

AI Parent-to-Metabolite Pathway Predictor

June 2026, ASMS Conference

Savannah M. Mason (1); Paula Cifuentes (1, 2, 3); Tommaso Palomba (1, 4); Ismael Zamora (1)

(1) Mass Analytica, S.L., Sant Cugat del Vallès, Spain; (2) Universitat Pompeu Fabra, Barcelona, Spain; (3) Lead Molecular Design, S.L., Sant Cugat del Vallès, Spain; (4) Molecular Discovery, Borehamwood, United Kingdom

Abstract

Introduction

Most drugs undergo chemical transformations in the body, known as biotransformations, to produce metabolites that are more readily eliminated. These reactions are largely mediated by metabolic enzymes, primarily in the liver, and exhibit high specificity, with each enzyme favoring particular substrates. Understanding which enzymes are responsible for metabolite formation is critical for elucidating metabolic pathways, predicting metabolic behavior, and anticipating potential toxicity. Metabolite Identification (MetID) studies, performed in vitro or in vivo, rely heavily on LC-MS/MS for the detection and structural identification of metabolites. However, most discovery studies provide limited information about the enzymes involved. Filling this gap requires experimental reaction phenotyping, such as recombinant enzyme incubations or chemical inhibition, which is time- and resource-intensive and makes comprehensive pathway characterization challenging.

Methods

This workflow integrates LC-MS MetID experiments from in vitro incubations. Users may apply a model to an experimentally identified metabolite to predict the possible enzymatic pathways responsible for its formation, including Phase I and Phase II reactions. The computational algorithm evaluates the exposure of reactive atoms of xenobiotic compounds to the catalytic residues of human metabolic enzymes by simulating interactions between the two, using the enzyme’s 3D structure. Multiple docking poses are generated and scored based on energy contributions. The score of the best pose is normalized to give a probability ranking, which is reported in the output. MetID experiments were analyzed using MassMetaSite in the ONIRO server with LC-MS data from Sciex and Thermo instruments. The predictions were performed using MetaSite 7 inside Oniro.
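The pose-scoring and normalization step can be sketched as follows. This is not the MetaSite algorithm, whose scoring function and normalization are not described in the abstract. It assumes that lower energy means a better pose and it uses a softmax over the best energy per enzyme as one possible normalization. The function name `rank_enzymes` is hypothetical, and the energy values are invented purely for illustration.

```python
import math


def rank_enzymes(pose_energies):
    """Rank enzymes by a normalized score derived from their best docking pose.

    pose_energies maps an enzyme name to the energies of its docking poses.
    Assumption: lower energy is better, so the best pose is the minimum.
    """
    best = {enzyme: min(energies)
            for enzyme, energies in pose_energies.items() if energies}
    if not best:
        return []
    # Shift by the global minimum for numerical stability, then apply a softmax
    # to the negated energies so that the scores sum to one.
    floor = min(best.values())
    weights = {enzyme: math.exp(-(energy - floor))
               for enzyme, energy in best.items()}
    total = sum(weights.values())
    ranked = [(enzyme, weight / total) for enzyme, weight in weights.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Invented energies for three human metabolic enzymes (illustration only).
illustrative_poses = {
    "CYP3A4": [-7.2, -8.1, -6.5],
    "CYP2D6": [-6.9, -7.0],
    "UGT1A1": [-5.5],
}
for enzyme, probability in rank_enzymes(illustrative_poses):
    print(f"{enzyme}: {probability:.3f}")
```

The output lists each enzyme with a normalized score, highest first, mirroring the ranked probability that the abstract says is reported to the user.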
