Mass Analytica

Helmkit: fast and robust conversion of HELM notation to atomistic representations for large-scale macromolecular informatics

May 29, 2026

Ramon Adàlia, Gemma Sanjuan, Tomàs Margalef, Ismael Zamora.

Abstract

The Hierarchical Editing Language for Macromolecules (HELM) provides a powerful framework for representing complex biomolecules, including peptides, oligonucleotides, and hybrid constructs, but existing tools for converting HELM notations to atomistic models suffer from limitations in speed, scope, and robustness. We introduce helmkit, an open-source Python library that enables direct, high-throughput conversion of HELM strings to RDKit molecular objects. Designed for general macromolecular structures, helmkit supports peptides, nucleic acids, chemical linkers, and hybrids, while natively handling inline monomers, special characters in names, and automatic inference of missing attachment points. Its streamlined architecture, with minimal dependencies and built-in parallelization, achieves processing speeds of up to 5,000 HELM entities per second. Validation on large-scale datasets from PubChem (878,442 entries) and CycPeptMPDB (7,298 entries) demonstrates near-perfect accuracy, with helmkit successfully parsing structures that fail in other libraries. By facilitating efficient, scalable analysis of diverse macromolecules, helmkit advances computational workflows in drug discovery, virtual screening, and biomolecular engineering.
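
The abstract describes helmkit's input handling only at a high level. To illustrate what the first parsing stage involves, the sketch below splits a HELM 2.0 string into simple polymers and raw connection strings using only the Python standard library. It is not helmkit's API: `parse_helm`, `split_top_level`, `unwrap`, and `SimplePolymer` are hypothetical names. The sketch assumes well-formed input, reads only the simple-polymer and connection sections, and does not build RDKit molecules or infer attachment points. Splitting only outside brackets and parentheses keeps inline SMILES monomers and nucleotide notation such as `R(A)P` intact.

```python
from dataclasses import dataclass, field


def split_top_level(text: str, sep: str) -> list[str]:
    """Split `text` on `sep`, ignoring separators inside [...] or (...)."""
    parts: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in text:
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


@dataclass
class SimplePolymer:
    polymer_id: str  # e.g. "PEPTIDE1" or "RNA1"
    monomers: list[str] = field(default_factory=list)


def unwrap(monomer: str) -> str:
    """Drop one outer [...] pair used for multi-letter or inline SMILES monomers."""
    if monomer.startswith("[") and monomer.endswith("]"):
        return monomer[1:-1]
    return monomer


def parse_helm(helm: str) -> tuple[list[SimplePolymer], list[str]]:
    """Hypothetical helper: return simple polymers and raw connection strings."""
    sections = split_top_level(helm, "$")
    polymers = []
    for block in split_top_level(sections[0], "|"):
        polymer_id, _, body = block.partition("{")
        body = body.removesuffix("}")
        monomers = [unwrap(m) for m in split_top_level(body, ".")]
        polymers.append(SimplePolymer(polymer_id, monomers))
    connections: list[str] = []
    if len(sections) > 1:
        connections = [c for c in split_top_level(sections[1], "|") if c]
    return polymers, connections


# Head-to-tail cyclic tetrapeptide: R2 of residue 4 bonds to R1 of residue 1.
cyclic = "PEPTIDE1{A.G.C.[dA]}$PEPTIDE1,PEPTIDE1,4:R2-1:R1$$$V2.0"
polymers, connections = parse_helm(cyclic)
print(polymers[0].monomers)  # ['A', 'G', 'C', 'dA']
print(connections)           # ['PEPTIDE1,PEPTIDE1,4:R2-1:R1']
```

A full converter would then map each monomer to a structure with attachment points and join them according to the backbone order and the connection section; that step is not shown here.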

Predicting enzymatic cleavage sites in cyclic peptides with non-canonical amino acids using a Graphormer model trained on MetID user data

April 25, 2026

Paula Cifuentes, Ramon Adàlia, Lisa A. Vasicek, Richard Gundersdorf, Abigail Wheeler, Ismael Zamora.

Abstract

Peptides are promising therapeutic agents because of their high selectivity and efficacy. However, their development is often limited by rapid enzymatic degradation, resulting in short half-lives. Chemical modifications such as cyclization, incorporation of D- or non-natural amino acids, and terminal modifications can improve peptide stability, yet their productive application requires prior identification of potential cleavage sites. Experimental determination of these sites is time-consuming, expensive, and may not fully capture the complexity of physiological environments. While computational approaches for cleavage site prediction exist, most are limited: they apply only to linear peptides composed of standard amino acids, have been tested only in single-enzyme systems, and cannot incorporate user-generated metabolite identification (MetID) data, restricting their utility for customized peptide design. To overcome these limitations, we present a workflow that integrates liquid chromatography–mass spectrometry (LC–MS) data from peptide metabolism studies with a Graphormer-based machine learning model to predict potential cleavage sites in peptides, including those with cycles and/or modified amino acids. The approach was evaluated using publicly available MEROPS datasets and MetID datasets from a leading pharmaceutical company, which included cyclic peptides with both natural and modified amino acids incubated in complex enzymatic matrices. The results show that the model achieves high precision in top-ranked cleavage site predictions, providing scientists with a practical tool that can help guide peptide drug design.
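
The abstract reports performance as precision among top-ranked predictions. One common way to compute such a metric is precision at k. This sketch assumes that each candidate cleavage site (an amide bond, labelled here by its flanking residues) receives a model score and that the MetID data supply the set of observed sites. The function name, site labels, and scores are hypothetical and are not taken from the paper.

```python
def precision_at_k(scores: dict[str, float], observed: set[str], k: int) -> float:
    """Fraction of the k highest-scoring candidate sites that were observed experimentally."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank candidate sites by descending model score and keep the top k.
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for site in ranked if site in observed)
    return hits / len(ranked)


# Hypothetical scores for the four amide bonds of a head-to-tail cyclic tetrapeptide.
scores = {"A1-G2": 0.91, "G2-C3": 0.18, "C3-dA4": 0.74, "dA4-A1": 0.42}
observed = {"A1-G2", "dA4-A1"}
print(precision_at_k(scores, observed, 1))  # 1.0
print(precision_at_k(scores, observed, 2))  # 0.5
```

Dividing by the number of ranked sites rather than by k avoids penalizing small peptides that have fewer than k candidate bonds; other conventions are possible.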

Software-Aided Prediction of Key Peptide Properties Using LC–MS Data

June 2026, ASMS Conference

Paula Cifuentes¹,²,³; Ramon Adàlia²,³,⁴; Lisa A. Vasicek⁵; Richard Gundersdorf⁵; Abigail Wheeler⁵; Paul Harradine⁵; Ismael Zamora³

¹Universitat Pompeu Fabra, Barcelona, Spain; ²Lead Molecular Design, SL, Sant Cugat del Vallès, Spain; ³Mass Analytica, S.L., Sant Cugat del Vallès, Spain; ⁴Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain; ⁵Merck & Co., Inc., West Point, PA

Abstract

Introduction

Peptides have emerged as promising therapeutic agents due to their high specificity, favorable safety profiles, and cost-effective synthesis. However, their clinical development is limited by low oral bioavailability and short half-lives. These challenges arise from high clearance rates, poor solubility, limited membrane permeability, and reduced metabolic stability caused by peptidase activity and modulated by post-translational modifications. Deficiencies in any of these properties can significantly reduce a peptide’s therapeutic efficacy. Consequently, in silico prediction tools have become increasingly important in the pharmaceutical industry, enabling early identification and elimination of unsuitable peptide drug candidates. Despite recent advances, existing tools are often limited to natural amino acids, cannot process cyclic peptides, and cannot be customized with user-specific experimental data, highlighting the need for further development.

Methods

The methodology defines a new workflow that integrates LC–MS data from peptide metabolism studies with a Graphormer-based machine learning model to predict five key peptide properties: potential cleavage sites, half-life, permeability, solvent accessibility, and post-translational modifications. The methodology imposes no structural constraints, so it accommodates cyclic peptides and modified amino acids. The models use a transformer architecture with additional mechanisms that encode graph structural information. Users can train the models on their own LC–MS experimental data for better alignment with their specific peptides and update them continuously through a self-learning approach. The predictive models for the five selected endpoints were compared with state-of-the-art tools. Additionally, the cleavage-site and half-life models were validated using experimental MetID data from a pharmaceutical company.
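
The abstract mentions mechanisms that add graph structure to a transformer. In the original Graphormer design these include a centrality encoding based on node degree and a spatial encoding in which the shortest-path distance between two nodes indexes a learnable attention bias. The standard-library sketch below computes those two inputs for a residue-level graph and shows why cyclization matters: closing the ring shortens the distance between terminal residues. It illustrates these assumptions and is not the authors' featurization. The function names and the bias table are hypothetical, and a real model would learn the bias values.

```python
from collections import deque


def shortest_path_matrix(n: int, edges: list[tuple[int, int]]) -> list[list[int]]:
    """All-pairs hop distances via BFS from every node; -1 marks unreachable pairs."""
    adjacency: list[list[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    dist = [[-1] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if dist[source][w] == -1:
                    dist[source][w] = dist[source][u] + 1
                    queue.append(w)
    return dist


def degrees(n: int, edges: list[tuple[int, int]]) -> list[int]:
    """Node degree, the quantity a Graphormer-style centrality encoding embeds."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def spatial_bias(dist: list[list[int]], table: list[float], unreachable: float) -> list[list[float]]:
    """Look up one scalar attention bias per node pair; long distances are clipped."""
    max_d = len(table) - 1
    return [
        [unreachable if d < 0 else table[min(d, max_d)] for d in row]
        for row in dist
    ]


# Six residues: a linear chain versus the same chain closed head-to-tail.
linear = [(i, i + 1) for i in range(5)]
cyclic = linear + [(5, 0)]
print(shortest_path_matrix(6, linear)[0][5])  # 5
print(shortest_path_matrix(6, cyclic)[0][5])  # 1
print(degrees(6, linear))                     # [1, 2, 2, 2, 2, 1]
```

Because these encodings depend only on graph connectivity, the same featurization applies unchanged to cyclic backbones and to side-chain links involving non-canonical residues.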

AI Parent-to-Metabolite Pathway Predictor

June 2026, ASMS Conference

Savannah M. Mason¹; Paula Cifuentes¹,²,³; Tommaso Palomba¹,⁴; Ismael Zamora¹

¹Mass Analytica, S.L., Sant Cugat del Vallès, Spain; ²Universitat Pompeu Fabra, Barcelona, Spain; ³Lead Molecular Design, SL, Sant Cugat del Vallès, Spain; ⁴Molecular Discovery, Borehamwood, United Kingdom

Abstract

Introduction

Most drugs undergo chemical transformations in the body, known as biotransformations, that produce metabolites which are more readily eliminated. These reactions are mediated largely by metabolic enzymes, primarily in the liver, and are highly specific, with each enzyme favoring particular substrates. Understanding which enzymes are responsible for metabolite formation is critical for elucidating metabolic pathways, predicting metabolic behavior, and anticipating potential toxicity. Metabolite identification (MetID) studies, performed in vitro or in vivo, rely heavily on LC-MS/MS for the detection and structural identification of metabolites. However, most discovery studies provide limited information about the enzymes involved, and experimental reaction-phenotyping approaches, such as incubations with recombinant enzymes or chemical inhibition, are time- and resource-intensive, making comprehensive pathway characterization challenging.

Methods

This workflow integrates LC-MS MetID experiments from in vitro incubations. Users can apply a model to an experimentally identified metabolite to predict the enzymatic pathways that may be responsible for its formation, including Phase I and Phase II reactions. The computational algorithm evaluates the exposure of reactive atoms of xenobiotic compounds to the catalytic residues of human metabolic enzymes by simulating their interactions using each enzyme’s 3D structure. Multiple docking poses are generated and scored based on energy contributions, and the score of the best pose is normalized into a probability used for ranking, which is reported in the output. MetID experiments were analyzed using MassMetaSite in the Oniro server with LC-MS data from Sciex and Thermo instruments, and the predictions were performed using MetaSite 7 within Oniro.
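
The abstract states only that docking poses are scored on energy contributions and that the best pose is normalized into a ranked probability; the exact scoring and normalization used in MetaSite are not described here. As an illustration of that final step, the sketch below assumes that pose scores are non-negative with higher values more favourable, keeps the best pose for each candidate reactive atom, and divides by the sum over atoms so that the values add up to one. The atom labels, scores, and the `rank_sites` helper are hypothetical.

```python
def rank_sites(pose_scores: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank candidate reactive atoms by their normalized best-pose score (hypothetical)."""
    # Keep only the best-scoring pose for each atom; atoms without poses are skipped.
    best = {atom: max(scores) for atom, scores in pose_scores.items() if scores}
    total = sum(best.values())
    if total <= 0:
        raise ValueError("expected at least one positive pose score")
    # Sum normalization turns best-pose scores into probability-like weights.
    probabilities = {atom: score / total for atom, score in best.items()}
    return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)


poses = {"C7": [0.2, 0.6], "N3": [0.3], "S1": [0.1, 0.1], "O9": []}
for atom, p in rank_sites(poses):
    print(f"{atom}: {p:.2f}")
```

With these inputs the ranking is C7 (0.60), N3 (0.30), then S1 (0.10). Dividing by the maximum instead of the sum would give relative scores with 1.0 for the top atom; which variant the software uses is not stated.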

Development of Machine Learning assisted Fingermark Imaging Software (iFIS)

June 2026, ASMS Conference

Simona Francese¹; Elias Jensen²; Sara Tortorella³; Chloe Spencer¹,⁴; Giuseppe Arturi⁵; Simon Cross⁶; Hassan Ugail²

¹Sheffield Hallam University, Sheffield, United Kingdom; ²University of Bradford, Bradford, United Kingdom; ³Mass Analytica, Sant Cugat del Vallès, Spain; ⁴University of Nottingham, Nottingham, United Kingdom; ⁵Molecular Discovery, Borehamwood, United Kingdom; ⁶Mass Analytica, S.L., Borehamwood, United Kingdom

Abstract

Introduction

Molecular fingerprinting has been featured in the Fingermark Visualisation Manual edited by Dstl/Home Office and is being used in police casework. It applies mass spectrometry imaging (MSI), particularly MALDI MSI, to provide biometric information by generating multiple molecular images of crime-scene fingermark evidence, alongside contextual (molecular) information. Whilst both advanced freeware and proprietary MSI software exist, they are complex because they were mostly built to process biological tissue imaging data. We have developed dedicated software that enables auditable “manipulation” of fingermark images and seamless separation of overlapping fingermarks and, crucially, integrates a machine learning algorithm capable of grading fingermark images and finding those of the highest quality within seconds, without manual inspection of the entire mass range.

Methods

Following ethical approval (ER52762288), fingermarks were spray-coated with matrix using the HTX M3+™ Sprayer (HTX Technologies, USA) and imaged in positive mode on a SELECT SERIES MRT MALDI mass spectrometer (Waters Corporation, UK). Lipostar MSI (Molecular Discovery, UK), used as the skeleton to build iFIS, generated 663 images, which were graded on the 5-point Scotland Yard scale for the machine learning model. The ML algorithm utilised a range of deep features based on the ResNet50 architecture, corresponding to the visual characteristics of the fingermark (minutiae and texture features). These features were then fed to a Support Vector Machine-based classification algorithm that categorizes fingermarks into five distinct categories.
