Lipostar, a Comprehensive Platform-Neutral Cheminformatics Tool for Lipidomics

May 2017.

Goracci L, Tortorella S, Tiberi P, Pellegrino RM, Di Veroli A, Valeri A, Cruciani G.

Abstract

To date, the main limitations for LC-MS-based untargeted lipidomics reside in the lack of adequate computational and cheminformatics tools that are able to support the analysis of several thousands of species from biological samples, enabling data mining and automating lipid identification and external prediction processes. To address these issues, we developed Lipostar, a novel vendor-neutral high-throughput software that effectively supports both targeted and untargeted LC-MS lipidomics, implementing data acquisition, user-friendly multivariate analysis (to be used for model generation and new sample predictions), and advanced lipid identification protocols that can work with or without the support of preformed lipid databases. Moreover, Lipostar integrates the lipidomic processes with a full metabolite identification (MetID) procedure, essential in drug safety applications and in translational studies. Case studies demonstrating a number of Lipostar features are also presented. 

An innovative algorithm to elucidate the structure of unknown compounds using tandem Mass Spectrometry and NMR data

65th ASMS Conference on Mass Spectrometry and Allied Topics, Indianapolis, IN (United States of America) 04 June 2017

Abstract

The interpretation of tandem mass spectrometry data is usually the bottleneck in many areas. The process becomes more complicated when the scientist has no clue about the structure of the analyzed compound. Several algorithms have been developed to ease structure determination from MS/MS data. Unfortunately, most of them query the input data against a database of interpreted MS/MS spectra, which restricts the potential results to the compounds contained in that dataset.

The algorithm presented here differs from other software in the origin of the initial dataset against which the input MS/MS data is searched.

Methods

The methodology consists of three parts. The first is the creation of the database. Users can choose a set of compounds from their own data or take them from an external database. Those compounds are fragmented and the fragments are stored individually in the database. In the second part, the m/z values from the input MS/MS spectrum are queried against the database, and the matching fragments are rationally combined to build a set of candidates. In the last part, all the candidates are fragmented and compared with the peaks of the input MS/MS spectrum. To reduce the number of results, the NMR spectra of the remaining candidates are predicted and compared with the experimental NMR data of the unknown compound.

Preliminary Data

The algorithm has been developed and successfully tested using a small set of compounds from the pharmacological area. One of them is 10P-909 (PubChem CID: 1480036; IUPAC name: 2-chloro-5-[[4-[3-(trifluoromethyl)phenyl]piperazin-1-yl]methyl]-1,3-thiazole).

The procedure used to elucidate this benchmark compound is described below. The first step was the creation of the database. As the compound set, we used 500,000 structures from PubChem. These compounds were fragmented using in-house code, generating a final set of 6 million independent fragments. Next, the algorithm was fed the required parameters: the m/z of the unknown structure, the tolerance on this m/z value, the ion mode, the adduct type and the MS/MS data. In this case study, the m/z of the unknown compound is 362.0703, acquired in positive ion mode with adduct type [M+H]+. The tolerance was set to 3 ppm because of the quality of the acquisition.
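As a rough illustration of what these input parameters imply, the sketch below converts the 3 ppm tolerance into an absolute m/z window around 362.0703 and derives the neutral monoisotopic mass from the [M+H]+ adduct. The function names are hypothetical and are not taken from the presented software; the proton mass is rounded to six decimals.

```python
# Minimal sketch: ppm tolerance window and neutral mass for an [M+H]+ ion.
# Function names are illustrative assumptions, not the authors' code.

PROTON_MASS = 1.007276  # Da, mass of a proton (rounded)


def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a tolerance given in ppm."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def neutral_mass_from_mh(mz):
    """Neutral monoisotopic mass of a singly charged [M+H]+ ion."""
    return mz - PROTON_MASS


# Values quoted in the abstract: m/z 362.0703, 3 ppm, [M+H]+.
low, high = ppm_window(362.0703, 3.0)
neutral = neutral_mass_from_mh(362.0703)
print(f"m/z window: {low:.4f} to {high:.4f}; neutral mass: {neutral:.4f}")
```

At this m/z, 3 ppm corresponds to a half-width of roughly 0.0011, so only database entries within about one millidalton of the measured value would be accepted.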

The original MS/MS input contains a total of 18 peaks, reduced to half (9) after removing isotopes. Querying the m/z values from the input returns a total of 6492 fragments from the database. Their rational combination yields up to 6500 candidate solutions. To reduce the number of solutions, each one was fragmented and compared with the original MS/MS data. Close to 2000 results match the 9 peaks of the original spectrum.
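The comparison step can be pictured with a small sketch: each candidate's predicted fragment m/z values are checked against the observed peaks within a ppm tolerance, and candidates are ranked by the number of observed peaks they explain. This is only an assumed scoring scheme with invented toy numbers; the abstract does not describe the actual matching or scoring used.

```python
# Minimal sketch of peak matching between observed MS/MS peaks and the
# predicted fragments of each candidate. Names, data and the scoring rule
# are hypothetical, chosen only to illustrate the idea.


def count_matched_peaks(observed, predicted, tol_ppm=3.0):
    """Count observed peaks that have a predicted fragment within tol_ppm."""
    matched = 0
    for mz in observed:
        tol = mz * tol_ppm / 1e6
        if any(abs(mz - p) <= tol for p in predicted):
            matched += 1
    return matched


def rank_candidates(observed, candidates, tol_ppm=3.0):
    """Return (name, matched_count) pairs, best match first."""
    scored = [
        (name, count_matched_peaks(observed, frags, tol_ppm))
        for name, frags in candidates.items()
    ]
    return sorted(scored, key=lambda item: -item[1])


# Toy data, not taken from the study.
observed_peaks = [100.0000, 150.0000, 200.0000]
toy_candidates = {
    "A": [100.0001, 150.0002, 200.0003, 250.0000],
    "B": [100.0001, 175.0000],
}
ranking = rank_candidates(observed_peaks, toy_candidates)
# Keep only candidates that explain every observed peak.
full_matches = [n for n, score in ranking if score == len(observed_peaks)]
```

Requiring every observed peak to be explained mirrors the filter described above, where only solutions matching all 9 peaks are kept.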

Then, to obtain a more accurate result, both the 1H and 13C NMR spectra of the best-matched structures were predicted by an in-house program and later compared with the NMR data from the unknown structure. In our case, the 1H, 13C NMR and COSY spectra of the original unknown helped us to find the structure of 10P-909 in the first position of the ranking.

Novel Aspect

This algorithm operates with real fragments instead of using existing MS/MS spectra to predict the structure of the unknown compound.

Towards an automatic structure elucidation process in various chemical workflows by LC-HRMS and NMR data analysis

254th ACS National Meeting, Washington DC (United States of America) 20 August 2017 

Abstract

MassChemSite (Molecular Discovery, Ltd. UK) is a novel vendor-agnostic software which automates peak finding and structure elucidation from LC-HRMS analytical data obtained from chemical reaction samples, speeding up this task. MassChemSite can identify the reactants and products of a sample based on MS and MSMS information and the chemical reactions under consideration.

Two new features were recently added to MassChemSite: the first one is a method applied to elucidate unknown structures (i.e., when the m/z found is not obtained by the combination of reactions used as input) from unassigned LC-HRMS peaks. First, m/z values from the input MSMS are queried in a database of fragments of compounds (built from user compounds, or previously built and provided). Then, a set of candidate compounds is built by rational combination of the fragments found in the database.

Finally, all the candidates are fragmented and compared with the peaks of the input MSMS spectra. The second feature is an algorithm that uses different NMR acquisition methodologies to discriminate between multiple solutions, for example when LC-HRMS analysis cannot provide a unique structural solution, or to further refine the results of the unknown structure elucidation method. Chemists can load a processed NMR experiment file directly into the program or enter the NMR data by hand. The experimental data (1D or 2D experiments) are compared with the data predicted for the structures proposed by the LC-HRMS analysis, and only those solutions for which the predicted and experimental NMR data match are kept.
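One simple way to picture the NMR filtering step is to compare experimental and predicted chemical shifts and discard candidates whose deviation is too large. The sketch below pairs sorted shifts and uses a mean absolute error threshold. Both the pairing rule and the 2.0 ppm threshold are assumptions made for illustration and are not described in the abstract; the shift values are invented.

```python
# Minimal sketch of filtering candidate structures by NMR agreement.
# The pairing rule (sorted shifts), the threshold and the data are
# hypothetical; a real comparison would also use assignments and 2D data.


def shift_mae(experimental, predicted):
    """Mean absolute error between sorted shift lists (ppm).

    Returns infinity when the lists are empty or differ in length.
    """
    if not experimental or len(experimental) != len(predicted):
        return float("inf")
    pairs = zip(sorted(experimental), sorted(predicted))
    return sum(abs(e - p) for e, p in pairs) / len(experimental)


def filter_by_nmr(experimental, candidates, max_mae=2.0):
    """Keep candidates whose predicted shifts are within max_mae on average."""
    return [
        name
        for name, predicted in candidates.items()
        if shift_mae(experimental, predicted) <= max_mae
    ]


# Toy 13C shifts in ppm, not taken from the study.
experimental_shifts = [20.0, 55.0, 128.0]
predicted_by_candidate = {
    "X": [21.0, 54.0, 129.0],
    "Y": [20.0, 80.0, 170.0],
    "Z": [20.0, 55.0],
}
kept = filter_by_nmr(experimental_shifts, predicted_by_candidate)
```

Sorting before pairing avoids needing peak assignments, at the cost of possibly pairing unrelated signals; it is meant only as a coarse first filter.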

Enabling Efficient Late‐Stage Functionalization of Drug‐Like Molecules with LC‐MS and Reaction‐Driven Data Processing

European Journal of Organic Chemistry, 2017

Huifang Yao, Yong Liu, Sriram Tyagarajan, Eric Streckfuss, Mikhail Reibarkh, Kuanchang Chen, Ismael Zamora, Fabien Fontaine, Laura Goracci, Roy Helmy, Kevin P. Bateman, Shane W. Krska

Abstract

Late-stage functionalization (LSF) through C-H functionalization of drug leads is a powerful synthetic strategy for drug discovery. A key challenge in LSF is that multiple regioisomeric products are often generated, which requires slow and laborious product isolation and structure confirmation steps. To address this, an analytical approach using LC-HRMS/MS coupled with automated, chemically aware data processing was developed. Using this method to analyze reaction screening arrays based on three common C-H functionalization chemistries with a set of marketed drugs, the relative amount and localization of chemical modification could be determined for each regioisomeric product generated in the screening.

This approach allows one to construct a workflow in which the various regioisomeric products of a given transformation are triaged according to their site of modification, allowing downstream isolation and structure elucidation efforts to focus on those analogues of highest interest, leading to an overall increase in productivity of the LSF strategy. 

Peptide metabolism: High resolution Mass Spectrometry tool to investigate Peptide structure and amine bond metabolic susceptibility

American Peptide Symposium, Whistler (Canada)… 17 June 2017

Abstract

Several in-silico approaches, such as PeptideCutter, have been developed to predict peptide cleavage sites for different proteases. Moreover, several databases, such as MEROPS, collect and store this information. Despite these methodologies, limitations remain in their usage, notably the inability to handle unnatural amino acids and cyclic peptides. The aim of this work is to develop a new methodology that analyzes mass-spectrometry-driven experimental data to find the metabolites, determines their structures, stores all the results in a chemistry-aware database, and finally computes the peptide bond susceptibility through a frequency analysis of the metabolic liability.

This approach uses ultra-performance liquid chromatography with high resolution mass spectrometry to obtain the analytical data from incubations of peptides with different enzyme matrices. Metabolite identification was performed on 13 commercial peptide compounds and 4 positive substrates for the four selected proteases (serine and aspartic). The peptides were incubated for three hours with five time points being taken during the experiment. The compounds were diverse with respect to linear and cyclic structure, containing natural and unnatural amino acids and ranged in molecular weight.

The analysis of this data set resulted in 45 metabolites that were annotated in the database. The frequency analysis revealed 26 sites of cleavage, with Trp-Ser being the most frequently cleaved bond across all cases. Selectivity was identified for pancreatic elastase and trypsin/chymotrypsin, for which Ser-Tyr and Leu-Ser, respectively, were the most frequently cleaved bonds. These results agree with previous studies.
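The frequency analysis can be sketched as a simple count of how often each residue pair flanks an observed cleavage. The record layout (a pair of residues on either side of the cleaved bond) and the toy observations below are assumptions for illustration; the counts are invented and do not reproduce the study's results.

```python
# Minimal sketch of a cleavage-site frequency analysis.
# Each record is (residue before the bond, residue after the bond).
# The data layout and the toy records are hypothetical.
from collections import Counter


def cleavage_frequency(cleavages):
    """Return (bond, count) pairs sorted from most to least frequent."""
    counts = Counter(f"{n_side}-{c_side}" for n_side, c_side in cleavages)
    return counts.most_common()


# Toy observations, invented for the example.
toy_cleavages = [
    ("Trp", "Ser"),
    ("Trp", "Ser"),
    ("Leu", "Ser"),
]
frequencies = cleavage_frequency(toy_cleavages)
most_cleaved_bond = frequencies[0][0]
```

Grouping the same records by enzyme before counting would give the per-protease selectivity described above.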