An innovative algorithm to elucidate the structure of unknown compounds using tandem Mass Spectrometry and NMR data

65th ASMS Conference on Mass Spectrometry and Allied Topics, Indianapolis, IN (United States of America) 04 June 2017

Abstract

The interpretation of data obtained by tandem mass spectrometry is often the bottleneck in many application areas, and it becomes even harder when the scientist has no prior knowledge of the structure of the analyzed compound. Several algorithms have been developed to facilitate structure determination from MS/MS data. Unfortunately, most of them query the input data against a database of interpreted MS/MS spectra, which restricts the potential results to the compounds contained in that dataset.

The algorithm presented here differs from other software in the origin of the dataset against which the input MS/MS data is searched.

Methods

The methodology consists of three parts. The first is the creation of the database: users choose a set of compounds from their own data or take them from an external database, and those compounds are fragmented and stored in the database individually. In the second part, the m/z values from the input MS/MS spectrum are queried against the database, and the retrieved fragments are rationally combined to build a set of candidates. In the last part, all the candidates are fragmented and compared with the peaks of the input MS/MS spectrum. To reduce the number of results, the NMR spectra of the candidates are predicted and compared with the experimental NMR data of the unknown compound.

Preliminary Data

The algorithm has been developed and successfully tested using a small set of compounds from the pharmacological area. One of them is 10P-909 (PubChem CID: 1480036; IUPAC name: 2-chloro-5-[[4-[3-(trifluoromethyl)phenyl]piperazin-1-yl]methyl]-1,3-thiazole).
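The database query in the second part of the method can be pictured with a small sketch. This is not the in-house code, which the abstract does not describe; `FragmentIndex`, its tuple layout and the m/z values are hypothetical, chosen only to show how a sorted list of fragment m/z values can be searched within a ppm tolerance.

```python
from bisect import bisect_left, bisect_right


def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


class FragmentIndex:
    """Toy in-memory stand-in for the fragment database (hypothetical)."""

    def __init__(self, fragments):
        # fragments: iterable of (mz, fragment_id, parent_compound_id)
        self._rows = sorted(fragments)
        self._mzs = [row[0] for row in self._rows]

    def query(self, mz, ppm):
        """Return all stored fragments whose m/z lies inside the ppm window."""
        low, high = ppm_window(mz, ppm)
        start = bisect_left(self._mzs, low)
        stop = bisect_right(self._mzs, high)
        return self._rows[start:stop]


# Illustrative values only; they are not taken from the abstract.
index = FragmentIndex([
    (145.0260, "frag-a", "cpd-1"),
    (145.0265, "frag-b", "cpd-2"),
    (231.1104, "frag-c", "cpd-1"),
])
hits = index.query(145.0262, ppm=5)
```

Keeping the m/z values sorted makes each lookup a binary search, which matters once the index holds millions of fragments.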

The procedure used to elucidate this benchmark compound is described below. The first step was the creation of the database. As the set of compounds, we used 500,000 structures from PubChem. These compounds were fragmented using in-house code, generating a final set of 6 million independent fragments. Next, the algorithm was fed the required parameters: the m/z of the unknown structure, the tolerance on this m/z value, the ion mode, the adduct type and the MS/MS data. In this case study, the m/z of the unknown compound is 362.0703, acquired in positive ion mode with adduct type [M+H]+. The tolerance was set to 3 ppm because of the quality of the acquisition.
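As a worked illustration of these input parameters, the sketch below converts the 3 ppm tolerance into an absolute m/z window and derives the neutral monoisotopic mass from the [M+H]+ value. The function names and the adduct table are assumptions made for this example, and the arithmetic covers singly charged ions only.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton

# Hypothetical adduct table; only [M+H]+ is used in the case study.
ADDUCT_SHIFT = {"[M+H]+": PROTON_MASS, "[M-H]-": -PROTON_MASS}


def mz_window(mz, ppm):
    """Return the (low, high) m/z bounds for a tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def neutral_mass(mz, adduct):
    """Neutral monoisotopic mass of a singly charged ion."""
    return mz - ADDUCT_SHIFT[adduct]


low, high = mz_window(362.0703, 3)       # window of about +/- 0.0011 around 362.0703
mass = neutral_mass(362.0703, "[M+H]+")  # about 361.0630 Da
```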

The original MS/MS input contains a total of 18 peaks, reduced to 9 after removing isotopes. Querying the m/z values of the input retrieves a total of 6492 fragments from the database. Their rational combination yields up to 6500 solutions. To reduce the number of solutions, each one was fragmented and then compared with the original MS/MS data. Close to 2000 results match the 9 peaks of the original spectrum.
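The comparison between candidate fragments and observed peaks can be sketched as a simple peak-matching count. The abstract does not specify the scoring used, so the matching rule, the function names and the example values below are all assumptions made for illustration.

```python
def within_ppm(observed, predicted, ppm):
    """True when the observed m/z agrees with the predicted one within ppm."""
    return abs(observed - predicted) <= predicted * ppm / 1e6


def matched_peaks(observed_peaks, predicted_fragments, ppm):
    """Return the observed peaks explained by at least one predicted fragment."""
    return [
        peak for peak in observed_peaks
        if any(within_ppm(peak, frag, ppm) for frag in predicted_fragments)
    ]


def filter_candidates(observed_peaks, candidates, ppm, min_matches):
    """Keep candidates explaining at least min_matches peaks, best first."""
    scored = []
    for name, fragments in candidates.items():
        n = len(matched_peaks(observed_peaks, fragments, ppm))
        if n >= min_matches:
            scored.append((n, name))
    # Most matched peaks first; ties broken alphabetically by candidate name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, n) for n, name in scored]


# Illustrative values only; they are not taken from the abstract.
observed = [100.0000, 150.0000, 200.0000]
candidates = {
    "cand-1": [100.0002, 150.0001, 199.9999],
    "cand-2": [100.0002, 175.0000],
    "cand-3": [120.0000, 130.0000],
}
ranking = filter_candidates(observed, candidates, ppm=5, min_matches=1)
```

Setting `min_matches` to the number of deisotoped peaks would keep only candidates that explain every peak, which is one way to read the "match the 9 peaks" criterion.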

Then, to obtain a more accurate result, both the 1H and 13C NMR spectra of the best-matching structures were predicted by an in-house program and later compared with the NMR data of the unknown structure. In our case, the 1H, 13C and COSY NMR spectra of the unknown placed the structure of 10P-909 in the first position of the ranking.

Novel Aspect

This algorithm operates with real fragments instead of using existing MS/MS spectra to predict the structure of the unknown compound.
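The NMR-based ranking can be pictured with a deliberately naive sketch. The abstract does not describe how predicted and experimental spectra are compared, so the metric here (mean absolute deviation between sorted chemical-shift lists) is an assumption, as are the function names and the shift values.

```python
def shift_deviation(experimental, predicted):
    """Mean absolute deviation (chemical-shift ppm) between sorted shift lists.

    Returns None when the lists are empty or differ in length, because this
    naive pairing cannot handle missing or overlapping signals.
    """
    if not experimental or len(experimental) != len(predicted):
        return None
    pairs = zip(sorted(experimental), sorted(predicted))
    return sum(abs(e - p) for e, p in pairs) / len(experimental)


def rank_by_nmr(experimental, predictions):
    """Return candidate names ordered from smallest to largest deviation."""
    scored = []
    for name, predicted in predictions.items():
        deviation = shift_deviation(experimental, predicted)
        if deviation is not None:
            scored.append((deviation, name))
    return [name for _, name in sorted(scored)]


# Illustrative 13C shifts only; they are not taken from the abstract.
experimental_13c = [20.0, 55.0, 130.0]
predictions = {
    "cand-1": [21.0, 54.0, 131.0],
    "cand-2": [30.0, 60.0, 120.0],
    "cand-3": [20.0, 55.0],  # wrong number of signals, skipped
}
order = rank_by_nmr(experimental_13c, predictions)
```

A production comparison would also need to handle signal overlap, multiplicities and 2D correlations such as COSY, none of which this sketch attempts.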

Towards an automatic structure elucidation process in various chemical workflows by LC-HRMS and NMR data analysis

254th ACS National Meeting, Washington DC (United States of America) 20 August 2017 

Abstract

MassChemSite (Molecular Discovery, Ltd. UK) is a novel vendor-agnostic software package that automates peak finding and structure elucidation from LC-HRMS analytical data obtained from chemical reaction samples, speeding up this task. MassChemSite can identify the reactants and products in a sample based on MS and MS/MS information and the chemical reactions under consideration.

Two new features were recently added to MassChemSite. The first is a method to elucidate unknown structures from unassigned LC-HRMS peaks (i.e., when the m/z found cannot be obtained by combining the reactions used as input). First, m/z values from the input MS/MS spectrum are queried in a database of compound fragments (built from user compounds, or previously built and provided). Then, a set of candidate compounds is built by rational combination of the fragments found in the database.
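The candidate-building step can be illustrated with a mass-only simplification. The actual "rational combination" presumably takes chemistry and connectivity into account, which the abstract does not detail; the sketch below only finds pairs of fragments whose masses add up to a target mass within a ppm tolerance, ignoring any atoms gained or lost when fragments are joined. The function name and the masses are hypothetical.

```python
from bisect import bisect_left, bisect_right


def pair_candidates(fragment_masses, target_mass, ppm):
    """Return index pairs (i, j), i < j, whose masses sum to target_mass within ppm."""
    tol = target_mass * ppm / 1e6
    # Sort fragment indices by mass so partners can be found by binary search.
    order = sorted(range(len(fragment_masses)), key=lambda i: fragment_masses[i])
    masses = [fragment_masses[i] for i in order]
    pairs = []
    for pos, mass in enumerate(masses):
        need = target_mass - mass
        # Only look at heavier-or-equal partners after pos to avoid duplicates.
        start = max(bisect_left(masses, need - tol), pos + 1)
        stop = bisect_right(masses, need + tol)
        for other in range(start, stop):
            pairs.append(tuple(sorted((order[pos], order[other]))))
    return sorted(pairs)


# Illustrative masses only; they are not real fragment data.
fragments = [100.0000, 261.0630, 150.5000, 210.5630]
pairs = pair_candidates(fragments, target_mass=361.0630, ppm=3)
```

Each returned pair would then be a starting point for assembling a candidate structure, which is subsequently fragmented and checked against the spectrum.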

Finally, all the candidates are fragmented and compared with the peaks of the input MS/MS spectrum. The second feature is an algorithm that uses different NMR acquisition methodologies to discriminate between multiple solutions, for example when LC-HRMS analysis cannot provide a unique structural solution, or to further refine the results of the unknown structure elucidation method. Chemists can load a processed NMR experiment file directly into the program or enter the NMR data by hand. The experimental data (1D or 2D experiments) are compared with the data predicted for the structures proposed by the LC-HRMS analysis, and only those solutions whose predicted and experimental NMR data match are kept.

Enabling Efficient Late‐Stage Functionalization of Drug‐Like Molecules with LC‐MS and Reaction‐Driven Data Processing

European Journal of Organic Chemistry, 2017

Huifang Yao, Yong Liu, Sriram Tyagarajan, Eric Streckfuss, Mikhail Reibarkh, Kuanchang Chen, Ismael Zamora, Fabien Fontaine, Laura Goracci, Roy Helmy, Kevin P. Bateman, Shane W. Krska

Abstract

Late-stage functionalization (LSF) through C-H functionalization of drug leads is a powerful synthetic strategy for drug discovery. A key challenge in LSF is that multiple regioisomeric products are often generated, which requires slow and laborious product isolation and structure confirmation steps. To address this, an analytical approach using LC-HRMS/MS coupled with automated chemically aware data processing was developed. Using this method to analyze reaction screening arrays based on three common C-H functionalization chemistries with a set of marketed drugs, the relative amount and localization of chemical modification could be determined for each regioisomeric product generated in the screening.

This approach allows one to construct a workflow in which the various regioisomeric products of a given transformation are triaged according to their site of modification, allowing downstream isolation and structure elucidation efforts to focus on those analogues of highest interest, leading to an overall increase in productivity of the LSF strategy. 

A case study of the MassChemSite Reaction Tracking Workflow: Detecting and identifying byproducts during PROTAC synthesis

68th ASMS Conference on Mass Spectrometry and Allied Topics Reboot. Online. June 2020

Abstract

PROTACs are heterobifunctional small molecules composed of a ligand for a protein of interest (POI) and an E3 ligase recruiter connected through a linker. Instead of inhibiting protein function, PROTACs promote the formation of a ternary complex with the POI and the E3 ligase, inducing poly-ubiquitylation of the POI and its subsequent proteasome-dependent degradation.

This appealing technology has already attracted great attention from both academia and industry, and optimization of PROTAC synthetic procedures is now needed. As an example, in this poster we present the use of the Reaction Tracking workflow included in MassChemSite to automatically find byproducts formed during PROTAC synthesis. This workflow is designed for untargeted multicomponent reactions.

Structural elucidation tools to enhance organic synthesis productivity

66th ASMS Conference on Mass Spectrometry and Allied Topics, San Diego (United States of America) … 06 June 2018 

Abstract

The majority of organic synthesis workflows end with the synthesis of at least a few milligrams of pure compound, whose structure is corroborated by Nuclear Magnetic Resonance spectroscopy. This requires using relatively large quantities of starting materials and purifying the crude reaction mixture before knowing whether the desired compound has been obtained. The chemist uses LC-MS prior to purification to check whether a peak with the expected mass was formed. Nowadays, there are mass spectrometry techniques that, with the aid of computational algorithms, can determine whether the desired compound was obtained and whether other interesting compounds were formed, using a minimal amount of sample and without the need for purification, making the synthetic process more time- and cost-effective.