Generating Molecular Fragmentation Graphs with Autoregressive Neural Networks
Author(s)
Goldman, Samuel; Li, Janet; Coley, Connor W
Terms of use
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Abstract
The accurate prediction of tandem mass spectra from molecular structures has the potential to unlock new metabolomic discoveries by augmenting the community's libraries of experimental reference standards. Cheminformatic spectrum prediction strategies use a "bond-breaking" framework to iteratively simulate mass spectrum fragmentations, but these methods are (a) slow due to the need to exhaustively and combinatorially break molecules and (b) inaccurate as they often rely upon heuristics to predict the intensity of each resulting fragment; neural network alternatives mitigate computational cost but are black-box and not inherently more accurate. We introduce a physically grounded neural approach that learns to predict each breakage event and score the most relevant subset of molecular fragments quickly and accurately. We evaluate our model by predicting spectra from both public and private standard libraries, demonstrating that our hybrid approach offers state-of-the-art prediction accuracy, improved metabolite identification from a database of candidates, and higher interpretability when compared to previous breakage methods and black-box neural networks. The grounding of our approach in physical fragmentation events shows especially great promise for elucidating natural product molecules with more complex scaffolds.
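To make the cost of the exhaustive "bond-breaking" framework concrete, the following is a minimal sketch on a toy molecular graph. It is not the authors' model or any published tool: the function names (`connected_components`, `enumerate_fragments`), the integer atom labels, and the five-atom chain are all assumptions chosen for illustration. It removes every combination of up to `max_breaks` bonds and collects the distinct connected fragments, which is the kind of combinatorial enumeration the abstract describes as slow.

```python
from itertools import combinations


def connected_components(atoms, bonds):
    """Return the connected components of a graph as frozensets of atoms."""
    adjacency = {atom: set() for atom in atoms}
    for u, v in bonds:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in atoms:
        if start in seen:
            continue
        # Depth-first search from an atom not yet assigned to a component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(frozenset(component))
    return components


def enumerate_fragments(atoms, bonds, max_breaks):
    """Break every combination of up to max_breaks bonds (hypothetical helper).

    Returns the set of distinct fragments, each a frozenset of atom labels.
    """
    fragments = {frozenset(atoms)}  # the intact molecule counts as a fragment
    for k in range(1, max_breaks + 1):
        for broken in combinations(bonds, k):
            remaining = [bond for bond in bonds if bond not in broken]
            fragments.update(connected_components(atoms, remaining))
    return fragments


# Toy example (assumed, not from the paper): a linear chain of five atoms.
atoms = [0, 1, 2, 3, 4]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]

# Number of distinct fragments reachable with at most k bond breaks.
counts = {k: len(enumerate_fragments(atoms, bonds, k)) for k in range(4)}
print(counts)
```

The number of bond subsets examined grows combinatorially with the number of bonds and the allowed break depth, even when many subsets yield fragments already seen. A learned approach that predicts which breakage events matter, as the abstract proposes, avoids visiting most of these subsets.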
Date issued
2024-02-27
Department
Massachusetts Institute of Technology. Computational and Systems Biology Program; Massachusetts Institute of Technology. Department of Chemical Engineering
Journal
Analytical Chemistry
Publisher
American Chemical Society
Citation
Samuel Goldman, Janet Li, and Connor W. Coley. Analytical Chemistry 2024 96 (8), 3419-3428.
Version: Author's final manuscript