Show simple item record

dc.contributor.author: Goldman, Samuel
dc.contributor.author: Xin, Jiayi
dc.contributor.author: Provenzano, Joules
dc.contributor.author: Coley, Connor W
dc.date.accessioned: 2026-04-14T18:56:42Z
dc.date.available: 2026-04-14T18:56:42Z
dc.date.issued: 2023-09-19
dc.identifier.uri: https://hdl.handle.net/1721.1/165432
dc.description.abstract: Chemical formula annotation for tandem mass spectrometry (MS/MS) data is the first step toward structurally elucidating unknown metabolites. While great strides have been made toward solving this problem, the current state-of-the-art method depends on time-intensive, proprietary, and expert-parametrized fragmentation tree construction and scoring. In this work, we extend our previous spectrum Transformer methodology into an energy-based modeling framework, MIST-CF: Metabolite Inference with Spectrum Transformers for Chemical Formula prediction, for learning to rank chemical formula and adduct assignments given an unannotated MS/MS spectrum. Importantly, MIST-CF learns in a data-dependent fashion using a Formula Transformer neural network architecture and circumvents the need for fragmentation tree construction. We train and evaluate our model on a large open-access database, showing an absolute improvement of 10% top 1 accuracy over other neural network architectures. We further validate our approach on the CASMI2022 challenge data set, achieving nearly equivalent performance to the winning entry within the positive mode category without any manual curation or postprocessing of our results. These results demonstrate an exciting strategy to more powerfully leverage MS2 fragment peaks for predicting MS1 precursor chemical formulas with data-driven learning. [en_US]
dc.language.iso: en
dc.publisher: American Chemical Society [en_US]
dc.relation.isversionof: 10.1021/acs.jcim.3c01082 [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: author [en_US]
dc.title: MIST-CF: Chemical Formula Inference from Tandem Mass Spectra [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Goldman, Samuel, Xin, Jiayi, Provenzano, Joules and Coley, Connor W. 2023. "MIST-CF: Chemical Formula Inference from Tandem Mass Spectra." Journal of Chemical Information and Modeling, 64 (7).
dc.contributor.department: Massachusetts Institute of Technology. Computational and Systems Biology Program [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Chemical Engineering [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.relation.journal: Journal of Chemical Information and Modeling [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2026-04-14T18:35:13Z
dspace.orderedauthors: Goldman, S; Xin, J; Provenzano, J; Coley, CW [en_US]
dspace.date.submission: 2026-04-14T18:35:15Z
mit.journal.volume: 64 [en_US]
mit.journal.issue: 7 [en_US]
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]


