
dc.contributor.author: Mo, Yiming
dc.contributor.author: Guan, Yanfei
dc.contributor.author: Verma, Pritha
dc.contributor.author: Guo, Jiang
dc.contributor.author: Fortunato, Mike E
dc.contributor.author: Lu, Zhaohong
dc.contributor.author: Coley, Connor W
dc.contributor.author: Jensen, Klavs F
dc.date.accessioned: 2021-10-27T19:52:02Z
dc.date.available: 2021-10-27T19:52:02Z
dc.date.issued: 2021
dc.identifier.uri: https://hdl.handle.net/1721.1/133304
dc.description.abstract: With recent advances in the computer-aided synthesis planning (CASP) powered by data science and machine learning, modern CASP programs can rapidly identify thousands of potential pathways for a given target molecule. However, the lack of a holistic pathway evaluation mechanism makes it challenging to systematically prioritize strategic pathways except for using some simple heuristics. Herein, we introduce a data-driven approach to evaluate the relative strategic levels of retrosynthesis pathways using a dynamic tree-structured long short-term memory (tree-LSTM) model. We first curated a retrosynthesis pathway database, containing 238k patent-extracted pathways along with ∼55 M artificial pathways generated from an open-source CASP program, ASKCOS. The tree-LSTM model was trained to differentiate patent-extracted and artificial pathways with the same target molecule in order to learn the strategic relationship among single-step reactions within the patent-extracted pathways. The model achieved a top-1 ranking accuracy of 79.1% to recognize patent-extracted pathways. In addition, the trained tree-LSTM model learned to encode pathway-level information into a representative latent vector, which can facilitate clustering similar pathways to help illustrate strategically diverse pathways generated from CASP programs.
dc.language.iso: en
dc.publisher: Royal Society of Chemistry (RSC)
dc.relation.isversionof: 10.1039/d0sc05078d
dc.rights: Creative Commons Attribution 3.0 unported license
dc.rights.uri: https://creativecommons.org/licenses/by/3.0/
dc.source: Royal Society of Chemistry (RSC)
dc.title: Evaluating and clustering retrosynthesis pathways with learned strategy
dc.type: Article
dc.contributor.department: Massachusetts Institute of Technology. Department of Chemical Engineering
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.contributor.department: Massachusetts Institute of Technology. Department of Chemistry
dc.relation.journal: Chemical Science
dc.eprint.version: Final published version
dc.type.uri: http://purl.org/eprint/type/JournalArticle
eprint.status: http://purl.org/eprint/status/PeerReviewed
dc.date.updated: 2021-06-09T16:40:36Z
dspace.orderedauthors: Mo, Y; Guan, Y; Verma, P; Guo, J; Fortunato, ME; Lu, Z; Coley, CW; Jensen, KF
dspace.date.submission: 2021-06-09T16:40:37Z
mit.journal.volume: 12
mit.journal.issue: 4
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed
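
The abstract reports a top-1 ranking accuracy for recognizing patent-extracted pathways among pathways that share a target molecule. The sketch below shows one plausible reading of that metric, assumed here rather than taken from the paper: for each target, the highest-scored pathway is checked, and the accuracy is the fraction of targets whose top pathway is patent-extracted. The function name, the record layout (target, score, is_patent), and the tie-breaking rule are all hypothetical.

```python
def top1_ranking_accuracy(records):
    """Fraction of targets whose highest-scored pathway is patent-extracted.

    records: iterable of (target, score, is_patent) tuples. This layout is
    an assumption made for illustration, not the paper's data format.
    """
    best = {}  # target -> (best score seen so far, is_patent of that pathway)
    for target, score, is_patent in records:
        # Strict ">" means the first pathway seen wins a tie (assumed rule).
        if target not in best or score > best[target][0]:
            best[target] = (score, is_patent)
    if not best:
        raise ValueError("no pathways supplied")
    hits = sum(1 for _, is_patent in best.values() if is_patent)
    return hits / len(best)


# Made-up scores for two hypothetical target molecules.
example = [
    ("target_A", 0.9, True),   # patent-extracted pathway ranked first
    ("target_A", 0.4, False),
    ("target_B", 0.2, True),
    ("target_B", 0.7, False),  # artificial pathway ranked first
]
accuracy = top1_ranking_accuracy(example)
```

In this toy input the patent-extracted pathway is ranked first for one of the two targets, so the computed accuracy is 0.5.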

