Evaluating and clustering retrosynthesis pathways with learned strategy
Author(s)
Mo, Yiming; Guan, Yanfei; Verma, Pritha; Guo, Jiang; Fortunato, Mike E; Lu, Zhaohong; Coley, Connor W; Jensen, Klavs F; ... Show more Show less
DownloadPublished version (1.900Mb)
Publisher with Creative Commons License
Publisher with Creative Commons License
Creative Commons Attribution
Terms of use
Metadata
Show full item recordAbstract
With recent advances in the computer-aided synthesis planning (CASP) powered by data science and machine learning, modern CASP programs can rapidly identify thousands of potential pathways for a given target molecule. However, the lack of a holistic pathway evaluation mechanism makes it challenging to systematically prioritize strategic pathways except for using some simple heuristics. Herein, we introduce a data-driven approach to evaluate the relative strategic levels of retrosynthesis pathways using a dynamic tree-structured long short-term memory (tree-LSTM) model. We first curated a retrosynthesis pathway database, containing 238k patent-extracted pathways along with ∼55 M artificial pathways generated from an open-source CASP program, ASKCOS. The tree-LSTM model was trained to differentiate patent-extracted and artificial pathways with the same target molecule in order to learn the strategic relationship among single-step reactions within the patent-extracted pathways. The model achieved a top-1 ranking accuracy of 79.1% to recognize patent-extracted pathways. In addition, the trained tree-LSTM model learned to encode pathway-level information into a representative latent vector, which can facilitate clustering similar pathways to help illustrate strategically diverse pathways generated from CASP programs.
Date issued
2021Department
Massachusetts Institute of Technology. Department of Chemical Engineering; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of ChemistryJournal
Chemical Science
Publisher
Royal Society of Chemistry (RSC)