dc.contributor.author | Mo, Yiming | |
dc.contributor.author | Guan, Yanfei | |
dc.contributor.author | Verma, Pritha | |
dc.contributor.author | Guo, Jiang | |
dc.contributor.author | Fortunato, Mike E | |
dc.contributor.author | Lu, Zhaohong | |
dc.contributor.author | Coley, Connor W | |
dc.contributor.author | Jensen, Klavs F | |
dc.date.accessioned | 2021-10-27T19:52:02Z | |
dc.date.available | 2021-10-27T19:52:02Z | |
dc.date.issued | 2021 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/133304 | |
dc.description.abstract | With recent advances in computer-aided synthesis planning (CASP) powered by data science and machine learning, modern CASP programs can rapidly identify thousands of potential pathways for a given target molecule. However, the lack of a holistic pathway evaluation mechanism makes it challenging to systematically prioritize strategic pathways beyond applying simple heuristics. Herein, we introduce a data-driven approach to evaluate the relative strategic levels of retrosynthesis pathways using a dynamic tree-structured long short-term memory (tree-LSTM) model. We first curated a retrosynthesis pathway database containing 238k patent-extracted pathways along with ∼55 M artificial pathways generated from an open-source CASP program, ASKCOS. The tree-LSTM model was trained to differentiate patent-extracted and artificial pathways with the same target molecule in order to learn the strategic relationship among single-step reactions within the patent-extracted pathways. The model achieved a top-1 ranking accuracy of 79.1% in recognizing patent-extracted pathways. In addition, the trained tree-LSTM model learned to encode pathway-level information into a representative latent vector, which can facilitate clustering similar pathways to help illustrate strategically diverse pathways generated from CASP programs. | |
dc.language.iso | en | |
dc.publisher | Royal Society of Chemistry (RSC) | |
dc.relation.isversionof | 10.1039/d0sc05078d | |
dc.rights | Creative Commons Attribution 3.0 Unported License | |
dc.rights.uri | https://creativecommons.org/licenses/by/3.0/ | |
dc.source | Royal Society of Chemistry (RSC) | |
dc.title | Evaluating and clustering retrosynthesis pathways with learned strategy | |
dc.type | Article | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Chemical Engineering | |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Chemistry | |
dc.relation.journal | Chemical Science | |
dc.eprint.version | Final published version | |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | |
eprint.status | http://purl.org/eprint/status/PeerReviewed | |
dc.date.updated | 2021-06-09T16:40:36Z | |
dspace.orderedauthors | Mo, Y; Guan, Y; Verma, P; Guo, J; Fortunato, ME; Lu, Z; Coley, CW; Jensen, KF | |
dspace.date.submission | 2021-06-09T16:40:37Z | |
mit.journal.volume | 12 | |
mit.journal.issue | 4 | |
mit.license | PUBLISHER_CC | |
mit.metadata.status | Authority Work and Publication Information Needed | |