dc.contributor.author | Qian, Yujie | |
dc.contributor.author | Guo, Jiang | |
dc.contributor.author | Tu, Zhengkai | |
dc.contributor.author | Coley, Connor W | |
dc.contributor.author | Barzilay, Regina | |
dc.date.accessioned | 2025-02-07T20:19:07Z | |
dc.date.available | 2025-02-07T20:19:07Z | |
dc.date.issued | 2023-07-10 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/158184 | |
dc.description.abstract | Reaction diagram parsing is the task of extracting reaction schemes from a diagram in the chemistry literature. The reaction diagrams can be arbitrarily complex; thus, robustly parsing them into structured data is an open challenge. In this paper, we present RxnScribe, a machine learning model for parsing reaction diagrams of varying styles. We formulate this structured prediction task with a sequence generation approach, which condenses the traditional pipeline into an end-to-end model. We train RxnScribe on a dataset of 1378 diagrams and evaluate it with cross-validation, achieving an 80.0% soft-match F1 score, with significant improvements over previous models. Our code and data are publicly available at https://github.com/thomas0809/RxnScribe. | en_US |
dc.language.iso | en | |
dc.publisher | American Chemical Society (ACS) | en_US |
dc.relation.isversionof | 10.1021/acs.jcim.3c00439 | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-ShareAlike | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
dc.source | arXiv | en_US |
dc.title | RxnScribe: A Sequence Generation Model for Reaction Diagram Parsing | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Yujie Qian, Jiang Guo, Zhengkai Tu, Connor W. Coley, and Regina Barzilay. Journal of Chemical Information and Modeling 2023 63 (13), 4030-4041. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Chemical Engineering | en_US |
dc.relation.journal | Journal of Chemical Information and Modeling | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dc.date.updated | 2025-02-07T20:13:40Z | |
dspace.orderedauthors | Qian, Y; Guo, J; Tu, Z; Coley, CW; Barzilay, R | en_US |
dspace.date.submission | 2025-02-07T20:13:41Z | |
mit.journal.volume | 63 | en_US |
mit.journal.issue | 13 | en_US |
mit.license | OPEN_ACCESS_POLICY | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |