Show simple item record

dc.contributor.advisor: Barzilay, Regina
dc.contributor.author: Wang, Alex
dc.date.accessioned: 2024-10-09T18:27:17Z
dc.date.available: 2024-10-09T18:27:17Z
dc.date.issued: 2024-09
dc.date.submitted: 2024-10-07T14:34:34.218Z
dc.identifier.uri: https://hdl.handle.net/1721.1/157191
dc.description.abstract: Automated extraction of structured information from chemistry literature is vital for maintaining up-to-date databases for use in data-driven chemistry. However, comprehensive extraction requires reasoning across multiple modalities and the flexibility to generalize across different styles of articles. Our work on OpenChemIE presents a multimodal system that reasons across text, tables, and figures to parse reaction data. In particular, our system is able to infer structures in substrate scope diagrams and to align reactions with their metadata defined elsewhere in an article. In addition, we explore the chemistry information extraction potential of Vision Language Models (VLMs), which allow powerful large language models to leverage visual understanding. Our findings indicate that VLMs still require additional work to match the performance of our bespoke models.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Deep Learning Multimodal Extraction of Reaction Data
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science


