Show simple item record

dc.contributor.advisor: Barzilay, Regina
dc.contributor.author: Wang, Alex
dc.date.accessioned: 2024-10-09T18:27:17Z
dc.date.available: 2024-10-09T18:27:17Z
dc.date.issued: 2024-09
dc.date.submitted: 2024-10-07T14:34:34.218Z
dc.identifier.uri: https://hdl.handle.net/1721.1/157191
dc.description.abstract: Automated extraction of structured information from chemistry literature is vital for maintaining up-to-date databases for use in data-driven chemistry. However, comprehensive extraction requires reasoning across multiple modalities and the flexibility to generalize across different styles of articles. Our work on OpenChemIE presents a multimodal system that reasons across text, tables, and figures to parse reaction data. In particular, our system is able to infer structures in substrate scope diagrams and to align reactions with their metadata defined elsewhere in an article. In addition, we explore the chemistry information extraction potential of Vision Language Models (VLMs), which allow powerful large language models to leverage visual understanding. Our findings indicate that VLMs still require additional work to match the performance of our bespoke models.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Deep Learning Multimodal Extraction of Reaction Data
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science


