Are learned molecular representations ready for prime time?
Author(s)
Yang, Kevin, M. Eng. Massachusetts Institute of Technology.
Download: 1127567158-MIT.pdf (3.500Mb)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Regina Barzilay.
Abstract
Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors, and graph convolutional neural networks that construct a learned molecular representation by operating on the molecular graph. In this paper, I benchmark models extensively on 15 proprietary industrial datasets spanning a wide variety of chemical endpoints. In addition, I introduce a graph convolutional model that consistently outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary datasets. My empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, the proposed model nevertheless offers significant improvements over models currently used in industrial workflows. In addition, I demonstrate that similar models show promise in the molecular generation setting.
Description
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 65-69).
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.