Analyzing Learned Molecular Representations for Property Prediction

Yang, Kevin; Swanson, Kyle; Jin, Wengong; Coley, Connor; Eiden, Philipp; Gao, Hua; Guzman-Perez, Angel; Hopper, Timothy; Kelley, Brian; Mathea, Miriam; Palmer, Andrew; Settels, Volker; Jaakkola, Tommi; Jensen, Klavs; Barzilay, Regina

dc.contributor.author	Yang, Kevin
dc.contributor.author	Swanson, Kyle
dc.contributor.author	Jin, Wengong
dc.contributor.author	Coley, Connor
dc.contributor.author	Eiden, Philipp
dc.contributor.author	Gao, Hua
dc.contributor.author	Guzman-Perez, Angel
dc.contributor.author	Hopper, Timothy
dc.contributor.author	Kelley, Brian
dc.contributor.author	Mathea, Miriam
dc.contributor.author	Palmer, Andrew
dc.contributor.author	Settels, Volker
dc.contributor.author	Jaakkola, Tommi
dc.contributor.author	Jensen, Klavs
dc.contributor.author	Barzilay, Regina
dc.date.accessioned	2021-10-27T20:05:52Z
dc.date.available	2021-10-27T20:05:52Z
dc.date.issued	2019
dc.identifier.uri	https://hdl.handle.net/1721.1/134630
dc.description.abstract	© 2019 American Chemical Society. Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely examined these new models in industry research settings in comparison to existing employed models. In this paper, we benchmark models extensively on 19 public and 16 proprietary industrial data sets spanning a wide variety of chemical end points. In addition, we introduce a graph convolutional model that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary data sets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.
dc.language.iso	en
dc.publisher	American Chemical Society (ACS)
dc.relation.isversionof	10.1021/acs.jcim.9b00237
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivs License
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source	ACS
dc.title	Analyzing Learned Molecular Representations for Property Prediction
dc.type	Article
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal	Journal of Chemical Information and Modeling
dc.eprint.version	Author's final manuscript
dc.type.uri	http://purl.org/eprint/type/JournalArticle
eprint.status	http://purl.org/eprint/status/PeerReviewed
dc.date.updated	2019-08-22T13:08:28Z
dspace.orderedauthors	Yang, K; Swanson, K; Jin, W; Coley, C; Eiden, P; Gao, H; Guzman-Perez, A; Hopper, T; Kelley, B; Mathea, M; Palmer, A; Settels, V; Jaakkola, T; Jensen, K; Barzilay, R
dspace.date.submission	2019-08-22T13:08:30Z
mit.journal.volume	59
mit.journal.issue	8
mit.license	PUBLISHER_POLICY
mit.metadata.status	Authority Work and Publication Information Needed

Files in this item

Name: acs.jcim.9b00237.pdf
Size: 2.976Mb
Format: PDF
Description: Accepted version

This item appears in the following Collection(s)

MIT Open Access Articles
