Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction
Author(s)
Coley, Connor Wilson; Barzilay, Regina; Green Jr, William H; Jaakkola, Tommi S; Jensen, Klavs F
Publisher Policy
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
The task of learning an expressive molecular representation is central to developing quantitative structure–activity and property relationships. Traditional approaches rely on group additivity rules, empirical measurements or parameters, or generation of thousands of descriptors. In this paper, we employ a convolutional neural network for this embedding task by treating molecules as undirected graphs with attributed nodes and edges. Simple atom and bond attributes are used to construct atom-specific feature vectors that take into account the local chemical environment using different neighborhood radii. By working directly with the full molecular graph, there is a greater opportunity for models to identify important features relevant to a prediction task. Unlike other graph-based approaches, our atom featurization preserves molecule-level spatial information that significantly enhances model performance. Our models learn to identify important features of atom clusters for the prediction of aqueous solubility, octanol solubility, melting point, and toxicity. Extensions and limitations of this strategy are discussed.
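The idea of building atom-specific vectors from the local environment at several neighborhood radii can be illustrated with a minimal sketch. This is not the authors' model: it uses no learned weights or nonlinearities, and the feature choices (atomic number, attached hydrogen count, bond order), the toy molecule, and all function names are assumptions made purely for illustration.

```python
from collections import defaultdict

def neighbor_lists(bonds):
    """Map each atom index to [(neighbor index, bond weight), ...] (undirected)."""
    nbrs = defaultdict(list)
    for (a, b), w in bonds.items():
        nbrs[a].append((b, w))
        nbrs[b].append((a, w))
    return nbrs

def atom_vectors_by_radius(atoms, bonds, max_radius):
    """Return one {atom index: vector} dict per radius 0..max_radius.

    Radius 0 holds the raw atom attributes. Each further radius adds the
    bond-weighted vectors of directly bonded neighbors from the previous
    radius, so information spreads one bond further per step.
    """
    nbrs = neighbor_lists(bonds)
    layers = [{i: list(f) for i, f in atoms.items()}]
    for _ in range(max_radius):
        prev = layers[-1]
        nxt = {}
        for i, own in prev.items():
            vec = list(own)
            for j, w in nbrs[i]:
                vec = [v + w * u for v, u in zip(vec, prev[j])]
            nxt[i] = vec
        layers.append(nxt)
    return layers

def molecule_vector(layers):
    """Sum atom vectors over all atoms and all radii into one molecule-level vector."""
    dim = len(next(iter(layers[0].values())))
    total = [0.0] * dim
    for layer in layers:
        for vec in layer.values():
            total = [t + v for t, v in zip(total, vec)]
    return total

# Hypothetical toy input: heavy atoms of ethanol (C-C-O).
# Atom attributes: [atomic number, attached hydrogens]; bond attribute: bond order.
atoms = {0: [6.0, 3.0], 1: [6.0, 2.0], 2: [8.0, 1.0]}
bonds = {(0, 1): 1.0, (1, 2): 1.0}

layers = atom_vectors_by_radius(atoms, bonds, max_radius=1)
fingerprint = molecule_vector(layers)
```

In a trainable version of this sketch, the plain sums would be replaced by learned transformations at each radius, and the resulting molecule-level vector would feed a regression or classification layer for the target property.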
Date issued
2017-07
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Chemical Engineering
Journal
Journal of Chemical Information and Modeling
Publisher
American Chemical Society (ACS)
Citation
Coley, Connor W., et al. “Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction.” Journal of Chemical Information and Modeling 57, 8 (July 2017): 1757–1772 © 2017 American Chemical Society
Version: Author's final manuscript
ISSN
1549-9596
1549-960X