Generating molecules with optimized aqueous solubility using iterative graph translation

Bilodeau, Camille; Jin, Wengong; Xu, Hongyun; Emerson, Jillian A; Mukhopadhyay, Sukrit; Kalantar, Thomas H; Jaakkola, Tommi; Barzilay, Regina; Jensen, Klavs F

Notice

This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/139660.2

dc.contributor.author	Bilodeau, Camille
dc.contributor.author	Jin, Wengong
dc.contributor.author	Xu, Hongyun
dc.contributor.author	Emerson, Jillian A
dc.contributor.author	Mukhopadhyay, Sukrit
dc.contributor.author	Kalantar, Thomas H
dc.contributor.author	Jaakkola, Tommi
dc.contributor.author	Barzilay, Regina
dc.contributor.author	Jensen, Klavs F
dc.date.accessioned	2022-01-24T14:09:17Z
dc.date.available	2022-01-24T14:09:17Z
dc.date.issued	2021-11-15
dc.identifier.uri	https://hdl.handle.net/1721.1/139660
dc.description.abstract	While molecular discovery is critical for solving many scientific problems, the time and resource costs of experiments make it intractable to fully explore chemical space. Here, we present a generative modeling framework that proposes novel molecules that are 1) based on starting candidate structures and 2) optimized with respect to one or more objectives or constraints. We explore how this framework performs in an applied setting by focusing on the problem of optimizing molecules for aqueous solubility, using an experimental database containing data curated from the literature. The resulting model was capable of improving molecules with a range of starting solubilities. When synthetic feasibility was applied as a secondary optimization constraint (estimated using a combination of synthetic accessibility and retrosynthetic accessibility scores), the model generated synthetically feasible molecules 83.0% of the time (compared with 59.9% of the time without the constraint). To validate model performance experimentally, a set of candidate molecules was translated using the model and the solubilities of the candidate and generated molecules were verified experimentally. We additionally validated model performance via experimental measurements by holding out the top 100 most soluble molecules during training and showing that the model could rediscover 33 of those molecules. To determine the sensitivity of model performance to dataset size, we trained the model on different subsets of the initial training dataset. We found that model performance did not decrease significantly when the model was trained on a random 50% subset of the training data but did decrease when the model was trained on subsets containing only less soluble molecules (i.e., the bottom 50%). Overall, this framework serves as a tool for generating optimized, synthetically feasible molecules that can be applied to a range of problems in chemistry and chemical engineering.	en_US
dc.language.iso	en
dc.publisher	Royal Society of Chemistry (RSC)	en_US
dc.relation.isversionof	10.1039/d1re00315a	en_US
dc.rights	Creative Commons Attribution 3.0 unported license	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/3.0/	en_US
dc.source	Royal Society of Chemistry (RSC)	en_US
dc.title	Generating molecules with optimized aqueous solubility using iterative graph translation	en_US
dc.type	Article	en_US
dc.identifier.citation	Bilodeau, Camille, Jin, Wengong, Xu, Hongyun, Emerson, Jillian A, Mukhopadhyay, Sukrit et al. 2021. "Generating molecules with optimized aqueous solubility using iterative graph translation." Reaction Chemistry & Engineering.
dc.relation.journal	Reaction Chemistry & Engineering	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2022-01-24T14:02:55Z
dspace.orderedauthors	Bilodeau, C; Jin, W; Xu, H; Emerson, JA; Mukhopadhyay, S; Kalantar, TH; Jaakkola, T; Barzilay, R; Jensen, KF	en_US
dspace.date.submission	2022-01-24T14:02:57Z
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: d1re00315a.pdf
Size: 4.139Mb
Format: PDF
Description: Published version

This item appears in the following Collection(s)

MIT Open Access Articles

Version History

Version	Item	Date	Summary
2	1721.1/139660.2	2022-01-24T15:26:07Z	Publication information verified/added.
1	1721.1/139660*	2022-01-24T14:09:17Z
