Notice

This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/139660.2


dc.contributor.author: Bilodeau, Camille
dc.contributor.author: Jin, Wengong
dc.contributor.author: Xu, Hongyun
dc.contributor.author: Emerson, Jillian A
dc.contributor.author: Mukhopadhyay, Sukrit
dc.contributor.author: Kalantar, Thomas H
dc.contributor.author: Jaakkola, Tommi
dc.contributor.author: Barzilay, Regina
dc.contributor.author: Jensen, Klavs F
dc.date.accessioned: 2022-01-24T14:09:17Z
dc.date.available: 2022-01-24T14:09:17Z
dc.date.issued: 2021-11-15
dc.identifier.uri: https://hdl.handle.net/1721.1/139660
dc.description.abstract: While molecular discovery is critical for solving many scientific problems, the time and resource costs of experiments make it intractable to fully explore chemical space. Here, we present a generative modeling framework that proposes novel molecules that are 1) based on starting candidate structures and 2) optimized with respect to one or more objectives or constraints. We explore how this framework performs in an applied setting by focusing on the problem of optimizing molecules for aqueous solubility, using an experimental database containing data curated from the literature. The resulting model was capable of improving molecules with a range of starting solubilities. When synthetic feasibility was applied as a secondary optimization constraint (estimated using a combination of synthetic accessibility and retrosynthetic accessibility scores), the model generated synthetically feasible molecules 83.0% of the time (compared with 59.9% of the time without the constraint). To validate model performance experimentally, a set of candidate molecules was translated using the model and the solubilities of the candidate and generated molecules were verified experimentally. We additionally validated model performance via experimental measurements by holding out the top 100 most soluble molecules during training and showing that the model could rediscover 33 of those molecules. To determine the sensitivity of model performance to dataset size, we trained the model on different subsets of the initial training dataset. We found that model performance did not decrease significantly when the model was trained on a random 50% subset of the training data but did decrease when the model was trained on subsets containing only less soluble molecules (i.e., the bottom 50%). Overall, this framework serves as a tool for generating optimized, synthetically feasible molecules that can be applied to a range of problems in chemistry and chemical engineering. [en_US]
dc.language.iso: en
dc.publisher: Royal Society of Chemistry (RSC) [en_US]
dc.relation.isversionof: 10.1039/d1re00315a [en_US]
dc.rights: Creative Commons Attribution 3.0 unported license [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/3.0/ [en_US]
dc.source: Royal Society of Chemistry (RSC) [en_US]
dc.title: Generating molecules with optimized aqueous solubility using iterative graph translation [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Bilodeau, Camille, Jin, Wengong, Xu, Hongyun, Emerson, Jillian A, Mukhopadhyay, Sukrit et al. 2021. "Generating molecules with optimized aqueous solubility using iterative graph translation." Reaction Chemistry & Engineering.
dc.relation.journal: Reaction Chemistry & Engineering [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2022-01-24T14:02:55Z
dspace.orderedauthors: Bilodeau, C; Jin, W; Xu, H; Emerson, JA; Mukhopadhyay, S; Kalantar, TH; Jaakkola, T; Barzilay, R; Jensen, KF [en_US]
dspace.date.submission: 2022-01-24T14:02:57Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

