Generating molecules with optimized aqueous solubility using iterative graph translation
Author(s): Bilodeau, Camille; Jin, Wengong; Xu, Hongyun; Emerson, Jillian A; Mukhopadhyay, Sukrit; Kalantar, Thomas H; Jaakkola, Tommi; Barzilay, Regina; Jensen, Klavs F; ...
While molecular discovery is critical for solving many scientific problems, the time and resource costs of experiments make it intractable to fully explore chemical space. Here, we present a generative modeling framework that proposes novel molecules that are 1) based on starting candidate structures and 2) optimized with respect to one or more objectives or constraints. We explore how this framework performs in an applied setting by focusing on the problem of optimizing molecules for aqueous solubility, using an experimental database containing data curated from the literature. The resulting model was capable of improving molecules with a range of starting solubilities. When synthetic feasibility was applied as a secondary optimization constraint (estimated using a combination of synthetic accessibility and retrosynthetic accessibility scores), the model generated synthetically feasible molecules 83.0% of the time (compared with 59.9% of the time without the constraint). To validate model performance experimentally, a set of candidate molecules was translated using the model and the solubilities of the candidate and generated molecules were verified experimentally. We additionally validated model performance via experimental measurements by holding out the top 100 most soluble molecules during training and showing that the model could rediscover 33 of those molecules. To determine the sensitivity of model performance to dataset size, we trained the model on different subsets of the initial training dataset. We found that model performance did not decrease significantly when the model was trained on a random 50% subset of the training data but did decrease when the model was trained on subsets containing only less soluble molecules (i.e., the bottom 50%). Overall, this framework serves as a tool for generating optimized, synthetically feasible molecules that can be applied to a range of problems in chemistry and chemical engineering.
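The hold-out check described in the abstract (withholding the most soluble molecules during training, then counting how many the model regenerates) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `split_top_k` and `rediscovered`, the molecule identifiers, and the solubility values are hypothetical toy placeholders and are not taken from the paper or its dataset. The sketch also assumes molecules can be compared by exact identifier match; in practice this would likely require a canonical representation of each structure.

```python
def split_top_k(solubility, k):
    """Split a {molecule: solubility} mapping into training data and a
    held-out set containing the k most soluble molecules."""
    # Rank molecule identifiers from most to least soluble.
    ranked = sorted(solubility, key=solubility.get, reverse=True)
    held_out = set(ranked[:k])
    # Everything not held out remains available for training.
    train = {m: s for m, s in solubility.items() if m not in held_out}
    return train, held_out


def rediscovered(generated, held_out):
    """Return the held-out molecules that also appear among the generated ones."""
    return held_out & set(generated)


# Toy placeholder data (hypothetical identifiers and values, not from the paper).
solubility = {
    "mol_a": -4.2,
    "mol_b": -1.0,
    "mol_c": 0.5,
    "mol_d": -2.7,
    "mol_e": 1.1,
}

train, held_out = split_top_k(solubility, k=2)

# Stand-in for model output; a real run would collect molecules
# generated by a model trained only on `train`.
generated = ["mol_c", "mol_x", "mol_b"]

hits = rediscovered(generated, held_out)
print(f"rediscovered {len(hits)} of {len(held_out)} held-out molecules")
```

In the setting the abstract reports, `k` would be 100 and the count of rediscovered molecules was 33; the toy numbers here only show the bookkeeping.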
Reaction Chemistry & Engineering
Royal Society of Chemistry (RSC)
Bilodeau, Camille, Jin, Wengong, Xu, Hongyun, Emerson, Jillian A, Mukhopadhyay, Sukrit et al. 2021. "Generating molecules with optimized aqueous solubility using iterative graph translation." Reaction Chemistry & Engineering.
Final published version