
dc.contributor.author: Janet, Jon Paul
dc.contributor.author: Duan, Chenru
dc.contributor.author: Yang, Tzuhsiung
dc.contributor.author: Nandy, Aditya
dc.contributor.author: Kulik, Heather J.
dc.date.accessioned: 2022-03-23T14:47:01Z
dc.date.available: 2021-10-27T20:05:53Z
dc.date.available: 2022-03-23T14:47:01Z
dc.date.issued: 2019-07
dc.date.submitted: 2019-05
dc.identifier.issn: 2041-6520
dc.identifier.issn: 2041-6539
dc.identifier.uri: https://hdl.handle.net/1721.1/134631.2
dc.description.abstract: This journal is © The Royal Society of Chemistry. Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning. [en_US]
dc.language.iso: en
dc.publisher: Royal Society of Chemistry (RSC) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1039/c9sc02298h [en_US]
dc.rights: Creative Commons Attribution Noncommercial 3.0 unported license [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by-nc/3.0/ [en_US]
dc.source: Royal Society of Chemistry (RSC) [en_US]
dc.title: A quantitative uncertainty metric controls error in neural network-driven chemical discovery [en_US]
dc.type: Article [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Chemical Engineering
dc.contributor.department: Massachusetts Institute of Technology. Department of Chemistry
dc.relation.journal: Chemical Science [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2021-06-11T16:45:46Z
dspace.orderedauthors: Janet, JP; Duan, C; Yang, T; Nandy, A; Kulik, HJ [en_US]
dspace.date.submission: 2021-06-11T16:45:47Z
mit.journal.volume: 10 [en_US]
mit.journal.issue: 34 [en_US]
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work Needed [en_US]

