A quantitative uncertainty metric controls error in neural network-driven chemical discovery

Janet, Jon Paul; Duan, Chenru; Yang, Tzuhsiung; Nandy, Aditya; Kulik, Heather J

Notice

This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/134631.2

dc.contributor.author	Janet, Jon Paul
dc.contributor.author	Duan, Chenru
dc.contributor.author	Yang, Tzuhsiung
dc.contributor.author	Nandy, Aditya
dc.contributor.author	Kulik, Heather J
dc.date.accessioned	2021-10-27T20:05:53Z
dc.date.available	2021-10-27T20:05:53Z
dc.date.issued	2019
dc.identifier.uri	https://hdl.handle.net/1721.1/134631
dc.description.abstract	This journal is © The Royal Society of Chemistry. Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.
dc.language.iso	en
dc.publisher	Royal Society of Chemistry (RSC)
dc.relation.isversionof	10.1039/C9SC02298H
dc.rights	Creative Commons Attribution Noncommercial 3.0 unported license
dc.rights.uri	https://creativecommons.org/licenses/by-nc/3.0/
dc.source	Royal Society of Chemistry (RSC)
dc.title	A quantitative uncertainty metric controls error in neural network-driven chemical discovery
dc.type	Article
dc.relation.journal	Chemical Science
dc.eprint.version	Final published version
dc.type.uri	http://purl.org/eprint/type/JournalArticle
eprint.status	http://purl.org/eprint/status/PeerReviewed
dc.date.updated	2021-06-11T16:45:46Z
dspace.orderedauthors	Janet, JP; Duan, C; Yang, T; Nandy, A; Kulik, HJ
dspace.date.submission	2021-06-11T16:45:47Z
mit.journal.volume	10
mit.journal.issue	34
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed
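
The abstract describes using the distance to available training data in a network's latent space as an uncertainty metric, with a cutoff on that distance controlling which predictions are trusted. A minimal sketch of that idea follows. It assumes the latent vectors have already been extracted from a hidden layer of a trained model, and that the metric is the Euclidean distance averaged over the k nearest training points. The function names `latent_distance` and `within_cutoff`, the choice of k, and the toy data are illustrative assumptions and are not taken from the article.

```python
import numpy as np


def latent_distance(train_latent, query_latent, k=1):
    """Mean Euclidean distance from each query point to its k nearest
    training points in latent space (hypothetical helper)."""
    train_latent = np.asarray(train_latent, dtype=float)
    query_latent = np.asarray(query_latent, dtype=float)
    # Pairwise differences, shape (n_query, n_train, n_features).
    diffs = query_latent[:, None, :] - train_latent[None, :, :]
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep the k smallest distances for each query point and average them.
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


def within_cutoff(train_latent, query_latent, cutoff, k=1):
    """Boolean mask of query points whose latent distance is at most
    the cutoff (hypothetical helper)."""
    return latent_distance(train_latent, query_latent, k) <= cutoff


# Toy latent vectors standing in for hidden-layer activations.
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
query = [[0.1, 0.0], [3.0, 4.0]]

print(latent_distance(train, query))
print(within_cutoff(train, query, cutoff=1.0))
```

In this sketch, tightening `cutoff` keeps only the query points that lie close to the training data, which is the mechanism the abstract credits with driving down prediction error. Points that fall outside the cutoff are the natural candidates for active learning. Calibrating the distance into an error estimate, as the abstract mentions, is not shown here.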

Files in this item

Name: c9sc02298h.pdf
Size: 2.281Mb
Format: PDF
Description: Published version

This item appears in the following Collection(s)

MIT Open Access Articles

Version	Item	Date	Summary
2	1721.1/134631.2	2022-03-23T14:21:17Z	Publication information verified/added.
1	1721.1/134631*	2021-10-27T20:05:53Z
