A quantitative uncertainty metric controls error in neural network-driven chemical discovery

Janet, Jon Paul; Duan, Chenru; Yang, Tzuhsiung; Nandy, Aditya; Kulik, Heather J.

dc.contributor.author	Janet, Jon Paul
dc.contributor.author	Duan, Chenru
dc.contributor.author	Yang, Tzuhsiung
dc.contributor.author	Nandy, Aditya
dc.contributor.author	Kulik, Heather J.
dc.date.accessioned	2022-03-23T14:47:01Z
dc.date.available	2021-10-27T20:05:53Z
dc.date.available	2022-03-23T14:47:01Z
dc.date.issued	2019-07
dc.date.submitted	2019-05
dc.identifier.issn	2041-6520
dc.identifier.issn	2041-6539
dc.identifier.uri	https://hdl.handle.net/1721.1/134631.2
dc.description.abstract	Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.	en_US
dc.language.iso	en
dc.publisher	Royal Society of Chemistry (RSC)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1039/c9sc02298h	en_US
dc.rights	Creative Commons Attribution-NonCommercial 3.0 Unported license	en_US
dc.rights.uri	https://creativecommons.org/licenses/by-nc/3.0/	en_US
dc.source	Royal Society of Chemistry (RSC)	en_US
dc.title	A quantitative uncertainty metric controls error in neural network-driven chemical discovery	en_US
dc.type	Article	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Chemical Engineering
dc.contributor.department	Massachusetts Institute of Technology. Department of Chemistry
dc.relation.journal	Chemical Science	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2021-06-11T16:45:46Z
dspace.orderedauthors	Janet, JP; Duan, C; Yang, T; Nandy, A; Kulik, HJ	en_US
dspace.date.submission	2021-06-11T16:45:47Z
mit.journal.volume	10	en_US
mit.journal.issue	34	en_US
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work Needed	en_US

Files in this item

Name: c9sc02298h.pdf
Size: 2.281Mb
Format: Unknown
Description: Published version


This item appears in the following Collection(s)

MIT Open Access Articles


Version	Item	Date	Summary
2	1721.1/134631.2*	2022-03-23T14:21:17Z	Publication information verified/added.
1	1721.1/134631	2021-10-27T20:05:53Z

DSpace@MIT
