dc.contributor.author: Heid, Esther
dc.contributor.author: McGill, Charles J
dc.contributor.author: Vermeire, Florence H
dc.contributor.author: Green, William H
dc.date.accessioned: 2025-07-08T19:28:01Z
dc.date.available: 2025-07-08T19:28:01Z
dc.date.issued: 2023-06-20
dc.identifier.uri: https://hdl.handle.net/1721.1/159977
dc.description.abstract: Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on data sets of molecular properties, we show important trends in model performance associated with the level of noise in the data set, size of the data set, model architecture, molecule representation, ensemble size, and data set splitting. In particular, we show that 1) noise in the test set can limit a model's observed performance even when its actual performance is much better, 2) using size-extensive model aggregation structures is crucial for extensive property prediction, and 3) ensembling is a reliable tool for uncertainty quantification and improvement, specifically for the contribution of model variance. We develop general guidelines on how to improve an underperforming model depending on which uncertainty context it falls into. [en_US]
dc.language.iso: en
dc.publisher: American Chemical Society [en_US]
dc.relation.isversionof: 10.1021/acs.jcim.3c00373 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: American Chemical Society [en_US]
dc.title: Characterizing Uncertainty in Machine Learning for Chemistry [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Heid, Esther, McGill, Charles J, Vermeire, Florence H and Green, William H. 2023. "Characterizing Uncertainty in Machine Learning for Chemistry." Journal of Chemical Information and Modeling, 63 (13).
dc.contributor.department: Massachusetts Institute of Technology. Department of Chemical Engineering [en_US]
dc.relation.journal: Journal of Chemical Information and Modeling [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2025-07-08T19:02:11Z
dspace.orderedauthors: Heid, E; McGill, CJ; Vermeire, FH; Green, WH [en_US]
dspace.date.submission: 2025-07-08T19:02:13Z
mit.journal.volume: 63 [en_US]
mit.journal.issue: 13 [en_US]
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]