Show simple item record

dc.contributor.author: Rasmussen, Maria H.
dc.contributor.author: Duan, Chenru
dc.contributor.author: Kulik, Heather J.
dc.contributor.author: Jensen, Jan H.
dc.date.accessioned: 2024-01-10T21:07:54Z
dc.date.available: 2024-01-10T21:07:54Z
dc.date.issued: 2023-12-18
dc.identifier.uri: https://hdl.handle.net/1721.1/153303
dc.description.abstract: With the increasingly important role of machine learning (ML) models in chemical research, the need to assign a level of confidence to model predictions naturally arises. Several methods for obtaining uncertainty estimates have been proposed in recent years, but consensus on how to evaluate them has yet to be established, and different studies on uncertainties generally use different metrics. We compare three of the most popular validation metrics (Spearman's rank correlation coefficient, the negative log likelihood (NLL), and the miscalibration area) to the error-based calibration introduced by Levi et al. (Sensors 2022, 22, 5540). Importantly, metrics such as the NLL and Spearman's rank correlation coefficient bear little information in themselves. We therefore introduce reference values obtained through errors simulated directly from the uncertainty distribution. The different metrics target different properties, and we show how to interpret them, but we generally find the best overall validation to be based on the error-based calibration plot introduced by Levi et al. Finally, we illustrate the sensitivity of ranking-based methods (e.g. Spearman's rank correlation coefficient) to test set design by using the same toy model on two different test sets and obtaining vastly different metrics (0.05 vs. 0.65). [en_US]
dc.publisher: Springer International Publishing [en_US]
dc.relation.isversionof: https://doi.org/10.1186/s13321-023-00790-0 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Springer International Publishing [en_US]
dc.title: Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Journal of Cheminformatics. 2023 Dec 18;15(1):121 [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Chemistry
dc.contributor.department: Massachusetts Institute of Technology. Department of Chemical Engineering
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2023-12-24T04:17:48Z
dc.language.rfc3066: en
dc.rights.holder: The Author(s)
dspace.embargo.terms: N
dspace.date.submission: 2023-12-24T04:17:48Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
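
The abstract above discusses two metrics whose raw values carry little information on their own. Below is a minimal sketch, not the authors' code, of computing them on a toy data set: Spearman's rank correlation coefficient between predicted uncertainty and absolute error, and the Gaussian negative log likelihood (NLL). It also computes reference values by simulating errors directly from the predicted uncertainty distribution, as the abstract proposes. All array names and toy values are hypothetical; NumPy and SciPy are assumed.

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical model output: predicted standard deviations and observed errors.
sigma = rng.uniform(0.1, 1.0, size=500)   # predicted uncertainties (toy values)
errors = rng.normal(0.0, sigma)           # observed test-set errors (toy values)

def gaussian_nll(err, sig):
    """Mean negative log likelihood of errors under N(0, sig^2)."""
    return np.mean(0.5 * np.log(2 * np.pi * sig**2) + err**2 / (2 * sig**2))

# Observed metrics: rank correlation between |error| and uncertainty, and NLL.
rho_obs = spearmanr(np.abs(errors), sigma)[0]
nll_obs = gaussian_nll(errors, sigma)

# Reference values: simulate errors directly from the predicted uncertainty
# distribution, giving the metric values a perfectly calibrated model would
# be expected to produce on this particular test set.
sims = rng.normal(0.0, sigma, size=(1000, sigma.size))
rho_ref = np.mean([spearmanr(np.abs(e), sigma)[0] for e in sims])
nll_ref = np.mean([gaussian_nll(e, sigma) for e in sims])

print(f"Spearman rho: observed {rho_obs:.3f} vs. reference {rho_ref:.3f}")
print(f"NLL:          observed {nll_obs:.3f} vs. reference {nll_ref:.3f}")

In this framing, an observed metric is judged against its simulated reference rather than in isolation: an NLL well above the reference signals miscalibration even when the absolute value looks unremarkable, which is the point the abstract makes about such metrics bearing little information in themselves.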