
dc.contributor.author | Hattrick-Simpers, Jason R.
dc.contributor.author | DeCost, Brian
dc.contributor.author | Kusne, A. G.
dc.contributor.author | Joress, Howie
dc.contributor.author | Wong-Ng, Winnie
dc.contributor.author | Kaiser, Debra L.
dc.contributor.author | Zakutayev, Andriy
dc.contributor.author | Phillips, Caleb
dc.contributor.author | Sun, Shijing
dc.contributor.author | Thapa, Janak
dc.contributor.author | Yu, Heshan
dc.contributor.author | Takeuchi, Ichiro
dc.contributor.author | Buonassisi, Tonio
dc.date.accessioned | 2021-11-01T14:33:36Z
dc.date.available | 2021-11-01T14:33:36Z
dc.date.issued | 2021-06-09
dc.identifier.uri | https://hdl.handle.net/1721.1/136823
dc.description.abstract | Modern machine learning and autonomous experimentation schemes in materials science rely on accurate analysis of the data ingested by these models. Unfortunately, accurate analysis of the underlying data can be difficult, even for domain experts, complicating the training of the models intended to drive experiments. This is especially true when the goal is to identify the presence of weak signatures in diffraction or spectroscopic datasets. In this work, we examine a set of as-obtained diffraction data that track the phase transition from monoclinic to tetragonal in a Nb-doped VO2 film as a function of temperature and dopant concentration. We then task a set of domain experts and a set of machine learning experts with identifying which phase is present in each diffraction pattern manually and algorithmically, respectively; in both cases, the labels can vary dramatically, especially at the phase boundaries. We use the mode of the labels and the Shannon entropy as a method to capture, preserve, and propagate consensus labels and their variance. Further, we use the expert labels as a benchmark and demonstrate the use of Shannon-entropy-weighted scoring to test the performance of machine-learning-generated labels. Finally, we propose a material data challenge centered around generating improved labeling algorithms. This real-world dataset curated with expert labels can act as a test bed for new algorithms. The raw data, annotations, and code used in this study are all available online at data.gov, and the interested reader is encouraged to replicate and improve the existing models. | en_US
dc.publisher | Springer International Publishing | en_US
dc.relation.isversionof | https://doi.org/10.1007/s40192-021-00213-8 | en_US
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US
dc.source | Springer International Publishing | en_US
dc.title | An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models | en_US
dc.type | Article | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Mechanical Engineering
dc.eprint.version | Author's final manuscript | en_US
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US
dc.date.updated | 2021-06-15T03:19:12Z
dc.language.rfc3066 | en
dc.rights.holder | This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply
dspace.embargo.terms | Y
dspace.date.submission | 2021-06-15T03:19:12Z
mit.license | OPEN_ACCESS_POLICY
mit.metadata.status | Authority Work and Publication Information Needed
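
The abstract in this record describes summarizing multiple labels per diffraction pattern by their mode and Shannon entropy, and scoring machine learning labels with entropy weighting. The following is a minimal Python sketch of that idea, not the authors' code. The label names "M" and "T" (monoclinic, tetragonal), the function names, and in particular the weight `1 - entropy / max_entropy` are assumptions made for illustration; the record does not state the exact weighting used in the article.

```python
from collections import Counter
from math import log2


def consensus_and_entropy(labels):
    """Return (mode label, Shannon entropy in bits) for one pattern's labels.

    Ties for the mode are broken by first occurrence (an arbitrary choice).
    """
    counts = Counter(labels)
    total = len(labels)
    mode_label = counts.most_common(1)[0][0]
    # H = sum_i p_i * log2(1 / p_i), where p_i is the fraction of votes for label i.
    entropy = sum((c / total) * log2(total / c) for c in counts.values())
    return mode_label, entropy


def entropy_weighted_accuracy(predictions, expert_labels, n_classes=2):
    """Score predictions against consensus labels, down-weighting contested patterns.

    Assumed weighting (hypothetical): weight = 1 - H / log2(n_classes), so a
    pattern on which experts fully agree counts 1 and an evenly split one counts 0.
    """
    max_entropy = log2(n_classes)
    weighted_hits = 0.0
    weight_sum = 0.0
    for pred, labels in zip(predictions, expert_labels):
        mode_label, entropy = consensus_and_entropy(labels)
        weight = 1.0 - entropy / max_entropy
        weighted_hits += weight * (pred == mode_label)
        weight_sum += weight
    return weighted_hits / weight_sum if weight_sum else 0.0
```

With this weighting, disagreement among experts near a phase boundary reduces how much a pattern contributes to the score, so a model is not penalized heavily for mislabeling patterns that the experts themselves could not agree on.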

