Show simple item record

dc.contributor.authorPickering, Ethan
dc.contributor.authorSapsis, Themistoklis P.
dc.date.accessioned2024-10-28T14:17:41Z
dc.date.available2024-10-28T14:17:41Z
dc.date.issued2024-09-30
dc.identifier.urihttps://hdl.handle.net/1721.1/157430
dc.description.abstractMisleading or unnecessary data can have out-sized impacts on the health or accuracy of Machine Learning (ML) models. We present a Bayesian sequential selection method, akin to Bayesian experimental design, that identifies critically important information within a dataset while ignoring data that are either misleading or bring unnecessary complexity to the surrogate model of choice. Our method improves sample-wise error convergence and eliminates instances where more data lead to worse performance and instabilities of the surrogate model, often termed sample-wise “double descent”. We find these instabilities are a result of the complexity of the underlying map and are linked to extreme events and heavy tails. Our approach has two key features. First, the selection algorithm dynamically couples the chosen model and data. Data is chosen based on its merits towards improving the selected model, rather than being compared strictly against other data. Second, a natural convergence of the method removes the need for dividing the data into training, testing, and validation sets. Instead, the selection metric inherently assesses testing and validation error through global statistics of the model. This ensures that key information is never wasted in testing or validation. The method is applied using both Gaussian process regression and deep neural network surrogate models.en_US
dc.publisherMultidisciplinary Digital Publishing Instituteen_US
dc.relation.isversionofhttp://dx.doi.org/10.3390/e26100835en_US
dc.rightsCreative Commons Attributionen_US
dc.rights.urihttps://creativecommons.org/licenses/by/4.0/en_US
dc.sourceMultidisciplinary Digital Publishing Instituteen_US
dc.titleInformation FOMO: The Unhealthy Fear of Missing Out on Information—A Method for Removing Misleading Data for Healthier Modelsen_US
dc.typeArticleen_US
dc.identifier.citationPickering, E.; Sapsis, T.P. Information FOMO: The Unhealthy Fear of Missing Out on Information—A Method for Removing Misleading Data for Healthier Models. Entropy 2024, 26, 835.en_US
dc.contributor.departmentMassachusetts Institute of Technology. Department of Mechanical Engineeringen_US
dc.relation.journalentropyen_US
dc.identifier.mitlicensePUBLISHER_CC
dc.eprint.versionFinal published versionen_US
dc.type.urihttp://purl.org/eprint/type/JournalArticleen_US
eprint.statushttp://purl.org/eprint/status/PeerRevieweden_US
dc.date.updated2024-10-25T13:43:00Z
dspace.date.submission2024-10-25T13:43:00Z
mit.journal.volume26en_US
mit.journal.issue10en_US
mit.licensePUBLISHER_CC
mit.metadata.statusAuthority Work and Publication Information Neededen_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record