MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Information FOMO: The Unhealthy Fear of Missing Out on Information—A Method for Removing Misleading Data for Healthier Models

Author(s)
Pickering, Ethan; Sapsis, Themistoklis P.
Thumbnail
Downloadentropy-26-00835-v3.pdf (3.566Mb)
Publisher with Creative Commons License

Publisher with Creative Commons License

Creative Commons Attribution

Terms of use
Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/
Metadata
Show full item record
Abstract
Misleading or unnecessary data can have out-sized impacts on the health or accuracy of Machine Learning (ML) models. We present a Bayesian sequential selection method, akin to Bayesian experimental design, that identifies critically important information within a dataset while ignoring data that are either misleading or bring unnecessary complexity to the surrogate model of choice. Our method improves sample-wise error convergence and eliminates instances where more data lead to worse performance and instabilities of the surrogate model, often termed sample-wise “double descent”. We find these instabilities are a result of the complexity of the underlying map and are linked to extreme events and heavy tails. Our approach has two key features. First, the selection algorithm dynamically couples the chosen model and data. Data is chosen based on its merits towards improving the selected model, rather than being compared strictly against other data. Second, a natural convergence of the method removes the need for dividing the data into training, testing, and validation sets. Instead, the selection metric inherently assesses testing and validation error through global statistics of the model. This ensures that key information is never wasted in testing or validation. The method is applied using both Gaussian process regression and deep neural network surrogate models.
Date issued
2024-09-30
URI
https://hdl.handle.net/1721.1/157430
Department
Massachusetts Institute of Technology. Department of Mechanical Engineering
Journal
entropy
Publisher
Multidisciplinary Digital Publishing Institute
Citation
Pickering, E.; Sapsis, T.P. Information FOMO: The Unhealthy Fear of Missing Out on Information—A Method for Removing Misleading Data for Healthier Models. Entropy 2024, 26, 835.
Version: Final published version

Collections
  • MIT Open Access Articles

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.