Show simple item record

dc.contributor.author: Bertsimas, Dimitris J
dc.contributor.author: Paskov, Ivan
dc.date.accessioned: 2022-06-28T15:20:33Z
dc.date.available: 2021-10-27T20:22:43Z
dc.date.available: 2022-06-28T15:20:33Z
dc.date.issued: 2020-11-01
dc.identifier.uri: https://hdl.handle.net/1721.1/135271.2
dc.description.abstract: We investigate and ultimately suggest remediation to the widely held belief that the best way to train regression models is via random assignment of our data to training and validation sets. In particular, we show that taking a robust optimization approach, and optimally selecting such training and validation sets, leads to models that not only perform significantly better than their randomly constructed counterparts in terms of prediction error, but more importantly, are considerably more stable in the sense that the standard deviation of the resulting predictions, as well as of the model coefficients, is greatly reduced. Moreover, we show that this optimization approach to training is far more effective at recovering the true support of a given data set, i.e., correctly identifying important features while simultaneously excluding spurious ones. We further compare the robust optimization approach to cross validation and find that optimization continues to have a performance edge, albeit smaller. Finally, we show that this optimization approach to training is equivalent to building models that are robust to all subpopulations in the data, and thus in particular are robust to the hardest subpopulation, which leads to interesting domain specific interpretations through the use of optimal classification trees. The proposed robust optimization algorithm is efficient and scales training to essentially any desired size. (en_US)
dc.language.iso: en
dc.rights: Creative Commons Attribution 4.0 International license (en_US)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ (en_US)
dc.source: Journal of Machine Learning Research (en_US)
dc.title: Stable regression: On the power of optimization over randomization in training regression problems (en_US)
dc.type: Article (en_US)
dc.contributor.department: Sloan School of Management (en_US)
dc.contributor.department: Massachusetts Institute of Technology. Operations Research Center (en_US)
dc.relation.journal: Journal of Machine Learning Research (en_US)
dc.eprint.version: Final published version (en_US)
dc.type.uri: http://purl.org/eprint/type/JournalArticle (en_US)
eprint.status: http://purl.org/eprint/status/PeerReviewed (en_US)
dc.date.updated: 2021-02-05T19:15:34Z
dspace.orderedauthors: Bertsimas, D; Paskov, I (en_US)
dspace.date.submission: 2021-02-05T19:15:36Z
mit.journal.volume: 21 (en_US)
mit.license: PUBLISHER_CC
mit.metadata.status: Publication Information Needed (en_US)
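The abstract's core idea, training a model that is robust to the hardest subpopulation, amounts to minimizing the worst-case error over all size-k subsets of the data rather than the error on one random split. A minimal sketch of that idea follows; this is an illustrative alternating heuristic assumed for exposition, not the authors' exact algorithm, and `stable_fit` and its parameters are hypothetical names:

```python
import numpy as np

def stable_fit(X, y, k, n_iter=20, rng_seed=0):
    """Heuristic sketch of robust training: approximately minimize the
    worst-case squared error over all size-k subsets of the data by
    alternating between (1) selecting the k hardest points under the
    current coefficients and (2) refitting least squares on that subset.
    """
    rng = np.random.default_rng(rng_seed)
    n, p = X.shape
    # Start from a random size-k training subset.
    idx = rng.choice(n, size=k, replace=False)
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Min step: ordinary least squares on the current subset.
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        # Max step: pick the k points with the largest squared residuals.
        resid = (y - X @ beta) ** 2
        idx = np.argsort(resid)[-k:]
    return beta
```

The paper solves the underlying min-max problem with an efficient optimization algorithm; the alternation above only conveys the worst-subset intuition on small examples.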

