Show simple item record

dc.contributor.author: Bertsimas, Dimitris J
dc.contributor.author: Paskov, Ivan
dc.date.accessioned: 2022-06-28T15:20:33Z
dc.date.available: 2021-10-27T20:22:43Z
dc.date.available: 2022-06-28T15:20:33Z
dc.date.issued: 2020-11-01
dc.identifier.uri: https://hdl.handle.net/1721.1/135271.2
dc.description.abstract: We investigate and ultimately suggest remediation to the widely held belief that the best way to train regression models is via random assignment of our data to training and validation sets. In particular, we show that taking a robust optimization approach, and optimally selecting such training and validation sets, leads to models that not only perform significantly better than their randomly constructed counterparts in terms of prediction error, but more importantly, are considerably more stable in the sense that the standard deviation of the resulting predictions, as well as of the model coefficients, is greatly reduced. Moreover, we show that this optimization approach to training is far more effective at recovering the true support of a given data set, i.e., correctly identifying important features while simultaneously excluding spurious ones. We further compare the robust optimization approach to cross validation and find that optimization continues to have a performance edge, albeit smaller. Finally, we show that this optimization approach to training is equivalent to building models that are robust to all subpopulations in the data, and thus in particular are robust to the hardest subpopulation, which leads to interesting domain specific interpretations through the use of optimal classification trees. The proposed robust optimization algorithm is efficient and scales training to essentially any desired size. (en_US)
dc.language.iso: en
dc.rights: Creative Commons Attribution 4.0 International license (en_US)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ (en_US)
dc.source: Journal of Machine Learning Research (en_US)
dc.title: Stable regression: On the power of optimization over randomization in training regression problems (en_US)
dc.type: Article (en_US)
dc.contributor.department: Sloan School of Management (en_US)
dc.contributor.department: Massachusetts Institute of Technology. Operations Research Center (en_US)
dc.relation.journal: Journal of Machine Learning Research (en_US)
dc.eprint.version: Final published version (en_US)
dc.type.uri: http://purl.org/eprint/type/JournalArticle (en_US)
eprint.status: http://purl.org/eprint/status/PeerReviewed (en_US)
dc.date.updated: 2021-02-05T19:15:34Z
dspace.orderedauthors: Bertsimas, D; Paskov, I (en_US)
dspace.date.submission: 2021-02-05T19:15:36Z
mit.journal.volume: 21 (en_US)
mit.license: PUBLISHER_CC
mit.metadata.status: Publication Information Needed (en_US)
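The abstract's core idea, training a model that is robust to the hardest subpopulation, amounts to minimizing the worst-case error over all size-k subsets of the data rather than the error on one random split. A minimal sketch of that idea follows; this is an illustrative alternating heuristic assumed for exposition, not the authors' exact algorithm, and `stable_fit` and its parameters are hypothetical names:

```python
import numpy as np

def stable_fit(X, y, k, n_iter=20, rng_seed=0):
    """Heuristic sketch of robust training: approximately minimize the
    worst-case squared error over all size-k subsets of the data by
    alternating between (1) selecting the k hardest points under the
    current coefficients and (2) refitting least squares on that subset.
    """
    rng = np.random.default_rng(rng_seed)
    n, p = X.shape
    # Start from a random size-k training subset.
    idx = rng.choice(n, size=k, replace=False)
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Min step: ordinary least squares on the current subset.
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        # Max step: pick the k points with the largest squared residuals.
        resid = (y - X @ beta) ** 2
        idx = np.argsort(resid)[-k:]
    return beta
```

The paper solves the underlying min-max problem with an efficient optimization algorithm; the alternation above only conveys the worst-subset intuition on small examples.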

