
dc.contributor.advisor: Bertsimas, Dimitris
dc.contributor.author: Paskov, Ivan Spassimirov
dc.date.accessioned: 2022-06-15T13:14:24Z
dc.date.available: 2022-06-15T13:14:24Z
dc.date.issued: 2022-02
dc.date.submitted: 2022-01-06T00:07:09.464Z
dc.identifier.uri: https://hdl.handle.net/1721.1/143350
dc.description.abstract: This thesis explores one of the most fundamental questions in Machine Learning: how should the "learning" component of Machine Learning be done? For essentially the entire history of the field, ever since Mosteller and Tukey proposed the paradigm in 1968, the answer has remained the same: use randomization. That is, randomly split the data into training, validation, and test sets; train the model on the training set; select parameters using the validation set; and report performance on the test set. Conceptually and practically simple, this methodology has achieved near-unanimous adoption. Despite its popularity, however, it is fraught with issues stemming from the instability of the trained models, and the question remains whether we can do better. In this thesis, we answer that question in the affirmative. Taking a robust, combinatorial optimization approach, we propose a new, general way of training machine learning models based on optimization rather than randomization. Rather than requiring the model to perform well on a single, randomly chosen training set, as is typically done, we require it to be robust against every training set of a fixed size. In this way, we extract what is common to all training sets rather than the idiosyncrasies of any particular one, which are unlikely to generalize to new, unseen data. We begin by developing the methodology for spatial, cross-sectional methods, and then extend the framework to time-series methods, where the contiguous structure of time plays a key role. We next derive efficient algorithms that make the approach highly scalable. Finally, we demonstrate the efficacy of the methodology across all methods on a large collection of synthetic and real datasets drawn from both academia and industry.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Stable Machine Learning
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Operations Research Center
dc.identifier.orcid: 0000-0002-5161-1771
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy
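
The abstract contrasts training on one random training set with requiring robustness against every training set of a fixed size. One way to read the latter as an optimization problem is a min-max: choose the model parameters that minimize the largest total loss over all subsets of exactly k samples. The Python below is a toy sketch under loudly stated assumptions. It assumes squared loss, a single-parameter linear model y ≈ w·x, and a worst case measured as the summed loss on the chosen subset. All function names are hypothetical, and none of this is the thesis's own formulation or its scalable algorithms. The sketch relies on one fact that holds in general: the worst k-subset total loss equals the sum of the k largest per-sample losses. That identity lets the inner maximization skip the enumeration of all subsets.

```python
from itertools import combinations


def squared_losses(w, xs, ys):
    """Per-sample squared residuals of the toy model y ~ w * x (assumption)."""
    return [(y - w * x) ** 2 for x, y in zip(xs, ys)]


def worst_case_loss_brute_force(losses, k):
    """Largest total loss over every training subset of exactly k samples."""
    return max(sum(subset) for subset in combinations(losses, k))


def worst_case_loss(losses, k):
    """Same quantity without enumeration: the sum of the k largest losses."""
    return sum(sorted(losses, reverse=True)[:k])


def robust_fit(xs, ys, k, lo=-10.0, hi=10.0, iters=200):
    """Minimize the worst-case k-subset loss over w by ternary search.

    The objective is a maximum of sums of convex functions of w, so it is
    convex, and ternary search on [lo, hi] converges to a minimizer there.
    """

    def objective(w):
        return worst_case_loss(squared_losses(w, xs, ys), k)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        # For a convex objective, f(m1) <= f(m2) implies a minimizer in [lo, m2].
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    print("robust w (k=3):", robust_fit(xs, ys, k=3))
```

With k equal to the number of samples, the only "training set" is the full data, and the objective reduces to ordinary least squares. Smaller k makes the fit guard against the hardest k-point subset. The ternary search is used here only because the toy model has one parameter. It is not a stand-in for the efficient algorithms the abstract mentions.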


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record