Regression under a modern optimization lens

King, Angela, Ph. D. Massachusetts Institute of Technology

Author(s)

King, Angela, Ph. D. Massachusetts Institute of Technology

DownloadFull printable version (8.847Mb)

Other Contributors

Massachusetts Institute of Technology. Operations Research Center.

Advisor

Dimitris Bertsimas.

Terms of use

M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Metadata

Show full item record

Abstract

In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving mixed integer optimization (MIO) problems. The common mindset of MIO as theoretically elegant but practically irrelevant is no longer justified. In this thesis, we propose a methodology for regression modeling that is based on optimization techniques and centered around MIO. In Part I we propose a method to select a subset of variables to include in a linear regression model using continuous and integer optimization. Despite the natural formulation of subset selection as an optimization problem with an lo-norm constraint, current methods for subset selection do not attempt to use integer optimization to select the best subset. We show that, although this problem is non-convex and NP-hard, it can be practically solved for large scale problems. We numerically demonstrate that our approach outperforms other sparse learning procedures. In Part II of the thesis, we build off of Part I to modify the objective function and include constraints that will produce linear regression models with other desirable properties, in addition to sparsity. We develop a unified framework based on MIO which aims to algorithmize the process of building a high-quality linear regression model. This is the only methodology we are aware of to construct models that imposes statistical properties simultaneously rather than sequentially. Finally, we turn our attention to logistic regression modeling. It is the goal of Part III of the thesis to efficiently solve the mixed integer convex optimization problem of logistic regression with cardinality constraints to provable optimality. We develop a tailored algorithm to solve this challenging problem and demonstrate its speed and performance. We then show how this method can be used within the framework of Part II, thereby also creating an algorithmic approach to fitting high-quality logistic regression models. In each part of the thesis, we illustrate the effectiveness of our proposed approach on both real and synthetic datasets.

Description

Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2015.

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 131-139).

Date issued

2015

URI

http://hdl.handle.net/1721.1/98719

Department

Massachusetts Institute of Technology. Operations Research Center; Sloan School of Management

Publisher

Massachusetts Institute of Technology

Keywords

Operations Research Center.

Collections

Doctoral Theses