
dc.contributor.advisor: Mazumder, Rahul
dc.contributor.author: Hazimeh, Hussein
dc.date.accessioned: 2022-02-07T15:21:08Z
dc.date.available: 2022-02-07T15:21:08Z
dc.date.issued: 2021-09
dc.date.submitted: 2021-08-18T21:22:02.993Z
dc.identifier.uri: https://hdl.handle.net/1721.1/140050
dc.description.abstract: Sparsity is a central concept in interpretable machine learning and high-dimensional statistics. While sparse learning problems can be naturally modeled using discrete optimization, computational challenges have historically shifted the focus towards alternatives based on continuous optimization and heuristics. Recently, growing evidence suggests that discrete optimization methods can obtain more interpretable models than popular alternatives. However, scalability issues are limiting the adoption of discrete methods and our understanding of their statistical properties. This thesis develops scalable discrete optimization methods and presents new statistical insights for a fundamental class of sparse learning problems.

In the first chapter, we consider the L0-regularized linear regression problem, which aims to select a subset of features that best predict the outcome. We propose fast, approximate algorithms, based on coordinate descent and local combinatorial optimization, and establish convergence guarantees (a toy sketch of this coordinate-wise update appears after the record below). Empirically, we identify important high-dimensional settings where L0-based estimators achieve better statistical performance than popular sparse learning methods (e.g., based on L1 regularization). Our open-source implementation (L0Learn) can handle instances with millions of features and run up to 3x faster than state-of-the-art sparse learning toolkits.

In the second chapter, we propose an exact, scalable approach for L0-regularized linear regression. In particular, we develop a specialized nonlinear branch-and-bound (BnB) framework that solves a mixed integer programming (MIP) formulation of the problem. In a radical shift from modern MIP solvers, we solve the BnB subproblems using a specialized first-order method that exploits sparsity. Our open-source solver L0BnB can scale to instances with ~10^7 features, over 1000x larger than what modern MIP solvers can handle.

In the third chapter, we focus on L0-regularized classification. We propose an exact and novel algorithm that solves the problem via a sequence of MIP subproblems, each involving a relatively small number of binary variables. The algorithm can scale to instances with 50,000 features. We also develop fast, approximate algorithms that generalize those of the first chapter. We show theoretically and empirically that our proposals can outperform popular sparse classification methods.

In the last two chapters, we consider structured sparse learning problems, in which group or hierarchy constraints are imposed to enhance interpretability. We develop specialized convex and discrete optimization algorithms for these problems. Our experiments indicate that the proposed algorithms are more scalable and can achieve better statistical performance than existing methods.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Sparse Learning using Discrete Optimization: Scalable Algorithms and Statistical Insights
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Operations Research Center
dc.contributor.department: Sloan School of Management
dc.identifier.orcid: 0000-0003-4501-0678
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy
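
The abstract above centers on the L0-regularized least-squares problem, min over beta of 0.5*||y - X*beta||_2^2 + lambda*||beta||_0, and on coordinate descent updates that hard-threshold one coefficient at a time. What follows is a minimal, self-contained Python sketch of that style of update, intended only as an illustration: the function name, the unit-norm-column assumption, and the toy data are our own choices, and this is not the L0Learn or L0BnB code described in the thesis.

import numpy as np

def l0_coordinate_descent(X, y, lam, n_iters=100):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_0.
    Assumes the columns of X have unit L2 norm (illustrative sketch only)."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()            # residual y - X @ beta (beta starts at 0)
    thresh = np.sqrt(2.0 * lam)           # hard-thresholding level for unit-norm columns
    for _ in range(n_iters):
        for j in range(p):
            rho = X[:, j] @ r + beta[j]   # univariate least-squares fit for coordinate j
            new_bj = rho if abs(rho) > thresh else 0.0
            r += X[:, j] * (beta[j] - new_bj)   # keep the residual in sync
            beta[j] = new_bj
    return beta

# Toy usage: recover a 3-sparse signal from noisy measurements.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
X /= np.linalg.norm(X, axis=0)            # normalize columns to unit norm
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(100)
print(np.nonzero(l0_coordinate_descent(X, y, lam=0.05))[0])   # typically recovers indices 0, 1, 2

The sqrt(2*lam) threshold comes from comparing the cost of keeping coordinate j (lam, with the quadratic term driven to zero) against setting it to zero (rho^2/2), which is the standard coordinate-wise stationarity condition for this objective under unit-norm columns.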

