Sparse Learning using Discrete Optimization: Scalable Algorithms and Statistical Insights
Author(s)
Hazimeh, Hussein
Advisor
Mazumder, Rahul
Abstract
Sparsity is a central concept in interpretable machine learning and high-dimensional statistics. While sparse learning problems can be naturally modeled using discrete optimization, computational challenges have historically shifted the focus towards alternatives based on continuous optimization and heuristics. Recently, growing evidence suggests that discrete optimization methods can obtain more interpretable models than popular alternatives. However, scalability issues limit the adoption of discrete methods and hinder our understanding of their statistical properties. This thesis develops scalable discrete optimization methods and presents new statistical insights for a fundamental class of sparse learning problems.
In the first chapter, we consider the L0-regularized linear regression problem, which aims to select a subset of features that best predict the outcome. We propose fast, approximate algorithms based on coordinate descent and local combinatorial optimization, and establish convergence guarantees. Empirically, we identify important high-dimensional settings where L0-based estimators achieve better statistical performance than popular sparse learning methods (e.g., those based on L1 regularization). Our open-source implementation (L0Learn) handles instances with millions of features and runs up to 3x faster than state-of-the-art sparse learning toolkits.
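To make the coordinate descent idea concrete, the following is a minimal sketch of cyclic coordinate descent for the L0-penalized least squares objective 0.5*||y - X b||^2 + lambda*||b||_0. It is an illustration of the general technique, not the L0Learn implementation: it assumes unit-norm columns of X and a fixed penalty, and it omits the local combinatorial search, active sets, and screening used in the actual toolkit.

import numpy as np

def l0_coordinate_descent(X, y, lam, n_iters=100):
    # Cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_0,
    # assuming the columns of X have unit l2 norm.
    p = X.shape[1]
    b = np.zeros(p)
    r = y.copy()  # residual y - X b (b starts at zero)
    for _ in range(n_iters):
        for j in range(p):
            # Unconstrained least-squares minimizer for coordinate j, others fixed.
            rho = X[:, j] @ r + b[j]
            # Hard thresholding: keep the coordinate only if it lowers the objective.
            b_new = rho if 0.5 * rho**2 > lam else 0.0
            r += X[:, j] * (b[j] - b_new)  # maintain the residual
            b[j] = b_new
    return b

# Illustrative usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
X /= np.linalg.norm(X, axis=0)  # normalize columns to unit norm
beta = np.zeros(50)
beta[:5] = 2.0
y = X @ beta + 0.1 * rng.standard_normal(200)
print("selected features:", np.nonzero(l0_coordinate_descent(X, y, lam=0.05))[0])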
In the second chapter, we propose an exact, scalable approach for L0-regularized linear regression. In particular, we develop a specialized nonlinear branch-and-bound (BnB) framework that solves a mixed integer programming (MIP) formulation of the problem. In a radical shift from modern MIP solvers, we solve the BnB subproblems using a specialized first-order method that exploits sparsity. Our open-source solver L0BnB can scale to instances with ~10^7 features, over 1000x larger than what modern MIP solvers can handle.
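For reference, one standard big-M MIP formulation of L0-regularized least squares introduces a binary selection variable z_i per feature; this is an illustrative sketch, and the formulation solved by L0BnB may add ridge regularization or strengthened relaxations:

\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^p} \ \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda \sum_{i=1}^{p} z_i
\quad \text{s.t.} \quad -M z_i \le \beta_i \le M z_i, \quad i = 1, \dots, p,

where z_i = 0 forces \beta_i = 0, so \sum_i z_i counts the selected features and M bounds the coefficient magnitudes. The BnB framework branches on the z_i and handles the node subproblems with the sparsity-exploiting first-order method.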
In the third chapter, we focus on L0-regularized classification. We propose a novel exact algorithm that solves the problem via a sequence of MIP subproblems, each involving a relatively small number of binary variables. The algorithm can scale to instances with 50,000 features. We also develop fast, approximate algorithms that generalize those of the first chapter. We show theoretically and empirically that our proposals can outperform popular sparse classification methods.
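A representative form of the objective, with the logistic loss chosen purely for illustration, is

\min_{\beta \in \mathbb{R}^p} \ \sum_{i=1}^{n} \log\!\big(1 + \exp(-y_i\, x_i^\top \beta)\big) + \lambda \|\beta\|_0,

which can again be modeled with binary selection variables as in the regression case; each MIP subproblem in the sequence then involves only a small number of these binary variables.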
In the last two chapters, we consider structured sparse learning problems, in which group or hierarchy constraints are imposed to enhance interpretability. We develop specialized convex and discrete optimization algorithms for these problems. Our experiments indicate that the proposed algorithms are more scalable and can achieve better statistical performance than existing methods.
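As an illustration of the structures involved (the exact formulations in these chapters may differ), group sparsity replaces the coordinate-wise L0 penalty with a count of nonzero groups,

\min_{\beta} \ \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda \sum_{g \in \mathcal{G}} \mathbf{1}\big[\beta_g \neq 0\big],

while hierarchy constraints couple the binary selection variables, e.g., requiring z_{\text{interaction}} \le z_{\text{main}} so that an interaction term may enter the model only when its parent main effects do.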
Date issued
2021-09
Department
Massachusetts Institute of Technology. Operations Research Center; Sloan School of Management
Publisher
Massachusetts Institute of Technology