DSpace@MIT

Sparse Learning using Discrete Optimization: Scalable Algorithms and Statistical Insights

Author(s)
Hazimeh, Hussein
Download: Thesis PDF (3.5 MB)
Advisor
Mazumder, Rahul
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Sparsity is a central concept in interpretable machine learning and high-dimensional statistics. While sparse learning problems can be naturally modeled using discrete optimization, computational challenges have historically shifted the focus towards alternatives based on continuous optimization and heuristics. Recently, growing evidence suggests that discrete optimization methods can obtain more interpretable models than popular alternatives. However, scalability issues are limiting the adoption of discrete methods and our understanding of their statistical properties. This thesis develops scalable discrete optimization methods and presents new statistical insights for a fundamental class of sparse learning problems.

In the first chapter, we consider the L0-regularized linear regression problem, which aims to select a subset of features that best predict the outcome. We propose fast, approximate algorithms, based on coordinate descent and local combinatorial optimization, and establish convergence guarantees. Empirically, we identify important high-dimensional settings where L0-based estimators achieve better statistical performance than popular sparse learning methods (e.g., based on L1 regularization). Our open-source implementation (L0Learn) can handle instances with millions of features and run up to 3x faster than state-of-the-art sparse learning toolkits.

In the second chapter, we propose an exact, scalable approach for L0-regularized linear regression. In particular, we develop a specialized nonlinear branch-and-bound (BnB) framework that solves a mixed integer programming (MIP) formulation of the problem. In a radical shift from modern MIP solvers, we solve the BnB subproblems using a specialized first-order method that exploits sparsity. Our open-source solver L0BnB can scale to instances with ~ 10^7 features, over 1000x larger than what modern MIP solvers can handle.

In the third chapter, we focus on L0-regularized classification. We propose an exact and novel algorithm that solves the problem via a sequence of MIP subproblems, each involving a relatively small number of binary variables. The algorithm can scale to instances with 50,000 features. We also develop fast, approximate algorithms that generalize those of the first chapter. We show theoretically and empirically that our proposals can outperform popular sparse classification methods.

In the last two chapters, we consider structured sparse learning problems, in which group or hierarchy constraints are imposed to enhance interpretability. We develop specialized convex and discrete optimization algorithms for these problems. Our experiments indicate that the proposed algorithms are more scalable and can achieve better statistical performance than existing methods.
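For concreteness, the core estimator studied in the first two chapters is L0-regularized least squares: minimize (1/(2n)) * ||y - X beta||^2 + lambda * ||beta||_0 over beta, where ||beta||_0 counts the nonzero coefficients. The Python sketch below illustrates the basic coordinate descent with hard thresholding idea for this objective. It is only an illustrative sketch, not the thesis's L0Learn implementation (which adds local combinatorial search, active-set updates, and optional L1/L2 penalties); the function name, defaults, and synthetic-data usage are hypothetical.

import numpy as np

def l0_coordinate_descent(X, y, lam, n_iters=100, tol=1e-8):
    # Minimal sketch: cyclic coordinate descent with hard thresholding for
    #     min_beta  (1 / (2 n)) * ||y - X beta||_2^2 + lam * ||beta||_0
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta                     # current residual (beta starts at 0)
    col_sq = (X ** 2).sum(axis=0)        # x_j^T x_j for each column
    for _ in range(n_iters):
        max_change = 0.0
        for j in range(p):
            r_j = r + X[:, j] * beta[j]            # partial residual excluding feature j
            b_tilde = X[:, j] @ r_j / col_sq[j]    # unpenalized coordinate minimizer
            # Keep the coefficient only if the drop in squared loss,
            # col_sq[j] * b_tilde^2 / (2 n), exceeds the L0 penalty lam.
            new_bj = b_tilde if col_sq[j] * b_tilde ** 2 > 2.0 * n * lam else 0.0
            r = r_j - X[:, j] * new_bj
            max_change = max(max_change, abs(new_bj - beta[j]))
            beta[j] = new_bj
        if max_change < tol:
            break
    return beta

# Hypothetical usage on synthetic data with 5 true nonzero coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta_true = np.zeros(50)
beta_true[:5] = 2.0
y = X @ beta_true + 0.1 * rng.standard_normal(200)
print(np.flatnonzero(l0_coordinate_descent(X, y, lam=0.05)))

The hard-thresholding rule follows directly from the objective: fixing all other coordinates, setting beta_j to its unpenalized minimizer reduces the squared-loss term by col_sq[j] * b_tilde^2 / (2 n), so the coefficient is kept only when that reduction outweighs the constant cost lam of adding a nonzero entry.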
Date issued
2021-09
URI
https://hdl.handle.net/1721.1/140050
Department
Massachusetts Institute of Technology. Operations Research Center; Sloan School of Management
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
