
dc.contributor.advisor: Mazumder, Rahul
dc.contributor.author: Hazimeh, Hussein
dc.date.accessioned: 2022-02-07T15:21:08Z
dc.date.available: 2022-02-07T15:21:08Z
dc.date.issued: 2021-09
dc.date.submitted: 2021-08-18T21:22:02.993Z
dc.identifier.uri: https://hdl.handle.net/1721.1/140050
dc.description.abstract: Sparsity is a central concept in interpretable machine learning and high-dimensional statistics. While sparse learning problems can be naturally modeled using discrete optimization, computational challenges have historically shifted the focus towards alternatives based on continuous optimization and heuristics. Recently, growing evidence suggests that discrete optimization methods can obtain more interpretable models than popular alternatives. However, scalability issues are limiting the adoption of discrete methods and our understanding of their statistical properties. This thesis develops scalable discrete optimization methods and presents new statistical insights for a fundamental class of sparse learning problems.

In the first chapter, we consider the L0-regularized linear regression problem, which aims to select a subset of features that best predict the outcome. We propose fast, approximate algorithms, based on coordinate descent and local combinatorial optimization, and establish convergence guarantees (a toy sketch of this coordinate-wise update appears after the record below). Empirically, we identify important high-dimensional settings where L0-based estimators achieve better statistical performance than popular sparse learning methods (e.g., based on L1 regularization). Our open-source implementation (L0Learn) can handle instances with millions of features and run up to 3x faster than state-of-the-art sparse learning toolkits.

In the second chapter, we propose an exact, scalable approach for L0-regularized linear regression. In particular, we develop a specialized nonlinear branch-and-bound (BnB) framework that solves a mixed integer programming (MIP) formulation of the problem. In a radical shift from modern MIP solvers, we solve the BnB subproblems using a specialized first-order method that exploits sparsity. Our open-source solver L0BnB can scale to instances with ~10^7 features, over 1000x larger than what modern MIP solvers can handle.

In the third chapter, we focus on L0-regularized classification. We propose an exact and novel algorithm that solves the problem via a sequence of MIP subproblems, each involving a relatively small number of binary variables. The algorithm can scale to instances with 50,000 features. We also develop fast, approximate algorithms that generalize those of the first chapter. We show theoretically and empirically that our proposals can outperform popular sparse classification methods.

In the last two chapters, we consider structured sparse learning problems, in which group or hierarchy constraints are imposed to enhance interpretability. We develop specialized convex and discrete optimization algorithms for these problems. Our experiments indicate that the proposed algorithms are more scalable and can achieve better statistical performance than existing methods.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Sparse Learning using Discrete Optimization: Scalable Algorithms and Statistical Insights
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Operations Research Center
dc.contributor.department: Sloan School of Management
dc.identifier.orcid: 0000-0003-4501-0678
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy
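
The abstract above centers on the L0-regularized least-squares problem, min over beta of 0.5*||y - X*beta||_2^2 + lambda*||beta||_0, and on coordinate descent updates that hard-threshold one coefficient at a time. What follows is a minimal, self-contained Python sketch of that style of update, intended only as an illustration: the function name, the unit-norm-column assumption, and the toy data are our own choices, and this is not the L0Learn or L0BnB code described in the thesis.

import numpy as np

def l0_coordinate_descent(X, y, lam, n_iters=100):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_0.
    Assumes the columns of X have unit L2 norm (illustrative sketch only)."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()            # residual y - X @ beta (beta starts at 0)
    thresh = np.sqrt(2.0 * lam)           # hard-thresholding level for unit-norm columns
    for _ in range(n_iters):
        for j in range(p):
            rho = X[:, j] @ r + beta[j]   # univariate least-squares fit for coordinate j
            new_bj = rho if abs(rho) > thresh else 0.0
            r += X[:, j] * (beta[j] - new_bj)   # keep the residual in sync
            beta[j] = new_bj
    return beta

# Toy usage: recover a 3-sparse signal from noisy measurements.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
X /= np.linalg.norm(X, axis=0)            # normalize columns to unit norm
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(100)
print(np.nonzero(l0_coordinate_descent(X, y, lam=0.05))[0])   # typically recovers indices 0, 1, 2

The sqrt(2*lam) threshold comes from comparing the cost of keeping coordinate j (lam, with the quadratic term driven to zero) against setting it to zero (rho^2/2), which is the standard coordinate-wise stationarity condition for this objective under unit-norm columns.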

