Advances in Sparse and Low Rank Matrix Optimization for Machine Learning Applications
Author(s)
Johnson, Nicholas André G.
Advisor
Bertsimas, Dimitris J.
Abstract
Numerous fundamental problems in operations research, machine learning, and statistics admit natural formulations as cardinality-constrained or rank-constrained optimization problems. Sparse solutions are desirable for their interpretability and storage benefits. Moreover, in the machine learning setting, sparse solutions exhibit superior model generalization and have a natural interpretation as performing feature extraction in high-dimensional datasets. Since the rank of a matrix equals the cardinality of its vector of singular values, rank can be interpreted as the matrix generalization of sparsity. Accordingly, low rank solutions inherit many of the desirable properties of sparse solutions while allowing for very flexible modelling. Unfortunately, optimizing over cardinality and rank constraints is non-convex and NP-hard in general, which has led to a strong reliance on convex relaxations and heuristic methods that yield sub-optimal solutions.
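To make the analogy between rank and sparsity concrete (this is a standard linear-algebra identity, not a result of the thesis): writing σ(X) for the vector of singular values of a matrix X,

    rank(X) = ||σ(X)||_0 = #{ i : σ_i(X) > 0 },

so a rank constraint on X is exactly a cardinality constraint on σ(X).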
This thesis advances both the theory and application of sparse and low rank matrix optimization, focusing on problems that arise in statistics and machine learning. We develop algorithmic approaches to problems exhibiting cardinality and rank constraints by leveraging techniques from mixed-integer and mixed-projection optimization. The proposed algorithms outperform existing convex relaxations and heuristics. Our rigorous analysis and empirical validation aim to contribute to both the theoretical foundations of optimization and the development of practical tools for complex challenges in statistics and machine learning.
Chapter 2 studies the Sparse Plus Low Rank Matrix Decomposition problem. We present an alternating minimization algorithm (the general idea is sketched below) that computes high-quality feasible solutions and outperforms benchmark methods, scaling to dimension n = 10000 in minutes. We additionally design a custom branch-and-bound algorithm that globally solves problem instances of dimension up to n = 25 in minutes.

Chapter 3 examines the Compressed Sensing problem, for which we present a custom branch-and-bound algorithm that computes globally optimal solutions. Our approach produces solutions that are on average 6.22% sparser on synthetic data and 9.95% sparser on real-world ECG data than state-of-the-art benchmark approaches. Moreover, our approach outperforms benchmark methods when used as part of a multi-label learning algorithm.

Chapter 4 explores the problem of learning a partially observed matrix that is predictive of fully observed side information, an important generalization of the Matrix Completion problem. We reformulate this problem as a mixed-projection optimization problem and present an alternating direction method of multipliers algorithm that solves problems with n = 10000 rows and m = 10000 columns in less than a minute. On large-scale real-world data, our algorithm produces solutions that achieve 67% lower out-of-sample error than benchmark methods in 97% less execution time.
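To illustrate the alternating minimization idea referenced in Chapter 2, here is a minimal Python sketch for decomposing Y ≈ S + L with S sparse and L low rank. The function name, the fixed iteration count, and the hard-thresholding update are illustrative assumptions, not the thesis's algorithm; its actual updates, stopping criteria, and guarantees are not reproduced here.

import numpy as np

def sparse_plus_low_rank(Y, rank, k, n_iters=100):
    """Heuristic alternating fit of Y ~ S + L with rank(L) <= rank
    and roughly k nonzeros in S (illustrative sketch only)."""
    S = np.zeros_like(Y)
    for _ in range(n_iters):
        # L-step: best rank-constrained approximation of Y - S via
        # truncated SVD (exact by the Eckart-Young theorem).
        U, s, Vt = np.linalg.svd(Y - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        # S-step: keep entries of the residual whose magnitude is at
        # least the k-th largest (ties may retain a few extra entries).
        R = Y - L
        cutoff = np.partition(np.abs(R), -k, axis=None)[-k]
        S = np.where(np.abs(R) >= cutoff, R, 0.0)
    return S, L

Each subproblem is solved exactly, truncated SVD for the rank constraint and hard thresholding for the cardinality constraint, which is what makes alternating between the two blocks a natural heuristic; certifying global optimality, as the thesis's branch-and-bound algorithm does on small instances, requires considerably more machinery.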
Date issued
2024-09
Department
Massachusetts Institute of Technology. Operations Research Center
Publisher
Massachusetts Institute of Technology