
dc.contributor.advisor: Parrilo, Pablo A.
dc.contributor.advisor: Shah, Devavrat
dc.contributor.author: Song, Dogyoon
dc.date.accessioned: 2022-01-14T14:59:42Z
dc.date.available: 2022-01-14T14:59:42Z
dc.date.issued: 2021-06
dc.date.submitted: 2021-06-23T19:40:42.585Z
dc.identifier.uri: https://hdl.handle.net/1721.1/139254
dc.description.abstract: Data-driven decision making has become indispensable in virtually every domain of human endeavor, fueled by the exponential growth in available data and the rapid increase in computing power. In principle, if the collected data contain sufficient information, it is possible to build a useful model for making decisions. Nevertheless, a few challenges must be addressed to make this possible in practice. First, the gathered data may be contaminated by noise or contain missing values. Second, building a model from data usually involves solving an optimization problem, which may require prohibitively large computational resources. In this thesis, we explore two research directions motivated by these two challenges. In the first part of the thesis, we consider statistical learning problems with missing data and discuss the efficacy of data imputation approaches in predictive modeling tasks. To this end, we first review low-rank matrix completion techniques and establish a novel error analysis for matrix estimation that goes beyond the traditional mean squared error (Frobenius norm), focusing on the singular value thresholding algorithm. Thereafter, we study two specific predictive settings -- namely, errors-in-variables regression and Q-learning with thrifty exploration -- and argue that predictions based on imputed data are typically nearly as accurate as predictions made from complete data. In the second part of the thesis, we investigate the tradeoff between scalability and solution quality in the context of approximate semidefinite programming. Specifically, we ask: “how closely can we approximate the set of unit-trace n × n positive semidefinite (PSD) matrices, denoted by Dⁿ, using at most N PSD constraints of size k × k?” We show that any set S that approximates Dⁿ within a constant approximation ratio must have superpolynomially large Sᵏ₊-extension complexity for every k = o(n / log n). These results imply that it is impossible to globally approximate a large-scale PSD cone using only a few smaller PSD constraints, and we therefore conclude that local, problem-adaptive techniques are essential for approximating SDPs at scale. (Illustrative sketches of the singular value thresholding step and of the PSD approximation question follow this record.)
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Addressing Missing Data and Scalable Optimization for Data-driven Decision Making
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid: https://orcid.org/0000-0001-5489-8213
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy
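
As a minimal, hypothetical sketch (not code from the thesis): the singular value thresholding estimator mentioned in the abstract can be illustrated in a few lines of Python/NumPy. The function name svt_estimate, the zero-fill-and-rescale proxy matrix, and the threshold choice below are illustrative assumptions rather than the thesis's exact formulation.

import numpy as np

def svt_estimate(Y, mask, threshold):
    """Singular value thresholding (SVT) sketch for low-rank matrix estimation.

    Y         : (n1, n2) array holding the observed (possibly noisy) entries.
    mask      : (n1, n2) boolean array, True where an entry is observed.
    threshold : singular values at or below this level are zeroed out.
    """
    p_hat = mask.mean()                          # estimated fraction of observed entries
    Y_proxy = np.where(mask, Y, 0.0) / p_hat     # zero-fill missing entries and rescale
    U, s, Vt = np.linalg.svd(Y_proxy, full_matrices=False)
    s_kept = np.where(s > threshold, s, 0.0)     # hard-threshold the singular values
    return (U * s_kept) @ Vt                     # reassemble the low-rank estimate

# Toy usage: recover a rank-2 matrix with roughly half of its entries missing.
rng = np.random.default_rng(0)
M = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 80))
mask = rng.random(M.shape) < 0.5
M_hat = svt_estimate(np.where(mask, M, 0.0), mask, threshold=3.0 * np.sqrt(sum(M.shape)))
print(np.linalg.norm(M_hat - M) / np.linalg.norm(M))  # relative estimation error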
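
Similarly, a hedged formalization of the approximation question from the second part of the abstract, assuming the standard notion of an Sᵏ₊-lift (the thesis's precise definitions may differ in detail):

\[
  D^n \;=\; \bigl\{\, X \in \mathbb{S}^n \;:\; X \succeq 0,\ \operatorname{tr}(X) = 1 \,\bigr\},
\]
\[
  \text{$S \subseteq \mathbb{S}^n$ has an $\mathbb{S}^k_+$-extension of size $N$ if }\;
  S \;=\; \pi\!\left( L \,\cap\, \bigl(\mathbb{S}^k_+\bigr)^{N} \right)
\]
for some affine subspace \(L\) and linear map \(\pi\); the \(\mathbb{S}^k_+\)-extension complexity of \(S\) is the smallest such \(N\). In this notation, the stated result says that if \(S\) approximates \(D^n\) within a constant factor, then its \(\mathbb{S}^k_+\)-extension complexity grows superpolynomially in \(n\) for every \(k = o(n/\log n)\).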

