
dc.contributor.advisor | Robert Freund. | en_US
dc.contributor.author | Berk, Lauren Elizabeth. | en_US
dc.contributor.other | Massachusetts Institute of Technology. Operations Research Center. | en_US
dc.date.accessioned | 2020-09-15T21:50:32Z
dc.date.available | 2020-09-15T21:50:32Z
dc.date.copyright | 2020 | en_US
dc.date.issued | 2020 | en_US
dc.identifier.uri | https://hdl.handle.net/1721.1/127291
dc.description | Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, May, 2020 | en_US
dc.description | Cataloged from the official PDF of thesis. | en_US
dc.description | Includes bibliographical references (pages 245-260). | en_US
dc.description.abstract | In this thesis, we propose novel formulations and optimization methods for four matrix factorization problems: sparse principal component analysis, compressed sensing, discrete component analysis, and latent Dirichlet allocation. For each new formulation, we develop efficient solution algorithms using discrete and robust optimization, and demonstrate tractability and effectiveness in computational experiments. In Chapter 1, we develop a framework for matrix factorization problems and provide a technical introduction to topic modeling with examples. Chapter 2, Certifiably optimal sparse principal component analysis, addresses the sparse principal component analysis (SPCA) problem. We propose a tailored branch-and-bound algorithm, Optimal-SPCA, that enables us to solve SPCA to certifiable optimality. | en_US
dc.description.abstract | We apply our methods to real data sets to demonstrate that our approach scales well and provides superior solutions compared to existing methods, explaining a higher proportion of variance and permitting more control over the desired sparsity. Chapter 3, Optimal compressed sensing in submodular settings, presents a novel algorithm for compressed sensing that guarantees optimality under submodularity conditions rather than restricted isometry property (RIP) conditions. The algorithm defines submodularity properties of the loss function, derives lower bounds, and generates these lower bounds as constraints for use in a cutting-plane algorithm. The chapter also develops a local search heuristic based on this exact algorithm. Chapter 4, Robust topic modeling, develops a new form of topic modeling inspired by robust optimization and by discrete component analysis. | en_US
dc.description.abstract | The new approach builds uncertainty sets using one-sided constraints and two hypothesis tests, and uses alternating optimization and projected gradient methods, including Adam and mirror descent, to find good local optima. In computational experiments, we demonstrate that these models are better able to avoid over-fitting than LDA and PLSA, and result in more accurate reconstruction of the underlying topic matrices. In Chapter 5, we develop modifications to latent Dirichlet allocation to account for differences in the distribution of topics by authors. The chapter adds author-specific topic priors to the generative process and allows for co-authorship, providing the model with increased degrees of freedom and enabling it to model an enhanced set of problems. The Julia code for the algorithms developed in each chapter is freely available on GitHub at https://github.com/lauren897 | en_US
dc.description.statementofresponsibility | by Lauren Elizabeth Berk. | en_US
dc.format.extent | 260 pages | en_US
dc.language.iso | eng | en_US
dc.publisher | Massachusetts Institute of Technology | en_US
dc.rights | MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. | en_US
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US
dc.subject | Operations Research Center. | en_US
dc.title | New optimization approaches to matrix factorization problems with connections to natural language processing | en_US
dc.type | Thesis | en_US
dc.description.degree | Ph. D. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Operations Research Center | en_US
dc.contributor.department | Sloan School of Management
dc.identifier.oclc | 1191900766 | en_US
dc.description.collection | Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center | en_US
dspace.imported | 2020-09-15T21:50:32Z | en_US
mit.thesis.degree | Doctoral | en_US
mit.thesis.department | Sloan | en_US
mit.thesis.department | OperRes | en_US

