Large-Scale Algorithms for Machine Learning: Efficiency, Estimation Errors, and Beyond
Author(s)
Wang, Haoyue
Advisor
Mazumder, Rahul
Abstract
Optimization algorithms are a cornerstone of machine learning and statistical inference. The advent of large-scale datasets introduces computational challenges and necessitates more efficient algorithms. Modern optimization techniques are often tailored to a particular machine learning problem: by leveraging the problem's unique structure, they achieve better efficiency than generic methods applied to the same problem. Another key aspect of studying learning algorithms is understanding the estimation accuracy of the resulting estimator. In some scenarios, exact optimization on the training set may be impractical, yet simple and efficient heuristics can achieve good estimation accuracy within an appropriate statistical framework. In this thesis, we examine several large-scale algorithms from both the optimization and the statistical perspectives.

In Chapters 2 and 3, we study two continuous optimization algorithms tailored to structural constraints. Chapter 2 presents a generalized Frank-Wolfe method for problems whose feasible region is unbounded and cylinder-like. Chapter 3 presents a coordinate-descent-like method for polyhedral constraints with a small number of extreme points. Both methods achieve state-of-the-art performance because they exploit the structure of the problems they solve.

In Chapter 4, we study a variant of linear regression with possible mismatches between covariate-response pairs. We study a simple and efficient heuristic method and give a rigorous analysis of its estimation error in a statistical setting.

In Chapters 5 and 6, we examine two algorithms for decision trees. Chapter 5 studies the computation of optimal decision trees and introduces a new branch-and-bound method for optimal decision trees with general continuous features. Chapter 6 turns to the analysis of the CART algorithm under a sufficient impurity decrease condition. We prove tight error bounds for signal functions under this condition and discuss several function classes that satisfy it.

Chapter 7 studies a density estimation problem with shape restrictions. We propose a cubic Newton framework for the computation and also study the approximation properties of finite mixtures.
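To give a flavor of the Frank-Wolfe family of methods studied in Chapter 2, the sketch below implements the classic variant for a compact constraint set with a linear minimization oracle. It is a minimal illustration, not the thesis's generalized method for unbounded, cylinder-like regions; the names frank_wolfe and lmo, the step-size rule, and the simplex example are illustrative assumptions.

import numpy as np

def frank_wolfe(grad, lmo, x0, steps=100):
    # Classic Frank-Wolfe: at each iteration, call a linear
    # minimization oracle (LMO) over the constraint set and
    # move toward the returned extreme point.
    x = x0.copy()
    for t in range(steps):
        g = grad(x)
        s = lmo(g)               # argmin over the set of <g, s>
        gamma = 2.0 / (t + 2.0)  # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s
    return x

# Example (hypothetical): least squares over the probability simplex,
# whose LMO returns the coordinate vector at the most negative
# gradient entry.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
grad = lambda x: 2.0 * A.T @ (A @ x - b)
lmo = lambda g: np.eye(5)[np.argmin(g)]
x_hat = frank_wolfe(grad, lmo, x0=np.full(5, 0.2))

The projection-free character of the method is what makes structure-aware variants attractive: each iteration needs only a linear oracle, which is cheap for the constraint sets considered in the thesis.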
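Similarly, the impurity decrease condition analyzed in Chapter 6 concerns the quantity CART greedily maximizes when it splits a node. The sketch below is a minimal, hypothetical illustration of one CART split step for regression on a single feature, using mean-squared-error impurity; the helper best_split is not code from the thesis.

import numpy as np

def best_split(x, y):
    # One CART step for regression: scan candidate thresholds on a
    # single feature and pick the split with the largest decrease in
    # mean-squared-error impurity (sum of squared deviations).
    order = np.argsort(x)
    x_s, y_s = x[order], y[order]
    n = len(y_s)
    parent = y_s.var() * n
    best_gain, best_thr = -np.inf, None
    for i in range(1, n):
        if x_s[i] == x_s[i - 1]:
            continue  # no threshold separates equal feature values
        left, right = y_s[:i], y_s[i:]
        child = left.var() * i + right.var() * (n - i)
        gain = parent - child
        if gain > best_gain:
            best_gain, best_thr = gain, (x_s[i - 1] + x_s[i]) / 2.0
    return best_thr, best_gain / n  # impurity decrease per sample

A sufficient impurity decrease condition, roughly, requires that the best split at each node reduce impurity by at least a constant fraction of the node's impurity, which is what drives the error bounds proved in Chapter 6.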
Date issued
2024-05
Department
Massachusetts Institute of Technology. Operations Research Center; Sloan School of Management
Publisher
Massachusetts Institute of Technology