| dc.description.abstract | Optimization algorithms have long been fundamental tools across science and engineering, and are now also at the center of the rise of machine learning and artificial intelligence. However, extracting good practical performance from many of these algorithms depends on careful manual calibration and tuning. In fact, the algorithms with the best theoretical guarantees do not always perform best in practice. In this light, reducing the effort and skill required to set up optimization algorithms can save immeasurable amounts of time and resources. This thesis makes two contributions to this end, proposing optimization algorithms that require less supervision by adaptively selecting step-sizes and estimating problem parameters online. First, we revisit the foundational subroutine called backtracking line search (BLS). Typically, a base algorithm calls BLS to search for a parameter (e.g., step-size) such that the iterates of that algorithm satisfy a given condition (e.g., Armijo, descent lemma) that leads to desirable behavior (e.g., reducing the value of the objective function). To find a feasible parameter, BLS successively adjusts a parameter candidate by a constant factor until the given condition is satisfied. We propose to instead adjust the parameter candidate by an adaptive factor that takes into account the degree to which the given condition is violated. This adaptive BLS (ABLS) subroutine adds no computational burden relative to BLS, but can lead to significantly better practical results. Experiments on over fifteen real-world datasets demonstrate that ABLS can be more robust than BLS to problem setups and can require significantly fewer condition evaluations to return higher-quality parameters. At the same time, we prove that ABLS enjoys essentially the same theoretical guarantees as BLS. The second contribution of this thesis is a parameter-free algorithm for smooth, strongly convex problems called NAG-free.
To our knowledge, NAG-free is the first adaptive algorithm capable of directly estimating the strong convexity parameter without priors or resorting to restart schemes. We prove that NAG-free converges globally at least as fast as gradient descent, and achieves accelerated convergence locally if the Hessian is locally smooth and other mild assumptions hold. Prominent classes of machine learning problems with locally smooth Hessians include the regularized logistic loss, ridge regression, exponential family negative log-likelihoods with bounded natural parameters, and Moreau envelope smoothing. We present real-world experiments in which NAG-free performs on par with restart schemes, demonstrating that it can adapt to more favorable local curvature, as captured by the smoothness and strong convexity parameters. | |
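The adaptive backtracking idea summarized in the abstract can be illustrated with a minimal sketch. The version below searches for a step-size satisfying the Armijo condition; with `adaptive=False` it shrinks the candidate by a constant factor as in standard BLS, while with `adaptive=True` the shrink factor is derived from a quadratic model fit to the observed violation. This is only one illustrative way to use "the degree to which the condition is violated"; the function name, the clipping interval, and the specific adaptive rule are assumptions, and the thesis' actual ABLS rule may differ.

```python
import numpy as np

def armijo_backtracking(f, grad, x, t0=1.0, c=1e-4, beta=0.5,
                        adaptive=True, max_iter=50):
    """Backtracking line search for gradient descent on the Armijo condition
    f(x - t g) <= f(x) - c * t * ||g||^2.

    Returns the accepted step-size and the number of condition evaluations.
    """
    g = grad(x)
    fx = f(x)
    gg = float(g @ g)
    t = t0
    evals = 0
    for _ in range(max_iter):
        fy = f(x - t * g)
        evals += 1
        if fy <= fx - c * t * gg:          # Armijo condition satisfied
            return t, evals
        if adaptive:
            # Fit the model f(x - t g) ~ fx - t*gg + 0.5*L*t^2*gg to the
            # observed violation; its minimizer 1/L suggests the next t.
            # Clip to [0.1*t, beta*t] so we always shrink, but never by
            # more than a factor of 10 per trial (illustrative choice).
            L = 2.0 * (fy - fx + t * gg) / (t * t * gg)
            t_new = 1.0 / L if L > 0 else beta * t
            t = min(max(t_new, 0.1 * t), beta * t)
        else:
            t *= beta                      # standard BLS: constant factor
    return t, evals
```

On a badly scaled quadratic, the adaptive variant typically reaches an acceptable step in fewer condition evaluations than constant-factor halving, mirroring the behavior the abstract reports for ABLS.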