Fundamental Limits of Learning for Generalizability, Data Resilience, and Resource Efficiency
Author(s)
Blanchard, Moïse
Advisor
Jaillet, Patrick
Abstract
As machine learning models advance and their range of applications rapidly expands, learning algorithms should not only have the capacity to learn complex tasks, but also be resilient to imperfect data, all while remaining resource-efficient. This thesis explores trade-offs between these three core challenges in statistical learning theory. We aim to understand the limits of learning algorithms across a wide range of machine learning and optimization settings, with the goal of providing adaptable, robust, and efficient learning algorithms for decision-making.
In Part I of this thesis, we study the limits of learning with respect to generalizability and data assumptions, following the universal learning framework. In universal learning, we seek general algorithms that have convergence guarantees for any objective task, without structural restrictions. While this cannot be achieved without conditions on the training data, we show that, in general, it is possible well beyond standard statistical assumptions. More broadly, we aim to characterize provably minimal assumptions under which universal learning is possible, and to provide algorithms that learn under these minimal assumptions. After giving a detailed overview of the framework and a summary of our results in Chapter 2, we investigate universal learnability across a wide range of machine learning settings: with full feedback, realizable online learning (Chapter 3) and supervised learning with arbitrary or adversarial noise (Chapter 4); with partial feedback, standard contextual bandits (Chapter 5) and, as a first step towards more complex reinforcement learning settings, contextual bandits with non-stationary or adversarial rewards (Chapter 6).
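To make the universal learning objective concrete, the realizable online setting of Chapter 3 can be sketched as follows (the notation here is ours, introduced only for illustration, not taken from the thesis): given a data process $(X_t)_{t\ge 1}$ and an unknown measurable target $f^\star$, the learner predicts $\hat{Y}_t$ from the past observations $(X_1, f^\star(X_1)), \ldots, (X_{t-1}, f^\star(X_{t-1}))$ together with $X_t$, and is universally consistent under the process if

\[
\frac{1}{T}\sum_{t=1}^{T} \ell\big(\hat{Y}_t, f^\star(X_t)\big) \;\xrightarrow[T\to\infty]{}\; 0 \quad \text{almost surely, for every measurable target } f^\star .
\]

The question studied in Part I is which processes $(X_t)_{t\ge 1}$ admit a learning rule with this guarantee, and whether a single rule can achieve it for all such processes simultaneously.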
We investigate the impact of resource constraints in Part II, focusing on memory constraints in convex optimization. The efficiency of optimization algorithms is typically measured by the number of calls to a first-order oracle, which provides value and gradient information on the function, aptly referred to as oracle-complexity. However, this may not be the only bottleneck; understanding the trade-offs with the use of other resources, such as memory, could pave the way for more practical optimization algorithms. Following this reasoning, we make advances in characterizing the achievable regions for optimization algorithms in the oracle-complexity/memory landscape. In Chapter 7 we show that full memory is necessary to achieve the optimal oracle-complexity for deterministic algorithms; hence, classical cutting-plane methods are Pareto-optimal in the oracle-complexity/memory trade-off. On the positive side, in Chapter 8 we provide memory-efficient algorithms for high-accuracy regimes (sub-polynomial in the dimension). In exponential-accuracy regimes, these algorithms strictly improve on the oracle-complexity of gradient descent while preserving its optimal memory usage. These algorithms can in fact be used for the more general feasibility problem, for which we give improved lower-bound trade-offs in Chapter 9. These results imply that in standard accuracy regimes (polynomial in the dimension) gradient descent is also Pareto-optimal, and they reveal a phase transition in the oracle-complexity of memory-constrained algorithms.
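For context, the two classical reference points in this landscape (stated informally here, from standard convex optimization results rather than from the thesis, for minimizing a 1-Lipschitz convex function over the unit ball in $\mathbb{R}^d$ to accuracy $\varepsilon$) are

\[
\text{cutting-plane methods: } O\!\big(d \ln(1/\varepsilon)\big) \text{ oracle calls with } \tilde{O}(d^2) \text{ bits of memory}, \qquad \text{gradient descent: } O\!\big(1/\varepsilon^2\big) \text{ oracle calls with } \tilde{O}(d) \text{ bits of memory}.
\]

The results of Chapters 7 through 9 delineate which intermediate combinations of oracle-complexity and memory are achievable between these two extremes.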
Date issued
2024-05
Department
Massachusetts Institute of Technology. Operations Research Center; Sloan School of Management
Publisher
Massachusetts Institute of Technology