Foundations of Machine Learning: Over-parameterization and Feature Learning
Author(s)
Radhakrishnan, Adityanarayanan
Advisor
Uhler, Caroline
Abstract
In this thesis, we establish and analyze two core principles driving the success of neural networks: over-parameterization and feature learning. We leverage these principles to design models with improved performance and interpretability on various computer vision and biomedical applications.
We begin by discussing the benefits of over-parameterization, i.e., using increasingly large networks that can perfectly fit training data. While prior work characterized the benefits of over-parameterized networks for supervised learning tasks, we show that over-parameterization is also beneficial for unsupervised learning problems, such as autoencoding. The ubiquitous advantage of using increasingly large networks suggests that infinitely large networks should yield the best performance. Remarkably, under certain conditions, training infinitely wide networks simplifies to training classical models known as kernel machines with a specific kernel, the Neural Tangent Kernel (NTK). We showcase the practical value of the NTK by deriving and using it for matrix completion problems such as image inpainting and virtual drug screening. Additionally, we use the NTK connection to provide theoretical guarantees for deep neural networks. Namely, we construct interpolating infinitely wide and deep networks that are Bayes optimal, or consistent, for classification.
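To make the kernel-machine correspondence concrete, the following minimal sketch (an illustration, not the thesis' derivation) fits kernel ridge regression with the standard closed-form NTK of an infinitely wide one-hidden-layer ReLU network; the normalization, the ridge term, and the synthetic data are assumptions made only for this example.

import numpy as np

def relu_ntk(X, Z):
    # Closed-form NTK of an infinitely wide one-hidden-layer ReLU network
    # (standard formula, up to normalization; the thesis derives kernels for
    # the specific architectures and tasks it studies).
    nx = np.linalg.norm(X, axis=1, keepdims=True)        # (n, 1)
    nz = np.linalg.norm(Z, axis=1, keepdims=True)        # (m, 1)
    cos = np.clip((X @ Z.T) / (nx * nz.T), -1.0, 1.0)    # cosine of angle between inputs
    theta = np.arccos(cos)
    # sum of first-layer and second-layer gradient contributions
    return (nx * nz.T) / (2 * np.pi) * (np.sin(theta) + 2 * (np.pi - theta) * cos)

def ntk_regression(X_train, y_train, X_test, ridge=1e-6):
    # Training the infinitely wide network reduces to solving a linear system
    # in the kernel (kernel ridge regression).
    K = relu_ntk(X_train, X_train)
    alpha = np.linalg.solve(K + ridge * np.eye(len(K)), y_train)
    return relu_ntk(X_test, X_train) @ alpha

# Toy usage on synthetic data (illustration only):
X = np.random.randn(100, 5)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(100)
preds = ntk_regression(X, y, np.random.randn(20, 5))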
While the NTK has been a useful tool for understanding properties of deep networks, it lacks a key component that is critical to the success of neural networks: feature learning. In the second part of this thesis, we identify and mathematically characterize the mechanism through which deep neural networks automatically select features, or patterns in data. We show that neural feature learning occurs by re-weighting features based on how much they change predictions upon perturbation. Our result explains various deep learning phenomena such as spurious features, lottery tickets, and grokking. Moreover, the mechanism identified in our work provides a backpropagation-free method for feature learning with any machine learning model. To demonstrate the effectiveness of this feature learning mechanism, we use it to enable feature learning in kernel machines. We show that the resulting models, referred to as Recursive Feature Machines, achieve state-of-the-art performance on tabular data.
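As a rough illustration of this mechanism, the sketch below alternates kernel regression with a feature re-weighting step that measures, by finite differences, how much perturbing each coordinate changes the model's predictions; the Laplace kernel, bandwidth, ridge term, and update rule are illustrative assumptions and not the thesis' exact Recursive Feature Machine.

import numpy as np

def laplace_kernel(X, Z, M, bandwidth=10.0):
    # Laplace kernel with a learned feature re-weighting (Mahalanobis) matrix M.
    XM, ZM = X @ M, Z @ M
    d2 = (np.sum(XM * X, axis=1)[:, None]
          + np.sum(ZM * Z, axis=1)[None, :]
          - 2.0 * XM @ Z.T)
    return np.exp(-np.sqrt(np.maximum(d2, 0.0)) / bandwidth)

def rfm_sketch(X, y, iters=5, ridge=1e-3, eps=1e-3):
    # Alternate between (1) fitting a kernel machine and (2) re-weighting
    # features by how much perturbing them changes the predictions
    # (an average gradient outer product), with no backpropagation.
    n, d = X.shape
    M = np.eye(d)
    for _ in range(iters):
        K = laplace_kernel(X, X, M)
        alpha = np.linalg.solve(K + ridge * np.eye(n), y)
        f = K @ alpha
        G = np.zeros((n, d))
        for j in range(d):
            Xp = X.copy()
            Xp[:, j] += eps
            G[:, j] = (laplace_kernel(Xp, X, M) @ alpha - f) / eps
        M = G.T @ G / n   # coordinates that move predictions most get up-weighted
    return M, alpha

# Toy usage (illustration only): y depends only on the first coordinate.
X = np.random.randn(200, 10)
y = np.sign(X[:, 0])
M, alpha = rfm_sketch(X, y)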
Overall, this thesis advances the foundations of machine learning and provides tools for building new machine learning models that are computationally simple, interpretable, and effective.
Date issued
2023-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology