Characterizations of how neural networks learn

Boix-Adsera, Enric

dc.contributor.advisor	Bresler, Guy
dc.contributor.advisor	Rigollet, Philippe
dc.contributor.author	Boix-Adsera, Enric
dc.date.accessioned	2024-08-21T18:55:30Z
dc.date.available	2024-08-21T18:55:30Z
dc.date.issued	2024-05
dc.date.submitted	2024-07-10T13:01:25.993Z
dc.identifier.uri	https://hdl.handle.net/1721.1/156306
dc.description.abstract	Training neural network architectures on Internet-scale datasets has led to many recent advances in machine learning. However, the mechanisms underlying how neural networks learn from data are largely opaque. This thesis develops a mechanistic understanding of how neural networks learn in several settings, as well as new tools to analyze trained networks. First, we study data where the labels depend on an unknown low-dimensional subspace of the input (i.e., the multi-index setting). We identify the “leap complexity”, which is a quantity that we argue characterizes how much data networks need in order to learn. Our analysis reveals a saddle-to-saddle dynamic in network training, where training alternates between loss plateaus and sharp drops in the loss. Furthermore, we show that network weights evolve such that the trained weights are a low-rank perturbation of the original weights. We observe this effect empirically in state-of-the-art transformer models trained on image and text data. Second, we study the ability of language models to learn to reason. On a family of “relational reasoning” tasks, we prove that modern transformers learn to reason with enough data, but classical fully-connected architectures do not. Our analysis suggests small architectural modifications that improve data efficiency. Finally, we construct new tools to interpret trained networks. These are: (a) a definition of distance between two models that captures their functional similarity, and (b) a distillation algorithm to efficiently extract interpretable decision-tree structure from a trained model when possible.
dc.publisher	Massachusetts Institute of Technology
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title	Characterizations of how neural networks learn
dc.type	Thesis
dc.description.degree	Ph.D.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Doctoral
thesis.degree.name	Doctor of Philosophy

Files in this item

Name: boix-eboix-phd-eecs-2024-thesis.pdf
Size: 14.21 MB
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Doctoral Theses
