Structuring Representations in Deep Learning: Symmetries and Linear Models
Author(s)
Lawrence, Hannah
Advisor
Moitra, Ankur
Abstract
The ability of deep neural networks to learn rich data representations is considered paramount to understanding their behavior and empirical success. In particular, imposing known structure on learned representations via careful architecture choice has proven impactful for problems with underlying symmetries. Conversely, discovering the similarity structure between different representations, even in the absence of such explicit priors, provides a valuable tool for comparing the architectures that gave rise to them. In this thesis, we study three aspects of deep learning theory through the lens of structured representations: architecture optimization, approximation, and comparison.

First, we examine the implicit bias of gradient descent on linear group convolutional networks (G-CNNs), which provide a model for learning highly structured representations. For such architectures, we prove that gradient descent implicitly minimizes the network's Schatten norm in Fourier space [Lawrence et al., 2022]. While the explicit bias of equivariant networks is the main reason for their use, this result indicates that a structured implicit bias may also shape the functions they learn.

Next, we expand on existing universality results for equivariant architectures. In contrast to the exponential dependence on dimension in existing universality results, we show that certain smooth subclasses of invariant functions, analogous to Barron classes of functions, can be efficiently approximated by architectures that capture invariant representations.

Finally, we define a new metric for probing the structure of arbitrary learned representations [Boix-Adserà et al., 2022]. In particular, we embed trained representations into a shared metric space, based on the principle that representations are "close" if they behave similarly on downstream linear regression tasks. This metric, termed GULP, is invariant under unitary transformations, and empirically provides an effective method for comparing learned representations across different architectures.
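The comparison principle described above can be sketched concretely: score two feature matrices by how differently ridge regressors trained on each of them predict random downstream targets. The following is only an illustrative Monte Carlo sketch of that principle, not the thesis's closed-form metric; the function name `gulp_style_distance`, the isotropic random-task prior, and the ridge parameter `lam` are assumptions made for illustration.

```python
import numpy as np

def gulp_style_distance(phi_a, phi_b, lam=1e-2, n_tasks=200, seed=0):
    """Illustrative sketch: two representations are 'close' if ridge
    regression on top of each yields similar predictions on random
    downstream tasks. phi_a, phi_b are (n_samples, dim) feature arrays
    computed on the same inputs."""
    rng = np.random.default_rng(seed)
    n = phi_a.shape[0]
    # Center each representation, as linear-probe comparisons typically do.
    phi_a = phi_a - phi_a.mean(axis=0)
    phi_b = phi_b - phi_b.mean(axis=0)
    diffs = []
    for _ in range(n_tasks):
        # Random downstream regression target (isotropic task prior: an assumption).
        y = rng.standard_normal(n)
        preds = []
        for phi in (phi_a, phi_b):
            d = phi.shape[1]
            # Ridge regression: beta = (Phi^T Phi / n + lam I)^{-1} Phi^T y / n.
            beta = np.linalg.solve(phi.T @ phi / n + lam * np.eye(d),
                                   phi.T @ y / n)
            preds.append(phi @ beta)
        # Squared disagreement between the two downstream predictors.
        diffs.append(np.mean((preds[0] - preds[1]) ** 2))
    return float(np.mean(diffs))
```

Because an orthogonal change of basis of the features leaves ridge predictions unchanged, this sketch inherits the unitary invariance noted above: rotating one representation does not change the distance.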
Date issued
2022-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology