Structuring Representations in Deep Learning: Symmetries and Linear Models
Author(s)
Lawrence, Hannah
Advisor
Moitra, Ankur
Abstract
The ability of deep neural networks to learn rich data representations is considered paramount to understanding their behavior and empirical success. In particular, imposing known structure on learned representations via careful architecture choice has proven impactful for problems with underlying symmetries. Conversely, discovering the similarity structure between different representations, even in the absence of such explicit priors, provides a valuable tool for comparing the architectures that gave rise to them. In this thesis, we study three aspects of deep learning theory through the lens of structured representations: architecture optimization, approximation, and comparison.

First, we examine the implicit bias of gradient descent on linear group convolutional networks (G-CNNs), which provide a model for learning highly structured representations. For such architectures, we prove that gradient descent implicitly minimizes the network's Schatten norm in Fourier space [Lawrence et al., 2022]. While the explicit bias of equivariant networks is the main reason for their use, this result indicates that a structured implicit bias may also shape the functions they learn.

Next, we expand on existing universality results for equivariant architectures. In contrast to the exponential dependence on dimension in existing universality results, we show that certain smooth subclasses of invariant functions, analogous to Barron classes of functions, can be efficiently approximated by architectures that capture invariant representations.

Finally, we define a new metric for probing the structure of arbitrary learned representations [Boix-Adserà et al., 2022]. In particular, we embed trained representations into a shared metric space, based on the principle that representations are "close" if they behave similarly on downstream linear regression tasks. This metric, termed GULP, is invariant under unitary transformations, and empirically provides an effective method for comparing learned representations across different architectures.
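The comparison principle described above can be sketched concretely: score two feature matrices by how differently ridge regressors trained on each of them predict random downstream targets. The following is only an illustrative Monte Carlo sketch of that principle, not the thesis's closed-form metric; the function name `gulp_style_distance`, the isotropic random-task prior, and the ridge parameter `lam` are assumptions made for illustration.

```python
import numpy as np

def gulp_style_distance(phi_a, phi_b, lam=1e-2, n_tasks=200, seed=0):
    """Illustrative sketch: two representations are 'close' if ridge
    regression on top of each yields similar predictions on random
    downstream tasks. phi_a, phi_b are (n_samples, dim) feature arrays
    computed on the same inputs."""
    rng = np.random.default_rng(seed)
    n = phi_a.shape[0]
    # Center each representation, as linear-probe comparisons typically do.
    phi_a = phi_a - phi_a.mean(axis=0)
    phi_b = phi_b - phi_b.mean(axis=0)
    diffs = []
    for _ in range(n_tasks):
        # Random downstream regression target (isotropic task prior: an assumption).
        y = rng.standard_normal(n)
        preds = []
        for phi in (phi_a, phi_b):
            d = phi.shape[1]
            # Ridge regression: beta = (Phi^T Phi / n + lam I)^{-1} Phi^T y / n.
            beta = np.linalg.solve(phi.T @ phi / n + lam * np.eye(d),
                                   phi.T @ y / n)
            preds.append(phi @ beta)
        # Squared disagreement between the two downstream predictors.
        diffs.append(np.mean((preds[0] - preds[1]) ** 2))
    return float(np.mean(diffs))
```

Because an orthogonal change of basis of the features leaves ridge predictions unchanged, this sketch inherits the unitary invariance noted above: rotating one representation does not change the distance.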
Date issued
2022-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology