Show simple item record

dc.contributor.advisor: Moitra, Ankur
dc.contributor.author: Lawrence, Hannah
dc.date.accessioned: 2023-01-19T19:59:10Z
dc.date.available: 2023-01-19T19:59:10Z
dc.date.issued: 2022-09
dc.date.submitted: 2022-10-19T18:57:39.739Z
dc.identifier.uri: https://hdl.handle.net/1721.1/147568
dc.description.abstract: The ability of deep neural networks to learn rich data representations is considered paramount to understanding their behavior and empirical success. In particular, imposing known structure on learned representations via careful architecture choice has proven impactful for problems with underlying symmetries. Conversely, discovering the similarity structure between different representations — even in the absence of such explicit priors — provides a valuable tool for comparing the architectures that gave rise to them. In this thesis, we study three aspects of deep learning theory through the lens of structured representations: architecture optimization, approximation, and comparison. First, we examine the implicit bias of gradient descent on linear group convolutional networks (G-CNNs), which provide a model for learning highly structured representations. For such architectures, we prove that gradient descent implicitly minimizes the net's Schatten norm in Fourier space [Lawrence et al., 2022]. While the explicit bias of equivariant nets is the main reason for their usage, this result indicates that a structured implicit bias may impact the types of functions they learn as well. Next, we expand on existing universality results for equivariant architectures. In contrast to the exponential dependence on dimension of existing universality results, we demonstrate that certain smooth subclasses of invariant functions, analogous to Barron classes of functions, can be efficiently approximated using architectures which capture invariant representations. Finally, we define a new metric for probing the structure of arbitrary learned representations [Boix-Adserà et al., 2022]. In particular, we embed trained representations into a shared metric space, based on the principle that representations are "close" if they behave similarly on downstream linear regression tasks. This metric, termed GULP, is invariant under unitary transformations and empirically provides an effective method for comparing learned representations across different architectures.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Structuring Representations in Deep Learning: Symmetries and Linear Models
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science

