Show simple item record

dc.contributor.advisor: Moitra, Ankur
dc.contributor.author: Lawrence, Hannah
dc.date.accessioned: 2023-01-19T19:59:10Z
dc.date.available: 2023-01-19T19:59:10Z
dc.date.issued: 2022-09
dc.date.submitted: 2022-10-19T18:57:39.739Z
dc.identifier.uri: https://hdl.handle.net/1721.1/147568
dc.description.abstract: The ability of deep neural networks to learn rich data representations is considered paramount to understanding their behavior and empirical success. In particular, imposing known structure on learned representations via careful architecture choice has proven impactful for problems with underlying symmetries. Conversely, discovering the similarity structure between different representations — even in the absence of such explicit priors — provides a valuable tool for comparing the architectures that gave rise to them. In this thesis, we study three aspects of deep learning theory through the lens of structured representations: architecture optimization, approximation, and comparison. First, we examine the implicit bias of gradient descent on linear group convolutional networks (G-CNNs), which provide a model for learning highly structured representations. For such architectures, we prove that gradient descent implicitly minimizes the net's Schatten norm in Fourier space [Lawrence et al., 2022]. While the explicit bias of equivariant nets is the main reason for their usage, this result indicates that a structured implicit bias may impact the types of functions they learn as well. Next, we expand on existing universality results for equivariant architectures. In contrast to the exponential dependence on dimension of existing universality results, we demonstrate that certain smooth subclasses of invariant functions, analogous to Barron classes of functions, can be efficiently approximated using architectures which capture invariant representations. Finally, we define a new metric for probing the structure of arbitrary learned representations [Boix-Adserà et al., 2022]. In particular, we embed trained representations into a shared metric space, based on the principle that representations are "close" if they behave similarly on downstream linear regression tasks. This metric, termed GULP, is invariant under unitary transformations and empirically provides an effective method for comparing learned representations across different architectures.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Structuring Representations in Deep Learning: Symmetries and Linear Models
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science

