DSpace@MIT

Modeling the Geometry of Neural Network Representation Spaces

Author(s)
Robinson, Joshua David
Thesis PDF (32.37 MB)
Advisor
Jegelka, Stefanie
Sra, Suvrit
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Neural networks automate the process of representing objects and their relations on a computer, including everything from household items to molecules. New representations are obtained by transforming different instances into a shared representation space, where variations in data can be measured using simple geometric quantities such as Euclidean distances. This thesis studies the geometric structure of this space and its influence on key properties of the learning process, including how much data is needed to acquire new skills, when predictions will fail, and the computational cost of learning. We examine two foundational aspects of the geometry of neural network representations. Part I designs and studies learning algorithms that take into account the location of data in representation space. Focusing on contrastive self-supervised learning, we design a) hard instance sampling strategies and b) methods for controlling what features models learn. Each produces improvements in key characteristics, such as training speed, generalization, and model reliability. Part II studies how to use non-Euclidean geometries to build network architectures that respect symmetries and structures arising in physical data, providing a powerful inductive bias for learning. Specifically, we use geometric spaces such as the real projective plane and the spectraplex to build a) provably powerful neural networks that respect the symmetries of eigenvectors, which is important for building Transformers on graph-structured data, and b) neural networks that solve combinatorial optimization problems on graphs, such as finding large cliques or small cuts, which arise in molecular engineering and network science.
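The hard instance sampling idea from Part I can be illustrated with a small sketch. The snippet below is an illustrative reconstruction, not the thesis's exact formulation: an InfoNCE-style contrastive loss in which negatives are importance-weighted by their similarity to the anchor, so that "hard" negatives (those lying close to the anchor in representation space) contribute more. The function name and the parameters `tau` (temperature) and `beta` (hardness concentration) are assumed names for this example; setting `beta = 0` recovers uniform weighting over negatives.

```python
import numpy as np

def hard_negative_nce(anchor, positive, negatives, tau=0.5, beta=1.0):
    """InfoNCE-style loss with hardness-weighted negatives.

    Negatives are reweighted in proportion to exp(beta * similarity),
    so negatives near the anchor dominate the denominator as beta grows.
    beta = 0 gives the usual uniform average over negatives.
    """
    def cos_sim(u, V):
        # cosine similarity between vector u and each row of V
        u = u / np.linalg.norm(u)
        V = V / np.linalg.norm(V, axis=1, keepdims=True)
        return V @ u

    s_pos = cos_sim(anchor, positive[None, :])[0] / tau
    s_neg = cos_sim(anchor, negatives) / tau

    w = np.exp(beta * s_neg)
    w /= w.sum()                              # hardness distribution over negatives

    k = len(negatives)
    neg_mass = k * np.dot(w, np.exp(s_neg))   # importance-weighted negative term
    return -np.log(np.exp(s_pos) / (np.exp(s_pos) + neg_mass))
```

Because the weights concentrate on high-similarity negatives, the loss is nondecreasing in `beta`; increasing `beta` simulates training against harder negatives without changing the data.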
Date issued
2023-09
URI
https://hdl.handle.net/1721.1/152692
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
