Parsimonious Principles of Deep Neural Networks

Author(s)
Huh, Minyoung
Download
Thesis PDF (34.17 MB)
Advisor
Isola, Phillip
Agrawal, Pulkit
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
At the core of human intelligence lies an insatiable drive to uncover the simple underlying principles that govern the world's complexities. This quest for parsimony is not unique to biological cognition; it also appears to be a fundamental characteristic of artificial intelligence systems. In this thesis, we explore the intrinsic simplicity bias exhibited by deep neural networks, the powerhouse of modern AI. By analyzing the effective rank of the learned representation kernels, we show that these models have an inherent preference for learning parsimonious relationships in the data. We provide further experimental evidence for the hypothesis that simplicity bias is a good inductive bias for finding generalizing solutions. Building on this finding, we present the Platonic Representation Hypothesis: the idea that as AI systems continue to grow in capability, they will converge not only toward simple representational kernels but also toward a common one. This phenomenon is evidenced by the increasing similarity of models across domains, suggesting the existence of a Platonic "ideal" way to represent the world. Reaching this Platonic representation, however, requires scaling up AI models, which poses significant computational demands. To address this obstacle, we conclude the thesis by proposing a framework for training a model with parallel low-rank updates to reach this convergent endpoint efficiently.
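The abstract's central quantity, the effective rank of a representation kernel, can be made concrete with a short sketch. The snippet below is illustrative and not code from the thesis: it uses the standard entropy-based definition of effective rank (the exponential of the Shannon entropy of the normalized spectrum, in the style of Roy & Vetterli, 2007), which the thesis may refine, and synthetic features stand in for a network's learned representations; all variable names are hypothetical.

```python
# Minimal sketch (assumed, not the thesis's code): effective rank of a
# representation (Gram) kernel, comparing full-rank vs. low-rank features.
import numpy as np

def effective_rank(K: np.ndarray) -> float:
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized singular values of K."""
    s = np.linalg.svd(K, compute_uv=False)
    p = s / s.sum()              # normalize the spectrum into a distribution
    p = p[p > 0]                 # drop exact zeros to keep the log finite
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
Z_full = rng.normal(size=(256, 512))                             # 256 samples, 512-dim features
Z_low = rng.normal(size=(256, 4)) @ rng.normal(size=(4, 512))    # rank-4 ("parsimonious") structure

for name, Z in [("full-rank features", Z_full), ("low-rank features", Z_low)]:
    K = Z @ Z.T                  # representation kernel over the samples
    print(f"{name}: effective rank ~ {effective_rank(K):.1f}")
```

Run on these synthetic inputs, the full-rank kernel yields a large effective rank while the rank-4 features yield a value near 4; the thesis's claim is that trained networks drift toward the latter, low-effective-rank regime.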
Date issued
2024-09
URI
https://hdl.handle.net/1721.1/158482
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
