DSpace@MIT

Factorization and Compositional Generalization in Diffusion Models

Author(s)
Liang, Qiyao
Download
Thesis PDF (24.83 MB)
Advisor
Fiete, Ila R.
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
One of the defining features of human intelligence is compositionality—the ability to generate an infinite array of complex ideas from a limited set of components. This capacity allows for the creation of novel and intricate combinations of arbitrary concepts, enabling potentially infinite expressive power from finite learning experiences. A likely prerequisite for the emergence of compositionality is the development of factorized representations of distinct features of variation in the world. However, the precise mechanisms behind the formation of these factorized representations in the human brain, and their connection to compositionality, remain unclear. Diffusion models are capable of generating photorealistic images that combine elements not co-occurring in the training set, demonstrating their ability to compositionally generalize. Yet, the underlying mechanisms of such compositionality and its acquisition through learning are still not well understood. Additionally, the relationship between forming factorized representations of distinct features and a model’s capacity for compositional generalization is not fully elucidated. In this thesis, we explore a simplified setting to investigate whether diffusion models can learn semantically meaningful and fully factorized representations of composable features. We conduct extensive controlled experiments on conditional diffusion models trained to generate various forms of 2D Gaussian data. Through preliminary investigations, we identify three distinct learning phases in the model, revealing that while overall learning rates depend on dataset density, the rates for independent generative factors do not. Moreover, our findings show that models can represent continuous features of variation with semi-continuous, factorized manifolds, resulting in superior compositionality but limited interpolation over unseen values. Based on our investigations, we propose a more data-efficient training scheme for diffusion models and suggest potential future architectures for more robust and efficient generative models.
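
To make the abstract's experimental setting concrete, the sketch below generates a toy dataset of the kind described: images of 2D Gaussian bumps whose two independent generative factors (horizontal and vertical position) would serve as conditioning inputs to a diffusion model. This is an illustration only, not the thesis's actual code; the image size, bump width, and factor ranges are assumptions.

```python
import numpy as np

def gaussian_bump_image(cx, cy, size=32, sigma=1.5):
    """Render a 2D Gaussian bump centered at (cx, cy) on a size x size grid.

    Hypothetical parameterization: the thesis trains conditional diffusion
    models on 2D Gaussian data; the grid size and bump width used here are
    assumptions made for illustration.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    img = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return img / img.max()  # normalize peak intensity to 1

# A toy dataset: each sample is an image plus its two generative factors,
# which a conditional diffusion model would receive as conditioning input.
rng = np.random.default_rng(0)
factors = rng.uniform(4, 28, size=(1000, 2))  # (cx, cy) position pairs
images = np.stack([gaussian_bump_image(cx, cy) for cx, cy in factors])
print(images.shape)  # (1000, 32, 32)
```

In this setup, compositional generalization could be probed by holding out particular (cx, cy) combinations during training and testing whether the model can still generate bumps at those unseen factor combinations, mirroring the abstract's notion of combining elements that do not co-occur in the training set.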
Date issued
2024-09
URI
https://hdl.handle.net/1721.1/158507
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
