Unsupervised Compositional Image Decomposition with Diffusion Models
Author(s)
Su, Jocelin
Advisor
Tenenbaum, Joshua B.
Abstract
Our visual understanding of the world is factorized and compositional. From just a single observation, we can ascertain both global and local attributes of a scene, such as lighting, weather, and the underlying objects. These attributes are highly compositional and can be combined in various ways to create new representations of the world. This paper introduces Decomp Diffusion, an unsupervised method for decomposing images into a set of underlying compositional factors, each represented by a different diffusion model. We demonstrate how each decomposed diffusion model captures a different factor of the scene, ranging from global scene descriptors (e.g., shadows, foreground, or facial expression) to local scene descriptors (e.g., constituent objects). Furthermore, we show how these inferred factors can be flexibly composed and recombined, both within and across different image datasets.
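To illustrate the kind of composition the abstract describes, the sketch below shows one common way to combine several diffusion-style factor models at sampling time: sum their scores so that sampling targets the product of the factor distributions. This is a minimal, hypothetical illustration (the Gaussian "factor models," the `composed_denoise_step` helper, and all parameters are stand-ins), not the thesis's actual architecture or training procedure.

```python
import numpy as np

def composed_denoise_step(x, t, score_fns, step_size=0.01, rng=None):
    """One Langevin-style sampling step composing several factor models.

    Each entry of `score_fns` stands in for a per-factor diffusion
    model's score output; summing the scores draws samples from the
    product of the individual factor distributions.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    total_score = sum(fn(x, t) for fn in score_fns)
    noise = rng.standard_normal(x.shape)
    return x + step_size * total_score + np.sqrt(2 * step_size) * noise

# Toy "factor models": Gaussian scores pulling toward different means,
# standing in for learned diffusion models over scene factors.
factor_a = lambda x, t: -(x - 1.0)   # e.g. a hypothetical "lighting" factor
factor_b = lambda x, t: -(x + 1.0)   # e.g. a hypothetical "object" factor

rng = np.random.default_rng(0)
x = np.zeros(4)
for t in reversed(range(100)):
    x = composed_denoise_step(x, t, [factor_a, factor_b], rng=rng)
# The composed sample settles near the mode of the product
# distribution (0 here), between the two factors' individual modes.
```

The key design point is that composition happens in score space rather than pixel space: each factor model remains independent, so factors inferred from one image (or dataset) can be recombined with factors from another simply by changing which score functions enter the sum.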
Date issued
2023-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology