Unsupervised Learning for Generative Scene Editing and Motion
Author(s)
Fang, David S.
Advisor
Sitzmann, Vincent
Abstract
Unsupervised learning for images and videos is important for many applications in computer vision. While supervised methods usually achieve the best performance, the data curation and labeling that supervised datasets require make them difficult to scale. Unsupervised learning, on the other hand, is more scalable and generalizable and requires far less data curation, but it is harder because it lacks a clear target objective. In this thesis, we propose two distinct lines of work on unsupervised learning with generative applications: 1) BlobGSN and 2) optical flow estimation and flow generation with diffusion models. BlobGSN explores the unsupervised learning of spatially disentangled mid-level latent representations for 3D scenes in a generative context. Within this generative framework, we show that BlobGSN facilitates novel scene generation and editing. In a different vein, current state-of-the-art optical flow models rely on collecting ground-truth flow for sequences of video frames. Unsupervised learning of optical flow, which requires no ground truth, could in principle leverage any publicly available video data for training. We explore different frameworks for unsupervised optical flow learning that address problems such as photometric error, occlusion handling, and flow smoothness. Additionally, we propose a generative framework for synthesizing optical flow from a single frame that can be trained in an unsupervised manner.
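As a concrete illustration of the kind of unsupervised optical flow objective the abstract mentions, the sketch below combines a photometric reconstruction loss with an edge-aware smoothness penalty, two terms commonly used when ground-truth flow is unavailable. This is a generic PyTorch sketch under assumed conventions (flow in pixels, channels-first tensors), not the exact formulation used in the thesis; the function names and the weights edge_weight and smooth_weight are illustrative assumptions, and occlusion handling (e.g. forward-backward consistency masking) is omitted for brevity.

    # Generic sketch of an unsupervised optical flow loss:
    # photometric reconstruction plus edge-aware smoothness.
    # Names and weights are assumptions, not the thesis's method.
    import torch
    import torch.nn.functional as F

    def warp(frame2: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        """Backward-warp frame2 toward frame1 using the predicted flow.
        frame2: (B, C, H, W) image; flow: (B, 2, H, W) in pixels."""
        b, _, h, w = flow.shape
        # Base sampling grid of pixel coordinates.
        ys, xs = torch.meshgrid(
            torch.arange(h, dtype=flow.dtype, device=flow.device),
            torch.arange(w, dtype=flow.dtype, device=flow.device),
            indexing="ij",
        )
        grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow  # (B, 2, H, W)
        # Normalize coordinates to [-1, 1] as grid_sample expects.
        gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
        gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
        norm_grid = torch.stack((gx, gy), dim=-1)  # (B, H, W, 2)
        return F.grid_sample(frame2, norm_grid, align_corners=True)

    def photometric_loss(frame1, frame2, flow):
        """L1 difference between frame1 and frame2 warped by the flow."""
        return (frame1 - warp(frame2, flow)).abs().mean()

    def smoothness_loss(flow, frame1, edge_weight: float = 10.0):
        """First-order flow smoothness, downweighted at image edges."""
        flow_dx = (flow[:, :, :, 1:] - flow[:, :, :, :-1]).abs()
        flow_dy = (flow[:, :, 1:, :] - flow[:, :, :-1, :]).abs()
        img_dx = (frame1[:, :, :, 1:] - frame1[:, :, :, :-1]).abs().mean(1, keepdim=True)
        img_dy = (frame1[:, :, 1:, :] - frame1[:, :, :-1, :]).abs().mean(1, keepdim=True)
        wx = torch.exp(-edge_weight * img_dx)
        wy = torch.exp(-edge_weight * img_dy)
        return (flow_dx * wx).mean() + (flow_dy * wy).mean()

    def unsupervised_flow_loss(frame1, frame2, flow, smooth_weight: float = 0.1):
        """Total objective: photometric error plus weighted smoothness."""
        return photometric_loss(frame1, frame2, flow) + \
               smooth_weight * smoothness_loss(flow, frame1)

Because both terms depend only on the two input frames and the predicted flow, the loss can supervise a flow network on any raw video, which is what makes this style of objective attractive for the large-scale unlabeled training discussed above.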
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology