Analyzing and synthesizing deformations in image datasets
Author(s)
Balakrishnan, Guha.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
John V. Guttag.
Abstract
Many tasks in computer vision and graphics, such as image registration, optical flow estimation, and image warping, are concerned with measuring spatial deformations between images. Traditional algorithms for these applications often rely on solving an optimization problem for each test input, and some produce poor results on complex problems; image warping methods, for example, struggle with non-planar objects and occlusions. Furthermore, for large inputs some of these algorithms can be quite slow. For instance, a state-of-the-art medical image registration algorithm takes over 2 hours on a CPU to register a pair of 3D volumes.

In recent years, learning methods have proven successful in a variety of computer vision applications. This thesis first presents neural network models to address two image deformation tasks: registration and warping. The first is a diffeomorphic, unsupervised learning model for medical image registration that matches the accuracy of the current state of the art while operating orders of magnitude faster on a large dataset of brain MRI scans. Next, we present a supervised model that surpasses the state of the art in accuracy by 7%.

In the second project, we present a modular generative neural network that modifies an image of a person to synthesize new images of that person in different poses. Our model decomposes a scene into several foreground/background layers, spatially deforms the layers, modifies their appearances, and fuses them into an output image. Evaluation on videos of human actions shows that our method can accurately reconstruct poses within a given action class, as well as transfer pose across action classes. Additionally, we can construct a temporally coherent video from a sequence of poses.

The last two works in this thesis present methods for assisting humans in visually comparing spatiotemporal deformations of similar image sequences (i.e., video recordings).
We propose "video diffing" to highlight subtle differences between a pair of videos of similar actions, and "video averaging" to summarize the common patterns among a group of videos of similar actions. We demonstrate the usefulness of these approaches on a variety of videos downloaded from YouTube.
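The unsupervised registration model described in the abstract is trained without ground-truth deformations: it learns by combining an image-similarity term with a penalty that keeps the predicted displacement field smooth. The following NumPy sketch illustrates that idea only; the function names, the nearest-neighbor warp, and the loss weights are illustrative assumptions, not the thesis implementation (which uses a neural network and a differentiable spatial transformer).

```python
import numpy as np

def warp(image, flow):
    """Warp a 2D image with a dense displacement field (nearest-neighbor).

    flow[..., 0] and flow[..., 1] hold per-pixel row/column displacements.
    This is a crude stand-in for the differentiable sampling used in
    learning-based registration.
    """
    h, w = image.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_r = np.clip(np.round(rows + flow[..., 0]).astype(int), 0, h - 1)
    src_c = np.clip(np.round(cols + flow[..., 1]).astype(int), 0, w - 1)
    return image[src_r, src_c]

def registration_loss(moving, fixed, flow, smooth_weight=0.01):
    """Unsupervised loss: similarity of warped/fixed images + smoothness.

    A network predicting `flow` can be trained to minimize this without
    any ground-truth deformations.
    """
    similarity = np.mean((warp(moving, flow) - fixed) ** 2)
    # Penalize spatial gradients of the displacement field so the
    # predicted deformation stays smooth and plausible.
    dr = np.diff(flow, axis=0)
    dc = np.diff(flow, axis=1)
    smoothness = np.mean(dr ** 2) + np.mean(dc ** 2)
    return similarity + smooth_weight * smoothness
```

With a zero displacement field and identical images the loss is zero; any mismatch between the warped moving image and the fixed image, or any non-smooth flow, raises it.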
Description
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018 Cataloged from PDF version of thesis. Includes bibliographical references (pages 93-101).
Date issued
2018
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.