Analyzing and synthesizing deformations in image datasets
Author(s)
Balakrishnan, Guha.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
John V. Guttag.
Abstract
Many tasks in computer vision and graphics, such as image registration, optical flow estimation, and image warping, are concerned with measuring spatial deformations between images. Traditional algorithms for these applications often rely on solving an optimization problem for each test input, and some produce poor results on complex problems; image warping methods, for example, struggle with non-planar objects and occlusions. Furthermore, for large inputs some of these algorithms can be quite slow. For instance, a state-of-the-art medical image registration algorithm takes over 2 hours on a CPU to register a pair of 3D volumes.

In recent years, learning methods have proven successful in a variety of computer vision applications. This thesis first presents neural network models to address two image deformation tasks: registration and warping. The first is a diffeomorphic, unsupervised learning model for medical image registration that matches the accuracy of the current state of the art while operating orders of magnitude faster on a large dataset of brain MRI scans. Next, we present a supervised model that surpasses the state of the art in accuracy by 7%.

In the second project, we present a modular generative neural network that modifies an image of a person to synthesize new images of that person in different poses. Our model decomposes a scene into several foreground/background layers, spatially deforms the layers, modifies their appearances, and fuses them into an output image. Evaluation on videos of human actions shows that our method can accurately reconstruct poses within a given action class, as well as transfer pose across action classes. Additionally, we can construct a temporally coherent video from a sequence of poses.

The last two works in this thesis present methods for assisting humans in visually comparing spatiotemporal deformations of similar image sequences (i.e., video recordings).
We propose "video diffing" to highlight subtle differences between a pair of videos of similar actions, and "video averaging" to summarize the common patterns among a group of videos of similar actions. We demonstrate the usefulness of these approaches on a variety of videos downloaded from YouTube.
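The unsupervised registration model described in the abstract is trained without ground-truth deformations: it learns by combining an image-similarity term with a penalty that keeps the predicted displacement field smooth. The following NumPy sketch illustrates that idea only; the function names, the nearest-neighbor warp, and the loss weights are illustrative assumptions, not the thesis implementation (which uses a neural network and a differentiable spatial transformer).

```python
import numpy as np

def warp(image, flow):
    """Warp a 2D image with a dense displacement field (nearest-neighbor).

    flow[..., 0] and flow[..., 1] hold per-pixel row/column displacements.
    This is a crude stand-in for the differentiable sampling used in
    learning-based registration.
    """
    h, w = image.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_r = np.clip(np.round(rows + flow[..., 0]).astype(int), 0, h - 1)
    src_c = np.clip(np.round(cols + flow[..., 1]).astype(int), 0, w - 1)
    return image[src_r, src_c]

def registration_loss(moving, fixed, flow, smooth_weight=0.01):
    """Unsupervised loss: similarity of warped/fixed images + smoothness.

    A network predicting `flow` can be trained to minimize this without
    any ground-truth deformations.
    """
    similarity = np.mean((warp(moving, flow) - fixed) ** 2)
    # Penalize spatial gradients of the displacement field so the
    # predicted deformation stays smooth and plausible.
    dr = np.diff(flow, axis=0)
    dc = np.diff(flow, axis=1)
    smoothness = np.mean(dr ** 2) + np.mean(dc ** 2)
    return similarity + smooth_weight * smoothness
```

With a zero displacement field and identical images the loss is zero; any mismatch between the warped moving image and the fixed image, or any non-smooth flow, raises it.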
Description
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018 Cataloged from PDF version of thesis. Includes bibliographical references (pages 93-101).
Date issued
2018
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.