DSpace@MIT

Analyzing and synthesizing deformations in image datasets

Author(s)
Balakrishnan, Guha.
Download: 1051458641-MIT.pdf (60.26 MB)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
John V. Guttag.
Terms of use
MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
Many tasks in computer vision and graphics, such as image registration, optical flow estimation and image warping, are concerned with measuring spatial deformations between images. Traditional algorithms for these applications often rely on solving an optimization problem for each test input. Some of these methods produce poor results on complex problems. For example, image warping methods struggle with non-planar objects and occlusions. Furthermore, for large inputs, some of these algorithms can be quite slow. For instance, a state-of-the-art medical image registration algorithm takes over 2 hours on a CPU to register a pair of 3D volumes. In recent years, learning methods have proven successful in a variety of computer vision applications. This thesis first presents neural network models to address two image deformation tasks: registration and warping.
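The core idea behind learning-based registration, as described above, is to predict a deformation field that warps a "moving" image toward a "fixed" one and to score the alignment with a similarity loss. The following is a minimal NumPy sketch of that warp-and-compare step only (names, shapes, and the nearest-neighbor sampling are illustrative assumptions, not the thesis's actual model, which uses a neural network and differentiable interpolation):

```python
import numpy as np

def warp(image, flow):
    """Warp a 2D image by a dense displacement field (nearest-neighbor sampling).

    flow[..., 0] and flow[..., 1] hold per-pixel row and column displacements.
    """
    h, w = image.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_r = np.clip(np.round(rows + flow[..., 0]).astype(int), 0, h - 1)
    src_c = np.clip(np.round(cols + flow[..., 1]).astype(int), 0, w - 1)
    return image[src_r, src_c]

def similarity_loss(moving, fixed, flow):
    """Mean squared error between the warped moving image and the fixed image."""
    return float(np.mean((warp(moving, flow) - fixed) ** 2))

# Toy example: the moving image is the fixed image shifted one column left.
fixed = np.zeros((4, 4)); fixed[1, 2] = 1.0
moving = np.zeros((4, 4)); moving[1, 1] = 1.0
flow = np.zeros((4, 4, 2)); flow[..., 1] = -1.0  # sample one column to the left
print(similarity_loss(moving, fixed, flow))  # 0.0: the shift is fully recovered
```

In a learning model, `flow` would be the output of a network trained to minimize this loss over a dataset, which is what removes the per-pair optimization at test time.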
 
We first present a diffeomorphic, unsupervised learning model for medical image registration that achieves accuracy similar to the current state of the art while operating orders of magnitude faster on a large dataset of brain MRI scans. Next, we present a supervised model that surpasses the state of the art in accuracy by 7%. In the second project, we present a modular generative neural network that modifies an image of a person to synthesize new images of that person in different poses. Our model decomposes a scene into several foreground/background layers, spatially deforms the layers, modifies their appearances, and fuses them into an output image. Evaluation on videos of human actions shows that our method can accurately reconstruct poses within a given action class, as well as transfer poses across action classes. Additionally, we can construct a temporally coherent video from a sequence of poses.
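The fusion step of the pose-synthesis pipeline, where deformed layers are recombined into one output image, can be illustrated with back-to-front alpha compositing. This is a minimal sketch under that assumption (the masks, values, and two-layer setup are hypothetical; in the thesis the layers, masks, and appearances are all predicted by the network):

```python
import numpy as np

def compose(layers, masks):
    """Fuse layers back-to-front: each soft mask overwrites what is beneath it."""
    out = np.zeros_like(layers[0])
    for layer, mask in zip(layers, masks):
        out = mask * layer + (1.0 - mask) * out
    return out

background = np.full((2, 2), 0.2)   # toy 2x2 grayscale "scene"
foreground = np.full((2, 2), 0.9)   # toy "person" layer
bg_mask = np.ones((2, 2))                      # background covers everything
fg_mask = np.array([[1.0, 0.0], [0.0, 0.0]])   # person occupies one pixel
image = compose([background, foreground], [bg_mask, fg_mask])
print(image)  # 0.9 where the foreground mask is on, 0.2 elsewhere
```

Because the masks are soft (values in [0, 1]), the same operation handles partial occlusion and keeps the whole pipeline differentiable.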
 
The last two works in this thesis present methods for assisting humans in visually comparing spatiotemporal deformations of similar image sequences (i.e., video recordings). We propose "video diffing" to highlight subtle differences between a pair of videos of similar actions, and video averaging to summarize the common patterns among a group of videos of similar actions. We demonstrate the usefulness of these approaches on various video data downloaded from YouTube.
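At their simplest, the two comparison tools above reduce to per-frame operations on aligned clips: a difference map that highlights where two videos disagree, and a frame-wise mean that summarizes a group. This sketch assumes the videos are already temporally and spatially aligned and of equal length (the thesis's methods handle the alignment itself; the arrays and names here are illustrative):

```python
import numpy as np

def video_diff(a, b):
    """Per-pixel absolute difference between two aligned clips ("video diffing")."""
    return np.abs(a - b)

def video_average(videos):
    """Frame-wise mean over a group of same-length, pre-aligned clips."""
    return np.mean(np.stack(videos, axis=0), axis=0)

# Two toy "videos": 3 frames of 2x2 pixels, identical except one pixel.
a = np.zeros((3, 2, 2))
b = np.zeros((3, 2, 2)); b[1, 0, 0] = 1.0
heat = video_diff(a, b)
avg = video_average([a, b])
print(heat.sum())    # 1.0: only the single differing pixel lights up
print(avg[1, 0, 0])  # 0.5
```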
 
Description
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
 
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018
 
Cataloged from PDF version of thesis.
 
Includes bibliographical references (pages 93-101).
 
Date issued
2018
URI
http://hdl.handle.net/1721.1/118029
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Doctoral Theses
