Vision by alignment

Kraft, Adam Davis

dc.contributor.advisor	Patrick H. Winston.	en_US
dc.contributor.author	Kraft, Adam Davis	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2018-05-23T15:05:36Z
dc.date.available	2018-05-23T15:05:36Z
dc.date.copyright	2018	en_US
dc.date.issued	2018	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/115632
dc.description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.	en_US
dc.description	This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.	en_US
dc.description	Cataloged from student-submitted PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 127-134).	en_US
dc.description.abstract	Human visual intelligence is robust. Vision is versatile in its variety of tasks and operating conditions, it is flexible, adapting facilely to new tasks, and it is introspective, providing compositional explanations for its findings. Vision is fundamentally underdetermined, but it exists in a world that abounds with constraints and regularities perceived not only through vision but through other senses as well. These observations suggest that the imperative of vision is to exploit all sources of information to resolve ambiguity. I propose an alignment model for vision, in which computational specialists eagerly share state with their neighbors during ongoing computations, availing themselves of neighbors' partial results in order to fill gaps in evolving descriptions. Connections between specialists extend across sensory modalities, so that the computational machinery of many senses may be brought to bear on problems with strictly-visual inputs. I anticipate that this alignment process accounts for vision's robust attributes, and I call this prediction the alignment hypothesis. In this document I lay the groundwork for evaluating the hypothesis. I then demonstrate progress toward that goal, by way of the following contributions: -- I performed an experiment to investigate and characterize the ways that high-performing computer-vision models fall short of robust perception, and evaluated whether alignment models can address the shortcomings. The experiment, which relied on a procedure to remove signal energy from natural images while preserving high classification confidence by a neural network, revealed that the type of object depicted in the original image is a strong predictor of whether humans recognize the reduced-energy image. -- I implemented an alignment model based on a network of propagators. The model can use constraints to infer locations and heights of pedestrians and locations of occluding objects in an outdoor urban scene. 
I used the results of the effort to refine the requirements of mechanisms to use in building alignment models. -- I implemented an alignment model based on neural networks. Alignment-motivated design empowers the model, trained to estimate depth maps from single images, to perform the additional task of depth super-resolution without retraining. The design thus demonstrates flexibility, a property of robust vision systems.	en_US
dc.description.statementofresponsibility	by Adam Davis Kraft.	en_US
dc.format.extent	134 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Vision by alignment	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph. D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	1036987419	en_US

Files in this item

Name: 1036987419-MIT.pdf
Size: 23.21Mb
Format: PDF
Description: Full printable version

This item appears in the following Collection(s)

Doctoral Theses