Large databases of real and synthetic images for feature evaluation and prediction

Research and Teaching Output of the MIT Community

dc.contributor.advisor William T. Freeman and Antonio Torralba. en_US
dc.contributor.author Kaneva, Biliana K en_US
dc.contributor.other Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. en_US
dc.date.accessioned 2012-07-02T15:46:24Z
dc.date.available 2012-07-02T15:46:24Z
dc.date.copyright 2012 en_US
dc.date.issued 2012 en_US
dc.description Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. en_US
dc.description Cataloged from PDF version of thesis. en_US
dc.description Includes bibliographical references (p. 157-167). en_US
dc.description.abstract Image features are widely used in computer vision applications from stereo matching to panorama stitching to object and scene recognition. They exploit image regularities to capture structure in images both locally, using a patch around an interest point, and globally, over the entire image. Image features need to be distinctive and robust toward variations in scene content, camera viewpoint and illumination conditions. Common tasks are matching local features across images and finding semantically meaningful matches amongst a large set of images. If there is enough structure or regularity in the images, we should be able not only to find good matches but also to predict parts of the objects or the scene that were not directly captured by the camera. One of the difficulties in evaluating the performance of image features in both the prediction and matching tasks is the availability of ground truth data. In this dissertation, we take two different approaches. First, we propose using a photorealistic virtual world for evaluating local feature descriptors and learning new feature detectors. Acquiring ground truth data and, in particular, pixel-to-pixel correspondences between images, in complex 3D scenes under different viewpoint and illumination conditions in a controlled way is nearly impossible in a real-world setting. Instead, we use a high-resolution 3D model of a city to gain complete and repeatable control of the environment. We calibrate our virtual world evaluations by comparing against feature rankings made from photographic data of the same subject matter (the Statue of Liberty). We then use our virtual world to study the effects on descriptor performance of controlled changes in viewpoint and illumination. We further employ machine learning techniques to train a model that would recognize visually rich interest points and optimize the performance of a given descriptor.
In the latter part of the thesis, we take advantage of the large amounts of image data available on the Internet to explore the regularities in outdoor scenes and, more specifically, the matching and prediction tasks in street-level images. Generally, people are very adept at predicting what they might encounter as they navigate through the world. They use all of their prior experience to make such predictions even when placed in an unfamiliar environment. We propose a system that can predict what lies just beyond the boundaries of the image using a large photo collection of images of the same class, but not from the same location in the real world. We evaluate the performance of the system using different global or quantized densely extracted local features. We demonstrate how to build seamless transitions between the query and prediction images, thus creating a photorealistic virtual space from real-world images. en_US
dc.description.statementofresponsibility by Biliana K. Kaneva. en_US
dc.format.extent 167 p. en_US
dc.language.iso eng en_US
dc.publisher Massachusetts Institute of Technology en_US
dc.rights M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. en_US
dc.rights.uri en_US
dc.subject Electrical Engineering and Computer Science. en_US
dc.title Large databases of real and synthetic images for feature evaluation and prediction en_US
dc.type Thesis en_US
dc.description.degree Ph.D. en_US
dc.contributor.department Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. en_US
dc.identifier.oclc 795522644 en_US

Files in this item

Name Size Format Description
795522644-MIT.pdf 23.45Mb PDF Full printable version
