Labeling and modeling large databases of videos

dc.contributor.advisor Antonio Torralba. en_US
dc.contributor.author Yuen, Jenny, Ph. D. Massachusetts Institute of Technology en_US
dc.contributor.other Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. en_US
dc.date.accessioned 2012-07-02T15:47:41Z
dc.date.available 2012-07-02T15:47:41Z
dc.date.copyright 2012 en_US
dc.date.issued 2012 en_US
dc.description Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. en_US
dc.description Cataloged from PDF version of thesis. en_US
dc.description Includes bibliographical references (p. 91-98). en_US
dc.description.abstract As humans, we can say many things about the scenes surrounding us. For instance, we can tell what type of scene and location an image depicts, and describe the objects in it, their material properties, or their spatial arrangement. These descriptions of a scene are widely studied areas in computer vision. This thesis, however, hypothesizes that observers have inherent prior knowledge that can be applied to the scene at hand. This prior knowledge can translate into knowing which objects move, or which trajectories and velocities to expect. Conversely, when faced with unusual events such as car accidents, humans are well tuned to identify them regardless of whether they have observed the scene before. This is due, in part, to prior observations of scenes with configurations similar to the current one. This thesis emulates the prior knowledge base of humans by creating a large and heterogeneous database and annotation tool for videos depicting real-world scenes. The first application of this thesis is in the area of unusual event detection. Given a short clip, the task is to identify the moving portions of the scene that depict abnormal events. We adopt a data-driven framework powered by scene matching techniques to retrieve the videos nearest to the query clip and integrate the motion information in the nearest videos. The result is a final clip with localized annotations for unusual activity. The second application lies in the area of event prediction. Given a static image, we adapt our framework to compile a prediction of motions to expect in the image. This prediction is produced by integrating knowledge from videos depicting scenes similar to the query image. With the help of scene matching, only scenes relevant to the query are considered, resulting in reliable predictions.
Our dataset, experimentation, and proposed model introduce and explore a new facet of scene understanding in images and videos. en_US
dc.description.statementofresponsibility by Jenny Yuen. en_US
dc.format.extent 98 p. en_US
dc.language.iso eng en_US
dc.publisher Massachusetts Institute of Technology en_US
dc.rights M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. en_US
dc.rights.uri en_US
dc.subject Electrical Engineering and Computer Science. en_US
dc.title Labeling and modeling large databases of videos en_US
dc.type Thesis en_US
dc.description.degree Ph.D. en_US
dc.contributor.department Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. en_US
dc.identifier.oclc 795583357 en_US

Files in this item

Name Size Format Description
795583357-MIT.pdf 11.13 MB PDF Full printable version
