
dc.contributor.advisor: Hari Balakrishnan [en_US]
dc.contributor.author: Chen, Yu-Han Tiffany [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.date.accessioned: 2017-10-18T15:08:10Z
dc.date.available: 2017-10-18T15:08:10Z
dc.date.copyright: 2017 [en_US]
dc.date.issued: 2017 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/111876
dc.description: Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. [en_US]
dc.description: Cataloged from PDF version of thesis. [en_US]
dc.description: Includes bibliographical references (pages 123-131). [en_US]
dc.description.abstract [en_US]:

High-quality cameras are now standard on handheld and wearable mobile devices. Their high resolution, coupled with pervasive wireless connectivity and advanced computer vision algorithms, makes it feasible to develop new ways to interact with mobile video. Two important examples are interactive object recognition and search-by-content. Interactive recognition continuously locates objects in a video stream, recognizes them, and labels them with information associated with the objects in the user's view. Example use cases include an augmented shopping application that recognizes products or brands to inform customers about the items they buy, and a driver-assistance application that recognizes vehicles and signs to improve driver safety. Interactive search-by-content allows users to discover videos using textual queries (e.g., "child dog play"). Instead of requiring broadcasters to manually annotate videos with metadata tags, our search system uses vision algorithms to produce textual tags automatically.

These two services must be highly interactive because users expect timely feedback for their interactions and for changes in content. Achieving high interactivity without sacrificing accuracy or efficiency is challenging, however. The required computer vision algorithms use computationally intensive deep neural networks and must run at 30 frames per second. The cost of recognizing an object also scales with the size of the object corpus, making on-device recognition infeasible. Offloading recognition to servers introduces network and processing delay; when this delay exceeds a frame time, it degrades recognition accuracy. This dissertation presents two systems that study the trade-off between accuracy and efficiency for interactive recognition and search, and demonstrate how to achieve both goals.

Glimpse enables interactive object recognition on camera-equipped mobile devices. Because object recognition entails significant computation, Glimpse runs it on servers across the network. To "hide" this latency, Glimpse maintains an active cache of video frames on the device and performs tracking on a subset of frames to correct the stale results returned by the processing pipeline. Our results show that Glimpse achieves 90% precision for face recognition, a 2.8x improvement over a scheme that performs server-side recognition without an active cache. For fast-moving objects such as road signs, Glimpse achieves up to 80% precision; without the active cache, interactive recognition is non-functional (1.9% precision).

Panorama enables search on live video streams. It introduces three new mechanisms: (1) an intelligent frame selector that reduces the number of frames on which expensive recognition must run, (2) a distributed scheduler that uses feedback from the vision algorithms to dynamically determine the order in which streams are processed, and (3) a search-ranking method that uses visual features to improve search relevance. Our experimental results show that incorporating visual features doubles search relevance, from 45% to 90%. To reach 90% search accuracy at current Amazon Web Services pricing, Panorama incurs 24x lower cost than a scheme that runs recognition on every frame.
dc.description.statementofresponsibility: by Yu-Han Tiffany Chen. [en_US]
dc.format.extent: 131 pages [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source, but further reproduction or distribution in any format is prohibited without written permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Interactive object recognition and search over mobile video [en_US]
dc.type: Thesis [en_US]
dc.description.degree: Ph. D. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 1004957278 [en_US]
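
The abstract's description of Glimpse's active cache is concrete enough to sketch. Below is a minimal Python illustration, assuming a hypothetical track(prev_frame, cur_frame, box) helper that moves one bounding box between consecutive frames (e.g., via optical flow); the class and method names are illustrative, not Glimpse's actual API.

```python
class ActiveCache:
    """Sketch of Glimpse-style latency hiding: cache frames while a
    recognition request is in flight, then replay tracking over the
    cached frames to bring the stale server result up to date.
    (Hypothetical API for illustration; continuous display-side
    tracking between requests is elided.)"""

    def __init__(self):
        self.cache = []           # frames captured since the last request
        self.request_frame = None

    def send_request(self, frame):
        # Offload this frame for recognition and start caching from here.
        self.request_frame = frame
        self.cache = []

    def add_frame(self, frame):
        # Every frame captured while the request is in flight is cached.
        self.cache.append(frame)

    def reconcile(self, stale_boxes, track):
        # The server's boxes describe request_frame, which is now stale.
        # Replay tracking across the cached frames so each box lands on
        # the newest frame the user is actually seeing.
        prev, boxes = self.request_frame, stale_boxes
        for frame in self.cache:
            boxes = [track(prev, frame, b) for b in boxes]
            prev = frame
        return boxes
```

The design point the abstract credits for the 2.8x precision gain is visible here: cheap on-device tracking turns a stale but accurate server answer into a timely one, rather than drawing labels where objects used to be.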
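Similarly, the frame selector the abstract attributes to Panorama can be read as a change detector: skip frames whose content has not changed enough to justify running the recognizer. The mean-pixel-difference signal and threshold below are assumptions for illustration, not Panorama's actual mechanism.

```python
import numpy as np

def select_frames(frames, threshold=8.0):
    """Yield only frames that differ enough from the last selected
    frame, so expensive recognition runs on a small subset of the
    stream. Mean absolute pixel difference on 0-255 images is a
    stand-in change signal; threshold is an assumed knob."""
    last = None
    for frame in frames:
        f = frame.astype(np.float32)
        if last is None or np.abs(f - last).mean() > threshold:
            last = f
            yield frame
```

Any cheaper change signal could replace the pixel difference; the point is that selection must cost far less per frame than recognition for the scheme to pay off.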

