
dc.contributor.advisor: Hari Balakrishnan [en_US]
dc.contributor.author: Chen, Yu-Han Tiffany [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.date.accessioned: 2017-10-18T15:08:10Z
dc.date.available: 2017-10-18T15:08:10Z
dc.date.copyright: 2017 [en_US]
dc.date.issued: 2017 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/111876
dc.description: Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. [en_US]
dc.description: Cataloged from PDF version of thesis. [en_US]
dc.description: Includes bibliographical references (pages 123-131). [en_US]
dc.description.abstract [en_US]:

High-quality cameras are now standard on handheld and wearable mobile devices. Their high resolution, coupled with pervasive wireless connectivity and advanced computer vision algorithms, makes it feasible to develop new ways to interact with mobile video. Two important examples are interactive object recognition and search-by-content. Interactive recognition continuously locates objects in a video stream, recognizes them, and labels them with information associated with the objects in the user's view. Example use cases include an augmented shopping application that recognizes products or brands to inform customers about the items they buy, and a driver-assistance application that recognizes vehicles and signs to improve driver safety. Interactive search-by-content allows users to discover videos using textual queries (e.g., "child dog play"). Instead of requiring broadcasters to manually annotate videos with metadata tags, our search system uses vision algorithms to produce textual tags automatically.

These two services must be highly interactive because users expect timely feedback for their interactions and for changes in content. Achieving high interactivity without sacrificing accuracy or efficiency is challenging, however. The required computer vision algorithms use computationally intensive deep neural networks and must run at 30 frames per second. The cost of recognizing an object also scales with the size of the object corpus, making on-device recognition infeasible. Offloading recognition to servers introduces network and processing delay; when this delay exceeds a frame time, it degrades recognition accuracy. This dissertation presents two systems that study the trade-off between accuracy and efficiency for interactive recognition and search, and demonstrate how to achieve both goals.

Glimpse enables interactive object recognition on camera-equipped mobile devices. Because object recognition entails significant computation, Glimpse runs it on servers across the network. To "hide" this latency, Glimpse maintains an active cache of video frames on the device and performs tracking on a subset of frames to correct the stale results returned by the processing pipeline. Our results show that Glimpse achieves 90% precision for face recognition, a 2.8x improvement over a scheme that performs server-side recognition without an active cache. For fast-moving objects such as road signs, Glimpse achieves up to 80% precision; without the active cache, interactive recognition is non-functional (1.9% precision).

Panorama enables search on live video streams. It introduces three new mechanisms: (1) an intelligent frame selector that reduces the number of frames on which expensive recognition must run, (2) a distributed scheduler that uses feedback from the vision algorithms to dynamically determine the order in which streams are processed, and (3) a search-ranking method that uses visual features to improve search relevance. Our experimental results show that incorporating visual features doubles search relevance, from 45% to 90%. To reach 90% search accuracy at current Amazon Web Services pricing, Panorama incurs 24x lower cost than a scheme that runs recognition on every frame.
dc.description.statementofresponsibility: by Yu-Han Tiffany Chen. [en_US]
dc.format.extent: 131 pages [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source, but further reproduction or distribution in any format is prohibited without written permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Interactive object recognition and search over mobile video [en_US]
dc.type: Thesis [en_US]
dc.description.degree: Ph. D. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 1004957278 [en_US]
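
The abstract's description of Glimpse's active cache is concrete enough to sketch. Below is a minimal Python illustration, assuming a hypothetical track(prev_frame, cur_frame, box) helper that moves one bounding box between consecutive frames (e.g., via optical flow); the class and method names are illustrative, not Glimpse's actual API.

```python
class ActiveCache:
    """Sketch of Glimpse-style latency hiding: cache frames while a
    recognition request is in flight, then replay tracking over the
    cached frames to bring the stale server result up to date.
    (Hypothetical API for illustration; continuous display-side
    tracking between requests is elided.)"""

    def __init__(self):
        self.cache = []           # frames captured since the last request
        self.request_frame = None

    def send_request(self, frame):
        # Offload this frame for recognition and start caching from here.
        self.request_frame = frame
        self.cache = []

    def add_frame(self, frame):
        # Every frame captured while the request is in flight is cached.
        self.cache.append(frame)

    def reconcile(self, stale_boxes, track):
        # The server's boxes describe request_frame, which is now stale.
        # Replay tracking across the cached frames so each box lands on
        # the newest frame the user is actually seeing.
        prev, boxes = self.request_frame, stale_boxes
        for frame in self.cache:
            boxes = [track(prev, frame, b) for b in boxes]
            prev = frame
        return boxes
```

The design point the abstract credits for the 2.8x precision gain is visible here: cheap on-device tracking turns a stale but accurate server answer into a timely one, rather than drawing labels where objects used to be.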
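Similarly, the frame selector the abstract attributes to Panorama can be read as a change detector: skip frames whose content has not changed enough to justify running the recognizer. The mean-pixel-difference signal and threshold below are assumptions for illustration, not Panorama's actual mechanism.

```python
import numpy as np

def select_frames(frames, threshold=8.0):
    """Yield only frames that differ enough from the last selected
    frame, so expensive recognition runs on a small subset of the
    stream. Mean absolute pixel difference on 0-255 images is a
    stand-in change signal; threshold is an assumed knob."""
    last = None
    for frame in frames:
        f = frame.astype(np.float32)
        if last is None or np.abs(f - last).mean() > threshold:
            last = f
            yield frame
```

Any cheaper change signal could replace the pixel difference; the point is that selection must cost far less per frame than recognition for the scheme to pay off.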

