Show simple item record

dc.contributor.advisorRichard W. Madison and Tomaso A. Poggio.en_US
dc.contributor.authorXu, Yuetianen_US
dc.contributor.otherMassachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.en_US
dc.date.accessioned2011-02-23T14:42:33Z
dc.date.available2011-02-23T14:42:33Z
dc.date.copyright2009en_US
dc.date.issued2009en_US
dc.identifier.urihttp://hdl.handle.net/1721.1/61290
dc.descriptionThesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.en_US
dc.descriptionCataloged from PDF version of thesis.en_US
dc.descriptionIncludes bibliographical references (p. ).en_US
dc.description.abstractAutomatic understanding of video content is a problem which grows in importance every day. Video understanding algorithms require accuracy, robustness, speed, and scalability. Accuracy generates user confidence in usage. Robustness enables greater autonomy and reduced human intervention. Applications such as navigation and mapping demand real-time performance. Scalability is also important for maintaining high speed while expanding capacity to multiple users and sensors. In this thesis, I propose a "bag-of-phrases" model to improve the accuracy and robustness of the popular "bag-of-words" models. This model applies a "geometric grammar" to add structural constraints to the unordered "bag-of-words." I incorporate this model into an architecture which combines an object recognizer, a tracker, and a geolocation module. This architecture has the ability to use the complementarity of its components to compensate for its weaknesses. This allows for improvements in accuracy, robustness, and speed. Subsequently, I introduce VICTORIOUS, a fast implementation of the proposed architecture. Evaluation on computer-generated data as well as Caltech-101 indicate that this implementation is accurate, robust, and capable of performing in real time on current generation hardware. This implementation, together with the "bag-of-phrases" model and integrated architecture, forms a step towards meeting the requirements for an accurate, robust, real-time vision system.en_US
dc.description.statementofresponsibilityby Yuetian Xu.en_US
dc.format.extentp.en_US
dc.language.isoengen_US
dc.publisherMassachusetts Institute of Technologyen_US
dc.rightsM.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.en_US
dc.rights.urihttp://dspace.mit.edu/handle/1721.1/7582en_US
dc.subjectElectrical Engineering and Computer Science.en_US
dc.titleVICTORIOUS : video indexing with combined tracking and object recognition for improved object understanding in scenesen_US
dc.title.alternativeVideo indexing with combined tracking and object recognition for improved object understanding in scenesen_US
dc.typeThesisen_US
dc.description.degreeM.Eng.en_US
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc702644367en_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record