A Trainable System for Object Detection in Images and Video Sequences

dc.contributor.author Papageorgiou, Constantine P. en_US
dc.date.accessioned 2004-10-01T13:59:58Z
dc.date.available 2004-10-01T13:59:58Z
dc.date.issued 2000-05-01 en_US
dc.identifier.other AITR-1685 en_US
dc.identifier.other CBCL-186 en_US
dc.identifier.uri http://hdl.handle.net/1721.1/5566
dc.description.abstract This thesis presents a general, trainable system for object detection in static images and video sequences. The core system finds a certain class of objects in static images of completely unconstrained, cluttered scenes without using motion, tracking, or handcrafted models and without making any assumptions about the scene structure or the number of objects in the scene. The system uses a set of training data of positive and negative example images as input, transforms the pixel images to a Haar wavelet representation, and uses a support vector machine classifier to learn the difference between in-class and out-of-class patterns. To detect objects in out-of-sample images, we do a brute force search over all the subwindows in the image. This system is applied to face, people, and car detection with excellent results. For our extensions to video sequences, we augment the core static detection system in several ways: 1) extending the representation to five frames, 2) implementing an approximation to a Kalman filter, and 3) modeling detections in an image as a density and propagating this density through time according to measured features. In addition, we present a real-time version of the system that is currently running in a DaimlerChrysler experimental vehicle. As part of this thesis, we also present a system that, instead of detecting full patterns, uses a component-based approach. We find it to be more robust to occlusions, rotations in depth, and severe lighting conditions for people detection than the full body version. We also experiment with various other representations including pixels and principal components and show results that quantify how the number of features, color, and gray-level affect performance. en_US
dc.description.provenance Made available in DSpace on 2004-10-01T13:59:58Z (GMT). No. of bitstreams: 2 AITR-1685.ps: 72537763 bytes, checksum: fe83e485ecee975226b7c68d57a69be0 (MD5) AITR-1685.pdf: 15910731 bytes, checksum: f1cd52b639ff7f37d64254e06d46103b (MD5) Previous issue date: 2000-05-01 en
dc.format.extent 128 p. en_US
dc.format.extent 72537763 bytes
dc.format.extent 15910731 bytes
dc.format.mimetype application/postscript
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.relation.ispartofseries AITR-1685 en_US
dc.relation.ispartofseries CBCL-186 en_US
dc.subject AI en_US
dc.subject MIT en_US
dc.subject Artificial Intelligence en_US
dc.subject object detection en_US
dc.subject pattern recognition en_US
dc.subject people detection en_US
dc.subject face detection en_US
dc.subject car detection en_US
dc.title A Trainable System for Object Detection in Images and Video Sequences en_US

Files in this item

Files Size Format
AITR-1685.pdf 15.91 MB application/pdf
AITR-1685.ps 72.53 MB application/postscript
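
The abstract describes detection as a brute force search over all subwindows of an image, with each subwindow passed to a trained classifier. The following is a minimal Python sketch of that search loop only, not the thesis implementation. The names `sliding_windows`, `detect`, and `score_fn`, and the parameters `scale_step`, `stride_frac`, and `threshold`, are hypothetical and chosen for illustration. `score_fn` stands in for the Haar wavelet transform followed by the support vector machine; stepping by a fraction of the window width and rescaling geometrically are assumptions, not details taken from the record.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Window = Tuple[int, int, int, int]   # (x, y, width, height)
Image = Sequence[Sequence[float]]    # rows of gray-level pixel values


def sliding_windows(img_w: int, img_h: int, base_w: int, base_h: int,
                    scale_step: float = 1.2,
                    stride_frac: float = 0.25) -> Iterator[Window]:
    """Enumerate candidate subwindows at successively larger scales."""
    scale = 1.0
    while True:
        w = int(round(base_w * scale))
        h = int(round(base_h * scale))
        if w > img_w or h > img_h:
            break  # the window no longer fits inside the image
        stride = max(1, int(w * stride_frac))  # step size is an assumption
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)
        scale *= scale_step


def detect(image: Image, score_fn: Callable[[Image, Window], float],
           base_w: int, base_h: int, threshold: float = 0.0,
           **search_kwargs: float) -> List[Window]:
    """Return every subwindow whose classifier score exceeds the threshold."""
    img_h, img_w = len(image), len(image[0])
    return [win
            for win in sliding_windows(img_w, img_h, base_w, base_h,
                                       **search_kwargs)
            if score_fn(image, win) > threshold]


def mean_brightness(image: Image, win: Window) -> float:
    """Toy scorer (placeholder for wavelet features plus an SVM)."""
    x, y, w, h = win
    total = sum(image[r][c] for r in range(y, y + h) for c in range(x, x + w))
    return total / (w * h)


# Usage: a 4x4 image with a bright 2x2 patch in the lower-right corner.
toy_image = [[0, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 1, 1],
             [0, 0, 1, 1]]
hits = detect(toy_image, mean_brightness, base_w=2, base_h=2,
              threshold=0.9, scale_step=2.0, stride_frac=0.5)
```

In a real detector the scorer would be a trained classifier applied to features computed from each window, and overlapping detections would typically be merged afterwards; both steps are outside this sketch.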
