Discovering object categories in image collections

Sivic, Josef; Russell, Bryan C.; Efros, Alexei A.; Zisserman, Andrew; Freeman, William T.

Author(s)

Sivic, Josef; Russell, Bryan C.; Efros, Alexei A.; Zisserman, Andrew; Freeman, William T.

DownloadMIT-CSAIL-TR-2005-012.ps (28.21Mb)

Additional downloads

Metadata

Show full item record

Abstract

Given a set of images containing multiple object categories,we seek to discover those categories and their image locations withoutsupervision. We achieve this using generative modelsfrom the statistical text literature: probabilistic Latent SemanticAnalysis (pLSA), and Latent Dirichlet Allocation (LDA). In text analysisthese are used to discover topics in a corpus using the bag-of-wordsdocument representation. Here we discover topics as object categories, sothat an image containing instances of several categories is modelled as amixture of topics.The models are applied to images by using avisual analogue of a word, formed by vector quantizing SIFT like regiondescriptors. We investigate a set of increasingly demanding scenarios,starting with image sets containing only two object categories through tosets containing multiple categories (including airplanes, cars, faces,motorbikes, spotted cats) and background clutter. The object categoriessample both intra-class and scale variation, and both the categories andtheir approximate spatial layout are found without supervision.We also demonstrate classification of unseen images and images containingmultiple objects. Performance of the proposed unsupervised method is compared tothe semi-supervised approach of Fergus et al.

Date issued

2005-02-25

URI

http://hdl.handle.net/1721.1/30525

Other identifiers

MIT-CSAIL-TR-2005-012

AIM-2005-005

Series/Report no.

Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory

Keywords

Collections

CSAIL Technical Reports (July 1, 2003 - present)