Author(s): Torralba, Antonio; Fergus, Rob; Freeman, William T.
The human visual system is remarkably tolerant to degradations in image resolution: in a scene recognition task, human performance is similar whether $32 \times 32$ color images or multi-megapixel images are used. With small images, even object recognition and segmentation are performed robustly by the visual system, despite the objects being unrecognizable in isolation. Motivated by these observations, we explore the space of $32 \times 32$ images using a database of $10^8$ $32 \times 32$ color images gathered from the Internet using image search engines. Each image is loosely labeled with one of the 70,399 non-abstract nouns in English, as listed in the WordNet lexical database. Hence the image database represents a dense sampling of all object categories and scenes. With this dataset, we use nearest neighbor methods to perform object recognition across the $10^8$ images.
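The core operation behind recognition in this setting is nearest-neighbor search over raw tiny-image pixels. The sketch below is a minimal illustration, not the authors' implementation: it assumes a toy database of random flattened 32×32 color images (3,072-dimensional vectors) and uses sum-of-squared-differences (SSD) as the distance, one of the simplest pixel-space metrics for matching tiny images.

```python
import numpy as np

# Hypothetical toy database: N flattened 32x32 color images,
# i.e. an (N, 32*32*3) array. Stands in for the real 10^8-image set.
rng = np.random.default_rng(0)
database = rng.random((1000, 32 * 32 * 3)).astype(np.float32)

def nearest_neighbor(query, db):
    """Return the index of the database image closest to `query`
    under sum-of-squared-differences (SSD) on raw pixel values."""
    diffs = db - query                     # broadcast over all rows
    ssd = np.einsum('ij,ij->i', diffs, diffs)  # per-row squared distance
    return int(np.argmin(ssd))

# Querying with a stored image returns that image (SSD = 0 with itself).
idx = nearest_neighbor(database[42], database)
```

At the scale of $10^8$ images, a brute-force scan like this becomes the bottleneck; the pixel vectors are tiny, which is what makes dense matching in raw pixel space feasible at all.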
Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
Recognition, Nearest neighbor methods, Image databases