Examining high level neural representations of cluttered scenes

Meyers, Ethan; Embark, Hamdy; Freiwald, Winrich; Serre, Thomas; Kreiman, Gabriel; Poggio, Tomaso

Author(s)

Meyers, Ethan; Embark, Hamdy; Freiwald, Winrich; Serre, Thomas; Kreiman, Gabriel; ... Show more

DownloadMIT-CSAIL-TR-2010-034.pdf (2.052Mb)

Other Contributors

Center for Biological and Computational Learning (CBCL)

Advisor

Tomaso Poggio

Metadata

Show full item record

Abstract

Humans and other primates can rapidly categorize objects even when they are embedded in complex visual scenes (Thorpe et al., 1996; Fabre-Thorpe et al., 1998). Studies by Serre et al., 2007 have shown that the ability of humans to detect animals in brief presentations of natural images decreases as the size of the target animal decreases and the amount of clutter increases, and additionally, that a feedforward computational model of the ventral visual system, originally developed to account for physiological properties of neurons, shows a similar pattern of performance. Motivated by these studies, we recorded single- and multi-unit neural spiking activity from macaque superior temporal sulcus (STS) and anterior inferior temporal cortex (AIT), as a monkey passively viewed images of natural scenes. The stimuli consisted of 600 images of animals in natural scenes, and 600 images of natural scenes without animals in them, captured at four different viewing distances, and were the same images used by Serre et al. to allow for a direct comparison between human psychophysics, computational models, and neural data. To analyze the data, we applied population "readout" techniques (Hung et al., 2005; Meyers et al., 2008) to decode from the neural activity whether an image contained an animal or not. The decoding results showed a similar pattern of degraded decoding performance with increasing clutter as was seen in the human psychophysics and computational model results. However, overall the decoding accuracies from the neural data lower were than that seen in the computational model, and the latencies of information in IT were long (~125ms) relative to behavioral measures obtained from primates in other studies. Additional tests also showed that the responses of the model units were not capturing several properties of the neural responses, and that detecting animals in cluttered scenes using simple model units based on V1 cells worked almost as well as using more complex model units that were designed to model the responses of IT neurons. While these results suggest AIT might not be the primary brain region involved in this form of rapid categorization, additional studies are needed before drawing strong conclusions.

Date issued

2010-07-29

URI

http://hdl.handle.net/1721.1/57463

Series/Report no.

MIT-CSAIL-TR-2010-034CBCL-289

Keywords

decoding, readout, rapid categorization, inferior temporal cortex, object recognition, scene understanding, neuroscience, visual clutter, electrophysiology

Collections

CSAIL Technical Reports (July 1, 2003 - present)