Learning a dictionary of shape-components in visual cortex : comparison with neurons, humans and machines

Serre, Thomas (Thomas R. G.)

dc.contributor.advisor	Tomaso Poggio.	en_US
dc.contributor.author	Serre, Thomas (Thomas R. G.)	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Brain and Cognitive Sciences.	en_US
dc.date.accessioned	2006-10-31T15:21:19Z
dc.date.available	2006-10-31T15:21:19Z
dc.date.copyright	2006	en_US
dc.date.issued	2006	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/34270
dc.description	Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2006.	en_US
dc.description	This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.	en_US
dc.description	Includes bibliographical references (p. [175]-211).	en_US
dc.description.abstract	In this thesis, I describe a quantitative model that accounts for the circuits and computations of the feedforward path of the ventral stream of visual cortex. This model is consistent with a general theory of visual processing that extends the hierarchical model of [Hubel and Wiesel, 1959] from primary to extrastriate visual areas. It attempts to explain the first few hundred milliseconds of visual processing and "immediate recognition". One of the key elements in the approach is the learning of a generic dictionary of shape components from V2 to IT, which provides an invariant representation to task-specific categorization circuits in higher brain areas. This vocabulary of shape-tuned units is learned in an unsupervised manner from natural images, and constitutes a large and redundant set of image features with different complexities and invariances. This theory significantly extends an earlier approach by [Riesenhuber and Poggio, 1999a] and builds upon several existing neurobiological models and conceptual proposals. First, I present evidence to show that the model can duplicate the tuning properties of neurons in various brain areas (e.g., V1, V4 and IT).	en_US
dc.description.abstract	(cont.) In particular, the model agrees with data from V4 on the responses of neurons to combinations of two simple bar stimuli presented within the receptive field of the S2 units [Reynolds et al., 1999], and some of the C2 units in the model show a tuning for boundary conformations that is consistent with recordings from V4 [Pasupathy and Connor, 2001]. Second, I show that the model not only duplicates the tuning properties of neurons in various brain areas when probed with artificial stimuli, but also handles the recognition of objects in the real world, to the extent of competing with the best computer vision systems. Third, I compare the performance of the model with that of human observers in a rapid animal vs. non-animal recognition task in which recognition is fast and cortical back-projections are likely to be inactive. Results indicate that the model predicts human performance extremely well when the delay between the stimulus and the mask is about 50 ms. This suggests that cortical back-projections may not play a significant role when the time interval is in this range, and that the model may therefore provide a satisfactory description of the feedforward path.	en_US
dc.description.abstract	(cont.) Taken together, the evidence suggests that we may have the skeleton of a successful theory of visual cortex. In addition, this may be the first time that a neurobiological model faithful to the physiology and anatomy of visual cortex not only competes with some of the best computer vision systems, thus providing a realistic alternative to engineered artificial vision systems, but also achieves performance close to that of humans in a categorization task involving complex natural images.	en_US
dc.description.statementofresponsibility	by Thomas Serre.	en_US
dc.format.extent	211 p.	en_US
dc.format.extent	6642161 bytes
dc.format.extent	6639386 bytes
dc.format.mimetype	application/pdf
dc.format.mimetype	application/pdf
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582
dc.subject	Brain and Cognitive Sciences.	en_US
dc.title	Learning a dictionary of shape-components in visual cortex : comparison with neurons, humans and machines	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph.D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
dc.identifier.oclc	71152487	en_US

Files in this item

Name:: 71152487-MIT.pdf
Size:: 6.331Mb
Format:: PDF
Description:: Full printable version

This item appears in the following Collection(s)

Doctoral Theses