The dynamics of invariant object and action recognition in the human visual system

Author(s)

Isik, Leyla

Other Contributors

Massachusetts Institute of Technology. Computational and Systems Biology Program.

Advisor

Tomaso Poggio.

Terms of use

M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Humans can quickly and effortlessly recognize objects, people, and their actions from complex visual inputs. Despite the ease with which the human brain solves this problem, the underlying computational steps have remained enigmatic. What makes object and action recognition challenging are identity-preserving transformations that alter the visual appearance of objects and actions, such as changes in scale, position, and viewpoint. The majority of visual neuroscience studies examining visual recognition use either physiology recordings, which provide high spatiotemporal resolution data with limited brain coverage, or functional MRI, which provides high spatial resolution data from across the brain with limited temporal resolution. High temporal resolution data from across the brain is needed to break down and understand the computational steps underlying invariant visual recognition. In this thesis, I use magnetoencephalography, machine learning, and computational modeling to study invariant visual recognition. I show that a temporal association learning rule for learning invariance in hierarchical visual systems is very robust to manipulations and visual disruptions that happen during development (Chapter 2). I next show that object recognition occurs very quickly, with invariance to size and position developing in stages beginning around 100 ms after stimulus onset (Chapter 3), and that action recognition occurs on a similarly fast time scale, 200 ms after video onset, with this early representation being invariant to changes in actor and viewpoint (Chapter 4). Finally, I show that the same hierarchical feedforward model can explain both the object and action recognition timing results, putting this timing data in the broader context of computer vision systems and models of the brain.
This work sheds light on the computational mechanisms underlying invariant object and action recognition in the brain and demonstrates the importance of using high temporal resolution data to understand neural computations.

Description

Thesis: Ph. D., Massachusetts Institute of Technology, Computational and Systems Biology Program, 2015.

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 123-138).

Date issued

2015

URI

http://hdl.handle.net/1721.1/98000

Department

Massachusetts Institute of Technology. Computational and Systems Biology Program

Publisher

Massachusetts Institute of Technology

Keywords

Computational and Systems Biology Program.

Collections

Doctoral Theses