Dorsal stream : from algorithm to neuroscience
Author(s)
Jhuang, Hueihan
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Tomaso Poggio.
Abstract
The dorsal stream in the primate visual cortex is involved in the perception of motion and the recognition of actions. The two topics, motion processing in the brain and action recognition in videos, have been developed independently in the fields of neuroscience and computer vision. We present a dorsal stream model that can be used both for recognizing actions and for explaining neurophysiology in the dorsal stream. The model consists of spatio-temporal feature detectors of increasing complexity: an input image sequence is first analyzed by an array of motion-sensitive units which, through a hierarchy of processing stages, lead to a position- and scale-invariant representation of motion in a video sequence. The model outperforms or is on par with state-of-the-art computer vision algorithms on a range of human action datasets. We then describe the extension of the model into a high-throughput system for the recognition of mouse behaviors in their homecage. We provide software and a very large manually annotated video database used for training and testing the system. Our system outperforms commercial software and performs on par with human scoring, as measured from the ground-truth manual annotations of more than 10 hours of videos of freely behaving mice. We complete the neurobiological side of the model by showing that it could explain motion processing as well as action selectivity in the dorsal stream, based on comparisons between model outputs and neuronal responses in the dorsal stream. Specifically, the model could explain pattern and component sensitivity and distribution [161], local motion integration [97], and speed tuning [144] of MT cells. The model, when combined with the ventral stream model [173], could also explain the action and actor selectivity in the STP area. There exist only a few models of motion processing in the dorsal stream, and these models have not been applied to real-world computer vision tasks. Our model is one that agrees with (or processes) data at different levels: from computer vision algorithms and practical software to neuroscience.
Description
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 173-195).
Date issued
2011
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.