Spatiotemporal interpretation features in the recognition of dynamic images

Ben-Yosef, Guy; Kreiman, Gabriel; Ullman, Shimon

dc.contributor.author	Ben-Yosef, Guy
dc.contributor.author	Kreiman, Gabriel
dc.contributor.author	Ullman, Shimon
dc.date.accessioned	2018-11-21T19:36:07Z
dc.date.available	2018-11-21T19:36:07Z
dc.date.issued	2018-11-21
dc.identifier.uri	http://hdl.handle.net/1721.1/119248
dc.description.abstract	Objects and their parts can be visually recognized and localized from purely spatial information in static images and also from purely temporal information as in the perception of biological motion. Cortical regions have been identified, which appear to specialize in visual recognition based on either static or dynamic cues, but the mechanisms by which spatial and temporal information is integrated is only poorly understood. Here we show that visual recognition of objects and actions can be achieved by efficiently combining spatial and motion cues in configurations where each source on its own is insufficient for recognition. This analysis is obtained by the identification of minimal spatiotemporal configurations: these are short videos in which objects and their parts, along with an action being performed, can be reliably recognized, but any reduction in either space or time makes them unrecognizable. State-of-the-art computational models for recognition from dynamic images based on deep 2D and 3D convolutional networks cannot replicate human recognition in these configurations. Action recognition in minimal spatiotemporal configurations is invariably accompanied by full human interpretation of the internal components of the image and their inter-relations. We hypothesize that this gap is due to mechanisms for full spatiotemporal interpretation process, which in human vision is an integral part of recognizing dynamic event, but is not sufficiently represented in current DNNs.	en_US
dc.description.sponsorship	This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Center for Brains, Minds and Machines (CBMM)	en_US
dc.relation.ispartofseries	CBMM Memo Series;094
dc.title	Spatiotemporal interpretation features in the recognition of dynamic images	en_US
dc.type	Technical Report	en_US
dc.type	Working Paper	en_US
dc.type	Other	en_US

Files in this item

Name:: CBMM-Memo-094.pdf
Size:: 1.211Mb
Format:: PDF

View/Open

This item appears in the following Collection(s)

CBMM Memo Series

Show simple item record