
An fMRI dataset of 1,102 natural videos for visual event understanding

Author(s)
Lahner, Benjamin
Download
Thesis PDF (14.48 MB)
Advisor
Oliva, Aude
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
A visual event, such as a dog running in a park, communicates complex relationships between objects and their environment. The human visual system is tasked with transforming these spatiotemporal events into meaningful outputs so we can effectively interact with our environment. To form a useful representation of an event, the visual system draws on many visual processes, from object recognition to motion perception. Studying the neural correlates of visual event understanding therefore requires brain responses that capture the entire transformation from video-based stimuli to high-level conceptual understanding. However, despite its ecological importance and computational richness, no existing dataset is sufficient for studying visual event understanding. Here we release the Algonauts Action Videos (AAV) dataset, composed of high-quality functional magnetic resonance imaging (fMRI) brain responses to 1,102 richly annotated naturalistic video stimuli. We detail AAV's experimental design and highlight its high-quality, reliable activation throughout the visual and parietal cortices. Initial analyses show that the signal contained in AAV reflects numerous visual processes representing different aspects of visual event understanding, from scene recognition to action recognition to memorability processing. Because AAV captures an ecologically relevant and complex visual process, the dataset can be used to study how various aspects of visual perception integrate to form a meaningful understanding of a video. Additionally, we demonstrate its utility as a model evaluation benchmark that bridges the gap between visual neuroscience and video-based computer vision research.
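To make the benchmark use case above concrete, the sketch below shows one common way a video model could be scored against fMRI responses like AAV's: fit a voxelwise linear encoding model from model features to brain responses, then report per-voxel correlation on held-out videos. This is a minimal illustrative sketch, not the thesis's actual pipeline: the array shapes, the ridge-regression encoder, and the synthetic stand-in data are all assumptions; real use would substitute features extracted from the AAV stimuli and the recorded voxel responses.

# Hypothetical sketch of a voxelwise encoding benchmark (Python, scikit-learn).
# Shapes, names, and the ridge encoder are illustrative assumptions; synthetic
# random data stands in for AAV features and fMRI responses so the sketch runs
# standalone.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins: one feature vector per video from a video model, and one fMRI
# response pattern per video (videos x voxels).
n_videos, n_features, n_voxels = 1102, 512, 2000
model_features = rng.standard_normal((n_videos, n_features))
brain_responses = rng.standard_normal((n_videos, n_voxels))

# Fit a linear encoding model (features -> voxel responses) on a training
# split of videos; RidgeCV selects the regularization strength.
X_train, X_test, y_train, y_test = train_test_split(
    model_features, brain_responses, test_size=0.2, random_state=0)
encoder = RidgeCV(alphas=np.logspace(-2, 4, 7))
encoder.fit(X_train, y_train)

# Score: Pearson correlation between predicted and measured responses,
# computed per voxel on the held-out videos.
pred = encoder.predict(X_test)
pred_c = pred - pred.mean(axis=0)
true_c = y_test - y_test.mean(axis=0)
r = (pred_c * true_c).sum(axis=0) / (
    np.linalg.norm(pred_c, axis=0) * np.linalg.norm(true_c, axis=0))
print(f"median voxelwise correlation: {np.median(r):.3f}")

In practice, such scores are often normalized by a per-voxel reliability estimate (for example, split-half correlations across repeated presentations, where the dataset provides them) so that model performance is expressed relative to the explainable signal; that step is omitted here.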
Date issued
2022-05
URI
https://hdl.handle.net/1721.1/144631
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
