The natural stories corpus
Author(s)
Futrell, Richard; Gibson, Edward; Blank, Idan; Vishnevetsky, Anastasia
Publisher with Creative Commons License
Creative Commons Attribution
Abstract
© LREC 2018 - 11th International Conference on Language Resources and Evaluation. All rights reserved.
It is now common practice to compare models of human language processing by how well they predict behavioral and neural measures of processing difficulty, such as reading times, on corpora of rich naturalistic linguistic materials. However, many of these corpora, which are based on naturally occurring text, contain few of the low-frequency syntactic constructions that are often required to distinguish between processing theories. Here we describe a new corpus consisting of English texts edited to contain many low-frequency syntactic constructions while still sounding fluent to native speakers. The corpus is annotated with hand-corrected Penn Treebank-style parse trees and includes self-paced reading time data and aligned audio recordings. We give an overview of the content of the corpus and release the data.
Date issued
2018
Department
Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences; Center for Brains, Minds, and Machines
Citation
Futrell, Richard, Gibson, Edward, Blank, Idan and Vishnevetsky, Anastasia. 2018. "The natural stories corpus."
Version: Original manuscript