Lexical-Semantic Content, Not Syntactic Structure, Is the Main Contributor to ANN-Brain Similarity of fMRI Responses in the Language Network

Kauf, Carina; Tuckute, Greta; Levy, Roger; Andreas, Jacob; Fedorenko, Evelina

Publisher with Creative Commons License

Terms of use

Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/

Abstract

Representations from artificial neural network (ANN) language models have been shown to predict human brain activity in the language network. To understand what aspects of linguistic stimuli contribute to ANN-to-brain similarity, we used an fMRI data set of responses to n = 627 naturalistic English sentences (Pereira et al., 2018) and systematically manipulated the stimuli for which ANN representations were extracted. In particular, we (i) perturbed sentences’ word order, (ii) removed different subsets of words, or (iii) replaced sentences with other sentences of varying semantic similarity. We found that the lexical-semantic content of the sentence (largely carried by content words) rather than the sentence’s syntactic form (conveyed via word order or function words) is primarily responsible for the ANN-to-brain similarity. In follow-up analyses, we found that perturbation manipulations that adversely affect brain predictivity also lead to more divergent representations in the ANN’s embedding space and decrease the ANN’s ability to predict upcoming tokens in those stimuli. Further, the results are robust to whether the mapping model is trained on intact or perturbed stimuli and to whether the ANN sentence representations are conditioned on the same linguistic context that humans saw. The critical result, that lexical-semantic content is the main contributor to the similarity between ANN representations and neural ones, aligns with the idea that the goal of the human language system is to extract meaning from linguistic strings. Finally, this work highlights the strength of systematic experimental manipulations for evaluating how close we are to accurate and generalizable models of the human language network.
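To make manipulations (i) and (ii) concrete, here is a minimal, illustrative sketch of how a sentence might be scrambled or stripped down to its content words or function words before being passed to an ANN. It is not the authors' code. The function names, the whitespace tokenization, and the small FUNCTION_WORDS list are all hypothetical simplifications. The abstract does not say how function words were identified, and a real pipeline would more likely rely on a part-of-speech tagger.

```python
import random

# Hypothetical, deliberately small function-word list used only for
# illustration; it is NOT the word classification used in the paper.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "in", "on", "at", "to", "and", "or", "but",
    "is", "are", "was", "were", "be", "it", "that", "this", "with", "for",
}


def shuffle_words(sentence: str, seed: int = 0) -> str:
    """Manipulation (i): scramble word order but keep the same words.

    Uses naive whitespace tokenization (an assumption) and a seeded RNG
    so the same seed always yields the same permutation.
    """
    words = sentence.split()
    rng = random.Random(seed)
    rng.shuffle(words)
    return " ".join(words)


def keep_content_words(sentence: str) -> str:
    """Manipulation (ii), one variant: drop function words, keep content words."""
    return " ".join(w for w in sentence.split() if w.lower() not in FUNCTION_WORDS)


def keep_function_words(sentence: str) -> str:
    """Manipulation (ii), another variant: drop content words, keep function words."""
    return " ".join(w for w in sentence.split() if w.lower() in FUNCTION_WORDS)


if __name__ == "__main__":
    s = "The chef cooked a meal for the guests"
    print(keep_content_words(s))   # chef cooked meal guests
    print(keep_function_words(s))  # The a for the
    print(shuffle_words(s))        # same eight words, permuted order
```

Under the paper's finding, the content-word-only version would be expected to retain most of the ANN-to-brain similarity, because it preserves lexical-semantic content even though it loses syntactic form. The scrambled version keeps every word but destroys word order.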

Date issued

2023-09-21

URI

https://hdl.handle.net/1721.1/153506

Department

Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences; McGovern Institute for Brain Research at MIT; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory

Journal

Neurobiology of Language

Publisher

MIT Press

Citation

Kauf, C., Tuckute, G., Levy, R., Andreas, J., & Fedorenko, E. (2023). Lexical-semantic content, not syntactic structure, is the main contributor to ANN-brain similarity of fMRI responses in the language network. Neurobiology of Language. Advance publication.

Version: Final published version

ISSN

2641-4368

Keywords

Neurology, Linguistics and Language

Collections

MIT Open Access Articles