
dc.contributor.advisor: Levy, Roger
dc.contributor.author: Pushpita, Subha Nawer
dc.date.accessioned: 2024-09-16T13:51:29Z
dc.date.available: 2024-09-16T13:51:29Z
dc.date.issued: 2024-05
dc.date.submitted: 2024-07-11T14:37:11.187Z
dc.identifier.uri: https://hdl.handle.net/1721.1/156826
dc.description.abstract: Context fundamentally shapes real-time human language processing, creating linguistic expectations that drive efficient processing and accurate disambiguation (Kuperberg and Jaeger, 2016). In naturalistic language understanding, the visual scene often provides crucial context (Ferreira et al., 2013; Huettig et al., 2011). Visual context is known to guide spoken word recognition (Allopenna et al., 1998), syntactic disambiguation (Tanenhaus et al., 1995), and prediction (Altmann and Kamide, 1999), but much about how it shapes real-time language comprehension remains unknown. In this project, we investigate how visual information penetrates the language processing system and real-time language understanding. We show that relevant visual context significantly facilitates reading comprehension, with the amount of facilitation modulated by a word's degree of grounding in that visual context (here, an image). Our results also demonstrate that this facilitation is largely mediated by multimodal surprisal: the relative entropy, induced by the word, between the distribution over interpretations given the previous words in the sentence and the distribution given the image as well. We further found that the errors people are prone to make in reading comprehension tasks are largely predicted by multimodal surprisal, and that a word's degree of grounding correlates strongly with the reduction in surprisal afforded by the presence of an image. Our work offers new possibilities for using multimodal large language models in psycholinguistic research to investigate how visual context affects language processing.
This work also opens questions about how information processed in other modalities, such as audio, video, or structured visuals like graphs and diagrams, shapes upcoming linguistic comprehension and even language generation, providing fundamental theoretical insight into how we use language to navigate a complex world.
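The abstract defines multimodal surprisal as a relative entropy between two distributions over interpretations. As a minimal illustrative sketch (not the thesis's actual implementation), the quantity can be computed as a KL divergence between a text-only distribution and a text-plus-image distribution; the distributions and interpretation labels below are hypothetical:

```python
import math

def multimodal_surprisal(p_text, p_multimodal):
    """KL divergence (in bits) from the text-only distribution over
    interpretations to the distribution that also conditions on the image.
    Both arguments are dicts mapping interpretations to probabilities."""
    return sum(
        p * math.log2(p / p_text[k])
        for k, p in p_multimodal.items()
        if p > 0
    )

# Hypothetical distributions over two readings of an ambiguous word:
text_only = {"animal": 0.5, "tool": 0.5}   # prior from preceding words alone
with_image = {"animal": 0.9, "tool": 0.1}  # posterior once the image is seen

print(round(multimodal_surprisal(text_only, with_image), 3))  # → 0.531
```

A larger divergence means the image shifts the interpretation distribution more, which on the abstract's account predicts greater comprehension difficulty for that word.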
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Expectation-based comprehension of linguistic input: facilitation from visual context
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science

