
dc.contributor.advisor: Levy, Roger
dc.contributor.author: Pushpita, Subha Nawer
dc.date.accessioned: 2024-09-16T13:51:29Z
dc.date.available: 2024-09-16T13:51:29Z
dc.date.issued: 2024-05
dc.date.submitted: 2024-07-11T14:37:11.187Z
dc.identifier.uri: https://hdl.handle.net/1721.1/156826
dc.description.abstract: Context fundamentally shapes real-time human language processing, creating linguistic expectations that drive efficient processing and accurate disambiguation (Kuperberg and Jaeger, 2016). In naturalistic language understanding, the visual scene often provides crucial context (Ferreira et al., 2013; Huettig et al., 2011). Visual context is known to guide spoken word recognition (Allopenna et al., 1998), syntactic disambiguation (Tanenhaus et al., 1995), and prediction (Altmann and Kamide, 1999), but much about how it shapes real-time language comprehension remains unknown. In this project, we investigate how visual information penetrates the language processing system and real-time language understanding. We show that relevant visual context significantly facilitates reading comprehension, with the amount of facilitation modulated by a word's degree of grounding in that visual context (here, an image). Our results also demonstrate that this facilitation is largely mediated by multimodal surprisal: the relative entropy, induced by the word, between the distribution over interpretations given the previous words in the sentence and the distribution given the image as well. We further found that the errors people are prone to make in reading comprehension tasks are largely predicted by multimodal surprisal, and that a word's degree of grounding correlates strongly with the reduction in surprisal afforded by the presence of an image. Our work offers new possibilities for using multimodal large language models in psycholinguistic research to investigate how visual context affects language processing.
This work also opens questions about how information processed in other modalities, such as audio, video, or structured visuals like graphs and diagrams, shapes upcoming linguistic comprehension and even language generation, providing fundamental theoretical insight into how we use language to navigate a complex world.
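The abstract defines multimodal surprisal as a relative entropy between two distributions over interpretations. As a minimal illustrative sketch (not the thesis's actual implementation), the quantity can be computed as a KL divergence between a text-only distribution and a text-plus-image distribution; the distributions and interpretation labels below are hypothetical:

```python
import math

def multimodal_surprisal(p_text, p_multimodal):
    """KL divergence (in bits) from the text-only distribution over
    interpretations to the distribution that also conditions on the image.
    Both arguments are dicts mapping interpretations to probabilities."""
    return sum(
        p * math.log2(p / p_text[k])
        for k, p in p_multimodal.items()
        if p > 0
    )

# Hypothetical distributions over two readings of an ambiguous word:
text_only = {"animal": 0.5, "tool": 0.5}   # prior from preceding words alone
with_image = {"animal": 0.9, "tool": 0.1}  # posterior once the image is seen

print(round(multimodal_surprisal(text_only, with_image), 3))  # → 0.531
```

A larger divergence means the image shifts the interpretation distribution more, which on the abstract's account predicts greater comprehension difficulty for that word.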
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Expectation-based comprehension of linguistic input: facilitation from visual context
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science

