dc.contributor.advisor: Oliva, Aude
dc.contributor.author: Huang, Irene Y.
dc.date.accessioned: 2024-09-24T18:26:36Z
dc.date.available: 2024-09-24T18:26:36Z
dc.date.issued: 2024-05
dc.date.submitted: 2024-07-11T14:37:27.714Z
dc.identifier.uri: https://hdl.handle.net/1721.1/157010
dc.description.abstract: Recent advancements in modern Vision-Language Models (VLMs), comprising a visual encoder coupled with a Large Language Model (LLM) decoder, have demonstrated remarkable proficiency in Compositional Reasoning (CR). CR entails grasping the significance of attributes, relations, and word order. This prompts a crucial question: have VLMs effectively tackled the CR challenge? Our conjecture suggests that existing CR benchmarks may not adequately push the boundaries of modern VLMs due to their reliance on a negative text generation pipeline. Consequently, the negatives produced often deviate either as outliers from the natural language distribution learned by VLMs’ LLM decoders or as improbable within the corresponding image context. To redress these limitations, we propose a novel pipeline integrating GPT-4V alongside a suite of contemporary open-source VLMs. Through the application of in-context-learning and prompt engineering methodologies, our pipeline autonomously generates, evaluates, and selects challenging compositional reasoning questions, to establish a robust CR benchmark, also subsequently validated manually. The meticulously curated dataset evinces a noteworthy, up to 45%, decrease in CR performance compared to preceding benchmarks, thereby reinstating the CR challenge even for state-of-the-art VLMs.
dc.publisher: Massachusetts Institute of Technology
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Rethinking the Evaluation of Compositional Reasoning for Modern VLMs
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science

