MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Graduate Theses
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Rethinking the Evaluation of Compositional Reasoning for Modern VLMs

Author(s)
Huang, Irene Y.
Thumbnail
DownloadThesis PDF (8.565Mb)
Advisor
Oliva, Aude
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) Copyright retained by author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/
Metadata
Show full item record
Abstract
Recent advancements in modern Vision-Language Models (VLMs), comprising a visual encoder coupled with a Large Language Model (LLM) decoder, have demonstrated remarkable proficiency in Compositional Reasoning (CR). CR entails grasping the significance of attributes, relations, and word order. This prompts a crucial question: have VLMs effectively tackled the CR challenge? Our conjecture suggests that existing CR benchmarks may not adequately push the boundaries of modern VLMs due to their reliance on a negative text generation pipeline. Consequently, the negatives produced often deviate either as outliers from the natural language distribution learned by VLMs’ LLM decoders or as improbable within the corresponding image context. To redress these limitations, we propose a novel pipeline integrating GPT-4V alongside a suite of contemporary open-source VLMs. Through the application of in-context-learning and prompt engineering methodologies, our pipeline autonomously generates, evaluates, and selects challenging compositional reasoning questions, to establish a robust CR benchmark, also subsequently validated manually. The meticulously curated dataset evinces a noteworthy, up to 45%, decrease in CR performance compared to preceding benchmarks, thereby reinstating the CR challenge even for state-of-the-art VLMs.
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/157010
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.