| dc.contributor.advisor | Oliva, Aude | |
| dc.contributor.author | Huang, Irene Y. | |
| dc.date.accessioned | 2024-09-24T18:26:36Z | |
| dc.date.available | 2024-09-24T18:26:36Z | |
| dc.date.issued | 2024-05 | |
| dc.date.submitted | 2024-07-11T14:37:27.714Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/157010 | |
| dc.description.abstract | Modern Vision-Language Models (VLMs), which couple a visual encoder with a Large Language Model (LLM) decoder, have recently demonstrated remarkable proficiency in Compositional Reasoning (CR). CR entails grasping the significance of attributes, relations, and word order. This prompts a crucial question: have VLMs effectively solved the CR challenge? We conjecture that existing CR benchmarks may not adequately push the boundaries of modern VLMs because of their reliance on a negative text generation pipeline. The negatives it produces are often either outliers from the natural language distribution learned by VLMs’ LLM decoders or improbable in the corresponding image context. To address these limitations, we propose a novel pipeline that integrates GPT-4V with a suite of contemporary open-source VLMs. Using in-context learning and prompt engineering, our pipeline autonomously generates, evaluates, and selects challenging compositional reasoning questions to establish a robust CR benchmark, which is subsequently validated manually. On the curated dataset, CR performance decreases by up to 45% compared to preceding benchmarks, reinstating the CR challenge even for state-of-the-art VLMs. | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) | |
| dc.rights | Copyright retained by author(s) | |
| dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
| dc.title | Rethinking the Evaluation of Compositional Reasoning for Modern VLMs | |
| dc.type | Thesis | |
| dc.description.degree | M.Eng. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| mit.thesis.degree | Master | |
| thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |