DSpace@MIT

  • DSpace@MIT Home
  • MIT Libraries
  • MIT Theses
  • Doctoral Theses
  • View Item

Multimodal Representation Learning for Agentic AI Systems

Author(s)
Andonian, Alexander
Download
Thesis PDF (45.34 MB)
Advisor
Oliva, Aude
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Modern artificial intelligence (AI) is poised to transform the scientific process, from ideation and experimentation to peer review. Many researchers posit that emerging generalist AI “agents” will soon no longer be mere tools, but equal partners in scientific exploration. In this work, we contribute to this evolving landscape through converging lines of research focused on developing and evaluating more efficient and interpretable AI systems, spanning both vision and language domains, and their applications to scientific evaluation and review. Our research focuses on three key areas. First, we introduce a novel framework to enhance the efficiency and robustness of cross-modal representation learning methods. Our approach utilizes progressive self-distillation and soft image-text alignments to model the many-to-many correspondences found in noisy web-harvested datasets. Extensive evaluation demonstrates that our method consistently outperforms CLIP across multiple benchmarks, including improved robustness to natural distribution shifts. We extend this framework to zero-shot open vocabulary detection, introducing augmentation, architectural and self-training strategies for improving vision-text feature alignment. Evaluation on long-tail detection benchmarks demonstrates state-of-the-art performance, with competitive performance for unseen classes, as well as superior transfer to additional datasets. Finally, we present the Review Integrated Scientific Evaluation (RISE) benchmark, a novel framework for assessing AI performance in understanding, critiquing, and providing constructive feedback on scientific manuscripts. Our study compares AI-generated reviews against human expert evaluations, revealing both the promising capabilities and current limitations of AI in scientific peer review. 
The dissertation concludes by proposing future directions for AI-accelerated science, emphasizing the need for collaborative human-AI scientific communities and the development of evaluation methods for higher-level autonomous capabilities in scientific domains. Altogether, this work contributes to the ongoing discourse on AI’s role in scientific research and paves the way for more rigorous integration of AI systems into the scientific process.
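The abstract's first contribution replaces CLIP's hard one-to-one image-text targets with soft alignments estimated by the model itself. As a rough illustration only (not the thesis's actual implementation), the idea can be sketched as a contrastive loss whose targets blend the identity matrix with teacher-predicted similarities; the blend weight `alpha`, temperature `tau`, and `teacher_sim` input here are hypothetical placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_clip_loss(img_emb, txt_emb, teacher_sim, alpha=0.5, tau=0.07):
    """Contrastive loss with soft targets: a convex blend of the hard
    one-to-one CLIP targets (identity) and teacher-estimated alignments,
    loosely approximating progressive self-distillation.

    alpha, tau, and teacher_sim are illustrative, not the thesis's values.
    """
    # L2-normalize embeddings; scaled cosine similarities as logits.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / tau

    n = logits.shape[0]
    # Soft targets: (1 - alpha) * hard diagonal + alpha * teacher distribution,
    # allowing many-to-many correspondences within the batch.
    targets = (1.0 - alpha) * np.eye(n) + alpha * softmax(teacher_sim / tau, axis=1)

    # Cross-entropy of soft targets against the image-to-text log-probabilities.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())
```

With `alpha = 0` this reduces to the standard hard-target CLIP objective; increasing `alpha` shifts probability mass toward the teacher's soft alignments, which is how a progressive schedule could interpolate between the two regimes over training.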
Date issued
2024-09
URI
https://hdl.handle.net/1721.1/158506
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
