MIT Libraries, DSpace@MIT


Minimalist Approach to End-to-End Vision Language Navigation with Multi-Modal Foundation Model Features

Author(s)
Mishra, Kartikesh
Thesis PDF (19.36 MB)
Advisor
Rus, Daniela
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Recent vision-language navigation (VLN) approaches rely on large models, prompt engineering, and/or explicit reasoning to interpret instructions and guide the agent. We introduce MiniNav, a minimalist framework that uses frozen vision-language foundation models as patch-wise feature extractors, avoiding data- and compute-heavy fine-tuning as well as cumbersome language-model reasoning. Our lightweight control policies (∼10⁵ trainable parameters) are trained on a compact dataset of language-specified navigational behaviors (∼10² runs and ∼10⁴ frames per behavior). We demonstrate generalization to novel objects and scenes, including direct real-world transfer, despite training on only two objects in a single simulated environment. Through its simple and scalable design, MiniNav offers an alternative to computationally intensive pipelines for robust real-world instruction following. Our solution can also serve as a reference for evaluating the effective advantage of larger, more complex VLN policies.
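The abstract states only the scale of the trainable policy (∼10⁵ parameters), not its architecture. The sketch below is a hypothetical illustration of how a frozen patch-wise feature extractor can be paired with a policy of that size; it is not MiniNav's actual design. It assumes that the frozen model yields a 16×16 grid of patch embeddings, that each patch is scored against the instruction's text embedding by cosine similarity, and that a two-layer MLP maps the resulting 256 scores to two bounded control outputs. The names `GRID`, `EMBED_DIM`, `HIDDEN`, `N_ACTIONS`, `similarity_map`, and `policy` are all invented for this example, and random vectors stand in for real encoder outputs.

```python
import math
import random

# Hypothetical sizes, chosen for illustration only (not taken from the thesis).
GRID = 16        # assumed 16x16 patch grid from the frozen encoder
EMBED_DIM = 8    # toy embedding width; real foundation-model features are far wider
HIDDEN = 384     # assumed hidden width of the trainable policy MLP
N_ACTIONS = 2    # assumed outputs, e.g. forward and turning velocity


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def similarity_map(patch_feats, text_feat):
    """Score every (frozen) patch embedding against the instruction embedding."""
    return [cosine(p, text_feat) for p in patch_feats]


def mlp_param_count(n_in, n_hidden, n_out):
    """Weights plus biases of a two-layer MLP: the only trainable part here."""
    return (n_in * n_hidden + n_hidden) + (n_hidden * n_out + n_out)


def init_layer(n_in, n_out, rng):
    """Random weight matrix (n_out rows of n_in) and zero biases."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def linear(x, layer):
    """Dense layer: W x + b."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def policy(sim, l1, l2):
    """Similarity map -> ReLU hidden layer -> tanh-bounded control outputs."""
    h = [max(0.0, v) for v in linear(sim, l1)]
    return [math.tanh(v) for v in linear(h, l2)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_in = GRID * GRID
    # Random stand-ins for frozen encoder outputs, purely for illustration.
    patches = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(n_in)]
    text = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]
    sim = similarity_map(patches, text)
    l1 = init_layer(n_in, HIDDEN, rng)
    l2 = init_layer(HIDDEN, N_ACTIONS, rng)
    print(mlp_param_count(n_in, HIDDEN, N_ACTIONS))  # 99458
    print(policy(sim, l1, l2))
```

Under these assumptions the trainable head has 256·384 + 384 + 384·2 + 2 = 99,458 parameters, which is on the order of 10⁵. The encoder contributes no trainable parameters because its features are used frozen.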
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/163018
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
