dc.contributor.advisor | Balakrishnan, Hamsa | |
dc.contributor.author | Ding, Wenqi | |
dc.date.accessioned | 2025-09-18T14:28:30Z | |
dc.date.available | 2025-09-18T14:28:30Z | |
dc.date.issued | 2025-05 | |
dc.date.submitted | 2025-06-23T14:01:44.498Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/162710 | |
dc.description.abstract | Robotic agents navigating 3D environments must continuously decide their next moves by reasoning about both visual observations and high-level language instructions. However, they plan in a high-dimensional latent space, opaque to human collaborators. Hence, it is difficult for humans to understand the agent’s decision-making process. This lack of interpretability hinders effective collaboration between humans and robots. The key question we are trying to answer in this thesis is: Can we build a unified planning framework that fuses visual and language into a single, interpretable representation, so that humans can interpret robots’ decisions? We propose a model-based planning framework built around pretrained vision-language models (VLMs). We show that VLMs can be used to plan in a unified embedding space, where visual and language representations can be decoded back to human-interpretable forms. Empirical evaluation on vision-language navigation benchmarks demonstrates both improved sample efficiency and transparent decision making, enabling human-in-the-loop planning and more effective human-robot collaboration. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | In Copyright - Educational Use Permitted | |
dc.rights | Copyright retained by author(s) | |
dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ | |
dc.title | Model-based Planning for Efficient Task Execution | |
dc.type | Thesis | |
dc.description.degree | M.Eng. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |