Model-based Planning for Efficient Task Execution
| dc.contributor.advisor | Balakrishnan, Hamsa |
| dc.contributor.author | Ding, Wenqi |
| dc.date.accessioned | 2025-09-18T14:28:30Z |
| dc.date.available | 2025-09-18T14:28:30Z |
| dc.date.issued | 2025-05 |
| dc.date.submitted | 2025-06-23T14:01:44.498Z |
| dc.identifier.uri | https://hdl.handle.net/1721.1/162710 |
| dc.description.abstract | Robotic agents navigating 3D environments must continuously decide their next moves by reasoning about both visual observations and high-level language instructions. However, they plan in a high-dimensional latent space that is opaque to human collaborators. Hence, it is difficult for humans to understand the agent’s decision-making process. This lack of interpretability hinders effective collaboration between humans and robots. The key question we are trying to answer in this thesis is: Can we build a unified planning framework that fuses vision and language into a single, interpretable representation, so that humans can interpret robots’ decisions? We propose a model-based planning framework built around pretrained vision-language models (VLMs). We show that VLMs can be used to plan in a unified embedding space, where visual and language representations can be decoded back to human-interpretable forms. Empirical evaluation on vision-language navigation benchmarks demonstrates both improved sample efficiency and transparent decision making, enabling human-in-the-loop planning and more effective human-robot collaboration. |
| dc.publisher | Massachusetts Institute of Technology |
| dc.rights | In Copyright - Educational Use Permitted |
| dc.rights | Copyright retained by author(s) |
| dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ |
| dc.title | Model-based Planning for Efficient Task Execution |
| dc.type | Thesis |
| dc.description.degree | M.Eng. |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
| mit.thesis.degree | Master |
| thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science |
