Generalizable Robot Manipulation through Unified Perception, Policy Learning, and Planning
Author(s)
Fang, Xiaolin
Advisor
Kaelbling, Leslie Pack
Lozano-Pérez, Tomás
Abstract
Advancing robotic manipulation to achieve generalization across diverse goals, environments, and embodiments is a critical challenge in robotics research. While the availability of data and large-scale training has brought exciting progress in robotic manipulation, current methods often struggle to generalize to unseen, unstructured environments and to solve long-horizon tasks. In this thesis, I present my work in robot learning and planning that enables multi-step manipulation in partially observable environments, toward general-purpose embodied agents. Specifically, I describe 1) a modular framework that combines learned perception models for affordance estimation with task-and-motion planning (TAMP) for object rearrangement in unstructured scenes, 2) generative diffusion models of robot skills, which can be composed through inference-time optimization to solve unseen combinations of environmental constraints, and 3) the use of large vision-language models (VLMs) to build task-oriented visual abstractions, allowing skills to generalize across different environments with only 5 to 10 demonstrations. Together, these approaches contribute to the generality and scalability of embodied agents toward solving real-world manipulation tasks in unstructured environments.
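The second thread composes skill-level diffusion models at inference time. As a loose, hypothetical illustration of that general idea (not the thesis implementation), the Python sketch below samples from the product of two toy one-dimensional "constraint" models by summing their score functions inside a Langevin sampler; all functions, parameters, and values here are made up for illustration.

# Hedged sketch, assuming the general compositional-diffusion idea of
# sampling from a product of constraint distributions by summing scores.
# The two "skill constraints" are stand-in analytic Gaussian scores.
import numpy as np

def score_a(x):
    # gradient of log p_a(x): hypothetical constraint "stay near 1.0"
    return -(x - 1.0) / 0.25

def score_b(x):
    # gradient of log p_b(x): hypothetical constraint "stay near -0.5"
    return -(x + 0.5) / 0.25

def composed_langevin_sample(n_steps=500, step=1e-2, seed=None):
    # Langevin dynamics with the summed (product-of-experts) score,
    # so the final sample approximately satisfies both constraints.
    rng = np.random.default_rng(seed)
    x = rng.normal()
    for _ in range(n_steps):
        grad = score_a(x) + score_b(x)
        x = x + 0.5 * step * grad + np.sqrt(step) * rng.normal()
    return x

print(composed_langevin_sample(seed=0))  # concentrates near 0.25, between the two means

In this toy setting, the product of the two Gaussian constraints is itself a Gaussian centered between their means, so the composed sampler lands near a value neither model alone would prefer; the thesis applies the analogous composition to learned diffusion models over robot skill trajectories.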
Date issued
2025-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology