Exploring the Role of Foundation Models for Training Generalist Robot Learning Policies
Author(s)
Feng, Eugenia Y.
Advisor
Kaelbling, Leslie
Agarwal, Aditya
Abstract
Many methods for solving goal-conditioned, short-horizon tasks require hundreds of expert demonstrations. These demonstrations are labor-intensive to collect, which limits the scalability of such approaches. Even approaches that succeed may struggle to generalize to slightly different settings. In this work, we explore two approaches to training generalist robot learning policies using large-scale foundation models.
The first approach uses a video foundation model to generate task-conditioned synthetic demonstrations at scale from a single expert demonstration. The objective is to use these synthetic demonstrations as a proxy for expert demonstrations when training models that learn rewards from expert videos, with the goal of solving complex visual reinforcement learning (RL) problems.
The second approach seeks to improve the generalization ability of behavior cloning policies. Moving away from training on videos, we explore privileged representations, such as keypoints or object poses, obtained using open-set foundation models. By tracking pose or keypoint correspondences, we aim to reduce the number of demonstrations required for task completion and to improve generalization within object classes.
Date issued
2025-02
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology