Show simple item record

dc.contributor.advisor: Rus, Daniela
dc.contributor.author: Quach, Alex H.
dc.date.accessioned: 2024-09-03T21:08:18Z
dc.date.available: 2024-09-03T21:08:18Z
dc.date.issued: 2024-05
dc.date.submitted: 2024-07-11T14:36:25.550Z
dc.identifier.uri: https://hdl.handle.net/1721.1/156571
dc.description.abstract: Achieving generalization for autonomous robotic systems operating in real-world environments remains a significant challenge. Training robots solely in simulation can be limiting due to the "sim-to-real gap": discrepancies between simulated and real-world conditions. We present two novel approaches to enhance the generalization capabilities of autonomous quadrotor navigation systems when transferring from simulation to the real world. Our first approach integrates a 3D Gaussian Splatting radiance field with a quadrotor flight dynamics engine to generate high-quality, photorealistic training data. We design imitation learning schemes to train liquid time-constant neural networks on this data. Through rigorous evaluations, we demonstrate successful zero-shot transfer of the learned navigation policies from simulation to real-world flight, exhibiting generalization to complex, multi-step tasks in novel indoor and outdoor environments. Notably, we showcase autonomous quadrotor policies trained entirely in simulation that can be directly deployed in the real world without fine-tuning. Our method leverages the complementary strengths of photorealistic rendering and irregularly time-sampled data augmentation for enhancing generalization with liquid neural networks. Additionally, we compose off-the-shelf vision-and-language models with neural policies, enabling real-world generalization to complex objects and instructions unseen during training. To the best of our knowledge, this is the first report of zero-shot sim-to-real transfer and semantic generalization for autonomous quadrotor navigation using imitation learning. Our key contributions include: (1) a dynamics-augmented Gaussian splatting simulator, (2) implicit closed-loop augmentation via expert trajectory design, (3) robustifying liquid neural networks through irregularly sampled data, (4) extensive simulation and real-world validation, (5) demonstrating zero-shot real-world transfer capabilities, and (6) enabling zero-shot instruction generalization to novel objects using multimodal representations.
dc.publisher: Massachusetts Institute of Technology
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Robust Scene and Object Generalization of Neural Policies Trained in Synthetic Environments
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science
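
The abstract above mentions irregularly time-sampled data augmentation as one ingredient for robustifying liquid time-constant policies. The following is a minimal illustrative sketch, not the thesis implementation: it only shows the general idea of subsampling a regularly logged expert trajectory at irregular intervals so a time-aware policy is trained on varying time gaps. All names, shapes, and parameters here are assumptions made for the example.

    # Minimal sketch (assumed names and shapes, not the thesis code):
    # randomly subsample a regularly logged expert trajectory so the
    # time gaps between consecutive training samples become irregular.
    import numpy as np

    def irregular_resample(observations, actions, timestamps, keep_fraction=0.7, rng=None):
        """Keep a random subset of steps and return the resulting
        irregular time gaps alongside the kept observations/actions."""
        rng = np.random.default_rng() if rng is None else rng
        n = len(timestamps)
        # Always keep the first step; randomly keep a fraction of the rest.
        kept_rest = 1 + rng.choice(n - 1, size=int(keep_fraction * (n - 1)), replace=False)
        keep = np.sort(np.concatenate(([0], kept_rest)))
        # Elapsed time between kept steps (first gap is zero by construction).
        dt = np.diff(timestamps[keep], prepend=timestamps[keep][0])
        return observations[keep], actions[keep], dt

    # Usage: obs_aug, act_aug, dt = irregular_resample(obs, act, t)
    # The (observation, dt) pairs become inputs and the actions become
    # imitation targets for a time-continuous policy network.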

