Robust Scene and Object Generalization of Neural Policies Trained in Synthetic Environments

Quach, Alex H.



Advisor

Rus, Daniela

Terms of use

Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/


Abstract

Achieving generalization for autonomous robotic systems operating in real-world environments remains a significant challenge. Training robots solely in simulation can be limiting because of the "sim-to-real gap": discrepancies between simulated and real-world conditions. We present two novel approaches to enhance the generalization capabilities of autonomous quadrotor navigation systems when transferring from simulation to the real world. Our first approach integrates a 3D Gaussian Splatting radiance field with a quadrotor flight dynamics engine to generate high-quality, photorealistic training data. We design imitation learning schemes to train liquid time-constant neural networks on this data. Through rigorous evaluations, we demonstrate successful zero-shot transfer of the learned navigation policies from simulation to real-world flight, with generalization to complex, multi-step tasks in novel indoor and outdoor environments. Notably, we show autonomous quadrotor policies trained entirely in simulation that can be deployed directly in the real world without fine-tuning. Our method leverages the complementary strengths of photorealistic rendering and irregularly time-sampled data augmentation to enhance generalization with liquid neural networks. Additionally, we compose off-the-shelf vision-and-language models with neural policies, enabling real-world generalization to complex objects and instructions unseen during training. To the best of our knowledge, this is the first report of zero-shot sim-to-real transfer and semantic generalization for autonomous quadrotor navigation using imitation learning.
Our key contributions include: (1) a dynamics-augmented Gaussian splatting simulator, (2) implicit closed-loop augmentation via expert trajectory design, (3) robustifying liquid neural networks through irregularly sampled data, (4) extensive simulation and real-world validation, (5) demonstrating zero-shot real-world transfer capabilities, and (6) enabling zero-shot instruction generalization to novel objects using multimodal representations.
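The abstract credits part of the generalization to training liquid neural networks on irregularly time-sampled data. As a rough illustration only, the NumPy sketch below shows one way such an augmentation could look. It randomly drops frames from a uniformly sampled trajectory and passes each resulting, variable time gap to a liquid time-constant (LTC) cell, using the fused semi-implicit Euler step from the original LTC formulation. The function names, drop probability, sizes, and random weights are all hypothetical assumptions. This page does not describe the thesis's actual augmentation scheme, architecture, or hyperparameters.

```python
import numpy as np


def irregular_subsample(times, features, labels, keep_prob, rng):
    """Hypothetical augmentation: randomly drop samples from a uniformly
    sampled trajectory so the remaining samples are irregularly spaced.

    Returns the kept timestamps, features, labels, and the elapsed time
    (dt) before each kept sample.
    """
    mask = rng.random(len(times)) < keep_prob
    mask[0] = True  # always keep the first sample
    idx = np.flatnonzero(mask)
    t = times[idx]
    dt = np.empty_like(t)
    dt[0] = times[1] - times[0]  # assume one nominal step before the first sample
    dt[1:] = np.diff(t)  # gaps grow wherever samples were dropped
    return t, features[idx], labels[idx], dt


def ltc_step(x, u, dt, params):
    """One fused semi-implicit Euler step of a liquid time-constant cell:
    dx/dt = -(1/tau + f) * x + f * A, with f = sigmoid(W_in u + W_rec x + b).
    """
    W_in, W_rec, b, A, tau = params
    f = 1.0 / (1.0 + np.exp(-(W_in @ u + W_rec @ x + b)))
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))


def rollout(inputs, dts, params):
    """Run the cell over a sequence, passing each sample its own dt."""
    x = np.zeros(params[0].shape[0])
    states = []
    for u, dt in zip(inputs, dts):
        x = ltc_step(x, u, dt, params)
        states.append(x)
    return np.stack(states)


# Toy usage with made-up sizes: 100 samples at a nominal 20 Hz,
# 8 input features, 2 action labels, 16 hidden units.
rng = np.random.default_rng(0)
times = np.arange(100) * 0.05
features = rng.normal(size=(100, 8))
labels = rng.normal(size=(100, 2))
t, feats, labs, dts = irregular_subsample(times, features, labels, 0.6, rng)

hidden = 16
params = (
    rng.normal(scale=0.5, size=(hidden, 8)),       # W_in
    rng.normal(scale=0.5, size=(hidden, hidden)),  # W_rec
    np.zeros(hidden),                              # b
    rng.uniform(0.5, 1.5, size=hidden),            # A (positive targets)
    np.full(hidden, 0.5),                          # tau
)
states = rollout(feats, dts, params)
```

Because every step receives its own dt, a longer gap lets the state relax further toward A in a single update rather than assuming a fixed sampling rate. With a zero initial state and positive A, the fused update keeps every state component within [0, A].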

Date issued

2024-05

URI

https://hdl.handle.net/1721.1/156571

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses