DSpace@MIT


Towards Self-Supervised Object Representations and 3D Scene Graph Based Navigation

Author(s)
Peng, Lisa
Thesis PDF (39.37 MB)
Advisor
Carlone, Luca
Terms of use
In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
3D Scene Graphs are powerful hierarchical representations of environments that combine spatial and semantic information into multiple levels of abstraction. 3D Scene Graphs are useful for a wide range of planning tasks in robotics that benefit from high-level semantic knowledge, and they also capture dense low-level 3D geometry that supports robot navigation. However, current methods for 3D Scene Graph construction produce sparse and sometimes spurious node instances and incorrect annotations. This is due to their reliance on 2D semantic segmentation networks that may perform poorly outside their training domain. This thesis advances the state of the art in dense 2D semantic segmentation and 3D object pose estimation to improve scene graph construction and enable navigation in real-life environments. First, we tackle the scalability problem of data annotation for deep semantic segmentation and introduce a simple training approach for dense 2D object instance segmentation. The approach uses model-based synthetic data for training and augments it with a small amount of real-world training data. We show that with this approach, our segmentation network requires 20x fewer annotated real-world images and achieves higher-quality pixel-level segmentation on real-world test data. Second, we address the problem of data annotation in 3D object pose estimation and model fitting by proposing a novel self-supervised training framework that uses corrector and certification modules. Our architecture successfully trains a model to predict poses of partial point clouds without any ground-truth pose annotations on real data, and with certifications of correctness and non-degeneracy – characterizing both the quality of the model fit and the uniqueness of the solution. We provide extensive experiments evaluating performance on both simulated and real-world data, and show that the proposed approach matches the performance of fully supervised baselines.
Lastly, we introduce a novel application of 3D Scene Graphs to an object search task. We show how 3D Scene Graphs can be used in a reinforcement learning framework to guide autonomous navigation and discuss how hierarchical information and dense semantics improve the effectiveness of the learned policy.
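The certification idea described above – accepting a predicted pose only when the fitted model agrees closely with the observed point cloud – can be sketched as a residual check. This is a minimal illustration, not the thesis's actual architecture; the function names, the one-sided chamfer residual, and the threshold `eps` are assumptions introduced here for clarity.

```python
import numpy as np

def chamfer_residual(model_pts, observed_pts):
    """One-sided chamfer residual: mean distance from each observed
    point to its nearest point on the posed model (illustrative)."""
    d = np.linalg.norm(observed_pts[:, None, :] - model_pts[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def certify_correctness(model_pts, observed_pts, eps=0.01):
    """Hypothetical certificate of correctness: the fit is certified
    when the residual falls below a small threshold eps."""
    return chamfer_residual(model_pts, observed_pts) < eps
```

A separate non-degeneracy check (verifying the observed partial cloud constrains the pose uniquely) would accompany this in the framework described by the abstract; it is omitted here for brevity.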
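The object-search application can be illustrated by how a hierarchical scene graph prunes the search space: instead of exploring every location, a policy can descend from the building level only into rooms whose semantic children plausibly contain the target. The dictionary-based graph and helper below are a toy sketch invented for illustration, not the thesis's representation.

```python
# Toy 3D scene graph as a hierarchy: building -> rooms -> object labels.
# All node names here are hypothetical examples.
scene_graph = {
    "building": ["kitchen", "office"],
    "kitchen": ["mug", "fridge"],
    "office": ["laptop", "mug"],
}

def rooms_containing(graph, target):
    """Use the hierarchy to narrow the search: return only the rooms
    whose child nodes include the target object label."""
    return [room for room in graph["building"] if target in graph[room]]
```

In a learned policy, these candidate rooms would serve as high-level subgoals, with the dense geometry at the lower layers of the graph used for local navigation.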
Date issued
2022-09
URI
https://hdl.handle.net/1721.1/147509
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
