DSpace@MIT


Towards Self-Supervised Object Representations and 3D Scene Graph Based Navigation

Author(s)
Peng, Lisa
Thesis PDF (39.37 MB)
Advisor
Carlone, Luca
Terms of use
In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
3D Scene Graphs are powerful hierarchical representations of environments that combine spatial and semantic information into multiple levels of abstraction. 3D Scene Graphs are useful for a wide range of planning tasks in robotics that benefit from high-level semantic knowledge, and they also capture dense low-level 3D geometry that supports robot navigation. However, current methods for 3D Scene Graph construction produce sparse and sometimes spurious node instances and incorrect annotations. This is due to their reliance on 2D semantic segmentation networks that may perform poorly outside their training domain. This thesis advances the state of the art in dense 2D semantic segmentation and 3D object pose estimation to improve scene graph construction and enable navigation in real-life environments. First, we tackle the scalability problem of data annotation for deep semantic segmentation and introduce a simple training approach for dense 2D object instance segmentation. The approach uses model-based synthetic data for training and augments it with a small amount of real-world training data. We show that with this approach, our segmentation network requires 20x fewer annotated real-world images and achieves higher-quality pixel-level segmentation on real-world test data. Second, we address the problem of data annotation in 3D object pose estimation and model fitting by proposing a novel self-supervised training framework that uses corrector and certification modules. Our architecture successfully trains a model to predict poses of partial point clouds without any ground-truth pose annotations on real data, and with certifications of correctness and non-degeneracy – characterizing both the quality of the model fit and the uniqueness of the solution. We provide extensive experiments evaluating performance on both simulated and real-world data, and show that the proposed approach matches the performance of fully supervised baselines.
Lastly, we introduce a novel application of 3D Scene Graphs to an object search task. We show how 3D Scene Graphs can be used in a reinforcement learning framework to guide autonomous navigation and discuss how hierarchical information and dense semantics improve the effectiveness of the learned policy.
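The certification idea described above – accepting a predicted pose only when the fitted model agrees closely with the observed point cloud – can be sketched as a residual check. This is a minimal illustration, not the thesis's actual architecture; the function names, the one-sided chamfer residual, and the threshold `eps` are assumptions introduced here for clarity.

```python
import numpy as np

def chamfer_residual(model_pts, observed_pts):
    """One-sided chamfer residual: mean distance from each observed
    point to its nearest point on the posed model (illustrative)."""
    d = np.linalg.norm(observed_pts[:, None, :] - model_pts[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def certify_correctness(model_pts, observed_pts, eps=0.01):
    """Hypothetical certificate of correctness: the fit is certified
    when the residual falls below a small threshold eps."""
    return chamfer_residual(model_pts, observed_pts) < eps
```

A separate non-degeneracy check (verifying the observed partial cloud constrains the pose uniquely) would accompany this in the framework described by the abstract; it is omitted here for brevity.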
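The object-search application can be illustrated by how a hierarchical scene graph prunes the search space: instead of exploring every location, a policy can descend from the building level only into rooms whose semantic children plausibly contain the target. The dictionary-based graph and helper below are a toy sketch invented for illustration, not the thesis's representation.

```python
# Toy 3D scene graph as a hierarchy: building -> rooms -> object labels.
# All node names here are hypothetical examples.
scene_graph = {
    "building": ["kitchen", "office"],
    "kitchen": ["mug", "fridge"],
    "office": ["laptop", "mug"],
}

def rooms_containing(graph, target):
    """Use the hierarchy to narrow the search: return only the rooms
    whose child nodes include the target object label."""
    return [room for room in graph["building"] if target in graph[room]]
```

In a learned policy, these candidate rooms would serve as high-level subgoals, with the dense geometry at the lower layers of the graph used for local navigation.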
Date issued
2022-09
URI
https://hdl.handle.net/1721.1/147509
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
