Towards Object-based SLAM

Zhang, Yihao

Advisor

Leonard, John J.

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Simultaneous localization and mapping (SLAM) is a fundamental capability that lets a robot perceive its surrounding environment. The research area has developed over more than two decades, from the original sparse landmark-based SLAM to dense SLAM, and there is now a demand for semantic understanding of the environment beyond purely geometric understanding. This thesis delves into object-based SLAM, where the map consists of a set of objects with their semantic categories recognized and their poses and shapes estimated. Such a map provides vital object-level semantic and geometric perception to applications such as augmented reality (AR), mixed reality (MR), robot manipulation, and self-driving. To perform object-based SLAM, the sensor measurements have to undergo a series of processing steps. First, objects are semantically segmented in the sensor measurements, typically by a neural network. Because robots are often required to bootstrap from some initial labeled datasets and then adapt to different environments where labeled data are unavailable, it is important to enable semi-supervised learning so that the robot’s performance improves with the unlabeled data it collects itself. Second, after the objects are segmented, the measurements of each object across different views have to be associated with one another for downstream processing. Lastly, the robot must be able to extract object pose and shape information from the measurements without access to detailed object CAD models, which are commonly unavailable. This thesis studies these three aspects of object-based SLAM: semi-supervised learning of semantic segmentation in a robotics context, data association for object-based SLAM, and category-level object pose and shape estimation. The thesis closes with a discussion of how these components can be integrated into a full object-based SLAM system in the future.

Date issued

2024-09

URI

https://hdl.handle.net/1721.1/158310

Department

Massachusetts Institute of Technology. Department of Mechanical Engineering

Publisher

Massachusetts Institute of Technology

Collections

Doctoral Theses