Long-term Object-based SLAM in Low-dynamic Environments
Author(s)
Fu, Jiahui
Advisor
Leonard, John J.
Abstract
Simultaneous Localization and Mapping (SLAM) is fundamental for autonomous agents to understand their surroundings. For advanced robotic tasks, consistent object-level reasoning is also critical, especially for activities that repeatedly traverse the same environment, such as household cleaning and object retrieval. In a changing world, robots must continually localize themselves and their targets while maintaining an up-to-date map of the environment. Traditional SLAM relies on static geometric primitives extracted from observations and lacks semantic understanding: unordered sets of points, lines, or planes offer little object-level interpretation and lead to erroneous estimates when the scene changes.
Because the world is organized and evolves at the level of objects, object-aided SLAM is a natural choice. This thesis revolves around long-term object-based SLAM in low-dynamic environments, aiming to bridge the gap between SLAM techniques and high-level robotic applications and to make SLAM more compatible with object-level perception. It presents three contributions:
First, we propose a multi-hypothesis approach for the ambiguity-aware adoption of object poses in object-based SLAM, accommodating the inherent ambiguity that arises from occlusion or symmetric object shapes. We design a multi-hypothesis object pose estimator front end in a mixture-of-experts fashion and a max-mixture-based back end that infers globally consistent camera and object poses from a sequence of pose hypothesis sets.
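As a rough illustration of such a max-mixture back end, the minimal Python sketch below evaluates a single object-observation factor over a set of pose hypotheses and keeps the most likely component at each iteration; the function name, the 4x4 homogeneous pose representation, and the shared information matrix are illustrative assumptions rather than the thesis implementation.

    import numpy as np
    from scipy.spatial.transform import Rotation

    def max_mixture_factor_error(T_cam, T_obj, hypotheses, weights, info):
        # Hypothetical max-mixture factor: evaluate every object-pose hypothesis
        # and keep the single most likely (lowest weighted error) component.
        #   T_cam, T_obj: 4x4 world-frame camera / object pose estimates
        #   hypotheses:   list of 4x4 measured object-in-camera poses
        #   weights:      mixture weight per hypothesis
        #   info:         6x6 information matrix shared by all components
        T_pred = np.linalg.inv(T_cam) @ T_obj            # predicted object-in-camera pose
        best_err, best_res = np.inf, None
        for T_meas, w in zip(hypotheses, weights):
            T_delta = np.linalg.inv(T_meas) @ T_pred     # residual transform
            res = np.hstack([Rotation.from_matrix(T_delta[:3, :3]).as_rotvec(),
                             T_delta[:3, 3]])            # 6-DoF residual vector
            err = res @ info @ res - 2.0 * np.log(w)     # weighted error with mixture weight
            if err < best_err:
                best_err, best_res = err, res
        return best_err, best_res                        # dominant component drives the update

Re-selecting the dominant hypothesis at every solver iteration is what allows observations of symmetric or occluded objects to be explained by an alternative component instead of corrupting the global solution.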
Second, we develop two change detection approaches, one offline and one online, built on two novel scene and object representations: PlaneSDF and shape-consistent neural descriptor fields, respectively. For long-term operation, both account for the scene changes that inevitably accumulate over extended periods while keeping the chosen map representations efficient and scalable. Following a "divide-and-conquer" strategy, we perform cluster- and object-level change detection through local scene differencing, which makes the detection more accurate and flexible.
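As a simplified, hypothetical sketch of local scene differencing (not the PlaneSDF or neural-descriptor-field pipelines themselves), the snippet below differences two aligned signed-distance grids of the same local region captured at different times and clusters the changed voxels; the function name, threshold, and voxel-grid representation are assumptions made for illustration.

    import numpy as np
    from scipy import ndimage

    def diff_local_sdf(sdf_old, sdf_new, tau=0.05, min_voxels=50):
        # Hypothetical local scene differencing: compare two aligned SDF grids
        # of the same region and return voxel clusters where the geometry
        # changed, i.e., candidate added or removed objects.
        changed = np.abs(sdf_new - sdf_old) > tau        # voxel-wise change mask
        labels, n = ndimage.label(changed)               # connect changed voxels into clusters
        return [np.argwhere(labels == k)                 # voxel indices per candidate cluster
                for k in range(1, n + 1)
                if np.count_nonzero(labels == k) >= min_voxels]

Differencing small, locally registered patches rather than the entire map at once is what keeps a divide-and-conquer strategy cheap to update and tolerant of partial observations.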
Finally, we propose a neural SE(3)-equivariant object embedding (NeuSE) for long-term consistent spatial understanding in object-based SLAM. NeuSE is trained to serve as a compact point cloud surrogate for complete object models. Our NeuSE-based object SLAM paradigm directly derives SE(3) camera pose constraints that are compatible with general SLAM pose graph optimization, realizing object-assisted localization and a lightweight object-centric map with change-aware mapping, and ultimately robust scene understanding despite environment changes over time.
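To illustrate how an SE(3)-equivariant embedding can yield a camera pose constraint, the sketch below assumes, purely for illustration, that each object embedding is a small set of 3D latent points that transforms rigidly with the object; aligning the latent sets from two visits with the Kabsch/Procrustes solution then recovers the object's relative pose, which can enter a standard pose-graph factor.

    import numpy as np

    def relative_pose_from_equivariant_latents(z_ref, z_cur):
        # Assumed representation: (N, 3) latent point sets that transform
        # rigidly with the object (SE(3) equivariance), so the rigid transform
        # aligning them approximates the object's relative pose between visits.
        mu_ref, mu_cur = z_ref.mean(axis=0), z_cur.mean(axis=0)
        H = (z_ref - mu_ref).T @ (z_cur - mu_cur)        # 3x3 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflection
        R = Vt.T @ D @ U.T                               # optimal rotation (Kabsch)
        T = np.eye(4)
        T[:3, :3], T[:3, 3] = R, mu_cur - R @ mu_ref     # homogeneous transform: z_cur ~ T * z_ref
        return T                                         # feeds a relative-pose factor in the graph

Because such a constraint would come from compact latent sets rather than raw point clouds, the resulting object-centric map stays lightweight, and comparing embeddings across visits is also what supports change-aware mapping.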
Date issued
2024-02
Department
Massachusetts Institute of Technology. Department of Mechanical Engineering
Publisher
Massachusetts Institute of Technology