Addressing Challenges in Object-Based Robot Navigation and Mapping
Author(s)
Lu, Ziqi
Download: Thesis PDF (132.9 MB)
Advisor
Leonard, John J.
Abstract
Developing fully autonomous systems that can safely traverse and interact with the environment has been a long-term objective in robotics. Many relevant tasks, such as planning and mobile manipulation, require the robot to possess an object-level understanding of the ambient world. In particular, maintaining a globally consistent object-based map of the environment is crucial for these operations. Without external assistance, such as a prior map or a motion capture system, the robot needs to navigate and map the environment using an object-based SLAM system. This thesis is dedicated to addressing several key challenges in developing object SLAM systems.

The first challenge arises from the ambiguity of object poses in single-view observations. When an object is observed from a single vantage point, it can often have multiple probable poses due to symmetry, occlusion, or perceptual failures, and such ambiguous measurements are difficult for an object SLAM system to incorporate. To address this issue, we introduce an ambiguity-aware object SLAM method. We use Gaussian max-mixture models to represent and efficiently track the multiple object pose hypotheses, and gradually disambiguate the poses to construct a globally consistent object-level map.

The second challenge is the performance degradation of neural networks when deployed in novel robot operating environments, commonly known as the domain gap problem. Specifically, when a pre-trained 6DoF object pose estimator is used in a novel environment, its pose predictions are often corrupted by outliers, and quantifying their uncertainties becomes difficult. Using these noisy predictions with unmodeled uncertainties as measurements in an object SLAM system can lead to significant estimation errors. To mitigate the problem, we propose a SLAM-supported self-training pipeline for domain adaptation of 6DoF object pose estimators. We exploit robust pose graph optimization (PGO) results to pseudo-label robot-collected images and fine-tune 6D object pose estimators. In particular, we develop an Automatic Covariance Tuning (ACT) method to model pose prediction uncertainties automatically during the PGO process.

The third challenge is environmental change. As changes occur in the scene, such as object insertion, removal, or rearrangement, the robot needs to efficiently detect these changes and update the map accordingly. While detecting and reflecting scene changes is relatively straightforward with handcrafted map representations like point clouds or voxels, it becomes significantly more difficult with learned radiance-field-based scene representations, such as Neural Radiance Field (NeRF) and 3D Gaussian Splatting (3DGS) models. In this thesis, we develop a radiance-field-based 3D change detection method to identify 3D object-level scene changes. Our approach can rapidly detect object changes in cluttered environments represented with radiance field models from as few as a single post-change image observation. We also develop efficient update methods for NeRF and 3DGS models to reflect physical object rearrangements, guided by sparse post-change images.

By addressing these challenges, this thesis advances the robustness and adaptability of object SLAM systems in real-world environments, paving the way for more reliable and autonomous robotic systems capable of complex interactions with the environment.
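To make the first contribution concrete, below is a minimal sketch of a Gaussian max-mixture factor for ambiguous object-pose measurements, in the spirit of the approach described in the abstract. It is not the thesis's implementation: poses are simplified to plain vectors (e.g., x, y, yaw) rather than SE(3) elements, and the class and variable names are illustrative only. At evaluation time the factor selects the single most likely pose hypothesis, which is how max-mixture models keep multimodal measurements tractable inside a least-squares SLAM back end.

```python
import numpy as np

class MaxMixturePoseFactor:
    """Keeps several Gaussian pose hypotheses for one object observation.
    At evaluation time only the most likely component is used, so the
    factor behaves like a unimodal Gaussian factor per optimizer step."""

    def __init__(self, hypotheses, covariances, weights):
        self.mus = [np.asarray(h, dtype=float) for h in hypotheses]
        self.infos = [np.linalg.inv(np.asarray(c, dtype=float)) for c in covariances]
        # Per-component log normalizer (constant terms dropped): log w - 0.5 log|Sigma|
        self.log_norms = [np.log(w) - 0.5 * np.log(np.linalg.det(np.asarray(c, dtype=float)))
                          for w, c in zip(weights, covariances)]

    def error(self, predicted_pose):
        """Return (negative log-likelihood, index of the dominant hypothesis)."""
        best_nll, best_idx = np.inf, -1
        for i, (mu, info, log_norm) in enumerate(zip(self.mus, self.infos, self.log_norms)):
            r = np.asarray(predicted_pose, dtype=float) - mu
            nll = 0.5 * r @ info @ r - log_norm   # Mahalanobis term minus log weight/normalizer
            if nll < best_nll:
                best_nll, best_idx = nll, i
        return best_nll, best_idx

# Example: a 180-degree symmetric object yields two yaw hypotheses.
factor = MaxMixturePoseFactor(
    hypotheses=[[0.0, 0.0, 0.0], [0.0, 0.0, np.pi]],
    covariances=[np.eye(3) * 0.01, np.eye(3) * 0.01],
    weights=[0.5, 0.5],
)
nll, idx = factor.error(np.array([0.02, -0.01, 3.10]))   # selects the pi-yaw hypothesis
```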
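For the second contribution, the sketch below illustrates one plausible reading of residual-driven covariance tuning during pose graph optimization; the thesis's actual ACT formulation may differ. The idea shown is to alternate between re-optimizing the poses under the current per-measurement covariances and inflating the covariance of measurements whose residuals are inconsistent, so outlier pose predictions are automatically down-weighted. The callable `estimate_poses` is a hypothetical placeholder for a PGO solve.

```python
import numpy as np

def tune_covariances(measurements, estimate_poses, n_iters=10):
    """Schematic alternating scheme (not the thesis's exact method).

    measurements: list of (residual_fn, base_cov) pairs, where residual_fn(state)
        returns the measurement residual vector under the current state estimate.
    estimate_poses: hypothetical callable that re-runs PGO given a list of
        per-measurement covariances and returns the new state estimate.
    """
    covs = [base for _, base in measurements]
    state = estimate_poses(covs)
    for _ in range(n_iters):
        new_covs = []
        for (residual_fn, base), _cov in zip(measurements, covs):
            r = residual_fn(state)
            # Inflate the covariance when the squared Mahalanobis residual
            # exceeds 1; never shrink below the base covariance.
            scale = max(float(r @ np.linalg.inv(base) @ r), 1.0)
            new_covs.append(base * scale)
        covs = new_covs
        state = estimate_poses(covs)
    return state, covs
```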
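For the third contribution, the following is a deliberately high-level sketch of image-space change detection against a pre-change radiance field: render the stored scene at the post-change camera pose and flag pixels that disagree with the new observation. `render_prechange_view` is a hypothetical renderer for the stored NeRF/3DGS model, and the thesis's actual pipeline (object-level 3D reasoning rather than raw per-pixel differencing) is more involved.

```python
import numpy as np

def detect_change_mask(render_prechange_view, camera_pose, observed_rgb, threshold=0.15):
    """Render the pre-change scene at the post-change camera pose and mark
    pixels whose color disagrees with the new observation.

    render_prechange_view: hypothetical function mapping a camera pose to an
        (H, W, 3) RGB image in [0, 1] rendered from the stored radiance field.
    observed_rgb: the single post-change image, same shape and value range.
    """
    expected_rgb = render_prechange_view(camera_pose)
    per_pixel_err = np.linalg.norm(expected_rgb - observed_rgb, axis=-1)
    return per_pixel_err > threshold   # boolean change mask, True where the scene changed
```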
Date issued
2025-02
Department
Massachusetts Institute of Technology. Department of Mechanical Engineering
Publisher
Massachusetts Institute of Technology