DSpace@MIT

Large-Scale Multi-Robot Spatial Perception

Author(s)
Chang, Yun
Download
Thesis PDF (55.05 MB)
Advisor
Carlone, Luca
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
This thesis addresses the challenge of scalable and robust multi-robot spatial perception, with the goal of supporting autonomous task execution in large-scale environments. The work focuses on two core issues: scaling to large, complex environments, and incorporating high-level scene understanding to enable autonomy for complex tasks. Current multi-robot systems typically focus on geometric reconstruction for navigation, but often fall short in providing the scene understanding needed for complex decision-making and task execution in real-world environments. Conversely, many recent demonstrations of autonomous task execution are limited to small, controlled environments, with few methods addressing scalability to larger scenes. This thesis bridges this gap by integrating multi-robot simultaneous localization and mapping (SLAM) with spatial perception in order to support downstream autonomy for complex tasks. We begin by introducing methods to enhance the robustness and efficiency of loop closure detection in centralized multi-robot SLAM, focusing on prioritizing loop closures and mitigating the impact of incorrect loop closures in large-scale environments. We then present the first fully distributed metric-semantic SLAM system for multi-robot teams, which supports real-time semantic mapping and enables large-scale deployments with up to 8 robots and 8 kilometers of traversal. To improve reasoning across robot teams, we extend this work to 3D scene graphs, proposing a framework for collaboratively building and maintaining a shared multi-robot scene graph online. Additionally, we introduce algorithms for task-oriented compression of 3D scene graphs to support communication across robots under bandwidth constraints. Finally, we explore open-set scene understanding made possible by advances in visual-language models and highlight the need for task-driven mapping.
Building on this, we propose a novel framework for grounding high-level language commands into scene graphs, enabling robots to decompose high-level tasks into executable subtasks while focusing on task-relevant components of the environment. The contributions of this thesis are validated through experimental evaluations in extreme environments and real-world deployments, where multi-robot teams operate in large-scale settings. These experiments tackle a broad range of tasks, from navigation and object search to executing high-level language commands (e.g., “clean the room”). Our contributions advance multi-robot large-scale spatial perception and have the potential to impact real-world applications such as exploration, service robotics, and search and rescue, where autonomous multi-robot teams are essential for performing complex tasks in large environments.
Date issued
2025-09
URI
https://hdl.handle.net/1721.1/165155
Department
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.