An Intelligence Architecture for Grounded Language Communication with Field Robots

Howard, Thomas; Stump, Ethan; Fink, Jonathan; Arkin, Jacob; Paul, Rohan; Park, Daehyung; Roy, Subhro; Barber, Daniel; Bendell, Rhyse; Schmeckpeper, Karl; Tian, Junjiao; Oh, Jean; Wigness, Maggie; Quang, Long; Rothrock, Brandon; Nash, Jeremy; Walter, Matthew; Jentsch, Florian; Roy, Nicholas

Publisher with Creative Commons License

Terms of use

Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/

Abstract

For humans and robots to collaborate effectively as teammates in unstructured environments, robots must be able to construct semantically rich models of the environment, communicate efficiently with teammates, and perform sequences of tasks robustly with minimal human intervention, as direct human guidance may be infrequent and/or intermittent. Contemporary architectures for human-robot interaction often rely on engineered human-interface devices or structured languages that require extensive prior training and inherently limit the kinds of information that humans and robots can communicate. Natural language, particularly when situated with a visual representation of the robot’s environment, allows humans and robots to exchange information about abstract goals, specific actions, and/or properties of the environment quickly and effectively. In addition, it serves as a mechanism to resolve inconsistencies in the mental models of the environment across the human-robot team. This article details a novel intelligence architecture that exploits a centralized representation of the environment to perform complex tasks in unstructured environments. The centralized environment model is informed by a visual perception pipeline, declarative knowledge, deliberate interactive estimation, and a multimodal interface. The language pipeline also exploits proactive symbol grounding to resolve uncertainty in ambiguous statements through inverse semantics. A series of experiments on three different unmanned ground vehicles demonstrates the utility of this architecture through its robust ability to perform language-guided spatial navigation, mobile manipulation, and bidirectional communication with human operators. Experimental results give examples of component-level behaviors and overall system performance that guide a discussion on observed performance and opportunities for future innovation.

Date issued

2022

URI

https://hdl.handle.net/1721.1/145529

Department

Massachusetts Institute of Technology. Department of Aeronautics and Astronautics

Journal

Field Robotics

Publisher

Field Robotics Publication Society

Citation

Howard, Thomas, Stump, Ethan, Fink, Jonathan, Arkin, Jacob, Paul, Rohan et al. 2022. "An Intelligence Architecture for Grounded Language Communication with Field Robots." Field Robotics, 2 (1).

Version: Final published version

Collections

MIT Open Access Articles