Learning semantic maps from natural language
Author(s)
Hemachandra, Sachithra Madhawa
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Nicholas Roy.
Abstract
As robots move into human-occupied environments, the need for effective mechanisms that enable interaction with humans becomes vital. Natural language is a flexible, intuitive medium for such interaction, but language understanding requires robots to learn representations of their environments that are compatible with the conceptual models used by people. Current approaches to constructing such spatial-semantic representations rely solely on traditional sensors to acquire knowledge of the environment, which restricts robots to learning limited knowledge of their local surroundings. Furthermore, they can reason only over the limited portion of the environment that falls within the robot's field of view. Natural language, on the other hand, allows people to share rich properties of their environment with their robotic partners in a flexible, efficient manner. The ability to integrate such descriptions allows the robot to learn semantic properties, such as colloquial names, that are difficult to infer using existing methods, and to learn about the world outside its perception range. The spatial and temporal disconnect between language descriptions and the robot's onboard sensors makes fusing the two sources of information challenging.
This thesis addresses the problem of fusing information contained in natural language descriptions with the robot's onboard sensors to construct spatial-semantic representations useful for interacting with humans. The novelty lies in treating natural language descriptions as another sensor observation that informs the robot about its environment. Toward this end, we introduce the semantic graph, a spatial-semantic representation that provides a common framework in which we integrate the information that the user communicates (e.g., labels and spatial relations) with observations from the robot's sensors. Our algorithm efficiently maintains a factored distribution over semantic graphs based upon the stream of natural language and low-level sensor information. We detail the means by which the framework incorporates knowledge conveyed by the user's descriptions, including the ability to reason over expressions that reference as-yet unknown regions of the environment. We evaluate the algorithm's ability to learn human-centric maps of several different environments and analyze the knowledge inferred from language as well as the utility of the learned maps. The results demonstrate that incorporating information from free-form descriptions increases the metric, topological, and semantic accuracy of the recovered environment model.
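The idea of treating a spoken description as one more observation over a distribution of candidate maps can be illustrated with a small sketch. The Python below is not the thesis implementation; it assumes a simple particle-filter-style factorization in which each hypothesis carries a topology (regions and edges) and a per-region label belief, and a statement such as "region 3 is the kitchen" reweights the hypotheses. Names like Region, GraphHypothesis, and label_likelihood are hypothetical.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class Region:
    """A node in a semantic graph: a metric pose estimate plus a label belief."""
    region_id: int
    pose: tuple                                   # (x, y) from onboard sensors
    labels: dict = field(default_factory=dict)    # label -> belief mass


@dataclass
class GraphHypothesis:
    """One hypothesis over the spatial-semantic model, with an importance weight."""
    regions: dict                                 # region_id -> Region
    edges: set                                    # pairs of connected region ids
    weight: float = 1.0


def label_likelihood(hyp, region_id, label):
    """Likelihood that the user's label refers to this region under this
    hypothesis; here a stand-in based on the region's current label belief."""
    labels = hyp.regions[region_id].labels
    total = sum(labels.values()) + 1.0            # reserve mass for unseen labels
    return (labels.get(label, 0.0) + 0.1) / total


def update_with_description(hypotheses, region_id, label):
    """Treat 'region <region_id> is the <label>' as one more observation:
    reweight each hypothesis, update its semantic layer, and renormalize."""
    for h in hypotheses:
        h.weight *= label_likelihood(h, region_id, label)
        region = h.regions[region_id]
        region.labels[label] = region.labels.get(label, 0.0) + 1.0
    total = sum(h.weight for h in hypotheses) or 1.0
    for h in hypotheses:
        h.weight /= total
    return hypotheses


def resample(hypotheses, n=None):
    """Importance resampling; deep copies prevent aliasing between duplicates."""
    n = n or len(hypotheses)
    picked = random.choices(hypotheses, weights=[h.weight for h in hypotheses], k=n)
    return [copy.deepcopy(h) for h in picked]
```

Under this factorization, the metric and semantic layers are updated per hypothesis, while the weights capture how well each topology explains the combined stream of sensor data and descriptions.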
Next, we outline an algorithm that enables robots to improve their spatial-semantic representation of an environment by engaging users in dialog. The algorithm reasons over the ambiguity of the language descriptions provided by the user given the current map, and selects information-gathering actions in the form of targeted questions about the robot's local surroundings and about areas distant from it. Our algorithm balances the information-theoretic value of candidate questions against a measure of the cost associated with dialog. We demonstrate that, by asking deliberate questions of the user, the method significantly improves the accuracy of the learned semantic map.
Finally, we introduce a learning framework that enables robots to successfully follow natural language navigation instructions within previously unknown environments. The algorithm uses the information about the environment that the human conveys within the command to learn a distribution over the spatial-semantic model of the environment. We achieve this through a formulation of our semantic mapping algorithm that uses information conveyed in the command to reason directly over unobserved spatial structure. The framework then uses this distribution in place of the latent world model to interpret the natural language instruction as a distribution over intended actions. A belief-space planner then solves for the action that best satisfies the intent of the command. We apply this framework to following directions to objects and to natural language route directions in unknown environments. We evaluate the approach through simulation and physical experiments, and demonstrate its ability to follow navigation commands with performance comparable to that achieved when the environment is fully known.
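The planning step described above can also be sketched in a few lines. The following is an illustrative example only, not the thesis planner: each weighted map hypothesis scores every candidate action, and the planner returns the action with the highest expected score under the hypothesis weights. All names (map_hypotheses, goal_label, target_region) are hypothetical placeholders.

```python
def score_action(hypothesis, action, goal_label):
    """How well `action` satisfies the command if `hypothesis` were the true
    map: here, the belief that the targeted region carries the goal label."""
    labels = hypothesis["regions"].get(action["target_region"], {})
    total = sum(labels.values()) or 1.0
    return labels.get(goal_label, 0.0) / total


def plan_in_belief_space(map_hypotheses, candidate_actions, goal_label):
    """Expected-utility action selection over the distribution of maps."""
    def expected_score(action):
        return sum(h["weight"] * score_action(h, action, goal_label)
                   for h in map_hypotheses)
    return max(candidate_actions, key=expected_score)


# Toy usage: two map hypotheses disagree about where the kitchen is.
hypotheses = [
    {"weight": 0.7, "regions": {1: {"kitchen": 0.9}, 2: {"office": 0.8}}},
    {"weight": 0.3, "regions": {1: {"office": 0.6}, 2: {"kitchen": 0.7}}},
]
actions = [{"target_region": 1}, {"target_region": 2}]
print(plan_in_belief_space(hypotheses, actions, "kitchen"))  # -> {'target_region': 1}
```

The point of the sketch is that the planner never commits to a single map: the command is grounded in every hypothesis, and the expected utility over the whole distribution determines the action.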
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from PDF student-submitted version of thesis. Includes bibliographical references (pages 185-193).
Date issued
2015
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.