Grounded Situation Models for Situated Conversational Assistants

Author(s)

Mavridis, Nikolaos

Alternative title

GSMs for SCAs

Other Contributors

Massachusetts Institute of Technology. Dept. of Architecture. Program In Media Arts and Sciences

Advisor

Deb Roy.

Terms of use

M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

A Situated Conversational Assistant (SCA) is a system with sensing, acting and speech synthesis/recognition abilities, which engages in physically situated natural language conversation with human partners and assists them in carrying out tasks. This thesis addresses some prerequisites towards an ideal truly cooperative SCA through the development of a computational model of embodied, situated language agents and implementation of the model in the form of an interactive, conversational robot. The proposed model produces systems that are capable of a core set of situated natural language communication skills, and provides leverage for many extensions towards the ideal SCA, such as mind reading skills. The central idea is to endow agents with a sensor-updated "structured blackboard" representational structure called a Grounded Situation Model (GSM), which is closely related to the cognitive psychology notion of situation models. The GSM serves as a workspace with contents similar to a "theatrical stage" in the agent's "mind". The GSM may be filled either with the contents of the agent's present here-and-now physical situation, or a past situation that is being recalled, or an imaginary situation that is being described or planned.

Furthermore, the GSM contains descriptions of both physical aspects of situations (such as objects) and mental aspects (such as beliefs of others). Most importantly, the proposed GSM design enables bidirectional translation between linguistic descriptions and perceptual data / expectations. To demonstrate viability, an instance of the model was implemented on a manipulator robot with touch, vision, and speech synthesis/recognition. The robot grasps the semantics of a range of words and speech acts related to cooperative manipulation of objects on a table top situated between the robot and human. The robot's language comprehension abilities are comparable to those implied by a standard and widely used test of children's language comprehension (the Token Test), and in some directions also surpass those abilities. Not only the viability but also the effectiveness of the GSM proposal is thus demonstrated, through a real-world autonomous robot whose performance is comparable to the capabilities of a normally developing three-year-old child as assessed by the Token Test.

Description

Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007.

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Includes bibliographical references (p. 259-267).

Date issued

2007

URI

http://hdl.handle.net/1721.1/38523

Department

Program in Media Arts and Sciences (Massachusetts Institute of Technology)

Publisher

Massachusetts Institute of Technology

Keywords

Architecture. Program In Media Arts and Sciences

Collections

Doctoral Theses