MIDOS : Multimodal Interactive DialOgue System
Author(s)
Adler, Aaron D. (Aaron Daniel), 1979-
Alternative title
Multimodal Interactive DialOgue System
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Randall Davis.
Abstract
Interactions between people are typically conversational, multimodal, and symmetric. In conversational interactions, information flows in both directions. In multimodal interactions, people use multiple channels. In symmetric interactions, both participants communicate multimodally, with the integration of and switching between modalities basically effortless. In contrast, consider typical human-computer interaction. It is almost always unidirectional: we're telling the machine what to do; it's almost always unimodal (can you type and use the mouse simultaneously?); and it's symmetric only in the disappointing sense that when you type, it types back at you. There are a variety of things wrong with this picture. Perhaps chief among them is that if communication is unidirectional, it must be complete and unambiguous, exhaustively anticipating every detail and every misinterpretation. In brief, it's exhausting. This thesis examines the benefits of creating multimodal human-computer dialogues that employ sketching and speech, aimed initially at the task of describing early stage designs of simple mechanical devices. The goal of the system is to be a collaborative partner, facilitating design conversations. Two initial user studies provided key insights into multimodal communication: simple questions are powerful, color choices are deliberate, and modalities are closely coordinated. These observations formed the basis for our multimodal interactive dialogue system, or Midos. Midos makes possible a dynamic dialogue, i.e., one in which it asks questions to resolve uncertainties or ambiguities. The benefits of a dialogue in reducing the cognitive overhead of communication have long been known. We show here that having the system able to ask questions is good, but for an unstructured task like describing a design, knowing what questions to ask is crucial.
We describe an architecture that enables the system to accept partial information from the user, then request details it considers relevant, noticeably lowering the cognitive overhead of communicating. The multimodal questions Midos asks are in addition purposefully designed to use the same multimodal integration pattern that people exhibited in our study. Our evaluation of the system showed that Midos successfully engages the user in a dialogue and produces the same conversational features as our initial human-human conversation studies.
Description
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 239-243).
Date issued
2009
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.