Developing attribute acquisition strategies in spoken dialogue systems via user simulation
Author(s)
Filisko, Edward A. (Edward Anthony), 1977-
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Stephanie Seneff.
Abstract
A spoken dialogue system (SDS) is an application that supports conversational interaction with a human to perform some task. SDSs are emerging as an intuitive and efficient means for accessing information. A critical barrier to their widespread deployment remains in the form of communication breakdown at strategic points in the dialogue, often when the user tries to supply a named entity from a large or open vocabulary set. For example, a weather system might know several thousand cities, but there is no easy way to inform the user about what those cities are. The system will likely misrecognize any unknown city as some known city. The inability of a system to acquire an unknown value can lead to unpredictable behavior by the system, as well as by the user. This thesis presents a framework for developing attribute acquisition strategies with a simulated user. We specifically focus on the acquisition of unknown city names in a flight domain, through a spell-mode subdialogue. Collecting data from real users is costly in both time and resources. In addition, our goal is to focus on situations that tend to occur sporadically in real dialogues, depending on the domain and the user's experience in that domain. Therefore, we chose to employ user simulation, which would allow us to generate a large number of dialogues, and to configure the input as desired in order to exercise specific strategies. We present a novel method of utterance generation for user simulation that exploits an existing corpus of real user dialogues, but recombines the utterances using an example-based template approach. Items of interest not in the corpus, such as foreign or unknown cities, can be included by splicing in synthesized speech. This method allows us to produce realistic utterances by retaining the structural variety of real user utterances, while introducing cities that can only be resolved via spelling.
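The template-based recombination described above can be illustrated with a minimal text-only sketch. It is not the thesis implementation: the thesis works with recorded speech and splices in synthesized audio, whereas this example only manipulates word strings. The mini-corpus, the city lists, and the function names `to_template`, `fill_template`, and `generate` are all hypothetical, and city names are assumed to be single words.

```python
import random
import re

# Hypothetical mini-corpus standing in for real user utterances.
CORPUS = [
    "i want to fly from boston to denver",
    "i need a flight to chicago",
    "from seattle to boston please",
]
# Hypothetical set of cities that appear in the corpus (single words only).
KNOWN_CITIES = {"boston", "denver", "chicago", "seattle"}


def to_template(utterance, cities):
    """Turn an utterance into a template by replacing each city with a <city> slot."""
    return " ".join("<city>" if word in cities else word for word in utterance.split())


def fill_template(template, new_cities, rng):
    """Fill every <city> slot with a city drawn at random from new_cities."""
    return re.sub(r"<city>", lambda _match: rng.choice(new_cities), template)


def generate(corpus, known, new_cities, n, seed=0):
    """Generate n simulated utterances that reuse corpus structure with new cities."""
    rng = random.Random(seed)
    # Sorting the de-duplicated templates keeps the output reproducible for a given seed.
    templates = sorted({to_template(u, known) for u in corpus})
    return [fill_template(rng.choice(templates), new_cities, rng) for _ in range(n)]


if __name__ == "__main__":
    # "yamhill" and "xanadu" are stand-ins for cities outside the system's vocabulary.
    for utterance in generate(CORPUS, KNOWN_CITIES, ["yamhill", "xanadu"], n=3):
        print(utterance)
```

The sentence frames come from real utterances, so the structural variety of the corpus is kept, while the slot values can be cities the recognizer has never seen.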
We also developed a model of generic dialogue management, allowing a developer to quickly specify interaction properties on a per-attribute basis. This model was used to assess the effectiveness of various combinations of dialogue strategies and simulated user behavior. Current approaches to user simulation typically model simulated utterances at the intention level, assuming perfect recognition and understanding. We employ speech to develop our strategies in the context of errors that occur naturally from recognition and understanding. We use simulation to address two problems: the conflict problem requires the system to choose how to act when a new hypothesis for an attribute conflicts with its current belief, while the compliance problem requires the system to decide whether a user was compliant with a spelling request. Decision models were learned from simulated data, and were tested with real users, showing that the learned model significantly outperformed a heuristic model in choosing the "ideal" response to the conflict problem, with accuracies of 84.1% and 52.1%, respectively. The learned model to predict compliance achieved a respectable 96.3% accuracy. These results suggest that such models learned from simulated data can attain similar, if not better, performance in dialogues with real users.
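To make the conflict problem and its accuracy measure concrete, the following sketch shows one possible heuristic policy and how a policy's choices could be scored against "ideal" responses. Everything here is an assumption for illustration: the abstract does not specify the action set, the features, or the heuristic actually used, so the action names, the confidence scores, the `margin` threshold, and the example cases are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Conflict:
    """A new hypothesis for an attribute that disagrees with the current belief."""
    belief: str
    belief_conf: float       # hypothetical confidence score in [0, 1]
    hypothesis: str
    hypothesis_conf: float   # hypothetical confidence score in [0, 1]


def heuristic_policy(conflict, margin=0.2):
    """Hypothetical rule: trust whichever value is clearly more confident, else ask to spell."""
    if conflict.hypothesis_conf - conflict.belief_conf > margin:
        return "accept_new"
    if conflict.belief_conf - conflict.hypothesis_conf > margin:
        return "keep_old"
    return "request_spelling"


def accuracy(policy, labeled_cases):
    """Fraction of cases where the policy's action matches the labeled ideal action."""
    hits = sum(policy(conflict) == ideal for conflict, ideal in labeled_cases)
    return hits / len(labeled_cases)


if __name__ == "__main__":
    # Invented cases; the "ideal" labels are placeholders, not data from the thesis.
    cases = [
        (Conflict("boston", 0.9, "austin", 0.3), "keep_old"),
        (Conflict("boston", 0.3, "austin", 0.9), "accept_new"),
        (Conflict("boston", 0.5, "austin", 0.6), "accept_new"),
    ]
    print(f"heuristic accuracy: {accuracy(heuristic_policy, cases):.3f}")
```

A learned decision model would replace `heuristic_policy` with a classifier trained on simulated dialogues and be scored with the same `accuracy` function.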
Description
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 159-169).
Date issued
2006
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.