Learning optimal discourse strategies in a spoken dialogue system
Author: Fromer, Jeanne C., 1975-
Advisor: Robert C. Berwick
Participants in a conversation can often realize their conversational goals in multiple ways by employing different discourse strategies. For example, requested information can usually be presented in several ways, and different presentation methods are preferred, and most effective, in different contexts. One can also manage a conversation, or assume initiative, to varying degrees: by directing questions, issuing commands, restricting potential responses, and controlling the topics of discussion. Agents that converse with users in natural language and possess multiple discourse strategies must therefore choose and realize the optimal strategy from among competing strategies. Previous work in natural language generation has selected discourse strategies using heuristics based on discourse focus, medium, style, and the content of previous utterances. Recent work suggests that an agent can instead learn which strategies are optimal. This thesis investigates the issues involved in learning optimal discourse strategies from experience gained through conversations between human users and natural language agents.

A spoken dialogue agent, ELVIS, is implemented as a testbed for learning optimal discourse strategies. ELVIS provides telephone-based voice access to a caller's email. Within ELVIS, alternative discourse strategies for distributing initiative, reading messages, and summarizing messages are implemented. Actual users interact with variations of ELVIS that differ in their discourse strategies, and their conversations are used to derive a dialogue performance function for ELVIS using the PARADISE dialogue evaluation framework. This performance function is then used with reinforcement learning techniques, such as adaptive dynamic programming, Q-learning, temporal difference learning, and temporal difference Q-learning, to determine the optimal discourse strategies for ELVIS to use in different contexts.
This thesis reports and compares the learning results and describes how the choice of reinforcement learning algorithm, the local reward functions, and the system state space representation affect both the efficiency and the outcome of learning. It concludes by suggesting how online learning in spoken dialogue systems might be automated by extending the presented evaluation and learning techniques.
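To make the learning setup concrete, the sketch below shows tabular Q-learning over a toy dialogue model. Everything here is a hypothetical stand-in, not the thesis's actual system: the states, the two competing strategies, and the hand-set rewards (which substitute for a PARADISE-derived performance function learned from real user dialogues) are all illustrative assumptions.

```python
import random

# Hypothetical dialogue contexts and competing discourse strategies
# (illustrative only; not the actual ELVIS state or action space).
STATES = ["greeting", "list_messages", "done"]  # "done" is terminal
ACTIONS = ["system_initiative", "mixed_initiative"]

def step(state, action):
    """Toy deterministic dialogue MDP: return (next_state, reward).

    The rewards are invented stand-ins for a PARADISE-style
    performance function; real values would come from user data.
    """
    if state == "greeting":
        # Assume mixed initiative works slightly better at the opening.
        reward = 0.2 if action == "mixed_initiative" else 0.0
        return "list_messages", reward
    # Assume system initiative works better when listing messages.
    reward = 1.0 if action == "system_initiative" else 0.4
    return "done", reward

def q_learn(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = "greeting"
        while state != "done":
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, reward = step(state, action)
            best_next = max(Q[(nxt, a)] for a in ACTIONS)
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = nxt
    return Q

Q = q_learn()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)])
          for s in STATES if s != "done"}
print(policy)
```

Under the assumed rewards, the learned policy selects a different strategy per context (mixed initiative at the greeting, system initiative when listing), which is the kind of context-dependent strategy choice the thesis learns from real dialogue data.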
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 123-129).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science