A Conversational Agent for Dynamic Procedural Interactions

Colón-Hernández, Pedro

Author(s)

Colón-Hernández, Pedro

DownloadThesis PDF (22.81Mb)

Advisor

Breazeal, Cynthia

Terms of use

In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/

Metadata

Show full item record

Abstract

How-To questions (e.g., “How do I cook rice?”, “How do I write a check?”, or “How do I send pictures to my family from my iPhone?”) are some of the most common questions asked of search engines and presumably of conversational agents as well. Answers to How-To questions should generally be in the form of a procedure; step-by-step instructions that users perform in sequence. However, people find reading instructions cognitively demanding and often prefer that another person guide them through a procedure. Prior work in automating procedural guidance either concentrates on how to communicate instructions or how to reason about procedural knowledge to extract states of entities. In this work, we present an end-to-end procedural voice guidance system that automatically generates and presents step-by-step instructions to users through a conversational agent. This system overcomes three significant challenges: generating a contextual knowledge graph of the procedure, ordering necessary information through reasoning on that graph and converting it to procedural steps, and finally constructing a conversational system that delivers the procedure in a way that is easily followed by users. Our approach improves upon the current state-of-the-art in conversational agents, which often hand off the interaction to a web search. We demonstrate that our system can be utilized for end-user guidance, and that a contextual commonsense inference system can be used for procedural knowledge graph generation and ultimately procedural step generation. We also show that reasoning for procedural step generation is essential for the task. Lastly, we show that combining our knowledge driven system, both its steps and contextual commonsense assertions with a large language model (LLM) provides more accurate and reliable procedural guidance in tasks that the LLM may have trouble recalling/or were created after training. This work opens up paths to perform contextual graph-based reasoning for story-based applications and helps inform the design of future conversational agents within the domain of procedural guidance.

Date issued

2023-06

URI

https://hdl.handle.net/1721.1/151990

Department

Program in Media Arts and Sciences (Massachusetts Institute of Technology)

Publisher

Massachusetts Institute of Technology

Collections

Doctoral Theses