
dc.contributor.advisor  Breazeal, Cynthia
dc.contributor.author  Colón-Hernández, Pedro
dc.date.accessioned  2023-08-30T15:57:18Z
dc.date.available  2023-08-30T15:57:18Z
dc.date.issued  2023-06
dc.date.submitted  2023-08-16T20:34:01.641Z
dc.identifier.uri  https://hdl.handle.net/1721.1/151990
dc.description.abstract  How-To questions (e.g., “How do I cook rice?”, “How do I write a check?”, or “How do I send pictures to my family from my iPhone?”) are some of the most common questions asked of search engines and presumably of conversational agents as well. Answers to How-To questions should generally take the form of a procedure: step-by-step instructions that users perform in sequence. However, people find reading instructions cognitively demanding and often prefer that another person guide them through a procedure. Prior work in automating procedural guidance concentrates either on how to communicate instructions or on how to reason about procedural knowledge to extract the states of entities. In this work, we present an end-to-end procedural voice guidance system that automatically generates and presents step-by-step instructions to users through a conversational agent. This system overcomes three significant challenges: generating a contextual knowledge graph of the procedure, ordering the necessary information by reasoning over that graph and converting it into procedural steps, and finally constructing a conversational system that delivers the procedure in a way that users can easily follow. Our approach improves upon the current state of the art in conversational agents, which often hand off the interaction to a web search. We demonstrate that our system can be used for end-user guidance, and that a contextual commonsense inference system can be used for procedural knowledge graph generation and, ultimately, procedural step generation. We also show that reasoning is essential for procedural step generation. Lastly, we show that combining our knowledge-driven system (both its steps and its contextual commonsense assertions) with a large language model (LLM) provides more accurate and reliable procedural guidance for tasks that the LLM may have trouble recalling or that were created after its training. This work opens paths toward contextual graph-based reasoning for story-based applications and helps inform the design of future conversational agents within the domain of procedural guidance.
dc.publisher  Massachusetts Institute of Technology
dc.rights  In Copyright - Educational Use Permitted
dc.rights  Copyright retained by author(s)
dc.rights.uri  https://rightsstatements.org/page/InC-EDU/1.0/
dc.title  A Conversational Agent for Dynamic Procedural Interactions
dc.type  Thesis
dc.description.degree  Ph.D.
dc.contributor.department  Program in Media Arts and Sciences (Massachusetts Institute of Technology)
dc.identifier.orcid  https://orcid.org/0000-0001-9293-203X
mit.thesis.degree  Doctoral
thesis.degree.name  Doctor of Philosophy
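The abstract above outlines a pipeline that builds a contextual knowledge graph of a procedure, reasons over that graph to order the necessary actions, and converts the result into steps a conversational agent can deliver. As a minimal, hypothetical illustration of the "ordering through reasoning on the graph" idea only (not the thesis's actual representation or algorithm), the Python sketch below linearizes a toy dependency graph for the rice-cooking example with a topological sort; all node names and dependencies are invented for the example.

```python
# Illustrative sketch only: a toy procedural dependency graph for the
# "How do I cook rice?" example, ordered into step-by-step instructions.
# The graph contents and step wording are hypothetical, not from the thesis.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is an action node; its value is the set of actions it depends on.
procedure_graph = {
    "rinse the rice": set(),
    "measure the water": set(),
    "add rice and water to the pot": {"rinse the rice", "measure the water"},
    "bring to a boil": {"add rice and water to the pot"},
    "simmer covered for 15 minutes": {"bring to a boil"},
    "let the rice rest, then fluff": {"simmer covered for 15 minutes"},
}

# Reasoning step: order the actions so every prerequisite comes first,
# then number them as instructions an agent could read aloud one at a time.
ordered_steps = list(TopologicalSorter(procedure_graph).static_order())
for number, step in enumerate(ordered_steps, start=1):
    print(f"Step {number}: {step}")
```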

