
dc.contributor.advisor  Breazeal, Cynthia
dc.contributor.author  Colón-Hernández, Pedro
dc.date.accessioned  2023-08-30T15:57:18Z
dc.date.available  2023-08-30T15:57:18Z
dc.date.issued  2023-06
dc.date.submitted  2023-08-16T20:34:01.641Z
dc.identifier.uri  https://hdl.handle.net/1721.1/151990
dc.description.abstract  How-To questions (e.g., “How do I cook rice?”, “How do I write a check?”, or “How do I send pictures to my family from my iPhone?”) are some of the most common questions asked of search engines and presumably of conversational agents as well. Answers to How-To questions should generally take the form of a procedure: step-by-step instructions that users perform in sequence. However, people find reading instructions cognitively demanding and often prefer that another person guide them through a procedure. Prior work in automating procedural guidance concentrates either on how to communicate instructions or on how to reason about procedural knowledge to extract the states of entities. In this work, we present an end-to-end procedural voice guidance system that automatically generates and presents step-by-step instructions to users through a conversational agent. This system overcomes three significant challenges: generating a contextual knowledge graph of the procedure, ordering the necessary information by reasoning over that graph and converting it into procedural steps, and finally constructing a conversational system that delivers the procedure in a way that users can easily follow. Our approach improves upon the current state of the art in conversational agents, which often hand off the interaction to a web search. We demonstrate that our system can be used for end-user guidance, and that a contextual commonsense inference system can be used for procedural knowledge graph generation and, ultimately, procedural step generation. We also show that reasoning is essential for procedural step generation. Lastly, we show that combining our knowledge-driven system (both its steps and its contextual commonsense assertions) with a large language model (LLM) provides more accurate and reliable procedural guidance for tasks that the LLM may have trouble recalling or that were created after its training. This work opens paths toward contextual graph-based reasoning for story-based applications and helps inform the design of future conversational agents within the domain of procedural guidance.
dc.publisher  Massachusetts Institute of Technology
dc.rights  In Copyright - Educational Use Permitted
dc.rights  Copyright retained by author(s)
dc.rights.uri  https://rightsstatements.org/page/InC-EDU/1.0/
dc.title  A Conversational Agent for Dynamic Procedural Interactions
dc.type  Thesis
dc.description.degree  Ph.D.
dc.contributor.department  Program in Media Arts and Sciences (Massachusetts Institute of Technology)
dc.identifier.orcid  https://orcid.org/0000-0001-9293-203X
mit.thesis.degree  Doctoral
thesis.degree.name  Doctor of Philosophy
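The abstract above outlines a pipeline that builds a contextual knowledge graph of a procedure, reasons over that graph to order the necessary actions, and converts the result into steps a conversational agent can deliver. As a minimal, hypothetical illustration of the "ordering through reasoning on the graph" idea only (not the thesis's actual representation or algorithm), the Python sketch below linearizes a toy dependency graph for the rice-cooking example with a topological sort; all node names and dependencies are invented for the example.

```python
# Illustrative sketch only: a toy procedural dependency graph for the
# "How do I cook rice?" example, ordered into step-by-step instructions.
# The graph contents and step wording are hypothetical, not from the thesis.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is an action node; its value is the set of actions it depends on.
procedure_graph = {
    "rinse the rice": set(),
    "measure the water": set(),
    "add rice and water to the pot": {"rinse the rice", "measure the water"},
    "bring to a boil": {"add rice and water to the pot"},
    "simmer covered for 15 minutes": {"bring to a boil"},
    "let the rice rest, then fluff": {"simmer covered for 15 minutes"},
}

# Reasoning step: order the actions so every prerequisite comes first,
# then number them as instructions an agent could read aloud one at a time.
ordered_steps = list(TopologicalSorter(procedure_graph).static_order())
for number, step in enumerate(ordered_steps, start=1):
    print(f"Step {number}: {step}")
```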

