Open Intent Generation Through Unsupervised Semantic Clustering of Task-Oriented Dialog
Author(s)
Wagner, Julia N.
Advisor
Glass, James
Abstract
The natural language processing field has seen task-oriented dialog systems emerge as a strong area of interest in research and industry in recent years. However, the scarcity of complex, sufficiently annotated training data remains a bottleneck for developing more advanced, domain-agnostic chatbots. For novel domains, experts must spend extensive time and manual effort creating intents for the new datasets that support dialog systems. This thesis analyzes a two-stage unsupervised semantic clustering and intent generation approach built from multiple interchangeable, dataset-adaptive methods. We examine various pre-trained embeddings, scoring objectives for selecting the number of clusters, unsupervised clustering algorithms, intent generation techniques, and utterance tokenization schemes. We then run experiments with these combinations on three datasets (SNIPS, MultiWOZ, and real-world chat data), followed by quantitative metric-based evaluation and in-depth qualitative cluster-based evaluation. We show the benefits of bigram frequency intent generation as datasets become more irregular and confirm the success of Universal Sentence Encoder embeddings with K-Means clustering. Additionally, our examination of real-world data underlines the importance of fine-grained utterance tokenization and suggests that these research methods are feasible on unpublished data. Altogether, this thesis provides a comprehensive analysis of how well the components of the two-stage pipeline support open intent discovery across a variety of dataset characteristics, offering alternative solutions where they benefit real-world applications. This gives insight into the optimal configuration for automatically generating a novel dialog training dataset from unstructured, unlabeled chat utterances. The code for this thesis can be found at https://github.com/jnwagner53/dialog-intent-generation.
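To illustrate the idea behind the second pipeline stage, the Python sketch below shows one plausible form of bigram frequency intent generation: each cluster is named after its most frequent adjacent word pair. It assumes cluster assignments already exist (for example, from K-Means over sentence embeddings). The tokenizer, the stopword list, and the function names `tokenize` and `bigram_intent_labels` are hypothetical and are not taken from the thesis code, so the actual method may differ in preprocessing and tie-breaking.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, minimal stopword list; the thesis's real preprocessing may differ.
STOPWORDS = {"a", "an", "the", "to", "for", "me", "i", "please", "can", "you", "my", "some"}


def tokenize(utterance: str) -> list[str]:
    """Lowercase the utterance, keep word-like tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", utterance.lower())
    return [t for t in tokens if t not in STOPWORDS]


def bigram_intent_labels(utterances: list[str], cluster_ids: list[int]) -> dict[int, str]:
    """Name each cluster after its most frequent token bigram.

    cluster_ids[i] is the cluster assigned to utterances[i] by some upstream
    clustering step (assumed here, e.g. K-Means over sentence embeddings).
    """
    if len(utterances) != len(cluster_ids):
        raise ValueError("utterances and cluster_ids must have the same length")

    bigrams: dict[int, Counter] = defaultdict(Counter)
    unigrams: dict[int, Counter] = defaultdict(Counter)
    for text, cid in zip(utterances, cluster_ids):
        tokens = tokenize(text)
        unigrams[cid].update(tokens)
        # Count adjacent token pairs within a single utterance only.
        bigrams[cid].update(zip(tokens, tokens[1:]))

    labels: dict[int, str] = {}
    for cid in sorted(set(cluster_ids)):
        if bigrams[cid]:
            # Ties go to the bigram seen first in the cluster.
            (w1, w2), _ = bigrams[cid].most_common(1)[0]
            labels[cid] = f"{w1}_{w2}"
        elif unigrams[cid]:
            # Fallback for clusters whose utterances are all single tokens.
            labels[cid] = unigrams[cid].most_common(1)[0][0]
        else:
            # Fallback for clusters with no usable tokens at all.
            labels[cid] = f"cluster_{cid}"
    return labels


if __name__ == "__main__":
    utterances = [
        "book a table for two",
        "book a table at an italian place",
        "play music by queen",
        "please play music from the beatles",
    ]
    cluster_ids = [0, 0, 1, 1]
    print(bigram_intent_labels(utterances, cluster_ids))
    # prints {0: 'book_table', 1: 'play_music'}
```

The unigram and placeholder fallbacks are a design choice of this sketch, so that every cluster receives a name even when its utterances are too short to contain a bigram.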
Date issued
2022-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology