Morphological segmentation : an unsupervised method and application to Keyword Spotting
Author(s)
Narasimhan, Karthik Rajagopal
DownloadFull printable version (1.936Mb)
Alternative title
Unsupervised method and application to KWS
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Regina Barzilay.
Terms of use
Metadata
Show full item recordAbstract
The contributions of this thesis are twofold. First, we present a new unsupervised algorithm for morphological segmentation that utilizes pseudo-semantic information, in addition to orthographic cues. We make use of the semantic signals from continuous word vectors, trained on huge corpora of raw text data. We formulate a log-linear model that is simple and can be used to perform fast, efficient inference on new words. We evaluate our model on a standard morphological segmentation dataset, and obtain large performance gains of up to 18.4% over an existing state-of-the-art system, Morfessor. Second, we explore the impact of morphological segmentation on the speech recognition task of Keyword Spotting (KWS). Despite potential benefits, state-of-the-art KWS systems do not use morphological information. In this thesis, we augment a KWS system with sub-word units derived by multiple segmentation algorithms including supervised and unsupervised morphological segmentations, along with phonetic and syllabic segmentations. Our experiments demonstrate that morphemes improve overall performance of KWS systems. Syllabic units, however, rival the performance of morphological units when used in KWS. By combining morphological and syllabic segmentations, we demonstrate substantial performance gains..
Description
Thesis: S.M. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. 26 Cataloged from PDF version of thesis. Includes bibliographical references (pages 41-44).
Date issued
2014Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.