Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams

Zhang, Yaodong; Glass, James R.

dc.contributor.author	Glass, James R.
dc.contributor.author	Zhang, Yaodong, Ph. D. Massachusetts Institute of Technology
dc.date.accessioned	2012-10-01T16:23:45Z
dc.date.available	2012-10-01T16:23:45Z
dc.date.issued	2010-01
dc.date.submitted	2009-12
dc.identifier.isbn	978-1-4244-5478-5
dc.identifier.isbn	978-1-4244-5479-2
dc.identifier.uri	http://hdl.handle.net/1721.1/73507
dc.description.abstract	In this paper, we present an unsupervised learning framework to address the problem of detecting spoken keywords. Without any transcription information, a Gaussian Mixture Model is trained to label speech frames with a Gaussian posteriorgram. Given one or more spoken examples of a keyword, we use segmental dynamic time warping to compare the Gaussian posteriorgrams between keyword samples and test utterances. The keyword detection result is then obtained by ranking the distortion scores of all the test utterances. We examine the TIMIT corpus as a development set to tune the parameters in our system, and the MIT Lecture corpus for more substantial evaluation. The results demonstrate the viability and effectiveness of our unsupervised learning framework on the keyword spotting task.	en_US
dc.language.iso	en_US
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1109/ASRU.2009.5372931	en_US
dc.rights	Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.	en_US
dc.source	IEEE	en_US
dc.title	Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams	en_US
dc.type	Article	en_US
dc.identifier.citation	Zhang, Yaodong, and James R. Glass. “Unsupervised Spoken Keyword Spotting via Segmental DTW on Gaussian Posteriorgrams.” Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009) IEEE, 2009. 398–403. (c) 2009 IEEE	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.approver	Glass, James R.
dc.contributor.mitauthor	Glass, James R.
dc.contributor.mitauthor	Zhang, Yaodong
dc.relation.journal	IEEE Workshop on Automatic Speech Recognition & Understanding, 2009 (ASRU 2009)	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
dspace.orderedauthors	Zhang, Yaodong; Glass, James R.	en
dc.identifier.orcid	https://orcid.org/0000-0002-3097-360X
dspace.mitauthor.error	true
mit.license	PUBLISHER_POLICY	en_US
mit.metadata.status	Complete

Files in this item

Name: Yaodong-2009-Unsupervised spoken ...
Size: 118.6Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles