An architecture for low-power voice-command recognition systems

He, Qing, Ph. D. Massachusetts Institute of Technology

dc.contributor.advisor	Gregory W. Wornell.	en_US
dc.contributor.author	He, Qing, Ph. D. Massachusetts Institute of Technology	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2016-12-05T19:11:12Z
dc.date.available	2016-12-05T19:11:12Z
dc.date.copyright	2016	en_US
dc.date.issued	2016	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/105574
dc.description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.	en_US
dc.description	This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.	en_US
dc.description	Cataloged from student-submitted PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 149-157).	en_US
dc.description.abstract	The advancements in fields such as machine-learning have allowed for a growing number of applications seeking to exploit learning methods. Many such applications involve complex algorithms working over high-dimensional features and are implemented in large scale systems where power and other resources are abundant. With emerging interest in embedded applications, nano-scale systems, and mobile devices, which are power and computation constrained, there is a rising need to find simple, low-power solutions for common applications such as voice activation. This thesis develops an ultra-low-power system architecture for voice-command recognition applications. It optimizes system resources by exploiting compact representations of the signal features and extracting them with efficient analog front-ends. The front-end performs feature pre-selection such that only a subset of all available features are chosen and extracted. Two variations of front-end feature extraction design are developed, for the applications of text-dependent speaker-verification and user-independent command recognition, respectively. For speaker-verification, the features are selected with knowledge of the speaker's fundamental frequency and are adapted based on the noise spectrum. The back-end algorithm, supporting adaptive feature selection, is a weighted dynamic time warping algorithm that removes signal misalignments and mitigates speech rate variations while preserving the signal envelope. In the case of user-independent command recognition, a universal set of features are selected without using speaker-specific information. The back-end classifier is enabled by a novel multi-band deep neural network model that processes only the selected features at each decision. In experiments, the proposed systems achieve improved accuracy with noise robustness using significantly less power consumption and computation than existing systems. Components of the front- and back-ends have been implemented in hardware, and the end-to-end system power consumption is kept under a few hundred [mu]Ws.	en_US
dc.description.statementofresponsibility	by Qing He.	en_US
dc.format.extent	157 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	An architecture for low-power voice-command recognition systems	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph. D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	964448763	en_US

Files in this item

Name:: 964448763-MIT.pdf
Size:: 13.24Mb
Format:: PDF
Description:: Full printable version

View/Open

This item appears in the following Collection(s)

Doctoral Theses

Show simple item record