An architecture for low-power voice-command recognition systems
Author(s)
He, Qing, Ph. D. Massachusetts Institute of Technology
DownloadFull printable version (13.24Mb)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Gregory W. Wornell.
Terms of use
Metadata
Show full item recordAbstract
The advancements in fields such as machine-learning have allowed for a growing number of applications seeking to exploit learning methods. Many such applications involve complex algorithms working over high-dimensional features and are implemented in large scale systems where power and other resources are abundant. With emerging interest in embedded applications, nano-scale systems, and mobile devices, which are power and computation constrained, there is a rising need to find simple, low-power solutions for common applications such as voice activation. This thesis develops an ultra-low-power system architecture for voice-command recognition applications. It optimizes system resources by exploiting compact representations of the signal features and extracting them with efficient analog front-ends. The front-end performs feature pre-selection such that only a subset of all available features are chosen and extracted. Two variations of front-end feature extraction design are developed, for the applications of text-dependent speaker-verification and user-independent command recognition, respectively. For speaker-verification, the features are selected with knowledge of the speaker's fundamental frequency and are adapted based on the noise spectrum. The back-end algorithm, supporting adaptive feature selection, is a weighted dynamic time warping algorithm that removes signal misalignments and mitigates speech rate variations while preserving the signal envelope. In the case of user-independent command recognition, a universal set of features are selected without using speaker-specific information. The back-end classifier is enabled by a novel multi-band deep neural network model that processes only the selected features at each decision. In experiments, the proposed systems achieve improved accuracy with noise robustness using significantly less power consumption and computation than existing systems. Components of the front- and back-ends have been implemented in hardware, and the end-to-end system power consumption is kept under a few hundred [mu]Ws.
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 149-157).
Date issued
2016Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.