Show simple item record

dc.contributor.advisor: Anantha Chandrakasan and James Glass. (en_US)
dc.contributor.author: Price, Michael, Ph. D. (Michael R.). Massachusetts Institute of Technology (en_US)
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. (en_US)
dc.date.accessioned: 2016-12-22T16:28:36Z
dc.date.available: 2016-12-22T16:28:36Z
dc.date.copyright: 2016 (en_US)
dc.date.issued: 2016 (en_US)
dc.identifier.uri: http://hdl.handle.net/1721.1/106090
dc.description: Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. (en_US)
dc.description: Cataloged from PDF version of thesis. (en_US)
dc.description: Includes bibliographical references (pages 135-141). (en_US)
dc.description.abstract: As people become more comfortable with speaking to machines, the applications of speech interfaces will diversify and include a wider range of devices, such as wearables, appliances, and robots. Automatic speech recognition (ASR) is a key component of these interfaces that is computationally intensive. This thesis shows how we designed special-purpose integrated circuits to bring local ASR capabilities to electronic devices with a small size and power footprint. This thesis adopts a holistic, system-driven approach to ASR hardware design. We identify external memory bandwidth as the main driver in system power consumption and select algorithms and architectures to minimize it. We evaluate three acoustic modeling approaches, namely Gaussian mixture models (GMMs), subspace GMMs (SGMMs), and deep neural networks (DNNs), and identify tradeoffs between memory bandwidth and recognition accuracy. DNNs offer the best tradeoffs for our application; we describe a SIMD DNN architecture using parameter quantization and sparse weight matrices to save bandwidth. We also present a hidden Markov model (HMM) search architecture using a weighted finite-state transducer (WFST) representation. Enhancements to the search architecture, including WFST compression and caching, predictive beam width control, and a word lattice, reduce memory bandwidth to 10 MB/s or less, despite having just 414 kB of on-chip SRAM. The resulting system runs in real-time with accuracy comparable to a software recognizer using the same models. We provide infrastructure for deploying recognizers trained with open-source tools (Kaldi) on the hardware platform. We investigate voice activity detection (VAD) as a wake-up mechanism and conclude that an accurate and robust algorithm is necessary to minimize system power, even if it results in larger area and power for the VAD itself. We design fixed-point digital implementations of three VAD algorithms and explore their performance on two synthetic tasks with SNRs from -5 to 30 dB. The best algorithm uses modulation frequency features with an NN classifier, requiring just 8.9 kB of parameters. Throughout this work we emphasize energy scalability, or the ability to save energy when high accuracy or complex models are not required. Our architecture exploits scalability from many sources: model hyperparameters, runtime parameters such as beam width, and voltage/frequency scaling. We demonstrate these concepts with results from five ASR tasks, with vocabularies ranging from 11 words to 145,000 words. (en_US)
dc.description.statementofresponsibility: by Michael Price. (en_US)
dc.format.extent: 141 pages (en_US)
dc.language.iso: eng (en_US)
dc.publisher: Massachusetts Institute of Technology (en_US)
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. (en_US)
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 (en_US)
dc.subject: Electrical Engineering and Computer Science. (en_US)
dc.title: Energy-scalable speech recognition circuits (en_US)
dc.type: Thesis (en_US)
dc.description.degree: Ph. D. (en_US)
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 965382032 (en_US)

