Algorithms and low power hardware for keyword spotting

Author(s)

Wang, Miaorong

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Anantha P. Chandrakasan.

Terms of use

MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Keyword spotting (KWS) is widely used in mobile devices to provide a hands-free interface. A KWS system continuously listens to all sound signals, detects specific keywords, and triggers the downstream system. The key design targets of a KWS system are high classification accuracy on the specified keywords and low power consumption during real-time processing of speech data. Algorithms based on convolutional neural networks (CNNs) deliver high accuracy with a small model size that can be stored in on-chip memory. However, state-of-the-art neural network (NN) accelerators either target complex tasks using large CNN models, e.g. AlexNet, or support only limited NN architectures that deliver lower classification accuracy for KWS. This thesis takes an algorithm-and-hardware co-design approach to implement a low-power NN accelerator for KWS that can process CNNs with flexible structures. On the algorithm side, we propose a weight tuning method that tweaks the bits of the weights to lower the switching activity in the weight network-on-chip (NoC) and the multipliers. The algorithm takes in the original 8-bit weights in 2's complement and outputs tuned 8-bit weights in sign-magnitude. In our experiment, a 60.96% reduction in the toggle count of the weights is achieved with a 0.75% loss in accuracy. On the hardware side, we implement a processing element (PE) to efficiently process the tuned weights. It takes in sign-magnitude weights and input activations and multiplies them with an unsigned multiplier. An XOR gate generates the sign bit of the product. The sign-magnitude product is converted back to 2's complement representation and accumulated using an adder-and-subtractor; the sign bit of the product serves as the carry bit for the conversion. Compared to a PE that processes the original 2's complement weights, around 35% power reduction is observed. In the end, this thesis presents a CNN accelerator that consumes 1.2 mW during real-time processing of speech data, with an accuracy of around 87.3% on the Google speech command dataset [34].
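The arithmetic described in the abstract can be illustrated with a short Python sketch. This is not the thesis's weight tuning algorithm or its RTL: the function names, the 24-bit accumulator width, and the assumption that input activations are also stored in sign-magnitude are all hypothetical choices made here for illustration. The sketch only shows (a) why a sign-magnitude encoding can toggle fewer bits than 2's complement when small weights alternate in sign, and (b) how an unsigned multiply, an XOR for the sign, and a sign-as-carry-in accumulation can reproduce a signed dot product.

```python
def to_twos_complement(w: int, bits: int = 8) -> int:
    """Bit pattern of a signed integer in 2's complement."""
    return w & ((1 << bits) - 1)


def to_sign_magnitude(w: int, bits: int = 8) -> int:
    """Bit pattern of a signed integer in sign-magnitude (MSB is the sign)."""
    max_mag = (1 << (bits - 1)) - 1
    if not -max_mag <= w <= max_mag:
        raise ValueError(f"{w} does not fit in {bits}-bit sign-magnitude")
    sign = 1 if w < 0 else 0
    return (sign << (bits - 1)) | abs(w)


def from_twos_complement(v: int, bits: int) -> int:
    """Interpret a bit pattern as a signed 2's complement integer."""
    return v - (1 << bits) if v & (1 << (bits - 1)) else v


def toggle_count(patterns: list[int]) -> int:
    """Number of bit flips between consecutive words in a sequence."""
    return sum(bin(a ^ b).count("1") for a, b in zip(patterns, patterns[1:]))


def sm_mac(acc: int, w_sm: int, x_sm: int, bits: int = 8, acc_bits: int = 24) -> int:
    """One multiply-accumulate step on sign-magnitude operands.

    acc is a 2's complement accumulator of acc_bits bits (width assumed here).
    """
    sign_mask = 1 << (bits - 1)
    mag_mask = sign_mask - 1
    acc_mask = (1 << acc_bits) - 1

    sign = ((w_sm ^ x_sm) >> (bits - 1)) & 1      # XOR of the two sign bits
    mag = (w_sm & mag_mask) * (x_sm & mag_mask)   # unsigned multiply of magnitudes

    # Adder-and-subtractor: for a negative product, invert the magnitude bits
    # and feed the sign in as carry-in, i.e. acc + ~mag + 1 == acc - mag.
    operand = (mag ^ acc_mask) if sign else mag
    return (acc + operand + sign) & acc_mask


def sm_dot(weights: list[int], activations: list[int], acc_bits: int = 24) -> int:
    """Signed dot product computed through the sign-magnitude MAC above."""
    acc = 0
    for w, x in zip(weights, activations):
        acc = sm_mac(acc, to_sign_magnitude(w), to_sign_magnitude(x), acc_bits=acc_bits)
    return from_twos_complement(acc, acc_bits)


if __name__ == "__main__":
    weights = [1, -1, 2, -2]  # small weights alternating in sign
    print(toggle_count([to_twos_complement(w) for w in weights]))  # 20
    print(toggle_count([to_sign_magnitude(w) for w in weights]))   # 5
    print(sm_dot([3, -2, 5, -1], [4, 6, -7, -8]))                  # -27
```

In this toy sequence the 2's complement words flip nearly every bit on each sign change, while the sign-magnitude words flip only the sign bit and a few low-order bits. The reduction reported in the abstract comes from the thesis's own tuning method on real weights, which this sketch does not reproduce.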

Description

Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 73-76).

Date issued

2018

URI

http://hdl.handle.net/1721.1/118035

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Graduate Theses