Show simple item record

dc.contributor.advisorAnantha P. Chandrakasan.en_US
dc.contributor.authorJi, Alex.en_US
dc.contributor.otherMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.en_US
dc.date.accessioned2019-07-18T20:35:06Z
dc.date.available2019-07-18T20:35:06Z
dc.date.copyright2018en_US
dc.date.issued2018en_US
dc.identifier.urihttps://hdl.handle.net/1721.1/121835
dc.descriptionThesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018en_US
dc.descriptionCataloged from PDF version of thesis.en_US
dc.descriptionIncludes bibliographical references (pages 69-71).en_US
dc.description.abstractImage processing has become more important with the ever increasing amount of available image data. This has been accompanied by the development of new algorithms and hardware. However, dedicated hardware is often required to run these algorithms efficiently and conversely, algorithms need to be developed to exploit the benefits of the new hardware. For example, depth cameras have been created to add a new dimension to human-computer interaction. They can benefit applications that can operate on the raw depth data directly, such as breath monitoring. As for new algorithms, convolutional neural networks (CNNs) have become the standard for difficult image processing tasks due to their high accuracy. But to execute them efficiently, we need new hardware to fully exploit the parallelism inherent in these computations. The first part of the thesis presents an algorithm for breath monitoring using a low-resolution time-of-flight camera. It consists of automatic region-of-interest detection, followed by frequency estimation. It can be accurate to within 1 breath per minute, comparing with a respiratory belt as reference. The second part presents a processing element (PE) for a neural network accelerator supporting compressed weights and using a new technique called factored computation. The PE consists of an accumulator array, row decoder, and output combination block. Modifications to the row decoder can allow for reconfigurability of the compressed weight bit-widths. Several common layer operations in CNNs are described and mapped onto the proposed hardware. An energy model of the design is formulated and verified by synthesizing and simulating a basic processing element containing an 8 x 20 accumulator array. Simulations show the proposed design achieves up to 4.5x reduction in the energy per MAC compared to a baseline 16-bit fixed-point MAC unit.en_US
dc.description.statementofresponsibilityby Alex Ji.en_US
dc.format.extent71 pagesen_US
dc.language.isoengen_US
dc.publisherMassachusetts Institute of Technologyen_US
dc.rightsMIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.en_US
dc.rights.urihttp://dspace.mit.edu/handle/1721.1/7582en_US
dc.subjectElectrical Engineering and Computer Science.en_US
dc.titleAlgorithms and low-power hardware for image processing applicationsen_US
dc.typeThesisen_US
dc.description.degreeS.M.en_US
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Scienceen_US
dc.identifier.oclc1108621278en_US
dc.description.collectionS.M. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Scienceen_US
dspace.imported2019-07-18T20:35:03Zen_US
mit.thesis.degreeMasteren_US
mit.thesis.departmentEECSen_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record