A scalable optical neural network architecture using coherent detection
Author(s)
Sludds, Alexander; Bernstein, Liane; Hamerly, Ryan M; Soljacic, Marin; Englund, Dirk R.
Publisher Policy
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
© COPYRIGHT SPIE. Downloading of the abstract is permitted for personal use only. Storing, processing, and learning from data is a central task in both industrial practice and modern science. Recent advances in modern statistical learning, particularly Deep Neural Networks (DNNs), have given record-breaking performance on tasks in game playing,1, 2 natural language processing,3 computer vision,4 computational biology,5, 6 and many others. The rapid growth of the field has been driven by an increase in the number of public datasets,7 improvements to algorithms,8 and a substantial growth in computing power.9 In order to perform well on these tasks, networks have had to grow in size, learning more complicated statistical features. The training and deployment of these large neural networks have spurred the creation of many neural network accelerators to aid in their computation.10-12 Existing general-purpose computing devices such as CPUs and GPUs are limited both by thermal dissipation per unit area and by the yield associated with large chips.13, 14 The design of Application-Specific Integrated Circuits (ASICs) has helped decrease the energy consumption per workload substantially by limiting the operations supported on chip. An example of this is the first-generation tensor processing unit (TPU),15 which is able to perform inference of large convolutional neural networks in a datacenter in <10 ms with an idle power of 28 W and a workload power of 40 W. It may seem counterintuitive, then, that the limiting factor for the implementation of DNNs is not computation, but rather the energy and bandwidth associated with reading and writing data from memory, as well as the energy cost of moving data inside the ASIC.15, 16 Several emerging technologies, such as in-memory computing17 and memristive crossbar arrays,18 promise increased performance, but these emerging architectures suffer from calibration issues and limited accuracy.19 Photonics as a field has had tremendous success in improving the energy efficiency of data interconnects.20 This has motivated the creation of optical neural networks (ONNs) based on 3D-printed diffractive elements,21 spiking neural networks utilizing ring resonators,22 reservoir computing,23 and nanophotonic circuits.24 However, these architectures have several issues. 3D-printed diffractive networks and schemes requiring spatial light modulators are non-programmable, meaning that they are unable to perform the task of training. Nanophotonic circuits allow an O(N²) array of interferometers to be programmed, providing passive matrix-vector multiplication. However, the large (1 mm²) size of on-chip electro-optic interferometers means that scaling to a 100 × 100 array would require 10,000 mm² of silicon, demonstrating the limitations of scaling this architecture. To date, no architecture has demonstrated high-speed (GHz) computation with N ≥ 10,000 neurons. Here we present an architecture that is scalable to N ≥ 10⁶ neurons. The key mechanism of this architecture is balanced homodyne detection. By scaling the architecture to such a large size we show that we can drastically reduce the energy cost per operation associated with the optical component of this architecture, reaching a bound set by shot noise on the receiving photodetectors that leads to classification error. We call this bound the standard quantum limit (SQL), which reaches 100 zJ/MAC on problems such as MNIST.
We also analyze the energy consumption using existing technologies and show that sub-fJ/MAC energy consumption should be possible. This paper is organized as follows: In Section 1 we discuss the function of this architecture as a matrix-matrix processor. In Section 2 we analyze the energy consumption of the architecture. In Section 3 we discuss methods for training and for extending the accelerator to a broader scope of problems, namely convolutional neural networks (CNNs).
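As context for the mechanism named in the abstract, the following is a minimal numerical sketch (not the authors' implementation) of how balanced homodyne detection can compute a multiply-accumulate: activations and weights are assumed to be encoded as optical field amplitudes, the balanced detector's difference signal is taken to be proportional to their product, and shot noise on the photodetectors limits accuracy for a given photon budget. The function name, the photon budget n_per_mac, and the Gaussian shot-noise model are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def homodyne_dot(x, w, n_per_mac=100.0):
    # Encode activations x and weights w as field amplitudes carrying
    # roughly n_per_mac photons per multiply (illustrative normalization).
    scale = np.sqrt(n_per_mac)
    Ex, Ew = scale * x, scale * w
    # A 50/50 beam splitter followed by balanced photodetection yields a
    # difference photocurrent proportional to the product of the two fields.
    signal = 2.0 * Ex * Ew
    # Shot noise: Gaussian with variance set by the total detected photons.
    photons = Ex ** 2 + Ew ** 2
    noisy = signal + rng.normal(0.0, np.sqrt(photons + 1e-12))
    # Summing over all elements and undoing the encoding recovers x . w.
    return noisy.sum() / (2.0 * n_per_mac)

x = rng.standard_normal(1000) / np.sqrt(1000)
w = rng.standard_normal(1000) / np.sqrt(1000)
print("exact dot product: ", float(x @ w))
print("homodyne estimate: ", homodyne_dot(x, w))

Increasing n_per_mac in this toy model reduces the relative error of the estimate, mirroring the trade-off the abstract describes between optical energy per MAC and classification error; the 100 zJ/MAC SQL figure quoted above comes from the paper's analysis, not from this sketch.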
Date issued
2020
Department
Massachusetts Institute of Technology. Research Laboratory of Electronics; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Proceedings of SPIE - The International Society for Optical Engineering
Publisher
SPIE