The Sensors Group's neuromorphic digital deep neural network (DNN) accelerators exploit key notions of sparse, brain-inspired computing, borrowed from the brain's spiking neural networks (SNNs), to save energy and time. Our CNN accelerator NullHop and its descendants exploit activation sparsity, and our successive generations of RNN accelerators DeltaRNN, EdgeDRNN, and Spartus exploit temporal and weight sparsity.
As in SNNs, computation becomes driven by activity; unlike SNNs, however, our accelerators access memory predictably.
Instead of computing all units for each input sample, why not update only the ones that have activity?
This idea leads to embedded IP blocks that access far less memory (and burn far less energy), but in a predictable way that matches DRAM access requirements.
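As a rough sketch of this idea (illustration only, not code from any of the accelerators referenced below), a layer update that touches only the weight columns of active inputs does proportionally fewer multiplies and weight fetches, and each fetched column can be one contiguous block, so the reads remain DRAM-burst friendly:

```python
import numpy as np

def activity_driven_matvec(W, x):
    """Fetch and use only the weight columns of active (nonzero) inputs.
    Work and weight reads scale with the number of active inputs; if each
    input's fan-out weights are stored contiguously, every fetch is one
    predictable burst rather than a scattered access pattern."""
    active = x != 0
    return W[:, active] @ x[active]

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))
x = np.maximum(rng.standard_normal(1024) - 1.0, 0.0)     # sparse, ReLU-like activity
assert np.allclose(activity_driven_matvec(W, x), W @ x)  # same result, far less work
print(f"active inputs: {int((x != 0).sum())} / {x.size}")
```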
Shih-Chii Liu, Sheng Zhou, Zixiao Li, Chang Gao, Kwantae Kim, and Tobi Delbruck. 2024. “Bringing Dynamic Sparsity to the Forefront for Low-Power Audio Edge Computing: Brain-Inspired Approach for Sparsifying Network Updates.” *IEEE Solid-State Circuits Magazine* 16 (4): 62–69. doi:10.1109/MSSC.2024.3455290. http://dx.doi.org/10.1109/MSSC.2024.3455290.
Shih-Chii Liu, Chang Gao, Kwantae Kim, and Tobi Delbruck. 2022. “Energy-Efficient Activity-Driven Computing Architectures for Edge Intelligence.” In *2022 International Electron Devices Meeting (IEDM)*, 21.2.1–21.2.4. doi:10.1109/IEDM45625.2022.10019443. http://dx.doi.org/10.1109/IEDM45625.2022.10019443.
Tobi Delbruck and Shih-Chii Liu. 2019. “Data-Driven Neuromorphic DRAM-Based CNN and RNN Accelerators.” In *2019 53rd Asilomar Conference on Signals, Systems, and Computers*, 500–506. doi:10.1109/IEEECONF44664.2019.9048865. http://dx.doi.org/10.1109/IEEECONF44664.2019.9048865.
Kwantae Kim, Chang Gao, Rui Graça, Ilya Kiselev, Hoi-Jun Yoo, Tobi Delbruck, and Shih-Chii Liu. 2022. “A 23-μW Keyword Spotting IC With Ring-Oscillator-Based Time-Domain Feature Extraction.” *IEEE Journal of Solid-State Circuits* 57 (11) (November): 3298–3311. doi:10.1109/JSSC.2022.3195610. http://dx.doi.org/10.1109/JSSC.2022.3195610.
Xi Chen, Chang Gao, Tobi Delbruck, and Shih-Chii Liu. 2021. “EILE: Efficient Incremental Learning on the Edge.” In *2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)*, 1–4. doi:10.1109/AICAS51828.2021.9458554. http://dx.doi.org/10.1109/AICAS51828.2021.9458554.
Chang Gao, Tobi Delbruck, and Shih-Chii Liu. 2022. “Spartus: A 9.4 TOp/s FPGA-Based LSTM Accelerator Exploiting Spatio-Temporal Sparsity.” *IEEE Transactions on Neural Networks and Learning Systems*. doi:10.1109/TNNLS.2022.3180209. http://dx.doi.org/10.1109/TNNLS.2022.3180209.
Chang Gao, Antonio Rios-Navarro, Xi Chen, Shih-Chii Liu and Tobi Delbruck. 2020. “EdgeDRNN: Recurrent Neural Network Accelerator for Edge Inference.” *IEEE Journal on Emerging and Selected Topics in Circuits and Systems* 10 (4) (December): 419–432. doi:10.1109/JETCAS.2020.3040300. http://dx.doi.org/10.1109/JETCAS.2020.3040300.
Alessandro Aimar, Hesham Mostafa, Enrico Calabrese, Antonio Rios-Navarro, Ricardo Tapiador-Morales, Iulia-Alexandra Lungu, Moritz B. Milde, et al. 2019. “NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps.” *IEEE Transactions on Neural Networks and Learning Systems* 30 (3) (March): 644–656. doi:10.1109/TNNLS.2018.2852335. http://dx.doi.org/10.1109/TNNLS.2018.2852335.
Chang Gao, Daniel Neil, Enea Ceolini, Shih-Chii Liu, and Tobi Delbruck. 2018. “DeltaRNN: A Power-Efficient Recurrent Neural Network Accelerator.” In *Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays*, 21–30. FPGA ’18. New York, NY, USA: Association for Computing Machinery. doi:10.1145/3174243.3174261. https://doi.org/10.1145/3174243.3174261.
PhD theses
D. Neil, “Deep Neural Networks and Hardware Systems for Event-driven Data,” ETH Zurich, 2017. Available: https://www.research-collection.ethz.ch/handle/20.500.11850/168865. ETH Medal distinction award.
A. Aimar, “Energy-Efficient Convolutional Neural Network Accelerators for Edge Intelligence,” PhD, University of Zurich, 2021. Available: https://www.zora.uzh.ch/id/eprint/209482/
C. Gao, “Energy-Efficient Recurrent Neural Network Accelerators for Real-Time Inference,” University of Zurich, 2022. doi: 10.5167/uzh-219686. Available: https://www.zora.uzh.ch/id/eprint/219686/. [Accessed: Jan. 31, 2024]. UZH PhD thesis distinction award.
B. Rückauer, “Event-Based Vision Processing in Deep Neural Networks,” University of Zurich, 2020. doi: 10.5167/uzh-200987. Available: https://www.zora.uzh.ch/id/eprint/200987/. [Accessed: Jan. 31, 2024].
E. Calabrese, “Neuromorphic Solutions towards More Efficient Computer Vision,” PhD, University of Zurich, 2021. Available: https://drive.google.com/file/d/10u6mKMVs5kAKjFPEXF5gpQCsxaDWppDg/view?usp=sharing
Phase 2 of the Samsung Global Research Neuromorphic Processor Project (aka NPP) concluded officially in 2019, but we are still pursuing the aims of this project.
NPP developed theory, architectures, and digital implementations targeting specific applications of deep neural network technology in vision and audition. The project aimed at real-time, low-power, brain-inspired solutions intended for full-custom SoC integration. A particular aim of the NPP was to develop efficient data-driven deep neural network architectures that, in conjunction with event-driven sensors, can enable always-on operation on battery-powered mobile devices.
The project team included leading academic partners in the USA, Canada, and Spain, and was coordinated by the Inst. of Neuroinformatics. The overall PI of the project was Tobi Delbruck.
The NPP Phase 2 partners included:
Inst. of Neuroinformatics (INI), UZH-ETH Zurich (T. Delbruck, SC Liu, G Indiveri, M Pfeiffer)
Montreal Institute of Learning Algorithms (MILA) - Univ. of Montreal (Y Bengio)
Robotics and Technology of Computers Lab, Univ. of Seville (A. Linares-Barranco)
In Phase 1 of NPP, we worked with Samsung and partners from Canada, the USA, and Spain to develop deep inference theory and processor architectures with state-of-the-art power efficiency. Several key hardware accelerator results inspired by neuromorphic design principles were obtained by the Sensors Group. These results exploit sparsity of neural activation in space and time to reduce computation and, in particular, expensive access to external memory, which costs hundreds of times more energy than local memory access or arithmetic operations. In this way, these DNN accelerators behave like synchronous spiking neural networks.
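As a back-of-the-envelope illustration of why skipping external memory traffic matters most (the per-access energies below are ballpark figures commonly quoted for roughly 45 nm technology; they are assumptions for this estimate, not numbers from the papers on this page):

```python
# Ballpark per-access energies in picojoules (assumed values, ~45 nm class).
E_DRAM = 640.0   # 32-bit word fetched from off-chip DRAM
E_SRAM = 5.0     # 32-bit word fetched from a small on-chip SRAM
E_MAC  = 4.0     # one 32-bit multiply-accumulate

print(f"DRAM vs SRAM: {E_DRAM / E_SRAM:.0f}x more energy per access")
print(f"DRAM vs MAC:  {E_DRAM / E_MAC:.0f}x more energy per operation")

# If sparsity lets the accelerator skip 70% of weight fetches and MACs,
# the energy per nominal operation drops roughly in proportion.
skip = 0.7
print(f"energy/op: {E_DRAM + E_MAC:.0f} pJ dense -> {(1 - skip) * (E_DRAM + E_MAC):.0f} pJ sparse")
```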
NullHop uses spatial feature-map sparsity to provide flexible convolutional neural network (CNN) acceleration, exploiting the large amount of sparsity in feature maps that results from the widely-used ReLU activation function. NullHop achieves state-of-the-art power efficiency of 3 TOp/s/W at a throughput of 500 GOp/s. See the IEEE TNNLS paper (IEEE link), the video of NullHop driving CNN inference in RoShamBo, and the video explaining the Rock-Scissors-Paper demo from Scientifica 2018.
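A minimal sketch of the zero-skipping idea in plain Python (not NullHop's sparse feature-map encoding or its pipeline): every multiply whose ReLU-output operand is exactly zero can be skipped without changing the convolution result:

```python
import numpy as np

def conv2d_skip_zeros(fmap, kernels):
    """Naive 'valid' convolution over one input channel that skips all
    multiplies whose activation operand is zero.
    fmap: (H, W) feature map after ReLU; kernels: (K, kh, kw) filter bank."""
    K, kh, kw = kernels.shape
    H, W = fmap.shape
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for (r, c) in np.argwhere(fmap != 0):          # only nonzero activations do work
        a = fmap[r, c]
        for dr in range(kh):                       # scatter a's contribution to every
            for dc in range(kw):                   # output position it overlaps
                oy, ox = r - dr, c - dc
                if 0 <= oy < out.shape[1] and 0 <= ox < out.shape[2]:
                    out[:, oy, ox] += a * kernels[:, dr, dc]
    return out

rng = np.random.default_rng(1)
fmap = np.maximum(rng.standard_normal((16, 16)), 0)    # ReLU output: roughly half zeros
kernels = rng.standard_normal((8, 3, 3))
print(f"nonzero activations: {(fmap != 0).mean():.0%}")
print(f"output shape: {conv2d_skip_zeros(fmap, kernels).shape}")
```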
DeltaRNN uses temporal change sparsity for recurrent neural network (RNN) acceleration, exploiting the fact that most of the units in RNNs change slowly. DeltaRNN can accelerate gated recurrent unit (GRU) RNNs by a factor of 10 or more, even for single-sample inference on single streams. On a Xilinx Zynq FPGA, it achieves state-of-the-art effective throughput of 1.2 TOp/s at a power efficiency of 164 GOp/s/W. See the ICML theory paper, the FPGA18 paper, and the first DeltaRNNv1 demo video, where DeltaRNN does real-time spoken digit recognition for speakers with a variety of accents.
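The core trick, roughly as in the ICML delta-network formulation (a simplified sketch with made-up names, not the GRU equations or the FPGA datapath): remember the last transmitted input, zero out changes below a threshold, and fold the weighted changes into a stored pre-activation so that skipped work never has to be redone:

```python
import numpy as np

class DeltaLayer:
    """Sketch of a delta-updated layer: the pre-activation M is kept in memory
    and only corrected by the weighted, thresholded change of the input."""
    def __init__(self, W, theta=0.1):
        self.W = W
        self.theta = theta
        self.x_ref = np.zeros(W.shape[1])   # last input value that was applied
        self.M = np.zeros(W.shape[0])       # stored pre-activation

    def step(self, x):
        delta = x - self.x_ref
        active = np.abs(delta) > self.theta           # only large-enough changes fire
        self.M += self.W[:, active] @ delta[active]   # fetch only active weight columns
        self.x_ref[active] = x[active]                # small changes keep accumulating
        return np.tanh(self.M), active.mean()

rng = np.random.default_rng(0)
layer = DeltaLayer(rng.standard_normal((256, 256)))
x = rng.standard_normal(256)
for t in range(5):
    x = x + 0.02 * rng.standard_normal(256)           # slowly varying input stream
    y, frac = layer.step(x)
    print(f"t={t}: fraction of inputs updated = {frac:.2f}")
```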
The NPP Phase 1 partners included:
Inst. of Neuroinformatics (INI), UZH-ETH Zurich (T. Delbruck, SC Liu, G Indiveri, M Pfeiffer)
inilabs (F Corradi)
Robotics and Technology of Computers Lab, Univ. of Seville (A. Linares-Barranco)
Inst. of Microelectronics Seville (IMSE-CNM) - (B. Linares-Barranco)
Montreal Institute of Learning Algorithms (MILA) - Univ. of Montreal (Y Bengio)
Spiking neural networks (SNNs) can achieve accuracy equivalent to conventional analog neural networks, even for very deep CNNs such as VGG-16 and GoogLeNet, but they are very inefficient at coding precise analog values (a rate code needs many spike events to convey a single multi-bit activation), and their unpredictable memory access is a poor match for economical DRAM.
Both CNNs and RNNs can be trained to use greatly reduced weight and state precision, resulting in large savings in memory bandwidth.
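A minimal illustration of the bandwidth argument (generic uniform 8-bit quantization, not the specific schemes used in the papers above):

```python
import numpy as np

def quantize_int8(W):
    """Uniform symmetric quantization of a weight matrix to signed 8-bit ints."""
    scale = np.abs(W).max() / 127.0
    return np.clip(np.round(W / scale), -128, 127).astype(np.int8), scale

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)
Wq, scale = quantize_int8(W)
max_err = np.abs(W - Wq.astype(np.float32) * scale).max()
print(f"max weight error after int8 quantization: {max_err:.4f}")
print(f"weight bytes to fetch: {W.nbytes} (fp32) -> {Wq.nbytes} (int8), 4x less bandwidth")
```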