Neural Hardware
The Sensors Group's neuromorphic digital deep neural network accelerators exploit key notions of sparse, activity-driven computing inspired by the brain's Spiking Neural Networks (SNNs) to save energy and time. Our CNN accelerator NullHop and its descendants exploit activation sparsity, and our successive generations of RNN accelerators DeltaRNN, EdgeDRNN, and Spartus exploit temporal sparsity and weight sparsity.
As in SNNs, computation is driven by activity; but unlike SNNs, our accelerators access memory predictably.
Key concepts
Instead of computing all units for each input sample, why not update only the ones that have activity?
This idea leads to embedded IP blocks that access far less memory (and burn less energy), yet do so in a predictable pattern that matches DRAM burst-access requirements, as sketched below.
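A minimal NumPy sketch of this idea (an illustration, not the accelerator RTL; the function name and threshold are assumptions for the example): a matrix-vector product in which only the weight columns of active inputs are fetched and accumulated, so the skipped work comes in whole contiguous columns and the remaining accesses stay DRAM-friendly.

```python
import numpy as np

def activity_driven_matvec(W, x, threshold=0.0):
    """Compute y = W @ x, touching only the columns of active inputs.

    Inputs with |x[j]| <= threshold are skipped entirely, so their
    weight columns are never read from memory; each column that *is*
    read is a contiguous block, keeping memory accesses predictable.
    """
    active = np.flatnonzero(np.abs(x) > threshold)   # indices of active inputs
    y = np.zeros(W.shape[0], dtype=W.dtype)
    for j in active:
        y += W[:, j] * x[j]                          # one contiguous column read
    return y

# Example: with ~90% zero activations, ~90% of weight reads are skipped.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)
x = rng.standard_normal(256).astype(np.float32)
x[rng.random(256) < 0.9] = 0.0                       # sparse, ReLU-like input
assert np.allclose(activity_driven_matvec(W, x), W @ x, atol=1e-3)
```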
Key publications in Neural Hardware
Reviews
Liu, Shih-Chii, Chang Gao, Kwantae Kim, and Tobi Delbruck. 2022. “Energy-Efficient Activity-Driven Computing Architectures for Edge Intelligence.” In *2022 International Electron Devices Meeting (IEDM)*, 21.2.1–21.2.4. doi:10.1109/IEDM45625.2022.10019443. http://dx.doi.org/10.1109/IEDM45625.2022.10019443.
Delbruck, T., and S. Liu. 2019. “Data-Driven Neuromorphic DRAM-Based CNN and RNN Accelerators.” In *2019 53rd Asilomar Conference on Signals, Systems, and Computers*, 500–506. doi:10.1109/IEEECONF44664.2019.9048865. http://dx.doi.org/10.1109/IEEECONF44664.2019.9048865.
Research papers
Kim, Kwantae, Chang Gao, Rui Graça, Ilya Kiselev, Hoi-Jun Yoo, Tobi Delbruck, and Shih-Chii Liu. 2022. “A 23-μW Keyword Spotting IC With Ring-Oscillator-Based Time-Domain Feature Extraction.” *IEEE Journal of Solid-State Circuits* 57 (11) (November): 3298–3311. doi:10.1109/JSSC.2022.3195610. http://dx.doi.org/10.1109/JSSC.2022.3195610.
Chen, Xi, Chang Gao, Tobi Delbruck, and Shih-Chii Liu. 2021. “EILE: Efficient Incremental Learning on the Edge.” In *2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)*, 1–4. doi:10.1109/AICAS51828.2021.9458554. http://dx.doi.org/10.1109/AICAS51828.2021.9458554.
Gao, Chang, Tobi Delbruck, and Shih-Chii Liu. 2022. “Spartus: A 9.4 TOp/s FPGA-Based LSTM Accelerator Exploiting Spatio-Temporal Sparsity.” *IEEE Transactions on Neural Networks and Learning Systems*. doi:10.1109/TNNLS.2022.3180209. http://dx.doi.org/10.1109/TNNLS.2022.3180209.
Gao, C., A. Rios-Navarro, X. Chen, S-C Liu, and T. Delbruck. 2020. “EdgeDRNN: Recurrent Neural Network Accelerator for Edge Inference.” *IEEE Journal on Emerging and Selected Topics in Circuits and Systems* 10 (4) (December): 419–432. doi:10.1109/JETCAS.2020.3040300. http://dx.doi.org/10.1109/JETCAS.2020.3040300.
Aimar, Alessandro, Hesham Mostafa, Enrico Calabrese, Antonio Rios-Navarro, Ricardo Tapiador-Morales, Iulia-Alexandra Lungu, Moritz B. Milde, et al. 2019. “NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps.” *IEEE Transactions on Neural Networks and Learning Systems* 30 (3) (March): 644–656. doi:10.1109/TNNLS.2018.2852335. http://dx.doi.org/10.1109/TNNLS.2018.2852335.
Gao, Chang, Daniel Neil, Enea Ceolini, Shih-Chii Liu, and Tobi Delbruck. 2018. “DeltaRNN: A Power-Efficient Recurrent Neural Network Accelerator.” In *Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays*, 21–30. FPGA ’18. New York, NY, USA: Association for Computing Machinery. doi:10.1145/3174243.3174261. https://doi.org/10.1145/3174243.3174261.
PhD theses
D. Neil, “Deep Neural Networks and Hardware Systems for Event-driven Data,” ETH Zurich, 2017. Available: https://www.research-collection.ethz.ch/handle/20.500.11850/168865. ETH Medal distinction award.
A. Aimar, “Energy-Efficient Convolutional Neural Network Accelerators for Edge Intelligence,” PhD, University of Zurich, 2021. Available: https://www.zora.uzh.ch/id/eprint/209482/
C. Gao, “Energy-Efficient Recurrent Neural Network Accelerators for Real-Time Inference,” University of Zurich, 2022. doi: 10.5167/uzh-219686. Available: https://www.zora.uzh.ch/id/eprint/219686/. [Accessed: Jan. 31, 2024]. UZH PhD thesis distinction award.
B. Rückauer, “Event-Based Vision Processing in Deep Neural Networks,” University of Zurich, 2020. doi: 10.5167/uzh-200987. Available: https://www.zora.uzh.ch/id/eprint/200987/. [Accessed: Jan. 31, 2024].
E. Calabrese, “Neuromorphic Solutions towards More Efficient Computer Vision,” PhD, University of Zurich, 2021. Available: https://drive.google.com/file/d/10u6mKMVs5kAKjFPEXF5gpQCsxaDWppDg/view?usp=sharing
Neuromorphic Processor Project
NPP Phase 2
Phase 2 of the Samsung Global Research Neuromorphic Processor Project (NPP) officially concluded in 2019, but we continue to pursue its aims.
NPP developed theory, architectures, and digital implementations targeting specific applications of deep neural network technology in vision and audition, aiming at real-time, low-power, brain-inspired solutions suitable for full-custom SoC integration. A particular aim of the NPP was to develop efficient data-driven deep neural network architectures that enable always-on operation on battery-powered mobile devices in conjunction with event-driven sensors.
The project team included leading academic partners in the USA, Canada, and Spain and was coordinated by the Inst. of Neuroinformatics, with Tobi Delbruck as overall PI.
The NPP Phase 2 partners included
Inst. of Neuroinformatics (INI), UZH-ETH Zurich (T. Delbruck, SC Liu, G Indiveri, M Pfeiffer)
Montreal Institute of Learning Algorithms (MILA) - Univ. of Montreal (Y Bengio)
Robotics and Technology of Computers Lab, Univ. of Seville (A. Linares-Barranco)
NPP Phase 1
In Phase 1 of NPP, we worked with Samsung and partners from Canada, the USA, and Spain to develop deep inference theory and processor architectures with state-of-the-art power efficiency. The Sensors Group obtained several key hardware accelerator results inspired by neuromorphic design principles. These results exploit sparsity of neural activation in space and time to reduce computation and, in particular, costly access to external memory, which consumes hundreds of times more energy than local memory access or arithmetic operations. In this way, these DNN accelerators behave like synchronous spiking neural networks.
Key results
NullHop uses spatial feature-map sparsity to provide flexible convolutional neural network (CNN) acceleration, exploiting the large number of zeros in feature maps that result from the widely used ReLU activation function. NullHop achieves state-of-the-art power efficiency of 3 TOp/s/W at a throughput of 500 GOp/s. See the IEEE TNNLS paper (IEEE link), the video of NullHop driving CNN inference in RoShamBo, and the video explaining the Rock-Scissors-Paper demo from Scientifica 2018.
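A rough NumPy sketch of the sparse feature-map representation (simplified for illustration; the on-chip encoding described in the TNNLS paper differs in detail): a one-bit zero/non-zero mask per entry plus the list of non-zero values, so a mostly-zero ReLU feature map needs only a fraction of the original storage and memory traffic.

```python
import numpy as np

def compress_feature_map(fm):
    """Sparsity-map compression of a feature map (simplified sketch).

    Returns a boolean mask (1 bit per entry in hardware) and the list
    of non-zero activation values; zeros are not stored at all.
    """
    mask = fm != 0
    values = fm[mask]
    return mask, values

def decompress_feature_map(mask, values):
    fm = np.zeros(mask.shape, dtype=values.dtype)
    fm[mask] = values
    return fm

# Example: a ReLU feature map with roughly 80% zeros.
rng = np.random.default_rng(1)
fm = np.maximum(rng.standard_normal((64, 32, 32)).astype(np.float32) - 0.8, 0.0)
mask, values = compress_feature_map(fm)
assert np.array_equal(decompress_feature_map(mask, values), fm)
print(f"stored non-zero fraction: {values.size / fm.size:.2f}")
```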
DeltaRNN uses temporal change sparsity for recurrent neural network (RNN) acceleration, exploiting the fact that most units in RNNs change slowly. DeltaRNN can accelerate gated recurrent unit (GRU) RNNs by a factor of 10 or more, even for single-sample inference on single streams. On a Xilinx Zynq FPGA, it achieves a state-of-the-art effective throughput of 1.2 TOp/s at a power efficiency of 164 GOp/s/W. See the ICML theory paper, the FPGA'18 paper, and the first DeltaRNNv1 demo video, in which DeltaRNN does real-time spoken digit recognition with speakers having a variety of accents.
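Below is a minimal sketch of the delta-update idea behind DeltaRNN, reduced to a single matrix-vector product rather than a full GRU (the class name and threshold handling are illustrative assumptions): only input components whose change since their last propagated value exceeds a threshold trigger weight-column reads and accumulation.

```python
import numpy as np

class DeltaMatVec:
    """Delta-network style matrix-vector product (simplified sketch).

    Maintains y = W @ x_hat, where x_hat is the last propagated input.
    On each step, only components whose change exceeds `theta` update
    the result, so slowly changing units cost no weight memory reads.
    """

    def __init__(self, W, theta):
        self.W = W
        self.theta = theta
        self.x_hat = np.zeros(W.shape[1], dtype=W.dtype)  # last propagated input
        self.y = np.zeros(W.shape[0], dtype=W.dtype)      # running result

    def step(self, x):
        delta = x - self.x_hat
        changed = np.flatnonzero(np.abs(delta) > self.theta)
        for j in changed:                      # skip slowly changing units
            self.y += self.W[:, j] * delta[j]
            self.x_hat[j] = x[j]
        return self.y                          # approximates W @ x

# With theta = 0 the result is exact; a larger theta trades accuracy for fewer updates.
rng = np.random.default_rng(2)
W = rng.standard_normal((128, 128)).astype(np.float32)
dm = DeltaMatVec(W, theta=0.0)
x = rng.standard_normal(128).astype(np.float32)
assert np.allclose(dm.step(x), W @ x, atol=1e-3)
```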
NPP Phase 1 partners
Inst. of Neuroinformatics (INI), UZH-ETH Zurich (T. Delbruck, SC Liu, G Indiveri, M Pfeiffer)
inilabs (F Corradi)
Robotics and Technology of Computers Lab, Univ. of Seville (A. Linares-Barranco)
Inst. of Microelectronics Seville (IMSE-CNM) - (B. Linares-Barranco)
Montreal Institute of Learning Algorithms (MILA) - Univ. of Montreal (Y Bengio)
Other key results
Spiking neural networks (SNNs) can achieve accuracy equivalent to conventional analog neural networks even for very deep CNNs such as VGG-16 and GoogLeNet, but they are very inefficient at coding precise analog values (rate-coding an activation to 8-bit precision requires on the order of 256 spikes, each with its own weight fetches and state updates), and their unpredictable memory access is a very poor match to economical DRAM.
Both CNNs and RNNs can be trained for greatly reduced weight and state precision, resulting in huge savings in memory bandwidth.
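As an illustration of the reduced-precision point, here is a generic symmetric linear weight quantizer in NumPy (a sketch for illustration only, not the specific training methods used in the papers listed below): 8-bit weights cut weight memory traffic by 4x relative to float32 at a small accuracy cost.

```python
import numpy as np

def quantize_symmetric(w, bits=8):
    """Quantize a weight tensor to signed `bits`-bit integers (simplified).

    Uses a single per-tensor scale; valid for bits <= 8 as written.
    k-bit weights need k/32 of the memory bandwidth of float32 weights.
    """
    qmax = 2 ** (bits - 1) - 1                    # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(3)
w = 0.1 * rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_symmetric(w, bits=8)
err = np.max(np.abs(dequantize(q, scale) - w))
print(f"max quantization error: {err:.5f} (weights span ±{np.abs(w).max():.3f})")
```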
Key publications from NPP phase 1
- Aimar, A., Mostafa, H., Calabrese, E., Rios-Navarro, A., Tapiador-Morales, R., Lungu, I., et al. (2018). NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps. IEEE Transactions on Neural Networks and Learning Systems, 1–13. doi:10.1109/TNNLS.2018.2852335.
- Gao, C., Neil, D., Ceolini, E., Liu, S.-C., and Delbruck, T. (2018). DeltaRNN: A Power-efficient Recurrent Neural Network Accelerator. in Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays FPGA ’18. (New York, NY, USA: ACM), 21–30. doi:10.1145/3174243.3174261
- J. Binas, D. Neil, S.-C. Liu, and T. Delbruck, “DDD17: End-To-End DAVIS Driving Dataset,” in ICML’17 Workshop on Machine Learning for Autonomous Vehicles (MLAV 2017), Sydney, Australia, 2017 [Online]. Available: https://openreview.net/forum?id=HkehpKVG-&noteId=HkehpKVG-
- D. Neil, J. H. Lee, T. Delbruck, and S.-C. Liu, “Delta Networks for Optimized Recurrent Network Computation,” in PMLR, 2017, pp. 2584–2593 [Online]. Available: http://proceedings.mlr.press/v70/neil17a.html. [Accessed: 14-Sep-2017]
- A. Aimar, H. Mostafa, E. Calabrese, A. Rios-Navarro, R. Tapiador-Morales, I.-A. Lungu, M. B. Milde, F. Corradi, A. Linares-Barranco, S.-C. Liu, and T. Delbruck, “NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps,” arXiv:1706.01406 [cs], Jun. 2017 [Online]. Available: http://arxiv.org/abs/1706.01406. [Accessed: 03-Sep-2017]
- D. P. Moeys et al., “A Sensitive Dynamic and Active Pixel Vision Sensor for Color or Neural Imaging Applications,” IEEE Transactions on Biomedical Circuits and Systems, submitted 2017.
- I.-A. Lungu, F. Corradi, and T. Delbruck, “Live Demonstration: Convolutional Neural Network Driven by Dynamic Vision Sensor Playing RoShamBo,” in 2017 IEEE International Symposium on Circuits and Systems (ISCAS 2017), Baltimore, MD, USA, 2017 [Online]. Available: https://drive.google.com/file/d/0BzvXOhBHjRheYjNWZGYtNFpVRkU/view?usp=sharing
- D. Neil, J. H. Lee, T. Delbruck, and S.-C. Liu, “Delta Networks for Optimized Recurrent Network Computation,” arXiv:1612.05571 [cs], Dec. 2016 [Online]. Available: http://arxiv.org/abs/1612.05571. [Accessed: 19-Dec-2016]
- B. Rueckauer, I.-A. Lungu, Y. Hu, and M. Pfeiffer, “Theory and Tools for the Conversion of Analog to Spiking Convolutional Neural Networks,” arXiv:1612.04052 [cs, stat], Dec. 2016 [Online]. Available: http://arxiv.org/abs/1612.04052. [Accessed: 16-May-2017]
- J. H. Lee, T. Delbruck, and M. Pfeiffer, “Training Deep Spiking Neural Networks using Backpropagation,” arXiv:1608.08782 [cs], Aug. 2016 [Online]. Available: http://arxiv.org/abs/1608.08782. [Accessed: 03-Sep-2017]
- J. Ott, Z. Lin, Y. Zhang, S.-C. Liu, and Y. Bengio, “Recurrent Neural Networks With Limited Numerical Precision,” arXiv:1608.06902 [cs], Aug. 2016 [Online]. Available: http://arxiv.org/abs/1608.06902. [Accessed: 25-Aug-2016]
- J. Binas, D. Neil, G. Indiveri, S.-C. Liu, and M. Pfeiffer, “Precise deep neural network computation on imprecise low-power analog hardware,” arXiv:1606.07786 [cs], Jun. 2016 [Online]. Available: http://arxiv.org/abs/1606.07786. [Accessed: 23-Aug-2016]
- S. Braun, D. Neil, and S.-C. Liu, “A Curriculum Learning Method for Improved Noise Robustness in Automatic Speech Recognition,” arXiv:1606.06864 [cs], Jun. 2016 [Online]. Available: http://arxiv.org/abs/1606.06864. [Accessed: 16-May-2017]
- D. P. Moeys, F. Corradi, E. Kerr, P. Vance, G. Das, D. Neil, D. Kerr, and T. Delbrück, “Steering a predator robot using a mixed frame/event-driven convolutional neural network,” in 2016 Second International Conference on Event-based Control, Communication, and Signal Processing (EBCCSP), 2016, pp. 1–8.
- D. Neil and S. C. Liu, “Effective sensor fusion with event-based sensors and deep network architectures,” in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), 2016, pp. 2282–2285.
- D. Neil, M. Pfeiffer, and S.-C. Liu, “Learning to Be Efficient: Algorithms for Training Low-latency, Low-compute Deep Spiking Neural Networks,” in Proceedings of the 31st Annual ACM Symposium on Applied Computing, New York, NY, USA, 2016, pp. 293–298 [Online]. Available: http://doi.acm.org/10.1145/2851613.2851724. [Accessed: 23-Aug-2016]
- T. Delbruck, “Neuromorphic Vision Sensing and Processing (Invited paper),” in 2016 European Solid-State Device Research Conf. & European Solid-State Circuits Conf. Proceedings, Lausanne, Switzerland, 2016.
- A. Aimar, E. Calabrese, H. Mostafa, A. Rios-Navarro, R. Tapiador, I.-A. Lungu, A. Jimenez-Fernandez, F. Corradi, S.-C. Liu, A. Linares-Barranco, and T. Delbruck, “Nullhop: Flexibly efficient FPGA CNN accelerator driven by DAVIS neuromorphic vision sensor,” in NIPS 2016, Barcelona, 2016 [Online]. Available: https://nips.cc/Conferences/2016/Schedule?showEvent=6317
- E. Stromatias, D. Neil, M. Pfeiffer, F. Galluppi, S. B. Furber, and S.-C. Liu, “Robustness of spiking Deep Belief Networks to noise and reduced bit precision of neuro-inspired hardware platforms,” Front Neurosci, vol. 9, Jul. 2015 [Online]. Available: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4496577/. [Accessed: 23-Aug-2016]