Fixed-point Design of Convolutional Neural Network and FPGA Implementation for Substation Video Surveillance*

Chen Chang, Huang Juncai, Liu Jiandong, Yuan Jing

Integrated Circuits and Embedded Systems ›› 2022, Vol. 22 ›› Issue (2): 41-45.

技术纵横 (Technology Column)


Abstract

The excellent performance of convolutional neural networks gives them an important position in the field of image processing. However, practical applications of these models mostly rely on GPUs, which makes them difficult to deploy on power-sensitive embedded devices. To enable efficient deployment on FPGA-based platforms, this paper proposes a fixed-point quantization method for convolutional neural networks that takes data accuracy and resource consumption as its design criteria. Based on statistics of the data distribution within the model and a classification of its data types, different fixed-point strategies are determined, and the relationship among quantization methods, overflow modes, and hardware resource consumption is given. Tests with the Xilinx fixed-point library show that uniformly quantizing the model with 16-bit fixed-point numbers reduces hardware resource consumption at a small loss of accuracy; hardware resource consumption is identical across quantization modes but differs considerably across overflow modes.
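The quantization and overflow modes compared in the abstract correspond to the third and fourth template parameters of the `ap_fixed` type in the Xilinx fixed-point library (ap_fixed.h, shipped with Vivado HLS). Below is a minimal sketch of how such 16-bit types might be declared; the 6-bit-integer/10-bit-fraction split and the `mac()` helper are illustrative assumptions, not the paper's exact configuration.

```cpp
// A minimal sketch, assuming the Xilinx fixed-point library shipped with
// Vivado HLS (ap_fixed.h). The 6-integer/10-fraction bit split and the
// mac() helper are illustrative assumptions, not the paper's exact settings.
#include <ap_fixed.h>

// ap_fixed<W, I, Q, O>: W total bits, I integer bits (sign included),
// Q the quantization (rounding) mode, O the overflow mode.
typedef ap_fixed<16, 6>                  fix16_t;      // defaults: AP_TRN, AP_WRAP
typedef ap_fixed<16, 6, AP_RND, AP_WRAP> fix16_rnd_t;  // round instead of truncate
typedef ap_fixed<16, 6, AP_TRN, AP_SAT>  fix16_sat_t;  // saturate instead of wrap

// One convolution multiply-accumulate step: the widened intermediate type
// holds the full product, and the final cast back to fix16_t is where the
// chosen quantization and overflow modes take effect.
fix16_t mac(fix16_t acc, fix16_t w, fix16_t x) {
    ap_fixed<32, 12> tmp = (ap_fixed<32, 12>)acc + (ap_fixed<32, 12>)w * x;
    return (fix16_t)tmp;
}
```

Swapping AP_TRN for AP_RND changes only the rounding logic in that final cast, while AP_SAT adds comparators to clamp out-of-range values, which lines up with the abstract's observation that quantization modes consume the same resources while overflow modes differ markedly.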


Key words

convolutional neural network / fixed-point / FPGA

Cite this article

Chen Chang, Huang Juncai, Liu Jiandong, Yuan Jing. Fixed-point Design of Convolutional Neural Network and FPGA Implementation for Substation Video Surveillance[J]. Integrated Circuits and Embedded Systems, 2022, 22(2): 41-45.
CLC number: TP391


Funding

*Key science and technology project of China Southern Power Grid: Research and Integrated Demonstration of Major Key Technologies for Smart Grid, Topic 6: Research on Substation Panoramic Twin and High-Voltage Equipment Condition Sensing Technology (GZHKJXM20200003).
