Abstract
The excellent performance of convolutional neural networks has earned them an important position in image processing. In practice, however, these models mostly rely on GPUs and are difficult to deploy on power-sensitive embedded devices. To enable efficient deployment on FPGA-based platforms, this paper proposes a fixed-point quantization method for convolutional neural networks that takes data accuracy and resource consumption as its design criteria. Based on statistics of the data distribution in the model and a partitioning of the data types, different fixed-point strategies are determined, and the relationships among quantization methods, overflow modes, and hardware resource consumption are given. Tests with the Xilinx fixed-point library show that uniformly quantizing the model with 16-bit fixed-point numbers reduces hardware resource consumption at a small loss of accuracy; hardware resource consumption is the same across different quantization modes, but differs considerably across different overflow modes.
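To illustrate how the quantization and overflow modes discussed above interact, here is a minimal Python sketch (not code from the paper) of converting floats to a 16-bit signed fixed-point format. The function name and the Q8.8 format split are assumptions for illustration; the "saturate" and "wrap" behaviors loosely mirror the AP_SAT and AP_WRAP overflow modes of Xilinx's `ap_fixed` types.

```python
import numpy as np

def to_fixed(x, total_bits=16, frac_bits=8, overflow="saturate"):
    """Quantize floats to signed fixed-point (Q8.8 within 16 bits here).

    overflow="saturate" clamps out-of-range values to the representable
    range; overflow="wrap" keeps only the low total_bits bits, i.e.
    two's-complement wrap-around.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))        # smallest representable integer code
    hi = (1 << (total_bits - 1)) - 1     # largest representable integer code
    q = np.round(np.asarray(x, dtype=np.float64) * scale).astype(np.int64)
    if overflow == "saturate":
        q = np.clip(q, lo, hi)
    else:
        mask = (1 << total_bits) - 1     # keep low bits, reinterpret as signed
        q = q & mask
        q = np.where(q > hi, q - (1 << total_bits), q)
    return q / scale                     # dequantize back to float for inspection

# A value outside the Q8.8 range behaves very differently per overflow mode:
print(to_fixed([0.5, 200.0], overflow="saturate"))  # 200.0 clamps to 127.99609375
print(to_fixed([0.5, 200.0], overflow="wrap"))      # 200.0 wraps to -56.0
```

The example shows why the paper treats overflow mode as a design decision: saturation bounds the error of an overflowing value, while wrap-around can flip its sign entirely, which also explains why the two modes map to different amounts of comparison/clamping hardware.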
Key words
convolutional neural network /
fixed-point /
FPGA
Funding
*Key science and technology project of China Southern Power Grid, "Research and Integrated Demonstration of Major Key Technologies for Smart Grid," Topic 6: Research on panoramic digital twins of substations and state-sensing technology for high-voltage equipment (GZHKJXM20200003).