AI acceleration software and hardware design based on domestic intelligent reconfigurable platform

GUO Tao, ZHOU Haiyang, YU Yuxin, FAN Xiaochang, WANG Shuo, ZHANG Yanlong, CHEN Lei

Integrated Circuits and Embedded Systems, 2025, Vol. 25, Issue (12): 8-17. DOI: 10.20193/j.ices2097-4191.2025.0057
Special Column on Software-Hardware Co-Design and Applications of Intelligent Embedded Systems


Abstract

In response to the demand for intelligence in equipment electronic systems, this paper designs a neural network accelerator soft core and companion quantization compilation software based on the programmable logic of the "Hongxin" intelligent reconfigurable platform, realizing unified quantization compilation of neural network models and their accelerated execution on the self-developed accelerator soft core. The "Hongtu" embedded real-time operating system is also extended to support hardware-accelerated execution of neural networks. Experimental results show that the performance of the neural network accelerator soft core is comparable to that of the AMD Xilinx DPU soft core, and that running ResNet18 and ResNet50 under the "Hongtu" embedded real-time operating system delivers four times the performance of the AMD Xilinx PetaLinux environment, enhancing the artificial intelligence capability of the "Hongxin" intelligent reconfigurable platform.
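The abstract does not specify the quantization scheme used by the compilation software. As a rough, hypothetical illustration of the core transformation such a tool performs before generating accelerator instructions, the sketch below shows per-tensor symmetric int8 post-training quantization of a layer's weights in Python; the function names and the random example tensor are purely illustrative and do not reflect the paper's actual toolchain.

    # Minimal sketch of per-tensor symmetric int8 post-training quantization.
    # This is an assumed, generic scheme for illustration only, not the paper's
    # actual quantization compilation software.
    import numpy as np

    def quantize_int8(tensor):
        """Map a float32 tensor to int8 values plus a per-tensor scale factor."""
        max_abs = float(np.abs(tensor).max()) if tensor.size else 0.0
        scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
        q = np.clip(np.round(tensor / scale), -128, 127).astype(np.int8)
        return q, scale

    def dequantize_int8(q, scale):
        """Recover an approximate float32 tensor, e.g. for accuracy checks."""
        return q.astype(np.float32) * scale

    if __name__ == "__main__":
        weights = np.random.randn(64, 3, 3, 3).astype(np.float32)  # hypothetical conv-layer weights
        q, scale = quantize_int8(weights)
        error = float(np.abs(weights - dequantize_int8(q, scale)).max())
        print(f"scale={scale:.6f}, max abs quantization error={error:.6f}")

An accelerator-specific compiler would additionally fold the resulting scales into layer parameters and emit an instruction stream for the soft core; those steps are outside the scope of this sketch.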

Key words

intelligent reconfigurable platform / neural network accelerator / neural network quantization compilation software / embedded real-time operating system

Cite this article

GUO Tao, ZHOU Haiyang, YU Yuxin, et al. AI acceleration software and hardware design based on domestic intelligent reconfigurable platform[J]. Integrated Circuits and Embedded Systems, 2025, 25(12): 8-17. https://doi.org/10.20193/j.ices2097-4191.2025.0057
CLC number: TP319 (special-purpose application software)

Responsible editor: XUE Shiran