Abstract
Targeting the characteristics of the LUNA architecture, an efficient C-like dataflow compilation framework, NLANG, is designed. A static-graph programming model of C plus primitives is used to describe LUNA's computation logic, and a three-layer framework of outer, inner, and low-level primitives is proposed to transform the static graph efficiently. The key techniques include analyzing the characteristics of the current computation, summarizing the corresponding computation pattern, and automatically generating a matching hardware connection configuration according to that pattern. Performance evaluation shows that the assembly code generated by the NLANG compiler reaches more than 90% of the efficiency of hand-written assembly.
Key words
reconfigurable /
hardware connection /
compilation optimization /
AI chip