Abstract
Targeting the characteristics of the LUNA architecture, an efficient C-like dataflow compilation framework, NLANG, is designed. A static-graph programming model of C plus primitives is used to describe LUNA's computation logic, and a three-layer framework of outer, inner, and low-level primitives is proposed to transform the static graph efficiently. The key techniques include analyzing the characteristics of the current computation, summarizing the corresponding computation pattern, and automatically generating a matching hardware connection configuration according to that pattern. Performance evaluation shows that the assembly code generated by the NLANG compiler reaches more than 90% of the efficiency of hand-written assembly.
Key words
reconfigurable /
hardware connection /
compilation optimization /
AI chip