Targeting the characteristics of the LUNA architecture, an efficient C-like dataflow compilation framework, NLANG, is designed. A static-graph programming model based on C-like primitives is used to describe LUNA's computation logic, and a three-layer (outer-inner-lower) primitive framework is proposed. The key techniques include analyzing the characteristics of the current computation, summarizing the corresponding computation patterns, and automatically generating the matching hardware-connection configuration according to those patterns. Performance evaluation shows that the efficiency of the assembly code generated by the NLANG compiler reaches more than 90% of hand-written assembly.
Key words
reconfigurable /
hardware connection /
compilation optimization /
AI chip