Design for Multi-Precision Reconfigurable Tensor Computing Unit

HU Xianghong, YIN Feiyue, LIANG Kelong, FENG Zhaozhang, LIN Yuanmiao, CAI Shuting, XIONG Xiaoming

Integrated Circuits and Embedded Systems, DOI: 10.20193/j.ices2097-4191.2025.0109


Abstract

With the rapid development of artificial intelligence and deep learning applications, tensor computing increasingly demands hardware accelerators that are both energy-efficient and able to operate at multiple precisions. Traditional general-purpose processors hit energy-efficiency bottlenecks on large-scale matrix multiplication, while existing dedicated accelerators often lack the flexibility to support diverse data precisions and mixed-precision computing modes. This paper presents a multi-precision and mixed-precision tensor processing unit (TPU) built on a reconfigurable architecture. It supports five data formats (INT4, INT8, FP16, BF16, FP32) and two mixed-precision modes (FP16+FP32, BF16+FP32), and it efficiently performs matrix multiply-accumulate operations in three tile shapes (m16n16k16, m32n8k16, m8n32k16). By combining a reconfigurable computing array, dynamic dataflow control, multi-mode buffers, and a unified floating-point processing unit, the design achieves a high degree of hardware reuse and significantly improved computational efficiency. Synthesized for the VCU118 FPGA platform at 251.13 MHz, it reaches a theoretical peak performance of 257.16 GOPS/GFLOPS for INT4, INT8, FP16, and BF16, and 64.29 GFLOPS for FP32. The design is well suited to applications such as deep learning inference, autonomous driving, and medical imaging, where both computational efficiency and flexibility are critical.
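The throughput figures and tile shapes in the abstract can be cross-checked with a small software model. The sketch below is illustrative only and is not the authors' code. It assumes that one multiply-accumulate (MAC) counts as two operations, that the FP16+FP32 mode means FP16 multiplicands with FP32 accumulation, and that accumulation proceeds sequentially along k; all helper names (`macs_per_tile`, `ops_per_cycle`, `cycles_per_tile`, `mma_fp16_fp32`) are hypothetical. Independently of those assumptions, each of the three tile shapes needs 16·16·16 = 32·8·16 = 8·32·16 = 4,096 MACs, and dividing the reported peaks by 251.13 MHz gives about 1,024 operations per cycle (INT4/INT8/FP16/BF16) and 256 per cycle (FP32).

```python
import struct

# Tile shapes (M, N, K) from the abstract: D[MxN] = A[MxK] x B[KxN] + C[MxN].
SHAPES = {"m16n16k16": (16, 16, 16), "m32n8k16": (32, 8, 16), "m8n32k16": (8, 32, 16)}

CLOCK_HZ = 251.13e6  # reported clock frequency
PEAK_OPS = {"INT4/INT8/FP16/BF16": 257.16e9, "FP32": 64.29e9}  # reported peaks


def macs_per_tile(m, n, k):
    """Multiply-accumulates needed for one m x n x k tile."""
    return m * n * k


def ops_per_cycle(peak_ops, clock_hz=CLOCK_HZ):
    """Implied operations per cycle, assuming peak = ops_per_cycle * clock."""
    return round(peak_ops / clock_hz)


def cycles_per_tile(shape, ops_per_clk):
    """Ideal cycles per tile, assuming 2 ops per MAC and full array utilization."""
    m, n, k = SHAPES[shape]
    return 2 * macs_per_tile(m, n, k) // ops_per_clk


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mma_fp16_fp32(a, b, c):
    """Reference D = A x B + C with FP16 inputs and FP32 accumulation (assumed semantics)."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    d = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for p in range(k):
                # The product of two FP16 values fits exactly in FP32 (at most 22 significand bits).
                prod = to_fp16(a[i][p]) * to_fp16(b[p][j])
                # Round the running sum to FP32 after every addition, in k order.
                acc = to_fp32(acc + prod)
            d[i][j] = acc
    return d


if __name__ == "__main__":
    for name, (m, n, k) in SHAPES.items():
        print(name, "MACs per tile:", macs_per_tile(m, n, k))
    for mode, peak in PEAK_OPS.items():
        opc = ops_per_cycle(peak)
        print(mode, "ops/cycle:", opc, "ideal cycles per m16n16k16 tile:", cycles_per_tile("m16n16k16", opc))
```

Under these assumptions, one tile takes about 8 cycles in the INT4/INT8/FP16/BF16 modes and about 32 cycles in FP32 mode. A golden model like `mma_fp16_fp32` could be used to check simulated outputs of a mixed-precision mode. However, a real datapath may reorder or fuse its additions, so bit-exact agreement with this sketch is not guaranteed.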

Keywords

Tensor Processing Unit / Multi-Precision Computation / Reconfigurable Architecture / Matrix Multiplication / Hardware Reutilization

Cite this article

HU Xianghong, YIN Feiyue, LIANG Kelong, FENG Zhaozhang, LIN Yuanmiao, CAI Shuting, XIONG Xiaoming. Design for Multi-Precision Reconfigurable Tensor Computing Unit[J]. Integrated Circuits and Embedded Systems. https://doi.org/10.20193/j.ices2097-4191.2025.0109

Funding

Science and Technology Planning Project of Guangzhou (2023B01J0007); National Natural Science Foundation of China (62301165)
