Special Issue on FPGA Cutting-Edge Technologies and Applied Research
HUANG Sixiao, PENG Haoxiang, SHI Xu, SU Zhifeng, HUANG Mingqiang, YU Hao
In recent years, with the widespread deployment of large models (such as GPT, LLaMA, and DeepSeek), the compute demands and energy-efficiency challenges of the inference stage have become increasingly prominent. Although traditional GPU solutions provide high throughput, they face challenges in power consumption, real-time performance, and cost. With their customizable architectures, deterministic low latency, and high energy efficiency, FPGAs have become an important alternative platform for deploying large-model inference. This paper systematically reviews the network structures of large models and the techniques for implementing their inference on FPGAs, covering three major directions: hardware architecture adaptation, algorithm-hardware co-optimization, and system-level challenges. At the hardware level, the focus is on compute-unit design and memory-hierarchy optimization strategies; at the algorithm level, key technologies such as model compression, dynamic quantization, and compiler optimization are analyzed. At the system level, challenges such as multi-FPGA scaling, thermal management, and emerging compute-in-memory architectures are discussed. In addition, this paper summarizes the limitations of the current FPGA inference ecosystem (such as insufficient toolchain maturity) and outlines future trends, including heterogeneous chiplet integration, the fusion of photonic computing, and the establishment of a standardized evaluation system. The findings show that the architectural flexibility of FPGAs gives them a unique advantage in efficient large-model inference, although interdisciplinary collaboration is still needed to bring the technology to practical deployment.