Abstract
As Field Programmable Gate Arrays (FPGAs) are widely deployed in high-performance computing, artificial intelligence inference, and 5G communications, circuit designs keep growing in scale and timing constraints in complexity, placing higher demands on the runtime efficiency of Static Timing Analysis (STA). Existing FPGA STA tools predominantly rely on single-core or multi-core Central Processing Unit (CPU) architectures. Despite continuous algorithmic optimization, they still face computational bottlenecks and inefficient memory access when handling large-scale FPGA designs. In recent years, Graphics Processing Units (GPUs), with their massively parallel computing capabilities, have offered new opportunities for improving FPGA STA performance. However, three challenges limit the effectiveness of current GPU-accelerated methods in FPGA STA scenarios: memory access patterns under CPU-GPU heterogeneous architectures, timing graph loop detection, and heterogeneous parallel acceleration strategies. To address these issues, we propose an FPGA STA algorithm accelerated by an efficient heterogeneous parallel strategy. First, to address the discontinuous memory access and field interleaving of traditional object-oriented data structures under CPU-GPU heterogeneous architectures, we present a structure-of-arrays (SoA) data layout strategy. Combined with data reordering, this layout reduces memory access latency and improves bandwidth utilization, providing a data foundation for a high-performance FPGA STA engine. Second, to overcome the low efficiency and poor robustness of timing graph loop detection, we design a parallel loop detection algorithm based on color propagation, which accelerates the preprocessing stage of FPGA STA. Third, we propose a task decomposition and timing graph traversal method tailored to CPU-GPU heterogeneous architectures, which accelerates core STA operations such as delay calculation, levelization, and graph propagation. Finally, experimental results on OpenCores and industrial-grade FPGA benchmarks show that, compared with traditional CPU implementations, the proposed method achieves a runtime speedup of 3.125× to 33.333×, with overall performance surpassing that of the OpenTimer tool. This work provides a practical approach to efficient timing verification for large-scale FPGA designs.
Key words
field programmable gate array /
static timing analysis /
heterogeneous computing /
parallel acceleration /
electronic design automation
Cite this article
FPGA Static Timing Analysis Algorithm Accelerated by High-Efficiency Heterogeneous Parallelization Strategy[J]. Integrated Circuits and Embedded Systems. 0 https://doi.org/10.20193/j.ices2097-4191.2025.0138