Abstract
With the wide application of Field Programmable Gate Arrays (FPGAs) in high-performance computing, artificial intelligence inference, and 5G communications, circuit design scale and timing constraint complexity continue to grow, placing higher demands on the runtime efficiency of Static Timing Analysis (STA). Existing FPGA STA tools mostly rely on single-core or multi-core Central Processing Unit (CPU) architectures. Despite continuous algorithmic optimization, they still face computational bottlenecks and inefficient memory access when handling large-scale FPGA designs. In recent years, Graphics Processing Units (GPUs), with their massively parallel computing capability, have offered new opportunities for improving FPGA STA performance. However, open problems in memory access patterns on existing GPU architectures, in the optimization of timing graph loop detection, and in heterogeneous parallel acceleration strategies limit the effectiveness of GPU-accelerated methods in FPGA STA scenarios. To address these problems, this paper proposes an FPGA STA algorithm accelerated by an efficient heterogeneous parallel strategy. First, because traditional object-oriented data structures suffer from non-contiguous memory access and field interleaving on CPU-GPU heterogeneous architectures, which lowers bandwidth utilization, a structure-of-arrays (SoA) data layout strategy is proposed. Combined with optimizations such as data reordering, it effectively reduces memory access latency and improves bandwidth utilization, providing the data foundation for a high-performance FPGA STA engine. Second, to address the low efficiency and poor robustness of timing graph loop detection, a parallel loop detection algorithm based on color propagation is designed, which efficiently accelerates the preprocessing stage of FPGA STA. Furthermore, a design method for task decomposition and timing graph traversal tailored to CPU-GPU heterogeneous parallel architectures is proposed, which efficiently accelerates core STA operations such as delay calculation, levelization, and graph propagation. Finally, experimental results on OpenCores and industrial-grade FPGA benchmarks show that, compared with a traditional CPU implementation, the proposed method achieves a runtime speedup of 3.125× to 33.333×, with overall performance exceeding that of the OpenTimer tool. This work provides a feasible path and a practical reference for efficient timing verification of large-scale FPGA designs.
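To make the terms "SoA layout", "levelization", and "graph propagation" concrete, the following is a minimal CPU-side sketch in Python. It is not the algorithm proposed in the paper: the function names (`levelize`, `propagate_arrival`), the flat edge arrays, and the use of Kahn-style in-degree peeling are all illustrative assumptions. In particular, the loop check shown here is a plain sequential one, not the color-propagation method described above.

```python
def levelize(num_nodes, edge_src, edge_dst):
    """Kahn-style levelization over flat (SoA) edge arrays.

    Hypothetical helper, not from the paper. Returns (levels, unresolved):
    each entry of `levels` is a list of node ids with no dependencies on
    each other; `unresolved` lists nodes that never reached in-degree 0,
    i.e. nodes on or downstream of a loop.
    """
    indegree = [0] * num_nodes
    fanout = [[] for _ in range(num_nodes)]
    for e in range(len(edge_src)):
        indegree[edge_dst[e]] += 1
        fanout[edge_src[e]].append(e)

    frontier = [n for n in range(num_nodes) if indegree[n] == 0]
    levels = []
    while frontier:
        levels.append(frontier)
        next_frontier = []
        for n in frontier:
            for e in fanout[n]:
                d = edge_dst[e]
                indegree[d] -= 1
                if indegree[d] == 0:
                    next_frontier.append(d)
        frontier = next_frontier

    unresolved = [n for n in range(num_nodes) if indegree[n] > 0]
    return levels, unresolved


def propagate_arrival(num_nodes, edge_src, edge_dst, edge_delay, levels):
    """Forward (latest) arrival-time propagation, one level at a time.

    Nodes inside a level are independent, so on a GPU each could be
    handled by its own thread; here they are simply visited in order.
    """
    arrival = [0.0] * num_nodes
    fanin = [[] for _ in range(num_nodes)]
    for e in range(len(edge_dst)):
        fanin[edge_dst[e]].append(e)

    for level in levels:
        for n in level:
            for e in fanin[n]:
                candidate = arrival[edge_src[e]] + edge_delay[e]
                if candidate > arrival[n]:
                    arrival[n] = candidate
    return arrival


# Toy timing graph stored as separate arrays per field (SoA),
# rather than as a list of edge objects.
edge_src = [0, 0, 1, 2]
edge_dst = [1, 2, 3, 3]
edge_delay = [1.0, 2.0, 3.0, 1.5]

levels, unresolved = levelize(4, edge_src, edge_dst)
arrival = propagate_arrival(4, edge_src, edge_dst, edge_delay, levels)
```

Keeping `edge_src`, `edge_dst`, and `edge_delay` as separate contiguous arrays is what lets neighbouring GPU threads read neighbouring memory locations; an array of edge objects would interleave the fields instead. If a feedback edge such as 3 → 1 is added to the toy graph, nodes 1 and 3 end up in `unresolved`, which is the signal that a loop must be broken before propagation.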
Key words
field programmable gate array /
static timing analysis /
heterogeneous computing /
parallel acceleration /
electronic design automation
田春生, 赵翔宇, 王硕, 王卓立, 曹永铮, 周婧, 张瑶伟, 陈雷.
FPGA Static Timing Analysis Algorithm Accelerated by High-Efficiency Heterogeneous Parallelization Strategy[J]. Integrated Circuits and Embedded Systems. 0 https://doi.org/10.20193/j.ices2097-4191.2025.0138
Funding
National Natural Science Foundation of China, General Program (62374138)