基于高效异构并行策略加速的FPGA静态时序分析算法

田春生, 赵翔宇, 王硕, 王卓立, 曹永铮, 周婧, 张瑶伟, 陈雷

集成电路与嵌入式系统 ›› 2026, Vol. 26 ›› Issue (4) : 14-25.

PDF(4051 KB)
PDF(4051 KB)
集成电路与嵌入式系统 ›› 2026, Vol. 26 ›› Issue (4) : 14-25. DOI: 10.20193/j.ices2097-4191.2025.0138
集成电路设计自动化(EDA)与高可靠性设计研究专栏

基于高效异构并行策略加速的FPGA静态时序分析算法

作者信息 +

FPGA static timing analysis algorithm accelerated by high-efficiency heterogeneous parallelization strategy

Author information +
文章历史 +

摘要

随着现场可编程门阵列(Field Programmable Gate Array, FPGA)在高性能计算、人工智能推理以及5G通信等领域的广泛运用,其电路设计规模与时序约束复杂度持续攀升,对静态时序分析(Static Timing Analysis, STA)的运行效率提出了更高的要求。现有FPGA STA工具多依赖于单核或多核中央处理器(Central Processing Unit, CPU)架构,虽在算法层面不断优化,但在处理大规模FPGA设计时仍面临计算瓶颈与内存访问效率不足等问题。近年来,图形处理器(Graphics Processing Unit, GPU)凭借大规模并行计算能力,为提升FPGA STA性能提供了新的机遇。然而,现有GPU架构下的内存访问模式、时序图环路检测优化与异构并行加速计算策略等问题,制约了GPU加速方法在FPGA STA场景中的应用效果。针对上述问题,文中提出一种基于高效异构并行策略加速的FPGA STA算法。首先,针对传统面向对象数据结构在CPU-GPU异构架构下存在的内存访问不连续及字段交错导致带宽利用率低等问题,提出了基于数组结构体的数据结构布局策略,并结合数据重排等优化操作,有效降低了访存延迟并提升了带宽利用率,为高性能FPGA STA计算引擎提供数据基座。其次,针对时序图环路检测效率不足及鲁棒性欠佳的现状,设计了一种基于颜色传播的并行环路检测优化算法,实现了FPGA STA前处理阶段的高效加速;进一步地,提出了面向CPU-GPU异构并行架构的任务分解与时序图遍历过程的设计方法,实现了对延迟计算、层次化处理及图传播等STA 核心操作的高效加速。最后,在OpenCores与工业级FPGA测试集上的实验结果表明,相比传统CPU实现,文中方法可实现3.125~33.333倍的运行时间加速比,且整体性能优于OpenTimer工具,上述研究为大规模FPGA设计中的高效时序验证提供了可行路径与实践参考。

Abstract

The widespread integration of Field Programmable Gate Arrays (FPGAs) in high-performance computing, AI inference, and 5G communications has led to an unprecedented escalation in design scale and timing constraint complexity. These trends impose stringent demands on the runtime efficiency of Static Timing Analysis (STA). Current FPGA STA tools, primarily anchored in single-core or multi-core CPU architectures, are increasingly hitting a performance wall, despite persistent algorithmic refinements, they struggle with computational bottlenecks and suboptimal memory throughput when confronted with large-scale designs. In recent years, Graphics Processing Units (GPUs) with their massive parallel computing capabilities have provided new opportunities for improving FPGA STA performance. However, challenges in memory access patterns under heterogeneous GPU architectures, the optimization for timing graph loop detection, and heterogeneous parallel acceleration strategies continue to hinder the effectiveness of current GPU-accelerated methods in FPGA STA scenarios. To address these issues, we propose an FPGA STA algorithm accelerated by an efficient heterogeneous parallel strategy. First, targeting the problem of discontinuous memory access and field interleaving in traditional object-oriented data structures under CPU-GPU heterogeneous architectures, a structure-of-arrays (SoA)-based data layout strategy is presented. Combined with data reordering operations, this approach effectively reduces memory access latency and improves bandwidth utilization, providing a data foundation for high-performance FPGA STA computational engines. Second, to overcome the limitations of low efficiency and poor robustness in timing graph loop detection, a parallel loop detection optimization algorithm based on color propagation is designed, enabling efficient acceleration in the preprocessing stage of FPGA STA. Furthermore, a task decomposition and timing graph traversal method tailored for CPU-GPU heterogeneous architectures is proposed, achieving efficient acceleration of core STA operations such as delay calculation, levelization, and graph propagation. Finally, experimental results on both the OpenCores and industrial-grade FPGA benchmarks demonstrate that, compared with traditional CPU implementations, the proposed method achieves a runtime speedup of 3.125× to 33.333×, with overall performance surpassing that of the OpenTimer tool. This research provides a practical and feasible approach for efficient timing verification in large-scale FPGA designs.

关键词

现场可编程门阵列 / 静态时序分析 / 异构计算 / 并行加速 / 电子设计自动化

Key words

FPGA / static timing analysis / heterogeneous computing / parallel acceleration / electronic design automation

引用本文

导出引用
田春生, 赵翔宇, 王硕, . 基于高效异构并行策略加速的FPGA静态时序分析算法[J]. 集成电路与嵌入式系统. 2026, 26(4): 14-25 https://doi.org/10.20193/j.ices2097-4191.2025.0138
TIAN Chunsheng, ZHAO Xiangyu, WANG Shuo, et al. FPGA static timing analysis algorithm accelerated by high-efficiency heterogeneous parallelization strategy[J]. Integrated Circuits and Embedded Systems. 2026, 26(4): 14-25 https://doi.org/10.20193/j.ices2097-4191.2025.0138
中图分类号: TN47 (大规模集成电路、超大规模集成电路)    TP301 (理论、方法)   

参考文献

[1]
田春生, 陈雷, 王源, 等. 面向FPGA的布局与布线技术研究综述[J]. 电子学报, 2022, 50(5):1243-1254.
摘要
随着大规模集成电路器件复杂度与容量的不断提升,现场可编程门阵列(Field Programmable Gate Array, FPGA)以高度的并行、可定制和可重构的特性得到了广泛的关注与应用. 在制约FPGA发展的众多因素中,最为关键的便是电子设计自动化(Electronic Design Automation, EDA)技术,作为FPGA EDA流程中的关键环节,布局和布线技术的研究对于FPGA的重要性不言而喻. 本文综述了面向FPGA的布局和布线技术,包括基于划分的布局、基于启发式的布局、基于解析式的布局、FPGA串行布线和FPGA并行布线等技术,分析对比了不同技术方法的优缺点,在此基础上,本文还展望了未来FPGA布局和布线技术的发展趋势,将为FPGA未来健康可持续的发展提供有力支撑.
TIAN C S, CHEN L, WANG Y, et al. Review on Technology of Placement and Routing for the FPGA[J]. Acta Electronica Sinica, 2022, 50(5):1243-1254. (in Chinese)
[2]
田春生, 陈雷, 王源, 等. 基于机器学习的FPGA电子设计自动化技术研究综述[J]. 电子与信息学报, 2023, 45(1):1-13.
TIAN C S, CHEN L, WANG Y, et al. A survey on FPGA electronic design automation technology based on machine learning[J]. Journal of Electronics & Information Technology, 2023, 45(1):1-13. (in Chinese)
[3]
麦景, 王嘉睿, 邸志雄, 等. OpenPARF:基于深度学习工具包的大规模异构FPGA开源布局布线框架[J]. 电子与信息学报, 2023, 45(9):3118-3131.
MAI J, WANG J R, DI Z X, et al. OpenPARF:an open-source placement and routing framework for large-scale heterogeneous FPGAs with deep learning toolkit[J]. Journal of Electronics & Information Technology, 2023, 45(9):3118-3131. (in Chinese)
[4]
WANG X B, YOU J. Summary of Tools in FPGA static timing analysis[C]// 2020 7th International Conference on Dependable Systems and Their Applications (DSA),Xi’an,China, 2020:491-492.
[5]
MSHAH B, MEHTA U. Development of static timing analysis tool in Perl[C]// 2020 International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT),Bangalore,India, 2020:252-255.
[6]
MURRAY K E, BETZ V. Tatum:parallel timing analysis for faster design cycles and improved optimization[C]// 2018 International Conference on Field-Programmable Technology (FPT),Naha,Japan, 2018:110-117.
[7]
徐宇. FPGA EDA 工具中的软件模型与优化应用研究[D]. 北京: 中国科学院大学, 2020.
XU Y. Research on optimization and application of FPGA software models in EDA tool[D]. Beijing: University of Chinese Academy of Sciences, 2020. (in Chinese)
[8]
GUO G N, HUANG T W, LIN Y B, et al. GPU-accelerated path-based timing analysis[C]// 2021 58th ACM/IEEE Design Automation Conference (DAC), San Francisco,CA,USA, 2021:721-726.
[9]
GUO G N, HUANG T W, LIN Y B, et al. GPU-accelerated critical path generation with path constraints[C]// 2021 IEEE/ACM International Conference on Computer Aided Design (ICCAD),Munich,Germany, 2021:1-9.
[10]
GUO Z Z, HUANG T W, LIN Y B. Accelerating static timing analysis using CPU-GPU heterogeneous parallelism[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023, 42(12):4973-4984.
[11]
GUO Z Z, HUANG T W, LIN Y B. HeteroCPPR: accelerating common path pessimism removal with heterogeneous CPU-GPU parallelism[C]// 2021 IEEE/ACM International Conference on Computer Aided Design (ICCAD),Munich,Germany, 2021:1-9.
[12]
GUO Z Z, HUANG T W, LIN Y B. GPU-accelerated static timing analysis[C]// 2020 IEEE/ACM International Conference on Computer Aided Design (ICCAD),San Diego,CA,USA, 2020:1-9.
[13]
贺旭, 王耀, 傅智勇, 等. 敏捷设计中基于机器学习的静态时序分析方法综述[J]. 计算机辅助设计与图形学学报, 2023, 35(4):640-652.
HE X, WANG Y, FU Z Y, et al. A survey on machine learning-based technology for static timing analysis in agile design[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(4):640-652. (in Chinese)
[14]
HUANG T W, WU P C, WONG M D F. Fast path-based timing analysis for CPPR[C]// 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD),San Jose,CA,USA, 2014:596-599.
[15]
周珊, 王金波, 王晓丹. 基于时序路径的FPGA时序分析技术研究[J]. 微电子学与计算机, 2016, 33(1):76-80.
ZHOU S, WANG J B, WANG X D. Research of FPGA timing sequence analysis technology based on timing sequence path[J]. Microelectronics & computer, 2016, 33(1):76-80. (in Chinese)
[16]
HUANG T W, WONG M D F. OpenTimer:A high-performance timing analysis tool[C]// 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD),Austin,TX,USA, 2015:895-902.
[17]
BAIDARI I, HANAGAWADIMATH. Traversing directed cyclic and acyclic graphs using modified BFS algorithm[C]// 2014 Science and Information Conference,London,UK, 2014:175-181.
[18]
DONATH W, HATHAWAY. Distributed static timing analysis[P]. US Patent. 1998.
[19]
HOLDER A, DCAROTHERS C, KALAFALA K. Prototype for a large-scale static timing analyzer running on an IBM Blue Gene[C]// 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW),Atlanta,GA,USA, 2010:1-8.
[20]
WANG H H W, LIN L Y Z, HUANG R H M, et al. CASTA:CUDA-accelerated static timing analysis for VLSI designs[C]// 2014 43rd International Conference on Parallel Processing,Minneapolis,MN,USA, 2014:192-200.
[21]
GULATI K, KHATRI S P. Accelerating statistical static timing analysis using graphics processing units[C]// 2009 Asia and South Pacific Design Automation Conference,Yokohama,Japan, 2009:260-265.
[22]
SHEN Y R, HU J. GPU acceleration for PCA-based statistical static timing analysis[C]// 2015 33rd IEEE International Conference on Computer Design (ICCD),New York,NY,USA, 2015:674-679.
[23]
GUO Z Z, ZHANG Z D, LI W X, et al. HeteroExcept:a CPU-GPU heterogeneous algorithm to accelerate exception-aware static timing analysis[C]// Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design,New York,NY,USA, 2024:1-9.
[24]
HSU D F, LAN X J, MILLER G, et al. A comparative study of algorithm for computing strongly connected components[C]// 2017 IEEE 15th International Conference on Dependable, Autonomic and Secure Computing,15th International Conference on Pervasive Intelligence and Computing,3rd International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech),Orlando,FL,USA, 2017:431-437.
[25]
田春生, 陈雷, 王硕, 等. 面向大规模FPGA的粗粒度并行布线方法研究[J]. 集成电路与嵌入式系统, 2025, 25(6):68-77.
摘要
针对大规模FPGA布线过程中存在的资源开销与内存占用过大、布线算法求解效率低等问题,提出了一种资源友好型的面向大规模FPGA的粗粒度并行布线方法。首先,提出了非侵入式的数据优化技术,以减少因布线资源图而导致的资源开销与内存占用,解决因FPGA规模增大而导致的内存空间爆炸问题,为布线方法提供数据基座。其次,提出了自适应负载均衡以及高扇出线网划分技术,以解决粗粒度并行布线方法并行度低的问题,提升布线方法求解效率。实验结果表明,所提出的面向大规模FPGA的粗粒度并行布线方法可以在降低资源消耗与内存占用90%的情况下,获得3.18倍的运行时间加速比,而不会对线长与关键路径实验等性能指标造成影响。
TIAN C S, CHEN L, WANG S, et al. Research on coarse-grained parallel routing method for large-scale FPGAs[J]. Integrated Circuits and Embedded Systems, 2025, 25(6):68-77. (in Chinese)

Aiming at the problems such as excessive resource overhead, high memory consumption, and low routing efficiency in the routing process of large-scale FPGAs, a resource-friendly coarse-grained parallel routing method tailored for large-scale FPGAs is proposed. First, a non-intrusive data optimization technique is proposed to reduce the resource overhead and memory consumption caused by the routing resource graph, addressing the memory explosion problem resulting from the increasing scale of FPGAs and providing a data foundation for the routing method. Second, adaptive load balancing and high-fanout net partitioning techniques are introduced to tackle the low parallelism in coarse-grained parallel routing, thereby improving the overall routing efficiency. The experimental results show that the proposed coarse-grained parallel routing method for large-scale FPGAs can achieve a 3.18× speedup in runtime while reducing resource and memory consumption by 90%, without compromising performance metrics such as wirelength and critical path delay.

基金

国家自然科学基金项目(62374138)

责任编辑: 薛士然
PDF(4051 KB)

Accesses

Citation

Detail

段落导航
相关文章

/