Design and optimization of pipelined parity check circuit for SoC memory

马敬博; 张光达; 王会权; 裴秉玺; 方健; 黄成龙; 罗慧; 蒋艳德

doi:10.20193/j.ices2097-4191.2025.0137

Integrated Circuits and Embedded Systems, 2026, Vol. 26, Issue 4: 26-33.

Special Column on Integrated Circuit Design Automation (EDA) and High-Reliability Design Research

Abstract

As system-on-chip (SoC) designs pursue ever higher performance and reliability to handle the massive data volumes of diverse AI applications, parity check mechanisms have been widely adopted in circuit design to make on-chip data transfers more reliable. In wide-bit-width data transfers, however, traditional parity check circuits suffer from high check complexity and long decoding latency, which constrain overall SoC performance, such as the system main clock frequency and memory access bandwidth. To address this problem, this paper proposes a multi-stage pipelined parity check circuit design method for the AXI bus of SoC memory. The design uses a pipelined architecture to split the check process into stages, significantly reducing the critical path delay in the data path. Experimental results show that, at the small cost of a 0.47% increase in total circuit area and a 0.24% increase in power consumption, the proposed method improves the timing of the critical paths on the data read/write paths, reducing the maximum path delay of the AXI bus write data and read data channels by 18.62% and 25.60%, respectively, thereby improving overall SoC performance and reliability.
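The idea of splitting a parity check into pipeline stages can be illustrated with a small software model. The sketch below is not the authors' circuit: the 128-bit data width, the one-parity-bit-per-byte grouping, the even-parity convention, and the two-stage split are all assumptions chosen for illustration, and the names `stage1_group_xor`, `stage2_compare`, and `PipelinedParityChecker` are hypothetical. Stage 1 computes the per-byte XOR reduction and latches it into a register; stage 2 compares the latched result with the received parity bits one cycle later, so neither stage carries the full XOR-and-compare logic depth.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DATA_WIDTH = 128   # assumed bus data width in bits (not taken from the paper)
GROUP_BITS = 8     # assumed: one even-parity bit per byte
NUM_GROUPS = DATA_WIDTH // GROUP_BITS


def stage1_group_xor(data: int) -> int:
    """Stage 1: XOR-reduce each byte; bit i of the result is the parity of byte i."""
    result = 0
    for i in range(NUM_GROUPS):
        byte = (data >> (i * GROUP_BITS)) & 0xFF
        result |= (bin(byte).count("1") & 1) << i
    return result


def stage2_compare(computed: int, received: int) -> bool:
    """Stage 2: report an error if any computed parity bit differs from the received one."""
    return (computed ^ received) != 0


def encode(data: int) -> int:
    """Generate even-parity bits for a data word (same logic as stage 1)."""
    return stage1_group_xor(data)


@dataclass
class PipelinedParityChecker:
    """Cycle-level model with one register between stage 1 and stage 2."""
    reg: Optional[Tuple[int, int]] = None  # (computed parity, received parity)

    def clock(self, beat: Optional[Tuple[int, int]]) -> Optional[bool]:
        """Advance one cycle. `beat` is (data, parity), or None for an idle cycle.
        Returns the error flag for the beat accepted on the previous cycle."""
        out = None if self.reg is None else stage2_compare(*self.reg)
        self.reg = None if beat is None else (stage1_group_xor(beat[0]), beat[1])
        return out


def run_demo():
    good = 0x0123456789ABCDEF0123456789ABCDEF
    bad = good ^ (1 << 77)                 # single-bit flip in transit
    parity = encode(good)
    checker = PipelinedParityChecker()
    beats = [(good, parity), (bad, parity), None]  # trailing idle cycle drains the pipeline
    return [checker.clock(b) for b in beats]
```

In this model `run_demo()` returns `[None, False, True]`: each result appears one cycle after its data beat. That extra cycle of latency is the trade-off a pipelined check makes in exchange for a shorter per-stage logic path.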

Cite this article

MA Jingbo, ZHANG Guangda, WANG Huiquan, et al. Design and optimization of pipelined parity check circuit for SoC memory[J]. Integrated Circuits and Embedded Systems, 2026, 26(4): 26-33. https://doi.org/10.20193/j.ices2097-4191.2025.0137

CLC number: TP872 (remote control and signaling; remote control and signaling systems)
