Special Topic on IC Design Automation and High-reliability Design

Layer assignment algorithm considering antenna effects for advanced technology nodes

CHEN Kehao, LI Zepeng, LIN Ziqing, LIU Genggeng

Integrated Circuits and Embedded Systems. 2026, 26(4): 1-13. https://doi.org/10.20193/j.ices2097-4191.2025.0136

As integrated circuit feature sizes continue to shrink, the antenna effect increasingly threatens chip reliability. Layer assignment, a critical step in physical design, maps 2D routing segments onto the layers of a multi-layer 3D routing space. Improper assignment can leave wires forming excessively long antennas that accumulate charge and damage gates. However, existing research focuses primarily on delay and via optimization and does not adequately consider antenna effects. Moreover, non-default-rule (NDR) wires, which are widely adopted in advanced nodes, exacerbate antenna effects because of their larger widths. This paper proposes an antenna-aware layer assignment algorithm for advanced technology nodes built on four core strategies: (1) an antenna-cost-aware dynamic programming strategy that reduces violations during initialization; (2) a high-layer-priority segment reassignment strategy that precisely controls antenna area growth; (3) a timing-aware NDR replacement strategy that fixes violations while limiting delay impact; and (4) a g-edge resource negotiation strategy that releases routing resources through cross-net coordination. Experimental results demonstrate that the proposed algorithm significantly reduces the number of antenna-violating nets and pins while maintaining excellent delay and via count performance.
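
To make the antenna effect concrete, the following is a minimal sketch of an antenna check on a single gate-to-driver path. It is not the algorithm from the paper. It assumes a simplified textbook model in which charge collects on the wire between the gate and the first segment on the path's topmost layer, and the `Segment` type, the unit wire width, the NDR width factor, and all numbers are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_WIDTH = 1.0      # hypothetical default wire width
NDR_WIDTH_FACTOR = 2.0   # hypothetical: an NDR wire is modelled as twice as wide


@dataclass
class Segment:
    length: float
    layer: int          # assigned metal layer, 1 = lowest
    ndr: bool = False   # True if the segment uses a wider NDR wire


def antenna_area(path):
    """Wire area that collects charge for the gate at the start of `path`.

    `path` lists segments in order from the gate pin toward the driver.
    Simplified model: until the topmost layer is fabricated, the gate is
    connected only to the segments that precede the first top-layer segment.
    """
    top = max(s.layer for s in path)
    area = 0.0
    for s in path:
        if s.layer == top:
            break
        width = DEFAULT_WIDTH * (NDR_WIDTH_FACTOR if s.ndr else 1.0)
        area += s.length * width
    return area


def violates(path, gate_area, max_ratio):
    """True if the antenna ratio (wire area / gate area) exceeds the limit."""
    return antenna_area(path) / gate_area > max_ratio


original = [Segment(10, 1), Segment(20, 2), Segment(5, 3)]
promoted = [Segment(10, 1), Segment(20, 3), Segment(5, 3)]   # middle segment moved up
with_ndr = [Segment(10, 1, ndr=True), Segment(20, 3), Segment(5, 3)]

print(antenna_area(original), violates(original, gate_area=1.0, max_ratio=25.0))
print(antenna_area(promoted), violates(promoted, gate_area=1.0, max_ratio=25.0))
print(antenna_area(with_ndr), violates(with_ndr, gate_area=1.0, max_ratio=25.0))
```

Under this model, moving a segment to the top layer shortens the antenna, while swapping in an NDR wire enlarges it, which is the tension the abstract describes between reassignment and NDR use.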

Scan chain diagnosis and algorithm design based on sideway scan

WANG Qitao, FENG Haoran, LAO Junjie, YOU Jiaxin, LIN Zefan, LAI Liyang

Integrated Circuits and Embedded Systems. 2026, 26(4): 41-50. https://doi.org/10.20193/j.ices2097-4191.2025.0133

As integrated circuits grow in complexity and integration level, Diagnosis-Driven Yield Analysis (DDYA) has become increasingly important for accelerating physical failure analysis and improving yield. However, the low diagnostic resolution of scan chain diagnosis based on scan testing remains a weak link in DDYA. This paper studies sideway scan, a scan chain diagnosis technique based on an improved hardware architecture. The technique groups scan chains according to clock domain or layout constraints and introduces a cyclic-shift sideway transmission path between adjacent scan chains within each group. Data is transmitted from the faulty chain to a normal chain and unloaded there, then analyzed with the sideway diagnostic algorithm, which enables precise diagnosis of various fault scenarios. The architecture has lower hardware overhead than two-dimensional scan and higher diagnostic resolution than bidirectional scan. Comparative experiments across multiple circuits demonstrate that, compared to software-based scan chain diagnosis, sideway scan improves diagnostic resolution by up to 41% for single faults, up to 80% for double faults, and up to 168% for triple faults. Across the fault scenarios, diagnosis time is reduced by over 90%, with a maximum reduction of 99%. The study demonstrates the feasibility, stability, time advantage, and diagnostic resolution advantage of sideway scan, providing a more efficient and precise solution for fault diagnosis in complex integrated circuits.
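
The intuition behind unloading through a neighbouring chain can be illustrated with a toy model. This is only a sketch under strong assumptions, not the paper's architecture or its diagnostic algorithm: a single stuck-at cell is assumed, every bit shifted through that cell is assumed to take the stuck value, and the sideway path is assumed to copy each cell into the same position of a healthy chain. The function names are hypothetical.

```python
def unload_own(chain, fault_pos, stuck_value):
    """Shift the chain out through its own scan output.

    chain[0] is nearest the scan input and chain[-1] nearest the scan output.
    Every bit at or upstream of the faulty cell has to pass through it, so it
    is observed as the stuck value.
    """
    observed = []
    for i in reversed(range(len(chain))):   # the cell nearest scan-out leaves first
        observed.append(stuck_value if i <= fault_pos else chain[i])
    return observed


def unload_sideway(chain, fault_pos, stuck_value):
    """Copy each cell into a healthy neighbouring chain, then unload that chain.

    Only the faulty cell's own output is corrupted during the copy; the healthy
    chain then shifts out without further damage.
    """
    neighbour = [stuck_value if i == fault_pos else bit
                 for i, bit in enumerate(chain)]
    return list(reversed(neighbour))


captured = [1, 0, 1, 1, 0, 1]   # hypothetical captured response
print(unload_own(captured, fault_pos=3, stuck_value=0))
print(unload_sideway(captured, fault_pos=3, stuck_value=0))
```

In this toy model the ordinary unload hides every cell at or upstream of the fault, whereas the sideway unload corrupts only the faulty position, which suggests why the extra path can raise diagnostic resolution.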

Design and optimization of pipelined parity check circuit for SoC memory

MA Jingbo, ZHANG Guangda, WANG Huiquan, PEI Bingxi, FANG Jian, HUANG Chenglong, LUO Hui, JIANG Yande

Integrated Circuits and Embedded Systems. 2026, 26(4): 26-33. https://doi.org/10.20193/j.ices2097-4191.2025.0137

As SoC architectures evolve to meet the computational intensity of diverse AI applications, the pursuit of high throughput must be balanced with uncompromising reliability. Consequently, parity check mechanisms have become a cornerstone of modern circuit design, essential for safeguarding the integrity of massive data movement within the SoC fabric. However, in wide-bit-width data transmission scenarios, traditional parity check circuit designs face challenges such as high verification complexity and significant decoding latency, which in turn constrain the overall performance of SoCs, including the system master clock frequency and data access bandwidth. To address this challenge, this paper proposes a multi-stage pipelined parity check circuit design method for the AXI bus in SoC memory. The design employs a pipelined architecture to carry out the check in stages, significantly reducing the critical path delay in the data path. Experimental results demonstrate that, at a minimal cost of a 0.47% increase in total circuit area and a 0.24% rise in power consumption, the proposed design method optimizes the timing of the data read/write bus critical path, reducing the maximum delay of the AXI bus write and read data paths by 18.62% and 25.60%, respectively, and effectively enhancing the overall performance and reliability of the SoC.
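
The idea of splitting a wide parity check into pipeline stages can be sketched behaviourally. The following is a minimal software model, not the circuit from the paper: the 128-bit data width, the 32-bit slices, the two-stage split, and all function names are assumptions chosen for illustration.

```python
def parity(x: int) -> int:
    """Even-parity bit of a non-negative integer (XOR of all its bits)."""
    return bin(x).count("1") & 1


def stage1(word: int, width: int, slice_bits: int) -> list:
    """First stage: parity of each slice, computed independently."""
    mask = (1 << slice_bits) - 1
    return [parity((word >> off) & mask) for off in range(0, width, slice_bits)]


def stage2(partials: list) -> int:
    """Second stage: XOR the partial parities into the final parity bit."""
    result = 0
    for p in partials:
        result ^= p
    return result


def pipelined_parity(words, width=128, slice_bits=32):
    """Model a two-stage pipeline with one register between the stages.

    Each loop iteration is one clock cycle: stage 2 consumes what stage 1
    produced in the previous cycle, so results appear one cycle late.
    """
    reg = None                              # pipeline register
    for word in list(words) + [None]:       # one extra cycle drains the pipeline
        if reg is not None:
            yield stage2(reg)
        reg = stage1(word, width, slice_bits) if word is not None else None


words = [0, 1, (1 << 127) | 1, (1 << 128) - 1]
print(list(pipelined_parity(words)))
print([parity(w) for w in words])
```

Each stage handles a shallower XOR tree than a single flat check would, which is the general mechanism by which pipelining shortens a critical path at the cost of one cycle of latency.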

Discrete particle swarm optimization-based placement-routing co-optimization method for multiplexers

ZHOU Shiqi, CAI Huayang, WANG Jingyi, LIU Genggeng

Integrated Circuits and Embedded Systems. 2026, 26(4): 51-60. https://doi.org/10.20193/j.ices2097-4191.2025.0134

Continuous-flow microfluidic biochips (CFMBs) are widely used in biochemical analysis due to their high precision and reliability. CFMBs consist of a flow layer and a control layer. To manage complex logic in the control layer with limited control pins, multiplexers are extensively employed. However, the physical design of multiplexers, specifically the co-optimization of valve placement and channel routing, remains underexplored. To address this, this paper proposes a co-optimization method based on Discrete Particle Swarm Optimization (DPSO). First, valve placement regions are constrained via preprocessing to ensure routing feasibility. Second, a DPSO framework encodes placement into particle positions and uses an embedded A* router to provide routing cost as the fitness value, establishing a closed feedback loop between placement and routing. Third, X-architecture routing is introduced to expand the solution space and minimize wirelength. Experimental results demonstrate that the proposed method reduces the average control channel length by 8.27%. Notably, the X-architecture contributes a 5.01% improvement over traditional R-type routing, significantly enhancing both layout quality and routing efficiency.
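
Why diagonal channels can shorten wirelength is easy to see for a single two-pin connection. The sketch below compares the shortest rectilinear length with the shortest length when 45-degree segments are also allowed. It assumes that "R-type" in the abstract refers to rectilinear routing, ignores obstacles and design rules, and uses hypothetical function names; it is not the router from the paper.

```python
import math


def rectilinear_length(a, b):
    """Shortest length using only horizontal and vertical segments."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return dx + dy


def x_architecture_length(a, b):
    """Shortest length when 45-degree diagonal segments are also allowed.

    A diagonal covers min(dx, dy) in both directions at once; the remaining
    |dx - dy| is covered by a straight segment.
    """
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return (max(dx, dy) - min(dx, dy)) + math.sqrt(2) * min(dx, dy)


valve, pin = (0, 0), (3, 4)   # hypothetical coordinates
r_len = rectilinear_length(valve, pin)
x_len = x_architecture_length(valve, pin)
print(r_len, round(x_len, 4), f"{100 * (r_len - x_len) / r_len:.1f}% shorter")
```

The saving for one connection depends on its geometry and vanishes when the two points share a row or column, so the aggregate figures reported in the abstract cannot be derived from this bound alone.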

FPGA static timing analysis algorithm accelerated by high-efficiency heterogeneous parallelization strategy

TIAN Chunsheng, ZHAO Xiangyu, WANG Shuo, WANG Zhuoli, CAO Yongzheng, ZHOU Jing, ZHANG Yaowei, CHEN Lei

Integrated Circuits and Embedded Systems. 2026, 26(4): 14-25. https://doi.org/10.20193/j.ices2097-4191.2025.0138

The widespread integration of Field Programmable Gate Arrays (FPGAs) in high-performance computing, AI inference, and 5G communications has led to an unprecedented escalation in design scale and timing constraint complexity. These trends impose stringent demands on the runtime efficiency of Static Timing Analysis (STA). Current FPGA STA tools, which are primarily built on single-core or multi-core CPU architectures, are increasingly hitting a performance wall: despite persistent algorithmic refinements, they struggle with computational bottlenecks and suboptimal memory throughput when confronted with large-scale designs. In recent years, Graphics Processing Units (GPUs), with their massive parallel computing capabilities, have provided new opportunities for improving FPGA STA performance. However, challenges in memory access patterns under heterogeneous GPU architectures, in the optimization of timing graph loop detection, and in heterogeneous parallel acceleration strategies continue to hinder the effectiveness of current GPU-accelerated methods in FPGA STA scenarios. To address these issues, we propose an FPGA STA algorithm accelerated by an efficient heterogeneous parallel strategy. First, targeting the problem of discontinuous memory access and field interleaving in traditional object-oriented data structures under CPU-GPU heterogeneous architectures, a structure-of-arrays (SoA)-based data layout strategy is presented. Combined with data reordering operations, this approach effectively reduces memory access latency and improves bandwidth utilization, providing a data foundation for high-performance FPGA STA computational engines. Second, to overcome the low efficiency and poor robustness of timing graph loop detection, a parallel loop detection optimization algorithm based on color propagation is designed, enabling efficient acceleration in the preprocessing stage of FPGA STA. Furthermore, a task decomposition and timing graph traversal method tailored for CPU-GPU heterogeneous architectures is proposed, achieving efficient acceleration of core STA operations such as delay calculation, levelization, and graph propagation. Finally, experimental results on both the OpenCores and industrial-grade FPGA benchmarks demonstrate that, compared with traditional CPU implementations, the proposed method achieves a runtime speedup of 3.125× to 33.333×, with overall performance surpassing that of the OpenTimer tool. This research provides a practical and feasible approach for efficient timing verification in large-scale FPGA designs.
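
Levelization and graph propagation, two of the core STA operations named above, can be sketched sequentially in a few lines. This is a plain CPU illustration, not the heterogeneous method from the paper: it uses Kahn-style topological levelization (which also reports a loop when one exists) rather than the color-propagation loop detection the abstract describes, and the edge format and function names are assumptions.

```python
from collections import defaultdict


def levelize(num_nodes, edges):
    """Group timing-graph nodes into levels; edges are (src, dst, delay).

    A node enters a level once all of its fan-in nodes have been placed.
    Raises ValueError if some nodes can never be placed, i.e. the graph
    contains a loop.
    """
    indegree = [0] * num_nodes
    fanout = defaultdict(list)
    for src, dst, delay in edges:
        fanout[src].append((dst, delay))
        indegree[dst] += 1

    current = [n for n in range(num_nodes) if indegree[n] == 0]
    levels, placed = [], 0
    while current:
        levels.append(current)
        placed += len(current)
        following = []
        for u in current:
            for v, _ in fanout[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    following.append(v)
        current = following
    if placed != num_nodes:
        raise ValueError("timing graph contains a loop")
    return levels, fanout


def arrival_times(num_nodes, edges):
    """Forward propagation: latest arrival time at every node."""
    levels, fanout = levelize(num_nodes, edges)
    arrival = [0.0] * num_nodes
    for level in levels:
        # No node in a level depends on another node in the same level,
        # which is what makes per-level parallel processing possible.
        for u in level:
            for v, delay in fanout[u]:
                arrival[v] = max(arrival[v], arrival[u] + delay)
    return arrival


edges = [(0, 2, 1.0), (1, 2, 2.0), (2, 3, 1.5), (0, 3, 0.5)]   # hypothetical delays
print(levelize(4, edges)[0])
print(arrival_times(4, edges))
```

A parallel version would process the nodes of each level concurrently, and the per-node fields (`indegree`, `arrival`) are already stored as separate flat arrays here, in the spirit of a structure-of-arrays layout.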

Design of median filter circuit based on stochastic number correlation

DAI Yunqiong, WU Yuzhang, WANG Sheng, YU Fuan, SUN Wanghong, ZHANG Yongqiang, WANG Shaowei

Integrated Circuits and Embedded Systems. 2026, 26(4): 34-40. https://doi.org/10.20193/j.ices2097-4191.2025.0135

Stochastic computing (SC), an unconventional computational paradigm, employs probabilities to represent numerical values. This representation enables complex arithmetic operations to be performed using simple logic gates. This work presents a fast unary median filtering circuit design. The proposed filter uses counters to generate stochastic numbers (SNs) and constructs the basic units of a sorting network from stochastic correlation logic. A feedback loop driven by the output terminates computation early without consuming additional hardware area, significantly reducing circuit latency. Experimental results demonstrate that the proposed median filter design outperforms existing implementations in both actual bitstream length and energy consumption. Specifically, the proposed 3×3 window median filter circuit achieves a 55.58% reduction in energy. Further validation using median filtering on images corrupted by salt-and-pepper noise confirms the accuracy of the proposed circuit. For a 16-input sorting network application, the proposed design achieves up to a 50% reduction in actual bitstream length and energy consumption when inputs lie within [0, 0.5].
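
The role of correlation can be shown with a small software model. When two bitstreams are maximally correlated, as counter-generated unary streams are, a bitwise AND yields the minimum of the two values and a bitwise OR yields the maximum, so one AND gate and one OR gate act as a compare-and-swap unit. The sketch below assumes 8-bit pixels, a stream length of 256, and an odd-even transposition sorting network; it omits the early-termination feedback loop and is not the circuit from the paper.

```python
def unary_stream(value, length):
    """Counter-style generator: bit i is 1 while counter i is below `value`."""
    return [1 if i < value else 0 for i in range(length)]


def compare_swap(a, b):
    """Sorting-network unit for correlated streams: AND -> min, OR -> max."""
    low = [x & y for x, y in zip(a, b)]
    high = [x | y for x, y in zip(a, b)]
    return low, high


def median_of_window(pixels, length=256):
    """Median of an odd-sized window via an odd-even transposition network."""
    streams = [unary_stream(p, length) for p in pixels]
    n = len(streams)
    for rnd in range(n):                       # n rounds fully sort n inputs
        for i in range(rnd % 2, n - 1, 2):
            streams[i], streams[i + 1] = compare_swap(streams[i], streams[i + 1])
    return sum(streams[n // 2])                # count the ones in the middle output


window = [12, 200, 45, 255, 0, 45, 90, 13, 77]   # hypothetical 3x3 pixel window
print(median_of_window(window))
```

The AND/OR trick relies on the streams sharing the same counter ordering; with independent random streams, an AND gate would multiply the probabilities instead of selecting the smaller one.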
