
Just Accepted

Note: The articles listed below have been peer-reviewed and accepted for publication in this journal. They have not yet been scheduled for a specific issue, and their content and layout may undergo minor changes in the final published version, which should be regarded as the definitive one. Each of these articles has been assigned a unique and persistent DOI, which may be used to cite it directly.
  • 王 少威
    Accepted: 2026-01-12
    Stochastic computing (SC), an unconventional computational paradigm, employs probabilities to represent numerical values, enabling complex arithmetic operations to be performed with simple logic gates. This work presents a fast unary median filtering circuit design. The proposed filter utilizes counters to generate stochastic numbers (SNs) and constructs the fundamental sorting-network units using stochastic correlation logic. A feedback loop formed from the output dynamically terminates computation early without consuming additional hardware area, substantially reducing circuit latency. Experimental results demonstrate that the proposed median filter outperforms existing implementations in both actual bitstream length and energy consumption; in particular, the proposed 3×3-window median filter circuit achieves a 55.58% reduction in energy. Further validation using median filtering on images corrupted by salt-and-pepper noise confirms the accuracy of the proposed circuit. For a 16-input sorting-network application, the proposed design exhibits lower overhead when the inputs lie within [0, 0.5], achieving up to a 50% reduction in actual bitstream length and energy consumption.
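A minimal Python sketch of the core idea, assuming counter-based (unary/thermometer) SN generation and a bitwise majority gate as the 3-input median unit; the names are illustrative and the early-termination feedback loop is omitted:

```python
# A sketch of correlated unary SC sorting, assuming counter-based SN
# generation and a majority gate as the 3-input median unit (names mine).

def unary_encode(x: int, n: int) -> list[int]:
    """Counter-based unary SN: bit i is 1 while the shared ramp counter is below x."""
    return [1 if i < x else 0 for i in range(n)]

def median3(a: int, b: int, c: int, n: int = 256) -> int:
    """Bitwise majority of three correlated unary streams yields their median."""
    sa, sb, sc = (unary_encode(v, n) for v in (a, b, c))
    out = [(x & y) | (y & z) | (x & z) for x, y, z in zip(sa, sb, sc)]
    return sum(out)  # decode: count the 1s in the output stream

assert median3(10, 200, 80) == 80
```

Because all three streams share one ramp counter, they are maximally correlated, which is what lets plain AND/OR/majority gates act as min/max/median units.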
  • Accepted: 2026-01-12
    With the wide application of Field Programmable Gate Arrays (FPGAs) in high-performance computing, artificial intelligence inference, and 5G communications, the scale of circuit designs and the complexity of timing constraints continue to increase, placing higher demands on the runtime efficiency of Static Timing Analysis (STA). Existing FPGA STA tools predominantly rely on single-core or multi-core Central Processing Unit (CPU) architectures. Although continuous algorithmic optimizations have been made, they still face computational bottlenecks and insufficient memory access efficiency when handling large-scale FPGA designs. In recent years, Graphics Processing Units (GPUs), with their massive parallel computing capabilities, have provided new opportunities for improving FPGA STA performance. However, challenges in memory access patterns under heterogeneous GPU architectures, optimization for timing graph loop detection, and heterogeneous parallel acceleration strategies limit the effectiveness of current GPU-accelerated methods in FPGA STA scenarios. To address these issues, we propose an FPGA STA algorithm accelerated by an efficient heterogeneous parallel strategy. First, targeting the problem of discontinuous memory access and field interleaving in traditional object-oriented data structures under CPU-GPU heterogeneous architectures, a structure-of-arrays (SoA)-based data layout strategy is presented. Combined with data reordering operations, this approach effectively reduces memory access latency and improves bandwidth utilization, providing a data foundation for high-performance FPGA STA computational engines. Second, to overcome the limitations of low efficiency and poor robustness in timing graph loop detection, a parallel loop detection optimization algorithm based on color propagation is designed, enabling efficient acceleration in the preprocessing stage of FPGA STA. Furthermore, a task decomposition and timing graph traversal method tailored for CPU-GPU heterogeneous architectures is proposed, achieving efficient acceleration of core STA operations such as delay calculation, levelization, and graph propagation. Finally, experimental results on both the OpenCores and industrial-grade FPGA benchmarks demonstrate that, compared with traditional CPU implementations, the proposed method achieves a runtime speedup of 3.125× to 33.333×, with overall performance surpassing that of the OpenTimer tool. This research provides a practical and feasible approach for efficient timing verification in large-scale FPGA designs.
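The structure-of-arrays layout and the data-reordering step can be illustrated with a small NumPy sketch; the field names and the level-based reordering detail are assumptions, not the paper's actual data structures:

```python
import numpy as np

# Hypothetical timing-node fields; the paper's actual layout is not shown.
# Structure-of-arrays: one contiguous array per field, so a GPU kernel that
# touches only `arrival` reads a dense, coalesced stream instead of striding
# over interleaved object fields.
n = 1_000_000
arrival = np.zeros(n, dtype=np.float32)   # arrival times
slew    = np.zeros(n, dtype=np.float32)   # transition times
level   = np.random.randint(0, 64, n)     # topological level of each node

# Data reordering: grouping nodes by level makes each levelized propagation
# pass read a contiguous slice of every field array.
order = np.argsort(level, kind="stable")
arrival, slew, level = arrival[order], slew[order], level[order]
```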
  • Accepted: 2026-01-12
    To address the reduction in acoustic intensity emitted by the ultrasonic transducer and the decreased sensitivity of the received signal in traditional ultrasonic anemometers under conditions such as rain and snow, strong wind, dusty environments, and long-term use, which lead to low accuracy and poor stability in wind speed and direction measurement, an ultrasonic anemometer based on adaptive adjustment of signal gain has been designed. The hardware mainly consists of the driving and receiving circuits for ultrasonic signal transmission, a programmable signal-gain amplification and adjustment circuit, a filtering circuit, an A/D acquisition circuit, and an MCU control circuit. The software mainly employs the time-difference method, the sing-around method, the cross-correlation method, and a binary-search-based adaptive signal-gain control algorithm to calculate wind speed and direction. Experimental results indicate that the designed ultrasonic anemometer yields accurate and stable wind speed and direction measurements in harsh environments: within the range of 0-15 m/s the error is ±0.5 m/s; for wind speeds of 15-40 m/s the error does not exceed ±3%; and the absolute error of wind direction is less than 3°.
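A worked sketch of the time-difference method, under the standard assumption of one transducer pair per axis with path length L and transit times measured with and against the wind:

```python
import math

def wind_component(L: float, t_down: float, t_up: float) -> float:
    """Time-difference method for one axis of path length L:
    v = (L / 2) * (1 / t_down - 1 / t_up), independent of the speed of sound."""
    return 0.5 * L * (1.0 / t_down - 1.0 / t_up)

def speed_and_direction(vx: float, vy: float) -> tuple[float, float]:
    """Combine two orthogonal axis components into speed (m/s) and bearing (deg)."""
    return math.hypot(vx, vy), math.degrees(math.atan2(vy, vx)) % 360.0

# e.g. a 0.2 m path, 5 m/s tailwind, speed of sound 343 m/s:
t_down = 0.2 / (343 + 5)
t_up   = 0.2 / (343 - 5)
assert abs(wind_component(0.2, t_down, t_up) - 5.0) < 0.1
```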
  • 林 晓会
    Accepted: 2026-01-09
    The demanding application scenarios of CPLD devices impose high-reliability test requirements, while host-computer testing of CPLDs is cumbersome and inefficient. To address these problems, a logic-analyzer-based method for generating CPLD configuration vectors is proposed. Taking a domestically produced CPLD as an example, the method uses a logic analyzer to capture the JTAG download configuration data in real time; through protocol decoding followed by in-depth analysis and summarization of the decoded configuration data, configuration vectors are written in the standard SVF statement format, and vector transcoding and online ATE configuration-test verification are then completed. The test results effectively demonstrate the correctness of the generated configuration vectors and the feasibility of the generation method, providing important guidance for subsequent automated mass-production testing of CPLD devices on ATE.
  • Accepted: 2026-01-09
    This paper presents a low-noise, high-voltage, high-output-current power operational amplifier designed in a SMIC 180 nm BCD process. The architecture features a low-noise PMOS input stage, a voltage-gain stage, and a Class AB output stage biased by a translinear loop. Stability is ensured by cascode frequency compensation, while integrated hysteretic over-temperature and current-limiting circuits provide robust protection against thermal and electrical damage. Combining 60 V DMOS and 1.8 V CMOS devices, the amplifier operates over a wide supply range of ±4 V to ±30 V and a temperature range of -55°C to +125°C. Simulation results demonstrate an equivalent input voltage noise of 8.85 nV/√Hz, a 400 mA output current, a 143.3 dB DC gain, a 6.804 MHz unity-gain bandwidth, and a 33.7 V/μs slew rate, with a chip area of 1.79 × 1.12 mm². The proposed amplifier is well suited for automotive electronics applications including precision battery sensing, sensor interfaces, and power-device driving.
  • Accepted: 2026-01-09
    To address the challenges of traditional MOSFET testing, such as cumbersome procedures, reliance on bulky instruments, and a low degree of intelligence, this paper presents an automated test system integrating a Large Language Model (LLM) with the "Yuzhu S" portable hardware. Centered around the "Yuzhu S" instrument, the system performs characteristic curve, switching time, and double-pulse tests using an integrated PCB carrier board. It innovatively leverages the Gemini API to empower the software, enabling automatic parsing of PDF datasheets, intelligent recommendation of test parameters, and in-depth error analysis of the results. The test results for an IRF7401 device demonstrate that the key static and dynamic parameters obtained by the system show excellent agreement with datasheet specifications and simulation values, thus validating the accuracy and feasibility of the proposed solution. This research provides an efficient, intelligent, and portable new method for end-users to evaluate device performance.
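A heavily hedged sketch of how the datasheet-parsing step might look with the public google-generativeai Python package; the model name, prompt, and workflow here are assumptions, not the paper's code:

```python
# Hypothetical illustration of LLM-assisted datasheet parsing; the paper's
# actual prompts and integration with the "Yuzhu S" software are not public.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

def recommend_test_params(datasheet_pdf: str) -> str:
    pdf = genai.upload_file(datasheet_pdf)   # attach the PDF datasheet
    prompt = ("Extract V_GS(th), R_DS(on) and gate-charge limits from this "
              "datasheet and recommend safe double-pulse test parameters.")
    return model.generate_content([prompt, pdf]).text

print(recommend_test_params("IRF7401.pdf"))   # file name is a placeholder
```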
  • HE Yuhua, Wang Xueying, Xu Ke, Liu Sijia
    Accepted: 2026-01-09
    Aiming at the time and venue constraints of FPGA-related experimental teaching, as well as the difficulty of collecting process data on teaching and learning in traditional offline board-level experiments, this paper presents the design and implementation of a remote laboratory system for digital-circuit instruction based on the Unisoc FPGA platform. Adopting a hardware-software co-design philosophy, the system not only supports remote download and update via an emulated JTAG interface, bitstream flashing, waveform capture, and signal generation, but also extends to a dual-channel arbitrary waveform generator and a spectrum analyzer. By integrating remote cameras and digital-twin panels, the system streams experimental phenomena in real time over Ethernet and enables remote interaction and continuous monitoring of experiment status, thus establishing an immersive and scalable remote laboratory environment.
  • Accepted: 2026-01-09
    To tackle the concurrent challenges of bandwidth, linearity, and integration in the analog front-end (AFE) of a 100-Gbps PAM-4 wireline receiver for Chiplet interconnect applications, this paper presents a high-performance AFE architecture based on a transconductance-transimpedance amplifier (GM-TIA) continuous-time linear equalizer (CTLE). The proposed AFE efficiently compensates for channel loss while maintaining high linearity through an integrated broadband input matching network consisting of an asymmetric T-coil, a programmable attenuator, and an AC coupler. A two-stage cascaded GM-TIA-based CTLE enables wide-range gain tuning from low to high frequencies and also serves as a variable-gain amplifier (VGA). Designed in a 28-nm CMOS process, the AFE occupies a core area of 0.012 mm² with a power dissipation of 9.94 mW. The equalization tuning range extends from 2.25 dB to 13.39 dB. After equalization, the 100-Gbps PAM-4 output exhibits an eye height greater than 100 mV and an eye width exceeding 0.52 UI.
  • Wu Yuhan, Wang Shiyuan, Chen Xiaowen, Xing Shiyuan
    Accepted: 2026-01-09
    Front-end RTL design is a critical phase determining a chip's performance, power, and area (PPA). Conventional methodologies often prioritize functional implementation and lack systematic optimization for PPA metrics. To address this, this paper proposes a multi-dimensional RTL optimization approach, the DCAP co-optimization model. This model establishes a framework encompassing four dimensions: Data-path (D), Computation (C), Area-management (A), and Power-management (P). Using the USB 2.0 link layer as a case study, data throughput is enhanced via a coupled handshake scheme, computational efficiency is optimized using a real-time iterative CRC architecture, area overhead is controlled through resource management, and power consumption is reduced by improving clock-gating coverage. Back-end implementation results based on TSMC 65nm technology demonstrate that the design achieves a throughput of 52.3 MB/s (protocol efficiency: 87%) in High-Speed mode, with a power consumption of 0.156 mW and an area of 3333.6 μm². Compared with the design before optimization, this represents a 39% reduction in power and a 23% reduction in area. In conclusion, the proposed DCAP model provides a reusable methodological guide for addressing PPA optimization challenges in digital circuit design at the register-transfer level.
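The "real-time iterative CRC" can be illustrated with the bit-serial USB data CRC-16 (polynomial x^16 + x^15 + x^2 + 1, seed 0xFFFF, result complemented), which such hardware computes one bit per cycle; this Python model mirrors the bit order but is illustrative, not the paper's RTL:

```python
def usb_crc16(data: bytes) -> int:
    """Bit-serial USB data CRC-16 (poly x^16 + x^15 + x^2 + 1), LSB first,
    seed 0xFFFF, result complemented -- one loop iteration per bit, the same
    order in which a per-cycle hardware LFSR consumes the stream."""
    crc = 0xFFFF
    for byte in data:
        for i in range(8):
            if (crc ^ (byte >> i)) & 1:
                crc = (crc >> 1) ^ 0xA001   # 0x8005, bit-reflected
            else:
                crc >>= 1
    return crc ^ 0xFFFF

assert usb_crc16(b"123456789") == 0xB4C8   # CRC-16/USB check value
```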
  • Qin Haiyan, Feng Jiahao, Xie Zhiwei, Li Jingjing, Kang Wang
    Accepted: 2026-01-09
    Large Language Models (LLMs) face dual challenges in automated hardware design: ensuring functional correctness and achieving human-expert-level optimization efficiency. Circuits generated by existing models often suffer from a fundamental "Boolean optimization barrier," resulting in a gate count that is 38% to 1075% higher than human-expert designs. To address this, we introduce VeriOptima, a novel two-stage AI framework designed to bridge the gap from natural language specifications to highly-optimized gate-level netlists. The first stage, ReasoningV, is a high-fidelity Verilog generation model that ensures functional correctness through a high-quality dataset and an adaptive reasoning mechanism. Its performance was independently evaluated, achieving a 57.8% pass@1 accuracy on the VerilogEval-Human benchmark, which is competitive with top-tier state-of-the-art (SOTA) models. The second stage, CircuitMind, is a multi-agent optimization framework that takes the code generated by ReasoningV and refines it to human-competitive efficiency. For rigorous evaluation, we introduce TC-Bench, the first gate-level benchmark derived from a competitive circuit design platform. Experiments validate the effectiveness of our integrated framework. ReasoningV achieves state-of-the-art performance among open-source Verilog generation models. More critically, in comparative compilation-optimization experiments, using designs from ReasoningV as a starting point yields significantly better final Power, Performance, and Area (PPA) metrics than when starting with code from other LLMs. Ultimately, after refinement by CircuitMind, 55.6% of the implementations reach or surpass the efficiency of top human experts. This work presents the first end-to-end solution to systematically overcome the challenges of both generation and optimization, paving the way for fully automated, high-quality circuit implementation. The related code has been released on GitHub: ReasoningV (https://github.com/BUAA-CLab/ReasoningV) and CircuitMind (https://github.com/BUAA-CLab/CircuitMind).
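For reference, pass@1 figures such as the 57.8% quoted above are conventionally computed with the standard unbiased pass@k estimator; a minimal sketch (the sample counts in the assertion are arbitrary):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): given n samples per
    problem of which c pass, pass@k = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# pass@1 reduces to the plain pass fraction c/n:
assert abs(pass_at_k(20, 11, 1) - 0.55) < 1e-12
```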
  • Chen Guanfu, Lan Xiaolei, Chen Zhencheng, Zhang Yihao, Chen Linliang, Li Sai
    Accepted: 2026-01-09
    Field-Programmable Gate Arrays (FPGAs) are key platforms for edge computing and have broad potential in edge-side image recognition. This paper presents an embedded system based on a domestically produced FPGA and a self-designed ASIC-like architecture for real-time edge deployment. On the software side, a lightweight neural network named NexusEdgeNet is proposed. It achieves 94.22% accuracy on 39 farmland disease categories with only 0.184 MB of parameters. On the hardware side, an ASIC-like accelerator fully described in Verilog is designed. It adopts a distributed on-chip memory structure, eliminating external memory access, and supports arbitrary-shaped convolution, pooling, and fully connected operations. Several optimization techniques are applied, including near-memory parallel computing, pipelining, sliding convolution windows, and double buffering. The accelerator reaches 399 FPS inference speed on the EP6HL130 FPGA, with 85% resource utilization and significantly reduced logic consumption. The system integrates image acquisition, processing, and display, supporting real-time video stream recognition. It maintains high accuracy while achieving excellent real-time performance and resource efficiency. This work provides a practical, low-cost solution for edge computing applications based on domestic FPGAs.
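A small NumPy sketch of the sliding-convolution-window pattern such an accelerator exploits (CNN-style cross-correlation, no kernel flip); it models the arithmetic only, not the pipelined, double-buffered hardware:

```python
import numpy as np

def conv2d_sliding(img: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Valid CNN-style convolution via an explicit sliding window -- the
    data-reuse pattern a line-buffered hardware window exploits."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1), dtype=np.int32)
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = int(np.sum(img[r:r + kh, c:c + kw] * k))
    return out

img = np.arange(25, dtype=np.int32).reshape(5, 5)
k = np.ones((3, 3), dtype=np.int32)
assert conv2d_sliding(img, k).shape == (3, 3)
```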
  • Accepted: 2026-01-09
    To address the high-precision requirements for resolver excitation current peak control, this paper designs and implements a high-precision excitation current peak control system based on distributed Delay-Locked Loop (DLL) timing control. By integrating a 7-stage low-offset, high-bandwidth Programmable Gain Amplifier (PGA), a 13-bit hybrid-timing-logic-based Successive Approximation Register Analog-to-Digital Converter (SAR ADC), a 12-bit digital Sinusoidal Pulse Width Modulation (SPWM) module employing a bipolar modulation architecture, and three sets of DLL timing control circuits on-chip, a complete closed-loop excitation current control system is constructed. This system achieves precise sampling and dynamic adjustment of the excitation current peak. Experimental results demonstrate that the designed PWM waveform output achieves a resolution of 1.3 ns and an excitation current peak error of less than ±0.934% at a target current of 400 mA, providing an effective solution for resolver excitation current peak control.
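A sketch of how a 12-bit SPWM compare table is conventionally derived from a sinusoidal reference; the exact on-chip mapping and modulation depth are assumptions:

```python
import math

def spwm_table(n_points: int = 256, m: float = 0.9, bits: int = 12) -> list[int]:
    """12-bit bipolar SPWM compare table: the sinusoidal reference is mapped
    to duty = (1 + m*sin)/2 and scaled to the carrier counter range."""
    full = (1 << bits) - 1
    return [round(full * (1.0 + m * math.sin(2.0 * math.pi * i / n_points)) / 2.0)
            for i in range(n_points)]

table = spwm_table()
assert 0 <= min(table) and max(table) <= 4095
```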
  • Accepted: 2026-01-09
    To address the near-field magnetic radiation interference generated by insulated-gate bipolar transistor (IGBT) power modules during high-speed switching, this paper combines simulation and experiment to systematically study the spatial magnetic-field distribution inside the module. First, based on magnetic vector potential (MVP) theory, a self-developed finite-element solver is used to build and simulate a three-dimensional electromagnetic model of a GCV900-series IGBT module, analyzing the magnetic-field distribution inside the module at different frequencies. A three-phase reactive-power test platform is then set up, and the magnetic field at the surface of IGBT chips at different positions in the module is measured with a high-precision near-field magnetic probe. Both simulation and measurement show that the near-field magnetic radiation is unevenly distributed inside the module: the IGBT chip regions near the DC input, at the core of the main commutation path, experience the strongest magnetic radiation, while the chip regions near the AC output are least affected. This study reveals the distribution of magnetic-field radiation inside IGBT modules and provides a theoretical basis and data support for the electromagnetic-compatibility optimization of power modules and the suppression of near-field coupling interference.
  • LIU Mingjian, YAN Konghan, WANG Jiaqi, FENG Chaochao, SUI Bingcai
    Accepted: 2026-01-09
    Wafer manufacturing involves multi-module coordination and strong temporal constraints. Traditional scheduling methods struggle in high-mix production scenarios due to poor adaptability and difficulty in handling complex constraints. To address the dynamic scheduling challenges under strict Just-in-Time (JIT) constraints, this paper proposes an efficient dynamic scheduling scheme named GA-JIT Scheduler, based on a genetic algorithm. The approach models equipment and processes as directed graphs and encodes JIT and other complex constraints into a fitness function. By integrating time-window detection with genetic evolution strategies, a "perception-decision-execution" closed-loop tuning mechanism is constructed to enable rapid responses to dynamic disturbances. The GA-JIT Scheduler is verified on 4 differentiated scheduling tasks from the "9th National Innovation Competition (BeiFang HuaChuang Cup)", with measured solution times of 93256.5 s, 15311.5 s, 13013.5 s, and 18470 s, respectively. The algorithm satisfies constraints such as equipment exclusivity and the JIT limits (movement time ≤ 30 s, residence time ≤ 15 s) and adapts to multiple scenarios, demonstrating its engineering applicability and scalability for dynamic scheduling in wafer manufacturing under strict JIT constraints and providing a feasible solution for high-mix, strongly time-constrained wafer manufacturing.
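A toy sketch of the genetic-algorithm-with-penalty idea, with the JIT limits folded into the fitness; the encoding, simulator, and operators here are placeholders, since the paper's directed-graph model is not reproduced:

```python
import random

# Toy GA with JIT constraints as fitness penalties (all details hypothetical).
MOVE_LIMIT, RESIDENCE_LIMIT = 30.0, 15.0

def simulate(order):
    """Stand-in simulator: returns (makespan, move_times, residence_times)."""
    t, moves, stays = 0.0, [], []
    for job in order:
        t += 10.0 + job                 # fake processing time
        moves.append(5.0 + job % 7)     # fake wafer-transfer time
        stays.append(3.0 + job % 5)     # fake residence time
    return t, moves, stays

def fitness(order):
    makespan, moves, stays = simulate(order)
    penalty = sum(max(0.0, m - MOVE_LIMIT) for m in moves) \
            + sum(max(0.0, s - RESIDENCE_LIMIT) for s in stays)
    return makespan + 1e3 * penalty     # lower is better

def evolve(n_jobs=8, pop_size=20, generations=100):
    pop = [random.sample(range(n_jobs), n_jobs) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = pop[:2]                                    # elitism
        while len(nxt) < pop_size:
            a, b = random.sample(pop[:pop_size // 2], 2) # truncation selection
            cut = random.randrange(1, n_jobs)            # order crossover
            child = a[:cut] + [j for j in b if j not in a[:cut]]
            if random.random() < 0.2:                    # swap mutation
                i, j = random.sample(range(n_jobs), 2)
                child[i], child[j] = child[j], child[i]
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```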
  • Accepted: 2026-01-09
    In high-reliability applications such as aerospace, satellite communication, and nuclear control systems, multiple node upsets (MNUs) induced by radiation have become a major threat to the stability of static random access memory (SRAM). In recent years, to address the double node upset (DNU) issue, various radiation-hardened-by-design (RHBD) structures have been proposed and extensively studied, including S8P8N, QUCCE12T, SARP12T, HRLP16T, RH20T, S6P8N, and RH14T.
  • 王 浚蘅, 张 俊昌, 周 乐成, 沈 祺, 林 向阳, 兰 馗博
    Accepted: 2026-01-09
    Automatic Test Equipment (ATE) for integrated circuits is a core device used to verify the functionality and performance of chips. Traditional testing methods suffer from limitations such as low efficiency and insufficient precision. To address these issues, this study proposes an automatic testing scheme based on the ST3020 ATE, which features automation, high efficiency, high precision, a wide measurement range, strong flexibility, and excellent scalability. Taking the UC2625 chip as the test object, automatic test code was developed at the software level and an interface printed circuit board (PCB) was designed at the hardware level. By integrating techniques such as cyclic testing, array storage, and data comparison, a systematic study was conducted on the logical functions and key parameter indicators of the chip, ultimately realizing a complete ATE automatic testing scheme. The test results are consistent with the specifications in the chip datasheet and meet the requirements of practical testing. This scheme offers a valuable exploration of automatic testing methods, provides a reference for the independent development of ATE testing technology in China, helps advance the semiconductor testing industry, and supports technological self-reliance in China's integrated circuit field.
  • Wang Zhipeng, Li Wenbin, Li Guoyong
    Accepted: 2026-01-09
    The global issue of “garbage encircling cities” is intensifying, making intelligent waste sorting a research hotspot for tackling this challenge. However, embedded platforms commonly face the trade-off dilemma of “limited computing power - high real-time requirements - optimal recognition accuracy.” Traditional approaches struggle to meet practical demands: cloud-based architectures suffer from high latency due to data transmission, pure embedded architectures lack sufficient computing power, and cloud-edge collaborative architectures still exhibit interaction delays. This paper proposes a heterogeneous collaborative computing architecture based on FPGA-STM32. The FPGA handles image preprocessing and parallel convolution computations, while the STM32 manages fully connected layer operations and classification decisions. Concurrently, a lightweight convolutional neural network is optimized through pruning into a “single convolution layer + three fully connected layers” structure, incorporating INT16 quantization and clipping mechanisms to balance accuracy and hardware adaptability. Experiments demonstrate that the system achieves an 83.33% accuracy rate in identifying ten categories of household waste. Compared to the MATLAB platform, it accelerates inference by 15.675 times with a processing latency of only 40.004 ms. The low FPGA core resource utilization enables efficient deployment in embedded waste sorting scenarios such as communities and households.
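The INT16 quantization-with-clipping step admits a compact sketch under the common symmetric per-tensor scheme (an assumption; the paper may calibrate its scale differently):

```python
import numpy as np

def quantize_int16(x: np.ndarray, scale: float) -> np.ndarray:
    """Symmetric INT16 quantization with saturation (clipping)."""
    q = np.round(x / scale)
    return np.clip(q, -32768, 32767).astype(np.int16)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.02, -1.5, 3.9], dtype=np.float32)
scale = float(np.max(np.abs(w))) / 32767   # per-tensor scale from the max weight
assert np.allclose(dequantize(quantize_int16(w, scale), scale), w, atol=scale)
```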
  • HU Xianghong, YIN Feiyue, LIANG Kelong, FENG Zhaozhang, LIN Yuanmiao, CAI Shuting, Xiong Xiaoming
    Accepted: 2026-01-09
    With the rapid development of artificial intelligence and deep learning applications, tensor computing urgently demands high-efficiency and multi-precision computing hardware accelerators. Traditional general-purpose processors face energy efficiency bottlenecks when processing large-scale matrix multiplication operations, while existing dedicated accelerators often lack flexibility in supporting diverse data precision and hybrid computing modes. This paper presents a multi-precision and mixed-precision tensor processing unit (TPU), designed based on a reconfigurable architecture, which supports five data formats (INT4, INT8, FP16, BF16, FP32) and two hybrid modes (FP16+FP32, BF16+FP32). It is capable of efficiently performing matrix multiplication and accumulation across three different dimensions (m16n16k16, m32n8k16, m8n32k16). By incorporating a reconfigurable computing array, dynamic data flow control, multi-mode buffer design, and a unified floating-point processing unit, the design achieves high hardware reuse and significantly improved computational efficiency. Synthesized on the VCU118 FPGA platform at 251.13 MHz, it delivers a peak theoretical performance of 257.16 GOPS/GFLOPS (INT4/INT8/FP16/BF16) and 64.29 GFLOPS (FP32). This design is well-suited for applications such as deep learning inference, autonomous driving, and medical imaging, where both computational efficiency and flexibility are critical.
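A numerical model of one m16n16k16 tile in the FP16+FP32 mixed mode (FP16 operands, FP32 accumulation); this sketches the arithmetic contract only, not the reconfigurable RTL:

```python
import numpy as np

def mma_m16n16k16(a16: np.ndarray, b16: np.ndarray, c32: np.ndarray) -> np.ndarray:
    """One m16n16k16 tile in FP16+FP32 mixed mode: FP16 inputs are widened
    and accumulated in FP32, as is conventional for such hybrid modes."""
    assert a16.shape == (16, 16) and b16.shape == (16, 16) and c32.shape == (16, 16)
    return a16.astype(np.float32) @ b16.astype(np.float32) + c32

a = np.random.rand(16, 16).astype(np.float16)
b = np.random.rand(16, 16).astype(np.float16)
c = np.zeros((16, 16), dtype=np.float32)
d = mma_m16n16k16(a, b, c)   # FP32 result tile
```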
  • Accepted: 2026-01-09
    In response to the lack, in the existing literature, of a universal architecture for multi-robot collaboration systems suited to FPGA platforms, this paper introduces an FPGA-based multi-robot collaboration system. The system builds an inter-robot communication network on the UART protocol and uses a purpose-designed information-transmission mechanism that enables the FPGA to receive data from, and control, all robots. To cope with abnormal situations, an alarm mechanism and a remote-control mode are also provided. A multi-robot collaboration system was physically built and its functions verified one by one, confirming the feasibility of the proposed design concept.
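One plausible UART frame layout for such a network (header, robot ID, payload length, payload, XOR checksum) is sketched below; every field is an assumption, as the paper does not specify its framing:

```python
# Hypothetical UART framing for the robot network (all fields assumed).
HEADER = 0xAA

def build_frame(robot_id: int, payload: bytes) -> bytes:
    body = bytes([HEADER, robot_id, len(payload)]) + payload
    checksum = 0
    for b in body:
        checksum ^= b          # XOR checksum over header + fields + payload
    return body + bytes([checksum])

def parse_frame(frame: bytes):
    *body, checksum = frame
    assert len(body) >= 3 and body[0] == HEADER, "bad frame"
    calc = 0
    for b in body:
        calc ^= b
    assert calc == checksum, "checksum mismatch"
    robot_id, length = body[1], body[2]
    return robot_id, bytes(body[3:3 + length])

rid, data = parse_frame(build_frame(2, b"GO"))
assert (rid, data) == (2, b"GO")
```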
  • Accepted: 2026-01-09
    Aiming at the industry pain points of traditional water-quality monitoring equipment, such as high cost, redundant power consumption, weak data-collaboration capability, and insufficient scalability, this paper designs and implements a low-power water-quality monitoring and cloud-collaboration system based on the Phytium Pi CEK8903. The system uses the fully domestic Phytium Pi development board as the core control unit, integrates a TS-200 pH sensor, a TS-300B turbidity sensor, and a DS18B20 temperature sensor to build a sensing network, and achieves hardware-level low-power optimization through Dynamic Voltage and Frequency Scaling (DVFS). Locally, the software drives the OLED display module over bit-banged I2C and exchanges data between the Phytium Pi and an Arduino over UART; in the cloud, a service with separated front and back ends is built on a Flask + Socket.IO architecture, and full-link data synchronization across edge device, cloud platform, and user terminal is achieved over the HTTP and Socket.IO protocols. The system supports plug-and-play sensor expansion and a hierarchical fault-tolerance mechanism. Users can obtain real-time core parameters such as pH, turbidity, and water temperature through the web terminal (PC or mobile) or the local OLED screen, and multi-level early warnings are triggered when parameters are abnormal. Verified by 72-hour laboratory calibration and field-simulation tests: standby power consumption is as low as 0.48 W (about 1/10 of traditional equipment), the relative error of pH measurement is ≤ 0.78%, the turbidity detection accuracy is ±30 NTU, the full-link data-transmission delay is ≤ 1 s, and the packet-loss rate is < 0.1%. The system overcomes the energy constraints and data-silo bottlenecks of long-term field monitoring, providing a low-cost, high-reliability domestic solution for river water-quality supervision, refined aquaculture management, and industrial-wastewater discharge monitoring, with significant practical value and promotion prospects. Keywords: Phytium Pi CEK8903; Low Power Consumption; Water Quality Monitoring; Cloud Collaboration; Socket.IO; Dynamic Voltage and Frequency Scaling
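A minimal Flask + Socket.IO push loop in the spirit of the described edge-to-cloud synchronization; the event name and payload fields are placeholders, not the paper's protocol:

```python
# Sketch of a cloud endpoint broadcasting sensor samples to web clients,
# using the public flask-socketio package (event schema assumed).
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins="*")

def publish_reading(ph: float, turbidity_ntu: float, temp_c: float) -> None:
    # Broadcast one sample to every connected web client.
    socketio.emit("water_quality", {"ph": ph, "ntu": turbidity_ntu, "temp": temp_c})

@socketio.on("connect")
def on_connect():
    publish_reading(7.0, 12.0, 18.5)   # greet new clients with a sample

if __name__ == "__main__":
    socketio.run(app, host="0.0.0.0", port=5000)
```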