
Just Accepted

Note: The articles listed below have been peer-reviewed and accepted for publication in this journal but have not yet been scheduled for a specific issue. Their content and layout may change slightly in the final published version, which should be treated as definitive. The journal has assigned each article a unique, persistent DOI, which may be used to cite the article directly.
  • SHI Huanhuan, LI Sujuan, BAO Zhong
    Accepted: 2026-06-17
    Time-Sensitive Networking (TSN) offers low latency, low cost, and high reliability. The Time-Aware Shaper (TAS), defined by the IEEE 802.1Qbv standard, is one of the key technologies for deterministic network traffic and occupies a core position in the TSN protocol suite. First, a TAS implementation method is proposed, covering the top-level architecture, workflow, and active-standby coordination logic. Second, constraints are constructed for traffic of different priorities, and a deterministic scenario is designed to verify clock synchronization and the time-aware shaper. Finally, comparative experiments with TAS disabled and enabled confirm the advantage of TSN over standard Ethernet in latency and jitter. The results show that, with clock synchronization and gate control enabled, the average delay and jitter fall to only about 1/20 to 1/15 of their original values. This work provides a demonstration solution to support the rapid adoption of TSN technology.
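As an illustration of the gate mechanism that IEEE 802.1Qbv defines, the Python sketch below evaluates a cyclic gate control list against a synchronized clock. The two-entry list, the 1 ms cycle, and the function names are hypothetical and are not taken from the paper's implementation; guard bands are ignored.

```python
# Hypothetical gate control list (GCL): (duration_ns, gate_mask) entries executed
# cyclically; bit i of gate_mask set means the transmission gate of traffic class i is open.
GCL = [
    (200_000, 0b1000_0000),  # protected window: only class 7 (time-critical) may transmit
    (800_000, 0b0111_1111),  # remaining window: classes 0-6 share the link
]
CYCLE_NS = sum(duration for duration, _ in GCL)  # 1 ms cycle


def gate_mask_at(t_ns: int, base_time_ns: int = 0) -> int:
    """Gate mask in force at time t_ns on the synchronized network clock."""
    offset = (t_ns - base_time_ns) % CYCLE_NS
    for duration, mask in GCL:
        if offset < duration:
            return mask
        offset -= duration
    raise AssertionError("unreachable: offset always falls inside the cycle")


def gate_open(traffic_class: int, t_ns: int) -> bool:
    """True if frames of traffic_class may start transmission at t_ns (guard bands ignored)."""
    return bool((gate_mask_at(t_ns) >> traffic_class) & 1)
```

Because every bridge evaluates the same list against a shared time base, the schedule is only meaningful once clock synchronization is enabled, which is why the experiments enable both functions together.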
  • SU He, TANG Wei, WANG Yijing, CHEN Lin
    Accepted: 2026-06-17
    To meet the power supply needs of noise-sensitive electronic devices such as mobile phone cameras and Bluetooth modules, a low-dropout regulator (LDO) with a high power supply rejection ratio (PSRR) and an ultra-low dropout voltage is designed. The circuit uses an N-type MOSFET (NMOS) as the pass transistor and is powered by dual supplies: separating the bias supply from the input supply of the pass transistor yields both a high PSRR and an ultra-low dropout voltage. A pre-regulation modulation circuit and a low-pass filter condition the reference voltage, improving the PSRR with respect to the bias supply. The main loop adjusts the system's pole-zero distribution through inverse nested Miller compensation, improving the overall PSRR of the circuit. The circuit and layout were designed in a 0.18 µm CMOS process, with a maximum load current of 500 mA. Simulation results show a dropout voltage of 100 mV at the maximum load current. At a 10 mA load, the PSRR with respect to the input supply is -110 dB, -90 dB, -70 dB, and -65 dB at 100 Hz, 1 kHz, 10 kHz, and 1 MHz, respectively.
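The PSRR figures above are easier to interpret as ripple attenuation ratios. A small Python sketch of the arithmetic; the 100 mV input ripple is an arbitrary example value, and the effective pass-device resistance is simply the quoted dropout divided by the maximum load current.

```python
def psrr_db_to_ratio(psrr_db: float) -> float:
    """Convert PSRR in dB (negative = rejection) to an output/input ripple amplitude ratio."""
    return 10 ** (psrr_db / 20)


# PSRR values quoted in the abstract (input supply, 10 mA load), keyed by frequency in Hz
PSRR_DB = {100: -110, 1_000: -90, 10_000: -70, 1_000_000: -65}


def output_ripple_uv(input_ripple_mv: float, psrr_db: float) -> float:
    """Ripple (in microvolts) remaining at the LDO output for a given input ripple (mV)."""
    return input_ripple_mv * 1e3 * psrr_db_to_ratio(psrr_db)


# 100 mV dropout at 500 mA -> effective on-resistance of the NMOS pass device
R_ON_OHM = 0.100 / 0.500

for f_hz, db in PSRR_DB.items():
    print(f"{f_hz:>9} Hz: {db} dB -> {output_ripple_uv(100.0, db):.3f} uV from 100 mV ripple")
```

For example, -110 dB corresponds to an amplitude ratio of about 3.2×10⁻⁶, so 100 mV of supply ripple at 100 Hz leaves roughly 0.3 µV at the output.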
  • ZHANG Jian
    Accepted: 2026-06-17
    Spiking neural networks (SNNs) represent and process information using discrete spike trains, exhibiting event-driven and sparse computation characteristics and offering strong potential for energy-efficient intelligent computing. To fully exploit their low-power advantages, dedicated hardware implementation techniques and neuromorphic chips for SNNs are of significant research importance. From the perspective of hardware implementation, this paper systematically reviews the full hardware development chain of SNNs, covering neuron models, information encoding methods, network structures, and hardware architectures. The paper describes commonly used spiking neuron models and information encoding methods, discusses network structures including fully connected, convolutional, and attention-based architectures, systematically summarizes research on hardware architectures, compares the advantages and limitations of different circuit implementation technologies and computing paradigms, and analyzes current key challenges while outlining future research directions.
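The leaky integrate-and-fire (LIF) model is the simplest of the spiking neuron models such reviews cover. A forward-Euler Python sketch, with illustrative parameter values that are not tied to any particular chip, shows the integrate, fire, and reset cycle that SNN hardware implements at each time step:

```python
def lif_simulate(input_current, v_rest=0.0, v_thresh=1.0, v_reset=0.0, tau=20.0, dt=1.0):
    """Leaky integrate-and-fire neuron, forward-Euler discretized.

    Membrane dynamics: tau * dV/dt = -(V - v_rest) + I(t).
    Returns the list of time-step indices at which the neuron spiked.
    """
    v = v_rest
    spikes = []
    for step, i_in in enumerate(input_current):
        v += (dt / tau) * (-(v - v_rest) + i_in)  # leak toward rest + integrate input
        if v >= v_thresh:                          # fire when threshold is reached
            spikes.append(step)
            v = v_reset                            # reset membrane after the spike
    return spikes
```

With a constant input of 2.0 and τ = 20 steps, the membrane potential follows V = 2(1 − 0.95ⁿ) and crosses the threshold on the 14th update, so the neuron fires periodically; with an input of 0.5 it settles at 0.5 and never fires. Event-driven hardware exploits exactly this sparsity of output events.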
  • Accepted: 2026-06-17
    RT-Thread is an open-source embedded real-time operating system (RTOS) that is widely used in the Internet of Things (IoT) domain thanks to its rich components and excellent tailorability. TH4001 is a 32-bit RISC-V processor chip targeting lightweight embedded control scenarios. It has limited on-chip static random-access memory (SRAM) and lacks standard hardware peripherals such as a universal asynchronous receiver/transmitter (UART). No mature solution yet exists for porting RT-Thread to the TH4001 platform. To fill this gap, this paper presents a complete RT-Thread porting scheme for the TH4001 chip that focuses on the constraints of insufficient SRAM capacity and the missing UART peripheral. The scheme uses a hybrid loading-and-running architecture for system booting and uses the programmable I/O (PIO) module to emulate a UART interface for console output. Experimental results show that the ported RT-Thread system boots successfully, runs the user main thread, and performs context switching correctly. This work provides a reference for porting an RTOS to similarly resource-constrained RISC-V chips.
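Emulating a UART with a PIO block amounts to shifting out a fixed bit pattern at a fixed rate. The Python sketch below models only that bit-level behavior for the common 8N1 format; the clock and baud values are hypothetical, and TH4001's actual PIO instruction set is not modeled.

```python
def uart_8n1_frame(byte: int) -> list[int]:
    """Line levels for one 8N1 UART frame: start bit (0), 8 data bits LSB first, stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def clocks_per_bit(sys_clk_hz: int, baud: int) -> int:
    """State-machine clock cycles to hold each bit on the pin (rounded to an integer)."""
    return round(sys_clk_hz / baud)


def tx_waveform(text: str, sys_clk_hz: int, baud: int) -> list[int]:
    """Expand a string into per-clock pin levels, as a PIO-style shifter would drive them."""
    cpb = clocks_per_bit(sys_clk_hz, baud)
    wave: list[int] = []
    for byte in text.encode("ascii"):
        for bit in uart_8n1_frame(byte):
            wave.extend([bit] * cpb)
    return wave
```

For example, at a hypothetical 1 MHz state-machine clock and 115200 baud, `clocks_per_bit` returns 9, giving a bit period about 3.7% longer than nominal, so the achievable baud-rate accuracy depends on the PIO clock.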
  • Chen Cheng, Chen Guang Wei, Chen Wen Tao, Sun Chen Yang
    Accepted: 2026-06-15
    To address the excessive size, high link loss, and weak electromagnetic interference immunity of traditional FMCW radar transceiver systems in miniaturized devices, this paper designs and implements a highly integrated FMCW radar transceiver SiP module operating in the 4200-4400 MHz band for aircraft altimeters. Based on system-in-package (SiP) technology, the module uses a 4-layer BT substrate for heterogeneous integration of multiple chips and integrates a transmit chain, a receive chain, and a clock generator. A DDS generates 500-700 MHz signals, which are upconverted to the target band in a single step. The module supports transmit power control, receive gain adjustment, and configuration of the frequency-sweep waveform. It is designed for a transmit output power of 20 dBm, a maximum receive chain gain of 80.5 dB, and transmit-receive isolation of at least 90 dB, in a compact 14 mm × 14 mm package. Test results show a transmit power flatness within 1 dB, a noise figure of only 4.08 dB at maximum gain, and an FMCW signal linearity of 0.012%. The module achieves kilometer-level altitude detection and meets aircraft requirements for miniaturization, low power consumption, and high signal stability.
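For readers checking the altimeter figures, the standard linear-FMCW relation between beat frequency and range is R = c·f_b·T/(2B). A Python sketch, assuming a sawtooth sweep over the full 200 MHz of the 4200-4400 MHz band, a hypothetical 1 ms sweep time, and no Doppler shift:

```python
C = 299_792_458.0  # speed of light, m/s


def beat_frequency(range_m: float, bandwidth_hz: float, sweep_time_s: float) -> float:
    """Beat frequency of a linear sawtooth FMCW sweep for a target at range_m."""
    slope = bandwidth_hz / sweep_time_s   # chirp slope, Hz/s
    return slope * 2.0 * range_m / C      # slope times round-trip delay


def range_from_beat(f_beat_hz: float, bandwidth_hz: float, sweep_time_s: float) -> float:
    """Invert the relation: R = c * f_b * T / (2 * B)."""
    return C * f_beat_hz * sweep_time_s / (2.0 * bandwidth_hz)


def range_resolution(bandwidth_hz: float) -> float:
    """Classic FMCW range resolution c / (2B)."""
    return C / (2.0 * bandwidth_hz)
```

Under these assumptions, a target at 1 km produces a beat frequency of about 1.33 MHz and the range resolution is about 0.75 m; the chirp linearity quoted above matters because any deviation from a constant slope smears this beat tone.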
  • Accepted: 2026-06-11
    QuestaSim/ModelSim is a widely used simulation platform for FPGA verification, and workflow automation has proven effective in improving verification productivity. The standard QuestaSim/ModelSim simulation flow consists of six sequential stages: compilation, optimization, design elaboration and loading into the VSIM kernel, execution and debugging, context saving, and platform exit. Existing automation scripts mostly use the basic commands documented in the QuestaSim/ModelSim Command Reference Manual to run this pipeline in batch mode. With these scripts, switching to a different testcase requires exiting QuestaSim/ModelSim, editing the script to select the target testcase, and repeating the entire flow, so running many testcases means launching and exiting the simulator many times. Moreover, the current flow has no dedicated handling for common simulation-time errors. To address these issues, this paper proposes an interactive, automated simulation workflow implemented in Tcl/Tk, featuring GUI-based testcase selection and comprehensive error handling. Experimental results show that it allows seamless testcase switching without exiting the simulator and handles common errors, significantly reducing manual steps. This work advances automation in multi-testcase scenarios and offers a practical path toward more automated simulation in FPGA verification.
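The key idea, switching testcases without leaving the simulator, can be illustrated by generating the per-testcase command sequence from a script. The Python sketch below emits standard QuestaSim commands (`vlog`, `vopt`, `vsim`, `run`, `quit -sim`); the `+TESTNAME` plusarg, the `_opt` naming convention, and the file names are hypothetical, and the paper's own tool is written in Tcl/Tk rather than Python.

```python
def build_testcase_script(top: str, sources: list[str], testcase: str) -> str:
    """Return a do-file body that reruns one testcase inside an already open QuestaSim session.

    Hypothetical conventions: the testbench reads the testcase name from a
    +TESTNAME plusarg, and the optimized design is named '<top>_opt'.
    """
    opt_name = f"{top}_opt"
    lines = [
        "quit -sim",                                # end the previous simulation, keep the tool open
        f"vlog -sv {' '.join(sources)}",            # 1. compile
        f"vopt {top} -o {opt_name}",                # 2. optimize
        f"vsim {opt_name} +TESTNAME={testcase}",    # 3. elaborate and load into the kernel
        "run -all",                                 # 4. execute
    ]
    return "\n".join(lines) + "\n"


print(build_testcase_script("tb_top", ["dut.sv", "tb_top.sv"], "smoke"))
```

Sourcing the generated text with `do` from an open session reruns one testcase; `quit -sim` unloads the current design rather than exiting the tool.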
  • Accepted: 2026-06-09
    Steady-state simulation of analog integrated circuits generally relies on conventional transient analysis, which must integrate from the initial state to the steady state over a long time span. For high-Q, lightly damped circuits such as LC oscillators, this process is time-consuming and becomes a significant bottleneck in design efficiency. This paper proposes an efficient steady-state simulation method based on the shooting method, which recasts the periodic steady-state solution as a boundary value problem. A state transition function is first constructed and then solved with Newton-Raphson iteration, combined with pseudo-transient initialization and adaptive damping to ensure fast and reliable convergence. In addition, the simulation environment is built on the PyTorch framework, using automatic differentiation to compute the Jacobian matrix directly and accurately and supporting acceleration on both CPU and GPU. Experiments on three types of circuits (ring oscillators, LC voltage-controlled oscillators, and Gilbert mixers) show that the proposed method preserves accuracy while running 20 to 117 times faster than conventional methods. The speedup is most pronounced for high-Q circuits, demonstrating the method's advantage for steady-state simulation of analog integrated circuits.
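The shooting formulation can be shown on a toy problem. The Python sketch below finds the periodic steady state of a hypothetical RC low-pass filter driven by sin(t), dx/dt = −x + sin(t), by solving φ_T(x₀) − x₀ = 0 with Newton-Raphson, where φ_T is the RK4 state-transition map over one period. Two simplifications relative to the paper: the drive period is known (an oscillator's period would be an extra unknown), and the derivative comes from a finite difference instead of PyTorch automatic differentiation.

```python
import math


def rhs(t: float, x: float) -> float:
    """Hypothetical test circuit: normalized RC low-pass driven by sin(t)."""
    return -x + math.sin(t)


def transition(x0: float, period: float, steps: int = 2000) -> float:
    """State-transition map phi_T(x0): integrate one period with classical RK4."""
    h = period / steps
    t, x = 0.0, x0
    for _ in range(steps):
        k1 = rhs(t, x)
        k2 = rhs(t + h / 2, x + h * k1 / 2)
        k3 = rhs(t + h / 2, x + h * k2 / 2)
        k4 = rhs(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x


def shooting(x0: float = 0.0, period: float = 2 * math.pi,
             tol: float = 1e-10, max_iter: int = 20, eps: float = 1e-6) -> float:
    """Newton-Raphson on the periodicity condition F(x0) = phi_T(x0) - x0 = 0."""
    for _ in range(max_iter):
        residual = transition(x0, period) - x0
        if abs(residual) < tol:
            return x0
        # Finite-difference dF/dx0 (the paper obtains the Jacobian by automatic differentiation)
        slope = (transition(x0 + eps, period) - (x0 + eps) - residual) / eps
        x0 -= residual / slope
    raise RuntimeError("shooting method did not converge")
```

The exact steady state is x(t) = (sin t − cos t)/2, so `shooting()` should return x₀ ≈ −0.5 after a few Newton steps, without simulating the decaying transient.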
  • Accepted: 2026-06-04
    This work presents the design and implementation of a distributed gallium arsenide power amplifier (DPA) with integrated temperature and power sensing for high-power ultra-wideband applications, based on a 0.15-μm GaAs pHEMT process. The proposed chip consists of a distributed power amplifier (DPA), a temperature sensing unit, and a differential power detection unit. The DPA adopts a cascode architecture, which significantly increases the output power. The temperature sensing unit uses a multi-point array-based measurement scheme in which different sensing nodes can be selected through an external encoder. The differential power detection unit outputs detection voltages on the Vdet and Vref pins, effectively suppressing the influence of temperature variations on power detection accuracy. Measurement results show that, at Vgs = −0.5 V, the DPA operates from DC to 28 GHz with an output power (Pout) of 30 dBm, a power-added efficiency (PAE) of 15%, and a gain flatness better than ±1 dB. Small-signal measurements show that both the input return loss (S11) and the output return loss (S22) are better than −10 dB across the operating band. The overall chip layout area is 3.8 mm × 1.5 mm.
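The quoted 30 dBm output power and 15% PAE can be related through the definition PAE = (P_out − P_in)/P_DC. A Python sketch; the 20 dBm input drive (10 dB gain) used in the example is hypothetical, since the abstract does not report gain.

```python
def dbm_to_w(p_dbm: float) -> float:
    """Convert dBm to watts: P[W] = 10^((P[dBm] - 30) / 10)."""
    return 10 ** ((p_dbm - 30) / 10)


def pae(p_out_dbm: float, p_in_dbm: float, p_dc_w: float) -> float:
    """Power-added efficiency: RF power added by the amplifier divided by DC power drawn."""
    return (dbm_to_w(p_out_dbm) - dbm_to_w(p_in_dbm)) / p_dc_w


def dc_power_for_pae(p_out_dbm: float, p_in_dbm: float, pae_target: float) -> float:
    """DC power implied by a given PAE at the stated output and input powers."""
    return (dbm_to_w(p_out_dbm) - dbm_to_w(p_in_dbm)) / pae_target


# 30 dBm out = 1 W; with a hypothetical 20 dBm input, 15% PAE implies about 6 W of DC power
p_dc = dc_power_for_pae(30.0, 20.0, 0.15)
```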
  • Kong Linghui, Mao Jingna, Zhiwei Zhang, Yu Shan
    Accepted: 2026-06-03
    Existing FPGA matrix inversion methods struggle to balance hardware resources, computation latency, and computational accuracy. To address this problem, this paper proposes a matrix inversion architecture that combines algorithm optimization with hardware architecture design. Based on MGS-QR decomposition, the parallel vector operations in the computation flow are merged, the computation of the inverse matrix is reformulated as the accumulation of the partial sums produced in each iteration, and a vector processing unit supporting multiple vector operation modes is designed to fit the algorithm. With this architecture, an N×N real matrix can be inverted using resources that grow quadratically with the matrix size and clock cycles that grow linearly with it, and the design adapts to matrices of different sizes. Compared with the conventional MGS-QR matrix inversion method, the architecture further reduces DSP usage and significantly shortens computation latency while maintaining the same accuracy, providing an efficient and highly reliable engineering solution for real-time matrix inversion in embedded systems.
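For reference, the baseline computation that the architecture reorganizes is A⁻¹ = R⁻¹Qᵀ, with Q and R obtained by modified Gram-Schmidt (MGS). The plain-Python sketch below shows that sequential baseline; it does not model the paper's partial-sum reformulation or its vector processing unit.

```python
import math


def mgs_qr(a):
    """Modified Gram-Schmidt QR of a square matrix given as a list of rows.

    Returns (q_cols, r): q_cols[k] is column k of Q, r is upper triangular (list of rows).
    """
    n = len(a)
    v = [[float(a[i][j]) for i in range(n)] for j in range(n)]  # v[j] = column j of A
    q_cols = [[0.0] * n for _ in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(x * x for x in v[k]))
        q_cols[k] = [x / r[k][k] for x in v[k]]
        for j in range(k + 1, n):  # MGS: remove the q_k component from every remaining column now
            r[k][j] = sum(qi * vj for qi, vj in zip(q_cols[k], v[j]))
            v[j] = [vj - r[k][j] * qi for vj, qi in zip(v[j], q_cols[k])]
    return q_cols, r


def inverse_via_qr(a):
    """A^-1 = R^-1 Q^T: back-substitute R x = Q^T e_i for each column i of the inverse."""
    n = len(a)
    q_cols, r = mgs_qr(a)
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        b = [q_cols[k][i] for k in range(n)]  # column i of Q^T = row i of Q
        x = [0.0] * n
        for row in range(n - 1, -1, -1):
            s = b[row] - sum(r[row][c] * x[c] for c in range(row + 1, n))
            x[row] = s / r[row][row]
        for row in range(n):
            inv[row][i] = x[row]
    return inv
```

In MGS each new q_k is projected out of all remaining columns immediately; those per-column updates are independent of one another, which is the kind of vector-level parallelism a hardware implementation can exploit.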
  • Accepted: 2026-06-03
    In low-voltage distribution networks, the power communication module not only provides high-speed communication but also plays a crucial role in feeder identification and phase identification, for which its zero-crossing detection function is key. Intelligent verification of zero-crossing detection circuits is a challenging problem in power distribution technology. Traditional centralized verification systems for zero-crossing detection suffer from high latency, network dependency, high cost, poor maintainability, and weak data security. To address these issues, this paper proposes the design and application of an edge-computing-oriented zero-crossing detection verification system for high-speed power communication modules. The system consists of five layers: a hardware device layer, a communication interface layer, a communication protocol layer, a business logic layer, and an application software layer. The verification algorithm focuses on the networking, data reading, and error calculation processes, ultimately achieving edge-parallel computation, data storage, and conclusion output. The system currently achieves a mean time between failures (MTBF) of 264 hours, an average test duration of 5.1 seconds, and a verification accuracy of 99.96%, while reducing costs by 48% compared with traditional methods. It has been widely deployed in manufacturing plants for zero-crossing detection verification, delivering significant economic benefits and offering new insights into the application of edge computing in power systems.
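At its core, verifying zero-crossing detection means comparing the crossing instant reported by the module with an accurate reference. The Python sketch below computes the reference instant from sampled voltage with linear interpolation; the 10 kHz sampling rate, the 50 Hz test signal, and the function names are hypothetical and do not represent the paper's algorithm.

```python
import math


def rising_zero_crossings(samples, fs_hz):
    """Times (s) of rising zero crossings, refined by linear interpolation between samples."""
    times = []
    for n in range(1, len(samples)):
        prev, cur = samples[n - 1], samples[n]
        if prev < 0.0 <= cur:
            frac = -prev / (cur - prev)           # fraction of a sample period after sample n-1
            times.append((n - 1 + frac) / fs_hz)
    return times


def crossing_error_us(measured_s, reference_s):
    """Timing error in microseconds between a module's reported crossing and the reference."""
    return (measured_s - reference_s) * 1e6


# Example: 50 Hz sine with a 0.3 rad phase offset, sampled at 10 kHz for 0.1 s
fs = 10_000.0
signal = [math.sin(2 * math.pi * 50 * n / fs + 0.3) for n in range(1000)]
reference_times = rising_zero_crossings(signal, fs)
```

The error of a device under test is then `crossing_error_us(reported_time, reference_time)`, which can be checked against an acceptance limit for each module in parallel at the edge.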
  • Accepted: 2026-06-01
    To overcome the limited functionality and insufficient measurement precision of conventional current sampling modules, this study develops a high-precision current data acquisition module based on STM32-series microcontrollers. The module integrates a high-performance analog-to-digital converter (ADC) chip and a network communication interface, enabling high-precision sampling of current signals and real-time data transmission. Field tests show that the module operates stably with high data accuracy. It meets the requirements of multi-point current monitoring in industrial workshops and has considerable potential for wider engineering deployment.
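Converting raw converter codes to amperes is the one piece of arithmetic every such module needs. A Python sketch assuming a shunt resistor followed by an amplifier into a bipolar, two's-complement ADC; the 16-bit resolution, 2.5 V reference, 0.1 Ω shunt, and gain of 20 are placeholders, not values from the paper.

```python
def adc_code_to_current(code: int, bits: int = 16, vref: float = 2.5,
                        shunt_ohm: float = 0.1, gain: float = 20.0) -> float:
    """Convert a raw two's-complement ADC code (full scale +/-vref) to shunt current in amperes."""
    if not 0 <= code < (1 << bits):
        raise ValueError("code out of range for the given resolution")
    if code >= 1 << (bits - 1):               # sign-extend negative readings
        code -= 1 << bits
    volts = code * vref / (1 << (bits - 1))   # 1 LSB = vref / 2^(bits-1)
    return volts / (gain * shunt_ohm)         # V_adc = I * R_shunt * gain
```

With these placeholder values, a code of 0x4000 corresponds to 1.25 V at the ADC and therefore 0.625 A through the shunt.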
  • WANG Pengzhang, SUN Hao, LIANG Jian, WANG Haoyan
    Accepted: 2026-05-27
    In real-time embedded bare-metal systems such as relay protection devices, traditional synchronous logging schemes suffer from high interrupt response latency, poor thread safety, and significant resource consumption. This paper proposes a lightweight asynchronous logging scheme for bare-metal systems. The scheme uses a dual-buffer asynchronous write mechanism: interrupt service routines save only raw parameter values, while log formatting and storage are performed in the background main loop. Critical sections are protected by disabling interrupts, ensuring thread safety when interrupts are nested. A unified storage strategy is designed for both AMP-architecture and single-core bare-metal scenarios, supporting log level filtering, hexdump memory dumping, and power-fail-safe storage. Experimental results on a Rockchip quad-core processor show that the scheme reduces the time to record a single log entry to within 1.1 μs, a sixfold improvement over standard formatted printing interfaces, with a peak buffer occupancy of 65%. The dual-buffer design and flow control mechanism effectively guarantee the system's real-time performance and reliability. The solution is suitable for embedded bare-metal systems in fields such as relay protection, industrial control, and aerospace, where real-time performance and reliability are paramount.
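The dual-buffer idea can be sketched in a few lines of Python. This is only an analogy of the mechanism described above: `threading.Lock` stands in for disabling interrupts, `log_from_isr` plays the role of the interrupt-side call that stores raw arguments only, and the format table and class name are hypothetical.

```python
import threading


class DualBufferLogger:
    """Ping-pong log buffer: the interrupt side stores raw (fmt_id, args) tuples only;
    the main loop swaps buffers and performs the expensive string formatting."""

    FORMATS = {0: "ADC overflow ch={} val={}", 1: "trip phase={} t_us={}"}  # hypothetical

    def __init__(self, capacity: int = 64):
        self.capacity = capacity
        self.buffers = [[], []]
        self.active = 0                  # index of the buffer currently filled by "ISRs"
        self.dropped = 0                 # flow control: records rejected while buffer is full
        self._irq_lock = threading.Lock()

    def log_from_isr(self, fmt_id: int, *args) -> bool:
        with self._irq_lock:             # stands in for "disable interrupts"
            buf = self.buffers[self.active]
            if len(buf) >= self.capacity:
                self.dropped += 1
                return False
            buf.append((fmt_id, args))   # raw values only, no formatting here
            return True

    def flush_in_main_loop(self) -> list[str]:
        with self._irq_lock:             # swap buffers atomically
            full = self.active
            self.active ^= 1
        lines = [self.FORMATS[f].format(*a) for f, a in self.buffers[full]]
        self.buffers[full].clear()       # ready to become the active buffer again
        return lines
```

Because formatting happens only in `flush_in_main_loop`, the interrupt-side cost is a bounds check and a tuple append; when the active buffer is full, records are counted as dropped rather than blocking, which is one simple form of flow control.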
  • Accepted: 2026-05-25
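    This paper presents an arbitrary analog jamming waveform signal generator built on an FPGA and the wideband D/A converter chip AD9739A. The generator makes full use of the FPGA's abundant logic, RAM, DSP, and high-speed interface resources to implement functional modules such as Gigabit/10-Gigabit Ethernet, a MicroBlaze processor, and a high-speed cache, which together provide the communication functions. To meet the need for rapid switching between analog jamming waveforms, the data flow and the high-speed data read/write timing were carefully designed and matched to the high-speed D/A converter AD9739, whose sampling rate reaches 2.5 GSPS, to implement D/A conversion of simulated radar signals. The signal generator has been successfully applied in a project with good results.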
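One common approach for such generators is to precompute waveform tables in memory and stream them to the DAC. As an illustration, the Python sketch below generates a linear-FM (chirp) segment as signed DAC codes at 2.5 GSPS; the 14-bit two's-complement format, the 100-300 MHz sweep, and the 0.9 amplitude back-off are assumptions, and the AD9739A's actual data-port format and the FPGA data path are not modeled.

```python
import math


def lfm_samples(f0_hz, f1_hz, duration_s, fs_hz=2.5e9, bits=14, amplitude=0.9):
    """Linear FM (chirp) waveform quantized to signed integer DAC codes.

    Assumptions: 14-bit two's-complement codes at a 2.5 GSPS update rate.
    """
    n_samples = int(round(duration_s * fs_hz))
    k = (f1_hz - f0_hz) / duration_s            # chirp rate, Hz/s
    full_scale = (1 << (bits - 1)) - 1          # 8191 for 14 bits
    codes = []
    for n in range(n_samples):
        t = n / fs_hz
        phase = 2 * math.pi * (f0_hz * t + 0.5 * k * t * t)
        codes.append(int(round(amplitude * full_scale * math.sin(phase))))
    return codes


# A 1 us chirp from 100 MHz to 300 MHz: 2500 samples at 2.5 GSPS
table = lfm_samples(100e6, 300e6, 1e-6)
```

Switching between jamming waveforms then amounts to selecting a different precomputed table, which is why the memory read timing matters as much as the DAC rate.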
  • Accepted: 2026-05-21
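    In laser triangulation measurement systems, there is an inherent trade-off between the accuracy and the processing speed of laser stripe centerline extraction, which limits high-dynamic online 3D inspection. To address this problem, this paper proposes a hardware co-processing scheme based on a field-programmable gate array (FPGA) that aims to achieve both sub-pixel accuracy and high-dynamic inspection performance. The system uses the gray-level centroid method as its core algorithm and, through a parallel pipelined architecture, integrates image acquisition, real-time processing, and GigE Vision-based transmission on a single FPGA. The design uses parallel computing units that process 4 pixels per clock cycle, multi-stage pipeline partitioning, and a ping-pong line-buffer mechanism, which significantly increase data throughput; in addition, adaptive threshold generation combined with a high-fidelity fixed-point arithmetic strategy ensures sub-pixel accuracy without floating-point operations. Experiments with 1624×1240-pixel images at 358 fps show that the centerline extraction frame rate reaches 358.4 fps, fully synchronized with the acquisition frame rate; the mean centerline fitting residual is about 0.244 pixels, achieving sub-pixel accuracy; under manual repeated placement, the standard deviation of the height repeatability for a 2 mm gauge block is better than 2.1 μm; and the maximum measurement error over the full range is within 60 μm. This study verifies that the proposed FPGA scheme effectively reconciles speed and accuracy in laser stripe centerline extraction, offering a feasible technical path for high-speed, high-precision embedded vision measurement systems.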
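The gray-level centroid at the heart of this design computes, for each image column, the intensity-weighted mean row of the stripe. A plain-Python sketch with a fixed threshold (the paper generates its threshold adaptively and processes four pixels per clock, neither of which is modeled here); integer accumulation with a single division per column loosely mirrors the fixed-point approach:

```python
def stripe_centers(image, threshold):
    """Gray-level centroid of a roughly horizontal laser stripe, one value per column.

    image: list of rows of integer gray values. Returns a sub-pixel row coordinate
    per column, or None where no pixel exceeds the threshold.
    """
    rows, cols = len(image), len(image[0])
    centers = []
    for c in range(cols):
        num = den = 0
        for r in range(rows):
            w = image[r][c] - threshold   # subtract threshold; keep only stripe pixels
            if w > 0:
                num += r * w              # integer accumulation, no floating point
                den += w
        centers.append(num / den if den else None)  # single division per column
    return centers
```

For a column with values 0, 10, 50, 30, 0 and a threshold of 5, the weights are 5, 45, and 25 on rows 1 to 3, giving a centroid of 170/75 ≈ 2.27 rows, i.e. a sub-pixel position pulled toward the brighter lower neighbor.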
  • Accepted: 2026-05-15
    To meet the urgent demand for system-level health management in domestically produced VPX systems used in high-reliability fields such as aerospace and defense electronics, this paper designs a domestic health management module based on a collaborative Phytium processor, FPGA, and MCU architecture and integrates it deeply into the VPX system framework, enabling comprehensive, intelligent, full-life-cycle monitoring and management of the system's operating status. Based on the VITA 46 specification and the IPMI protocol, the module establishes a three-level "Phytium processor + FPGA + MCU" core control architecture whose units have clearly divided and mutually redundant roles, which significantly improves system reliability and fault tolerance and reduces operation and maintenance costs. The Phytium D2000/8 handles system-level health decision-making, data fusion, and global scheduling; the FPGA, as the acceleration and control unit, handles high-speed data acquisition, precise timing control, and custom interface expansion; and the MCU handles real-time collection of key parameters such as voltage and temperature, local fault early warning, and IPMI bus and Ethernet communication. The module integrates domestic sensors, communication interfaces, and protocols, and implements health management functions such as status monitoring, fault diagnosis, and fault prediction for key components of the VPX system through hardware-software co-design. Thanks to its modular design, standardized interfaces, and custom protocols, the module can be embedded seamlessly into existing VPX systems, improving operation and maintenance efficiency and the share of domestic components.
This solution uses domestically produced components and technical routes throughout, has independent intellectual property rights, meets the requirements of the national independent and controllable strategy, and provides a replicable, transferable technical paradigm for the localization, intelligence, and sustainable operation and maintenance of high-performance embedded systems based on the VPX architecture. Experimental and application results show that the health management module is highly reliable and monitors accurately in harsh environments, and that it has significant application value.
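As an illustration of the MCU-side monitoring described above, the Python sketch below classifies sensor readings against IPMI-style lower/upper non-critical and critical thresholds. The sensor names and limit values are hypothetical, and fault diagnosis and prediction on the Phytium processor are out of scope.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """IPMI-style threshold set (only non-critical and critical levels are modeled)."""
    lower_critical: float
    lower_noncritical: float
    upper_noncritical: float
    upper_critical: float


def classify(value: float, th: Thresholds) -> str:
    """Map one reading to 'critical', 'warning', or 'ok'."""
    if value <= th.lower_critical or value >= th.upper_critical:
        return "critical"
    if value <= th.lower_noncritical or value >= th.upper_noncritical:
        return "warning"
    return "ok"


# Hypothetical limits for a 12 V rail and a board temperature sensor
SENSORS = {
    "12V_rail": Thresholds(10.8, 11.4, 12.6, 13.2),
    "board_temp_C": Thresholds(-45.0, -40.0, 85.0, 95.0),
}


def scan(readings: dict) -> dict:
    """One MCU polling pass: classify every reading; decisions are left to the host processor."""
    return {name: classify(readings[name], th) for name, th in SENSORS.items()}
```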
  • Liangshun Wu
    Accepted: 2026-05-10
    An application-specific instruction-set processor (ASIP) acceleration core based on RISC-V is designed specifically for spiking neural networks (SNNs), with the goal of providing flexible and efficient neuron computation support for a range of low-power applications and offering a new approach to computation cores for neuromorphic chips. Based on an in-depth analysis of the parallel computation patterns of SNNs, custom instructions and extension modules are introduced to accelerate parallel state computation. SIMD (single instruction, multiple data) operations improve parallel efficiency, allowing the core to process multiple neurons or synapses simultaneously. To further increase efficiency, fixed-point computation modules and vector multipliers are incorporated, significantly speeding up SNN operations with minimal precision loss. Dedicated extension instructions for the Izhikevich (IZH) and leaky integrate-and-fire (LIF) neuron models provide direct computational support for these models. Performance evaluation across five application scenarios shows that the ASIP core significantly improves parallel state processing while maintaining SNN performance.
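For reference, the Izhikevich model targeted by the extension instructions updates two state variables per neuron per time step. The floating-point Python sketch below uses the model's published regular-spiking parameters (a = 0.02, b = 0.2, c = −65, d = 8) and a 0.5 ms Euler step; it does not reflect the ASIP's fixed-point format or SIMD lanes, which would apply this same update to several neurons at once.

```python
def izhikevich(i_in, steps=1000, dt=0.5, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Izhikevich neuron with constant input current i_in.

    v' = 0.04 v^2 + 5 v + 140 - u + I,  u' = a (b v - u);
    when v >= 30 mV: v <- c, u <- u + d. Returns the step indices of spikes.
    """
    v, u = c, b * c
    spikes = []
    for n in range(steps):
        v += dt * (0.04 * v * v + 5 * v + 140 - u + i_in)  # membrane potential update
        u += dt * a * (b * v - u)                           # recovery variable update
        if v >= 30.0:                                       # spike peak reached
            spikes.append(n)
            v, u = c, u + d                                 # after-spike reset
    return spikes
```

With I = 0 the neuron settles near its resting potential of about −70 mV and stays silent, while with I = 10 it has no equilibrium and fires tonically; this two-variable update is the kind of operation a per-neuron SIMD lane would execute.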
  • Xu Xiang, Wei Shuhua, Chen Liang, Wei Qi, Qiao Fei
    Accepted: 2026-05-07
    With the rapid development of edge artificial intelligence (AI) technologies, resource-constrained terminals impose increasingly stringent requirements on the energy efficiency of neural network inference. However, existing multi-bit neural network models and their hardware implementations generally suffer from high power consumption and complex architectures, which limit their applicability in low-power scenarios. Therefore, this work focuses on the demand for lightweight inference in edge vision tasks and proposes a low-power computing-in-memory (CIM) circuit design method for convolutional inference based on binarized neural networks (BNNs). Based on the charge-domain coupling principle, a compact 10T1C-SRAM CIM cell is designed. To accommodate the analog output characteristics, a low-offset comparator and an efficient input driver circuit are further developed. In the proposed architecture, three convolutional layers (Conv2–Conv4) of a five-layer BNN are mapped onto the CIM array to accelerate convolution operations in BNN inference. Meanwhile, to avoid frequent off-chip memory accesses during multi-layer convolution processing, a sliding-window-driven data scheduling mechanism is adopted to complete the inference process. The proposed architecture is implemented in a TSMC 180-nm CMOS process. The system achieves an average power consumption of 38.67 μW and an average energy efficiency of 243 TOPS/W, outperforming several existing low-power CIM architectures. The system enables the inference of a three-layer convolutional neural network (CNN) under microwatt-level power consumption, demonstrating significant energy-efficiency advantages.
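With weights and activations binarized to ±1, a convolution's multiply-accumulate reduces to an XNOR followed by a population count. The Python sketch below shows that identity (dot = 2·popcount(XNOR) − N); in a charge-domain CIM array the matching count is, roughly speaking, accumulated as charge and resolved by the comparator, which the sketch does not model.

```python
def binarize(xs):
    """Sign binarization: +1 for x >= 0 and -1 otherwise, encoded as bits 1 and 0."""
    return [1 if x >= 0 else 0 for x in xs]


def xnor_popcount_dot(a_bits, w_bits):
    """Dot product of two {-1, +1} vectors given as 0/1 bit lists.

    Each matching bit contributes +1 and each mismatch -1, so
    dot = 2 * popcount(XNOR(a, w)) - N.
    """
    n = len(a_bits)
    matches = sum(1 for a, w in zip(a_bits, w_bits) if a == w)  # popcount of XNOR
    return 2 * matches - n
```

For example, activations with signs (+, −, +, −) against weights (+, +, −, −) give products +1, −1, −1, +1, so the dot product is 0, which is exactly what two matches out of four yield.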
  • Accepted: 2026-05-07
    In the toner cartridge chip of a laser printer, the 32-bit data processed by the RF front end and ADC must be encrypted to secure data transmission and storage. To address the limited hardware resources, long critical-path delay, and constrained throughput of the drum chip, this paper uses an FPGA as the hardware prototyping platform and applies resource constraints during synthesis and implementation to emulate an embedded gate-level resource budget. Co-optimization targets three key modules: data padding, message expansion, and the compression function. First, a pipelined dynamic padding structure driven by finite state machines replaces traditional serial padding logic; second, a shift-register-based sliding-window pipeline overlaps message expansion with compression in time; third, a 4:2 compressor structure on the compression function's critical path restructures the multi-operand accumulation logic, shortening the critical path and reducing the clock cycles per compression. The design was validated on a Xilinx Zynq-7000 series FPGA, achieving a throughput of 35.56 Gb/s at a maximum operating frequency of 139 MHz while using 1492 slices. Compared with similar approaches, it achieves a better trade-off between throughput and resource overhead, making it suitable for resource-constrained applications that demand both performance and resource efficiency.
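The abstract does not name the hash algorithm; its padding, message-expansion, and compression structure matches Merkle-Damgård hashes such as SHA-256 or SM3, which share the padding rule sketched below in Python (this identification is an assumption). For the fixed 32-bit input described above, the padded message is always exactly one 512-bit block.

```python
import struct


def md_pad(message: bytes) -> bytes:
    """Merkle-Damgard padding used by SHA-256 and SM3: append 0x80, then zero bytes
    until the length is 56 mod 64, then the 64-bit big-endian message length in bits."""
    bit_len = 8 * len(message)
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", bit_len)
    return padded


block = md_pad(b"\x12\x34\x56\x78")  # a 32-bit word -> one 64-byte (512-bit) block
```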
  • Accepted: 2026-04-29
    To address the low efficiency and high false positive rates of traditional SSI absolute encoder testing, this paper presents an FPGA-based automated defect detection system. The system adopts a dual-processor architecture that combines an FPGA with an STM32 microcontroller. The FPGA performs SSI communication parsing, data acquisition, and real-time detection of anomalies such as code skipping, code looping, and dead codes, while the STM32 handles stepper motor control and user interaction. An adaptive threshold algorithm based on encoder increments, combined with caching logic, enables accurate and timely analysis of encoder outputs. Experimental results show that the system achieves over 99% detection accuracy at a communication rate of 2 MHz, with strong interference immunity and compatibility with multiple encoder types. The system offers a cost-effective, high-performance solution for efficient and reliable encoder inspection and quality control in industrial applications.
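Many SSI absolute encoders transmit Gray-coded positions, so a checker would first convert each word to binary before comparing successive samples. The Python sketch below shows the conversion and a simple increment check; the fixed `max_step` threshold stands in for the paper's adaptive threshold, loop detection is omitted, and the event names are hypothetical.

```python
def gray_to_binary(g: int) -> int:
    """Convert a Gray-coded position word to plain binary (XOR of all right shifts)."""
    b = g
    shift = 1
    while (g >> shift) > 0:
        b ^= g >> shift
        shift += 1
    return b


def check_positions(positions, resolution_bits, max_step):
    """Flag anomalies in successive absolute positions sampled while the shaft turns slowly.

    'stuck': no change between samples (possible dead code)
    'skip' : increment magnitude larger than max_step
    Wrap-around at full scale is handled with modular arithmetic.
    """
    full = 1 << resolution_bits
    events = []
    for i in range(1, len(positions)):
        step = (positions[i] - positions[i - 1]) % full
        if step > full // 2:              # interpret as a backward move
            step -= full
        if step == 0:
            events.append((i, "stuck"))
        elif abs(step) > max_step:
            events.append((i, "skip"))
    return events
```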
  • Accepted: 2026-04-27
    To address the high cost and long downtime of manual maintenance for aerospace sealed-cabin equipment, this paper proposes a reliable over-the-air (OTA) upgrade solution. The method combines a dual-image backup mechanism with integrity verification and adaptive network protocols, enabling secure firmware updates of up to 16 MB and automatic rollback on failure. Experimental results show a misdetection probability below 9.32×10⁻¹⁰, an approximately doubled Flash lifespan, a 99.9% upgrade success rate, and 100% fault recovery. Compared with traditional manual methods, the approach reduces maintenance time from hours to about one minute and cuts costs by more than 90%, effectively resolving key maintenance challenges.
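The dual-image rollback logic can be sketched as A/B slots in Python. The CRC-32 integrity check (via `zlib.crc32`), the two-slot layout, and the class and method names are assumptions for illustration; the abstract does not say which verification code yields its quoted misdetection probability.

```python
import zlib


class DualImageFlash:
    """Hypothetical A/B firmware slots: a new image is written to the inactive slot,
    verified with CRC-32, and only then marked active; otherwise the old slot stays."""

    def __init__(self, initial_image: bytes):
        self.slots = [initial_image, b""]
        self.active = 0

    def upgrade(self, image: bytes, expected_crc: int) -> bool:
        target = 1 - self.active
        self.slots[target] = image               # write the inactive slot only
        if zlib.crc32(image) != expected_crc:    # integrity check failed
            return False                         # active slot untouched: implicit rollback
        self.active = target                     # commit: switch the boot slot
        return True

    def rollback(self) -> None:
        """Called if the newly activated image fails its post-boot health check."""
        self.active = 1 - self.active

    def boot_image(self) -> bytes:
        return self.slots[self.active]
```

Because the running image is never overwritten, a corrupted download or a failed first boot always leaves a known-good image to fall back to.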