
Just Accepted

Note: The articles listed below have been peer-reviewed and accepted for publication in this journal. These articles have not yet been scheduled for a specific issue; their content and layout may undergo minor changes in the final published version. Please refer to the final published version as the definitive one. This journal has assigned each of these articles a unique and persistent DOI. You may use the DOI to cite each article directly.
  • Accepted: 2026-04-29
    To address the low efficiency and high false-positive rates of traditional SSI absolute encoder testing, this paper presents an FPGA-based automated defect detection system. The system adopts a dual-processor architecture that integrates an FPGA and an STM32 microcontroller. The FPGA performs SSI communication parsing, data acquisition, and real-time detection of anomalies such as code skipping, code looping, and dead codes, while the STM32 handles stepper motor control and user interaction. An adaptive threshold algorithm based on encoder increments, combined with cache logic, enables accurate and timely analysis of encoder outputs. Experimental results demonstrate that the system achieves over 99% detection accuracy even at a communication rate of 2 MHz, with strong interference immunity and compatibility with multiple encoder types. The proposed system offers a cost-effective, high-performance solution for efficient and reliable encoder inspection and quality control in industrial applications.
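The abstract does not define the adaptive threshold precisely. The following is a minimal Python sketch of one way an increment-based adaptive threshold could classify SSI position samples. The window size, the factor `k`, the label names, and the mapping of a backward step to "looping" are all assumptions. The sketch also assumes the rig turns the shaft forward fast enough that every healthy sample advances at least one count.

```python
from collections import deque

def wrap_delta(curr, prev, resolution):
    """Signed increment between two single-turn readings, handling roll-over."""
    d = (curr - prev) % resolution
    return d - resolution if d > resolution // 2 else d

def classify(positions, resolution, window=8, k=3.0, min_threshold=2):
    """Label every increment as 'ok', 'dead', 'backward' or 'skip'.

    Healthy increments are assumed positive and similar in size; the skip
    threshold adapts to the mean of the last `window` healthy increments.
    """
    recent = deque(maxlen=window)
    labels = []
    for prev, curr in zip(positions, positions[1:]):
        d = wrap_delta(curr, prev, resolution)
        if d == 0:
            labels.append("dead")        # shaft moved but the code did not change
        elif d < 0:
            labels.append("backward")    # code stepped back while turning forward
        elif recent and d > max(min_threshold, k * sum(recent) / len(recent)):
            labels.append("skip")        # jump far larger than recent increments
        else:
            labels.append("ok")
            recent.append(d)             # only healthy increments update the baseline
    return labels
```

For example, `classify([4070, 4080, 4090, 4, 14, 24, 24, 34, 200, 210, 205, 215], 4096)` flags the repeated `24` as dead, the jump to `200` as a skip, and the step back to `205` as backward. It treats the 4090 to 4 roll-over as a normal increment of 10.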
  • Accepted: 2026-04-27
    To address the high cost and extended downtime of manual maintenance for aerospace sealed-cabin equipment, this paper proposes a reliable over-the-air (OTA) upgrade solution. The method adopts a dual-image backup mechanism with integrity verification and adaptive network protocols, enabling secure firmware updates of up to 16 MB and automatic rollback on failure. Experimental results show a misdetection probability below 9.32×10⁻¹⁰, an approximately twofold increase in Flash lifespan, a 99.9% upgrade success rate, and 100% fault recovery. Compared with traditional manual methods, this approach reduces maintenance time from hours to about one minute and cuts costs by over 90%, effectively resolving key maintenance challenges.
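The dual-image idea can be illustrated with a small A/B-slot model. This Python sketch is not the paper's implementation. CRC-32 (via `zlib.crc32`) stands in for the unspecified integrity check, and the `Slot` and `Device` types are hypothetical. A new image is written only to the inactive slot and activated only if it verifies. At boot, a corrupt active image triggers a rollback to the other slot.

```python
import zlib
from dataclasses import dataclass, field

@dataclass
class Slot:
    image: bytes = b""
    crc: int = 0

    def valid(self):
        # Integrity check; CRC-32 is an assumed stand-in for the paper's method.
        return bool(self.image) and zlib.crc32(self.image) == self.crc

@dataclass
class Device:
    slots: list = field(default_factory=lambda: [Slot(), Slot()])
    active: int = 0

    def install(self, image, expected_crc):
        """Write to the inactive slot and switch only if the new image verifies."""
        target = 1 - self.active
        self.slots[target] = Slot(image, expected_crc)
        if self.slots[target].valid():
            self.active = target
            return True
        return False  # keep running the current image

    def boot(self):
        """Boot the active slot, rolling back if it fails verification.
        A real bootloader must also handle both slots being invalid."""
        if not self.slots[self.active].valid():
            self.active = 1 - self.active
        return self.slots[self.active].image
```

Because the running image is never overwritten, a failed or interrupted download leaves the device bootable. This is the property that the rollback figures above depend on.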
  • ZHANG Yunwei, ZHANG Hao, TAO Chenjin, YANG Nin, MA Zongxin
    Accepted: 2026-04-22
    To address the issues of heavy reliance on manual operations, low efficiency, and insufficient consistency during on-site verification of dial pressure gauges, this paper designs and implements a novel portable automated verification system based on deep learning. This system employs a hierarchical collaborative architecture comprising ‘object detection-character recognition-geometric calculation-process management’. Compliant with JJG 52-2013 regulations, the system centres on an embedded platform integrating an autofocus camera, LED ring light, micro-electromagnet tapping device, and process guidance module. This enables localised model deployment and standalone operation without network connectivity. To address misidentification and confusion in dial information recognition, a general lightweight OCR model underwent targeted fine-tuning. An Information Correction Algorithm (ICA) was designed, combining regular expression filtering with arithmetic sequence fitting to achieve consistent verification and correction of scale values. Addressing the unique demands of pointer reading tasks, an enhanced YOLO11n-PR lightweight object detection network was proposed to improve pointer keypoint localisation accuracy, thereby enhancing reading precision in complex industrial environments. Experimental results demonstrate that compared to the baseline model, dial information recognition accuracy improves to 97.44%, word error rate (WER) decreases to 2.56%, and character error rate (CER) reduces to 1.28%, enabling precise differentiation between morphologically similar numerals and symbols. The Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) for pointer readings decreased from 0.0421 and 0.0474 to 0.0154 and 0.0270 respectively, representing reductions of 63.42% and 43.04%. The average calibration time per gauge was reduced to 73 seconds, achieving approximately fourfold efficiency gains over standard manual operations. 
    The system demonstrated stable and reliable performance during prolonged continuous operation, indicating strong engineering applicability and practical value for field deployment.
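The Information Correction Algorithm pairs regular-expression filtering with arithmetic-sequence fitting. Below is a minimal Python sketch of that idea under stated assumptions: the regex, the function names, and the rounding are hypothetical. The fit uses a Theil-Sen estimator (median of pairwise slopes), swapped in here because it tolerates a few misread labels. The paper does not state which fitting method it uses.

```python
import re
from itertools import combinations
from statistics import median

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def parse_labels(texts):
    """Regex filter: keep strings that are plain numbers, map the rest to None."""
    return [float(t) if NUMBER.fullmatch(t.strip()) else None for t in texts]

def correct_scale(texts, decimals=3):
    """Fit value = start + step * index to the readable labels and
    regenerate every scale value from the fitted sequence."""
    pts = [(i, v) for i, v in enumerate(parse_labels(texts)) if v is not None]
    if len(pts) < 2:
        raise ValueError("need at least two readable scale labels")
    # Median of pairwise slopes is robust to a minority of misread values.
    step = median((vj - vi) / (j - i) for (i, vi), (j, vj) in combinations(pts, 2))
    start = median(v - step * i for i, v in pts)
    return [round(start + step * i, decimals) for i in range(len(texts))]
```

With OCR output `["0", "0.2", "0.4", "O.6", "08", "1.0"]`, the letter-O label is rejected by the regex. The dropped decimal point in `"08"` is outvoted by the other labels, and the corrected scale is `[0.0, 0.2, 0.4, 0.6, 0.8, 1.0]`.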
  • Accepted: 2026-04-22
    In SoC (System on Chip) designs containing DDR memory, the CPU accesses DDR extensively. When the CPU issues read/write operations with short burst lengths, mixed read and write commands, or non-contiguous addresses, DDR read/write efficiency drops sharply. Because the CPU must wait for these DDR operations to complete before proceeding with other tasks, overall CPU efficiency degrades significantly. Building on an existing DMA (Direct Memory Access) controller, this design adds the following functions: (1) an AXI4 interface to DDR; (2) burst lengths of 1/2/4/8/16/32/64/128/256; (3) CPU-initiated pause, stop, and start operations during DMA execution; and (4) an internal DMA error-status detection mechanism that raises an interrupt to report errors to the CPU. The design uses these new functions to support CPU read/write access to DDR in different scenarios. When the CPU reads or writes DDR at contiguous addresses, an appropriate burst length is selected according to the data length. When the CPU reads or writes DDR at non-contiguous addresses, the burst length is set to 1, and the DMA internally issues consecutive burst-length-1 transfers to complete the accesses. The design uses a Xilinx K7-325T FPGA as the hardware platform, with joint verification in Vivado and VCS. Simulation and board-level experiments verify the correctness of DDR DMA reads and writes with burst lengths of 1/2/4/8/16/32/64/128/256, and the bandwidth measured for each burst length on a 4-Gb DDR3 device (MT41J256M16xx-125) is as expected.
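One possible burst-selection policy is sketched below in Python. The greedy "largest allowed length that fits" rule and the 16-byte beat size are assumptions, not the paper's stated algorithm. The 4 KB limit is the AXI4 rule that a burst must not cross a 4 KB address boundary. Contiguous transfers are split into power-of-two bursts of up to 256 beats; scattered addresses each become a burst of length 1.

```python
ALLOWED = (256, 128, 64, 32, 16, 8, 4, 2, 1)  # burst lengths supported by the design
BOUNDARY = 4096                               # AXI4: a burst may not cross 4 KB

def plan_bursts(addr, beats, beat_bytes=16):
    """Split a contiguous transfer into (start_address, burst_length) pairs.
    beat_bytes is a hypothetical data-bus width in bytes."""
    if addr % beat_bytes:
        raise ValueError("address must be aligned to the beat size")
    plan = []
    while beats:
        room = (BOUNDARY - addr % BOUNDARY) // beat_bytes  # beats before next 4 KB line
        length = next(n for n in ALLOWED if n <= beats and n <= room)
        plan.append((addr, length))
        addr += length * beat_bytes
        beats -= length
    return plan

def plan_scattered(addresses):
    """Non-contiguous accesses: one burst of length 1 per address."""
    return [(a, 1) for a in addresses]
```

For example, 300 beats starting at address 0 become bursts of 256, 32, 8, and 4 beats. A transfer starting 64 bytes below a 4 KB line is split at the boundary.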
  • HU XUEYING
    Accepted: 2026-04-21
    The reliance on commercial EDA tools and the complexity of environment construction in traditional processor verification workflows often hinder agile development. To overcome these bottlenecks, this paper presents an agile, core-level verification methodology for RISC-V processors driven entirely by an open-source toolchain. By leveraging Verilator as the simulation engine and the Python-based Cocotb framework's coroutine mechanism, the proposed approach achieves both high-level abstraction of test stimuli and precise cycle-level driving. Furthermore, a lightweight test architecture is designed to accelerate the feedback loop. Case studies on the open-source RISC-V Ibex core demonstrate that, compared to the traditional Universal Verification Methodology (UVM), this scheme reduces code size by approximately 85% and compresses the iteration cycle of a single test case from hours to minutes—all while ensuring the effective verification of critical paths such as instruction execution and exception response. This solution significantly enhances early-stage design efficiency and offers a cost-effective alternative for educational experiments and prototype development.
  • Accepted: 2026-04-19
    In off-chip interconnection scenarios for domestic embedded systems, traditional interconnect interfaces suffer from bottlenecks such as insufficient bandwidth, high latency, and heavy CPU load, and increasingly fail to meet the high-speed data transmission needs of key fields such as aerospace and military applications. RapidIO, an interconnect bus protocol designed specifically for embedded systems, offers high bandwidth, low latency, and flexible topology. To solve the above problems, this paper proposes a hardware/software co-optimized serial interface design based on the RapidIO V2.2 protocol for domestic embedded processors. On the hardware side, a 4-lane 6.25 Gbaud integrated architecture is designed that integrates two independent write/read DMA channels and implements a linked-list DMA transfer mechanism. By pre-configuring linked-list descriptors, a batch transfer of data at non-contiguous addresses can be completed with a single DMA trigger. In addition, a hardware cache-coherency bus based on the MOESI protocol is designed to eliminate the long software cache-synchronization time after DMA transfers in applications. On the software side, the RapidIO driver is optimized according to the hardware characteristics, and zero-copy shared-memory and memory-mapping mechanisms reduce the number of data copies in the application, further improving the data transfer rate. Tests on the actual chip show that the RapidIO interface achieves read and write rates of 2145.92 MB/s and 1876.35 MB/s, respectively, reaching more than 80% of the theoretical four-lane 6.25 Gbaud bandwidth. In an actual application project, the optimized data transfer rate increased from 630 MB/s to 1463 MB/s, and the CPU load was reduced by 32%.
    The scheme meets the off-chip interconnection performance requirements of domestic embedded systems and provides reliable technical support and application examples for high-speed interconnection of domestic SoCs.
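The linked-list mechanism lets one trigger move several non-contiguous segments. Below is a minimal Python model of that behavior. The descriptor fields (`src`, `dst`, `length`, `next`) and the byte-array "memories" are hypothetical, not the chip's register layout. The engine walks the chain on its own, so the CPU only builds the list and starts it once.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Descriptor:
    src: int                              # source offset
    dst: int                              # destination offset
    length: int                           # bytes to move
    next: Optional["Descriptor"] = None   # None terminates the chain

def build_chain(segments):
    """Pre-configure descriptors from (src, dst, length) tuples, in order."""
    head = None
    for src, dst, length in reversed(segments):
        head = Descriptor(src, dst, length, head)
    return head

def run_dma(head, src_mem, dst_mem):
    """Single trigger: follow the chain and copy every segment."""
    moved, desc = 0, head
    while desc is not None:
        dst_mem[desc.dst:desc.dst + desc.length] = src_mem[desc.src:desc.src + desc.length]
        moved += desc.length
        desc = desc.next
    return moved
```

For example, gathering bytes 0-3, 16-19 and 40-47 of a source buffer into one contiguous 16-byte destination takes one `run_dma` call on a three-descriptor chain.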
  • Accepted: 2026-04-15
    Serial communication is widely used in industrial control, the Internet of Things, and embedded systems. Asynchronous protocols such as UART are limited by low transmission rates and strict clock-tolerance requirements, whereas synchronous protocols such as SPI require an additional clock signal, incurring inherent disadvantages in power consumption and cost. This paper proposes an asynchronous serial communication method based on level-width modulation, in which symbols are encoded by the duration of a level and symbol synchronization is achieved via signal edges. The method can be implemented with purely digital circuits, enabling high-clock-tolerance communication over a single-wire connection. By optimizing parameters such as the transmitted level widths and the receiver decision windows, flexible trade-offs among communication rate, clock tolerance, and SNR-FER performance are possible. The mathematical relationship between the parameter set and clock tolerance is derived and then verified through simulation, and SNR-FER curves for typical parameter sets under different operating modes are obtained by simulation. A purely digital transceiver prototype was implemented on an FPGA; with a clock frequency ratio of 1.2 between the transceivers, it achieves a frame error rate of 2.4×10⁻⁷ and an average data rate of 142.6 Mbps at an SNR of 19.4 dB. The method is well suited to on-chip clock-domain crossing or off-chip low-power communication, particularly where the transceivers' clock frequencies differ significantly and wiring resources are constrained, meeting the demand for medium-rate, low-cost communication.
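This Python sketch shows how level widths and decision windows set the clock tolerance. It is an illustration, not the paper's derivation. The geometric width spacing (`BASE_WIDTH * RATIO**k`), the 2-bit symbols, and the window boundaries at geometric midpoints are all hypothetical choices. The receiver counts its own clock cycles between edges without knowing the frequency ratio. Ignoring counter quantization, every symbol decodes correctly whenever `RATIO**-0.5 <= f_rx/f_tx < RATIO**0.5`, roughly 0.71 to 1.41 for `RATIO = 2`.

```python
import math

# Hypothetical parameter set: symbol k is a level lasting BASE_WIDTH * RATIO**k
# transmitter clock cycles; each symbol boundary is marked by a signal edge.
BASE_WIDTH = 4
RATIO = 2.0
NUM_SYMBOLS = 4  # 2 bits per symbol

def tx_widths(symbols):
    """Level width of each symbol, in transmitter clock cycles."""
    return [BASE_WIDTH * RATIO ** s for s in symbols]

def rx_count(width_tx_cycles, clock_ratio):
    """Whole receiver clock cycles counted between two edges.
    clock_ratio = f_rx / f_tx and is unknown to the receiver."""
    return math.floor(width_tx_cycles * clock_ratio)

def decide(count):
    """Decision windows bounded at the geometric midpoints
    BASE_WIDTH * RATIO**(k + 0.5) between neighbouring widths."""
    for k in range(NUM_SYMBOLS - 1):
        if count < BASE_WIDTH * RATIO ** (k + 0.5):
            return k
    return NUM_SYMBOLS - 1

def roundtrip(symbols, clock_ratio):
    return [decide(rx_count(w, clock_ratio)) for w in tx_widths(symbols)]
```

With a frequency ratio of 1.2 in either direction, the symbols survive the round trip. At 1.5, the shortest symbol is misread as the next one. Widening the spacing raises tolerance at the cost of a lower average data rate, which is the trade-off the abstract describes.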
  • Accepted: 2026-04-14
    This paper presents an all-digital phase-locked loop (ADPLL) for CMOS image sensor chips. Traditional ADPLLs rely on high-precision time-to-digital converters (TDCs), which makes them difficult to implement with a fully digital design flow; to address this, a dual-loop ADPLL based on injection locking is designed. The ADPLL does not depend on highly customized modules; instead, it leverages injection locking, PVT sensing and compensation, and dual-loop techniques to improve phase-noise performance. As a result, all PLL modules, including the digitally controlled oscillator (DCO), can be implemented with standard cells and Verilog code. The design can therefore be realized entirely through a digital flow, fully exploiting the high portability of all-digital PLLs. In addition, because this PLL may need several phases to achieve lock, which can lengthen the locking time, the design incorporates a lock detection and frequency prediction algorithm to speed up locking.
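The abstract does not describe the lock detection and frequency prediction algorithm. The Python sketch below shows two generic techniques of that kind, and both are assumptions. The first predicts a DCO code by linear interpolation from two frequency measurements, assuming a locally linear tuning curve. The second declares lock after a run of consecutive small phase errors. All function names and thresholds are hypothetical.

```python
def predict_dco_code(code_a, freq_a, code_b, freq_b, target):
    """Jump straight to an estimated code instead of stepping one LSB at a time.
    Assumes the DCO frequency is locally linear in the control code."""
    if code_a == code_b:
        raise ValueError("need two distinct codes")
    gain = (freq_b - freq_a) / (code_b - code_a)   # Hz per code step
    return round(code_a + (target - freq_a) / gain)

def lock_index(phase_errors, threshold, required):
    """Index at which lock is declared: `required` consecutive phase errors
    within +/-threshold. Returns None if lock is never reached."""
    run = 0
    for i, err in enumerate(phase_errors):
        run = run + 1 if abs(err) <= threshold else 0
        if run >= required:
            return i
    return None
```

For example, if codes 100 and 200 give 400 MHz and 500 MHz, a 450 MHz target predicts code 150. The loop then only has to remove the residual error.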
  • CHEN Wei, LI Jian, WEI Cong
    Accepted: 2026-04-11
    This paper presents a bandwidth and resolution-configurable discrete-time Delta-Sigma modulator based on switched-capacitor circuits. To address the diverse requirements for measurement precision, signal bandwidth, and dynamic range in various industrial applications, the modulator features a reconfigurable loop filter architecture that can switch between third-order and fourth-order modes via an external control signal; concurrently, the system enables the collaborative adjustment of signal bandwidth and resolution by adapting different oversampling ratios (OSRs). System modeling and simulations were conducted using MATLAB (OSR=200). Regarding stability, the third-order mode exhibits more relaxed stability conditions with a Maximum Stable Amplitude (MSA) of -4.4 dBFS—an optimization of 15.2 dB over the fourth-order mode (-19.6 dBFS)—making it suitable for large-amplitude signal processing. In terms of resolution, the fourth-order mode demonstrates superior noise-shaping capabilities, achieving a dynamic range (DR) of -154.4 dBFS, which is a 24.8 dB improvement over the third-order mode (-129.6 dBFS), allowing for the precise resolution of weak signals. To verify the configurability of bandwidth and precision, performance metrics were tested under varying OSRs. Simulation data indicate that in the low-OSR region (OSR < 20), the third-order modulator provides a superior Signal-to-Quantization-Noise Ratio (SQNR); for instance, at OSR=14, the third-order and fourth-order modes yield SQNRs of 22.549 dB and 15.383 dB, respectively. As the OSR increases (OSR ≥ 20), the fourth-order mode's noise-shaping advantage becomes dominant. 
    This design not only overcomes the limitations of single-structure modulators in multi-scenario applications but also achieves an optimized bandwidth-precision trade-off by leveraging the higher gain of the third-order structure at low OSRs and the high-precision characteristics of the fourth-order structure at high OSRs, effectively resolving the stability issues of high-order modulators under large-signal conditions.
  • Li weiye, Huang peiwen, Liu kaiyuan, Shen rensheng, Chang yuchun
    Accepted: 2026-04-11
    This paper presents a wideband low-phase-noise voltage-controlled oscillator (VCO) implemented in an advanced 22 nm CMOS process for high-performance frequency synthesis systems that require both a wide tuning range and low phase noise. The proposed VCO adopts a complementary cross-coupled topology and incorporates a transformer-coupled common-mode noise-suppression network at the sources, which presents a high impedance at the second harmonic to suppress flicker-noise upconversion and improve oscillation waveform symmetry. In addition, a 3-bit programmable source capacitance is introduced for adaptive phase-noise optimization. To alleviate the inherent trade-off between wide frequency coverage and low VCO gain, a hybrid tuning scheme combining 10-bit switched-capacitor coarse tuning with varactor-based fine tuning is employed. Operating from a 1.2 V supply, the proposed VCO consumes 3.35 mW and occupies an area of 0.185 mm². It achieves a continuous tuning range of 6.07–8.50 GHz and exhibits a phase noise of −122.1 to −120.6 dBc/Hz at a 1-MHz offset frequency.
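VCO results like these are usually compared using the standard figure of merit FoM = PN − 20·log10(f0/Δf) + 10·log10(P/1 mW). A tuning-range-aware variant is FoM_T = FoM − 20·log10(FTR/10%). The Python helpers below compute these quantities. Applied to the reported band, the fractional tuning range comes out at about 33.4%. The abstract does not say which phase-noise value belongs to which oscillation frequency, so any FoM figure depends on an assumed pairing.

```python
import math

def fom(pn_dbc_hz, f0_hz, offset_hz, power_mw):
    """Oscillator figure of merit in dBc/Hz (more negative is better)."""
    return pn_dbc_hz - 20 * math.log10(f0_hz / offset_hz) + 10 * math.log10(power_mw)

def tuning_range_pct(f_min_hz, f_max_hz):
    """Fractional tuning range relative to the band centre, in percent."""
    return 200 * (f_max_hz - f_min_hz) / (f_max_hz + f_min_hz)

def fom_t(fom_db, ftr_pct):
    """FoM including tuning range, normalized to a 10% FTR."""
    return fom_db - 20 * math.log10(ftr_pct / 10)

ftr = tuning_range_pct(6.07e9, 8.50e9)   # reported band: 6.07-8.50 GHz
```

For instance, `fom(-122.1, 6.07e9, 1e6, 3.35)` is meaningful only under the assumption that −122.1 dBc/Hz was measured at 6.07 GHz.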
  • Liu Jixiang
    Accepted: 2026-04-08
    To address capacitor mismatch in high-precision successive-approximation-register (SAR) analog-to-digital converters (ADCs), this paper designs a foreground calibration technique based on a sine-wave input. Kernel data are collected and fitted repeatedly until the signal-to-noise ratio (SNR) meets the specification; the resulting capacitor-mismatch register values are then programmed into OTP memory. This effectively improves conversion accuracy and SNR without affecting the ADC sampling rate. The technique is derived from the Least Mean Squares (LMS) algorithm: 16K kernel data samples are collected from the SAR ADC and fitted with nonlinear least squares in Matlab. Following the LMS idea, the residual signal is processed over multiple iterations, with each iteration adjusting the weight of each ADC bit accordingly. After approximately 1000 iterations, the SNR reaches 88 dB and the spurious-free dynamic range (SFDR) reaches 98 dB, improvements of 22 dB and 17 dB, respectively, over the uncalibrated values. Simulation and test results show that this calibration technique effectively enhances the output performance of the ADC.
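A small Python model shows the LMS bit-weight update. It uses an 8-bit SAR ADC with hypothetical capacitor mismatch, 1024 sine samples, and 20 passes, not the paper's 16K samples or its Matlab flow. As a simplifying assumption, the reference is the exact input, whereas the paper obtains it from a sine fit. Each update moves every bit weight along `error * bit`. After calibration, reconstruction with the learned weights tracks the input far better than with the nominal binary weights.

```python
import math

N_BITS = 8
NOMINAL = [2.0 ** (N_BITS - 1 - i) for i in range(N_BITS)]   # MSB first, LSB = 1
# Hypothetical mismatch: upper capacitors slightly undersized, keeping the array
# sub-binary so the SAR search always ends with a residue below 1 LSB.
MISMATCH = [-0.03, -0.02, -0.01, -0.015, 0.0, 0.0, 0.0, 0.0]
TRUE_W = [w * (1 + e) for w, e in zip(NOMINAL, MISMATCH)]

def sar_convert(v, weights):
    """Successive approximation using the (unknown) true capacitor weights."""
    bits, residue = [], v
    for w in weights:
        bit = 1 if residue >= w else 0
        residue -= bit * w
        bits.append(bit)
    return bits

def reconstruct(bits, weights):
    return sum(b * w for b, w in zip(bits, weights))

def lms_calibrate(codes, reference, weights, mu=0.05, passes=20):
    """LMS: w_i += mu * (reference - estimate) * bit_i for every sample."""
    w = list(weights)
    for _ in range(passes):
        for bits, x in zip(codes, reference):
            err = x - reconstruct(bits, w)
            for i, b in enumerate(bits):
                w[i] += mu * err * b
    return w

def rms_error(codes, reference, weights):
    sq = [(x - reconstruct(b, weights)) ** 2 for b, x in zip(codes, reference)]
    return math.sqrt(sum(sq) / len(sq))

full_scale = sum(TRUE_W)
reference = [0.5 * full_scale * (1 + 0.95 * math.sin(2 * math.pi * 0.0123 * n)) for n in range(1024)]
codes = [sar_convert(v, TRUE_W) for v in reference]
calibrated = lms_calibrate(codes, reference, NOMINAL)
```

Comparing `rms_error(codes, reference, NOMINAL)` with `rms_error(codes, reference, calibrated)` shows the mismatch-induced error dropping below 1 LSB. Only the learned weights would need to be stored, which is the role OTP plays in the paper.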
  • YANG Yuze
    Accepted: 2026-04-08
    Irregular data access patterns in high-performance and intelligent computing often render traditional data prefetching techniques ineffective. Existing models that rely on fixed rules or on offline learning tied to specific program contexts also struggle to adapt to memory access patterns that change dynamically at runtime. Although the Pythia reinforcement learning (RL) prefetching framework adapts through online learning, it still requires manual tuning under extremely irregular workloads, which limits its generalization in practice. This paper proposes IEP (Irregular Enhanced Pythia), a context-aware RL prefetching framework that enhances prediction for irregular memory access patterns. The framework introduces two key innovations. First, an irregular-feature enhancement module adds address bit masks and access-sequence distance as state features to capture hidden spatiotemporal patterns in memory-allocator behavior, improving the representation of irregular memory accesses. Second, a hierarchical reward strategy module employs a dynamic reward mechanism that combines confidence awareness and bandwidth sensitivity to guide the agent's learning more finely, accelerating policy optimization and improving final performance. Experiments were conducted with the ChampSim simulator on various irregular workloads. Compared with Pythia, the proposed solution improves average prefetching accuracy by up to 2.27% and average single-core IPC by up to 2.90% on typical irregular workloads such as Ligra and PARSEC, while maintaining stable performance advantages in multi-core environments.
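To make "hierarchical reward combining confidence awareness and bandwidth sensitivity" concrete, here is a hypothetical Python reward function. It is not IEP's actual reward, and it does not reproduce Pythia's published reward values. The outcome classes, base values, thresholds, and scaling factors are invented for illustration. The first level assigns a base reward per prefetch outcome. The second level scales it by the agent's confidence and by current memory-bandwidth utilization.

```python
from enum import Enum

class Outcome(Enum):
    ACCURATE_TIMELY = "accurate_timely"   # prefetched line used after it arrived
    ACCURATE_LATE = "accurate_late"       # used, but the demand access came first
    INACCURATE = "inaccurate"             # prefetched line never used
    NO_PREFETCH = "no_prefetch"           # agent chose not to prefetch

# Hypothetical base rewards (level 1).
BASE_REWARD = {
    Outcome.ACCURATE_TIMELY: 20.0,
    Outcome.ACCURATE_LATE: 12.0,
    Outcome.INACCURATE: -8.0,
    Outcome.NO_PREFETCH: -2.0,
}

def shaped_reward(outcome, confidence, bandwidth_util, busy_threshold=0.75):
    """Level 2: scale the base reward by confidence (0..1) and bandwidth use (0..1)."""
    reward = BASE_REWARD[outcome]
    if reward > 0:
        return reward * (0.5 + 0.5 * confidence)   # confident hits earn more
    if outcome is Outcome.INACCURATE:
        reward *= 1.0 + confidence                  # confident misses cost more
        if bandwidth_util >= busy_threshold:
            reward *= 2.0                           # wasted bandwidth hurts more when busy
    elif bandwidth_util >= busy_threshold:
        reward *= 0.5                               # holding back is cheaper on a busy bus
    return reward
```

The intent of this shaping is that, when the memory bus is saturated, the agent learns quickly to avoid confident but wrong prefetches.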
  • DONG Xinyi, WANG Yongliang, WANG Yuanqing, QIAN Chenghui
    Accepted: 2026-04-03
    To assist visually impaired individuals with navigation, an AI-based machine vision system for collecting tactile paving information has been designed to enable intelligent data acquisition. The system identifies standard tactile paving through image edge detection and morphological constraints, enabling wheeled robots to traverse tactile paths autonomously. It employs the YOLOv8 object detection model paired with Huawei Ascend AI processors to detect anomalies in tactile paving and transmits the detection results to a host computer via Wi-Fi. For testing, simulated tactile paving tiles measuring 30 cm × 30 cm were laid out, and multiple detection runs were conducted with damaged tiles, missing sections, and both movable and immovable obstacles placed at various positions. The tests confirmed that, within the defined scenarios, the system can collect tactile paving data and identify anomaly types and locations with an average detection accuracy of 95%. The average absolute error in anomaly localization was less than 9.12 cm relative to the actual positions. The system can help municipal authorities understand tactile paving conditions and support safe travel for visually impaired individuals.
  • Accepted: 2026-03-27
    To address the predictability of interrupt latency in real-time operating systems (RTOS) on multi-core heterogeneous architectures, this paper proposes a layered interrupt latency modeling method and a comprehensive benchmark testing framework, using the domestic RK3588 chip and the SylixOS real-time operating system as research subjects. Theoretical modeling is used to analyze how the hardware architecture and operating-system scheduling strategies affect interrupt response time, and a test scheme covering multiple scenarios, such as single-core idle, mixed load, and all-core high load, is designed. The resulting 'modeling-testing-optimization' methodology provides a systematic reference for real-time evaluation and optimization of multi-core heterogeneous platforms.
  • Li Ziyi, Xiong Zhengye, Cai Fanglin
    Accepted: 2026-03-26
    To address the difficulty of assessing stability when a ship's center of gravity shifts after equipment renewal, as well as the high cost, long duration, and limited adoption of traditional inclining tests on small and medium-sized fishing vessels, an automatic ship stability measurement system based on embedded technology is designed. The system uses an inertial measurement unit (IMU) as its core sensing module and achieves high-precision measurement of the ship's roll angle and rapid calculation of the metacentric position through multi-source attitude data acquisition, embedded real-time processing, and wireless transmission. Built on a master-slave architecture around an STM32F401 microcontroller, the system integrates an accelerometer and an ultrasonic ranging module and transmits data to a host computer over a 2.4 GHz wireless link, achieving multi-dimensional perception of the ship's dynamic response under slight inclination. In outdoor tests on a model ship, the automated process significantly simplified the measurement procedure, and the time per measurement was greatly reduced compared with traditional manual observation and calculation. The average relative error of the metacentric height measurement is less than 3%, which verifies the efficiency of the system in data acquisition and algorithm solving. The system provides an efficient, cost-effective, and field-operable solution for stability safety assessment of fishing vessels after equipment renewal, meeting the demand for rapid on-site measurement in marine engineering.
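The abstract does not give the formula used for metacentric height. A natural assumption is the classical inclining-test relation GM = w·d / (Δ·tan θ), where w·d is the heeling moment from a weight shift, Δ the displacement, and θ the heel angle measured by the IMU. This Python sketch applies that relation to a single shift. It also fits tan θ = M/(Δ·GM) through the origin over several shifts, which averages out angle noise. The function names are hypothetical.

```python
import math

def gm_single(shift_mass_kg, shift_dist_m, displacement_kg, heel_deg):
    """GM = w*d / (displacement * tan(heel)) for one weight shift (small angles)."""
    return shift_mass_kg * shift_dist_m / (displacement_kg * math.tan(math.radians(heel_deg)))

def gm_from_shifts(moments_kg_m, heel_deg, displacement_kg):
    """Least-squares slope of tan(heel) versus heeling moment, forced through
    the origin; GM is the reciprocal of displacement times that slope."""
    tans = [math.tan(math.radians(a)) for a in heel_deg]
    slope = sum(m * t for m, t in zip(moments_kg_m, tans)) / sum(m * m for m in moments_kg_m)
    return 1.0 / (displacement_kg * slope)
```

For a 1000 kg model with GM = 0.5 m, shifts of ±10 and ±20 kg·m produce heel angles whose tangents are M/500. Both functions then recover 0.5 m.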
  • Accepted: 2026-03-26
    The two-transistor capacitorless (2T0C) gain-cell embedded dynamic random access memory (eDRAM) offers long data retention and high potential for three-dimensional (3D) integration, making it a compelling candidate for high-density embedded storage applications. However, write-data uniformity in large-scale 2T0C arrays is susceptible to various degradation mechanisms, which calls for precise memory-channel modeling to ensure reliability. Moreover, the storage-node voltage (VSN) suffers from stage-dependent ambiguity due to nonlinear capacitance and coupling effects, which prevents it from serving as a unique state descriptor. To overcome this, we propose a unified Z-channel model centered on the stored charge (QSN) that accurately describes both write and hold operations. By shifting the core descriptor from VSN to QSN, the proposed framework eliminates representation ambiguity while enabling direct quantification of three major degradation mechanisms: write-history dependence, array parasitics, and leakage during retention. To validate its generality, comprehensive Monte Carlo simulations were conducted on 2T0C arrays fabricated in multiple technology nodes. The results show that scaling down amorphous-oxide-semiconductor field-effect transistors (AOSFETs) effectively suppresses write-history dependence, improves write uniformity in large 2T0C arrays, and achieves a data retention time of 7500 s.
  • LI Quanliang, WANG Chao, WANG Ruilin, JIAO Yang, QIAO Chuan
    Accepted: 2026-03-24
    For long-life products, the FPGA bitstream stored in NOR Flash may suffer bit flips due to floating-gate charge leakage, leading to FPGA configuration failure. To address this issue, this paper proposes a Flash refresh method based on FPGA multiboot, in which the Flash is refreshed during annual product maintenance to restore the floating-gate charge. The multiboot feature of the Kintex-7 series FPGA was investigated, and the composition of the configuration data was restructured to make configuration more robust. The optimized configuration data comprise one header file, two identical copies of the bitstream, and three identical sets of auxiliary files. When a Flash refresh is launched, the FPGA executes a sequence of steps, including self-check, data refresh, and read-back verification, to ensure the reliability of the process. Test results verify that this refresh method is stable and reliable and has high practical engineering value.
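Storing two bitstream copies and three auxiliary copies suggests two recovery primitives: picking an intact bitstream copy and majority-voting the auxiliary data. This Python sketch shows both under stated assumptions. The CRC-32 check via `zlib.crc32` is a hypothetical stand-in for whatever integrity check the design uses, and the function names are illustrative. The voted or verified data would then be rewritten to Flash and compared again on read-back.

```python
import zlib

def majority(a: bytes, b: bytes, c: bytes) -> bytes:
    """Bitwise 2-of-3 vote: a bit flip in any single copy is outvoted."""
    if not len(a) == len(b) == len(c):
        raise ValueError("copies must have equal length")
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))

def pick_bitstream(copies, expected_crc):
    """Self-check: return the first bitstream copy whose CRC-32 matches."""
    for copy in copies:
        if zlib.crc32(copy) == expected_crc:
            return copy
    raise RuntimeError("no intact bitstream copy; refresh aborted")
```

Three copies allow correction without any checksum, because a flipped bit must appear in two copies at the same position to win the vote. Two copies can only detect a fault, so the bitstream copies need a separate integrity check.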
  • Accepted: 2026-03-23
    This paper designs and implements an FFT hardware accelerator and host computer system for bridge structural health monitoring. The hardware adopts a sequential radix-4 FFT architecture that reuses a single butterfly unit for all computations. This approach preserves functional integrity while effectively reducing resource overhead and power consumption, and supports configurable FFT sizes of 4, 16, 64, and 256 points. To enable data interaction and visualization, a host computer platform was further developed for parameter configuration, operational control, and real-time display and analysis of frequency-domain results. The hardware accelerator was verified in a 180 nm CMOS process and operates stably at 100 MHz. Applied to bridge vibration signal processing, the system accurately extracts the primary frequency components, meeting the combined real-time, precision, and energy-efficiency requirements of bridge structural health monitoring.
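A software reference model is useful for checking a radix-4 accelerator's outputs. Below is a minimal recursive radix-4 decimation-in-time FFT in Python that routes every computation through one 4-point butterfly function, echoing the single-butterfly hardware. It is a golden model, not the paper's sequential architecture. It supports lengths that are powers of 4, which covers the 4-, 16-, 64-, and 256-point modes.

```python
import cmath

def butterfly4(a0, a1, a2, a3):
    """Radix-4 butterfly: 4-point DFT of already-twiddled inputs."""
    t0, t1 = a0 + a2, a0 - a2
    t2, t3 = a1 + a3, (a1 - a3) * -1j
    return t0 + t2, t1 + t3, t0 - t2, t1 - t3

def fft_radix4(x):
    """Radix-4 DIT FFT for len(x) a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4:
        raise ValueError("length must be a power of 4")
    subs = [fft_radix4(x[m::4]) for m in range(4)]   # four interleaved sub-FFTs
    quarter = n // 4
    out = [0j] * n
    for q in range(quarter):
        # Apply twiddle factors W_N^(m*q), then one shared butterfly.
        a = [subs[m][q] * cmath.exp(-2j * cmath.pi * m * q / n) for m in range(4)]
        for l, y in enumerate(butterfly4(*a)):
            out[q + l * quarter] = y
    return out
```

Comparing the accelerator's fixed-point outputs against `fft_radix4` on recorded vibration frames gives a direct accuracy check. A 256-point transform needs log4(256) = 4 butterfly stages.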
  • Accepted: 2026-03-23
    With the continuous and rapid development of Artificial Intelligence (AI), machine vision and embedded control have progressively become foundational technologies for the intelligent manufacturing industry. To meet the urgent demand for teaching and experimental platforms amidst the reform of AI education in universities, this paper proposes and implements an intelligent sorting system based on the Robot Operating System 2 (ROS2) framework. The system utilizes a Raspberry Pi as the upper-computer platform for vision acquisition and inference. Real-time video streams are collected via a USB camera, and OpenCV is employed for preprocessing operations, including video frame decoding, color space conversion, and scaling. Subsequently, ONNX Runtime is utilized for the deployment and inference of deep learning models. At the execution level, the system employs an ESP32 microcontroller as the ROS2 lower-level node. It establishes stable communication with the Raspberry Pi over a Local Area Network (LAN) via micro-ROS, enabling precise control of the conveyor belt motor and the pusher mechanism. All nodes operate within the same Wireless LAN (WLAN) and utilize the DDS protocol for rapid node discovery and reliable message transmission. This paper provides a detailed introduction to the system's design, covering hardware structure, vision processing workflows, communication architecture, and actuator control. Furthermore, the stability, real-time performance, and scalability of the system are validated through multiple rounds of experiments. Finally, centering on the requirements of educational platform construction, this paper analyzes the value of the system in experimental teaching and discusses its future application prospects in intelligent manufacturing and university laboratory platforms.
  • Accepted: 2026-03-20
    In compute-in-memory (CIM) systems built on high-density 2T0C arrays, parasitic capacitances critically determine charge redistribution and bit-line integration dynamics, directly affecting storage-node (SN) disturbance and computational linearity. However, the computational cost of conventional extraction methods escalates with array size, obstructing efficient array-level modeling and system analysis. To address this, we propose a high-accuracy approximation method for extracting parasitics from the central cell of large-scale arrays by leveraging the attenuating coupling of long interconnects. The method constructs a nine-port aggregated equivalent network by bundling non-adjacent word/bit lines and derives a quantitative expression for the minimum truncation distance of key capacitances under a 1% relative-error bound, enabling rapid parameter extraction for the array-level model (AM). This yields high-accuracy models of the SN and bit-line capacitances (CSN and CRBL) across operational phases and accurately captures the near-linear scaling of CRBL with array size. Simulations under 10× geometric scaling show a 15% accuracy improvement over the device-level model (DM). Crucially, linearity analysis based on this precise model reveals that using the low-accuracy DM would overestimate the peak integral non-linearity (INL) by approximately 1.5 least significant bits (LSB).