Just Accepted

Note: The articles listed below have been peer-reviewed and accepted for publication in this journal. These articles have not yet been scheduled for a specific issue; their content and layout may undergo minor changes in the final published version. Please refer to the final published version as the definitive one. This journal has assigned each of these articles a unique and persistent DOI. You may use the DOI to cite this article directly.
  • Accepted: 2026-04-15
    Serial communication is widely used in industrial control, the Internet of Things, and embedded systems. Asynchronous protocols such as UART are limited by low transmission rates and strict clock tolerance requirements, whereas synchronous protocols such as SPI require an additional clock signal, which increases power consumption and cost. This paper proposes an asynchronous serial communication method based on level width modulation, in which symbols are encoded by the duration of a level and symbol synchronization is achieved via signal edges. The method can be implemented using purely digital circuits, enabling communication with high clock tolerance over a single-wire connection. Tuning parameters such as the transmitted level width and the receiver decision window allows flexible trade-offs among communication rate, clock tolerance, and SNR-FER performance. The mathematical relationship between the parameter set and clock tolerance is derived and verified through simulation, and SNR-FER curves are simulated for typical parameter sets under different operating modes. A purely digital transceiver prototype was implemented on an FPGA. With a clock frequency ratio of 1.2 between transceivers, it achieves a frame error rate of 2.4×10⁻⁷ and an average data rate of 142.6 Mbps at an SNR of 19.4 dB. The method is well suited to on-chip clock domain crossing and off-chip low-power communication, particularly where a significant clock frequency difference exists between transceivers and wiring resources are constrained, meeting the demand for medium-rate, low-cost communication.
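As a rough illustration of the level width modulation idea in the abstract above, the following Python sketch encodes each bit as the duration of a level and decodes it by counting receiver clock ticks between edges. The widths `W0` and `W1`, the decision threshold `THRESHOLD`, and the function names are assumptions chosen for illustration; they are not the parameter set or decision window used in the paper, and noise is not modeled.

```python
from fractions import Fraction
from math import ceil

W0, W1 = 2, 6      # hypothetical level widths, in transmitter clock cycles
THRESHOLD = 5      # hypothetical decision threshold, in receiver clock ticks


def encode(bits):
    """Return edge times in transmitter clock cycles; the level toggles at every edge."""
    edges, t = [0], 0
    for b in bits:
        t += W1 if b else W0
        edges.append(t)
    return edges


def decode(edges, ratio):
    """Recover bits from edge times using a receiver clock running at
    `ratio` times the transmitter clock frequency.

    The receiver only counts its own clock ticks between consecutive edges,
    so no shared clock is needed.
    """
    ratio = Fraction(ratio)
    bits = []
    for start, end in zip(edges, edges[1:]):
        # Number of receiver ticks n with start <= n / ratio < end.
        ticks = ceil(end * ratio) - ceil(start * ratio)
        bits.append(1 if ticks >= THRESHOLD else 0)
    return bits
```

With these illustrative values, a short level spans 2 or 3 receiver ticks and a long level 7 or 8 ticks at a clock ratio of 1.2, so a single threshold separates them; widening the gap between `W0` and `W1` would tolerate a larger clock mismatch at the cost of data rate.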
  • Accepted: 2026-04-14
    This paper presents an all-digital phase-locked loop (ADPLL) designed for CMOS image sensor chips. Traditional ADPLLs rely on high-precision time-to-digital converters (TDCs), which makes them difficult to implement through a fully digital design flow. To address this issue, a dual-loop ADPLL based on injection locking is designed. It does not depend on highly customized modules; instead, it uses injection locking, PVT sensing compensation, and dual-loop techniques to improve phase noise performance. As a result, all modules of the PLL, including the digitally controlled oscillator (DCO), can be implemented using standard cells and Verilog code, so the design can be realized entirely through a digital flow and retains the high portability of all-digital PLLs. In addition, because this PLL needs multiple phases to achieve lock, locking times can be long; the design therefore incorporates a lock detection and frequency prediction algorithm to improve locking speed.
  • CHEN Wei, LI Jian, WEI Cong
    Accepted: 2026-04-11
    This paper presents a bandwidth- and resolution-configurable discrete-time Delta-Sigma modulator based on switched-capacitor circuits. To address the diverse requirements for measurement precision, signal bandwidth, and dynamic range in industrial applications, the modulator features a reconfigurable loop filter that can switch between third-order and fourth-order modes via an external control signal; in addition, signal bandwidth and resolution can be adjusted jointly by selecting different oversampling ratios (OSRs). System modeling and simulations were conducted in MATLAB (OSR=200). Regarding stability, the third-order mode has more relaxed stability conditions, with a Maximum Stable Amplitude (MSA) of -4.4 dBFS, 15.2 dB higher than that of the fourth-order mode (-19.6 dBFS), making it suitable for large-amplitude signal processing. In terms of resolution, the fourth-order mode demonstrates superior noise shaping, achieving a dynamic range (DR) of -154.4 dBFS, a 24.8 dB improvement over the third-order mode (-129.6 dBFS), which allows weak signals to be resolved precisely. To verify the configurability of bandwidth and precision, performance metrics were tested under varying OSRs. Simulation data indicate that in the low-OSR region (OSR < 20), the third-order modulator provides a superior Signal-to-Quantization-Noise Ratio (SQNR); for instance, at OSR=14, the third-order and fourth-order modes yield SQNRs of 22.549 dB and 15.383 dB, respectively. As the OSR increases (OSR ≥ 20), the fourth-order mode's noise-shaping advantage becomes dominant. This design overcomes the limitations of single-structure modulators in multi-scenario applications and achieves an optimized bandwidth-precision trade-off by leveraging the higher gain of the third-order structure at low OSRs and the high precision of the fourth-order structure at high OSRs, effectively resolving the stability issues of high-order modulators under large-signal conditions.
  • Li Weiye, Huang Peiwen, Liu Kaiyuan, Shen Rensheng, Chang Yuchun
    Accepted: 2026-04-11
    This paper presents a wideband low-phase-noise voltage-controlled oscillator (VCO) implemented in an advanced 22 nm CMOS process for high-performance frequency synthesis systems requiring both a wide tuning range and low phase noise. The proposed VCO adopts a complementary cross-coupled topology and incorporates a transformer-coupled common-mode noise-suppression network at the sources, which establishes a high impedance at the second harmonic to suppress flicker-noise upconversion and improve oscillation waveform symmetry. In addition, a 3-bit programmable source capacitance is introduced for adaptive phase-noise optimization. To alleviate the inherent tradeoff between wide frequency coverage and low VCO gain, a hybrid tuning scheme combining 10-bit switched-capacitor coarse tuning with varactor-based fine tuning is employed. Operating from a 1.2 V supply, the proposed VCO consumes 3.35 mW and occupies an area of 0.185 mm². It achieves a continuous tuning range of 6.07–8.50 GHz and exhibits a phase noise of −122.1 to −120.6 dBc/Hz at a 1 MHz offset frequency.
  • Liu Jixiang
    Accepted: 2026-04-08
    To address capacitor mismatch in high-precision successive-approximation-register (SAR) analog-to-digital converters (ADCs), this paper designs a foreground calibration technique based on a sine-wave input. Kernel data are collected and fitted repeatedly until the signal-to-noise ratio (SNR) meets the specification; the resulting capacitor mismatch register values are then written by OTP programming. This effectively improves conversion accuracy and SNR without affecting the ADC sampling rate. The foreground calibration technique is derived from the Least Mean Squares (LMS) algorithm. It collects 16K kernel data from the SAR ADC and performs nonlinear least squares fitting using MATLAB. Following the idea of the LMS algorithm, the residual signal is processed over multiple iterations, with each iteration adjusting the weight of each ADC bit accordingly. After approximately 1000 iterations, the SNR reaches 88 dB and the spurious-free dynamic range (SFDR) reaches 98 dB, which are 22 dB and 17 dB higher than before calibration, respectively. Simulation and test results show that this calibration technique effectively enhances the output performance of the ADC.
  • 杨 钰泽
    Accepted: 2026-04-08
    Irregular data access patterns in high-performance computing and intelligent computing often render traditional data prefetching techniques ineffective. Existing models that rely on fixed rules or on offline learning from specific program contexts also struggle to adapt to memory access patterns that change dynamically at runtime. While the Pythia reinforcement learning (RL) prefetching framework demonstrates adaptability through online learning, it still requires manual tuning under extremely irregular workloads, limiting its generalization in practical applications. This paper proposes IEP (Irregular Enhanced Pythia), a context-aware reinforcement learning prefetching framework that enhances prediction of irregular memory access patterns. The framework introduces two key innovations: first, an irregular feature enhancement module that incorporates address bit masks and access sequence distance as state features to capture hidden spatiotemporal patterns in memory allocator behavior, thereby improving the representation of irregular memory accesses; second, a hierarchical reward strategy module that employs a dynamic reward mechanism combining confidence awareness and bandwidth sensitivity to guide the agent's learning process more finely, accelerating policy optimization and improving final performance. Experiments were conducted with the ChampSim simulator on various irregular workloads. Results show that, compared to the Pythia framework, the proposed solution achieves a maximum improvement of 2.27% in average prefetching accuracy and 2.90% in average single-core IPC for typical irregular workloads such as Ligra and PARSEC, while maintaining stable performance advantages in multi-core environments.
  • DONG Xinyi, WANG Yongliang, WANG Yuanqing, QIAN Chenghui
    Accepted: 2026-04-03
    To assist visually impaired individuals in navigation, an AI-based machine vision system for collecting tactile paving information has been designed to enable intelligent data acquisition. The system identifies standard tactile paving through image edge detection and morphological constraints, enabling wheeled robots to autonomously traverse tactile paths. It employs the YOLOv8 object detection model paired with Huawei Ascend AI processors to detect anomalies in tactile paving, transmitting detection results to a host computer via Wi-Fi. During testing, simulated tactile paving tiles measuring 30 cm × 30 cm were laid out. Multiple detection runs were conducted with damaged tactile paving, missing sections, and both movable and immovable obstacles placed at various positions. Testing confirmed that the system can collect tactile paving data within defined scenarios and identify anomaly types and locations, with an average detection accuracy of 95%. The average absolute error in locating anomalies was less than 9.12 cm relative to actual positions. This system can assist municipal authorities in understanding tactile paving conditions and support safe travel for visually impaired individuals.
  • Accepted: 2026-03-27
    To address the predictability of interrupt latency in real-time operating systems (RTOS) on multi-core heterogeneous architectures, this paper proposes a layered interrupt latency modeling method and a comprehensive benchmark testing framework, using the Chinese-made RK3588 chip and the SylixOS real-time operating system as research subjects. Theoretical modeling is used to analyze the impact of hardware architecture and operating system scheduling strategies on interrupt response time, and a testing scheme is designed that covers multiple scenarios, including single-core idle, mixed load, and all-core heavy load. A 'modeling-testing-optimization' methodology is proposed, providing a systematic reference for real-time evaluation and optimization of multi-core heterogeneous platforms.
  • Li Ziyi, Xiong Zhengye, Cai Fanglin
    Accepted: 2026-03-26
    To address the difficulty of assessing stability when the center of gravity changes after ship equipment is renewed, as well as the high cost, long duration, and limited adoption of traditional inclining tests on small and medium-sized fishing vessels, an automatic ship stability measurement system based on embedded technology is designed. The system uses an inertial measurement unit (IMU) as the core sensing module and achieves high-precision measurement of the ship's roll angle and rapid calculation of the metacentric position through multi-source attitude data acquisition, embedded real-time processing, and wireless transmission. The system adopts a master-slave architecture built around the STM32F401 microcontroller, integrates an accelerometer and an ultrasonic ranging module, and transmits data to the host computer over a 2.4 GHz link to achieve multi-dimensional sensing of the ship's dynamic response under slight inclination. In an outdoor model ship test environment, the automated process significantly simplifies the measurement procedure. Compared with traditional manual calculation and observation, the time for a single measurement is greatly reduced. The average relative error of the metacentric height measurement is less than 3%, which verifies the efficiency of the system in data acquisition and algorithm solving. This system provides an efficient, cost-effective, and field-operable solution for the stability safety assessment of fishing vessels after equipment renewal, meeting the demand for rapid on-site measurement in marine engineering.
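For context on the metacentric height calculation mentioned in the abstract above, the classical inclining experiment relates the metacentric height GM to the heel angle produced by shifting a known mass across the deck: GM = w·d / (Δ·tan θ). The Python sketch below evaluates that textbook relation. The function and parameter names are illustrative assumptions, and the abstract does not state that the system uses exactly this formula.

```python
import math


def metacentric_height(moved_mass, shift_distance, displacement, heel_deg):
    """Classical inclining-experiment estimate of metacentric height GM.

    moved_mass     -- mass of the test weight shifted across the deck
    shift_distance -- transverse distance the weight is moved (m)
    displacement   -- total mass of the vessel, same unit as moved_mass
    heel_deg       -- resulting steady heel angle in degrees (e.g. from an IMU)
    """
    if heel_deg == 0:
        raise ValueError("heel angle must be non-zero")
    return (moved_mass * shift_distance) / (
        displacement * math.tan(math.radians(heel_deg))
    )
```

The relation holds only for small heel angles, which is consistent with the abstract's emphasis on measuring the response under slight inclination.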
  • Accepted: 2026-03-26
    The two-transistor capacitorless (2T0C) gain-cell embedded dynamic random access memory (eDRAM) offers long data retention and high potential for three-dimensional (3D) integration, making it a compelling candidate for high-density embedded storage applications. However, write-data uniformity in large-scale 2T0C arrays is susceptible to various degradation mechanisms, so precise memory-channel modeling is needed to ensure reliability. The storage-node voltage (VSN) suffers from stage-dependent ambiguity due to nonlinear capacitance and coupling effects, which prevents it from serving as a unique state descriptor. To overcome this, we propose a unified Z-channel model centered on the stored charge (QSN) that accurately describes both write and hold operations. By shifting the core descriptor from VSN to QSN, the proposed framework eliminates representation ambiguity while enabling direct quantification of three major degradation mechanisms: the write-history-dependent effect, array parasitics, and leakage during retention. To validate its generality, comprehensive Monte Carlo simulations were conducted across 2T0C arrays fabricated in multiple technology nodes. The results show that scaling down amorphous-oxide-semiconductor field-effect transistors (AOSFETs) effectively suppresses the write-history dependency, improves write uniformity in large 2T0C arrays, and achieves a data retention time of 7500 seconds.
  • LI Quanliang, WANG Chao, WANG Ruilin, JIAO Yang, QIAO Chuan
    Accepted: 2026-03-24
    For long-life products, the FPGA bitstream stored in NOR Flash may experience bit flips due to floating-gate charge leakage, which leads to FPGA configuration failure. To address this issue, this paper proposes a Flash refresh method based on FPGA multiboot. The Flash is refreshed during annual product maintenance to restore the floating-gate charge. The multiboot feature of the Kintex-7 series FPGA was investigated, and the composition of the configuration data was restructured to make configuration more robust. The optimized configuration data comprises one header file, two identical copies of the bitstream, and three identical sets of auxiliary files. When a Flash refresh is launched, the FPGA executes a sequence of steps, including self-check, data refresh, and read-back verification, to ensure the reliability of the process. Test results verify that this refresh method is stable and reliable and has high practical engineering value.
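The self-check, data refresh, and read-back verification sequence described in the abstract above can be sketched in Python as follows. This is a simplified model under stated assumptions: a `bytearray` stands in for the NOR Flash, CRC-32 is assumed as the integrity check (the abstract does not say which check is used), and the function name, offsets, and layout are hypothetical. A real NOR Flash would also need a sector erase before reprogramming.

```python
import zlib


def refresh_flash(flash, copy_offsets, length, expected_crc):
    """Refresh redundant bitstream copies held in `flash` (a bytearray).

    Returns True if every copy matches `expected_crc` after the refresh,
    False if no intact copy was found or read-back verification failed.
    """
    # 1. Self-check: look for a copy whose CRC-32 still matches.
    good = None
    for off in copy_offsets:
        image = bytes(flash[off:off + length])
        if zlib.crc32(image) == expected_crc:
            good = image
            break
    if good is None:
        return False  # nothing trustworthy to refresh from

    # 2. Data refresh: rewrite every copy from the known-good image.
    for off in copy_offsets:
        flash[off:off + length] = good

    # 3. Read-back verification: re-read each copy and compare CRCs.
    return all(
        zlib.crc32(bytes(flash[off:off + length])) == expected_crc
        for off in copy_offsets
    )
```

Keeping two identical copies, as the abstract describes, means a single corrupted copy can be repaired from the other; the sketch simply reports failure when neither copy passes the self-check.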
  • Accepted: 2026-03-23
    This paper designs and implements an FFT hardware accelerator and host computer system for bridge structural health monitoring. The hardware adopts a sequential radix-4 FFT architecture that reuses a single butterfly unit for all computations. This approach preserves full functionality while reducing resource overhead and power consumption, and it supports configurable FFT sizes of 4, 16, 64, and 256 points. To enable data interaction and visualization, a host computer platform was also developed for parameter configuration, operational control, and real-time display and analysis of frequency-domain results. The hardware accelerator was verified in a 180 nm CMOS process and operates stably at 100 MHz. Applied to bridge vibration signal processing, the system accurately extracts the primary frequency components, meeting the combined requirements of real-time performance, precision, and energy efficiency for bridge structural health monitoring.
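To make the radix-4 ("4-base") structure in the abstract above concrete, here is a minimal software sketch of a radix-4 decimation-in-time FFT in Python, in which every stage calls the same `butterfly4` function, loosely mirroring the reuse of one butterfly unit. It is a reference model for the arithmetic only: the function names are illustrative, and nothing here reflects the accelerator's actual datapath, fixed-point format, or scheduling.

```python
import cmath


def butterfly4(a, b, c, d):
    """Radix-4 butterfly: the 4-point DFT of (a, b, c, d)."""
    return (
        a + b + c + d,
        a - 1j * b - c + 1j * d,
        a - b + c - d,
        a + 1j * b - c - 1j * d,
    )


def fft_radix4(x):
    """Radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4:
        raise ValueError("length must be a power of 4")
    q = n // 4
    # Four interleaved sub-sequences, each transformed recursively.
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for k in range(q):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        y = butterfly4(
            subs[0][k],
            w * subs[1][k],
            w ** 2 * subs[2][k],
            w ** 3 * subs[3][k],
        )
        for m in range(4):
            out[k + m * q] = y[m]
    return out
```

The sizes named in the abstract (4, 16, 64, and 256 points) are all powers of 4, so each decomposes into one to four radix-4 stages with no mixed-radix step.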
  • Accepted: 2026-03-23
    With the continuous and rapid development of Artificial Intelligence (AI), machine vision and embedded control have progressively become foundational technologies for the intelligent manufacturing industry. To meet the urgent demand for teaching and experimental platforms amidst the reform of AI education in universities, this paper proposes and implements an intelligent sorting system based on the Robot Operating System 2 (ROS2) framework. The system utilizes a Raspberry Pi as the upper-computer platform for vision acquisition and inference. Real-time video streams are collected via a USB camera, and OpenCV is employed for preprocessing operations, including video frame decoding, color space conversion, and scaling. Subsequently, ONNX Runtime is utilized for the deployment and inference of deep learning models. At the execution level, the system employs an ESP32 microcontroller as the ROS2 lower-level node. It establishes stable communication with the Raspberry Pi over a Local Area Network (LAN) via micro-ROS, enabling precise control of the conveyor belt motor and the pusher mechanism. All nodes operate within the same Wireless LAN (WLAN) and utilize the DDS protocol for rapid node discovery and reliable message transmission. This paper provides a detailed introduction to the system's design, covering hardware structure, vision processing workflows, communication architecture, and actuator control. Furthermore, the stability, real-time performance, and scalability of the system are validated through multiple rounds of experiments. Finally, centering on the requirements of educational platform construction, this paper analyzes the value of the system in experimental teaching and discusses its future application prospects in intelligent manufacturing and university laboratory platforms.
  • Accepted: 2026-03-20
    In compute-in-memory (CIM) employing high-density 2T0C arrays, parasitic capacitances critically determine charge redistribution and bit line integration dynamics, directly impacting storage-node (SN) disturbance and computational linearity. However, the computational cost of conventional extraction methods escalates with array size, which obstructs efficient array-level modeling and system analysis. To address this, we propose a high-accuracy approximation method for extracting parasitics from the central cell of large-scale arrays by exploiting the attenuation of coupling along long interconnects. The method constructs a nine-port aggregated equivalent network by bundling non-adjacent word/bit lines and derives a quantitative expression for the minimum truncation distance of key capacitances under a 1% relative-error bound, enabling rapid parameter extraction for the array-level model (AM). This yields high-accuracy models for the SN and bit line capacitances (CSN and CRBL) across operational phases, accurately capturing the near-linear scaling of CRBL with array size. Simulations under 10× geometric scaling show a 15% accuracy improvement over the device-level model (DM). Crucially, linearity analysis based on this precise model reveals that using the low-accuracy DM would overestimate the peak integral non-linearity (INL) by approximately 1.5 least significant bits (LSB).
  • Accepted: 2026-03-18
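    Against the dual backdrop of accelerating domestic technological self-reliance and surging demand for AI computing power, the core low-level disk-control technologies for storage remain insufficiently localized. To address this problem, this paper designs and implements a SATA-standard NAS storage system built on a domestic hardware and software ecosystem. Its core hardware architecture uses a Fudan Microelectronics FPGA and a Loongson 3A3500 processor, and its software runs on the Galaxy Kylin operating system. The FPGA bridges the PCIe and SATA protocols, providing core functions that include high-speed PCIe data exchange, HBA controller logic, and SATA protocol processing. On this basis, and in combination with RAID redundant storage technology, a NAS storage management system integrating volume group management and multi-protocol file sharing is designed and implemented, offering a feasible technical path and a practical solution for storage localization.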
  • Accepted: 2026-03-17
    In response to the cybersecurity situation of the new era, China has proposed an active immune protection system based on trusted computing 3.0. In line with China's national standard specifications for the Trusted Platform Control Module (TPCM), a TPCM module solution based on a secure SSD and its implementation method are proposed. Using the SM2, SM3, and SM4 modules, the random number generator module, and the OTP controller module built into the security control chip, the chip's power-on self-test, secure boot, data encryption and decryption, and key management functions are realized. Based on the secure SSD, authentication software is designed to perform trusted boot measurement of, and response to, the physical environment, including the computer motherboard, BIOS, peripherals, and IP address. Compared with existing TPCM cards, the proposed solution offers high security, good compatibility, strong scalability, convenient deployment, and low cost.
  • Accepted: 2026-03-16
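    Real-time imaging processing for spaceborne synthetic aperture radar (SAR) must perform multi-mode, computation-intensive processing under strictly limited onboard resources and harsh space radiation conditions. Developing a dedicated SAR imaging processing SoC (System on Chip), in which dedicated compute-engine circuits accelerate the algorithms, can effectively improve the efficiency of real-time imaging processing. During the design of such a chip, its functional boundaries and abnormal scenarios must be verified, and improving the verification coverage of the dedicated algorithm-acceleration circuits is the core difficulty. This paper proposes a verification method for SAR imaging processing chips based on fine-grained circuit-matched modeling. An independent fine-grained reference model is built for each compute engine; beyond implementing the arithmetic function, it also models characteristics such as the circuit's computation precision and execution priority, which solves the problem that traditional reference models cannot be used for data comparison on SAR compute engines. On this basis, a reusable and extensible verification environment is built with UVM (Universal Verification Methodology), using a combined verification strategy of algorithm-function tests plus random configuration. Tests show that the proposed method raises compute-engine verification coverage by 14% to 30%, meeting the requirements of real-time SAR imaging processing applications.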
  • Accepted: 2026-03-15
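    To meet the pressing industrial demand for high-efficiency, low-power embedded three-dimensional (3D) measurement systems, an FPGA-based 3D measurement system built on a half-period fringe binary coding algorithm architecture is designed. First, structured-light images are captured through a camera control module, then buffered and preprocessed by filtering. Next, the parallel computing capability of the FPGA is used to hardware-accelerate the key algorithm modules, including wrapped-phase calculation, phase unwrapping, and 3D reconstruction. Finally, the reconstruction results are transmitted through a Gigabit Ethernet module to a host computer for visualization, forming a complete embedded 3D measurement pipeline. Experiments show that, for a single group of images at 720×540 resolution, one reconstruction takes 4 ms, the system power consumption is 1.687 W, and the fitting error when measuring a standard sphere is 0.0532 mm. The system thus combines low power consumption with good robustness and practicality, providing a feasible solution for implementing embedded 3D measurement systems.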
  • Accepted: 2026-03-13
    To meet the demands of modern new energy vehicles for low network transmission latency and real-time data delivery, a vehicle gateway controller based on CAN FD (Controller Area Network Flexible Data-rate) was designed using the Renesas automotive-grade microcontroller RH850/F1K. It features six bus channels, supports both classical CAN and CAN FD, and includes data storage functionality. The bus interface uses the next-generation CAN transceiver TJA1462 to enhance signal transmission performance. The software is based on the embedded operating system FreeRTOS, enabling multitasking and a structured, layered organization of functions. Additionally, a dedicated driver was developed for the distinctive transmit and receive mechanism of the CAN peripheral on the RH850/F1K microcontroller. The gateway's data transmission and storage capabilities were tested with a USB-to-CAN tool. The results demonstrate that the designed gateway controller operates stably even when the bus load approaches the usage limit, providing an efficient and reliable solution for vehicle gateway controllers.
  • Chen Zhihong, Du Yuan, Du Li
    Accepted: 2026-03-13
    Ensuring the structural integrity of bridge expansion joints is critical for maintaining traffic safety and prolonging infrastructure lifespan. However, conventional inspection methods are often labor-intensive, time-consuming, and disruptive to traffic flow. To overcome these limitations, this study proposes an edge–cloud collaborative acoustic monitoring system for the long-term health assessment of bridge expansion joints. Over an 18-month monitoring period spanning multiple bridges, approximately 21,000 acoustic signature samples of bridge expansion joints were collected. To ensure accuracy and reliability, all samples were meticulously annotated by experienced highway engineers. Based on this dataset, an adaptive edge–cloud collaborative classification framework was developed. Specifically, the edge device employs a cascade classification model based on Support Vector Machines (SVM) for real-time, lightweight inference, while the cloud leverages a deep learning model based on Gated Recurrent Units (GRU) to perform more complex analysis. The cascaded classification model deployed on the edge device achieved an accuracy of 96.7%, with a 95% confidence interval (CI) of (0.9632, 0.9668). In comparison, the GRU-based classifier running on the cloud attained a higher accuracy of 98.2%, with a 95% CI of (0.9783, 0.9823). Furthermore, the proposed adaptive two-stage classification strategy reduced data transmission to less than 5% of the total collected data. These experimental results demonstrate that the proposed system offers a reliable, efficient, and accurate solution for the acoustic health monitoring of bridge expansion joints.
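One plausible way to realize a two-stage edge–cloud strategy like the one in the abstract above is to let the edge classifier keep the samples it is confident about and forward only the uncertain ones to the cloud. The Python sketch below shows that routing logic. The confidence threshold, the function names, and the classifier interfaces are assumptions for illustration; the abstract does not describe how the adaptive strategy actually decides which samples to transmit.

```python
def classify_two_stage(samples, edge_classify, cloud_classify, threshold=0.9):
    """Route each sample through the edge model, falling back to the cloud.

    edge_classify(sample)  -> (label, confidence in [0, 1])   # hypothetical interface
    cloud_classify(sample) -> label                           # hypothetical interface

    Returns (labels, fraction of samples uploaded to the cloud).
    """
    labels = []
    uploaded = 0
    for sample in samples:
        label, confidence = edge_classify(sample)
        if confidence < threshold:
            # The edge model is unsure: send this sample for cloud analysis.
            label = cloud_classify(sample)
            uploaded += 1
        labels.append(label)
    upload_fraction = uploaded / len(samples) if samples else 0.0
    return labels, upload_fraction
```

In such a scheme the threshold trades transmission volume against accuracy: raising it sends more samples to the heavier cloud model, and lowering it keeps more decisions on the edge device.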