Just Accepted

Note: The articles listed below have been peer-reviewed and accepted for publication in this journal. These articles have not yet been scheduled for a specific issue; their content and layout may undergo minor changes in the final published version. Please refer to the final published version as the definitive one. This journal has assigned each of these articles a unique and persistent DOI. You may use the DOI to cite this article directly.
  • Accepted: 2025-12-16
    Stable link establishment in the JESD204B interface protocol is a core prerequisite for reliable high-speed data communication, and it is of great significance for enhancing the performance of high-speed acquisition and transmission systems. To address the low link establishment success rate and poor fault localization efficiency of traditional JESD204B IP cores in harsh environments such as high and low temperatures, this paper proposes an IP core optimization scheme that balances environmental adaptability and debuggability. The scheme adopts a hierarchical optimization strategy. An XADC temperature acquisition module is introduced into the jesd204_phy core to dynamically configure high-speed interface parameters according to the real-time temperature range, thereby enhancing the link's resistance to temperature drift. In addition, a link establishment timeout response module is added to the jesd204_core core; it avoids link blockage caused by timeouts through classified error statistics and ordered reset control, and provides a quantitative basis for fault localization. An "ADC+FPGA" data acquisition and transmission verification system was constructed and tested over a wide temperature range of -55℃ to 125℃. The results show that the link establishment success rate under extreme high and low temperature conditions is improved by approximately 8% compared to the traditional scheme. Furthermore, the optimized IP core effectively locates the cause of faults, verifying its high reliability and engineering practicality and meeting the high-speed data transmission requirements of harsh environments.
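The temperature-dependent configuration described in this abstract can be pictured as a lookup from the measured temperature to a band of interface settings. The following is only a minimal sketch under stated assumptions: the band boundaries, the setting names (`tx_swing_code`, `rx_equalizer`), and their values are hypothetical placeholders, not parameters taken from the paper or from any vendor IP core.

```python
from bisect import bisect_right

# Hypothetical band boundaries in degrees Celsius (assumption, not from the paper):
# cold: T < -20, nominal: -20 <= T < 85, hot: T >= 85.
BAND_UPPER_LIMITS_C = [-20.0, 85.0]

# Hypothetical per-band settings; the names and values are illustrative only.
BAND_SETTINGS = [
    {"band": "cold", "tx_swing_code": 10, "rx_equalizer": "low_gain"},
    {"band": "nominal", "tx_swing_code": 8, "rx_equalizer": "low_gain"},
    {"band": "hot", "tx_swing_code": 12, "rx_equalizer": "high_gain"},
]


def select_phy_settings(temp_c: float) -> dict:
    """Return the settings of the band that contains the measured temperature."""
    # The abstract reports tests from -55 to 125 degrees Celsius; reject anything else.
    if not -55.0 <= temp_c <= 125.0:
        raise ValueError(f"temperature {temp_c} is outside the tested range")
    # bisect_right counts how many boundaries are <= temp_c, which is the band index.
    return BAND_SETTINGS[bisect_right(BAND_UPPER_LIMITS_C, temp_c)]


if __name__ == "__main__":
    for reading in (-55.0, 25.0, 110.0):
        print(reading, select_phy_settings(reading))
```

A hardware implementation would more likely add hysteresis around each boundary so that a reading hovering near a limit does not reconfigure the link repeatedly; that refinement is omitted here.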
  • CHENG Zexiang, FENG Chaochao, ZHAO Zhenyu, LUO Yuansheng
    Accepted: 2025-12-12
    With the continuous scaling down of transistor technology nodes, achieving timing closure in nanoscale integrated circuits faces severe challenges. Although traditional circuit simulation can evaluate the performance of cell netlists and layouts, its computationally intensive nature results in prohibitively high time costs. This paper proposes a delay-optimization-sensitive cell prediction model that integrates Graph Convolutional Networks (GCN) and Multilayer Perceptrons (MLP). The approach first dynamically adjusts transistor sizes in the netlist based on input signal states, then employs GCN to parse cell netlist structures and generate homogeneous graph representations of transistor connectivity relationships and process parameters. Finally, these topological features are fused with conventional timing characteristics and fed into an MLP to predict cell optimization potential, thereby identifying delay-optimization-sensitive cells. Experimental results demonstrate prediction accuracy rates of 83.2% for the top 10 delay-optimization-sensitive cells with the highest optimization potential and 75.3% for the top 5 such cells. Compared to SPICE simulation, the time required to identify delay-optimization-sensitive cells is reduced from hours to minutes, achieving approximately 600 times acceleration. This method can accurately identify critical optimization targets, provide layout designers with transistor-level optimization parameters, and significantly improve timing closure efficiency.
  • Accepted: 2025-12-05
    To solve the problem that partition applications fail to execute correctly when an ARINC653 partition operating system and its applications run on the MPC750 processor hardware simulator provided by QEMU 5.1.0, this paper analyzes the cause of the anomaly, studies the related technology, and investigates the faulty code. Drawing on the source code of the PPC simulator provided by QEMU, the MPC750 processor documentation, and the relevant ARINC653 operating system code, the QEMU code problem is located by analyzing the operating system's abnormal print output, observing changes to the simulator's memory state, and testing the settings of the relevant MMU status registers. As a result, the partition application can start and run normally in the MPC750 processor hardware simulator and ARINC653 operating system environment.
  • Accepted: 2025-12-02
    This paper presents an efficient detection technology based on the colorimetric principle combined with multi-channel parallel detection. Its core is a detection system that measures 8 sample channels independently and simultaneously. An analog-to-digital converter (ADC) chip converts the analog signals into digital signals and sends them to the microcontroller. The microcontroller also controls loads such as motors, heating elements, and fans, providing solution mixing and temperature control. The technology ensures detection accuracy while significantly improving detection efficiency. It is widely applicable to scenarios requiring batch sample analysis, such as clinical diagnosis, environmental monitoring, and food safety.
  • Zong, Pengchen, Qu, Shaoru, Zhao, Wenzhe, Ren, Pengju, Xia, Tian
    Accepted: 2025-11-20
    Indirect memory accesses, prevalent in data-intensive applications such as graph processing and sparse linear algebra, exhibit irregular patterns that severely degrade cache performance because of their low spatial and temporal locality. Traditional stride-based prefetchers fail to capture such patterns, in which target addresses are computed dynamically through index arrays (e.g., x[a[i]]). This paper proposes the dynamic multi-pattern-aware prefetcher (DMP) to address these challenges. DMP introduces a lightweight shifted differential matching mechanism that autonomously identifies indirect access patterns by comparing index data sequences with target address sequences. Implemented on the open-source XuanTie C910 RISC-V processor, DMP reduces L1 data cache miss rates by 27.3% and achieves speedups of 1.07–1.22× for the Sparse Matrix-Vector Multiplication (SpMV) algorithm. This work provides a hardware-efficient solution for non-contiguous memory access patterns in modern processors.
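One way to picture matching index data against target addresses for an access such as x[a[i]] is sketched below. This is an illustrative software model under stated assumptions, not the DMP hardware design: it assumes the target address has the form base + (index << shift), and the function names and the candidate shift set are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def match_indirect_pattern(
    index_values: Sequence[int],
    target_addrs: Sequence[int],
    shifts: Sequence[int] = (0, 1, 2, 3, 4),
) -> Optional[Tuple[int, int]]:
    """Find (shift, base) such that addr[i] == base + (index[i] << shift).

    Differences between consecutive addresses are compared with shifted
    differences between consecutive index values, so the unknown base
    cancels out during matching.
    """
    if len(index_values) != len(target_addrs) or len(index_values) < 3:
        return None
    idx_deltas = [b - a for a, b in zip(index_values, index_values[1:])]
    addr_deltas = [b - a for a, b in zip(target_addrs, target_addrs[1:])]
    if all(d == 0 for d in idx_deltas):
        return None  # a constant index gives no evidence about the shift
    for shift in shifts:
        if all((di << shift) == da for di, da in zip(idx_deltas, addr_deltas)):
            base = target_addrs[0] - (index_values[0] << shift)
            return shift, base
    return None


def predict_target(index_value: int, shift: int, base: int) -> int:
    """Address to prefetch for an index value that was read ahead of use."""
    return base + (index_value << shift)


if __name__ == "__main__":
    a = [5, 2, 9, 4]                          # index array contents a[i]
    addrs = [0x1000 + (v << 3) for v in a]    # observed addresses of x[a[i]], 8-byte elements
    print(match_indirect_pattern(a, addrs))   # the recovered (shift, base) pair
```

Once a pair is recovered, `predict_target` turns each index value fetched ahead of the demand stream into a prefetch address.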
  • Ma, Chengyu, Li, Suolan, Liu, Yinuo, Zhao, Wenzhe, Ren, Pengju, Xia, Tian
    Accepted: 2025-11-20
    To address the performance bottleneck of Sparse Matrix-Vector Multiplication (SpMV) on GPU platforms, this paper proposes an optimization algorithm based on row re-segmentation and an accompanying performance evaluation model. The method first establishes a quantitative mapping between matrix row lengths and computational resource allocation. By setting dynamic thresholds, the original matrix is partitioned into long-row and short-row submatrices, which are then computed using thread-level and thread-block-level parallel strategies respectively. This approach effectively alleviates the inherent conflict between GPU SIMT execution characteristics and the irregular data distribution of sparse matrices. To quantify the additional overhead introduced during preprocessing, performance penalty models for Atomic Conflict and Padding are developed, transforming extra memory accesses and computation into computable cost functions. Building upon these models, a parameter space search algorithm is constructed that rapidly identifies optimal preprocessing parameters within predefined parameter sets by leveraging pre-acquired hardware performance metrics and information on the distribution of non-zero matrix elements. Experimental results demonstrate that the proposed optimization algorithm outperforms the traditional GPU sparse computation library cuSPARSE across multiple benchmark sparse matrix datasets, achieving performance improvements of up to 1.26× and 1.17× in specific scenarios. Furthermore, the parameter search process incurs low overhead, and the method generalizes well, adapting to diverse input matrices and GPU hardware architectures.
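The row partitioning step can be illustrated on a matrix in CSR form. The sketch below is a CPU-side model under stated assumptions: it uses a single fixed threshold where the paper describes dynamic thresholds, the function names are hypothetical, and the two row groups are processed by the same sequential loop rather than by distinct GPU parallel strategies.

```python
from typing import List, Sequence, Tuple


def split_rows_by_length(
    row_ptr: Sequence[int], threshold: int
) -> Tuple[List[int], List[int]]:
    """Partition CSR row indices into (short_rows, long_rows) by non-zero count."""
    short_rows: List[int] = []
    long_rows: List[int] = []
    for r in range(len(row_ptr) - 1):
        nnz = row_ptr[r + 1] - row_ptr[r]
        (long_rows if nnz > threshold else short_rows).append(r)
    return short_rows, long_rows


def spmv_rows(row_ptr, col_idx, values, x, rows, y) -> None:
    """Compute y[r] = sum_k values[k] * x[col_idx[k]] for the selected rows only."""
    for r in rows:
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y[r] = acc


if __name__ == "__main__":
    # 3x4 matrix: [[1,0,0,2], [0,3,0,0], [4,5,6,7]] in CSR form.
    row_ptr = [0, 2, 3, 7]
    col_idx = [0, 3, 1, 0, 1, 2, 3]
    values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    x = [1.0, 1.0, 1.0, 1.0]
    short_rows, long_rows = split_rows_by_length(row_ptr, threshold=2)
    y = [0.0] * 3
    spmv_rows(row_ptr, col_idx, values, x, short_rows, y)  # one strategy per group
    spmv_rows(row_ptr, col_idx, values, x, long_rows, y)
    print(short_rows, long_rows, y)
```

Because each row belongs to exactly one group, the two passes write disjoint entries of y and can be scheduled independently.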
  • Accepted: 2025-11-20
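    Underground oil pipelines fracture because of corrosion, fatigue, creep, erosion, wear-induced wall thinning, and a series of other factors, causing enormous losses. To address the detection of underground pipeline fractures, a circular-array ultrasonic circumferential target detection circuit based on the synthetic aperture principle is designed. The circuit comprises an ultrasonic excitation module, an array probe control circuit, an ultrasonic transmit/receive circuit, and a data acquisition and storage circuit. The ultrasonic excitation module amplifies the continuous square-wave excitation signal and applies it across the ultrasonic probe. The array control circuit controls the ultrasonic transmitting module, which operates in a one-transmit, multiple-receive mode at any given moment. The ultrasonic receiving circuit handles signal amplification and detection. Finally, the data acquisition module acquires and stores the data and uploads it to a PC for processing. Experimental results show that, in air, the circuit can detect any object within a 180° range at a detection distance greater than 2 m.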
  • Accepted: 2025-11-19
    The transponder is a critical component of the launch vehicle measurement system, capable of receiving and coherently relaying two C-band velocity measurement signals. To achieve high-precision coherent signal relay, the project team utilized an FPGA hardware platform and implemented methods such as improved quantization accuracy, innovative quantization approaches for relay ratios, cross-relay operation modes, and rational allocation of signal processing time. These measures enabled the design of high-precision coherent relay software for velocity measurement signals. Taking the commonly used 200 kHz Doppler frequency shift as an example, the designed velocity measurement accuracy has reached 0.0023 Hz. Additionally, unlike the independent operation modes in which Channel A's main station transmits and Channel A's main/auxiliary stations receive, and Channel B's main station transmits and Channel B's main/auxiliary stations receive, the system now supports bidirectional non-common-source velocity measurement. When either Channel A or Channel B fails to receive signals normally, the design allows the main station of either channel to transmit while the main/auxiliary stations of both channels receive synchronously. This enhances the system's velocity measurement accuracy under abnormal conditions.
  • Accepted: 2025-11-17
    In FPGA-based high-speed storage devices, the ability to cascade devices is crucial for compatibility and scalability. This paper therefore designs a cascading storage system for FPGA-based high-speed storage devices that combines the high bandwidth of such devices with the flexible scalability of general storage devices. Experimental results show that under a "one master, multiple slaves" management mode with global clock synchronization and token polling, the cascading storage system can maintain a storage bandwidth of 6.40 GB/s. In continuous writing and replay tests of large-scale data, the data is written stably and verification finds no error codes, effectively achieving transparent expansion of the storage system.
  • Accepted: 2025-11-17
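    This paper studies the page table mechanism that QEMU uses to map virtual machine memory to the host, and analyzes in depth how page table entries are filled and how read and write instructions trigger distinct handling flows for different types of memory. By adding a new flag bit to the page attributes, and by setting and checking that bit in the page table fill routine and in the helper functions for memory read/write instructions, addresses with a given attribute are located and execution enters a specific callback function. Following the procedure for adding peripherals in the leon3 example shipped with QEMU, the interface functions of a dynamic library are designed, including device creation, initialization, and read/write callback functions. The read/write flow and parameter-passing characteristics of QEMU for MMIO peripherals are analyzed to derive the peripheral location principle and the basic parameters required by the callback functions. On this basis, the exact call sites of the read/write callback functions in the dynamic library for off-chip MMIO peripherals are designed and given. Finally, experiments analyze the correctness and speed sensitivity of the approach. They show that the method separates peripheral code from QEMU code well, produces correct results, and reaches more than 97% of the speed achieved when the peripheral source code is compiled together with the QEMU source code. This work can serve as a reference for virtual machine developers and users of open-source QEMU.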
  • Accepted: 2025-11-11
    Helicopters are widely used in military, civil, and livelihood-related fields because of their superior flexibility and mobility, and they hold an irreplaceable position in specific fields. Since the advent of the helicopter, faults have emerged one after another. Helicopter fault detection has traditionally relied on manually checking each part one by one. To improve on this traditional approach, this paper uses the 12-bit, 8-channel analog-to-digital converter (ADC) chip ADC128S022 with an EP4CE-series FPGA chip to acquire helicopter vibration data in real time, and extracts eigenvalues from the collected data by fast Fourier transform (FFT). The experimental results show that the alarm rate of helicopter fault detection is more than 99%.
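The FFT-based feature extraction mentioned in this abstract can be illustrated with a small software model. The sketch below is an assumption-laden example, not the paper's FPGA implementation: it uses a textbook recursive radix-2 FFT and takes the dominant spectral peak as the extracted feature, whereas the abstract does not say which eigenvalues the authors compute. The function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def fft(samples: Sequence[float]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; the length must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(samples: Sequence[float], sample_rate_hz: float) -> Tuple[float, float]:
    """Return (frequency in Hz, magnitude) of the largest non-DC spectral peak."""
    n = len(samples)
    spectrum = fft(samples)
    # Skip bin 0 (DC) and use only the lower half, since the input is real-valued.
    magnitudes = [abs(c) for c in spectrum[1 : n // 2]]
    peak_bin = 1 + max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * sample_rate_hz / n, magnitudes[peak_bin - 1]


if __name__ == "__main__":
    rate = 32.0
    # Synthetic vibration signal: a 4 Hz tone sampled at 32 Hz for one second.
    signal = [math.sin(2 * math.pi * 4 * i / rate) for i in range(32)]
    print(dominant_frequency(signal, rate))
```

In a real acquisition chain the samples would be the 12-bit ADC codes, typically with the mean removed and a window applied before the transform.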
  • Accepted: 2025-10-20
    Dynamic reconfiguration technology has numerous applications, but research oriented towards DSP chips is extremely scarce. This paper proposes a DSP-based partial dynamic reconfiguration method that takes the most frequently modified functions in application programs as reconfiguration elements. It allocates the DSP memory space and FLASH storage space occupied by the functions to be reconfigured, and replaces the function data in this space online on demand, thereby realizing partial dynamic reconfiguration. Tests on the domestic FT-M6678 show that this method can effectively change the behavior of the reconfigurable functions without affecting the operation of other software modules. It provides practical methods and experience for the flexible use of DSPs and has good application prospects.
  • Accepted: 2025-09-17
    Existing debugging functions for application development on embedded real-time operating systems include variable inspection, breakpoint management, and memory read/write operations, which generally satisfy users' debugging requirements for multitasking applications. However, little attention has been paid to debugging a specific task during multitask debugging. In particular, when multiple tasks invoke the same function, debugging becomes inconvenient for users. This paper presents a method for debugging specified tasks in multitasking programs, implemented on an embedded real-time operating system and an independently developed debugger for the "HunXin" digital signal processor. This approach significantly improves the efficiency of multitasking application debugging and reduces application development time.
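The problem of several tasks calling the same function can be illustrated with a breakpoint that carries a task filter. The sketch below is a hypothetical model, not the mechanism of the debugger described in the paper: the `Breakpoint` type, the `should_halt` function, and the idea of comparing against a current task ID supplied by the operating system are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Breakpoint:
    """A code breakpoint, optionally restricted to one task (hypothetical model)."""
    address: int
    task_id: Optional[int] = None  # None means the breakpoint applies to every task


def should_halt(bp: Breakpoint, pc: int, current_task_id: int) -> bool:
    """Decide whether the debugger stops or silently resumes at this program counter."""
    if pc != bp.address:
        return False
    # A hit in a task other than the selected one is ignored, so the user only
    # stops when the shared function is entered from the task being debugged.
    return bp.task_id is None or bp.task_id == current_task_id


if __name__ == "__main__":
    shared_function_entry = 0x8000  # illustrative address of a function used by all tasks
    bp = Breakpoint(shared_function_entry, task_id=2)
    for task in (1, 2, 3):
        print(task, should_halt(bp, shared_function_entry, task))
```

Filtering on the host side like this costs a stop-and-resume for every hit in another task; a target-side check would avoid that, at the price of extra support in the operating system.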