
Just Accepted

Note: The articles listed below have been peer-reviewed and accepted for publication in this journal. These articles have not yet been scheduled for a specific issue; their content and layout may undergo minor changes in the final published version. Please refer to the final published version as the definitive one. This journal has assigned each of these articles a unique and persistent DOI. You may use the DOI to cite an article directly.
  • Accepted: 2025-11-17
    In FPGA-based high-speed storage devices, the ability to cascade devices is crucial for their compatibility and scalability. This paper therefore designs a cascaded storage system for FPGA-based high-speed storage devices that combines the high bandwidth of FPGA-based storage with the flexible scalability of general-purpose storage devices. Experimental results show that, under a "one master, multiple slaves" management mode with global clock synchronization and token polling, the cascaded storage system sustains a storage bandwidth of 6.40 GB/s. In continuous write and replay tests on large-scale data, data is written stably and verification finds no bit errors, achieving transparent expansion of the storage system.
  • Accepted: 2025-11-17
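    This paper studies the page-table mechanism that QEMU uses to map virtual-machine memory to the host, and analyzes in depth how page-table entries are filled and how read/write instructions trigger the distinct handling paths for different types of memory. A new flag bit is added to the page attributes; it is set during page-table filling and checked in the helper functions that perform instruction memory reads and writes. This makes it possible to locate addresses carrying a given attribute and to dispatch them to a specific callback function. Following the procedure for adding peripherals in QEMU's bundled leon3 example, interface functions for a dynamic library were designed, including device creation, initialization, and read/write callbacks. The paper analyzes QEMU's read/write flow for MMIO peripherals and its parameter-passing characteristics, derives the peripheral-location principle and the basic parameters the callbacks require, and on this basis designs and specifies the precise call sites of the read/write callbacks in the dynamic library for off-chip MMIO peripherals. Finally, experiments evaluate the correctness and speed sensitivity of the approach. They show that the method cleanly separates peripheral code from QEMU code, produces correct results, and runs at more than 97% of the speed achieved when the peripheral source is compiled together with the QEMU source. This work may serve as a reference for virtual machine developers and users of open-source QEMU.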
  • Accepted: 2025-11-11
    Helicopters are widely used in military, civil, and public-service fields because of their superior flexibility and maneuverability, and they hold an irreplaceable position in certain applications. Since the advent of the helicopter, faults have emerged one after another, and helicopter fault detection has often relied on manually checking each part one by one. To improve on this traditional approach, this paper uses an EP4CE-series FPGA chip and the ADC128S022, a 12-bit, 8-channel analog-to-digital converter (ADC), to acquire helicopter vibration data in real time, and extracts feature values from the collected data with the fast Fourier transform (FFT). Experimental results show that the alarm rate of the helicopter fault detection exceeds 99%.
  • Accepted: 2025-10-20
    Dynamic reconfiguration technology has numerous applications, but research targeting DSP chips is extremely scarce. This paper proposes a DSP-based partial dynamic reconfiguration method that takes the most frequently modified functions in an application program as the reconfiguration units. It allocates the DSP memory space and FLASH storage space occupied by the functions to be reconfigured, and replaces the function data in this space online on demand, thereby achieving partial dynamic reconfiguration. Tests on the domestic FT-M6678 show that the method can effectively change the behavior of the reconfigurable functions without affecting the operation of other software modules. It provides practical methods and experience for the flexible use of DSPs and has good application prospects.
  • Accepted: 2025-09-24
    With the gradual adoption of embedded systems in industrial control, the need to build a data-centric digital factory that supports production management, scheduling decisions, and the intelligent configuration of production resources has become increasingly prominent. Efficient and reliable data transmission plays a crucial underlying role in this digital construction and is a prerequisite for the orderly operation of the entire embedded system. Data Distribution Service (DDS), a high-performance communication middleware, provides a specification for data sharing between different systems and has received wide attention in recent years. However, complete DDS implementations on embedded platforms still have problems: embedded devices cannot directly join the DDS distributed network as communication nodes, and the real-time delivery of urgent messages cannot be guaranteed when network resources conflict. To address these issues, this paper proposes an optimization strategy based on hardware/software co-design that targets the operational characteristics of DDS. It uses a dedicated SRAM for rapid loading of DDS modules and DMA to improve the energy efficiency of data exchange, together with multi-level parallel computing based on module decoupling and a high-availability software design based on the Master-Works pattern. Testing and verification were conducted on STM32H4, and the analysis shows that the proposed method is suitable for real-time performance analysis of data distribution services in network environments. Compared with centralized data centers, the packet loss rate is reduced by 5% and data transmission efficiency is improved by approximately 8%.
  • Accepted: 2025-09-24
    Currently, in SoC chips on the market that integrate a neural network processor, the post-processing stage of the YOLO algorithm is executed on the CPU, which increases the overall execution time of the algorithm. This paper proposes an RTL-based hardware acceleration scheme for YOLO post-processing on FPGA chips. First, the algorithm's execution flow is optimized to greatly reduce redundant computation. Next, the numerical distribution of the feature maps is analyzed and bounded, and the variable ranges are defined accordingly. Then, the RAM lookup process is organized to implement the mapping of the nonlinear functions. After that, the data-flow control architecture of the overall post-processing algorithm is described, and several practical techniques are proposed for key functional modules. Finally, the acceleration scheme is tested on a board built around a domestic ZYNQ chip, and its performance is evaluated along several dimensions, with the causes analyzed. Experimental results show that the implementation uses less than 3% of the logic resources, with a computational accuracy loss of about 0.5% and a computation efficiency 7 times that of the CPU. When connected to real-time video acquisition, the FPGA system runs stably, and target boxes are detected and marked accurately.
  • Accepted: 2025-09-17
    The existing debugging functions for embedded real-time operating system application development, including variable inspection, breakpoint management, and memory read/write operations, generally satisfy users' requirements for debugging multitasking applications. However, little attention has been paid to debugging a specific task during multitasking debugging. In particular, when multiple tasks call the same function, debugging becomes inconvenient for users. This paper presents a method for debugging specified tasks in multitasking programs, implemented on an embedded real-time operating system and an independently developed debugger for the “HunXin” digital signal processor. The approach significantly improves the efficiency of multitasking application debugging and reduces application development time.
  • Accepted: 2025-09-17
    The PCIe bus enables low-latency, high-bandwidth data transmission between a CPU and an FPGA. The key is the design of a DMA engine, which frees the CPU from taking part in the data transfers. However, most current CPU+FPGA data transmission solutions are based on foreign FPGA devices from Xilinx, and commercial IP cores for domestic FPGAs are severely lacking, which makes these solutions difficult to port to domestic FPGA platforms. This paper therefore uses a domestic FPGA to design a PCIe-based DMA engine and its driver, hiding the parsing of transaction layer packets (TLPs) in the PCIe protocol stack and reducing the development complexity of PCIe-based applications on domestic FPGAs. Experimental results show that the DMA engine achieves a read throughput of 784 MB/s and a write throughput of 800 MB/s over a PCIe 2.0 x2 bus, reaching 82% and 84% of the theoretical maximum bandwidth of PCIe 2.0 x2, respectively.
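The cascaded storage abstract (accepted 2025-11-17) manages slaves through token polling. The sketch below is a minimal, hypothetical model of that idea only: the master hands a single write token to each slave in a fixed cyclic order, and only the token holder may write one block per grant. The function name `token_poll` and the data layout are assumptions for illustration; global clock synchronization is not modeled.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence, Tuple


def token_poll(slave_ids: Sequence[int],
               pending: Dict[int, Deque[str]],
               rounds: int) -> List[Tuple[int, str]]:
    """Hypothetical master loop: pass one write token to each slave in turn.

    Only the current token holder may write, and it writes at most one block
    per grant, so in this model slaves never contend for the storage path.
    Returns the (slave_id, block) pairs in the order they were written.
    """
    written: List[Tuple[int, str]] = []
    for _ in range(rounds):
        for sid in slave_ids:          # the token moves to the next slave
            queue = pending[sid]
            if queue:                  # an idle slave simply passes the token on
                written.append((sid, queue.popleft()))
    return written


if __name__ == "__main__":
    backlog = {1: deque(["a", "b"]), 2: deque(["c"]), 3: deque()}
    print(token_poll([1, 2, 3], backlog, rounds=2))
```

A fixed cyclic order gives every slave a bounded wait for the token, which is one reason round-robin polling is a common choice for a single shared write path.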
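The QEMU abstract (accepted 2025-11-17) adds a new flag bit to page attributes, sets it when a page-table entry is filled, and checks it in the load/store helpers to route accesses to callbacks supplied by a dynamic library. The toy model below illustrates that control flow in the spirit of the flag bits QEMU keeps in its software TLB entries; it is not QEMU code, and `SoftTLB`, `FLAG_EXT_DEVICE`, `map_device`, `load_u8`, and `store_u8` are hypothetical names.

```python
from typing import Callable, Dict, Tuple

PAGE_BITS = 12
OFFSET_MASK = (1 << PAGE_BITS) - 1
# Hypothetical new page-attribute bit marking pages owned by an external device library.
FLAG_EXT_DEVICE = 1 << 0

ReadCb = Callable[[int], int]
WriteCb = Callable[[int, int], None]


class SoftTLB:
    """Toy software TLB mapping guest pages to (host RAM base, flags)."""

    def __init__(self, ram_size: int = 1 << 16) -> None:
        self.ram = bytearray(ram_size)
        self.entries: Dict[int, Tuple[int, int]] = {}
        self.devices: Dict[int, Tuple[ReadCb, WriteCb]] = {}

    def fill(self, guest_page: int, host_base: int, flags: int = 0) -> None:
        """Page-table fill: record the mapping and set any attribute flags."""
        self.entries[guest_page] = (host_base, flags)

    def map_device(self, guest_page: int, read_cb: ReadCb, write_cb: WriteCb) -> None:
        """Register callbacks from a loaded device library and flag the page."""
        self.fill(guest_page, 0, FLAG_EXT_DEVICE)
        self.devices[guest_page] = (read_cb, write_cb)

    def load_u8(self, addr: int) -> int:
        """Load helper: check the flag, then dispatch to the device or read RAM."""
        page, offset = addr & ~OFFSET_MASK, addr & OFFSET_MASK
        host_base, flags = self.entries[page]
        if flags & FLAG_EXT_DEVICE:
            return self.devices[page][0](offset)
        return self.ram[host_base + offset]

    def store_u8(self, addr: int, value: int) -> None:
        """Store helper: same flag check on the write path."""
        page, offset = addr & ~OFFSET_MASK, addr & OFFSET_MASK
        host_base, flags = self.entries[page]
        if flags & FLAG_EXT_DEVICE:
            self.devices[page][1](offset, value)
        else:
            self.ram[host_base + offset] = value & 0xFF
```

Ordinary RAM pages never pay for the callback lookup beyond a single bit test, which mirrors why a flag in the page attributes is a cheap way to separate external peripherals from normal memory.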
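The helicopter fault-detection abstract (accepted 2025-11-11) converts 12-bit ADC samples and extracts feature values with an FFT. The sketch below shows one plausible version of that pipeline on the host side: scale codes to volts, remove the DC offset, run a radix-2 FFT, and report the dominant frequency, its amplitude, and the RMS. The reference voltage, sampling rate, and choice of features are assumptions; the abstract does not specify which feature values the paper uses.

```python
import cmath
import math
from typing import List, Sequence, Tuple

# Hypothetical acquisition parameters (not given in the abstract).
V_REF = 3.3        # ADC reference voltage in volts
FULL_SCALE = 4095  # maximum code of a 12-bit converter
FS = 1024.0        # sampling rate in Hz


def codes_to_volts(codes: Sequence[int]) -> List[float]:
    """Scale raw 12-bit ADC codes to volts."""
    return [c * V_REF / FULL_SCALE for c in codes]


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectral_features(volts: Sequence[float]) -> Tuple[float, float, float]:
    """Return (dominant frequency in Hz, its amplitude in V, RMS of the AC part)."""
    n = len(volts)
    mean = sum(volts) / n
    ac = [v - mean for v in volts]                      # remove the DC offset
    spectrum = fft(ac)
    mags = [abs(c) * 2 / n for c in spectrum[: n // 2]]  # single-sided amplitudes
    k = max(range(1, n // 2), key=lambda i: mags[i])    # skip bin 0 (DC)
    rms = math.sqrt(sum(a * a for a in ac) / n)
    return k * FS / n, mags[k], rms


if __name__ == "__main__":
    samples = [round(2048 + 1000 * math.sin(2 * math.pi * 50 * i / 1024)) for i in range(1024)]
    print(spectral_features(codes_to_volts(samples)))
```

In a detection loop, features like these would be compared against per-component thresholds or a healthy baseline; how the paper raises its alarm is not described in the abstract.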
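The DSP reconfiguration abstract (accepted 2025-10-20) reserves space for each reconfigurable function and replaces the function data in that space online. The sketch below is a hypothetical model of the slot discipline only: each function owns a fixed-size slot, a new image must fit its slot, and replacing one slot leaves the others untouched. `ReconfigurableRegion` and its methods are illustrative names; the paper's FLASH copy and the FT-M6678 specifics are not modeled.

```python
from typing import Dict, Tuple


class ReconfigurableRegion:
    """Hypothetical reserved region: each reconfigurable function owns a fixed slot,
    so callers can always branch to the same slot address after a replacement."""

    def __init__(self, slot_sizes: Dict[str, int]) -> None:
        self.slots: Dict[str, Tuple[int, int]] = {}
        base = 0
        for name, size in slot_sizes.items():   # lay slots out back to back
            self.slots[name] = (base, size)
            base += size
        self.memory = bytearray(base)

    def replace(self, name: str, image: bytes) -> None:
        """Overwrite one slot with a new function image; other slots are untouched."""
        base, size = self.slots[name]
        if len(image) > size:
            raise ValueError(f"{name}: {len(image)}-byte image exceeds {size}-byte slot")
        self.memory[base:base + len(image)] = image
        self.memory[base + len(image):base + size] = bytes(size - len(image))  # clear the tail

    def read_slot(self, name: str) -> bytes:
        base, size = self.slots[name]
        return bytes(self.memory[base:base + size])
```

Fixing slot sizes up front is what lets a replacement stay local; on real hardware the new code would also have to be made visible to instruction fetch before it is called.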
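The YOLO post-processing abstract (accepted 2025-09-24) bounds the variable ranges and then maps nonlinear functions through RAM lookups. The sketch below illustrates that general technique for the sigmoid, which YOLO uses for objectness and class scores: clamp the input to a fixed range, turn it into a table address, and return a precomputed fixed-point value. The range, table depth, and output width are assumptions, not values from the paper.

```python
import math

# Hypothetical parameters: the abstract gives neither the clamped range nor the table size.
X_MIN, X_MAX = -8.0, 8.0  # input range after clamping
ENTRIES = 256             # depth of the lookup RAM
OUT_MAX = 255             # 8-bit unsigned output, 255 represents 1.0
STEP = (X_MAX - X_MIN) / ENTRIES

# Table contents, as they would be preloaded into block RAM (sampled at bin midpoints).
SIGMOID_LUT = [
    round(OUT_MAX / (1.0 + math.exp(-(X_MIN + (i + 0.5) * STEP))))
    for i in range(ENTRIES)
]


def lut_sigmoid(x: float) -> int:
    """Clamp x into the table range, convert it to an address, and read the entry."""
    x = min(max(x, X_MIN), X_MAX - STEP)
    addr = int((x - X_MIN) / STEP)
    return SIGMOID_LUT[addr]


def max_abs_error(samples: int = 10_000) -> float:
    """Worst-case error of the table versus the exact sigmoid over the clamped range."""
    worst = 0.0
    for k in range(samples):
        x = X_MIN + (X_MAX - X_MIN) * k / samples
        exact = 1.0 / (1.0 + math.exp(-x))
        worst = max(worst, abs(lut_sigmoid(x) / OUT_MAX - exact))
    return worst
```

Because the sigmoid saturates outside a modest range, clamping costs almost nothing in accuracy while keeping the table small; the same pattern applies to the exponential used for box width and height.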
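The task-specific debugging abstract (accepted 2025-09-17) addresses breakpoints in functions shared by several tasks. One common way to get that behavior, shown below as a hypothetical sketch rather than the paper's method, is to attach an optional task ID to each breakpoint and, when the target traps, halt only if the currently running task matches. The class and method names are illustrative, and obtaining the current task ID from the RTOS is assumed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Breakpoint:
    address: int
    task_id: Optional[int] = None  # None means "stop in any task"


class TaskAwareBreakpoints:
    """Hypothetical host-side filter for breakpoint traps reported by the target."""

    def __init__(self) -> None:
        self._table: Dict[int, Breakpoint] = {}

    def set(self, address: int, task_id: Optional[int] = None) -> None:
        self._table[address] = Breakpoint(address, task_id)

    def clear(self, address: int) -> None:
        self._table.pop(address, None)

    def should_halt(self, address: int, current_task_id: int) -> bool:
        """Halt for the user only if the breakpoint applies to the running task;
        otherwise the debugger resumes the target without reporting the hit."""
        bp = self._table.get(address)
        if bp is None:
            return False
        return bp.task_id is None or bp.task_id == current_task_id
```

With software breakpoints, a non-matching hit still costs a trap plus stepping over the original instruction, so the filter saves the user's attention rather than target execution time.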
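The PCIe DMA abstract (accepted 2025-09-17) reports 784 MB/s and 800 MB/s as 82% and 84% of the PCIe 2.0 x2 maximum. PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding, so each lane carries 500 MB/s per direction before packet overhead, and two lanes carry 1000 MB/s (10^6-byte megabytes). The quoted percentages are consistent with reading the measured figures as MiB/s (2^20 bytes) against that raw bandwidth; this is an interpretation, since the abstract does not state its unit convention. Under a 10^6-byte reading the ratios would be 78.4% and 80.0%.

```python
# PCIe 2.0 physical layer: 5 GT/s per lane per direction, 8b/10b line coding.
GT_PER_S = 5.0e9
ENCODING_EFFICIENCY = 8 / 10


def pcie2_raw_bandwidth_bytes(lanes: int) -> float:
    """Per-direction bandwidth in bytes/s after 8b/10b coding, before TLP overhead."""
    return GT_PER_S * ENCODING_EFFICIENCY * lanes / 8


def fraction_of_theoretical(measured_mib_s: float, lanes: int) -> float:
    """Ratio of a measured throughput, assumed to be in MiB/s, to the raw bandwidth."""
    theoretical_mib_s = pcie2_raw_bandwidth_bytes(lanes) / 2**20
    return measured_mib_s / theoretical_mib_s


if __name__ == "__main__":
    raw = pcie2_raw_bandwidth_bytes(2)
    print(f"x2 raw: {raw / 1e6:.0f} MB/s = {raw / 2**20:.1f} MiB/s")
    for label, measured in (("read", 784), ("write", 800)):
        print(f"{label}: {fraction_of_theoretical(measured, 2):.1%}")
```

Real transfers also lose bandwidth to TLP headers, DLLP traffic, and flow-control updates, so a figure measured against the raw rate understates how close a DMA engine is to the achievable limit.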