Just Accepted

Accepted, unedited articles published online and citable. The final edited and typeset version of record will appear in the future.
  • Accepted: 2025-09-24
    As embedded systems are gradually adopted in industrial control, the need for a data-centric digital factory that supports production management, scheduling decisions, and the intelligent configuration of production resources has become increasingly prominent. Efficient and reliable data transmission underpins this digitalization and is a prerequisite for the orderly operation of the entire embedded system. Data Distribution Service (DDS), a high-performance communication middleware, provides a specification for data sharing between different systems and has received widespread attention in recent years. However, current full DDS implementations still have shortcomings on embedded platforms: embedded devices cannot join the DDS distributed network directly as communication nodes, and the real-time delivery of urgent messages cannot be guaranteed when network resources are contended. To address these issues, this paper proposes an optimization strategy based on software-hardware co-design that targets the operational characteristics of DDS. The strategy uses a dedicated SRAM for rapid loading of DDS modules and DMA to improve the energy efficiency of data exchange, together with multi-level parallel computing based on module decoupling and a high-availability software design based on the Master-Works pattern. Testing and verification were conducted on STM32H4, and the analysis results show that the proposed method is suitable for real-time performance analysis of data distribution services in network environments. Compared with centralized data centers, the packet loss rate is reduced by 5%, and the data transmission efficiency is improved by approximately 8%.
  • Accepted: 2025-09-24
    Currently, on commercially available SoCs with integrated neural network processors, the post-processing stage of the YOLO algorithm runs on the CPU, which increases the algorithm's overall execution time. This paper proposes an FPGA-based hardware acceleration scheme for YOLO post-processing, implemented in RTL. First, the algorithm's execution flow is optimized to greatly reduce redundant calculations. Next, the numerical distribution of the feature maps is analyzed and constrained so that variable ranges can be defined reasonably. Subsequently, the RAM lookup process is organized to implement the mapping of nonlinear functions. Then, the data flow control architecture of the overall post-processing algorithm is described, and several practical techniques are proposed for key functional modules. Finally, the acceleration scheme is tested on a board based on a domestic ZYNQ chip, and its performance is evaluated along multiple dimensions, with the causes analyzed. The experimental results show that the implementation occupies less than 3% of the logic resources, with a calculation accuracy loss of about 0.5%, and that its calculation efficiency is 7 times higher than that of the CPU. When connected to real-time video acquisition, the FPGA system runs stably, and target bounding boxes are detected and marked accurately.
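The idea of restricting the input range and mapping a nonlinear function through a RAM lookup can be illustrated with a small software model. This is only a sketch, not the paper's design: the choice of sigmoid as the nonlinear function, the clamp range of [-8, 8], the 256-entry table depth, and all names below are assumptions made for illustration.

```python
import math

# ASSUMPTIONS (not from the abstract): sigmoid as the nonlinear function,
# inputs clamped to [-8, 8], and a 256-entry table standing in for the RAM.
X_MIN, X_MAX = -8.0, 8.0
TABLE_SIZE = 256
STEP = (X_MAX - X_MIN) / TABLE_SIZE


def sigmoid(x):
    """Reference sigmoid, used to fill the table and to measure the error."""
    return 1.0 / (1.0 + math.exp(-x))


# Each entry stores the sigmoid value at the centre of its input bin.
SIGMOID_LUT = [sigmoid(X_MIN + (i + 0.5) * STEP) for i in range(TABLE_SIZE)]


def lut_index(x):
    """Clamp x to the restricted range and convert it to a table address."""
    clamped = min(max(x, X_MIN), X_MAX)
    return min(int((clamped - X_MIN) / STEP), TABLE_SIZE - 1)


def sigmoid_lut(x):
    """Approximate sigmoid(x) with one table read instead of an exponential."""
    return SIGMOID_LUT[lut_index(x)]


def max_abs_error(samples=4001, lo=-10.0, hi=10.0):
    """Largest deviation from the reference over evenly spaced sample points."""
    worst = 0.0
    for k in range(samples):
        x = lo + (hi - lo) * k / (samples - 1)
        worst = max(worst, abs(sigmoid_lut(x) - sigmoid(x)))
    return worst


if __name__ == "__main__":
    print(f"max abs error: {max_abs_error():.5f}")
```

A deeper table or a narrower clamp range trades RAM for accuracy; in hardware the address would typically be taken directly from the bits of a fixed-point input rather than computed with a division.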
  • Accepted: 2025-09-17
    Existing debugging functionality for application development on embedded real-time operating systems covers variable inspection, breakpoint management, and memory read/write operations, which generally satisfies users' requirements for debugging multitasking applications. However, little attention has been paid to debugging a specific task within a multitasking program. In particular, when multiple tasks invoke the same function, debugging becomes inconvenient for the user. This paper presents a method for debugging specified tasks in multitasking programs, implemented on an embedded real-time operating system and an independently developed debugger for the "HunXin" digital signal processor. The approach significantly improves the efficiency of debugging multitasking applications and reduces application development time.
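One common way to restrict debugging to a specified task is to attach a task filter to a breakpoint, so that a hit in a shared function halts execution only when the selected task is running. The sketch below models that idea in software; it is not the paper's implementation, and the `TaskBreakpoint` class, the task IDs, and the addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskBreakpoint:
    """Hypothetical breakpoint that halts only for selected tasks."""
    address: int
    task_ids: frozenset = frozenset()  # empty set means "halt for any task"

    def should_halt(self, pc, current_task_id):
        """Return True if a hit at pc by current_task_id should stop execution."""
        if pc != self.address:
            return False
        return not self.task_ids or current_task_id in self.task_ids


def halting_tasks(breakpoint_, trace):
    """Replay a trace of (task_id, pc) pairs and list the tasks that would halt."""
    return [task_id for task_id, pc in trace
            if breakpoint_.should_halt(pc, task_id)]


if __name__ == "__main__":
    SHARED_FUNC = 0x1000  # hypothetical address of a function called by several tasks
    trace = [(1, SHARED_FUNC), (2, SHARED_FUNC), (3, 0x2000), (2, SHARED_FUNC)]

    only_task_2 = TaskBreakpoint(SHARED_FUNC, frozenset({2}))
    print(halting_tasks(only_task_2, trace))
```

With an unfiltered breakpoint every task entering the shared function would stop the debugger; the filter lets hits from other tasks resume silently.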
  • Accepted: 2025-09-17
    The PCIe bus enables low-latency, high-bandwidth data transmission between a CPU and an FPGA. The key factor is the design of a DMA engine, which allows data to be transferred without CPU involvement. However, most current CPU+FPGA data transmission solutions are based on foreign FPGA devices from Xilinx, and there is a severe shortage of commercial IP cores for domestic FPGAs, which makes it challenging to port these solutions to domestic FPGA platforms. Therefore, this paper uses a domestic FPGA to design a PCIe-based DMA engine and its corresponding driver, hiding the parsing of transaction layer packets in the PCIe protocol stack and reducing the development complexity of PCIe-based applications on domestic FPGAs. Experimental results demonstrate that the DMA engine achieves a read throughput of 784 MB/s and a write throughput of 800 MB/s over a PCIe 2.0 x2 bus, reaching 82% and 84% of the theoretical maximum bandwidth of PCIe 2.0 x2, respectively.
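The reported percentages can be sanity-checked with a short calculation. This is a sketch under two assumptions that the abstract does not state: the theoretical maximum is taken as the raw PCIe 2.0 link rate after 8b/10b encoding (5 GT/s per lane, ignoring packet header overhead), and the measured throughput is read as binary megabytes per second (MiB/s). Under those assumptions the 82% and 84% figures are reproduced; if the measurements were in decimal MB/s, the ratios would instead be 78.4% and 80%.

```python
# PCIe 2.0 link parameters: 5 GT/s per lane with 8b/10b line encoding.
TRANSFERS_PER_S = 5e9
ENCODING_EFFICIENCY = 8 / 10
LANES = 2  # the x2 link from the abstract


def link_bandwidth_bytes_per_s(lanes=LANES):
    """Raw per-direction payload rate of the link in bytes per second."""
    return TRANSFERS_PER_S * ENCODING_EFFICIENCY / 8 * lanes


def utilization(measured_mib_per_s, lanes=LANES):
    """Fraction of the link rate used.

    ASSUMPTION: the measured value is in MiB/s (2**20 bytes per second).
    """
    peak_mib_per_s = link_bandwidth_bytes_per_s(lanes) / 2**20
    return measured_mib_per_s / peak_mib_per_s


if __name__ == "__main__":
    peak = link_bandwidth_bytes_per_s()
    print(f"peak: {peak / 1e6:.0f} MB/s = {peak / 2**20:.1f} MiB/s")
    print(f"read utilization:  {utilization(784):.1%}")
    print(f"write utilization: {utilization(800):.1%}")
```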
  • Accepted: 2025-09-02
    With the increasing demand for non-volatile storage in embedded systems, functional verification of embedded flash (eFlash) controllers has become a crucial step in ensuring system reliability. To address the low efficiency and poor timing compatibility of traditional directed testing in eFlash controller verification, this paper designs and implements an efficient verification platform for eFlash controllers based on the Universal Verification Methodology (UVM) and targeting the AHB-Lite bus. The platform uses the core UVM components to build a hierarchical architecture, employs automated scripts and an integrated register model (RAL), and adopts constrained-random testing and coverage-driven strategies. This ensures verification completeness while shortening the verification cycle. The results show that the platform can effectively verify the functions of the eFlash controller, achieving 100% code coverage and 100% functional coverage.