Abstract
With the widespread use of FPGAs in high-performance embedded computing and data centers, the demand for data transmission bandwidth over the PCIe interface keeps growing. Xilinx's XDMA IP core is a mainstream high-performance DMA solution, but its actual performance is often limited by the complex memory management mechanism of Linux. Through theoretical and modeling analysis, this paper studies the key factors affecting XDMA transmission performance under the standard Linux driver model. It finds that the "lazy allocation" strategy for user-space memory delays the allocation and mapping of physical pages until after a DMA transfer request has been initiated. This frequently triggers page faults and raises the TLB miss rate, which seriously restricts the efficiency and determinism of high-bandwidth transmission. This paper proposes an application-layer memory pre-mapping optimization strategy that uses advanced parameters of the mmap system call. The strategy moves physical memory allocation, page table establishment, and page locking forward to the system initialization stage, thereby reducing runtime overhead and significantly improving subsequent memory access efficiency. Theoretical analysis and experimental results show that, under the default TLB size, this strategy increases the data transmission rate by 85.5% compared with the unoptimized version. Furthermore, the impact of TLB size on XDMA transmission is studied, which provides a useful reference for building high-performance, low-latency embedded heterogeneous systems.
Keywords
FPGA; XDMA; Linux memory management; pre-mapping strategy; performance optimization
Cite this article
Research on Huge-Page Memory Pre-mapping Optimization Mechanism in Embedded Systems[J]. Integrated Circuits and Embedded Systems. https://doi.org/10.20193/j.ices2097-4191.2026.0049