Accepted: 2025-09-17
The PCIe bus enables low-latency, high-bandwidth data transfer between a CPU and an FPGA. The key component is the DMA engine, which moves data without CPU involvement. However, most current CPU+FPGA data transfer solutions are built on foreign FPGA devices from Xilinx, and commercial IP cores for domestic FPGAs are in severely short supply, which makes these solutions difficult to port to domestic FPGA platforms. This paper therefore designs a PCIe-based DMA engine and its corresponding driver on a domestic FPGA. The design hides the parsing of transaction layer packets (TLPs) in the PCIe protocol stack and reduces the development complexity of PCIe-based applications on domestic FPGAs. Experimental results show that the DMA engine achieves a read throughput of 784 MB/s and a write throughput of 800 MB/s over a PCIe 2.0 x2 link, reaching 82% and 84% of the theoretical maximum bandwidth of PCIe 2.0 x2, respectively.
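
The abstract does not state how the theoretical maximum bandwidth was derived, so the following is only a sketch of one plausible calculation. It assumes that PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding, that protocol overhead (TLP headers, flow control) is not subtracted from the peak, and that the reported "MB/s" figures are binary megabytes (MiB/s). The function and constant names are illustrative and do not come from the paper.

```python
# Assumptions (not stated in the abstract):
#   - PCIe 2.0 runs at 5 GT/s per lane with 8b/10b line coding.
#   - The peak figure ignores TLP/DLLP protocol overhead.
#   - The measured "MB/s" values are binary megabytes (MiB/s).

GT_PER_S = 5e9      # raw signaling rate per lane, transfers per second
ENCODING = 8 / 10   # 8b/10b line coding: 8 payload bits per 10 line bits
LANES = 2           # x2 link, as in the abstract


def theoretical_bytes_per_s(lanes: int = LANES) -> float:
    """Peak link bandwidth in bytes/s after line coding."""
    return GT_PER_S * ENCODING / 8 * lanes


def efficiency(measured_mib_per_s: float, lanes: int = LANES) -> float:
    """Fraction of peak bandwidth, treating the measurement as MiB/s."""
    return measured_mib_per_s * 2**20 / theoretical_bytes_per_s(lanes)


if __name__ == "__main__":
    print(f"peak:  {theoretical_bytes_per_s() / 1e6:.0f} MB/s")
    print(f"read:  {efficiency(784):.0%}")
    print(f"write: {efficiency(800):.0%}")
```

Under these assumptions the peak is 1000 MB/s (decimal), and the sketch reproduces the 82% and 84% figures quoted above. If the measurements were instead decimal megabytes, the same peak would give 78.4% and 80%, so the binary-unit reading appears to be the one consistent with the reported percentages.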