The optical flow method builds a dense motion-field representation by analyzing pixel displacements between frames, quantifying the motion direction and velocity of objects in a scene with sub-pixel accuracy; it is a core technology for intelligent perception, localization, and navigation in applications such as embodied intelligence and the low-altitude economy. However, dense optical flow algorithms incur high computational complexity, and their multi-level pyramid structure and inter-level data dependencies lead to low memory-access efficiency and idle compute resources, which together hinder real-time, efficient deployment at the edge. To address this problem, a real-time and efficient FPGA hardware acceleration scheme is proposed for the dense LK (Lucas-Kanade) pyramid optical flow algorithm, based on an algorithm-architecture-circuit co-design strategy. The scheme improves algorithmic accuracy and hardware friendliness through batched bilinear interpolation and temporal-gradient generation, raises hardware parallelism through a folded multi-level pyramid pipeline, and improves the memory-access efficiency of pyramid downsampling through a three-stage segmented processing architecture, thereby significantly boosting the energy efficiency and real-time performance of dense optical flow computation. Measurements on the AMD KV260 platform show that the accelerator is 102 times faster than a high-performance CPU, achieving 62 f/s real-time processing at 752×480 resolution with an average endpoint error (AEE) of 0.522 pixel and an average angular error (AAE) of 0.325°, providing a hardware acceleration solution that combines high accuracy with low latency for highly dynamic visual perception scenarios.
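For reference, the dense LK method estimates each pixel's flow vector by solving a local least-squares system over a neighborhood window at every pyramid level; the sketch below restates the standard textbook formulation together with the endpoint-error and angular-error definitions behind the AEE and AAE figures quoted above. The notation (window W, gradients I_x, I_y, I_t, ground-truth flow (u_g, v_g)) is generic and is not taken from the paper itself.

```latex
% Standard LK local least-squares system for the flow vector (u, v),
% summed over a window W using spatial gradients I_x, I_y and the
% temporal gradient I_t.
\begin{equation}
  \begin{bmatrix}
    \sum_{W} I_x^2   & \sum_{W} I_x I_y \\
    \sum_{W} I_x I_y & \sum_{W} I_y^2
  \end{bmatrix}
  \begin{bmatrix} u \\ v \end{bmatrix}
  =
  -\begin{bmatrix} \sum_{W} I_x I_t \\ \sum_{W} I_y I_t \end{bmatrix}
\end{equation}

% Per-pixel endpoint error and angular error against ground truth (u_g, v_g);
% averaging over all pixels yields the AEE and AAE respectively.
\begin{equation}
  \mathrm{EE} = \sqrt{(u - u_g)^2 + (v - v_g)^2}, \qquad
  \mathrm{AE} = \arccos\!\left(\frac{1 + u\,u_g + v\,v_g}
  {\sqrt{1 + u^2 + v^2}\,\sqrt{1 + u_g^2 + v_g^2}}\right)
\end{equation}
```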
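As a purely illustrative sketch (written in C++ for readability rather than the paper's RTL/HLS, with a hypothetical function name), the following shows the bilinear interpolation step that a batched interpolation unit would perform when sampling the second frame at the fractional positions produced by the current flow estimate:

```cpp
// Illustrative sketch only: bilinear sampling of an 8-bit grayscale image at a
// fractional coordinate (x, y), as needed when warping the second frame by the
// current flow estimate during iterative LK refinement.
#include <algorithm>
#include <cstdint>

float bilinear_sample(const uint8_t* img, int width, int height, float x, float y) {
    // Clamp so the 2x2 neighborhood stays inside the image.
    x = std::clamp(x, 0.0f, static_cast<float>(width  - 2));
    y = std::clamp(y, 0.0f, static_cast<float>(height - 2));
    const int x0 = static_cast<int>(x);
    const int y0 = static_cast<int>(y);
    const float fx = x - x0;   // horizontal fractional part
    const float fy = y - y0;   // vertical fractional part

    const float p00 = img[ y0      * width + x0    ];
    const float p01 = img[ y0      * width + x0 + 1];
    const float p10 = img[(y0 + 1) * width + x0    ];
    const float p11 = img[(y0 + 1) * width + x0 + 1];

    // Weighted average of the four neighboring pixels.
    return p00 * (1 - fx) * (1 - fy) + p01 * fx * (1 - fy)
         + p10 * (1 - fx) * fy       + p11 * fx * fy;
}
```

In an FPGA accelerator such arithmetic would typically use fixed-point operators and be replicated to process a batch of positions per cycle; the floating-point version here is only for clarity.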