Real-time efficient dense optical flow accelerator based on FPGA

FENG Yutai, XU Wenyang, CHEN Fan, WANG Jiaxing, TANG Yongming, SUN Hao

Integrated Circuits and Embedded Systems, 2025, Vol. 25, Issue 6: 78-86.

DOI: 10.20193/j.ices2097-4191.2025.0021
Special Issue on Frontier FPGA Technologies and Applications

Abstract

Optical flow methods build a dense motion-field representation by analyzing pixel displacements between frames, quantifying the direction and speed of object motion in a scene with sub-pixel accuracy. They are a core technology for intelligent perception, localization, and navigation in embodied intelligence and the low-altitude economy. However, dense optical flow algorithms have high computational complexity, and their multi-layer pyramid structure and inter-layer data dependencies cause inefficient memory access and idle compute resources; together these factors limit real-time, efficient deployment at the edge. To address this, this paper proposes a real-time and efficient FPGA hardware acceleration scheme for the dense pyramidal Lucas-Kanade (LK) optical flow algorithm, based on co-design of algorithm, architecture, and circuit. The scheme improves algorithm accuracy and hardware friendliness through batch bilinear interpolation and temporal gradient generation, improves hardware parallelism through a folded multi-layer pyramid pipeline, and improves the memory access efficiency of pyramid downsampling through a three-stage segmented processing architecture, significantly raising the energy efficiency and real-time performance of dense optical flow computation. Measurements on the AMD KV260 platform show that the accelerator runs 102 times faster than a high-performance CPU and reaches 62 f/s real-time processing at 752×480 resolution, with an average endpoint error (AEE) of 0.522 pixel and an average angular error (AAE) of 0.325°, providing a hardware acceleration solution with both high accuracy and low latency for highly dynamic visual perception scenarios.
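The two accuracy figures quoted in the abstract, AEE and AAE, can be illustrated with a minimal sketch. It assumes the commonly used definitions: AEE is the mean Euclidean distance between estimated and ground-truth flow vectors, and AAE is the mean angle between the vectors (u, v, 1) built from each flow pair. The abstract does not state which exact definitions or evaluation code the authors used, so the function names and the list-of-tuples data layout below are hypothetical.

```python
import math

def average_endpoint_error(est, gt):
    """Mean Euclidean distance between estimated and ground-truth flow vectors.

    est, gt: equal-length lists of (u, v) tuples (hypothetical layout).
    """
    if len(est) != len(gt) or not est:
        raise ValueError("flow lists must be non-empty and the same length")
    total = 0.0
    for (ue, ve), (ug, vg) in zip(est, gt):
        total += math.sqrt((ue - ug) ** 2 + (ve - vg) ** 2)
    return total / len(est)

def average_angular_error_deg(est, gt):
    """Mean angle in degrees between the 3D vectors (u, v, 1) of each flow pair.

    Uses atan2(|a x b|, a . b) instead of acos, which stays accurate
    when the two vectors are nearly parallel.
    """
    if len(est) != len(gt) or not est:
        raise ValueError("flow lists must be non-empty and the same length")
    total = 0.0
    for (ue, ve), (ug, vg) in zip(est, gt):
        # Cross product of (ue, ve, 1) and (ug, vg, 1).
        cx = ve - vg
        cy = ug - ue
        cz = ue * vg - ve * ug
        cross_norm = math.sqrt(cx * cx + cy * cy + cz * cz)
        dot = ue * ug + ve * vg + 1.0
        total += math.degrees(math.atan2(cross_norm, dot))
    return total / len(est)

if __name__ == "__main__":
    estimated = [(1.0, 0.0), (0.0, 0.0)]
    ground_truth = [(1.0, 0.0), (3.0, 4.0)]
    print(average_endpoint_error(estimated, ground_truth))
    print(average_angular_error_deg(estimated, ground_truth))
```

In this toy input the first pixel is estimated exactly and the second is off by a (3, 4) displacement, so the endpoint error averages the distances 0 and 5. Real evaluations would run the same computation over every pixel of a benchmark flow field.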

Key words

FPGA / hardware accelerator / hardware-software co-design / dense optical flow / image pyramid

Cite this article

FENG Yutai, XU Wenyang, CHEN Fan, et al. Real-time efficient dense optical flow accelerator based on FPGA[J]. Integrated Circuits and Embedded Systems, 2025, 25(6): 78-86. https://doi.org/10.20193/j.ices2097-4191.2025.0021
CLC number: TN47 (large-scale and very-large-scale integrated circuits)

Editor: XUE Shiran