Overview of binocular stereo vision processor for robot navigation

CHEN Zhuoyu, AN Fengwei

Integrated Circuits and Embedded Systems ›› 2024, Vol. 24 ›› Issue (11) : 15-28.

DOI: 10.20193/j.ices2097-4191.2024.0036
Special Topic of Energy-efficient Dedicated Chips for Intelligent Robots

Abstract

With the rapid development of the robotics industry, robotic technology has become a new driver of productivity, which makes technologies such as 3D reconstruction and obstacle-avoidance navigation increasingly important. However, active 3D imaging technologies based on Time of Flight (ToF) and structured light suffer from low resolution, a lack of original color information, and susceptibility to ambient light interference, which degrades their performance. Passive binocular stereo vision sensors, which can output dense depth and color information (RGB-D) in real time, have therefore been widely applied in autonomous robots, automobiles, and drones. Nonetheless, binocular stereo vision, which mimics human binocular vision by computing disparity to recover depth, is computationally intensive and has relied on general-purpose computing platforms. The result is high energy consumption and latency, which limits the technology's application in high-speed scenarios, small robots, and edge computing. In recent years, binocular stereo vision processors that integrate hardware accelerators for stereo vision algorithms have attracted significant attention in both academia and industry. This article first explains the theoretical foundation of binocular 3D stereo vision and gives examples of its application in robotic stereo vision. It then introduces the structural components of binocular stereo vision processors, including core parts such as image acquisition, camera calibration and correction, and stereo matching. For the convenience of stereo vision hardware developers, it reviews the basic concepts, research status, challenges, and future trends of each core component of the binocular stereo vision system, with a special focus on comparing new hardware computing architectures.

Key words

robots / stereo vision / visual obstacle navigation / image signal processor / hardware architecture / hardware acceleration

Cite this article

CHEN Zhuoyu, AN Fengwei. Overview of binocular stereo vision processor for robot navigation[J]. Integrated Circuits and Embedded Systems, 2024, 24(11): 15-28. https://doi.org/10.20193/j.ices2097-4191.2024.0036
