Abstract
With advances in optimization techniques and a deeper understanding of neural networks, the ConvNeXt network was proposed for visual classification, where it outperforms a series of networks with larger parameter counts and computational costs, such as Transformers. Pose estimation is a fundamental task in computer vision and the basis of hand gesture recognition, with a wide range of applications. In this paper, ConvNeXt is applied to hand pose estimation and optimized for it: heatmap encoding is introduced to improve the accuracy of keypoint coordinate prediction, and the model parameters are optimized with an improved AdamW optimizer. The resulting model reaches a PCK@0.2 of 0.992 and an EPE of 3.47, outperforming the other models compared in the experiments.
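The abstract credits heatmap encoding with improving keypoint accuracy. The sketch below illustrates the common form of that idea in plain Python: each keypoint is rendered as a 2D Gaussian on a grid that is `stride` times smaller than the input image, and coordinates are recovered from the grid by taking its argmax. The function names, the stride of 4 and the Gaussian width sigma = 2 are illustrative assumptions rather than the paper's settings, and plain argmax decoding is only one possible decoder; the paper's exact encoding and decoding are not described here.

```python
import math

# Assumed settings for illustration only (not taken from the paper): stride=4, sigma=2.


def encode_heatmap(x, y, width, height, stride=4, sigma=2.0):
    """Encode one keypoint (x, y), given in input-image pixels, as a Gaussian heatmap.

    The keypoint is first mapped to heatmap coordinates (divided by `stride`);
    each cell then holds exp(-d^2 / (2 * sigma^2)), where d is the cell's
    distance to the mapped keypoint, so the peak sits at the keypoint.
    """
    cx, cy = x / stride, y / stride
    return [
        [math.exp(-((col - cx) ** 2 + (row - cy) ** 2) / (2 * sigma ** 2))
         for col in range(width)]
        for row in range(height)
    ]


def decode_heatmap(heatmap, stride=4):
    """Decode a heatmap back to input-image pixels by taking its argmax."""
    best_val, best_col, best_row = float("-inf"), 0, 0
    for row, values in enumerate(heatmap):
        for col, v in enumerate(values):
            if v > best_val:
                best_val, best_col, best_row = v, col, row
    # Scale the winning cell back to input resolution. The result is quantised
    # to multiples of `stride`, so sub-cell offsets are lost without refinement.
    return best_col * stride, best_row * stride


# One such channel is produced per keypoint.
hm = encode_heatmap(37.0, 20.0, width=16, height=16)
print(decode_heatmap(hm))  # (36, 20): 37 -> 36 shows the quantisation error
```

The rounding from 37 to 36 in the usage line shows why heatmap resolution matters: a plain argmax can only return cell centres, which is why heatmap-based pose estimators often add a sub-cell refinement step on top of it.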
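The two reported metrics can be computed as in the following sketch. EPE (end-point error) is the mean Euclidean distance between predicted and ground-truth keypoints, so lower is better; PCK@0.2 is the fraction of keypoints whose error is below 0.2 times a normalising length, so higher is better. The default normalising length used here, the longer side of the ground-truth keypoints' bounding box, is an assumption: hand pose benchmarks differ on this choice, and the paper's protocol is not stated in this section. The toy coordinates are made up for illustration.

```python
import math


def epe(pred, gt):
    """End-point error: mean Euclidean distance between predicted and ground-truth keypoints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def pck(pred, gt, alpha=0.2, norm=None):
    """PCK@alpha: fraction of keypoints whose error is below alpha * norm.

    Assumption: if `norm` is not given, it defaults to the longer side of the
    bounding box of the ground-truth keypoints.
    """
    if norm is None:
        xs = [x for x, _ in gt]
        ys = [y for _, y in gt]
        norm = max(max(xs) - min(xs), max(ys) - min(ys))
    threshold = alpha * norm
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) < threshold)
    return hits / len(gt)


# Toy example with four keypoints (hypothetical data).
gt = [(0, 0), (100, 0), (100, 100), (0, 100)]
pred = [(3, 4), (100, 0), (130, 100), (0, 110)]
print(epe(pred, gt))  # 11.25  (errors 5, 0, 30, 10)
print(pck(pred, gt))  # 0.75   (threshold 0.2 * 100 = 20; only the 30 px error fails)
```

Over a whole test set, both metrics are usually averaged over all evaluated keypoints rather than per image, but the per-keypoint comparison is the same as above.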
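The abstract does not say how its AdamW optimizer was improved, so the sketch below shows only the standard AdamW update of Loshchilov and Hutter (decoupled weight decay) on a list of scalar parameters, as a reference point for what is being modified. It is not the paper's variant; the function names and default hyperparameters are illustrative assumptions.

```python
import math


def init_state(n):
    """Zero-initialised first/second moment estimates for n scalar parameters."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def adamw_step(params, grads, state, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.05):
    """One step of standard AdamW (illustrative defaults, not the paper's settings).

    Unlike Adam with L2 regularisation, the weight decay term is not added to
    the gradient (so it is not rescaled by the adaptive denominator); instead
    the parameter is shrunk directly by a factor (1 - lr * weight_decay).
    """
    beta1, beta2 = betas
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)        # bias-corrected first moment
        v_hat = state["v"][i] / (1 - beta2 ** t)        # bias-corrected second moment
        p = p * (1 - lr * weight_decay)                 # decoupled weight decay
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)   # adaptive gradient step
        new_params.append(p)
    return new_params


# With a zero gradient only the decoupled decay acts: 1.0 -> 1.0 * (1 - 0.1 * 0.5)
state = init_state(1)
print(adamw_step([1.0], [0.0], state, lr=0.1, weight_decay=0.5))  # [0.95]
```

Keeping the decay outside the adaptive step means every weight is shrunk at the same relative rate regardless of its gradient history, which is the property that distinguishes AdamW from Adam with an L2 penalty.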
Key words
deep learning /
pose estimation /
hand gesture recognition /
keypoint detection /
pattern recognition /
ConvNeXt
Funding
*Key Research and Development Program of Shanxi Province (201903D121060).