With advances in optimization techniques and a deeper understanding of neural networks, the ConvNeXt network was proposed and applied to visual classification tasks, where its performance surpasses a range of networks with large parameter counts and heavy computation, such as Transformers. Pose estimation is a fundamental task in computer vision and the basis of hand gesture recognition, with broad application prospects. In this paper, the ConvNeXt network is applied to hand pose estimation and optimized. Heatmap encoding is introduced to improve the accuracy of keypoint coordinate prediction. With an improved AdamW optimizer used to optimize the model parameters, the model reaches a PCK@0.2 of 0.992 and an EPE of 3.47, surpassing the experimental results of other models.
Key words: deep learning / pose estimation / hand gesture recognition / keypoint detection / pattern recognition / ConvNeXt
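As a rough illustration of the terms used in the abstract, the following NumPy sketch shows one common way to encode a keypoint as a Gaussian heatmap, decode it back with an argmax, and compute EPE and PCK. It is not the paper's implementation: the heatmap size, the Gaussian `sigma`, the argmax decoding, and the `scale` used to normalize the PCK threshold are all assumptions, since the abstract does not specify them, and the function names are hypothetical.

```python
import numpy as np

def encode_heatmap(x, y, size=64, sigma=2.0):
    """Render one keypoint (x, y), in heatmap pixels, as a 2D Gaussian.

    size and sigma are assumed values, not taken from the paper.
    """
    xs = np.arange(size, dtype=np.float64)   # column (x) coordinates
    ys = xs[:, None]                         # row (y) coordinates
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))

def decode_heatmap(heatmap):
    """Recover (x, y) as the location of the heatmap maximum."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return float(col), float(row)

def epe(pred, gt):
    """End-point error: mean Euclidean distance between keypoint sets."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

def pck(pred, gt, threshold, scale):
    """Fraction of keypoints whose error is at most threshold * scale.

    scale is an assumed reference length (e.g. a hand bounding-box size).
    """
    dist = np.linalg.norm(pred - gt, axis=-1)
    return float(np.mean(dist <= threshold * scale))

# Round trip for two keypoints at integer heatmap coordinates.
gt = np.array([[10.0, 20.0], [30.0, 5.0]])
pred = np.array([decode_heatmap(encode_heatmap(x, y)) for x, y in gt])
print(epe(pred, gt), pck(pred, gt, threshold=0.2, scale=20.0))
```

Under these assumptions, PCK@0.2 counts a keypoint as correct when its error is within 20% of the reference length, while EPE reports the average error directly in coordinate units.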