Human Motion Recognition Based on Spatiotemporal Enhancement and Multi-Stream Feature Fusion

March 5, 2025, 19:55

Full-text download: 20250103.pdf

Article ID: 1672-6987(2025)01-0135-09 DOI: 10.16351/j.1672-6987.2025.01.018

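ZHOU Qingxia¹, JIN Xin², FU Fei², FENG Yuping¹*, CHEN Tong³, AN Wenzhi¹, LI Yunwen¹ (1. College of Automation and Electronic Engineering, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China; 2. Qingdao Kechuang Xinda Technology Co., Ltd., Qingdao 266000, Shandong, China; 3. Department of Materials Science and Technology Research, Jihua Laboratory, Foshan 528022, Guangdong, China)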

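Abstract: To address the insufficient use of skeleton feature information and spatiotemporal feature information when training skeleton-based human action recognition networks, a graph convolutional human action recognition model based on spatiotemporal enhancement and multi-stream feature fusion is proposed. A spatiotemporal enhancement module uses a spatiotemporal attention mechanism to strengthen the model's attention to feature information in the temporal and spatial dimensions. The adaptive graph convolutional layer is improved by adjusting the adjacency matrix, which enriches contextual information. A multi-stream feature fusion module improves the utilization of high-order skeleton information by extracting and fusing joint information, bone position information, and bone motion information. Experimental results show that, compared with the baseline method 2s-AGCN, the proposed model improves Top-1 and Top-5 accuracy on the Kinetics dataset by 1.2 and 1.5 percentage points, respectively, and improves X-Sub and X-View accuracy on the NTU RGB+D dataset by 1.4 and 1.6 percentage points, respectively. The experiments indicate that the algorithm makes fuller use of human feature information and clearly improves action recognition performance.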
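The three input streams named in the abstract (joint, bone position, and bone motion) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the 5-joint toy skeleton `PARENTS`, the helper names, the definition of bone motion as a frame-to-frame difference of bone vectors, and the score-summation fusion are all hypothetical, and the paper's fusion module may work differently.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Frame = List[Vec]

# Hypothetical 5-joint toy skeleton: PARENTS[i] is the parent of joint i.
# Joint 0 is the root and is its own parent, so its bone vector is zero.
PARENTS = [0, 0, 1, 1, 0]


def sub(a: Vec, b: Vec) -> Vec:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def bone_stream(frames: Sequence[Frame],
                parents: Sequence[int] = PARENTS) -> List[Frame]:
    """Bone vector of joint j = position of joint j minus its parent's position."""
    return [[sub(frame[j], frame[parents[j]]) for j in range(len(frame))]
            for frame in frames]


def motion_stream(frames: Sequence[Frame]) -> List[Frame]:
    """Frame-to-frame difference; the last frame is padded with zeros."""
    if not frames:
        return []
    out = [[sub(nxt, cur) for cur, nxt in zip(frames[t], frames[t + 1])]
           for t in range(len(frames) - 1)]
    out.append([(0.0, 0.0, 0.0)] * len(frames[-1]))
    return out


def fuse_scores(stream_scores: Sequence[Sequence[float]]) -> List[float]:
    """Late fusion: sum per-class scores across streams (assumed scheme)."""
    return [sum(column) for column in zip(*stream_scores)]
```

Under these assumptions, the joint stream is the raw coordinate sequence, `bone_stream(frames)` gives the bone position stream, and `motion_stream(bone_stream(frames))` gives the bone motion stream. Each stream would feed its own network, and `fuse_scores` would combine the per-class outputs.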

Keywords: skeleton sequence; spatiotemporal attention mechanism; feature fusion; action recognition

CLC number: TP 391.9 Document code: A

Citation: ZHOU Qingxia, JIN Xin, FU Fei, et al. Human motion recognition based on spatiotemporal enhancement and multi-stream feature fusion[J]. Journal of Qingdao University of Science and Technology (Natural Science Edition), 2025, 46(1): -.

Human Motion Recognition Based on Spatiotemporal Enhancement and Multi-Stream Feature Fusion

ZHOU Qingxia¹, JIN Xin², FU Fei², FENG Yuping¹, CHEN Tong³, AN Wenzhi¹, LI Yunwen¹

(1. College of Automation and Electronic Engineering, Qingdao University of Science and Technology, Qingdao 266061, China; 2. Qingdao Kechuang Xinda Technology Co., Ltd., Qingdao 266000, China; 3. Department of Materials Science and Technology Research, Jihua Laboratory, Foshan 528022, China)

Abstract: A graph convolutional human action recognition model based on spatiotemporal enhancement and multi-stream feature fusion is proposed to address the insufficient use of skeleton feature information and spatiotemporal feature information when training skeleton-based action recognition networks. A spatiotemporal enhancement module is introduced that uses a spatiotemporal attention mechanism to strengthen the model's attention to feature information in the temporal and spatial dimensions. The adaptive graph convolutional layer is improved by adjusting the adjacency matrix to enrich contextual information. A multi-stream feature fusion module is introduced to raise the utilization of high-order skeleton information; it extracts and fuses joint information, bone position information, and bone motion information. Experimental results show that, compared with the baseline method 2s-AGCN, the proposed model improves accuracy on the Kinetics dataset by 1.2 (Top-1) and 1.5 (Top-5) percentage points, and improves accuracy on the NTU RGB+D dataset by 1.4 (X-Sub) and 1.6 (X-View) percentage points. The experiments indicate that the proposed algorithm makes fuller use of human feature information and clearly improves action recognition performance.

Keywords: skeleton sequence; spatiotemporal attention mechanism; feature fusion; action recognition

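Received: 2023-12-11

Funding: National Natural Science Foundation of China (61971253); Qingdao University of Science and Technology Undergraduate Innovation Training Program (S202210426012).

About the author: ZHOU Qingxia (born 1998), female, master's student. * Corresponding author.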
