Article ID: 1672-6987(2024)03-0141-06; DOI: 10.16351/j.1672-6987.2024.03.019
ZHANG Hao, WANG Huiru, WANG Chuanxu* (College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China)
Abstract: Effectively fusing multi-scale features remains a challenge in object detection. This paper proposes a component that fuses multi-scale features at a fine-grained level, called semantic multi-scale feature fusion (SMSFF). First, multi-scale convolution kernels generate the multi-scale semantic information required by the object detection network; this information is then fully fused with a novel multi-scale feature fusion method. Finally, the cross-channel weighted attention of SE (squeeze-and-excitation) re-calibrates the multi-scale features, which effectively strengthens the multi-scale information of the network and thereby improves its feature representation ability. As a result, SMSFF effectively improves detection accuracy and makes the model more robust to object instances of different scales. With the YOLOX object detector, the proposed method achieves mAPs of 48.6% and 87.6% on the COCO 2017 test and Pascal VOC benchmark datasets, respectively.
Keywords: object detection; multi-scale feature fusion; attention mechanism; computer vision; image classification
CLC number: TP 389.1; Document code: A
Citation: ZHANG Hao, WANG Huiru, WANG Chuanxu. Object detection based on semantic multi-scale convolution and attention mechanism[J]. Journal of Qingdao University of Science and Technology (Natural Science Edition), 2024, 45(3): 141-146.
Object Detection Based on Semantic Multi-scale Convolution and Attention Mechanism
ZHANG Hao, WANG Huiru, WANG Chuanxu
(College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China)
Abstract: In object detection, the scale variation of objects is one of the most challenging problems, so fusing multi-scale features effectively is particularly important. This paper proposes a component for fusing multi-scale features at a fine-grained level, called SMSFF (semantic multi-scale feature fusion). First, multi-scale convolution kernels generate the multi-scale semantic information required by the object detection network, which is then fully fused using a novel multi-scale feature fusion method. Finally, the cross-channel weighted attention of SE (squeeze-and-excitation) is used to re-calibrate the multi-scale features, which effectively strengthens the multi-scale information of the network and thereby improves its feature representation ability. Therefore, SMSFF effectively improves detection accuracy and makes the model more robust to object instances of different scales. With the YOLOX object detector, the proposed method achieves mAPs of 48.6% and 87.6% on the COCO 2017 test and Pascal VOC benchmark datasets, respectively.
Key words: object detection; multi-scale feature fusion; attention mechanism; computer vision; image classification
Received: 2023-06-06
Funding: National Natural Science Foundation of China (61672305).
Biography: ZHANG Hao (born 1998), male, master's degree candidate. *Corresponding author.
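The abstract above outlines a three-step pipeline: multi-scale convolution kernels produce multi-scale semantic features, these features are fused, and SE cross-channel attention re-calibrates the result. The following is a minimal PyTorch sketch of that general idea only, not the paper's actual SMSFF implementation; the kernel sizes (3/5/7), channel split, reduction ratio, and module names are illustrative assumptions.

# Minimal PyTorch sketch (illustrative only, not the authors' SMSFF code):
# parallel multi-scale convolutions, concatenation-based fusion, and
# SE-style cross-channel re-calibration, mirroring the pipeline in the abstract.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global average pooling + two FC layers + sigmoid gate."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))      # squeeze: per-channel statistics, shape (B, C)
        return x * w.view(b, c, 1, 1)        # excite: re-weight each channel

class MultiScaleFusion(nn.Module):
    """Hypothetical 3x3/5x5/7x7 branches, fused by a 1x1 conv and re-calibrated by SE."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        branch_ch = out_ch // 3
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in (3, 5, 7)]
        )
        self.fuse = nn.Conv2d(branch_ch * 3, out_ch, kernel_size=1)  # 1x1 fusion conv
        self.se = SEBlock(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.se(self.fuse(feats))

if __name__ == "__main__":
    module = MultiScaleFusion(in_ch=64, out_ch=96)
    y = module(torch.randn(1, 64, 32, 32))
    print(y.shape)  # torch.Size([1, 96, 32, 32])

In this sketch the SE gate operates on the concatenated (fused) feature map; whether the paper applies the re-calibration per branch or after fusion is not specified in the abstract, so this placement is an assumption.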