Polyphonic Sound Event Detection Based on Self-Attention Routing Capsule Network
Article ID: 1672-6987(2022)05-0121-05; DOI: 10.16351/j.1672-6987.2022.05.016
LI Haitao, YANG Shuguo*
(College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China)
CLC number: TP 18; Document code: A
Citation: LI Haitao, YANG Shuguo. Polyphonic sound event detection based on self-attention routing capsule network[J]. Journal of Qingdao University of Science and Technology (Natural Science Edition), 2022, 43(5): 121-126.
Abstract: Sound event detection is an important problem in computer audition, and polyphonic sound event detection is one of its most challenging research topics. Building on capsule networks and the recently proposed non-iterative self-attention routing method, this paper proposes a multi-path capsule network model based on self-attention routing and applies it to polyphonic sound event detection. Because self-attention routing is non-iterative and highly parallel, it greatly speeds up model training. The multi-path primary capsule layer uses asymmetric convolution kernels of different sizes, which lets the model capture information at different resolutions while largely preserving temporal information, thereby improving performance. The proposed model and method are compared and evaluated on the Task 4 dataset of the 2017 Detection and Classification of Acoustic Scenes and Events challenge (DCASE 2017). The F-score on the audio tagging subtask reaches 59.5%, and the error rate for sound event detection is reduced to 0.72, a substantial improvement. The results indicate that the method offers high detection accuracy, fast speed, and strong generalization ability.
Key words: polyphonic sound event detection; capsule network; DCASE 2017 challenge
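As a rough illustration of what "non-iterative" routing means in the abstract, the sketch below computes coupling coefficients in a single pass from scaled dot-product agreement between prediction vectors, instead of refining them over several routing iterations. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`self_attention_routing`, `squash`), the input layout `u_hat[i][j]`, the use of the classic capsule squashing function, and the omission of any learned prior term are all choices made here for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def squash(v, eps=1e-9):
    """Classic capsule squashing (an assumption here): keeps the direction
    of v and maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in v)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in v]


def self_attention_routing(u_hat):
    """Single-pass routing sketch.

    u_hat[i][j] is the prediction vector that input capsule i makes for
    output capsule j (hypothetical layout). Returns (outputs, coupling).
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])

    # Agreement of prediction i with all predictions for the same output j,
    # using scaled dot products as in self-attention.
    agreement = [
        [
            sum(dot(u_hat[i][j], u_hat[k][j]) for k in range(n_in)) / math.sqrt(dim)
            for j in range(n_out)
        ]
        for i in range(n_in)
    ]

    # Each input capsule distributes its vote across the output capsules.
    coupling = [softmax(row) for row in agreement]

    # Weighted sum of predictions per output capsule, then squash.
    outputs = []
    for j in range(n_out):
        s = [
            sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_in))
            for d in range(dim)
        ]
        outputs.append(squash(s))
    return outputs, coupling


if __name__ == "__main__":
    # Three input capsules, two output capsules, 2-D prediction vectors.
    demo = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 0.0], [0.0, -1.0]],
        [[0.9, 0.1], [0.5, 0.5]],
    ]
    outs, coup = self_attention_routing(demo)
    print(outs)
    print(coup)
```

In this toy input the predictions for the first output capsule agree with each other far more than those for the second, so each input capsule assigns most of its coupling weight to the first output. Because there is no loop over routing iterations, every step is a plain tensor operation that can be batched, which is the property the abstract credits for faster training.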
Received: 2021-09-08
Funding: Natural Science Foundation of Shandong Province (ZR2021QF040).
Author biography: LI Haitao (born 1997), male, master's student. *Corresponding author.