

Polyphonic Sound Event Detection Based on a Deep Capsule Network Fusion Model

Date: 2025-11-01



Full-text download: 202505019.pdf


Article ID: 1672-6987(2025)05-0152-07 DOI: 10.16351/j.1672-6987.2025.05.019

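JIANG Qingzhou, YANG Shuguo*, WANG Wenwu (College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China)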

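Abstract: Traditional capsule network architectures are built on a dynamic routing mechanism, which needs many iterations and vector computations to update the weight coefficients. Capsules also do not share information with one another, which leads to information redundancy. To address this shortcoming, this work proposes a polyphonic sound event detection model based on a fused deep capsule network. Under gated convolution and 3D convolution, dynamic routing reduces the information redundancy caused by feature overlap, and the original features are encoded and used to supplement the feature information, which improves the model's training speed and accuracy. The model is evaluated on the DCASE2017 (Detection and Classification of Acoustic Scenes and Events 2017) Challenge Task 4 dataset, reaching a final F1 score of 59.6% and a sound event detection error rate as low as 0.71. The results show that the proposed method can significantly improve training speed and accuracy.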


Keywords: polyphonic sound event detection; capsule network; fusion network; DCASE 2017 Challenge
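The dynamic routing mechanism that the abstract refers to can be illustrated with a minimal NumPy sketch of the standard routing-by-agreement procedure used in conventional capsule networks. This is not the authors' fused model: the function names (`squash`, `dynamic_routing`), the tensor shapes, and the choice of three iterations are assumptions made for illustration only.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink each vector so its length lies in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement between two capsule layers (illustrative sketch).

    u_hat: prediction vectors with shape (num_in, num_out, dim), i.e. the
           prediction each input capsule makes for each output capsule.
    Returns the output capsule vectors (num_out, dim) and the coupling
    coefficients (num_in, num_out) used to compute them.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                # routing logits
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # couplings sum to 1 per input capsule
        s = np.sum(c[:, :, None] * u_hat, axis=0)  # weighted sum over input capsules
        v = squash(s)                              # output capsule vectors
        b = b + np.sum(u_hat * v[None, :, :], axis=-1)  # agreement update
    return v, c

# Hypothetical sizes: 6 input capsules, 3 output capsules, 4-dimensional vectors.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 4))
v, c = dynamic_routing(u_hat)
```

Each iteration recomputes the coupling coefficients and the output vectors for every pair of capsules, which is the iterative vector computation the abstract identifies as the cost of this mechanism.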


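Citation format: JIANG Qingzhou, YANG Shuguo, WANG Wenwu. Polyphonic sound event detection based on a deep capsule network fusion model[J]. Journal of Qingdao University of Science and Technology (Natural Science Edition), 2025, 46(5): 152-158.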


CLC number: TN 912.2        Document code: A


JIANG Qingzhou, YANG Shuguo, WANG Wenwu. Multi-Voice event detection based on fused deep capsule network fusion model[J]. Journal of Qingdao University of Science and Technology(Natural Science Edition), 2025, 46(5): 152-158.

Multi-Voice Event Detection Based on Fused Deep Capsule Network Fusion Model

JIANG Qingzhou, YANG Shuguo, WANG Wenwu (College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China)

Abstract: Traditional capsule network architectures rely on a dynamic routing mechanism, which requires many iterations and vector computations to update the weight coefficients. In addition, capsules do not share information with one another, which leads to information redundancy. To address this shortcoming, this paper proposes a polyphonic sound event detection model based on a fused deep capsule network. The model applies dynamic routing under gated convolution and 3D convolution to reduce the information redundancy caused by feature overlap, and it encodes the original features to supplement the feature information, improving the model's training speed and accuracy. The model is evaluated on the DCASE2017 (Detection and Classification of Acoustic Scenes and Events 2017) Challenge Task 4 dataset, where it reaches an F1 score of 59.6% and a sound event detection error rate of 0.71. The results show that the proposed method can significantly improve training speed and accuracy.


Key words: polyphonic sound event detection; capsule network; converged networks; DCASE 2017 challenge
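The two figures reported in the abstract, F1 score and error rate, can be made concrete with a small sketch. It assumes the segment-based definitions commonly used for DCASE sound event detection: F1 is micro-averaged over all segments, and the error rate is (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions counted per segment and N is the number of active reference events. The abstract does not state which variant the authors used, so the function name `segment_metrics` and the toy labels below are illustrative assumptions.

```python
def segment_metrics(reference, predicted):
    """Segment-based F1 and error rate (illustrative sketch).

    reference, predicted: equal-length lists with one set of active event
    labels per time segment.
    """
    tp = fp = fn = 0
    subs = dels = ins = n_ref = 0
    for ref, pred in zip(reference, predicted):
        seg_fp = len(pred - ref)           # predicted but not in the reference
        seg_fn = len(ref - pred)           # in the reference but missed
        tp += len(ref & pred)
        fp += seg_fp
        fn += seg_fn
        subs += min(seg_fp, seg_fn)        # a miss paired with a false alarm
        dels += max(0, seg_fn - seg_fp)    # remaining misses
        ins += max(0, seg_fp - seg_fn)     # remaining false alarms
        n_ref += len(ref)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    return f1, er

# Toy example with four segments and made-up labels.
reference = [{"car"}, {"car", "horn"}, {"siren"}, set()]
predicted = [{"car"}, {"car"}, {"horn"}, {"car"}]
f1, er = segment_metrics(reference, predicted)
print(f1, er)  # 0.5 0.75
```

Note that the error rate is not bounded by 1: a system that inserts many false events can score above it, which is why F1 and error rate are usually reported together.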

Received: 2024-11-19

Funding: Natural Science Foundation of Shandong Province (ZR2024QF112).

About the author: JIANG Qingzhou (b. 1998), male, master's student.     * Corresponding author.

