Article ID: 1672-6987(2020)01-0104-06; DOI: 10.16351/j.1672-6987.2020.01.016
WU Yang, YU Zong
(The 15th Research Institute of China Electronics Technology Group Corporation, Beijing 100083)
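Abstract: An improved end-to-end speech recognition method is proposed. The method builds on a hybrid attention/CTC (connectionist temporal classification) model trained under a multi-task learning (MTL) framework and extends the encoder with a deep convolutional neural network. This compensates for the respective shortcomings of the pure attention model and the pure CTC model and yields some performance improvement over the hybrid model. The experiments further confirm the strong performance of the conventional MTL model in noisy environments and show that Ex-MTL (the extended MTL model) achieves higher recognition accuracy than the conventional model. Experiments on several Chinese corpora in quiet and noisy environments show that the proposed model outperforms the pure attention model and the pure CTC model, and that it converges and aligns faster during training. The character error rate (CER) is reduced by 2.53% and 0.93%, respectively, in quiet environments, and by 4.45% and 3.45%, respectively, in noisy environments.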
Keywords: speech recognition; end-to-end; connectionist temporal classification; attention mechanism; convolutional neural network
Chinese Library Classification (CLC) number: TP 301.6; Document code: A
Citation: WU Yang, YU Zong. An extended hybrid end-to-end Chinese speech recognition model based on CNN[J]. Journal of Qingdao University of Science and Technology (Natural Science Edition), 2020, 41(1): 104-109.
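The abstract reports its results as character error rate (CER). As a minimal sketch, assuming the usual definition (character-level Levenshtein edits, i.e. substitutions + deletions + insertions, summed over the test set and divided by the total number of reference characters) and no text normalization, CER can be computed as follows. The function names `char_edit_distance` and `cer` and the example sentences are hypothetical illustrations, not taken from the paper.

```python
from typing import Sequence


def char_edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute (a match costs nothing)
        prev = cur
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    errors = sum(char_edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


refs = ["今天天气很好", "语音识别"]
hyps = ["今天天很好", "语音时别"]  # one deletion, one substitution
print(cer(refs, hyps))  # 0.2 (2 edits / 10 reference characters)
```

Errors are pooled over the whole corpus rather than averaged per utterance, so long utterances weigh more; toolkits may also strip punctuation before scoring, which this sketch does not do.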
An Extended Hybrid End-to-end Chinese Speech Recognition Model Based on CNN
WU Yang, YU Zong
(Dept. of 1st Foundation, The 15th Research Institute of China Electronic Science and Technology Corporation, Beijing 100083, China)
Abstract: An improved end-to-end speech recognition method is proposed. Building on a hybrid attention/CTC model trained in a multi-task learning (MTL) framework, the method extends the encoder with a deep convolutional neural network, which yields some performance improvement over the hybrid model. The experiments further verify the strong performance of the conventional MTL model in noisy environments and show that Ex-MTL achieves better recognition accuracy than the conventional model. Experiments on Chinese corpora in quiet and noisy environments show that the proposed model outperforms the pure attention model and the pure CTC model, with faster training convergence and alignment.
Key words: speech recognition; end-to-end; connectionist temporal classification; attention; convolutional neural network
Received: 2019-10-11
Funding: China Electronics Technology Group Corporation project (C201700721).
About the author: WU Yang (b. 1994), male, master's student.
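The abstract builds on a hybrid attention/CTC model trained with multi-task learning. A common form of that objective, assumed here rather than stated in this section, interpolates the two negative log-likelihoods as L = λ·L_CTC + (1 - λ)·L_att with 0 ≤ λ ≤ 1. The NumPy sketch below computes L_CTC with the standard CTC forward algorithm and L_att as teacher-forced cross-entropy over decoder outputs. The function names and the default λ = 0.3 are hypothetical, and the CNN-extended encoder that would produce the frame-level log-probabilities is not modeled.

```python
import numpy as np


def ctc_nll(log_probs, labels, blank=0):
    """CTC negative log-likelihood of `labels` via the forward algorithm.

    log_probs: (T, V) array of per-frame log-softmax outputs.
    labels: label ids without blanks.
    """
    T = log_probs.shape[0]
    # Extended sequence with blanks around every label: [b, l1, b, l2, ..., b]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)
    alpha = np.full((T, S), -np.inf)  # log forward variables
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [alpha[t - 1, s]]  # stay on the same symbol
            if s >= 1:
                cands.append(alpha[t - 1, s - 1])  # advance by one position
            # Skipping a blank is allowed only between distinct non-blank labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[t - 1, s - 2])
            alpha[t, s] = np.logaddexp.reduce(cands) + log_probs[t, ext[s]]
    # Valid paths end on the last label or on the trailing blank
    total = alpha[T - 1, S - 1]
    if S > 1:
        total = np.logaddexp(total, alpha[T - 1, S - 2])
    return -total


def attention_nll(dec_log_probs, targets):
    """Teacher-forced cross-entropy: dec_log_probs is (U, V), targets has length U."""
    return -sum(dec_log_probs[u, y] for u, y in enumerate(targets))


def mtl_loss(ctc_log_probs, labels, dec_log_probs, targets, lam=0.3):
    """Interpolated objective lam * L_ctc + (1 - lam) * L_att (lam is a hypothetical default)."""
    return (lam * ctc_nll(ctc_log_probs, labels)
            + (1 - lam) * attention_nll(dec_log_probs, targets))


# Toy example: vocabulary {0: blank, 1: "a"}, two frames, reference "a"
ctc_lp = np.log(np.array([[0.4, 0.6], [0.3, 0.7]]))
# Paths "aa", "-a", "a-" give 0.42 + 0.28 + 0.18, so this should be close to 0.88
print(np.exp(-ctc_nll(ctc_lp, [1])))
```

Working in log space keeps the forward recursion numerically stable for long utterances; the CTC term enforces monotonic alignment, while the attention term models label dependencies, which is the usual motivation for combining them.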