Article ID: 1672-6987(2020)06-0093-06; DOI: 10.16351/j.1672-6987.2020.06.013
Jaccard Similarity Algorithm Based on Word Embedding and Position Encoding
ZHOU Yanping, LI Jinpeng
(College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China)
Abstract: To address the word-order problem in sentence similarity, this paper proposes a Jaccard similarity algorithm based on word embeddings and position encoding. The method first uses a word-vector model to map each word to a high-dimensional semantic vector, then combines these vectors with word position encodings to compute the similarity between word vectors, and finally uses the Jaccard algorithm to compute the similarity between sentences. Experimental results show that, compared with the traditional Jaccard algorithm and the word-embedding-based Jaccard similarity algorithm, the proposed method effectively improves similarity accuracy and discriminates well between sentences that differ in word order.
Key words: position encoding; Jaccard algorithm; word embedding; sentence similarity
CLC number: TQ 207+.2; Document code: A
Citation: ZHOU Yanping, LI Jinpeng. Jaccard similarity algorithm based on word embedding and position encoding[J]. Journal of Qingdao University of Science and Technology (Natural Science Edition), 2020, 41(6): 93-98.
Received: 2019-09-03
Funding: National Natural Science Foundation of China (61402246).
Author biography: ZHOU Yanping (born 1976), male, associate professor.
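The three steps summarized in the abstract (word vectors, position encoding, Jaccard) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how position encodings are combined with word vectors or how words are matched, so the sketch assumes sinusoidal position encodings concatenated to each word vector, cosine similarity with a fixed threshold, and greedy one-to-one matching. The names `TOY_VECTORS`, `pos_weight`, and `threshold` are hypothetical, and the one-hot vectors stand in for a trained word-vector model.

```python
import math

# Toy one-hot "embeddings": hypothetical stand-ins for a trained word-vector model.
TOY_VECTORS = {
    "dog": [1.0, 0.0, 0.0],
    "bites": [0.0, 1.0, 0.0],
    "man": [0.0, 0.0, 1.0],
}


def position_encoding(pos, dim=4):
    """Sinusoidal encoding of a token index (an assumed choice of encoding)."""
    enc = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc


def encode(tokens, pos_weight):
    """Concatenate each word vector with its weighted position encoding."""
    return [
        TOY_VECTORS[tok] + [pos_weight * x for x in position_encoding(i)]
        for i, tok in enumerate(tokens)
    ]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def soft_jaccard(tokens_a, tokens_b, pos_weight=0.5, threshold=0.9):
    """Jaccard ratio where two words count as shared if their
    position-aware vectors have cosine similarity >= threshold."""
    enc_a = encode(tokens_a, pos_weight)
    enc_b = encode(tokens_b, pos_weight)
    unused = set(range(len(enc_b)))  # words of B not yet matched
    matched = 0
    for u in enc_a:
        # Greedy choice: the most similar still-unmatched word of B.
        best = max(unused, key=lambda j: cosine(u, enc_b[j]), default=None)
        if best is not None and cosine(u, enc_b[best]) >= threshold:
            unused.remove(best)
            matched += 1
    union = len(enc_a) + len(enc_b) - matched
    return matched / union if union else 1.0


a = ["dog", "bites", "man"]
b = ["man", "bites", "dog"]
print(soft_jaccard(a, b, pos_weight=0.0))  # position ignored
print(soft_jaccard(a, b, pos_weight=0.5))  # position taken into account
```

With `pos_weight=0.0` the two sentences contain the same words and score 1.0, which is the word-order blindness the abstract attributes to plain Jaccard. With `pos_weight=0.5` only "bites" keeps its position, so under these toy settings the score drops to 0.2. The weight and threshold control how much a positional shift is tolerated and would need tuning on real data.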