

Prediction of Human LncRNA Big Data Genes Based on Ensemble Learning

Date: 2018-03-16

Full-text PDF: 201801017.pdf

Article ID: 1672-6987(2018)01-0106-08  DOI: 10.16351/j.1672-6987.2018.01.017

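YU Bin, LI Shan, CHEN Cheng, CHEN Ruixin, TIAN Baoguang

(College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, Shandong 266061)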

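Abstract: Long non-coding RNAs (LncRNAs) play important roles in epigenetic regulation, post-transcriptional regulation and human disease, so identifying LncRNAs in massive RNA datasets with machine-learning methods is essential. This study proposes a new ensemble-learning method for predicting LncRNA genes from big data. First, 86 features based on the occurrence frequencies of sequence bases are extracted as the initial feature set. Second, the optimal features are selected with GA-SVM, using the five-fold cross-validation accuracy of an SVM as the fitness. Finally, a gene prediction model that combines the AdaBoost algorithm with SVM (AdaBoost-SVM) is built. Experimental results show that the AdaBoost-SVM model predicts test-set LncRNAs with an accuracy of 89.26%, better than the RF, SVM and DWT-SVM prediction models.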
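The abstract does not list the 86 base-frequency features. A common family of such descriptors is normalized k-mer frequencies; the sketch below, an assumption rather than the authors' feature set, computes them for k = 1, 2, 3, which yields 4 + 16 + 64 = 84 values (two short of the paper's 86, whose remaining features the abstract does not specify). The function name `kmer_frequencies` is hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA bases; DNA-style T is mapped to U below


def kmer_frequencies(seq: str, ks=(1, 2, 3)) -> list[float]:
    """Normalized k-mer frequencies for each k in ks, in lexicographic k-mer order.

    Hypothetical feature set for illustration; not the paper's exact 86 features.
    """
    seq = seq.upper().replace("T", "U")
    features = []
    for k in ks:
        kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        total = 0
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if window in counts:  # skip windows containing N or other ambiguity codes
                counts[window] += 1
                total += 1
        # Each k contributes a block of 4**k frequencies that sums to 1 (or all zeros).
        features.extend(counts[m] / total if total else 0.0 for m in kmers)
    return features


if __name__ == "__main__":
    vec = kmer_frequencies("AUGCGAUUACG")
    print(len(vec))  # 84
```

Each feature vector would then be fed to the feature-selection and classification stages described in the abstract.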

Key words: long non-coding RNA; gene prediction; ensemble learning; AdaBoost algorithm; support vector machine

CLC number: Q811.4  Document code: A

Citation: YU Bin, LI Shan, CHEN Cheng, et al. Prediction of human LncRNA big data genes based on ensemble learning[J]. Journal of Qingdao University of Science and Technology (Natural Science Edition), 2018, 39(1): 106-113.

Prediction of Human LncRNA Big Data Genes Based on Ensemble Learning

YU Bin, LI Shan, CHEN Cheng, CHEN Ruixin, TIAN Baoguang

(College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China)

Abstract: Long non-coding RNA (LncRNA) plays an important role in epigenetic regulation, post-transcriptional regulation and human disease, so it is necessary to identify LncRNAs in vast amounts of RNA data by machine learning. This paper presents a new ensemble-learning method for predicting LncRNA genes from big data. Firstly, 86 features describing the occurrence frequencies of bases in the sequence are extracted as the initial feature set. Secondly, the optimal features are selected with GA-SVM, using the 5-fold cross-validation accuracy of SVM as the fitness. Lastly, a gene prediction model combining the AdaBoost algorithm with SVM (AdaBoost-SVM) is constructed. The experimental results show that the AdaBoost-SVM model predicts LncRNAs in the test set with an accuracy of 89.26%, which is better than that of the RF, SVM and DWT-SVM models.
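The GA-SVM selection step can be sketched as a genetic algorithm over binary feature masks whose fitness is k-fold cross-validated accuracy. This is a minimal sketch under stated assumptions: the GA operators (binary tournament selection, one-point crossover, bit-flip mutation, single-individual elitism), all parameter values and every function name are hypothetical, and `nearest_centroid` is a stand-in classifier where the paper uses an SVM. Any `fit_predict(X_train, y_train, X_test)` callable can be plugged in instead.

```python
import random
from typing import Callable

Mask = list[int]  # 1 = feature kept, 0 = feature dropped


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 with a fixed seed and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_accuracy(X, y, mask: Mask, fit_predict: Callable, k: int = 5) -> float:
    """k-fold cross-validated accuracy using only the columns where mask == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset gets the worst fitness
    Xs = [[row[j] for j in cols] for row in X]
    correct = 0
    for test in k_fold_indices(len(X), k):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([Xs[i] for i in train], [y[i] for i in train],
                            [Xs[i] for i in test])
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X)


def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical stand-in classifier; the paper uses an SVM at this point."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return [min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
            for x in X_test]


def ga_select(n_features: int, fitness: Callable[[Mask], float], pop_size: int = 20,
              generations: int = 30, p_mut: float = 0.05, seed: int = 0):
    """GA over feature masks; returns (best_mask, best_fitness, per-generation best)."""
    # Requires n_features >= 2 (crossover point) and pop_size >= 2 (tournament).
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((fitness(m), m) for m in pop), key=lambda t: t[0], reverse=True)
        history.append(scored[0][0])

        def pick():  # binary tournament selection
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [scored[0][1][:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation with probability p_mut per gene
            children.append([1 - b if rng.random() < p_mut else b for b in child])
        pop = children
    best_fitness, best_mask = max(((fitness(m), m) for m in pop), key=lambda t: t[0])
    return best_mask, best_fitness, history
```

`cv_accuracy` defaults to five folds, matching the abstract. Because elitism carries each generation's best mask forward unchanged and the fold split is seeded, the best fitness recorded in `history` never decreases.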

Key words: long non-coding RNA; gene prediction; ensemble learning; AdaBoost algorithm; support vector machine
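The abstract does not say how AdaBoost and SVM are combined. A minimal sketch, assuming discrete AdaBoost with labels in {-1, +1} and a linear SVM base learner (hinge loss plus L2 penalty) trained by weighted full-batch subgradient descent as a stand-in for a kernel SVM library; the function names and hyperparameters (`lam`, `epochs`, `lr`, `rounds`) are hypothetical.

```python
import math


def train_weighted_linear_svm(X, y, w, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM (hinge loss + L2 penalty) fit by full-batch subgradient descent.

    w holds per-sample weights (the AdaBoost distribution over training samples).
    """
    d = len(X[0])
    coef, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        g_coef = [lam * c for c in coef]  # gradient of (lam / 2) * ||coef||^2
        g_bias = 0.0
        for xi, yi, wi in zip(X, y, w):
            margin = yi * (sum(c * v for c, v in zip(coef, xi)) + bias)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    g_coef[j] -= wi * yi * xi[j]
                g_bias -= wi * yi
        coef = [c - lr * g for c, g in zip(coef, g_coef)]
        bias -= lr * g_bias
    return coef, bias


def predict_linear(model, x):
    coef, bias = model
    return 1 if sum(c * v for c, v in zip(coef, x)) + bias >= 0 else -1


def adaboost_svm(X, y, rounds=10):
    """Discrete AdaBoost with labels in {-1, +1}; returns a list of (alpha, model) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = train_weighted_linear_svm(X, y, w)
        preds = [predict_linear(model, x) for x in X]
        err = sum(wi for wi, p, t in zip(w, preds, y) if p != t)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        if err >= 0.5:  # base learner no better than chance: stop boosting
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * t * p) for wi, p, t in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict_ensemble(ensemble, x):
    """Sign of the alpha-weighted vote of the component SVMs."""
    score = sum(alpha * predict_linear(model, x) for alpha, model in ensemble)
    return 1 if score >= 0 else -1
```

This sketch does not deliberately weaken the base SVM, so on an easy training set every round can reach zero weighted error; the error is then clipped, each round gets the same large `alpha`, and reweighting has no effect. A common remedy is to weaken the component SVMs, for example here through a larger `lam` or fewer `epochs`.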

 

Received: 2017-05-02

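Funding: National Natural Science Foundation of China (51572136); Natural Science Foundation of Shandong Province (ZR2014FL021); Science and Technology Program of Shandong Province Higher Educational Institutions (J17KA159).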

About the author: YU Bin (born 1977), male, associate professor.

Copyright © 2011-2017 Journal of Qingdao University of Science and Technology (Natural Science Edition)