Protein Folding Recognition Based on Bagging Ensemble Learning

Date: 2021-12-15

Article ID: 1672-6987(2021)06-0101-10; DOI: 10.16351/j.1672-6987.2021.06.013



YANG Xinhua, GU Haiming*


(College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, Shandong 266061, China)


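Abstract: A new protein fold recognition method, the BAG-fold model, is proposed. First, feature information is extracted from protein sequences by four methods: the pseudo position-specific scoring matrix (PsePSSM), secondary structure (SS), encoding based on grouped weight (EBGW), and detrended cross-correlation analysis (DCCA); the four types of feature information are then combined into a mixed feature space. Second, local Fisher discriminant analysis (LFDA) is applied to reduce redundant information and select the optimal feature subset. Finally, the optimal feature subset is fed into a Bagging ensemble classifier for protein fold recognition. Under 10-fold cross-validation, the accuracy reaches 96.8% on the DD dataset and 98.8% on the RDD dataset. The experimental results show that the proposed BAG-fold method clearly outperforms the other prediction methods it was compared with.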
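The abstract names DCCA as one of the four feature extractors but does not say how a protein sequence is turned into numeric series or which scales are used. As a minimal sketch, assuming the sequence has already been mapped to two per-residue numeric series `x` and `y` (for example, two amino-acid property scales), the code below computes the DCCA cross-correlation coefficient rho_DCCA(n) in its common overlapping-box form: build cumulative profiles, remove a least-squares linear trend in every box of n + 1 points, and normalise the detrended covariance by the two detrended variances. The function names are hypothetical and are not taken from the paper.

```python
import math
import random


def _profile(series):
    """Cumulative sum of deviations from the mean (the DFA/DCCA 'profile')."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for value in series:
        total += value - mean
        profile.append(total)
    return profile


def _detrended_residuals(segment):
    """Subtract the least-squares straight line fitted to one box."""
    m = len(segment)
    t_mean = (m - 1) / 2
    s_mean = sum(segment) / m
    sxx = sum((t - t_mean) ** 2 for t in range(m))
    sxy = sum((t - t_mean) * (s - s_mean) for t, s in enumerate(segment))
    slope = sxy / sxx
    intercept = s_mean - slope * t_mean
    return [s - (intercept + slope * t) for t, s in enumerate(segment)]


def dcca_coefficient(x, y, n):
    """Detrended cross-correlation coefficient rho_DCCA(n), in [-1, 1].

    Uses the N - n overlapping boxes of n + 1 points. The per-box and
    per-scale normalisation factors cancel in the ratio, so plain sums are used.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    N = len(x)
    if not 2 <= n < N:
        raise ValueError("box size n must satisfy 2 <= n < len(x)")
    X, Y = _profile(x), _profile(y)
    cov = var_x = var_y = 0.0
    for i in range(N - n):
        rx = _detrended_residuals(X[i:i + n + 1])
        ry = _detrended_residuals(Y[i:i + n + 1])
        cov += sum(a * b for a, b in zip(rx, ry))
        var_x += sum(a * a for a in rx)
        var_y += sum(b * b for b in ry)
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        raise ValueError("a detrended profile is constant at this scale")
    return cov / denom


# Hypothetical usage: two synthetic "per-residue property" series.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(60)]
y = [0.5 * a + rng.gauss(0, 1) for a in x]
print([round(dcca_coefficient(x, y, n), 3) for n in (4, 8, 16)])
```

Evaluating the coefficient at several box sizes n yields one value per scale, which could serve as a fixed-length feature vector regardless of sequence length; whether BAG-fold builds its DCCA features exactly this way is an assumption.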

Key words: protein folding; multi-information fusion; detrended cross-correlation analysis; local Fisher discriminant analysis; Bagging ensemble learning
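The final stage of the pipeline, a Bagging ensemble classifier scored by 10-fold cross-validation, can be sketched in plain Python. The abstract does not name the base classifier, so this sketch assumes a one-split decision stump as a stand-in; `fit_stump`, `fit_bagging` and `k_fold_accuracy` are hypothetical names, and the toy two-feature data stands in for the LFDA-reduced feature vectors. Each base learner is trained on a bootstrap resample (drawn with replacement) of the training set, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split decision stump (stand-in base learner).

    Picks the (feature, threshold) with the fewest training errors; each
    side of the split predicts its majority class.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= thr]
            right = [lab for row, lab in zip(X, y) if row[j] > thr]
            left_cls = Counter(left).most_common(1)[0][0]
            right_cls = Counter(right).most_common(1)[0][0] if right else left_cls
            errors = (sum(lab != left_cls for lab in left)
                      + sum(lab != right_cls for lab in right))
            if best is None or errors < best[0]:
                best = (errors, j, thr, left_cls, right_cls)
    _, j, thr, left_cls, right_cls = best
    return lambda row: left_cls if row[j] <= thr else right_cls


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Bagging: one base learner per bootstrap resample, majority-vote output."""
    rng = random.Random(seed)
    m = len(X)
    learners = []
    for _ in range(n_estimators):
        idx = [rng.randrange(m) for _ in range(m)]  # m draws with replacement
        learners.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(h(row) for h in learners)
        return votes.most_common(1)[0][0]

    return predict


def k_fold_accuracy(X, y, k=10, seed=0):
    """Pooled accuracy over the k held-out folds of a shuffled split."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    folds = [order[f::k] for f in range(k)]
    correct = 0
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in order if i not in test_set]
        model = fit_bagging([X[i] for i in train_idx],
                            [y[i] for i in train_idx], seed=seed)
        correct += sum(model(X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Toy data: the class depends only on feature 0; feature 1 is noise.
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(100)]
y = [int(row[0] > 0.5) for row in X]
acc = k_fold_accuracy(X, y, k=10)
print(f"10-fold accuracy: {acc:.3f}")
```

Pooling the held-out predictions (correct predictions divided by the total number of samples) is one common way to report k-fold accuracy; in practice a library implementation such as scikit-learn's `BaggingClassifier` together with `cross_val_score` would usually replace this hand-written version.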



CLC number: Q811.4; Document code: A

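Citation: YANG Xinhua, GU Haiming. Protein folding recognition based on Bagging ensemble learning[J]. Journal of Qingdao University of Science and Technology (Natural Science Edition), 2021, 42(6): 101-110.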


Received: 2020-12-03

Funding: General Program of the National Natural Science Foundation of China (62172248).

About the author: YANG Xinhua (born 1995), female, master's student. *Corresponding author.


