Article ID: 1672-6987(2021)02-0112-07; DOI: 10.16351/j.1672-6987.2021.02.016
Heart Disease Prediction Algorithm Based on Optimized Random Forest
ZHAO Jinchao, LI Yi, WANG Dong, ZHANG Junhu*
(College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China)
CLC number: TP 301.6; Document code: A
Citation: ZHAO Jinchao, LI Yi, WANG Dong, et al. Research on heart disease prediction algorithm based on optimized random forest[J]. Journal of Qingdao University of Science and Technology (Natural Science Edition), 2021, 42(2): 112-118.
Abstract: To make the data suitable for the optimized algorithm model, this paper preprocesses it with the K-nearest neighbor (KNN) method and proposes the KNNRF model. Missing values in the data set are filled in by K-nearest neighbor imputation, and preprocessing operations such as normalization are then applied. On the basis of the random forest algorithm, cross-validation and grid search are used to find the best parameters. Experiments were conducted separately on the widely used UCI heart disease data set and the open data set of the Cleveland medical center to build a heart disease prediction model that assists doctors in diagnosing whether a patient has heart disease. An analysis of the accuracy and AUC values in the experimental results shows that random forest gives the best predictions, reaching an accuracy of 83.2% and an AUC of 0.965. The experimental results show that the algorithm classifies well, generalizes strongly, and is feasible for assisting doctors in heart disease prediction.
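The preprocessing stage described above (K-nearest neighbor imputation of missing values, followed by normalization) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Missing values are assumed to be encoded as `None`. Distances use only the features observed in both rows, rescaled by the fraction of shared features. Each gap is filled with the mean of the k nearest rows that have that feature observed. Min-max scaling to [0, 1] stands in for the unspecified "normalization". The function names `knn_impute` and `min_max_normalize` and the toy rows are hypothetical. The random forest with cross-validation and grid search is not shown; that step would typically be delegated to a library (for example scikit-learn's `RandomForestClassifier` and `GridSearchCV`).

```python
import math
from typing import List, Optional

# A sample is a list of feature values; None marks a missing value (assumption).
Row = List[Optional[float]]


def _distance(a: Row, b: Row) -> float:
    """Euclidean distance over features observed in both rows, rescaled by
    (total features / shared features). Returns inf if nothing is shared."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows: List[Row], k: int = 3) -> List[List[float]]:
    """Fill each missing value with the mean of that feature over the
    k nearest other rows in which the feature is observed."""
    filled = []
    for i, row in enumerate(rows):
        new_row = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            # Candidate donors: other rows where feature j is present.
            donors = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            if not donors:
                raise ValueError(f"feature {j} is missing in every row")
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            new_row.append(sum(v for _, v in nearest) / len(nearest))
        filled.append(new_row)
    return filled


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical toy rows: age, resting blood pressure, cholesterol.
    data = [
        [63.0, 145.0, 233.0],
        [37.0, 130.0, 250.0],
        [41.0, 130.0, None],
        [56.0, 120.0, 236.0],
    ]
    filled = knn_impute(data, k=2)
    print(filled[2])  # [41.0, 130.0, 243.0]
    print(min_max_normalize(filled))
```

Imputing before normalizing follows the order stated in the abstract. Because the neighbor distances are then computed on the raw scale, features with large ranges (such as cholesterol) dominate the neighbor search, so scaling before imputation is a reasonable alternative to consider.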
Key words: prediction of heart disease; data preprocessing; random forest
Received: 2020-04-25
Funding: Key Research and Development Program of Shandong Province (2015GSF119016).
Author biography: ZHAO Jinchao (born 1994), male, master's student. *Corresponding author: ZHANG Junhu.