Current location: Home > Journal Issues > 2015, Issue 2 > Article

Research on Frequent Itemset Mining in a Big Data Environment

Published: 2015-04-20

LI Huijian

(Institute of Information Technology Application, Ministry of Transport Management Cadre Institute, Beijing 101601, China)

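Abstract: Combining multiple frequent itemset mining (FIM) methods to mine big data exposes many problems. To address these problems, two frequent itemset mining algorithms are studied on the MapReduce platform. Two new methods for mining large datasets are adopted: DistEclat and BigFIM. The former focuses on speed and uses a simple load-balancing scheme based on kFIs. The latter first mines the kFIs with an Apriori variant, then distributes the frequent itemsets found to the mappers, and is optimized to run on truly large datasets. Experiments show that the method has low time complexity, and that its advantage and scalability improve as the data volume grows.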

Keywords: distributed data mining; frequent itemset mining; MapReduce; Hadoop; Eclat algorithm

CLC number: TP 301.6  Document code: A

Research on Frequent Itemset Mining in a Big Data Environment

LI Huijian

(Institute of Information Technology Application, Ministry of Transport Management Cadre Institute, Beijing 101601, China)

Abstract: Combining multiple frequent itemset mining (FIM) methods to mine big data exposes many problems. To address them, this paper studies two frequent itemset mining algorithms on the MapReduce platform and adopts two new methods for mining large datasets: DistEclat and BigFIM. The former focuses on speed and uses a simple load-balancing scheme based on kFIs. The latter mines the kFIs with an Apriori variant, distributes the frequent itemsets found to the mappers, and is optimized to run on truly large datasets. Experiments show that the method has low time complexity, and that its advantage and scalability grow with the data volume.

Key words: distributed data mining; FIM; MapReduce; Hadoop; Eclat Algorithm

Received: 2014-04-12

Funding: Applied Basic Research (Core Disciplines) Project of the Ministry of Transport (2012319226320).

About the author: LI Huijian (born 1976), male, senior engineer.
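The abstract describes mining by Eclat-style tidlist intersection, with the search space split across mappers by frequent prefixes (kFIs). The following is a minimal single-process sketch of that general idea, not the paper's implementation: it assumes k = 1, replaces Hadoop mappers with a plain loop, and uses round-robin assignment as a stand-in for the load-balancing scheme. All function names (`vertical`, `eclat`, `mine_partitioned`) and the `workers` parameter are hypothetical.

```python
from collections import defaultdict


def vertical(transactions):
    """Build the vertical layout: item -> set of transaction ids (tidlist)."""
    tidlists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)
    return tidlists


def eclat(prefix, candidates, min_sup, out):
    """Depth-first Eclat step: extend `prefix` by intersecting tidlists.

    `candidates` is a list of (item, tidlist) pairs in which each tidlist
    is already the cover of prefix + (item,).
    """
    for i, (item, tids) in enumerate(candidates):
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        # Only later items are used as extensions, so each itemset
        # is generated exactly once.
        extensions = []
        for other, other_tids in candidates[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_sup:
                extensions.append((other, common))
        if extensions:
            eclat(itemset, extensions, min_sup, out)


def mine_partitioned(transactions, min_sup, workers=2):
    """Hypothetical driver (assumption: k = 1): split the search space by
    frequent single items and mine each share independently."""
    tidlists = vertical(transactions)
    frequent = sorted(
        ((i, t) for i, t in tidlists.items() if len(t) >= min_sup),
        key=lambda pair: pair[0],
    )
    result = {}
    for w in range(workers):
        # Round-robin prefix assignment stands in for load balancing;
        # each pass of this loop plays the role of one mapper.
        for idx in range(w, len(frequent), workers):
            item, tids = frequent[idx]
            result[(item,)] = len(tids)
            exts = [(o, tids & ot) for o, ot in frequent[idx + 1:]
                    if len(tids & ot) >= min_sup]
            eclat((item,), exts, min_sup, result)
    return result
```

Because each prefix subtree depends only on the tidlists of its own items, the shares can be mined without communication between workers, which is what makes this kind of partitioning attractive on MapReduce. The result does not depend on `workers`; only the assignment of prefixes to passes changes.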
