

A Scalable, Distributed Real-Time Processing Method for Large-Scale Data Streams

Published: 2016-10-11


CAI Binlei1, GUO Qin2, ZHU Shiwei1, REN Jiadong3

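(1. Information Research Institute, Shandong Academy of Sciences, Jinan, Shandong 250014; 2. Quancheng College, University of Jinan, Yantai, Shandong 265600; 3. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004)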

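Abstract: MapReduce is a common technique for processing large-scale datasets, but it cannot meet the requirements of real-time computation over the streaming data contained in such datasets. To address this, we propose a scalable, distributed real-time processing method for large-scale data streams. In the Map phase, the method builds an in-memory cache structure based on a Hash B+ tree to optimize how intermediate results are handled: this reduces the I/O cost caused by frequent reads and writes of intermediate results, and it eliminates the sorting of intermediate results to reduce CPU cost. In the Reduce phase, it uses a fast in-memory processing method based on dynamic incremental hashing and eliminates multi-pass scanning and merging of intermediate results, so that data streams are processed incrementally in a single pass, improving real-time analysis capability. Experimental results show that the method processes large-scale data streams in real time and scales well.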
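The abstract's core idea, replacing MapReduce's sort-and-merge handling of intermediate results with hash-based in-memory aggregation that is updated incrementally in one pass, can be illustrated with a minimal Python sketch. It is not the authors' Hash B+ tree or their dynamic incremental hashing: an ordinary Python `dict` stands in for both, and the class names `HashMapSideCache` and `IncrementalHashReducer` and the `max_keys` memory budget are hypothetical. The sketch also assumes the aggregate is associative and commutative (a word count here).

```python
import operator
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, int]
Combine = Callable[[int, int], int]


class HashMapSideCache:
    """Hypothetical map-side cache: intermediate (key, value) pairs are
    aggregated in an in-memory hash table rather than sorted, and a batch of
    partial aggregates is emitted only when the table reaches max_keys."""

    def __init__(self, combine: Combine, max_keys: int) -> None:
        self.combine = combine
        self.max_keys = max_keys
        self.table: Dict[Hashable, int] = {}
        self.spills: List[List[Pair]] = []  # stands in for data sent to reducers

    def put(self, key: Hashable, value: int) -> None:
        if key in self.table:
            # Repeated key: combine in place, nothing is written out.
            self.table[key] = self.combine(self.table[key], value)
            return
        if len(self.table) >= self.max_keys:
            self.flush()  # memory budget reached: emit current partials
        self.table[key] = value

    def flush(self) -> None:
        if self.table:
            # No sort: partial aggregates leave in hash-table order.
            self.spills.append(list(self.table.items()))
            self.table = {}


class IncrementalHashReducer:
    """Hypothetical reduce-side state: each incoming batch is folded into a
    hash table exactly once, so no multi-pass merge of sorted runs is needed."""

    def __init__(self, combine: Combine) -> None:
        self.combine = combine
        self.state: Dict[Hashable, int] = {}

    def consume(self, pairs: Iterable[Pair]) -> None:
        for key, value in pairs:
            if key in self.state:
                self.state[key] = self.combine(self.state[key], value)
            else:
                self.state[key] = value


if __name__ == "__main__":
    cache = HashMapSideCache(operator.add, max_keys=2)
    for word in "a b a c b a d a".split():  # word count over a tiny "stream"
        cache.put(word, 1)
    cache.flush()

    reducer = IncrementalHashReducer(operator.add)
    for batch in cache.spills:
        reducer.consume(batch)  # results are current after every batch
    print(reducer.state)  # {'a': 4, 'b': 2, 'c': 1, 'd': 1}
```

With a budget of two keys, the eight input pairs leave the map side as three batches totalling six pairs, and the reducer folds each batch in as it arrives without sorting or rescanning earlier data. Aggregates that need every value of a key at once, such as a median, do not fit this scheme.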

Key words: big data; distributed computing; data stream processing; MapReduce

CLC number: TP 391; Document code: A

A Scalable and Distributed Method for Processing Large Scale Data Streams in Real Time

CAI Binlei1, GUO Qin2, ZHU Shiwei1, REN Jiadong3

(1. Information Research Institute, Shandong Academy of Sciences, Jinan 250014, China; 2. Quancheng College, University of Jinan, Yantai 265600, China; 3. College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China)

Abstract: MapReduce is a widely used technique for processing massive datasets; however, it cannot support real-time processing of large-scale data streams. In this paper, we propose a scalable, distributed method based on the MapReduce model, called SDRT-MR, for processing large-scale data streams in real time. To lower I/O cost and use the CPU efficiently, an in-memory caching mechanism based on a Hash B+ tree is adopted to optimize the handling of intermediate results. To support incremental, one-pass analytics over data streams, we develop dynamic incremental hashing techniques for fast in-memory processing and employ an efficient technique to identify frequent keys. Experimental results on synthetic datasets show that SDRT-MR achieves higher real-time performance and better scalability.
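The English abstract mentions an efficient technique for identifying frequent keys but does not name it. As an illustrative stand-in only, the classic Misra-Gries summary finds candidate frequent keys in one pass with at most `k - 1` counters: every key occurring more than n/k times in a stream of length n is guaranteed to survive, and each stored count underestimates the true count by at most n/k. How the proposed method uses frequent keys is not stated in this excerpt; the function name `misra_gries` and parameter `k` are hypothetical.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One-pass Misra-Gries summary using at most k - 1 counters.

    Any key with true frequency > n / k (n = stream length) is among the
    returned keys; returned counts are underestimates by at most n / k.
    Returned keys may include false positives."""
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for key in stream:
        if key in counters:
            counters[key] += 1
        elif len(counters) < k - 1:
            counters[key] = 1  # free counter available
        else:
            # All counters in use: decrement each, dropping any that hit zero.
            for other in list(counters):
                counters[other] -= 1
                if counters[other] == 0:
                    del counters[other]
    return counters


if __name__ == "__main__":
    print(misra_gries("a b a c a d a b a e".split(), k=3))  # {'a': 3, 'e': 1}
```

Here `a` (true count 5) is reported with an undercount of 2, within the bound 10/3, while `e` is a false positive; a second pass, or exact counting restricted to the candidates, is needed when exact frequencies matter.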

Key words: big data; data streams; distributed computing; realtime processing; MapReduce

收稿日期: 20150811

Funding: National Natural Science Foundation of China (61170190); Shandong Province Science and Technology Development Plan (2014GGX101013, 2015GGX101032).

About the author: CAI Binlei (1984-), male, assistant research fellow.

Article ID: 1672-6987(2016)05-0584-07; DOI: 10.16351/j.1672-6987.2016.05.021

Copyright © 2011-2017 Journal of Qingdao University of Science and Technology (Natural Science Edition)