Smart Replication: Accelerate Big Data Handling
Scientists at Saarland University have developed substantial enhancements to the Hadoop Distributed File System (HDFS) and Hadoop MapReduce that dramatically reduce the runtime of MapReduce jobs.
MapReduce, and especially Hadoop MapReduce (its most popular open source implementation), has become the de facto standard for large-scale analytics in enterprises. It is used for novel applications on massive datasets, such as web analytics, real-time analytics and data mining.
However, Hadoop MapReduce has one major drawback: very slow response times, caused mainly by the fact that MapReduce jobs perform full scans of their input data.
The key idea of ‘Smart Replication’ is to store the physical replicas of an HDFS block, which HDFS creates anyway, in different layouts, in different sort orders and/or with different (clustered) indexes.
That is, with the default replication factor of three, at least three different sort orders, indexes and/or data layouts are available for processing MapReduce jobs. A modification of the HDFS upload pipeline ensures that the appropriate indexes, layouts and sort orders are created during the data upload itself. This increases the likelihood that a job finds a suitable index and/or data layout and, consequently, reduces the runtime of the workload.
Extensive benchmark experiments (see diagrams) indicate that ‘Smart Replication’ typically creates a win-win situation compared to standard Hadoop: it simultaneously improves both the data upload to HDFS (by up to 60%) and the runtime of the actual Hadoop MapReduce job (by up to a factor of 68).
- The modifications to HDFS and MapReduce required for ‘Smart Replication’ are almost invisible to the user
- ‘Smart Replication’ is easy to integrate into existing Hadoop-based systems
- ‘Smart Replication’ can be implemented for distributed systems in general
Universität des Saarlandes Wissens- und Technologietransfer GmbH
Dr. Christof Schäfer
Universität des Saarlandes Wissens- und Technologietransfer GmbH, Campus, Building A1 1
- US 2015120652 (pending)