EMR
ECRAM MapReduce framework
The MapReduce programming model simplifies the design and implementation of certain parallel algorithms. Recently, several research groups have extended MapReduce's application domain to iterative and on-line data processing. In contrast to the original MapReduce model, software frameworks for extended MapReduce use message passing to propagate data updates. For example, Twister promotes data updates using publish-subscribe messaging, and the Hadoop Online Prototype sends intermediate data directly from map to reduce workers via RPC. In order to benefit from the aggregated main memory of all nodes, fast data access, and stronger data consistency, we propose to use transactional in-memory storage for extended MapReduce. We have designed and implemented the EMR framework, a transactional framework for extended MapReduce built on the ECRAM in-memory storage.
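The central difference to message-passing designs is that map workers publish intermediate results by updating shared storage objects, which reduce workers then read consistently. The following minimal sketch illustrates this idea only: it uses an in-process table and a pthread mutex as stand-ins for ECRAM's replicated objects and transactions, and all names are hypothetical rather than the EMR/ECRAM API.

/* Conceptual sketch: a map worker publishes intermediate counts into a
 * shared in-memory table instead of sending them by message or RPC.
 * A pthread mutex stands in for the storage transaction; the table
 * layout and names are illustrative, not the ECRAM/EMR API. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define TABLE_SIZE 128

typedef struct {
    char key[32];
    long value;
} entry;

static entry table[TABLE_SIZE];          /* shared intermediate results */
static int entries;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* A map worker "emits" a key/value pair by updating the shared table
 * inside one atomic section; reduce workers later read the same objects. */
static void emit(const char *key, long value)
{
    pthread_mutex_lock(&table_lock);      /* begin "transaction" */
    for (int i = 0; i < entries; i++) {
        if (strcmp(table[i].key, key) == 0) {
            table[i].value += value;       /* in-place update, no message */
            pthread_mutex_unlock(&table_lock);
            return;
        }
    }
    if (entries < TABLE_SIZE) {
        strncpy(table[entries].key, key, sizeof table[entries].key - 1);
        table[entries].value = value;
        entries++;
    }
    pthread_mutex_unlock(&table_lock);    /* commit */
}

int main(void)
{
    emit("word", 1);
    emit("word", 2);
    emit("other", 5);
    for (int i = 0; i < entries; i++)
        printf("%s -> %ld\n", table[i].key, table[i].value);
    return 0;
}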
The EMR framework implements job management for extended MapReduce on top of ECRAM. EMR stores work-queues, jobs and configuration data in ECRAM to take advantage of automatic replication and consistency. To avoid polling, EMR uses ECRAM's built-in event propagation, which blocks until a storage object fulfills a given condition. Contention on a single global job queue would provoke high transaction conflict rates. Therefore, we have implemented per-worker job queues and a simple work-stealing approach for load balancing. If a worker detects that it is about to block on its own empty queue, it scans the work-queues of its peers for jobs to steal. Because the work-queues are stored as shared objects, there is no danger of deadlocks or lost jobs.
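A simplified sketch of per-worker queues with a stealing scan is shown below. It is illustrative only: a pthread mutex stands in for an ECRAM transaction on the shared queue object, jobs are plain integers, and none of the names correspond to the actual EMR data structures.

/* Minimal sketch of per-worker job queues with work-stealing.
 * Illustrative only: a pthread mutex stands in for an ECRAM
 * transaction, and the queue layout is hypothetical, not the
 * actual EMR data structures. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_WORKERS 4
#define QUEUE_CAP   64

typedef struct {
    int jobs[QUEUE_CAP];      /* job identifiers */
    int count;                /* number of queued jobs */
    pthread_mutex_t lock;     /* stand-in for a transaction on the shared object */
} job_queue;

static job_queue queues[NUM_WORKERS];

/* Try to take a job from one queue; returns -1 if it is empty. */
static int try_pop(job_queue *q)
{
    int job = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0)
        job = q->jobs[--q->count];
    pthread_mutex_unlock(&q->lock);
    return job;
}

/* Worker loop: drain the own queue, then scan the peers and steal. */
static void *worker(void *arg)
{
    int id = (int)(long)arg;
    for (;;) {
        int job = try_pop(&queues[id]);
        if (job < 0) {
            /* Own queue is empty: scan the peers' queues for work. */
            for (int p = 0; p < NUM_WORKERS && job < 0; p++)
                if (p != id)
                    job = try_pop(&queues[p]);
        }
        if (job < 0)
            break;                        /* nothing left anywhere */
        printf("worker %d runs job %d\n", id, job);
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_WORKERS];

    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_mutex_init(&queues[i].lock, NULL);
        queues[i].count = 0;
    }
    /* Put all initial jobs into worker 0's queue so the others must steal. */
    for (int j = 0; j < 16; j++)
        queues[0].jobs[queues[0].count++] = j;

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)(long)i);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}

In EMR itself the queues are shared ECRAM objects, so a concurrent steal is resolved by the transaction mechanism rather than by explicit locking: conflicting updates abort and retry, and a job is removed from exactly one queue.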
To illustrate the usage and performance of the EMR framework, we have implemented several extended MapReduce applications from diverse problem domains, including iterative and on-line data processing. A real-time ray-tracing example with some videos is described here. Experiments show that in-memory MapReduce scales well to at least 32 worker nodes. Designing and implementing the EMR framework has given us valuable insights into the application of distributed transactional memory.
Contacts:
Prof. Dr. Michael Schöttner