A Brief History: MapReduce circa 2004 – Google MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean and Sanjay Ghemawat MapReduce is a programming model and an associated implementation for processing and generating large data sets. research.google.com/archive/mapreduce.html
A Brief History: MapReduce circa 2004 – Google [Figure: MapReduce execution overview – the user program forks a master, which assigns map tasks over the input splits to workers; map workers write intermediate files to their local disks, and reduce workers remote-read those files and write the final output files.]
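To make the programming model concrete, below is a minimal single-machine sketch of the paper's canonical word-count example in Scala. The two-document input is made up for illustration, and plain collections stand in for the distributed file system and cluster of workers; the point is only the shape of the model: a map step emits intermediate (key, value) pairs, and a reduce step merges all values for the same key.

// Toy, in-memory sketch of the MapReduce programming model (word count).
// The documents below are a placeholder input, not real data.
val documents = Seq("the quick brown fox", "jumps over the lazy dog")

// Map phase: each document is turned into intermediate (word, 1) pairs.
val intermediate: Seq[(String, Int)] =
  documents.flatMap(doc => doc.split(" ").map(word => (word, 1)))

// Shuffle + Reduce phase: pairs are grouped by key and the counts are summed.
val counts: Map[String, Int] =
  intermediate.groupBy { case (word, _) => word }
              .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

counts.foreach(println)  // e.g. (the,2), (quick,1), (fox,1), ...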
A Brief History: MapReduce MapReduce use cases showed two major limitations: 1. difficulty of programming directly in MR 2. performance bottlenecks, or batch processing not fitting the use cases In short, MR doesn’t compose well for large applications
A Brief History: Spark Developed in 2009 at UC Berkeley AMPLab, then open sourced in 2010, Spark has since become one of the largest OSS communities in big data, with over 200 contributors in 50+ organizations. Unlike the various specialized systems, Spark’s goal was to generalize MapReduce to support new apps within the same engine. Lightning-fast cluster computing
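To illustrate how the same computation composes on Spark, here is a sketch of the word count expressed against Spark's RDD API; the application name, local[*] master, and input.txt path are illustrative placeholders, not values from the slides. The whole job is a chain of transformations on one engine, and the same API extends to iterative and interactive workloads.

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of word count on Spark's RDD API, run locally.
// App name, master, and input path are placeholders.
val sc = new SparkContext(new SparkConf().setAppName("WordCount").setMaster("local[*]"))

val counts = sc.textFile("input.txt")   // one RDD element per input line
  .flatMap(line => line.split(" "))     // map: split lines into words
  .map(word => (word, 1))               // emit (word, 1) pairs
  .reduceByKey(_ + _)                   // shuffle + reduce: sum counts per word

counts.collect().foreach(println)
sc.stop()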
A Brief History: Special Member Lately I've been working on the Databricks Cloud and Spark. I've been responsible for the architecture, design, and implementation of many Spark components. Recently, I led an effort to scale Spark and built a system based on Spark that set a new world record for sorting 100TB of data (in 23 mins). @Reynold Xin