A Brief History: Benefits Of Spark
• Speed: run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk
• Ease of Use: write applications quickly in Java, Scala or Python
• Generality: combine SQL, streaming, and complex analytics
[Figure: Spark SQL, Spark Streaming, MLlib (machine learning) and GraphX (graph) on top of Apache Spark]
A Brief History: Key distinctions for Spark vs. MapReduce
• handles batch, interactive, and real-time within a single framework
• programming at a higher level of abstraction
• more general: map/reduce is just one set of supported constructs
• functional programming / ease of use ⇒ reduction in cost to maintain large apps
• lower overhead for starting jobs
• less expensive shuffles
…
TL;DR: Smashing The Previous Petabyte Sort Record
databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html

                              Hadoop MR Record               Spark Record                       Spark 1 PB
Data size                     102.5 TB                       100 TB                             1000 TB
Elapsed time                  72 mins                        23 mins                            234 mins
Nodes                         2100                           206                                190
Cores                         50400 physical                 6592 virtualized                   6080 virtualized
Cluster disk throughput       3150 GB/s (est.)               618 GB/s                           570 GB/s
Sort Benchmark Daytona Rules  Yes                            Yes                                No
Network                       dedicated data center, 10Gbps  virtualized (EC2), 10Gbps network  virtualized (EC2), 10Gbps network
Sort rate                     1.42 TB/min                    4.27 TB/min                        4.27 TB/min
Sort rate/node                0.67 GB/min                    20.7 GB/min                        22.5 GB/min
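The two sort-rate rows in the comparison can be roughly re-derived from the data size, elapsed time, and node count. The sketch below is only an illustration: the function name `sort_rates` and the `runs` dictionary are made up for this example, the Hadoop data size is read as 102.5 TB, and 1 TB is taken as 1000 GB. Because the slide's elapsed times are rounded, the results will not match the listed figures exactly; for instance, 100 TB in a rounded 23 minutes works out to about 4.35 TB/min rather than the listed 4.27 TB/min.

```python
# Benchmark figures from the comparison: (data size in TB, elapsed minutes, nodes).
# Elapsed times are the rounded values shown on the slide (an assumption of this sketch).
runs = {
    "Hadoop MR Record": (102.5, 72, 2100),
    "Spark Record": (100.0, 23, 206),
    "Spark 1 PB": (1000.0, 234, 190),
}


def sort_rates(data_tb, minutes, nodes):
    """Return (cluster sort rate in TB/min, per-node sort rate in GB/min).

    Assumes 1 TB = 1000 GB.
    """
    rate_tb_per_min = data_tb / minutes
    rate_gb_per_min_per_node = rate_tb_per_min * 1000 / nodes
    return rate_tb_per_min, rate_gb_per_min_per_node


for name, (data_tb, minutes, nodes) in runs.items():
    total, per_node = sort_rates(data_tb, minutes, nodes)
    print(f"{name}: {total:.2f} TB/min, {per_node:.2f} GB/min per node")
```

Dividing by node count is what makes the gap visible: the Spark runs used roughly a tenth of the machines, so the per-node rate differs far more than the cluster-wide rate.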
TL;DR: Sustained Exponential Growth
Spark is one of the most active Apache projects
ohloh.net/orgs/apache
[Figure: number of contributors who made changes to the project source code each month, 2012 to 2014]
TL;DR: Spark Just Passed Hadoop in Popularity on Web
datanami.com/2014/11/21/spark-just-passed-hadoop-popularity-web-heres/
In October, Apache Spark (blue line) passed Apache Hadoop (red line) in popularity according to Google Trends.