Shanghai Jiao Tong University
CS427 Multicore Architecture and Parallel Computing
Lecture 9: MapReduce
Prof. Li Jiang
2014/11/19
What is MapReduce
• Originated at Google [OSDI'04]
• A simple programming model
• Functional model
• For large-scale data processing
  – Exploits a large set of commodity computers
  – Executes processes in a distributed manner
  – Offers high availability
Motivation
• Large-scale data processing
  – Want to use 1000s of CPUs
  – But don't want the hassle of managing things
• MapReduce provides
  – Automatic parallelization & distribution
  – Fault tolerance
  – I/O scheduling
  – Monitoring & status updates
Benefit of MapReduce
• Map/Reduce
  – Programming model from Lisp (and other functional languages)
• Many problems can be phrased this way
• Easy to distribute across nodes
• Nice retry/failure semantics
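The functional origin of map and reduce can be illustrated with a minimal Python sketch. The helper names (square, add) and the input list are assumptions chosen for illustration and are not from the lecture. Because square has no side effects, re-running it on the same element after a failure gives the same result, which is one way to read the "nice retry/failure semantics" point.

```python
from functools import reduce

def square(x):
    # Pure function: the result depends only on the argument,
    # so applying it again (e.g. after a retry) is harmless.
    return x * x

def add(acc, x):
    # Associative combining step used by reduce.
    return acc + x

numbers = [1, 2, 3, 4]                 # illustrative input
squares = list(map(square, numbers))   # "map": apply square to each element independently
total = reduce(add, squares, 0)        # "reduce": fold the mapped values into one result

print(squares, total)
```

Each call to square is independent of the others, so the map step could in principle be spread across nodes without changing the result.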
Distributed Word Count
[Figure: very big data is divided into four splits; each split flows through split data → count → count, and the four results feed a merge step that produces the merged count.]
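The split, count, and merge flow of the word-count figure can be sketched on a single machine. This is only an illustration under stated assumptions: the function names (split_data, count, merge, word_count) are hypothetical, and a local thread pool stands in for the separate machines a real MapReduce deployment would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split_data(words, n_splits):
    # Divide the word list into n_splits roughly equal chunks.
    return [words[i::n_splits] for i in range(n_splits)]

def count(chunk):
    # Count word occurrences within one split only.
    return Counter(chunk)

def merge(partials):
    # Combine the per-split counts into one merged count.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return merged

def word_count(text, n_splits=4):
    words = text.split()
    chunks = split_data(words, n_splits)
    # The thread pool is a stand-in for distributed workers (assumption).
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        partials = list(pool.map(count, chunks))
    return merge(partials)

result = word_count("the quick brown fox jumps over the lazy dog the end")
print(result.most_common(3))
```

Since the per-split counts are merged by addition, the final totals do not depend on how the words were assigned to splits.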