Distributed Grep

Very big data → Split data → grep → matches → cat → All matches
(the Split data, grep, and matches stages run once per split; the diagram shows four in parallel)
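
The pipeline in the diagram can be sketched in Python. This is a minimal sketch, assuming the data fits in a list of lines and using threads in a single process to stand in for separate machines. The names `split_data`, `grep`, and `distributed_grep` are illustrative and do not come from the slides.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_data(lines, n_chunks):
    """Split a list of lines into at most n_chunks contiguous chunks."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def grep(pattern, chunk):
    """Return the lines of one chunk that match the regular expression."""
    regex = re.compile(pattern)
    return [line for line in chunk if regex.search(line)]


def distributed_grep(pattern, lines, n_chunks=4):
    """Split the input, grep each chunk in parallel, then concatenate."""
    chunks = split_data(lines, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # pool.map preserves chunk order, so matches keep input order.
        per_chunk = list(pool.map(lambda c: grep(pattern, c), chunks))
    # The "cat" step: join the per-chunk match lists into one list.
    return [line for matches in per_chunk for line in matches]
```

Each `grep` call sees only its own chunk, which is what lets the work spread across workers; the final concatenation is the only step that touches all results.
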
Map+Reduce

• Map
  – Accepts input key/value pair
  – Emits intermediate key/value pair
• Reduce
  – Accepts intermediate key/value* pair
  – Emits output key/value pair

Very big data → MAP → Partitioning Function → REDUCE → Result
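
The slide names a partitioning function between MAP and REDUCE but does not define it. One common choice, shown here purely as an assumption, is to hash the intermediate key modulo the number of reduce tasks, so that every pair with the same key reaches the same reducer. The names `partition` and `partition_pairs` and the use of CRC32 are illustrative.

```python
import zlib


def partition(key, num_reducers):
    """Pick a reduce task for an intermediate key (assumed hash scheme)."""
    # CRC32 is used instead of hash() so the result is stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_pairs(pairs, num_reducers):
    """Route intermediate (key, value) pairs into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets
```
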
Map+Reduce

• map(key, val) is run on each item in set
  – emits new-key / new-val pairs
• reduce(key, vals) is run for each unique key emitted by map()
  – emits final output
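
The two-step contract above can be sketched as a small single-process driver. This is a sketch under stated assumptions, not a real framework: `map_reduce`, `map_fn`, and `reduce_fn` are hypothetical names, and grouping happens in an in-memory dictionary rather than across machines.

```python
from collections import defaultdict


def map_reduce(items, map_fn, reduce_fn):
    """Run map_fn on every (key, val) item, group by emitted key, reduce."""
    intermediate = defaultdict(list)
    for key, val in items:
        # map_fn emits zero or more (new_key, new_val) pairs per item.
        for new_key, new_val in map_fn(key, val):
            intermediate[new_key].append(new_val)
    # reduce_fn is called once for each unique key emitted by map_fn.
    return {k: reduce_fn(k, vals) for k, vals in intermediate.items()}


def first_letter_length(key, word):
    """Example map: emit (first letter, word length)."""
    yield (word[0], len(word))


def longest(key, lengths):
    """Example reduce: keep the largest length seen for this key."""
    return max(lengths)


result = map_reduce(
    [(1, "apple"), (2, "avocado"), (3, "banana")],
    first_letter_length,
    longest,
)
```

Here `result` maps each first letter to the length of the longest word starting with it; the driver itself knows nothing about words, only about keys and values.
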
Square Sum

• (map f list [list2 list3 …])
• (map square '(1 2 3 4)), with square a unary operator
  – (1 4 9 16)
• (reduce + '(1 4 9 16))
  – (+ 16 (+ 9 (+ 4 1)))
  – 30
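
For readers more at home in Python than Lisp, the same square-sum computation can be written with the built-in `map` and `functools.reduce`. This is an illustrative translation, not part of the original slides.

```python
from functools import reduce
from operator import add


def square(x):
    """Unary operator applied to each element by map."""
    return x * x


squares = list(map(square, [1, 2, 3, 4]))  # [1, 4, 9, 16]
total = reduce(add, squares)               # ((1 + 4) + 9) + 16 = 30
```

Python's `reduce` folds from the left, so the nesting differs from the expansion on the slide, but the result is the same because addition is associative and commutative.
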
Word Count

– Input consists of (url, contents) pairs
– map(key=url, val=contents):
  • For each word w in contents, emit (w, "1")
– reduce(key=word, values=uniq_counts):
  • Sum all "1"s in values list
  • Emit result "(word, sum)"
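
The word-count pseudocode above can be made runnable as follows. This is a minimal single-process sketch: splitting on whitespace is an assumption about what counts as a word, and `wc_map`, `wc_reduce`, `word_count`, and the example URLs are hypothetical.

```python
from collections import defaultdict


def wc_map(url, contents):
    """For each word w in contents, emit (w, "1")."""
    for w in contents.split():  # assumption: words are whitespace-separated
        yield (w, "1")


def wc_reduce(word, counts):
    """Sum all "1"s in the values list and emit (word, sum)."""
    return (word, sum(int(c) for c in counts))


def word_count(pages):
    """Run wc_map over (url, contents) pairs, group by word, then reduce."""
    grouped = defaultdict(list)
    for url, contents in pages:
        for w, one in wc_map(url, contents):
            grouped[w].append(one)
    return [wc_reduce(w, counts) for w, counts in sorted(grouped.items())]


pages = [
    ("http://a.example", "the quick fox"),
    ("http://b.example", "the lazy dog the end"),
]
counts = dict(word_count(pages))
```

The url key is ignored by the map step; only the contents matter, and the emitted key becomes the word itself.
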