MapReduce:Word Count Example Consider the problem of counting the number of occurrences of each word in a large collection of documents How would you do it in parallel? ■Solution: Divide documents among workers Each worker parses document to find all words,map function outputs (word,count)pairs Partition (word,count)pairs across workers based on word For each word at a worker,reduce function locally add up counts Given input:"One a penny,two a penny,hot cross buns." Records output by the map()function would be opep)1).("1).("peny1). hot”,1),(“cross'”,1),(buns”,1). Records output by reduce function would be (“One”,1),(“a”,2),(“penny'”,2),(“two”,1),(“hot”,1),(“cross'”, 1),(“buns”,1) Database System Concepts-7th Edition 10.18 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 10.18 ©Silberschatz, Korth and Sudarshan th Edition MapReduce: Word Count Example ▪ Consider the problem of counting the number of occurrences of each word in a large collection of documents ▪ How would you do it in parallel? ▪ Solution: • Divide documents among workers • Each worker parses document to find all words, map function outputs (word, count) pairs • Partition (word, count) pairs across workers based on word • For each word at a worker, reduce function locally add up counts ▪ Given input: “One a penny, two a penny, hot cross buns.” • Records output by the map() function would be ▪ (“One”, 1), (“a”, 1), (“penny”, 1),(“two”, 1), (“a”, 1), (“penny”, 1), (“hot”, 1), (“cross”, 1), (“buns”, 1). • Records output by reduce function would be ▪ (“One”, 1), (“a”, 2), (“penny”, 2), (“two”, 1), (“hot”, 1), (“cross”, 1), (“buns”, 1)
Pseudo-code of Word Count map(String record): for each word in record emit(word,1); /First attribute of emit above is called reduce key /In effect,group by is performed on reduce key to create a /list of values (all 1's in above code).This requires shuffle step ∥across machines. /The reduce function is called on list of values in each group reduce(String key,List value_list): String word key int count 0; for each value in value list: countcount value Output(word,count); Database System Concepts-7th Edition 10.19 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 10.19 ©Silberschatz, Korth and Sudarshan th Edition Pseudo-code of Word Count map(String record): for each word in record emit(word, 1); // First attribute of emit above is called reduce key // In effect, group by is performed on reduce key to create a // list of values (all 1’s in above code). This requires shuffle step // across machines. // The reduce function is called on list of values in each group reduce(String key, List value_list): String word = key int count = 0; for each value in value_list: count = count + value Output(word, count);
MapReduce Programming Model Inspired from map and reduce operations commonly used in functional programming languages like Lisp. Input:a set of key/value pairs User supplies two functions: ·map(k,)→list(k1,v1) ·reduce(k1,list(v1)→v2 (k1,v1)is an intermediate key/value pair Output is the set of (k1,v2)pairs For our example,assume that system Breaks up files into lines,and Calls map function with value of each line Key is the line number Database System Concepts-7th Edition 10.20 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 10.20 ©Silberschatz, Korth and Sudarshan th Edition MapReduce Programming Model ▪ Inspired from map and reduce operations commonly used in functional programming languages like Lisp. ▪ Input: a set of key/value pairs ▪ User supplies two functions: • map(k,v) → list(k1,v1) • reduce(k1, list(v1)) → v2 ▪ (k1,v1) is an intermediate key/value pair ▪ Output is the set of (k1,v2) pairs ▪ For our example, assume that system • Breaks up files into lines, and • Calls map function with value of each line ▪ Key is the line number