Word Count:Version 1 1:class MAPPER 2: method MAP(docid a,doc d) 3: H-new ASSOCIATIVEARRAY 4: for all term t∈doc d do 5: H{t}←-H{t}+1 Tally counts for entire document 6: for all term t∈Hdo 7: EMIT(term t,count H{t) Are combiners still needed?
Word Count: Version 1 Are combiners still needed?
Computing the Mean:Version 2 1:class MAPPER 2 method MAP(string t,integer r) 3: EMIT(string t,integer r) 1:class COMBINER 2: method CoMBINE(string t,integers [r1,r2,...) 3: sum←-0 4: cmt←-0 5: for all integer r E integers [r1,r2,...]do 6: sum←-sum+T 7: cnt←-cmt+1 8: EMIT(string t,pair (sum,cnt)) Separate sum and count 1:class REDUCER 2: method REDUCE(string t,pairs [(s1,c1),(s2,c2)...]) 3: sum←-0 4: cmt←-0 5: for all pair (s,c)E pairs [(s1,c1),(s2,c2)...]do 6: sum←-sum+s 7: cnt←-cnt+c 8: Tavg←-sum/cnt 9: EMIT(string t,integer ravg) Why doesn't this work?
Computing the Mean: Version 2 Why doesn’t this work?
Another Try:“Stripes” Idea:group together pairs into an associative array (a,b)→1 (a,c→2 (a,d)→5 a→{b:1,c:2,d:5,e:3,f:2} (a,e)3 (a,f)→2 Each mapper takes a sentence: Generate all co-occurring term pairs For each term,emit a{b:countp,c:countc,d:countd... Reducers perform element-wise sum of associative arrays a→{b:1,d:5,e:3} Key:cleverly-constructed data structure a→{b:1,c:2,d:2,f:2} brings together partial results a→{b:2,c:2,d:7,e:3,f:2}
◼ Idea: group together pairs into an associative array ◼ Each mapper takes a sentence: ◼ Generate all co-occurring term pairs ◼ For each term, emit a → { b: countb , c: countc , d: countd … } ◼ Reducers perform element-wise sum of associative arrays Another Try: “Stripes” (a, b) → 1 (a, c) → 2 (a, d) → 5 (a, e) → 3 (a, f) → 2 a → { b: 1, c: 2, d: 5, e: 3, f: 2 } a → { b: 1, d: 5, e: 3 } a → { b: 1, c: 2, d: 2, f: 2 } a → { b: 2, c: 2, d: 7, e: 3, f: 2 } +
f(BlA):“Pairs” (a,*)→32 Reducer holds this value in memory (a,b)→3 (a,b)→3/32 (a,b2)→12 (a,b2)→12/32 (a,b3)→7 (a,b3)→7/32 (a,b4)→1 (a,b4)→1/32 For this to work: Must emit extra(a,*for every b in mapper Must make sure all a's get sent to same reducer (use partitioner) Must make sure (a,*comes first(define sort order) Must hold state in reducer across different key-value pairs
f(B|A): “Pairs” ◼ For this to work: ◼ Must emit extra (a, *) for every bn in mapper ◼ Must make sure all a’s get sent to same reducer (use partitioner) ◼ Must make sure (a, *) comes first (define sort order) ◼ Must hold state in reducer across different key-value pairs (a, b1 ) → 3 (a, b2 ) → 12 (a, b3 ) → 7 (a, b4 ) → 1 … (a, *) → 32 (a, b1 ) → 3 / 32 (a, b2 ) → 12 / 32 (a, b3 ) → 7 / 32 (a, b4 ) → 1 / 32 … Reducer holds this value in memory
NC&IS Homework Reading
Homework Reading