Outline
◼ Last Course Review & Reading Check
◼ Document Retrieval & Inverted Index
◼ PageRank Algorithm
NC&IS Last Course Review
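[Figure: MapReduce data flow with four map tasks and three reduce tasks]
◼ Input pairs: k1 v1, k2 v2, k3 v3, k4 v4, k5 v5, k6 v6
◼ map outputs
   ◼ map 1: a 1, b 2
   ◼ map 2: c 3, c 6
   ◼ map 3: a 5, c 2
   ◼ map 4: b 7, c 8
◼ Shuffle and Sort: aggregate values by keys
◼ reduce inputs
   ◼ a: 1 5
   ◼ b: 2 7
   ◼ c: 2 3 6 8
◼ reduce outputs: r1 s1, r2 s2, r3 s3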
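The flow in the figure can be traced in a few lines of Python. This is a minimal single-process sketch, not how a real runtime is implemented. The intermediate pairs are copied from the slide. The names `shuffle_and_sort` and `reduce_fn` are illustrative, and summation is an assumed reduce function (the slide leaves the outputs abstract as r1 s1, r2 s2, r3 s3). Values are sorted only to match the figure. Real frameworks generally sort by key and make no promise about value order.

```python
from collections import defaultdict

# Intermediate (key, value) pairs emitted by the four map tasks on the slide.
map_outputs = [
    [("a", 1), ("b", 2)],  # map 1
    [("c", 3), ("c", 6)],  # map 2
    [("a", 5), ("c", 2)],  # map 3
    [("b", 7), ("c", 8)],  # map 4
]


def shuffle_and_sort(task_outputs):
    """Group all intermediate values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for task_output in task_outputs:
        for key, value in task_output:
            groups[key].append(value)
    # Sorting the values is only done to mirror the figure.
    return {key: sorted(groups[key]) for key in sorted(groups)}


def reduce_fn(key, values):
    """Assumed reducer: sum the values that share a key."""
    return key, sum(values)


grouped = shuffle_and_sort(map_outputs)
results = [reduce_fn(key, values) for key, values in grouped.items()]

print(grouped)  # {'a': [1, 5], 'b': [2, 7], 'c': [2, 3, 6, 8]}
print(results)  # [('a', 6), ('b', 9), ('c', 19)]
```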
MapReduce “Runtime”
◼ Handles scheduling
   ◼ Assigns workers to map and reduce tasks
◼ Handles “data distribution”
   ◼ Moves processes to data
◼ Handles synchronization
   ◼ Gathers, sorts, and shuffles intermediate data
◼ Handles errors and faults
   ◼ Detects worker failures and restarts
◼ Everything happens on top of a distributed FS (later)
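The "detects worker failures and restarts" point can be illustrated with a toy scheduler. This is a rough sketch under simplifying assumptions, not the mechanism of any particular framework. Workers are modelled as plain Python callables, a raised `RuntimeError` stands in for a detected failure, and the names `run_with_retries`, `flaky_worker`, and `healthy_worker` are hypothetical.

```python
def run_with_retries(task, workers, max_attempts=3):
    """Run a task, reassigning it to the next worker whenever one fails."""
    failures = []
    for attempt in range(max_attempts):
        worker = workers[attempt % len(workers)]
        try:
            return worker(task)
        except RuntimeError as err:
            # In this sketch an exception plays the role of a detected failure.
            failures.append(str(err))
    raise RuntimeError(f"task {task!r} failed on every attempt: {failures}")


def flaky_worker(task):
    """Hypothetical worker that always crashes."""
    raise RuntimeError("worker crashed")


def healthy_worker(task):
    """Hypothetical worker that completes the task."""
    return f"done: {task}"


outcome = run_with_retries("map-0", [flaky_worker, healthy_worker])
print(outcome)  # done: map-0
```

Re-running a task is safe here because the task has no side effects. That property is what makes restart-based fault handling simple for map and reduce tasks in general.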
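[Figure: MapReduce data flow with combiners and partitioners]
◼ Input pairs: k1 v1, k2 v2, k3 v3, k4 v4, k5 v5, k6 v6
◼ map outputs
   ◼ map 1: a 1, b 2
   ◼ map 2: c 3, c 6
   ◼ map 3: a 5, c 2
   ◼ map 4: b 7, c 8
◼ combine outputs (one combiner per map task)
   ◼ combine 1: a 1, b 2
   ◼ combine 2: c 9
   ◼ combine 3: a 5, c 2
   ◼ combine 4: b 7, c 8
◼ partition (one partitioner per map task)
◼ Shuffle and Sort: aggregate values by keys
◼ reduce inputs
   ◼ a: 1 5
   ◼ b: 2 7
   ◼ c: 2 9 8 (without combiners: 2 3 6 8)
◼ reduce outputs: r1 s1, r2 s2, r3 s3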
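The effect of the combine and partition stages can be sketched as follows. This is an illustrative single-process model, not a framework API. The intermediate pairs come from the slide. Summation is an assumed combine and reduce function, chosen because it is associative and commutative, so local pre-aggregation does not change the final result. The byte-sum partitioner is a made-up deterministic stand-in for the hash partitioning a real system would use.

```python
from collections import defaultdict

# Intermediate (key, value) pairs emitted by the four map tasks on the slide.
map_outputs = [
    [("a", 1), ("b", 2)],  # map 1
    [("c", 3), ("c", 6)],  # map 2
    [("a", 5), ("c", 2)],  # map 3
    [("b", 7), ("c", 8)],  # map 4
]


def combine(task_output):
    """Locally pre-aggregate one map task's output before it is shuffled."""
    local = defaultdict(int)
    for key, value in task_output:
        local[key] += value
    return sorted(local.items())


def partition(key, num_reducers):
    """Pick the reducer for a key (hypothetical deterministic partitioner)."""
    return sum(key.encode("utf-8")) % num_reducers


def run(task_outputs, num_reducers=3):
    """Combine, partition, shuffle, and reduce; return buckets and totals."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for task_output in task_outputs:
        for key, value in combine(task_output):
            reducers[partition(key, num_reducers)][key].append(value)
    totals = {}
    for groups in reducers:
        for key, values in groups.items():
            totals[key] = sum(values)
    return reducers, totals


reducers, totals = run(map_outputs)

print(combine(map_outputs[1]))  # [('c', 9)]
print(sorted(totals.items()))   # [('a', 6), ('b', 9), ('c', 19)]
```

With the combiner, the reducer for key c receives three values (9, 2, 8) instead of four (3, 6, 2, 8), yet the sum is still 19. Less intermediate data crosses the network while the output stays the same.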