Architecture 16
[Figure: the user submits a job to the Job tracker on the master node; slave nodes 1, 2, ..., N each run a Task tracker that manages the local workers.]
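For concreteness, here is roughly the standard Hadoop WordCount driver (Java, org.apache.hadoop.mapreduce API). A user submits a job like this to the master's Job tracker, which splits it into map and reduce tasks that the Task trackers run on their workers; the input and output paths here are just whatever the user passes on the command line.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Map: emit (word, 1) for every token in the input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce: sum the counts for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) sum += val.get();
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on the map side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}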
Task Granularity 17
• Fine granularity tasks: map tasks >> machines
  – Minimizes time for fault recovery
  – Can pipeline shuffling with map execution
  – Better dynamic load balancing (sketched after this slide)
• Often use 200,000 map & 5000 reduce tasks
• Running on 2000 machines
[Figure: process/time diagram of one execution; the user program launches MapReduce and waits while the master assigns tasks to worker machines. Worker 1 runs Map 1 and Map 3, Worker 2 runs Map 2; Worker 3 reads intermediate partitions 1.1-1.3 and runs Reduce 1, Worker 4 reads partitions 2.1-2.3 and runs Reduce 2.]
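A toy Java sketch (not Hadoop code) of why many small tasks on few machines balance load well: workers pull tasks from a shared queue, so a slow machine simply ends up completing fewer of them, and losing one task means redoing only one small unit of work. The task count, worker count, and sleep times are made up for illustration.

import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Many small "map tasks" pulled from a shared queue by a handful of workers.
public class FineGrainedScheduling {
  public static void main(String[] args) throws InterruptedException {
    int numTasks = 200;        // map tasks >> machines
    int numWorkers = 4;        // the "machines"
    BlockingQueue<Integer> taskQueue = new LinkedBlockingQueue<>();
    for (int t = 0; t < numTasks; t++) taskQueue.add(t);

    AtomicInteger[] completedPerWorker = new AtomicInteger[numWorkers];
    ExecutorService pool = Executors.newFixedThreadPool(numWorkers);
    for (int w = 0; w < numWorkers; w++) {
      final int workerId = w;
      completedPerWorker[w] = new AtomicInteger();
      pool.submit(() -> {
        Integer task;
        while ((task = taskQueue.poll()) != null) {
          // Worker 0 is artificially slow; with fine-grained tasks it just
          // claims fewer of them instead of stalling the whole job.
          try { Thread.sleep(workerId == 0 ? 20 : 5); } catch (InterruptedException e) { return; }
          completedPerWorker[workerId].incrementAndGet();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    for (int w = 0; w < numWorkers; w++)
      System.out.println("worker " + w + " completed " + completedPerWorker[w].get() + " tasks");
  }
}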
GFS 18
• Goal
  – Global view
  – Make huge files available in the face of node failures
• Master node (meta server)
  – Centralized; indexes all chunks on data servers
• Chunk server (data server)
  – File is split into contiguous chunks, typically 16-64 MB
  – Each chunk replicated (usually 2x or 3x); see the placement sketch below
  – Try to keep replicas in different racks
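A toy Java sketch (not the real GFS implementation) of the chunking and replica-placement idea on this slide: a file is covered by fixed-size chunks, and each chunk gets a few replicas, preferring chunkservers in different racks. The server and rack names are hypothetical.

import java.util.*;

public class ChunkPlacement {
  static final long CHUNK_SIZE = 64L * 1024 * 1024;  // 64 MB (slide: typically 16-64 MB)
  static final int REPLICAS = 3;                     // usually 2x or 3x

  record Server(String name, String rack) {}

  // Number of fixed-size chunks needed to cover a file of the given length.
  static long chunkCount(long fileLength) {
    return (fileLength + CHUNK_SIZE - 1) / CHUNK_SIZE;
  }

  // Pick REPLICAS chunkservers, taking at most one per rack while possible.
  static List<Server> placeReplicas(List<Server> servers) {
    List<Server> chosen = new ArrayList<>();
    Set<String> usedRacks = new HashSet<>();
    for (Server s : servers) {
      if (chosen.size() == REPLICAS) break;
      if (usedRacks.add(s.rack())) chosen.add(s);    // unused rack: take this server
    }
    for (Server s : servers) {                       // fall back if we run out of racks
      if (chosen.size() == REPLICAS) break;
      if (!chosen.contains(s)) chosen.add(s);
    }
    return chosen;
  }

  public static void main(String[] args) {
    List<Server> servers = List.of(
        new Server("cs1", "rackA"), new Server("cs2", "rackA"),
        new Server("cs3", "rackB"), new Server("cs4", "rackC"));
    long fileLength = 1_000_000_000L;                // ~1 GB file -> 15 chunks
    System.out.println("chunks: " + chunkCount(fileLength));
    System.out.println("replicas for chunk 0: " + placeReplicas(servers));
  }
}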
GFS 19
[Figure: a Client, the GFS Master, and Chunkservers 1, 2, ..., N; each chunkserver holds a subset of the chunks C0, C1, C2, C3, C5, and each chunk is stored on more than one chunkserver.]
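A sketch of the read path the figure suggests, written against hypothetical interfaces rather than the real GFS API: the client turns a byte offset into a chunk index, asks the centralized master only for that chunk's locations (metadata), and then fetches the data directly from one of the chunkservers.

import java.util.List;

public class GfsReadPath {
  static final long CHUNK_SIZE = 64L * 1024 * 1024;

  // Hypothetical metadata service (the GFS Master in the figure).
  interface Master {
    List<String> lookupChunk(String file, long chunkIndex);  // replica locations
  }
  // Hypothetical data service (a chunkserver in the figure).
  interface Chunkserver {
    byte[] readChunk(String file, long chunkIndex, long offsetInChunk, int length);
  }
  interface ChunkserverDirectory {
    Chunkserver connect(String serverName);
  }

  // Read length bytes starting at fileOffset, assuming the requested range
  // does not cross a chunk boundary.
  static byte[] read(Master master, ChunkserverDirectory dir,
                     String file, long fileOffset, int length) {
    long chunkIndex = fileOffset / CHUNK_SIZE;        // which chunk holds the offset
    long offsetInChunk = fileOffset % CHUNK_SIZE;
    List<String> replicas = master.lookupChunk(file, chunkIndex);  // metadata from the master
    Chunkserver server = dir.connect(replicas.get(0));             // data from a chunkserver
    return server.readChunk(file, chunkIndex, offsetInChunk, length);
  }
}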