Straggler
❑ What is a straggler in MapReduce?
  ✓ A node on which tasks take an unusually long time to finish
❑ Stragglers will:
  ✓ Delay the job execution time
  ✓ Degrade the cluster throughput
❑ How to solve it?
  ✓ Speculative execution: a slow task is backed up on an alternative machine in the hope that the backup copy finishes faster
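The speculative execution idea on this slide can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the strategy proposed later in the deck: the function names, the `slowdown` threshold of 1.5, and the rule "a task is slow when its estimated remaining time exceeds 1.5 times the median" are all hypothetical choices made for illustration.

```python
from statistics import median


def find_stragglers(remaining, slowdown=1.5):
    """Return the ids of tasks whose estimated remaining time exceeds
    `slowdown` times the median remaining time (a naive, assumed rule)."""
    typical = median(remaining.values())
    return sorted(t for t, r in remaining.items() if r > slowdown * typical)


def finish_time(remaining, backup_duration, slowdown=1.5):
    """Time until the last task finishes when every straggler is raced
    against one backup copy on another machine.

    The backup is only a hope, not a guarantee: whichever copy finishes
    first wins, so a slow backup never makes the task finish later.
    """
    stragglers = set(find_stragglers(remaining, slowdown))
    per_task = [
        min(r, backup_duration) if t in stragglers else r
        for t, r in remaining.items()
    ]
    return max(per_task)


# Hypothetical remaining-time estimates (seconds) for four map tasks.
remaining = {"map-0": 10.0, "map-1": 12.0, "map-2": 11.0, "map-3": 60.0}

print(find_stragglers(remaining))                    # ['map-3']
print(max(remaining.values()))                       # 60.0 without a backup
print(finish_time(remaining, backup_duration=15.0))  # 15.0 with a backup
```

In this toy run the job waits 60 seconds for `map-3` without speculation and 15 seconds with it, which is the delay in job execution time that the slide describes.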
Outline
1. Introduction
2. Background
3. Previous work
4. Pitfalls
5. Our Design
6. Evaluation
7. Conclusion
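Architecture
[Figure: the master assigns map tasks and reduce tasks. Input files are divided into Split 1, Split 2, …, Split M. In the map stage, each map task reads one split and writes Part 1 and Part 2. In the reduce stage, the reduce tasks read these parts and write Output1 and Output2 to the output files.]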
Programming model
❑ Input: (key, value) pairs
❑ Output: (key*, value*) pairs
❑ Map stage (phases: Map, Combine)
  List(K1, V1) → List(K2, V2) → List(K2, List(V2))
❑ Reduce stage (phases: Copy, Sort, Reduce)
  List(K2, List(V2)) → Ordered(K2, List(V2)) → List(K3, V3)
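The type signatures above can be traced with a small in-memory word count. This is a minimal single-process sketch, not a real MapReduce runtime: the function names, the two example splits, and the choice of word count as the job are assumptions made for illustration. The combine phase here only groups values per key, which matches the List(K2, List(V2)) signature on the slide.

```python
from collections import defaultdict


def map_phase(records):
    """Map: List(K1, V1) -> List(K2, V2). Emits (word, 1) per word."""
    for _offset, line in records:
        for word in line.split():
            yield word, 1


def combine_phase(pairs):
    """Combine: List(K2, V2) -> List(K2, List(V2)). Groups values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return list(grouped.items())


def copy_phase(map_outputs):
    """Copy: gathers the grouped output of every map task and merges
    the value lists that share a key."""
    merged = defaultdict(list)
    for output in map_outputs:
        for key, values in output:
            merged[key].extend(values)
    return list(merged.items())


def sort_phase(grouped):
    """Sort: List(K2, List(V2)) -> Ordered(K2, List(V2))."""
    return sorted(grouped, key=lambda kv: kv[0])


def reduce_phase(ordered):
    """Reduce: Ordered(K2, List(V2)) -> List(K3, V3). Sums the counts."""
    return [(key, sum(values)) for key, values in ordered]


# Two hypothetical input splits, each a list of (offset, line) records.
splits = [[(0, "a b a")], [(0, "b c")]]

# Map stage: one map task per split, each followed by its combine phase.
map_outputs = [combine_phase(map_phase(split)) for split in splits]

# Reduce stage: copy, sort, then reduce.
result = reduce_phase(sort_phase(copy_phase(map_outputs)))
print(result)  # [('a', 2), ('b', 2), ('c', 1)]
```

Each map task runs independently on its own split, which is why a single slow task can hold back the copy phase and, with it, the whole job.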
Causes of Stragglers
❑ Internal factors
  ✓ The resource capacity of worker nodes is heterogeneous
  ✓ Resource competition from other MapReduce tasks running on the same worker node
❑ External factors
  ✓ Resource competition from co-hosted applications
  ✓ Input data skew
  ✓ A remote input or output source is too slow
  ✓ Hardware faults