Runtime
• Services
  – Name Server
  – Daemon
• Job Manager
  – Centralized coordinating process
  – User application to construct graph
  – Linked with Dryad libraries for scheduling vertices
• Vertex executable
  – Dryad libraries to communicate with JM
  – User application sees channels in/out
  – Arbitrary application code, can use local FS
[Figure: runtime architecture. Control plane: Job Manager (JM), name server (NS), daemons (D), job schedule. Data plane: vertices (V) exchanging data over files, FIFOs, network.]
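The Job Manager's central task is deciding when each vertex may run. The sketch below is a minimal illustration of that idea, not Dryad's actual scheduler. It assumes a vertex becomes runnable once every one of its input channels is complete, and it treats daemons as plain names assigned round-robin. The function `schedule` and its parameters are hypothetical.

```python
from collections import deque


def schedule(vertices, channels, daemons):
    """Return (vertex, daemon) pairs in a valid dispatch order.

    Hypothetical sketch: a vertex is ready once all of its input channels
    are complete; ready vertices are handed to daemons round-robin.
    """
    pending = {v: 0 for v in vertices}      # unfinished inputs per vertex
    consumers = {v: [] for v in vertices}   # downstream vertices per vertex
    for src, dst in channels:
        pending[dst] += 1
        consumers[src].append(dst)

    ready = deque(v for v, n in pending.items() if n == 0)
    plan = []
    while ready:
        v = ready.popleft()
        # Stand-in for "JM asks a daemon to start the vertex executable".
        plan.append((v, daemons[len(plan) % len(daemons)]))
        # Assume v ran to completion, so its output channels are now full.
        for c in consumers[v]:
            pending[c] -= 1
            if pending[c] == 0:
                ready.append(c)

    if len(plan) != len(pending):
        raise ValueError("job graph contains a cycle")
    return plan
```

For a diamond-shaped job, `schedule(["in", "a", "b", "out"], [("in", "a"), ("in", "b"), ("a", "out"), ("b", "out")], ["d0", "d1"])` returns `[("in", "d0"), ("a", "d1"), ("b", "d0"), ("out", "d1")]`. The sketch deliberately ignores failures, data locality, and the name server.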
Job = Directed Acyclic Graph
[Figure: inputs feed processing vertices connected by channels (file, pipe, shared memory), which produce outputs.]
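A job graph like this can be represented with a small data structure. The following sketch is illustrative only. The class names `Channel` and `JobGraph`, the string labels for channel kinds, and the rule that job inputs and outputs are vertices with no incoming or outgoing channel are all assumptions, not Dryad's API.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three channel kinds named on the slide.
CHANNEL_KINDS = {"file", "pipe", "shared_memory"}


@dataclass(frozen=True)
class Channel:
    src: str
    dst: str
    kind: str = "file"

    def __post_init__(self):
        if self.kind not in CHANNEL_KINDS:
            raise ValueError(f"unknown channel kind: {self.kind}")


@dataclass
class JobGraph:
    vertices: set = field(default_factory=set)
    channels: list = field(default_factory=list)

    def connect(self, src, dst, kind="file"):
        """Add a channel src -> dst, registering both vertices."""
        self.vertices.update((src, dst))
        self.channels.append(Channel(src, dst, kind))

    def inputs(self):
        """Vertices with no incoming channel (they read the job's inputs)."""
        return sorted(self.vertices - {c.dst for c in self.channels})

    def outputs(self):
        """Vertices with no outgoing channel (they write the job's outputs)."""
        return sorted(self.vertices - {c.src for c in self.channels})

    def is_acyclic(self):
        """Depth-first search: a back edge to a GREY vertex means a cycle."""
        succ = {v: [] for v in self.vertices}
        for c in self.channels:
            succ[c.src].append(c.dst)
        WHITE, GREY, BLACK = 0, 1, 2
        color = {v: WHITE for v in self.vertices}

        def visit(v):
            color[v] = GREY
            for w in succ[v]:
                if color[w] == GREY:
                    return False
                if color[w] == WHITE and not visit(w):
                    return False
            color[v] = BLACK
            return True

        return all(color[v] != WHITE or visit(v) for v in self.vertices)
```

A linear job `read -> parse -> count -> write` built with `connect` reports `["read"]` as its input, `["write"]` as its output, and is acyclic. Adding a channel from `write` back to `read` makes `is_acyclic()` return `False`, which a Dryad-style job would have to reject.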
What’s wrong with MapReduce?
• Literally Map then Reduce and that’s it…
  – Reducers write to replicated storage
• Complex jobs pipeline multiple stages
  – No fault tolerance between stages
• Map assumes its data is always available: simple!
• Output of Reduce: 2 network copies, 3 disks
  – In Dryad this collapses inside a single process
  – Big jobs can be more efficient with Dryad
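One way to picture how a Reduce followed by the next stage's Map "collapses inside a single process" is to chain the two stages with Python generators. Records then flow between the stages in memory instead of being written to replicated storage. This is a hypothetical sketch (`reduce_stage`, `map_stage`, and `fused_vertex` are illustrative names), not how Dryad itself is implemented.

```python
from collections import defaultdict


def reduce_stage(pairs, reducer):
    """Group (key, value) pairs by key and yield (key, reducer(key, values))."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    for k in sorted(groups):
        yield k, reducer(k, groups[k])


def map_stage(records, mapper):
    """Apply mapper to each record; mapper returns an iterable of outputs."""
    for rec in records:
        yield from mapper(rec)


def fused_vertex(pairs, reducer, mapper):
    # Stage 1's Reduce feeds stage 2's Map directly through a generator.
    # In a chain of MapReduce jobs, the Reduce output would instead be
    # written to replicated storage and read back by the next job's Map.
    return map_stage(reduce_stage(pairs, reducer), mapper)
```

For example, counting words and then re-keying each word by its count: `list(fused_vertex([("a", 1), ("b", 1), ("a", 1)], lambda k, vs: sum(vs), lambda kv: [(kv[1], kv[0])]))` yields `[(2, "a"), (1, "b")]` without any intermediate materialization.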
What’s wrong with Map+Reduce?
• Join combines inputs of different types
• “Split” produces outputs of different types
  – Parse a document, output text and references
• Can be done with Map+Reduce
  – Ugly to program
  – Hard to avoid performance penalty
  – Some merge joins are very expensive
    • Need to materialize entire cross product to disk
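The join and split cases map naturally onto a vertex with several inputs or outputs of different types. The sketch below rests on stated assumptions. Documents mark references with a hypothetical `[[target]]` syntax. `split_vertex` emits plain text and references as two separately typed outputs, and `join_vertex` combines those two different inputs by counting inbound references per document. None of these names come from Dryad.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical reference syntax: [[target]] inside a document body.
LINK = re.compile(r"\[\[([^\]]+)\]\]")


@dataclass
class Document:
    doc_id: str
    body: str


def split_vertex(docs):
    """One input, two outputs of different types: (id, text) and (src, target)."""
    text_out, refs_out = [], []
    for d in docs:
        text_out.append((d.doc_id, LINK.sub(r"\1", d.body)))  # strip markup
        refs_out.extend((d.doc_id, target) for target in LINK.findall(d.body))
    return text_out, refs_out


def join_vertex(texts, refs):
    """Two inputs of different types, joined on document id.

    Returns (doc_id, text, number of references pointing at doc_id).
    """
    inbound = Counter(target for _src, target in refs)
    return [(doc_id, text, inbound[doc_id]) for doc_id, text in texts]
```

With `docs = [Document("d1", "see [[d2]] and [[d3]]"), Document("d2", "no links")]`, `split_vertex(docs)` produces the texts `[("d1", "see d2 and d3"), ("d2", "no links")]` and the references `[("d1", "d2"), ("d1", "d3")]`, and joining them gives `[("d1", "see d2 and d3", 0), ("d2", "no links", 1)]`. In a Map+Reduce formulation both record types would typically have to share one key/value stream and be told apart by a tag.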
How about Map+Reduce+Join+…?
• “Uniform” stages aren’t really uniform