MapReduce Concepts (1)
• MapReduce
– MapReduce is a software framework introduced by Google in 2004 to support distributed computing on large data sets on clusters of computers.
– The framework is inspired by the map and reduce functions commonly used in functional programming, although their purpose in the MapReduce framework is not the same as in their original forms.
– MapReduce libraries have been written in C++, C#, Erlang, Java, OCaml, Perl, Python, PHP, Ruby, F#, R, and other programming languages.
MapReduce Concepts (2)
• MapReduce is a framework for processing huge datasets on certain kinds of distributable problems using a large number of computers (nodes).
• The nodes are collectively referred to as
– a cluster (if all nodes use the same hardware)
– or a grid (if the nodes use different hardware).
• Computational processing can occur on data stored either in a filesystem (unstructured) or within a database (structured).
[Figure: a big-data computing task is split by problem decomposition into several subtasks; the subtask results are then combined by result merging into the final computed result.]
MapReduce Concepts (3)
• "Map" step
– The master node takes the input, partitions it into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem and passes the answer back to its master node.
• "Reduce" step
– The master node then takes the answers to all the sub-problems and combines them in some way to get the output: the answer to the problem it was originally trying to solve.
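The two steps above can be sketched on a single machine. This is only an illustration under stated assumptions: the names `partition`, `worker`, and `master` are hypothetical, a thread pool stands in for the worker nodes, and summing numbers stands in for the real sub-problem. It is not how a production MapReduce system is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Split data into at most n_parts contiguous chunks (the sub-problems)."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def worker(chunk):
    """A worker solves one sub-problem; here it simply sums its chunk."""
    return sum(chunk)


def master(data, n_workers=4):
    """Partition the input, hand chunks to workers, then combine the answers."""
    # Map step: partition the input and distribute the sub-problems.
    chunks = partition(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_answers = list(pool.map(worker, chunks))
    # Reduce step: combine the answers to all sub-problems.
    return sum(partial_answers)


total = master(list(range(1, 101)))
```

The multi-level tree mentioned in the slide would correspond to a worker calling `master` on its own chunk instead of solving it directly; this sketch keeps a single level for brevity.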
MapReduce Principles (1)
• Example
Consider the problem of counting the number of occurrences of each word in a large collection of documents. The user would write code similar to the following pseudo-code:

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
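The pseudo-code can be turned into a small runnable sketch. The version below is an assumption-laden illustration, not the framework's actual API: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, the grouping of intermediate pairs by key (which a real framework performs between the two phases) is simulated with an in-memory dictionary, and words are split on whitespace only.

```python
from collections import defaultdict


def map_fn(key, value):
    """key: document name; value: document contents."""
    for word in value.split():
        # Plays the role of EmitIntermediate(w, "1") in the pseudo-code.
        yield word, "1"


def reduce_fn(key, values):
    """key: a word; values: an iterable of counts encoded as strings."""
    result = 0
    for v in values:
        result += int(v)
    # Plays the role of Emit(AsString(result)) in the pseudo-code.
    return str(result)


def run_mapreduce(documents):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    return {word: reduce_fn(word, counts)
            for word, counts in intermediate.items()}


docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
counts = run_mapreduce(docs)
```

As in the pseudo-code, counts travel as strings and are parsed in the reduce function; a real job would run many `map_fn` and `reduce_fn` calls in parallel on different nodes.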