Word Count, Illustrated

map(key=url, val=contents):
  For each word w in contents, emit (w, "1")

reduce(key=word, values=uniq_counts):
  Sum all "1"s in values list
  Emit result "(word, sum)"

Input:           map() output:    reduce() output:
see bob throw    see 1            bob 1
see spot run     bob 1            run 1
                 throw 1          see 2
                 see 1            spot 1
                 spot 1           throw 1
                 run 1
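The word-count pseudocode above can be sketched in runnable form. This is a minimal single-process sketch, not the distributed implementation; the function names `map_fn` and `reduce_fn` and the two example documents are illustrative assumptions.

```python
from collections import defaultdict

def map_fn(key, contents):
    # map(key=url, val=contents): for each word w, emit (w, "1")
    for w in contents.split():
        yield (w, "1")

def reduce_fn(key, values):
    # reduce(key=word, values=uniq_counts): sum all "1"s in values list
    return (key, sum(int(v) for v in values))

# Drive the two phases over the example documents from the slide.
docs = {"doc1": "see bob throw", "doc2": "see spot run"}

intermediate = defaultdict(list)          # word -> list of "1"s
for url, contents in docs.items():
    for w, one in map_fn(url, contents):
        intermediate[w].append(one)

result = dict(reduce_fn(w, vals) for w, vals in intermediate.items())
# result == {"see": 2, "bob": 1, "throw": 1, "spot": 1, "run": 1}
```

The in-memory `intermediate` dict stands in for the shuffle step that groups all values for a key before reduce runs.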
Reverse Web-Link 12 • Map – For each URL linking to target, … – Output <target, source> pairs • Reduce – Concatenate list of all source URLs – Outputs: <target, list (source)> pairs
G) Model is Widely Used 1000 800 600 400 200 Mar May Jul Sep Nov Jan Mar May Jul Sep 2003 2004 Exam ple uses: distributed grep distributed sort web link-graph reversal term-vector /host web access log stats inverted index construction statistical machine document clustering machine learning tr anslation
Model is Widely Used

[Chart: number of MapReduce instances over time, Mar 2003 – Sep 2004]

Example uses:
• distributed grep
• distributed sort
• web link-graph reversal
• term-vector / host
• web access log stats
• inverted index construction
• document clustering
• machine learning
• statistical machine translation
• ...
Implementation

Typical cluster:
• 100s/1000s of 2-CPU x86 machines, 2-4 GB of memory
• Limited bisection bandwidth
• Storage is on local IDE disks
• GFS: distributed file system manages data (SOSP'03)
• Job scheduling system: jobs made up of tasks, scheduler assigns tasks to machines

Implementation is a C++ library linked into user programs
Execution

• How is this distributed?
  ➢ Partition input key/value pairs into chunks, run map() tasks in parallel
  ➢ After all map()s are complete, consolidate all emitted values for each unique emitted key
  ➢ Now partition space of output map keys, and run reduce() in parallel
• If map() or reduce() fails, reexecute!
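The distribution steps above can be sketched end to end: chunk the input, run map tasks in parallel, hash-partition the intermediate keys, and run a reduce task per partition. A toy sketch using threads in one process, standing in for the cluster scheduler; the task-function names, the chunking, and `R = 2` partitions are illustrative assumptions, and fault handling (re-execution) is omitted.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_task(chunk):
    # One map() task: word-count map over its chunk of (key, contents) pairs.
    return [(w, 1) for _, contents in chunk for w in contents.split()]

def reduce_task(partition):
    # One reduce() task: sum the values for each key in its partition.
    return {k: sum(vs) for k, vs in partition.items()}

inputs = [("d1", "see bob throw"), ("d2", "see spot run")]
chunks = [inputs[:1], inputs[1:]]   # partition input pairs into chunks
R = 2                               # number of reduce partitions

with ThreadPoolExecutor() as pool:
    # Run map() tasks in parallel, one per chunk.
    map_outputs = list(pool.map(map_task, chunks))

    # Consolidate all emitted values for each unique key,
    # partitioned across the reduce tasks by hash(key) % R.
    partitions = [defaultdict(list) for _ in range(R)]
    for out in map_outputs:
        for k, v in out:
            partitions[hash(k) % R][k].append(v)

    # Run reduce() tasks in parallel, one per partition.
    counts = {}
    for part in pool.map(reduce_task, partitions):
        counts.update(part)

# counts == {"see": 2, "bob": 1, "throw": 1, "spot": 1, "run": 1}
```

In the real system each map and reduce task would run on a different machine, and a failed task would simply be re-executed on another worker.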