
The Chinese University of Hong Kong: CMSC5719 Seminar, course teaching resources (lecture notes), Lecture 05 Fault-Tolerant Computing


Seriously, Why Is Fault-Tolerance Coming Back?
Part 1.11, Qiang Xu, CUHK, Fall 2012

- Simply put, it is technology-driven.
- Today's chips are extremely complex (billions of transistors running with less noise margin) and are much hotter!
- We cannot afford heavyweight, macro-scale redundancy for commodity computing systems.

[Figure: process technology roadmap graphic, 2005 to 2011; plot of transistor cost, reliability cost, and total cost over time with technology scaling]

The Impact of Technology Scaling (Part 1.12)

- More leakage
- More process variability
- Smaller critical charges
- Weaker transistors and wires

[Figure: observed failure rate versus time, in three regions: decreasing failure rate (infant mortality failures), constant failure rate (random failures), and increasing failure rate (wear-out failures). Annotations: burn-in test less effective; higher random failure rate; faster wear-out]

What Can We Do When Confronting Enemies? (Part 1.13)

- Surrender, but don't become a traitor
  - Fail, but fail safe, i.e., don't corrupt anything (e.g., an ATM)
  - Not as easy as you may think: you have to detect faults!
- Weaken the enemies
  - Fault-avoidance and fault-removal
    » Process improvement with fewer threats
    » Testing and DfT (design for testability) to remove defective circuits
    » Careful design reviews to remove design bugs
    » More training to reduce operator errors
  - There are always some faults that cannot be avoided or removed completely
- Make yourself stronger
  - Fault-tolerance
    » Adding redundancy to detect, diagnose, confine, mask, compensate for, and recover from faults
    » Mind the cost in terms of hardware, power, and performance
  - Fault-evasion (a.k.a. fault-prediction)
    » Observe, learn, and take pre-emptive steps to stop faults from occurring

A Motivating Case Study (Part 1.14)

Data availability and integrity concerns:
- Distributed DB system with 5 sites (S0 to S4)
- Full connectivity over dedicated links (L0 to L9)
- Only direct communication allowed
- Sites and links may malfunction
- Redundancy improves availability

S: probability of a site being available
L: probability of a link being available

Single-copy availability $= SL$
Unavailability $= 1 - SL = 1 - 0.99 \times 0.95 = 5.95\%$

Data replication methods, and a challenge:
- File duplication: home / mirror sites
- File triplication: home / backup 1 / backup 2
- Are there availability improvement methods with less redundancy?

[Figure: five fully connected sites S0 to S4 joined by links L0 to L9; a user accesses file Fi]
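The single-copy figure above can be checked with a few lines of Python. This is only a sketch of the slide's arithmetic: the function name `single_copy_availability` is made up for illustration, and it assumes that site and link failures are independent, which is what multiplying S by L implies.

```python
def single_copy_availability(s: float, l: float) -> float:
    """A single copy is reachable only if its site AND its link are up.

    Assumes site and link failures are independent, so the two
    probabilities simply multiply.
    """
    return s * l


# Example values used on the slide.
S = 0.99  # probability that a site is available
L = 0.95  # probability that a link is available

availability = single_copy_availability(S, L)
unavailability = 1.0 - availability

print(f"single-copy availability   = {availability:.2%}")
print(f"single-copy unavailability = {unavailability:.2%}")
```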

Data Duplication: Home and Mirror Sites (Part 1.15)

S: site availability, e.g., 99%
L: link availability, e.g., 95%

$A = SL + (1 - SL)\,SL$
The first term is the probability that the primary (home) site can be reached; the second is the probability that the primary site is inaccessible and the mirror site can be reached.

Duplicated availability $= 2SL - (SL)^2$
Unavailability $= 1 - 2SL + (SL)^2 = (1 - SL)^2 = 0.35\%$

- Data unavailability reduced from 5.95% to 0.35%
- Availability improved from about 94% to 99.65%

[Figure: the five-site network S0 to S4 with links L0 to L9; the user reaches Fi home on one site and Fi mirror on another]
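The duplication formula can be verified numerically, and the same reasoning extends to more copies. The sketch below is not from the lecture: `replicated_availability` is a hypothetical helper, and it assumes each copy sits on a different site behind its own direct link, with all failures independent. The three-copy case is an extrapolation under that same assumption.

```python
def replicated_availability(s: float, l: float, copies: int) -> float:
    """Probability that at least one of `copies` replicas is reachable.

    Assumption (not stated in the slides beyond the two-copy case):
    every replica lives on a different site behind its own direct link,
    and all site and link failures are independent.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    one_unreachable = 1.0 - s * l            # a given replica cannot be reached
    return 1.0 - one_unreachable ** copies   # not every replica is unreachable


S, L = 0.99, 0.95  # example values from the slide
sl = S * L

# Slide formula: primary reachable, OR primary inaccessible and mirror reachable.
a_dup_slide = sl + (1 - sl) * sl
a_dup = replicated_availability(S, L, copies=2)

print(f"duplication availability    = {a_dup:.2%}")
print(f"duplication unavailability  = {1 - a_dup:.2%}")

# Extrapolation beyond the slide: home / backup 1 / backup 2.
a_trip = replicated_availability(S, L, copies=3)
print(f"triplication unavailability = {1 - a_trip:.4%}")
```

Writing the result as one minus the probability that every copy is unreachable keeps the two-copy case identical to $2SL - (SL)^2$ while making the effect of each extra copy easy to see.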
