CMSC 5719 MSc Seminar: Fault-Tolerant Computing
XU, Qiang (Johnny) 徐強
[Partly adapted from Koren & Krishna, and B. Parhami slides]
Part 1, Qiang Xu, CUHK, Fall 2012
Why Learn This Stuff?
Outline
- Motivation
- Fault classification
- Redundancy
- Metrics for reliability
- Case studies
Fault-Tolerance: Basic Definition
- Fault-tolerant systems: ideally, systems capable of executing their tasks correctly regardless of hardware failures or software errors
- In practice: we can never guarantee flawless execution of tasks under all circumstances
- We therefore limit ourselves to the types of failures and errors that are most likely to occur
Need for Fault-Tolerance
- Critical applications require extreme fault tolerance (e.g., aircraft, nuclear reactors, medical equipment, and financial applications)
  - A malfunction of a computer in such applications can lead to catastrophe
  - Their probability of failure must be extremely low, possibly one in a billion per hour of operation
- Systems operating in harsh environments with a high probability of failure
  - Electromagnetic disturbances, particle hits, and the like
- Complex systems consisting of millions of devices
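To give a feel for the "one in a billion per hour" target and for why millions of devices are a problem, here is a minimal Python sketch. It is an illustration, not part of the original slides. It assumes a constant failure rate (exponential lifetime model) and a system that fails as soon as any one device fails. The function names, the 10-hour mission, and the per-device rate of 1e-9 per hour are hypothetical values chosen for the example.

```python
import math


def failure_probability(rate_per_hour: float, hours: float) -> float:
    """Probability of at least one failure within `hours`.

    Assumes a constant failure rate (exponential model): 1 - exp(-rate * t).
    expm1 keeps precision when rate * t is tiny.
    """
    return -math.expm1(-rate_per_hour * hours)


def series_failure_rate(device_rate_per_hour: float, n_devices: int) -> float:
    """Failure rate of a system that fails when any single device fails.

    Assumes independent devices with identical constant failure rates,
    so the rates simply add up.
    """
    return device_rate_per_hour * n_devices


# Target from the slide: about one failure in a billion per hour of operation.
target_rate = 1e-9

# Hypothetical 10-hour mission at the target rate.
p_flight = failure_probability(target_rate, 10.0)

# Hypothetical system: one million devices, each at 1e-9 failures per hour,
# with no redundancy at all.
system_rate = series_failure_rate(1e-9, 1_000_000)
p_system_day = failure_probability(system_rate, 24.0)

print(f"10-hour mission failure probability: {p_flight:.3e}")
print(f"System failure rate (per hour):      {system_rate:.3e}")
print(f"System failure probability in 24 h:  {p_system_day:.3e}")
```

Under these assumptions, even very reliable individual devices yield a system failure rate a million times higher than the per-device rate, which is the gap that redundancy is meant to close.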