The Problem to be Solved

The primary safety problem in computer-based systems is the lack of appropriate constraints on design.

The job of the system safety engineer is to identify the design constraints necessary to maintain safety and to ensure the system and software design enforces them.

© Leveson
Safety ≠ Reliability

Accidents in high-tech systems are changing their nature, and we must change our approaches to safety accordingly.

Confusing Safety and Reliability

From an FAA report on ATC software architectures:
  "The FAA's en route automation meets the criteria for consideration as a safety-critical system. Therefore, en route automation systems must possess ultra-high reliability."

From a blue ribbon panel report on the V-22 Osprey problems:
  "Safety [software]: ... Recommendation: Improve reliability, then verify by extensive test/fix/test in challenging environments."
Does Software Fail?

Failure: Nonperformance or inability of system or component to perform its intended function for a specified time under specified environmental conditions.
  A basic abnormal occurrence, e.g.,
    burned out bearing in a pump
    relay not closing properly when voltage applied

Fault: Higher-order events, e.g.,
    relay closes at wrong time due to improper functioning of an upstream component

All failures are faults but not all faults are failures.

Reliability Engineering Approach to Safety

Reliability: The probability an item will perform its required function in the specified manner over a given time period and under specified or assumed conditions.
  (Note: Most software-related accidents result from errors in specified requirements or function and deviations from assumed conditions.)

Concerned primarily with failures and failure rate reduction:
  Parallel redundancy
  Standby sparing
  Safety factors and margins
  Derating
  Screening
  Timed replacements
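As a rough illustration of what failure rate reduction through parallel redundancy buys, the sketch below computes the reliability of a single unit and of two units in parallel. It assumes a constant failure rate (exponential model) and statistically independent failures. The function names and the numbers are hypothetical, chosen only for illustration. In keeping with the note above, the calculation covers failures only and says nothing about errors in the specified requirements.

```python
import math


def reliability_exponential(failure_rate: float, hours: float) -> float:
    """R(t) = exp(-lambda * t), assuming a constant failure rate (assumption)."""
    return math.exp(-failure_rate * hours)


def parallel_reliability(unit_reliability: float, n_units: int) -> float:
    """Probability that at least one of n identical units works.

    Assumes the units fail independently, which is an idealization.
    """
    return 1.0 - (1.0 - unit_reliability) ** n_units


# Hypothetical numbers: 1e-4 failures per hour over a 1000-hour mission.
r_single = reliability_exponential(failure_rate=1e-4, hours=1000.0)
r_dual = parallel_reliability(r_single, n_units=2)

print(f"single unit:     {r_single:.4f}")
print(f"two in parallel: {r_dual:.4f}")
```

Redundancy raises the computed reliability only for independent failures. Two copies of software built to the same flawed requirement behave identically, so the formula does not apply to that case.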
Reliability Engineering Approach to Safety (2)

Assumes accidents are the result of component failure.
  Techniques exist to increase component reliability.
  Failure rates in hardware are quantifiable.

Omits important factors in accidents. May even decrease safety.

Many accidents occur without any component "failure", e.g.,
  Accidents may be caused by equipment operation outside parameters and time limits upon which reliability analyses are based.
  Or may be caused by interactions of components all operating according to specification.

Highly reliable components are not necessarily safe.
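A minimal sketch of an accident-prone interaction between components that each operate according to specification. The two controllers, their specifications, the safety constraint, and the constant MIN_SAFE_FLOW are all hypothetical and invented for this illustration. Neither controller fails, yet the system-level constraint is violated during the first steps.

```python
from typing import List

MIN_SAFE_FLOW = 3  # hypothetical minimum cooling flow while the catalyst is on


def feed_controller(batch_requested: bool) -> bool:
    """Hypothetical spec: the catalyst is on exactly when a batch is requested."""
    return batch_requested


def cooling_controller(current_flow: int, catalyst_on: bool) -> int:
    """Hypothetical spec: raise flow by one unit per step while the catalyst is on."""
    return current_flow + 1 if catalyst_on else current_flow


def hazardous(catalyst_on: bool, flow: int) -> bool:
    """System-level safety constraint: no catalyst without adequate cooling."""
    return catalyst_on and flow < MIN_SAFE_FLOW


def simulate(steps: int) -> List[bool]:
    """Run both controllers together and record whether each step is hazardous."""
    flow = 0
    trace = []
    for _ in range(steps):
        catalyst_on = feed_controller(batch_requested=True)
        flow = cooling_controller(flow, catalyst_on)
        trace.append(hazardous(catalyst_on, flow))
    return trace


hazard_trace = simulate(5)
print(hazard_trace)  # [True, True, False, False, False]
```

Making either controller more reliable would not remove the hazardous steps. The missing piece is a design constraint, for example requiring adequate flow before the catalyst is turned on.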
Software Component Reuse

One of the most common factors in software-related accidents.

Software contains assumptions about its environment. Accidents occur when these assumptions are incorrect.
  Therac-25
  Ariane 5
  U.K. ATC software

The features most likely to change are those embedded in or controlled by the software.

COTS makes safety analysis more difficult.

Safety and reliability are different qualities!

Software-Related Accidents

Are usually caused by flawed requirements:
  Incomplete or wrong assumptions about operation of controlled system or required operation of computer.
  Unhandled controlled-system states and environmental conditions.

Merely trying to get the software "correct" or to make it reliable will not make it safer under these conditions.
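A small sketch of how a reused component can carry an environmental assumption with it. The routine below is correct with respect to its own specification and works for every input the original system could produce, but the same unchanged code raises an error once the new environment produces larger values. The function names, the range limits, and the two environments are hypothetical and are not taken from any of the accidents listed above.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16(value: float) -> int:
    """Convert to a 16-bit signed integer.

    Embedded assumption: the environment never produces a value outside
    the 16-bit range.
    """
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return result


# Hypothetical operating envelopes of the original and the new system.
ORIGINAL_ENV_MAX = 20_000.0
NEW_ENV_MAX = 50_000.0


def runs_without_error(max_input: float) -> bool:
    """Check whether the reused routine handles the largest expected input."""
    try:
        to_int16(max_input)
        return True
    except OverflowError:
        return False


print("original environment ok:", runs_without_error(ORIGINAL_ENV_MAX))
print("new environment ok:     ", runs_without_error(NEW_ENV_MAX))
```

The code did not change between the two runs, only the environment did. Testing the routine more heavily in its original environment would not have exposed the problem.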