Step 1: Understand and Negotiate Goals of the Investigation. Every failure investigation needs to establish four items at the outset:
· The priority of the investigation
· The resources available
· Any constraints imposed
· The goal of the investigation

There usually is not a discussion to define these four items, but there should be. These four items set the importance, direction, and expected results of the failure investigation at an early stage. Always discuss priority first, because priority sets the pace and can sometimes determine the resources and constraints. Resources and constraints may simply involve the same three items: money, personnel, and time. Constraints can also include inability to destroy the hardware, lack of other pieces to review, lack of information, and so on.

The goal of the failure investigation is the final and most important item to be understood. The customer may desire a very simple goal, one that does not require determination of the root cause of the problem. Discuss the goal, and review whether determining root cause is beneficial. For example, a customer requested the determination of a minimum property allowable for some hardware that had been received with mechanical properties below the specification limit. The goal of the customer was to provide information for a stress analysis to determine if the hardware could be deemed acceptable on a nonconformance report. However, this problem was really a failure of the heat treatment or the raw material. Because this was an approved supplier that fabricated many other pieces of hardware from a large amount of raw material from inventory stock, the internal customer was persuaded that it would be in the best interest of the company to determine the root cause of the problem. It was considered worthwhile to investigate the heat treatment and raw material as well.

Always understand these four items at the beginning of the failure investigation so that all parties are in agreement.

Step 2: Obtain Clear Understanding of the Failure. What is the problem? This is the first step listed in the previous section, “The Four-Step Problem-Solving Process.” This should also be the first question asked in a failure investigation. What happened? Why is the investigation team here? Engineers so often want the details that they forget to hear the reason their customer wants their help.

Information is crucial at this point. The investigator needs to make himself as much of an expert about the component or system or process as possible. There are tools at one's disposal. Brainstorming is a good one. Get everyone together and brainstorm the theories as to why the failure occurred. This means everyone. The investigator may learn some very important information. For example, the drawing may call out one process specification, but the people on the shop floor may have replaced that specification a few months ago through production documentation. Just reviewing the drawing would have missed this information.

Review all documentation and records available on operational history and fabrication. This process is time consuming and tedious but important. Remember to be rigorous in all aspects of the investigation. “It is not only in the finding that we learn much, but also in the looking. All things, big and little, teach us; perhaps at the end, the little things teach us the most.” This quote is by Dr. Van Helsing during his search for the elusive Count Dracula.

The failure scene is very important. Document and protect it. If the investigator is not able to visit the failure scene, he should request the drawings and/or photographs of the failure scene. Photographs of the failure scene, both before and after the failure, are frequently helpful. If an investigator has the opportunity to go, he should take a field investigation kit (Fig. 1 and Table 1) and document everything. Film is cheap, and now with digital cameras, photos are free. Take hundreds or even thousands of photographs.
Fig. 1 Carrying case with contents of field investigation kit (Table 1) without camera, cell phone, and laptop or hand-held computer device

Table 1 Field investigation kit contents
Item | Reason/comments
Open and questioning mind | Be prepared. Have questions. Be ready for the unexpected.
Good attitude | You need the people you meet more than they need you.
Professional demeanor | If you look like you are organized and know what you are doing, people will consider your presence important, and you might get more help.
Digital camera | Take as many pictures as you can. It used to be that “film is cheap,” but now “J-pegs are free.” Photograph the item, the area, any supporting equipment, maybe people, anything and everything. Be careful about color.
Known color chart, white piece of paper, etc. | Kodak grey or color chart, Ace Hardware paint charts, red button, your tie. Anything you can use later as a standard to judge the color of your pictures.
Ruler (steel and plastic) | 1. Use in photos for scale: steel rulers are grey and good on light backgrounds. White plastic rulers are good for dark backgrounds. 2. Steel rulers will tell you if something is magnetic (make sure your ruler is NOT magnetic). 3. Plastic rulers may work better if the subject is magnetic.
Magnet (flat one and a “wand” with an extension) | 1. Identifies magnetic materials from nonmagnetic materials. 2. Can use the wand to retrieve items. 3. Can use the wand to collect debris that is magnetic. This will immediately separate your debris into magnetic and nonmagnetic debris.
Loupe and magnifying glass | Use to look at samples. Loupe is higher magnification, 10–25×. Lower-power magnifying glass can also be helpful.
Tape measure | Measure long distances
Indelible ink marker (fine) | Mark items, bags, bottles
Flashlight | Look at items in holes, dark areas, etc. Good to have 90° bend attachment to look in holes and pin-light attachment to look in small crevices.
Conductivity tester | Check conductivity or nonconductivity of surface
Mirror | Check around corners and under objects
Surface-finish comparators | Machine, cast, electrical discharge machining surface-finish standards by GAR Electroforming Division
Microleatherman tool and pocket knife | Scissors, screwdriver (flat and Phillips), punch, tweezers, blades, plastic toothpick (not on the plane, though). Various uses include examination tools, surface-finish profilometer, cutting, etc.
Pen and pencil | Just because…
White pieces of lined paper | Color standard, note paper, drawing paper, and collection funnel
Plastic bags, 4 × 4 in. | Sample collection
Swabs | Sample collection; always keep one for control sample
Alloy reference list | Materials, compositions, specifications, data
Hardness conversion charts | Martensitic and austenitic charts
Addresses and phone numbers | Assistance, information, etc.
Technical information | Specification lists, design criteria, drawings, notes, etc. This is where the individual criteria come in.
Cellular phone | Immediate access to important contacts
Eyewitness report forms | Better if these are e-mailed or sent ahead to be filled out as soon as possible
Laptop/palm pilot | Computer provides spreadsheets, word documents, etc.
Other | Anything you think you will need

Equally important are eyewitness statements or incident reports. Unfortunately, when things go wrong, eyewitnesses seem to disappear, or, at best, are reluctant to speak. Sometimes, they are shielded by management or unions. Getting information from them is a learned talent. One suggestion is to send forms ahead of time to be filled out. The number that are returned and the completeness of the forms may provide some indication of the amount of cooperation to be expected. In many cases, the investigator may never actually see the failure scene and has to depend on the photographs and eyewitness statements available.

It is advisable that the eyewitness statements be formatted in a standard form similar to a procedure. Personnel are more familiar with forms, and the document is less intimidating. It is also suggested that the eyewitness statements be electronic and in a form that is searchable. This is convenient. In addition, the information is useful later for the generation of statistics or for searching for similar failures, failures at the same plant, failures involving the same personnel, and so on.
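As a rough illustration of what "electronic and searchable" statements could look like, the sketch below stores each statement as a small record and shows a keyword search and a per-plant count. The record fields, function names, and sample entries are all hypothetical, chosen only for illustration; a real form would follow whatever procedure the company adopts.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class EyewitnessStatement:
    """One filled-out statement form. All field names are hypothetical."""
    witness: str
    plant: str
    date: str          # ISO date string, e.g. "2001-07-14"
    part: str
    description: str   # free-text account from the witness


def search(statements, keyword):
    """Return statements whose free-text description mentions the keyword."""
    needle = keyword.lower()
    return [s for s in statements if needle in s.description.lower()]


def failures_by_plant(statements):
    """Count statements per plant, a simple statistic for spotting repeats."""
    return Counter(s.plant for s in statements)


# Illustrative records only; not taken from any real investigation.
statements = [
    EyewitnessStatement("Operator A", "Plant 1", "2001-07-14", "forging",
                        "Quench tank felt warmer than usual before the load went in."),
    EyewitnessStatement("Inspector B", "Plant 1", "2001-07-15", "forging",
                        "Hardness readings were low on acceptance test."),
    EyewitnessStatement("Operator C", "Plant 2", "2001-09-02", "bracket",
                        "Crack found near weld after quench."),
]

quench_hits = search(statements, "quench")
plant_counts = failures_by_plant(statements)
```

Keeping the free-text account alongside a few fixed fields is one possible design: the fixed fields support the statistics mentioned above, while the free text remains searchable for details nobody thought to ask about.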
There are other related information items that may not be obvious, such as on-the-floor “standard” process deviations, planned or unplanned plant shutdowns, time of year, weather conditions, where the failure happened (geography), and personnel losses. In fact, a good question to include on the eyewitness statement form is, “Are you aware of any recent changes in operations, personnel, equipment, and so on?” This kind of information is not apparent, so ask.

An example is the failure of a large precipitation-hardening martensitic stainless steel forging. The failures were sporadic and were discovered during acceptance testing, because of inadequate mechanical properties. It turned out that the forging company had no trouble cooling the large forgings to below -50 °C (-60 °F) during the quench cycle in the winter months when there was an abundance of snow outside to dump in the quench pit. However, during the hot summer months, the engineers had to calculate how much ice to bring in to dump in the quench tank. It became apparent that their calculations were not correct. This case is a perfect example of when geographic location and time of year are important.

Another very good question that should always be asked is, “Has this happened before?” Many times, a company simply dismisses a failure the first few times it happens and writes it off as a “unique” or “one-time” occurrence. These occurrences are then forgotten until the next one. Unfortunately, because the failure is not documented and is dismissed as unique, no one remembers the first failure, and so, the second failure may also be deemed unique. It is truly a downward spiral.

Be consistent. If the investigator always requests complete failure scene photographs, eyewitness statements, and all the information available, the customers start to collect the information before contacting the investigator. They will have been trained. If the investigator works in a large company, it would be a good idea to create the procedure “What to do when a failure occurs” and then to train company personnel on how to properly document and protect the failure scene and how to properly fill out eyewitness statements.

As a start, ASTM Committee E-30 on Forensic Sciences and ASTM Subcommittee E-30-05 on Forensic Engineering Sciences have created some specifications that can be reviewed and used when relevant. Some of the specifications are:
· E 620. Standard Practice for Reporting Opinions of Technical Experts
· E 678. Standard Practice for Evaluation of Technical Data
· E 860. Standard Practice for Examining and Testing Items That Are or May Become Involved in Litigation
· E 1020. Standard Practice for Reporting Incidents
· E 1188. Standard Practice for Collection and Preservation of Information and Physical Items by a Technical Investigator

Another good approach was adapted from a concept from the Failsafe Network, Inc. It is the concept of the five “Ps” that need to be documented and recorded to freeze the evidence at the failure scene:
· Position: fragments, equipment, parts, people (witnesses, people involved), controls, and photographs
· People: job descriptions, witnesses, accountabilities, information sources, experts, and those personnel that do not know
· Paper: drawings, design changes, manufacturing records, mill certificates, operation records, operating procedures, past failure histories, maintenance records, photographs, inspection records, stress analysis, and past nonconformances
· Process: design process, operational process, approved process changes, unapproved process changes (how it really operated or was made), environment, and weather
· Part: materials specified, materials used, mechanical and physical properties of materials for hardware or machine, fracture faces, distortion of the failed part and other hardware, remnants of failed hardware or machine, microscopy analysis, stress analysis, and metallurgical analysis

Step 3: Objectively and Clearly Identify All Possible Root Causes. The next step in the organization of a failure investigation is to objectively and clearly identify all possible root causes. There are many tools one can use. One common tool is to create a Fault Tree (Fig. 2). Computers make creation and revision of fault trees and other analytical evaluation tools easier. Once again, brainstorming is also a good tool. To discover every root cause, the investigator should ask, “Why, why, why?”
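Because computers make fault trees easy to create and revise, a minimal sketch of the idea may help. It models each answer to “why?” as a node that can itself be questioned again. The class and function names are assumptions made for illustration, and the branch labels are hypothetical placeholders, not the actual contents of Fig. 2; only the top question follows the forging example.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """A fault-tree node: a candidate cause and the answers to 'why?' beneath it."""
    label: str
    children: list = field(default_factory=list)

    def why(self, label):
        """Record one answer to 'Why did this happen?' and return the new node."""
        child = Cause(label)
        self.children.append(child)
        return child


def leaves(node):
    """Return the labels where questioning stopped (candidate root causes)."""
    if not node.children:
        return [node.label]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def outline(node, depth=0):
    """Render the tree as an indented plain-text outline, one node per line."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Top event follows the forging example; every branch is a hypothetical placeholder.
top = Cause("Why do the forgings have penetrant defects?")
forging = top.why("Forging process")
forging.why("Forging temperature too low")
forging.why("Unfavorable metal flow")
heat = top.why("Heat treatment")
heat.why("Quench cracking")
top.why("Gremlins")  # brainstorming: every answer is recorded, none dismissed

candidate_root_causes = leaves(top)
print("\n".join(outline(top)))
```

Because the tree is just nested nodes, a new branch can be added at any point in the investigation by calling `why` on the relevant node, which keeps the list of candidate causes current as evidence comes in.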
Fig. 2 Example of fault tree chart for forgings with dye-penetrant defects. Top event: “Why do the forgings have penetrant defects?”

First ask why the failure happened. This is the top of the fault tree. Write down the first root-cause answer, and then ask why that first root-cause answer happened. Continue asking “Why, why, why?” until each line of questioning is exhausted or a practical stopping point has been reached, but do not stop too soon. Then, go back to the top of the fault tree and start again. Keep doing this until there are no more root-cause answers to the “why” question. Be objective and list all answers; that is the key to brainstorming. If someone says that gremlins did it, write it down. The chance to be subjective comes later. No idea should be dismissed. Dismissal violates the brainstorming concept, impedes the flow of ideas from other participants, and possibly ends the session.

Once the fault tree is completed, there are two other questions to ask: “What was different about this failure?” and “What are we missing?” Ask them over and over throughout the failure investigation.

There are many reasons structured root-cause analysis is important. First is to provide documentation. This is a permanent record of all possible root causes imagined by the team. No ideas are lost or forgotten. All root causes are succinctly listed and organized on one chart.

Second is to create a “living” list, which can be added to at any time. Once the failure investigation is in progress, information is generated that proves or disproves each root cause. The information may also lead to a new root cause not imagined before.

Third, these techniques can help simplify a very complicated failure. They help “divide and conquer” the problem by compartmentalizing many cause-and-effect relationships. The failure is divided into individual root causes, and each one of those is divided further. It is easier to investigate and prove or disprove each root cause than the resultant failure.

Fourth, structured problem solving may prevent the potentially misleading presumption of a “magic bullet” or “silver bullet” theory approach to failure investigation. There is always a predominant or “pet” root cause. Resist the urge to spend all the time, money, and manpower to prove this one root cause. If wrong, the investigator has wasted time and money and perhaps destroyed any evidence that would prove or disprove the other root causes in his haste to prove this one. Not all failure investigations have enough time, manpower, and available test samples to test every root cause. The investigator may have to be selective in his approach. Also, the failure may be a combination of root causes, not just one. Just because the pet root cause is proved does not mean there is not another, more important root cause out there. Remember that the goal of a failure investigation is to determine the root cause(s) of a failure, not to prove one root cause. Sometimes, the failure investigation goes off track and becomes a proof of a singular theory, hence the silver or magic bullet theory.

Step 4: Objectively Evaluate Likelihood of Each Root Cause. The next step of the failure investigation is to objectively evaluate the likelihood of each root cause listed. A good tool for this is the Failure Mode Assessment (FMA) chart (Fig. 3). The FMA chart can be created in a spreadsheet