An Improved Failure Mode Effects Analysis for Hospitals
Jan S. Krouwer, PhD, FACB

Objective: To review the Failure Mode Effects Analysis (FMEA) process recommended by the Joint Commission on Accreditation of Healthcare Organizations and to review alternatives. This reliability engineering tool may be unfamiliar to hospital personnel.
Data Sources: Joint Commission on Accreditation of Healthcare Organizations recommendations, MIL-STD-1629A, and other articles about FMEA were used.
Study Selection: The articles were selected by a literature search that included Web site–accessible material.
Data Extraction: All articles found were used.
Data Synthesis: The results are based on the articles cited and the author’s experience in conducting FMEAs in the medical diagnostics industry.
Conclusions: Fault trees and a list of quality system essentials are recommended additions to the FMEA process to help identify failure mode effects and causes. Neglecting mitigations for failure modes that have never occurred is a possible danger when too much emphasis is placed on improving risk priority numbers. A modified Pareto, not based on the risk priority number, is recommended when there are qualitatively different failure mode effects with different severities. Performing a FMEA that both meets accreditation requirements and reduces the risk of medical errors is an attainable goal, but it may require a different focus.
(Arch Pathol Lab Med. 2004;128:663–667)

Excessive preventable error rates in hospitals continue to be a concern and have prompted the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) to require hospitals to perform one Failure Mode Effects Analysis (FMEA) each year.1 FMEA is a tool used in reliability engineering, especially in analyzing product design, although processes are also the subject of FMEAs.2 In this article, FMEA is treated as synonymous with FMECA, in which the extra “C” stands for Criticality. An explanation of the terms used in this article is provided in Table 1.

AN OVERVIEW OF FMEA

There are different ways to conduct a FMEA.1,2 An outline of the JCAHO process is as follows:

A. A flowchart of the process
B. A FMEA for the flowchart steps that includes
   1. A description of the effect of each failure mode event
   2. A description of the cause(s) of each failure mode event using fishbone diagrams
   3. A numeric ranking (1–10) of each failure mode’s
      a. Severity
      b. Probability
      c. Detection capability
C. A multiplication of these 3 numbers to obtain the risk priority number
D. A proposal and implementation of mitigations for the causes of the failure modes with the highest risk priority number

DETAILS OF THE FMEA PROCESS AND RECOMMENDED IMPROVEMENTS

The preparation of a flowchart is important because visualizing the process workflow helps in the enumeration of possible failure modes. Figure 1 shows a flowchart for a part of the process of assaying a patient specimen in a hospital laboratory, namely the step of receiving the specimen.3 Each of the 3 process steps and the 2 output steps represents a potential failure mode and should be a FMEA event. Fault trees and a list of quality system essentials (QSEs) are recommended additions to the FMEA process that will help ensure that both failure mode causes and effects are correct.
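Steps B3 through D of the JCAHO outline can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the JCAHO material: the `FailureMode` record, its field names, the example rows, and their scores are hypothetical, and the detection scale is assumed to run from 1 (almost always detected) to 10 (rarely detected), so that a larger number always means a higher priority.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FailureMode:
    """One FMEA worksheet row (all field names are illustrative)."""
    step: str          # flowchart step the failure mode belongs to
    effect: str        # description of the effect (outline item B1)
    cause: str         # description of the cause (outline item B2)
    severity: int      # 1 (minor) to 10 (most severe)
    probability: int   # 1 (rare) to 10 (frequent)
    detection: int     # assumed scale: 1 (almost always detected) to 10 (rarely detected)

    def __post_init__(self) -> None:
        # Outline item B3 asks for rankings on a 1-10 scale.
        for label in ("severity", "probability", "detection"):
            value = getattr(self, label)
            if not 1 <= value <= 10:
                raise ValueError(f"{label} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        # Outline item C: multiply the 3 rankings.
        return self.severity * self.probability * self.detection


def rank_by_rpn(modes: List[FailureMode]) -> List[FailureMode]:
    """Outline item D: highest risk priority number first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical scores for two steps of a specimen-receiving process.
worksheet = [
    FailureMode("Examine specimen label",
                "Exception procedure adds cost and delay",
                "Inadequate training", severity=3, probability=6, detection=4),
    FailureMode("Examine specimen condition",
                "Large error in reported assay result",
                "Lipemic sample not noticed", severity=8, probability=2, detection=7),
]

ranked = rank_by_rpn(worksheet)
for mode in ranked:
    print(f"{mode.rpn:4d}  {mode.step}: {mode.cause}")
```

The sketch only reproduces the arithmetic of the outline; it says nothing about whether ranking by this product is appropriate for a hospital.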

Accepted for publication February 16, 2004. From Krouwer Consulting, Sherborn, Mass. The author has no relevant financial interest in the products or companies described in this article. Reprints: Jan S. Krouwer, PhD, FACB, Krouwer Consulting, 26 Parks Dr, Sherborn, MA 01770.

Table 1. Definitions of Terms Used When Performing a Failure Mode Effects Analysis
Corrective action: A design change intended to prevent the recurrence of an error. Corrective actions are often used to deal with observed errors, although the term is sometimes used for steps that prevent potential errors.
Fault detection: A method to signal that an error event has occurred. If detection and recovery are successful, the effect of the error event will be prevented.
Fault tree: A hierarchic chart of causes for a top-level event.
FMEA/FMECA (Failure Mode Effects [and Criticality] Analysis): A description of failure mode events, including the action that will mitigate the probability that the effect of the event is observed.
Gate: A logical operator for fault tree events. Gates are typically “OR” or “AND.”
Mitigation: A design change intended to prevent an error or its effect. A mitigation implies preventing errors that have not been observed, but it can also refer to preventing a recurrence of observed errors.
Pareto: A means of ranking (often displayed as a table or chart), with the most important items at the top.
Probability: The likelihood of the occurrence of an event. FMEA usually requires classifying the probability of an event.
Process flowchart: A detailed chart of steps that occur for a process to be carried out. Typically, there are inputs, process steps, and outputs.
Quality system essentials (QSEs): The list of policies and procedures that enable an organization to carry out its function properly.
Recovery: An action that prevents the effect of a failure mode event from being observed (see also “Fault detection”).
Risk priority number (RPN): A number that is the multiplication result of severity and probability (and sometimes detection capability) and that forms the basis for a Pareto ranking.
Severity: The importance of an event, often with respect to some hazard. FMEA usually requires classifying the severity of an event.

Figure 1. Process steps for receiving a specimen.

Fault Trees

A fault tree is a hierarchic chart of causes for a top-level (error) event. An example of a high-level fault tree for a hospital laboratory is shown in Figure 2. Fault trees use logical operators called gates. The most common gates are OR and AND. With an OR gate, the higher-level event occurs if any one of its child events occurs; with an AND gate, the higher-level event occurs only if all of its child events occur. If a failure event has no children, it is called a BASIC event. BASIC events are the causes of a failure mode and are thus the subject of a FMEA. Figure 2 shows 4 main branches for the top-level event of “Lab Error.” They are as follows:

● Perception: Complaints from either hospital or nonhospital staff
● Performance: Traditional quality, including errors that can affect patient safety
● Financial: Errors that threaten the financial health of the service
● Regulatory: Errors that threaten the accreditation status of the service

Figure 2. A template fault tree for a hospital laboratory.

The benefit of this model can be seen by examining the process flowchart in Figure 1. Consider a failure mode for the step in which the specimen label is examined. If required information for the specimen is missing and this goes unnoticed, the effect of the error is that the information must be obtained by some exception procedure that adds cost and a possible delay. This type of failure mode could be expected to fall under either the financial or the perception branch in Figure 2. On the other hand, a failure mode for the step that examines the condition of the specimen might be a hazard to the patient. For example, an undetected lipemic sample could result in a large error in the assay result reported to the clinician. A fault tree helps associate failure mode events with the appropriate effect(s) by having branches in which to place failure mode events.4 An abbreviated fault tree (without showing all causes) for the 2 top process errors from Figure 1 is shown in Figure 3. Note the use of the AND gate in Figure 3. The AND gate shows that a lipemic sample by itself will not cause an error: the assay also has to suffer an interference from lipemia, and the lipemic sample condition check has to have failed. AND gates are common, since many errors occur only when all of the (child) error events occur.

Figure 3. The fault tree for the receive specimen process.

Quality System Essentials

Quality system essentials are the list of policies and their procedures that enable an organization to carry out its function properly.3 An example list of policies in a hospital laboratory is shown below.3 Each policy would have several procedures (data not shown). This list is important because it represents potential failure mode event causes. For example, in the process flowchart in Figure 1, inadequate training (a procedure under the policy “Personnel”) was selected in Figure 3 as a cause of the failure of the process step “Specimen Labeling OK?” Although training as a process step could be included in the flowchart, virtually all QSEs in every process step would need to be included to maintain consistency, which would make process flowcharts too complicated. However, the list of QSEs should be available for review when a FMEA is prepared.

Laboratory QSEs. Quality system essentials for a laboratory are as follows: documents and records, organization, personnel, equipment, purchasing and inventory, process control, information management, occurrence management, internal and external assessment, process improvement, customer service and satisfaction, and facilities and safety.
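The OR and AND gate semantics described for fault trees, together with the use of a QSE (inadequate training) as a BASIC cause, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Figure 3 is not reproduced here, so the tree shape, the top-level event name, and the event labels below are hypothetical reconstructions from the text, and the `Event` class is not part of any fault tree tool.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Event:
    """A fault tree node; gate is "BASIC", "OR", or "AND"."""
    name: str
    gate: str = "BASIC"
    children: List["Event"] = field(default_factory=list)

    def occurs(self, active_basic_events: Set[str]) -> bool:
        """Return True if this event occurs given the BASIC events that occurred."""
        if self.gate == "BASIC":
            return self.name in active_basic_events
        results = [child.occurs(active_basic_events) for child in self.children]
        if self.gate == "OR":
            return any(results)   # any child event is enough
        if self.gate == "AND":
            return all(results)   # every child event is required
        raise ValueError(f"unknown gate: {self.gate}")


# Hypothetical, abbreviated tree for the receive-specimen process.
lipemia_error = Event("Erroneous result from lipemic sample", "AND", [
    Event("Lipemic sample"),
    Event("Assay suffers interference from lipemia"),
    Event("Sample condition check fails"),
])
labeling_error = Event("Specimen labeling check fails", "OR", [
    Event("Inadequate training"),   # a procedure under the QSE policy "Personnel"
])
receive_error = Event("Receive specimen error", "OR", [labeling_error, lipemia_error])

# A lipemic sample by itself does not cause the top-level event.
print(receive_error.occurs({"Lipemic sample"}))
# All three AND children together do.
print(receive_error.occurs({
    "Lipemic sample",
    "Assay suffers interference from lipemia",
    "Sample condition check fails",
}))
```

Because BASIC events are plain names, the list of QSEs can be reviewed against the leaves of such a tree to look for causes that were overlooked.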

ENGINEERING VERSUS HOSPITAL FMEAS: BEWARE THE RISK PRIORITY NUMBER

As an engineering example, consider a large copier as the subject of a FMEA. A flowchart of the copier steps provides possible failure mode events in the FMEA. These events will typically be copier parts that have failed, leading to the effect of a copier function failure. In this scenario, each part that fails has an associated cost that ranges from an inexpensive item that can be replaced by a customer to an expensive part that must be replaced or repaired onsite by a service technician. These costs can be ranked, and they correspond to the severity of each failure mode. The probability or frequency of occurrence of each failure mode can also be estimated. Using numeric severity and probability scales (eg, 1–10) and multiplying them together gives a risk priority number (RPN) that can be used in a Pareto chart to prioritize failure mitigation efforts. In some situations, detection is used as a third variable in the RPN. For example, an expensive part that fails frequently, for which detection is unlikely, would have the highest RPN. This ranking has been suggested for use in hospitals.1 However, it would often be inappropriate, because severity in a hospital FMEA is frequently not limited to one variable, such as the cost of parts, as in the copier example. Thus, if one failure event in a hospital were to lead to a patient death and another failure event were to lead to an added cost, it would be illogical to use an RPN ranking because, regardless of the frequency of the occurrence of events, a patient death event always has a higher priority than a failure that increases cost.
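The patient death versus added cost argument can be made concrete with a small Python sketch. The two events and their scores are hypothetical, and the severity-first ordering shown here is just one possible way to ensure that a more severe event never ranks below a less severe one; it is offered as an illustration, not as a prescribed algorithm.

```python
from typing import List, NamedTuple


class FailureEvent(NamedTuple):
    name: str
    severity: int      # 1-10, higher is more severe (hypothetical scores)
    probability: int   # 1-10, higher is more frequent (hypothetical scores)


def rpn(event: FailureEvent) -> int:
    """Two-factor risk priority number: severity times probability."""
    return event.severity * event.probability


def rpn_order(events: List[FailureEvent]) -> List[FailureEvent]:
    """Conventional Pareto ranking: highest RPN first."""
    return sorted(events, key=rpn, reverse=True)


def severity_first_order(events: List[FailureEvent]) -> List[FailureEvent]:
    """Rank by severity; probability only orders events of equal severity."""
    return sorted(events, key=lambda e: (e.severity, e.probability), reverse=True)


events = [
    FailureEvent("Failure leading to patient death", severity=10, probability=1),
    FailureEvent("Failure leading to added cost", severity=3, probability=8),
]

print([e.name for e in rpn_order(events)])             # added cost ranks first (RPN 24 vs 10)
print([e.name for e in severity_first_order(events)])  # patient death ranks first
```

With these scores the RPN ranking places the cost event ahead of the patient death event, which is exactly the outcome the text describes as illogical.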
An RPN works in the copier example because it simply corresponds to dollars.5 To remedy the situation for a hospital FMEA, a partial Pareto is recommended, whereby mitigations are first provided for the most severe events, in order of their decreasing frequency of occurrence, then for the next most severe events, and so on. It may be helpful to view events in a risk map that presents events in a severity-by-probability grid.6,7 These risk maps must also be viewed with caution, since catastrophic events, while having the highest severity, will also often have the lowest probability. The worst cell in the risk map (highest severities and probabilities) will likely be empty.

A GENERIC FAILURE EVENT SEQUENCE

Figure 4 shows a possible sequence when an error event occurs. After an error event occurs, an attempt can be made to detect and recover from the error. If the detection and recovery are successful, the effect of the error will not be observed, even though the error event has occurred. Often, the detection and recovery steps themselves are process steps, as shown in the flowchart in Figure 1. For example, if the condition of the sample is not acceptable because of lipemia, then the effect of reporting a patient outlier result may be prevented by technician detection (visually), followed by recovery, which would involve notifying the clinician that the sample is unsuitable (and, of course, not using the specimen as is). There is no guarantee that the recovery will be successful. For example, if a laboratory result review detects an outlier after the result has been reported and contacting the clinician is the recovery method, the clinician must be reached in time, or the recovery will have failed.

Figure 4. A simplified relationship between an error event and error effect.

Table 2. Different Probabilities for Figure 4

Item                      Symbol  Overall Effect Rate (Example)  Task Efficiency (Example)
No. of tries              a       . . . (100)                    . . .
Error event occurs        b       b/a (10/100 = 10%)             (a − b)/a (90/100 = 90%)
Error event not detected  c       c/a (5/100 = 5%)               (b − c)/b (5/10 = 50%)
Recovery fails            d       (c + d)/a (7/100 = 7%)         (c − d)/c (3/5 = 60%)

Probability (frequency of occurrence) could refer to several items in Figure 4. As shown in Table 2, the recommended measure of frequency of occurrence is the overall effect rate in the “Recovery fails” row. Table 2 also has implications for risk mitigation strategies. A combination of improving error prevention, error detection, and error recovery may be attempted. The cost of any mitigation is also a factor that is traded off against risk reduction.

FMEA VERSUS FAILURE REVIEW AND CORRECTIVE ACTION SYSTEM/ROOT CAUSE ANALYSIS

As an engineering tool, FMEA is used during product or process design and is intended to inform designers of any changes to the product that will prevent errors from occurring after launch. A hospital laboratory can be viewed as an existing process (ie, already launched) with many observed, preventable errors.8 FRACAS (Failure Review and Corrective Action System) and RCA (Root Cause Analysis) are engineering tools that have specifically been designed to deal with observed errors.9 Both FMEA and FRACAS/RCA are useful and can be applied at the same time. However, if only a FMEA is being performed, it must be realized that preventing errors that have never occurred is an important purpose, if not one of the main purposes, of this tool (eg, in the nuclear power industry, FMEA is used to help prevent a “meltdown” event from ever occurring). A hazard scoring matrix, such as that proposed by the Veterans Administration10 in which the most severe and least probable events are not included in the highest priority, is thus questionable.

ASSESSING THE QUALITY OF A FMEA

It has been suggested that an RPN ranking be prepared before and after risk mitigation with the goal of realizing a certain percentage of improvement in the RPN after risk mitigation.11 This suggestion may lead to a selection of events for mitigation that is based on providing the biggest improvement to the RPN rather than the maximum benefit to patient safety. The problem is that an event with the highest severity and the lowest frequency of occurrence cannot have its RPN improved, yet applying a risk mitigation to it may still be of great benefit. This is particularly true for catastrophic events that have never occurred. For example, organ transplants require the correct blood type, or else the transplant will fail, and patient death is a possible outcome. If the failure mode of “wrong blood type” has never occurred, it might be tempting to assess the process step of checking for blood types as adequate. Yet there still may be room for improvement. Once such catastrophic events occur, there is, of course, a greater emphasis placed on preventing their recurrence.12 The value of a FMEA is that it reduces the risk that such events will ever occur. Moreover, it is likely that many types of catastrophic events will have extremely low probabilities before a FMEA is conducted, since a process error that has resulted in severe outcomes with a high frequency would not be allowed to exist.

A series of mitigations is a likely result of a FMEA. Yet each mitigation represents a design change to the process and itself represents a risk that (1) the mitigation is not sufficiently effective, or (2) the mitigation causes new errors. For example, in the airline industry, a mitigation for preventing communication errors is that pilots must repeat orders from air traffic controllers. Yet there have been airline disasters (including a runway collision in 1977 in which 583 lives were lost) whose causes were attributed to communication failures, indicating that this mitigation step of repeating air traffic control orders was inadequate.13 There is no easy way to measure the quality of a FMEA because FMEA quality is, in essence, an assessment of a model. An analysis of near-miss data may help in model assessment.14 Of course, the actual frequency of events after mitigation can be observed; however, as mentioned, catastrophic events will be extremely unlikely, regardless of the quality of the mitigation.

ACCREDITATION VERSUS QUALITY INITIATIVES

At least 2 types of initiatives can be envisioned within a hospital to reduce error rates: (1) a quality program initiated by the hospital staff, and (2) a quality program required by a regulatory body. Either quality program has the potential to reduce error rates; however, in a difficult financial climate, quality programs that are required by regulatory bodies may capture the limited resources available. A concern for a regulatory body–required quality program is that efforts may be focused on accreditation.5 This may lead to an overemphasis on documentation, as was described for a related case with ISO 9001 certification.15 This contrasts with a hospital staff–initiated quality program, the aim of which might be a more direct quality goal, such as a specific reduction in an error rate. Accreditation is important, and the fault tree in Figure 2 accommodates both of the failure events (ie, loss of accreditation and hazard to the patient), which, in turn, allows mitigations to be devised for both.

References

1. Rich D. Complying with the FMEA requirements of the New Patient Safety Standards. JCAHO PowerPoint presentation 2001. Available at: http://www.fmeainfocentre.com/download/fmeaprequirements.ppt. Accessed December 5, 2003.
2. Department of Defense. Procedures for performing a failure mode effects and criticality analysis. MIL-STD-1629A. 1980. Available at: http://jcs.mil/htdocs/teinfo/software/ms18.html. Accessed December 5, 2003.
3. NCCLS. Application of a Quality System Model for Laboratory Services; Approved Guideline. 2nd ed. GP26-A2. Wayne, Pa: NCCLS; 2003.
4. Krouwer JS. The value of performing fault tree analysis concurrently with FMEA. Available at: http://krouwerconsulting.com/FTFMECA.htm. Accessed December 5, 2003.
5. Harpster RA. How to get more out of your FMEAs. Qual Dig. 1999;19:40–42.
6. Smith PG. Managing risk as product development schedules shrink. Res Technol Manage. 1999;42:25–32.
7. Ozog H. Risk management in medical device design. Medical Device and Diagnostic Industry, October 1997. Available at: http://www.devicelink.com/mddi/archive/97/10/023.html. Accessed December 5, 2003.
8. Nevalainen D, Berte L, Kraft C, Leigh E, Picaso L, Morgan T. Evaluating laboratory performance on quality indicators with the six sigma scale. Arch Pathol Lab Med. 2000;124:516–519.
9. Krouwer JS. Using a learning curve approach to reduce laboratory error. Accred Qual Assur. 2002;7:461–467.
10. Derosier J, Stalhandske E, Bagian JP, Tina N. Using health care failure mode and effect analysis. Joint Commission J Qual Improvement. 2002;27:248–267.
11. Institute for Healthcare Improvement. Failure modes and effects analysis tool. QualityHealthcare.org Web site. Available at: http://qualityhealthcare.org/qhc/workspace/tools/fmea/. Accessed December 5, 2003.
12. Molter J. Background information on Jesica Santillan blood type mismatch, February 17, 2003. Available at: http://dukemednews.duke.edu/mediakits/detail. 1php?id56498. Accessed December 5, 2003.
13. Cushing S. Fatal Words: Communication Clashes and Aircraft Crashes. Chicago, Ill: University of Chicago Press; 1997:9–10.
14. Kaplan HS, Battles JB, Van der Schaaf TW, Shea CE, Mercer SQ. Identification and classification of the causes of events in transfusion medicine. Transfusion. 1998;38:1071–1081.
15. Krouwer JS. ISO 9001 has had no effect on quality in the in-vitro medical diagnostics industry. Accred Qual Assur. 2004;9:39–43.
