KONGSBERG MODELS FOR QUANTIFICATION OF AVAILABILITY OF CONTINUOUS CONTROL SYSTEMS AND RELIABILITY OF SAFETY SYSTEMS

Author: Nicholas Lucas
NORWEGIAN UNIVERSITY OF SCIENCE AND TECHNOLOGY FACULTY OF ENGINEERING SCIENCE AND TECHNOLOGY DEPARTMENT OF PRODUCTION AND QUALITY ENGINEERING

MODELS FOR QUANTIFICATION OF AVAILABILITY OF CONTINUOUS CONTROL SYSTEMS AND RELIABILITY OF SAFETY SYSTEMS

STUD. TECHN. MARIANNE KJØRSTAD SPRING 2005

Kongsberg Maritime AS

Norwegian University of Science and Technology

Faculty of Engineering Science and Technology Department of Production and Quality Engineering

PREFACE

This report documents my master’s thesis in safety and reliability engineering at the Norwegian University of Science and Technology (NTNU). The extent of the master work is estimated at 800 working hours.

The reader is assumed to be familiar with the terminology used in the NTNU course TPK4120 Industrial safety and reliability and/or in Rausand and Høyland (2004), and to have basic knowledge of IEC 61508, the industrial standard for functional safety of electrical/electronic/programmable electronic safety-related systems. This report builds on Kjørstad (2004), which also provides information about IEC 61508.

Special thanks go to my supervisor, Professor Marvin Rausand, for good guidance and helpful advice during the master work. Special thanks also go to Kongsberg Maritime’s representatives, Erik Korssjøen and Jens Sverre Gjærevoll, who have always been willing to answer my questions and provide the information I needed. Thanks also to all the other Kongsberg Maritime employees who have provided information and guidance during my master work. Finally, I thank Kongsberg Maritime AS for providing me with an office and equipment during the master work.

The calculation tool developed during the master work includes confidential information and is only available to the examiner, the supervisor and Kongsberg Maritime.

Marianne Kjørstad
Kongsberg, 10 June 2005


TABLE OF CONTENTS

TABLE OF FIGURES
TABLE OF TABLES
TABLE OF APPENDICES
SUMMARY
ABBREVIATIONS AND DEFINITIONS

1 INTRODUCTION
  1.1 BACKGROUND
  1.2 OBJECTIVES
  1.3 LIMITATIONS
  1.4 METHODOLOGY
  1.5 STATE OF THE ART SURVEY
  1.6 STRUCTURE OF THE REPORT

2 ANALYSIS OF METHODS
  2.1 COMMON CAUSE FAILURES
  2.2 PROBABILITY OF FAILURE ON DEMAND
  2.3 SYSTEMATIC FAILURES
  2.4 SPURIOUS TRIPS
  2.5 SAFE FAILURE FRACTION
  2.6 MAIN DIFFERENCES AND DISCUSSION

3 PRODUCT RANGE
  3.1 SAFETY SYSTEMS
  3.2 DYNAMIC POSITIONING SYSTEMS
  3.3 CONFIGURATION OF MODULES
  3.4 SPECIAL CONSIDERATIONS

4 EXISTING CALCULATION MODELS
  4.1 POTENTIAL IMPROVEMENTS

5 IMPROVED CALCULATION MODELS
  5.1 AIM SAFE SYSTEMS
  5.2 DYNAMIC POSITIONING SYSTEMS

6 CALCULATION TOOL
  6.1 EXISTING CALCULATION TOOL
  6.2 PRODUCT SPECIFICATION
  6.3 DESCRIPTION OF IMPROVED CALCULATION TOOL

7 CONCLUSIONS AND RECOMMENDATIONS FOR FURTHER WORK

8 REFERENCES


TABLE OF FIGURES

Figure 1.1: The availability of an item (reproduced from Rausand and Høyland 2004)
Figure 2.1: Contribution to SIS unavailability
Figure 2.2: Beta-factor model (adapted from Hokstad et al. 2003)
Figure 2.3: Extended beta-factor model used in PDS method (adapted from Hokstad et al. 2003)
Figure 2.4: PFD for a SIS over time (adapted from Rausand and Høyland 2004)
Figure 2.5: Average PFD for a SIS (adapted from Rausand and Høyland 2004)
Figure 2.6: Occurrence of DU failures
Figure 2.7: SIS satisfying SIL2
Figure 3.1: Block diagram of the DP system (adapted from Kongsberg Maritime AS 2003)
Figure 3.2: Input signal for the AIM Safe 1oo2D dual IO system
Figure 3.3: Output signal for the AIM Safe 1oo2D dual IO system
Figure 3.4: Input/output communication including HUB and TERM for a function
Figure 3.5: Configuration for a SDP-22 system
Figure 3.6: DP-31
Figure 5.1: PFD average for a component with DU failure rate 0.7E-05 per hour
Figure 5.2: PFD average for a component with DU failure rate 0.7E-04 per hour
Figure 5.3: Reliability block diagram for an AIM Safe 1oo2D dual IO redundancy system
Figure 5.4: Reliability block diagram for an AIM Safe 1oo2D dual IO redundancy system, when assumptions are applied
Figure 5.5: Reliability block diagram of a DP-31 system
Figure 5.6: State transition diagram for a single component
Figure 6.1: The menu for adding new components and performing reliability/availability calculations
Figure 6.2: The menu for loading old systems and moving functions to the "Project" worksheet
Figure 6.3: Example of an AIM Safe 3 dual IO redundancy system


TABLE OF TABLES

Table 1.1: Voting logics
Table 2.1: Modification factor for beta, according to voting of channels (adapted from Hokstad et al. 2003)
Table 2.2: Reliability measurements for a component with an exponentially distributed lifetime
Table 2.3: PFD of some koon voting logics (reproduced from Rausand and Høyland 2004)
Table 2.4: Failure terminology used in the PDS method (adapted from Hokstad et al. 2003)
Table 2.5: Abbreviations and explanations to the PDS model
Table 2.6: Comments on PFD unknown (adapted from Hokstad and Corneliussen 2003)
Table 2.7: Comments on PFD known (adapted from Hokstad and Corneliussen 2003)
Table 2.8: Failure terminology in IEC 61508-6
Table 2.9: Abbreviations and explanations to the IEC 61508-6 method (adapted from IEC 61508)
Table 2.10: Comments on PFD in IEC 61508-6
Table 2.11: Comments on PFH according to IEC 61508-6
Table 2.12: PFD and STR for systems with identical and independent components (adapted from Tiezema 1998)
Table 3.1: Kongsberg Maritime’s safety solutions (reproduced from Korssjøen 2004)
Table 3.2: Kongsberg Maritime’s DP systems (adapted from Kongsberg Maritime AS 2003)
Table 5.1: Requirements for considerations when performing PFD calculations (IEC 61508-2)
Table 5.2: Input data for an AIM Safe 1oo2D dual IO redundancy system
Table 5.3: PFD for different AIM Safe systems
Table 5.4: Recommended PFD formulas for Kongsberg Maritime’s AIM Safe systems
Table 5.5: System state definitions
Table 5.6: PFH for 1oo2 and 1oo3 systems for independent components
Table 5.7: PFH for 1oo2 and 1oo3 systems for dependent components
Table 5.8: Availability measurements for 1oo2 and 1oo3 systems
Table 5.9: Input data for a 1oo2 DP system
Table 5.10: PFH and MTBF calculation for different DP systems

TABLE OF APPENDICES

Appendix A: Markov models for dual and triple redundant DP systems (4 pages)
Appendix B: User manual for calculation tool (11 pages)
Appendix C: Preliminary report (12 pages)


SUMMARY

Kongsberg Maritime AS is a vendor of safety systems (e.g. emergency shutdown systems) and continuous control systems (e.g. dynamic positioning systems). Kongsberg Maritime’s customers require quantification of the reliability and availability of these systems, and the company lacks satisfactory calculation models for this purpose. The calculation models must be flexible and adapted to the relevant operational conditions of the systems.

Kongsberg Maritime’s customers require that the reliability and availability quantifications are performed in accordance with the industrial standard IEC 61508. IEC 61508 presents a method for quantifying the reliability and availability of safety instrumented systems (SIS). In addition, the company is familiar with another method for quantifying the reliability and availability of SIS: the PDS method developed by Sintef. This method is well known within the Norwegian industrial sector, and it has been adjusted to comply with the requirements given in IEC 61508. Kongsberg Maritime is therefore interested in calculation models based on one of these two methods. The main objective of this master work was to improve Kongsberg Maritime’s existing calculation models and verify the adequacy of the models.

A thorough analysis of the PDS method and the method presented in IEC 61508 has revealed the main differences between them. Both methods hold that three main factors contribute to the probability of failure of the SIS: common cause failures, random hardware failures and systematic failures. The two methods handle these contributions differently; the main differences are identified and the adequacy of the two methods is discussed.

The systems for which Kongsberg Maritime needs reliability and availability calculation models are the safety systems called the AIM Safe systems, and the dynamic positioning (DP) systems, which are defined as control systems. The AIM Safe systems are considered low demand systems, meaning systems where the time between activations is long. A failure in these systems will only cause a dangerous situation if the equipment under control places a demand on the system while it is in a failed state. The DP system is a continuously operating system; hence a failure of the system at any time will cause a dangerous situation.

The existing calculation models for the reliability and availability of Kongsberg Maritime’s systems are based on an older version of the PDS method, adapted to be in accordance with IEC 61508 to a certain degree. For the AIM Safe systems, Kongsberg Maritime is recommended to apply the PDS method to calculate the reliability. The formulas presented in this method are thoroughly derived, and its technique for modeling common cause failures is more realistic than the one presented in IEC 61508. For the DP systems the PDS method does not apply, because it only covers systems operating in low demand mode. IEC 61508 also presents a method for quantifying the frequency of system failures for continuously operating systems. This method does not cover all the features of the DP systems, so new calculation models for the DP systems are derived using the Markov technique.

In order to apply the calculation models effectively in the reliability and availability calculations, Kongsberg Maritime needed an improved version of the existing calculation tool. A new calculation tool based on the recommended calculation models was developed in Microsoft Excel. The main improvements were to validate the reliability and availability calculations, and to improve the user interface so that the tool can be used in development and delivery projects by others than the safety engineers.


ABBREVIATIONS AND DEFINITIONS

AIM: Albatross Integrated Multifunction
CCF: Common Cause Failure
CSU: Critical Safety Unavailability
DC: Diagnostic Coverage
DD failures: Dangerous Detected failures
DI: Digital Input
DO: Digital Output
DP: Dynamic Positioning
DU failures: Dangerous Undetected failures
E/E/PES: Electric/Electronic/Programmable Electronic Systems
ESD: Emergency Shutdown
EUC: Equipment Under Control
Ex: Explosion Protection
F&G: Fire & Gas
HFT: Hardware Fault Tolerance
HW: Hardware
I/O: Input/Output
MTBF: Mean Time Between Failure
MTTF: Mean Time To Failure
MTTR: Mean Time To Repair
OS: Operator Station
PDS: Norwegian abbreviation meaning “reliability of computer-based systems”
PFD: Probability of Failure on Demand
PFH: Probability of a dangerous Failure per Hour
PSD: Process Shutdown
PSF: Probability of Systematic Failures
RCU: Remote Control Unit
RIO: Remote Input/Output Card
SFF: Safe Failure Fraction
SINTEF: Norwegian abbreviation meaning “The Foundation for Scientific and Industrial Research at the Norwegian Institute of Technology (NTH)”
STR: Spurious Trip Rate
SU: System unavailability
SW: Software
TB-xxxx: Termination Board where xxxx is the board type
WD: Watchdog


The definitions given below are reproduced from IEC 61508-4, unless otherwise stated.

Diagnostic test interval
“Interval between on-line tests to detect faults in a safety-related system that has a specified diagnostic coverage”

Proof test interval
“Periodic test performed to detect failures in a safety-related system so that, if necessary, the system can be restored to an “as new” condition or as close as practical to this condition
NOTE: The effectiveness of the proof test will be dependent upon how close to the “as new” condition the system is restored. For the proof test to be fully effective, it will be necessary to detect 100 % of all dangerous failures. Although in practice 100 % is not easily achieved for other than low-complexity E/E/PE safety-related systems, this should be the target. As a minimum, all the safety functions which are executed are checked according to the E/E/PES safety requirements specification. If separate channels are used, these tests are done for each channel separately.”

Dangerous failures
“Failure which has the potential to put the safety-related system in a hazardous or fail-to-function state”

Safe failures
“Failure which does not have the potential to put the safety-related system in a hazardous or fail-to-function state”

PES
“System for control, protection or monitoring based on one or more programmable electronic devices, including all elements of the system such as power supplies, sensors and other input devices, data highways and other communication paths, and actuators and other output devices”

Channel
“Element or group of elements that independently perform(s) a function”

Logic system
“Portion of a system that performs the function logic but excludes the sensors and final elements”

Mode of operation
“Way in which a safety-related system is intended to be used, with respect to the frequency of demand made upon it, which may be either


- low demand mode: where the frequency of demands for operation made on a safety-related system is no greater than one per year and no greater than twice the proof-test frequency;
- high demand or continuous mode: where the frequency of demand for operation made on a safety-related system is greater than one per year or greater than twice the proof-check frequency

NOTE 1: High demand or continuous mode covers those safety-related systems which implement continuous control to maintain functional safety”

Safety instrumented system (SIS)
“Instrumented system used to implement one or more safety instrumented functions. An SIS is composed of any combination of sensor(s), logic solver(s), and final element(s)
NOTE 1: This can include either safety instrumented control functions or safety instrumented protection functions or both.” (IEC 61511-1)

Redundancy
“Use of multiple elements or systems to perform the same function; redundancy can be implemented by identical elements (identical redundancy) or by diverse elements (diverse redundancy)
NOTE 1: Examples are the use of duplicate functional components and the addition of parity bits.
NOTE 2: Redundancy is used primarily to improve reliability or availability” (IEC 61511-1)

Voting logics
A system with redundancy is in this report referred to as a voting logic (koon), where at least k of the n components must function in order for the system to function. The most common voting logics are shown in Table 1.1. In addition to 1oo2, 1oo3 and 2oo3, the 1oo2D is also a common voting logic. 1oo2D includes a special diagnostic feature which, on detection of a failure in one channel, disregards the failed channel in the voting logic. For the other voting logics, detection of a failure does not change the voting logic; it only raises an alarm.
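As an illustration of the koon notation, the sketch below assumes the usual convention that a koon structure functions when at least k of its n components function, and that the components are independent and identical with survival probability p. The function name koon_reliability is introduced here for illustration only. Common cause failures and the diagnostic feature of 1oo2D are deliberately ignored.

```python
from math import comb


def koon_reliability(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent, identical components function.

    p is the probability that a single component functions.
    Illustrative assumption: no common cause failures and no diagnostics.
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # Sum the binomial probabilities of exactly i functioning components, i = k..n.
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Compare the voting logics of Table 1.1 for a component reliability of 0.9.
    for k, n in [(1, 2), (2, 2), (1, 3), (2, 3)]:
        print(f"{k}oo{n}: {koon_reliability(k, n, 0.9):.3f}")
```

With the same components, 1oo3 gives the highest and 2oo2 the lowest probability of functioning, which is the ordering one would expect from the amount of redundancy in each structure.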


Table 1.1: Voting logics (1oo2, 2oo2, 1oo3 and 2oo3)

Reliability versus availability
Reliability is generally defined as:
“The ability of an item to perform a required function, under the given environmental and operational conditions and for a stated period of time” (ISO 8402)
The “item” in this report refers to a safety system. The required function of a safety system is mainly to respond on demand, and the reliability of the safety system is therefore described by the probability of failure on demand.

Control systems, on the other hand, are designed to perform their function at high demand or continuously. It is therefore the availability of the control system that is of interest to customers. Availability is generally defined as (see also Figure 1.1):
“The ability of an item (under combined aspects of its reliability, maintainability and maintenance support) to perform its required function at a stated instant of time or over a stated period of time” (BS4778)

Figure 1.1: The availability of an item (reproduced from Rausand and Høyland 2004). The figure shows availability as the combination of inherent reliability, maintainability and maintenance support.

Throughout this report, the availability of a control system covers only the inherent reliability of the item. The definition of availability thus becomes:
“The ability of the control system to perform its designed function at a stated instant of time or over a stated period of time”
and it does not include the system’s maintainability or maintenance support. The availability of the control system is affected by the probability of a dangerous failure per hour.
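To make the two measures concrete, the following sketch assumes a single component with a constant failure rate (an exponentially distributed lifetime) whose dangerous undetected failures are revealed only by proof tests at interval tau. Under that assumption, the average probability of failure on demand over one test interval is 1 - (1 - exp(-lambda*tau)) / (lambda*tau), which is approximately lambda*tau/2 when lambda*tau is small, and the probability of operating without failure for a period t is exp(-lambda*t). The function names and the one-year test interval are illustrative assumptions; they are not the calculation models derived in this report.

```python
import math


def pfd_avg_exact(lambda_du: float, tau: float) -> float:
    """Average PFD over one proof test interval for a single component.

    lambda_du: dangerous undetected failure rate per hour (assumed constant).
    tau: proof test interval in hours.
    """
    x = lambda_du * tau
    if x <= 0.0:
        raise ValueError("lambda_du and tau must be positive")
    # -expm1(-x) equals 1 - exp(-x) and is accurate for small x.
    return 1.0 - (-math.expm1(-x)) / x


def pfd_avg_approx(lambda_du: float, tau: float) -> float:
    """First-order approximation lambda_du * tau / 2, valid for small lambda_du * tau."""
    return lambda_du * tau / 2.0


def survival_probability(failure_rate: float, t: float) -> float:
    """Probability that a continuously operating component survives t hours."""
    return math.exp(-failure_rate * t)


if __name__ == "__main__":
    rate = 0.7e-5   # per hour, the DU failure rate quoted for Figure 5.1
    tau = 8760.0    # hours; a one-year proof test interval is assumed here
    print(pfd_avg_exact(rate, tau), pfd_avg_approx(rate, tau))
    print(survival_probability(rate, tau))
```

The approximation slightly overestimates the exact value, and the gap grows with the product of failure rate and test interval.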


1 INTRODUCTION

1.1 BACKGROUND

Kongsberg Maritime AS covers the maritime business area of the international technology corporation Kongsberg Gruppen ASA. The main market segments of Kongsberg Maritime are merchant marine, offshore and subsea, yachting and fisheries, marine information technology, simulation and process automation. Kongsberg Maritime’s headquarters is located in Horten, but the company has manufacturing sites in several places around the world. This project concerns Kongsberg Maritime’s manufacturing site in Kongsberg, which produces safety systems (e.g. emergency shutdown systems) and continuous control systems (e.g. dynamic positioning systems) for the maritime sector.

Kongsberg Maritime must respond to customer requirements for quantification of the reliability of its safety systems and the availability of its continuous control systems. The requirements mainly imply documentation of the safety system’s safety integrity level (SIL) in accordance with IEC 61508, and documentation of the mean time between failures (MTBF) for the control system. This documentation requires calculation models that are flexible and adapted to the relevant operational conditions. Well-adapted calculation models reveal how to improve the system effectively in the design phase, and hence increase the reliability and availability of the system and its ability to compete in the market. Today Kongsberg Maritime lacks satisfactory calculation models for this purpose.

The main objective of this master work is to help Kongsberg Maritime develop satisfactory calculation models for quantification of the reliability of its safety systems and the availability of its continuous control systems, and to develop a calculation tool based on these models for use in design and delivery projects. The main calculation methods used within this field today, and therefore also the best known to Kongsberg Maritime’s customers, are the method presented in IEC 61508-6 and the PDS method presented by Sintef. Kongsberg Maritime therefore requests that the calculation models be based on these methods and adapted to its actual systems.
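The SIL documentation mentioned above rests on target failure measures that IEC 61508-1 groups into bands: the average probability of failure on demand for low demand mode, and the probability of a dangerous failure per hour for high demand or continuous mode. The sketch below encodes those bands as commonly cited from the standard; they should be checked against the edition in use. The function name sil_from_measure is hypothetical, and the mapping covers only the quantitative part of a SIL claim, not the architectural or systematic requirements.

```python
from typing import Optional

# Bands as (lower bound inclusive, upper bound exclusive), keyed by SIL.
# Assumption: values as commonly cited from IEC 61508-1.
LOW_DEMAND_PFD_BANDS = {
    4: (1e-5, 1e-4),
    3: (1e-4, 1e-3),
    2: (1e-3, 1e-2),
    1: (1e-2, 1e-1),
}
CONTINUOUS_PFH_BANDS = {
    4: (1e-9, 1e-8),
    3: (1e-8, 1e-7),
    2: (1e-7, 1e-6),
    1: (1e-6, 1e-5),
}


def sil_from_measure(value: float, continuous: bool = False) -> Optional[int]:
    """Return the SIL band containing the failure measure, or None if outside all bands.

    value: average PFD (low demand) or PFH per hour (high demand/continuous).
    continuous: select the PFH bands instead of the PFD bands.
    """
    bands = CONTINUOUS_PFH_BANDS if continuous else LOW_DEMAND_PFD_BANDS
    for sil, (low, high) in bands.items():
        if low <= value < high:
            return sil
    return None


if __name__ == "__main__":
    print(sil_from_measure(5e-3))                   # low demand safety function
    print(sil_from_measure(3e-8, continuous=True))  # continuously operating function
```

Returning None for values outside the tabulated bands is a design choice of this sketch, so that a caller has to decide explicitly how to treat results that are better than SIL 4 or worse than SIL 1.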

1.2 OBJECTIVES

1. Break down and compare the similarities and dissimilarities of the calculation method presented in IEC 61508 and the PDS method presented by Sintef, and the assumptions these methods are based upon. Make a survey of the relevance of these methods and assumptions for Kongsberg Maritime’s systems. Identify special considerations not handled by any of the calculation methods.
   1.1. Identify differences between the IEC 61508 method and the PDS method
   1.2. Make a survey of the relevance of the IEC 61508 method, the PDS method and their assumptions for Kongsberg Maritime’s systems
   1.3. Identify Kongsberg Maritime’s relevant product range for this project
   1.4. Identify special considerations not handled by any of the calculation methods
2. Improve Kongsberg Maritime’s existing calculation models of reliability for safety systems and availability for continuous control systems with respect to the methods presented in IEC 61508, Sintef’s PDS method and other requirements specific to Kongsberg Maritime’s systems, based on the results obtained in objective 1.
   2.1. Identify and review Kongsberg Maritime’s existing calculation models in their existing Excel worksheet
   2.2. Work out a structure and optimize one or more calculation models for Kongsberg Maritime’s relevant product range, based on the results obtained in objective 1
3. Develop a tool for calculation of reliability for Kongsberg Maritime’s safety systems and availability for Kongsberg Maritime’s control systems to be used in development and delivery projects, based on the improved calculation models developed in objective 2.
   3.1. Develop a product specification for a calculation tool based on the calculation models developed in objective 2
   3.2. Develop a tool for calculation of reliability and availability of Kongsberg Maritime’s relevant product range

1.3 LIMITATIONS

• Only a certain range of Kongsberg Maritime’s products is treated in this master work. The specific product range is revealed in chapter 3.

• As described in section 1.1, the calculation methods treated in this master work are the method presented in IEC 61508-6 and Sintef’s PDS method. In addition the Markov method is introduced.

• Kongsberg Maritime is the vendor for the logic subsystems, thus only calculation models for the logic subsystems are derived (IEC 61508 splits the system into sensor subsystem, logic subsystem and field subsystem).

• The main objective of this master work is to improve Kongsberg Maritime’s calculation model and develop a calculation tool. The collection and calculation of failure data for Kongsberg Maritime’s components are not treated in this master work.

• The master work is based on Kjørstad (2004). Information given in Kjørstad (2004) will not be repeated in this report, unless it is needed.

1.4 METHODOLOGY

A preliminary study was first performed with the aim of describing the objectives of the master work, planning the activities needed to reach the objectives, and developing a time schedule for the activities.

The first activity was a literature survey, which provided a good foundation for the master work. When the literature survey had been conducted, the method presented in IEC 61508-6 and the PDS method were analyzed and compared. This process revealed the main differences between the two methods, and also gave knowledge on how the two methods derive their formulas.

In order to apply this knowledge and develop satisfactory calculation models, the relevant range of Kongsberg Maritime’s products had to be identified. Knowledge about the specified systems was gained through the literature found on this area in the literature survey.

In order to improve Kongsberg Maritime’s existing calculation models, the existing calculation tool was examined thoroughly. Based on the results from the analysis of the calculation methods (the IEC method and the PDS method), the potential for improvements was revealed. Improved calculation models were developed, especially designed to fit the product range identified in chapter 3.

Kongsberg Maritime has experience with quantification of the reliability of their safety systems, but not as much experience with the quantification of the availability of their DP systems. In addition, little information except the method presented in IEC 61508-6 was found in the literature survey for the availability quantification of the DP systems. Hence the development of the calculation models for the DP system was a larger challenge than for the safety systems.

The development of the calculation tool has been performed simultaneously with the development of the improved calculation models.

1.5 STATE OF THE ART SURVEY

The literature survey has been conducted by a systematic search in relevant scientific databases (ScienceDirect, Compendex, Web of Science) and on the Internet using the search engine Google. The search has resulted in a great amount of literature, which has been sorted thoroughly in order to select the relevant literature to solve the problem at hand. The search resulted in information on the main areas:

• IEC 61508
• The PDS method
• Common cause failures
• Probability of failure on demand
• Safety analysis techniques
• Kongsberg Maritime’s specified product range


The industrial standard IEC 61508 is a standard for functional safety of electric/electronic/programmable electronic safety-related systems. Kjørstad (2004) has revealed the main requirements for designing and developing safety systems in accordance with the standard. The literature survey conducted in this master work on this area resulted in information about the main advantages and disadvantages of the standard, and also about other calculation methods that have been suggested to replace the methods suggested in IEC 61508-6 (Knegtering and Brombacher 1999). IEC 61508 presents little information about the derivation of the formulas presented in IEC 61508-6. It has also been hard to obtain literature on the derivations of these formulas elsewhere.

The PDS method is a calculation method for safety systems developed by Sintef, and is presented in the PDS method handbook by Hokstad and Corneliussen (2003). The search has resulted in information about the technique used for modeling common cause failures (Hokstad and Corneliussen 2000), and derivations of the formulas given in the handbook. Information about the PDS method has mainly been found in Sintef’s reports, and it has been hard to obtain information about this method elsewhere. A reason for this may be that the method is mostly known within the Norwegian industrial sector.

Identification and analysis of common cause failures and their effects is a highly relevant subject, and has been thoroughly discussed by several authors (e.g. Parry 1991, Summers and Raney 1998, Rausand and Høyland 2004). Different techniques for analyzing common cause failures have previously been applied, both quantitative and qualitative. Summers and Raney (1998) and Parry (1991) discuss the possibility of using qualitative techniques for eliminating the potential sources of common cause failures in order to develop systems that have a low susceptibility to common cause failures.
Summers and Raney (1998) even state that the failure data and the quantitative techniques that exist today will give questionable results. Kvam (1998) and Goble (2003) on the other hand claim that a quantitative technique has its advantages. Information about several quantitative modeling techniques for common cause failures has been found, and it seems that most experts recommend the use of the β-factor model for modeling common cause failures. Some experts claim that this is an insufficient method when used on a system with more than dual redundancy, but both Goble (2003) and Hokstad and Corneliussen (2000) recommend the use of the β-factor model or extended versions of this technique. The literature survey resulted in much information on the area of common cause failures, and clearly it is an important factor to consider when performing reliability analysis of a system.

Another factor of great importance when performing reliability analysis is the quantification of the system’s probability of failure on demand (PFD). Formulas for quantifying the PFD are given both in the PDS method handbook (Hokstad and Corneliussen 2003) and in IEC 61508-6. In addition the PFD has been discussed by several authors: Knegtering and Brombacher (2000), Zhang et al. (2003) and Bukowski and Goble (2002). The PFD is used for quantification of the reliability of systems operating in a low demand mode. For systems working in a high demand mode, or continuous mode, little information except the method presented in IEC 61508-6 has been found.

For quantification of the PFD, several safety analysis techniques have commonly been used. The literature survey resulted in information about different safety analysis techniques (e.g. Rausand and Høyland 2004), and a comparison of the different techniques was found in Rouvroye and Bliek (2002). Several authors claim that the Markov technique is best fitted for analyzing safety instrumented systems (SIS), and suggestions on how to use the Markov technique have been found in e.g. Rausand and Høyland (2004), Knegtering and Brombacher (2000) and Zhang et al. (2003).

The literature survey has also resulted in information about Kongsberg Maritime’s specified product range. In order to improve Kongsberg Maritime’s calculation models and adapt them to the relevant operational and environmental conditions for their products, information about the specified product range was needed. Most of the information about Kongsberg Maritime’s products has been obtained from employees of Kongsberg Maritime and from the product description documents for the specified products (Kongsberg Maritime AS 2003, and Kongsberg Maritime AS 2004).


1.6 STRUCTURE OF THE REPORT

Definitions and abbreviations needed to understand the contents of the report

Ch. 1: Introduction to the problem at hand and the objectives of the master work

Ch. 2: Analyzing process: comparing the method presented in IEC 61508-6 and the PDS method

Ch. 3: Identification and information about the specified product range. Resulted in a survey of special considerations that must be taken for the products

Ch. 4: Identification and discussion about Kongsberg Maritime's existing calculation models, resulted in a survey of potential improvements

Ch. 5: Recommendations for improved calculation models, based on the result gained in the analyzing process and special considerations revealed in chapter 3

Ch. 6: Product specification and a description of the new calculation tool


2 ANALYSIS OF METHODS

Systems for protecting and controlling processes in the industry today must provide high reliability and availability in order to meet customer requirements as well as requirements from the authorities. Programmable electronic systems (PES) are most commonly used as safety and control systems due to the significant increase in availability compared to conventional relay systems. These types of systems are often referred to as safety instrumented systems (SIS), and this terminology will be used throughout this report to describe such systems. The main contributors to the increase in availability for the SIS are the quick repair and replacement possibilities, the digital communication capability and the ability for diagnostic testing of electronic components (Stavrianidis 1992).

An analysis of the SIS’s availability and reliability is often needed in order to document that the requirements from customers and authorities are met. More details about these requirements are given in Kjørstad (2004). Three different techniques are often used when performing analysis of the SIS (Summers and Raney 1999):

1. Industrial standards
2. Engineering guidelines and standards
3. Qualitative assessment

In addition to qualitative analysis, it is common to perform a quantitative analysis of the SIS. Quantitative analysis makes it possible to compare different system configurations, and to reveal the change in reliability and availability when considering different solutions. A quantitative analysis will also help the analyst to gain knowledge and understanding of the contributing factors during the analysis. Hence the list above should include:

4. Quantitative assessment

Industrial standards, e.g. ANSI/ISA 84.01-1996 and IEC 61508, provide requirements for design of systems for protecting processes in the industry. The main advantage of using industrial standards in the analysis is that the standards are often well-known and widely accepted. A disadvantage is that industrial standards are often too general to provide satisfactory guidance for the specific system.

Engineering guidelines and standards are developed within the company to provide information on how to design systems with an acceptable reliability. Because the guidelines are developed within the company, they are specially designed to fit the system and will therefore give complete guidance. A drawback of internal guidelines and standards is the difficulty of reaching agreement within the company on how to develop the guidelines and how to define the acceptable design of the system.

Qualitative assessment is performed by experts in the actual field. Examples of techniques used are checklists, what-if analysis, HAZOP and FMEA. Each of the techniques has both advantages and disadvantages, and the choice of technique depends on the specific system.


In order to meet the requirements from customers and authorities, the SIS must often be analyzed using an industrial standard. IEC 61508 is currently the leading industrial standard for functional safety of SIS. Part 6 of this standard provides informative suggestions on methods for quantifying the reliability of SIS. In addition to the models presented in IEC 61508-6, another calculation model called the PDS method is widely accepted within the Norwegian industrial sector. The PDS method was developed by Sintef, and was used for quantitative reliability analysis before IEC 61508 was introduced. The latest version of the handbook has adopted the IEC terminology, which makes it possible to meet the requirements in the standard using the PDS method. It is therefore in Kongsberg Maritime’s interest that their calculation models are based on one of these methods, and thus meet the requirements given in IEC 61508.

The SIS offers several major advantages compared to conventional relays, as described above. Even so, the SIS also has some disadvantages because electronic components have several different failure modes, which may cause the system not to perform its designed function. Both the PDS method and IEC 61508 present methods for performing quantitative analysis of SIS reliability using the reliability block diagram technique. Other quantitative techniques often used to analyze SIS are fault tree analysis and Markov analysis.

The reliability block diagram technique is well-known within the industry today, and represents the system structure in a logical manner. A reliability block diagram presents a success oriented system structure, where each block represents a function that must be performed in order for the system to perform its main function. A fault tree analysis is on the other hand a failure oriented technique, where the analyst starts by defining the system failure (top event) and then reveals the possible events that may cause the system failure. When performing quantitative analysis using the reliability block diagram, it may be difficult to reveal all possible causes of system failure because the technique is success oriented. The fault tree analysis will more easily reveal the possible causes of system failures. However, both the reliability block diagram and the fault tree analysis represent rather static systems.

The Markov analysis describes dynamic systems through system states: the functioning state, the non-functioning state, and states where the system is operating in a degraded mode. A comparison of the quantitative techniques described above has been performed by Rouvroye and Bliek (2002), who conclude that the technique with the greatest modeling power is the Markov analysis. IEC 61508 also recommends the use of the Markov technique for performing quantitative reliability analysis, but the standard only gives calculation models based on the reliability block diagram technique (IEC 61508-6). The main advantage of the Markov technique is its ability to describe dynamic systems, hence it may represent the special features included in a SIS. However, the Markov model has a tendency to grow large when the systems are complex, resulting in calculation models that are cumbersome and difficult to follow. Solutions to this problem have been suggested by Knegtering and Brombacher (2000) and Rausand and Høyland (2004).
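To make the idea of functioning, degraded and non-functioning states concrete, the following is a minimal sketch of a three-state Markov model for a redundant pair of channels. It is not taken from IEC 61508-6 or the PDS handbook: the function name, the assumption of constant failure and repair rates, the single repair crew and the numerical rates are all hypothetical and chosen only for illustration.

```python
def steady_state_1oo2(failure_rate, repair_rate):
    """Steady-state probabilities of a 3-state birth-death Markov model.

    States: 0 = both channels functioning, 1 = one channel failed
    (degraded mode), 2 = both channels failed (non-functioning).
    Assumes constant rates and a single repair crew (illustrative only).
    """
    # Balance equations of the birth-death chain:
    #   pi0 * 2*lambda = pi1 * mu   and   pi1 * lambda = pi2 * mu
    ratio_1 = 2.0 * failure_rate / repair_rate
    ratio_2 = ratio_1 * failure_rate / repair_rate
    pi0 = 1.0 / (1.0 + ratio_1 + ratio_2)
    return pi0, pi0 * ratio_1, pi0 * ratio_2


# Hypothetical rates per hour, chosen only for illustration.
p_ok, p_degraded, p_failed = steady_state_1oo2(failure_rate=1e-4, repair_rate=0.125)
print(f"functioning: {p_ok:.6f}, degraded: {p_degraded:.6f}, failed: {p_failed:.2e}")
```

Even this small model carries a degraded state that a reliability block diagram cannot express directly. The number of states grows quickly with the number of components and failure modes, which is the drawback noted above.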


The contribution to SIS unavailability is often split into three types of failures:

• Random hardware failures
• Systematic failures
• Common cause failures

The graph shown in Figure 2.1 represents the unavailability of a SIS over time, and how the three contributing factors listed above influence the unavailability.

Figure 2.1: Contribution to SIS unavailability

The unavailability increases within a proof test interval, starting just above 0 % and ending at 100 %. The increase in unavailability is caused by the probability of random hardware failures and common cause failures. After each proof test interval the system is repaired (if needed) and the unavailability is reduced to approximately 0 %, depending on the coverage of the proof test. Due to the probability of systematic failures, the SIS will not achieve 0 % unavailability just after each test interval. The probability of systematic failures is a constant value, independent of the test interval.

Both the IEC 61508 method and the PDS method suggest solutions for analyzing the contributions to SIS unavailability for safety systems. IEC 61508 quantifies the probability of random hardware failures and common cause failures, and suggests a qualitative approach for minimizing systematic failures, for both low demand and high demand/continuous mode systems (see Abbreviations and definitions). The safety systems are ranked based on safety integrity level (SIL), hardware fault tolerance (HFT) and safe failure fraction (SFF). Thorough information about SIL, HFT and SFF is given in Kjørstad (2004).

The PDS method also suggests a method to analyze the SIS unavailability, however only for systems working in low demand mode. The PDS method presents quantitative methods for determining the probability of random hardware failures, common cause failures and also systematic failures. In addition the PDS method includes the spurious trip rate (STR) as a contributing factor to the unavailability of the equipment under control (EUC).

In the following sections the two methods (the IEC 61508-6 method and the PDS method) are compared and discussed with regard to the contributing factors to SIS unavailability.
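The sawtooth behaviour described for Figure 2.1 can be sketched numerically. The sketch below is an illustration under assumptions that are not stated in the report: an exponentially distributed time to dangerous undetected failure, perfect proof tests at fixed intervals, and a constant additive contribution from systematic failures. The function name and all numerical values are hypothetical.

```python
import math


def unavailability(t, lambda_du, proof_test_interval, p_systematic=0.0):
    """Instantaneous unavailability of a single channel at time t (hours).

    Assumptions (illustrative, not from the report): exponential time to
    dangerous undetected failure, perfect proof tests at fixed intervals,
    and a constant additive systematic contribution p_systematic.
    """
    time_since_test = t % proof_test_interval
    random_part = 1.0 - math.exp(-lambda_du * time_since_test)
    return min(1.0, p_systematic + random_part)


# Hypothetical values: lambda_DU = 1e-5 per hour, yearly proof test.
tau = 8760.0
for t in (0.0, 0.5 * tau, 0.999 * tau, tau):
    print(f"t = {t:8.1f} h  unavailability = {unavailability(t, 1e-5, tau, 1e-4):.5f}")
```

The value never drops below the systematic contribution, and it returns to that floor at the start of each new proof test interval.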

2.1 COMMON CAUSE FAILURES

Common cause failures (CCF) are defined as failures in a redundant system where some or all components in the system fail due to a common cause. CCF may be split into two categories:

1. CCF where the components fail simultaneously, commonly caused by external shocks
2. CCF where the components fail within a larger interval of time; typical examples are CCF due to humidity and vibration

The two categories of CCF make failure reporting difficult. Category two in particular is hard to identify as a CCF, and may be mistaken for a random hardware failure. This may result in inadequate failure rates for CCF. The diagnostic feature of the SIS presents a great advantage with respect to CCF of category two. The probability of system failure due to CCF of category two will be much lower for a system with the ability of diagnostic testing than for a system without this ability. A SIS is able to avoid system failure due to CCF of category two if the failure is detected and repaired before the other component(s) fail. Hence, for a SIS with good diagnostic coverage it is CCF of category one that will give the largest contribution to the probability of system failure. Typical root causes for CCF of SIS are (Stavrianidis 1992):

• Shared environmental dust
• Humidity
• Fire
• Vibration
• Electromagnetic interference (EMI)

In order to increase the reliability of a SIS it is common to introduce redundancy within the system. Systems with redundancy are in this report referred to as voting logics, where the most common voting logics are 1oo2, 1oo2D, 1oo3, 2oo2 and 2oo3 (see Abbreviations and definitions). It is generally accepted today that CCF has a great impact on the reliability of a redundant system. It has been shown that introducing a higher degree of redundancy within the system usually results in a higher probability of system failure due to CCF. A comparison of the reliability and availability of the most common voting logics, performed by Tiezema (1998), concludes that systems with voting logic 1oo2D hold the highest reliability (with respect to random hardware failures and CCF), and at the same time the lowest STR. The second best system configuration is the 2oo3 voting logic, which also holds a low STR but is slightly less reliable. In order to decrease the system’s susceptibility to CCF, Tiezema (1998) suggests that these factors be considered when designing and installing systems:

• Common environment
• Diversity
• Robustness
• Software

Redundant components operating in different environments, e.g. placed in different rooms, will reduce the chance of CCF due to e.g. fire. Components from different vendors may reduce the chance of CCF due to e.g. different diagnostic tests, and introducing higher robustness to each component will result in a lower probability of failure due to environmental conditions.

Several techniques are used to document the system’s susceptibility to CCF, both quantitative and qualitative. Summers and Raney (1998) claim that the available failure data and quantitative methods are at present too poor to give a realistic mathematical model of the system. They recommend performing qualitative analyses of the system (e.g. checklists) with respect to design, installation, operation and maintenance, in order to eliminate the root causes. In addition to qualitative analysis, a quantitative analysis makes it possible to compare different types of voting logics and thus determine the optimal system configuration. A quantitative analysis will also give the analyst a better understanding of CCF, and of how to design systems with lower susceptibility to CCF.

A quantitative analysis may be performed using either shock models or non-shock models. Shock models treat CCF as independent shocks occurring at a certain frequency with a certain distribution, causing all or some components to fail. Non-shock models estimate the probability of CCF without treating the appearance of shocks as a distribution. The different methods that have been used in the past for modeling of CCF are (Stavrianidis 1992):

• Shock models
  o Random probability shock model
  o Multinomial failure rate model
  o Binomial failure rate model

• Non-shock models
  o The square root bounding method
  o Multiple Greek letter model
  o The β-factor model
  o Basic parameter model
  o Multiple dependent failure function model
  o Alpha factor model


The most common quantitative method for modeling CCF is the β-factor model, or extended versions of this model. The main disadvantage of the general β-factor model is its inability to treat CCF of category two (defined at the start of this section), meaning that it assumes that CCF are independent of the degree of redundancy in the voting logic. Thus it is most suitable for 1oo2 systems, and will give conservative numbers for systems with a higher degree of redundancy. Figure 2.2 illustrates the β-factor model for a 1oo3 system with components A, B and C. Given that a failure occurs, β represents the probability that it is a CCF, and 1 − β represents the probability that it is not caused by a common cause. It is clear from the figure that the model assumes a system failure whenever a CCF occurs in the system, and disregards the probability that only two components fail due to CCF. In reality a CCF may cause just two of the components to fail, in which case the system will function as a single system until the failed components are repaired and set back to a functioning state.

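[Figure 2.2 placeholder: Venn diagram of components A, B and C. Each component has an individual failure fraction of 1 − β, the region shared by all three components is β, and the regions shared by exactly two components are 0.]

Figure 2.2: Beta-factor model (adapted from Hokstad et al. 2003)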
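The split illustrated in Figure 2.2 can be written as a few lines of code. This is a minimal sketch: the function name and the numerical values are hypothetical, and the loop only demonstrates the property criticized above, namely that the plain β-factor model gives the same common cause rate regardless of the number of redundant channels.

```python
def beta_factor_split(component_failure_rate, beta):
    """Split a component failure rate into independent and common cause parts."""
    independent_rate = (1.0 - beta) * component_failure_rate
    common_cause_rate = beta * component_failure_rate
    return independent_rate, common_cause_rate


# Hypothetical values: lambda = 2e-6 per hour, beta = 5 %.
for channels in (2, 3, 4):
    independent, common = beta_factor_split(2e-6, 0.05)
    # In the plain beta-factor model a CCF fails every channel, so the
    # system-level CCF rate does not depend on the number of channels.
    print(f"1oo{channels}: independent = {independent:.2e}, CCF = {common:.1e} per hour")
```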

Stavrianidis (1992) criticizes the original β-factor model and suggests the use of extended versions of the β-factor model: the basic parameter model, the multiple Greek letter model or the alpha model. These three techniques include the probability that not all components fail due to CCF. According to Goble (2003) the β-factor model is sufficient if it is applied in a conservative manner. Both IEC 61508-6 and the PDS method also make use of the β-factor model. IEC 61508-6 annex D has improved the β-factor model by suggesting a method to determine a plant specific β-factor, based on points scored from a table (XY-table) containing questions considering:

• Separation/segregation
• Diversity/redundancy
• Complexity/design/application/maturity/experience
• Assessment/analysis and feedback of data
• Procedures/human interface
• Competence/training/safety culture
• Environmental control
• Environmental testing

The total probability of a dangerous failure due to CCF is then, according to IEC 61508-6, equal to

\[ \lambda_{DU}\,\beta + \lambda_{DD}\,\beta_D \tag{1.1} \]
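As a worked illustration of equation (1.1), the sketch below evaluates the expression for one set of numbers. The failure rate, diagnostic coverage and β values are hypothetical, and splitting the dangerous failure rate by the diagnostic coverage (λDD = DC · λD, λDU = (1 − DC) · λD) is the usual convention, assumed here rather than quoted from the text.

```python
def dangerous_ccf_rate(lambda_du, lambda_dd, beta, beta_d):
    """Evaluate equation (1.1): lambda_DU * beta + lambda_DD * beta_D."""
    return lambda_du * beta + lambda_dd * beta_d


# Hypothetical inputs: dangerous failure rate 1e-6 per hour, DC = 90 %.
lambda_d = 1e-6
diagnostic_coverage = 0.9
lambda_dd = diagnostic_coverage * lambda_d          # detected by diagnostics
lambda_du = (1.0 - diagnostic_coverage) * lambda_d  # left for the proof test

rate = dangerous_ccf_rate(lambda_du, lambda_dd, beta=0.02, beta_d=0.01)
print(f"dangerous CCF rate = {rate:.2e} per hour")
```

With a high diagnostic coverage most of the dangerous failures are detected, so the βD term can dominate even though βD is smaller than β.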


The standard distinguishes between the β-factor for dangerous undetected (DU) failures and the β-factor for dangerous detected (DD) failures. The β-factor for DD failures (βD) depends, in addition to the points scored in the XY-table, on the coverage of the diagnostic tests (DC) and the interval between diagnostic tests. The βD is generally assumed to be 50 % of the β for DU failures. It is logical that the βD factor is assumed lower than the β-factor due to the advantage of the diagnostic features of the SIS. The probability of a system failure due to CCF of category two is normally much lower than the probability of system failure due to CCF of category one (as defined at the start of this section). Example 2.1 and Example 2.2 illustrate the derivation of the β-factors (β and βD) according to IEC 61508-6, for a perfect logic subsystem and a non-perfect logic subsystem.

Example 2.1: Perfect logic subsystem

Perfect logic subsystem (scores on every point in the XY-table) with DC larger than or equal to 99 % and a diagnostic test interval¹ less than 1 minute.

Points scored from XY-table and Z table: X = 62.5, Y = 50, Z = 2

Results:
S = X + Y = 112.5 ⇒ β = 1 %
S_D = X(Z + 1) + Y = 237.5 ⇒ βD = 0.5 %

Example 2.2: Non-perfect logic subsystem

Non-perfect logic subsystem (no scores in the XY-table) with DC larger than or equal to 60 % and diagnostic test interval greater than 5 minutes.

Points scored from XY-table and Z table: X = 23, Y = 17, Z = 0

Results:
S = X + Y = 40 ⇒ β = 5 %
S_D = X(Z + 1) + Y = 40 ⇒ βD = 5 %

This method is a compromise between the experts’ opinions, using both checklists (as recommended by Summers and Raney 1998) and a quantitative method for describing CCF. Special features within the system will in reality decrease the dependency, but with other quantitative models the effect of these features is not shown. Hence a plant specific β-factor is achieved. The disadvantage of the method is that it results in little deviation between identical systems with different degrees of redundancy. A comparison of different voting logics for a system will therefore be hard to perform. Another issue for consideration is the correctness of the XY and Z tables. Will the points scored from each question be in proportion to the effect the contributing factors have on the system’s susceptibility to CCF? Clearly the correctness of the β-factor is fully dependent on the correctness of the XY and Z tables.

¹ See Abbreviations and definitions for the difference between diagnostic test interval and proof test interval.

The PDS method has developed an extended version of the β-factor model that claims to improve the one presented in IEC 61508-6. Previous versions of the PDS method (Hansen and Vatn 1998) modeled CCF using a p-factor model, where the dependency factor (p-factor) was determined taking into account the number of components and the degree of dependency between the channels² (low, medium, high or complete). In this way the β-factor model was extended to include the possibility that not all channels fail due to CCF, and hence to include CCF of category two, where the diagnostic feature of the SIS makes it possible to repair a detected failure before the remaining components fail. Subsequent work performed by Sintef (Hokstad and Corneliussen 2000) within the field of CCF, after the introduction of IEC 61508 in 1999, recommends including the IEC 61508-6 plant specific approach. This will give a more realistic dependency evaluation of the system. The PDS method (2003) has adopted a modification factor, which replaces the p-factor model. Table 2.1 shows the modification factors for different voting systems, and Figure 2.3 illustrates the extended β-factor model for a 1oo3 system.

Table 2.1: Modification factor for beta, according to voting of channels (adapted from Hokstad et al. 2003)

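Voting    1oo2    1oo3    2oo3    1oo4    2oo4    3oo4
C_MooN    1.0     0.3     2.4     0.15    0.8     4.0

[Figure 2.3 placeholder: Venn diagram of components A, B and C. Each component is labelled 1 − β, each of the three regions shared by two components is labelled 0.7 · β, and the region shared by all three components is labelled 0.3 · β.]

Figure 2.3: Extended beta-factor model used in PDS method (adapted from Hokstad et al. 2003)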
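The scoring in Examples 2.1 and 2.2 and the modification factors in Table 2.1 can be combined in a short sketch. Two parts of it are assumptions rather than content of this report: the score thresholds in `beta_from_score` are recalled from the logic subsystem column of the table in IEC 61508-6 annex D and should be checked against the standard, and the product C_MooN · β is the way the PDS modification factor is understood to be applied. All function names are hypothetical.

```python
def beta_from_score(score):
    """Map a score (S or S_D) to a beta-factor for a logic subsystem.

    Thresholds are an assumption recalled from IEC 61508-6 annex D;
    verify them against the standard before relying on the result.
    """
    if score >= 120:
        return 0.005
    if score >= 70:
        return 0.01
    if score >= 45:
        return 0.02
    return 0.05


def plant_specific_betas(x, y, z):
    """Return (beta, beta_D) from S = X + Y and S_D = X(Z + 1) + Y."""
    s = x + y
    s_d = x * (z + 1) + y
    return beta_from_score(s), beta_from_score(s_d)


# Modification factors copied from Table 2.1.
C_MOON = {"1oo2": 1.0, "1oo3": 0.3, "2oo3": 2.4,
          "1oo4": 0.15, "2oo4": 0.8, "3oo4": 4.0}


def pds_beta(voting, beta):
    """Voting-dependent beta, assumed here to be C_MooN * beta."""
    return C_MOON[voting] * beta


print(plant_specific_betas(62.5, 50, 2))  # inputs of Example 2.1
print(plant_specific_betas(23, 17, 0))    # inputs of Example 2.2
print(pds_beta("2oo3", 0.02))
```

Unlike the plain β-factor, the modified value differs between voting logics with the same β, which is what makes a comparison of configurations possible.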

2.2 PROBABILITY OF FAILURE ON DEMAND

The reliability of a SIS operating in low demand mode³ is often presented as the average probability of failure on demand (PFD). The PFD includes the contribution factors:

• Random hardware failures
• Common cause failures

² Channel is defined in Abbreviations and definitions.
³ Systems operating in low demand mode are defined in Abbreviations and definitions.

Random hardware failures are defined as “Failure, occurring at a random time, which results from one or more of the possible degradation mechanisms in the hardware” (IEC 61508-4). Random hardware failures are commonly more predictable than other failures, such as CCF and systematic failures, because the lifetime distribution and failure rate of a component will be revealed when it is tested in its intended operational and environmental conditions. In addition, good historical failure data are often available, because random hardware failures are commonly easy to reveal.

A system failure of a SIS will only cause a dangerous situation if the system is in a failed state when a demand for the system arises, e.g. if an emergency shutdown system (ESD) is in a failed state when the EUC places a demand on the ESD. The SIS may be found in a failed state when a demand arises for mainly three reasons:

1. The SIS is in a failed state due to a dangerous undetected (DU) failure
2. The SIS is being repaired due to a system failure that has been detected by the diagnostic tests within the system
3. The SIS is being repaired due to a system failure that has been revealed by a proof test

The PFD for the SIS due to reason number one is strongly dependent on the coverage of the diagnostic test, often called the diagnostic coverage (DC). If the SIS has a high DC, the number of DU failures is very small and hence the PFD due to this reason is very small. Another factor that influences the PFD for reason number one is the proof test interval and the coverage of the proof test. If the proof test interval is short and the coverage of the proof test is high, the PFD will be even smaller. If the proof test interval is longer and the coverage of the proof test lower, the PFD will be higher because the DU failures will be present over a longer time period.

High DC, high coverage of the proof test and a short proof test interval will therefore result in a low PFD due to reason number one. However, a short proof test interval (i.e. frequent proof testing) represents a great cost, and the gain in reliability will often not be worth the extra cost. The PFD of the SIS due to reasons number two and three is strongly dependent on the mean time to repair (MTTR) of the failed components, and on the time period from detection until a repair crew has started the repair actions. If the MTTR is low and repair teams are assumed to begin repair actions immediately, which is often assumed for SIS, the PFD due to the last two reasons is very low and often considered negligible.

When studying a SIS over time, the PFD(t) of the SIS may be illustrated as in Figure 2.4. The PFD(t) will increase within the proof test interval (τ) until the proof test is performed. After a proof test is performed the PFD will be approximately equal to zero, assuming that the proof test revealed all possible failures (random hardware failures and CCF) within the SIS.


[Plot: PFD(t) versus time t, with the proof test interval τ marked on the time axis]

Figure 2.4: PFD for a SIS over time (adapted from Rausand and Høyland 2004)

When quantifying the reliability of a SIS, it is most convenient to express the long run value of the PFD (PFDaverage) and not the PFD(t). The average PFD for a SIS is illustrated in Figure 2.5 as a constant value.

[Plot: PFD(t) versus time t, with the constant level PFDaverage and the proof test interval τ marked]

Figure 2.5: Average PFD for a SIS (adapted from Rausand and Høyland 2004).

Example 2.3: A SIS consisting of a single component

Assume a SIS consisting of a single component, where the component has an exponentially distributed lifetime (i.e. constant failure rate). Table 2.2 shows the different measures for the component’s reliability.

Table 2.2: Reliability measures for a component with an exponentially distributed lifetime

The failure rate function      $z(t) = \lambda$
The reliability function       $R(t) = e^{-\lambda t}$
The distribution function      $F(t) = 1 - R(t) = 1 - e^{-\lambda t}$

The failure rate function describes how prone the component is to fail at time t, given that it has survived up to time t. Hence the failure rate is the expected number of failures per time unit for this type of component. Assuming an exponentially distributed lifetime, the failure rate is a constant value, often denoted λ and expressed as the number of failures per hour. The reliability function is also often called the survival probability of the component, and represents the probability that the component survives up to and including time t. The distribution function is commonly called the failure probability of the component, and represents the probability that the component fails before or at time t. It is clear that the failure probability is equal to 1 − R(t).

Assuming that the component is not repaired when it has failed and that the component fails within the proof test interval, the PFD of the component is equal to the average of the PFD(t) over one proof test interval. For a single component with the specified assumptions, the PFD(t) for one proof test interval is equal to the failure probability in the proof test interval. As described in Table 2.2, the failure probability is equal to 1 − R(t), hence:

$$\mathrm{PFD} = \frac{1}{\tau}\int_0^{\tau} \mathrm{PFD}(t)\,dt = \frac{1}{\tau}\int_0^{\tau} F(t)\,dt = 1 - \frac{1}{\tau}\int_0^{\tau} R(t)\,dt \tag{1.2}$$

Inserting the survival probability for a component with an exponentially distributed lifetime, the PFD may be written as in Equation 1.3. λDU is the rate of dangerous undetected (DU) failures; component failure due to reasons number two and three in the list above is considered negligible because it is assumed that the component is not repaired.

$$\mathrm{PFD} = 1 - \frac{1}{\tau}\int_0^{\tau} R(t)\,dt = 1 - \frac{1}{\tau}\int_0^{\tau} e^{-\lambda_{DU} t}\,dt = 1 - \frac{1}{\lambda_{DU}\tau}\left(1 - e^{-\lambda_{DU}\tau}\right) \tag{1.3}$$

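An approximation formula for the average PFD when λDUτ is small is found by replacing $e^{-\lambda_{DU}\tau}$ above with its Maclaurin series (Rausand et al. 2004):

$$\mathrm{PFD} = 1 - \frac{1}{\lambda_{DU}\tau}\left(\lambda_{DU}\tau - \frac{(\lambda_{DU}\tau)^2}{2} + \frac{(\lambda_{DU}\tau)^3}{3!} - \frac{(\lambda_{DU}\tau)^4}{4!} + \dots\right)$$

$$= 1 - \left(1 - \frac{\lambda_{DU}\tau}{2} + \frac{(\lambda_{DU}\tau)^2}{3!} - \frac{(\lambda_{DU}\tau)^3}{4!} + \dots\right) \approx \frac{\lambda_{DU}\tau}{2} \tag{1.4}$$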

The approximation in Equation 1.4 will always result in more conservative values than the exact formula in Equation 1.3. Making use of the method above, approximation formulas for different voting logic (for identical, independent components tested at equal proof test interval) may be derived under equal assumptions. The results are shown in Table 2.3.
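The relation between the exact formula in Equation 1.3 and the approximation in Equation 1.4 can be checked numerically. The following Python sketch is only an illustration; the function names and the input values (λDU = 1·10⁻⁶ per hour, τ = 8760 hours) are assumptions and are not taken from the method descriptions.

```python
import math


def pfd_exact(lambda_du: float, tau: float) -> float:
    """Equation 1.3: PFD = 1 - (1 - exp(-lambda_DU*tau)) / (lambda_DU*tau)."""
    x = lambda_du * tau
    # expm1(-x) = exp(-x) - 1, which avoids cancellation when x is small.
    return 1.0 + math.expm1(-x) / x


def pfd_approx(lambda_du: float, tau: float) -> float:
    """Equation 1.4: PFD is approximately lambda_DU*tau / 2."""
    return lambda_du * tau / 2.0


if __name__ == "__main__":
    lambda_du = 1.0e-6  # hypothetical DU failure rate [per hour]
    tau = 8760.0        # hypothetical proof test interval [hours]
    print("exact :", pfd_exact(lambda_du, tau))
    print("approx:", pfd_approx(lambda_du, tau))
```

For these inputs the approximation lies slightly above the exact value, in line with the statement that it is conservative.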


Table 2.3: PFD of some koon voting logic (reproduced from Rausand and Høyland 2004)

k\n    n = 1        n = 2          n = 3
1      λDUτ/2       (λDUτ)²/3      (λDUτ)³/4
2      -            λDUτ           (λDUτ)²
3      -            -              3λDUτ/2

The PFD formulas given in Table 2.3 are based on assumptions that are unrealistic for most types of systems, e.g. that the components are independent. The formulas become more realistic when the probability of CCF and the effect of repair are included.
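The entries in Table 2.3 follow a common pattern that is convenient to code. The sketch below assumes the general approximation PFD ≈ C(n, n−k+1)·(λDUτ)^(n−k+1)/(n−k+2) for identical, independent components; this closed form is not written out in the text above and is introduced here as an assumption that reproduces the tabulated cases. The function name and input values are hypothetical.

```python
from math import comb


def pfd_koon(k: int, n: int, lambda_du: float, tau: float) -> float:
    """Approximate PFD of a koon voting of identical, independent components.

    Assumed general form: C(n, m) * (lambda_DU*tau)**m / (m + 1),
    where m = n - k + 1 is the number of failed components that fails the system.
    """
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    m = n - k + 1
    return comb(n, m) * (lambda_du * tau) ** m / (m + 1)


if __name__ == "__main__":
    lambda_du, tau = 1.0e-6, 8760.0  # hypothetical inputs
    for k, n in [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]:
        print(f"{k}oo{n}: {pfd_koon(k, n, lambda_du, tau):.3e}")
```

The approximation is only meaningful when λDUτ is small, as stated for Equation 1.4.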

2.2.1 PDS method

The PDS method bases the calculation model for PFD due to independent failures on the approximation formulas presented in Table 2.3. Hence it is assumed that the components are identical and that λDUτ is small enough to allow the approximation. The PDS method splits the PFD into known and unknown PFD. Unknown PFD is the probability of a system failure during the time period when it is not known that the system is in a failed state (DU failure in one or more channels). Known PFD is the probability of a system failure during the time period when it is known that the system is in a failed state (dangerous detected (DD) failures in one or more channels are detected and under repair).

$$\mathrm{PFD} = \mathrm{PFD}_{UK} + \mathrm{PFD}_{K} \tag{1.5}$$

Equations 1.6 and 1.7 are the detailed formulas for a 1oo2 system, where the probability of system failure due to independent failure of both channels is included. The modification factor used to model CCF (see Table 2.1) is not shown in these equations. Table 2.4 and Table 2.5 describe the terminology used in the PDS method.

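$$\mathrm{PFD}_{UK} = \beta \cdot \lambda_{DU} \cdot \frac{\tau}{2} + \frac{\left((1-\beta) \cdot \lambda_{DU} \cdot \tau\right)^2}{3} + 2 \cdot (1-\beta) \cdot \lambda_{det} \cdot \mathrm{MTTR} \cdot \lambda_{DU} \cdot \frac{\tau}{2} \tag{1.6}$$

$$\mathrm{PFD}_{K} = \beta \cdot \lambda_{D} \cdot \mathrm{MTTR} + \left((1-\beta) \cdot \lambda_{DU}\right)^2 \cdot \tau \cdot \mathrm{MTTR} + (1-\beta) \cdot \lambda_{det} \cdot \mathrm{MTTR} \cdot \lambda_{DD} \cdot \mathrm{MTTR} \tag{1.7}$$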

Table 2.4: Failure terminology used in the PDS method (adapted from Hokstad et al. 2003)

                 Undetected    Detected    Sum
Dangerous        λDU           λDD         λD
Spurious Trip    λSTU          λSTD        λST
Sum              λundet        λdet        λcrit


Table 2.5: Abbreviations and explanations to the PDS model

MTTR    Mean time to repair (time period from when the failure is revealed, either by diagnostic testing or by proof test, until the component is put back into a functioning state).
τ       Proof test interval [hours]
β       The β-factor used in the β-factor model; it represents the fraction of dangerous failures that are CCF

Equations 1.6 and 1.7 may each be split into three terms. Table 2.6 and Table 2.7 explain the different terms for unknown and known failures, respectively.

Table 2.6: Comments on PFD unknown (adapted from Hokstad and Corneliussen 2003)

Term #    Comments
1         System failure due to CCF
2         Independent dangerous undetected (DU) failure in both channels in the same test interval
3         During repair of one component due to a detected failure (λdet), the operating channel fails due to a DU failure.

Table 2.7: Comments on PFD known (adapted from Hokstad and Corneliussen 2003)

Term #    Comments
1         Repair of both components due to CCF
2         Repair of both components due to an independent DU failure in both channels in the same test interval
3         During repair of one component due to a detected failure (λdet), the operating channel fails due to a DD failure.

If it is assumed that the PFD of the SIS is only influenced by CCF, the PFD formula presented in Equations 1.6 and 1.7 is reduced to:

$$\mathrm{PFD}_{CCF} = \beta \cdot \lambda_{DU} \cdot \frac{\tau}{2} + \beta \cdot \lambda_{D} \cdot \mathrm{MTTR} \tag{1.8}$$

Rearranging Equation 1.8, using λD = λDU + λDD, gives Equation 1.9, which is easier to interpret:

$$\mathrm{PFD}_{CCF} = \beta \cdot \lambda_{DU} \cdot \left(\frac{\tau}{2} + \mathrm{MTTR}\right) + \beta \cdot \lambda_{DD} \cdot \mathrm{MTTR} \tag{1.9}$$


The first term describes system failure due to DU CCF⁴, which is assumed to occur, on average, halfway through the proof test interval. The failure is assumed to be revealed in the proof test and repaired with MTTR. The second term describes DD CCF⁵, which is detected at occurrence by diagnostic testing and repaired at once with MTTR. The time from detection of a failure (either by diagnostic test or by proof test) until the repair actions are started is considered negligible. The PDS method assumes that a DU failure will on average occur at τ/2; thus it will take τ/2 + MTTR to detect and repair the DU failure and put the system back into normal operation (see Figure 2.6). When the system fails due to a DD failure, the failure is detected at once and it will only take MTTR to put the system back into normal operation.

[Timeline: time axis t from 0 to τ, with the occurrence of a DU failure marked at τ/2, followed by MTTR after τ]

Figure 2.6: Occurrence of DU failures
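The CCF contribution in Equations 1.8 and 1.9 is simple enough to sketch in Python, which also confirms that the two forms agree. The function names and all numerical inputs below are hypothetical values chosen for illustration only.

```python
def pfd_ccf_eq_1_8(beta: float, lam_du: float, lam_dd: float,
                   tau: float, mttr: float) -> float:
    """Equation 1.8: beta*lambda_DU*tau/2 + beta*lambda_D*MTTR."""
    lam_d = lam_du + lam_dd  # total dangerous failure rate (Table 2.4)
    return beta * lam_du * tau / 2 + beta * lam_d * mttr


def pfd_ccf_eq_1_9(beta: float, lam_du: float, lam_dd: float,
                   tau: float, mttr: float) -> float:
    """Equation 1.9: beta*lambda_DU*(tau/2 + MTTR) + beta*lambda_DD*MTTR."""
    return beta * lam_du * (tau / 2 + mttr) + beta * lam_dd * mttr


if __name__ == "__main__":
    # Hypothetical inputs: beta, DU rate, DD rate [per hour], tau and MTTR [hours].
    args = (0.05, 1.0e-6, 4.0e-6, 8760.0, 8.0)
    print(pfd_ccf_eq_1_8(*args), pfd_ccf_eq_1_9(*args))
```

With a short MTTR the result is dominated by the DU term β·λDU·τ/2, which illustrates why the proof test interval matters most for the CCF contribution.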

The PDS method assumes that the mean down time (MDT) for a DU failure in a proof test interval is τ/2 + MTTR for a single system. This statement is proved by Rausand and Høyland (2004) p. 433. The MDT for one component in one proof test interval is then given as:

$$\mathrm{MDT} = \tau \cdot \mathrm{PFD} = \int_0^{\tau} F(t)\,dt \tag{1.10}$$

where F(t) is the distribution function for a component with an exponentially distributed lifetime (see Table 2.2). The MDT in a proof test interval, when assuming that the component is found in a failed state at the proof test and that it is not repaired, is then given as:

$$\mathrm{MDT} = \frac{1}{F(\tau)}\int_0^{\tau} F(t)\,dt = \frac{\tau}{F(\tau)} \cdot \mathrm{PFD} \approx \frac{\tau}{1 - e^{-\lambda_{DU}\tau}} \cdot \frac{\lambda_{DU}\tau}{2} \approx \frac{\tau}{2} \tag{1.11}$$

The last term (λDUτ/2) is the approximation formula for a single system (1oo1) shown in Table 2.3, thus the conditional MDT given in Equation 1.11 is only valid if λDUτ is small. When repair time is included, the MDT due to a DU failure is equal to the one assumed in the PDS method:

$$\mathrm{MDT} = \frac{\tau}{2} + \mathrm{MTTR} \tag{1.12}$$

⁴ With DU CCF it is meant the fraction of dangerous undetected failures that are common cause failures.
⁵ With DD CCF it is meant the fraction of dangerous detected failures that are common cause failures.
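The conditional MDT in Equation 1.11 can be evaluated without the small-λDUτ approximation to see how close it is to τ/2. The sketch below is an illustration under assumed inputs (λDU = 1·10⁻⁶ per hour, τ = 8760 hours, MTTR = 8 hours); the function names are hypothetical.

```python
import math


def conditional_mdt_1oo1(lam_du: float, tau: float) -> float:
    """Conditional MDT of a single component found failed at the proof test.

    Evaluates (1 / F(tau)) * integral_0^tau F(t) dt with F(t) = 1 - exp(-lam_du*t).
    """
    f_tau = -math.expm1(-lam_du * tau)   # F(tau) = 1 - exp(-lam_du*tau)
    integral_f = tau - f_tau / lam_du    # closed form of the integral of F(t)
    return integral_f / f_tau


def mdt_with_repair(tau: float, mttr: float) -> float:
    """Equation 1.12: MDT = tau/2 + MTTR."""
    return tau / 2 + mttr


if __name__ == "__main__":
    lam_du, tau, mttr = 1.0e-6, 8760.0, 8.0  # hypothetical inputs
    print("conditional MDT:", conditional_mdt_1oo1(lam_du, tau))
    print("tau/2          :", tau / 2)
    print("tau/2 + MTTR   :", mdt_with_repair(tau, mttr))
```

For small λDUτ the exact conditional MDT lies only slightly above τ/2, which supports the use of τ/2 + MTTR in the PDS method.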

Further, Rausand and Høyland (2004) p. 433 derive the conditional MDT in a proof test interval for a 1oo2 system of two independent and identical components (assuming that the system is found in a failed state at the proof test):

$$\mathrm{MDT} \approx \frac{\tau}{1 - 2e^{-\lambda_{DU}\tau} + e^{-2\lambda_{DU}\tau}} \cdot \frac{(\lambda_{DU}\tau)^2}{3} \approx \frac{\tau}{3} \tag{1.13}$$

where the denominator in the first term is equal to the failure probability of a 1oo2 system, and the last term is the approximation formula for a 1oo2 system shown in Table 2.3. The second term in the PFDUK formula (see Equation 1.6) is also based on the approximation formula for a 1oo2 system shown in Table 2.3. When the second term in PFDK is added, the repair actions are included. The second term in the PFDUK formula is rewritten as:

$$\frac{\left((1-\beta)\,\lambda_{DU}\,\tau\right)^2}{3} \tag{1.14}$$

For both the second and the last term in PFDUK and PFDK, the DU failure rate is multiplied by 1 − β. Hence it is assumed that the failure rate (λDU) represents the total failure rate, where

$$\lambda_{tot} = \lambda_{CC} + \lambda_{independent} \tag{1.15}$$

and

$$\lambda_{independent} = (1-\beta)\,\lambda_{tot} \;\wedge\; \lambda_{CC} = \beta \cdot \lambda_{tot} \tag{1.16}$$

λCC is the failure rate of CCF, λindependent is the failure rate of independent failures and λtot is the total failure rate (including both CCF and independent failures). It is important to notice that different data sources present different failure rates, some including CCF and some excluding CCF. Information about the collection of failure data is normally given in the data source. If the failure rates are based on historical data from the field, the failure rate will most likely include CCF. If the failure rates are based on testing of single components (e.g. laboratory testing), the failure rates will only represent independent failures.

2.2.2 IEC 61508-6 method

The PFD formula for a single system presented in IEC 61508-6 is:

$$\mathrm{PFD} = (\lambda_{DU} + \lambda_{DD})\,t_{CE} = \lambda_{DU}\left(\frac{\tau}{2} + \mathrm{MTTR}\right) + \lambda_{DD}\,\mathrm{MTTR} \tag{1.17}$$

where

$$t_{CE} = \frac{\lambda_{DU}}{\lambda_{D}}\left(\frac{\tau}{2} + \mathrm{MTTR}\right) + \frac{\lambda_{DD}}{\lambda_{D}}\,\mathrm{MTTR} \tag{1.18}$$

The terminology is given in Table 2.8 and Table 2.9.

Table 2.8: Failure terminology in IEC 61508-6

                 Undetected    Detected    Sum
Dangerous        λDU           λDD         λD
Safe failures    λSU           λSD         λS
Sum              λundet        λdet        λtotal

Table 2.9: Abbreviations and explanations to the IEC 61508-6 method (adapted from IEC 61508)

tGE       Voted group equivalent mean down time [hours] for 1oo2 architecture. This is the mean down time (MDT) in a test interval for the voting logic.
tCE       Channel equivalent mean down time [hours] for 1oo2 architecture. This is the mean down time (MDT) in a test interval for one channel.
τ         Proof test interval [hours]
β         The β-factor used in the β-factor model; it represents the fraction of DU failures that are CCF.
βD        The fraction of DD failures that are CCF. IEC 61508-6 operates with the assumption that βD is 50 % of β: β = 2 ⋅ βD
MTTR      Mean time to restoration (time period from when the failure is revealed, either by diagnostic testing or by proof test, until the component is put back into a functioning state).
λtotal    IEC 61508-6 generally assumes that: λtotal = λS + λD ∧ λS = λD = λtotal/2

The formula in Equation 1.17 is equal to the one given in the PDS method for 1oo1 systems, and thus includes the same failure philosophy and assumptions. The PFD formula for a 1oo2 system presented in IEC 61508-6 is:

$$\mathrm{PFD} = 2\left((1-\beta_D)\,\lambda_{DD} + (1-\beta)\,\lambda_{DU}\right)^2 t_{CE}\,t_{GE} + \beta_D\,\lambda_{DD}\,\mathrm{MTTR} + \beta\,\lambda_{DU}\left(\frac{\tau}{2} + \mathrm{MTTR}\right) \tag{1.19}$$

where tCE is given in Equation 1.18, and tGE is

$$t_{GE} = \frac{\lambda_{DU}}{\lambda_{D}} \cdot \left(\frac{\tau}{3} + \mathrm{MTTR}\right) + \frac{\lambda_{DD}}{\lambda_{D}} \cdot \mathrm{MTTR} \tag{1.20}$$

Equation 1.19 may be split into three terms. Table 2.10 explains the different terms in the equation.

Table 2.10: Comments on PFD in IEC 61508-6

Term #    Pr(System failure due to…)
1         Independent dangerous failure in both channels in the same test interval
2         Dangerous detected common cause failure
3         Dangerous undetected common cause failure
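The IEC 61508-6 formulas in Equations 1.17 to 1.20 can be sketched as a few Python functions. This is only an illustration of the formulas as reproduced here; the function names and the example inputs are assumptions, and the sketch requires λDU + λDD > 0.

```python
def t_ce(lam_du: float, lam_dd: float, tau: float, mttr: float) -> float:
    """Equation 1.18: channel equivalent mean down time."""
    lam_d = lam_du + lam_dd
    return lam_du / lam_d * (tau / 2 + mttr) + lam_dd / lam_d * mttr


def t_ge(lam_du: float, lam_dd: float, tau: float, mttr: float) -> float:
    """Equation 1.20: voted group equivalent mean down time."""
    lam_d = lam_du + lam_dd
    return lam_du / lam_d * (tau / 3 + mttr) + lam_dd / lam_d * mttr


def pfd_1oo1(lam_du: float, lam_dd: float, tau: float, mttr: float) -> float:
    """Equation 1.17: PFD of a single channel."""
    return (lam_du + lam_dd) * t_ce(lam_du, lam_dd, tau, mttr)


def pfd_1oo2(lam_du: float, lam_dd: float, beta: float, beta_d: float,
             tau: float, mttr: float) -> float:
    """Equation 1.19: independent term + DD CCF term + DU CCF term."""
    independent = (2 * ((1 - beta_d) * lam_dd + (1 - beta) * lam_du) ** 2
                   * t_ce(lam_du, lam_dd, tau, mttr)
                   * t_ge(lam_du, lam_dd, tau, mttr))
    dd_ccf = beta_d * lam_dd * mttr
    du_ccf = beta * lam_du * (tau / 2 + mttr)
    return independent + dd_ccf + du_ccf


if __name__ == "__main__":
    # Hypothetical inputs; beta_d = beta / 2 follows the assumption in Table 2.9.
    print(pfd_1oo1(1.0e-6, 4.0e-6, 8760.0, 8.0))
    print(pfd_1oo2(1.0e-6, 4.0e-6, 0.10, 0.05, 8760.0, 8.0))
```

Setting λDD and MTTR to zero reduces the first term of `pfd_1oo2` to the approximation for a 1oo2 system in Table 2.3 with λDU replaced by (1 − β)λDU.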

tCE represents the MDT for a channel, and tGE represents the MDT for the voting logic. When assuming only DU failures and disregarding repair, the MDT for the system is τ/3 and the MDT for a channel is τ/2. The assumptions and derivations for the MDT for a 1oo1 system (tCE) and a 1oo2 system (tGE) are described thoroughly in section 2.2.1. When assuming only DD failures, the MDT for one channel is MTTR, which is logical when assuming that the time from detection until repair actions are begun is negligible. If it is assumed that one channel fails due to a DD failure, the MDT will be equal to zero due to the assumptions that the channel is repaired immediately, and that the probability of a dangerous failure in the operating channel during MTTR is negligible. The first term in the PFD calculation model for a 1oo2 system in Equation 1.19 is rewritten as:

$$2\left((1-\beta_D)\,\lambda_{DD} + (1-\beta)\,\lambda_{DU}\right)^2 t_{CE}\,t_{GE} \tag{1.21}$$

For a SIS without diagnostic capability, where only DU failures have an effect on the PFD, and disregarding repair of the component, the formula is reduced to:

$$2\left((1-\beta)\,\lambda_{DU}\right)^2 \cdot \frac{\tau}{2} \cdot \frac{\tau}{3} \;\Rightarrow\; \frac{\left((1-\beta)\,\lambda_{DU}\,\tau\right)^2}{3} \tag{1.22}$$

which is equal to the approximation formula in Table 2.3 for a 1oo2 system. When assuming only DD failures and disregarding repair, the MDT for a channel and for the system is equal to zero. The probability of system failure due to independent DD failures in both channels is then equal to zero, hence it is assumed negligible. The second and last terms in Equation 1.19 are rewritten as

$$\beta \cdot \lambda_{DU} \cdot \left(\frac{\tau}{2} + \mathrm{MTTR}\right) + \beta_D \cdot \lambda_{DD} \cdot \mathrm{MTTR} \tag{1.23}$$

Equation 1.23 represents the probability of system failure due to CCF (DU CCF and DD CCF). IEC 61508-6 makes use of different β-factors for detected and undetected failures respectively, commented in section 2.1. When assuming that β is equal to βD, equation 1.23 is equal to the model for CCF used in the PDS method (without the modification factor).

2.2.3 Probability of a dangerous failure per hour

The PDS method covers only SIS operating in low demand mode, and thus only provides calculation models for the system’s probability of failure on demand (PFD). It does not cover systems operating in high demand or continuous mode⁶. For a low demand SIS, a dangerous situation occurs whenever the system is in a failed state and a demand is placed on the SIS. For a continuously operating SIS, a dangerous situation will occur any time the system fails dangerously undetected (DU). If the system fails dangerously detected (DD), the SIS will put the EUC into a safe state. For a continuously operating SIS that functions as a control system and is integrated in the EUC, a dangerous situation will normally occur any time the system fails dangerously (both DU and DD). High demand or continuously operating SIS are often control systems; an example of a SIS control system is a dynamic positioning system. The availability of a continuously operating system is often presented as the average probability of a dangerous failure per hour (PFH). The PFH also describes the frequency of system failures. As for the probability of failure on demand (PFD) for SIS operating in low demand mode, the contribution factors included in the PFH are:

- Random hardware failures
- Common cause failures

IEC 61508-6 presents calculation models for the PFH of SIS operating in high demand and continuous mode. The PFH for a single system (1oo1) is according to IEC 61508-6 equal to the DU failure rate, when assuming that the EUC is set to a safe state whenever a DD failure occurs.

$$\mathrm{PFH} = \lambda_{DU} \tag{1.24}$$

The PFH for a 1oo2 system is according to IEC 61508-6:

$$\mathrm{PFH}_{average} = 2\left((1-\beta_D)\,\lambda_{DD} + (1-\beta)\,\lambda_{DU}\right)^2 t_{CE} + \beta_D\,\lambda_{DD} + \beta\,\lambda_{DU} \tag{1.25}$$

where

$$t_{CE} = \frac{\lambda_{DU}}{\lambda_{D}}\left(\frac{T_1}{2} + \mathrm{MTTR}\right) + \frac{\lambda_{DD}}{\lambda_{D}}\,\mathrm{MTTR} \tag{1.26}$$

and T1 denotes the proof test interval (τ).

⁶ The definition of low demand and high demand/continuous mode is given in Abbreviations and definitions.

The models are based on the same failure philosophy as the PFD models presented in IEC 61508-6 (assumptions and derivations are given in sections 2.2.1 and 2.2.2). Equation 1.25 may be split into three terms; each term is commented in Table 2.11.

Table 2.11: Comments on PFH according to IEC 61508-6

Term #    Pr(System failure due to…)
1         Independent dangerous failure in both channels in the same test interval
2         Dangerous detected common cause failure
3         Dangerous undetected common cause failure

The first term includes the probability of system failure due to independent dangerous failures in all channels, while the last two terms include the probability of system failure due to CCF (for both DD and DU failures). CCF are modeled using the plant-specific β-factor model in IEC 61508-6 annex D, where the β-factors (βD and β) are derived from the points scored in the XY and Z tables (described in section 2.1).
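A minimal Python sketch of the PFH formulas in Equations 1.24 to 1.26 is given below. The function names and input values are hypothetical and serve only to show how the three terms in Table 2.11 combine; the sketch requires λDU + λDD > 0.

```python
def pfh_1oo1(lam_du: float) -> float:
    """Equation 1.24: PFH of a single channel equals the DU failure rate."""
    return lam_du


def pfh_1oo2(lam_du: float, lam_dd: float, beta: float, beta_d: float,
             t1: float, mttr: float) -> float:
    """Equation 1.25, with t_CE taken from Equation 1.26."""
    lam_d = lam_du + lam_dd
    t_ce = lam_du / lam_d * (t1 / 2 + mttr) + lam_dd / lam_d * mttr
    independent = 2 * ((1 - beta_d) * lam_dd + (1 - beta) * lam_du) ** 2 * t_ce
    dd_ccf = beta_d * lam_dd
    du_ccf = beta * lam_du
    return independent + dd_ccf + du_ccf


if __name__ == "__main__":
    # Hypothetical inputs: rates per hour, proof test interval and MTTR in hours.
    print(pfh_1oo1(1.0e-6))
    print(pfh_1oo2(1.0e-6, 4.0e-6, 0.10, 0.05, 8760.0, 8.0))
```

For realistic β-factors the two CCF terms usually dominate the result, since the independent term is of second order in the failure rates.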

2.3 SYSTEMATIC FAILURES

Systematic failures are defined as (IEC 61508-4) ”Failure related in a deterministic way to a certain cause, which can only be eliminated by a modification of the design or of the manufacturing process, operational procedures, documentation or other relevant factors”. Systematic failures are often hard to quantify, because it is difficult to predict the failures. Typical systematic failures for SIS are (Hokstad and Corneliussen 2003):

- design failures (including software failures)
- random human interaction failures
- test interaction failures

These types of failures are hidden in the system and are not discovered before the right conditions for the failure to occur are present. Systematic failures, and especially software failures, have a tendency to decrease over time because the system is improved when a systematic failure is discovered. Hence new types of systems will often have a larger probability of systematic failures (PSF) than systems that have been operating for a while. When developing new systems it is therefore of major importance to keep systematic failures in mind during the design process, and to try to minimize the possibility of such failures. IEC 61508 suggests a qualitative method with the intention of avoiding systematic failures in the design phase, and also in the rest of the lifecycle phases of the system. According to IEC 61508 it is not possible to quantify the PSF of a system due to the difficulty of listing every individual cause of the systematic failures. A quantitative analysis is therefore not recommended. However, quantitative analysis has its advantages. One advantage is the possibility to compare systems including identical components with different degrees of redundancy. Another major advantage is revealed when comparing systems that satisfy the same SIL. Figure 2.7 illustrates an example of two systems, both satisfying SIL2.

Figure 2.7: SIS satisfying SIL2

As mentioned earlier, the SIL of a system is determined based on the PFD, SFF and HFT. According to Figure 2.7, the two systems both satisfy SIL2, hence they are said to be equal. If the systematic failures are included in the PFD (PFD + PSF), the reliability of the systems will decrease, and it may be that the system which before just barely satisfied SIL2 will now only satisfy SIL1. Thus the two systems are no longer considered equal. When calculating the PSF it is possible to present a confidence interval for the PFD value, which provides information about the probability that the system actually satisfies the given SIL. The PDS method suggests that a quantitative analysis be performed for the systematic failures, and presents the term critical safety unavailability (CSU), which is equal to PFD + PSF. The PDS method splits the PSF into three parts: systematic failure caused by design failure, random human interaction failure or test interaction failure:

$$\mathrm{PSF} = \mathrm{PSF}_{DF} + \mathrm{PSF}_{RI} + \mathrm{PSF}_{TI} \tag{1.27}$$

A realistic quantitative measure for each of the PSF parts is often hard to obtain, and thus the PDS handbook suggests that only the total PSF shall be assessed, independent of the proof test interval. In order to get a realistic quantification of the PSF, it should be based on expert judgment and be specific for each component. When calculating the total PSF of the system, the probability of CCF should be included with a β-factor for systematic CCF assessed by expert judgment (Equation 1.28). The value is then adjusted with the modification factor for the specific voting logic (Table 2.1).

$$\beta_{SF} \cdot \mathrm{PSF} \tag{1.28}$$

An assessment of the total PSF may be hard to obtain because little or no experience data are available, and because identification of causes is hard. However, introducing the CSU either in addition to or instead of the PFD will present a more realistic picture of the system. There will always be a probability of systematic failures, thus the PSF should be presented.

2.4 SPURIOUS TRIPS

Spurious trips are safe failures causing the safety system to activate without demand and put the EUC into a safe state, which normally means a shutdown of the EUC (the control system). The spurious trip rate (STR) is in the PDS method handbook defined as (Hokstad and Corneliussen 2003) “The mean number of spurious activations of the safety system per time unit (due to random hardware failures)”. The STR for safety systems is, according to Rausand and Høyland (2004), often comparable to the DU failure rate and is in some cases even higher. A high STR will decrease the availability of the control system and thereby cause significant costs. Redundancy is mainly introduced to ensure highly reliable systems, but will also cause a higher STR. The STR increases with the degree of redundancy:

$$\lambda_{ST,\,1oon} = \sum_{i=1}^{n} \lambda_{ST,\,i} \tag{1.29}$$

The perfect system should hold both a high reliability and a high availability. In order to obtain this, the STR and PFD for different voting logics must be compared. Table 2.12 shows the result of a comparison performed by Tiezema (1998), assuming identical and independent components.

Table 2.12: PFD and STR for systems with identical and independent components (adapted from Tiezema 1998)

Voting logic    PFD                        λST [failure/hour]
1oo1            λDUτ/2 = 4.38 ⋅ 10⁻³       λST = 1 ⋅ 10⁻⁶
1oo2D           (λDUτ)²/3 = 2.56 ⋅ 10⁻⁵    2 ⋅ (λST)² ⋅ MTTR = 1.6 ⋅ 10⁻⁹
2oo3            (λDUτ)² = 7.67 ⋅ 10⁻⁵      6 ⋅ (λST)² ⋅ MTTR = 4.8 ⋅ 10⁻⁹
2oo2            λDUτ = 8.76 ⋅ 10⁻³         2 ⋅ (λST)² ⋅ MTTR = 1.6 ⋅ 10⁻⁹

According to the results given in Table 2.12, voting logic 1oo2D holds the highest reliability and availability. Voting logic 2oo2 has an equally low STR, but also holds the lowest reliability. Next after 1oo2D is voting logic 2oo3, which holds a low STR and also a high reliability. The 2oo3 voting logic is often chosen for the sensor subsystem (Rausand and Høyland 2004). IEC 61508 only considers the reliability of the safety system; the availability of the EUC is not given any importance. The PDS method states the importance of the STR, and includes it in its calculation models as the rate of critical safe failures (see the failure terminology in Table 2.4).
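The PFD column of Table 2.12 can be checked with a few lines of Python. The sketch assumes λDU = 1·10⁻⁶ per hour and τ = 8760 hours; these inputs are not stated in the text and are back-calculated from the tabulated value λDUτ = 8.76·10⁻³.

```python
# Assumed inputs, back-calculated from lambda_DU * tau = 8.76e-3 in Table 2.12.
lambda_du = 1.0e-6  # DU failure rate [per hour] (assumption)
tau = 8760.0        # proof test interval [hours] (assumption)
x = lambda_du * tau

# PFD approximations per voting logic, as listed in Table 2.12.
pfd = {
    "1oo1": x / 2,
    "1oo2D": x ** 2 / 3,
    "2oo3": x ** 2,
    "2oo2": x,
}

if __name__ == "__main__":
    for voting, value in pfd.items():
        print(f"{voting:6s} PFD = {value:.2e}")
```

The ranking of the values matches the discussion: 1oo2D gives the lowest PFD, followed by 2oo3, 1oo1 and 2oo2.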

λST = λS − λNONC

(1.30)

As for systematic failures, the PDS method also considers CCF for spurious trips described with the extended β-factor model (β-factor multiplied with the modification factor for the specific voting logic, see Table 2.1).

2.5 SAFE FAILURE FRACTION

Safe failure fraction is in IEC 61508-4 defined as: ”The ratio of the average rate of safe failures plus dangerous detected failures of the subsystem to the total average failure rate of the subsystem”

$$\mathrm{SFF} = \frac{\lambda_S + \lambda_{DD}}{\lambda_{tot}} = 1 - \frac{\lambda_{DU}}{\lambda_{tot}} \tag{1.31}$$

The PDS method handbook defines SFF as:

$$\mathrm{SFF} = 1 - \frac{\lambda_{DU}}{\lambda_{crit}} \tag{1.32}$$

The main difference between the two methods is the definition of the total failure rate. While IEC 61508 includes all failures in the total failure rate in the SFF equation, the PDS method limits the total failure rate to the failures that are critical for the system (failures causing spurious trips or unavailability of the safety function). When making use of IEC 61508’s definition of SFF, it is possible to increase the SFF by introducing components with a high safe failure rate in the subsystem. The PDS method avoids this by introducing λcrit, which excludes this type of failure. It is also important to notice that a system with a high SFF may also be a system with a high STR (for both methods). In order to meet the SIL requirements given in IEC 61508, the SFF for each subsystem must be calculated according to Equation 1.31. Further discussions about the correctness of the IEC 61508 formula for SFF are given in Kjørstad (2004).
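The difference between the two SFF definitions can be illustrated with a short Python sketch. It assumes λtot = λS + λDD + λDU for Equation 1.31 and λcrit = λDU + λDD + λST with λST = λS − λNONC (Table 2.4 and Equation 1.30) for Equation 1.32. The function names and all failure rates below are hypothetical.

```python
def sff_iec(lam_s: float, lam_dd: float, lam_du: float) -> float:
    """Equation 1.31: SFF = 1 - lambda_DU / lambda_tot (all failures counted)."""
    lam_tot = lam_s + lam_dd + lam_du
    return 1.0 - lam_du / lam_tot


def sff_pds(lam_s: float, lam_nonc: float, lam_dd: float, lam_du: float) -> float:
    """Equation 1.32: SFF = 1 - lambda_DU / lambda_crit (critical failures only)."""
    lam_st = lam_s - lam_nonc          # Equation 1.30: spurious trip rate
    lam_crit = lam_du + lam_dd + lam_st
    return 1.0 - lam_du / lam_crit


if __name__ == "__main__":
    # Hypothetical failure rates [per hour].
    lam_s, lam_nonc, lam_dd, lam_du = 5.0e-6, 3.0e-6, 4.0e-6, 1.0e-6
    print("IEC 61508 SFF:", sff_iec(lam_s, lam_dd, lam_du))
    print("PDS SFF      :", sff_pds(lam_s, lam_nonc, lam_dd, lam_du))
```

Because non-critical safe failures are excluded from λcrit, the PDS definition gives a lower SFF than the IEC 61508 definition whenever λNONC is greater than zero.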

2.6 MAIN DIFFERENCES AND DISCUSSION

When modeling CCF of a system, the semi-quantitative method suggested in IEC 61508-6 annex D is a good solution. The method is time-consuming, but results in a β-factor which describes the specific system. A β-factor derived from historical data will often not reflect the actual system, and it may also hold a great uncertainty. CCF of category two will often not be reported as CCF. For example, when the failure of a cooling fan causes several components to fail, it is not likely that all components fail simultaneously. Hence the failure will probably not be reported as a CCF. The β-factor derived using the IEC 61508-6 method will therefore be more accurate than one based on historical data. On the other hand, one may question the scoring points given in the tables in IEC 61508-6 annex D. These points are derived using engineering judgment; however, the determination of the Z value (dependency of diagnostic coverage and diagnostic test interval) is questioned in Hokstad and Corneliussen (2000).

The PDS method’s suggestion to modify the β-factor with a factor representing the voting logic also seems to be a good solution. It is realistic that the susceptibility to CCF is higher for a 1oo2 system than for a 1oo3 system. It is therefore recommended to follow the method suggested in the PDS method for modeling CCF, i.e. to use the scoring tables in IEC 61508-6 annex D and multiply the β-factor by the modification factors given in the PDS method handbook for the different voting logics. For systems with 1oo2 voting logic the modification factor is equal to 1.0, and thus the two methods give equal results.

The calculation models for PFD presented in the PDS method and in IEC 61508-6 also have some differences. The PDS method includes the probability of system failure due to a dangerous failure while a channel is being repaired. If the MTTR is small, this probability is very low, and it will be even lower for a system with a high degree of redundancy. Hence, it may be considered negligible. If the MTTR is high, this probability may not be considered negligible.
Another difference is the βD factor used in IEC 61508-6 for DD CCF; it is assumed that the susceptibility to CCF is lower when the dangerous failure is detected than when it is undetected. In addition, IEC 61508-6 assumes that the rate of safe failures is equal to the rate of dangerous failures, so that each is 50 % of the total failure rate. The PDS method makes no such assumption, but if no better-founded alternative is available, this may be a good solution. Little information about the derivation of the models presented in IEC 61508-6 is given, thus it may be recommended to make use of the PFD models as suggested in the PDS method.

The difference between the two methods is large for the handling of systematic failures. The PDS method suggests a quantitative analysis of PSF. IEC 61508-6 claims that a quantitative analysis is not possible, and requires a qualitative method for the avoidance of systematic failures to be applied in all life-cycle phases of the system. It is of high importance to perform a qualitative analysis in the life-cycle phases, in order to develop systems with low PSF. However, it is recommended to also quantify the PSF, in order to give a confidence interval that may show the probability of the system’s SIL compliance.

The PDS method also quantifies the STR of the system, while IEC 61508-6 does not consider these types of failures. A quantification of the STR will present information that may be included in an availability calculation of the EUC, and may therefore be of interest for most customers.

The SFF of the system is required to be calculated in order to determine the SIL of the system, and it must be calculated as presented in IEC 61508 (Equation 1.31).

The assumptions that are made in order for the models presented in IEC 61508-6 and the PDS method to be applicable are:

• The components have exponentially distributed lifetimes (failure rates are constant)
• The components in a system are identical (same failure rate and DC) and proof tested at equal test intervals
• λDUτ is small enough so that the approximation formulas in Table 2.3 apply
• The time from detection of a failure (either by diagnostic tests or at proof test) until repair actions begin is negligible
• Perfect proof tests and repairs
• The failure rates are represented as total failure rates including CCF

In order to determine the optimal calculation models for Kongsberg Maritime’s systems, the product range must be described thoroughly and special considerations for each system must be revealed.


3 PRODUCT RANGE

The relevant product range of Kongsberg Maritime’s systems for this project is:

• The AIM Safe systems (safety systems)
• The dynamic positioning (DP) systems (continuous control systems)

The AIM Safe system is a SIS operating in low demand mode while the DP system is a continuously operating SIS. Kongsberg Maritime delivers the logic subsystems to the customers, hence the sensor subsystem and the final element subsystem are not considered in this report. Both the AIM Safe systems and the DP systems mainly operate on vessels or platforms, thus the intended operational environment is naval sheltered (NS). The components in the systems are analyzed using a failure mode and effect analysis (FMEA), and failure rates are calculated using MIL-HDBK-217F and supplier data for the NS environment. FMEA is a common quantitative technique for analyzing the system’s reliability, and IEC 61508 also requires FMEA to be performed on the components included in the SIS. After the introduction of programmable electronic SIS, an extended version of the FMEA technique, called FMEDA, has been developed. FMEDA includes the diagnostic features of the SIS in the analysis, thus the diagnostic coverage (DC) of the system is measured and evaluated. Further information about the FMEDA technique is given in Kjørstad (2004).

Failure data are often derived from data sources or from historical data. When data are derived from data sources, it is important to choose a data source that fits the component and its working environment well. In addition, it is of high importance to investigate how the data in the data source are collected and how they are presented. The failure rates presented in different data sources may include different types of failures (e.g. CCF may be included in or excluded from the failure rate, as described in section 2.1). The way the data are collected affects which kinds of failures are included in the failure rate. Information about the data source MIL-HDBK-217F, and how the data in this data source are collected, is presented in Kjørstad (2004).

3.1 SAFETY SYSTEMS

The main task of the AIM Safe systems is safe error detection and provision of automatic corrective actions in unacceptable hazardous situations. Kongsberg Maritime delivers safety systems as emergency shutdown systems (ESD), process shutdown systems (PSD) and fire and gas detection systems (F&G) with both single and dual redundant (1oo2) solutions. AIM Safe systems are mainly delivered to the maritime sector. Safety systems are most often required to have a high reliability, due to the importance of their function. When a demand for the safety system arises, it is crucial that the safety system is functioning and able to bring the EUC to a safe state (that is, perform a shutdown of the process). As described in chapter 2, the most common requirement from customers is to satisfy a given SIL for the system according to IEC 61508.


Kongsberg Maritime’s AIM Safe systems with the corresponding SIL calculated today are shown in Table 3.1.

Table 3.1: Kongsberg Maritime's safety solutions (reproduced from Korssjøen 2004)

AIM Safe system 1oo2 dual I/O. AI/DO and DI/DO 1oo2 shared I/O. AI/DO and DI/DO IS 1oo2 shared I/O. AI/DO and DI/DO 1oo1 single I/O

SIL 3 2 2

3.2 DYNAMIC POSITIONING SYSTEMS

The main task of a DP system is to make a vessel or platform hold its position only by means of its propulsion and thrusters. The DP system is a continuously operating system, where the input data are handled continuously in order to hold the vessel within its desired position. As opposed to the safety systems, the safe state for a DP system is when the process is running, and a dangerous state occurs when the process is shut down. Kongsberg Maritime’s DP systems consist of a DP control system designed for different applications and vessels. The DP system is delivered as single, dual redundant (1oo2) and triple redundant (1oo3) solutions, both as stand-alone systems and as integrated systems. The stand-alone system is designed to interface with other systems (e.g. power plant and thrusters) via conventional signal cables. Integrated systems are designed to communicate with other Kongsberg Maritime systems via a dual network (dual Ethernet LAN). Figure 3.1 shows a block diagram of the DP system, and illustrates the main functionality. Table 3.2 describes the different DP controller systems that Kongsberg Maritime delivers.


Figure 3.1: Block diagram of the DP system (adapted from Kongsberg Maritime AS 2003)

Table 3.2: Kongsberg Maritime’s DP systems (see footnote 7) (adapted from Kongsberg Maritime AS 2003)

DP system   Comment
DP-10       Stand-alone single DP control system
DP-11       Stand-alone single DP control system
DP-12       Integrated single DP control system
DP-21       Stand-alone dual-redundant DP control system
DP-22       Integrated dual-redundant DP control system
DP-31       Stand-alone triple-redundant DP control system
DP-32       Integrated triple-redundant DP control system

7 Combinations of the different DP systems satisfy the International Maritime Organization’s (IMO) DP class 1, 2 and 3.


3.3 CONFIGURATION OF MODULES

The AIM Safe systems and the DP systems are module-based configurations, where the main components of the logic subsystem are:

• Remote input/output module (RIO)
• Process control unit (RCU)
• Operator station (OS)
• Power
• Network

Figure 3.2 and Figure 3.3 illustrate the AIM Safe 1oo2d dual IO system configuration, for signal transmission of input signals from sensors and output signals to field devices respectively. The system consists of one OS, two RCU, two RIO, power, a field sensor (e.g. a pressure transmitter), a field device (e.g. a release valve) and a dual network (A and B net). The dual network is an Ethernet local area network (LAN).

Figure 3.2: Input signal for the AIM Safe 1oo2d dual IO system


Figure 3.3: Output signal for the AIM Safe 1oo2d dual IO system

Input signals are sent from the field sensors to both RIOs, and each RIO transmits the signal to its associated RCU. There is a 1oo2 redundancy where one RIO and its associated RCU must function correctly in order for the system to perform its designed function. The OS is connected to the system through a dual network, but it is not possible to activate the AIM Safe systems manually through the OS. The safety systems may only be activated manually with an emergency button in the operator room; thus the safety system is not affected by the OS, and the OS is therefore not considered in Kongsberg Maritime’s reliability calculations. For the DP systems the OS is used for communication between the system and the operator, hence it must be included in the availability calculations.

Output signals are sent from the RCUs to each of the RIOs, and from there to the field device. This gives a 1oo2 redundancy where one RCU and one RIO are required to function in order for the system to perform its designed function. Failure of one RIO and its associated RCU will not affect the system, except when the system is running as a single system. Therefore hot replacement is possible without shutting down the system.

Figure 3.4 illustrates a complete configuration of an AIM Safe 1oo2d dual IO system with several functions included. The dual network is based on a star topology, and the HUB and Term are included in the figure. One function includes the operation of one field device. Several field devices are connected to the network, each with a dual RIO. The green arrows represent the input signals, while the red arrows represent the output signals.


Figure 3.4: Input/output communication including HUB and Term for a function.

Figure 3.5 illustrates a DP-22 system for one thruster connected to the system. Redundancy is applied to the RCU and OS. The RIO is connected to the dual network and controls one thruster. Several thrusters are connected to the network, each thruster with one RIO.

Figure 3.5: Configuration for a SDP-22 system

Diagnostic testing is performed within the RCU using a watchdog (WD), and within the RIO using a final input stage test (FIST) and a final output stage test (FOST). If the WD detects a fault in the RCU, the other RCU is informed through a direct connection between the two RCUs, and only signals from the “healthy” RCU will be transmitted. The FIST/FOST test is a short input/output signal sent to the field sensor/field device. The signal tests whether the field sensor/field device responds, but will not activate the sensor/device. Kongsberg Maritime delivers several different types of RIOs and RCUs, depending on the function and environment of the system.

Detected failures are repaired immediately, with an MTTR of approximately 1 hour. Repair will mostly imply rebooting the modules, or replacement and start-up of new modules. The system is considered as a 1oo1 system during maintenance and repair actions, but as the MTTR is small Kongsberg Maritime does not consider these situations in its existing reliability/availability calculations.

The AIM Safe system works in low demand mode of operation. Frequent proof tests are therefore performed in order to reveal the possible DU failures that are not detected by the diagnostic tests in the different modules. The DP system is, on the other hand, a continuously operating system, and thus most functions will be run continuously or at high demand. For some critical functions that are not frequently used by the system, tests are performed approximately once per year from other integrated systems that are connected to the DP system. These tests will reveal the possible DU failures that are not detected by the diagnostic tests.

Figure 3.6 illustrates a DP-31 system with two I/O boards. The system is a stand-alone system including three RCUs (DP A, B and C), two HUBs and two Terms. A signal is sent from the sensor to the RCUs via termination boards (TB). The RCUs send a signal onto the network (via HUB) and to the nRIO (see footnote 8) that is connected to the specific thruster. One thruster is connected to one nRIO, thus failure of one nRIO will cause only one thruster to fail.
The system illustrated in Figure 3.6 is still under development.

8 nRIO (nanoRIO): RIO with fewer channels (8 channels)


Figure 3.6: DP-31

3.4 SPECIAL CONSIDERATIONS

In order to develop calculation models that are flexible and adapted to the relevant operational conditions for Kongsberg Maritime’s systems, some adjustments to the calculation models analyzed in chapter 2 must be made. The special considerations for Kongsberg Maritime’s systems that the calculation models must be adapted to are listed below:

• Customers often require a quantification of the control system’s mean time between system failures (MTBFS), which reflects the unavailability of the control system. Neither IEC 61508 nor the PDS method presents this measure.
• The DP system is a continuously operating system where a system failure at any time will cause a dangerous situation. The PDS method does not cover continuously operating systems, and the method presented in IEC 61508-6 is based on some assumptions that do not reflect the DP system.
• The DP systems are delivered as single, dual and triple redundant systems. The method presented in IEC 61508-6 does not include 1oo3 voting logic, and little information is given in the standard on the derivation of the formulas.
• For the triple redundant DP systems (DP-31 and DP-32), the PDS method and the IEC 61508-6 method will result in different modeling of common cause failures.
• The mean time to repair (MTTR) for all of Kongsberg Maritime’s components is considered to be as low as 1 hour.


4 EXISTING CALCULATION MODELS

Kongsberg Maritime would like a review of its models for quantification of the reliability of the AIM Safe solutions and the availability of the DP systems. Calculation models used in the AIM Safe projects must be in accordance with IEC 61508, in order to meet requirements from customers. Kongsberg Maritime is interested in using calculation models that give the best reflection of its systems, and is willing to look into other solutions than the calculation methods suggested in IEC 61508 in order to achieve this. It is of high importance that the reliability and availability calculations are based on models that reflect the reality of Kongsberg Maritime’s systems in the best possible manner. At present, customers do not commonly require documentation of the availability of the DP system. However, it may be required in the future, and Kongsberg Maritime would like to be prepared for this possibility.

Kongsberg Maritime bases its calculation models on the approximation formulas used in the PDS method handbook (Hokstad and Corneliussen 2003). The PFD for a single component (1oo1) is shown in Equation 1.33 and the PFD for a dual redundant component (1oo2) is shown in Equation 1.34.

$$PFD = \frac{\lambda_{DU} \cdot \tau}{2} \qquad (1.33)$$

$$PFD = \beta \cdot \frac{\lambda_{DU} \cdot \tau}{2} \qquad (1.34)$$

Kongsberg Maritime’s calculation models make use of the β-factor model as suggested in the older versions of the PDS method (Hansen and Vatn 1998) (the p-factor model described in section 2.1). As revealed in chapter 2, the PDS method bases its calculation models on the approximation formulas given in Table 2.3. Hence it is assumed that λDUτ is small enough for the approximation formulas to apply. Systematic failures have been analyzed, and a quantification of the rate of systematic failures has been derived in accordance with the PDS method.
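As a minimal sketch, Equations 1.33 and 1.34 can be evaluated as follows. The parameter values are hypothetical and not taken from Kongsberg Maritime’s data. The helper `sil_low_demand` is an illustrative addition that maps an average PFD to the low demand SIL bands of IEC 61508; it ignores the architectural constraints (SFF, hardware fault tolerance) that also limit the SIL that may be claimed.

```python
def pfd_1oo1(lambda_du, tau):
    """Equation 1.33: average PFD of a single (1oo1) component."""
    return lambda_du * tau / 2.0


def pfd_1oo2_ccf(beta, lambda_du, tau):
    """Equation 1.34: average PFD of a 1oo2 group, CCF contribution only."""
    return beta * lambda_du * tau / 2.0


def sil_low_demand(pfd):
    """Return the SIL band (1 to 4) for an average PFD, or 0 if above 1E-01."""
    for sil, upper in ((4, 1e-4), (3, 1e-3), (2, 1e-2), (1, 1e-1)):
        if pfd < upper:
            return sil
    return 0


# Hypothetical input values (illustration only).
lambda_du = 1.0e-6  # DU failure rate per hour
tau = 8760.0        # proof test interval in hours (one year)
beta = 0.05         # beta-factor for CCF

single = pfd_1oo1(lambda_du, tau)
dual = pfd_1oo2_ccf(beta, lambda_du, tau)
print(f"1oo1: PFD = {single:.2e}, SIL band {sil_low_demand(single)}")
print(f"1oo2: PFD = {dual:.2e}, SIL band {sil_low_demand(dual)}")
```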

4.1 POTENTIAL IMPROVEMENTS

As described in section 3.4, the MTTR for Kongsberg Maritime’s systems is assumed to be approximately one hour. Due to the small MTTR and failure rates for Kongsberg Maritime’s systems, several terms in the exact formulas used in the PDS method give little contribution to the PFD. They are therefore assumed negligible. For single components the PFD is represented by the probability of DU failures. For dual redundant components (1oo2), Kongsberg Maritime calculates the PFD due to dependent DU failures of both components (DU CCF). The probability of group failure due to independent DU failures of both channels is assumed negligible.

For Kongsberg Maritime’s systems, the probability of a group failure due to independent DU failures of both channels is very small. If one component has a DU failure and is being repaired, the probability of an independent DU failure occurring in the other component during the one hour of restoration time is very small. Hence the main contributor to the PFD of Kongsberg Maritime’s systems is CCF. However, the calculation model for dual redundant components in Equation 1.34 reflects only the CCF. If the MTTR is increased, the probability of independent DU failures in both channels will be higher and may not be assumed negligible. Thus it is recommended to include the probability of group failure due to independent DU failures in both channels.

Today, Kongsberg Maritime has requirements for quantification of reliability (presented as SIL) only from customers of the AIM Safe system. As for the SDP system, Kongsberg Maritime has to this date no requirements for quantification of the availability of the system. To this date, availability calculations for the DP systems have been performed using the PDS method, based on the same assumptions as for the AIM Safe system. It is in Kongsberg Maritime’s interest to improve the existing calculation models for the DP systems, such that the special features of the systems are reflected in the calculation models and the availability of the systems is presented in a realistic manner.
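To get a feeling for the size of the independent contribution, the sketch below compares the CCF term of Equation 1.34 with the commonly used textbook approximation for independent DU failures in a 1oo2 group, ((1 − β)·λDU·τ)²/3. That second expression is an assumption brought in for illustration and is not quoted from this section; all parameter values are hypothetical.

```python
def pfd_ccf(beta, lambda_du, tau):
    """CCF contribution for a 1oo2 group (Equation 1.34)."""
    return beta * lambda_du * tau / 2.0


def pfd_independent(beta, lambda_du, tau):
    """Assumed textbook approximation for independent DU failures in 1oo2."""
    return ((1.0 - beta) * lambda_du * tau) ** 2 / 3.0


# Hypothetical parameters (illustration only).
beta = 0.05
lambda_du = 1.0e-6  # per hour

for tau in (730.0, 4380.0, 8760.0):  # proof test interval in hours
    ccf = pfd_ccf(beta, lambda_du, tau)
    ind = pfd_independent(beta, lambda_du, tau)
    print(f"tau = {tau:6.0f} h: CCF = {ccf:.2e}, "
          f"independent = {ind:.2e}, ratio = {ind / ccf:.3f}")
```

The ratio grows linearly with λDU·τ, so whether the independent term can be ignored depends on the failure rate and the proof test interval of the component in question.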


5 IMPROVED CALCULATION MODELS

The main objective for Kongsberg Maritime is to document the reliability of its AIM Safe systems in accordance with IEC 61508. Table 5.1 presents the requirements given in IEC 61508-2 on what types of considerations must be taken into account when quantifying the reliability and availability of a system.

Table 5.1: Requirements for considerations when performing PFD calculations (IEC 61508-2)

Considerations when performing PFD calculations:

• The functional architecture of the system and subsystems
• The estimated rate of failure of each subsystem in any modes which would cause a dangerous failure of the E/E/PE safety system but which are detected by the diagnostic tests
• The estimated rate of failure of each subsystem in any modes which would cause a dangerous failure of the E/E/PE safety system but which are undetected by the diagnostic tests
• The susceptibility of the E/E/PE safety system to common cause failures
• The diagnostic coverage of the diagnostic tests
• The intervals at which proof tests are undertaken to reveal dangerous faults which are undetected by diagnostic tests
• The repair times for detected failures (includes repair time, the time taken to detect the fault and any time period during which repair is not possible)

The analysis of the PDS method and the IEC 61508-6 method in chapter 2 has shown that the requirements in Table 5.1 are met by both the PDS method and the model presented by IEC for calculating the PFD of a system. Both the IEC 61508-6 method and the PDS method are applicable for the AIM Safe systems, but only the method presented in IEC 61508-6 includes the availability calculations for the DP systems. For the AIM Safe systems, there is therefore a choice between the PDS method and the method presented in IEC 61508-6. Based on the results in chapters 2 and 3, the PDS method is recommended for the reliability quantification of Kongsberg Maritime’s AIM Safe systems. The PDS method is well known to the safety engineers within the company today, and the PDS method handbook presents good information about the derivation and use of the models. Little information about the derivation and use of the formulas in the IEC 61508-6 method is given in the standard or found elsewhere.

In order for the calculation models presented in the PDS method and the IEC 61508-6 method to be applicable for the AIM Safe systems and DP systems, it is important to investigate whether the assumptions hold for Kongsberg Maritime’s systems. The assumptions were listed in section 2.6, and are repeated below:

• The components have exponentially distributed lifetimes (failure rates are constant)
• The components in a system are identical (same failure rate and DC) and proof tested at equal test intervals
• λDUτ is small enough so that the approximation formulas in Table 2.3 apply
• The time from detection of a failure (either by diagnostic tests or at proof test) until repair actions begin is negligible
• Perfect proof tests and repairs
• The failure rates are represented as total failure rates including CCF

Figure 5.1 and Figure 5.2 illustrate the effect of assuming that λDUτ is small enough for the approximation formulas to apply. Figure 5.1 illustrates the PFD average for a single component with DU failure rate 0.7E-05 per hour, and Figure 5.2 illustrates the PFD average for a single component with DU failure rate 0.7E-04 per hour, both for different proof test intervals, τ. The red graph is the PFD average calculated for different proof test intervals with the PFD approximation formula shown in Equation 1.35 (this approximation formula is introduced in section 2.2).

$$PFD \approx \frac{\lambda_{DU}\,\tau}{2} \qquad (1.35)$$

The yellow graph presents the PFD average calculated for different proof test intervals with the exact PFD formula given in Equation 1.36 (this formula is also introduced in section 2.2).

$$PFD = 1 - \frac{1}{\tau}\int_{0}^{\tau} R(t)\,dt = 1 - \frac{1}{\tau}\int_{0}^{\tau} e^{-\lambda_{DU} t}\,dt = 1 - \frac{1}{\lambda_{DU}\,\tau}\left(1 - e^{-\lambda_{DU}\tau}\right) \qquad (1.36)$$

For the failure rate used in Figure 5.1, the approximation formula gives slightly more conservative values than the exact formula when the proof test interval is large, but it remains a good approximation of the exact formula. For the failure rate used in Figure 5.2, the approximation formula is a good approximation for small proof test intervals, but is too conservative for larger test intervals. For example, for a proof test interval of one year the difference is 5.41E-02 in PFD average. The DU failure rate presented in Figure 5.1 is in the range of Kongsberg Maritime’s components with the lowest reliability, and the components are not proof tested at intervals greater than one year (8760 hours). Hence the approximation formulae apply for all proof test intervals and DU failure rates used when calculating the reliability/availability of Kongsberg Maritime’s systems. This shows that the approximation formulae in IEC 61508-6 and the PDS method are applicable to all components in Kongsberg Maritime’s systems.
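The comparison behind Figure 5.1 and Figure 5.2 can be reproduced with a short sketch of Equations 1.35 and 1.36. The two failure rates are the ones stated for the figures; the selection of proof test intervals printed below is an arbitrary choice for illustration.

```python
import math


def pfd_approx(lambda_du, tau):
    """Equation 1.35: PFD average approximated as lambda_DU * tau / 2."""
    return lambda_du * tau / 2.0


def pfd_exact(lambda_du, tau):
    """Equation 1.36: 1 - (1 - exp(-lambda_DU * tau)) / (lambda_DU * tau)."""
    x = lambda_du * tau
    # expm1(-x) = exp(-x) - 1, which is numerically stable for small x.
    return 1.0 + math.expm1(-x) / x


for lambda_du in (0.7e-5, 0.7e-4):       # DU failure rates per hour
    for tau in (2000.0, 8760.0, 12000.0):  # proof test intervals in hours
        approx = pfd_approx(lambda_du, tau)
        exact = pfd_exact(lambda_du, tau)
        print(f"lambda_DU = {lambda_du:.1e}, tau = {tau:7.0f} h: "
              f"approx = {approx:.3e}, exact = {exact:.3e}, "
              f"difference = {approx - exact:.2e}")
```

For the higher failure rate and a proof test interval of 8760 hours, the difference printed by this sketch agrees with the 5.41E-02 quoted above.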


[Plot “Exact vs. approximated value”: PFD average (0.00E+00 to 4.00E-02) versus proof test interval in hours (0 to 12000); curves: PFD approximation formula, PFD exact formula]

Figure 5.1: PFD average for a component with DU failure rate 0.7E-05 per hour.

[Plot “Exact vs. approximated value”: PFD average (0.00E+00 to 4.00E-01) versus proof test interval in hours (0 to 12000); curves: PFD approximation formula, PFD exact formula]

Figure 5.2: PFD average for a component with DU failure rate 0.7E-04 per hour.


5.1 AIM SAFE SYSTEMS

Calculation models for the AIM Safe systems are based on the system configuration presented in chapter 3. The AIM Safe 1oo2d dual IO redundancy system may perform several functions, where one function is defined as the communication from a specific sensor subsystem to a specific field subsystem. The reliability block diagram in Figure 5.3 illustrates one function for the AIM Safe 1oo2d dual IO redundancy system. Throughout this report a function is referred to as a system. The reliability block diagram is split into three subsystems, where one subsystem consists of a dual redundant power component, the second subsystem consists of dual redundant RIO, RCU, HUB and Term components, and the last subsystem consists of a dual redundant RIO component. A reliability block diagram is a success-oriented network, where the components in at least one path through the structure diagram must function in order for the system to perform its designed function.

Figure 5.3: Reliability block diagram for an AIM Safe 1oo2d dual IO redundancy system
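The success-oriented logic of the reliability block diagram can be sketched as a structure function. The component names and the dictionary layout are hypothetical; the grouping into three subsystems follows the description of Figure 5.3, with each channel of the second subsystem modeled as RIO, RCU, HUB and Term in series, which is an interpretation of the text and not a reproduction of the figure.

```python
# Components assumed to be in series within one channel of subsystem 2.
LOGIC_CHANNEL = ("rio_in", "rcu", "hub", "term")
CHANNELS = ("A", "B")


def all_up():
    """Return a state dictionary in which every component functions."""
    names = ("power", "rio_out") + LOGIC_CHANNEL
    return {f"{name}_{ch}": True for name in names for ch in CHANNELS}


def system_works(up):
    """Return True if at least one path through the block diagram works.

    `up` maps names such as "rcu_A" to True (functioning) or False (failed).
    """
    power_ok = any(up[f"power_{ch}"] for ch in CHANNELS)
    logic_ok = any(
        all(up[f"{name}_{ch}"] for name in LOGIC_CHANNEL) for ch in CHANNELS
    )
    output_ok = any(up[f"rio_out_{ch}"] for ch in CHANNELS)
    return power_ok and logic_ok and output_ok


state = all_up()
state["rcu_A"] = False   # channel A is down, channel B still provides a path
print(system_works(state))
state["hub_B"] = False   # channel B is now also broken, no path remains
print(system_works(state))
```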

The system may fail to perform its designed task when one or more of the subsystems fail. Each subsystem may fail when one or more components from each channel fail due to random hardware failure, CCF or systematic failures. The system will perform as a single system when one or more components in one channel fail due to degradation. Detected failures are repaired when detected, with an MTTR of one hour, and will not affect the other channel. Hence the probability of a system failure caused by degradation failure of one or more components from each channel depends on the coverage of the diagnostic tests.

Each channel is powered by at least one power supply; if one power supply fails, the other is capable of feeding both channels. For system solutions where one channel has only one power supply, the failure rate of the power supply will increase when the load is doubled. The diagnostic coverage and safe failure fraction of the power supplies are very high, thus the probability of losing power to both channels is very small.

Kongsberg Maritime’s AIM Safe systems are either 1oo2 systems or single systems. Hence there will be no difference between the β-factor model presented in IEC 61508 and the β-factor model presented in the PDS method. It is assumed that a system failure is caused by one of the following combinations:

• both power supplies fail
• both RIO input channels fail
• both RCUs fail
• both SPHUBs fail
• both SPTerms fail
• both RIO output channels fail

Hence, the following assumptions are made for the calculation models for the AIM Safe systems:

• The probability of a system failure due to independent dangerous failures in two or more different components, from different channels in one subsystem, is very small. Due to the low MTTR for the AIM Safe systems this probability is considered negligible.
• If one power fails, the other will continue with the same failure rate
• The OS and the dual communication net have no effect on the reliability of the system
• All assumptions listed in section 2.6 apply to the AIM Safe systems

When the assumptions above are applied, the reliability block diagram of an AIM Safe 1oo2d dual IO redundancy system may be drawn as illustrated in Figure 5.4. The system is split into six dual redundant groups.

Figure 5.4: Reliability block diagram for an AIM Safe 1oo2d dual IO redundancy system, when assumptions are applied

$$\mathrm{PFD}_{system} = \sum_{n=1}^{6} \mathrm{PFD}_n \tag{1.37}$$

The PFD is calculated as the sum of the PFD for each dual redundant group (see Equation 1.37). The PFD calculation model suggested in the PDS method for a 1oo2 system thus applies to each of the dual redundant groups in Figure 5.4:

$$\mathrm{PFD} = \mathrm{PFD}_{UK} + \mathrm{PFD}_{K} \tag{1.38}$$

Hence the calculation models for reliability of the AIM Safe 1oo2 systems are:

$$\mathrm{PFD}_{UK} = \frac{\beta \cdot \lambda_{DU} \cdot \tau}{2} + \frac{\left((1-\beta) \cdot \lambda_{DU} \cdot \tau\right)^{2}}{3} + 2 \cdot (1-\beta) \cdot \lambda_{det} \cdot \mathrm{MTTR} \cdot \frac{\lambda_{DU} \cdot \tau}{2} \tag{1.39}$$

PFDK = β ⋅ λD ⋅ MTTR + ( (1 − β ) ⋅ λDU ) ⋅τ ⋅ MTTR ⋅ λDD ⋅ MTTR 2

(1.40)

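Due to the low MTTR for the AIM Safe systems, the terms containing MTTR are negligible, and Equations 1.39 and 1.40 are approximated by:

$$\mathrm{PFD} = \frac{\beta \cdot \lambda_{DU} \cdot \tau}{2} + \frac{\left((1-\beta) \cdot \lambda_{DU} \cdot \tau\right)^{2}}{3} \tag{1.41}$$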
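The approximation for a dual redundant group can be written as a small function. The sketch below is only an illustration of the formula with the CCF term and the independent term kept apart; the function name `pfd_1oo2` and the sample values are assumptions made here, not part of the calculation tool.

```python
def pfd_1oo2(beta: float, lambda_du: float, tau: float) -> float:
    """Approximate PFD of a 1oo2 group: CCF term plus independent DU term."""
    ccf = beta * lambda_du * tau / 2
    independent = ((1 - beta) * lambda_du * tau) ** 2 / 3
    return ccf + independent


# Hypothetical sample values: beta = 0.1, lambda_DU = 3.17E-07 per hour,
# proof test interval tau = 2190 hours.
beta, lambda_du, tau = 0.1, 3.17e-7, 2190.0
pfd = pfd_1oo2(beta, lambda_du, tau)
ccf_share = (beta * lambda_du * tau / 2) / pfd
print(f"PFD = {pfd:.3e}, share caused by CCF = {ccf_share:.1%}")
```

With values of this order the CCF term dominates, which is why the choice of β has a much larger effect on the result than the independent term.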

Equation 1.41 represents the probability of failure of the system on demand due to CCF and independent DU failures in both channels.

Example 5.1: AIM Safe 1oo2d dual IO redundancy, function RDI2001 and RDO2000

Table 5.2 shows the input factors for the AIM Safe 1oo2d dual IO redundancy system described in the reliability block diagram in Figure 5.4.

Table 5.2: Input data for an AIM Safe 1oo2d dual IO redundancy system

| Common description | Voting logic (koon) | Failure rate λ [failures/hour] | DU failure rate λDU [failures/hour] | Diagnostic coverage DC | Beta factor β | Proof test interval τ [hours] |
|---|---|---|---|---|---|---|
| IO card | 1oo1 | 5.26E-06 | 2.63E-07 | 0.9 | 1 | 4380 |
| Bus-Hub | 1oo2 | 5.16E-06 | 5.16E-07 | 0.8 | 0.02 | 5 |
| Bus-term. card | 1oo2 | 6.63E-06 | 6.63E-07 | 0.8 | 0.02 | 5 |
| Computer | 1oo2 | 6.34E-06 | 3.17E-07 | 0.9 | 0.1 | 2190 |
| Power | 1oo2 | 4.91E-07 | 4.91E-08 | 0.8 | 0.02 | 4380 |
| IO card | 1oo1 | 5.26E-06 | 2.63E-07 | 0.9 | 0.1 | 5 |

$$\mathrm{PFD} = \sum_{n=1}^{6} \mathrm{PFD}_n = 6.14 \cdot 10^{-4} \tag{1.42}$$
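The sum in Example 5.1 can be checked with a few lines of Python. This is a sketch under two assumptions made here: the 1oo1 rows are evaluated with the single system formula λDU·τ/2, and the 1oo2 rows with the approximation in Equation 1.41. The names `GROUPS` and `group_pfd` are illustrative only.

```python
# Rows of Table 5.2: (description, voting, lambda_DU [1/h], beta, tau [h])
GROUPS = [
    ("IO card", "1oo1", 2.63e-7, 1.0, 4380.0),
    ("Bus-Hub", "1oo2", 5.16e-7, 0.02, 5.0),
    ("Bus-term. card", "1oo2", 6.63e-7, 0.02, 5.0),
    ("Computer", "1oo2", 3.17e-7, 0.1, 2190.0),
    ("Power", "1oo2", 4.91e-8, 0.02, 4380.0),
    ("IO card", "1oo1", 2.63e-7, 0.1, 5.0),
]


def group_pfd(voting: str, lambda_du: float, beta: float, tau: float) -> float:
    """PFD of one group in the series structure of Figure 5.4."""
    if voting == "1oo1":
        return lambda_du * tau / 2
    if voting == "1oo2":
        return beta * lambda_du * tau / 2 + ((1 - beta) * lambda_du * tau) ** 2 / 3
    raise ValueError(f"unsupported voting logic: {voting}")


# Series structure: the group contributions are added (Equation 1.37).
pfd_system = sum(group_pfd(v, l, b, t) for _, v, l, b, t in GROUPS)
print(f"PFD of the function: {pfd_system:.2e}")
```

Under these assumptions the sum rounds to the value given in Equation 1.42, and the first IO card (proof tested every 4380 hours) contributes most of it.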

When a demand for the function arises, the average probability that the system will not respond adequately is PFD = 0.000614, meaning that the system will fail to respond to 1 out of approximately 1630 demands. In other words, the system will be unavailable as a safety barrier approximately 0.0614 % of the operation time, or 5.4 hours per year if the system is in continuous operation (assuming 8760 hours per year). The logic subsystem may use a maximum of 15 % of the PFD for the total system, including the sensor subsystem and final element subsystem. Equation 1.43 shows that the AIM Safe 1oo2d dual IO system satisfies SIL2 when used in a complete safety system, since the resulting total PFD lies within the SIL2 range for low demand mode (between 1E-03 and 1E-02).

$$\mathrm{PFD} = 0.15 \cdot \mathrm{PFD}_{tot} \;\Rightarrow\; \mathrm{PFD}_{tot} = \frac{1}{0.15} \cdot \mathrm{PFD} = 4.10 \cdot 10^{-3} \tag{1.43}$$

5.1.1 Maintenance and repair

During maintenance and repair actions for the system described in Figure 5.2, one or more components in a redundant group will not be available to the system and the affected redundant group will function as a single system. The calculation models for PFD of single systems, as suggested in the PDS method, are shown in Equations 1.44 and 1.45:

$$\mathrm{PFD}_{UK} = \frac{\lambda_{DU} \cdot \tau}{2} \tag{1.44}$$

$$\mathrm{PFD}_{K} = \lambda_{D} \cdot \mathrm{MTTR} \tag{1.45}$$

Due to the small MTTR of the AIM Safe systems, Equations 1.44 and 1.45 are approximated by:

$$\mathrm{PFD} = \frac{\lambda_{DU} \cdot \tau}{2} \tag{1.46}$$

Equation 1.46 represents the probability that the system fails to respond on demand due to a DU failure in the component. The probability of a demand occurring while the component is being repaired for a DD failure is considered negligible, due to the low MTTR.

Example 5.2: The AIM Safe 1oo2d dual IO redundancy system, one RCU is being repaired

When the system is being repaired for a failure in the RCU, the redundant group of RCU components will function as a single system. This will also cause the HUB, Term and power redundant groups to be reduced to single groups; hence all redundant groups are reduced to single groups. The PFD for each component is calculated using Equation 1.46, and the total PFD of the system can now be calculated as:

$$\mathrm{PFD} = \sum_{n=1}^{6} \mathrm{PFD}_n = 1.03 \cdot 10^{-3} \tag{1.47}$$
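The degraded case in Example 5.2 follows from applying the single system formula to every group. The sketch below assumes the λDU and τ values of Table 5.2 for all six groups; the name `DEGRADED_GROUPS` is illustrative only.

```python
# (description, lambda_DU [1/h], tau [h]) for each group, values from Table 5.2.
DEGRADED_GROUPS = [
    ("IO card", 2.63e-7, 4380.0),
    ("Bus-Hub", 5.16e-7, 5.0),
    ("Bus-term. card", 6.63e-7, 5.0),
    ("Computer", 3.17e-7, 2190.0),
    ("Power", 4.91e-8, 4380.0),
    ("IO card", 2.63e-7, 5.0),
]

# Every group acts as a single (1oo1) system while the RCU is being repaired.
pfd_degraded = sum(lambda_du * tau / 2 for _, lambda_du, tau in DEGRADED_GROUPS)
print(f"PFD during repair: {pfd_degraded:.2e}")
```

The increase relative to the redundant case comes almost entirely from the Computer and Power groups, whose long proof test intervals are no longer offset by redundancy.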

Equation 1.47 shows that when a function of the AIM Safe 1oo2d dual IO system is being repaired or maintained, and is hence reduced to a series of single subsystems, the PFD of the system increases from 6.15E-04 to 1.03E-03.

5.1.2 Comparison of the AIM Safe systems

Table 5.3 shows the PFD for different types of the AIM Safe systems, where the input data are as given in Table 5.2 (except for the different voting logics, and the β-factor being equal to 1.0 for single systems).

Table 5.3: PFD for different AIM Safe systems

| AIM Safe system | PFD of function | PFDtot | SIL |
|---|---|---|---|
| 1oo2 dual I/O | 9.51E-05 | 6.34E-04 | 3 |
| 1oo2 shared I/O | 6.15E-04 | 4.10E-03 | 2 |
| 1oo1 single I/O | 1.03E-03 | 6.89E-03 | 2 |
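The step from the PFD of a function to a SIL in Table 5.3 can be sketched as follows. The 15 % allocation to the logic subsystem is taken from the text above, and the SIL bands are the low demand mode limits of IEC 61508-1; the function names are hypothetical. Because the inputs are the rounded table values, a computed PFDtot may differ from the table in the last digit.

```python
# Low demand mode SIL bands (IEC 61508-1): lower bound inclusive, upper exclusive.
SIL_BANDS = {4: (1e-5, 1e-4), 3: (1e-4, 1e-3), 2: (1e-3, 1e-2), 1: (1e-2, 1e-1)}


def total_pfd(pfd_logic: float, logic_share: float = 0.15) -> float:
    """Total PFD implied when the logic subsystem may use `logic_share` of it."""
    return pfd_logic / logic_share


def sil_for(pfd_tot: float):
    """Return the SIL whose band contains pfd_tot, or None if outside all bands."""
    for sil, (low, high) in SIL_BANDS.items():
        if low <= pfd_tot < high:
            return sil
    return None


for name, pfd in [("1oo2 dual I/O", 9.51e-5),
                  ("1oo2 shared I/O", 6.15e-4),
                  ("1oo1 single I/O", 1.03e-3)]:
    pfd_tot = total_pfd(pfd)
    print(f"{name}: PFDtot = {pfd_tot:.2e}, SIL {sil_for(pfd_tot)}")
```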

The recommended calculation models for Kongsberg Maritime’s safety systems are rewritten and collected in Table 5.4. It is important to notice that the recommended formulas are approximation formulas, and only apply under the assumptions described in this chapter.

Table 5.4: Recommended PFD formulas for Kongsberg Maritime's AIM Safe systems

| System | PFD formula |
|---|---|
| 1oo1 | $\mathrm{PFD} = \dfrac{\lambda_{DU} \cdot \tau}{2}$ |
| 1oo2 | $\mathrm{PFD} = \dfrac{\beta \cdot \lambda_{DU} \cdot \tau}{2} + \dfrac{\left((1-\beta) \cdot \lambda_{DU} \cdot \tau\right)^{2}}{3}$ |

5.2 DYNAMIC POSITIONING SYSTEMS

The reliability block diagram of a DP-31 system is shown in Figure 5.5. The figure is based on the configuration information given in chapter 3, and the assumptions made for the AIM Safe systems are also applied here. Hence the reliability block diagram consists of several redundant or single groups connected in a series configuration.

Figure 5.5: Reliability block diagram of a DP-31 system

Kongsberg Maritime’s DP systems are delivered as single, dual redundant and triple redundant configurations. The triple redundant configurations include three RCUs (DP A, B and C), but a dual redundant solution for the network (Hub and term) as illustrated in fig. xx. The dual redundant configurations include two RCUs (DP A and B), with a dual redundant solution for the network (Hub and term). The probability of system failure due to loss of power is extremely low, because one power is capable of supplying more than one module. In addition, the failure of one power will not have a significant impact on the failure rate of the other powers. In reality the system is supplied by several powers, but these are shown as two redundant components in order to simplify the reliability block diagram. As a result, the PFD for the powers will be slightly conservative.

As for the AIM Safe systems, the modules in the DP systems have diagnostic testing that will reveal a high percentage of the total dangerous failures. For the DU failures, the time to detection is strongly dependent on the operator of the vessel. If the operator is well-trained, most failures undetected by the diagnostic tests will be revealed by the operator approximately when they occur. A small number of functions in the DP systems are not activated as often as the normal functions. Failures in these functions may not be revealed before there is a demand for the function, or before the failures are detected by a proof test. Proof tests of these functions are claimed to be performed when external systems that are connected to the DP system are tested.

The DP system is a complex system that is not easily described in a calculation model. Some assumptions are therefore made in order to develop calculation models describing the availability of the system:

• DD failures of all channels at once (DD CCF) will give system failure
• The probability of independent DD failures of all channels at once is considered negligible, due to the low MTTR
• All components will have a proof test interval of one year

The availability calculations for the DP systems are considered conservative, due to the assumption that all components have a proof test interval of one year (thus disregarding the possibility for a well-trained operator to detect dangerous failures that have not been detected by the diagnostic tests).

The AIM Safe systems are delivered with at most a dual redundant configuration; hence the modeling of CCF as suggested in the PDS method and in the method presented in IEC 61508-6 will be equal. The DP systems on the other hand are delivered as single, dual and triple redundant solutions; hence the choice of method will affect the treatment of CCF. As revealed when analyzing the different methods in chapter 2, the extended β-factor model suggested by the PDS method makes it possible to compare identical systems with different voting logics. Hence, the modification factor used in the PDS method is recommended for the quantification of CCF for the DP systems. However, the PDS method applies to systems operating in low demand mode, and the DP systems are defined as continuously operating systems.

As described in section 2.2.3, the method presented in IEC 61508-6 also suggests methods for calculating the failure frequency of continuous systems, called the probability of a dangerous failure per hour (PFH). The method presented in IEC 61508-6 suggests calculation models based on the reliability block diagram technique. These models do however make some assumptions that are not applicable to the DP systems. Several experts have investigated the use of different reliability analysis techniques, and state that the Markov technique holds the greatest modeling power (e.g. Rouvroye and Bliek (2002), Knegtering and Brombacher (2000) and Bukowski and Goble (1995)). Both the PDS method and the method presented in IEC 61508-6 are based on the reliability block diagram, and thus it is of interest to investigate whether the Markov technique may give a more realistic reflection of the DP systems.

5.2.1 Markov models

The Markov technique is more extensive than other reliability analysis techniques, but gives more flexibility and is thus able to give a more realistic description of the system. When the system becomes large and complex, the number of Markov states will be large and the calculation models difficult to follow. These situations are often solved by splitting the system into smaller independent modules (Knegtering and Brombacher (2000) and Rausand and Høyland (2004)). The system must satisfy the following criteria in order to be analyzed using the Markov technique (Rausand and Høyland 2004 p. 303):

• When the present state of the system is known, the future development of the system is independent of anything that has happened in the past (the components are assumed to have exponentially distributed lifetimes)
• The probability of a transition from one state to another does not depend on the global time but only on the time interval available for the transition (a system with stationary transition probabilities, thus a time-homogeneous system)

A Markov state transition diagram for a single component is given in Figure 5.6, and the two states are defined in Table 5.5.

[Figure: state 0 goes to state 1 with rate λ; state 1 returns to state 0 with rate µ]

Figure 5.6: State transition diagram for a single component

Table 5.5: System state definitions

| System state | System state definition |
|---|---|
| 0 | System functioning as 1oo1 |
| 1 | System failure |

The state transition diagram presents the states of the system as 0 (component is functioning as designed) and 1 (component has failed). Transitions between the two states are represented by arrows. λ represents the transition rate from state 0 to state 1, in this case the failure rate of the component. µ represents the transition rate from state 1 to state 0, which is the repair rate of the component. The corresponding transition rate matrix is thus:

$$A = \begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix} = \begin{bmatrix} -\lambda & \lambda \\ \mu & -\mu \end{bmatrix} \tag{1.48}$$

where a_ij denotes the transition rate from state i to state j. Assuming that the component approaches a steady state when it has been in operation for a long time (t → ∞), the probability of the component being in state j is called the steady state probability for state j (Pj). The steady state equations may be calculated based on the rules:

$$P \cdot A = 0 \quad \text{and} \quad \sum_{j=0}^{r} P_j = 1$$

Steady state equations:

$$-\lambda \cdot P_0 + \mu \cdot P_1 = 0$$

$$\lambda \cdot P_0 - \mu \cdot P_1 = 0$$

$$P_0 + P_1 = 1$$

Hence the steady state probabilities for state 0 and 1 are:

$$P_0 = \frac{\mu}{\lambda + \mu}, \qquad P_1 = \frac{\lambda}{\lambda + \mu}$$

The probability of a dangerous failure per hour for the component is thus:

$$\mathrm{PFH} = \sum_{j \in B} \sum_{k \in F} P_j \cdot a_{jk} = P_0 \cdot a_{01} = \frac{\mu}{\lambda + \mu} \cdot \lambda \approx \lambda, \quad \lambda \ll \mu \tag{1.49}$$

where B represents the functioning system states (in this case state 0), and F represents the failed states (in this case state 1). IEC 61508-6 presents the PFH for a single component as λDU, assuming that the component will put the EUC into a safe state upon detection of a dangerous failure. For the DP system, which is operated continuously, a DD failure of a single component will be as dangerous as a DU failure. Hence the PFH for a single component in the DP system should be:

$$\mathrm{PFH} = \lambda_{DU} + \lambda_{DD} \tag{1.50}$$
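The steady state solution for the single component can be checked numerically. The sketch below uses the closed-form probabilities derived above and verifies that they satisfy P·A = 0; the failure rate is a hypothetical value, while µ = 1 per hour corresponds to the MTTR of one hour stated for the AIM Safe systems.

```python
def steady_state_two_state(lam: float, mu: float):
    """Steady state probabilities (P0, P1) of the two-state Markov model."""
    p0 = mu / (lam + mu)
    p1 = lam / (lam + mu)
    return p0, p1


def pfh_single(lam: float, mu: float) -> float:
    """Frequency of transitions from the functioning state to the failed state."""
    p0, _ = steady_state_two_state(lam, mu)
    return p0 * lam


lam = 1.0e-5   # hypothetical dangerous failure rate [1/h]
mu = 1.0       # repair rate [1/h], i.e. MTTR = 1 hour
p0, p1 = steady_state_two_state(lam, mu)

# Both balance equations of P * A = 0 should vanish (up to rounding).
residuals = (-lam * p0 + mu * p1, lam * p0 - mu * p1)
print(f"P0 = {p0:.6f}, P1 = {p1:.2e}, PFH = {pfh_single(lam, mu):.4e}")
```

Because λ is several orders of magnitude smaller than µ, the computed PFH is practically equal to λ, which is the approximation used in Equation 1.49.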

According to IEC 61508-6 the MDT of a channel (in this case the component) is:

$$\mathrm{MDT}_{channel} = \frac{\lambda_{DU}}{\lambda_{D}} \left( \frac{\tau}{2} + \mathrm{MTTR} \right) + \frac{\lambda_{DD}}{\lambda_{D}} \cdot \mathrm{MTTR} \tag{1.51}$$

where a DU failure in a single system is assumed to occur, on average, halfway through the proof test interval. The derivation of MDT is thoroughly discussed in section 2.2.1. Due to the low MTTR of Kongsberg Maritime’s systems, the MDT is approximated by:

$$\mathrm{MDT} \approx \frac{\lambda_{DU}}{\lambda_{D}} \cdot \frac{\tau}{2} \tag{1.52}$$

The mean time between system failures (MTBF) is then (Rausand and Høyland 2004):

$$\mathrm{MTBF}_{s} = \frac{1}{\mathrm{PFH}} = \frac{1}{\lambda_{D}} \tag{1.53}$$
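The effect of neglecting the MTTR can be illustrated with a short calculation. In this sketch λD is taken as λDU + λDD, and the sample values (λDU = 1.75E-06, λDD = 7.0E-06, τ = 8760 hours, MTTR = 1 hour) are assumptions chosen to resemble the DP input data used later in this chapter.

```python
def mdt_channel(lambda_du: float, lambda_dd: float, tau: float, mttr: float) -> float:
    """Mean down time of a channel according to Equation 1.51."""
    lambda_d = lambda_du + lambda_dd
    return (lambda_du / lambda_d) * (tau / 2 + mttr) + (lambda_dd / lambda_d) * mttr


def mdt_approx(lambda_du: float, lambda_dd: float, tau: float) -> float:
    """Low-MTTR approximation according to Equation 1.52."""
    return lambda_du / (lambda_du + lambda_dd) * tau / 2


exact = mdt_channel(1.75e-6, 7.0e-6, 8760.0, 1.0)
approx = mdt_approx(1.75e-6, 7.0e-6, 8760.0)
print(f"exact = {exact:.1f} h, approximation = {approx:.1f} h")
```

Algebraically the two expressions differ by exactly one MTTR, so with a repair time of one hour the approximation changes the result by about 0.1 % for these values.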

Following the same procedure as for the single components, PFH calculation models for dual redundant and triple redundant DP components may be derived using the Markov technique. The derivation of the formulas is given in appendix A. Table 5.6 shows the result for redundant independent and identical components, and Table 5.7 shows the result where the probability of CCF is included.

Table 5.6: PFH for 1oo2 and 1oo3 systems for independent components

| System | PFH |
|---|---|
| 1oo2 | $\mathrm{PFH} = 2 \cdot \lambda_{D}^{2} \cdot \dfrac{1}{\mu} = 2 \cdot \lambda_{D}^{2} \cdot \mathrm{MDT}_{channel}$ |
| 1oo3 | $\mathrm{PFH} = 3 \cdot \lambda_{D}^{3} \cdot \dfrac{1}{\mu^{2}} = 3 \cdot \lambda_{D}^{3} \cdot \mathrm{MDT}_{channel}^{2}$ |
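The approximations for independent components can be compared with an exact steady state calculation. The sketch below is not the derivation in appendix A; it assumes that each component is repaired independently with rate µ, in which case the number of failed components follows a binomial distribution in steady state. The rates are hypothetical.

```python
from math import comb


def pfh_markov(n: int, lam: float, mu: float) -> float:
    """Steady state frequency of entering the state where all n components
    of a 1oon group are down, assuming independent repair of each component."""
    q = lam / (lam + mu)          # steady state unavailability of one component
    p = 1.0 - q                   # steady state availability of one component
    # Probability that exactly n - 1 components are down ...
    p_one_left = comb(n, n - 1) * q ** (n - 1) * p
    # ... times the failure rate of the last functioning component.
    return p_one_left * lam


def pfh_approx(n: int, lam: float, mu: float) -> float:
    """Approximation n * lam^n / mu^(n-1), valid when lam is much smaller than mu."""
    return n * lam ** n / mu ** (n - 1)


lam, mu = 1.0e-5, 1.0e-2   # hypothetical rates; MDT = 1/mu = 100 hours
for n in (2, 3):
    print(n, f"{pfh_markov(n, lam, mu):.4e}", f"{pfh_approx(n, lam, mu):.4e}")
```

For these rates the approximations deviate from the exact values by well under one percent, and the deviation shrinks further as λ/µ decreases.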

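Table 5.7: PFH for 1oo2 and 1oo3 systems for dependent components

| System | PFH |
|---|---|
| 1oo2 | $\mathrm{PFH} = 2 \cdot \left((1-\beta_{D}) \cdot \lambda_{DD} + (1-\beta) \cdot \lambda_{DU}\right)^{2} \cdot \mathrm{MDT}_{channel} + \beta_{D} \lambda_{DD} + \beta \lambda_{DU}$ |
| 1oo3 | $\mathrm{PFH} = 3 \cdot \left((1-1.7 \cdot \beta_{D}) \cdot \lambda_{DD} + (1-1.7 \cdot \beta) \cdot \lambda_{DU}\right)^{3} \cdot \mathrm{MDT}_{channel}^{2} + 0.3 \cdot \beta_{D} \lambda_{DD} + 0.3 \cdot \beta \lambda_{DU}$ |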

The probability of CCF for the dual and triple redundant systems is included as suggested in the method presented in IEC 61508-6. However, the β-factor for the triple redundant components is multiplied by the modification factor suggested in the PDS method (see Table 2.1). Recommended approximation formulas for the DP systems are shown in Table 5.8. They assume a low MTTR (hence the terms multiplied by MTTR are negligible), and that the probability of group failure due to independent DD failures in all channels is negligible.

Table 5.8: Availability measurements for 1oo2 and 1oo3 systems

1oo2, frequency of system failures:

$$\mathrm{PFH} = 2 \cdot \left((1-\beta) \cdot \lambda_{DU}\right)^{2} \cdot \frac{\lambda_{DU}}{\lambda_{D}} \cdot \frac{\tau}{2} + \beta_{D} \lambda_{DD} + \beta \lambda_{DU}$$

1oo2, mean time between system failures:

$$\mathrm{MTBF}_{S} = \frac{1}{\mathrm{PFH}} = \left( 2 \cdot \left((1-\beta) \cdot \lambda_{DU}\right)^{2} \cdot \mathrm{MDT}_{channel} + \beta_{D} \lambda_{DD} + \beta \lambda_{DU} \right)^{-1}$$

1oo3, frequency of system failures:

$$\mathrm{PFH} = 3 \cdot \left((1-1.7 \cdot \beta) \cdot \lambda_{DU}\right)^{3} \cdot \left( \frac{\lambda_{DU}}{\lambda_{D}} \cdot \frac{\tau}{2} \right)^{2} + 0.3 \cdot \beta_{D} \lambda_{DD} + 0.3 \cdot \beta \lambda_{DU}$$

1oo3, mean time between system failures:

$$\mathrm{MTBF}_{S} = \frac{1}{\mathrm{PFH}} = \left( 3 \cdot \left((1-1.7 \cdot \beta) \cdot \lambda_{DU}\right)^{3} \cdot \mathrm{MDT}_{channel}^{2} + 0.3 \cdot \beta_{D} \lambda_{DD} + 0.3 \cdot \beta \lambda_{DU} \right)^{-1}$$
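The recommended approximation formulas in Table 5.8 translate directly into code. The sketch below assumes λD = λDU + λDD and uses the low-MTTR approximation of the channel MDT; the β-factors of 0.05 and the failure rates are hypothetical values chosen for illustration, not data from Kongsberg Maritime.

```python
def mdt_approx(lambda_du: float, lambda_dd: float, tau: float) -> float:
    """Low-MTTR approximation of the channel mean down time."""
    return lambda_du / (lambda_du + lambda_dd) * tau / 2


def pfh_1oo2(lambda_du, lambda_dd, beta, beta_d, tau):
    """Approximate PFH of a dual redundant group (Table 5.8)."""
    independent = 2 * ((1 - beta) * lambda_du) ** 2 * mdt_approx(lambda_du, lambda_dd, tau)
    ccf = beta_d * lambda_dd + beta * lambda_du
    return independent + ccf


def pfh_1oo3(lambda_du, lambda_dd, beta, beta_d, tau):
    """Approximate PFH of a triple redundant group (Table 5.8)."""
    mdt = mdt_approx(lambda_du, lambda_dd, tau)
    independent = 3 * ((1 - 1.7 * beta) * lambda_du) ** 3 * mdt ** 2
    ccf = 0.3 * beta_d * lambda_dd + 0.3 * beta * lambda_du
    return independent + ccf


# Hypothetical inputs: lambda_DU, lambda_DD [1/h], beta, beta_D, tau [h].
args = (1.75e-6, 7.0e-6, 0.05, 0.05, 8760.0)
pfh2, pfh3 = pfh_1oo2(*args), pfh_1oo3(*args)
print(f"1oo2: PFH = {pfh2:.3e}, MTBF = {1 / pfh2:.3e} h")
print(f"1oo3: PFH = {pfh3:.3e}, MTBF = {1 / pfh3:.3e} h")
```

For inputs of this order the CCF terms dominate both results, so the gain from triple redundancy comes mainly from the modification factor of 0.3.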

5.2.2 Comparison of DP systems

Table 5.9 shows the input data for a DP system including the main components. All components are assumed to have a proof test interval of one year (8760 hours). As discussed earlier in section 5.2, failures in several of the functions may be detected by a well-trained operator before the proof test. Hence the numbers presented in this section will be conservative, and only apply to a system with the components given in Table 5.9.

Table 5.9: Input data for a 1oo2 DP system

| Common description | Diagnostic coverage DC | Failure rate λ [failures/hour] | DU failure rate λDU [failures/hour] | Beta factor β | Proof test interval τ [hours] |
|---|---|---|---|---|---|
| TB | 0.8 | 1.14E-05 | 1.14E-06 | 1 | 8760 |
| Computer | 0.8 | 1.75E-05 | 1.75E-06 | 1 | 8760 |
| HUB | 0.8 | 8.60E-06 | 8.60E-07 | 1 | 8760 |
| RIO | 0.8 | 1.14E-05 | 1.14E-06 | 1 | 8760 |
| Power | 0.8 | 4.91E-07 | 4.91E-08 | 1 | 8760 |
| Net | 0.8 | 1.28E-06 | 1.28E-07 | 1 | 8760 |

Table 5.10 shows the PFH and MTBF for single, dual and triple redundant solutions. For the single solution, all subsystems are calculated as single systems. For the dual redundant solution, all subsystems are calculated as 1oo2 systems except the TB, RIO and Net. For the triple redundant solution only the computer is calculated as a 1oo3 system, while the other subsystems are calculated as 1oo2 systems, except the TB, RIO and Net, which are calculated as 1oo1 systems.

Table 5.10: PFH and MTBF calculation for different DP systems

| System | PFH | MTBF [hours] |
|---|---|---|
| 1oo1 | 2.53E-05 | 3.95E+04 |
| 1oo2 | 2.00E-05 | 4.99E+04 |
| 1oo3 | 1.64E-05 | 6.11E+04 |
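The single (1oo1) row of Table 5.10 can be reproduced from Table 5.9 by adding the dangerous failure rates of all components in series. The sketch assumes that λD is obtained from λDU and the diagnostic coverage as λD = λDU / (1 − DC); the redundant rows are not reproduced here because they depend on the formulas and CCF data of the individual groups.

```python
# DU failure rates [1/h] from Table 5.9; the diagnostic coverage is 0.8 for all.
LAMBDA_DU = {
    "TB": 1.14e-6,
    "Computer": 1.75e-6,
    "HUB": 8.60e-7,
    "RIO": 1.14e-6,
    "Power": 4.91e-8,
    "Net": 1.28e-7,
}
DC = 0.8
HOURS_PER_YEAR = 8760

# Single system: PFH = lambda_DU + lambda_DD = lambda_D for each component,
# and the series structure adds the contributions.
pfh_1oo1 = sum(lambda_du / (1 - DC) for lambda_du in LAMBDA_DU.values())
mtbf_1oo1 = 1 / pfh_1oo1

print(f"PFH = {pfh_1oo1:.2e} per hour, MTBF = {mtbf_1oo1:.2e} hours")
print(f"Triple redundant MTBF of 6.11E+04 hours = {6.11e4 / HOURS_PER_YEAR:.1f} years")
```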

The results given in Table 5.10 show an increase in reliability as the degree of redundancy increases. For a triple redundant system including the components and failure data given in Table 5.9, the mean time between system failures (MTBF) is 6.11E+04 hours, or approximately seven years.

6 CALCULATION TOOL

Kongsberg Maritime needs a tool for calculating the reliability of their AIM Safe systems and the availability of their DP systems. A calculation tool will validate the reliability and availability calculations, and using a tool will save time for the company. In addition, if the calculation tool has a good user interface, it may be easier for people other than the safety engineers to make the calculations. Thus engineers working with design and development, who have good knowledge about the system details, may use the calculation tool to compare different system configurations and develop optimal solutions. It is of course also of high importance that the calculation tool is based on calculation models that are adapted to Kongsberg Maritime’s systems.

6.1 EXISTING CALCULATION TOOL

At present Kongsberg Maritime uses a Microsoft Excel worksheet for reliability calculations of the AIM Safe solutions and availability calculations for the DP systems. The calculations are based on older versions of the PDS method (earlier versions of the PDS Method Handbook 2003), and to a certain degree adapted to IEC 61508. The worksheet contains historical data, formulas and several examples of calculations used in previous projects. The main advantage of this program is the possibility to start a new system based on previous and similar systems, which makes it time-saving to use. On the other hand, such calculations might suffer from loss of data due to the copy-paste method that is used.

The main disadvantage of this worksheet is the user interface, which is difficult to follow. Using several worksheets and more macros would have a positive effect on the user interface. Another disadvantage of the program is that the historical data in the worksheet are not synchronized with the failure data stored in the internal bookshelves. Updates and new failure data are stored in the internal bookshelves, and there are no routines for updating the data in the worksheet. In order to validate the reliability and availability calculations, one major factor for improvement is to synchronize these data.

An improvement of the existing calculation tool is needed, and the main factors for improvement must be:

• Validate the reliability/availability calculations performed within the software
• Validate that the failure data used in the calculations are based on the ones provided by Kongsberg Maritime
• Improve the user interface of the tool, such that it is possible for others than the safety engineers to make use of the tool

In addition to the factors for improvement mentioned above, the advantages of the existing calculation tool should be included in the new calculation tool:

• It should be possible to load older systems and to edit these, as this will be time-saving for the user. However, it is of high importance to ensure the correctness of the calculation models and the failure data when these operations are performed

Another factor of improvement would be to synchronize the failure data stored in the internal bookshelves with the ones stored in the calculation tool. This task is considered to be too extensive for this project and is not a part of its objectives. However, it is strongly recommended that a synchronization of the failure data is performed in the future.

6.2 PRODUCT SPECIFICATION

6.2.1 Environmental considerations

The software is to be used with the operating system (OS) Windows XP and Microsoft Excel version 2002.

6.2.2 Functional requirements

The functional requirements for the software are derived from the factors for improvement revealed in section 6.1, and from Kongsberg Maritime’s requirements. The functional requirements are listed below:

• The calculation tool shall be able to calculate the reliability of the AIM Safe systems
• The calculation tool shall be able to calculate the availability of the DP systems
• The calculation models used within the calculation tool shall be based on the models suggested in chapter 5
• The calculation tool shall validate the reliability/availability calculations
• The calculation tool shall validate that the failure data used in the calculations are as provided by Kongsberg Maritime
• The calculation tool shall be able to:
  o Save new failure data
  o Save systems for the active project
  o Start a project from scratch (adding new systems)
  o Start a project based on previous work (editing old systems)
• The calculation tool shall provide a straightforward user interface

6.2.3 User requirements

In order to make use of the software the user is required to have basic knowledge of Microsoft Excel, basic knowledge of reliability theory and knowledge about Kongsberg Maritime’s components, and to have read the user manual provided in appendix B.

6.3 DESCRIPTION OF IMPROVED CALCULATION TOOL

The calculation tool is developed in Microsoft Excel with use of Visual Basic script in order to make a more user-friendly interface than the existing calculation tool. Excel was chosen because it is common software within the company, and most employees have thorough knowledge of it. In addition the software is available to all employees in Kongsberg Maritime. The calculation tool consists of four Excel worksheets:

• Project
• Calculation
• Library
• Failure data

All calculations are performed within the “Calculation” worksheet. The worksheet is provided with a menu with two main functions: building a system from scratch (see Figure 6.1), or loading an old system (see Figure 6.2). When starting a system from scratch, it is possible to choose the desired components from a list. The list is generated automatically from the failure data stored in the “Failure data” worksheet, and will automatically include new failure data when these are added to the ones stored in the tool. The remaining failure data specific to this component in the actual system are added in the menu by the user. When loading an old system, it is also possible to choose the desired system from an automatically generated list. The old systems are stored in the “Library” worksheet. When the user has loaded the desired system, it is possible to switch to the menu for adding new components and performing reliability/availability calculations.

The “Project” worksheet is for storage of the systems for the actual project. When a reliability/availability calculation of a system is performed, the calculation is moved to the “Project” worksheet. The reliability/availability calculations are available from the menu. When the calculations have been performed, it is possible to exit the menu and edit the information. In this way it is possible to compare different configurations, and to see the effect of different failure data for the different components.

The calculations are based on the calculation models presented in chapter 5. The models are secured for each system, old or new, because the calculation models are programmed within the code. This prevents the formula errors that could occur with the copy-paste method in Kongsberg Maritime’s existing calculation tool.

The calculation tool is to be stored in an internal database, and copied from this database to the actual project when needed. This way the actual project will have the calculation tool available at all times, and it is then possible to add new information to the project (adding a new system or editing an existing system stored in the “Project” worksheet). The failure data are stored in the “Failure data” worksheet, and are protected with the Excel function for protection of worksheets. This protection function makes it possible to add a password for protection of the worksheet if needed. It is recommended to protect the failure data with a password. For practical reasons, a password has not been included at this stage. An example of the calculation tool used on a function for the AIM Safe 3 dual IO system is shown in Figure 6.3.
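The separation between protected failure data and the systems built from them can be illustrated outside Excel. The following Python sketch is not the implementation of the calculation tool, which is written in Visual Basic; the names `FailureData`, `FAILURE_DATA`, `component_choices` and `validate_system` are hypothetical, and the two entries are sample rows taken from Table 5.2.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class FailureData:
    description: str
    failure_rate: float         # lambda [failures/hour]
    du_failure_rate: float      # lambda_DU [failures/hour]
    diagnostic_coverage: float  # DC


# Stand-in for the protected "Failure data" worksheet: a read-only mapping.
_RAW_FAILURE_DATA = {
    "Computer": FailureData("Computer", 6.34e-6, 3.17e-7, 0.9),
    "Power": FailureData("Power", 4.91e-7, 4.91e-8, 0.8),
}
FAILURE_DATA = MappingProxyType(_RAW_FAILURE_DATA)


def component_choices():
    """Component list offered in the menu, derived from the stored failure data."""
    return sorted(FAILURE_DATA)


def validate_system(component_names):
    """Return the failure data for a system, rejecting unknown components."""
    unknown = [name for name in component_names if name not in FAILURE_DATA]
    if unknown:
        raise KeyError(f"no failure data stored for: {unknown}")
    return [FAILURE_DATA[name] for name in component_names]


print(component_choices())
```

Deriving the menu list from the stored data mirrors the behavior described above: a component added to the failure data becomes selectable without any change to the menu, and a system can only be built from components for which failure data exist.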

Figure 6.1: The menu for adding new components and performing reliability/availability calculations

Figure 6.2: The menu for loading old systems and move functions to "Project" worksheet

Figure 6.3: Example of an AIM Safe 3 dual IO redundancy system

7 CONCLUSIONS AND RECOMMENDATIONS FOR FURTHER WORK

Both the PDS method and the method presented in IEC 61508-6 define the contributors to system failure as random hardware failures, common cause failures and systematic failures. The PDS method suggests quantitative analysis of all three contributors, while the method presented in IEC 61508-6 suggests quantitative analysis only of the system’s susceptibility to random hardware failures and common cause failures. In IEC 61508, systematic failures are to be avoided through qualitative analyses.

Both methods describe the probability of failure on demand (PFD) for a system with respect to random hardware failures and common cause failures. The PDS method suggests using the plant-specific version of the β-factor model as described in IEC 61508-6 and, in addition, multiplying by a modification factor when redundancy is introduced. This results in a more realistic modeling of common cause failures than the one presented in IEC 61508-6.

The main differences between the models for PFD calculation are that the PDS method includes the probability of system failure due to a dangerous undetected failure in a channel during the restoration time of a failed channel. In addition, the PDS method derives the PFD formulas for two situations: when it is not known that the system has a failure, and when it is known that the system has a failure. The PDS method also includes a quantification of the spurious trip rate (STR) of the system. IEC 61508, on the other hand, does not consider the STR of a system, because the standard only covers the reliability of the safety system and does not consider the availability of the equipment under control. Both methods are based on largely the same assumptions.

Kongsberg Maritime’s relevant product range for this master work is the AIM Safe systems and the DP systems. The main difference between the systems is that the AIM Safe systems operate in low demand mode, while the dynamic positioning (DP) systems operate continuously. For the AIM Safe system, a dangerous situation occurs only if the system is found in a failed state when a demand for the system arises. For the DP system, a dangerous situation will occur at any time the system fails to perform its designed function.

Both the PDS method and the IEC 61508-6 method apply to Kongsberg Maritime’s AIM Safe systems. The PDS method does not, however, apply to the DP system, because the PDS method only covers systems operating in low demand mode. The method presented in IEC 61508-6 is based on some assumptions that do not reflect the DP system. In addition, the method does not present formulas for components with triple redundancy, which is the case for the DP-31 and DP-32 systems.

Kongsberg Maritime’s existing calculation models are based on an older version of the PDS method, and have been adapted to IEC 61508 to a certain degree. The PFD calculations are approximations of the models presented in the PDS method, based on the fact that Kongsberg Maritime’s systems have a very low mean time to repair (MTTR). Thus Kongsberg Maritime considers only the probability of system failure due to common cause failures. It is recommended to also include the probability of system failure due to independent dangerous undetected failures in all channels. Because the PDS method is well known within the company today, and because the derivations of the formulas given in the PDS method handbook are well described, it is recommended to quantify the reliability of the AIM Safe system based on the PDS method.

For Kongsberg Maritime’s DP systems, calculation models have been developed based on the Markov technique and adapted to the IEC 61508-6 method. In addition, the modification factor suggested in the PDS method for modeling common cause failures has been adopted. It is recommended to apply these calculation models for the DP systems, instead of the ones suggested in the method presented in IEC 61508-6.

A calculation tool based on the calculation models for the reliability of the AIM Safe systems and the availability of the DP systems has been developed in Microsoft Excel. The main improvements over the existing calculation tool are validation of the reliability and availability calculations, and an improved user interface, such that the tool can be used in development and delivery projects by others than the safety engineers. A complete product specification is given in the report. The user manual is given in appendix B, and the calculation tool is attached to the report.

For further work within this topic it is recommended to perform a more thorough validation of the failure data for Kongsberg Maritime’s systems, especially for the DP systems, where the failure data are poor because availability quantification of DP systems is not yet common practice. Further development of the improved calculation models for the DP system must be performed in order to verify the adequacy of the models. In addition, further development of the calculation tool is recommended, especially with respect to the user interface of the availability calculations of the DP systems.
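The recommendation to include independent dangerous undetected failures alongside common cause failures can be illustrated with a small sketch. It uses a common textbook-style low demand approximation for a 1oo2 group, which is an assumption here and not necessarily the formula implemented in the calculation tool; the contribution from failures during restoration is omitted. The function name, the parameter names and the numerical values are hypothetical, and the modification factor `c_moon` is left as a plain input.

```python
import math


def pfd_1oo2(lambda_du, tau, beta, c_moon=1.0, include_independent=True):
    """Approximate average PFD of a 1oo2 group in low demand mode.

    lambda_du : dangerous undetected failure rate of one channel (per hour)
    tau       : proof test interval (hours)
    beta      : beta factor for common cause failures
    c_moon    : modification factor applied to beta (assumed input value)
    """
    # Common cause contribution: both channels fail from a shared cause.
    common_cause = c_moon * beta * lambda_du * tau / 2
    if not include_independent:
        return common_cause
    # Independent contribution: both channels fail independently
    # within the same proof test interval.
    independent = ((1 - beta) * lambda_du * tau) ** 2 / 3
    return common_cause + independent


# Illustrative input values only (not Kongsberg Maritime failure data).
lambda_du = 2e-6
tau = 8760
beta = 0.05

pfd_ccf_only = pfd_1oo2(lambda_du, tau, beta, include_independent=False)
pfd_full = pfd_1oo2(lambda_du, tau, beta)

print("PFD, common cause only:", pfd_ccf_only)
print("PFD, with independent failures:", pfd_full)
print("Relative increase:", (pfd_full - pfd_ccf_only) / pfd_ccf_only)
```

For inputs of this kind the independent term is not negligible compared with the common cause term, which is the situation the recommendation addresses.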


8 REFERENCES

1. Apostolakis, G. and P. Moieni (1987). The Foundations of Models of Dependence in Probabilistic Safety Assessment. Reliability Engineering vol. 18 p. 177-195
2. Bukowski, J. V. (2001). Modeling and Analyzing the Effects of Periodic Inspection on the Performance of Safety-Critical Systems. IEEE Transactions on reliability, vol. 50, no. 3, p. 321-329
3. Bukowski, J. V. and A. Lele (1997). The Case for Architecture-Specific Common Cause Failure Rates and How They Affect System Performance. 1997 Proceedings annual reliability and maintainability symposium. IEEE.
4. Bukowski, J. V. and W. M. Goble (1994). Effects on maintenance policies on MTTF of dangerous failures in programmable electronic controllers. ISA Transactions vol. 33 p. 185-193
5. Bukowski, J. V. and W. M. Goble (1995). Using Markov models for safety analysis of programmable electronic systems. ISA Transactions vol. 34 p. 193-198
6. Bukowski, J. V., Rouvroye, J. and W. M. Goble (2002). What is PFDavg? exida.com
7. Goble, W. M. (2000). Using Simulation to Characterize Common Cause. exida.com
8. Goble, W. M. (2003). Estimating the Common Cause Beta Factor. exida.com
9. Hansen, G. K. and J. Vatn (1998). Reliability Data for Control and Safety Systems. 1998 Edition. SINTEF Report STF38 A98445, SINTEF, Trondheim, Norway
10. Hokstad, P. and K. Corneliussen (2000). Improved Common Cause Failure Model for IEC 61508 Analysis. SINTEF Report STF38 A00420, SINTEF, Trondheim, Norway
11. Hokstad, P. and K. Corneliussen (2003). Reliability Prediction Method for Safety Instrumented Systems; PDS Method Handbook, 2003 Edition. SINTEF Report STF38 A02420, SINTEF, Trondheim, Norway
12. IEC 61508 (1999). Functional safety of electrical/electronic/programmable electronic safety-related systems. Part 1-7. Geneva, International Electrotechnical Commission
13. IEC 61511 (2003). Functional Safety – Safety Instrumented Systems for the Process Industry. Part 1-3. Geneva, International Electrotechnical Commission
14. Kjørstad, M. (2004). Reliability of safety systems. NTNU, Trondheim
15. Knegtering, B. and A. C. Brombacher (1999). Application of micro Markov models for quantitative safety assessment to determine safety integrity levels as defined by the IEC 61508 standard for functional safety. Reliability Engineering and System Safety vol. 66 p. 171-175
16. Knegtering, B. and A. C. Brombacher (2000). A method to prevent excessive numbers of Markov states in Markov models for quantitative safety and reliability assessment. ISA Transactions vol. 39 p. 363-369
17. Kongsberg Maritime AS (2003). Product Description. Kongsberg GreenDP – SDP. Dynamic Positioning System. Kongsberg Maritime AS.
18. Kongsberg Maritime AS (2004). Product Description. Kongsberg Maritime AIM Safe. Safety System. Kongsberg Maritime AS.
19. Korssjøen, E. (2004). Safety Analysis Report AIM Safe. Kongsberg Maritime AS
20. Kvam, P. H. (1998). The binomial failure rate mixture model for common cause failure data from the nuclear industry. Appl. Statist. vol. 47 p. 49-61
21. Parry, G. W. (1991). Common Cause Failure Analysis: A Critique and Some Suggestions. Reliability Engineering and System Safety vol. 34 p. 309-326
22. Rausand, M. (2004). Risk and Reliability in Subsea Engineering. Rio de Janeiro, Brazil. UFRJ
23. Rausand, M. and A. Høyland (2004). System reliability theory: models, statistical methods, and applications (2nd ed.). New Jersey, United States and Canada: John Wiley & Sons
24. Rouvroye, J. L. and E. G. van den Bliek (2002). Comparing safety analysis techniques. Reliability Engineering and System Safety vol. 75 p. 289-294
25. Stavrianidis, P. (1992). Reliability and uncertainty analysis of hardware failures of a programmable electronic system. Reliability Engineering and System Safety vol. 39 p. 309-324
26. Summers, A. E. (2002). Software-Implemented Safety Logic. Process Safety Progress vol. 21 no. 2 p. 161-163
27. Summers, A. E. and G. Raney (1998). Common Cause and Common Sense – Designing Failure Out of Your SIS. ISA TECH/EXPO Technology Update Conference Proceedings vol. 2 p. 39-48
28. Tiezema, R. (1998). Common Cause effects on safety rated PLCs. Apeldoorn, The Netherlands: Yokogawa
29. Turnbull, G. (2004). The broader aspect of certifying Industrial Systems to IEC 61508. PROFIBUS International Conference. Ragley Hall, Warwickshire, UK, June 2004
30. Zhang, T., Long, W. and Y. Sato (2003). Availability of systems with self-diagnostic components – applying Markov model to IEC 61508-6. Reliability Engineering and System Safety vol. 80 p. 133-141


APPENDIX A: MARKOV MODELS FOR DUAL AND TRIPLE REDUNDANT DP SYSTEMS

Markov model for a 1oo2 configuration, for identical and independent components:

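State transition diagram (transition rates as in the state transition matrix below): 0 → 1 at rate 2λ, 1 → 2 at rate λ, 1 → 0 at rate µ, 2 → 1 at rate 2µ.

System state    System state definition
0               System functions as 1oo2
1               System reduced to 1oo1
2               System failure

State transition matrix:

$$A = \begin{bmatrix} -2\lambda & 2\lambda & 0 \\ \mu & -(\lambda+\mu) & \lambda \\ 0 & 2\mu & -2\mu \end{bmatrix}$$

Steady state equations are found using the same procedure as for 1oo1 systems:

$$-2\lambda \cdot P_0 + \mu \cdot P_1 = 0$$
$$2\lambda \cdot P_0 - (\lambda+\mu) \cdot P_1 + 2\mu \cdot P_2 = 0$$
$$\lambda \cdot P_1 - 2\mu \cdot P_2 = 0$$
$$P_0 + P_1 + P_2 = 1$$

Steady state probabilities:

$$P_0 = \frac{\mu^2}{(\lambda+\mu)^2}, \qquad P_1 = \frac{2\lambda\mu}{(\lambda+\mu)^2}, \qquad P_2 = \frac{\lambda^2}{(\lambda+\mu)^2}$$

Frequency of system failures:

$$\omega_F = \sum_{j \in B} \sum_{k \in F} P_j \cdot a_{jk} = P_1 \cdot a_{12} = \frac{2\lambda\mu}{(\lambda+\mu)^2} \cdot \lambda \approx \frac{2\lambda^2\mu}{\mu^2} = 2 \cdot \frac{\lambda^2}{\mu}$$

System unavailability:

$$A_S = \sum_{j \in F} P_j = P_2 = \frac{\lambda^2}{(\lambda+\mu)^2}$$

Mean duration of a system failure:

$$\mathrm{MDT}_S = \frac{A_S}{\omega_F}$$

Mean time between system failures:

$$\mathrm{MTBF}_S = \frac{1}{\omega_F}$$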
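The 1oo2 results above can be checked with a short sketch. It substitutes the closed-form steady state probabilities into the balance equations P·A = 0 using exact fractions, and then evaluates the frequency of system failures, the mean duration of a system failure and the mean time between system failures. The function names and the numerical values of λ and µ are assumptions chosen for illustration only.

```python
from fractions import Fraction


def steady_state_1oo2(lam, mu):
    """Closed-form steady state probabilities (P0, P1, P2) of the 1oo2 model."""
    d = (lam + mu) ** 2
    return (mu ** 2 / d, 2 * lam * mu / d, lam ** 2 / d)


def transition_matrix_1oo2(lam, mu):
    """State transition matrix A of the 1oo2 model (row = from, column = to)."""
    return [
        [-2 * lam, 2 * lam, 0],
        [mu, -(lam + mu), lam],
        [0, 2 * mu, -2 * mu],
    ]


def balance_residuals(p, a):
    """Entries of the row vector P*A; all are zero in steady state."""
    n = len(p)
    return [sum(p[j] * a[j][k] for j in range(n)) for k in range(n)]


# Illustrative rates per hour (assumed values, not Kongsberg Maritime data).
lam = Fraction(1, 1000)
mu = Fraction(1, 8)

p = steady_state_1oo2(lam, mu)
a = transition_matrix_1oo2(lam, mu)
residuals = balance_residuals(p, a)

omega_f = p[1] * a[1][2]   # frequency of system failures, P1 * a12
a_s = p[2]                 # system unavailability, P2
mdt_s = a_s / omega_f      # mean duration of a system failure
mtbf_s = 1 / omega_f       # mean time between system failures

print("P*A =", residuals)
print("sum of probabilities =", sum(p))
print("MDT_S =", mdt_s, "hours")
print("MTBF_S =", float(mtbf_s), "hours")
```

With these closed forms, the ratio of P2 to P1·λ reduces to 1/(2µ), so the mean duration of a system failure depends only on the repair rate.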

Markov model for a 1oo3 configuration, for identical and independent components:

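Markov state diagram (transition rates as in the state transition matrix below): 0 → 1 at rate 3λ, 1 → 2 at rate 2λ, 2 → 3 at rate λ, 1 → 0 at rate µ, 2 → 1 at rate 2µ, 3 → 2 at rate 3µ.

System state    System state definition
0               System functions as 1oo3
1               System reduced to 1oo2
2               System reduced to 1oo1
3               System failure

State transition matrix:

$$A = \begin{bmatrix} -3\lambda & 3\lambda & 0 & 0 \\ \mu & -(2\lambda+\mu) & 2\lambda & 0 \\ 0 & 2\mu & -(2\mu+\lambda) & \lambda \\ 0 & 0 & 3\mu & -3\mu \end{bmatrix}$$

Steady state equations:

$$-3\lambda \cdot P_0 + \mu \cdot P_1 = 0$$
$$3\lambda \cdot P_0 - (2\lambda+\mu) \cdot P_1 + 2\mu \cdot P_2 = 0$$
$$2\lambda \cdot P_1 - (2\mu+\lambda) \cdot P_2 + 3\mu \cdot P_3 = 0$$
$$\lambda \cdot P_2 - 3\mu \cdot P_3 = 0$$
$$P_0 + P_1 + P_2 + P_3 = 1$$

Steady state probabilities:

$$P_0 = \frac{\mu^3}{(\lambda+\mu)^3}, \qquad P_1 = \frac{3\lambda\mu^2}{(\lambda+\mu)^3}, \qquad P_2 = \frac{3\lambda^2\mu}{(\lambda+\mu)^3}, \qquad P_3 = \frac{\lambda^3}{(\lambda+\mu)^3}$$

Frequency of system failures:

$$\omega_F = \sum_{j \in B} \sum_{k \in F} P_j \cdot a_{jk} = P_2 \cdot a_{23} = \frac{3\lambda^2\mu}{(\lambda+\mu)^3} \cdot \lambda$$

System unavailability:

$$A_S = \sum_{j \in F} P_j = P_3 = \frac{\lambda^3}{(\lambda+\mu)^3}$$

Mean duration of a system failure:

$$\mathrm{MDT}_S = \frac{A_S}{\omega_F}$$

Mean time between system failures:

$$\mathrm{MTBF}_S = \frac{1}{\omega_F}$$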
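The 1oo2 and 1oo3 models share the same pattern: with k failed channels, the failure rate out of the state is (n − k)·λ and the repair rate is k·µ. Under that pattern the steady state probabilities take a binomial form. The sketch below generalises this to n channels, which is an assumption that goes beyond the two cases given in this appendix; the function names and the numerical rates are hypothetical.

```python
import math


def steady_state_1oon(n, lam, mu):
    """Steady state probabilities P0..Pn for n identical, independent channels.

    State k means k channels have failed. Assumes each failed channel is
    repaired independently at rate mu (repair rate k*mu in state k).
    """
    d = (lam + mu) ** n
    return [math.comb(n, k) * lam ** k * mu ** (n - k) / d for k in range(n + 1)]


def system_measures_1oon(n, lam, mu):
    """Failure frequency, unavailability, MDT and MTBF for a 1ooN system."""
    p = steady_state_1oon(n, lam, mu)
    # The system fails when the last working channel fails (rate lam).
    omega_f = p[n - 1] * lam
    a_s = p[n]
    return {
        "omega_F": omega_f,
        "A_S": a_s,
        "MDT_S": a_s / omega_f,
        "MTBF_S": 1 / omega_f,
    }


# Illustrative rates per hour (assumed values).
lam = 1e-3
mu = 0.125

probabilities_1oo3 = steady_state_1oon(3, lam, mu)
measures_1oo3 = system_measures_1oon(3, lam, mu)
measures_1oo2 = system_measures_1oon(2, lam, mu)

print("1oo3 steady state probabilities:", probabilities_1oo3)
print("1oo3 measures:", measures_1oo3)
print("1oo2 measures:", measures_1oo2)
```

For n = 3 the probabilities returned are the four expressions listed above, and the mean duration of a system failure reduces to 1/(3µ).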

APPENDIX B: USER MANUAL FOR NEW CALCULATION TOOL

USER MANUAL FOR THE RELIABILITY AND AVAILABILITY CALCULATION TOOL

TABLE OF CONTENTS

1 INTRODUCTION
  1.1 INSTALLATION
2 CONFIGURATION
  2.1 CALCULATION
  2.2 PROJECT
  2.3 LIBRARY
  2.4 FAILURE DATA
3 RELIABILITY AND AVAILABILITY CALCULATION MENU
  3.1 ADD ITEMS
  3.2 LOAD/MOVE FUNCTION
  3.3 MAKING A NEW SYSTEM (FUNCTION)
  3.4 LOADING A SYSTEM (FUNCTION) FROM THE “LIBRARY” WORKSHEET
4 ADDING NEW FAILURE DATA
5 CONSTRAINTS
6 EDIT MTTR AND PORTION OF TOTAL PFD

User manual for calculation tool


1 INTRODUCTION
The Reliability and Availability calculation tool is developed in Microsoft Excel, and its main objective is to calculate the reliability of functions for the AIM Safe systems and the DP systems. A function is hereafter referred to as a system; hence a system is a function performed in an AIM Safe system or in a DP system.

1.1 INSTALLATION
The software is developed for Microsoft Excel version 2002 running on the Windows XP operating system. The software contains macros that are essential for it to function as designed. When the software is opened in Microsoft Excel, the message box below (or a similar one) might appear, depending on the security level set in Microsoft Excel.

The problem may be solved in two ways:
1. Tick the “Always trust macros from this source” check box, and then choose the “Enable Macros” command button.
2. Exit the message box and set the security level in Microsoft Excel to medium, then reopen the calculation tool and enable the macros. The security level is changed through: Tools – Macro – Security… – Security level – Medium.

2 CONFIGURATION
The software consists of four worksheets:
• Calculation
• Project
• Library
• Failure data


2.1 CALCULATION
The “calculation” worksheet performs all the calculations in the software. Through the menu in this worksheet the systems are created, calculated, edited and saved.

2.2 PROJECT
The “project” worksheet is for storage of systems that apply to the specific project. When a system has been constructed in the “calculation” worksheet and the reliability/availability calculations have been performed, the system is moved to the “project” worksheet using the menu in the “calculation” worksheet.

2.3 LIBRARY
The “library” worksheet is for storage of old systems. The systems stored in this worksheet are general systems that are similar to many other systems. Through the menu in the “calculation” worksheet these systems may be loaded into the “calculation” worksheet for editing and calculation. It is also possible to save a system in the “library” worksheet through the menu in the “calculation” worksheet.


2.4 FAILURE DATA
All the components and their associated failure data are stored in the “failure data” worksheet. The menu in the “calculation” worksheet automatically generates a list of the components, and copies the failure data to the “calculation” worksheet when the user selects a specific component.


3 RELIABILITY AND AVAILABILITY CALCULATION MENU
The reliability and availability calculation menu is accessible from the “calculation” worksheet. The menu is split into two pages:
• Add items
• Load/move function

3.1 ADD ITEMS
Through the “add items” menu it is possible to:
• Select items from the “failure data” worksheet. Before selecting items, the list must be updated using the “update” command.
• Add specific information about the selected item:
  o Voting logic
  o Critical portion of module
  o Proof test interval
  o Diagnostic coverage
  o Beta factor
• Perform reliability/availability calculations for systems working in low demand mode and continuous demand mode. The AIM Safe systems operate in low demand mode, and the DP systems operate in continuous demand mode.


3.2 LOAD/MOVE FUNCTION
Through the “load/move function” menu it is possible to:
• Select and load a system (function) from the “library” worksheet. Before selecting the system (function), the list must be updated using the “update” command.
• Name the current system (function) and add the name to the “calculation” worksheet
• Move the current system (function) to the “project” worksheet
• Clear the current system (function) from the “calculation” worksheet
• Save the current system (function) in the “calculation” worksheet to the “library” worksheet


3.3 MAKING A NEW SYSTEM (FUNCTION)
For making a new system (function) from scratch, the following procedure applies:
1. Click the “view menu” command button in the “calculation” worksheet
2. Select the page “add items” on the menu
3. Press the “update” command button
4. Click the desired type of item in the list “select type of item”
5. Click the desired item in the list “select item”
6. Input information for the selected item:
   a. Voting logic. The available voting logics are shown in the list.
   b. Critical portion of module. A recommended value for “critical portion of modules” is shown in the text box. If there is no recommended value, the text “No guide” appears.
   c. Proof test interval. Suggestions for commonly used proof test intervals are shown in the list, but it is possible to choose a different proof test interval.
   d. Diagnostic coverage. A recommended value for “diagnostic coverage” is shown in the text box on the right. One of the values shown in the list should be chosen.
   e. Beta-factor. Suggestions for commonly used beta-factors are shown in the list, but it is possible to choose a different beta-factor.
7. When all desired input information for the selected item has been chosen, click the “add item” command button, which places the item in the “calculation” worksheet.
8. Repeat steps 3 to 7 until all desired items for the system (function) are in the “calculation” worksheet
9. Perform the reliability/availability calculation by clicking the command button for either the “PFD” (for low demand systems) or the “PFH” (for continuous systems).
10. If the information in the “calculation” worksheet needs to be edited, exit the menu by clicking the “x” in the upper-right corner of the menu. The white columns are for editing, while the blue columns are NOT for editing. See also section xx for more information about the constraints of the software.
11. When the system (function) is finished and ready to be moved to the “project” worksheet, enter the Reliability and Availability calculation menu again by clicking the “view menu” command button in the “calculation” worksheet.
12. Click the page “Load/move function” on the menu.
13. Enter the name of the system (function) in the text boxes below the “name current function” label. This includes naming the:
   a. “main type”
   b. “sub type”
   c. “loop type”
14. When the names above have been entered in the text boxes, click the “add name to worksheet” command button. This will copy the name to the “calculation” worksheet
15. Click the “Move function to project” command button to move the function to the “project” worksheet
16. Exit the menu and switch to the “project” worksheet to view the system (function), or add new systems (functions) using the same procedure

3.4 LOADING A SYSTEM (FUNCTION) FROM THE “LIBRARY” WORKSHEET
For loading a system (function) from the “library” worksheet, the following procedure applies:
1. Click the “View menu” command button in the “calculation” worksheet
2. Select the page “load/move function” on the menu
3. Press the “update” command button
4. Click the desired type of system (function) in the list “main type”
5. Click the desired sub type of system (function) in the list “sub type”
6. Click the desired loop type of system (function) in the list “loop type”
7. Click the “load function” command button. This will copy the desired system (function) from the “library” worksheet into the “calculation” worksheet
8. If it is desired to add more items to the loaded system (function), follow the procedure in section xx.
9. If it is desired to edit the information in the “calculation” worksheet, exit the menu by clicking the “x” in the upper-right corner of the menu. The white columns are for editing, while the blue columns are NOT for editing. See also section xx for more information about the constraints of the software.
10. If the system (function) is finished and ready to be moved to the “project” worksheet, follow steps 12 to 16 in section xx. Be sure to have the Reliability and Availability calculation menu open.

4 ADDING NEW FAILURE DATA
When adding new failure data to the software, the following procedure applies:
1. Go to the “failure data” worksheet
2. Unprotect the worksheet:
   a. Click tools – protection – unprotect sheet
   b. A password may be requested
3. Add the new item and its associated failure data. NB: it is important that the information given in column C, “common description”, is in alphabetical order. If this is not the case, the list of items in the “Add item” menu will not be correct.
4. Remember to protect the worksheet when the new failure data has been added:
   a. Click tools – protection – protect sheet – OK
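The alphabetical ordering requirement on column C can be checked before the worksheet is protected again. The sketch below is written in Python rather than as an Excel macro, purely for illustration; the function name, the component names and the choice of a case-insensitive comparison are assumptions, and the tool itself may order entries differently.

```python
def first_out_of_order(descriptions):
    """Return the index of the first entry that breaks alphabetical order.

    Comparison is case-insensitive (an assumption). Returns None when the
    whole list is in alphabetical order.
    """
    keys = [text.casefold() for text in descriptions]
    for index in range(1, len(keys)):
        if keys[index] < keys[index - 1]:
            return index
    return None


# Hypothetical "common description" entries from column C.
ordered = ["Analog input module", "Controller", "Digital output module"]
unordered = ["Analog input module", "Digital output module", "Controller"]

print(first_out_of_order(ordered))
print(first_out_of_order(unordered))
```

A returned index points at the row that should be moved before the list in the “Add item” menu is updated.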

5 CONSTRAINTS
In order for the software to perform as intended, some constraints on the design of the worksheets must be respected:
• Do not delete or edit the cells within the “calculation” worksheet. Only the white columns within the calculation area for the current system (function) are to be edited. If the blue columns within the calculation area are edited, the reliability/availability calculations must be redone.
• Do not delete the last row (column A) with the text “last row” in the “project” worksheet and “library” worksheet. The software cannot copy systems (functions) from or to these worksheets if this text is deleted. If the text “last row” is missing from the “project” worksheet and “library” worksheet, insert this text in the last available row in column “A” in both worksheets.
• When editing information in the calculation area of the “calculation” worksheet: if the voting logic column is edited, the calculations have to be redone through the “add item” menu. In order for the system to calculate the reliability/availability, the input for the voting logic column must be exactly one of the alternatives:
  o 1oo1
  o 1oo2
  o 1oo3
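Two of the constraints above, the exact spelling of the voting logic and the presence of the “last row” marker in column A, can be expressed as simple checks. The sketch below is in Python for illustration only and is not the macro code of the calculation tool; the function names and the example column contents are hypothetical.

```python
ALLOWED_VOTING_LOGICS = ("1oo1", "1oo2", "1oo3")
LAST_ROW_MARKER = "last row"


def validate_voting_logic(value):
    """Return the value if it is exactly one of the allowed alternatives."""
    if value not in ALLOWED_VOTING_LOGICS:
        raise ValueError(
            "voting logic must be exactly one of "
            + ", ".join(ALLOWED_VOTING_LOGICS)
            + "; got " + repr(value)
        )
    return value


def find_last_row(column_a):
    """Return the 0-based index of the marker cell in column A, or None."""
    for index, cell in enumerate(column_a):
        if cell == LAST_ROW_MARKER:
            return index
    return None


# Hypothetical contents of column A in the "project" worksheet.
column_with_marker = ["System 1", "System 2", "last row"]
column_without_marker = ["System 1", "System 2"]

print(validate_voting_logic("1oo2"))
print(find_last_row(column_with_marker))
print(find_last_row(column_without_marker))
```

The match is exact, so an entry such as “1OO2” or “1oo2 ” (with a trailing space) is rejected, in line with the wording of the constraint.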

6 EDIT MTTR AND PORTION OF TOTAL PFD
On the “load/move function” page of the menu there is an area called “input to calculation”. Here it is possible to change the MTTR and the portion of the total PFD that the system (function) may include. By default the MTTR is set to 1 hour for all components, and the default value for the portion of the total PFD is 15 %.
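One plausible reading of the “portion of total PFD” is that the system (function) may use at most that share of the upper PFD limit of the target safety integrity level. The sketch below follows this reading, which is an assumption; how the calculation tool applies the portion is not described in this section. The SIL limits are the low demand mode limits of IEC 61508, and the function names are hypothetical.

```python
import math

# Upper PFD limits per SIL for low demand mode, as given in IEC 61508.
SIL_PFD_UPPER_LIMIT = {1: 1e-1, 2: 1e-2, 3: 1e-3, 4: 1e-4}

DEFAULT_PORTION = 0.15  # default portion of total PFD stated in this manual


def pfd_budget(target_sil, portion=DEFAULT_PORTION):
    """PFD the system (function) may use under the assumed interpretation."""
    return portion * SIL_PFD_UPPER_LIMIT[target_sil]


def within_budget(pfd, target_sil, portion=DEFAULT_PORTION):
    """True if the calculated PFD does not exceed the allowed portion."""
    return pfd <= pfd_budget(target_sil, portion)


print(pfd_budget(2))
print(within_budget(1.0e-3, 2))
print(within_budget(2.0e-3, 2))
```

Raising the portion relaxes the requirement on the system (function) but leaves less of the total PFD for the remaining parts of the safety loop.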


APPENDIX C: PRELIMINARY REPORT

Norwegian University of Science and Technology Faculty of Engineering Science and Technology Department of Production and Quality Engineering

Preliminary report:

MODELS FOR QUANTIFICATION OF AVAILABILITY FOR CONTINUOUS CONTROL SYSTEMS AND RELIABILITY FOR SAFETY SYSTEMS

Stud. techn. Marianne Kjørstad Spring 2005

KONGSBERG Kongsberg Maritime AS


TABLE OF CONTENTS

1 PROBLEM ANALYSIS
  1.1 PROBLEM DESCRIPTION
  1.2 OBJECTIVES
2 ACTIVITY DESCRIPTION
APPENDIX A: WBS
APPENDIX B: GANTT CHART


1 PROBLEM ANALYSIS

1.1 PROBLEM DESCRIPTION
This master work is based on the project Reliability of safety systems, performed in autumn 2004 by stud. techn. Marianne Kjørstad for Kongsberg Maritime.

Kongsberg Maritime AS covers the maritime business area of the international technology corporation Kongsberg Gruppen ASA. The main market segments of Kongsberg Maritime are merchant marine, offshore and subsea, yachting and fisheries, marine information technology, simulation and process automation. Kongsberg Maritime’s headquarters is located in Horten, but the company has manufacturing sites in several places around the world. This project is related to Kongsberg Maritime’s manufacturing site in Kongsberg, which produces safety systems (e.g. emergency shutdown systems) and continuous control systems (e.g. dynamic positioning systems) for the maritime sector.

Kongsberg Maritime must respond to customer requirements for quantification of the reliability of its safety systems and the availability of its continuous control systems. The requirements mainly imply documentation of the safety system’s safety integrity level (SIL) in accordance with IEC 61508, and documentation of MTTF/MTBF for the control system. This documentation requires the use of calculation models that are flexible and adapted to the relevant operational conditions. Good calculation models will reveal how to effectively improve the system in the design phase, and thereby increase the reliability and availability of the system and its ability to compete in the market. Today Kongsberg Maritime lacks satisfactory calculation models for this purpose.

The main objective of this project is to help Kongsberg Maritime develop satisfactory calculation models for quantification of reliability for their safety systems and availability for their continuous control systems. The main calculation methods used within this field today, and therefore also the best known to the customers, are the method presented in IEC 61508 and the PDS method presented by Sintef. Kongsberg Maritime therefore requests that satisfactory calculation models be based on these methods and adapted to their actual systems.


1.2 OBJECTIVES
1. Identify and compare the similarities/dissimilarities of the calculation methods presented in IEC 61508 and the PDS method presented by Sintef, and the assumptions these methods are based upon. Describe the relevance of these methods and assumptions for Kongsberg Maritime’s systems. Identify special considerations not handled by any of the calculation methods.
   1.1. Identify differences in IEC 61508 method and PDS method
   1.2. Describe the relevance of IEC 61508 method and PDS method and assumptions for Kongsberg Maritime’s systems
   1.3. Identify Kongsberg Maritime’s relevant product range for this project
   1.4. Identify special considerations not handled by any of the calculation methods
2. Improve Kongsberg Maritime’s existing calculation models of reliability for safety systems and availability for continuous control systems with respect to methods presented in IEC 61508, SINTEF’s PDS method and other requirements specific for Kongsberg Maritime’s systems, based on the results obtained in objective 1.
   2.1. Identify and review Kongsberg Maritime’s existing calculation models in their existing Excel worksheet
   2.2. Work out a structure and optimize one or more calculation models for Kongsberg Maritime’s relevant product range, based on the results obtained in objective 1
3. Develop a tool for calculation of reliability for Kongsberg Maritime’s safety systems and availability for Kongsberg Maritime’s control systems to be used in development- and delivery projects, based on the improved calculation models developed in objective 2.
   3.1. Develop a product specification for a calculation tool based on the calculation models developed in objective 2
   3.2. Develop a tool for calculation of reliability and availability of Kongsberg Maritime’s relevant product range


2 ACTIVITY DESCRIPTION

Activity No 1: Preliminary report
Task: Prepare a preliminary report for the master: Models for quantification of availability for continuous control systems and reliability for safety systems.
Content:
• Analyze the problem
• Define objectives
• Define activities and work out cost time resource (CTR) sheets
• Work out WBS
• Work out a Gantt chart
Literature: Rolstadås, A. (2001). Praktisk prosjekt styring. 3. edition. Tapir Akademisk Forlag, Trondheim
Outcome: A preliminary report of the project with descriptions of activities that must be performed and a progress schedule to follow in order to solve the problem.
Start date: 17.01.05
End date: 25.01.05


Activity No 2: Information search
Task: Information search
Content: Gather information about:
• IEC 61508 calculation method
• PDS method
• Alternative calculation methods, especially the Markov method
• Kongsberg Maritime’s requirements for the calculation method
• Existing calculation tools for reliability and availability of safety systems and control systems
Outcome: Essential information that forms the basis for the master work
Start date: 18.01.05
End date: 01.06.05


Activity No 3: Calculation methods
Task: Identify and compare the similarities/dissimilarities of the calculation methods presented in IEC 61508 and the PDS method presented by Sintef, and the assumptions these methods are based upon. Describe the relevance of these methods and assumptions for Kongsberg Maritime’s systems. Identify special considerations not handled by any of the calculation methods.
Content:
• Identify differences in IEC 61508 method and PDS method
• Describe the relevance of IEC 61508 method and PDS method and assumptions for Kongsberg Maritime’s systems
• Identify Kongsberg Maritime’s relevant product range for this project
• Identify special considerations not handled by any of the calculation methods
Literature:
• IEC 61508
• PDS method handbook
• Information provided through information search
Outcome: A description of the similarities and dissimilarities between IEC 61508 calculation method and PDS method, and the relevance of these methods and pertaining assumptions to Kongsberg Maritime’s systems.
Start date: 26.01.05
End date: 28.02.05


Activity No 4: Calculation models

Task: Improve Kongsberg Maritime’s existing calculation models of reliability for safety systems and availability for continuous control systems with respect to the methods presented in IEC 61508, SINTEF’s PDS method and other requirements specific to Kongsberg Maritime’s systems, based on the results obtained in objective 1.

Content:
• Identify and review Kongsberg Maritime’s existing calculation models in their existing Excel worksheet
• Work out a structure and optimize one or more calculation models for Kongsberg Maritime’s relevant product range, based on the results obtained in objective 1

Literature:
• Kongsberg Maritime’s existing calculation method
• Information provided through information search

Outcome: Calculation methods that are to form the basis of the calculation tool

Start date: 01.03.05
End date: 25.04.05


Activity No 5: Calculation tool

Task: Develop a tool for calculation of reliability for Kongsberg Maritime’s safety systems and availability for Kongsberg Maritime’s control systems, to be used in development and delivery projects, based on the improved calculation models developed in objective 2.

Content:
• Develop a product specification for a calculation tool based on the calculation models developed in objective 2
• Develop a tool for calculation of reliability and availability of Kongsberg Maritime’s relevant product range

Literature: Information provided through literature search

Rough product specification:
• The calculation tool shall calculate reliability and availability using the calculation models worked out in activity no 4
• The calculation tool shall be relevant for Kongsberg Maritime’s product range identified in activity no 3
• The calculation tool shall have a user-friendly interface, which enables users without specific knowledge about the calculation models to use the software
• A change of a parameter value shall immediately trigger a recalculation of its effect, showing the user where changes in the system most effectively increase the reliability/availability

Outcome: An effective tool for calculation of reliability and availability of Kongsberg Maritime’s relevant product range, used to improve Kongsberg Maritime’s systems and increase the ability to compete in the market.

Start date: 26.01.05
End date: 01.06.05
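The recalculation requirement in the rough product specification can be illustrated with a minimal sketch. It is not the tool developed in this project. It assumes a series structure of single (1oo1) elements and uses the common approximation PFD ≈ λ_DU · τ / 2, which ignores repair time, detected failures and common cause failures. All names (Element, pfd_1oo1, system_pfd, contributions) and all parameter values are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Element:
    name: str
    lambda_du: float        # dangerous undetected failure rate [per hour], hypothetical value
    test_interval_h: float  # proof test interval tau [hours], hypothetical value


def pfd_1oo1(e: Element) -> float:
    """Approximate average PFD of a single element: lambda_DU * tau / 2."""
    return e.lambda_du * e.test_interval_h / 2.0


def system_pfd(elements) -> float:
    """Series structure: for small values the element PFDs are approximately additive."""
    return sum(pfd_1oo1(e) for e in elements)


def contributions(elements) -> dict:
    """Share of the system PFD caused by each element."""
    total = system_pfd(elements)
    return {e.name: pfd_1oo1(e) / total for e in elements}


# Hypothetical baseline system with yearly proof testing (8760 h).
baseline = [
    Element("sensor", 2.0e-6, 8760.0),
    Element("logic solver", 1.0e-7, 8760.0),
    Element("valve", 3.0e-6, 8760.0),
]

# The element with the largest share is where a change pays off most.
shares = contributions(baseline)
largest = max(shares, key=shares.get)

# Change one parameter (halve the test interval of that element) and recalculate.
changed = [
    replace(e, test_interval_h=e.test_interval_h / 2.0) if e.name == largest else e
    for e in baseline
]

print(f"largest contributor: {largest}")
print(f"system PFD before: {system_pfd(baseline):.3e}")
print(f"system PFD after:  {system_pfd(changed):.3e}")
```

Ranking the elements by their share of the system PFD is one simple way of pointing the user to where a change is most effective. A real tool would apply the calculation models from activity no 4 instead of this approximation.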


Activity No 6: Report

Task: Write the report of the master work

Content:
• Determine structure of report
• Write introduction
• Collect and structure main part of report
• Write summary, conclusion and references
• Correct and improve report

Outcome: A well-structured report documenting the master work

Start date: 26.01.05
End date: 10.06.05


APPENDIX A: WBS


APPENDIX B: GANTT CHART
