5
Common-Mode Software Failure Potential

INTRODUCTION AND BACKGROUND

Safety systems in nuclear power plants must reliably satisfy their functional requirements. To help achieve this goal, safety systems are designed to be single-failure proof, i.e., no single failure is to prevent safety system actuation if needed, nor shall a single failure cause a spurious activation. Various forms of redundancy are commonly used to achieve this design goal, i.e., to achieve the functional goals in the presence of component failures.

There are two approaches to providing redundant components: active redundancy and standby redundancy. In active redundancy, the outputs of multiple identical components or strings of components, operating in parallel, are compared or selected in some way to determine which outputs will actually be used. If the individual components are each highly reliable and fail independently, then a correct output can be assured with high probability.

To avoid the problem of spurious scrams in a nuclear power plant, the active redundancy may involve multiple channels, all carrying the same kind of information and connected so that no protection action will be taken unless a certain number of these channels trip simultaneously. For example, the output from four parallel strings of identical components might be combined using Boolean logic in such a way that the safety systems are activated when two of the four channels exceed the preset threshold level. In this way, a single channel failure cannot prevent or cause safety system activation.
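
The two-out-of-four arrangement described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the threshold, and the readings are hypothetical and are not taken from any plant design.

```python
def channel_trips(readings, threshold):
    """Convert raw channel readings into trip signals (reading above threshold)."""
    return [r > threshold for r in readings]


def two_out_of_four(trips):
    """Return True when at least two of the four channel trip signals are set."""
    if len(trips) != 4:
        raise ValueError("expected exactly four channel signals")
    return sum(bool(t) for t in trips) >= 2


# Hypothetical threshold and readings, chosen only for illustration.
THRESHOLD = 100.0

# One channel reads spuriously high: a single trip does not actuate.
print(two_out_of_four(channel_trips([101.0, 80.0, 80.0, 80.0], THRESHOLD)))

# One channel is stuck low during a real excursion: the other three still actuate.
print(two_out_of_four(channel_trips([80.0, 120.0, 120.0, 120.0], THRESHOLD)))
```

The sketch only demonstrates tolerance of a single independent channel failure; it says nothing about failures that affect several channels at once.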

The second type of redundant design uses standby (or backup) redundancy. In this scheme, one or more spares are available to replace failed components. An example of standby redundancy is switching to an alternate or backup power supply when loss of electrical power is detected. Combinations of active and standby redundancy can also be used.

In both active and standby redundancy, components are designed to implement the same function. If the components are identical, this is called component duplication. Component duplication provides protection against independent failures caused by physical degradation (e.g., wearing out) of the components.

The benefits of component duplication can be defeated by common-cause or common-mode failures. Common-cause failures are multiple component failures having the same cause. Common-mode failures denote the failure of multiple components in the same way, such as stuck open or fail as-is. Common-cause and common-mode failures occur when the assumption of independence of the failures of the components is invalid.

Common-cause failures can occur owing to common external or internal influences. External causes may involve operational, environmental, or human factors. The common cause may also be a (dependent) design error internal to the supposedly independent components.

To protect against common design errors, components with a different internal design (but performing the same function) may be used. This approach is called "design diversity" in this report. Multiple versions of software that are written from equivalent requirements specifications are examples of design diversity. That is, the component requirements are the same, but the way the requirement is achieved within the component may be different. Two pieces of software that compute a sine function but use different algorithms to do so are an example of design diversity. As another example, consider two algorithms where the required function is to determine whether two numbers are equal. One algorithm may compute the ratio of the numbers and the other may compare their difference to some number epsilon that has a value close to zero.
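
The equality example can be made concrete with a minimal sketch. The function names and the tolerance values are assumptions introduced here for illustration; the report does not specify them.

```python
def equal_by_ratio(a, b, tol=1e-9):
    """Version A: treat a and b as equal when their ratio is within tol of 1."""
    if b == 0.0:
        # The ratio is undefined, so this sketch falls back to an exact check.
        return a == 0.0
    return abs(a / b - 1.0) <= tol


def equal_by_difference(a, b, eps=1e-9):
    """Version B: treat a and b as equal when |a - b| is at most eps."""
    return abs(a - b) <= eps


# Both versions address the same requirement with different internal designs.
for a, b in [(1.0, 1.0), (1.0, 2.0), (1e-12, 2e-12)]:
    print(a, b, equal_by_ratio(a, b), equal_by_difference(a, b))
```

With these assumed tolerances the two versions agree on the first two pairs but disagree on the third, where both numbers are very small. Diverse designs are only comparable when the shared requirement (here, what "equal" means) is stated precisely.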

A second type of diversity, which is called "functional diversity" in this report, involves components that perform completely different functions at the component level (although the components may be related in that they are used to satisfy higher-level system requirements). The crucial point is that the component requirements are different. An example of functional diversity is the use of high reactor power to flow ratio to cause a reactor trip using control rods, and high coolant temperature to cause a reactor trip using







boron concentration. Diversity in this case involves using different principles of operation or physical principles to satisfy the same or different system-level requirements. In the case of software, functional diversity means that the behavioral requirements for the software are different. For example, one program may check to see whether two numbers are equal and another, functionally diverse, program might select the larger of two numbers.

Note that the components must have different functional requirements to count as functionally diverse. Digital components that have the same functional requirements are not functionally diverse and do not make two separate systems diverse. An example of the latter case is the use of a digital component or components to provide the same protection functions where a diverse means to actually shut down the reactor (such as control rods and soluble neutron absorption) is used. The system-level actuation functions may be physically different (dropping the control rods or injecting a soluble neutron absorber), but if the digital components are performing the same protection functions (detection of the conditions to signal the need for a reactor scram), then the digital components do not have functional diversity.

To summarize:

Redundancy is the use of duplication or diversity to provide alternate means of performing a required function in the event of failure of an individual item (single failure). Redundancy may be active (all results, or components, are used) or standby (some results, or components, are not used until failure occurs).

Duplication is the use of multiple copies of the same component to provide protection against independent failures caused by physical degradation.

Design diversity is the use of two or more components with a different internal design but performing the same function.

Functional diversity is the use of two or more components to achieve different functions at the component level, although the functions may be related in terms of higher-level system requirements.

Design diversity and functional diversity are used to protect against common-cause or common-mode failures.

This chapter is concerned only with digital components. Design diversity, as defined above, is not extensively practiced in nuclear power plants for analog instrumentation and control; identical components and devices are used in redundant channels. This practice results from a conscious decision that design diversity of the nature suggested for software would introduce counter-productive complexity into the hardware environment. Analog systems are believed to fail in more predictable and obvious ways than do the more hidden and insidious failure mechanisms in software. This fact has allowed assessment and protection against common-mode analog failure potential without use of diversity except in a very limited way. It also allows the industry to collect operating experience on failure modes over a large application base.

Digital technology introduces a possibility that common-mode software failure may cause redundant safety systems to fail in such a way that there is a loss of safety function. Arguments for independence in redundant or functionally diverse hardware designs are often based on the failures being related to different physical principles or causes and therefore acceptably independent or on the ability to build in a particular failure mode, e.g., a valve that is designed to fail open. These same arguments and methods do not apply to software. When considering common-mode software failure, the issue is whether assumptions about independence could be compromised when digital components are substituted for analog components.
Although the committee found that some people use the term "common-mode software error" to mean any software error, the term as used here specifically denotes errors that involve dependencies between two or more digital components. When only one of a set of diverse components is digital, i.e., when a digital component is used in conjunction with analog devices or human backups (e.g., when a relay system, a digital device, and a manual actuator are used together to provide design diversity and adequate reliability), there appear to be no additional issues raised over current practice. The committee sees nothing special concerning the common-mode failure problem in this situation that is not covered by current procedures to evaluate the potential for common-mode failure between different types of devices.

Statement of the Issue

Digital technology introduces a possibility that common-mode software failures may cause redundant safety systems to fail in such a way that there is a loss of safety function. Various procedures have been developed and evolved for evaluating common-mode failure potential in analog devices. Do these same procedures apply to computers and software or are different approaches to ensuring reliability needed? What does software diversity mean? Can it be achieved and assessed and, if so, how? Do techniques exist for assessing common-cause failure and common-mode failure when computers are involved? What are the implications of common-mode software failure for the licensing process and the use of component diversity? Are redundancy and diversity the most effective way to achieve reliability for digital systems?

Applicability to Existing and New Plants

The problem of common-mode software failure is important in both retrofits of digital components into existing plants and in new plant design. In older plants where digital components are being substituted for analog ones, assumptions

about the independence of components may have been made in the original licensing basis. If these independence assumptions can be invalidated by the introduction of the digital components, then the safety evaluation must be redone using the new assumptions. In new plants, if the use of digital components can invalidate standard assumptions and procedures for achieving and assessing independence and high reliability, then new procedures may be needed.

U.S. NUCLEAR REGULATORY COMMISSION POSITION

The U.S. Nuclear Regulatory Commission (USNRC) staff has developed the following position with respect to diversity, as stated in the draft branch technical position, Digital Instrumentation and Control Systems in Advanced Plants (USNRC, 1992):

1. The applicant shall assess the defense-in-depth and diversity of the proposed instrumentation and control system to demonstrate that vulnerabilities to common-mode failures have been adequately addressed. The staff considers software design errors to be credible common-mode failures that must be specifically included in the evaluation.

2. In performing the assessment, the vendor or applicant shall analyze each postulated common-mode failure for each event that is evaluated in the analysis section of the safety analysis report (SAR) using best-estimate methods. The vendor or applicant shall demonstrate adequate diversity within the design for each of these events.

3. If a postulated common-mode failure could disable a safety function, then a diverse means, with a documented bases [sic] that the diverse means is unlikely to be subject to the same common-mode failure, shall be required to perform either the same function or a different function. The diverse or different function may be performed by a nonsafety system if the system is of sufficient quality to perform the necessary function under the associated event conditions. Diverse digital or nondigital systems are considered to be acceptable means. Manual actions from the control room are acceptable if time and information are available to the operators. The amount and types of diversity may vary among designs and will be evaluated individually.

4. A set of displays and controls located in the main control room shall be provided for manual system-level actuation and control of critical safety functions and monitoring of parameters that support the safety functions. The displays and controls shall be independent and diverse from the safety computer system identified in items 1 and 3 above.

The position for existing plants is the same except that item 4 is not required.

Because the regulatory requirement depends on providing a diverse means of carrying out a safety function, the USNRC has also recently issued guidelines to assess whether sufficient diversity exists between digital systems. The guidelines state (USNRC, 1996) that adequate diversity is assumed to exist if:

All of the following are different: programming language, hardware, function, signal, design (including design team), or

The digital systems provide a different function but are developed using the same programming language and by the same vendor, or

The digital systems have a different vendor but perform the same function ("nameplate" diversity), or

Case-by-case review is required for other implementations of diversity.

DEVELOPMENTS IN THE FOREIGN NUCLEAR INDUSTRY

The Canadian Atomic Energy Control Board (AECB) has recently also been developing a position on this issue. Their draft regulatory guide C-138, Software in Protection and Control Systems (also discussed in Chapter 4 above), contains the following policy (AECB, 1996):

To achieve the required levels of safety and reliability, the system may need to be designed to use multiple, diverse components performing the same or similar functions. For example, AECB Regulatory Documents R-8 and R-10 require two independent and diverse protective shutdown systems in Canadian nuclear power reactors.
It should be recognized that when multiple components use software to provide similar functionality, there is a danger that design diversity may be compromised. The design should address this danger by enforcing other types of diversity such as functional diversity, independent and diverse sensors, and timing diversity.

Thus, the AECB draft regulatory guide agrees with the USNRC with respect to recognizing the possibility of common-mode software failure and requiring steps to be taken to reduce that possibility. The difference appears to be that the AECB accepts functional diversity as one means of addressing the common-mode software failure issue but does not mandate it. The USNRC accepts digital systems performing the same function but provided by different vendors.

DEVELOPMENTS IN OTHER SAFETY-CRITICAL INDUSTRIES

Regulatory agencies in fields other than nuclear power do not, in general, have equivalent policies about common-mode software failure because of the different approach to safety assurance in other industries. Thus simple comparisons can be misleading. In general, in other industries, all components are considered potentially safety-critical and no distinction is made between safety and nonsafety systems except with respect to their potential to contribute to hazards identified in a system hazard analysis. Components whose operation or failure could cause hazards (such as control

systems) are treated in the same way as those that could mitigate hazards and, in fact, are considered more important because hazard prevention is given a higher priority than hazard mitigation. No assumptions are made or requirements levied to use protection or shutdown systems; the design approach used must be justified for each system according to the hazard analysis and characteristics of the particular system.

The Federal Aviation Administration (FAA) satisfies the need for guidance in satisfying airworthiness requirements for airborne systems by a series of industry-generated and accepted guidelines reflecting best practices: DO-178B, Software Considerations in Airborne Systems and Equipment Certification. These guidelines are in the form of objectives for software life-cycle processes, descriptions of activities and design considerations for achieving these objectives, and descriptions of the evidence that indicates that the objectives have been achieved. The guidelines are applied in a graded manner that depends on the assessed level of criticality of the software component. Redundancy or diversity in the software is not required by DO-178B. If the licensee wants to take credit for it, that is, reduce the set of normally required activities for their software development process, they must argue the case and get approval from the FAA. Specifically, DO-178B states with respect to using software design diversity (FAA, 1992):

The degree of dissimilarity and hence the degree of protection is not usually measurable. Probability of loss of system function will increase to the extent that the safety monitoring associated with dissimilar software versions detects actual errors or experiences transients that exceed comparator threshold limits. Multiple software versions are usually used, therefore, as a means of providing additional protection after the software verification process objectives for the software level have been satisfied.
In summary, the FAA position on the use of software diversity is that the degree of dissimilarity and protection provided by design diversity is not usually measurable and therefore is usually counted only as additional protection above a required level of assurance.

The defense and aerospace industries use MIL-STD-882C (DOD, 1993) or variations of it (for example, NASA standards are based on MIL-STD-882C). This standard requires the use of a formal safety program that stresses early hazard identification and elimination or reduction of associated risk to a level acceptable to the managing authority. Rather than specify a particular safety design approach, such as defense-in-depth, or design features, such as redundancy or diversity, MIL-STD-882C requires that contractors establish a system safety program that includes specific tasks (such as hazard tracking, reviews and audits, hazard analyses, and safety verification) and criteria (such as the use of qualitative risk assessment and an order of precedence for resolving hazards). In contrast to the FAA and some nuclear power standards, software components are not graded as to their criticality and then subjected to different software development procedures, but rather the hazards themselves are assessed and either eliminated or controlled. Earlier versions of this defense standard included tasks that were specific to software, but the latest version (MIL-STD-882C) has integrated the software tasks with the nonsoftware tasks and does not distinguish them.

The U.S. Office of Device Evaluation of the Center for Devices and Radiological Health of the U.S. Food and Drug Administration has issued Reviewer Guidance for Computer Controlled Medical Devices Undergoing 510(k) Review (FDA, 1991). This guidance applies to the software aspects of premarket notification (510(k)) submissions for medical devices. It provides (1) an overview of the kind of information about software that FDA reviewers may expect in company submissions and (2) specification of the approach that FDA reviewers should take in reviewing computer-controlled devices, such as some key questions that will be asked during the review.

The FDA guidance does not dictate any particular approach to safety, as does the USNRC, or specific software development or quality assurance procedures, as does the FAA. Instead, it focuses attention on the software development process to assure that potential hazardous failures have been addressed, effective performance has been defined, and means of verifying both safe and effective performance have been planned, carried out, and properly reviewed. The FDA believes that in addition to testing, device manufacturers should conduct appropriate analyses and reviews in order to avoid errors that may affect operational safety.

The depth of review is dictated by both the risk to the patient of using (and not using) the device and the role that software plays in the functioning of the device. The three levels of concern are (FDA, 1991):

MAJOR: The level of concern is major if operation of the device or software function directly affects the patient so that failures or latent design flaws could result in death or serious injury to the patient, or if it indirectly affects the patient (e.g., through the action of a care provider) such that incorrect or delayed information could result in death or serious injury of the patient.

MODERATE: The level of concern is moderate if the operation of the device or software function directly affects the patient so that failures or latent design flaws could result in minor to moderate injury to the patient, or if it indirectly affects the patient (e.g., through the action of a care provider) where incorrect or delayed information could result in injury of the patient.

MINOR: The level of concern is minor if failures or latent design flaws would not be expected to result in death or injury to the patient. This level is assigned to a software component that the manufacturer can show to be totally independent of other software or hardware that may be involved in a potential hazard and would not directly or indirectly lead to a failure of the device that could cause a hazardous condition to occur.

The FDA does not specify particular software assurance or development procedures. Instead, the FDA specifies what information should be included in the review documents and what types of questions will be asked during the review for each level of concern. The submission must include a hazard analysis that identifies the potential hazards associated with the device, the method of control (hardware or software), the safeguards incorporated, and the identified level of concern. Because there is no specification of how safety should be achieved, there is no guidance provided on redundancy or diversity.

U.S. NUCLEAR REGULATORY COMMISSION RESEARCH ACTIVITIES

The Office of Nuclear Regulatory Research of the USNRC indicated that they currently fund only one research project on common-mode failure potential. This research project is developing a software tool called Unravel for program slicing. Program slicing is a technique that was developed to assist with software debugging. Basically, program slicing extracts the statements that might affect the value of a specified variable before execution reaches a particular statement in the program. Thus, if one is trying to fix an error in statement N, it is helpful to know what other statements in the software can affect the values of the variables in that statement. To perform the slicing, the program is first represented as a flow graph annotated with the variables referenced and defined at each node (roughly, a node is a programming language statement). Unravel works only on programs written in the (ANSI [American National Standards Institute]) C programming language, without any extensions to the language. Some features of C cannot be handled, including calls to the C standard library.

The USNRC argument for the usefulness of this tool is that it can assist auditors in evaluating functional diversity in safety-critical software and in conducting a thread audit.
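
The idea behind a backward slice can be sketched for the simplest possible case. This is not Unravel's algorithm: it assumes a straight-line program with no branches, loops, or pointers, so the flow graph degenerates to a list of statements, and the program representation and function name are hypothetical.

```python
def backward_slice(statements, criterion_index, variables):
    """Return indices of statements that may affect `variables` before
    execution reaches statements[criterion_index].

    statements is a list of (defined_variable, used_variables) pairs for a
    straight-line program (an assumption of this sketch).
    """
    relevant = set(variables)
    in_slice = []
    # Walk backwards from the statement just before the criterion.
    for i in range(criterion_index - 1, -1, -1):
        defined, used = statements[i]
        if defined in relevant:
            in_slice.append(i)
            relevant.discard(defined)  # earlier definitions are overwritten here
            relevant.update(used)      # the inputs of this statement now matter
    return sorted(in_slice)


# Hypothetical six-statement program, annotated with defined and used variables.
program = [
    ("a", set()),        # 0: a = read()
    ("b", set()),        # 1: b = read()
    ("c", {"a"}),        # 2: c = a * 2
    ("d", {"b"}),        # 3: d = b + 1
    ("e", {"c"}),        # 4: e = c - 3
    (None, {"e", "d"}),  # 5: print(e, d)
]

# Statements that can affect the value of e when execution reaches statement 5.
print(backward_slice(program, 5, {"e"}))
```

Handling real C programs requires control dependences and aliasing analysis on the full flow graph, which this sketch deliberately leaves out.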
The committee has not previously seen any argument that the technique could be used for evaluating diversity and is skeptical about this (see the evaluation later in this chapter).

ANALYSIS

When multiple digital components are used to provide diversity, the potential for common-mode software failure exists, requiring consideration of two relevant issues: (1) whether failure independence can be assumed or under what conditions it can be assumed (Issue 1); and (2) whether failure independence can be verified, that is, whether there are any ways to determine that the digital components are adequately independent or diverse in their failure behavior (Issue 2). Both issues are examined in turn, considering both digital hardware and software.

Issue 1

Is the failure independence assumption justified for independently produced digital components? For the purposes of discussing this question, design diversity is separated from functional diversity. Also, operating systems are grouped with hardware unless the operating system functions have been specially written for a particular application or digital device. In the latter case, operating systems are considered as application software.

Design Diversity

Case 1: Digital Hardware and Operating Systems. For hardware, the prevalence of a very few processors and real-time operating systems invalidates the use of simple "nameplate" diversity assumptions. Many computers with different manufacturers in fact have identical internal components or use the same operating system. Although the committee knows of no data to support generally rejecting the assumption of independence between failures of diverse digital hardware devices, there are three concerns in assuming independent failures between digital hardware components providing the same function but produced by different manufacturers. The first is that many of the well-publicized errors found in processors have involved similar functions, for example, floating point operations.
The second is the increasing complexity of chip designs, which has led to a lowered ability to adequately test the designs before using them. Testing and verification techniques originally developed for software are now being adapted for use in digital hardware because the complexity of these hardware designs is approaching that of software, thus defying exhaustive testing. A third consideration is the use of common design environments, libraries, and fabrication facilities. Therefore, the question of whether hardware design errors can be assumed to be independent is beginning to have a close relationship to the same question with respect to software. Currently, however, when the design is different there exists no evidence to invalidate the assumption that failures of digital hardware components due to design errors will be independent.

Similarly, assuming intended differences in design, there also is little current evidence to invalidate an assumption of independence of failure between different real-time operating systems. Note, however, that this assumption applies only to operating systems developed by different companies. Different versions of an operating system by one vendor often include the reuse of much of the same code. In addition, evidence does exist of similar failure modes and errors being found in UNIX operating systems built by different vendors (Miller et al., 1990).

However, the above restrictions may be relaxed if analysis has shown that there is functional diversity. This would allow a single company to design functionally diverse operating

OCR for page 43
systems. Similarly, functional diversity needs to be assured when using different companies for operating systems and hardware. Licensing agreements between companies can destroy assumptions of functional diversity based on different vendors. However, even operating systems and library functions produced by different companies can have common-mode software errors. For example, in 1990, a mathematician reported on a computer bulletin board that he had found a serious bug in MACSYMA, a widely used program that computes mathematical functions (Sci.math, 1990). This program incorrectly computes the integral from 0 to 1 of the square root of (x + 1/x - 2) to be -(4/3) instead of the correct value of 4/3. Other readers of the bulletin board became curious and tried the same problem on other math packages. The result was that four packages (MACSYMA, Maple, Mathematica, and Reduce) got the same wrong answer while only one (Derive) got the correct answer. These mathematical packages were all developed separately in different programming languages, and even in different countries, and had been widely used for many years and yet contained the same error. Case 2: Application Software. The effectiveness of design diversity in increasing software reliability rests on the assumption of statistical independence of failures in separately developed software versions (including both application software and specially constructed operating system functions), such as separately developed digital protection systems. This assumption is important in evaluating whether software design diversity satisfies the USNRC requirements for diversity and independent failures. Several scientific studies have experimentally evaluated the hypothesis that software separately developed to satisfy the same functional requirements will fail in a statistically independent manner (Brilliant et al., 1990; Eckhardt et al., 1991; Knight and Leveson, 1986; Scott et al., 1987). 
All these studies have rejected the hypothesis with a high confidence level, i.e., concluded that the number of correlated (common-mode) failures that actually occurred for the programs in the various experiments could not have occurred by chance. The implication is that although design diversity might be able to increase reliability, increased reliability cannot be assumed.

In two of the experiments, the programming errors causing correlated (common) failures were examined to better understand the nature of faults that lead to coincident failures and to determine methods of development for multiple software versions that would help avoid such faults. The first experiment (Knight and Leveson, 1986) found that, as anticipated, in some cases the programmers made equivalent logical errors. More surprising, there were cases in which apparently different logical errors yielded correlated failures in completely different algorithms or in different parts of similar algorithms. For example, in order to satisfy the requirements, the programs needed to compute the size of an angle given three points. Most of the programs worked correctly for the normal case. However, eight of the 27 programmers had difficulty in handling the case where the three points were collinear, even though the algorithms used and the actual errors made were quite different. Five of the eight mishandled or failed to consider one or both of the possible subcases (i.e., angle equal to zero degrees and angle equal to 180 degrees). One handled all the cases, but used an algorithm that was inaccurate over certain parts of the input space. Another had machine round-off problems. The final programmer had an apparent typo in an array subscript that, seemingly by chance, resulted in an error only when the points were collinear.
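
The collinear case described above can be made concrete with a minimal sketch. The function name angle_at_vertex, the arccosine formula, and the clamping step are assumptions chosen for illustration; they are not taken from the programs written in the Knight and Leveson experiment. The sketch shows why the two collinear subcases (0 and 180 degrees) are a natural trouble spot: they sit at the edge of the arccosine domain, where round-off can push the computed cosine slightly outside [-1, 1], and where the arccosine formula is least accurate.

```python
import math

def angle_at_vertex(a, b, c):
    """Return the angle ABC in degrees, with the vertex at b.

    Hypothetical helper for illustration only; not code from the experiment.
    """
    ux, uy = a[0] - b[0], a[1] - b[1]
    vx, vy = c[0] - b[0], c[1] - b[1]
    nu, nv = math.hypot(ux, uy), math.hypot(vx, vy)
    if nu == 0.0 or nv == 0.0:
        # Degenerate input: one of the points coincides with the vertex.
        raise ValueError("angle undefined: a point coincides with the vertex")
    cos_theta = (ux * vx + uy * vy) / (nu * nv)
    # For collinear points, round-off can give a cosine just outside
    # [-1, 1]; math.acos would then raise ValueError, so clamp first.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

print(angle_at_vertex((1, 0), (0, 0), (0, 1)))   # normal case
print(angle_at_vertex((1, 0), (0, 0), (2, 0)))   # collinear, same side of vertex
print(angle_at_vertex((-1, 0), (0, 0), (2, 0)))  # collinear, opposite sides
```

A version that omits the zero-length check, the clamp, or one of the two collinear subcases would still pass tests drawn from the normal case, which is consistent with the observation that quite different mistakes can surface on the same inputs.
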
Knight and Leveson concluded that there are some input cases (i.e., parts of the problem space) that are more difficult to handle than others and are therefore likely to lead to errors, even though the algorithms used and the actual errors made may be very different. In the second experiment, Scott et al. (1987) examined the errors made in their programs and also concluded that dependence was related to a "difficulty factor": If one program gave a wrong answer for a particular input, then it was likely that other programs would also produce an incorrect answer, even though the errors made were different and the programs used different algorithms.

In another experiment, Brunelle and Eckhardt (1985) took a portion of an operating system (SIFT) and ran it in a three-way voting scheme with two other operating systems written for the same computer. The results showed that although no errors were found in the original version, there were instances where the two new versions outvoted the correct original version to produce a wrong answer.

Following these experiments, Eckhardt and Lee (1985) produced a mathematical model that explains the results. Their model also shows that even small probabilities of correlated failures, i.e., deviation from statistically independent failures, cause a substantial reduction in potential reliability improvement when using diverse software components.

In summary, the experiments conducted on this issue indicate that statistically correlated failures result from the nature of the application, from similarities in the difficulties experienced by individual programmers, and from special cases in the input space. The correlations seem to be related to the fact that the programmers are all working on the same problem and that humans do not make mistakes in a random fashion.
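
The effect captured by the "difficulty factor" and by Eckhardt and Lee's model can be illustrated numerically. The sketch below is a toy calculation in the spirit of that model, not a reproduction of it: the function name, the two input classes, and all probabilities are invented for illustration. It assumes that each version is developed independently, so that for a given input the versions fail independently, but that some inputs are harder than others. If theta is the probability that a randomly chosen version fails on a given input, the probability that two versions fail together is the mean of theta squared, which equals the square of the mean plus the variance of theta and therefore exceeds the "independent" prediction whenever difficulty varies across inputs.

```python
def coincident_failure(theta, weights):
    """Compare actual and 'independent' two-version failure probabilities.

    theta[i]:   probability that a randomly chosen version fails on an
                input of class i (hypothetical values).
    weights[i]: probability that an input falls in class i.
    Returns (p_single, p_both, p_single ** 2).
    """
    p_single = sum(w * t for w, t in zip(weights, theta))
    p_both = sum(w * t * t for w, t in zip(weights, theta))
    return p_single, p_both, p_single ** 2

# Assumption: 99% of inputs are easy (no version fails on them) and
# 1% are hard (each version fails on them with probability 0.1).
weights = [0.99, 0.01]
theta = [0.0, 0.1]

p_single, p_both, p_indep = coincident_failure(theta, weights)
print("single version fails:       ", p_single)
print("both versions fail:         ", p_both)
print("prediction if independent:  ", p_indep)
print("ratio actual / independent: ", p_both / p_indep)
```

With these invented numbers, coincident failure is roughly two orders of magnitude more likely than an independence assumption would predict, even though no two versions share code or a common error.
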
There is no reason to expect that the use of different development tools or methods, or any other simple technique, will significantly reduce the incidence of errors giving rise to correlated failures in multiple-version software components.

All evidence points to the fact that independently developed software that uses different programmers, programming languages, and algorithms but computes the same function (satisfies the same functional requirements) cannot be assumed to fail in an independent fashion. Thus the USNRC position that allows "nameplate" diversity or design diversity to be used to assure independence is not supported by the extensive scientific evidence that is available. Other regulatory agencies, such as the FAA and the AECB, do not accept design diversity as evidence of failure independence.

Functional Diversity

In contrast with design diversity, no assumptions about the independence of the code are made when using functional diversity, only about whether the functional requirements are independent and different. The problem here really reduces to the same problem that is found with functionally diverse analog components, and no new procedures are necessary except to determine whether any new failure modes have been introduced that might violate the system-level independence assumptions. Thus, the current USNRC position on functional diversity is consistent with the scientific evidence.

Issue 2

Can the independence of multiple versions of software be evaluated? That is, if statistical independence cannot simply be assumed for independently developed software, can software diversity be evaluated or assessed in some way?

Procedures have been developed for evaluating the potential for common-mode failure of analog hardware components. In addition, the number of states and the continuity of behavior over the total state space for analog components allow either exhaustive testing or much more confidence in the testing. In contrast, only a small fraction of the state space for digital systems can usually be tested, and the lack of continuity in behavior does not allow any assumptions about the behavior of the software for any inputs or input sequences not specifically tested.

Verifying diversity between two algorithms is impossible in general. Equivalence between two algorithms (and thus also lack of equivalence) has been proven to be mathematically undecidable. But even if diversity cannot be assessed formally, perhaps it can be evaluated informally.
The problem reduces to determining what is meant by design diversity between two computer programs. Syntactic diversity (differences in the syntax or lexical structure of the programs) is not the relevant issue: Two programs can be syntactically different (look very different) and yet compute identical mathematical functions. Even if one could verify diversity between two algorithms, that would not be adequate, because different algorithms may compute the same functions and therefore behave identically. Basically, what is sought is two programs that compute the same function except where they are incorrect (i.e., where they differ from the requirements). Evaluating for independence of failure behavior would require proving that the two programs were different only in their failure behavior (or that they were not identical in their total function computed). To accomplish this would require the same logical power as that required to identify design errors (at which point they would just be removed). Thus, if it were possible to verify effective design diversity, diversity would not be needed.

In summary, there is no way to verify or evaluate the diversity of two software versions or to determine whether they will fail independently.

As discussed earlier, the USNRC is currently funding a research project at the National Institute of Standards and Technology to build a tool called Unravel for program slicing. A stated goal for this tool is to assist USNRC auditors in evaluating functional diversity in nuclear power plant safety system software. The developers of the tool say that it can be used to "identify code that is executed in more than one computation and [that] thus could lead to a malfunction of more than one logical software component." In general, evaluating functional diversity is not possible by simply identifying the code related to a particular computation, as done by program slicing.
The probability that separately developed programs will contain the same code is extremely small. If there is any attempt to make the software diverse, then the programs will almost certainly use different variables, data structures, and algorithms. In addition, the experiments described above found that programs failed in a statistically dependent manner even when they used completely different algorithms and had unrelated programming errors. The only relationship needed between software errors to cause statistically dependent failures is that the errors occur on the execution paths for each program that will be followed by the same input data. The errors can appear anywhere on those paths, and the computations and errors on the paths may be different.

The second proposed use of program slicing is for thread audits. However, a technique like program slicing that works backward from a particular statement to find any statements that might affect it seems to have much less relevance for thread audits than a tool that will identify paths through the code starting from particular inputs. Other techniques, such as symbolic execution, are more precise (provide more information to the analyst) and are probably less costly. Slicing can work backward from an output to identify statements affecting the output and thus all paths to that output, but cannot distinguish feasible from infeasible paths and identifies all such paths, not just those related to a particular input. The analyst must then determine by hand which paths relate to the thread being investigated and whether each path is feasible (a difficult task). Symbolic execution, on the other hand, can start from specific inputs and identify feasible paths through the code, evaluating the particular predicates that must be true for the path to be taken.
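
The backward direction of slicing can be sketched on a toy example. The code below is a deliberately simplified illustration, not a description of how Unravel works: it handles only straight-line code, tracks only data dependences, and uses an invented five-statement "program" whose variable names (pressure, margin, trip, and so on) are hypothetical. Starting from an output variable, it walks backward and keeps every statement that could affect that output.

```python
def backward_slice(statements, target):
    """Return indices of statements that may affect `target` at program end.

    statements: list of (defined_var, used_vars) pairs describing
    straight-line code. Control flow is ignored (a simplifying assumption).
    """
    relevant = {target}
    kept = []
    for index in range(len(statements) - 1, -1, -1):
        defined, used = statements[index]
        if defined in relevant:
            kept.append(index)
            # The definition is now explained by the variables it uses.
            relevant.discard(defined)
            relevant.update(used)
    return sorted(kept)

# Hypothetical program, one (defined, used) pair per statement.
program = [
    ("pressure", ["sensor_a"]),   # 0: pressure = scale(sensor_a)
    ("temp", ["sensor_b"]),       # 1: temp = scale(sensor_b)
    ("margin", ["pressure"]),     # 2: margin = limit - pressure
    ("log_entry", ["temp"]),      # 3: log_entry = format(temp)
    ("trip", ["margin"]),         # 4: trip = margin < 0
]

print(backward_slice(program, "trip"))       # statements affecting trip
print(backward_slice(program, "log_entry"))  # statements affecting log_entry
```

The slice identifies which statements feed an output but says nothing about which input values drive execution down a given path, or whether a path can be taken at all; that is the information a forward technique starting from specific inputs is meant to supply.
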
Another potentially useful technique, somewhat related to symbolic execution and called Software Deviation Analysis (Reese, 1996), also does a forward analysis from inputs to determine the effect on outputs, but starts with likely or possible deviations in the inputs from their expected values and determines whether hazardous outputs can be generated.

Alternatives to Diversity for Software

In addition to the two main diversity issues discussed above (Issue 1 and Issue 2), one final question is whether redundancy and diversity are the most effective way to increase reliability for digital systems or whether there are more effective alternatives. Potential alternatives include mathematical verification techniques, self-checking software, and safety analysis and design techniques.

While mathematical verification of software is potentially effective in finding programming errors, these techniques are difficult to use and have only been applied to very small programs by mathematically sophisticated users. Writing the required formal specifications and doing the proofs is difficult and has not been shown to be less error prone in practice than using less formal techniques. In fact, little or no comparative evaluation with the alternatives has been done. Despite these caveats, the committee notes that mathematical verification has been used by Ontario Hydro on their Darlington and Pickering plant protection system software. The committee understands that the Canadian experience shows that the cost of mathematical verification can be very high but that verification is far more cost effective if it is built into the development process from the beginning rather than being imposed at the end.

Digital systems have the capability to provide self-checking to detect digital hardware failures and some software errors during execution. This has proven effective for random hardware failures but not for software design errors. Built-in tests for some programming errors, such as attempting to divide by zero, are easily implemented and effective. However, checking for more subtle errors is more difficult and may, in itself, add so much complexity that it leads to errors.
For example, a licensee event report about a problem at the Turkey Point plant in Florida in 1994 described a software error that could result in a real emergency signal being ignored if it is received 15 seconds or more after the start of particular test scenarios (see discussion in Chapter 4). An experiment by Leveson et al. (1990) in writing self-checks for software found that very few of the known errors in the code were found by the self-checks. Even more discouraging, the self-checks themselves introduced more errors than they found.

Safety analysis and design techniques (see Leveson, 1995) extend standard system safety techniques to software. Software-related hazards are identified and then eliminated or controlled. In this approach, not all potential errors are targeted but simply those that could lead to hazards or accidents. As such, this approach is potentially less costly than a full formal verification. A type of safety verification procedure (called software fault tree analysis) was used (in addition to formal verification) during the licensing of the Darlington shutdown system (Bowman et al., 1991). The information provided during the analysis was used to change the code to be more fault-tolerant and to design 40 self-checks that were added to the software.

Although many in the software engineering community believe that there are more cost-effective techniques (including both those described here and others) for achieving high software reliability than redundancy and diversity, there is no agreement among them about what these alternatives are.

CONCLUSIONS AND RECOMMENDATIONS

Conclusions

Conclusion 1. The USNRC position of assuming that common-mode software failure could occur is credible, conforms to engineering practice, and should be retained.

Conclusion 2.
The USNRC position with respect to diversity, as stated in the draft branch technical position, Digital Instrumentation and Control Systems in Advanced Plants, and its counterpart for existing plants, is appropriate.

Conclusion 3. The USNRC guidelines on assessing whether adequate diversity exists need to be reconsidered. With regard to these guidelines:

(a) The committee agrees that providing digital systems (components) that perform different functions is a potentially effective means of achieving diversity. Analysis of software functional diversity showing that independence is maintained at the system level and no new failure modes have been introduced by the use of digital technology is no different from that for upgrades or designs that include analog instrumentation.

(b) The committee considers that the use of different hardware or real-time operating systems is potentially effective in achieving diversity provided functional diversity has been demonstrated. With regard to real-time operating systems, this applies only to operating systems developed by different companies or shown to be functionally diverse.

(c) The committee does not agree that use of different programming languages, different design approaches meeting the same functional requirements, different design teams, or different vendors' equipment used to perform the same function is likely to be effective in achieving diversity. That is, none of these methods is a proof of independence of failures. Conversely, the presence of these methods is not proof of dependence of failures either.

Conclusion 4. There appears to be no generally applicable, effective way to evaluate diversity between two pieces of software performing the same function. Superficial or surface (syntactic) differences do not imply failure independence, nor does the use of different algorithms to achieve the same functions. Therefore, funding research to try to evaluate design diversity does not appear to be a reasonable use of USNRC research funds.

Conclusion 5. Although many in the software community believe that there are more cost-effective techniques for achieving high software reliability than redundancy and diversity, there is no agreement as to what these alternatives may be. The most promising of these appear to be the extension of standard safety analysis and design techniques to software and the use of formal (mathematical) analysis. (See Recommendation 3 in Chapter 4.)

Conclusion 6. The use of self-checking to detect hardware failures and some simple software errors is effective and should be incorporated. However, care must be taken to assure that the self-checking features themselves do not introduce errors.

Recommendations

Recommendation 1. The USNRC should retain its position of assuming that common-mode software failure is credible.

Recommendation 2. The USNRC should maintain its basic position regarding the need for diversity in digital instrumentation and control (I&C) systems as stated in the draft branch technical position, Digital Instrumentation and Control Systems in Advanced Plants, and its counterpart for existing plants.

Recommendation 3. The USNRC should revisit its guidelines on assessing whether adequate diversity exists. The USNRC should not place reliance on different programming languages, different design approaches meeting the same functional requirements, different design teams, or using different vendors' equipment ("nameplate" diversity). Rather, the USNRC should emphasize potentially more robust techniques such as the use of functional diversity, different hardware, and different real-time operating systems.

Recommendation 4. The USNRC should reconsider the use of research funding to try to establish diversity between two pieces of software performing the same function. This does not appear to be possible.
Specifically, it appears the USNRC funding of the Unravel tool is based on the use of this tool for this purpose and, as such, is unlikely to be useful.

REFERENCES

AECB (Atomic Energy Control Board, Canada). 1996. Draft Regulatory Guide C-138, Software in Protection and Control Systems. Ottawa, Ontario: AECB.
Bowman, W.C., G.H. Archinoff, V.M. Raina, D.R. Tremaine, and N.G. Leveson. 1991. An Application of Fault Tree Analysis to Safety-Critical Software at Ontario Hydro. Presentation at Conference on Probabilistic Safety Assessment and Management (PSAM), Beverly Hills, Calif., April.
Brilliant, S., J.C. Knight, and N.G. Leveson. 1990. Analysis of faults in an N-version software experiment. IEEE Transactions on Software Engineering 16(2):238–247.
Brunelle, J.D., and D.E. Eckhardt. 1985. Fault-Tolerant Software: Experiment with the SIFT Operating System. Presentation at AIAA Computers in Aerospace Conference, Dallas, October.
Eckhardt, D.E., and L. Lee. 1985. A theoretical basis for the analysis of multiversion software subject to coincident errors. IEEE Transactions on Software Engineering 11(12):1511–1517.
Eckhardt, D.E., A.K. Caglayan, P. Lorczak, J.C. Knight, D.F. McAllister, M. Vouk, L. Lee, and J.P. Kelly. 1991. Robustness of software redundancy as a strategy for improving reliability. IEEE Transactions on Software Engineering 17(7):692–702.
DOD (U.S. Department of Defense). 1993. Military Standard 882C, System Safety Program Requirements. Washington, D.C.: U.S. Department of Defense.
FAA (Federal Aviation Administration). 1992. DO-178B, Software Considerations in Airborne Systems and Equipment Certification. Washington, D.C.: FAA.
FDA (Food and Drug Administration). 1991. Reviewer Guidance for Computer Controlled Medical Devices Undergoing 510(k) Review. Washington, D.C.: FDA.
Knight, J.C., and N.G. Leveson. 1986. An experimental evaluation of the assumption of independence in multi-version programming. IEEE Transactions on Software Engineering 12(1):96–109.
Leveson, N.G. 1995. Safeware: System Safety and Computers. New York: Addison-Wesley.
Leveson, N.G., S.S. Cha, J.C. Knight, and T.J. Shimeall. 1990. The use of self checks and voting in software error detection: An empirical study. IEEE Transactions on Software Engineering 16(4):432–443.
Miller, B.P., L. Fredrikson, and B. So. 1990. An empirical study of the reliability of UNIX utilities. Communications of the Association for Computing Machinery 33(12):32–44.
Reese, J.D. 1996. Software Deviation Analysis. Ph.D. dissertation, University of California, Irvine. January.
Sci.math. 1990. Various authors posting to this Usenet newsgroup, Feb. 3–8.
Scott, R.K., J.W. Gault, and D.F. McAllister. 1987. Fault tolerant reliability modeling. IEEE Transactions on Software Engineering 13(5):582–592.
USNRC (U.S. Nuclear Regulatory Commission). 1992. Draft Branch Technical Position on Digital Instrumentation and Control Systems in Advanced Plants. Washington, D.C.: USNRC.
USNRC. 1996. Draft Branch Technical Position on Defense-in-Depth and Diversity. Washington, D.C.: USNRC. (Also USNRC staff presentation to the Committee on Application of Digital Instrumentation and Control Systems to Nuclear Power Plant Operations and Safety, Washington, D.C., October 1995.)