4
Software Quality Assurance

INTRODUCTION

Software in nuclear power plants can be used to execute relatively simple combinational logic, such as that used for reactor trip functions, or more elaborate sequential logic, such as that used for actuating engineered safety features or for process control and monitoring. In either case, it must be ensured that required actions are taken and unnecessary trips are avoided.1

One way of assuring software quality is by examining and approving the process used to produce it. The assumption behind assessing the process by which software is produced is that high-quality software development processes will produce software products with similar qualities. An alternate approach to quality assurance is to directly evaluate properties of the software. Software properties include correctness, reliability, and safety.

Software is defined as correct if it behaves according to its requirements. Assurance of software correctness is sought either experimentally via program testing or analytically through formal verification techniques. Software may be correct but still not perform as intended, however, because of flaws in requirements (e.g., inconsistencies or incompleteness) or assurance techniques (e.g., failing to consider or design for all significant parts of the software's input space).
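The two routes to assurance can be contrasted on a toy example; everything in the sketch below is invented for illustration. A handful of tests samples the requirement experimentally, while the final check covers the entire input space, which is feasible here only because the input space is tiny; formal verification seeks to reach an equivalent conclusion analytically, by proof rather than enumeration.

```python
def saturating_add(a: int, b: int) -> int:
    """Toy 8-bit saturating addition; its requirement is 'return min(a+b, 255)'."""
    return min(a + b, 255)

# Experimental assurance: sample the input space with a few test cases.
assert saturating_add(1, 2) == 3
assert saturating_add(200, 100) == 255

# Complete check of the requirement over every admissible input -- possible
# only because this input space is tiny; formal techniques aim to establish
# the same kind of result by argument when enumeration is impossible.
assert all(saturating_add(a, b) == min(a + b, 255)
           for a in range(256) for b in range(256))
```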

Software reliability is "the probability that a given program will operate correctly in a specified environment for a specified duration" (Goel and Bastani, 1985). Several models have been proposed for estimating software reliability (Musa et al., 1987).
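The published models are considerably more sophisticated, but the flavor of such an estimate can be shown with a minimal sketch that assumes a constant failure rate and an exponential reliability function; the function names and numbers below are purely illustrative and are not drawn from the cited models.

```python
import math

def estimated_failure_rate(failures: int, test_hours: float) -> float:
    """Crude point estimate of a constant failure rate (failures per hour).

    Assumes failures occur at a constant rate, which reliability-growth
    models refine considerably.
    """
    return failures / test_hours

def reliability(failure_rate: float, mission_hours: float) -> float:
    """Probability of failure-free operation over the mission, assuming an
    exponential model R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * mission_hours)

# Illustrative numbers only: 2 failures observed in 10,000 hours of testing.
lam = estimated_failure_rate(2, 10_000.0)
print(f"lambda ~= {lam:.2e} per hour")
print(f"R(720 h) ~= {reliability(lam, 720.0):.3f}")  # roughly 0.87 for one month
```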

Software is safe if it does not exhibit behaviors that contribute to a system hazard (i.e., a state that can lead to an accident given certain environmental conditions). Safety analysis and assurance techniques have been developed for all stages of the software life cycle (i.e., systems analysis, requirements, design, and code verification) (Leveson, 1995).

Complexity is an important aspect of assessing correctness, reliability, and safety of software. (The committee notes that complexity is of critical importance to the use of digital instrumentation and control [I&C] systems, and it is addressed in numerous places in this report.) The committee is not aware, however, of any software complexity metrics that are reliable and definitive.

Analog and digital systems should be analyzed differently because the assumptions underlying their design and production are different. Reliability estimation for analog systems primarily measures failures caused by parts wearing out, whereas for digital systems it seeks to address failures caused primarily by latent design flaws. Analog systems can be modeled using continuous and discrete functions, whereas digital systems must be modeled using discrete mathematics only. Although analog systems can also contain latent design flaws, such flaws are believed to be accommodated by existing evaluation techniques. When an analog system functions correctly on two "close" test points and continuous mathematics is applicable, it can be assumed that it will also function on all points between the two test points. This is not necessarily true for digital systems, which may produce very different results for similar test points.
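A small, purely illustrative sketch (the routine and its calibration constants are invented) shows why interpolating between successful test points is unsound for software: a hidden branch can make the output jump between two nearly identical inputs.

```python
def scaled_reading(raw: float) -> float:
    """Toy conversion routine with a hidden mode switch at raw == 100.0.

    Two 'close' test points, 99.9 and 100.1, exercise different branches
    and return very different results, so nothing can be inferred about
    the inputs between two successful tests.
    """
    if raw < 100.0:
        return raw * 0.5          # low-range calibration
    return raw * 0.5 - 40.0       # high-range calibration (hypothetical flaw)

print(scaled_reading(99.9))   # 49.95
print(scaled_reading(100.1))  # 10.05 -- a large jump for a tiny input change
```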

Statement of the Issue

The use of software is a principal difference between digital and analog I&C systems. Quality of software is measured in terms of its ability to perform its intended functions; this, in turn, is traced to software specifications and compliance with those specifications. Neither of the classic approaches of (a) controlling the software development process or (b) verifying the end product appears to be fully satisfactory in assuring adequate quality of software, particularly for use with safety-critical systems. How can the U.S. Nuclear Regulatory Commission (USNRC) and the nuclear industry define a generally accepted, technically sound solution to specifying, producing, and controlling software needed in digital I&C systems?

1 Although this chapter covers software quality assurance, its conclusions apply to any technology requiring equivalent design effort, e.g., field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and programmable logic controllers (PLCs). Digital hardware designs can range in complexity from a simple circuit to a microprocessor to a general-purpose computer. The complexity of a design is not eliminated or changed simply by expressing the design in a different form. The committee has seen no proof that software implemented on an ASIC is likely to have a different level of reliability or to be more verifiable. Testability (which is related to complexity) is not changed simply because the form of the software instructions has changed from a set of programming language instructions to a set of gate arrays. However, software implemented in ASICs (as well as software stored in read-only memory) does have configuration control advantages in that unintended changes to the software outside the configuration management system become much more difficult.




Discussion

High quality software results from the use of good software engineering practices during development to minimize the probability of introducing errors into the software, and a rigorous verification process to maximize the probability of detecting errors. Good software engineering practices (e.g., structured programming and data abstraction) reduce the amount of information that developers must remember when writing, analyzing, or changing software. However, good software engineering methods are not easy to apply, and the methods only reduce rather than eliminate errors (Parnas, 1985). Thus software verification activities remain a key concern.

Software verification seeks to determine that the software being built corresponds to its specification, and software validation seeks to demonstrate that the system meets its operational goals. Verification and validation (V&V) activities may focus on either the process or the product. Process-oriented V&V focuses on the process by which the software is produced. It typically involves performing and observing inspections and evaluating test results. Product-oriented V&V focuses on testing and evaluating the final product, independent of the process.

Different techniques for assessing software quality have been developed. These techniques fall into two broad categories, analytic or experimental, each of which encompasses a large number of methods. Analytic techniques include inspections or walk-throughs and formal analysis methods based on mathematics. Program testing is the most common experimental analysis technique.

In software reviews or inspections, teams of software developers examine software artifacts for defects. Participants may be given lists of questions about the artifact that they must answer in order to ensure that they are sufficiently prepared for an inspection, and they may be given lists of potential errors for which they are to check. Inspections have proved to be an effective method for detecting software defects (Fagan, 1976). Requirements inspections catch errors before they propagate into designs and implementations, making them less costly to repair. Also, inspections subject a software artifact to the scrutiny of several people, some of whom would not have participated in the artifact's design. Successful inspections depend on the experience levels of the participants and the quality of the artifacts inspected (Porter et al., 1996). They also depend on the requirements being expressed in a precise, unambiguous manner so the reviewers are able to check the document without having to make assumptions on how the system will be implemented. This can be challenging in practice because it is difficult to find a notation such that reviewers are able to effectively check the correctness of the requirements rather than focusing on the details of the notation. Furthermore, the notation must be "readable" by both users and developers.

Formal methods2 use mathematical techniques to assess if an artifact is consistent with a more abstract description of its general and specific properties (Rushby, 1993). General properties derive from the form of the artifact's description (e.g., that functions are total, that axioms are consistent, or that variables are initialized before they are referenced). Specific properties derive from the problem domain and are captured in an abstract description. Verification using formal methods involves the comparison of a more detailed description of a software system with the more abstract description of its properties. Verifying specific properties of programs using formal methods has proved to be very difficult (Gerhart and Yelowitz, 1976; Rushby and von Henke, 1991, 1993). Furthermore, making mathematical proofs does not guarantee the software will function correctly. Even if one could perform the verification using formal methods, testing would still be necessary to validate the assumptions in the proofs. These assumptions would include that the model matches the real world and that the code statements will behave as modeled when executed on the target hardware. Moreover, errors are often made in proofs.

2 The committee does not make a blanket endorsement of "formal methods." However, the committee considers that elements of formal methods are useful and appropriate and has indicated in the report specific instances where they should be used. For example, see Recommendation 2 in this chapter.

Testing is used to expose program flaws and to estimate software reliability. Black-box testing seeks to determine if software has functional behavior that is consistent with its requirements. Black-box testing is concerned only with inputs and outputs. White-box testing addresses the internal structure of software (e.g., the outcome of its logical tests) and seeks to exercise the internal structure:

Some engineers believe one can design black box tests without knowledge of what is inside the box. This is, unfortunately, not completely true. If we know that the contents of a black box exhibit linear behavior, the number of tests needed to make sure it would function as specified could be quite small. If we know that the function can be described by a polynomial of order 'N,' we can use that information to determine how many tests are needed. If the function can have a large number of discontinuities, far more tests are needed. That is why a shift from analogue technology to software brings with it a need for much more testing (Parnas et al., 1990).
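The point of the quotation can be made concrete with a deliberately contrived sketch; everything in it (the specification, the hidden defect, and the test points) is hypothetical. A black-box suite sized as if the box were linear passes even though the implementation hides a discontinuity in an untested region.

```python
def spec(x: float) -> float:
    """Intended behavior: a simple linear scaling (what the requirements say)."""
    return 2.0 * x

def implementation(x: float) -> float:
    """Hypothetical implementation that matches the spec everywhere except a
    narrow, untested region -- a 'discontinuity' a black-box tester cannot
    anticipate without knowing what is inside the box."""
    if 41.9 < x < 42.1:
        return 0.0            # hidden defect
    return 2.0 * x

# A black-box suite chosen as if the box were linear: two points determine a line.
tests = [0.0, 100.0]
assert all(implementation(x) == spec(x) for x in tests)  # the suite passes...
print(implementation(42.0), "!=", spec(42.0))            # ...yet the defect remains
```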

In testing, practitioners seek to find suitable test cases so that if the software exhibits acceptable behavior for these cases it can be inferred that it will work similarly for other cases. However, complex software systems have large numbers of states and irregular structure. Testing can only sample a fraction of these states, and it cannot be inferred that untested states are free from errors if none are exhibited in tested states. As Dijkstra (1970) points out, "Program testing can be used to show the presence of bugs, but never to show their absence!"

Software standards can help achieve acceptable levels of software quality. Because software development practices are constantly improving, standards should not require developers to use particular techniques. However, standards can include definitive acceptance criteria. An example of definitive and objective acceptance criteria in existing standards is the requirement for white-box structural coverage in the Federal Aviation Administration standard, Software Considerations in Airborne Systems and Equipment Certification (DO-178B). Depending on the safety category, software logic must be test-exercised until the specified acceptance criteria have been met.
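As a hedged illustration of what a structural-coverage criterion measures, the sketch below records which decision outcomes a candidate test suite exercises; the trip logic and setpoint are invented, and real coverage measurement relies on tooling rather than hand instrumentation.

```python
def classify(temperature: float, trip_setpoint: float, taken=None):
    """Toy trip logic instrumented by hand to record which decision outcomes
    a test suite exercises (the idea behind structural-coverage criteria)."""
    if taken is not None:
        taken.add(temperature > trip_setpoint)
    return "TRIP" if temperature > trip_setpoint else "NORMAL"

outcomes = set()
for t in (480.0, 495.0):                 # a candidate test suite
    classify(t, trip_setpoint=500.0, taken=outcomes)

# Both tests stay below the setpoint, so only the False outcome was exercised;
# a decision-coverage acceptance criterion would reject this suite as incomplete.
print("decision outcomes exercised:", outcomes)   # {False}
```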
There are several existing standards for the production of safety-critical software for nuclear power plants. These include IEC 880, Software for Computers in the Safety Systems of Nuclear Power Stations (IEC, 1986), and IEEE 7-4.3.2–1993, Standard Criteria for Digital Computers in Safety Systems of Nuclear Power Generating Stations (IEEE, 1993). IEC 880 outlines the software development techniques to be used in the development of software for the shutdown systems of nuclear power plants. Rather than mandate particular techniques, IEC 880 states the requirements on the product; it is up to the developer to meet those requirements using whatever techniques the developer considers suitable. There are guidelines presented in an appendix to IEC 880 that describe the effects that particular techniques are expected to achieve.

IEEE 7-4.3.2–1993 advocates choosing a combination of the following V&V activities: independent reviews, independent witnessing, inspection, analysis, and testing. Some of these activities may be performed by developers, but independent reviews must subsequently be performed. Walk-throughs of design, code, and test results are recommended inspection techniques. Analysis includes, but is not limited to, formal proofs, Petri net and other graphical analysis methods, and related techniques. Functional and structural testing are recommended for any software artifact that is executable or compilable. Testing of nonsafety functions may be required to provide adequate confidence that nonsafety failures do not adversely impact safety functions. The standard points out that functional testing cannot be used to conclusively determine that there are no internal characteristics of the code that would cause unintended behavior unless all combinations of inputs, both valid and invalid, are exhaustively tested.

IEEE standards have been criticized as "ad hoc and unintegrated" because they have been developed in a piecewise fashion (Moore and Rada, 1996). Generally, IEEE 7-4.3.2–1993 does not suggest which V&V activities are most effective, nor does it discriminate between activities that are mainly actuarial (e.g., witnessing) and those that are technical (e.g., analysis and testing). In addition, the standard states that path testing is feasible. Except for extremely simple programs, however, numerous references have shown that path testing requires an infeasible number of tests (e.g., Myers, 1979). Therefore, for most practical programs, path testing is infeasible. Furthermore, even if path testing were feasible and were performed, the resulting program could still have undetected errors: for example, there could be missing paths, the program might not satisfy its requirements (the wrong program could have been written), and there could be data-sensitivity errors. (As an example of a data-sensitivity error, suppose a program has to compare two numbers for convergence, that is, see if the difference between two numbers is less than some predetermined value. One could write: "If A - B < ε then…" But this is wrong, because the comparison should have been with the absolute value of A - B. Detection of this error is dependent on the values used for A and B and would not necessarily be found by simply executing every path through the program.)
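The convergence example can be written out directly. In the hedged sketch below (the variable names and tolerance are illustrative), each version consists of a single path, so executing every path with the first two input pairs reveals nothing; only particular data exposes the flawed comparison.

```python
EPS = 1e-6

def converged_buggy(a: float, b: float) -> bool:
    # The flawed comparison described in the text: "if A - B < eps then ..."
    return a - b < EPS

def converged_correct(a: float, b: float) -> bool:
    # The intended test uses the absolute value of the difference.
    return abs(a - b) < EPS

# These inputs exercise every path of both versions yet reveal no difference:
print(converged_buggy(2.0, 2.0), converged_correct(2.0, 2.0))  # True True
print(converged_buggy(3.0, 1.0), converged_correct(3.0, 1.0))  # False False
# Only data with A well below B exposes the defect: the buggy version
# declares convergence even though the values are far apart.
print(converged_buggy(1.0, 5.0), converged_correct(1.0, 5.0))  # True False
```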

Once high-quality software has been prepared initially, it is likely to undergo continuous change to accommodate new hardware, fix latent errors, or add new functions to existing systems. Configuration control requires rigorous review and formal approval of software changes. Managing multiple versions of software systems and assuring that changes do not degrade system reliability and safety is a difficult problem.

CURRENT U.S. NUCLEAR REGULATORY COMMISSION REGULATORY POSITIONS AND PLANS

Current Positions

The USNRC regulatory basis for software quality assurance is given in:

- 10 CFR 50.55a(h), Protection Systems, which mandates the use of IEEE Standard 279–1971, Criteria for Protection of Systems for Nuclear Power Generating Stations
- Title 10 CFR Part 50, Appendix A, General Design Criteria for Nuclear Power Plants (Criterion 1, Quality Standards and Records; Criterion 21, Protection System Reliability and Testability; Criterion 22, Protection System Independence; and Criterion 29, Protection Against Anticipated Operational Occurrences)
- Title 10 CFR Part 50, Appendix B, Quality Assurance Criteria for Nuclear Power Plants and Fuel Reprocessing Plants (Section III, Design Control; Section V, Instructions, Procedures, and Drawings; and Section VI, Document Control)

To provide more specific guidance, the USNRC uses Regulatory Guide 1.152, Criteria for Programmable Digital Computer System Software in Safety-Related Systems of Nuclear Power Plants, and ANSI/IEEE/ANS 7-4.3.2–1982, Application Criteria for Programmable Digital Computer Systems in Safety Systems of Nuclear Power Generating Stations (promulgated jointly by the American National Standards Institute, the Institute of Electrical and Electronics Engineers, and the American Nuclear Society), in conducting software reviews. Other standards are used as reference, e.g., IEEE 1012–1986, IEEE Standard for Software Verification and Validation Plans, and ASME [American Society of Mechanical Engineers] NQA-2A–1990, Part 2.7, Quality Assurance Requirements of Computer Systems for Nuclear Facility Applications. The Standard Review Plan cites and makes use of these standards and is an attempt to integrate their various requirements.

Staff Reviews

USNRC staff reviews of the V&V processes used during software development seem quite thorough. One particularly good example is the staff review of the V&V process for the Eagle 21 reactor protection system installed at Zion Units 1 and 2 (USNRC, 1992). Staff activities included comparing V&V to ANSI/IEEE/ANS 7-4.3.2–1982, verifying the independence of V&V personnel, reviewing the development of functional requirements and subsequent software development documents, and reviewing software problem reports and fixes. They also performed a thread audit by picking sample plant parameters and tracing the software development from developing the requirements to the writing and testing of code. This review included reviewing code on a sample basis, comparing software development documents and code, and examining software problem reports and corrections. The entire system was also examined for potential timing problems between the software and hardware.

The staff noted: "Experience with computer projects has demonstrated that the development of computer system functional requirements can have a significant impact on the quality and safety of the implemented system" (USNRC, 1992). The staff randomly sampled 56 of the 408 problem reports and found that 21 percent had significant implications (e.g., equations that did not match requirements or logic defects). Discovery of this type of error raised the staff's concerns regarding the potential for common-mode failures in digital electronics and convinced the staff that rigorous V&V activities were needed to augment the developer's functional tests. The staff's thread audit discovered three discrepancies between the requirements and the design documents (e.g., a piece of source code that the requirements seemed to mandate but that was omitted in the design). The staff concluded that although there were problems in implementation of the V&V plan, the basic plan was sound.

The staff also considered whether the use of different releases of compilers affected the correctness of the software. They also considered Commonwealth Edison's configuration management program for the software. The USNRC approved the approach taken on both of these issues.
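In essence, a thread audit checks that each sampled requirement can be traced through design, code, and test artifacts. A minimal sketch of that bookkeeping is shown below; all of the identifiers and file names are hypothetical and are not taken from the Eagle 21 review.

```python
# Hypothetical, minimal traceability records for a thread audit: trace each
# sampled requirement through design, code, and test artifacts and flag gaps.
threads = {
    "REQ-PRZ-PRESSURE-TRIP": {
        "design": ["DD-017"],
        "code":   ["trip_logic_module"],
        "tests":  ["TC-112", "TC-113"],
    },
    "REQ-SG-LEVEL-LOW": {
        "design": ["DD-021"],
        "code":   [],          # requirement with no implementing module
        "tests":  ["TC-140"],
    },
}

for req, links in threads.items():
    missing = [stage for stage, items in links.items() if not items]
    status = "OK" if not missing else f"GAP in {', '.join(missing)}"
    print(f"{req}: {status}")
```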
Research and Plans

The seven existing sections of Chapter 7 of the 1981–1982 version of the Standard Review Plan (SRP) are being updated (project completion expected in June 1997) to incorporate digital technology aspects. Two new sections are being added: Section 7.8, Diverse I&C Systems, to deal with the ATWS (anticipated transients without scram) rule and the defense-in-depth and diversity analysis of digital safety I&C systems, and Section 7.9, Data Communications, to deal with new issues like multiplexing. New branch technical positions are also being developed for inclusion in the SRP update, including ones on software development process, software development outputs, and programmable logic controllers.

As part of the SRP update process, the USNRC is developing regulatory guides to endorse (with possible exceptions) 10 industry software standards:

- IEEE 7-4.3.2–1993, Standard Criteria for Digital Computers in Safety Systems of Nuclear Power Generating Stations (an update of the 1982 version)
- IEEE 603–1991, Standard Criteria for Safety Systems in Nuclear Power Generating Stations (follow-on to IEEE 279–1971)
- IEEE 828–1990, Standard for Configuration Management Plans
- IEEE 829–1983, Standard for Software Test Documentation
- IEEE 830–1984, Guide for Software Requirements Specifications
- IEEE 1008–1987, Standard for Software Unit Testing
- IEEE 1012–1986, Standard for Software Verification and Validation Plans
- IEEE 1028–1988, Standard for Software Reviews and Audits
- IEEE 1042–1987, Guide to Software Configuration Management
- IEEE 1074–1991, Standard for Developing Life Cycle Processes

The USNRC also has ongoing research programs. One of these, called Review and Assessment of Software Languages for Use in Nuclear Power Plant Safety Systems, is assessing advantages and disadvantages of programming languages used in safety systems. Another, called Measurement Based Dependability Analysis for Digital Systems, is analyzing operational failure data to estimate failure probabilities.

Finally, as a member of the Halden Reactor Project, the USNRC is following research being conducted at the project on the use of formal methods in development and in quality assurance/licensing issues.

DEVELOPMENT IN THE U.S. NUCLEAR INDUSTRY

Vendors

During the course of Phase 2 activities, the committee talked with three digital I&C vendors about software quality assurance: Foxboro Controls, General Electric Nuclear Engineering, and Westinghouse. Vendors reported developing systems containing at least 10,000 lines of code in a mixture of high-level and assembly languages. Their software quality assurance programs were generally modeled after IEEE 7-4.3.2–1993 and IEC 880 and had been audited and approved by USNRC staff.

Nuclear Utilities

In Phase 2, the committee also talked with a number of nuclear utilities engaged in digital I&C upgrades: Baltimore Gas and Electric Company, Public Service Electric and Gas (PSE&G) Company, Northeast Utilities, and Pacific Gas and Electric Company. Representatives from several of the utilities mentioned that strong requirements analysis and configuration control were keys to producing high-quality software. The representatives noted that strong requirements analysis and configuration control should be applied to both safety-critical software and nonsafety software, even though nuclear plant designs routinely separate the hardware and software so that nonsafety software does not run on the same computer as the safety-critical applications. It is clear that high standards must be applied to software running on safety-critical computers, since any such program has the potential to cause a safety-critical failure. The utility representatives emphasized that the same strong requirements should be applied to nonsafety software because even nonsafety applications could malfunction in ways that require safety systems to respond and thus have safety implications. They also noted that hazard/failure analyses should be part of a V&V program. PSE&G described a four-stage review that considers hardware-software interactions, the software development process, thread analysis of a small number of functions, and component-based failure analysis.

DEVELOPMENTS IN THE FOREIGN NUCLEAR INDUSTRY

During the course of Phase 2 activities, the committee also talked with representatives from the Canadian and Japanese nuclear power industries and had access to information on the British experience with digital I&C systems pertaining to software quality assurance. A representative from Mitsubishi Heavy Industries stated that the company relies on the IEC 880 standard for software quality assurance. British Nuclear Electric issued the Nuclear Electric Programmable Electronic Systems guidelines for the quality assurance of digital I&C systems.

Considerable controversy surrounds the results of British Nuclear Electric's tests of the Sizewell B primary protection system (PPS). These tests were not part of system validation testing, but rather a set of tests concentrated on infrequent fault scenarios that were designed to support safety claims made for the PPS (W.D. Ghrist III, personal communication to the committee, May 1996). Most test results were to be resolved automatically by use of a test driver that compared them to responses predicted from a model, and the remainder were to be resolved manually. However, only half of the first 50,000 tests were resolved automatically, resulting in reports that the PPS failed 50 percent of its tests. Manual inspections of test results were necessary because of timing problems between the PPS and the test driver. For example, inputs were not being provided to the PPS fast enough to prevent it from indicating failures of incoming data link communications, or the PPS responded at a rate much faster than input values were changing. In fact, only three or four errors were found in time delay and setpoint levels because of specification discrepancies. One conclusion that could be drawn from this experience is that there were problems with the completeness and configuration control of the requirements: understanding the response time of the PPS required knowledge of the system design as well as the requirements; hysteresis information was in the original functional specification but not the specification provided to the test group; and default actions on some input quantities were omitted from the specifications.
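The automatic-resolution scheme described above can be pictured with a minimal sketch; the tolerance, timing budget, and record fields below are invented for illustration and are not taken from the Sizewell B test program.

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    case_id: str
    predicted: float        # response predicted by the reference model
    actual: float           # response recorded from the system under test
    response_time_ms: float # how quickly the comparison could be made

TOLERANCE = 0.01          # acceptable deviation, illustrative units
MAX_RESPONSE_MS = 250.0   # assumed timing budget for an automatic comparison

def resolve(result: TestResult) -> str:
    """Resolve a test automatically where possible; otherwise queue it for
    manual review rather than counting it as a genuine failure."""
    if result.response_time_ms > MAX_RESPONSE_MS:
        return "manual review (timing mismatch with test driver)"
    if abs(result.actual - result.predicted) <= TOLERANCE:
        return "pass (resolved automatically)"
    return "manual review (value mismatch)"

for r in (TestResult("T-0001", 2.50, 2.505, 120.0),
          TestResult("T-0002", 2.50, 2.505, 400.0)):
    print(r.case_id, "->", resolve(r))
```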
Canada's Atomic Energy Control Board (AECB) licensed a computerized shutdown system at Atomic Energy of Canada Limited's (AECL) Darlington plant operated by Ontario Hydro. The AECB had originally raised objections about the lack of a widely accepted definition of what constituted "good enough" for safety-critical software. Ontario Hydro used formal methods to verify the consistency of the software and the requirements and also used tests randomly chosen to model one of six accident scenarios to demonstrate the system's reliability (Joannou, 1993).

Ontario Hydro and AECL embarked on an effort to develop standards for the software engineering process, the outputs from the process, and the requirements to be met by each output. The standards, called OASES, use a graded approach based on categories of criticality. For each category, OASES defines a software engineering process, procedures used to perform activities within each step of a process, and guidelines defining how to qualify already developed software in each category. OASES is a more unified approach to developing standards than the USNRC approach of developing standards for individual process activities.

The AECB has also developed a draft regulatory guide, C-138, Software in Protection and Control Systems, for software assessment (AECB, 1996). The AECB stresses that "evidence of software completeness, correctness, and safety will have to be reviewed and understood by people other than those who prepared it." Several aspects were identified as critical for providing evidence of high-quality software:

- software requirements specification
- systematic inspection of software design and implementation
- software testing
- the software development process and its management

The AECB draft regulatory guide (AECB, 1996) includes a number of acceptance criteria:

- Software requirements should be unambiguous, consistent, and complete. Requirements should be precise enough to distinguish between correct and incorrect implementations, and mechanical rules should exist for resolving disputes about the meanings of requirements. These attributes indicate that a formal notation should be used. The notation should define how continuous quantities in the environment can be represented by discrete values in software.
- Systematic inspections should include functional analysis, to provide evidence that the software does what it is defined to do, and software safety analysis, to provide evidence that the software does not initiate unsafe actions. Functional analysis should be based on formally defined notations and techniques so that mathematical models and automated tools can be used. A system-level hazard analysis should determine the contribution of software to each hazard, and analysis should extend into the software to increase confidence that hazardous states cannot occur.
- Both functional and random testing should be employed. Functional tests should be chosen to expose errors in normal and boundary cases, and measures of test coverage should be reported for them. Random tests selected from input conditions should be used to demonstrate that the software will function without failure under specific conditions (a rough sizing of such a demonstration is sketched after this list).
- Software design and implementation methods are rapidly improving. Instead of mandating a single set of methods, the guide specifies that software be developed "by properly qualified people following a controlled and accepted software development and quality assurance plan." Methods selected should enable software designs and implementations to be reviewed to determine if quality attributes (e.g., completeness, consistency, etc.) have been attained.
- Configuration management should be used to control change. Changes should be justified and reviewed, and all artifacts (e.g., designs and test plans) relating to the component being changed should also be updated. Changed release versions (with indicated changes) should be distributed to all holders of the original versions, including the regulatory agency.
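A rough sense of what such a random-testing demonstration costs can be obtained from a standard zero-failure binomial calculation; this back-of-the-envelope sketch is illustrative only and is not a formula taken from the AECB guide.

```python
import math

def tests_needed(p_target: float, confidence: float) -> int:
    """Number of independent, failure-free tests drawn from the operational
    profile needed to claim a per-demand failure probability below p_target
    at the stated confidence (zero-failure binomial argument)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_target))

# Illustrative targets: each tenfold improvement in the claimed failure
# probability requires roughly ten times as many failure-free random tests.
print(tests_needed(1e-3, 0.99))   # on the order of 4,600 tests
print(tests_needed(1e-4, 0.99))   # on the order of 46,000 tests
```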
DEVELOPMENTS IN OTHER SAFETY-CRITICAL INDUSTRIES

During the course of Phase 2 activities (see Appendix B), the committee heard presentations from John Rushby of SRI International, committee member Michael DeWalt of the Federal Aviation Administration, Joseph Profetta of Union Switch and Signal Inc., and Lynn Elliott of Guidant Cardiac Pacemakers. The committee also examined the circumstances surrounding problems in other applications.

Dr. Rushby summarized his experience with a number of high-assurance software systems by stating that mishaps are generally due to requirements errors rather than coding errors. Current techniques for quality assurance are adequate for later software development stages (e.g., coding). However, early stages have weak V&V methods because of a lack of adequate validation techniques, particularly for systems with complex interactions (e.g., concurrent, fault-tolerant, reactive, real-time systems). Dr. Rushby suggested that formal methods could be used to specify assumptions about the environment in which a system operates, the requirements of the system, and a design to accomplish the requirements. If these specifications were written, they could be analyzed for certain forms of consistency and completeness and validated by posing "challenges" as to whether a specification satisfies a requirement or whether a design implements a specification.

Committee member DeWalt presented the FAA's Software Considerations in Airborne Systems and Equipment Certification (DO-178B), which provides guidelines for the production of software for airborne systems. These guidelines represent an industry and regulator consensus document. The guidelines used by the FAA identify 66 objectives covering the entire software development process. These objectives represent a distillation of best industry practices and do not rely on or reference other standards or guidelines. The number of objectives that must be satisfied, and the associated rigor applied, are a function of five different severity categories of safety. These objectives for the most part have objective acceptance criteria understood by the regulators and industry. The compliance of a specific software product with the guidelines is established by examining data products produced by the software process and interviewing developer personnel. The guidelines recognize that objectives can be satisfied by alternative methods (e.g., service experience) provided that equivalent levels of confidence can be demonstrated. The FAA also has a delegation system that allows industry representatives to make compliance findings on behalf of the FAA.

Mr. Profetta described distributed process control systems in which control signals from remote controllers can be overridden by local signals in trains or switches.

Critical software is developed following IEEE standards and development processes. Quality is assured via extensive testing on a simulation of a train control system. The application has very well-defined safety problems, and only six events are needed to characterize the problems. Extensive testing is undertaken using seeded faults to estimate the probability that test cases expose faults.

Mr. Elliott stated that his most difficult software development problem was writing and reviewing requirements specifications. Food and Drug Administration (FDA) regulators expect natural language requirements, but Mr. Elliott has found that describing systems with Statemate (Harel et al., 1990), a notation for describing event-driven reactive systems, is superior to either natural language or data flow diagrams. Guidant Cardiac Pacemakers developers use fault tree analysis to analyze the safety of their system and dynamic testing to ensure the software's behavior. FDA regulators specify guidelines for these activities but do not prescribe particular development methods.

A prior NRC study of space shuttle avionics software (NRC, 1993) identified shortcomings of inspections with respect to assumptions reviewers made about the hardware and software platforms on which their implementations execute. Inspections focus on the development of software by a single contractor and do not probe beyond the descriptions of interfaces supplied by other contractors. As a result, implementations are vulnerable to errors arising from assumptions made about erroneously documented interfaces.

The Ariane 5 failure (Lions et al., 1996) offers a cautionary note for drawing conclusions about the reliability or safety of software based on prior operating experience. The first flight of the Ariane 5 launcher ended in a failure caused by responses to erroneous flight data provided by alignment software in its Inertial Reference System. Part of the data contained a diagnostic bit pattern that was erroneously interpreted as flight data. The alignment software computes meaningful results only before lift-off; after lift-off, it serves no purpose. The original requirement for continued operation of the alignment software after lift-off had been retained through 10 years of the earlier Ariane models in order to cope with a hold in the countdown. The period selected for continued alignment operation was based on the time needed for the ground equipment to resume full control of the launcher in the event of a hold. The same requirement did not apply to Ariane 5 but was maintained for commonality reasons, presumably based on the view that, unless proven necessary, it was not wise to make changes in software that had worked well on Ariane 4.

REVIEW OF EXPERIENCE

In order to better understand what types of software problems have occurred in software quality assurance, the committee reviewed a number of licensee event reports (LERs, which are submitted to the USNRC) and summaries of LERs reporting problems with computer-based systems in nuclear power plants. LERs describing events at Diablo Canyon (LER 92-028-00), Salem (LER 92-107-00), and Turkey Point (LER 94-005-02) identify instances of software design errors, inadequate review of requirements and designs, excessive reliance on testing as a V&V method, and problems with configuration control. The Turkey Point incident illustrates several problems that can occur.3

3 The Diablo Canyon plant is in Avila Beach, California; the Salem plant is in Salem, New Jersey; the Turkey Point plant is in Florida City, Florida.

The Florida Power and Light (FPL) Company's Turkey Point LER describes an upgrade to the Turkey Point Unit 3 and 4 emergency power system (EPS) using commercial-grade programmable logic controllers (PLCs) in the EPS load sequencer. FPL stated that these new load sequencers would replicate the functions of the old sequencers, with some improvements to the sequence timing for loading of safety equipment. In response to USNRC review, FPL committed to follow the verification and validation program in IEEE 1012–1986, Standard for Software Verification and Validation Plans, and the guidelines in Regulatory Guide 1.152, which endorses ANSI/IEEE/ANS 7-4.3.2–1982, Application Criteria for Programmable Digital Computer Systems in Safety Systems of Nuclear Power Generating Stations. Additionally, the contractor responsible for developing and installing the load sequencer performed independent V&V of the PLCs and the load sequencer logic. FPL qualified the PLCs as Class 1E through dedication of the commercial-grade equipment based on guidance provided in EPRI [Electric Power Research Institute] NP-5652, Guideline for the Utilization of Commercial Grade Items in Nuclear Safety Related Applications. FPL guaranteed that all logic functions would be tested under the guidelines of the above-mentioned V&V program, particularly to ensure that there were no common-mode failures between the redundant trains of load sequencers. FPL stated that in addition to the regularly scheduled startup and bus load tests, the load sequencer would be tested "continuously" using an automatic self-test mode. This enhancement was approved by the USNRC (Newberry, 1990).

On November 3, 1994, Turkey Point Unit 3's sequencer failed to respond to Unit 4's safety injection (SI) signal because of a defect in the sequencer software logic. The defect could inhibit any or all of the four sequencers from responding to input signals. The problem arose in trying to design the sequencers so that if a "real" emergency signal is received while the sequencer is being tested, the test signal clears and the engineered safety features controlled by the sequencer are activated. As actually implemented, if an SI signal is received 15 seconds or later into particular test scenarios, the test signal is cleared but the inhibit signal preventing actuation is maintained by latching logic.

The test signal initiates the latching logic, but an input signal maintains the latching logic if the signal arrives prior to the removal of the test signal. Thus, if a real signal arrives more than 15 seconds into the test scenario, the test signal clears but the inhibit logic is held locked in and actuation is prevented. As the result of erroneous inhibit signals, any sequencer output might be blocked. The outputs blocked are determined by a combination of factors, including which test scenario was executing, the length of time the test was running, and which other inputs were received. The designer and independent verifier failed to recognize the interactions between the inhibit and test logic.
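To make the reported interaction more concrete, the following deliberately simplified reconstruction (it is not the actual ladder logic; the 15-second figure is used only because the LER cites it) shows how a latch seeded by the test signal and then held by the real input can leave actuation inhibited after the test signal clears.

```python
class SequencerModel:
    """Grossly simplified model of the reported test/inhibit interaction.
    Illustration only, not the plant's ladder logic."""

    def __init__(self):
        self.test_active = False
        self.inhibit_latched = False

    def start_test(self):
        self.test_active = True
        self.inhibit_latched = True      # the test signal seeds the inhibit latch

    def real_si_signal(self, seconds_into_test: float) -> bool:
        """Return True if the sequencer actuates on a real safety injection signal."""
        if self.test_active and seconds_into_test >= 15.0:
            self.test_active = False     # test signal clears, as designed...
            # ...but the incoming signal arrived before the test signal was
            # removed, so it keeps the inhibit latch locked in.
            self.inhibit_latched = True
        else:
            self.test_active = False
            self.inhibit_latched = False
        return not self.inhibit_latched  # actuation is blocked while latched

late = SequencerModel()
late.start_test()
print(late.real_si_signal(seconds_into_test=20.0))  # False: actuation inhibited

early = SequencerModel()
early.start_test()
print(early.real_si_signal(seconds_into_test=5.0))  # True: actuation proceeds
```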
An independent assessment team found that logic diagrams contained information not reflected in the ladder diagrams, and that the V&V was not comprehensive enough to test certain aspects of the logic. In its review, the USNRC stated, "The plan was weak in that it relied almost completely on testing as the V&V methodology. More emphasis on the analysis of the requirements and design would have increased the likelihood of discovering the design flaw." This incident illustrates many of the potential problems with digital systems: added design complexity from self-testing software components, incomplete requirements, and inadequate testing.

Two recent studies by Lee (1994) and Ragheb (1996) provide data on digital application experiences in the United States and Canada. Lee reviewed 79 LERs for digital failures and classified them according to their root causes. With respect to the U.S. experience, Lee found that electromagnetic interference (EMI), human-machine interface error, and software error caused a "significant number of failures" (where "significant" is not defined in the report) in digital components during the three-year period studied (1990–1993). Fewer digital system failures involved random component failure. The actual numbers are shown in Table 4-1. The report concludes that the root causes of these failures were (1) poor software V&V, (2) inadequate plant procedures, and (3) inadequate electromagnetic compatibility of the digital system for its environment.4

TABLE 4-1 U.S. Software-Related LERs between 1990 and 1993

Cause of Events                   Number of Events
Software error                    30
Human-machine interface error     25
Electromagnetic interference      15
Random component failure           9

Source: Lee (1994).

4 Experience shows that classification of incidents into categories solely using LER data is fraught with uncertainty and likely to be erroneous because of the great difficulty of determining root causes from the summary data in the LERs. Further, committee review of Lee's classification indicates several questionable or apparently erroneous classifications. Nevertheless, the committee agrees that inadequate V&V is a substantial problem that must be addressed.

Although the study is not yet completed, the Canadian AECB has been reviewing data from the United States, Canada, and France on software failures in nuclear power plants (Ragheb, 1996). The reviews include only events that resulted in consequences that meet reporting criteria of the government agency and do not necessarily include all digital system failures. The results of this study are in draft form only and may change before final publication. It is also important to note again that classification of errors is very difficult and may be subject to the classifier's biases or personal definitions. In the AECB study, 459 event reports from 22 reactors over 13 years are being evaluated. The AECB found all trends either decreasing or flat, except those attributable to inappropriate human actions (which have shown a marked increase in the last five years). Hardware problems overall were found to be decreasing with time, although peaks can be found in some recent years. The number of software faults appears to be relatively constant over time.5

5 Ragheb was also critical of temporary software modifications performed by "patching" the software to change its behavior. For example, at the Canadian Bruce-A plant, the software was patched to permit the software to operate correctly at very low reactor power. However, the patch was not removed when the reactor power increased, and "the software operated incorrectly and caused a power excursion that was terminated by a trip" (Ragheb, 1996).

A large majority of the computer-related events occurred in digital control systems, which is not surprising given that they have been in operation the longest (since 1970) and perform a complex and continuous task: 363 computer system failures were in control systems, 29 in shutdown systems, and 65 in other systems. Table 4-2 shows the distribution of the failure types. Of the problems classified as relating to software, 104 involved application software, four involved the executive or operating system, four a database or table, and five were classified as other. We emphasize that the classification of the errors in this report was subjective and thus the data should be used with caution. However, it does appear that a number of software errors have been found in operating nuclear power plant software, and more extensive evaluation and collection of data would be useful in making decisions about most of the issues in this report.

TABLE 4-2 Summary of Canadian Software-Related Event Reports, 1980–1993

Failure Cause                                            Number
Software problems                                        117
Human-machine interface problems                         130
Hardware problems                                        220
External (power, electromagnetic interference, other)     39
Unassigned                                                37

NOTE: Total number of failure causes exceeds number of events. Some events apparently had multiple causes.
Source: Ragheb (1996).

Finally, Ragheb notes that introducing modern digital I&C systems may not alleviate software quality assurance concerns. He points out: "Programmable logic controllers (PLCs) are being introduced as a cost-effective method of replacing older analogue or digital controls. PLCs have resulted in a number of incidents within the plants and it must be recognized that they are themselves digital computers."

A study of PLCs used in a U.S. phenol plant (Paula, 1993) reported a processor failure rate of approximately two per year. The plant operators also reported a total of four complete PLC failures (both primary and secondary processors) for all PLCs over seven years of plant operation. No PLC failures were reported because of errors in the software, including operating systems and applications software, or because of operator error. For PLCs with fault-tolerant redundant architectures installed to perform control interlocks in several nuclear power plants of French design, Paula found there were 58 failures of both processors out of a total of 1,200 PLCs over a three-year period (Paula, 1993).

In evaluating these data, Paula warns that system size and complexity are important factors. The PLCs considered are relatively simple, generally accepting a few input signals and performing only a few control functions. In a study of fault-tolerant digital control systems that are much larger and more complex than these PLCs, the failure rates were about 15 to 50 times higher (Paula et al., 1993). In these fault-tolerant digital control systems, software errors were an important contributor to system failure. In several of the systems studied, failure due to software errors occurred as often as hardware failures, and the authors further conclude (Paula et al., 1993) that software errors tended to be difficult to prevent because they may occur only when an unusual set of inputs exists. Inadvertent operator actions, particularly during maintenance, also contributed significantly to the frequency of failures of these fault-tolerant digital control systems.

CONCLUSIONS AND RECOMMENDATIONS

Conclusions

Conclusion 1. Software quality assurance procedures typically monitor process compliance rather than product quality. In particular, there are no generally accepted evaluation criteria for safety-related software; rather, standards and guidelines help to repeat best practices. Because most software qualities related to system safety, e.g., maintainability, correctness, and security, cannot be measured directly, it must be assumed that a relationship exists between measurable variables and the qualities to be ensured. To deal with this limitation, care must be taken to validate such models, e.g., using past development activities, and to assure that the measurements being made are appropriate and accurate in assessing the desired software qualities.

Conclusion 2. Prior operating experience with particular software does not necessarily ensure reliability or safety properties in a new application. Additional reviews, analysis, or testing by a utility or third-party dedicator may be necessary to reach an adequate level of assurance.

Conclusion 3. Testing must not be the sole quality assurance technique. In general, it is not feasible to assure software correctness through exhaustive testing for most real, practical I&C systems.

Conclusion 4. USNRC staff reviews of the verification and validation process used during software development seem quite thorough.

Conclusion 5. Exposing software flaws, demonstrating reliable behavior of software, and finding unintended functionality and flaws in requirements are different concepts and should be assessed by a combination of techniques, including the following:

- Systematic inspections of software and planned testing with representative inputs from different parts of the system's domain can help determine if flaws exist in the software. Functional tests can be chosen to expose errors in normal and boundary cases, and measures of test coverage can be reported for them.
- Testing based on large numbers of inputs randomly selected from the operational profiles of a program can be used to assess the likelihood that software will fail under specific operating conditions.
- Requirements inspections can be an effective method for detecting software defects, provided requirements are studied by several experienced people who did not participate in their construction. The effectiveness of these reviews also depends on the quality of the requirements.
- A system-level hazard analysis can identify states that, combined with environmental conditions, can lead to accidents. The analysis should extend into software components to ensure that software does not contribute to system hazards.

Conclusion 6. The USNRC research programs related to software quality assurance appear to be skewed toward investigating code-level issues, e.g., coding in different languages to achieve diversity and program slicing to identify threads containing common code.

Conclusion 7. Rigorous configuration management must be used to assure that changes are correctly designed and implemented and that relationships between different software artifacts are maintained.

Conclusion 8. Software is not more testable simply because the design has been implemented on a chip. Use of any technology requiring design effort equivalent to that of software requires commensurate quality assurance. For example, this conclusion applies to ASICs (application-specific integrated circuits), PLCs (programmable logic controllers), and FPGAs (field programmable gate arrays). However, the committee notes that these technologies may be helpful in addressing some configuration management problems.

Recommendations

Recommendation 1. Currently, the USNRC's path is to develop regulatory guides to endorse (with possible exceptions) a variety of industry standards. The USNRC should develop its own guidelines for software quality assurance that focus on acceptance criteria rather than prescriptive solutions. The draft regulatory guide, Software in Protection and Control Systems, by Canada's Atomic Energy Control Board is an example of this type of approach. The USNRC guidelines should be subjected to a broad-based, external peer review process including (a) the nuclear industry, (b) other safety-critical industries, and (c) both the commercial and academic software communities.

Recommendation 2. Systems requirements should be written in a language with a precise meaning so that general properties like consistency and completeness, as well as application-specific properties, can be analyzed. Cognizant personnel such as plant engineers, regulators, system architects, and software developers should be able to understand the language.

Recommendation 3. USNRC research in the software quality assurance area should be balanced in emphasis between early phases of the software life cycle and code-level issues. Experience shows the early phases contribute more frequently to the generation of software errors.

Recommendation 4. The USNRC should require a commensurate quality assurance process for ASICs, PLCs, and other similar technologies.

References

AECB (Atomic Energy Control Board, Canada). 1996. Draft Regulatory Guide C-138, Software in Protection and Control Systems. Ottawa, Ontario: AECB.

Dijkstra, E.W. 1970. Structured programming. Pp. 84–88 in Software Engineering Techniques, J.N. Buxton and B. Randell (eds.). Brussels: Scientific Affairs Division, NATO.

Fagan, M.E. 1976. Design and code inspections to reduce errors in program development. IBM Systems Journal 15(3):182–211.

Gerhart, S., and L. Yelowitz. 1976. Observations of fallibility in applications of modern programming methodologies. IEEE Transactions on Software Engineering 1(2):195–207.

Goel, A.L., and F.B. Bastani. 1985. Foreword: Software reliability. IEEE Transactions on Software Engineering 11(12):1409–1410.

Harel, D., H. Lachover, A. Naamad, A. Pnueli, M. Politi, R. Sherman, A. Shtull-Trauring, and M. Trakhtenbrot. 1990. STATEMATE: A working environment for the development of complex reactive systems. IEEE Transactions on Software Engineering 16(4):403–414.

IEC (International Electrotechnical Commission). 1986. Software for Computers in the Safety Systems of Nuclear Power Stations, IEC 880. Geneva, Switzerland: IEC.

IEEE (Institute of Electrical and Electronics Engineers). 1993. IEEE Standard Criteria for Digital Computers in Safety Systems of Nuclear Power Generating Stations, IEEE Std 7-4.3.2–1993. New York: IEEE.

Joannou, P.K. 1993. Experiences from application of digital systems in nuclear power plants. NUREG/CP-0136. Pp. 61–77 in Proceedings of the Digital Systems Reliability and Nuclear Safety Workshop, U.S. Nuclear Regulatory Commission, September 13–14, 1993, Gaithersburg, Md. Washington, D.C.: U.S. Government Printing Office.

Lee, E.J. 1994. Computer-Based Digital System Failures. Technical Review Report AEOD/T94-03. Washington, D.C.: USNRC. July.

Leveson, N.G. 1995. Safeware: System Safety and Computers. New York: Addison-Wesley.

Lions, J.L., L. Lubeck, J.-L. Fauquembergue, G. Kahn, W. Kubbat, S. Levedag, L. Mazzini, D. Merle, and C. O'Halloran. 1996. Ariane 5 Flight 501 Failure: Report by the Inquiry Board. Paris: European Space Agency. July 19.

Moore, J.W., and R. Rada. 1996. Organizational badge collecting. Communications of the Association for Computing Machinery 39(8):17–21.

Musa, J.D., A. Iannino, and K. Okumoto. 1987. Software Reliability: Measurement, Prediction, Application. New York: McGraw-Hill Book Company.

Myers, G. 1979. The Art of Software Testing. New York: John Wiley and Sons.

Newberry, S. 1990. SSICB Review of the Load Sequencers in the Enhanced Power System at Turkey Point Plant, Units 3 & 4. Docket Nos. 50-250 and 50-251, November 5, 1990. Washington, D.C.

NRC (National Research Council). 1993. An Assessment of Space Shuttle Software Development Processes. Aeronautics and Space Engineering Board, National Research Council. Washington, D.C.: National Academy Press.

Parnas, D.L. 1985. Software aspects of strategic defense systems. Communications of the Association for Computing Machinery 28(12):1326–1335.

Parnas, D.L., A.J. van Schouwen, and S.P. Kwan. 1990. Evaluation of safety-critical software. Communications of the Association for Computing Machinery 33(6):636–648.

Paula, H.M. 1993. Failure rates for programmable logic controllers. Reliability Engineering and System Safety 39:325–328.

Paula, H.M., M.W. Roberts, and R.E. Battle. 1993. Operational failure experience of fault-tolerant digital control systems. Reliability Engineering and System Safety 39:273–289.

Porter, A., H.P. Siy, and L.G. Votta. 1996. A review of software inspections. Pp. 40–77 in Software Process, Advances in Computers 42, M.V. Zelkowitz (ed.). San Diego: Academic Press.

Ragheb, H. 1996. Operating and Maintenance Experience with Computer-Based Systems in Nuclear Power Plants. Presentation at the International Workshop on Technical Support for Licensing of Computer-Based Systems Important to Safety, Munich, Germany. March.

Rushby, J. 1993. Formal Methods and the Certification of Critical Systems. Menlo Park, Calif.: SRI International. November.

Rushby, J., and F. von Henke. 1991. Formal Verification of the Interactive Convergence Clock Synchronization Algorithm Using EHDM. Technical Report SRI-CSL-89-3R. Menlo Park, Calif.: SRI International. August.

Rushby, J., and F. von Henke. 1993. Formal verification of algorithms for critical systems. IEEE Transactions on Software Engineering 19(1):13–23.

USNRC (U.S. Nuclear Regulatory Commission). 1992. Safety Evaluation by the Office of Nuclear Reactor Regulation Related to Amendment No. 138 to Facility Operating License No. DPR-39 and Amendment No. 127 to Facility Operating License No. DPR-48, June 1992. Washington, D.C.: USNRC.