APPENDIX F

DESCRIPTION OF PROPOSED SYSTEMS SAFETY ENGINEERING FUNCTIONS IN SUPPORT OF NATIONAL SPACE TRANSPORTATION SYSTEM RISK ASSESSMENT AND RISK MANAGEMENT

In Section 5.11, the Committee recommends that NASA consider bringing together appropriate activities into a focused "Systems Safety Engineering" function at both Headquarters and the centers. This activity would apply across the entire set of design, development, qualification and certification, and operations activities of the National Space Transportation System (NSTS) Program in support of risk assessment and risk management. Systems safety engineering would embrace the functions (listed in Section 5.11 and illustrated here in Figure F-1) which are described briefly in the following paragraphs.*

* In Figure F-1, the thirteen functions discussed in this appendix are shown by the boxes which are numbered to correspond. This diagram can be compared to that currently described for the NSTS Program by the JSC SR&QA office, as shown in Figure 5-12 in Section 5.11.

1. IDENTIFICATION OF FAILURE MODES AND EFFECTS

The failure modes of each hardware item can be identified at this step without addressing the probability of each failure mode occurring. All of the significant effects of each failure mode also would be identified. These effects (not just the estimated worst-case effect) are needed also for identification of hazards and for evaluating potential cascading influences on the failure modes of other parts of the system. All of the causes of each failure mode (including the feedback influences from the hazard analysis, step 3 below) should then be identified. The control of all causes of each failure mode by design margin, process controls, redundancy, and operating constraints would be defined. This information would be an input to the analysis of safety risks in steps 5, 8, and 9.

2. ESTABLISHMENT OF DESIGN CRITERIA FOR REDUNDANCY

Design criteria for redundancy would be based on functional and fail-operational requirements for components or units which do not have catastrophic single failure modes. These criteria would be based on reliability analyses of components using either statistical data bases where available or estimated failure rate functions.

3. IDENTIFICATION OF HAZARDS AND THEIR POTENTIAL CONSEQUENCES

Hazards associated with the system can be systematically identified using various methods such as fault-tree or event-tree networks. Inputs will come from mission requirements, the system configuration, the applicable identified hardware failure effects, human factors and the expected environments. Potential consequences of the presence of each hazard can then be derived without regard for the probability of the events or mishaps occurring. (However, some screening out of very low probability failure events would simplify this effort.) Mishaps resulting from combinations of events and the impacts of created hazards on failure modes in other hardware can be identified. Each of the causes of the identified hazards, along with proposed controls, would be defined for later risk assessment in steps 5, 8, and 9.

4. IDENTIFICATION OF CRITICAL ITEMS

Using the set of information generated in the previous steps, hardware failure modes could be categorized on the basis of their potential consequences. Those designs having failure modes with consequences that could result in loss of vehicle or life would be returned to engineering for possible alternative concepts. Failure modes that remain after this cycle could be put into criticality categories to be prioritized based on severity of the failure effects and the probability of occurrence (steps 8 and 9).
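The prioritization of retained failure modes by severity and probability can be illustrated with a short sketch. It is only an illustration: the severity scale, the hardware items, and the probability values are all hypothetical and do not come from the appendix or from any NSTS data base.

```python
from dataclasses import dataclass

# Hypothetical severity scale, assumed for illustration only;
# the appendix does not define numerical severity ranks.
SEVERITY_RANK = {"loss of vehicle or life": 3, "loss of mission": 2, "other": 1}


@dataclass(frozen=True)
class FailureMode:
    item: str           # hardware item name (made up for this example)
    mode: str           # short description of the failure mode
    consequence: str    # key into SEVERITY_RANK
    probability: float  # assumed probability of occurrence per mission


def prioritize(modes):
    """Order failure modes by severity first, then by probability, highest first."""
    return sorted(
        modes,
        key=lambda m: (SEVERITY_RANK[m.consequence], m.probability),
        reverse=True,
    )


# Made-up failure modes that remain after the design review cycle.
modes = [
    FailureMode("valve A", "fails closed", "loss of mission", 1e-3),
    FailureMode("seal B", "leak", "loss of vehicle or life", 1e-5),
    FailureMode("sensor C", "drift", "other", 1e-2),
    FailureMode("pump D", "rupture", "loss of vehicle or life", 1e-4),
]

ranked = prioritize(modes)
for m in ranked:
    print(f"{m.item:10s} {m.consequence:25s} {m.probability:.0e}")
```

Sorting on severity before probability is one possible convention. A program could equally weight the two into a single score, and the appendix does not prescribe either choice.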
Those in prioritized categories which require Level I approval for either retention or a waiver authorization would be submitted through Level II PRCB along with a full safety-risk assessment produced under the direction of NASA systems safety engineers (step 13).

5. EVALUATION OF THE PROBABILITY OF OCCURRENCE OF CAUSES AND CONSEQUENCES OF FAILURE MODES AND HAZARDS

An evaluation can be made of the probability of occurrence of each of the causes and consequences for each retained failure mode and hazard. These analyses could be performed by both the contractors' and NASA's systems safety engineers. A variety of tools can be used to perform these evaluations.

The determination of probability of occurrence of the causes of failures would be expressed as a set of functions related to:

a. Reliability data for hardware items having causes of failure modes that are statistical in nature, such as electronic boards.
b. Wear-out functions for hardware line replaceable units where the causes of the failure modes are both statistical and have safety operating margins that are either time or cycle dependent.
c. Operating margins required where the causes of the particular modes of hardware failure are dependent on stress, temperature, or other environmental factors to which the unit may be subjected.
d. The control which can be exercised over the true configuration of the part, unit, subsystem, or system. This includes both the validation and control of manufacturing and integration processes, and the ability to explicitly verify the configurations prior to operations.

Evaluation of the probability of occurrence of each of the possible consequences of critical hardware failures or the presence of other severe hazards requires assessment of each path of the fault tree. The prevention of certain consequence paths would be evaluated relative to the system design and the specific operational hazard control techniques.

Probability functions need to be determined for both the causes and consequences in order to provide inputs, both to the overall risk assessment which will guide the final design (or for the current STS, the proposed design changes), and to the criteria on which the validation and certification test programs should be based.

[FIGURE F-1 Flow diagram of proposed systems safety engineering functions in support of risk assessment.]

6. ESTABLISHMENT OF SAFETY-RISK LEVEL CRITERIA FOR DESIGN MARGINS AND HAZARD CONTROLS

Using relationships of the types derived under step 5 as a framework, risk levels can be allocated among the various subsystems, units, and components that would be consistent with the acceptable safety-risk requirements established by NASA for the overall NSTS program. Design criteria can then be established for the margins required against each cause of a critical failure mode (using the functions developed in step 5) and for the controls required to limit the consequences of each hazard.

This task is critical to providing assurance that the NSTS system has been configured to a given (acceptable) set of safety-risk levels. (Note that one cannot assure fully safe operations.) Those risk levels (which may be quite different for loss of hardware versus loss of life) must have a definable and objective set of measures that can be agreed upon by Level I and the Administrator of NASA. They must later be verified during the test programs. Without such quantitative safety-risk level assessments, assurances of acceptable safety are not meaningful and the fulfillment of responsibility is not measurable.

7. DESIGN OF QUALIFICATION AND CERTIFICATION TEST PROGRAMS

Once safety margins have been determined for each failure mode of the accepted designs, quantitatively significant validation, qualification, and (where required) time or cycle (reuse) dependent certification test programs can be designed. These test plans must be optimized to extract the maximum amount of information on operating margins against critical failure modes from the most cost effective quantity of hardware and the time period which can be allocated to tests. Design of the test programs is crucial to the viability of making risk assessments.
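The trade between the quantity of test hardware and the information a test program yields can be made concrete with a simple calculation. The sketch below uses the standard zero-failure (success-run) binomial relation, in which n failure-free trials demonstrate reliability R at confidence C when C = 1 - R^n. This relation is supplied here as an assumed illustration; the appendix does not name a specific statistical method, and the function name is hypothetical.

```python
import math


def zero_failure_sample_size(reliability: float, confidence: float) -> int:
    """Smallest number of failure-free trials n such that 1 - reliability**n >= confidence.

    Illustrative success-run relation, not a method stated in the appendix.
    """
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


# How the required number of failure-free trials grows with the
# reliability to be demonstrated, at a fixed 90 percent confidence.
for r in (0.9, 0.99, 0.999):
    n = zero_failure_sample_size(r, 0.9)
    print(f"reliability {r}: {n} failure-free trials")
```

The rapid growth in the number of trials as the demonstrated reliability approaches one suggests why tests that measure margins against specific failure causes may be more economical than pass/fail trials alone.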
The criteria for the tests should be established by reliability and/or systems safety engineers who specialize in test program design and statistical analysis of test data.

8. OBJECTIVE ASSESSMENT OF SAFETY RISKS

The test data should be statistically analyzed to establish credible validated margins against the causes of each significant potential failure mode. When these measured margins are compared with the margin criteria from step 6, and when the probability functions for configuration control (step 5.d) are derived, there will be a meaningful basis for making assessments of the probability of occurrence for each failure mode and its associated hazard. These probabilities of occurrence must be combined with the appropriate analyses of the probabilities of the consequences being realized for each failure at the subsystem and total system levels to provide an objective measure of the portions of the overall safety-risks that are associated with each retained design and hazard.

9. DEVELOPMENT OF ACCEPTANCE RATIONALE FOR RETAINED HAZARDS AND HAZARD REPORTS

Rationales for accepting the safety risks associated with all created and intrinsic hazards would be developed. For those hazards caused by hardware failure modes, these rationales would embody the Critical Items List retention rationales developed by the various engineering groups and the test-based safety-risk assessments generated in step 8. This information would be published as a set of risk assessed hazard reports. These reports would go through the approval and data management process shown in Figure F-1. Upon approval by Level II PRCB, they would constitute the NSTS Accepted Hazards Data Base.

Those hazards in the data base which result from the currently defined Criticality 1 and 1R items could then be further classified and prioritized based on their assessed safety risks. Those requiring final acceptance at Level I would have special request packages prepared by NASA systems safety engineering. To avoid the misconceptions associated with thousands of waivers to an accepted system design, these requests should fall into two categories:

1. Items which met their specific design criteria, including safety-risk criteria (step 6). These items should not require a "waiver," but only Level I approval of the retention requests because of their perceived importance or risk contribution.
2. Items which did not meet their specific safety-risk design criteria as indicated by test margins or detailed risk analyses. These items would therefore require a "waiver" for retention.

These approval requests to Level I would be presented in conjunction with an overall System Safety Assessment Report and specific Mission Risk Assessment Reports (step 13 below).

10. SPECIFICATION OF ENVIRONMENTAL AND OPERATING CONSTRAINTS

Having accepted a residual hazard (whether contained or catastrophic) the NASA systems safety engineers must specify very explicitly for all equipment levels (part, unit, subsystem, element, and full system) the environmental and operating constraints which will assure that the validated margins will not be violated. In this regard, this task also would have a major interface with the operations activities. The analysis of such things as the effect of environmental conditions on the validity of validations and certifications is usually not done by the quality assurance engineers; therefore, the systems safety engineers should be the responsible focus for this task.

11. QUANTITATIVE EVALUATION OF FLIGHT DATA TO UPDATE SAFETY MARGIN VALIDATIONS

By reviewing all flight data (or other off-line test data and even test data from other programs) for explicit information, updated quantitative assessments of the validated design criteria can be made. In order to retain the assured level of risk as new data become available, specifications may have to be changed for some hardware or new operational constraints may have to be defined.

12. OVERSIGHT OF QUALITY ASSURANCE FUNCTIONS TO CONTROL SAFETY-RISKS

In order to fulfill its responsibility to assure control to the accepted levels of risk, the systems safety engineers must oversee the appropriate quality assurance functions. This is essential because the validated margins and assessed risks of the retained hazards are dependent on total configuration verification of the overall system and each of its constituent parts. By "total" configuration one means all aspects of the hardware, software, external environments and operating constraints.

13. OVERALL SYSTEM SAFETY RISK ASSESSMENT AND DEFINITION OF THE POTENTIAL TO REDUCE THE LEVEL OF RISK

Using all of the above information, the NASA systems safety engineers can prepare a series of "System Safety Assessment Reports." These reports would continuously update overall system risk assessments against the safety-risk objectives established for the various phases of the NSTS Program by the risk management activity. The systems safety engineers also would define the potential to reduce the levels of risk in the program. Mission risk assessment reports would also be prepared which would incorporate mission accomplishment risk assessments, of which the safety risks would be one input.

Where required, retention request packages generated in step 9 would be submitted through Level II to Level I along with the approved safety-risk assessments for each item and an appropriate summary of the overall system safety-risks assessment report. Thus, the retention requests can be considered by Level I within the context of a definable and objective risk management process. The arguments for retention of prioritized critical items would be combined with objective assessments of safety-risks for each item's contribution to the overall system's safety risks.
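As a rough illustration of how item-level safety risks might be rolled up into an overall system figure and compared with a safety-risk objective, consider the sketch below. It assumes that the retained hazards are statistically independent, which a real assessment would have to justify. The hazard names, probabilities, objective value, and function names are all hypothetical.

```python
import math


def system_risk(item_risks):
    """Probability that at least one retained hazard is realized on a mission.

    Assumes the hazards are independent (an assumption of this sketch).
    """
    return 1.0 - math.prod(1.0 - p for p in item_risks.values())


def contributions(item_risks):
    """Approximate share of the overall risk attributable to each item."""
    total = sum(item_risks.values())
    return {name: p / total for name, p in item_risks.items()}


# Made-up per-mission probabilities for three retained hazards.
retained = {"hazard A": 1e-3, "hazard B": 5e-4, "hazard C": 5e-4}
objective = 5e-3  # hypothetical safety-risk objective per mission

overall = system_risk(retained)
shares = contributions(retained)

print(f"overall risk per mission: {overall:.6f}")
print(f"meets objective: {overall <= objective}")
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.0%} of summed item risk")
```

Listing each item's share alongside the overall figure mirrors the idea that a retention request should be judged by its contribution to the system's total safety risk. The simple ratio used here is only an approximation that is reasonable when all probabilities are small.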