Copyright © 2009. National Academy of Sciences. All rights reserved.
APPENDIX F
DESCRIPTION OF PROPOSED SYSTEMS SAFETY ENGINEERING FUNCTIONS IN
SUPPORT OF NATIONAL SPACE TRANSPORTATION SYSTEM RISK ASSESSMENT
AND RISK MANAGEMENT
In Section 5.11, the Committee recommends that NASA consider
bringing together appropriate activities into a focused
"Systems Safety Engineering" function at both Headquarters
and the centers. This activity would apply across the entire
set of design, development, qualification and certification,
and operations activities of the National Space Transportation
System (NSTS) Program in support of risk assessment and risk
management. Systems safety engineering would embrace the
functions (listed in Section 5.11 and illustrated here in
Figure F-1) which are described briefly in the following
paragraphs.¹
1. IDENTIFICATION OF FAILURE
MODES AND EFFECTS
The failure modes of each hardware item can be identified at
this step without addressing the probability of each failure
mode occurring. All of the significant effects of each failure
mode also would be identified. These effects (not just the
estimated worst-case effect) are needed also for identification
of hazards and for evaluating potential cascading influences on
the failure modes of other parts of the system. All of the
causes of each failure mode (including the feedback influences
from the hazard analysis, step 3 below) should then be
identified. The control of all causes of each failure mode by
design margin, process controls, redundancy, and operating
constraints would be defined. This information would be an
input to the analysis of safety risks in steps 5, 8, and 9.
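The information gathered in this step can be pictured as one record per failure mode. The following is a minimal sketch only; the record layout, the field names, and the example valve are hypothetical and are not taken from any NASA format.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    """One failure mode of a hardware item (hypothetical record layout)."""
    item: str
    mode: str
    effects: list[str] = field(default_factory=list)   # all significant effects, not only the worst case
    causes: list[str] = field(default_factory=list)    # includes feedback from the hazard analysis
    controls: dict[str, list[str]] = field(default_factory=dict)  # cause -> controls applied to it

    def uncontrolled_causes(self) -> list[str]:
        """Causes with no design margin, process control, redundancy, or operating constraint."""
        return [cause for cause in self.causes if not self.controls.get(cause)]


# Illustrative entry only; the item and its causes are invented for this sketch.
valve = FailureMode(
    item="propellant valve",
    mode="fails closed",
    effects=["loss of propellant flow", "pressure rise upstream"],
    causes=["contamination", "actuator wear"],
    controls={"contamination": ["process control: filtered servicing"]},
)
print(valve.uncontrolled_causes())
```

Listing the causes that have no recorded control gives a simple completeness check before the record is passed to the later risk analysis steps.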
2. ESTABLISHMENT OF DESIGN
CRITERIA FOR REDUNDANCY
Design criteria for redundancy would be based on functional
and fail-operational requirements for components or units
which do not have catastrophic single failure modes. These
criteria would be based on reliability analyses of components
using either statistical data bases where available or
estimated failure rate functions.
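A reliability analysis of this kind can be sketched with a standard textbook model. The example below assumes identical, independent units with a constant failure rate, which is an assumption of this sketch and not a criterion stated in the appendix; the failure rate and mission time are hypothetical.

```python
import math


def unit_reliability(failure_rate: float, hours: float) -> float:
    """Reliability of one unit with a constant failure rate (exponential model)."""
    return math.exp(-failure_rate * hours)


def redundant_reliability(r: float, n: int, k: int = 1) -> float:
    """Probability that at least k of n independent, identical units survive."""
    return sum(math.comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


r = unit_reliability(failure_rate=1e-4, hours=100.0)   # hypothetical values
single = redundant_reliability(r, n=1)
triple = redundant_reliability(r, n=3, k=1)             # fail-operational after two failures
print(single, triple)
```

Comparing `single` with `triple` shows how much a given level of redundancy buys under these assumptions; common-cause failures, which violate the independence assumption, are not modeled.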
¹ In Figure F-1, the thirteen functions discussed in this appendix are
shown by the boxes which are numbered to correspond. This diagram
can be compared to that currently described for the NSTS Program
by the JSC SR&QA office, as shown in Figure 5-12 in Section 5.11.
3. IDENTIFICATION OF HAZARDS AND
THEIR POTENTIAL CONSEQUENCES
Hazards associated with the system can be systematically
identified using various methods such as fault-tree or
event-tree networks. Inputs will come from mission
requirements, the system configuration, the applicable
identified hardware failure effects, human factors and the
expected environments. Potential consequences of the presence
of each hazard can then be derived without regard for the
probability of the events or mishaps occurring. (However, some
screening out of very low probability failure events would
simplify this effort.) Mishaps resulting from combinations of
events and the impacts of created hazards on failure modes in
other hardware can be identified. Each of the causes of the
identified hazards, along with proposed controls, would be
defined for later risk assessment in steps 5, 8, and 9.
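One way to identify combinations of events without assigning probabilities is to list the minimal cut sets of a fault tree. The sketch below is illustrative only: the tuple-based tree representation, the function name `cut_sets`, and the example events are assumptions made here, not part of the appendix.

```python
from itertools import product


def cut_sets(node):
    """Return the minimal cut sets of a fault tree as a set of frozensets.

    A node is either a basic-event name (str) or a tuple (gate, children)
    where gate is "AND" or "OR".
    """
    if isinstance(node, str):                      # basic event
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":                               # any child alone causes the event
        combined = set().union(*child_sets)
    elif gate == "AND":                            # every child must occur together
        combined = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unknown gate: {gate}")
    # Keep only minimal sets: drop any set that strictly contains another.
    return {s for s in combined if not any(other < s for other in combined)}


# Hypothetical top event: loss of cooling.
loss_of_cooling = ("OR", [("AND", ["pump A fails", "pump B fails"]), "power bus fails"])
for events in sorted(cut_sets(loss_of_cooling), key=len):
    print(sorted(events))
```

Single-event cut sets point to single failure points, while larger sets describe the combinations of events that the text refers to.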
4. IDENTIFICATION OF CRITICAL ITEMS
Using the set of information generated in the
previous steps, hardware failure modes could be
categorized on the basis of their potential conse-
quences. Those designs having failure modes with
consequences that could result in loss of vehicle or
life would be returned to engineering for possible
alternative concepts. Failure modes that remain
after this cycle could be put into criticality cate-
gories to be prioritized based on severity of the
failure effects and the probability of occurrence
(steps 8 and 9). Those in prioritized categories
which require Level I approval for either retention
or a waiver authorization would be submitted
through Level II PRCB along with a full safety-
risk assessment produced under the direction of
NASA systems safety engineers (step 13).
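The prioritization described above, by severity first and probability of occurrence second, can be sketched as a simple sort. The severity labels, their ranking, and the example entries are hypothetical and do not reproduce the NSTS criticality definitions.

```python
from dataclasses import dataclass

# Hypothetical severity ordering: lower rank means more severe.
SEVERITY_RANK = {"loss of life or vehicle": 0, "loss of mission": 1, "other": 2}


@dataclass
class RetainedMode:
    name: str
    severity: str
    probability: float   # assessed probability of occurrence (steps 8 and 9)


def prioritize(modes):
    """Order retained failure modes by severity, then by descending probability."""
    return sorted(modes, key=lambda m: (SEVERITY_RANK[m.severity], -m.probability))


modes = [
    RetainedMode("valve sticks", "loss of mission", 1e-3),
    RetainedMode("tank rupture", "loss of life or vehicle", 1e-6),
    RetainedMode("turbine blade crack", "loss of life or vehicle", 1e-4),
]
ranked = [m.name for m in prioritize(modes)]
print(ranked)
```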
FIGURE F-1 Flow diagram of proposed systems safety engineering functions in support of risk assessment.
5. EVALUATION OF THE PROBABILITY
OF OCCURRENCE OF CAUSES AND
CONSEQUENCES OF FAILURE MODES
AND HAZARDS
An evaluation can be made of the probability of occurrence of
each of the causes and consequences for each retained failure
mode and hazard. These analyses could be performed by both the
contractors' and NASA's systems safety engineers. A variety of
tools can be used to perform these evaluations. The
determination of probability of occurrence of the causes of
failures would be expressed as a set of functions related to:
a. Reliability data for hardware items having causes of
failure modes that are statistical in nature, such as
electronic boards.
b. Wear-out functions for hardware line replaceable units
where the causes of the failure modes are both statistical
and have safety operating margins that are either time or
cycle dependent.
c. Operating margins required where the causes of the
particular modes of hardware failure are dependent on
stress, temperature, or other environmental factors to
which the unit may be subjected.
d. The control which can be exercised over the true
configuration of the part, unit, subsystem, or system. This
includes both the validation and control of manufacturing
and integration processes, and the ability to explicitly
verify the configurations prior to operations.
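As an illustration of items a and b, a constant failure rate and a wear-out behavior can both be written as Weibull functions. This is a sketch under an assumed distribution; the appendix does not prescribe one, and the scale and shape values are hypothetical.

```python
import math


def weibull_reliability(t: float, scale: float, shape: float) -> float:
    """Probability of surviving to time (or cycle count) t.

    shape == 1 gives a constant failure rate (item a);
    shape > 1 gives a failure rate that grows with use, i.e. wear-out (item b).
    """
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t: float, scale: float, shape: float) -> float:
    """Instantaneous failure rate at t (for t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical unit with a characteristic life of 1000 cycles.
for cycles in (200.0, 800.0):
    print(cycles, weibull_reliability(cycles, 1000.0, 3.0), weibull_hazard(cycles, 1000.0, 3.0))
```

A rising hazard of this kind is what would make a time or cycle limit on a line replaceable unit meaningful.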
Evaluation of the probability of occurrence of each of the
possible consequences of critical hardware failures or the
presence of other severe hazards requires assessment of each
path of the fault tree. The prevention of certain consequence
paths would be evaluated relative to the system design and the
specific operational hazard control techniques. Probability
functions need to be determined for both the causes and
consequences in order to provide inputs, both to the overall
risk assessment which will guide the final design (or for the
current STS, the proposed design changes), and to the criteria
on which the validation and certification test programs should
be based.
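Assessing the paths of a fault tree quantitatively can be sketched as follows. The calculation assumes that basic events are independent and that each event appears only once in the tree; both are simplifying assumptions of this example, and the tree and probabilities are hypothetical.

```python
import math


def top_probability(node, p):
    """Probability of a fault-tree node given basic-event probabilities p.

    A node is a basic-event name (str) or a tuple (gate, children) with
    gate "AND" or "OR". Exact only for independent, non-repeated events.
    """
    if isinstance(node, str):
        return p[node]
    gate, children = node
    probs = [top_probability(child, p) for child in children]
    if gate == "AND":
        return math.prod(probs)                        # all children occur
    if gate == "OR":
        return 1.0 - math.prod(1.0 - q for q in probs)  # at least one child occurs
    raise ValueError(f"unknown gate: {gate}")


tree = ("OR", [("AND", ["pump A fails", "pump B fails"]), "power bus fails"])
event_p = {"pump A fails": 0.1, "pump B fails": 0.2, "power bus fails": 0.01}
print(top_probability(tree, event_p))
```

Replacing a basic-event probability with a lower value, for example after adding a hazard control, shows the effect of that control on the consequence path.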
6. ESTABLISHMENT OF SAFETY-RISK
LEVEL CRITERIA FOR DESIGN
MARGINS AND HAZARD CONTROLS
Using relationships of the types derived under step 5 as a
framework, risk levels can be allocated among the various
subsystems, units, and components that would be consistent
with the acceptable safety-risk requirements established by
NASA for the overall NSTS program. Design criteria can then be
established for the margins required against each cause of a
critical failure mode (using the functions developed in step 5)
and for the controls required to limit the consequences of each
hazard. This task is critical to providing assurance that the
NSTS system has been configured to a given (acceptable) set of
safety-risk levels. (Note that one cannot assure fully safe
operations.) Those risk levels (which may be quite different
for loss of hardware versus loss of life) must have a definable
and objective set of measures that can be agreed upon by Level
I and the Administrator of NASA. They must later be verified
during the test programs. Without such quantitative safety-risk
level assessments, assurances of acceptable safety are not
meaningful and the fulfillment of responsibility is not
measurable.
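Allocating an overall risk level among subsystems can be sketched with a weighted apportionment. The method below is one common textbook approach chosen for illustration, not the method of the appendix; it assumes independent subsystems in series, and the subsystem names and weights are hypothetical.

```python
def allocate_risk(total_risk: float, weights: dict[str, float]) -> dict[str, float]:
    """Split an acceptable overall failure probability among subsystems.

    Each subsystem i receives p_i with (1 - p_i) = (1 - total_risk) ** (w_i / sum(w)),
    so the product of the survival probabilities equals 1 - total_risk.
    """
    if not 0.0 <= total_risk < 1.0:
        raise ValueError("total_risk must be in [0, 1)")
    total_weight = sum(weights.values())
    if total_weight <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative with a positive sum")
    return {name: 1.0 - (1.0 - total_risk) ** (w / total_weight)
            for name, w in weights.items()}


# Hypothetical program-level criterion and weights.
allocation = allocate_risk(0.01, {"propulsion": 3, "avionics": 1})
print(allocation)
```

The allocated values then serve as the per-subsystem criteria against which measured margins are compared later.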
7. DESIGN OF QUALIFICATION AND
CERTIFICATION TEST PROGRAMS
Once safety margins have been determined for each failure mode
of the accepted designs, quantitatively significant
validation, qualification, and (where required) time or cycle
(reuse) dependent certification test programs can be designed.
These test plans must be optimized to extract the maximum
amount of information on operating margins against critical
failure modes from the most cost-effective quantity of hardware
and the time period which can be allocated to tests. Design of
the test programs is crucial to the viability of making risk
assessments. The criteria for the tests should be established
by reliability and/or systems safety engineers who specialize
in test program design and statistical analysis of test data.
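The trade between hardware quantity and the information a test yields can be illustrated with the zero-failure demonstration test from standard reliability practice. This is an assumed example, not a procedure from the appendix; it treats each test as an independent pass/fail trial.

```python
import math


def zero_failure_sample_size(reliability: float, confidence: float) -> int:
    """Smallest number of tests, all passing, that demonstrates `reliability`
    at the given `confidence`, i.e. the smallest n with reliability**n <= 1 - confidence."""
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


print(zero_failure_sample_size(0.90, 0.90))
print(zero_failure_sample_size(0.99, 0.90))
```

The steep growth in sample size as the required reliability rises is one reason margin testing, rather than pass/fail counting alone, is needed for high-reliability hardware.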
8. OBJECTIVE ASSESSMENT OF SAFETY
RISKS
The test data should be statistically analyzed to establish
credible validated margins against the causes of each
significant potential failure mode. When these measured margins
are compared with the margin criteria from step 6, and when the
probability functions for configuration control (step 5.d) are
derived, there will be a meaningful basis for making
assessments of the probability of occurrence for each failure
mode and its associated hazard. These probabilities of
occurrence must be combined with the appropriate analyses of
the probabilities of the consequences being realized for each
failure at the subsystem and total system levels to provide an
objective measure of the portions of the overall safety-risks
that are associated with each retained design and hazard.
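Turning a measured margin into a probability of occurrence can be sketched with the classical stress-strength interference model. The model assumes that both the applied stress and the demonstrated strength are normally distributed and independent; that assumption, the function names, and the numbers are illustrative only.

```python
import math
from statistics import NormalDist


def failure_probability(strength_mean: float, strength_sd: float,
                        stress_mean: float, stress_sd: float) -> float:
    """P(stress > strength) for independent normal stress and strength."""
    z = (strength_mean - stress_mean) / math.hypot(strength_sd, stress_sd)
    return NormalDist().cdf(-z)


def meets_criterion(p_fail: float, allowed: float) -> bool:
    """Compare an assessed probability with the step 6 criterion."""
    return p_fail <= allowed


# Hypothetical test result: strength 130 +/- 6, stress 100 +/- 8 (same units).
p = failure_probability(130.0, 6.0, 100.0, 8.0)
print(p, meets_criterion(p, allowed=1e-2))
```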
9. DEVELOPMENT OF ACCEPTANCE
RATIONALE FOR RETAINED HAZARDS
AND HAZARD REPORTS
Rationales for accepting the safety risks associated with all
created and intrinsic hazards would be developed. For those
hazards caused by hardware failure modes, these rationales
would embody the Critical Items List retention rationales
developed by the various engineering groups and the test-based
safety-risk assessments generated in step 8. This information
would be published as a set of risk-assessed hazard reports.
These reports would go through the approval and data management
process shown in Figure F-1. Upon approval by Level II PRCB,
they would constitute the NSTS Accepted Hazards Data Base.
Those hazards in the data base which result from the currently
defined Criticality 1 and 1R items could then be further
classified and prioritized based on their assessed safety
risks. Those requiring final acceptance at Level I would have
special request packages prepared by NASA systems safety
engineering. To avoid the misconceptions associated with
thousands of waivers to an accepted system design, these
requests should fall into two categories:
1. Items which met their specific design criteria, including
safety-risk criteria (step 6). These items should not
require a "waiver," but only Level I approval of the
retention requests because of their perceived importance or
risk contribution.
2. Items which did not meet their specific safety-risk design
criteria as indicated by test margins or detailed risk
analyses. These items would therefore require a "waiver"
for retention.
These approval requests to Level I would be presented in
conjunction with an overall System Safety Assessment Report and
specific Mission Risk Assessment Reports (step 13 below).
10. SPECIFICATION OF ENVIRONMENTAL
AND OPERATING CONSTRAINTS
Having accepted a residual hazard (whether contained or
catastrophic), the NASA systems safety engineers must specify
very explicitly for all equipment levels (part, unit,
subsystem, element, and full system) the environmental and
operating constraints which will assure that the validated
margins will not be violated. In this regard, this task also
would have a major interface with the operations activities.
The analysis of such things as the effect of environmental
conditions on the validity of validations and certifications is
usually not done by the quality assurance engineers; therefore,
the systems safety engineers should be the responsible focus
for this task.
11. QUANTITATIVE EVALUATION OF
FLIGHT DATA TO UPDATE SAFETY
MARGIN VALIDATIONS
By reviewing all flight data (or other off-line test
data and even test data from other programs) for
explicit information, updated quantitative assess-
ments of the validated design criteria can be made.
In order to retain the assured level of risk as new
data become available, specifications may have to
be changed for some hardware or new operational
constraints may have to be defined.
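Updating an assessment as flight data accumulate can be sketched with a Bayesian beta-binomial update. The appendix does not name a method, so this is an assumed technique; the prior parameters and flight counts are hypothetical.

```python
def update_failure_belief(prior_a: float, prior_b: float,
                          flights: int, failures: int) -> tuple[float, float]:
    """Update a Beta(prior_a, prior_b) belief about per-flight failure probability
    after observing `failures` in `flights` independent flights."""
    if flights < 0 or failures < 0 or failures > flights:
        raise ValueError("need 0 <= failures <= flights")
    return prior_a + failures, prior_b + (flights - failures)


def mean_failure_probability(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior with mean 0.01, then 25 flights without a failure of this mode.
a, b = update_failure_belief(1.0, 99.0, flights=25, failures=0)
print(mean_failure_probability(a, b))
```

If the updated estimate drifts above the accepted level, that is the signal for the specification changes or new operational constraints mentioned in the text.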
12. OVERSIGHT OF QUALITY ASSURANCE
FUNCTIONS TO CONTROL SAFETY-RISKS
In order to fulfill their responsibility to assure
control to the accepted levels of risk, the systems
safety engineers must oversee the appropriate qual-
ity assurance functions. This is essential because
the validated margins and assessed risks of the
retained hazards are dependent on total configu-
ration verification of the overall system and each
of its constituent parts. By "total" configuration
one means all aspects of the hardware, software,
external environments and operating constraints.
13. OVERALL SYSTEM SAFETY RISK
ASSESSMENT AND DEFINITION OF THE
POTENTIAL TO REDUCE THE LEVEL
OF RISK
Using all of the above information, the NASA
systems safety engineers can prepare a series of
"System Safety Assessment Reports." These reports
would continuously update overall system risk
assessments against the safety-risk objectives estab-
lished for the various phases of the NSTS Program
by the risk management activity. The systems safety
engineers also would define the potential to reduce
the levels of risk in the program. Mission risk
assessment reports would also be prepared which
would incorporate mission accomplishment risk
assessments, of which the safety risks would be
one input.
Where required, retention request packages generated in step 9
would be submitted through Level II to Level I along with the
approved safety-risk assessments for each item and an
appropriate summary of the overall system safety-risks
assessment report. Thus, the retention requests can be
considered by Level I within the context of a definable and
objective risk management process. The arguments for retention
of prioritized critical items would be combined with objective
assessments of safety-risks for each item's contribution to the
overall system's safety risks.
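Each item's contribution to the overall system risk, and the potential to reduce that risk, can be sketched as follows. The roll-up assumes independent item risks, and the item names and values are hypothetical.

```python
import math


def system_risk(item_risks: dict[str, float]) -> float:
    """Probability that at least one retained item leads to a mishap,
    assuming the items are independent."""
    return 1.0 - math.prod(1.0 - p for p in item_risks.values())


def risk_reduction_potential(item_risks: dict[str, float]) -> dict[str, float]:
    """Drop in overall risk if each item's risk were eliminated entirely."""
    total = system_risk(item_risks)
    return {
        name: total - system_risk({k: v for k, v in item_risks.items() if k != name})
        for name in item_risks
    }


risks = {"item A": 0.1, "item B": 0.2}       # hypothetical assessed risks
overall = system_risk(risks)
potential = risk_reduction_potential(risks)
print(overall, potential)
```

Ranking the items by this potential gives one objective basis for deciding where design changes or added controls would reduce the level of risk the most.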