4
Report of the Panel on Engineering for Complex Systems

INTRODUCTION

The Engineering for Complex Systems (ECS) program is one of three programs under NASA's Pioneering Revolutionary Technology (PRT) program. The ECS program, funded at $24 million for FY2002, comprises three broad, level 2 projects: System Reasoning and Risk Management (SRRM); Knowledge Engineering for Safety and Success (KESS); and Resilient Systems and Operations (RSO). The projects are then divided into level 3 elements and, further, into a total of 52 level 4 and level 5 tasks. Program organization and budget are presented in Table 4-1.

TABLE 4-1 Engineering for Complex Systems (ECS) Program Organization and Budget, FY2002-2003

                                                        Budget (million $)
                                                        FY2002     FY2003
ECS program, total                                        23.6       27.4
Level 2 projects
  Knowledge Engineering for Safety and Success (KESS)      4.9        5.9
  Resilient Systems and Operations (RSO)                  11.9       13.7
  Systems Reasoning and Risk Management (SRRM)             6.8        7.8

SOURCE: Gawdiak (2002b) and Andrucyk (2003).

The goal of the ECS program is to "achieve ultra-high levels of safety and missions success by fundamentally advancing NASA's system life-cycle approach through the infusion of advanced technologies" (Gawdiak, 2002a). The program intends to accomplish this goal by addressing areas of need, including increasing NASA's ability to conduct system and trade-off analyses; deepening NASA's understanding of organizational risk and of knowledge acquisition and communication; and improving its system control strategies and status assessment.

REVIEW PROCESS

The National Research Council's Panel on Engineering for Complex Systems (referred to as the ECS panel in this report) conducted its review in two phases. The first phase was to gain an understanding of the top-level objectives of the ECS program as the program relates to overall NASA needs.
This phase was completed at the first meeting of the ECS panel, June 10-13, 2002, at NASA Ames Research Center at Moffett Field, California. The second phase of the review was aimed at understanding the quality and technical merits of individual tasks being conducted under the auspices of the ECS program. To accomplish this task-level evaluation, the ECS panel gave ECS management a one-page questionnaire, which the management distributed to some 52 level 4 and level 5 managers and principal investigators (PIs). The ECS panel then evaluated the individual tasks, referring to the questionnaires, conducting follow-up site visits, reviewing technical publications, and talking directly to PIs, as needed. A copy of the questionnaire can be found in Appendix E.

Subpanels of the main ECS review panel visited six sites: Goddard Space Flight Center in Maryland (June 24, 2002), Jet Propulsion Laboratory (JPL) in California (July 1, 2002, and April 17, 2003), NASA Headquarters in Washington, D.C. (July 22, 2002), Glenn Research Center in Ohio (July 23, 2002), Johnson Space Center in Texas (July 23-24, 2002), and Kennedy Space Center in Florida (July 30, 2002).

The findings and recommendations of the ECS panel are presented in this chapter. This section first discusses the top-level issues that are relevant to the ECS program as a whole. Later sections concentrate on issues that apply individually to the three projects within ECS and discuss tasks specific to these projects. Not all tasks within the ECS program are discussed in this report; if a task is not discussed, the ECS panel deemed that the effort is good work and should continue per the current ECS program plan.

GENERAL OBSERVATIONS

Programmatic Risk Management

The ECS program has been in a state of flux since the program's formation some 6 months before the start of this review. At the end of this review process, the program is still in the early stages of developing a critical mass (a large enough effort to make a difference within NASA and the external research community) in programmatic risk management with the limited budget available to it.
The programmatic risk management concept is a comprehensive, probabilistic process with full development and delineation of uncertainties. For a given operational system, programmatic risk consists of the probability (with uncertainty delineated) of achieving each of the following system requirements:

· System safety (probability of crew survival),
· Reliability (probability of system completing its designed mission),
· Performance (probability of achieving the design parameters of system performance),
· Cost of the program (probability of staying within the budget), and
· Schedule for system delivery (probability of meeting the schedule).

Such a programmatic risk management concept, applied from early design to mission completion, would contribute to the comprehensive programmatic risk management approach that is under development and being applied by safety organizations within NASA. Given its limited resources, the ECS program will not be able to create such a comprehensive plan in the foreseeable future. Given the dynamic physical environments in which NASA operates, and in light of the Mars robotic exploration failures and the Columbia tragedy, it is critical for ECS to aggressively contribute to the development of NASA's programmatic risk management approach as defined above.

The headquarters group responsible for implementing large-scale safety protocols at NASA is the Office of Safety and Mission Assurance (Code Q). Code Q performs research, both in house and by contract, on the development and application of methods for managing program risk. Code Q also provides guidance to NASA programs in the application of these methods. The other groups responsible for safety are located at the NASA centers. The safety and mission assurance organizations at the NASA centers assist NASA programs in the application of programmatic risk technology. Improved cooperation between the ECS program and Code Q in developing and applying risk methods is strongly encouraged.
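The programmatic risk concept above calls for the probability of meeting each requirement together with its uncertainty, not a single point estimate. As a minimal sketch, and not the method of ECS or Code Q, the example below assumes pass/fail evidence for each requirement (for instance, outcomes of tests or of analogous past programs), independent trials, and a uniform Beta(1, 1) prior; it reports a median probability with a 90 percent credible interval. The requirement names follow the list above, but the function name and all counts are hypothetical.

```python
import random
import statistics


def probability_with_uncertainty(successes, trials, n_samples=20_000, seed=0):
    """Summarize the probability of meeting one requirement.

    Assumes (hypothetically) independent pass/fail trials and a uniform
    Beta(1, 1) prior, so the posterior is Beta(1 + successes, 1 + failures).
    Returns (5th percentile, median, 95th percentile) of Monte Carlo draws.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    rng = random.Random(seed)
    a = 1 + successes
    b = 1 + (trials - successes)
    draws = sorted(rng.betavariate(a, b) for _ in range(n_samples))
    lower = draws[int(0.05 * n_samples)]
    upper = draws[int(0.95 * n_samples) - 1]
    return lower, statistics.median(draws), upper


# Hypothetical evidence: (times requirement was met, number of trials).
EVIDENCE = {
    "system safety": (498, 500),
    "reliability": (47, 50),
    "performance": (18, 20),
    "cost": (6, 10),
    "schedule": (5, 10),
}

if __name__ == "__main__":
    for requirement, (met, n) in EVIDENCE.items():
        lo, med, hi = probability_with_uncertainty(met, n)
        print(f"{requirement:14s} median={med:.3f}  90% interval=[{lo:.3f}, {hi:.3f}]")
```

With few trials (the cost and schedule entries here) the interval is wide, and it narrows as evidence accumulates; reporting that width alongside the estimate is one way to keep the uncertainty visible to decision makers.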
The panel did see evidence of varying levels of ECS program involvement at the NASA centers visited (listed earlier in this chapter).

Finding: NASA has a critical need for a comprehensive risk management program that can be implemented throughout program life cycles. The ECS program should contribute to the development and application of such a program for NASA.
AN ASSESSMENT OF NASA'S PIONEERING REVOLUTIONARY TECHNOLOGY PROGRAM

Recommendation: In light of the Mars exploration failures and the Columbia tragedy, the ECS program should aggressively contribute to a comprehensive programmatic risk management program that would develop the probability (with uncertainty delineated) of achieving each of the following system requirements:

· System safety (probability of crew survival),
· Reliability (probability of system completing its designed mission),
· Performance (probability of achieving the design parameters of system performance),
· Cost of the program (probability of staying within the budget), and
· Schedule for system delivery (probability of meeting the schedule).

NASA's efforts to rework the overall ECS program plan during the course of this review (from June 2002 through April 2003) are appropriate given the program's central importance to NASA's mission. The overall ECS effort has the potential to make advances in complex systems engineering. The ECS program has succeeded as a role model for NASA by recognizing the need for, and conducting, multidisciplinary research on human and organizational factors. The goals of the ECS program are consistent with NASA goals of achieving safe and reliable systems and missions (Venneri, 2001). As such, the mission and work of the ECS program are critical for NASA.

Finding: The current ECS program, as formulated and funded, will not by itself develop a comprehensive programmatic risk management program in the foreseeable future, yet this ECS risk management work is important for NASA.

Technical Quality

ECS work in individual tasks is, in general, good. The ECS program appears to address the right problems using correct justification (i.e., hypotheses) through a multidisciplinary research approach. However, at the start of the review in June 2002, there were many gaps in the ECS portfolio that weakened the effectiveness of the program, including the following:
· A need for benchmarks in the ECS program overall,
· Integration of mission- and organizational-risk-oriented tasks within the ECS program overall,
· Addressing NASA-specific problems in software development within Resilient Systems and Operations (RSO),
· Modeling organizational risk under Knowledge Engineering for Safety and Success (KESS), and
· The absence of a coherent roadmap or strategy for the development of System Reasoning for Risk Management (SRRM) technology, which could apply across many NASA programs.

Many, though not all, of these gaps have been addressed since the start of the review. The remaining gaps are discussed under the individual projects and in the section "Challenge Areas."

The tasks have been reasonably distributed among NASA PIs, academia, and industry, and the balance of fundamental research versus user-driven research is appropriate, considering the early stage of the ECS program. The ECS panel anticipates that the current ECS program will have small but positive incremental impacts on limited areas or individual missions within NASA. However, the panel believes that more fundamental research should be conducted if the ECS program is to have a profound and revolutionary impact on the NASA culture. Since NASA systems typically are more dynamic and more complex than other systems, such as those in the nuclear industry, fundamental research is required to enable adequate modeling of the interactions between hybrid-dynamic systems (systems composed of hardware, software, and human elements) operating in high-energy and hostile environments.

The ECS program has an appropriate mix of tasks to balance the risk of not successfully completing a task against the payoff when a high-risk task succeeds (based on Gawdiak, 2002a, and on questionnaire responses). The majority are oriented to direct applications in mission and organizational risk mitigation and risk analysis. A program such as ECS can be said to fail if no other NASA programs use the tools and methodologies developed by the program. The payoff, on the other hand, is a revolutionary change in NASA culture, whereby mission and organizational risk management become an integral part of the design of spacecraft and missions.

The panel was also concerned with several tasks in the ECS program.

The ECS panel acknowledges that NASA has constructed a competent team of in-house and external expertise. Moreover, it agrees with ECS management that top-down, functional decomposition to identify research requirements is important, that the starting point for the ECS program did not permit an optimal decomposition, and that there is a continuing challenge to integrate top-level research requirements with lower-level tasks.

Since the program is still evolving, it is important for the ECS program to continually seek external guidance to help maintain the focus and quality of the program. This is especially so for the SRRM project, which has only recently come into clear focus (ECS, 2003).

Finding: ECS work in individual tasks is, in general, good and focused.

Finding: The ECS program is in the early stages of developing a concentration in programmatic risk management within NASA, which should prove to be central to the NASA mission.

Finding: In general, the ECS program appears to address the right problems through a multidisciplinary research approach.

Finding: The ECS program, and SRRM in particular, is still evolving. External reviews can help the program maintain focus and quality.

Recommendation: The ECS panel recommends that the ECS program make use of external, independent reviews to maintain the focus and quality of the program.

The personnel working under the ECS program are, with few exceptions, of excellent quality and represent NASA well. The program generally makes use of contractors and contractor facilities efficiently.

Challenge Areas

The ECS panel has identified two general areas that cut across the ECS program where, if changes are made, the ECS program as a whole would benefit. The first is the institution of benchmarks by which to measure success, and the second is an analysis of the system-level impacts of task products.
At the management level, for instance, a benchmark of success for the SRRM project could be the successful deployment of the risk workstation that is currently being developed, as detailed in the SRRM review below.

The implementation of benchmarks (so-called "measurables") is also very important at the individual task level. In general, benchmarks refer to quantitative goals or expectations that serve as technical measures of success. For instance, a measure of success at the task level under SRRM might be successfully engaging a target number of NASA mission program offices and having them use the Defect Detection and Prevention (DDP) tool or the Technology Infusion, Maturity Assessment (TIMA) process. The SRRM project is already doing this informally, but it is important to list these criteria as formal benchmarks of success. The ECS panel did not see evidence that measurables have been integrated into all of the tasks. The panel acknowledges that the ECS program is relatively new but believes that measurables should be part of the planning for all individual tasks.

Recommendation: The ECS program should implement clearly defined benchmarks at the individual task level in order to judge task progress.

The ECS panel was impressed by the top-level vision that the ECS managers presented (Gawdiak, 2002a; Jones, 2002; Pallix, 2002; Prusha, 2002). This vision could be strengthened by analyzing the impact these tasks will have on NASA missions. This system-level analysis is important to guide NASA managers on the scope of their tasks and the impact they may have on large, distributed physical systems (computers and hardware) as well as on organizational systems for management and decision making. It was not apparent to the ECS panel that such plans are currently in place.
Recommendation: Each ECS task should have a system-level analysis of the impact it could have on NASA missions in order to determine the overall scope and the possible agency-wide impact of each ECS task.

SPECIFIC TASK DISCUSSIONS

The review panel evaluated individual tasks based on presentations from researchers during site visits and questionnaires completed by individual researchers. If the ECS panel had follow-up questions, individual members of the panel contacted the researchers directly or NRC staff requested additional information, such as published reports. Individual tasks are discussed under their projects, below.

The ECS panel found some tasks within the ECS portfolio that were world-class efforts. Tasks falling under this category have the potential to yield significant results for NASA and possibly for communities outside NASA. The panel points out some of those tasks below as examples.

The ECS panel found other tasks within the ECS portfolio that could be strengthened to make the overall ECS program better. The tasks identified as "Worthy Efforts That Could Be Strengthened" should not be discontinued but should be altered in a way that further benefits NASA.

While most of the ECS program is good, goal-oriented work that supports the NASA mission, the ECS panel identified two level 5 tasks that no longer appear to be relevant. These are discussed in the SRRM and KESS sections.

SYSTEM REASONING FOR RISK MANAGEMENT

The System Reasoning for Risk Management (SRRM) project consists of two elements, Integrated Risk Management Technologies and Integrated System Modeling and Reasoning. The objective of SRRM is to develop a comprehensive risk management approach for safety and system failure, to be used in the design and development of NASA aeronautics and space systems (Prusha, 2002). If successful, this work will yield a risk management framework that can be applied throughout program and system life cycles to achieve greater safety and dependability of operation than achieved by traditional design approaches still in use today.

SRRM research is critical to future NASA missions and has the potential for cross-NASA applicability, as well as applicability in the broader technical community outside NASA, but only if managed and implemented effectively.
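A comprehensive risk management approach is meant to let designers compare alternatives on several parameters at once rather than one at a time. A minimal Monte Carlo sketch of that idea follows; it illustrates the concept under assumed inputs and is not the SRRM risk workstation. It assumes that cost and schedule are independent triangular distributions and that catastrophic failure is a single per-mission probability, and it omits performance to stay short. The two design alternatives, the function name, and every number are hypothetical.

```python
import random

# Hypothetical design alternatives. Cost ($M) and schedule (months) are
# triangular distributions given in (low, high, mode) order, matching the
# argument order of random.Random.triangular; p_fail is a hypothetical
# per-mission catastrophic failure probability.
DESIGNS = {
    "A": {"cost": (90.0, 140.0, 105.0), "months": (30.0, 48.0, 36.0), "p_fail": 0.010},
    "B": {"cost": (80.0, 120.0, 95.0), "months": (28.0, 40.0, 33.0), "p_fail": 0.020},
}


def p_all_requirements_met(design, budget, deadline, n=50_000, seed=1):
    """Estimate P(cost <= budget and schedule <= deadline and no failure).

    Cost, schedule, and failure are sampled independently, which is a
    simplifying assumption; real programs often have correlated risks.
    """
    rng = random.Random(seed)
    met = 0
    for _ in range(n):
        cost = rng.triangular(*design["cost"])
        months = rng.triangular(*design["months"])
        survived = rng.random() >= design["p_fail"]
        if cost <= budget and months <= deadline and survived:
            met += 1
    return met / n


if __name__ == "__main__":
    for name, design in DESIGNS.items():
        p = p_all_requirements_met(design, budget=110.0, deadline=36.0)
        print(f"design {name}: P(all requirements met) = {p:.3f}")
```

Under these made-up inputs, design B has the higher estimated chance of meeting every requirement even though its failure probability is twice that of design A, because A rarely meets the budget and deadline. Leaving a major risk factor out of such a model could change that ranking without any warning, which is why completeness of the trade space matters.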
The introduction of a comprehensive risk management approach in the preliminary design phase rather than afterwards would enable design trade-offs among performance, cost, schedule, and safety throughout program and system life cycles. If NASA considers safety early in the design cycle, it can halt or alter the development of unsafe systems early on; improving safety later in the design process can be extremely expensive.

The risk management framework supporting system design and development can help achieve a proper balance among system performance, schedule, cost, and safety, such that all of these parameters are at an acceptable level. System safety, in terms of the probability of catastrophic failure on the one hand and the probability of mission success on the other, will be provided, along with appropriate confidence bounds to delineate the uncertainties inherent in the estimates. A risk management framework may also be used to assess the risk (probability and associated uncertainty) of achieving various system performance levels, system delivery dates, and cost.

The SRRM objectives appear to be both worthwhile and achievable with a reasonable investment of time and resources, considering the current state of the art in the various disciplines of design, mission, and organizational risk management. Improved system performance and safety appear to be within reach. Currently, programmatic risk assessment and risk management research outside NASA is scattered. With the expertise available to NASA in the SRRM project, it would be appropriate for the SRRM project to assume a greater role in the risk management process and tool development efforts within NASA. In addition, by leveraging work conducted outside NASA, SRRM has the opportunity to integrate best practices that are suitable to NASA missions and needs.

Finding: The SRRM project, conducted under the ECS program, is addressing critical issues in programmatic risk management.
The project has the potential to advance the state of the art for NASA-wide and external applications.

The ECS panel has attempted to assess how well the SRRM project has translated its top-level objectives into a set of well-defined, achievable goals. The panel has also examined how well these goals have been addressed in specific project tasks, as well as how these tasks as a whole will achieve overall program objectives.

The ECS panel understands that the SRRM project seeks to further the state of the art in important areas of systems engineering for aeronautics and space systems. It is assumed that the project would then organize its activities in a systems engineering fashion, with top-level project objectives flowing down to lower-level task objectives.

Initially, the ECS panel could not see much evidence that this orderly process was taking place, but to be fair, it could also see serious programmatic reasons for the delay in top-level project formulation. At first, the project was called Design for Safety (DFS), but very quickly, before significant project developments could take place, its objectives and impact shrank significantly owing to a reduction in budget. The panel assumed that a considerable realignment of activities took place because of these changes mandated from above. The ECS panel saw evidence that the SRRM project included some tasks carried over from the DFS program together with some newly defined tasks. Thus, in assessing the work being conducted under the SRRM project, it would be unfair for the panel to ignore the background situation that has directly affected the project's development.

The SRRM management team proactively streamlined its efforts during the course of this review by twice revising its work breakdown structure. In this sense, the panel has been reviewing a moving target. With that said, the panel appreciates ECS management efforts to identify and start implementation of needed changes to the structure and organization of the SRRM project.

Between June and October 2002, there were indications that the ECS management tried to put a more disciplined approach into effect, especially in the SRRM project. As of April 2003, the project plan for SRRM (ECS, 2003) appeared to be sounder and better organized than at the beginning of the panel's review. In fact, the material presented by NASA in April 2003 demonstrated to the panel that the reorganized SRRM project, along with its goals and objectives, was greatly improved and more in line with NASA's mission. The panel also saw evidence that the SRRM project had taken into account and acted on guidance provided by the PRT committee in the interim letter report issued in January 2003 (NRC, 2003). During the April 2003 revisit, NASA reported that the SRRM project had participated in the internal NASA review of the Columbia accident.
Development and implementation of a mature risk management process might provide a framework for accident investigation, as well as a process for safety improvement.

At the start of the review, in June 2002, the ECS panel had concerns about the ad hoc flavor of the tasks (Gawdiak, 2002a; Prusha, 2002). However, after the April 2003 follow-up review of the redirected SRRM project, the panel found that the changes made to the project are good and appropriate. Details of the project's new makeup and comments on specific tasks are presented below.

Finding: The ECS panel concurs that the changes made during the course of the review to the SRRM project under the ECS program were correct and appropriate.

The ECS panel wishes to point out some of the key concepts that ECS management appears to have embraced. Specifically, the SRRM project has developed a coherent strategy and roadmap for the development of technology that will potentially apply to many NASA programs. In this process, ECS management identified how the individual tasks and research elements fit into a broader concept of risk management that incorporates metrics of risk in the early design phases and carries them all the way through final design and even implementation and operation. The panel wishes to emphasize that all activities under the SRRM project should be carried out in close consultation with the risk assessment technical community at large, as discussed in the next section.

Connections to the External Community

As part of the review process, the ECS panel was asked to examine how well the technical program was interacting with the community outside NASA. It found that the interaction of the SRRM project with the risk assessment and management community outside NASA has been limited and could be improved.
As presented, the tasks within the SRRM portfolio appeared to be based mostly on internal NASA work and knowledge, despite the wealth of information available outside NASA that would enhance the program. The panel observed little leverage of external work in mission and organizational risk management. This has been especially true for the interaction (or lack thereof) of level 4 and level 5 task PIs with other communities.

As one example, the Risk Management Colloquium is held every year in Palo Alto, California, in the summer or early fall. This colloquium is cosponsored by NASA Headquarters Code Q and NASA Ames Research Center. Several members of the SRRM team presented at, and took part in, this meeting in 2002. The panel encourages SRRM to take an even more active role, such as organizing a special session on SRRM objectives and needs. Since most of the NASA risk management and assessment community from all of the NASA centers would be present, along with some invited outside experts, such a session would expose a wide NASA audience to the SRRM project and goals.

In general, the ECS panel noted that the SRRM dialogue with the risk assessment community at large appears to have been limited and unorganized. It believes that there are ideas and expertise in the external technical community that could greatly benefit SRRM activities if better lines of communication and cooperation are established. There is some anecdotal evidence that SRRM connections to the external community are improving. The panel encourages SRRM to continue pursuing formal and informal interactions.

Recommendation: SRRM project management and task principal investigators should formally engage the internal NASA community as well as the external technical community as often as possible in order to gain exposure to external expertise and benefit from it.

Every technical community has its own concept of what risk means and its own methods for risk mitigation. To become the center of gravity for multiple communities, including NASA, and to maintain credibility in those design communities, ECS must demonstrate that it has knowledge of the state of the art and best practices in each of those communities. NASA can move toward demonstrating this understanding and credibility through the use of the Cross-Enterprise NASA Research Announcement (NRA) open solicitation process.

Recommendation: The SRRM project, conducted under the ECS program, should ensure that all task deliverables (engineering tools, etc.) can be used across the diversity of NASA projects and missions in order to maximize their effect. This broad applicability will require that researchers in the SRRM project become familiar with the challenges faced and methods used in many NASA projects and missions.

Research Portfolio

The ECS panel found that the level 4 and level 5 tasks appear to reasonably cover important areas of need, as defined by NASA and top-level ECS goals. At the start of this review, the ECS panel had concerns that the SRRM project was attempting to fund too many projects at a funding level too low to yield significant results. During the course of this review, the SRRM project correctly decided to reduce the number of tasks it was supporting (ECS, 2002; Penix and Jones, 2003). This appropriate action has yielded a leaner and much more effective SRRM organization. The ECS panel believes that this limited research, in view of limited funding, is appropriate.

As already stated, it is important for NASA to remember that SRRM results should be applicable to all NASA projects and missions. The project should also ensure that, even during the initial development of the risk workstation that NASA has planned and is conducting, serious efforts are made to capture and include all of the major risk factors in the trade-offs encountered in mission or system design. The ECS panel recognizes the difficulty of this undertaking. However, if this is not done carefully, incomplete design trades may produce misleading or incorrect information, resulting in a false sense of security in the final product.

The tasks being conducted under the SRRM project appear to be defined with an appropriate level of program risk in that they have a reasonable chance of successful completion. The ECS panel emphasizes that SRRM research inherently needs to proceed from a solid basis of previous developments, and in an orderly and systematic fashion.
Therefore, the development and application of the risk mitigation work has a medium chance of successful completion but, if completed, a high payoff for NASA missions. SRRM is approaching and managing its risk appropriately, mainly by assigning the right people to the task in question or by soliciting help as needed.

People and Facilities

A positive aspect of the work being conducted under the SRRM project is that the project is bringing together people from multiple disciplines and perspectives to jointly develop a common vocabulary and achieve common goals. SRRM has apparently assembled this group in a very deliberate manner. It identified the people within NASA it wants to work with, and it has also developed a program management structure. SRRM has made an attempt to bring in people from the outside through the use of various NRAs. These are leaders in their fields who can easily fill any gaps in expertise. The facilities were in good condition and appropriate for the work being conducted.
Methodology

SRRM personnel have approached mission projects within NASA in order to explore and possibly create technology transition opportunities for SRRM products. The ECS panel commends the SRRM project for its successful use of this methodology. Once NASA missions take on SRRM products or processes, the SRRM work can reach a higher TRL more easily by virtue of its real-world use by NASA missions.

The SRRM project has had several notable successes. One is the use of the Technology Infusion, Maturity Assessment (TIMA) process by the Primary Atomic Reference Clock in Space mission. The same TIMA process was also used to validate the LabVIEW software architecture for use in space. Finally, the Defect Detection and Prevention (DDP) program is being used by the Mars Science Laboratory mission, slated to fly in 2009. DDP, which is based on an underlying risk mitigation process, is being used as a program management tool. These successes show that SRRM has taken steps to ensure that the methods and techniques it develops can be migrated out to a user community. SRRM should continue to make certain that the applications propagate beyond these few programs so that ECS can become a center of gravity for NASA in probabilistic risk-based design.

Recommendation: The SRRM project, under ECS, should further concentrate on migrating its developed techniques and methodologies, such as Technology Infusion, Maturity Assessment (TIMA), to a user community within NASA in order to make ECS a center of gravity within NASA for risk assessment.

Quality of Work

At the start of the SRRM project, the ECS panel found that there was a mismatch between the types of tasks being conducted under SRRM and the SRRM project objectives (Prusha, 2002). Since then, SRRM has reduced the number of tasks being pursued and has also apparently made active use of the NRA process.
As a result of these program changes, the SRRM project improved dramatically over the course of the review. The panel has determined that the current work being conducted in the SRRM project is on the right path and could become work of excellent quality that meets accepted peer-review standards.

Observations on Specific SRRM Tasks

World-Class Tasks

There are no tasks in this category for SRRM.

Completed Work or Areas Not Supporting the NASA Mission

Probabilistic Analysis of ISS Power System

This task is being conducted as an SRRM level 2 task. Previous work under this task gave some very good and useful results on the power usage of the International Space Station (ISS). The follow-on efforts aim to bring the power system model to a higher resolution. The ECS panel believes that further work on this activity would yield diminishing returns and that there is no need to go beyond the current understanding of the ISS power system. In this sense, the work conducted under that task was very good, but it is complete. ECS management has canceled this task, and the ECS review panel agrees with this decision.

KNOWLEDGE ENGINEERING FOR SAFETY AND SUCCESS

The Knowledge Engineering for Safety and Success (KESS) project concentrates on designing system-level models and methods for analyzing system and human/organizational risk and failure. The project is divided into two level 3 elements: Human and Organizational Risk Management (HORM) and Knowledge Management (KM).

Human and Organizational Risk Management

This level 3 element aims to identify, model, predict, and mitigate technical and program risks as a function of the structure and processes of teams and organizations. This is a very ambitious undertaking. It involves the creation of a novel synthesis of observational, cognitive, technical, and organizational methodologies in the service of critically important NASA goals. If successful, this element could become a core competency within NASA.
The ECS panel found that the element summary presented by ECS management is well written and provides clear directions for the research to follow. Generally, the portfolio mix of beginning versus mature technology tasks appears to be appropriate. The ECS panel based this conclusion on the technology readiness level (TRL) scale that NASA employs.

AN ASSESSMENT OF NASA'S PIONEERING REVOLUTIONARY TECHNOLOGY PROGRAM

Key parts of this effort include the development and deployment of a useful multilevel model of risk perception and management, along with the optimization of organizational performance. Some parts of this important set of tasks are easier to conceive and execute than others. While improving techniques and processes often appears straightforward, addressing the fundamental issues is more daunting.

For instance, developing the organizational portion of the risk management model would clearly be a major outcome. However, given the current state of organizational modeling, the early development and validation of such a model (to be completed in FY2003) seems overly optimistic. Important questions remain to be answered: What are the research hypotheses? What would a model look like? How would such a model be used? In what sense would the model be computational? How would it be validated?

The ECS panel knows that the research team is well aware of these issues, yet seemingly infeasible milestones are presented as part of the overall plan. The short development time may have been imposed from above the element level to foster coherence across programs. While timeliness is also a desirable goal, the ECS panel believes that the development of novel models and validation methodologies should be given adequate time.

Recommendation: The Human and Organizational Risk Management level 3 element within the ECS program should establish a feasible time frame for the development of novel models and methodologies in order to allow researchers an appropriate amount of time to generate measurable results.

The ECS panel commends NASA for taking on a task as challenging as developing complex organizational models.
A critically important aspect of the modeling research is its multidisciplinary nature. The ECS panel believes that the model development effort can be improved by having social scientists work on the same team as engineers, systems/equipment operators, and computational model developers. This type of interaction should be in addition to collegial dialogue. Such a diverse group could integrate the individual expertise of its members into a joint enterprise and share responsibility for the outcome. Closer ties with the multiagent systems researchers in the CICT program would also help to create computational models of organizations that could be used to predict risk factors.

Recommendation: To improve the model development process, the Human and Organizational Risk Management level 3 element within the ECS program should increase the diversity of the research team by including engineers, systems/equipment operators, and computational model developers. The effort within ECS should also establish closer ties to the NASA CICT program to help in the creation of computational models.

One place where this multidisciplinary approach is being tried is the empirical analysis of problems that may occur during shift handovers, as identified in the level 5 task Understanding Shift Handovers in Mission Control, with a new choice capture tool implemented under the level 5 task Choice Capture Technology for Mission Control. The work is important because it could improve the success of shift handovers and also provide a means to evaluate the usability and utility of the new tool.
Other multidisciplinary research that could be incorporated into the Human and Organizational Risk Management element includes, for example, results from artificial intelligence (especially multiagent systems), distributed cognition, and natural language processing, particularly the design and deployment of controlled languages such as those being developed by Boeing.

Since organizational dynamics figure prominently in mishaps, it is very important that more effort be devoted to understanding and modeling organizations in tasks under the Human and Organizational Risk Management element. For instance, organizational metrics are not as well understood as team metrics, and they depend on an organizational model that relates team performance to organizational performance. The ECS panel believes that it is important for NASA to increase its knowledge of organizational processes, climates, and information flows in order to detect and repair latent pathogens in complex systems (Reason, 1990). These so-called pathogens were cited in recent accidents, including the Columbia tragedy. Multiorganizational processes and relationships, such as interorganizational relationships and dynamics, should also be studied. Finally, the role of management priorities in setting and assessing safety/risk policies needs to be taken into account.

At the start of the review, the ECS panel was concerned that the technical and management links between the three elements within ECS were not strong. As of April 2003, it appeared to the panel that the ECS program had established those linkages. The panel believes that these linkages should be strengthened, especially between Human and Organizational Risk Management and SRRM.

Recommendation: In terms of overlap between areas of expertise within ECS, it would be desirable for the Human and Organizational Risk Management element to be more explicitly integrated with the products coming from the System Reasoning and Risk Management project.

Knowledge Management

The Knowledge Management level 3 element offers a number of exciting, well-integrated research efforts that aim to capture and represent knowledge of the structure and function of spacecraft hardware throughout that hardware's life cycle. These efforts will result in an integration of contractor and agency knowledge bases that could be used by designers, operators, and maintenance personnel, as well as by automated diagnostic systems. Two examples of this work are the level 4 task Wire Integrity Research and the level 5 task Hybrid Reflectometer.

The ECS panel considers the level 4 task Virtual Iron Bird especially notable. The goal of the task is to build a detailed virtual model of the space shuttle so that users can visualize and interact with a virtual shuttle before attempting operations on the vehicle itself that carry a high risk of damaging it. This project is showing the way forward for other spacecraft designs.
Also notable under the Knowledge Management element are two database collection tasks, Lifecycle Systems Integration and Digital Modeling, that will enable NASA to capture technical measurements and specifications of the shuttle and other spacecraft designs. These efforts will allow NASA to take into account uncertainties in structure and function created by wear, maintenance, and undocumented modifications.

There is some general overlap between the Knowledge Management element within ECS and another PRT program, the Computing, Information, and Communications Technology (CICT) program. The ECS panel looked carefully at this issue and found that stronger collaboration between the two groups would create an even more goal-directed program.

Recommendation: The Knowledge Management element within ECS should work together with Computing, Information, and Communications Technology (CICT) program researchers to prioritize research on computational tools that underlie the Knowledge Management element's efforts.

The ECS panel noted that one level 4 task, Inter-Organizational Process Analysis, and its two subtasks, Socio-Technical Approach for Identifying Ground Processing Errors and Human Factors in Inter-Organizational Process Analysis, do not appear to fit the focus of the Knowledge Management level 3 element. These subtasks involve characteristics of organizations, which are supposed to be addressed under the level 3 element Human and Organizational Risk Management. Moreover, the ECS panel believes that the level 4 task, as currently executed, is not likely to produce the desired general knowledge.

Recommendation: The level 4 task Inter-Organizational Process Analysis should be integrated under the Human and Organizational Risk Management level 3 element with better designed subtasks in order to make the effort more effective.
Observations on Specific KESS Tasks

World-Class Tasks

Organizational Risk Perception and Management. This level 4 task, placed under the level 2 KESS project, presents a well-written overview of the domain and of task directions for risk perception. The scientific background and research progress of this task are also impressive.

Virtual Iron Birds. This level 4 task under the level 2 KESS project appears to be an innovative approach to allowing users to visualize and interact with information about the structure and function of complex systems. The results will support maintenance and modification operations.
Worthy Efforts That Could Be Strengthened

Organizational Metrics. At the start of the review cycle, this level 5 task under the level 2 KESS project represented a good beginning to an effort consisting of NASA-relevant work. A weakness identified at that time was that the project researchers were not sufficiently familiar with the literature in the high-reliability field; greater familiarity would give the technical community at large more confidence in the ultimate result of the task. The panel also felt that the PI should consider collaborating or consulting with the high-reliability community. As of April 2003, in conjunction with the Human and Organizational Risk Management element, the team had conducted a literature search on organizational culture, safety culture, and high-reliability organizations. It had also apparently cross-linked this effort to other researchers pursuing similar topics. The panel commends ECS for taking all of these steps.

Interorganizational Process Analysis. At the start of the review, the panel was concerned that, given the current working plans, this level 4 task under the level 2 KESS project was not likely to produce the general knowledge being sought. In early 2003, ECS management rescoped the task to directly support the Digital Shuttle task. The panel concurs with this action.

Human Factors in Interorganizational Process Analysis. This level 5 task under the level 2 KESS project would benefit from drawing more on basic science principles. As described in the PI's response to the questionnaire, the task "builds on the methods and tools of psychology, anthropology, linguistics, and communication sciences," and presumably employs "field and observational methods to characterize technical operations."
However, this level of generality does not help in understanding the strengths and weaknesses of an effort at TRL 4-6, where the expectation of near-term positive outcomes needs to rest on earlier basic science. For the work to succeed, it is critical to specify the conceptual frameworks and corroborated results from previous research on which the task will build and to which the task can be expected to contribute. The absence of such well-articulated starting points is of concern to the panel.

Wire Integrity Research. This level 4 task under the level 2 KESS project, together with the level 5 task under it, Hybrid Reflectometer, represents good work and should continue. The effort aims to develop techniques that would automatically diagnose the state of health of wiring on aging spacecraft and is directly applicable to NASA's missions.

At the beginning of the review, the ECS panel believed that this research activity might be generalizable beyond the shuttle and that it therefore needed to be more basic. The panel also found that the project should include more physics-based measurement research on techniques applicable to multiple systems. To strengthen the project, NASA should take a stronger leadership role among industry and academia in this technology field than it has been taking. The ECS program, apparently the current leader in wire integrity research within NASA, could expand its efforts, specifically by involving NASA's Office of Safety and Mission Assurance (Code Q).

As of April 2003, the panel was satisfied that the task PI and management had significantly improved their work by more actively engaging the external community and participating in research on physics-based measurement techniques. The ECS panel commends the members of the Wire Integrity Team for their good performance given the limited amount of funding.
Areas Not Supporting the NASA Mission or Completed Work

Sociotechnical Approach for Identifying Ground Processing Risk. This task is being conducted under the level 2 KESS project. The private contractor that NASA employs for this task has little demonstrated familiarity with sociotechnical systems theory, and it has no approach for integrating social and technical risk. It is not clear whether the off-the-shelf tool the contractor developed, or its experience with medication errors in hospitals, will generalize to the NASA-specific task. The task's current plan offers no opportunities for validating the resulting error estimates, especially the interdependencies between errors. In general, the activities rely too heavily on unverifiable judgments.

As of April 2003, this task was still ongoing. The review panel understands that it was intended to support the Digital Shuttle effort. However, the work conducted under this task could seriously jeopardize the overall success of the Digital Shuttle effort. Unless
drastic changes are made to the task, including changing the support contractors, the review panel recommends that it be discontinued, as the quality of the work is not up to NASA standards.

Recommendation: The ECS panel recommends that the level 5 task Socio-Technical Approach for Identifying Ground Processing Risk be discontinued or its support contractors replaced with more qualified personnel.

RESILIENT SYSTEMS AND OPERATIONS

The Resilient Systems and Operations (RSO) project was formed to address safety and stability issues in mission software, autonomous systems, and human-machine interfaces (Pallix, 2002). The project comprises two level 3 elements, Intelligent and Adaptive Operations and Control and Resilient Software Engineering. The Intelligent and Adaptive Operations and Control element encompasses four level 4 tasks. The Resilient Software Engineering element includes two level 4 tasks and seven level 5 tasks.

The RSO project appears to be broad in scope, and its two principal components appear to be central to the NASA mission. The project has engaged some of the nation's best talent in human-computer interaction and especially in software engineering. The ECS panel noted particularly exemplary efforts in Java PathFinder and in the establishment of the NASA Ames Research Center testbed for computer code verification. That said, the ECS panel did have concerns about the direct applicability to NASA of some of the tasks, as discussed below. As of April 2003, the RSO project had made significant improvements, including becoming more customer- and application-oriented.

Intelligent and Adaptive Operations and Control

The Intelligent and Adaptive Operations and Control element has a very well-balanced portfolio of level 4 tasks in terms of their probability of success. There is a good mix of TRLs across the tasks.
TRLs range from 1-2 for the task Autonomous Propulsion System Technologies to 4-7 for the task Adaptive Flight Control Research. Overall, the tasks are comparable to and competitive with academic work in the same category. Based on its discussions with the management of this element, the ECS panel anticipates that the element will take the positive step of concentrating more on autonomous operations for whole systems than on system components, which are its current focus.

Resilient Software Engineering

Software is pervasive throughout NASA missions. Producing reliable software in a timely, cost-effective manner has long been a challenge for both NASA and industry. As such, the Resilient Software Engineering element is central to NASA's mission.

The potential benefits of better software development techniques are very great: they could improve the prospects for successful missions and help ensure that missions meet their cost objectives. However, it was difficult for the ECS panel to determine whether the tasks being conducted under the Resilient Software Engineering element are likely to have a significant impact on software development within NASA, given the long history of research and work in these areas outside NASA and the lack of specificity in identifying NASA software challenges in the task descriptions.

Software engineering in general, and dependable systems in particular, are well-developed research disciplines that are ubiquitous in academia and industry. Resilient Software Engineering element research in these areas follows several currently popular approaches. However, at the beginning of the review, the rationale for NASA's selecting these specific approaches over others was unclear to the panel, as was the applicability of the tasks to NASA's most pressing software problems.
At the start of the review, academic participation in High Dependability Computing research, a level 4 task within Resilient Software Engineering, appeared to be heavily skewed toward the conventional software engineering community and did not give sufficient weight to researchers from other important communities. These other communities include real-time computing, dependable computing, and static program analysis, all three of which could make valuable contributions.

As of April 2003, the work under the Resilient Software Engineering element had greatly improved. While much work remains to be done, the collection of tasks was much more focused at the top level, and the element's goals appeared to be relevant to NASA or, at a minimum, to other work being done within the ECS program, such as the SRRM project. The element has also apparently expanded its involvement within
NASA, by making contact with researchers at Marshall Space Flight Center, and outside NASA, with the United Space Alliance. It has also added at least one task involving network dependability to the mix of tasks being pursued. Finally, the element adjusted its mix by dropping some tasks that were complete or not performing to expectations. The panel agrees that all of these actions should improve the overall Resilient Software Engineering element.

Finding: The Resilient Software Engineering element tasks, while showing improvement over the course of the review, appear to have limited participation by other software-related communities such as real-time computing, dependable computing, and static program analysis.

Research under the Intelligent Software Engineering Tools task overlaps considerably with active academic and commercial research, because many of its problems are shared by industry, including the high cost of software development, the difficulty of large-scale collaboration, and the need to ensure high dependability. Since funding for the NASA effort is scanty, the panel questions the benefit of duplicative efforts. To maximize the impact of this work on NASA, the panel recommends focusing the research on problems of high priority to NASA that are not adequately addressed by outside research and commercial tools.

Recommendation: The Intelligent Software Engineering Tools component of the Resilient Software Engineering element within the RSO project of ECS should continue to identify which of NASA's high-priority software problems are unique to NASA and which are not being addressed by industry research or the academic community. Based on this identification, management should then shift resources under the Resilient Software Engineering element as needed.
The ECS panel encourages NASA to continue to seek involvement from other software-related communities, such as dependable computing and static analysis.

The high dependability testbed included in the level 5 tasks High Dependability Computing, Testbed for Reusable Flight Software, and Dependable Networks for Flight Testbed has the potential to make significant contributions to NASA and beyond. Making testbeds available to academics is an excellent and relatively new approach to collaboration that allows academic researchers to work on problems immediately relevant to NASA. Such a testbed collaboration should produce flight-qualified or near-flight-qualified software that is made available to the general research community. Such software is often difficult to obtain from other sources.

Recommendation: The ECS panel suggests that ECS management make the flight software code developed at the Ames Research Center testbed available to the general research community.

The ECS panel notes that the Ames Research Center established such a software dependability laboratory in the past, namely the Digital Flight Control Systems Verification Lab (DFCSVL), a facility that existed at Ames about 20 years ago. Because that earlier work was discontinued, the panel hopes that the current effort can be sustained. While the ECS panel is enthusiastic about the establishment of such a laboratory, it urges project managers to plan carefully for the lab's sustainability.

Because it involves complete systems, software development under the Resilient Software Engineering element is very complex.
Examples of this complexity are evident in the Jet Propulsion Laboratory's (JPL's) Mission Data System (MDS) software, a project for managing telemetry, flight, and experiment data onboard a spacecraft, and in the code for the Center Terminal Radar Approach Control (TRACON) Automation System (CTAS),¹ a system that provides controllers with various aids for more effectively directing the flow of aircraft at busy airports. In fact, the software may be too complex for this (or any other) testbed to perform the intensive verification and validation needed for high-dependability software. It is unclear what benefit could be gained from unit or subsystem testing of MDS or CTAS, because the development organizations (JPL and Ames) already perform such testing. The reported goal of the new Ames laboratory is to perform integrated testing.

The level 4 tasks Autonomous Propulsion System Technologies and Adaptive Flight Control Research, which do not have a significant verification and validation component, could clearly benefit from the activities of the Ames laboratory. If successful, these tasks would face extensive certification and qualification requirements should the technology eventually be applied to civil aircraft. The ECS panel therefore encourages the project to concentrate the laboratory testbed research on these two level 4 tasks, since they will in all likelihood benefit the most.

¹See <www.ctas.arc.nasa.gov/CTAS>.

Recommendation: The ECS panel suggests that NASA management concentrate the Ames Research Center software verification and validation laboratory testbed on the level 4 tasks Autonomous Propulsion System Technologies and Adaptive Flight Control Research being conducted under the Resilient Systems and Operations level 2 project.

Observations on Specific RSO Tasks

World-Class Tasks

Advanced Software Verification and Testing Tools. This level 5 task is contained under the RSO level 2 project. It concentrates on statically analyzing software to find bugs and has a clear motivation and approach. The work also builds clearly on previous work at Ames Research Center.

Worthy Efforts That Could Be Strengthened

Empirically Validated Software Dependability Model. The ECS panel questions whether this level 5 task, under the level 3 Resilient Software Engineering element, will yield the desired results. The validity and usefulness of the software dependability model being used have not been explored, yet the model is slated to be transferred directly to development efforts. The TRL listed for the task rests on high expectations that the model will work in a new domain.

Perhaps the most important question here concerns the data being used. Without adequate data, this research, like much of the research that preceded it over the last three decades, has a far lower probability of success than the 75 percent estimated in the task description.
Much of the data that might be used in this program originated in the NASA Goddard Software Engineering Laboratory (SEL). NASA has a wealth of other data that could be made available to this effort. The highly qualified lead on this task will have a much higher likelihood of success if a concerted effort is made to provide the research team with data and the background information necessary to assemble such data.

Dependable Networks for Flight Testbed. This level 5 task under Resilient Software Engineering is being developed to assess the applicability of dependability technologies to potential next-generation ISS onboard information technology capabilities. At the start of the review, the panel found that the task could be improved by involving researchers from the distributed systems and reliable computing communities, which have been working on similar issues for many years. As of April 2003, ECS had taken significant steps to involve researchers in these other areas. The review panel is comfortable that the effort should yield useful results.

Intelligent Software Engineering Tools. The ECS panel's comments on this level 4 task under the level 3 Resilient Software Engineering element also apply to the level 5 tasks under it, except for the level 5 task Advanced Software Verification and Testing Tools, which the ECS panel deemed world-class (criteria listed in Chapter 2).

At the start of the review, the ECS panel found that the goals of the general tools projects could have been more solidly formulated and given a clearer motivation. The panel was unable to determine at that time whether there were any novel ideas or approaches in these tasks. In particular, it was unclear why NASA needed to develop its own tools in light of the many commercial and open-source projects cited in the project descriptions. Specifically, the motivation for the level 5 task Formal Specifications Database, to create a database of formal specifications, was not clear, nor did the panel understand how such a database would be searched or who the end-user community would be.

During the course of the review, ECS management canceled the level 5 task Collaborative Software Engineering Tools and descoped the other two tasks: Formal Specifications Database and Formal Enough Notations for Computer System Engineering. The ECS panel believes that ECS management took the correct action in this case. Furthermore, as discussed in the preceding section, ECS has taken additional positive steps by engaging the internal NASA community to identify those NASA-specific problems that need to be addressed.
REFERENCES

National Research Council (NRC). 2003. Interim Report of National Research Council Review of NASA's Pioneering Revolutionary Technology Program. Washington, D.C.: The National Academies Press. Available online at <http://www.nap.edu/catalog/10605.html>. Accessed April 29, 2003.

Reason, James. 1990. Human Error. Cambridge, England: Cambridge University Press, pp. 173-216.

Venneri, Sam. 2001. NASA Aerospace Technology Enterprise, Strategic Master Plan, April. Washington, D.C.: National Aeronautics and Space Administration.

BRIEFINGS

Dennis Andrucyk, NASA Headquarters, "Office of Aerospace Technology FY2004 President's Budget," material provided to the committee on May 5, 2003.

Yuri Gawdiak, NASA Headquarters, "Summary SRRM WBS Modifications: April 2003 Update," material provided to the ECS panel in April 2003.

Yuri Gawdiak, NASA Headquarters, "ECS Program Strategies: Part I," presentation to the ECS panel on June 11, 2002(a).

Yuri Gawdiak, NASA Headquarters, "ECS NASA Research Council Review," presentation to the committee and panels on June 11, 2002(b).

Patricia Jones, NASA Ames Research Center, "Knowledge Engineering for Safety and Success," presentation to the ECS panel on June 12, 2002.

Joan Pallix, NASA Ames Research Center, "Resilient Systems and Operations," presentation to the ECS panel on June 12, 2002.

John Penix, NASA Ames Research Center, and Patricia Jones, NASA Ames Research Center, "ECS Response to the NRC Review Committee's Request for Additional Information," presentation to the ECS panel on June 12, 2002.

Steve Prusha, Jet Propulsion Laboratory, "System Reasoning and Risk Management," presentation to the ECS panel on June 11, 2002.