In early 1991, the National Aeronautics and Space Administration's (NASA's) Office of Space Flight commissioned the Aeronautics and Space Engineering Board (ASEB) of the National Research Council (NRC) to investigate the adequacy of the current process by which NASA develops and verifies changes and updates to the Space Shuttle flight software. The Committee for Review of Oversight Mechanisms for Space Shuttle Flight Software Processes (hereafter, the Committee) was convened in January 1992 to accomplish the following tasks (see Appendix B):
Review the entire flight software development process from the initial requirements definition phase to final implementation, including object code build and final machine loading.
Review and critique NASA's independent verification and validation process and mechanisms, including NASA's established software development and testing standards.
Determine the acceptability and adequacy of the complete flight software development process, including the embedded validation and verification processes through comparison with (1) generally accepted industry practices, and (2) generally accepted Department of Defense and/or other government practices (comparing NASA's program with organizations and projects having similar volumes of software development, software maturity, complexity, criticality, lines of code, and national standards).
Consider whether independent verification and validation should continue.
The first issue the Committee was asked to consider was the Shuttle program's decision to eliminate the independent verification and validation (IV&V) function currently performed on the Shuttle flight software at an annual cost of $3.2 million (out of approximately $100 million per year for the complete software development and assurance process). The IV&V effort was scheduled to be eliminated by October 1992. The Office of Space Flight requested that the Committee first address whether there was a need to continue this function and later address other aspects of the flight software development process. An interim report on the IV&V issue only, included as Appendix C, was issued by the ASEB in July 1992.
The IV&V was instituted, in part, as a result of recommendations by the Rogers Commission on the Space Shuttle Challenger accident;[1] an NRC committee to evaluate post-Challenger Shuttle risk assessment and management;[2] the House of Representatives Committee on Science, Space, and Technology; and the General Accounting Office (GAO). Although the recommendations in the previous studies differ in their details, they were unanimous in their belief that additional oversight of the software development process and independent evaluation of the software are necessary to assure safe and effective operation of the Shuttle. Despite this unanimity, NASA's Shuttle Program Office has been reluctant to continue the use of IV&V, arguing that the risk reduction it provides does not justify the additional cost. The Shuttle Program Office felt that the previous investigations had not had the benefit of recent efforts to document the current verification and validation (V&V) process and had not adequately addressed the cost of additional oversight in relation to the benefits gained.
After hearing presentations from the Shuttle Program Office and their various contractors, and after reviewing the extensive documentation they provided, the Committee concluded that:
... the current IV&V process is necessary to maintain NASA's stringent safety and quality requirements for man-rated vehicles. Therefore, the Committee does not support NASA's plan to eliminate funding for the IV&V effort in fiscal year 1993. The Committee believes that the Space Shuttle software development process is not adequate without IV&V and that elimination of IV&V as currently practiced will adversely affect the overall quality and safety of the software, both now and in the future.
As a result of this and the previous recommendations, NASA has decided to continue IV&V in its current form as a permanent part of the program. This final report expands somewhat on the IV&V issue but also includes an evaluation of the current process and other safety and organizational issues associated with the maintenance and upgrade of the Shuttle flight software that were not covered in the interim report. The report is organized in terms of findings and recommendations with respect to the verification and validation process, safety, organizational issues, and considerations for future NASA projects. Part 1 of the report, Overview and Background, is a discussion of the information the Committee feels is necessary for a reader to understand the processes used to maintain and upgrade the Shuttle software. Part 2, Findings and Recommendations, is a detailed discussion of the findings and recommendations that resulted from the Committee's in-depth assessment of the entire Shuttle software development process. These findings and recommendations are summarized below, but a detailed discussion can be found in the body of the report.
The Committee's investigation, as outlined in its Statement of Task (see Appendix B), considered all aspects of the overall software development process as it was described to the Committee by NASA and NASA's contractors. This investigation included: the process for requirements definition and specification; the processes used by the development and IV&V contractors; the configuration management process; test case development and evaluation; system software testing and integration; preparation of mission-specific software and data; and the loading and verification of the final flight software package. Although it did not have the time or resources to completely exhaust all potential avenues of investigation, the Committee believes that the overall process was addressed in sufficient detail to justify the findings and recommendations that are discussed in this report. Additional investigation (by other committees or internal NASA bodies) and a continuing evaluation by those involved in making the process work may be necessary as NASA and its contractors proceed with implementation of the Committee's recommendations, particularly the recommendations regarding better documentation of the overall process. However, at this time the Committee feels that the evaluation provided in the report is sufficient to help NASA improve the overall process and ensure that safe and effective software is developed for the Space Shuttle.
[1] Report of the Presidential Commission on the Space Shuttle Challenger Accident, by William P. Rogers, Chairman (Washington, D.C.: Government Printing Office, 1986).
[2] Post-Challenger Evaluation of Space Shuttle Risk Assessment and Management, by the National Research Council Committee on Shuttle Criticality Review and Hazard Analysis Audit (Washington, D.C.: National Academy Press, 1988).
Finally, the Committee recognizes that NASA must be very conscious of cost. Many of the Committee's recommendations will not require additional cost to NASA, because they involve only changing reporting relationships and providing additional authority (but not necessarily additional staff) to existing organizations. Some will actually save money over the long run by helping management better understand the overall process and thereby avoiding unnecessary difficulties. Others will require additional staff and an associated increase in costs. This is unfortunate but, in the Committee's opinion, necessary. The Committee was not asked, nor was it constituted, to develop specific cost estimates for these additional activities. Instead, the Committee has attempted to provide a coherent description of the benefits these recommendations will provide to the Shuttle program and to NASA as a whole. The Committee does not believe that the cost of implementing these recommendations will be excessive.
THE SHUTTLE VERIFICATION AND VALIDATION PROCESS
Although in general the Committee was impressed with the Shuttle flight software V&V process, there is room for improvement with respect to requirements, subsystem interactions, hardware/software platforms, off-nominal cases, and the use of potentially error-prone coding practices.
NASA Guidelines and Standards
Finding #1: Each software development contractor provides its own development and coding guidelines for the Shuttle software. These guidelines are not consistent among the developers.
The Committee found generally high-quality practices by the software contractors and NASA V&V participants. It was surprised, however, to find that NASA provides no software development or V&V guidelines to its contractors. Different V&V procedures are used by the various contractors, some of whom consider these procedures to be proprietary. This can lead to unfortunate inconsistencies among the contractors and the software components and also to a less than optimal overall process from the NASA viewpoint.
Recommendation #1: NASA should develop guidelines for software development and V&V procedures and should require contractors to share experiences gained while developing NASA-contracted software.
Finding #2: V&V inspections by the development contractors pay little attention to off-nominal cases.
During design and code inspections, off-nominal situations (i.e., crew/ground errors, hardware failures, or software errors) are explicitly considered only for loop termination and multipass activity (e.g., abort control sequence).[3] A study sponsored by NASA found that:
Problems associated with rare conditions emerge as the leading cause of software discrepancies during the late testing stage in this sample. A better methodology for treating rare conditions during design and the earlier test stages could avoid over one-half of all failures and over two-thirds of the failures in the most severe classifications.[4]
Recommendation #2: The V&V performed by the development contractors should include off-nominal scenarios beyond loop termination and abort control sequence actions and should include a detailed coverage analysis.
System-Level Software V&V
Finding #3: V&V inspections by software development contractors focus on verifying the consistency of two descriptions at different levels of detail (e.g., consistency between a module's requirements and the design of its implementation). The correctness of the requirements with respect to the hardware and software platforms on which implementations run is generally not considered. As a result, despite rigorous inspections, implementations are vulnerable to errors arising from incorrect requirements or changes in hardware and software platforms.
[3] Loop termination is a term used for the logic and criteria by which the software determines when a programming loop has completed an appropriate number of cycles. The term multipass activity refers to the logic by which a count is kept of the number of times a certain part of the code is executed. Both loop termination and multipass activities are subject to errors resulting from off-nominal situations, because the criteria and logic they use are often based on assumptions about how the mission is to be performed and the normal range of values the algorithm is likely to experience. Off-nominal testing is designed to identify situations where those assumptions, and others, are not adequate.
[4] Investigation of Shuttle Software Errors, by Herbert Hecht (Beverly Hills, California: SoHar Incorporated), p. 10.
Although NASA and its contractors collaborate on all aspects of the software development process, NASA is ultimately responsible for developing flight software requirements. The development contractors are responsible for implementing those requirements. Incomplete consideration of some system-level issues is an important shortcoming in this division of responsibility. NASA's description of its software development process states that the responsibility for requirements belongs to the flight software community, where the community seems to be composed of everyone having anything to do with the software. This is not adequate from either a managerial or technical standpoint, and better system-level V&V processes for software requirements need to be put in place. Some evidence to support this conclusion is the aftermath of the Endeavour/Intelsat incident,[5] in which the members of the community all pointed fingers at each other when it came to determining responsibility for the problem, which stemmed from erroneous requirements.
Deficiencies also exist with respect to other systems engineering issues such as hardware/software platform V&V and interfaces. Because V&V inspections focus on the development of software by a single contractor, inspections do not probe beyond the descriptions of interfaces of implementations supplied by other contractors. As a result, despite rigorous inspections, implementations are vulnerable to errors arising from assumptions about incorrectly documented interfaces.
Recommendation #3: NASA should augment the current V&V process to expand the consideration of system-level issues and should provide adequate funding to allow for successful completion of these tasks.
The Independence of IV&V
Finding #4: Independence of the IV&V contractor is limited. For example, the functions the IV&V contractor is allowed to investigate are controlled by the Shuttle Avionics Software Control Board, thereby reducing the IV&V contractor's ability to fully investigate potential problems.
The independence aspect of IV&V can be evaluated along three dimensions: managerial, technical, and financial. Technical independence implies an independent set of test and analysis tools and the use of IV&V personnel not involved in the development. Managerial independence means that the IV&V responsibility is vested outside the contractor and program organizations and that the IV&V team independently decides which areas of the system to examine, the techniques to be used, the schedule of activities to be performed (within the overall system schedule), and the technical issues to be acted upon. Financial independence means that the IV&V budget is controlled by a group outside the development contractors and program organizations.
[5] A loss of expensive hardware nearly occurred during the maiden flight of Endeavour (STS-49) (May 12, 1992) as the crew attempted to rendezvous with and repair the Intelsat satellite. The software routine used to calculate rendezvous firings, called the Lambert Targeting Routine, failed to converge to a solution due to a mismatch between the precision of the state-vector variables, which describe the position and velocity of the Shuttle, and the limits used to bound the calculation. The state-vector variables were double precision while the limit variables were single precision. The rescue mission was nearly aborted, but a workaround was found that involved relaying an appropriate state-vector value from the ground.
Using these definitions, the Shuttle IV&V contractors have good technical independence but little managerial or financial independence. In the opinion of the Committee, if the IV&V contractor were not given its budget and direction solely from the Shuttle Program Office, its effectiveness would be enhanced because its freedom to choose what to analyze and to what depth would be increased.
The Committee realizes that the current implementation of IV&V is a compromise between independence and close teamwork, and in the Committee's Interim Report (see Appendix C) it is stated that: “despite the limited resources, the Committee has found that the current implementation of IV&V is valuable and effective.”
The Committee still believes this is true. However, it feels that the Shuttle implementation of IV&V can be more valuable and effective (1) by expansion of its role to include analysis of some non-critical functions (the error in the Lambert Targeting Routine that led to the Endeavour/Intelsat incident demonstrates that sometimes non-critical functions can cause critical situations), and (2) by giving it managerial and financial independence from the Shuttle Program Office.
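The kind of precision mismatch reported for the Lambert Targeting Routine can be illustrated with a small sketch. This is not the Shuttle algorithm: the Newton iteration, the function names, and the numeric values below are all hypothetical, chosen only to show how limits held in single precision can make a double-precision calculation appear never to converge.

```python
import struct


def to_single(x: float) -> float:
    """Round a double-precision Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve(square, expected, half_width, single_precision_limits, max_iter=50):
    """Newton iteration for sqrt(square), accepted only once the iterate lies
    inside [expected - half_width, expected + half_width].

    Returns the iteration count on success, or None if the limits are never met.
    (Hypothetical illustration; not the actual flight software logic.)
    """
    lower, upper = expected - half_width, expected + half_width
    if single_precision_limits:
        # Storing the limits in single precision rounds both bounds to the
        # same value when half_width is below single-precision resolution.
        lower, upper = to_single(lower), to_single(upper)
    x = 1.0  # double-precision iterate
    for i in range(1, max_iter + 1):
        x = 0.5 * (x + square / x)
        if lower <= x <= upper:
            return i
    return None


# Limits kept in double precision: the iteration is accepted.
print(solve(0.01, 0.1, 1e-12, single_precision_limits=False))
# Limits rounded to single precision: the window collapses and is never hit.
print(solve(0.01, 0.1, 1e-12, single_precision_limits=True))
```

In this sketch the iteration itself behaves identically in both runs; only the representation of the limits changes, which is why an inspection confined to the routine's own logic could miss the problem.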
Recommendation #4: In order to provide a greater level of independence, responsibility for IV&V should be vested in entities separate from the Shuttle program structure and the centers involved in the Shuttle software development and operation. However, these organizations should continue to conduct activities supporting IV&V.
THE SILENT SAFETY PROGRAM REVISITED
NASA was the first group outside of the military to adopt system-safety engineering and, spurred on by the Apollo fire in 1967, established one of the best system-safety programs of the time. Perhaps because of the success of the NASA program, the Challenger accident was a surprise to safety professionals. Many have attributed it to a combination of complacency (which is inherent in any successful program), politics, and budget cuts.
The Rogers Commission report on the Challenger accident identified many safety engineering and management problems at NASA and spoke of a Silent Safety Program that had lost at least some of its effectiveness after the Apollo flights. Important factors cited in the Rogers Commission report were complacency and reduction of activity after the Shuttle program became operational.
After this report, NASA fixed many of these problems. The previously mentioned NRC report evaluated the progress made in these areas and made additional recommendations. The Committee did not further evaluate the current system-safety program, but did investigate the software aspects of safety. It found that software is underemphasized in the NASA system-safety program and that many of the same mistakes that contributed to the Challenger accident are now being repeated with respect to software, especially with respect to the belief that safety procedures can be relaxed for operational programs.
Software Safety Standards
Finding #5: Current NASA safety standards and guidelines do not include software to any significant degree. A software safety guideline has been in draft form for four years. Decisions are being made and safety-critical software is being built without minimal levels of software safety analysis or management control being applied.
Efforts at getting a draft software safety guideline approved have been stalled for many years. At the same time, changes are being made to Shuttle software and new programs are being started, such as the Space Station Freedom, without adequate standards for software safety in place. The sticking point seems to be the NASA requirement for consensus on all standards and guidelines. It seems odd to the Committee that those in NASA responsible for safety do not have the authority to impose the standards that are needed to achieve it. Four years is too long to wait for consensus.
Even if the guideline is approved, it will be possible for the various centers and programs to tailor their software safety programs without approval from those responsible for safety in the headquarters Safety Office. From what the Committee can determine, the headquarters Safety and Mission Quality (S&MQ)[6] Office is limited to providing comments and conducting audits whose results are advisory. Those with responsibility must be given the authority to carry out their jobs.
Recommendation #5: NASA should establish and adopt standards for software safety and apply them as much as possible to Shuttle software upgrades. The standards should be applied in full to new projects such as the space station. NASA should not be building any software without such standards in place.
Recommendation #6: NASA should provide headquarters S&MQ with the authority to approve or reject any tailoring of the software safety standards for individual programs and minimize the differences between the safety programs being followed at different centers within a single program.
[6] The S&MQ office at NASA headquarters is also commonly referred to as Code Q. In this report the Committee has avoided the term Code Q except where it appears in a document name or is otherwise more commonly used than the S&MQ acronym.
Software Safety Procedures
Finding #6: The Committee found insufficient coordination between the Shuttle system-safety program and the software activity. There is no tracing of system hazards to software requirements and no criticality assessment of software requirements or components (except when they are changed). There is no baseline software hazard analysis that can be used to evaluate the criticality of software modifications and no documentation of the software safety design rationale. There appear to be gaps in the reporting of identified software hazards to the system-level hazard auditing function; for example, a criticality 1 hazard can be accepted by the program without being evaluated by the Shuttle Avionics Software Configuration Board or the center safety office.
The Committee found evidence that safety issues with respect to software were considered carefully during Shuttle development, and a software hazard analysis was performed. Somehow, this concern and recognition waned after the Shuttle became operational, and attention was turned to software maintenance and upgrades. Although the individual software developers and the IV&V contractor have implemented some safety programs on their own, there appears to be little direction provided by NASA and little integration with the system-safety efforts.
For proper decision making, a program must have traceability of safety requirements in two directions: down from the system to the subsystems and from the subsystems back up to the system level. Software is unusual in that it can be considered a subsystem, but it also controls other subsystems and operates as the interface between subsystems. Therefore, software analysis must be closely integrated into the system-safety activity.
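As a rough illustration of two-directional traceability, the sketch below keeps one index from system hazards down to software requirements and a second index from requirements back up to hazards. The identifiers (HAZ-1, SWR-10, and so on) and function names are hypothetical and do not come from any NASA document; the point is only that gaps in either direction become easy to list.

```python
from collections import defaultdict


def build_trace(links):
    """Build a downward index (hazard -> requirements) and an upward index
    (requirement -> hazards) from (hazard_id, requirement_id) pairs."""
    down = defaultdict(set)
    up = defaultdict(set)
    for hazard_id, requirement_id in links:
        down[hazard_id].add(requirement_id)
        up[requirement_id].add(hazard_id)
    return dict(down), dict(up)


def untraced(ids, index):
    """Return, in sorted order, the identifiers with no entry in the index."""
    return sorted(i for i in ids if i not in index)


# Hypothetical example data.
links = [("HAZ-1", "SWR-10"), ("HAZ-1", "SWR-11"), ("HAZ-2", "SWR-11")]
hazards = ["HAZ-1", "HAZ-2", "HAZ-3"]
requirements = ["SWR-10", "SWR-11", "SWR-12"]

down, up = build_trace(links)
print(untraced(hazards, down))     # hazards with no software requirement
print(untraced(requirements, up))  # requirements tied to no hazard
```

With both indexes available, a proposed change to a requirement can be checked against every hazard it touches, and a newly identified hazard can be checked for missing software coverage.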
Recommendation #7: For the Shuttle software safety process, NASA should provide a software safety program plan (as described in the draft software safety guideline) that is reviewed and approved by headquarters S&MQ, the Safety, Reliability, and Quality Assurance (SR&QA) managers at the centers, and the Shuttle program manager. This plan should describe the organizational responsibilities, functions, and interfaces associated with the conduct of the Shuttle software safety program.
Recommendation #8: NASA should perform a hazard analysis for the Shuttle software, as described in the draft software safety guideline. NASA should also implement the other appropriate aspects of the draft software safety guideline (testing, change hazard analysis, and system-safety requirements traceability) and provide a software safety design-rationale document. NASA should establish (if necessary) and use reporting channels from software to system-safety activities.
Finding #7: The SR&QA offices at the centers have limited personnel to support software-related activities. The assignment of one civil servant to software safety is not adequate to do more than just attend meetings.
Finding #8: There is little oversight or evaluation of software development activities by the center SR&QA offices.
The 1988 NRC committee report on the Shuttle found that there was limited staff and oversight of software activities. The present Committee found that this situation has not changed.
Recommendation #9: NASA should build up expertise on software and software safety within the center SR&QA groups and headquarters and provide adequate personnel to perform flight software S&MQ activities.
System-Safety Organizational Roles and Responsibilities
Finding #9: The reporting relationship between the centers and headquarters S&MQ is ill-defined. There is little interaction between the Johnson Space Center (JSC) SR&QA office and the software development activities within IBM and Rockwell. Headquarters has no enforcement power (i.e., no authority for performance). Multiple centers on the same program may be enforcing different standards and procedures.
Several management issues arose in the investigation. First, there is a need for better reporting relationships. Dotted-line relationships[7] between the headquarters S&MQ Office and the centers are ill-defined in practice. Second, there is little communication between the center safety office personnel and the safety efforts within the development contractors. The Committee notes that other government agencies have solved this type of communication and coordination problem through the use of working groups. Other agencies also have program-independent safety certification boards that provide independent safety reviews. Finally, more emphasis in the Aerospace Safety Advisory Panel on software issues, perhaps in the form of a special subcommittee to consider software safety issues, would demonstrate and give visibility to NASA's understanding of the growing importance of software to the safe accomplishment of NASA's mission, its dependence upon that software, and its commitment to resolving the issues related to this relatively new technology.
[7] The term dotted-line is often used to describe two organizations between which there is no formal line of authority. The term originates from organization charts that have a solid line to indicate formal reporting relationships and dotted lines to indicate less formal relationships. The relationship between the headquarters S&MQ and the center SR&QA groups is informal in the sense that headquarters cannot compel the center offices to perform specific tasks or provide information. On the other hand, the center offices receive some of their funding from the headquarters office, so there is some incentive, albeit informal, to cooperate.
Recommendation #10: NASA should establish better reporting and management relationships between developers, centers, programs, and the headquarters Safety Office.
Recommendation #11: NASA should consider the establishment of a NASA safety certification panel or board separate from the program offices and also the establishment of a subcommittee of the Aerospace Safety Advisory Panel to deal with software issues.
ORGANIZATIONAL ROLES AND RESPONSIBILITIES
Documenting the Process
Finding #10: The Shuttle flight software maintenance and upgrade process is not adequately documented. There are important aspects of the process that are not described in the available documentation. This lack of visibility represents an increased risk of software-related problems.
The Shuttle Program Office has recently attempted to document the software V&V process to provide some visibility into the software maintenance and upgrade process as a whole. This was a good first step and has been valuable in helping the Committee understand the roles and relationships of the various organizations that participate. However, the single greatest difficulty faced by the Committee in gaining an understanding of the software and the process by which it is maintained was in obtaining adequate descriptions of the detailed actions of the people who perform the process. In particular, the Committee was interested in the way decisions are made, the coupling of authority and responsibility, and the interactions among and between the numerous NASA organizations and their contractors. Each of these is vital to the performance of the process and has very definitive effects on the quality of the software that is produced.
The Committee found that, in fact, there is a great deal of information about the day-to-day execution of the Shuttle flight software process that is not contained in any existing document but is instead passed on from person to person in the form of accumulated knowledge and on-the-job training. This can lead to the following problems:
Without complete and accurate delineation of each organization's role and responsibility, upper management cannot have the proper visibility into the process to assure that all necessary functions are being performed.
If the roles and responsibilities are not completely spelled out in a form to which all organizations have access, those organizations may be unsure of their proper roles and the roles of others within the process.
The program runs the risk of losing important information when the people who understand the process retire or move on to other programs.
By undertaking an exercise to better understand and document the current process, the Shuttle program may, independently of the other findings and recommendations of this committee, discover areas where the process could be streamlined to reduce cost without adversely affecting safety and performance.
Recommendation #12: NASA should continue to enhance the current effort to fully document all aspects of the Shuttle flight software process. The effort should clarify the responsibilities of each contractor and each part of the NASA organization in a concise and readable format. The level of detail of the descriptions should be commensurate with: (1) the needs of NASA's upper management for visibility into the process, (2) the needs of the Shuttle Program Office to understand and pass on information regarding its procedures for administering and controlling the process, and (3) the needs of each participant in the process to understand the boundaries of its responsibilities and authority.
The Role of Headquarters S&MQ and the Center SR&QA Offices
Finding #11: The headquarters S&MQ Office would have no authority to enforce established guidelines and policies if such existed.
Finding #12: The SR&QA offices at the centers do not have the resources, manpower, or authority to compel the development contractors or other NASA organizations to provide information that is sufficient to assure that the proper process is being followed.
The S&MQ Office at NASA headquarters and the SR&QA offices at the centers are not as effective as they should, or could, be. Because of inadequate resources and lack of authority, they have been unable to produce NASA-wide standards for software IV&V, reliability, quality assurance, or safety in a timely fashion. This has resulted in inconsistent and, in the Committee's opinion, inadequate implementation of these valuable oversight functions. In addition, there is insufficient technical expertise in the S&MQ offices at headquarters and SR&QA offices at the centers to ensure that software oversight functions are adequately implemented and carried out.
These problems have been mentioned above with respect to software system safety, but they are also true in the broader context of software reliability, quality assurance, and the overall organization and management of the program.
The current role and authority assigned to the S&MQ offices at NASA headquarters is counter to the recommendation of the Rogers Commission that originally resulted in the S&MQ Office being created. The Committee believes that the spirit of this recommendation has not been followed. The S&MQ and SR&QA offices currently lack the authority and the resources needed to approve the manner of oversight implemented by the Shuttle program and to fully monitor effectiveness.
Recommendation #13: The headquarters S&MQ Office should be given the authority to approve or disapprove the program's implementation of software oversight functions once appropriate guidelines and policies are established.
Recommendation #14: NASA should increase the support for software-related SR&QA activities at the centers and give them the authority to obtain any information they consider necessary to adequately assure compliance with the established process.
Finding #13: There is a lack of visibility for potential software problems because there are few requirements or opportunities to report software reliability, quality assurance, or safety problems to the program-level safety organizations or to headquarters.
The Committee was told, in response to a question submitted to NASA, that the headquarters S&MQ Office is not routinely included in the reporting of software-related problems. In other words, there is a lack of visibility for potential software problems because of a lack of clearly defined and implemented reporting channels for software reliability, quality assurance, or safety problems to the program-level safety, reliability, and quality-assurance organizations or to headquarters. For example, the Committee was told that those responsible for tracking software errors at NASA headquarters do not have routine access to the same data bases that the center and contractor personnel use. The Committee questions the need for multiple data bases tracking software error information because it could lead users to lose, confuse, or simply ignore valuable information.
Recommendation #15: The headquarters S&MQ Office and the SR&QA offices at the centers should be given routine access to all software-related problem reports, and all members of the flight software community should be made aware of their responsibility to keep these oversight organizations involved in their activities.
Finding #14: Many important functions within the flight software process appear to be assigned to the flight software community rather than a specific NASA or contractor organization.
The Committee found that the responsibility for some very important functions was assigned to what NASA terms the flight software community rather than to a specific organization or, better yet, to a specific individual. The Committee realizes that assigning everyone the responsibility for part of the process is an attempt on the part of NASA to show how all members of the community are encouraged to participate, in the hope that having more people involved in the process makes it more likely that potential problems will be found early.
However, the Committee believes that failure to assign responsibility for the performance of a function to a specific organization opens the process up to interpretation and increases the potential that important functions will be forgotten or ignored because responsibility for them was left to the community. In short, the Committee's experience is that community responsibility often results in no one taking responsibility, even in situations where safety of the crew or performance of the mission is at stake. The Rogers Commission pointed to this type of community responsibility as one of the factors that contributed to the Challenger accident.
Recommendation #16: NASA should assign specific responsibilities for each aspect of the flight software process and document them accordingly. Responsibility should be assigned to individuals or offices and not to the community as a whole.
Policies, Guidelines, and Enforcement
Finding #15: There is a lack of accepted policies and guidelines for appropriate implementation of V&V, IV&V, reliability, quality assurance, and safety measures.
Several documents have been supplied to the Committee that are meant to provide guidance in software oversight functions for NASA programs. But, in most cases, they have not been officially adopted by NASA as standards or even officially published as guidelines for program managers. Without clear guidelines and policies, it is very difficult for program management to determine appropriate roles, authority, and responsibilities for these functions. This lack of NASA-wide policies and guidelines for software has permitted a wide range of implementations of the various oversight functions, which, in the Committee's opinion, has resulted in an inconsistent realization of the benefits offered by these functions.
Recommendation #17: NASA should establish a process that provides the center and program managers with the opportunity to comment on proposed policies and guidelines, but also gives the appropriate headquarters personnel the authority to approve the policies and guidelines in cases where complete consensus cannot be reached in a reasonable amount of time. This process should have the following features:
The authors of proposed policies and guidelines must respond in writing to explain why concerns or criticisms that have been expressed are not incorporated in the final version.
The process should have well-defined deadlines for submitting comments, and the authors should be given the option of proceeding with the approval process once those deadlines have passed.
The process should include a provision for arbitrating disputes at a level of management above the program offices and the headquarters S&MQ Office, i.e., to the Deputy Administrator or to the Administrator, if necessary.
Finding #16: A primary reason for the lack of established policies and guidelines is the absence of sufficient resources, manpower, and expertise devoted to developing them.
To address this situation, the Committee believes that:
Recommendation #18: NASA should provide the S&MQ Office at headquarters and the SR&QA offices at the centers with the additional resources needed to build their expertise in software IV&V, safety, reliability, and quality assurance. The budget and personnel devoted to software safety, reliability, and quality-assurance activities should be of sufficient size to allow adequate policies and guidelines to be prepared, and compliance with those guidelines and policies to be fully monitored.
FINAL THOUGHTS AND FUTURE CONSIDERATIONS
The Committee believes it is imperative that the “lessons learned” up to this point in the current Shuttle program be used to guide future operation of the Shuttle and to guide the preparation of development, assurance, and maintenance procedures for future programs. Because the Shuttle flight software is, for a while at least, unique within NASA in its size and years of use, the Committee believes that NASA would do itself, and the nation, a great service if it were to capture what it has learned from this program and make it available to the Space Station Freedom and other planned or potential programs. A great benefit would also be obtained if these new programs made a concerted effort from their very beginning to fully document all decisions, both formal and informal, that may have an impact on the software or the processes used to develop it.
Recommendation #19: NASA should undertake an effort to capture the lessons learned in the development, maintenance, and assurance of the Shuttle flight software for use by other programs. This not only should take the form of official documentation of the current process, but also should include less formal reports, observations, and opinions drawn from current personnel and as many former Shuttle program and contractor management and technical personnel as appropriate. The same type of documentation should be routinely prepared for other programs as well.
In this spirit, the Committee believes it would be remiss not to bring to NASA's attention a few of the most obvious generic conclusions drawn from the Committee's investigations. These recommendations involve observations that were true for the Space Shuttle program, in varying degrees. The Committee believes that similar problems may occur in the Space Station Freedom program, the Earth Observing System, and elsewhere within NASA.
Contract Reporting Requirements
There is a perception, which may or may not be fact, that the development contractors can withhold vital information from the oversight organizations because of proprietary concerns. Although the Committee was not constituted to address this type of dispute and did not have the time to fully investigate all the relationships between the contractors and NASA, there is a view by some NASA personnel and contractors that the development contractors can choose to avoid full cooperation with the oversight activities if they determine that it is not in their best interest to do so. The Committee saw instances where this seemed to be the case.
Recommendation #20: In future procurements, NASA should more precisely identify the information that each development and oversight contractor is responsible for making available to each other and to the community as a whole.
The Committee has found a reluctance by the Shuttle program to fully implement the recommendations of the Rogers Commission, the earlier NRC committee, the GAO, and NASA's own Aerospace Safety Advisory Panel. This is particularly true in regard to fully independent V&V, but the Committee has noted other instances throughout this report with respect to issues such as better system engineering practices and the reliance on community responsibility. In the Committee's opinion, NASA has not been as aggressive as it should have been at implementing the recommendations given to it by the various outside panels and committees in the area of software oversight. This is due, in large part, the Committee believes, to the lack of a concerted effort from within NASA to educate the program managers charged with controlling software projects on the benefits of these important oversight functions.
This same problem is likely to occur in future programs. For example, the GAO has expressed some of the same concerns about the Space Station's software development process as expressed by all of the groups, including this committee, that have examined the Shuttle program. NASA should understand that the recommendations it has been offered in the past are worthy of greater consideration than they appear to have been given.
Recommendation #21: Based on the lessons learned in the Shuttle program, NASA should put in place the mechanisms necessary to ensure that all existing and future programs are given the information needed to make intelligent implementations of software oversight functions such as IV&V.
NASA has planned and is engaged in managing and overseeing some of the most complex software projects ever attempted. For example, the Space Station software effort makes the scope of the Shuttle software seem almost trivial in comparison, and it will stretch the limits of software engineering and software management capabilities. The current plans are to develop the software in a decentralized manner, with each of the NASA centers developing different pieces that will later be integrated into a coherent system. Each of the centers has contractors and subcontractors along with NASA program management at the center to manage and oversee the development. However, there is no single prime contractor that is responsible for integrating all the software nor is an IV&V effort planned.
To bring the Space Station software effort and others such as the Earth Observing System Data and Information System to a successful completion, NASA will need to design and implement aggressive software development and software system safety programs using state-of-the-art technology and leading edge methodologies. This will require upgrading the education and knowledge of the NASA workforce to make it a leader in software engineering and software quality.
The Committee is concerned that the current software engineering and software system safety capabilities within NASA may not be adequate to acquire and manage the development of such large, complex, and safety-critical systems. The Committee believes the importance of software to NASA will only increase; NASA needs to increase its in-house expertise both at the working level and among those expected to manage future programs and choose the contractors that will do the work.
Contractors can be expected to do their best to provide a quality product, but, ultimately, the responsibility for the safety and functionality of the software that is put in place in future systems, including future Shuttle upgrades, belongs to NASA. If the contractors fail to provide a quality product or if the numerous parts of the total system do not operate together as expected, NASA will be the one left to explain to Congress and the nation why the system failed.
Recommendation #22: NASA should upgrade its workforce and management practices to make it a leader in software engineering and software quality. NASA should maintain as much in-house capability as possible to reduce its dependence on contractors and to provide proper assurance that contracted work is done on time and with as much attention to safety and other qualities as future systems require and deserve.