The process by which NASA maintains and upgrades the Shuttle flight software involves a very complicated network of NASA organizations and contractors, with numerous formal and informal relationships between the program, the centers, and headquarters. The purpose of this chapter is to bring together several of the specific findings and recommendations that have been alluded to in previous chapters regarding the interaction of these various organizations.
The organizational problems that the Committee has identified can be summarized as follows:
The relationships among the various members of the flight software community are not well defined, despite the program's recent attempts to define them. The Committee believes this lack of visibility could result in inadequate monitoring of the process and inadequate reporting and resolution of problems. This increases the chance that problems will be overlooked.
The reporting, management, communication, and oversight relationships between the various members of the flight software community need to be improved.
The S&MQ Office at NASA headquarters and SR&QA offices at the centers are not as effective as they should, or could, be. Because of inadequate resources and lack of authority, they have been unable to produce NASA-wide standards for software IV&V, reliability, quality assurance, or safety in a timely fashion. This has resulted in inconsistent and, in the Committee's opinion, inadequate implementation of these valuable oversight functions. In addition, there is insufficient technical expertise in the S&MQ offices at headquarters and at the centers to ensure that software oversight functions are adequately implemented and carried out.
As a result of these issues, the Committee believes that potentially removable elements of risk remain in the NASA Shuttle program.
The sections that follow in this chapter describe the specific findings of the Committee regarding the organization of the Shuttle flight software process, and the corresponding recommendations that will help ensure proper control over the process.
DOCUMENTING THE PROCESS
As mentioned previously, the Shuttle Program Office has recently attempted to document the software V&V process to provide some visibility into the software maintenance and upgrade process as a whole. This was a good first step and has been valuable in helping the Committee understand the roles and relationships of the various organizations that participate. However, as evidenced by its numerous additional questions about various organizations and their responsibilities, the Committee does not feel that the complete process is adequately documented. In fact, based on discussions with NASA and contractor personnel, and based on the Committee's own experience with the Shuttle program, the Committee believes that there is a great deal of information about the day-to-day execution of the Shuttle flight software process that is not contained in any existing document but is instead passed on from person to person in the form of accumulated knowledge and on-the-job training.
Finding #10: The Shuttle flight software maintenance and upgrade process is not adequately documented. There are important aspects of the process that are not described in the available documentation. This lack of visibility represents an increased risk of software-related problems.
An example of this lack of important documentation came to the Committee's attention when it asked to see the process by which DRs are dispositioned. There was no single document, or even small group of documents, that could be readily provided to describe this process. To respond to the Committee's question, it was necessary for the Shuttle Program Office to write a description from scratch, using the accumulated knowledge of various people in several different organizations. The Committee considers the DR dispositioning process to be a vital piece of the overall maintenance and upgrade process and a prime example of an important function that should be captured for all to see and understand. There are other examples of which the Committee is aware, and, undoubtedly, several instances that have escaped its attention, precisely because they lack the visibility that would be afforded by more complete documentation.
This situation is an artifact of the evolution of the process over the lifetime of the Shuttle; the Shuttle flight software process is nearly unique in its age, the number of people and organizations that are involved, and the size and complexity of the software. While the Committee believes that most of the people who are responsible for managing and assuring the execution of the process understand how all the pieces fit together, the situation will only get worse as experienced personnel leave the program over the ensuing years. An effort must be made to step back from the day-to-day execution of the process and get the details down in writing. The Committee believes it is time for the Shuttle program to do so for the following three reasons:
Without complete and accurate delineation of each organization's role and responsibility, upper management cannot have the proper visibility into the process to assure that all necessary functions are being performed.
If the roles and responsibilities are not completely spelled out in a form that all organizations have access to, those organizations may be unsure of their proper roles and the roles of others within the process.
The program runs the risk of losing important information when the people who understand the process retire or move on to other programs.
The end result of failing to fully capture the details of the process will be an increased risk of software, requirements, and process errors causing delays and potential safety problems. The Committee also believes that by undertaking an exercise to better understand and document the current process, the Shuttle program will, independently of the other findings and recommendations of this committee, discover areas where the process could be streamlined to reduce cost without adversely affecting safety and performance.
Recommendation #12: NASA should continue to enhance the current effort to fully document all aspects of the Shuttle flight software process. The effort should clarify the responsibilities of each contractor and each part of the NASA organization in a concise and readable format. The level of detail of the descriptions should be commensurate with: (1) the needs of NASA's upper management for visibility into the process, (2) the needs of the Shuttle Program Office to understand and pass on information regarding its procedures for administering and controlling the process, and (3) the needs of each participant in the process to understand the boundaries of its responsibilities and authority.
ORGANIZATIONAL ROLES AND RESPONSIBILITIES
The Committee submitted numerous questions to NASA in an attempt to clarify the relationships among headquarters S&MQ, the program offices at the centers, the center S&MQ offices, and the contractors. The Committee also spent a great deal of time and effort with the documentation that purports to describe the process, trying to understand the lines of authority and responsibility within what NASA refers to as the flight software community. The information obtained from the Committee's investigations, the responses obtained to the Committee's questions, and information gleaned from corresponding discussions held with representatives from the various organizations uncovered several areas of concern regarding the responsibility and authority of various organizations, and the manner in which potential problems are brought to light for consideration by the community.
Headquarters S&MQ reports to the NASA Administrator and has only a dotted-line relationship with the S&MQ offices at the centers (i.e., headquarters S&MQ funds the center activities but the centers do not report to headquarters).
The S&MQ offices at the centers report to the center director, but interact with the Shuttle program at the working level and through their participation at SASCB meetings.
The IV&V contractor reports directly to the Shuttle program through the SASCB.
The software development effort at MSFC interacts with the rest of the flight software community through the SASCB.
These relationships form the framework for the findings and recommendations that follow.
The Role of S&MQ
Finding #11: The headquarters S&MQ Office would have no authority to enforce established guidelines and policies even if such guidelines and policies existed.
Finding #12: The SR&QA offices at the centers do not have the resources, manpower, or authority to compel the development contractors or other NASA organizations to provide information that is sufficient to assure that the proper process is being followed.
The Committee investigated the approach used to provide oversight (i.e., IV&V, safety, reliability, and quality assurance) for software development and maintenance in the U.S. Air Force and the U.S. Navy. The Committee found that, in general, it is the responsibility of a program manager to tailor the implementation of these oversight functions based on the particular needs and constraints of his/her program. This should be done within guidelines that are supplied, approved, and monitored by a quality-assurance organization outside the control of the program.
Within NASA, the Shuttle program management is also responsible for determining the best implementation of these functions, but, as discussed later in this chapter, there are no approved policies or guidelines to move the program toward an effective implementation. Furthermore, there is no authority vested in the S&MQ Office to approve and monitor the particular approach chosen by the program. In other words, the Shuttle program does not conform to the model followed by the U.S. Air Force and Navy because there are no policies or guidelines for the programs to follow, and no authority or manpower to enforce them if they existed. Instead, the Shuttle program itself is responsible for implementing software oversight functions, while the S&MQ Office at headquarters and the SR&QA offices at the centers have been relegated to an advisory role.
The Committee has also found, through its discussions with various NASA personnel, that the headquarters S&MQ Office and the SR&QA Office at JSC do not have the manpower needed to fully monitor the process. In addition, the Committee understands that the number of people within the S&MQ and SR&QA offices who have the technical expertise to consider issues that are unique to software is very limited, especially considering the number of people and organizations involved in developing the software. Great concern was expressed to the Committee regarding the ability of the SR&QA offices at JSC and MSFC to obtain from the development contractors the type and volume of information needed to properly monitor compliance with the process. For example, the SR&QA Office at JSC does not have the authority to compel contractors to provide the information needed, nor would it have the manpower to fully utilize the information if it were provided. Instead it relies on its ability to maintain a good working relationship with the contractors and the program itself. The Committee endorses the idea of maintaining good working relationships but stresses the need to have other avenues of enforcement when, as often happens, those relationships become strained.
In summary, the current role and authority assigned to the S&MQ offices at NASA headquarters and the SR&QA offices at the centers are counter to the recommendation of the Rogers Commission, quoted previously in Chapter 5, that resulted in the S&MQ Office being created.
The Committee believes the spirit of this recommendation has not been followed. This is evidenced by the fact that the S&MQ and SR&QA offices lack the authority or the resources needed to approve the manner of oversight implemented by the Shuttle program and to fully monitor its effectiveness.
Recommendation #13: The headquarters S&MQ Office should be given the authority to approve or disapprove the program's implementation of software oversight functions once appropriate guidelines and policies are established.
Recommendation #14: NASA should increase the support for software-related SR&QA activities at the centers and give them the authority to obtain any information they consider necessary to adequately assure compliance with the established process.
Finally, the Committee was told, in response to a question submitted to NASA, that the headquarters S&MQ Office is not routinely included in the reporting of software-related problems. In addition, it became clear during discussions with the S&MQ personnel that much of their effort is spent trying to obtain important information from the program, simply because they are not on the normal distribution list. In fact, more than one member of the S&MQ staff stated to the Committee that they greatly appreciated being invited to the Committee's meetings so they could find out what is happening in the Shuttle flight software program. The Committee was also told that there are no requirements for routine reporting of software issues to higher-level program boards that are responsible for safety of the overall Shuttle system. In other words, software issues are not given the same visibility within the Shuttle program as hardware issues.
Finding #13: There is a lack of visibility for potential software problems because there are few requirements or opportunities to report software reliability, quality assurance, or safety problems at the program-level safety organizations, or to headquarters.
Recommendation #15: The headquarters S&MQ Office and the SR&QA offices at the centers should be given routine access to all software-related problem reports, and all members of the flight software community should be made aware of their responsibility to keep these oversight organizations involved in their activities.
The issue of community, or collective, responsibility arose during the Committee's attempts to understand precisely which organizations are responsible for each stage of the Shuttle flight software process. Early in this investigation the Committee was struck by the lack of detailed information on the various organizational roles and responsibilities throughout the process despite the recent attempts by the Shuttle program to provide better documentation. The Committee had expected to find a detailed delineation of each function that is performed, with a specific NASA or contractor organization given responsibility for that function. In most cases, this expectation was met. However, the Committee found that the responsibility for some very important functions was assigned to what NASA terms the flight software community.
Finding #14: Many important functions within the flight software process appear to be assigned to the flight software community rather than a specific NASA or contractor organization.
Figure 6-1 reproduces a chart from the NASA-approved description of the software development process¹ that shows the flight software community as being responsible for such important activities as generating CRs and analyzing and inspecting requirements. Other, similar charts from the same document show the flight software community as participants in activities where the responsibility lies with specific contractors or a specific NASA organization.
The Committee realizes that the document from which Figure 6-1 is taken is an attempt to condense a great deal of information about a very complicated process into a relatively short description, and it understands that NASA, and particularly the Shuttle Avionics Office at JSC, is ultimately responsible for all aspects of the Shuttle flight software. The Committee further realizes that assigning the flight software community responsibility for part of the process is an attempt on the part of NASA to show how all members of the community are encouraged to participate, in the hope that having more people involved in the process makes it more likely that potential problems will be found before they are implemented in the software. This is a valid goal, and the Committee believes it should be encouraged. However, specific task accountability and safety goals cannot be reached unless there are specific organizations, and thus specific people, within the flight software community who are given responsibility for performing each function. The Committee believes that failure to assign responsibility for the performance of a function to a specific organization opens the process up to interpretation and increases the potential that important functions will be forgotten or ignored because responsibility for them was left to the community. In short, community responsibility often results in no one taking responsibility, even in situations where safety of the crew or performance of the mission is at stake. This type of community responsibility, for example, was one of the factors that contributed to the Challenger accident.
¹ This discussion pertains primarily to the information found in the often quoted roadmap of the V&V process, Space Shuttle Flight Software Verification and Validation Requirements, NSTS-08271.
The Committee believes the way to ensure that all aspects of the process are performed with diligence and integrity is to assign each part of the process to a specific organization, with the appropriate Shuttle Program Office given ultimate responsibility. However, because the Committee also believes that much can be gained by having the community evaluate the software, the flight software community should continue to be encouraged, and in many cases required, to participate. Both approaches can, and should, be implemented. Also, the Committee cautions against relying too heavily on the ultimate responsibility that is vested in the program itself. The NASA organizations that make the final decision to fly the Shuttle cannot be expected to fully understand all the issues involved; they must rely on the good advice of the organizations that built and tested the software. The best way to make sure the program gets good advice is to make sure that all the developers and evaluators have specific responsibilities that must be performed before the process can proceed.
Recommendation #16: NASA should assign specific responsibilities for each aspect of the flight software process and document them accordingly. Responsibility should be assigned to individuals or offices and not to the community as a whole.
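As an illustration only, the kind of check implied by this recommendation can be sketched in a few lines of Python. The sketch assumes a hypothetical table that lists each process function with its responsible party and its participants. The record layout, the label used to mark collective ownership, and the office name "Example Office A" are assumptions made for the example and are not taken from any NASA document; only the two activities that Figure 6-1 assigns to the flight software community come from the text above.

```python
from dataclasses import dataclass

# Assumed label for a collective owner; not an official NASA designation.
COLLECTIVE_OWNERS = {"flight software community"}


@dataclass(frozen=True)
class Assignment:
    """One row of a hypothetical responsibility table."""
    function: str              # process function being performed
    responsible: str           # party accountable for the function
    participants: tuple = ()   # parties encouraged or required to take part


def functions_without_specific_owner(assignments):
    """Return the functions whose responsible party is blank or collective."""
    flagged = []
    for a in assignments:
        owner = a.responsible.strip().lower()
        if not owner or owner in COLLECTIVE_OWNERS:
            flagged.append(a.function)
    return flagged


# Example rows. The first two mirror the community-owned activities cited
# from Figure 6-1; the third uses a made-up office to show the target state,
# in which the community participates but a specific office is accountable.
assignments = [
    Assignment("Generate CRs", "flight software community"),
    Assignment("Analyze and inspect requirements", "flight software community"),
    Assignment("Disposition DRs", "Example Office A",
               ("flight software community",)),
]

flagged = functions_without_specific_owner(assignments)
for name in flagged:
    print(f"No specific owner: {name}")
```

Keeping the responsible party and the participants in separate fields reflects the point made above that both approaches can be implemented together: the community can remain a participant in every function while accountability rests with a named office.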
POLICIES, GUIDELINES, AND ENFORCEMENT
The fact that this and other studies have been necessary indicates that the benefits of IV&V, software reliability, software quality assurance, and software safety have not been fully impressed upon the Shuttle program management. This is partially a failure of the program management to realize these benefits, but it is also a failure by NASA headquarters to provide the program management with the appropriate cost-versus-benefit information and the appropriate policies and guidelines for implementation of these oversight functions.
Finding #15: There is a lack of accepted policies and guidelines for appropriate implementation of software V&V, IV&V, reliability, quality assurance, and safety.
In general, the Shuttle Program Office is responsible for tailoring the implementation of these oversight functions in a way that is appropriate for the program, given the funds available and the perceived benefits to be gained.
Several documents have been given to the Committee that are meant to provide guidance to NASA programs in these areas but, in most cases, they have not been officially adopted by NASA as standards or even officially published as guidelines for program managers. Without clear guidelines and policies, it is very difficult for program management to determine appropriate roles, authority, and responsibilities for these functions. This lack of NASA-wide policies and guidelines has permitted a wide range of implementations of the various oversight functions which, in the Committee's opinion, has resulted in an inconsistent realization of the benefits offered by these functions. If headquarters were to better educate program managers in the benefits of software IV&V, reliability, quality assurance, and safety, NASA's programs, including the Shuttle program, would surely benefit. This education process, however, requires a consistent and coherent description of the benefits and the associated costs followed up by appropriate policies and guidelines.
The Committee was told that the two primary reasons such policies and guidelines have not been published are a lack of sufficient personnel to develop them and the cumbersome process by which NASA-wide approval is obtained. In brief, the Committee found that several very useful documents were held up from being officially accepted by NASA because of the requirement to obtain complete consensus from all the centers. In some cases, this consensus process took years to complete. The delays resulted in part from conflicts between the centers and in part from personnel responsible for granting approval simply missing the deadlines established by headquarters for providing comments on the documents under consideration.
Building consensus is a worthy goal but should not be used as an excuse for failing to issue policies in a timely fashion. Without enlisting the centers and program personnel to determine the best implementations of the oversight functions, headquarters runs the risk of fostering distrust and outright opposition. On the other hand, the requirement for complete agreement before these documents can be accepted allows for possible filibusters or simply passive resistance that results in no policies being established. The Committee believes that a process must be put in place that forces headquarters to solicit, in good faith, the opinions of those managers at the centers who will be responsible for implementing the proposed oversight functions and yet gives headquarters the authority to break any impasse that may result.
Recommendation #17: NASA should establish a process that provides the center and program managers with the opportunity to comment on proposed policies and guidelines, but also gives the appropriate headquarters personnel the authority to approve the policies and guidelines in cases where complete consensus cannot be reached in a reasonable amount of time. This process should have the following features:
The authors of proposed policies and guidelines must respond in writing to explain why concerns or criticisms that have been expressed are not incorporated in the final version.
The process should have well-defined deadlines for submitting comments, and the authors should be given the option of proceeding with the approval process once those deadlines have passed.
The process should include a provision for arbitrating disputes at a level of management above the program offices and the headquarters S&MQ Office, i.e., by the Deputy Administrator or by the Administrator, if necessary.
Finding #16: A primary reason for the lack of established policies and guidelines is the lack of sufficient resources, manpower, and expertise devoted to developing them.
Based on discussions with the S&MQ personnel at NASA headquarters and the SR&QA offices at the centers, and also based on the Committee's observations of the time required to develop and obtain approval for appropriate policies and guidelines, the Committee believes that there have been inadequate resources devoted to software IV&V, reliability, quality assurance, and safety efforts within the SR&QA offices at the centers and at headquarters. The lack of sufficient personnel with knowledge of the unique aspects associated with software is at least partially responsible for delays in getting consistent policies and guidelines prepared and disseminated. This, in turn, has resulted in the centers being forced to make difficult choices between the needed oversight functions and other pressing activities in the absence of complete information about the benefits these various oversight activities offer.
The Committee realizes that there is great pressure from within NASA and from Congress to cut costs, particularly in the Space Shuttle program; when resources are limited these oversight functions are often the first to be targeted for elimination. However, it is the belief of this Committee that if a commitment were made by NASA headquarters and Shuttle program management to adequately support the oversight functions with the funds and personnel needed, and if a consistent NASA-wide policy were prepared, considerable benefits could be realized that would justify any additional cost. Furthermore, the Committee believes that an effective case could be made to Congress and to the administration, based on the long-term savings realized by avoiding expensive overruns and failures, that would help lessen the pressure to reduce costs.
The Committee also realizes that the more prominent role played by software in modern flight systems is relatively new and that engineering procedures have not entirely caught up with the need. At the very least, the budget for the S&MQ and SR&QA offices for software-related activities should be increased above the threshold level needed to produce appropriate guidelines and policies and to adequately track compliance with those policies and guidelines within the programs that are affected.
Recommendation #18: NASA should provide the S&MQ Office at headquarters and the SR&QA offices at the centers with the additional resources needed to build their expertise in software IV&V, safety, reliability, and quality assurance. The budget and personnel devoted to software safety, reliability, and quality-assurance activities should be of sufficient size to allow adequate policies and guidelines to be prepared and compliance with those guidelines and policies to be fully monitored.