The National Airspace System (NAS) is a critical infrastructure for the United States. In concert with revising the architectural approach for the Next Generation Air Transportation System (NextGen), planning to cope with change is needed. Change can be thought of as the ongoing management of trade-offs, which are not clearly identified in the existing tacit architecture (discussed in Chapter 2). Indeed, any system architecture developed will need to reflect planning for resilience in order to encapsulate anticipated variability. This chapter discusses cybersecurity, unmanned aircraft systems (UAS), and safety to illustrate why planning for resilience in the NextGen is so important. The chapter then lays out a broader framework for thinking through resilience and risk management for software-intensive systems such as NextGen.
As the committee noted in its interim report,1 the designers and developers of any software- and communications-intensive system deployed today must grapple with questions of cybersecurity.2 Understanding
1 National Research Council (NRC), Interim Report of a Review of the Next Generation Air Transportation System Enterprise Architecture, Software, Safety, and Human Factors, The National Academies Press, Washington, D.C., 2014.
2 Here the committee refers to what some call cybersecurity (system, data, and communications security), which is distinct from the physical security required for airport and aircraft operation, provided in part by the Transportation Security Administration (TSA).
cybersecurity risks and threats and developing appropriate threat models and mitigations are challenges endemic across government and industry. NextGen is no exception; indeed, the safety-of-life implications and the vital economic importance of air travel make the security of NextGen and the NAS critically important. As various programs and components of the NAS are modernized, upgraded, and transformed, the security implications of the changes will need to be taken into account. The criticality of cybersecurity for NextGen increases as more services rely on digital technologies, networked communications, and commercial-off-the-shelf software.
While acknowledging an increase in risk as the agency moves to new, more connected technologies, Federal Aviation Administration (FAA) staff noted in briefings to the committee that they rely heavily on federal guidelines for cybersecurity support. For example, the FAA relies on the National Institute of Standards and Technology risk management framework,3 and the Department of Transportation Inspector General and the Government Accountability Office (GAO) periodically audit compliance with federal and FAA security orders, directives, and guidance.
FAA staff stated that enterprise-level programs address specific information threats but that there are no current NAS-level threat models. Furthermore, from what the committee has learned, information security is not currently a consideration during safety analysis. FAA staff did note that the current safety management manual was in revision, with plans to address the exclusion of security. The FAA also noted that threats are addressed at the program level: all major programs must comply with federal guidelines on information security, which is an integral component of the acquisition management life cycle, and NAS-level threats are expected to be addressed through these enterprise-level programs. The committee was also briefed on a proposed NextGen cybersecurity test facility, which would provide some initial movement toward eventual capability.4
The committee remains concerned that cybersecurity, although acknowledged as an issue and with some efforts under way to address
Cybersecurity efforts may themselves require physical security components, such as physical safeguards to servers, data centers, and workers, to mitigate various kinds of threats.
3 National Institute of Standards and Technology, Framework for Improving Critical Infrastructure Cybersecurity, Version 1.0, February 12, 2014, http://www.nist.gov/cyberframework/upload/cybersecurity-framework-021214.pdf.
4 Federal Aviation Administration (FAA), “ANG-B3 Proposed NextGEN Cyber Security Test Facility: Analysis and Research of Common Cyber Security Requirements,” presented to the committee on February 19, 2014.
it, has not been fully integrated into the agency’s thinking, planning, and efforts with respect to NextGen and the NAS generally (although the committee recognizes that some of these efforts may be subject to classification, and therefore its view may be incomplete). Because of the scale of the NAS, hazard analysis for security must be undertaken for individual subsystems and components within the system rather than only at a notional “perimeter.” This is an important consideration for any large-scale system, and particularly for systems that are interconnected with other systems. That is, internal service interfaces must be designed to be resilient against the possibility that other parts of the system may be controlled by adversaries. In addition, as new technologies and procedures are rolled out, there will inevitably be new vulnerabilities (this is true for any information technology (IT)-based system and is not specific to the FAA). Moreover, changes in the way existing, long-stable technologies are used may introduce new security issues, and there may be vulnerabilities associated with avionics governed by international standards. Threat analyses are therefore needed in both dimensions: on existing systems and associated standards together with any expected changes, and on new components. Threat analysis should encompass both the nature of threats in the operating environment and the security-focused hazard analysis that connects the understanding of possible threats with architectural decisions. For these reasons, cybersecurity will need to be managed architecturally. Individual threat analyses of programs need to be “rolled up” into an architectural threat model, and that threat model, in turn, needs to be checked against each program.
In the committee’s view, as systems become increasingly digital and dependent on communications and networks, and as the threat landscape for the nation as a whole continues to evolve, cybersecurity will need to be an important and integral part of safety activities, and it is an ongoing operational matter (not only a question of design and architecture). The committee saw little evidence of adequate measures to defend systems against various kinds of attack. Data fusion of wide-area multilateration (WAM)5 with Automatic Dependent Surveillance-Broadcast (ADS-B) and radar tracks, often cited as an important systemic hedge against attacks such as spoofing, is not a sufficient safeguard because it protects against only a limited class of attacks.
5 Wide-area multilateration is a surveillance capability that “works by employing multiple small remote sensors throughout an area to compensate for terrain obstructions … the data from multilateration sensors is fused to determine aircraft position and identification” (FAA, “Wide Area Multilateration (WAM) Project,” last modified August 19, 2014, http://www.faa.gov/nextgen/programs/adsb/wsa/wam/).
Consistent with the committee’s observations, a March 2015 GAO6 report noted:
The weaknesses in FAA’s security controls and implementation of its security program existed, in part, because FAA had not fully established an integrated, organization-wide approach to managing information security risk that is aligned with its mission. […] FAA has established a Cyber Security Steering Committee to provide an agency-wide risk management function. However, it has not fully established the governance structure and practices to ensure that its information security decisions are aligned with its mission. For example, it has not (1) clearly established roles and responsibilities for information security for the NAS or (2) updated its information security strategic plan to reflect significant changes in the NAS environment, such as increased reliance on computer networks.
Until FAA effectively implements security controls, establishes stronger agency-wide information security risk management processes, fully implements its NAS information security program, and ensures that remedial actions are addressed in a timely manner, the weaknesses GAO identified are likely to continue, placing the safe and uninterrupted operation of the nation’s air traffic control system at increased and unnecessary risk.
The system architecture for the NAS and its future goals need to embrace comprehensive, system-wide measures to ensure cybersecurity. Some cybersecurity requirements are new because some upgrades use new (e.g., digital) technologies, and the requirements to meet some risks (e.g., Internet-based hacking) are themselves new. So there are new risks, and new requirements that these risks be met and mitigated. Reasoning about such risks will need to be based on clearly stated goals and requirements, and it will become increasingly thorough and definitive as development proceeds through architecture, design, and eventual implementation, although there will always be significant uncertainty. The absence of such architectures precludes this kind of reasoning and leaves doubt about the exact security capabilities that NextGen will be able to achieve. Cybersecurity requires a system-wide approach that is managed architecturally; it cannot be addressed piecemeal by each contractor (or program) separately, nor can security be added to the system later. Safety properties themselves depend on a resilient, trustworthy, secure system, so careful integration of cybersecurity models and processes into safety analysis will
6 Government Accountability Office, Information Security: FAA Needs to Address Weaknesses in Air Traffic Control Systems, GAO-15-221, publicly released March 2, 2015, http://www.gao.gov/products/GAO-15-221.
become increasingly important. Finally, cybersecurity itself is an ongoing challenge in many domains and the subject of ongoing research; it will be important to track and integrate relevant results as the field continues to evolve.
Finding: Cybersecurity is critical to NextGen and the NAS. Cybersecurity challenges extend from major software platforms into the specification and design of embedded (avionics) equipment that connects directly to the NAS. The cybersecurity challenge for the NAS is a direct consequence of increasingly digital communications and systems.
Finding: Although there will always be risk, the lack of appropriate architectural approaches to security and safety that allow for reasoning about risks and uncertainty only increases the likelihood that risks of unknown magnitude can remain embedded in the NAS.
Recommendation: The Federal Aviation Administration (FAA) should incorporate cybersecurity as a system characteristic at all levels of architecture and design. The FAA should begin by developing a threat model followed by an appropriate set of architectural and design concepts that will mitigate the associated risks, support resilience in the face of attack or compromise, and allow for dynamic evolution to meet a changing threat environment. The FAA should inculcate a cybersecurity mindset, complementary to its well-established safety mindset, throughout the organization, its contractors, and its leadership.
The FAA defines a UAS as “an unmanned aircraft and its associated elements related to safe operations, which may include control stations (ground-, ship-, or air-based), control links, support equipment, payloads, flight termination systems, and launch/recovery equipment.”7 The FAA Reauthorization Act of 2012 calls for the safe integration of UAS into the NAS by 2015. Several interim steps have been taken, including the establishment of six UAS test sites and the first roadmap for the integration of UAS into the NAS.
7 FAA, Integration of Civil Unmanned Aircraft Systems (UAS) in the National Airspace System (NAS) Roadmap, November 7, 2013.
UAS are already in use as hobbyist craft, and the FAA estimates that thousands of small UAS could be active over the next 5 years.10 Many of these will be small operations—flying below 500 feet, within line of sight, or away from controlled airspace—and will not require air traffic services. Small UAS have the potential for significant economic impact; examples include surveying and treating crop fields, local news reporting, and supporting local law enforcement operations. When additional guidance is in place, higher-altitude operations that fly above 500 feet, are beyond line of sight, or need civil airspace infrastructure will presumably need to be equipped with applicable technologies to interact with current and future air traffic services.
Several NextGen technologies are essential to the safe integration of UAS: the NAS voice system, which will allow UAS pilots to communicate with air traffic control (ATC) over ground-to-ground communication networks; Data Communications (Data Comm), which will support the sending of digital messages to the flight crew; and System Wide Information Management, which will support more timely and improved information access to all users of the NAS. However, NextGen planning and architecture did not explicitly anticipate the introduction of UAS and, indeed, the de facto system architecture, having substantially predated the advent of UAS, does not seem to lend itself to incorporating these new types of aircraft that will place new demands on the system.11 The expected integration of UAS into the NAS will present new safety issues stemming from increased reliance on data links, limited operator sensory and environmental cues, and so on. And insufficiently developed
8 A 2014 Department of Transportation (DOT) Inspector General report on FAA’s progress notes that the FAA faces large delays in the integration and that “delays are due to unresolved technological, regulatory, and privacy issues, which will prevent FAA from meeting Congress’ September 30, 2015, deadline for achieving safe UAS integration.” (DOT, FAA Faces Significant Barriers to Safely Integrate Unmanned Aircraft Systems into the National Airspace System, Office of Inspector General Audit Report AV-2014-061, June 26, 2014, https://www.oig.dot.gov/library-item/31975.)
9 FAA, “Overview of Small UAS Notice of Proposed Rulemaking,” released February 15, 2015, http://www.faa.gov/uas/nprm/; and FAA, FAA Aerospace Forecast Fiscal Years 2013-2033, 2013, https://www.faa.gov/about/office_org/headquarters_offices/apl/aviation_forecasts/aerospace_forecasts/2013-2033/, p. 66.
10 See FAA, FAA Aerospace Forecast: Fiscal Years 2014-2034, 2014, https://www.faa.gov/about/office_org/headquarters_offices/apl/aviation_forecasts/aerospace_forecasts/2014-2034/media/2014_FAA_Aerospace_Forecast.pdf, p. 65.
11 The FAA’s deputy administrator was quoted saying that UAS “weren’t really part of the equation when you go back to the origin of NextGen” in J. Lowy, Drones left out of air traffic plans, AP News, September 23, 2014. http://bigstory.ap.org/article/d2f90d7230af40b493a849df06e7512e/ap-exclusive-drones-left-out-air-traffic-plans.
system architecture is one of several obstacles to fully integrating UAS into the NAS.
The integration of UAS is an example of a rapidly emerging requirement that could provoke disruptive changes to both technology and to roles and responsibilities. Allowing detect-and-avoid capability (versus see-and-avoid) will require changes to the roles of pilot and controller. Emergency procedures will need to be developed and tested (e.g., for loss of data link).12 There are privacy issues that arise, as well as questions about airworthiness and associated certifications. And, related to the discussion of cybersecurity above, the introduction of UAS into the NAS will be another security risk that will need to be addressed in the security architecture and mitigated. Further, low-altitude UAS operations will require new thinking because most will not simply be passing through the usual airspace: UAS missions and operations may be considerably different in their location and flight plan (e.g., they may survey an area rather than transit through a space). Finally, some degree of autonomy in UAS operations may become increasingly desirable, which would generate a variety of new challenges for NAS and NextGen planning.13
The committee urges that the FAA use UAS as a use case for developing a better approach to system architecture (and associated technical and procedural designs). As one example, satellite-based surveillance (ADS-B Out and ADS-B In), if fully deployed, allows a different class of solutions for UAS. A living system architecture that appropriately integrates technology and procedural planning could be used to make claims about how the overall system will react (and possibly need to be changed) in response to the new usage model presented by UAS. Are the data requirements alone—content and update rate—for ADS-B Out and ADS-B In sufficient to provide safe operations absent a pilot in the cockpit? And has this been modeled and verified in the system architecture?
Finding: The challenge of integrating UAS into the NAS illustrates the difficulty of accommodating changing requirements within the current approach to managing architectural and system evolution.
Finding: One measure of the quality of the NAS architecture is (and will be) its flexibility in addressing UAS operations as they unfold, recognizing that UAS requirements and capabilities are likely to change a great deal as these technologies mature.
12 See the 2014 NRC report Autonomy Research for Civil Aviation: Toward a New Era of Flight (The National Academies Press, Washington, D.C.).
13 NRC, Autonomy Research for Civil Aviation, 2014.
Recommendation: The Federal Aviation Administration (FAA) and its architecture leadership community should look for and apply lessons from the challenge of integrating unmanned aircraft systems (UAS) into the National Airspace System (NAS) as it develops an effective system architecture. The FAA and its architecture leadership community should incorporate measures in the NAS system architecture to address UAS integration.
The FAA and the United States rightly pride themselves on a devotion to safety and an excellent safety record to match. At the same time, a conservative safety culture can affect how quickly process and technological change can happen—a challenge in an arena where technologies change rapidly. Such a culture may inhibit the adoption of new technologies or increased automation that could potentially result in net improvements in both safety and efficiency. A strong safety culture can make up for some limitations in an architecture. For example, while it is a good thing for controllers and pilots to be highly sensitive to close calls, it would be better if the architecture and design precluded those near misses from happening. Moreover, if the FAA is going to be held accountable for an extremely conservative safety culture—which has historically been the case—then it should be recognized that such conservatism will understandably bias the agency away from innovation. Thus, there are risks associated with a safety culture as well, not least of which are the opportunity costs of not deploying improved (and potentially even safer) technology and procedures in the long run. In addition, excessive care regarding safety can result in the accumulation of technical debt—the deferral of significant refactoring and infrastructure refresh.
The original Joint Planning and Development Office (JPDO) vision of improved safety as a result of NextGen systems and technology has not been realized. The “safety management system” used by the FAA is generally very good for airborne systems but less so for ground systems. One issue with ground systems can be seen in a recent ERAM failure (discussed in Chapter 2) and in the National Transportation Safety Board’s annual appeal for better technology to prevent runway incursions.
Safety engineering is about reducing residual risk as low as possible and certainly below a threshold of acceptability. Safety engineers do not cease analysis just because a system’s operational record is satisfactory. Any accident, especially loss of a commercial transport, is regarded as an extremely serious event to be avoided. With that view in mind, the modernization efforts under way in NextGen raise two key countervailing safety issues:
- The opportunity to exploit the overall system information infrastructure to further reduce residual risk. This opportunity has not been well exploited. An enterprise architecture, for example, could have provided clear data architectural views to allow discovery of single points of failure in the system14 and to expose data limits that would cause the system to perform in ways not designed.15
- New practices and procedures made available by NextGen will transform a system with an excellent record into a new system with no operational record (even if the change is incremental). This is, in part, because even for incremental changes, there are at least four implementation cycles, all of which pose some risk: keeping the current state functional; updating and upgrading today’s NAS with existing technology; updating the NAS system with future technology; and implementing these changes along the way. All change has some risk; the fact that process and procedure must be changed multiple times, especially when there is not a system approach to the architecture (discussed in Chapter 2), creates difficulties. In such circumstances, a comprehensive analysis of the residual risk that results from the change, coupled with precise system operational monitoring until confidence in risk levels is established, will be important.
The early JPDO vision focused on the first item. At present, there are systems such as minimum safe altitude warning (MSAW) that are designed to provide supplementary information to ATC about hazardous states. ADS-B and wide area augmentation system (WAAS), for example, provide a major improvement in the information available to ATC over what has been available with radars. That NextGen does not have a system-wide monitoring system above and beyond things like MSAW, traffic collision avoidance system (TCAS), and enhanced ground proximity warning system (EGPWS) is surprising. New sensors and communications and computing capabilities suggest that additional monitoring—of hazardous states and states that existing sensors are expected to detect—will be important.
The second item poses more challenges to safety analysis. With new practices and procedures, even if the technology is primarily aimed at upgrading systems in place, there will be emergent properties and behaviors, some of which may create new safety risks. Is it understood and
14 A recent Chicago Center fire took down the whole center by cutting certain communications. The fact that this occurred is made worse by a previous example of a single fiber-optic cable cut that did the same thing. See James Adams, The Next World War: Computers Are the Weapons and the Front Line Is Everywhere, Simon and Schuster, New York, N.Y., 1998, p. 173.
15 See Box 2.2.
well articulated to stakeholders how changes in the NAS could affect the hazard rate? The committee believes this understanding should be reflected in the system architecture and be readily assessable as proposed changes are considered. Stakeholders should find their concerns reflected explicitly, and there should be models that evaluate safety requirements in terms of the highest-level structural choices in the system. Moreover, stakeholders should also be able to see evidence of evaluation of alternatives. There will also, undoubtedly, be opportunities to take advantage of new approaches to safety engineering that have emerged in recent years—primarily driven by the broad introduction of digital technology.
In architectural terms, safety is, presumably, a key performance parameter. As such, it should be linked to an understanding of how changes in ATC capabilities would affect the accident rate. Those links need to be understood, communicated, and explicitly reflected in the system architecture. A system architecture, as described in Chapter 2, would allow evaluation of things like safety in terms of the highest-level structural choices in the system. And it would enable generation and communication of the evidence of evaluation of alternatives.
The discussions of cybersecurity and UAS illustrate the need for a dynamic and flexible approach to emerging challenges that will inevitably present themselves over time. Given expected future changes, the architectural capability encouraged in Chapter 2 would offer insights on how such changes could be incorporated and where the highest risks will be. More generally, the NAS will need to be resilient, and the FAA will need to ensure appropriate and effective risk management strategies. Such strategies will need to encompass safety of flight and security, in addition to programmatic, operational, and engineering risk. This section offers a brief overview of the challenges to traditional engineering project management of software-intensive systems. It then focuses briefly on management of software risk in particular, in response to the statement of task, and describes the committee’s views on risks to NextGen.
As discussed in Chapter 1, NextGen today implicitly embodies a set of decisions to not dramatically change a wide range of current operations. Those decisions, along with an analysis of their implications, are not explicit in the tacit architecture. But a decision to not change carries heavy implications for the realization of any gains that would require such changes. The 2008 NRC report16 cited earlier and ISO/IEC/IEEE Standard
16 NRC, Pre-Milestone A and Early-Phase Systems Engineering, The National Academies Press, Washington, D.C., 2008.
4201017 both have a clear perspective on what constitutes good practice in architecting. They presume that the heart of good practice is to explicitly state value attributes (with scales) at the full-system level, develop multiple alternative architectures (in the sense of systems or systems-of-systems), and have evaluation models that compare those alternatives against the value attributes. Both recommend that multiple alternatives be explored and that the rationale for choice be explicit. The committee was struck by the lack of alternatives articulated for NextGen.
Conventional engineering project management techniques assume little uncertainty in their requirements and exploit mature precedents for construction and deployment. Large-scale software projects managed with such engineering governance models typically uncover changes late in the life cycle that are difficult to manage, with 40 percent or more of their effort consumed by late scrap and rework.18 Much of NextGen is focused on new software and the computer platforms it runs on. The iron law of traditional software engineering is this: the later you are in the life cycle, the more expensive things are to fix.19
In the committee’s experience, project managers who are experienced and trained in traditional project management disciplines such as detailed planning, critical-path analysis, and earned value management may have a particularly rough transition to dealing with these types of projects. They must move from a world of managing certainty and precision to a world of resolving uncertainty based on imprecise probabilistic judgments. Although these ideas are far from new, they are also far from being standard practice in most software enterprises and require management support, leadership, and training to be implemented well. In addition, it can be easy for a program office to go into denial regarding risks, especially without incentives to aggressively seek out and identify uncertainties. Unless such incentives exist, there is likely to be a coupling of engineering risk with overall project risk. Another factor that can make transparent risk assessment and communication difficult could be the technical use of the term “risk” to refer to uncertainties regarding
17 International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC)/Institute of Electrical and Electronics Engineers (IEEE), Standard ISO/IEC/IEEE 42010:2011, “Systems and Software Engineering—Architecture Description,” December 2011, http://www.iso.org/iso/catalogue_detail.htm?csnumber=50508.
18 W. Royce, Software Project Management: A Unified Framework, Addison-Wesley, Reading, Mass., 1998.
19 A study from NASA suggests that costs can increase by more than two orders of magnitude as fault discovery and repair are deferred until later in the life cycle. See J.M. Stecklein, J. Dabney, B. Dick, B. Haskins, R. Lovell, and G. Moroney, “Error Cost Escalation Through the Project Lifecycle,” paper presented at 14th Annual International Symposium, June 19, 2004, available http://ntrs.nasa.gov/search.jsp?R=20100036670.
the consequences of potential engineering commitments. There may be counter-incentives in place to present a picture in which “risks” appear to be minimized.
There is a reasonable framing of these sorts of risk-related issues in the new book by Boehm et al., “The Incremental Commitment Spiral Model: Principles and Practices for Successful Systems and Software,”20 which focuses on how engineering uncertainties are identified and resolved, framing the activity as a process of making commitments. The use of the term “risk,” while familiar to software and systems engineers, can be misleading to non-practitioners, who might think, “we want to avoid risk,” whereas engineers must actively seek identification and engagement with these “risks.” It may be useful to think of risk as “engineering uncertainties.” An overview of the process is as follows: (1) active identification of uncertainties (part of the ongoing architectural exercise), (2) architecture work to decouple the various categories of uncertainties, (3) identification and consideration of options for handling the various uncertainties (through modeling, simulation, prototyping, etc.), (4) appropriately timed resolution of individual uncertainties (entailing an engineering commitment), and (5) ongoing reconsideration of commitments in response to changes in the operating environment and in the technical infrastructure. All of this is enabled by “good” architectural design, which minimizes the extent of coupling among the various uncertainties and commitments.
However it is phrased, for large-scale, critical initiatives such as NextGen, clear assessments, understanding, and communication of risk are essential. The risk management foundation underlying the modern spiral model and the basic ideas of software engineering economics were first laid out in the 1980s and have been updated over time.21 Applying probability theory to deal with uncertainty is also well established.22 As an example of how probability can be helpful in managing risks, consider a project that will move forward in three successive phases, where the duration of each is governed by independent bell-shaped normal distributions. Then the total time to completion is the sum of the three
20 B. Boehm, J.A. Lane, S. Koolmanojwong, and R. Turner, The Incremental Commitment Spiral Model: Principles and Practices for Successful Systems and Software, Pearson Education, Upper Saddle River, N.J., 2014.
21 See B. Boehm, A spiral model of software development and enhancement, Computer 21(5):61-72, 1988; B. Boehm, Software Engineering Economics, Prentice Hall, 1981; and Boehm et al., The Incremental Commitment Spiral Model, 2014. In addition, the 2010 NRC report Critical Code: Software Producibility for Defense (The National Academies Press, Washington, D.C.) also discusses the concept of risk in the engineering process.
22 S. Biffl, A. Aurum, B. Boehm, H. Erdogmus, and P. Grünbacher, eds., Value-Based Software Engineering, Springer-Verlag, Berlin Heidelberg, 2006; H. Erdogmus, Valuation of learning options in software development under private and market risk, Engineering Economist 47(3):308-353, 2002.
normal random variables, and the total uncertainty, as measured by the standard deviation, is not the sum of the three individual uncertainties: because the variances of independent variables add, the standard deviation of the sum is the square root of the sum of the squared phase standard deviations, which can be considerably less than their plain sum. Moreover, the probability of completion within any particular time frame (e.g., 3 years) can then be computed directly from the combined distribution.
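The arithmetic described above can be sketched as follows; the phase means and standard deviations (in months) are hypothetical numbers chosen purely for illustration, not estimates for any actual NextGen program:

```python
from statistics import NormalDist
import math

# Hypothetical phase-duration estimates: (mean, standard deviation) in months.
phases = [(10, 3), (14, 4), (8, 2)]

total_mean = sum(m for m, _ in phases)  # 32 months

# For independent normal variables, variances add, so the combined standard
# deviation is the root-sum-of-squares -- less than the plain sum of 9 months.
total_std = math.sqrt(sum(s ** 2 for _, s in phases))  # sqrt(29) ~ 5.39

# Probability of finishing within a particular time frame (here, 3 years).
total = NormalDist(mu=total_mean, sigma=total_std)
p_on_time = total.cdf(36)

print(f"combined std: {total_std:.2f} months (naive sum would be 9)")
print(f"P(completion within 36 months) = {p_on_time:.2f}")
```

The point of the sketch is the contrast between the root-sum-of-squares combination (about 5.4 months here) and the naive sum of the individual spreads (9 months), and that a completion probability for any deadline follows immediately from the combined distribution.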
New, iterative development methods have emerged organically from diverse software development communities to improve navigation through uncertainty. Such navigation requires measured improvement with dynamic controls, instrumentation, and intermediate checkpoints that permit stakeholders to assess what they have achieved so far (the as-is situation), what adjustments they should make to the target objectives (the predicted-to-be situation), and how to refactor what they have achieved to adjust those targets in the most economical way (the roadmap forward). The key results could be reduced overhead and a significant reduction (perhaps as high as 50 percent) in scrap and rework.23
Uncertainty can be quantified by measuring the reduction in variance in the distribution of estimates of the resources needed to complete the project.24 A reduction in variance is meaningful progress even when the mean is unchanged, because it narrows the uncertainty regarding cost to complete. A shrinking standard deviation shows directly how the spread of the distribution around its mean has contracted. Alternatively, a reduction in the probability of some particularly adverse outcome (which need not be proportional to the variance of the resource distribution) can be a useful quantification. These estimates are random variables and should be represented by their probability distributions, not just their mean values. In a healthy software project, each phase of development produces an increased level of understanding by reducing uncertainty in the evolving plans, specifications, and demonstrable releases. At any point in the life cycle, the precision of the subordinate artifacts, especially the code and test base, should be in balance with the evolving precision in understanding and at compatible levels of detail. Specialists in probability and statistics can play an important, ongoing role in risk management for NextGen; some such specialists are available to the FAA through the National Center of Excellence in Operations Research.
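A minimal sketch of the adverse-outcome measure, again with hypothetical figures, shows why the probability of a bad outcome need not shrink in proportion to the variance: here the mean cost estimate is unchanged between two review points, but the narrower later distribution cuts the overrun probability by far more than the variance ratio alone would suggest:

```python
from statistics import NormalDist

# Hypothetical cost-to-complete estimates (in $M) at two review points:
# the mean is unchanged, but later iterations have narrowed the spread.
early = NormalDist(mu=120, sigma=30)
later = NormalDist(mu=120, sigma=12)

budget = 150  # hypothetical budget ceiling

# Probability of the adverse outcome (overrunning the budget).
p_over_early = 1 - early.cdf(budget)  # z = 1.0  ->  ~0.16
p_over_later = 1 - later.cdf(budget)  # z = 2.5  ->  ~0.006

print(f"P(overrun), early estimate: {p_over_early:.2f}")
print(f"P(overrun), later estimate: {p_over_later:.3f}")
```

The standard deviation fell by a factor of 2.5 (variance by 6.25), yet the overrun probability fell by roughly a factor of 25, illustrating why tail probabilities can be a more decision-relevant progress measure than variance alone.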
The risks of NextGen’s software development approach are inherently difficult to quantify. However, quantifying risks and value offers a means to better planning and management. The challenge for complex systems such as NextGen is how to quantify and prioritize risks so that
23 W. Royce, Measuring agility and architectural integrity, International Journal of Software and Informatics 5(3):415-433, 2011.
24 The reduction in variance of a forecasted value is a measure of “validated learning,” which is elaborated further in the discussion of entrepreneurial risk management presented in The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses by Eric Reis (Crown Business, New York, N.Y., 2011).
projects can be steered effectively and uncertainties can be systematically resolved earlier in the life cycle. In all engineering projects, and particularly software engineering projects, this usually means understanding, as early in the life cycle as possible, the consequences of risky decisions. If the consequences are not understood until late in the process, the costs of unwinding previous bad decisions may become prohibitive, and the architecture becomes a source of change friction that burdens the efficiency of execution. If the consequences can be understood and managed earlier, the architecture can be effectively refactored and optimized. An effective architecture can thus serve as a basis for risk assessment and mitigation and as a tool to support decision making and the recording of decisions.
A good “window” through which one can manage risks and assess the value that NextGen is likely to deliver is inherent in how the FAA’s predictions of risk have changed over time. Unfortunately, this window is far too opaque for the committee to draw quantifiable conclusions. The risk management employed by the FAA, as described to the committee, is heavy on process and procedure, but the artifacts and outcomes of that process offer little insight. Although requested, the committee did not receive a clear description of the “top five” risks to NextGen, nor any quantified representation of the top risks, whether schedule, cost, technical, or cultural. In the committee’s view, in an environment with an effective risk management process, the top several risks, whatever they were (and there will always be risks), would be well known and internalized by everyone.
With regard to specific risk drivers, the committee observed that some important choices and considerations are driven by what appear to be hardware fixed-points, rather than being driven by a systems architecture. In some ways, the engineering agenda seems to be set by assumptions about hardware procurement (e.g., the hardware selected for ADS and Data Comm). In such a case, incompatibilities, risks, overall system costs, and life-cycle trade-offs might not have been adequately considered and appropriately factored into the decisions that led to the selection of these hardware components and the incurring of their now-sunk costs. There are also risks caused by the protracted development cycles of ATC technologies.25 These challenges impede prospects for future evolution and impinge significantly on architecture. The current mandates for hardware would have benefited from in-depth architectural appraisal along with an analysis of trade-offs between hardware and software.
25 For instance, ADS-B has been under development since the 1990s and will not be fully in service until the 2020s, and ADS-B was developed with little consideration for cybersecurity concerns.
With regard to the specific question of schedule risk, in the committee’s view, the schedule risks in NextGen have multiple sources, including budget, approval, certification, and procedure design. With the exception of resourcing and budgets, architecture can help mitigate these. Risk and project management needs are well served by an effective architecture that can be used for risk assessment and planning. However, because the system architecture is still under development, it is a challenge to determine how well the overall system will address system requirements (e.g., for security and robustness), causing risks of many kinds, including schedule risks. A conventional cost and schedule risk analysis would need to assess the program variance in reaching particular objectives, but NextGen functional and performance objectives are not well defined or, worse, are inconsistently understood from stakeholder to stakeholder.
Finding: The risks to NextGen are not clearly articulated and quantified in order of importance, making it difficult to make sound decisions about how to prioritize effort and allocate resources.
Recommendation: The Federal Aviation Administration should use an architecture leadership community and a system architecture, with input from specialists in probability and statistics, as a key tool in managing and mitigating risks and in assessing new value opportunities.