Read "Strategic Management of Information and Communication Technology: The United States Air Force Experience with Y2K" at NAP.edu

Suggested Citation:"4 Managing ICT Risk." National Research Council. 2007. Strategic Management of Information and Communication Technology: The United States Air Force Experience with Y2K. Washington, DC: The National Academies Press. doi: 10.17226/11999.

Chapter 4
Managing ICT Risk

Out of Y2K we’ve learned so much more about the way we really do business and rely on one another. …We have to find a way to flow this into the next level of what we’re going to work on. And whether it’s CIP (critical infrastructure protection) or whether it’s information assurance, they’re all part of the same problem. We need to maintain that continuity. …It’s going to be up to those of us who participated in Y2K to see to the success in critical infrastructure protection and information assurance and information warfare… that’s the future for our organization and, in a sense, for our country. (AFCIO/AFY2KO)

Many managers saw considerable linkage between their Year 2000 (Y2K) response efforts and ongoing efforts to manage other widespread, generic threats to critical infrastructure in general and information and communication technology (ICT) systems in particular. The central aspect of this linkage was the recognition that these efforts were more about managing risk than they were about fixing things.

Not every threat to the security and accuracy of information systems can be eliminated or anticipated; they must be managed as an unavoidable cost of increased reliance on these systems. “Military operations today are heavily dependent on globally shared critical infrastructures. Technological advances have interconnected these infrastructures, better enabling mission accomplishment anywhere in the world. While this connectivity better enables mission accomplishment, it also increases our vulnerability to human error, natural disasters, and physical or cyber attack” (USAF 1999a).

Similarly, the Y2K effort evolved from a focus on fixing ICT systems to a focus on how to manage an uncertain risk to critical information infrastructures. The unpredictability of Y2K stemmed from numerous interrelated sources, including the complexity of information technology (IT) systems and lack of clarity as to exactly how they worked, uncertainty surrounding the nature of the problem itself and how it could be identified, and uncertainty concerning the effectiveness and secondary impact of preventative measures. “Fear of the unknown drove the way Y2K was conducted” (AMC).

Given this high degree of uncertainty, much of the Y2K effort evolved into a massive risk management project. For nearly five years, people across the entire organization worked not so much to eliminate the Y2K threat as to limit the risk of its anticipated impact. “Every effort under Y2K was done to lower risk. … That drove all the decisions we made” (AFCA).

Yet, despite this overlap between Y2K and ongoing security and information assurance efforts, little formal effort was made to leverage the Y2K “investment” to improve management of these related critical ICT issues. “That’s a fundamental flaw…a matter of the Air Force’s priorities. …We’re missing an opportunity to invest wisely…[so as to] sustain some of the good that we had from Y2K” (AF/XOIWD).

This chapter clarifies and analyzes potential benefits from the Y2K experience for ongoing management of ICT security, CIP, and infrastructure assurance (IA).¹ It does this in two parts: (1) a discussion of the nature of the Y2K risk and the relationship between that risk and the responses it engendered, and (2) the drawing of lessons from Y2K for ongoing management of ICT risk and vulnerability.

¹	IA more commonly refers to “information assurance,” but as discussed throughout this work, the Y2K experience argues for an expanded notion of “ICT infrastructure” that includes not only hardware, data, and information, but also its use and management. Therefore, unless specifically noted otherwise, IA stands for the more encompassing notion of “infrastructure assurance.”

4.1
Understanding the Relationship Between Y2K Risk and Response

During Y2K, organizations had great difficulty clarifying the risks they faced. This had a major impact on the nature of the response to those risks. One striking aspect of the Y2K response was a highly reduced tolerance for risk.

4.1.1.
Reduced Risk Tolerance

As discussed in Chapter 2 (in particular, see Section 2.2), much of the difficulty in clarifying the risk to ICT from Y2K stemmed from issues involving the complexity of ICT systems and their environments. This complexity contributed to uncertainties about the nature of the Y2K problem itself. “The problem wasn’t understood. We had to assume that we would be operating in uncertainty” (374th AW/LG). In turn, uncertainties surrounding the nature of Y2K led to uncertainties about the threat to critical systems and the operations they supported. “People couldn’t quantify the risk” (AFY2KO).

Faced with an uncertain threat to highly critical operations, organizations significantly reduced their tolerance for Y2K risk, in some cases to zero. This occurred “not just within the Air Force but DOD [Department of Defense] wide, and probably even government wide” (AMC/HQ). The difficulty of quantifying a complex, multifaceted problem with a fixed deadline, coupled with an extremely reduced tolerance for risk, led to a response that treated anything associated with Y2K as a target for a broad effort to eliminate as much risk as possible. “If you could identify a problem you had to fix it. If you could theorize a problem you had to go after it” (AMC/SCA).

This broad and exhaustive effort frustrated those ICT managers who saw a need to distinguish specific Y2K threats by their likelihood and criticality. For example, the AMC/SCA “instituted a review board to have the programs in a technical sense try to defend why they should be certified. …[The board probed] the changes and how the program worked to determine the probability from the technical standpoint that they’d missed something or created some other problem…” (AMC/SCA). In other words, based on an understanding of its programs and of the nature of the problem, the board could have made a specific response, but for other reasons, specifically the unwillingness of senior leadership to accept anything other than zero risk, it had to apply a general, zero-risk response. “I found it very difficult to explain [differing levels of risk] to more senior leadership. … I can’t tell you how much time I could have saved. They basically said ‘Any risk at all, forget it’” (AMC/SCA).

Most ICT managers had never operated under a policy of zero risk tolerance, and they saw it as inappropriate for their situation. “Because of the atmosphere of paranoia, any kind of information that appeared would generate [exaggerated responses]….” If completion percentages were less than 100 percent, offices would “spend the next several weeks writing current reports on all the bases…and providing twice-a-week status reports to senior management listing every single piece of infrastructure that wasn’t complete, where it was, what the expected due date was, what the fix action was, and who the POC working it [was]. And most of that…was category 4 [lowest priority].” This was considered wasted time; they could have been working on more important problems (AMC/HQ).

ICT middle managers were sometimes caught between the broad requirements of senior leaders and the efforts of local workers to limit their workload by focusing on high-priority systems and issues.

We spent a lot of time categorizing all of our systems according to the mission criticality categories. … Nevertheless, on several occasions I spent two or three days answering questions from OSD about category 4, non-mission-essential systems. … I had to call the people who manage these systems for this information and they said, “Who cares? If it breaks we get out a pencil and piece of paper and nobody knows the difference.” I’d say, “I understand that, but I have to put this information together so it can go up to OSD next week for a big conference that’s going on.” (AMC HQ)

Clearly there were differences between the risk tolerance of ICT managers and that of senior leadership.

4.1.2.
Risk Tolerance of ICT Managers versus Senior Leadership

To a great extent, the frustrations of ICT managers stemmed from fundamental differences in their tolerance for risk versus that of senior leadership. As discussed in Section 2.4, ICT managers and senior strategic managers have significant differences in training, experience, work environment, and perspective. These differences led to very different responses to the uncertainty surrounding Y2K risk.

On the one hand, to the more politically motivated senior administrator, failure can be a career-threatening event. Senior managers saw Y2K as a predictable risk. From this perspective, the clear response was to anticipate the worst-case scenario and work to eliminate as much of this risk as possible. On the other hand, ICT managers saw failure as part of their job, something they understood and dealt with every day. As engineers, they viewed senior decision makers as trying to force an unrealistic level of software reliability and assurance on Y2K (AMC/SCA). ICT managers found it difficult to understand why levels of risk they routinely accepted were now unacceptable and, even worse, were causing a considerable drain on their time and resources. One manager reported being required to spend a week recertifying a system after a minor change because he could not guarantee that a failure would not occur, despite having informed his senior officer that the change had virtually no chance of causing a Y2K failure.

As senior managers took responsibility for Y2K, however, they also became frustrated. In their case, frustration stemmed from the inability to get clear, concrete answers to what they saw as basic questions, such as, “Is this system Y2K compliant?” or “Can you assure me that this will not experience a Y2K failure?”

4.1.3.
IT Industry Compliance Statements

Given the uncertain atmosphere surrounding Y2K and its accompanying risks, most organizations (including the United States Air Force, hereafter USAF or simply the Air Force) sought assurance from the IT industry itself. In developing their operational definition of “Y2K compliant,” these organizations relied heavily on Y2K compliance statements provided by system component manufacturers. Y2K workers either obtained these statements from individual system points of contact (POCs) (a highly duplicative effort) or found them posted in organization-wide inventories, such as the Air Force All Systems Inventory (AFASI).

Unfortunately, whether industry Y2K-compliance statements were applied bottom up or top down, they generally failed to reduce the uncertainties surrounding Y2K risk. Specifically, compliance of parts, components, and subsystems carried no guarantee of reliability when interacting with other components of the larger system, and industry Y2K-compliance statements were careful to state this. For example:

Our products depend on many aspects of your computer system for their correct operation. They will not be Year 2000 Compliant if the rest of your computer system, including software, hardware, firmware, and other aspects of the system or service including the operating system and BIOS, is not Year 2000 Compliant or is adversely affected by the Year 2000. (QLogic Corporation 1999)

In addition, component manufacturers could not be viewed as fully objective, since noncompliance of older products could lead to increased sales of newer ones. Here again, senior managers could not find the assurances they sought.

4.1.4.
Legal Factors

Another set of factors that greatly affected senior management’s tolerance of Y2K risk and further fueled the broad and exhaustive Y2K response was legal concern over assigning fault if Y2K disruption occurred. The Federal Y2K Act, passed in July 1999, specified that

A defendant who wishes to establish the affirmative defense of Y2K upset shall demonstrate, through properly signed, contemporaneous operating logs, or other relevant evidence that—(A) the defendant previously made a reasonable good faith effort to anticipate, prevent, and effectively remediate a potential Y2K failure. … (Y2K Act of 1999)

The Air Force’s good faith effort was couched under the even stronger phrase “due diligence.” “The responsible individuals, in this case…leadership, had to take an attitude of due diligence and awareness” (AFCIO/AFY2KO). However, due diligence was not formally defined; rather, it became a general sense of doing everything possible to assure operational capabilities. Nevertheless, due diligence became a fundamental aspect of the Air Force Y2K response, filtering down from senior leadership to all participants in the effort.

Due diligence will probably be the most important tenet after the dust settles. If your mission-critical or mission-essential system should fail due to a Y2K problem, you may find yourself testifying in court. Without evidence of due diligence, those involved could be held liable for the damage created by system failures—this is the case in the Air Force because the certifier must sign documentation to certify the system is Y2K compliant. (Ashton 1998)

Unfortunately, people charged with addressing what was already an unclear Y2K threat now had the added vague legal threat of due diligence to consider. This led to even more extreme efforts to eliminate what was viewed as a nonquantifiable risk. In other words, the only time workers were allowed not to fix a problem was when they “could guarantee zero probability that there would be the possibility of an error,” which was impossible to do (AMC/SCA).

These due diligence concerns made it difficult to control the scope of the Y2K response. “Because of due diligence, there was a fear that if we didn’t try something, then somebody could accuse us of not doing everything possible if anything at all went wrong” (AMC/SCA). The legal (and political) pressures of due diligence drove certification approval to higher levels, involving more auditing and inspection groups. These groups further advocated a broad response that treated all potential aspects of the Y2K problem equally, whether they were mission critical or not. According to the 375th AW/LG, the AMC Audit Agency’s position was, “If it’s got power, check it.”

4.1.5.
Politics and the Media

Even when not directly linked to due diligence, outside political and media pressure further heightened the level of Y2K response. On the one hand, this pressure was beneficial in helping to bring together the critical mass of people and resources needed to address the cross-functional, cross-organizational Y2K problem. On the other hand, this outside pressure, particularly in the form in which it came from the media, helped fuel the broad nonspecific response and zero tolerance policy. “In the beginning, there were basically two types of press coverage on Y2K.” These were, first, the kind that predicted disaster and, second, none at all. “In that kind of environment…the only information…[being disseminated is about] the next disaster that’s going to destroy western civilization. …That brought to bear all the political pressure that helped drive the zero tolerance policy and guided everything we did from there on” (AMC HQ).

Reports from the media could particularly affect senior administrators who lacked the familiarity with ICT that would allow them to discern media hype from accurate reporting. “Generals read press reports and called down, wondering about their cars” (AFCA). Whether or not it was the media who brought non-ICT infrastructure to the attention of senior leaders, this focus also had a major impact on Y2K risk management.

4.1.6.
The Inclusion of Non-ICT Infrastructure

The inclusion of non-ICT infrastructure in the Y2K effort was another significant factor behind the intensified drive to eliminate Y2K risk. The 374th AW/LG “spent months going through inventory…[including checking] if a Toyota Land Cruiser was Y2K compliant.” This expansion of Y2K to include more traditional infrastructure went far beyond cars to include power generators, alarms, refrigerators, traffic lights, air conditioning, and sprinklers. The uncertainty that drove this expansion stemmed from a tiny but legitimate risk from hardwired dates in embedded chips. The likelihood of an embedded chip relying on a hardwired date was small; the likelihood of that date having a Y2K error was even smaller; the likelihood that the chip’s function required sensitivity to the century was smaller still (a sprinkler, for example, might be sensitive to the day of the week or hour of the day); and the likelihood of a century-sensitive chip failure having a critical cascading functional impact was smaller again. Yet, because a faulty chip was so difficult to locate and impossible to fix (it required replacement), this issue became the public focus of Y2K, the so-called ticking time bomb widely reported in the media and discussed in Congress.
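The narrowing chain of likelihoods in the preceding paragraph can be written, as a sketch, with the chain rule of probability. The event labels are shorthand introduced here for illustration: D (the chip relies on a hardwired date), E (that date handling has a Y2K error), C (the chip’s function is century-sensitive), and K (the failure has a critical cascading impact). The report gives no numerical estimate for any factor.

```latex
\begin{aligned}
P(D \cap E \cap C \cap K) &= P(D)\,P(E \mid D)\,P(C \mid D \cap E)\,P(K \mid D \cap E \cap C) \\
&\le \min\{P(D),\; P(E \mid D),\; P(C \mid D \cap E),\; P(K \mid D \cap E \cap C)\}
\end{aligned}
```

Because each factor lies between 0 and 1, the joint probability can be no larger than its smallest factor, which is why the embedded-chip risk was tiny. It nevertheless remains strictly positive whenever every factor is positive, and under zero risk tolerance any positive value was enough to require checking each item.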

In the context of zero risk tolerance, meeting the embedded chip threat required a huge, almost never-ending effort. ICT managers recognized that this was a fundamentally different issue from the Y2K data and ICT systems issues they had been dealing with. Since traditional infrastructure items varied greatly by location and were generally managed at the local level, many Air Force ICT managers passed this burden on to the bases. Not surprisingly, delegating the chip issue to the bases had a significant impact on the Y2K effort at the local level. For some bases, this meant checking “anything with possible date ramifications. We received inquiries about refrigerators and sprinklers” (375th AW/CG). Even though one base may have had the same systems as another base, each chip was considered unique and therefore was checked. Thus, the effort was extremely arduous (“I wasted 10 months of my life” [374th AW/LG]) and could require what appeared to be considerable duplication. Were there existing organizational mechanisms for managing risk that could have brought greater order to this effort?

4.1.7.
COOPs and ORM

An organization such as the Air Force has considerable experience with conducting operations under risk conditions. How did existing mechanisms for managing risk impact the Y2K effort? Two related mechanisms that came into play to varying degrees during Y2K were Continuity of Operations Plans (COOPs) and Operational Risk Management (ORM).

Risk of disruption from a predictable threat can be reduced not only by addressing the threat itself but also by addressing its potential impact. As the perceived deadline for Y2K approached, the response effort shifted focus from fixing systems to preparing for continued function in the face of uncertain impact (see Section 1.3.2.3). In August 1998, GAO released its Year 2000 report on “Business Continuity and Contingency Planning,” which stated:

Time is running out for solving the Year 2000 problem. Many federal agencies will not be able to renovate and fully test all of their mission-critical systems and may face major disruptions in their operations. …Because of these risks, agencies must have business continuity and contingency plans to reduce the risk of Year 2000 business failures. Specifically, every federal agency must ensure the continuity of its core business processes by identifying, assessing, managing, and mitigating its Year 2000 risks. (GAO 1998b)

As an organization familiar with operating under threat of disruption, the Air Force already maintained plans for assuring continuity of operations if a disruption occurred. Faced with the need to establish Y2K continuity plans, Air Force leaders looked to their existing COOPs (as well as, to some extent, to ORM, as discussed below).

Review and exercise your continuity-of-operations plans: A Y2K test at Keesler Air Force Base, Miss., showed we couldn't simply rely on assurances that systems are Y2K compliant. During that May 11 and 12 test, compliant systems, including commercial off-the-shelf software, encountered Y2K anomalies. Ensure your COOPs cover your mission-critical processes—the ones you can't afford to shut down. Use operational risk management to assess which of your critical processes are most likely to be affected and how they would be affected. Review your COOPs to ensure you can get the job done even if computers fail. Ensure [that] your COOPs are resourced, particularly if you're depending on goods or services you don't control. Finally, ensure [that] you've thoroughly tested your workarounds. Think of Y2K as ability to survive and operate. (Ambrose 1999)

The effort to apply COOPs to Y2K continuity planning revealed inadequacies in the COOPs as plans for minimizing uncertain risk of widespread disruption. First, given the high degree of personnel turnover and reassignment in the military, COOPs tended to be more about job continuity than consequence management. The focus was on individual unit activity, not overall mission operation with the possibility of uncertain disruption. There was little uniformity in the various unit or even base plans and little attention to interdependencies outside a given unit’s control. “Each squadron and group had a COOP, but we needed an overall plan for the base” (374th AW/CS).

Second, given reductions in personnel and the accompanying increased effort to meet ongoing operational needs, COOPs were given extremely low priority and were rarely reviewed; in some cases, they could not even be located during base visits. In most cases, the backup plan was “just do things like we were doing them using pen and paper” (374th AW/OG).

Nevertheless, Air Force Y2K leaders made a concerted, though initially ad hoc, effort to build on the existing COOP structure to develop a cross-organizational plan for continuity of critical missions while facing uncertain Y2K disruption. The focus of this centralized effort was interdependency—how to continue mission accomplishment with the possibility of disruption to mutually dependent, separately controlled entities. Over time, this focus was extended to include organizations and communities outside the Air Force. For example, a logical extension of a COOP was a community outreach program based on the president’s program (AFCA).

Once Y2K managers began looking at interdependencies, it became difficult to delineate where these interdependencies stopped. Like other aspects of the Y2K response, the effort to use COOPs to reduce the risk of disruption became exhaustive, especially as the new century approached. To help prioritize the ever-widening range of continuity efforts, some Y2K leaders looked toward another Air Force mechanism for managing risk, namely, ORM. “Operational Risk Management was applied to contingency planning” (375th AW/SC). In the Air Force, ORM was a relatively recent development focused on predictable risks to safety. “The [Air Force] always deals with risk, but…the culture has become not to take any predictable risk. Aircraft crashes are predictable, so the [Air Force] will try to reduce them to zero. …ORM…[is] a systematic approach to get the risk out in the open for evaluation” (AFCA). Since Y2K was considered a predictable risk, it made sense to attempt to use ORM as a tool to guide response efforts by evaluating the various levels of risk.

Unfortunately, efforts to guide the Y2K response strategy based on such ORM classifications as criticality of impact and likelihood of occurrence, though not without some benefit, had minimal impact. As discussed in Section 2.2, these efforts were hampered by the technical complexity of ICT systems and nontechnical factors of the ICT environment. Other than for the highest-level Commander in Chief (CINC) thin-line systems, it became too difficult to discriminate different levels of Y2K problems and responses; thus, “for all practical purposes, every system went through the same level of scrutiny” (AMC/SCA).
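To make the contrast concrete, the following minimal Python sketch compares a likelihood-times-severity triage (a common simplification of an ORM-style risk assessment matrix) with the zero-tolerance rule described earlier in this chapter. The 1-to-4 scales, the threshold of 6, and the example inventory are hypothetical assumptions for illustration. They are not taken from Air Force ORM guidance or from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemRisk:
    """One inventory item with hypothetical ORM-style ratings."""
    name: str
    likelihood: int  # assumed scale: 1 = remote ... 4 = frequent
    severity: int    # assumed scale: 1 = negligible ... 4 = catastrophic


def risk_based_scrutiny(item: SystemRisk, threshold: int = 6) -> str:
    """Triage by likelihood x severity: only high scores get a full review."""
    score = item.likelihood * item.severity
    return "full review" if score >= threshold else "spot check"


def zero_tolerance_scrutiny(item: SystemRisk) -> str:
    """Any nonzero likelihood demands a full review, whatever the severity.

    The assumed scale starts at 1 (no item can be rated at zero
    likelihood), so every item falls into the full-review branch.
    """
    return "full review" if item.likelihood > 0 else "no review"


# Hypothetical example inventory; names and ratings are illustrative only.
inventory = [
    SystemRisk("CINC thin-line system", likelihood=2, severity=4),
    SystemRisk("category 4 administrative tool", likelihood=1, severity=1),
    SystemRisk("sprinkler controller", likelihood=1, severity=2),
]

for item in inventory:
    print(f"{item.name}: ORM triage -> {risk_based_scrutiny(item)}; "
          f"zero tolerance -> {zero_tolerance_scrutiny(item)}")
```

Under the likelihood-times-severity rule, only the thin-line system is flagged for full review. The zero-tolerance rule sends all three items to full review, which matches the observation that “every system went through the same level of scrutiny.”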

While complexity issues were central to the difficulties in applying ORM to Y2K, there were organizational issues as well. ORM was seen as a process applied by specific offices associated with safety. It was difficult to suddenly turn ORM into a general way of thinking that could guide a cross-organizational risk reduction effort. “The Office of Primary Responsibility (OPR) is usually responsible for ORM training. Safety and Manpower usually apply ORM to most things to assure a common understanding in the case of, for example, accident investigations. ORM should be decentralized. It’s a concept, not an application” (375th AW/SC).

Though little formal effort was made to apply ORM to addressing potential Y2K problems in ICT systems, there was a more formal effort to apply ORM to Y2K continuity planning.

We told people in the guidance for making COOPs to go through an ORM process. Deal with your safety guys and find out how to do [it]. We didn’t do that in the software; we didn’t do that in the infrastructure; we did it basically in the COOPs. (AFCA)

However, since ORM categories and procedures were not clearly defined or understood, the uneven application of ORM to COOPs could actually confuse rather than clarify the continuity planning effort. AFCA asked bases to use an ORM approach to COOPs using prioritized risks, such as complete loss of service and generation of bad data, but “the development of COOPs occurred at different levels, some of them very high and some of them very low. And what we found on a lot of our strike teams was a marrying of those levels [that] became a cloud [of general assumptions that everything would work out]” (AFCA).

In fact, some local units used ORM to justify a reduced focus on their ICT systems. Since central management was already working on most of the systems on a base, an ORM analysis could lead a base to focus more on traditional infrastructure, such as power and telecommunications, especially when those systems were not under their control (that is, they were provided by off-base facilities).

4.1.8.
Effectiveness and Appropriateness of the Response

Many lessons for ongoing management of ICT risk can be drawn from the overall story of Y2K risk and response. Before drawing out the most important of these lessons, however, it seems appropriate to briefly address what was once a highly visible issue: the effectiveness and appropriateness of the Y2K response. While this study was never geared toward formally assessing the effectiveness of the Y2K response, this section on the relationship between Y2K risk and response might be viewed as incomplete without a few general comments on this complex and potentially controversial subject.

At the most basic level, the question could be asked, Was the Y2K response effective? If “effective” is defined simply by outcome, then the answer is yes. There was a complex problem; changes were made; in the end, there was little impact. If “effective” entails a sense of how much change was made, however, the answer would depend on how change is defined and counted. The anecdotal evidence indicates there was far more change to IT equipment and systems than to non-IT infrastructure. Changes to IT infrastructure may have impacted up to half of the IT inventory items; changes to more traditional infrastructure items probably ranged between 2 and 5 percent (AFCA). As discussed throughout this report, changes to operational and management practice could be far more significant, though less easy to quantify.

Did some of the resources to address Y2K go toward changes that were needed outside of the Y2K effort and would have occurred anyway? Given the interconnectedness of ICT systems, changes to one part of the system invariably impact other parts. Therefore, it is extremely hard to separate changes that addressed Y2K from other associated problems in the system. This was especially true where upgrades were seen as part of the Y2K solution. For instance, software upgrades could make hardware obsolete (as when an e-mail system was moved from Microsoft Mail to Outlook), or firmware upgrades could require the replacement of routers (374th AW/CS). Some of these upgrades and replacements, made with Y2K resources, addressed maintenance issues that existed independently of the Y2K situation. For example, the 374th AW/CES replaced generators dating back to 1948.

This leads to perhaps the most difficult question: Was the magnitude of the Y2K response proportional to the Y2K risk faced? It is extremely difficult to tie specific remediation efforts to the eventual outcome or to precisely quantify the costs of those efforts. To some, “the real loss was the functionality that we didn’t have because we were working on Y2K. …All the manpower [not accounted for in the costs] that we spent on Y2K [could have been allocated to more meaningful tasks],” such as implementing new functionality (AMC/HQ).

Given all the factors discussed thus far, it certainly was difficult for Y2K managers to determine when their efforts were cost-effective. The combination of reduced risk tolerance and a situation where risk could never be eliminated meant that there was little rationale for declaring a Y2K activity to be completed. Instead, Y2K workers ran a race against the clock to continually reduce risk, with little basis for determining when a given effort was sufficient.

Toward the end, the questions were coming down from on high: “Have we done enough? How sure do you feel?”… A contractor came forward at one of the OSD meetings and said, “We’ve tested a couple of your systems and we’ve found a couple of errors.” That just drove people crazy. “You mean we haven’t solved every error?” Then we expanded this risk aversion out to the overseas bases. There’s nothing wrong here [but] let’s check the countries that are giving any kind of infrastructure support to the host base. … Even to the last moment… if you had any resources left, [you had to find] something else to do. …How much risk [should we have been] willing to accept [given our] mission operation? I can’t answer that, but if you take that train of thought you’re going to lead to that same paranoia we had. … I don’t know what enough is. [At some point] you have to say, “This is enough” and then live with that. (AFCA)

Thus, from the perspective of many ICT workers who were used to dealing with constant risk to software and systems, the unusually reduced tolerance for risk in the face of Y2K uncertainty led to an excessive response. “It was a due diligence action that…wasted…[a] lot of time and money doing things that were completely unimportant” (AMC/HQ). With IT, risk is assumed simply by using it, a point upper management did not understand. “Every one of our programs has a number of problems that are always being uncovered and always being fixed. …Was the cost of the response equal to the potential cost of harm? …Absolutely not. I think we spent a lot more than was necessary…” (AMC/SCA). “There was a genuine risk,” but because of the publicity surrounding it and the abundance of funding available, “it was overstated” (374th AW/CS).

For numerous reasons, however, many of which had little to do with technology, senior leadership took the position that disruption was not acceptable and that if it did occur, it would not be for lack of attention or effort. For leaders who viewed Y2K primarily as a predictable and potentially highly disruptive threat, this was a reasonable stance to take. An effort that went beyond the usual cost justification was acceptable as long as disruption was minimized, which it was. If a similar degree of success could have been achieved with less effort, that does not negate Y2K as a real problem, nor does it mean that future efforts to manage risk should not receive comparable attention and support.

Far more important than trying to determine whether the Y2K response went beyond cost-effectiveness is the recognition that Y2K was an important new experience and that we need to learn from it. For most senior leaders, it was their first experience in being responsible for an enterprise-wide, mission-driven, highly uncertain ICT problem. The goal at this point should not be to determine the cost-effectiveness of the Y2K effort but, rather, to continually improve handling what is an extremely complex and dynamic job—managing ICT in general and ICT risk in particular.

We need to treat CIP and information warfare and where we go post-Y2K in some of the same ways we treated Y2K. We have to… bring it to the attention of the leadership in such a fashion that they understand it, so that these don’t fall off the table like Y2K almost did. (AFY2KO)

4.2
Application to Security, CIP, and Infrastructure Assurance

Even though Y2K included a massive, multiyear ICT risk management effort, that effort did not significantly impact the ongoing programs for addressing threats to ICT. One of the main reasons for this was that Y2K represented a different kind of ICT risk that did not fit neatly under the existing categories of ongoing effort: security, critical infrastructure protection, and infrastructure assurance. Security and CIP focus on deterring hostile threats, while infrastructure assurance focuses on mitigating the impacts of those threats.

Infrastructure Assurance: Preparatory and reactive risk management actions intended to increase confidence that a critical infrastructure’s performance level will continue to meet customer expectations despite incurring threat inflicted damage. For instance, incident mitigation, incident response, and service restoration. (PCCIP 1997)

Probably the central lesson of the Y2K experience for ongoing management of ICT risk was the recognition that serious and costly threats could stem not only from the intentional action of a conscious enemy but also from the unintentional consequences of our own actions, confounded by complexities of the ICT system itself and our inability to adequately manage those complexities. Y2K argues that we should expand our notion of infrastructure assurance to include these unintentional, systemic threats.

4.2.1.
Intentional versus Systemic ICT Risk

The 1997 report of the President’s Commission on Critical Infrastructure Protection focused on issues stemming from intentional actions of hostile enemies. “A threat is traditionally defined as a capability linked to hostile intent” (PCCIP 1997). In the commission’s report, the categorization of risk was based primarily on the nature of the target. Physical threats were threats to such physical assets as power stations, pipelines, telecommunications facilities, bridges, and water supplies. Cyber threats were threats to computer systems, especially the information-carrying components of those systems—data and code. “The Commission focused more on cyber issues than on physical issues, because cyber issues are new and not well understood. We concentrated on understanding the tools required to attack computer systems in order to shut them down or to gain access to steal, destroy, corrupt or manipulate computer data and code” (PCCIP 1997).

In its consideration of physical vulnerabilities, the commission acknowledged both natural and man-made threats: “Infrastructures have always been subject to local or regional outages resulting from earthquakes, storms, and floods. … Physical vulnerabilities to man-made threats, such as arson and bombs, are likewise not new” (PCCIP 1997).

Similarly, in its consideration of new cyber vulnerabilities, the commission acknowledged natural threats in the form of accidents and negligence while focusing its energies almost entirely on man-made threats, which range “from prankish hacking at the low end to organized, synchronized attacks at the high end” (PCCIP 1997). Following this report, the next year’s Presidential Decision Directive (PDD) 63 focused on intentional man-made “attacks” on cyber systems (even as Y2K was increasingly expanding our awareness of and attention to the more “natural” systemic components of ICT risk).

II. President’s Intent: It has long been the policy of the United States to assure the continuity and viability of critical infrastructures. President Clinton intends that the United States will take all necessary measures to swiftly eliminate any significant vulnerability to both physical and cyber attacks on our critical infrastructures, including especially our cyber systems. (EOP 1998b)

No one can deny the importance of ICT security, CIP, and the high degree of attention required to address intentional threats to critical systems. Nevertheless, Y2K illuminated the magnitude of another category of cyber threat—unintentional threats from the complexity of the system itself and our inability to fully manage it. As discussed throughout this report, the Y2K systemic threat went far beyond accidents and negligence to the heart of how ICT systems evolve over time and are used to achieve organizational goals and accomplish mission objectives.

The Y2K experience also revealed fundamental differences between intentional cyber threats and systemic ones. Hostile intentional threats originated primarily from outside the ICT system (although this includes outsiders who gain access to the inner workings of the system); systemic threats originated from the nature of the system itself, including the complexities of its interrelated subsystems, the environments within which it exists, and the ways it is managed and maintained (as discussed in Chapters 2 and 3). Intentional threats presume an adversarial relationship, with a general goal of deterrence; systemic threats presume an interdependent relationship, with a general goal of improved communication and coordination of multiple perspectives, objectives, and tactics.

These two types of threat generate two categories of ICT risk: (1) intentional risk from outside disruption of functionality, and (2) systemic risk that is often the price of increased functionality itself. Y2K was a symptom of the second type of risk. Furthermore, Y2K revealed that the uncertainties surrounding systemic risk could be as great as, if not greater than, the uncertainties surrounding hostile enemy attack. For one thing, responsibilities for addressing intentional risk are easier to identify, because it is primarily an us-versus-them scenario. Responsibilities for addressing systemic risk are much harder to identify. In fact, identifying responsibility for systemic risk can be the toughest issue to address, particularly when there is a mismatch between functional nodes of responsibility and a potential problem located between those nodes.

Despite the differences between intentional and systemic risk, ICT managers need to address both types of risk within a coherent strategy. In some instances, these two elements of risk overlap, allowing a single tactic such as continuity planning to minimize the potential impact of both. In other instances, efforts to address these types of risk become competing desirable ends that need to be balanced along with the many other competing desirable ends of ICT, such as functionality, usability, and maintainability.

4.2.2.
Enterprise-wide ICT Risk Management

As with ICT management in general (discussed in Chapter 3), management of the various types of ICT risk requires an enterprise-wide perspective that carefully considers and appropriately balances the many competing dynamic demands on ICT systems. This means that efforts to manage intentional and systemic risk need to be integrated not only with each other but also with other desirable ICT goals. For example, user behavior that increases ICT risk can stem from tensions between security and desired functionality, as when users punch a hole in a firewall, or remove it altogether, to accommodate a legacy system that cannot work with it (AFCIC/SY).


Conversely, actions taken to increase security can adversely affect user functionality. Many IT security measures degrade capability, often with unknown consequences. For example, when an e-mail attachment or signature profile is infected with a virus, it is blocked. For a war-fighting CINC, this loss of in-transit visibility means that the information on an airplane’s contents (namely, cargo or personnel) is lost. When users detect a virus, their Simple Mail Transfer Protocol (SMTP) and Internet Protocol (IP) traffic is blocked. For a 24-hours-a-day, 7-days-a-week operation, this denial of service can mean the loss of an enormous amount of data. Further compounding these issues is the breadth of activity that is affected: Many units operate outside of Air Force bases and even outside the DOD (AMC/SCA). “We’re still not organized in how we will deal with balancing security and information flow needs” (AF/XOIWD).

In addition to interdependencies between security and functionality, there are also interdependencies between efforts to manage ICT risk and efforts to acquire, develop, and field systems. Sometimes these interdependencies can lead to tensions in the relationship between the risk management and development communities. For example, units responsible for fielding new systems can be frustrated by the lack of uniformity across the organization, especially in the diverse operating environments. Different servers in the same environment may have different disk drives, use different versions of the database management system, and run different versions of the operating system. “As a result of that we can’t distribute software in a rational manner” (SSG).

On the other hand, units responsible for risk management can take a different view of this situation. From the perspective of information warfare, diversity makes it more difficult for an adversary to figure out how to breach a system. “If every piece of software is absolutely standardized, one hole gets you in everywhere. …That’s a fundamental point that’s almost always missed” (AF/XOIWD). “Using the same system on every base is a double-edged sword. … They figure out what to do with it, and they’re going to attack everybody. … That’s one of the reasons why we want some variety out there” (AFCERT).

Still other potential tensions can be seen between the goals of risk management and the informational needs for management of integrated systems (see Chapter 3, Section 3.11). ICT managers cannot handle such issues as version control and configuration management without regular gathering and dissemination of system information, yet the restriction of this same information may be necessary for security. Y2K informational efforts brought out this tension. “The cause and tension that really needs to be acknowledged with the issue of classifying the AFASI is that while this database is useful [for ICT managers], it also is your adversaries’ targeting database; therefore, there is a rationale for classification. The tension is between usability for continuing IT management and not giving your adversary… the keys to your kingdom” (AF/XOIWD).

These various interrelationships and tensions indicate that strategic risk management, like strategic ICT management in general, is a cross-organizational activity best approached from an enterprise-wide perspective. This lesson was learned during Y2K, but even though many ICT managers saw its relevance to ongoing security and risk management efforts, there was little actual transfer. “Although we learned that Y2K was an operational problem—not just the purview of the SC—we fundamentally have handed CIP to the SC to do. This means that with CIP, people are fighting the same battles we had to fight [with] Y2K” (AF/XOIWD).

Clearly, the Y2K experience was relevant to ongoing ICT risk management, though that relevance still needs to be captured and assimilated. To help in this effort, the following section includes lessons of the Y2K experience that can be incorporated into strategic ICT risk management efforts.

4.2.3.
Lessons of Y2K for Strategic ICT Risk Management

Based on the discussion thus far, two central lessons for the ongoing, strategic management of ICT risk can be drawn from the Y2K experience.

Expand the notion of infrastructure assurance to include unintentional, systemic risk, and integrate efforts to address systemic risk with more established efforts to address hostile, intentional risk.
Manage ICT risk from an enterprise-wide perspective, balancing and incorporating efforts to achieve the goals of security, CIP, and infrastructure assurance with the many other interrelated efforts to achieve ICT goals.

In addition to these two general lessons, additional lessons of Y2K can be applied to the ongoing management of ICT risk. Many are versions of the general management lessons discussed in Chapter 3, applied to the issues of security and risk.

In risk management efforts, increase the focus on the use of data and information to achieve organizational goals.

As discussed in Chapter 2, Section 2.3, strategic ICT management needs to shift its central focus from hardware and software to data, knowledge, and organizational goals. Similarly, risk management needs a greater focus on data and information corruption issues, which span both intentional and systemic ICT risk.

As we move increasingly into a world where critical actions are taken based on electronic output, corruption of data and information (whether from hacker maliciousness or systemic complexity) becomes the element of ICT risk that has the highest impact and is the most difficult to recognize.

Integrate risk management with life-cycle management of ICT systems.

Section 3.10 discussed the importance of addressing cross-boundary organizational issues in the life-cycle management of systems. For risk management, this means that life-cycle issues such as version control and configuration management need to be integrated with security and infrastructure assurance efforts. “The real basis of information assurance…is maintaining accurate inventory systems, making sure that configurations are controlled and managed and making sure that all the settings on the firewall are the same on every Air Force base” (SSG). “You need to know what you’re defending in order to do critical infrastructure protection. …This [knowledge] has the possibility of atrophying very quickly…[without a] resource stream to support its continued viability” (AF/XOIWD). Y2K showed the importance of knowing not only what is being protected but also the current state of that protection.

The information needed to support life-cycle management of ICT also represents a security risk, however, as discussed in Section 3.11. “We have to have an up-to-date inventory. But there again, it’s a double-edged sword. We have to protect that information; otherwise, somebody else is going to use it” (AFCERT). This further increases the need to integrate ICT risk management with life-cycle management.

Clarify how risk information is disseminated.

Another information issue associated with risk management is the dissemination of risk-related information. Y2K demonstrated that there could be considerable confusion about how this occurs, especially in a large, security-conscious organization. Therefore, the dissemination of risk-related information needs to be a coherent component of the enterprise-wide, strategic management of ICT.

Extend collaboration on risk management beyond the organization.

Y2K emphasized the importance to risk management of collaboration and information sharing outside the organization. As discussed in Section 3.13, the Chief Information Officer’s (CIO’s) office should serve as the single point of contact for ICT coordination outside the organization. Actual cooperation and communication among organizations, as coordinated by the CIO’s office, might well be undertaken at various levels.

Address funding barriers to enterprise-wide risk management.

Just as funding issues can become a barrier to overall ICT management (as discussed in Section 3.8), so too can they represent a barrier to enterprise-wide ICT risk management. Security is an area where funding is often available and where “stovepipe” efforts to gain that funding can work against cross-organizational coordination. In most cases, an enterprise approach to risk management is both functionally superior and more cost-effective. “Until SPOs (system program offices) and requirements writers and commanders understand information assurance, we’re not going to have it built into the system. … [Some say] security is too expensive to build in up front, but it’s a lot more expensive to put in later” (AFCIC/SY).

Distinguish day-to-day functional issues from enterprise-wide issues.

Y2K demonstrated the importance of distinguishing day-to-day operational issues from cross-organizational strategic issues. The strategic approach of senior leaders was not always applicable to individual issues of ICT risk, nor was the functional approach of ICT managers always applicable to enterprise-wide strategic risk issues. In adopting an enterprise-wide approach to risk management, it is important to distinguish day-to-day functional issues that can be handled better and more efficiently at the local level from higher-level, cross-organizational issues that require more central, strategic management.

Adopt and apply existing safety-oriented approaches to ICT risk management.

As discussed in Section 4.1.7, safety-oriented approaches to risk management, such as COOPs, ORM, and Operationalizing and Professionalizing the Network (OPTN) were only marginally applied to the Y2K situation. With modification, these approaches can help formalize cross-organizational risk management, but the transition from safety to ICT is not trivial.

For many, the Y2K effort that was most clearly relevant to ongoing ICT risk management was the COOP initiative. While this effort revealed a number of inadequacies in the creation and maintenance of existing COOPs, the Air Force can build on this learning experience. COOPs are a highly applicable way to minimize ICT risk, whether from hostile enemy action or systemic complexity, but Y2K showed that they need to be far more rigorous in both creation and maintenance. In addition, given their origins in traditional disaster planning, COOPs need to be more sophisticated in accounting for the complexities of ICT systems.

Do not return to business as usual.

As discussed in Section 2.7, after Y2K there were many reasons why managers sought a return to more comfortable, less enterprise-wide methods of managing ICT.

Nevertheless, the crisis mentality of Y2K stimulated enterprise-wide approaches to ICT that produced benefits for related security and infrastructure protection issues. ICT risk is always with us, and even in the absence of immediate crisis, it is critical to resist the seemingly easy path of a return to business as usual. Certainly the events of 9/11 have cemented this lesson.

Recognize the possible need for special regulations in support of ICT risk management.

Y2K helped the Air Force recognize that, in many ways, ICT risk management was different from managing risk in a more traditional infrastructure. These differences could require special regulations and more centralized, cross-functional management. For example, stricter regulations are needed on the use of International Merchant Purchase Authorization Cards to make ICT purchases outside the funding cycle.

Recognize the need for special training on ICT risk management.

Y2K demonstrated that ICT risk management could call for special training as well as special regulations, particularly in support of users. For example, some units recognized the need to give users more exposure to network issues, since users represented “an internal vulnerability” (374th AW/XP).


Consider not only the question of why complex systems fail but also why they do not.

One of the hidden lessons of Y2K was that complexity not only increases system vulnerability but also reduces it, through redundancies and other inherent backups and alternative functions. As an organization with experience in attacking infrastructure, the Air Force knows that disrupting critical infrastructure is not a trivial undertaking. Y2K revealed that the Air Force is not yet completely dependent on ICT systems, and perhaps that it does not want to be. Most units are trained in operating without computers, and they perform quarterly exercises that involve using manual forms. This, of course, slows down the procedure and causes a slight degradation of service; moreover, during wartime it would be “an enormous manpower drain.” However, it can be done (374th AW/LG). In part, this is an issue of trust in technology as well as a realization that, ultimately, people enable our systems to function.

Understanding why systems are resistant to failure is an important component of learning to better protect them. The rhetoric of cyber warfare is that infrastructures fail rapidly, yet Y2K indicated that “information infrastructure may be more robust than people assume” (AF/XOIWD). Did the small scale of Y2K disruption result from organizations solving all their problems, or did the infrastructures have an inherent robustness that we need to better understand? More study is needed to explore this question.

Establish a permanent, enterprise-wide point of contact for ICT risk management.

Finally, the difficulty in capturing and applying the lessons of Y2K, even after a multiyear, multibillion-dollar, cross-organizational effort, indicates the need for better methods of absorbing new ICT policies and practices into organizational structure and culture. As ICT systems open new operational possibilities, they also call for increased coordination and organizational flexibility.

Like the many other aspects of ICT management discussed throughout this report, ICT risk management requires a permanent, cross-organizational point of contact under the guidance and auspices of the CIO (as discussed in Section 3.13). Only such an entity, bringing together not only knowledge of security issues but also multiple perspectives on the organizational roles and goals of ICT, can take on the complexity of enterprise-wide, strategic ICT risk management.
