1."
Integrity is an umbrella term that includes other important
telecommunications infrastructure requirements such as quality,
reliability, and survivability.
This paper discusses various dimensions of integrity in the NII.
Threats to integrity are presented and lessons learned during the
past decade summarized, as are efforts currently under way to
improve network robustness. Finally, this paper concludes that
architects and designers of the NII must take issues of integrity
seriously. Integrity must be considered from the foundation up; it
cannot be regarded as a Band-Aid.
Threats to NII Integrity
Network elements can fail for any number of reasons, including
architectural defects, design defects, inadequate maintenance
procedures, or procedural error. They can fail due to acts of God
(lightning, hurricane, earthquake, flood), accidents (backhoe, auto
crashes, railroad derailment, power failure, fire), or sabotage
(hackers, disgruntled employees, foreign powers). Architects and
designers of the NII should weigh each of these threats and perform
cost-benefit studies that include societal costs of failure as well
as first-time network costs. Users of the NII should understand
that failures will occur and should have contingency plans.
Over the past 10 years, public networks in the United States
have experienced failures resulting from most of the threats
described above. In May 1988, a fire in the Hinsdale, Illinois,
central office disrupted telecommunications services for 35,000
residential telephones, 37,000 trunks, 13,500 special circuits,
118,000 long-distance fiber optic circuits, and 50 percent of the
cellular telephones in Chicago 2.
Full service was not restored for 28 days. The failure affected air
traffic control, hospitals, businesses, and virtually all economic
sectors. Two months later, technicians in Framingham,
Massachusetts, accidentally blew two 600A fuses in the Union Street
central office. The local switch stopped operation, and calls from
35,000 residential and business customers were denied for most of
the day 3.
In November 1988, much of the long-distance service along the
East Coast was disrupted when a construction crew accidentally
severed a major fiber optic cable in New Jersey; 3,500,000 call
attempts were
blocked 4. Also in November 1988,
a computer virus infiltrated the Internet, shutting down hundreds
of workstations 5.
Several well-publicized SS7 outages occurred in 1990 and 1991
due to software bugs 6, 7. The first
had a nationwide impact and involved the loss of 65,000,000 calls.
Others involved entire cities and affected 10,000,000
customers.
In response to a massive outage in September 1991, the mayor of
New York established a Task Force on Telecommunications Network
Reliability. The task force noted that "the potential for
telecommunications disasters is real, and losses in service can be
devastating to the end user" 8.
Lessons Learned that are Applicable to the NII
Network infrastructure architects and designers have used
redundancy and extensive testing to build integrity into
telecommunications networks. They have recognized the critical role
that such infrastructure plays in society and are mindful of the
consequences of network failure. Techniques such as extensive
software testing, hardware duplication, protection switching,
standby power, alternate routing, and dynamic overload control have
been used throughout the network to enhance integrity.
A 1989 report published by the National Research Council
identified trends in infrastructure design that have made networks
more vulnerable to large-scale outage 9. Over the past 10 years,
network evolution has been paced by
changes in technology, new government regulations, and increased
customer demand for rapid response in provisioning voice and data
services. Each of these trends has led to a concentration of
network assets. Although additional competitive carriers have been
introduced, the capacity of the new networks has not been adequate
to absorb the traffic lost due to a failure in the established
carrier's network. End-user access to all carriers has been limited
by users' lack of familiarity with the use of access codes.
Economies of scale have caused higher average traffic cross
sections for various network elements. Fiber optic cables can carry
thousands of circuits, whereas copper cables carried hundreds.
Other technologies such as microwave radio and domestic satellites
have been retired from service in favor of fiber. When a fiber
cable is rendered inoperable for whatever reason, more customers
are affected unless adequate alternate routing is provided. The
capacity of digital switching systems and the use of remote
switching units have reduced the number of switches needed to serve
a given area, thus providing higher traffic cross sections. More
customers are affected by a single switch failure.
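The effect of concentration on the reach of a single failure can be illustrated with simple arithmetic. The sketch below is only an illustration: the serving-area size, the switch counts, the even spread of customers, and the function name are all hypothetical assumptions, not figures from the outages described in this paper.

```python
def customers_per_element(total_customers: int, num_elements: int) -> float:
    """Average number of customers cut off when one element fails.

    Assumes (hypothetically) that customers are spread evenly across elements.
    """
    if num_elements <= 0:
        raise ValueError("num_elements must be positive")
    return total_customers / num_elements


# Hypothetical serving area; none of these numbers come from the paper.
TOTAL_CUSTOMERS = 350_000

before = customers_per_element(TOTAL_CUSTOMERS, 10)  # ten smaller switches
after = customers_per_element(TOTAL_CUSTOMERS, 2)    # two large switches

print(f"Customers affected per switch failure, 10 switches: {before:,.0f}")
print(f"Customers affected per switch failure, 2 switches:  {after:,.0f}")
```

Under these assumptions, consolidating from ten switches to two raises the number of customers exposed to any one switch failure by a factor of five, even though total capacity is unchanged.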
In signaling, the highly distributed multifrequency approach has
been replaced by a concentrated common channel signaling system.
Also, call processing intelligence that was once distributed in
local offices is now migrating into centralized databases.
Stored program control now exists in virtually every network
element. Software technology has led to increased network
flexibility; however, it has also brought a significant challenge
to overall network integrity because of its "crash" potential.
Along with accidental network failures, there have been a number of
malicious attacks, including the theft of credit cards from network
databases and the theft of cellular electronic serial
numbers.
In regulation, the Federal Communications Commission has
mandated schedules for the introduction of network features such as
equal access. For carriers to meet the required schedules, they
chose to amalgamate traffic at "points of presence" and modify the
software at a small but manageable number of sites to meet the
imposed schedules. Hinsdale was one such site and, unfortunately,
the fire's impact was greater than it would have been without such
regulatory intervention because of the resulting traffic
concentration.
In my opinion, the most important lesson learned in the recent
past regarding telecommunications infrastructure integrity is that
we must not be complacent and assume that major failures or network
intrusions cannot happen. In addition to past measures, new metrics
must be developed to measure the societal impact of network
integrity and bring the scientific method of specification and
measurement to the problem 10.
Another lesson learned is that design for "single-point
failures" is inadequate. Fires cause multiple failures, as do
backhoe dig-ups, viruses, and acts of God. There has been too much
focus on individual network elements and not enough on end-to-end
service.
Software is another issue. We have learned that testing software
to remove all potential bugs is difficult if not impossible.
Software does not wear out like hardware, but it is a single point
of failure that can take down an entire network. Three faulty lines
of code in 2.1 million lines of instructions were enough to cripple
phone service in Washington, D.C., Los Angeles, and Pittsburgh in
nearly identical failures between June 26, 1991, and July 2,
1991.
Improving Network Robustness
In recent years, efforts to improve network robustness have been
redoubled. In addition to the work of individual common carriers,
there are many organizations that are addressing these problems,
including Bellcore, the National Security Telecommunications
Advisory Committee, the FCC, the Institute of Electrical and
Electronics Engineers, and the American National Standards Institute
Committee T1.
Exhaustive testing of new systems and new generic software
programs has been instituted by manufacturers and by Bellcore. New
technologies have been applied, including "formal methods." New
means have been developed and implemented to try to detect "bugs"
that previously would have gone undetected.
New network topologies have been implemented using bidirectional
SONET rings and digital cross-connect systems. The concept of
design for single-point failure has been supplemented to include
multiple failures. In cases where economical network design calls
for elimination of already sparse network elements, robustness has
become a consideration, and the reduction has not occurred.
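The difference between designing for single-point failures and designing for multiple failures can be sketched with a small connectivity check. The example below is a simplified model, not a description of how SONET protection switching works: the six-node ring, the added cross-connect link, and all function names are illustrative assumptions, and only plain reachability is modeled.

```python
from itertools import combinations


def is_connected(nodes, links):
    """Return True if every node can reach every other node over `links`."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:  # depth-first search from the first node
        current = stack.pop()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(nodes)


def surviving_fraction(nodes, links, failures):
    """Fraction of all `failures`-link cut combinations that leave the network connected."""
    cases = list(combinations(links, failures))
    survived = sum(
        1 for cut in cases
        if is_connected(nodes, [link for link in links if link not in cut])
    )
    return survived / len(cases)


def ring(n):
    """Links of an n-node ring: 0-1, 1-2, ..., (n-1)-0."""
    return [(i, (i + 1) % n) for i in range(n)]


nodes = range(6)                       # hypothetical six-office ring
ring_links = ring(6)
meshed_links = ring_links + [(0, 3)]   # hypothetical extra cross-connect link

print("ring, 1 cut: ", surviving_fraction(nodes, ring_links, 1))    # 1.0
print("ring, 2 cuts:", surviving_fraction(nodes, ring_links, 2))    # 0.0
print("mesh, 2 cuts:", surviving_fraction(nodes, meshed_links, 2))
```

In this model a plain ring stays connected after any single cut but is split by any pair of cuts, while one additional diverse link lets it ride through a majority of double cuts. That is the sense in which a design criterion limited to single-point failures leaves multiple-failure events uncovered.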
New metrics have been established to quantify massive failures
and reporting means have been implemented by the FCC. Standards
have been set to quantify the severity of network outages.
Means have been implemented to detect the theft of cellular
electronic serial numbers, and new personal identification
numbers have been used. There is increased awareness by the
employees of common carriers of the need for protection of codes
used to access proprietary databases and generic software.
Over the next 2 to 5 years, infrastructure robustness will be
enhanced through new procedures and network elements that will soon
be in production. Products deploying asynchronous transfer mode
(ATM) will give more flexibility in restoring a damaged network.
More parallel networks will be deployed which, if interoperable,
will add new robustness to the NII.
Current and planned research will enhance NII robustness in the
5- to 10-year window. Some of the research topics were recently
summarized in the IEEE Journal on Selected Areas in
Communications 11. Open issues
addressed in this issue included user survivability perspectives on
standards, planning, and deployment; analysis and quantification of
network disasters; survivable and fault tolerant network
architectures and associated economic analyses; and techniques to
handle network restoration as a result of physical damage or
failures in software and control systems. These subjects were
organized into four categories: user perspectives and planning;
software quality and reliability; network survivability
characterization and standards; and physical layer network
restoration, ATM layer network restoration, network layer
restoration, and survivable network design methods.
Conclusions
Over the past decade, we have learned many important lessons in
the design of telecommunications infrastructure that are applicable
to the NII. Although past networks have been designed with high
levels of integrity in mind, these efforts have not completely
measured up to the expectations of society. Recently, efforts have
been redoubled to improve network robustness.
As the NII is defined, it is important that integrity issues be
considered from the ground up. Only by these means will an NII be
constructed that meets the expectations of society.
Bibliography
1. Private communication with W. Blalock,
Bell Communications Research.
2. National Communications System. 1988.
"May 8, 1988, Hinsdale, Illinois Telecommunications Outage,"
August 2.
3. Brown, B., and B. Wallace. 1988. "CO
Outage Refuels Users' Disaster Fears," Network World, July
11.
4. Sims, C. 1988. "AT&T Acts to Avert
Recurrence of Long-Distance Line Disruption," New York
Times, November 26.
5. Schlender, B. 1988. "Computer Virus,
Infiltrating Network, Shuts Down Computers Around World," Wall
Street Journal, November 28.
6. Fitzgerald, K. 1990. "Vulnerability
Exposed in AT&T's 9-Hour Glitch," The Institute,
March.
7. Andrews, E. 1991. "String of Phone
Failures Reveals Computer Systems' Vulnerability," New York
Times, July 3.
8. City of New York. 1992. "Mayor's Task
Force on Telecommunications Network Reliability," January.
9. National Research Council. 1989.
Growing Vulnerability of the Public Switched Networks: Implications
for National Security Emergency Preparedness. National Academy
Press, Washington, D.C.
10. McDonald, J. 1994. "Public Network
Integrity - Avoiding a Crisis in Trust," IEEE Journal on
Selected Areas in Communications, January.
11. IEEE Journal on Selected Areas in
Communications, January 1994.