Several well-publicized SS7 outages occurred in 1990 and 1991 due to software bugs [6, 7]. The first had a nationwide impact and involved the loss of 65,000,000 calls. Others involved entire cities and affected 10,000,000 customers.
In response to a massive outage in September 1991, the mayor of New York established a Task Force on Telecommunications Network Reliability. The task force noted that "the potential for telecommunications disasters is real, and losses in service can be devastating to the end user" [8].
Network infrastructure architects and designers have used redundancy and extensive testing to build integrity into telecommunications networks. They have recognized the critical role that such infrastructure plays in society and are mindful of the consequences of network failure. Techniques such as extensive software testing, hardware duplication, protection switching, standby power, alternate routing, and dynamic overload control have been used throughout the network to enhance integrity.
A 1989 report published by the National Research Council identified trends in infrastructure design that have made networks more vulnerable to large-scale outages [9]. Over the past 10 years, network evolution has been paced by changes in technology, new government regulations, and increased customer demand for rapid response in provisioning voice and data services. Each of these trends has led to a concentration of network assets. Although additional competitive carriers have been introduced, the capacity of the new networks has not been adequate to absorb the traffic lost due to a failure in the established carrier's network. In addition, end-user access to alternative carriers has been limited by a lack of familiarity with the use of access codes.
Economies of scale have caused higher average traffic cross sections for various network elements. Fiber optic cables can carry thousands of circuits, whereas copper cables carried hundreds. Other technologies such as microwave radio and domestic satellites have been retired from service in favor of fiber. When a fiber cable is rendered inoperable for whatever reason, more customers are affected unless adequate alternate routing is provided. The capacity of digital switching systems and the use of remote switching units have reduced the number of switches needed to serve a given area, thus providing higher traffic cross sections. More customers are affected by a single switch failure.
In signaling, the highly distributed multifrequency approach has been replaced by a concentrated common channel signaling system. Also, call processing intelligence that was once distributed in local offices is now migrating into centralized databases.
Stored program control now exists in virtually every network element. Software technology has led to increased network flexibility; however, it has also brought a significant challenge to overall network integrity because of its "crash" potential. Along with accidental network failures, there have been a number of malicious attacks, including the theft of credit cards from network databases and the theft of cellular electronic serial numbers.
In regulation, the Federal Communications Commission has mandated schedules for the introduction of network features such as equal access. To meet those schedules, carriers chose to amalgamate traffic at "points of presence" and to modify the software at a small, manageable number of sites. Hinsdale was one such site and, unfortunately, the resulting traffic concentration made the fire's impact greater than it would have been without such regulatory intervention.
In my opinion, the most important lesson learned in the recent past regarding telecommunications infrastructure integrity is that we must not be complacent and assume that major failures or network intrusions cannot happen. In addition to past measures, new metrics must be developed to measure the societal impact of network integrity and bring the scientific method of specification and measurement to the problem [10].
Another lesson learned is that design for "single-point failures" is inadequate. Fires cause multiple failures, as do backhoe dig-ups, viruses, and acts of God. There has been too much focus on individual network elements and not enough on end-to-end service.
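The gap between single-point and multiple-failure design can be illustrated with a small sketch. The Python below is not taken from any carrier's planning tool; it assumes a hypothetical four-office ring (offices A, B, C, D, with invented link names) and simply enumerates every set of k simultaneous link failures to see which ones break end-to-end service between two offices.

```python
from itertools import combinations


def connected(nodes, links, src, dst):
    """Return True if dst is reachable from src over the given links."""
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in adjacency[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return False


def failure_sets_that_cut_service(nodes, links, src, dst, k):
    """List every set of k simultaneous link failures that isolates src from dst."""
    cuts = []
    for failed in combinations(range(len(links)), k):
        surviving = [link for i, link in enumerate(links) if i not in failed]
        if not connected(nodes, surviving, src, dst):
            cuts.append([links[i] for i in failed])
    return cuts


# Hypothetical topology (an assumption for illustration): a ring A-B-C-D-A.
NODES = ["A", "B", "C", "D"]
LINKS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

single = failure_sets_that_cut_service(NODES, LINKS, "A", "C", k=1)
double = failure_sets_that_cut_service(NODES, LINKS, "A", "C", k=2)

print(len(single), len(double))  # prints "0 4"
```

In this toy ring, no single link failure interrupts service between A and C, so the design passes a single-point-failure review. Four of the six possible double failures do interrupt it, which is the kind of end-to-end exposure that an element-by-element analysis would miss.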