
The Internet's Coming of Age (2001)

Chapter: 2 Scaling Up the Internet and Making It More Reliable and Robust

Suggested Citation:"2 Scaling Up the Internet and Making It More Reliable and Robust." National Research Council. 2001. The Internet's Coming of Age. Washington, DC: The National Academies Press. doi: 10.17226/9823.


2 Scaling Up the Internet and Making It More Reliable and Robust

BUILDING A BETTER INTERNET

The Internet has become a place where many live, work, and play. It is a critical resource for many businesses that depend on e-commerce. Indeed, when attacks are made on Internet infrastructure or commonly used Web sites like CNN, Yahoo!, and the like, they become front-page news.1 As a consequence, the Internet must become and remain more robust and reliable. Reflecting demand for its capabilities, the Internet is expected to grow substantially worldwide in terms of users, devices, and applications. A dramatic increase in the number of users and networked devices gives rise to questions of whether the Internet's present addressing scheme can accommodate the demand and whether the Internet community's proposed solution, IPv6, could, in fact, be deployed to remedy the situation. The 1990s saw widespread deployment of telephony and streaming audio and video. These new applications and protocols have had significant impacts on the infrastructure, both quantitatively in terms of a growing level of traffic and qualitatively in terms of new types of traffic. The future is likely to see new applications that place new demands on the Internet's robustness and scalability.

1. For example, Matt Richtel. 2000. "Several Web Sites Attacked Following Assault on Yahoo," New York Times, February 9, p. A1; and Matt Richtel. 2000. "Spread of Attacks on Web Sites Is Slowing Traffic on the Internet," New York Times, February 10, p. A1.

In short, to meet the potential demand for infrastructure, the Internet will have to support a dramatically increasing number of users and devices, meet a growing demand for network capacity (scale), and provide greater robustness at any scale.

SCALING

"Scaling" refers to the process of adapting to various kinds of Internet growth, including the following:

· The increasing number of users and devices connected to the Internet,
· The increasing volume of communications per device and total volume of communication across the Internet, and
· The continual emergence of new applications and ways in which users employ the Internet.

While details of the growth in Internet usage are subject to interpretation and change over time, reflecting the dynamic nature of Internet adoption, it is only the overall trends that concern us here. In the United States, a substantial fraction of homes have access to the Internet, and that number is likely to eventually approach the fraction of homes that have a personal computer (a fraction that itself is still growing). Over 100 million people report that they are Internet users in the United States.2 Overseas, while the current level of Internet penetration differs widely from country to country, many countries show rates of growth comparable to or exceeding the rapid growth seen in the United States,3 so it is reasonable to anticipate that similar growth curves will be seen in other less-penetrated countries, shifted in time, reflecting when the early adoption phase began.

Perhaps a more important future driver for overall growth is the trend toward a growing number and variety of devices being attached to the Internet. Some of those devices will be embedded in other kinds of equipment or systems, and some will serve specific purposes for a given user. This trend could change the number of devices per user from the current number, slightly less than 1 in developed countries, to much more than this: 10 or even 100.

2. Data from Computer Industry Almanac, available online at <http://www.c-i-a.com>.
3. For an analysis based on OECD data, see Gonzalo Diez-Picazo Figuera. 1999. An Analysis of International Internet Diffusion. Master's thesis, MIT, June, p. 83.

Scaling of Capacity

The basic design of the Internet, characterized by the elements discussed in Chapter 1, has proved remarkably scalable in the face of such growth. Perhaps the most obvious component of growth is the demand for greater speed in the communications lines that make up the Internet. As was noted in Chapter 1, the first major scaling hurdle was seen about a decade ago when, in response to growing demands, many of the 56-kbps lines in the NSFNET backbone were replaced with higher capacity 1.5-Mbps lines (also known as T1 lines).4 Doing so required developing higher performance Internet routers and some retuning of protocols and software. Since then, the Internet has passed many scaling hurdles and increased its capacity many times over. The fastest lines in the Internet were 2.5 Gbps (OC-48) in 1999, almost 50,000 times faster than the original lines, and the deployment of 10-Gbps lines (OC-192) is under way. All expectations are that more such growth will be seen in the coming decade. There is a persistent and reasonable fear that demand for capacity will outstrip the ability of the providers to expand owing to a lack of technology or capital. The 1990s were characterized by periodic scrambling by ISPs, equipment providers, and researchers to develop and deploy new technologies that would provide the needed capacity in advance of demand. The success of those efforts does not, however, guarantee continued success into the future. Furthermore, efforts to expand capacity may not be uniformly successful. Regional variations in the availability of rights of way, industry strategies, and regulation could slow deployments in particular areas.

Better use of existing bandwidth also plays a role in enhancing scalability. A recent trend has been to compensate for the lack of network capacity (or other functionality, such as mechanisms for assuring a particular quality of service) by deploying servers throughout the Internet. Cache servers keep local copies of frequently used content, and locally placed streaming servers compensate for the lack of guarantees against delay. In some cases, innovative routing is used to capture requests and direct them to the closest servers. Each of these approaches has side effects that can cause new problems, however. Their implications for robustness and transparency are discussed elsewhere in this report.

4. An abbreviation for bits per second is bps; kbps means thousands of bits per second, Mbps means millions of bits per second, Gbps means billions of bits per second, and Tbps means trillions of bits per second. The use of a capital B in place of a lowercase b means the unit of measurement is bytes (8 bits) rather than bits.
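The capacity comparisons above are simple ratios; the following quick check is a sketch added here for illustration, not part of the report's text:

```python
# Rough capacity ratios for the line speeds mentioned above (values in bits per second).
original_nsfnet = 56e3        # 56-kbps NSFNET backbone lines
t1 = 1.5e6                    # 1.5-Mbps T1 lines
oc48 = 2.5e9                  # 2.5-Gbps OC-48 lines (1999)
oc192 = 10e9                  # 10-Gbps OC-192 lines (being deployed)

print(f"T1 vs. 56 kbps:     ~{t1 / original_nsfnet:,.0f}x")    # ~27x
print(f"OC-48 vs. 56 kbps:  ~{oc48 / original_nsfnet:,.0f}x")  # ~44,643x, i.e., "almost 50,000 times"
print(f"OC-192 vs. 56 kbps: ~{oc192 / original_nsfnet:,.0f}x") # ~178,571x
```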

Scaling of Protocols and Algorithms

A more difficult aspect of growth is in the design of new or improved protocols and algorithms for the Internet. The ever-present risk is that solutions will be deployed that work for the moment but fail as the number of users and applications continues to grow: today's significant improvement may be tomorrow's impediment to progress. Scaling must thus be considered in every design. This lesson is increasingly important as there are many pressures driving innovations that may not scale well or at all.

The IETF processes through which lower-level network protocols are developed involve extensive community review. This means that the protocols undergo considerable scrutiny with regard to scaling before they are widely deployed. However, particularly at the applications layer, protocol proposals are sometimes introduced that, while adequate in such settings as a local area network, have been designed without sufficient understanding of their implications for the wider Internet. Market pressures can then lead to their deployment before scaling has been completely addressed. When a standard is developed through a forum such as the IETF, public discussion of it in working groups helps. However, a protocol can nonetheless reach the status of a "proposed standard," and thus begin to be widely deployed, with obvious scalability problems only partially fixed.

The Web itself is a good example of scaling challenges arising from particular application protocols. It is not widely appreciated that the "World Wide Wait" phenomenon is due in part to suboptimal design choices in the specialized protocol used by the Web (HTTP), not to the core Internet protocols. Early versions of HTTP relied on a large number of short TCP sessions, adding considerable overhead to the retrieval of a page containing many elements and preventing TCP's congestion control mechanisms from working.5 An update to the protocol, HTTP 1.1, adopted as an Internet standard by the IETF in 1999,6 finally fixed enough of the problem to reduce the pressure on the network infrastructure, but the protocol still lacks many of the right properties for use at massive scale. The challenge posed by this lack of scalability has been significant given HTTP's large share of Internet backbone traffic.7

5. Though it took some time to launch an update, the shortcomings of HTTP 1.0 were recognized early on. See, for example, Simon E. Spero. 1994. Analysis of HTTP Performance Problems. Technical report. Cambridge, Mass.: World Wide Web Consortium, July. Available online at <http://www.w3.org/Protocols/HTTP/1.0/HTTPPerformance.html>.
6. R. Fielding et al. 1999. Hypertext Transfer Protocol - HTTP/1.1, RFC 2616. Network Working Group, Internet Engineering Task Force, June. Available online at <http://www.ietf.org/rfc/rfc2616.txt>.
7. For example, Internet traffic statistics for the vBNS, a research backbone, show that about two-thirds of TCP flows were HTTP. See MCI vBNS Engineering. 2000. NSF Very High Speed Backbone Network Service: Management and Operations Monthly Report, January. Available online at <http://www.vbns.net:8080/nettraff/2000/Jan.htm>.
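To make the contrast concrete, the sketch below reuses a single TCP connection for several requests, the persistent-connection behavior that HTTP 1.1 made standard; early HTTP 1.0 clients instead opened a new TCP connection for each object on a page. This is an illustrative sketch using Python's standard library, not code from the report; the host name and paths are placeholders.

```python
import http.client

# HTTP/1.1 keeps the TCP connection open by default, so several objects on a
# page can be fetched over one connection instead of one connection per object.
conn = http.client.HTTPConnection("www.example.com", 80, timeout=10)

for path in ("/", "/style.css", "/logo.png"):   # hypothetical page elements
    conn.request("GET", path)
    response = conn.getresponse()
    body = response.read()                      # drain the body before reusing the connection
    print(path, response.status, len(body), "bytes")

conn.close()
```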

The case of IP multicast demonstrates the interplay between protocol design, the Internet's routing system, and scaling considerations. Multicast is a significant example because it allows applications to simultaneously and inexpensively deliver a single data stream to multiple delivery points, which would alleviate Internet scaling challenges. Multicast can be used in numerous applications where the same data are to be sent to multiple users, such as audio and audiovisual conferencing, entertainment broadcasting, and various other forms of broad information dissemination (the delivery of stock quotes to a set of brokers is one example). All of these applications are capable of running over today's Internet, either in the backbone or within corporate networks, but many operate via a set of individual, simultaneous (unicast) transmissions, which means that they use much more bandwidth than they might otherwise.

Despite its promise of reducing bandwidth requirements for one-to-many communications, multicast itself presents scaling challenges. By definition, an Internet-wide multicast group needs to be visible throughout the Internet, or at least everywhere a member of the group is present. The techniques available today require that routers track participation in each active group, and in some cases for each group's active senders. Such participation tracking requires complex databases and supporting protocol exchanges. One might reasonably assume that the number of groups grows with the size of the Internet or with the growth of applications such as Internet radio broadcast, and that the footprint of each group (the fraction of the Internet over which the group information must be transmitted) will grow with the size of the Internet. However, the two factors multiply, meaning that under these assumptions, the challenges posed to providers will grow as the square of the Internet's size. Resolving this situation requires not merely defining an appropriate protocol but also researching a hard routing question: how to coalesce routing information of multiple groups into manageable aggregates without generating too much inefficiency.
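A toy model (not from the report) makes the compounding effect visible: if both the number of active groups and each group's footprint grow in proportion to the Internet's size, the total group state that routers must carry grows roughly as the square of that size. The constants below are arbitrary assumptions chosen only for illustration.

```python
# Toy model of network-wide multicast state as the Internet grows.
# Assumes, as in the text, that the number of groups and each group's
# footprint both scale linearly with the number of routers.

def total_group_state(n_routers, groups_per_router_unit=0.01, footprint_fraction=0.1):
    groups = groups_per_router_unit * n_routers   # number of active groups ~ n
    footprint = footprint_fraction * n_routers    # routers that must track each group ~ n
    return groups * footprint                     # total state entries ~ n^2

for n in (10_000, 20_000, 40_000):
    print(n, "routers ->", int(total_group_state(n)), "group-state entries network-wide")
# Doubling the network size quadruples the total multicast state under these assumptions.
```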

Scaling of the Internet's Naming Systems

Growth in the number of names and an increasing volume of name resolution requests, both of which reflect Internet growth, are placing scaling pressures on the Internet's name-to-address translation service, the Domain Name System (DNS).8 There is broad consensus as well as a strong technical argument that a common naming service is needed on the Internet.9 People throughout the world need to be able to name objects (systems, files, and facilities) correctly in their own languages and have them unambiguously accessible to authorized people under those names, which requires a common naming infrastructure. People also need naming services to allow them to identify applications and services provided by particular companies and organizations.

The DNS is instrumental in hiding the Internet's internal complexity from users and application developers. In the DNS, network objects such as the host computers that provide Web pages or e-mail boxes are designated by symbolic names that are independent of the location of the resource. The name provides an indirect reference to the network object, which allows the use of names instead of less mnemonic numbers and also allows the actual address to which the name points to be changed without disrupting access via the name. Because the computer associated with a particular named service can be changed without changing the IP addresses of that machine (only the address associated with the name in the DNS needs changing), indirection provides users with portability if they wish to switch Internet providers. While most users receive IP address allocations from their ISP and thus have to change address if they change ISP, DNS names are controlled by the user: a change of provider requires only that the address pointed to by the DNS entry be changed. The significance of DNS names was greatly increased as a result of the decision by the original developers of the World Wide Web to use them directly to identify information locations. The importance attached to DNS names is reflected in the contention surrounding the system's management (Box 2.1).

The DNS is organized as a hierarchy. At the very top of the hierarchy, the "root servers" record the addresses of the top-level domain servers, such as the .com or .uk servers (Figure 2.1).

[FIGURE 2.1 DNS hierarchy. Level 1: generic top-level domains (COM, ORG, EDU, NET, GOV, MIL) and country top-level domains (US, UK, ...); level 2: second-level domains such as STANFORD and MIT; level 3: subdomains such as CS.]

8. The DNS was first introduced in P. Mockapetris. 1983. Domain Names - Concepts and Facilities, RFC 882. November. Available online at <http://www.ietf.org/rfc/rfc0882.txt>.
9. Internet Architecture Board. 2000. IAB Technical Comment on the Unique DNS Root, RFC 2826. May. Available online at <http://www.ietf.org/rfc/rfc2826.txt>.

The addresses of these root servers are known locally to every name server of the Internet, using information provided by ICANN (in practice, coded into the DNS software by the vendor). Each top-level domain server records the addresses of the domain name servers for the second-level domains, such as example.com. These secondary servers are responsible for providing information on name-to-address mappings for names in the example.com domain. The hierarchical design permits the secondary servers to point themselves to third-level servers, and so forth.

To access named objects, Internet sessions start with a transaction with a name server, known as name resolution, to find the IP address at which the resource is located, in which a domain name such as www.example.com is translated into a numerical address such as 128.9.176.32. Assuming that the local name server has not previously stored the requisite information locally (see the discussion of caching, below), three successive transactions are generally required in order to find the address of a target server such as www.example.com: (1) to learn the address of the .com server from the root server, (2) to learn the address of the example.com server from the .com server, and (3) to learn the address of the target Web server, www.example.com, from the example.com name server.
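The three transactions described above can be walked through by hand with the third-party dnspython package; the following is a minimal sketch, not code from the report, and it assumes dnspython is installed and uses the well-known address of a.root-servers.net as a starting point (a production resolver parses each referral rather than hard-coding any server address).

```python
import dns.flags
import dns.message
import dns.query
import dns.rdatatype

def ask(server_ip: str, name: str) -> dns.message.Message:
    """Send one non-recursive A-record query to a single name server."""
    query = dns.message.make_query(name, dns.rdatatype.A)
    query.flags &= ~dns.flags.RD          # iterative lookup: do not request recursion
    return dns.query.udp(query, server_ip, timeout=3.0)

# Step 1: a root server refers us to the .com servers (authority/additional sections).
step1 = ask("198.41.0.4", "www.example.com")          # a.root-servers.net
print(step1.authority, step1.additional)

# Step 2 would query one of the .com servers returned above for example.com's
# name servers, and step 3 would query an example.com server for the A record
# of www.example.com; a caching resolver normally performs all three steps for us.
```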

The situation in practice may, in fact, be more complicated. If example.com is a very popular service, it is useful to be able to distribute the load among multiple servers and/or to direct a user to the server that is closest to him. To do either of these, the name servers run by example.com may make use of a clever trick: requests for the address corresponding to www.example.com, for example, may produce replies pointing to one of a number of different servers that, presumably, contain copies of the same information.10

10. Another non-DNS trick for load distribution makes use of so-called transparent proxies or interception proxies. These intercept and divert data packets going to a particular address to one of a number of servers that contain the same content. Because it interposes information processing outside the control of either the user's computer or the server he is connecting to, this technique runs counter to the end-to-end principle and can sometimes have the side effect of delivering inconsistent information to the user.

The rules governing DNS names would seem to permit millions of naming domains, each containing billions of names,11 which would seem adequate to support scaling demands. However, with the number of top-level domains currently limited to one national domain per country (e.g., .fr for France), plus a limited number of global domains (e.g., .com and .org), many domains are organized with a very large number of names contained at the next level rather than by distributing names further down in the hierarchy (e.g., using product.example.com instead of product.com). This can cause scaling problems, and there are concerns that the performance of the DNS will worsen over time.

The multistage process required to find the address of a target, repeated for many Web page accesses by millions of Internet users, can result in a heavy load on the servers one level down from the top of the tree. If the name servers were to be overwhelmed on a persistent basis, all Internet transactions that make use of domain names (i.e., virtually all Internet transactions) would be slowed down, and the whole Internet would suffer. Today's DNS design relies on two mechanisms to cope with this load: caching and replication. These mechanisms have been effective in alleviating scaling pressures, but there are signs that they may not be sufficient to cope with the continuing rapid growth of the network.

DNS caching is a technique whereby the responses to common queries are stored on local DNS servers. Applications such as Web browsers also may perform DNS caching. Using caching, a local DNS server need only request the addresses of the .com server from the root servers infrequently rather than repeatedly. Similarly, once a request has been made for the address of the example.com server, the local name server need not ask for this information again for a period of time known as the "time to live." Because of the dynamic nature of DNS information, name servers return not only an address but a time-to-live parameter selected by the administrator of the name server for the relevant domain, usually on the order of days or hours, which indicates how long the name-to-address mapping can be considered valid, helping ensure that servers do not retain outdated information.12

11. Each DNS name can be composed of up to 256 characters and up to 64 naming elements, each of which can be made of up to 64 characters (letters, digits, and hyphen).
12. Why do applications also need to cache DNS names? Good DNS performance depends on having local access to DNS information. Because the target platform was a tiny, diskless machine, the earliest implementations of TCP/IP software for the IBM PC lacked DNS cache functionality and depended on local LAN access to a DNS server for all name resolution requests. This resolver-only design has persisted in a number of machines today. Not only does this force the application designer to implement DNS caching, but there are performance costs as well. Since an application cannot determine whether the host it is running on supports a caching server, application-layer caching makes it possible for caching to be carried out twice, potentially yielding inconsistent results.
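The caching behavior just described (store a reply, honor its time to live, refetch after it expires) can be sketched in a few lines. This is an illustrative sketch, not the report's code; the upstream lookup function is a stand-in for a real query to an authoritative or recursive name server.

```python
import time

class DnsCache:
    """Minimal name-to-address cache that honors a per-entry time to live (TTL)."""

    def __init__(self):
        self._entries = {}                      # name -> (address, expiry timestamp)

    def resolve(self, name, upstream_lookup):
        entry = self._entries.get(name)
        if entry and entry[1] > time.time():    # cache hit and still valid
            return entry[0]
        address, ttl_seconds = upstream_lookup(name)   # cache miss: ask upstream
        self._entries[name] = (address, time.time() + ttl_seconds)
        return address

# Hypothetical upstream lookup returning an address and a TTL chosen by the zone's administrator.
def fake_upstream(name):
    return "128.9.176.32", 3600                 # one-hour TTL

cache = DnsCache()
print(cache.resolve("www.example.com", fake_upstream))   # miss: goes "upstream"
print(cache.resolve("www.example.com", fake_upstream))   # hit: answered locally until the TTL expires
```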

Caching works well when the same request is repeated many times. This is the case for high-level queries, such as requesting the address of the .com name servers, and also for the most popular Web servers, the search engines, and the very large sites. (It works even better for very frequently accessed services like a file server on a local area network.) However, the efficiency of caching decreases as the number of names that are kept active by a given user or domain name server increases. When millions of names are registered and accessed in the DNS, only a small fraction can be present in any given cache. Requests for the names of less frequently requested sites, which in total will represent a significant fraction of all requests, will have to be forwarded to the DNS. Even if user queries are concentrated mostly on large sites or queries from the same local group of hosts are concentrated on the same group of sites, which may or may not be the case, the remaining fraction still constitutes an important and growing burden for the DNS. The effect of cache misses is made even worse by the concentration of names in a small number of popular top-level domains, such as .com, .net, and .org. Consequently, an inordinate fraction of the load is sent to these domains' servers (a load that could be alleviated if the hierarchical design of the DNS were used to limit the number of highest-level names). These servers need to scale in two ways. They must support an ever-growing name population, which means that the size of their database keeps increasing very quickly, and they must serve ever more frequent queries. The growth of the database implies increased memory requirements and an increased management load.

Replication, whereby name databases are distributed to multiple name servers, is a way of sharing the load and increasing reliability. With replication, the root server is able, for example, to provide the addresses of several .com servers instead of one. The volume of name resolution inquiries could be met by splitting the load across a sufficiently large number of replicated servers. Unfortunately, current DNS technology limits this approach because the list of the names and addresses of all the servers for a given domain must fit into a single 512-byte packet. (Even after efforts were made to shorten host names, the number of root servers remains limited to 13.) Once the maximum number of servers that will fit within the single-packet constraint has been deployed, increased load in that domain can only be dealt with by increasing the capacity and processing power of each of the individual .com name servers. While the performance of the most widely used DNS software, BIND, lags that of modern high-performance database systems, and root servers' software can almost certainly be improved to handle much higher loads, Internet growth rates suggest that the demand on the root servers is likely to be growing faster than their processing speed is increasing and that in a few years the root servers could nonetheless be heavily overloaded.

One proposal for addressing issues ranging from scaling to DNS name-trademark conflicts is to move toward a solution that makes use of directories as an intermediate layer between applications and the DNS. A directory might help resolve conflicts between DNS names and registered trademarks because a particular keyword could be associated with multiple trademarks. It also would help alleviate pressures on the DNS by freeing companies from the necessity of registering separate domain names for each of their brand names (as well as all possible variants). For example, Procter & Gamble would not need to register domains for ivory.com and crest.com and so forth to ensure that customers would be able to locate Web pages describing these products. Directories would also support the association of a particular resource with multiple computers without resorting to clever tricks in the DNS server. A directory can be aware of the source making an inquiry and respond to a query by providing the address of the nearest server that has the requested information.

A number of directory proposals have been floated, many of which might prove adequate. The combination of rivalry among proponents of various systems (many of them at least partially proprietary) and the Internet community's traditional resistance to changing something that is working (although only poorly) is probably responsible for impeding deployment of any one of these proposals. There is reason to hope that rising pressure for new capabilities that the DNS cannot easily accommodate, such as the ability to support non-Roman alphabet characters in domain names, could unlock the problem and speed deployment of a directory-based solution that would alleviate scaling pressures.

SCALING UP THE ADDRESS SPACE

Achieving global connectivity requires not only that every part be interconnected but also that the constituent parts use common labels: the numerical addresses (of the form 144.171.1.26) that permit any connected device to communicate with any other. Provisioning these numbers to users who need them to connect their computers to the Internet raises a number of technical, organizational, and management challenges. This section briefly discusses organizational and management issues related to addresses and then focuses on the scaling challenges associated with the complexity of Internet routing and the threat that a likely explosion in the number of attached devices will exhaust the address space. Both raise questions about what technical measures will help alleviate the situation and how such measures can be implemented pervasively. In addition, the challenge of scaling up the address space exemplifies the technical complexity, the interplay between problems and solutions, and the organizational deployment challenges that arise in scaling up the Internet infrastructure overall.

Managing Addresses

The overall concern with the allocation of Internet addresses is that the address pool may be exhausted by the growing number of users and attached devices. The Internet is not the only infrastructure facing scaling challenges associated with its addresses.13 For traditional telephony, these challenges are being addressed in an established (if evolving) context of global and regional numbering administrations and government scrutiny by federal and state regulators. By contrast, addressing functions in the Internet have been provided by a group of organizations with less formal status. These include a group of regional registries and the Internet Assigned Numbers Authority14 and Network Solutions (supported by a U.S. government contract). In 1997, the responsibility for network address allocation in North America was shifted from Network Solutions to ARIN, a nonprofit organization funded by North American ISPs. In 1999, under arrangements coordinated by the U.S. Department of Commerce to replace and expand on IANA, the overall responsibility for management of address space was assumed by ICANN. Under the current rules, regional registries such as ARIN receive large blocks of addresses from ICANN (formerly IANA). They, in turn, distribute smaller blocks of addresses to Internet service providers. Customers generally receive their addresses from these service providers, though in some instances large organizations are able to obtain addresses directly from the registries.15

The rules that determine how many addresses are allocated, and to whom they are allocated, are debated within the regional registries, within ICANN, and within the IETF. These rules are often contentious, as there is an obvious tension between ensuring that there are enough addresses to go around and meeting the desire of users to be assigned the quantity and type of address blocks that they feel best meet their needs. Addresses are considered scarce today, and network managers' initial requests for address space are often turned down and renegotiated.

13. A growing tide of cell phones, fax machines, and second lines for dial-up Internet access have all put pressure on the pool of available telephone numbers. Responses to this demand have included the allocation of phone numbers to local exchange carriers, implementation of local number portability, and establishment of new area codes.
14. IANA, now incorporated into the new ICANN, was located at the University of Southern California's Information Sciences Institute under the technical leadership of the late ISI computer scientist Jon Postel, supported by contracts from DARPA and NSF.
15. The assignment of a number of very large address blocks to individual institutions (companies and universities) predates the practice under current rules.

Because the enforcement of address allocation rules has significant consequences, the registries and ICANN must obtain and maintain the trust of all the organizations that make up the Internet; ICANN will probably be under pressure to develop appropriate processes, including appeal procedures, to make the allocation process as fair as possible.

Routing Table Scaling and Address Aggregation

The first part of an Internet address, known as the "routing prefix," is used to direct the routing of each packet from the source to the destination end point. Just as a telephone number contains a country code, area code, central office selector, and individual phone portion, IP addresses contain hierarchically organized topological identifiers that identify an address as belonging to a particular topological segment of the Internet, a subset (e.g., a corporate network) within that region, a local area network within that, and finally an individual interface connected to a particular computing device. In contrast to area codes and central office locators, which typically map to particular geographic regions, Internet identifiers map to logical regions, which may or may not coincide with geographical regions.

To route packets through the Internet successfully, each router must store a table that provides a mapping between each known routing prefix and the correct routing decision for that prefix. At one extreme, within the network of a customer, this routing table can be very simple, and a "default route" can be set to map all nonlocal prefixes to the customer's ISP. At the other extreme, in the backbones of tier 1 providers, the routing table must contain a complete list of all prefixes. Today this requires tables that hold on the order of 75,000 entries.16

16. As of May 2, 2000, 76,265 entries had been reported in Tony Bates. 2000. The CIDR Report. Available online at <http://www.employees.org/~tbates/cidr-report.html>. The precise number of entries varies depending on where in the Internet the measurement is made, but this number gives the right sense of scale.
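A router's forwarding decision amounts to finding the longest (most specific) prefix in its table that contains the destination address. The sketch below is illustrative only, with made-up prefixes and Python's standard ipaddress module; it is not drawn from the report.

```python
import ipaddress

# A toy routing table: prefix -> routing decision. The 0.0.0.0/0 entry is the default route.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"):      "default route to upstream ISP",
    ipaddress.ip_network("144.171.0.0/16"): "customer aggregate",
    ipaddress.ip_network("144.171.1.0/24"): "local subnet",
}

def forward(address: str) -> str:
    """Return the decision for the longest prefix containing the address."""
    ip = ipaddress.ip_address(address)
    best = max((net for net in ROUTES if ip in net), key=lambda net: net.prefixlen)
    return ROUTES[best]

print(forward("144.171.1.26"))   # matches the /24 -> "local subnet"
print(forward("144.171.9.1"))    # matches only the /16 -> "customer aggregate"
print(forward("198.51.100.7"))   # falls through to the default route
```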

As the Internet grows, the routing tables are sure to grow, but the limited capabilities of today's routers dictate that this growth must be constrained. The first and most obvious consideration is that the size of the table cannot exceed the memory available in the routers. Even the latest switching equipment used in the Internet can only accommodate a couple of hundred thousand routing entries, and older equipment as well as new, low-end equipment can store many fewer. Routers could be redesigned so that their memory could be increased to store more routes, but this is nontrivial. High-performance routing typically requires that the routing table be stored either in memory located directly on the chip or in memory connected at very high speed to the chip to avoid delays. While large amounts of memory are widely viewed as something cheap, very fast memory remains expensive and places severe constraints on the hardware design. Furthermore, as the Internet moves to supporting quality of service and various forms of filtering, router memory will be required for more than just holding routing tables. So despite continuing improvements in the capabilities of microelectronic devices, the amount of high-speed memory available in routers remains a limiting factor. And router memory is only one of the factors that limit the size of the routing table.

Each of the routing entries in the table describes the path from the router to a given network corresponding to a particular routing prefix. Each entry must be updated in all routers across the network whenever that path changes. Route update messages have to be carried throughout the Internet; because each backbone router must maintain the full table, each must receive and process each update. Given that the network is always changing (owing, for example, to installation, reconfiguration, or failure of network elements), the frequency of updates will increase as the size of the table grows. The updates today are already so frequent that many routers can hardly keep up, and network operators have to resort to "update damping" techniques to limit the rate of update that they accept from their peer networks, slowing the rate at which routing information is distributed. Recent data show that it often takes several minutes for route updates to propagate throughout the Internet.17 This results in long transition periods during which the routing tables are incorrect and information cannot be routed to portions of the Internet. If the size of the table were to be further increased without increasing the processing power of the routers, the processing of updates would have to be further slowed down, and the whole Internet could be plagued by routing failures.

The explosive growth of the Internet address structure from 1993 to 1995 led to a routing table scaling crisis, when it was feared that the capabilities of routers would be overwhelmed. This pressure resulted in remedial actions starting with the rapid deployment of Classless Interdomain Routing (CIDR). CIDR controls the number of routes that a router must remember by assigning addresses in a hierarchical manner that forces addresses to be assigned in blocks. This involves, for example, Internet service providers consolidating address space allocations into fewer, larger routing prefixes, which are then allocated to customers (these, in turn, may include smaller Internet service providers) out of the service provider's block of addresses. In addition, to reduce pressures on the size of the global routing table, the address registries were also forced to adopt a very restrictive policy whereby small, independently routable address blocks would not be allocated to individual organizations. Instead, the registries decided to require organizations to obtain address allocations from their service providers. More recently, tensions surrounding address allocation were heightened when a few large backbone providers, in an effort to force reluctant network operators to aggregate their routing information into larger blocks, refused to pass routing information about prefixes that were smaller than a certain size. This caused some address blocks that had been allocated independently to small providers and companies to lose global connectivity, since the global routing system would filter out their topology information and not allow many destinations on the Internet to be able to pass traffic to their networks. These networks generally were reconfigured into larger prefixes and the crisis was resolved without widespread service outages. The overarching interest of the players in maintaining the Internet's interconnection led to self-correction of the problem.

17. Craig Labovitz et al. 1999. Analysis and Experimental Measurements of Internet BGP Convergence Latencies. Presentation to the NANOG conference, Montreal, October. Available online at <http://www.nanog.org/mtg-9910/converge.html>.
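The aggregation that CIDR enables can be demonstrated with the standard ipaddress module; the sketch below (illustrative prefixes only, not from the report) collapses four contiguous customer /24 blocks into the single /22 a provider would announce.

```python
import ipaddress

# Four contiguous /24 blocks allocated to customers out of a provider's larger block.
customer_blocks = [
    ipaddress.ip_network("10.0.0.0/24"),
    ipaddress.ip_network("10.0.1.0/24"),
    ipaddress.ip_network("10.0.2.0/24"),
    ipaddress.ip_network("10.0.3.0/24"),
]

# The provider advertises one aggregate instead of four separate routes,
# which is how CIDR keeps the global routing table smaller.
aggregates = list(ipaddress.collapse_addresses(customer_blocks))
print(aggregates)   # [IPv4Network('10.0.0.0/22')]
```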

Deployment of CIDR, together with the adoption of a restrictive address allocation policy by the registries and the use of network address translation, has contained the growth of the routing tables, and the growth in the global routing table has by and large been slow and linear (Figure 2.2). Note, however, that the most recent data displayed in this figure suggest that CIDR and restrictive allocation policies have not entirely alleviated pressures on the routing table size and that table size has recently grown faster than linear. Whatever pain the new rules created, uncontrolled growth of the routing tables would have caused even greater pain by rendering the whole Internet unusable. Without proactive aggregation, the table would have grown exponentially, like the Internet, and the routers would have been overwhelmed long ago. Provider-based addressing also helps each ISP limit the growth in its routing table by eliminating the necessity of knowing individual prefixes in other service provider domains. It should be noted, however, that the switch to CIDR has had a price. When an organization is given aggregatable addresses by its Internet service provider, it must relinquish these addresses and renumber its network if it changes providers. Note that this is more an issue of ease of network management than of portability of identity; the more familiar names by which the organization is known to the outside world are portable and can remain the same when an organization switches ISP.18

[FIGURE 2.2 Border Gateway Protocol (BGP) route advertisements as reported by Telstra in April 2000, showing an overall linear increase in the routing table size but with a recent upward inflection in 1999-2000. SOURCE: Geoff Huston. 2000. BGP Table Size. Technical Report, Telstra Corporation, Ltd., Canberra, Australia. Data from April 12. Available online at <http://www.telstra.net/ops/bgptable.html>.]

18. For organizations that outsource management of their DNS service to their ISP, there are two barriers to switching: transferring DNS functions and renumbering.

Large users, who had in the past been able to manage their own address space, have felt constrained by the new rules. The limited availability of address space would make it harder to design management-friendly addressing rules within their network, while the provider-supplied addressing rule meant that they would have to renumber whenever they changed provider. Despite the emergence of better tools for renumbering networks, this remains an involved, expensive operation that may inhibit organizations from seeking better deals from competing providers. As a result, there have been calls for users to be provided again with portable addresses so as to minimize switching costs. However, because addresses would no longer be aggregated within the blocks assigned to an ISP network, allocating portable addresses in small blocks to small networks would trigger a dramatic increase in the size of the routing tables. With the current state of routing technology, such a policy could destabilize the whole Internet.

The desire of users, CIDR notwithstanding, to retain the ability to manage their own address space led to the development of a technology known as network address translation (NAT) (see Box 2.2). NAT permits users to use and manage a large amount of private address space independent of the allocation policies of the registrars, giving the network manager great latitude in assigning addresses because he or she need not worry about doing so in an efficient manner, unless the private network is so large as to push up against the limit set by the size of the private address blocks. The expectation was that NAT use would be a short-term phenomenon that would be obviated by the deployment of a next-generation Internet Protocol, IPv6. But NAT had an unintended side effect: the explosion of private addressing. This widespread use had the effect of letting the wind out of IPv6's sails, as the perception of a crisis requiring a wholesale replacement of the Internet Protocol faded. Instead, it began to be debated whether IPv4 needs to be replaced at all.
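At its core, a NAT gateway keeps a translation table that lets many privately addressed hosts share one public address. The following is a highly simplified, illustrative sketch (not the report's description of any particular product), ignoring timeouts, port exhaustion, and per-protocol details; the addresses are placeholders.

```python
import itertools

PUBLIC_IP = "192.0.2.1"                 # the gateway's single public address (placeholder)
_next_port = itertools.count(40000)     # pool of public-side ports handed out by the gateway
_table = {}                             # (private_ip, private_port) -> public_port, and the reverse

def outbound(private_ip: str, private_port: int):
    """Rewrite an outgoing connection's source to the shared public address."""
    key = (private_ip, private_port)
    if key not in _table:
        public_port = next(_next_port)
        _table[key] = public_port
        _table[public_port] = key       # remember the mapping for replies
    return PUBLIC_IP, _table[key]

def inbound(public_port: int):
    """Map a reply arriving at the public address back to the private host."""
    return _table[public_port]

print(outbound("10.0.0.5", 12345))      # ('192.0.2.1', 40000)
print(outbound("10.0.0.6", 12345))      # ('192.0.2.1', 40001)
print(inbound(40001))                   # ('10.0.0.6', 12345)
```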

Efforts to improve the aggregation of addressing notwithstanding, the number of address blocks that must be stored in the core routers of the Internet can be expected to continue to grow, suggesting that new approaches to routing may be important in the future. Specific problems that deserve exploration include modifications to the current global routing scheme (Border Gateway Protocol, or BGP) to better support controlled peering. Further, while there is by no means consensus within the Internet community on this point, there are some who view routing concerns as stemming fundamentally from the approach taken in today's interdomain routing protocol, BGP4. Routing in BGP depends on each backbone or backbone node having a complete and fairly precise picture of the Internet. As the Internet grows, this need to have complete knowledge of the system presents scaling problems. Many other complex problems, such as ambulance routing, airplane routing, and chess games, have been solved using an approach where, in place of complete knowledge, one starts things off in the right general direction, looks ahead a limited distance at each step, and keeps adjusting as the objective comes nearer. Such algorithms have much better scaling properties in very large, complex systems than complete-knowledge approaches.

Running Out of Addresses?

In addition to scaling issues raised by constraints on the structure of addresses and their allocation, there exist concerns about running out of numbers altogether. The present Internet protocol, IPv4, provides 32-bit-long addresses, which translates into an address "space" of about 4.3 billion unique addresses. For historical reasons, about seven-eighths of this address space is available for use as addresses for connecting devices to the Internet; the remainder is reserved for multicast and experimental addresses. While more than 4 billion addresses was considered more than sufficient in the Internet's early days, widespread Internet adoption means that this seemingly large number is becoming a constraint. There is evidence today of pressures on the address space, but using this evidence to predict long-term trends is difficult. Clearly, while there has been no crisis thus far, there is still considerable risk associated with exhausting the IPv4 address space, particularly over the longer term. But just how long is "long term"? Estimates vary wildly, from "never" to 10 to 20 years to as few as 2 or 3.
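The size of the space follows directly from the address length; a quick check (a sketch added for illustration, using the report's seven-eighths estimate):

```python
# IPv4 addresses are 32 bits long.
total = 2 ** 32
usable_fraction = 7 / 8        # the report's estimate of space available for connecting devices

print(f"total addresses:          {total:,}")                          # 4,294,967,296 (~4.3 billion)
print(f"available for connecting: {int(total * usable_fraction):,}")   # ~3.76 billion
```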

Estimating Address Use and Demand

How much of the IPv4 address space is used? Huitema reports that about half of the address space has now been allocated by the address registries, which, along with the rate of address allocation, suggests that exhaustion is relatively near.19 However, not all the addresses that have been allocated are in active use. Another measure is the sum of the size of all the blocks of addresses (addresses with a common prefix) that are made known ("advertised," in Internet lingo) through BGP to the routers that lie within the core of the network. NLANR reports that 22 percent of the address space is advertised in BGP. This number is smaller than the allocation figure would indicate, reflecting the fact that not all allocated addresses are in use. The NLANR data also indicate relatively rapid consumption: the advertised address space increased by roughly 6 percent from November 1997 to May 1999.20 Interpretation of these data is further complicated because not all the addresses that are advertised are actually assigned to an active host. A provider using an address block for a set of devices will generally advertise the whole block (since there is no cost to it in doing so, and doing otherwise would result in many more routing table entries) and then assign the unused addresses from the block as necessary. In most cases, ISPs do not reveal whether they have many or few unused addresses in their active blocks, an uncertainty that confounds determination of the immediacy of address space pressures.21

The advertised address space reported by NLANR is sufficient to address over 960 million hosts. How does this figure compare to what we can learn from other sources of information about the number of Internet users and computers attached to the Internet? The Computer Industry Almanac estimates that in the year 2000 there are about 580 million computers in use worldwide.22 If all of the 580 million deployed computers were attached to the Internet all the time, we would surely exhaust at least the advertised addresses. This is because the addresses are allocated in a hierarchical fashion, in fairly large blocks. Each computer must therefore be assigned an address from a block associated with the part of the Internet it is attached to, so that not all 960 million addresses could, in practice, be used.

19. Christian Huitema. 1999. Evaluating the Size of the Internet. Technical Report, Telcordia. Available online at <ftp://ftp.telcordia.com/pub/huitema/stats/global.html>.
20. National Laboratory for Applied Network Research (NLANR). 1999. 18 Months in the Routing Life of the Internet. Technical Report, Measurement and Network Analysis team, NLANR, University of California at San Diego, May 27. Available online at <http://moat.nlanr.net/IPaddrocc/18monthSequel/>.
21. For an examination of the inherent inefficiencies in addressing, see C. Huitema. 1994. The H Ratio for Address Assignment Efficiency, RFC 1715. Internet Engineering Task Force. Available online at <http://www.ietf.org/rfc/rfc1715.txt>.
22. Data obtained from <http://www.c-i-a.com/>.

that there were more than 86 million active Internet hosts23 in July 2000. The Internet Software Consortium estimates that there were over 72 million hosts in January 2000 (the Telcordia data show a slightly smaller number for that date).24 If these numbers are accurate, fewer than 10 percent of the addresses within the advertised blocks are actually in use today, suggesting that pressures are, on average, not as great as one might conclude based on other measures.

Another relevant piece of information is the number of Internet users. According to Nua Internet Surveys, there were an estimated 332 million users online as of June 2000.25 Today this number exceeds the number of active Internet hosts. If each of these users were to have an active connection to the Internet, the resulting consumption of addresses would be a significant issue. The ratio of users to hosts today is greater than 1, reflecting the fact that not all computers are attached to the Internet at any given time (especially dial-up connections) and that many computers are shared by more than one individual, particularly in the developing world. However, both of these factors are likely to lessen with time, and it is reasonable to project that in the long run the ratio of users to hosts will drop below 1, meaning that there will be more hosts than users.

Latent demand further complicates the prediction of address consumption rates. Available data reflect the outcome of the present address allocation regime, in which addresses are tightly rationed for fear of depleting the address space. Consequently, the address consumption statistics, even to the extent that they are accurate, do not reflect the actual demand for addresses. Nor do they reflect the degree of hardship experienced by customers as a result of this rationing. Unique addresses would probably be more widely used if they were not so tightly rationed, but how much more widely is a matter of speculation.

A further challenge is that today's rate of consumption may greatly underestimate demand in the future. One source of potentially large growth is the predicted emergence of many new IP devices.26 For example, it is reasonable to assume that within a few years individuals who currently have one or two IP-addressable devices will have five or ten

23 Data obtained from <http://www.netsizer.com/daily.html>.
24 Data obtained from <http://www.isc.org/ds/WWW-9907/report.html>.
25 Data from <http://www.nua.ie/surveys/how_many_online/index.html>.
26 Various forecasts project that the number of such networked devices will vastly exceed the number of individual Internet users within the next decade. See, for example, Frank Gens. 1998. Death of the PC-Centric Era, International Data Corporation (IDC) Executive Insights, Technical report. Framingham, Mass.: IDC. Abstract available online at <http://www.itresearch.com/alfatst4.nsf/UNITABSX/W16276?OpenDocument>.

(considering pagers, phones, PDAs, etc.). The need for IP-addressability will also increasingly extend beyond human-held and -operated devices, to devices that are embedded in our physical infrastructures. While nationwide deployment of such systems would take many years, the next 3 to 5 years could see explosive growth of this sort of instrumented environment in office buildings, factories, hospitals, etc.27 In the longer term (beyond 5 to 10 years), the size of the realizable devices will continue to decrease to the point where we might have hundreds or thousands of devices per person. Some will be freestanding devices while others will be embedded in the user's environment.

Current research and development efforts on the part of government and industry are addressing a number of technical issues such as power consumption and wireless communication; progress on these issues will help propel deployment of new networked devices. The IEEE 802.11 standard for wireless local area networks and other efforts to standardize physical layer communications are evidence of the readiness of the technology and market.

Not all devices need to be individually and globally addressable and not all IP addresses in use are public (i.e., visible in the global address space). These private address spaces are used for a variety of reasons, including reducing the number of public addresses required and enhancing security.28 In many instances where the system is "closed" (such as when a number of computers operate elements of a manufacturing facility), it may be perfectly adequate to give individual devices local, network-specific addresses. Data transfer into and out of a system or facility can be mediated by a special computer that acts as a gateway or external interface between a group of computers and the outside network.

The number of computers currently assigned private addresses could be a significant factor in estimating future demand for global addresses. Many provide access to the Internet in some way, but most often through a private address space indirectly attached to the Internet. Because any device with a private IP address may, with appropriate modifications to a network's connectivity to the public Internet, be connected to the Internet,

27 More examples of possible applications and environments: medical (operating rooms, hospital beds, injected in-body monitoring), scientific (environmental monitoring, physiological data collection), office and home (tracking and coordinating people and objects, implementing sophisticated security systems), and industrial (factory floors, hazardous operations, distributed robotics).
28 The security afforded by NAT is limited. In particular, it suffers from all of the limitations associated with perimeter defensive measures, as discussed later in the chapter.

its owner may at any time seek an address for it on the public Internet. This latent demand is found in home networks as well as the networks of large organizations. In the home, some users today share a single provider-supplied IP address among multiple home devices using NAT running on a home gateway (which may be in the form of software running on one of the computers in the home).29 However, many computers will be used in applications that require devices to be capable of external interrogation in a flexible manner (and thus not via explicitly configured gateways), meaning that they will need to be assigned unique global IP addresses.

International Pressures for Addresses

Address availability and consumption rates are not uniform from country to country. Demand from regions of the world where Internet use can be expected to continue to grow steeply (not surprisingly, these are often places that historically have not had many computers connected to the Internet) could rapidly exhaust their existing global address space allocations. Forecasts for the next decade suggest, for example, that although there are relatively few computers and users in China today, by the end of this decade that country may have more computers and users on the Internet than the United States.30 The present disparity in address allocation is illustrated by the observation that a number of organizations and businesses, including several U.S. universities, were each allocated 2^24, or 16,777,216, addresses many years ago under a different regime (about 40 class A networks were allocated that way), whereas the 1.3 billion people in China have far fewer. Because of the shortage of addresses, Stanford University undertook a project, completed on May 1, 2000, to consolidate its use of addresses for its roughly 56,000 computers and return the more than 15-million-address block for reassignment by

29 IP address scarcity as seen by home users is not just a reflection of an overall shortage of addresses. Many ISPs restrict residential customers to a single IP address on the assumption that a site that needs more than one address is really a commercial site, for which a higher rate can be charged.
30 For example, a 1999 study from BDA (China Ltd.) and the Strategis Group projects that total Internet users in China could exceed 33 million by the end of 2003. (See PRC Information Technology Review. 1999. "Internet Users in China to Reach 33 Million by 2003." 1(26), July 2.) More recently, the Boston Consulting Group forecast that, based on projected growth of 25 to 35 percent per year, the number of Internet users in Asia is likely to grow more than fivefold, to nearly 375 million, by 2005. (See Agence France Presse. 1999. "China to Propel Asian Internet Users to 375 Million in 2005." November 2.)

the organization that allocates addresses in North America.31 The comparison between addresses assigned to these organizations and China provides an extreme example of the dominant factor behind these allocation disparities: address assignments reflect needs that the requesting organization has been able to substantiate on the basis of current use or through credible projections of future needs and reflect the overall availability of addresses at the time that the assignment is made.

Network Address Translation

As described above, CIDR and NAT were adopted to alleviate address scaling-related challenges. In addition to offering capabilities for local address management, NAT enables reuse of global addresses. The technology is widely employed and is included in a number of current and planned operating system releases.32 Not just a technology used by larger organizations, NAT is being used in home and small office networks to allow a single IP address connection (e.g., through a dial-up, DSL, or cable modem connection) to be shared by multiple computers on the home network.

However, the NAT approach has significant shortcomings. To start with, the growing support for NAT and NAT-like facilities delivers the wrong message to anyone trying to resolve the address space shortage by deploying IPv6 technology rather than NAT. NAT is viewed by many as a rather ugly, simplistic fix that is cheaper to deploy (something that might be referred to as a "kludge") than an architectural model backed by long-term analysis. NAT devices, which are computers where large numbers of data streams come through in one place, also are attractive targets for man-in-the-middle attacks that can listen in on or redirect communications. NAT is also not a satisfactory solution for some very large networks because the size of the address blocks designated for use in private networks (i.e., blocks of IPv4 addresses that are not allocated for global addresses) is finite. These blocks would be, for instance, too small to permit an ISP to provide unique addresses for each of potentially tens of millions of customers (e.g., set-top boxes or wireless Internet phones), although some ISPs, such as China's UNINET, have been forced to make

31 See Networking Systems Staff. 2000. IP Address Changes at Stanford. Stanford University. Available online at <http://www.stanford.edu/group/networking/ipchange/> and Carolyn Duffy Marsan. 2000. "Stanford Move Rekindles 'Net Address Debate.'" Network World, January 24. Available online at <http://www.nwfusion.com/news/2000/012ipv4.html>.
32 For example, both Windows 98 SE (as part of the Internet connection software) and Windows 2000 include NAT functionality.

use of NATs to cope with address shortages. As an architectural approach, NATs are seen by some as self-contradictory. NATs have the advantage that they provide some degree of security by hiding private addresses behind the address translator, but the protection afforded is limited. Another difficulty with the model is that it presumes that the Internet is limited in design to a finite set of edge domains surrounding a single core network, each of which contains a number of machines sitting behind a NAT. This model is inadequate because there are, in fact, many parts to the Internet core.

Another difficulty is that NAT requires devices to in effect have two addresses: the external address (and port number) seen by the global Internet and the internal address (and port number) used to connect to the device within the local network. This breaks the transparency of the network. This difficulty is discussed in detail in Chapter 3; the essential point is that globally unique addresses are an important design attribute of the Internet and that applications are supposed to be able to rely on them without any knowledge of the details of the network between source and destination. If a NAT is being employed, the address the device knows about (the local address) will differ from the address by which the device is known in the global Internet (the external address). Also, the mapping between local and external address is dynamic, depending on the actions of the NAT.

IPv6: A Potential Solution to Addressing and Configuration

The specter of an address shortage drove the development of IPv6, a recommended follow-on to today's IPv4 protocol. The Internet Engineering Task Force (IETF) produced specifications33 defining a next-generation IP protocol known variously as "IPng," "IP Version 6," or "IPv6." IPv6 was designed to improve on the existing IPv4 implementation, tackling the IP address depletion problem as well as a variety of other issues. Its major goals were the following:

· Expand addressing capabilities, eliminating the IP address depletion problem;
· Incorporate quality-of-service capabilities, particularly for real-time data;
· Reduce the processing time required to handle the "typical" packet and limit the bandwidth cost of the IPv6 packet header;

33 S. Deering and R. Hinden. 1998. Internet Protocol, Version 6 (IPv6) Specification. RFC 2460. December. Available online from <http://www.ietf.org/rfc/rfc2460.txt>.

· Provide routing improvements through address autoconfiguration, reduced routing table sizes, and a simplified protocol to allow routers to process packets faster;
· Provide privacy and security at the network layer, supporting authentication, data integrity, and data confidentiality;
· Improve network management and policy routing capabilities;
· Allow a transition phase by permitting the old (IPv4) and new protocol to coexist;
· Allow the protocol to evolve further by imposing less stringent limits on the length of options and providing greater flexibility for introducing new options in the future; and
· Improve the ability of the network manager to autoconfigure and manage the network.

The request for proposals for a next-generation Internet Protocol was released in July 1992, and seven responses, which ranged from making minor patches to IP to replacing it completely with a different protocol, had been received by year's end. Eventually, a combination of two of the proposals was selected and given the designation IPv6. The essential improvement offered by the new protocol is the expanded address. It is 16 bytes long with 64 bits allocated to the network address, compared to the 32 bits provided by IPv4. IPv6 addresses support billions of billions of hosts, even with inefficient address space allocation. The new standard offered a number of additional features, including the following:

· Increased efficiency and flexibility. Some IPv4 header fields were dropped, so that the header now contains only 7 mandatory fields as opposed to the 13 required in IPv4. IPv6 also offers improved support for extensions and options.
· Enhanced autoconfiguration. IPv6 offers improved autoconfiguration or negotiation of certain kinds of addresses. Possible approaches include (1) using the Ethernet or other media address as the "host selector" part of an address otherwise learned from the local router, (2) procedures for creating local addresses on point-to-point links, and (3) improvements to the Dynamic Host Configuration Protocol (DHCP) that may allow partial autoconfiguration of routers.34

34 Recently, there have been concerns that using a unique identifier as part of the IPv6 address would facilitate the tracking of users on the networks and could therefore pose a privacy risk. This risk is in fact inherent in any system where the connection to the Internet is always on, a characteristic of most broadband services. This concern, while widely expressed, is not necessarily valid; there exist ways to limit the ability to isolate addresses

· Enhanced support for quality of service (QOS). QOS functionality is potentially enhanced through the addition of a new traffic-flow identification field. QOS products based on this flow identifier are still in the planning stage.

Deploying an IPv6 Solution

The technical viability of IPv6 has been demonstrated by a number of implementations of IPv6 host and router software as well as the establishment of a worldwide IPv6 testing and preproduction deployment network called the 6BONE, which reaches hundreds of academic and commercial sites worldwide. However, to date, IPv6 has not seen significant deployment and use. This is not a surprise; IPv4 has been adopted by a very large number of users, and there are significant costs associated with a transition to IPv6. Reflecting their perception that the gain for switching to IPv6 is not sufficient to justify the pain of the switch, customers have not expressed much willingness to pay for it, and equipment vendors and service providers are for the most part not yet providing it. An important exception is the planned use of IPv6 for the so-called third-generation wireless devices now being developed as successors to present mobile telephone systems.35

For many, the devil they know is better than the one they don't know. Until an address shortage appears imminent, incremental steps, such as restrictions on address growth or use, the use of NATs, or some other (painful but understood) kludge, will appear less painful than switching to IPv6. Indeed, some believe that address exhaustion is not a problem meriting deployment of a new solution, arguing that CIDR makes the Internet's core a stable enough addressing domain that private address spaces around the periphery of that domain can effectively serve demands for the foreseeable future. On the other hand, as discussed in the previous section, this approach could have serious drawbacks: an Internet in which NATs become pervasive is an Internet in which the assumption of transparency no longer holds (see Chapter 3 for more on this issue). Moreover,

to machines, and since many machines are used by more than one person, the identifier doesn't track back to a person. Full protection requires the use of explicit services, such as anonymizing relays. The specific IPv6 problem has been addressed by the IETF, which has devised a way to derive the address from the unique identifier without exposing the identifier itself.
35 See, e.g., Nokia. 2000. Nokia Successfully Drives Forward IPv6 As the Protocol for Future 3G Networks. Press release, May 26. Available online at <http://press.nokia.com/PM/782371.html>.

it ignores the needs of potentially large consumers for address space for devices such as IP-based mobile telephones.

Transition to IPv6 requires the development and deployment of new networking software (Internet Protocol stacks) on all the devices that are connected to networks running IPv6, updated software (and even hardware) in routers within these networks, and translation gateways between networks running IPv4 and IPv6. Also, many applications running on computers connected to IPv6 networks will probably also need to be adapted to use the longer address. Desktop support alone is not enough to trigger industry adoption, because IPv6 is not something that can take place as a result of actions taken at the edges of the network alone.36 Implementations of many of the requisite elements exist today; the hurdle is largely not one of developing the required technology but of making the investments needed to deploy it.

There are various strategies by which IPv6 could be deployed. No one believes that a scheduled, "flag-day" transition would be effective, given the absence of incentives for players to comply and, worse still, the prospect that an abrupt shift to IPv6 could go awry catastrophically. (No one in the industry wants to wake up to news reports like "Major ISP Backbone Melts After Messing with IPv6.") The difficulty of making a switch has long been recognized, along with the idea that deployment will need to be incremental. Transition is complicated because an IPv4-only system cannot communicate directly with an IPv6 system, meaning that either the IPv6 device must also speak IPv4 or it must have a translator. A transition period is foreseen in which IPv4/IPv6 translators (somewhat like NATs) sit between systems running different protocol versions, enabling a piecemeal deployment of IPv6 from the Internet's edge toward its core, and a model known as "6 to 4" is emerging in which every IPv4 site automatically inherits an IPv6 prefix with the IPv6 traffic encapsulated in IPv4.

Finally, it should be noted that many of the addressing issues discussed here arise, in part, from routing scaling considerations. CIDR was a measure to control the number of routes that a router must remember. It assigns addresses in a hierarchical manner that forces them to be assigned in blocks (and also forces most Internet subscribers to obtain their addresses from their ISP). It is, however, viewed as a burden for the subscriber, because it means that to switch from one ISP to another, it is necessary to renumber all the attached machines. With NAT, this is not an issue, because subscriber machines are assigned private addresses,

36 Another example of this is multicast, which, although it is supported in major operating systems such as Windows, is not widely used.

and only the single addresses of the NAT boxes need to be changed, a much smaller number. However, if IPv6 were to be deployed and all machines were to revert to the original architectural model of globally unique, public addresses, this problem would resurface on a large scale. While it would not be fully automatic, IPv6 would provide for much simpler renumbering. Still, while the current situation is generally adequate, the difficulties associated with assignment and renumbering suggest that continued research on new approaches to addressing and routing would be worthwhile.

RELIABILITY AND ROBUSTNESS

The proliferation of devices and applications discussed above is symptomatic of growing use and an expectation that the Internet can be depended on. This expectation translates into a requirement for greater reliability, robustness, and predictability. As more and more critical functions are moved to the Internet, there is increasing pressure for reliable and continuous service, even when things go wrong. As the uses of the Internet expand, the stakes, and thus the visibility, of reliability and robustness concerns will rise.

There are also public safety considerations, some of which will derive from expectations associated with conventional telephony; telephone users expect a high level of availability for normal calling and demand availability for making emergency (911) calls. It is reasonable to assume that such expectations (including, for instance, the capability for automatically notifying authorities of the geographical location of a 911 caller) will transfer to Internet telephony services. As Internet use becomes more widespread, it is conceivable, or even likely, that other, new life-critical applications will emerge. For example, Internet-connected appliances could be used to monitor medical conditions, e.g., remote EKG monitoring.37

Concerns about the Internet's robustness and vulnerability to attack are reflected in the attention given to these matters by the national security/emergency preparedness (NS/EP) community. This community traditionally relied on the PSTN for 911 emergency services and priority restoration and service delivery during designated crises. It promulgates guidelines that are then codified in regulations and builds on years of experience and personnel training as well as the cost structures of regulated

37 Computer Science and Telecommunications Board (CSTB), National Research Council (NRC). 2000. Networking Health: Prescriptions for the Internet. Washington, D.C.: National Academy Press.

telephony. NS/EP organizations, with public and private sector elements, have expanded their missions and compositions to embrace nontelephony service providers and the Internet, and they have begun to study and make recommendations regarding the Internet's robustness.38 Discussions between the NS/EP community and ISPs are in their early stages, and what form any agreements between the two would take remains to be seen.

Designing for Robustness and Reliability

While the terms are often used interchangeably, reliability, robustness, and predictability refer to different aspects of the communications service. Robustness refers to the ability of a system to continue to provide service even under stress (e.g., if some of the system's components or capabilities are lost to failure or it is subject to malicious attack). A robust system is, essentially, one that is not subject to catastrophic collapse; instead, it degrades gracefully, providing the level of service that would be expected from the available resources. Reliability is a measure of whether a system provides the expected level of service. For example, a system designed for 99.999 percent reliability would fail not more than 5 minutes per year. Reliability is typically achieved by the combination of component reliability, component redundancy, and a robust system design. An important distinction between robustness and reliability is that while a robust system typically provides a reliable service, a reliable system need not be robust. For example, many companies run certain vital functions on a single computer that is very well maintained. This computer and its services are reliable (because the system is carefully maintained, the service is almost always available) but they would not be considered robust (if the computer fails, the service is not available). Predictability refers to the user's expectations about the availability and quality of service routinely being met. While a service can, in theory, be predictably poor (e.g., telephone service in some developing countries), when most users speak of a predictable service they mean a reliable service whose occasional periods of degraded service can be anticipated (e.g., the difficulty of making long-distance phone calls on Mother's Day due to the volume of calls).

The Internet's design is based on a world view different from that of the PSTN on how to provide the qualities of reliability, robustness, and predictability. The choice to base the Internet on richly interconnected

38 See, for example, National Security Telecommunications Advisory Committee (NSTAC). 1999. Network Group Internet Report: An Examination of the NS/EP Implications of Internet Technologies. Washington, D.C.: NSTAC.

components stems from two observations that have their origins in research in the early 1960s.39 First, although the PSTN depends on the reliability of its components, neither the overall system nor any of its components are perfectly reliable. Second, with the right design, the reliability of a network does not depend on all its components working; it only requires that there are enough working components to connect any two end points. Each component is itself reasonably reliable, but the design relies on the network's ability to adapt, i.e., to reconfigure itself when a component fails so as to make efficient use of the remaining components. Consequently, it is able to recover from local outages of components by simply requiring the computers communicating across it to retransmit the messages that were in transit at the time of the failure. The result is a robust service built of components that are far less reliable than those of the PSTN.

In contrast, the PSTN's design emphasizes building a network out of highly reliable components and using them reasonably well. The PSTN achieves its reliability in part by working very hard to make sure that every important component of the PSTN is, in and of itself, reliable. The result is a system that may not be robust in all circumstances, because if a critical component of the PSTN fails, the system fails. Typically these failures are minor (e.g., a few phone calls get prematurely disconnected) but some can be spectacular (e.g., the multiday loss of phone service as a result of a fire at a central office in Hinsdale, Illinois, in 1990). Also, in the PSTN the viability of a given call is viewed as secondary to the reliability of the total infrastructure: individual calls need not be reliable in an infrastructure failure. Rather, user applications are required to be robust in the presence of failure, and the PSTN concerns itself with restoring service quickly when an outage occurs so that the application can restart. Following service restoration, applications restart: the caller places a new call, and interrupted file transfers restart from the beginning or from a saved checkpoint. Any resource optimization that might result from dynamic rerouting is viewed as secondary to the predictability of the call itself. The routing path for a connection through the PSTN is established at call time; this routing does not change during the life of a call.

While the Internet is sometimes unreliable, it generally operates as the underlying design presumed it would. There are a number of reasons why the Internet is viewed as unreliable despite its robust design; the discussion below outlines several of these as well as possible remedies.

39 See, for example, Leonard Kleinrock. 1964. Communication Nets: Stochastic Message Flow and Delay. New York: McGraw-Hill.
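These availability figures translate directly into an annual downtime budget, which is where the roughly 5 minutes per year allowed by 99.999 percent reliability comes from. The following minimal sketch simply performs that conversion; the function name and the particular targets shown are arbitrary choices for illustration, not anything specified in this chapter.

```python
def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum annual downtime, in minutes, implied by an availability
    target expressed as a percentage (e.g., 99.999 for "five nines")."""
    minutes_per_year = 365.25 * 24 * 60          # roughly 525,960 minutes in a year
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return minutes_per_year * unavailable_fraction

# 99.999 percent availability allows roughly 5 minutes of outage per year,
# matching the figure cited in the text.
for target in (99.9, 99.99, 99.999):
    print(f"{target}%: {downtime_minutes_per_year(target):.1f} minutes/year")
```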

Vulnerability of the Internet to Attack

There is increasing concern about deliberate attempts to sabotage the operation of the Internet. Recent reports, including the 1999 CSTB report Trust in Cyberspace40 and a National Security Telecommunications Advisory Committee report,41 identified a number of vulnerabilities of the Internet (as well as of the PSTN). These vulnerabilities are associated with failures of hardware and software as well as the system's susceptibility to malicious attack. Indeed, there are a number of points of potential massive vulnerability in today's Internet, including the integrity of the addresses being routed, the integrity of the routing system, the integrity of the domain name system, and the integrity of the end-to-end application communication (Box 2.3).

Considerable attention has been devoted to the Internet as part of the late-1990s examination of the nation's critical infrastructure. In 1997, the President's Commission for Critical Infrastructure Protection issued a report42 highlighting concerns about the vulnerability of several infrastructure sectors, including information and communications; it found both a growing dependence on and vulnerability associated with such infrastructures. The next year, 1998, saw the issuance of Presidential Decision Directive 63, which focuses on protecting critical infrastructure against both physical and cyber attack, and the establishment of a National Infrastructure Protection Center, located in the Department of Justice, which is indicative of law enforcement's growing attention to the problem of malicious attacks on infrastructure, including the Internet.

The network also threatens the security of users because it can serve as the conduit for a variety of attacks against their systems. Various forms of attack exploit weaknesses in end-user systems, allowing an unauthorized attacker to read or manipulate information in the systems. Some attacks involve breaking into a system for the purpose of obtaining or modifying its data. Fraudulent entries might be made in accounting files, pictures on Web pages might be replaced with others that the attacker finds humorous, or sensitive information might be accessed. Or, an attack might simply overwrite (destroy) data. A man-in-the-middle

40 Computer Science and Telecommunications Board (CSTB), National Research Council. 1999. Trust in Cyberspace. Washington, D.C.: National Academy Press.
41 National Security Telecommunications Advisory Committee (NSTAC). 1999. Network Group Internet Report: An Examination of the NS/EP Implications of Internet Technologies. Washington, D.C.: NSTAC, June.
42 President's Commission for Critical Infrastructure Protection (PCCIP). 1997. Critical Foundations: Protecting America's Infrastructures. Washington, D.C.: PCCIP. Available online at <http://www.pccip.ncr.gov/report_index.html>.

attack does the same thing from the vantage point of some intermediate system that has access at some point between the sender and receiver (e.g., a router or gateway). Other attacks do not target the data but attack the resources for its transport: a denial-of-service attack might be implemented simply by filling a communications link with junk messages, consuming much or all of the available bandwidth, thereby slowing or denying its intended use. Attacks can be launched against the systems or networks of particular ISPs or end users or more broadly against the Internet's infrastructure.43 The Internet also serves as a conduit for the rapid transmission of viruses, computer code that can alter or destroy data in computer systems, which are readily attached to files and sent by e-mail to users of a network.44 As new viruses continue to be invented and distributed, techniques for detecting and mitigating their effects must be developed as well.

The threat posed by Internet vulnerabilities is magnified by several coincident developments. Tools to exploit Internet vulnerabilities are readily disseminated. They can be distributed to a large number of technically unsophisticated users, some of whom may not even be aware of the implications of using them. Also, as an increasing number of users access the Internet through broadband services, their machines become platforms for launching attacks, such as denial of service, that are more effective when the attacker has a high-speed connection to the Internet. (This is somewhat compensated for by the even greater speed used by ISPs and large servers, which renders saturation attacks more difficult simply because it is harder to completely congest the target computer's communications link.) At the same time, the always-on systems of broadband users become targets for attackers and may be captured and used, unbeknownst to their owners, as relays for further attacks on the network. In such an environment, ISPs have to take steps to protect their own networks as well as the customers connected to them. The types of attacks discussed here are varied; while those who conduct the attacks may be unsophisticated, those who develop the capabilities are resourceful and creative.

While a number of measures can be implemented within the network

43 For example, the routing infrastructure itself might be attacked by hijacking a TCP connection or inserting incorrect routing information into the system. Or, the name service infrastructure could be attacked on a global basis by launching a denial of service against the Domain Name System's root computers.
44 In 2000, several viruses disseminated as e-mail attachments spread widely and received a great deal of media attention. Most prominent was the "I Love You" virus, which spread by exploiting a vulnerability in the Microsoft Outlook e-mail software.

to enhance robustness, the Internet's architecture, in which service definition is pushed to the end systems connected to the network, requires many security issues to be addressed at the edges of the network. Both the performance of individual applications and services and the overall integrity of the network are therefore placed in the hands of these end systems. As more and more end systems with widely varying degrees of trustworthiness are added to the network, the more tenuous the assumption

of mutual trust becomes. Thus, absent the adoption of new measures, both within the network as well as at its edges, security violations of varying degrees of severity can be expected to continue. In particular, the dependence on end systems means that much of the security burden rests on vendors of applications and operating system software as well as on the users that configure and operate those systems.

Concerns about these issues have led to investment in research aimed

at addressing some of these shortcomings. The Internet's vulnerability stems in part from the difficulty, today, of tracing the location from which attacks such as denial of service have been launched, making it difficult to identify an attacker after the fact. One approach under development aims to allow an attack to be traced, without requiring access to ISP logs, if the attack involves a lot of packets. Other research is aimed at tracing attacks without storing large amounts of information, thereby reducing the storage burden and alleviating the privacy concerns that would arise from ISPs' logging of detailed information on the traffic passing over their networks.

In many cases, however, the Internet is vulnerable to attack not because there are no technologies to protect it but because they are not used, whether for lack of perceived need or lack of knowledge about such measures. Known vulnerabilities are frequently not addressed despite the promulgation of patches to fix them. Enhanced technologies such as single-use passwords, cryptographic authentication, and message digest authentication technology have been developed and are starting to see deployment. Nonetheless, many network operators do not necessarily use these capabilities to defend their systems; plain text passwords, which are readily guessed or captured, remain the authentication technique in most common use. Likewise, while many data access or modification attacks are made by insiders (people within the organization), the most commonly installed intrusion detection and prevention technology is a firewall, which by nature provides only limited defense and only protects against attacks across the perimeter they defend: the intruder who succeeds in getting inside the defense walls ends up with free run of the interior unless a defense in depth has been put in place.

Responses to these vulnerabilities have been mixed, and continued work will be required. Specifying and deploying the needed upgrades to the protocols and software used in the Internet is a complex problem. They are a result of both group development efforts and marketplace dynamics. These, in combination, determine what is developed and whether and when it is adopted and deployed. An indicator of both today's problems and prospects for future improvements is the relatively recent move by the IETF to prevent those proposing Internet standards from sidestepping analyses of security implications.45 Within the Internet

45 The September 1998 guidelines and procedures for IETF working groups state as follows: "It should be noted that all IETF working groups are required to examine and understand the security implications of any technology they develop. This analysis must be included in any resulting RFCs in a Security Considerations section. Note that merely noting a significant security hole is no longer sufficient. IETF developed technologies should not add insecurity to the environment in which they are run" (Internet Engineering Task Force [IETF]. 1998. IETF Working Group Guidelines and Procedures, RFC 2418. IETF, p. 22. Available online from <http://www.ietf.org/rfc/rfc2418.txt>).

marketplace, progress is evident because some ISPs see robustness and service guarantees as an enticement to attract customers.

More Adaptive Routing

Improvements in robustness will become increasingly important as service-level agreements between providers and customers (including other providers) begin to specify high reliability. There is considerable debate among those who design and operate the Internet's networks about the relative priority to be attached to improving the reliability of components and to increasing the adaptive performance of the network. One approach is to make components more reliable, the argument being that this improvement, taken in combination with the Internet's current level of robustness, would yield a network potentially far more reliable than the PSTN. Another approach, which is based on the belief that engineering extremely reliable equipment is extremely expensive, is that the Internet should be able to achieve much better overall reliability with less reliable but less costly components by improving its adaptive performance.

Robustness depends on software as well as hardware, and achieving a high level of robustness, particularly when a strategy of using lower cost, less reliable hardware is adopted, depends on software and deployments that enhance the Internet's adaptability in the face of individual component failures. A principal means of adaptation is the modification of routing paths to reflect information discovered about links and routers that are not operational. Today's routing protocols were developed at a time when major links were significantly slower than they are now. Improved routing algorithms, suggested by Dijkstra in 1970 but widely implemented only in the past decade (in the open shortest path first routing protocol), have improved reliability and robustness. Still, the rate at which the network adapts to failures remains fairly sluggish. Routing could, in principle, recover from failures (or incorporate newly available capacity) in tens of milliseconds, but the historically based timer settings that are found in deployed routers permit recovery only on much longer timescales of tens to hundreds of seconds, a timescale long enough to have a noticeable impact on users and to result in the failure of many individual transactions.

Why do the routing protocols have such long time constants? The basic reason is that a local routing change can, in principle, have global consequences. For example, one might have a situation in which the failure of just one link would cause significant traffic flows leaving the United States to shift from an eastward (Atlantic Ocean link) to a westward (Pacific Ocean link) path. The circumference of the earth is approximately

40,000 kilometers, so for communication at the speed of light it takes a data packet slightly more than 0.1 seconds to go around the earth. That represents a lower bound; in practice, the time it takes a data packet to go around the earth will be substantially longer, owing to delays in the various network elements it must traverse in the Internet. Any routing protocol that takes into account the global consequences of a routing change has to adjust on a timescale longer than this roughly 0.1-second minimum; otherwise, the results of the routing computation will not be globally consistent. For example, there would be no use suddenly sending traffic via the Pacific if routing information describing what the routers located in Asia should do with the traffic has not had time to make it there yet. However, it turns out that most local link failures do not have global consequences. So if there were designs for propagating routing information that could limit the consequences of certain classes of changes, the network could respond much faster. The development of these sorts of bounded schemes would make a good class of research. Faster adaptation, whereby failure of an Internet element triggers a quick rerouting of traffic, also depends on supporting capabilities such as rapid failure detection.

To increase robustness, redundancy is also needed in the major communications links. Optical fibers, for example, are carrying growing amounts of traffic as wave division multiplexing (WDM) increases the capacity of each fiber and, accordingly, the impact if it fails. The sharing (multiplexing) of communications facilities means that it is increasingly difficult to provision physically diverse communications links between two given sites. For example, two nominally different communications links, even when obtained from different providers, may in fact run over the same fiber, in the same fiber bundle, or in the same conduit, meaning that they are both vulnerable to a single physical disruption (e.g., a backhoe inadvertently cutting a buried fiber bundle). Finally, as the next section discusses, the extent to which the Internet is able to adapt to failures depends on how the interconnections between providers are designed and implemented.

Putting It Together

The heterogeneous, multiprovider nature of the public Internet poses additional robustness challenges. The Internet is not a monolithic system, and the several thousand Internet service providers in North America range in size from large international providers to one-room companies serving a single town with dial-up service. The business demand for these operators to provide a high level of redundant facilities, constantly manned operation centers, or highly trained staff, and their ability to do

so varies considerably. Nonetheless, all these different providers, with their different goals for availability, reliability, and so on, interoperate to provide the global Internet service.

The section above described why it can be difficult to provide sufficiently diverse communications paths. Doing so is a necessary but not sufficient condition for increasing the robustness of interprovider links. Traffic cannot be routed to alternative paths if the interprovider connection agreements overly constrain the number of paths the data can follow. For example, interconnection agreements often contain provisions, aimed at preventing traffic dumping (one provider dumping traffic onto segments of another provider's network), that have the effect of permitting traffic between a particular point on one ISP's network and a point on another ISP's network to follow only one specific path. If that specified link fails, there are no alternative paths for the traffic to follow (until the failure is detected and other traffic flow arrangements are put in place).

ISPs devote considerable resources to detecting and responding to failures and attacks. Problems that cross ISP boundaries represent a particular challenge as the reliability of any one part of the Internet (and thus the reliability of the whole Internet) can depend on what happens within other ISP networks. Not all ISPs have the same level of expertise or capabilities, and one may trace a problem only to find that it originates with a provider that lacks the facilities or expertise to address it. There have been instances of failures in one ISP that have caused DNS failures, routing failures, or transmission failures in a neighboring ISP. ISPs depend on expert network operators to maintain the health of their individual networks as well as the Internet as a whole. They continue to track problems and troubleshoot on an ad hoc basis (e.g., through the informal relationships that exist among the expert network operators) and through loose coordination mechanisms (e.g., the North American Network Operators Group) in order to minimize network service outages. And industry groups such as IOPS have been convened to tackle robustness issues. It is hoped, but is by no means certain, that better tools and methods will result in operations that are more robust. ISPs must be able to resist attacks that originate within other ISPs (e.g., foreign ISPs with perhaps unfriendly intentions) but at the same time must interconnect with other ISPs to preserve the basic transport function of the Internet. However, more structured and effective methods and tools are required for robust operation, especially to protect against intentional malicious attacks that originate in a connected ISP. The design of such protocols and methods is, however, still largely a research issue.46

46 Computer Science and Telecommunications Board (CSTB), National Research Council. 1999. Trust in Cyberspace. Washington, D.C.: National Academy Press.
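The exposure created by nominally diverse links that share a physical segment, or by interconnection agreements that pin traffic to a single path, can be illustrated with a simple reachability check over a network model. The sketch below uses a made-up topology (all node and link names are hypothetical) and lists the links whose individual loss would disconnect the two sites; it is an illustration of the idea, not a tool described in this chapter.

```python
from collections import deque

def reachable(links, src, dst, failed=None):
    """Breadth-first reachability over undirected (a, b) links, optionally
    treating one link as failed."""
    adjacency = {}
    for a, b in links:
        if (a, b) == failed or (b, a) == failed:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def single_points_of_failure(links, src, dst):
    """Links whose individual failure disconnects src from dst."""
    return [link for link in links if not reachable(links, src, dst, failed=link)]

# Two "diverse" provider paths that nonetheless share one access conduit.
links = [("siteA", "conduit"), ("conduit", "ispX"), ("conduit", "ispY"),
         ("ispX", "siteB"), ("ispY", "siteB")]
print(single_points_of_failure(links, "siteA", "siteB"))
# -> [('siteA', 'conduit')]: the shared segment defeats the nominal diversity
```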

Application Reliability and Robustness

Given the Internet's layered architecture, responsibility for ensuring that a given application works properly is diffuse. The Internet's design allows each user to decide what applications to run over it. Because it is the software running on computers attached to the network rather than software in the routers that determines what applications can be run, the Internet service provider does not, in general, see what application the user is running and may not take specific steps to support it. A consequence is that a user who obtains software for a new application such as Internet telephony may be surprised to discover that the part of the Internet to which he or she is attached is not provisioned to satisfactorily support the application.

An application such as telephony thus raises not only the question of how to address reliability and quality across network providers but also the question of who in fact is providing the service. Telephony may be offered by an ISP or by a third party. The third party may simply offer hardware or software, in which case users would be able to set themselves up for Internet telephony on their own, without any Internet service provider being involved (or even having knowledge of how its network is being used). Or, a third party may offer services via the Internet that are related to placing calls, such as a directory service that allows one to determine the Internet device associated with a particular user and thus place a call. In this case, although a service provider can be identified, it need not have established any relationship with the Internet service providers responsible for actually serving the customer and carrying the data associated with a telephone call. In the absence of specific service arrangements made by customers or an Internet telephony provider, an Internet provider that happens to be carrying voice traffic will not in general even know that its facilities are being used for telephony. A user might then place a 911 call and discover that the ISP serving that user has not provided a robust connection to any 911 facility. This separation of Internet service from application means that, with today's Internet, one cannot assume that the robustness of an infrastructure is appropriate for the robustness of the application.

One response to a view that "telephony is telephony" (that is, no matter what technologies are used) would be to impose quality, robustness, and emergency service requirements only on the parties that specifically offer Internet telephony services. These parties would be responsible for negotiating with the Internet providers who carry traffic for them to provide an acceptable quality of service. However, this approach is not easily implemented when an application comes in the form of shrink-wrap or downloadable software that is installed by the consumer and intended to be run over whatever network the consumer subscribes to.
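One way an application such as Internet telephony could avoid surprising its user is to probe the path it will run over before placing calls. The sketch below assumes hypothetical round-trip-time samples and a measured loss rate, and screens them against commonly cited rules of thumb for interactive voice (roughly 150 ms one-way delay, 30 ms of jitter, and 1 percent packet loss); those thresholds are general industry guidance, not figures drawn from this chapter, and the probe values are invented.

```python
from statistics import mean

def voice_readiness(rtt_samples_ms, loss_rate):
    """Rough screen of whether a measured path looks adequate for
    interactive voice, using rule-of-thumb thresholds."""
    avg_rtt = mean(rtt_samples_ms)
    one_way_delay = avg_rtt / 2.0   # crude estimate assuming a symmetric path
    jitter = mean(abs(a - b) for a, b in zip(rtt_samples_ms, rtt_samples_ms[1:]))
    looks_ok = one_way_delay <= 150 and jitter <= 30 and loss_rate <= 0.01
    return {"one_way_delay_ms": round(one_way_delay, 1),
            "jitter_ms": round(jitter, 1),
            "loss_rate": loss_rate,
            "looks_ok": looks_ok}

# Made-up probe results from a congested residential connection.
print(voice_readiness([210, 340, 195, 410, 280], loss_rate=0.03))
```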

Another response would be to wait for market and technology developments. Mobile phones were deployed despite obvious limitations in quality and availability, and the users simply adapted their expectations. Internet applications may be introduced with similar expectations. They may then be improved over time, either because the quality of the network increases or because the application designers incorporate features to query or test the network's quality and compensate for it, a development that may or may not happen.

Robustness and Auxiliary Servers

Sometimes ISPs do make special provisions to support particular types of content or services. One popular, partial solution to scaling and quality of service issues is to distribute content servers at strategic points in the Internet. Their use raises an interesting reliability trade-off. On the one hand, as evidenced by the substantial investment in deploying caching and proxy services, such capabilities are viewed by many as an essential tool for making usable services that would otherwise be compromised by network congestion. On the other hand, the robustness concerns associated with some of these capabilities can be viewed as a good example of the principle that today's performance enhancements can become tomorrow's problems.

In some cases, the use of the servers is integrated into the application during the publishing process. Web pages are rewritten to point the user toward secondary content servers. In other cases, the caches are introduced in the form of content routers that are deployed by the ISP. The content routers, which operate outside the control of either the end user or the content provider, will typically filter packets that appear to request a Web service by looking at the TCP port numbers, and they will route these packets to the local cache server or content server regardless of the IP destination address.

Because both practices insert a set of auxiliary servers in the infrastructure, they have implications for robustness, as any failure of the servers would affect the end-to-end availability of the service. But their implications for robustness are different. When the usage of auxiliary servers is decided by the content provider at the application level, the responsibility for providing the right content and providing robust servers remains with that content provider. Wrong decisions, poor reliability of auxiliary servers, or implementation mistakes only affect the products of a specific publisher, without compromising the service experienced by other users. In contrast, a network provider that decides to intercept Web requests and to route them to its own content servers may run a greater risk, as there is no way for the content provider or end user to correct the

failure of a cache server by requesting the same information from another server; by the same token, there is no way to stop the content server from modifying the information.

Toward Greater Reliability and Robustness: Reporting Outages and Failures

Overloads, failures, operator errors, and other events on occasion disrupt user activities on the Internet. Industry organizations such as NANOG47 and IOPS48 provide forums for the exchange of information among ISPs about both problems and solutions. Anecdotal reports (e.g., press reports and e-mail exchanges among network operators and other industry experts) provide additional information. Thus we have some indication of the sources of failure, which include the following:

· Communications links that are severed by construction workers digging near fiber-optic cables;
· Network operators that issue incorrect configuration commands;
· Inadequately tested software upgrades that cause systems to fail; and
· Deliberate attacks that are carried out on Internet systems and services.

But information on Internet failures has not been reported or analyzed on a systematic basis. From the standpoint of both customers and policy makers, more systematic Internet statistics49 would be helpful in both appraising and improving the Internet's robustness. Also, a requirement to report problems would create pressure to increase the overall reliability of the Internet. It could also help consumers distinguish among providers with different service records and it might help providers detect recurring patterns of failures, so that the root cause could be eliminated.50

47 See <www.nanog.org>.
48 See <www.iops.org>.
49 Such information was available on the NSFNET, for which Merit, Inc., collected and published statistics.
50 For example, in the case of PSTN reliability reporting, the Network Reliability Steering Committee used the FCC-mandated data to identify construction activity resulting in damage to fiber-optic cables as the factor responsible for more than 50 percent of facility outages and, further, found that more than one-half of these occurrences were due to the failure of excavators to notify infrastructure owners or provide adequate notice before digging.
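Detecting the recurring patterns mentioned above becomes mechanical once failure reports are pooled in a common format. The sketch below uses entirely hypothetical outage records; tallying even a few of them by root cause makes a recurring problem, here fiber cuts, stand out.

```python
from collections import defaultdict

# Hypothetical outage reports: (root cause, outage duration in minutes).
reports = [
    ("fiber cut", 190),
    ("operator misconfiguration", 35),
    ("fiber cut", 260),
    ("untested software upgrade", 75),
    ("fiber cut", 120),
]

by_cause = defaultdict(lambda: {"incidents": 0, "minutes": 0})
for cause, minutes in reports:
    by_cause[cause]["incidents"] += 1
    by_cause[cause]["minutes"] += minutes

# Rank causes by total outage minutes to surface recurring patterns.
for cause, stats in sorted(by_cause.items(), key=lambda item: -item[1]["minutes"]):
    print(f'{cause:28s} {stats["incidents"]} incident(s), {stats["minutes"]} minutes total')
```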

There is an important distinction between the needs of customers and the needs of those seeking to understand and eliminate problems. The performance of ISPs can be monitored by independent organizations that publish reports for subscribers or the public. A buying guide of sorts, this type of information helps a customer assess the service of the various ISPs. Independent monitoring may discover and report outages, but it does not necessarily provide access to the causal and diagnostic information that is needed to try to eliminate the root cause of problems. The latter goal requires that detailed information on failures be made available to industry groups and/or researchers.

The government has mandated problem reporting in other circumstances. Where public safety is involved, the government has imposed very strict requirements for reporting and analysis (e.g., crashes of commercial aircraft). The government has also required reporting in cases where the issue is not public safety but where a perceived need exists to help the consumer understand the service being provided. Airlines are, for example, required to track and report how late their flights are, lost baggage statistics, and the like.

The Federal Communications Commission requires outage reporting by operators of some elements of the PSTN.51 The requirement was motivated by concerns that the commission did not have the means to systematically monitor significant telephone outages on a timely basis, as well as by a more general interest in seeing that information on vulnerabilities is shared. A series of major service outages in interexchange and local exchange carrier networks in the early 1990s underscored the need to obtain information that would improve reliability. Under these rules, outages of more than a specified level of severity (duration, number of subscribers affected, etc.) must be publicly documented.52 For example, service outages that significantly degrade the ability of more than 30,000 customers to place a call for more than 30 minutes must be reported.53

51See FCC Common Carrier Docket No. 91-273, paragraphs 4 and 32 (February 27, 1992), as cited in Network Reliability and Interoperability Council (NRIC). 1997. Network Interoperability: The Key to Competition. Washington, D.C.: NRIC, Federal Communications Commission federal advisory committee, p. 12.
52 47 C.F.R. section 63.100 sets mandatory reporting thresholds for wireline telephone companies. The requirements are based on the type of outage, the number of affected subscribers, and the duration of the failure.
53See Network Reliability and Interoperability Council (NRIC). 1997. Network Interoperability: The Key to Competition. Washington, D.C.: NRIC, Federal Communications Commission federal advisory committee, p. 11. Available online at <http://www.nric.org/pubs/nric3/reportj9.pdf>.
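The example threshold quoted above can be expressed as a simple check. The sketch below is illustrative only; the actual criteria in 47 C.F.R. section 63.100 are more detailed, and the field and function names here are invented.

```python
# Toy check of the example PSTN reporting threshold quoted in the text:
# outages affecting more than 30,000 customers for more than 30 minutes.
# The real regulatory criteria are more detailed than this sketch.

from dataclasses import dataclass


@dataclass
class Outage:
    customers_affected: int
    duration_minutes: float


def must_report(outage: Outage,
                customer_threshold: int = 30_000,
                duration_threshold_min: float = 30.0) -> bool:
    """True if the outage exceeds both example thresholds."""
    return (outage.customers_affected > customer_threshold
            and outage.duration_minutes > duration_threshold_min)


if __name__ == "__main__":
    print(must_report(Outage(customers_affected=45_000, duration_minutes=95)))   # True
    print(must_report(Outage(customers_affected=12_000, duration_minutes=240)))  # False
```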

Changes in the telecommunications industry led the FCC, in 1998, to ask the Network Reliability and Interoperability Council (NRIC) IV to explore reliability concerns in the wider set of networks (e.g., telephone, cable, satellite, and data, including the Internet) that the PSTN is part of. The report of the NRIC IV subcommittee looking at needs for data on service outages54 called for a trial period of outage reporting. NRIC V, chartered in 2000, has initiated a 1-year voluntary trial starting in September 2000 and will monitor the process, analyze the data obtained from the trial, and report on how well the process works.55

ISPs are not, at present, mandated to release such information. Indeed, the release of this type of information is frequently subject to the terms of private agreements between providers. This situation is not surprising, given the absence of regulation of the Internet and the high degree of regulation of the telephone industry. As the Internet becomes an increasingly important component of our society, there will probably be calls to require reporting on overall reliability and specific disruptions. It is not now clear what metrics should be used and what events should be reported, what the balance between costs and benefits would be for different types of reporting, or what the least burdensome approach to this matter would be. One response to rising expectations would be for Internet providers to work among themselves to define an industry approach to reporting. Doing so could have two benefits: it might provide information useful to the industry, and it might avoid government imposition of an even-less-welcome plan.

As noted above, one important reason for gathering information on disruptions is to provide researchers with the means to discover the root causes of such problems. For this to be effective, outage data must be available to researchers outside the ISPs; ISPs do not generally have research laboratories and are not necessarily well placed to carry out much of the needed analysis of the data, much less design new protocols or build new technologies to improve robustness. Also, data should not be anonymized before they are provided to researchers; the anonymity hides information (e.g., on the particular network topology or equipment used) from the researcher.

54See Network Reliability and Interoperability Council (NRIC). 2000. Network Reliability Interoperability Council IV, Focus Group 3, Subcommittee 2, Data Analysis and Future Considerations Team. Washington, D.C.: NRIC, Federal Communications Commission federal advisory committee, p. 4. Available online at <ftp://ftp.atis.org/pub/nrsc/fg3sc2final.PDF>.
55See Network Reliability and Interoperability Council (NRIC). 2000. Revised Network Reliability and Interoperability Council - V Charter. Washington, D.C.: NRIC, Office of Engineering and Technology, Federal Communications Commission. Available online at <http://www.nric.org/charter_v/>.

However, in light of proprietary concerns attached to the release of detailed information, researchers must agree not to disclose proprietary information (and must live up to those agreements). Disclosure control in published reports is not simply a matter of anonymizing the results; particular details may be sufficient to permit the reader of a research report, including an ISP's competitors, to identify the ISP in question. Attention must, therefore, also be paid to protecting against inadvertent disclosure of proprietary information.

Looking to the future, the committee can see other reasons why ISPs would benefit from sorting out what types of reliability metrics should be reported. For example, it is not hard to imagine that at some point there would be calls from high-end users for a more reliable service that spans the networks of multiple ISPs and that some of the ISPs would decide to work together to define an "industrial-strength" Internet service to meet this customer demand. When they interconnect their networks, how would they define the service that they offer? Since the performance experienced by an ISP's customer depends on the performance of all the networks between the customer and the application or service the customer is using, each ISP would have an interest in ensuring that the other ISPs live up to reliability standards. Absent a good source of data on failures (and a standardized framework for collecting and reporting on failures), how would the ISPs keep tabs on each other? In the process of defining a higher-grade service, ISPs will want to understand what sort of failure would degrade the service, and it is this sort of failure that they ought to be reporting on. From this perspective, outage reporting shifts from being a mandated burden to an enabler of new business opportunities.

It is unlikely that simple, unidimensional measures that summarize ISP performance would prove adequate. Creating standard reporting or rating models for the robustness and quality of ISPs would tend to limit the range of services offered in the marketplace, and users differ in what they want from a provider. What form might such user choices take? Consider, as an example, an ISP that experiences the failure of a piece of equipment and so faces a tough trade-off. It could continue to operate its network at reduced performance in this condition or undergo a short outage to fix the problem, a choice between an extended period of uptime at much reduced performance and a short outage that restores performance to normal. Some of the ISP's customers (e.g., those who depend on having a connection rather than on the particular quality of that connection) will prefer the first option, while others will prefer the second. Indeed, some may be willing to pay extra to get a service that aims to provide a particular style of degraded service. (Such a "guaranteed style of degradation" is an interesting variation on QOS and does not impose much overhead.) These considerations suggest that, more generally, there is a need for many different rating scales or, put another way, a need for measuring several different things that might be perceived as "quality" or reliability. Combining them into a single metric does not serve the interests of different groups (user or vendor or both) that are likely to prefer different weighting factors or functions for combining the various measures.
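The difficulty with a single combined metric can be made concrete with a small numerical sketch. The providers, measurements, and weights below are entirely hypothetical; the point is only that two customers who weight the same measurements differently can rank the same providers differently.

```python
# Hypothetical illustration: a single weighted "quality" score hides information,
# because different customers apply different weights to the same measurements.

METRICS = {
    # provider: (availability %, median latency ms, mean hours to repair)
    "ISP-A": (99.95, 80, 12.0),
    "ISP-B": (99.80, 35, 2.0),
}


def score(availability, latency, repair_hours, weights):
    w_avail, w_lat, w_rep = weights
    # Higher is better: reward availability, penalize latency and repair time.
    return w_avail * availability - w_lat * latency - w_rep * repair_hours


always_on_user = (100.0, 0.01, 0.1)   # cares mostly about staying connected
latency_user = (1.0, 1.0, 0.1)        # cares mostly about low delay

for name, (avail, lat, rep) in METRICS.items():
    print(name,
          round(score(avail, lat, rep, always_on_user), 2),
          round(score(avail, lat, rep, latency_user), 2))
# With these invented weights, ISP-A scores higher for the always-on user,
# while ISP-B scores higher for the latency-sensitive user: no single
# weighting serves both groups.
```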

QUALITY OF SERVICE

The Internet's best-effort quality of service (QOS) makes no guarantees about when, or whether, data will be delivered by the network. Together with the use of end-to-end mechanisms such as the Transmission Control Protocol (TCP), which provides capabilities for reassembling information in proper order, retransmitting lost packets, and ensuring complete delivery, best-effort service has been successful in supporting a wide range of applications running over the Internet. However, unlike Web browsing, e-mail transmission, and the like, some applications such as voice and video are very time-sensitive and degrade when the network is congested or when transmission delays (latency) or variations in those delays (jitter) are excessive. Some performance issues, of course, are due to overloaded servers and the like, but others are due to congestion within the Internet. Interest in adding new QOS mechanisms to the Internet that would tailor network performance for different classes of application, as well as interest in deploying mechanisms that would allow ISPs to serve different groups of customers in different ways for different prices, has led to the continued development of a range of quality-of-service technologies. While QOS is seeing limited use in particular circumstances, it is not widely employed.

The technical community has been grappling with the merits and particulars of QOS for some time; QOS deployment has also been the subject of interest and speculation by outside observers. For example, some ask whether failure to deploy QOS mechanisms represents a missed opportunity to establish network capabilities that would foster new applications and business models. Others ask whether introducing QOS capabilities into the Internet would threaten to undermine the egalitarian quality of the Internet that has been a consequence of best-effort service, whereby all content and communications across the network receive the same treatment, regardless of source or destination.

Beyond the baseline delay due to the speed of light and other irreducible factors, delays in the Internet are caused by queues, which are an intrinsic part of congestion control and sharing of capacity. Congestion occurs in the Internet whenever the combined traffic that needs to be forwarded onto a particular outgoing link exceeds the capacity of that link, a condition that may be either transient or sustained.

When congestion occurs in the Internet, a packet may be delayed, sitting in a router's queue while waiting its turn to be sent on, and will arrive later than a packet not subjected to queuing, resulting in latency. Jitter results from variations in the queue length. If the queue fills up, packets will be dropped.

In today's Internet, which uses TCP for much of its data transport, systems sending data are supposed to slow down when congestion occurs (e.g., the transfer of a Web page will take longer under congested conditions). When the Internet appears to be less congested, transfers speed up and applications complete their transactions more quickly. Because the adaptation mechanisms are based on reactions to packet loss, the congestion level of a given link translates into a packet loss rate sufficiently large to signal the presence of congestion to the applications that share the link. Congestion in many cases lasts only for the transient period during which applications adapt to the available capacity, and it reaches drastic levels only when the capacity available to each application is less than the minimum provided by the adaptation mechanism.

Congestion is generally understood to be rare within the backbone networks of major North American providers, although it was feared otherwise in the mid-1990s, when the Internet was commercialized. Instead, it is more likely to occur at particular network bottlenecks. For example, links between providers are generally more congested than those within a provider's network, some very much so. Persistent congestion is also observed on several international links, where long and variable queuing delays, as well as very high packet loss rates, have been measured.56 Congestion is also frequent on the links between customers' local area networks (or residences) and their ISPs; sometimes it is feasible to increase the capacity of this connection, while in other cases a higher capacity link may be hard to obtain or too costly. Where wireless links are used, the services available today are limited in capacity; wireless bandwidth is fundamentally limited by the scarcity of radio spectrum assigned to these services, and over-the-air communication is vulnerable to a number of inherent impairments.

At least some congestion problems can be eliminated by increasing the capacity of the network by adding bandwidth, especially at known bottlenecks.

56See V. Paxson. 1999. "End-to-End Internet Packet Dynamics," IEEE/ACM Transactions on Networking 7(3):277-292, June. Logs of trans-Atlantic traffic available online at <http://bill.ja.net/> show traffic levels that are flat for most of the day at around 300 Mbps on a 310 Mbps link (twin OC-3) terminating in New York.
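The relationship among offered load, queuing delay, jitter, and loss described above can be illustrated with a toy simulation. The link rate, buffer size, and arrival pattern below are invented, and the sketch is not a model of any real router, but it shows the qualitative behavior: delay and jitter grow as load approaches capacity, and loss appears once the buffer overflows.

```python
# Toy single-link FIFO queue: congestion appears as queuing delay and jitter,
# and, once the buffer fills, as packet loss. All parameters are invented.

import random

LINK_RATE_PKTS_PER_MS = 10      # the link can drain 10 packets per millisecond
BUFFER_PKTS = 100               # queue capacity before packets are dropped


def simulate(offered_load: float, duration_ms: int = 1_000, seed: int = 1):
    """Return (mean delay ms, jitter ms, loss rate) for an offered load
    expressed as a fraction of link capacity (1.0 = exactly full)."""
    rng = random.Random(seed)
    queue = 0.0
    delays, dropped, sent = [], 0, 0
    for _ in range(duration_ms):
        # Random arrivals averaging offered_load * link rate per millisecond.
        arrivals = rng.randint(0, int(2 * offered_load * LINK_RATE_PKTS_PER_MS))
        for _ in range(arrivals):
            sent += 1
            if queue >= BUFFER_PKTS:
                dropped += 1                     # buffer overflow -> packet loss
            else:
                queue += 1
                delays.append(queue / LINK_RATE_PKTS_PER_MS)  # time spent queued
        queue = max(0.0, queue - LINK_RATE_PKTS_PER_MS)       # drain the link
    mean = sum(delays) / len(delays)
    jitter = (sum((d - mean) ** 2 for d in delays) / len(delays)) ** 0.5
    return mean, jitter, dropped / sent


for load in (0.5, 0.9, 1.2):
    mean, jitter, loss = simulate(load)
    print(f"load {load:.1f}: delay {mean:5.1f} ms  "
          f"jitter {jitter:5.1f} ms  loss {loss:.1%}")
```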

Adding bandwidth does not, however, guarantee that congestion will be eliminated. First, the TCP rate-adaptation mechanisms described above may mask pent-up demand for transmission, which will manifest itself as soon as new capacity is added. Second, on a slightly longer timescale, both content providers and users will adjust their usage habits if things go faster, adding more images to Web pages or being more casual about following links to see what is there, and so on. Third, on a longer timescale (on the order of months but not years), new applications can emerge when there is enough bandwidth to enough of the users to make them popular. This has occurred with streaming audio and is likely to occur with streaming video in the near future.

Also, certain applications, notably real-time voice and video, require controlled delays and predictable transfer rates to operate acceptably. (Streaming audio and video are much less sensitive to brief periods of congestion because they make use of local buffers.) Broadly speaking, applications may be restricted in their usefulness unless bandwidth is available in sufficient quantity that congestion is experienced very rarely or new mechanisms are added to ensure acceptable performance levels. A straightforward way to reduce jitter is to have short queue lengths, but this comes at the risk of high loss rates when buffers overflow. QOS mechanisms can counteract this by managing the load placed on the queue so that buffers do not overflow. However, the situation in the Internet, with many types of traffic competing in multiple queues, is complex. Better characterization of network behavior under load may provide insights into how networks might be engineered to improve performance.

Concerns in the past about being able to support multimedia applications over the Internet led to the development of a variety of explicit mechanisms for providing different qualities of service to different applications (e.g., best effort for Web access and specified real-time service quality for audio and video).57 Today, two major classes of QOS support different kinds of delay and delivery guarantees (see Box 2.4). They are based on the assumption that applications do not all have the same requirements for network performance (e.g., latency, jitter, or priority) and that the network should provide classes of service that reflect these differences.58

57In essence, these proposed QOS technologies resemble those that have proven effective in ATM and Frame Relay networks, with the exception that they are applied to individual application sessions or to aggregates of traffic connecting sets of systems running sets of applications rather than to individual circuits connecting pairs of systems. The mathematical difference between IP QOS and ATM QOS is that ATM sends variable-length bursts of cells, while IP sends variable-length messages. The biggest operational difference is that ATM QOS is generally used in ATM networks carrying real-time traffic, while QOS is generally not configured in IP networks today.
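The parenthetical point above about local buffers can be illustrated with a short sketch of a playout buffer. The packet spacing, buffer delay, and network delays below are arbitrary figures chosen for the example, not measurements.

```python
# Illustrative playout buffer: a receiver that delays playback by a fixed
# amount can absorb any jitter smaller than that delay. Larger buffers hide
# more jitter but add end-to-end delay, which interactive voice cannot tolerate.

PACKET_INTERVAL_MS = 20      # sender emits one audio packet every 20 ms
PLAYOUT_DELAY_MS = 100       # receiver waits this long before starting playback


def playable(send_times_ms, arrival_times_ms):
    """Count packets that arrive in time to meet their playout deadline."""
    ok = 0
    for sent, arrived in zip(send_times_ms, arrival_times_ms):
        deadline = sent + PLAYOUT_DELAY_MS     # when this packet must be ready
        if arrived <= deadline:
            ok += 1
    return ok, len(send_times_ms)


# Ten packets sent 20 ms apart; network delay varies (jitter) between 30 and 140 ms.
sends = [i * PACKET_INTERVAL_MS for i in range(10)]
network_delays = [40, 35, 90, 140, 60, 30, 85, 95, 70, 50]
arrivals = [s + d for s, d in zip(sends, network_delays)]

ok, total = playable(sends, arrivals)
print(f"{ok} of {total} packets arrive before their playout deadline")
# Only the packet delayed 140 ms misses its deadline with a 100 ms buffer.
```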

There is significant disagreement among experts (including the experts on this committee) as to how effective quality-of-service mechanisms would be and which would be more efficient, investing in additional bandwidth or deploying QOS mechanisms. One school of thought, which sees a rising tide of quality, argues that increasing bandwidth in the Internet will provide adequate performance in many if not most circumstances. As higher capacity links are deployed, the argument goes, Internet delays will tend to approach the theoretical limit imposed by the propagation of light in optical fibers, and the average bandwidth available on any given connection will increase. As the overall quality increases, it will enable more and more applications to run safely over the Internet, without requiring specific treatment, in the same way that a rising tide, as it fills a harbor, can lift ever-larger boats. Voice transmission, for example, is enabled if the average bandwidth available over a given connection exceeds a few tens of kilobits per second and if the delays are less than one-tenth of a second, conditions that are in fact already true for large business users; interactive video is enabled if the average bandwidth exceeds a few hundred kilobits per second, a performance level that is already obtained on the networks dedicated to connecting universities and research facilities. If these conditions were obtainable on the public Internet (e.g., if the packet loss rate or jitter requirements for telephony were met 99 percent of the time), business incentives to deploy QOS for multimedia applications would disappear and QOS mechanisms might never be deployed.

Proponents of the rising tide view further observe that the causes of jitter within today's Internet are poorly understood. Investment in better understanding the reasons for this behavior might clarify what improvements could be made in the network, as well as what QOS mechanisms would best cope with network congestion and jitter if tweaking the network is not a sufficient response.

There are, however, at least some places within the network where there is no tide of rising bandwidth, and capacity is intrinsically scarce. One example is the more expensive and limited links between local area networks (or residences) and the public network. Even here, however, some will argue that it is better to invest in increased capacity of the gateway link than in mechanisms to allocate scarce bandwidth.

58This presumes, of course, that one should meet the full range of requirements in a single infrastructure with a single switching environment. This is not necessarily an optimal outcome; while the Internet has been able to support a growing set of service classes within a single network architecture, it is an open question what network models would best support the broad range of communications service profiles.
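In the spirit of the rising-tide argument, whether a given connection already serves an application without any QOS machinery reduces to comparing its measured characteristics against the application's needs. In the sketch below, the voice figures loosely reflect the thresholds quoted above (a few tens of kilobits per second and delay under roughly one-tenth of a second); the video and loss figures are illustrative assumptions rather than figures from this report.

```python
# Rough feasibility check: does a connection's measured bandwidth, delay, and
# loss already satisfy an application's needs? Thresholds are illustrative.

REQUIREMENTS = {
    # application: (min kbps, max one-way delay in seconds, max loss rate)
    "voice": (64, 0.10, 0.01),
    "interactive video": (400, 0.15, 0.01),
}


def runs_acceptably(app: str, kbps: float, delay_s: float, loss: float) -> bool:
    """True if the measured connection meets the assumed application thresholds."""
    min_kbps, max_delay, max_loss = REQUIREMENTS[app]
    return kbps >= min_kbps and delay_s <= max_delay and loss <= max_loss


# A hypothetical connection: 800 kbps available, 60 ms one-way delay, 0.5% loss.
for app in REQUIREMENTS:
    print(app, runs_acceptably(app, kbps=800, delay_s=0.06, loss=0.005))
```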


As noted above, wireless links are inherently limited in capacity and are therefore candidates for QOS. Prospects for the use of Internet QOS technologies in this context depend in part on whether QOS services are provided at the Internet protocol layer or through specialized mechanisms incorporated into the lower-level wireless link technology. Current plans for third-generation wireless services favor the latter approach, suggesting that this may not be a driver of Internet QOS.

Service quality, like security, is a weak-link phenomenon. Because the quality experienced over a path through the Internet will be at least as bad as the quality of the worst link in that path, quality of service may be most effective when deployed end to end, on all of the links between source and destination, including across the networks of multiple ISPs. It may be the case that localized deployment of QOS, such as on the links between a customer's local area network and its ISP, would be a useful alternative to end-to-end QOS, but the effectiveness of this approach and the circumstances under which it would prove useful are open questions.

The reality of today's Internet is that end-to-end enhancement of QOS is a dim prospect. QOS has not been placed into production for end-to-end service across commercial ISP networks. Providing end-to-end QOS requires ISPs to agree as a group on multiple technical and economic parameters, including on technical standards for signaling, on the semantics of how to classify traffic and what priorities the classes should be assigned, and on the addition of complex QOS considerations to their interconnection business contracts. Perhaps more significantly, the absence of common definitions complicates the process of negotiating QOS across all of the providers involved end to end. ISP interest in differentiating their service quality from that of their competitors is another potential disincentive to interprovider QOS deployment.

There are also several technical obstacles to deployment of end-to-end QOS across the Internet. One challenge is associated with the routing protocols used between network providers (e.g., Border Gateway Protocol, or BGP). While people have negotiated the use of particular methods for particular interconnects, there are no standardized ways of passing QOS information, which is needed for reliable voice (or other latency-sensitive traffic) transport between provider domains. Also, today's routing technology provides limited control over which peering points interprovider traffic passes through, owing to a lack of symmetric routing and the complexities involved in managing the global routing space. Exchanging latency-sensitive traffic (such as voice) will, at a minimum, require careful attention to interconnect traffic growth and routing configurations.
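The weak-link observation above can be made concrete with a small sketch of an end-to-end path: available bandwidth is set by the tightest link, delays add, and losses compound. The link names and figures below are invented for illustration.

```python
# Illustrative end-to-end path: quality is bounded by the worst link.
# Each hop: (name, available bandwidth kbps, one-way delay ms, loss rate)

PATH = [
    ("customer access link",   1_500, 10, 0.000),
    ("ISP backbone",         100_000, 15, 0.000),
    ("congested peering point",  800, 40, 0.020),   # the weak link
    ("remote ISP",            45_000, 12, 0.001),
]


def end_to_end(path):
    bandwidth = min(bw for _, bw, _, _ in path)      # bottleneck bandwidth
    delay = sum(d for _, _, d, _ in path)            # per-hop delays add up
    delivered = 1.0
    for _, _, _, loss in path:
        delivered *= (1.0 - loss)                    # per-hop losses compound
    return bandwidth, delay, 1.0 - delivered


bw, delay, loss = end_to_end(PATH)
print(f"bottleneck {bw} kbps, one-way delay {delay} ms, loss {loss:.1%}")
# Upgrading every hop except the congested peering point leaves the
# end-to-end figures essentially unchanged.
```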

While the original motivation for developing quality-of-service mechanisms was support of multimedia, another factor has been responsible for a sizable portion of recent interest in quality of service: ISPs that wish to value-stratify their users, that is, to offer those customers who place a higher value on better service a premium-priced service, need mechanisms to allow them to do so. In practice, this may be achieved by mechanisms to allocate relative customer dissatisfaction, degrading the service of some to increase that of others. (Anyone who has flown on a commercial airliner understands the basic principle: lower-fare-paying customers in coach have fewer physical comforts than their fellow travelers in first class, but they all make the same trip.) Value stratification may be of particular interest in situations where there is a scarcity of bandwidth and thus an interest in being able to charge customers more for increased use, but value stratification may also find use under circumstances where ISPs are able to provision sufficient capacity to meet the demands of their customers and customers perceive enough value in a premium service to pay more for it.

There is a central tension in the debate over QOS. If the providers, in order to make their customers happy, add enough capacity to carry the imposed load, why would one need more complex allocation schemes? Put another way, if there is no overall shortage of capacity, all that can be achieved by establishing allocation mechanisms is to allocate relative dissatisfaction. Would providers intentionally underprovision certain classes of users? As indicated above, the answer may be yes under certain marketing and business plans. Such differentiation of service packages and pricing is sustainable inasmuch as customers perceive differences and are willing to pay the prices charged.

One consequence of the development of mechanisms that enable disparate treatment of customer Internet traffic has been concern that they could be used to provide preferential support for both particular customers and certain content providers (e.g., those with business relationships with the ISP).59 What, for instance, would better service in delivery of content from preferred providers imply for access to content from providers without such status? What people actually experience will depend not only on the capabilities made possible by the technology and the design of marketing plans but also on what customers want from their access to the Internet and what capabilities ISPs opt to implement in their networks.

59See, for example, Center for Media Education. 2000. What the Market Will Bear: Cisco's Vision for Broadband Internet. Washington, D.C.: Center for Media Education. Available online at <http://www.cme.org/access/broadband/market_will_bear.html>.
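Value stratification ultimately shows up as a scheduling decision at congested points. The sketch below uses a strict-priority rule, with invented class names, purely as an illustration; deployed differentiated-service schedulers are more nuanced, but the effect is the same: when the link is busy, best-effort traffic absorbs the delay.

```python
# Illustrative strict-priority scheduler: packets marked "premium" are always
# sent before "best-effort" packets whenever both queues are occupied.

from collections import deque
from typing import Optional

queues = {"premium": deque(), "best-effort": deque()}


def enqueue(packet_id: str, service_class: str) -> None:
    queues[service_class].append(packet_id)


def dequeue() -> Optional[str]:
    """Drain the premium queue before touching best-effort traffic."""
    for service_class in ("premium", "best-effort"):
        if queues[service_class]:
            return queues[service_class].popleft()
    return None


enqueue("p1", "best-effort")
enqueue("p2", "premium")
enqueue("p3", "best-effort")
enqueue("p4", "premium")

print([dequeue() for _ in range(4)])   # ['p2', 'p4', 'p1', 'p3']
```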

The debate over quality of service has been a long-standing one within the Internet community. Over time, it has shifted from its original focus on mechanisms that would support multimedia applications over the Internet to mechanisms that would support a broader spectrum of potential uses. These uses range from efficiently enhancing the performance of particular classes of applications over constrained links to providing ISPs with mechanisms for value-stratifying their customers. The committee's present understanding of the technology and economics of the Internet does not support its reaching a consensus on whether QOS is, in fact, an important enabling technology. Nor can it be concluded at this time whether QOS will see significant deployment in the Internet, either over local links, within the networks of individual ISPs, or more widely, including across ISPs.

Research aimed at better understanding network performance, the limits to the performance that can be obtained using best-effort service, and the potential benefits that different QOS approaches could provide in particular circumstances is one avenue for obtaining a better indication of the prospects for QOS in the Internet. Another avenue is to accumulate more experience with the effectiveness of QOS in operational settings; here the challenge is that deployment may not occur without demonstrable benefits, while demonstrating those benefits would depend at least in part on testing the effectiveness of QOS under realistic conditions.
