Scaling Up the Internet and Making It More Reliable and Robust

BUILDING A BETTER INTERNET

The Internet has become a place where many live, work, and play. It is a critical resource for many businesses that depend on e-commerce. Indeed, when attacks are made on Internet infrastructure or on commonly used Web sites such as CNN and Yahoo!, they become front-page news.1 As a consequence, the Internet must become and remain more robust and reliable. Reflecting demand for its capabilities, the Internet is expected to grow substantially worldwide in terms of users, devices, and applications. A dramatic increase in the number of users and networked devices gives rise to questions of whether the Internet's present addressing scheme can accommodate the demand and whether the Internet community's proposed solution, IPv6, could, in fact, be deployed to remedy the situation. The 1990s saw widespread deployment of telephony and streaming audio and video. These new applications and protocols have had significant impacts on the infrastructure, both quantitatively, in terms of a growing level of traffic, and qualitatively, in terms of new types of traffic. The future is likely to see new applications that place new demands on the Internet's robustness and scalability. In short, to meet the potential demand for infrastructure, the Internet will have to support a dramatically increasing number of users and devices, meet a growing demand for network capacity (scale), and provide greater robustness at any scale.

1For example, Matt Richtel. 2000. "Several Web Sites Attacked Following Assault on Yahoo." New York Times, February 9, p. A1; and Matt Richtel. 2000. "Spread of Attacks on Web Sites Is Slowing Traffic on the Internet." New York Times, February 10, p. A1.

SCALING

"Scaling" refers to the process of adapting to various kinds of Internet growth, including the following:

- The increasing number of users and devices connected to the Internet,
- The increasing volume of communications per device and the total volume of communication across the Internet, and
- The continual emergence of new applications and ways in which users employ the Internet.

While details of the growth in Internet usage are subject to interpretation and change over time, reflecting the dynamic nature of Internet adoption, it is only the overall trends that concern us here. In the United States, a substantial fraction of homes have access to the Internet, and that number is likely to eventually approach the fraction of homes that have a personal computer (a fraction that itself is still growing). Over 100 million people report that they are Internet users in the United States.2 Overseas, while the current level of Internet penetration differs widely from country to country, many countries show rates of growth comparable to or exceeding the rapid growth seen in the United States,3 so it is reasonable to anticipate that similar growth curves will be seen in other, less-penetrated countries, shifted in time to reflect when the early adoption phase began. Perhaps a more important future driver for overall growth is the trend toward a growing number and variety of devices being attached to the Internet. Some of those devices will be embedded in other kinds of equipment or systems, and some will serve specific purposes for a given user. This trend could change the number of devices per user from the current number, slightly less than 1 in developed countries, to much more than this: 10 or even 100.

2Data from Computer Industry Almanac, available online at .
3For an analysis based on OECD data, see Gonzalo Diez-Picazo Figuera. 1999. An Analysis of International Internet Diffusion. Master's thesis, MIT, June, p. 83.

Scaling of Capacity

The basic design of the Internet, characterized by the elements discussed in Chapter 1, has proved remarkably scalable in the face of such growth. Perhaps the most obvious component of growth is the demand for greater speed in the communications lines that make up the Internet. As was noted in Chapter 1, the first major scaling hurdle was seen about a decade ago when, in response to growing demands, many of the 56-kbps lines in the NSFNET backbone were replaced with higher-capacity 1.5-Mbps lines (also known as T1 lines).4 Doing so required developing higher-performance Internet routers and some retuning of protocols and software. Since then, the Internet has passed many scaling hurdles and increased its capacity many times over. The fastest lines in the Internet were 2.5 Gbps (OC-48) in 1999, almost 50,000 times faster than the original lines, and the deployment of 10-Gbps lines (OC-192) is under way. All expectations are that more such growth will be seen in the coming decade. There is a persistent and reasonable fear that demand for capacity will outstrip the ability of the providers to expand owing to a lack of technology or capital. The 1990s were characterized by periodic scrambling by ISPs, equipment providers, and researchers to develop and deploy new technologies that would provide the needed capacity in advance of demand. The success of those efforts does not, however, guarantee continued success into the future. Furthermore, efforts to expand capacity may not be uniformly successful. Regional variations in the availability of rights of way, industry strategies, and regulation could slow deployments in particular areas.

Better use of existing bandwidth also plays a role in enhancing scalability. A recent trend has been to compensate for the lack of network capacity (or other functionality, such as mechanisms for assuring a particular quality of service) by deploying servers throughout the Internet. Cache servers keep local copies of frequently used content, and locally placed streaming servers compensate for the lack of guarantees against delay. In some cases, innovative routing is used to capture requests and direct them to the closest servers. Each of these approaches has side effects that can cause new problems, however. Their implications for robustness and transparency are discussed elsewhere in this report.

4An abbreviation for bits per second is bps; kbps means thousands of bits per second, Mbps means millions of bits per second, Gbps means billions of bits per second, and Tbps means trillions of bits per second. The use of a capital B in place of a lowercase b means the unit of measurement is bytes (8 bits) rather than bits.
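
The idea behind such cache servers can be illustrated with a short sketch. This is an illustration only (Python is used for convenience, and the class and its names are invented for this example): content fetched once is kept locally, so later requests for the same item are answered without crossing the wide-area network.

    from collections import OrderedDict

    class SimpleContentCache:
        """Keep local copies of frequently used content (LRU eviction)."""

        def __init__(self, capacity=1000):
            self.capacity = capacity
            self.store = OrderedDict()          # URL -> content, least recently used first

        def get(self, url, fetch_from_origin):
            if url in self.store:               # hit: served locally, no wide-area traffic
                self.store.move_to_end(url)
                return self.store[url]
            content = fetch_from_origin(url)    # miss: go back to the origin server
            self.store[url] = content
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict the least recently used item
            return content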

Scaling of Protocols and Algorithms

A more difficult aspect of growth is the design of new or improved protocols and algorithms for the Internet. The ever-present risk is that solutions will be deployed that work for the moment but fail as the number of users and applications continues to grow: today's significant improvement may be tomorrow's impediment to progress. Scaling must thus be considered in every design. This lesson is increasingly important as there are many pressures driving innovations that may not scale well or at all. The IETF processes through which lower-level network protocols are developed involve extensive community review. This means that the protocols undergo considerable scrutiny with regard to scaling before they are widely deployed. However, particularly at the applications layer, protocol proposals are sometimes introduced that, while adequate in such settings as a local area network, have been designed without sufficient understanding of their implications for the wider Internet. Market pressures can then lead to their deployment before scaling has been completely addressed. When a standard is developed through a forum such as the IETF, public discussion of it in working groups helps. However, a protocol can nonetheless reach the status of a "proposed standard," and thus begin to be widely deployed, with obvious scalability problems only partially fixed.

The Web itself is a good example of scaling challenges arising from particular application protocols. It is not widely appreciated that the "World Wide Wait" phenomenon is due in part to suboptimal design choices in the specialized protocol used by the Web (HTTP), not to the core Internet protocols. Early versions of HTTP relied on a large number of short TCP sessions, adding considerable overhead to the retrieval of a page containing many elements and preventing TCP's congestion control mechanisms from working.5 An update to the protocol, HTTP 1.1, adopted as an Internet standard by the IETF in 1999,6 finally fixed enough of the problem to reduce the pressure on the network infrastructure, but the protocol still lacks many of the right properties for use at massive scale.

5Though it took some time to launch an update, the shortcomings of HTTP 1.0 were recognized early on. See, for example, Simon E. Spero. 1994. Analysis of HTTP Performance Problems. Technical report. Cambridge, Mass.: World Wide Web Consortium, July. Available online at .
6R. Fielding et al. 1999. Hypertext Transfer Protocol - HTTP/1.1. RFC 2616. Network Working Group, Internet Engineering Task Force, June. Available online at .
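
The difference between the two styles of HTTP use can be sketched as follows. This is an illustration only, written against Python's standard http.client module with a placeholder host and object list: the older pattern opens a fresh TCP connection (and a fresh TCP slow start) for every object on a page, whereas the persistent connections of HTTP 1.1 let one connection carry all of them.

    import http.client

    objects = ["/", "/style.css", "/logo.png"]       # placeholder page elements

    # HTTP/1.0-style retrieval: one short TCP session per object
    for path in objects:
        conn = http.client.HTTPConnection("www.example.com")
        conn.request("GET", path, headers={"Connection": "close"})
        conn.getresponse().read()
        conn.close()

    # HTTP/1.1-style retrieval: a single persistent connection reused for all objects
    conn = http.client.HTTPConnection("www.example.com")
    for path in objects:
        conn.request("GET", path)
        conn.getresponse().read()    # the body must be drained before the connection is reused
    conn.close()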

The challenge posed by this lack of scalability has been significant, given HTTP's large share of Internet backbone traffic.7

The case of IP multicast demonstrates the interplay between protocol design, the Internet's routing system, and scaling considerations. Multicast is a significant example because it allows applications to simultaneously and inexpensively deliver a single data stream to multiple delivery points, which would alleviate Internet scaling challenges. Multicast can be used in numerous applications where the same data are to be sent to multiple users, such as audio and audiovisual conferencing, entertainment broadcasting, and various other forms of broad information dissemination (the delivery of stock quotes to a set of brokers is one example). All of these applications are capable of running over today's Internet, either in the backbone or within corporate networks, but many operate via a set of individual, simultaneous (unicast) transmissions, which means that they use much more bandwidth than they might otherwise.

Despite its promise of reducing bandwidth requirements for one-to-many communications, multicast itself presents scaling challenges. By definition, an Internet-wide multicast group needs to be visible throughout the Internet, or at least everywhere there is a group member. The techniques available today require that routers track participation in each active group, and in some cases for each group's active senders. Such participation tracking requires complex databases and supporting protocol exchanges. One might reasonably assume that the number of groups grows with the size of the Internet or with the growth of applications such as Internet radio broadcast, and that the footprint of each group (the fraction of the Internet over which the group information must be transmitted) will grow with the size of the Internet. However, the two factors multiply, meaning that under these assumptions, the challenges posed to providers will grow as the square of the Internet's size. Resolving this situation requires not merely defining an appropriate protocol but also researching a hard routing question: how to coalesce routing information of multiple groups into manageable aggregates without generating too much inefficiency.

7For example, Internet traffic statistics for the vBNS, a research backbone, show that about two-thirds of TCP flows were HTTP. See MCI vBNS Engineering. 2000. NSF Very High Speed Backbone Network Service: Management and Operations Monthly Report, January. Available online at .
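
From an application's point of view, joining a multicast group is simple; it is the membership tracking described above, carried out by the routers in between, that raises the scaling questions. The sketch below is illustrative only (the group address and port are arbitrary example values): the receiver declares its interest in a group, and the network then delivers a single transmitted copy of the stream to every member.

    import socket
    import struct

    GROUP, PORT = "239.1.2.3", 5004      # example administratively scoped group and port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # IP_ADD_MEMBERSHIP tells the local network (and any multicast routers)
    # to deliver traffic addressed to GROUP to this host
    membership = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)

    data, sender = sock.recvfrom(1500)   # one copy of the stream, however many receivers exist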

Scaling of the Internet's Naming Systems

Growth in the number of names and an increasing volume of name resolution requests, both of which reflect Internet growth, are placing scaling pressures on the Internet's name-to-address translation service, the Domain Name System (DNS).8 There is broad consensus as well as a strong technical argument that a common naming service is needed on the Internet.9 People throughout the world need to be able to name objects (systems, files, and facilities) correctly in their own languages and have them unambiguously accessible to authorized people under those names, which requires a common naming infrastructure. People also need naming services to allow them to identify applications and services provided by particular companies and organizations.

The DNS is instrumental in hiding the Internet's internal complexity from users and application developers. In the DNS, network objects such as the host computers that provide Web pages or e-mail boxes are designated by symbolic names that are independent of the location of the resource. The name provides an indirect reference to the network object, which allows the use of names instead of less mnemonic numbers and also allows the actual address to which the name points to be changed without disrupting access via the name. Because the computer associated with a particular named service can be changed without changing the IP addresses of that machine (only the address associated with the name in the DNS needs changing), indirection provides users with portability if they wish to switch Internet providers. While most users receive IP address allocations from their ISP and thus have to change address if they change ISP, DNS names are controlled by the user: a change of provider requires only that the address pointed to by the DNS entry be changed. The significance of DNS names was greatly increased as a result of the decision by the original developers of the World Wide Web to use them directly to identify information locations. The importance attached to DNS names is reflected in the contention surrounding the system's management (Box 2.1).

The DNS is organized as a hierarchy. At the very top of the hierarchy, the "root servers" record the addresses of the top-level domain servers, such as the .com or .uk servers (Figure 2.1).

[FIGURE 2.1 DNS hierarchy. Level 1 comprises the generic top-level domains (COM, ORG, EDU, NET, GOV, MIL) and the country top-level domains (e.g., US, UK); level 2 holds second-level domains such as STANFORD and MIT; level 3 holds subdomains such as CS; and so on.]

8The DNS was first introduced in P. Mockapetris. 1983. Domain Names - Concepts and Facilities. RFC 882. November. Available online at .
9Internet Architecture Board. 2000. IAB Technical Comment on the Unique DNS Root. RFC 2826. May. Available online at .
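
A minimal illustration of this indirection, using Python's standard socket interface and the reserved example name www.example.com: the application deals only in the stable name, and whatever numerical address the DNS currently returns for that name can change (for instance, when the site changes providers) without the application being affected.

    import socket

    # ask the DNS (via the local resolver) what address currently lies behind the name
    for family, socktype, proto, canonname, sockaddr in socket.getaddrinfo(
            "www.example.com", 80, proto=socket.IPPROTO_TCP):
        print(sockaddr[0])    # the numerical address; callers keep using the name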

The addresses of these root servers are known locally to every name server of the Internet, using information provided by ICANN (in practice, coded into the DNS software by the vendor). Each top-level domain server records the addresses of the domain name servers for the second-level domains, such as example.com. These secondary servers are responsible for providing information on name-to-address mappings for names in the example.com domain. The hierarchical design permits the secondary servers to point themselves to third-level servers, and so forth.

To access named objects, Internet sessions start with a transaction with a name server, known as name resolution, to find the IP address at which the resource is located, in which a domain name such as www.example.com is translated into a numerical address such as 128.9.176.32. Assuming that the local name server has not previously stored the requisite information locally (see the discussion of caching, below), three successive transactions are generally required in order to find the address of a target server such as www.example.com: (1) to learn the address of the .com server from the root server, (2) to learn the address of the example.com server from the .com server, and (3) to learn the address of the target Web server, www.example.com, from the example.com name server.

The situation in practice may, in fact, be more complicated. If example.com is a very popular service, it is useful to be able to distribute the load among multiple servers and/or to direct a user to the server that is closest to him. To do either of these, the name servers run by example.com may make use of a clever trick: requests for the address corresponding to www.example.com, for example, may produce replies pointing to one of a number of different servers that, presumably, contain copies of the same information.10

10Another non-DNS trick for load distribution makes use of so-called transparent proxies or interception proxies. These intercept and divert data packets going to a particular address to one of a number of servers that contain the same content. Because it interposes information processing outside the control of either the user's computer or the server he is connecting to, this technique runs counter to the end-to-end principle and can sometimes have the side effect of delivering inconsistent information to the user.
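
The three-step resolution just described can be sketched roughly as follows. This is a teaching sketch, not a real resolver: it assumes the third-party dnspython package, hard-codes the address of one root server, ignores IPv6 and CNAME records, and falls back to the platform resolver when a referral arrives without accompanying ("glue") addresses.

    import socket
    import dns.message
    import dns.query
    import dns.rdatatype

    def iterative_resolve(name, server="198.41.0.4"):     # 198.41.0.4 is a.root-servers.net
        """Follow referrals down the DNS tree: root server -> .com server -> example.com server."""
        while True:
            query = dns.message.make_query(name, dns.rdatatype.A)
            reply = dns.query.udp(query, server, timeout=3)
            for rrset in reply.answer:                     # an A record in the answer: done
                if rrset.rdtype == dns.rdatatype.A:
                    return rrset[0].address
            # otherwise the reply is a referral one level further down the tree; prefer a
            # glue address from the additional section, else look up the next server's
            # name with the platform resolver (a simplification of what a real resolver does)
            glue = [rr.address for rrset in reply.additional
                    if rrset.rdtype == dns.rdatatype.A for rr in rrset]
            if glue:
                server = glue[0]
            else:
                next_server = str(reply.authority[0][0].target).rstrip(".")
                server = socket.gethostbyname(next_server)

    print(iterative_resolve("www.example.com."))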

The rules governing DNS names would seem to permit millions of naming domains, each containing billions of names,11 which would seem adequate to support scaling demands.

11Each DNS name can be composed of up to 256 characters and up to 64 naming elements, each of which can be made of up to 64 characters (letters, digits, and hyphen).

However, with the number of top-level domains currently limited to one national domain per country (e.g., .fr for France), plus a limited number of global domains (e.g., .com and .org), many domains are organized with a very large number of names contained at the next level rather than by distributing names further down in the hierarchy (e.g., using product.example.com instead of product.com).

This can cause scaling problems, and there are concerns that the performance of the DNS will worsen over time. The multistage process required to find the address of a target, repeated for many Web page accesses by millions of Internet users, can result in a heavy load on the servers one level down from the top of the tree. If the name servers were to be overwhelmed on a persistent basis, all Internet transactions that make use of domain names (i.e., virtually all Internet transactions) would be slowed down, and the whole Internet would suffer. Today's DNS design relies on two mechanisms to cope with this load: caching and replication. These mechanisms have been effective in alleviating scaling pressures, but there are signs that they may not be sufficient to cope with the continuing rapid growth of the network.

DNS caching is a technique whereby the responses to common queries are stored on local DNS servers. Applications such as Web browsers also may perform DNS caching. Using caching, a local DNS server need only request the addresses of the .com servers from the root servers infrequently rather than repeatedly. Similarly, once a request has been made for the address of the example.com server, the local name server need not ask for this information again for a period of time known as the "time to live." Because of the dynamic nature of DNS information, name servers return not only an address but a time-to-live parameter selected by the administrator of the name server for the relevant domain, usually on the order of days or hours, which indicates how long the name-to-address mapping can be considered valid, helping ensure that servers do not retain outdated information.12

Caching works well when the same request is repeated many times. This is the case for high-level queries, such as requesting the address of the .com name servers, and also for the most popular Web servers, the search engines, and the very large sites. (It works even better for very frequently accessed services like a file server on a local area network.) However, the efficiency of caching decreases as the number of names that are kept active by a given user or domain name server increases.

12Why do applications also need to cache DNS names? Good DNS performance depends on having local access to DNS information. Because the target platform was a tiny, diskless machine, the earliest implementations of TCP/IP software for the IBM PC lacked DNS cache functionality and depended on local LAN access to a DNS server for all name resolution requests. This resolver-only design has persisted in a number of machines today. Not only does this force the application designer to implement DNS caching, but there are performance costs as well. Since an application cannot determine whether the host it is running on supports a caching server, application-layer caching makes it possible for caching to be carried out twice, potentially yielding inconsistent results.
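
The caching behavior described above can be sketched as follows. This is an illustration only; the class and its names are invented for this example. An answer is reused until the time to live chosen by the relevant domain's administrator expires, after which the DNS must be consulted again.

    import time

    class DnsTtlCache:
        """Reuse name-to-address answers until their time to live expires."""

        def __init__(self):
            self.entries = {}                    # name -> (address, expiry time)

        def resolve(self, name, lookup):
            record = self.entries.get(name)
            if record and record[1] > time.monotonic():
                return record[0]                 # still valid: answered locally
            address, ttl = lookup(name)          # expired or absent: ask the DNS
            self.entries[name] = (address, time.monotonic() + ttl)
            return address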

When millions of names are registered and accessed in the DNS, only a small fraction can be present in any given cache. Requests for the names of less frequently requested sites, which in total will represent a significant fraction of all requests, will have to be forwarded to the DNS. Even if user queries are concentrated mostly on large sites, or queries from the same local group of hosts are concentrated on the same group of sites, which may or may not be the case, the remaining fraction still constitutes an important and growing burden for the DNS.

The effect of cache misses is made even worse by the concentration of names in a small number of popular top-level domains, such as .com, .net, and .org. Consequently, an inordinate fraction of the load is sent to these domains' servers (a load that could be alleviated if the hierarchical design of the DNS were used to limit the number of highest-level names). These servers need to scale in two ways. They must support an ever-growing name population, which means that the size of their database keeps increasing very quickly, and they must serve ever more frequent queries. The growth of the database implies increased memory requirements and an increased management load.

Replication, whereby name databases are distributed to multiple name servers, is a way of sharing the load and increasing reliability. With replication, the root server is able, for example, to provide the addresses of several .com servers instead of one. The volume of name resolution inquiries could be met by splitting the load across a sufficiently large number of replicated servers. Unfortunately, current DNS technology limits this approach because the list of the names and addresses of all the servers for a given domain must fit into a single 512-byte packet. (Even after efforts were made to shorten host names, the number of root servers remains limited to 13.) Once the maximum number of servers that will fit within the single-packet constraint has been deployed, increased load in that domain can only be dealt with by increasing the capacity and processing power of each of the individual .com name servers. While the performance of the most widely used DNS software, BIND, lags that of modern high-performance database systems, and root servers' software can almost certainly be improved to handle much higher loads, Internet growth rates suggest that the demand on the root servers is likely to be growing faster than their processing speed is increasing and that in a few years the root servers could nonetheless be heavily overloaded.

One proposal for addressing issues ranging from scaling to DNS name-trademark conflicts is to move toward a solution that makes use of directories as an intermediate layer between applications and the DNS. A directory might help resolve conflicts between DNS names and registered trademarks because a particular keyword could be associated with multiple

Changes in the telecommunications industry led the FCC, in 1998, to ask the Network Reliability and Interoperability Council (NRIC) IV to explore reliability concerns in the wider set of networks (e.g., telephone, cable, satellite, and data, including the Internet) that the PSTN is part of. The report of the NRIC IV subcommittee looking at needs for data on service outages54 called for a trial period of outage reporting. NRIC V, chartered in 2000, has initiated a 1-year voluntary trial starting in September 2000 and will monitor the process, analyze the data obtained from the trial, and report on how well the process works.55

ISPs are not, at present, mandated to release such information. Indeed, the release of this type of information is frequently subject to the terms of private agreements between providers. This situation is not surprising, given the absence of regulation of the Internet and the high degree of regulation of the telephone industry. As the Internet becomes an increasingly important component of our society, there will probably be calls to require reporting on overall reliability and specific disruptions. It is not now clear what metrics should be used and what events should be reported, what the balance between costs and benefits would be for different types of reporting, or what the least burdensome approach to this matter would be. One response to rising expectations would be for Internet providers to work among themselves to define an industry approach to reporting. Doing so could have two benefits: it might provide information useful to the industry, and it might avoid government imposition of an even-less-welcome plan.

As noted above, one important reason for gathering information on disruptions is to provide researchers with the means to discover the root causes of such problems. For this to be effective, outage data must be available to researchers outside the ISPs; ISPs do not generally have research laboratories and are not necessarily well placed to carry out much of the needed analysis of the data, much less design new protocols or build new technologies to improve robustness. Also, data should not be anonymized before they are provided to researchers; the anonymity hides information (e.g., on the particular network topology or equipment used) from the researcher.

54See Network Reliability and Interoperability Council (NRIC). 2000. Network Reliability Interoperability Council IV, Focus Group 3, Subcommittee 2, Data Analysis and Future Considerations Team. Washington, D.C.: NRIC, Federal Communications Commission federal advisory committee, p. 4. Available online at .
55See Network Reliability and Interoperability Council (NRIC). 2000. Revised Network Reliability and Interoperability Council - V Charter. Washington, D.C.: NRIC, Office of Engineering and Technology, Federal Communications Commission. Available online at .

However, in light of proprietary concerns attached to the release of detailed information, researchers must agree not to disclose proprietary information (and must live up to those agreements). Disclosure control in published reports is not simply a matter of anonymizing the results; particular details may be sufficient to permit the reader of a research report, including an ISP's competitors, to identify the ISP in question. Attention must, therefore, also be paid to protecting against inadvertent disclosure of proprietary information.

Looking to the future, the committee can see other reasons why ISPs would benefit from sorting out what types of reliability metrics should be reported. For example, it is not hard to imagine that at some point there would be calls from high-end users for a more reliable service that spans the networks of multiple ISPs and that some of the ISPs would decide to work together to define an "industrial-strength" Internet service to meet this customer demand. When they interconnect their networks, how would they define the service that they offer? Since the performance experienced by an ISP's customer depends on the performance of all the networks between the customer and the application or service the customer is using, each ISP would have an interest in ensuring that the other ISPs live up to reliability standards. Absent a good source of data on failures (and a standardized framework for collecting and reporting on failures), how would the ISPs keep tabs on each other? In the process of defining a higher-grade service, ISPs will want to understand what sort of failure would degrade the service, and it is this sort of failure that they ought to be reporting on. From this perspective, outage reporting shifts from being a mandated burden to an enabler of new business opportunities.

It is unlikely that simple, unidimensional measures that summarize ISP performance would prove adequate. Creating standard reporting or rating models for the robustness and quality of ISPs would tend to limit the range of services offered in the marketplace. What form might such user choices take? Consider, as an example, that an ISP that experiences the failure of a piece of equipment might face a tough trade-off. It could continue to operate its network at reduced performance in this condition or undergo a short outage to fix the problem: a choice between an extended period of uptime at much reduced performance and a short outage that restores performance to normal. Some of the ISP's customers (e.g., those who depend on having a connection rather than on the particular quality of that connection) will prefer the first option, while others will prefer the second. Indeed, some may be willing to pay extra to get a service that aims to provide a particular style of degraded service. (Such a "guaranteed style of degradation" is an interesting variation on QOS and does not impose much overhead.) These considerations suggest that, more generally, there is a need for many different rating scales or, put another way, a need for measuring several different things that might be perceived as "quality" or reliability.

Combining them into a single metric does not serve the interests of different groups (user or vendor or both) that are likely to prefer different weighting factors or functions for combining the various measures.

QUALITY OF SERVICE

The Internet's best-effort quality of service (QOS) makes no guarantees about when, or whether, data will be delivered by the network. Together with the use of end-to-end mechanisms such as the Transmission Control Protocol (TCP), which provides capabilities for reassembling information in proper order, retransmitting lost packets, and ensuring complete delivery, best effort has been successful in supporting a wide range of applications running over the Internet. However, unlike Web browsing, e-mail transmission, and the like, some applications such as voice and video are very time-sensitive and degrade when the network is congested or when transmission delays (latency) or variations in those delays (jitter) are excessive. Some performance issues, of course, are due to overloaded servers and the like, but others are due to congestion within the Internet. Interest in adding new QOS mechanisms to the Internet that would tailor network performance for different classes of application, as well as interest in deploying mechanisms that would allow ISPs to serve different groups of customers in different ways for different prices, have led to the continued development of a range of quality-of-service technologies. While QOS is seeing limited use in particular circumstances, it is not widely employed.

The technical community has been grappling with the merits and particulars of QOS for some time; QOS deployment has also been the subject of interest and speculation by outside observers. For example, some ask whether failure to deploy QOS mechanisms represents a missed opportunity to establish network capabilities that would foster new applications and business models. Others ask whether introducing QOS capabilities into the Internet would threaten to undermine the egalitarian quality of the Internet, whereby all content and communications across the network receive the same treatment, regardless of source or destination, that has been the consequence of best-effort service.

Beyond the baseline delay due to the speed of light and other irreducible factors, delays in the Internet are caused by queues, which are an intrinsic part of congestion control and sharing of capacity. Congestion occurs in the Internet whenever the combined traffic that needs to be forwarded onto a particular outgoing link exceeds the capacity of that link, a condition that may be either transient or sustained.

When congestion occurs in the Internet, a packet may be delayed, sitting in a router's queue while waiting its turn to be sent on, and will arrive later than a packet not subjected to queuing, resulting in latency. Jitter results from variations in the queue length. If the queue fills up, packets will be dropped. In today's Internet, which uses TCP for much of its data transport, systems sending data are supposed to slow down when congestion occurs (e.g., the transfer of a Web page will take longer under congested conditions). When the Internet appears to be less congested, transfers speed up and applications complete their transactions more quickly. Because the adaptation mechanisms are based on reactions to packet loss, the congestion level of a given link translates into a sufficiently large packet loss rate to signal the presence of congestion to the applications that share the link. Congestion in many cases only lasts for the transient period during which applications adapt to the available capacity, and it reaches drastic levels only when the capacity available to each application is less than the minimum provided by the adaptation mechanism.

Congestion is generally understood to be rare within the backbone networks of major North American providers, although it was feared otherwise in the mid-1990s, when the Internet was commercialized. Instead, it is more likely to occur at particular network bottlenecks. For example, links between providers are generally more congested than those within a provider's network, some very much so. Persistent congestion is also observed on several international links, where long and variable queuing delays, as well as very high packet loss rates, have been measured.56 Congestion is also frequent on the links between customers' local area networks (or residences) and their ISPs; sometimes it is feasible to increase the capacity of this connection, while in other cases a higher-capacity link may be hard to obtain or too costly. Where wireless links are used, the services available today are limited in capacity, and wireless bandwidths are fundamentally limited by the scarcity of radio spectrum assigned to these services and are also vulnerable to a number of impairments inherent in over-the-air communication.

At least some congestion problems can be eliminated by increasing the capacity of the network by adding bandwidth, especially at known bottlenecks.

56See V. Paxson. 1999. "End-to-End Internet Packet Dynamics," IEEE/ACM Transactions on Networking 7(3):277-292, June. Logs of trans-Atlantic traffic available online at show traffic levels that are flat for most of the day at around 300 Mbps on a 310-Mbps link (twin OC-3) terminating in New York.
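
As an illustration of how latency variation can be quantified, the sketch below applies the smoothed interarrival-jitter estimator used by RTP (RFC 3550) to lists of send and receive timestamps; the timestamps at the bottom are made-up values. Queuing delay in routers shows up as growth in this estimate even when the average latency looks acceptable.

    def interarrival_jitter(send_times, recv_times):
        """Smoothed estimate of packet delay variation (RFC 3550 style)."""
        jitter = 0.0
        previous_transit = None
        for sent, received in zip(send_times, recv_times):
            transit = received - sent                # one-way transit; a constant clock offset cancels below
            if previous_transit is not None:
                d = abs(transit - previous_transit)  # change in transit time between consecutive packets
                jitter += (d - jitter) / 16.0        # exponentially smoothed, as in RFC 3550
            previous_transit = transit
        return jitter

    # three packets sent 20 ms apart; the second and third are queued for differing times
    print(interarrival_jitter([0.000, 0.020, 0.040], [0.100, 0.135, 0.145]))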

Adding bandwidth does not, however, guarantee that congestion will be eliminated. First, the TCP rate-adaptation mechanisms described above may mask pent-up demand for transmission, which will manifest itself as soon as new capacity is added. Second, on a slightly longer timescale, both content providers and users will adjust their usage habits if things go faster, adding more images to Web pages or being more casual about following links to see what is there, and so on. Third, on a longer timescale (on the order of months but not years), new applications can emerge when there is enough bandwidth to enough of the users to make them popular. This has occurred with streaming audio and is likely to occur with streaming video in the near future.

Also, certain applications, notably real-time voice and video, require controlled delays and predictable transfer rates to operate acceptably. (Streaming audio and video are much less sensitive to brief periods of congestion because they make use of local buffers.) Broadly speaking, applications may be restricted in their usefulness unless bandwidth is available in sufficient quantity that congestion is experienced very rarely or new mechanisms are added to ensure acceptable performance levels. A straightforward way to reduce jitter is to have short queue lengths, but this comes at the risk of high loss rates when buffers overflow. QOS mechanisms can counteract this by managing the load placed on the queue so that buffers do not overflow. However, the situation in the Internet, with many types of traffic competing in multiple queues, is complex. Better characterization of network behavior under load may provide insights into how networks might be engineered to improve performance.

Concerns in the past about being able to support multimedia applications over the Internet led to the development of a variety of explicit mechanisms for providing different qualities of service to different applications (e.g., best effort for Web access and specified real-time service quality for audio and video).57 Today, two major classes of QOS support different kinds of delay and delivery guarantees (see Box 2.4). They are based on the assumption that applications do not all have the same requirements for network performance (e.g., latency, jitter, or priority) and that the network should provide classes of service that reflect these differences.58

57In essence, these proposed QOS technologies resemble those that have proven effective in ATM and Frame Relay networks, with the exception that they are applied to individual application sessions or to aggregates of traffic connecting sets of systems running sets of applications rather than to individual circuits connecting pairs of systems. The mathematical difference between IP QOS and ATM QOS is that ATM sends variable-length bursts of cells, while IP sends variable-length messages. The biggest operational difference is that ATM QOS is generally used in ATM networks carrying real-time traffic, while QOS is generally not configured in IP networks today.
58This presumes, of course, that one should meet the full range of requirements in a single infrastructure with a single switching environment. This is not necessarily an optimal outcome; while the Internet has been able to support a growing set of service classes within a single network architecture, it is an open question what network models would best support the broad range of communications service profiles.
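
One simple building block that appears in many such mechanisms is a token bucket, which admits traffic at a configured average rate with a bounded burst so that the downstream queue, and hence delay and loss, stays bounded. The sketch below is illustrative only; the rate and burst values, and what to do with nonconforming packets (delay, remark, or drop them), are configuration choices rather than anything prescribed here.

    import time

    class TokenBucket:
        """Admit traffic at a configured average rate with a bounded burst."""

        def __init__(self, rate_bytes_per_second, burst_bytes):
            self.rate = rate_bytes_per_second
            self.capacity = burst_bytes
            self.tokens = burst_bytes
            self.last_refill = time.monotonic()

        def allow(self, packet_bytes):
            now = time.monotonic()
            # accumulate tokens for the time that has passed, up to the burst size
            self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
            self.last_refill = now
            if packet_bytes <= self.tokens:
                self.tokens -= packet_bytes      # conforming: admit the packet
                return True
            return False                         # nonconforming: delay, remark, or drop it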

There is significant disagreement among experts (including the experts on this committee) as to how effective quality-of-service mechanisms would be and which would be more efficient: investing in additional bandwidth or deploying QOS mechanisms. One school of thought, which sees a rising tide of quality, argues that increasing bandwidth in the Internet will provide adequate performance in many if not most circumstances. As higher-capacity links are deployed, the argument goes, Internet delays will tend to approach the theoretical limit imposed by the propagation of light in optical fibers, and the average bandwidth available on any given connection will increase. As the overall quality increases, it will enable more and more applications to run safely over the Internet, without requiring specific treatment, in the same way that a rising tide, as it fills a harbor, can lift ever-larger boats. Voice transmission, for example, is enabled if the average bandwidth available over a given connection exceeds a few tens of kilobits per second and if the delays are less than one-tenth of a second, conditions that are in fact already true for large business users; interactive video is enabled if the average bandwidth exceeds a few hundred kilobits per second, a performance level that is already obtained on the networks dedicated to connecting universities and research facilities. If these conditions were obtainable on the public Internet (e.g., if the packet loss rate or jitter requirements for telephony were met 99 percent of the time), business incentives to deploy QOS for multimedia applications would disappear and QOS mechanisms might never be deployed.

Proponents of the rising tide view further observe that the causes of jitter within today's Internet are poorly understood, and that investment in better understanding the reasons for this behavior might lead to an understanding of what improvements might be made in the network as well as what QOS mechanisms would best cope with network congestion and jitter if tweaking the network is not a sufficient response.

There are, however, at least some places within the network where there is no tide of rising bandwidth, and capacity is intrinsically scarce. One example is the more expensive and limited links between local area networks (or residences) and the public network.

Even here, however, some will argue that it is better to invest in increased capacity of the gateway link than in mechanisms to allocate scarce bandwidth. As noted above, wireless links are inherently limited in capacity and are therefore candidates for QOS. Prospects for the use of Internet QOS technologies in this context depend in part on whether QOS services are provided at the Internet protocol layer or through specialized mechanisms incorporated into the lower-level wireless link technology. Current plans for third-generation wireless services favor the latter approach, suggesting that this may not be a driver of Internet QOS.

Service quality, like security, is a weak-link phenomenon. Because the quality experienced over a path through the Internet will be at least as bad as the quality of the worst link in that path, quality of service may be most effective when deployed end to end, on all of the links between source and destination, including across the networks of multiple ISPs. It may be the case that localized deployment of QOS, such as on the links between a customer's local area network and its ISP, would be a useful alternative to end-to-end QOS, but the effectiveness of this approach and the circumstances under which it would prove useful are open questions.

The reality of today's Internet is that end-to-end enhancement of QOS is a dim prospect. QOS has not been placed into production for end-to-end service across commercial ISP networks. Providing end-to-end QOS requires ISPs to agree as a group on multiple technical and economic parameters, including on technical standards for signaling, on the semantics of how to classify traffic and what priorities they should be assigned, and on the addition of complex QOS considerations to their interconnection business contracts. Perhaps more significantly, the absence of common definitions complicates the process of negotiating QOS across all of the providers involved end to end. ISP interest in differentiating their service quality from that of their competitors is another potential disincentive to interprovider QOS deployment.

There are also several technical obstacles to deployment of end-to-end QOS across the Internet. One challenge is associated with the routing protocols used between network providers (e.g., Border Gateway Protocol, or BGP). While people have negotiated the use of particular methods for particular interconnects, there are no standardized ways of passing QOS information, which is needed for reliable voice (or other latency-sensitive traffic) transport between provider domains. Also, today's routing technology provides limited control over which peering points interprovider traffic passes through, owing to a lack of symmetric routing and the complexities involved in managing the global routing space. Exchanging latency-sensitive traffic (such as voice) will, at a minimum, require careful attention to interconnect traffic growth and routing configurations.
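
At the level of an individual packet, one widely implemented way of expressing such traffic classifications is the Differentiated Services code point carried in the IP header. The sketch below is illustrative only: it assumes a platform that exposes the IP_TOS socket option, uses a documentation address as the destination, and marks packets for "expedited forwarding." Whether any router along the path, let alone another provider's network, honors the marking is precisely the interconnection question discussed above.

    import socket

    EF = 46 << 2   # DSCP 46 ("expedited forwarding") shifted into the IP TOS/DS byte

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF)   # request low-delay treatment for this socket
    sock.sendto(b"voice sample", ("192.0.2.10", 5004))      # placeholder destination and payload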

While the original motivation for developing quality-of-service mechanisms was support of multimedia, another factor has been responsible for a sizable portion of recent interest in quality of service: ISPs that wish to value-stratify their users, that is, to offer those customers who place a higher value on better service a premium-priced service, need mechanisms to allow them to do so. In practice, this may be achieved by mechanisms to allocate relative customer dissatisfaction, degrading the service of some to increase that of others. (Anyone who has flown on a commercial airliner understands the basic principle: lower-fare-paying customers in coach have fewer physical comforts than their fellow travelers in first class, but they all make the same trip.) Value stratification may be of particular interest in situations where there is a scarcity of bandwidth and thus an interest in being able to charge customers more for increased use, but value stratification may also find use under circumstances where ISPs are able to provision sufficient capacity to meet the demands of their customers and customers perceive enough value in a premium service to pay more for it.

There is a central tension in the debate over QOS. If the providers, in order to make their customers happy, add enough capacity to carry the imposed load, why would one need more complex allocation schemes? Put another way, if there is no overall shortage of capacity, all that can be achieved by establishing allocation mechanisms is to allocate relative dissatisfaction. Would providers intentionally underprovision certain classes of users? As indicated above, the answer may be yes under certain marketing and business plans. Such differentiation of service packages and pricing is sustainable inasmuch as customers perceive differences and are willing to pay the prices charged.

One consequence of the development of mechanisms that enable disparate treatment of customer Internet traffic has been concern that they could be used to provide preferential support for both particular customers and certain content providers (e.g., those with business relationships with the ISP).59 What, for instance, would better service in delivery of content from preferred providers imply for access to content from providers without such status? What people actually experience will depend not only on the capabilities possible from the technology and the design of marketing plans but also on what customers want from their access to the Internet and what capabilities ISPs opt to implement in their networks.

59See, for example, Center for Media Education. 2000. What the Market Will Bear: Cisco's Vision for Broadband Internet. Washington, D.C.: Center for Media Education. Available online at .

The debate over quality of service has been a long-standing one within the Internet community. Over time, it has shifted from its original focus on mechanisms that would support multimedia applications over the Internet to mechanisms that would support a broader spectrum of potential uses. These uses range from efficiently enhancing the performance of particular classes of applications over constrained links to providing ISPs with mechanisms for value-stratifying their customers. The committee's present understanding of the technology and economics of the Internet does not support its reaching a consensus on whether QOS is, in fact, an important enabling technology. Nor can it be concluded at this time whether QOS will see significant deployment in the Internet, either over local links, within the networks of individual ISPs, or more widely, including across ISPs.

Research aimed at better understanding network performance, the limits to the performance that can be obtained using best-effort service, and the potential benefits that different QOS approaches could provide in particular circumstances is one avenue for obtaining a better indication of the prospects for QOS in the Internet. Another avenue is to accumulate more experience with the effectiveness of QOS in operational settings; here the challenge is that deployment may not occur without demonstrable benefits, while demonstrating those benefits would depend at least in part on testing the effectiveness of QOS under realistic conditions.