2 Technology

Suppose that a student is assigned to do a report for school on animals that build things, and he selects beavers as his primary topic. Connecting to the Internet through a computer at home, he goes to an online search engine and searches for information about "adult beavers." The search engine returns links to a large number of Web pages. When he clicks on one of these links, he is surprised to find a sexually oriented Web site intended for adult use. This scenario, or one similar to it, is among the most common sources of parental concern about children using the Internet. This chapter addresses the technological dimensions of this "reference scenario" and some of the things that can be done to protect against it.

2.1 AN ORIENTATION TO CYBERSPACE AND THE INTERNET

2.1.1 Characteristics of Digital Information

In the reference scenario, the student is seeking information (content) on beavers, a kind of animal. All information on the Internet is represented in bits: electronic strings of 1's and 0's that are later interpreted according to some algorithm to produce a representation that is meaningful to human beings. Digital information has properties very different from those of the information that a student might retrieve from a book. For purposes of this report, the salient aspects of this digital representation of information are the following:1

1 More discussion can be found in Computer Science and Telecommunications Board, National Research Council, 2000, The Digital Dilemma: Intellectual Property in the Information Age, National Academy Press, Washington, D.C.
32 YOUTH, PORNOGRAPHY, AND THE INTERNET

· Reproducible. Unlike a physical book, photograph, or analog audio recording, a digital information object can be copied an unlimited number of times, often without losing any fidelity or quality.
· Easily shared. Because digital information is easily copied, it is also easy to distribute at low cost. Digital information can be shared more easily than any type of analog information in the past. In the physical world, broadcasting information to groups has serious costs and hence requires a certain wherewithal and commitment. Technologies such as e-mail and Web sites allow broadcasting to many people at the touch of a single button.
· Flexible. Many different types of information can be represented digitally: images, movies, text, sound. Digital information can even be used to control movement in the physical world through digitally controlled actuators.
· Easily modified. Digital representations of information can be easily manipulated. It is trivial to modify a digital object: changing hair color in an image from blond to red, adding a few notes to a musical score, or deleting and adding text in a document. So, for example, a naked body can be affixed to the head of a child, words can be modified from their original intent, music can be "borrowed" freely, and even virtual "people" can be created, all without leaving a visible trace of these manipulations.
· Difficult to intercept. Because no physical object is necessarily associated with a digital information object, interdiction of digital information is much more difficult than interdiction of a physical object carrying information. In other words, there is no book, magazine, or photo that can be intercepted by physical means.

2.1.2 The Nature of the Internet Medium and a Comparison to Other Media Types

In the reference scenario, the student relies on the Internet.
The preceding discussion about digital information is important, but the nature of the Internet itself also makes it quite unlike other, more traditional media such as television, film, print, and the telephone. Thus, it is useful to describe certain key features of the Internet medium and to compare it to some of these more traditional media.
· The Internet supports many-to-many connectivity. A single user can receive information and content from a large number of different sources, and can also transmit his or her content to a large number of recipients (one-to-many). Or a single user can engage with others in a one-to-one mode (one-to-one). Or multiple users can engage with many others (many-to-many). Broadcast media such as television and radio, as well as print, are one-to-many media: one broadcast station or publisher sends to many recipients. Telephony is inherently one-to-one, although party lines and conference calling change this characterization of telephones to some extent.
· The Internet supports a high degree of interactivity (Box 2.1). Thus, when the user is searching for content (and the search strategy is a good one), the content that he or she receives can be more explicitly customized to his or her own needs.2 In this regard, the Internet is similar to a library in which the user can make an information request that results in the production of books and other media relevant to that request. By contrast, user choices with respect to television and film are largely limited to the binary choice of "accept or do not accept a channel," and all a user has to do to receive content is to turn on the television. The telephone is an inherently interactive medium, but one without the many-to-many connectivity of the Internet.
· The Internet is highly decentralized. Indeed, the basic design philosophy underlying the Internet has been to push management decisions to as decentralized a level as possible. Thus, if one imagines the Internet as a number of communicating users with infrastructure in the middle facilitating that communication, management authority rests mostly (but not exclusively) with the users rather than with the infrastructure, which is simply a bunch of pipes that carry whatever traffic the users wish to send and receive. (How long this decentralization will last is an open question.3) By contrast, television and the telephone operate under highly centralized authorities and facilities. Furthermore, the international nature of the Internet makes it difficult for any one governing board to gain the consensus necessary to impose policy, although a variety of transnational organizations are seeking to address issues of Internet governance globally.
· The Internet is intrinsically a highly anonymous medium.
That is, nothing about the way in which messages and information are passed through the Internet requires identification of the party doing the sending.4 One important consequence of the Internet's anonymity is that it is quite difficult to differentiate between adult and minor users of the Internet, a point whose significance is addressed in greater detail in Chapter 4. A second consequence is that technological approaches that seek to differentiate between adults and minors (discussed in Chapter 13) generally entail some loss of privacy for adults who are legitimate customers of certain sexually explicit materials to which minors do not have legitimate access.
· The capital costs of becoming an Internet publisher are relatively low: anyone can establish a global Web presence for a few hundred dollars (as long as the content conforms to the terms of service of the Web host). Further, for the cost of a subscription to an Internet service provider (ISP), one can interact with others through instant messages and e-mail without having to establish a Web presence at all. The cost of reaching a large, geographically dispersed audience may be about the same as that of reaching a small, geographically limited one, and in any event does not rise proportionately with the size of the audience.
· Because nearly anyone can put information onto the Internet, the appropriateness, utility, and even veracity of information on the Internet are generally uncertified and hence unverified. With important exceptions (generally associated with institutions that have reputations to maintain), the Internet is a "buyer beware" information marketplace, and the unwary user can be misinformed, tricked, seduced, or led astray when he or she encounters information publishers that are not reputable.
· The Internet is a highly convenient medium, and is becoming more so. Given the vast information resources that it offers, coupled with search capabilities for finding many things quickly, it is no wonder that for many people the Internet is the information resource of first resort.

2 Customization happens explicitly when a user undertakes a search for particular kinds of information, but it can also happen less overtly, because customized content can be delivered to a user based, for example, on his or her previous requests for information.
3 Marjory S. Blumenthal and David D. Clark, 2001, "Rethinking the Design of the Internet: The End to End Arguments vs. the Brave New World," in Communications Policy in Transition: The Internet and Beyond, B. Compaine and S. Greenstein, eds., MIT Press, Cambridge, Mass.
4 It is true that access to the Internet may require an individual to log into a computer or even into an Internet service provider. But for the most part, the identity of the user, once captured for purposes of accessing the Internet, is not part of the information that is automatically passed on to an applications provider (e.g., a Web site owner). More importantly, many applications providers, for entirely understandable business reasons, choose not to require authentication. Strong authentication in general requires an infrastructure that is capable of providing a trusted verification of identity, and in the absence of such an infrastructure, strong authentication is an expensive and inconvenient proposition for the user. This point is discussed at greater length in Section 2.3.2.
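Several of the features above, such as the low cost of publishing and of reaching large audiences, rest on the properties of digital information listed in Section 2.1.1. The following is a minimal Python sketch of two of those properties (reproducibility and ease of modification), under the simplifying assumption that any digital content can be treated as a plain sequence of bytes. The sample sentence and the use of a SHA-256 fingerprint to compare objects are illustrative choices made for this sketch, not methods described in this report.

```python
import hashlib


def to_bits(data: bytes) -> str:
    """Render raw bytes as a string of 1's and 0's (8 bits per byte)."""
    return "".join(f"{byte:08b}" for byte in data)


def fingerprint(data: bytes) -> str:
    """SHA-256 digest: bit-identical inputs always yield identical digests."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample content (illustrative only), stored as UTF-8 bytes.
original = "Beavers build dams and lodges.".encode("utf-8")

# Reproducible: copy the object 1,000 times in succession, each generation
# made from the previous one; the last copy is still bit-for-bit identical.
copy = original
for _ in range(1000):
    copy = bytes(bytearray(copy))

# Easily modified: replacing one four-letter word with another is a trivial
# byte-level edit, and nothing in the result records that it was changed.
edited = original.replace(b"dams", b"huts")

if __name__ == "__main__":
    print(to_bits(b"Hi"))                                # 0100100001101001
    print(copy == original)                              # True
    print(len(edited) == len(original))                  # True
    print(fingerprint(edited) == fingerprint(original))  # False
```

Because both words in the edit are four letters long, the edited object is even the same size as the original; in this sketch only a comparison against the original (here, via its fingerprint) reveals the change.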
2.1.3 Internet Access Devices

In the reference scenario, the student uses a computer to access the Internet. While a personal computer is today the most common way to connect to the Internet, devices for accessing the Internet are proliferating. Entire businesses have sprung up to prepare content and information delivery for a host of other devices. These devices include:
· Handheld organizers such as those from Palm and Handspring (typically these devices contain built-in wireless modems and use services such as OmniSky);
· Cell phones with built-in Web access;
· WebTV and Internet access devices that are used with TV sets and customized for MSN and AOL, and whose deployment began in 2001;
· BlackBerry (RIM) and other wireless paging devices;
· Standalone Internet machines such as the Compaq iPAQ and mailstations;
· Kiosks designed for surfing the Internet and typically used in public spaces;
· Game machines such as those from Sega and Nintendo, and Microsoft's Xbox. Today's gaming technology (e.g., Sony's PlayStation) increasingly uses the Internet to provide users with multiplayer communities in which a user can compete against and/or cooperate with other like-minded individuals. Software is generally available on CD-ROMs, and the widespread availability of CD-ROM writers makes non-vendor-produced games and activities a realistic possibility. Game-playing applications are also increasingly available for use on various Web sites, sometimes for free. Note that such games often contain violent material.

In addition, many commercial establishments frequented by children, including coffee shops, department stores, and fast food restaurants, will have customer-usable Internet access points. Broadband Internet access, needed for efficient transmission of images and movies, will also grow in the future, though with some uncertainty about how fast it will be deployed. Specialized Web access devices will cost much less than today's computers (a few hundred dollars each, rather than several hundred to several thousand dollars). Wireless Internet access is also expected to grow in popularity, though the feasibility of transmitting high-quality images through wireless links remains an open question.

These devices and business trends suggest increasingly ubiquitous access to the Internet. Note also an important social point: wireless access and access "anywhere" enable users, including children, to escape many forms of local supervision (e.g., someone looking over his or her shoulder), and individuals will not be as dependent on schools, libraries, and workplaces to provide Internet access.
Consequently, approaches to Internet protection and safety for children that depend on actions whose effect is limited to a single venue will be increasingly ineffective.

2.1.4 Connecting to the Internet

In the reference scenario, the student connects to the Internet. In general, access to cyberspace is provided by one or more Internet service providers (ISPs). For children, Internet connections are available via:
· Personal Internet service. In this case, a party subscribes to a consumer-oriented ISP and gains access to the Internet through as many places as the provider offers access ports. Such services generally provide home access. There are many variations in the offerings from ISPs and many different fee structures as well. Note that an individual child may be using a family account, a personal account associated with a family account, or a friend's personal Internet service.
· School and/or library Internet service. A student (or faculty or staff member) or a library patron uses school or library facilities to obtain Internet access. In general, schools and libraries obtain Internet service for their students and patrons through business-oriented ISPs, and a whole host of classroom ISPs have been brought to market.
· Public terminals. An individual pays "by the minute" for Internet access at a public terminal, which may be located in a coffee shop or an airport, or through a wireless service.

In addition to Internet connections, some ISPs offer other services designed to enhance the user's experience. Proprietary services (including parental controls to help manage the online experience of children) and content are offered by a number of online service providers. These services and content are available only to those who subscribe to those online service providers. In other cases, services are available to some non-subscribers (for example, some ISPs provide their instant message (IM) services to people who do not subscribe to those ISPs).

Moreover, various online service providers develop, or seek to develop, reputations for the kinds of content that they offer. For example, a service provider may bill itself as "family-friendly" and thus provide access only to Web sites that it regards as appropriate. Denying access to all Web sites not on the provider's "family-friendly" list is a proprietary service that the online provider offers and that is unavailable to those who do not subscribe to it.

ISPs offer dial-up or broadband access to the Internet. The majority of at-home access is today achieved through dial-up connections: a user's computer dials an ISP phone number and connects to the ISP through an ordinary modem.
However, broadband access, generally through DSL (digital subscriber line) service from telephone companies or cable modems from cable TV companies, is growing because of the higher-bandwidth connections it offers. Higher bandwidth matters because some kinds of material contain many more bits than others: text typically contains far fewer bits than images, and images contain far fewer bits than movies. Thus, viewing graphics-intensive material through a low-bandwidth connection is often very tedious and tries the patience of all but the most dedicated users.

ISPs also require their subscribers to abide by certain terms of service, violation of which is grounds for terminating the service contract with a subscriber. An individual subscriber to an ISP is bound directly by the terms of service of that ISP. An individual who obtains Internet service through an intermediary is bound by the terms of service imposed by the
intermediary, which may (or may not) be stricter than the terms that govern the relationship between the ISP and the intermediary. Note also that ISPs vary widely in the extent to which they enforce their terms of service. A typical provision in the terms of service of many ISPs might forbid a user from posting sexually explicit material under most conditions.

ISPs make decisions about the content that they will carry. In particular, many ISPs do not provide access to every Usenet newsgroup (e.g., they may not carry newsgroups that carry a large volume of child pornography).5 For subscribers to these ISPs, the newsgroups that are not carried can be difficult to find and are for many practical purposes nonexistent.6

Finally, ISPs are funded by subscription and/or by advertising. Subscription entails periodic payment by the user to the ISP for access privileges. Advertising entails payments by advertisers to the ISP for the privilege of displaying ads, and thus the user must be willing to accept the presence of ads in return for access privileges.

2.1.5 Identifying Devices on the Internet: The Role of Addressing

Every computer or other device connected to the Internet is identified by a series of numbers called an IP address.7 The domain name system (DNS) is a naming system that maps between these computer-readable IP addresses and human-readable names, namely domain names. Thus, a domain name is a name that identifies one or more IP addresses. A canonical domain name has the form "example.com." Every domain name has a suffix corresponding to a top-level domain (TLD), in this example .com. Until October 1, 2001, the most common top-level domains allowed for Internet use were .net, .org, .com, .edu, .gov, and .mil. In addition, a number of two-letter country suffixes have been recognized. As this report goes to press, a number of other top-level domains have been approved: .biz, .info, .pro, .coop, .aero, .museum, and .name. (How many other TLDs will eventually be available is an open question, and the issue of the number and type of TLDs is highly charged politically and economically.) As a rule of thumb, the non-country suffixes indicate something about the nature of the party with which the site is affiliated. For example, example.museum is likely operated by a museum; example.gov is operated by a government agency.

The domain name is a key element in directing traffic across the Internet. For example, a typical e-mail address is of the form "John.Doe@example.com." The address of a typical Web site has the form "www.example.com." The Web site address is generally part (or all) of a uniform resource locator (URL) that identifies a particular Web page that can be found on a Web site. Thus, www.example.com/page1 might refer to a page on the example.com Web site.

5 Usenet is a worldwide distributed discussion system consisting of a set of newsgroups with names that are classified hierarchically by subject. "Articles" or "messages" are "posted" to these newsgroups by people on computers with the appropriate software; these articles are then broadcast to other interconnected computer systems via a wide variety of networks. Some newsgroups are "moderated"; in these newsgroups, the articles are first sent to a moderator for approval before appearing in the newsgroup. For more information, see Chip Salzenberg, "What Is Usenet?," available online at <http://www.faqs.org/faqs/usenet/what-is/part1/>.
6 There are Web sites through which one can read Usenet newsgroups even if the ISP has decided not to carry certain newsgroups, thus circumventing the ISP's selection policy.
7 The IP address of a device provides a unique address to which and from which messages can be routed. A typical IP address has the form a.b.c.d, where a, b, c, and d are numbers from zero to 255. The mapping between domain names and IP addresses is managed by devices known as domain name servers. More information is given in a Computer Science and Telecommunications Board report on domain name systems that is currently in preparation. Note also that IP addresses may be mapped dynamically to devices, so a user's computer might have one IP address today and a different one tomorrow.

2.1.6 Functionality of the Internet

In the reference scenario, the student used a search engine to search the World Wide Web for information about beavers. Search engines are only one aspect of the functionality that the Internet offers, and as the Internet matures, new functions based on new applications and technologies are constantly being introduced. Some of the more important applications of the Internet are described below and are summarized in Table 2.1.
· The World Wide Web (WWW) refers to the set of all the information resources that can be accessed via the hypertext transfer protocol (HTTP). Loosely speaking, it is the set of all Web pages that can be addressed by a request of the form "http://URL."8 Today, the publicly accessible World Wide Web consists of over 2 billion Web pages,9 though there is a great deal of uncertainty in any estimate of Web size.
Web pages are associated with particular hosts (though not every host has a Web page), and many Web pages themselves include links to other Web pages. The Web is based on a client-server model: a user (client) specifically requests a Web page from a host (server).
· Search engines help to organize, classify, and return information based on a query, and those who surf the Web typically rely on various types of search engines to find the information they are seeking. Box 2.2 describes how search engines work. Search engines rely on technologies of information retrieval, as discussed in Section 2.2. Given the enormous volume of information on the Web, users in general do not know where to find the information they seek. To cope with this situation, search engines have been developed to help users find the addresses of information residing on the Web. While no data have been collected on this point, it is probably fair to say that search engines are the means by which people find most of the information they access on the Internet.

8 Most browsers handle addresses without a preceding "http:" as though it were present. Also, some Web pages are accessible only through the "https:" protocol.
9 For example, as of November 2001 the Google search engine had indexed 1.6 billion Web pages. As of April 2002, it had indexed 2.1 billion Web pages.

TABLE 2.1 Selected Internet Applications and Their Implications for Exposing Children to Inappropriate Sexually Explicit Material and Potentially Dangerous Experiences

Web pages
  · Identified by and accessed through knowledge of the uniform resource locator (for example, http://www.random_sex_site.com, http://www.just_fine_kids_site.com)
  · Can display still images, text, and movies
  · Generally the channel used today by the adult online industry
  · Can be found by typing the URL into a browser or clicking on a link (links can be embedded in instant messages, e-mail, and so on; included in other Web pages; or found through a search engine)
E-mail
  · Requires knowledge of a user's e-mail address
  · Can contain (or carry) text, images, and links to Web pages; can be used to initiate two-way dialog as well as to deliver information and files
  · Sender's e-mail address can be faked (or be misleading)
  · Is the route for unsolicited commercial e-mail (spam)
Chat
  · Generally text-based, and conducted in a "chat room"; text can contain links to Web pages
  · Can be public (accessible to anyone) or private (by invitation only)
  · Content of chat and online identities of participants are visible to everyone participating in the chat room
  · Chat rooms are an online equivalent of CB radio
  · Used to initiate, establish, and maintain online relationships
Instant messages
  · One-on-one dialog, and private
  · Text-based, but can contain links; images and voice can sometimes be transmitted as well
  · Initiation of an instant message requires knowledge of the user name
  · "Buddy lists" allow a user to know who is online at the same time
Usenet
  · Populated by some 30,000 newsgroups on specialized topics; newsgroups function essentially as online bulletin boards on which users can post anything they wish, often anonymously
  · Many newsgroups contain sexually explicit material, and some are oriented primarily toward such material; sexually explicit content on Usenet newsgroups is often more extreme than that on adult-oriented Web sites
  · Cost of content distribution is borne by the Internet service provider that carries newsgroups with content, rather than by the publisher or receiver
  · Sexually explicit Usenet newsgroups serve as conduits for advertising of adult-oriented Web sites and as a medium in which sexually explicit content can be exchanged among users
  · Internet service providers make choices about what Usenet newsgroups to carry; some carry the full line, and others carry only a subset (e.g., all except those devoted to child pornography)
Peer-to-peer connections
  · Connection between two users that is made directly, without mediation through a central server
  · Purpose of a peer-to-peer connection is typically file sharing (of any kind of content, including sexually explicit content)
  · Not generally anonymous (because connections are peer-to-peer, each user must have an Internet address with which to interact)
  · Cost of distribution is borne by the Internet service provider rather than the end users
· E-mail refers to messages that are sent electronically from one user to another (or to many others) and read at a time of the recipient's choosing. E-mail can carry attachments that are other information objects, such as images, movies, audio recordings, and so on. E-mail can also be used as a direct marketing tool ("spam") analogous to third-class postal mail (also known as junk mail). The use of e-mail requires knowledge of a recipient's e-mail address.
· File sharing refers to a process in which devices controlled by end users (i.e., "peers") interact directly with each other to transfer files between them, rather than interacting through a central server. In some file-sharing networks, a central server holds a publicly accessible index to the files available from end users (but not the files themselves). End users then transfer the files between themselves.10 Other peer-to-peer file-sharing networks eliminate even the centralized index function. Users of these systems are connected to a network of other parties (rather than to a centralized index), and a query from one user goes to an immediate circle of possible respondents. If the query is not satisfied, it is passed on from those respondents to other respondents. Such queries are highly anonymous, though file transfers between end users are not. Although peer-to-peer interaction is most often performed in a user-to-user mode, in principle nothing prevents a single user from establishing peer-to-peer connections to a large number of other users and thus functioning in a "server-like" mode for those users.
· Usenet newsgroups are a broadcast medium in which anyone anywhere with a computer can be a transmitter. Typically, groups form around shared social interests. Thus, Usenet becomes the place for discussion among self-selecting groups interested in specific topics.
The volume carried by Usenet newsgroups is substantial (over 50 gigabytes per day on more than 10,000 newsgroups).11 Anyone can "post" a message to any Usenet newsgroup (perhaps anonymously; see Section 2.3), governed only by his or her own judgment about the relevance of the message to the nominal topic of that newsgroup. Newsgroups are named as described in Box 2.3.

10 This mode of file sharing first gained widespread publicity with the Napster network, an online service that facilitated the sharing of digital music files among users. The files themselves (the information content of interest to end users) always remained on client systems and never passed through a centralized server (such as one that would host a Web page). Instead, the server gave end users the ability to search for particular files of interest and to initiate a peer-to-peer transfer between users willing to share (and receive) files without the payment of a fee, even when the files constituted legally protected intellectual property. Napster is important for this discussion because there is no particular reason that the files in question must be digital music files; indeed, extensions of the Napster protocol can handle other types of files.
11 Personal communication, Dan Geer, president of Usenet.
· Internet relay chat (IRC) and chat rooms. These are popular real-time interactive services on the Internet that function as the equivalent of CB radio, where one person talks on a channel and anyone listening on that channel can hear and respond. IRC and chat rooms allow users to exchange text-only messages in real time with other people all over the world. IRC "channels" and chat rooms can be public (so that they can be found by others wishing to join the conversation) or private (so that they are invisible to the general public and special knowledge of the channel's or chat room's name is needed to join). IRC and chat rooms require a user to take active (initiating) steps to join an ongoing conversation. In addition, some chat rooms or channels on the Internet are monitored by employees or volunteers for language, content, and behavior, but most are not. (These monitors sometimes have the ability to force particular users out of a conversation.) There are many variants of chat rooms. Chat rooms can be based on interests (movies, sports, hobbies) or can be just a place to meet people. Some of the latest technologies are found in the online gaming community, where people assume digital visual representations called avatars. Avatars can then interact with each other in cyberspace, so that the chat has a visual, animated component. MUDs and MOOs are complex online games relying mainly on text interactions, while relatively new games such as Microsoft's Age of Empires and Electronic Arts' The Sims Online use visual representations to create fantastic communities for role playing.
· Instant messaging services allow a two-way, real-time, private dialog between two users. These services include such well-known entities as AOL's Instant Messenger and Yahoo! Messenger. A user initiating a message sends an invitation to talk to another (specific) user who is online at the same time.
Unlike with IRC, no channel-seeking step is required on the part of the recipient to become part of such a conversation.12 Instant messaging also allows someone to carry on multiple private conversations simultaneously. Instant messaging is very popular today for both professional and personal use because, unlike in chat rooms, one tends to talk to people whom one already knows. Note also that IMs are often used in conjunction with chat rooms or other online activities: a user in a chat room can send an IM to someone else in the chat room (because he or she sees the other party's screen name or "handle"), thus establishing a private communication. Once limited to text-only interactions, IM services are increasingly sophisticated. For example, some IM services support direct voice interactions and exchanges of music or image files. Others allow a user to block selected other parties from contacting him or her, thus making harassment more difficult.

12 Buddy lists are an important element of IM services. A buddy list contains the online names of a given user's "buddies" and indicates when one or more of them come online. When the user knows that "sue123" is online, she can send "sue123" an instant message and start a conversation. Thus, buddy lists facilitate online real-time communication among people who know each other's online names. Most IM services also offer a blocking option that enables a user who receives an IM from someone to block it. This option is used when a user receives an IM from someone with whom the user does not want to communicate (e.g., a stranger, or a friend with whom one is on the "outs").

· Videoconferencing applications are growing. Web cameras and streaming media depend on the increasing availability of broadband Internet connections to allow high-quality real-time transmission of audio and video content. Today's Internet videoconferencing suffers from many of the same problems as Internet telephony, most notably poor quality (low resolution as well as "jitter" in the moving images). A popular consumer videoconferencing application is CU-SeeMe, a very inexpensive videoconferencing tool originally developed at Cornell University for educational applications and now used to support a wide variety of video applications. Chat rooms are often forums in which Web cameras are used to send pictures in real time.
· Streaming media (video and audio) allow people to watch movies over the Web much as they would watch broadcasts. A movie that is now available through pay-per-view cable TV may readily become available through the Internet (a phenomenon known as digital convergence), perhaps augmented by an online chat room for discussing that content with one's friends and/or an electronic commerce site where one can purchase products or services shown in the movie.
· Internet telephony allows two-way real-time voice communication to be established without records of such communications appearing on family telephone bills.
A variety of standards now in place facilitate the interoperability of Internet telephony products, which would otherwise be hampered by proprietary specifications and protocols. However, because the Internet was not designed to support real-time operations, the quality of such connections remains an issue, though progress is being made in this area. Internet telephony products enable Internet users to establish real-time voice contact without the need for a telephone, and even today, voice connections (of somewhat low fidelity) can be established through certain instant messaging services and in some chat rooms.

In addition to these functions, there are a variety of Internet applications for facilitating Web activity (Box 2.4). These applications are often free to use, and they are important because they reduce the cost and difficulty of establishing a (non-commercial) Web presence and of generating communities of shared interest in sports, in science, and in trading of sexually explicit materials.

Finally, a variety of peripheral devices are also relevant to a discussion of Internet functionality. The availability of devices to convert sound into digital form, to digitize existing images, and to record still and video imagery enables individuals to generate digital content inexpensively and in private. Digital cameras, Web cameras, and camcorders are dropping in price while the quality of their pictures is increasing, and virtually anyone can publish videos to the Web or can participate in or set up videoconferences at very low cost.13 Thus, while in the past one might have had difficulty taking a picture of a couple having sex (because of the difficulty of having the film developed), today a digital camera enables one to do the same in complete privacy.

13 A 2001 video advertisement from Sony Europe for its Vaio line of notebook computers (which can have a Webcam built into them) depicts a man working at home on his Vaio notebook computer (with the Webcam). An adult female whom he obviously knows enters the room, greets him, strips to her underwear in another room, and starts behaving with him in a very sexually aggressive manner. The advertisement closes with several businessmen on the other end of a videoconference looking at their screen in surprise, seeing the woman on top of the man. The advertisement is sexually suggestive but depicts no overt sexual activity or nudity.

2.1.7 Cost and Economics of the Internet

On the Internet, the cost of handling information is rapidly decreasing. From a message sender's point of view, electronic messages cost next to nothing to create, exactly nothing to duplicate, and virtually nothing to send. Given the anonymity of the Internet, inexpensive bandwidth imposes none of the costs normally associated with responsibility, prudence, or probity, leading to problems such as unsolicited commercial e-mail (also known as spam). Bandwidth is inexpensive enough that most ISPs and services recognize that it is cheaper to "send everything" through their pipes than to determine whether a message or information is inappropriate, unwanted, or unrequested by the receiver.

Furthermore, because digital information can be so freely reproduced, it is essentially impossible to rely on the mechanical difficulty or expense of reproduction to curtail the availability of anything to anyone. Once released onto the Internet, content is next to impossible to ban, whether that content involves a political manifesto, sensitive classified information, company trade secrets, one's medical records, or child pornography.14

Finally, the Internet contains an enormous volume of material that changes rapidly. The sheer mass of this material means that it is economically prohibitive to review every publicly accessible item for its appropriateness or inappropriateness.

The economics described above suggest that if it costs virtually nothing to provide content to everyone, then an entirely free market will seek to make all possible content available to everyone. The implications of such economics are further discussed in Chapter 3.
14 This is not to say that all content on the Internet remains accessible, but in practice attempts to ban certain information content result in efforts by those interested in such information to copy and distribute it. Thus, the personal medical records of John Doe, if posted today, may disappear without a trace tomorrow, but only because no one except John Doe is likely to be interested in them. However, if the personal medical records of the President of the United States were posted on the Internet, it would be virtually impossible, even for the most determined efforts of the White House, to erase them and to eliminate all access to them.

2.1.8 A Global Internet

The Internet transcends the physical boundaries of local communities and national borders alike, thus expanding the universe from which content of various kinds can be drawn. Of particular relevance is that many other nations have different views about visual depictions of sexuality and the human body. For example, images of frontal nudity are found in mainstream print media in many parts of Europe, and publication or broadcast of such images raises little concern or outcry there. Thus, material not seen as "pornographic" by those providing it (e.g., content providers in Europe) may be perceived as such by those viewing it in a different cultural context (e.g., by some viewers in the United States).

A further consequence of the Internet's international nature is that only with great difficulty (and many would argue that it is impossible) can laws passed in one jurisdiction affect the behavior of parties in other jurisdictions that are not generally subject to such laws.15 Thus, to the extent that sexually explicit material of any kind (or any other type of material, for that matter) is available from overseas sources, laws that seek to restrict U.S. content providers from making such material available to U.S. citizens will fail to restrict it in practice.16

2.1.9 The Relative Newness of the Internet

Amidst all of the attention given to the Internet and dot-com phenomena, it is helpful to recall that the Internet has been a part of the national consciousness for less than a decade (since the mid-1990s). Ten years is an enormously long time compared to the time scale of technology change, but it is quite short on the time scale of social, economic, and legal change. Given that the array of pre-Internet social, economic, legal, and regulatory practices for balancing competing societal interests developed over many decades (and in some cases, centuries), it is not surprising that the Internet has offered something of a vacuum into which many parties seeking quick advantage have moved.
For example, the practice of adult-oriented Web sites using addresses that are based on common words, or that are similar to those of non-adult businesses, draws many people to sites that they had not intended to visit. Branding histories that would allow users to differentiate between reliable and unreliable information have not yet been established. Certain practices that are acceptable in the real world (such as direct marketing) may cross over into the unacceptable in cyberspace because they are increasingly voluminous and are often seen as more intrusive as well.

15 Of course, such a claim is valid only to the extent that content providers and ISPs are numerous and dispersed internationally. If the number of ISPs is small enough (as could happen through attrition or mergers and acquisitions in one jurisdiction), they become likely targets for regulation, as regulatory efforts can be concentrated rather than dispersed.

16 On the other hand, sources that appear to be foreign may in fact be under the jurisdiction of U.S. law. For example, the mere fact that a domain name has a country suffix such as .ru or .jp does not necessarily mean that its owner is located in Russia or Japan. Indeed, in this hypothetical example, such parties may well reside in California or Iowa.

Perhaps the most important consequence of the relative newness of the Internet is the generation gap in knowledge between parent and child. It may be that as today's children become parents themselves, their familiarity with rapid rates of technological change will reduce the knowledge gap between them and their children, and mitigate to some extent the consequences of the gap that remains.

2.2 TECHNOLOGIES OF INFORMATION RETRIEVAL

As suggested in the reference scenario in which a student seeks information on adult beavers, information retrieval is an important part of what people do on the Internet. By virtue of its vast scope, the Internet is a route for obtaining a range and variety of material to which one would most likely not otherwise have easy access; such materials include history, science, entertainment, games, medical information, and religious information, as well as materials that adults deem inappropriate for children. If children are treated as adults on the Internet, children may come across such materials.

Searching for information on the Internet is different from searching for information in, for example, a library in the physical world. Typically, an individual might search for information using an Internet search engine. A common initial search strategy used by many inexperienced individuals is to type one or two keywords and then to examine the sites that are returned. For a word such as "sex," a search engine might return sex education sites, a set of biology notes on sex, and adult-oriented Web sites. By contrast, a user of a physical library might rely on the content labeling provided by classification systems such as the Library of Congress and Dewey Decimal systems. On the Internet, the absence of comparably reliable content labeling makes precise searching difficult.
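The effect of searching without content labels can be seen in a toy version of the reference scenario. The sketch below assumes a hypothetical three-page corpus (the URLs and text are invented) and a deliberately crude ranking that simply counts how often the query words occur on each page; real search engines use far richer signals. Because nothing records what a page is about, the query "adult beavers" matches the wildlife page and an adult page alike.

```python
import re
from collections import Counter

# Hypothetical three-page corpus; URLs and text are invented for illustration.
PAGES = {
    "http://wildlife.example.org/beavers":
        "Adult beavers build dams and lodges. Young beavers stay with their parents.",
    "http://adult.example.com/":
        "Adult videos. Adult chat. Adult content for adult visitors.",
    "http://zoo.example.org/rodents":
        "Rodents such as beavers and squirrels live here.",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def search(query: str, pages: dict[str, str]) -> list[tuple[str, int]]:
    """Rank pages by how many times the query words occur in them.

    No content labels are consulted, so a page matches whenever it
    shares words with the query, whatever those words mean on the page.
    """
    terms = tokenize(query)
    results = []
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            results.append((url, score))
    # Highest score first; Python's sort is stable, so ties keep corpus order.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    for url, score in search("adult beavers", PAGES):
        print(score, url)
```

In this toy corpus the adult page (score 4) outranks the wildlife page (score 3) purely by repeating the word "adult," a crude form of the repetition tactic that also affects real search rankings.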
Further, the scale of a "Web catalog" (i.e., the volume of information accessible through popular search engines) is much larger than that of most library catalogs of holdings, and Web search engines often do not provide adequate categorization of the Web pages contained in their databases. Finally, the most important distinction between the physical library and the Internet is that all physical libraries exercise some editorial discretion in acquiring materials, whereas the Internet is a venue in which the publications of any party are available and retrievable without editorial restriction.

Information retrieval systems support people in finding information in large databases of information objects (whether in the form of text, images, video, or other media) that are relevant to their problems or situations. Internet search engines, where the database is the Web, are a typical example of such systems, as are libraries, where the database is the collection. To accomplish their goals, information retrieval systems must:

· Represent the content of the information objects (what the objects are about), through a process called content representation or content analysis;
· Represent the person's information "need," through a process called problem or user representation;
· Match the representations of information objects and information problem, to retrieve those objects that are most likely to be useful to the searcher (search techniques); and
· Provide an interface between the user and the other components of the system to support the user's interaction with those components and with the information objects.

Filtering systems, discussed at greater length in Sections 2.3.1 and 12.1, work like information retrieval systems in reverse; that is, they are concerned not with retrieving desirable information, but rather with making sure that undesirable information is not retrieved. However, their essential operations remain the same: they must represent the content of the information objects; they must represent relevant characteristics of the user; they must match object representations with user representations to eliminate undesirable objects; and they must provide a means for users to specify or otherwise indicate what is not desired.

The essential problem with information retrieval (and filtering) is that all of these processes are inherently uncertain. With respect to content analysis, what an information object is about can be many things for many people. The problem is intrinsically difficult, even for humans: one person may think a picture shows a starry sky; another may interpret it as a symptom of mental ill-health; and a third is interested only in the brush technique.
Similarly, one user may find a particular page of text obscene; to a second it is merely embarrassing; and to a third, it contains important health-care information. Also, even representing what text is about is fraught with uncertainty. Most words mean many things (polysemy); most concepts can be expressed in many ways (synonymy).

Images are a particularly difficult recognition challenge for computers. Computers seek to recognize an image by analyzing the relationships among the pixels in it (color tone, contrast, and so on). While it is often possible to tell whether a picture has nearly naked people in it, images of the California desert and of apple pies are also sometimes identified as pictures with naked people by today's image recognition software.17 And image recognition technology is for the most part incapable of distinguishing minors from adults (and hence cannot identify child pornography with any reliability). At the same time, words found alongside images provide additional information that can help identify sexually explicit images properly.

17 D.A. Forsyth and M.M. Fleck, 1999, "Automatic Detection of Human Nudes," International Journal of Computer Vision 32(1): 63-77.

With respect to representing the user (what the user wants, or does not want, to see), the problems are similarly difficult. Users are in general unable to specify precisely that which they do not know but may be searching for, nor are they (or a computer algorithm, or another person) able to specify precisely the characteristics of that which they should not see. The matching process is thus itself inevitably uncertain, since the representations on which it depends cannot be complete and certain.

Because information retrieval and information filtering are probabilistic, any search engine will find material that is irrelevant to the user's needs and fail to find material that is relevant. Similarly, any filter will inevitably allow the passing of some undesirable material, and will filter out some desirable material. Any attempt to avoid errors of the first type will lead to an increase in errors of the second type, and vice versa. These points are discussed in greater detail in Section 2.3.1 and in Appendix C.

2.3 TECHNOLOGIES RELATED TO ACCESS CONTROL AND POLICY ENFORCEMENT

As more people and children connect to the Internet, problems such as exposure to inappropriate material and experiences assume a higher profile. One logical conclusion might be that if technology helped to create these problems, technology can help to solve them. While the committee does not believe that technology is yet the foundation of good solutions to these problems (and may never be), technologies nevertheless do have useful roles to play. Below is a brief discussion of technologies that may be relevant.
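The trade-off stated at the end of Section 2.2 (reducing one kind of filtering error increases the other) can be illustrated numerically. This is a minimal sketch assuming a hypothetical filter that gives each page a score between 0 and 1 and blocks every page at or above a chosen threshold; the scores and true labels are invented.

```python
# Hypothetical (score, truly_inappropriate) pairs; higher scores mean the
# filter judges a page more likely to be inappropriate. Values are invented.
PAGES = [
    (0.95, True), (0.80, True), (0.65, False), (0.60, True),
    (0.40, False), (0.35, True), (0.20, False), (0.05, False),
]


def errors_at(threshold, pages):
    """Block every page scoring >= threshold; count both kinds of error.

    overblocked:  appropriate pages that are blocked
    underblocked: inappropriate pages that are let through
    """
    overblocked = sum(1 for score, bad in pages if score >= threshold and not bad)
    underblocked = sum(1 for score, bad in pages if score < threshold and bad)
    return overblocked, underblocked


if __name__ == "__main__":
    for t in (0.9, 0.7, 0.5, 0.3, 0.1):
        over, under = errors_at(t, PAGES)
        print(f"threshold={t:.1f} overblocked={over} underblocked={under}")
```

Lowering the threshold moves this toy filter from (0 overblocked, 3 underblocked) at 0.9 to (3 overblocked, 0 underblocked) at 0.1; no threshold drives both counts to zero, because the scores of appropriate and inappropriate pages overlap.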
2.3.1 Filtering Technologies

Filtering technologies allow Internet material or activities that are deemed inappropriate to be blocked, so that the individual using the filtered computer cannot gain access to that material or participate in those activities. Typically, material is determined to be inappropriate on the basis of its source, its content, or the labels that have been associated with it. Determination of inappropriate content can be accomplished by computer-based methods, by a combination of computer-based methods and human judgment, or by human judgment alone. This section addresses automatic and human-plus-automatic methods, since the size of the Internet effectively prevents the use of human judgment alone (Box 2.5). (In methods that combine human and automatic techniques, a human rater examines Web sites that a preliminary machine analysis has flagged for human examination, makes a judgment call about whether each site is inappropriate, and, if so, determines the objectionable category into which the page falls.)

Filtering technologies can be applied in several ways. One is the establishment of so-called "black lists," which are lists of sources that have been deemed inappropriate and that the user is prevented from accessing. Another is the establishment of "white lists," which are lists of sources that have been deemed appropriate and thus are the only sources that the user is allowed to access. These two methods require a priori identification of the bad (good) sites, which are then incorporated into the filtering software, which stands between the user's Internet access tool and the Internet itself. Bad sites for black lists can be identified through any of the technologies described below. Also, in a priori determinations of inappropriate content, the categorization judgment is usually made days, weeks, or even months in advance of the user's request for the Web site, a point that is significant because the content of Web sites typically changes over time.

A third means of applying filtering technologies operates in real time, that is, at the time that the user is actually interacting on the Internet and the information in question is flowing directly to the user.
In this case, there may be no a priori blocking of specific sites or sources; rather, the content or other characteristics of retrieved items are analyzed prior to display, and on this basis it is determined whether they should be displayed to the user. This real-time method can also be used in reverse; that is, it can be used to analyze the user's request and on this basis decide whether the request should be allowed or disallowed. Although conducted in real time, this method nevertheless requires a priori specification of the content indicators that mark a source as inappropriate. Finally, only real-time content monitoring is useful for monitoring and selectively blocking outgoing information, such as blocking certain text (e.g., a phone number) from appearing in e-mail.

Note that if a requested Web site is determined to be inappropriate, there are several options for how much material from that site should be blocked. For example, all material on that site might be blocked (everything on www.example.com). Or only a certain directory might be blocked (www.example.com/directory1 might be blocked, while www.example.com/directory2 might not). Or a particular page within a directory might be blocked (e.g., www.example.com/directory1/picture1.jpg).
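Black lists, white lists, and the choice of blocking granularity described above can be combined in a few lines. This is a minimal sketch under stated assumptions: each list entry is either a bare host name (covering a whole site) or a host name followed by a path (covering that directory or page and everything beneath it). The list names and contents are hypothetical; the only library call is the standard `urllib.parse.urlsplit`.

```python
from urllib.parse import urlsplit

# Hypothetical list entries. A bare host covers the whole site; "host/path"
# covers that directory or page and everything beneath it.
BLACK_LIST = {"www.example.com/directory1"}
WHITE_LIST = {"www.example.org"}


def url_key(url: str) -> str:
    """Reduce a URL to 'host/path', dropping scheme, port, query, and trailing slash."""
    parts = urlsplit(url)
    return (parts.hostname or "") + parts.path.rstrip("/")


def matches(url: str, entries: set[str]) -> bool:
    """True if the URL falls under any entry at site, directory, or page level."""
    key = url_key(url)
    # The trailing "/" check keeps "directory10" from matching "directory1".
    return any(key == entry or key.startswith(entry + "/") for entry in entries)


def allowed(url: str, mode: str = "black") -> bool:
    """Black-list mode: allow unless listed. White-list mode: allow only if listed."""
    if mode == "black":
        return not matches(url, BLACK_LIST)
    if mode == "white":
        return matches(url, WHITE_LIST)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(allowed("http://www.example.com/directory1/picture1.jpg"))  # blocked directory
    print(allowed("http://www.example.com/directory2/index.html"))    # same site, other directory
```

Changing the black-list entry to "www.example.com" would block the whole site, and to "www.example.com/directory1/picture1.jpg" a single page, mirroring the three options in the text.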
Filtering by Internet Domain Names and Addresses

Filtering by Internet domain names and addresses is typically accomplished by examining the name of the Web site that is requested by the user or (in the case of real-time filtering) returned to the user. The name of a Web site (or of a page on a Web site) is specified by a uniform resource locator (URL). A given URL, for example, http://www.example.com/directory1/picture1.jpg, is usually checked against a list in a number of ways.

In the case of a priori filtering, the URL is checked against a preexisting list of inappropriate names generated by the filter vendor. All parts of the URL are compared to a list of words or terms that have previously been found to be associated with sites containing inappropriate material, or that are believed likely to be associated with inappropriate material. For example, www.hotmama.com is likely to refer to an adult Web site.18 The .xxx domain (discussed in Section 13.1) is based on this notion. This method can be used to permit access as well as to prevent it. For instance, a site in the .gov domain would in general be considered highly unlikely to contain inappropriate material, as would a site with the name of a museum. In the case of real-time filtering, access would be denied (allowed) based on the comparison; in the case of a priori filtering, the URL would probably be forwarded to a human evaluator, who would determine whether it should be placed on the black list.

A related method is to examine the links that are made from a site and to a site. Because many adult Web sites are linked to each other, a referral on Web site B to a known adult site A provides reason to assume that B is also an adult site.

A second method is to check the IP address of the Web site, in this (made-up) case, 188.8.131.52. If this address is on a list of inappropriate IP addresses, access is blocked.
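The name-, link-, and address-based checks just described can be sketched as a single decision function. Everything in the lists below is hypothetical (invented for illustration, not drawn from any vendor's product), and the "review" outcome stands in for forwarding a URL to a human evaluator. Term matching is done by plain substring search over the whole URL, which is one simple way to compare "all parts of the URL" against a word list.

```python
from urllib.parse import urlsplit

# Hypothetical list contents, invented for illustration (not any vendor's data).
SUSPECT_TERMS = ("xxx", "hotmama", "porn")   # looked for anywhere in the URL
TRUSTED_SUFFIXES = (".gov", ".museum")       # domains presumed appropriate
BLOCKED_IPS = {"188.8.131.52"}               # the made-up address used in the text
KNOWN_ADULT_HOSTS = {"www.hotmama.com"}      # seed list for link analysis


def host_of(url):
    """Return the lower-case host name of a URL ('' if there is none)."""
    return (urlsplit(url).hostname or "").lower()


def classify_url(url, ip=None, outbound_links=()):
    """Return 'block', 'allow', or 'review' (forward to a human rater)."""
    if ip is not None and ip in BLOCKED_IPS:
        return "block"                       # address-based black list
    lowered = url.lower()
    if any(term in lowered for term in SUSPECT_TERMS):
        return "block"                       # suspect term somewhere in the URL
    if host_of(url).endswith(TRUSTED_SUFFIXES):
        return "allow"                       # e.g., a .gov site
    # Link analysis: a page that links to a known adult site is itself suspect.
    if {host_of(link) for link in outbound_links} & KNOWN_ADULT_HOSTS:
        return "review"
    return "allow"


if __name__ == "__main__":
    print(classify_url("http://www.hotmama.com/"))
    print(classify_url("http://siteb.example.net/", outbound_links=["http://www.hotmama.com/"]))
```

Substring matching is cheap but blunt: any URL that happens to contain a listed string is blocked, which is one source of the overblocking discussed later in this section.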
Blocking by IP address is helpful when a Web site has only an IP address and no domain name associated with it. A complication in this analysis is that different hosts can share the same IP address through a technique known as name-based virtual hosting, which is a way of assigning multiple domain names to the same IP address. Name-based virtual hosting is made possible by the fact that the HTTP protocol passes the requested domain name to the server at the given IP address, and the software at that IP address maps the domain name to the appropriate portion of the server. Thus, different domain names need not point to different IP addresses, and a given IP address does not specify a Web site unambiguously. For example, www.porn-company.com and www.safe-for-kids.com might share the same IP address (e.g., 184.108.40.206), even though each of these names, when entered into a browser, would reach the correct site. A list that designated that address as containing inappropriate material would block both domain names.

18 As of October 26, 2001, this Web site presented a blank page. But it may not be blank in the future.

Filtering by Textual Analysis

Filtering by textual analysis makes use of the information retrieval representation technologies discussed in Section 2.2 and Appendix C. The basic concept is to examine all of the text on the site or page being considered (or in the search request), and to determine whether that text is indicative of inappropriate content.

The most naive method of doing this is to compare the individual words of the text or request to a list of words that are strongly associated with inappropriate content. For example, the site might be deemed inappropriate if any of a number of keywords is found (e.g., "orgy," "cum," "bomb," "gun," "marijuana," and so on). When such words are found, access is blocked, or the site is flagged for possible inclusion on a black list. However, many words have more than one meaning (for instance, "beaver" can have both sexual and nonsexual meanings); furthermore, the context in which words appear has a great effect on their appropriateness (for instance, the word "breast" can appear on a cancer information site as well as on an adult-oriented, sexually explicit site). More sophisticated text analysis techniques that address these problems can, for instance, identify phrases (e.g., "beaver dams" or "breast cancer") in order to determine appropriateness more precisely.

Another method of textual analysis that is used for filtering is text classification or categorization (see Appendix C). This technique analyzes the text as a whole, taking account of such characteristics as the frequency of occurrence of various words, the co-occurrence of pairs or other combinations of words, and other statistical parameters of the text.
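The naive keyword test and the phrase-based refinement described above can be contrasted in a short sketch. The word list and the "safe phrase" exceptions are hypothetical and chosen to echo the text's own examples; a real filter's lists would be far larger and would handle plurals, spacing tricks, and other variations that this sketch ignores.

```python
import re

# Hypothetical lists echoing the examples in the text.
FLAG_WORDS = {"orgy", "beaver", "breast"}
SAFE_PHRASES = {("beaver", "dams"), ("breast", "cancer")}


def tokens(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def naive_flag(text):
    """Naive method: flag the text if any listed word appears anywhere."""
    return any(word in FLAG_WORDS for word in tokens(text))


def phrase_aware_flag(text):
    """Refinement: ignore a listed word when it begins a known innocuous phrase."""
    words = tokens(text)
    for i, word in enumerate(words):
        if word not in FLAG_WORDS:
            continue
        next_word = words[i + 1] if i + 1 < len(words) else None
        if (word, next_word) in SAFE_PHRASES:
            continue  # e.g., "breast cancer": treat as innocuous
        return True
    return False


if __name__ == "__main__":
    text = "Early screening for breast cancer saves lives."
    print(naive_flag(text), phrase_aware_flag(text))
```

The refinement removes some overblocking but not all of it: a page that mentions "a beaver" without the word "dams" right after it is still flagged, which is why statistical text classification, which weighs the text as a whole, is also used.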
Text classification is first applied to a so-called training collection of texts that are already known to be either appropriate or inappropriate, in order to discover regularities in the statistical properties of appropriate and inappropriate texts. The same technique is then applied to texts retrieved from the Web, and their statistical characteristics are used to classify them as either appropriate or inappropriate.

Filtering by Image Analysis

Almost all sexually explicit material on the Internet is associated with images. As indicated in Section 2.2 and Appendix C, analysis of images to
determine whether they are inappropriate is a very hard problem, if it is to be done accurately. Nevertheless, some techniques can provide clues to the potential inappropriateness of an image.19 For instance, it is possible to identify large expanses of what is likely to be flesh in an image, and it is also possible to determine whether an image is likely to be of one or more people. It is also possible to maintain a set of canonical (typical) inappropriate images against which images on a Web site can be compared. However, all of these techniques are highly error-prone and therefore are most often used in combination with other indicators of potential inappropriateness, as described below.

Filtering by Labels

All Web pages have associated with them information that describes various characteristics of the page and that is typically hidden from the user. For example, HTML or XML tags within the body of a page encode rules that determine how information is structured on the page. This low-level information can be used to compare the page's structure against a set of structures commonly associated with inappropriate pages.

At a somewhat higher level, Web sites have associated with them information about the site or page as a whole. Such metadata can be used to determine whether a site is appropriate. Metadata is not directly viewable by the user, a feature that has been exploited by many inappropriate (and even some appropriate) sites in order to bias search results toward themselves. For instance, because of the way search engines rank results, the more times a query word appears in a site, the higher that site is likely to be placed in the retrieval rankings.
Thus, extended repetition in the metadata of commonly used search terms that have no relationship to the actual content of the site will result in that site's being retrieved and ranked highly when those terms are used. This practice can, however, also be exploited for filtering purposes, in the following ways. The terms in the metadata can be compared to the words in the text of the page; if the metadata terms are markedly dissimilar from those in the page, the page is suspect. Also, unusual repetition of words in the metadata can itself be used as a clue for filtering.

19 A brief summary concerning the technology of screening for sexually explicit images can be found in James Ze Wang, Jia Li, Gio Wiederhold, and Oscar Firschein, 1998, "System for Screening Objectionable Images," Computer Communications Journal 21(15): 1355-1360, and papers referenced therein.
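The two metadata clues described above (metadata that does not resemble the page text, and unusual repetition within the metadata) can be expressed as a small scoring function. This is a minimal sketch assuming the metadata keywords (for example, the contents of an HTML `<meta name="keywords">` tag) and the visible page text have already been extracted as plain strings; the function name and both thresholds are hypothetical.

```python
import re
from collections import Counter


def words(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def metadata_suspicion(meta_keywords, page_text, overlap_floor=0.2, repeat_ceiling=5):
    """Return the reasons (possibly none) why a page's metadata looks suspicious.

    overlap_floor:  minimum share of distinct metadata words that should also
                    appear in the page text (hypothetical threshold)
    repeat_ceiling: maximum times any single word may repeat in the metadata
                    (hypothetical threshold)
    """
    meta = words(meta_keywords)
    reasons = []
    meta_set = set(meta)
    if meta_set:
        overlap = len(meta_set & set(words(page_text))) / len(meta_set)
        if overlap < overlap_floor:
            reasons.append(f"metadata/body overlap {overlap:.2f} below {overlap_floor}")
        word, count = Counter(meta).most_common(1)[0]
        if count > repeat_ceiling:
            reasons.append(f"'{word}' repeated {count} times in metadata")
    return reasons


if __name__ == "__main__":
    stuffed = "beavers " * 10 + "homework"
    print(metadata_suspicion(stuffed, "Adult videos and chat for adult visitors."))
    print(metadata_suspicion("beavers dams lodges", "Beavers build dams and lodges."))
```

The stuffed example trips both checks, while the honest page trips neither; as with every technique in this section, such clues are error-prone on their own and are most useful in combination with others.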
The most straightforward method of labeling for filtering is labeling that indicates the nature of the content of the Web page or site. This can be accomplished either by third parties who label sites according to some established set of categories that indicate their content, or by the producer of the site. This is, in effect, the human version of the statistically based automatic text classification described above. The filter then works by establishing which categories of sites are allowed to be presented, reading the appropriate label in the metadata, and refusing all sites whose category is on a black list of categories (or, in white-list mode, is not on a white list).

A common framework for labeling is the Platform for Internet Content Selection (PICS; see Box 2.6). In the domain of television, the V-chip is a filter that is based on labeling. (Movies and video games also have labels
(i.e., ratings) that often appear before a program is televised or a game is played, but these are not machine-readable. Further, these labels are intended to provide advice to consumers rather than to enable technological denial.)

Filtering Using Combinations of Methods

All of the filtering technologies discussed above have inherent uncertainties associated with them, which lead them to make errors both of commission (misinterpreting a site as inappropriate) and of omission (not identifying an inappropriate site). However, the sources of error in each of the techniques are different. Thus, by combining the various techniques, the level of error can be reduced. For example, if image analysis indicates a high probability of a naked person, but textual analysis finds none of the words usually associated with adult-oriented material, analysis of the associated URL finds the domain .gov, and the metadata indicates that the owner of the site is the National Gallery of Art, the filter would be justified in predicting that the site should not be regarded as containing adult-oriented, sexually explicit material, despite the evidence from image analysis. Such methods show promise in improving filter performance.

Trade-offs in Filtering

As mentioned above, filtering is subject to two kinds of error: errors of commission (referred to in this volume as Type I errors, or overblocking) and errors of omission (Type II errors, or underblocking). In the information retrieval literature (see Appendix C), these kinds of errors are associated, respectively, with the performance measures of precision and recall. The first type of error, overblocking, occurs when a site that is appropriate is filtered, i.e., is deemed inappropriate and therefore denied to the user.
The second type of error, underblocking, occurs when a site that is in fact inappropriate is deemed appropriate and therefore permitted to the user. Due to the nature of filtering, these two types of errors are inevitable. It is possible to adjust the method of filtering so that the occurrence of one type of error is reduced; however, for any given filtering method, reducing one type of error will always result in increasing the other. For instance, one can reduce underblocking by setting the standard for what is inappropriate at a very low level (e.g., denying access to all sites or refusing all queries that contain the word "adult" or the word "sex"). This might result in many sexually explicit sites being successfully filtered, but it will clearly also result in a concomitant increase in overblocking, since many obviously
appropriate sites will also be filtered.20 In some settings (e.g., in doing research), it is desirable to minimize overblocking. In other settings (e.g., in households that are highly risk-averse), it is desirable to minimize underblocking. But it is not possible to minimize both simultaneously. Note also that even a low rate of overblocking will still cause a large number of pages to be blocked, simply because most of the content on the Web is innocuous. Quantitatively estimating the rates of these two types of errors, or the rates of success in blocking and not blocking, depends on knowledge (or estimation) of four numerical parameters, as indicated in Box 2.7.

Placement of Filters
Filters can be installed in a variety of places. Some ISPs use filters to screen the content they pass on to their subscribers. The major Internet browsers (Internet Explorer and Netscape) support label-based filtering. Some search engines provide users with the option to perform filtered searches. Third-party commercial software vendors sell stand-alone filters that can be installed on a personal computer or on a local area network serving an organization (e.g., a school or a library system). See Section 12.1.1 for a more detailed discussion of this issue.

2.3.2 Technologies for Authentication and Age Verification
The process of authentication involves assessing the validity of an assertion about the identity of a user.21 (Note that a separate issue relates to the identification of a specific piece of software or hardware being used (Box 2.8). When only a specific individual is using that software or hardware, the authentication problem is reduced to that of identifying the specific software or hardware in use. But in general, multiple users of a given software or hardware system must be assumed.22)

20 This is a real example from a filtering system that was encountered at one of the site visits.
21 In this report, the term "identity" is used in its colloquial sense, namely for the biological life form (the human being) in question. Security specialists often refer to identity more generally as a collection of information about an individual. For more discussion, see CSTB's forthcoming study on authentication technologies, with project information available at <http://www.cstb.org/web/projects/authentication>.

22 For more discussion of authentication technologies, see Computer Science and Telecommunications Board, National Research Council, Computers at Risk, 1991; Cryptography's Role in Securing the Information Society, Kenneth W. Dam and Herbert S. Lin, eds., 1996; Trust in Cyberspace, 1999; and Realizing the Potential of C4I: Fundamental Challenges, 1999, all published by National Academy Press, Washington, D.C. CSTB's forthcoming study on authentication will address these technologies comprehensively (see footnote 21).
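At its simplest, checking an assertion of identity means comparing something the claimant supplies now against something recorded earlier. The example below is a minimal sketch in Python, assuming a password ("something the person knows") as the shared secret: at enrollment the system stores a random salt and a slow, salted hash (PBKDF2 from the standard library), and at login it recomputes the hash and compares the two in constant time. The function names `enroll` and `authenticate` and the iteration count are illustrative assumptions, not a recommended configuration.

```python
import hashlib
import hmac
import secrets

# Illustrative work factor only; an assumption, not a recommendation.
ITERATIONS = 100_000


def enroll(password: str) -> tuple[bytes, bytes]:
    """Store a random salt and a slow hash of the password, never the password itself."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def authenticate(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the hash for the claimed password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)
```

A successful check shows only that the claimant knows the secret recorded at enrollment. It says nothing about who else might know it, or about the claimant's age, which is why age verification is treated as a separate problem in what follows.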
In the physical world, the authentication process is conceptually straightforward because of face-to-face interactions. When an individual buying beer presents a driver's license to a liquor store clerk, the clerk can compare the picture on the license to the individual in front of him. Of course, the license could be phony, but the face-to-face nature of the interaction helps to ensure that the subject being compared to the credential is real.23

Such assurance is not available when a face-to-face interaction is not possible, as in the automated authentication of a user to a computer system.24 Automated authentication depends on the prospective system user sharing with the authentication device something the person knows, has, or includes as a feature, such as a "smart card" belonging to the appropriate individual, a secret password, the individual's voice, or a biometric signature such as a fingerprint or retinal pattern.

Authentication is only one dimension of keeping children away from age-inappropriate materials. The second key element is that of ensuring that a user is older than some specified age (e.g., older than 17). While authentication involves assessing the validity of an assertion about the identity of a user, it does not speak directly to the issue of age verification. Assurance about age must, in general, be provided by reference to a document that provides information about it, and today's infrastructures needed to support online authentication of identity generally do not include such documents.

In the physical world, age verification can be provided as a part of the credential being presented: a driver's license generally has a date of birth recorded on it. However, a driver's license would be just as good an authenticator of identity if it did not have the date of birth on it.

23 Indeed, in the physical world, someone who presents a fake ID that is recognized as such by the clerk is subject to arrest.

24 In principle, age verification could occur through the use of streaming video and audio. In this scenario, a Web camera and microphone located at the user's access point would be used to transmit a high-fidelity voice and video image to a human being working on behalf of the adult content provider. The human being (who might be called a cyberspace "bouncer") would ascertain adult status from viewing the image and listening to the voice, and if there were any doubt, the bouncer would demand to see a driver's license that the alleged adult could hold up to the camera. Even through voice alone, a trained human verifier can often determine whether the person on the other end is in fact an adult, though this may not always work for very young adults. The human verifier asks questions, and then listens for tone of voice, composure, presence, stuttering, and other things that are not reflected in a typed textual interaction. Because adults tend to have more confidence and self-assurance than children, such voice interactions provide valuable distinguishing information. These scenarios are technically feasible even today, but are not likely to be economically attractive. The reason is that one of the major advantages of Internet commerce is the ability to drastically reduce the extent to which human beings are involved. Given that many adult-oriented Web sites operate on very thin margins, the cost of using such a mechanism would likely be prohibitive.

In an online environment, age verification is much more difficult because a pervasive, nationally available infrastructure for this purpose does not exist. One method is based on the fact that many adults (but not very many children) have credit cards: presentation of a valid credit card number is presumed to be an indicator that the presenter is an adult. Taken in the large, this is not a bad assumption; the vast majority of credit cards are in fact owned by adults, and the vast majority of minors do not own or have legitimate access to credit cards. Thus, an adult-oriented Web site that uses credit cards as its medium of exchange presumes that the presentation of a valid credit card also verifies that the card user is of legal age. Entering a valid credit card number grants access to the inside of the site.25

Many online adult verification services (AVSs), which provide a verification of adult status to other adult Web sites, also use credit cards.26 Because the credit card is generally the user's method of payment for the service, the AVS relies on the credit card to verify the adult status of the user.27

Another approach to age verification is to rely upon databases of public records (i.e., government-issued documents such as voter registrations and/or drivers' licenses). For example, an individual wishing to gain access to an adults-only service sends an online request to an age verification service (along with a credit card number to effect payment) for a certification of his or her age. He or she also provides appropriate personal information, and the adult verification service checks that information against public records, such as state drivers' licenses and voter registrations, that contain or imply age information.
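The public-records approach amounts to two steps: find a record that matches the personal information the user submitted, then compute age from the recorded date of birth. The sketch below assumes a hypothetical in-memory table, `PUBLIC_RECORDS`, standing in for state driver's-license or voter-registration data, and a hypothetical matching key of last name, first name, and zip code; real services match on more fields and must cope with inconsistent records. It also shows how sharp the legal cutoff is: the same person fails the check the day before an 18th birthday and passes on the birthday itself.

```python
from datetime import date

# Hypothetical stand-in for public records (e.g., driver's-license data).
# Key: (last name, first name, zip code) -> date of birth. Illustrative only.
PUBLIC_RECORDS = {
    ("doe", "jane", "20001"): date(1980, 6, 15),
    ("roe", "richard", "94105"): date(2003, 3, 1),
}


def age_on(dob: date, today: date) -> int:
    """Completed years of age on `today`.

    Assumption: someone born on Feb. 29 is treated as aging on Mar. 1
    in non-leap years, since (2, 28) < (2, 29).
    """
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def verify_adult(last: str, first: str, zip_code: str, today: date,
                 min_age: int = 18) -> bool:
    """True only if the submitted details match a record showing min_age or older.

    A missing record counts as a failure, not as evidence of adulthood.
    """
    dob = PUBLIC_RECORDS.get((last.lower(), first.lower(), zip_code))
    if dob is None:
        return False
    return age_on(dob, today) >= min_age
```

For example, `verify_adult("Roe", "Richard", "94105", date(2021, 2, 28))` returns `False`, while the same call dated March 1, 2021 returns `True`. Treating a missing record as a failure is a design choice: it errs toward turning away adults who are absent from the records rather than admitting unverified users.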
Even higher confidence in age verification can be obtained by coupling the use of public record databases to an authentication process that provides assurance of identity. In this case, when adult status is confirmed, a credential certifying one's adult status is mailed (via the postal service) to the address of record on those public records. In this context, the postal service serves as an authenticating process that ensures the adult credential is sent to the right person. The individual can then use this special key to obtain access to adults-only services that recognize it.

25 Determining with certainty whether a submitted credit card number corresponds to an account in good standing requires an online transaction between the site operator and the credit card company. That is, the site operator transmits the number to the credit card company, and the company checks to see if the number refers to an account in good standing. There are other methods that allow the offline identification of some invalid credit card numbers, but they can be defeated with a little effort and sophistication.

26 Such services also accept applications via 1-900 phone numbers (which children are not supposed to use without parental permission), whose charges are billed automatically to the caller's phone bill, and via U.S. mail. Mail applications are supposed to include proof of age.

27 The "typical" adult verification service provides the user with a special code number. Adult Web sites contract with the service (of which many exist). A user wishing access to one of these adult Web sites enters the code number. The adult Web site then contacts the AVS to confirm that the number is valid, and if it is, grants the user access. (The adult Web site usually pays the AVS a commission for users who are verified in this manner.)

A third approach is to use age verification scripts. An online script can guide a user through a questionnaire that asks, among other things, the user's age, and it can reject users who are underage. To help deal with the problem of lying about one's age, some scripts are written to accept only one attempt at entering age, so a user who enters "15" at first, is rejected for being underage, and then tries to enter "20" is unsuccessful. In such cases, he or she may have to try again from another computer.

Note that each of these methods imposes a cost in convenience of use, and the magnitude of this cost rises as the confidence in age verification increases. Age verification scripts are very convenient for the legitimate adult user, who must simply tell the truth about his or her age. But they are also susceptible to being fooled by a savvy adolescent who knows that an adult age must be entered. A credit card is less convenient for the legitimate adult user, because he or she must be willing to incur the expense of a subscription (or the hassle of canceling one). However, since most credit cards are owned by adults, the use of a credit card provides additional confidence that it is truly an adult who is seeking access. At the same time, some minors do own credit cards or prepaid cards that function as credit cards, while other minors are willing to use credit cards borrowed with or without permission from their parents.
(Even when parents review credit card statements, either their own or those of their children, they may not be able to identify transactions made with adult-oriented, sexually explicit Web sites, as the adult nature of such transactions is often not readily identifiable from information provided on the statement.) Using public record databases to verify adult status provides additional confidence in age, but increases the amount of personal information that the user must provide to gain access. Mailing the certifying credential to the user provides the greatest confidence of all that the alleged adult is truly an adult, but because the user must wait for the processing and mailing of the adult credential, it is also the least convenient.

Claims have been made that certain "biometric" signatures can differentiate between adults and children. While human physiology does indeed dictate that certain changes in one's body occur as one grows from child to adult, the precise trajectory of these changes varies from individual to individual. However, one's legal status as being entitled to privileges as an adult that are not enjoyed as a child is fixed by laws that specify, for example, that individuals even one day over 18 are considered adults and individuals one day under 18 are considered unemancipated minors. No technology today or on the horizon can hope to make such fine distinctions in the case of individuals.28 For this reason, biometric technologies as a method for age verification are not considered here. Age verification technologies as integrated into functional systems are discussed in greater detail in Chapter 13.

2.3.3 Encryption (and End-to-End Opacity)
Encryption is used to hide information from all but specific authorized parties. In the most general encryption process, an originator (the first party) creates a message intended for a recipient (the second party), protects (encrypts) it by a cryptographic process, and transmits it as ciphertext. The receiving party decrypts the received ciphertext message to reveal its true content, the plaintext. Anyone else (a third party) who wishes undetected and unauthorized access to the message must penetrate (by cryptanalysis) the protection afforded by the cryptographic process or obtain the relevant decryption key by some other means, such as bribing someone to reveal it.

Encryption also has relevance to the protection of digitized intellectual property, such as proprietary images. Because encryption restricts the access of unauthorized parties, it can be used to help prevent the dissemination of unauthorized reproductions of digital objects. Encryption is thus the fundamental technology underlying digital rights management systems (discussed in greater detail in Chapter 13). The use of encryption may increase dramatically in the coming years.
In the context of this study, the significance of encryption is that if content, whether acceptable or inappropriate, is encrypted properly, it cannot be identified by third parties. Thus, while it is possible to interdict all information flows that are encrypted, it is impossible to interdict specific transmissions on the basis of content, a point with obvious relevance to filtering systems intended to block specific content. In short, encryption allows transmission and reception of information to occur with essentially no outside scrutiny possible.

28 See, for example, testimony of John Woodward, senior policy analyst, RAND, to the COPA Commission on June 9, 2000. Available online at <http://www.copacommission.org/meetings/hearing1/woodward.test.pdf>.
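The consequence for filtering can be shown with a toy example. The sketch below uses a one-time pad (XOR with a random key as long as the message, used only once), chosen solely because it needs nothing beyond Python's standard library; deployed systems instead rely on ciphers such as AES inside protocols such as TLS. `BLOCKED_WORDS` and `keyword_filter_blocks` are hypothetical stand-ins for a keyword-based filter sitting between sender and recipient.

```python
import secrets

# Hypothetical keyword list for a naive in-the-middle content filter.
BLOCKED_WORDS = ("sex", "adult")


def keyword_filter_blocks(payload: bytes) -> bool:
    """Block if any listed word appears in the bytes (case-insensitive)."""
    text = payload.decode("latin-1").lower()  # latin-1 maps every byte to a character
    return any(word in text for word in BLOCKED_WORDS)


def otp_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """One-time pad: XOR with a fresh random key as long as the message."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext


def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR again with the same key to recover the plaintext."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))


message = b"Adult-oriented material"
key, ciphertext = otp_encrypt(message)
print(keyword_filter_blocks(message))     # the plaintext is flagged
print(keyword_filter_blocks(ciphertext))  # what the filter sees in transit
```

The filter flags the plaintext, but the ciphertext it would actually see in transit is effectively random bytes with nothing to match; only the recipient, holding `key`, recovers the words. A filter in the middle can therefore pass encrypted traffic or block all of it, but cannot block it selectively by content.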
2.3.4 Anonymizers
As noted in Section 2.1.2, the technology of the Internet itself does not generally require any party to authenticate its identity. Thus, users and online identities (e.g., a screen name or an e-mail address) are bound together through administrative procedures, usually those of an ISP, that are associated with gaining access to the Internet. Through such bindings, any interaction of an individual with an Internet-related service (whether visiting a Web page, sending an e-mail, posting a message, setting up a Web page, or participating in a chat room) is tied to a specific identity that can, in principle, be traced administratively back to that specific individual.

Anonymizers break this binding and decouple an individual from a specific online identity. The anonymizer provides what amounts to a randomly generated identity. This identity is then used for posting messages, sending e-mail, participating in chats, and accessing Web pages. (Some anonymizers enable return paths when necessary; for example, the recipient of an anonymous e-mail may wish to reply to the anonymous sender.) However, anyone seeking to trace the anonymized identity back to the original user will find a number of barriers that make it very difficult to recover the identity of the original user. One example of an anonymizer useful for publishing information on the Web is described in Box 2.9.

Anonymizers are significant because they enable individuals to undertake activities without fear of retribution. For an individual living in a totalitarian state, an anonymizer enables him or her to post an anti-government message in safety or to browse forbidden Web sites. In the United States, it enables someone to freely post a message expressing unpopular political views or to browse Web sites in privacy.
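The binding that an anonymizer breaks, and the return path it can preserve, can be sketched as a relay that keeps a private table from random aliases to real identities. The `Anonymizer` class below is a hypothetical, single-relay toy, not a description of any particular service (including the one in Box 2.9): recipients see only the alias, and replies are routed back through the relay's table.

```python
import secrets


class Anonymizer:
    """Toy pseudonymous relay that swaps a real identity for a random alias.

    Only the relay holds the alias-to-identity table; recipients see the alias.
    """

    def __init__(self) -> None:
        self._alias_to_real: dict[str, str] = {}

    def new_alias(self, real_identity: str) -> str:
        """Issue a fresh random alias bound to the real identity."""
        alias = "anon-" + secrets.token_hex(8)
        self._alias_to_real[alias] = real_identity
        return alias

    def forward(self, alias: str, message: str) -> dict[str, str]:
        """What the recipient receives: the alias as sender, never the real identity."""
        if alias not in self._alias_to_real:
            raise KeyError("unknown alias")
        return {"from": alias, "body": message}

    def reply(self, alias: str, message: str) -> tuple[str, str]:
        """Return path: the relay looks up the real identity and delivers the reply."""
        return self._alias_to_real[alias], message
```

In this toy design the relay's table is the one place where an alias can be traced back, so whoever operates the relay (or can compel it to disclose its records) can undo the anonymity. Designs that chain several relays, or that keep no table at all, make tracing harder still, generally at some cost to the return path.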
Commercial enterprises, which need to have a way to accept money, do not have much use for anonymizers, even if they are posting materials that may be controversial. But those with non-commercial interests can use the same technology to anonymously post child pornography or to harass or stalk an individual online. When anonymizers are used, tracing the identity of online criminal perpetrators becomes difficult.

2.3.5 Location Verification
The legal regimes of today are ones in which jurisdiction is based largely on geographical borders. For example, as noted in Chapter 4, "community standards" are an important factor in determining whether a given image is obscene. However, the Internet is designed and structured in such a way that geographical borders and the physical location of
a user have no significance for the functionality he or she expects from the Internet or any resources to which he or she is connected. This fact raises the question of the extent to which a user's location can in fact be established.

One way to establish location is simply to ask the user where he or she is located upon logging in. Thus, the first screen seen by the user might ask for his or her present zip code (or state, or country). But if the user chooses to be deceptive (e.g., to avoid restrictions on Internet service based on his or her location), the problem shifts to one of determining location through technological means.

Under some circumstances, it can be virtually impossible to determine the precise physical location of an Internet user. Consider, for example, the case of an individual connecting to the Internet through a dial-up modem. It is not an unreasonable assumption that the user is most likely in the region in which calls to the dial-up number are local, simply because most people would not want to incur long-distance calling costs for such connections. However, nothing prevents a user from using a long-distance telephone call (e.g., from Tennessee) to access a modem in California.
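A common technological route is to infer location from the user's IP address by finding which administrative entity (for example, an ISP in a particular country) holds the address block containing it. The sketch below assumes a small hypothetical allocation table built from the reserved documentation ranges of RFC 5737, not real allocations, and uses longest-prefix matching with Python's standard `ipaddress` module so that a more specific block overrides the broader one containing it.

```python
import ipaddress
from typing import Optional

# Hypothetical allocation table: address block -> country of the block's holder.
# All blocks are reserved documentation ranges (RFC 5737), not real allocations.
ALLOCATIONS = {
    ipaddress.ip_network("192.0.2.0/24"): "FR",     # hypothetical French ISP
    ipaddress.ip_network("192.0.2.128/25"): "GB",   # hypothetical sub-block re-assigned to a UK customer
    ipaddress.ip_network("198.51.100.0/24"): "US",  # hypothetical US ISP
}


def guess_country(addr: str) -> Optional[str]:
    """Longest-prefix match of an address against ALLOCATIONS.

    Returns None when no block matches (IPv6 addresses never match the
    IPv4 blocks here). The answer is only the block holder's country,
    a rule of thumb rather than the user's physical location.
    """
    ip = ipaddress.ip_address(addr)
    matches = [net for net in ALLOCATIONS if ip in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return ALLOCATIONS[best]
```

The result names the country associated with the block's holder, not where the user sits. In the dial-up case above, the address belongs to the ISP operating the California modem pool, so a caller from Tennessee would be placed wherever that ISP's block is registered.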
In practice, recovering location information is a complex and time-consuming process.29 As a rule, the information needed to ascertain the geographic location of an IP address associated with a fixed (wired) Internet access point at a given time is known collectively by a number of administrative entities, and could be aggregated automatically. But there is no protocol in place to pass this information to relevant parties, and thus such aggregation is not done today. The bottom line is that determining the physical location of most Internet users is a challenging task today, though this task is likely to be easier in the future. Appendix C provides additional discussion.

29 While location information is not provided automatically from the IP addresses that an administrative entity allocates, some location information can be inferred. For example, if the administrative entity is an ISP, and the ISP is a French ISP, it is likely (though not certain) that most of its subscribers are located in France. Of course, a large French company using this ISP might well have branch offices in London, so the geographical correspondence between a French ISP and an Internet user will not always hold in such a case, though as a rule of thumb, it is not a bad working assumption.

2.4 WHAT THE FUTURE MAY BRING
The hardest part of this report to calibrate is how the future will change the technologies that today scope both the problem and any putative solutions. As of this writing (May 2002), the World Wide Web is not even a decade old, while new technologies are being created and adopted at generally accelerating rates.

The rapid changes of capability in the hardware underlying information technologies will lead to computing that is 100 times more cost-effective, storage 1,000 times more cost-effective, and bandwidth 10,000 times more cost-effective 10 years hence, and it is highly likely that many applications will emerge to take advantage of such increased capability, as has occurred in the past. What follows below is admittedly speculative, but even if any given speculation is far from the mark, taken together these notions paint a portrait of a very different technological milieu in which the age-old problem of "protecting children on the Internet" will play out in the future.

· Mechanisms for financial transactions will change significantly over the course of a decade. Financial transactions are likely to become increasingly less private, as the various forms of payment embody different features to enable traceability. Even cash may become more traceable in the future. This development will favor parents who wish to monitor the expenditures of their children, but will have no impact on those children who borrow electronic wallets at home or who access those sources of sexually explicit material that do not charge.
· Voice interaction with computers will become increasingly common, and the capability of computer-generated voices to sound like real people, or even parties known to an individual, will increase. Today, a 55-year-old man can pretend to be a 13-year-old girl using e-mail and instant messages; tomorrow, a 55-year-old man may be able to sound just like a 13-year-old girl over the telephone. It may even be possible for the same 55-year-old man to sound like the girl's mother. In short, technology will offer greater deceptive capabilities, and those most at risk from the existence of such capabilities are likely to be children who lack the experience to identify deception.
· Voice interaction will allow younger children, who would find typing difficult, to speak a Web site address to their computer.
· Peer-to-peer interactions will be increasingly common, as the technology will largely eliminate the need for large-scale servers, thus eliminating them as principal points of leverage for any control strategy. It is already more expensive to delete content selectively than to keep it all, and increasingly so; this economic fact will dominate the future, with implications for privacy, digital rights management, and the steady accumulation of data that is best described as digital detritus.
· Virtual reality advances will soon defeat the ability of even experts to distinguish pictures that are real from those that are synthetic. Haptic devices (i.e., touch-, motion-, and pressure-sensitive devices) may become more common as a way to interface with computers. Whether a person, an action, or an event is real or not may then soon be irrelevant to many consumers. Action, especially "action" in the sexual and violent sub-meanings of that word, will be as realistic as the audience is willing to pay for, and the prices of such offerings will inevitably drop.
· Locations from which access to the Internet is possible will proliferate wildly. And, with an expansion in the types of information resources that are accessible (e.g., new virtual reality resources), policies that give permission to view, access, modify, or delete any information resource will present an enormously complex problem simply as a result of scale. Fine-grained, policy-driven access control is already, or soon will be, beyond the scope of human management and may be beyond the scope of mechanistic alternatives. If access control policies are impossible to formulate, the only alternative is an approach that depends on users to exercise self-control. Monitoring of user actions in order to ensure appropriately self-controlled users then becomes the only technical alternative to access control. This is not a statement about the desirability of this outcome, only an observation that it is possible if access control policies become impractical.
Although the notions described above are not necessarily desirable from a societal or personal standpoint, they are extrapolations of certain phenomena today, and there are at least some paths from today that could result in their coming true. On the other hand, they may not come true, a point that emphasizes a vast range of uncertainty about the technological future.

What has been true over the years is that those who produce and consume sexual content, for both commercial and non-commercial purposes, have stayed on the leading edge of new technologies.30 Thus, whatever the technological future is like in detail, it seems safe to predict with reasonably high confidence that sexual content will be disproportionately present in the initial stages of adoption of any new technology. Because technology changes rapidly, no final technological solutions are possible. It is for this reason, among others, that the committee in later chapters emphasizes social and educational strategies for protecting children from inappropriate sexually explicit material.

Finally, many of the issues associated with protecting children from inappropriate material and experiences on the Internet relate to the architecture of the Internet as it exists today, a state of affairs that reflects policy and engineering decisions made decades ago. These decisions are not immutable, though major changes that might facilitate control of content delivery could be made only at very considerable cost and at the potential expense of other societal interests.

30 For example, the video cassette recorder, inexpensive video cameras, and CD-ROM technologies found some of their first applications in the production and viewing of sexually explicit "adult" movies and interactive sexual games and entertainment. For one perspective on this point, see Jonathan Coopersmith, 2000, "Pornography, Videotape, and the Internet," IEEE Technology and Society 19(1): 27-34.