8
Internet Navigation: Selected Prospects and Issues

In this chapter, the committee explores a number of factors that are likely to shape the future of Internet navigation. The exposition that follows should not be construed as a comprehensive or representative treatment of these issues. Internet navigation encompasses a number of the established subdisciplines of computer and information science such as information retrieval, database management, human-computer interface, computer algorithms, information economics, and intellectual property law, to name only some of them. The brief discussion that follows only touches on a selected number of these subdisciplines—and only for those issues that came to the attention of the committee during its deliberations.

8.1 TECHNOLOGICAL PROSPECTS

Despite the relative success of the current array of Internet navigation services in satisfying their diverse and numerous users and providers, in the future they will be faced both with pressures to improve further and with technology-driven opportunities to do so.

Those pressures and opportunities have motivated a wide range of research and development activity. Part of this activity is devoted to advancing key technologies, three of which are navigation service algorithms and operations, navigation interfaces, and navigation to audio and visual materials. Another part is dedicated to improving navigation performance by addressing some of the distinctive features of Internet navigation (as noted in Section 6.1)—making use of contextual information, improving persistence, and understanding user behavior.




Signposts in Cyberspace: The Domain Name System and Internet Navigation

8.1.1 Navigation Service Algorithms and Operations

Efforts to improve Internet navigation services[1] are being undertaken in several areas that include:

- Increasing the amount of material indexed and the frequency of indexing.[2] This is a topic of competitive research and development among commercial search services and is dependent primarily on the available computing and storage capacities. Most of the effort goes into increasing the computing capabilities and storage facilities deployed. There is also a trade-off between the size of the computational resources and the depth to which sites are searched.

- Improving algorithms for matching requests with results.[3] Commercial search services devote substantial effort to improving these algorithms, and there is a large and vibrant community studying them in academic and other research institutions.[4]

- Delimiting and describing specific regions of search. In many cases, users wish to limit the scope of their search. For example, searches may be limited to a particular site or Uniform Resource Locator (URL), to definitions, to telephone numbers, to a range of dates, to specific locations, and to a number of other special regions. Many other categories could be used to limit or filter results (e.g., a person, a book, an article).

- Autonomous collection of information by search agents. Software agents[5] to automate access to information have long been predicted. Research efforts continue to look for ways to use agents to aggregate news and information automatically, based on a person’s interests. Some of the more interesting recent examples look for “deals” on, for example, auction sites and travel sites.[6]

- Search specialized for non-Roman scripts and various cultures. There has been a considerable amount of work on commercial navigation tools, much of it government supported, in Asia, especially Korea and China. Although it is inspired by the need to work with distinctly different Asian language/culture/character sets, the techniques developed may prove to be applicable globally. The work has focused on intentionally populated directory systems and especially KEYWORD (see Section 7.1.4) systems.[7]

Efforts to improve the algorithms and operations of Internet navigation services will continue, and are likely to increase, because of competitive pressures, evolving user requirements, and technological advances. Unlike the early days, when almost all research and even development was done within academic settings, commercial organizations now devote substantial resources to development and even research. However, research at universities and research organizations continues to be active, often with federal government support, and can be a source of distinctly new approaches. Furthermore, many academics are working collaboratively with commercial technologists, facilitating the transfer of ideas between academia and industry.

[1] For an overview of research on information retrieval that underlies much of Internet navigation technology, see Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, Wokingham, U.K., 1999.
[2] See, for example, Baeza-Yates and Ribeiro-Neto, Chapter 8, “Indexing and Searching,” written with Gonzalo Navarro, in Modern Information Retrieval, 1999.
[3] See, for example, Baeza-Yates and Ribeiro-Neto, Chapter 5, in Modern Information Retrieval, 1999.
[4] For example, see Michael Kanellos, “Next Generation Search Tools to Refine Results,” Techrepublic.com, August 9, 2004, available at <http://techrepublic.com.com/5100-22_11-5302095.html>. In addition, the considerable worldwide research activity is reported in conferences and publications sponsored by TREC (Text Retrieval Conference), which is supported by the National Institute of Standards and Technology and the Department of Defense, and the Special Interest Group on Information Retrieval (SIGIR) of the Association for Computing Machinery (ACM). Information on TREC can be found at <http://trec.nist.gov>. Information on SIGIR can be found at <http://www.acm.org/sigir>.
[5] According to the Dublin Core Metadata Glossary, “A computer program that carries out tasks on behalf of another entity. Frequently used to reference a program that searches the Internet for information meeting the specified requirements of an individual user.” The Dublin Core Web site is at <http://www.purl.org/dc/>.
[6] For flights, hotels, and rental cars, SideStep (<http://www.sidestep.com>) claims to search the Web for travel values, presenting them to the user side by side with Expedia or Travelocity results, allowing for comparisons. For extensive information on software agents, see the University of Maryland, Baltimore County’s Agent Web, accessible at <http://agents.umbc.edu/about.shtml>.
[7] Two examples are (1) Netpia, a Korean Internet service that enables substitution of a native language word or phrase (a KEYWORD) for a unique URL (see <http://e.netpia.com>) and (2) Beijing 3721 Technology Co., Ltd., which has offered Chinese language keywords since 1999 (see <http://www.3721.com/english/about.htm>).

8.1.2 Navigation Interfaces

Interfaces play a key role both in the creation of a query and in the display of the results of that query.[8] For many of its general users, one of Google’s most attractive features, since adopted by other search services, is the simplicity of its single-line basic query interface. For those so inclined and skilled, queries can be further specified through the additional capabilities in “Advanced Search.” The clarity of the structure of Google’s display of results, with a clear separation between algorithmic search results and those that are sponsored, has also contributed to its success with many users.

Further improvements in the query interface that would enable relatively unsophisticated users to characterize their queries more precisely would be desirable, although to succeed they will have to remain very easy to use.

There is, as well, room for improvement in the display of query results. For example, the relevance-ranked listing that most search engines produce or the alphabetical listing that many directories provide might be improved by displaying the relationships among the listed responses in a more readily grasped visual form.

A substantial body of research on the display of information exists. In the late 1980s and early 1990s, the Xerox Palo Alto Research Center (PARC),[9] in particular, developed several novel display representations including cones, fish-eye views, and hyperbolic trees.[10] Researchers at Apple, the Massachusetts Institute of Technology, and elsewhere have experimented with arranging webs of information (including search results) as three-dimensional spaces; see, for example, the (now discontinued) Apple Hot Sauce project.[11] Others have experimented with mapping results onto two-dimensional spaces. See, for example, Kartoo, a metasearch engine that displays the search term (keyword) in a map with links to a range of related terms,[12] and Grokker2, which groups and maps the results of a metasearch of the Web (and some sites, including Amazon.com) by subtopics.[13] The display of query results is a subset of the larger field of information visualization, which incorporates the visual display of data of all kinds.[14] Research in that field may very well lead to new methods for visualizing query results.

Still other experiments have been directed at simplifying the management of the search. Built-in search boxes, add-in tool bars, frames (in Web pages), sidebars, and tabs are a few of the browser additions that help users manage searches (among other things). At various times, a number of companies have offered browser add-ons or browser companions to aid Web navigation and searching by collecting and displaying commentary on the pages being viewed. Most of these applications failed as commercial products, even though their interface ideas appeared to have merit.

Microsoft is expected to incorporate an Internet search interface in its next-generation operating system, code-named “Longhorn.” It is anticipated that the search interface will be the same for searching the Internet, the local network’s files, and the local computer files.[15] This feature will encourage users to consider search an integral function of the operating system, rather than a separate application available only through a browser.

Future interface designers will also continue to be faced with designing interfaces to fit within form factors[16] ranging from small (e.g., cell phones[17] and personal digital assistants) to expansive (multiscreen wall-size displays) and with employing one or more of a variety of sensory systems (auditory, visual, tactile) to communicate under diverse circumstances.

[8] For background on this subject, see, for example, Marti Hearst, “User Interfaces and Visualization,” Chapter 10 in Baeza-Yates and Ribeiro-Neto, Modern Information Retrieval, 1999.
[9] Xerox PARC was founded in 1970. In 2002, it became incorporated as PARC, a subsidiary of the Xerox Corporation. See <http://www.parc.com/about/factsheet.html>.
[10] See <http://www2.parc.com/istl/projects/uir> for a description of the Palo Alto Research Center’s user interface research projects.
[11] See <http://www.eclectica-systems.co.uk/complex/hotsauce.php>.
[12] See <http://www.kartoo.com/>.
[13] See <http://www.groxis.com/>.
[14] The annual IEEE Symposium on Information Visualization is a good source of information on current research on the subject. For information about the 2004 conference, see <http://infovis.org/infovis2004/>.
[15] See Michael Kanellos, “Microsoft Aims for Search on Its Own Terms,” CNET News.com, November 24, 2003, available at <http://news.com.com/Microsoft+aims+for+search+on+its+own+terms/2100-1008_3-5110910.html?tag=nl>. “Microsoft has set a firmer date for the release of its desktop search software, after Google launched a test version of its rival program for scouring a PC’s hard drive,” reported in Ina Fried, “Microsoft Fixes Date for Desktop Search Tool,” CNET News.com, October 22, 2004, available at <http://news.zdnet.com/2100-3513_22-5423080.html>.
[16] The “form factor” of a device is its physical size and shape. The form factors of cell phones, personal digital assistants, and laptop computers differ substantially, resulting in different size displays that generally require different interface designs.
[17] In October 2004, both Yahoo! and Google began offering search services from cell phones. Yahoo!’s service is called Yahoo! Mobile, and Google’s is Google SMS.

8.1.3 Navigation to Audio and Visual Materials

The increase of multimedia materials (containing digital images, audio, or video) available via the Internet has complicated the process of navigation by search engines, whose crawlers are challenged to extract index terms from still or moving images or from sounds. Tools to index audio well enough to support search services exist, but generally only for a particular input domain such as television news broadcasts or application-specific telephone conversations. Commercial video often has closed-captioning, obviating the need for recognition. Some technologies exist for searching images based on colors and shapes, but they are still in a relatively early stage of development.[18] Resources that incorporate multiple media, such as electronic literature that contains text, images, animation, and voice, are a particularly challenging search problem.[19]

Full accessibility for most multimedia materials, comparable to that for textual materials, will require development of technologies for their automatic indexing by search engines, which is a very difficult technology problem. For the foreseeable future, most effective multimedia search will depend on the use of metadata and associated text (see Section 7.1.5). Such metadata can be supplied manually, picked up by Web crawlers from page metatags, or extracted from text associated with still image, video, or audio files. A number of navigation services using these techniques are available on the Web to find multimedia materials.[20] Among them are Google Images, Yahoo! Search Images, AltaVista Photo Finder, FAST Multimedia Search, and Lycos Pictures and Sounds.

A navigation challenge common to all forms of multimedia search is standardization and automatic capture of the metadata to be used for indexing, which would improve the availability and accessibility of such materials.[21] Considerable research progress is being made in the searching of music by text, sound, and music notation,[22] which is an active area of academic research.[23] Video metadata is being pushed by industry forces, so it is reasonably far along. The MPEG-7 standard[24] for describing multimedia content in a form that can be used by a device or a program is highly developed, and deployment is likely to begin soon.

[18] For background on this subject, see, for example, Christos Faloutsos, “Multimedia IR: Indexing and Searching,” Chapter 12 in Baeza-Yates and Ribeiro-Neto, Modern Information Retrieval, 1999.
[19] For background on this subject, see, for example, Elisa Bertino, Barbara Catania, and Elena Ferrari, “Multimedia IR: Models and Languages,” Chapter 11 in Baeza-Yates and Ribeiro-Neto, Modern Information Retrieval, 1999. Current research activities are reported, for example, in the Conferences on Image and Video Retrieval (CIVR), a series held since 1998. Links to the conferences can be found at <http://www.informatik.uni-trier.de/~ley/db/conf/civr/>.
[20] See Danny Sullivan, “Multimedia Search Engines,” SearchEngineWatch, January 25, 2002, available at <http://www.searchenginewatch.com/links/article.php/2156251>.
[21] See <http://www.chin.gc.ca/English/Standards/metadata_multimedia.html> for an overview of the topic. Research on computer-assisted extraction of metadata from scholarly material associated with images is underway in the CLiMB project at Columbia University. See <http://www.columbia.edu/cu/cria/climb/>.
[22] For example, look at the work presented at the 5th International Conference on Music Information Retrieval, available at <http://ismir2004.ismir.net/>.
[23] One example is the work underway at Carnegie Mellon University in the Informedia project on “digital video understanding,” which aims “to achieve machine understanding of video and film media, including all aspects of search, retrieval, visualization and summarization in both contemporaneous and archival content collections.” See <http://www.informedia.cs.cmu.edu/>.
[24] See <http://www.chiariglione.org/mpeg/index.htm> and also Rob Koenen, “From MPEG-1 to MPEG-21: Creating an Interoperable Multimedia Infrastructure,” 2001, available at <http://www.chiariglione.org/mpeg/from_mpeg-1_to_mpeg-21.htm>.
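The reliance on metadata and associated text described in Section 8.1.3 can be illustrated with a minimal sketch. It assumes a crawler that has already fetched a page's HTML and wants to index the media files on it using only page metatags and the text attached to each media element. The class name `MediaTextExtractor`, the helper functions, and the choice of attributes (`alt`, `title`, and the `keywords` metatag) are illustrative assumptions, not a description of how any particular search service works.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MediaTextExtractor(HTMLParser):
    """Hypothetical helper: gathers text associated with media elements."""

    MEDIA_TAGS = {"img", "audio", "video"}

    def __init__(self):
        super().__init__()
        self.page_keywords = []   # terms from <meta name="keywords">
        self.media = {}           # media URL -> terms from alt/title text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            self.page_keywords += tokenize(attrs.get("content") or "")
        elif tag in self.MEDIA_TAGS and attrs.get("src"):
            text = " ".join(filter(None, (attrs.get("alt"), attrs.get("title"))))
            self.media.setdefault(attrs["src"], []).extend(tokenize(text))


def build_index(html):
    """Return an inverted index: term -> set of media URLs on the page."""
    parser = MediaTextExtractor()
    parser.feed(html)
    index = defaultdict(set)
    for url, terms in parser.media.items():
        # Each media file is indexed by its own text plus page-level keywords.
        for term in terms + parser.page_keywords:
            index[term].add(url)
    return index
```

A file with no associated text (for example, an image without `alt` or `title`) is reachable in this sketch only through page-level keywords, which mirrors the point made above: without textual clues, there is little for a text-based index to work with.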

Query by example[25] is another promising approach to multimedia search. Given an image, it is possible in experimental systems (and in some commercial image-processing software) to find others with similar shapes and colors.[26] However, given an image of horses, such techniques can only find other images with the general shapes, colors, and textures in the sample image, while missing images that have to do with horses but differ in those respects.

Conclusion: Indexing and retrieving multimedia materials on the Internet is an extremely difficult technical problem in its full generality, when there are few textual clues. However, for specific purposes or contexts, where textual descriptions are associated with the media, or where relatively low precision can be tolerated, the existing systems can suffice. Research prototypes and commercial offerings can be expected to continue to make slow but useful progress by focusing on specific subcases.

8.1.4 Making Greater Use of Contextual Information

As noted in Section 6.1.6, most current general Internet navigation services do not remember users’ recent searches. In most cases, each query is treated the same; the service collects no information about its users’ interests or search goals. While this protects the searcher’s privacy, it can also reduce the responsiveness of the search. In contrast, some site-specific navigation services make considerable use of previous search history to create user models and provide context for specifying searches. Amazon.com, for example, gathers and displays a running history of what has been seen within the current session and retains considerable information about what has been searched for or purchased previously that it uses to make user-specific recommendations.
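To make the idea of session context concrete, the following minimal sketch shows one way a service could use the queries seen earlier in a session to adjust the ranking of results for an ambiguous query. Everything in it is an assumption made for illustration: the function names, the additive scoring rule, and the `weight` parameter are hypothetical and do not describe the method of Amazon.com or of any search engine.

```python
import re
from collections import Counter


def terms(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rerank(results, session_queries, weight=0.2):
    """Reorder (title, base_score) pairs using terms from earlier queries.

    Each result's score is raised by `weight` for every occurrence, in the
    session history, of a term that also appears in the result's title.
    """
    history = Counter(t for query in session_queries for t in terms(query))

    def adjusted(item):
        title, base_score = item
        overlap = sum(history[t] for t in set(terms(title)))
        return base_score + weight * overlap

    return sorted(results, key=adjusted, reverse=True)


results = [("Jaguar car dealers", 0.9), ("Jaguar habitat and diet", 0.8)]
# With no history the base ranking stands; with history about wildlife,
# the second result moves to the top.
print(rerank(results, []))
print(rerank(results, ["rainforest habitat", "big cats"]))
```

The history here lives only in the list passed to `rerank`, so the same logic could run either at the service or on the user's own machine, a distinction that matters for the privacy issues this section raises.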
Theoretically, general Internet search engines could offer similar services to improve the ranking or filtering of results or to suggest additional searches. Another approach, which is less likely to raise privacy concerns, would be to have a navigation aid that captures contextual information on the user’s computer and uses that information to formulate context-aware requests to an Internet navigation service.

For many searches, knowing the geographical location of the users can help in providing the desired information. But should navigation services assume that users are seeking local or global information? At present, the default assumption of a general Internet navigation service is that users are seeking global information. However, in theory, navigation services could sort multiple matches by geographic location (for objects with geographic data, such as stores, restaurants, and libraries), listing the nearest matches first, as specialized travel reservation services already can do for hotels around a specific place.

In response to this perceived need, both Google and Yahoo! now allow searches to be localized through the entry of an address, a zip code, or a city name together with the subject keyword (e.g., “San Francisco Italian restaurants”).[27] The result is a listing of locally relevant Web sites, maps, and listings from businesses in the area. Both services also offer local businesses the opportunity to advertise in response to localized keyword queries. In addition, Google can obtain general information about the location of a query from the Internet Protocol (IP) address of the user, while Yahoo! could make use of its users’ addresses, which they provide when registering for e-mail, photo exchange, or other Yahoo! services.

The demand for geographically localized context information is likely to grow rapidly as information appliances become smaller and more portable. A New Yorker searching the Web from his or her cell phone while in Chicago is likely to want to find a restaurant in Chicago.[28] A navigation tool that made that assumption might, in that situation, be appreciated. However, although with today’s Internet there is no fully reliable way to determine the location of a searcher, technical tools do exist that offer good enough guesses to allow search engines to tune results to specific geographic areas (through, for example, the IP address). For example, such tools are currently being used to implement certain nationally required censorship practices on Yahoo! and eBay, such as the prohibition of the sale of Nazi memorabilia in France or of Mein Kampf[29] in Germany. Google will recognize Canada as the source of a search dialed in from there.[30] Of course, when the user enters geographic information voluntarily, or the device enters it automatically (as cell phones may soon be able to do), such geographic searches can be made easily. However, the automatic reporting of a user’s location to a search engine or other Internet service would raise significant privacy concerns.[31]

Conclusion: The collection of some contextual information about users by navigation services can be used to improve Internet navigation, but as the data become more detailed, difficult conceptual and implementation issues should be resolved and the associated privacy concerns addressed.

The increased use of contextual information is likely to include some combination of improvements in the collection and use of such information by the navigation services, extension of the option for users to enter specific contextual information (e.g., location), development of context-sensitive local aids directly under the user’s control, and improvements in the training and experience of users.

The incorporation into queries of information about the location of users, either automatically or voluntarily, and the addition of location filters into navigation services’ ranking algorithms are already underway and are likely to expand rapidly under the impetus of local advertising revenue.[32]

User modeling (the collection, retention, and use of information about specific users to assist in responding to their queries) is an active research area.[33] Creation of user models generates privacy concerns, and this is another area of active research.[34] Those user models where the user’s identity is known to the organization creating the model (such as Amazon.com) raise the greatest privacy concerns, as discussed further in Section 8.2.2. User models that are maintained on the client side and where the user can maintain control over what is known about him or her raise relatively fewer privacy concerns.

[25] Query by example for textual queries is used in several conventional database systems. The concept was developed by IBM in 1975.
[26] See <http://elib.cs.berkeley.edu/vision.html>.
[27] See Stefanie Olsen, “Google Goes Local,” CNET News.com, March 17, 2004, available at <http://news.com.com/2100-1038-5173685.html>; and Jefferson Graham, “Websites Test Local Search Marketing,” USA Today, February 6, 2004, available at <http://www.usatoday.com/tech/news/2004-02-04-localsearch_x.htm>.
[28] However, it is worth noting that while geographic context can increase the likelihood of obtaining more relevant information, it is not a perfect process. In the example given, the New Yorker might be searching for the name and phone number of a New York restaurant to provide to a Chicagoan in response to a query about recommendations for good restaurants in New York.
[29] There are a number of versions of Adolf Hitler’s Mein Kampf. One version is a translation to English by Ralph Manheim, Houghton-Mifflin, Boston, 1971.
[30] Examples provided by an anonymous reviewer.
[31] Such systems are likely to work effectively only if the user wants to be located. The user will have the option to disguise her location or to disable the system.
[32] In the latter part of 2004, several major Internet navigation service providers took steps to increase the level of personalization in their services. See Chris Sherman, “Yahoo Introduces Personal Search,” SearchEngineWatch, October 5, 2004, available at <http://searchenginewatch.com/searchday/article.php/3417111>; Gary Price, “Ask Jeeves Serves It Your Way,” SearchEngineWatch, September 21, 2004, available at <http://searchenginewatch.com/searchday/article.php/3410441>; and Leslie Walker and David A. Vise, “Google’s New Tool Brings Search Home,” Washington Post, October 15, 2004, p. E1, available at <http://www.washingtonpost.com/wp-dyn/articles/A34099-2004Oct14.html>.
[33] See Peter Brusilovsky and Carlo Tasso, “Preface to Special Issue, User Modeling for Web Information Retrieval,” User Modeling and User-Adapted Interaction: The Journal of Personalization Research 14(2):147-157, 2004.
[34] See Alfred Kobsa, “Personalized Hypermedia and International Privacy,” Communications of the ACM 45(5):64-67, 2002; Alfred Kobsa, “Tailoring Privacy to Users’ Needs,” 8th International Conference on User Modeling, Springer-Verlag, Sonthofen, Germany, 2001, available at <http://www.ics.uci.edu/~kobsa/papers/2001-UM01-kobsa.pdf>; and Alfred Kobsa and Jörg Schreck, “Privacy Through Pseudonymity in User-adaptive Systems,” ACM Transactions on Internet Technology, 2003, available at <http://www.ics.uci.edu/~kobsa/papers/2003-TOIT-kobsa.pdf>.

8.1.5 Improving Persistence

Section 6.1.7 characterizes the many reasons that resources once discovered at a particular location on the Internet may not be there when subsequently sought. While this transience is not a problem for many resources, it can be a difficulty for many others. For example, the references to Web resources throughout this report provide examples of materials that the report’s authors and readers would like to see persist but cannot control.

The notion of “persistence” of materials on the Internet is related to, but not identical with, the more traditional notion of “preservation.” Generally speaking, the goal of persistence is to maintain the same material at the same address for an indefinite period, so that once discovered there it can always be retrieved from that location in the identical form. Preservation, however, has the goal of saving the material for future reference, but not necessarily at the same address. In other words, finding something that has been preserved will require at least one additional discovery step: finding the location (e.g., in an archive) at which it has been preserved.

Persistence is most likely to be achieved through the adoption of practices by Web site managers and designers that leave unchanged the URLs of material judged valuable enough to persist and locate modified versions of those materials at new URLs. Consequently, unless Web site managers and designers widely adopt common persistence practices, the problem of transience will persist.
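The practice just described, leaving the URL of published material unchanged and placing each modified version at a new URL, can be illustrated with a minimal sketch. The `PersistentSite` class, its method names, and the `/name/vN` URL scheme are hypothetical choices made only for this example; the report does not prescribe any particular scheme.

```python
class PersistentSite:
    """Hypothetical in-memory model of a site with write-once URLs."""

    def __init__(self):
        self._pages = {}      # URL -> content; an entry is never overwritten
        self._versions = {}   # document name -> number of versions published

    def publish(self, name, content):
        """Store content at a new versioned URL and return that URL."""
        version = self._versions.get(name, 0) + 1
        self._versions[name] = version
        url = f"/{name}/v{version}"
        self._pages[url] = content
        return url

    def get(self, url):
        """Retrieve whatever was originally published at this URL."""
        return self._pages[url]

    def latest(self, name):
        """Return the URL of the most recent version of a document."""
        return f"/{name}/v{self._versions[name]}"


site = PersistentSite()
first = site.publish("report", "original text")
second = site.publish("report", "revised text")
# A reference to the first URL still retrieves the original material.
print(first, "->", site.get(first))
print(site.latest("report"), "->", site.get(second))
```

The point of the design is that a citation to the first URL keeps resolving to the identical material, while readers who want the current version can ask for it separately.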
However, there are services that provide a degree of persistence for some materials on the World Wide Web. Google offers access to the cached version, which is the version of a resource available at the time it was most recently added to the index. However, there is no attempt to provide access to earlier versions, and so persistence is very short. That leaves preservation as the most viable alternative. Web preservation initiatives comprise three approaches: harvesting, selection, and deposit.[35]

[35] Michael Day, “Preserving the Fabric of Our Lives: A Survey of Web Preservation Initiatives,” Research and Advanced Technology for Digital Libraries, 7th European Conference, ECDL, Trondheim, Norway, Springer, Berlin, Germany, 2003.

The most far-reaching approach to preservation has been taken by the Internet Archive,[36] a non-profit corporation founded and run by Brewster Kahle, which is supported by contributions from individuals, foundations, and corporations. Rather than being concerned with the persistence of specific material on the Internet, the Internet Archive is devoted to capturing (and preserving) a sequence of snapshots of what is publicly accessible on the Internet. Its goal is preserving the history both of the Internet and of the vast range of human activities reflected in the constantly evolving materials on it.

The Internet Archive, also called the “Wayback Machine,” has taken and stored snapshots of materials on the Internet since 1996. In December 2003 it comprised over 11 billion Web pages and over 300 terabytes of data storage, increasing at 12 terabytes per month.[37] It is often the only way to locate digital documents that were moved to other sites or taken offline and, therefore, is of great value to users and scholars, and to copyright holders, who can track the use of their content.

At present, the Internet Archive is the only active effort in the United States to preserve and provide access to the history of a significant portion of Internet materials.[38] In other countries, however, the national libraries are undertaking similar efforts.[39]

The International Internet Preservation Consortium (IIPC) was formally chartered at the Bibliothèque Nationale de France with 12 participating institutions, all national libraries (including the Library of Congress) and the Internet Archive.[40] Its goals are as follows:

- To achieve the collection of a rich body of Internet content from around the world to be preserved in a way that it can be archived, secured and accessed over time.

- To foster the development and use of common tools, techniques and standards that enable creation of international archives.

- To encourage and support national libraries everywhere to address Internet archiving and preservation.

During the 3 years of IIPC’s initial agreement, membership is limited to the charter institutions. It will open to other national libraries in 2006.

[36] For information on the Internet Archive, see <http://www.archive.org>.
[37] Paul Marks, “Way Back When,” New Scientist (date unknown), available at <http://www.newscientist.com/opinion/opinterview.jsp?id=ns23701>; latest information at <http://www.waybackmachine.org>.
[38] The Internet Archive is mirrored at the New Library of Alexandria, Egypt, and at the time of writing it is in the process of establishing a European Internet Archive in Amsterdam, The Netherlands.
[39] For example, the Australian National Library’s PANDORA project, which has been archiving Australian online publications since 1996, is described at <http://www.nla.gov.au/initiatives/digarch.html>.
[40] See <www.netpreserve.org> for full information on the IIPC.

However, the IIPC will not serve as an operational archive. Rather, it will provide a forum for sharing knowledge; develop and recommend standards; develop tools and techniques to acquire, archive, and provide access to Web sites; and raise awareness of preservation issues through meetings and publications.41

In a similar vein, the Library of Congress has been leading since 2001 a cooperative national digital-strategy effort, called the National Digital Information Infrastructure and Preservation Program.42 The Library, working with government and private partners, is to “develop a national strategy to collect, archive and preserve the burgeoning amounts of digital content, especially materials that are created only in digital formats, for current and future generations.”

There is currently no commonly accepted way to decide which material on the Internet should be retained or to ensure the availability of the resources or incentives needed to achieve that goal. These are among the issues that the Library of Congress effort is addressing.

8.1.6 Understanding User Behavior

User behavior in navigating through traditional information resources has been a subject of considerable research, but less is known about the Internet case. If such information were available, it is likely that more effective Internet navigation aids and services could be designed.

Research on information seeking in print environments dates back to early in the 20th century, and research on information seeking in electronic environments dates to the 1960s. Although a large body of empirical data exists, it is not clear how much of it is relevant to Internet navigation. Much of the prior research is in library (or comparable) contexts and assumes more homogeneous content, more constrained searching goals, and non-commercial environments.

Although relatively little is known about how people navigate the Internet generally, there is a small but growing body of empirical research on the use of the World Wide Web. However, research on the Web is severely restricted because search companies have been unwilling to share samples of the enormous amount of data they collect every day with researchers in academic environments.

Conclusion: Basic research aimed at a better understanding of user behavior in a variety of Internet navigation tasks using a variety of methods and services is highly desirable.

41 Information obtained from the IIPC Web site on September 3, 2004.
42 See the program’s Web site at <http://www.digitalpreservation.gov/>.

However, standard methods to evaluate searching performance on the Internet are lacking. The most advanced evaluation methods are constrained to text searching in bounded databases. A broader set of metrics, measures, and test beds is needed for the Internet and digital libraries, and their development would also be desirable.43 An array of new National Science Foundation initiatives in cyberinfrastructure may contribute to these efforts.44

8.2 INSTITUTIONAL ISSUES

Most of the institutional issues affecting Internet navigation arise with respect to the commercially supported navigation services, and especially with respect to services whose results are influenced by advertiser payments. The expectation by users that they will be able to understand and trust the results presented by navigation systems leads to efforts by governments to impose disclosure requirements on navigation system operators, similar to the way other advertising practices are regulated in many countries. The desire by information providers to protect their ownership of trademarked and copyrighted material must be balanced with the needs of other providers to incorporate some of that material in descriptions of their own material. These issues are examined in this section.

8.2.1 Regulation

It is generally assumed by researchers and other observers of the industry that users want access to navigation services that are neutral, or at least services whose biases match their own.45 In either event, they are assumed to want to know enough about the criteria by which results are returned so that they can judge if those results are trustworthy. Yet these assumptions are not proven; more complete understanding is needed of the value that users place on the explicit disclosure of search and results ranking criteria and on having a choice among navigation systems employing a range of different criteria. In addition, there is a presumed social benefit in having an information infrastructure that can be trusted.

43 See Christine L. Borgman, Evaluation of Digital Libraries: Testbeds, Measurements, and Metrics, final report to the National Science Foundation, Fourth DELOS Workshop, Hungarian Academy of Sciences, Computer and Automation Research Institute (MTA SZTAKI), Budapest, Hungary, June 6-7, 2002, available at <http://www.sztaki.hu/conferences/deval/presentations/final_report.html>.
44 See Daniel Atkins, Revolutionizing Science and Engineering Through Cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon Panel on Cyberinfrastructure, January 2003, available at <http://www.cise.nsf.gov/sci/reports/toc.cfm/>. See also the new NSF Division on Shared Cyberinfrastructure, whose Web site is available at <http://www.cise.nsf.gov/div/index.cfm?div=sci>, and similar programs in other directorates.
45 See Deborah Fallows, Lee Rainie, and Graham Mudd, “The Popularity and Importance of Search Engines,” data memo, Pew Internet & American Life Project, August 2004, available at <http://www.pewinternet.org/pdfs/PIP_Data_Memo_Searchengines.pdf>; 68 percent of respondents to the Pew/Internet survey thought that Internet search engines are a fair and unbiased source of information, while 19 percent thought they were not.

A searcher’s need to understand the criteria for ranking the results of a search has risen in importance now that advertising has become the primary source of revenue for search engine companies. That need conflicts with the objectives of some advertisers, who would like their listings to appear as much as possible like the high-ranking results of a neutral search.

Consequently, it is not surprising that U.S. Federal Trade Commission (FTC) regulators concluded in June 2002 that some Internet search engines46 were not adequately informing consumers when advertisers paid for prominent placement in search results. The FTC Division of Advertising Practices sent a letter47 to major search services recommending that they review their Web sites and make any changes necessary to ensure that:

any paid ranking search results are distinguished from non-paid results with clear and conspicuous disclosures;
the use of paid inclusion is clearly and conspicuously explained and disclosed; and
no affirmative statement is made that might mislead consumers as to the basis on which a search result is generated.
In addition, “to the extent that search engine companies provide search results to third-party Web sites, including other search engines or guides, [the FTC is] encouraging the companies to discuss with the third-party Web sites whether the above criteria are being met with respect to any supplied search results that involve a payment of any kind for ranking, insertion of paid results into unpaid results, or any pay-for-inclusion program.” Furthermore, the FTC staff recognized “that search engine companies’ business models vary and that there is a need for flexibility in the manner in which paid placement and paid inclusion are clearly and conspicuously disclosed.”

46 Other search engines, such as Google and AltaVista, clearly designate or segregate the sponsored listings. See Section 5.4.2.
47 Letter from FTC to Gary Ruskin, executive director of Commercial Alert, June 27, 2002, available at <http://www3.ftc.gov/os/closings/staff/commercialalertletter.htm>.

The FTC letter went on to say that the few studies of consumer views on paid inclusion and paid placement that have been done indicate that many consumers are not aware of the practice. It referred explicitly to two studies:

A Consumers Union national survey found that 60% of U.S. Internet users had not heard or read that certain search engines were paid fees to list some sites more prominently than others in their search results. After being told that some search engines take these fees, 80% said it is important (including 44% who said it is very important) for a search engine to disclose, in its search results or in an easy-to-find page on its site, that it is being paid to list certain sites more prominently. If clearly told in the search results that some sites are displayed prominently because they paid, 30% said they would be less likely to use that search engine, 10% said more likely, and 4% said don’t know/refused. Consumers Union also reported that “given the complicated situation, 56% say it would make no difference to them.” It stated that the “combination of users’ low level of knowledge of search engine practices and their strong demand that search engines should come clean leaves users splintered about how to react.”48
A recent BBC-commissioned survey found that 71% of U.K. users were unaware that some search engines let advertisers pay to get more prominent positions in search results.49

Against this background, the FTC also issued, in September 2002, a consumer alert, “Being Frank About Search Engine Rank,” which advises users to be aware that the results of their searches may be affected by various pay-for-placement programs of Internet search engines.50 Although neither of these actions constitutes an enforcement action with the force of law, they do alert the navigation services operators to the interest of the FTC and the possibility that in the absence of change it might consider more formal action.
In addition, Internet advertising, whether search engine linked or not, is subject to the same types of national regulation as other advertising with respect to matters such as fraudulent or misleading claims. In the United States, the FTC has pursued various cases on those grounds. Furthermore, search engines typically have guidelines for the content they will provide. In 2003, Yahoo! and Google announced that they would restrict advertisements from unlicensed pharmacies in response to consumer concerns about illegal online drug sales.51

48 See “A Matter of Trust: What Users Want from Web Sites,” April 16, 2002, available at <www.consumerwebwatch.com/news/report1.pdf>.
49 See, for example, “BBC Launches Its Non-Commercial Search Engine in Response to ‘Tainted’ Results,” May 2, 2002, available at <http://www.VentureReporter.net> (subscription required).
50 Available at <http://www3.ftc.gov/bcp/conline/pubs/alerts/searchalrt.htm>.
51 Saul Hansell, “Search Engines Limit Ads for Drugs but Ease Rules on Sex,” New York Times, December 3, 2003.

The way in which search engines provide rankings has also been the subject of a U.S. District Court case. SearchKing, an online advertising network, sued Google, asserting that Google had reduced the PageRank™ of its site after SearchKing created a network of sites that had the effect of boosting all the members’ PageRanks™. By reducing SearchKing’s PageRank™, Google also had the consequent effect of reducing the PageRank™ of the network members. SearchKing asserted that Google’s action harmed its business. The court, however, found that Google had the right to adjust PageRank™ value since it constituted an opinion and was covered by First Amendment protections.52

These two examples illustrate the nascent engagement of national regulatory agencies and legal systems with issues arising in navigation services. As Internet navigation continues its growth as a major source of contacts for information and service providers and as a major advertising medium, it may be expected that the scrutiny and activity of regulatory agencies, legal systems, and legislatures will increase as well.

Conclusion: The behavior of commercial navigation services can have a substantial influence on the kind, quality, and appropriateness of the information that Internet users receive. Although there is no evidence that abuse has yet occurred, the potential for abuse is inherent in the navigation services’ ability to affect users’ access to information for commercial or other reasons.

Recommendation: Although competition and the desire to be seen as useful by searchers are incentives for fair and open behavior, appropriate regulatory agencies of the U.S. federal government and of other governments should pay careful and continuing attention to the result ranking and display practices of Internet navigation services and their advertisers to ensure that information can flow freely and that those critical practices are fully disclosed.
Recommendation: Since competition in the market for Internet navigation services promotes innovation, supports consumer choice, and prevents undue control over the location of and access to the diverse resources available via the Internet, public policies should support the competitive marketplace that has emerged and avoid actions that damage it.

52 See “Google Wins Over SearchKing in PageRank Case,” Pandia Search Engine News, June 2, 2003, available at <http://www.pandia.com/sw-2003/21-searchking.html>.

8.2.2 Privacy

Privacy issues affect Internet navigation in both overt and subtle ways. The crux of the privacy concerns is the ability of Web sites and other online resources to track their visitors and to capture data about what is being viewed, read, downloaded, or otherwise used without the consent of users.53 As noted in the discussion of context (Section 6.1.6), the more that a system knows about a person’s goals, intentions, and prior activities, the greater the context that can be provided and the more tailored the searching can be.

The negative sides of tracking are equally significant, however. Tracking what people read or view could violate long-established liberties in the United States and in many other free societies if that information were made available, freely or under subpoena, to government agencies. Lack of privacy also has a potential “chilling effect.” People are less likely to act on their freedom of speech if they feel that their queries are being recorded and may be disclosed without their permission.

Yet the Internet is also the site of illegal activities, such as identity theft and illegal transactions, and of non-protected speech, such as child pornography. Law enforcement has always had means to target illegal activities without undermining basic democratic principles and needs such means on the Internet as well. The designers of future navigation services and of the laws that affect them will, of necessity, be trying to find a workable balance among the services’ desire to use individual information to improve service, the individual’s right to privacy, and the government’s legitimate needs to know.54

Issues of privacy are both important and complex and relate to the Internet and information technology more broadly, not only to navigation. This study could not do them justice, but there are a number of reports and ongoing studies on Internet privacy.55

8.2.3 Trademarks and Copyright

The link between intellectual property rights and Internet navigation may not be obvious.
However, a number of court cases have arisen in which the use of trademarked material in the navigation process has been in dispute.56 Moreover, the extent to which search engines may make use of copyrighted material has been and is certain to continue to be a significant issue.

53 See Fallows, Rainie, and Mudd, “The Popularity and Importance of Search Engines,” 2004. According to the Pew/Internet survey, 85 percent of search engine users rate “knowing that personal information will not be shared without permission” as an important attribute of search engines, but only 55 percent believe that they deliver.
54 Google’s privacy policy is available at <http://www.google.com/privacy.html>; Yahoo!’s, at <http://privacy.yahoo.com/>.
55 For example, the Computer Science and Telecommunications Board of the National Research Council has an ongoing study, whose report is forthcoming in 2006, on privacy in the information age. For further details, see <http://www.cstb-privacy.org/>.
56 See Cindy Sherman, “Search Engines and Legal Issues—October 23, 2002,” SearchEngineWatch, 2002, available at <http://www.searchenginewatch.com/searchday/article.php/2161041>.

Trademark

As in the DNS, the use of trademarked names is a source of contention in Internet navigation. Whereas for the DNS the issue is the use of trademarks in domain names, in navigation the issue is their use in metatags and keywords. Unlike the DNS, for which the non-judicial Uniform Domain Name Dispute Resolution Policy (UDRP) has been established, most disputes in Internet navigation that are not resolved through navigation services’ own policies have found their way to the courts. However, thus far, there have been far fewer trademark cases concerning navigation than concerning domain names.

One trademark dispute that reached the courts concerned the right to use such terms in metatags, the invisible markers of a Web site selected by the site creator and sometimes used by search engines as keywords. Playboy Enterprises sued a former playmate for incorporating some of its trademarked terms in the metatags at her site. The court, however, decided in her favor on the grounds that she had a legitimate right to use those terms in describing herself and had not done so with the intent of attracting users seeking the Playboy site.57

Another trademark dispute concerned allowing non-trademark holders to bid for a trademarked term. Mark Nutritionals filed suit against Overture (then GoTo) and other paid placement providers for auctioning its trademarked phrase “Body Solutions” to their competitors. As a result, those competitors were showing up higher in searches for “body solutions” than was Mark Nutritionals, which claimed that this constituted trademark infringement as well as unfair competition.58

In a third dispute, J.K. Harris & Co. sued Taxes.com because the Taxes.com site was higher ranked in search engine results than the J.K. Harris site for the search term “J.K. Harris.” The suit was for trademark infringement, unfair competition, false and misleading advertising, and defamation. The reason for the higher ranking was that the phrase “J.K. Harris” appeared frequently (75 times) on the Web page entitled “Complaints about J.K. Harris,” which contained e-mails detailing the site owner’s conversations with investigators about J.K. Harris. The judge ruled that the site had the right to use the “J.K. Harris” term, but objected to the number of times that it was used. A preliminary injunction against Taxes.com was issued by the court but was due for reconsideration as a result of a brief filed by the Electronic Frontier Foundation.59 Note that this case was against the Web site owner, not the search engine company.

57 The court decision is available at <http://caselaw.lp.findlaw.com/data2/circs/9th/0055009p.pdf>.
58 See Christopher Saunders, “Weight Loss Company Sues Search Engines,” Internetnews.com, February 1, 2002, available at <http://www.internetnews.com/IAR/article.php/12_966901>.

Presumably to avoid becoming the subject of frequent suits, Google has established a complaint procedure to enable companies to claim “reasonable” rights to their trademarked terms.60 In a prominent use of that procedure, eBay asked Google in August 2003 to refuse to sell ads that use eBay’s trademarked name, either alone or in phrases and variations, “so that third-party advertisers do not abuse the intellectual property of the company.” eBay submitted a 13-page list of terms, such as “eBay selling” and “eBay power seller,” that it wanted Google to bar. eBay says that Google has complied with its requests.61 eBay’s trademarks can still be referenced under fair-use provisions, which allow an advertiser to use someone else’s trademarked term for description or comparison of its product (for example, to sell the book eBay for Dummies).62

However, in France, Google has been sued by three companies and in three significant cases has been ordered by regional French courts to stop selling a company’s trademarked terms as keywords to other companies and to pay damages. In the first case, a regional court ordered Google to pay 75,000 euros to two travel companies whose trademarked terms were sold as keywords to rival companies.
The court said that Google should “find the means to block advertisements by third parties who have no right to [the] trademarks.” In the second case, a Nanterre court told Google to stop selling trademarked terms of the Le Meridien hotel chain as keywords to its competitors or pay a daily fine of 150 euros.63 In the third case, a Paris district court ordered Google not to sell keywords incorporating trademarks of the luxury goods firm Louis Vuitton Malletier and to pay a fine of 200,000 euros.64

59 See Cindy Sherman, “Search Engines and Legal Issues—October 23, 2002,” 2002, available at <http://searchenginewatch.com/searchday/article.php/2161051>.
60 See <http://www.google.com/tm_complaint.html>.
61 See Brian Morrisey, “eBay Invokes Trademark on Google Keywords,” Internetnews, August 11, 2003, available at <http://www.internetnews.com/IAR/print.php/2447071>.
62 Marsha Collier and Roland Woerner, eBay for Dummies, 2nd edition, Hungry Mind Press, St. Paul, Minn., 2000.
63 See Stefanie Olsen, “Google Loses Trademark Dispute in France,” c/net news.com, January 20, 2005, available at <http://news.com.com/Google+loses+trademark+dispute+in+France/2100-1030_3-5543827.html?tag=nl>.
64 See Stefanie Olsen, “Google Loses Trademark Case in France,” c/net news.com, February 4, 2005, available at <http://news.com.com/Google+loses+trademark+case+in+France/2100-1030_3-5564118.html>.

In an effort to block a similar case in the United States, Google has asked a U.S. district court judge for a declaratory judgment on trademark issues raised by American Blind, which sells wallpaper and window coverings. The company complained that Google was selling AdWords infringing its trademarks, listing over 30 terms ranging from the obvious to more generic terms, such as “American wallpaper discount.” Google agreed to block the trademarks, but not variant terms because they were descriptive terms that other advertisers had the right to use. In January 2004, American Blind filed a lawsuit.65 Shortly before, Google had made a request for a judgment that AdWords do not infringe American Blind’s trademarks and demanded a jury trial. The outcome could be quite significant for Google and for other advertising-dependent search engines, since it could affect the degree of scrutiny that they would have to apply to each keyword sale, potentially increasing costs and reducing the number of available words.

In December 2004, Google won a U.S. victory when a judge of the U.S. District Court granted Google’s request to dismiss a trademark-infringement complaint from the insurance company Geico. The judge ruled that it is not trademark infringement to use trademarks as keywords to trigger advertising.66

Copyright

Only a few contentious issues have arisen regarding copyright and navigation services. One such issue involves the so-called “notice and take down” provisions of the Digital Millennium Copyright Act (DMCA),67 which require any Internet service provider (ISP), which would include any search engine operator, to remove or disable access to any third-party content that has been identified in a statutorily compliant notice provided to the ISP by the owner, or its agent, of the copyright in such content.
In order to be statutorily compliant, a DMCA notice must (1) be signed by someone authorized to act on behalf of the owner of the exclusive right that is allegedly infringed; (2) identify the copyrighted work allegedly infringed; (3) identify the allegedly infringing content or activity and provide enough information to enable the ISP to find the content; (4) provide information that is reasonably sufficient to permit the ISP to contact the complaining party, such as a mailing address, telephone number, or e-mail address; and (5) include a statement that the complaining party has a good-faith belief that use of the allegedly infringing content is not authorized by the copyright owner, its agent, or the law.

65 See Stefanie Olsen, “Google Faces Trademark Suit Over Keyword Ads,” c/net news.com, January 28, 2004, available at <http://news.com.com/Google+faces+trademark+suit+over+keyword+ads/2100-1024_3-5149780.html?tag=nl>.
66 See Stefanie Olsen, “Google Wins in Trademark Suit with Geico,” c/net news.com, December 15, 2004, available at <http://news.com.com/Google+wins+in+trademark+suit+with+Geico/2100-1024_3-5491704.html?tag=nl>.
67 Public Law 105-304.

In 1996 and 1997, when the DMCA was being negotiated, hyperlinks to third-party content were thought to be outside the scope of the notice and take down (NTD) provisions of the DMCA, and in fact a few courts have refused to enforce the DMCA when asserted in this context. However, the nature of providing links, at least within the context of search engines, has changed over the years, in that many search engines now include excerpts of the information as part of the link, and so the applicability of the NTD provisions is less clear. Accordingly, some search engine operators have taken to complying with DMCA notices, even though they may not be technically required to do so.

For example, in 2002, Google removed some 126 pages that the Church of Scientology claimed infringed its copyright. One of the pages was the home page of an anti-Scientology site that had gained a high ranking in searches on the term “scientology” through the efforts of anti-Scientology activists to build links to it. After protest, Google restored that page, saying that it was “inadvertently removed.”68 The case shows the potential danger from use of the DMCA to unfairly shut down access to Web sites. However, according to Google, the case was unusual.
It generally gets one or two DMCA complaints per week that it describes as “open and shut.”

Although the DMCA requires only that the complaining party attest to its good-faith belief that the content in question is infringing, the DMCA also provides a counternotice provision that enables the provider of the questionable content to challenge the complaining party’s notice and have the information restored until the complaining party avails itself of the federal courts and obtains injunctive relief. One of the primary purposes of the DMCA, other than to extend the protections of the copyright laws to digital works published over the Internet, was to keep ISPs from being caught between third-party providers of content and the owners of copyrights when a fight broke out over who owned the rights to that content. By filing a counternotice, the provider of the content is effectively accepting the jurisdiction of a U.S. court should the original complainant want to pursue its complaint, and so many content providers may choose not to avail themselves of this protection. It should also be noted, however, that the DMCA does not obligate search engines to inform content providers when their content has been removed or blocked. According to the statement of DMCA policy on its Web site,69 however, Google will make a “good-faith attempt to contact the owner or administrator of each affected site so that they may make a counter notification.”

68 See David F. Gallagher, “New Economy; a Copyright Dispute with the Church of Scientology Is Forcing Google to Do Some Creative Linking,” New York Times, April 22, 2002, p. C4.
69 Available at <http://www.google.com/dmca.html>.

Conclusion: As with the Domain Name System, the most contentious intellectual property issues affecting navigation services concern trademarks. Since there is no arbitral process, such as the UDRP, by which such disputes could be resolved outside the courts and with worldwide effect, it seems likely that court decisions in different jurisdictions worldwide will establish potentially conflicting rules by which navigation services will have to abide. Potential rulings in some jurisdictions could substantially reduce the ability of search engines to sell keywords using the current automated methods, which restrict only specifically trademarked terms.