BOX 2.7 | Youth, Pornography, and the Internet | Dick Thornburgh and Herbert S. Lin, Editors | Committee to Study Tools and Strategies for Protecting Kids from Pornography and Their Applicability to Other Inappropriate Internet Content | Computer Science and Telecommunications Board | National Research Council


Box 2.7
Appropriate and Inappropriate Blocking

A given filter claims to work over a class of content. Examples of a class include "all Web content," "all content in a particular set of chat rooms," and "all e-mail headed for a particular set of inboxes." Classes for a filter can also be combinations of these things.

At a certain moment in time, let T be the total number of "content items" to which a particular filter could be applied. Break the number T into 4 parts, such that A + B + C + D = T, where A, B, C, and D are defined in the table below.

                 Inappropriate    Appropriate

Blocked                A               B
Not blocked            C               D

The success of a filter in blocking inappropriate sites is measured by the ratio of A to A + C, and its success in not blocking appropriate sites is measured by the ratio of D to B + D. Thus, a completely successful filter would have values of 1 for each of these ratios; that is, all inappropriate sites would be blocked, and no appropriate sites would be blocked. A value of zero for each of these ratios indicates complete failure of the filter to block inappropriate sites and to not block appropriate sites, respectively. Put another way, the rate of underblocking is C/(A + C), and the rate of overblocking is B/(B + D). For each of these rates, a value of 1 indicates complete failure of the filter with respect to that measure, and a value of zero indicates complete success.
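The four ratios above can be computed directly from the table's counts. The following is a minimal sketch, not part of the original box: the names FilterCounts, ratio, and filter_rates are hypothetical, and the counts are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterCounts:
    """Counts of content items, following the table's A, B, C, and D."""
    a: int  # inappropriate and blocked
    b: int  # appropriate but blocked (overblocked)
    c: int  # inappropriate but not blocked (underblocked)
    d: int  # appropriate and not blocked


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def filter_rates(counts: FilterCounts) -> dict[str, float]:
    """Compute the success ratios and the under/overblocking rates."""
    inappropriate = counts.a + counts.c  # A + C
    appropriate = counts.b + counts.d    # B + D
    return {
        "block_success": ratio(counts.a, inappropriate),    # A / (A + C)
        "pass_success": ratio(counts.d, appropriate),       # D / (B + D)
        "underblock_rate": ratio(counts.c, inappropriate),  # C / (A + C)
        "overblock_rate": ratio(counts.b, appropriate),     # B / (B + D)
    }


# Hypothetical counts, chosen only for illustration (T = 1000).
rates = filter_rates(FilterCounts(a=90, b=5, c=10, d=895))
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

Because each success ratio and its corresponding failure rate share a denominator, block_success + underblock_rate and pass_success + overblock_rate each equal 1, which is why the text can describe the same filter performance either way.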

It is possible in principle to know the values of A, B, and C for any given filter, since inappropriate and appropriate sites that are blocked (A and B) can be identified, as can inappropriate sites that are not blocked (C). However, the value of D must always be only an estimate, since we cannot know exactly all of the appropriate sites that exist on the Internet. Thus, although the rate of underblocking can be empirically determined, the rate of overblocking can only be estimated.

How should D, the number of not blocked appropriate pages, be estimated? A controversy over methodology was the subject of testimony to the committee. One approach is that the number of appropriate pages should be estimated on the basis of a random sampling of Web pages. A second approach is that the number should be estimated on the basis of actual usage, which weights certain popular Web pages more heavily than those not accessed as frequently. Note that computing the overblock rate in the first way increases it relative to the overblock rate computed in the second way.
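The difference between the two approaches can be sketched on a small sample of appropriate pages. This is an illustration under stated assumptions, not the methodology presented to the committee: the sample data, the visit counts, and the function names are all hypothetical, and the blocked pages are deliberately chosen to be rarely visited ones.

```python
# Each entry is one appropriate page in a hypothetical sample:
# (blocked_by_filter, visits_in_some_period). All values are invented.
SAMPLE = [
    (False, 5000),
    (False, 3000),
    (True, 40),
    (False, 1500),
    (True, 10),
    (False, 450),
]


def overblock_rate_uniform(sample):
    """First approach: every sampled page counts equally."""
    blocked_pages = sum(1 for is_blocked, _ in sample if is_blocked)
    return blocked_pages / len(sample)


def overblock_rate_usage_weighted(sample):
    """Second approach: every page counts in proportion to its visits."""
    total_visits = sum(visits for _, visits in sample)
    blocked_visits = sum(visits for is_blocked, visits in sample if is_blocked)
    return blocked_visits / total_visits


uniform = overblock_rate_uniform(SAMPLE)
weighted = overblock_rate_usage_weighted(SAMPLE)
print(f"uniform sampling: {uniform:.4f}")
print(f"usage-weighted:   {weighted:.4f}")
```

In this invented sample the uniform estimate is higher than the usage-weighted one only because the blocked pages happen to be the least visited; a sample in which popular pages were blocked would push the comparison the other way.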

For the purpose of determining the accessibility of information in general, the first approach is arguably better. For the purpose of determining the accessibility of information in practice, the second approach is arguably better. The reason that these two approaches are different is that the information needs of people--aggregated as a group--are not uniformly spread over the spectrum of information that the Internet provides. For this reason, someone looking at information needs of large groups in practice might well choose the second approach. However, the information needs of any given individual may not fall within "typical" search parameters--for those with non-typical information needs, the first approach may be more relevant.




Copyright 2002 by the National Academy of Sciences  


