Box 2.5
Human Scrutiny of Every Site to Be Blocked?
The Web is so large that it is impossible for a human being to have scrutinized every possible Web page that a viewer might access. However, filter vendors often claim that some human being has examined every one of the sites blocked by their filter and identified it as inappropriate.
On the face of it, such a claim is not necessarily inconsistent with the first statement about the size of the Web. But the sheer size of the Web means that the screening process must combine automated and human steps. In particular, the only method that makes sense for large-scale screening is for some automated process to nominate sites for blocking, and for an individual to evaluate the nominated sites.
Can the claim of human examination for every site be substantiated? One assumption on which the claim rests is that blocked sites are revisited on the time scale on which the content of a site is likely to change. If the revisit rate is inadequate to keep up with such changes, a site that may have been properly blocked when it was first added to the list may remain blocked even if the site's new content would not be deemed inappropriate (overblocking).
A more serious issue is that human evaluation of a site is a labor-intensive process. The critical variable is how long it takes for a human being to evaluate a Web page. Finkelstein and Tien1 estimate 1 minute per page as a reasonable overall estimate for sustained work. This estimate is quite plausible if the page contains a significant amount of textual material that must be read (as would be the case for hate speech, for example), but is likely high by a factor of 5 or 10 for pages that contain images of the kind typically found on adult Web sites. Assuming about 200 workdays per year, an individual doing page evaluation alone might be able to evaluate somewhere between 0.1 million and 1 million pages per year.
These figures suggest that an effort of about 10 person-years is needed to create a comprehensive list of a million sites to be blocked (including both text and images), while an effort on the order of 1 person-year is required if the primary target is sexual images of the sort found on adult Web sites.
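The arithmetic behind these estimates can be sketched as follows. The 1 minute per page, the 200 workdays per year, and the speedup of 5 to 10 for image pages come from the text. The 8-hour workday and the simplification of one evaluated page per site are assumptions made here only to reproduce the orders of magnitude; the function names are illustrative.

```python
WORKDAYS_PER_YEAR = 200          # from the text
MINUTES_PER_WORKDAY = 8 * 60     # assumption: 8 hours of evaluation per workday


def pages_per_year(minutes_per_page: float = 1.0, speedup: int = 1) -> float:
    """Pages one evaluator can review in a year.

    speedup is how many times faster than the baseline a page can be judged
    (the text suggests 5 to 10 for image-heavy adult pages).
    """
    return WORKDAYS_PER_YEAR * MINUTES_PER_WORKDAY * speedup / minutes_per_page


def person_years(total_pages: int, minutes_per_page: float = 1.0, speedup: int = 1) -> float:
    """Effort needed to review total_pages, assuming one page per site."""
    return total_pages / pages_per_year(minutes_per_page, speedup)


if __name__ == "__main__":
    print(pages_per_year())               # text-heavy pages: 96000.0 per year
    print(pages_per_year(speedup=5))      # image pages, 5x faster: 480000.0
    print(pages_per_year(speedup=10))     # image pages, 10x faster: 960000.0
    print(person_years(1_000_000))        # roughly 10 person-years
    print(person_years(1_000_000, speedup=10))  # roughly 1 person-year
```

Under these assumptions the throughput falls between roughly 0.1 million and 1 million pages per year, which matches the range given above. A shorter effective workday would raise the person-year figures proportionally.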
Reliable software to identify possibly inappropriate sites could reduce the effort required considerably. That is, the software, set to minimize underblocking, would propose candidate sites for human evaluation. Because such a setting also raises the level of overblocking among the candidates, these proposed sites are the ones for which human decision making is essential. Moreover, technology could be used to filter out duplicates, reducing further the overall number of sites that would require human decision making.
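A minimal sketch of such a nominate-then-review pipeline is shown below. It is not any vendor's actual method: the keyword scorer, the threshold value, the example URLs, and the function names are all hypothetical stand-ins. The sketch only illustrates the two ideas in the paragraph above, namely a permissive threshold that favors false positives over misses, and duplicate removal before anything reaches a human reviewer.

```python
import hashlib
from typing import Callable, Iterable, List, Set, Tuple


def nominate(pages: Iterable[Tuple[str, str]],
             score: Callable[[str], float],
             threshold: float) -> List[str]:
    """Return URLs to queue for human review.

    A page is nominated when score(content) >= threshold. Pages whose
    content is byte-for-byte identical to an already queued page are skipped.
    """
    seen: Set[str] = set()
    queue: List[str] = []
    for url, content in pages:
        if score(content) < threshold:
            continue  # not nominated; a human never sees this page
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of content already in the queue
        seen.add(digest)
        queue.append(url)
    return queue


def toy_score(content: str) -> float:
    """Hypothetical scorer: fraction of words that appear in a flagged list."""
    words = content.lower().split()
    flagged = {"xxx", "adult"}
    return sum(w in flagged for w in words) / len(words) if words else 0.0


pages = [
    ("http://a.example/", "adult xxx gallery"),
    ("http://mirror.example/", "adult xxx gallery"),          # duplicate content
    ("http://b.example/", "adult education classes tonight"),  # likely false positive
    ("http://c.example/", "weather report for tuesday"),
]

# A low threshold keeps underblocking down at the cost of more false positives.
review_queue = nominate(pages, toy_score, threshold=0.2)
print(review_queue)
```

In this toy run the mirror is dropped as a duplicate, while the adult-education page is still nominated. That second case is the kind of overblocking candidate that only a human reviewer can resolve.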
See also the discussion in Finkelstein and Tien, Blacklisting Bytes.2
1Seth Finkelstein and Lee Tien, 2000, "Blacklisting Bytes," white paper submitted to the Committee on Tools and Strategies for Protecting Kids from Pornography and Their Applicability to Other Inappropriate Internet Content. Available online at <http://www.itasnrc.org> and at <http://www.eff.org/Censorship/Censorware/20010306_eff_nrc_paper1.html>.
2Finkelstein and Tien, 2000, "Blacklisting Bytes."