The mainstream commercial programs used in the home, which filter and block pages on the fly (rather than recording them for auditing or later review), do not filter images. We did a study of the only commercial program at the time that claimed to filter images on the fly, using 50 pornographic images taken from the Web and 50 nonpornographic images. We found that the software performed no better than random chance when the images were placed at a location the software did not know about in advance. All of the pornographic and nonpornographic images in the test remained accessible, so the claim of filtering based on image content turned out to be false.
The company later released fixes so that the program began to filter based on skin tone, but it still could not do complex object recognition. The best it could do was count the skin-toned pixels in a picture and block the picture based on that count. We ran another test with the 50 pornographic images and 50 nonpornographic pictures of people’s faces, and the software scored exactly the same on each set; it could not tell the difference.
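To show why counting skin-toned pixels cannot separate a nude from a close-up of a face, here is a minimal sketch of that kind of filter. It is not the vendor’s code. The skin test is a commonly cited RGB rule of thumb, assumed here for illustration, and the names `is_skin_tone`, `skin_fraction`, `should_block`, and the 0.4 threshold are all hypothetical.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def is_skin_tone(pixel: Pixel) -> bool:
    """Rough RGB skin-color rule of thumb (an assumption, not the vendor's rule)."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_fraction(pixels: Iterable[Pixel]) -> float:
    """Fraction of pixels that pass the skin-tone test (0.0 for an empty image)."""
    pixel_list: List[Pixel] = list(pixels)
    if not pixel_list:
        return 0.0
    skin = sum(1 for p in pixel_list if is_skin_tone(p))
    return skin / len(pixel_list)


def should_block(pixels: Iterable[Pixel], threshold: float = 0.4) -> bool:
    """Block when the skin fraction reaches a hypothetical threshold.

    Nothing here looks at shapes or objects, only pixel colors, so any
    picture dominated by skin (a nude or a face close-up) scores alike.
    """
    return skin_fraction(pixels) >= threshold
```

A face close-up made mostly of skin-colored pixels, such as `[(220, 170, 140)] * 80 + [(40, 30, 20)] * 20`, gets a skin fraction of 0.8 and is blocked, exactly as a nude with the same color mix would be, while a sky-blue landscape scores 0.0. Because the only input is a color count, the filter has no way to tell the two kinds of picture apart.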
CYBERSitter is mostly a content-based program; Cyber Patrol is mainly a list-based program. Content-based programs are notorious for the errors that arise from blocking sites based on keywords on the page or in the URL. This approach is nowhere near as advanced as the vector space model described earlier. Yet even though these programs are so sloppy, the examples of what they block are not very controversial, because the company can justifiably say it has no advance control over what will be blocked. If a certain phrase is in the word filter and a site happens to use that phrase, the block is not really the company’s fault. Blocking software initially got a bad reputation because of examples like a page about the exploration of Mars being blocked because its title was “Mars Explore,” or its file was named marsexpl.html, either of which contains the string “sex” once the spaces are removed.
I have a friend named Frank who made a Web page about Cyber Patrol, and he later found that his page was blocked, not because he was criticizing the software, but because his name was Frank and “ank” was on Cyber Patrol’s list of dirty-phrase keywords. The list of blocked sites could not be edited, but the list of dirty phrases was viewable, and you could add terms to it and remove terms from it. Presumably to avoid offending the parents who would be reading the list, the company entered word fragments instead of whole words. The list contained phrases such as “uck” and “ank,” the latter apparently a fragment of “spanking,” because the company wanted to block pages and chat channels about spanking fetishes.
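Both the Mars page and Frank’s page fall out of plain substring matching against word fragments. The sketch below is a minimal illustration under that assumption, not Cyber Patrol’s or CYBERSitter’s actual code. The fragment list is hypothetical (“uck” and “ank” come from the list described above; “sex” is assumed to explain the Mars example), and the function names are illustrative.

```python
from typing import FrozenSet, List

# Hypothetical phrase list in the style described above: fragments, not whole words.
BLOCKED_FRAGMENTS: FrozenSet[str] = frozenset({"uck", "ank", "sex"})


def matched_fragments(text: str, fragments: FrozenSet[str] = BLOCKED_FRAGMENTS) -> List[str]:
    """Return every fragment that occurs anywhere in text (case-insensitive substring test)."""
    lowered = text.lower()
    return sorted(f for f in fragments if f in lowered)


def is_blocked(url: str, page_text: str, fragments: FrozenSet[str] = BLOCKED_FRAGMENTS) -> bool:
    """Block the page if any fragment appears in its URL or its body text."""
    return bool(matched_fragments(url, fragments) or matched_fragments(page_text, fragments))
```

With this list, `matched_fragments("Frank's page about Cyber Patrol")` returns `["ank"]`, `matched_fragments("marsexpl.html")` returns `["sex"]`, and an ordinary sentence like “Good luck with your truck” trips “uck.” A substring test never asks whether the fragment is part of a larger, innocent word, which is exactly why such lists produce these errors.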
There are many other examples, some involving programs that even remove words from the pages as they download them, without making it