

7 A Critique of Filtering
Pages 36-47

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 36...
... Some programs examine the text on a downloaded page to look for keywords in the Web page address (the uniform resource locator, or URL) or in the body of the page.
From page 37...
... Blocking software got a bad reputation initially because of examples like a page about the exploration of Mars being blocked because the title was "Mars Explore," or marsexpl.html. I have a friend named Frank who made a Web page about Cyber Patrol, and he later found that his page was blocked not because he was criticizing the software, but because his name was Frank, and "ank" was on Cyber Patrol's list of dirty phrase keywords.
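The two false positives described above can be reproduced with a minimal sketch of substring keyword matching. This is an illustration only, not the actual logic of Cyber Patrol or any other product: the function names are hypothetical, the fragment "ank" comes from the anecdote above, and the assumption that "sex" was the fragment that caught marsexpl.html is ours.

```python
import re

# Hypothetical keyword list: "ank" is from the anecdote above;
# "sex" is an assumed trigger for the marsexpl.html example.
BANNED = ["sex", "ank"]

def naive_block(text, banned=BANNED):
    """Return banned fragments found anywhere in text (substring match)."""
    lowered = text.lower()
    return [frag for frag in banned if frag in lowered]

def whole_word_block(text, banned=BANNED):
    """Return banned fragments that appear as complete words only."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return [frag for frag in banned if frag in words]

if __name__ == "__main__":
    for sample in ["marsexpl.html", "Frank's page about Cyber Patrol"]:
        print(sample, naive_block(sample), whole_word_block(sample))
```

Substring matching flags both samples, while whole-word matching flags neither. Whole-word matching is only one possible mitigation and would miss run-together words that a vendor might genuinely want to catch.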
From page 38...
... You run one of these programs on a computer that has CYBERSitter or Cyber Patrol installed, and it reads the file, decrypts it, and prints out the list of blocked sites into a text file. The Digital Millennium Copyright Act (P.L.
From page 39...
... But at the time these programs came out, there was no such exemption, so many people were worried about the consequences. If you have a server product installed on the Internet service provider's system, then you do not have access to the file where the list of blocked sites is stored.
From page 40...
... Bennett Haselton noted that the company's Web page specifically said that material does not have to be blocked because it shares an IP address with another blocked site; if it is true that IP address sharing is the cause of blocking, then this is a false claim. The Web hosting issue has been around for several years and also applies to proxy servers.
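A minimal sketch of why IP-based blocking can sweep in unrelated sites on shared hosting. The host names, the addresses (taken from the reserved documentation range), and the function names are all hypothetical; no vendor's actual implementation is shown here.

```python
# Hypothetical host-to-address table standing in for DNS lookups.
# Two unrelated sites share one address, as on a shared Web host.
HOSTING = {
    "blocked-example.test": "192.0.2.10",
    "neighbor-example.test": "192.0.2.10",
    "elsewhere-example.test": "192.0.2.20",
}

def blocked_by_ip(host, blocked_ips, dns=HOSTING):
    """True if the host resolves to an address on the block list."""
    return dns[host] in blocked_ips

def collateral(blocked_hosts, dns=HOSTING):
    """Hosts never listed themselves that share an address with a listed host."""
    blocked_ips = {dns[h] for h in blocked_hosts}
    return sorted(
        h for h, ip in dns.items()
        if ip in blocked_ips and h not in blocked_hosts
    )

if __name__ == "__main__":
    print(collateral({"blocked-example.test"}))
```

Under these assumptions, listing one site by address also blocks its neighbor on the same server, even though nobody reviewed or listed the neighbor.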
From page 41...
... Therefore, we are concerned about errors in the less popular sites, even though we know that the popular sites contain fewer errors. Moreover, the SurfWatch error rate is not okay if you are one of those 42 sites blocked incorrectly.
From page 42...
... Sites can be blocked erroneously for reasons other than a lack of human review. In an incident that became the baseline in discussions about the appropriateness of blocking software, Time magazine wrote an online article about CYBERSitter's blocking policies and the controversy over ...
² David Forsyth suggested that the substantial difference in results between tests of 1,000 sites and tests of 2,000 sites means that 1,000 sites is too small a set with which to conduct an experiment like this.
From page 43...
... Web site, the home page of an extremely conservative organization, should be blocked as a hate site because of the amount of antigay rhetoric. Because most programs that publish definitions of hate speech include discrimination based on race, gender, or sexual orientation, Cyber Patrol agreed to block the site.
From page 44...
... Yet we have examples of unblocked sites run by large or well-funded groups that no reasonable person could disagree meet that definition.³ We recently published two reports about Web sites blocked by various programs. These reports are linked to our main page.
From page 45...
... Anonymizer.com is a site that enables you to circumvent blocking software. You can connect to a third-party Web site through Anonymizer, which has a policy of not disclosing who is being redirected to connect to a site.
From page 46...
... Some people knew about it before then; they had just published a page on how to use this technique and how often it works to unblock a blocked site. The problem is that if the blocking software companies were to block it, they also would block many banner ads served by Akamai.
From page 47...
... It does all kinds of fancy things, such as scrambling the text on the source page and using JavaScript code to unscramble the text and write it. The censoring proxy server cannot block the page unless it parses the JavaScript to figure out what the actual text is.
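A minimal sketch of the idea, written in Python rather than JavaScript. The excerpt does not say which scrambling scheme was used; ROT13 is an assumed stand-in chosen only because it is simple, and the function names and sample text are hypothetical. The point is that a filter scanning the page source as delivered sees only the scrambled text, while the reader's browser shows the decoded text.

```python
import codecs

def keyword_scan(source, keywords):
    """Return the keywords found in the page source as delivered."""
    lowered = source.lower()
    return [k for k in keywords if k in lowered]

def scramble(text):
    """ROT13 stand-in for the page's scrambling; applying it twice restores the text."""
    return codecs.encode(text, "rot13")

# What the reader sees after the in-browser script has run (hypothetical text).
visible_text = "a filtered keyword appears here"
# What travels through the filtering proxy.
page_source = scramble(visible_text)

if __name__ == "__main__":
    print(keyword_scan(page_source, ["keyword"]))            # scan of the raw source
    print(keyword_scan(scramble(page_source), ["keyword"]))  # scan after decoding
```

A proxy that only scans the raw source finds nothing to match; it would have to execute or interpret the page's script to recover the text that the reader actually sees.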


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.