7
A Critique of Filtering

Bennett Haselton

7.1 INTRODUCTION

I have been running the Peacefire.org site for about 5 years, and we have become known as a source of mostly critical information about blocking software and filtering. I am biased against the idea of filtering in general, as well as against the limitations of the existing products, but that is fair, because all intelligent people should have opinions about what they study. They simply need to design their experiments so that the person with the opinion cannot influence the outcome.

The earlier presentations provided a general idea of how different types of programs work. Some programs examine the text on a downloaded page to look for keywords in the Web page address (the uniform resource locator, or URL) or in the body of the page. Other programs are mainly list based; they do little analysis of the text on a page but have a built-in list of sites that are blocked automatically. All the programs that I know of are some combination of the two types. They have some keyword filtering and some list filtering, but they can be slotted easily into one of these categories.
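As a rough illustration of how the two mechanisms combine, here is a toy filter. It is not any vendor's actual logic; the blocklist entry, the keyword fragments, and the function names are all invented for this sketch:

```python
# Illustrative sketch of combined list-based and keyword-based filtering.
# The blocklist and keywords below are hypothetical examples only.

BLOCKLIST = {"example-adult-site.com"}   # list-based component
KEYWORDS = ("sex", "porn")               # keyword-based component

def is_blocked(url: str, page_text: str) -> bool:
    """Check the host against a blocklist, then do naive keyword matching
    on the URL and the downloaded page text."""
    host = url.split("/")[2] if "//" in url else url.split("/")[0]
    if host.lower() in BLOCKLIST:        # list check
        return True
    haystack = (url + " " + page_text).lower()
    return any(kw in haystack for kw in KEYWORDS)  # keyword check

print(is_blocked("http://example-adult-site.com/index.html", ""))          # True: on the list
print(is_blocked("http://example.com/marsexpl.html", "Mars exploration"))  # True: "sex" inside "marsexpl"
print(is_blocked("http://example.com/", "gardening tips"))                 # False
```

The second call shows why naive substring matching causes false positives: the match fires on a fragment of an innocent filename.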

Most mainstream commercial programs, such as Cyber Patrol, Net Nanny, and SurfWatch, are list based. People often talk about a scenario in which a site might get blocked if the word “sex” is in the title or first paragraph. This scenario has not been accurate for years. Sites can be blocked inaccurately, but this is not a correct way to describe what happens, because the most popular programs that look at words on the page also work off built-in lists of sites.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




Technical, Business, and Legal Dimensions of Protecting Children from Pornography on the Internet: Proceedings of a Workshop

7.2 DEFICIENCIES IN FILTERING PROGRAMS

The mainstream commercial programs used in the home, which filter and block pages on the fly rather than flagging them for later review, do not filter images. We did a study involving the only commercial program at the time that claimed to filter images on the fly, using 50 pornographic images taken from the Web and 50 nonpornographic images. We found that the software performed no better than chance when the images were placed in a location that the software did not know about in advance: all of the pornographic and nonpornographic images in the test remained accessible, so the claim of filtering based on image content turned out not to be true. The company later released fixes so that the program began to filter based on skin tone, but it could not do complex object recognition; the best it could do was count the number of skin-toned pixels in a picture and block based on that count. We did another test involving the 50 pornographic images and 50 nonpornographic pictures of people's faces, and the software scored exactly the same for each type; it was not able to tell the difference.

CYBERSitter is mostly a content-based program; Cyber Patrol is mainly a list-based one. The content-based programs are notorious for the errors that arise from blocking sites based on keywords on the page or in the URL; their analysis is nowhere near as advanced as the vector space model described earlier. Yet even though these programs are sloppy, the examples of what they block are not very controversial, because the company justifiably can say that it has no advance control over what gets blocked. If a certain phrase is in the word filter and a site happens to use that phrase, that is not really the company's fault.
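The pixel-counting heuristic described above can be sketched in a few lines. The RGB "skin" bounds and the 40 percent blocking threshold are invented for illustration; the product's actual rule was never published:

```python
# Sketch of skin-tone pixel counting. The "skin" bounds and the blocking
# threshold are hypothetical; no real product published its rule.

def is_skin(pixel):
    r, g, b = pixel
    # crude "skin tone" box in RGB space (invented bounds)
    return r > 95 and g > 40 and b > 20 and r > g and r > b and r - min(g, b) > 15

def skin_fraction(pixels):
    return sum(is_skin(p) for p in pixels) / len(pixels)

def block_image(pixels, threshold=0.4):
    return skin_fraction(pixels) >= threshold

# A close-up of a face is mostly skin-toned pixels, so the heuristic cannot
# tell it apart from pornography -- the failure the test above observed.
face = [(200, 150, 120)] * 80 + [(60, 60, 60)] * 20     # 80% skin-like pixels
forest = [(30, 120, 40)] * 95 + [(200, 150, 120)] * 5   # 5% skin-like pixels
print(block_image(face))    # True: a face gets blocked
print(block_image(forest))  # False
```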
Blocking software got a bad reputation initially because of examples like a page about the exploration of Mars being blocked: its title was "Mars Explore," its filename was marsexpl.html, and that filename contains the string "sex." I have a friend named Frank who made a Web page about Cyber Patrol, and he later found that his page was blocked, not because he was criticizing the software, but because his name was Frank and "ank" was on Cyber Patrol's list of dirty-phrase keywords. The list of blocked sites could not be edited, but the list of dirty phrases was viewable, and you could add and remove terms from it. Presumably to avoid offending the parents who had to deal with the list, the company put in word fragments instead of whole words; the list contained fragments such as "uck" and "ank," the latter apparently an abbreviation for "spanking," because the company wanted to block pages and chat channels about spanking fetishes. There are many other examples, some involving programs that even remove words from pages as they download them, without making it
obvious that words were removed. Sites blocked by the list-based programs are much more controversial, because the company controls exactly what is on the list. If you find something that is blocked, the company cannot claim that it did not know in advance; supposedly, everything on the list was checked for accuracy before being added.

We periodically publish reports on the Peacefire.org site about the types of sites we have found blocked. We focus on sites blocked by the list-based programs; finding sites blocked by the keyword-based programs is not very interesting, because you can find some part of almost every site blocked by something like CYBERSitter. If someone wants to know whether they have standing to challenge a local library filtering ordinance and wants an example, I say: "Well, if you have 20 or more documents, I will just run them through CYBERSitter and one of them will be filtered."

The main controversy regarding list-based programs is how they create the list of sites to block. The lists are divided into categories, and if a site is classified into one of these categories, it becomes inaccessible. This gives the illusion of more flexibility than really exists. If you are using, say, SurfWatch and you elect to block only sex sites, then you block the sites that SurfWatch has classified under its sex category, which may or may not be accurate. Even if the classification were accurate, it might not agree with your views on what a sex site is; and even if you did agree with the company on what qualified as a pornography site, the actual review process might not be accurate.

7.3 EXPERIMENTS BY PEACEFIRE.ORG

We are one of the third parties that have designed experiments to test the accuracy of the lists used by these companies. There are a couple of ways to do this.
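One of those ways, decoding the list file that ships with the software, is described next. As a toy illustration of the general shape of such an attack, here is a hypothetical blocklist obfuscated with a single-byte XOR; the real CYBERSitter and Cyber Patrol ciphers were proprietary and are not reproduced here:

```python
# Hypothetical single-byte XOR obfuscation -- NOT the real ciphers used by
# CYBERSitter or Cyber Patrol, just the general shape of the attack.

def xor_decode(data: bytes, key: int) -> str:
    return bytes(b ^ key for b in data).decode("ascii")

def plausible_keys(data: bytes):
    """Brute-force all 256 keys; keep those whose output looks like a
    newline-separated list of domain names."""
    allowed = set("abcdefghijklmnopqrstuvwxyz0123456789.-\n")
    return [k for k in range(256)
            if all(chr(b ^ k) in allowed for b in data)]

# A made-up "encrypted" list file, obfuscated with key 0x5A:
encoded = bytes(b ^ 0x5A for b in b"badsite.example\nanother.example\n")
for key in plausible_keys(encoded):
    print(key, xor_decode(encoded, key).split())
```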
The list of blocked sites is supposed to be secret and is not published, but it is always stored in a file that comes with the software. A client-based program keeps a local copy of the list, which you update periodically by downloading the latest version from the company. You can try to break the code on the file and decrypt it, using either Unsoftware or something else. I wrote a decryption program for CYBERSitter in 1997, and two other programmers wrote a decoding program for Cyber Patrol in 2000. You run one of these programs on a computer that has CYBERSitter or Cyber Patrol installed, and it reads the file, decrypts it, and prints the list of blocked sites into a text file.

The Digital Millennium Copyright Act (P.L. 105-304) was passed in 1998, and the Library of Congress was designated to set out regulations for how parts of the act would be enforced. Part of the act prohibited
decryption of certain files perceived to be storing trade secrets of the company that produced them. The Library of Congress, which had been following the controversy over third parties decrypting and criticizing blocking software's lists of blocked sites, specifically said that decrypting the list of sites blocked by a blocking program would be exempt from this prohibition. But at the time these decoding programs came out, there was no such exemption, so many people were worried about the consequences.

If a server product is installed on the Internet service provider's system, then you do not have access to the file where the list of blocked sites is stored. In that case you need to do a traffic analysis instead of decrypting. The hard way is trial and error: looking up your favorite sites, or sites from a directory like Yahoo. The easier approach is to run a list of sites through the program automatically; I have written scripts that run a large number of URLs through one of these programs and record exactly which ones are blocked. This takes some programming skill, and third parties who review this type of software generally do not go to this much trouble; reviewers for Consumer Reports or PC Magazine usually just use the trial-and-error approach. The flaw in that approach is that if you want a small sample of sites and you get them from a place like Yahoo, perhaps from one of Yahoo's pornography categories, then you will get an overly good impression of the software, because the software gets its list of pornography sites from the same type of place; any good program should block 100 percent of those sites. You want to test a larger sample of sites to get a more reliable accuracy rate.
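The scripted traffic-analysis approach can be sketched as follows. The block-page signatures and the offline stand-in fetcher below are invented so the example runs without a network or a real filter; in a real audit, fetch() would issue an HTTP request routed through the filtering product:

```python
# Sketch of traffic analysis: run a list of URLs through the filter and
# record which are blocked. The block-page signatures are hypothetical.

BLOCK_SIGNATURES = ("has been blocked", "access denied")  # invented markers

def detect_block(html: str) -> bool:
    """Guess whether the returned page is the filter's block page."""
    page = html.lower()
    return any(sig in page for sig in BLOCK_SIGNATURES)

def audit(urls, fetch):
    """fetch(url) -> HTML as seen through the filter; returns blocked URLs."""
    return [u for u in urls if detect_block(fetch(u))]

# Stand-in fetcher so the sketch runs offline.
fake_responses = {
    "http://a-1-plumbing.example": "<h1>This site has been blocked</h1>",
    "http://a-1-tools.example": "<h1>Welcome to A-1 Tools</h1>",
}
print(audit(fake_responses, fake_responses.get))
```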
In one study, we took a cross section of 1,000 dot-com domain names from the files of Network Solutions, which keeps track of all 22 million (and counting) dot-com sites. We wanted a random selection, but the problem was that if the blocking error rate came out too high with a random selection, anyone could claim that we had stacked the deck by not taking a truly random sample. This is a deeply politicized issue, and the companies knew me as someone with strong feelings about it; it would have been too easy for them to say that we must have cheated by using a disproportionate number of sites that we knew were errors. Therefore, we took the first 1,000 dot-com sites in an alphabetical list of all of the sites, because the first ones are no more or less likely to contain errors than the rest of the list. They all began with "A-1," I think. The report is linked to my subpage; you can see the 1,000 sites that we used, which ones were blocked, and which of those we classified as errors or nonerrors. The sites that we classified as inaccurately blocked were cases in which we believed that no reasonable person could possibly consider them accurately blocked; these sites were about things like plumbing, aluminum siding, and home repair toolkits. There
was absolutely no doubt that these were errors; we did not encounter any borderline cases at all. I did the analysis again using 1,000 random dot-com sites, and in every case the result was within 10 percent of the error rate we got doing it the alphabetical way. We publicized this report with a strong caveat that the second digit of the error rate should not necessarily be taken as accurate. For example, if the error (false positive) rate is 50 percent, we are saying only that the actual error rate is likely to be close to 50 percent. If a company claims that it is 99 percent accurate, and we get 30 blocked sites and 15 of them are errors, our 50 percent figure could indicate an error rate anywhere from 30 percent to 70 percent, but we can say with near certainty that the 99 percent claim is false.

Of the 1,000 dot-com sites in the study, the programs blocked anywhere from 5 to 51 sites. Of those blocked sites, how many did we feel were errors? In the case of 5 blocked sites, the error count is not very meaningful; in the case of 50 blocked sites, there is still a certain spread of error. The intent was not so much to come up with a hard number for accuracy as to address the question of whether the "99 percent" claims are true.

Here is what we found. Cyber Patrol blocked 21 sites, and 17 of them were mistakes. These were not borderline cases at all; these were sites selling tool hardware, home repair kits, and the like.[1] The examples of blocked sites are listed on our page, so you can verify which sites from the first 1,000 were recorded as blocked or not blocked.

[1] Bob Schloss asked whether the same host might be hosting both a pornographic site and a hardware site, and, because of the way in which domain names, IP addresses, and port numbers are mapped, the hardware site ends up blocked along with the pornographic site. Susan Getgood said Cyber Patrol formerly contained a bug that allowed this to happen, which Peacefire.org may have known about and used in designing the test. She said the technical problem involving hosted servers has been solved in all network versions used in schools and libraries. Bennett Haselton noted that the company's Web page specifically said that material does not have to be blocked because it shares an IP address with another blocked site; if IP address sharing is the cause of blocking, then that is a false claim. The Web hosting issue has been around for several years and also applies to proxy servers. The BESS filtering system and the parental controls of America Online see the host name, not the IP address, of the site that a user tries to access, so they should not have this problem.

We took screen-capture images of the sites being blocked, showing the message, "This site has been blocked by this software." Obviously, a screen capture is not proof, because it is trivial to fake an image. But there is a danger of people
being suspicious that the study was done incorrectly, that there was a bug in the scripts we used to record the number of sites blocked, or that a site was down at the time and we mistakenly recorded it as blocked.

A rate of 17 errors in the first 1,000 dot-com sites on the list, extrapolated across the entire name space of 22 million dot-com sites, yields a figure of several hundred thousand incorrectly blocked sites in the dot-com name space alone, not even counting the dot-org and dot-net name spaces. SurfWatch's error (i.e., false positive) rate was 82 percent; it blocked 42 sites incorrectly and 9 correctly. Even though the same company owned both SurfWatch and Cyber Patrol by that time, the lists of sites they blocked turned out to be different. AOL's Parental Controls, which supposedly uses Cyber Patrol's list, blocked fewer sites, possibly because it was using an older version or because the list was frozen after AOL licensed it from Cyber Patrol. When we found the SurfWatch number, we knew we had better collect all the backup documentation we possibly could, because the error rate was so high.

The reason that people do not get these high error rates when casually testing the software is that they test their favorite sites or sites they know about, and errors involving popular sites have already been spotted and corrected, so they get an overly good picture of how well the software works. People spend a certain amount of time on sites that everyone else spends time on, but they also spend time on sites that are less popular. Therefore, we are concerned about errors among the less popular sites, even though we know that the popular sites contain fewer errors. Moreover, the SurfWatch error rate is not okay if you are one of those 42 sites blocked incorrectly.
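The arithmetic above is easy to check. The confidence interval below uses a textbook normal approximation, which is my gloss on the quoted 30-to-70-percent spread, not necessarily the calculation the study itself used:

```python
# Checking the extrapolation and the error-bar claim with simple binomial
# arithmetic (a rough sketch, not the study's own method).
import math

def extrapolate(errors, sample, population):
    """Scale an observed error count up to the whole name space."""
    return errors / sample * population

# 17 errors in 1,000 names, scaled to 22 million dot-com names:
print(round(extrapolate(17, 1000, 22_000_000)))   # 374000 -- "several hundred thousand"

def error_rate_interval(errors, blocked, z=1.96):
    """95% normal-approximation interval for the false-positive rate."""
    p = errors / blocked
    half = z * math.sqrt(p * (1 - p) / blocked)
    return p - half, p + half

# 15 errors out of 30 blocked sites: the point estimate is 50 percent, and
# the plausible range is wide -- but still nowhere near a 1 percent error rate.
low, high = error_rate_interval(15, 30)
print(f"{low:.0%} to {high:.0%}")   # 32% to 68%
```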
We plan to do a follow-up study in which we look at the error rates in a sample of 1,000 sites returned from a search on Google or AltaVista, in which the more popular sites are pushed to the top. I expect that the error rate in that sample will be lower, because the popular sites are weighted more heavily.

This study measured only the percentage of blocked sites that are mistakes: false positives. It did not measure the percentage of pornographic sites that are blocked, or the percentage of nonpornographic sites that are not blocked. To determine how good the programs are at blocking pornography, we first would have to find out how many of the 1,000 dot-com sites are pornographic and then see how many of those are blocked. We used the same 1,000 dot-com sites for every program except BESS (a filter made by N2H2), which blocked 26 of 1,000 sites, 19 appropriately and 7 by mistake. We did the experiment first with SurfWatch, and that study was published first, last August. We thought the other companies
might have heard about the first study and fixed their programs to block fewer sites incorrectly within that small 1,000-site sample. It turned out that none of them apparently had heard about it, because their error rates were the same as before, except for BESS. In BESS, we observed a clean break in the error-rate pattern: we took the first 2,000 dot-com sites, and the first 1,000 contained no errors, but right after that the usual error pattern appeared.[2] Technically, all they did was fix errors in their software, so can we accuse them of cheating or not? Because they had removed the errors from the sample they knew we were using, we used the second set of 1,000 dot-com sites.

Our conclusion from this study was that people are not actually checking every site before putting it on a list. If there are 42 errors among the first 1,000 dot-com sites in a list, then there is no way of knowing how many errors occur throughout the entire space of 22 million. This does not necessarily mean there is a conspiracy at the highest levels of the company. The most innocent explanation may be that some intelligent, lower-level employee whose job it was to find these sites wrote a program that scoured for them and added them to the list automatically, without anyone necessarily having to look at them first. There is no obvious explanation for how someone could have looked at one of these sites and determined that it was offensive.

The borderline cases receive a lot of attention, because someone brings them to the company's attention and there are debates about whether the blocking is appropriate. This happened, for example, with an animal rights page that was blocked by Cyber Patrol; there was a discussion about whether its depictions of victims of animal testing were appropriate.
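A rough calculation shows why the clean break looked like targeted cleanup rather than chance. The independence assumption here is mine, not the study's, and the 7-per-1,000 figure is BESS's error count from the second thousand names:

```python
# Probability of zero erroneous blocks in n names if each name independently
# has the observed per-name error rate (a rough model, assumed independence).

def prob_zero_errors(per_name_rate, n):
    return (1 - per_name_rate) ** n

# With roughly 7 erroneous blocks per 1,000 names, a spotless first
# thousand would be very unlikely to occur by chance:
print(prob_zero_errors(7 / 1000, 1000))   # about 0.0009
```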
[2] David Forsyth suggested that the substantial difference in results between tests of 1,000 sites and tests of 2,000 sites means that 1,000 sites is too small a set with which to conduct an experiment like this.

But the vast majority of blocked sites, which no one has ever viewed, are moving targets: if you raise the issue of particular sites, the company generally will fix those problems right away, and then it becomes a question of finding more blocked sites. That was why we did the study using 1,000 dot-com sites; even if these specific errors were fixed, the fact that we found them in this cross section says something about the number of errors in the list as a whole.

Sites can be blocked erroneously for reasons other than a lack of human review. In an incident that became the baseline in discussions about the appropriateness of blocking software, Time magazine wrote an online article about CYBERSitter's blocking policies and the controversy over
the blocking of a gay rights advocacy group's Web pages. CYBERSitter put pathfinder.com, Time magazine's domain name, on its list. The magazine's Web site has an article written after CYBERSitter blocked the site, which is good, because otherwise nobody would believe me.

At the other end of the spectrum, I sent e-mail to Cyber Patrol saying that the American Family Association (AFA) Web site, the home page of an extremely conservative organization, should be blocked as a hate site because of the amount of antigay rhetoric. Because most programs' published definitions of hate speech include discrimination based on race, gender, or sexual orientation, Cyber Patrol agreed to block the site, and it is still on the list today. This is an example of controversial blocking; many of Cyber Patrol's customers would not choose to block this type of site themselves.

Many filtering companies, in their published definitions of hate speech, have painted themselves into a corner by including discrimination based on race, gender, and sexual orientation. There are many extremely conservative religious organizations, reasonably well respected, that publish speech denigrating people based on sexual orientation. The speech does not have to be hateful; it just has to meet the discrimination criteria. ("I Hate Rudy Giuliani" is not a hate site.) Even though antigay speech generally is considered politically incorrect, it is not so politically incorrect that many people would favor blocking it in a school environment, the way they might favor blocking the Ku Klux Klan Web site.

We did an experiment a couple of months ago in which we nominated some pages on Geocities and Tripod to be blocked by SurfWatch, Cyber Patrol, Net Nanny, and some of the other companies, saying that the quotes on the pages constituted antigay hate speech. The quotes said things like, "We believe that homosexuality is evil, unhealthy, and immoral and is disruptive to individuals and societies." The companies agreed to block the pages. Then we revealed that we had created the pages and that they consisted of nothing but quotes taken from the Focus on the Family Web page or the Dr. Laura Web page. We asked the companies whether, to be consistent, they also planned to block those sites. So far, all of the companies have declined. Net Nanny was the only one that responded, saying it would consider blocking the subpages of the sites that contained the blocked material; but about 6 months have passed since then, and the company still has not done it.

We concluded that an unspoken criterion for whether or not to block a page is how much clout the organization that owns the page has and whether it could incite a boycott against the filtering company. If Dr. Laura talked on her radio show about how Cyber Patrol or SurfWatch blocked her Web site, it could alienate a good proportion
of potential customers, as well as possibly lead to a situation in which someone sues a local school or library for blocking access to political speech. If conservatives join forces to raise a legal challenge to speech blocked in a school or library, then it becomes a larger problem. Even without that experiment, the point is still valid: the companies say they block speech that is discriminatory based on race, gender, or sexual orientation, yet we have examples of unblocked sites, run by large or well-funded groups, that meet that definition in a way no reasonable person could dispute.[3]

[3] Susan Getgood said that Cyber Patrol reviewed the four pages that Peacefire.org created and blocked them. The company also reviewed the four source sites but decided not to put them on the list. Cyber Patrol does block afa.net and will continue to do so; the AFA promotes a boycott of Disney because it offers same-sex partner benefits. Getgood said that Cyber Patrol is not afraid of an organization's clout; she receives mail from the AFA every 2 months asking for a site re-review, which is done. Bennett Haselton said that the AFA is less mainstream than other groups focusing on the family, such as the Family Research Council, which has a large lobbying group in Washington, D.C.

We recently published two reports, linked to our main page, about Web sites blocked by various programs. One is Blind Ballots, about candidates in the 2000 U.S. elections whose Web sites were blocked; these candidates included Democrats, Republicans, and one Libertarian, blocked by BESS and Cyber Patrol. The other is Amnesty Intercepted, about Amnesty International Israel and other human-rights-related Web pages blocked by programs such as SurfWatch, BESS, Cyber Patrol, and CYBERSitter. These reports were published just before the U.S. Congress passed a law requiring schools and libraries that receive federal funding to use blocking software. I think the reports will still come in handy as the debate continues about the appropriateness of blocking software; just because they did not stop passage of the law does not mean that they will not be used as evidence in the court cases to be filed over the law's legality.

There is a question about whether some of the more obvious mistakes made by blocking software can be avoided by disabling the function that dynamically examines pages as they are downloaded and blocks them based on certain keywords. But if the list of blocked sites was itself assembled using keyword searches, and the pages were not necessarily reviewed first, then in effect the keyword blocking cannot be turned off, even if the software is installed in an environment (such as a library) in which the administrator wants to be extra careful about not blocking sites that should not be blocked.

7.4 CIRCUMVENTION OF BLOCKING SOFTWARE

Blocking software can be circumvented. The easiest way is to find pornography that is not blocked; if you run a search, it is not difficult to find unblocked sites. Everyone who runs a search, with small changes in the query, will get a completely different list of results, so you often find at least one site that is not blocked.

You also can disable the software, either by moving files around or by running programs to extract the password. I have written some of these programs. I wrote them because the standards that people use to determine what is indecent and pornographic strike me as arbitrary and silly; I have never heard an explanation for why a man's chest, but not a woman's chest, can be shown on TV. The companies that make the software are reinforcing those standards of decency. Whether parents should have a right to filter is still a political issue; rights are more abstract, and it is difficult to talk about them. I wrote these programs because I believe that no harm is done if you see something that your parents do not want you to see. All of us can think of things that our parents did not want us to see when we were growing up, all of us can think of examples of when we thought they were wrong, and some of us still believe that they were wrong.

People would not use a program like this just to find pornography, because it is trivially easier to find pornography than to disable the software. People use such a program when they need to access a specific site that happens to be blocked: either a borderline case, like a sex education site, or something that they do not think should be blocked at all.

People have asked me whether I think nothing ever should be blocked. I usually give the example that, if I had a friend whom I thought was depressed and likely to read something that might provoke suicide, then I might go out of my way to try to stop him or her from reading that material. What I would not do is say, "If they're under 18, then I have the right to interfere, but if they're over 18, I can't stop them." I think that criterion is arbitrary and silly, and that it is a red herring people use to avoid thinking about the real censorship issues at stake.

Anonymizer.com is a site that enables you to circumvent blocking software. You can connect to a third-party Web site through Anonymizer, which has a policy of not disclosing who is being redirected to which site. Anyone can circumvent blocking software by going to Anonymizer and typing in the site that they want to access, because blocking software looks at the first site you connect to, not at the URL you ultimately request. However, all blocking software blocks Anonymizer itself. We never make a big deal out of this, because it is not something worth complaining about. SafeWeb is a site that does the same type of thing.
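The reason this redirection approach works can be sketched in a few lines: the filter inspects the host being contacted, while the real destination rides inside the URL as a parameter. The proxy host name and query format below are invented for illustration; Anonymizer's actual interface may differ:

```python
# Why a redirection service defeats list-based filtering: the filter sees
# only the host you connect to. Proxy host and query format are invented.
from urllib.parse import quote, urlsplit

def via_proxy(target_url, proxy_host="proxy.example.org"):
    """Wrap a target URL so the connection goes to the proxy host."""
    return f"http://{proxy_host}/fetch?url={quote(target_url, safe='')}"

def host_seen_by_filter(url):
    return urlsplit(url).hostname

wrapped = via_proxy("http://blocked-site.example/page.html")
print(wrapped)
print(host_seen_by_filter(wrapped))   # proxy.example.org, not the blocked host
```

This also makes the countermeasure obvious: block the proxy host itself, which is exactly what every filter does with Anonymizer.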

Translator services also are blocked. Babelfish.AltaVista.com is a site where you can type in the URL of a foreign-language site and the words will be translated from that language to English, or vice versa. The rationale behind blocking this site was that otherwise the pictures would come through. But Babelfish cannot be used to access images, because it does not modify the image tags; the images are loaded from the original location, because Babelfish does not want that data traffic. The text comes through translated (poorly), but the images remain blocked. We published a short piece on why this was probably an unnecessary overreaction on the part of the blocking software companies, because the text is converted and the images are not accessible.

The third example is Akamai.com, a content distribution service. If you sign up, then the images on your site, instead of being loaded from your site, can be loaded through Akamai's servers to save on your bandwidth costs. It is a caching service with servers distributed around the country; a person who requests one of these images gets it from the server closest to them. It is a complex scheme that can shave seconds off the load time of a page, so many people place a high value on it. The catch is that a loophole in the software allows you to put any URL on the end, and it will fetch the page through Akamai and deliver it to you.[4] We pointed this out last August, but it still works. Some people knew about it before then; they had published a page on how to use the technique and how often it works to unblock a blocked site. The problem is that if the blocking software companies were to block Akamai, they also would block the many banner ads served by Akamai; it is used mostly for banner ads, to save on bandwidth costs, although large sites, such as Yahoo, also use it to serve their own images.

[4] Milo Medin emphasized that this is a bug, which should be fixed, as opposed to a generic issue.

Programs installed on a network are more difficult to circumvent by moving files around or disabling the software locally, but you can circumvent them by finding unblocked pornography or by using the Akamai trick. In addition, if you have the cooperation of someone on the outside willing to set up an Anonymizer-type program on a server, then you can go through that program to access whatever you want. This is becoming easier to do, and people are starting to publish smaller, more lightweight versions of Anonymizer that anyone can put on a Web page as a secret means for them and their friends to tunnel through and access blocked sites. We are working on one of those. It does all kinds of fancy things, such as scrambling the text on the source page and using JavaScript code to unscramble the text and write it out; the censoring proxy server cannot block the page unless it parses the JavaScript to figure out what the actual text is.

To summarize, two points are important. First, a significant percentage of blocked sites have never been reviewed by humans. This situation may be due to honest errors, such as IP address sharing or employees whose eyes are glazing over, but one way or another, significant amounts of content are blocked that should not be. Second, it is easy to circumvent blocking software.
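The text-scrambling idea can be sketched as follows. ROT13 stands in for whatever encoding the real tool uses (its actual scheme is not documented here); the point is only that a keyword-scanning proxy sees no plaintext to match:

```python
# Server-side sketch: ship the page text obfuscated (ROT13 here, purely as
# a stand-in) so a keyword-scanning proxy finds nothing to match; a script
# embedded in the page would reverse the encoding in the browser.
import codecs

def scramble_page(text):
    encoded = codecs.encode(text, "rot13")
    # Placeholder for the client-side unscrambler the real page would embed.
    return f"<body data-enc='{encoded}'><script>/* decode data-enc */</script></body>"

page = scramble_page("some banned keyword")
print("banned" in page)                               # False: nothing for the scanner to match
print(codecs.decode("fbzr onaarq xrljbeq", "rot13"))  # the browser-side reversal
```

A proxy that wanted to keep filtering such pages would have to execute or parse the embedded script to recover the text, which is exactly the burden the technique is designed to impose.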