6 Advanced Techniques for Automatic Web Filtering
Pages 33-35

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 33...
... One way to keep children from viewing inappropriate material is to maintain dedicated, pornography-free Web sites, such as Yahoo! Kids and disney.com, and to assign reviewers to monitor those particular sites.
From page 34...
... The common image-processing challenges to be overcome include nonuniform image backgrounds; textual noise in the foreground; and a wide range of image quality, camera positions, and composition. This work was inspired by the Fleck-Forsyth-Bregler System at the University of California at Berkeley, which classifies images as pornographic or not.[4] The published results were 52 percent sensitivity (i.e., 48 percent false negatives)
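
The relationship between sensitivity and false negatives quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the function names and the counts (52 detected, 48 missed out of 100 objectionable images) are assumptions chosen to match the quoted percentages, not data from the published study.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of truly objectionable images that the classifier flags."""
    return true_positives / (true_positives + false_negatives)


def false_negative_rate(true_positives: int, false_negatives: int) -> float:
    """Fraction of truly objectionable images that the classifier misses."""
    return false_negatives / (true_positives + false_negatives)


# Hypothetical counts: 100 objectionable images, 52 flagged and 48 missed.
tp, fn = 52, 48
sens = sensitivity(tp, fn)
miss = false_negative_rate(tp, fn)
print(f"sensitivity = {sens:.0%}, false negatives = {miss:.0%}")
```

The two rates always sum to one, which is why a 52 percent sensitivity implies that 48 percent of objectionable images pass through undetected.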
From page 35...
... A statistical analysis showed that if you download 20 to 35 images from each site and 20 to 25 percent of the downloaded images are objectionable, you can classify the Web site as objectionable with 97 percent accuracy.[6] Image content analysis can be combined with text and IP address filtering. To avoid false positives, especially for art images, you can skip images that are associated with the IP addresses of museums, dog shows, beach towns, sports events, and so on.
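
A minimal sketch of the site-level decision rule described above, assuming a per-image classifier has already labeled each downloaded image. The function name `classify_site`, the 0.20 threshold (the lower end of the quoted 20 to 25 percent range), the 20-image minimum, and the IP addresses (taken from the reserved documentation ranges) are all illustrative assumptions. The sketch does not reproduce the 97 percent accuracy figure, which depends on the underlying image classifier.

```python
from typing import Iterable, Optional, Set, Tuple


def classify_site(
    images: Iterable[Tuple[str, bool]],
    threshold: float = 0.20,
    min_images: int = 20,
    skip_ips: Optional[Set[str]] = None,
) -> Optional[bool]:
    """Decide whether a site looks objectionable.

    `images` holds (source_ip, flagged) pairs, where `flagged` is the
    per-image classifier's verdict. Returns True or False, or None when
    too few images remain to make a decision.
    """
    skip_ips = skip_ips or set()
    # Ignore images served from skipped addresses (e.g. museums).
    flags = [flagged for ip, flagged in images if ip not in skip_ips]
    if len(flags) < min_images:
        return None
    # Fraction of the remaining images that were flagged.
    fraction = sum(flags) / len(flags)
    return fraction >= threshold


# Hypothetical sample: 25 images from one address, 6 of them flagged (24 percent).
sample = [("192.0.2.10", True)] * 6 + [("192.0.2.10", False)] * 19
print(classify_site(sample))

# Same flagged images, but served from a skipped (museum) address.
mixed = [("198.51.100.7", True)] * 6 + [("192.0.2.10", False)] * 20
print(classify_site(mixed, skip_ips={"198.51.100.7"}))
```

Returning `None` for small samples keeps the rule from deciding on fewer images than the quoted analysis assumes. Skipping addresses before counting means art images from a museum neither raise nor dilute the flagged fraction.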


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.