. "2 Text Categorization and Analysis." Technical, Business, and Legal Dimensions of Protecting Children from Pornography on the Internet: Proceedings of a Workshop. Washington, DC: The National Academies Press, 2002.
by area. There is probably a big difference in accuracy between pornography and the other objectionable areas. There is also a trade-off between false positives and false negatives, and the extent to which advanced techniques make a difference depends on where in that trade-off you start out. If I had to give a number, I would expect a 20 to 30 percent improvement in accuracy over the bag-of-words model, provided you want to let all good content through (that is, if you do not want over-blocking).
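The trade-off between false positives and false negatives can be made concrete with a small sketch. It assumes a classifier (bag-of-words or otherwise) that gives each page a score and blocks pages scoring above a threshold. Moving the threshold trades over-blocking (benign pages blocked, false positives) against under-blocking (objectionable pages let through, false negatives). The scores and labels are invented for illustration only, and the names `error_counts` and `zero_overblock_threshold` are hypothetical. This is not a method described at the workshop.

```python
from typing import List, Tuple

# Hypothetical toy data, invented for illustration (not workshop results).
# Each score is a classifier's confidence that a page is objectionable;
# LABELS[i] is True if page i really is objectionable.
SCORES: List[float] = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
LABELS: List[bool] = [True, True, False, True, True, False, True, False, False, False]


def error_counts(scores: List[float], labels: List[bool], threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) when pages scoring above `threshold` are blocked.

    False positive: benign page blocked (over-blocking).
    False negative: objectionable page let through (under-blocking).
    """
    fp = sum(1 for s, bad in zip(scores, labels) if s > threshold and not bad)
    fn = sum(1 for s, bad in zip(scores, labels) if s <= threshold and bad)
    return fp, fn


def zero_overblock_threshold(scores: List[float], labels: List[bool]) -> float:
    """Lowest threshold that blocks no benign page: the highest benign score."""
    return max(s for s, bad in zip(scores, labels) if not bad)


# Sweep the threshold to show how one error type falls as the other rises.
for t in (0.0, 0.25, 0.5, 0.75):
    fp, fn = error_counts(SCORES, LABELS, t)
    print(f"threshold={t:.2f}  over-blocked={fp}  under-blocked={fn}")

# The "let all good content through" operating point.
t0 = zero_overblock_threshold(SCORES, LABELS)
print(f"no over-blocking at threshold={t0:.2f}: {error_counts(SCORES, LABELS, t0)}")
```

With this toy data, the zero-over-blocking threshold of 0.80 still lets 3 of the 5 objectionable pages through. A better model helps at that operating point only to the extent that it pushes more objectionable pages above the highest benign score, which is one way to read the remark that the gain depends on where in the trade-off you start.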