3
Categorization of Images
David Forsyth
3.1 CHALLENGES IN OBJECT RECOGNITION
The process of determining whether a picture is pornographic involves object recognition, which is difficult for a lot of reasons. First, it is difficult to know what an object is; things look different from different angles and in different lights. When color and texture change, things look different. People can change their appearance by moving their heads around. We do not look different to one another when we do this, but we certainly look different in pictures.
The state of the art in object recognition is finding buildings in pictures taken from satellites. Computer programs sometimes can find people. We are good at finding faces. We can tell—sort of—whether a picture has nearly naked people in it. But there is no program that reliably determines whether there are people wearing clothing in a picture, so the main way to look for people is to look for the ones without clothes. It is a remarkable fact of nature that virtually everyone’s skin looks about the same in a picture (even across different racial groups), as long as we are careful about intensity issues. Skin is easy to detect reliably in pictures, so the first thing we look for is skin. But we need to realize that photographs of the California desert, apple pies, and all sorts of other things also have a similar color. Therefore, we need a pattern for how skin is arranged.
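The color stage of such a detector can be sketched in a few lines. This is a hedged illustration, not the detector from any published system: the chromaticity thresholds below are invented values, and a real detector would be tuned on labeled data.

```python
import numpy as np

def skin_mask(image):
    """Boolean mask of pixels whose color falls in a crude 'skin' range.

    The thresholds are illustrative assumptions, not published values.
    Working in normalized chromaticity (color divided by brightness)
    discounts overall brightness, the "intensity issue" mentioned above.
    """
    r, g, b = (image[..., i].astype(float) for i in range(3))
    intensity = r + g + b + 1e-6  # avoid dividing by zero on black pixels
    rn, gn = r / intensity, g / intensity
    return (rn > 0.35) & (rn < 0.55) & (gn > 0.28) & (gn < 0.36)

def skin_fraction(image):
    """Fraction of the picture covered by skin-colored pixels."""
    return float(skin_mask(image).mean())
```

A picture of an apple pie passes a color test like this just as easily as a person does, which is exactly why the arrangement of the skin-colored regions matters.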
Long, thin bits of skin might be an arm, leg, or torso. Because the kinematics of the body are limited, arms and legs can be arranged only in certain ways. If I find an arm, for example, then I know where to look for a leg. If I put enough of them together, then there is a person in the picture. If there is a person and there is skin, then they have no clothes on, and there is a problem. We could reason about the arrangement of skin, or we could simply say that any big blob of skin must be a naked person. Our classification was based on kinematics.
Performance assessment is complicated. There are two things to consider: first, the probability that the program will say a picture is rude when it is not (a false positive) and, second, the probability that the program will say a picture is not rude when it is (a false negative). Although it is desirable to make both numbers as small as possible, the appropriate trade-off between false positives and false negatives depends on the application, as described below. Moreover, false-positive and false-negative rates can be measured in different ways. Doing the experiments can be embarrassing because a large number of pictures must be collected and viewed, and other practical issues make them tricky as well. The experiments are difficult to compare because they all use different sets of data, and people usually report the experiments that display their work in a good light. Given all this, it is not easy to say what would happen if we dropped one of these programs on the Web.
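The two error rates can be made concrete with a small sketch; the verdicts and labels below are invented for illustration.

```python
def error_rates(predictions, labels):
    """False-positive and false-negative rates for a batch of verdicts.

    predictions: detector verdicts (True = "the picture is rude")
    labels:      ground truth     (True = "the picture really is rude")
    """
    fp = sum(p and not y for p, y in zip(predictions, labels))
    fn = sum(y and not p for p, y in zip(predictions, labels))
    negatives = sum(not y for y in labels)  # genuinely innocent pictures
    positives = sum(labels)                 # genuinely rude pictures
    return fp / negatives, fn / positives

# Four pictures: the detector misses one rude picture and flags one pie.
fp_rate, fn_rate = error_rates([True, True, False, False],
                               [True, False, True, False])
# both rates come out to 0.5
```

Which of the two rates matters more depends entirely on the application, as the monitoring and blocking scenarios below show.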
3.2 SCREENING OF PORNOGRAPHIC IMAGES
One way to reduce viewing of pornographic images is intimidation. A manager or parent might say to employees or children that Internet traffic will be monitored. They might explain that the image categorization program will store every image it is worried about in a folder and, once a week, the folder will be opened and the contents displayed. If the images are problematic, the manager or parent will have a conversation with the employee or child. This approach might work, because when people are warned about monitoring, they may not behave in a silly way.
But it will work only if there is a low probability of false positives. No one will pay attention to monitoring if each week 1,500 “pornographic” pictures are discovered in the folder, all being pictures of apple pies that the program has misinterpreted. The security industry usually says that people faced with many false positives get bored and do not want to deal with the problem.1 On the other hand, a high rate of false negatives is not a concern in this context. Typically, in a monitoring application, letting
one or two pictures sneak in is not a problem. Even with a high false-negative rate, we will still get a warning: we might not see every rude picture, but we will know there is an issue.
Another approach is to screen every picture coming through a network. We could fill a building with banks of people looking at all the pictures and saying, “I don’t like this one,” but this is not practical. We could take a “no porn shall pass” attitude, but then we really care whether the probability of a false negative is small, and there is a risk that we might not know what is being kept out. Large chunks of information might be ruled objectionable by the program without, in fact, being objectionable, and we would not know about it.
Yet another approach is site classification. We could look at a series of pictures from one site, and if our program thinks that enough of them are rude, then we could say that the whole site is rude. We need to be careful about such rules, however, because of a conditional probability issue, as discussed below.
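A site-level rule of the kind just described might look like the following sketch; the 50 percent threshold is an assumption for illustration.

```python
def site_is_rude(image_verdicts, threshold=0.5):
    """Flag a whole site when the fraction of its images that the
    per-image detector marks as rude exceeds a threshold (an assumed
    50 percent here).

    image_verdicts: one boolean per image sampled from the site.
    """
    return sum(image_verdicts) / len(image_verdicts) > threshold
```

The danger is that a per-image detector fooled by puddings will flag most of the images on a cooking site, so the site-level rule inherits and amplifies the per-image mistake.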
A program that I wrote with Margaret Fleck marks about 40 percent of pornographic pictures, where a pornographic picture is an image that can be downloaded from an adult-oriented site. The program thinks a picture is pornographic if it contains lots of stuff that looks like skin, laid out in long strips in a certain arrangement. A picture that appears to have lots of skin but in the wrong arrangement is not judged to be pornographic, and pictures with little skin showing are not identified as pornographic. But pictures of things like deserts, cabins, the Colorado plateau, cuisine, barbecue, salads, fruit, and the colors of autumn are sometimes identified as pornographic. Spatial analysis is difficult and is done poorly; the program often identifies pies as torsos. But the program is not completely worthless—it does find some naughty pictures. Sometimes the colors in a photograph are not adjusted correctly, so that the skin does not look like skin but the background does. This seldom happens, though, because it makes people look either seasick or dead; usually, the people who scan the film adjust the colors.
This brings up the conditional probability issue. This program is slightly better at identifying pictures of puddings than it is at detecting pictures of naked people, because an apple tart looks like skin arranged in lines and strips. Generally, if a Web page contains pictures of puddings, then the program says each picture is a problem and, therefore, the Web page is a problem. This is a common conditional probability issue that arises in different ways with different programs. There is no reason to believe that computer vision technology will eliminate it.
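Bayes’ rule makes the issue concrete. The numbers below are invented for illustration: a detector that marks 40 percent of pornographic pictures, flags 8 percent of innocent ones, and is applied to a site where only 1 percent of the pictures are actually pornographic.

```python
def posterior_rude(p_flag_given_rude, p_flag_given_clean, prior_rude):
    """Bayes' rule: probability that a flagged picture really is rude."""
    p_flag = (p_flag_given_rude * prior_rude
              + p_flag_given_clean * (1 - prior_rude))
    return p_flag_given_rude * prior_rude / p_flag

# Invented numbers: on a cooking site where 1% of pictures are rude,
# a detector with a 40% hit rate and an 8% false-alarm rate gives a
# posterior of only about 0.05 for each flagged picture.
p = posterior_rude(0.40, 0.08, 0.01)
```

In other words, on a pudding-heavy site nearly every flag is a false alarm, even though the same detector is useful where pornographic pictures are common. This is the conditional probability issue in a nutshell.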
Mike Jones and Jim Rehg did some work on skin detectors. When they found skin, they looked for a big skin blob and, if it was big enough, they said the picture was a problem. The program cannot tell whether a person is wearing a small bathing costume or whether the skin belongs to a dog rather than a human. They plotted the probability of a false positive against the probability of detection. If you wanted only a 4 percent probability of a false positive, for example, then you would mark about 70 percent of pornographic pictures. I am not sure whether they used as many pictures of puddings or the Colorado desert in their experiments as I did, and the mix of images in the test set also affects the results; doing these experiments right is not easy. They analyzed text as well as images; I think they used a simple bag-of-words model with perhaps some conditional probability function. To mark about 90 percent of the pornographic pictures, you would accept about 8 percent false positives, which could be a very serious issue. Unless you are in the business of finding out who is looking at rude pictures, 8 percent false alarms would be completely unacceptable.
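Reading an operating point off a curve like that can be sketched as follows; the scores are invented, and a real evaluation would use thousands of labeled pictures.

```python
def detection_rate_at_fp(rude_scores, clean_scores, max_fp_rate):
    """Best detection rate achievable while holding the false-positive
    rate at or below max_fp_rate, found by sweeping a score threshold.
    A toy version of picking one point on an ROC curve.

    rude_scores / clean_scores: detector scores for pictures known to
    be pornographic / innocent (higher score = more suspicious).
    """
    best = 0.0
    for t in sorted(set(rude_scores) | set(clean_scores)):
        fp = sum(s >= t for s in clean_scores) / len(clean_scores)
        if fp <= max_fp_rate:
            det = sum(s >= t for s in rude_scores) / len(rude_scores)
            best = max(best, det)
    return best
```

Tightening the allowed false-positive rate forces the threshold up and the detection rate down, which is exactly the trade-off in the numbers above.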
Several things make it easier to identify pornography than you might think. First, people tend to be big in these pictures because there is not much else. There are also wild correlations among words, pictures, and links. Most porn Web sites are linked to most others. What you think about a picture should change based on where you came from on the Web.
Filtering, or at least auditing, can be done in close to real time. A Canadian product called Porn Sweeper audits in close enough to real time that the producers claim that someone transmitting or receiving large numbers of these pictures will get a knock on the door within the next day or so, rather than the next month. But this is not fast enough to meet everyone’s needs.
3.3 THE FUTURE
Face detection is becoming feasible. The best systems recognize 90 percent of faces with about 5 percent false positives. This is good performance and getting much better.2 In 3 to 5 years, the computer vision community will have many good face-detection methods. This might help in identifying pornography, because skin with a face is currently more of a problem than skin without a face. Face detection technology probably can be applied to very specific body parts; text and image data and connectivity information also will help.
However, I do not believe that the academic computer vision community will be highly engaged in solving this problem, for three reasons. First, it embarrasses the funding agencies. Second, my students have been tolerant, but it is difficult to assign a job containing all sorts of problematic pictures. Third, it embarrasses and outrages colleagues, depending on their inclinations.
Technical solutions can help manage some problems. I am convinced that most practical solutions will have users in the loop somewhere. The user is not necessarily a child trying to avoid pornography; he or she may be a parent who backs up the filter and initiates a conversation when problematic pictures arise. What is almost certainly manageable, and going to become more so, is a test to determine whether there might be naked people in a picture. The intimidation scenario described above could work technically in the not too distant future.
What will remain difficult are functions such as distinguishing hard-core from soft-core pornography. These terms are used as though they mean something, but it is not clear that they do. Significant aspects of this problem are basically hopeless for now. There have been reasonable disagreements about the photographs of Jock Sturges, for example. Many depict naked children. They are generally not felt to be prurient, but whether they are problematic is a real issue. There is no hope that a computer program will resolve that issue.
Another example of a dilemma is a composite photograph prepared by someone whose intentions were clearly prurient. One side shows children on a beach looking in excited horror at the other side of the frame, where a scuba diver is exposing himself. There was a legal debate over this photo in the United Kingdom and a legal issue in this country as well. One part of the photo showed kids pointing at a jellyfish on the beach; the other part was a lad with his shorts off. Real people might believe that the intention of that photograph is prurient and seriously problematic, but there is no hope that a computer program will detect that. It is not even clear whether pictures such as this are legal or illegal in this country; reasonable people could differ on that question.
Based on my knowledge of computer vision and what appears to be practically possible, any government that invests in filters designed to censor things like Voice of America is wasting its money. Either that, or it is engaged in the essentially benevolent activity of supporting research. Getting around such a filter could be regarded as a final course project in a course on information retrieval, computer vision, or statistical natural language processing. This will remain true for the foreseeable future.