Appendix C: Selected Technology Issues
Pages 418-429


From page 418...
... that can characterize the objects in a form suitable for automated comparison to the user's needs. The representation of information objects also requires interpretations by a human indexer, machine algorithm, or other entity.
From page 419...
... An extensive literature on interindexer consistency indicates that when people are asked to represent an information object, even if they are highly trained in using the same meta-language (indexing language), they might achieve 60 to 70 percent consistency at most in tasks like assigning descriptors.
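To make the idea of a consistency percentage concrete, the sketch below scores two indexers' descriptor sets by their overlap (shared terms divided by all distinct terms used). The excerpt does not say which measure the cited literature uses, so this overlap ratio is an assumption chosen for illustration, and the descriptor lists are invented.

```python
def indexing_consistency(terms_a, terms_b):
    """Overlap ratio between two indexers' descriptor sets.

    Assumption: consistency is measured as shared terms divided by all
    distinct terms assigned by either indexer (intersection over union).
    """
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # two empty assignments agree trivially
    return len(a & b) / len(a | b)


# Hypothetical descriptors assigned to the same document by two indexers.
indexer_1 = ["filtering", "internet", "children", "pornography", "software"]
indexer_2 = ["filtering", "internet", "children", "censorship", "software"]

score = indexing_consistency(indexer_1, indexer_2)
print(f"consistency: {score:.0%}")
```

Here the two indexers share four of six distinct descriptors, so a single differing term each is enough to land in the 60 to 70 percent range the excerpt mentions.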
From page 420...
... and also accurate and complete representations of the information objects that are to be tested for exclusion. Just as in information retrieval, making the exclusion decision in information filtering is an inherently uncertain process.
From page 421...
... Efficient automatic text categorization requires an automated categorization decision that identifies, on the basis of some categorization rules, the category into which an object falls. (Note that if the rules are separated from the decision maker, the behavior of the decision maker can be changed merely by changing the rules, rather than by rewriting the software underlying the decision maker each time.)
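The separation of rules from the decision maker can be sketched as follows. This is a minimal illustration, not the method of any particular system: the `Rule` class, the keyword-counting test, and both rule sets are hypothetical. The point is that `categorize` never changes; only the rule list passed to it does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical categorization rule: enough keyword hits => category."""
    category: str
    keywords: frozenset
    min_hits: int = 1


def categorize(text, rules, default="uncategorized"):
    """Decision maker: returns the category of the first rule that matches."""
    words = set(text.lower().split())
    for rule in rules:
        if len(words & rule.keywords) >= rule.min_hits:
            return rule.category
    return default


# Two rule sets; the decision maker above is identical for both.
RULES_V1 = [Rule("sports", frozenset({"football", "score", "league"}))]
RULES_V2 = RULES_V1 + [
    Rule("finance", frozenset({"stock", "market", "shares"}), min_hits=2)
]

doc = "the stock market closed higher as shares rallied"
print(categorize(doc, RULES_V1))
print(categorize(doc, RULES_V2))
```

Adding the finance rule changes the outcome for the same document without touching the code of `categorize`, which is the property the parenthetical note describes.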
From page 422...
... When fidelity of representation becomes important, a number of techniques can go beyond the bag-of-words model: morphological analysis, part-of-speech tagging, translation, disambiguation, genre analysis, information extraction, syntactic analysis, and parsing. For example, a technique more robust than the bag-of-words approach is to consider adjacent words, as search engines do when they give higher weight to information objects that match the query and have certain words in the same sentence.
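The difference between a pure bag-of-words match and one that rewards query words appearing in the same sentence can be sketched as below. The scoring functions, the bonus value, and the sample documents are illustrative assumptions; real search engines use far more elaborate weighting.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words_score(query, document):
    """Count distinct query words present anywhere in the document."""
    return len(set(tokenize(query)) & set(tokenize(document)))


def sentence_aware_score(query, document, bonus=1.0):
    """Bag-of-words score plus a bonus for each sentence containing
    every query word (the bonus size is an arbitrary assumption)."""
    query_words = set(tokenize(query))
    score = float(bag_of_words_score(query, document))
    for sentence in re.split(r"[.!?]+", document):
        if query_words and query_words <= set(tokenize(sentence)):
            score += bonus
    return score


query = "river bank"
doc_a = "The river bank flooded after heavy rain."
doc_b = "The river rose overnight. The bank downtown closed early."

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, bag_of_words_score(query, doc), sentence_aware_score(query, doc))
```

Both documents contain both query words, so the bag-of-words score cannot tell them apart; only the sentence-aware score ranks the document about a river bank above the one that mentions a river and a bank separately.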
From page 423...
... It is often possible to tell whether a picture has nearly naked people in it, but there is no program that reliably determines whether there are people wearing clothing in a picture. To find naked people, image recognition programs exploit the fact that virtually everyone's skin looks about the same in a picture, as long as one is careful about intensity issues.
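One way to be "careful about intensity issues" is to compare colors after dividing out overall brightness, so that a brightly lit and a dimly lit patch of the same color look alike. The sketch below does this with normalized red and green values. The threshold ranges are illustrative assumptions, not values from the report, and a practical detector would need much more than a per-pixel color test.

```python
def chromaticity(r, g, b):
    """Return intensity-normalized (red, green) shares of an RGB pixel."""
    total = r + g + b
    if total == 0:
        return (0.0, 0.0)  # pure black carries no color information
    return (r / total, g / total)


def looks_like_skin(pixel, r_range=(0.36, 0.55), g_range=(0.26, 0.36)):
    """Per-pixel test on normalized color.

    Assumption: the ranges are rough, hypothetical thresholds chosen
    only to make the example work.
    """
    r_norm, g_norm = chromaticity(*pixel)
    return (r_range[0] <= r_norm <= r_range[1]
            and g_range[0] <= g_norm <= g_range[1])


def skin_fraction(pixels):
    """Fraction of pixels in an image region that pass the color test."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(looks_like_skin(p) for p in pixels) / len(pixels)


bright_tone = (224, 172, 105)   # a well-lit sample
dim_tone = (112, 86, 52)        # roughly the same color at half the intensity
sky_blue = (90, 140, 220)
mid_gray = (128, 128, 128)

region = [bright_tone, dim_tone, sky_blue, mid_gray]
print(skin_fraction(region))
```

Because the test works on color shares rather than raw values, halving the brightness of a pixel leaves its classification unchanged, while the blue and gray samples are rejected.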
From page 424...
... But any of the contextual issues raised above will remain beyond the purview of automated recognition for the foreseeable future.
C.2 SEARCH ENGINES AND OTHER OPERATIONAL INFORMATION RETRIEVAL SYSTEMS
Information retrieval systems consist of a database of information objects, techniques for representing those objects and queries put to the database, and techniques for comparing query representations to information object representations.
From page 425...
... The search engine often removes stop words, a list of words that it chooses not to index (typically quite common words like "and" and "the"). In addition, the search engine may apply natural language processing to identify known phrases or chunks of text that properly belong together and indicate certain types of content. What remains after such processing is a collection of words that need to be matched against documents represented in the database.
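The processing described here can be sketched as a tiny inverted index: stop words are dropped from both documents and queries, and the remaining query words are matched against the indexed documents. The stop-word list, the sample documents, and the all-terms-must-match policy are assumptions for illustration; phrase identification is left out.

```python
import re

# Illustrative stop-word list; real engines choose their own.
STOP_WORDS = {"and", "the", "a", "of", "in", "to"}


def index_terms(text):
    """Tokenize text and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(documents):
    """Map each remaining term to the set of document ids containing it."""
    index = {}
    for doc_id, text in documents.items():
        for term in index_terms(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(query, index):
    """Return ids of documents containing every non-stop-word query term."""
    terms = index_terms(query)
    if not terms:
        return set()  # the query consisted only of stop words
    return set.intersection(*(index.get(t, set()) for t in terms))


documents = {
    1: "The history of the Internet",
    2: "Filtering and the Internet",
    3: "A history of filtering",
}
index = build_index(documents)

print(search("the history and the internet", index))
print(search("filtering", index))
```

Note that "the" and "and" never reach the index, so a query made up only of stop words matches nothing.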
From page 426...
... Every individual using the Internet at a given moment in time is associated with what is known as an IP address, and that IP address is usually associated with some fixed geographical location. However, because IP addresses are allocated hierarchically by a number of different administrative entities, knowing the geographical location of one of these entities does not automatically provide information about the locations associated with IP addresses that it allocates.
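The effect of hierarchical allocation can be sketched with a small lookup table. In this hypothetical example, a parent block is registered to one entity, which hands out sub-blocks used at other sites; the location of an address is known only where a record exists for the sub-block it falls in. The table entries are invented and use an address range reserved for documentation.

```python
import ipaddress

# Hypothetical allocation records (198.51.100.0/24 is a documentation range).
ALLOCATIONS = {
    "198.51.100.0/24": "Allocating entity's office (hypothetical)",
    "198.51.100.0/26": "Customer site A (hypothetical)",
    "198.51.100.64/26": "Customer site B (hypothetical)",
}


def locate(ip, allocations=ALLOCATIONS):
    """Return the location of the most specific block containing ip."""
    addr = ipaddress.ip_address(ip)
    best = None
    for cidr, location in allocations.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, location)
    return best[1] if best else None


for ip in ("198.51.100.10", "198.51.100.70", "198.51.100.200", "203.0.113.5"):
    print(ip, "->", locate(ip))
```

For the third address only the parent record matches, so the lookup returns the allocating entity's location, which says nothing reliable about where the user of that address actually is.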
From page 427...
... While such mapping is usually done for billing and customer care reasons, it provides a ready guide to geographical addresses at the end user's level. Those who gain access through DSL connections can be located because the virtual circuit from the digital subscriber line access multiplexer is ...
Footnote 5: While location information is not provided automatically from the IP addresses an administrative entity allocates, under some circumstances, some location information can be inferred.
From page 428...
... The bottom line is that determining the physical location of most Internet users is a challenging task today, though this task will become easier as broadband connections become more common.
C.4 USER INTERFACES
The history of information technology suggests that increasingly realistic and human-like forms of human-computer interaction will develop.
From page 429...
... The latter kind of speech is not particularly realistic today but is expected to become more realistic with further research over time. Speech recognition is still in its infancy as a useful tool for practical applications, even after many years of research, but it, too, is expected to improve in quality (e.g., the ability to recognize larger vocabularies, a broader range of voices, a lower error rate).
