OCR for page 76
Appendix 11

Types of Errors Common in Machine Translation

Two studies have recently been made of the types of errors made in mechanical translation. The first study was very kindly made available to the Committee by the IBM Thomas J. Watson Research Center, Yorktown Heights, New York. By counting and classifying the corrections made by posteditors, this study determined the types and frequency of errors found in the output of four machine translations (Russian to English).

GENERAL CLASSIFICATION AND PERCENTAGE OF ERRORS OF ARTICLE I

Total number of words: Approximately 1,200

                                             No.      %
Transliterated words                           -      -
Multiple meanings and ambiguities             96    8.0
Word order rearranged                         23    2.0
Miscellaneous insertions and corrections      45    3.6
Total                                        164   13.6

GENERAL CLASSIFICATION AND PERCENTAGE OF ERRORS OF ARTICLE II

Total number of words: Approximately 1,200

                                             No.      %
Transliterated words                           6    0.5
Multiple meanings and ambiguities            132   11.0
Word order rearranged                         17    1.4
Miscellaneous insertions and corrections      77    6.4
Total                                        232   19.3
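The percentage column in these tables appears to be each correction count divided by the article's approximate word total. The following Python sketch checks that reading against the Article II figures. It is an illustration only: the function and variable names are hypothetical, and the divisor of 1,200 is the table's own approximation, not an exact word count.

```python
# Counts copied from the Article II table; the word total is the
# table's own approximation ("Approximately 1,200").
TOTAL_WORDS = 1200

article_ii_counts = {
    "Transliterated words": 6,
    "Multiple meanings and ambiguities": 132,
    "Word order rearranged": 17,
    "Miscellaneous insertions and corrections": 77,
}


def percent_of_words(count, total_words):
    """Return count as a percentage of total_words, rounded to one decimal."""
    return round(100.0 * count / total_words, 1)


# Percentage for each error category.
percentages = {
    name: percent_of_words(count, TOTAL_WORDS)
    for name, count in article_ii_counts.items()
}

# Totals row: sum of the counts, and that sum as a percentage of all words.
total_count = sum(article_ii_counts.values())
total_percent = percent_of_words(total_count, TOTAL_WORDS)

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print(f"Total: {total_count} corrections, {total_percent}%")
```

Under this assumption the computed values agree with the printed Article II column (0.5, 11.0, 1.4, 6.4, and 19.3 for the total). Because the word totals are approximate, the same check on the other articles may differ from the printed figures in the last digit.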
GENERAL CLASSIFICATION AND PERCENTAGE OF ERRORS OF ARTICLE III

Total number of words: Approximately 1,700

                                             No.      %
Transliterated words                          17      1
Multiple meanings and ambiguities            143      9
Word order rearranged                         36      2
Miscellaneous insertions and corrections     122      7
Total                                        318     19

GENERAL CLASSIFICATION AND PERCENTAGE OF ERRORS OF ARTICLE IV

Total number of words (including individual digits and symbols in all formulas): Approximately 1,600

                                             No.      %
Transliterated words                           1      -
Multiple meanings and ambiguities             87    5.8
Word order rearranged                         14    0.9
Miscellaneous insertions and corrections     436   29.0
Total                                        538   35.7

The second study was made by Arthur D. Little, Inc., and was done in a manner similar to the IBM study. That is, machine translation output was postedited and the errors classified and counted. From the study, the A. D. Little group was able to tell the percentage of total corrections made in each category. The original consisted of approximately 200 pages of scientific Russian. One set of approximately 100 pages was edited by two different editors. The second set contained "approximately 100 pages from seven MT articles edited by at least four different editors."*

*An Evaluation of Machine-Aided Translation Activities at F.T.D., Contract AF 33(657)-13616, Case 66556, May 1, 1965, p. ~10.
PERCENTAGE OF TOTAL CORRECTIONS COUNTED*

Error                                      %   Subtotal %
Word omission
  A. Articles                          18.76
  B. Others                            15.98        34.74
Wrong words
  A. Prepositions                       3.78
  B. Verb tense, voice, suffix          5.56
  C. Others                            16.24        25.58
Russian left in                                      4.48
Choice
  A. Choice of two                      8.17
  B. Choice of two, both wrong          3.57        11.74
Unnecessary word                                     3.09
Symbol                                               4.5
Phrase not interpreted                               3.14
Word order                                          12.73

Total Number of Corrections: 7,573
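The grouping of the A. D. Little figures can be checked arithmetically: each lettered subcategory should add up to its category subtotal, and the category figures together should account for all corrections. The Python sketch below performs that check. It is illustrative only; the dictionary layout and variable names are hypothetical, and the values are the percentages as printed in the table.

```python
from decimal import Decimal

# Percentages as printed, grouped by error category. Categories without
# lettered subcategories carry a single value.
corrections = {
    "Word omission": ["18.76", "15.98"],        # A. Articles, B. Others
    "Wrong words": ["3.78", "5.56", "16.24"],   # A, B, C
    "Russian left in": ["4.48"],
    "Choice": ["8.17", "3.57"],                 # A, B
    "Unnecessary word": ["3.09"],
    "Symbol": ["4.5"],
    "Phrase not interpreted": ["3.14"],
    "Word order": ["12.73"],
}

# Decimal avoids binary floating-point noise when adding printed figures.
subtotals = {
    category: sum((Decimal(v) for v in values), Decimal("0"))
    for category, values in corrections.items()
}
grand_total = sum(subtotals.values(), Decimal("0"))

for category, subtotal in subtotals.items():
    print(f"{category}: {subtotal}")
print(f"All categories: {grand_total}")
```

The computed subtotals match the printed ones (34.74, 25.58, and 11.74), and the categories sum to exactly 100 percent, which supports reading the table as a complete breakdown of the 7,573 corrections.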