
Chapter 4

Speech Recognition Technology and the Assessment of Beginning Readers

Susan M. Williams

The University of Texas at Austin

At the beginning of first grade, most children are entering the initial stage of reading development during which they acquire basic decoding knowledge. Until they become fluent, they must rely on more able readers, such as parents or teachers, to listen to them read and provide assistance when they falter. Technology such as talking books with synthetic or digitized speech can also provide support by reading large blocks of text aloud and giving oral pronunciation and definitions of unfamiliar words. However, talking books provide only passive support. Students must monitor their own decoding and comprehension and request assistance when needed.

Recent research in speech recognition technology has made it possible to develop computer-based reading coaches that listen to students, assess their performance, and provide immediate customized feedback (Mostow & Aist, 1999; Nix, Fairweather, & Adams, 1998). This active support allows students with less knowledge and fewer learning strategies to read independently.

This paper provides an overview of speech recognition technology and how it is being used by computer-based reading coaches to assess the performance of beginning readers. I begin by outlining research related to the importance of frequent practice with feedback for beginning readers. Next, I describe Watch Me! Read, an example of a computer-based reading coach. I then provide an overview of speech recognition technology and how it is adapted for use with children and beginning readers. Finally, I discuss issues and possibilities for using speech recognition as an assessment tool.

THE IMPORTANCE OF FREQUENT READING PRACTICE

Research has shown a positive correlation between frequent reading and reading achievement: Frequent reading improves the speed at which words are recognized, which in turn leads to fewer disruptions in the comprehension process (Perfetti, 1985). Extensive reading leads to enhanced phonemic awareness (Stanovich, 1986). Extensive reading promotes the acquisition of new vocabulary and grammatical constructions (Stanovich & Cunningham, 1992). Children who read more frequently have higher test scores (Cipielewski & Stanovich, 1992; Greany & Hegarty, 1987).



Furthermore, the relationship between frequent reading and reading achievement is reciprocal (Stanovich, 1986), i.e., frequent reading leads to higher achievement which leads to more frequent reading. This means that the gap between more and less frequent readers will grow over time. Stanovich (1986) dubbed this phenomenon the “Matthew effect” as a reference to the biblical passage about the rich getting richer and the poor getting poorer (Matt. 13:12).

This research on frequent reading is consistent with cognitive theories suggesting that regular extensive practice in a skill promotes proficiency (Anderson, 1995; Ericsson & Smith, 1991). Practice at earlier stages of learning is thought to be more beneficial. As a learner increases in skill, additional practice is likely to bring diminishing returns (Anderson, 1995). Thus, additional reading practice in the early grades when many children are learning to read could be especially important.

It is noteworthy that most studies of the effects of frequent reading have been done with older children, presumably because younger children are not yet fluent readers. Because of the Matthew effect, the potential value of frequent reading for younger children could be even greater than the results of these studies suggest.

Translating these findings into classroom practice is not straightforward (Byrnes, 2000). While the research seems to suggest that providing more time for reading, especially in the early grades, would lead to increased reading achievement, there is also evidence that certain conditions of practice may be more effective in promoting achievement. For example, at early stages of acquisition, learners often need expert advice to help them understand how they are doing. Formative assessment (and instruction based on that assessment) is especially important for struggling readers who benefit more from scaffolded tutoring than from attempts to read literature on their own (Juel, 1996).

Guthrie (1980) also makes a distinction between the time allocated for reading and the time that students are actually engaged in this task. Teachers differ in their instruction and classroom management strategies and in their ability to keep children “on task.” Independent reading attempts by unsuccessful beginning readers can lead to frustration and lack of engagement (Williams, 2000). Thus, allocating time for independent reading is not enough to improve reading performance, especially for beginning and less successful readers. These readers also require feedback and instruction to make the additional time beneficial.

Finding time to provide individual feedback during children's reading practice is difficult for teachers who often have 20 or more students in their class in the early grades. Thus, class size is likely to be a constraint on students' opportunities for the type of reading practice that might be most beneficial.

It is possible that new developments in speech recognition technology could increase opportunities for individual reading practice with feedback, as well as collecting assessment data to inform instructional decision making. In the next section of this paper, I describe Watch Me! Read, a computer-based reading environment developed by IBM's T.J. Watson Research Center and currently being tested in the Houston Independent School District as part of IBM's Reinventing Education program.

WATCH ME! READ

Commercially available reading software often seems somewhat alien to the actual process of learning to read. Such software must resort to clever schemes to compensate for its inability to react directly to youngsters as they read aloud. One common strategy calls upon the child to perform tasks that presume to exercise the same skills that reading requires, with directions like “Find the word on the screen that rhymes with this picture.” These types of activities fail to give children much of a sense of the experience of reading, which is not surprising given their orientation toward isolated word recognition and their reliance on picture interpretation. They also fail to provide students and teachers with valid, meaningful assessment information because they bear little similarity to the cognitive demands of “real” reading.

Watch Me! Read (WM!R) software is designed to give a young child a sense of being a reader (Nix et al., 1998). Specifically, the designers' goals are to provide reading practice, comprehension awareness, and a sense of reading as communication. The software uses speech recognition to assess a child's performance and provide individualized feedback. It works in much the same way as an adult who listens to the child, provides help with the pronunciation of words when the child falters, and asks questions to probe the child's understanding of what he or she is reading.

In the WM!R environment, books appear on the computer screen much as they do in traditional form, i.e., text and illustrations are displayed on two facing “pages” of a graphic book (Figure 4-1). A small, animated Panda acts as a guide, walking across the surface of the book, pointing to the current reading location, and providing feedback and encouragement. Students are asked to read the text one phrase at a time. For students who are just beginning to read, the Panda reads each phrase first and then students read only the last word when the phrase is repeated. At the most advanced level, students read the entire phrase without assistance. The phrase being read is marked in color: The text read by the Panda is blue; the text read by the student is red. If a student does not know a word, he or she can click on the word and hear the Panda pronounce it. The student's voice is recorded as he or she reads the book. This recording is used later in the performance section of the program.

Figure 4-1 Reading view of Watch Me! Read showing a book written and illustrated by a student. SOURCE: Williams, Nix, & Fairweather, 2000, p. 116.

At the beginning of each page, the student can choose to hear an overview of what he or she is about to read. At the end of each page, the Panda asks a comprehension or prediction question based on the contents of the page. These questions are customized for each book. The student uses a graphical “boom box” tool displayed at the bottom of the page to record the answers. The student can listen to his or her answer and re-record it if desired. WM!R does not provide immediate feedback for answers to comprehension questions, but the teacher can review recorded answers at a later time.

After the student finishes reading, WM!R presents a performance of the book with the words highlighted as they are read in the student's voice. If a camera is attached to the computer, the student can create a video introduction to the performance.

Information collected about the interaction includes a recording of the student's reading of the book, answers to comprehension questions, and an assessment of his or her word recognition performance. A discussion of this information and how it might be used is included in the final section of this paper.

OVERVIEW OF SPEECH RECOGNITION TECHNOLOGY

In order to better evaluate the potential of speech recognition technology for the assessment of beginning readers, it is helpful to have at least a rudimentary understanding of how this technology works. What follows is a highly simplified description of a very complex process. This explanation is intended to highlight how speech recognition systems deal with variations among speakers and domain vocabulary that have an impact on the reliability of the technology.

Computers use speech recognition technology to capture human speech and translate it to a written format. This translation process is based on two underlying models: an acoustic model containing representations of the phonemes in English and a language model representing typical sequences of words for a specific target population or domain.

The acoustic model is created by an analysis of actual human speech. A set of words is chosen that contains all the phonemes in the English language. During data collection a recording is made of a person saying each of these words. Next, an acoustic model of each phoneme is created from all words having this phoneme, e.g., the tee sound in “tree” is created from all tee sounds in all words having such a sound (e.g., toad, tree, sit). To represent the natural variability in human speech, samples are collected from a large number of speakers and then blended to form a single acoustic model. These blends enable the system to recognize speakers with a wide variety of regional and second language accents.

Word forms, representations of the actual vocabulary for an application, are constructed by concatenating the appropriate phonemes from the acoustic model. Thus, a word that was not one of the words spoken during data collection should still be recognizable because a new word form can be created from phonemes in the original sample.

Multiple word forms are another way to represent naturally occurring variability. For example, in some dialects the word “get” might be pronounced “git.” Acoustic models can be constructed that include both “get” and “git” in order to recognize this difference in pronunciation and still support a translation that accurately captures the intended meaning.

Until recently, research on speech recognition for children used standard acoustic models based on a blend of adult voices. In order to improve the accuracy of recognition for children, researchers at IBM's T.J. Watson Research Center have created a children's acoustic model based on data collected from 800 children interviewed at multiple locations across the United States.

When a user speaks a word, the speech recognition system converts the recorded word to sets of phonemic representations and searches the system's stored word forms for a “close enough” match. The smaller the set of word forms, the faster and more accurate the matching process. Applications that must recognize large numbers of words (100,000 in IBM's ViaVoice product) also benefit from the addition of a language model to speed up the recognition process and improve accuracy.

Language models are constructed by scanning millions of lines of running text and calculating the statistical probabilities of three-word transitions. For example, a speech recognition system for a business application might be based on samples from business letters, Wall Street Journal articles, etc. When a person dictates a business letter, the speech recognition system takes the phonemic representation of the words the person says and, instead of trying to match them to the 100,000 word forms, trims the search as it progresses by excluding paths of lower probability. The probability of the next word depends on the history of the words that have been spoken so far.

Speech recognition technology is used in two modes: command and dictation. Command mode uses only the acoustic model along with a limited set of word forms. The captured speech is typically used to trigger an action such as dialing a phone or launching a computer application. Software such as Watch Me! Read falls into this category because the passages presented to the reader represent a limited vocabulary that is known in advance.

In dictation applications, the captured speech data are transcribed and stored as a text file, edited by the user, and used like any other word-processing file. The vocabulary to be recognized for dictation is comprehensive; in order to translate speech efficiently and accurately, both an acoustic and a language model are used. (See previous section.) Dictation applications are popular with adult users; however, they are beyond the scope of this short paper, which focuses only on reading (command) applications.

GENERAL DESIGN ISSUES AND OPPORTUNITIES FOR RESEARCH

Tradeoffs are made in the design of any system. In the case of speech recognition technology, the tradeoffs are typically a balance among accuracy of recognition, speed of recognition, and ease of use. While computing power and research may eventually lessen the impact of these design decisions, they highlight important considerations for those thinking about the use of this technology with children or for assessment. Here is a partial list of issues relevant to reading applications.

Speaking Rate

In order to achieve an acceptable accuracy in recognition, some existing systems only recognize discrete speech, i.e., a user must pause slightly between words. Research on continuous speech has recently created systems that allow users to speak naturally. Without continuous speech recognition, children must learn to speak slowly and deliberately so that their reading can be reliably assessed.

The speech of children who are learning to read contains numerous pauses, repetitions, omissions, and partial words as they sound out unfamiliar words. While a human tutor may be able to follow this process and identify a child's current position in the text, this tracking is difficult for a computer system. Interfaces are needed that aid in this tracking without slowing the reader. If the computer loses its place, the feedback provided will be incorrect.

Speaker Dependency

Some speech recognition systems improve accuracy by having users train the computer system to understand individual variations in their speech. To do this, a user reads prescribed passages into the computer so that the system can personalize its acoustic model. But this training is time-consuming, and the logistics of implementing this in a classroom would be difficult. In addition, alternative training procedures would have to be created for children who cannot read.

Microphones and Headsets

Recognition is improved by the use of high-quality microphones. To facilitate proper placement and optimum distance from the user's mouth, microphones are often incorporated into a headset. These microphones are delicate, expensive instruments. Headsets have not been developed in a size appropriate for children and sturdy enough for the wear and tear of the classroom. Some research has been done with microphones strategically placed around the room, but this has not been tested in classrooms.

ASSESSMENT AND SPEECH RECOGNITION

Reading environments such as Watch Me! Read are based on oral reading as a measure of students' competence. Oral reading assessments such as running records (Clay, 1993), informal reading inventories (IRI) (Farr & Carey, 1986), and miscue analysis (Goodman, 1982) take into consideration contextual factors such as passage length and complexity and the reader's reliance on pictures and prior knowledge. They provide rich, detailed data about students' performance in the context of real reading. This information is related to students' fluency, oral reading accuracy, and decoding, as well as strategies such as rereading and self-correction.

WM!R's assessment is based on fluency and accurate word identification. The system tracks the child's progress as it compares its model of each word in the text with the word spoken by the child. The data the system provides for the teacher include a copy of the text that was read, a recording of each word spoken by the child, and an indication of whether or not the word spoken was accepted as a match for the comparison text. If a word is accepted as a match, the system provides positive oral feedback and moves on to the next word or phrase. If the word does not appear to be a match, the system asks the student to repeat the word. If a second attempt also does not match, the system supplies the correct pronunciation. Failed matches receive feedback such as “I did not hear you.” Failed matches are not labeled as errors because the technology is not capable of making this judgment with accuracy.

WM!R does not attempt to interpret data in order to assign a score or reading level. Instead it provides the detailed data to the teacher for his or her interpretation. The decision not to summarize the data is based in part on the reliability of speech recognition. While the average accuracy of recognition for the WM!R system is above 95 percent, some children are not as easily recognized as others because of characteristics of their speech, reading rate, regional or second language accents, etc. Equally important, contextual factors such as background noise and microphone adjustments can vary from session to session, even for the same reader. Summarizing data across one or more sessions can mask this variation and make the assessment appear more reliable.

A second reason for not summarizing the data stems from the belief that fluent reading sometimes means that a reader makes meaningful substitutions. Goodman (1982) called such substitutions miscues rather than errors because meaning may be maintained even when the text is not read as written. [3] Thus, valid inferences about whether or not a mismatch represents an error or an appropriate substitution need to be made by the teacher.

From a practical point of view, the amount of data produced by WM!R can be overwhelming to monitor on a regular basis. Listening to students' oral reading, whether live or recorded, requires a great amount of time. The designers of WM!R are currently exploring the generation of alert messages to the teacher to identify children who might be having difficulty. These alerts are triggered by a “matching” rate that falls below a prescribed threshold. Additionally, the system identifies potential problem words and creates a list of practice words for students to study. These reporting strategies allow efficient use of the feedback while leaving the final interpretation of the data up to the teacher.

The data provided by WM!R also offer interesting instructional possibilities. First, students can review their own reading performance by listening to their recorded voice as text is highlighted word-by-word on the screen. This can encourage and enable self-assessment. Second, student and teacher could review the recorded reading together and discuss appropriate and inappropriate substitutions and other strategies. [4] These types of reflective activities would be impossible without the assistance of technology to create a representation pairing each word in the text with the child's attempt to read it.

[3] It is possible to represent common miscues as additional word forms; however, it is not possible to include all substitutions.
[4] Thanks to my colleague, David Schwarzer, for this suggestion.
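
The feedback rules and the teacher alert described in this section can be made concrete with a short sketch. The Python below is illustrative only and is not WM!R's implementation. The names `coach_word` and `session_report`, the feedback labels, and the 0.80 alert threshold are hypothetical. So are the definitions of the “matching” rate (taken here as the share of words accepted on any attempt) and of a problem word (taken here as any word not accepted on the first attempt), since the chapter does not specify them. The recognizer's accept or reject decision is passed in as a list of booleans rather than computed from audio.

```python
from dataclasses import dataclass

# Assumed values: the chapter gives no threshold, and describes one retry
# before the system supplies the pronunciation.
ALERT_THRESHOLD = 0.80
MAX_ATTEMPTS = 2


@dataclass
class WordResult:
    word: str
    attempts: int   # number of attempts the child made (capped at MAX_ATTEMPTS)
    matched: bool   # True if any attempt was accepted as a match


def coach_word(word, attempt_matches):
    """Apply the described feedback rules to one word.

    attempt_matches holds one boolean per attempt, standing in for the
    recognizer's decision. Returns (WordResult, list of feedback labels).
    """
    feedback = []
    tries = attempt_matches[:MAX_ATTEMPTS]
    for attempt, accepted in enumerate(tries, start=1):
        if accepted:
            feedback.append("praise")
            return WordResult(word, attempt, True), feedback
        if attempt < MAX_ATTEMPTS:
            # e.g. "I did not hear you" followed by a request to repeat
            feedback.append("ask_repeat")
    # No attempt matched: supply the word, but do not label it an error.
    feedback.append("supply_pronunciation")
    return WordResult(word, len(tries), False), feedback


def session_report(results, threshold=ALERT_THRESHOLD):
    """Summarize one session for the teacher without assigning a score."""
    total = len(results)
    rate = sum(r.matched for r in results) / total if total else 0.0
    practice = []
    for r in results:
        needs_practice = (not r.matched) or r.attempts > 1
        if needs_practice and r.word not in practice:
            practice.append(r.word)
    return {
        "match_rate": rate,
        "alert": total > 0 and rate < threshold,
        "practice_words": practice,
    }


if __name__ == "__main__":
    outcomes = {"the": [True], "panda": [False, True], "bamboo": [False, False]}
    results = [coach_word(w, tries)[0] for w, tries in outcomes.items()]
    print(session_report(results))
```

In this sketch an unmatched word is recorded only as "not matched," which mirrors the point above that a failed match may be a recognition failure or a meaningful substitution, and that the judgment belongs to the teacher.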

EVALUATION OF WM!R

Preliminary studies suggest that WM!R can enhance literacy instruction by supporting independent reading practice (Williams, 2000). In these studies, first graders using WM!R as a part of their regular instruction read similar stories without assistance and with WM!R. While using WM!R, all the students were significantly more engaged in the reading task, even those who did not need the support that it provided. When they were asked to reread or retell the stories, their word recognition and retelling scores were significantly higher after they had used WM!R.

Interviews with classroom teachers provided further insight into the benefits of WM!R. Teachers reported that the main benefit for their students was regular reading practice with individualized feedback. Other benefits of WM!R practice varied according to the reading level of the student and the way the student used the software. For beginning readers who had not yet learned to sound out words for themselves, the software greatly increased the likelihood that they could get prompt help with words they did not recognize. The software also helped new readers mark the word they were currently reading and track their progress across the page.

For more advanced readers, teachers reported that the software provided structure. Without WM!R, these students were likely to rush through books without getting the details of what they were reading or taking time to monitor their own comprehension. The pacing provided by the software helped them attend to details, and the comprehension questions at the end of the page encouraged them to reflect on the book's meaning. When asked about their advanced students, all the teachers said that these students were very interested in using the program and were benefiting from it.

Special needs students also benefited from WM!R. One teacher described a hearing-impaired student who was not fitted for a hearing aid until spring of the school year. WM!R was his best opportunity for getting feedback on his pronunciation because the volume could be adjusted so that he could hear well. Students who exhibited symptoms of attention deficit disorder were more engaged in reading practice with WM!R than with reading on paper.

All teachers mentioned Limited English Proficiency students as benefiting from the program. In the Houston Independent School District, classes for bilingual students are conducted mostly in Spanish, and Spanish is the primary language in their homes and communities. Thus, these students have few opportunities to work on their developing English skills, and they can be very insecure about trying to say the words out loud. Using WM!R allows these students to hear their own voice and compare their pronunciation with that of the system.

It is interesting that not a single teacher mentioned problems with speech recognition as a barrier to use of the software by their students. When teachers were asked about failures (false positives or false negatives) in the speech recognition, they indicated that the positive aspects of WM!R far outweighed occasional problems with recognition.

CONCLUSION AND CAUTIONS

Preliminary research indicates that speech recognition technology has developed to the point that it is useful as a scaffold for early reading because of the feedback it provides and the engagement in reading that it encourages. The potential benefits of WM!R for students depend on many factors, such as the design of the technology itself and the interaction of the technology with students and classroom instruction (Williams, Nix, & Fairweather, 2000). Tools such as WM!R provide reading practice but not reading instruction. They monitor word recognition but not comprehension. Therefore, the teacher must ensure that students have the instruction they need in order to make the best use possible of time spent with WM!R.

The benefits of frequent reading also depend on the availability of extensive reading material at appropriate reading levels. Although WM!R includes an authoring tool to enter new books into the system easily, getting permission to use trade books is almost impossible. Writing and illustrating engaging books at appropriate levels require skills and time that many teachers do not have.

Speech recognition technology requires top-of-the-line hardware. Most school systems that purchase powerful hardware focus on supplying older students. Thus, it may be difficult to get enough high-performance hardware into 1st and 2nd grade classrooms to make a difference in practice time.

It is important to be cautious about relying on data provided by speech recognition technology as a summative assessment. These data must be considered in context to be useful for making instructional decisions.

Despite these issues, the promise of this technology is very real and very exciting: It can be a valuable aid in supporting practice for beginning readers and providing assessment information for their teachers.

REFERENCES

Anderson, J.R. (1995). Learning and memory: An integrated approach. New York: Wiley.

Byrnes, J.P. (2000). Using instructional time effectively. In L. Baker, M.J. Dreher, & J.T. Guthrie (Eds.), Engaging young readers: Promoting achievement and motivation (pp. 188-208). New York: Guilford Press.

Cipielewski, J., & Stanovich, K.E. (1992). Predicting growth in reading ability from children's exposure to print. Journal of Experimental Child Psychology, 54, 74-89.

Clay, M. (1993). An observation survey of early literacy achievement. Portsmouth, NH: Heinemann.

Ericsson, K.A., & Smith, J. (1991). Toward a general theory of expertise: Prospects and limits. Cambridge, UK: Cambridge University Press.

Farr, R., & Carey, R. (1986). Reading: What can be measured? Newark, DE: International Reading Association.

Goodman, K.S. (1982). A linguistic study of cues and miscues in reading. In F.V. Gollasch (Ed.), Language and literacy: The selected writings of Kenneth S. Goodman: Vol. 1. Process, theory, and research (pp. 115-120). Boston: Routledge & Kegan Paul.

Greany, V., & Hegarty, M. (1987). Correlates of leisure time reading. Reading Research Quarterly, 15, 337-357.

Guthrie, J.T. (1980). Time in reading programs. The Reading Teacher, 34, 500-502.

Juel, C. (1996). What makes literacy tutoring effective? Reading Research Quarterly, 31, 268-289.

Mostow, J., & Aist, G. (1999). Giving help and praise in a reading tutor with imperfect listening - because automated speech recognition means never being able to say you're certain. CALICO Journal, 16(3), 407-424.

Nix, D., Fairweather, P., & Adams, W. (1998). Speech recognition, children and reading. Human factors in computing systems. New York: Association for Computing Machinery.

Perfetti, C.A. (1985). Reading ability. New York: Oxford University Press.

Stanovich, K.E. (1986). Matthew effects in reading: Some consequences of individual differences in the acquisition of literacy. Reading Research Quarterly, 21, 360-407.

Stanovich, K.E., & Cunningham, A.E. (1992). Studying the consequences of literacy within a literate society: The cognitive correlates of print exposure. Memory and Cognition, 21, 51-68.

Williams, S.M. (2000). What children learn from using Watch Me! Read. A report prepared for IBM.

Williams, S.M., Nix, D., & Fairweather, P. (2000). Using speech recognition technology to enhance literacy instruction for emerging readers. Proceedings of the International Conference for the Learning Sciences. Ann Arbor, MI.