Appendix A
Review of English Language Proficiency Tests

As part of the panel’s work, we identified eight English language proficiency (ELP) tests to review in detail (see Chapter 3). These eight tests are used by 40 states and are administered to approximately 75 percent of the English language learner (ELL) students in the country. The tests that we reviewed are listed in Table A-1 along with the states that used each of them during the 2009-2010 school year.

Our review is based on several sources of information. First, we reviewed the technical manuals available for each test. Second, we consulted two recent reports that summarized technical information about the tests: Abedi (2007) provides detailed information about each of the consortium-developed ELP tests (as explained in Chapter 3) and brief descriptions of all of the tests used by the states during the 2006-2007 school year; Wolf et al. (2008) provide a summary of the technical information available for 13 ELP tests as of 2007. Third, representatives from four testing programs—Assessing Comprehension and Communication in English State-to-State (ACCESS), the English Language Development Assessment (ELDA), Language Assessment Scales Links K-12 (LAS-Links), and the Stanford English Language Proficiency Test (SELP)—met with the panel at our second meeting to discuss their tests. This appendix summarizes the information we obtained from these sources.

ASSESSING COMPREHENSION AND COMMUNICATION STATE TO STATE FOR ELL STUDENTS

ACCESS was developed by the World-Class Instructional Design and Assessment (WIDA) Consortium. It began as a partnership of three states—Arkansas, Delaware, and Wisconsin—with technical support through the Center for Applied



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.





ALLOCATING FEDERAL FUNDS

Linguistics (CAL), the University of Wisconsin system, and the University of Illinois at Urbana-Champaign. Shortly after grant funding was awarded, seven other states joined the consortium (Alabama, District of Columbia, Illinois, Maine, New Hampshire, Rhode Island, and Vermont). Field-testing was done in 2004, and by spring 2005 the test was operational in three states (Alabama, Maine, and Vermont). By spring 2006, 12 states were using the assessment. At this point, development efforts were transferred from the Wisconsin Department of Public Instruction to the University of Wisconsin-Madison's Wisconsin Center for Education Research (WCER) (Bauman et al., 2007, pp. 81, 82). In the 2010-2011 testing cycle, ACCESS will be operational in 24 states. Development work on ACCESS is ongoing, and approximately one-third of the test is refreshed every year.1

1 Information about ACCESS is available at http://www.wida.us/assessment/access/index.aspx [December 2010].

TABLE A-1 English Language Proficiency Tests Reviewed and the States That Use Them

Test        States Using the Test During the 2009-2010 School Year
ACCESS      Alabama, Delaware, DC, Georgia, Hawaii, Illinois, Kentucky, Maine, Mississippi, Missouri, New Hampshire, New Jersey, New Mexico, North Carolina, North Dakota, Oklahoma, Pennsylvania, Rhode Island, South Dakota, Vermont, Virginia, Wisconsin, Wyoming
CELDT       California
CELLA       Florida
ELDA        Arkansas, Iowa, Louisiana, Nebraska, South Carolina, Tennessee, West Virginia
LAS Links*  Colorado, Connecticut, Indiana, Maryland
NYSESLAT    New York
SELP*       Arizona, Washington
TELPAS      Texas
Total tests: 8    Total states: 40

NOTE: States in bold are those with high numbers of ELL students.
*Test is customized for each state so that it measures the state's content standards.

Content Standards

The ELP content standards for ACCESS were developed jointly by eight of the WIDA member states in 2003. According to Bauman and colleagues (2007), in developing the standards, the consortium wanted to ensure two essential elements: (1) a strong representation of the language of state academic standards across the core content areas (language arts, math, science, social studies, and the classroom setting); and (2) consensus by member states on the components of the ELP standards. As new states have joined the consortium, teams of researchers have continued the process by conducting alignment studies between the WIDA standards and a state's content standards.

Grade Bands

ACCESS reports information for five grade bands: K, 1-2, 3-5, 6-8, and 9-12. For each grade band except kindergarten, three difficulty levels of the test are available. The difficulty levels are intended to tailor the test to students' approximate proficiency range.

Item Types

ACCESS consists of both multiple-choice items (the listening and reading tests) and constructed-response items (the writing and speaking tests). The speaking test is adaptive and administered one-on-one; the other tests are typically administered in a group setting. ACCESS test items are embedded in the context of a content-based theme, called a folder. A folder typically consists of a shared theme graphic followed by three or four items.

Scores Reported

ACCESS reports scores for each of the domains—reading, writing, listening, and speaking—as well as four composite scores. The overall composite score is formed by weighting reading and writing by 35 percent each and by weighting listening and speaking by 15 percent each. Reading and writing are weighted more heavily on the basis of the test developer's judgment about their importance for academic language proficiency. An oral language composite score is formed by equally weighting the scores in listening and speaking; similarly, a literacy composite score is formed by equally weighting the scores in reading and writing. The comprehension composite score weights reading by 70 percent and listening by 30 percent (Bauman et al., 2007, p. 90).
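The composite weighting just described can be sketched in a few lines of code. The function name and the example scale scores below are hypothetical, and operational ACCESS scoring also involves scaling and rounding rules not shown here:

```python
# Sketch of the ACCESS composite-score weighting (Bauman et al., 2007, p. 90).
# The function name and the scale values used below are hypothetical.

def access_composites(reading, writing, listening, speaking):
    """Return the four ACCESS composite scores from domain scale scores."""
    return {
        # Reading and writing carry 35 percent each; listening and speaking 15 percent each.
        "overall": 0.35 * reading + 0.35 * writing
                   + 0.15 * listening + 0.15 * speaking,
        # Oral language and literacy composites weight their two domains equally.
        "oral_language": 0.5 * listening + 0.5 * speaking,
        "literacy": 0.5 * reading + 0.5 * writing,
        # Comprehension weights reading 70 percent and listening 30 percent.
        "comprehension": 0.7 * reading + 0.3 * listening,
    }

scores = access_composites(reading=350, writing=340, listening=360, speaking=330)
```

The weights make explicit why a student's overall composite is driven primarily by the literacy domains: a 10-point change in reading moves the overall score more than twice as much as the same change in speaking.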
Performance Levels

ACCESS scores are reported using six proficiency levels: entering, beginning, developing, expanding, bridging, and reaching, defined as follows (MacGregor et al., 2009):

Entering: English language learners will process, understand, produce, or use
• pictorial or graphic representation of the language of the content areas;
• words, phrases, or chunks of language when presented with one-step commands, directions, questions, or statements with visual and graphic support.

Beginning: English language learners will process, understand, produce, or use
• general language related to the content areas;
• phrases or short sentences;
• oral or written language with phonological, syntactic, or semantic errors that often impede the meaning of the communication when presented with one- to multiple-step commands, directions, questions, or a series of statements with visual and graphic support.

Developing: English language learners will process, understand, produce, or use
• general and some specific language of the content areas;
• expanded sentences in oral interaction or written paragraphs;
• oral or written language with phonological, syntactic, or semantic errors that may impede the communication but retain much of its meaning when presented with oral or written, narrative or expository descriptions with occasional visual and graphic support.

Expanding: English language learners will process, understand, produce, or use
• specific and some technical language of the content areas;
• a variety of sentence lengths of varying linguistic complexity in oral discourse or multiple, related paragraphs;
• oral or written language with minimal phonological, syntactic, or semantic errors that do not impede the overall meaning of the communication when presented with oral or written connected discourse with occasional visual and graphic support.

Bridging: English language learners will process, understand, produce, or use
• the technical language of the content areas;
• a variety of sentence lengths of varying linguistic complexity in extended oral or written discourse, including stories, essays, or reports;
• oral or written language approaching comparability to that of English-proficient peers when presented with grade-level material.

Reaching: English language learners will process, understand, produce, or use
• specialized or technical language reflective of the content area at grade level;
• a variety of sentence lengths of varying linguistic complexity in extended oral or written discourse as required by the specified grade level;
• oral or written communication in English comparable to that of proficient English peers.

Cut scores for the levels were set using the bookmark procedure2 for listening and reading and the body of work method3 for writing and speaking (Bauman et al., 2007, pp. 84, 86). Following the introduction of the new pre-K cluster in 2007, an additional standard-setting study for this cluster was conducted in 2008 (MacGregor et al., 2009). The WIDA Consortium allows its member states to determine the performance level on ACCESS that they consider to be English proficient (i.e., the level that indicates the student is sufficiently proficient to be considered for reclassification as a former ELL). The levels vary by state, with some setting the proficient level at expanding, some at bridging, and some at reaching.

Reliability and Validity

Information about the technical qualities of the ACCESS assessment is provided in its technical reports, which are prepared each year; the most recent report available to the panel covered the administrations held during the 2008-2009 school year.4 The technical reports contain detailed information about test specifications, item and form development, item and form analysis, equating, and standard setting. They also contain results of analyses to evaluate reliability and validity, and they document efforts to evaluate fairness issues (e.g., bias review panels, analyses of differential item functioning). Reliability analyses include the standard types of analyses used for tests with multiple-choice items (i.e., estimates of internal consistency), as well as those used for open-ended items (i.e., interrater agreement, generalizability analyses).

A number of validity studies have been conducted to collect content-, construct-, and criterion-related evidence. Content-related validity evidence was collected by comparing each item's a priori proficiency level (the level the item was designed to target) against its empirical difficulty. Expert review is also used to evaluate the extent to which items measure the intended content. Construct-related evidence consists primarily of the degree of correspondence among the subtest scores (i.e., their intercorrelations). Some evidence of criterion-related validity has been collected. One study compared ACCESS scores with a priori ELP categorizations of students who participated in the field tests (described in Wolf et al., 2007, p. J2-75). Another study compared the performance of students who took both ACCESS and one of the older generation of ELP tests, including the New IDEA Proficiency Test (New-IPT), the Language Assessment Scales (LAS), the Maculaitis Assessment of Competencies Test of English Language Proficiency (MAC II), and the Language Proficiency Test Series (LPTS).

2 See Mitzel et al. (2001) for an explanation of this method.
3 See Kingston et al. (2001) for an explanation of this method.
4 The reports are available at http://www.wida.us/assessment/access/TechReports/index.aspx [December 2010].
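The internal-consistency estimates mentioned in the reliability analyses are typically coefficients such as Cronbach's alpha. As a generic illustration (not the ACCESS developers' actual procedure), alpha can be computed from a matrix of item scores; the data below are made up:

```python
# Illustrative sketch of an internal-consistency estimate (Cronbach's alpha).
# This is the generic formula, not the ACCESS developers' actual procedure;
# the item-score matrix below is hypothetical.

def cronbach_alpha(item_scores):
    """item_scores: one row per examinee, one column per item."""
    n_items = len(item_scores[0])

    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Variance of each item across examinees.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(n_items)]
    # Variance of examinees' total scores.
    total_var = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
alpha = cronbach_alpha(data)
```

Higher values indicate that the items covary strongly, i.e., that the test score would be similar if a parallel set of items were administered.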

CALIFORNIA ENGLISH LANGUAGE DEVELOPMENT TEST

The California English Language Development Test (CELDT) was developed and in place prior to the implementation of the No Child Left Behind Act (NCLB). In 1997, state legislation authorized the California Department of Education to develop ELP standards and a language proficiency assessment to be used statewide; the standards were adopted in 1999. The first version of the CELDT consisted primarily of items developed by CTB/McGraw-Hill for the Language Assessment Scales (LAS) tests, with some new items the test publisher developed specifically for the state. This version of the test was field tested in fall 2000. Data from the field test were used to select items and create the operational forms of the test, which were first administered in 2001. The CELDT has been updated yearly since 2001, and subsequent versions have replaced the LAS items with new items aligned with the California standards (Porta and Vega, 2007, p. 138).5

Content Standards

According to CELDT information, its test questions are designed to assess basic social conventions, rudimentary classroom vocabulary, and ways to express personal and safety needs. Some of the questions are designed to assess student performance at the early advanced and advanced proficiency levels and to incorporate classroom language. To this end, the questions engage academic language functions, such as explaining, analyzing, and summarizing.

Grade Bands

The CELDT has test versions for each of four grade bands: K-2, 3-5, 6-8, and 9-12.

Item Types

The test uses a combination of multiple-choice and constructed-response items. The reading test uses only multiple-choice items, and the speaking test uses only constructed-response items (requiring both short and extended answers). The listening and writing tests use a combination of item types: the listening test uses multiple-choice and short-answer constructed-response items; the writing test uses multiple-choice, short-answer constructed-response, and extended-answer constructed-response items (California Department of Education, 2008c, 2009c).

5 Information about the test is available at http://www.cde.ca.gov/ta/tg/el/ [December 2010].

Scores Reported

Scores are reported for each domain—listening, speaking, reading, and writing. Two composite scores are also reported. The comprehension score is derived from performance on the reading and listening subtests, and an overall composite score is also reported. For grades 3 through 12, the composite score is the average of the scores in all four domains. For kindergarten and grade 1, the composite score is formed by weighting listening and speaking by 45 percent each and by weighting reading and writing by 5 percent each (California Department of Education, 2009c).

Performance Levels

Five performance levels are reported for the CELDT: beginning, early intermediate, intermediate, early advanced, and advanced, defined as follows (California Department of Education, 2009c).

Beginning: Students performing at this level may demonstrate little or no receptive or productive English skills. They are beginning to understand a few concrete details during unmodified instruction. They may be able to respond to some communication and learning demands but with many errors. Oral and written production is usually limited to disconnected words and memorized statements and questions. Frequent errors make communication difficult.

Early Intermediate: Students performing at this level continue to develop receptive and productive English skills. They are able to identify and understand more concrete details during unmodified instruction. They may be able to respond with increasing ease to more varied communication and learning demands with a reduced number of errors. Oral and written production is usually limited to phrases and memorized statements and questions. Frequent errors still reduce communication.

Intermediate: Students performing at this level begin to tailor their English language skills to meet communication and learning demands with increasing accuracy. They are able to identify and understand more concrete details and some major abstract concepts during unmodified instruction. They are able to respond with increasing ease to more varied communication and learning demands with a reduced number of errors. Oral and written production has usually expanded to sentences, paragraphs, and original statements and questions. Errors still complicate communication.

Early Advanced: Students at this level begin to combine the elements of the English language in complex, cognitively demanding situations and are able to use English as a means for learning in academic domains. They are able to identify and summarize most concrete details and abstract concepts during unmodified instruction in most academic domains. Oral and written productions are characterized by more elaborate discourse and fully developed paragraphs and compositions. Errors are less frequent and rarely complicate communication.

Advanced: Students at this level communicate effectively with various audiences on a wide range of familiar and new topics to meet social and learning demands. In order to attain the English performance level of their native English-speaking peers, further linguistic enhancement and refinement are still necessary. They are able to identify and summarize concrete details and abstract concepts during unmodified instruction in all academic domains. Oral and written productions reflect discourse appropriate for academic domains. Errors are infrequent and do not reduce communication.

The cut scores were set using the bookmark standard-setting procedure (Mitzel et al., 2001). The first standard setting was conducted in spring 2001, followed by a second in February 2006. To be considered proficient in English on the CELDT, students need to score at the "early advanced" level or higher and have no domain scores below "intermediate."

Reliability and Validity

Information about the technical qualities of the CELDT is provided in technical reports, which are prepared each year by the contractor (CTB/McGraw-Hill); the most recent report available to the panel covered the administrations held during the 2008-2009 school year.6 The technical reports contain detailed information about test specifications, item and form development, item and form analysis, equating, and standard setting. They also contain results of analyses to evaluate reliability and validity, although no bias or fairness studies appear to have been done.
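The CELDT composite weighting and the English-proficient criterion described above can be sketched as follows. The function names, the integer grade convention (0 for kindergarten), and the scale scores used are hypothetical:

```python
# Sketch of the CELDT overall-composite weighting and the English-proficient
# criterion (California Department of Education, 2009c). Function names and
# the numeric values below are hypothetical.

LEVELS = ["beginning", "early intermediate", "intermediate",
          "early advanced", "advanced"]  # ordered low -> high

def celdt_overall(listening, speaking, reading, writing, grade):
    """Equal weights in grades 3-12; 45/45/5/5 weighting in K-1 (grade 0 = K)."""
    if grade >= 3:
        return (listening + speaking + reading + writing) / 4
    return 0.45 * listening + 0.45 * speaking + 0.05 * reading + 0.05 * writing

def is_english_proficient(overall_level, domain_levels):
    """Overall at early advanced or higher, and no domain below intermediate."""
    return (LEVELS.index(overall_level) >= LEVELS.index("early advanced")
            and all(LEVELS.index(d) >= LEVELS.index("intermediate")
                    for d in domain_levels))

print(is_english_proficient(
    "early advanced",
    ["intermediate", "advanced", "intermediate", "early advanced"]))  # True
```

The conjunctive domain requirement means a high overall composite alone is not sufficient; a single domain at "early intermediate" blocks the proficient designation.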
Reliability analyses include the standard types of analyses used for tests with multiple-choice items (i.e., estimates of internal consistency) as well as those used for open-ended items (i.e., interrater agreement, generalizability analyses). For the current version of the test, validity studies have been conducted to collect content- and construct-related evidence. The only criterion-related evidence that has been collected comes from a cut-score validation study completed in 2003, which compared qualitative assessments of 600 ELL students' language ability with their CELDT scores (Wolf et al., 2008, pp. 72-79).

6 The technical reports are available at http://www.cde.ca.gov/ta/tg/el/techreport.asp [December 2010].

COMPREHENSIVE ENGLISH LANGUAGE LEARNING ASSESSMENT

The Comprehensive English Language Learning Assessment (CELLA) was developed by the English Proficiency for All Students (EPAS) consortium with the

assistance of the Educational Testing Service (ETS) and Accountability Works.7 Five states initially participated in the consortium—Florida, Maryland, Michigan, Pennsylvania, and Tennessee. Field testing of the items occurred in fall 2004. At present, Florida is the only state that uses the assessment.

Content Standards

According to the developer of the assessment, Ted Rebarber (Rebarber et al., 2007), the first stage in the process was to develop a set of proficiency benchmarks, defined as a matrix of component skills that students are expected to attain at each grade level. The benchmarks were developed based on the experience and professional judgment of researchers at Accountability Works, language researchers, and ETS test developers. The benchmarks were reviewed and approved by educators and other representatives of the five states and acted as a set of common assessment objectives (Rebarber et al., 2007, p. 68). Once the benchmarks/objectives were established, analyses were conducted to determine the extent of alignment between the benchmarks and the ELP content standards of the consortium states; the aligned standards served as the basis for developing the test.

Grade Bands

The CELLA has versions of the test available for four grade bands: K-2, 3-5, 6-8, and 9-12.

Item Types

The test uses both multiple-choice and constructed-response items. The reading and listening tests consist solely of multiple-choice items. The speaking test consists solely of constructed-response items. The writing test includes a combination of both item types.

Scores Reported

The CELLA reports four scale scores: (1) a score for the reading test; (2) a score for the writing test; (3) an oral score, which is a composite of performance on the listening and speaking subtests; and (4) an overall composite score. The subtest scores are unit weighted (i.e., summed) in forming the composites. CELLA score reports for students also provide information on the raw scores (referred to as "points awarded") in several areas. These "subscores" are reported for listening/speaking and reading/writing. Score reports indicate that the raw scores can be used

7 Information on the assessment is available at http://www.fldoe.org/aala/cella.asp [December 2010].

to evaluate students' strengths and weaknesses, but they cannot be compared across administrations.

Performance Levels

Standard setting was conducted separately for each state participating in the consortium. Florida conducted its standard setting in winter 2006 using the bookmark procedure (Mitzel et al., 2001). Four performance levels are used: beginning, low intermediate, high intermediate, and proficient (Educational Testing Service, 2005).

Beginning: Beginning students speak in English and understand spoken English that is below grade level and require continuous support. Beginning students read below-grade-level text and require continuous support. Beginning students write below grade level and require continuous support.

Low Intermediate: Low intermediate students speak in English and understand spoken English that is at or below grade level and require some support. Low intermediate students read text at or below grade level and require some support. Low intermediate students write at or below grade level and require some support.

High Intermediate: High intermediate students, with minimal support, speak in English and understand spoken English that is at grade level. High intermediate students read at grade level with minimal support. High intermediate students write at grade level with minimal support.

Proficient: Proficient students speak in English and understand spoken English at grade level in a manner similar to non-English language learners. Proficient students read grade-level text in a manner similar to non-English language learners. Proficient students write at grade level in a manner similar to non-English language learners.

Separate cut scores were set for three subscores—the oral score (listening and speaking), reading, and writing. Performance level descriptions are provided for each of these areas.
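The unit-weighted composite, together with Florida's English-proficient composite cut scores for reclassification (Florida Department of Education, 2006), can be sketched as follows. The function names and the sample scale scores are hypothetical, and the sketch assumes the overall composite sums the oral, reading, and writing scale scores:

```python
# Sketch of CELLA composite scoring and Florida's composite cut scores for the
# English-proficient determination (Florida Department of Education, 2006).
# Function names and sample scale scores are hypothetical; the overall
# composite is assumed here to sum the oral, reading, and writing scores.

# English-proficient composite cut score by grade cluster.
CUT_SCORES = {"K-2": 2050, "3-5": 2150, "6-8": 2200, "9-12": 2250}

def cella_overall(oral, reading, writing):
    """Subtest scale scores are unit weighted (summed) to form the composite."""
    return oral + reading + writing

def is_english_proficient(grade_cluster, oral, reading, writing):
    """Compare the overall composite against the cluster's cut score."""
    return cella_overall(oral, reading, writing) >= CUT_SCORES[grade_cluster]

# A grade 4 student (cluster 3-5) with hypothetical scale scores:
print(is_english_proficient("3-5", oral=720, reading=710, writing=730))  # True
```

Because the cut score rises with grade cluster, the same composite score of 2,160 would meet the 3-5 threshold but fall short of the 9-12 threshold.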
The state’s policy on reclassification procedures specifies the following criteria for determining proficient performance from the composite score (Florida Depart- ment of Education, 2006):

Grade Cluster    English Proficient Composite Score
K-2              2050
3-5              2150
6-8              2200
9-12             2250

Reliability and Validity

Information about the technical qualities of the CELLA is provided in technical reports, which are prepared by the contractor (ETS).8 The most recent report available to the panel was published in 2005. The technical report contains detailed information about test specifications, item and form development, item and form analysis, equating, and standard setting. The report also contains results of an analysis to evaluate bias and fairness (through analyses of differential item functioning). Reliability estimates are reported in the form of standard errors of measurement. No validity information is reported in the technical manual, although Porta and Vega (2007) indicate that a factor analysis study was conducted to provide construct-related validity evidence (Fitzpatrick et al., 2006, cited in Porta and Vega, 2007, p. 77).

ENGLISH LANGUAGE DEVELOPMENT ASSESSMENT

The English Language Development Assessment (ELDA) is a consortium-based test that was developed by the Council of Chief State School Officers (CCSSO) in conjunction with states in the State Collaborative on Assessment and Student Standards for Limited English Proficient students (LEP-SCASS). To develop the assessment, the consortium worked with the American Institutes for Research (AIR) and Measurement, Incorporated—with external advice from the Center for the Study of Assessment Validity and Evaluation (C-SAVE).9 Development work occurred between fall 2002 and December 2005. Initially, 18 states were members of LEP-SCASS, and 13 states participated in the process of developing, field testing, validating, and implementing ELDA as an operational assessment (Sharon Saez, program director with the Council of Chief State School Officers, personal communication, August 2010).10

8 The technical reports are available at http://www.accountabilityworks.org/photos/CELLA_Technical_Summary_Report.pdf [December 2010].
9 C-SAVE was then housed at the University of Maryland and is now housed at the University of Wisconsin.
10 Nevada was the lead state in collaboration with Georgia, Indiana, Iowa, Kentucky, Louisiana, Nebraska, New Jersey, Ohio, Oklahoma, South Carolina, Virginia, and West Virginia.

[…]
Scores Reported

The NYSESLAT assesses skills in the domains of reading, writing, listening, and speaking. Two composite scores are reported: an oral score that combines performance on the listening and speaking tests and a written score that combines performance on the reading and writing tests.

Performance Levels

Four performance levels have been developed for the test: beginning, intermediate, advanced, and proficient. The technical manual provides descriptions only of the proficient level:

Proficient Level: Reading
• Students read English fluently and confidently and reflect upon a wide range of grade-appropriate English language texts.
• Students identify and interpret relevant data, facts, and main ideas in English literary and informational texts.
• Students comprehend and analyze the author's purpose, point of view, tone, and figurative language and make appropriate inferences in English.
• Students analyze experiences, ideas, information, and issues presented by others in printed English language text, using a variety of established criteria.
• Students demonstrate inference and "beyond the text" understanding of grade-level written English language texts.
• Students interpret, predict, draw conclusions, categorize, and make connections to their own lives and other texts.

Proficient Level: Writing
• Students utilize standard written English to express ideas on a grade-appropriate level by using varied sentence structure, language patterns, and descriptive language.
• Students apply appropriate grade-level strategies to produce a variety of English language written products that demonstrate an awareness of audience, purpose, point of view, tone, and sense of voice.
• Students use written English language to acquire, interpret, apply, and transmit information.
• Students present, in written English language and from a variety of perspectives, their opinions and judgments on experiences, ideas, information, and issues.
• Students use written English for effective social communication with a wide variety of people.
• Students integrate conventions of English language grammar, usage, spelling, capitalization, and punctuation to communicate effectively about

199 APPENDIX A various topics. (Minor errors in spelling grammar or punctuation do not interfere with comprehension.) • Students self-monitor and edit their English language written work. • Students write literary, interpretive, and responsive essays for personal expression. Proficient Level: Listening • Students interpret important features of oral English language, at their grade level, relating to social academic topics and can discriminate between what is and what is not relevant. • Students distinguish, conceptually or linguistically, complex oral English lan- guage expected of their grade level of fluent and/or native English speakers. • Students comprehend grade-level English vocabulary, idioms, colloquial ex- pressions, and apply their prior knowledge to grasp complex ideas expressed in English. • Students listen to spoken English for a variety of purposes, including to acquire information and to take notes. Proficient Level: Speaking • Students select precise and descriptive grade-level vocabulary to participate actively in both social and academic English language settings. • Students make use of standard English to communicate their ideas ef- fectively in an organized and cohesive manner by adjusting to the social context to make themselves understood in English. • Students utilize a variety of oral standard English language resources to ana- lyze, solve problems, make decisions, and communicate shades of meaning in English. • Students use oral standard English language to acquire, interpret, apply, and transmit information. • Students present, in oral standard English language, their opinions and judgments on experiences, ideas, information, and issues. • Students use the English language for effective social communication in socially and culturally appropriate manners. Because there are two composite scores, the state has adopted a rule for deter - mining proficiency from the two composites. 
That is, the overall proficiency level is defined by the lower of the two proficiency level designations. For example, if a student scores in the advanced level for listening/speaking and the proficient level for reading/writing, the overall level is advanced (CTB/McGraw-Hill, 2006). Standard setting was based on the item mapping procedure (Mitzel, et al., 2001). The technical manual for 2006 indicates that detailed descriptions for each performance level exist, but they are not included in the manual.
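The rule just described, taking the lower of the two composite designations, can be sketched as follows. The level names and their ordering are inferred from the example in the text, and the function name is illustrative, not part of the test program.

```python
# Sketch of the rule for deriving an overall proficiency level from the two
# composite designations: the overall level is the LOWER of the two.
# Level names/ordering are inferred from the example in the text; the helper
# itself is illustrative.

LEVELS = ["beginning", "intermediate", "advanced", "proficient"]  # low -> high
RANK = {name: i for i, name in enumerate(LEVELS)}

def overall_level(listening_speaking: str, reading_writing: str) -> str:
    """Return the lower of the two composite proficiency designations."""
    return min(listening_speaking, reading_writing, key=RANK.get)

# Advanced for listening/speaking plus proficient for reading/writing
# yields the lower designation.
print(overall_level("advanced", "proficient"))  # -> advanced
```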

Reliability and Validity

Information about the technical qualities of the assessment is provided in the technical reports.13 Technical reports are prepared each year by the contractor, Pearson. The technical reports contain detailed information about test specifications, item and form development, item and form analysis, equating, and standard setting. They also contain results of analyses to evaluate reliability and validity, and they document efforts to evaluate fairness issues (e.g., bias review panels, analyses of differential item functioning). Reliability analyses include the standard types of analyses used for tests with multiple-choice items (i.e., estimates of internal consistency) as well as those used for open-ended items (i.e., interrater agreement), along with analyses of classification accuracy. Some validity evidence has been collected. Construct-related validity evidence was obtained by examining the intercorrelations between subtest scores and by conducting confirmatory factor analyses of the internal structure of the test. Evidence of criterion-related validity was collected by examining the degree of correspondence between students' performance on the NYSESLAT and performance on the state's English assessments: for the lower grades, the latter was the state's English language arts assessment used for NCLB accountability purposes; for the higher grades, it was the Regents English exam.

STANFORD ENGLISH LANGUAGE PROFICIENCY TEST

The Stanford English Language Proficiency Test (SELP) was developed by NCS Pearson (formerly Harcourt Assessment, Inc.). Pearson offers both an "off-the-shelf" version of the assessment and customized versions that are augmented to meet a particular state's needs. Often the customized versions have different names. For instance, Arizona's version of the SELP is called the Arizona English Language Learner Assessment (AZELLA), and Washington's version is called the Washington Language Proficiency Test-II (WLPT-II). Currently, these are the only two user states, although when our study began, New Mexico and Wyoming also used the assessment (Roger Frantz, manager with Pearson, personal communication, June 2009).14

Content Standards

The test framework was developed in 1997 through analyses of the standards in place for six states (California, Delaware, Georgia, Hawaii, Missouri, and Texas), in conjunction with the TESOL standards (Roger Frantz, manager with Pearson, personal communication, June 2009). Frantz indicated that when a state chooses to

13 The reports are available at http://www.emsc.nysed.gov/osa/reports/ [December 2010].
14 Basic information about the assessment is available through Pearson at http://www.pearsonassessments.com/haiweb/cultures/en-us/productdetail.htm?pid=015-8429-206 [December 2010]. Information is also available at the websites for user states: for Washington, at http://www.k12.wa.us/assessment/wlptii/default.aspx [December 2010]; for Arizona, at http://www.ade.state.az.us/oelas/AZELLA/AZELLAAZ-1TechnicalManual.pdf [December 2010].

use the SELP, an alignment study is conducted to determine the extent to which the assessment is aligned with the state's ELP content standards. The test is then customized or augmented to ensure that the items cover the state standards.

Grade Bands

The SELP provides versions for six grade bands: Pre-K, K-1, 1-2, 3-5, 6-8, and 9-12. However, the grade bands can be customized for a state. For instance, Washington uses versions for four grade bands, K-2, 3-5, 6-8, and 9-12, and Arizona uses versions for five grade bands, K, 1-2, 3-5, 6-8, and 9-12.

Item Types

The SELP consists of five subtests: (1) listening, (2) reading, (3) writing, (4) writing conventions, and (5) speaking. The listening and reading subtests use multiple-choice items. The speaking subtest uses constructed-response items (described by the developer as "performance-based"). The writing conventions subtest uses multiple-choice items to measure the mechanics of writing. The writing subtest uses extended-answer constructed-response items.

Scores Reported

The off-the-shelf version of SELP offers scores for listening, speaking, and reading. Writing is a composite of the writing and writing conventions subtests. Five other composite scores are available: (1) productive skills (speaking and writing); (2) comprehension skills (listening and reading); (3) oral skills (listening and speaking); (4) academic skills (reading, writing, and writing conventions); and (5) an overall composite score. Washington reports individual domain scores (listening, speaking, reading, writing) and an overall composite score. Arizona reports the four domain scores and three composites (comprehension, oral, and overall composite).

Performance Levels

The off-the-shelf version of the SELP has set five performance level descriptions, but states are free to determine their own levels (Roger Frantz, manager with Pearson, personal communication, June 2009). The off-the-shelf version uses the following performance levels: pre-emergent, emergent, basic, intermediate, and proficient. The recommended cut scores for these levels were set by the publisher using the modified Angoff procedure (Angoff, 1984; see also Stephenson, 2003). For states that use a customized version, separate standard setting is done, and performance levels are adapted to the state's needs.

Arizona uses the performance level names established for the off-the-shelf version (Porta and Vega, 2007, p. 137). The cutoff scores for the performance levels

were determined through a standard setting based on the modified-Angoff procedure (Angoff, 1984; Reckase, 2000, as cited in Harcourt, 2007). Performance-level descriptions were developed for each domain area and for each grade band: that is, there are 20 sets of descriptions for the five performance levels. No overall performance-level descriptions appear to be available. As a sample of the performance level descriptions used by Arizona, below are the descriptions for the composite score in comprehension (reading and listening) for the middle elementary grades (3-5) (Harcourt, 2007).

Pre-Emergent: This student made very few or no responses. This student has very little ability to understand spoken English and understands only a few isolated words. This student understands almost no written English or only a few isolated words. This student may be able to understand visual universal symbols and graphics associated with a text.

Emergent: This student is able to comprehend a few key words, phrases, and short sentences in simple conversations on topics of immediate personal relevance when spoken slowly with frequent repetitions and contextual clues. This student is able to understand a few common high-frequency sight words and simple sentences in English. This student is able to comprehend a few simple content-area words with the aid of picture cues. This student is able to indicate the meaning of some common signs, graphics, and symbols.

Basic: This student is able to comprehend and follow three- to four-step oral directions related to the position of one's movements in space. This student can comprehend a few content-area words, including grade-level math and science vocabulary. This student is able to understand a few words that indicate mathematics operations. This student is able to comprehend some simple grade-level math word problems. This student comprehends and follows up to five-step written directions for classroom activities.

Intermediate: This student is able to comprehend and follow three- to four-step oral directions related to the position, frequency, and duration of one's movements in space. This student can comprehend some content-area words, including grade-level math and science vocabulary. This student is able to understand some words that indicate mathematics operations. Occasionally, this student is able to comprehend grade-level math word problems. This student comprehends and follows a short set of written instructions on routine procedures.

Proficient: This student comprehends and follows multiple-step oral instructions (four or more steps) for familiar processes or procedures. This student can comprehend many content-area words, including grade-level math and science vocabulary. This student is able to understand many words that indicate

mathematics operations. Sometimes this student comprehends grade-level math word problems. This student comprehends and follows a set of written multi-step instructions on routine procedures.

Washington uses four performance levels: beginning/advanced beginning, intermediate, advanced, and transitional. Students must reach the transitional level to be considered for reclassification (Kimberly Hayes, WLPT-II memo, Office of Superintendent of Public Instruction, available: http://www.k12.wa.us/assessment/wlptii/pubdocs/WLPTMemoUpdated2010.pdf). Performance level descriptions are not provided in the technical manual but were obtained through the state Title III director (Helen Malagon, personal correspondence, September 2010):

Beginning/Advanced Beginning: Has little or no English reading skills with some understanding of content-area vocabulary and concepts. Writes simple English words, patterned phrases, and simple sentences. Communicates with words, sentences, drawings, gestures, and actions.

Intermediate: Comprehends short connected texts with context clues. Writes simple sentences or repetitive language. Participates in social discussions on unfamiliar topics. Begins to self-correct speech.

Advanced: Reads both short and long connected texts with understanding. Writes simple essays with standard conventions, organization, and detail. Uses figurative and idiomatic language in discussions of academic content and ideas.

Transitional: Reads and writes at grade level. Uses grammatically correct English with native-like proficiency.

Details about Washington's standard-setting methods are not described in the technical manual.

Reliability and Validity

Information about the technical qualities of the SELP assessment is provided in technical reports, some of which are available through state websites and some through the publisher.15 The technical report for the WLPT-II was obtained from the state Title III director (Pearson Education, 2010). Technical reports do not appear to be prepared each year for each state. An updated version of the technical report for the off-the-shelf version was still under preparation for the 2009 administration year. The version of Arizona's technical manual that the panel obtained was a summary of technical information for the 2006 administration year. The version of Washington's technical manual that we reviewed was for the 2008-2009 testing year. The technical reports contain detailed information about test specifications, item and form development, item and form analysis, equating, and standard setting. They also contain results of analyses to evaluate reliability and validity and document efforts to evaluate fairness issues. For the SELP, reliability analyses include the standard types of analyses used for tests with multiple-choice items (i.e., estimates of internal consistency) as well as those used for open-ended items (i.e., interrater agreement). Studies of classification accuracy are also reported in the technical manuals for the two user states (Arizona and Washington). A number of validity studies have been conducted for both states. Evidence of content-related validity is based on studies of the alignment between the test items and the content standards. Evidence of construct-related validity is based on examination of the intercorrelations among the subtests, point-biserial correlations, and principal components factor analyses of the internal structure. No evidence of criterion-related validity is reported for either state, although the report for Arizona indicates that such studies were planned for the 2007 testing cycle. Studies of fairness/bias appear to be based on bias reviews conducted as items were developed and test forms assembled. Results from analyses of differential item functioning are reported in the technical manual for Washington but not for Arizona.

15 For instance, we obtained a technical report for the off-the-shelf version of SELP through the publisher, and we obtained the technical report for AZELLA at http://www.ade.state.az.us/oelas/AZELLA/AZELLAAZ-1TechnicalManual.pdf [December 2010]. We also obtained a technical report for New Mexico, for the 2007-2008 school year, at http://www.ped.state.nm.us/AssessmentAccountability/procurementLib3.html [December 2010]. We do not provide details about the New Mexico test because the state discontinued its contract with Pearson in 2009 and began using the ACCESS.
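The reliability analyses cited throughout these reviews rest on two standard computations: an internal-consistency coefficient (commonly Cronbach's alpha) for the multiple-choice sections, and a rater-agreement rate for the open-ended items. A minimal sketch, with purely illustrative data and function names that are not drawn from any of the test programs:

```python
# Two standard reliability computations referred to in these reviews.
# Neither is specific to any one test program; data and names are illustrative.

def cronbach_alpha(item_scores):
    """Internal consistency. item_scores: one row per examinee,
    one column per item (e.g., 0/1 scores for multiple-choice items)."""
    n_items = len(item_scores[0])

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_vars = [variance([row[i] for row in item_scores]) for i in range(n_items)]
    total_var = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

def exact_agreement(rater_a, rater_b):
    """Interrater agreement: proportion of responses assigned
    the same score by two raters."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Perfectly parallel items give alpha = 1.0; raters agreeing on 3 of 4
# scores give an exact-agreement rate of 0.75.
print(cronbach_alpha([[1, 1], [0, 0], [1, 1], [0, 0]]))
print(exact_agreement([4, 3, 2, 1], [4, 3, 2, 2]))  # -> 0.75
```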
TEXAS ENGLISH LANGUAGE PROFICIENCY ASSESSMENT SYSTEM

In response to state legislation passed in 1995, the Texas Education Agency (TEA), along with the testing contractor, Beck Evaluation and Testing Associates, developed the Reading Proficiency Tests in English (RPTE), which were implemented during the 1999-2000 school year for ELL students in grades 3 through 12. These were the first state-administered reading tests of ELP in the Texas assessment program. In response to federal requirements for assessing additional grades and language domains, additional assessments of English language proficiency were implemented during the 2003-2004 school year. At that time, the Texas English Language Proficiency Assessment System (TELPAS) was created, and RPTE was retained as the reading component of TELPAS for ELL students in grades 3-12. Holistically rated assessments were developed for the domain of reading in K-2 and for listening, speaking, and writing in K-12. Changes were made to the RPTE during the 2007-2008 school year, and the name RPTE was discontinued. The current version of the test is an online assessment. Technical information is available in the technical digest published by the Texas Education Agency (2009c).16

16 Technical Digest 2008-2009, Chapter 7, TELPAS, pp. 165-167; it is available at http://www.tea.state.tx.us/index3.aspx?id=2147484418&menu [December 2010].

Content Standards

The RPTE was originally intended to align with the state's previous assessment program, the Texas Assessment of Academic Skills. Beginning in spring 2004, the RPTE was augmented in order to align it with the reading selections and test questions of another assessment, the Texas Assessment of Knowledge and Skills. In 2008, a new edition of RPTE was developed to align with the state's revised ELP standards, at which point a number of test modifications were made:

• The TELPAS subcomponent name RPTE was discontinued.
• A grade 2 test was added, resulting in the discontinuation of the previously administered holistically rated grade 2 TELPAS reading assessment.
• The grade clustering of the middle and high school tests changed from grades 6-8 and 9-12 to grades 6-7, 8-9, and 10-12.
• More reading selections and test questions were added to assess English language reading proficiency in mathematics and science contexts.
• The test blueprints were modified to include more reading material at the highest ELP level.
• The tests were developed as online assessments.

TELPAS is intended to measure learning in alignment with the Texas English Language Proficiency Standards, which are a component of the Texas Essential Knowledge and Skills curriculum. The standards outline the instruction that ELL students must receive to support their ability to develop academic ELP and acquire challenging academic knowledge and skills.

Grade Bands

The TELPAS reading tests have versions for the following grade bands: 2, 3, 4-5, 6-7, 8-9, and 10-12. The holistically rated components are grade specific.

Item Types

TELPAS includes holistically rated, performance-based components to assess skills in some of the domains. For grades K-1, these assessments are used in all domains: listening, speaking, reading, and writing. For grades 2-12, they are used to assess all domains except reading, which is assessed through multiple-choice items. The holistic assessments are conducted by teachers in the classroom. The teachers are trained to collect information on their own students and to evaluate on the basis of their interactions with and observations of students. Writing in grades 2-12 is assessed through a collection of students' classroom writing assignments. Teachers must undergo training to learn how to conduct the ratings and must meet qualification standards. The rating rubrics are the proficiency-level descriptors, which are defined in the Texas ELP standards and which teachers are required to use in ongoing

instruction to develop students' English language proficiency and make grade-level instruction linguistically accessible.

Scores Reported

Scores are reported for each domain: listening, speaking, reading, and writing. Two composite scores are also reported. One is a comprehension score, derived from performance on the reading and listening subtests. An overall composite score and rating are also reported. In computing this composite score, listening and speaking are each weighted by 5 percent, writing is weighted by 15 percent, and reading is weighted by 75 percent. According to the technical manual (Texas Education Agency, 2009c), listening and speaking receive less weight so that students do not attain a high composite proficiency rating before they acquire the English reading and writing proficiency needed to support their full potential for academic success.

Performance Levels

TELPAS scores are reported according to four performance levels: beginning, intermediate, advanced, and advanced high. Performance-level descriptions are available for each domain and for the overall score. The global descriptors appear below:

Beginning: Beginning students have little or no ability to understand and use English. They may know a little English but not enough to function meaningfully in social or academic settings.

Intermediate: Intermediate students have some ability to understand and use English. They can function in social and academic settings as long as the tasks require them to understand and use simple language structures and high-frequency vocabulary in routine contexts.

Advanced: Advanced students are able to engage in age-appropriate academic instruction in English, although ongoing second language support is needed to help them understand and use grade-appropriate language. These students function beyond the level of simple, routinely used English.

Advanced High: Advanced high students have attained the command of English that enables them, with minimal second language acquisition support, to engage in regular, all-English, academic instruction at their grade level.

To be considered proficient in English on the TELPAS, students must score at the "advanced high" level.
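The weighting scheme described under Scores Reported can be sketched directly. The weights (5, 5, 15, and 75 percent) come from the technical manual as described above; the input scale scores and the function name are illustrative.

```python
# Sketch of the TELPAS overall composite weighting described above:
# listening and speaking count 5 percent each, writing 15 percent,
# and reading 75 percent. Input scores and names are illustrative.

WEIGHTS_PERCENT = {"listening": 5, "speaking": 5, "writing": 15, "reading": 75}

def telpas_composite(scores: dict) -> float:
    """Weighted overall composite computed from the four domain scores."""
    return sum(WEIGHTS_PERCENT[d] * scores[d] for d in WEIGHTS_PERCENT) / 100

# Reading dominates the composite: a student who is strong orally but weak
# in reading cannot attain a high overall rating.
print(telpas_composite({"listening": 800, "speaking": 800,
                        "writing": 600, "reading": 500}))  # -> 545.0
```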

Reliability and Validity

Information about the technical qualities of the assessment is provided in annually published technical digests.17 The technical digests are prepared for each administration cycle by TEA, in conjunction with Pearson, the state's testing contractor. The version used for the panel's review was for the 2008-2009 school year (Texas Education Agency, 2009c). This digest contains detailed information about test specifications, item and form development, item and form analysis, and statistical procedures for equating of the reading test. The holistically rated assessments are not statistically equated; instead, the difficulty is maintained through the use of consistent rating rubrics developed to define the proficiency levels and through consistent training and qualifying procedures for the raters. Details about standard setting appear in the report for the 2007-2008 school year.

The technical report contains results of analyses to evaluate reliability and validity. Reliability analyses include the standard types of analyses used for tests with multiple-choice items (i.e., estimates of internal consistency) as well as those used for open-ended items (i.e., interrater agreement). Estimates of classification accuracy are also provided (e.g., accuracy of student classifications into performance categories). Some validity evidence has been collected. Content-related validity evidence consists primarily of expert review of the extent to which the items conform to the item specifications and the performance-level descriptions. The TEA indicates that construct-related validity evidence is provided through estimation of internal consistency reliability for the multiple-choice components and the training and administration procedures for the holistically rated components. Evidence of criterion-related validity was collected by examining the degree of correspondence between performance on the TELPAS reading component and performance on the state's reading assessment (the Texas Assessment of Knowledge and Skills, TAKS). For the study, the average TAKS reading score was calculated for students at each grade level and at each performance level (for example, the mean TAKS score for 3rd graders classified on the TELPAS as beginning, intermediate, advanced, or advanced high, and so on for each grade). Rating audits of the other language domains are conducted to provide evidence that the internal structure of the assessments is intact and that teachers administer the holistically rated assessments and apply the rating rubrics as intended. No information is provided about attempts to evaluate the assessment for fairness or bias.

17 The digests are available at http://www.tea.state.tx.us/index3.aspx?id=2147484418&menu_id=793 [December 2010].
