Appendix A
Review of English Language Proficiency Tests

As part of the panel’s work, we identified eight English language proficiency (ELP) tests to review in detail (see Chapter 3). These eight tests are used by 40 states and are administered to approximately 75 percent of the English language learner (ELL) students in the country. The tests that we reviewed are listed in Table A-1 along with the states that used each of them during the 2009-2010 school year.

Our review is based on several sources of information. First, we reviewed the technical manuals available for each test. Second, we consulted two recent reports that summarized technical information about the tests: Abedi (2007) provides detailed information about each of the consortium-developed ELP tests (as explained in Chapter 3) and brief descriptions of all of the tests used by the states during the 2006-2007 school year; Wolf et al. (2008) provide a summary of the technical information available for 13 ELP tests as of 2007. Third, representatives from four testing programs—Assessing Comprehension and Communication in English State-to-State (ACCESS), the English Language Development Assessment (ELDA), Language Assessment Scales Links K-12 (LAS-Links), and the Stanford English Language Proficiency Test (SELP)—met with the panel at our second meeting to discuss their tests. This appendix summarizes the information we obtained from these sources.

ASSESSING COMPREHENSION AND COMMUNICATION STATE TO STATE FOR ELL STUDENTS

ACCESS was developed by the World-Class Instructional Design and Assessment (WIDA) Consortium. It began as a partnership of three states—Arkansas, Delaware, and Wisconsin—with technical support through the Center for Applied



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.





ALLOCATING FEDERAL FUNDS

Linguistics (CAL), the University of Wisconsin system, and the University of Illinois at Urbana-Champaign. Shortly after grant funding was awarded, seven other states joined the consortium (Alabama, District of Columbia, Illinois, Maine, New Hampshire, Rhode Island, and Vermont). Field-testing was done in 2004, and by spring 2005 the test was operational in three states (Alabama, Maine, and Vermont). By spring 2006, 12 states were using the assessment. At this point, development efforts were transferred from the Wisconsin Department of Public Instruction to the University of Wisconsin-Madison's Wisconsin Center for Education Research (WCER) (Bauman et al., 2007, pp. 81, 82). In the 2010-2011 testing cycle, ACCESS will be operational in 24 states. Development work on ACCESS is ongoing, and approximately one-third of the test is refreshed every year.1

1 Information about ACCESS is available at http://www.wida.us/assessment/access/index.aspx [December 2010].

TABLE A-1 English Language Proficiency Tests Reviewed and the States That Use Them

Test        States Using the Test During the 2009-2010 School Year
ACCESS      Alabama, Delaware, DC, Georgia, Hawaii, Illinois, Kentucky, Maine, Mississippi, Missouri, New Hampshire, New Jersey, New Mexico, North Carolina, North Dakota, Oklahoma, Pennsylvania, Rhode Island, South Dakota, Vermont, Virginia, Wisconsin, Wyoming
CELDT       California
CELLA       Florida
ELDA        Arkansas, Iowa, Louisiana, Nebraska, South Carolina, Tennessee, West Virginia
LAS Links*  Colorado, Connecticut, Indiana, Maryland
NYSESLAT    New York
SELP*       Arizona, Washington
TELPAS      Texas
Total tests: 8    Total states: 40

NOTE: States in bold are those with high numbers of ELL students.
*Test is customized for each state so that it measures the state's content standards.

Content Standards

The ELP content standards for ACCESS were developed jointly by eight of the WIDA member states in 2003. According to Bauman and colleagues (2007), in developing the standards, the consortium wanted to ensure two essential elements: (1) a strong representation of the language of state academic standards across the core content areas (language arts, math, science, social studies, and the classroom setting); and (2) consensus by member states on the components of the ELP standards. As new states have joined the consortium, teams of researchers have continued the process by conducting alignment studies between the WIDA standards and a state's content standards.

Grade Bands

ACCESS reports information for five grade bands: K, 1-2, 3-5, 6-8, and 9-12. For each grade band except kindergarten, three difficulty levels of the test are available. The difficulty levels are intended to tailor the test to students' approximate proficiency range.

Item Types

ACCESS consists of both multiple-choice items (the listening and reading tests) and constructed-response items (the writing and speaking tests). The speaking test is adaptive and administered one-on-one; the other tests are typically administered in a group setting. ACCESS test items are embedded in the context of a content-based theme, called a folder. A folder typically consists of a shared theme graphic followed by three or four items.

Scores Reported

ACCESS reports scores for each of the domains—reading, writing, listening, and speaking—as well as four composite scores. The overall composite score is formed by weighting reading and writing by 35 percent each and by weighting listening and speaking by 15 percent each. Reading and writing are weighted more heavily on the basis of the test developer's judgment about their importance for academic language proficiency. An oral language composite score is formed by equally weighting the scores in listening and speaking; similarly, a literacy composite score is formed by equally weighting the scores in reading and writing. The comprehension composite score weights reading by 70 percent and listening by 30 percent (Bauman et al., 2007, p. 90).
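The composite weighting just described can be sketched in a few lines of code. The function name and the example scale scores below are hypothetical, and operational ACCESS scoring also involves scaling and rounding rules not shown here:

```python
# Sketch of the ACCESS composite-score weighting (Bauman et al., 2007, p. 90).
# The function name and the scale values used below are hypothetical.

def access_composites(reading, writing, listening, speaking):
    """Return the four ACCESS composite scores from domain scale scores."""
    return {
        # Reading and writing carry 35 percent each; listening and speaking 15 percent each.
        "overall": 0.35 * reading + 0.35 * writing
                   + 0.15 * listening + 0.15 * speaking,
        # Oral language and literacy composites weight their two domains equally.
        "oral_language": 0.5 * listening + 0.5 * speaking,
        "literacy": 0.5 * reading + 0.5 * writing,
        # Comprehension weights reading 70 percent and listening 30 percent.
        "comprehension": 0.7 * reading + 0.3 * listening,
    }

scores = access_composites(reading=350, writing=340, listening=360, speaking=330)
```

The weights make explicit why a student's overall composite is driven primarily by the literacy domains: a 10-point change in reading moves the overall score more than twice as much as the same change in speaking.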
Performance Levels

ACCESS scores are reported using six proficiency levels: entering, beginning, developing, expanding, bridging, and reaching, defined as follows (MacGregor et al., 2009):

Entering: English language learners will process, understand, produce, or use
• pictorial or graphic representation of the language of the content areas;
• words, phrases, or chunks of language when presented with one-step commands, directions, questions, or statements with visual and graphic support.

Beginning: English language learners will process, understand, produce, or use
• general language related to the content areas;
• phrases or short sentences;
• oral or written language with phonological, syntactic, or semantic errors that often impede the meaning of the communication when presented with one- to multiple-step commands, directions, questions, or a series of statements with visual and graphic support.

Developing: English language learners will process, understand, produce, or use
• general and some specific language of the content areas;
• expanded sentences in oral interaction or written paragraphs;
• oral or written language with phonological, syntactic, or semantic errors that may impede the communication but retain much of its meaning when presented with oral or written, narrative or expository descriptions with occasional visual and graphic support.

Expanding: English language learners will process, understand, produce, or use
• specific and some technical language of the content areas;
• a variety of sentence lengths of varying linguistic complexity in oral discourse or multiple, related paragraphs;
• oral or written language with minimal phonological, syntactic, or semantic errors that do not impede the overall meaning of the communication when presented with oral or written connected discourse with occasional visual and graphic support.

Bridging: English language learners will process, understand, produce, or use
• the technical language of the content areas;
• a variety of sentence lengths of varying linguistic complexity in extended oral or written discourse, including stories, essays, or reports;
• oral or written language approaching comparability to that of English-proficient peers when presented with grade-level material.

Reaching: English language learners will process, understand, produce, or use
• specialized or technical language reflective of the content area at grade level;
• a variety of sentence lengths of varying linguistic complexity in extended oral or written discourse as required by the specified grade level;
• oral or written communication in English comparable to that of proficient English peers.

Cut scores for the levels were set using the bookmark procedure2 for listening and reading and the body of work method3 for writing and speaking (Bauman et al., 2007, pp. 84, 86). Following the introduction of the new pre-K cluster in 2007, an additional standard-setting study for this cluster was conducted in 2008 (MacGregor et al., 2009). The WIDA Consortium allows its member states to determine the performance level on ACCESS that they consider to be English proficient (i.e., the level that indicates the student is sufficiently proficient to be considered for reclassification as a former ELL). The levels vary by state, with some setting the proficient level at expanding, some at bridging, and some at reaching.

Reliability and Validity

Information about the technical qualities of the ACCESS assessment is provided in its technical reports, which are prepared each year; the most recent report available to the panel covered the administrations held during the 2008-2009 school year.4 The technical reports contain detailed information about test specifications, item and form development, item and form analysis, equating, and standard setting. They also contain results of analyses to evaluate reliability and validity, and they document efforts to evaluate fairness issues (e.g., bias review panels, analyses of differential item functioning). Reliability analyses include the standard types of analyses used for tests with multiple-choice items (i.e., estimates of internal consistency), as well as those used for open-ended items (i.e., interrater agreement, generalizability analyses).

A number of validity studies have been conducted to collect content-, construct-, and criterion-related evidence. Content-related validity evidence was collected by comparing each item's a priori proficiency level (the level the item was designed to target) against its empirical difficulty. Expert review is also used to evaluate the extent to which items measure the intended content. Construct-related evidence consists primarily of the degree of correspondence among the subtest scores (i.e., their intercorrelations). Some evidence of criterion-related validity has been collected. One study compared ACCESS scores with a priori ELP categorizations of students who participated in the field tests (described in Wolf et al., 2007, p. J2-75). Another study compared the performance of students who took both ACCESS and one of the older generation of ELP tests, including the New IDEA Proficiency Test (New-IPT), the Language Assessment Scales (LAS), the Maculaitis Assessment of Competencies Test of English Language Proficiency (MAC II), and the Language Proficiency Test Series (LPTS).

2 See Mitzel et al. (2001) for an explanation of this method.
3 See Kingston et al. (2001) for an explanation of this method.
4 The reports are available at http://www.wida.us/assessment/access/TechReports/index.aspx [December 2010].
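The internal-consistency estimates mentioned in the reliability analyses are typically coefficients such as Cronbach's alpha. As a generic illustration (not the ACCESS developers' actual procedure), alpha can be computed from a matrix of item scores; the data below are made up:

```python
# Illustrative sketch of an internal-consistency estimate (Cronbach's alpha).
# This is the generic formula, not the ACCESS developers' actual procedure;
# the item-score matrix below is hypothetical.

def cronbach_alpha(item_scores):
    """item_scores: one row per examinee, one column per item."""
    n_items = len(item_scores[0])

    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Variance of each item across examinees.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(n_items)]
    # Variance of examinees' total scores.
    total_var = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
alpha = cronbach_alpha(data)
```

Higher values indicate that the items covary strongly, i.e., that the test score would be similar if a parallel set of items were administered.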

CALIFORNIA ENGLISH LANGUAGE DEVELOPMENT TEST

The California English Language Development Test (CELDT) was developed and in place prior to the implementation of the No Child Left Behind Act (NCLB). In 1997, state legislation authorized the California Department of Education to develop ELP standards and a language proficiency assessment to be used statewide; the standards were adopted in 1999. The first version of the CELDT consisted primarily of items developed by CTB/McGraw-Hill for the Language Assessment Scales (LAS) tests, with some new items the test publisher developed specifically for the state. This version of the test was field tested in fall 2000. Data from the field test were used to select items and create the operational forms of the test, which were first administered in 2001. The CELDT has been updated yearly since 2001, and subsequent versions have replaced the LAS items with new items aligned with the California standards (Porta and Vega, 2007, p. 138).5

Content Standards

According to CELDT information, its test questions are designed to assess basic social conventions, rudimentary classroom vocabulary, and ways to express personal and safety needs. Some of the questions are designed to assess student performance at the early advanced and advanced proficiency levels and to incorporate classroom language. To this end, the questions engage academic language functions, such as explaining, analyzing, and summarizing.

Grade Bands

The CELDT has test versions for each of four grade bands: K-2, 3-5, 6-8, and 9-12.

Item Types

The test uses a combination of multiple-choice and constructed-response items. The reading test uses only multiple-choice items, and the speaking test uses only constructed-response items (requiring both short and extended answers). The listening and writing tests use a combination of item types: the listening test uses multiple-choice and short-answer constructed-response items; the writing test uses multiple-choice, short-answer constructed-response, and extended-answer constructed-response items (California Department of Education, 2008c, 2009c).

5 Information about the test is available at http://www.cde.ca.gov/ta/tg/el/ [December 2010].

Scores Reported

Scores are reported for each domain—listening, speaking, reading, and writing. Two composite scores are also reported. The comprehension score is derived from performance on the reading and listening subtests, and an overall composite score is also reported. For grades 3 through 12, the composite score is the average of the scores in all four domains. For kindergarten and grade 1, the composite score is formed by weighting listening and speaking by 45 percent each and by weighting reading and writing by 5 percent each (California Department of Education, 2009c).

Performance Levels

Five performance levels are reported for the CELDT: beginning, early intermediate, intermediate, early advanced, and advanced, defined as follows (California Department of Education, 2009c).

Beginning: Students performing at this level may demonstrate little or no receptive or productive English skills. They are beginning to understand a few concrete details during unmodified instruction. They may be able to respond to some communication and learning demands but with many errors. Oral and written production is usually limited to disconnected words and memorized statements and questions. Frequent errors make communication difficult.

Early Intermediate: Students performing at this level continue to develop receptive and productive English skills. They are able to identify and understand more concrete details during unmodified instruction. They may be able to respond with increasing ease to more varied communication and learning demands with a reduced number of errors. Oral and written production is usually limited to phrases and memorized statements and questions. Frequent errors still reduce communication.

Intermediate: Students performing at this level begin to tailor their English language skills to meet communication and learning demands with increasing accuracy. They are able to identify and understand more concrete details and some major abstract concepts during unmodified instruction. They are able to respond with increasing ease to more varied communication and learning demands with a reduced number of errors. Oral and written production has usually expanded to sentences, paragraphs, and original statements and questions. Errors still complicate communication.

Early Advanced: Students at this level begin to combine the elements of the English language in complex, cognitively demanding situations and are able to use English as a means for learning in academic domains. They are able to identify and summarize most concrete details and abstract concepts during unmodified instruction in most academic domains. Oral and written productions are characterized by more elaborate discourse and fully developed paragraphs and compositions. Errors are less frequent and rarely complicate communication.

Advanced: Students at this level communicate effectively with various audiences on a wide range of familiar and new topics to meet social and learning demands. In order to attain the English performance level of their native English-speaking peers, further linguistic enhancement and refinement are still necessary. They are able to identify and summarize concrete details and abstract concepts during unmodified instruction in all academic domains. Oral and written productions reflect discourse appropriate for academic domains. Errors are infrequent and do not reduce communication.

The cut scores were set using the bookmark standard-setting procedure (Mitzel et al., 2001). The first standard setting was conducted in spring 2001, followed by a second in February 2006. To be considered proficient in English on the CELDT, students need to score at the "early advanced" level or higher and have no domain scores below "intermediate."

Reliability and Validity

Information about the technical qualities of the CELDT is provided in technical reports, which are prepared each year by the contractor (CTB/McGraw-Hill); the most recent report available to the panel covered the administrations held during the 2008-2009 school year.6 The technical reports contain detailed information about test specifications, item and form development, item and form analysis, equating, and standard setting. They also contain results of analyses to evaluate reliability and validity, although no bias or fairness studies appear to have been done.
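The CELDT composite weighting and the English-proficient criterion described above can be sketched as follows. The function names, the integer grade convention (0 for kindergarten), and the scale scores used are hypothetical:

```python
# Sketch of the CELDT overall-composite weighting and the English-proficient
# criterion (California Department of Education, 2009c). Function names and
# the numeric values below are hypothetical.

LEVELS = ["beginning", "early intermediate", "intermediate",
          "early advanced", "advanced"]  # ordered low -> high

def celdt_overall(listening, speaking, reading, writing, grade):
    """Equal weights in grades 3-12; 45/45/5/5 weighting in K-1 (grade 0 = K)."""
    if grade >= 3:
        return (listening + speaking + reading + writing) / 4
    return 0.45 * listening + 0.45 * speaking + 0.05 * reading + 0.05 * writing

def is_english_proficient(overall_level, domain_levels):
    """Overall at early advanced or higher, and no domain below intermediate."""
    return (LEVELS.index(overall_level) >= LEVELS.index("early advanced")
            and all(LEVELS.index(d) >= LEVELS.index("intermediate")
                    for d in domain_levels))

print(is_english_proficient(
    "early advanced",
    ["intermediate", "advanced", "intermediate", "early advanced"]))  # True
```

The conjunctive domain requirement means a high overall composite alone is not sufficient; a single domain at "early intermediate" blocks the proficient designation.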
Reliability analyses include the standard types of analyses used for tests with multiple-choice items (i.e., estimates of internal consistency) as well as those used for open-ended items (i.e., interrater agreement, generalizability analyses). For the current version of the test, validity studies have been conducted to collect content- and construct-related evidence. The only criterion-related evidence that has been collected comes from a cut-score validation study completed in 2003, which compared qualitative assessments of 600 ELL students' language ability with their CELDT scores (Wolf et al., 2008, pp. 72-79).

6 The technical reports are available at http://www.cde.ca.gov/ta/tg/el/techreport.asp [December 2010].

COMPREHENSIVE ENGLISH LANGUAGE LEARNING ASSESSMENT

The Comprehensive English Language Learning Assessment (CELLA) was developed by the English Proficiency for All Students (EPAS) consortium with the

assistance of the Educational Testing Service (ETS) and Accountability Works.7 Five states initially participated in the consortium—Florida, Maryland, Michigan, Pennsylvania, and Tennessee. Field testing of the items occurred in fall 2004. At present, Florida is the only state that uses the assessment.

Content Standards

According to the developer of the assessment, Ted Rebarber (Rebarber et al., 2007), the first stage in the process was to develop a set of proficiency benchmarks, defined as a matrix of component skills that students are expected to attain at each grade level. The benchmarks were developed based on the experience and professional judgment of researchers at Accountability Works, language researchers, and ETS test developers. The benchmarks were reviewed and approved by educators and other representatives of the five states and acted as a set of common assessment objectives (Rebarber et al., 2007, p. 68). Once the benchmarks/objectives were established, analyses were conducted to determine the extent of alignment between the benchmarks and the ELP content standards of the consortium states; the aligned standards served as the basis for developing the test.

Grade Bands

The CELLA has versions of the test available for four grade bands: K-2, 3-5, 6-8, and 9-12.

Item Types

The test uses both multiple-choice and constructed-response items. The reading and listening tests consist solely of multiple-choice items. The speaking test consists solely of constructed-response items. The writing test includes a combination of both item types.

Scores Reported

The CELLA reports four scale scores: (1) a score for the reading test; (2) a score for the writing test; (3) an oral score, which is a composite of performance on the listening and speaking subtests; and (4) an overall composite score. The subtest scores are unit weighted (i.e., summed) in forming the composites. CELLA score reports for students also provide information on the raw scores (referred to as "points awarded") in several areas. These "subscores" are reported for listening/speaking and reading/writing. Score reports indicate that the raw scores can be used

7 Information on the assessment is available at http://www.fldoe.org/aala/cella.asp [December 2010].

to evaluate students' strengths and weaknesses, but they cannot be compared across administrations.

Performance Levels

Standard setting was conducted separately for each state participating in the consortium. Florida conducted its standard setting in winter 2006 using the bookmark procedure (Mitzel et al., 2001). Four performance levels are used: beginning, low intermediate, high intermediate, and proficient (Educational Testing Service, 2005).

Beginning: Beginning students speak in English and understand spoken English that is below grade level and require continuous support. Beginning students read below-grade-level text and require continuous support. Beginning students write below grade level and require continuous support.

Low Intermediate: Low intermediate students speak in English and understand spoken English that is at or below grade level and require some support. Low intermediate students read text at or below grade level and require some support. Low intermediate students write at or below grade level and require some support.

High Intermediate: High intermediate students, with minimal support, speak in English and understand spoken English that is at grade level. High intermediate students read at grade level with minimal support. High intermediate students write at grade level with minimal support.

Proficient: Proficient students speak in English and understand spoken English at grade level in a manner similar to non-English language learners. Proficient students read grade-level text in a manner similar to non-English language learners. Proficient students write at grade level in a manner similar to non-English language learners.

Separate cut scores were set for three subscores—the oral score (listening and speaking), reading, and writing. Performance level descriptions are provided for each of these areas.
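The unit-weighted composite, together with Florida's English-proficient composite cut scores for reclassification (Florida Department of Education, 2006), can be sketched as follows. The function names and the sample scale scores are hypothetical, and the sketch assumes the overall composite sums the oral, reading, and writing scale scores:

```python
# Sketch of CELLA composite scoring and Florida's composite cut scores for the
# English-proficient determination (Florida Department of Education, 2006).
# Function names and sample scale scores are hypothetical; the overall
# composite is assumed here to sum the oral, reading, and writing scores.

# English-proficient composite cut score by grade cluster.
CUT_SCORES = {"K-2": 2050, "3-5": 2150, "6-8": 2200, "9-12": 2250}

def cella_overall(oral, reading, writing):
    """Subtest scale scores are unit weighted (summed) to form the composite."""
    return oral + reading + writing

def is_english_proficient(grade_cluster, oral, reading, writing):
    """Compare the overall composite against the cluster's cut score."""
    return cella_overall(oral, reading, writing) >= CUT_SCORES[grade_cluster]

# A grade 4 student (cluster 3-5) with hypothetical scale scores:
print(is_english_proficient("3-5", oral=720, reading=710, writing=730))  # True
```

Because the cut score rises with grade cluster, the same composite score of 2,160 would meet the 3-5 threshold but fall short of the 9-12 threshold.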
The state’s policy on reclassification procedures specifies the following criteria for determining proficient performance from the composite score (Florida Depart- ment of Education, 2006):

Grade Cluster    English Proficient Composite Score
K-2              2050
3-5              2150
6-8              2200
9-12             2250

Reliability and Validity

Information about the technical qualities of the CELLA is provided in technical reports, which are prepared by the contractor (ETS).8 The most recent report available to the panel was published in 2005. The technical report contains detailed information about test specifications, item and form development, item and form analysis, equating, and standard setting. The report also contains results of an analysis to evaluate bias and fairness (through analyses of differential item functioning). Reliability estimates are reported in the form of standard errors of measurement. No validity information is reported in the technical manual, although Porta and Vega (2007) indicate that a factor analysis study was conducted to provide construct-related validity evidence (Fitzpatrick et al., 2006, cited in Porta and Vega, 2007, p. 77).

ENGLISH LANGUAGE DEVELOPMENT ASSESSMENT

The English Language Development Assessment (ELDA) is a consortium-based test that was developed by the Council of Chief State School Officers (CCSSO) in conjunction with states in the State Collaborative on Assessment and Student Standards for Limited English Proficient students (LEP-SCASS). To develop the assessment, the consortium worked with the American Institutes for Research (AIR) and Measurement, Incorporated—with external advice from the Center for the Study of Assessment Validity and Evaluation (C-SAVE).9 Development work occurred between fall 2002 and December 2005. Initially, 18 states were members of LEP-SCASS, and 13 states participated in the process of developing, field testing, validating, and implementing ELDA as an operational assessment (Sharon Saez, program director with the Council of Chief State School Officers, personal communication, August 2010).10

8 The technical reports are available at http://www.accountabilityworks.org/photos/CELLA_Technical_Summary_Report.pdf [December 2010].
9 C-SAVE was then housed at the University of Maryland and is now housed at the University of Wisconsin.
10 Nevada was the lead state in collaboration with Georgia, Indiana, Iowa, Kentucky, Louisiana, Nebraska, New Jersey, Ohio, Oklahoma, South Carolina, Virginia, and West Virginia.

[…]
Scores Reported

The NYSESLAT assesses skills in the domains of reading, writing, listening, and speaking. Two composite scores are reported: an oral score that combines performance on the listening and speaking tests and a written score that combines performance on the reading and writing tests.

Performance Levels

Four performance levels have been developed for the test: beginning, intermediate, advanced, and proficient. The technical manual provides descriptions only of the proficient level:

Proficient Level: Reading
• Students read English fluently and confidently and reflect upon a wide range of grade-appropriate English language texts.
• Students identify and interpret relevant data, facts, and main ideas in English literary and informational texts.
• Students comprehend and analyze the author's purpose, point of view, tone, and figurative language and make appropriate inferences in English.
• Students analyze experiences, ideas, information, and issues presented by others in printed English language text, using a variety of established criteria.
• Students demonstrate inference and "beyond the text" understanding of grade-level written English language texts.
• Students interpret, predict, draw conclusions, categorize, and make connections to their own lives and other texts.

Proficient Level: Writing
• Students utilize standard written English to express ideas on a grade-appropriate level by using varied sentence structure, language patterns, and descriptive language.
• Students apply appropriate grade-level strategies to produce a variety of English language written products that demonstrate an awareness of audience, purpose, point of view, tone, and sense of voice.
• Students use written English language to acquire, interpret, apply, and transmit information.
• Students present, in written English language and from a variety of perspectives, their opinions and judgments on experiences, ideas, information, and issues.
• Students use written English for effective social communication with a wide variety of people.
• Students integrate conventions of English language grammar, usage, spelling, capitalization, and punctuation to communicate effectively about

199 APPENDIX A various topics. (Minor errors in spelling grammar or punctuation do not interfere with comprehension.) • Students self-monitor and edit their English language written work. • Students write literary, interpretive, and responsive essays for personal expression. Proficient Level: Listening • Students interpret important features of oral English language, at their grade level, relating to social academic topics and can discriminate between what is and what is not relevant. • Students distinguish, conceptually or linguistically, complex oral English lan- guage expected of their grade level of fluent and/or native English speakers. • Students comprehend grade-level English vocabulary, idioms, colloquial ex- pressions, and apply their prior knowledge to grasp complex ideas expressed in English. • Students listen to spoken English for a variety of purposes, including to acquire information and to take notes. Proficient Level: Speaking • Students select precise and descriptive grade-level vocabulary to participate actively in both social and academic English language settings. • Students make use of standard English to communicate their ideas ef- fectively in an organized and cohesive manner by adjusting to the social context to make themselves understood in English. • Students utilize a variety of oral standard English language resources to ana- lyze, solve problems, make decisions, and communicate shades of meaning in English. • Students use oral standard English language to acquire, interpret, apply, and transmit information. • Students present, in oral standard English language, their opinions and judgments on experiences, ideas, information, and issues. • Students use the English language for effective social communication in socially and culturally appropriate manners. Because there are two composite scores, the state has adopted a rule for deter - mining proficiency from the two composites. 
That is, the overall proficiency level is defined by the lower of the two proficiency level designations. For example, if a student scores in the advanced level for listening/speaking and the proficient level for reading/writing, the overall level is advanced (CTB/McGraw-Hill, 2006). Standard setting was based on the item mapping procedure (Mitzel, et al., 2001). The technical manual for 2006 indicates that detailed descriptions for each performance level exist, but they are not included in the manual.
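The rule just described, taking the lower of the two composite designations, can be sketched as follows. The level names and their ordering are inferred from the example in the text, and the function name is illustrative, not part of the test program.

```python
# Sketch of the rule for deriving an overall proficiency level from the two
# composite designations: the overall level is the LOWER of the two.
# Level names/ordering are inferred from the example in the text; the helper
# itself is illustrative.

LEVELS = ["beginning", "intermediate", "advanced", "proficient"]  # low -> high
RANK = {name: i for i, name in enumerate(LEVELS)}

def overall_level(listening_speaking: str, reading_writing: str) -> str:
    """Return the lower of the two composite proficiency designations."""
    return min(listening_speaking, reading_writing, key=RANK.get)

# Advanced for listening/speaking plus proficient for reading/writing
# yields the lower designation.
print(overall_level("advanced", "proficient"))  # -> advanced
```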

Reliability and Validity

Information about the technical qualities of the assessment is provided in the technical reports.13 Technical reports are prepared each year by the contractor, Pearson. The technical reports contain detailed information about test specifications, item and form development, item and form analysis, equating, and standard setting. They also contain results of analyses to evaluate reliability and validity, and they document efforts to evaluate fairness issues (e.g., bias review panels, analyses of differential item functioning). Reliability analyses include the standard types of analyses used for tests with multiple-choice items (i.e., estimates of internal consistency) as well as those used for open-ended items (i.e., interrater agreement), along with analyses of classification accuracy. Some validity evidence has been collected. Construct-related validity evidence was obtained by examining the intercorrelations between subtest scores and by conducting confirmatory factor analyses of the internal structure of the test. Evidence of criterion-related validity was collected by examining the degree of correspondence between students' performance on the NYSESLAT and performance on the state's English assessments: for the lower grades, the latter was the state's English language arts assessment used for NCLB accountability purposes; for the higher grades, it was the Regents English exam.

STANFORD ENGLISH LANGUAGE PROFICIENCY TEST

The Stanford English Language Proficiency Test (SELP) was developed by NCS Pearson (formerly Harcourt Assessment, Inc.). Pearson offers both an "off-the-shelf" version of the assessment and customized versions that are augmented to meet a particular state's needs. Often the customized versions have different names. For instance, Arizona's version of the SELP is called the Arizona English Language Learner Assessment (AZELLA), and Washington's version is called the Washington Language Proficiency Test-II (WLPT-II). Currently, these are the only two user states, although when our study began, New Mexico and Wyoming also used the assessment (Roger Frantz, manager with Pearson, personal communication, June 2009).14

Content Standards

The test framework was developed in 1997 through analyses of the standards in place for six states (California, Delaware, Georgia, Hawaii, Missouri, and Texas), in conjunction with the TESOL standards (Roger Frantz, manager with Pearson, personal communication, June 2009). Frantz indicated that when a state chooses to

13 The reports are available at http://www.emsc.nysed.gov/osa/reports/ [December 2010].
14 Basic information about the assessment is available through Pearson at http://www.pearsonassessments.com/haiweb/cultures/en-us/productdetail.htm?pid=015-8429-206 [December 2010]. Information is also available at the websites for user states: for Washington, at http://www.k12.wa.us/assessment/wlptii/default.aspx [December 2010]; for Arizona, at http://www.ade.state.az.us/oelas/AZELLA/AZELLAAZ-1TechnicalManual.pdf [December 2010].

use the SELP, an alignment study is conducted to determine the extent to which the assessment is aligned with the state's ELP content standards. The test is then customized or augmented to ensure that the items cover the state standards.

Grade Bands

The SELP provides versions for six grade bands: Pre-K, K-1, 1-2, 3-5, 6-8, and 9-12. However, the grade bands can be customized for a state. For instance, Washington uses versions for four grade bands, K-2, 3-5, 6-8, and 9-12, and Arizona uses versions for five grade bands, K, 1-2, 3-5, 6-8, and 9-12.

Item Types

The SELP consists of five subtests: (1) listening, (2) reading, (3) writing, (4) writing conventions, and (5) speaking. The listening and reading subtests use multiple-choice items. The speaking subtest uses constructed-response items (described by the developer as "performance-based"). The writing conventions subtest uses multiple-choice items to measure the mechanics of writing. The writing subtest uses extended-answer constructed-response items.

Scores Reported

The off-the-shelf version of SELP offers scores for listening, speaking, and reading. Writing is a composite of the writing and writing conventions subtests. Five other composite scores are available: (1) productive skills (speaking and writing); (2) comprehension skills (listening and reading); (3) oral skills (listening and speaking); (4) academic skills (reading, writing, and writing conventions); and (5) an overall composite score. Washington reports individual domain scores (listening, speaking, reading, writing) and an overall composite score. Arizona reports the four domain scores and three composites (comprehension, oral, and overall composite).

Performance Levels

The off-the-shelf version of the SELP has set five performance level descriptions, but states are free to determine their own levels (Roger Frantz, manager with Pearson, personal communication, June 2009). The off-the-shelf version uses the following performance levels: pre-emergent, emergent, basic, intermediate, and proficient. The recommended cut scores for these levels were set by the publisher using the modified Angoff procedure (Angoff, 1984; see also Stephenson, 2003). For states that use a customized version, separate standard setting is done, and performance levels are adapted to the state's needs.

Arizona uses the performance level names established for the off-the-shelf version (Porta and Vega, 2007, p. 137). The cutoff scores for the performance levels

were determined through a standard setting based on the modified-Angoff procedure (Angoff, 1984; Reckase, 2000, as cited in Harcourt, 2007). Performance-level descriptions were developed for each domain area and for each grade band: that is, there are 20 sets of descriptions for the five performance levels. No overall performance-level descriptions appear to be available. As a sample of the performance level descriptions used by Arizona, below are the descriptions for the composite score in comprehension (reading and listening) for the middle elementary grades (3-5) (Harcourt, 2007).

Pre-Emergent: This student made very few or no responses. This student has very little ability to understand spoken English and understands only a few isolated words. This student understands almost no written English or only a few isolated words. This student may be able to understand visual universal symbols and graphics associated with a text.

Emergent: This student is able to comprehend a few key words, phrases, and short sentences in simple conversations on topics of immediate personal relevance when spoken slowly with frequent repetitions and contextual clues. This student is able to understand a few common high-frequency sight words and simple sentences in English. This student is able to comprehend a few simple content-area words with the aid of picture cues. This student is able to indicate the meaning of some common signs, graphics, and symbols.

Basic: This student is able to comprehend and follow three- to four-step oral directions related to the position of one's movements in space. This student can comprehend a few content-area words, including grade-level math and science vocabulary. This student is able to understand a few words that indicate mathematics operations. This student is able to comprehend some simple grade-level math word problems. This student comprehends and follows up to five-step written directions for classroom activities.

Intermediate: This student is able to comprehend and follow three- to four-step oral directions related to the position, frequency, and duration of one's movements in space. This student can comprehend some content-area words, including grade-level math and science vocabulary. This student is able to understand some words that indicate mathematics operations. Occasionally, this student is able to comprehend grade-level math word problems. This student comprehends and follows a short set of written instructions on routine procedures.

Proficient: This student comprehends and follows multiple-step oral instructions (four or more steps) for familiar processes or procedures. This student can comprehend many content-area words, including grade-level math and science vocabulary. This student is able to understand many words that indicate

mathematics operations. Sometimes this student comprehends grade-level math word problems. This student comprehends and follows a set of written multi-step instructions on routine procedures.

Washington uses four performance levels: beginning/advanced beginning, intermediate, advanced, and transitional. Students must reach the transitional level to be considered for reclassification (Kimberly Hayes, WLPT-II memo, Office of Superintendent of Public Instruction, available: http://www.k12.wa.us/assessment/wlptii/pubdocs/WLPTMemoUpdated2010.pdf). Performance level descriptions are not provided in the technical manual but were obtained through the state Title III director (Helen Malagon, personal correspondence, September 2010):

Beginning/Advanced Beginning: Has little or no English reading skills with some understanding of content-area vocabulary and concepts. Writes simple English words, patterned phrases, and simple sentences. Communicates with words, sentences, drawings, gestures, and actions.

Intermediate: Comprehends short connected texts with context clues. Writes simple sentences or repetitive language. Participates in social discussions on unfamiliar topics. Begins to self-correct speech.

Advanced: Reads both short and long connected texts with understanding. Writes simple essays with standard conventions, organization, and detail. Uses figurative and idiomatic language in discussions of academic content and ideas.

Transitional: Reads and writes at grade level. Uses grammatically correct English with native-like proficiency.

Details about Washington's standard-setting methods are not described in the technical manual.

Reliability and Validity

Information about the technical qualities of the SELP assessment is provided in technical reports, some of which are available through state websites and some through the publisher.15 The technical report for the WLPT-II was obtained from the state Title III director (Pearson Education, 2010). Technical reports do not appear to be prepared each year for each state. An updated version of the technical report for the off-the-shelf version was still under preparation for the 2009 administration year. The version of Arizona's technical manual that the panel obtained was a summary of technical information for the 2006 administration year. The version of Washington's technical manual that we reviewed was for the 2008-2009 testing year. The technical reports contain detailed information about test specifications, item and form development, item and form analysis, equating, and standard setting. They also contain results of analyses to evaluate reliability and validity and document efforts to evaluate fairness issues. For the SELP, reliability analyses include the standard types of analyses used for tests with multiple-choice items (i.e., estimates of internal consistency) as well as those used for open-ended items (i.e., interrater agreement). Studies of classification accuracy are also reported in the technical manuals for the two user states (Arizona and Washington). A number of validity studies have been conducted for both states. Evidence of content-related validity is based on studies of the alignment between the test items and the content standards. Evidence of construct-related validity is based on examination of the intercorrelations among the subtests, point-biserial correlations, and principal components factor analyses of the internal structure. No evidence of criterion-related validity is reported for either state, although the report for Arizona indicates that such studies were planned for the 2007 testing cycle. Studies of fairness/bias appear to be based on bias reviews conducted as items were developed and test forms assembled. Results from analyses of differential item functioning are reported in the technical manual for Washington but not for Arizona.

15 For instance, we obtained a technical report for the off-the-shelf version of SELP through the publisher, and we obtained the technical report for AZELLA at http://www.ade.state.az.us/oelas/AZELLA/AZELLAAZ-1TechnicalManual.pdf [December 2010]. We also obtained a technical report for New Mexico, for the 2007-2008 school year, at http://www.ped.state.nm.us/AssessmentAccountability/procurementLib3.html [December 2010]. We do not provide details about the New Mexico test because the state discontinued its contract with Pearson in 2009 and began using the ACCESS.
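The reliability analyses cited throughout these reviews rest on two standard computations: an internal-consistency coefficient (commonly Cronbach's alpha) for the multiple-choice sections, and a rater-agreement rate for the open-ended items. A minimal sketch, with purely illustrative data and function names that are not drawn from any of the test programs:

```python
# Two standard reliability computations referred to in these reviews.
# Neither is specific to any one test program; data and names are illustrative.

def cronbach_alpha(item_scores):
    """Internal consistency. item_scores: one row per examinee,
    one column per item (e.g., 0/1 scores for multiple-choice items)."""
    n_items = len(item_scores[0])

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_vars = [variance([row[i] for row in item_scores]) for i in range(n_items)]
    total_var = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

def exact_agreement(rater_a, rater_b):
    """Interrater agreement: proportion of responses assigned
    the same score by two raters."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Perfectly parallel items give alpha = 1.0; raters agreeing on 3 of 4
# scores give an exact-agreement rate of 0.75.
print(cronbach_alpha([[1, 1], [0, 0], [1, 1], [0, 0]]))
print(exact_agreement([4, 3, 2, 1], [4, 3, 2, 2]))  # -> 0.75
```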
TEXAS ENGLISH LANGUAGE PROFICIENCY ASSESSMENT SYSTEM

In response to state legislation passed in 1995, the Texas Education Agency (TEA), along with the testing contractor, Beck Evaluation and Testing Associates, developed the Reading Proficiency Tests in English (RPTE), which were implemented during the 1999-2000 school year for ELL students in grades 3 through 12. These were the first state-administered reading tests of ELP in the Texas assessment program. In response to federal requirements for assessing additional grades and language domains, additional assessments of English language proficiency were implemented during the 2003-2004 school year. At that time, the Texas English Language Proficiency Assessment System (TELPAS) was created, and RPTE was retained as the reading component of TELPAS for ELL students in grades 3-12. Holistically rated assessments were developed for the domain of reading in K-2 and for listening, speaking, and writing in K-12. Changes were made to the RPTE during the 2007-2008 school year, and the name RPTE was discontinued. The current version of the test is an online assessment. Technical information is available in the technical digest published by the Texas Education Agency (2009c).16

16 Technical Digest 2008-2009, Chapter 7, TELPAS, pp. 165-167; it is available at http://www.tea.state.tx.us/index3.aspx?id=2147484418&menu [December 2010].

Content Standards

The RPTE was originally intended to align with the state's previous assessment program, the Texas Assessment of Academic Skills. Beginning in spring 2004, the RPTE was augmented in order to align it with the reading selections and test questions of another assessment, the Texas Assessment of Knowledge and Skills. In 2008, a new edition of RPTE was developed to align with the state's revised ELP standards, at which point a number of test modifications were made:

• The TELPAS subcomponent name RPTE was discontinued.
• A grade 2 test was added, resulting in the discontinuation of the previously administered holistically rated grade 2 TELPAS reading assessment.
• The grade clustering of the middle and high school tests changed from grades 6-8 and 9-12 to grades 6-7, 8-9, and 10-12.
• More reading selections and test questions were added to assess English language reading proficiency in mathematics and science contexts.
• The test blueprints were modified to include more reading material at the highest ELP level.
• The tests were developed as online assessments.

TELPAS is intended to measure learning in alignment with the Texas English Language Proficiency Standards, which are a component of the Texas Essential Knowledge and Skills curriculum. The standards outline the instruction that ELL students must receive to support their ability to develop academic ELP and acquire challenging academic knowledge and skills.

Grade Bands

The TELPAS reading tests have versions for the following grade bands: 2, 3, 4-5, 6-7, 8-9, and 10-12. The holistically rated components are grade specific.

Item Types

TELPAS includes holistically rated, performance-based components to assess skills in some of the domains. For grades K-1, these assessments are used in all domains: listening, speaking, reading, and writing. For grades 2-12, they are used to assess all domains except reading, which is assessed through multiple-choice items. The holistic assessments are conducted by teachers in the classroom. The teachers are trained to collect information on their own students and to evaluate on the basis of their interactions with and observations of students. Writing in grades 2-12 is assessed through a collection of students' classroom writing assignments. Teachers must undergo training to learn how to conduct the ratings and must meet qualification standards. The rating rubrics are the proficiency-level descriptors, which are defined in the Texas ELP standards and which teachers are required to use in ongoing

instruction to develop students' English language proficiency and make grade-level instruction linguistically accessible.

Scores Reported

Scores are reported for each domain: listening, speaking, reading, and writing. Two composite scores are also reported. One is a comprehension score, derived from performance on the reading and listening subtests. An overall composite score and rating are also reported. In computing this composite score, listening and speaking are each weighted by 5 percent, writing is weighted by 15 percent, and reading is weighted by 75 percent. According to the technical manual (Texas Education Agency, 2009c), listening and speaking receive less weight so that students do not attain a high composite proficiency rating before they acquire the English reading and writing proficiency needed to support their full potential for academic success.

Performance Levels

TELPAS scores are reported according to four performance levels: beginning, intermediate, advanced, and advanced high. Performance-level descriptions are available for each domain and for the overall score. The global descriptors appear below:

Beginning: Beginning students have little or no ability to understand and use English. They may know a little English but not enough to function meaningfully in social or academic settings.

Intermediate: Intermediate students have some ability to understand and use English. They can function in social and academic settings as long as the tasks require them to understand and use simple language structures and high-frequency vocabulary in routine contexts.

Advanced: Advanced students are able to engage in age-appropriate academic instruction in English, although ongoing second language support is needed to help them understand and use grade-appropriate language. These students function beyond the level of simple, routinely used English.

Advanced High: Advanced high students have attained the command of English that enables them, with minimal second language acquisition support, to engage in regular, all-English, academic instruction at their grade level.

To be considered proficient in English on the TELPAS, students must score at the "advanced high" level.
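The weighting scheme described under Scores Reported can be sketched directly. The weights (5, 5, 15, and 75 percent) come from the technical manual as described above; the input scale scores and the function name are illustrative.

```python
# Sketch of the TELPAS overall composite weighting described above:
# listening and speaking count 5 percent each, writing 15 percent,
# and reading 75 percent. Input scores and names are illustrative.

WEIGHTS_PERCENT = {"listening": 5, "speaking": 5, "writing": 15, "reading": 75}

def telpas_composite(scores: dict) -> float:
    """Weighted overall composite computed from the four domain scores."""
    return sum(WEIGHTS_PERCENT[d] * scores[d] for d in WEIGHTS_PERCENT) / 100

# Reading dominates the composite: a student who is strong orally but weak
# in reading cannot attain a high overall rating.
print(telpas_composite({"listening": 800, "speaking": 800,
                        "writing": 600, "reading": 500}))  # -> 545.0
```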

Reliability and Validity

Information about the technical qualities of the assessment is provided in annually published technical digests.17 The technical digests are prepared for each administration cycle by TEA, in conjunction with Pearson, the state's testing contractor. The version used for the panel's review was for the 2008-2009 school year (Texas Education Agency, 2009c). This digest contains detailed information about test specifications, item and form development, item and form analysis, and statistical procedures for equating of the reading test. The holistically rated assessments are not statistically equated; instead, the difficulty is maintained through the use of consistent rating rubrics developed to define the proficiency levels and through consistent training and qualifying procedures for the raters. Details about standard setting appear in the report for the 2007-2008 school year.

The technical report contains results of analyses to evaluate reliability and validity. Reliability analyses include the standard types of analyses used for tests with multiple-choice items (i.e., estimates of internal consistency) as well as those used for open-ended items (i.e., interrater agreement). Estimates of classification accuracy are also provided (e.g., accuracy of student classifications into performance categories). Some validity evidence has been collected. Content-related validity evidence consists primarily of expert review of the extent to which the items conform to the item specifications and the performance-level descriptions. The TEA indicates that construct-related validity evidence is provided through estimation of internal consistency reliability for the multiple-choice components and the training and administration procedures for the holistically rated components. Evidence of criterion-related validity was collected by examining the degree of correspondence between performance on the TELPAS reading component and performance on the state's reading assessment (the Texas Assessment of Knowledge and Skills, TAKS). For the study, the average TAKS reading score was calculated for students at each grade level and at each performance level (for example, the mean TAKS score for 3rd graders classified on the TELPAS as beginning, intermediate, advanced, or advanced high, and so on for each grade). Rating audits of the other language domains are conducted to provide evidence that the internal structure of the assessments is intact and that teachers administer the holistically rated assessments and apply the rating rubrics as intended. No information is provided about attempts to evaluate the assessment for fairness or bias.

17 The digests are available at http://www.tea.state.tx.us/index3.aspx?id=2147484418&menu_id=793 [December 2010].
