Page 21

2

Current NAEP

This chapter begins with an overview of NAEP and highlights features of the current assessment program that bear on or may be affected by district-level and market-basket reporting practices. Later in the chapter, we address the issues and concerns about NAEP reports that prompted consideration of these two reporting methods.

OVERVIEW OF NAEP

As mandated by Congress in 1969, NAEP surveys the educational accomplishments of students in the United States. According to NAEP's sponsors, the program has two major goals: “to reflect current educational and assessment practices and to measure change reliably over time” (U.S. Department of Education, 1999:3). The assessment informs national- and state-level policy makers about student performance and thus plays an integral role in evaluations of the condition and progress of the nation's educational system.

In addition, NAEP has proven to be a unique source of background information that has both informed and guided educational policy. Currently, NAEP includes two distinct assessment programs with different instrumentation, sampling, administration, and reporting practices, referred to as long-term trend NAEP and main NAEP (U.S. Department of Education, 1999).



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




Components of NAEP

Long-term trend NAEP is a collection of test items in reading, writing, mathematics, and science that have been administered many times over the last three decades. As the name implies, trend NAEP is designed to document changes in academic performance over time. During the past decade, trend NAEP was administered in 1990, 1992, 1994, 1996, and 1999. Trend NAEP is administered to nationally representative samples of 9-, 13-, and 17-year-olds (U.S. Department of Education, 1999).

Main NAEP test items reflect current thinking about what students know and can do in the NAEP subject areas. They are based on recently developed content and skill outlines in reading, writing, mathematics, science, U.S. history, world history, geography, civics, the arts, and foreign languages. Main NAEP assessments use the latest advances in assessment methodology. Typically, two subjects are tested at each biennial administration.

Main NAEP has two components: national NAEP and state NAEP. National NAEP tests nationally representative samples of students in grades four, eight, and twelve. In most subjects, NAEP is administered two, three, or four times during a 12-year period, making it possible to track changes in performance over time.

State NAEP assessments are administered to representative samples of students in states that elect to participate. State NAEP uses the same large-scale assessment materials as national NAEP. It is administered to grades four and eight in reading, writing, mathematics, and science (although not always in both grades in each of these subjects).

ANALYTIC PROCEDURES

NAEP differs fundamentally from other testing programs in that its objective is to obtain accurate measures of academic achievement for groups of students rather than for individuals. This goal is achieved using innovative sampling, scaling, and analytic procedures.

Sampling of Students

NAEP tests a relatively small proportion of the student population of interest using probability sampling methods. The national samples for main NAEP are selected using stratified multistage sampling designs with three stages of selection: districts, schools, and students. The result is a sample of about 150,000 students from 2,000 schools. The sampling design for state NAEP has only two stages of selection, schools and students within schools, and samples approximately 3,000 students in 100 schools per state (roughly 100,000 students in 4,000 schools nationwide). The school and student sampling plan for trend NAEP is similar to the design for national NAEP. In 1996, between 3,500 and 5,500 students were tested in mathematics and science, and between 4,500 and 5,500 were tested in reading and writing (Campbell, Voelkl, & Donahue, 1997).

Sampling of Items

NAEP assesses a cross section of the content within a subject-matter area. Because of the large number of content areas and subareas within those content areas, NAEP uses a matrix sampling design to assess students in each subject. Under this design, blocks of items drawn from each content domain are administered to groups of students, making it possible to administer a large number and range of items while keeping individual testing time to one hour for all subjects. Consequently, students receive different but overlapping sets of NAEP items through a form of matrix subsampling known as balanced incomplete block spiraling. This design requires highly complex analyses and does not permit the performance of a particular student to be measured accurately. Therefore, NAEP reports only group-level results; individual results are not provided.

Analytic Procedures

Although individual results are not reported, it is possible to compute estimates of individuals' performance on the overall assessment using complex statistical procedures. The observed data reflect student performance on the particular NAEP blocks the student actually took. Because no individual takes all NAEP blocks, statistical estimation procedures must be used to derive estimates of individuals' proficiency on the full complement of skills and content covered by the assessment.
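To make the matrix-sampling idea concrete, the Python sketch below builds a toy balanced incomplete block design and spirals booklets across students. It is not NAEP's actual booklet design: the seven hypothetical item blocks, the three-block booklets, and the cyclic construction from the base booklet (0, 1, 3) are assumptions chosen because they show the defining property in a small space, namely that every pair of blocks appears together in exactly one booklet and every block appears in the same number of booklets.

```python
"""Toy balanced incomplete block (BIB) design with spiraled booklet assignment.

Illustrative only: the seven blocks, three-block booklets, and cyclic
construction are assumptions, not NAEP's actual booklet design.
"""
from collections import Counter
from itertools import combinations

NUM_BLOCKS = 7            # hypothetical number of item blocks
BASE_BOOKLET = (0, 1, 3)  # differences cover every nonzero residue mod 7 exactly once


def build_booklets(num_blocks=NUM_BLOCKS, base=BASE_BOOKLET):
    """Cyclically shift the base booklet, giving one booklet per shift."""
    return [tuple(sorted((b + shift) % num_blocks for b in base))
            for shift in range(num_blocks)]


def pair_counts(booklets):
    """Count how many booklets contain each unordered pair of blocks."""
    counts = Counter()
    for booklet in booklets:
        counts.update(combinations(booklet, 2))
    return counts


def spiral(students, booklets):
    """Hand out booklets in rotating order so each booklet reaches a similar number of students."""
    return {student: booklets[i % len(booklets)] for i, student in enumerate(students)}


if __name__ == "__main__":
    booklets = build_booklets()
    print(booklets)
    # Every pair of blocks shares exactly one booklet, so this prints {1}.
    print(set(pair_counts(booklets).values()))
    assignment = spiral([f"student_{i:02d}" for i in range(14)], booklets)
    print(Counter(assignment.values()))
```

With 14 hypothetical students, each of the seven booklets goes to exactly two students. No student sees all seven blocks, but because every pair of blocks shares a booklet, relationships between any two blocks can in principle still be estimated from the group.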

The procedure involves combining samples of values drawn from distributions of possible proficiency estimates for each student. These individual student distributions are estimated from students' responses to the test items and from background variables. The use of background variables in estimating proficiency is called conditioning. For each student, five values, called plausible values, are randomly drawn from the student's distribution of possible proficiency estimates. Five plausible values are drawn to reflect the uncertainty in a student's proficiency estimate, given the limited set of test questions administered to each student. The sampling from the student's distribution is an application of Rubin's (1987) multiple imputation method for handling missing data (the responses to items not presented to the student are treated as missing). In the NAEP context, this process is called plausible values methodology (National Research Council, 1999b).

The conditioning process derives performance distributions for each student using information about the performance of other students with similar background characteristics. That is, performance estimates are based on the assumption that a student's performance is likely to be similar to that of other students with similar backgrounds. Conditioning is performed differently for national and state NAEP. For national NAEP, it is based on the relationship between background variables and performance on test items for the national sample. For state NAEP, conditioning is based on the relationship between the background variables and item performance for each state; these relationships may not be the same for the different state samples. As a result, the estimated distributions of proficiency for two individuals with similar background characteristics and item responses may differ if the individuals are from different states.

REPORTING NAEP RESULTS

Statistics Reported

NAEP's current practice is to report student performance on the assessments using a scale that ranges from 0 to 500. Scale scores summarize performance in a given subject area for the nation as a whole, for individual states, and for subsets of the population based on demographic and background characteristics. Results are tabulated over time to provide trend information.
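The plausible-values steps described above can be sketched in Python under strong simplifying assumptions. Each student's distribution of possible proficiency estimates is taken as given and normal (in NAEP it comes from an item response model with conditioning on background variables, which is not reproduced here). Five values are drawn per student, the group mean is computed once per set of plausible values, and the five results are combined with Rubin's (1987) rules: the final estimate is the average of the five estimates, and its variance is the average sampling variance plus (1 + 1/M) times the variance between the estimates, where M is the number of plausible values. The helper names, the simple-random-sampling variance formula, and all numbers are hypothetical.

```python
"""Sketch of plausible-value draws and Rubin's (1987) combining rules.

Simplifying assumption: each student's posterior proficiency distribution is
known and normal with a given mean and SD. All numbers are hypothetical.
"""
import random
import statistics

NUM_PLAUSIBLE_VALUES = 5


def draw_plausible_values(post_mean, post_sd, m=NUM_PLAUSIBLE_VALUES, rng=random):
    """Draw m plausible values from one student's (assumed normal) posterior."""
    return [rng.gauss(post_mean, post_sd) for _ in range(m)]


def combine_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value results with Rubin's rules.

    estimates: the group statistic computed once per set of plausible values.
    sampling_variances: the sampling variance of that statistic for each set.
    Returns (point estimate, total variance).
    """
    m = len(estimates)
    point = statistics.mean(estimates)
    within = statistics.mean(sampling_variances)   # average sampling variance
    between = statistics.variance(estimates)       # spread across plausible values (n - 1 divisor)
    return point, within + (1 + 1 / m) * between


def group_mean_with_pvs(posteriors, m=NUM_PLAUSIBLE_VALUES, seed=0):
    """Estimate a group mean from (posterior mean, posterior SD) pairs."""
    rng = random.Random(seed)
    draws = [draw_plausible_values(mu, sd, m, rng) for mu, sd in posteriors]
    estimates, variances = [], []
    for k in range(m):
        column = [student[k] for student in draws]  # k-th plausible value of every student
        estimates.append(statistics.mean(column))
        # Assumption: simple-random-sample variance of a mean. NAEP's complex
        # sample design calls for a replication method such as the jackknife.
        variances.append(statistics.variance(column) / len(column))
    return combine_plausible_values(estimates, variances)


if __name__ == "__main__":
    # Hypothetical (posterior mean, posterior SD) pairs on a 0-500 scale.
    posteriors = [(215, 12), (238, 10), (251, 15), (262, 11), (229, 14), (244, 9)]
    estimate, variance = group_mean_with_pvs(posteriors)
    print(f"group mean {estimate:.1f}, standard error {variance ** 0.5:.2f}")
```

The between-estimate term is what carries the measurement uncertainty that comes from each student seeing only a few blocks; if every student's distribution collapsed to a single point, the five estimates would agree and only the sampling variance would remain.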

In addition, NAEP reports performance using performance standards, or achievement levels; the percentage of students at or above each achievement level is reported. The National Assessment Governing Board (NAGB) has established, by policy, definitions for three levels of student achievement: basic, proficient, and advanced (U.S. Department of Education, 1999). The achievement levels describe the range of performance NAGB believes should be demonstrated at each grade. NAGB's definitions for each level are as follows (U.S. Department of Education, 1999:29):

Basic: partial mastery of prerequisite knowledge and skills that are fundamental for proficient work at each grade.

Proficient: solid academic performance for each grade assessed. Students reaching this level have demonstrated competency over challenging subject matter, including subject-matter knowledge, application of such knowledge to real-world situations, and analytical skills appropriate to the subject matter.

Advanced: superior performance.

NAEP also collects a variety of demographic, background, and contextual information on students, teachers, and administrators. Student demographic information includes characteristics such as race/ethnicity, gender, and highest level of parental education. Contextual and environmental data provide information about students' course selection, homework habits, use of textbooks and computers, and communication with parents about schoolwork. Information obtained about teachers includes the training they received, the number of years they have taught, and the instructional practices they employ. Administrators also respond to questions about their schools, including the location and type of school, school enrollment numbers, and levels of parental involvement. NAEP summarizes achievement results by these various characteristics.

Types of Reports

NAEP produces a variety of reports, each targeted to a specific audience. According to the National Center for Education Statistics (NCES), targeting each report to a segment of the audience increases its impact and appeal (U.S. Department of Education, 1999). Table 2-1 lists the various types of NAEP reports along with the targeted audience and general purpose of each.

TABLE 2-1

| Type of Report | Targeted Audience | Purpose/Contents |
|---|---|---|
| NAEP Report Cards | Policy makers | Present results for all test takers and for various population groups. |
| Highlights Reports | Parents, school board members, general public | Answer frequently asked questions in a nontechnical manner. |
| Instructional Reports | Educators, school administrators, and subject-matter experts | Include much of the educational and instructional material available from the NAEP assessments. |
| State Reports | Policy makers, Department of Education officials, chief state school officers | Present results for all test takers and various population groups for each state. |
| Cross-State Data Compendia | Researchers and state testing directors | Serve as reference documents that accompany other reports and present state-by-state results for variables included in the state reports. |
| Trend Reports | [Not specified] | Describe patterns and changes in student achievement as measured by the long-term trend assessments. |
| Focused Reports | Educators, policy makers, psychometricians, and interested citizens | Explore in-depth questions with broad educational implications. |
| Summary Data Tables | [Not specified] | Present extensive tabular summaries based on background data from student, teacher, and school questionnaires. |
| Technical Reports | Educational researchers, psychometricians, and other technical audiences | Document details of the assessment, including sample design, instrument development, data collection process, and analytic procedures. |

Uses of NAEP Reports

The Committee on the Evaluation of National and State Assessments of Educational Progress conducted an analysis of the uses of the 1996 NAEP mathematics and science results. The analysis considered reports of NAEP results in the popular and professional press, NAEP publications, and various letters, memoranda, and other unpublished documents. The committee found that NAEP results were used to (National Research Council, 1999b:27):

1. describe the status of the educational system,
2. describe student performance by demographic group,
3. identify the knowledge and skills over which students have (or do not have) mastery,
4. support judgments about the adequacy of observed performance,
5. argue the success or failure of instructional content and strategies,
6. discuss relationships among achievement and school and family variables,
7. reinforce the call for high academic standards and educational reform, and
8. argue for system and school accountability.

These findings are similar to those cited by McDonnell (1994).

Redesigning NAEP Reports

The diverse audiences and uses for NAEP reports have long posed challenges for the assessment (e.g., Koretz and Deibert, 1995/1996). Concerns about appropriate uses and potential misinterpretations were heightened by the media's reporting on the results of the first Trial State Assessment (Jaeger, 1998). One of the most widespread interpretation problems was the media's translation of mean NAEP scores into state rankings. Many newspapers simply ranked states according to average scores, even though differences among state scores were not statistically reliable.

In addition, there have been misinterpretations associated with the reporting of achievement-level results. The method of reporting the percentage of students at or above each achievement level has been found to cause confusion (Hambleton & Slater, 1995). Because students at or above the advanced level are also at or above the basic and proficient levels, and students at or above proficient are also at or above basic, the percentages of students at or above the three levels can add up to more than 100 percent. This is confusing to users.
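Both interpretation problems described above can be illustrated with a short Python sketch using hypothetical numbers. The first function performs the arithmetic that achievement-level reporting leaves to the reader, converting cumulative “at or above” percentages into the percentage of students within each level; the second checks whether a difference between two state means is larger than sampling error would plausibly explain, using a two-sided z test that assumes independent samples. The function names, the 1.96 critical value, and the example figures are assumptions for illustration; NAEP's own comparison procedures may differ, for example by adjusting for multiple comparisons.

```python
"""Two interpretation aids for NAEP-style summary statistics (hypothetical numbers)."""
import math

LEVELS = ("basic", "proficient", "advanced")  # ordered from lowest to highest


def within_level_percentages(at_or_above):
    """Turn cumulative 'at or above' percentages into the percentage in each band."""
    cumulative = [at_or_above[level] for level in LEVELS]
    # A higher level can never contain more students than a lower one.
    if any(lower < higher for lower, higher in zip(cumulative, cumulative[1:])):
        raise ValueError("cumulative percentages must not increase with level")
    bands = {"below basic": 100 - cumulative[0]}
    for i, level in enumerate(LEVELS):
        next_up = cumulative[i + 1] if i + 1 < len(LEVELS) else 0
        bands[level] = cumulative[i] - next_up
    return bands


def difference_is_reliable(mean_a, se_a, mean_b, se_b, z_crit=1.96):
    """Two-sided z test at roughly the 5 percent level, assuming independent samples."""
    z = (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(z) > z_crit


if __name__ == "__main__":
    # Cumulative figures sum to 105 percent, but the bands sum to 100.
    print(within_level_percentages({"basic": 70, "proficient": 30, "advanced": 5}))
    print(difference_is_reliable(224, 1.2, 222, 1.4))  # z is about 1.08 -> False
    print(difference_is_reliable(230, 1.2, 222, 1.4))  # z is about 4.34 -> True
```

For example, cumulative figures of 70, 30, and 5 percent sum to 105 percent, yet they correspond to 30 percent below basic, 40 percent at basic, 25 percent at proficient, and 5 percent at advanced. Likewise, a 2-point gap between two hypothetical states with standard errors of 1.2 and 1.4 would not justify ranking one above the other.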
The mental arithmetic required to determine the percentage of students who scored at a specific achievement level is difficult for many users of NAEP data. Other studies have cited difficulties associated with interpreting standard errors, significance levels, and other statistical jargon included in NAEP reports (Jaeger, 1996; Hambleton & Slater, 1995).

NAEP's sponsors have sought ways to improve its reports. The 1996 redesign of NAEP described the concept of market-basket reporting as one means for making reports more meaningful and understandable (National Assessment Governing Board, 1996). The authors of the document reasoned that public release of the market basket of items would give users a concrete reference for the meaning of the scores. This method would also have the advantage of being more comfortable to users who are “familiar with only traditional test scores,” such as those reported as percents correct (Forsyth et al., 1996:6-26).

The most recent design plan, Design 2000-2010 (National Assessment Governing Board, 1999a), again addressed reporting issues. Its authors set forth the objective of defining the audience for NAEP reports. They distinguished among NAEP's audiences by pointing out that the primary audience is the U.S. public, while the primary users of its data have been national and state policy makers, educators, and researchers. The document stated (National Assessment Governing Board, 1999a:10):

[NAEP reports] should be written for the American public as the primary audience and should be understandable, free of jargon, easy to use and widely disseminated. National Assessment reports should be of high technical quality, with no erosion of reliability, validity, or accuracy.

The amount of detail in reporting should be varied. Comprehensive reports would be prepared to provide an in-depth look at a subject, using new adopted test framework, many students, many test questions, and ample background information. Results would be reported using achievement levels. Data also would be reported by sex, race-ethnicity, socio-economic status (SES), and for public and private schools.

Standard reports would provide overall results in a subject with achievement levels and average scores.
Data could be reported by sex, race/ethnicity, SES, and for public and private schools, but would not be broken down further. Special, focused assessments on timely topics also would be conducted, exploring a particular question or issue and possible limited to one or two grades.

SUMMARY AND RECOMMENDATIONS

NAEP serves a diverse audience with varied interests and needs. Communicating assessment results to such a broad audience presents unique challenges. The breadth of the audiences, combined with their differing needs and uses for the data, makes effective communication particularly difficult. The Committee on NAEP Reporting Practices views market-basket and district-level reporting as falling within the context of making NAEP results more useful and meaningful to a variety of audiences. These are important goals that deserve focused attention.

RECOMMENDATION 2-1: We support the efforts thus far on the design of NAEP reports and encourage NAEP's sponsors to continue to find ways to report NAEP results in ways that engage the public and enhance their understanding of student achievement in the United States.