Awarding or Withholding High School Diplomas
Certification exams based on externally developed standards of performance are routinely administered in the United States to prospective nurses, doctors, pilots, plumbers, and insurance adjusters. It is no wonder, then, that the idea of requiring students to pass a test before graduating from high school has great appeal. In the 1970s, several states implemented minimum competency testing as a partial requirement for high school graduation. A single test, consisting of multiple-choice items, was thought to measure accurately whether students had mastered the basic skills that should be required of a high school graduate.
Florida was one of the first states to develop a minimum competency graduation test and was also one of the first to have to defend its testing program in court. As described in Chapter 3, in Debra P. v. Turlington (1981), a U.S. court of appeals ruled that (1) students have a legally recognized property interest in receiving a high school diploma; (2) the graduation test must be a fair measure of what students have been taught; and (3) students must have adequate advance notice of the high-stakes test requirement.
The current emphasis on standards-based educational reform is shifting the nature of assessments. Instead of focusing on multiple-choice measures of minimum competencies, assessments are emphasizing more challenging tasks that are aligned with demanding content standards (American Federation of Teachers, 1997). Although many states have
begun to look critically at the level of skills assessed in their high school graduation tests, they face dilemmas in trying to raise standards on these tests (Bond and King, 1995).
For example, states wanting to test complex skills in their graduation exams face the challenge of ensuring that all schools are teaching those skills. Bond and King describe this as a catch-22. Before using a graduation test for high-stakes purposes (awarding or denying a diploma), a state must ensure that curriculum and instruction are aligned with what the test measures. Some proponents of reform, however, see the test as a tool for inducing changes in the content and methods of teaching. "Clearly, a test cannot both lead the curriculum and reflect the curriculum at the same time," Bond and King conclude (1995:3). One possible way around this dilemma is for test users to plan a gap of several years between the introduction of new tests and the attachment of high stakes to individual student performance, during which time schools may achieve the necessary alignment between tests, curriculum, and instruction.
In addition, lower-level skills are easier to test, whereas more advanced skills are not as well defined, and ways to assess them are not well established. Moreover, there is evidence that "any of the high standards that are now being touted … would fail an unacceptably large fraction of the students" if used for making high-stakes decisions such as awarding or withholding high school diplomas (Linn, 1998a:3). Thus, as states move toward assessment of more rigorous standards, there are numerous challenges in the context of graduation testing that remain to be worked out.
Current Graduation Testing Practices
In most states, students earn high school diplomas by accumulating Carnegie units, which are based on the number of hours spent in class. The system also ensures that students have passed certain courses, but this is an imprecise and nonuniform measure of what students actually have learned. Many states are therefore requiring that students also pass one or more competency exams in order to graduate (Mehrens, 1993). According to the most recent Council of Chief State School Officers survey (1998), 18 states have high school exit exams.1 In fact, graduation
tests are the most popular type of individual accountability mechanism aimed at students.2
Bond and King (1995) provide a summary of state high school graduation testing practices as of 1995. State programs typically assessed 10th and 11th grade students, with some states starting as early as 6th grade. In most states, the student was allowed an unlimited number of chances to retake the exam, even several years after completing high school course work. All states assessed reading and math; the next most frequently assessed subject was writing. Every state used a multiple-choice test, often in combination with a writing sample, and all but one state used criterion-referenced tests.3 Of the 18 states that currently have high school exit exams, 9 use tests that could be considered to measure minimum competency in that they are based on 9th grade or lower standards.4
Two models are commonly used to combine data from multiple requirements and assessments: conjunctive and compensatory. A conjunctive model requires adequate performance on each measure, whereas a compensatory model allows performance on one measure to offset, or compensate for, substandard performance on another. Phillips (1991) points out that test-based graduation decisions typically follow a conjunctive model—students do not receive diplomas until they complete all required course work satisfactorily and pass the test(s). So although graduation decisions do not rest on test scores alone, passing the test is still a necessary condition of earning a diploma.
Some critics argue that the model chosen for these decisions should be compensatory rather than conjunctive (e.g., Mehrens, 1986). In a compensatory model, students with low test scores would be able to earn a diploma if they met or exceeded other requirements, such as getting good to excellent grades in required course work. Such an approach is
more compatible with current professional testing standards, which state that "in elementary or secondary education, a decision or characterization that will have a major impact on a test taker should not automatically be made on the basis of a single test score. Other relevant information … should also be taken into account" (American Educational Research Association et al., 1985:54, Standard 8.12).
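To make the distinction between the two models concrete, the decision rules can be sketched as follows. This is an illustrative sketch only; the thresholds, weights, and score scales are hypothetical and do not reflect any actual state's policy.

```python
# Illustrative sketch (not any state's actual policy) of conjunctive
# versus compensatory rules for a diploma decision.
# All thresholds, weights, and scales below are hypothetical.

def conjunctive(test_score, gpa, test_cut=70, gpa_cut=2.0):
    """Pass only if EVERY requirement is met independently."""
    return test_score >= test_cut and gpa >= gpa_cut

def compensatory(test_score, gpa, weight=0.5, composite_cut=0.6):
    """Combine measures so strength on one can offset weakness on another."""
    # Normalize each measure to a 0-1 scale before weighting
    # (assuming a 100-point test and a 4.0 grade scale).
    t = test_score / 100.0
    g = gpa / 4.0
    composite = weight * t + (1 - weight) * g
    return composite >= composite_cut

# A student with a low test score but strong grades:
student = {"test_score": 65, "gpa": 3.8}
print(conjunctive(**student))    # False: test score below the cut
print(compensatory(**student))   # True: strong grades offset the test
```

Under the conjunctive rule this student is denied a diploma regardless of grades; under the compensatory rule, the strong course work can carry the decision, which is the outcome critics such as Mehrens argue for.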
Logic of Certification Decisions
High school graduation decisions are inherently certification decisions: the diploma certifies that the student has attained an acceptable level of learning. A test is one of many types of evidence that may be used in certification; to be valid, the test must be an accurate measure of the student's degree of mastery of the relevant knowledge and skills (American Educational Research Association et al., 1985: Standard 8.7; 1998).
As discussed in Chapters 5 and 6, the psychometric requirements for tests used for placement and certification decisions overlap considerably, but not completely. For example, the setting of cutoff scores is important to both types of decisions. Tests used for certification, however, have some distinct psychometric requirements. In particular, there is a greater need for evidence that the test's content represents the student's school experience—because a high school graduation test is presumably a measure of achievement rather than readiness (Green, 1973).
The most important assumptions underlying the use of a test for certification decisions are (1) that the test taps the knowledge, skills, or other attributes it is interpreted to measure and (2) that the cutoff score is an accurate discriminator of mastery or nonmastery in the domain. These assumptions and the evidence required to support them are addressed in more detail below. Other assumptions may arguably be plausible on their face or may already be supported by evidence provided by the test developer or in the testing literature (e.g., that the test scores are sufficiently reliable). A persuasive argument for test use will not require that every assumption be documented empirically, but that the assembled evidence be sufficient to make a credible case that the use of the test for a particular certification decision is appropriate—that is, both valid and fair (see Chapter 4 for an explanation of the kinds of evidence pertinent to this judgment).
Validation of Test Use
Because not everything can be measured in a graduation test, the choice of constructs is critical, as is setting the boundaries of the content domain to be included in the test. Should a graduation test cover only reading and math, or should it also include writing, social studies, and a foreign language? Depending on the answer to this question, the content and cognitive demands of the test will differ, as will the consequences of using the test score to make graduation decisions. For example, as mentioned earlier, graduation rates for males and females, as well as for English-language learners, would be quite different if the test emphasized reading and writing as opposed to science and mathematics (Willingham and Cole, 1997).
Should a graduation test be a measure of basic skills, or should it attempt to assess higher-level skills and knowledge? As discussed earlier, several states that employ graduation tests are moving away from minimum competency tests that measure basic skills in a few subject areas, toward tests that measure higher-level skills in several subjects and are aligned with more demanding content standards (American Federation of Teachers, 1997). This move toward more demanding standards for certifying students is significant, not only in terms of what passing the graduation test would represent, but also in terms of the likely effects on graduation rates. For example, since revising their graduation tests to align them with rigorous state-level content standards, both Texas and Florida have experienced increased failure rates among their minority student populations (National Coalition of Advocates for Students, 1998; G.I. Forum, 1997). It is unclear at present, however, to what extent these increases are due simply to higher standards. There is also the possibility that the test is not yet representative of what students have actually been taught. This issue is discussed more fully in the section below on test fairness.
Whichever type of graduation test is employed, there are some steps that may be taken—before, during, or after test development—to help ensure the appropriateness of the content and skills measured by the test. For example, the domain to be assessed on the graduation test should be carefully defined and widely publicized. The test content should be representative of what students have actually been taught. Test items should
also be reviewed to ensure that they do not tap irrelevant skills or knowledge that could confound the test score interpretation.5
Of particular concern is the extent to which construct underrepresentation or construct-irrelevant components may give an unfair advantage or disadvantage to one or more subgroups of examinees. For example, a mathematics test that uses unnecessarily complex vocabulary may disadvantage some English-language learners, leading to lower passing rates that do not constitute a valid measure of these students' mathematics knowledge. Similarly, not providing certain testing accommodations (such as large-print versions) may make it difficult for some students with disabilities to adequately demonstrate what they know and can do.
These are complex issues, which we discuss further in Chapters 8 and 9. Careful review of the construct and content domain by a diverse panel of experts may point to some potential sources of irrelevant difficulty (or easiness) on a test that require further investigation. Taken together, these steps should ensure that evidence of the appropriateness of the content and cognitive processes measured by the graduation test, for individuals and groups, is gathered as part of the validation process (American Educational Research Association et al., 1985, 1998).
Setting Cutoff Scores
Chapter 4 describes different procedures for setting cutoff scores on tests as well as some of the difficulties involved. As with tests used for tracking or promotion decisions, the validity of the cutscore(s) on graduation tests depends on the reasonableness of the standard-setting process and of its outcome and consequences—especially if these differ by gender, race, or language-minority group. Several psychometric standards are relevant here (e.g., American Educational Research Association et al., 1985: Standards 1.24, 2.8 to 2.10, 2.12, 6.9 and 10.9).
Although there are right and wrong ways of setting cutscores, there is no single right answer to the question of where the cutscore should be set on a graduation test—or any other test with high stakes for students. This is partly because of the conceptual problems involved in interpreting
the cutscore. As Jaeger has acknowledged: "If competence is a continuous variable, there is clearly no point on the continuum that would separate students into the competent and the incompetent" (1989:492).
This lack of certainty has several implications—not only for the need to document and evaluate the standard-setting process but also for the need to examine the consequence of choosing a particular cutscore. The most obvious consequence is the pass rate on the test.
Three variables interact to influence pass rates on any test: the positioning of the cutscore, the complexity of the content domain, and the difficulty of the items chosen to measure this content. For example, if a low cutscore is combined with easy items measuring simple knowledge, pass rates on the test will probably be high. If any of these variables change, that is, if the cutscore is raised and/or the content becomes more complex, and/or the difficulty of the items is increased, pass rates will decrease, at least in the short term. These issues are important to keep in mind at all stages of test development, but particularly when deciding on the stakes that will be attached to passing or failing the graduation test.
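The interaction of these three variables can be illustrated with a simple simulation. The model below is deliberately crude and entirely hypothetical: student ability, item difficulty, and the response model are invented for illustration and are not calibrated to any real testing program.

```python
# Hypothetical simulation of how cutscore and item difficulty jointly
# drive pass rates; all numbers are illustrative, not drawn from any test.
import random

random.seed(0)

def simulate_pass_rate(cutscore, item_difficulty, n_students=10000, n_items=50):
    """Each simulated student answers each item correctly with a probability
    that falls as item_difficulty rises (a crude, hypothetical model)."""
    passes = 0
    for _ in range(n_students):
        ability = random.gauss(0.7, 0.15)          # hypothetical ability scale
        p_correct = max(0.0, min(1.0, ability - item_difficulty))
        score = sum(random.random() < p_correct for _ in range(n_items))
        if score >= cutscore:
            passes += 1
    return passes / n_students

# Low cutscore with easy items versus a higher cutscore with harder items:
easy = simulate_pass_rate(cutscore=25, item_difficulty=0.1)
hard = simulate_pass_rate(cutscore=35, item_difficulty=0.2)
print(f"low cut, easy items:  {easy:.0%}")
print(f"high cut, hard items: {hard:.0%}")
```

Raising either the cutscore or the item difficulty alone lowers the simulated pass rate; raising both at once, as states moving to more demanding standards effectively do, compounds the effect.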
General concern over the lack of a "right answer" in setting cutscores is reflected in the recommendation that, when feasible, multiple standard-setting methods should be used, and that all the results should be considered together when determining a final cutscore on the test (Hambleton, 1980; Shepard, 1984). Concern is also reflected in current psychometric standards, which recommend that a decision that will have a major impact on a test taker should not be made solely or automatically on the basis of a single test score, and that other relevant information about the student's knowledge and skills should also be taken into account (American Educational Research Association et al., 1985: Standard 8.12; 1998). This concern affects not only students who fail the test while performing well on other measures, but also those who pass the test, if their poor performance on other measures suggests the need for extra instruction or other interventions (Madaus, 1983).
New challenges in the setting of cutoff scores for graduation tests have been raised by the proliferation of open-ended items, extended-response items, and performance assessments (which are sometimes mixed with multiple-choice items to increase domain coverage). The minimum competency tests of the past contained mostly multiple-choice items. And there was much debate over the best method for setting standards on these tests, even though the properties of multiple-choice items were well
understood. In comparison, systems for determining reliable cutscores for the newer forms of assessments are less well established (Burstein et al., 1996; Linn, 1998b; National Academy of Education, 1996).6
As for any test used for making high-stakes decisions, the test developer or user needs to provide evidence that the test score is sufficiently reliable, and that the standard error surrounding scores on the test is sufficiently small for the proposed interpretation and use (American Educational Research Association et al., 1985: Standard 2.1). This is particularly important for student scores that fall near the cutscore, because even a one-point difference can lead to the denial of a high school diploma. For this reason, as noted above, experts have recommended that, when feasible, multiple standard-setting methods be used and that all the results be considered together when determining a final cutscore on the graduation test (Hambleton, 1980; Shepard, 1984).
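The role of the standard error of measurement (SEM) near the cutscore can be shown with a short numerical sketch. The reliability, standard deviation, and cutscore below are hypothetical values chosen for illustration, not figures from any actual graduation test.

```python
# Illustrative only: how the standard error of measurement (SEM) affects
# decisions near a cutscore. Reliability, SD, and cutscore are hypothetical.
import math

def sem(sd, reliability):
    """Classical SEM: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed, sd, reliability, z=1.96):
    """Approximate 95% band around an observed score."""
    e = sem(sd, reliability)
    return observed - z * e, observed + z * e

# A student one point below a hypothetical cutscore of 70, on a test
# with SD = 10 and reliability = 0.90:
cutscore = 70
low, high = score_band(observed=69, sd=10, reliability=0.90)
print(f"95% band: {low:.1f} to {high:.1f}")
print(high >= cutscore)   # True: the band straddles the cutscore
```

Even with a fairly reliable test, the band around a score of 69 comfortably spans the cutscore of 70, which is why a failing score one point below the cut cannot, by itself, rule out mastery.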
Another issue is whether more than one cutscore should be set. Some researchers have concluded that it may be preferable to have more than two options (pass and fail) on graduation tests, either for political reasons (to prevent backlash against high failure rates) or to increase student motivation (by allowing students the opportunity to earn different types of diplomas) (Bishop, 1997a; Costrell, 1994; Kang, 1985). Some states are already doing this: students in Michigan, New York, and Tennessee may receive either a regular or a special "endorsed" diploma, depending on their test performance (Council of Chief State School Officers, 1998). This emerging practice is discussed below as a possible alternative to simply awarding or withholding diplomas.
Making Predictive Inferences
Tests used for granting or denying high school diplomas are often treated as if they were pure certification tests, but some people interpret them as also having an implicit predictive purpose. That is, a low score on a graduation test may be thought to reflect not only lack of mastery of
what a student has been taught but also a future inability to function successfully in society (Madaus, 1983; Tenopyr, 1977). This was a problem in the Debra P. case, in which students who failed a graduation test were labeled "functionally illiterate"—the long-term consequences of which could be as severe as diploma denial itself. A similar issue has arisen in a Texas lawsuit concerning use of a 10th grade test, the Texas Assessment of Academic Skills (TAAS), to make decisions regarding high school graduation.7 The plaintiffs contend that awarding or denying a high school diploma on the basis of a student's score on this test cannot be justified because there is insufficient evidence that TAAS scores predict future success in school or at work (G.I. Forum, 1997). The case has not yet gone to trial, and the validity of this claim has yet to be determined.
Gathering evidence to demonstrate the predictive power of a graduation test would be difficult. In most instances, there is no single criterion or outcome variable that could be applied to all students. Schools might gather multiple sources of information, such as the number of students in college or employed two years after graduating, as a general indication of how students are functioning, but this information might be difficult to obtain as time passed.
Schools that give graduation tests early (sometimes as early as 8th or 9th grade), however, assume that such tests are diagnostic and that students who fail can benefit from effective remedial instruction. For example, in the Charlotte-Mecklenburg school district in North Carolina, students who fail the graduation test (first taken in 8th grade) get remedial assistance. Using these test results to place a pupil in a remedial class or other intervention also involves a prediction about the student's performance—that is, that as a result of the placement, the student's mastery of the knowledge and skills measured by the test will improve. Thus, evidence that a particular treatment (in this case, the remedial program) benefits students who fail the test would be appropriate as part of the test validation process.
Consequences of Graduation Test Use
Test validation includes collecting evidence on the intended and unintended consequences of test use. Determining whether the use of a test for making graduation decisions produces better overall educational outcomes requires that the various intended benefits of test use be weighed against unintended negative consequences for individual students and different kinds of students (American Educational Research Association et al., 1985: Standard 6.5; Joint Committee on Testing Practices, 1988; Messick, 1989). The committee recognizes, however, that decisions about graduation will be made with or without information from standardized tests; the costs and benefits of using test scores to make these decisions should therefore be balanced against the costs and benefits of making the same decisions using other kinds of information.
There is very little research that specifically addresses the consequences of graduation testing. In the absence of substantial empirical evidence, proponents and opponents of minimum competency graduation tests have argued over the probable consequences. Reardon reports that "while proponents of the tests have generally argued that such requirements provide incentives for students and schools, particularly those at the low end of the achievement spectrum, to improve their performance, opponents have argued that the tests lead to a low level basic skills curriculum and increase dropout rates by discouraging students who fail the tests from continuing in school" (1996:1).
Catterall adds, "initial boasts and doubts alike regarding the effects of gatekeeping competency testing have met with a paucity of follow-up research" (1990:1). This is clearly an important area for future research, to which test users should pay particular attention in validating their testing programs, as well as an issue for policymakers when they are considering whether to administer high-stakes graduation tests. The studies reported below generally address the impact of minimum competency graduation testing. The consequences of emerging graduation assessments based on higher standards may well be different. Research is needed to explore the different effects of these two types of programs.
Impact on Instruction
Many proponents and opponents of graduation testing agree that high-stakes minimum competency tests have a substantial impact on instruction and curriculum. They disagree, however, on whether the
impact is, on balance, beneficial or detrimental to learning. Much of the literature on high-stakes minimum competency testing (not limited to graduation testing) argues that preparing students for high-stakes tests often results in drill-and-practice teaching methods that fail to develop higher levels of thinking (Darling-Hammond and Wise, 1985; Madaus and Kellaghan, 1991; O'Day and Smith, 1993). A comparison of low- and high-stakes state testing programs found that, as the stakes of testing increase, "there is a point at which district strategies take on the flavor of a single-minded devotion to specific, almost 'game-like' ways to increase the test scores" (Wilson and Corbett, 1991:36).
Other researchers have concluded that the increased emphasis on basic skills may be educationally sound if accompanied by intelligent professional development of teachers that promotes active learning among students (Berger and Elson, 1996). The same study also found that, when schools use tests that carry high stakes for students, teachers are likelier to report a clear understanding of their school's mission but a diminished sense of their own professional autonomy.
Finally, some researchers hold that minimum competency graduation testing has little or no impact on instruction and learning. For instance, some findings suggest that the teachers' selection of topics for instruction does not seem to be influenced by minimum competency testing (Kuhs et al., 1985; Porter et al., 1988). Catterall (1990) interviewed students in four states with high school exit exams and found that half of the students at all performance levels were not even aware of the test, even though the majority had already taken it. Educators explained that the issues surrounding graduation testing had subsided markedly over the years. The numbers now being denied diplomas based on low minimum competency test scores ranged from negligible to none. Catterall concludes: "if a graduation test is ever to contribute to student performance through motivational or diagnostic mechanisms, it might be advantageous for students to know about the test, its use, and its meaning. Large shares of students at all performance levels are not aware of exit testing policies in their own schools, which raises doubts about any such educational contributions" (1990:7).
The findings described thus far have focused on the impact of minimum competency exams. It will become increasingly important to study the impact of emerging graduation assessments based on high standards, because the effects on instruction and learning are likely to be different. In one of the few such studies on this topic, Bishop (1997a) compared the
Third International Mathematics and Science Study (TIMSS) test scores of countries with and without rigorous graduation tests. He found that countries with demanding exit exams outperformed other countries at a comparable level of development. He concluded, however, that such exams were probably not the most important determinant of achievement levels and that more research was needed.
Impact on Dropout Rates
Although the causal connections are unclear, much of the existing research shows that the use of high-stakes tests is associated with higher dropout rates. Kreitzer et al. (1989) compared the testing activities in the 10 states with the highest dropout rates and the 10 states with the lowest dropout rates. They found that 9 of the 10 states with the highest dropout rates had high-stakes graduation tests, and none of the states with low dropout rates used tests for high-stakes purposes.
Using data from the National Educational Longitudinal Study (NELS), Reardon found that high-stakes 8th grade tests were associated with sharply higher dropout rates—6 to 8 percentage points higher just between the 8th and 10th grades. Reardon also found that the schools most likely to have high-stakes testing policies were those with high concentrations of students of low socioeconomic status (SES). His analysis suggests that "it is the concentrated poverty of these schools and their communities, and their concomitant lack of resources, that link [high-stakes testing] policies to higher dropout rates, rather than other risk factors such as student grades, age, attendance, and minority group membership" (1996:5).
Reardon and other researchers acknowledge that these studies do not provide clear evidence of causality. The question therefore remains: Do high-stakes tests cause students to drop out, or do high dropout rates spur policymakers to adopt high-stakes testing programs in the first place? We do not know the answer, though Kreitzer et al. conclude that high-stakes graduation tests may give at-risk students "an extra push out the school door" (1989:146).
In an effort to collect more direct evidence of the relationship between minimum competency graduation tests and students' decisions about dropping out, Catterall (1990) conducted interviews with educators, administrators, and high school students. He found that school administrators tend to believe that high school competency tests are so
easy that they pose no real threat to graduation, and that most students consider graduation tests beneficial. Still, students who fail the test at least once are considerably more likely than those who pass to report that they may drop out of school, even after a number of other academic variables are controlled. His results suggest that graduation tests pose no threat to most students, but, among those who fail them, they increase a sense of discouragement and contribute to the likelihood of dropping out. A limitation of this study is that it reports on students' beliefs about the likelihood of their dropping out in the future, but provides no data on whether they actually do drop out.
Cawthorne (1990) interviewed students at two schools in Boston to find out why many good students, who "were doing everything their schools asked of them," had failed Boston's newly implemented graduation test. He found that many of them were minority and/or bilingual students who had received good grades. Although all the students interviewed could read, some said either that they "did not test well" or that they read English too slowly to be able to finish the test, even though they could do well in school by working long hours on their assignments. Many of the students with good grades who failed the graduation test reported that they would not have returned to school for another year simply to pass the test requirement. The need to do so was eliminated, however, as Boston rescinded its graduation test policy that year.
A more recent study suggests that quite different subgroups are most strongly affected by failing a high-stakes test. Griffin and Heidorn (1996) report that failing a minimum competency graduation test significantly increased the likelihood that students would leave school, but only for students who were doing well academically. Students with poorer academic records did not appear to be affected by failing the test, and minority students who failed the test did not demonstrate an increased likelihood of leaving school as a result. These researchers speculate that the perceived stigma attached to test failure may cause students with higher grades to experience a substantial drop in self-esteem or a sense of embarrassment before their peers. This study suggests that such experiences might be especially acute for students with records of academic success.
These studies are not inconsistent. Some groups, such as low-SES children, blacks, Hispanics, and English-language learners, are more likely than other students to attend the schools in which high-stakes tests are given, and they are therefore likelier to be subject to high-stakes test policies and their consequences (Reardon, 1996). The same groups are
also more likely to attend schools that do not provide high-quality curriculum and instruction. It is thus not surprising that low-SES and minority students tend to fail high-stakes graduation tests at higher rates than do high-SES and white students (Eckland, 1980).8 What is less clear is whether high-stakes graduation tests lead over time to improved curriculum, instruction, and student performance, which is one of the stated purposes of such tests. Nor is it clear why students with high grades would react more strongly than other students to failing a high-stakes graduation test. These findings and unresolved issues underscore the need for further empirical research in this area.
Societal Effects of Not Earning a High School Diploma
Very little is known about the specific consequences of passing or failing a high school graduation exam, but a good deal is known about whether and how earning a high school diploma affects a student's future life chances. Jaeger (1989) asserts that having a high school diploma, as distinct from having the skills assessed by a minimum competency test, largely determines whether a young person can obtain employment and earn money, as well as the amount of money a person can earn. He bases this conclusion on evidence suggesting that performance on a minimum competency test is not a good predictor of whether a young person will obtain employment or earn a good salary, provided the person receives a high school diploma (Eckland, 1980). Statistics show that in 1997 the unemployment rate of 25- to 34-year-old men who lacked a diploma was more than twice that of men who had diplomas. At the same ages, unemployment was three times higher among women who had dropped out of high school than among graduates (National Center for Education Statistics, 1998: Supplemental Table 31-1).
Hauser (1997) provides evidence that the failure to complete high school, whether due to graduation tests or other reasons, is increasingly associated with problems in employment, earnings, family formation and stability, civic participation, and health. For instance, in the last two decades, employment has been very high and stable among male college graduates; it has declined, however, among high school graduates and, to
an even greater extent, among dropouts. Furthermore, the earning power of high school dropouts has fallen relative to that of high school graduates. Over the last two decades, the earnings of white male dropouts declined from 85 percent to less than 75 percent of the earnings of white high school graduates; among black and Hispanic men, there appears to have been a similar decline. Electoral participation by high school dropouts is also lower than among high school graduates. Based on a large collection of such evidence, Hauser concludes: "Failure to obtain at least a high school diploma looks more and more like the contemporary equivalent of functional illiteracy. High school dropout indicates a failure to pass minimum thresholds of economic, social or political motivation, access and competence" (1997:154).
Issues of Fairness
The core meaning of fairness in test use concerns comparable validity. Thus a fair graduation test is one that yields comparably valid scores from person to person, from group to group, and from setting to setting.
There are several ways to assess the comparability, or fairness, of scores. Test items can be checked (using judgmental and statistical methods) to ensure that they are not biased in favor of any particular group. The testing process itself can also be assessed in terms of the extent to which students are given a comparable opportunity to demonstrate their knowledge of the construct(s) the test is intended to measure. For example, all students should have adequate notice of the skills and content to be tested, as well as access to appropriate test preparation materials, and they should be tested under equivalent conditions. Students who are at risk of failing a graduation test should be advised of their situation well in advance and provided with appropriate instruction that would improve their chances of passing. In addition, students who fail a graduation test should be given multiple opportunities to demonstrate their capabilities through repeated testing with alternate forms, or through other construct-equivalent means. The validity and fairness of score interpretations on a graduation test will be enhanced by taking into account other relevant information about individual students (American Educational Research Association et al., 1985: Standards 8.4, 8.5, 8.7, 8.8; 1998).
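The statistical screening of test items mentioned above can be illustrated with a toy sketch. The function, data, and the simple conditional comparison below are illustrative assumptions, not a description of any operational program; real testing programs use formal procedures such as the Mantel-Haenszel statistic. The underlying idea is the same, though: compare item performance across groups among examinees of comparable overall ability, so that overall ability differences are not mistaken for item bias.

```python
from collections import defaultdict

def conditional_pass_rates(records, item):
    """records: list of dicts with 'group', 'total_score', and 0/1 item keys.
    Returns {total_score: {group: pass_rate}} -- pass rates conditioned on
    total score, so groups are compared at the same overall ability level."""
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [correct, n]
    for r in records:
        cell = tallies[r["total_score"]][r["group"]]
        cell[0] += r[item]
        cell[1] += 1
    return {score: {g: correct / n for g, (correct, n) in groups.items()}
            for score, groups in tallies.items()}

# Toy data: four examinees with the same total score.
records = [
    {"group": "A", "total_score": 10, "item1": 1},
    {"group": "A", "total_score": 10, "item1": 1},
    {"group": "B", "total_score": 10, "item1": 0},
    {"group": "B", "total_score": 10, "item1": 1},
]
rates = conditional_pass_rates(records, "item1")
# At the same total score, group A passes item1 at 1.0 and group B at 0.5 --
# a gap that would flag the item for judgmental review.
```

With realistic sample sizes, such a gap would be tested for statistical significance before an item was flagged; flagged items then go to the judgmental review the text describes.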
When assessing the fairness of a graduation test, it is particularly important that test users ask whether certain groups of students are being denied diplomas unfairly due to insufficient opportunities to learn the
material tested. Graduation tests should provide evidence of mastery or nonmastery of material taught. Thus there is a need for evidence that the content of the test is representative of what students have been taught.
Measuring What Students Have Been Taught
Not surprisingly, a great deal of research has shown that students learn best what they are taught (Porter, 1998). Thus, an important determinant of the fairness of a graduation test is the degree to which curriculum and instruction are aligned with what the test measures. Debra P. v. Turlington (1981), a circuit court decision that has influenced courts and policymakers in other parts of the United States, established the principle that a high-stakes graduation test should be a fair measure of what students have been taught. The court ordered a four-year phase-in period for Florida's graduation test, partly to provide time to bring the test, the curriculum, and instruction into alignment.
Test users can demonstrate the necessary alignment by comparing evidence on test content, curricular coverage, and instructional preparation. Curricular coverage refers to how well test items represent the objectives of the curriculum. Instructional preparation is an appraisal of the extent to which schools equip students with the knowledge and skills that the test measures. McLaughlin and Shepard (1995) state that the effort to assess alignment should (1) focus on the elements of schooling that are directly related to student achievement, (2) focus on the curriculum as enacted rather than as reported or listed in official documents, and (3) identify indicators that can be tried and then evaluated for adequacy. They also recommend that a test not be used to make high-stakes decisions about individual students until test users can show that the content of the test is representative of what the students have actually been taught.
Popham and Lindheim (1981) describe two possible approaches for measuring curricular coverage and instructional preparation. One is to analyze textbooks, syllabi, lesson plans, and other materials to determine the degree to which the planned instruction covers the content of the assessment. The second method is to observe actual classrooms. Madaus (1983) also offers steps that states can take to ensure fairness on graduation tests.
It is neither straightforward nor inexpensive to measure the content of actual instruction (Popham and Lindheim, 1981). As a result, there is little evidence to suggest that exit exams in current use have been validated properly against the defined curriculum and actual instruction; rather, it appears that many states have not taken adequate steps to validate their assessment instruments, and that proper studies would reveal important weaknesses (Stake, 1998).
Finally, today's professional standards for content in such core academic subjects as mathematics and science are much more demanding than the minimum competency standards of the 1970s and 1980s (e.g., National Council of Teachers of Mathematics, 1989; National Research Council, 1996). If a high school graduation test needed to be aligned with these more ambitious content standards and with actual instruction, the task would be more difficult today than it was in 1981, when Debra P. was decided. Thus, states and school districts face challenges in demonstrating that the content of a high-standards, high-stakes graduation test is representative of what students have been taught (McLaughlin and Shepard, 1995).
Diploma Denial: Alternative and Complementary Strategies
As noted above, current professional testing standards state that "in elementary or secondary education, a decision or characterization that will have a major impact on a test taker should not automatically be made on the basis of a single test score. Other relevant information … should also be taken into account" (American Educational Research Association et al., 1985:54, Standard 8.12).
With or without tests, states and school districts must make decisions about which students receive high school diplomas. The decision to award or withhold a high school diploma plainly has a major impact on a young person's future life chances (Hauser, 1997; Jaeger, 1989), however,
and there are alternative ways of making such decisions that do not rely on test scores alone. These include the use of compensatory models for making diploma decisions, the use of differentiated diplomas, and reliance on end-of-course examinations in making high school graduation decisions. In addition, many states, including those with high-stakes graduation tests, have adopted strategies in which students who are at risk of failing a graduation test are advised of their situation well in advance and provided with instruction to improve their chances of passing. For some of these approaches, which are described in more detail below, American experience is limited and research is needed to explore their effectiveness. For instance, we do not know how best to combine advance notice of high-stakes test requirements, remedial intervention, and opportunity to retake graduation tests. Research is also needed to explore the effects of different kinds of high school credentials on employment and other post-school outcomes.
As discussed earlier, states that use high-stakes graduation tests typically require students to complete all their course work satisfactorily and to pass the graduation test(s) (Phillips, 1991; Council of Chief State School Officers, 1998). This is a conjunctive model. An alternative, compensatory model (Mehrens, 1986) would allow a student's strong performance on one indicator, such as course work, to offset or compensate for low performance on another, such as the graduation exam. This strategy has its drawbacks, particularly where policymakers and citizens question what good grades actually signify. At the same time, combined with other strategies for promoting high standards in classroom instruction, this model could help satisfy demands that diplomas be based on tangible evidence of achievement while respecting standards of good test practice.
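The difference between the two decision rules can be made concrete with a small sketch. The cutoffs, weights, and scales below are hypothetical illustrations, not values drawn from any state's policy.

```python
def conjunctive_decision(course_gpa, exam_score,
                         gpa_cutoff=2.0, exam_cutoff=70):
    """Conjunctive model: the diploma requires passing BOTH
    components independently."""
    return course_gpa >= gpa_cutoff and exam_score >= exam_cutoff

def compensatory_decision(course_gpa, exam_score,
                          gpa_weight=0.5, exam_weight=0.5,
                          composite_cutoff=0.6):
    """Compensatory model: strong performance on one indicator can
    offset weakness on the other. Both indicators are rescaled to 0-1
    and combined into a weighted composite."""
    gpa_part = min(course_gpa / 4.0, 1.0)     # GPA on a 4-point scale
    exam_part = min(exam_score / 100.0, 1.0)  # exam as a percentage
    composite = gpa_weight * gpa_part + exam_weight * exam_part
    return composite >= composite_cutoff

# A student with strong course work but a weak exam score:
student = {"course_gpa": 3.6, "exam_score": 55}
print(conjunctive_decision(**student))   # False: exam below cutoff
print(compensatory_decision(**student))  # True: high GPA offsets the exam
```

The sketch makes the policy trade-off visible: under the conjunctive rule this student is denied a diploma on the basis of the single exam score, while under the compensatory rule sustained classroom performance carries independent weight.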
Some states offer advanced or endorsed diplomas for students who pass a high-stakes graduation test. The test may thus provide increased incentives for teachers and students. At the same time, students who pass their courses can still graduate with a traditional diploma even if they do not pass the test. This is similar to how high-stakes tests are used in many
other countries, including Britain and France, to indicate students' qualifications as they depart secondary school (Broadfoot, 1996). Using tests to indicate qualifications allows high stakes to be attached to tests without the punishing effects of barring graduation to those who fail. On one hand, this approach has potential drawbacks; teacher and student motivation may not be as high if students can graduate without passing the test. On the other hand, this strategy provides some incentives to students and teachers, and it allows states to develop assessments based on higher standards, even if the majority of students do not initially meet them.
Connecticut recently implemented a grade 10 assessment that attaches a certificate of mastery to students' high school diplomas if they meet the state goals. Students have the option of retaking the assessment in grades 11 and 12 to earn the certificate. In the first year, only 11 percent of students statewide earned the certificate in all four subjects; that percentage is climbing gradually, however.
Teacher focus groups conducted after the first two years of the program suggested that the assessment has caused some narrowing of the curriculum. Teachers also reported some positive effects, however, including placing more emphasis on higher-level thinking skills in their instruction, having students write more, and emphasizing problem-solving and open-ended science labs that reflect the activities on the assessment. Teachers also reported having more reason and opportunity to talk to each other about instruction, informally as well as in the context of local and state-sponsored workshops. The study concluded that, although an endorsed diploma has the potential to motivate students, the ways in which the special certificate will serve students in the future must be clear and tangible. Teachers reported widespread ambivalence among students because the state's colleges and businesses had not yet taken a clear position on the use of the certificate in admissions and hiring decisions (Chudowsky and Behuniak, 1999).
Some states award advanced or endorsed diplomas on the basis of end-of-course exams. There are few data on the effectiveness of these programs, but in general they appear promising and warrant further investigation as a possible alternative basis for graduation decisions.
One of the best-known programs of this kind is New York's Regents
examination system. Bishop (1997b) found that, when student demographics were held constant, New York did significantly better than other states on the SAT and the NAEP math assessments without experiencing a reduction in high school graduation rates. Bishop attributes these results to the presence of the rigorous Regents examinations that many New York students take prior to graduation.
Virginia is in the process of implementing an end-of-course graduation exam system. Starting in 2004, 12th graders will be required to pass a series of tests—based on Virginia's Standards of Learning—to earn a standard diploma. Students will be able to earn an advanced studies diploma if they pass additional end-of-course tests. This program also has an accountability component: by 2007, if fewer than 70 percent of the students in a given school pass the exams, the school could lose its state accreditation. The Standards of Learning, approved in 1995, have won praise from many national experts for their content and rigor.
Virginia's case is an example of standards being introduced well in advance of the high-stakes assessment, providing adequate notice and time to bring tests, curriculum, and instruction into alignment. The large majority of the state's 135 districts have already begun incorporating the standards into their English, math, science, and social studies curricula (Education Week, 1998).
Early Intervention and Remedial Instruction
As noted earlier, the rates at which students fail minimum competency graduation exams have declined over the years, in part because states and school districts administer the test early (often in the 10th grade or earlier), provide multiple opportunities for students to retake the test(s) they have failed, and offer remedial education aimed at helping students learn what they need to know to satisfy requirements for graduation. This is sound educational practice; students who fail a high-stakes test should have the opportunity to retake the test, and students who are at risk of failing a graduation test should be apprised of their situation well
in advance and provided with effective instruction that will improve their chances of passing. It is also sound practice legally, for effective remedial education is one way of helping to ensure that students have been taught the kinds of knowledge and skills that the graduation test measures (Debra P. v. Turlington, 1981). In the committee's judgment, when tests are used to make decisions about graduation, states and school districts should implement programs of early intervention and effective remedial assistance.
This strategy is appealing, and low failure rates on minimum competency graduation tests could mean that the strategy is effective. At the same time, solid evaluation research on the most effective remedial approaches is sparse. Indeed, there are concerns that some existing remedial programs may offer only intense drill and practice, so that they treat the symptom (low test scores) without affecting the underlying condition (low achievement) (Office of Technology Assessment, 1992). There is plainly a need for good research on effective remedial education.
Lack of funding is also a problem for some of these programs and may jeopardize their long-term viability. Only 7 of the 18 states with graduation exams in 1994 earmarked funds to either schools or districts expressly for remedial education (Bond and King, 1995). Many remedial programs therefore rely on alternative sources of funding. Effective remedial education is expensive; whether most states and school districts have the resources needed to provide high-quality remedial instruction for students who have failed high-stakes graduation tests is not known.
The committee's findings and recommendations about awarding or withholding high school diplomas are reported in Chapter 12.
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education 1985 Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.
1998 Draft Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.
American Federation of Teachers 1997 Making Standards Matter 1997: An Annual Fifty-State Report on Efforts to Raise Academic Standards. Washington, DC: American Federation of Teachers.
Berger, Noah, and Hilari Hanamaikai Elson 1996 What Happens When MCTs Are Used as an Accountability Device: Effects on Teacher Autonomy, Cooperation and School Mission. Paper presented at the annual meeting of the American Educational Research Association, April, New York.
Bishop, John H. 1997a The effect of national standards and curriculum-based exams on achievement. American Economic Review 87(2):260–264.
1997b Do Curriculum-based External Exit Exam Systems Enhance Student Achievement? New York: Consortium for Policy Research in Education and Center for Advanced Human Resource Studies, Cornell University.
Bond, Linda A., and Diane King 1995 State High School Graduation Testing: Status and Recommendations. Oak Brook, IL: North Central Regional Educational Laboratory.
Broadfoot, Patricia M. 1996 Education, Assessment and Society. Buckingham, England: Open University Press.
Burstein, Leigh, Daniel Koretz, Robert Linn, Brenda Sugrue, John Novak, Eva L. Baker, and Elizabeth Lewis Harris 1996 Describing performance standards: Validity of the 1992 National Assessment of Educational Progress achievement level descriptors as characterizations of mathematics performance. Educational Assessment 3(1):9–51.
Catterall, James S. 1990 A Reform Cooled-Out: Competency Tests Required for High School Graduation. CSE Technical Report 320. UCLA Center for Research on Evaluation, Standards, and Student Assessment.
Cawthorne, John E. 1990 "Tough" Graduation Standards and "Good" Kids. Chestnut Hill, MA: Boston College, Center for the Study of Testing, Evaluation and Educational Policy.
Chudowsky, Naomi, and Peter Behuniak 1999 Using focus groups to examine the consequences of large-scale assessments. Educational Measurement: Issues and Practice.
Costrell, Robert 1994 A simple model of educational standards. The American Economic Review 84(4):956–971.
Council of Chief State School Officers 1998 Trends in State Student Assessment Programs. Washington, DC: Council of Chief State School Officers.
Darling-Hammond, L., and A. Wise 1985 Beyond standardization: State standards and school improvement. Elementary School Journal 85(3).
Eckland, Bruce K. 1980 Sociodemographic implications of minimum competency testing. In Minimum Competency Achievement Testing: Motives, Models, Measures, and Consequences, Richard M. Jaeger and Carol K. Tittle, eds. Berkeley, CA: McCutchan Publishing.
Education Week 1998 Quality Counts. An Education Week/Pew Charitable Trusts report.
G.I. Forum and Image de Tejas 1997 Brief for lawsuit brought against the Texas Education Agency, Dr. Mike Moses, and members of the Texas State Board of Education regarding the Texas Assessment of Academic Skills.
Green, R. 1973 The Aptitude Achievement Distinction. Proceedings of Second CTB/McGraw-Hill Conference in Educational Measurement. Monterey, CA: CTB/McGraw-Hill.
Griffin, Bryan W., and Mark H. Heidorn 1996 An examination of the relationship between minimum competency test performance and dropping out of high school. Educational Evaluation and Policy Analysis 18(3):243–252.
Hambleton, R.K. 1980 Test score validity and standard-setting methods. In Criterion-Referenced Measurement: The State of the Art, R.A. Berk, ed. Baltimore, MD: Johns Hopkins University Press.
Hauser, Robert M. 1997 Indicators of high school completion and dropout. In Indicators of Children's Well-Being, Robert M. Hauser, Brett V. Brown, and William R. Prosser, eds. New York: Russell Sage Foundation.
Jaeger, Richard M. 1989 Certification of student competence. In Educational Measurement, 3rd Ed., R. Linn, ed. New York: Macmillan.
Joint Committee on Testing Practices 1988 Code of Fair Testing Practices in Education. Washington, DC: Joint Committee on Testing Practices, American Psychological Association.
Kang, Suk 1985 A formal model of school reward systems. In Incentives, Learning, and Employability, John Bishop, ed. Columbus, OH: National Center for Research in Vocational Education.
Kreitzer, A.E., G.F. Madaus, and W. Haney 1989 Competency testing and dropouts. Pp. 129–152 in Dropouts from School: Issues, Dilemmas and Solutions, L. Weis, E. Farrar, and H.G. Petrie, eds. Albany: State University of New York Press.
Kuhs, T., A. Porter, R. Floden, D. Freeman, W. Schmidt and J. Schwille 1985 Differences among teachers in their use of curriculum-embedded tests. Elementary School Journal 86(2):141–153.
Linn, Robert 1998a Assessments and Accountability. Paper presented at the annual meeting of the American Educational Research Association, April, San Diego.
1998b Validating inferences from National Assessment of Educational Progress achievement-level setting. Applied Measurement in Education 11(1):23–47.
Madaus, George, ed. 1983 The Courts, Validity, and Minimum Competency Testing. Hingham, MA: Kluwer-Nijhoff Publishing.
Madaus, G.F., and T. Kellaghan 1991 Examination Systems in the European Community: Implications for a National Examination System in the U.S. Paper prepared for the Science, Education and Transportation Program, Office of Technology Assessment, U.S. Congress, Washington, DC.
McLaughlin, M.W., and L.A. Shepard 1995 Improving Education Through Standards-Based Reform. A report by the National Academy of Education Panel on Standards-Based Education Reform. Stanford, CA: National Academy of Education.
Mehrens, William A. 1986 Measurement specialists: Motive to achieve or motive to avoid failure? Educational Measurement: Issues and Practice 5(4):5–10.
1993 Issues and Recommendations Regarding Implementation of High School Graduation Tests. North Central Regional Educational Laboratory.
Messick, Samuel 1989 Validity. In Educational Measurement, 3rd Ed., Robert L. Linn, ed. New York: Macmillan.
National Academy of Education 1996 Quality and Utility: The 1994 Trial State Assessment in Reading, Robert Glaser, Robert Linn, and George Bohrnstedt, eds. Panel on the Evaluation of the NAEP Trial State Assessment. Stanford, CA: National Academy of Education.
National Center for Education Statistics 1998 The Condition of Education, 1998. NCES 98-013. Washington, DC: U.S. Government Printing Office.
National Coalition of Advocates for Students 1998 A Gathering Storm: How Palm Beach County Schools Fail Poor and Minority Children. A report by the National Coalition of Advocates for Students. Boston: Shea Brothers.
National Council of Teachers of Mathematics 1989 Curriculum and Evaluation Standards for School Mathematics. Reston, VA: National Council of Teachers of Mathematics.
National Research Council 1996 National Science Education Standards. Washington, DC: National Academy Press.
O'Day, Jennifer A., and Marshall S. Smith 1993 Systemic reform and educational opportunity. In Designing Coherent Educational Policy, Susan H. Fuhrman, ed. San Francisco: Jossey-Bass.
Office of Technology Assessment 1992 Testing in American Schools: Asking the Right Questions. OTA-SET-519. Washington, DC: U.S. Government Printing Office.
Phillips, S.E. 1991 Diploma sanction tests revisited: New problems from old solutions. Journal of Law and Education 20(2):175–199.
Popham, W. James, and Elaine Lindheim 1981 Implications of a landmark ruling on Florida's minimum competency test. Phi Delta Kappan 63(1):18–20.
Porter, A.C. 1998 The effects of upgrading policies on high school mathematics and science. In Brookings Papers on Education Policy, D. Ravitch, ed. Washington, DC: Brookings Institution Press.
Porter, A., R. Floden, D. Freeman, W. Schmidt, and J. Schwille 1988 Content determinants in elementary school mathematics. Pp. 96–113 in Perspectives on Research on Effective Mathematics Teaching, D.A. Grouws and T.J. Cooney, eds. Hillsdale, NJ: Lawrence Erlbaum Associates.
Reardon, Sean F. 1996 Eighth Grade Minimum Competency Testing and Early High School Dropout Patterns. Paper presented at the annual meeting of the American Educational Research Association, New York, April.
Shepard, Lorrie A. 1984 Setting performance standards. In A Guide to Criterion-Referenced Test Construction, R. A. Berk, ed. Baltimore, MD: Johns Hopkins University Press.
Stake, R. 1998 Some comments on assessment in U.S. education. Education Policy Analysis Archives (on-line serial) 6(14). Available: http://epaa.asu.edu/epaa/v6n14.html.
Tenopyr, M. 1977 Content-construction confusion. Personnel Psychology 30:47–54.
Willingham, W.W., and N.S. Cole 1997 Gender Bias and Fair Assessment. Hillsdale, NJ: Erlbaum.
Wilson, Bruce L., and H. Dickson Corbett 1991 Two State Minimum Competency Testing Programs and Their Effects on Curriculum and Instruction. Philadelphia, PA: Research for Better Schools.
Debra P. v. Turlington, 474 F. Supp. 244 (M.D. Fla. 1979); aff'd in part and rev'd in part, 644 F.2d 397 (5th Cir. 1981); rem'd, 564 F. Supp. 177 (M.D. Fla. 1983); aff'd, 730 F.2d 1405 (11th Cir. 1984).