The use of standardized tests in the college admissions process has grown steadily since they were developed early in the twentieth century.1 These tests grew out of a larger movement to use newly devised measures of mental ability to help address a variety of emerging social problems: the early development of standardized ability testing was characterized in a 1982 National Research Council report as both "a search for order in a nation undergoing rapid industrialization and urbanization, and a search for ability in the sprawling, heterogeneous society that emerged from these processes" (National Research Council, 1982:81). Standardized tests were first used for selection for the civil service and other employment, but their potential value in education was quickly apparent. As student populations grew in the early years of the century, both secondary school and college officials sought means of introducing order in a haphazard system. Colleges were developing increasingly diverse requirements, and secondary schools were providing increasingly diverse preparation. To address this situation, the College Board, formed in 1900, developed a set of essay examinations to assess the preparation, in various subjects, of secondary students from schools around the country.2
Continuing population growth and demand for college access soon placed further pressures on the system. The view of college as a privilege for the relative few was giving way to a conception that society needed more educated workers and that a college degree benefited individuals in increasingly practical ways. After World War I, many colleges for the first time received applications from more students than they could accommodate and were forced to select among them. The colleges saw a need for a means of identifying students who were capable of college work, not only those who had completed familiar college preparatory programs
(National Research Council, 1982:92). Advances in ability testing promised to make that identification possible, and the College Board sponsored the development of the first multiple-choice-format Scholastic Aptitude Test (SAT), which was administered in 1926.3 By the start of World War II, the SAT was a well-established part of the admissions process, and as the century ends it is taken by millions of students every year. Its success inspired the development of similar tests for admission to graduate and professional schools and, by the late 1950s, of a competing undergraduate admissions test, the American College Test (ACT).
On average, a college degree offers significant economic and social benefits, and the proportion of high school graduates seeking these benefits has been growing steadily during the twentieth century (National Center for Education Statistics, 1998a:1–2). There are many kinds of institutions with many different missions, and students with a range of strengths and purposes seek college educations. College degrees are not all equally easy to obtain, nor do they offer equal benefits. Demand for places at some institutions, particularly the most prestigious ones, exceeds supply. However, 93 percent of qualified applicants to 4-year institutions are accepted by at least one, and 84 percent enroll (National Center for Education Statistics, 1998b:6).4 The problems lie primarily not with the possibility of going to any college, but with gaining admission to the competitive ones. At best, the sorting process matches students with colleges in ways that will benefit both; at worst, it perpetuates deep inequalities in American society. The question of fairness has consequently been a perennial part of the discussion of college admissions in the United States.
Today, approximately 90 percent of 4-year public and private institutions require applicants to submit admissions test scores (Breland, 1998:7–9).5 Although institutions make use of a wide variety of other information, it is their uses of test scores in particular that have given rise to considerable controversy and confusion. The two tests that supply these scores, the SAT and the ACT, have significant similarities and are viewed by some as almost interchangeable, but they were designed with somewhat different purposes and retain important differences in content and structure. The SAT, originally developed to assist competitive institutions (mostly located on the two coasts), was designed to measure general verbal and mathematical reasoning in order to provide "a standard way of measuring a student's ability to do college-level work" (quoted in Wightman and Jaeger, 1998:5–6). The ACT, in contrast, was designed to assist institutions (mostly in the middle states) that generally admitted all qualified applicants—typically, students who had completed particular course requirements (perhaps achieving a minimum grade point average) and received a high school diploma. Consequently, this newer test was designed to draw more explicitly on the content knowledge students had acquired in high school and to assess how well they could use and apply it. The ACT was intended to assist colleges not only in admissions and recruitment, but also in course placement and academic planning. It had the additional purpose of helping students to "identify and develop realistic plans for accomplishing their educational and career goals" (quoted in Wightman and Jaeger [1998:3] from ACT materials). Thus, a fundamental distinction between the two tests is that the SAT was originally intended to help colleges identify the ablest students for admission to elite institutions, and the ACT was originally intended to provide fairly detailed profiles of the full range of students, to help both students and colleges determine the best academic path for each student.
Although the distinction between the coastal and midwestern institutions that accounted for these differences has faded, the SAT and the ACT have retained their distinct goals (despite the fact that in many institutions the two tests are used almost interchangeably).6 The SAT is described by its developers as a measurement of reasoning abilities that develop "over years of schooling and in ... outside reading and study" (quoted in Wightman and Jaeger, 1998:6). It is endorsed by the College Board and the Educational Testing Service as a predictor of academic
success that is useful in admissions decisions, although not as the sole criterion for such decisions. The ACT is also intended as a predictor of success in college, but it is endorsed by its developers for use by high schools in counseling, evaluation studies, accreditation documentation, and public relations; by state and national agencies for financial aid, loan, and scholarship decisions, and other uses; and by colleges for placement and recruitment, as well as admissions decisions.
SAT and ACT scores are currently used in a wide range of ways in a wide range of settings. Some of these uses are technically defensible means of pursuing important goals, but others are not. The steering committee has been guided in its deliberations about these uses by general criteria for appropriate test use that have been defined in the context of previous work by the Board on Testing and Assessment. Of those criteria, the two most relevant to admissions testing are that a test's validity can be understood only in the context of the purpose for which it is being used, and that "no single test score can be considered a definitive measure of a student's knowledge" (National Research Council, 1999:2–3).
The proportion of high school graduates who enroll in college grew from 49 percent in 1979 to 65 percent in 1996 (Breland, 1998:3). Not surprisingly, the proportion of high school graduates who take standardized tests has also risen. These test scores are also frequently put to uses other than those for which they were devised. The tests have indeed become, in the words of Wightman and Jaeger (1998), a "ubiquitous presence" with high stakes attached, so it is not surprising that they have been demonized, lionized, and misunderstood.
There are several reasons that debate and controversy continue to surround the use of admissions tests at selective institutions. A key one is the persistence of score gaps, particularly between white and minority students but also among other groups. The black-white score gap on admissions tests, as on most standardized tests, is large, and the proportion of black students in the highest score ranges is low (Kane, 1998:433–435). What do the gaps mean? Are the tests biased against minority students or females or unfair to particular groups in some other way? Do they simply reflect inequities that begin affecting minority students long before they take college admissions tests?
Samuel Lucas reviewed the existing research on explanations for the score gaps, particularly between blacks and whites, for the workshop and made several points that are particularly important in this context. First, although significant score gaps persist, they have shifted: in general, black students' scores have risen significantly in comparison with those of whites since the 1960s, and gaps associated with socioeconomic differences among test takers are sometimes larger than those between black and white students. Lucas also drew a sharp distinction between students' actual ability, which may or may not be revealed through a particular performance, and their demonstration of that ability—through performance on a test, for example—which he called achievement. He noted (Lucas, 1998:3):7
Any given performance, or set of performances, can only reveal, at best, one's level of mastery; it is not possible to reveal one's untapped capacities, which might be far greater than the achievement level demonstrated on the measuring instrument. For this reason, then, by the definition of ability, a test may measure achievement, not ability.
Lucas' point relates to broad questions about uses of test scores and varying interpretations of academic merit. If one views the test score gap as valuable evidence of differing likelihoods of academic success in college for different groups, then relying on the scores as an element in selection makes sense. If, however, one views the gap as a reflection of differences in prior accomplishment, then that use may be questioned. For the steering committee, however, this question leapfrogs over the more basic question of whether the test scores in question are sufficiently robust—that is, statistically strong—to bear the weight of the gatekeeping function, regardless of how they are viewed.
Addressing another reason for controversy, Sylvia Johnson discussed at the workshop some of the consequences of the disparities in test scores. Clearly, to the extent that test scores have been used to regulate access to higher education, minority students' lower scores have put them at a disadvantage in the competition for places (Johnson, 1998:13). But the gap may have other, subtler effects, as well. For example, researchers have suggested that students' test performance may be impaired by their awareness of negative stereotypes about the group to which they belong (Johnson, 1998:19; see also Steele, 1997; Steele and Aronson, 1995). Thus, Claude Steele and others have argued, knowledge of the performance gap on standardized admissions tests may lower the performance of minority students, contributing to continuance of the gap. Research on uses of other kinds of high-stakes tests also suggests that classification of minority students as low achievers can serve to limit their opportunities to benefit from demanding curricula and other educational opportunities (Johnson, 1998:8–13). Although findings such as these have not been linked to college admissions tests, they do demonstrate the complexity of the discussions of academic merit and the role of tests in defining it.
More practically, the existence of score gaps, particularly between black and white students, is one key reason that affirmative action programs were developed—and are so controversial—since colleges have long sought both academic merit and diverse student populations.8 Because academic merit has increasingly been defined by test scores, the gap in test performance has made these goals seem starkly opposed. If there were no gap—and if minorities as a group were as well positioned as whites for competitive college selection—the role of test scores in admissions might not have become so controversial. Although being well positioned for college selection involves far more than strong test scores, as selective institutions have long stated in their policies, other performance measures, such as high school records, often show similar gaps. It is important to note that standardized tests are not the sole reason that minorities' access to higher education has been limited. Indeed, affirmative action has been intended as a means of enriching the education of all students by finding ways—including the possibility of relaxed requirements for test scores or other criteria—of including diverse, capable individuals from groups that were traditionally underrepresented (or excluded) because of complex historical and cultural inequalities. These inequalities continue to affect educational aspirations and achievement. Moreover, views of the problems raised by test use have to a certain extent been shaped by the context of the United States' long history of racial inequality.
The test score gap fosters the notion that minorities as a group are less qualified for academic success, and it also complicates the use of scores
for selection.9 Some institutions have used test scores in formulas and other numerical systems for selecting among applicants; because minority groups score lower than others, on average, colleges have sometimes applied these methods differently for different groups in order to ensure racial diversity in their entering classes. Some of these uses have been challenged as unfair and illegal; the 1996 legal ruling in Texas's Hopwood case and voter actions in both California and Washington have barred public colleges in those jurisdictions from considering race at all as a means of promoting diversity in the admissions process. It is important, however, to distinguish among the many methods of considering race that have been used. The University of Texas School of Law (the subject of the Hopwood lawsuit), for example, was using an explicit two-track system, under which different selection criteria were used for white and black students. Many other institutions, however, have simply used race as one among many so-called "plus factors" in the process.
Colleges throughout the nation are reevaluating their admissions policies in light of the outcomes of court cases and voters' mandates and waiting to see what effects the changes will have on the institutions that are subject to them. Many colleges want to know whether they are vulnerable to legal challenge themselves, and many are also taking the opportunity to reflect more broadly on their reasons for seeking culturally and ethnically diverse student populations, the goals underlying their admissions practices, and the extent to which their practices serve their goals.
In this climate of rapid change and reevaluation, an objective look at what is known about the current admissions process, and about the strengths and limitations that standardized tests bring to it, is important.