For many parents, standardized test scores seem to answer the basic questions about education that everyone asks: How good is the school my child attends? How do the schools in our state or region compare with schools elsewhere? What chance does my son or daughter have of being admitted to a good college or university?
Schools commonly use achievement test results to determine whether students are making progress and to ensure that programs in basic subjects, such as mathematics and reading, are effective. The referendum-voting and home-buying public use test scores to judge the quality of their schools and the desirability of living in one community over another. Colleges often combine students' high school grade point averages and scores on standardized tests to make crucial in-or-out decisions about admissions.
But how reliable are standardized test results? Can we really use achievement test rankings or college-admissions test scores to make fair and meaningful comparisons of students, schools, and states?
Workplace and everyday tasks like those in this volume bring these questions into sharp relief in two very different ways. First, such tasks differ considerably from the tasks used in standardized tests, most notably in the time a student is expected to spend on the task and in what constitutes a solution. This point raises questions about whether any standardized test can adequately assess a type of education in which extended tasks are used for instruction. Second, the kind of reasoning that parents need to gauge properly the significance of test score data is precisely the kind of reasoning that we might encourage. Interpreting test score data is itself a mathematical task!
Interpreting the SAT
For better or worse, we know that Americans have been making the comparisons outlined above for generations. This fact is perhaps best illustrated by the power and popularity of the SAT (see Note 1), the single most-used standardized test of its type in the country and an important ingredient in college admissions since the 1940s. The SAT I: Reasoning Test, a multiple-choice exam (see Note 2) with verbal and mathematics components, is used in combination with high school grades to predict a student's readiness for college. Fluctuations in average SAT scores are tracked as indicators, albeit indirect, of the quality of education in this country (Bracey, 1996; Powell & Steelman, 1996).
Beginning in 1964, average scores on the SAT dropped slowly but steadily for about 15 years. This led to much speculation and considerable hand-wringing about possible causes of the apparent decline in education quality in the U.S. By the early 1990s, average scores on the mathematics portion of the SAT had rebounded significantly, but scores on the verbal section had not. Scores on other national standardized exams also declined during this same time period, but none attracted as much attention as the SAT. The many reports of poor mathematics and science performance by U.S. students relative to students in other countries lent further significance to this alarming drop in SAT scores. In the U.S., state-by-state comparisons of SAT results became a standard feature of the economic "warfare" among the states to lure businesses and their employees based on relative measures of "quality of life" as reflected in part by test-score rankings.
Missing Information
The trouble with all of this is that test scores, particularly average scores on nationally normed standardized tests such as the SAT, don't mean as much as we typically think they do. Understanding who takes the test, and who doesn't, is the first and perhaps single most important factor to consider when trying to understand what raw SAT scores or state-to-state comparisons really mean. For starters, we need to know what percentage of students actually take the test. Students who take the SAT (or equivalent admissions exams such as the ACT, used more extensively in the Midwestern states) are obviously prospective college students and thus not representative of the total school population.
Thirty years ago, average national scores on the SAT exam were highly unrepresentative of the achievement of the "average" student in the U.S. because the pool of candidates taking the test was much smaller than it is now. In fact, in 1975 SAT-takers included only one-third of the nation's graduating seniors, but by 1994 they included 42% of the nation's graduates (College Entrance Examination Board, 1994). Although experts disagree about specifics, the decline in SAT scores during the 1960s and 1970s seems due at least in part to the growth of the test-taking pool to include a larger percentage of high school graduates; thus, in the 1960s and 1970s, the average SAT-taker was more like the average high school graduate than had been the case in the 1950s.
Today, with an even larger portion of the total school population taking the SAT, we can more fairly make comparisons, yet in making state-to-state contrasts we must still account for differential rates of overall student participation. In 1993, for example, the average SAT score for students in Iowa was 1103, while students in Massachusetts averaged 903 (Powell & Steelman, 1996). But in Iowa only 5% of high school seniors took the SAT. In Massachusetts, participation was far greater: 81% of high school seniors took the exam.
Generally speaking, states with relatively high SAT participation rates are likely to have lower average scores than low-participation states. (See Figure 8-1.)
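The link between participation and average score can be sketched numerically. The block below is an illustrative model, not a description of real SAT data: it assumes that achievement in a hypothetical population follows a normal distribution and that test-takers are exactly the top-scoring fraction of that population, which is a deliberately extreme simplification. The function name `mean_of_top_fraction` and its parameters are invented for this example; the participation rates in the loop are the ones quoted in the text.

```python
from statistics import NormalDist


def mean_of_top_fraction(p, mu=0.0, sigma=1.0):
    """Mean of the top fraction p (0 < p <= 1) of a normal(mu, sigma) population.

    Assumption for illustration only: the people who take the test are
    exactly the highest-achieving fraction p of the whole population.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p == 1.0:
        # Everyone takes the test, so the average is the population mean.
        return mu
    std = NormalDist()
    # Cutoff (in standard deviations) above which the top fraction p lies.
    z = std.inv_cdf(1.0 - p)
    # Standard result for the mean of a normal distribution truncated below z.
    return mu + sigma * std.pdf(z) / p


# Participation rates mentioned in the text: 5% (Iowa), 42% (nation, 1994),
# 81% (Massachusetts), plus 100% for reference.
for p in (0.05, 0.42, 0.81, 1.0):
    advantage = mean_of_top_fraction(p)
    print(f"{p:.0%} participation: {advantage:+.2f} SD above the population mean")
```

Under these assumptions, the average for test-takers falls steadily toward the population mean as participation rises, even though no individual's achievement has changed.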
However, who takes the test, as well as how many take it, also makes a difference in scores. To some extent, the diversification of the SAT-taking pool during the 1960s and 1970s was due to the overdue "democratization" of the test-taking group, with the addition to the SAT pool of significant numbers of female students and students of color, groups who previously had been underrepresented among the college-bound. Students in these groups often do not experience the "same" education as those from groups who are traditionally expected to be college-bound (Oakes, 1990; Wellesley College Center for Research on Women, 1992).
But the effect of this democratization is not simple. Changes and rates of change in SAT scores are linked in complex ways to both how many students take the test and the composition of the test-taking pool. Consider the following facts that run counter to the common expectation that such democratization causes average scores to fall:
Comparing states gets even more complicated when we consider the fact that SAT scores can also be influenced by differences in school environments. Low expectations for student performance, tracking of students into unchallenging academic programs, high student-teacher ratios, and differences in curricula and instructional practices account for significant differences in school performance and can also influence a state's SAT average. Conversely, some changes in curriculum and instruction may not influence SAT averages. For example, the Interactive Mathematics Program (IMP) is a non-traditional high-school curriculum based on complex extended problems. The average SAT score for IMP students was only 1 point higher than that of a matched sample of students enrolled in a traditional mathematics curriculum. But 87% of IMP students took the SAT, whereas only 58% of their counterparts did (Interactive Mathematics Program, 1995).
For reasons like these, the U.S. government long ago instituted the National Assessment of Educational Progress (NAEP), which tests a representative sampling of all high school students in the country, not just the college-bound. Although the NAEP does not adjust for state-to-state variation in expectations, curricula, and other environmental differences, it does test a representative sample of students on a range of tasks, which include SAT-like items as well as extended-response tasks. In this sense, it provides a more reliable way than the SAT of comparing one state's educational performance with another's. In fact, a cross-check of the state-by-state NAEP results for grade 4 and grade 8 (White, 1993) is one way to put some "context" around SAT-score comparisons.
So here we have a mathematical task for parents and guardians. The calculations necessary for comparing one state's SAT scores with another's must include the percentage of total students taking the examination in each state as well as the composition of each pool and, secondarily, information regarding the relative strengths and weaknesses of the states based on existing measures of academic performance. Unlocking the mystery of test-score statistics means understanding raw numbers in the context of a host of student population and school environment factors. It means getting beyond the simplistic messages of test-score headlines in order to understand relative measures of student achievement and the quality of the educational system.
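As a minimal sketch of the first step in such a comparison, the Python below pairs each state's average with its participation rate, using the 1993 Iowa and Massachusetts figures quoted earlier, and declines to rank the averages when participation differs widely. The `StateResult` class, the `comparable` and `describe` functions, and the 10-percentage-point threshold are hypothetical choices made for illustration. The sketch covers participation only; it does not account for the composition of the test-taking pool or for school environment factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateResult:
    name: str
    average_score: float   # reported average SAT score
    participation: float   # fraction of high school seniors who took the test


def comparable(a, b, max_gap=0.10):
    """Hypothetical rule of thumb: treat two averages as comparable only if
    the participation rates are within max_gap of each other."""
    return abs(a.participation - b.participation) <= max_gap


def describe(a, b):
    """Summarize whether two states' averages can reasonably be compared."""
    gap_points = abs(a.participation - b.participation) * 100
    if comparable(a, b):
        return (f"{a.name} vs {b.name}: participation differs by "
                f"{gap_points:.0f} percentage points; "
                "the averages are roughly comparable.")
    return (f"{a.name} vs {b.name}: participation differs by "
            f"{gap_points:.0f} percentage points; "
            "the averages describe very different slices of each state's students.")


# Figures from the text (Powell & Steelman, 1996).
iowa = StateResult("Iowa", 1103, 0.05)
massachusetts = StateResult("Massachusetts", 903, 0.81)
print(describe(iowa, massachusetts))
```

The point of the design is that the average score is never reported without the participation rate beside it, which is the context the headline rankings leave out.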
References
Notes
1. The SAT is not an acronym, but a registered trademark. Since 1994, the SAT program has consisted of the SAT I: Reasoning Test, formerly the Scholastic Aptitude Test, and SAT II: Subject Tests, formerly the Achievement Tests. Also in 1994, calculators became optional for the mathematics portions of the tests.
2. The mathematics section of the "old" SAT consisted of 60 multiple-choice questions, which were to be done within one hour thirty minutes (Interactive Mathematics Program, 1995). The mathematics section of the new SAT consists of 60 questions to be done within one hour and forty-five minutes. Fifty of the questions are multiple choice and 10 are "grid-ins" (Burton, 1996).