
6

Designing Reports of District-Level and Market-Basket NAEP Results

The goal of NAEP is to inform our society about the status of educational achievement in the United States and, more recently, in specific states. Currently, policy makers are considering whether NAEP data gathered from still smaller geopolitical units and based on smaller numbers of test items can be used to generate meaningful reports for a variety of constituents. These proposed reporting practices stem from a desire to improve the usefulness and ease of interpretation of NAEP data. Both proposals call for close attention to the format and contents of the new reports.

When NAEP first proposed producing state-level results, a number of concerns were expressed about potential misinterpretation or misuse of the data (Stancavage et al., 1992; Hartka & Stancavage, 1994). With the provision of below-state NAEP results, the potential for reporting and misinterpretation problems is also high. If readers are proud, distressed, or outraged by their statewide results, their reactions to district or hometown results are likely to be even stronger. In addition, the wider variety of education and media professionals providing the public with information about local-level test results is also likely to contribute to potential interpretation problems. These professionals may have a greater variety of positions to promote as well as more varied levels of statistical sophistication. In short, consideration of effective reporting formats may become more urgent.

Even if the proposals for district-level and market-basket reporting do not come to fruition, attention to the way NAEP information is provided would be useful. As described in Chapter 2, the types of NAEP reports are many and varied. The information serves many purposes for a broad constellation of audiences, including researchers, policy makers, the press, and the public. These audiences, both the more technical users and the lay public, look to NAEP to support, refute, or inform their ideas about the academic accomplishments of students in the United States. The messages taken from NAEP's data displays can easily influence their perceptions about the state of education in the United States.

Generally, both technical users and the lay public tend to extract whatever possible from data displays. Unfortunately, the “whatever possible” often translates to “very little” for at least two reasons. First, readers may pay very little attention to data reports, feeling that the time required to decode often arcane reports is not well spent; the data are not worth the additional effort. Second, even when readers carefully study the displays, they might misinterpret the data. Even well-intentioned report designs fall prey to the cognitive and perceptual misinterpretations of the most serious reader (Monmonier, 1991; Cleveland & McGill, 1984; Tversky & Schiano, 1989).

Earlier chapters of this report have focused on the feasibility and desirability of collecting and reporting such data. This chapter focuses on the end product: the reports released for public consumption. As part of our study, the committee hoped to review prototypes of district-level and market-basket reports. NCES provided an example of a district-level report that was part of an early draft of technical specifications for below-state reporting, and Milwaukee shared with us the report they received as part of their participation in a district-level pilot. These reports were presented as drafts and examples, not as the definitive formats for district-level reports. We reviewed one preliminary mock-up of a market-basket report based on simulated data (Johnson, Lazer, & O'Sullivan, 1997).
Since ETS is currently designing reports as part of the second year of the year 2000 pilot project on market-basket reporting, much of the decision making about market-basket reports has not yet occurred. Given the stage of the work on district-level and market-basket reporting, we present the following discussion to assist NAEP's sponsors with the design of the reports.

This chapter begins with a review and description of some problems cited with regard to the presentations of NAEP data. For this review, we relied on the work of a number of researchers, specifically Hambleton and Slater (1995); Wainer (1997); Jaeger (1998); Wainer, Hambleton, and Meara (1999); and Hambleton and Meara (2000). The next section provides commentary on report samples reviewed during the study. The documents reviewed include the following:

1. Draft Guidelines and Technical Specifications for the Conduct of Assessments Below-State Level NAEP Testing, NCES, August, 1995, Draft, which included a mock-up of a report for a district (National Center for Education Statistics, 1995).
2. NAEP 1996 Science Report for Milwaukee Public Schools, Grade 8, Findings from a special study of the National Assessment of Educational Progress (Educational Testing Service, 1997b).
3. NAEP 1996 Mathematics Report for Milwaukee Public Schools, Grade 8, Findings from a special study of the National Assessment of Educational Progress (Educational Testing Service, 1997a).
4. Sample market-basket report based on simulated data (Johnson, Lazer, & O'Sullivan, 1997).
5. NAEP's Year 2000 Market-Basket Study: What Do We Expect to Learn? (Mazzeo, 2000).

The chapter concludes with additional suggestions for enhancing the accessibility and comprehensibility of NAEP reports. To assist in the design of future reports, we encourage the application of procedures to make the data more usable, including user- and needs-assessment, heuristic evaluation, and actual usability testing. In the appendix to this report, we provide an example of how these techniques might be applied.

CRITIQUES OF NAEP DATA DISPLAYS

To date, a number of concerns with the accessibility and comprehensibility of NAEP reports have been described. The most consistent concerns are discussed below.

High-Level Knowledge of Statistics Is Assumed

Reports assume an inappropriately high level of statistical knowledge for even well-educated lay audiences. There are too many technical terms, symbols, and concepts required to understand the message of even relatively simple data, such as mean test scores as a function of time or location.
In interviews assessing policy makers', educational administrators', and media representatives' understanding of NAEP reports, Hambleton and Slater (1995) reported that 42 percent did not understand the meaning of “statistically significant.” Even relatively basic mathematical symbols are the source of some misunderstanding. For example, roughly one-third of those interviewed by Hambleton and Slater did not understand the meaning of the “>” and “<” symbols that were used to indicate a reliable increase or decrease in mean scores.

Information Overload and Report Density

In an attempt to be complete, reports may present too much information, making it difficult for readers to find and extract what they really want to know. Wainer (1997a) described this problem in detail with respect to NAEP tables, but the same arguments would hold for other formats as well.

Reports also often contain overly dense displays that readers find daunting. This problem deals with readers' perceptions of ease of access. Designers of textbooks and other technical documents have learned that reports can be designed to appear more or less difficult to understand just by varying simple report features such as the amount and placement of “white space” on the page. In addition to ensuring that reports are easy to understand, care must be taken to make reports look easy to understand.

Attempts at Redesign Have Increased “Clutter”

When displays are redesigned for easy access, design devices are sometimes used that undermine this objective through increased clutter or perceptual inaccuracies. That is, designers can go too far in their attempts to make data appear more enticing. A case in point is the use of three-dimensional renderings of data, where line graphs become cliffs, and pie charts become floating discs. Three-dimensional renderings are inherently ambiguous when the information to be extracted involves relative size judgments of parts, such as the relative heights of two bars in a three-dimensional bar graph.
So, while attempts should be focused on making data reports appear more accessible, concurrent design reviews should ensure that comprehensibility is not compromised.

Unnecessary Mental Arithmetic Is Required

Reports sometimes require readers to perform unnecessary mental steps, including unreliable mental arithmetic, to derive information most relevant to them. For example, change scores across NAEP administrations may be as important to most readers as the absolute mean scores at each individual administration, yet when only the means are reported, readers must compute the changes themselves. Mistakes in mental arithmetic can easily lead to incorrect interpretations, even among readers who understand the meaning of the presented data.

Graphics Are Infrequently Used

Reports do not make enough use of graphical alternatives to textual and tabular formats. Associated with both the actual and perceived complexity issues noted above, reports use vast tables of numbers more frequently than necessary. Some researchers (e.g., Wainer, 1997a; Wainer et al., 1999) argue that, in many cases, graphical displays are more appropriate than tables. In an experimental study comparing redesigned NAEP data displays, many of which were graphs, with traditional NAEP displays consisting primarily of tables, Wainer demonstrated that the graphical formats promote more rapid and accurate interpretations (Wainer et al., 1999).

CONCLUSION 6-1: Enhancements to the design of NAEP reports that allow for communication to a broader audience are a way to increase the utility of these tests, independent of changes to the methods used to collect and analyze the actual data. The data currently available can be made more accessible, comprehensible, and relevant.

REVIEW OF SAMPLE DISTRICT-LEVEL AND MARKET-BASKET REPORTS

District-Level Reports

NCES' Specifications for Below-state Reporting (National Center for Education Statistics, 1995), still considered a draft document, included a report summarizing results for one of the “naturally occurring” districts. This report was in tabular format and included means, standard deviations, quartiles, and percents at or above each achievement level. Data were reported for test takers grouped by gender, ethnicity, parents' educational level, type of location, Title I participation, and eligibility status in the school lunch program.
Very basic (and somewhat cryptic) interpretive information described the grouping categories and the statistics reported.
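
To make the statistics in such a table concrete, the following minimal sketch computes a mean, standard deviation, quartiles, and the percent at or above each achievement level for one reporting group. The scale scores and cut scores are invented for illustration and are not NAEP values; the function name `summarize` is likewise hypothetical.

```python
import statistics

# Hypothetical scale scores for one reporting group (not real NAEP data).
scores = [212, 225, 231, 238, 244, 250, 257, 263, 270, 281, 289, 301]

# Hypothetical cut scores; actual NAEP cut scores vary by subject and grade.
cut_scores = {"Basic": 240, "Proficient": 275, "Advanced": 300}


def summarize(scores, cut_scores):
    """Return the summary statistics a district-level table might report."""
    q1, median, q3 = statistics.quantiles(scores, n=4)
    n = len(scores)
    # "At or above" is cumulative: a score counted as Advanced is also
    # counted as at or above Proficient and at or above Basic.
    pct_at_or_above = {
        level: round(100 * sum(s >= cut for s in scores) / n)
        for level, cut in cut_scores.items()
    }
    return {
        "n": n,
        "mean": round(statistics.mean(scores), 1),
        "sd": round(statistics.stdev(scores), 1),
        "quartiles": (q1, median, q3),
        "pct_at_or_above": pct_at_or_above,
    }


summary = summarize(scores, cut_scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Rounding the percentages to whole numbers is a design choice in keeping with the later advice to report no more numeric detail than readers need.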

The reports prepared for the Milwaukee Public School system consisted entirely of tables accompanied by detailed explanatory text. To enable comparisons, the tables included results for Milwaukee, Wisconsin, and the United States. The report contained numerous two-way tables that presented mean scaled scores for test takers grouped by demographic characteristics (e.g., gender, ethnicity, parental education), school environment (e.g., parental support, absenteeism, availability of classroom resources), and classroom characteristics (e.g., amount of homework assigned, availability of computers). Appendices provided guidance on grouping categories and on the reported statistics.

Critique of District-Level Reports

To begin our review, we compared the sample district-level reports, particularly those prepared for Milwaukee, with some of the standard NAEP reports. Although the district-level efforts attempted to make the reports more readable while limiting misinterpretations, there is still substantial room for improvement.

The most salient deficiency in both reports is the proliferation of tables. Much of the data could be relayed succinctly in graphical form, yet no graphs were used. If we were allowed to make only one suggestion about NAEP reporting, it would be to use graphical rather than tabular formats whenever feasible, even when displaying relatively few data values (Carswell & Ramzy, 1997). The use of graphical formats will help address many of the other problems associated with previous NAEP reports, including information overload and readers' perceptions that the reports are difficult to read.

One of the important ways that graphs can reduce overload is by showing relations among display elements, called “emergent features,” to allow readers to draw conclusions without having to hold and manipulate numerical information in their working memory (Bennett & Flach, 1992).
For example, a graph with three lines could be used to portray the trends in the relationships between NAEP scale scores and the amount of daily homework students complete for the United States, Wisconsin, and Milwaukee. One line would show the relationship of homework and NAEP scores for the city, another line for the state, and a third for the nation. The direction of the slopes of the lines, and the relationships among the lines (for example, fanning out vs. parallel) can be recognized very rapidly. These emergent features can be used to evaluate relationships among the data for different groups. For example, the relationships between amount of homework assigned and NAEP performance can readily be compared for Milwaukee versus Wisconsin students and versus the nation.

The amount of information presented in individual data displays is a concern for the samples in the below-state technical specifications. The tables reporting achievement-level percentages include seven columns, which, based on current knowledge about working memory constraints, is probably about three columns too many. It will be difficult for people to read the table and keep track of which column they are reading while moving down the page, at least without resorting to annoying and error-prone visual scanning to reread the column headings.

Although the Milwaukee report limited most of its tables to between three and five columns, the actual range was from two to seven. While this streamlining aids the readability of individual tables, it adds to the size of the overall report and may make it difficult for some readers to find specific information spread over multiple tables and pages. This potential problem points to the importance of ascertaining users' information needs and priorities during the early stages of report design. For example, if the homework and test score relationships are of greater interest than the relationship between calculator use and test scores, then the homework table should be given priority of position in the report. Determination of the information to be combined in a single display should be based on the types of questions readers tend to ask of the data. Again, it should be noted that the use of graphs rather than tables may allow more variables to be combined in a single display without overloading the reader.

Finally, the language of the reports we reviewed still overestimates the statistical expertise of its audience.
For example, in the below-state report specifications, column headings included “n,” “cv,” and “< basic.” Recall that Hambleton and Slater (1995) found that roughly one-third of their subjects did not understand the “<” and “>” symbols. The “cv” is likely to be beyond the grasp of most readers, and the “n,” though possibly familiar to undergraduates enrolled in a statistics course, is probably a vague memory, at best, for most people. The Milwaukee reports avoided many of these problems by reporting mainly percentages and average scale scores. However, they did report scale scores by selected percentiles (percent at each quartile), which may not be widely understood.

The Milwaukee reports also provided brief textual interpretations directly above each table. Some interpretations were provided to ensure that readers did not focus too heavily on small, statistically unreliable differences; other interpretations were simply overviews of table content. In general, these brief text inserts are likely to be useful to people searching for specific kinds of information or who may be unfamiliar with inferential statistics and associated notations. However, the writers of these inserts must take care in selecting their terminology and in avoiding the specialized statistical usage of terms such as “significant” in describing results.

Market-Basket Reports

Work on designing market-basket reports is still in its earliest stage. As part of market-basket preliminary research, Johnson and colleagues (1997) provided a sample report based on simulated data. Reactions to this report were obtained during the committee's workshop on market-basket reporting. The mock-up appears below.

Table 6-1 displays percent correct results for test takers in fourth, eighth, and twelfth grades. Column 2 presents the overall average percent correct for test takers in each grade. Column 3 shows, for each achievement-level category, the percent correct score associated with the category's minimum score cutpoint. For example, the cutpoint for the fourth-grade advanced category would be associated with a score of 80 percent correct. A score of 33 percent correct would represent performance at the cutpoint for the twelfth-grade basic category.

TABLE 6-1 Example of Market-Basket Results*

(1)      (2)                               (3)
Grade    Average Percent Correct Score†    Cut Points by Achievement Level
                                           Advanced    Proficient    Basic
4        41%                               80%         58%           34%
8        42%                               73%         55%           37%
12       40%                               75%         57%           33%

* Data in Table 6-1 are based on simulations from the full NAEP assessment; results for a market basket might differ depending on its composition.
† In terms of possible total points.
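
The way cutpoints in Table 6-1 map a percent correct score to an achievement level can be sketched as a simple lookup. This is only an illustration of how a reader would apply the table: the cut points are copied from the simulated data above, while the function name `achievement_level` and the “Below Basic” label for scores under the basic cutpoint are assumptions made here.

```python
# Cut points (percent correct) copied from Table 6-1 (simulated data).
CUT_POINTS = {
    4: {"Advanced": 80, "Proficient": 58, "Basic": 34},
    8: {"Advanced": 73, "Proficient": 55, "Basic": 37},
    12: {"Advanced": 75, "Proficient": 57, "Basic": 33},
}

# Average percent correct scores from column 2 of Table 6-1.
AVERAGE_PERCENT_CORRECT = {4: 41, 8: 42, 12: 40}


def achievement_level(grade, percent_correct):
    """Return the highest level whose cutpoint the score meets.

    "Below Basic" is an assumed label for scores under the basic cutpoint.
    """
    for level in ("Advanced", "Proficient", "Basic"):
        if percent_correct >= CUT_POINTS[grade][level]:
            return level
    return "Below Basic"


for grade, average in AVERAGE_PERCENT_CORRECT.items():
    level = achievement_level(grade, average)
    print(f"Grade {grade}: average of {average}% correct falls in {level}")
```

Checking levels from the highest cutpoint downward means a score is assigned to the most demanding category it qualifies for, which matches the description of each cutpoint as the minimum score for its category.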

Comments on this report were mixed, especially given that it was presented as a mock-up and not as a prototype for market-basket reporting. The primary concerns related to substantive issues, specifically the percent correct scores that would be associated with the achievement level descriptors (e.g., 55 percent correct would represent a proficient level). Given this concern, it would be essential to provide explanatory text documenting the meaning of the various achievement level descriptors.

Further design of market-basket reports is an ongoing part of ETS's pilot study. The year 2000 study is expected to yield two types of reports: (1) a research report intended for technical audiences that examines test development and data analytic issues associated with the implementation of market-basket reporting, and (2) a report intended for general audiences. According to Mazzeo (2000), some of the features being explored include:

• National and state-level NAEP results (average scores and achievement level percentages) expressed in a market-basket metric (e.g., percent correct). Such results could be confined to “total-group” scores or could be extended to include national and state results by gender, race/ethnicity, parental education, and other standard NAEP reporting groups.
• All, or a sample, of the items that make up the short form as well as performance data. The text of the items, scoring rubrics, and sample student responses might also be provided.
• A format and writing style appropriate for a general public audience.
• Electronic reporting.

Pilot study plans call for focus groups to be conducted during the second year to obtain feedback on the report designs. Because report design is in the early development stage and actual prototypic reports are unavailable, we next discuss methods for designing reports to assist NAEP's sponsors with this process.

TOWARD COMPREHENSIBLE AND ACCESSIBLE DISTRICT-LEVEL AND MARKET-BASKET REPORTS: THE ARGUMENT FOR FORMAL USABILITY AUDITS

Current Practice

NCES and NAGB have recognized the need for more attention to the public “face” of NAEP reports, funding research on readers' responses to and understanding of current reports (Jaeger, 1998; Hambleton & Meara, 2000). However, the design reviews and modifications necessary to address the comprehensibility and accessibility issues raised by this research remain fairly informal and unsystematic.

NAGB has encouraged NCES to redirect NAEP reports to the general public and away from more technical audiences (Bourque, personal communication, April 2000). For example, in 1992, NAGB adopted resolutions calling for achievement levels as the primary way of reporting NAEP data, believing that achievement levels are more understandable to the public than the traditional scale scores. In addition, a separate NAGB resolution resulted in the relocation of standard errors (of most interest to the technical community and less so to the public) to the appendices of reports. However, such changes appear to be based on the opinions of board members through NAGB's Dissemination and Reporting Committee, rather than on results from formal usability audits or tests. Although NAEP reports go through NCES departmental reviews and adjudication, it is not current practice to require that a usability expert be a part of the review process.

Suggested Practice

One way to bring the concerns of accessibility and comprehensibility into the design and review process for NAEP reports is through the application of a number of “usability engineering” methods. These methods, which have been applied extensively to consumer product and electronic information design, rely on user-centered feedback and user participation in all phases of development (e.g., Nielsen, 1993; Norman, 1988; Rubin, 1994).
Box 6-1 illustrates user-centered design strategies that might be applied to the development and revision of NAEP reports.

After defining the “mission” of the report by incorporating directives, constraints (e.g., costs, time lines), and program requirements, an in-depth study is needed to identify the target audience and their likely information needs. This is the stage of user-needs analysis, an aspect of NAEP design both in terms of test construction and reporting that seems to be somewhat neglected. As we have emphasized elsewhere in this report, we need to know exactly who is interested in district-level and market-basket NAEP data, as well as who is interested in current NAEP data. It will also be necessary to determine users' expectations of what information can be gleaned from the reports; gauge their level of statistical sophistication and experience with educational test data; and elicit information about their experiences, from which guiding metaphors might be derived to aid in translating test data into more understandable concepts.

This information can then be translated into a series of user requirements. For example, these requirements should include a list of statistical terms or concepts that the users can be expected to know and a list of terms and concepts likely to be misunderstood. Likewise, the requirements could indicate the minimum reading level of likely users. After gaining information about the users' interests and expectations, a list of “most important questions” can also be generated to inform the selection and ordering of specific data displays in the reports. Knowledge about the users' educational and work histories might provide suggestions for appropriate data metaphors, for example, use of sports statistics rather than economic indices.

With the user requirements identified, report designers can create mock-ups of entire reports and component displays. These mock-ups can use past data or “dummy” data to increase their realism. The mock-ups should then undergo heuristic evaluations in which a usability specialist checks the designs against a list of empirically established guidelines for reducing effort, time, and errors in the reading of data displays. Box 6-1 provides one example of a set of such heuristics. However, there are additional guidelines available, such as those described by Jaeger (1998); Pickle and Herrmann (1994) for statistical maps; Wainer (1997a) for tables; and Spence and Lewandowsky (1989), Kosslyn (1994), Cleveland (1985), and Gillan, Wickens, Hollands, and Carswell (1998) for graphs. It is important when choosing and using heuristics for early and rapid usability reviews that care be taken to select scientifically validated heuristics (Herrmann & Pickle, 1996; Kosslyn, 1985; Simkin & Hastie, 1987; Tversky & Schiano, 1989) that are not simply the result of design lore or convention. That is, care should be taken to ensure that the science of human cognition and comprehension informs the art of NAEP reporting.

BOX 6-1 Example of design heuristics for evaluating the usability of data displays

(1) Is the format compatible with the performance criterion selected? If speed of finding and reporting information is more important than absolute accuracy, then graphical or more holistic displays should generally be used. If accuracy of retrieval of precise values is the goal, a tabular display may be required.

(2) Is the structure of the display compatible with the structure of the data? If the data structure has been described prior to choosing a display, then the data structure should determine the format. For example, periodic or cyclic time trends should be presented on a polar plot and linear trends should be presented in the form of a line graph.

(3) Is the perceptual grouping of information compatible with the mental grouping users must perform to extract the information they want and need? Given data from the user needs assessment, are the data values necessary for the most important comparison or integration grouped most strongly (i.e., associated by the greatest number of gestalt grouping principles such as spatial proximity, similarity, connectedness, and enclosure)? Are information values that are rarely combined isolated from one another?

(4) Is the level of numeric detail compatible with the reliability of the data and the needs of the reader? Reporting of decimal places should be reduced to the minimum necessary for the task at hand, as unnecessary precision results in increased reading time and reduced discriminability among numbers (and increased potential for error).

(5) Is data salience compatible with data importance? One of the purposes of some data displays is to direct the reader's attention. Because involuntary shifts of attention are induced by dissimilarity (e.g., a red pie chart in a table filled with blue numbers), make certain that the most dissimilar or incongruent features of the visual array represent information of genuine importance (based on the results of data analysis or on the interests of the users).

(6) Is the data display compatible with working memory limits? Working memory refers to two fundamental phenomena that all humans experience. The first phenomenon is that people retain their immediate thoughts only until other thoughts displace them. New thoughts displace old thoughts because working memory can only hold so much information at a given time. In general, individual displays should include no more than four organizational “objects” that must be used in conjunction (e.g., lines in a graph, columns in a table, or footnote identifiers in either type of display). In addition, information to be used in conjunction should be placed together, so that one piece of information does not have to be held in working memory while the reader is looking for the information with which to integrate it.

(7) Are physical properties of the stimuli compatible with our ability to detect, discriminate, and recognize these properties? Does the physical difference in the height of two bars or the slope of two lines exceed the minimum necessary to result in a perceptual just-noticeable-difference (JND)? Are data values that need to be compared presented, where possible, as points along common scales? If points along common scales cannot be used, then are physical dimensions chosen from as near the front of the following list as possible: lengths, angles and slopes, volumes, lightness/darkness, and hue? If users must precisely identify a visual element from among a small set of alternatives (e.g., the color of a line that represents the data collected from the far western states rather than the Northeast, Midwest, or South), then different dimensions should be combined redundantly to aid identification and to maximize dissimilarity.

(8) Is the organization of information in the display compatible with spatial metaphors and population stereotypes? Are better scores represented as “higher” scores (e.g., by graphing number correct rather than number of errors)? Are more recent scores reported to the right of earlier scores? Are lines or bars representing more southern geographic regions represented by “warmer” colors?

(9) Is the choice of display format and ornamentation compatible with the users' preferences and biases? Three-dimensional displays should be avoided when showing controversial results, since readers find two-dimensional displays more “trustworthy.” Use bar graphs instead of line graphs when readers are likely to be intimidated by statistical displays.
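
Two of the Box 6-1 heuristics lend themselves to a simple automated pre-check before a usability specialist reviews a mock-up: the limit of about four objects per display in heuristic (6) and the call for minimal numeric detail in heuristic (4). The sketch below is one possible way to encode them for tables. The function names, the default of one decimal place, and the sample table are all assumptions made for illustration; the box itself does not prescribe a specific decimal threshold.

```python
from decimal import Decimal, InvalidOperation

MAX_OBJECTS = 4  # heuristic (6): at most four objects used in conjunction


def decimal_places(cell):
    """Count digits after the decimal point in a printed table cell."""
    try:
        exponent = Decimal(cell.strip().rstrip("%")).as_tuple().exponent
    except InvalidOperation:
        return 0  # non-numeric cell, such as a group label
    if not isinstance(exponent, int):
        return 0  # special values such as NaN have no digit count
    return max(0, -exponent)


def audit_table(headers, rows, max_decimals=1):
    """Return a list of findings; an empty list means no issues were flagged.

    max_decimals is an assumed threshold, to be set from the user-needs analysis.
    """
    findings = []
    if len(headers) > MAX_OBJECTS:
        findings.append(
            f"{len(headers)} columns exceeds the suggested limit of {MAX_OBJECTS}"
        )
    too_precise = sorted(
        {
            header
            for row in rows
            for header, cell in zip(headers, row)
            if decimal_places(cell) > max_decimals
        }
    )
    for header in too_precise:
        findings.append(
            f"column '{header}' reports more than {max_decimals} decimal place(s)"
        )
    return findings


# Hypothetical seven-column table resembling the below-state mock-up headings.
headers = ["Group", "n", "Mean", "SD", "cv", "< Basic", "Basic"]
rows = [["Male", "512", "261.37", "31.2", "4.1", "38%", "62%"]]
for finding in audit_table(headers, rows):
    print(finding)
```

A check like this can only flag mechanical problems; judgments about grouping, salience, and metaphor in the other heuristics still require a human reviewer and testing with actual readers.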

Suggestions made during the heuristic evaluation can be used to modify the overall report layout or the design of specific displays. At this point, actual usability testing becomes essential. Wainer (1997a) provides an excellent example of this step in the review process. In his study, a sample of potential users answered questions about NAEP data while viewing original and revised data displays. The user-subjects were also timed and probed for their preferences. In the Wainer study, most of the revised displays led to better performance and were preferred. However, there were some exceptions, which should lead to additional design revision or to the reconsideration of the original design for the final report.

Once the reports are produced and distributed, further usability analyses can be made on the actual use of the reports (e.g., citations, requests for copies) and on misuses made of the data (overgeneralizations, errors in interpretation). This information can be integrated into the next user-needs analysis before the next round of NAEP data is published.

Previous critiques of NAEP report design (Jaeger, 1998) have suggested a number of these components in isolation, such as market research to determine user expectations and field testing to review actual usability. Focus groups, like those conducted by Hartka and Stancavage (1994) during evaluations of the Trial State Assessment, provide examples. We suggest that these processes should be applied to the development of the reports issued to NAEP's audience in connection with district-level reporting and the design of market-basket reports. In the appendix to this report, we provide an example of how a usability process might work.

Drawing on Appropriate Imagery

The issue of defining appropriate metaphors to enhance report comprehension is particularly important when considering market-basket style reports. The model that has been used for market-basket reporting is the Consumer Price Index (CPI) (Forsyth et al., 1996).
For communicating information about fluctuations in the price of consumer goods, the image of an actual market basket is both appropriate and very familiar to consumers. However, a market basket is an odd, even jarring, image in the context of educational achievement. Most people probably do not view education as a consumer purchase, nor are they likely to perceive it as an assortment of independent parcels placed in a shopping cart. The question, then, is what metaphor should replace the market basket in representing a composite reporting statistic of NAEP performance. Again, the user-needs analysis is the appropriate forum for determining the most direct or evocative metaphor, be it a “report card,” a “GPA,” or some sort of educational “batting average.”

CONCLUSIONS AND RECOMMENDATIONS

Given the amount of attention that below-state results would be likely to receive, significant time and effort should be devoted to product design. The design of data displays should be carefully reviewed and should evolve through methodical processes to consider the purposes the data might serve, the needs of users, the types of interpretations, and anticipated types of misinterpretations. Any imagery used to describe reports should be based on metaphors that evoke appropriate images for educational data. User-needs analysis is the appropriate forum for determining both product design and effective metaphors for aiding in communication.

RECOMMENDATION 6-1: Appropriate user profiles and needs assessments should be considered as part of the integrated design of district-level and market-basket reports. The integration of usability as part of the overall design process is essential because it considers the information needs of the public.

RECOMMENDATION 6-2: The text, graphs, and tables of reports developed for market-basket or district-level reporting should be subjected to standard usability engineering techniques, including appropriate usability testing methodologies. The purpose of such procedures would be to make reports more comprehensible to their readers and more accessible to their target audiences.