

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



RECOMMENDED PRINCIPLES FOR APPRAISING PROPOSALS FOR INTERNATIONAL COMPARATIVE STUDIES IN EDUCATION

This section presents the principles recommended by the board for the appraisal of proposals to conduct international education studies. These criteria do not constitute a precise set of standards to be applied rigidly in assessing all proposals. Rather, they are the dimensions that the board believes should be considered in reviewing plans for international comparative education studies in which the United States is a prospective participant or contributor. Comparative studies that exclude the United States are obviously also important in the larger, global educational context of which the United States is a part, but the board is unlikely to review proposals for such studies. These principles have been adopted both to guide the board's own appraisal of planned activities and for consideration by all those who are involved or interested in international comparative studies.

Introduction

The board encourages the conduct of international comparative studies across a wide range of research strategies, formats, and procedures and a broad range of nations. In the past, many of the most widely publicized research efforts have been rooted in cross-national comparisons of student academic achievement. The dominant method has been item and student sampling, that is, collection of responses from each student for a sample of items from a pool and careful scientific sampling of schools or classes. Where appropriately conducted, this is a productive line of research and the board encourages similar efforts in the future. However, there are other research models, some highly quantitative, others relying on rigorous qualitative techniques, that also can enhance knowledge. The board also encourages international studies using qualitative techniques, especially when
they enrich or parallel previous or contemporaneous quantitative studies.

Explanatory and Descriptive Studies

Comparative education studies may be more or less directly grounded in educational models or theories. At one end of a continuum are theoretically based or explanatory studies intended to build or test complex models linking educational resources, practices, and outcomes. At the other end are descriptive studies, intended only to monitor or document critical facets of educational systems, practices, or outcomes.

More theoretically grounded studies often probe the relationships among variables in an effort to seek evidence for causality. For example, they might be designed to study the educational effects of cultural and other large contextual differences among countries or to determine the degree to which teacher characteristics, family expectations, textbooks, or funding levels are correlated with and might explain educational achievement. They might relate the education levels of different nations' populations to their financial support for schooling or to voter participation. They may also be designed to compare pedagogical approaches and their effects on students' learning by including longitudinal item-level data.

Less theoretically oriented studies might include collection and compilation of data on student achievement, teacher salaries, curricula, or enrollments. They might map the range of variation, determine trends over time, or chart the progress of reforms. These studies are of increasing interest to policy makers as nations intensify their investments in human capital because they provide information that can assist in shaping and selecting from broad policy options. We caution, however, that the comparability of the results of such studies depends on the degree of similarity between the country contexts, and therefore the results must be placed in a clearly identified context.

In discussing the board's principles for appraising comparative education studies, we refer to less theoretically oriented studies as descriptive, and those that are explicitly grounded in particular theories as explanatory. We use the term explanatory
because explanation is the goal. However, it needs to be emphasized that correlations are not necessarily, and often are not, indicators of cause and effect. In addition, there is no sharp division between these two categories of studies, and any particular study is likely to partake of both purposes in some degree.

Quantitative and Qualitative Studies

Comparative studies also vary in their reliance on objective measurement, quantification, and narrative description and on use of statistical methods or systematic observation. There is no sharp division between these latter two research approaches, but we refer to the first approach as quantitative and the second as qualitative. Some studies use both quantitative and qualitative methods; in fact, qualitative strategies can be embedded in quantitative studies to illuminate relationships.

Quantitative studies most often rely on scientific samples from carefully framed populations that are usually defined at the level of individual students, although primary and intermediate sampling units may be at some other level of aggregation. Numerically quantifiable data are collected, usually with tests or questionnaires, and these sample data are used to support statistical inferences to the population. Quantitative methods can also be used to study resources, activities, and outcomes at the classroom or school level.

Qualitative studies are more likely to use samples defined at the level of classrooms, schools, or school systems, rather than individual students. The number of units sampled is typically much smaller than for quantitative studies, but they are investigated much more intensively. The sites investigated are usually chosen systematically to represent a range of demographic characteristics, organizational arrangements, or other features relevant to the questions to be addressed.
Observations and interviews will be conducted over a period of time, sometimes by an investigator who participates in the ongoing activities of the school or other setting studied. Case studies can be used initially to document relationships that, once understood, can then be translated to survey formats; and survey results, in turn, can stimulate in-depth case studies. A special type of qualitative study is documentation relating to the history of education systems. Historical studies are very important for understanding the conditions that account for particular structures of schooling and achievement levels and can aid in developing realistic policy alternatives.

The fundamental principles of sound research apply equally to qualitative and quantitative studies, but there are different canons of systematic inquiry for each, which entail different warrants for generalization. Thus, proposals for qualitative or historical studies and those for quantitative studies must be evaluated by somewhat different criteria.

In characterizing studies, other distinctions can also be made. Many studies are cross-sectional, obtaining data for only one point in time. Others are longitudinal, obtaining information on the same sample at various points in time, for example, at the beginning and end of the school year. Other contrasting approaches are large-scale, randomized surveys of entire nations versus smaller, localized, but intensive observational studies. The board believes there is value in all these different varieties of inquiry and does not hold any particular research strategy, descriptive or explanatory, quantitative or qualitative, longitudinal or cross-sectional, to be uniformly superior. Rather, the overriding concerns are that the methods used be appropriate to the questions posed and that, regardless of topic or technique, a proposed study adhere to appropriate canons of systematic inquiry, consistent with the principles enunciated below.

These principles are to be regarded as a set of basic standards to which proposed studies should aspire. Rather than suggesting what ought to be studied or which proposed studies would be of greatest significance, these criteria only suggest how a study ought to be conducted or what questions most proposals should address.
In practice, of course, discussions about "how" will be shaped by views about what ought to be studied and the significance of the issues. Finally, it will be clear that not all of these principles are relevant to all studies. Many pertain only to particular purposes or methods of inquiry. Moreover, many of the principles describe ideals that may sometimes be difficult or impossible to attain. Because of practical constraints imposed by time, resources, knowledge, and the sometimes competing values and interests of study participants, the design of every study must embody
compromises. Depth may be traded for breadth, sample sizes may be smaller and instruments shorter than ideal, and so on. There is probably no perfect proposal or perfect study. Consequently, researchers are encouraged to consider which principles are most relevant to their own investigations and to view these principles as ideals to strive for as they inevitably balance competing demands and practical constraints. Certainly all principles should be carefully considered in the design of any study.

Relation to Education

The board interprets "education" broadly. In addition to formal instruction delivered through various institutions to individuals of all ages (including adults), the term is intended to include activities, whether formal or informal, that directly relate to education and educational agencies and institutions. Areas within the purview of the board include studies or surveys of student performance or other educational outcomes; educational requirements; planning processes; curricula; instructional materials, resources, and practices; structural arrangements; professional preparation; parents', pupils', and professional educators' attitudes; enrollment and dropout rates; and those that analyze education as part of the political agenda or the economy. Even this list is only illustrative; it is by no means exhaustive. By way of contrast, proposed international comparative studies or surveys of the effects of nutrition, housing, or health on schooling, however significant and useful, would probably not be construed primarily as studies of educational activities, agencies, or institutions.

Relation to Other Studies and Information Sources

The value of achievement scores and other educational data or findings may be enhanced when they can be compared directly with information collected in the past or from other populations.
Thus, the board supports the idea of studies that provide for linkages to earlier comparative studies or surveys in the same subject area, even though it recognizes that most international studies to date have not been so designed. Because of the technical difficulties associated with monitoring trends over time, an appropriate statistical model should be a key design feature of such studies.

When appropriate and feasible, the value of a proposed study may also be enhanced by the use of test items and data collection strategies that permit linkage to planned or ongoing national or regional data collections. Such linkages might be accomplished by providing for a core data collection with options for national augmentation. However, any such scheme should strive to ensure that augmentation does not compromise the validity of the international comparisons.

Relation to Policy, Practice, or Understanding in the United States

A proposal for an international comparative education study or survey should be appraised first and foremost on its likelihood of informing educational policies, practices, or the scholarly understanding of professional educators and researchers. Organizations and individuals planning such studies should not assume that the utility of what they propose is automatically evident. Thus, a proposal should include a list of the questions the proposers expect to answer, and it should include a description of its significance for informing policy makers, improving practice, or systematically adding to knowledge. In documenting how a critical issue will be addressed, the proposal should show inputs that can be manipulated by policy makers. It should show sensitivity to questions important to policy makers, administrators, teachers, researchers, and other stakeholders, and it should specify the means by which the analysis and study conclusions will be disseminated to relevant audiences in participating nations.
The board notes that studies narrowly limited to comparing highly aggregated mean levels of educational achievement for participating nations, assessed at a single point in time, are likely to be somewhat more difficult to justify in terms of their relevance to policy, practice, or understanding than are studies with the potential to illuminate the role of educational factors (e.g., organization of the curriculum or teacher training) in promoting achievement. They do, however, provide important contextual information for policy makers, particularly on
macrolevel and alterable variables. Clearly, the board has specific and particular concerns with the utility of cross-national studies to audiences within its own nation and therefore encourages proposals for studies of potential value to educational practice, policy, and research in the United States.

Every country's curriculum is rooted in its culture. Sometimes, in the interests of expanding a study to make it a wide-ranging cross-national comparison of achievement, data relevant to national understanding and national policy may be compromised. More detailed and purposeful studies of a small number of comparable countries may be more useful in these cases than large-scale cross-national studies.

Attention to Educational Influences and Cultural Context

The cultural context for learning may contribute to differences in expectations that affect not only what is taught but when it is taught. The fundamental problem of cross-cultural comparisons is the need for a strong theory explaining the contextual differences among the nations. A proposed international study should display sensitivity to the cultural contexts (e.g., language spoken, religion, laws, implements used, values held) for the education dimensions to be assessed. The study plan should be reviewed by an individual in each participating country who understands how educational influences and cultural context shape and are shaped by policy. Also of concern are demographic and economic trends disaggregated by occupational divisions or rural-urban residence, for example, to permit examining the educational attainment of various subpopulations across nations. Among other concerns, the utility and interpretation of the study should be considered in the light of participating nations' resources, curricula, graduation requirements, and school-going populations.
Even descriptive surveys, intended to chronicle the conditions of two or more nations on one or a few dimensions (e.g., teacher salaries or 12-year-olds' mathematics knowledge), should strive to provide information regarding the context (country wealth, value placed on technology, and so on) in which such conditions are embedded in each of the nations included in the sample. Although much of this information is available, organizing it into a common framework with interpretive usefulness can be very difficult.

Conceptual Coherence of the Research

Another underlying principle in considering proposals, particularly those for explanatory studies, is the degree to which the prospective study represents a conceptually cohesive research endeavor. This means that a proposal that is technically sound but that largely ignores past studies or is disconnected from existing bodies of knowledge in the study area, or in which intellectual elements of the research are fragmented or contradictory, may be inadequate. Descriptive studies should likewise demonstrate awareness of any recent closely related studies.

Research Neutrality and Involvement

An international comparative education study must avoid political, national, religious, racial, gender, or ideological bias. It is particularly important to make certain that, if western paradigms are used, they are relevant to other geographic areas. Therefore, it is essential that all nations to be included in a study participate in the study design, and mechanisms for facilitating such participation should be described in the proposal. Although it is important to safeguard against biases, actual differences (political, ideological, gender, and even religious) present challenges in comparative research that must be recognized. Such differences are often meaningful sources of cultural variation.

International Scope

Prospective studies submitted should have a clear cross-national scope, and the United States, either in toto or in appropriate states and regions, should be included among the nations proposed to be studied. The United States and at least one other nation should be involved, unless a study has already been done in the United States and the same study is being repeated in other countries to obtain relevant comparisons.
In general, there should be no upper limit on the number of international comparisons to be undertaken, although for reasons of resources
and manageability it may be important to limit the number of countries participating in any given study. Involvement of developing countries in international studies contributes to the development of local research capacity and also broadens the sample of participating countries. Third-world participation improves North-South dialogue as well as East-West linkages. Education research studies are good vehicles for building trust and cooperation. The important consideration is that the proposed study be clearly cross-national in its scope and intent. Conditions under which countries (or national data) will be excluded from a given study, which are usually associated with data quality or failure to meet deadlines, should be made explicit.

Personnel, Institutional, and Financial Capacity

Organizations and individuals proposing a comparative international study should have qualifications and credentials appropriate for the proposed undertaking. The institution proposing the study or serving as the international center should demonstrate that it has a good research record, preferably in international research. The institution must show that it possesses among its staff the necessary organizational, language, psychometric, statistical, probability sampling, data management, and specific subject-matter skills, as well as staff who have a thorough knowledge of the principal ideas behind the educational systems that are included in the study and experience working with researchers in different countries and cultures. The individuals who coordinate the study within individual countries are also key for success of the study. They should have a very thorough knowledge of their own educational systems and of the subject areas under study, and they should have some experience with survey research.
To participate effectively in the international planning meetings they need to speak the international common language, which currently is English. Cross-national study organizers need to ensure that participating nations have available sufficient expertise to enable them to fulfill their obligations.

In addition to ensuring that the researchers involved possess the appropriate background and training, evidence should be
provided that financial resources being sought for the proposed study (or, occasionally, already available) are sufficient to conduct the study in a technically valid manner. The matter of sufficient resources is particularly significant. Past experience suggests that proposed studies are frequently well conceived, but that they later develop operational flaws due to debilitating compromises necessitated by inadequate resources. International studies cost more than national studies, but without realistic funding neither the quality of the work nor adherence to time schedules can be guaranteed. The board encourages organizations that are planning international studies and researchers who undertake responsibility for a country's participation in a study to avoid such situations by ensuring from the outset, to the extent reasonable, that adequate resources exist or will be obtained.

Prior to undertaking a study, the organization responsible for the international aspects of the study should have firm funding commitments for international planning (both theoretical and operative); coordination; instrument development; training; data cleaning; analysis; and data documentation, preservation, and dissemination. The study plan should demonstrate that the steps of the study are well integrated and mapped out in advance. Provision should be made for an initial task force to secure pertinent expert advice, and sufficient time should be provided to secure funding from multiple sources. Schedules and budgets should be realistic and should cover data analysis, reporting, and dissemination as well as study design and data collection. Finally, it is important to ascertain whether a proposed study is overly ambitious.
Would participating countries have the personnel and financial capacity and endurance to complete a study with large numbers of instruments and questions, which would take up to 7 years, or would a more modest study be more productive in the long run?

Technical Validity

A complex education study may serve a variety of descriptive or explanatory needs, but its primary justification is likely to rest on the few central questions or issues it is designed to address. For any proposed international study, these key questions or issues should be explicit. In an explanatory study, the relationship of the issues to existing knowledge should be clear, and the study should be technically capable of addressing those issues. The proposed methodology, design, and statistics should fit the underlying model.

The more specific guidelines that follow are subordinate to this general principle. Their importance to any particular study will depend on the major purposes the study is intended to serve. They are directed primarily to cross-national student achievement studies, which have been the focus of most of the board's early activity. The board's scope of activity is expanding and later revisions of the principles will include specific guidelines for other kinds of comparative studies of education, for example, studies that attempt to explain how differences in attainment are produced or those that focus on more culture-bound factors.

Sampling and Access to Schools

Nearly all quantitative studies, both descriptive and explanatory, as well as some qualitative studies, necessitate drawing a sample from the full population of all respondents, that is, all teachers, all administrators, all students at an age or grade level, or all policy makers. Valid estimation of population parameters from sample data depends critically on rigorous adherence to an explicit sample design. Whenever statistical inference from a sample to a population is intended, proposals for international comparative studies should describe in appropriate detail their plans for framing and selecting samples in participating countries as well as for exclusion of particular subgroups (e.g., persons who are developmentally disabled or who do not speak the language of the test). Subgroups should not be excluded solely for convenience in administering a test: for example, students not in the modal grade for the target population should not be excluded.
Whenever a subgroup is excluded, information should be provided on the portion of the target population excluded and the extent and direction of bias introduced by the exclusion. Potential differences in student demographics among countries must also be considered. The population of students in countries in which the rate of participation in education is low may
be very different from the population sampled in a country where the participation rate is high.

Each sample should be designed so as to support reasonably accurate inferences about an age or grade cohort, and the proportion of each cohort covered should be carefully estimated and reported. The sample should be designed to ensure it captures the range of individual, school, or classroom variation that exists in the nation sampled. Explicit delineation of the populations and subpopulations to be sampled is critical. Within-country samples may be defined according to geographic regions, language groups, school systems or sectors (e.g., public versus private), or other relevant stratification variables. The board recognizes the difficulty of defining comparable samples across different nations' school systems and curricula. Nonetheless, corresponding national samples should be defined in such a way that valid and informative cross-national comparisons are possible.

To facilitate the sample selection, an international sampling manual is essential. In view of the complexities in this area, the board encourages the appointment of an experienced and expert sampling consultant to scrutinize sampling plans in all participating countries. Individual country samples should be approved by the international sampling consultant before testing takes place.

Well in advance of the date for test administration, arrangements should be made with the appropriate organization or individuals (ministry, state, district, school, teachers) to ensure high participation rates in the study. While the principle of strict adherence to an explicit sample design is sound, the achieved sample in actual international studies is usually different from the designed sample, especially so in countries in which response rates are low.
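The reporting of cohort coverage, exclusions, and achieved versus designed samples described above can be illustrated with a small calculation. The following is a minimal sketch in Python, not a procedure prescribed by the board. The `CountrySample` record, its field names, and the function names are hypothetical, and the exclusion-bias identity assumes that a mean for the excluded subgroup can be estimated from some auxiliary source, which is often not possible in practice.

```python
from dataclasses import dataclass


@dataclass
class CountrySample:
    """Hypothetical summary of one country's sample (illustrative fields only)."""
    name: str
    cohort_size: int   # students in the target age or grade cohort
    excluded: int      # cohort members excluded from the sampling frame
    designed_n: int    # students selected under the explicit sample design
    achieved_n: int    # selected students who actually participated


def coverage(s: CountrySample) -> float:
    """Proportion of the cohort covered by the sampling frame."""
    return (s.cohort_size - s.excluded) / s.cohort_size


def response_rate(s: CountrySample) -> float:
    """Achieved sample as a proportion of the designed sample."""
    return s.achieved_n / s.designed_n


def exclusion_bias(excluded_share: float,
                   mean_included: float,
                   mean_excluded: float) -> float:
    """Bias in a cohort mean when only the included group is reported.

    If p is the excluded share, the full-cohort mean is
    (1 - p) * mean_included + p * mean_excluded, so
    mean_included - full_mean = p * (mean_included - mean_excluded).
    A positive value means the reported mean overstates the cohort mean.
    """
    return excluded_share * (mean_included - mean_excluded)
```

For instance, under these assumptions, excluding 5 percent of a cohort whose mean score would be 100 points below that of the included students overstates the cohort mean by 5 points, which shows both the extent and the direction of the bias.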
The sampling manual should include a maximum acceptable nonresponse rate for inclusion of a country's data in the international analyses.

Subnational or regional units smaller than a nation should be allowed to participate in international studies if they have separate autonomous school systems. However, study results for such units should be reported in separate tables from the data for whole nations.

Even though the sample designs for large-scale studies satisfy the criteria described above, typically they cannot afford
the close direct observations that qualitative educational researchers want. Smaller in-depth studies of relatively small, localized samples in a small number of sites can also play an important role in comparative education research and policy development.

Content Sampling and Design of Achievement Items

Achievement items in an international comparative study may be used to support inferences about broad curriculum areas. Thus, it is critical that they be chosen according to an explicit and justified plan. The curricula of all participating nations should be considered in formulating such a plan, and content specifications should be developed through a consensual process involving representatives from all of the nations involved. Ample time should be allowed for meetings on content sampling and design of achievement items. At these meetings, information should be available on the purpose of each item, to assist the country representatives in selecting those that will evaluate the most important knowledge and skills.

In general, coverage should be broadly inclusive. It will probably be desirable to assess a core of learning objectives common to most participating nations, but if there is general agreement on the importance of relevant, measurable learning outcomes that do not appear in participating nations' curricula, they may be included. It may also be desirable to include objectives in other domains, for example, student attitudes, values, and creativity. Matrix sampling (i.e., dividing the items to be included into subsamples to be administered to different students) might be considered as a means to increase the number and diversity of test questions included without unduly burdening individual survey respondents.
The validity of test items should be reviewed by teams of experts that include cognitive scientists, educational psychologists, and curriculum or methods specialists in the relevant disciplines. The board recognizes the complexity of sampling curriculum content and the intractable problems of interpretation when comparing student outcomes for countries with very different learning objectives.
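The matrix sampling idea mentioned in this section, dividing the item pool into subsamples given to different students, can be sketched as follows. This is a minimal illustration in Python under simplifying assumptions: the function names are hypothetical, items are dealt into disjoint booklets round-robin, and students are assigned booklets in a seeded random order. Operational designs are usually more elaborate (for example, overlapping booklets that allow items to be linked), which this sketch does not attempt.

```python
import random


def build_booklets(items, n_booklets):
    """Split the item pool into n_booklets disjoint subsamples (round-robin)."""
    booklets = [[] for _ in range(n_booklets)]
    for i, item in enumerate(items):
        booklets[i % n_booklets].append(item)
    return booklets


def assign_booklets(students, n_booklets, seed=0):
    """Give each student one booklet index, balanced across booklets.

    Students are shuffled with a seeded generator so the assignment is
    random but reproducible; booklet sizes differ by at most one student.
    """
    rng = random.Random(seed)
    order = list(students)
    rng.shuffle(order)
    return {student: i % n_booklets for i, student in enumerate(order)}
```

With 12 items and 3 booklets, each student answers only 4 items while the sample as a whole still covers all 12, which is the trade-off the text describes: broader content coverage without a longer test for any one respondent.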

Coverage of Performance and Higher-Order Skills

When assessing student performance, objective questions can offer considerable assessment efficiencies relative to free-response items (such as open-ended questions), and multiple-choice paper-and-pencil items can be designed to measure some higher-order skills. Nonetheless, consideration should also be given to the inclusion of test items and other data collection formats offering opportunities for students to display their performance abilities. Increased emphasis should be placed on writing, speaking, and interacting in both practical and school tasks. For example, reading, writing, and problem solving might be assessed in the context of particular subject areas. When feasible, complex conceptual knowledge, process skills, and higher-order thinking should be assessed, as well as important factual knowledge, basic skills, and other outcomes usually achieved earlier and considered prerequisite for higher-level learning. Of course, there are economic considerations that must be taken into account in any study that uses "hands-on" assessment activities, but in most cases time and resources should be reserved to make some open-ended tasks possible.

Instrument Construction

Test Instruments

There may be sound reasons to use existing test instruments in international comparative studies, including continuity with earlier studies and linkage to other ongoing studies, as well as economy and efficiency. When new instruments are developed, however, they should adhere to high standards. Test content should represent a reasoned balance among the curricula and the information needs of all nations to be included in a study. The test development process should allow for participation by representatives of the various nations involved and should be informed by expertise in the curriculum area assessed, in the cognate academic discipline, and in educational measurement.
Care should be taken to avoid redundancy among the questions. If new measures are proposed, there should be evidence

that the measure works in at least one country before it is included in an international study.

Whenever corresponding tests in more than one language must be prepared, the test should include some items originating in each of the languages represented. Consideration should be given to the development of parallel text materials that are constructed simultaneously within the cultural context of the different nations, rather than simply translated. If this is not feasible economically, and translation is used, all exercises should be back-translated to enhance accuracy and comparability. In addition, qualified bilingual experts should scrutinize pairs of tests, item by item, for unintended differences in emphasis or levels of abstraction. Care must be taken to ensure the equivalence of meaning of an item in the different languages.

New or substantially revised tests should be pilot-tested to ensure the quality of individual items and instructions to examiners, as well as the appropriateness of time limits for the questionnaire. Following the pilot test, a check should be made for item bias, including cultural bias or translation bias, by examining the difficulty of an item relative to other items in a subtest or domain. A check should also be made of the appropriateness of any statistical model used for scaling to ensure that it can cover the total range of scaled scores from all countries before the tests are used in any main testing.

A standardized research design across countries is essential, although national or international options can be added. Other modifications of the standardized design should not be permitted, since they can have serious consequences for validity or comparability.

Background Questionnaires

Educational achievement data cannot be appropriately interpreted in the absence of information about responding students, their backgrounds, their motivations, and their educational experiences. For cross-national studies of achievement test scores, it is especially critical that such information be collected. Background questions should be selected judiciously, and particular attention should be given to matters such as variables (a) relevant to the interpretation of achievement patterns, (b) plausibly related

to school achievement (including locally available educational resources), or (c) reflecting additional schooling outcomes valued in their own right.

Explanatory studies that rely on quantitative data should generally not rely exclusively on students' own reports of such factors. Such studies should also include instruments directed to teachers, administrators, and parents. For example, teachers or curriculum coordinators might be asked about the availability and use of particular instructional materials, local curriculum, or specific instructional practices.

A structural model that postulates cause-effect relationships to account for variation in student achievement should be used in selecting background questions. The model can also guide the analyses directed to identifying the sources of individual and group differences in achievement and the relative impact of these sources.

Background variables about students seek to explore the relationship between students' background and home environments and achievement and attitudes. For example, information might be requested about the students (age, gender, race or ethnicity), indicators of family environment, parental encouragement, and attitudes toward school assignments in the subject matter being assessed. Information sought from teachers might include information about their teaching experience, availability and use of particular instructional materials, local curriculum, and classroom environment. School administrators might be asked for data on school factors believed to influence student achievement, such as instructional time, student enrollment and attendance, and programs in the subject area.

Background information collected from students, teachers, and school administrators can be supplemented by data from other sources that provide economic and social indicators for the various nations participating in the study.
Economic and social indicators can be related to student achievement in various sectors of the population (e.g., rural or urban) and can also be used to explore the relationship of student achievement to economic development, resource development, industrialization, political stability, and the like across nations.

Representatives of all the countries participating in a study should be involved in developing background questionnaires as they are for the test instruments. Similarly, care should be

given to translation, back-translation, and scrutiny of background questions to ensure the equivalence of meaning of a question in different languages. The background questionnaires should be pilot-tested.

Because background data become more valuable if they can be compared over time and across populations, the same wording should be retained from study to study. Although it is difficult, effort should be made to ensure that background variables are defined similarly in the languages of all participating nations. Similar effort is required to ensure the comparability of social and economic indicators for all participating nations. All variant definitions should be documented.

Test Administration

Whenever achievement results are to be compared from one test administration to another, it is imperative that administrative procedures be controlled to be as nearly identical as possible. Maintenance of standard test administration procedures over time and from one nation to another is of paramount importance.

Standardized procedures for instructing students and establishing conditions for testing should be developed, based on a pilot test of the instructions in each participant country. Time should be allotted at an international meeting of study coordinators to listen to their complaints and suggestions following the pilot test and to agree to standard administrative procedures. Testing materials should be clearly understandable. The testing environment should be comparable from one setting to another both within and across nations and should be free from distractions.

Each study design should address plans to control and standardize conditions of test administration. Ideally, to ensure adequate quality control, suitably trained people from outside the schools should be in charge of the test administration.
In addition, people from different countries should supervise the implementation of the procedures to be followed (previously agreed on by the countries involved) by being present on site when the field work is conducted. Such quality control procedures would assure more uniform test administration, particularly in countries with little experience in assessment. Each design

also should address the level of student motivation to try to minimize any plausible systematic differences from one nation (and from one test administration within a nation) to another in incentive to perform well in response to test questions. Each country report should carry a description of test administration conditions.

Plans for Analysis, Reporting, and Dissemination

Plans for analysis, reporting, and dissemination of international comparative study findings should be described at the time the study is proposed and should indicate how the critical questions to be informed by that study will be addressed. These plans should provide for balanced reporting of cross-national comparisons and may also involve separate analysis and reporting of data from each participating nation or subsets of them.

The board discourages exclusive, or even heavy, reliance on overall national rankings. Very often differences in educational systems render such comparisons invalid; a more productive approach is to find out the reasons for observed differences in pupil achievement. Prior to the release of any cross-national report, opportunities should be provided to all nations for review of the analysis and interpretations.

Without dwelling on them too much, reports should give prominent place to a discussion of the known and surmised limitations. Reporting should be sensitive to contextual factors that might affect test validity, for example, the relative familiarity of children in different countries with testing in general or with the particular item formats used in a comparative study. The possibility might also be considered that children who are exposed to a great deal of testing may expend less effort on "low stakes" tests they know do not matter for their own educational futures. Reporting should also be sensitive to technical limitations on a study's interpretability.
Limitations might include caveats about the comparability of national samples, the limited number of test items or range of content on which comparisons are based, differences in administration conditions from place to place, the match of tests to different curricula, the difficulty of translating exercises from one language to another, the limited precision of sample statistics, or other qualifications on study findings.

Analysis Plan

For various reasons, data analysis plans may change or evolve from the time a study is designed to the time it is completed and reported. Unforeseen difficulties in data collection or limitations of data quality may preclude some planned analyses. New questions or insights that occur in the course of data collection and analysis may open productive new lines of inquiry. Data already collected may be pressed into service to address emergent policy issues. Even when such evolution is anticipated, however, every proposal for an international educational study should include an analysis plan. The correspondence between the analyses proposed and the questions they are intended to answer, if not obvious, should be made explicit. In both explanatory and descriptive studies, it should be clear how theoretically central variables are to be measured and how relationships among critical variables are to be assessed. In qualitative studies, methods of examining and relating alternative data sources should be indicated, and anticipated procedures for developing conceptual or explanatory frameworks should be described.

Level of Detail in Reporting

In any complex study, there is a tension between the level of detail and the precision of the reported results. At one extreme, an average score over a large number of test items for an entire nation may be estimated quite precisely, but it conveys little information. At the other extreme, reports of numerous quantiles of the score distributions for narrow student subpopulations on individual items may be so poorly estimated that they also convey little information. However this tension is resolved, it is crucial that standard errors be calculated and reported with all reported statistics. Calculation of standard errors is technically complex, and the board encourages the use of a recognized expert consultant in this and other analysis stages, as it does for sampling.
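
As one hedged illustration of reporting a standard error alongside a statistic, the sketch below estimates a mean score and its standard error with a delete-one-school jackknife. The jackknife is an assumption chosen here because students are usually sampled in school clusters; the text does not prescribe a method, and a real study would also need sampling weights and stratification. The function name and the sample data are hypothetical.

```python
import math
from statistics import mean


def jackknife_se_of_mean(scores_by_school):
    """Return (mean score, standard error) using a delete-one-school jackknife.

    Assumption for illustration only: schools are the sampled clusters and
    every student carries equal weight (no sampling weights, no strata).
    """
    schools = list(scores_by_school.values())
    n_schools = len(schools)
    if n_schools < 2:
        raise ValueError("need at least two schools to estimate a standard error")

    total_sum = sum(sum(scores) for scores in schools)
    total_count = sum(len(scores) for scores in schools)
    full_estimate = total_sum / total_count

    # Re-estimate the overall mean with each school left out in turn.
    replicates = [
        (total_sum - sum(scores)) / (total_count - len(scores))
        for scores in schools
    ]
    replicate_mean = mean(replicates)

    # Standard delete-one jackknife variance formula.
    variance = (n_schools - 1) / n_schools * sum(
        (r - replicate_mean) ** 2 for r in replicates
    )
    return full_estimate, math.sqrt(variance)


# Hypothetical scores for three sampled schools.
example = {"school_a": [10, 20], "school_b": [30, 40], "school_c": [50, 60]}
estimate, std_error = jackknife_se_of_mean(example)
print(f"mean = {estimate:.1f}, standard error = {std_error:.2f}")
```

Treating schools rather than individual students as the resampling unit matters because students within a school tend to resemble one another, so a formula that assumes independent students would usually understate the standard error.
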
The first issue to be resolved with respect to the appropriate level of detail in reporting is the number and size of subpopulations to be distinguished. Performance may be reported for major subgroups of student cohorts, defined by geographic region,

language background, gender, race and ethnicity, or other variables, if such reporting advances the purposes of the study.

When achievement is reported, the utility of multiple scores should be considered. In many cases, interpretive emphasis is properly given to major content and process categories rather than to total scores.

Finally, within the limits on precision imposed by the design and size of a study, distributional summaries should be given and not just means and standard deviations. Reporting of quantiles (e.g., deciles or quartiles) is one method that is readily explained and understood, and graphics such as box plots are easily understood and of potential value. Consideration may also be given to reporting at multiple levels of aggregation if that is appropriate to the design and intent of the study. In addition to presenting the student-level score distribution, for example, distributions of classroom or school means might also be reported.

Standards and Criterion Levels

Studies concerned with student achievement data can be enhanced considerably by reporting outcomes in terms of performance standards, for example, the percentage of students who know everyday science facts or who use scientific procedures and analyze scientific data. This can be difficult to accomplish, however, and there is a risk that arbitrarily established standards will lead to serious misinterpretations of achievement levels. If results are reported relative to specified performance levels (e.g., functional literacy), the basis for establishing these levels must be explicit, defensible, and responsive to the needs and contexts of all the nations involved. This might imply the use of different criterion levels for cross-national reporting than for national reporting. Alternatively, a graduated series of proficiency levels might be defined, labeled with appropriate descriptors, and illustrated with representative test items.
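
The distributional summaries and the graduated proficiency levels described above can be sketched in a few lines. This is a minimal illustration, not a recommended procedure: the function names, the level labels, and the cut scores are hypothetical, and in practice the cut scores would need the explicit, defensible basis the text calls for.

```python
from statistics import quantiles


def distribution_summary(scores):
    """Return quartile and decile cut points for a list of scores.

    statistics.quantiles returns n - 1 cut points for n equal-probability
    groups, so this gives 3 quartile points and 9 decile points.
    """
    return {
        "quartiles": quantiles(scores, n=4),
        "deciles": quantiles(scores, n=10),
    }


def percent_at_or_above(scores, cut_scores):
    """Return the percentage of scores at or above each named cut score.

    cut_scores maps a hypothetical level label to its minimum score.
    """
    total = len(scores)
    if total == 0:
        raise ValueError("scores must not be empty")
    return {
        label: 100.0 * sum(1 for s in scores if s >= cut) / total
        for label, cut in cut_scores.items()
    }


# Hypothetical data: scores 1..100 and two illustrative proficiency levels.
example_scores = list(range(1, 101))
example_levels = {"basic": 51, "advanced": 91}

print(distribution_summary(example_scores)["quartiles"])
print(percent_at_or_above(example_scores, example_levels))
```

Each percentage reported this way is itself a sample statistic, so it would normally be published together with its standard error.
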
Special Reports for Nontechnical Audiences

Special reports should be prepared for nontechnical audiences, including the press, politicians, and policy makers. These

reports, which are designed to serve political purposes, differ from the more detailed reports intended for research and educational purposes. They should be designed so that the information is easily assimilated. Useful analytic tools for such reports include simple graphs, percentiles, and a graduated series of proficiency levels with illustrative test items.

Preparation of this type of report plays a role in institutional capacity building by forging links between the research and policy making communities. It also augments the dissemination of the latest information and techniques and will enhance long-term funding prospects.

Study proposals should provide for mechanisms to disseminate results widely among public and private organizations. Such dissemination stimulates debate, which makes it more likely that study findings will be put into practice.

Data Audit and Evaluation

Experience has shown that national researchers make many changes in background questionnaires from the intent of the international questions. This leads to nonconformity of data to the international code book, which requires extensive work by the international coordinators to clean the data. In some cases it is desirable to produce a data-entry program and a data-cleaning program for the use of national research coordinators.

The technical features of any international comparative study should be clearly documented. It is desirable that at least a summary of the methods involved be included in the principal reports, along with estimates of sampling precision.
More detailed documentation, which might be published in a separate volume from the main report of the study, should address such matters as maintenance of the security of test materials before the actual testing; sampling adequacy (participation rate, attrition, absentee follow-up); comparability of administration conditions; procedures for audit of data collection; data checking, cleaning, and scoring; procedures for review of study reports prior to publication; and other procedural matters that may condition the confidence placed in study findings.
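
To make the idea of a data-cleaning program concrete, the following sketch checks national records against an international code book and lists every departure from it. It is only an illustration under stated assumptions: the code book layout (variable name mapped to a set of allowed codes), the variable names, and the codes themselves are all hypothetical.

```python
def audit_records(records, codebook):
    """List departures of national records from an international code book.

    records  : list of dicts, one per respondent (variable name -> code)
    codebook : dict mapping each expected variable to its set of allowed codes
    Returns a list of (row number, variable, problem description) tuples.
    """
    problems = []
    for row_number, record in enumerate(records, start=1):
        # Every variable in the code book must be present with a valid code.
        for variable, allowed_codes in codebook.items():
            if variable not in record:
                problems.append((row_number, variable, "missing variable"))
            elif record[variable] not in allowed_codes:
                problems.append(
                    (row_number, variable, f"invalid code {record[variable]!r}")
                )
        # Variables added locally fall outside the international code book.
        for variable in record:
            if variable not in codebook:
                problems.append((row_number, variable, "not in code book"))
    return problems


# Hypothetical code book and national data file.
example_codebook = {"gender": {1, 2, 9}, "books_at_home": {1, 2, 3, 4, 9}}
example_records = [
    {"gender": 1, "books_at_home": 3},
    {"gender": 5, "books_at_home": 2},
    {"gender": 2, "tv_hours": 4},
]

for problem in audit_records(example_records, example_codebook):
    print(problem)
```

Running a check like this at data entry, rather than after the national files reach the international coordinators, is one way to reduce the cleaning burden described above.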

Public Use of Data

Countries participating in studies should be authorized to release their own findings as soon as the national data file is cleaned, merged into the international file, and ready for analysis. Provisions should be made to ensure that, when appropriate and within a reasonable period after analysis and reporting by project sponsors, data are placed in the public domain in a form accessible for secondary analysis. Special attention should be paid to making the data accessible to researchers in third-world countries. Clear and complete data documentation is crucial. When feasible, consideration should be given to using existing archives.

The importance of making international data easily accessible for secondary analysis should not be underestimated. More extensive use of the data at the national policy level can help in understanding the weaknesses and strengths of the U.S. educational system as well as those of other countries.