References

Abedi, J., Hofstetter, C., Baker, E., and Lord, C. (2001). NAEP math performance and test accommodations: Interactions with student language background. Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing, University of California.

Abedi, J., Lord, C., and Hofstetter, C. (1998). Impact of selected background variables on students’ NAEP math performance. Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing, University of California.

Achieve, Inc. (2002). Staying on course: Standards-based reform in America’s schools: Progress and prospects. Washington, DC: Author.

Almond, R.G., Steinberg, L.S., and Mislevy, R.J. (2001). A sample assessment using the four process framework. (CSE Technical Report No. 543). Los Angeles: Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing, University of California. Available: http://www.cse.ucla.edu/products/reports_set.htm [accessed June 2005].

Almond, R.G., Steinberg, L.S., and Mislevy, R.J. (2002). Enhancing the design and delivery of assessment systems: A four-process architecture. Journal of Technology, Learning, and Assessment, 1(5). Available: http://www.bc.edu/research/intasc/jtla/journal/v1n5.shtml [accessed June 2005].

American Association for the Advancement of Science. (1989). Science for all Americans: A Project 2061 report on literacy goals in science, mathematics, and technology. Washington, DC: Author.

American Association for the Advancement of Science. (1993). Benchmarks for science literacy. New York: Oxford University Press.

American Association for the Advancement of Science. (2001). Atlas of science literacy. Washington, DC: Author.

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Authors.

American Federation of Teachers. (1996). Making standards matter. Washington, DC: Author.

American Federation of Teachers. (1999). Making standards matter 1999: An update on state activity. Available: http://www.aft.org/pubs-reports/downloads/teachers/policy11.pdf [accessed June 2005].



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

Systems for State Science Assessment American Federation of Teachers. (2001). Making standards matter 2001. Washington, DC: Author. Archibald, D.A. (1998, July). The reviews of state content standards in English language arts and mathematics: A summary and review of their methods and findings and implications for future standards development. Paper commissioned by the National Educational Goals Panel. Available: http://govinfo.library.unt.edu/negp/reports/810fin.pdf [accessed June 2005]. Ayala, C.C., Yin, Y., Shavelson, R.J., and Vanides J. (2002). Investigating the cognitive validity of science performance assessment with think alouds: Technical aspects. Paper presented at the annual meeting of the American Educational Research Association. New Orleans, LA. Baker, E.L. (1997). Model-based performance assessment. Theory into Practice, 36, 247–254. Baker, E.L. (2003, Summer). Multiple measures: Toward tiered systems. Educational Measurement: Issues and Practice, 22(2), 13–17. Baker, E.L., Abedi, J., Linn, R.L., and Niemi, D. (1996). Dimensionality and generalizability of domain-independent performance assessments. Journal of Educational Research, 89, 197–205. Baker, E.L., Linn, R.L. Herman, J.L., and Koretz, D. (2002). Standards for educational accountability systems. (Policy Brief No. 5). Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing, University of California. Baron, J.B. (1990). Performance assessment: Blurring the edges among assessment, curriculum, and instruction. In A.B. Champagne, B.E. Lovitts, and B.J. Calinger (Eds.), Assessment in the service of instruction: This year in school science 1990. Washington, DC: American Association for the Advancement of Science. Baxter, G.P., and Glaser, R. (1998). Investigating the cognitive complexity of science assessments. Educational Measurement: Issues and Practices, 17, 37–45. Baxter, G.P., Elder, A.D., and Glaser, R. (1996). 
Knowledge-based cognition and performance assessment in the science classroom. Educational Psychologist, 31(2), 133–140. Bejar, I.I. (1996). Generative response modeling: Leveraging the computer as a test delivery medium. (ETS Research Report No. 96–13). Princeton, NJ: Educational Testing Service. Bennett, R. (1998). Reinventing assessment: Speculations on the future of large scale educational testing. Princeton, NJ: Educational Testing Service, Policy and Information Centre. Bennett, R.E. (2002). Using electronic assessment to measure student performance. (Issue Brief). Washington, DC: NGA Center for Best Practices. Available: http://www.nga.org/cda/files/ELECTRONICASSESSMENT.pdf [accessed June 2005]. Blank, R., and Pechman, E. (1995). State curriculum frameworks in mathematics and science: How are they changing across the states? Washington, DC: Council of Chief State School Officers. Blumenfeld, P., Soloway, E., Marx, R., Krajcik, J.S., Guzdial, M., and Palincsar, A. (1991). Motivating project-based learning. Educational Psychologist, 26(3 & 4), 369–398. Bond, L. (2000). Good grades, low test scores: A study of the achievement gap in measures of quantitative reasoning. Paper presented at the Fifth Annual National Institute for Science Education Forum, May 22–23, Detroit, MI. Borko, H., and Elliott, R. (1998). Tensions between competing pedagogical and accountability commitments for exemplary teachers of mathematics in Kentucky. (CSE Technical Report No. 495). Los Angeles: Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing, University of California. Borko, H., and Stecher, B.M. (2001, April). Looking at reform through different methodological lenses: Survey and case studies of the Washington state education reform. 
Paper presented as part of the symposium Testing Policy and Teaching Practice: A Multimethod Examination of Two States at the annual meeting of the American Educational Research Association, Seattle, WA. Boston, C., Rudner, L., Walker, L., and Crouch, L. (Eds.). (2003). What reporters need to know about test scores. Washington, DC: Education Writers Association and ERIC Clearinghouse on Assessment and Evaluation. Bransford, J.D. (1979). Human cognition: Learning, understanding, and remembering. Belmont, CA: Wadsworth.

OCR for page 171
Systems for State Science Assessment Brewer, D.J., and Stacz, C. (1996). Enhancing opportunity to learn measures in NCES data. Santa Monica, CA: RAND Corp. Briggs, D., Alonzo, A., Schwab, C., and Wilson, M. (2004). Developmental assessment with ordered multiple-choice items. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA. Brown, J.S., Collins, A., and Duguid, P. (1989). Situated cognition and the culture of learning. Educational Researcher, 18(1), 32–42. Buckendahl, C.W., Impara, J.C., and Plake, B.S. (2002). District accountability without a state assessment: A proposed model. Educational Measurement: Issues and Practice, 21, 6–16. Catley, K., Reiser, B., and Lehrer, R. (2005). Tracing a prospective learning progression for developing understanding of evolution. Commissioned paper prepared for the National Research Council’s Committee on Test Design for K–12 Science Achievement, Washington, DC. Champagne, A.B., and Kouba, V.L. (1996, October). Science literacy: A cognitive perspective. Paper presented at the College Board Forum, New York. Champagne, A.B., and Newell, S. (1994). Directions for research and development: Alternative methods of assessing scientific literacy. Journal of Research in Science Teaching, 29, 841–860. Champagne, A.B., Kouba, V.L., and Hurley, M. (2000). Assessing inquiry. In J. Minstrell and E.H. Van Zee (Eds.), Inquiring into inquiry learning and teaching in science (pp. 447–470). Washington, DC: American Association for the Advancement of Science. Chi, M.T.H., Feltovich, P.J., and Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5:121–152. Chi, M.T.H., Glaser, R., and Rees, E. (1982). Expertise in problem solving. In R. Sternberg (Ed.), Advances in the psychology of human intelligence (pp. 7–75). Hillsdale, NJ: Lawrence Erlbaum Associates. Chiu, C.W.T., and Pearson, P.D. (1999). 
Synthesizing the effects of test accommodations for special education and limited English proficient students. Paper presented at the National Conference on Large-Scale Assessment, Snowbird, UT. Choi, K., Seltzer, M., Herman, J., and Yamachiro, K. (2004). Children left behind: Focusing on the distribution of student growth in longitudinal studies. Part of the paper session Using Data Accountability Systems to Judge Schools and Reform Efforts presented at 2004 Annual Meeting of the American Educational Research Association, April 12–16, San Diego, CA. Clotfelter, C.T., Ladd, H.F., and Vigdor, J.L. (2002). Who teaches whom? Race and the distribution of novice teachers. Paper presented at the American Economic Association Annual Meeting, January, Atlanta, GA. Clotfelter, C.T., Ladd, H.F., and Vigdor, J.L. (2004). Teacher sorting, teacher shopping, and the assessment of teacher effectiveness. Available: http://trinity.aas.duke.edu/~jvigdor/tsaer5.pdf [accessed June 2005]. Cohen, D., and Ball, D. (1990). Policy and practice: An overview. Educational Evaluation and Policy Analysis, 12(3): 347–353. Collins, A., and Smith, E.E. (1982). Teaching the process of reading comprehension. In D.K. Detterman and R.J. Sternberg (Eds.), How much and how can intelligence be increased? Norwood, NJ: Ablex. Commission on Instructionally Supportive Assessment. (2001). Building tests that support instruction and accountability: A guide for policymakers. Washington, DC: Author. Consortium for Policy Research in Education. (1993). Developing content standards: Creating a process for change. (Policy Brief No. RB–10–10/93). New Brunswick, NJ: Author, Rutgers University. Cross, R.W., Rebarber, T., and Torres, J. (2004). Grading the systems: The guide to state standards, tests, and accountability policies. Washington, DC: Thomas B. Fordham Foundation. Darling-Hammond, L. (1998). Teacher learning that supports student learning. Educational Leadership, 55(5), 6–11.

OCR for page 171
Systems for State Science Assessment Darling-Hammond, L. (1999). Teacher quality and student achievement: A review of state policy evidence. Seattle, WA: University of Washington, Center for the Study of Teaching and Policy. Doherty, K., and Skinner, R. (2003). State of the states. Quality counts 2003 special report. Education Week, 22(17), 75–76, 78. Downing, S., and Haladyna, T.M. (in press). Handbook of test development. Mahwah, NJ: Lawrence Erlbaum Associates. Duschl, R. (2003). Assessment of inquiry. In J.M. Atkin and J.E. Coffey (Eds.), Everyday assessment (pp. 41–60). Arlington, VA: National Science Teachers Association Press. Education Week. (2002). Quality counts 2002: Building blocks for success. Education Week, 21(16), 8–9. Education Week. (2003). Technology counts 2003: Tech’s answer to testing. Education Week, 22(35), 8–10. Education Week (2004). Quality counts 2004: Count me in: Special education in an era of standards. 23(17), January 8. Elliott, S.N., Kratochwill, T.R., and McKevitt, B.C. (2001). Experimental analysis of the effects of testing accommodations on the scores of students with and without disabilities. Journal of School Psychology, 39(1), 3–24. Figlio, D.N., and Rouse, C.E. (2004). Do accountability and voucher threats improve low-performing schools? Available: http://www.aeaweb.org/annual_mtg_papers/2005/0109_0800_0303.pdf [accessed June 2005]. Figlio, D.N., and Rueben, K.S. (2001). Tax limits and the qualifications of new teachers. Journal of Public Economics, 80(1), 49–71. Finn, C.E., and Petrilli, M.J. (2000). The state of state standards, 2000: English, history, geography, mathematics, and science. (ERIC Document Reproduction Service No. ED 439 133). Washington, DC: Thomas B. Fordham Foundation. Firestone, W.A., Camilli, G., Yurecko, M., Monfils, L., and Mayrowetz, D. (2000, April). State standards, socio-fiscal context and opportunity to learn in New Jersey. 
Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA. Available: http://epaa.asu.edu/epaa/v8n35/ [accessed June 2005]. Firestone, W.A., Mayrowetz, D., and Fairman, J. (1998). Performance-based assessment and instructional change: The effects of testing in Maine and Maryland. Education Evaluation and Policy Analysis, 20, 95–113. Frederiksen, J.R., and Collins, A. (1989). A systems approach to educational testing. Educational Researcher, 18(9), 27–32. Galison, P. (1997). Image and logic: A material culture of microphysics. Chicago, IL: University of Chicago Press. Glaser, R. (1992). Expert knowledge and processes of thinking. In D.F. Halpern (Ed.), Enhancing thinking skills in the sciences and mathematics (pp. 63–75). Hillsdale, NJ: Lawrence Erlbaum Associates. Glaser, R., and Baxter, G.P. (1999). Assessing active knowledge. Paper presented at the Center for Research on Evaluation, Standards, and Student Testing Conference Benchmarks for Accountability: Are We There Yet?, September 16–17, University of California, Los Angeles. Glaser, R., and Chi, M. (1988). Overview. In M. Chi, R. Glaser, and M.J. Parr (Eds.), The nature of expertise. Hillsdale, NJ: Lawrence Erlbaum Associates. Goldberg, G.L., and Rosewell, B.S. (2000). From perception to practice: The impact of teachers’ scoring experience on performance based instruction and classroom practice. Educational Assessment, 6, 257–290. Goodman, D.P., and Hambleton, R.K. (2003). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. (Center for Educational Assessment Research Report No. 477). Amherst, MA: University of Massachusetts School of Education.

OCR for page 171
Systems for State Science Assessment Goodwin, B., Englert, K., and Cicchinelli, L.F. (2003). Comprehensive accountability systems: A framework for evaluation (rev. ed.). Aurora, CO: Mid-continent Research for Education and Learning. Gummer, E., and Champagne, A.B. (2005). Classroom assessment of opportunity to learn science through inquiry. In Lawrence B.Flick and Norman G. Lederman (Eds.), Scientific inquiry and nature of science: Implications for teaching, learning, and teacher education. Dordrecht: Kluwer Academic Publishers. Haertel, E.H., and Lorie, W.A. (2004). Validating standards-based score interpretations. Measurement: Interdisciplinary Research and Perspectives, 2(2), 61–103. Hambleton, R.K., and Slater, S.C. (1997). Reliability of credentialing examinations and the impact of scoring models and standard setting policies. Applied Measurement in Education, 10, 19–38. Hansche, L.N. (1998). Handbook for the development of performance standards: Meeting the requirements of Title I. Washington, DC: Council of Chief State School Officers. Hawley, W.D., and Valli, L. (1999). The essentials of professional development: A new consensus. In L. Darling-Hammond and G. Sykes (Eds.), Teaching as the learning profession: Handbook of policy and practice (pp.127–150). San Francisco, CA: Jossey-Bass. Herman, J. (2003). The effects of testing instruction. In S. Fuhrman and R. Elmore (Eds.), Redesigning accountability systems for education. New York: Teachers College Press. Herman, J., and Golan, S. (1991). Effects of standardized tests on teachers and learning—Another look. (CSE Technical Report No. 334). Los Angeles: Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing, University of California. Herman, J.L., and Klein, D. (1996). Evaluating equity in alternative assessment: An illustration of opportunity to learn issues. Journal of Educational Research, 89(9), 246–256. Herman, J.L., and Perry, M. (2002, June). 
California student achievement: Multiple views of K–12 progress. Menlo Park, CA: Ed Source. Herman, J.L., Baker, E.L., and Linn, R.L. (2004, Spring). Accountability systems in support of student learning: Moving to the next generation. CRESST Line, pp. 1–7. Available: http://www.cse.ucla.edu/products/newsletters/CLspring2004.pdf. Hestenes, D. (1992). Modeling games in the Newtonian world. American Journal of Physics, 60, 732–748. Hestenes, D., Wells, M., and Swackhamer, G. (1992). Force concept inventory. The Physics Teacher, 30, 141–158. Hoz, R., Bowman, D., and Chacham, T. (1997). Psychometric and edumetric validity of geomorphological knowledge which are tapped by concept mapping. Journal of Research in Science Teaching, 34(9), 925–947. Impara, J.C. (2001). Alignment: One element of an assessment’s instructional unity. Paper presented at the 2001 annual meeting of the National Council on Measurement in Education. Seattle, WA. Irvine, S.H., and Kyllonen, P.C. (Eds.). (2002). Item generation for test development. Mahwah, NJ: Lawrence Erlbaum. Jacob, B. (2003). High stakes in Chicago. Education Next, Winter, 66–72. Jaeger, R.M. (1989). Certification of student competence. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 485–514). New York: Macmillan. Jaeger, R.M. (1995). Setting performance standards through two-stage judgmental policy capturing. Applied Measurement in Education, 8, 15–40. Jaeger, R.M. (1998). Evaluating the psychometric qualities of the National Board for Professional Teaching Standards’ assessments: A methodological accounting. Journal of Personnel Evaluation in Education, 22, 189–210. Jaeger, R.M., Cole, J., Irwin, D.M., and Pratto, D.J. (1980). An interactive structure judgment process for setting passing scores on competency tests applied to the North Carolina high school competency tests in reading and mathematics. Greensboro, NC: Center for Education Research and Evaluation, University of North Carolina.

OCR for page 171
Systems for State Science Assessment Kane, M.T. (2001). So much remains the same: Conception and status of validation in setting standards. In G.J. Cizek (Ed.), Setting performance standards: Concepts, methods and perspectives (pp. 53–88). Mahwah, NJ: Lawrence Erlbaum Associates. Kingston, N., Kahl, S.R., Sweeney, K., and Bay, L. (2001). Setting performance standards using the body of work method. In G.J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives. Mahwah, NJ: Lawrence Erlbaum Associates. Klein, S.P., Hamilton, H., McCaffrey, D., and Stecher, B. (2000). What do test scores in Texas tell us? (Issue paper). Santa Monica, CA: RAND Corp. Available: http://www.rand.org/publications/IP/ IP202/ [accessed June 2005]. Koretz, D. (2005). Alignment, high stakes, and the inflation of test scores. Center for the Study of Evaluation, Report #655. University of California, Los Angeles. Koretz, D.M., and Baron, S.I. (1998). The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS). Santa Monica, CA: RAND Corp. Koretz, D., Barron, S., Mitchell, K., and Stecher, B. (1996). The perceived effects of the Kentucky instructional results information system. (MR–792–PCT/FF). Santa Monica, CA: RAND Corp. Koretz, D., McCaffrey, D., Klein, S., Bell, R., and Stecher, B. (1993). The reliability of scores from the 1992 Vermont portfolio assessment program. (CSE Technical Report No. 355). Los Angeles: Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing, University of California. Kouba, V.L., and Champagne, A.B. (2002). Can external assessments assess science inquiry? In Robert W. Lissitz (Ed.), Optimizing state and classroom tests: Implications of cognitive research for assessments of higher order reasoning in subject-matter domains. College Park: University of Maryland. Kozma, R., and Russell, J. (1997). 
Multimedia and understanding: Expert and novice responses to different representations of chemical phenomena. Journal of Research in Science and Teaching, 43(9), 949–968. Krajcik, J.S., Mamlok, R., and Hug, B. (2000). Modern content and the enterprise of science: Science education in the twentieth century. In L. Corno (Ed.), Education across a century: The centennial volume. (One-hundredth yearbook of the National Society for the Study of Education). Chicago, IL: University of Chicago Press. Lane, S., Parke, C.S., and Stone, C.A. (2002). The impact of a state performance-based assessment and accountability program on mathematics instruction and student learning: Evidence from survey data and school performance. Educational Assessment, 8(4), 279. Lane, S., Stone, C.A., Parke, C.S., Hansen, M.A., and Cerrillo, T.L. (2000). Consequential evidence for MSPAP from the teacher, principal and student perspective. Paper presented at the annual meeting of the National Council on Measurement in Education, April, New Orleans, LA. LaPointe, A.E., Mead, N.A., and Phillips, G.W. (1989). A world of differences: An international assessment of mathematics and science. Princeton, NJ: Educational Testing Service. Larkin, J.H. (1981). Enriching formal knowledge: A model of learning to solve textbook physical problems. In J. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates. Larkin, J.H. (1983). The role of problem representation in physics. In D. Gentner and A. Stevens (Eds.), Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates. Latour, B. (1999). Pandora’s hope: Essays on the reality of science studies. Cambridge, MA: Harvard University Press. Lerner, L.S. (1998). State science standards: An appraisal of science standards in 36 states. Washington, DC: Thomas B. Fordham Foundation. Available: http://lsc-net.terc.edu/do.cfm/paper/8070/show/page-3/use_set-l_standards [accessed June 2005]. Lerner, L.S. (2000). 
The state of state standards in science. In C.E. Finn and M.J. Petrilli (Eds.), The state of state standards 2000. Washington, DC: Thomas B. Fordham Foundation.

OCR for page 171
Systems for State Science Assessment Lester, F.K., Jr., Masingila, J.O., Mau, S.T., Lambdin, D.V., dos Santon, V.W., and Raymond, A.M. (1994). Learning how to teach via problem solving. In D. Aichele and A. Coxford (Eds.), Professional development for teachers of mathematics (pp. 152–166). Reston, VA: National Council of Teachers of Mathematics. Li, M. (2001). A framework for science achievement and its link to test items. Unpublished doctoral dissertation, Stanford University. Li., M., and Shavelson, R.J. (2001). Examining the links between science achievement and assessment. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA. Li, M., Shavelson, R.J., Kupermintz, H., and Ruiz-Primo, M.A. (2002). On the relationship between mathematics and science achievement: An exploration of the Third International Mathematics and Science Study. In D.F. Robitaille and A.E. Beaton (Eds.), Secondary analysis of the TIMSS data (pp. 233–249). Boston, MA: Kluwer Academic. Linn, R.L. (2003). Accountability: Responsibility and reasonable expectations. Educational Researcher, 32(7), 3–13. Linn, R.L., and Haug, C. (2002). Stability of school building accountability scores and gains. Educational Evaluation and Policy Analysis, 24(1), 29–36. Little, J.W. (1994). Teachers’ professional development in a climate of educational reform. Educational Evaluation and Policy Analysis, 15, 129–151. Loucks-Horsley, S., Hewson, P., Love, N., and Stiles, K. (1998). Designing professional development for teachers of science and mathematics. Thousand Oaks, CA: Corwin Press. Madaus, G. (1998). The distortion of teaching and testing: High-stakes testing and instruction. Peabody Journal of Education, 65, 29–46. Marzano, R., Pickering, D., and Pollack, J. (2001). Classroom instruction that works. Alexandria, VA: Association of Supervision and Curriculum Development. Masters, G., and Forster, M. (1996). Progress maps. Assessment resource kit. 
Victoria, Australia: Commonwealth of Australia. Mazur, E. (1997). Peer instruction: A user’s manual. Upper Saddle River, NJ: Prentice Hall. McDonnell, L.M., and Choisser, C. (1997). Testing and teaching: Local implementation of new state assessments. (CSE Technical Report No. 442). Los Angeles: Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing, University of California. Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Education Researcher, 23(2), 13–23. Mestre, J.P. (1994). Cognitive aspects of learning and teaching science. In S.J. Fitzsimmons and L.C. Kerpelman (Eds.), Teacher enhancement for elementary and secondary science and mathematics: Status, issues and problems (NSF 94–80, pp. 3-1–3-53). Arlington, VA: National Science Foundation. Mestre, J.P. (Ed.). (2005). Transfer of learning from a modern multidisciplinary perspective. Greenwich, CT: Information Age. Metzenberg, S. (2004). Science and mathematics testing: What’s right and wrong with the NAEP and the TIMSS? In W.M. Evers and H.J. Walberg (Eds.). Testing student learning, evaluating teacher effectiveness. Stanford, CA: Hoover Institution Press. Millman, J., and Greene, J. (1993). The specification and development of tests of achievement and ability. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp 335–366). New York: American Council on Education. Minstrell, J. (2001). The role of the teacher in making sense of classroom experiences and effecting better learning. In D. Klahr and S. Carver (Eds.), Cognition and instruction: 25 years of progress. Mahwah, NJ: Lawrence Erlbaum Associates. Mislevy, R.J. (1996). Test theory reconceived. Journal of Educational Measurement, 33(4), 379–416.

OCR for page 171
Systems for State Science Assessment

Mislevy, R.J., and Haertel, G. (2005). Overview of the PADI assessment design system. Paper presented at the American Educational Research Association Annual Meeting, April, Montreal.

Mislevy, R.J., Steinberg, L.S., and Almond, R.G. (2002). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3–67.

Mislevy, R.J., Wilson, M., Ercikan, K., and Chudowsky, N. (2003). Psychometric principles in student assessment. In T. Kellaghan and D.L. Stufflebeam (Eds.), International handbook of educational evaluation (pp. 489–532). Dordrecht, The Netherlands: Kluwer Academic.

National Assessment Governing Board. (2004, November). NAEP 2009 science framework development: Issues and recommendations. Washington, DC: Author.

National Center for Education Statistics. (2001). Teacher preparation and professional development: 2000. Available: http://nces.ed.gov/pubs2001/2001088.pdf [accessed June 2005].

National Commission on Excellence in Education. (1983, April). A nation at risk: The imperative for educational reform. A report to the nation and the Secretary of Education, United States Department of Education. Available: http://www.ed.gov/pubs/NatAtRisk/index.html [accessed June 2005].

National Commission on Mathematics and Science Teaching for the 21st Century. (2000). Before it’s too late: A report to the nation from the National Commission on Mathematics and Science Teaching for the 21st Century. Jessup, MD: Education Publications Center.

National Council on Educational Standards and Testing. (1992). Raising standards for American education. Washington, DC: U.S. Government Printing Office.

National Education Goals Panel. (1993). Promises to keep: Creating high standards for American students. (Report on the Review of Educational Standards from the Goals 3 and 4 Technical Planning Group). Washington, DC: Author.

National Research Council. (1990). Fulfilling the promise: Biology education in the nation’s schools. Committee on High School Biology Education, Board on Biology, Commission on Life Sciences. Washington, DC: National Academy Press.

National Research Council. (1996). National science education standards. National Committee on Science Education Standards and Assessment. Center for Science, Mathematics, and Engineering Education. Washington, DC: National Academy Press.

National Research Council. (1999a). How people learn: Brain, mind, experience, and school. J.D. Bransford, A.L. Brown, and R.R. Cocking (Eds.), Committee on Developments in the Science of Learning, Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

National Research Council. (1999b). Testing, teaching, and learning: A guide for states and school districts. R.F. Elmore and R. Rothman (Eds.), Committee on Title I Testing and Assessment, Board on Testing and Assessment, Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

National Research Council. (2000a). Educating teachers of science, mathematics, and technology: New practices for the new millennium. Committee on Science and Mathematics Teacher Preparation, Center for Education. Washington, DC: National Academy Press.

National Research Council. (2000b). How people learn: Brain, mind, experience, and school: Expanded edition. Committee on Developments in the Science of Learning, J.D. Bransford, A.L. Brown, and R.R. Cocking (Eds.), with additional material from the Committee on Learning Research and Educational Practice, M.S. Donovan, J.D. Bransford, and J.W. Pellegrino (Eds.), Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

National Research Council. (2000c). Inquiry and the national science education standards: A guide for teaching and learning. Committee on Development of an Addendum to the National Science Education Standards on Scientific Inquiry, Center for Science, Mathematics, and Engineering Education. Washington, DC: National Academy Press.

National Research Council. (2001a). Classroom assessment and the national science education standards. Committee on Classroom Assessment and the National Science Education Standards. J.M. Atkin, P. Black, and J. Coffey (Eds.). Center for Education. Washington, DC: National Academy Press.

National Research Council. (2001b). Knowing what students know: The science and design of educational assessment. Committee on the Foundations of Assessment. J. Pellegrino, N. Chudowsky, and R. Glaser (Eds.). Board on Testing and Assessment, Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

National Research Council. (2002). Learning and understanding: Improving advanced study of mathematics and science in U.S. high schools. Committee on Programs for Advanced Study of Mathematics and Science in American High Schools. J.P. Gollub, M.W. Bertenthal, J.B. Labov, and P.C. Curtis (Eds.). Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

National Research Council. (2003). Assessment in support of instruction and learning: Bridging the gap between large-scale and classroom assessment. Committee on Assessment in Support of Instruction and Learning. Board on Testing and Assessment, Committee on Science Education K–12, Mathematical Sciences Education Board. Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

National Research Council. (2004). Keeping score for all: The effects of inclusion and accommodation policies on large-scale educational assessments. Committee on Participation of English Language Learners and Students with Disabilities in NAEP and Other Large-Scale Assessments. J.A. Koenig and L.F. Bachman (Eds.). Board on Testing and Assessment, Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

National Science Board Commission on Precollege Education in Mathematics, Science and Technology. (1983). Educating Americans for the 21st century: A report to the American people and the National Science Board. Washington, DC: Author.

National Science Teachers Association. (1992). Scope, sequence, coordination. The content core: A guide for curriculum designers. Washington, DC: Author.

National Staff Development Council. (2001). Standards for staff development (Rev. ed.). Oxford, OH: Author.

Neill, M., and Medina, N.J. (1989). Standardized testing: Harmful to educational health. Phi Delta Kappan, 70, 688–697.

Neuberger, W. (2004). Online assessment in Oregon: The technology-enhanced student assessment. Presented at the No Child Left Behind Leadership Summit, March, St. Louis, MO.

Niemi, D. (1996). Assessing conceptual understanding in mathematics: Representation, problem solutions, justifications, and explanations. Journal of Educational Research, 89, 351–363.

Odendahl, N. (1999). Online delivery and scoring of constructed-response assessments. Paper presented at the American Educational Research Association Annual Meeting, Montreal.

Olson, L. (1998). An “A” or a “D”: State rankings differ widely. Education Week, April 15.

Oswald, J.H., and Rebarber, R. (2002). State innovations priorities for state testing programs. Washington, DC: Education Leadership Council.

Patz, R., Reckase, M., and Martineau, J. (2005). Building NCLB science assessments: Psychometric and practical considerations. Commissioned paper prepared for the National Research Council’s Committee on Test Design for K–12 Science Achievement, Washington, DC.

Perkins, D. (1992). Smart schools: From training memories to educating minds. New York: Free Press.

Perkins, D. (1993). Teaching for understanding. American Educator: The Professional Journal of the American Federation of Teachers, 17(3), 28–35.

Perkins, D. (1998). What is understanding? In M.S. Wiske (Ed.), Teaching for understanding: Linking research with practice. San Francisco: Jossey-Bass Publishers.

Phillips, S.E., and Rebarber, T. (2002). Model contractor standards and state responsibilities. Washington, DC: Education Leadership Council.

Plake, B.S., Buckendahl, C.W., and Impara, J.C. (2004). Classroom-based assessment system for science: A model. Commissioned paper prepared for the National Research Council’s Committee on Test Design for K–12 Science Achievement, Washington, DC.

Poggio, J.P., Glasnapp, D.R., and Eros, D.S. (1981). An empirical investigation of the Angoff, Ebel, and Nedelsky standard setting methods. Paper presented at the American Educational Research Association Annual Meeting, April, Los Angeles, CA.

Popham, J., Keller, T., Moulding, B., Pellegrino, J., and Sandifer, P. (2004). Instructionally supportive accountability tests in science: A viable assessment option? An analysis. Commissioned paper prepared for the National Research Council’s Committee on Test Design for K–12 Science Achievement, Washington, DC.

Porter, A.C. (2002). Measuring the content of instruction: Uses in research and practice. Educational Researcher, 31(7), 3–14.

Prawat, R. (1992). Teachers’ beliefs about teaching and learning: A constructivist perspective. American Journal of Education, 100, 354–395.

Putnam, R., and Borko, H. (1997). Teacher learning: Implications of new views of cognition. In B.J. Biddle (Ed.), International handbook of teachers and teaching (pp. 1223–1296). Boston, MA: Kluwer Academic.

Putnam, R.T., and Borko, H. (2002). What do new views of knowledge and thinking have to say about research on teacher learning? In B. Moon, J. Butcher, and E. Bird (Eds.), Leading professional development in education (pp. 11–29). London: Routledge and Falmer.

Quellmalz, E.S. (1984). Designing writing assessments: Balancing fairness, utility, and cost. Educational Evaluation and Policy Analysis, 6, 63–72.

Quellmalz, E.S., and Haertel, G.D. (2004). Use of technology-supported tools for large-scale science assessment: Implications for assessment practice and policy at the state level. Commissioned paper prepared for the National Research Council’s Committee on Test Design for K–12 Science Achievement, Washington, DC.

Quellmalz, E.S., and Kreikemeier, P. (2002). The alignment of standards and assessment: Building better methodologies—Validities of science inquiry assessments: A study of the alignment of items and tasks drawn from science reference exams with the National Science Education Standards. Paper presented at the American Educational Research Association Symposium, New Orleans, LA.

Quellmalz, E.S., and Moody, M. (2004). Models for multi-level state science assessment systems. Commissioned paper prepared for the National Research Council’s Committee on Test Design for K–12 Science Achievement, Washington, DC.

Raizen, S.A., and Kaser, J.S. (1989). Assessing science learning in elementary school: Why, what, and how? Phi Delta Kappan, 70(9), 718–722.

Reckase, M., and Martineau, J. (2004). The vertical scaling of science achievement tests. Commissioned paper prepared for the National Research Council’s Committee on Test Design for K–12 Science Achievement, Washington, DC.

Reiser, R.A. (2002). A history of instructional design and technology. In R.A. Reiser and J.V. Dempsey (Eds.), Trends and issues in instructional design and technology. Upper Saddle River, NJ: Merrill/Prentice Hall.

Reiser, B.J., Krajcik, J., Moje, E., and Marx, R. (2003). Design strategies for developing science instructional materials. Paper presented at the National Association for Research in Science Teaching Annual Meeting, March, Philadelphia, PA.

Resnick, L.B. (1995). From aptitude to effort: A new foundation for our schools. Daedalus, 124(4), 55–62.

Roeber, E. (1996). Designing coordinated assessment systems for Title I of the Improving America’s Schools Act of 1994. Washington, DC: Council of Chief State School Officers.

Rothman, R. (2003). Imperfect matches: The alignment of standards and tests. Commissioned paper prepared for the National Research Council’s Committee on Test Design for K–12 Science Achievement, Washington, DC.

Rudolph, J.L., and Stewart, J.H. (1998). Evolution and the nature of science: On the historical discord and its implications for education. Journal of Research in Science Teaching, 35, 1069–1089.

Ruiz-Primo, M.A., Shavelson, R.J., Li, M., and Schultz, S.E. (2001). On the cognitive validity of interpretations of scores from alternative concept-mapping techniques. Educational Assessment, 7(2), 99–141.

Rutherford, F.J., and Ahlgren, A. (1989). Science for all Americans: American Association for the Advancement of Science, Project 2061. New York: Oxford University Press.

Salomon, G., and Perkins, D.N. (1989). Rocky roads to transfer: Rethinking mechanisms of a neglected phenomenon. Educational Psychologist, 24(2), 113–142.

Schoenfeld, A.H. (1983). Problem solving in the mathematics curriculum: A report, recommendation, and annotated bibliography. (MAA Notes No. 1). Washington, DC: Mathematical Association of America.

Schoenfeld, A.H. (1985). Mathematical problem solving. Orlando, FL: Academic Press.

Schum, D.A. (1994). The evidential foundations of probabilistic reasoning. New York: Wiley.

Senge, P. (1990). The fifth discipline: The art and practice of the learning organization. New York: Currency Doubleday.

Shapin, S., and Schaffer, S. (1985). Leviathan and the air-pump: Hobbes, Boyle, and the experimental life. Princeton, NJ: Princeton University Press.

Shavelson, R.J., and Ruiz-Primo, M.A. (1999). On the assessment of science achievement. Unterrichts Wissenschaft, 2(27), 102–127.

Shavelson, R.J., Li, M., Ruiz-Primo, M.A., Wood, R., and Martin, K. (2004, July). On Delaware’s assessment of science achievement: II. Audit test development, reliability, and validity. (Second of two unpublished reports on Delaware’s Assessment of Science Achievement.)

Shepard, L.A. (2000). The role of assessment in a learning culture. Educational Researcher, 29(7), 4–14.

Sibum, H.O. (2004). Beyond the ivory tower: What kind of science is experimental physics? Science, 306, 60–61.

Simon, H.A. (1980). Problem solving and education. In D.T. Tuma and R. Reif (Eds.), Problem solving and education: Issues in teaching and research (pp. 81–96). Hillsdale, NJ: Lawrence Erlbaum Associates.

Sireci, S.G., Li, S., and Scarpati, S. (2003). The effects of test accommodations on test performance: A review of the literature. (Center for Educational Assessment Research Report No. 485). Amherst, MA: University of Massachusetts School of Education.

Smith, C., Wiser, M., Anderson, C.W., Krajcik, J., and Coppola, B. (2004). Implications of research on children’s learning for assessment: Matter and atomic molecular theory. Commissioned paper prepared for the National Research Council’s Committee on Test Design for K–12 Science Achievement, Washington, DC.

Smith, M.L., and Rottenberg, C. (1991). Unintended consequences of external testing in elementary schools. Educational Measurement: Issues and Practices, 10, 7–11.

Smylie, M.A., Allensworth, E., Greenberg, R.C., Harris, R., and Luppescu, S. (2001). Teacher professional development in Chicago: Supporting effective practice. Chicago, IL: Consortium on Chicago School Research.

Stecher, B., and Barron, S. (1999). Quadrennial mile-post accountability testing in Kentucky. (CSE Technical Report No. 505). Los Angeles: Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing, University of California.

Stecher, B., Barron, S.L., Chun, T., and Ross, K. (2000). The effects of the Washington state education reform on schools and classrooms. (CSE Technical Report No. 525). Los Angeles: Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing, University of California.

Stecher, B., Barron, S., Kaganoff, T., and Goodwin, J. (1998). The effect of standards-based assessment on classroom practices: Results of the 1996–1997 RAND survey of Kentucky teachers of mathematics and writing. (CSE Technical Report No. 482). Los Angeles: Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing, University of California.

Steinberg, L.S., and Almond, R.G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3–67.

Steinberg, L.S., Mislevy, R.J., Almond, R.G., Baird, A.B., Cahallan, C., Dibello, L.V., Senturk, D., Yan, D., Chernick, H., and Kindfield, A.C.H. (2003). Introduction to the Biomass Project: An illustration of evidence-centered assessment design and delivery capability. (CSE Technical Report No. 609). Available: http://www.cse.ucla.edu/reports/R609.pdf [accessed June 2005].

Stiggins, R.J. (1999). Evaluating classroom assessment training in teacher education programs. Educational Measurement: Issues and Practice, 18(1), 23–27.

Sylwester, R. (1995). A celebration of neurons: An educator’s guide to the human brain. Alexandria, VA: Association for Supervision and Curriculum Development.

Thompson, S.J., Blount, A., and Thurlow, M.L. (2002). A summary of research on the effects of test accommodations—1999 through 2001. Minneapolis, MN: National Center on Educational Outcomes.

Tindal, G., and Fuchs, L. (2000). A summary of research on test accommodations: An empirical basis for defining test accommodations. (ERIC Document Reproduction Service No. ED 442 245). Lexington, KY: Mid-South Regional Resource Center.

Trevisan, M.S. (2002, June). The states’ role in ensuring assessment competence. Phi Delta Kappan, 83(10), 766–771.

U.S. Department of Education. (2000). Before it’s too late: Report to the nation from the National Commission on Mathematics and Science Teaching for the 21st Century. Washington, DC: Author.

U.S. Department of Education. (2004). Standards and assessments peer review guidance: Information and examples for meeting requirements of the No Child Left Behind Act of 2001. Available: http://www.ed.gov/policy/elsec/guid/saaprguidance.pdf [accessed June 2005].

Van Valkenburgh, B., Wang, X., and Damuth, J. (2004). Cope’s rule, hypercarnivory, and extinction in North American canids. Science, 306, 101–104.

Vygotsky, L.S. (1978). Mind in society. Cambridge, MA: Harvard University Press.

Wainer, H. (1997). Improving tabular displays: With NAEP tables as examples and inspirations. Journal of Educational and Behavioral Statistics, 22, 1–30.

Wainer, H., Hambleton, R.K., and Meara, K. (1999). Alternative displays for communicating NAEP results: A redesign and validity study. Journal of Educational Measurement, 36, 301–335.

Webb, N.L. (1997a). Criteria for alignment of expectations and assessments in mathematics and science education. (Research Monograph No. 6). Madison, WI: National Institute for Science Education.

Webb, N.L. (1997b, January). Determining alignment of expectations and assessments in mathematics and science education. NISE Brief, 1(2). Madison: University of Wisconsin–Madison, National Institute for Science Education.

Webb, N.L. (1999). Alignment of science and mathematics standards and assessments in four states. (Research Monograph No. 18). Madison: University of Wisconsin–Madison, National Institute for Science Education.

Webb, N.L. (2001). Alignment analysis of STATE F language arts standards and assessments, grades 5, 8, and 11. Paper prepared for the Technical Issues of Large-Scale Assessment Group of the Council of Chief State School Officers, November 30.

Webb, N.L. (2002). Assessment literacy in a standards-based urban education setting. Paper presented at the American Educational Research Association Annual Meeting, New Orleans.

Whalen, S.J., and Bejar, I.I. (1998). Relational databases in assessment: An application to online scoring. Journal of Educational Computing Research, 18, 1–13.

Wiggins, G.P. (1998). Educative assessment: Designing assessments to inform and improve student performance. San Francisco, CA: Jossey-Bass.

Wiggins, G.P., and McTighe, J. (1998). Understanding by design. Alexandria, VA: Association for Supervision and Curriculum Development.

Wiliam, D., and Black, P. (2004). International approaches to science assessment. Commissioned paper prepared for the National Research Council’s Committee on Test Design for K–12 Science Achievement, Washington, DC.

Wilson, M. (2004). Assessment tools: Psychometric and statistical. In J.W. Guthrie (Ed.), Encyclopedia of education (2nd ed.). New York: Macmillan Reference USA.

Wilson, M. (2005). Constructing measures: An item-response modeling approach. Mahwah, NJ: Lawrence Erlbaum Associates.

Wilson, M., and Draney, K. (2002). A technique for setting standards and maintaining them over time. In S. Nishisato, Y. Baba, H. Bozdogan, and K. Kanefugi (Eds.), Measurement and multivariate analysis (pp. 325–332). Proceedings of the International Conference on Measurement and Multivariate Analysis, Banff, Canada, May 12–14, 2000. Tokyo, Japan: Springer-Verlag.

Wilson, M., and Draney, K. (2004). Some links between large-scale and classroom assessments: The case of the BEAR Assessment System. In M. Wilson (Ed.), Towards coherence between classroom assessment and accountability. One-hundred-third Yearbook of the National Society for the Study of Education, Part II. Chicago: University of Chicago Press.

Wilson, M., and Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement in Education, 12(2), 181–208.

Wixson, K.K., Fisk, M.C., Dutro, E., and McDaniel, J. (2002). The alignment of state standards and assessments in elementary reading. (CIERA Technical Report). Ann Arbor, MI: Center for the Improvement of Early Reading Achievement.

Wolf, S.A., and McIver, M.C. (1999). When progress becomes policy: The paradox of Kentucky state reform for exemplary teachers. Phi Delta Kappan, 80, 401–406.

Zuriff, G.E. (2000). Extra examination time for students with learning disabilities: An examination of the maximum potential thesis. Applied Measurement in Education, 13(1), 99–117.
