REFERENCES

Allen, R. (2012). Developing the enabling context for school-based assessment in Queensland, Australia. Washington, DC: The World Bank.

Almond, R.G., Steinberg, L.S., and Mislevy, R.J. (2002). A four-process architecture for assessment delivery, with connections to assessment design. Journal of Technology, Learning, and Assessment, 1(5). Available: http://www.bc.edu/research/intasc/jtla/journal/v1n5.shtml [June 2013].

Alonzo, A.C., and Gearhart, M. (2006). Considering learning progressions from a classroom assessment perspective. Measurement: Interdisciplinary Research and Perspectives, 4(1 and 2), 99-104.

American Association for the Advancement of Science. (2001). Atlas of science literacy: Project 2061, Volume 1. Washington, DC: Author.

American Association for the Advancement of Science. (2007). Atlas of science literacy: Project 2061, Volume 2. Washington, DC: Author.

American Association for the Advancement of Science. (2009, originally published 1993). Benchmarks for science literacy. New York: Oxford University Press.

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Andrade, H., and Cizek, G.J. (Eds.). (2010). Handbook of formative assessment. New York: Taylor and Francis.

Association of Public and Land-grant Universities. (2011). The common core state standards and teacher preparation: The role of higher education. APLU/SMTI, paper 2. Washington, DC: Author. Available: http://www.aplu.org/document.doc?id=3482 [May 2013].

Baker, E.L. (1994). Making performance assessment work: The road ahead. Educational Leadership, 51(6), 58-62.

Banilower, E.R., Fulp, S.L., and Warren, C.L. (2010). Science: It’s elementary. Year four evaluation report. Chapel Hill, NC: Horizon Research.

Barton, K., and Schultz, G. (2012). Using technology to assess hard-to-measure constructs in the CCSS and to expand accessibility: English language arts. Paper presented at the Invitational Research Symposium on Technology Enhanced Assessments, Washington, DC. Available: http://www.k12center.org/events/research_meetings/tea.html [September 2013].

Basterra, M., Trumbull, E., and Solano-Flores, G. (2011). Cultural validity in assessment: Addressing linguistic and cultural diversity. New York: Routledge.

Baxter, G.P., and Glaser, R. (1998). The cognitive complexity of science performance assessments. Educational Measurement: Issues and Practice, 17(3), 37-45.

Bennett, R.E., and Bejar, I.I. (1998). Validity and automated scoring: It’s not only the scoring. Educational Measurement: Issues and Practice, 17(4), 9-16.

Berland, L.K., and Reiser, B.J. (2009). Making sense of argumentation and explanation. Science Education, 93(1), 26-55.

Black, P., and Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7-74.

Black, P., and Wiliam, D. (2010). Chapter 1: Formative assessment and assessment for learning. In J. Chappuis, Seven strategies of assessment for learning. New York: Pearson.

Black, P., Wilson, M., and Yao, S. (2011). Road maps for learning: A guide to the navigation of learning progressions. Measurement: Interdisciplinary Research and Perspectives, 9, 1-52.

Braun, H., Bejar, I.I., and Williamson, D.M. (2006). Rule-based methods for automated scoring: Applications in a licensing context. In D.M. Williamson, R.J. Mislevy, and I.I. Bejar (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 83-122). Mahwah, NJ: Lawrence Erlbaum Associates.

Briggs, D., Alonzo, A., Schwab, C., and Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11(1), 33-63.

Brown, N.J.S., Furtak, E.M., Timms, M.J., Nagashima, S.O., and Wilson, M. (2010). The evidence-based reasoning framework: Assessing scientific reasoning. Educational Assessment, 15(3-4), 123-141.

Buckley, B.C., and Quellmalz, E.S. (2013). Supporting and assessing complex biology learning with computer-based simulations and representations. In D. Treagust and C.-Y. Tsui (Eds.), Multiple representations in biological education (pp. 247-267). Dordrecht, Netherlands: Springer.

Buckley, J., Schneider, M., and Shang, Y. (2004). The effects of school facility quality on teacher retention in urban school districts. Washington, DC: National Clearinghouse for Educational Facilities. Available: http://www.ncef.org/pubs/teacherretention.pdf [December 2013].

Burke, R.J., and Mattis, M.C. (Eds.). (2007). Women and minorities in science, technology, engineering, and mathematics: Upping the numbers. Northampton, MA: Edward Elgar.

Bystydzienski, J.M., and Bird, S.R. (Eds.). (2006). Removing barriers: Women in academic science, technology, engineering and mathematics. Bloomington: Indiana University Press.

Camilli, G. (2006). Errors in variables: A demonstration of marginal maximum likelihood. Journal of Educational and Behavioral Statistics, 31, 311-325.

Camilli, G., and Shepard, L.A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.

Catterall, J., Mehrens, W., Flores, R.G., and Rubin, P. (1998). The Kentucky instructional results information system: A technical review. Frankfort: Kentucky Legislative Research Commission.

Center on Education Policy. (2007). Choices, changes, and challenges: Curriculum and instruction in the NCLB era. Washington, DC: Author.

Claesgens, J., Scalise, K., Wilson, M., and Stacy, A. (2009). Mapping student understanding in chemistry: The perspectives of chemists. Science Education, 93(1), 56-85.

Clark-Ibañez, M. (2004). Framing the social world through photo-elicitation interviews. American Behavioral Scientist, 47(12), 1507-1527.

College Board. (2009). Science: College Board standards for college success. New York: Author.

College Board. (2011). AP biology: Curriculum framework 2012-2013. New York: Author.

College Board. (2012). AP biology course and exam description effective fall 2012. New York: Author.

College Board. (2013a). AP biology 2013 free-response questions. New York: Author. Available: https://secure-media.collegeboard.org/ap-student/pdf/biology/ap-2013-biology-free-response-questions.pdf [December 2013].

College Board. (2013b). AP biology 2013 scoring guidelines. New York: Author. Available: http://media.collegeboard.com/digitalServices/pdf/ap/apcentral/ap13_biology_q2.pdf [December 2013].

Corcoran, T., Mosher, F.A., and Rogat, A. (2009). Learning progressions in science. Philadelphia, PA: Consortium for Policy Research in Education.

Darling-Hammond, L., Herman, J., Pellegrino, J., Abedi, J., Aber, J.L., Baker, E., Bennett, R., Gordon, E., Haertel, E., Hakuta, K., Ho, A., Linn, R.L., Pearson, P.D., Popham, J., Resnick, L., Schoenfeld, A.H., Shavelson, R., Shepard, L.A., Shulman, L., and Steele, C.M. (2013). Criteria for high-quality assessment. Stanford, CA: SCOPE, CRESST, and Learning Sciences Research Institute.

Davis, E.A., Petish, D., and Smithey, J. (2006). Challenges new science teachers face. Review of Educational Research, 76(4), 607-651.

Dee, T.S., Jacob, B.A., and Schwartz, N.L. (2013). The effects of NCLB on school resources and practices. Educational Evaluation and Policy Analysis, 35(2), 252-279.

Deng, N., and Yoo, H. (2009). Resources for reporting test scores: A bibliography for the assessment community. Prepared for the National Council on Measurement in Education, Center for Educational Measurement, University of Massachusetts, Amherst. Available: http://ncme.org/linkservid/98ADCCAD-1320-5CAE6ED4F61594850156/showMeta/0/ [April 2014].

Dietel, R. (1993). What works in performance assessment? In Proceedings of the 1992 CRESST Conference. Evaluation Comment Newsletter [Online]. Available: http://www.cse.ucla.edu/products/evaluation/cresst_ec1993_2.pdf [October 2013].

diSessa, A.A. (2004). Metarepresentation: Native competence and targets for instruction. Cognition and Instruction, 22, 293-331.

diSessa, A.A., and Minstrell, J. (1998). Cultivating conceptual change with benchmark lessons. In J.G. Greeno and S. Goldman (Eds.), Thinking practices. Mahwah, NJ: Lawrence Erlbaum Associates.

Dorph, R., Shields, P., Tiffany-Morales, J., Hartry, A., and McCaffrey, T. (2011). High hopes—few opportunities: The status of elementary science education in California. Sacramento, CA: The Center for the Future of Teaching and Learning at WestEd.

Draney, K., and Wilson, M. (2008). A LLTM approach to the examination of teachers’ ratings of classroom assessment tasks. Psychology Science, 50, 417.

Dunbar, S., Koretz, D., and Hoover, H.D. (1991). Quality control in the development and use of performance assessment. Applied Measurement in Education, 4(4), 289-303.

Duschl, R., and Gitomer, D. (1997). Strategies and challenges to changing the focus of assessment and instruction in science classrooms. Educational Assessment, 4, 37-73.

Educational Testing Service. (2002). ETS standards for quality and fairness. Princeton, NJ: Author. Available: http://www.ets.org/s/about/pdf/standards.pdf [October 2013].

Ericsson, K.A., and Simon, H.A. (1984). Protocol analysis: Verbal reports as data. Cambridge, MA: Bradford Books/MIT Press.

Ferrara, S. (2009). The Maryland School Performance Assessment Program (MSPAP) 1991-2002: Political considerations. Paper prepared for the Workshop of the Committee on Best Practices in State Assessment Systems: Improving Assessment while Revisiting Standards, December 10-11, National Research Council, Washington, DC. Available: http://www7.nationalacademies.org/bota/Steve%20Ferrara.pdf [September 2010].

Frey, W.H. (2011). America’s diverse future: Initial glimpses at the U.S. child population from the 2010 Census. Brookings Series: State of Metropolitan America Number 26 of 62, April. Available: http://www.brookings.edu/~/media/research/files/papers/2011/4/06%20census%20diversity%20frey/0406_census_diversity_frey.pdf [October 2013].

Gobert, J.D. (2000). A typology of causal models for plate tectonics: Inferential power and barriers to understanding. International Journal of Science Education, 22(9), 937-977.

Gobert, J.D. (2005). The effects of different learning tasks on model-building in plate tectonics: Diagramming versus explaining. Journal of Geoscience Education, 53(4), 444-455.

Gobert, J.D., and Clement, J.J. (1999). Effects of student-generated diagrams versus student-generated summaries of conceptual understanding of causal and dynamic knowledge in plate tectonics. Journal of Research in Science Teaching, 36, 36-53.

Gobert, J.D., and Pallant, A. (2004). Fostering students’ epistemologies of models via authentic model-based tasks. Journal of Science Education and Technology, 13(1), 7-22.

Gobert, J.D., Horwitz, P., Tinker, B., Buckley, B., Wilensky, U., Levy, S., and Dede, C. (2003). Modeling across the curriculum: Scaling up modeling using technology. Proceedings of the Twenty-Fifth Annual Meeting of the Cognitive Science Society. Available: http://ccl.sesp.northwestern.edu/papers/2003/281.pdf [December 2013].

Gong, B., and DePascale, C. (2013). Different but the same: Assessment “comparability” in the era of the common core state standards. White paper prepared for the Council of Chief State School Officers, Washington, DC.

Goodman, D.P., and Hambleton, R.K. (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education, 17(2), 145-220.

Gotwals, A.W., and Songer, N.B. (2013). Validity evidence for learning progression-based assessment items that fuse core disciplinary ideas and science practices. Journal of Research in Science Teaching, 50(5), 597-626.

Griffith, G., and Scharmann, L. (2008). Initial impacts of No Child Left Behind on elementary science education. Journal of Elementary Science Education, 20(3), 35-48.

Haertel, E. (2013). How is testing supposed to improve schooling? Measurement: Interdisciplinary Research and Perspectives, 11(1-2), 1-18.

Haertel, G.D., Cheng, B. H., Cameto, R., Fujii, R., Sanford, C., Rutstein, D., and Morrison, K. (2012). Design and development of technology enhanced assessment tasks: Integrating evidence-centered design and universal design for learning frameworks to assess hard-to-measure science constructs and increase student accessibility. Paper presented at the Invitational Research Symposium on Technology Enhanced Assessments, May 7-8, Washington, DC. Available: http://www.k12center.org/events/research_meetings/tea.html [September 2013].

Haertel, G.D., Vendlinski, T., Rutstein, D.W., Cheng, B., and DeBarger, A. (2013). Designing scenario-based, technology-enhanced assessment tasks using evidence-centered design. Paper presented at the annual meeting of the National Council on Measurement in Education, April 28, San Francisco, CA.

Hambleton, R.K., and Slater, S. (1997). Reliability of credentialing examinations and the impact of scoring models and standard-setting policies. Applied Measurement in Education, 13, 19-38.

Hambleton, R.K., Impara, J., Mehrens, W., and Plake, B.S. (2000). Psychometric review of the Maryland School Performance Assessment Program (MSPAP). Baltimore, MD: Abell Foundation.

Hambleton, R.K., Jaeger, R.M., Koretz, D., Linn, R.L., Millman, J., and Phillips, S.E. (1995). Review of the measurement quality of the Kentucky instructional results information system, 1991-1994. Report prepared for the Office of Educational Accountability, Kentucky General Assembly.

Hamilton, L.S., Nussbaum, E.M., and Snow, R.E. (1997). Interview procedures for validating science assessments. Applied Measurement in Education, 10(2), 191-200.

Hamilton, L.S., Stecher, B.M., and Klein, S.P. (Eds.) (2002). Making sense of test-based accountability in education. Santa Monica, CA: RAND.

Hamilton, L.S., Stecher, B.M., and Yuan, K. (2009). Standards-based reform in the United States: History, research, and future directions. Washington, DC: Center on Education Policy.

Harmon, M., Smith, T.A., Martin, M.O., Kelly, D.L., Beaton, A.E., Mullis, I.V.S., Gonzalez, E.J., and Orpwood, G. (1997). Performance assessment in IEA’s third international mathematics and science study. Chestnut Hill, MA: Center for the Study of Testing, Evaluation, and Educational Policy, Boston College.

Heritage, M. (2010). Formative assessment and next-generation assessment systems: Are we losing an opportunity? Washington, DC: Council of Chief State School Officers.

Hill, R.K., and DePascale, C.A. (2003). Reliability of No Child Left Behind accountability designs. Educational Measurement: Issues and Practice, 22(3), 12-20.

Hinze, S.R., Wiley, J., and Pellegrino, J.W. (2013). The importance of constructive comprehension processes in learning from tests. Journal of Memory and Language, 69(2), 151-164.

Ho, A.D. (2013). The epidemiology of modern test score use: Anticipating aggregation, adjustment, and equating. Measurement: Interdisciplinary Research and Perspectives, 11, 64-67.

Holland, P.W., and Dorans, N.J. (2006). Linking and equating. In R.L. Brennan (Ed.), Educational measurement (4th ed., pp. 187-220). Westport, CT: Praeger.

Holland, P.W., and Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.

Hoskens, M., and Wilson, M. (2001). Real-time feedback on rater drift in constructed response items: An example from the Golden State Examination. Journal of Educational Measurement, 38, 121-145.

Huff, K., Steinberg, L., and Matts, T. (2010). The promises and challenges of implementing evidence-centered design in large-scale assessment. Applied Measurement in Education, 23(4), 310-324.

Huff, K., Alves, C., Pellegrino, J., and Kaliski, P. (2012). Using evidence-centered design task models in automatic item generation. In M. Gierl and T. Haladyna (Eds.), Automatic item generation. New York: Informa UK.

Intergovernmental Panel on Climate Change. (2007). Climate change 2007: Synthesis report. Geneva, Switzerland: Author.

International Association for the Evaluation of Educational Achievement. (2013). International computer and information literacy study: Assessment framework. Amsterdam, the Netherlands: Author.

International Baccalaureate Organization. (2007). Diploma programme: Biology guide. Wales, UK: Author.

International Baccalaureate Organization. (2013). Handbook of procedures for the diploma programme, 2013. Wales, UK: Author.

Jaeger, R.M. (1996). Reporting large-scale assessment results for public consumption: Some propositions and palliatives. Presented at the 1996 Annual Meeting of the National Council on Measurement in Education, April, New York.

Joint Committee on Testing Practices. (2004). Code of fair testing practices in education. Washington, DC: American Psychological Association. Available: http://www.apa.org/science/programs/testing/fair-code.aspx [December 2013].

K-12 Center at Educational Testing Service. (2013). Now underway: A step change in K-12 testing. Princeton, NJ: Educational Testing Service. Available: http://www.k12center.org/rsc/pdf/a_step_change_in_k12_testing_august2013.pdf [December 2013].

Kane, M.T. (2006). Validation. In R.L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: Praeger.

Kane, M.T. (2013). Validation as a pragmatic, scientific activity. Journal of Educational Measurement, 50(1), 115-122.

Kennedy, C.A. (2012a). PBIS student assessment on earth systems concepts: Test and item analysis. Berkeley, CA: KAC Consulting.

Kennedy, C.A. (2012b). PBIS student assessment on energy concepts: Test and item analysis. Unpublished Report. Berkeley, CA: KAC Consulting.

Kingston, N., and Nash, B. (2011). Formative assessment: A meta-analysis and a call for research. Educational Measurement: Issues and Practice, 30(4), 28-37.

Klein, S.P., McCaffrey, D., Stecher, B., and Koretz, D. (1995). The reliability of mathematics portfolio scores: Lessons from the Vermont experience. Applied Measurement in Education, 8(3), 243-260.

Kolen, M.J., and Brennan, R.L. (2004). Test equating, linking, and scaling: Methods and practices (2nd ed.). New York: Springer-Verlag.

Kopriva, R., and Sexton, U.M. (1999). Guide to scoring LEP student responses to open-ended science items. Washington, DC: Council of Chief State School Officers.

Kopriva, R.J. (2008). Improving testing for English language learners. New York: Routledge.

Koretz, D. (2005). Alignment, high stakes, and inflation of test scores. CSE Report 655. Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing, Center for the Study of Evaluation, Graduate School of Education & Information Studies, University of California, Los Angeles.

Koretz, D. (2008). Measuring up: What educational testing really tells us. Cambridge, MA: Harvard University Press.

Koretz, D., McCaffrey, D., Klein, S., Bell, R., and Stecher, B. (1992a). The reliability of scores from the 1992 Vermont portfolio assessment program: Interim report. CSE Technical Report 355. Santa Monica, CA: RAND Institute on Education and Training.

Koretz, D., Stecher, B., and Deibert, E. (1992b). The Vermont portfolio assessment program: Interim report on implementation and impact, 1991-1992 school year. CSE Technical Report 350. Santa Monica, CA: RAND and Los Angeles: Center for Research on Evaluation, Standards and Student Testing, University of California.

Koretz, D., Klein, S., McCaffrey, D., and Stecher, B. (1993a). Interim report: The reliability of Vermont portfolio scores in the 1992-1993 school year. CSE Technical Report 370. Santa Monica, CA: RAND and Los Angeles: Center for Research on Evaluation, Standards and Student Testing, University of California.

Koretz, D., Stecher, B., Klein, S., McCaffrey, D., and Deibert, E. (1993b). Can portfolios assess student performance and influence instruction? The 1991-1992 Vermont experience. CSE Technical Report 371. Santa Monica, CA: RAND and Los Angeles: Center for Research on Evaluation, Standards and Student Testing, University of California.

Koretz, D., Stecher, B., Klein, S., and McCaffrey, D. (1994). The evolution of a portfolio program: The Impact and quality of the Vermont program in its second year (1992-93), CSE Technical Report 385. Los Angeles: Center for Research on Evaluation, Standards and Student Testing, University of California.

Krajcik, J., and Merritt, J. (2012). Engaging students in scientific practices: What does constructing and revising models look like in the science classroom? Understanding a framework for K-12 science education. The Science Teacher, 79, 38-41.

Krajcik, J., Slotta, J., McNeill, K.L., and Reiser, B. (2008a). Designing learning environments to support students constructing coherent understandings. In Y. Kali, M.C. Linn, and J.E. Roseman (Eds.), Designing coherent science education: Implications for curriculum, instruction, and policy. New York: Teachers College Press.

Krajcik, J., McNeill, K.L., and Reiser, B. (2008b). Learning-goals-driven design model: Curriculum materials that align with national standards and incorporate project-based pedagogy. Science Education, 92(1), 1-32.

Krajcik, J., Reiser, B.J., Sutherland, L.M., and Fortus, D. (2013). Investigating and questioning our world through science and technology (2nd ed.). Greenwich, CT: Sangari Active Science.

Labudde, P., Nidegger, C., Adamina, M., and Gingins, F. (2012). The development, validation, and implementation of standards in science education: Chances and difficulties in the Swiss project HarmoS. In S. Bernholt, K. Neumann, and P. Nentwig (Eds.), Making it tangible: Learning outcomes in science education (pp. 237-239). Münster, Germany: Waxmann.

Lee, O. (2012). Next generation science standards for English language learners. Presentation prepared for the Washington Association of Bilingual Education, May 11, New York University.

Lee, O., Quinn, H., and Valdés, G. (2013). Science and language for English language learners in relation to next generation science standards and with implications for common core state standards for English language arts and mathematics. Educational Researcher, 42(4), 223-233.

Lehrer, R. (2011). Learning to reason about variability and chance by inventing measures and models. Paper presented at the National Association for Research in Science Teaching, Orlando, FL.

Lehrer, R.L., and Schauble, L. (2012). Seeding evolutionary thinking by engaging children in modeling its foundations. Science Education, 96(4), 701-724.

Lehrer, R.L., Kim, M., and Schauble, L. (2007). Supporting the development of conceptions of statistics by engaging students in modeling and measuring variability. International Journal of Computers for Mathematical Learning, 12, 195-216.

Lehrer, R., Kim, M.J., and Jones, S. (2011). Developing conceptions of statistics by designing measures of distribution. International Journal on Mathematics Education (ZDM), 43(5), 723-736.

Lehrer, R., Kim, M-J., Ayers, E., and Wilson, M. (2013). Toward establishing a learning progression to support the development of statistical reasoning. In J. Confrey and A. Maloney (Eds.), Learning over time: Learning trajectories in mathematics education. Charlotte, NC: Information Age.

Linn, R.L., Baker, E.L., and Dunbar, S.B. (1991). Complex performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15-21.

Linn, R.L., Burton, E.L., DeStefano, L., and Hanson, M. (1996). Generalizability of New Standards Project 1993 pilot study tasks in mathematics. Applied Measurement in Education, 9(3), 201-214.

Marion, S., and Shepard, L. (2010). Let’s not forget about opportunity to learn: Curricular support for innovative assessments. Dover, NH: The National Center for the Improvement of Educational Assessment, Center for Assessment. Available: http://www.nciea.org/publication_PDFs/Marion%20%20Shepard_Curricular%20units_042610.pdf [June 2013].

Masters, G.N., and McBryde, B. (1994). An investigation of the comparability of teachers’ assessment of student folios. Research report number 6. Brisbane, Australia: Queensland Tertiary Entrance Procedures Authority.

McDonnell, L.M. (2004). Politics, persuasion, and educational testing. Cambridge, MA: Harvard University Press.

McNeill, K.L., and Krajcik, J. (2008). Scientific explanations: Characterizing and evaluating the effects of teachers’ instructional practices on student learning. Journal of Research in Science Teaching, 45(1), 53-78.

Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-104). New York: Macmillan.

Messick, S. (1993). Foundations of validity: Meaning and consequences in psychological assessment. Princeton, NJ: Educational Testing Service.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23.

Michaels, S., and O’Connor, C. (2011). Problematizing dialogic and authoritative discourse, their coding in classroom transcripts and realization in the classroom. Paper presented at ISCAR, the International Society for Cultural and Activity Research, September 7, Rome, Italy.

Minstrell, J., and Kraus, P. (2005). Guided inquiry in the science classroom. In M.S. Donovan and J.D. Bransford (Eds.), How students learn: History, mathematics, and science in the classroom. Washington, DC: The National Academies Press.

Minstrell, J., and van Zee, E.H. (2003). Using questioning to assess and foster student thinking. In J.M. Atkin and J.E. Coffey (Eds.), Everyday assessment in the science classroom. Arlington, VA: National Science Teachers Association.

Mislevy, R.J. (2007). Validity by design. Educational Researcher, 36, 463-469.

Mislevy, R.J., and Haertel, G. (2006). Implications of evidence-centered design for educational testing. Educational Measurement: Issues and Practice, 25, 6-20.

Mislevy, R.J., Almond, R.G., and Lukas, J.F. (2003). A brief introduction to evidence-centered design. Princeton, NJ: Educational Testing Service.

Mislevy, R.J., Steinberg, L.S., and Almond, R.G. (2002). Design and analysis in task-based language assessment. Language Testing, 19, 477-496.

Mitchell, K.J., Murphy, R.F., Jolliffe, D., Leinwand, S., and Hafter, A. (2004). Teacher assignments and student work as measures of opportunity to learn. Menlo Park, CA: SRI International.

Moss, P.A., Pullin, D.C., Gee, J.P., Haertel, E.H., and Young, L.J. (Eds.). (2008). Assessment, equity, and opportunity to learn. Cambridge, UK: Cambridge University Press.

National Academy of Engineering and National Research Council. (2009). Engineering in K-12 education. Committee on K-12 Engineering Education, L. Katehi, G. Pearson, and M. Feder (Eds.). Washington, DC: The National Academies Press.

National Assessment Governing Board. (2009). Science framework for the 2009 national assessment of educational progress. Washington, DC: U.S. Department of Education. Available: http://www.nagb.org/content/nagb/assets/documents/publications/frameworks/science-09.pdf [May 2013].

National Research Council. (1996). National science education standards. National Committee on Science Education Standards and Assessment, Board on Science Education, Division of Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

National Research Council. (2000). How people learn: Brain, mind, experience, and school. Committee on Developments in the Science of Learning. J.D. Bransford, A.L. Brown, and R.R. Cocking (Eds.). Committee on Learning Research and Educational Practice, M.S. Donovan, J.D. Bransford, and J.W. Pellegrino (Eds.). Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

National Research Council. (2001). Knowing what students know: The science and design of educational assessment. Committee on the Foundations of Assessment. J.W. Pellegrino, N. Chudowsky, and R. Glaser (Eds.). Board on Testing and Assessment, Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

National Research Council. (2003). Assessment in support of instruction and learning: Bridging the gap between large-scale and classroom assessment: Workshop report. Committee on Assessment in Support of Instruction and Learning. Board on Testing and Assessment, Committee on Science Education K-12, Mathematical Sciences Education Board, Center for Education. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

National Research Council. (2005). America’s lab report: Investigations in high school science. Committee on High School Science Laboratories: Role and Vision, S.R. Singer, M.L. Hilton, and H.A. Schweingruber (Eds.). Board on Science Education, Center for Education. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

National Research Council. (2006). Systems for state science assessment. Committee on Test Design for K-12 Science Achievement. M.R. Wilson and M.W. Bertenthal (Eds.). Board on Testing and Assessment, Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

National Research Council. (2007). Taking science to school: Learning and teaching science in grades K-8. Committee on Science Learning, Kindergarten Through Eighth Grade. R.A. Duschl, H.A. Schweingruber, and A.W. Shouse (Eds.). Board on Science Education, Center for Education. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

National Research Council. (2009). Learning science in informal environments: People, places, and pursuits. Committee on Learning Science in Informal Environments. P. Bell, B. Lewenstein, A.W. Shouse, and M.A. Feder (Eds.). Board on Science Education, Center for Education. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

National Research Council. (2010). State assessment systems: Exploring best practices and innovations: Summary of two workshops. A. Beatty, Rapporteur, Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards. Board on Testing and Assessment. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

National Research Council. (2011a). Expanding underrepresented minority participation: America’s science and technology talent at the crossroads. Committee on Underrepresented Groups and the Expansion of the Science and Engineering Workforce Pipeline. Washington, DC: The National Academies Press.

National Research Council. (2011b). Incentives and test-based accountability in education. M. Hout and S.W. Elliott (Eds.). Committee on Incentives and Test-Based Accountability in Public Education. Board on Testing and Assessment, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

National Research Council. (2012a). A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. Committee on Conceptual Framework for the New K-12 Science Education Standards. Board on Science Education. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

National Research Council. (2012b). Monitoring progress toward successful K-12 STEM education: A nation advancing? Committee on the Evaluation Framework for Successful K-12 STEM Education. Board on Science Education and Board on Testing and Assessment, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

National Task Force on Teacher Education in Physics. (2010). National task force on teacher education in physics: Report synopsis. Available: http://www.compadre.org/Repository/document/ServeFile.cfm?ID=9845&DocID=1498 [September 2013].

Nehm, R., and Härtig, H. (2011). Human vs. computer diagnosis of students’ natural selection knowledge: Testing the efficacy of text analytic software. Journal of Science Education and Technology, 21(1), 56-73.

Newmann, F.M., Lopez, G., and Bryk, A. (1998). The quality of intellectual work in Chicago schools: A baseline report. Prepared for the Chicago Annenberg Research Project. Chicago, IL: Consortium on Chicago School Research. Available: https://ccsr.uchicago.edu/sites/default/files/publications/p0f04.pdf [April 2014].

Newmann, F. M., and Associates. (1996). Authentic achievement: Restructuring schools for intellectual quality. San Francisco, CA: Jossey-Bass.

NGSS Lead States. (2013). Next generation science standards: For states, by states. Washington, DC: Achieve, Inc. on behalf of the twenty-six states and partners that collaborated on the NGSS.

OECD. (2011). Quality time for students: Learning in and out of school. Paris, France: Author.

Pashler, H., Bain, P., Bottge, B., Graesser, A., Koedinger, K., McDaniel, M., and Metcalfe, J. (2007). Organizing instruction and study to improve student learning (NCER 2007-2004). Washington, DC: National Center for Education Research, Institute of Education Sciences, U.S. Department of Education.

Pecheone, R., Kahl, S., Hamma, J., and Jaquith, A. (2010). Through a looking glass: Lessons learned and future directions for performance assessment. Stanford, CA: Stanford University, Stanford Center for Opportunity Policy in Education.

Pellegrino, J.W. (2013). Proficiency in science: Assessment challenges and opportunities. Science, 340(6130), 320-323.

Pellegrino, J.W., and Quellmalz, E.S. (2010-2011). Perspectives on the integration of technology and assessment. Journal of Research on Technology in Education, 43(3), 119-134.

Pellegrino, J.W., DiBello, L.V., and Brophy, S.P. (2014). The science and design of assessment in engineering education. In A. Johri and B.M. Olds (Eds.), Cambridge handbook of engineering education research (Ch. 29). Cambridge, UK: Cambridge University Press.

Penuel, W.R., Moorthy, S., DeBarger, A.H., Beauvineau, Y., and Allison, K. (2012). Tools for orchestrating productive talk in science classrooms. Workshop presented at the International Conference for Learning Sciences, Sydney, Australia.

Perie, M., Marion, S., Gong, B., and Wurtzel, J. (2007). The role of interim assessment in a comprehensive assessment system. Washington, DC: The Aspen Institute.

Peters, V., Dewey, T., Kwok, A., Hammond, G.S., and Songer, N.B. (2012). Predicting the impacts of climate change on ecosystems: A high school curricular module. The Earth Scientist, 28(3), 33-37.

Queensland Studies Authority. (2010a). Moderation handbook for authority subjects. The State of Queensland: Author.

Queensland Studies Authority. (2010b). School-based assessment: The Queensland system. The State of Queensland: Author.

Quellmalz, E.S., Timms, M.J., and Buckley, B.C. (2009). Using science simulations to support powerful formative assessments of complex science learning. Washington, DC: WestEd.

Quellmalz, E.S., Timms, M.J., Silberglitt, M.D., and Buckley, B.C. (2012). Science assessments for all: Integrating science simulations into balanced state science assessment systems. Journal of Research in Science Teaching, 49(3), 363-393.

Reiser, B.J. (2004). Scaffolding complex learning: The mechanisms of structuring and problematizing student work. Journal of the Learning Sciences, 13(3), 273-304.

Reiser, B.J. et al. (2013). Unpublished data from IQWST 6th grade classroom, collected by Northwestern University Science Practices project, PI Brian Reiser.

Rennie Center for Education Research and Policy. (2008). Opportunity to learn audit: High school science. Cambridge, MA: Author. Available: http://renniecenter.issuelab.org/resource/opportunity_to_learn_audit_high_school_science [December 2013].

Rosebery, A., and Warren, B. (Eds.). (2008). Teaching science to English language learners: Building on students’ strengths. Arlington, VA: National Science Teachers Association.

Rosebery, A., Ogonowski, M., DiSchino, M., and Warren, B. (2010). The coat traps all the heat in your body: Heterogeneity as fundamental to learning. Journal of the Learning Sciences, 19(3), 322-357.

Ruiz-Primo, M.A., Shavelson, R.J., Hamilton, L.S., and Klein, S. (2002). On the evaluation of systemic science education reform: Searching for instructional sensitivity. Journal of Research in Science Teaching, 39(5), 369-393.

Rutstein, D., and Haertel, G. (2012). Scenario-based, technology-enhanced, large-scale science assessment task. Paper prepared for the Technology-Enhanced Assessment Symposium, Educational Testing Service, SRI International, April 9.

Scalise, K. (2009). Computer-based assessment: “Intermediate constraint” questions and tasks for technology platforms. Available: http://pages.uoregon.edu/kscalise/taxonomy/taxonomy.html [October 2013].

Scalise, K. (2011). Intermediate constraint taxonomy and automated scoring approaches. Session presented for Colloquium on Machine Scoring: Specification of Domains, Tasks/Tests, and Scoring Models, the Center for Assessment, May 25-26, Boulder, CO.

Scalise, K., and Gifford, B.R. (2006). Computer-based assessment in e-learning: A framework for constructing “intermediate constraint” questions and tasks for technology platforms. Journal of Technology, Learning, and Assessment, 4(6).

Schmeiser, C.B., and Welch, C.J. (2006). Test development. In R.L. Brennan (Ed.), Educational measurement (4th ed.). Westport, CT: American Council on Education and Praeger.

Schmidt, W. et al. (1999). Facing the consequences: Using TIMSS for a closer look at United States mathematics and science education. Hingham, MA: Kluwer Academic.

Schwartz, R., Ayers, E., and Wilson, M. (2011). Mapping a learning progression using unidimensional and multidimensional item response models. Paper presented at the International Meeting of the Psychometric Society, Hong Kong.

Schwarz, C.V., Reiser, B.J., Davis, E.A., Kenyon, L., Acher, A., Fortus, D., Shwartz, Y., Hug, B., and Krajcik, J. (2009). Developing a learning progression for scientific modeling: Making scientific modeling accessible and meaningful for learners. Journal of Research in Science Teaching, 46(6), 632-654.

Science Education for Public Understanding Program. (1995). Issues, evidence and you: Teacher’s guide. Berkeley: University of California, Lawrence Hall of Science.

Shavelson, R.J., Baxter, G.P., and Gao, X. (1993). Sampling variability of performance assessments. Journal of Educational Measurement, 30(3), 215-232.

Shaw, J.M., Bunch, G.C., and Geaney, E.R. (2010). Analyzing language demands facing English learners on science performance assessments: The SALD framework. Journal of Research in Science Teaching, 47(8), 908-928.

Shwartz, Y., Weizman, A., Fortus, D., Krajcik, J., and Reiser, B. (2008). The IQWST experience: Using coherence as a design principle for a middle school science curriculum. The Elementary School Journal, 109(2), 199-219.

Smith, C., Wiser, M., Anderson, C., and Krajcik, J. (2006). Implications of research on children’s learning for standards and assessment: A proposed learning progression for matter and atomic-molecular theory. Measurement: Interdisciplinary Research and Perspectives, 4(1-2), 1-98.

Solano-Flores, G., and Li, M. (2009). Generalizability of cognitive interview-based measures across cultural groups. Educational Measurement: Issues and Practice, 28(2), 9-18.

Solano-Flores, G., and Nelson-Barber, S. (2001). On the cultural validity of science assessments. Journal of Research in Science Teaching, 38(5), 553-573.

Songer, N.B. et al. (2013). Unpublished resource material from University of Michigan.

Songer, N.B., Kelcey, B., and Gotwals, A.W. (2009). How and when does complex reasoning occur? Empirically driven development of a learning progression focused on complex reasoning about biodiversity. Journal of Research in Science Teaching, 46(6), 610-631.

SRI International. (2013). Student assessment for everchanging earth unit. Menlo Park, CA: Author.

Stecher, B.M., and Klein, S.P. (1997). The cost of science performance assessments in large-scale testing programs. Educational Evaluation and Policy Analysis, 19(1), 1-14.

Steinberg, L.S., Mislevy, R.J., Almond, R.G., Baird, A.B., Cahallan, C., DiBello, L.V., Senturk, D., Yan, D., Chernick, H., and Kindfield, A.C.H. (2003). Introduction to the Biomass project: An illustration of evidence-centered assessment design and delivery capability. CSE Report 609. Los Angeles: University of California Center for the Study of Evaluation.

Steinhauer, E., and Koster van Groos, J. (2013). PISA 2015: Scientific literacy. Available: http://www.k12center.org/rsc/pdf/s3_vangroos.pdf [December 2013].

Stiggins, R.J. (1987). The design and development of performance assessments. Educational Measurement: Issues and Practice, 6, 33-42.

Sudweeks, R.R., and Tolman, R.R. (1993). Empirical versus subjective procedures for identifying gender differences in science test items. Journal of Research in Science Teaching, 30(1), 3-19.

Thompson, S.J., Johnstone, C.J., and Thurlow, M.L. (2002). Universal design applied to large scale assessments. Synthesis Report 44. Minneapolis: University of Minnesota.

Topol, B., Olson, J., and Roeber, E. (2010). The cost of new higher quality assessments: A comprehensive analysis of the potential costs for future state assessments. Stanford, CA: Stanford University, Stanford Center for Opportunity Policy in Education.

Topol, B., Olson, J., Roeber, E., and Hennon, P. (2013). Getting to higher-quality assessments: Evaluating costs, benefits, and investment strategies. Stanford, CA: Stanford University, Stanford Center for Opportunity Policy in Education.

Tzou, C.T., and Bell, P. (2010). Micros and me: Leveraging home and community practices in formal science instruction. In K. Gomez, L. Lyons, and J. Radinsky (Eds.), Proceedings of the 9th International Conference of the Learning Sciences (pp. 1135-1143). Chicago, IL: International Society of the Learning Sciences.

Tzou, C.T., Bricker, L.A., and Bell, P. (2007). Micros and me: A fifth-grade science exploration into personally and culturally consequential microbiology. Seattle: Everyday Science and Technology Group, University of Washington.

U.S. Census Bureau. (2012). The 2012 statistical abstract. Available: http://www.census.gov/compendia/statab/cats/education.html [September 2013].

Wainer, H. (2003). Reporting test results in education. In R. Fernández-Ballesteros (Ed.), Encyclopedia of psychological assessment (pp. 818-826). London: Sage.

Warren, B., Ballenger, C., Ogonowski, M., Rosebery, A., and Hudicourt-Barnes, J. (2001). Rethinking diversity in learning science: The logic of everyday sensemaking. Journal of Research in Science Teaching, 38, 1-24.

Warren, B., Ogonowski, M., and Pothier, S. (2005). “Everyday” and “scientific”: Re-thinking dichotomies in modes of thinking in science learning. In R. Nemirovsky, A. Rosebery, J. Solomon, and B. Warren (Eds.), Everyday matters in science and mathematics: Studies of complex classroom events (pp. 119-148). Mahwah, NJ: Lawrence Erlbaum Associates.

Wiggins, G., and McTighe, J. (2005). Understanding by design: Expanded 2nd edition. New York: Pearson.

Williamson, D.M., Mislevy, R.J., and Bejar, I.I. (Eds.). (2006). Automated scoring of complex tasks in computer-based testing. Mahwah, NJ: Lawrence Erlbaum Associates.

Williamson, D.M., Xi, X., and Breyer, F.J. (2012). A framework for evaluation and use of automated scoring. Educational Measurement: Issues and Practice, 31(1), 2-13.

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Lawrence Erlbaum Associates.

Wilson, M. (2009). Measuring progressions: Assessment structures underlying a learning progression. Journal of Research in Science Teaching, 46(6), 716-730.

Wilson, M., and Draney, K. (2002). A technique for setting standards and maintaining them over time. In S. Nishisato, Y. Baba, H. Bozdogan, and K. Kanefuji (Eds.), Measurement and multivariate analysis (pp. 325-332). Proceedings of the International Conference on Measurement and Multivariate Analysis, Banff, Canada, May 12-14, 2000. Tokyo: Springer-Verlag.

Wilson, M., and Scalise, K. (2012). Measuring collaborative digital literacy. Paper presented at the Invitational Research Symposium on Technology Enhanced Assessments, May 7-8, Washington, DC. Available: http://www.k12center.org/rsc/pdf/session5-wilsonpaper-tea2012.pdf [April 2014].

Wilson, M., and Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement in Education, 13(2), 181-208.

Wilson, M., and Wang, W. (1995). Complex composites: Issues that arise in combining different modes of assessment. Applied Psychological Measurement, 19(1), 51-72.

Wilson, M. et al. (2013). Unpublished data from the BEAR Center at the University of California, Berkeley.

Wood, W.B. (2009). Innovations in teaching undergraduate biology and why we need them. Annual Review of Cell and Developmental Biology, 25, 93-112.

Yen, W.M., and Ferrara, S. (1997). The Maryland school performance assessment program: Performance assessment with psychometric quality suitable for high-stakes usage. Educational and Psychological Measurement, 57, 60-84.

Young, B.J., and Lee, S.K. (2005). The effects of a kit-based science curriculum and intensive professional development on elementary student science achievement. Journal of Science Education and Technology, 14(5/6), 471-481.
