ACT, Inc. (2012). Foreign Service Officer Test: Study Guide, 5th Edition. Iowa City: ACT, Inc.
Alderson, C. (2000). Assessing Reading. Cambridge, UK: Cambridge University Press.
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (2014). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
Bachman, L.F. (1990). Fundamental Considerations in Language Testing. Oxford, UK: Oxford University Press.
Bachman, L.F. (2005). Building and supporting a case for test use. Language Assessment Quarterly 2:1–34.
Bachman, L.F. (2007). What is the construct? The dialectic of abilities and contexts in defining constructs in language assessment. In J. Fox, M. Wesche, D. Bayliss, L. Cheng, C. Turner, and C. Doe (Eds.), Language Testing Reconsidered (pp. 41–71). Ottawa, Canada: University of Ottawa Press.
Bachman, L.F., and A. Palmer. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford, UK: Oxford University Press.
Bachman, L.F., and A. Palmer. (2010). Language Assessment in Practice. Oxford, UK: Oxford University Press.
Baker, B., and A. Hope. (2019). Incorporating translanguaging in language assessment: The case of a test for university professors. Language Assessment Quarterly 16(4-5):408–425.
Barnett, E.A., P. Bergman, E. Kopko, V. Reddy, C.R. Belfield, and S. Roy. (2018). Multiple Measures Placement Using Data Analytics: An Implementation and Early Impacts Report. Center for the Analysis of Postsecondary Readiness. Available: https://ccrc.tc.columbia.edu/publications/multiple-measures-placement-using-data-analytics.html.
Bernstein, J.C. (2013). Computer scoring of spoken responses. In C.A. Chapelle (Ed.), The Encyclopedia of Applied Linguistics (pp. 1–7). New York: Blackwell Publishing.
Brannick, M.T., K. Pearlman, and J.I. Sanchez. (2017). Work analysis. In J.L. Farr and N.T. Tippins (Eds.), Handbook of Employee Selection (pp. 134–162). New York: Routledge.
Brindley, G. (1994). Task-centered assessment in language learning: The promise and the challenge. In N. Bird, P. Falvey, A.B.M. Tsui, D. Allison, and A. McNeill (Eds.), Language and Learning (pp. 73–94). Hong Kong: Institute of Language in Education, Hong Kong Department of Education.
Brown, J.D., and T. Hudson. (1998). The alternatives in language assessment. TESOL Quarterly 32(4):653–675. doi:10.2307/3587999.
Brown, J.D., T. Hudson, J. Norris, and W. Bonk. (2002). An Investigation of Second Language Task-based Performance Assessments. Honolulu, HI: University of Hawaii Press.
Buck, G. (2001). Assessing Listening. Cambridge, UK: Cambridge University Press.
Canagarajah, S. (2006). Changing communicative needs, revised assessment objectives: Testing English as an international language. Language Assessment Quarterly 3(3):229–242.
Canale, M. (1983). On some dimensions of language proficiency. In J. Oller (Ed.), Issues in Language Testing Research (pp. 333–342). Rowley, MA: Newbury House.
Canale, M., and M. Swain. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics 1(1):1–47.
Carroll, J.B. (1961). Fundamental considerations in testing English proficiency of foreign students. In Testing the English Proficiency of Foreign Students (pp. 30–40). Washington, DC: Center for Applied Linguistics.
Cenoz, J. (2013). Defining multilingualism. Annual Review of Applied Linguistics 33:3–18.
Chalhoub-Deville, M. (2003). Second language interaction: Current perspectives and future trends. Language Testing 20(4):369–383.
Chapelle, C. (1998). Construct definition and validity inquiry in SLA research. In L. Bachman and A.D. Cohen (Eds.), Interfaces between Second Language Acquisition and Language Testing Research (pp. 32–70). Cambridge, UK: Cambridge University Press.
Chen, L., K. Zechner, S.Y. Yoon, K. Evanini, X. Wang, A. Loukina, and B. Gyawali. (2018). Automated scoring of nonnative speech using the SpeechRater℠ v. 5.0 engine. ETS Research Report Series 18(10):1–31. Available: https://doi.org/10.1002/ets2.12198.
Chester, M.D. (2005). Making valid and consistent inferences about school effectiveness from multiple measures. Educational Measurement: Issues and Practice 24(4):40–52.
Chun, C.W. (2008). Comments on “Evaluation of the usefulness of the Versant for English test: A response”: The author responds. Language Assessment Quarterly 5(2):168–172. doi.org/10.1080/15434300801934751.
Ciechanowski, L., A. Przegalinska, M. Magnuski, and P. Gloor. (2019). In the shades of the uncanny valley: An experimental study of human-chatbot interaction. Future Generation Computer Systems 92:539–548.
Cizek, G.J. (Ed.). (2012). Setting Performance Standards: Concepts, Methods, and Perspectives, 2nd Edition. New York: Routledge.
Cizek, G.J., and M.B. Bunch. (2007). Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests. Thousand Oaks, CA: Sage.
Clark, J.L.D. (1972). Foreign Language Testing: Theory and Practice. Philadelphia: Center for Curriculum Development.
Council of Europe. (2018). Common European Framework of Reference for Languages: Learning, Teaching, Assessment; Companion Volume with New Descriptors. Available: https://rm.coe.int/cefr-companion-volume-with-new-descriptors-2018/1680787989.
Cumming, A. (2013). Assessing integrated writing tasks for academic purposes: Promises and perils. Language Assessment Quarterly 10(1):1–8.
Cummins, P.W., and C. Davesne. (2009). Using electronic portfolios for second language assessment. The Modern Language Journal 93:848–867. doi.org/10.1111/j.1540-4781.2009.00977.x.
Cushing-Weigle, S.C. (2004). Integrating reading and writing in a competency test for nonnative speakers of English. Assessing Writing 9(1):27–55.
Davies, A. (2003). The Native Speaker: Myth and Reality. Tonawanda, NY: Multilingual Matters Ltd.
Davis, L. (2009). The influence of interlocutor proficiency in a paired oral assessment. Language Testing 26(3):367–396. Available: https://doi.org/10.1177/0265532209104667.
Dewaele, J.M. (2018). Why the dichotomy ‘L1 versus LX user’ is better than ‘native versus non-native speaker’. Applied Linguistics 39(2):236–240.
Dorsey, D.W. (2005). The portfolio as a multipurpose tool: Part 1–using the portfolio for leadership development. In R. Mueller-Hanson and D. Dorsey (Chairs), The Portfolio: An Innovative Approach to Assessment, Development, and Evaluation. Practitioner Forum conducted at the Twentieth Annual Conference of the Society for Industrial and Organizational Psychology, Los Angeles, California.
Douglas, D. (2000). Assessing Languages for Specific Purposes. Cambridge, UK: Cambridge University Press.
Douglas, K.M., and R.J. Mislevy. (2010). Estimating classification accuracy for complex decision rules based on multiple scores. Journal of Educational and Behavioral Statistics 35(3):280–306.
Ducasse, A.M., and A. Brown. (2009). Assessing paired orals: Raters’ orientation to interaction. Language Testing 26(3):423–443. doi.org/10.1177/0265532209104669.
East, M. (2015). Coming to terms with innovative high-stakes assessment practice: Teachers’ viewpoints on assessment reform. Language Testing 32(1):101–120. doi.org/10.1177/0265532214544393.
East, M. (2016). Assessing Foreign Language Students’ Spoken Proficiency: Stakeholder Perspectives on Assessment Innovation. New York: Springer.
East, M., and A. Scott. (2011). Assessing the foreign language proficiency of high school students in New Zealand: From the traditional to the innovative. Language Assessment Quarterly 8(2):179–189. doi.org/10.1080/15434303.2010.538779.
Ferrara, S., E. Lai, A. Reilly, and P.D. Nichols. (2017). Principled approaches to assessment design, development and implementation. In A.A. Rupp and J.P. Leighton (Eds.), The Handbook of Cognition and Assessment, Frameworks, Methodologies and Applications (pp. 41–74). West Sussex, UK: Wiley.
Figueras, N., and J. Noijons (Eds.). (2009). Linking to the CEFR Levels: Research Perspectives. Arnhem: CITO and EALTA. Available: http://www.coe.int/t/dg4/linguistic/Proceedings_CITO_EN.pdf.
Forscher, P.S., C.K. Lai, J. Axt, C.R. Ebersole, M. Herman, P.G. Devine, and B.A. Nosek. (2016, August 15; preprint). A Meta-Analysis of Procedures to Change Implicit Measures. doi.org/10.31234/osf.io/dv8tu.
Friedrich, P. (Ed.). (2016). English for Diplomatic Purposes. Bristol, UK: Multilingual Matters.
Fulcher, G., and F. Davidson (Eds.). (2012). The Routledge Handbook of Language Testing. New York: Routledge.
Gass, S., and M. Varonis. (1984). The effect of familiarity on the comprehensibility of nonnative speech. Language Learning 34(1):65–89.
Gatewood, R.D., H.S. Feild, and M.R. Barrick. (2015). Human Resource Selection. Mason, OH: South-Western, Cengage Learning.
Gebril, A., and L. Plakans. (2014). Assembling validity evidence for assessing academic writing: Rater reactions to integrated tasks. Assessing Writing 21:56–73.
Gorter, D., and J. Cenoz. (2017). Language education policy and multilingual assessment. Language and Education 31(3):231–248.
Green, A. (2014). Exploring Language Assessment and Testing. Routledge Introductions to Applied Linguistics. New York: Routledge.
Greenwald, A.G., D.E. McGhee, and J.L. Schwartz. (1998). Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality and Social Psychology 74(6):1464–1480.
Griffin, P., B. McGraw, and E. Care (Eds.). (2012). Assessment and Teaching of 21st Century Skills. New York: Springer Science+Business Media.
Hambleton, R.K., and M.J. Pitoniak. (2006). Setting performance standards. In R.L. Brennan (Ed.), Educational Measurement (4th ed., pp. 433–470). Westport, CT: Praeger.
Hambleton, R.K., and A.L. Zenisky. (2013). Reporting test scores in more meaningful ways: A research-based approach to score report design. In K.F. Geisinger, B.A. Bracken, J.F. Carlson, J.-I.C. Hansen, N.R. Kuncel, S.P. Reise, and M.C. Rodriguez (Eds.), APA Handbooks in Psychology®. APA Handbook of Testing and Assessment in Psychology, Vol. 3. Testing and Assessment in School Psychology and Education (pp. 479–494). American Psychological Association. Available: https://doi.org/10.1037/14049-023.
Hart-Gonzalez, L. (1994). Raters and Scales in Oral Proficiency Testing: The FSI Experience. Paper presented at the Annual Language Testing Research Colloquium, Washington, DC. Available: https://www.semanticscholar.org/paper/Raters-and-Scales-in-OralProficiency-Testing%3A-The-Hart-Gonz%C3%A1lez/ff6a160d3dcc091f7d78a1db6a308572123333c7.
Herman, J.L., M. Gearhart, and E.L. Baker. (1993). Assessing writing portfolios: Issues in the validity and meaning of scores. Educational Assessment 1(3):201–224. Available: http://dx.doi.org/10.1207/s15326977ea0103_2.
Housen, A., F. Kuiken, and I. Vedder. (2012). Dimensions of L2 Performance and Proficiency: Complexity, Accuracy and Fluency in SLA. Amsterdam: John Benjamins.
Hu, G. (2018). The challenges of world Englishes for assessing English proficiency. In E.L. Low and A. Pakir (Eds.), World Englishes: Rethinking Paradigms (pp. 78–95). New York: Routledge.
In’nami, Y., and R. Koizumi. (2011). Structural equation modeling in language testing and learning research: A review. Language Assessment Quarterly 8(3):250–276.
Institute for Credentialing Excellence. (2016). National Commission for Certifying Agencies (NCCA) Standards for the Accreditation of Certification Programs. Available: https://www.credentialingexcellence.org/ncca.
International Test Commission. (2001). International guidelines for test use. International Journal of Testing 1(2):93–114.
International Test Commission. (2018). ITC Guidelines for the Large-scale Assessment of Linguistically and Culturally Diverse Populations. Available: www.InTestCom.org.
Isaacs, T. (2008). Towards defining a valid assessment criterion of pronunciation proficiency in non-native English-speaking graduate students. Canadian Modern Language Review 64(4):555–580.
Isbell, D., and P. Winke. (2019). ACTFL Oral Proficiency Interview – computer (OPIc). Language Testing 36(3):467–477. doi.org/10.1177/0265532219828253.
Jenkins, J. (2006). Current perspectives on teaching world Englishes and English as a lingua franca. TESOL Quarterly 40(1):157–181. doi.org/10.2307/40264515.
Jonsson, A. (2014). Rubrics as a way of providing transparency in assessment. Assessment & Evaluation in Higher Education 39(7):840–852.
Kachru, B.B. (1996). The paradigms of marginality. World Englishes 15(3):241–255. doi.org/10.1111/j.1467-971X.1996.tb00112.x.
Kane, M.T. (2006). Validation. In R. L. Brennan (Ed.), Educational Measurement (pp. 17–64). Westport: American Council on Education/Praeger Publishers.
Kelly, J., J. Renn, and J. Norton. (2018). Addressing consequences and validity during test design and development: Implementing the CAL Validation Framework. In J.E. Davies, J.M. Norris, M.E. Malone, T.H. McKay, and Y. Son (Eds.), Useful Assessment and Evaluation in Language Education (pp. 185–200). Washington, DC: Georgetown University Press.
Kenyon, D.M., and V. Malabonga. (2001). Comparing examinee attitudes toward computer-assisted and other oral proficiency assessments. Language Learning and Technology 5(2):60–83.
Knoch, U., and S. Macqueen. (2020). Assessing English for Professional Purposes. Routledge.
Knoch, U., and W. Sitajalabhorn. (2013). A closer look at integrated writing tasks: Towards a more focused definition for assessment purposes. Assessing Writing 18(4):300–308.
Koretz, D. (1998). Large-scale portfolio assessments in the US: Evidence pertaining to the quality of measurement. Assessment in Education 5:309–334.
Koretz, D.M., and L.S. Hamilton. (2006). Testing for Accountability in K–12. In R. L. Brennan (Ed.), Educational Measurement (pp. 531–578). American Council on Education/Praeger Publishers.
Kress, G. (2010). Multimodality: A Social Semiotic Approach to Contemporary Communication. New York: Routledge.
Kunnan, A.J. (2018). Evaluating Language Assessments. New York: Routledge.
Lado, R. (1961). Language Testing. New York: McGraw-Hill.
Lazaraton, A., and L. Davis. (2008). A microanalytic perspective on discourse, proficiency, and identity in paired oral assessment. Language Assessment Quarterly 5(4):313–335. doi.org/10.1080/15434300802457513.
Lei, L., and D. Liu. (2019). Research trends in applied linguistics from 2005 to 2016: A bibliometric analysis and its implications. Applied Linguistics 40(3):540–561.
Lemke, J. (2002). Travels in hypermodality. Visual Communication 1:299–325. doi.org/10.1177/147035720200100303.
Levashina, J., C.J. Hartwell, F.P. Morgeson, and M.A. Campion. (2014). The structured employment interview: Narrative and quantitative review of the research literature. Personnel Psychology 67(1):241–293.
Levinson, S. (1983). Pragmatics. Cambridge, UK: Cambridge University Press.
Little, D. (2011). The common European framework of reference for languages, the European language portfolio, and language learning in higher education. Language Learning in Higher Education 1(1):1–21. doi.org/10.1515/cercles-2011-0001.
Livingston, S.A., and M.J. Zieky. (1982). Passing scores: A manual for setting standards of performance on educational and occupational tests. Princeton, NJ: Educational Testing Service. Available: https://www.ets.org/Media/Research/pdf/passing_scores.pdf.
Long, M.H. (2015). Second Language Acquisition and Task-based Language Teaching. Malden, MA: Wiley Blackwell.
Long, M.H., and J.M. Norris. (2000). Task-based teaching and assessment. In M. Byram (Ed.), Encyclopedia of Language Teaching (pp. 597–603). London, UK: Routledge.
Luecht, R.M., T. Brumfield, and K. Breithaupt. (2006). A testlet assembly design for adaptive multistage tests. Applied Measurement in Education 19(3):189–202. (Special edition on multistage testing.)
Luecht, R.M., and R.J. Nungester. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement 35:229–249.
Luecht, R.M., and S.G. Sireci. (2011). A Review of Models for Computer-Based Testing. Research report 2011–12, College Board. Available: https://files.eric.ed.gov/fulltext/ED562580.pdf.
Luke, S.D., and A. Schwartz. (2007). Assessment and Accommodations, Evidence for Education, National Dissemination Center for Children with Disabilities. Available: https://successforkidswithhearingloss.com/beta/wp-content/uploads/2013/09/Assessment-Accommodations-NICYC.pdf.
Luoma, S. (2004). Assessing Speaking. Cambridge, UK: Cambridge University Press.
Malabonga, V., D.M. Kenyon, and H. Carpenter. (2005). Self-assessment, preparation and response time on a computerized oral proficiency test. Language Testing 22(1):59–92.
May, L. (2009). Co-constructed interaction in a paired speaking test: The rater’s perspective. Language Testing 26(3):397–421. doi.org/10.1177/0265532209104668.
McNamara, T. (1996). Measuring Second Language Performance. New York: Longman.
Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational Measurement, 3rd Ed. (pp. 13–103). New York: American Council on Education/Macmillan.
Messick, S. (1996). Validity and washback in language testing. Language Testing 13:243–256.
Mislevy, R.J. (2018). Sociocognitive Foundations of Educational Measurement. New York/London: Routledge.
Mislevy, R.J., and G. Haertel. (2006). Implications for evidence centered design for educational assessment. Educational Measurement: Issues and Practice 25:6–20.
Mislevy, R.J., L.S. Steinberg, and R.G. Almond. (1999a). Evidence-centered Assessment Design. Princeton, NJ: Educational Testing Service.
Mislevy, R.J., L.S. Steinberg, and R.G. Almond. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives 1(1):3–62. doi.org/10.1207/S15366359MEA0101_02.
Mislevy, R.J., L.S. Steinberg, F.J. Breyer, R.G. Almond, and L. Johnson. (1999b). A cognitive task analysis with implications for designing simulation-based performance assessment. Computers in Human Behavior 15(3-4):335–374.
MLA Ad Hoc Committee on Foreign Language. (2007). Foreign languages and higher education: New structures for a changed world. Profession 12:234–245.
National Council on Measurement in Education. (1995). Code of Professional Responsibilities in Educational Measurement. Washington, DC: Author.
National Research Council. (2001a). Building a Workforce for the Information Economy. Committee on Workforce Needs in Information Technology; Computer Science and Telecommunications Board; Board on Testing and Assessment; Board on Science, Technology, and Economic Policy; and Office of Scientific and Engineering Personnel. Washington, DC: National Academy Press.
National Research Council. (2001b). Knowing What Students Know: The Science and Design of Educational Assessment. Committee on the Foundations of Assessment. J. Pellegrino, N. Chudowsky, and R. Glaser (Eds.). Washington, DC: National Academy Press.
National Research Council. (2008). Assessing Accomplished Teaching: Advanced-level Certification Programs. Committee on Evaluation of Teacher Certification by the National Board for Professional Teaching Standards. M.D. Hakel, J.A. Koenig, and S.W. Elliott (Eds.). Washington, DC: The National Academies Press.
National Research Council. (2014). Developing Assessments for the Next Generation Science Standards. Committee on Developing Assessments of Science Proficiency in K–12. Board on Testing and Assessment and Board on Science Education, J.W. Pellegrino, M.R. Wilson, J.A. Koenig, and A.S. Beatty (Eds.). Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
Newton, J., and E. Kusmierczyk. (2011). Teaching second languages for the workplace. Annual Review of Applied Linguistics 31:74–92. doi.org/10.1017/S0267190511000080.
Nordquist, R. (2020). Definition and Examples of Language Varieties. ThoughtCo. Available: thoughtco.com/language-variety-sociolinguistics-1691100.
Norris, J.M. (2016). Current uses for task-based language assessment. Annual Review of Applied Linguistics 36(13):230–244. doi.org/10.1017/S0267190516000027.
Norris, J.M., J.D. Brown, T.D. Hudson, and W. Bonk. (2002). Examinee abilities and task difficulty in task-based second language performance assessment. Language Testing 19(4):395–418. doi.org/10.1191/0265532202lt237oa.
Norton, J. (2005). The paired format in the Cambridge speaking tests. ELT Journal 59(4):287–297. doi.org/10.1093/elt/cci057.
O’Reilly, T., and J. Sabatini. (2013). Reading for Understanding: How Performance Moderators and Scenarios Impact Assessment Design. Research Report RR-13-31. Princeton, NJ: Educational Testing Service. doi:10.1002/j.2333-8504.2013.tb02338.x.
Ockey, G.J. (2009). The effects of group members’ personalities on a test taker’s L2 group oral discussion test scores. Language Testing 26:161–186. doi.org/10.1177/0265532208101005.
Ockey, G.J., and E. Wagner. (2018a). An overview of interactive listening as part of the construct of interactive and integrated oral test tasks. In G. Ockey and E. Wagner (Eds.), Assessing L2 Listening: Moving towards Authenticity (pp. 179–192). Amsterdam and Philadelphia: John Benjamins.
Ockey, G.J., and E. Wagner. (2018b). Assessing L2 Listening: Moving towards Authenticity. Amsterdam and Philadelphia: John Benjamins.
Oh, S. (2019). Second language learners’ use of writing resources in writing assessment. Language Assessment Quarterly 17(1):60–84. doi.org/10.1080/15434303.2019.1674854.
Organisation for Economic Co-operation and Development. (2003). Education at a Glance: OECD Indicators 2003. Available: http://www.oecd.org/site/worldforum/33703760.pdf.
Organisation for Economic Co-operation and Development. (2016). PISA 2018: Draft Analytical Frameworks. Paris, France: Author.
Organisation for Economic Co-operation and Development. (2018). PISA 2015: Results in Focus. Paris, France: Author.
Oswald, F.L., G. Mitchell, H. Blanton, J. Jaccard, and P.E. Tetlock. (2013). Predicting ethnic and racial discrimination: A meta-analysis of IAT criterion studies. Journal of Personality and Social Psychology 105(2):171–192.
Papageorgiou, S., R.J. Tannenbaum, B. Bridgeman, and Y. Cho. (2015). The Association between TOEFL iBT® Test Scores and the Common European Framework of Reference (CEFR) Levels (Research Memorandum No. RM-15-06). Princeton, NJ: Educational Testing Service. Available: https://www.ets.org/Media/Research/pdf/RM-15-06.pdf.
Pettis, J.C. (2014). Portfolio-based Language Assessment (PBLA): Guide for Teachers and Programs. Available: https://listn.tutela.ca/wp-content/uploads/PBLA_Guide_2014.pdf.
Plakans, L. (2009). Discourse synthesis in integrated second language writing performance. Language Testing 26(4):561–587.
Plakans, L. (2014). Written discourse. In A. Kunnan (Ed.), The Companion to Language Assessment. Somerset, NJ: Wiley and Sons.
Plakans, L., and A. Gebril. (2012). Using multiple texts in an integrated writing assessment: Source text use as a predictor of score. Journal of Second Language Writing 22:217–230.
Pulakos, E., and T. Kantrowitz. (2016). Choosing Effective Talent Assessments to Strengthen Your Organization. Society for Human Resource Management (SHRM) Foundation. Available: https://www.shrm.org/hr-today/trends-and-forecasting/special-reports-and-expert-views/documents/effective-talent-assessments.pdf.
Purpura, J.E. (2004). Assessing Grammar. Cambridge, UK: Cambridge University Press.
Purpura, J.E. (2016). Second and foreign language assessment. The Modern Language Journal 100(S1):190–208.
Purpura, J.E. (2017). Assessing Meaning. In E. Shohamy and L. Or (Eds.), Encyclopedia of Language and Education, Vol. 7. Language Testing and Assessment. New York: Springer International Publishing.
Purpura, J.E. (2019). Questioning the currency of second and foreign language proficiency exams as measures of 21st century competencies. Teachers College, Columbia University, The Arts and Humanities Distinguished Lecture Series, October 10, 2019. Available: https://vimeo.com/367018433.
Purpura, J.E., and C.E. Turner. (2018). Using Learning-oriented Assessment in Test Development. Auckland, New Zealand: Language Testing Research Colloquium.
Purpura, J.E., and J.W. Dakin. (2020). Assessment of the linguistic resources of communication. In C. Chapelle (Ed.), The Concise Encyclopedia of Applied Linguistics: Assessment and Evaluation (pp. 1–10). Oxford, UK: Wiley.
Qian, D.D. (2009). Comparing direct and semi-direct modes for speaking assessment: Affective effects on test takers. Language Assessment Quarterly 6(2):113–125.
Quaid, E. (2018). Output register parallelism in an identical direct and semi-direct speaking test: A case study. International Journal of Computer-assisted Language Learning and Teaching 8(2):75–91.
Ramanarayanan, V., K. Evanini, and E. Tsuprun. (2020). Beyond monologues: Automated processing of conversational speech. In K. Zechner and K. Evanini (Eds.), Automated Speaking Assessment: Using Language Technologies to Score Spontaneous Speech (pp. 176–191). New York: Routledge.
Read, J. (2000). Assessing Vocabulary. Cambridge, UK: Cambridge University Press.
Robinson, P. (2001). Task complexity, cognitive resources, and syllabus design: A triadic framework for examining task influences on SLA. In P. Robinson (Ed.), Cognition and Second Language Instruction (pp. 287–318). Cambridge, UK: Cambridge University Press.
Roever, C., and G. Kasper. (2018). Speaking in turns and sequences: Interactional competence as a target construct in testing speaking. Language Testing 35(3):331–355. doi.org/10.1177/0265532218758128.
Sackett, P.R., P.T. Walmsley, A.J. Koch, A.S. Beatty, and N.R. Kuncel. (2016). Predictor content matters for knowledge testing: Evidence supporting content validation. Human Performance 29(1):54–71.
Schaeffer, G.A., B. Bridgeman, M.L. Golub Smith, C. Lewis, M.T. Potenza, and M. Steffen. (1998). Comparability of paper and pencil and computer adaptive test scores on the GRE® general test. ETS Research Report Series (2):i–25. Available: https://onlinelibrary.wiley.com/doi/pdf/10.1002/j.2333-8504.1998.tb01787.x.
Schissel, J.L., C. Leung, and M. Chalhoub-Deville. (2019). The Construct of Multilingualism in Language Testing. Language Assessment Quarterly 16(4-5):373–378. doi.org/10.1080/15434303.2019.1680679.
Seidlhofer, B. (2009). Common ground and different realities: World Englishes and English as a lingua franca. World Englishes 28(2):236–245. doi.org/10.1111/j.1467-971X.2009.01592.x.
Shavelson, R.J., and N.M. Webb. (1991). Generalizability Theory: A Primer. Newbury Park, CA: Sage.
Shohamy, E. (2011). Assessing multilingual competencies: Adopting construct valid assessment policies. The Modern Language Journal 95:418–429.
Skehan, P. (1998). A Cognitive Approach to Language Teaching. Oxford, UK: Oxford University Press.
Skehan, P. (2003). Task-based instruction. Language Teaching 36(1):1–12.
So, Y., M.K. Wolf, M.C. Hauck, P. Mollaun, P. Rybinski, D. Tumposky, and L. Wang. (2015). TOEFL Junior Design Framework (TOEFL Young Students Research Report No. TOEFL Jr-02). Princeton, NJ: Educational Testing Service.
Society for Industrial and Organizational Psychology. (2018). Principles for the Validation and Use of Personnel Selection Procedures, 5th edition. Bowling Green, OH: Author.
Tannenbaum, R.J., and E.C. Wylie. (2008). Linking English Language Test Scores onto the Common European Framework of Reference: An Application of Standard-setting Methodology. TOEFL iBT Research Report RR-08-34. Princeton, NJ: Educational Testing Service. dx.doi.org/10.1002/j.2333-8504.2008.tb02120.x.
Taylor, L. (Ed.). (2011). Examining Speaking: Research and Practice in Assessing Second Language Speaking. Cambridge, UK: Cambridge University Press.
Taylor, L., and P. Falvey (Eds.). (2007). IELTS Collected Papers: Research in Speaking and Writing Assessment. Cambridge, UK: Cambridge University Press.
Tillema, H., M. Leenknecht, and M. Segers. (2011). Assessing assessment quality: Criteria for quality assurance in design of (peer) assessment for learning–a review of research studies. Studies in Educational Evaluation 37(1):25–34.
Turner, C.E., and J.E. Purpura. (2016). Learning-oriented assessment in second and foreign language classrooms. In D. Tsagari and J. Banerjee (Eds.), Handbook of Second Language Assessment (pp. 255–272). Boston, MA: De Gruyter, Inc.
Van der Linden, W.J., and C.A. Glas (Eds.). (2000). Computerized Adaptive Testing: Theory and Practice. Dordrecht, The Netherlands: Kluwer Academic.
Van Moere, A. (2013). Raters and Ratings. Wiley Online Library.
VanPatten, B., J. Williams, and S. Rott. (2004). Form-meaning Connections in Second Language Acquisition. In B. VanPatten, J. Williams, S. Rott, and M. Overstreet (Eds.), Form-meaning Connections in Second Language Acquisition (pp. 1–26). Mahwah, NJ: Lawrence Erlbaum Associates Publishers.
Wagner, E. (2018). A comparison of L2 listening performance on tests with scripted or authenticated spoken texts. In G. Ockey and E. Wagner (Eds.), Assessing L2 Listening: Moving towards Authenticity (pp. 29–44). Amsterdam and Philadelphia: John Benjamins.
Wainer, H., N.J. Dorans, R. Flaugher, B.F. Green, and R.J. Mislevy. (2000). Computerized Adaptive Testing: A Primer. Oxfordshire, UK, and Philadelphia: Routledge.
Wang, Z., K. Zechner, and Y. Sun. (2018). Monitoring performance of human and automated scores for spoken responses. Language Testing 35(1):101–120. doi.org/10.1177/0265532216679451.
Weigle, S.C. (1998). Using FACETS to model rater training effects. Language Testing 15(2):263–287.
Weigle, S.C. (2002). Assessing Writing. Cambridge, UK: Cambridge University Press.
Winke, P. (2013). The effectiveness of interactive group oral for placement testing. In K. McDonough and A. Mackey (Eds.), Second Language Interaction in Diverse Educational Contexts (pp. 247–268). New York: John Benjamins.
Wolfram, W., C. Temple Adger, and D. Christian. (1999). Dialects in Schools and Communities. Mahwah, NJ: Erlbaum.
Yamamoto, K., L. Khorramdel, and H.J. Shin. (2018). Introducing multistage adaptive testing into international large-scale assessments designs using the example of PIAAC. Psychological Test and Assessment Modeling 60(3):347–368.
Yan, D., A.A. von Davier, and C. Lewis (Eds.). (2014). Computerized Multistage Testing: Theory and Applications. Boca Raton, FL: CRC Press.
Zapata-Rivera, D. (Ed.). (2018). Score Reporting Research and Applications. New York / Oxon, UK: Routledge.
Zenisky, A., R.K. Hambleton, and R.M. Luecht. (2010). Multistage testing: Issues, designs, and research. In W.J. van der Linden and C.A.W. Glas (Eds.), Elements of Adaptive Testing, Statistics for Social and Behavioral Sciences. Available: https://link.springer.com/content/pdf/10.1007%2F978-0-387-85461-8.pdf.
Zhang, Y., and C. Elder. (2009). Measuring the speaking proficiency of advanced EFL learners in China: The CET-SET solution. Language Assessment Quarterly 6(4):298–314. doi.org/10.1080/15434300902990967.
Zuengler, J., and E. Miller. (2006). Cognitive and sociocultural perspectives: Two parallel SLA worlds? TESOL Quarterly 40(1):35–58.