As illustrated in the committee’s guiding framework (see Figure 1-1, in Chapter 1), the design for a high-stakes language assessment for use in a professional setting starts from an understanding of the nature of language and its use, the broader sociocultural and institutional contexts for the assessment, and the specific language use in the domain that will be targeted for the assessment. This chapter discusses some of the key concepts and techniques that inform these understandings.
The knowledge, skills, abilities, and other characteristics that are the focus of an assessment are described in terms of a “construct.” A construct is an abstract, theoretical concept, such as “knowledge” or “proficiency,” that has to be explicitly described and specified in test design. This definition usually comes from a mix of theory, research, and experience.
Construct definition plays a central role in a principled approach to assessment design and use. The goal of defining the construct is to provide a basis not only for the development of a given assessment but also for the interpretation and use of its results. For FSI, the construct will relate to descriptions of the language proficiency of Foreign Service officers who need to use a given language at a foreign post.
Conceptualizations of language and language proficiency become more nuanced over time, so every testing program needs to periodically revisit its construct definitions. Since the 1960s, approaches to construct definition have evolved to reflect broadened conceptions of language and language
use. They also reflect ongoing refinements in language assessment theory, advances in theories of learning and knowing, especially with respect to context and the social dimension of language, and the changing nature of language assessment in light of advances in technology (Bachman, 2007; Purpura, 2016). These refinements have had important consequences for operationalizing the construct of language proficiency and conceptualizing and justifying validity claims, and are, to varying degrees, reflected in current language assessments.
To address FSI’s desire to keep pace with developments in language assessment, this section summarizes four key approaches to defining language proficiency and their implications for the design and operationalization of test constructs and for the meaningful interpretation of performance on a test: trait-based, interactionist, meaning-oriented, and task-based. This summary illustrates the expansion of the construct of language proficiency over time, but the committee is not suggesting that all assessments should use the broadest measure possible. Rather, we call attention to the many different factors that can be considered in an assessment, depending on its intended goals and uses, and highlight the importance of being explicit about these factors and their contribution to performance. Such careful attention to the intended construct will allow for an accurate representation of a scored performance and its meaningful interpretation.
Probably the oldest and most common approach to defining the construct of language proficiency is to specify in a theoretical model how the trait of language proficiency is represented in a test-taker’s mind. This is done by identifying the knowledge components—such as grammatical knowledge—that underlie a test-taker’s proficiency and then designing tasks that measure those components (“traits”). Lado (1961) used this approach to conceptualize language proficiency as the ability to use language elements or forms (grammatical forms, lexical meanings) to understand and express meanings through listening, reading, speaking, and writing. Carroll (1961) expanded this conception to include not only how individuals communicate but also what they communicate in a complete utterance.
Knowledge of the mapping between form and meaning is still a critical component of language use (VanPatten et al., 2004), and it is the basis for grammatical assessment in tests designed to measure grammatical ability (e.g., the Oxford Online Placement Exam1). It has also been a central feature of scoring rubrics (scoring instructions and criteria) of language proficiency that have an independent language use scale (e.g., the TOEFL iBT test2); rubrics that have grammatical-level performance descriptors (such as in the skill-level descriptions of the Interagency Language Roundtable [ILR], used by FSI and discussed in Chapter 2); and approaches to the automatic scoring of speaking and writing (Purpura, 2004, 2016). Knowledge of this mapping is also reflected in the widely used "CAF" measures, which incorporate measures of three related but separable traits: complexity, accuracy, and fluency in speaking or writing (Housen et al., 2012). However, this conceptualization fails to resolve the potential vagueness and ambiguity of meaning often found in language.
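To make the CAF traits concrete, the sketch below computes one common index for each trait: words per clause for complexity, the proportion of error-free clauses for accuracy, and words per minute for fluency. These particular operationalizations are illustrative assumptions on our part; the CAF literature uses many alternative measures (Housen et al., 2012).

```python
def caf_indices(words, clauses, error_free_clauses, minutes):
    """Compute illustrative CAF (complexity, accuracy, fluency) indices.

    The three operationalizations here (words per clause, proportion of
    error-free clauses, words per minute) are common but simplified choices;
    they are not the only, or necessarily the best, measures of each trait.
    """
    return {
        "complexity": words / clauses,            # mean clause length as a syntactic-complexity proxy
        "accuracy": error_free_clauses / clauses, # share of clauses with no errors
        "fluency": words / minutes,               # speech or writing rate
    }

# A hypothetical two-minute speaking sample: 180 words in 24 clauses,
# 18 of which contain no errors.
sample = caf_indices(words=180, clauses=24, error_free_clauses=18, minutes=2.0)
# -> {'complexity': 7.5, 'accuracy': 0.75, 'fluency': 90.0}
```

Note that the three indices are deliberately kept separable: a test taker can be highly fluent yet inaccurate, which is exactly why CAF treats them as related but distinct traits.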
Notable expansions of the language proficiency trait beyond grammatical rules and vocabulary include communicative competence (Canale, 1983; Canale and Swain, 1980) and communicative language ability (Bachman, 1990; Bachman and Palmer, 1996, 2010), which incorporate additional components to the language use model, such as knowledge of how to use language to achieve a functional goal or how to use language appropriately in social contexts with a diverse range of interlocutors. Bachman’s communicative language ability model specifies grammatical knowledge, textual knowledge, functional knowledge, and sociolinguistic knowledge. It has been used, for example, to guide the development of the Canadian Language Benchmarks Standards for adults learning English as a second language.3 Alongside language knowledge, this model also specifies the role that strategic processing plays in the ability to complete language-related tasks, which underlies the examinee’s ability to consider context, content, language, and dispositions while generating responses during a test, all considerations in the skill-level descriptions used by FSI.
Despite its strengths, the trait-based approach does not fully specify how language ability is affected by the context of language use. Context is a key determinant of language use, as can be seen in the Foreign Service context by the contrast between informally communicating with host nationals in a coffee shop and interacting with high-ranking government officials in a policy discussion. The interactionist approach (Chapelle, 1998) to construct definition addresses this omission by highlighting the role that the features of context, in addition to language knowledge and strategic processing, play in language proficiency. With this approach, according to Chalhoub-Deville (2003), language proficiency is seen as an "ability-in-individual-in-context."
2 TOEFL is the Test of English as a Foreign Language; TOEFL iBT measures one's ability to use and understand English at the university level. See https://www.ets.org/s/toefl/pdf/toefl_speaking_rubrics.pdf.
Recognizing that the nature of language knowledge differs from one domain of use to another, Douglas (2000) proposed the language for specific purposes framework. In this framework, language ability reflects the interaction among language knowledge, specific-purpose background knowledge, the contextual features of specific-purpose language use, and the ability to put all these components together simultaneously through strategic processing. In this framework, all the components should be specified and accounted for in test development and interpretation. The FSI test is a form of language test for specific purposes in that many of the tasks reflect characteristics of the Foreign Service context and require test takers to engage language and content knowledge specific to Foreign Service work.
Extending the interactionist approach, a meaning-oriented approach to the construct definition of language proficiency added “the ability to effectively express, understand, dynamically co-construct, negotiate, and repair variegated meanings in a wide range of language contexts” (Purpura, 2017, p. 1). In other words, this approach underscores the role of meaning and the communication of literal and contextually constructed meanings in the conceptualization of language proficiency.
The meaning-oriented conceptualization of language proficiency provides a detailed depiction of the knowledge components underlying language use. It suggests that, depending on the assessment task characteristics, contextualized performance could be observed and scored for (1) grammatical accuracy, complexity, fluency, or range; (2) content accuracy or topical meaningfulness; (3) functional achievement or task fulfillment; and (4) pragmatic appropriateness (e.g., formal register) (for further details, see Purpura and Dakin, 2020). This model is also useful for assessments that seek to use independent and integrated skills rubrics to hold test takers responsible for topical information presented in the assessment itself (as in the TOEFL iBT test noted above). It has also been useful for conceptualizing and scoring the ability to understand and convey nuanced pragmatic meanings implied by context (e.g., sarcasm).
The approaches discussed so far attribute performance consistencies to expressions of the knowledge, skills, abilities, and other characteristics
that test takers have and can apply during language use. All of these play a role in a test-taker’s ability to perform tasks on the current FSI assessment, and some of them are incorporated into the assessment’s scoring rubric. In contrast, a different approach emerged in the 1990s that focused on test-takers’ ability to successfully complete tasks that approximate real-world instances of communicative language use designed for specific purposes in given contexts (Brindley, 1994). As this approach mostly uses “task performance,” not “language knowledge or communicative language ability,” as the unit of analysis (Brown et al., 2002; Long and Norris, 2000; Norris et al., 2002), it is called a task-based approach to construct definition.4
The task-based approach seeks to create assessment conditions that approximate real-life contexts in which the tasks “replicate typical task procedures, content, situations, interlocutors, and other factors, in order to provide trustworthy indications of the extent to which learners can handle the real-world requirements of task performance” (Norris, 2016, p. 236). Norris et al. (2002) implemented this approach in a rating scale designed to evaluate test-takers’ success in responding in writing to a voicemail request from a boss to make a hotel reservation. In this example, the rating scale ranges from “inadequate” to “adept” performance. At the lower end of the scale, inadequate responses involve the choice of an incorrect hotel, a confusing message about the reservation, or a stylistically inappropriate message. At the higher end, adept responses involve a correct choice for the hotel, a clear justification for the decision, and a stylistically appropriate message.
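A task-based scale of this kind can be thought of as mapping observed task outcomes onto an ordered set of performance bands. The sketch below is a deliberate simplification of the hotel-reservation example: the intermediate band labels ("limited," "adequate") and the count-the-criteria mapping are invented for illustration, not taken from Norris et al. (2002).

```python
# Ordered performance bands; only "inadequate" and "adept" appear in the
# source example, so the two intermediate labels are hypothetical.
SCALE = ("inadequate", "limited", "adequate", "adept")

def rate_reservation_response(correct_hotel: bool,
                              clear_justification: bool,
                              appropriate_style: bool) -> str:
    """Assign a band according to how many task criteria the response meets.

    The three criteria mirror the example in the text: choosing the right
    hotel, justifying the choice clearly, and using an appropriate style.
    """
    criteria_met = sum([correct_hotel, clear_justification, appropriate_style])
    return SCALE[criteria_met]

rate_reservation_response(True, True, True)    # -> "adept"
rate_reservation_response(False, False, True)  # -> "limited"
```

The key point the sketch preserves is that the unit of analysis is task fulfillment (did the message accomplish its real-world purpose?), not a separate inventory of grammatical or lexical knowledge.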
The task-based approach has broadened the scope of language assessment by highlighting the importance of functional language use based on task fulfillment. This approach corresponds with the notion of task accomplishment as the desired standard or outcome; it is reflected in the performance descriptors of many assessment frameworks that focus on observation of the outcome. For example, did the test taker succeed in describing the advantages and disadvantages of the U.S. educational system to hypothetical host-country nationals during the test? A "pure" or "strong" task-based approach may consider only the task outcome and not the language the test taker used (Clark, 1972; Long, 2015; McNamara, 1996); other versions consider task fulfillment alongside knowledge components of language use as part of a task-based construct.
4 A separate approach to using task-based assessment valued “tasks” for their potential to trigger cognitive processes related to language rather than because of their potential to provide estimates of real-world language use (see, e.g., Skehan, 1998, 2003; Robinson, 2001).
CURRENT UNDERSTANDINGS OF LANGUAGE AND LANGUAGE USE: IMPLICATIONS FOR DEFINING THE CONSTRUCT OF LANGUAGE PROFICIENCY
As the above discussion illustrates, language is no longer viewed from a uniquely cognitive perspective as a set of discrete linguistic forms and skills that need to be mastered. Instead, the field is moving toward a more sociocultural perspective, in which language is viewed as a complex system of communication that is often constructed through use. Indeed, a recent analysis of 42 journals in applied linguistics (Lei and Liu, 2018) found that attention to traditional phonological and grammatical topics has decreased significantly since 2005. Instead, Lei and Liu (2019, p. 557) note:
[T]he most popular topics now include the impacts of socioeconomic class, ideology, and globalization on language use/learning and identity in various local contexts, the development and use of English as a Lingua Franca, the practice and effects of multilingualism, and corpus-based investigation of field-specific discourse and literacy practices and variations.
The sociocultural perspective considers language as “a resource for participation in the kinds of activities our everyday lives comprise” (Zuengler and Miller, 2006, p. 37). This perspective highlights the multifaceted nature of language and its use in the real world. Important dimensions of the sociocultural perspective include the value of multiple varieties of any given language use, the increasingly multilingual nature of language use, and the recognition that communication is multimodal.
The idea of the value of multiple varieties5 of any given language reflects an important shift in assessment away from the notion of a “native speaker” as the gold standard of language proficiency (Davies, 2003; Dewaele, 2018). For example, in the context of learners of English, instead of viewing language learners as having a deficit linguistic variety (Kachru, 1996), some applied linguists argue that English belongs to anyone who uses it (Seidlhofer, 2009). In this view, international or World English(es) are accepted as complete and whole linguistic systems of communication that have no bearing on when, how, or by whom the language was learned (Jenkins, 2006).
5 In sociolinguistics, “variety” or “dialect” is a general term for any distinctive form of a language. Wolfram et al. (1999, p. 3) defined “language variety” (which at the time was used synonymously with “dialect”) as “a social or geographic form of language distinguished by the specific pronunciation, vocabulary, and grammar of its speakers.” As a geographic example, on a narrow scale, a New York variety of English is different from a Texas variety of English. On a broader scale, an American variety of English is different from a British variety of English. Social examples include varieties used by a socioeconomic class, a profession, an age group, or any other social group (Nordquist, 2020).
The increasingly multilingual nature of language use reflects the fact that “there are almost 7,000 languages in the world and about 200 independent countries” (Cenoz, 2013, p. 3), suggesting that multiple languages are likely used in any given country and that many individuals are likely multilingual. Moreover, multilingual individuals often use multiple languages simultaneously in a conversation or in writing, drawing on all their linguistic repertoire in constructing meaning (translanguaging). Globalization, immigration, and new technologies have contributed to the growing importance of multilingualism in modern society. Given this reality, there have been calls for language assessments to reflect the multilingual nature of communication in various contexts of target language use and the language abilities of multilingual users (Gorter and Cenoz, 2017; Schissel et al., 2019; Shohamy, 2011).
It is now recognized that communication is multimodal and that language is just one resource for making meaning. The long-common view among applied linguists and language educators that language is the primary means of communicating meaning continues to be challenged and is being replaced by the idea that meaning is communicated through both linguistic and nonlinguistic modes (e.g., images, gestures, three-dimensional models) that are socially and culturally shaped (Kress, 2010). This expanded view emphasizes the relationships between and among linguistic modes (e.g., combinations of listening, speaking, reading, and writing) to accomplish communicative goals. It also includes attention to nonlinguistic modes because the potential for conveying meaning is increased when they are used with linguistic modes (Lemke, 2002).
These contemporary understandings of language use—involving not just varieties of a language but also multiple languages and modalities—have implications for assessment and are already being reflected in some assessments. For example, the TOEFL iBT now uses varieties of English from North America, the United Kingdom, New Zealand, and Australia as test inputs. Some language testing researchers also are beginning to design language tests that include translanguaging components in the test input and that allow for translanguaging in the response, thus “enabling test takers to draw on their entire repertoires as multilingual persons, and more authentically representing and valuing the translanguaged reality of current workplace language practice” (Baker and Hope, 2019, p. 421). Finally, the idea of multimodal communication is reflected in the increasing use of integrated tasks in language assessment.
These broader understandings of language use also have prompted calls for broadening language constructs in assessment. For example, the Modern Language Association (MLA) Ad Hoc Committee on Foreign Language (2007) has called for an emphasis on “translingual and transcultural competence,” which it defines as the “ability to operate between languages.”
Focusing specifically on English, Hu (2018, p. 80) has proposed the construct of “communicative effectiveness” that would take into account, among other things, “the necessity of an empathetic openness to different varieties of English, the relevance of various dimensions of understanding and the crucial role of strategic competence in negotiating communication online.”
The current FSI test already embraces multilingual perspectives to some degree. In two sections of the test, test takers are required to use two languages: in the interview section of the speaking test, they interview the tester in the tested language and report what they learn to the examiner in English; in the reading in depth section of the reading test, they read a passage on a specialized topic in the target language and then summarize it orally in English. These tasks likely also occur in a similarly multilingual way in the daily work of Foreign Service officers.6
Moving from considering language use in a broad sense to its use in a specific work-related or professional context—in FSI’s case, the use of language in Foreign Service tasks—raises a separate set of assessment issues. These issues relate to the use of the test scores for high-stakes employment-related decisions and the procedures for determining the scope of tasks covered on the test.
Scores on the FSI test are used to make many types of personnel decisions about job placement, promotion, retention, and pay. Principled approaches to assessment and professional standards for assessment and evaluation suggest that assessments that are used to support personnel decisions in professional settings should be grounded in an understanding of real-world job requirements to be valid and legally defensible. The U.S. government’s generally accepted guidelines on decisions involving selection, promotion, retention, pay, and other related career decisions emphasize the need to demonstrate close approximation between the content of a test and the observable behaviors or products of a job.7 Moreover, validity is enhanced when an assessment aligns to the work context (Sackett et al., 2016). By extension, the content of language tests in professional settings should be relevant to the types of decisions that test results are used to make.
Job Analysis and Assessment Development
Job analysis is one way to connect language use in a professional setting to the content and format of a language test. Broadly speaking, understanding the content of a job, set of jobs, or an occupation involves standard work or job analysis techniques. Job analysts use these techniques to identify tasks or work behaviors and the knowledge, skills, abilities, and other characteristics of workers that underlie performance on these tasks (Brannick et al., 2017). Knowledge, skills, abilities, and other characteristics are the attributes that workers need either to perform the tasks at hand or to demonstrate the behaviors described in the job analysis. These characteristics are generally considered to be constructs, as defined in psychological and educational measurement, that predict job performance. Job analysis can also document the physical and psychological context in which the work is performed, such as stressful, ever-changing, or extreme contexts.
Specifying the critical tasks and identifying the underlying knowledge, skills, abilities, and other characteristics that enable performance of the tasks are important to any kind of test development. Linking test content to tasks helps to establish content validity evidence, while linking test content to important worker characteristics helps to determine the specific constructs that a given test needs to measure. In addition, job analysis makes it possible to build a test plan or “blueprint” showing the relative importance or weight of different topics that correspond to tasks and knowledge, skills, abilities, and other characteristics, which helps ensure that the job domain has been sampled adequately (Brannick et al., 2017). Job analysis can also illuminate how real-world job aids are used—such as online translation programs—and to understand how a job is changing and could require future changes to an assessment.
It is important to note that not all knowledge, skills, abilities, and other characteristics that are identified in job analysis need to be tested, depending on the employee population and the types of training that may be provided. However, job analysis can identify the set of characteristics that employees need and that should be considered for testing. In the FSI context, the critical characteristics to consider for language proficiency testing will involve tasks that are carried out using a foreign language.
The techniques for conducting job analysis are too numerous to review here. However, a few notable methods that could be used in a foreign language assessment context to infer specific language demands from known job demands include
- evidence-centered design (Mislevy et al., 1999a, 2003), a structured assessment development process to ensure that the evidence gathered from the assessment tasks corresponds to the underlying constructs that the assessment purports to address—in this case, language use in the professional setting;
- ethnographic approaches, which investigate the nature, type, and quality of workplace language through methodologies that illuminate social processes for individuals in workplace contexts (Newton and Kusmierczyk, 2011);
- critical-incidents techniques to generate lists of examples of especially good and poor job performance behaviors (or “incidents”) and identify observable behaviors that may lead to overall success or failure in a position (Gatewood et al., 2015); and
- cognitive task analysis, which uncovers the knowledge structures that people use to perform tasks and helps elucidate contextual factors that affect performance (Mislevy et al., 1999b).
Regardless of the technique, one key design decision involves the level of specificity of analysis. Jobs can be studied at various levels, from specific positions in one organization to occupations that describe the entire economy. Similarly, knowledge, skills, abilities, and other characteristics can be described narrowly or broadly: for example, speaking skill could be described for a specific role, such as customer service, or broadly, across all possible contexts. Ultimately, job analysts and assessment developers must specify the domain of use and the degree of generalization that is assumed across tasks. Box 3-1 provides an example of language use in the Foreign Service context, illustrating some of the specific features of the domain, which would need to be clarified in a job analysis.
Job Analysis and Language Assessment for Professional Purposes
Target language use analysis is an examination of tasks and knowledge, skills, abilities, and other characteristics that are relevant to the development of language tests (Bachman and Palmer, 1996, 2010). Target language use analysis uses a framework of task characteristics to identify critical features of occupational, academic, or other language use in real-world contexts (Bachman and Palmer, 2010). Test developers can use this framework to describe characteristics of the language use setting, characteristics of the inputs and expected responses, and relationships between inputs and expected responses. Test developers can use these same characteristics to specify language use tasks for assessment, maximizing the correspondence between the actual context and the assessment tasks. Existing research
that describes aspects of the target language use domain relevant to the FSI’s Foreign Service context can serve as a useful resource (e.g., Friedrich, 2016). A test blueprint (for an example, see So et al., 2015) could be built based on information combined from job analyses and target language use analyses. In a work setting, developers can identify subskills and stimulus situations directly from job analysis—using tasks and knowledge, skills, abilities, and other characteristics—and weight these elements according to their importance to overall job functioning, creating linkages that support validity argumentation.
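One mechanical step in building such a blueprint is converting job-analysis importance weights into item counts. The sketch below assumes hypothetical task domains and weights (none come from an actual FSI job analysis); the largest-remainder rounding simply guarantees that the allocated items sum to the intended test length.

```python
def blueprint_allocation(importance, total_items):
    """Allocate test items to task domains in proportion to job-analysis
    importance weights, using largest-remainder rounding so that the
    integer counts sum exactly to total_items."""
    total_weight = sum(importance.values())
    raw = {k: w / total_weight * total_items for k, w in importance.items()}
    alloc = {k: int(r) for k, r in raw.items()}  # floor of each proportional share
    # Hand out any leftover items to the domains with the largest remainders.
    leftover = total_items - sum(alloc.values())
    for k in sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc

# Hypothetical domains and importance weights for a 21-item test.
weights = {"briefing officials": 5, "reading local press": 3, "informal conversation": 2}
blueprint_allocation(weights, total_items=21)
# -> {'briefing officials': 11, 'reading local press': 6, 'informal conversation': 4}
```

The weights, not the arithmetic, carry the validity argument: they must come from the job analysis itself, so that the resulting blueprint samples the job domain in proportion to what the work actually demands.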
A recent approach for conceptualizing professional communication identified four different varieties of language (“codes of relevance”) that can inform the development of language assessment for professional purposes (Knoch and Macqueen, 2020):
- Intraprofessional language is used by a small number of people with shared professional knowledge (e.g., doctors speaking to each other in “medicalese”). Language use is practically inseparable from content knowledge.
- Interprofessional language involves interactions among individuals with some shared professional knowledge (e.g., a doctor interacting with a nurse or social worker in “cross-disciplinary medicalese”).
- Workplace community language involves interactions between those with professional knowledge and lay people (e.g., a doctor communicating with a patient).
- Language varieties used in the broader social context include all language varieties and minority languages in the jurisdiction, as well as whatever patterns govern their use and combination, which can illuminate where miscommunications occur in the workplace and how they can be reduced.
Sampling from these different language varieties to develop a language assessment for professional purposes involves careful analysis of the professional context (the target language use domain) and the purpose of the assessment (Knoch and Macqueen, 2020).
In terms of sampling the job domain, Foreign Service jobs vary along several dimensions, such as career track, specialist versus generalist roles, and language requirements. Every job analysis needs to consider differences in job requirements across these dimensions and how these differences may be reflected in the test specifications. Moreover, it is worth noting that FSI uses the ILR framework to designate job language requirements for language-designated positions. Thus, deliberations about the role of the ILR framework now and in the future should consider that the ILR describes not only worker-related requirements (skills) but also work or job requirements.
Test scores reflect an examinee’s proficiency with regard to the construct that is explicitly assessed, as well as other factors that are not intended to be explicitly measured (Bachman, 1990; Turner and Purpura, 2016). For example, the current FSI test contains a speaking component, which is designed to determine whether test takers have sufficient proficiency in a language to gather information from an interlocutor, retain that information, and then report back in English to another interlocutor. Although oral language proficiency represents the proficiency dimension of the assessment and is the explicit object of measurement, performance on the test can be influenced by other factors, such as the test-taker’s memory, processing skills, affective dispositions, and task engagement. Although it might appear that language testers are only measuring the construct of language proficiency because they score responses related to the proficiency
dimension, these other factors are also involved, and they often moderate performance. These factors (called performance moderators by O’Reilly and Sabatini, 2013) can enhance or detract from the measurement of proficiency. Purpura and Turner (2018) elaborate on five types of performance moderators:
- The contextual dimension addresses the social, cultural, institutional, political, or economic characteristics of the assessment context and the extent to which these characteristics might impact performance.
- The sociocognitive dimension includes the extent to which test takers have the mental capacity to process, retain, and retrieve information, and the capacity to execute those steps with automaticity. This dimension is also invoked in assessments where test takers receive feedback or assistance that they are expected to process in order to improve their performance.
- The instructional dimension reflects the need for a test taker to process new information.
- The social-interactional dimension reflects the extent to which the test taker needs to manage interaction, such as turn-taking (Levinson, 1983).
- The affective dimension addresses the effect of the test-taker’s engagement, effort, anxiety, confidence, and persistence on test performance.
Traditional assessment design frameworks often focus on the context, elicitation, and proficiency dimensions of assessments. However, many fail to explicitly address these other factors in the design stage, even though they can affect performance. Whether or not these moderators are defined as part of the test construct and are explicitly measured, their implications for test design, development, and validation need to be considered.