Currently Skimming:

8 Psychometrics and Technology
Pages 161-186

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 161...
... to continue to support this long tradition of research that aims to increase the precision, validity, efficiency, and security of existing assessments; to develop and evaluate new methods for measuring human capabilities; and to explore methods for analyzing the potentially vast amounts of data that
From page 162...
... The most widely administered and well-known personnel screening test is the Armed Services Vocational Aptitude Battery (ASVAB; Maier, 1993)
From page 163...
... For example, reducing the number of applicants taking the paper-and-pencil test over time, coupled with the variability in content that adaptive item selection provided, reduced the risk of a sudden and serious test compromise, as did the implementation of methods for thwarting blatant cheating and other attempts to "game the system." Collectively, these features made CAT-ASVAB one of the most psychometrically sound, sophisticated tests ever developed across either military or civilian settings, and it set a high standard for future high-volume personnel screening instruments in the domain of cognitive abilities and skills. Concurrent with measurement research to prepare for the deployment and maintenance of CAT-ASVAB, ARI conducted a detailed review of military jobs and potential predictors of successful performance under what was called Project A (Campbell, 1990; Campbell and Knapp, 2001)
From page 164...
... , so that unidimensional IRT models could be applied for CAT-ASVAB development (see Drasgow and Parsons, 1983)
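
As a minimal illustration of the kind of unidimensional model referred to here, the sketch below implements the three-parameter logistic (3PL) item response function, a standard form for cognitive ability items; the parameter values are invented for the example and are not taken from CAT-ASVAB.

import math

def p_correct(theta, a, b, c):
    # 3PL model: probability of a correct response for ability theta, with
    # discrimination a, difficulty b, and pseudo-guessing lower asymptote c.
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Invented item: moderate discrimination, average difficulty, some guessing.
print(p_correct(theta=0.5, a=1.2, b=0.0, c=0.2))  # ~0.72
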
From page 165...
... , IRT = Item Response Theory. For the next several years, ARI supported research to increase the validity of the AIM using methods that capitalize on patterns of relationships among item responses and test scores.
From page 166...
... . This is partly because most modern test construction and item evaluation practices are imbued with assumptions from traditional IRT models appropriate to cognitive ability (e.g., dominance models, which assume that a person who tends to answer hard items correctly should also be able to answer most easy items correctly)
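
To make the dominance assumption concrete, here is a toy contrast (not any operational model) between a dominance response function, which rises monotonically with the trait, and a single-peaked ideal-point function of the sort often argued to suit noncognitive items; all parameters are invented.

import math

def dominance_2pl(theta, a, b):
    # Dominance model: endorsement probability rises monotonically with theta.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def ideal_point(theta, a, b):
    # Toy ideal-point model: probability peaks where theta matches the item
    # location b and falls off in both directions.
    return math.exp(-a * (theta - b) ** 2)

for theta in (-2.0, 0.0, 2.0):
    print(theta, round(dominance_2pl(theta, 1.0, 0.0), 2),
          round(ideal_point(theta, 1.0, 0.0), 2))
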
From page 167...
... The CAT-ASVAB and TAPAS also have two primary differences that highlight needs and opportunities for basic research: First, CAT-ASVAB comprises nine cognitive subtests that are individually administered and scored based on the aforementioned unidimensional dominance model, which is a standard model for cognitive ability tests. Correlations among the subtest scores are sizable, as they tend to be between cognitive ability subtests (e.g., r = .4 to .7)
From page 168...
... The emerging field of serious gaming offers potential for measuring examinee KSAOs with less-structured, highly engaging methods, which could yield vast amounts of streaming data that are best analyzed by methods currently used in physics or computer science, rather than methods used in psychology and education. The next sections of this chapter provide a snapshot of developments in psychometric modeling, gaming and simulation, and Big Data analytics, which the committee believes merit serious attention in the Army's long-term research agenda.
From page 169...
... . Moreover, when abilities are highly correlated, administering a sequence of unidimensional tests is inefficient; Multidimensional Item Response Theory (MIRT)
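
A minimal sketch of one common MIRT form, the compensatory multidimensional 2PL, in which the response probability depends on a weighted combination of several latent traits so that strength on one dimension can offset weakness on another; the parameter values are invented.

import math

def mirt_2pl(theta, a, d):
    # theta: latent abilities; a: discrimination per dimension; d: intercept.
    z = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

print(mirt_2pl(theta=[1.0, -0.5], a=[1.2, 0.8], d=0.0))  # ~0.69
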
From page 170...
... Gaps remain in understanding the intricacies and implications of test construction practices; the capabilities of parameter estimation procedures with tests of different dimensionality, length, and sample size; how to efficiently calibrate item pools, select items, and control item exposure in CAT; how to create parallel nonadaptive test forms; how to equate alternative test forms; how to test for measurement invariance; and how to judge the seriousness of violations of test construction specifications or constraints. In short, all of the questions that have been explored for decades with unidimensional IRT models need to be answered for MFC and, more generally, MIRT models.
From page 171...
... standardized log likelihood statistic (lz) became one of the more popular early IRT indices because it was effective for detecting spuriously high- and low-ability scores on nonadaptive cognitive tests and because it could be used not only with dichotomous unidimensional IRT models but also with polytomous unidimensional models and multidimensional test batteries.
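
A sketch of the lz computation for dichotomous unidimensional IRT, using the standard form in which the observed response log likelihood is centered and scaled by its model-implied mean and standard deviation; the response pattern below is invented for illustration.

import math

def lz(responses, probs):
    # responses: 0/1 answers; probs: model-implied P(correct) per item at the
    # examinee's estimated theta. Strongly negative lz flags aberrant patterns.
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    mean = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    var = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - mean) / math.sqrt(var)

# Aberrant pattern: missing easy items (high p) while passing hard ones.
print(round(lz([0, 0, 1, 1], [0.9, 0.8, 0.3, 0.2]), 2))  # ~ -4.37, flags misfit
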
From page 172...
... . • Automatically assemble tests that meet detailed design specifications (e.g., van der Linden, 2005; van der Linden and Diao, 2011; Veldkamp and van der Linden, 2002)
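
The mixed-integer programming methods cited above are beyond a short example, but the following toy greedy sketch conveys the basic idea of automated test assembly: maximize information at a target ability while honoring content quotas. The pool, quotas, and parameters are all invented, and a greedy pass is a deliberate simplification of the optimization approaches in the literature.

import math

def info_2pl(theta, a, b):
    # Fisher information of a 2PL item at ability theta.
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Invented pool: (content_area, a, b); quotas: items required per area.
pool = [("algebra", 1.4, 0.0), ("algebra", 1.0, 0.8),
        ("geometry", 1.2, -0.3), ("geometry", 0.9, 0.4),
        ("logic", 1.6, 0.1), ("logic", 1.1, -0.6)]
quotas = {"algebra": 1, "geometry": 1, "logic": 1}

def assemble(theta):
    form, need = [], dict(quotas)
    ranked = sorted(range(len(pool)), reverse=True,
                    key=lambda i: info_2pl(theta, pool[i][1], pool[i][2]))
    for i in ranked:
        if need.get(pool[i][0], 0) > 0:
            form.append(i)
            need[pool[i][0]] -= 1
    return form

print(assemble(theta=0.0))  # indices of the selected items, e.g. [4, 0, 2]
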
From page 173...
... In addition to optimizing measurement precision, CAT item selection approaches are also considered in terms of the security of the item bank. Barrada and colleagues (2011)
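
As one illustrative possibility (not necessarily among the specific rules Barrada and colleagues compared), the sketch below pairs maximum Fisher information selection with a simple "randomesque" exposure control that draws at random from the k most informative unused items, so no single item is administered to every examinee; the pool is invented.

import math, random

def info_2pl(theta, a, b):
    # Fisher information of a 2PL item at ability theta.
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_item(theta, pool, used, k=3):
    # Rank unused items by information, then choose randomly among the top k.
    candidates = [i for i in range(len(pool)) if i not in used]
    candidates.sort(key=lambda i: info_2pl(theta, *pool[i]), reverse=True)
    return random.choice(candidates[:k])

pool = [(1.5, -1.0), (1.2, 0.0), (0.9, 0.5), (1.8, 1.0), (1.1, -0.5)]
print(select_item(theta=0.2, pool=pool, used={1}))
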
From page 174...
... And Big Data methods will probably be needed to parse the gigabytes of data that each assessment will generate.

Technology

In terms of potential for use in assessment, in contrast with the deep and long-standing tradition of self-report measures and ratings from peers and supervisors, technology advances such as those enabling immersive and realistic simulations and serious gaming provide opportunities for examinees to demonstrate knowledge, emotions, and interactions through their behavior as it is expressed within rich and often realistic scenarios (National Research Council, 2011)
From page 175...
... An integration of simulation and serious gaming with modern psychometric algorithms, based on some combination of IRT and Big Data methods, could be considered as part of a learning analytics model that moves past a test of binary correct-versus-incorrect responses, capturing unique and rich sources of information relevant to performance and to the 21st century skills that elude conventional assessment (Bennett, 2010; Redecker and Johannessen, 2013; see also this report's discussions of performance under stress [Chapter 6], adaptive behavior [Chapter 7]
From page 176...
... Furthermore, IRT could be used to optimize serious gaming by calibrating scenarios and tasks, using information functions to optimally choose activities for examinees to complete, and using the trait scores to route examinees through gaming levels ranging from novice to expert, much like traditional CAT applications (Batista et al., 2013)
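
A small sketch of that routing idea, assuming gaming tasks calibrated under a 2PL model: estimate the trait by a coarse grid-search maximum likelihood, then map the estimate to a difficulty tier. The item parameters and tier cutoffs are invented assumptions, not values from any cited application.

import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_theta(responses, items):
    # Coarse grid-search maximum likelihood estimate of the latent trait.
    grid = [g / 10.0 for g in range(-40, 41)]
    def loglik(t):
        return sum(u * math.log(p_2pl(t, a, b)) +
                   (1 - u) * math.log(1.0 - p_2pl(t, a, b))
                   for u, (a, b) in zip(responses, items))
    return max(grid, key=loglik)

theta = estimate_theta([1, 1, 0], [(1.2, -0.5), (1.0, 0.2), (1.4, 1.0)])
tier = "novice" if theta < -0.5 else "expert" if theta > 0.5 else "intermediate"
print(theta, tier)  # route the player to the next gaming level by tier
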
From page 177...
... A. Potential topics of research on Item Response Theory (IRT) include the use of multidimensional IRT models, the application of rank and preference methods, and the estimation of applicant standing on the attributes of interest with greater efficiency (e.g., via automatic item generation, automated test assembly, detect
From page 178...
... . A method for comparison of item selection rules in computerized adaptive testing.
From page 179...
... . Making the most of what we have: A practical application of multidimensional Item Response Theory in test scoring.
From page 180...
... . Application of unidimensional item response theory models to multidimensional data.
From page 181...
... . Item Response Theory: Application to Psychological Testing.
From page 182...
... . A strategy for controlling item exposure in multidimensional computerized adaptive testing.
From page 183...
... . Improving Item Response Theory model calibration by considering response times in psychological tests.
From page 184...
... . Constructing fake-resistant personality tests using item response theory: High stakes personality testing with multidimensional pairwise preferences.
From page 185...
... . Using response times for item selection in adaptive testing.

