Appendix C

Validating the Assessment

An organizational assessment results, fundamentally, in a set of predictions based on sampling of the characteristics of an organization. The predictions may be about the current characteristics of the organization’s wider activities, staff, or processes based on the set examined; or they may be about the organization’s future relevance or impact based on observed trends. This report identifies guidelines that may be considered and possible measurement methods applicable to the key characteristics of a research and development (R&D) organization. Some measures and criteria may be quantitative, and others may be qualitative, including anecdotal evidence. Just as an organization’s activities can be assessed, so too can the assessment itself be assessed with respect to the validity of its measurement of quality, preparedness (management), and impact.

DEFINITION OF VALIDITY

Validity is the extent to which an assessment measures what it claims to measure. It is vital for an assessment to be valid in order for the results to be applied and interpreted accurately. Validity is not determined by a single statistic, but by a set of parameters that demonstrate the relationship between the assessment and that which it is intended to measure. There are four types of validity—content validity, criterion-related validity, construct validity, and face validity.

Content Validity

Content validity signifies that the items constituting an assessment represent the entire range of possible items that the assessment is intended to address. Individual assessment questions may be drawn from a large pool of items that cover a broad range of topics. For example, to achieve adequate content validity, the projects assessed should be shown, by some clearly defined sampling strategy, to represent the wider pool of projects to which the conclusions of the assessment are intended to apply; the same applies to surveys of an organization’s customers.

In some instances when an assessment measures a characteristic that is difficult to define, expert judges may rate the relevance of items under consideration for the assessment. Items that are rated as strongly relevant by multiple judges may be included in the final assessment.
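
One common way to formalize such expert judging is Lawshe’s content validity ratio (CVR), which compares the number of judges rating an item as essential against the number expected by chance. The Python sketch below is illustrative only; the items, vote counts, and 0.5 retention cutoff are hypothetical choices for the example.

```python
def content_validity_ratio(essential_votes: int, n_judges: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2), where n_e judges out of N
    rated the item essential.  Ranges from -1 (none) to +1 (all)."""
    half = n_judges / 2
    return (essential_votes - half) / half

# Hypothetical pool: candidate item -> judges (out of 8) who rated
# it strongly relevant to the characteristic being assessed.
votes = {"publication quality": 8, "staff morale": 5, "office decor": 1}

# Keep items whose CVR clears a cutoff; 0.5 is an arbitrary choice
# for this example.
final_items = [item for item, v in votes.items()
               if content_validity_ratio(v, 8) >= 0.5]
print(final_items)  # -> ['publication quality']
```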

Criterion-related Validity

An assessment is said to have criterion-related validity when it has demonstrated its effectiveness in predicting criteria or indicators of the characteristics it intends to assess. There are two types of criterion-related validity: concurrent validity and predictive validity.

Concurrent validity is examined when the criterion measures are obtained at the same time as the assessment. This indicates the extent to which an assessment’s measures accurately estimate the organization’s or project’s current state with respect to the criterion. For example, on an assessment that measures current levels of customer satisfaction, the assessment would be said to have concurrent validity if it measured the current levels of satisfaction experienced by the organization’s customers.

Predictive validity refers to the extent to which the predictions yielded by an assessment turn out to be correct at some specified time in the future. For example, if an assessment yields the prediction that a certain avenue of research will yield a certain outcome, and that avenue is pursued, the accomplishment of the predicted outcome enhances the predictive validity of the assessment.
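
As an illustration of how predictive validity might be quantified, the following sketch correlates the scores an assessment assigned to a set of projects with an outcome measure observed later. The data are hypothetical, and Pearson’s r is only one of several reasonable summary statistics; `statistics.correlation` requires Python 3.10 or later.

```python
from statistics import correlation  # Pearson's r; Python 3.10+

# Hypothetical data: scores an assessment assigned to six research
# projects, paired with an outcome measure (e.g., citations of the
# resulting papers) collected several years later.
assessment_scores = [3.1, 4.5, 2.0, 4.9, 3.8, 2.6]
later_outcomes = [12, 30, 8, 41, 22, 9]

# A high correlation between the scores and the later outcomes
# is evidence of predictive validity; a correlation near zero
# would suggest the assessment predicts little.
r = correlation(assessment_scores, later_outcomes)
print(f"predictive validity coefficient r = {r:.2f}")
```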

Construct Validity

An assessment has construct validity if the measures on the items assessed correlate well with measures of the same items performed by other assessment methods. For example, if quantitative measures of research productivity (e.g., papers published) correlate well with subjective measures (e.g., expert rating of the productivity of the research), this supports the construct validity of the assessment.
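
As an illustration, the sketch below correlates a hypothetical objective measure (papers published) with a hypothetical subjective measure (mean expert rating) for the same research groups, using Spearman’s rank correlation so that only the orderings matter; a high coefficient would support construct validity.

```python
from statistics import correlation  # Pearson's r; Python 3.10+

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties,
    which keeps this illustration simple."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical data for eight research groups: papers published
# (objective) and mean expert rating of productivity on a 1-5
# scale (subjective), two measures of the same construct.
papers = [4, 12, 7, 20, 3, 15, 9, 6]
expert = [2.0, 3.5, 3.0, 4.8, 1.5, 4.0, 2.6, 2.4]

# Spearman's rank correlation is Pearson's r applied to ranks;
# a value near 1 means the two methods order the groups alike.
rho = correlation(ranks(papers), ranks(expert))
print(f"construct validity check (Spearman rho) = {rho:.2f}")
```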

Face Validity

Face validity is the extent to which the participants in the assessment agree that it appears to be designed to measure what it is intended to measure. For example, if an assessment survey contains many questions perceived as irrelevant by the participants, its face validity will be low.

RELIABILITY OF THE ASSESSMENT

The validity of an assessment instrument depends on its reliability. Common forms of reliability include inter-rater reliability, test-retest reliability, and parallel-forms reliability. Inter-rater reliability is the extent to which multiple raters of a given item agree. For example, if there is consensus among the members of a peer review committee, this indicates good inter-rater reliability. Test-retest reliability is the extent of agreement among repeated assessments of an item that has not changed between the assessments. Parallel-forms reliability is gauged by randomly dividing a pool of items covering the same content into two separate forms of the assessment; the two forms are then administered together, and the correlation of their results indicates the parallel-forms reliability.
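
As an illustration of inter-rater reliability, the sketch below computes Cohen’s kappa, a standard agreement statistic for two raters that corrects raw agreement for the agreement expected by chance; the reviewers and ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical ratings:
    observed agreement corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own observed rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings ("high"/"medium"/"low") assigned by two peer
# reviewers to the same ten projects.
reviewer_1 = ["high", "high", "medium", "low", "high",
              "medium", "low", "medium", "high", "low"]
reviewer_2 = ["high", "medium", "medium", "low", "high",
              "medium", "low", "low", "high", "low"]

print(f"inter-rater reliability (kappa) = "
      f"{cohens_kappa(reviewer_1, reviewer_2):.2f}")
```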

EFFICIENCY AND IMPACT OF THE ASSESSMENT

Efficiency and impact are also key aspects of an effective assessment. Factors related to the efficient conduct of an assessment include its cost in money and time, the burden perceived by those being assessed, and the timeliness of the reported findings. Factors related to the impact of an assessment include the extent to which the recipients of the assessment implement the advice it provides, the extent to which the findings are distributed to those who should receive them, and the content of the feedback from those who receive the findings.