Appendix C
Validating the Assessment
An organizational assessment results, fundamentally, in a set of predictions based on sampling of the characteristics of an organization. The predictions may be about the current characteristics of the organization's wider activities, staff, or processes based on the set examined; or they may be about the organization's future relevance or impact based on observed trends. This report identifies guidelines that may be considered, and possible measurement methods, applicable to the key characteristics of a research and development (R&D) organization. Some measures and criteria may be quantitative, and others may be qualitative, including anecdotal evidence. Just as an organization's activities can be assessed, so too can the assessment itself be evaluated with respect to the validity of its measurement of quality, preparedness (management), and impact.
DEFINITION OF VALIDITY
Validity is the extent to which an assessment measures what it claims to measure. An assessment must be valid for its results to be interpreted and applied accurately. Validity is not determined by a single statistic, but by a set of parameters that demonstrate the relationship between the assessment and that which it is intended to measure. There are four types of validity: content validity, criterion-related validity, construct validity, and face validity.
Content Validity
Content validity signifies that the items constituting an assessment represent the entire range of possible items that the assessment is intended to address. Individual assessment questions may be drawn from a large pool of items that cover a broad range of topics. For example, to achieve adequate content validity, the projects assessed must be shown, by some clearly defined selection strategy, to represent the wider pool of projects to which the conclusions of the assessment are intended to apply; the same holds for surveys of an organization's customers.
In some instances when an assessment measures a characteristic that is difficult to define, expert judges may rate the relevance of items under consideration for the assessment. Items that are rated as strongly relevant by multiple judges may be included in the final assessment.
Criterion-related Validity
An assessment is said to have criterion-related validity when it has demonstrated its effectiveness in predicting criteria or indicators of the characteristics it intends to assess. There are two types of criterion-related validity: concurrent validity and predictive validity.
Concurrent validity is examined when the criterion measures are obtained at the same time as the assessment results. It indicates the extent to which an assessment's measures accurately estimate the organization's or project's current state with respect to the criterion. For example, an assessment intended to measure current customer satisfaction has concurrent validity to the extent that its results agree with the satisfaction the organization's customers actually report at the time of the assessment. Predictive validity refers to the extent to which the predictions yielded by an assessment prove correct at some specified time in the future. For example, if an assessment yields the prediction that a certain avenue of research will yield a certain outcome, and that avenue is pursued, accomplishment of the predicted outcome enhances the predictive validity of the assessment.
Construct Validity
An assessment has construct validity if the measures on the items assessed correlate well with measures of the same items performed by other assessment methods. For example, if quantitative measures of research productivity (e.g., papers published) correlate well with subjective measures (e.g., expert rating of the productivity of the research), this supports the construct validity of the assessment.
Face Validity
Face validity is the extent to which the participants in the assessment agree that it appears to be designed to measure what is intended to be measured. For example, if an assessment survey contains many questions perceived as irrelevant by the participants, its face validity will be low.
RELIABILITY OF THE ASSESSMENT
The validity of an assessment instrument depends on its reliability. Common forms of reliability include inter-rater reliability, test-retest reliability, and parallel-forms reliability. Inter-rater reliability is the extent to which multiple raters of a given item agree; for example, consensus among the members of a peer review committee indicates good inter-rater reliability. Test-retest reliability is the extent of agreement among repeated assessments of an item that has not changed between the assessments. Parallel-forms reliability is gauged by creating two versions of the same assessment, typically by generating a large set of items addressing the same content and randomly dividing them into two separate forms. The two forms are then administered together, and the correlation between their results indicates the parallel-forms reliability.
EFFICIENCY AND IMPACT OF THE ASSESSMENT
Efficiency and impact are also key aspects of an effective assessment. Factors related to the efficient conduct of an assessment include its cost in terms of money and time, burdens perceived by those being assessed, and timeliness of reported findings. Factors relating to the impact of an assessment include the extent to which the recipients of the assessment implement the advice provided in the assessment, the extent to which the assessment findings are distributed to those who should receive them, and the content of the feedback from those who receive the findings.