Division of Behavioral and Social Sciences and Education
Board on Human-Systems Integration
500 Fifth Street, NW
Washington, DC 20001
Phone: 202 334 2678
Fax: 202 334 2210
Email: bohsi@nas.edu
www.nationalacademies.org
July 8, 2011
Ms. Mary Darnell
Contracting Officer’s Representative
U.S. Department of Education
Office of Special Education and Rehabilitative Services
550 12th Street, SW
Washington, DC 20202
Dear Ms. Darnell:
At the request of the National Institute on Disability and Rehabilitation Research
(NIDRR) within the Office of Special Education and Rehabilitative Services, U.S. Department of
Education, the Board on Human-Systems Integration of the National Research Council (NRC)
convened an ad hoc committee to conduct an evaluation of aspects of NIDRR’s program.
Specifically, the Committee on the External Evaluation of NIDRR and Its Grantees was charged
to review NIDRR's priority-setting, peer review, and grant management processes, develop an
overall framework and evaluation design for the review of grantee outputs for a sample of 30
grantees, conduct a review of the sampled grantee outputs, and assess the output review process.
(For a list of committee members, see Attachment A.)
The results of this project will be presented in a final report that will include a description
and assessment of NIDRR’s priority-setting, peer review, and grant management processes and
the quality of the grantees’ outputs. The committee’s evaluation is nearing completion, and the
committee plans to deliver the final report in fall 2011. However, knowing that NIDRR plans to
move forward with an additional evaluation cycle prior to the delivery of the committee’s final
report, we wanted to provide the agency with information that could inform future evaluation
design. This letter report is therefore limited in scope to discussing the procedures the committee
used in its output evaluation, its assessment of those procedures, and recommendations for future
evaluations. (See Attachment B for the names of the reviewers of this letter report.)
BACKGROUND ON NIDRR AND COMMITTEE CHARGE
The National Institute on Disability and Rehabilitation Research is the principal federal
agency that funds applied research and development to improve the lives and functioning of
persons with disabilities (Office of Special Education and Rehabilitative Services, 2007).
NIDRR was established by the 1978 amendments to the Rehabilitation Act of 1973 and is one of
three components of the Office of Special Education and Rehabilitative Services at the U.S.
Department of Education (Office of Special Education and Rehabilitative Services, 2007).
NIDRR has conducted various efforts to assess its portfolio and grant results and hold its
programs accountable for results. In 2009, NIDRR requested that the NRC conduct an evaluation
of NIDRR and its grantees. The charge to the committee included two major components. The
first component, termed the “process evaluation,” involved examining NIDRR’s priority-writing
process, its practices for the peer review of grant applications, and grant management processes.
This component will be discussed in the committee’s final report and is not covered in this letter
report.
The second component, termed the “summative evaluation,” involved the assessment of
grantee outputs. The key question of the summative evaluation was articulated by NIDRR as
follows: To what extent are the final outputs from NIDRR grants of high quality? The major
portion of this component will similarly be covered in the committee’s final report and is not
covered in this letter report. However, one element in the summative evaluation involved a
committee self-assessment of the methods it developed to conduct the summative evaluation and
the identification of implications for future reviews. This element is the sole focus of this letter
report, which is organized into three main sections. The first section summarizes the methods
and procedures the committee used in the summative evaluation. The second section discusses
the committee’s assessment of these methods. The third section offers recommendations for
future evaluations of NIDRR and its grantees.
SUMMARY OF METHODS DEVELOPED FOR ASSESSING
THE QUALITY OF OUTPUTS
As noted above, the summative evaluation component involved an assessment of a wide
range of grantee outputs, which are defined and categorized by NIDRR as follows.
1. Publications (e.g., research reports and other publications in peer-reviewed and nonpeer-
reviewed publications).
2. Tools, measures, and intervention protocols (e.g., instruments or processes created to
acquire quantitative or qualitative information, knowledge, or data on a specific disability
or rehabilitation issue; or to provide a rehabilitative intervention).
3. Technology products and devices (e.g., industry standards/guidelines, software/netware,
inventions, patents/licenses/patent disclosures, working prototypes, product(s) evaluated
or field-tested, product(s) transferred to industry for potential commercialization,
product(s) in the marketplace).
4. Informational products (e.g., training manuals or curricula, fact sheets, newsletters,
audiovisual materials, marketing tools, educational aids, websites or other Internet sites
that were produced in conjunction with research and development, training,
dissemination, knowledge translation, and/or consumer involvement activities).
Committee members reviewed 148 outputs: 103 publications (category 1); 9 tools,
measures, and intervention protocols (category 2); 9 technology products and devices (category
3); and 27 informational products (category 4).
To prepare for the output review, the committee first developed a set of criteria and
dimensions under those criteria that would be used to assess the quality of outputs. Second, the
committee developed a questionnaire to assist grantees in nominating outputs to be reviewed and
to give them the opportunity to provide supplemental descriptive information about each of the
nominated outputs, along with the outputs themselves. Third, a sampling plan was developed to
select grantees who would be invited to participate in the evaluation. Fourth, the committee staff
worked with grantees who agreed to participate to gather and catalogue the outputs and
supplemental information that were submitted for the committee’s review. Fifth, the committee
assessed the outputs through an expert review process that was based on direct review of the
outputs and any supplemental information provided by the grantees. These five steps in the study
process are each described in detail below.
Quality Criteria Development
A key element of the summative evaluation was to respond to NIDRR’s request to
develop criteria for assessing the quality of its grantees’ outputs.1 In developing the criteria, the
committee drew on its own research expertise, recommendations of the external advisory group
convened by NIDRR while planning this NRC evaluation (National Institute on Disability and
Rehabilitation Research, 2008), and methods used in other NRC and international studies that
have evaluated federal research programs (see, e.g., Bernstein et al., 2007; Canadian Academy of
Health Sciences, 2009; Chien, Chen, and Chen, 2009; Ismail, Tiessen, and Wooding, 2010;
National Research Council, 2007; Wooding and Starkey, 2010; Wooding et al., 2009). The
committee developed four criteria:
1. Technical quality of output. The technical quality of outputs was assessed using
dimensions that included application of standards of science and technology, appropriate
methodology (quantitative or qualitative design and analysis), and degree of accessibility
and usability.
2. Advancement of knowledge or the field. The dimensions used to assess advancement of
the knowledge base or of the field (e.g., research, practice, or policy as relevant) included
scientific advancement of methods, tools, and theory; development of new information or
technologies; closing of an identified gap; and use of methods and approaches that were
innovative or novel.
3. Likely or demonstrated impact. This criterion was used to assess the likely or
demonstrated impact of outputs on science (journal impact, citations); consumers (for
people with disabilities: health, quality of life, participation); provider practice; health
and social systems; social and health policy; or the private sector or commercialization.
4. Dissemination. The dimensions of dissemination included the identification and tailoring
of materials for reaching different audience/user types; collaboration with audience/users
in identifying content and medium needs/preferences; delivery of information through
multiple media types and sources for optimal reach and accessibility; evaluation of
dissemination efforts and impacts; and commercialization/patenting of devices, if
applicable.
A 7-point scale was used to rate the criteria at varying levels of quality: 1 indicated poor
quality, 4 indicated good quality, and 7 indicated excellent quality. A rating of 4 meant that the
output solidly fell in the range of meeting expectations for good quality. See Box 1 for examples
of quality indicators considered by committee members in determining each criterion score.
These examples are not intended to be exhaustive, but to illustrate the attributes of outputs that
were considered in the committee’s review. In rating the outputs, committee members drew from
their scientific expertise to consider the output’s quality with respect to the dimensions under
each criterion.
1
The development of the criteria was informed by open session discussions in which NIDRR staff were present.
BOX 1
Examples of Quality Indicators Considered in Determining Output Scores
Technical Quality
Strength of literature review and framing of issues
Competence of design, considering the research question and other parameters of the study
Quality of measurement planning and description
Analytic methods and interpretation; degree to which recommendations for change were drawn clearly from the
analysis
Description of feasibility, usability, accessibility, and consumer satisfaction testing
Advancement of Knowledge/Practice
Degree to which a ground-breaking and innovative approach is presented
Application of a formal test of a hypothesis regarding a technique used widely in the field to improve practice
Level of advancement and improvement to current classification systems
Usefulness of descriptive base of information about factors associated with a condition
Novelty of ways of studying a condition that can be applied to developing new models, training, or research
Likely or Demonstrated Impact
Degree to which output is well cited or has promise to be (for newer articles)
Potential to improve the lives of persons with disabilities through increasing accessibility
Possibly transformative clinical and policy implications
Potential for building capacity, lowering costs, commercialization, etc.
Influence on the direction of research, use in the field, or capacity of the field
Dissemination
Method and scope of dissemination
Description of the evidence of dissemination (e.g., numbers distributed to different audiences)
Level of strategic dissemination to target audiences when needed
Evidence of reaching the target audience
Degree to which appropriate multiple media outlets were used, such as webinars, TV coverage, Senate testimony,
website, DVD, and/or social network sites
Grantee Questionnaire
NIDRR supplied the committee with information gathered from grantees in their Annual
Performance Reports (APRs) and final reports (Research Triangle International, 2009). Grantees
are required to complete APRs annually to report on their progress. At the end of a grant, they
must complete a final report. To supplement the APRs and final reports provided by NIDRR, the
committee developed a grantee questionnaire (see Attachment C). The first part of the
questionnaire asked grantees to list each of the projects under the grant and nominate the top two
outputs from each project that reflected the grants’ best achievements. The questionnaire
specified that outputs were to be drawn from the four categories defined in the APR (Research
Triangle Institute, 2009), described above: (1) publications; (2) tools, measures, and intervention
protocols; (3) technology products and devices; and (4) informational products.
The questionnaire instructions indicated that the committee would prefer to review one
publication and one other type of output for each project under a grant, but that grantees could
submit two publications for review if that was the only type of output for a project. The
questionnaire asked the grantees to submit the actual outputs for the committee's review. If the
output was a website, a tool, or a technology device that had to be demonstrated, grantees were
asked to provide descriptive information, pictures, or links to websites for the committee's direct
review.
The second part of the questionnaire included a series of questions to elicit more in-depth
descriptions of the outputs if needed and to provide supplemental evidence of the output's
technical quality, how it advanced knowledge or practice, its likely or demonstrated impact, and
how it was disseminated. This type of information, which is needed for a comprehensive assessment
of the output, would not always be apparent from reviewing the output in isolation.
For supplemental information on technical quality, grantees were asked to describe
examples, such as the approach or method used in its development; relevant peer recognition;
receipt of a patent, approval by the Food and Drug Administration, or use of the output in
standards development; and evidence of the usability and accessibility of the output. For
supplemental information on advancement of knowledge or the field, grantees were asked to
discuss the importance of the original question or issue and describe how the output had
advanced knowledge in such arenas as making discoveries; providing new information;
establishing theories, measures, and methods; closing gaps in the knowledge base; and
developing new interventions, products, technology, and environmental adaptations. For
supplemental information on likely or demonstrated impact, grantees were instructed to describe
the output’s potential or actual impact on science, people with disabilities, provider practice,
health and social systems, social and health policy, private sector/commercialization, capacity
building, and any other relevant arenas. For supplemental information about dissemination,
grantees were asked to describe the stage and scope (e.g., local, regional, national) of
dissemination efforts, specific dissemination activities, any identification and tailoring of
materials for particular audiences, efforts to collaborate with particular audiences or user
communities to identify content and medium needs and preferences, and the delivery of
information through multiple media types. Grantees were also asked to provide information from
evaluations they may have conducted of their dissemination efforts and impacts (e.g., results of
audience feedback or satisfaction surveys).
The committee piloted the questionnaire on one NIDRR grant that had ended in 2008 and
was outside the sampling pool (described below). Operating through subgroups, the committee
assessed five outputs of this grant, which consisted of publications, an assessment package, a
working prototype, and a fact sheet. As a result of that assessment, the committee revised the
questionnaire by collapsing some of the dimensions from an original six criteria into the four
final criteria.2
To further supplement the grantee questionnaire in assessing the likely impact of
published articles, the committee also used such sources as Scopus and the Web of Science to
determine the journal impact factor and the number of citations of a particular article.
Sampling
NIDRR provided the committee with a data set of grantee information that consisted of
all grants ending in years 2006–2010 (n = 248). Table 1 shows, within each program mechanism:
the number of grants and the corresponding proportion of all NIDRR grants, the mean duration
of grants, and the total funds expended and proportion of all funds expended. The last five
columns of the table show the number of grants that ended in each year from 2006 to 2010.
Highlighted in these last five columns is a subset of 111 grants that comprised the sampling pool
from which 30 grants were randomly sampled for the summative evaluation.
2
An original criterion on output usability was collapsed into the final technical quality criterion. Another original
criterion on consumer and audience involvement was restructured as dimensions of the other criteria. For example,
the technical quality criterion now includes a dimension on "evidence of usability and accessibility." The impact
criterion includes a dimension on "impact on people with disabilities." The dissemination criterion includes a
dimension on "tailoring materials to audiences" and "collaboration with users."
The committee used the smaller subset of all NIDRR grants as the sampling pool because
of its charge and preliminary analysis of the data. The committee was directed by its charge to
draw a sample of 30 grants ending in 2009 that reflected the range of work conducted across
NIDRR’s 14 program mechanisms. However, as can be seen in Table 1, several program
mechanisms did not have at least two grants ending in 2009: the three Model Systems (MS)
mechanisms, Disability Business Technical Assistance Centers (DBTAC), Disability and
Rehabilitation Research Projects-Knowledge Translation (DRRP-KT), Advanced Rehabilitation
Research Training (ARRT), and grants under the Disability and Rehabilitation Research Projects
Program (DRRP)-Section 21.
Because the MS grant mechanisms support some of NIDRR’s flagship programs,
including traumatic brain injury (MS-TBI), spinal cord injury (MS-SCI), and burn injuries
(MS-Burn), adjustments were made to the sampling pool to ensure that these programs would be
included in the sample. For each of these mechanisms, the committee went back to the nearest
year that yielded a total of at least two grants: 2008 for MS-Burn and MS-TBI (n = 5 for
MS-Burn; n = 9 for MS-TBI, with 1 in 2009 and 8 in 2008) and 2007 for MS-SCI (n = 9). These
grants were included in the pool. The DBTAC, DRRP-KT, ARRT, and DRRP-Section 21 mechanisms
were excluded from the pool for
this first cycle of evaluations. Small Business Innovation Research, Phase I, grants were also
excluded from the sampling pool because they do not produce “outputs” and therefore did not
align with the evaluation parameter to review two outputs for each project within a grant. After
these adjustments, the total pool consisted of 111 grants across nine NIDRR program
mechanisms. It is possible that the older grants included in the evaluation had an advantage over
the grants ending in 2009 because of the additional time for their outputs to have had an impact.
From this pool of 111 grants, 30 grants (27 percent) were randomly selected for review in the
following way. So that the sample would represent the nine program
mechanisms included in the pool, the sampling was stratified at the program mechanism level,
with each mechanism sampled in proportion to its share of the grants in the sampling pool. For
example, there were 36 Field Initiated Project
(FIP) grants in the sampling pool (see Table 1), which was 32 percent of all of the grants in the
sampling pool (n = 111). Therefore, 32 percent of the 30 grants in the sample should be FIPs (n =
10). The 36 FIPs in the sampling pool were numbered 1 through 36, and then 10 FIP grants were
randomly selected using a website that generated random numbers.
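The proportional allocation described above can be sketched in a few lines of Python. This is an illustration only, not the committee's procedure: the pool sizes are taken from Table 1, but the nearest-integer rounding rule, the function names, and the `seed` parameter are assumptions, and Python's `random` module stands in for the random-number website. Under simple rounding the allocations sum to 29 and MS-Burn receives 1 grant, whereas Table 2 reports 2 MS-Burn grants and 30 in total; the text does not say how that difference was resolved.

```python
import random

# Sampling-pool sizes by program mechanism, taken from Table 1.
POOL = {
    "MS-Burn": 5, "MS-TBI": 9, "MS-SCI": 9, "RERC": 8, "RRTC": 10,
    "DRRP": 14, "FIP": 36, "SBIR Phase II": 8, "Switzer": 12,
}
SAMPLE_SIZE = 30


def proportional_allocation(pool, sample_size):
    """Give each stratum its proportional share, rounded to the nearest integer.

    The rounding rule is an assumption; the report does not state one.
    """
    total = sum(pool.values())
    return {name: round(sample_size * n / total) for name, n in pool.items()}


def draw_stratum(n_in_pool, n_to_draw, seed=None):
    """Number the grants 1..n_in_pool and draw n_to_draw without replacement.

    `seed` is a hypothetical parameter added here for reproducibility.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n_in_pool + 1), n_to_draw))


allocation = proportional_allocation(POOL, SAMPLE_SIZE)
fip_draw = draw_stratum(POOL["FIP"], allocation["FIP"], seed=2011)

print(allocation)
print("FIP grant numbers drawn:", fip_draw)
```

A replacement draw for a grantee who declines would follow the same pattern, sampling from the grant numbers in that mechanism that were not selected in the first draw.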
Table 2 in the next section shows the number of grants included in the sample by
program mechanism. Proportionally, the number of grants sampled in each program mechanism
did not reflect the actual proportions of all grants in the larger NIDRR data set (N = 248), but the
sampling method did allow for the largest number of grants in the sample to be FIP grants, which
was the largest program mechanism in the NIDRR data set.
TABLE 1 NIDRR Grants Ending Between 2006 and 2010, by Program Mechanisms

                                                                                                               Percent of Total  Number of Grants in Program
                                                                                       Mean       Total Grant       NIDRR Grant  Mechanism, by Year Ending, with
                                                                  Number  Percent  Duration        Funding by      Funding (for  Grants Included in Sampling Pool
                                                                      of   of All  of Grant           Program     Grants Ending  Highlighted (*)
Program/Funding Mechanism                                         Grants   Grants   (years)  Mechanism (in $)        2006–2010)  2006   2007   2008   2009   2010

Model Systems Grants
Burn Injury (MS-Burn)                                                  5     2.02       6.1         7,271,563              2.34     0      0      5*     0      0
Traumatic Brain Injury (MS-TBI)                                       16     6.45       5.6        29,132,862              9.38     0      7      8*     1*     0
Spinal Cord Injury (MS-SCI)                                           17     6.85       6.5        33,977,321             10.94     8      9*     0      0      0
Center Grants
Rehabilitation Engineering and Research Centers (RERC)               12     4.84       5.7        55,816,980             17.98     0      0      0      8*     4
Rehabilitation Research and Training Centers (RRTC)                  21     8.47       5.9        82,920,345             26.71     0      0      0     10*    11
Research and Development Grants
Disability and Rehabilitation Research Projects (DRRP)               18     7.26       5.0        30,627,386              9.87     0      0      0     14*     4
Field Initiated Projects (FIP)                                        74    29.84       3.8        35,881,454             11.56     0      0      0     36*    38
Small Business Innovation Research, Phase I (SBIR)                   31    12.50       0.6         2,323,305              0.75     0      0      0     16     15
Small Business Innovation Research, Phase II (SBIR)                  16     6.45       2.5         7,990,171              2.57     0      0      0      8*     8
Translation Grants
DRRP-Disability Business Technical Assistance Centers (DBTAC)         1     0.40       1.8         1,742,400              0.56     0      0      1      0      0
DRRP-Knowledge Translation (DRRP-KT)                                   3     1.21       5.0         8,179,933              2.64     0      0      0      0      3
Training Grants
Advanced Rehabilitation Research Training (ARRT)                     11     4.44       5.8         8,229,338              2.65     0      0      0      1     10
Switzer Fellowships                                                   20     8.06       1.3         1,220,000              0.39     0      0      0     12*     8
DRRP-Section 21                                                        3     1.21       6.1         5,141,955              1.66     0      0      1      1      1

Total                                                                248   100.00                $310,455,013            100.00     8     16     15    107    102
Number of Grants in Sampling Pool by End Year                                                                                       0      9     13     89      0
Total Number Grants in Pool                                          111

SOURCE: Data summarized from National Institute on Disability and Rehabilitation Research (September 2009). Annual Performance Report Data Set of Grants
Ending in 2006 to 2010. Washington, DC: National Institute on Disability and Rehabilitation Research.
After the proposed evaluation methods received approval from the institutional review
board of the National Academies, the sample of 30 grants was drawn, and invitations to
participate were sent to the principal investigators (PIs) of the 30 grants. The PIs were fully
informed about the methods to be used in the evaluation and what would be required of them. Of
the original 30 grantees invited, 3 (1 DRRP and 2 FIPs) declined because they did not have time
to fulfill the evaluation requirements (n = 2) or had changed institutions (n = 1). Three other grants
were then randomly selected from the remaining pool for the appropriate program mechanisms
to bring the final sample to 30 grants (i.e., 1 DRRP and 2 FIPs were drawn). In replacing three of
the originally sampled grants, we acknowledge that bias from self-selection could have crept into
the evaluation findings and that the final sample of 30 grants that participated in the evaluation
may not be fully representative of the larger population of grants.
Compiling Outputs to Be Reviewed and Number of Outputs Reviewed
As noted, the PIs of the grants included in the sample were provided with written
instructions about how to submit their outputs for the review and provide supplemental
information about the outputs. Committee staff worked with the grantees to clarify the
instructions and to encourage them to submit their output packages. Because some grants had
ended several years before our review (2007 and 2008 for the Model Systems grants), some
grantees had difficulty in submitting materials because the PIs had changed departments or
institutions or had other competing priority activities during the time period of our review. Staff
accommodated these PIs by providing additional time for submitting their materials and, in five
cases, by assisting them in completing the questionnaires through telephone interviews. Two
grantees did not provide the supplemental questionnaires.
As described above, grantees were sent questionnaires on which they were asked to list
each project under their grant and nominate two outputs per project to be reviewed by the
committee. They were asked to identify the top two outputs per project that reflected their grant’s
best achievements. In order to permit assessment of outputs beyond journal publications,
grantees were asked to offer at least one nonjournal publication per project, if such outputs were
available. The number of projects for each grant varied by size, from 1 for small field-initiated
grants to 10 on larger center grants. Therefore, the number of outputs nominated for review per
grant ranged from 2 to 20; the average number of outputs per grant was 5. A total of 156 outputs
were submitted for review across the 30 grants selected. Eight outputs were considered highly
related to other outputs, and they were reviewed together. This occurred when one output was a
derivative or different expression of another output, and when the PI responses to criteria
questions were basically the same. Therefore, the number of outputs for analysis was 148. Table
2 presents the number of grants included in the sample by program mechanism and the types of
outputs that were reviewed.
To put the outputs reviewed into the larger context of the outputs produced by grantees in
the sampling pool of 111 grants, Table 2 also shows that the proportions of publications and other
outputs (tools, technology, and information products) that were reviewed by the committee were
relatively close to the proportions of the various output types produced by grantees in the larger
sampling pool. The proportion of publications reviewed was somewhat lower at 70 percent
(compared with 76 percent in the sampling pool), and the proportion of information products
reviewed was somewhat higher at 18 percent (compared with 11 percent in the sampling pool). The
mean number of outputs per grant in the sample is much lower (mean = 5) than in the sampling
pool (mean = 13) because the sampled grants submitted only their top two outputs per project (as
described above).
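The percentages and means quoted above follow directly from the totals in Table 2, as this short check shows. The variable and function names are illustrative, and rounding to whole percentages is an assumption that matches how the figures are reported.

```python
# Output counts by type, taken from the two total rows of Table 2.
SAMPLE_OUTPUTS = {"publications": 103, "tools": 9, "technology": 9, "information": 27}
POOL_OUTPUTS = {"publications": 1060, "tools": 101, "technology": 84, "information": 148}
SAMPLE_GRANTS = 30
POOL_GRANTS = 111


def percent_shares(counts):
    """Return each output type's share of the total, as a whole percentage."""
    total = sum(counts.values())
    return {kind: round(100 * n / total) for kind, n in counts.items()}


sample_shares = percent_shares(SAMPLE_OUTPUTS)
pool_shares = percent_shares(POOL_OUTPUTS)

# Mean number of outputs per grant in the sample and in the sampling pool.
mean_sample = sum(SAMPLE_OUTPUTS.values()) / SAMPLE_GRANTS
mean_pool = sum(POOL_OUTPUTS.values()) / POOL_GRANTS

print(sample_shares, round(mean_sample))
print(pool_shares, round(mean_pool))
```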
TABLE 2 Number of Grants and Distribution of Outputs Reviewed, by Program Mechanism

NIDRR Grant Category and
Program Funding Mechanisms                                Grants  Publications      Tools  Technology  Information      Total

Model Systems Grants
Burn Injury (MS-Burn)                                          2            12          2           0            4   18 (12%)
Traumatic Brain Injury (MS-TBI)                                2            12          0           0            2   14 (10%)
Spinal Cord Injury (MS-SCI)                                    2            11          0           0            0    11 (7%)
Center Grants
Rehabilitation Research and Training Center (RRTC)             3            16          0           0           12   28 (19%)
Rehabilitation Engineering and Research Centers (RERC)        2            16          2           5            3   26 (18%)
Research and Development Grants
Disability and Rehabilitation Research Projects (DRRP)        4            13          4           0            5   22 (15%)
Field Initiated Projects (FIP)                                10            17          1           3            1   22 (15%)
Small Business Innovation Research, Phase II (SBIR)           2             1          0           1            0     2 (1%)
Training Grants
Switzer Fellowship                                             3             5          0           0            0     5 (3%)

Total and Proportion of Output Types in Sample                30     103 (70%)     9 (6%)      9 (6%)     27 (18%)        148
Total and Proportion of Output Types in Sampling Pool        111   1,060 (76%)   101 (7%)     84 (6%)    148 (11%)      1,393

SOURCE: Data summarized from Questionnaires submitted to committee by NIDRR Grantees that participated in
the evaluation (Rows 3 to 16); and National Institute on Disability and Rehabilitation Research (September 2009).
Annual Performance Report Data Set of Grants Ending in 2006 to 2010. Washington, DC: National Institute on
Disability and Rehabilitation Research (Row 17).
The Review Process
The committee members, whose expertise covers social sciences, rehabilitation medicine,
engineering, evaluation, and knowledge translation, were divided into three subgroups of five
members each. The subgroups were organized to ensure that outputs would be reviewed by a
group of individuals with the collective expertise necessary to judge their quality. The subgroups
met in October 2010, December 2010, and February 2011. Because of the relatively short period
of time in which to conduct the reviews, grants were scheduled for review according to size, with
the smaller grants being invited first (e.g., FIPs, Switzers, SBIRs), and the larger grants (DRRPs,
Model Systems, center grants) being invited to participate in the later rounds. The rationale for
the scheduling was that the smaller grants had fewer outputs and would need less preparation
time for the review than the larger grants, which had many projects and more outputs to prepare
for the review. As a result of this approach, the content of the grants being reviewed in each
round tended to be mixed and so required a corresponding mix of expertise in each subgroup.
However, efforts were made to match the expertise of the reviewers in each subgroup with the
outputs they would be reviewing (e.g., technology output was assigned to a subgroup with
engineering expertise). For a detailed description of the review procedures, see Box 2.
BOX 2
Committee Review Procedures
Each of the 30 grants was assigned to one of the three committee subgroups, so that all outputs from
a grant were reviewed by the same subgroup. To ensure consistency in approach across subgroups, the
committee chair attended all subgroup meetings.
Based on direct review of the output itself and supplemental information about the output provided
in the APRs, final reports, and questionnaire responses from grantees, each subgroup member
independently rated every output assigned to that subgroup, assigning a quality criteria score for each of
the four quality criteria (technical quality, advancement of knowledge or the field, likely or demonstrated impact,
and dissemination), as well as an overall score for the output and a rationale for the overall score. Scores
were assigned using a 7-point scale ranging from 1 to 7 and anchored at 3 points: 1 = poor quality, 4 =
good quality, and 7 = excellent quality.
For each output, one subgroup member was assigned as the primary reviewer; the remaining four
subgroup members were secondary reviewers.
The subgroups used the following process for arriving at consensus scores:
The primary reviewer opened discussion of each output by presenting a brief summary of the
output and his or her rationale for rating each relevant criterion plus the overall score.
The secondary reviewers then presented their ratings for each output and a brief rationale.
The subgroup then developed consensus group ratings for each output through discussion
facilitated by the subgroup chair.
Following the discussion of all outputs from an individual grant, the subgroup considered the full
spectrum of the reviewed material, along with the grant’s overall purpose and objectives (using the
grant’s APR), and assigned an overall performance rating for the grant using the same 7-point scale.
The committee's expert review involved a qualitative consideration and assessment of the
multiple quality dimensions of the outputs — a process that has been recommended as a valid
method for evaluating the relevance and quality of federal research programs (Committee on
Science, Engineering, and Public Policy, 1999). The 7-point rating scale was used in order to
more precisely describe the results of the output assessment in terms of varying levels of quality.
During the reviews, the committee members frequently discussed how they were applying the
criteria and interpreting the anchors of the rating scale so they could calibrate their ratings. In
addition, brief narrative statements were written that summarized the rationale for the subgroups’
ratings of each output. These statements were reviewed after the ratings were completed to
identify attributes that particularly characterized the varying levels of quality and were helpful in
further exemplifying the dimensions of the criteria.
Although the final scores used to report results of the output assessment were based on
the consensus scores, the committee conducted an interrater reliability analysis of their initial
independent ratings (i.e., raw scores before their discussion) to determine the degree to which
individual committee members were using and interpreting the scale in the same way. The
interrater reliability analysis was conducted using methods suggested by MacLennan (1993) for
more than two raters with ordinal data. This method calculates an intraclass correlation
coefficient (ICC) that represents an average correlation among raters. The interrater reliability
analyses were run on 15 grants that had at least 3 outputs reviewed by the subgroups. The ratings
compared were the individual committee members' raw scores (before discussion) on the
technical quality criterion only. The ICCs ranged between .64 and .98 and were statistically
significant at p < .05. According to Yaffee (1998), the minimum acceptable ICC is .75 to .80. Of
the 15 grants, 13 had ICCs greater than .75. The ICC results suggest that individual members were
using and interpreting the 7-point scale in a similar manner prior to the full subgroup’s
discussions of the output ratings and their subsequent determination of consensus scores.
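As a rough illustration of the kind of calculation involved, the sketch below computes a two-way random-effects, single-rater intraclass correlation (the form often labeled ICC(2,1)) from a table of independent ratings. It is not necessarily the exact procedure described by MacLennan (1993); the function name and the ratings shown are hypothetical and are not the committee's data.

```python
def icc_two_way_random(ratings):
    """ICC(2,1) for a table with one row per output and one column per rater."""
    n = len(ratings)        # number of rated outputs
    k = len(ratings[0])     # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 outputs and >= 2 raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_error) / denom


# Hypothetical raw technical quality scores (1-7): four outputs from one grant,
# each rated independently by five subgroup members.
example = [
    [5, 6, 5, 5, 6],
    [3, 3, 4, 3, 3],
    [6, 7, 6, 6, 7],
    [4, 4, 5, 4, 4],
]
icc = icc_two_way_random(example)
print(f"ICC = {icc:.2f}; meets the .75 threshold: {icc >= 0.75}")
```

An ICC computed this way for each grant could then be compared with the .75 to .80 minimum cited above.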
ASSESSMENT OF THE COMMITTEE’S REVIEW METHODS
The committee developed and implemented an evaluation process for assessing the
outputs of NIDRR's grantees and was able to identify varying levels of quality as well as some of
the output characteristics associated with these varying quality levels. Considerable time was
spent selecting and refining the criteria used to assess quality. Although there was some variation
in the independent scoring among subgroup members, it was rarely extreme, particularly after
the group discussions. And although specific content-area expertise could not be ensured for
every output, given the diversity and breadth of the outputs reviewed, the committee
concludes that, collectively, the subgroups were able to adequately assess all the outputs.
The committee endeavored to assess its evaluation methods throughout the study process.
Members engaged in continuous reflection and recording of strengths and weaknesses during the
rating process conducted in subgroup meetings. To facilitate this effort, the committee chair
participated in all subgroup meetings to ensure members understood how each subgroup was
applying the rating methods. In addition, conference calls with the full committee were held after
each set of subgroup meetings to discuss the evaluation process and refine the methods. Lastly,
during its final meeting, the committee devoted a half-day session to discussion of the strengths
and weaknesses of the process and developing conclusions and recommendations for future
evaluations. This discussion was based on the continuous reflections of committee members,
along with findings from an informal, anonymous poll of committee members about the review
process.
In the poll, each committee member was asked to rate his or her level of confidence in 16
aspects of the review process and 8 topics related to its replication. For each of these aspects,
members assigned a confidence rating on a 5-point scale in which 1 indicated “no confidence at
all” and 5 indicated “extreme confidence.” The poll was intended to provide an indicator of each
committee member’s assessment of the output rating process. Poll results confirmed that
individual members were generally confident in the review process and the potential replication
of the process, with confidence ratings above the midpoint for all but one of the review process
aspects and all but one of the replication topics.
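The summary described above amounts to averaging the ratings for each item and comparing the average with the midpoint of the 5-point scale (3). A minimal sketch, using hypothetical item names and ratings rather than the committee's actual poll data:

```python
from statistics import mean

SCALE_MIDPOINT = 3  # midpoint of the 1-5 confidence scale


def summarize_poll(responses):
    """Return {item: (mean rating, True if the mean is above the midpoint)}."""
    summary = {}
    for item, ratings in responses.items():
        if not all(1 <= r <= 5 for r in ratings):
            raise ValueError(f"rating outside the 1-5 scale for {item!r}")
        avg = mean(ratings)
        summary[item] = (avg, avg > SCALE_MIDPOINT)
    return summary


# Hypothetical responses from four members on two poll items.
poll = {
    "aspect 1": [5, 4, 5, 4],
    "aspect 2": [3, 2, 3, 2],
}
for item, (avg, above) in summarize_poll(poll).items():
    print(f"{item}: mean {avg:.2f}, above midpoint: {above}")
```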
Aspects of the review process in which the committee had the greatest confidence (with
scores above 4 on the rating scale) were:
• the technical quality score,
• the face validity of the consensus scores that were produced for outputs,
• the ability of the committee to evaluate outputs without having consumers on
  subgroups, and
• the appropriateness of a 7-point quality rating scale.
These results were consistent with the committee’s overall impressions of the strengths and
weaknesses of the evaluation process over the course of its work.
With regard to the poll item on the ability of the committee to evaluate outputs without
having consumers on subgroups, the committee notes that its confidence rating on this item is not
meant to suggest that the input of individuals with disabilities is not a necessary part of the
process. The committee included two subject-matter experts who are also individuals with
disabilities, and the point above relates to committee members’ view that the subgroups, while
not including consumers who lack relevant scientific expertise, did assign appropriate scores to the
outputs.
The poll also confirmed committee impressions regarding the challenge of rating outputs
other than peer-reviewed journals; this was the one aspect of the review process receiving an
average confidence rating below the midpoint of the scale.
The results of the poll related to replication of the review process largely mirrored the
results related to the review process itself. Committee members expressed the greatest
confidence in the ability to match appropriate reviewer expertise with outputs to review and the
ability to appropriately secure knowledgeable reviewers. The only issue that received an average
confidence rating below the midpoint was the ability to assess the overall quality of grants by
reviewing selected outputs.
Overall, members’ reflections on the summative evaluation process suggest that it
worked well and achieved what it was designed to do. However, the committee encountered
several challenges and limitations during the course of our work that limit the generalizability of
the findings from this evaluation and restrict what can be said about the totality of outputs
generated by all NIDRR grantees. In the next section, within the context of recommendations for
future evaluations, we discuss these limitations and issues.
RECOMMENDATIONS FOR FUTURE EVALUATIONS
The committee offers conclusions, recommendations, and suggestions on defining
evaluation objectives, strengthening the output assessment, and using NIDRR’s APR system to
capture data for future evaluations. The goal of our recommendations and suggestions is to
improve future evaluation efforts and to ensure that evaluation results optimally inform NIDRR’s
efforts to maximize the impact of its research grants.
Defining Future Evaluation Objectives
The primary focus of the committee’s summative evaluation was to assess the quality of
research and development outputs produced by grantees. This evaluation did not allow for an
in-depth examination or comparison of the larger context of the funding programs, grants, or
projects in which the outputs were produced. Although capacity building is a major thrust of
NIDRR's center and training grants, assessment of training outputs, such as the number of
trainees moving into research positions, was not part of our charge.
NIDRR’s grant mechanisms or programs vary substantially in both size and duration (see
Table 1, above), with grant amounts varying from less than $50,000 (Field Initiated Projects) to
more than $4 million (Center Grants), and grant durations varying from less than 1 year to more
than 5 years. Programs also differ in their objectives, so the expectations of the grantees under
different programs vary widely. For example, a Switzer training grant is designed to increase the
number of qualified researchers active in the field of disability and rehabilitation research. In
contrast, Center Grants and Model Systems have multiple objectives that include research,
technical assistance, training, and dissemination. Model Systems have the added expectation of
contributing patient-level data to a pooled set of data on the targeted condition (i.e., Burn, TBI,
SCI).
The number of grants to be reviewed was set at 30 by the committee’s charge; this
represented about one-quarter of the pool of 111 grants from which the sample was drawn, with
the requirement that the sample reflect grants across NIDRR’s program mechanisms. Even
though five program mechanisms were not included in the sampling pool, the number of grants
reviewed for any of the remaining nine program mechanisms was very small. (The largest
number of grants reviewed for any single program mechanism was 10, for FIPs.) Since the
number of grants reviewed for any given program was small, the committee did not attempt to
make comparisons of the type or quality of outputs by program mechanism.
The committee was directed by NIDRR to review two outputs for each of the projects
identified by a given grantee. Therefore, a grantee with a single project had two outputs
reviewed, a grantee with three projects had six outputs reviewed, and so on. Although larger
grants with more projects also had more outputs reviewed, the current design considers neither
grant size nor duration. The design also did not take into consideration the relative importance of
a given project within a grant.
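The arithmetic behind this design can be sketched as follows. The figures (30 grants sampled, a pool of 111, and two outputs per project) come from the text above; the constant and function names are illustrative only.

```python
GRANTS_SAMPLED = 30
GRANTS_IN_POOL = 111
OUTPUTS_PER_PROJECT = 2


def outputs_reviewed(num_projects):
    """Number of outputs reviewed for a grantee with the given number of projects."""
    if num_projects < 1:
        raise ValueError("a grantee must have at least one project")
    return OUTPUTS_PER_PROJECT * num_projects


sampling_fraction = GRANTS_SAMPLED / GRANTS_IN_POOL
print(f"Sampling fraction: {sampling_fraction:.0%}")
for projects in (1, 3):
    print(f"{projects} project(s): {outputs_reviewed(projects)} outputs reviewed")
```

Note that the count depends only on the number of projects; grant size, grant duration, and the relative importance of each project do not enter the calculation, which is the limitation described above.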
The committee was also asked to produce an overall grant rating based on the outputs
reviewed and the information available about the grants from the APRs. Results at the grant level
are subject to more limitations than those regarding outputs due to the general lack of
information about how the outputs did or did not interrelate; whether and, if so, how grant
objectives were accomplished; and the relative priority placed on the various outputs. In
addition, for larger, more complex grants, such as Center Grants, a number of grant expectations,
such as capacity building, dissemination, outreach, technical assistance, and training, are unlikely
to be adequately reflected in the approach used, which focused exclusively on specific outputs.
The relationship of outputs to grants is more complex than this approach allowed.
Recommendation 1: NIDRR should determine whether assessment of the
quality of outputs should be the sole evaluation objective.
Considering other evaluation objectives might offer NIDRR further opportunities
to continuously assess and improve its performance and achieve its mission. Alternative
designs would be needed to evaluate the quality of grants or to allow comparison across
program mechanisms. For example, if one goal of an evaluation is to assess the larger
outcomes of grants (i.e., the overall impact of a grant’s full set of activities), in addition
to the methods used in the current output assessment, the evaluation would need to
include interviewing grantees about their original grant objectives, to learn about how the
grant was implemented and any changes that may have occurred in the projected
pathway, how various projects were tied into the overall grant objectives, and how the
outputs demonstrated the achievement of the grant and project objectives. This approach
would also involve conducting bibliometric or other analyses of all publications and
examining documentation of the grant's activities and its self-assessments, including
cumulative APRs over time. Focusing at the grant level would provide evidence of
movement along the research and development pathway (e.g., from theory to measures,
from prototype testing to market), as well as allowing for assessment of other aspects of
the grant, such as training and technical assistance and the possible synergies of multiple
projects within one grant.
If the goal of an evaluation is to assess and compare the impact of program
mechanisms, different methods may be needed, depending on the expectations for each
program mechanism. They would need to include not only those mentioned above, but
also stakeholder surveys to learn about the specific ways that individual grants affect
their intended audiences. And in order to allow for generalization and comparison across
program funding mechanisms, larger grant sample sizes would be needed. An alternative
would be to increase the grant sample size in a narrower area by focusing assessments on
grants for specific research areas across different program mechanisms or on grants with
shared objectives (e.g., product development, knowledge translation, capacity building).
NIDRR's questions will necessarily drive future evaluations, but other levels of
analysis that NIDRR might focus on could include the portfolio level (e.g., Model System
grants, research and development, or training grants), which NIDRR has done in the past;
the program priority level (i.e., grants funded under certain NIDRR funding priorities) to
answer questions regarding the quality and impact of NIDRR's priority setting; and
institute-level questions to evaluate the net impact of NIDRR grants or to test
assumptions embedded in NIDRR's logic model. For example, NIDRR's intermediate
outcome arena targets adoption and use of new knowledge leading to
changes/improvements in policy, practice, behavior, and system capacity (see Federal
Register, February 15, 2006, pp. 8,173–8,175).
The number of outputs reviewed should depend on the unit of analysis. At the
grant level, it might be advisable to assess all outputs to examine their development, how
they relate to one another, and their impacts. A case study methodology could be used for
subsets of outputs that are related. If NIDRR aims its evaluation at the program funding
mechanism or portfolio level, sampling grants and assessing all outputs would be the
preferred method. For output-level evaluation, having grantees self-nominate their best
outputs, as was done in the present evaluation, is a good approach.
Although assessing grantee outputs is of great value, it is the committee’s view
that the most meaningful results would come from assessing outputs in the context of a
more comprehensive grant-level evaluation. More time and resources would be required
to trace a grant's progress over time in accomplishing its objectives, to understand its
evolutionary development that might have altered original objectives, and to examine the
specific projects that produced the various outputs. However, more closely examining the
inputs and processes of grant implementation that produced the outputs would yield
broader implications for the value of grants, their impact, and future directions for
NIDRR.
Strengthening Future Output Assessments
The committee was able to create a reasonably reliable system for evaluating the outputs
of NIDRR grantees based on criteria used in assessing federal research programs both in the
United States and other countries. With refinements, it could be applied to evaluate future
outputs even more effectively. In implementing the output-level assessment, particular
challenges and issues arose in relation to the diversity of outputs, the timing of evaluations,
sources of information, and reviewer expertise.
Diversity of Outputs
The quality rating system used in the summative evaluation worked very well for
publications in particular, which comprised 70 percent of the outputs reviewed. Using the four
criteria developed by the committee, the reviewers were able to identify varying levels of quality
and the characteristics associated with each of them. However, the quality criteria were not as
easily applied to more diverse outputs such as websites, conferences, and interventions. These
outputs require more individualized criteria for assessing specialized technical elements and
sometimes more in-depth evaluation methods. Applying one set of criteria, even though broad
and flexible, could not guarantee sufficient and appropriate applicability to every type of output.
Timing of Evaluations
The timing of an assessment of outputs depends on the goal of the assessment. Assessing
technical quality can be done immediately, but assessing impact of outputs requires time between
the release of an output and its eventual impact. Evaluation of outputs during the final year of an
award may not allow sufficient time for them to have full impact. For example, some
publications will be forthcoming, and others will not have had sufficient time to have an impact.
The tradeoff of waiting a year or more after the end of a grant is the likelihood that staff involved
with the original grant may not be available, recollection of grant activities may be
compromised, and engagement or interest in demonstrating results may be reduced. However,
publications can be tracked regardless of access to the grantee. Outputs other than publications,
such as technology products, could be assessed in an interim evaluation.
Sources of Information
Committee members were provided with structured briefing books containing the outputs
to be reviewed and supplemental information that members could draw on if additional
information was needed to assign quality scores. The supplemental information included
information submitted through the grantees’ APRs and final reports and information provided in
a supplemental questionnaire developed by the committee (see Attachment C). The primary
source of information used by committee members in assigning scores was direct review of the
output itself. The supplemental information played a small role in assessing publications; for
outputs such as newsletters and websites, this information could provide needed context and
additional evidence helpful in determining quality scores. However, it is important to note that
the supplemental information involved grantees’ self-reports, which may be susceptible to social
desirability bias. Therefore, committee members were cautious in the degree to which this
information could serve as the basis for assigning higher output scores. Moreover, the APR was
designed for grant-monitoring and performance reporting rather than as a source of information
for a program evaluation.
As a supplemental source, the information supplied on the APRs and the questionnaire
was not always sufficient to inform the quality ratings. As examples, the technical quality of a
measurement instrument was difficult to assess if there was insufficient information about its
conceptual base or its development and testing. For conferences, workshops, and websites, it
would have been preferable for the grantee to identify the intended audience so that the
committee might better assess whether the described dissemination activities were successful in
reaching it.
For the output categories of tools, technology, and informational products, grantees
sometimes provided a publication that did not necessarily describe the output. In addition, some
outputs were difficult to assess when there was no corroborating evidence provided to support
grantees’ claims about technical quality, advancement of the field, impact, or dissemination
efforts.
The committee did not use standardized reporting guidelines, such as CONSORT
(Schulz et al., 2010) or PRISMA (Moher et al., 2009), which journals use in their peer review
processes for selecting manuscripts for publication. The committee members generally assumed
that publications that were peer-reviewed warranted a minimum score of 4 for technical quality,
which could be changed after the committee’s discussion. In some cases, the final committee
scores for technical quality for peer-reviewed publications were above 4; in other cases, the final
scores were below 4. If reporting guidelines had been used in the review of research
publications, it is possible that the ratings would have changed.
Reviewer Expertise
The committee was directed to assess the quality of four types of specified outputs.
Although the most common output type was publications, NIDRR grants produce a range of
other complex, varied outputs, including tools and measures, technology devices and standards,
and informational products. These outputs vary widely in their complexity and the investment
needed to produce them. For example, a newsletter is a more modest output than a new
technology or device. To assess the quality of outputs, the committee members used criteria that
were based on the cumulative literature reviewed and their own research expertise in diverse
areas of rehabilitation and disability research, medicine, and engineering, as well as expertise in
evaluation, economics, knowledge translation, and policy. However, the combined expertise of
the panel did not include every possible content area in the broad field of disability and
rehabilitation research.
Recommendation 2: In any future evaluations of output quality, NIDRR
should refine the process developed by the committee to strengthen the
design related to the diversity of outputs, the timing of evaluations, sources of
information, and reviewer expertise.
Corresponding to the points above, these refinements include the following.
Diversity of Outputs. The dimensions of the quality criteria should be tailored and
appropriately operationalized for different types of outputs, such as devices, tools, and
information products (including newsletters, conferences, and websites) and should be
field tested with grants under multiple program mechanisms and refined as needed.
For example, the technical quality criterion includes the dimension of
accessibility and usability. The questionnaire asked grantees to provide evidence of these
traits. However, the dimensions should be better operationalized for different types of
outputs. For “tools,” such as measurement instruments, the evidence to be provided
should pertain to pilot testing and psychometrics. For informational products, such as
websites, the evidence should include results of user testing, assessment of usability
features, compliance with Section 508 standards (regulations from the 1998 amendment
to the Rehabilitation Act of 1973 requiring the accessibility of federal agencies’
electronic and information technology to people with disabilities), etc. For technology
devices, the evidence should document the results of research and development tests
related to human factors, ergonomics, universal design, product reliability and safety, etc.
The quality criterion related to dissemination provides other clear examples of the
need for further specification and operationalization of the dimensions. For example, the
dissemination of technology devices should be assessed by examining the progress
toward commercialization, grantees’ partnerships with relevant organizations, including
consumers and manufacturers, and the delivery of information through multiple media
types and sources tailored to intended audiences for optimal reach and accessibility.
Timing of Evaluations. The committee suggests that the timing of an output assessment
should vary by output type. Publications would best be assessed at least 2 years after a
grant ends. However, plans for publications and other dissemination, as well as the
audience for scientific papers, could be included as an item in the final report. As
discussed above, other outputs developed during the course of a grant should be
evaluated on an interim basis to assess the development and evolution of products.
Outputs that have the potential to affect practice or policy may require longer periods of
time to pass before impact materializes and can be measured, so they would also best be
evaluated on an interim basis.
Sources of Information. A more proactive technical assistance approach is needed to
ensure that grantees provide the data necessary to assess the specific dimensions of each
quality criterion. As stated above, the information supplied on the APRs and the
questionnaire was not always sufficient to inform the quality ratings. (See also the
discussion of information requested in the grantee questionnaire, above, and the
discussion of APRs, below.)
Reviewer Expertise. The committee suggests that future output evaluations should
consider including an accessible pool of experts in different technical areas who can be
called on to review selected grants and outputs. In addition, it is essential that future
review panels include scientists with disabilities. Consumers who are not scientists
could also play a vital role as review panel members who can address the impact and
dissemination criteria.
Using Annual Performance Reports for Evaluation
NIDRR's APR system has numerous strengths, but the committee identified some points
that NIDRR should consider in building greater potential for use of these data in evaluations. The
APR system (Research Triangle Institute, 2009) includes the grant abstract, funding information,
descriptions of the research and development projects, and outcome domains targeted by
projects, as well as a range of variables for reporting on the four different types of grantee
outputs; see Table 3. The system is tailored to different program mechanisms as needed. All of
the descriptive information listed above, plus the output-specific variables listed in Table 3, were
used in the committee’s work. The data were provided to the committee as electronic databases
and in the form of individual grant reports.
The APR data provided to the committee by NIDRR at the outset of our work were used
to profile the grants for sampling and in listing all of the grantees' projects and outputs. They
facilitated asking the grantees to nominate outputs for our review, since we were able to generate
comprehensive lists of all reported projects and outputs to make the task of output selection less
burdensome for the grantees. If grantees had more recent outputs that they wished to nominate as
their top two for the committee's review, they had the option to do so.
TABLE 3 Data Elements Related to Outputs That Are Covered in an APR
Variables in APR [a]    Publications    Tools    Technology    Information
Type of output    X X X X
Name and full citation    X X X X
Brief description of purpose    X X X
Brief description of how output was validated or tested    X X X
Whether publication was peer reviewed or not    X
Whether the research and related activity reported in the article took place during current,
  immediate past, or previous (nonconsecutive) funding cycle    X
Whether publication was sent to NARIC for inclusion in REHABDATA    X
Whether publication was produced as a direct result of receiving funding for this grant    X
“Most important” [b] outputs that contributed the most to achieving the outcome-oriented goals
  for the award    X X X X
Outcome-oriented goal that corresponds to most important outputs (advances knowledge; increases
  capacity for research, training, or knowledge translation; or facilitates change in policy,
  practice, or system capacity)    X X X X
NIDRR outcome arena that corresponds to most important outputs (health and function, employment,
  participation and community living, cross-cutting)    X X X X
Whether output is described in a publication output and indicate which one    X X X
Key findings or lessons learned    X
How output is contributing to the outcome-oriented goal by solving a problem, closing an
  identified gap, or benefiting the target population    X X X X
[a] SOURCE: Using NIDRR APR report format for Rehabilitation Research and Training Centers as an example.
[b] Defined for grantees by NIDRR as “those that contributed most to achieving the outcome-oriented goals for the
award by advancing knowledge, increasing capacity for research, training or knowledge translation; or facilitating
changes in policy, practice, or system capacity.”
NIDRR also provided grantees' narrative APRs from the last year of the grants, as well as
their final reports. These narratives were very useful to the committee for compiling descriptions
of the grants. However, the quality of the information contained in the narrative annual reports
varied.[3] For example, grant abstracts were not uniform in the information they contained. Some
stated their grant objectives; others omitted them and focused on summarizing their main grant
activities. The APRs of the grants reviewed were inconsistent in providing useful information for
understanding how the outputs being reviewed fit in the context of the overall grant or projects.
The final reports in most cases did not provide a cumulative overview of the life cycle of the
grants and outputs, which would have been helpful. The APR does collect information on
changes in the course of grants, but it was not always easy to understand this information from
just viewing the last year's APR or the final report.
NIDRR also provided the committee with special text reports that contained some of the
narrative information in the APRs about outputs other than publications. These reports included
such information as the purpose of the output, NIDRR outcome domains targeted by the output,
how the output was validated, and how the output contributes to achievement of the grantee’s
goals. These reports have the potential to supply contextual information for evaluations.
However, the quality of the information in them varied across the text reports describing the tool,
technology, and information outputs that the committee reviewed. Only half of the text reports
contained substantive descriptive information.
[3] The APR is a large information technology system that is used for monitoring and tracking grantee
progress and for reporting on NIDRR’s performance measures under the Government Performance and
Results Act (GPRA). The system was not designed to serve as the basis for grantee evaluations. A
systematic evaluation of the APR was not part of our charge. Though the quality and level of detail
included in the APRs varied, these narratives were useful in providing descriptive grant information.
Not all of the specific outputs reviewed by the committee were reported in the APRs.
Some may have been reported in earlier reporting periods or had been produced after the NIDRR
grant ended.
Recommendation 3: NIDRR should consider revising its APR to better
capture information needed to routinely evaluate the quality and impacts of
outputs, grants, and funding mechanisms. NIDRR might consider such efforts
as consolidating existing data elements or adding new elements to capture the
quality criteria and dimensions used in the committee’s summative
evaluation.
From a recent interview with senior executives at NIDRR, the committee learned
that NIDRR takes pride in having stabilized its APR system in recent years after prior
periods of changing and improving it to make the data more usable for grantees, for grant
monitoring by project officers, and for agency performance reporting. We were informed
that NIDRR is currently in the process of adding a new "accomplishments" module to the
APR that will focus on the external use and adoption of NIDRR-funded outputs. In this
new module, NIDRR will consolidate some data elements that are already being collected
and add new ones. For up to five outputs that have been used or adopted by persons or
groups external to the grant during the reporting period, grantees will be asked to provide
information for each output on who adopted the outputs (in 16 categories, such as
researchers, practitioners, service providers); how the output is being used or adopted by
the target audience; the source of the evidence; and if and how the output may be
contributing to changes in policy, practice, system capacity, or other impact areas. These
ongoing changes to the APR will address the quality criteria used in the
committee’s evaluation for assessing the advancement of knowledge or practice and the
likely or demonstrated impact of outputs.
For the technical quality criterion, the current APR system collects data on
whether articles were published in peer-reviewed journals. For the technical quality of
outputs other than publications, we provide examples in the discussion of
Recommendation 2 (above) of ways to operationalize dimensions of accessibility and
usability, such as providing evidence of testing the psychometrics of measurement
instruments, assessing the usability features of informational products, and documenting
the results of research and development tests of technology products that relate to human
factors, ergonomics, universal design, product reliability, and safety. The APR system
currently asks for information on how outputs were validated, but data elements that
relate to such testing might be further specified in the APR system.
The APR system might also be modified to capture evidence on the quality
criterion of dissemination of outputs through such data elements as target audiences for
dissemination activities, media types, number of outputs disseminated, and reach of
dissemination, such as number of hits on websites.
Recommendation 4: NIDRR should investigate ways to work with grantees to
ensure the completeness and consistency of information provided in the
APRs.
The committee fully appreciates the necessity of minimizing the data collection
burden on grantees and acknowledges the challenges and feasibility issues related to
modifying the APR system while at the same time providing continuity in the system.
The committee suggests, however, that embedding evaluation data collection processes
into existing processes will lead to greater efficiencies and reduce grantee burden while
enhancing NIDRR’s ability to evaluate quality and impact. The committee acknowledges
that the refinements suggested would have to be undertaken in the context of a larger
assessment of the APR system as part of NIDRR's ongoing initiatives to improve the
system.
In sum, the committee was able to create a reasonably valid and reliable system for
evaluating the outputs of NIDRR grantees. If future evaluations of output quality are conducted,
the process developed by the committee should be implemented with refinements to strengthen
the design and process. Although assessing grantee outputs is of great value, the committee
thinks that even greater value would come from assessing outputs in the context of a more
comprehensive grant-level evaluation, which could yield broader implications for the value of
grants, their impact, and future directions for NIDRR.
The committee has appreciated the opportunity to work on this important endeavor, and
we look forward to delivering our final report to you later this year.
Sincerely yours,
David H. Wegman, Chair
Committee on the External Evaluation of NIDRR and Its Grantees