Division of Behavioral and Social Sciences and Education
Board on Human-Systems Integration
500 Fifth Street, NW
Washington, DC 20001
Phone: 202 334 2678
Fax: 202 334 2210
Email: bohsi@nas.edu
www.nationalacademies.org

July 8, 2011

Ms. Mary Darnell
Contracting Officer's Representative
U.S. Department of Education
Office of Special Education and Rehabilitative Services
550 12th Street, SW
Washington, DC 20202

Dear Ms. Darnell:

At the request of the National Institute on Disability and Rehabilitation Research (NIDRR) within the Office of Special Education and Rehabilitative Services, U.S. Department of Education, the Board on Human-Systems Integration of the National Research Council (NRC) convened an ad hoc committee to conduct an evaluation of aspects of NIDRR's program. Specifically, the Committee on the External Evaluation of NIDRR and Its Grantees was charged to review NIDRR's priority-setting, peer review, and grant management processes; develop an overall framework and evaluation design for the review of grantee outputs for a sample of 30 grantees; conduct a review of the sampled grantee outputs; and assess the output review process. (For a list of committee members, see Attachment A.)

The results of this project will be presented in a final report that will include a description and assessment of NIDRR's priority-setting, peer review, and grant management processes and the quality of the grantees' outputs. The committee's evaluation is nearing completion, and the committee plans to deliver the final report in fall 2011. However, knowing that NIDRR plans to move forward with an additional evaluation cycle prior to the delivery of the committee's final report, we wanted to provide the agency with information that could inform future evaluation design. This letter report is therefore limited in scope to discussing the procedures the committee used in its output evaluation, its assessment of those procedures, and recommendations for future evaluations.
(See Attachment B for the names of the reviewers of this letter report.)

BACKGROUND ON NIDRR AND COMMITTEE CHARGE

The National Institute on Disability and Rehabilitation Research is the principal federal agency that funds applied research and development to improve the lives and functioning of persons with disabilities (Office of Special Education and Rehabilitative Services, 2007). NIDRR was established by the 1978 amendments to the Rehabilitation Act of 1973 and is one of three components of the Office of Special Education and Rehabilitative Services at the U.S. Department of Education (Office of Special Education and Rehabilitative Services, 2007). NIDRR has conducted various efforts to assess its portfolio and grant results and hold its programs accountable for results. In 2009, NIDRR requested that the NRC conduct an evaluation of NIDRR and its grantees.

The charge to the committee included two major components. The first component, termed the "process evaluation," involved examining NIDRR's priority-writing process, its practices for the peer review of grant applications, and its grant management processes. This component will be discussed in the committee's final report and is not covered in this letter report. The second component, termed the "summative evaluation," involved the assessment of grantee outputs. The key question of the summative evaluation was articulated by NIDRR as follows: To what extent are the final outputs from NIDRR grants of high quality? The major portion of this component will similarly be covered in the committee's final report and is not covered in this letter report. However, one element of the summative evaluation involved a committee self-assessment of the methods it developed to conduct the summative evaluation and the identification of implications for future reviews.

This element is the sole focus of this letter report, which is organized into three main sections. The first section summarizes the methods and procedures the committee used in the summative evaluation. The second section discusses the committee's assessment of these methods. The third section offers recommendations for future evaluations of NIDRR and its grantees.

SUMMARY OF METHODS DEVELOPED FOR ASSESSING THE QUALITY OF OUTPUTS

As noted above, the summative evaluation component involved an assessment of a wide range of grantee outputs, which are defined and categorized by NIDRR as follows:

1. Publications (e.g., research reports and other publications in peer-reviewed and non-peer-reviewed outlets).
2. Tools, measures, and intervention protocols (e.g., instruments or processes created to acquire quantitative or qualitative information, knowledge, or data on a specific disability or rehabilitation issue, or to provide a rehabilitative intervention).
3. Technology products and devices (e.g., industry standards/guidelines, software/netware, inventions, patents/licenses/patent disclosures, working prototypes, products evaluated or field-tested, products transferred to industry for potential commercialization, products in the marketplace).
4. Informational products (e.g., training manuals or curricula, fact sheets, newsletters, audiovisual materials, marketing tools, educational aids, and websites or other Internet sites produced in conjunction with research and development, training, dissemination, knowledge translation, and/or consumer involvement activities).

Committee members reviewed 148 outputs: 103 publications (category 1); 9 tools, measures, and intervention protocols (category 2); 9 technology products and devices (category 3); and 27 informational products (category 4).

To prepare for the output review, the committee first developed a set of criteria, and dimensions under those criteria, that would be used to assess the quality of outputs. Second, the committee developed a questionnaire to assist grantees in nominating outputs to be reviewed and to give them the opportunity to provide supplemental descriptive information about each of the nominated outputs, along with the outputs themselves. Third, a sampling plan was developed to select grantees who would be invited to participate in the evaluation. Fourth, the committee staff worked with grantees who agreed to participate to gather and catalogue the outputs and supplemental information submitted for the committee's review. Fifth, the committee assessed the outputs through an expert review process based on direct review of the outputs and any supplemental information provided by the grantees. These five steps in the study process are each described in detail below.

Quality Criteria Development

A key element of the summative evaluation was to respond to NIDRR's request to develop criteria for assessing the quality of its grantees' outputs.[1] In developing the criteria, the committee drew on its own research expertise, recommendations of the external advisory group convened by NIDRR while planning this NRC evaluation (National Institute on Disability and Rehabilitation Research, 2008), and methods used in other NRC and international studies that have evaluated federal research programs (see, e.g., Bernstein et al., 2007; Canadian Academy of Health Sciences, 2009; Chien, Chen, and Chen, 2009; Ismail, Tiessen, and Wooding, 2010; National Research Council, 2007; Wooding and Starkey, 2010; Wooding et al., 2009). The committee developed four criteria:

1. Technical quality of output. The technical quality of outputs was assessed using dimensions that included applying standards of science and technology, appropriate methodology (quantitative or qualitative design and analysis), and degree of accessibility and usability.
2. Advancement of knowledge or the field. The dimensions used to assess advancement of the knowledge base or of the field (e.g., research, practice, or policy as relevant) included scientific advancement of methods, tools, and theory; developing new information or technologies; closing an identified gap; and using methods and approaches that were innovative or novel.
3. Likely or demonstrated impact. This criterion was used to assess the likely or demonstrated impact of outputs on science (journal impact, citations); consumers (for people with disabilities: health, quality of life, participation); provider practice; health and social systems; social and health policy; or the private sector or commercialization.
4. Dissemination. The dimensions of dissemination included the identification and tailoring of materials for reaching different audience/user types; collaboration with audiences/users in identifying content and medium needs/preferences; delivery of information through multiple media types and sources for optimal reach and accessibility; evaluation of dissemination efforts and impacts; and commercialization/patenting of devices, if applicable.

A 7-point scale was used to rate the criteria at varying levels of quality: 1 indicated poor quality, 4 indicated good quality, and 7 indicated excellent quality. A rating of 4 meant that the output solidly fell in the range of meeting expectations for good quality. See Box 1 for examples of quality indicators considered by committee members in determining each criterion score. These examples are not intended to be exhaustive, but to illustrate the attributes of outputs that were considered in the committee's review. In rating the outputs, committee members drew from their scientific expertise to consider the output's quality with respect to the dimensions under each criterion.

---
[1] The development of the criteria was informed by open session discussions in which NIDRR staff were present.

BOX 1
Examples of Quality Indicators Considered in Determining Output Scores

Technical Quality
• Strength of literature review and framing of issues
• Competence of design, considering the research question and other parameters of the study
• Quality of measurement planning and description
• Analytic methods and interpretation; degree to which recommendations for change were drawn clearly from the analysis
• Description of feasibility, usability, accessibility, and consumer satisfaction testing

Advancement of Knowledge/Practice
• Degree to which a ground-breaking and innovative approach is presented
• Application of a formal test of a hypothesis regarding a technique used widely in the field to improve practice
• Level of advancement and improvement to current classification systems
• Usefulness of descriptive base of information about factors associated with a condition
• Novelty of ways of studying a condition that can be applied to developing new models, training, or research

Likely or Demonstrated Impact
• Degree to which output is well cited or has promise to be (for newer articles)
• Potential to improve the lives of persons with disabilities through increasing accessibility
• Possibly transformative clinical and policy implications
• Potential for building capacity, lowering costs, commercialization, etc.
• Influence on the direction of research, use in the field, or capacity of the field

Dissemination
• Method and scope of dissemination
• Description of the evidence of dissemination (e.g., numbers distributed to different audiences)
• Level of strategic dissemination to target audiences when needed
• Evidence of reaching the target audience
• Degree to which appropriate multiple media outlets were used, such as webinars, TV coverage, Senate testimony, websites, DVDs, and/or social network sites
Grantee Questionnaire

NIDRR supplied the committee with information gathered from grantees in their Annual Performance Reports (APRs) and final reports (Research Triangle Institute, 2009). Grantees are required to complete APRs annually to report on their progress. At the end of a grant, they must complete a final report. To supplement the APRs and final reports provided by NIDRR, the committee developed a grantee questionnaire (see Attachment C).

The first part of the questionnaire asked grantees to list each of the projects under the grant and nominate the top two outputs from each project that reflected the grant's best achievements. The questionnaire specified that outputs were to be drawn from the four categories defined in the APR (Research Triangle Institute, 2009), described above: (1) publications; (2) tools, measures, and intervention protocols; (3) technology products and devices; and (4) informational products. The questionnaire instructions indicated that the committee would prefer to review one publication and one other type of output for each project under a grant, but that grantees could submit two publications for review if that was the only type of output for a project. The questionnaire asked the grantees to submit the actual outputs for the committee's review. If the output was a website, a tool, or a technology device that had to be demonstrated, grantees were asked to provide descriptive information, pictures, or links to websites for the committee's direct review.

The second part of the questionnaire included a series of questions to elicit more in-depth descriptions of the outputs if needed and to provide supplemental evidence of the output's technical quality, how it advanced knowledge or practice, its likely or demonstrated impact, and how it was disseminated. This type of information, needed for a comprehensive assessment of an output, would not always be apparent from reviewing the output in isolation.

For supplemental information on technical quality, grantees were asked to describe examples such as the approach or method used in the output's development; relevant peer recognition; receipt of a patent, approval by the Food and Drug Administration, or use of the output in standards development; and evidence of the usability and accessibility of the output. For supplemental information on advancement of knowledge or the field, grantees were asked to discuss the importance of the original question or issue and describe how the output had advanced knowledge in such arenas as making discoveries; providing new information; establishing theories, measures, and methods; closing gaps in the knowledge base; and developing new interventions, products, technology, and environmental adaptations. For supplemental information on likely or demonstrated impact, grantees were instructed to describe the output's potential or actual impact on science, people with disabilities, provider practice, health and social systems, social and health policy, the private sector/commercialization, capacity building, and any other relevant arenas.
For supplemental information about dissemination, grantees were asked to describe the stage and scope (e.g., local, regional, national) of dissemination efforts, specific dissemination activities, any identification and tailoring of materials for particular audiences, efforts to collaborate with particular audiences or user communities to identify content and medium needs and preferences, and the delivery of information through multiple media types. Grantees were also asked to provide information from evaluations they may have conducted of their dissemination efforts and impacts (e.g., results of audience feedback or satisfaction surveys).

The committee piloted the questionnaire on one NIDRR grant that had ended in 2008 and was outside the sampling pool (described below). Operating through subgroups, the committee assessed five outputs of this grant, which consisted of publications, an assessment package, a working prototype, and a fact sheet. As a result of that assessment, the committee revised the questionnaire by collapsing some of the dimensions from an original six criteria into the four final criteria.[2]

To further supplement the grantee questionnaire in assessing the likely impact of published articles, the committee also used such sources as Scopus and the Web of Science to determine the journal impact factor and the number of citations of a particular article.

Sampling

NIDRR provided the committee with a data set of grantee information that consisted of all grants ending in the years 2006-2010 (n = 248). Table 1 shows, within each program mechanism: the number of grants and the corresponding proportion of all NIDRR grants, the mean duration of grants, and the total funds expended and proportion of all funds expended. The last five columns of the table show the number of grants that ended in each year from 2006 to 2010. Highlighted in these last five columns is a subset of 111 grants that comprised the sampling pool from which 30 grants were randomly sampled for the summative evaluation.

The committee used this smaller subset of all NIDRR grants as the sampling pool because of its charge and a preliminary analysis of the data. The committee was directed by its charge to draw a sample of 30 grants ending in 2009 that reflected the range of work conducted across NIDRR's 14 program mechanisms. However, as can be seen in Table 1, several program mechanisms did not have at least two grants ending in 2009: the three Model Systems (MS) mechanisms, Disability Business Technical Assistance Centers (DBTAC), Disability and Rehabilitation Research Projects-Knowledge Translation (DRRP-KT), Advanced Rehabilitation Research Training (ARRT), and grants under the Disability and Rehabilitation Research Projects Program (DRRP)-Section 21.

Because the MS grant mechanisms support some of NIDRR's flagship programs, including traumatic brain injury (MS-TBI), spinal cord injury (MS-SCI), and burn injury (MS-Burn), adjustments were made to the sampling pool to ensure that these programs would be included in the sample. The committee thus went back to the nearest year that yielded a total of at least two grants, which was 2008 for MS-Burn and MS-TBI (n = 5 for MS-Burn; n = 9 for MS-TBI, with 1 in 2009 and 8 in 2008) and 2007 for MS-SCI (n = 9), and included these grants in the pool. The DBTAC, DRRP-KT, ARRT, and DRRP-Section 21 mechanisms were excluded from the pool for this first cycle of evaluations. Small Business Innovation Research, Phase I, grants were also excluded from the sampling pool because they do not produce "outputs" and therefore did not align with the evaluation parameter to review two outputs for each project within a grant.

---
[2] An original criterion on output usability was collapsed into the final technical quality criterion. Another original criterion on consumer and audience involvement was restructured as dimensions of the other criteria. For example, the technical quality criterion now includes a dimension on "evidence of usability and accessibility." The impact criterion includes a dimension on "impact on people with disabilities." The dissemination criterion includes dimensions on "tailoring materials to audiences" and "collaboration with users."
After these adjustments, the total pool consisted of 111 grants across nine NIDRR program mechanisms. It is possible that the older grants included in the evaluation had an advantage over the grants ending in 2009 because of the additional time for their outputs to have had an impact.

From this pool of 111 grants, 30 grants (27 percent) were randomly selected for review in the following way. To ensure that the sample of grants represented the nine program mechanisms included in the pool, the sampling was stratified at the program mechanism level, with each mechanism sampled in proportion to its share of all grants in the sampling pool. For example, there were 36 Field Initiated Projects (FIP) grants in the sampling pool (see Table 1), which was 32 percent of all of the grants in the sampling pool (n = 111); therefore, 32 percent of the 30 grants in the sample (n = 10) should be FIPs. The 36 FIPs in the sampling pool were numbered 1 through 36, and 10 FIP grants were then randomly selected using a website that generated random numbers. Table 2 in the next section shows the number of grants included in the sample by program mechanism. Proportionally, the number of grants sampled in each program mechanism did not reflect the actual proportions of all grants in the larger NIDRR data set (N = 248), but the sampling method did allow for the largest number of grants in the sample to be FIP grants, which was the largest program mechanism in the NIDRR data set.
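The proportional allocation described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the committee's actual procedure: the per-mechanism pool counts are taken from Table 1, and `random.sample` stands in for the random-number website the committee used.

```python
import random

# Sampling pool by program mechanism (111 grants; counts from Table 1).
pool_counts = {
    "MS-Burn": 5, "MS-TBI": 9, "MS-SCI": 9, "RERC": 8, "RRTC": 10,
    "DRRP": 14, "FIP": 36, "SBIR-II": 8, "Switzer": 12,
}
total_pool = sum(pool_counts.values())   # 111
sample_size = 30

# Proportional allocation: each mechanism's share of the 30-grant sample.
# For FIP: 36/111 of 30, i.e., about 32 percent -> 10 grants.
allocation = {m: round(n / total_pool * sample_size)
              for m, n in pool_counts.items()}

# Simple rounding can leave the total one grant short of 30 (here MS-Burn
# rounds down to 1), so the last slot must be assigned by judgment.
shortfall = sample_size - sum(allocation.values())

# Number the grants within each stratum and draw without replacement,
# standing in for the random-number website.
rng = random.Random(2011)  # fixed seed so the sketch is reproducible
sample = {m: sorted(rng.sample(range(1, pool_counts[m] + 1), k))
          for m, k in allocation.items()}
```

Note that plain rounding leaves the strata one grant short here; the committee's actual sample resolved this by including two MS-Burn grants (see Table 2).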

TABLE 1 NIDRR Grants Ending Between 2006 and 2010, by Program Mechanism

Program/Funding Mechanism                 Grants  % of    Mean Dur.  Funding ($)   % of     2006  2007  2008  2009  2010
                                                  Grants  (years)                  Funding
Model Systems Grants
 Burn Injury (MS-Burn)                        5    2.02     6.1       7,271,563     2.34      0     0     5     0     0
 Traumatic Brain Injury (MS-TBI)             16    6.45     5.6      29,132,862     9.38      0     7     8     1     0
 Spinal Cord Injury (MS-SCI)                 17    6.85     6.5      33,977,321    10.94      8     9     0     0     0
Center Grants
 Rehabilitation Engineering and
   Research Centers (RERC)                   12    4.84     5.7      55,816,980    17.98      0     0     0     8     4
 Rehabilitation Research and
   Training Centers (RRTC)                   21    8.47     5.9      82,920,345    26.71      0     0     0    10    11
Research and Development Grants
 Disability and Rehabilitation
   Research Projects (DRRP)                  18    7.26     5.0      30,627,386     9.87      0     0     0    14     4
 Field Initiated Projects (FIP)              74   29.84     3.8      35,881,454    11.56      0     0     0    36    38
 Small Business Innovation Research,
   Phase I (SBIR)                            31   12.50     0.6       2,323,305     0.75      0     0     0    16    15
 Small Business Innovation Research,
   Phase II (SBIR)                           16    6.45     2.5       7,990,171     2.57      0     0     0     8     8
Translation Grants
 DRRP-Disability Business Technical
   Assistance Centers (DBTAC)                 1    0.40     1.8       1,742,400     0.56      0     0     1     0     0
 DRRP-Knowledge Translation (DRRP-KT)         3    1.21     5.0       8,179,933     2.64      0     0     0     0     3
Training Grants
 Advanced Rehabilitation Research
   Training (ARRT)                           11    4.44     5.8       8,229,338     2.65      0     0     0     1    10
 Switzer Fellowships                         20    8.06     1.3       1,220,000     0.39      0     0     0    12     8
 DRRP-Section 21                              3    1.21     6.1       5,141,955     1.66      0     0     1     1     1

Total                                       248  100.00            $310,455,013   100.00      8    16    15   107   102

Number of grants in sampling pool,
  by end year                                                                                 0     9    13    89     0
Total number of grants in pool: 111

NOTE: Columns show the number of grants in each program mechanism; the percent of all NIDRR grants; the mean grant duration in years; total grant funding by mechanism; the percent of total NIDRR funding; and the number of grants ending in each year, 2006-2010. In the original table, the grants included in the sampling pool are highlighted: those ending in 2007 for MS-SCI, in 2008 for MS-Burn and MS-TBI, and in 2009 for the other included mechanisms.

SOURCE: Data summarized from National Institute on Disability and Rehabilitation Research (September 2009). Annual Performance Report Data Set of Grants Ending in 2006 to 2010. Washington, DC: National Institute on Disability and Rehabilitation Research.

After the proposed evaluation methods received approval from the institutional review board of the National Academies, the sample of 30 grants was drawn, and invitations to participate were sent to the principal investigators (PIs) of the 30 grants. The PIs were fully informed about the methods to be used in the evaluation and what would be required of them. Of the original 30 grantees invited, 3 (1 DRRP and 2 FIPs) declined because they did not have time to fulfill the evaluation requirements (n = 2) or had changed institutions (n = 1). Three other grants (1 DRRP and 2 FIPs) were then randomly selected from the remaining pool for the appropriate program mechanisms to bring the final sample to 30 grants. In replacing three of the originally sampled grants, we acknowledge that self-selection bias could have crept into the evaluation findings and that the final sample of 30 grants that participated in the evaluation may not be fully representative of the larger population of grants.

Compiling Outputs to Be Reviewed and Number of Outputs Reviewed

As noted, the PIs of the grants included in the sample were provided with written instructions about how to submit their outputs for the review and how to provide supplemental information about the outputs. Committee staff worked with the grantees to clarify the instructions and to encourage them to submit their output packages. Because some grants had ended several years before our review (2007 and 2008 for the Model Systems grants), some grantees had difficulty submitting materials because the PIs had changed departments or institutions or had other competing priorities during the period of our review. Staff accommodated these PIs by providing additional time for submitting their materials and, in five cases, by assisting them in completing the questionnaires through telephone interviews. Two grantees did not provide the supplemental questionnaires.
As described above, grantees were sent questionnaires on which they were asked to list each project under their grant and nominate two outputs per project to be reviewed by the committee. They were asked to identify the top two outputs per project that reflected their grant's best achievements. To permit assessment of outputs beyond journal publications, grantees were asked to offer at least one nonjournal output per project, if such outputs were available.

The number of projects per grant varied by grant size, from 1 for small field-initiated grants to 10 for larger center grants. Therefore, the number of outputs nominated for review per grant ranged from 2 to 20; the average number of outputs per grant was 5. A total of 156 outputs were submitted for review across the 30 grants selected. Eight outputs were considered highly related to other outputs, and they were reviewed together with those outputs. This occurred when one output was a derivative or different expression of another output and when the PI responses to the criteria questions were essentially the same. Therefore, the number of outputs for analysis was 148.

Table 2 presents the number of grants included in the sample by program mechanism and the types of outputs that were reviewed. To put the outputs reviewed into the larger context of the outputs produced by grantees in the sampling pool of 111 grants, Table 2 also shows that the proportions of publications and other outputs (tools, technology, and information products) reviewed by the committee were relatively close to the proportions of the various output types produced by grantees in the larger sampling pool. The proportion of publications reviewed was somewhat lower, at 70 percent (compared with 76 percent in the sampling pool), and the proportion of information products reviewed was somewhat higher, at 18 percent (compared with 11 percent in the sampling pool). The mean number of outputs per grant in the sample is much lower (mean = 5) than in the sampling pool (mean = 13) because the sampled grants submitted only their top two outputs per project (as described above).
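The percentages and means quoted above follow directly from the counts reported in Table 2; as a quick arithmetic check (counts taken from the text, rounded to whole percents):

```python
# Output counts for the 30 sampled grants and for the 111-grant pool.
sample_outputs = {"publications": 103, "tools": 9, "technology": 9, "information": 27}
pool_outputs = {"publications": 1060, "tools": 101, "technology": 84, "information": 148}

total_sample = sum(sample_outputs.values())   # 148 outputs reviewed
total_pool = sum(pool_outputs.values())       # 1,393 outputs in the pool

# Shares of publications and information products, as whole percents.
pub_share = round(100 * sample_outputs["publications"] / total_sample)   # 70
pub_share_pool = round(100 * pool_outputs["publications"] / total_pool)  # 76
info_share = round(100 * sample_outputs["information"] / total_sample)   # 18
info_share_pool = round(100 * pool_outputs["information"] / total_pool)  # 11

# Mean outputs per grant in the sample versus the pool.
mean_sample = total_sample / 30    # about 4.9, reported as 5
mean_pool = total_pool / 111       # about 12.5, reported as 13
```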

TABLE 2 Number of Grants and Distribution of Outputs Reviewed, by Program Mechanism

NIDRR Grant Category and                         Publica-          Tech-     Infor-
Program/Funding Mechanism               Grants   tions      Tools  nology    mation    Total
Model Systems Grants
 Burn Injury (MS-Burn)                     2       12         2       0         4      18 (12%)
 Traumatic Brain Injury (MS-TBI)           2       12         0       0         2      14 (10%)
 Spinal Cord Injury (MS-SCI)               2       11         0       0         0      11 (7%)
Center Grants
 Rehabilitation Research and
   Training Centers (RRTC)                 3       16         0       0        12      28 (19%)
 Rehabilitation Engineering and
   Research Centers (RERC)                 2       16         2       5         3      26 (18%)
Research and Development Grants
 Disability and Rehabilitation
   Research Projects (DRRP)                4       13         4       0         5      22 (15%)
 Field Initiated Projects (FIP)           10       17         1       3         1      22 (15%)
 Small Business Innovation Research,
   Phase II (SBIR)                         2        1         0       1         0       2 (1%)
Training Grants
 Switzer Fellowships                       3        5         0       0         0       5 (3%)

Total and proportion of output
  types in sample                         30   103 (70%)   9 (6%)  9 (6%)   27 (18%)  148
Total and proportion of output
  types in sampling pool                 111  1,060 (76%) 101 (7%) 84 (6%) 148 (11%)  1,393

SOURCE: Data summarized from questionnaires submitted to the committee by NIDRR grantees that participated in the evaluation (sample rows); and National Institute on Disability and Rehabilitation Research (September 2009). Annual Performance Report Data Set of Grants Ending in 2006 to 2010. Washington, DC: National Institute on Disability and Rehabilitation Research (sampling pool row).

The Review Process

The committee members, whose expertise covers the social sciences, rehabilitation medicine, engineering, evaluation, and knowledge translation, were divided into three subgroups of five members each. The subgroups were organized to ensure that outputs would be reviewed by a group of individuals with the collective expertise necessary to judge their quality. The subgroups met in October 2010, December 2010, and February 2011.
Because of the relatively short period of time in which to conduct the reviews, grants were scheduled for review according to size, with the smaller grants (e.g., FIPs, Switzer fellowships, SBIRs) being invited first and the larger grants (DRRPs, model systems grants, center grants) being invited to participate in the later rounds. The rationale for this scheduling was that the smaller grants had fewer outputs and would need less preparation time for the review than the larger grants, which had many projects and more outputs to prepare for the review. As a result of this approach, the content of the grants being reviewed in each round tended to be mixed and so required a corresponding mix of expertise in each subgroup. Efforts were nevertheless made to match the expertise of the reviewers in each subgroup with the outputs they would be reviewing (e.g., a technology output was assigned to a subgroup with engineering expertise). For a detailed description of the review procedures, see Box 2.

BOX 2
Committee Review Procedures

Each of the 30 grants was assigned to one of the three committee subgroups, so that all outputs from a grant were reviewed by the same subgroup. To ensure consistency in approach across subgroups, the committee chair attended all subgroup meetings.

Based on direct review of the output itself and supplemental information about the output provided in the APRs, final reports, and questionnaire responses from grantees, each subgroup member independently rated every output assigned to that subgroup, assigning a score for each of the four quality criteria (technical quality, advancement of knowledge or the field, likely or demonstrated impact, and dissemination), as well as an overall score for the output and a rationale for the overall score. Scores were assigned using a 7-point scale ranging from 1 to 7 and anchored at 3 points: 1 = poor quality, 4 = good quality, and 7 = excellent quality.

For each output, one subgroup member was assigned as the primary reviewer; the remaining four subgroup members were secondary reviewers. The subgroups used the following process for arriving at consensus scores:

• The primary reviewer opened discussion of each output by presenting a brief summary of the output and his or her rationale for rating each relevant criterion plus the overall score.
• The secondary reviewers then presented their ratings for each output and a brief rationale.
• The subgroup then developed consensus group ratings for each output through discussion facilitated by the subgroup chair.

Following the discussion of all outputs from an individual grant, the subgroup considered the full spectrum of the reviewed material, along with the grant's overall purpose and objectives (using the grant's APR), and assigned an overall performance rating for the grant using the same 7-point scale.
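The independent-rating step in Box 2 can be sketched as a small data model. The names and structures here are hypothetical, and in particular the committee reached consensus through facilitated discussion, so the median below is only a plausible seed for that discussion, not the committee's formula:

```python
from dataclasses import dataclass, field
from statistics import median

# The four quality criteria and the anchored 7-point scale from Box 2.
CRITERIA = ("technical_quality", "advancement", "impact", "dissemination")
ANCHORS = {1: "poor", 4: "good", 7: "excellent"}

@dataclass
class Rating:
    """One subgroup member's independent scores for one output."""
    reviewer: str
    scores: dict            # criterion -> score on the 1-7 scale
    overall: int            # overall score for the output, 1-7
    rationale: str = ""

    def __post_init__(self):
        assert set(self.scores) == set(CRITERIA), "score every criterion"
        assert all(1 <= s <= 7 for s in self.scores.values())
        assert 1 <= self.overall <= 7

def discussion_seed(ratings):
    """Per-criterion median of the independent ratings -- a starting point
    for the subgroup's consensus discussion, not the consensus itself."""
    return {c: median(r.scores[c] for r in ratings) for c in CRITERIA}
```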
The committee's expert review involved a qualitative consideration and assessment of the multiple quality dimensions of the outputs, a process that has been recommended as a valid method for evaluating the relevance and quality of federal research programs (Committee on Science, Engineering, and Public Policy, 1999). The 7-point rating scale was used in order to describe the results of the output assessment more precisely in terms of varying levels of quality. During the reviews, the committee members frequently discussed how they were applying the criteria and interpreting the anchors of the rating scale so they could calibrate their ratings. In addition, brief narrative statements were written that summarized the rationale for the subgroups' ratings of each output. These statements were reviewed after the ratings were completed to identify attributes that particularly characterized the varying levels of quality; they were helpful in further exemplifying the dimensions of the criteria.

Although the final scores used to report results of the output assessment were based on the consensus scores, the committee conducted an interrater reliability analysis of members' initial independent ratings (i.e., raw scores recorded before discussion) to determine the degree to which individual committee members were using and interpreting the scale in the same way. The interrater reliability analysis was conducted using methods suggested by MacLennan (1993) for more than two raters with ordinal data. This method calculates an intraclass correlation coefficient (ICC) that represents an average correlation among raters. The interrater reliability analyses were run on 15 grants that had at least 3 outputs reviewed by the subgroups. The ratings compared were the individual committee members' raw scores (before discussion) on the technical quality criterion only. The ICCs ranged between .64 and .98 and were statistically significant at p < .05.
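As an illustration of what "an average correlation among raters" measures, the sketch below averages pairwise Pearson correlations over raters' raw technical-quality scores. This is a simplification for intuition only; it is not the MacLennan (1993) ICC procedure the committee actually used.

```python
from itertools import combinations
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length score vectors."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))

def average_interrater_correlation(scores_by_rater):
    """Mean pairwise correlation over raters' raw scores.

    scores_by_rater maps rater -> scores, one per output of a grant, in
    the same output order for every rater (at least 3 outputs, mirroring
    the grants analyzed). Assumes each rater's scores vary across
    outputs, since a constant vector has no defined correlation."""
    pairs = combinations(scores_by_rater.values(), 2)
    return mean(pearson(x, y) for x, y in pairs)
```

Perfect agreement across raters yields 1.0; systematic disagreement pulls the average toward 0 or below, which is the behavior the reported .64-.98 range summarizes.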
According to Yaffee (1998), the minimum acceptable ICC is .75 to .80. Of the 15 grants, 13 had ICCs greater than .75. The ICC results suggest that individual members were using and interpreting the 7-point scale in a similar manner prior to the full subgroup's discussions of the output ratings and their subsequent determination of consensus scores.

ASSESSMENT OF THE COMMITTEE'S REVIEW METHODS

The committee developed and implemented an evaluation process for assessing the outputs of NIDRR's grantees and was able to identify varying levels of quality, as well as some of the output characteristics associated with those levels. Considerable time was spent selecting and refining the criteria used to assess quality. Although there was some variation in the independent scoring among subgroup members, it was rarely extreme, particularly after the group discussions. And although specific content-area expertise could not be ensured for every output, given the diversity and breadth of the outputs reviewed, the committee concludes that, collectively, the subgroups were able to adequately assess all the outputs.

The committee endeavored to assess its evaluation methods throughout the study process. Members engaged in continuous reflection on, and recording of, strengths and weaknesses during the rating process conducted in subgroup meetings. To facilitate this effort, the committee chair participated in all subgroup meetings to ensure that members understood how each subgroup was applying the rating methods. In addition, conference calls with the full committee were held after each set of subgroup meetings to discuss the evaluation process and refine the methods. Lastly, during its final meeting, the committee devoted a half-day session to discussing the strengths and weaknesses of the process and developing conclusions and recommendations for future evaluations. This discussion was based on the continuous reflections of committee members, along with findings from an informal, anonymous poll of committee members about the review process.
In the poll, each committee member was asked to rate his or her level of confidence in 16 aspects of the review process and 8 topics related to its replication. For each of these aspects, members assigned a confidence rating on a 5-point scale in which 1 indicated "no confidence at all" and 5 indicated "extreme confidence." The poll was intended to provide an indicator of each committee member's assessment of the output rating process. Poll results confirmed that individual members were generally confident in the review process and the potential replication of the process, with confidence ratings above the midpoint for all but one of the review process aspects and all but one of the replication topics. Aspects of the review process in which the committee had the greatest confidence (with scores above 4 on the rating scale) were:

- the technical quality score,
- the face validity of the consensus scores that were produced for outputs,
- the ability of the committee to evaluate outputs without having consumers on subgroups, and
- the appropriateness of a 7-point quality rating scale.

These results were consistent with the committee's overall impressions of the strengths and weaknesses of the evaluation process over the course of its work. With regard to the poll item on the ability of the committee to evaluate outputs without having consumers on subgroups, the committee notes that its confidence rating on this item is not meant to suggest that the input of individuals with disabilities is not a necessary part of the process. The committee included two subject-matter experts who are also individuals with disabilities, and the point above reflects committee members' view that the subgroups, although they did not include consumers without relevant scientific expertise, assigned appropriate scores to the outputs.

The poll also confirmed committee impressions regarding the challenge of rating outputs other than peer-reviewed journal articles; this was the one aspect of the review process receiving an average confidence rating below the midpoint of the scale. The results of the poll related to replication of the review process largely mirrored the results related to the review process itself. Committee members expressed the greatest confidence in the ability to match appropriate reviewer expertise with the outputs to be reviewed and the ability to secure appropriately knowledgeable reviewers. The only issue that received an average confidence rating below the midpoint was the ability to assess the overall quality of grants by reviewing selected outputs.

Overall, members' reflections on the summative evaluation process suggest that it worked well and achieved what it was designed to do. However, the committee encountered several challenges and limitations during the course of its work that limit the generalizability of the findings from this evaluation and restrict what can be said about the totality of outputs generated by all NIDRR grantees. In the next section, within the context of recommendations for future evaluations, we discuss these limitations and issues.

RECOMMENDATIONS FOR FUTURE EVALUATIONS

The committee offers conclusions, recommendations, and suggestions on defining evaluation objectives, strengthening the output assessment, and using NIDRR's APR system to capture data for future evaluations. The goal of our recommendations and suggestions is to improve future evaluation efforts and to ensure that evaluation results optimally inform NIDRR's efforts to maximize the impact of its research grants.

Defining Future Evaluation Objectives

The primary focus of the committee's summative evaluation was to assess the quality of research and development outputs produced by grantees.
This evaluation did not allow for an in-depth examination or comparison of the larger context of the funding programs, grants, or projects in which the outputs were produced. Although capacity building is a major thrust of NIDRR's center and training grants, assessment of training outputs, such as the number of trainees moving into research positions, was not part of our charge.

NIDRR's grant mechanisms, or programs, vary substantially in both size and duration (see Table 1, above), with grant amounts varying from less than $50,000 (Field Initiated Projects) to more than $4 million (Center Grants), and grant durations varying from less than 1 year to more than 5 years. Programs also differ in their objectives, so the expectations of the grantees under different programs vary widely. For example, a Switzer training grant is designed to increase the number of qualified researchers active in the field of disability and rehabilitation research. In contrast, Center Grants and Model Systems have multiple objectives that include research, technical assistance, training, and dissemination. Model Systems have the added expectation of contributing patient-level data to a pooled set of data on the targeted condition (i.e., burn injury, traumatic brain injury [TBI], or spinal cord injury [SCI]).

The number of grants to be reviewed was set at 30 by the committee's charge; this represented about one-quarter of the pool of 111 grants from which the sample was drawn, with the requirement that the sample reflect grants across NIDRR's program mechanisms. Even though five program mechanisms were not included in the sampling pool, the number of grants reviewed for any of the remaining nine program mechanisms was very small. (The largest number of grants reviewed for any single program mechanism was 10, for Field Initiated Projects.) Since the number of grants reviewed for any given program was small, the committee did not attempt to compare the type or quality of outputs by program mechanism.

The committee was directed by NIDRR to review two outputs for each of the projects identified by a given grantee. Therefore, a grantee with a single project had two outputs reviewed, a grantee with three projects had six outputs reviewed, and so on. Although larger grants with more projects also had more outputs reviewed, the design considered neither grant size nor duration. The design also did not take into consideration the relative importance of a given project within a grant.

The committee was also asked to produce an overall grant rating based on the outputs reviewed and the information available about the grants from the APRs. Results at the grant level are subject to more limitations than those regarding outputs, due to the general lack of information about how the outputs did or did not interrelate; whether, and if so how, grant objectives were accomplished; and the relative priority placed on the various outputs. In addition, for larger, more complex grants, such as Center Grants, a number of grant expectations, such as capacity building, dissemination, outreach, technical assistance, and training, are unlikely to be adequately reflected in the approach used, which focused exclusively on specific outputs. The relationship of outputs to grants is more complex than this approach allowed.

Recommendation 1: NIDRR should determine whether assessment of the quality of outputs should be the sole evaluation objective. Considering other evaluation objectives might offer NIDRR further opportunities to continuously assess and improve its performance and achieve its mission.
Alternative designs would be needed to evaluate the quality of grants or to allow comparison across program mechanisms. For example, if one goal of an evaluation is to assess the larger outcomes of grants (i.e., the overall impact of a grant's full set of activities), then, in addition to the methods used in the current output assessment, the evaluation would need to include interviews with grantees about their original grant objectives, in order to learn how the grant was implemented, what changes may have occurred in the projected pathway, how the various projects were tied to the overall grant objectives, and how the outputs demonstrated achievement of the grant and project objectives. This approach would also involve conducting bibliometric or other analyses of all publications and examining documentation of the grant's activities and its self-assessments, including cumulative APRs over time. Focusing at the grant level would provide evidence of movement along the research and development pathway (e.g., from theory to measures, from prototype testing to market), as well as allowing for assessment of other aspects of the grant, such as training and technical assistance and the possible synergies of multiple projects within one grant.

If the goal of an evaluation is to assess and compare the impact of program mechanisms, different methods may be needed, depending on the expectations for each program mechanism. They would need to include not only those mentioned above, but also stakeholder surveys to learn about the specific ways that individual grants affect their intended audiences. And in order to allow for generalization and comparison across program funding mechanisms, larger grant sample sizes would be needed. An alternative would be to increase the grant sample size in a narrower area by focusing assessments on grants for specific research areas across different program mechanisms or on grants with shared objectives (e.g., product development, knowledge translation, capacity building).

NIDRR's questions will necessarily drive future evaluations, but other levels of analysis that NIDRR might focus on could include the portfolio level (e.g., Model System grants, research and development grants, or training grants), which NIDRR has used in the past; the program priority level (i.e., grants funded under certain NIDRR funding priorities), to answer questions regarding the quality and impact of NIDRR's priority setting; and institute-level questions, to evaluate the net impact of NIDRR grants or to test assumptions embedded in NIDRR's logic model. For example, NIDRR's intermediate outcome arena targets adoption and use of new knowledge leading to changes and improvements in policy, practice, behavior, and system capacity (see Federal Register, February 15, 2006, pp. 8,173–8,175).

The number of outputs reviewed should depend on the unit of analysis. At the grant level, it might be advisable to assess all outputs to examine their development, how they relate to one another, and their impacts. A case study methodology could be used for subsets of outputs that are related. If NIDRR aims its evaluation at the program funding mechanism or portfolio level, sampling grants and assessing all of their outputs would be the preferred method. For output-level evaluation, having grantees self-nominate their best outputs, as was done in the present evaluation, is a good approach. Although assessing grantee outputs is of great value, it is the committee's view that the most meaningful results would come from assessing outputs in the context of a more comprehensive grant-level evaluation.
More time and resources would be required to trace a grant's progress over time in accomplishing its objectives, to understand the evolutionary development that might have altered its original objectives, and to examine the specific projects that produced the various outputs. However, more closely examining the inputs and processes of grant implementation that produced the outputs would yield broader implications for the value of grants, their impact, and future directions for NIDRR.

Strengthening Future Output Assessments

The committee was able to create a reasonably reliable system for evaluating the outputs of NIDRR grantees, based on criteria used in assessing federal research programs both in the United States and in other countries. With refinements, it could be applied to evaluate future outputs even more effectively. In implementing the output-level assessment, particular challenges and issues arose in relation to the diversity of outputs, the timing of evaluations, sources of information, and reviewer expertise.

Diversity of Outputs

The quality rating system used in the summative evaluation worked very well for publications, which made up 70 percent of the outputs reviewed. Using the four criteria developed by the committee, the reviewers were able to identify varying levels of quality and the characteristics associated with each. However, the quality criteria were not as easily applied to more diverse outputs, such as websites, conferences, and interventions. These outputs require more individualized criteria for assessing specialized technical elements and sometimes more in-depth evaluation methods. Applying one set of criteria, however broad and flexible, could not guarantee sufficient and appropriate applicability to every type of output.

Timing of Evaluations

The timing of an assessment of outputs depends on the goal of the assessment. Assessing technical quality can be done immediately, but assessing the impact of outputs requires time between the release of an output and its eventual impact. Evaluation of outputs during the final year of an award may not allow sufficient time for them to have full impact. For example, some publications will be forthcoming, and others will not have had sufficient time to have an impact. The tradeoff of waiting a year or more after the end of a grant is the likelihood that staff involved with the original grant may not be available, recollection of grant activities may be compromised, and engagement or interest in demonstrating results may be reduced. However, publications can be tracked regardless of access to the grantee. Outputs other than publications, such as technology products, could be assessed in an interim evaluation.

Sources of Information

Committee members were provided with structured briefing books containing the outputs to be reviewed and supplemental information that members could draw on if additional information was needed to assign quality scores. The supplemental information included information submitted through the grantees' APRs and final reports, as well as information provided in a supplemental questionnaire developed by the committee (see Attachment C). The primary source of information used by committee members in assigning scores was direct review of the output itself. The supplemental information played a small role in assessing publications; for outputs such as newsletters and websites, it could provide needed context and additional evidence helpful in determining quality scores. However, it is important to note that the supplemental information consisted of grantees' self-reports, which may be susceptible to social desirability bias.
Therefore, committee members were cautious about the degree to which this information could serve as the basis for assigning higher output scores. Moreover, the APR was designed for grant monitoring and performance reporting rather than as a source of information for a program evaluation. As a supplemental source, the information supplied in the APRs and the questionnaire was not always sufficient to inform the quality ratings. For example, the technical quality of a measurement instrument was difficult to assess if there was insufficient information about its conceptual base or its development and testing. For conferences, workshops, and websites, it would have been preferable for the grantee to identify the intended audience so that the committee might better assess whether the described dissemination activities were successful in reaching it. For the output categories of tools, technology, and informational products, grantees sometimes provided a publication that did not necessarily describe the output. In addition, some outputs were difficult to assess when no corroborating evidence was provided to support grantees' claims about technical quality, advancement of the field, impact, or dissemination efforts.

The committee did not use standardized reporting guidelines, such as CONSORT (Schulz et al., 2010) or PRISMA (Moher et al., 2009), which journals use in their peer review processes for selecting manuscripts for publication. The committee members generally assumed that peer-reviewed publications warranted a minimum score of 4 for technical quality, which could be changed after the committee's discussion. In some cases, the final committee scores for technical quality for peer-reviewed publications were above 4; in other cases, the final scores were below 4. If reporting guidelines had been used in the review of research publications, it is possible that the ratings would have changed.

Reviewer Expertise

The committee was directed to assess the quality of four specified types of outputs. Although the most common output type was publications, NIDRR grants produce a range of other complex, varied outputs, including tools and measures, technology devices and standards, and informational products. These outputs vary widely in their complexity and in the investment needed to produce them. For example, a newsletter is a more modest output than a new technology or device. To assess the quality of outputs, the committee members used criteria based on the cumulative literature reviewed and on their own research expertise in diverse areas of rehabilitation and disability research, medicine, and engineering, as well as expertise in evaluation, economics, knowledge translation, and policy. However, the combined expertise of the panel did not include every possible content area in the broad field of disability and rehabilitation research.

Recommendation 2: In any future evaluations of output quality, NIDRR should refine the process developed by the committee to strengthen the design related to the diversity of outputs, the timing of evaluations, sources of information, and reviewer expertise.

Corresponding to the points above, these refinements include the following.

Diversity of Outputs

The dimensions of the quality criteria should be tailored and appropriately operationalized for different types of outputs, such as devices, tools, and informational products (including newsletters, conferences, and websites), and should be field tested with grants under multiple program mechanisms and refined as needed. For example, the technical quality criterion includes the dimension of accessibility and usability. The questionnaire asked grantees to provide evidence of these traits.
However, the dimensions should be better operationalized for different types of outputs. For tools, such as measurement instruments, the evidence to be provided should pertain to pilot testing and psychometrics. For informational products, such as websites, the evidence should include results of user testing, assessment of usability features, and compliance with Section 508 standards (regulations from the 1998 amendment to the Rehabilitation Act of 1973 requiring that federal agencies' electronic and information technology be accessible to people with disabilities). For technology devices, the evidence should document the results of research and development tests related to human factors, ergonomics, universal design, product reliability, and safety.

The quality criterion related to dissemination provides other clear examples of the need for further specification and operationalization of the dimensions. For example, the dissemination of technology devices should be assessed by examining progress toward commercialization; grantees' partnerships with relevant organizations, including consumers and manufacturers; and the delivery of information through multiple media types and sources tailored to intended audiences for optimal reach and accessibility.

Timing of Evaluations

The committee suggests that the timing of an output assessment should vary by output type. Publications would best be assessed at least 2 years after a grant ends. However, plans for publications and other dissemination, as well as the audience for scientific papers, could be included as an item in the final report. As discussed above, other outputs developed during the course of a grant should be evaluated on an interim basis to assess the development and evolution of products. Outputs that have the potential to affect practice or policy may require longer periods of time before impact materializes and can be measured, so they would also best be evaluated on an interim basis.

Sources of Information

A more proactive technical assistance approach is needed to ensure that grantees provide the data necessary to assess the specific dimensions of each quality criterion. As stated above, the information supplied in the APRs and the questionnaire was not always sufficient to inform the quality ratings. (See also the discussion of information requested in the grantee questionnaire, above, and the discussion of APRs, below.)

Reviewer Expertise

The committee suggests that future output evaluations consider including an accessible pool of experts in different technical areas who can be called on to review selected grants and outputs. In addition, it is essential that future review panels include scientists with disabilities. Consumers who are not scientists could also play a vital role as review panel members who can address the impact and dissemination criteria.

Using Annual Performance Reports for Evaluation

NIDRR's APR system has numerous strengths, but the committee identified some points that NIDRR should consider in building greater potential for use of these data in evaluations. The APR system (Research Triangle Institute, 2009) includes the grant abstract, funding information, descriptions of the research and development projects, and the outcome domains targeted by projects, as well as a range of variables for reporting on the four different types of grantee outputs (see Table 3).
The system is tailored to different program mechanisms as needed. All of the descriptive information listed above, plus the output-specific variables listed in Table 3, was used in the committee's work. The data were provided to the committee as electronic databases and in the form of individual grant reports. The APR data provided to the committee by NIDRR at the outset of our work were used to profile the grants for sampling and to list all of the grantees' projects and outputs. They facilitated asking the grantees to nominate outputs for our review, since we were able to generate comprehensive lists of all reported projects and outputs to make the task of output selection less burdensome for the grantees. If grantees had more recent outputs that they wished to nominate as their top two for the committee's review, they had the option to do so.

TABLE 3 Data Elements Related to Outputs That Are Covered in an APR(a)

Each variable is followed by the output types to which it applies (Publications, Tools, Technology, Information):

- Type of output (Publications, Tools, Technology, Information)
- Name and full citation (Publications, Tools, Technology, Information)
- Brief description of purpose (Tools, Technology, Information)
- Brief description of how output was validated or tested (Tools, Technology, Information)
- Whether publication was peer reviewed or not (Publications)
- Whether the research and related activity reported in the article took place during the current, immediate past, or previous (nonconsecutive) funding cycle (Publications)
- Whether publication was sent to NARIC for inclusion in REHABDATA (Publications)
- Whether publication was produced as a direct result of receiving funding for this grant (Publications)
- "Most important"(b) outputs that contributed the most to achieving the outcome-oriented goals for the award (Publications, Tools, Technology, Information)
- Outcome-oriented goal that corresponds to the most important outputs (advances knowledge; increases capacity for research, training, or knowledge translation; or facilitates change in policy, practice, or system capacity) (Publications, Tools, Technology, Information)
- NIDRR outcome arena that corresponds to the most important outputs (health and function; employment; participation and community living; cross-cutting) (Publications, Tools, Technology, Information)
- Whether output is described in a publication output, and which one (Tools, Technology, Information)
- Key findings or lessons learned (Publications)
- How output is contributing to the outcome-oriented goal by solving a problem, closing an identified gap, or benefiting the target population (Publications, Tools, Technology, Information)

(a) SOURCE: The NIDRR APR report format for Rehabilitation Research and Training Centers, used as an example.
(b) Defined for grantees by NIDRR as "those that contributed most to achieving the outcome-oriented goals for the award by advancing knowledge; increasing capacity for research, training, or knowledge translation; or facilitating changes in policy, practice, or system capacity."

NIDRR also provided grantees' narrative APRs from the last year of the grants, as well as their final reports. These narratives were very useful to the committee for compiling descriptions of the grants. However, the quality of the information contained in the narrative annual reports varied.(3) For example, grant abstracts were not uniform in the information they contained. Some stated their grant objectives; others omitted them and focused on summarizing their main grant activities.
The APRs of the grants reviewed were inconsistent in providing useful information for understanding how the outputs being reviewed fit in the context of the overall grant or its projects. In most cases the final reports did not provide a cumulative overview of the life cycle of the grants and outputs, which would have been helpful. The APR does collect information on changes in the course of grants, but it was not always easy to understand this information from viewing only the last year's APR or the final report.

NIDRR also provided the committee with special text reports that contained some of the narrative information in the APRs about outputs other than publications. These reports included such information as the purpose of the output, the NIDRR outcome domains targeted by the output, how the output was validated, and how the output contributes to achievement of the grantee's goals. These reports have the potential to supply contextual information for evaluations. However, the quality of the information varied across the text reports describing the tool, technology, and information outputs that the committee reviewed. Only half of the text reports contained substantive descriptive information.

(3) The APR is a large information technology system that is used for monitoring and tracking grantee progress and for reporting on NIDRR's performance measures under the Government Performance and Results Act (GPRA). The system was not designed to serve as the basis for grantee evaluations. A systematic evaluation of the APR was not part of our charge. Though the quality and level of detail included in the APRs varied, these narratives were useful in providing descriptive grant information.

Not all of the specific outputs reviewed by the committee were reported in the APRs. Some may have been reported in earlier reporting periods or may have been produced after the NIDRR grant ended.

Recommendation 3: NIDRR should consider revising its APR to better capture the information needed to routinely evaluate the quality and impacts of outputs, grants, and funding mechanisms. NIDRR might consider such efforts as consolidating existing data elements or adding new elements to capture the quality criteria and dimensions used in the committee's summative evaluation.

From a recent interview with senior executives at NIDRR, the committee learned that NIDRR takes pride in having stabilized its APR system in recent years, after prior periods of changing and improving it to make the data more usable for grantees, for grant monitoring by project officers, and for agency performance reporting. We were informed that NIDRR is currently in the process of adding a new "accomplishments" module to the APR that will focus on the external use and adoption of NIDRR-funded outputs. In this new module, NIDRR will consolidate some data elements that are already being collected and add new ones. For up to five outputs that have been used or adopted by persons or groups external to the grant during the reporting period, grantees will be asked to provide information on who adopted the output (in 16 categories, such as researchers, practitioners, and service providers); how the output is being used or adopted by the target audience; the source of the evidence; and if and how the output may be contributing to changes in policy, practice, system capacity, or other impact areas. These efforts under way to change the APR will address the quality criteria used in the committee's evaluation for assessing the advancement of knowledge or practice and the likely or demonstrated impact of outputs.
For the technical quality criterion, the current APR system collects data on whether articles were published in peer-reviewed journals. For the technical quality of outputs other than publications, we provide examples in the discussion of Recommendation 2 (above) of ways to operationalize the dimensions of accessibility and usability, such as providing evidence of testing the psychometrics of measurement instruments, assessing the usability features of informational products, and documenting the results of research and development tests of technology products that relate to human factors, ergonomics, universal design, product reliability, and safety. The APR system currently asks for information on how outputs were validated, but data elements related to such testing might be further specified. The APR system might also be modified to capture evidence on the quality criterion of dissemination of outputs through such data elements as the target audiences for dissemination activities, media types, the number of outputs disseminated, and the reach of dissemination, such as the number of hits on websites.

Recommendation 4: NIDRR should investigate ways to work with grantees to ensure the completeness and consistency of information provided in the APRs.

The committee fully appreciates the necessity of minimizing the data collection burden on grantees and acknowledges the challenges and feasibility issues related to modifying the APR system while at the same time providing continuity in the system. The committee suggests, however, that embedding evaluation data collection processes into existing processes will lead to greater efficiencies and reduce grantee burden, while enhancing NIDRR's ability to evaluate quality and impact. The committee acknowledges that the refinements suggested would have to be undertaken in the context of a larger assessment of the APR system, as part of NIDRR's ongoing initiatives to improve the system.

In sum, the committee was able to create a reasonably valid and reliable system for evaluating the outputs of NIDRR grantees. If future evaluations of output quality are conducted, the process developed by the committee should be implemented with refinements to strengthen the design and process. Although assessing grantee outputs is of great value, the committee thinks that even greater value would come from assessing outputs in the context of a more comprehensive grant-level evaluation, which could yield broader implications for the value of grants, their impact, and future directions for NIDRR.

The committee has appreciated the opportunity to work on this important endeavor, and we look forward to delivering our final report to you later this year.

Sincerely yours,

David H. Wegman, Chair
Committee on the External Evaluation of NIDRR and Its Grantees