Read "Evaluation of the Sea Grant Program Review Process" at NAP.edu


Suggested Citation: "F A Multivariate Analysis of Potential Biases in the Final Evaluation Scores." National Research Council. 2006. Evaluation of the Sea Grant Program Review Process. Washington, DC: The National Academies Press. doi: 10.17226/11670.



Appendix F

A Multivariate Analysis of Potential Biases in the Final Evaluation Scores

Because bivariate relationships can be obscured if the data-generating processes are multivariate, the data were also examined using a multivariate regression approach. As in the bivariate statistical analyses, the multivariate model was designed to explore the statistical significance of potential sources of bias in the determination of National Sea Grant Office (NSGO) Final Evaluation Review (FE) scores. Thus the model did not include measures of program accomplishments and success. Instead it assumed that the Program Assessment Team (PAT) and FE scores provide accurate assessments of program quality according to the assessment criteria, but might be subject to random errors associated with differences between Cycle 1 and Cycle 2; the number of years that particular NSGO program officers are associated with particular Sea Grant programs; program seniority; the size of state and federal budget allocations awarded to programs; the within-cycle order of review of programs; and the number of years that particular program officers have served as program officers. The general linear model that was estimated can be represented by:

$$
FE_{ij} = f\left(\text{Cycle}_{j},\ \text{PO Continuity}_{ij},\ \text{Program Maturity}_{ij},\ \text{State Budget}_{ij},\ \text{Federal Budget}_{ij},\ \text{Order of Review}_{i},\ \text{PO Seniority}_{ij}\right)
$$

where Cycle_j is a binary variable used to differentiate between scores awarded in Cycle 1 and Cycle 2; PO Continuity is the number of years that a particular NSGO program officer is assigned to the ith individual Sea Grant program during the jth review cycle; Program Maturity is the number of years that elapsed between the initial chartering of the ith individual Sea Grant program and the jth review cycle; State Budget is the average state budget allocated to the ith individual Sea Grant program for 2000 through 2002 for observations from Cycle 1 and the 2003 budget for Cycle 2; Federal Budget is the average federal budget allocated to the ith individual Sea Grant program for 2000 through 2002 for observations from Cycle 1 and the 2003 budget for Cycle 2; Order of Review is a pair of binary variables used to differentiate individual Sea Grant programs reviewed in the first or second year of each cycle from those that were reviewed in the third or fourth year of that cycle; and PO Seniority is a set of binary variables used to differentiate between individual Sea Grant programs that were reviewed by program officers with 1 year or less, 2 to 3, 4 to 10, or more than 10 years of experience as program officers.

With observations from Cycle 1 and Cycle 2, there were 44 observations available to use in the analysis. The initial model coefficient estimates are:

Variable                        Coefficient   Standard Error   P-value
Intercept                          2.723         0.608          0.000
Cycle Dummy                        0.068         0.156          0.667
PO Continuity                     -0.087         0.038          0.029
Program Maturity                  -0.040         0.019          0.046
State Budget                       1.17E-07      2.32E-07       0.617
Federal Budget                     7.68E-08      1.11E-07       0.493
Prog Reviewed in Year 1            0.049         0.186          0.793
Prog Reviewed in Year 2            0.088         0.162          0.591
PO Experience 1 year or less      -0.277         0.411          0.505
PO Experience 2 to 3 years         0.093         0.212          0.664
PO Experience 4 to 10 years       -0.117         0.146          0.428

The structure of the model can be viewed as an attempt to explain variations in FE scores for the individual programs using information or proxy information for potential sources of bias that were suggested by the individual Sea Grant program directors.
Thus, if the model were to provide accurate predictions of the FE scores, there would be evidence to support the concerns of the individual Sea Grant program directors. The value of R2 (0.292) indicates that the estimated model accounts for 29.2 percent of the observed variation in FE scores. The F-statistic (1.359) is used to test whether the model estimates provide a statistically significant improvement over simply using the average of all FE scores as a predictor. The null hypothesis for the test is that the sum of squared deviations of the estimates is not significantly different from the sum of squared deviations about the mean. Because the p-value of the test (0.242) is greater than 0.05, the null hypothesis cannot be rejected.

Although the overall model performance does not lend credence to the hypothesized biases, it is instructive to look at the model coefficients. The coefficients are the partial derivatives of the model with respect to the explanatory variables. That is, the coefficients are the estimated changes in the value of the FE score for a marginal increase in the associated explanatory variable, holding the value of all other explanatory variables constant.

The coefficient associated with the Cycle dummy suggests that there has been an average increase of 0.068 points in the scores of programs in Cycle 2 relative to the scores of programs in Cycle 1. This increase could be due to across-the-board degradation in the programs or tougher grading, but the difference could also have resulted from pure chance. Indeed, the probability that a value of 0.068 could have been observed even if the truth were that there is no effect is 0.667; consequently, it can be concluded that the estimated difference is not significantly different from zero.

The PO Continuity variable is associated with a coefficient of -0.087. This suggests that for each additional year that a particular program officer spends working with a particular Sea Grant program, the average FE score falls by 0.087 points. This is consistent with public testimony that suggested that the scores would be lower for individual Sea Grant programs that enjoyed longer working relationships with their program officers. Consequently, the relevant null (no effect) hypothesis is that this coefficient is greater than or equal to zero.
Because the probability of observing an estimate of -0.087 if the true value of this coefficient were greater than or equal to zero is 0.014, the null hypothesis can be rejected. That is, there is statistical support for the assertion that individual Sea Grant programs with long-term relationships with their program officers scored lower than programs with less program officer continuity.

The coefficient associated with the Program Maturity variable (-0.040) suggests that for every additional year of age, program scores decline by 0.040 points. Because testimony suggested that there is an inverse relationship between program age and the FE score, the null hypothesis is that the estimated coefficient is greater than or equal to zero. Because the probability that we would observe an estimate of -0.040 if the true value of the coefficient were greater than or equal to zero is 0.023,[1] the null hypothesis can be rejected; there is statistical support for the assertion that mature programs are scored lower than newer programs.

[1] The p-value for a 1-tail test is one half the magnitude of the p-value for a 2-tail test; Excel's regression output defaults to a 2-tail p-value.

The coefficients associated with the magnitude of state and federal budgets allocated to the individual Sea Grant programs indicate that programs with larger budgets earn higher scores, but the effect is minuscule: a $1 increase in the individual program's state budget is associated with an increase of 1.17E-07 in the score, and a $1 increase in the individual program's federal budget is associated with an increase of 7.68E-08 in the score. That is, to increase the score by 1 point, the individual program's state budget would need to be increased by about $8.5 million or the individual program's federal budget would need to be increased by about $13 million. Moreover, the standard errors of the coefficient estimates are so large that neither coefficient is statistically distinguishable from zero (2-tail p-values of 0.617 for the state budget and 0.493 for the federal budget).

The effect of Order of Review is represented by two binary variables, so the influence of order of review must consider both coefficients together. The appropriate test is an F-test that compares the predictive ability of the model presented above and a model that differs from the above model by excluding the two binary variables used to represent the order of review. The p-value for this test is 0.93, so there is no evidence that the order of review has a statistically significant influence on the FE score.

The effect of PO Seniority is represented by three binary variables, each of which represents the average difference in scores awarded to programs with the most senior program officers relative to the scores awarded to programs with one of the three categories of less experienced program officers. The statistical significance of the influence of program officer seniority is tested with an F-test similar to the test applied for Order of Review.
The p-value for this test is 0.64, so there is no evidence that program officer seniority has a statistically significant influence on the FE score.

Because preliminary analysis failed to eliminate the possibility that PO Continuity or Program Maturity exercises a statistically significant influence on FE scores, the model was respecified using only those variables as explanations of the observed variation in final scores. The restricted model coefficient estimates are:

Variable            Coefficient   Standard Error   P-value
Intercept              2.504         0.530          2.74E-05
PO Continuity         -0.079         0.029          0.010
Program Maturity      -0.023         0.015          0.137

Although the value of R2 (0.184) for this simpler model is smaller than the R2 for the initial model (0.292), the difference in model performance is not statistically significant.[2]

[2] If the true difference in performance between the initial model and the restricted model were zero, the probability of observing a decrease in model fit this large with the elimination of 8 explanatory variables is 0.747.

In the restricted model, the coefficient (-0.079) associated with the PO Continuity variable suggests that for each additional year that a particular program officer spends working with a particular individual Sea Grant program, the average FE score falls (improves) by 0.079 points. Again, because public testimony suggested that the scores would be lower for programs that enjoyed longer working relationships with their program officers, the null (no effect) hypothesis is that this coefficient is greater than or equal to zero. Because the probability of observing an estimate of -0.079 if the true value of this coefficient were greater than or equal to zero is 0.005, the null hypothesis can be rejected. That is, there is again statistical support for the assertion that individual Sea Grant programs that have enjoyed long-term relationships with their program officers scored lower (better) than programs with less program officer continuity.

The coefficient associated with the Program Maturity variable (-0.023) suggests that for every additional year of age, program scores decline by 0.023 points. Because testimony suggested that there is an inverse relationship between program age and the FE score, the null hypothesis is that the estimated coefficient is greater than or equal to zero. However, because there is a 0.069 probability of observing an estimate of -0.023 even if the true value of the coefficient were greater than or equal to zero, the null hypothesis cannot be rejected. Thus there is insufficient statistical support for the assertion that mature programs are scored lower than newer programs.

The results of the restricted model suggest that the model could be further simplified without statistically significant loss of performance. The coefficient estimates for a simple linear regression model are:

Variable            Coefficient   Standard Error   P-value
Intercept              1.715         0.103          0.000
PO Continuity         -0.077         0.030          0.013

Although the value of R2 (0.138) for this model is again smaller than the R2 for the initial model (0.292), the difference in model performance is not statistically significant.[3]

In this model, the PO Continuity variable is associated with a coefficient of -0.077, suggesting that for each additional year that a particular program officer spends working with a particular individual Sea Grant program, the average FE score falls (improves) by 0.077 points. Again, because public testimony suggested that the scores would be lower for Sea Grant Colleges and Institutes that enjoyed longer working relationships with their program officers, the null (no effect) hypothesis is that this coefficient is greater than or equal to zero. Because the probability of observing an estimate of -0.077 if the true value of this coefficient were greater than or equal to zero is only 0.007, the null hypothesis can be rejected. That is, there is again statistical support for the assertion that individual Sea Grant programs with long-term relationships with program officers are scored lower (better) than programs with less program officer continuity.

In summary, the results of the multivariate analysis are generally consistent with the results of the bivariate analyses and do not support the suggestions that the FE scores are biased as a result of program officer seniority, program funding levels, program maturity, order of review within a cycle, or differences between Cycle 1 and Cycle 2. However, there is persistent and statistically significant evidence that program officer continuity with the individual Sea Grant program is inversely related to the FE score. Indeed, there is less than a 0.007 probability of observing an estimate as large as |-0.077| if the true value of the coefficient were zero.
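
The model comparisons reported in this appendix (the overall F-statistic of 1.359 and the two nested-model comparisons described in the footnotes) can be approximately reconstructed from the published R2 values alone. The following is a minimal sketch, not the report's own calculation: it assumes 44 observations, 10 explanatory variables in the initial model (counted from the initial coefficient table), and the rounded R2 values quoted in the text. The function names are illustrative. Only the F-statistics are computed; converting them to p-values requires the upper tail of the F distribution, which the Python standard library does not provide.

```python
def overall_f(r2, n_obs, n_vars):
    """F-statistic for the null hypothesis that every slope coefficient is zero."""
    df_resid = n_obs - n_vars - 1  # residual degrees of freedom
    return (r2 / n_vars) / ((1.0 - r2) / df_resid)


def nested_f(r2_full, r2_restricted, n_obs, n_vars_full, n_dropped):
    """F-statistic for dropping n_dropped variables from the full model."""
    df_resid = n_obs - n_vars_full - 1
    gain_per_variable = (r2_full - r2_restricted) / n_dropped
    return gain_per_variable / ((1.0 - r2_full) / df_resid)


N_OBS = 44        # observations reported in the text
N_VARS_FULL = 10  # assumption: slope terms counted from the initial table
R2_FULL = 0.292   # rounded R2 of the initial model

f_overall = overall_f(R2_FULL, N_OBS, N_VARS_FULL)
f_drop_8 = nested_f(R2_FULL, 0.184, N_OBS, N_VARS_FULL, 8)  # two-variable model
f_drop_9 = nested_f(R2_FULL, 0.138, N_OBS, N_VARS_FULL, 9)  # one-variable model

print(f"overall F (10, 33 df):        {f_overall:.3f}")
print(f"drop 8 variables F (8, 33 df): {f_drop_8:.3f}")
print(f"drop 9 variables F (9, 33 df): {f_drop_9:.3f}")
```

With these rounded inputs the overall statistic comes out near 1.36, close to the reported 1.359, and both nested-model statistics fall below 1, which is in line with the large probabilities (0.747 and 0.622) given in the footnotes.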
[3] If the true difference in performance between the initial model and the restricted model were zero, the probability of observing a decrease in model fit this large with the elimination of 9 explanatory variables is 0.622.

The analysis suggests that knowing how long a program officer has been assigned to a state program carries information that is reflected in the FE scores, but the analysis does not identify whether the observed effect is a consequence of program officers representing the program during the PAT or FE, of program officers helping to mentor the individual Sea Grant programs, or of some other cause. While an effect of 0.077 points seems small, in 2004-05 the average difference between Category 1A and Category 1B was 0.13 points, roughly the predicted effect of a two-year difference in the length of time that a particular program officer is assigned to a particular individual Sea Grant program. The average difference between Category 1B and Category 1C is of a similar magnitude. Thus for two otherwise identical individual Sea Grant programs that deserve to be rated in Category 1A, one with a new program officer and one with a program officer who has been with an individual Sea Grant program for 4 years, the program with the new officer would be expected to score 0.307 points higher (worse), a difference large enough to move it from Category 1A to Category 1C.
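
The practical-magnitude argument above is simple arithmetic on the PO Continuity slope, and can be sketched as follows. This is an illustration under stated assumptions: it uses the rounded coefficient (-0.077) from the simple regression and the 0.13-point average gap between Category 1A and Category 1B quoted in the text. The names `SLOPE`, `CATEGORY_GAP`, and `predicted_shift` are hypothetical and do not come from the report.

```python
SLOPE = -0.077       # PO Continuity coefficient, simple regression (rounded)
CATEGORY_GAP = 0.13  # average Category 1A to 1B difference in 2004-05


def predicted_shift(extra_years, slope=SLOPE):
    """Predicted change in FE score for extra years with the same program officer."""
    return slope * extra_years


# Years of continuity whose predicted effect equals one category gap.
years_per_gap = CATEGORY_GAP / abs(SLOPE)

for years in (1, 2, 4):
    shift = predicted_shift(years)
    print(f"{years} year(s) of continuity: predicted shift {shift:+.3f} points")

print(f"years needed to span one category gap: {years_per_gap:.2f}")
```

With the rounded coefficient, four years of continuity corresponds to a shift of about 0.308 points; the 0.307 quoted in the text presumably reflects the unrounded estimate. A lower score is a better score on this scale, so the negative shift favors the program with the longer-serving officer.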
