Practice 12: A Strong Internal and External Evaluation Program
STATISTICAL AGENCIES should have regular, ongoing evaluations of their major statistical programs and their overall portfolio of programs. Reviews of major data collection programs and their components should consider how to produce relevant, accurate, and timely data in the most cost-effective manner possible and whether there are ways to improve cost-effectiveness by combining data from multiple sources.87 Reviews of an agency’s portfolio should consider ways to reduce duplication, fill gaps, and adjust priorities so that the overall portfolio is as relevant as possible to the information needs of policy makers and the public. Such evaluations should include internal reviews by staff and external reviews by independent groups.
Statistical agencies that fully follow practices on using multiple data sources (Practice 3), openness (Practice 4), wide dissemination of data (Practice 5), commitment to quality and professional standards (Practice 9), and an active research program (Practice 10) will likely be in a good position to make continuous assessments of and improvements in the relevance, quality, and efficiency of their data collection systems. Yet even the best functioning agencies will benefit from an explicit program of internal and independent external evaluations to obtain fresh perspectives.
__________________
87 OMB issued a proposed addendum, Section 10: Performance Review, to Statistical Policy Directive No. 4 in October 2016 for public comment (see Appendix A). The comment period has closed, and comments are under review. The addendum would require statistical agencies and recognized statistical units to submit annual performance reviews for their key statistical products, focused on such aspects as accuracy of the data, completeness of documentation, and timeliness (see Appendix A). Statistical Policy Directive No. 3 (U.S. Office of Management and Budget, 1985) already requires statistical agencies to examine accuracy and timeliness of key economic indicators on a 3-year cycle.
EVALUATING QUALITY, RELEVANCE, EFFICIENCY
Evaluation of data quality for a continuing survey or any kind of data collection program begins with regular monitoring of quality indicators that are readily available to users. For surveys, such monitoring includes unit and item response rates, population coverage rates, and measures of sampling error. In addition, in-depth assessment of quality on a wide range of dimensions—including sampling and nonsampling errors across time and among population groups and geographic areas—needs to be undertaken on a periodic basis and the results made public (see Practices 4 and 9, and National Research Council, 2007b).
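As an illustration of the routine indicators named above, the following is a minimal sketch of how a unit response rate, an item response rate, and a sampling standard error might be computed. It assumes unweighted counts and simple random sampling, which most federal surveys do not use in practice; the function names and the sample records are hypothetical and are not drawn from any agency's system.

```python
import math
from statistics import stdev


def unit_response_rate(n_respondents, n_eligible):
    """Unweighted share of eligible sampled units that responded."""
    return n_respondents / n_eligible


def item_response_rate(records, item):
    """Share of responding units with a usable (non-missing) answer to one item."""
    answered = sum(1 for r in records if r.get(item) is not None)
    return answered / len(records)


def srs_standard_error(values, population_size=None):
    """Standard error of a sample mean, assuming simple random sampling.

    If population_size is given, a finite population correction is applied.
    """
    n = len(values)
    se = stdev(values) / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt(1 - n / population_size)
    return se


# Hypothetical data: 5 eligible sampled units, 4 of which responded.
respondents = [
    {"income": 40000, "hours": 40},
    {"income": None, "hours": 35},   # item nonresponse on income
    {"income": 52000, "hours": None},
    {"income": 61000, "hours": 45},
]
eligible_sample_size = 5

urr = unit_response_rate(len(respondents), eligible_sample_size)
irr = item_response_rate(respondents, "income")
reported = [r["income"] for r in respondents if r["income"] is not None]
se = srs_standard_error(reported)

print(f"unit response rate: {urr:.2f}")
print(f"item response rate (income): {irr:.2f}")
print(f"standard error of mean income: {se:.2f}")
```

A production system would typically use weighted rates and design-based variance estimation that reflects the actual sample design; the sketch is meant only to show the kind of quantities that regular monitoring tracks.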
Research on methods to improve data quality may cover such areas as alternative methods for imputing values for missing data, alternative question wordings to reduce respondent reporting errors (based on cognitive methods), and alternative sources of data and ways for combining them to enhance quality. Methods for such research may include the use of “methods panels” (small samples for which experiments are conducted by using alternative procedures and questionnaires), matching with administrative records, and simulations of sensitivity to alternative procedures. The goal of the research should be the development of better methods that are feasible and cost-effective.
In ongoing programs for which it is disruptive to implement improvements on a continuing basis, a common practice is to undertake major research and development activities at intervals of 5, 10, or more years. Agencies should ensure, however, that the intervals between major research and development activities do not become so long that data collection programs deteriorate in quality, relevance, and efficiency. Regular, well-designed program evaluations, with adequate budget support, are key to ensuring that data collection programs do not deteriorate. Having a set schedule for research and development efforts will enable data collection managers to ensure that the quality and usefulness of their data are maintained and will help prevent increasingly suboptimal procedures from becoming locked in over time.
In addition to quality, relevance of an agency’s data collection programs needs to be assessed. The question of relevance is whether the agency is “doing the right thing” in contrast to whether the agency is “doing things right.” Relevance should be assessed not only for particular programs or closely related sets of programs, but also for an agency’s complete portfolio in order to assist it in making the best choices among program priorities given the available resources.
Communicating closely with stakeholders and important user constituencies—through such means as regular meetings, workshops, conferences, and other activities—is important to ensuring relevance (see Practice 6). Including other federal statistical colleagues in this communication, both as users and as collaborators, can be valuable.
Statistical agencies commonly find it difficult to discontinue or scale back a particular data series, even when it has largely outlived its usefulness relative to other series, because of objections by users who have become accustomed to it. In the face of limited resources, however, discontinuing a series is preferable to across-the-board cuts in all programs, which would reduce the accuracy and usefulness of both the more relevant and less relevant data series. Regular internal and external reviews and a documented priority-setting process or framework can help an agency not only reassess its priorities, but also develop the justification and support for changes to its portfolio.
Finally, statistical agencies should review their programs for efficiency and cost-effectiveness.88 Federal statistics as a public good represent a legitimate call on public resources, and statistical agencies in turn are properly called on to analyze the costs of their programs on a continuing basis to ensure the best return possible on tax dollars. For this purpose, statistical agencies should develop complete, informative models for evaluating costs of current procedures and possible alternatives and follow best practice for design of statistical production processes. One excellent guide to best practices is the Generic Statistical Business Process Model of the United Nations Economic Commission for Europe. First developed in 2008 and most recently updated in 2013 (version 5), this model is designed to enable statistical agencies to describe production processes in a coherent way, compare processes within and among organizations, and make better decisions on production systems and allocation of resources (UNECE High-Level Group for the Modernisation of Official Statistics, 2013).
__________________
88 “Efficiency” is generally defined as an ability to avoid waste (of materials, energy, money, time) in producing a specified output. “Cost-effectiveness” connotes a broader, comparative look at inputs and outputs to assess the most advantageous combination. (“Cost-benefit” analysis attempts to add monetary values to outputs.) In the context of federal statistical programs, cost-effectiveness analysis would assess the costs of conducting a program for different combinations of desired characteristics, such as improved accuracy or timeliness and reduced burden on respondents.
TYPES OF REVIEWS
Regular program reviews should include a mixture of internal and external evaluation. Agency staff should set goals and timetables for internal evaluations that involve staff who do not regularly work on the program under review. Independent external evaluations should also be conducted on a regular basis. Their frequency should depend on the importance of the data, how quickly the phenomena being measured change, and how quickly respondent behavior and data collection technology are changing in ways that may adversely affect a program.
As people and organizations become less willing to respond to surveys, it becomes increasingly urgent to conduct more frequent evaluations to determine whether alternative data sources could offer better quality than surveys. Agencies should seek outside reviews not only of specific programs, but also of program priorities and quality practices across their entire portfolio.
External reviews can take many forms. They may include recommendations from advisory committees that meet at regular intervals (typically, every 6 months). However, advisory committees should never be the sole source of outside review because the members of such committees rarely have the opportunity to become deeply familiar with agency programs. External reviews can also take the form of a “visiting committee,” following the model of the National Science Foundation (NSF),89 or a special committee or panel established by a relevant professional association or other recognized group.90
__________________
89 For links to evaluations of NSF programs, see https://www.nsf.gov/od/oia/activities/cov/ [April 2017].
90 Examples include an evaluation of the National Center for Education Statistics (NCES) by the National Institute of Statistical Sciences (2017) and numerous evaluations by the National Research Council (e.g., National Research Council, 2009a, which reviewed the statistical programs of the Bureau of Justice Statistics).