Practice 4: Openness about Sources and Limitations of the Data Provided
A STATISTICAL AGENCY SHOULD BE OPEN about the strengths and limitations of its data, taking as much care to understand and explain how its statistics may fall short of accuracy as it does to produce accurate data. Metadata, or “data about data,” are a critical element of data dissemination and curation (see Practice 5). All data contain some uncertainty and error, which does not mean the data are wrong, but that they need to be used with understanding of their limitations.56
Openness requires that data releases from a statistical program include a full description of the purpose of the program; the methods and assumptions used for data collection, processing, and estimation; what is known and not known about the quality and relevance of the data; sufficient information for estimating variability and other errors in the data, when possible; appropriate analysis methods that take account of variability and other sources of error; and the results of research on the methods and data. Openness also means that a statistical agency should describe how decisions on methods and procedures were made for a data collection program and provide ready access to research results that entered into such decisions. Such transparency is essential for credibility with data users and trust of data providers.
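Openness calls for releases to include enough information for users to estimate variability. The Python sketch below shows what that information makes possible: a standard error and margin of error for an estimated proportion. It assumes simple random sampling, which major federal surveys generally do not use; their complex designs call for methods such as replicate weights. The estimate, sample size, and function names are hypothetical. The 90 percent level matches the one the Census Bureau uses for American Community Survey margins of error.

```python
import math

# z-value for a two-sided 90 percent confidence interval
Z_90 = 1.645

def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion.

    Assumption: simple random sampling, no finite population correction.
    Complex survey designs need design-based variance estimators instead.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)

def margin_of_error(p: float, n: int, z: float = Z_90) -> float:
    """Half-width of the confidence interval around the estimate."""
    return z * proportion_standard_error(p, n)

# Hypothetical survey result: 12 percent of a sample of 2,500 units
estimate, sample_size = 0.12, 2_500
moe = margin_of_error(estimate, sample_size)
print(f"{estimate:.1%} +/- {moe:.1%}")  # prints: 12.0% +/- 1.1%
```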
Openness about data limitations requires much more than providing estimates of sampling error for surveys or basic attributes for administrative records or other nonsurvey data sources. In addition to a discussion of aspects that statisticians characterize as nonsampling errors—such as coverage errors, nonresponse errors, measurement errors, and processing errors—it is valuable to have a description of the concepts used and how they relate to the major uses of the data. Descriptions of the shortcomings of and problems with the data should be provided in sufficient detail to permit a user to take them into account in analysis and interpretation. Descriptions of how the data relate to similar data collected by other agencies should also be provided, particularly when the estimates from two or more surveys or other data sources exhibit large differences that may have policy implications.
__________________
56 Manski (2015) points to a need for fuller measurement and communication of uncertainty in official statistics.
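When two data sources disagree, a useful first check is whether the gap is larger than sampling error alone could produce. The sketch below assumes two independent, approximately normal estimates with published standard errors; the figures and the function name are hypothetical. It computes a z statistic for the difference. A large value points toward nonsampling causes, such as differing concepts, coverage, or measurement, and that is exactly where the documentation described above becomes essential.

```python
import math

def difference_z(est_a: float, se_a: float, est_b: float, se_b: float) -> float:
    """z statistic for the difference between two estimates.

    Assumes the estimates are independent (e.g., from separate samples) and
    approximately normally distributed; any covariance is ignored.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    if se_diff == 0:
        raise ValueError("standard errors must not both be zero")
    return (est_a - est_b) / se_diff

# Hypothetical estimates of the same rate from two different surveys
z = difference_z(0.081, 0.002, 0.070, 0.003)
# 1.96 is the two-sided critical value at the 95 percent level
exceeds_sampling_error = abs(z) > 1.96
print(f"z = {z:.2f}; larger than sampling error alone explains: {exceeds_sampling_error}")
```

A value below the threshold does not show that the two sources measure the same concept. It only means sampling error cannot be ruled out as the explanation.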
On occasion, the objective of presenting the most accurate data possible may conflict with user needs for timely information. When concerns for timeliness prompt the release of preliminary estimates (as is done for some economic indicators), consideration should be given to the frequency of revisions and the mode of presentation from the point of view of the users as well as the issuers of the data. Agencies that release preliminary estimates need to educate the public about differences among preliminary, revised, and final estimates.
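Publishing summary statistics on past revisions is one way to help users understand how preliminary estimates relate to later ones. The Python sketch below computes two common measures for a hypothetical series. The mean revision signals systematic bias in early releases, and the mean absolute revision shows the typical size of a revision. The data and function name are illustrative only and do not represent any agency's published method.

```python
from statistics import mean

def revision_summary(preliminary, final):
    """Mean revision (bias) and mean absolute revision (size) between releases."""
    if len(preliminary) != len(final) or not preliminary:
        raise ValueError("need equal-length, non-empty series")
    revisions = [f - p for p, f in zip(preliminary, final)]
    return {
        "mean_revision": mean(revisions),
        "mean_absolute_revision": mean(abs(r) for r in revisions),
    }

# Hypothetical quarterly growth rates (percent): first release vs. final
prelim = [2.1, 1.4, 3.0, 0.8]
final = [2.4, 1.1, 3.3, 1.2]
summary = revision_summary(prelim, final)
# mean revision is about 0.175; mean absolute revision is about 0.325
print(summary)
```

A mean revision well away from zero suggests that early releases tend to err in one direction. That is worth explaining to users alongside the release schedule.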
To meet their responsibility to users for openness, some statistical agencies in the 1990s developed detailed “quality profiles” for major surveys, including the American Housing Survey (Chakrabarty and Torres, 1996); the Residential Energy Consumption Survey (Energy Information Administration, 1996); the Schools and Staffing Survey (Kalton et al., 2000); and the Survey of Income and Program Participation (U.S. Census Bureau, 1998). Previously the Federal Committee on Statistical Methodology (1978) developed a quality profile for employment as measured in the Current Population Survey. These profiles documented what was and was not known about errors in estimates as a help to experienced users and agency personnel (see Federal Committee on Statistical Methodology, 2001; National Research Council, 1993a, 2007b). As print publications, however, they quickly became outdated and were rarely updated given the burden on agency staff. Today, the Internet enables easier maintenance and updating of quality profile-type information (e.g., separate web pages for major types of error).57
__________________
57 The Census Bureau posts basic quality indicators for the American Community Survey for the nation and states; these include sample size, population coverage, household response rates, and item response rates. See https://www.census.gov/acs/www/methodology/sample_size_and_data_quality/ [April 2017]. The Bureau of Labor Statistics issued a prototype data quality report for the Consumer Expenditure Quarterly and Diary Surveys, which it characterized as the “first in a series of iterations towards a single reference source on a comprehensive set of CE data quality metrics that are timely and routinely updated.” The metrics provided in the prototype report refer to data for 2009–2013. Available: https://www.bls.gov/cex/ce_methodology.htm#dqreports0 [April 2017].
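Footnote 57 lists household response rates and item response rates among the basic quality indicators for the American Community Survey. The sketch below shows simplified, unweighted versions of both. The unit response rate is responding eligible cases divided by eligible cases. The item response rate is respondents who answered a given item divided by all respondents. The Case record, its fields, and the sample data are hypothetical. Official definitions, including the Census Bureau's for the ACS and the AAPOR Standard Definitions, may differ in details such as weighting and the treatment of cases of unknown eligibility.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Case:
    """One sampled unit (hypothetical record layout)."""
    eligible: bool                  # in scope, e.g., an occupied housing unit
    responded: bool                 # completed the interview
    income: Optional[float] = None  # example item; None means item not answered

def unit_response_rate(cases: Sequence[Case]) -> float:
    """Unweighted share of eligible cases that responded."""
    eligible = [c for c in cases if c.eligible]
    if not eligible:
        raise ValueError("no eligible cases")
    return sum(c.responded for c in eligible) / len(eligible)

def item_response_rate(cases: Sequence[Case], item: str) -> float:
    """Unweighted share of responding cases that answered `item`."""
    respondents = [c for c in cases if c.eligible and c.responded]
    if not respondents:
        raise ValueError("no respondents")
    answered = sum(getattr(c, item) is not None for c in respondents)
    return answered / len(respondents)

cases = [
    Case(eligible=True, responded=True, income=52_000.0),
    Case(eligible=True, responded=True),               # skipped the income item
    Case(eligible=True, responded=True, income=38_000.0),
    Case(eligible=True, responded=False),              # unit nonresponse
    Case(eligible=False, responded=False),             # out of scope
]
print(unit_response_rate(cases))            # 0.75
print(item_response_rate(cases, "income"))  # about 0.667
```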
Error (conversely, “accuracy”) is not the only dimension of quality of concern to statistical agencies and their data users. Building on a seminal paper by Brackstone (1999), many statistical agencies around the world have adopted “quality frameworks,” which are typologies of key attributes or dimensions to use in systematically measuring, improving, and documenting data quality. For example, the Eurostat (2000) framework includes relevance, accuracy, timeliness and punctuality, accessibility and clarity, comparability (across time and geography), and consistency (with other series). Biemer et al. (2014) decompose “accuracy” into sampling error (where applicable) and seven types of nonsampling error. Daas et al. (2012) and Federal Committee on Statistical Methodology (2013) address quality attributes for administrative records-based data series.
In the United States, the Information Quality Act of 2000 required all federal agencies to develop written guidelines for how they ensure the quality of the information they disseminate to the public. Using a framework developed by the Interagency Council on Statistical Policy, individual statistical agencies developed quality guidelines (see Practice 9 and Appendix A). However, the guidelines developed in response to the legislation are process oriented (e.g., indicating how to request correction of a datum); they are not quality frameworks as described above. Federal statistical agencies should consider adopting and implementing quality frameworks for their programs.
An important aspect of openness not addressed in quality frameworks concerns the treatment of mistakes that are discovered after data release. Openness means that an agency has an obligation to issue corrections publicly and in a timely manner. To announce corrections, the agency should use not only the dissemination avenues it used to release the original statistics but also additional vehicles, as appropriate, to alert the widest possible audience of current and future users to the corrections. Agencies should be proactive in seeking ways to alert known and likely users of the data about the nature of a problem and the corrective action that the agency is taking or that users should take.
Overall, agencies should treat the effort to provide information on the quality, limitations, and appropriate use of their data as an essential part of their mission. Such information should be made available in ways that are readily accessible to all known and potential users (see National Research Council, 1993a, 1997b, 2007b).