DEFINITION OF A FEDERAL STATISTICAL AGENCY
A federal statistical agency is a unit of the federal government whose principal function is the compilation and analysis of data and the dissemination of information for statistical purposes.
A statistical agency may be labeled a bureau, center, division, office, or similar title, so long as it is recognized as a distinct entity. Statistical agencies have been established to serve several purposes, including:
• to develop new information for an area of public concern (e.g., the Bureau of Labor Statistics [BLS] and the National Center for Health Statistics);
• to conduct large statistical collection and dissemination operations specified by law (e.g., the U.S. Census Bureau);
• to compile and analyze statistics from sets of administrative records for policy purposes and public use (e.g., the Statistics of Income Division in the Internal Revenue Service [IRS]); and
• to develop broad and consistent estimates from a variety of statistical and administrative sources under a specified conceptual framework (e.g., the Bureau of Economic Analysis [BEA] in the U.S. Department of Commerce, which produces the U.S. National Income and Product Accounts).
Once established, many statistical agencies engage in all of these functions to varying degrees.
This definition of a federal statistical agency does not include many statistical activities of the federal government because they are not performed by distinct units or because they do not result in the dissemination of statistics to others. Such activities include statistics compiled by the U.S. Postal Service to set rates or statistics developed by the U.S. Department of Defense in the testing of weapon systems (see National Research Council, 1998b, 2003b, and 2012d). Nor does it include agencies whose primary functions are the conduct or support of problem-oriented research, although their research may be based on information gathered by statistical means, and they may also sponsor important surveys: examples include the National Institutes of Health, the Agency for Healthcare Research and Quality, and other agencies in the U.S. Department of Health and Human Services.
This definition of a statistical agency also does not usually include agencies whose primary function is policy analysis and planning (e.g., the Office of Tax Analysis in the U.S. Department of the Treasury, the Office of the Assistant Secretary for Planning and Evaluation in the U.S. Department of Health and Human Services). Such agencies may collect and analyze statistical information, and statistical agencies, in turn, may perform some policy-related analysis (e.g., produce reports on trends in after-tax income or child care arrangements of families). However, to maintain credibility as an objective source of accurate, useful information, statistical agencies must be separate from units that are involved in developing policy and assessing policy alternatives.
Statistical agencies have as their primary purpose the dissemination of information that can be used for a wide range of statistical purposes but not for administrative, enforcement, or regulatory purposes that could affect an individual (person or business) data provider. Such data are usually collected under a pledge of confidentiality. Statistical agencies may collect information from government agencies in which individual reporting units are identified because the data are already public information—as, for example, in the Census Bureau’s program to collect financial and employment information for state and local governments (see National Research Council, 2007a) and the program of the National Center for Science and Engineering Statistics to collect information on research and development spending from federal agencies (see National Research Council, 2010c).
Occasionally, statistical agencies are charged to collect information that is made available for both statistical and nonstatistical purposes. For example, the Bureau of Transportation Statistics (BTS) maintains the Airline On-Time Statistics Program (originated by the former Civil Aeronautics Board), which identifies individual airlines.1 However, BTS does not itself use the data for administrative or regulatory purposes—those functions are carried out by the Federal Aviation Administration—and the data are not collected under a pledge of confidentiality. As another example, higher education institutions that participate in federal student aid programs are required by law (20 USC 1094(a)(17)) to respond to surveys conducted by the National Center for Education Statistics (NCES). The data collected on enrollments, graduation rates, faculty and staff, finances, institutional prices, and student financial aid feed into the Integrated Postsecondary Education Data System (IPEDS). The data are not collected under a pledge of confidentiality, and NCES makes information on individual institutions available to parents and students to help them in choosing a college, as well as to researchers and others.2 NCES also collaborates with institutions through the National Postsecondary Education Cooperative, which works to improve the quality of reporting and dissemination of information to the public, identify modifications to definitions that are necessary to keep abreast of changes in the field, and address other aspects of the IPEDS program.
Statistical agencies should carefully consider the advantages and disadvantages of undertaking a program with both statistical and nonstatistical purposes. One potential advantage is that there may be improved consistency and quality when a statistical agency collects information for its own use and that of other parts of its department. One potential disadvantage is that the program may compromise the public perception of the agency as objective and separate from government administrative, regulatory, and enforcement functions.
When an agency decides to carry out a program that has both statistical and nonstatistical uses, it must take care to clearly describe that program on such dimensions as the extent of confidentiality protection, if any (for example, some but not all of the data may be collected under a pledge of confidentiality); the statutory basis for the program and the public purposes it serves, including benefits to respondents from having comparative information available of uniform quality; and the role of the agency (for example, providing information to the public, working with respondents to improve reporting). Should an agency decide that the nature of a program is such that no amount of description or explanation is likely to make it possible for the agency to maintain its credibility as a statistical agency, it should decline to carry out the activity.
1Available: http://www.bts.gov/xml/ontimesummarystatistics/src/index.xml [February 2013].
The work of federal statistical agencies is coordinated through the U.S. Office of Management and Budget (OMB) Statistical and Science Policy Office (SSP) and the Interagency Council on Statistical Policy (ICSP), which was created by OMB in the 1980s and authorized in statute in the 1995 reauthorization of the Paperwork Reduction Act (44 USC 3504(e)(8)). The ICSP is chaired by the chief statistician in OMB and currently includes representation from 14 agencies and units, which are housed in 9 cabinet departments and 3 independent agencies (see Appendix B):
• Bureau of Economic Analysis (U.S. Department of Commerce)
• Bureau of Justice Statistics (U.S. Department of Justice)
• Bureau of Labor Statistics (U.S. Department of Labor)
• Bureau of Transportation Statistics (U.S. Department of Transportation)
• Census Bureau (U.S. Department of Commerce)
• Economic Research Service (U.S. Department of Agriculture)
• Energy Information Administration (U.S. Department of Energy)
• National Agricultural Statistics Service (U.S. Department of Agriculture)
• National Center for Education Statistics (U.S. Department of Education)
• National Center for Health Statistics (U.S. Department of Health and Human Services)
• National Center for Science and Engineering Statistics (U.S. National Science Foundation [NSF])
• Office of Environmental Information (U.S. Environmental Protection Agency)
• Office of Research, Evaluation, and Statistics (Social Security Administration)
• Statistics of Income Division (U.S. Department of the Treasury)
In addition to these 14 agencies, OMB currently recognizes 110 other units and agencies that are not statistical agencies but that have annual budgets of $500,000 or more for statistical activities (U.S. Office of Management and Budget, 2012b:Table 1). The principles for federal statistical agencies presented here are relevant to these other agencies that carry out statistical activities, and many of the detailed practices are also pertinent. Similarly, the principles and practices may be relevant to statistical units in state and local government agencies, as well as to international statistical agencies.
ESTABLISHMENT OF A FEDERAL STATISTICAL AGENCY
One of the most important reasons for establishing a statistical agency is to provide information that will allow for an informed citizenry. A democracy depends on an informed electorate. A citizen has a right to information that comes from a trustworthy, credible source and that is relevant, accurate, and timely. Timely information of high quality is also critical to policy analysts and decision makers in both the public and private sectors. (For more information on the purposes of official statistics, see the Fundamental Principles of Official Statistics of the U.N. Statistical Commission in Appendix C; see also U.N. Economic Commission for Europe, 2003; U.N. Statistical Commission, 2003.) Federal statistical agencies serve the key functions of providing a broad array of information to the public and to policy makers and of ensuring the necessary quality and credibility of the data.
Commercial, nonprofit, and academic organizations in the private sector also provide useful statistical information, including data they collect themselves and data they acquire from government agencies and other data collectors to which they add other useful information or analysis. However, because the benefits of statistical information are shared widely throughout society and because it is often difficult to garner payments for these benefits, private entities are not likely to collect all of the data that are needed for public and private decision making or to make data as widely available as needed for important public purposes. Nor are they likely to have the capacity or interest to continually work to make data comparable across geographic areas and population groups and over time, or, in general, to continually work to provide the needed scope, scale, and quality of statistical information. Government statistical agencies are established to ensure that a broad range of relevant, accurate, timely, and credible information is publicly available. (See National Research Council, 1999b, 2005b, for a discussion of the governmental role in providing public goods, or near public goods, such as research and data.)
The U.S. government collected and published statistics long before any distinct federal statistical agency was formed (see Duncan and Shelton, 1978; Norwood, 1995). The U.S. Constitution mandated the conduct of a decennial census of population; the first such censuses (beginning in 1790) were conducted by U.S. marshals as one of their many duties. Legislation providing for the compilation of statistics on agriculture, education, and income was enacted by Congress in the 1860s. The Bureau of Labor (forerunner of the Bureau of Labor Statistics) was established by law in 1884 as a separate agency with a general mandate to respond to widespread public demand for information on the conditions of industrial workers. The Census Bureau was established as a permanent agency in 1902 to conduct the decennial census and related statistical activities.
Many federal statistical agencies that can trace their roots back to the 19th or early 20th century, such as the National Center for Education Statistics and the National Center for Health Statistics, were organized in their current form following World War II. Several relatively new agencies have since been established, including the Energy Information Administration, the Bureau of Justice Statistics, and the Bureau of Transportation Statistics.3
In every case, the agency itself, in consultation with users of its information, has major responsibility for determining its specific statistical programs and for setting priorities. Initially, many of these agencies also had responsibilities for certain policy analysis functions for their department heads. More recently, policy analysis has generally been located in separate units that are not themselves considered to be statistical agencies, a separation that helps establish and maintain the credibility of statistical agencies as providers of data and analyses that are not designed for particular policy alternatives. Nevertheless, an effective statistical agency has a role as a creative, not just reactive, actor in the development of data needed for policy analysis. Statistical agencies may also play additional roles, such as reviewer and consultant on statistical matters for other units in the same department (see, e.g., National Research Council, 1985a) and collector of data on a reimbursable basis for other agencies.
3Within the past decade, the Division of Science Resources Studies in the National Science Foundation became the Division of Science Resources Statistics in 2002 and, as provided by section 505 of the America COMPETES Reauthorization Act of 2010, became the National Center for Science and Engineering Statistics in 2011.
There is no set rule or guideline for when it is appropriate to establish a separate federal statistical agency, carry on statistical activities within the operating units of departments and independent agencies, or contract for statistical services from existing federal statistical agencies or other organizations. Establishment of a federal statistical agency should be considered when one or more of the following conditions prevail:4
• There is a need for information on an ongoing basis beyond the capacity of existing operating units, possibly involving other departments and agencies. Such needs may require coordinating data from various sources, initiating new data collection programs to fill gaps, or developing regularly updated time series of estimates.
• There is a need, as a matter of credibility, to ensure that major data series are independent of policy makers’ control.
• There is a need to establish the functional separation of data on individuals and organizations that are collected for statistical purposes from data on individuals and organizations that may be used for administrative, regulatory, or law enforcement uses. Such separation, recommended by the Privacy Protection Study Commission (1977), bolsters a culture and practice of respect for privacy and protection of confidentiality. Functional separation is easier to maintain when the data to be used for statistical purposes are compiled and controlled by a unit that is separate from operating units or department-wide data centers. The Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA) extended legal confidentiality protection to statistical data collections that may be carried out by any federal agency, whether a statistical agency or other type of agency (see Appendix A). Nonetheless, functional separation of statistical data from other kinds of data is important because it makes promises of confidentiality protection more credible.5
• There is a need to emphasize the principles and practices of an effective statistical agency, for example, professional practice, openness about the data provided, and wide dissemination of data.
• There is a need to encourage research and development of a broad range of statistics in a particular area of public interest or of government activity or responsibility.
• There is a need to consolidate compilation, analysis, and dissemination of statistics in one unit to encourage high-quality performance, eliminate duplication, and streamline operations.
4The National Research Council (2001b:Ch. 6) cited a number of these reasons in recommending to the U.S. Department of Health and Human Services that it establish or identify a statistical unit to be assigned responsibility and authority for carrying out statistical functions and data collection for social welfare programs and the populations they serve (although the recommendation was not adopted); see also National Research Council and Institute of Medicine (2004).
5Under the guidance issued for CIPSEA in 2007, OMB has recognized 4 new statistical units in addition to the 12 statistical agencies originally recognized (see Appendix A). These agencies and units are authorized to assign agent status to researchers and contractors, which permits sharing individually identifiable information with them for statistical purposes and holding them legally liable for protecting the confidentiality of the information.
PRINCIPLES FOR A FEDERAL STATISTICAL AGENCY
Principle 1: A federal statistical agency must be in a position to provide objective, accurate, and timely information that is relevant to issues of public policy.
A statistical agency supplies information not only for the use of managers and policy makers in the executive branch and for legislative designers and overseers in Congress, but also for everyone who requires objective statistical information on public issues, whether the information is needed for purposes of production, trade, consumption, or participation in civic affairs. Just as a free enterprise economic system depends on the availability of economic information to all participants, a democratic political system depends on—and has a fundamental duty to provide—wide access to information on education, health, transportation, the economy, the environment, criminal justice, and other social issues.
Federal statistical agencies are responsible for providing statistics on conditions in a variety of areas. The resulting information is used both inside and outside the government not only to delineate problems and to guide courses of action, but also to evaluate the results of government activity or lack of activity. The statistics provide much of the basis on which the government itself is judged. This role places a heavy responsibility on federal statistical agencies for impartiality and objectivity.
In order to provide information that is relevant to public issues, statistical agencies need to reach out to users of the data. While they are usually in touch with the primary users in their own departments, considerable energy and initiative are required to open avenues of communication more broadly to other current and potential users, including analysts and policy makers in other federal departments, state and local government agencies, academic researchers, private-sector businesses and other organizations, organized constituent groups, associations that represent data users, the media, and members of Congress and their staffs.6
One way to obtain the views of users outside an agency, as well as people with relevant technical expertise, is through advisory committees (see National Research Council, 1993a, 2007c). Many agencies obtain advice from committees that are chartered under the Federal Advisory Committee Act: examples include the Advisory Committee on Agriculture Statistics for the National Agricultural Statistics Service; the Board of Scientific Counselors for the National Center for Health Statistics; the Data Users Advisory Committee and the Technical Advisory Committee for the Bureau of Labor Statistics; and the Scientific Advisory Committee and the National Advisory Committee on Race, Ethnic, and Other Populations for the Census Bureau. The Federal Economic Statistics Advisory Committee (FESAC), chartered in November 1999, provides substantive and technical advice to three agencies—the Bureau of Economic Analysis, the Bureau of Labor Statistics, and the Census Bureau—thereby providing an important cross-cutting perspective on major economic statistics programs.7 Some agencies obtain advice from committees and working groups that are organized by an independent association, such as the American Statistical Association’s Committee on Energy Statistics for the Energy Information Administration.
Other means to gather information about user priorities for federal statistics include workshops and conferences, which are valuable for facilitating interchange among users and agency staff (see National Research Council, 2012a). Online mechanisms, such as blogs and web surveys, may also help a statistical agency obtain input from users.
It is also important for an agency’s own staff to engage in analysis of its data to improve them and make them more relevant to users (Martin, 1981; Norwood, 1995; Triplett, 1991). For example, relevant analysis may use the agency’s data to examine correlates of key social or economic phenomena or to study the statistical error properties of the data. Such in-house analysis can lead to improvements in the quality of the statistics, to identification of new needs, to a reordering of priorities, and to closer cooperation and mutual understanding with policy analysis units. In working with a policy analysis unit, a statistical agency may describe conditions and possibly measure progress toward some previously identified goal, but it refrains from making policy recommendations. The distinction between statistical analysis and policy analysis is not always clear, and a statistical agency will need to consider carefully the extent of policy-related activities that are appropriate for it to undertake.
6For example, there are more policy uses of statistical data from the American Community Survey (ACS) and ways to make the ACS data more relevant and accurate for these purposes than have yet been tapped: see National Research Council (2008d, 2011a, 2012g).
Principle 2: A federal statistical agency must have credibility with those who use its data and information.
Users of a statistical agency’s data must be able to trust that the data were collected and analyzed in an objective, impartial manner and that they are as relevant, accurate, and timely as the agency can make them. Without the reality and appearance of credibility, policy debates may deteriorate into attacks on the data rather than drawing on the information to inform policy choices. Credibility is enhanced when an agency fully informs users of the strengths and weaknesses of the data, makes data available widely, and actively engages with users about priorities for data collection and analysis. When it does so, an agency is perceived to be working in the national interest, not the interest of a particular administration (Ryten, 1990).
Credibility is also enhanced when a statistical agency’s website has readily accessible information about its policies on such topics as confidentiality and privacy protection, scientific integrity, standards for data quality and for documenting sources of error in data collections and the limitations of datasets and statistical models, procedures and schedules for the release of new and continuing data series, and procedures for timely notice of errors and corrections to previously released data. Links to policies of an agency’s parent cabinet department or independent agency that clearly specify the authority that is delegated to the statistical agency also enhance credibility and build trust with users.
Principle 3: A federal statistical agency must have the trust of those whose information it obtains.
The statistical programs of the federal government rely on information supplied by many data providers, including not only other agencies of the federal government, but also individuals and organizations outside the federal government, such as state and local governments, businesses, and other organizations. Some of this information is a by-product of data collections that are required by law or regulation for use in the administration of government tax and transfer programs, such as employers’ wage reports to state employment security agencies or records of payments to program beneficiaries. But much of it is obtained through the voluntary cooperation of respondents in statistical surveys. Even when response is mandatory, as in the case of statistical programs that are critical to the nation, such as the population and economic censuses, the cooperation of respondents reduces costs and likely promotes accuracy (see National Research Council, 1995b, 2004d).
Important elements in encouraging such cooperation are that respondents believe that the data requested are important and legitimate for the government to collect, that they are being collected in an impartial, competent manner, and that the confidentiality of their responses will be protected. With regard to confidentiality, trust depends on providing respondents with realistic promises of confidentiality that the agency can reasonably expect to honor and then scrupulously honoring those promises.
Respondent trust also depends on adopting practices that respect personal privacy, such as minimizing the intrusiveness of questions and the time and effort required to participate in a survey, to the extent consistent with the need for information.
When data are obtained from the administrative records of other federal, state, or local government agencies, the same principle of trust applies in order to secure the fullest possible cooperation of the agencies with a statistical agency’s needs for the records and their documentation. Provider agencies need to believe that their records are important and legitimate for a statistical agency to obtain, that their restrictions on data access will be honored, and that the statistical agency will make every effort to minimize their burden in responding to the agency’s requests.
Principle 4: A federal statistical agency must be independent from political and other undue external influence in developing, producing, and disseminating statistics.
A statistical agency must be able to provide credible information that may be used to evaluate the program and policies of its own department or the government as a whole. More broadly, a statistical agency must be a trustworthy source of objective, relevant, accurate, and timely information for decision makers, analysts, and others (inside and outside the government) who want to use statistics to understand present conditions, draw comparisons with the past, and help guide plans for the future.8 For these purposes, it is essential that a statistical agency has a strong position of independence that protects it from political and other undue external influence in developing, producing, and disseminating statistics.
Statistical agency independence is exercised in a broad framework. Legislative authority usually gives ultimate responsibility to the secretary of the department rather than to the head of the statistical agency. In addition, a statistical agency is subject to the normal budgetary processes and to various coordinating and review functions of OMB, as well as the legislative mandates and oversight of Congress.
Within this broad framework, a statistical agency has to work to maintain its credibility as an impartial purveyor of information. In the long run, the effectiveness of an agency depends on its maintaining a reputation for impartiality; thus, an agency must be continually alert to possible infringements on its credibility and be prepared to argue strenuously against such infringements.
For an agency head, independence and protection from undue political influence can be strengthened by the method of the appointment. Appointment by the President with confirmation by the Senate for a fixed term, as is the case for the BLS and the U.S. Census Bureau, and departmental appointment of a career civil servant, as is the case for many statistical agencies, are both methods that can bolster the professional independence of an agency head.9 It is desirable that a fixed term not coincide with the presidential term so that professional considerations, rather than political ones, are more likely to be paramount in the appointment process. Appointment by the President with Senate confirmation for a term that is at the pleasure of the President, as is the case for the head of the Energy Information Administration, is not ideal for agency independence. However, this agency does have strong legislative protection for the authority of its administrator (see Practice 2).
8See the Fundamental Principles of Official Statistics of the U.N. Statistical Commission in Appendix C and the European Statistics Code of Practice for the National and Community Statistical Authorities in Appendix D.
9Agencies headed by career civil servants, many of whom hold their positions for long periods of time, include the Bureau of Economic Analysis; the Bureau of Transportation Statistics; the Economic Research Service and the National Agricultural Statistics Service in the U.S. Department of Agriculture (USDA); the National Center for Health Statistics in the Department of Health and Human Services; the National Center for Science and Engineering Statistics in the National Science Foundation; the Office of Research, Evaluation, and Statistics in the Social Security Administration; and the Statistics of Income Division in the Internal Revenue Service.
The Presidential Appointment Efficiency and Streamlining Act of 2011 (P.L. 112-166), which took effect on August 10, 2012, kept the commissioner of the National Center for Education Statistics and the director of the Bureau of Justice Statistics as presidential appointees but dropped the requirement for Senate confirmation.10 It also provided that the director of the Census Bureau remain a presidential appointee with Senate confirmation but have a fixed 5-year term (with one renewal permitted) for terms beginning on January 1 of years ending in 2 and 7. Previously, the Census Bureau director served at the pleasure of the President.
It is valuable for the head of a statistical agency to have direct access to the secretary of the department or the head of the independent agency in which the statistical agency is located. Such access allows the head to inform new secretaries about the role of a statistical agency and to present directly the case for new statistical initiatives. Such direct access currently is provided by legislation only for the Bureau of Labor Statistics and the Energy Information Administration.
It is also desirable for a statistical agency to have its own funding appropriation from Congress and not be dependent on allocations from the budget of its parent department or agency, which may be subject to reallocation. Agencies that have been assisted by departmental allocations include the Bureau of Justice Statistics and the National Center for Health Statistics. Such organizational aspects as direct access to the secretary of the agency’s department and separate budgetary authority are neither necessary nor sufficient for a strong position of independence that protects a statistical agency from undue political influence, but they facilitate such independence. In contrast, some agencies are under several layers of supervision within their departments (see Appendix B).
10These two positions were among a large number of subcabinet-level posts that were changed from presidential appointments requiring Senate confirmation to presidential appointments not requiring confirmation, reflecting a desire by the U.S. Senate to lessen its confirmation workload and increase the prospects for speedier confirmations for the remaining positions.
PRACTICES FOR A FEDERAL STATISTICAL AGENCY
Practice 1: A Clearly Defined and Well-Accepted Mission
A clear understanding of the mission of an agency, the scope of its statistical programs, and its authority and responsibilities is basic to planning and evaluating its programs and to maintaining credibility and independence from political control (National Research Council, 1986, 1997b). Some agency missions are clearly spelled out in legislation; other agencies have only very general legislative authority. On occasion, very specific requirements may be set by legislation or regulation.
A statistical agency’s mission should focus on the compilation, evaluation, analysis, and dissemination of relevant, accurate, and timely information for statistical purposes. When an agency is charged to carry out an activity that could be perceived as undermining its credibility as an objective source of information (e.g., collecting data not only for statistical, but also for administrative purposes), the agency should carefully describe and structure the activity (e.g., perhaps locating it within a clearly demarcated office) so as to preserve credibility. Should it not be possible to develop a satisfactory arrangement that is both responsive to the charge and credible, then a statistical agency should request that the activity be assigned elsewhere.
Agencies should clearly communicate their mission to others. The use of the Internet is one means to publicize an agency’s mission to a broad audience and to provide related information, including enabling legislation, the scope of the agency’s statistical program, confidentiality provisions, operating procedures, and data quality guidelines.
An agency also needs to pay considerable and formal attention to setting statistical priorities (National Research Council, 1976). Advice from outside groups should be sought on the agency’s statistical program, on setting statistical priorities, on the statistical methods used, and on data products and services. Such advice may be sought in a variety of formal and informal ways, but it should be obtained from both data users and providers and from professional or technical experts in the subject-matter area and in statistical methods and procedures. A strong research program in the agency’s subject-matter field can assist in setting priorities and identifying ways to improve an agency’s statistical programs (Triplett, 1991).
Practice 2: Necessary Authority to Protect Independence
Protection from political and other undue external influence over a statistical agency’s data collection, production, dissemination, and other operations requires that the agency have authority for professional decisions in key aspects of its work, including the following:
• authority for professional decisions over the scope, content, and frequency of data compiled, analyzed, or published within the framework set by its authorizing legislation;11
• authority for selection and promotion of professional, technical, and operational staff;
• authority—and recognition by policy officials outside the statistical agency of that authority—to release statistical information, including accompanying press releases and documentation, without prior clearance regarding the statistical content of the release;
• authority to control information technology systems in order to securely maintain the integrity and confidentiality of data and reliably support timely and accurate production of key statistics; and
• authority for the statistical agency head and qualified staff to speak about the agency’s statistics before Congress, with congressional staff, and before public bodies.
In order to guard against even the perception of political and other undue external influence, it is important that a statistical agency strictly observe several basic operating procedures:
• adhere to fixed schedules in public release of important statistical indicators to prevent even the appearance of manipulation of release dates for political purposes;
• maintain a clear distinction between statistical information and policy interpretations of such information by the president, the secretary of the department, or others in the executive branch; and
• have dissemination policies that foster regular, frequent release of major findings from the agency’s statistical programs to the public through the traditional media, the Internet, and other means.
11Most statistical agencies have such broad authority, limited by budgetary constraints, departmental requirements, OMB review, and congressional mandates.
Another important aspect of independence is control over personnel actions, especially the selection and appointment of qualified professional staff, including senior executive career staff. The agency staff who report directly to the agency head should have formal education and deep experience in the substantive, methodological, operational, or management issues facing the agency as appropriate for their positions. For the head of a statistical agency, professional qualifications are of the utmost importance, whether the profession is that of statistician or the subject-matter field of the statistical agency (National Research Council, 1997b). Relevant professional associations can be a source of valuable input on suitable candidates.
The authority to ensure that information technology systems fulfill the specialized needs of the statistical agency is another important aspect of independence. A statistical agency must be able to vouch for the integrity, confidentiality, and impartiality of the information collected and maintained under its authority so that it retains the trust of its data providers and data users. Such trust is fostered when a statistical agency has control over its information technology resources, so that there is neither the opportunity for, nor the perception of, access to records of individual respondents by policy, program, or regulatory agencies. A statistical agency also needs control over its information technology resources to support timely and accurate release of official statistics, which are often produced under stringent deadlines.
Authority to decide the scope and specific content of the data collected or compiled and to make decisions about technical aspects of data collection programs is yet another important element of independence, although such authority can never be without limits. Congress frequently specifies particular data that it wishes to be collected (e.g., data on job openings and labor turnover by the BLS, data on family farms by the Economic Research Service and National Agricultural Statistics Service) and, in the case of the decennial census, requires an opportunity to review the proposed questions.
The OMB Office of Information and Regulatory Affairs, under the Paperwork Reduction Act (and under the preceding Federal Reports Act), has the responsibility for designating a single data collection instrument for information wanted by two or more agencies. It also has the responsibility under the same act for reviewing all questionnaires and other instruments for the collection of data from 10 or more respondents (see Appendix A). In addition, the courts sometimes become involved in interpreting laws and regulations that affect statistical agencies, as in a number of issues concerning data confidentiality and Freedom of Information Act requests and in the use of sampling in the population census.
The budgetary constraints on statistical agencies and OMB’s review of data collections are ongoing. Other pressures depend, in part at least, on the relations between a statistical agency and those who have supervisory or oversight functions. Agencies need to develop skills in communicating to oversight groups the need for statistical series and credibility in assessing the costs of statistical work. In turn, although it is standard practice for the secretary of a department or the head of an independent agency to have ultimate responsibility for all matters in the department or agency, the head of a statistical agency, for credibility, should be allowed full authority in professional and technical matters. For example, decisions to revise the methodology for calculating the consumer price index (CPI), the gross domestic product (GDP), and the new Supplemental Poverty Measure (SPM)12 have been and are properly made by the relevant statistical agency heads or their designees.
Other aspects of independence underscore a statistical agency’s credibility and work to protect it from political and other undue external influence. Authority to release statistical information and accompanying materials (including press releases) without prior clearance for the statistical content by department policy officials is essential so that there is no opportunity for or perception of political manipulation of any of the information.13 Authority for the statistical agency head and qualified staff to speak about the agency’s statistics before Congress, with congressional staff, and before public bodies is also important to maintain an agency’s standing.
12See Observations from the Interagency Technical Working Group on Developing a Supplemental Poverty Measure (March 2010:1,3). Available: http://www.census.gov/hhes/www/poverty/SPM_TWGObservations.pdf [February 2013].
13One statistical agency, the Energy Information Administration, has its independence authorized in the statute that established the Department of Energy: “The Administrator [of EIA] shall not be required to obtain the approval of any other officer or employee of the Department in connection with the collection or analysis of any information; nor shall the Administrator be required, prior to publication, to obtain the approval of any other officer or employee of the United States with respect to the substance of any statistical or forecasting technical reports which he has prepared in accordance with law” (Section 205 of Department of Energy Organization Act of 1977; 42 USC 7135(d)).
When a statistical agency releases information publicly, a clear distinction should be made between the statistical information and any policy interpretations of it. Not even the appearance of manipulation for political purposes should be allowed. This essential requirement is one reason that statistical agencies are required by Statistical Policy Directive Number 3 (U.S. Office of Management and Budget, 1985) to adhere to predetermined schedules for the public release of key economic indicators and to take steps to ensure that no person outside the agency has access to such indicators before the official release time. Statistical Policy Directive Number 4 (U.S. Office of Management and Budget, 2008) requires agencies to develop and publish schedules for release of other important social and economic indicators as well (see Appendix A). When an agency modifies a customary release schedule for statistical purposes, it should announce and explain the change as far in advance as possible.
Regarding press releases, Statistical Policy Directive Number 4 encourages statistical agencies to use them as a way to publicize and thereby expand the dissemination of data to the public. The directive explicitly states that “statistical press releases must be produced and issued by the statistical agency and must provide a policy-neutral description of the data.” Policy pronouncements must be issued separately by executive branch policy officials and not by the statistical agency, and “policy officials of the issuing department may review the draft statistical press release to ensure that it does not include policy pronouncements” but for no other reason.
Practice 3: Continual Development of More Useful Data
Federal statistical agencies cannot be static. To provide information of continued relevance for public and policy use, they must continually anticipate data needs for future policy considerations and look for ways to develop data systems that can serve broad purposes. To improve the quality and timeliness of their information, they must keep abreast of methodological and technological advances and be prepared to implement new procedures in a timely manner. They must also continually seek ways to make their operations more efficient and cost-effective. Preparing for the future requires that agencies periodically assess the justification, scope, and frequency of existing data series, plan new data series as required, and be innovative and open in their consideration of ways to improve their programs. Because of the decentralized nature of the federal statistical system, innovation often requires cross-agency collaboration. Innovation also implies a willingness to reach out to a broad range of users to identify emerging needs and to implement different kinds of data collection efforts to answer different needs.
In essence, statistical agencies need to define their primary business as that of providing relevant, accurate, and timely statistical information rather than continuing long-standing data collection and estimation programs for their own sakes. Over time, this goal will likely require the use of new methods and data sources and new ways of combining information from multiple sources, including cross-sectional and longitudinal surveys, administrative records, and other sources. In considering data collection, estimation, and dissemination strategies for the future, statistical agencies must be mindful of tradeoffs among relevance, accuracy, timeliness, costs, and transparency. It will not usually be possible to maximize all five criteria at the same time, and, indeed, a major challenge for federal statistical agencies is that of continuing to produce high-quality statistics in the face of constrained budgets and growing user demands for relevance and timeliness.
Multiple Data Sources
Statistical agencies need to continuously think creatively about using multiple data sources in their statistical programs, including such strategies as the use of small-area estimation with auxiliary data to expand the value of surveys without the need for increased sample sizes,14 or the use of administrative records or possibly Internet sources to supplement, calibrate, and even replace data that would otherwise be collected in a survey.15 Statistical agencies are already using these and other strategies to maximize the relevance and cost-effectiveness of one or more statistical programs. However, there is much more that can be done, including the further development of infrastructure and policies for the federal statistical system as a whole to facilitate cost-effective approaches to the design of statistical programs. Examples include standard protocols for such essential operations as acquiring administrative records from federal and state agencies, evaluating the completeness and quality of data sources that were not originally designed for statistical use, and synchronizing individually identifiable information among statistical agencies for statistical purposes.
14For example, the Census Bureau’s programs of Small-Area Income and Poverty Estimates (SAIPE) and Small-Area Health Insurance Estimates (SAHIE) use various administrative datasets together with the American Community Survey (ACS) in models to produce small-area estimates with less error than the ACS estimates by themselves; see http://www.census.gov/saipe [February 2013]; http://www.census.gov/sahie [February 2013].
15For example, the Current Employment Statistics (CES) program of the Bureau of Labor Statistics (BLS) collects monthly data on employment, hours, and earnings from surveys of business establishments; each year BLS uses a census of such information from state unemployment insurance records to benchmark or correct the monthly estimates; see http://bls.gov/sae/790faq2.htm#Ques8 [February 2013].
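The benchmarking described in note 15, in which survey-based estimates are corrected once a more complete administrative count becomes available, can be illustrated with a short Python sketch. The function name, the rule of spreading the revision linearly over the months since the previous benchmark, and all figures are assumptions chosen for illustration; they are not a description of any agency's production procedure.

```python
def benchmark_linear(survey_levels, benchmark_level):
    """Revise a run of survey estimates so the last one equals a benchmark.

    survey_levels[0] is the month of the previous benchmark (left unchanged);
    survey_levels[-1] is the month for which the new benchmark is available.
    The revision is spread linearly across the intervening months
    (an illustrative rule, assumed here for simplicity).
    """
    if len(survey_levels) < 2:
        raise ValueError("need at least two periods to spread a revision")
    steps = len(survey_levels) - 1
    revision = benchmark_level - survey_levels[-1]
    return [level + revision * i / steps for i, level in enumerate(survey_levels)]


# Hypothetical employment levels (thousands) and a hypothetical benchmark.
monthly = [1000.0, 1004.0, 1009.0, 1012.0, 1018.0]
for before, after in zip(monthly, benchmark_linear(monthly, 1022.0)):
    print(f"{before:8.1f} -> {after:8.1f}")
```

The design choice worth noting is that the earliest month is anchored and the latest month is forced to the benchmark, so the month-to-month pattern of the survey is preserved while its level drifts toward the more complete source.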
The Role of Surveys
Many current statistical programs rely on well-established probability sampling methods that draw representative samples of a population, such as household members or business establishments, interview the sample units, and produce estimates that account for known errors in population coverage and missing data and have a quantifiable level of uncertainty from sampling variability. The probability sampling paradigm represented a quantum leap forward in providing cost-effective information on a variety of subjects when it was first introduced for federal surveys beginning in the late 1930s (see Duncan and Shelton, 1978). For example, no longer did everyone in the United States have to answer a long battery of questions in the decennial census, when the use of a separate “long-form” questionnaire administered to a sample of the population could produce reliable estimates for the nation and many subnational geographies (see National Research Council, 2010d).16 Surveys are now widely used throughout the federal government to collect information on a wide array of characteristics of individuals, households, business establishments, government agencies, and other organizations.
However, declining rates of survey response over the past few decades in the United States and other countries pose increasingly difficult challenges to containing the costs of data collection with traditional surveys in ways that do not risk compromising the quality of the data (see, e.g., Brick and Williams, 2013; de Leeuw and de Heer, 2002).17 Survey researchers are actively seeking techniques to maintain and improve both the quality and the cost-effectiveness of surveys (see National Research Council, 2013b), which are and will remain important components of federal statistical programs. Yet the challenges to the survey paradigm make it critically important to consider how other data sources can be used to bolster the completeness, quality, and utility of estimates from statistical agency programs while containing costs.
16The ACS replaced the long-form sample in the 2010 census, after it entered full-scale data collection in 2005.
17Lower response rates reduce the effective sample size and increase the sampling error of estimates from surveys; lower rates also increase nonresponse bias in survey estimates to the extent that nonrespondents differ from respondents in ways that affect analysis and are not addressed by weighting and imputation procedures.
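The two effects of declining response can be made concrete with a small Python sketch. It assumes simple random sampling and an unadjusted respondent mean, which is a deliberate simplification of real survey designs, and every number in it is hypothetical.

```python
import math


def se_proportion(p, n):
    """Standard error of an estimated proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the unadjusted respondent mean relative to the full-sample mean."""
    return (1 - response_rate) * (mean_respondents - mean_nonrespondents)


# Hypothetical design: 10,000 sampled units, true proportion near 0.20.
sampled = 10_000
for rate in (0.9, 0.7, 0.5):
    completes = round(sampled * rate)
    print(f"response rate {rate:.0%}: n = {completes}, "
          f"SE = {se_proportion(0.20, completes):.4f}")

# Respondents at 0.22 and nonrespondents at 0.17 (both hypothetical).
print(f"bias at 60% response: {nonresponse_bias(0.6, 0.22, 0.17):+.3f}")
```

Under these assumptions the sampling error grows only with the square root of the loss in completed cases, whereas the bias term does not shrink as the sample grows, which is why weighting and imputation receive so much attention.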
Roles for Administrative Records
Administrative records include records of federal, state, and local government agencies that are developed and maintained to administer a government program. Examples include the various records maintained by the U.S. Social Security Administration of taxes collected from workers and benefits paid out to retirees and other beneficiaries; records maintained by state agencies of information provided by applicants for various assistance programs and payments to applicants deemed eligible; and property tax records of local governments.
Administrative records do not fit the sample survey paradigm, but they are not dissimilar from household or business censuses in that they are designed to capture information for all instances of a specified population (e.g., program beneficiaries) according to a set of rules that typically have a statutory basis. The challenge for a statistical agency is to determine not only the extent to which a particular administrative records system covers a relevant population and uses similar enough concepts for the agency’s purposes, but also the extent to which the information recorded is accurate according to the system’s rules. Like a population census, administrative records may have errors of omission and duplication or other types of erroneous enumeration, and the variables in the system may be of inconsistent accuracy (for example, payments to beneficiaries may be more accurate than information provided at the time of application regarding the beneficiary’s characteristics). The records may also be stored in formats that are not readily usable by a statistical agency, they may not be well documented, and they may not be provided on a timely basis. Acquiring the records may involve challenging negotiations with the custodial agency, and because the records are designed for administrative and not statistical purposes, their contents and formats are subject to change that is not under the control of the statistical agency.
Yet given that government program administrative records are rule based, it is possible with sufficient effort to understand their concepts and error properties. It is also possible to develop productive relationships with the custodial agency to make them easier to access and use. That the payoffs can be great is illustrated by many examples of current well-established uses of administrative records for statistical programs. One such example is the Census Bureau’s quinquennial economic censuses, which rely heavily on IRS tax records to obtain basic information for the nation’s millions of sole proprietor businesses. The availability of the IRS records for statistical purposes by the Census Bureau obviates the need for costly and burdensome surveys of these businesses.18 The IRS information is not without error, but survey responses would also have errors, and the cost and burden savings are enormous. Another example is the regular use of population estimates based on a prior census updated with administrative records of births and deaths and estimates of net immigration to control household survey responses for population undercoverage relative to the census. This use reduces the nonsampling error in the survey estimates for key population groups defined by age, gender, race, and ethnicity.19
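The use of population controls described above can be sketched as a simple ratio adjustment of survey weights within demographic cells. The Python example below is a minimal illustration under assumed cell definitions and made-up totals; actual survey weighting involves many more steps.

```python
from collections import defaultdict


def poststratify(records, controls):
    """Scale survey weights so each cell's weighted total matches its control.

    records:  list of (cell, weight) pairs, one per responding unit
    controls: dict mapping cell -> independent population estimate
    Returns the adjusted weights in the same order as the input records.
    """
    weighted_total = defaultdict(float)
    for cell, weight in records:
        weighted_total[cell] += weight
    return [weight * controls[cell] / weighted_total[cell]
            for cell, weight in records]


# Hypothetical respondents: the survey undercovers the "18-34" cell.
respondents = [("18-34", 100.0), ("18-34", 100.0), ("35+", 150.0)]
population = {"18-34": 250.0, "35+": 150.0}
print(poststratify(respondents, population))
```

After adjustment, the weights in the undercovered cell rise so that the survey reproduces the independent population estimate for that group, while the cell that was already in line is left unchanged.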
There are many ways in which administrative records from federal, state, and even local government agencies could be used more extensively in federal statistical programs. As an example, a study by the Committee on National Statistics (CNSTAT) on retirement income statistics concluded that some of the information that is essential for analysis of savings and retirement decisions and the effect of medical care use and expenditures on retirement income security is most efficiently and accurately obtained from existing administrative records (National Research Council, 1997a). To be useful for estimation, this information (e.g., Social Security earnings histories, Medicare and Medicaid benefits) must be linked to individual data from panel surveys, which has been done to some extent in the Health and Retirement Study sponsored by the National Institute on Aging, the National Longitudinal Surveys sponsored by the Bureau of Labor Statistics,20 and the Census Bureau’s Survey of Income and Program Participation. Similarly, linkage of employer and employment survey data with administrative records can provide enhanced analysis and modeling capability: a good example is the Census Bureau’s Longitudinal Employer-Household Dynamics Program (LEHD).21
18See https://www.census.gov/econ/census02/pub_text/sector00/cmdesc.htm [February 2013].
19See, for example, National Research Council (2007b), which describes the use of population controls in the ACS.
In another example, a CNSTAT study on reengineering the Census Bureau’s Survey of Income and Program Participation (SIPP) outlined a strategy for using administrative records to improve the survey’s income information, which, like other household surveys, suffers from substantial underreporting of many income sources (National Research Council, 2009e). The recommended strategy includes such elements as conducting regular, routinized comparisons of SIPP data with appropriate control totals from administrative records to identify problematic income sources and monitor improvement (or deterioration) in completeness of the data; incorporating administrative records in model-based imputations to replace older methods; and adjusting SIPP income data to match administrative records control totals. This strategy is more practical for federal records to which the Census Bureau already has access (e.g., Social Security payments) than for state records (e.g., Temporary Assistance to Needy Families benefits), but the development of standard protocols for data sharing could facilitate readier access to relevant state records. In the longer term, it could be cost-effective to use administrative records to replace selected questions in surveys such as SIPP.22
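A routinized comparison of survey aggregates with administrative control totals could take a form like the following Python sketch. The income-source labels, the dollar amounts, and the 90 percent flagging threshold are all hypothetical; the sketch only shows the shape of such a check, not the procedure any agency uses.

```python
def coverage_report(survey_totals, admin_totals, threshold=0.9):
    """Compare weighted survey aggregates with administrative control totals.

    Returns a dict mapping each income source to (ratio, flagged), where
    ratio is the survey aggregate divided by the administrative total and
    flagged is True when the ratio falls below the chosen threshold.
    """
    report = {}
    for source, admin_total in admin_totals.items():
        ratio = survey_totals.get(source, 0.0) / admin_total
        report[source] = (ratio, ratio < threshold)
    return report


# Hypothetical aggregates in billions of dollars.
survey = {"social_security": 475.0, "cash_assistance": 6.0}
admin = {"social_security": 500.0, "cash_assistance": 10.0}
for source, (ratio, flagged) in coverage_report(survey, admin).items():
    print(f"{source:16s} {ratio:5.2f} {'REVIEW' if flagged else 'ok'}")
```

Running the same comparison on each survey wave would show whether the completeness of a given income source is improving or deteriorating over time.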
Another method for using survey and administrative records data to improve data quality and relevance is in models that produce improved estimates for specific quantities of interest, such as small-area estimates of school-age children in poverty or of people with health insurance coverage, which use data from the ACS and administrative sources (see National Research Council, 2000c, 2000d). An important step forward would be the development of methods that use models to improve all or large parts of entire datasets.
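One common way such models combine sources is a composite (shrinkage) estimate that weights a direct survey estimate against a model prediction according to their relative precision. The Python sketch below illustrates that general idea with known variances and invented numbers; it is a simplified illustration, not the specification of any agency's small-area program.

```python
def composite_estimate(direct, direct_var, model_pred, model_var):
    """Blend a direct survey estimate with a model prediction.

    The weight on the direct estimate grows as its sampling variance
    shrinks relative to the model error variance. Both variances are
    treated as known, which is a simplifying assumption.
    """
    weight = model_var / (model_var + direct_var)
    estimate = weight * direct + (1 - weight) * model_pred
    return estimate, weight


# Hypothetical county: a noisy survey poverty rate of 30 percent and a
# regression prediction of 20 percent based on administrative data.
estimate, weight = composite_estimate(0.30, 0.004, 0.20, 0.001)
print(f"weight on survey = {weight:.2f}, composite = {estimate:.3f}")
```

For areas with large samples the direct estimate dominates, and for areas with small samples the estimate leans on the model, which is how auxiliary data add value without an increase in sample size.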
22This approach is used by Statistics Canada, which asks respondents to its surveys that collect income data, such as its Survey of Labour and Income Dynamics (SLID), to permit the agency to use the respondent’s income tax information (which is available to the agency for statistical and research purposes) instead of asking a series of questions. See, for example, http://www23.statcan.gc.ca/imdb/p3Instr.pl?Function=assembleInstr&Item_Id=113810&a=1&LI=128008&lang=en&db=imdb&adm=8&dis=2, first question in the SLID questionnaire IN module [February 2013].
In most uses of administrative data, consideration needs to be given to at least three factors: (1) upfront investments to facilitate the most effective approach to their acquisition and use, accompanied by estimates of the likely longer term cost savings, data quality improvements, or both that can be used to determine priorities for moving forward; (2) the means by which the confidentiality of linked or augmented data files can be protected while allowing access for research purposes (see National Research Council, 2005b); and (3) the protocols and criteria that can be followed to ensure full understanding by the statistical agency of the properties of a specific administrative records system (e.g., population coverage, coding error rates, frequency of updating). In addition, it is important for statistical agencies to develop protocols and procedures for cooperative working relationships with the custodians of administrative records to facilitate joint understanding of the data needed for statistical use.23 Finally, it is vital for statistical agencies to maintain transparency by developing full documentation for users of the sources of the data provided, including the role played by administrative records, and the limitations (as well as the benefits) of the sources.
Roles for Nontraditional Data Sources
Statistical agencies are currently exploring the use of data sources in addition to surveys and administrative records that hold promise to improve the relevance, accuracy, and timeliness of federal statistics (see National Research Council, 2013a). These nontraditional data sources include, among others, data gleaned from relevant Internet websites and data obtained from the private sector (e.g., scanner data on consumer purchases). Often, these sources generate large volumes of data that require data mining and other computationally intensive techniques for extracting information (see National Research Council, 2008a, esp. App. H).
There are already instances of effective use of nontraditional data sources by statistical agencies, some of long standing. For example, the Economic Research Service in USDA obtains data from private vendors of expenditure data scanned by households from store receipts and has evaluated the quality of the data.24 The National Center for Health Statistics, in its surveys of hospitals and other health care providers, obtains data from questionnaires, abstracts of samples of patient records, and providers’ medical care claim records.25
23A decades-old report from the Federal Committee on Statistical Methodology (FCSM) (1980b) still has much to offer on the issues and problems in using administrative records for statistical purposes, and an FCSM Subcommittee on Administrative Records is making progress toward the development of protocols for accessing, using, and evaluating administrative records (see discussion in Practice 13).
24See http://www.ers.usda.gov/publications/err-economic-research-report/err69.aspx [February 2013].
25See, for example, http://www.cdc.gov/nchs/nhcs/faq.htm, “What Does Participation in the NHCS [National Hospital Care Survey] Entail,” and related frequently asked questions [February 2013]. Much of the data collected could be characterized as administrative records of private-sector organizations and evaluated under the framework outlined above for government administrative records.
Most nontraditional data sources present significant challenges to statistical agencies to evaluate the accuracy and error properties of the information. For example, harvesting website data to develop up-to-the-minute consumer price indexes26 may offer significant timeliness and cost savings compared with traditional methods, but it is not clear how to adjust these data for consumer expenditures that occur off-line so that they accurately represent the universe of purchases. More generally, information that is taken from the Internet cannot usually be described or evaluated according to either a probability survey paradigm or a rules-based administrative records paradigm—for example, people who post items to sell on an auction website do not comprise any specified population. Another challenge is that statistical agencies lack control over the consistency of nontraditional data over time or among vendors or sites, so that deciding to rely heavily on such data sources carries high risks of compromising key time series if a vendor or site then ceases operation or there are marked changes in data content or population coverage.
Yet in an era when data users expect timeliness and when budgets are constrained, it is important that statistical agencies actively explore means by which nontraditional data sources can contribute to their programs. Such means could include augmenting information obtained from traditional sources, replacing information elements that have been obtained from traditional sources, and providing earlier estimates that are later benchmarked using traditional sources. Just as more and more surveys use multiple data collection modes (including the Internet, by mail, by telephone, and in person), so more statistical programs will likely benefit from using multiple data sources.
To garner acceptance for the use of multiple data sources, particularly newer, nontraditional sources with which users are less familiar, statistical agencies should take care to invest resources in documentation and user training and education. Agencies will likely need to “wall off” data series that are derived from nontraditional sources by labeling them as experimental or for research use until their properties can be fully understood. If it is not possible to evaluate a nontraditional source sufficiently to establish its quality and suitability for inclusion in a statistical program, then a statistical agency should not use the data, although it may assist users by informing them of the problems with the source.
Integration and Synchronization of Data Across Agencies
Another way to improve data quality, develop new kinds of information, and increase cost-effectiveness is for statistical agencies that collect similar information to integrate their microdata records for specified statistical uses. One such cost-effective use is for a large survey to provide the sampling frame and additional content for a smaller, more specialized survey. Currently, the National Health Interview Survey of the National Center for Health Statistics serves this function for the Medical Expenditure Panel Survey of the Agency for Healthcare Research and Quality, as does the Census Bureau’s American Community Survey for the National Survey of College Graduates that the Bureau conducts for the National Center for Science and Engineering Statistics (see National Research Council, 2008d).
Another cost-effective use is to synchronize or harmonize similar data held by different agencies. An example is the business establishment lists maintained by the Bureau of Labor Statistics and the Census Bureau. The lists derive from different sources (state employment security records in the case of the BLS and a variety of sources, including federal income tax records, in the case of the Census Bureau), and research has demonstrated that synchronization of the lists would not only improve the accuracy of the information, but also increase the coverage of business establishments in the United States. Such synchronization would make it possible to develop more useful and accurate statistics on the nation’s economy while decreasing the reporting burden on business data providers (National Research Council, 2006b, 2007b).
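As a rough illustration of what list synchronization involves, the following Python sketch compares two small establishment lists. It assumes, purely for illustration, that both lists carry a shared identifier (the hypothetical `key` field); actual agency lists derive from different sources and would generally require far more elaborate record linkage. The names `Establishment` and `synchronize` and the sample records are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Establishment:
    key: str        # hypothetical shared identifier (an assumption of this sketch)
    name: str
    employees: int


def synchronize(list_a, list_b):
    """Compare two establishment lists on a shared key.

    Returns the matched keys, the keys found in only one list, the matched
    keys whose employment counts disagree, and the size of the combined list.
    """
    by_key_a = {e.key: e for e in list_a}
    by_key_b = {e.key: e for e in list_b}
    keys_a, keys_b = set(by_key_a), set(by_key_b)
    matched = sorted(keys_a & keys_b)
    return {
        "matched": matched,
        "only_in_a": sorted(keys_a - keys_b),
        "only_in_b": sorted(keys_b - keys_a),
        # Matched records that disagree are candidates for review.
        "employment_mismatch": [
            k for k in matched
            if by_key_a[k].employees != by_key_b[k].employees
        ],
        # Coverage of the union of the two lists.
        "combined_count": len(keys_a | keys_b),
    }


# Invented sample records, not real establishments.
list_a = [Establishment("001", "Acme Mill", 40),
          Establishment("002", "Birch Cafe", 12)]
list_b = [Establishment("002", "Birch Cafe", 15),
          Establishment("003", "Cedar Labs", 7)]
result = synchronize(list_a, list_b)
```

In this toy case each list contains an establishment that the other lacks, so the combined list covers more establishments than either list alone, and the one matched record is flagged because its employment counts differ.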
Enactment of Subtitle B of the 2002 Confidential Information Protection and Statistical Efficiency Act (CIPSEA) was a major achievement, authorizing the synchronization of business data among the three principal statistical agencies that produce the nation’s key economic statistics: BEA, BLS, and the Census Bureau. The first formal proposal for data synchronization under CIPSEA involved matching data from BEA’s international investment surveys with data from the Survey of Industrial Research and Development, which is now the Business Research, Development, and Innovation Survey (conducted by the Census Bureau for the NSF). The results helped BEA improve its survey sample frames and enabled the Census Bureau to identify companies that were not previously known to engage in research and development activities (U.S. Office of Management and Budget, 2004b:44–45). However, synchronization of business establishment lists between BLS and the Census Bureau cannot be accomplished at present because of the requirement in Title 26 of the U.S. Code that prohibits the Census Bureau from sharing with BLS (or BEA) any tax information of businesses or individuals that it has permission to acquire from the IRS, even for statistical purposes.27
The need to understand temporal changes in important social or economic events may call for the development of longitudinal surveys that track people, institutions, or firms over time. Developing longitudinal data (and general-purpose, repeated cross-sectional data, as well) usually requires much coordination with policy research agencies, other statistical agencies, and academic researchers. Longitudinal data may require more sophisticated methods for collection and analysis than data from repeated or one-time cross-sectional surveys. In addition, considerable time may be needed to produce useful data products for analyzing transitions and other dynamic characteristics of longitudinal samples (although production of cross-sectional products from longitudinal surveys need not take long). Yet data from longitudinal surveys are potentially very useful—sometimes, they are the only means to answer important policy questions: see, for example, National Research Council (1997a) on data needs to inform retirement income policy and National Research Council (2001b) on data needs to evaluate the effects of the 1996 welfare reform legislation.
Historically, because statistical agencies are oriented toward the mission of their particular department, the longitudinal surveys they developed (and cross-sectional data activities as well) typically focused on subject matter and population groups (or other entities) that the department serves. For example, separate datasets are available on health characteristics of infants and children, educational characteristics for children and teenagers, and work force characteristics for adults. Increasingly, however, agencies have considered surveys that follow individuals across such key transitions as from early childhood to school and from school to the labor force (National Research Council, 1998a; National Research Council and Institute of Medicine, 2004, 2008).
27Efforts have been under way since CIPSEA was enacted to introduce legislation that would permit business data synchronization involving IRS records, but, to date, this has not occurred.
Examples of statistical agency surveys that are designed for analysis of some kinds of transitions include the Early Childhood Longitudinal Study (ECLS), sponsored by the National Center for Education Statistics in collaboration with other agencies, and the National Longitudinal Surveys of Youth (NLSY79, NLSY97), sponsored by BLS. The ECLS includes two cohorts of children, one of kindergartners in 1998 who were followed through 8th grade and another of babies born in 2001 who were followed through kindergarten. A new cohort of kindergartners was sampled in fall 2010 and will be followed through 5th grade.28 The NLSY includes two cohorts, one of people aged 14–22 in 1979 and the other of people aged 12–17 in 1997, both of which are currently being interviewed every other year.29
Other important longitudinal surveys are sponsored by research agencies—for example, the National Institute on Aging sponsors the Health and Retirement Study (HRS), and the National Institute of Child Health and Human Development sponsors the new National Children’s Study (NCS) (see National Research Council and Institute of Medicine, 2008). The HRS, which began in 1992, includes people aged 50 and older, who are interviewed every 2 years, with a new cohort introduced every 6 years.30 The NCS plans to follow 100,000 children and their families from before birth through age 21; it is currently in a testing stage.31
Finally, administrative records can be turned into longitudinal datasets that are useful for research, policy analysis, and program evaluation. For example, the National Center for Education Statistics is assisting states through the Statewide Longitudinal Data Systems (SLDS) program to develop datasets from administrative records that follow school children through primary and secondary education and even into higher education and the workforce.32
31For information about NCS, see http://www.nationalchildrensstudy.gov/Pages/default.aspx [February 2013].
It is important for statistical agencies to be innovative in the methods used for data collection, processing, estimation, analysis, and dissemination. Agencies need to investigate new or modified methods that have the potential to improve the accuracy and timeliness of their data and the efficiency of their operations. Careful evaluation of new methods is required to assess their benefits and costs in comparison with current methods and to determine effective implementation strategies, including the development of methods for bridging time series before and after a change in procedures.
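One simple way to bridge a time series across a change in procedures is ratio splicing, which can be sketched as follows. The sketch assumes that the old and new methods were both run for at least one overlap period, so that a single scaling factor can be computed; that assumption, the function name `bridge_series`, and the numbers are illustrative only, and agencies may use quite different bridging methods in practice.

```python
def bridge_series(old, new, overlap_period):
    """Rescale an old-method series to the level of a new-method series.

    old, new: dicts mapping period -> value. Both must contain
    overlap_period, the period measured under both methods.
    Returns the scaling factor and the bridged series.
    """
    factor = new[overlap_period] / old[overlap_period]
    # Rescale old-method values that precede the overlap period.
    bridged = {p: v * factor for p, v in old.items() if p < overlap_period}
    # From the overlap period onward, use the new-method values as published.
    bridged.update(new)
    return factor, bridged


# Invented values: the new method measures 25 percent higher in the overlap year.
old_series = {2008: 64.0, 2009: 72.0, 2010: 80.0}
new_series = {2010: 100.0, 2011: 104.0}
factor, bridged = bridge_series(old_series, new_series, overlap_period=2010)
```

A single factor presumes that the effect of the change is proportional and stable over time; evaluating whether that holds is part of the careful evaluation that a change in methods requires.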
For example, experience with the use of computer-assisted interviewing techniques, which are widely used for survey data collection, has identified their benefits. It has also identified challenges for the timely provision of data and documentation that require continued research to develop solutions that maximize the gains from these techniques (see National Research Council, 2003e). Currently, statistical agencies are exploring how best to use “paradata,” that is, data about the process by which the information from a survey was generated, such as the times of day that interviews were conducted, how long they took, how many contacts were attempted with sample cases, and similar data. Using paradata together with cost information for various survey processes offers a statistical agency the potential for optimizing the costs and timeliness of data collection and estimation and the accuracy of the survey results—for example, by varying contact modes depending on respondent characteristics (see National Research Council, 2013b).
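To make the idea of combining paradata with cost information concrete, the following sketch summarizes a log of contact attempts by mode. The record layout, the per-attempt costs in `COST_PER_ATTEMPT`, and the function name `summarize_by_mode` are all hypothetical; real paradata systems record much more detail, such as the time of day and the duration of each contact.

```python
from collections import defaultdict

# Hypothetical cost of one contact attempt in each mode (illustrative values).
COST_PER_ATTEMPT = {"phone": 5.0, "in_person": 40.0}


def summarize_by_mode(attempts, cost_per_attempt):
    """Count attempts and completed interviews by contact mode.

    attempts: iterable of dicts with keys "mode" and "completed".
    Returns, for each mode, the number of attempts, the number of completed
    interviews, and the cost per completed interview (None if none completed).
    """
    totals = defaultdict(lambda: {"attempts": 0, "completes": 0})
    for attempt in attempts:
        tally = totals[attempt["mode"]]
        tally["attempts"] += 1
        if attempt["completed"]:
            tally["completes"] += 1
    summary = {}
    for mode, tally in totals.items():
        cost = tally["attempts"] * cost_per_attempt[mode]
        summary[mode] = {
            "attempts": tally["attempts"],
            "completes": tally["completes"],
            "cost_per_complete": (
                cost / tally["completes"] if tally["completes"] else None
            ),
        }
    return summary


# Invented contact-attempt log for four sample cases.
attempt_log = [
    {"case": 1, "mode": "phone", "completed": False},
    {"case": 1, "mode": "phone", "completed": True},
    {"case": 2, "mode": "phone", "completed": False},
    {"case": 2, "mode": "in_person", "completed": True},
    {"case": 3, "mode": "phone", "completed": True},
    {"case": 4, "mode": "in_person", "completed": False},
]
mode_summary = summarize_by_mode(attempt_log, COST_PER_ATTEMPT)
```

A summary of this kind is only a starting point: deciding how to vary contact modes would also require information on respondent characteristics and on the quality of the resulting data.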
Statistical agencies have turned to the Internet as a standard vehicle for data dissemination and are increasingly using it as a means of data collection. Internet dissemination facilitates the timely availability of data to a broad audience and provides a valuable tool for users to learn of related datasets from other agencies. However, it poses challenges in such areas as the best ways to provide information on the various dimensions of data quality, such as measurement error and sampling error, and appropriate use of the data to an audience that spans a wide range of analytical skills and understanding (see National Research Council, 2012b).
Internet data collection offers opportunities to reduce costs in comparison with other survey modes and to reduce errors by incorporating automatic edits, prompts, and other features. However, it also poses new challenges in such areas as sample design, questionnaire design, and protecting data confidentiality, and it requires careful evaluation of the effects on the quality of responses in comparison with traditional data collection modes (telephone, mail, personal interview). Yet even as work is ongoing on meeting these challenges, population censuses around the world, federal business surveys, and other surveys are using the Internet as one data collection mode to reduce costs and facilitate response (see National
Practice 4: Openness About Sources and Limitations of the Data Provided
A critically important means to instill credibility and trust among data users and data providers is for an agency to operate in an open and fully transparent manner with regard to the sources and the limitations of its data. Openness requires that an agency provide a detailed description of its data with acknowledgment of any uncertainty and a description of the methods used and assumptions made. Agencies should provide to users reliable indications of the kinds and amounts of statistical error to which the data are subject (see Brackstone, 1999; Federal Committee on Statistical Methodology, 2001a; see also President’s Commission on Federal Statistics, 1971).
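As one example of an indication of statistical error, the sketch below computes the standard error and an approximate 95 percent margin of error for an estimated proportion. It assumes simple random sampling, which rarely holds for federal surveys; estimates from complex sample designs generally require design-based variance estimation. The function name `proportion_margin_of_error` and the input values are illustrative.

```python
import math


def proportion_margin_of_error(p_hat, n, z=1.96):
    """Standard error and margin of error for a sample proportion.

    Assumes simple random sampling with sample size n. The default
    z = 1.96 corresponds to an approximate 95 percent confidence level.
    """
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return standard_error, z * standard_error


# Illustrative values: an estimated proportion of 0.5 from 1,000 respondents.
se, moe = proportion_margin_of_error(0.5, 1000)
```

With these inputs the margin of error is roughly 3 percentage points. A measure of this kind covers sampling error only and says nothing about coverage, nonresponse, measurement, or processing errors.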
Some statistical agencies developed detailed quality profiles for some of their major series, such as those produced for the American Housing Survey (Chakrabarty, 1996), the Residential Energy Consumption Survey (Energy Information Administration, 1996), the Schools and Staffing Survey (Kalton et al., 2000), and the Survey of Income and Program Participation (U.S. Census Bureau, 1998). Earlier, the Federal Committee on Statistical Methodology (1978c) developed a quality profile for employment as measured in the Current Population Survey. These profiles have proved helpful to experienced users and agency personnel responsible for the design and operation of major surveys and data series (see National Research Council, 1993a, 2007c). They were staff-intensive to produce, however, and so have not often been updated. Agencies should pursue creative use of the Internet as a means for easier maintenance and updating of quality profile-type information (e.g., separate web pages for major types of error).33
33The Census Bureau posts basic quality indicators for the ACS, such as sample size, population coverage, household response rates, and item response rates for the nation and states on the ACS website; see http://www.census.gov/acs/www/methodology/sample_size_and_data_quality [February 2013].
Openness about data limitations requires much more than providing estimates of sampling error. In addition to a discussion of aspects that statisticians characterize as nonsampling errors (such as coverage errors, nonresponse, measurement errors, and processing errors), it is valuable to have a description of the concepts used and how they relate to the major uses of the data. Descriptions of the shortcomings of and problems with the data should be provided in sufficient detail to permit a user to take them into account in analysis and interpretation. Descriptions of how the data relate to similar data collected by other agencies should also be provided, particularly when the estimates from two or more series differ significantly in ways that may have policy implications.
Openness means that a statistical agency should describe how decisions on methods and procedures were made for a data collection program. It is important to be open about research conducted on methods and data and other factors that were weighed in such decisions.
Openness also means that, when mistakes are discovered after statistics are released, the agency has an obligation to issue corrections publicly and in a timely manner. The agency should announce corrections not only through the same dissemination vehicles that it used to release the original statistics, but also through additional vehicles, as appropriate, to alert the widest possible audience of current and future users to the corrections.
In summary, agencies should make an effort to provide information on the quality, limitations, and appropriate use of their data that is as frank and complete as possible. Such information, which is sometimes termed “metadata,” should be made available in ways that are easy for users to access and understand, recognizing that users differ in their level of understanding of statistical data (see National Research Council, 1993a, 1997b, 2007c). Agencies need to work to educate users that all data contain some uncertainty and error; this does not mean that the data are wrong, but that they should be used with an understanding of their possible limitations.
The Information Quality Act of 2000 required all federal agencies to develop written guidelines for maintaining and documenting the quality of their information programs and activities. Using a framework developed collaboratively by the members of the Interagency Council on Statistical Policy (U.S. Departments of Agriculture et al., 2002), individual statistical agencies have developed quality guidelines for their own data collection programs, which are available on the Internet (see Practice 9 and Appendix A).
Practice 5: Wide Dissemination of Data
A statistical agency must have vigorous and well-planned dissemination programs to get information into the hands of users who need it on a timely basis. Planning should be undertaken from the viewpoint that the public has contributed the data elements, has paid for the data collection and processing, and should in return have the information accessible in ways that make it as useful as possible to the largest number of users.
A good dissemination program provides data to users in forms that are suited to their needs. Data release of aggregate statistics may take the form of regularly updated time series, cross-tabulations of aggregated characteristics of respondents, analytical reports, and brief reports of key findings. Such products should be made accessible through an agency’s Internet website, which should also make available more detailed tabulations in formats that are downloadable from the website. Agencies should take care in designing their websites to make it as easy as possible for users to locate and access information. They should also explore ways to provide data to developers of applications for smartphones and similar media (see National Research Council, 2012b).
Yet another form of dissemination involves access to microdata files, which make it possible to conduct in-depth research in ways that are not possible with aggregate data. Public-use microdata files can be developed for general release. Such files contain data for individual respondents that have been processed to protect confidentiality by deleting, aggregating, or modifying any information that might permit individual identification. Alternatively, an agency can provide or arrange for a facility on the Internet to allow researchers to analyze restricted microdata (that is, data that have not been processed for general release) to suit their purposes, with safeguards so that the researcher is not seeing the actual records and cannot obtain any output, such as too-detailed tabulations, that could identify individual respondents.34 Another alternative is to grant a license to individual researchers to analyze restricted microdata at their own sites by agreeing to follow strict procedures for protecting confidentiality and accepting liability for penalties if confidentiality is breached. A fourth alternative is to allow researchers to analyze restricted microdata at secure sites maintained by a statistical agency, such as one of the Census Bureau’s Research Data Centers located at several universities and research organizations around the country or the National Center for Health Statistics’ Research Data Center at its headquarters. Agencies should consider all forms of dissemination in order to gain the most use of their data consistent with protecting the confidentiality of responses (see Doyle et al., 2001; National Research Council, 2005b).
34Such a utility is provided by the Data Enclave of NORC at the University of Chicago, which provides secure access by researchers to selected microdata sets of the USDA Economic Research Service, the National Center for Science and Engineering Statistics, and several other federal agencies and private foundations (see http://www.dataenclave.org/index.php/home [February 2013]).
The stunning improvements over the past three decades in computing speed, power, and storage capacity, the growing availability of information from a wide range of public and private sources on the Internet, and the increasing richness of statistical agency data collections have increased the risk that individually identifiable information can be obtained (see National Research Council, 2003d:Ch. 5, 2005b). Statistical agencies must be vigilant in their efforts to protect against the increased threats to disclosure from their summary data and microdata products while honoring their obligation to be proactive in seeking ways to provide data to users. When statistical data are not disseminated in useful forms, the public loses not only the taxpayer dollars spent to collect the data, but also the research findings that could have informed public policy and served other important societal purposes.
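One widely used safeguard against disclosure from too-detailed tabulations is to withhold cells that rest on very few respondents. The sketch below shows only this first step (often called primary suppression). The threshold of 5, the function name `suppress_small_cells`, and the table values are assumptions made for illustration; the sketch does not address complementary suppression, which is needed when a withheld cell could be recovered from row or column totals.

```python
def suppress_small_cells(table, threshold=5):
    """Withhold tabulation cells whose counts fall below a threshold.

    table: dict mapping a cell label to a respondent count.
    Cells with counts below threshold are replaced with None.
    This is primary suppression only; it does not protect cells that
    could be derived from published marginal totals.
    """
    return {
        cell: (count if count >= threshold else None)
        for cell, count in table.items()
    }


# Invented tabulation of respondents by county and age group.
counts = {
    ("County A", "18-64"): 120,
    ("County A", "85+"): 3,
    ("County B", "18-64"): 47,
    ("County B", "85+"): 5,
}
released = suppress_small_cells(counts)
```

Here only the cell with 3 respondents is withheld. Choosing the threshold, and deciding whether cells with zero respondents should also be withheld, are policy decisions that depend on the data and the disclosure risk.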
A good dissemination program for statistical data uses a variety of channels to inform the broadest possible audience of potential users about available data products and how to obtain them. Such channels may include providing direct access to aggregated data on the Internet, depositing data products in libraries, establishing a network of centers to work with users (such as the Census Bureau’s state data centers), holding exhibits and making presentations at conferences, and maintaining lists of individuals and organizations to notify of new data. Agencies should also arrange for archiving of data with the National Archives and Records Administration (NARA) and other data archives, as appropriate, so that data are available for historical research in future years with suitable protections for confidentiality.
An effective dissemination program provides not only the data, but also information about the strengths and weaknesses of the data in ways that can be comprehended by diverse audiences. Information about the limitations of the data should be included in every form of data release, whether in a printed report, on a computer-readable data file, or on the Internet (for useful measures of data quality to report, see Federal Committee on Statistical Methodology [2001a]; National Research Council [2007b]).
On occasion, the objective of presenting the most accurate data possible may conflict with the needs of users for the information. The tension between frequency and promptness of release and accuracy should be explicitly considered. When concerns for timeliness prompt the release of preliminary estimates (as is done for some economic indicators), consideration should be given to the frequency of revisions and the mode of presentation of revised figures from the point of view of the users as well as the issuers of the data. Agencies that release preliminary estimates must educate the public about differences among preliminary, revised, and final estimates.
Practice 6: Cooperation with Data Users
Users of federal statistical data span a broad spectrum of interests and needs. They include policy makers, planners, government program administrators, members of Congress and their staffs, and researchers in federal agencies, state and local governments, the business sector, and academia. They also include activists, citizens, students, and media representatives. An effective statistical agency endeavors to learn about its data users and to obtain input from them on the agency’s statistical programs, including learning what data they use or want, how they use data, and for what purposes.
The needs of users can be explored by forming advisory committees, holding focus groups, analyzing requests and Internet activity, or undertaking formal surveys of users. The task requires continual alertness to the changing composition and needs of users and of potential users. An agency should cooperate with professional associations, institutes, universities, and scholars in the relevant fields to determine the needs of the research community and obtain their insight on potential uses. An agency should work with relevant associations and other organizations to determine the needs of business and industry for its data, as well as with user groups that are formed around federal statistics generally or particular statistical programs.
Within the limitations of its confidentiality procedures, as noted above, an agency should seek to provide maximum access to its data, including making the data available to external researchers for secondary analysis (National Research Council, 1985c, 2005b). Having data accessible for a wide range of analyses increases the return on the investment in data collection and provides support for an agency’s program. Once statistical data are made public, they may be used in numerous ways not originally envisaged. An agency should attempt to monitor the major uses of its data as part of its efforts to keep abreast of user needs.
Researchers and other users of data frequently request data from statistical agencies for specific purposes. The agency should have procedures in place for referring users to the appropriate professional staff who can understand the user’s purposes and needs and who have a thorough knowledge of the agency’s data. Statistical agencies should view these services as a part of their dissemination activities.
Ensuring equal access requires avoiding release of data to selected individuals or organizations in advance of other users. Agencies that prepare special tabulations of their data on request for external groups must be alert to the proposed uses. If the data are to be used in court cases, administrative proceedings, or collective bargaining negotiations, an agency should have an explicit and publicly known policy for ensuring that all sides may receive the special tabulations, regardless of which side requested them or paid the cost of the tabulation.
Practice 7: Respect for the Privacy and Autonomy of Data Providers
Clear policies and effective procedures for respecting the privacy of respondents and, more broadly, protecting the rights and respecting the autonomy of human research participants are critical to maintaining the quality and comprehensiveness of the data that federal statistical agencies provide to policy makers and the public. Part of the challenge for statistical agencies is to develop effective means of communicating not only the agency’s protection procedures and policies, but also the importance of the data being collected for the public good.
To promote trust and encourage accurate response from data providers, it is important that statistical agencies respect their privacy. When data providers are asked to participate in a survey, they should be told whether the survey is mandatory or voluntary, how the data will be used, and who will have access to the data. In the case of voluntary surveys, information on these matters is necessary in order for data providers to give their informed consent to participate (see National Research Council, 2003d, on regulations and procedures for informed consent).
Respondents invest time and effort in replying to surveys. The amount of effort or burden varies considerably from survey to survey, depending on such factors as the complexity of the requested information. Statistical agencies should attempt to minimize such effort, to the extent possible, by using concepts and definitions that fit respondents’ common understanding; by simplifying questionnaires; by minimizing the intrusiveness of questions and explaining why questions that may be perceived as intrusive are needed for important purposes; by allowing alternative modes of response when appropriate (e.g., via the Internet); and by using administrative records or other data sources, if they are sufficiently complete and accurate to provide some or all of the needed information. In surveys of businesses or other institutions, agencies should seek innovative ways to obtain information from the institution’s records and minimize the need for respondents to reprocess and reclassify information. It is also the responsibility of agencies to use qualified, well-trained interviewers. Respondents should be informed of the likely duration of a survey interview and, if the survey involves more than one interview, how many times they will be contacted over the life of the survey. This information is particularly important when respondents are asked to cooperate in extensive interviews, search for records, or participate in longitudinal surveys.
Ways in which participation in surveys can be made easier for respondents and result in more accurate data can be explored by such means as focus group discussions or surveys. Many agencies apply the principles of cognitive psychology to questionnaire design, not only to make the resulting data more accurate, but also to make the time and effort of respondents more efficient (National Research Council, 1984). Some agencies thank respondents for their cooperation by providing them with summaries of the information after the survey is compiled.
It is possible that increasing concerns about privacy are contributing to observed declines in survey response rates. In a time when individuals are inundated with requests for information from public and private sources, when there are documented instances of identity theft and other abuses of confidential information on the Internet, and when individual information is being used for terrorism-related investigatory or law enforcement purposes, it is not surprising that some people object to responding to censuses and surveys, even when the questions appear noninvasive and the data are collected for statistical purposes under a pledge of confidentiality.35 The E-Government Act of 2002 requires agencies to develop privacy impact assessments (PIAs) whenever “initiating a new collection of information [that] includes any information in an identifiable form.” The purpose of a privacy impact assessment is to ensure there is no collection, storage, access, use, or dissemination of identifiable information that is not both needed and permitted. In response, statistical agencies have begun conducting and releasing PIAs for statistical programs and, in the process, rethinking how to respect individual privacy in order to maintain trust with data providers (see Appendix A).
35For a literature review of public opinion on privacy in the wake of the September 11, 2001, terrorist attacks, see National Research Council (2008a:App. M).
Statistical agencies should devote resources to understanding the privacy and confidentiality concerns of individuals (and organizations). They should also devote resources to devising effective strategies for communicating privacy and confidentiality policies and practices to respondents. Such strategies appear to be more necessary—and more challenging—than ever before.
Finally, a reason that respondents reply to statistical surveys is that they believe that their answers will be useful to the government or to society generally. Statistical agencies should respect this contribution by compiling the data and making them accessible to users in convenient forms. A statistical agency has an obligation to publish statistical information from the data it has collected unless it finds the results invalid.
Protecting and Respecting the Autonomy of Human Research Participants
Collecting data from individuals for research purposes with federal funds falls under a series of regulations, principles, and best practices that the federal government has developed over a period of 50 years for research involving human participants (see National Research Council, 2003d). The pertinent regulations, which have been adopted by 10 departments and 7 agencies, are known as the “Common Rule” (45 CFR 46). The Common Rule regulations require that researchers adequately protect the privacy of human participants and maintain the confidentiality of data collected from them, minimize the risks to participants from the data collection and analysis, select participants equitably with regard to the benefits and risks of the research, and seek the informed consent of individuals to participate (or not) in the research. Under the regulations, most federally funded research involving human participants must be reviewed by an independent institutional review board (IRB) to determine that the design meets the ethical requirements for protection.36
36For information about the Common Rule and procedures for the certification of IRBs by the Office for Human Research Protections in the U.S. Department of Health and Human Services, see http://www.hhs.gov/ohrp [February 2013]. For information about proposed changes to the Common Rule, see also http://www.hhs.gov/ohrp/humansubjects/anprm2011page.html [February 2013].
Some federal statistical agencies consider certain of their information collections to be subject to IRB review. Whether or not a given information collection is subject to formal IRB review, a statistical agency should strive to incorporate the spirit of the Common Rule regulations in the design and operation of its programs involving data collection from individuals. When an agency is required to obtain IRB approval for data collection, it should work proactively with the IRB to determine how best to apply the regulations in ways that do not unnecessarily inhibit participant responses. For example, implied consent is typically used for mail and telephone surveys of the general population; in these situations, written documentation tends not to provide any added protection to the respondent, and could reduce participation. An effective statistical agency will seek ways, such as sending an advance letter, to furnish information to potential respondents that will help them make an informed decision about whether to participate. Such information should include the planned uses of the data and their benefits to individuals and the public.
Practice 8: Protection of the Confidentiality of Data Providers’ Information
Data providers must believe that the data they give to a statistical agency will not be used by the agency to harm them. For statistical data collection programs, protecting the confidentiality of individual responses is considered essential to encourage high response rates and the accuracy of the responses. (For reviews of research on the relationship of concerns about confidentiality protection to response rates, see Hillygus et al., 2006; National Research Council, 2004d:Ch. 4.) Furthermore, if participants have been assured of confidentiality, then under federal policy for the protection of human subjects, disclosure of identifiable information about them would violate the principle of respect for persons even if the information is not sensitive and would not result in any social, economic, legal, or other harm (National Research Council, 2003d:Ch. 5).
Historically, some agencies have had provisions for promising respondent confidentiality written into their authorizing or enabling legislation (e.g., for the U.S. Census Bureau, Title 13 of the U.S. Code, first enacted in 1929, and for the National Agricultural Statistics Service, various provisions in Title 7 of the U.S. Code). However, other agencies (e.g., the Bureau of Labor Statistics) relied on strong statements of policy, legal precedents in court cases, or customary practices (see Gates, 2012; Norwood, 1995). These latter agencies risked having their policies overturned by judicial interpretations of legislation or executive decisions that might have required the agency to disclose identifiable data collected under a pledge of confidentiality (for an example involving the Energy Information Administration, see National Research Council, 1993b:185–186).
To give additional weight and stature to policies that statistical agencies had pursued for decades, OMB issued a Federal Statistical Confidentiality Order on June 27, 1997. This order assured respondents who provided statistical information to specified agencies that their responses would be held in confidence and would not be used against them in any government action “unless otherwise compelled by law” (U.S. Office of Management and Budget, 1997; see also Appendix A).
The Confidential Information Protection and Statistical Efficiency Act (CIPSEA) became law in 2002, as Title V of the E-Government Act of 2002. Subtitle A of CIPSEA provides a statutory basis for protecting the confidentiality of all federal data collected for statistical purposes under a confidentiality pledge, including but not limited to data collected by statistical agencies. Subtitle A places strict limits on the disclosure of individually identified information collected with a pledge of confidentiality; such disclosure to persons other than the employees or agents of the agency collecting the data can occur only with the informed consent of the respondent and the authorization of the agency head and only when the disclosure is not prohibited by any other law (e.g., Title 13 of the U.S. Code). It also provides penalties for employees or agents who knowingly or willfully disclose statistical information (up to 5 years in prison, up to $250,000 in fines, or both). OMB issued guidance in 2007 to assist agencies in implementing Subtitle A of CIPSEA (U.S. Office of Management and Budget, 2007; see also Appendix A).
Although confidentiality protection for statistical data is now on a much firmer legal footing across the federal government than prior to CIPSEA, there is an exception for some data from the National Center for Education Statistics (NCES). The USA PATRIOT Act of 2001, Section 508, amended the National Center for Education Statistics Act of 1994 to allow the U.S. attorney general (or an assistant attorney general) to apply to a court to obtain any “reports, records, and information (including individually identifiable information) in the possession” of NCES that are considered relevant to an authorized investigation or prosecution of domestic or international terrorism. Section 508 also removed the penalties for NCES employees who furnish individual records under this section. This section has not been invoked, and its possible effect on survey response rates has not been tested, but language of this kind does not serve the mission of statistical agencies or their need to be independent of undue political interference.
Statistical agencies continually strive to avoid inadvertent disclosure of confidential information in disseminating data. The widespread dissemination of statistical data through the Internet has heightened attention by agencies to ensuring that effective safeguards to protect confidential information are in place. Risks are increased when data for small groups are tabulated, when the same data are tabulated in a variety of ways, or when public-use microdata files (samples of records for unidentified individuals or units) are released with highly detailed content. Longitudinal surveys, for example, particularly newer ones, typically have richly detailed content for multiple domains (e.g., health, education, labor force participation) or multiple respondents (e.g., parents, students, teachers) or both. Risks may also be increased when surveys include linked administrative data or collect biomarkers from blood samples or other physiological measures (National Research Council, 2001a, 2010b).
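The risk from tabulating data for small groups can be illustrated with a minimal sketch of one common safeguard, suppressing cells whose counts fall below a threshold. The function name, the field names, and the threshold of 5 are assumptions chosen for illustration; they do not describe the rule of any particular agency.

```python
from collections import Counter


def tabulate_with_suppression(records, keys, min_cell_size=5):
    """Count records by the given keys; withhold counts below min_cell_size.

    A suppressed cell is reported as None. The threshold is a hypothetical
    value, not an agency standard.
    """
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    return {
        cell: (n if n >= min_cell_size else None)
        for cell, n in counts.items()
    }


# Made-up microdata: two pilots in county A form an identifiable small group.
records = (
    [{"county": "A", "occupation": "teacher"}] * 12
    + [{"county": "A", "occupation": "pilot"}] * 2
    + [{"county": "B", "occupation": "teacher"}] * 7
)

table = tabulate_with_suppression(records, ["county", "occupation"])
for cell, count in sorted(table.items()):
    print(cell, "suppressed" if count is None else count)
```

Suppressing small cells alone is generally not sufficient when row or column totals are also published, because a withheld count can sometimes be recovered by subtraction; additional (complementary) suppression or other methods are then needed.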
Because of the disclosure risks associated with detailed tabulations and rich public-use microdata files, there is always a tension between the desire to safeguard confidentiality and the desire to provide public access to data. This dilemma is an important one to federal statistical agencies, and it has stimulated ongoing efforts to develop new statistical and administrative procedures to safeguard confidentiality while permitting more extensive access. An effective federal statistical agency will exercise judgment in determining which of these procedures are best suited to its requirements to serve data users while protecting confidentiality.37
Finally, there is a tension between safeguarding confidentiality and departmental initiatives to consolidate data processing, storage, and maintenance as a way to satisfy requirements of the Federal Information Security Management Act (FISMA), which is Title III of the E-Government Act of 2002 (see Appendix A). FISMA is intended to bolster computer and network security in the federal government. An effective statistical agency will work with its department on approaches to computer security that recognize the need for the statistical agency to control the processing, storage, and maintenance of data that it collects under a pledge of confidentiality and for which it provides an assurance that such data will not be accessible for other departmental purposes, such as regulation or enforcement.
37Several CNSTAT study panels have discussed these issues and alternative procedures for providing data access while maintaining confidentiality protection; see National Research Council (1993b, 2000a, 2003d, 2005b, 2009d, 2010f).
Practice 9: Commitment to Quality and Professional Standards of Practice
The best guarantee of high-quality data is a strong professional staff that includes experts in the subject-matter fields covered by the agency’s program, experts in statistical methods and techniques, and experts in data collection, processing, and other operations. A major function of an agency’s leadership is to strike a balance among these groups and promote working relationships that make the agency’s program as productive as possible, with each group of experts contributing to the work of the others.
An effective statistical agency devotes resources to developing, implementing, and inculcating standards for data quality and professional practice. Although a long-standing culture of data quality contributes to professional practice, an agency should also seek to develop and document standards through an explicit process. The existence of explicit standards and guidelines, regularly reviewed and updated, facilitates training of new in-house staff and contractors’ staffs. The OMB document Standards and Guidelines for Statistical Surveys (U.S. Office of Management and Budget, 2006b) is helpful in that it covers every aspect of a survey from planning through data release (see also U.S. Office of Management and Budget, 2006a, and Appendix A).38 It recommends that agencies develop additional, more detailed standards that focus on their specific statistical activities.39
An effective statistical agency keeps up to date on developments in theory and practice that may be relevant to its program, such as new techniques for imputing missing data (see, e.g., National Research Council, 2004d:App. F, 2010g). An effective agency is also alert to changes in the economy or in society that may call for changes in the concepts or methods used in particular datasets.40 Yet the need for change often conflicts with the need for comparability with past data series, and this issue can easily dominate consideration of proposals for change. Agencies have the responsibility to manage this conflict by initiating more relevant data series or revising existing series to improve quality while providing information to compare old and new series, such as was done when the BLS revised the treatment of owner-occupied housing in the CPI.41
38The data quality guidelines of statistical agencies in other countries are also helpful; for example, see Statistics Canada (2009); United Kingdom Office for National Statistics (2007).
39For examples, see the Statistical Standards of the National Center for Education Statistics, available: http://nces.ed.gov/statprog/2002/stdtoc.asp [February 2013], and the Energy Information Administration’s Standards Manual, available: http://www.eia.doe.gov/smg/Standards.html [February 2013].
40Reviews of concepts underlying important statistical data series include National Research Council (1995a and 2005c) on concepts of poverty; National Research Council (2002a) on cost-of-living concepts; National Research Council (2005a) on “satellite” accounts for nonmarket activities, such as home production, volunteerism, and human capital investment; National Research Council (2006a) on concepts of food insecurity and hunger; National Research Council (2006c) on concepts of residence for the U.S. census and the ACS;
To ensure the quality of its data collection programs and reports, an effective statistical agency has mechanisms and processes for obtaining both inside and outside review of such aspects as the soundness of the data collection and estimation methods and the completeness of the documentation of the methods used and the error properties of the data. For individual publications and reports, formal processes are needed that incorporate review by agency technical experts and, as appropriate, by technical experts in other agencies and outside the government. (See Appendix A for a description of recent OMB guidelines for peer review of scientific information; reviews at a program or agency-wide level are considered under Practice 12.)
Practice 10: An Active Research Program
Substantive Research and Analysis
A statistical agency should include staff with responsibility for conducting objective substantive analyses of the data that the agency compiles, such as analyses that assess trends over time or compare population groups:
• Agency analysts are in a position to understand the need for and purposes of the data from a survey or other data collection program and know how the statistics will be used. Such information has to be available to the agency and understood thoroughly if the survey design is to produce the data required.
• Those involved in analysis can best articulate the concepts that should form the basic framework of a statistical series. Agency analysts are well situated to understand and transmit the views of external users and researchers; at the same time, close working relationships between analysts and data producers are needed for the translation of the conceptual framework into the design and operation of the survey or other data collection program.
National Research Council (2009b) on concepts of disability; National Research Council (2010a) on satellite accounts for health; and National Research Council and Institute of Medicine (2012) on concepts of medical care economic risk and burden.
41See, e.g., Gillingham and Lane (1982).
• Agency analysts have access to the complete microdata and so are in a better position than analysts outside the agency to understand and describe the limitations of the data for analytic purposes and to identify errors or shortcomings in the data that can lead to subsequent improvements.
• Substantive research by analysts on an agency’s staff will have credibility because of the agency’s commitment to openness about the data provided and maintaining independence from political influence.
• Substantive research by analysts on an agency’s staff can assist in formulating the agency’s data program, suggesting changes in priorities, concepts, and needs for new data or discontinuance of outmoded or little-used series.
As with descriptive analyses provided by an agency, substantive analyses should be designed to be relevant to policy by addressing topics of public interest and concern. However, such analyses should not include positions on policy options or be designed to reflect any particular policy agenda. These issues are discussed in Martin (1981), Norwood (1975), and Triplett (1991).
Research on Methodology and Operations
For statistical agencies to be innovative in methods for data collection, analysis, and dissemination, research on methodology and operational procedures must be ongoing. Methodological research may be directed toward improving survey design, measuring error, and, when possible, reducing error from such sources as nonresponse and reporting errors. Other important topics for research include reducing the time and effort asked of respondents, evaluating the best mix of interview modes (e.g., Internet, mail, telephone, personal interview) to cope with increasing nonresponse rates due to such phenomena as cell-phone-only households, developing new and improved summary measures and estimation techniques, and developing innovative statistical methods for confidentiality protection. Research on operational procedures may be directed toward facilitating data collection in the field, improving the efficiency and reproducibility of data capture and processing, and enhancing the usability of Internet-based data dissemination systems.
Many of the current practices in statistical agencies were developed through research they conducted or obtained from other agencies. Federal statistical agencies, frequently in partnership with academic researchers, pioneered the applications of statistical probability sampling, the national economic accounts, input-output models, and other analytic methods. The U.S. Census Bureau pioneered the use of computers for processing the census, and research on data collection, processing, and dissemination operations continues to lead to creative uses of automated procedures and equipment in these areas. Several federal statistical agencies sponsor research using academic principles of cognitive psychology to improve the design of questionnaires (see National Research Council, 1984), the clarity of data presentation, and the ease of use of electronic data collection and dissemination tools such as the Internet. The history of the statistical agencies has shown repeatedly that methodological and operations research can lead to large productivity gains in statistical activities at relatively low cost.
An effective statistical agency actively partners with the academic community for methodological research. It also seeks out academic and industry expertise for improving data collection, processing, and dissemination operations. For example, a statistical agency can learn techniques and best practices for improving software development processes from computer scientists (see National Research Council, 2003e, 2004c).
Research on Policy Uses
Much more needs to be known about how statistics are actually used in the policy-making process, both inside and outside the government. Research about how the information produced by a statistical agency is used in practice should contribute to future improvements in the design, concepts, and format of data products. For example, public-use files of statistical microdata were developed in response to the growing analytic needs of government and academic researchers.
Gaining an understanding of the variety of uses and users of an agency’s data is only a first step. More in-depth research on the policy uses of an agency’s information might, for example, explore the use of data in microsimulation or other economic models, or go further to examine how the information from such models and other sources is used in decision making (see National Research Council, 1991a, 1991b, 1997a, 2000b, 2001b, 2003a, 2010e).
Practice 11: Professional Advancement of Staff
An effective federal statistical agency has personnel policies that encourage the development and retention of a strong professional staff who are committed to the highest standards of quality work for their agency and in collaboration with other agencies. There are several key elements of such a policy:
• The required levels of technical and professional qualifications for positions in the agency are identified, and the agency adheres to these requirements in recruitment and professional development of staff. Position requirements take account of the different kinds of technical and other skills, such as supervisory skills, that are necessary for an agency to have a full range of qualified staff, including not only statisticians, but also experts in relevant subject-matter areas, data collection, processing, and dissemination processes, and management of complex, technical operations.
• Continuing technical education and training, appropriate to the needs of their positions, is provided to staff through in-house training programs and opportunities for external education and training.
• Position responsibilities are structured to ensure that staff have the opportunity to participate, in ways appropriate to their experience and expertise, in research and development activities to improve data quality and cost-effectiveness of agency operations.
• Professional activities, such as publishing in refereed journals and presentations at conferences, are encouraged and recognized, including presentations of technical work in progress with appropriate disclaimers. Participation in relevant statistical and other scientific associations, including leadership positions, is encouraged to promote interactions with researchers and methodologists in other organizations that advance the state of the art. Such participation is also a mechanism for disseminating information about an agency’s programs, including the sources and limitations of the data provided. Guidance from the Office of Science and Technology Policy issued in 2010 stresses the importance of participation in professional activities as a means of ensuring a culture of scientific integrity in federal agencies (see Appendix A).
• Interaction with other professionals inside and outside the agency is fostered through opportunities to participate in technical advisory committee meetings, establish and be active in listservs, blogs, and wikis that take advantage of Internet technology to foster informal exchanges on technical matters, supervise contract research and research consultants on substantive matters, interact with visiting fellows and staff detailed from other agencies, take assignments with other relevant statistical, policy, or research organizations, and regularly receive new assignments within the agency.
• Participation in cross-agency collaboration efforts, such as the Federal Committee on Statistical Methodology and its subcommittees, is supported. Such participation not only benefits the professional staff of an agency, but also contributes to improving the work of the statistical system as a whole.
• Accomplishment is rewarded by appropriate recognition and by affording opportunities for further professional development. The prestige and credibility of a statistical agency is enhanced by the professional visibility of its staff, which may include establishing high-level nonmanagement positions for highly qualified technical experts.
An effective statistical agency considers carefully the costs and benefits—both monetary and nonmonetary—of using contractor organizations, not only for data collection, as most agencies do, but also to supplement in-house staff in other areas.42 Outsourcing can have benefits, such as: providing experts in areas in which the agency is unlikely to be able to attract highly qualified in-house staff (e.g., some information technology functions), enabling an agency to handle an increase in its workload that is expected to be temporary or that requires specialized skills, and allowing an agency to learn from best industry practices. However, outsourcing can also have costs, including that agency staff become primarily contract managers and less qualified as technical experts and leaders in their fields. An effective statistical agency maintains and develops a sufficiently large number of in-house staff, including mathematical statisticians, survey researchers, and subject-matter specialists, who are qualified to analyze the agency’s data and to plan, design, carry out, and evaluate its core operations so that the agency maintains the integrity of its data and its credibility in planning and fulfilling its mission. Statistical agencies should also maintain and develop staff with the expertise necessary for effective technical and administrative management of contractor resources.
An effective statistical agency has policies and practices to instill the highest possible commitment to professional ethics among its staff, as well as procedures for monitoring contractor compliance with ethical standards. When an agency comes under pressure to act against its principles (for example, if it is asked to disclose confidential information for an enforcement purpose or to support an inaccurate interpretation of its data), it must be able to rely on its staff to resist such actions as contrary to the ethical principles of their profession. An effective agency ensures that its staff are aware of and have access to such statements of professional practice as the guidelines published by the American Statistical Association (1999) and the International Statistical Institute (1985), as well as to the agency’s own statements about protection of confidentiality, respect for privacy, standards for data quality, and similar matters. It endeavors in other ways to ensure that its staff are fully cognizant of the ethics that must guide their actions in order for the agency to maintain its credibility as a source of objective, reliable information for use by all.
42Only BLS and the Census Bureau maintain their own interviewing staffs.
Practice 12: A Strong Internal and External Evaluation Program
Statistical agencies that fully follow such practices as continual development of more useful data, openness about sources and limitations of the data provided, wide dissemination of data, commitment to quality and professional standards of practice, and an active research program will likely be in a good position to make continuous assessments of and improvements in the relevance and quality of their data collection systems. Yet even the best functioning agencies will benefit from an explicit program of internal and independent external evaluations, which frequently offer fresh perspectives. Such evaluations need to address not only specific agency programs, but also the agency’s portfolio of programs considered as a whole.
Evaluation of data quality for a continuing survey or other kind of data collection program begins with regular monitoring of quality indicators that are readily available to users. For surveys, such monitoring includes unit and item response rates, population coverage rates, and information on sampling error, such as coefficients of variation. In addition, in-depth assessment of quality on a wide range of dimensions—including sampling and nonsampling errors across time and among population groups and geographic areas—needs to be undertaken on a periodic basis and the results made public (National Research Council, 2007c).
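The routinely monitored indicators named above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: the rates are unweighted, the function names and sample figures are hypothetical, and production definitions (for example, weighted response rates or rules for determining eligibility) are considerably more detailed.

```python
def unit_response_rate(n_respondents, n_eligible):
    """Share of eligible sample units that responded (unweighted)."""
    return n_respondents / n_eligible


def item_response_rate(responses, item):
    """Share of responding units that answered a given item (unweighted).

    A missing answer is represented here as None or an absent key.
    """
    answered = sum(1 for r in responses if r.get(item) is not None)
    return answered / len(responses)


def coefficient_of_variation(estimate, standard_error):
    """Standard error expressed as a fraction of the estimate."""
    return standard_error / estimate


# Made-up figures for illustration only.
unit_rate = unit_response_rate(n_respondents=150, n_eligible=200)

responses = [
    {"income": 42000},
    {"income": None},      # respondent skipped the income question
    {"income": 58000},
    {"income": 31000},
]
item_rate = item_response_rate(responses, "income")

cv = coefficient_of_variation(estimate=50000.0, standard_error=1500.0)

print(f"unit response rate: {unit_rate:.2f}")
print(f"item response rate (income): {item_rate:.2f}")
print(f"coefficient of variation: {cv:.3f}")
```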
Research on methods to improve data quality may cover such areas as alternative methods for imputing values for missing data and alternative question wordings, using cognitive methods, to reduce respondent reporting errors. Methods for such research may include the use of “methods panels” (small samples of respondents with whom experiments are conducted by using alternative procedures and questionnaires), matching with administrative records, simulations of sensitivity to alternative procedures, and the like. The goal of the research is the development of feasible, cost-effective improved procedures for implementation.
In ongoing programs for which it is disruptive to implement improvements on a continuing basis, a common practice is to undertake major research and development activities at intervals of 5 or 10 years or longer. Agencies should ensure, however, that the intervals between major research and development activities do not become so long that data collection programs deteriorate in relevance, quality, and efficiency.
Regular, well-designed program evaluations, with adequate budget support, are key to ensuring that data collection programs do not deteriorate. Having a set schedule for research and development efforts will enable data collection managers to ensure that the quality and usefulness of their data are maintained and help prevent the locking into place of increasingly less optimal procedures over time.
In addition to quality, it is important to assess the relevance of an agency’s data collection programs. The question in this instance is whether the agency is “doing the right thing” in contrast to whether the agency is “doing things right.” Relevance should be assessed not only for particular programs or closely related sets of programs, but also for an agency’s complete portfolio in order to assist it in making the best choices among program priorities given the available resources.
Keeping in close touch with stakeholders and important user constituencies—through such means as regular meetings, workshops, conferences, and other activities—is important to ensuring relevance. Customer surveys can be helpful on some aspects of relevance, although they typically provide only gross indicators of customer satisfaction, usually with regard to timeliness and ease of use of data products. As discussed in the next section, including other federal statistical colleagues in this communication, both as users and as collaborators, can also be valuable.
Statistical agencies commonly find that it is difficult to discontinue or scale back a particular data series, even when it has largely outlived its usefulness relative to other series, because of objections by users who have become accustomed to it. In the face of limited resources, however, discontinuing a series is preferable to across-the-board cuts in all programs, which would reduce the accuracy and usefulness of both the more relevant and less relevant data series. Regular internal and external reviews can help an agency not only reassess its priorities, but also develop the justification and support for changes to its portfolio.
Types of Reviews
Regular program reviews should include a mixture of internal and external evaluation. Agency staff should set goals and timetables for internal evaluations, which should involve staff who do not regularly work on the program under review. Independent external evaluations should also be conducted on a regular basis, the frequency of which should depend on the importance of the data and on how quickly changes in such factors as respondent behavior and data collection technology may adversely affect a program. In a world in which people and organizations appear increasingly less willing to respond to surveys, it becomes increasingly urgent to continually monitor response and have more frequent evaluations than in a more stable environment. In addition to program evaluations, agencies should seek outside reviews to examine priorities and quality practices across the entire agency.
External reviews can take many forms. They may include recommendations from advisory committees that meet at regular intervals (typically every 6 months). However, advisory committees should never be the sole source of outside review because the members of such committees rarely have the opportunity to become deeply familiar with agency programs. External reviews can also take the form of a “visiting committee” using the NSF model;43 an academic-type visiting committee; a special committee established by a relevant professional association (see, e.g., American Statistical Association, 1984); or a study by a panel of experts.44
44See, e.g., National Research Council (1985a, a study of the statistical programs of the Immigration and Naturalization Service); (1986, a study of NCES); (1997b, a study of BTS); (2004b, a study of NCSES statistics on research and development expenditures); (2009a, a study of BJS); and other National Research Council reports in the references.
Practice 13: Coordination and Collaboration with Other Statistical Agencies
The U.S. federal statistical system consists of many agencies in different departments, each with its own mission. Nonetheless, statistical agencies do not and should not conduct their activities in isolation. An effective statistical agency actively explores ways to work with other agencies to meet current information needs, through such means as seeking ways to integrate the designs of existing data systems to provide new or more useful data than a single system can provide. An effective agency is also alert for occasions when it can provide technical assistance to other agencies—including not only other statistical agencies, but also program agencies in its department—as well as occasions when it can benefit from such assistance in turn.
Efforts to standardize concepts and definitions, such as those for industries, occupations, and race and ethnicity, can contribute to effective coordination of statistical agency endeavors (see, e.g., National Research Council, 2004a, 2004b; also see Appendix A), as does the development of broad macro models, such as the system of national accounts. Efforts to standardize categories on survey questionnaires among agencies can enhance data comparability, while efforts by agencies to adopt common standards for data documentation and other metadata can contribute to the ease of the use of statistical products. Initiatives for interrelating and synchronizing data among statistical agencies (including individual data and address lists when permitted by law) can be helpful for such purposes as achieving greater efficiency in drawing samples, evaluating completeness of population coverage, and reducing duplication among statistical programs, as well as reducing respondent burden.
Role of OMB
The responsibility for coordinating statistical work in the federal government is specifically assigned to the Office of Information and Regulatory Affairs (OIRA) in OMB by the Paperwork Reduction Act (previously, by the Federal Reports Act and the Budget and Accounting Procedures Act; see Appendix A). The Statistical and Science Policy Office in OIRA, often working with the assistance of interagency committees, reviews concepts of interest to more than one agency; issues standard classification systems (of industries, metropolitan areas, etc.) and oversees their periodic revision; consults with other parts of OMB on statistical budgets; and, by reviewing statistical information collections as well as the statistical programs of the government as a whole, identifies gaps in statistical data, programs that may be duplicative, and areas in which interagency cooperation might lead to greater efficiency and added utility of data. The Statistical and Science Policy Office also is responsible for coordinating U.S. participation in international statistical activities.45
The Statistical and Science Policy Office encourages the use of administrative data for statistical purposes, when feasible, and works to establish common goals and norms on major statistical issues, such as confidentiality. It sponsors and heads the interagency FCSM, which issues guidelines and recommendations on statistical issues common to a number of agencies, typically by working through subcommittees, and also hosts conferences that facilitate professional interaction and development (see Federal Committee on Statistical Methodology, 1978a–2005).46 It encourages CNSTAT at the National Research Council to serve as an independent adviser and reviewer of federal statistical activities. The 1995 reauthorization of the Paperwork Reduction Act created a statutory basis for the existing Interagency Council on Statistical Policy, formalizing an arrangement whereby statistical agency heads participate with OMB in activities to coordinate federal statistical programs (see Appendixes A and B).
Forms of Interagency Collaboration
There are many forms of interagency collaboration and coordination. Some efforts are multilateral, some bilateral. Many result from common interests in specific subject areas, such as economic statistics, statistics on people with disabilities, or statistics on children or the elderly. The U.S. Office of Management and Budget (2011:Ch. 3) describes several interagency collaborative efforts, such as joint support for research that fosters new and innovative approaches to surveys; the development of a statistical community of practice for agencies to share, standardize, and improve statistical protocols and tools; a systemwide initiative to facilitate the statistical uses of administrative records under the leadership of an FCSM subcommittee; and implementation of comparable measures of disability on major household surveys.
[45] The Statistical and Science Policy Office, formerly the Statistical Policy Office, was renamed to reflect added responsibilities with respect to the 2001 Information Quality Act standards and guidelines, OMB’s guidance on peer review planning and implementation, and evaluations of science underlying proposed regulatory actions.
A common type of bilateral arrangement is the agreement of a program agency to provide administrative data to a statistical agency to be used as a sampling frame, a source of classification information, or a summary compilation to check (and possibly revise) preliminary sample results. The Bureau of Labor Statistics, for example, benchmarks its monthly establishment employment reports to data supplied by state employment security agencies. Such practices improve statistical estimates, reduce costs, and eliminate duplicate requests for information from the same respondents. In other cases, federal statistical agencies engage in cooperative data collection with state counterparts to let one collection system satisfy the needs of both. A number of such joint systems have been developed, notably by the Bureau of Labor Statistics, the National Agricultural Statistics Service, the National Center for Education Statistics, and the National Center for Health Statistics.
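As a rough illustration of the benchmarking idea described above, the following Python sketch applies a simple ratio adjustment: a series of preliminary sample-based estimates is rescaled so that the estimate for a chosen reference month matches a total taken from administrative records. The function name benchmark_ratio, its parameters, and all of the figures are hypothetical and chosen only for illustration. This is not the procedure used by the Bureau of Labor Statistics, which the text does not describe; it is a minimal sketch of one generic way such a check-and-revise step could work.

```python
def benchmark_ratio(preliminary, benchmark_total, benchmark_index):
    """Rescale preliminary estimates so that the value at benchmark_index
    equals benchmark_total (a simple ratio adjustment, assumed here for
    illustration only)."""
    base = preliminary[benchmark_index]
    if base == 0:
        raise ValueError("cannot benchmark to a zero preliminary estimate")
    # Ratio of the administrative total to the sample-based estimate.
    factor = benchmark_total / base
    # Apply the same factor to every period in the series.
    revised = [value * factor for value in preliminary]
    return revised, factor


# Hypothetical monthly employment estimates in thousands (not real data).
preliminary = [1000.0, 1010.0, 1020.0]

# Hypothetical administrative-records total for the last month.
revised, factor = benchmark_ratio(
    preliminary, benchmark_total=1030.2, benchmark_index=2
)

print(f"adjustment factor: {factor:.4f}")
print("revised series:", [round(value, 1) for value in revised])
```

In this sketch every month receives the same proportional adjustment. A production benchmarking procedure would typically treat months differently depending on their distance from the reference month, so the uniform factor here should be read as a simplifying assumption.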
Another example of a joint arrangement is one in which a statistical agency contracts with another to conduct a survey, compile special tabulations, or develop models. Such arrangements make use of the special skills of the supplying agency and facilitate use of common concepts and methods. The Census Bureau conducts many surveys for other agencies; both the National Center for Health Statistics and the National Agricultural Statistics Service receive funding from other agencies in their departments to support their survey work; and the National Center for Science and Engineering Statistics receives funding from agencies in other departments to support several of its surveys (see U.S. Office of Management and Budget, 2012b:Table 2).
The major federal statistical agencies are also concerned with international comparability of statistics. Under the leadership of OMB’s Statistical and Science Policy Office, they contribute to the deliberations of the United Nations Statistical Commission, the OECD, and other international organizations; participate in the development of international standard classifications and systems; and support educational activities that promote improved statistics in developing countries. Statistical agencies also learn from and contribute to the work of established statistical agencies in other countries in such areas as survey methodology, record linkage, confidentiality protection techniques, and data quality standards. Several statistical agencies run educational programs for government statisticians in developing countries. Some statistical agencies have long-term cooperative relationships with international groups: examples include the Bureau of Labor Statistics with the International Labor Organization, the National Agricultural Statistics Service with the Food and Agriculture Organization, the National Center for Education Statistics with the International Indicators of Education Systems project of the OECD, and the National Center for Health Statistics with the World Health Organization.
To be of greatest value, the efforts of statistical agencies to collaborate as partners with one another need to involve the full range of their activities, including definitions, concepts, measurement methods, analytical tools, dissemination modes, and disclosure limitation techniques. Such efforts should also extend to policies and professional practices, so that agencies can respond effectively and with a coordinated voice to such government-wide initiatives as data quality guidelines, privacy impact assessments, and institutional review board requirements.
Collaboration efforts should also encompass the development of data, especially for emerging policy issues (see, e.g., National Research Council, 1999a, 2007b). In some cases, fully integrating the designs of existing data systems, such as when one survey provides the sampling frame for a related survey, may be not only more efficient but may also yield needed new data. In other instances, collaborative efforts may identify ways for agencies to improve their individual data systems so that they are more useful for a wide range of purposes.
Collaboration on ways of using alternative data sources, such as administrative records, should be pursued so that the entire statistical system can improve the relevance, accuracy, timeliness, and cost-effectiveness of its data programs. Toward this goal, in 2008 the FCSM established a Subcommittee on Administrative Records, which is working to develop standards and provide guidance to statistical agencies that will facilitate not only use of administrative records, but also evaluation of their quality and fitness to be part of an agency’s data collection, estimation, and evaluation programs. This subcommittee has released two products from its work: one is a compilation of case studies of successful statistical uses of administrative data (Federal Committee on Statistical Methodology, 2009); the other is a checklist tool for assessing the quality of administrative data (Federal Committee on Statistical Methodology, 2013).
Two continuing collaborative efforts for providing statistical information to the public in a broad area of interest are the Federal Interagency Forum on Aging-Related Statistics and the Federal Interagency Forum on Child and Family Statistics. The former was established in the mid-1980s by the National Institute on Aging, in cooperation with the National Center for Health Statistics and the Census Bureau. The forum’s goals include coordinating the development and use of statistical databases among federal agencies, identifying information gaps and data inconsistencies, and encouraging cross-national research and data collection for the aging population. The forum was reorganized in 1998 to include 6 new agencies and has grown since then to include 15 agencies. The forum develops a periodic indicators chart book, which was first published in 2000 and was most recently issued in 2012 (Federal Interagency Forum on Aging-Related Statistics, 2012).
The Federal Interagency Forum on Child and Family Statistics was formalized in a 1994 executive order to foster collaboration in the collection and reporting of federal data on children and families. Its membership currently includes 22 statistical and program agencies. The forum’s reports (e.g., Federal Interagency Forum on Child and Family Statistics, 2012) describe the condition of America’s children, including changing population and family characteristics, the environment in which children are living, and indicators of well-being in the areas of economic security, health, behavior, social environment, and education.
No single agency, whether a statistical agency or program agency, could have produced the forum reports alone. Working together in this way, federal statistical agencies contribute to presenting data in a form that is more relevant to policy concerns and to a stronger statistical system overall. Similar collaborative efforts aimed at integrating not only data dissemination, but also data collection and estimation, using traditional and nontraditional data sources, are critically important to improving the relevance, accuracy, timeliness, and cost-effectiveness of the output from the nation’s federal statistical system.