TO EARN THE RESPECT AND TRUST of individual respondents and other data providers, such as businesses and other government agencies, a statistical agency must be able to offer a credible pledge of confidentiality for the information it collects for statistical purposes. Providers must trust that the data they share with a statistical agency will neither be made available for any administrative, regulatory, law enforcement, or other targeted proceeding that might harm individuals or organizations nor be hacked or otherwise intruded on by unauthorized people inside or outside the statistical agency.
A credible pledge of confidentiality for individual and organizational responses is considered essential to encourage high response rates and accuracy of responses from survey participants.76 Moreover, if individual participants have been assured of confidentiality, disclosure of identifiable information about them would violate the principle of respect for persons even if the information is not sensitive and would not result in any social, economic, legal, or other harm (see Practice 7; National Research Council, 2003b:Ch. 5). For sensitive administrative data obtained from another government agency, there must be a credible pledge of confidentiality in a properly formulated memorandum of understanding or other authorizing document.
76 Reviews of research on how confidentiality and privacy concerns may affect response rates include Hillygus et al. (2006) and National Research Council (1979, 2004a:Ch. 4, 2013a:Ch. 1). Not all statistically useful data are collected under a pledge of confidentiality—see “Definition of a Federal Statistical Agency” in Part I above.
Some agencies, including the Census Bureau and the National Agricultural Statistics Service, have long had legislative protection for ensuring respondent confidentiality.77 However, prior to the passage of the Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA), other agencies, including the Bureau of Labor Statistics, had to rely on strong statements of policy, legal precedents in court cases, or customary practices (see Gates, 2012; Norwood, 1995). Agencies that did not have legal protection for their practices were at risk of having their policies overturned by judicial interpretations of legislation or executive decisions that would require the agency to disclose identifiable data collected under a pledge of confidentiality.78
The passage of CIPSEA was a landmark event in the history of confidentiality protection for statistical data (see Appendix A).79 Subtitle A provides a statutory basis for protecting the confidentiality of all federal data collected for statistical purposes under a confidentiality pledge, including but not limited to data collected by statistical agencies. Under CIPSEA, individually identified information obtained under a confidentiality pledge may not be disclosed to anyone other than the agency’s employees except with the respondent’s informed consent and the agency head’s authorization, and then only when no other law (e.g., Title 13 of the U.S. Code) prohibits the disclosure. CIPSEA also provides penalties for employees who knowingly disclose identifiable statistical information (up to 5 years in prison, a fine of up to $250,000, or both). Principal statistical agencies and recognized statistical units may also designate contractors and outside researchers as “agents” who may have access to specified confidential information, such as microdata in a restricted-access environment, provided they agree to be subject to the same penalties for disclosure.
Confidentiality protection for statistical data is now on a much firmer legal footing across the federal government than prior to CIPSEA, with one exception. Section 508 of the USA PATRIOT Act of 2001 (P.L. 107-56) amended the National Center for Education Statistics (NCES) Act of 1994 to allow the U.S. Attorney General (or an assistant attorney general) to apply to a court to obtain any “reports, records, and information (including individually identifiable information) in the possession” of NCES that are considered relevant to an authorized investigation or prosecution of domestic or international terrorism. Section 508 also removed penalties for NCES employees who furnish individual records under this section. This exception for NCES has never been invoked, and its possible effect on survey response rates has not been tested, but its existence works against the mission of statistical agencies and the trust they must maintain with data providers.
77 For the Census Bureau, such legislation was first enacted in 1929 in Title 13 of the U.S. Code; for the National Agricultural Statistics Service, such provisions are in Title 7.
79 CIPSEA was preceded by a Federal Statistical Confidentiality Order, issued by OMB on June 27, 1997. It told respondents who provide statistical information to specified agencies that their responses would be held in confidence and would not be used against them in any government action “unless otherwise compelled by law” (see Appendix A).
Both the perception and the reality of agencies’ confidentiality protection may be affected by departmental initiatives to consolidate data processing and storage, to bolster computer and network security in the federal government, to improve the cost-effectiveness of information technology development and maintenance, and to protect against cyberattacks. Such initiatives are required by, respectively, the 2002 Federal Information Security Management Act, the 2014 Federal Information Technology and Acquisition Reform Act, and the 2015 Federal Cybersecurity Enhancement Act (see Practice 2 and Appendix A). An effective statistical agency will work with its department on approaches to computer security that recognize the need for the agency to control the processing and storage of data collected for statistical purposes under a pledge that the data will not be accessible for other departmental purposes, such as regulation or enforcement.
CONFIDENTIALITY AND DATA ACCESS
Although confidentiality protection is essential for gaining and keeping trust with data providers, a statistical agency’s fundamental mission is to disseminate information widely. Consequently, there is a tension between the goals of protection and access (see Practice 5). Agencies cannot guarantee zero risk of disclosure for public-use products. And even if all use were restricted to secure enclaves (which would not be desirable), there would still be the risk that an employee or agent might breach confidentiality inadvertently (or advertently) or that an agency’s computer systems could be hacked. The challenge to statistical agencies is to devise appropriate methods and procedures to minimize the disclosure risk and to continually improve on methods as the threats to confidentiality change.
For more than 60 years, computerized data processing has enabled statistical agencies to make available a large volume of public-use products, including detailed tabulations and public-use microdata samples that are safeguarded against disclosure by such basic methods as suppression of small cells and removal of obvious identifiers. The advent of distributed computing and the Internet greatly increased, and continues to increase, the disclosure risks for such products because of the potential for reidentifying individual respondents through data linkage with the vast amounts of possibly related information on the web. Risks are increased when data for small groups are tabulated, when the same data are tabulated in a variety of ways, or when public-use microdata samples are released with highly detailed content, particularly when surveys are longitudinal and follow the same respondents over time. Risks are also increased when surveys include linked administrative data or collect biomarkers from blood samples or other physiological measures, as is increasingly being done (National Research Council, 2001a, 2010b).
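The two basic safeguards mentioned above, suppressing small cells and removing obvious identifiers, can be sketched as follows. This is a minimal illustration only: the threshold of 5 and the field names are hypothetical, not any agency's actual disclosure rule.

```python
# Sketch of two basic disclosure-avoidance steps:
# (1) primary suppression of small table cells,
# (2) removal of direct identifiers from microdata records.
# The threshold and field names are illustrative assumptions.

SUPPRESSION_THRESHOLD = 5  # cells with fewer respondents are withheld

def suppress_small_cells(table):
    """Replace counts below the threshold with None (i.e., 'withheld')."""
    return {
        cell: (count if count >= SUPPRESSION_THRESHOLD else None)
        for cell, count in table.items()
    }

DIRECT_IDENTIFIERS = {"name", "ssn", "address", "phone"}

def strip_identifiers(record):
    """Drop fields that directly identify a respondent."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

table = {("county_A", "industry_X"): 42, ("county_A", "industry_Y"): 3}
print(suppress_small_cells(table))  # the second cell is withheld (3 < 5)
```

In practice, primary suppression alone is insufficient: agencies must also apply complementary suppression so that withheld cells cannot be recovered by subtraction from published marginal totals.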
Statistical agencies have responded to increased disclosure risks by pioneering not only more sophisticated techniques to further protect public-use products, but also procedures that restrict access (e.g., through secure enclaves) to qualified researchers and other agents when a public-use product has been deemed too risky to release (see Practice 5).80 However, agencies have yet to develop formal approaches to these ever more difficult challenges. Such work can benefit from close attention to the work of computer scientists, who are developing conceptual frameworks for assessing disclosure risk, along with sophisticated privacy-protective techniques for modifying datasets so as to preserve analytic capabilities under a given privacy guarantee. Although these techniques are not fully mature, they offer a more structured and less ad hoc way to measure the effectiveness of alternative disclosure risk reduction procedures and to implement them as appropriate (see National Academies of Sciences, Engineering, and Medicine, 2017b:Ch. 5).
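The best-known example of such a formal privacy guarantee is differential privacy. The following is a minimal sketch of its Laplace mechanism applied to a simple count query; the epsilon value, data, and query are purely illustrative, and production systems use far more careful noise generation and budget accounting.

```python
# Sketch of the Laplace mechanism from differential privacy:
# a count query has sensitivity 1 (one person changes the count by at
# most 1), so adding Laplace noise with scale 1/epsilon yields an
# epsilon-differentially-private release. Illustrative only.
import math
import random

def laplace_noise(scale):
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5          # uniform on (-0.5, 0.5)
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def dp_count(records, predicate, epsilon):
    """Release a noisy count satisfying epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(0)  # seeded only to make the illustration reproducible
records = [{"employed": True}, {"employed": True},
           {"employed": True}, {"employed": False}]
noisy = dp_count(records, lambda r: r["employed"], epsilon=1.0)
print(noisy)  # a value near the true count of 3
```

Smaller epsilon means more noise and a stronger privacy guarantee; the analytic cost of that noise is exactly the protection-versus-access tension described in this section.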
80 For reviews of alternative procedures for providing data access while maintaining confidentiality protection, see National Academies of Sciences, Engineering, and Medicine (2017b:Ch. 5) and National Research Council (1993b, 2000a, 2003b, 2005b, 2009d, 2010f).