
5

Privacy and Confidentiality

INTRODUCTION

When conducting analyses of personnel data, it is essential that privacy, confidentiality, and fairness be treated as primary considerations rather than, as is too often the case, left as an afterthought secondary to an analysis’s findings. This chapter provides a brief overview of relevant privacy issues. The committee’s findings and recommendations are given in Chapter 7.

As a way to establish procedures for and ensure compliance with federal privacy laws, regulations, and policies, the Deputy Assistant Secretary of Defense for Health Readiness Policy and Oversight administers the Human Research Protection Program for the Office of the Under Secretary of Defense (Personnel & Readiness), referred to throughout the report as P&R.1 Federal regulations require that ethical guidelines2 apply when humans are used as research subjects and that the level of risk be proportionate to the potential benefit for these subjects.

___________________

1 The following laws, regulations, and policies govern the Human Research Protection Program: (1) 32 CFR 219, “Protection of Human Research Subjects”; (2) 10 USC 980, “Limitation on use of humans as experimental subjects,” which provides additional requirements for obtaining informed consent; (3) 48 CFR Parts 207, 235, and 252, which address requirements for the protection of human subjects involved in research conducted under contracts; and (4) DoDI 3216.02, “Protection of Human Subjects and Adherence to Ethical Standards in DoD-Supported Research.”

2 These ethical guidelines, established by the Department of Health, Education, and Welfare’s Belmont Report in 1979 and codified into federal regulation in 1991, are based on three core principles: “respect for persons, beneficence, and justice.”


Furthermore, the level of review and oversight should be commensurate with the level of risk associated with the research. The current privacy and confidentiality protections in place for government databases rest heavily on Institutional Review Board (IRB) supervision; contracts, access control, and de-identification are some of the tools at the board’s disposal. For example, the Person-Event Data Environment (PDE), implemented by the Defense Manpower Data Center (DMDC) and the Army Analytics Group in 2006, is regulated by the Army Human Research Protection Office; researchers are required to apply for access and explain the analyses they plan to conduct. The data are de-identified by removal of 16 of the 18 fields specified in the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision, and individuals are assigned long-lived, randomly generated 12-character alphanumeric identifiers. For individual studies, secondary randomly generated identifiers are created. The mapping between an individual’s long-lived random identifier and the study-specific random identifier is destroyed either immediately or as determined by the nature of the study and IRB rulings. Data are analyzed in a protected environment accessible only to authorized users, where information flow is monitored and in many cases prohibited.3
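To make the PDE’s two-level identifier scheme concrete, the following is a minimal sketch of that pseudonymization pattern in Python. It is illustrative only, not the PDE’s actual implementation; the record keys, helper names, and placeholder values are assumptions.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits

def new_identifier(length: int = 12) -> str:
    """Generate a random alphanumeric pseudonym (12 characters by default)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

# Long-lived pseudonyms replace direct identifiers across the enclave.
# The source record key below is invented for illustration.
long_lived = {"person-record-001": new_identifier()}

# Each study receives its own secondary identifiers; only this mapping
# links study records back to the long-lived pseudonyms.
study_map = {pid: new_identifier() for pid in long_lived.values()}
study_records = {study_map[pid]: {"outcome": "..."} for pid in long_lived.values()}

# Destroying the mapping severs the link between study records and
# long-lived identifiers, mirroring the destruction step described above.
del study_map
```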

However, questions remain as to whether these protections are adequate, whether they interfere with legitimate use, and whether protections of this sort can be extended to less traditional forms of data—such as e-mail, search histories, or Facebook and Twitter postings—to enable the harnessing of these nontraditional data to address traditional P&R concerns. Questions of this type cannot be answered in a vacuum; the answers may depend on who “owns” the data of the individual in question (e.g., military personnel, their civilian spouses, their children, military retirees, reservists, the individuals with whom they communicate). The answers may also depend on the use to which the results of analyses will be put: for example, to inform policy, to be published, or to identify personnel in distress. Even in these cases, questions arise about whose information may be released and to whom. For example, an e-mail monitoring system might be used to inform individuals that their writings suggest they may need professional help, which has fewer privacy implications because no third party is involved; or, instead, it might be used to alert a commander to a distressed recruit. Privacy—even for a single data analysis task—is not a one-size-fits-all problem.

LIMITATIONS OF PRIVACY-PRESERVING APPROACHES

The Fair Information Practice Principles (FIPPs), set forth in the 1973 U.S. Department of Health, Education, and Welfare report Records, Computers, and the Rights of Citizens, form the bedrock of modern protection regimes in both national and international law, such as the Privacy Act of 1974, which regulates the federal government’s maintenance, collection, use, and dissemination of personal information in systems of records (Teufel, 2008; NSTIC, 2011).4 Among these principles, for example, are the right to know what data are collected and how they are used; a right to object to some uses and to correct inaccurate information; and an obligation of the collecting organization to ensure that the data are reliable and are kept secure.

___________________

3 However, data may be imported and cross-referenced with government data sets, enabling linkage attacks.



Implicit in the FIPPs, the HIPAA Safe Harbor provisions, and the de-identification procedures carried out by the PDE is a separation between personally identifiable information (PII) and other kinds of information. The U.S. General Services Administration (GSA, 2014) defines personally identifiable information as follows:

Information about a person that contains some unique identifier, including but not limited to name or Social Security Number, from which the identity of the person can be determined. . . . The definition of PII is not anchored to any single category of information or technology. Rather, it requires a case-by-case assessment of the specific risk that an individual can be identified. In performing this assessment, it is important for an agency to recognize that non-PII can become PII whenever additional information is made publicly available—in any medium and from any source—that, when combined with other available information, could be used to identify an individual.

Such a distinction between personally identifiable information and other kinds of information is now widely considered questionable, according to the May 2014 report of the President’s Council of Advisors on Science and Technology (PCAST):

Some techniques for privacy protection that have seemed encouraging in the past are useful as supplementary ways to reduce privacy risk, but do not now seem sufficiently robust to be a dependable basis for privacy protection where big data is concerned. For a variety of reasons, PCAST judges anonymization, data deletion, and distinguishing data from meta-data . . . to be in this category. . . . Anonymization is increasingly easily defeated by the very techniques that are being developed for many legitimate applications of big data.


___________________

4 As a result, FIPPs became the foundation for privacy policy principles at the Department of Homeland Security (DHS).


Today, ample evidence exists of re-identification of apparently de-identified data, through methods such as multiple queries on a single database or linking the data in one database with those in another, often publicly available online (Narayanan and Shmatikov, 2008).5 Moreover, measures commonly taken to de-identify personal data, such as scrubbing or aggregating certain data fields, degrade the utility of the data (Brickell and Shmatikov, 2008; Detmar, 2012). Existing methods for aggregating fields, such as k-anonymity, “fail to compose,” meaning that they do not survive multiple instantiations: taken together, the k-anonymizations of multiple overlapping data sets can completely fail to protect privacy (Ganta et al., 2008).
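As a toy illustration of such a linkage attack, the sketch below joins a “de-identified” table to a public one on shared quasi-identifiers. All rows, column names, and values are fabricated for illustration.

```python
import pandas as pd

# A "de-identified" table: direct identifiers removed, but quasi-identifiers
# (zip code, birth date, sex) remain.
health = pd.DataFrame({
    "zip": ["20001", "20002"],
    "birth_date": ["1970-01-01", "1980-02-02"],
    "sex": ["F", "M"],
    "diagnosis": ["asthma", "diabetes"],
})

# A public data set (e.g., a voter roll) carries names alongside the same
# quasi-identifiers.
public = pd.DataFrame({
    "name": ["A. Smith", "B. Jones"],
    "zip": ["20001", "20002"],
    "birth_date": ["1970-01-01", "1980-02-02"],
    "sex": ["F", "M"],
})

# Joining on the quasi-identifiers re-attaches names to diagnoses.
linked = public.merge(health, on=["zip", "birth_date", "sex"])
print(linked[["name", "diagnosis"]])
```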

More generally, aggregate releases such as contingency tables, synthetic data, data visualizations, and multiparty computations, which are apparently not about any individual, can also be problematic: they fail to account for the fact that information derived from explicit information about specific individuals within a data set may have a profound privacy impact. Indeed, theoretical analysis yields a Fundamental Law of Information Recovery stating that “overly accurate” estimates of “too many” statistics can completely destroy privacy (Dinur and Nissim, 2003). For example, forensic analysis of aggregate genomic statistics in genome-wide association studies, coupled with a DNA sample, can leak an individual’s participation in a medical case group (and, consequently, the fact of having been diagnosed with a disease) (Homer et al., 2008).

Security of the data is a crucial concern, as the Office of Personnel Management knows too well in light of the June 2015 data breach that FBI Director James Comey estimated to have affected 18 million individuals (Perez and Prokupecz, 2015). Data should always be encrypted at rest and in flight. To the greatest extent possible, computations should be carried out on encrypted data.6 The techniques of homomorphic encryption and secure multiparty computation, which permit a data analyst to run computations on data without having direct access to the raw data themselves, even when these data are shared among multiple parties, provide the effect of having the data held by a single trusted and trustworthy data curator who will carry out computations as instructed and release the results.
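As a flavor of how computation without direct access to raw data can work, here is a minimal sketch of additive secret sharing, one building block of secure multiparty computation. It is a simplification under stated assumptions (honest parties; secure channels elided); the modulus, party count, and names are arbitrary choices, not a vetted protocol.

```python
import secrets

MODULUS = 2**61 - 1  # arithmetic modulo a large prime (arbitrary choice)

def share(value: int, n_parties: int = 3) -> list[int]:
    """Split a value into n additive shares; any n-1 shares reveal nothing."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

# Three parties each hold one share of every individual's height (inches).
heights = [70, 74, 68]
per_party = list(zip(*(share(h) for h in heights)))

# Each party sums its shares locally; only the recombined total is revealed,
# never any individual's value.
partial_sums = [sum(p) % MODULUS for p in per_party]
total = sum(partial_sums) % MODULUS
print(total)  # 212, the sum of the heights
```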

___________________

5 If the publicly available data set can be downloaded, it can be brought into a data enclave, potentially enabling a data linkage attack.

6 Much is possible even without fully homomorphic encryption, which allows arbitrary computations to be carried out but does not yet enjoy highly efficient implementation. See, for example, Dowlin et al. (2015).


Nonetheless, these techniques do not ensure privacy, in that they do not address the question of what can be safely released. To see the difficulties, suppose we have a perfectly secure system that gives the answers to “counting” questions such as, “How many individuals in the database are over 6 feet tall?” Let’s use our system to answer questions about the House of Representatives and consider these two questions: “How many members of the House of Representatives have the sickle cell trait?” and “How many members of the House, other than the Speaker of the House, have the sickle cell trait?” Neither question seems invasive. But given the exact answers to both of them, the sickle cell status of the Speaker can be determined, which is a clear privacy breach. This breach happens even when everything works exactly as it should: questions are correctly answered, data remain intact, and there is no intrusion into the data set or viewing of raw data.

The example illustrates the Fundamental Law in action: the Speaker’s privacy can be completely destroyed given only two perfectly accurate statistics. The fact that the House of Representatives is large is irrelevant. There is no safety in numbers, and the same attack would work if the data set consisted of all U.S. military personnel, with the Chairman of the Joint Chiefs of Staff playing the role of the Speaker of the House.
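The differencing attack is mechanical enough to fit in a few lines. The sketch below runs the two counting queries from the text against a toy roster; the names and trait values are invented for illustration.

```python
# The two "harmless" counting queries from the text, run against a toy
# roster of trait statuses.
roster = {
    "Speaker": True,    # sickle cell trait status (illustrative)
    "Member A": False,
    "Member B": True,
}

def count_with_trait(records, exclude=None):
    """Exact count of members with the trait, optionally excluding one name."""
    return sum(status for name, status in records.items() if name != exclude)

q1 = count_with_trait(roster)                     # all members
q2 = count_with_trait(roster, exclude="Speaker")  # all but the Speaker

# The difference of two exact answers reveals the Speaker's status.
print("Speaker has the trait:", bool(q1 - q2))
```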

DIFFERENTIAL PRIVACY

Differential privacy, a privacy definition tailored to the statistical analysis of large data sets, together with a large body of algorithmic work on computations that satisfy this definition, addresses this last problem. Very roughly, differential privacy ensures that the outcome of any analysis is essentially equally likely, independent of the presence or absence of the data of any individual (e.g., the Speaker of the House). The likelihood is over random choices made by the differentially private algorithm (e.g., through the addition of judiciously chosen random noise). This guarantee is very strong: It says that even if the analyst knows a complete data set D of n individuals and is given all the data of an (n + 1)st individual x, the analyst cannot determine whether the data set actually in use is D or is D′ = D ∪ {x}, the union of D and x.7
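One standard way to meet this definition for counting queries is the Laplace mechanism: add noise scaled to 1/ε to the exact count. The sketch below is a minimal illustration under that standard construction, not a vetted production mechanism; the function name and example data are assumptions.

```python
import numpy as np

def dp_count(data, predicate, epsilon, rng=None):
    """Return an epsilon-differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (one person's record changes the true
    count by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    rng = rng or np.random.default_rng()
    true_count = sum(1 for record in data if predicate(record))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Example: a private answer to "how many individuals are over 72 inches tall?"
heights = [70, 74, 68, 75, 71, 73]
print(dp_count(heights, lambda h: h > 72, epsilon=0.5))
```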

Data sets can teach us that smoking causes cancer, which in turn can lead to a rise in insurance premiums for smokers.8 But this sort of impact will occur whether or not any individual x joins or refrains from joining the data set. Differential privacy ensures that these are the only forms of harm that can arise, disentangling harms that come from “facts of life” (the data set as a whole) from harms that arise as a result of participating in the data set.

___________________

7 Formally, a randomized algorithm M is ε-differentially private if, for all pairs of data sets D, D′ differing in one element and for all possible events S, the probability of S, over the randomness of M, when the data set is D is at most e^ε times the probability of S when the data set is D′, and vice versa. Here ε is a user-specified parameter and is the measure of privacy loss. See Dwork (2011) for an introduction.

8 Of course, the smoker is also helped: Learning that smoking causes cancer may convince the smoker to join a smoking cessation program.


If the same things are learned when the database is D′ as are learned when the database is D, the only things x need reasonably fear are the consequences of what can be learned without his data.

Differential privacy permits the tracking of cumulative privacy loss under composition. Thus, it is possible to combine simple differentially private computational primitives in order to obtain privacy-preserving algorithms for complex computational tasks while minimizing cumulative privacy loss—just as, in traditional algorithm design, simple computational primitives are combined in clever and creative ways to carry out complex computations while minimizing resources of interest such as time, space, and generalization error. Differential privacy degrades accuracy (often necessarily) because, as discussed above, perfectly accurate estimates of only two statistics can destroy privacy. In comparing differentially private algorithms, a “better” algorithm is one that gives better accuracy, uses a smaller sample, or yields better predictive capabilities, and so on, for the same degree of privacy loss. Good differentially private algorithms exist for many standard analytical tasks (Dwork and Roth, 2013).
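Under basic sequential composition, the ε values of successive releases simply add, so cumulative loss can be tracked against a budget. The sketch below assumes that basic composition rule (tighter accounting methods exist) and uses invented names.

```python
from dataclasses import dataclass

@dataclass
class PrivacyAccountant:
    """Tracks cumulative privacy loss under basic sequential composition,
    in which the epsilons of successive releases simply add."""
    budget: float
    spent: float = 0.0

    def charge(self, epsilon: float) -> None:
        if self.spent + epsilon > self.budget:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

# Three releases at epsilon = 0.1 each consume 0.3 of a total budget of 1.0.
accountant = PrivacyAccountant(budget=1.0)
for _ in range(3):
    accountant.charge(0.1)
print(accountant.spent)  # ~0.3 (up to floating-point rounding)
```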

Differential privacy is suited to statistical data analysis, especially when the data sets are large and the trends to be identified are sufficiently strong. It can provide privacy-preserving access to data when, absent this technology, access might otherwise be impermissible. For example, it can allow members of the general public to request a rich class of statistics; it may be viewed as complying with privacy policies that clearly prohibit access to raw data (because the data analyst never obtains such access or, unlike in the case of computing on encrypted data, even the functionality of such access); and it may be a way to permit data to be used for purposes other than those for which they were collected. However, it is not a panacea: it does not defeat the Fundamental Law of Information Recovery, and it is not the right tool for finding a needle in a haystack.

Thus, differential privacy can be used to discover, in a privacy-preserving fashion, what “typical” behavior looks like, but it is the wrong tool for finding the outlier. For example, differentially private sentiment analysis of a corpus of e-mail can safely reveal overall troop sentiment, but it cannot be used to find individuals in great distress.

In addition, the field is only now in transition from theory to practice; individual use cases9 and academic investigations10 are far from yielding a library of differentially private methods, and much research and algorithm engineering remain to be done.

___________________

9 Examples of individual use cases include On the Map, RAPPOR, and Mobility.

10 These include Penn State University’s project Putting Differential Privacy to Use and Harvard University’s project Tools for Sharing Research Data.


On a positive note, differential privacy protects against overfitting and false discovery attributable to adaptive data analysis, in which the second question posed to a data set depends on the answer to the first, and the 100th study undertaken on a data set depends on the outcomes of the first 99 studies of the same set (Dwork et al., 2015a, 2015b).

INSTITUTIONAL REVIEW BOARDS

IRB review has been the preferred approach within DoD to assess and control risk in permitting access to data, and the techniques discussed above are not an alternative to the IRB mechanism but represent some of the tools at its disposal. The key lesson drawn from privacy attacks, both in practice and in theory, is that information flows and combines in mysterious ways. In (pre–World Wide Web) 1980, Denning and Schlörer wrote, “These results show that existing designs for query systems do not adequately prevent disclosure of confidential data by combinatorial inference. . . . [I]t has only been recently that we have begun to understand the myriad of inference techniques that may be used. We are continually finding that compromise is easier than once thought.” These problems are heavily exacerbated in the networked world—and it is precisely the web of information of this networked world that DoD wishes to constructively employ—and when the same or overlapping data sets are analyzed over and over. As a result, the task of the IRB is extremely challenging; tools and standards are needed to assist the boards, together with general principles to be applied.

The following suggestions have been made for sharing human subjects’ data for research: the establishment of Safe Harbor lists specifying contexts in which specific techniques can be used without IRB review, an organizational infrastructure for keeping the list current, and general principles to guide an IRB when a ruling is required (Vadhan, 2011). These suggestions are discussed briefly here, noting that in addition to the Safe Harbor list, the guidance should include a “danger list” of data-sharing methods to be eschewed because they do not provide sufficient protection. Each entry in the Safe Harbor list specifies a class of data sources (e.g., electronic health records not including genomic data), a class of data-sharing methods (e.g., specific collections of aggregations, or interactive mechanisms that achieve a given level of differential privacy), a class of informed consent mechanisms, and a class of potential recipients, as sketched below. IRB review would not be necessary in a context to which one of the Safe Harbor entries applies.
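One way to picture such an entry is as a small structured record. The sketch below is purely illustrative; the field names and example values are assumptions rather than an established schema.

```python
from dataclasses import dataclass

@dataclass
class SafeHarborEntry:
    """One hypothetical entry in a Safe Harbor list (after Vadhan, 2011)."""
    data_sources: list[str]          # class of data sources
    sharing_methods: list[str]       # class of data-sharing methods
    consent_mechanisms: list[str]    # class of informed consent mechanisms
    recipients: list[str]            # class of potential recipients

entry = SafeHarborEntry(
    data_sources=["electronic health records, excluding genomic data"],
    sharing_methods=["interactive queries satisfying 0.5-differential privacy"],
    consent_mechanisms=["broad consent obtained at enrollment"],
    recipients=["credentialed academic researchers"],
)
```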

To remain current, to evolve toward being comprehensive, to accommodate contexts that were not previously anticipated, and to take into account new developments in the scientific community’s constantly evolving understanding of data privacy risks and countermeasures (which may lead to either additions to or deletions from the Safe Harbor list), the Safe Harbor list should be maintained by a periodically convened task force including data privacy experts from computer science, statistics, and law, as well as members of IRBs and researchers who do various kinds of human-subjects research. This review could be conducted under the purview of a body such as the National Center for Health Statistics (NCHS), the National Institute of Standards and Technology (NIST), or another relevant group.

For contexts outside the Safe Harbor list, the following guiding principle should be helpful: “No individual should incur more than a minimal risk of harm from the use of his or her data in computing the values to be released, even when those values are combined with other data that may be reasonably available” (Vadhan, 2011).

In assessing what is “reasonably available,” the IRB should consider the extent to which the revealed values depend uniquely on an individual’s data and the potential harm that may result. An IRB should ask, “Would the proposed data-sharing method be protective if the study consisted of a single individual?” A negative answer is an indication that technical expertise might be needed.

THE FEDERAL STATISTICAL SYSTEM’S LEGAL AND GOVERNANCE POLICY

The Privacy Act of 1974 and the use of IRBs may not be sufficient to control, and more importantly to leverage, the vast amounts of administrative and survey data about the military population that are collected by the Department of Defense and each of the military Services. These data have huge potential for use in cross-sectional and longitudinal analyses; they could provide new insights into the military population as well as spillover benefits for understanding microcosms of populations in the civilian sector. For example, these data would be valuable for addressing questions about how social and organizational factors affect the behavior of individuals and their units (NRC, 2014; NRC, 2015). Unlike the civilian statistical agencies, DoD does not report the data from its data collections in a systematic way that is accessible to DoD or the public, nor does it make these data available to researchers (unless they are under contract to DoD). The urgency for DoD to create an infrastructure and framework for administrative and statistical data collections is heightened by the big data revolution and by increasing recognition of the importance of optimizing the use of existing data sources to improve the efficiency of operations and decision making (DoD, 2014). These issues have also been highlighted throughout this report.

To more fully use its administrative and survey data to create statistical products, and to make these data available to researchers, DoD must engage in privacy conversations that adopt existing approaches to statistical disclosure limitation and adapt new approaches, such as differential privacy. On the civilian side of the economy, privacy policy and governance are evolving in parallel, with a focus on access to data for research. The recommendations in the National Research Council’s Private Lives and Public Policies (NRC, 1993) played a pivotal role in shaping the U.S. federal statistical system’s approach to privacy and to the collection and dissemination of statistical and research data. The big data revolution has changed the privacy discussion, as the availability of digital and administrative data has made massive flows of data available for research and other uses (Keller et al., 2016). DoD is an important source of such data flows for creating internal and external statistical products, and it could benefit from joining the federal statistical system through the creation or designation of one or more DoD statistical units.

A legal framework is necessary to implement this vision of a DoD statistical unit. The first step would be to adopt the privacy and governance structure developed by the Office of Management and Budget (OMB) in coordination with, and implemented across, the federal statistical agencies. Two key products lay the foundation for governing privacy in the federal statistical system: (1) the OMB memorandum “Guidance for Providing and Using Administrative Data for Statistical Purposes” (Burwell, 2014) and (2) the Confidential Information Protection and Statistical Efficiency Act (CIPSEA) (Wallman and Harris-Kojetin, 2004).

The OMB memorandum is directly applicable to DoD and the PDE, and it represents an important step toward implementation of many of the recommendations set forth in this report. The memorandum has four objectives: (1) to encourage leadership to support collaboration and designation of responsibilities across programs and offices that collect statistical and administrative data and to develop data stewardship and data-quality policies; (2) to address legal and policy requirements related to privacy; (3) to discuss best practice tools, including guidance on the use of administrative data for statistical purposes; and (4) to report on progress. Box 5.1 sets forth principles based on those formulated by Burwell (2014) in her memorandum to heads of executive agencies for providing administrative data for statistical purposes, with a focus on repurposing data to minimize reporting burden and to protect privacy.

CIPSEA provides a uniform set of confidentiality protections for information collected by statistical agencies for statistical purposes, while keeping in place the stringent privacy laws governing many agencies. It also provides the mechanism for DoD units to become statistical agencies or units. The law gives agencies a standardized guide to protecting survey participants’ information by (1) obtaining their consent at the time of collection to use their data in statistical data products and (2) not exposing information that would allow identification of a survey respondent. Designation as a statistical agency or unit allows the designee to offer the “platinum pledge” of confidentiality for exclusively statistical purposes when it collects data. Under CIPSEA, only designated agencies and units can appoint “agents,” who may then access the confidential data. CIPSEA is broad in scope: survey, interview, administrative, or other data provided to a statistical agency are protected under the statute, just as are data that individuals report directly to the statistical agency (usually through surveys) (CIPSEA, 2006).


REFERENCES

Brickell, J., and V. Shmatikov. 2008. The cost of privacy: Destruction of data-mining utility in anonymized data publishing. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Burwell, S. 2014. Memorandum for the Heads of Executive Departments and Agencies: Guidance for Providing and Using Administrative Data for Statistical Purposes. M-14-06.

CIPSEA (Confidential Information Protection and Statistical Efficiency Act). 2006. Implementation Guidance for Title V of the E-Government Act. https://www.whitehouse.gov/sites/default/files/omb/assets/omb/inforeg/proposed_cispea_guidance.pdf.

Denning, D., and J. Schlörer. 1980. A fast procedure for finding a tracker in a statistical database. ACM Transactions on Database Systems 5(1):88-102.

Department of Health, Education, and Welfare. 1979. The Belmont Report. http://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/.

Detmar, D. 2012. iDASH Workshop on Biomedical Data Sharing: Ethical, Legal, and Policy Perspectives. https://idash.ucsd.edu/events/workshops/biomedical-data-sharing-ethical-legal-and-policy-perspectives.

Dinur, I., and K. Nissim. 2003. Revealing information while preserving privacy. Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.

DoD (Department of Defense). 2014. “Big Data: Opportunities and Challenges for Human Capital.” White paper. Washington, D.C.

Dowlin, N., R. Gilad-Bachrach, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing. 2015. Manual for Using Homomorphic Encryption for Bioinformatics. Microsoft Research Tech Report MSR-TR-2015-87. http://research.microsoft.com/pubs/258435/ManualHEv2.pdf.

Dwork, C. 2011. A firm foundation for private data analysis. Communications of the ACM 54(1):86-95.

Dwork, C., and A. Roth. 2013. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9(3-4):211-407.

Dwork, C., V. Feldman, M. Hardt, T. Pitassi, O. Reingold, and A. Roth. 2015a. The reusable holdout: Preserving validity in adaptive data analysis. Science 349(6248):636-638.

Dwork, C., V. Feldman, M. Hardt, T. Pitassi, O. Reingold, and A. Roth. 2015b. “Preserving Statistical Validity in Adaptive Data Analysis.” Cornell University Library: arXiv:1411.2664. http://arxiv.org/abs/1411.2664.

Ganta, S.R., S. Kasiviswanathan, and A. Smith. 2008. Composition attacks and auxiliary information in data privacy. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. http://www.cse.psu.edu/~ads22/privacy598/papers/gks08.pdf.

GSA (General Services Administration). 2014. “GSA Rules of Behavior for Handling Personally Identifiable Information (PII).” http://www.gsa.gov/portal/mediaId/199847/fileName/CIO_P21801_GSA_Rules_of_Behavior_for_Handling_Personally_Identifiable_Information_(PII)_(Signed_on_October_29__2014).action.

Homer, N., S. Szelinger, M. Redman, D. Duggan, W. Tembe, J. Muehling, J.V. Pearson, D.A. Stephan, S.F. Nelson, and D.W. Craig. 2008. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics 4(8):e1000167.

Keller, S., S. Shipp, and A. Schroeder. 2016. Does big data change the privacy landscape? A review of the issues. Annual Review of Statistics and Its Application 3:161-180.


Narayanan, A., and V. Shmatikov. 2008. Robust de-anonymization of large sparse datasets. Pp. 111-125 in Proceedings of the 2008 IEEE Symposium on Security and Privacy. doi:10.1109/SP.2008.33.

NRC (National Research Council). 1993. Private Lives and Public Policies: Confidentiality and Accessibility of Government Statistics. Washington, D.C.: National Academy Press.

NRC. 2014. The Context of Military Environments: An Agenda for Basic Research on Social and Organizational Factors Relevant to Small Units. Washington, D.C.: The National Academies Press.

NRC. 2015. Measuring Human Capabilities: An Agenda for Basic Research on the Assessment of Individual and Group Performance Potential for Military Accession. Washington, D.C.: The National Academies Press.

NSTIC (National Strategy for Trusted Identities in Cyberspace). 2011. “Appendix A—Fair Information Practice Principles (FIPPs).” http://www.nist.gov/nstic/NSTIC-FIPPs.pdf.

PCAST (President’s Council of Advisors on Science and Technology). 2014. Big Data and Privacy: A Technological Perspective. https://www.whitehouse.gov/sites/default/files/microsites/ostp/PCAST/pcast_big_data_and_privacy_-_may_2014.pdf.

Perez, E., and S. Prokupecz. 2015. “U.S. Data Hack May Be 4 Times Larger Than the Government Originally Said.” CNN Politics. June 22. http://edition.cnn.com/2015/06/22/politics/opm-hack-18-million/index.html.

Teufel, H. 2008. “Department of Homeland Security Privacy Policy Guidance Memorandum.” http://www.dhs.gov/xlibrary/assets/privacy/privacy_policyguide_2008-01.pdf.

Vadhan, S. 2011. “Comments on Advance Notice of Proposed Rulemaking: Human Subjects Research Protections: Enhancing Protections for Research Subjects and Reducing Burden, Delay, and Ambiguity for Investigators, Docket ID number HHS-OPHS-2011-0005.” http://privacytools.seas.harvard.edu/files/privacytools/files/commonruleanprm.pdf?m=1365005819.

Wallman, K.K., and B.A. Harris-Kojetin. 2004. Implementing the Confidential Information Protection and Statistical Efficiency Act of 2002. Chance 17(3):21-25.
