4
Data Users

Please, sir, I want some more.

Charles Dickens, Oliver Twist

Statistical agencies serve as intermediaries and mediators between data providers and data users. In Chapter 3 we asserted that the federal statistical system could not function without the cooperation of data providers. It is equally true that the system would be in serious trouble if it was not able to satisfy, to a reasonable degree, the needs of a broad array of users inside and outside government. Failing this, it would not be fulfilling its purpose and would not deserve to be supported with public funds. Data users would have to place greater reliance on commercial data sources, which could have potentially undesirable consequences for the scope, cost, and quality of data available to them.

In addition, feedback from a wide data user community is essential for maintaining and improving the quality of federal statistics. It is an effective way to find errors and uncover anomalies in the data and to assess data quality.

In this chapter, we examine the relationships between federal statistical agencies and the persons and organizations that use their statistical products. The chapter has three sections. In the first, we set the stage by defining basic concepts associated with data access. Next, we consider the expectations that data users have about access to federal statistics and the extent to which those expectations are being and should be met. Finally, we examine the ethical and legal responsibilities of data users. At present, users' formal responsibilities extend almost entirely to the statistical agencies, but in a broader sense data users also have ethical and pragmatic responsibilities to data subjects and to society at large.







Private Lives and Public Policies: Confidentiality and Accessibility of Government Statistics

BASIC CONCEPTS RELATED TO DATA ACCESS

The data products of statistical agencies are released to other agencies, organizations, and individuals for statistical and research uses. These uses include end uses, such as policy analysis, commercial and academic research, advocacy, and various educational applications in the private and public sectors. They also include intermediate uses, such as the development of sampling frames for surveys, the enhancement of existing data sets by adding information from other sources, and the evaluation of data quality by comparing aggregate data or individual records (microdata) from different sources.

Most releases of publicly collected data for intermediate statistical uses are to other government agencies and their contractors or grantees. The spectrum of end users is very broad, however, especially in the private sector. In government, it includes policymakers, policy analysts, program planners and evaluators, researchers, and educators. In the private sector, it includes all of the foregoing, plus advocacy groups, market analysts, and the media.

Depending on users' requirements, the products released to them may consist of aggregate data or individual records, with or without explicit identifiers, such as name, address, and Social Security or Employer Identification number. Media for release may be hard copy or electronic, and the latter may assume various forms, such as tapes, diskettes, CD-ROM, or direct on-line transmission. On-line access may be in the form of transmission of complete data sets or outputs resulting from queries of in-house data sets.

A significant attribute of data access, the term for release when seen from the user's side, is whether access is unrestricted or restricted.
We consider data access to be unrestricted if aggregate data or microdata are available to anyone who wants them (and is willing to pay any user fees that may be required), without restrictions or conditions of any kind on the uses to be made of the data. Access is restricted whenever any conditions on use are imposed.

Restricted access is important for two reasons. The first relates to end uses of the data. In preparing data for unrestricted access, agencies may have to limit severely the amount of information included in statistical summaries or individual records in order to comply with requirements to preserve the confidentiality of individually identifiable information. Such limitations may prevent users from performing the analyses best suited to their needs. As a result, returns from the investment of public funds in data collection and processing may not always be as great as they otherwise would be. The use of various forms of restricted access can overcome some of the limitations by allowing access to more detailed data, under controlled conditions, for selected users outside the producing agency.

It is also important that there be data sharing for intermediate statistical uses. In the decentralized federal statistical system, in order to improve the quality and consistency of information collected by different agencies and to avoid costly duplication of effort, data are sometimes shared among statistical agencies. Such data sharing cannot be unrestricted, however, because it usually requires access to individually identifiable information.

EXPECTATIONS OF DATA USERS

BACKGROUND

Increasing Demands for Access

Demand for access to federal statistical data increased at an extraordinary rate in the 1980s and will surely continue to do so in the 1990s. This demand has been fueled by the development of powerful, widely accessible computers and sophisticated analytic software, improved data transmission capability, the creation of large-scale administrative data sets with numerical identifiers, and favorable user experience with public-use microdata sets based on statistical and administrative data collections.

Some of the demand for access is being satisfied. Data users have access to federal data that they could scarcely have imagined 50 years ago. Nevertheless, potential users inside and outside government continue to assert that they have important needs that are not being met. Are they like spoiled children, never satisfied with what they receive, always wanting more, or is there some legitimacy to their complaints?
Are statistical agencies being sufficiently receptive to user needs, or could they do more? Below we summarize the evidence that we have obtained and draw some conclusions on these complicated questions.

Meeting User Needs: Successes and Failures

The picture over the past two decades is mixed. Overall, there can be little doubt that user access to data has increased, but so has demand. Some new restrictions have also been imposed. To convey the flavor of developments during this period and a sense of the current situation, we present examples of movement in both directions. (These examples apply only to data on persons. User access to data on businesses and other organizations is discussed in Chapter 7.)

Following are some developments that signify a potential for greater user access:

substantial growth in the use of public-use microdata sets issued by federal statistical agencies;
increased in-house access to more detailed microdata files for researchers participating in the American Statistical Association/National Science Foundation (ASA/NSF) fellows program and other similar arrangements; in addition, some statistical units, like the National Agricultural Statistics Service and the Statistics of Income Division of the Internal Revenue Service, have made arrangements for researchers to use such data under controlled conditions in the unit's state or regional offices;
increased use of formal user licensing agreements by government agencies and nongovernment organizations that have developed important research data bases with federal funding (these agreements provide for access, at the user's work site, to detailed microdata sets under controlled conditions);
initiation of the release, by the National Center for Education Statistics (NCES), of encrypted microdata files in CD-ROM format with built-in software for analysis (this mode of release may permit more widespread dissemination of microdata sets that link survey and administrative data on persons);
the undertaking of research to explore the possibility of releasing microdata sets that do not contain data for any specific individual but would allow users to draw valid inferences from the data (see Chapter 6 and Rubin, 1993, for details); and
increased use of formal agreements for interagency sharing of identifiable data about persons for statistical and research purposes. Until recently, for example, the Bureau of Labor Statistics (BLS), the primary funding agency for the Census Bureau's Current Population Survey, had access only to tabulations and public-use microdata files from the survey. However, the Census Bureau and BLS have recently concluded an interagency agreement that allows the latter to have access, for use in special analyses and methodological research, to nonpublic microdata from the Current Population Survey. (Chapter 6 provides details on this and other examples.)

On the other side of the ledger, the following are instances in which attempts to arrange for interagency sharing of data or access by nongovernment users for legitimate and often potentially important statistical uses have not succeeded:

Gates (1988) provides several examples of user requests for Census Bureau demographic microdata that were denied because of confidentiality concerns. The topics that the users had hoped to study included a disparity in Social Security benefits between adjacent cohorts of retirees, the economic well-being of persons living outside metropolitan areas, racial segregation in the United States, and the outcomes of the Selective Service draft lotteries held in the 1970s.
The Tax Reform Act of 1976 ended presidential authority to allow access to tax return information through executive orders (Wilson and Smith, 1983). The act specified which organizations could have access to tax return data and for what purposes. It created new barriers to access, for statistical and research purposes, to employee earnings data and other tax return information, such as taxpayers' current addresses. Among the consequences of this legislation have been (1) almost complete denial of external access to the Social Security Administration's (SSA's) Continuous Work History Sample, (2) increased difficulty in tracing study populations in epidemiologic follow-up studies, (3) increased difficulty in developing arrangements for the sharing of lists of businesses among federal statistical agencies, and (4) barriers to the linkage of survey and tax return data for studies of the income distribution of the general population or selected subgroups.
For many years, the National Center for Health Statistics (NCHS) used a sampling frame based on address lists from the decennial census for its National Health Interview Survey (NHIS), a continuing national sample survey on health topics, for which the sample selection and field work are done by the Census Bureau. However, the confidentiality provisions required of the Census Bureau (Title 13 U.S.C.) prevented NCHS from using the sample of households derived from census addresses for other surveys in which data were to be collected for NCHS by private contractors. In the early 1980s, NCHS abandoned the use of decennial census address lists and, at a very substantial cost, switched to a sampling procedure that required independent listing of addresses in sample areas by field workers. The Census Bureau continues to select the sample and, using the new frame, collect the NHIS data for NCHS; because the survey is now conducted under the provisions of Title 15 U.S.C., the Census Bureau is able to share identifiable information with NCHS for the latter's statistical uses in other surveys and analyses. Release of data to other users is governed by NCHS confidentiality requirements, as stated in Title 42 U.S.C.
As a result of recent arrangements between SSA and state vital statistics offices for joint issuance of birth certificates and Social Security numbers, SSA no longer receives race-ethnic information for most births. Information on the race-ethnic status of the parents is recorded on birth certificates, but it is not being made available to SSA. While the immediate consequences are limited, in the longer run this practice will make it difficult or impossible for SSA to analyze the effects of its programs on different racial-ethnic groups and for the Census Bureau to provide reliable intercensal population estimates by race. It will also be a negative factor in considering a possible shift to greater reliance on administrative records in conducting the decennial censuses of population.
Although some researchers are gaining restricted access to microdata sets at agency sites through fellowship programs and similar arrangements, many other potential users of such data sets cannot be accommodated or could work more effectively at their home sites because of the reduced cost and better access to computing facilities. As noted, a few agencies (e.g., NCES) are offering off-site access under licensing agreements, but others either lack the legal authority (especially the Census Bureau) or do not choose to offer such arrangements.
As discussed in Chapter 3, the Office of Management and Budget recently issued a legal opinion that the Census Bureau's data subjects and providers cannot waive their entitlement to confidentiality under Title 13. The ruling caused the National Institute on Aging to abandon efforts to conduct a potentially valuable follow-up study of surviving panel members from the Longitudinal Retirement History Survey, which had been conducted in the 1970s for the SSA by the Census Bureau under the authority of Title 13.

Because of legal restrictions or other concerns about the confidentiality of the data requested, most federal statistical agencies, unlike the Census Bureau, do not systematically document cases in which they have denied data user requests or provided only part of the data requested. If they did, and the panel believes they should, we probably could have presented similar examples from several other agencies.

Factors that Affect Agency Decisions on Access

In deciding what kinds of releases of personal data can and cannot be made, federal statistical agencies must first ensure that they comply with all relevant statutory requirements and honor all pledges made to data subjects and providers. Failure in either respect could seriously compromise their ability to carry out their mission. Many agency officials believe that in addition to adhering to legal requirements and honoring commitments to respondents, they must be perceived by the public as doing so (see, e.g., Butz, 1985b).

On the other hand, to survive, statistical agencies must be perceived by the public and especially by the executive and legislative branches of government as fulfilling their basic mission by providing information that is relevant to important social and economic policy questions and that can be used in ways that benefit society.

Federal statistical agencies must also consider the costs associated with various forms of dissemination. All forms of dissemination have some costs, but restricted access procedures tend to be more resource intensive. Even if users are willing to pay some of the costs, agencies with employment ceilings must give priority to their primary data collection, processing, and dissemination activities, as opposed to, say, undertaking special tabulations on a reimbursable basis, providing staff support and computer services for an ASA/NSF fellow, or making a site visit to a licensed user to ensure that all access conditions are being observed.

User Influence on Agency Decisions

To what extent and by what means are users able to influence agency decisions on access? Few mechanisms are available specifically for this purpose.
The panels that have been established at the Census Bureau and the National Center for Education Statistics to determine how much information can be included in microdata files that are released to nonagency users do not include any representatives of nongovernment data users. The Census Bureau, we understand, has been considering an arrangement whereby outside advisors representing data subjects and data users would review decisions of its Microdata Review Panel twice a year and recommend whatever changes in release policies they consider appropriate.

Users have opportunities (at meetings of the American Statistical Association, the Association for Public Data Users, the Council of Professional Associations on Federal Statistics, and other professional organizations) to discuss dissemination procedures with other users and with agency representatives. Discussions of this kind influenced the Census Bureau's choice of statistical disclosure limitation procedures for summary tape files from the 1990 decennial census. The bureau switched from the cell suppression procedures used in previous censuses, which had caused many problems for users, to a new procedure that did not require any cell suppression. Another topic that is often discussed in such settings is the population size cutoff for the smallest geographic areas that can be identified for individual records in public-use microdata sets. The Census Bureau lowered its general cutoff from 250,000 to 100,000 in 1981, but some users would like to see it lowered further.

A few users who have been denied data they want have requested access to the data under the Freedom of Information Act. However, most regular users of federal data for research and statistical purposes are reluctant to take such a step because of concerns, whether justified or not, that their ongoing dealings with the statistical agencies might be prejudiced.

Another option might be to seek changes in the statutes that restrict access to data. Given the complexity of existing confidentiality statutes and the legislative process in general, it is hard to predict the outcome of attempts to change the rules for data access through changes in legislation. (The confidentiality legislation now in force and the pros and cons of seeking changes are discussed in Chapter 5.)

FINDINGS AND RECOMMENDATIONS

Data Sharing Within Government

A substantial amount of data sharing occurs between agencies for statistical and research purposes.
Nevertheless, some of the laws that govern the confidentiality of statistical data prohibit or severely limit interagency sharing of data collected by some agencies. Laws that control access to administrative records, such as tax returns and earnings records, restrict their use for important statistical applications. As noted by the Council of Economic Advisers (1991), barriers to data sharing for statistical purposes have led to costly duplication of effort, inconsistencies among related data sets, and excessive burden on individuals and organizations who are asked to supply information. They have also made it difficult or impossible to develop data sets needed for policy analysis on important topics, such as trends in income distribution and the long-range consequences of occupational and other environmental exposures to suspected carcinogens.

Recommendation 4.1 Greater opportunities should be available for sharing of explicitly or potentially identifiable personal data among federal agencies for statistical and research purposes, provided the confidentiality of the records can be properly protected and the data cannot be used to make determinations about individual data subjects. Greater access should be permitted to key statistical and administrative data sets for the development of sampling frames and other statistical uses. Additional data sharing should only be undertaken in those instances in which the procedures for collecting the data comply with the panel's recommendations for informed consent or notification (see Recommendations 3.2 and 3.3).

The panel supports the proposal of the Council of Economic Advisers (1991:6) that legislation be developed that would permit "limited sharing of confidential statistical information solely for statistical purposes between statistical agencies under stringent safeguards."

Access to Data by Nongovernment Users

Because of legitimate concerns about the possibility of disclosure of individual information, statistical agencies have limited the amount of detailed data provided to nongovernment users in tabulations and public-use microdata files. This lack of detail limits the ability of users to do research that could contribute to the understanding and resolution of significant economic and social problems.
Some agencies have developed mechanisms for providing access to more detailed information on a restricted basis, but existing arrangements are far from meeting all legitimate needs.

Recommendation 4.2 Federal statistical agencies should seek to improve the access of external users to statistical data, through both legislation and the development and greater use, under carefully controlled conditions, of tested administrative procedures.

We believe that this can be done without sacrificing confidentiality protection for data subjects and providers. As discussed further in Chapter 5, legislation should not pose unnecessary barriers to interagency data sharing for statistical and research purposes. It should extend legal sanctions for violations of confidentiality to include data users who are not employed by the agency that produced the data. Among the administrative procedures, the panel believes that licensing agreements that allow users to analyze data sets at their own work sites are a particularly promising solution.

It is difficult to evaluate the trade-offs between confidentiality of and access to data without better information on the numbers and types of user requests for data that are being denied for confidentiality reasons. Without such information it is hard to define the problems that inhibit data access.

Recommendation 4.3 All federal statistical agencies should establish systematic procedures for capturing information on a continuing basis about user requests for data that have been denied or only partially fulfilled. Such information should be used for periodic reviews of agency confidentiality and data access policies.

Better information on denied requests, however, will not fully reveal the extent to which important uses of data are not made because the data are not accessible. Users who already know that certain data sets are not available are unlikely to make much effort to develop plans for analyses that would require access to such data.
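
Recommendation 4.3 does not prescribe a record format. As a rough illustration only, the Python sketch below shows one way such information might be captured and summarized for a periodic review. The field names, requester categories, reasons, and the four log entries are all hypothetical and invented for this example; none of them comes from the panel or from any agency's actual system.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Outcome(Enum):
    FULFILLED = "fulfilled"
    PARTIAL = "partially fulfilled"
    DENIED = "denied"


@dataclass(frozen=True)
class DataRequest:
    """One user request for data (all fields are hypothetical)."""
    received: date
    requester_type: str   # illustrative category, e.g. "academic"
    dataset: str          # illustrative label, not a real file name
    outcome: Outcome
    reason: str = ""      # why the request was not fully met, if so


def unmet_requests(log):
    """Return requests that were denied or only partially fulfilled."""
    return [r for r in log if r.outcome is not Outcome.FULFILLED]


def summarize_unmet(log):
    """Count unmet requests by (outcome, reason) for a policy review."""
    return Counter((r.outcome.value, r.reason) for r in unmet_requests(log))


# Made-up entries, for illustration only.
log = [
    DataRequest(date(1992, 3, 2), "academic", "survey microdata",
                Outcome.DENIED, "confidentiality"),
    DataRequest(date(1992, 5, 14), "federal agency", "address list",
                Outcome.PARTIAL, "confidentiality"),
    DataRequest(date(1992, 6, 1), "media", "summary tables",
                Outcome.FULFILLED),
    DataRequest(date(1992, 9, 9), "academic", "survey microdata",
                Outcome.DENIED, "cost"),
]

summary = summarize_unmet(log)
for (outcome, reason), count in sorted(summary.items()):
    print(f"{outcome:20s} {reason:16s} {count}")
```

A review along the lines of Recommendation 4.3 could start from a summary of this kind. A log like this records only requests that were actually made, so it cannot capture analyses that users never proposed because they already knew the data were unavailable.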

LEGAL AND ETHICAL RESPONSIBILITIES OF DATA USERS

BACKGROUND

Legal Responsibilities of Users

Users given restricted access to personal data are asked and sometimes required by law or written contracts to comply with various conditions relating to their use and disposition of the data. Such conditions may include the following:

use of the data solely for purposes specified in an access agreement,
use of the data only at specified physical locations,
allowing access only to specified individuals,
observing limitations on further releases of the initial data set or the products produced from it,
observing provisions for eventual disposition of the data and outputs,
application of statistical disclosure limitation techniques and physical security measures to protect the data from unintended disclosure, and
execution of a written agreement between the releasing agency and the recipient and formal extension of the agreement to all individuals who will have access to the data.

In some instances users who violate conditions associated with their access to the data are subject to criminal or civil penalties.

The features of arrangements for providing restricted data access vary according to the uses that will be made of the data. Intermediate statistical uses generally involve transfer of data with identifiers from one agency to another, based on formal interagency agreements covering all or most of the conditions of use listed above. A basic principle of such agreements is that there must be full compliance with the legal confidentiality requirements of both agencies at all times. For releases of Census Bureau data, this is accomplished by having persons in the receiving agency who will have access to identifiable data take oaths as special sworn employees of the Census Bureau, which makes them subject to the same penalties as regular Census Bureau employees for any improper release or disclosure of identifiable information.

The conditions associated with restricted access for end uses and the penalties for violations are much more varied, especially when the recipients of the data are not other federal agencies.
Among the more restrictive conditions are those associated with access obtained by individual researchers under an ASA/NSF fellows program that has been in operation since 1978. For research requiring access to potentially identifiable records, fellows must usually work on-site at one of the federal agencies that participate in the program, and they are subject to the same penalties as regular agency employees for violation of confidentiality requirements. In addition, ASA/NSF fellows usually have access to the data only for the term of their appointment.

Less restrictive procedures are used by the National Center for Health Statistics (NCHS) in its release of microdata files based on the National Health Interview Survey and other surveys. Prior to release of the file, which can be used at the recipient's work site, NCHS applies standard statistical disclosure limitation techniques to the files, but it recognizes the impossibility of reducing the risk of disclosure of individually identifiable information to zero. Thus, each recipient is required to sign a document (see Figure 4.1) indicating acceptance of restrictions on how the data will be used and on re-release of the file to other organizations. Recipients must agree not to attempt to identify individual units in the file and, if a unit should be inadvertently identified, notify the NCHS and refrain from disclosing the discovered identity to others. If NCHS learns that the agreement has been violated in any way, the user responsible would probably be denied further access to NCHS microdata files.

The NCHS has recently begun to release some of its microdata files on CD-ROMs, without requiring a signed data release agreement. Whenever the CD-ROM is used, the user is presented with an on-screen notification of the data restrictions, which are essentially the same as those shown in Figure 4.1. This screen cannot be bypassed and the striking of a specific key is taken as the indication that the user has read and agreed to the restrictions and recognizes the penalties (National Center for Health Statistics, 1992a).

The National Center for Education Statistics is experimenting with a form of restricted access that occupies an intermediate position with respect to conditions of use. Under a licensing agreement, researchers in organizations and agencies are allowed to use confidential NCES data at their own sites for statistical purposes.
The agreement incorporates several provisions designed to maintain the confidentiality and physical security of the data, including agreement by the users to be subject to unannounced inspections of their work site. All persons who will have access to the data must sign affidavits of nondisclosure and are subject to severe penalties for violation of their oath.

Licensing agreements are also being used by university-based research organizations to allow restricted access to detailed microdata sets based on federally funded surveys. Jabine (1993a) provides two examples, one relating to the Panel Study of Income Dynamics and the other to the National Longitudinal Survey of Youth. The conditions are similar to but somewhat less rigorous than those of the NCES licensing agreement just described. One of the organizations involved includes a hidden unique identifier in each of the data sets it releases so that it would be able to determine the origin of any copy found in unauthorized hands. Both of the agreements carry financial penalties for violations.

DATA USE AGREEMENT

The Public Health Service Act (42 U.S.C. 242m(d)) provides that the data collected by the National Center for Health Statistics (NCHS) may be used only for the purpose for which they were obtained; any effort to determine the identity of any reported cases, or to use the information for any purpose other than for health statistical reporting and analysis, would violate this statutory restriction and the conditions of the data use agreement. NCHS does all it can to assure that the identity of data subjects cannot be disclosed; all direct identifiers, as well as characteristics that might lead to identifications, are omitted from the data set. Nevertheless, it may be possible in rare instances, through complex analysis and with outside information, to ascertain from the data sets the identity of particular persons or establishments. Considerable harm could ensue if this were done.

Therefore, the undersigned gives the following assurances with respect to all NCHS data sets:

I will not use nor permit others to use the data in these sets in any way except for statistical reporting and analysis;
I will not release nor permit others to release the data sets or any part of them to any person who is not a member of this organization, except with the approval of NCHS;
I will not attempt to link nor permit others to attempt to link the data set with individually identifiable records from any other NCHS or non-NCHS data set.
If the identity of any person or establishment should be discovered inadvertently, then (a) no use will be made of this knowledge, (b) the Director of NCHS will be advised of the incident, (c) the information that would identify an individual or establishment will be safeguarded or destroyed as requested by NCHS, and (d) no one else will be informed of the discovered identity.

My signature indicates my agreement to comply with the above-stated statutorily-based requirements with the knowledge that deliberately making a false statement in any matter within the jurisdiction of any department or agency of the Federal Government violates 18 U.S.C. 1001 and is punishable by a fine of up to $10,000 or up to 5 years in prison.

Signed: ____________________ Date: __________
Print or Type Name: ____________________
Title: ____________________
Organization: ____________________
Address: ____________________
City: ____________________ State: ______ Zip: __________
Phone Number: ____________________

FIGURE 4.1 Data use agreement by the National Center for Health Statistics.

This brief account portrays some of the steps that have been taken in recent years to accommodate the needs of users for data sets that are too detailed to release without any conditions of use attached to them. The sentiment is growing among government statisticians that producers and users of statistical data should share responsibility for adherence to the confidentiality pledges given to data subjects and providers and that violations of that understanding by users, like violations by agency staff, should be subject to penalties. At present, the nature of the penalties and the legal authority for assessment of penalties on users varies, and there does not appear to be a consensus on what would be most effective.

The idea that data users should be held legally responsible for the confidentiality of the data to which they have access is not new; it was advanced by the ASA's Ad Hoc Committee on Privacy and Confidentiality in its 1977 report. The committee encouraged the dissemination of microdata sets by statistical agencies and said that they should be released without restrictions or conditions, provided all explicit identifiers had been removed and "it is virtually certain that no recipients can identify specific individuals in the file" (American Statistical Association, 1977:75). The committee recommended that data sets not meeting the second requirement be made accessible for research and statistical uses only if a specific set of conditions on use were agreed to in advance, in writing, by the recipient.
They recommended further that the releasing agency assume responsibility for ensuring that users are observing these conditions and that "both the recipient and the agency staff are subject to enforceable penalties for failure to observe the agreed conditions of use" (p. 75).

A less detailed recommendation about release of microdata files was included in the Bellagio principles, which were developed at a 1977 international conference on Privacy, Confidentiality, and the Use of Government Microdata for Research and Statistical Purposes (see Flaherty, 1978). The Bellagio principles support widespread dissemination of data, including microdata, for research and statistical purposes, subject to confidentiality controls. A specific recommendation was that users of microdata should be required to agree in writing to protect the confidentiality of the files. No reference was made, however, to sanctions for users who violate confidentiality requirements.

Ethical Responsibilities of Users

In a search for material related to the obligations and responsibilities of external users of data, that is, users not directly associated with the agencies or organizations that developed the data sets, the panel reviewed guidelines and standards established by several professional associations and other groups. As described in Chapter 3, the American Statistical Association and the International Statistical Institute have developed detailed guidelines covering the responsibilities of agencies and organizations that collect data and, in some instances, use their own data for analyses. However, we found that those and other guidelines of professional societies had relatively little to say about the ethical obligations of those who use data provided by other organizations.

One exception comes from the Bellagio principles. Principle 14 (Flaherty, 1978:277) reads as follows:

"Professional or national organizations should have codes of ethics for their disciplines concerning the utilization of individual data for research and statistical purposes. Such ethical codes should furnish mutually agreeable standards of behavior governing relations between providers and users of governmental data."

The report of the ASA's Ad Hoc Committee on Privacy and Confidentiality recommended that "training in ethical standards and in privacy safeguards should be incorporated into the statistics and survey research curricula at colleges and universities" (American Statistical Association, 1977:76). This would be a useful step, but to reach a larger part of the ever-growing population of external users one would need to ensure the inclusion of similar training in curriculums for sociology, economics, and other disciplines whose members are likely to be secondary users of federal and other data sets.
What appears to be an unnecessarily rigorous requirement for data access is included in the ASA's Ethical Guidelines for Statistical Practice. A recommendation to data collectors was to ensure that, whenever data are transferred to other persons or organizations, this transfer conforms with the established confidentiality pledges; and require written assurance from the recipients of the data that the measures employed to protect confidentiality will be at least equal to those originally pledged (American Statistical Association, 1989:24). Since the term "data" was not defined, this recommendation seems to suggest that no data, whether in aggregate or microdata form, with or without identifiers, should be transferred for any purpose without obtaining written assurance from recipients that they will protect confidentiality. We suspect this was not really the intent of the recommendation. Perhaps the committee meant the recommendation to apply only to data in explicitly or potentially identifiable form. If this supposition is correct, the committee's position and the panel's recommendations below would be consistent.

FINDINGS AND RECOMMENDATIONS

Secondary data users tend to be impatient with restrictions that inhibit access to data they want for their research, and they may occasionally attempt to circumvent such restrictions. The panel strongly urges all data users to recognize that the continued ability of federal statistical agencies to meet user needs depends on scrupulous observance of confidentiality pledges given to data subjects and data providers. Users must be willing to accept responsibility for appropriate use of data entrusted to them. They must abide by agency-imposed confidentiality requirements that include features such as licensing agreements, bonding, access only at authorized sites, and legal sanctions for failure to abide by agreed conditions of use. If they believe that restrictions on access to some kinds of data are unnecessarily stringent, the ethical course is to work through existing institutions to change the rules, not to seek ways to circumvent the rules.

Recommendation 4.4 All users of federal data, regardless of the formal conditions of access, should subscribe to the following principles for responsible data use:

Conscientiously observe all conditions agreed to in order to obtain access to the data.
Allow access to the original data set only by those permitted access under the agreed conditions of recipiency and ensure that all such persons are aware of the required conditions of use.
Make no attempt to identify particular individuals or other units whose data are considered to be confidential.
In the event that one or more individuals or other units are identified in the course of research, notify the organization that provided the data set, and do not inform anyone else of the discovered identities.

Recommendation 4.5 To promote knowledge of and adherence to the principles of responsible data use,

Federal statistical agencies should ask all recipients of federal microdata sets to submit to the releasing agency, in writing, their agreement to observe the above principles, plus any other conditions deemed necessary for specific data sets.
Professional societies and associations that have ethical codes, standards, or guidelines should incorporate these principles in them.
The principles and the justifications for them should be included in academic and other training for disciplines whose members are likely to be users of federal statistical data.

Other potentially relevant types of controls are in place in universities and in some government settings, such as the National Institutes of Health. For example, investigators planning research involving human subjects must have their research proposals approved by the local institutional review board. A more recent development is the establishment of procedures for promoting scientific integrity and investigating allegations of fraud and misconduct in university-based research. Such oversight mechanisms, with suitable definition of their scope to cover research uses of federal data sets, could serve to reassure custodians of federal data that adequate controls are in place to monitor compliance with data protection rules and regulations by users in the research community. The panel applauds the development of local institutional mechanisms to promote ethical behavior in the research community and hopes that they will include fair statistical information practices and compliance with the data protection requirements of statistical agencies among their concerns.
