Chapter 8

Invited Session on Confidentiality

Chair: Laura Zayatz, Bureau of the Census

Authors:

Eleanor Singer and John VanHoewyk, University of Michigan and Stanley Presser, University of Maryland

Arthur B. Kennickell, Federal Reserve Board

Katherine K. Wallman and Jerry L. Coffey, Office of Management and Budget







Record Linkage Techniques—1997: Proceedings of an International Workshop and Exposition


Public Attitudes Toward Data Sharing by Federal Agencies

Eleanor Singer and John VanHoewyk, University of Michigan
Stanley Presser, University of Maryland

Abstract

Very little information exists concerning public attitudes on the topic of data sharing among Federal agencies. The most extensive information prior to 1995 comes from questions on several IRS surveys of taxpayers, from questions in a series of Wisconsin surveys carried out in 1993–95, and from scattered other surveys reviewed by Blair (1995) for the National Academy of Sciences panels. From this review it is clear that the public is not well informed about what data sharing actually entails, nor about the meaning of confidentiality. It seems likely that opinions on this topic are not firmly held and are liable to change depending on other information stipulated in the survey questions as well as on other features of the current social climate. In the spring of 1995, the Survey Research Center at the University of Maryland (JPSM) carried out a random digit dialing (RDD) national survey which was focused on the issue of data sharing. The Maryland survey asked questions designed to probe the public's understanding of the Census Bureau's pledge of confidentiality and their confidence in that pledge. Respondents were also asked how they felt about the Census Bureau's obtaining some information from other government agencies in order to improve the decennial count, reduce burden, and reduce cost. In addition, in an effort to understand responses to the data sharing questions, the survey asked about attitudes toward government and about privacy in general. Then, in the fall of 1996, Westat, Inc. repeated the JPSM survey and, in addition, added a number of split-ballot experiments to permit better understanding of some of the responses to the earlier survey.
This paper examines public attitudes toward the Census Bureau's use of other agencies' administrative records. It analyzes the relationship of demographic characteristics to these attitudes as well as the interrelationship of trust in government, attitudes toward data sharing, and general concerns about privacy. It also reports on trends in attitudes between 1995 and 1996 and on the results of the question-wording experiments imbedded in the 1996 survey. Implications are drawn for potential reactions to increased use of administrative records by the Census Bureau.

Introduction

For a variety of reasons, government agencies are attempting to satisfy some of their needs for information about individuals by linking administrative records which they and other agencies already possess. Some of the reasons for record linkage have to do with more efficient and more economical data collection, others with a desire to reduce the burden on respondents, and still others with a need to improve coverage of the population and the quality of the information obtained.

The technical problems involved in such record linkage are formidable, but they can be defined relatively precisely. More elusive are problems arising both from concerns individuals may have about the confidentiality of their information and from their desire to control the use made of information about them. Thus, public acceptance of data sharing among Federal and state statistical agencies is presumably necessary for effective implementation of such a procedure, but only limited information exists concerning public attitudes on this topic. A year and a half ago, the Joint Program in Survey Methodology (JPSM) at the University of Maryland devoted its practicum survey to examining these issues. The survey asked questions designed to probe the public's understanding of the Census Bureau's pledge of confidentiality and their confidence in that pledge. It also asked how respondents felt about the Census Bureau's obtaining some information from other government agencies in order to improve the decennial count or to reduce its cost. In addition, in an effort to understand responses to the data sharing questions, the survey asked a series of questions about attitudes toward government and about privacy in general. Most of these questions were replicated in a survey carried out by Westat, Inc. in the fall of 1996, a little more than a year after the original survey. The Westat survey asked several other questions in addition—questions designed to answer some puzzles in the original survey, and also to see whether the public was willing to put its money where its mouth was—i.e., to provide social security numbers (SSNs) in order to facilitate data sharing.
Today, I will do four things:

Report on trends in the most significant attitudes probed by both surveys;
Discuss answers to the question about providing social security numbers;
Report on progress in solving the puzzles left by the JPSM survey; and
Discuss the implications of the foregoing for public acceptance of data sharing by Federal agencies.

Description of the Two Surveys

The 1995 JPSM survey was administered between late February and early July to a two-stage Mitofsky-Waksberg random digit dial sample of households in the continental United States. In each household, one respondent over 18 years of age was selected at random using a Kish (1967) procedure. The response rate (interviews divided by the total sample less businesses, nonworking numbers, and numbers that were never answered after a minimum of twenty calls) was 65.0 percent. The nonresponse consisted of 23.4% refusals, 6.5% not-at-home, and 5.1% other (e.g., language other than English and illness). Computer-assisted telephone interviewing was conducted largely by University of Maryland Research Center interviewers, supplemented by graduate students in the JPSM practicum (who had participated in the design of the questionnaire through focus groups, cognitive interviews, and conventional pretests). The total number of completed interviews was 1,443.

The Westat survey (Kerwin and Edwards, 1996) was also conducted with a sample of individuals 18 or older in U.S. households from June 11 to mid-September. The response rate, estimated in the same way as the JPSM sample, was 60.4%[1]. The sample was selected using a list-assisted random digit dial method. One respondent 18 or over was selected at random to be interviewed.
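The response-rate definition quoted above for the JPSM survey can be sketched as a small calculation. This is only an illustration of the stated formula: the paper reports the number of completed interviews (1,443) and the resulting rate, but not the underlying sample counts, so every count other than the interviews is a hypothetical value chosen to be consistent with the reported 65.0 percent. The function name is likewise an assumption.

```python
def response_rate(interviews, total_sample, businesses, nonworking, never_answered):
    """Interviews divided by the sample that remains after removing businesses,
    nonworking numbers, and numbers that were never answered."""
    eligible = total_sample - businesses - nonworking - never_answered
    if eligible <= 0:
        raise ValueError("no eligible numbers left after exclusions")
    return interviews / eligible


# Only the 1,443 completed interviews come from the text. The four counts
# below are HYPOTHETICAL, picked so the result matches the reported rate.
jpsm_rate = response_rate(
    interviews=1443,
    total_sample=4000,   # hypothetical
    businesses=500,      # hypothetical
    nonworking=1000,     # hypothetical
    never_answered=280,  # hypothetical
)
print(f"response rate: {jpsm_rate:.1%}")

# The reported dispositions (percent of the eligible sample) should add to 100.
dispositions = {"interviews": 65.0, "refusals": 23.4, "not at home": 6.5, "other": 5.1}
print(f"dispositions sum to {sum(dispositions.values()):.1f} percent")
```

The second check uses only figures from the text: the interview, refusal, not-at-home, and other categories together account for the whole eligible sample.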

Trends in Public Attitudes Toward Data Sharing

The most significant finding emerging from a comparison of the two surveys was the absence of change with respect to attitudes relating to data sharing. Indeed, if we are right that there has been little change on these matters, the new survey is testimony to the ability to measure attitudes reliably when question wording, context, and procedures are held reasonably constant—even on issues on which the public is not well informed and on which attitudes have not crystallized. In 1996 between 69.3% and 76.1%, depending on the agency, approved of other agencies sharing information from administrative records with the Census Bureau in order to improve the accuracy of the count, compared with 70.2% to 76.1% in 1995[2]. Responses to the Immigration and Naturalization Service, asked about in 1995, and the Food Stamp Office, asked about in 1996, are comparable to those to the Social Security Administration (SSA). Responses are consistently least favorable toward the Internal Revenue Service (IRS).

Westat documents five significant changes (p < .10) among 22 questions asked about the Census Bureau on both surveys. First, there is more awareness of the fact that census data are used to apportion Congress and as a basis for providing aid to communities; but second, there is less awareness that some people are sent the long census form instead of the short form. (Both of these changes make sense in retrospect. In the election year of 1996, apportionment was very much in the news; at the same time, an additional year had elapsed since census forms, long or short, had been sent to anyone.) Third, fewer people in 1996 than 1995 said that the five questions asked on the census short form are an invasion of privacy—a finding at odds with others, reported below, which suggest increasing sensitivity to privacy issues between the two years.
This issue will be examined again in the 1997 survey. Fourth, there was a modest increase in the strength with which people opposed data sharing by the IRS. This finding (not replicated with the item about data sharing by SSA) may have less to do with data sharing than with increased hostility toward the IRS. These changes are mostly on the order of a few percentage points. Finally, among the minority who thought other agencies could not get identifiable Census data there was a substantial decline in certainty, although the numbers of respondents being compared are very small.

Trends in Attitudes Toward Privacy

In contrast with attitudes toward data sharing and the Census Bureau, which showed virtually no change between 1995 and 1996, most questions about privacy and alienation from government showed significant change, all in the direction of more concern about privacy and more alienation from government. The relevant data are shown in Table 1. There was a significant decrease in the percentage agreeing that “people's rights to privacy are well protected” and an insignificant increase in the percentage agreeing that “people have lost all control over how personal information about them is used.” At the same time, there was a significant decline in the percentage disagreeing with the statement, “People like me don't have any say about what the government does,” and a significant increase in the percentage agreeing that “I don't think public officials care much what people like me think” and in the percentage responding “almost never” to the question, “How much do you trust the government in Washington to do what is right?” The significant decline in trust and attachment to government manifested by these questions is especially impressive given the absence of change in responses to the data sharing questions. We return to the implications of these findings in the concluding section of the paper.
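The paper does not say which test lies behind the words "significant" and "insignificant" in the preceding paragraph. As a rough illustration only, the sketch below applies a textbook pooled two-proportion z-test to two rows of Table 1. It assumes that the figures in parentheses in Table 1 are the numbers of respondents, and it ignores weighting and the design effects of the telephone samples, so it should not be read as the authors' method. The function name is hypothetical.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Treats both samples as simple random samples, which is an ASSUMPTION
    that does not hold exactly for weighted random digit dial surveys.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# "People's rights to privacy are well protected": 41.4% (1995) vs 37.0% (1996)
z_privacy, p_privacy = two_proportion_z(0.414, 1413, 0.370, 1198)

# "People have lost all control ...": 79.5% (1995) vs 80.4% (1996)
z_control, p_control = two_proportion_z(0.795, 1398, 0.804, 1193)

print(f"privacy well protected: z = {z_privacy:.2f}, p = {p_privacy:.3f}")
print(f"lost all control:       z = {z_control:.2f}, p = {p_control:.3f}")
```

Under these simplifying assumptions the first comparison falls below the conventional 0.05 level and the second is far from it, which is at least consistent with how the text describes the two items.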

Table 1. Concerns about Privacy and Alienation from Government, by Year

Attitude/Opinion | Agree Strongly or Somewhat, 1995 | Agree Strongly or Somewhat, 1996
People's rights to privacy are well protected | 41.4 (1,413) | 37.0 (1,198)
People have lost all control over how personal information about them is used | 79.5 (1,398) | 80.4 (1,193)
People like me don't have any say about what the government does | 59.2 (1,413) | 62.9 (1,200)
I don't think public officials care much what people like me think | 65.4 (1,414) | 71.1 (1,202)
How much do you trust the government in Washington to do what is right? (Almost never) | 19.2 (1,430) | 25.0 (1,204)

Willingness to Provide Social Security Number to Facilitate Data Sharing

One question of particular importance to the Census Bureau is the extent to which people would be willing to provide their social security number to the Census Bureau in order to permit more precise matching of administrative and census records. Evidence from earlier Census Bureau research is conflicting in this regard. On the one hand, respondents in four out of five focus groups were overwhelmingly opposed to this practice when they were asked about it in 1992 (Singer and Miller, 1992). On the other hand, respondents to a field experiment in 1992 were only 3.4 percentage points less likely to return a census form when it requested their SSN than when it did not; an additional 13.9 percent returned the form but did not provide an SSN (Singer, Bates, and Miller, 1992). To clarify this issue further, the Bureau asked Westat to include a question about SSN on the 1996 survey. The question (Q21) read as follows: “The Census Bureau is considering ways to combine information from Federal, state, and local agencies to reduce the costs of trying to count every person in this country. Access to social security numbers makes it easier to do this.
If the census form asked for your social security number, would you be willing to provide it?” About two thirds (65.9%) of the sample said they would be willing to provide the number; 30.5% said they would not; and 3.5% said don't know or did not answer the question. The question about SSN was asked after the series of questions asking whether or not people approved of other administrative agencies sharing data with the Census Bureau. Therefore, it is reasonable to

assume that responses to this question were influenced by opinions about data sharing, which the preceding questions had either brought to mind or helped to create. And, not surprisingly, there is a relationship between a large number—but not all—of the preceding questions and the question about providing one's SSN. For example, those who would provide their SSN to the Bureau are more likely to believe the census is extremely or very important and more likely to be aware of census uses. They are more likely to favor data sharing. Those who would not provide their SSN to the Bureau are more concerned about privacy issues. They are less likely to trust the Bureau to keep census responses confidential; they are more likely to say they would be bothered “a lot” if another agency got their census responses; they are less likely to agree that their rights to privacy are well protected; less likely to believe that the benefits of data sharing outweigh the loss of privacy this would entail, and more likely to believe that asking the five demographic items is an invasion of privacy. All of these differences are statistically significant.

Table 2. Willingness to Provide SSN and Attitudes to Census Bureau

Attitude/Opinion | Would Not Provide SSN % | Would Provide SSN %
Believes counting population is “extremely” or “very” important | 63.8 | 79.7
Is aware of census uses | 43.1 | 54.8
Would favor SSA giving Census Bureau short-form information | 56.3 | 85.0
Would favor IRS giving Census Bureau long-form information | 30.4 | 61.2
Would favor “records-only” census | 45.6 | 60.0
Trusts Bureau to not give out/keep confidential census responses | 45.0 | 76.7
Would be bothered “a lot” if other agency got census responses | 54.1 | 29.9
Believes benefits of record sharing outweigh privacy loss | 36.0 | 51.1
Believes the five items on short form are invasion of privacy | 31.3 | 13.4

There are also significant relationships between political efficacy, feelings that rights to privacy are well protected, feelings that people have lost control over personal information, and trust in “the government in Washington to do what is right” (Q24a-d) and willingness to provide one's SSN. These political attitude questions, it should be noted, were asked after the question about providing one's SSN, and so they could not have influenced the response to this question.

Of the demographic characteristics, only two—gender and education—are significantly (for gender, p<.10; for education, p<.05) related to willingness to provide one's SSN. Almost three quarters (71.4%) of men, but only 65.5% of women, are willing to provide their SSN. This is true of 71.2% of those with less than a high school education, 63.9% of those who are high school graduates, 68.7% of those with some college, and 76.8% of those who are college graduates. The same curvilinear relationship is apparent for income: 75.4% of those with family incomes of less than $20,000, 69.6% of those with incomes between $20,001 and $30,000 and $30,001 and $50,000, 68.6% of those between $50,001 and $75,000, and 75.4% of those with incomes over $75,000 say they would be willing to provide their SSN if asked by the Census Bureau to do so.

Table 3. Willingness to Provide SSN, by Concerns about Privacy and Alienation from Government

Concern/Alienation | Would Provide SSN % | Would Not Provide SSN %
Disagrees strongly that rights to privacy are well protected | 24.2 | 45.6
Agrees strongly people have lost control over personal information | 37.9 | 54.2
Agrees strongly “people like me” have no say about what government does | 27.7 | 43.7
Agrees strongly public officials don't care much about “what people like me think” | 31.2 | 45.4
Almost never trusts government in Washington to do what's right | 19.5 | 37.8
Privacy loss outweighs economic benefit of data sharing | 47.1 | 56.0
Economic benefit of data sharing outweighs privacy loss | 47.9 | 30.4

From the foregoing, it appears that there are two reasons underlying reluctance to provide one's SSN. First, there are reasons associated with beliefs about the census: People who are less aware of the census, who consider it less important, and who are less favorable toward the idea of data sharing are significantly less willing to provide their SSN.
Low levels of education are also associated with these characteristics. Second, however, is a set of beliefs and attitudes concerning privacy, confidentiality, and trust: People who are more concerned about privacy, who have less trust in the Bureau's maintenance of confidentiality, and who are less trusting of government in general are much less likely to say they would provide their SSN to the Census Bureau. Women are more likely to be concerned about privacy issues than men, and they are also less willing to say they would provide their SSN to the Bureau. In earlier analyses (Singer and Presser, 1996) we found that importance attached to the census, knowledge about the census, and attitudes about privacy were independent factors predicting willingness to have other agencies share data with the Bureau. Though we have not carried out a factor analysis of attitudes toward willingness to provide one's SSN, the relationships described above suggest that the same clusters of beliefs are relevant for this attitude, as well.

We should point out that the question asked on the 1996 survey, about whether or not respondents would be willing to provide their SSN, is not equivalent to a field experiment. The number of people who would provide their SSN if asked to do so in an actual census might very well be higher than the two thirds who said they would do so on this survey, as suggested by the field experiment cited at the beginning of this section. On the other hand, if the issue of privacy became salient prior to the census, the number complying might well be less. Arguing for the second, more cautious, inference is the fact that more than a third of those approached for the survey did not participate, and, since the introduction to the survey informed potential respondents about the topic, the nonparticipants may well have included those more suspicious of government and less inclined to cooperate with any request from government agencies, including the Census Bureau[3].

What Does Confidentiality Mean?

A number of question wording experiments were included in the 1996 Westat survey. The most important of these, from the perspective of understanding data sharing attitudes, had to do with the meaning of the Census Bureau's assurance of data confidentiality to respondents. The short answer to the question, “What does confidentiality mean to the public?” is, “We don't know.” However, in the rest of this paper, we try to summarize what we think we learned. The 1995 JPSM survey resulted in one very puzzling finding. When asked whether other agencies could get their answers to census questions, identified by name and address, 41% said they did not know; of the rest, about 90% said other agencies could get such information (Presser and Singer, 1995).
To make things even more puzzling, the better educated were more likely to believe, erroneously, that other agencies could get such data—virtually the only time, so far as we know, that more education has been associated with more error (Hyman, Wright, and Reed, 1975). Furthermore, the belief that other agencies could get such data was associated with more favorable attitudes toward data sharing. It thus seemed fairly clear that our attempt to provide a neutral definition of “confidentiality” in the 1995 instrument had not had the intended effect. Accordingly, we incorporated a four-way split ballot experiment into the 1996 survey. One quarter of the sample were asked the 1995 question; one quarter, the 1995 question without the DK filter. One quarter were asked, “Do you think the Census Bureau does or does not protect the confidentiality of this (household demographic) information, or don't you know (DK)?” And the final quarter were asked the confidentiality question without the DK filter. The results are shown in Table 4. The most striking thing about the table is simply the variation in responses, depending on the wording of the question. But the next most startling finding is the difference in responses to the questions asking whether other agencies can get identified data, and whether the Bureau keeps data confidential. Omitting those who answer DK, the percentage who believe responses are NOT shared (or data ARE kept confidential) ranges from 11.5% in Q 7–1 to 69.2% in Q 7–4. Omission of the DK filter reduces the size but does not change the basic form of the relationship. Majorities of the public believe that other agencies can get identified data; they also believe that the Bureau maintains data confidentiality.
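The "omitting those who answer DK" figures can be reproduced from the Table 4 percentages by re-basing each column on the respondents who gave a substantive answer. The sketch below does only that arithmetic. The dictionary keys are descriptive labels introduced here for illustration, and the helper name is hypothetical; the percentages themselves are taken from Table 4.

```python
def percent_not_shared_excluding_dk(shared, not_shared):
    """Share answering 'not shared' among those who gave a substantive answer."""
    return 100 * not_shared / (shared + not_shared)


# (believe shared %, believe not shared %) for each ballot version in Table 4.
table4 = {
    "agencies can get data, explicit not-sure option": (47.1, 6.1),
    "agencies can get data, no explicit not-sure option": (76.9, 15.4),
    "Bureau protects confidentiality, explicit not-sure option": (9.6, 12.9),
    "Bureau protects confidentiality, no explicit not-sure option": (20.9, 47.0),
}

rebased = {
    version: round(percent_not_shared_excluding_dk(shared, not_shared), 1)
    for version, (shared, not_shared) in table4.items()
}

for version, pct in rebased.items():
    print(f"{version}: {pct}% believe responses are not shared")
```

The first and last versions give the 11.5% and 69.2% quoted in the text for Q 7–1 and Q 7–4; the two middle versions fall between those extremes.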

Table 4. The Effects of Question Wording on Beliefs Regarding Sharing of Responses by Census Bureau

Question (A): Do you think other government agencies…can or cannot get people's names and addresses along with their answers to the census?
Question (B): Do you think the Census Bureau does or does not protect the confidentiality of this [household demographic] information?

Response | (A) Explicit “Not Sure” % | (A) No Explicit “Not Sure” % | (B) Explicit “Not Sure” % | (B) No Explicit “Not Sure” %
Believe that census responses are shared | 47.1 | 76.9 | 9.6 | 20.9
Believe that census responses are not shared | 6.1 | 15.4 | 12.9 | 47.0
Not Sure/Don't Know | 46.8 | 7.7 | 77.5 | 32.1
N (unweighted) | 310 | 296 | 294 | 315

In passing, we should note that the distribution of answers to the version of the question which is identical to the 1995 question does not differ significantly from the 1995 distribution; and, as in 1995, people who said other agencies CAN get data were significantly more likely to favor data sharing in 1996 as well.

In another effort to understand the meaning of confidentiality to respondents, we asked another split-ballot question near the end of the 1996 survey. One asked whether the Census Bureau was required by law to keep census information confidential; the other, whether the Bureau was forbidden by law from giving identified census information to other agencies. The responses to the two versions of this question are shown in Table 5. Majorities of those who have an opinion give the correct answer to both questions; but the proportion answering DK is larger, and the proportion giving the correct answer smaller, when the question asks about giving other agencies identified information than when it asks about maintaining confidentiality.
As a follow-up to both questions, we asked those who said the Bureau is required to protect the information or forbidden from disclosing it, whether or not they trusted the Bureau to uphold the law—that is, to keep the information confidential, or to refrain from disclosing it to other agencies. Regardless of which version of Q22 they got, two thirds of those who answered Yes to the factual question about legal requirements said they trusted the Bureau to comply with the law. However, those who not only say the Bureau is required to keep information confidential but who also trust the Bureau to do so, are significantly more likely to say both that other agencies cannot get the data and that the Bureau keeps data confidential. Thus, not only knowledge of the law, but also trust in the Bureau's compliance with the law,

is implicated in responses to the factual questions about whether the Bureau does or does not protect the data in its possession.

Table 5. The Effect of Question Wording on Knowledge of Laws Regarding Sharing of Census Information

Response | Is the Census Bureau forbidden by law from giving other government agencies census information identified by name or address? % | Is the Census Bureau required by law to keep census information confidential? % | Total %
Yes | 28.3 | 51.1 | 40.2
No | 17.1 | 11.6 | 14.2
Don't Know | 54.6 | 37.3 | 45.5
N (unweighted) | 591 | 624 | 1215

What differentiates those who trust the Bureau to keep information confidential from those who do not? We found only two demographic characteristics that seemed to make a difference. Women are considerably more likely to say they trust the Bureau than men, and younger respondents are more likely to express trust than older respondents are. Whether this is an effect of age or of cohort is impossible to tell from this cross-sectional survey. None of the other demographic characteristics we examined—education, race, or income—make a consistent difference in attitudes of trust.

Finally, we looked at the relation of the beliefs about legal requirements to attitudes about data sharing. People who believe the Bureau is required by law to keep data confidential are significantly more likely to favor data sharing than those who do not. On the other hand, people who believe the Bureau is forbidden from sharing data with other agencies are significantly more likely to oppose data sharing by other agencies. Whether this results from confusion, or from an application of the norm of reciprocity, or from opposition to all data sharing, is impossible to tell.
Conclusion and Implications

The following conclusions seem to follow from comparison of the 1995 and 1996 surveys:

Beliefs about the Census Bureau and attitudes toward data sharing have undergone little change since 1995.
Beliefs about privacy and trust in government have deteriorated since 1995.
To the public, the belief that the Bureau protects confidentiality does not seem to mean that other agencies cannot get data identified by name and address. What it does mean, we cannot tell from these data.

In each of the figures, the set of observations is the same across all six of the panels. For the income plots, households reporting negative income have been excluded.
[15] For example, total income is the first variable imputed, and all reported values (or midpoints of ranges) for variables included in the model for that variable are used to condition the imputation.
[16] For disclosure reasons, the scatterplots supporting this claim cannot be released.
[17] In the cases examined, this result also holds if the data are separated by implicates rather than averaged across implicates.
[18] The five implicates were pooled for these regressions. Standard errors shown in the table are simple regression standard errors that take no account of imputation or sampling error; the degrees of freedom were altered in the standard error calculation to reflect the fact that there were five times as many implicates as observations.
[19] Fienberg, Steele and Makov (1996) also address this question.

References

Fienberg, Stephen E. (1997). Confidentiality and Disclosure Limitation Methodology: Challenges for National Statistics and Statistics Research, working paper, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA.
Fienberg, Stephen E. and Makov, Udi E. (1997). Confidentiality, Uniqueness, and Disclosure Limitation for Categorical Data, working paper, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA.
Fienberg, Stephen E.; Steele, Russell J.; and Makov, Udi E. (1996). Statistical Notions of Data Disclosure Avoidance and their Relationship to Traditional Statistical Methodology: Data Swapping and Loglinear Models, Proceedings of the 1996 Annual Research Conference and Technology Interchange, Washington, DC: U.S. Bureau of the Census, 87–105.
Fries, Gerhard; Johnson, Barry W.; and Woodburn, R. Louise (1997a). Analyzing Disclosure Review Procedures for the Survey of Consumer Finances, paper for presentation at the 1997 Joint Statistical Meetings, Anaheim, CA.
Fries, Gerhard; Johnson, Barry W.; and Woodburn, R. Louise (1997b). Disclosure Review and Its Implications for the 1992 Survey of Finances, Proceedings of the Section on Survey Research Methods, 1996 Joint Statistical Meetings, Chicago, IL.
Geman, Stuart and Geman, Donald (1984). Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6, 6 (November), 721–741.
Kennickell, Arthur B. (1991). Imputation of the 1989 Survey of Consumer Finances, Proceedings of the Section on Survey Research Methods, 1990 Joint Statistical Meetings, Atlanta, GA.

Kennickell, Arthur B. and Woodburn, R. Louise (1997). Consistent Weight Design for the 1989, 1992 and 1995 SCFs, and the Distribution of Wealth, working paper, Board of Governors of the Federal Reserve System, Washington, DC.
Kennickell, Arthur B. (1997). Using Range Techniques with CAPI in the 1995 Survey of Consumer Finances, Proceedings of the Section on Survey Research Methods, 1996 Joint Statistical Meetings, Chicago, IL.
Little, Roderick J.A. (1983). The Nonignorable Case, Incomplete Data in Sample Surveys, New York: Academic Press.
Rubin, Donald B. (1993). Discussion of Statistical Disclosure Limitation, Journal of Official Statistics, 9, 2, 461–468.
Schafer, Joseph (1995). Analysis of Incomplete Multivariate Data, Chapman and Hall.
Tourangeau, Roger; Johnson, Robert A.; Qian, Jiahe; Shin, Hee-Choon; and Frankel, Martin R. (1993). Selection of NORC's 1990 National Sample, working paper, National Opinion Research Center at the University of Chicago, Chicago, IL.

The views presented in this paper are those of the author alone and do not necessarily reflect those of the Board of Governors or the Federal Reserve System. Any errors are the responsibility of the author alone.

Sharing Statistical Information for Statistical Purposes

Katherine K. Wallman and Jerry L. Coffey, Office of Management and Budget

Abstract

Congress has recognized that a confidential relationship between statistical agencies and their respondents is essential for effective conduct of statistical programs. However, the specific statutory formulas devised to implement this principle in different agencies have created difficult barriers to effective working relationships among these agencies. The development of mechanisms to establish a uniform confidentiality policy that substantially eliminates the risks associated with sharing confidential data will permit significant improvements in data used for both public and private decisions without compromising public confidence in the security of information respondents provide to the Federal government.

Initiatives of the Statistical Policy Office to enhance public confidence in the stewardship of sensitive data and to permit limited sharing of confidential data for exclusively statistical purposes received a substantial impetus in the 1995 reauthorization of the Paperwork Reduction Act. The Act strongly endorses the principles embodied in statistical confidentiality pledges and charges OMB to promote sharing of data for statistical purposes within a strong confidentiality framework.

This paper discusses the history, the promise, and the current status of initiatives to strengthen and improve data protection while promoting expanded data sharing for statistical purposes. The most recent efforts include the OMB Federal Statistical Confidentiality Order, the Statistical Confidentiality Act (SCA), and companion legislation to the SCA that would make complementary changes to the Internal Revenue Code.
Introduction

A promising initiative to improve the quality and efficiency of Federal statistical programs is a legislative proposal that would allow the sharing of confidential data among statistical agencies under strict safeguards. The development of this approach has been a painstaking, careful process that has been supported and nurtured by Administrations of both parties over many years.

The Administration's Statistical Confidentiality Act and two companion initiatives (the OMB Federal Statistical Confidentiality Order and an amendment to the Internal Revenue Code) address two issues that are vital to ensuring the integrity and efficiency of Federal statistical programs and, ultimately, the quality of Federal statistics. These are:
• the unevenness of current statutory protections for the confidential treatment of information provided to statistical agencies for exclusively statistical purposes; and
• the barriers to effective working relationships among the statistical agencies that stem from slightly different statutory formulas devised to implement the principle of confidentiality for statistical data in different agencies.

The proposed legislation would establish policies and procedures to guarantee the consistent and uniform application of the confidentiality privilege and authorize the limited sharing of information among designated statistical agencies for exclusively statistical purposes.

Initiatives Span More Than Two Decades

Efforts to address confidentiality concerns with regard to Federal statistical data have a history that extends for more than 25 years. Such efforts have been endorsed on both sides of the aisle in the Congress. The roots of the policies in the Administration's current Statistical Confidentiality Act reflect the work of three Commissions that examined statistical and information issues during the Administrations of Presidents Nixon and Ford.

In 1971, the President's Commission on Federal Statistics recommended that the term "confidential" should always mean that disclosure of data in a manner that would allow public identification of the respondent or would in any way be harmful to him should be prohibited; this commission also recommended that consideration should be given to providing for interagency transfers of data where confidentiality could be protected.
In July 1977, the Privacy Protection Study Commission stated that “no record or information…collected or maintained for a research or statistical purpose under Federal authority…may be used in individually identifiable form to make any decision or take any action directly affecting the individual to whom the record pertains…”

Later, in October of that year, the President's Commission on Federal Paperwork endorsed the confidentiality and functional separation concepts, but applied them directly and simply to statistical programs, saying that:
• Information collected or maintained for statistical purposes must never be used for administrative or regulatory purposes or disclosed in identifiable form, except to another statistical agency with assurances that it will be used solely for statistical purposes; and
• Information collected for administrative and regulatory purposes must be made available for statistical use, with appropriate confidentiality and security safeguards, when assurances are given that the information will be used solely for statistical purposes.

The policy discussions generated by the three Commissions came together during the Carter Administration in a bipartisan outpouring of support for the Paperwork Reduction Act (PRA), which largely addressed the efficiency recommendations of the Paperwork Commission. The legislative history of that Act recognized the unfinished work of fitting the functional separation of statistical information into the overall scheme.

The first attempt to deal with the issues of confidentiality and sharing of statistical data was made by the Carter Administration's Statistical Reorganization Project (popularly known as the “Bonnen Commission”). This effort paralleled the legislative development work by OMB that became the Paperwork Reduction Act. The initiative identified a group of statistical agencies that could serve as protected environments (or enclaves) for confidential data and attempted to create a harmonized confidentiality policy by synthesizing the several prescriptions in existing laws. The initiative was left behind by the fast-track PRA for two reasons. First, each new prescription to solve problems in one agency raised new questions in other agencies, so that objections to the language increased as the draft legislation became longer and more complex. Second, the approach failed to appreciate that some large databases (e.g., Census and tax files) represented more significant risks and, thus, needed more elaborate confidentiality protection than other files.

During the first Reagan Administration, this prescriptive formula became more and more complex, as attempts were made to incorporate comments from both statistical and nonstatistical agencies. The draft proposal eventually was withdrawn when it became apparent that almost no one could understand how all of the myriad definitions and exceptions fit together.

While the proposed approach did not succeed, the effort did draw attention to many subtle weaknesses in existing law and led to new statutes and amendments during the second Reagan Administration. In particular, stronger statutory protections were enacted for the National Center for Health Statistics, the National Agricultural Statistics Service, and the National Center for Education Statistics.
At the same time, the concept of a government-wide law for statistical confidentiality and data sharing received a complete overhaul. A new strategy was presented to the statistical agencies during the Bush Administration. It had five important features that were missing from earlier efforts:
• It was designed to work with the tools already available in the PRA: promoting data sharing, but providing for functional separation to ensure that statistical data are shared only for statistical purposes.
• It was designed to be robust with respect to reorganizations within the statistical system. Since every major statistical agency had been involved in one or more reorganizations since 1970, it became apparent that any successful strategy would have to work well in any reasonable organizational environment.
• It was built around a procedural strategy that gives due deference to the precepts of existing law that are tailored to specific risks and builds on agency experience in implementing that body of law. The idea was to adopt a general confidentiality policy consistent with existing law and provide the tools (data sharing agreements, coordinated rules, and consistent Freedom of Information Act (FOIA) exemptions) to address those risks.
• It provided a means for the major statistical agencies to work closely with other agencies in their areas of expertise. While only the Statistical Data Centers would have broad access to data, any agency that collects its own statistical data can act as a full partner in improving those data under the terms of a data sharing agreement.
• It strengthened the Trade Secrets Act. This universal confidentiality statute consolidated provisions of tax law, customs law, and statistical law, but the statistical implications had been ignored. The new proposal set uniform policies for confidential statistical data, increasing penalties and addressing questions of agents.

This fresh start, based on a precedent-setting data sharing order involving the Internal Revenue Service, the Census Bureau, and the Bureau of Labor Statistics, had strong support within the Administration. But the effort failed to reach closure.

The basic strategy developed during the Bush Administration was later expanded and refined during the first term of the Clinton Administration. Criteria for the Statistical Data Centers (SDCs) were incorporated into the Statistical Confidentiality Act, and every statistical agency that could meet these tests was added to the list of SDCs, bringing the total from four agencies to eight. The relationship to the PRA was fine-tuned as well, and this process identified some improvements to the PRA that were adopted in the 1995 amendments to that Act.

The final step in the recent initiative involved negotiating a complementary amendment to the Statistical Use section of the tax code [26 USC 6103(j)]. This change actually facilitates increased security for taxpayer information by targeting and, thus, limiting the wholesale disclosures permitted under current law. It permits multi-party sharing agreements, so that specific statistical data sets that include tax data can be shared under IRS security procedures with other SDCs.

What Factors Argue for Success Now?

After more than two decades, why should we think that these efforts will be any more successful than those of the past? Perhaps it comes down to what can be called the “Three E's”:
• Experience. Over the past 25 years we have learned a considerable amount. The current proposal builds on the experience OMB and the agencies gained through earlier efforts.
• Environment. The Federal statistical system is faced with growing fiscal resource constraints. At the same time, the 1995 Paperwork Reduction Act extends requirements for reducing burdens imposed on respondents to Federal surveys. Yet another factor that has affected agency views is the increasing number of proposals for consolidating statistical agencies.
• Enthusiasm. Last but not least, the statistical agencies appear to be in a “can do” mood, enthusiastically supporting the development and passage of legislation that will even out statutory confidentiality protections and permit data sharing for statistical purposes.

Whatever the reasons, the agencies have come together on the Administration proposal now embodied in the Statistical Confidentiality Act and its companion pieces.

The Statistical Confidentiality Act

As the centerpiece of this effort, the Statistical Confidentiality Act has two principal functions:
• To ensure consistent and uniform application of the confidentiality privilege; and
• To permit limited sharing of data among designated agencies for exclusively statistical purposes.

A limited number of Federal statistical agencies would be designated as Statistical Data Centers. The eight agencies that currently meet the criteria to become SDCs are the Bureau of Economic Analysis (BEA), Bureau of the Census, Bureau of Labor Statistics (BLS), National Agricultural Statistics Service (NASS), National Center for Education Statistics (NCES), National Center for Health Statistics (NCHS), the Energy End-Use and Integrated Statistics Division of the Energy Information Administration (EIA), and the Science Resources Studies Division of the National Science Foundation (NSF).

A key component of the legislation is functional separation, whereby data or information acquired by an agency for purely statistical purposes can be used only for statistical purposes and cannot be shared in identifiable form for any other purpose without the informed consent of the respondent. If a designated SDC is authorized by statute to collect data or information for any nonstatistical purposes, such data or information must be distinguished by rule from those data collected for strictly statistical reasons.

The procedural strategy for implementing the legislation would be carried out via written data sharing agreements between or among statistical agencies. The Statistical Data Centers would provide information on actual disclosures and information security to OMB for inclusion in the annual report to Congress on statistical programs. OMB would also review and approve any implementing rules to ensure consistency with the purposes of the SCA and the PRA.

Companion Legislation

In addition to the Statistical Confidentiality Act, special amendments have been proposed to the Statistical Use subsection of the Internal Revenue Code, Section 6103(j). These amendments would authorize limited disclosure of tax data to agencies which have been designated as Statistical Data Centers.
In addition, the Research and Statistics Division at the Federal Reserve Board has been added to the group of agencies covered under the IRS companion bill. The amendment would provide access to tax return information to construct sampling frames and for related statistical purposes as authorized by law. Names, addresses, taxpayer identification numbers, and classifications of other return information in categorical form could be provided for statistical uses. These latter data are not to be used as direct substitutes for statistical program content, but rather can be applied using statistical methods such as imputation to improve the quality of the data. Class sizes or ranges for such data (e.g., for income) will vary by purpose. The amendment is designed to protect taxpayer rights and maintain proper oversight and control over tax return disclosures, while allowing carefully targeted expansion of access to tax return information for statistical purposes only.

The Statistical Confidentiality Order

As an integral step to foster passage of these legislative proposals, OMB felt it was critical to move ahead with efforts to clarify and make consistent government policy protecting the privacy and confidentiality interests of individuals and organizations that provide data for Federal statistical programs. With that aim in mind, OMB developed and sought public comment on an Order that assures respondents who supply statistical information that their responses will be held in confidence and will not be used against them in any government action. The Order also gives additional weight and stature to policies that statistical agencies have pursued for decades and includes procedures to resolve a number of ambiguities in existing law. Following the public review process, the Federal Statistical Confidentiality Order went into effect on June 27, 1997.

What Opportunities Will Attend Passage of the Legislation?

For more than a decade, we have worked within the constraints of existing law to make limited comparisons between similar data sets in different agencies. We have set in motion a series of limited exchanges tailored to conform to current law, but they cannot address all of the problems. Moreover, such exchanges could be cut short by an unfavorable interpretation of any one of the dozens of statutes involved. In each of these cases, extraordinary efforts have been required to accomplish even limited data exchanges.

Based on these experiences, we believe that even modest exchanges of information could, in the future, unearth and eliminate important errors in existing economic series, enable significant consolidations of overlapping programs (with comparable reductions in costs), and permit substantial reductions in reporting burden imposed on the public.

As the possibility of a law to permit data sharing in a safe environment has become more credible, statistical agencies have begun to identify potential improvements to current operations and programs that this law would permit.
These include possibilities such as the following:
• Integrated database concepts for information on particular segments of the economy and society, such as educational institutions (NCES, NSF, and Census), health care providers (NCHS, Census, and some program-specific agencies), and agricultural establishments (NASS, Census, and the Economic Research Service at the Department of Agriculture), would improve the consistency and quality of data while reducing current data collection costs.
• Collaboration on sampling frames would improve accuracy and reduce maintenance costs. A more efficient division of labor would make it possible to maintain high quality frames at minimum cost, both for list frames (Census, BLS, NASS) and for area frames (NASS, Census, NCHS). This approach would avoid duplicate expenditures and improve quality.
• Coordination and shared use of relisting information (updates) in large multi-stage designs could also reduce frame maintenance costs.
• Targeted frames (or sample selection services) from improved master frames could reduce duplicative expenditures in agencies that must currently pay the cost of independently developing these resources for specific surveys.
• Access to specific data details that can resolve uncertainties in particular analyses (e.g., anomalies that arise in the Gross Domestic Product estimation process) would reduce errors in macroeconomic statistics without imposing additional burden.
• Coordination of sample selection across agencies could reduce the total reporting burden that falls on any one household or company (and, thus, improve the level of respondent cooperation).

What Systemic Problems Will the Act Address?

• The Statistical Confidentiality Act creates a credible government-wide confidentiality umbrella. The public will know that the entire government stands behind the pledges of statistical confidentiality offered by the SDCs or any agency engaged in joint statistical projects with the SDCs.
• The SCA creates the legal presumption that data collected for most purposes may be used in a safe environment for statistical purposes. This is one of the critical insights of the Privacy and Paperwork Commissions.
• The SCA provides consistent FOIA policies for all the SDCs. This was controversial 15 years ago, but now six of the eight agencies designated as SDCs already have in place statutes that meet the requirements of Section (b)(3) of FOIA.
• The SCA permits the data sharing authorities of the PRA to work without compromising confidentiality. By establishing the functional separation principle in law, the SCA facilitates the use of PRA mechanisms to promote and manage data sharing for exclusively statistical uses.
• The SCA provides a privacy-sensitive alternative to the creation of universal databases, which each Department has proposed at one time or another to support its own policy interests. Statistical methods, particularly sampling, coupled with secure data sharing provide a natural hedge against the big database (i.e., dossier building) mentality that puts privacy at risk.

In short, the Statistical Confidentiality Act permits the SDCs and their statistical partners to share both expertise and data resources to improve the quality and reduce the burden of statistical programs, while preserving privacy. Moreover, no matter how the organizational boxes for the ideal Federal statistical system are drawn, this legislation will permit the components of the statistical system to manage their data as if they were a single, functionally-integrated organization.

Current Status of the SCA and Related Initiatives

Culminating efforts that literally have spanned decades, the Statistical Confidentiality Act initially was introduced on a bipartisan basis in the House of Representatives in 1996. Late in 1997, the Administration's proposed legislation was included in a broader bill, S. 1404, introduced on a bipartisan basis in the Senate. With growing bipartisan support in both houses, hopes are high that the SCA will soon become law. The complementary amendment to the Internal Revenue Code is also pending before Congress, with broad bipartisan support. OMB is working with the House and Senate to attain re-introduction and successful action on the legislation during 1998.

In addition to these legislative approaches to foster efficiency and quality in Federal statistical programs, the agencies are actively exploring other means of expanding collaboration to improve the effectiveness of the Federal statistical system. Recently the Interagency Council on Statistical Policy (ICSP), under the leadership of the Office of Management and Budget, has broadened efforts of the principal Federal statistical agencies to coordinate statistical work, particularly in areas where activities and issues overlap or cut across agencies. One by-product of these efforts was the establishment in 1997 of the Interagency Confidentiality and Disclosure Avoidance Group, under the auspices of OMB's Federal Committee on Statistical Methodology. This working group discusses common technical issues involving privacy, confidentiality, and disclosure limitation. The group is currently working on developing a set of generic guidelines for disclosure review, which could be adapted for use by other agencies.

It is our hope and expectation that both the statistical confidentiality legislation and the subsequent cooperative efforts will go a long way towards solving some of the challenges the Federal statistical agencies have encountered in a decentralized environment.
