5 End of Day 1: Discussant Remarks and Floor Discussion

DISCUSSANT REMARKS

Prior to the floor discussion, Alan Zaslavsky (Harvard Medical School) summarized some of the salient points from the first day of the workshop. Referring to the series of presentations on other countries’ survey systems, he noted that what is impossible to implement in one country might be the only way to do things someplace else. In the same manner, what is impossible in the United States today could be a research project in 5 years, and in 10 years it might become obvious that this once-impossible strategy is now the only way to operate. In other words, persistence can pay off.

He went on to say that the reasons for some of the differences across countries go beyond the realm of scientific considerations to areas in which participants at this workshop do not necessarily specialize: history, politics, and culture. The degree of centralization characterizing administrative structures is another important factor contributing to differences. Nevertheless, the presentations can serve as a wake-up call for the statistical community in the United States to consider household survey systems in other countries and to aspire to learn from the experience of others.

Zaslavsky mentioned that there was a lot of discussion about innovation. Now, he said, the question is how the statistical system can convince itself, and then others as well, that many of the ideas mentioned today are worth pursuing. In the case of the U.K. survey, validation was carried out by comparing the new series with the previous series, which from a statistical point of view is a fairly clear-cut process. But if members of the statistical system are truly interested in innovation, then they must be prepared for situations in which
the new measures will not be consistent with what was done before. Although changes in methodology will make some data users unhappy, a new methodology may be equally or perhaps more fit for use and more practical to implement. This may mean that agencies and decision makers will have to think hard about who the key data users are, as well as what information and policy needs have to be satisfied.

An example of a transition to a new methodology in the U.S. federal statistical system is the transportation research community’s transition from using the census long-form sample to using the American Community Survey as a source of transportation data. At the start of this process, members of that community were reportedly quite unsure about the idea of using data that were based on a rolling sample and that would usually be 2 or 3 years old, as opposed to the data from the census long-form sample, which could be up to 10 years old. This is a good example of breaking away from the way things have been done with the goal of improving the fitness for use, and now they may have something better than what they had before.

Another way of thinking about the issue of acceptability is to question what are considered official statistics. Some people argue that an actual enumeration is the only legitimate way to count the population, but the statistical community knows that this is not the best approach to obtain most of the data. The question is how far the statistical and survey community is really willing to go to innovate. When will model-based estimates be widely accepted as official statistics? There have been and continue to be challenges to almost all forms of statistical methodology applied to the census. But the statistical system is in a position to release many more official numbers that are model-based, and indeed there are some areas in which model-based estimates are well accepted, such as unemployment statistics that are adjusted through a sophisticated time-series model.

There has been considerable talk of Google’s consumer price index (CPI) recently. If Google develops a method that tracks the online sales of groceries, it will probably reflect the price of groceries in stores fairly well. The index will, of course, be based on a biased sample, with not nearly the right coverage of grocery stores, but if there is demand to get a leading indicator of the CPI without having to wait for data to arrive from an agency whose field representatives are visiting stores or calling people and asking what they paid for a gallon of milk, the Google CPI, or a more disaggregated version of it, can be useful for statistical modeling.

However, this does not mean that the statistical community should accept all new methodologies that come along. There is still an important role for statistical agencies, perhaps as gatekeepers, because raw administrative data and unvetted Internet surveys are not necessarily going to yield very good statistics.

Zaslavsky also reflected on the discussions about the use of different modes
for data collection, which may require the use of different sampling frames. There are some purposes for which Internet panels may be a useful tool; for example, they are widely used in market research. Few researchers believe that these panels are efficient, representative, or accurate as a simple statistical estimation tool. However, they are quite consistent from month to month, because respondents are on the same panel for a few years or even longer. If the research interest is to look at trends or change over time, the data from these panels may be quite useful, although only in modeling. This is another area in which the statistical community must consider how far it is willing to stretch the concept of official statistics in order to make use of tools like this.

In the day’s presentations there was a good deal of discussion about the use of surveys as sampling frames for other surveys. There are obviously substantial efficiencies resulting from collaborations of this type, but there are also substantial challenges related to making these arrangements work well, Zaslavsky said. There is the problem of the second-phase survey inheriting the limitations of the first-phase survey. Beyond this, there are significant administrative barriers that exemplify many of the problems occurring in the statistical system more generally, especially different objectives that come along with different sources of funding.

Some of the important underlying issues are those of privacy and confidentiality. These concerns are very ill defined. What exactly does privacy mean? Jean-Louis Tambay gave an excellent example of how a confidentiality scandal can be created by simply informing the public of an existing data collection practice, even if there have been no breaches of confidentiality. A scandal on this topic is easy to create at any time.

One could argue that, in the past, the protection of privacy was guaranteed primarily through inefficiency and inaccessibility. For example, a great deal of public data are unalphabetized and moldering in the basements of courthouses in over 3,000 different counties. In some sense, those data are private, and it does not matter that they are actually public. Today a lot of information is easily accessible over the Internet, and as the inefficiencies are fading, organizations are finding that they must establish official policies about storing public records that were once much less obviously public. A national policy conversation is required to think about what the rational trade-offs are and the obligations of individual citizens and the polity toward each other.

Zaslavsky added that it is also worth mentioning that the greatest threats to privacy and consequences of breaches are from the commercial sector, not government data collections. For example, being denied a home loan because someone stole your credit card is a scenario that is a lot more likely than confidential data being released by a government agency.

For years there has been talk of using administrative records, especially for the census, but in every case it was decided that it was not the right time. Zaslavsky has always believed that taking small steps and making incremental
progress is important to move the statistical system forward in this area. If there had been more persistent efforts in the early days, the system would be much further ahead now. Julie Trépanier presented a good list of alternative uses for administrative data and of programs actually being implemented at Statistics Canada, incremental as they may be.

Zaslavsky said that the current work in this area, described by Rochelle Martinez, is perhaps one of the most optimistic developments in years for the federal statistical system. But one question that arises in response to these initiatives is whether the opportunities for sharing will be adequate for everything that is needed. As an example, there is clearly a role for those who work with the Statistics of Income Division (SOI) to work with data from the Internal Revenue Service (IRS). The SOI can collect a sample and clean it, thus making it a much better data system than just the raw tax returns would be. These analysts can then cooperate with other agencies for data matching. However, there are some situations in which there really is a need to have access to the entire IRS database, and a statistical agency may or may not be able to gain that access. The point, Zaslavsky said, is that broader support is needed to carry out linkage projects.

FLOOR DISCUSSION

The topics covered during the floor discussion at the end of the first day were as varied as the day’s presentations. Cynthia Clark commented that as part of the thinking about the sharing of sampling frames across agencies, it would be useful to consider the development of a frame that contained both households and establishments in a comprehensive geographic system. She recalled that a suggestion similar to this was made as part of the work of a United Nations commission developing a global strategy for agricultural and rural statistics. The goal of the initiative was to develop a system that enables the collection of comparable data across countries and to build a master sampling frame that would allow linkages to occur. She added that, in the National Agricultural Statistics Service, which focuses on rural statistics, access to a household sampling frame would enable the agency to meet some of its data needs better than is currently feasible given the design of the American Community Survey.

Trivellore Raghunathan (University of Michigan) noted that, with the advent of mixed-mode designs, there needs to be an effort to understand what is really being measured, because context matters for survey participation. Research has shown that if the same question is asked in two different ways, different answers will result. Perhaps the differences should be modeled to create some sort of population-level equivalence. Jelke Bethlehem agreed, saying that in the Netherlands, much of the survey data can be collected via the web, making mixed-mode surveys cheaper. However, it is difficult to disentangle
mode effects and selection effects, and there are concerns about the estimates as a result. Developing models to examine these questions would be interesting.

Phillip Kott noted that as long as there is nonresponse in a survey, model-based methods will have to be applied. Many of the participants at the workshop recognize that models are already being used in multiple ways. For example, model-assisted methods are used to get a good sense of probability sampling properties, to carry out small-area estimation, and to create synthetic estimates. Furthermore, data users generally do not care how the data are produced; they just want them. So perhaps it is worth considering how much of the resistance to model-based estimation comes from the statistical community itself.

Roderick Little (Census Bureau) agreed that much of what is done now is model-based. The issue is the robustness of the models and how they represent the data. Regarding administrative records, he added that their role may be different depending on the intended analysis. In many cases, administrative records may be most useful for descriptive statistics, such as an income distribution, given that the records do not usually contain information about relationships.

Zaslavsky responded that in some cases it is possible to imagine administrative records being more useful for analytic purposes than survey data. An example of this would be longitudinal data, such as income tax records that go back 30 years. Survey data are rarely available for a similar time period. However, producing model-based estimates designed for descriptive purposes and then using these in analytic studies could be problematic. In an analytic study that involves a model-based estimate with a large regression component, relationships may be discovered that are primarily due to the way the model was specified. So it is important to go back to the original data and understand how they were put together in order to be able to use them in an analytic study.

Bethlehem provided an example from Statistics Netherlands to illustrate how relationships can be studied using administrative data. Statistics Netherlands combined police register data with population register data to examine relationships between ethnic background and crimes committed. He added that sometimes it is possible to study relationships that could not have been examined with survey data alone, but he acknowledged that a major limitation is that these types of data are not necessarily accessible to outside researchers because of disclosure concerns.

Frauke Kreuter (University of Maryland) said that the German Department of Labor Statistics has permission to link indicators, such as nonresponse and linkage consent indicators, to an administrative database on the grounds that they are survey production data that do not reveal personal information. This could be described as an incremental step that allows researchers to use the administrative data for modeling in various forms. It may be interesting to consider whether such a step could be within reach in the United States, she said.

Katherine Wallman said that it is time to have a conversation with the
American public about the issue of privacy. Prior to the release of the memo by the Office of Management and Budget (OMB) that outlined several pilot programs for the use of administrative records, OMB staff met with privacy advocates. Despite these conversations, it remains unclear whether many of these privacy issues have been fully parsed out with this community, and they have definitely not been parsed out with the public. She said that the federal statistical community needs to take some risks in this area and to have a carefully constructed conversation about privacy, and in her view the time to do that is now.

Wallman said that there is frequent miscommunication on the topic of administrative records, because often assumptions are made about how the data will be used without the specifics being discussed. She was reminded of this during Trépanier’s very clear presentation, which made her realize that she and her Canadian colleagues have been talking past one another about the use of tax data for the past few years. She clarified that the Census Bureau does have access to tax data for most of the functions that Statistics Canada does, short of actually using the records to replace missing data. Another example recalled by Wallman involved the discussions of extending authority to the Bureau of Labor Statistics to use tax records, and this dialogue was also hindered by miscommunications related to the type of use. Wallman ended by saying that she plans to advocate for more conversations about data sharing.