National Academies Press: OpenBook

The Future of Federal Household Surveys: Summary of a Workshop (2011)

Chapter: 4 Collection of Household Data

Suggested Citation:"4 Collection of Household Data." National Research Council. 2011. The Future of Federal Household Surveys: Summary of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/13174.

4

Collection of Household Data

NEW DATA COLLECTION MODES AND THE CHALLENGE OF MAKING THEM EFFECTIVE

Don Dillman (Washington State University) began his presentation by saying that surveys are now in a period of tailored design, in which different modes and implementation procedures are appropriate for different situations. The focus of his talk was on the challenges this new period presents.

An understanding of the evolution of survey modes and designs is important for gaining perspective on the current system. First was the transition from face-to-face interviews to telephone. Dillman recalled that his own experience with this transition was relatively easy, because face-to-face methods could readily be adapted to telephone surveys. About 78 percent of households had telephone coverage in 1970, and this number seemed to be increasing, making the transition increasingly feasible.

Household survey methods, including sampling approaches, could reasonably be applied to telephone, he said. The interviewer’s role in telephone surveys is similar to that in a face-to-face interview in terms of reading items, clarifying questions, and relying on hidden categories (categories that are not offered to the respondent), as needed. The main differences are that show cards need to be eliminated, scales have to be shortened to achieve the same level of comprehension, and questions sometimes need fewer words to be understood aurally. Another difference is that supervisors are more accessible during telephone than face-to-face interviews.

Bringing email and the web into data collection was a more difficult transition. Currently, approximately two-thirds of households have Internet access and use it with some regularity, leaving a possible one-third of households unable to respond to a survey over the Internet from home.

Another problem that arises with creating a sample of Internet respondents is that it is harder to implement a within-household random selection because some householders lack Internet skills. In the case of some households, this phenomenon may be related to a division of labor: just as some people do the laundry and some take care of cars, a particular person in a household may use the Internet. Furthermore, survey organizations generally do not have email addresses that would enable them to send respondents links to Internet surveys, unless a prior relationship exists. Even if this could be resolved, it is likely that response to an initial email invitation would be quite low.

Meanwhile, the telephone is losing its viability as a survey mode option. There are many reasons for this, including the increasing use of cell phones (although these can sometimes be added to a frame), the decreasing reliance on landlines (current coverage is less than 75 percent of households), and increasingly blurred lines when it comes to the geography of phone numbers. American culture has also changed. People no longer use the telephone for most business interactions unless they have to, and they tend to exercise more control over their devices than in the past, by not always answering calls.

The telephone itself now fulfills a variety of functions, often serving as a personal computer. However, the screen space available for a web questionnaire is small, and entering text on a telephone is prone to error. Finally, responding to a survey on a phone device often cannot be combined well with other activities the potential respondent may be doing while accessing the Internet.

Changes related to the telephone and the continuing limitations of Internet access suggest that, in the near future, there will be more reliance on mixed-mode survey designs to collect data. Dillman devised a typology of the ways data collection modes are most commonly mixed (Dillman et al., 2009), summarized in Box 4-1.

The first type involves the use of a particular mode to encourage people to respond by another mode (typically, the Internet). In a sense, this is still a single-mode study, and therefore measurement differences between modes are not as big a concern as they might be otherwise. In the second type, one mode is used to ask some of the questions, and another mode to ask others, such as more sensitive questions. In practice, this interview technique often entails an interviewer simply turning a laptop around during a face-to-face interview so that the respondent can self-administer part of the interview. A third type of mixed-mode design involves using different modes of administration for different types of respondents. A fourth approach, typically used in longitudinal studies, employs one interview mode for the first interview and another mode for the second and subsequent interviews.

BOX 4-1
Typology of Mixed-Mode Surveys

Type 1: One mode for data collection, another mode for selection/encouragement.

Type 2: One mode to ask certain questions, another mode for additional questions.

Type 3: One mode for some respondents, another mode for other respondents.

Type 4: One mode for Time 1 data collection, another for Time 2 data collection.

SOURCE: Workshop presentation by Don Dillman.

Dillman pointed out that it is important to remember when combining different modes of administration that sometimes achieving one survey objective may get in the way of another. For example, improving response rates by offering alternative modes of responding may introduce measurement differences, or reducing costs may conflict with obtaining quicker responses.

There are also several significant barriers to wider adoption of mixed-mode designs, he said. Survey professionals tend to construct survey questions differently for different modes, in part because they want to optimize the design for a specific mode. Visual (self-administered) versus aural (telephone) presentations, in particular, have different requirements.

For example, in the face-to-face mode, show cards can be used for answer choices, scales are often fully labeled, questions and questionnaires tend to be longer, and some of the answer options can be made available to the interviewer without explicitly offering them to the respondent (such as “Don’t know” or “Refused”). In the telephone mode, scales tend to be shorter and are presented without all categories labeled, questionnaires are shorter, complex branching formats can be used without affecting respondent comprehension, and, as in the face-to-face mode, some answer options can be made available without being explicitly offered. The mail mode encourages less question branching but can accommodate longer, more complex scales. Open-ended question formats are avoided when possible, and response categories cannot be hidden. A web mode encourages required answers and fewer “don’t know” options. Fill-ins are possible from previous answers. Audio, video, and other add-ons are possible, and typically there are no hidden categories. Unintentional mode-related construction differences can often lead to significant differences in the distribution of the answers provided.

Research has shown that the visual layout of survey items influences answers. Dillman highlighted the 24 most significant concepts in visual design (see Box 4-2). As an example of different design requirements for visual communication, he described a challenge encountered by the National Science Foundation while designing one of its web surveys. The goal was to obtain date information from respondents using two digits for month and four digits for year, in adjacent character spaces. Cognitive interviewing revealed that respondents will attempt a variety of approaches to answering a date question (e.g., using alphabetic abbreviations for the month) and that they get frustrated when they receive an error message. This led to extensive testing of this question over a period of four years.

BOX 4-2
Visual Design Concepts That Matter

Attention and visual processing:

Preattentive processing

Attentive processing

Useful field of view

Foveal view

Top-down processing

Bottom-up processing

Visual features that influence the expression of words, numbers, and symbols:

Figure/ground composition

Size

Shape

Location

Spatial arrangement

Color

Brightness

Contrast

Languages that give independent meaning to information on a page:

Words

Numbers

Symbols

Grouping principles:

Pragnanz (law of simplicity)

Proximity

Elemental connectedness

Common region

Continuity

Closure

Common fate

SOURCE: Workshop presentation by Don Dillman.

FIGURE 4-1 Summary of web experiments.
SOURCE: Workshop presentation by Don Dillman.

Figure 4-1 shows that changes in visual formatting led to large differences. According to the law of proximity in Gestalt psychology, if something is connected, it tells people to do the same. When this principle was applied in experiments, 55 percent of respondents filled in the boxes correctly. If the month box was smaller and the year box a little larger, 63 percent filled in the boxes correctly. When the symbolic language MM, YYYY was added to the respective boxes, this yielded 87 percent correct responses. Finally, when boxes and symbolic language were arranged in natural reading order, 96 percent of respondents provided responses in the desired format.

Dillman also described some experiments to address the issue of visual versus aural presentation. In one study, he asked respondents in three different ways when they began their studies: (1) When did you begin your studies? (2) What date did you begin your studies? and (3) What month and year did you begin your studies? On the web survey, there was little difference in the percentage of students using the preferred MM/YYYY format. However, over the phone, the differences in the distribution of responses were drastic. The percentage of respondents reporting month and year was 13.4 percent in the “when” condition, 49.5 percent in the “what date” condition, and 83.7 percent in the “what month and year” condition.

Of course, in the case of telephone interviews, the interviewer can act as an “intelligent system” that converts the responses to the desired format. That luxury does not exist in a web mode, forcing researchers to think of questionnaire construction differently and to invoke theoretical concepts on visual information processing.
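
As one illustration of the conversion an interviewer performs informally, the sketch below normalizes a few free-form month-and-year answers to MM/YYYY. It is a minimal Python sketch, not a procedure described at the workshop; the function name `normalize_month_year`, the `MONTHS` lookup, and the accepted input patterns are all assumptions made for illustration.

```python
import re

# Hypothetical lookup keyed on the first three letters of an English month name.
MONTHS = {name: i + 1 for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"])}


def normalize_month_year(raw):
    """Return 'MM/YYYY' for a free-form month-and-year answer, or None."""
    # Split the answer into runs of letters and runs of digits.
    tokens = re.findall(r"[A-Za-z]+|\d+", raw)
    if len(tokens) != 2:
        return None
    first, second = tokens
    # If the respondent typed the four-digit year first, swap the two parts.
    if first.isdigit() and len(first) == 4:
        first, second = second, first
    # The year must be exactly four digits.
    if not (second.isdigit() and len(second) == 4):
        return None
    if first.isdigit():
        month = int(first)
    else:
        # Only the first three letters are checked (an assumed, lenient rule).
        month = MONTHS.get(first[:3].lower())
    if month is None or not 1 <= month <= 12:
        return None
    return f"{month:02d}/{second}"
```

Under these assumptions, entries such as "Sept 2008" or "2008 March" are converted rather than rejected, while an answer with no recognizable month returns None and could trigger a follow-up prompt instead of an error message.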

Another issue related to different modes of administration involves scalar questions. The concepts of social desirability, acquiescence, primacy, and recency have often been used to explain why people respond the way they do, but Dillman argued that these concepts often do not explain mode differences. He and colleagues conducted several experiments to examine whether using the same wording for scalar questions will produce the same answers in aural as in visual presentations. The experiments involved a variety of scales, including 5-point, 7-point, fully labeled, and polar point labeled scales. Regardless of the scale type, each of the experiments resulted in slightly more positive responses on the telephone than on the web. The point here is that a consistent body of evidence is building that mode makes a difference in responses.

A line of research Dillman is particularly interested in involves combining two visual modes of data collection and avoiding the aural mode. Sending an email request as a first contact is typically not appropriate in cross-sectional household surveys, unless there is an existing relationship with the sample members or they are part of a longitudinal study. When given a choice of mail or web response, through mail contact, people tend to opt for mail, and overall response rates are lower. Requests for web-only responses typically result in low response rates.

However, despite declining response rates for most modes of data collection, response rates in mail surveys, particularly with prior screenings or incentives, tend to remain fairly high. Some of the reasons can be explained by social exchange theory and such concepts as rewards/benefits, burden/costs, and trust in the delivery of benefits. Social exchange theory could serve as a guide for other self-administered modes, such as the web, and for mixing modes in order to avoid having to rely on email only to obtain web responses and postal contacts only to get postal responses.

In many ways the Internet is different. There are problems with using it for surveys: the burden can be greater when responding to a survey via the Internet, particularly if going from postal letter to the computer; computer literacy is low for some respondents; there are operational issues—Does the computer work properly, or at all?—and emails from strangers can be harder to find or get lost more easily after the first day in one’s email inbox.

The benefits of Internet surveys vary. Technology is easier to deal with for some than for others. For some, there may be faster ways of responding. With an Internet survey, there is no need to try to find a mailbox to return a questionnaire. But with Internet surveys, trust is a significant concern. People do not like to open email from strangers, the sources of emails and websites can be faked, and there is the ever-present threat of downloading a virus or other malicious software. This last issue represents an area in which government agencies may have an advantage: people tend to trust communication coming from a government authority much more than any other potential survey contact.

Still, if people are given a choice of responding by either mail or Internet, most will choose mail. And, if mail is withheld to encourage respondents to use the web, research has shown that the respondents who end up participating during follow-up are very different from one mode to another. Dillman noticed in his research, however, that if an address-based sample is used to try to push people to the web, the result is a greater response from an advance postal token incentive for the mail-plus-web combination than for just the mail response alone. Email tends to cut the burden of web response because it brings respondents closer to their response mode preference. In essence, what will best bring postal, email, and web contacts together to obtain more responses by web is to begin integrating two modes, rather than forcing all web options together or mail options together.

In Dillman’s view, it is important for the survey community to bring together token cash incentives, mode choice, and email augmentation in trying to move forward. New options like address-based sampling and the sequential use of modes need further exploration but have great potential.

He ended by saying that the transition to the web is desirable, but it is going to be difficult. A positive development is that Dillman’s experiments that were based on address-based samples have yielded two-thirds of the responses over the web, which three or four years ago would not have been possible. However, coverage limitations suggest the need to use another mode (most likely mail) to at least deliver the request. This also raises concerns about mode differences. Evidence is mounting that the aural and visual modes sometimes produce different responses.

INTEGRATING ADMINISTRATIVE RECORDS INTO THE FEDERAL STATISTICAL SYSTEM 2.0

The focus of the presentation by Rochelle Martinez (Office of Management and Budget) was to illustrate what the statistical system could do to address barriers to making greater use of administrative records. For the past few years, interesting work has been going on to try to build capacity to use more administrative records, particularly with demographic data collection. Her talk specifically addressed the work going on across the statistical system, coordinated by the Office of Management and Budget (OMB). She discussed initiatives in the president’s budget and recent events related to administration support for these activities.

For many years, members of the statistical community have said that administrative records can and should be used more fully in the federal statistical system and in federal programs. The use of administrative records in the Netherlands and other countries gives a good flavor of the kinds of things the statistical system can envision doing in the United States to varying degrees. There are also areas, however, in which substantial work has already been done in the U.S. context. Most notably, administrative records have been used in economic statistical programs since the 1940s. There are also good examples of administrative data use with vital statistics, population estimates, and other programs across several federal statistical agencies.

Martinez mentioned that former director of the U.S. Census Bureau, Kenneth Prewitt, often talks about another reason that administrative records hold potential: the need for innovation. He has said that he is less concerned about the federal statistical system with regard to relevance and integrity than he is about innovation, in particular about how prepared statistical agencies are for the innovation necessary to navigate the new world. In many cases, national information systems are increasingly reliant on administrative data and, in some instances, on data from the commercial sector. Prewitt’s greatest concern is that government agencies seeking statistical information about the population will bypass statistical agencies altogether as they turn to the parts of the government that control large administrative data sets.

Martinez said that she sees this happening in some federal agencies right now. Offices that are collecting data for administrative purposes can (at least reportedly) produce a statistical result much more quickly than the principal statistical agency in that department. For a congressional or public affairs office, this is very appealing. Those in the statistical system can think of reasons why that might be a problem, but these offices may not. The best case scenario is that there are multiple estimates in the public domain that somebody has to be able to explain. The worst case is that somebody thinks that a statistical agency is less relevant and less timely and therefore that its data are less useful than the administrative data source. At OMB, Open Government and Data.gov initiatives encourage putting many more administrative data sets in the public domain, where they can be used for a variety of purposes, so these issues need to be addressed across the system.

Members of the Federal Committee on Statistical Methodology (FCSM) wanted to facilitate statistical agency use of administrative records. To explore how to achieve this, an interagency subcommittee was formed. This group created a set of products that the statistical community may find useful going forward.

The first product to come out of the subcommittee was a set of case studies, “Profiles in Success,” focusing on projects that had successfully acquired and used administrative data in a statistical project. Martinez said that the case studies were quite useful in helping the subcommittee members identify systematic barriers to greater use of administrative records. It is these barriers that the group has tried to address head-on in recent months and years.

Following the “Profiles” product, the subcommittee turned to awareness activities, in part to dispel myths about the difficulty of using state administrative records data. This group found many good examples of successful administrative data use in research and, in some cases, production. The subcommittee wanted to highlight the necessary success factors for using administrative data, and the statistical community has been very receptive. As a result, the subcommittee has been asked to develop training and other activities to help data users navigate the difficult world of acquiring and using administrative data.

A subsequent product for the toolkit, she said, was a model agreement. Getting an agreement in place for data sharing and usage between agencies is often a drain on time and money. Thus, the subcommittee has created a model agreement that agencies can use to facilitate the data-sharing process. Although many aspects of such agreements can be covered in a template, not all can, so there will be tailoring to some extent. The idea behind model agreements is to reduce front-end costs, because so many projects either die on the vine at this stage or use too many project resources, leaving fewer resources for the research.

Another product created by the subcommittee is related to informed consent. The informed consent product is an in-depth look at legal requirements across federal agencies, current practices for informed consent at statistical agencies, and current practices at administrative agencies. It also synthesizes research on informed consent wording in the context of data sharing and record linkage. This product is likely to help the statistical system in terms of best practices for new activities going forward. It will also provide guidance on how to meet requirements for projects for which administrative data were collected before there was an identified statistical use for them. The subcommittee has also done some work on data quality, with the goal of creating tools for data quality measurement and documentation, but it is far from complete.

As a result of the subcommittee’s work, Martinez went on, at least four barriers to using administrative data crystallized. One of these barriers is statistical agency access to administrative data. Statistical agencies have statutes that are designed to protect the confidentiality of data, and they consider themselves very much stewards of data. But despite these provisions and helpful language in the Privacy Act, statistical uses of administrative data are sometimes difficult to achieve. In many departments, program offices have data on which the legislation is either silent, unclear, or perhaps narrow in terms of the kinds of uses that are considered appropriate.

There is also an issue of incentives; program offices may not think it worth the effort to figure out how to address a statistical agency’s request for data. Whose job is it to work with the statistical agency? It can be very time-consuming to identify variables that are needed or to work with an agency to understand what data they have now or how these could be used. Negotiating agreements is a practical product that comes out of these discussions. Some agencies spend years and years trying to obtain administrative data. Statistical agency access to administrative data may be the most important barrier because, without access, projects cannot be undertaken.

A second, somewhat related barrier is what the subcommittee has termed inadequate infrastructure, referring to the infrastructure at both the statistical office and the administrative office. There is an administrative infrastructure needed to address such issues as the process for requesting data and approving the request. Technical infrastructure can require a significant investment of time and resources on the statistical side. But even on the administrative agency side, someone has to be able to extract and transfer the data. The subcommittee thinks that infrastructure is lacking in many of these cases.

The third barrier is administrative data quality. Although survey data are not perfect, agencies have the capability to describe and to understand their quality. In other words, there are a lot of measurement tools for survey data that do not yet exist for administrative records. Some have assumed that administrative data are a gold standard of data, that they are the truth. However, others in the statistical community think quite the opposite: that survey data are more likely to be of better quality. Without a common vocabulary and a common set of measurements between the two types of data, the conversation about data quality becomes subjective.

Another significant data quality issue for statistical agencies is the bias that comes with the refusal or the inability to successfully link records. In addition to the quality of the administrative data as an input, the quality of the data as they come out of a linkage must be considered as well.
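
One simple way to look for this kind of bias is to compare linkage rates across respondent groups; large gaps suggest that the linked file may not represent the full sample. The Python sketch below is illustrative only: the function `link_rates_by_group`, the record layout, and the exact-match rule on a single identifier are hypothetical assumptions, not a method described by the subcommittee.

```python
from collections import defaultdict


def link_rates_by_group(survey_records, admin_ids, group_key):
    """Return, per group, the share of survey records that link to the admin file.

    survey_records: list of dicts with an "id" field (None if the respondent
                    refused to provide an identifier) and a grouping field.
    admin_ids:      set of identifiers present in the administrative file.
    group_key:      name of the grouping field, e.g. "age_group" (hypothetical).
    """
    linked = defaultdict(int)
    total = defaultdict(int)
    for rec in survey_records:
        group = rec[group_key]
        total[group] += 1
        # A record links only if it carries an identifier found in the admin file.
        if rec.get("id") is not None and rec["id"] in admin_ids:
            linked[group] += 1
    return {group: linked[group] / total[group] for group in total}


# Toy records, invented purely for illustration.
toy_survey = [
    {"id": "A1", "age_group": "under 65"},
    {"id": "A2", "age_group": "under 65"},
    {"id": None, "age_group": "under 65"},   # identifier refused
    {"id": "B1", "age_group": "65 and over"},
    {"id": "B2", "age_group": "65 and over"},
]
toy_admin_ids = {"A1", "B1", "B2"}
toy_rates = link_rates_by_group(toy_survey, toy_admin_ids, "age_group")
```

In this toy case one group links completely and the other only partly, which is the pattern that would prompt a closer look at whether estimates from the linked file need adjustment.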

The final barrier has to do with researcher access. This includes researchers both internal and external to the government. Sometimes an afterthought, this is the idea of creating documentation that would be needed to really make a file, particularly a linked file, useful for someone else outside the project. There are issues of documentation and of providing disclosure protection to a linked file. For this reason, linked files are very rarely public-use files. Few methods for restricted access have been devised beyond those that existed for projects before record linkage was a focus. Many of these linked files have been created and not really used by people outside the immediate project, and that is a concern both in terms of the utility of what has been created and for data quality.

Martinez said that some initiatives in the president’s fiscal year 2011 budget should help further the subcommittee’s goal of promoting the use and exchange of administrative data. Specifically, three major pilot studies have been proposed, two for the Census Bureau (2010 Census Simulation Pilot and Health Data Pilot) and one for the Economic Research Service (Nutrition and Food Assistance Pilot).

Together, these three pilot studies are designed to address all four barriers. Although the barriers will not be resolved in a year, agencies can certainly begin to address them in ways that benefit the entire federal statistical system. Martinez emphasized that the notion of a common good was very important in proposing the initiative.

The first pilot project is designed to use both government and commercial administrative data to see if it is possible to simulate 2010 census results. Outcomes envisioned include advancing both knowledge about and measurement of the quality of many administrative record data sets. Ideally, this will inform not only the decennial census but also other demographic surveys.

In Martinez’s view, this project is also critical to setting up an infrastructure. Some consider the Census Bureau to be the ideal place for this, because it is thought to be big enough and stable enough to handle a large number of different files and many different activities. This is why the Census Bureau also received much of the funding; it would be much less efficient to attempt to build up infrastructure at multiple statistical agencies than to centralize the technology, capacity, expertise, and synergy.

The second pilot project is related to the first one and is also housed mostly at the Census Bureau. The idea is that the Census Bureau has the capacity and stabilizing infrastructure that enables it to provide record linkage services to other federal statistical agencies. The National Center for Health Statistics (NCHS) has agreed to be the pilot agency to provide identifiers from multiple health-related administrative and survey data sets to the Census Bureau to link and return to NCHS.

The overarching concept behind this pilot study is that record linkage is a service, a line of business that the Census Bureau could provide to agencies that are smaller or that lack similar capacity. A vision for the future is to centralize to some degree the expertise and the hands-on experience with different data files while still retaining the benefit of having a subject-matter agency, such as NCHS, getting back the data and using them for both subject-matter research and for providing access to other health researchers.

The goal of the third pilot project, the nutrition project, is to help the statistical community better understand how to acquire and use state administrative records for statistical research and to demonstrate the utility of such data for program evaluation. The hope is that this project can help identify a model in which these data might be acquired in a more centralized way. This project also helps to bring together multiple agencies that are interested in state data.

Although a primary goal of the pilots is to address the barriers outlined, Martinez said that these projects have also created interest among policy officials because of the ability to learn more from a subject-matter perspective. To make any of these ideas happen, it is essential that administrative agencies be included in the conversations about these uses of their data.

To that end, OMB has recently issued a memorandum encouraging federal agencies to share data in order to meet the needs of several administration initiatives, including statistical data projects. This demonstrates that administration officials are supportive of these efforts to increase the use of administrative data. The support of senior officials will be necessary, she said, because a move to expand administrative data use necessarily entails difficult conversations about legal and policy issues regarding data access.

Martinez added that all of the work she described was sponsored by the Interagency Council on Statistical Policy (ICSP). The ICSP comprises the heads of the principal statistical agencies. Among these agency heads, a subgroup has been focused on developing a vision beyond the three pilot projects. She said that among agency heads and project teams alike, there is continued enthusiasm for these projects, and they are hopeful that the studies can continue to move forward in an uncertain budget environment.

Despite operating under a continuing resolution, project teams have already been working on the aforementioned pilot projects. These groups would like to involve more researchers in the projects to help think through some of the issues that crop up in the course of the work. Furthermore, it is very important that not only federal statistical agencies, but also the professional statistical community, and particularly those working in the states, contribute to this conversation.

THE ROLE OF ADMINISTRATIVE RECORDS IN HOUSEHOLD SURVEYS: THE CANADIAN PERSPECTIVE

Julie Trépanier (Statistics Canada) described her agency’s use of administrative records in household surveys. To set the stage for this perspective, she outlined official legislation, policies, and guidelines that govern administrative data use in Canada.

Statistics Canada’s guiding principle—though not a policy—is to use administrative records whenever they present a cost-effective alternative to direct data collection. Section 13 of the Statistics Act allows Statistics Canada to obtain administrative data files from any organization for the purposes of the law. It also specifies some rights of access to administrative data. Specifically, Section 24 gives Statistics Canada the right to use income tax records; Section 25 gives access to excise tax records; and Sections 26 and 29 give access to crime and justice records. The act also stipulates that Statistics Canada is responsible for promoting the avoidance of duplication in the information collected by the various departments.

A memorandum of understanding (MOU) governs the release of administrative information to Statistics Canada. These documents say what the data are, when the data will be available, how much they will cost, and how and between whom the data will be shared. MOUs are lengthy, extremely detailed documents. For example, the MOU between the Canada Revenue Agency and Statistics Canada is over 100 pages. Creating an MOU is often difficult, involving negotiations that last years.

Another important aspect of the legal framework for linking survey data to administrative data is a pair of policies that govern these transactions: (1) the policy on informing survey respondents and (2) the policy on record linkage.

Currently, data from different sources cannot be linked unless the Statistics Canada policy committee approves of the linkage. This committee is the highest committee at Statistics Canada, chaired by the chief statistician. However, under the policy on record linkage, two omnibus record linkage authorities have been approved and allow linkages to be performed under certain circumstances without requiring separate approval by the policy committee.

The first authority is the omnibus record linkage authority for the economic statistics program, and it allows linkage of data for business surveys. The second authority is the omnibus record linkage authority for improving population and household survey programs, which allows linking data for three reasons: (1) to improve a survey (e.g., to improve stratification, nonresponse adjustment), (2) to study and assess survey data quality (e.g., to improve survey frame quality, assess disclosure risk), and (3) to aid in data collection (e.g., to add addresses or phone numbers). Record linkage is not allowed under these omnibus authorities, however, if the purpose of the linkage is to produce estimates for public release. To do this, approval is still required from the policy committee.

Trépanier also discussed the challenges and drawbacks Statistics Canada has experienced in using administrative data. Referencing points also made by Jelke Bethlehem about the Netherlands, she commented that researchers will never have the same control over administrative data that they have over statistical data. Even if a thorough evaluation of the administrative data is conducted before deciding to use them, there are still errors and risks that can jeopardize the process, and statistical agencies often are not informed about changes that can have these types of effects. Some of the major risks are summarized below:

  • Data may change or cease to be collected without warning for some parts of the population.
  • The concepts and definitions underlying data may not be exactly what is assumed or expected.
  • Often quality assurance by the organization collecting the administrative data is not comparable to what could have been put into place for purposes of statistical usage.
  • Timeliness of the data is frequently a problem.
  • The lack of stability in the administrative data program is also a danger.

Much like the United States, Canada is encountering many challenges with household surveys. Trépanier named decreasing response rates and increasing costs as the most important. Even in the Labour Force Survey (LFS), which is mandatory, there has been a slight decline in participation. There is also a perception of an increased response burden, not only due to requests for information from statistical agencies, but also from administrative agencies and the private sector.

Like the United States, Canada has considered ways of overcoming these challenges. The use of administrative data has been identified as one option because it allows for the reduction of sample size. Specifically, administrative data can be used to construct list frames, which in turn allow for stratified simple random sampling. List-type frames can make design simpler and more efficient.
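As an illustration of the sampling approach just described, the following is a minimal sketch of stratified simple random sampling from a list frame. The frame layout, the field names (`id`, `province`), and the allocation are hypothetical and were not part of the presentation.

```python
import random
from collections import defaultdict


def stratified_srs(frame, stratum_key, n_per_stratum, seed=0):
    """Draw a simple random sample without replacement within each stratum."""
    rng = random.Random(seed)

    # Group the units on the list frame by their stratum value.
    by_stratum = defaultdict(list)
    for unit in frame:
        by_stratum[unit[stratum_key]].append(unit)

    sample = []
    for stratum, units in sorted(by_stratum.items()):
        # Never ask for more units than the stratum contains.
        n = min(n_per_stratum.get(stratum, 0), len(units))
        for unit in rng.sample(units, n):
            # Design weight = stratum population size / stratum sample size.
            sample.append({**unit, "weight": len(units) / n})
    return sample


# Hypothetical list frame of 100 units in two strata.
frame = [{"id": i, "province": "ON" if i < 60 else "QC"} for i in range(100)]
sample = stratified_srs(frame, "province", {"ON": 6, "QC": 4})
```

Because the frame lists every unit with its stratum, the design weights follow directly from the stratum counts, which is one reason a list frame can simplify design.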

Administrative data are also helpful to use in indirect estimation (calibration). Administrative data may reduce the effort required to reach each respondent, and they may be able to provide better contact information for the sampling frame. They can also be used to help implement a more efficient collection strategy, such as responsive design. Using administrative data may help reduce the volume of data collected by partially or completely replacing survey data. Furthermore, they can reduce the impact of nonresponse.

There are multiple examples of how Statistics Canada has used administrative data, Trépanier said. Even before the passage of the omnibus record linkage authority, administrative data have been used to complement existing sampling frames, such as the Address Register (AR) mentioned earlier, with additional information on addresses and telephone numbers. The AR was substituted for the listing of approximately 40 percent of clusters in the last redesign of the LFS area frame. Administrative data have also been used in the random digit dialing frame to identify a working bank of telephone numbers and to add addresses for advance letters to the residences whose telephone numbers were selected for interview.

There are also instances of using administrative data for partial substitution of other survey data. For example, rather than collecting income from respondents as part of the 2006 census and other household surveys, such as the Survey of Labour and Income Dynamics (SLID) and the Survey of Financial Security (SFS), Statistics Canada asked respondents for permission to use income tax information instead. Currently, the permission rate is about 80 percent.

Trépanier explained that Statistics Canada has used administrative data for indirect estimation in the past. Specifically, they were used to improve consistency across surveys for income estimates using harmonized calibration for the SLID, the SFS, and the Survey of Household Spending (SHS). Statistics Canada used what is referred to as T4 information, or employers’ forms on salaries and wages. The number of employees by class of salaries and wages is used as a control total in the calibration in conjunction with the traditional calibration to demographic control totals. These methods were successful in improving consistency across survey estimates produced by these surveys. Administrative data have been used for direct estimates as well for tabulations of certain pension, health, justice, education, and travel statistics.
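To make the idea of calibrating to two sets of control totals concrete, here is a minimal sketch using raking (iterative proportional fitting), which is one common calibration technique. The presentation did not say which calibration method Statistics Canada used, and the variable names, categories, and totals below are invented for illustration.

```python
def rake(records, margins, max_iter=100, tol=1e-9):
    """Adjust weights so weighted counts match each set of control totals.

    records: list of dicts with a "weight" key plus categorical variables.
    margins: {variable: {category: control_total}}.
    """
    weights = [r["weight"] for r in records]
    for _ in range(max_iter):
        max_change = 0.0
        for var, totals in margins.items():
            # Current weighted count in each category of this variable.
            current = {cat: 0.0 for cat in totals}
            for r, w in zip(records, weights):
                current[r[var]] += w
            # Scale weights so this margin matches its control totals.
            for i, r in enumerate(records):
                factor = totals[r[var]] / current[r[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return weights


# Hypothetical respondents with equal starting weights.
records = [
    {"age_group": "young", "wage_class": "low", "weight": 10.0},
    {"age_group": "young", "wage_class": "high", "weight": 10.0},
    {"age_group": "old", "wage_class": "low", "weight": 10.0},
    {"age_group": "old", "wage_class": "high", "weight": 10.0},
]
# Hypothetical control totals: one demographic margin and one margin of
# employee counts by wage class (standing in for the T4-based totals).
margins = {
    "age_group": {"young": 30.0, "old": 20.0},
    "wage_class": {"low": 25.0, "high": 25.0},
}
calibrated = rake(records, margins)
```

If several surveys are calibrated to the same administrative totals, their weighted estimates of those totals agree by construction, which is the consistency benefit described above.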

Since the passage of the omnibus record linkage authority in 2008, one example of administrative data use has been the construction of a frame for the new Survey of Young Canadians. Neither households rotating out of the LFS nor a fresh sample of dwellings from an area frame was sufficient or cost-effective for generating a sample for this survey. Because of the need to sample from a unique population of respondents ages 1-18, Statistics Canada turned to the Canada Child Tax Benefit (CCTB) file. Every child ages 0-6 in Canada receives a monthly benefit, irrespective of family income, and the child is registered in the hospital at birth. Children who are no longer eligible for the benefit are also included; thus the database is quite comprehensive.

In comparing the 2006 CCTB file with that of the 2006 census, it was discovered that coverage in the CCTB was quite good: 93-97 percent per age per year. Income distributions between the two collections were also quite similar. However, the Survey of Young Canadians was planned primarily as a survey using computer-assisted telephone interviewing (CATI), and contact information was not in the file received by Statistics Canada. Arrangements were subsequently made with the Canada Revenue Agency to obtain contact information, Trépanier said.
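A coverage comparison of this kind amounts to dividing the administrative count by the census count for each single year of age. The sketch below shows the arithmetic only; the counts are made up for illustration and are not the actual CCTB or census figures.

```python
def coverage_by_age(admin_counts, census_counts):
    """Percent of the census count found in the administrative file, per age."""
    return {
        age: 100.0 * admin_counts.get(age, 0) / census_counts[age]
        for age in sorted(census_counts)
    }


# Made-up counts for two single years of age (not real data).
census_counts = {0: 1000, 1: 1000}
admin_counts = {0: 930, 1: 970}
rates = coverage_by_age(admin_counts, census_counts)
```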

In a field test of the survey, which was mostly a test of the contact information, 83 percent of the 1,000 test cases had a valid address on the file. Also worthy of note is that there was an anticipation of concern, particularly from parents, about the use of the CCTB to reach respondents, but the pretest indicated that this was not a problem. As an example of previously described potential drawbacks of administrative data, at some point the records of all persons over age 18 were removed from the database based on the argument that they were no longer eligible for the benefit, even though they would have been of interest for the survey.

Other efforts to centralize and improve tracing operations using administrative data currently pursued by Statistics Canada include samples sent to the Canadian Council of Motor Transport Administrators (CCMTA), which returns them with addresses from driver’s license information. Statistics Canada is also making greater use of the National Change of Address file that is created by Canada Post.

One recommendation put forth by the Vision for Administrative Data Task Force at Statistics Canada was to develop an explicit policy on administrative data, Trépanier said. Currently, Statistics Canada has a guiding principle for administrative data use but no official policy. In addition, centralizing processes for taking in and using administrative data need to be established, she said. This would entail creating an inventory of data and assigning management responsibility for each data source. There is also a push to mobilize existing resources, prioritize research, and establish a governance process on how to use administrative data.

For the future, Trépanier said, using administrative data to build sampling frames is of particular interest. There is the risk of coverage error in using an administrative database in constructing a frame, but if it is done in the context of using multiple other frames and calibration to correct coverage error, this is probably less of an issue. The ideal goal is a single frame, which is the approach used in building Statistics Canada’s Address Register, but this does not preclude the inclusion of auxiliary information. A single frame would allow for better coordination of samples and survey feedback, she said.

For data collection, one of the goals related to administrative data is to enable tracing. Statistics Canada wants to centralize the tracing process leading to the linking of all administrative data sources to make available the best contact information possible. This will require substantial effort, including a process to weigh the quality of the different sources and determine what contact information is most likely to be accurate. Another goal for administrative data could be to better understand the determinants of survey response and improve data collection procedures based on this information. For example, administrative data can provide guidance on preferred mode of data collection if one can assess whether persons who file their taxes electronically are also more likely to respond to an electronic questionnaire.
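One way to picture the process of weighing sources is a scoring rule that combines a quality rating for each source with how recently the contact information was updated. The sketch below is only an assumption about how such a rule might look: the source names, quality scores, and one-year half-life are hypothetical and do not describe Statistics Canada's actual procedure.

```python
from datetime import date

# Hypothetical quality scores per source (0 to 1); real values would have
# to come from an evaluation of each administrative source.
SOURCE_QUALITY = {
    "tax_file": 0.9,
    "drivers_licence": 0.8,
    "change_of_address": 0.7,
}


def best_contact(candidates, as_of, half_life_days=365):
    """Pick the candidate with the highest quality score discounted by age."""

    def score(candidate):
        age_days = (as_of - candidate["updated"]).days
        # The score halves for every half_life_days since the last update.
        recency = 0.5 ** (age_days / half_life_days)
        return SOURCE_QUALITY.get(candidate["source"], 0.0) * recency

    return max(candidates, key=score)


# Two hypothetical addresses for the same person from different sources.
candidates = [
    {"source": "tax_file", "address": "12 Elm St", "updated": date(2009, 1, 1)},
    {"source": "change_of_address", "address": "40 Oak Ave", "updated": date(2010, 12, 2)},
]
chosen = best_contact(candidates, as_of=date(2011, 1, 1))
```

In this example the older tax-file address loses to the recent change-of-address record even though the tax file has the higher base score, which is the kind of trade-off a centralized tracing process would need to settle.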

Statistics Canada has been successful in using substitution of income data from tax records, and this is likely to be continued. It is not yet clear, however, whether other information is available that could replace survey data. Investigating these options is done with caution because of the risks discussed earlier. There is also the problem of ensuring consistency between survey and administrative data across variables.

Administrative data can also assist researchers in better understanding nonresponse bias and the impact of lower response rates. Finally, they can help both reduce the volume of data collected in surveys and improve estimation. Now that Statistics Canada has the omnibus record linkage authority in place, exploring all of these options has become a much easier process.

DISCUSSION

The discussion of the various methods used in the collection of household data began with several questions about the Canadian system of household surveys. Kathleen Styles (Census Bureau) asked for clarification on the omnibus record linkage authority: specifically, how did that come to pass, what was the motivation, and what did it hope to accomplish? Trépanier answered that it was established after someone realized that requests for linkage were going to the policy committee quite frequently (about every two weeks) and that many of these linkage requests were similar in nature. This process became burdensome, particularly considering that the requests generally did not involve disseminating administrative data. Since a record linkage authority already existed on the business side, that was extended for use in the area of linking social and survey data as well. But it is important to remember that the omnibus authority was designed to be used for evaluations that could improve surveys, not to disseminate administrative data sources. And although going to the policy committee is no longer necessary, the Access Division at Statistics Canada must be notified of the administrative data use so that it can make an inventory of all the linkages.

Styles followed up her question with another one about registers. A register of persons is a loaded issue, but does Statistics Canada have permanent files that are intended to represent all Canadian residents? In the discussion of tracing and a centralized address frame, it seemed as if this may be similar to a register. Trépanier responded that the central processes for tracing are under construction now. As for the Address Register, the plan is not necessarily to use it for all of Canada. As Tambay said earlier, the AR will be good for listing in urban areas, but it is likely that there will still be a need for an area frame, particularly for rural areas.

Cynthia Clark asked Trépanier to clarify under what circumstances Statistics Canada is required to obtain consent for the use of tax data. Trépanier said that one interpretation of the Statistics Act is that permission is only necessary if administrative data were to be used in conjunction with other survey data. In those cases the respondent would need to be informed that the data are being linked.

Graham Kalton reminded the participants that according to Trépanier’s presentation, the SLID obtains permission from a high proportion of respondents for the use of tax records, but about 15 percent refuse to grant permission. But researchers still have access to all the records. Is Statistics Canada now allowed to match those records together to evaluate the returns? How is this problem handled? Would it be better not to ask permission and just use the records?

Trépanier said that they were interested in conducting a study of the SLID respondents who refused access to their tax records, but it turned out that the way they are currently asking for permission is very general, and this precludes the linkage if respondents refuse.

A discussion participant asked Martinez for clarification on the integration of administrative health data, specifically, whether a linkage of the National Health and Nutrition Examination Survey (NHANES) to states is the issue under consideration or whether something more elaborate is planned. Martinez replied that, initially, the primary files being linked would be Health Interview Survey data with Centers for Medicare & Medicaid Services data, using mostly the Medicare files. The NHANES linkages to some state files are part of the other pilot study, the nutrition and food assistance project.

Jay Ryan (Bureau of Labor Statistics) is interested in new data collection technologies and asked Dillman what kind of research is being done with text messaging for survey contact, particularly now that text messaging has become so prevalent. Also, how will the shift to larger cell phone screens, particularly in the case of smart phones and tablet PCs, affect data collection? Phillip Kott agreed that text messaging is becoming an increasingly important mode of communication among young people in particular, who often consider phone calls rude and expect a text message even before agreeing to talk to someone on the phone.

Dillman said that he was not aware of much research on text messaging, but this was something he has thought about, particularly what kind of coverage it would entail and the type of people most likely to use it. He added that he suspects that people who use text messaging frequently may be quite different from those who do not. Another concern related to this technology is that if people read text messages on the go, they are not going to stop to fill out a survey, because they are probably not in a good place to do that.

On smart phones and tablet PCs, Dillman said that the screens of many of these are still too small. Still, surveys will eventually be constructed for these devices. He predicted that the first study of surveys on smart phones and tablet PCs will happen as early as spring 2011.

This issue is a challenge even in the case of those who rely on email as their primary form of communication, Dillman continued. In the studies he has conducted of both mail and email contacts to entice survey participation, he received a higher response when a questionnaire was sent via postal mail than when an email response was requested. Young people also tend to go to paper first. The bottom line, however, is that little progress will be made on electronic surveys if all that is done is to send an email and then expect people to respond. Even for young people, surveys will need to do something different. This sometimes results in a higher cost for web surveys than mail.

Keith Rust noted that, in Westat’s studies of mode choice, many respondents use more than one mode, which means that responses have to be unduplicated. This may be because respondents use a mode that is convenient to them and then use another one in addition to respond to the survey because they think that is what the administrators of the survey want them to use.

Dillman replied that it is critical that researchers be very clear about what is requested of respondents. For example, if a web response is preferred, the survey should state that and explain the reasons. Even then, giving a person a paper questionnaire and then asking for a response by another mode, such as the web, is a challenge: the paper is right there in hand, whereas responding by web requires waking up the computer and typing in a complex URL.
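The unduplication step that Rust mentioned can be sketched as follows. The rule used here (keep the response from the preferred mode, breaking ties by earliest receipt date) is an assumption chosen for illustration; the discussion did not describe how Westat resolves duplicates, and the field names are hypothetical.

```python
def unduplicate(responses, mode_priority=("web", "mail", "phone")):
    """Keep one response per respondent: preferred mode first, then earliest."""
    rank = {mode: i for i, mode in enumerate(mode_priority)}
    kept = {}
    for r in responses:
        # Lower key wins: preferred mode, then earlier ISO-format date.
        key = (rank.get(r["mode"], len(rank)), r["received"])
        rid = r["respondent_id"]
        if rid not in kept or key < kept[rid][0]:
            kept[rid] = (key, r)
    return [r for _, r in kept.values()]


# Hypothetical returns: respondent 1 answered by both mail and web.
responses = [
    {"respondent_id": 1, "mode": "mail", "received": "2011-01-05"},
    {"respondent_id": 1, "mode": "web", "received": "2011-01-09"},
    {"respondent_id": 2, "mode": "mail", "received": "2011-01-07"},
]
unique = unduplicate(responses)
```

Whatever rule is chosen, it should be fixed before data collection so that the choice between duplicate responses does not depend on their content.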

Jelke Bethlehem asked Dillman for clarification on his advice not to use CATI and computer-assisted personal interviewing (CAPI) in mixed-mode surveys but rather use mail and emails. One of the Statistics Netherlands surveys follows up web contact with mail, then CATI, and then CAPI. Does Dillman recommend that the CATI and CAPI follow-up steps be abandoned in this survey?

Dillman clarified that he was not suggesting that any of the modes should be abandoned. Different situations call for different modes. It is, however, increasingly difficult to conduct a conversation with people over the telephone, because that is not how the telephone is used anymore. Society has evolved so that people control the phone, and they use it when they want to. It used to be that they had to answer the phone or miss a call. Changes in culture are contributing to the decline of phone surveys more than changes in technology. The technology just made the culture change possible.

Federal household surveys today face several significant challenges including: increasing costs of data collection, declining response rates, perceptions of increasing response burden, inadequate timeliness of estimates, discrepant estimates of key indicators, inefficient and considerable duplication of some survey content, and instances of gaps in needed research and analysis. The Workshop on the Future of Federal Household Surveys, held at the request of the U.S. Census Bureau, was designed to address the increasing concern among many members of the federal statistical system that federal household data collections in their current form are unsustainable. The workshop brought together leaders in the statistical community to discuss opportunities for enhancing the relevance, quality, and cost-effectiveness of household surveys sponsored by the federal statistical system.

The Future of Federal Household Surveys is a factual summary of the presentations and related discussions that transpired during the workshop. This summary includes a number of solutions that range from methodological approaches, such as the use of administrative data, to emphasis on interagency cooperative efforts.
