THE FUTURE OF FEDERAL HOUSEHOLD SURVEYS
4 Collection of Household Data
NEW DATA COLLECTION MODES AND THE CHALLENGE OF MAKING THEM EFFECTIVE
Don Dillman (Washington State University) began his presentation by saying that surveys are now in a period of tailored design, in which different modes and implementation procedures are appropriate for different situations. The focus of his talk was on the challenges this new period presents.
An understanding of the evolution of survey modes and designs is important for gaining perspective on the current system. First was the transition from face-to-face interviews to telephone. Dillman recalled that his own experience with this transition was relatively easy, because face-to-face methods could readily be adapted to telephone surveys. Households had about 78 percent telephone coverage in 1970, and this number seemed to be increasing, making the transition increasingly feasible.
Household survey methods, including sampling approaches, could reasonably be applied to telephone, he said. The interviewer’s role in telephone surveys is similar to that in a face-to-face interview in terms of reading items, clarifying questions, and relying on hidden categories (categories that are not offered to the respondent), as needed. The main differences are that show cards need to be eliminated, scales have to be shortened to achieve the same level of comprehension, and questions sometimes need fewer words to be understood aurally. Another difference is that supervisors are more accessible during telephone than face-to-face interviews.
Bringing email and the web into data collection was a more difficult transition. Currently, approximately two-thirds of households have Internet access and use it with some regularity, leaving a possible one-third of households unable to respond to a survey over the Internet from home. Another problem that arises with creating a sample of Internet respondents is that it is harder to implement a within-household random selection because some householders lack Internet skills. In the case of some households, this phenomenon may be related to a division of labor: just as some people do the laundry and some take care of cars, a particular person in a household may use the Internet. Furthermore, survey organizations generally do not have email addresses that would enable them to send respondents links to Internet surveys, unless a prior relationship exists. Even if this could be resolved, it is likely that response to an initial email invitation would be quite low.
Meanwhile, the telephone is losing its viability as a survey mode option. There are many reasons for this, including the increasing use of cell phones (although these can sometimes be added to a frame), the decreasing reliance on landlines (current coverage is less than 75 percent of households), and increasingly blurred lines when it comes to the geography of phone numbers.
American culture has also changed. People no longer use the telephone for most business interactions unless they have to, and they tend to exercise more control over their devices than in the past, by not always answering calls. The telephone itself now fulfills a variety of functions, often serving as a personal computer. However, the screen space available for a web questionnaire is small, and entering text on a telephone is prone to error. Finally, responding to a survey on a phone device often cannot be combined well with other activities the potential respondent may be doing while accessing the Internet.
Changes related to the telephone and the continuing limitations of Internet access suggest that, in the near future, there will be more reliance on mixed-mode survey designs to collect data. Dillman devised a typology of the ways data collection modes are most commonly mixed (Dillman et al., 2009), summarized in Box 4-1. The first type involves the use of a particular mode to encourage people to respond by another mode (typically, the Internet). In a sense, this is still a single-mode study, and therefore measurement differences between modes are not as big a concern as they might be otherwise. In the second type, one mode is used to ask some of the questions, and another mode to ask others, such as more sensitive questions. In practice, this interview technique often entails an interviewer simply turning a laptop around during a face-to-face interview so that the respondent can self-administer part of the interview. A third type of mixed-mode design involves using different modes of administration for different types of respondents. A fourth approach, typically used in longitudinal studies, employs one interview mode for the first interview and another mode for the second and subsequent interviews.
BOX 4-1 Typology of Mixed-Mode Surveys
Type 1: One mode for data collection, another mode for selection/encouragement.
Type 2: One mode to ask certain questions, another mode for additional questions.
Type 3: One mode for some respondents, another mode for other respondents.
Type 4: One mode for Time 1 data collection, another for Time 2 data collection.
SOURCE: Workshop presentation by Don Dillman.
Dillman pointed out that it is important to remember when combining different modes of administration that sometimes achieving one survey objective may get in the way of another. For example, improving response rates by offering alternative modes of responding may introduce measurement differences, or reducing costs may conflict with obtaining quicker responses.
There are also several significant barriers to wider adoption of mixed-mode designs, he said. There is a tendency among survey professionals to construct survey questions differently for different modes, and part of the reason for this is the desire to maximize the design for a specific mode. Visual (self-administered) versus aural (telephone) presentations, in particular, have different requirements. For example, in the face-to-face mode, show cards can be used for answer choices, scales are often fully labeled, questions and questionnaires tend to be longer, and some of the answer options can be made available to the interviewer without explicitly offering them to the respondent (such as “Don’t know” or “Refused”). In the telephone mode, scales tend to be shorter and are presented without all categories labeled, questionnaires are shorter, complex branching formats can be used without affecting respondent comprehension, and, as in the face-to-face mode, some answer options can be made available without being explicitly offered. The mail mode encourages less question branching but can accommodate longer, more complex scales. Open-ended question formats are avoided when possible, and response categories cannot be hidden. A web mode encourages required answers and fewer “don’t know” options.
In the web mode, fill-ins from previous answers are possible. Audio, video, and other add-ons are possible, and typically there are no hidden categories. Unintentional mode-related construction differences can often lead to significant differences in the distribution of the answers provided.
Research has shown that the visual layout of survey items influences answers. Dillman highlighted the 24 most significant concepts in visual design (see Box 4-2). As an example of different design requirements for visual communication, he described a challenge encountered by the National Science Foundation while designing one of its web surveys. The goal was to obtain date information from respondents using two digits for month and four digits for year, in adjacent character spaces. Cognitive interviewing revealed that respondents will attempt a variety of approaches to answering a date question (e.g., using alphabetic abbreviations for the month) and that they get frustrated when they receive an error message. This led to extensive testing of this question over a period of four years.
BOX 4-2 Visual Design Concepts That Matter
Attention and visual processing: Preattentive processing, Attentive processing, Useful field of view, Foveal view, Top-down processing, Bottom-up processing
Visual features that influence the expression of words, numbers, and symbols: Figure/ground composition, Size, Shape, Location, Spatial arrangement, Color, Brightness, Contrast
Languages that give independent meaning to information on a page: Words, Numbers, Symbols
Grouping principles: Pragnanz (law of simplicity), Proximity, Elemental connectedness, Common region, Continuity, Closure, Common fate
SOURCE: Workshop presentation by Don Dillman.
[FIGURE 4-1 Summary of web experiments. SOURCE: Workshop presentation by Don Dillman.]
Figure 4-1 shows that changes in visual formatting led to large differences. According to the law of proximity in Gestalt psychology, items that are placed close together or connected are perceived as belonging together, which signals to respondents that they should treat them the same way. When this principle was applied in experiments, 55 percent of respondents filled in the boxes correctly. If the month box was smaller and the year box a little larger, 63 percent filled in the boxes correctly. When the symbolic language MM, YYYY was added to the respective boxes, this yielded 87 percent correct responses. Finally, when boxes and symbolic language were arranged in natural reading order, 96 percent of respondents provided responses in the desired format.
Dillman also described some experiments to address the issue of visual versus aural presentation. In one study, he asked respondents in three different ways when they began their studies: (1) When did you begin your studies? (2) What date did you begin your studies? and (3) What month and year did you begin your studies? On the web survey, there was little difference in the percentage of students using the preferred MM/YYYY format. However, over the phone, the differences between the distributions of the responses were drastic. The percentage of respondents reporting month and year was 13.4 percent in the “when” condition, 49.5 percent in the “what date” condition, and 83.7 percent in the “what month and year” condition.
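As a rough illustration of the date-question problem described above, the following Python sketch shows one way a web instrument might accept several of the answer styles respondents attempt (numeric month, alphabetic abbreviation, year first) and convert them to MM/YYYY rather than returning an error message. The function name, the set of accepted formats, and the English month names are assumptions made for this example; nothing here is taken from the National Science Foundation survey or from Dillman's experiments.

```python
import re

# English month names; an assumption for this sketch only.
MONTH_NAMES = ["january", "february", "march", "april", "may", "june",
               "july", "august", "september", "october", "november",
               "december"]


def month_from_word(word):
    """Map a month name or an abbreviation of 3+ letters to 1..12, else None."""
    word = word.lower()
    if len(word) < 3:
        return None
    for number, name in enumerate(MONTH_NAMES, start=1):
        if name.startswith(word):
            return number
    return None


def normalize_month_year(answer):
    """Return the answer as 'MM/YYYY', or None if it cannot be interpreted."""
    # Split the answer into runs of letters and runs of digits.
    tokens = re.findall(r"[A-Za-z]+|\d+", answer)
    if len(tokens) != 2:
        return None
    month_tok, year_tok = tokens
    # Accept year-first answers such as "2008-03" by swapping the tokens.
    if month_tok.isdigit() and len(month_tok) == 4:
        month_tok, year_tok = year_tok, month_tok
    if not (year_tok.isdigit() and len(year_tok) == 4):
        return None
    if month_tok.isdigit():
        month = int(month_tok)
    else:
        month = month_from_word(month_tok)
    if month is None or not 1 <= month <= 12:
        return None
    return f"{month:02d}/{year_tok}"
```

In this sketch, answers that cannot be interpreted (for example, free text such as "last fall", or a full date with a day) return None, so the instrument could show a targeted prompt instead of silently guessing. Whether lenient parsing of this kind is preferable to the visual design changes tested in the experiments is a design question the sketch does not settle.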
Of course, in the case of telephone interviews, the interviewer can act as an “intelligent system” that converts the responses to the desired format. That luxury does not exist in a web mode, forcing researchers to think of questionnaire construction differently and to invoke theoretical concepts on visual information processing.
Another issue related to different modes of administration involves scalar questions. The concepts of social desirability, acquiescence, primacy, and recency have often been used to explain why people respond the way they do, but Dillman argued that these concepts often do not explain mode differences. He and colleagues conducted several experiments to examine whether using the same wording for scalar questions will produce the same answers in aural as in visual presentations. The experiments involved a variety of scales, including 5-point, 7-point, fully labeled, and polar point labeled scales. Regardless of the scale type, each of the experiments resulted in slightly more positive responses on the telephone than on the web. The point here is that a consistent body of evidence is building that mode makes a difference in responses.
A line of research Dillman is particularly interested in involves combining two visual modes of data collection and avoiding the aural mode. Sending an email request as a first contact is typically not appropriate in cross-sectional household surveys, unless there is an existing relationship with the sample members or they are part of a longitudinal study. When given a choice of mail or web response, through mail contact, people tend to opt for mail, and overall response rates are lower. Requests for web-only responses typically result in low response rates. However, and despite declining response rates for most modes of data collection, response rates in mail surveys, particularly with prior screenings or incentives, tend to remain fairly high.
Some of the reasons can be explained by social exchange theory, and such concepts as rewards/benefits, burden/costs, and trust in the delivery of benefits. Social exchange theory could serve as a guide for other self-administered modes, such as the web, and for mixing modes in order to avoid having to rely on email only to obtain web responses and postal contacts only to get postal responses.
In many ways the Internet is different. There are problems with using it for surveys: the burden can be greater when responding to a survey via the Internet, particularly if going from postal letter to the computer; computer literacy is low for some respondents; there are operational issues (Does the computer work properly, or at all?); and emails from strangers can be harder to find or get lost more easily after the first day in one’s email inbox.
The benefits of Internet surveys vary. Technology is easier to deal with for some than for others. For some, there may be faster ways of responding. With an Internet survey, there is no need to try to find a mailbox to return a questionnaire. But with Internet surveys, trust is a significant concern. People do not like to open email from strangers, the sources of emails and websites can be faked, and there is the ever-present threat of downloading a virus or other malicious software. This last issue represents an area in which government agencies may have an advantage: people tend to trust communication coming from a government authority much more than any other potential survey contact.
Still, if people are given a choice of responding by either mail or Internet, most will choose mail. And, if mail is withheld to encourage respondents to use the web, research has shown that the respondents who end up participating during follow-up are very different from one mode to another. Dillman noticed in his research, however, that if an address-based sample is used to try to push people to the web, the result is a greater response from an advance postal token incentive for the mail-plus-web combination than for just the mail response alone.
Email tends to cut the burden of web response because it brings respondents closer to their response mode preference. In essence, what will best bring postal, email, and web contacts together to obtain more responses by web is to begin integrating two modes, rather than forcing all web options together or mail options together.
In Dillman’s view, it is important for the survey community to bring together token cash incentives, mode choice, and email augmentation in trying to move forward. New options like address-based sampling and the sequential use of modes need further exploration but have great potential. He ended by saying that the transition to the web is desirable, but it is going to be difficult. A positive development is that Dillman’s experiments that were based on address-based samples have yielded two-thirds of the responses over the web, which three or four years ago would not have been possible. However, coverage limitations suggest the need to use another mode (most likely mail) to at least deliver the request. This also raises concerns about mode differences.
Evidence is mounting that the aural and visual modes sometimes produce different responses.
INTEGRATING ADMINISTRATIVE RECORDS INTO THE FEDERAL STATISTICAL SYSTEM 2.0
The focus of the presentation by Rochelle Martinez (Office of Management and Budget) was to illustrate what the statistical system could do to address barriers to making greater use of administrative records. For the past few years, interesting work has been going on to try to build capacity to use more administrative records, particularly with demographic data collection. Her talk specifically addressed the work going on across the statistical system, coordinated by the Office of Management and Budget (OMB). She discussed initiatives in the president’s budget and recent events related to administration support for these activities.
For many years, members of the statistical community have said that administrative records can and should be used more fully in the federal statistical system and in federal programs. The use of administrative records in the Netherlands and other countries gives a good flavor of the kinds of things the statistical system can envision doing in the United States to varying degrees. There are also areas, however, in which substantial work has already been done in the U.S. context. Most notably, administrative records have been used in economic statistical programs since the 1940s. There are also good examples of administrative data use with vital statistics, population estimates, and other programs across several federal statistical agencies.
Martinez mentioned that Kenneth Prewitt, former director of the U.S. Census Bureau, often talks about another reason that administrative records hold potential: the need for innovation. He has said that he is less concerned about the federal statistical system with regard to relevance and integrity than he is about innovation, in particular about how prepared statistical agencies are for the innovation necessary to navigate the new world. In many cases, national information systems are increasingly reliant on administrative data and, in some instances, on data from the commercial sector. Prewitt’s greatest concern is that government agencies seeking statistical information about the population will bypass statistical agencies altogether as they turn to the parts of the government that control large administrative data sets.
Martinez said that she sees this happening in some federal agencies right now. Offices that are collecting data for administrative purposes can (at least reportedly) produce a statistical result much more quickly than the principal statistical agency in that department. For a congressional or public affairs office, this is very appealing. Those in the statistical system can think of reasons why that might be a problem, but these offices may not.
The best-case scenario is that there are multiple estimates in the public domain that somebody has to be able to explain. The worst case is that somebody thinks that a statistical agency is less relevant and less timely and therefore that its data are less useful than the administrative data source. At OMB, Open Government and Data.gov initiatives encourage putting many more administrative data sets in the public domain, where they can be used for a variety of purposes, so these issues need to be addressed across the system.
Members of the Federal Committee on Statistical Methodology (FCSM) wanted to facilitate statistical agency use of administrative records. To explore how to achieve this, an interagency subcommittee was formed. This group created a set of products that the statistical community may find useful going forward.
The first product to come out of the subcommittee was a set of case studies, “Profiles in Success,” focusing on projects that had successfully acquired and used administrative data in a statistical project. Martinez said that the case studies were quite useful in helping the subcommittee members identify systematic barriers to greater use of administrative records. It is these barriers that the group has tried to address head-on in recent months and years.
Following the “Profiles” product, the subcommittee turned to awareness activities, in part to dispel myths about the difficulty of using state administrative records data. This group found many good examples of successful administrative data use in research and, in some cases, production. The subcommittee wanted to highlight the necessary success factors for using administrative data, and the statistical community has been very receptive. As a result, the subcommittee has been asked to develop training and other activities to help data users navigate the difficult world of acquiring and using administrative data.
A subsequent product for the toolkit, she said, was a model agreement. Getting an agreement in place for data sharing and usage between agencies is often a drain on time and money. Thus, the subcommittee has created a model agreement that agencies can use to facilitate the data-sharing process. Although many aspects of such agreements can be covered in a template, not all can, so there will be tailoring to some extent. The idea behind model agreements is to reduce front-end costs, because so many projects either die on the vine at this stage or use too many project resources, leaving fewer resources for the research.
Another product created by the subcommittee is related to informed consent. The informed consent product is an in-depth look at legal requirements across federal agencies, current practices for informed consent at statistical agencies, and current practices at administrative agencies. It also synthesizes research on informed consent wording in the context of data sharing and record linkage. This product is likely to help the statistical system in terms of best practices for new activities going forward. It will also provide guidance on how to meet requirements for projects for which administrative data were collected before there was an identified statistical use for them.
The subcommittee has also done some work on data quality, with the goal of creating tools for data quality measurement and documentation, but it is far from complete.
As a result of the subcommittee’s work, Martinez went on, at least four barriers to using administrative data crystallized. One of these barriers is statistical agency access to administrative data. Statistical agencies have statutes that are designed to protect the confidentiality of data, and they consider themselves very much stewards of data. But despite these provisions and helpful language in the Privacy Act, statistical uses of administrative data are sometimes difficult to achieve. In many departments, program offices have data on which the legislation is either silent, unclear, or perhaps narrow in terms of the kinds of uses that are considered appropriate.
There is also an issue of incentives; program offices may not think it worth the effort to figure out how to address a statistical agency’s request for data. Whose job is it to work with the statistical agency? It can be very time-consuming to identify variables that are needed or to work with an agency to understand what data they have now or how these could be used. Negotiating agreements is a practical product that comes out of these discussions. Some agencies spend years and years trying to obtain administrative data. Statistical agency access to administrative data may be the most important barrier because, without access, projects cannot be undertaken.
A second, somewhat related barrier is what the subcommittee has termed inadequate infrastructure, referring to the infrastructure at both the statistical office and the administrative office. There is an administrative infrastructure needed to address such issues as the process for requesting data and approving the request. Technical infrastructure can require a significant investment of time and resources on the statistical side. But even on the administrative agency side, someone has to be able to extract and transfer the data. The subcommittee thinks that infrastructure is lacking in many of these cases.
The third barrier is administrative data quality. Although survey data are not perfect, agencies have the capability to describe and to understand their quality. In other words, there are a lot of measurement tools for survey data that do not yet exist for administrative records. Some have assumed that administrative data are a gold standard of data, that they are the truth. However, others in the statistical community think quite the opposite: that survey data are more likely to be of better quality. Without a common vocabulary and a common set of measurements between the two types of data, the conversation about data quality becomes subjective. Another significant data quality issue for statistical agencies is the bias that comes with the refusal or the inability to successfully link records. In addition to the quality of the administrative data as an input, the quality of the data as they come out of a linkage must be considered as well.
The final barrier has to do with researcher access.
This includes researchers both internal and external to the government. Sometimes an afterthought, this barrier concerns creating the documentation needed to make a file, particularly a linked file, useful for someone outside the project. There are issues of documentation and of providing disclosure protection to a linked file. For this reason, linked files are very rarely public-use files. Few methods for restricted access have been devised beyond those that existed for projects before record linkage was a focus. Many of these linked files have been created and not really used by people outside the immediate project, and that is a concern both in terms of the utility of what has been created and for data quality.
Martinez said that some initiatives in the president’s fiscal year 2011 budget should help further the subcommittee’s goal of promoting the use and exchange of administrative data. Specifically, three major pilot studies have been proposed, two for the Census Bureau (2010 Census Simulation Pilot and Health Data Pilot) and one for the Economic Research Service (Nutrition and Food Assistance Pilot).
Together, these three pilot studies are designed to address all four barriers. Although the barriers will not be resolved in a year, agencies can certainly begin to address them in ways that benefit the entire federal statistical system. Martinez emphasized that the notion of a common good was very important in proposing the initiative.
The first pilot project is designed to use both government and commercial administrative data to see if it is possible to simulate 2010 census results. Outcomes envisioned include advancing both knowledge about and measurement of the quality of many administrative record data sets. Ideally, this will inform not only the decennial census but also other demographic surveys.
In Martinez’s view, this project is also critical to setting up an infrastructure. Some consider the Census Bureau to be the ideal place for this, because it is thought to be big enough and stable enough to handle a large number of different files and many different activities. This is why the Census Bureau also received much of the funding; it would be much less efficient to attempt to build up infrastructure at multiple statistical agencies than to centralize the technology, capacity, expertise, and synergy.
The second pilot project is related to the first one and is also housed mostly at the Census Bureau. The idea is that the Census Bureau has the capacity and stabilizing infrastructure that enables it to provide record linkage services to other federal statistical agencies. The National Center for Health Statistics (NCHS) has agreed to be the pilot agency to provide identifiers from multiple health-related administrative and survey data sets to the Census Bureau to link and return to NCHS. The overarching concept behind this pilot study is that record linkage is a service, a line of business that the Census Bureau could provide to agencies that are smaller or that lack similar capacity.
A vision for the future is to centralize to some degree the expertise and the hands-on experience with different data files while still retaining the benefit of having a subject-matter agency, such as NCHS, getting back the data and using them both for subject-matter research and for providing access to other health researchers.
The goal of the third pilot project, the nutrition project, is to help the statistical community better understand how to acquire and use state administrative records for statistical research and to demonstrate the utility of such data for program evaluation. The hope is that this project can help identify a model in which these data might be acquired in a more centralized way. This project also helps to bring together multiple agencies that are interested in state data.
Although a primary goal of the pilots is to address the barriers outlined, Martinez said that these projects have also created interest among policy officials because of the ability to learn more from a subject-matter perspective. To make any of these ideas happen, it is essential that administrative agencies be included in the conversations about these uses of their data.
To that end, OMB has recently issued a memorandum encouraging federal agencies to share data in order to meet the needs of several administration initiatives, including statistical data projects. This demonstrates that administration officials are supportive of these efforts to increase the use of administrative data. The support of senior officials will be necessary, she said, because a move to expand administrative data use necessarily entails difficult conversations about legal and policy issues regarding data access.
Martinez added that all of the work she described was sponsored by the Interagency Council on Statistical Policy (ICSP). The ICSP is composed of the heads of the principal statistical agencies. Among these agency heads a subgroup has been focused on developing a vision beyond the three pilot projects. She said that among agency heads and project teams alike, there is continued enthusiasm for these projects, and they are hopeful that the studies can continue to move forward in an uncertain budget environment.
Despite operating under a continuing resolution, project teams have already been working on the aforementioned pilot projects. These groups would like to involve more researchers in the projects to help think through some of the issues that crop up in the course of the work. Furthermore, it is very important that not only federal statistical agencies, but also the professional statistical community, and particularly those working in the states, contribute to this conversation.
THE ROLE OF ADMINISTRATIVE RECORDS IN HOUSEHOLD SURVEYS: THE CANADIAN PERSPECTIVE
Julie Trépanier (Statistics Canada) described her agency’s use of administrative records in household surveys. To set the stage for this perspective, she outlined official legislation, policies, and guidelines that govern administrative data use in Canada.
Statistics Canada’s guiding principle (though not a policy) is to use administrative records whenever they present a cost-effective alternative to direct data collection. Section 13 of the Statistics Act allows Statistics Canada to obtain administrative data files from any organization for the purposes of the law.
It also specifies some rights of access to administrative data. Specifically, Section 24 gives Statistics Canada the right to use income tax records; Section 25 gives access to excise tax records; and Sections 26 and 29 give access to crime and justice records. The act also stipulates that Statistics Canada is responsible for promoting the avoidance of duplication in the information collected by the various departments.

A memorandum of understanding (MOU) governs the release of administrative information to Statistics Canada. Such a document specifies what the data are, when the data will be available, how much they will cost, and how and between whom the data will be shared. MOUs are lengthy, extremely detailed documents. For example, the MOU between the Canada Revenue Agency and Statistics Canada is over 100 pages. Creating an MOU is often difficult, involving negotiations that last years.
47 COLLECTION OF HOUSEHOLD DATA Another important aspect of the legal framework for linking survey data to administrative data is the pair of policies that govern these transactions: (1) the policy on informing survey respondents and (2) the policy on record linkage. Currently, data from different sources cannot be linked unless the Statistics Canada policy committee approves the linkage. This committee is the highest committee at Statistics Canada, chaired by the chief statistician. However, under the policy on record linkage, two omnibus record linkage authorities have been approved that allow linkages to be performed under certain circumstances without requiring separate approval by the policy committee. The first authority is the omnibus record linkage authority for the economic statistics program, and it allows linkage of data for business surveys. The second authority is the omnibus record linkage authority for improving population and household survey programs, which allows linking data for three reasons: (1) to improve a survey (e.g., to improve stratification, nonresponse adjustment), (2) to study and assess survey data quality (e.g., to improve survey frame quality, assess disclosure risk), and (3) to aid in data collection (e.g., to add addresses or phone numbers). Record linkage is not allowed under these omnibus authorities, however, if the purpose of the linkage is to produce estimates for public release. For that purpose, approval is still required from the policy committee.

Trépanier also discussed the challenges and drawbacks Statistics Canada has experienced in using administrative data. Referencing points also made by Jelke Bethlehem about the Netherlands, she commented that researchers will never have the same control over administrative data that is possible over statistical data.
Even if a thorough evaluation of the administrative data is conducted before deciding to use them, there are still errors and risks that can jeopardize the process, and statistical agencies often are not informed about changes that can have these types of effects. Some of the major risks are summarized below:

• Data may change or cease to be collected without warning for some parts of the population.
• The concepts and definitions underlying data may not be exactly what is assumed or expected.
• Often quality assurance by the organization collecting the administrative data is not comparable to what could have been put into place for purposes of statistical usage.
• Timeliness of the data is frequently a problem.
• The lack of stability in the administrative data program is also a danger.

Much like the United States, Canada is encountering many challenges with household surveys. Trépanier named decreasing response rates and increasing costs as the most important. Even in the Labour Force Survey (LFS), which is mandatory, there has been a slight decline in participation. There is also a perception of an increased response burden, due not only to requests for information from statistical agencies, but also to requests from administrative agencies and the private sector.

Similar to the United States, Canada has considered ways of overcoming these challenges, and the use of administrative data has been identified as one option, because it allows for the reduction of sample size. Specifically, administrative data can be used to construct list frames, which can in turn be used to allow for stratified simple random sampling. List-type frames can make design simpler and more efficient.

Administrative data are also helpful in indirect estimation (calibration). Administrative data may reduce the effort required to reach each respondent, and they may be able to provide better contact information for the sampling frame. They can also be used to help implement a more efficient collection strategy, such as responsive design. Using administrative data may help reduce the volume of data collected by partially or completely replacing survey data. Furthermore, they can reduce the impact of nonresponse.

There are multiple examples of how Statistics Canada has used administrative data, Trépanier said. Even before the passage of the omnibus record linkage authority, administrative data were used to complement existing sampling frames, such as the Address Register (AR) mentioned earlier, with additional information on addresses and telephone numbers. The AR was substituted for the listing of approximately 40 percent of clusters in the last redesign of the LFS area frame. Administrative data have also been used in the random digit dialing frame to identify a working bank of telephone numbers and to add addresses for advance letters to the residences whose telephone numbers were selected for interview.

There are also instances of using administrative data for partial substitution of other survey data.
For example, rather than collecting income from respondents as part of the 2006 census and other household surveys, such as the Survey of Labour and Income Dynamics (SLID) and the Survey of Financial Security (SFS), Statistics Canada asked respondents for permission to use income tax information instead. Currently, the permission rate is about 80 percent.

Trépanier explained that Statistics Canada has used administrative data for indirect estimation in the past. Specifically, they were used to improve consistency across surveys for income estimates using harmonized calibration for the SLID, the SFS, and the Survey of Household Spending (SHS). Statistics Canada used what is referred to as T4 information, or employers’ forms on salaries and wages. The number of employees by class of salaries and wages is used as a control total in the calibration, in conjunction with the traditional calibration to demographic control totals. These methods were successful in improving consistency across the estimates produced by these surveys. Administrative data have been used for direct estimates as well, for tabulations of certain pension, health, justice, education, and travel statistics.
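
To make the idea of calibrating to two sets of control totals concrete, the following is a minimal sketch in Python. It uses raking (iterative proportional fitting) as a stand-in, since the presentation did not specify the calibration estimator used by Statistics Canada. The function name `rake_weights`, the category labels, the design weights, and the control totals are all hypothetical and chosen only for illustration.

```python
def rake_weights(units, margins, max_iter=100, tol=1e-10):
    """Adjust survey weights so weighted counts match each set of control totals.

    units   : list of dicts, each with a 'weight' and one key per calibration variable
    margins : {variable: {category: control_total}}
    Returns a list of calibrated weights in the same order as `units`.
    """
    weights = [float(u["weight"]) for u in units]
    for _ in range(max_iter):
        max_change = 0.0
        for var, totals in margins.items():
            # Current weighted total in each category of this variable.
            current = {}
            for u, w in zip(units, weights):
                current[u[var]] = current.get(u[var], 0.0) + w
            # Scale every unit so its category hits the control total.
            for i, u in enumerate(units):
                factor = totals[u[var]] / current[u[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # all margins already matched in this pass
            break
    return weights


# Hypothetical respondents with design weights (illustrative values only).
sample = [
    {"weight": 100, "age_group": "15-44", "wage_class": "low"},
    {"weight": 120, "age_group": "15-44", "wage_class": "high"},
    {"weight": 80, "age_group": "45+", "wage_class": "low"},
    {"weight": 100, "age_group": "45+", "wage_class": "high"},
]

# Hypothetical control totals: one demographic margin and one margin of
# employee counts by wage class (the role T4 information plays in the text).
margins = {
    "age_group": {"15-44": 250.0, "45+": 150.0},
    "wage_class": {"low": 220.0, "high": 180.0},
}

calibrated = rake_weights(sample, margins)
```

If several surveys are each calibrated to the same administrative wage-class totals, their weighted employee counts by wage class agree by construction, which is one way to read the consistency gain described above.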
One example of how administrative data have been used since the passage of the 2008 data omnibus record linkage authority is the construction of a frame for the new Survey of Young Canadians. Neither households rotating out of the LFS nor a fresh sample of dwellings from an area frame was sufficient or cost-effective for generating a sample for this survey. Because of the need to sample from a unique population of respondents ages 1-18, Statistics Canada turned to the Canada Child Tax Benefit (CCTB) file. Every child ages 0-6 in Canada receives a monthly benefit, irrespective of family income, and the child is registered in the hospital at birth. Children who are no longer eligible for the benefit are also included; thus the database is quite comprehensive. In comparing the 2006 CCTB file with the 2006 census, it was discovered that coverage in the CCTB was quite good: 93-97 percent per age per year. Income distributions between the two collections were also quite similar. However, the Survey of Young Canadians was planned primarily as a survey using computer-assisted telephone interviewing (CATI), and contact information was not in the file received by Statistics Canada. Arrangements were subsequently made with the Canada Revenue Agency to obtain contact information, Trépanier said.

In a field test of the survey, which was mostly a test of the contact information, 83 percent of the 1,000 test cases had a valid address on the file. Also worthy of note is that concern was anticipated, particularly from parents, about the use of the CCTB to reach respondents, but the pretest indicated that this was not a problem. As an example of the previously described potential drawbacks of administrative data, at some point the records of all persons over age 18 were removed from the database based on the argument that they were no longer eligible for the benefit, even though they would have been of interest for the survey.
Other efforts to centralize and improve tracing operations using administrative data currently pursued by Statistics Canada include sending samples to the Canadian Council of Motor Transport Administrators (CCMTA), which returns them with addresses from driver’s license information. Statistics Canada is also making greater use of the National Change of Address file that is created by Canada Post.

One recommendation put forth by the Vision for Administrative Data Task Force at Statistics Canada was to develop an explicit policy on administrative data, Trépanier said. Currently, Statistics Canada has a guiding principle for administrative data use but no official policy. In addition, centralized processes for taking in and using administrative data need to be established, she said. This would entail creating an inventory of data and assigning management responsibility for each data source. There is also a push to mobilize existing resources, prioritize research, and establish a governance process on how to use administrative data.

For the future, Trépanier said, using administrative data to build sampling
frames is of particular interest. There is the risk of coverage error in using an administrative database in constructing a frame, but if it is done in the context of using multiple other frames and calibration to correct coverage error, this is probably less of an issue. The ideal goal is a single frame, which is the approach used in building Statistics Canada’s Address Register, but this does not preclude the inclusion of auxiliary information. A single frame would allow for better coordination of samples and survey feedback, she said.

For data collection, one of the goals related to administrative data is to enable tracing. Statistics Canada wants to centralize the tracing process, leading to the linking of all administrative data sources to make available the best contact information possible. This will require substantial effort, including a process to weigh the quality of the different sources and determine what contact information is most likely to be accurate. Another goal for administrative data could be to better understand the determinants of survey response and improve data collection procedures based on this information. For example, administrative data can provide guidance on preferred mode of data collection if one can assess whether persons who file their taxes electronically are also more likely to respond to an electronic questionnaire.

Statistics Canada has been successful in substituting income data from tax records, and this is likely to be continued. It is not yet clear, however, whether other information is available that could replace survey data. Investigating these options is done with caution because of the risks discussed. There is also the problem of ensuring consistency between survey and administrative data across variables. Administrative data can also assist researchers in better understanding nonresponse bias and the impact of lower response rates.
Finally, they can help both reduce the volume of data collected in surveys and improve estimation. Now that Statistics Canada has the omnibus record linkage authority in place, exploring all of these options has become a much easier process.

DISCUSSION

The discussion of the various methods used in the collection of household data began with several questions about the Canadian system of household surveys. Kathleen Styles (Census Bureau) asked for clarification on the omnibus record linkage authority: specifically, how it came to pass, what the motivation was, and what it was intended to accomplish. Trépanier answered that it was established after someone realized that requests for linkage were going to the policy committee quite frequently (about every two weeks) and that many of these linkage requests were similar in nature. This process became burdensome, particularly considering that the requests generally did not involve disseminating administrative data. Since a record linkage authority already existed on the business side, that was extended for use in the area of
linking social and survey data as well. But it is important to remember that the omnibus authority was designed to be used for evaluations that could improve surveys, not to disseminate administrative data sources. And although going to the policy committee is no longer necessary, the Access Division at Statistics Canada must be notified of the administrative data use so that it can make an inventory of all the linkages.

Styles followed up her question with another one about registers. A register of persons is a loaded issue, but does Statistics Canada have permanent files that are intended to represent all Canadian residents? In the discussion of tracing and a centralized address frame, it seemed as if this may be similar to a register. Trépanier responded that the central processes for tracing are under construction now. As for the Address Register, the plan is not necessarily to use it for all of Canada. As Tambay said earlier, the AR will be good for listing in urban areas, but it is likely that there will still be a need for an area frame, particularly for rural areas.

Cynthia Clark asked Trépanier to clarify under what circumstances Statistics Canada is required to obtain consent for the use of tax data. Trépanier said that one interpretation of the Statistics Act is that permission is only necessary if administrative data were to be used in conjunction with other survey data. In those cases the respondent would need to be informed that the data are being linked.

Graham Kalton reminded the participants that according to Trépanier’s presentation, the SLID obtains permission from a high proportion of respondents for the use of tax records, but about 15 percent refuse to grant permission. But researchers still have access to all the records. Is Statistics Canada now allowed to match those records together to evaluate the returns? How is this problem handled?
Would it be better not to ask permission and just use the records? Trépanier said that they were interested in conducting a study of the SLID respondents who refused access to their tax records, but it turned out that the way they are currently asking for permission is very general, and this precludes the linkage if respondents refuse.

A discussion participant asked Martinez for clarification on the integration of administrative health data, specifically, whether a linkage of the National Health and Nutrition Examination Survey (NHANES) to states is the issue under consideration or whether something more elaborate is planned. Martinez replied that, initially, the primary files being linked would be Health Interview Survey data with Centers for Medicare & Medicaid Services data, using mostly the Medicare files. The NHANES linkages to some state files are part of the other pilot study, the nutrition and food assistance project.

Jay Ryan (Bureau of Labor Statistics) expressed interest in new data collection technologies and asked Dillman what kind of research is being done with text messaging for survey contact, particularly now that text messaging has become
so prevalent. Also, how will the shift to larger cell phone screens, particularly in the case of smart phones and tablet PCs, affect data collection? Phillip Kott agreed that text messaging is becoming an increasingly important mode of communication among young people in particular, who often consider phone calls rude and expect a text message even before agreeing to talk to someone on the phone.

Dillman said that he was not aware of much research on text messaging, but this was something he has thought about, particularly what kind of coverage it would entail and the type of people most likely to use it. He added that he suspects that people who use text messaging frequently may be quite different from those who do not. Another concern related to this technology is that if people read text messages on the go, they are not going to stop to fill out a survey, because they are probably not in a good place to do that. On smart phones and tablet PCs, Dillman said that the screens of many of these are still too small. Still, surveys will eventually be constructed for these devices. He predicted that the first study of surveys on smart phones and tablet PCs will happen as early as spring 2011.

This issue is a challenge even in the case of those who rely on email as their primary form of communication, Dillman continued. In the studies he has conducted of both mail and email contacts to entice survey participation, he received a higher response when a questionnaire was sent via postal mail than when an email response was requested. Young people also tend to go to paper first. The bottom line, however, is that little progress will be made on electronic surveys if all that is done is to send an email and then expect people to respond. Even for young people, surveys will need to do something different. This sometimes results in a higher cost for web surveys than mail.
Keith Rust noted that, in Westat’s studies of mode choice, many respondents use more than one mode, which means that responses have to be unduplicated. This may be because respondents use a mode that is convenient to them and then use another one in addition to respond to the survey because they think that is what the administrators of the survey want them to use. Dillman replied that it is critical that researchers be very clear about what is requested of respondents. For example, if a web response is preferred, the survey should state that and explain the reasons. Even then, giving a questionnaire to a person but then telling them to respond by another mode, web for example, is a challenge, because the respondent will consider that the paper is right there in hand and that, in order to respond by web, one must wake up the computer and type in a complex URL.

Jelke Bethlehem asked Dillman for clarification on his advice not to use CATI and computer-assisted personal interviewing (CAPI) in mixed-mode surveys but rather use mail and emails. One of the Statistics Netherlands surveys follows up web contact with mail, then CATI, and then CAPI. Does Dillman
recommend that the CATI and CAPI follow-up steps be abandoned in this survey?

Dillman clarified that he was not suggesting that any of the modes should be abandoned. Different situations call for different modes. It is, however, increasingly difficult to conduct a conversation with people over the telephone, because that is not how the telephone is used anymore. Society has evolved so that people control the phone, and they use it when they want to. It used to be that they had to answer the phone or miss a call. Changes in culture are contributing to the decline of phone surveys more than changes in technology. The technology just made the culture change possible.