THE FUTURE OF FEDERAL HOUSEHOLD SURVEYS

4
Collection of Household Data
NEW DATA COLLECTION MODES AND THE
CHALLENGE OF MAKING THEM EFFECTIVE
Don Dillman (Washington State University) began his presentation by saying that surveys are now in a period of tailored design, in which different modes and implementation procedures are appropriate for different situations. His talk focused on the challenges this new period presents.
Understanding how survey modes and designs have evolved helps put the current system in perspective. The first major change was the transition from face-to-face interviews to the telephone. Dillman recalled that his own experience with this transition was relatively easy, because face-to-face methods could readily be adapted to telephone surveys. About 78 percent of households had telephone coverage in 1970, and coverage seemed to be increasing, making the transition increasingly feasible.
Household survey methods, including sampling approaches, could reasonably be applied to the telephone, he said. The interviewer’s role in telephone surveys is similar to that in face-to-face interviews in terms of reading items, clarifying questions, and relying on hidden categories (categories that are not offered to the respondent) as needed. The main differences are that show cards must be eliminated, scales have to be shortened to achieve the same level of comprehension, and questions sometimes need fewer words to be understood aurally. Another difference is that supervisors are more accessible during telephone interviews than during face-to-face interviews.
Bringing email and the web into data collection was a more difficult transition. Currently, approximately two-thirds of households have Internet access and use it with some regularity, which leaves perhaps one-third of households unable to respond to a survey over the Internet from home.
Another problem with drawing a sample of Internet respondents is that within-household random selection is harder to implement, because some household members lack Internet skills. In some households, this may reflect a division of labor: just as one person does the laundry and another takes care of the cars, one particular person may be the household’s Internet user. Furthermore, survey organizations generally do not have email addresses that would let them send respondents links to Internet surveys unless a prior relationship exists. Even if this problem could be resolved, response to an initial email invitation would likely be quite low.
Meanwhile, the telephone is losing its viability as a survey mode. There are many reasons for this, including the increasing use of cell phones (although these can sometimes be added to a frame), the decreasing reliance on landlines (current coverage is less than 75 percent of households), and the increasingly blurred link between phone numbers and geography.
American culture has also changed. People no longer use the telephone for
most business interactions unless they have to, and they tend to exercise more
control over their devices than in the past, by not always answering calls.
The telephone itself now fulfills a variety of functions, often serving as a personal computer. However, the screen space available for a web questionnaire is small, and entering text on a telephone is prone to error. Finally, responding to a survey on a phone device often cannot be combined well with other activities the potential respondent may be doing while accessing the Internet.
Changes related to the telephone and the continuing limitations of Internet access suggest that, in the near future, there will be more reliance on mixed-mode survey designs to collect data. Dillman devised a typology of the ways data collection modes are most commonly mixed (Dillman et al., 2009), summarized in Box 4-1.
The first type involves using one mode to encourage people to respond by another mode (typically, the Internet). In a sense, this is still a single-mode study, so measurement differences between modes are less of a concern than they might otherwise be. In the second type, one mode is used to ask some of the questions and another mode to ask others, such as more sensitive questions. In practice, this often entails an interviewer simply turning a laptop around during a face-to-face interview so that the respondent can self-administer part of the interview. A third type of mixed-mode design uses different modes of administration for different types of respondents. A fourth approach, typically used in longitudinal studies, employs one mode for the first interview and another mode for the second and subsequent interviews.
Dillman pointed out that when combining different modes of administration, it is important to remember that achieving one survey objective may sometimes get in the way of another. For example, improving response rates by offering alternative modes of responding may introduce measurement differences, and reducing costs may conflict with obtaining quicker responses.

BOX 4-1
Typology of Mixed-Mode Surveys
Type 1: One mode for data collection, another mode for selection/encouragement.
Type 2: One mode to ask certain questions, another mode for additional questions.
Type 3: One mode for some respondents, another mode for other respondents.
Type 4: One mode for Time 1 data collection, another for Time 2 data collection.
SOURCE: Workshop presentation by Don Dillman.
There are also several significant barriers to wider adoption of mixed-mode designs, he said. Survey professionals tend to construct questions differently for different modes, partly from a desire to optimize the design for a specific mode. Visual (self-administered) and aural (telephone) presentations, in particular, have different requirements.
For example, in the face-to-face mode, show cards can be used for answer choices, scales are often fully labeled, questions and questionnaires tend to be longer, and some of the answer options can be made available to the interviewer without explicitly offering them to the respondent (such as “Don’t know” or “Refused”). In the telephone mode, scales tend to be shorter and are presented without all categories labeled, questionnaires are shorter, complex branching formats can be used without affecting respondent comprehension, and, as in the face-to-face mode, some answer options can be made available without being explicitly offered. The mail mode encourages less question branching but can accommodate longer, more complex scales. Open-ended question formats are avoided when possible, and response categories cannot be hidden. A web mode encourages required answers and fewer “don’t know” options. Fill-ins are possible from previous answers. Audio, video, and other add-ons are possible, and typically there are no hidden categories. Unintentional mode-related construction differences can often lead to significant differences in the distribution of the answers provided.
Research has shown that the visual layout of survey items influences answers. Dillman highlighted the 24 most significant concepts in visual design (see Box 4-2). As an example of the different design requirements of visual communication, he described a challenge the National Science Foundation encountered while designing one of its web surveys. The goal was to obtain dates from respondents as two digits for the month and four digits for the year, entered in adjacent answer boxes. Cognitive interviewing revealed that respondents attempt a variety of approaches to answering a date question (e.g., using alphabetic abbreviations for the month) and that they get frustrated when they receive an error message. This led to extensive testing of the question over a period of four years.

BOX 4-2
Visual Design Concepts That Matter
Attention and visual processing:
Preattentive processing
Attentive processing
Useful field of view
Foveal view
Top-down processing
Bottom-up processing
Visual features that influence the expression of words, numbers, and symbols:
Figure/ground composition
Size
Shape
Location
Spatial arrangement
Color
Brightness
Contrast
Languages that give independent meaning to information on a page:
Words
Numbers
Symbols
Grouping principles:
Prägnanz (law of simplicity)
Proximity
Elemental connectedness
Common region
Continuity
Closure
Common fate
SOURCE: Workshop presentation by Don Dillman.

FIGURE 4-1 Summary of web experiments (percentages on a 0 to 100 scale; Surveys #1 to #3).
SOURCE: Workshop presentation by Don Dillman.
Figure 4-1 shows that changes in visual formatting led to large differences. According to the Gestalt law of proximity, elements placed close together are perceived as belonging together, which signals to people that they should be treated alike. When this principle was applied in the experiments, 55 percent of respondents filled in the boxes correctly. When the month box was made smaller and the year box a little larger, 63 percent filled in the boxes correctly. Adding the symbolic labels MM and YYYY to the respective boxes yielded 87 percent correct responses. Finally, when the boxes and symbolic labels were arranged in natural reading order, 96 percent of respondents provided responses in the desired format.
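To make “filled in the boxes correctly” concrete, here is a minimal sketch of how the two answer boxes might be scored. It assumes a response counts as correct only when the month box holds a two-digit month (01 to 12) and the year box holds a four-digit year. The function names and sample entries are hypothetical illustrations, not the NSF survey’s actual rules or data.

```python
import re

# Hypothetical scoring rule (an assumption, not the NSF specification):
# the month box must hold exactly two digits in the range 01-12 and the
# year box exactly four digits, matching the MM / YYYY target format.
MONTH_RE = re.compile(r"\d{2}")
YEAR_RE = re.compile(r"\d{4}")


def is_desired_format(month: str, year: str) -> bool:
    """Return True if the two box entries match the MM and YYYY format."""
    month, year = month.strip(), year.strip()
    if not (MONTH_RE.fullmatch(month) and YEAR_RE.fullmatch(year)):
        return False
    return 1 <= int(month) <= 12


def percent_correct(responses: list[tuple[str, str]]) -> float:
    """Share of (month, year) entries in the desired format, as a percentage."""
    if not responses:
        return 0.0
    hits = sum(is_desired_format(m, y) for m, y in responses)
    return 100.0 * hits / len(responses)


# Made-up entries illustrating deviations of the kind cognitive testing found.
sample = [
    ("08", "2007"),   # desired format
    ("Aug", "2007"),  # alphabetic month abbreviation
    ("8", "2007"),    # one-digit month
    ("08", "07"),     # two-digit year
]
print(percent_correct(sample))  # 25.0
```

The 25.0 comes only from the invented sample. A strict scorer like this rejects an entry such as “Aug”, which is exactly the kind of answer a web form would otherwise meet with the error message that frustrated respondents.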
Dillman also described experiments addressing visual versus aural presentation. In one study, he asked respondents in three different ways when they began their studies: (1) When did you begin your studies? (2) What date did you begin your studies? and (3) What month and year did you begin your studies? On the web survey, there was little difference across wordings in the percentage of students using the preferred MM/YYYY format. Over the phone, however, the distributions of responses differed drastically: the percentage of respondents reporting month and year was 13.4 percent in the “when” condition, 49.5 percent in the “what date” condition, and 83.7 percent in the “what month and year” condition.
Of course, in telephone interviews, the interviewer can act as an “intelligent system” that converts responses to the desired format. That luxury does not exist in the web mode, which forces researchers to think about questionnaire construction differently and to draw on theoretical concepts of visual information processing.
Another issue related to different modes of administration involves scalar questions. The concepts of social desirability, acquiescence, primacy, and recency have often been used to explain why people respond the way they do, but Dillman argued that these concepts often do not explain mode differences. He and colleagues conducted several experiments to examine whether using the same wording for scalar questions produces the same answers in aural and visual presentations. The experiments involved a variety of scales, including 5-point, 7-point, fully labeled, and polar-point-labeled scales. Regardless of scale type, every experiment produced slightly more positive responses on the telephone than on the web. The broader point is that a consistent body of evidence is accumulating that mode affects responses.
A line of research that particularly interests Dillman involves combining two visual modes of data collection, mail and web, while avoiding the aural mode. Sending an email request as a first contact is typically not appropriate in cross-sectional household surveys unless there is an existing relationship with the sample members or they are part of a longitudinal study. When people contacted by mail are given a choice of responding by mail or web, they tend to opt for mail, and overall response rates are lower. Requests for web-only responses typically result in low response rates.
However, despite declining response rates for most modes of data collection, response rates in mail surveys, particularly with prior screenings or incentives, tend to remain fairly high. Social exchange theory, with concepts such as rewards/benefits, burden/costs, and trust that the benefits will be delivered, can explain some of the reasons. Social exchange theory could also serve as a guide for other self-administered modes, such as the web, and for mixing modes so as to avoid relying only on email contacts to obtain web responses and only on postal contacts to obtain postal responses.
In many ways the Internet is different. Using it for surveys raises several problems: the burden of responding via the Internet can be greater, particularly when a postal letter asks people to go to a computer; computer literacy is low for some respondents; there are operational issues (does the computer work properly, or at all?); and emails from strangers are harder to find and more easily lost after their first day in one’s inbox.
The benefits of Internet surveys vary. Technology is easier to deal with for some people than for others. For some, the Internet may offer a faster way of responding, and there is no need to find a mailbox to return a questionnaire. But with Internet surveys, trust is a significant concern. People do not like to open email from strangers, the sources of emails and websites can be faked, and there is the ever-present threat of downloading a virus or other malicious software. This last issue represents an area in which government agencies may have an advantage: people tend to trust communication from a government authority much more than communication from any other potential source of survey contact.
Still, if people are given a choice of responding by either mail or Internet, most will choose mail. And if mail is withheld to encourage respondents to use the web, research has shown that the respondents who end up participating during follow-up differ greatly from one mode to another. Dillman found in his research, however, that when an address-based sample is used to push people to the web, an advance token incentive sent by postal mail increases response more for the mail-plus-web combination than for mail response alone. Email tends to reduce the burden of web response because it brings respondents closer to their preferred response mode. In essence, the best way to bring postal, email, and web contacts together to obtain more responses by web is to begin integrating the two modes, rather than grouping all web options together and all mail options together.
In Dillman’s view, it is important for the survey community to combine token cash incentives, mode choice, and email augmentation as it moves forward. New options such as address-based sampling and the sequential use of modes need further exploration but have great potential.
He ended by saying that the transition to the web is desirable but is going to be difficult. One positive development is that Dillman’s experiments using address-based samples have obtained two-thirds of their responses over the web, which would not have been possible three or four years ago. However, coverage limitations suggest the need to use another mode (most likely mail) at least to deliver the request, which in turn raises concerns about mode differences. Evidence is mounting that aural and visual modes sometimes produce different responses.
INTEGRATING ADMINISTRATIVE RECORDS INTO
THE FEDERAL STATISTICAL SYSTEM 2.0
Rochelle Martinez (Office of Management and Budget) illustrated what the statistical system could do to address barriers to making greater use of administrative records. For the past few years, work has been under way to build capacity to use more administrative records, particularly in demographic data collection. Her talk addressed the work going on across the statistical system, coordinated by the Office of Management and Budget (OMB). She discussed initiatives in the president’s budget and recent events related to administration support for these activities.
For many years, members of the statistical community have said that administrative records can and should be used more fully in the federal statistical system and in federal programs. The use of administrative records in the Netherlands and other countries illustrates the kinds of things the statistical system can envision doing, to varying degrees, in the United States. There are also areas, however, in which substantial work has already been done in the U.S. context. Most notably, administrative records have been used in economic statistical programs since the 1940s. There are also good examples of administrative data use in vital statistics, population estimates, and other programs across several federal statistical agencies.
Martinez mentioned that Kenneth Prewitt, a former director of the U.S. Census Bureau, often cites another reason that administrative records hold potential: the need for innovation. He has said that he is less concerned about the federal statistical system’s relevance and integrity than about its capacity for innovation, in particular about how prepared statistical agencies are for the innovation necessary to navigate the new world. In many cases, national information systems are increasingly reliant on administrative data and, in some instances, on data from the commercial sector. Prewitt’s greatest concern is that government agencies seeking statistical information about the population will bypass statistical agencies altogether and turn to the parts of the government that control large administrative data sets.
Martinez said that she sees this happening in some federal agencies right now. Offices that collect data for administrative purposes can (at least reportedly) produce a statistical result much more quickly than the principal statistical agency in their department. For a congressional or public affairs office, this is very appealing. Those in the statistical system can think of reasons why that might be a problem, but these offices may not. The best-case scenario is that there are multiple estimates in the public domain that somebody has to be able to explain. The worst case is that somebody concludes that a statistical agency is less relevant and less timely, and therefore that its data are less useful than the administrative data source. At OMB, the Open Government and Data.gov initiatives encourage putting many more administrative data sets in the public domain, where they can be used for a variety of purposes, so these issues need to be addressed across the system.
Members of the Federal Committee on Statistical Methodology (FCSM)
wanted to facilitate statistical agency use of administrative records. To explore
how to achieve this, an interagency subcommittee was formed. This group
created a set of products that the statistical community may find useful going
forward.
The first product to come out of the subcommittee was a set of case studies,
“Profiles in Success,” focusing on projects that had successfully acquired and
used administrative data in a statistical project. Martinez said that the case stud-
ies were quite useful in helping the subcommittee members identify systematic
barriers to greater use of administrative records. It is these barriers that the
group has tried to address head-on in recent months and years.
Following the “Profiles” product, the subcommittee turned to awareness activities, in part to dispel myths about the difficulty of using state administrative records. The group found many good examples of successful administrative data use in research and, in some cases, in production. The subcommittee wanted to highlight the factors necessary for successfully using administrative data, and the statistical community has been very receptive. As a result, the subcommittee has been asked to develop training and other activities to help data users navigate the difficult world of acquiring and using administrative data.
A subsequent product for the toolkit, she said, was a set of model agreements. Putting an agreement for data sharing and usage in place between agencies is often a drain on time and money. Thus, the subcommittee has created a model agreement that agencies can use to facilitate the data-sharing process. Although many aspects of such agreements can be covered in a template, not all can, so some tailoring will be needed. The idea behind model agreements is to reduce front-end costs, because so many projects either die on the vine at this stage or consume so many project resources that fewer remain for the research itself.
Another product created by the subcommittee is related to informed consent. It is an in-depth look at legal requirements across federal agencies, current informed consent practices at statistical agencies, and current practices at administrative agencies. It also synthesizes research on informed consent wording in the context of data sharing and record linkage. This product is likely to help the statistical system identify best practices for new activities going forward. It will also provide guidance on how to meet requirements for projects that use administrative data collected before any statistical use for them was identified. The subcommittee has also done some work on data quality, with the goal of creating tools for measuring and documenting data quality, but that work is far from complete.
As a result of the subcommittee’s work, Martinez went on, at least four barriers to using administrative data crystallized. The first is statistical agency access to administrative data. Statistical agencies have statutes that are designed to protect the confidentiality of data, and they see themselves very much as stewards of data. But despite these provisions and helpful language in the Privacy Act, statistical uses of administrative data are sometimes difficult to achieve. In many departments, program offices hold data for which the governing legislation is silent, unclear, or narrow regarding the kinds of uses considered appropriate.
There is also an issue of incentives: program offices may not think it worth the effort to figure out how to address a statistical agency’s request for data. Whose job is it to work with the statistical agency? It can be very time-consuming to identify the variables that are needed or to work with an agency to understand what data it has now and how they could be used. Negotiated agreements are a practical product of these discussions. Some agencies spend years trying to obtain administrative data. Statistical agency access to administrative data may be the most important barrier because, without access, projects cannot be undertaken.
A second, somewhat related barrier is what the subcommittee has termed inadequate infrastructure, referring to the infrastructure at both the statistical office and the administrative office. An administrative infrastructure is needed to address such issues as the process for requesting data and approving the request. Technical infrastructure can require a significant investment of time and resources on the statistical side. But even on the administrative agency side, someone has to be able to extract and transfer the data. The subcommittee thinks that infrastructure is lacking in many of these cases.
The third barrier is administrative data quality. Survey data are not perfect, but agencies have the capability to describe and understand their quality. In other words, many measurement tools exist for survey data that do not yet exist for administrative records. Some have assumed that administrative data are a gold standard, that they represent the truth. Others in the statistical community think quite the opposite: that survey data are more likely to be of better quality. Without a common vocabulary and a common set of measurements for the two types of data, the conversation about data quality becomes subjective.
Another significant data quality issue for statistical agencies is the bias that arises when respondents refuse linkage or when records cannot be successfully linked. In addition to the quality of the administrative data as an input, the quality of the data as they come out of a linkage must be considered as well.
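The linkage bias described above can be illustrated with a minimal sketch. It assumes simple deterministic linkage on an exact shared identifier; production record linkage often uses probabilistic matching instead. All records, keys, and subgroup labels are invented for illustration, and a missing key stands for a respondent who refused linkage or supplied no usable identifier.

```python
from collections import Counter

# Hypothetical survey records: (record_id, linkage_key, subgroup).
# A key of None stands for a respondent who refused linkage consent
# or gave no usable identifier.
survey = [
    (1, "K01", "A"), (2, "K02", "A"), (3, "K03", "A"), (4, None, "A"),
    (5, "K05", "B"), (6, None, "B"), (7, "K07", "B"), (8, None, "B"),
]

# Hypothetical administrative file, represented by the set of keys it holds.
admin_keys = {"K01", "K02", "K03", "K05", "K09"}


def link_rates(survey_rows, admin):
    """Exact-match (deterministic) linkage rate for each subgroup."""
    totals, linked = Counter(), Counter()
    for _, key, group in survey_rows:
        totals[group] += 1
        # A record links only if it has a key and that key is in the admin file.
        if key is not None and key in admin:
            linked[group] += 1
    return {g: linked[g] / totals[g] for g in totals}


rates = link_rates(survey, admin_keys)
print(rates)  # {'A': 0.75, 'B': 0.25}
```

With these made-up numbers, subgroup B makes up half of the survey records but only one of the four linked records, so estimates computed from the linked file alone would underrepresent it unless the differing link rates are taken into account.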
The final barrier has to do with researcher access, both internal and external to the government. Sometimes an afterthought, this barrier concerns creating the documentation needed to make a file, particularly a linked file, truly useful to someone outside the project. There are issues of documentation and of providing disclosure protection for a linked file. For this reason, linked files are very rarely public-use files. Few methods for restricted access have been devised beyond those that existed before record linkage became a focus. Many linked files have been created but not really used by people outside the immediate project, which is a concern both for the utility of what has been created and for data quality.
Martinez said that some initiatives in the president’s fiscal year 2011 budget should help further the subcommittee’s goal of promoting the use and exchange of administrative data. Specifically, three major pilot studies have been proposed, two for the Census Bureau (2010 Census Simulation Pilot and Health Data Pilot) and one for the Economic Research Service (Nutrition and Food Assistance Pilot).
Together, these three pilot studies are designed to address all four barriers. Although the barriers will not be resolved in a year, agencies can certainly begin to address them in ways that benefit the entire federal statistical system. Martinez emphasized that the notion of a common good was very important in proposing the initiative.
The first pilot project is designed to use both government and commercial administrative data to see if it is possible to simulate 2010 census results. Envisioned outcomes include advances in both knowledge about and measurement of the quality of many administrative record data sets. Ideally, this work will inform not only the decennial census but also other demographic surveys.
In Martinez’s view, this project is also critical to setting up an infrastruc-
ture. Some consider the Census Bureau to be the ideal place for this, because
it is thought to be big enough and stable enough to handle a large number
of different files and many different activities. This is why the Census Bureau
also received much of the funding; it would be much less efficient to attempt
to build up infrastructure at multiple statistical agencies than to centralize the
technology, capacity, expertise, and synergy.
The second pilot project is related to the first one and is also housed mostly
at the Census Bureau. The idea is that the Census Bureau has the capacity and
stabilizing infrastructure that enables it to provide record linkage services to
other federal statistical agencies. The National Center for Health Statistics
(NCHS) has agreed to be the pilot agency to provide identifiers from multiple
health-related administrative and survey data sets to the Census Bureau to link
and return to NCHS.
The overarching concept behind this pilot study is that record linkage is a
service, a line of business that the Census Bureau could provide to agencies that
are smaller or that lack similar capacity. A vision for the future is to centralize
to some degree the expertise and the hands-on experience with different data
files while still retaining the benefit of having a subject-matter agency, such as
NCHS, getting back the data and using them for both subject-matter research
and for providing access to other health researchers.
The goal of the third pilot project, the nutrition project, is to help the statistical community better understand how to acquire and use state administrative records for statistical research and to demonstrate the utility of such data for program evaluation. The hope is that this project can help identify a model in which these data might be acquired in a more centralized way. This project also helps to bring together multiple agencies that are interested in state data.
Although a primary goal of the pilots is to address the barriers outlined, Martinez said that these projects have also created interest among policy officials because of the ability to learn more from a subject-matter perspective. To make any of these ideas happen, it is essential that administrative agencies be included in the conversations about these uses of their data.
To that end, OMB has recently issued a memorandum encouraging federal agencies to share data in order to meet the needs of several administration initiatives, including statistical data projects. This demonstrates that administration officials are supportive of these efforts to increase the use of administrative data. The support of senior officials will be necessary, she said, because a move to expand administrative data use necessarily entails difficult conversations about legal and policy issues regarding data access.
Martinez added that all of the work she described was sponsored by the Interagency Council on Statistical Policy (ICSP). The ICSP is composed of the heads of the principal statistical agencies. Among these agency heads, a subgroup has been focused on developing a vision beyond the three pilot projects. She said that among agency heads and project teams alike, there is continued enthusiasm for these projects, and they are hopeful that the studies can continue to move forward in an uncertain budget environment.
Despite operating under a continuing resolution, project teams have
already been working on the aforementioned pilot projects. These groups
would like to involve more researchers in the projects to help think through
some of the issues that crop up in the course of the work. Furthermore, it is very
important that not only federal statistical agencies, but also the professional
statistical community, and particularly those working in the states, contribute
to this conversation.
THE ROLE OF ADMINISTRATIVE RECORDS IN HOUSEHOLD
SURVEYS: THE CANADIAN PERSPECTIVE
Julie Trépanier (Statistics Canada) described her agency’s use of administrative records in household surveys. To set the stage for this perspective, she outlined official legislation, policies, and guidelines that govern administrative data use in Canada.
Statistics Canada’s guiding principle—though not a policy—is to use
administrative records whenever they present a cost-effective alternative to
direct data collection. Section 13 of the Statistics Act allows Statistics Canada
to obtain administrative data files from any organization for the purposes of the
law. It also specifies some rights of access to administrative data. Specifically,
Section 24 gives Statistics Canada the right to use income tax records; Section
25 gives access to excise tax records; and Sections 26 and 29 give access
to crime and justice records. The act also stipulates that Statistics Canada is
responsible for promoting the avoidance of duplication in the information
collected by the various departments.
A memorandum of understanding (MOU) governs the release of administrative
information to Statistics Canada. These documents specify what the data
are, when the data will be available, how much they will cost, and how and
between whom the data will be shared. MOUs are lengthy, extremely detailed
documents. For example, the MOU between the Canada Revenue Agency and
Statistics Canada is over 100 pages long. Creating an MOU is often difficult,
involving negotiations that last for years.
Two policies that govern the linking of survey data to administrative data
are another important aspect of the legal framework: (1) the policy on
informing survey respondents and (2) the policy on record linkage.
Currently, data from different sources cannot be linked unless the Statistics
Canada policy committee approves the linkage. This committee, chaired by the
chief statistician, is the highest committee at Statistics Canada. However,
under the policy on record linkage, two omnibus record linkage authorities
have been approved that allow linkages to be performed under certain
circumstances without separate approval by the policy committee.
The first authority is the omnibus record linkage authority for the economic
statistics program, and it allows linkage of data for business surveys. The second
authority is the omnibus record linkage authority for improving population and
household survey programs, which allows linking data for three reasons: (1) to
improve a survey (e.g., to improve stratification, nonresponse adjustment), (2)
to study and assess survey data quality (e.g., to improve survey frame quality,
assess disclosure risk), and (3) to aid in data collection (e.g., to add addresses or
phone numbers). Record linkage is not allowed under these omnibus authorities,
however, if the purpose of the linkage is to produce estimates for public
release. To do this, approval is still required from the policy committee.
Trépanier also discussed the challenges and drawbacks Statistics Canada has
experienced in using administrative data. Referencing points also made by Jelke Bethlehem
about the Netherlands, she commented that researchers will never have the
same control over administrative data that is possible over statistical data. Even
if a thorough evaluation of the administrative data is conducted before deciding
to use them, there are still errors and risks that can jeopardize the process, and
statistical agencies often are not informed about changes that can have these
types of effects. Some of the major risks are summarized below:
• Data may change or cease to be collected without warning for some
parts of the population.
• The concepts and definitions underlying data may not be exactly what
is assumed or expected.
• Often quality assurance by the organization collecting the administrative
data is not comparable to what could have been put into place for
purposes of statistical usage.
• Timeliness of the data is frequently a problem.
• The lack of stability in the administrative data program is also a danger.
Much like the United States, Canada is encountering many challenges with
household surveys. Trépanier named decreasing response rates and increasing
costs as the most important. Even in the Labour Force Survey (LFS), which
is mandatory, there has been a slight decline in participation. There is also a
perception of an increased response burden, due not only to requests for
information from statistical agencies but also to requests from administrative
agencies and the private sector.
Similar to the United States, Canada has considered ways of overcoming
these challenges, and the use of administrative data has been identified as one
option because it allows sample sizes to be reduced.
Specifically, administrative data can be used to construct list frames, which can
in turn be used to allow for stratified simple random sampling. List-type frames
can make design simpler and more efficient.
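As a hedged illustration of how a list frame supports stratified simple random sampling, the following minimal Python sketch assumes the frame is a list of records, that each record can be assigned to a stratum (for example, a region taken from administrative data), and that the sample size for each stratum is already decided. The function `stratified_srs`, the record fields, and the toy frame are hypothetical and do not describe Statistics Canada’s systems. Units are drawn without replacement within each stratum, and each selected unit carries the design weight N_h / n_h for its stratum.

```python
import random
from collections import defaultdict


def stratified_srs(frame, stratum_of, allocation, seed=None):
    """Stratified simple random sampling without replacement from a list frame.

    frame      -- list of frame records (hypothetical dicts)
    stratum_of -- function mapping a record to its stratum label
    allocation -- dict mapping stratum label -> sample size n_h
    Returns the selected records, each copied with a design weight N_h / n_h.
    """
    rng = random.Random(seed)

    # Group the frame into strata; N_h is the size of each group.
    strata = defaultdict(list)
    for record in frame:
        strata[stratum_of(record)].append(record)

    sample = []
    for label, n_h in allocation.items():
        units = strata.get(label, [])
        if n_h > len(units):
            raise ValueError(f"stratum {label!r} has only {len(units)} units")
        # Independent simple random sample without replacement within each stratum.
        for record in rng.sample(units, n_h):
            sample.append({**record, "weight": len(units) / n_h})
    return sample


# Toy frame: 30 hypothetical addresses, 20 urban and 10 rural.
frame = [{"id": i, "region": "urban" if i % 3 else "rural"} for i in range(30)]
sample = stratified_srs(frame, lambda r: r["region"], {"urban": 4, "rural": 2}, seed=1)
print(len(sample), sum(r["weight"] for r in sample))
```

Because every stratum is fully listed, the design weights of the selected units add up to the frame size (30 in the toy example), which is part of what makes list frames simpler to work with than area frames.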
Administrative data are also useful in indirect estimation (calibration). They
may reduce the effort required to reach each respondent, and they may provide
better contact information for the sampling frame. They can also be used to
help implement a more efficient collection strategy, such as responsive design.
Using administrative data may help reduce the volume of data collected by
partially or completely replacing survey data. Furthermore, they can reduce the
impact of nonresponse.
There are multiple examples of how Statistics Canada has used administrative
data, Trépanier said. Even before the passage of the omnibus record linkage
authority, administrative data had been used to complement existing sampling
frames, such as the Address Register (AR) mentioned earlier, with additional
information on addresses and telephone numbers. The AR was substituted for
the listing of approximately 40 percent of clusters in the last redesign of the
LFS area frame. Administrative data have also been used in the random digit
dialing frame to identify a working bank of telephone numbers and to add
addresses for advance letters to the residences whose telephone numbers were
selected for interview.
There are also instances of using administrative data for partial substitution
of survey data. For example, rather than collecting income from
respondents as part of the 2006 census and other household surveys, such as
the Survey of Labour and Income Dynamics (SLID) and the Survey of Financial
Security (SFS), Statistics Canada asked respondents for permission to use
income tax information instead. Currently, the permission rate is about 80
percent.
Trépanier explained that Statistics Canada has also used administrative data for
indirect estimation. Specifically, they were used to improve the consistency of
income estimates across the SLID, the SFS, and the Survey of Household
Spending (SHS) through harmonized calibration. Statistics Canada used what is
referred to as T4 information, that is, the forms on which employers report
salaries and wages. The number of employees by class of salaries and wages is
used as a control total in the calibration, in conjunction with the traditional
demographic control totals. These methods succeeded in improving the
consistency of the estimates produced by these surveys. Administrative data
have also been used for direct estimates, such as tabulations of certain
pension, health, justice, education, and travel statistics.
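The summary does not say which calibration estimator Statistics Canada uses. As a stand-in, the minimal Python sketch here uses raking (iterative proportional fitting), one common calibration method, to show how survey weights can be adjusted until weighted counts match two sets of control totals at once: employee counts by wage class (standing in for the T4-based totals) and demographic counts by age group. The function `rake`, the variable names, and all totals are hypothetical; the sketch assumes every margin shares the same grand total.

```python
def rake(records, margins, weight_key="weight", max_iter=100, tol=1e-9):
    """Raking (iterative proportional fitting) of survey weights to control totals.

    records -- list of dicts holding an initial weight and a category for each margin variable
    margins -- dict mapping variable name -> {category: control total}
    Control totals for different margins must share the same grand total.
    Returns calibrated weights in the same order as records.
    """
    weights = [float(r[weight_key]) for r in records]
    for _ in range(max_iter):
        max_change = 0.0
        for var, totals in margins.items():
            # Current weighted count in each category of this margin.
            current = {category: 0.0 for category in totals}
            for r, w in zip(records, weights):
                current[r[var]] += w
            # Scale weights so this margin reproduces its control totals exactly.
            for i, r in enumerate(records):
                new_w = weights[i] * totals[r[var]] / current[r[var]]
                max_change = max(max_change, abs(new_w - weights[i]))
                weights[i] = new_w
        # Stop once a full cycle over all margins no longer moves the weights.
        if max_change < tol:
            break
    return weights


# Hypothetical respondents: wage class stands in for T4-based controls,
# age group for the demographic controls.
respondents = [
    {"weight": 10, "wage_class": "low", "age": "15-34"},
    {"weight": 10, "wage_class": "low", "age": "35+"},
    {"weight": 10, "wage_class": "high", "age": "15-34"},
    {"weight": 10, "wage_class": "high", "age": "35+"},
]
controls = {
    "wage_class": {"low": 30, "high": 20},
    "age": {"15-34": 20, "35+": 30},
}
calibrated = rake(respondents, controls)
print(calibrated)
```

Applying the same wage-class controls in every survey is what harmonizes the estimates: each survey’s weighted employee counts by wage class then agree with one another by construction.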
Since the passage of the 2008 omnibus record linkage authority, administrative
data have been used, for example, to construct a frame for the new Survey of
Young Canadians. Neither households rotating out of
the LFS nor a fresh sample of dwellings from an area frame was sufficient or
cost-effective for generating a sample for this survey. Because of the need to
sample from a unique population of respondents ages 1-18, Statistics Canada
turned to the Canada Child Tax Benefit (CCTB) file. Every child ages 0-6 in
Canada receives a monthly benefit, irrespective of family income, and the child
is registered in the hospital at birth. Children who are no longer eligible for the
benefit are also included; thus the database is quite comprehensive.
In comparing the 2006 CCTB file with that of the 2006 census, it was
discovered that coverage in the CCTB was quite good: 93-97 percent for each
single year of age. Income distributions from the two sources were also quite
similar. However, the Survey of Young Canadians was planned primarily as
a survey using computer-assisted telephone interviewing (CATI), and contact
information was not in the file received by Statistics Canada. Arrangements
were subsequently made with the Canada Revenue Agency to obtain contact
information, Trépanier said.
In a field test of the survey, which was mostly a test of the contact information,
83 percent of the 1,000 test cases had a valid address on the file. It is also
worth noting that concern, particularly from parents, had been anticipated
about the use of the CCTB to reach respondents, but the pretest indicated that
this was not a problem. As an example of the potential drawbacks of
administrative data described earlier, at some point the records of all
persons over age 18 were removed from the database based on the argument
that they were no longer eligible for the benefit, even though they would have
been of interest for the survey.
Other efforts that Statistics Canada is currently pursuing to centralize and
improve tracing operations using administrative data include sending samples to the
Canadian Council of Motor Transport Administrators (CCMTA), which returns
them with addresses from driver’s license information. Statistics Canada is also
making greater use of the National Change of Address file that is created by
Canada Post.
One recommendation put forth by the Vision for Administrative Data Task
Force at Statistics Canada was to develop an explicit policy on administrative
data, Trépanier said. Currently, Statistics Canada has a guiding principle for
administrative data use but no official policy. In addition, centralized processes
for taking in and using administrative data need to be established, she said.
This would entail creating an inventory of data and assigning management
responsibility for each data source. There is also a push to mobilize existing
resources, prioritize research, and establish a governance process on how to
use administrative data.
For the future, Trépanier said, using administrative data to build sampling
frames is of particular interest. There is the risk of coverage error in using an
administrative database in constructing a frame, but if it is done in the context
of using multiple other frames and calibration to correct coverage error, this is
probably less of an issue. The ideal goal is a single frame, which is the approach
used in building Statistics Canada’s Address Register, but this does not preclude
the inclusion of auxiliary information. A single frame would allow for better
coordination of samples and survey feedback, she said.
For data collection, one of the goals related to administrative data is to
enable tracing. Statistics Canada wants to centralize the tracing process, which
would involve linking all administrative data sources to make the best possible
contact information available. This will require substantial effort, including a
process to weigh the quality of the different sources and determine which contact
information is most likely to be accurate. Another goal could be to use
administrative data to better understand the determinants of survey response
and to improve data collection procedures accordingly. For example,
administrative data could guide the choice of data collection mode if one could
assess whether persons who file their taxes electronically are also more likely to
respond to an electronic questionnaire.
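The summary does not describe how the quality of different sources would be weighed. One simple possibility, assumed here purely for illustration, is to score each candidate address by an assumed reliability for its source multiplied by a decay for how old the record is, adding up the scores when several sources report the same address. The source labels, reliability values, half-life, and the function `best_contact` are all hypothetical; real reliabilities would have to be estimated, for example from field checks of address validity.

```python
from datetime import date

# Hypothetical reliability scores per source (assumed values, not estimates).
SOURCE_RELIABILITY = {"tax_file": 0.90, "change_of_address": 0.85, "drivers_licence": 0.80}


def best_contact(candidates, as_of, half_life_days=365.0, default_reliability=0.5):
    """Choose the address most likely to be accurate for one person.

    candidates -- list of (address, source, last_updated) tuples, last_updated a date
    Each candidate scores reliability(source) * 0.5 ** (age_in_days / half_life_days);
    scores for the same address reported by several sources are added together.
    Returns the highest-scoring address, or None if there are no candidates.
    """
    scores = {}
    for address, source, last_updated in candidates:
        # Records dated after as_of are treated as current (age 0).
        age_days = max(0, (as_of - last_updated).days)
        recency = 0.5 ** (age_days / half_life_days)
        reliability = SOURCE_RELIABILITY.get(source, default_reliability)
        scores[address] = scores.get(address, 0.0) + reliability * recency
    if not scores:
        return None
    return max(scores, key=scores.get)


candidates = [
    ("12 Elm St", "tax_file", date(2010, 4, 30)),
    ("9 Oak Ave", "drivers_licence", date(2008, 1, 1)),
    ("12 Elm St", "change_of_address", date(2010, 6, 1)),
]
print(best_contact(candidates, as_of=date(2010, 12, 31)))
```

The recency term lets a recent record from a less reliable source outweigh a stale record from a more reliable one; how steeply to discount old records is itself an assumption that would need testing.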
Statistics Canada has been successful in substituting income data from tax
records, and this is likely to continue. It is still unclear, however, whether
other information is available that could replace survey data. These options are
being investigated with caution because of the risks discussed earlier. There
is also the problem of ensuring consistency between survey and administrative
data across variables.
Administrative data can also assist researchers in better understanding
nonresponse bias and the impact of lower response rates. Finally, they can help
both reduce the volume of data collected in surveys and improve estimation.
Now that Statistics Canada has the omnibus record linkage authority in place,
exploring all of these options has become a much easier process.
DISCUSSION
The discussion of the various methods used in the collection of household
data began with several questions about the Canadian system of household
surveys. Kathleen Styles (Census Bureau) asked for clarification on the
omnibus record linkage authority: specifically, how did it come about,
what was the motivation, and what was it intended to accomplish? Trépanier
answered that it was established after it became clear that requests for linkage
were going to the policy committee quite frequently (about every two weeks)
and that many of these linkage requests were similar in nature. This process
became burdensome, particularly considering that the requests generally did
not involve disseminating administrative data. Since a record linkage authority
already existed on the business side, it was extended to the area of
linking social and survey data as well. But it is important to remember that the
omnibus authority was designed to be used for evaluations that could improve
surveys—not to disseminate administrative data sources. And although going
to the policy committee is no longer necessary, the Access Division at Statistics
Canada must be notified of the administrative data use so that it can make an
inventory of all the linkages.
Styles followed up her question with another one about registers. A register
of persons is a loaded issue, but does Statistics Canada have permanent
files that are intended to represent all Canadian residents? In the discussion of
tracing and a centralized address frame, it seemed as if this may be similar to
a register. Trépanier responded that the central processes for tracing are under
construction now. As for the Address Register, the plan is not necessarily to
use it for all of Canada. As Tambay said earlier, the AR will be good for listing
in urban areas, but it is likely that there will still be a need for an area frame,
particularly for rural areas.
Cynthia Clark asked Trépanier to clarify under what circumstances Statistics
Canada is required to obtain consent for the use of tax data. Trépanier said
that one interpretation of the Statistics Act is that permission is only necessary
if administrative data were to be used in conjunction with other survey data. In
those cases the respondent would need to be informed that the data are being
linked.
Graham Kalton reminded the participants that, according to Trépanier’s
presentation, the SLID obtains permission from a high proportion of respondents
for the use of tax records, but about 15 percent refuse to grant permission.
Yet researchers still have access to all the records. Is Statistics Canada
now allowed to match those records together to evaluate the returns? How is
this problem handled? Would it be better not to ask permission and just use
the records?
Trépanier said that they were interested in conducting a study of the SLID
respondents who refused access to their tax records, but it turned out that the
way they are currently asking for permission is very general, and this precludes
the linkage if respondents refuse.
A discussion participant asked Martinez for clarification on the integration
of administrative health data, specifically, whether a linkage of the National
Health and Nutrition Examination Survey (NHANES) to state data is the issue
under consideration or whether something more elaborate is planned. Martinez
replied that, initially, the primary files being linked would be Health Interview
Survey data with Centers for Medicare & Medicaid Services data, using mostly
the Medicare files. The NHANES linkages to some state files are part of the
other pilot study, the nutrition and food assistance project.
Jay Ryan (Bureau of Labor Statistics), who is interested in new data collection
technologies, asked Dillman what kind of research is being done on text
messaging for survey contact, particularly now that text messaging has become
so prevalent. Also, how will the shift to larger cell phone screens, particularly
in the case of smart phones and tablet PCs, affect data collection? Phillip Kott
agreed that text messaging is becoming an increasingly important mode of
communication among young people in particular, who often consider phone
calls rude and expect a text message even before agreeing to talk to someone
on the phone.
Dillman said that he was not aware of much research on text messaging,
but it was something he had thought about, particularly what kind of coverage
it would provide and the type of people most likely to use it. He added that he
suspects that people who use text messaging frequently may be quite different
from those who do not. Another concern related to this technology is that if
people read text messages on the go, they are not going to stop to fill out a
survey, because they are probably not in a good place to do that.
On smart phones and tablet PCs, Dillman said that the screens of many of
these are still too small. Still, surveys will eventually be constructed for these
devices. He predicted that the first study of surveys on smart phones and tablet
PCs would happen as early as spring 2011.
This issue is a challenge even in the case of those who rely on email as
their primary form of communication, Dillman continued. In the studies he
has conducted of both mail and email contacts to encourage survey participation,
he obtained a higher response rate when a questionnaire was sent via postal mail
than when an email response was requested. Young people also tend to go to
paper first. The bottom line, however, is that little progress will be made on
electronic surveys if all that is done is to send an email and then expect people
to respond. Even for young people, surveys will need to do something different.
This sometimes results in a higher cost for web surveys than mail.
Keith Rust noted that, in Westat’s studies of mode choice, many respondents
use more than one mode, which means that responses have to be unduplicated.
This may be because respondents use a mode that is convenient to
them and then also respond by another mode because they think that is the
one the survey administrators want them to use.
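Rust’s point that responses arriving through more than one mode must be unduplicated can be illustrated with a small Python sketch. The keep-one rule it applies (complete returns before partial ones, then the earliest received, then an assumed mode preference) is chosen for illustration only and is not Westat’s procedure; the function `unduplicate`, the field names, and the mode ranking are hypothetical.

```python
from datetime import datetime

# Hypothetical tie-breaking order between modes; a survey would set its own rule.
MODE_PRIORITY = {"web": 0, "mail": 1, "phone": 2}


def unduplicate(responses):
    """Keep exactly one response per sampled case.

    responses -- list of dicts with 'case_id', 'mode', 'received' (datetime), 'complete' (bool)
    Assumed rule: prefer complete returns over partial ones, then the earliest
    received, then the preferred mode.
    """
    best = {}
    for r in responses:
        # Lower tuples rank better: complete (False sorts first), earlier, preferred mode.
        rank = (not r["complete"], r["received"], MODE_PRIORITY.get(r["mode"], len(MODE_PRIORITY)))
        case = r["case_id"]
        if case not in best or rank < best[case][0]:
            best[case] = (rank, r)
    return [r for _, r in best.values()]


returns = [
    {"case_id": 1, "mode": "mail", "received": datetime(2010, 5, 3), "complete": True},
    {"case_id": 1, "mode": "web", "received": datetime(2010, 5, 1), "complete": True},
    {"case_id": 2, "mode": "web", "received": datetime(2010, 5, 2), "complete": False},
    {"case_id": 2, "mode": "mail", "received": datetime(2010, 5, 6), "complete": True},
]
kept = unduplicate(returns)
print({r["case_id"]: r["mode"] for r in kept})
```

A different rule, such as always keeping the web return when one exists, only requires changing the rank tuple.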
Dillman replied that it is critical that researchers be very clear about what
is requested of respondents. For example, if a web response is preferred, the
survey should state that and explain the reasons. Even then, giving a
questionnaire to a person but then telling them to respond by another mode, web
for example, is a challenge, because the respondent will consider that the paper
is right there in hand, whereas responding by web requires waking up the
computer and typing in a complex URL.
Jelke Bethlehem asked Dillman for clarification on his advice not to use
CATI and computer-assisted personal interviewing (CAPI) in mixed-mode surveys
but rather to use mail and email. One of the Statistics Netherlands surveys
follows up web contact with mail, then CATI, and then CAPI. Does Dillman
recommend that the CATI and CAPI follow-up steps be abandoned in this
survey?
Dillman clarified that he was not suggesting that any of the modes should
be abandoned. Different situations call for different modes. It is, however,
increasingly difficult to conduct a conversation with people over the telephone,
because that is not how the telephone is used anymore. Society has evolved
so that people control the phone, and they use it when they want to. It used
to be that they had to answer the phone or miss a call. Changes in culture are
contributing to the decline of phone surveys more than changes in technology.
The technology just made the culture change possible.