4
Data Needs
Within the next few years, policy debates about the retirement income security of
current and future generations of Americans are likely to require a range of
modeling capabilities with which to evaluate and project the likely effects of
alternative policy proposals. However, as is clear from the preceding chapter,
there are important gaps and uncertainties in what is known about the behaviors
and processes that affect retirement income security. These gaps stem from
deficiencies in available data, which hamper or preclude the development of
robust analytical models and parameter estimates from them. In some cases,
notably for employers, there are insufficient data with which to describe the
distribution of relevant employer and employee characteristics, much less to
support analysis of behavioral change over time.
These deficiencies need to be remedied and the knowledge base further
developed before it will be possible to construct reasonably adequate projection
models with broad capabilities. Moreover, existing retirement-income-related
projection models and the associated databases have many limitations and do not
generally provide an adequate platform on which to develop improved models
once new data and research knowledge become available (see Chapter 5 and
Appendix D; see also Hollenbeck, 1995).
Thus, there is a great deal of work to do to prepare for the policy debate.
With very tight budget constraints, the question is one of priorities. We conclude
that agencies should devote the bulk of their limited resources over the next few
years to data collection and analysis rather than making significant investments in
large-scale projection models. This conclusion is based on our assessment that
some of the gaps in needed data and basic research are so critical that projection
models, no matter how elaborate or elegant, cannot compensate for them. An
ASSESSING POLICIES FOR RETIREMENT INCOME
example is the failure of existing research to adequately explain observed savings
patterns in the population.
Moreover, past experience suggests that it takes more time to collect new
data and analyze them than it does to build a projection model to use data and
research in estimating the likely consequences of policy changes. There are more
than a few instances in the history of policy analysis when models were built in a
span of weeks or months. As an example, the prototype of the Carter
administration's welfare reform projection model, KGB, was completed in a few weeks
(see Citro and Hanushek, 1991:107-114). It is very rare that needed new data can
be obtained and analyzed sufficiently in so short a time, particularly if the data set
is rich enough to be useful. A small-scale, quick-response survey of employers'
health care costs was completed for use in the recent health care reform debate
within 10 months from initial design to final output (Ponikowski, Scheible, and
Wiatrowski, 1994), but its scope was very limited. More detailed information on
employers' health care plans and costs that would have been useful, from a large
survey for which the design work had begun in spring 1993, was still not available
by the end of 1996 (Hing et al., 1995).
THE LESSON FROM HEALTH CARE REFORM
The experiences and reflections of policy analysts who provided estimates for the
1993-1994 health care reform debate underscore the panel's conclusion about
giving priority to investments in data and research. Box 4-1 describes the major
players in health care reform estimation and the models and databases they used.1
More lead time and prior investment would have facilitated the development of
usable projection models for estimating the likely effects of alternative health
care reform plans. Indeed, some timely investments that were made in model
building were helpful (e.g., the extension of the TRIM2 model to simulate
employer-provided health care benefits). Conversely, inexperience with building
health care projection models, particularly with a database not previously used
for this purpose, was a handicap. That was the case, for example, for the Agency
for Health Care Policy and Research (AHCPR), which based its new AHSIM
model on the 1987 National Medical Expenditure Survey (NMES).
However, the model builders themselves pointed to major difficulties that
stemmed from the absence of critical data and research; see Box 4-2.2 Existing
data were so inadequate that it was difficult to develop an agreed-upon "baseline"
scenario, that is, a representation of the current distribution of health insurance
coverage, utilization of services, costs, and other characteristics of consumers,
providers, and insurers, let alone simulate the likely effects of alternative
reforms relative to the baseline. Bilheimer and Reischauer (1996:149), speaking
from the Congressional Budget Office (CBO) experience, flatly concluded: "To
construct a comprehensive picture of the health care system is impossible with
today's databases. What is known must be pieced together from several inadequate
or dated surveys and sources."
1Information for this discussion and Box 4-1 comes from Bandeian and Lewin (1994), Bilheimer
and Reischauer (1996), Citro and Hanushek (1991, esp. Chap. 5), Nichols (1996), Office of Technology
Assessment (1993, 1994), Shells (1996), and interviews with analysts.
2See footnote 1.
Also lacking was up-to-date research with which to estimate behavioral
responses to changes in the health care system. Bilheimer and Reischauer
(1996:152) noted that "such studies can credibly illuminate only the effects of
marginal changes in the current environment. The effects of large, systemic
changes that major health care reform proposals would generate are far outside
the boundaries of knowledge that can be gleaned from existing economic research
or even from social experiments." Nonetheless, they identified several
areas in which better data about the current system would have made it possible
to develop more credible estimates of the effects of reform proposals (see Box 4-2;
see also Bandeian and Lewin, 1994).
In the absence of key data and research, rough estimates based on very
inadequate information or simply guesses were used for values of behavioral
parameters, and no projection model, however complex or elegant, could compensate
for the lacking information. Different models incorporated widely different
assumptions in key areas, and consequently, there were significant differences
in estimates of the likely effects for the same reform plan (see Office of
Technology Assessment, 1993, 1994). Differences in databases, for example,
between the March 1994 Current Population Survey (CPS) and the 1987 NMES
aged to 1993-1994, also contributed to differences in estimates.
Moreover, in the heat of debate, it proved difficult, if not impossible, to
develop new sources of needed information on a timely basis. Subsequently, and
anticipating future health care policy debates, AHCPR and the National Center
for Health Statistics (NCHS) are working to implement a major reorganization
and expansion of health-related surveys that could meet many of the information
requirements identified by participants in the 1993-1994 effort (Hunter and
Arnett, 1996).
The picture is much the same for retirement-income-related policy analysis,
namely, that key descriptive and analytical data with which to develop credible
projections of the likely effects of current and alternative policies are missing or
incomplete. As with health care reform, even the best data and analysis are
unlikely to resolve the uncertainty associated with major policy changes, such as
privatization of Social Security (which would resemble a system of universally
mandated Individual Retirement Accounts), because there is no historical experience
on which to base any models.3 For example, an important issue about
privatization is whether it will increase or decrease personal saving. One can
argue that privatization will educate people about saving and what it can do for
them and thereby lead millions of people who now save little or nothing to save
much more, in addition to their mandatory privatized accounts. But one can also
plausibly argue that people will be more confident of actually obtaining payments
from their dedicated personal accounts than they are of receiving Social Security
benefits and thus will curtail other forms of saving (see Mitchell and Zeldes,
1996).
3However, research on the experience of other countries with privatization schemes may help
develop projections for a U.S. system.
Nonetheless, as with health care reform, filling key gaps in data and research
knowledge can go a long way to make it possible to develop credible projections
of the likely effects of many retirement-income-related policy alternatives. We
urge that priority be given to strengthening the base of data and research for
retirement income modeling through improvement of existing data sets whenever
possible and through new data collection when necessary, and including
appropriate levels of funding for analytical research and validation.
The remainder of this chapter addresses: the dimensions of databases that
should be considered in designing and evaluating cost-effective
retirement-income-related data collection systems, whether new or modified; issues in
continuing existing panel surveys of middle-aged and older people in order to provide
sufficient longitudinal observations for analysis of consumption, savings,
and retirement behavior of individuals; issues in developing new and improved
cross-sectional and panel data for employers and their workers in order to understand
labor demand and employer decisions about pensions and other benefits;
issues in linking administrative and survey data, which can be a cost-effective
means of obtaining high-quality measures of key variables with minimal added
expenditure; and issues of data validation, internally and in comparison with
other sources.
DIMENSIONS OF DATABASES
Databases differ on a number of dimensions, including source, reporting unit and
universe, type and frequency, scope, size, data collection method, accuracy or
validity, uncertainty, ease of use, level of aggregation, and cost; see Box 4-3.
There are trade-offs to consider among these dimensions. For example, the level
of uncertainty in survey responses can be reduced by increasing the sample size;
however, such a decision will increase costs. Similarly, an expansion of the
number and detail of survey questions will make a survey more useful for a wider
range of purposes, but increase its cost and the burden it places on respondents.
Such an expansion may also make the data more difficult for analysts to use.
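The trade-off between sample size, uncertainty, and cost can be illustrated with a short sketch. It assumes simple random sampling, under which the standard error of a sample mean is the population standard deviation divided by the square root of the sample size. The function names, the fixed cost, and the cost per interview are hypothetical and chosen only for illustration; they are not figures from any survey discussed here.

```python
import math


def standard_error_of_mean(std_dev: float, n: int) -> float:
    """Standard error of a sample mean, assuming simple random sampling."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return std_dev / math.sqrt(n)


def survey_cost(n: int, fixed_cost: float, cost_per_interview: float) -> float:
    """Hypothetical linear cost model: a fixed cost plus a cost per interview."""
    return fixed_cost + n * cost_per_interview


if __name__ == "__main__":
    # Illustrative values only (assumptions, not data from the text).
    std_dev = 10.0
    for n in (100, 400, 1600):
        se = standard_error_of_mean(std_dev, n)
        cost = survey_cost(n, fixed_cost=1000.0, cost_per_interview=50.0)
        print(f"n={n:5d}  standard error={se:.3f}  cost={cost:,.0f}")
```

Under these assumptions, halving the standard error requires quadrupling the sample, while the interviewing cost grows in direct proportion to the sample size.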
There are also trade-offs with regard to the use of administrative records
instead of survey data. Administrative records are usually thought to be more
accurate than survey responses. They are also relatively inexpensive to use for
analysis because the costs of data collection and processing have already been
incurred for administrative purposes. However, such records usually lack detailed
content, and their content may change from year to year to reflect changes
in program data requirements. Also, administrative records are not without errors
(e.g., Social Security and Medicare records may have inaccurate information on
whether individuals are still alive). Indeed, when comparing information for the
same variable in an administrative records data set with a survey estimate, it is
important to take account of likely errors in both sources and of differences in
definitions and other features that could affect the comparison. Finally, administrative
records are often inaccessible to researchers because of concerns about
maintaining confidentiality for individuals and other reporting units.
Data sources are rarely satisfactory for both analytical and projection model-
ing purposes on every dimension. In fact, analytical and projection models often
use different types of data. For example, analytical models of individual behavior
generally require rich longitudinal data from panel surveys, but models that
project individual outcomes rarely use panel surveys as their primary database
because of small sample sizes and restricted universes.4 Yet if the projection
model database does not contain a set of variables as rich as the one used to
estimate key behavioral relationships in an analytical model, it will not be possible
to take advantage of the most advanced behavioral models. Instead, the
behavioral relationships will have to be reestimated with a reduced variable set,
or such procedures as statistical matching (see Cohen, 1991b) will have to be
used to impute needed variables to the primary database. (We discuss this issue
further in Chapter 5.)
PANEL DATA ON INDIVIDUALS
An underlying theme throughout our report and the papers we commissioned
(Hanushek and Maritato, 1996) is the need to understand how people reach their
retirement years. What enters into decisions about working as people age? What
are the implications of different employment paths for pension plan participation
and the level of benefits received in retirement? How do government and employer
policies affect personal savings behavior and the ultimate wealth accumulations
that influence both retirement decisions and well-being in retirement?
Questions such as these emphasize two key issues that have implications for
data collection and analysis. First, many of the antecedents of retirement outcomes
are present long before any actual retirement decisions. Second, behavior
that is related to policy often has a long time horizon, with individuals looking
many years into the future as they make decisions. To obtain suitable data for
analysis, it is essential to follow individuals over many years in order to understand
their retirement behavior and outcomes.
This central fact leads us to emphasize the development of panel surveys
that obtain longitudinal data by interviewing the same individuals over time.
Panel surveys, which have become increasingly common to study individual
behavior, permit investigation of behavior that evolves and that has implications
over long periods. Moreover, panel surveys provide a variety of ways for dealing
with the heterogeneity across individuals that can complicate analyses based
solely on a cross-section of individuals. Finally, panel surveys can often permit
corrections for measurement and observational errors because consistency checks
for individuals over time can aid in separating errors from true changes for
individuals.
4An exception is a recently developed public assistance model, STEWARD (Simulation of
Trends in Employment, Welfare, and Related Dynamics), which directly uses data from the
National Longitudinal Survey of Youth (NLSY) to simulate the effects of welfare reform proposals on
program participation (Jacobson and Czajka, 1994).
Of course, the need to follow the same individuals over long periods implies
that a panel survey is likely to be expensive, certainly more expensive than a
one-time cross-sectional survey of equivalent size and perhaps more expensive
than a repeated cross-sectional survey.5 Also, for cost reasons, it may be difficult
in a panel survey to refresh the sample frequently enough to address such questions
as whether patterns of behavior remain the same for newer cohorts or to
maintain representation of a changing population (e.g., to represent immigrants).
The trade-offs often suggest the need for cross-sectional data collection. For
example, we argue below for collecting data to understand employer behavior
that is relevant for retirement income security, but we believe that the first step is
to improve cross-sectional data. Although a panel may later be appropriate, the
initial efforts (which include learning about what data to collect and how, and
what the sampling frame should be) would most appropriately be thought of as
a cross-sectional effort. Also, there is a need for regularly updated descriptive
information on trends in the characteristics of employers, work forces, and benefits
that more efficiently comes from repeated cross-sections than from panels.6
Similarly, repeated, nationally representative cross-sectional surveys are
needed to provide important data on trends in the population that are relevant to
tracking and understanding retirement outcomes (e.g., trends in ages at retirement).
Nonetheless, the central longitudinal data with which to analyze individual
behavior and individual decisions should almost certainly be gathered
through panels of individuals. Although cross-sectional surveys can use retrospective
questions to collect longitudinal information, such as employment and
earnings histories (and in some cases this is done), the quality of retrospective
information is much lower than that of panel survey data because of recall and
other errors, which may be large (see, e.g., Kennickell and Starr-McCluer, 1995).
Also, cross-sectional surveys are limited in the amount of retrospective information
that they can collect due to considerations of respondent burden. Panel
surveys, in contrast, can obtain a wealth of information with which to understand
different life courses and retirement outcomes.
5Whether a panel survey is more or less expensive than a repeated cross-sectional survey with the
same number of sample members is affected by many factors, such as frequency of interviews, costs
of obtaining an interview (a panel survey may have higher costs to locate sample members but lower
costs to obtain an interview once the sample member is located), and others.
6Panel surveys will provide consistent time series for a population as well if a new panel is
introduced on a frequent basis, such as every year; however, costs will be prohibitive unless the size
or length (or both) of each panel is reduced, which will, in turn, reduce the usefulness of each panel
for longitudinal analysis.
[...]
and employers at very low marginal cost. The major difficulty concerns how to
provide access to such data for research and modeling purposes when their use
raises concerns about maintaining the confidentiality of respondent information.20
Records on Individuals
Greater access to Social Security Administration earnings and benefits records
could advance many important areas of retirement-income-related analysis and
modeling. As a stand-alone database, SSA records have the potential to improve
U.S. data on mortality at older ages and to study the relationship of socioeconomic
status (as measured by earnings levels) to mortality.21 Such studies could
be carried out by SSA staff or by researchers who are sworn in as SSA employees
to prevent disclosure of confidential data (as has been done for some Census
Bureau studies). Given the importance of mortality projections for projecting
retirement income security, we urge that priority be given to mortality research
with SSA records.
More problematic from the perspective of confidentiality protection are proposals
to link SSA records with survey responses. Some studies have been done,
but they have been limited. Exact-match files of SSA records with the March
1973 and 1978 CPS, developed by the Census Bureau, were made publicly available
(the 1973 file included an exact match with IRS records), as were exact-match
files of SSA records with the Retirement History Survey. However, no
exact-match files of SSA records with CPS data for years later than 1979 have
been developed for public use. The Census Bureau has developed exact-match
files of Social Security records with the 1984, 1990, and 1991 panels of the
Survey of Income and Program Participation (SIPP), but these files are made
available only to SSA analysts with strict restrictions on use. The Census Bureau
recently released a public-use, exact-match file of the March 1991 CPS with
selected data from IRS administrative tax records. In this file, techniques of
data-switching and the addition of noise were used to mask the data so that no sensitive
information that could identify specific individuals was released. More
extensive matches of IRS data with CPS and SIPP files have been used to evaluate
the quality of income reporting in the March CPS and SIPP and for research
on improved weighting schemes to reduce the variance of SIPP estimates, but
these files are only available internally to Census Bureau staff.
The Department of Labor sponsored a 1977-1978 Survey of Private Pension
Benefit Amounts that linked private employer pension plan records on beneficiaries
with SSA earnings and benefits records (Office of Pension and Welfare
Benefit Programs, 1985). This survey used the Form 5500 database to sample
private pension plans and obtain information from plan administrators on benefits
paid to individual plan participants. The matched records of pension and
Social Security benefits and earnings were used to analyze the contribution of
employer pensions to retirement income security (e.g., to calculate earnings
replacement rates). The response rate from plan administrators was low (about
50%), and large defined contribution plans were underrepresented. However, the
matched data were viewed as more accurate than household survey estimates of
pension retirement benefits, which are typically underreported. No public-use
files were made available from the survey, and it would presumably be difficult
to do so if it were to be repeated.
20See Duncan, Jabine, and deWolf (1993) for a review of confidentiality and access issues for
federal statistical data and promising avenues for addressing the difficulties.
21If SSA tracked marital status of all beneficiaries, then SSA records could also support needed
analysis of the relationship of marital status to mortality.
Legislative restrictions are one reason that publicly available exact-match
files of SSA and survey data have not been developed in recent years. Another
reason is that statistical agencies have become more concerned with questions of
privacy and confidentiality of data and the potentially adverse effects on survey
response rates if people believe that their replies are not held in strict confidence.
Nevertheless, there is a strong need for exact-match files. Calculations of
expected Social Security benefits require either complete histories of covered
earnings or summary variables, such as average indexed monthly covered earn-
ings over a worker's span of employment, that in turn derive from earnings
histories. Such histories are difficult to obtain retrospectively in surveys and
would require decades of data collection to obtain prospectively. Earnings histo-
ries, including earnings above the payroll tax ceiling (available in SSA records
beginning in 1979), are also helpful in calculating expected benefits from the
types of employer pension plans that calculate benefits on the basis of several
years of highest earnings with the employer or that specify employer contribu-
tions as a percentage of earnings. Finally, benefit histories are useful to evaluate
and augment survey responses of Social Security income.
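As a rough illustration of the kind of summary variable described above, the following Python sketch derives an average indexed monthly earnings figure from a covered-earnings history. It is a simplification under stated assumptions, not the SSA benefit computation: the wage index values, the payroll tax ceilings, the indexing year, and the number of computation years are all hypothetical inputs, and the function name is invented for this example.

```python
def average_indexed_monthly_earnings(history, wage_index, ceilings,
                                     index_year, n_years):
    """Summarize an earnings history as one indexed monthly average.

    history    : dict year -> reported earnings (hypothetical values)
    wage_index : dict year -> average wage level used for indexing
    ceilings   : dict year -> payroll tax ceiling (cap on covered earnings)
    index_year : year whose wage level all earnings are restated to
    n_years    : number of highest indexed years to average
    """
    indexed = []
    for year, earnings in history.items():
        covered = min(earnings, ceilings[year])      # cap at the ceiling
        factor = wage_index[index_year] / wage_index[year]
        indexed.append(covered * factor)
    top = sorted(indexed, reverse=True)[:n_years]    # highest years only
    # Dividing by n_years * 12 means that a history shorter than
    # n_years is treated as having zero earnings in the missing years.
    return sum(top) / (n_years * 12)


# Hypothetical three-year history; every value is invented for illustration.
history = {1990: 20000.0, 1991: 60000.0, 1992: 30000.0}
wage_index = {1990: 100.0, 1991: 110.0, 1992: 120.0}
ceilings = {1990: 50000.0, 1991: 55000.0, 1992: 56000.0}

aime = average_indexed_monthly_earnings(history, wage_index, ceilings,
                                        index_year=1992, n_years=2)
```

A single figure of this kind is what a public-use file could carry in place of the full year-by-year history, which is why it depends on access to the underlying earnings records.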
Plans are now being implemented to make available on a restricted basis
exact-match files of HRS/AHEAD and SSA records that will provide very valuable
information for analysis purposes. (Links will also be made with HCFA
Medicare data and possibly with state Medicaid data.) A three-pronged strategy
will be followed to protect confidentiality. First, linked data files with complete
earnings and benefits histories will be made available on a limited access basis
only to researchers who sign nondisclosure agreements that include penalties for
violation. Second, public-use files will include only summary variables derived
from the earnings histories. Third, estimated Social Security entitlements that
have been computed under a variety of assumptions will be made available to
HRS users under restricted conditions (Mitchell, Steinmeier, and Olson, 1996).
We support the preparation of exact-match files that link SSA and other
administrative records with HRS/AHEAD and urge that arrangements be made to
perform these linkages on a regular basis. We also encourage the Census Bureau
and SSA to consider the development of SIPP-SSA exact-match files that can be
made publicly available by following the strategy of HRS and AHEAD, namely,
to provide summary variables derived from the earnings histories that facilitate
the calculation of expected Social Security benefits. (Iams and Sandell [1996],
SSA researchers who are using matched SIPP-SSA files for Social Security ben-
efit modeling, make a similar recommendation.) There are plans to include SSA
information on Social Security benefit type, and whether the respondent has died,
in publicly available SIPP files. We support these efforts and also urge consider-
ation of developing SIPP files for public release that include derived variables
from SSA earnings records.
Records on Employers
Administrative records for employers, such as financial statements that are ab-
stracted in Compustat and the Form 5500 data series, provide useful information
for analytic purposes. These particular data sets, unlike SSA records, are derived
from public documents, but problems can arise when they are merged with other
data for which confidentiality protection is promised (e.g., BLS or Census Bu-
reau surveys).
Employers are sensitive about the release of data that could be useful to
competitors, and it can be very difficult to mask such variables as employer size
sufficiently to prevent disclosure and at the same time maintain the analytical
value of the data. Indeed, microdata from employer surveys, let alone matched
survey and administrative records data, are often not made publicly available at
all.
Sometimes agencies are willing to retabulate confidential data at the request
of researchers. For example, BLS has linked Form 5500 data with the EBS and
run analyses for outside researchers. However, the researchers were not them-
selves given access to the microrecords, and they found this mode of data access
very limiting (MacDonald, 1995).
One possible way to provide greater access to matched employer data is
to adopt the strategy proposed for exact-match files of SSA earnings histories
with HRS/AHEAD. Under this strategy, researchers could gain access to the
complete data sets under very strict conditions of use. At the same time, public-
use files could be developed in which key administrative records variables are
summarized in a manner that is most relevant for research needs and other steps
are taken (e.g., limited geographic identification) to prevent disclosure. If this
approach is adopted for matched employer data, it would be important for agen-
cies to consult with researchers to determine the appropriate summarized vari-
ables.
The Census Bureau is pursuing another very promising approach for re-
search access to its employer data files, including the LRD, which have not been
available for use except at the Bureau's headquarters. This approach may pro-
vide a model for other agencies. Several years ago, the Census Bureau, in
collaboration with the National Bureau of Economic Research, a private organi-
zation, established a secure Research Data Center at its Boston regional office.
Researchers may come to the center, be sworn in as special Census Bureau
agents, and use the data sets on site. Census Bureau employees must review any
output that researchers take with them to ensure that it does not identify specific
respondents. Although more limiting than use of microdata at one's own
institution, this arrangement is far preferable for researchers in the Boston area
to having to come to the headquarters in the Washington, D.C., area. The success of
the Boston data center has led the Census Bureau to set up a second center at
Carnegie Mellon University in Pittsburgh, and the agency is exploring research-
ers' interest in having similar centers in other major cities around the country.
Recommendations
11. Matched files of panel survey responses and key administrative
records should be regularly produced for retirement-income-related policy
analysis and projection purposes. Examples include exact matches of survey
records with Social Security earnings histories and benefit records, Medi-
care and Medicaid records, and the National Death Index. The added infor-
mation in matched files is obtainable at low marginal cost and is essential for
analysis of retirement and savings decisions and the effect of medical care
use and expenditures on retirement security.
12. Agencies should collaborate on the development and oversight of
matched data sets for individuals and employers, with input from research-
ers on content. They should also vigorously explore creative solutions for
providing research access to exact-match files that safeguard the confidenti-
ality of individual responses. Possible solutions include: (1) developing
public-use files that contain summary variables derived from the adminis-
trative records portion of the matched file; (2) requiring researchers to sign
nondisclosure agreements with significant penalties for violations; and (3)
providing researchers with access to matched files on site at secure data
centers.
DATA VALIDATION
Validation of databases that are used in behavioral and projection models is as
important as validation of the models themselves. Sampling errors in data inputs
are one source of uncertainty of model estimates; more important, nonsampling
errors can introduce both uncertainty and bias into model estimates. Data valida-
tion is essential to identify the types and magnitudes of such errors. It is also
essential for survey methodological research, which should be part of every data
collection program to determine procedures for improving data quality at the
outset by improving questionnaire design and data collection procedures.
There are many sources of nonsampling errors in both surveys and adminis-
trative records. One source is unit nonresponse, that is, failure by a reporting unit
to provide any information at all. Panel surveys are subject to cumulative unit
nonresponse over time, or attrition, as people become tired of cooperating with
the survey or move and cannot be traced. Other sources of error are nonresponse
to specific items, overreporting (e.g., a false positive report of pension coverage),
underreporting (e.g., reporting an amount less than actually received for an in-
come source), and misclassification (e.g., reporting a defined benefit pension
plan as a defined contribution plan or vice versa). Yet another source of error in
surveys is undercoverage of the population because the sampling frame does not
include all people or employers in the universe or for other reasons. For example,
household surveys of the general population almost always have low coverage
rates of such groups as young minority men.22
Surveys and administrative records systems use several methods to try to
compensate for nonsampling errors, such as adjustment of survey weights for
population undercoverage and attrition, imputation for item nonresponse, and
editing for misclassification or inconsistency in reporting. However, these proce-
dures are not likely to maintain all of the underlying relationships and may
themselves be a source of bias.
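To make the weighting adjustment mentioned above concrete, here is a minimal Python sketch of one common form, a weighting-class adjustment for unit nonresponse. The adjustment classes, field names, and weights are hypothetical, and actual survey procedures are considerably more elaborate.

```python
from collections import defaultdict


def adjust_weights_for_nonresponse(sample):
    """Inflate respondents' weights so each class keeps its weighted total.

    sample: list of dicts with keys 'cls' (adjustment class), 'weight'
            (base weight), and 'responded' (bool). All names are
            hypothetical, chosen only for this illustration.
    Returns a list of (record, adjusted_weight) pairs for respondents.
    """
    total = defaultdict(float)   # weighted count of all sampled units
    resp = defaultdict(float)    # weighted count of responding units
    for r in sample:
        total[r["cls"]] += r["weight"]
        if r["responded"]:
            resp[r["cls"]] += r["weight"]

    adjusted = []
    for r in sample:
        if r["responded"]:
            factor = total[r["cls"]] / resp[r["cls"]]
            adjusted.append((r, r["weight"] * factor))
    return adjusted


sample = [
    {"cls": "young", "weight": 100.0, "responded": True},
    {"cls": "young", "weight": 100.0, "responded": False},
    {"cls": "old", "weight": 50.0, "responded": True},
    {"cls": "old", "weight": 50.0, "responded": True},
]
adjusted = adjust_weights_for_nonresponse(sample)
```

The adjustment restores the weighted total in each class, but it does so by assuming that nonrespondents resemble respondents in the same class. Where that assumption fails, the adjusted estimates are biased, which is the concern raised in the text.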
Validation Methods
Validation involves estimating overall error rates and the contribution of indi-
vidual sources of error to them, including the contribution of weighting, imputa-
tion, and editing procedures. The problem is to determine appropriate bench-
marks for comparison. There are several approaches to validation; see Box 4-10
for examples of their use.
Reinterviews Asking a sample of respondents the same question in a
reinterview cannot establish which answer is correct, but it can indicate whether
the responses are robust in the sense that there is a high level of consistency
between the answers given originally and in reinterviews.
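A simple consistency measure of the kind a reinterview study might report can be sketched in Python as follows. The answer lists are invented, and the function is only an illustration of the idea, not the procedure of any particular survey.

```python
def reinterview_consistency(original, reinterview):
    """Share of respondents who gave the same answer both times.

    original, reinterview: answers to the same question, listed in the
    same respondent order (hypothetical data).
    """
    if len(original) != len(reinterview) or not original:
        raise ValueError("need two non-empty answer lists of equal length")
    same = sum(1 for a, b in zip(original, reinterview) if a == b)
    return same / len(original)


# Five hypothetical respondents asked about pension coverage twice.
original = ["yes", "no", "yes", "yes", "no"]
reinterview = ["yes", "no", "no", "yes", "no"]
rate = reinterview_consistency(original, reinterview)
```

A high rate indicates stable reporting, not correct reporting: a respondent who gives the same wrong answer twice still counts as consistent.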
Use of Alternative Question Wording Experimentation with different ques-
tion wording, or other aspects of questionnaire design (such as the order in which
questions are asked) may determine that the responses are sensitive to such
variations. In the past 10 years, federal statistical agencies have made increasing
use of techniques from cognitive psychology to study in greater depth the ways in
which respondents react to and interpret specific question wording. The results
of such methods, which include one-on-one sessions in which a researcher probes
the respondent after each question to ask what he or she had in mind when
answering, have often shown startling differences in perceptions between
respondents and survey personnel (see Jabine et al., 1984).
22 Coverage rates are developed by comparing survey population estimates by age, race, and sex to
census population estimates updated by births, deaths, and estimated net immigration; Medicare
records are used for the elderly.
[Box 4-10: examples of the use of validation methods]
Aggregate Comparisons of Two or More Surveys Comparing aggregate
estimates from one survey with aggregate estimates from another survey that is
believed to be superior can provide an overall measure of data quality. For
example, as discussed above, estimates of household wealth from such surveys as
HRS or SIPP have been compared with estimates from the SCF. Another ex-
ample is comparing estimates of retiree pension and health care benefits from the
March CPS income supplement with estimates from the detailed supplements
that have been conducted occasionally on these income sources (most recently in
September 1994; see Pension and Welfare Benefits Administration, 1995b).
However, aggregate comparisons do not generally shed light on the sources of
error in survey estimates. Also, they need to be carefully made to ensure that
definitions of the reporting universe and data items are comparable between the
surveys being compared.
Aggregate Comparisons of Surveys with Administrative Records Data Sur-
vey and administrative records comparisons are often viewed as a preferred
method of measuring overall data quality, on the assumption that the administra-
tive records estimates represent "truth." For example, validation studies of the
quality of income data in such surveys as the March CPS and SIPP have used
estimates from IRS tax records, food stamps and other program records, and the
National Income and Product Accounts (NIPA) as benchmarks.
However, such comparisons often require extensive adjustments of the ad-
ministrative sources, which cannot always be completely made, for consistency
of coverage and definitions with the survey data. Thus, comparing NIPA and
survey income estimates requires adjusting the NIPA estimates to exclude in-
come of institutionalized people, Armed Forces members overseas, and others
who are not covered in household surveys (including nonprofit institutions in
some cases). In another example, comparing the percentage of private wage and
salary workers who participate in employer pension plans between the Form
5500 data series and the periodic supplements to the CPS on pensions requires
several adjustments (see Beller and Lawrence, 1990). The two series do not
include exactly the same types of pensions; also, the Form 5500 series includes
nonvested participants who left their jobs less than 1 year previously, and it
double counts workers with more than one job in which they are covered.
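The logic of an aggregate comparison against an adjusted benchmark can be sketched in a few lines of Python. All of the totals below are invented, and the out-of-scope categories simply echo the examples in the text; real adjustments involve many more components, some of which cannot be fully measured.

```python
def adjusted_benchmark(benchmark_total, out_of_scope):
    """Remove components that the survey universe does not cover."""
    return benchmark_total - sum(out_of_scope.values())


def survey_to_benchmark_ratio(weighted_survey_total, benchmark_total,
                              out_of_scope):
    """Survey aggregate as a share of the comparably defined benchmark."""
    return weighted_survey_total / adjusted_benchmark(benchmark_total,
                                                      out_of_scope)


# Hypothetical income aggregates in billions of dollars.
out_of_scope = {
    "institutionalized": 30.0,
    "armed_forces_overseas": 10.0,
    "nonprofit_institutions": 10.0,
}
ratio = survey_to_benchmark_ratio(855.0, 1000.0, out_of_scope)
```

Comparing the survey total with the unadjusted benchmark would overstate the apparent shortfall, since part of the benchmark falls outside the survey universe.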
Finally, administrative sources are not always error free. For example, there
is evidence that earnings are underreported to assistance program caseworkers,
which suggests that household surveys are not necessarily inaccurate when they
find higher proportions of public assistance recipients with earnings than shown
in case records. Also, Medicare records are not an entirely accurate representa-
tion of the older population, given the problem of phantom enrollees (records for
people who have already died).
Microlevel Comparisons of Survey and Administrative Records Exact-
match files make it possible to carry out detailed validation studies that decom-
pose overall error levels into specific sources of error, including overreporting,
underreporting, misclassification, and erroneous imputation for nonresponse. Again,
care needs to be taken to assure comparability of universes and data items: for
example, not everyone is required to file a tax return.
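The decomposition that an exact-match file makes possible can be illustrated with a short Python sketch that labels each matched pair of amounts. The category names, the tolerance parameter, and the data are assumptions made for this example, and the sketch treats the administrative amount as the benchmark even though such records are not always error free.

```python
from collections import Counter


def classify_report(survey_amount, record_amount, tolerance=0.0):
    """Label one matched pair of survey and administrative amounts."""
    if record_amount == 0 and survey_amount > 0:
        return "false_positive"    # survey reports a source the record lacks
    if record_amount > 0 and survey_amount == 0:
        return "false_negative"    # survey misses a source the record shows
    diff = survey_amount - record_amount
    if diff > tolerance:
        return "overreport"
    if diff < -tolerance:
        return "underreport"
    return "consistent"


def error_profile(pairs, tolerance=0.0):
    """Count matched records falling into each error category."""
    return Counter(classify_report(s, r, tolerance) for s, r in pairs)


# Hypothetical (survey amount, administrative amount) pairs.
pairs = [
    (1000.0, 1000.0),
    (800.0, 1000.0),
    (0.0, 500.0),
    (1200.0, 1000.0),
    (300.0, 0.0),
    (0.0, 0.0),
]
profile = error_profile(pairs)
```

Offsetting errors of this kind are invisible in an aggregate comparison, where the overreport and the underreport in this example would cancel.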
Because of confidentiality restrictions, the opportunity for microlevel error
analyses has generally been limited to federal statistical agency staff. One analy-
sis by outside researchers is Herzog and Rubin (1983), who studied the quality of
March CPS Social Security benefit imputations with the publicly available 1973
CPS-SSA-IRS exact-match file. David et al. (1986) carried out a similar study of
earnings imputations with a 1981 CPS-IRS exact-match file that they used while
working at the Census Bureau as special sworn agents.
Validation Needs
To improve the capability for accurate modeling and analysis of retirement-
income-related policies and behaviors, validation studies of key data sources
should be carried out on a regular basis. Such studies can provide important
feedback to data collection agencies to improve data quality at the source. They
are also needed to enable researchers and policy analysts to determine appropri-
ate strategies to compensate for data problems in their models. For these pur-
poses, it can be useful to develop data quality profiles that are regularly updated
as new information becomes available. Quality profiles bring together the results
of validation studies for a particular survey or administrative records system into
a comprehensive document that describes sources of error and their magnitudes,
where known, and that identifies areas for which more validation work is needed
(see, e.g., Jabine, King, and Petroni, 1990, which is a quality profile for SIPP).
Several kinds of data validation studies could be useful for retirement-
income-related databases.
Comparing CPS, SIPP, and HRS Reports of Pension Participation SSA
recently completed a comparison of the May 1993 CPS pension supplement with
1993 data from the 1992 SIPP panel, finding that participation (coverage)
estimates in the two surveys are almost identical (Iams, 1995). A similar analysis
should be performed for all three surveys for the HRS age cohort.
Comparing Household Survey Reports of Pension Participation with Esti-
mates from Employer Administrative Records Aggregate comparisons, such as
the study by Beller and Lawrence (1990) of the CPS pension supplements and the
Form 5500 data series, should be carried out on a regular basis. More work is
needed to improve the validity of such comparisons to account, for example, for
worker participation in more than one plan and in plans of more than one em-
ployer.
Microlevel comparisons of household survey reports of pension plan provi-
sions with employer records are possible and should be carried out for sample
members of HRS, although the quality of the analysis may be affected by the
relatively low rate of employer response. About 25 percent of sample members'
employers did not respond to the request for Summary Plan Descriptions, and
another 10 percent provided inadequate information with which to code relevant
pension plan features. This level of employer response is typical of the experi-
ence of other surveys that have requested the descriptions, such as the 1989 SCF
and 1989 NLS-Mature Women (see Juster and Suzman, 1995:44-45).
Comparing Household Survey Data on Income and Assets Across Surveys
and with Administrative Records Comparisons should be regularly performed
of household survey reports with other surveys (e.g., comparing wealth estimates
from HRS/AHEAD or SIPP with the SCF) and with NIPA and other administrative
records sources (e.g., income tax records). Such comparisons, particularly
with administrative records, require considerable care.
With regard to pension income, a major issue is the treatment of the rapidly
growing phenomenon of lump-sum pension distributions, which are treated dif-
ferently in different surveys and records. Lump-sum distributions are included in
the NIPA accounts and in income tax returns; according to the income concept of
the March CPS, lump sums are not to be reported; SIPP has a separate category to
report lump sums of all types; and HRS/AHEAD has questions on several types
of lump sums, including pension distributions. Comparisons of March CPS,
SIPP, IRS, and NIPA data suggest that some CPS and SIPP respondents may be
reporting lump-sum pension amounts as regular income, but the extent to which
this happens is not clear (Coder and Scoon-Rogers, 1994:21-24; see also Schieber,
1995). Careful analysis of pension income reporting in the March CPS and SIPP
in comparison with HRS/AHEAD for the HRS/AHEAD age range could be
helpful, as could cognitive research with respondents to determine their knowl-
edge of types of pension income and, in particular, whether they distinguish lump
sums from pension distributions that are spread out over time. To make house-
hold surveys more useful for retirement-income-related analysis, it would clearly
be desirable to obtain as complete reporting as possible of both regular and lump-
sum pension amounts.
To the extent that these and other validation studies identify serious data
quality problems, behavioral and projection models will need to be adjusted or
their results qualified in an appropriate manner. For example, some microsimu-
lation projection models have a provision to adjust March CPS income data for
comparability with NIPA estimates. Such adjustment procedures must be care-
fully worked out, not only to be sure that the NIPA estimates are in fact compa-
rable with CPS income concepts, but also to preserve key relationships among
income amounts and other variables.
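The simplest form such an adjustment could take is a uniform ratio adjustment by income type, sketched below in Python. This is an assumed illustration, not the procedure of any particular microsimulation model; the record layout and benchmark totals are hypothetical.

```python
def ratio_adjust(records, benchmarks):
    """Scale each income type so its weighted total matches a benchmark.

    records    : list of dicts with 'weight' and one key per income type
    benchmarks : dict income type -> benchmark total (hypothetical)
    Returns (adjusted records, scaling factor per income type).
    """
    factors = {}
    for source, target in benchmarks.items():
        survey_total = sum(r["weight"] * r[source] for r in records)
        # Leave a source untouched if the survey reports none of it.
        factors[source] = target / survey_total if survey_total else 1.0

    adjusted = []
    for r in records:
        new = dict(r)                       # keep weight and other fields
        for source, factor in factors.items():
            new[source] = r[source] * factor
        adjusted.append(new)
    return adjusted, factors


records = [
    {"weight": 10.0, "wages": 100.0, "pension": 0.0},
    {"weight": 10.0, "wages": 300.0, "pension": 50.0},
]
benchmarks = {"wages": 4400.0, "pension": 1000.0}
adjusted, factors = ratio_adjust(records, benchmarks)
```

Uniform scaling preserves relative amounts within an income type, but it leaves people who reported none of a source at zero and changes the mix of income types for each person. Those side effects are one reason the adjustment procedures need to be worked out with care.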
Recommendation
13. Budgets for retirement-income-related surveys should include suffi-
cient resources for regular evaluation of data quality. Evaluation methods
include reinterviewing subsamples of respondents to measure consistency of
reporting; experimentation with alternative question wording to identify
possible reporting problems; and comparing survey estimates with adminis-
trative records to determine the completeness and accuracy of survey re-
porting, taking care to adjust for differences in definitions and other aspects
of the two sources.