Data

Good data are essential for good models of retirement-income-related behavior and policy changes. Indeed, the availability of relevant, high-quality data can often simplify the analysis and modeling task by reducing the steps that are required to compensate for inadequate data. For example, modelers have sometimes had to expend considerable effort to develop an integrated set of needed variables by using statistical methods to match data files for different units of observation (see Cohen, 1991a). The availability of exactly matched records for the same units would have provided much higher quality information and eliminated the statistical matching operation.

In the final report, we will develop criteria for needed data for retirement-income-related behavioral analysis and policy modeling and consider cost-effective data development strategies. We will also consider the important issue of data quality and needed data validation work. For example, there are well-known discrepancies between the reporting of such income sources as pension benefits, interest, and dividends in household surveys compared with income tax and other administrative records (see, e.g., Coder and Scoon-Rogers, 1994). Similarly, there are discrepancies in reported pension coverage. Some portion of these discrepancies is due to differences in definitions and universes (e.g., household surveys exclude the income of the institutionalized; also, they provide data for individual workers whereas administrative records for employers double count employees with more than one job). However, reporting errors are also a factor. It is important to understand the reasons for the differences among data sources so that strategies can be developed to improve the input data for models and also to improve the adjustments in models to compensate for remaining data problems.

Here we briefly review types of existing data for individuals, focusing on the need for panel surveys for analysis purposes. We also review existing data for employers, noting the paucity of information about the characteristics of employers and their employees, both cross-sectionally and longitudinally. Finally, we discuss the need for exact-match files of survey and administrative records information to improve analysis and modeling in the area of retirement income.



Copyright © National Academy of Sciences. All rights reserved.




Toward Improved Modeling of Retirement Income Policies: Interim Report

DATA ON INDIVIDUALS

Data on individuals (or families or households) include administrative time series (e.g., statistics on age at acceptance of Social Security retirement benefits by year); cross-sectional surveys of samples of respondents at a point in time, sometimes with retrospective longitudinal information (e.g., job histories); and panel surveys that provide longitudinal data for the same people observed at several points in time.

For analytical purposes, it is critical to have good longitudinal data from panel surveys. Such data make it possible to assess more accurately than with cross-sectional data the factors that influence behavior and to determine the behavioral effects of changed conditions, including policy changes.[1] Panel survey data also make it possible to separate age from cohort effects: for example, people who are now in their 40s may not exhibit the same retirement patterns when they reach their 60s as the people observed in a cross-sectional survey who are now in their 60s.

Policy models, as distinct from analytical or behavioral models, have not usually operated directly with panel data. For example, microsimulation models typically use large-scale cross-sectional surveys that provide the sample size to permit disaggregating outcomes for population groups.[2] One widely used cross-section is the March income supplement to the Current Population Survey (CPS), which asks about income and employment for the previous year for a sample of 60,000 households. When the Survey of Income and Program Participation (SIPP) is redesigned in 1996 to provide larger samples, it may also become a modeling database. SIPP has both cross-sectional and longitudinal features, following members of original sample households for 32 months, to be expanded to 48 months under the redesign. (See the Appendix to Bodie and Munnell [1992] for descriptions of the March CPS, the pension supplements that are included periodically in the CPS, and SIPP.)

Microsimulation and other types of policy models, no matter what type of data they operate on directly, depend heavily on panel surveys for key inputs. For example, dynamic microsimulation models use behavioral coefficients developed from analysis of panel data to age or project forward a cross-sectional sample to represent the population in future years (e.g., to simulate probabilities of marriage, job change, retirement, death).

There are several completed and ongoing panel surveys sponsored by government agencies that contain data relevant to retirement-income-related analysis and modeling (see Appendix B). These surveys have supported a wide range of important behavioral studies that have contributed insights on a breadth of topics, including the retirement decision and other retirement-income-related behavior (see studies cited in the papers we commissioned, e.g., Gustman and Juster, 1994; Lumsdaine, 1995). However, panel surveys vary in sample size, content, age range and sex of people originally included, and length of time for which they have followed, or expect to follow, the panel members. Hence, there are some significant gaps. For example, a sample of women currently in their 40s has been followed for over 25 years in the National Longitudinal Survey of Young Women; however, a companion sample of men was followed only through their 30s.

[1] Cross-sectional surveys can collect longitudinal information, such as employment and earnings histories, retrospectively, and in some cases this is done. However, compared with panel surveys, the quality of retrospective information is more subject to recall and other errors; also, the length of such a cross-sectional survey could pose a problem of respondent burden.

[2] An exception is a recently developed public assistance model, STEWARD (Simulation of Trends in Employment, Welfare, and Related Dynamics), which makes direct use of data in the National Longitudinal Survey of Youth (NLSY) to simulate the effects of welfare reform proposals on program participation (Jacobson and Czajka, 1994).
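
As a rough illustration of the aging step that dynamic microsimulation models perform, the sketch below projects a small cross-sectional sample forward one year at a time using annual transition probabilities. The probabilities, age groups, field names, and helper functions are all hypothetical placeholders, not values or code from any actual model; in practice such coefficients would be estimated from panel survey data.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical annual transition probabilities (illustrative only;
# real models estimate such coefficients from panel survey data).
P_RETIRE = {"under_62": 0.02, "62_to_64": 0.15, "65_plus": 0.30}
P_DEATH = {"under_62": 0.005, "62_to_64": 0.012, "65_plus": 0.03}


@dataclass
class Person:
    age: int
    working: bool = True
    alive: bool = True


def age_group(age: int) -> str:
    """Map an age to one of the (assumed) probability groups."""
    if age < 62:
        return "under_62"
    if age < 65:
        return "62_to_64"
    return "65_plus"


def age_one_year(person: Person, rng: random.Random) -> None:
    """Advance one person by a year, drawing death and retirement events."""
    if not person.alive:
        return
    group = age_group(person.age)
    if rng.random() < P_DEATH[group]:
        person.alive = False
        person.working = False
        return
    if person.working and rng.random() < P_RETIRE[group]:
        person.working = False
    person.age += 1


def project(sample: List[Person], years: int, seed: int = 0) -> List[Person]:
    """Age every record in a cross-sectional sample forward `years` years."""
    rng = random.Random(seed)
    for _ in range(years):
        for person in sample:
            age_one_year(person, rng)
    return sample


if __name__ == "__main__":
    cross_section = [Person(age=a) for a in range(50, 65) for _ in range(100)]
    projected = project(cross_section, years=10)
    alive = [p for p in projected if p.alive]
    print(len(alive), "alive;", sum(p.working for p in alive), "still working")
```

A full model would draw many more events (marriage, job change, and so on) and would condition the probabilities on individual characteristics rather than on age group alone.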

One panel survey that may make possible major advances in research knowledge and also contribute methodological innovations that other surveys can usefully adopt is the new Health and Retirement Survey (HRS; see Juster and Suzman, 1993). HRS originated when it proved infeasible to go forward with a plan to reinterview the survivors of the Retirement History Survey, which followed men (and their spouses) ages 58-63 in 1969 for a 10-year period (the reasons had to do with new confidentiality restraints as the result of federal income tax law changes). Instead, the National Institute on Aging (NIA) funded, through a cooperative agreement with the University of Michigan Survey Research Center, the new HRS.

HRS began with a sample of 8,000 households in which the household head or the spouse, if present, was age 51-61 in 1992. (The sample included an oversample of blacks, Hispanics, and residents of the state of Florida.) People interviewed included the age-eligible individual in the household and his or her spouse (often, both spouses were age eligible), for a total of 12,650 respondents in the first wave of interviewing. A second interview wave was conducted in 1994. The interviews are lengthy, covering demographic characteristics, family structure, economic status, labor market activity, intergenerational transfers, income, assets, expectations, and functional health.

Innovative techniques have been used in HRS to obtain information about assets. Typically, in surveys, people provide information about the types of assets they hold but not about the value of each asset. In HRS, an "unfolding" or "bracketing" technique is used, in which holders of an asset who don't know or refuse to provide a value are asked if the value is above a certain amount; if yes, whether it is above another (higher) amount, and so on. High rates of response are obtained by this method, although the response categories are very broad: for example, less than $1,000, $1,000 to $10,000, $10,000 to $50,000, $50,000 or more.

Another innovation in HRS is the battery of questions about expectations, expressed as probability judgments relating to future events. Thus, respondents are asked about the probability of living to age 75 or 85, the probability of working full time at age 62 or 65, the probability of moving, the probability of adverse health events, and the probabilities attached to such macro phenomena as high inflation rates, a major depression, and the likelihood that the Social Security system will become more or less generous in the future. Also asked are questions that measure risk aversion and time preference. Preliminary analysis suggests that the expectation questions obtain results that make sense in light of respondents' other characteristics. Further work will be required to determine to what extent they may account for differences in retirement and savings decisions.

Data are obtained in HRS not only through survey instruments but also through linkages with administrative records. Summary plan descriptions that describe pension coverage and benefits are obtained from respondents' employers, coded, and added to the survey records. Also, respondents' employers are surveyed about health insurance coverage and benefits. There are plans to link Social Security records on covered earnings and (from 1978 forward) on total taxable compensation to the survey data for respondents (see below).

A companion survey to HRS, Asset and Health Dynamics Among the Oldest Old (AHEAD), began in winter 1993-1994 with interviews of a sample of households in which the head or spouse was age 70 or older (Hurd et al., 1994). Sample cases were drawn from the screening interviews conducted by HRS and from a list frame provided by the Health Care Financing Administration (HCFA). Blacks, Hispanics, and residents of Florida were oversampled. Interviews were obtained from 8,200 age-eligible people and their spouses. The questionnaire content in AHEAD is similar to that in HRS, except that AHEAD examines functional health in much greater depth and touches only lightly on labor market activity. Linkages of Social Security earnings records and Medicare and Medicaid claims data are planned for AHEAD.
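
The unfolding (bracketing) technique described above for the HRS asset questions can be sketched as a short sequence of yes/no follow-ups. This is a minimal illustration, not the HRS instrument: the breakpoints are taken from the example categories in the text, the ascending question order follows the description given here, "above" is treated as "at or above" so that the categories do not overlap, and the `unfold` function and its callback are hypothetical names.

```python
from typing import Callable, Optional, Tuple

# Breakpoints taken from the example categories in the text:
# under $1,000; $1,000 to $10,000; $10,000 to $50,000; $50,000 or more.
BREAKPOINTS = (1_000, 10_000, 50_000)


def unfold(is_at_least: Callable[[int], Optional[bool]],
           breakpoints: Tuple[int, ...] = BREAKPOINTS
           ) -> Tuple[int, Optional[int]]:
    """Return (lower, upper) bounds on an asset value; upper None = open-ended.

    `is_at_least(x)` stands in for the interviewer's follow-up question and
    returns True, False, or None (respondent still cannot or will not say).
    """
    lower, upper = 0, None
    for point in breakpoints:
        answer = is_at_least(point)
        if answer is None:   # no further information; keep the bounds so far
            break
        if not answer:       # value is below this breakpoint
            upper = point
            break
        lower = point        # value is at or above this breakpoint; continue
    return lower, upper


if __name__ == "__main__":
    true_value = 25_000  # unknown to the interviewer; used here to answer
    print(unfold(lambda x: true_value >= x))  # (10000, 50000)
```

Each additional answer narrows the interval, which is why partial information is recovered even from respondents who stop answering partway through the sequence.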

AHEAD promises to add valuable information to address such questions as whether and when older households in fact decumulate assets (which the life-cycle theory of consumption postulates should occur but for which there is only limited evidence from studies to date) and whether older households treat housing assets as interchangeable with other assets in providing resources for consumption. The answer to the latter question is critical for an assessment of the prospects for retirement income security of the baby boom generation, many of whose members appear to have relatively little saved beyond their homes.

NIA has funded an extension of both HRS and AHEAD for another 5-year funding cycle (although the funding has recently been cut back). The plan for HRS is to conduct two more interview waves with the original cohort (waves 3 and 4) and to introduce in 1998 a new cohort of people born in 1942-1947 (i.e., ages 51-56) and their spouses. For AHEAD, the plan is to conduct three more interviews with the respondents to the original first wave interview and to introduce a cohort in between HRS and AHEAD. For the longer run, it is hoped that interviewing can continue with the HRS cohorts and that new cohorts can be introduced.

DATA ON EMPLOYERS

By and large, much less information that is relevant to retirement-income-related policy concerns is available for employers than for individuals. (See the Appendix to Bodie and Munnell [1992] for descriptions of available public and private data sets on employers and employer benefits.)

One source of information is the Form 5500 data system maintained by the U.S. Department of Labor. Private employers are required to file with the Internal Revenue Service (IRS) information about their pension plan type(s), participants, and financial characteristics on Form 5500. IRS, in turn, processes the data and forwards copies to the Department of Labor's Pension and Welfare Benefits Administration (PWBA) (see, e.g., Pension and Welfare Benefits Administration, 1995). The Pension Benefit Guaranty Corporation has data (e.g., plan types, number of participants) on defined benefit pension plans that it insures and on people who are receiving payments from it because it assumed responsibility for their employer's plan.

Surveys conducted by the Bureau of Labor Statistics also provide some relevant information. The Employee Benefit Survey obtains descriptions of various types of benefits offered to workers by employers from a sample of about 6,000 private sector establishments and state and local governments (mid- and large-size private employers are surveyed in one year and small private employers and state and local governments the next year). The Employment Cost Index survey provides quarterly estimates of changes in wage and benefit costs and annual estimates of the level of wage and benefit costs, developed from samples of establishments and occupations.

In addition, some private benefit consulting firms and employer associations maintain databases of pension plan information for client or member companies (see, e.g., HayGroup, 1993). Recently, the Society of Actuaries' Retirement Systems Experience Task Force began a program of data collection from large companies (5,000 or more employees) with defined benefit pension plans as the basis for studies of employee turnover and retirement. The intent is to collect data every 3 years on the number of employees at each company at the beginning and end of the year by age, sex, and length of service, and for each category the employees' eligibility for full or reduced retirement benefits, aggregate salary, and aggregate accumulated retirement benefit. Also collected will be pension plan descriptions.

None of the current or planned data sets is adequate for retirement-income-related analysis of employers.
Each data set has limited content, particularly with regard to information about individual employer characteristics that could explain differences among employers with regard to worker recruitment, retention, benefit, and retirement policies. Also, none of them contains other than the most aggregate information on the characteristics of the workers of an employer, including the important dimension of worker productivity. Some of the data sets have a panel aspect to them (e.g., Form 5500 data can be linked for private employers over time).[3] However, their use to analyze factors that may lead employers to change their worker and benefit policies is limited because of limited content, or a restricted universe (e.g., only the largest companies may be included), or both. Finally, there is even less information available about retirement-income-related employer benefit programs that are not tax-qualified pension plans (e.g., nonqualified pension plans for executives and others and retiree health plans).[4]

[3] PWBA makes available computer files of all Form 5500 filings and of samples for some years in which the data have been more extensively edited. The composition of the sample changes from year to year; however, it is possible to link records across years for some employers and plans by matching on employer identification and plan numbers. See Smart and Waldfogel (1995) for a study of funding patterns of defined benefit pension plans that uses a matched file for 1984-1989.

[4] The Agency for Health Care Policy and Research (AHCPR) collects information about employer-provided health care benefits for current and retired workers as part of the household-based National Medical Care Expenditure Survey, last conducted in 1987. In 1994, AHCPR co-sponsored with HCFA and the National Center for Health Statistics the National Employer Health Insurance Survey, which obtained information about health care plans from a large sample of 39,000 private and public sector establishments. However, these surveys do not also collect information about employer pension plans or other benefits, and whether and how often they will be conducted in the future is uncertain.

EXACT-MATCH FILES

For many kinds of policy questions, models can best use a combination of survey and administrative data for individual decision units (e.g., people or employers). For example, detailed models of Social Security and employer pensions need information on workers' employment and earnings histories along with a range of other information (demographic characteristics, family relationships, other sources of income). Social Security Administration (SSA) records are the best source of complete employment and earnings histories, but they do not contain much additional information, so an exact match of a household survey with SSA records is needed. However, no exact matches of Social Security records with the March CPS or other household surveys have been made publicly available subsequent to the 1979 exact-match file used by the PRISM model. (An exact-match file of Social Security records with the 1984 SIPP panel was prepared but was made available only to SSA analysts under strict conditions of use.)

On the employer side, some analyses have been conducted with matched data sets from the Form 5500 data system linked by employer identification number with Compustat data abstracted from companies' financial reports to the Securities and Exchange Commission (see, e.g., Bajtelsmit, 1995; Ghilarducci, 1995). However, such data have never been matched to other kinds of records (e.g., Social Security wage records for employees) or to surveys of employers and employees that could increase their analytical usefulness.

Legislation that restricts the release of administrative records has been a major impediment to the development of more recent or extensive exact-match files. Also, statistical agencies that conduct surveys have been more and more concerned with questions of privacy and confidentiality of data and the potentially adverse effects on survey response rates if individuals or businesses believe that their replies are not held in strict confidence. Recently, data collection agencies and data users have engaged in serious discussion about ways to provide greater access to information while safeguarding confidentiality (see Duncan et al., 1993). Some of the most promising approaches involve schemes for limited access: for example, arrangements in which users are sworn in as temporary agency employees and have access to the data only on-site or in which users (and their institutions) sign agreements that prohibit redistribution of the data and contain penalties for disclosure of individual information.

We understand that discussions are under way about the preparation of exact matches of SSA administrative records with HRS and AHEAD. (SSA earnings histories would be matched with both HRS and AHEAD; SSA benefits histories and Medicare claims histories would be matched with AHEAD.) These files would be available to researchers who sign the necessary agreements; also, some variables on the files (e.g., geographic location of residence) would be masked to further safeguard confidentiality. The availability of matched files from HRS and AHEAD would be very helpful for analysis of retirement and savings decisions and other behavior relevant to retirement income security.
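
To make the idea of an exact-match file concrete, the sketch below attaches administrative earnings histories to survey records that share an identifier, then drops the identifier and masks a geographic field before release. It is only an illustration under stated assumptions: the records are invented, and the `person_id` key, the field names, and the `exact_match` function are hypothetical and do not reflect the layout of any actual survey or agency file.

```python
from typing import Dict, List, Tuple

# Hypothetical records; the field names and the "person_id" linkage key are
# illustrative assumptions, not the layout of any actual survey or agency file.
survey = [
    {"person_id": "A1", "age": 58, "state": "FL", "reported_pension": 4_000},
    {"person_id": "B2", "age": 61, "state": "MI", "reported_pension": 0},
    {"person_id": "C3", "age": 55, "state": "OH", "reported_pension": 1_200},
]
admin_earnings = {
    "A1": [21_000, 22_500, 24_000],
    "C3": [30_000, 31_000, 33_500],
}

MASKED_FIELDS = ("state",)  # e.g., geographic location of residence


def exact_match(survey_rows: List[dict],
                earnings_by_id: Dict[str, List[int]],
                masked: Tuple[str, ...] = MASKED_FIELDS) -> List[dict]:
    """Attach administrative earnings histories to survey rows by identifier.

    Only rows whose identifier appears in both sources are kept; the
    identifier itself is dropped and masked fields are removed.
    """
    matched = []
    for row in survey_rows:
        history = earnings_by_id.get(row["person_id"])
        if history is None:
            continue  # no administrative record for this respondent
        out = {k: v for k, v in row.items()
               if k != "person_id" and k not in masked}
        out["earnings_history"] = list(history)
        matched.append(out)
    return matched


if __name__ == "__main__":
    for record in exact_match(survey, admin_earnings):
        print(record)
```

Because the link is made on a shared identifier for the same unit, no statistical imputation of the earnings history is involved; the cost is that unmatched respondents drop out of the file, and release conditions such as masking limit what analysts can see.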