Read "The Future of the Survey of Income and Program Participation" at NAP.edu

Suggested Citation:"6 DATA PRODUCTS AND THEIR USE." National Research Council. 1993. The Future of the Survey of Income and Program Participation. Washington, DC: The National Academies Press. doi: 10.17226/2072.



6 Data Products and Their Use

Obviously, an investment in data collection can only earn a return to the extent that the data are used for basic and applied research, policy analysis, and improved public information. In order for the investment in a rich, complex survey such as SIPP to earn a high return, it is imperative that the responsible agency have an active data dissemination program that includes published reports, computer-readable data products, and associated explanatory materials, all produced on a timely basis and in accessible formats.

In this chapter we discuss the requirements for an effective data dissemination program for SIPP. We cover both the types of reports that should be developed and some of the conceptual and measurement issues that arise in estimating income and program statistics from the complex information in SIPP. We also consider microdata products and review the kinds of informational and instructional materials that SIPP users (whether of computer-readable files or printed reports) need in order to make the most effective use of the survey data.

PUBLICATIONS

Regular publication series from a major, continuing survey such as SIPP serve many important purposes. Such publications, containing basic descriptive statistics plus key analytic measures (e.g., spell lengths for program participation), are a valuable reference source for the general user, and their value increases as each successive report adds to a time series.

The annual P-60 series on income and poverty from the March Current Population Survey (CPS) is a notable example: each fall's publication is eagerly awaited and immediately used by a broad community of policy analysts, researchers, and executive branch and congressional staff. Such publications also serve to orient an analyst who is using or plans to use the more detailed information contained in computer data products: they introduce the analyst to the survey, help the analyst develop fruitful study plans (e.g., the numbers may suggest hypotheses or indicate that the sample size is or is not sufficient for analysis of subgroups), and provide important control totals for the analyst to determine the accuracy of his or her computer output. The last function is particularly important for a complex survey like SIPP.

Preparation of regular publications is also vitally important to the agency that sponsors the survey. It is only by having analysts who work with the data regularly develop tabulations and analytic measures that the agency can gain first-hand, in-depth knowledge of the quality and utility of the information. The agency, of course, needs input from outside users regarding data quality and utility, but it needs its own assessment as well to plan needed improvements in the survey and to provide informed guidance to users.

For most of its household surveys, the Census Bureau is the data collection agency but not the sponsor agency and so is not directly involved with the publication program. However, for SIPP, the Census Bureau is both the sponsor and the collection agency and, consequently, has publication responsibility. It is especially important that the Census Bureau have a comprehensive publication program for SIPP because of the richness and complexity of SIPP data.
Users need to be made keenly aware, through regular publications that present and explain key indicators, of both the analytical power of SIPP-based measures and the problems that may result from incomplete understanding of such measures. To date, the publication program for SIPP, while including many useful reports, has not adequately served these needs.

A Checkered History

The Census Bureau's publication program for SIPP has been very uneven, including a stretch of several years in which almost nothing was published from the core information on income and program participation (see Table 6-1 for a chronological list of SIPP report titles published through 1991). Initially, the Census Bureau, which established a new Household Economic Studies series (P-70) for SIPP, fully intended to publish a regular set of cross-sectional statistics from the core. The first SIPP report, released in September 1984, provided average monthly data on income and program participation for the third quarter of 1983 (July-September) from wave 1 of the 1984 panel. From February 1985 to January 1986, the Census Bureau published five more quarterly reports (for the fourth quarter of 1983 through the fourth quarter of 1984) in the same format, but then discontinued the series (see Figure 6-1 for the contents of the quarterly reports).

TABLE 6-1 SIPP Reports Published in P-70 Series Through 1991 by U.S. Bureau of the Census (in chronological order)

Publication Date (and Number) | Report Title | Source of Data (Wave and Panel)
Sept. 1984 (P-70-1) | Economic Characteristics of Households in the United States: Third Quarter 1983 | Wave 1, 1984
Feb. 1985 (P-70-2) | Economic Characteristics of Households in the United States: Fourth Quarter 1983 | Waves 1-2, 1984
April 1985 (P-70-3) | Economic Characteristics of Households in the United States: First Quarter 1984 | Waves 2-3, 1984
May 1985 (P-70-4) | Economic Characteristics of Households in the United States: Second Quarter 1984 | Waves 3-4, 1984
Oct. 1985 (P-70-5) | Economic Characteristics of Households in the United States: Third Quarter 1984 | Waves 3-5, 1984
Jan. 1986 (P-70-6) | Economic Characteristics of Households in the United States: Fourth Quarter 1984 | Waves 4-5, 1984
July 1986 (P-70-7) | *Household Wealth and Asset Ownership: 1984 | Wave 4, 1984
Dec. 1986 (P-70-8) | *Disability, Functional Limitations and Health Insurance Coverage: 1984-85 | Wave 3, 1984 (disability); Waves 2-9, 1984 (health insurance)
May 1987 (P-70-9) | *Who's Minding the Kids? Child Care Arrangements: Winter 1984-85 | Wave 5, 1984
Aug. 1987 (P-70-10) | *Male-Female Differences in Work Experience, Occupation, and Earnings: 1984 | Wave 3, 1984
Sept. 1987 (P-70-11) | *What's It Worth? Educational Background and Economic Status: Spring 1984 | Wave 3, 1984
Sept. 1987 (P-70-12) | *Pensions: Workers Coverage and Retirement Income: 1984 | Wave 4, 1984
Oct. 1988 (P-70-13) | *Who's Helping Out? Support Networks Among American Families | Wave 5, 1984
April 1989 (P-70-14) | Characteristics of Persons Receiving Benefits from Major Assistance Programs | 1984 panel file
Aug. 1989 (P-70-15) | Transitions in Income and Poverty Status: 1984-85 | 1984 panel file
July 1989 (P-70-16) | Spells of Job Search and Layoff . . . and Their Outcomes | 1984 panel file
March 1990 (P-70-17) | Health Insurance Coverage: 1986-88 | Waves 1-8, 1985; Waves 1-7, 1986-1987; 1985 panel file
June 1990 (P-70-18) | Transitions in Income and Poverty Status: 1985-86 | 1985 panel file
June 1990 (P-70-19) | *The Need for Personal Assistance with Everyday Activities: Recipients and Caregivers | Wave 6, 1985; Wave 3, 1986
July 1990 (P-70-20) | *Who's Minding the Kids? Child Care Arrangements: Winter 1986-87 | Wave 6, 1985; Wave 3, 1986; Wave 6, 1986; Wave 3, 1987
Oct. 1990 (P-70-21) | *What's It Worth? Educational Background and Economic Status: Spring 1987 | Wave 2, 1987
Dec. 1990 (P-70-22) | *Household Wealth and Asset Ownership: 1988 | Wave 7, 1986; Wave 4, 1987
Jan. 1991 (P-70-23) | Family Disruption and Economic Hardship: The Short-Run Picture for Children | 1984 panel file
Aug. 1991 (P-70-24) | Transitions in Income and Poverty: 1987-88 | 1987 panel file
June 1991 (P-70-25) | *Pensions: Worker Coverage and Retirement Benefits: 1987 | Wave 7, 1985; Wave 4, 1986

*Denotes reports based primarily on data from topical modules.

There were several factors behind the decision to drop the quarterly reports. First, as we discuss in Chapter 5, the Census Bureau clearly underestimated the resources and capabilities required to process the volume of SIPP data that poured in from the field. Very quickly, the data processing system began to buckle under the strain, with consequent delays in both data files and tabulations for publication.
For a time, each successive panel took longer and longer to process, and, indeed, the Census Bureau did not publish any reports, using either core or topical module information, from other than the 1984 panel until 1990.

FIGURE 6-1 Contents of SIPP Quarterly Reports, Series P-70, Nos. 1-6.

For all persons
· monthly household cash income (mean, median, and distribution from under $300 to $4,000 and over) by sex crossed by race and ethnicity, metropolitan residence, region, household relationship, age, labor force status, and work disability status;
· residence in household receiving cash benefits, food stamps, and other noncash benefits for persons by characteristics (as listed above); and
· mean monthly household cash income, receipt of unemployment compensation, and household receipt of cash benefits, food stamps, and other noncash benefits by age and sex crossed by labor force status.

For persons aged 16 and over
· monthly earnings (mean, median, and distribution) by sex and full-time versus other work status crossed by race and ethnicity, age, household relationship, and current occupation.

For households
· mean monthly household cash income, and household receipt of unemployment compensation, cash benefits, food stamps, and other noncash benefits by labor force status of household crossed by household type;
· monthly household cash income (mean, median, and distribution from under $300 to $4,000 and over) by race and ethnicity of householder, metropolitan residence, region, household type, age of householder, and work disability status of householder;
· mean monthly household cash income, and household receipt of unemployment compensation, cash benefits, food stamps, and other noncash benefits by characteristics (as listed immediately above);
· household receipt of food stamps, WIC, free or reduced-price school meals, public or subsidized housing, Medicaid, Medicare, AFDC or other cash assistance, SSI, social security, veterans' benefits, and unemployment compensation by household type crossed by household receipt of food stamps, etc. (same categories as in table heading also for persons); and
· monthly household cash income (mean, median, distribution) by household receipt of earnings, property income, social security, private pensions, federal government retirement, U.S. military retirement, state or local government retirement, veterans' payments, private support payments, AFDC or other cash assistance, SSI, unemployment compensation, other income, food stamps, WIC, free or reduced-price school meals, public or subsidized housing, energy assistance, Medicaid, Medicare.

Second, Census Bureau analysts documented disturbing anomalies in the quarterly data that were hard to explain. A comparison of aggregate SIPP figures for selected income types with independent sources formed a regular (and highly useful) feature of the reports. However, some income types showed erratic patterns in comparison with the independent sources: for example, average monthly unemployment insurance benefits from SIPP were more than 100 percent of the independent source for the third and fourth quarters of 1983, but dropped to 85 and 80 percent, respectively, in the first and third quarters of 1984.

Third, for most income sources, Census Bureau analysts believed that the reports showed little change from quarter to quarter and hence would not be interesting to users. The analysts also determined that sample sizes, particularly after reductions due to budget cuts, were insufficient in many cases to ascertain quarter-to-quarter changes that were significant. For all these reasons, the Census Bureau dropped the quarterly series (although the tabulations continued to be produced in-house).

From February 1986 through April 1989, the only SIPP publications were cross-sectional reports based primarily on topical modules from the 1984 panel. (The modules are generally easier to analyze than the core: for one thing, they are specific to an interview wave.) The seven reports published during this period provided interesting and often path-breaking statistics on the following topics: child care, disability and health insurance coverage, household wealth and asset ownership, educational background and economic status, pensions, sex differences in work experience and occupation and earnings, and support networks.

From April 1989 through December 1991, the Census Bureau stepped up the pace and increased the scope of SIPP publications, releasing 12 reports in the P-70 series. Four of these reports (on child care, educational background and economic status, household wealth and asset ownership, and pensions) updated previous publications, based on the 1984 panel, with data from comparable topical modules in later SIPP panels. A fifth report analyzed the topical module data on caregiving from the 1985 and 1986 panels. The other seven reports used the core data contained in longitudinal files created from all waves of a SIPP panel.
Reports from the 1984 panel file included characteristics of persons receiving benefits from major assistance programs, transitions in income and poverty status for 1984-1985, spells of job search and layoff, and the effects of family disruption and economic hardship on children. Reports from the 1985 panel file included transitions in health insurance coverage (this report also used the core data from several SIPP panels to provide quarterly estimates of health insurance coverage for 1986-1988) and transitions in income and poverty status for 1985-1986 (a modified version of the 1984-1985 report). Finally, a third report in the series, on transitions in income and poverty status (for 1987-1988), used data from the 1987 panel file.

1 In addition, tabulations on home ownership from the assets and liabilities topical module in wave 4 of the 1987 panel were published in the Current Housing Reports series (Fronczek and Savage, 1991); rates of migration calculated from the 1984 panel longitudinal file were published in the Special Studies series (DeAre, 1990); and tabulations on maternity leave arrangements during the years 1961-1985 from the fertility history topical module in wave 8 of the 1984 panel and wave 4 of the 1985 panel were published in the Special Studies series (O'Connell, 1990).

Descriptive Reports

Reports on Income and Programs

The Census Bureau's publication plans for SIPP (see Bureau of the Census, 1991a) include a phased-in development of regular reports from the core data on income and program participation. For the first time since the quarterly reports were discontinued, cross-sectional as well as longitudinal measures will be published on these topics. Both cross-sectional and longitudinal data from the 1987 panel will be included in a report on major assistance programs that is scheduled for release in 1992. Updated cross-sectional statistics will be published in 1993 on income, poverty status, and programs, followed by publication in 1994 of updated longitudinal statistics on transitions in income, poverty, and program participation from the 1990 panel.2 Thereafter, cross-sectional and longitudinal reports will alternate yearly.

In addition, the Census Bureau plans to prepare a major report that compares annual income and poverty data from the 1990 SIPP panel with data from the March 1991 CPS. There are also plans to incorporate some SIPP-based tabulations into the P-60 report series from the CPS. We applaud the initiatives by the Census Bureau's Housing and Household Economic Statistics Division (HHES) to develop a regular, comprehensive program of publications from the core SIPP data.

One change that we urge in the overall publication plan relates to the role of SIPP vis-a-vis the March CPS. As we recommend in Chapter 3, the long-range goal should be for SIPP to become the centerpiece of the nation's income statistics. Hence, we urge HHES to reconsider its stated intention that the March CPS remain the primary source for annual income and poverty estimates and to work instead towards a more prominent role for SIPP.
Specifically, HHES should consider a publication schedule for SIPP, once the new design is phased in, of releasing cross-sectional statistics every year instead of in alternate years (with longitudinal statistics being released every 2 years). Ultimately, HHES should look to scale back the extent of detail in the P-60 series from the March CPS as users become accustomed to the SIPP data and a sufficient time series is built up from SIPP to support trend analysis. In the interim, we are very supportive of HHES's plans to assess the comparability of SIPP and March CPS estimates and to develop SIPP-based tabulations for inclusion in the P-60 series as an immediate way to make more use of SIPP and alert users to the additional detail that it provides.

2 Updated longitudinal information is not scheduled for publication earlier than 1994: the 1988 and 1989 panels were truncated to six and three waves, respectively, and hence do not provide sufficient periods of observation, and the complete longitudinal file from the 1990 panel will not be available until late 1993.

Finally, we believe it would be useful for the Census Bureau to release an historical report containing the tabulations of average monthly income by quarter that have been produced from SIPP on a regular basis, although not published since the fourth quarter of 1984. Such a report would enable analysts to become more familiar with the SIPP core information. This report, and others from the SIPP core, should include appendix material of the type that was included in the original quarterly reports on the quality of the data (e.g., information on item nonresponse rates and comparison of SIPP aggregates with independent sources).

Research Reports

In addition to regular publications that provide tabulations and other statistics from SIPP, we urge the Census Bureau to issue a research report series of special analytical studies on topics related to income and program participation. Special studies could cover both substantive and methodological subjects (such as an analysis of trends in income and poverty status for particular subgroups of the population and an investigation of new methods of estimating duration of spells of program participation) and would go well beyond the level of analysis provided in the descriptive reports. Such studies would of course draw heavily on SIPP but should also include relevant data, as appropriate, from such sources as the March CPS income supplement, other surveys, and administrative records.

The importance of having a strong analysis program in the core subjects of SIPP at the Census Bureau stems from the agency's role as the sponsor agency for both SIPP and the March CPS income supplement and the fact that there is no other center for income statistics. The Census Bureau's program should be at least as strong as the analysis program for labor force topics in the Bureau of Labor Statistics (BLS).
Indeed, BLS's Monthly Labor Review offers a possible model for the income and program research report series (although the Census Bureau's series could be quarterly or semiannual).

The Census Bureau needs to have a strong analytical capability, not only to serve as a beacon and source of information for the user community, but also for its own purposes (as we note above). Such a capability should put the Bureau in a better position to understand user data requirements, to assess and improve the survey data, and periodically to evaluate and improve the basic descriptive report series. We understand that Statistics Canada gives strong support to in-house analysis programs, publishing special studies in Perspectives on Labour Force and Income. Indeed, the Canadian statistical agency made a deliberate effort in the 1980s to improve its analytical capability.

We point out that the SIPP Working Paper series includes special studies by Census Bureau staff and outside analysts of the type that we have in mind (see the section below on user information and training). This is an important and useful series to continue, but we believe that a regularly published research report series is also needed to provide a more visible outlet and a strong motivation for in-depth substantive analysis as well as methodological investigations on the part of the Census Bureau's income and program staff. (Most of the Census Bureau contributions to the SIPP Working Paper series to date focus on survey research and methodological issues rather than analysis methods or substantive research findings.) In addition, the Census Bureau should encourage the analysis staff to submit articles for publication in professional journals.

Reports on Demographic and Employment Transitions

SIPP is a rich source of information on a wide range of topics other than income and program participation. As we have noted, the Census Bureau has prepared a number of interesting and valuable publications from various SIPP topical modules. We wholeheartedly support continuation of such series; see Figure 6-2 for the schedule of topical module (and core) reports planned for 1992.

We also urge the Census Bureau analysis staff from both the Population and HHES Divisions to give attention to a somewhat neglected aspect of SIPP: its potential for analyzing the dynamics of family and household composition over time and the correlates and consequences of key demographic and employment events (such as marriage, job loss, or retirement). Some reports have been published or planned in this area (see Table 6-1 and Figure 6-2), and, in addition, the HHES publications on income and program participation include some statistics on related demographic and employment transitions. However, we believe that much more should be done.
We envision a series of publications that would focus on demographic and employment events: for example, comparing the economic situation before and after marriage or divorce or widowhood for all people experiencing each type of marital status change.3 The series would include the following:

· summary annual reports on a wide range of demographic and employment transitions, including marital status change, family composition change involving children, change in residence, labor force status change, and job change; these reports would provide counts and basic characteristics, such as the age and sex of those people experiencing a particular type of change; and
· a special report each year that would provide an in-depth analysis of one or two particular types of events and their antecedents and consequences; a rotating schedule could be established, with reports on employment topics alternating with reports on demographic topics.

Recent papers from the PSID by Burkhauser and Duncan (1988) and from the 1984 SIPP panel by David and Flory (1988) and Ruggles and Williams (1987) provide examples of the types of analysis that the detailed special reports could include.

3 The report series on income and poverty transitions will look at the related but different issue of how many people entering a program or falling into poverty also experienced a change in marital status.

FIGURE 6-2 SIPP Reports in P-70 Series Published in 1992 (in alphabetical order)

Characteristics of Recipients and the Dynamics of Program Participation: 1987-88
Extended Measures of Well-Being: Selected Data from the 1984 Survey of Income and Program Participation (No. 26)
Health Insurance Coverage: 1987 to 1990 (No. 29)
Job Creation During the Late 1980s: Dynamic Aspects of Employment Growth (No. 27)
What's It Worth? Earnings Data: 1990
Who's Helping Out? Support Networks Among American Families: 1988 (No. 28)
Who's Minding the Kids? Child Care Arrangements: Fall 1988

NOTE: The P-70 series number is given for reports that have been released as of summer 1992; the other reports are scheduled for publication by December. In addition to the reports shown above, a report based on SIPP data is scheduled for release in the P-23 series by December 1992: When Households Continue, Discontinue, and Form.

Recommendations

A strong publication program on income, program participation, and related topics is an essential component of the Census Bureau's responsibilities for SIPP. The program should include several types of descriptive and analytical report series that provide basic information and more in-depth analysis from the survey.
Recommendation 6-1: The Census Bureau should move forward with its plans for regular, comprehensive series of descriptive reports on income, programs, and related topics from the core data in SIPP. Longitudinal statistics (e.g., on the dynamics and correlates of transitions in income, poverty, and program status) should be published; cross-sectional statistics should also be issued on a frequent schedule.

The Census Bureau should also establish a research report series to include in-depth analytical and methodological studies of special topics related to income and program participation. Data sources for these studies could include, in addition to SIPP, the March CPS income supplement and other surveys and administrative records.

The Census Bureau should continue publications from the SIPP topical modules and also establish a regular series of summary and in-depth reports from SIPP on the dynamics and correlates of major demographic and employment transitions (e.g., marriage, retirement).

The program outlined above is both important and ambitious. Our major concern is that the Census Bureau may underestimate the level of resources and capabilities required to carry it out. There are many complex technical issues involved in developing appropriate and policy-relevant statistics from SIPP, particularly those based on the longitudinal data (see discussion in the next section). Development of useful series from SIPP also requires extensive analysis to understand the quality of the data and their comparability with other widely-used data sets, such as the March CPS. In addition, major work will be needed to develop the capacity to estimate from SIPP such statistics as after-tax income and appropriate values for in-kind benefits. Hence, staff and resources need to be sufficient not only to produce publications, but also to support an ongoing program of research and development to identify and implement improved methods of analyzing and presenting statistics from SIPP.
In this regard, it is critically important for the Census Bureau to invest in the skills and knowledge of the SIPP analysis staff so that they are up to date with relevant policy issues and analytical methods. There are many avenues to accomplish this goal. The Census Bureau already has experience with several approaches: organizing in-house seminar programs and sessions at professional meetings for both staff and outside analysts to present findings and discuss analysis issues; commissioning experts, through joint statistical agreements or other means, to conduct research on specific analytical issues (e.g., longitudinal weights); and making use of the American Statistical Association (ASA)/Census fellowship program to bring researchers on-site to work with SIPP data and share their experiences. We urge the Bureau to sustain these efforts for SIPP and to focus them more directly on the needs of the analysis staff. In addition, the Bureau should provide support for the analysis staff to enroll in courses and other continuing education programs.4 We also recommend, as part of an improved oversight program for SIPP (see Chapter 8), that the Bureau establish a working group of expert analysts to periodically review SIPP statistics on income and programs and provide feedback and advice to the staff on conceptual and measurement issues.

Recommendation 6-2: The Census Bureau should ensure that its analysis staff, in addition to preparing the regular publications from SIPP, are able to undertake an ongoing program of research and development into effective means of analyzing and presenting SIPP statistics and are able to stay well versed in relevant policy issues and analytical techniques.

MEASUREMENT ISSUES FOR CORE STATISTICS

Development of appropriate and useful statistics from the rich, complex data sets provided by SIPP gives rise to difficult conceptual and measurement issues. This is the case whether one is a Census Bureau analyst seeking to develop statistics for a published report series or an outside researcher pursuing a particular analytic interest. However, the Census Bureau faces perhaps the more challenging task, in that the statistics it publishes must at the same time be methodologically sound, relevant to policy concerns, and able to be interpreted to users with varied levels of technical expertise.
Members of the panel wrestled with these problems in working on illustrative tabulations for SIPP reports on income, poverty, and program participation (for details see Citro, 1990, 1991b).5 Here we provide an overview of some of the more important conceptual and measurement issues that will have to be considered in developing core statistics from SIPP, particularly those that involve use of the SIPP longitudinal data: relating statistics to policy concerns; specifying analysis units (e.g., persons, families); characterizing change in contextual variables (e.g., marital or employment status changes); constructing equivalence scales (i.e., household and family income measures that are adjusted for family characteristics); measuring duration of spells of poverty and program participation; and treatment of missing data due to a respondent's missing an interview or entering or leaving the sample.

Because income and program statistics from SIPP will be issued side by side with statistics from the March CPS income supplement for a period of time, we also discuss ways in which many of these measurement issues are handled in the March CPS publications and note instances in which statistics from SIPP will represent an improvement.

4 The Center for Survey Methods, which the National Science Foundation expects to fund in the near future at a university site in the Washington, D.C., area, may be willing to offer relevant courses.

5 The panel's suggested tables are very preliminary. The Census Bureau will need to review them carefully, in particular, run them on SIPP data files to determine their feasibility: for example, sample sizes may be too small to support some of the suggested cross-classifications.

Policy Relevance

Census Bureau publications are mainly intended to provide statistics of interest to a broad user community, rather than to address specific policy (or research) questions. Nonetheless, it is important that SIPP statistics on income and program participation take into account important policy concerns and perspectives. Indeed, the detail in SIPP makes it possible to provide a rich set of policy-relevant statistics, for which we have several concrete suggestions.

SIPP obtains detailed information on sources of income, which we believe should be exploited in tabulations that address policy concerns. Although all of the detail cannot be shown because of publication and sample size constraints, some important classifications can and should be made. Specifically, we suggest that SIPP income reports, in addition to tables of total income, routinely include separate tabulations for people with income from the following sources: earnings (wages and salaries and self-employment income); asset income from financial and property holdings;6 social insurance programs (social security, unemployment insurance, workers' compensation, and veterans' compensation); pensions (from public and private employers) and private disability insurance; and public assistance. In the March CPS reports, only earnings are currently distinguished as a separate income type, and reports to date from SIPP on income transitions make no distinctions by type of income.
Yet the major income types are important to identify separately so that policy analysts and other users can get a sense of what is happening when they see changes in median income, poverty rates, and other overall trend indicators. It is important to know, for example, whether all sources of income are moving in the same direction or whether an income rise (or decline) is more or less associated with a change in labor markets, returns to assets, or public programs, such as means-tested transfers.

Cross-classifying people who have income of a particular type by the presence or absence of the other major income sources and the contribution that each income type makes to their total income can also provide important information. For example, trends in the proportion of public assistance recipients who receive earnings can give a sense of the effects of public policy changes and perhaps also of general economic conditions.

6 We note that the asset data that are currently collected in SIPP are of questionable quality (see Chapter 3). It may not be a good idea to show asset income as a separate category in SIPP publications until the quality is improved.

SIPP also obtains detailed information on assistance programs, including the timing of benefit receipt, so that concurrent versus sequential multiple program participation can be distinguished. We recommend that many more categories of programs be distinguished in published statistics than was done in the first SIPP report on participation in major assistance programs (P-70, No. 14), which showed only three categories: major assistance, defined as Aid to Families with Dependent Children (AFDC), general assistance, Supplemental Security Income (SSI), food stamps, Medicaid, and housing assistance; cash assistance, defined as AFDC, general assistance, and SSI; and food stamps. In addition, social insurance programs, as well as means-tested public assistance programs, should be shown. In determining how to group assistance programs for tabulations, it is important when possible to recognize differences in target populations. For example, among means-tested programs, SSI and AFDC have very different targets: poor elderly and disabled people in the former and poor families with dependent children in the latter. Similarly, unemployment compensation, which is targeted at people temporarily without work, is very different from such insurance programs as social security, which support retired people and those with long-term disabilities.
SIPP also includes information to determine eligibility for assistance programs.7 Program participation rates that use the eligible population as the denominator are much more useful than rates that are based on the general population or even on demographic subgroups because assistance programs typically limit benefits to people with certain combinations of financial and other characteristics. It is important for policy analysts to know whether increased numbers of program participants represent higher take-up rates on the part of the eligible population or increased numbers of eligible people or both. We are pleased to note that the Census Bureau has begun work on a model of program eligibility and will include participation rates for eligible populations in its report series on participation in major assistance programs as soon as the model is fully developed.

7 The eligibility information is most detailed in the 1987 and later panels that include a special topical module. There is room for yet further improvement in the eligibility information from SIPP (in particular, more frequent measures), as we note in Chapter 3.

Finally, the Census Bureau's decision to give high priority to developing measures of after-tax income and noncash benefits from SIPP makes eminent sense from a policy standpoint, as well as from the standpoint of the proper conceptual approach to the measurement of income and other economic resources. There has been growing use of the tax code as a social welfare policy instrument, as well as growing reliance by policy makers on noncash programs of income assistance. We have already indicated our wholehearted support for an active program of research and development on these topics for SIPP (see Chapter 3). Here we note our support for a published series of alternative measures of income from SIPP, like the series published from the March CPS, that take account of taxes and noncash benefits.

Units of Analysis

The Census Bureau's P-60 reports, based on the March CPS, provide cross-sectional statistics for different units of analysis, including annual income for household and family units, personal income for people aged 15 and older, and poverty status for families and people (based on family income and poverty thresholds). The SIPP-based P-70 reports on income and poverty transitions (Nos. 15, 18, 24) have used people as the sole unit of analysis, with total family income attributed to each member in order to take account of resource sharing within the family.8 The SIPP report on participation in major assistance programs (P-70, No. 14) also used people as the unit of analysis, with participation ascribed to the spouse or children of a primary recipient.

8 It would also be useful to include tabulations for selected types of own income of persons in SIPP reports. We do not consider such statistics here because they do not present special measurement problems, provided that the income sources tabulated are appropriate to consider on a personal basis. For example, statistics on own earnings for adults are useful for analysis of labor market success for different kinds of individuals. However, it would be misleading to provide statistics on program participation (e.g., for AFDC or food stamps) that ascribed benefits just to primary beneficiaries and did not take into account that the primary recipient's benefit commonly covers other household members.

Statistics for household and family units are useful for a number of purposes (e.g., for government and business planning, which often requires information on households for targeting purposes). However, for policy analysis and research on such topics as income inequality and the effects of government policies on poverty, the use of household or family units can be misleading because smaller households or families are counted as equal to larger units.9 For these purposes, it is more appropriate to present household and family income statistics in terms of individual people (attributing the household or family total to each member).

9 Ruggles (1990a:123) provides a dramatic example of the effect of using families versus people as the unit of analysis. The annual poverty rate for families headed by an elderly person is higher than that for other families, while the poverty rate for elderly persons is lower than that for other persons. The reason is that more elderly people who are poor live in small family units compared with nonpoor elderly people, while the reverse is true for the nonelderly. Hence, the elderly (nonelderly) poor are a higher (lower) proportion of family units in poverty than of people in poverty. Clearly, the family-based measure distorts the picture of the types of individuals who are more likely to be poor. In a longitudinal context, Doyle and Long (1988) found differences in patterns of multiple program participation in comparing measures based on a longitudinal program unit definition with attribute-based person measures.

A problem with the CPS annual household and family income measures, whether they are developed on a household or family unit basis or attributed to individuals, concerns the mismatch in measuring household and family composition and measuring income. Composition is measured as of the March following the income reference year, and no information is obtained about intrayear changes in composition. For example, two people found to be married as of March will be classified as a married couple for the entire income reference year and assigned the combined income of both spouses for that year. However, this treatment is misleading if, in fact, the couple's marriage took place after the start of the income year.10 SIPP was explicitly designed to overcome this problem by following people over time and collecting monthly information on family composition and income. The question is how to use this rich detail from SIPP for cross-sectional and longitudinal statistics.

For cross-sectional statistics, one can use the monthly data from SIPP directly to construct annual (or quarterly) average monthly income and program participation measures by treating each month as a separate cross-section that is weighted to represent the total population and then taking an average (as was done in the quarterly reports from the 1984 SIPP panel).11
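The average monthly calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Census Bureau's procedure: the record layout (month, weight, participation flag) and the function name `average_monthly_rate` are hypothetical, and each month's weights are assumed to represent the total population in that month.

```python
from collections import defaultdict


def average_monthly_rate(records):
    """Average of monthly weighted participation rates.

    `records` is an iterable of (month, weight, participates) tuples.
    The layout is a hypothetical one chosen for this sketch; it is not
    the format of any SIPP file.
    """
    weighted_total = defaultdict(float)
    weighted_participants = defaultdict(float)
    for month, weight, participates in records:
        weighted_total[month] += weight
        if participates:
            weighted_participants[month] += weight

    # Treat each month as its own weighted cross-section ...
    monthly_rates = [
        weighted_participants[month] / weighted_total[month]
        for month in sorted(weighted_total)
    ]
    # ... and then take a simple average across the months.
    return sum(monthly_rates) / len(monthly_rates)


example = [
    (1, 100.0, True), (1, 300.0, False),   # month 1: rate 0.25
    (2, 200.0, True), (2, 200.0, False),   # month 2: rate 0.50
]
print(average_monthly_rate(example))  # 0.375
```

A person who is in the sample for only some months simply contributes records for those months, which is why this kind of measure can include people who leave or enter during the reference period.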
This approach makes it possible to construct cross-sectional household and family income and program statistics for people (and also for households and families, if desired) that reflect the appropriate classification of household and family type and the appropriate assignment of incomes.12 Users will need to become accustomed to different income levels as reference points: for example, the median average monthly income in the third quarter of 1984 for households from SIPP was $1,768 (Bureau of the Census, 1985b:Table 9), compared with a median annual household income in 1984 from the March CPS of $22,415 (Bureau of the Census, 1989~:Table 2).

10 Citro, Hernandez, and Moorman (1986), using data for the first 12 months of the 1984 SIPP panel, estimated that a household definition fixed in month 12 would misrepresent 9 percent of households as having had the same family type all year. This estimate is biased downward because the wave 1 SIPP interview does not completely measure family composition changes during the first 4 months. The current CPS definition, which fixes household composition as of month 15 (in the SIPP context), would most likely misrepresent a considerably higher proportion of households as having had the same family type for the entire year (and an even higher proportion as having had no change in either type or size during the year). There have been no definitive studies of the effects of a fixed household definition on income measures, but the limited available evidence suggests that annual poverty rates in the CPS are distorted to some extent (see Czajka and Citro, 1982; Scardamalia, 1978).

11 Note that it would be important to ascertain whether the anomalies found in the quarterly reports from the 1984 panel (see above discussion) persisted and, if so, to determine their implications for the detail that is appropriate to publish in tabulations.

12 The fact that intramonth changes in composition are ignored is a trivial matter for most purposes. Another advantage of average monthly measures from SIPP is that people who leave the universe during the year or other reference period (for example, people who die, move abroad, or enter an institution) are included for the portion of the reference period in which they were in the sample. Such people are not included in the CPS annual income measures because they cannot be in the sample in the following March, and no attempt is made to find out about them. Finally, in the SIPP context, average monthly statistics maximize the available sample size by including people who provided data for some but not all months. Also, sample size can readily be increased by combining panels.

We note that average monthly statistics may not be appropriate for all types of economic measures. Specifically, annual poverty rates that are constructed on an average monthly basis do not likely, in our view, constitute a useful set of statistics. A month appears to be too short a period in which to establish economic hardship, given measurement error and the fact that many people experience fluctuations in income that may put them below the poverty line for a short time but not over a longer period.13

13 In arguing against the use of monthly poverty rates, we do not mean to imply that poverty rates based on annual income are the only appropriate measure. Indeed, SIPP provides an important capability for measuring intrayear spells of low income; see discussion of spell definition issues below.

For longitudinal statistics, the use of the detailed income and family composition data in SIPP to develop longitudinal income measures for units that are observed over a period longer than a month (e.g., measures of change in income level and poverty or program participation status from one year to the next) is more problematic. To develop attribute-based income measures for people is conceptually straightforward, although computationally tedious because of the need to aggregate over months and household or family members. (For example, a measure of year-to-year change in poverty status would be constructed as follows: aggregate across people in each family each month to determine monthly family incomes and poverty thresholds for each individual; add the income and poverty threshold values for each person across months in year t, and divide through to obtain the person's poverty ratio for that year; similarly, for year t + 1.) However, some problems do arise. One problem is how best to categorize people by other contextual variables (e.g., family type, marital status, region of residence, employment status) that will often have changed over the period of measurement. Another problem is how to treat people with missing data for part of the measurement period, either because they missed an interview or entered or left the sample. (Both of these issues are discussed below.)

To develop longitudinal statistics for households or families as such (or another aggregation, such as tax filing or program assistance units, which can be subsets of households) poses a more basic conceptual problem. The difficulty is how to define these units longitudinally, given observed compositional changes. As one example, it may be easy to decide that a married couple that has a baby should be treated as the same family before and after the birth. A more difficult question is how to treat the couple if they later split up. Is the parent who retains custody of the child the continuation of the original family and the other parent a new family, or does the original family end at the time of the split and two new families begin? Further complicating matters is that any longitudinal household or family definition will produce units that existed for only part of the year (or other period of analysis), and a decision must be reached on whether to count part-period units the same as units that experienced no change or to apply some type of time weight to them.

Research on the effects of the choice of longitudinal household or family unit definition is mixed. Citro, Hernandez, and Moorman (1986) found from analysis of the 1984 SIPP panel that the specifics of a longitudinal household definition had relatively little effect on annual poverty rates, but only if part-year units were time weighted (e.g., a unit that existed for only 7 months would be given a weight of 7/12). Not using time weights increased the poverty rate because part-year units had considerably higher poverty rates than full-year units under all definitions. More important, the amount of the increase varied because the number of part-year units and their poverty rate varied across definitions.14

It is not a desirable property of a measurement concept that minor variations in specification produce important differences in results. Moreover, as noted above, the use of household or family units to present statistics on income and poverty will often produce different results from using people as the unit of analysis and attributing household or family income to them.
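To make the time-weighting point concrete, the following Python sketch compares a poverty rate for longitudinal units computed with and without time weights. It is a toy illustration, assuming a 12-month reference year and ignoring survey weights; the function name `unit_poverty_rate` and the example units are hypothetical and are not drawn from the studies cited above.

```python
def unit_poverty_rate(units, time_weighted=True):
    """Poverty rate over longitudinal units.

    `units` is a list of (months_in_existence, is_poor) pairs for a
    12-month reference year. With time weighting, a unit that existed
    for 7 months counts as 7/12 of a unit; without it, every unit
    counts as 1. Survey weights are omitted to keep the sketch short.
    """
    total = 0.0
    poor = 0.0
    for months, is_poor in units:
        weight = months / 12 if time_weighted else 1.0
        total += weight
        if is_poor:
            poor += weight
    return poor / total


# Hypothetical units: three full-year units (one poor) and one poor
# unit that existed for only 7 months.
units = [(12, False), (12, False), (12, True), (7, True)]

unweighted = unit_poverty_rate(units, time_weighted=False)  # 2 of 4 units
weighted = unit_poverty_rate(units, time_weighted=True)     # 19/43
print(unweighted, weighted)
```

In this made-up example the part-year unit is poor, so dropping the time weights raises the rate, which is the direction of the effect reported for the 1984 SIPP panel.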
Finally, because of the conceptual and computational problems of making a longitudinal household or family definition operational, it is tempting for users of longitudinal surveys to take the path of least resistance: namely, to study only those units that experienced little or no change in composition over time (see Duncan and Hill [1985] for a discussion of this phenomenon in the PSID context). Yet this approach discards the most interesting cases in the sample and the ones that are likely to differ appreciably from the rest.

14 Findings from an analysis of the 1979 Income Survey Development Program (ISDP) research panel (Citro and Watts, 1986) were similar, except that part-year households had lower poverty rates than full-year households in that study.

In view of these kinds of problems, a number of analysts have concluded that there is no defensible way to define households (or other units) on a longitudinal basis (Duncan and Hill, 1985; Ruggles, 1990b). Instead, they recommend the use of attribute-based person measures. We agree with this conclusion and recommend that the Census Bureau continue the practice of developing longitudinal income, poverty, and program statistics for SIPP reports that are person-based, with attribution of household, family, and program unit characteristics to individuals. In the case of annual statistics from the March CPS and SIPP that are designed for comparison purposes, the tables from both sources should use attribute-based person measures of annual household and family income and poverty status.

One last point regarding the unit of analysis concerns attribution of program participation when the assistance unit is a subset of the household or family. Doyle and Long (1988) included two types of attribute measures in their analysis, one that attributed program participation to members of the assistance unit per se (e.g., a mother and dependent children for AFDC) and the other that attributed program participation to all household members if any member of the household participated (e.g., attributing AFDC participation to a grandmother or aunt). They found some differences between the two attribute-based measures, as well as between each such measure and the longitudinal program unit measure.

For statistics on the characteristics of program beneficiaries, it seems most useful to use the first type of attribute measure, that based on the program assistance unit. (As we note in Chapter 3, further work is needed to improve measures of the assistance unit, which are often problematic in surveys, even in SIPP with its emphasis on program participation.) But for general purpose statistics on income and poverty that are designed to answer such questions as how many people live in families that benefit from programs and the contribution of program benefits to total family income, it seems most useful to attribute program receipt to all members of a family (or household).
This approach assumes that resource sharing extends to members of the family or household other than the assistance unit, an assumption that seems reasonable but one that would obviously benefit from research.

Contextual Variables

The use of SIPP monthly data makes it possible to develop longitudinal measures of income, poverty, and program participation that properly reflect each person's economic status over the measurement period. However, during that period (whether it be a fixed calendar unit such as 1 or 2 years, the length of a SIPP panel, or the length of a spell of poverty or program participation), many people will have experienced other changes that represent important contextual variables. The problem is how to characterize these people (for example, people who changed jobs one or more times or got married or divorced) in a way that is accurate yet retains clarity and ease of understanding.15

15 The problem of how to treat people with missing data for part of the measurement period is discussed below.

The simplest approach is to use a definition for such variables as marital status, employment status, and family type that is fixed at a point in time (as must be done in the March CPS, given the lack of monthly data). However, this type of definition is misleading to users who may assume, for example, that all unemployed poor people were continuously unemployed during the income reference period. Another approach, which has been followed in the SIPP income and poverty transitions reports, is to use a definition that reflects the person's status for the largest number of months in the reference period. This type of definition more accurately represents each person's situation during the reference period, but it, too, fails to distinguish between people who did and did not experience a change. Yet that distinction is important given the strong evidence that changes in marital status, employment status, and other characteristics relate to changes in income and program participation (see, e.g., Ruggles and Williams, 1987; Williams and Ruggles, 1987).

However, it can be difficult to develop indicators of change in contextual variables that do not overwhelm users with detail, given the many different patterns that are possible: for example, during a year, some people may experience several employment or marital status changes of different types. We do not believe that there is a general-purpose solution to this problem. Rather, the presentation of the contextual variables in a table or set of tables should be congruent with the presentation and intended uses of the longitudinal measures of income, poverty, or program participation.

For annual income and program statistics from SIPP that are designed for comparison with the March CPS, it is appropriate to use a fixed definition for contextual variables; the end of the income reference year seems best.
However, this type of table should also identify people who experienced one or more changes: for example, a table of marital status as of December could show separately for each category those people who remained in that status all year and those who experienced one or more changes in marital status during the year. We note that annual measures from SIPP will effect some improvements relative to the March CPS by summing monthly household or family incomes for people in the sample at the end of the calendar year. Not only are contextual variables measured at the end of the income reference year and not in March, but all income available to a household or family member during the year (for example, income of decedents) is included.16

16 However, the income of people who lived alone and left the SIPP universe during the year will not be included. In general, such annual tables from SIPP are not ideal in that, like the current March CPS annual tables, they are neither truly cross-sectional nor truly longitudinal. Nonetheless, they will be important to produce until users become fully accustomed to average monthly cross-sectional statistics on one hand and longitudinal statistics on spells and transitions on the other.
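The suggested table layout, status at the end of the reference year split by whether any change occurred during the year, can be sketched as follows. This is a minimal Python illustration under stated assumptions: each person has twelve monthly status values, counts are unweighted, and the names `classify_year` and `tabulate_end_of_year_status` are hypothetical.

```python
from collections import Counter


def classify_year(monthly_status):
    """Return (status in the last month, whether any change occurred).

    `monthly_status` is a list of month-by-month status labels for one
    person, January first and December last (an assumed layout).
    """
    end_status = monthly_status[-1]
    changed = any(
        earlier != later
        for earlier, later in zip(monthly_status, monthly_status[1:])
    )
    return end_status, changed


def tabulate_end_of_year_status(people):
    """Count people by end-of-year status and by change versus no change."""
    table = Counter()
    for statuses in people.values():
        end_status, changed = classify_year(statuses)
        column = "one or more changes" if changed else "same status all year"
        table[(end_status, column)] += 1
    return table


# Hypothetical people, used only to exercise the functions.
people = {
    "A": ["married"] * 12,
    "B": ["married"] * 5 + ["divorced"] * 7,
    "C": ["married"] * 4 + ["separated"] * 4 + ["married"] * 4,
}
for cell, count in sorted(tabulate_end_of_year_status(people).items()):
    print(cell, count)
```

Person C shows why the change column matters: a fixed December definition alone would report this person as married, exactly like person A, even though the status changed twice during the year.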

In the case of tables that look at year-to-year changes in economic status (e.g., whether annual family income rose or fell by 5, 10, or 20 percent or more or stayed about the same across pairs of years), there is a corresponding need to distinguish people who changed employment, marital, or other statuses versus those who did not over the 2-year period. The same approach could be followed for categorizing contextual variables (e.g., marital status) in tables of year-to-year changes in annual income as suggested above for tables of annual income. First, tabulate the people whose marital status did not change over the 24-month period by the appropriate category: never married, married spouse present, married spouse absent, separated, widowed, or divorced. Then tabulate separately the people who experienced one or more changes in marital status, identifying them by their marital status at the end (or beginning) of the 24-month period.17 Tables of year-to-year changes in annual income could also provide additional detail about changes in contextual variables (e.g., perhaps separately tabulating people with no, one, and more than one marital status change). However, such tables should not provide elaborate detail about the nature of the demographic changes given that the income change measures themselves are of a very aggregate nature.

In the case of tables that home in on changes in income and program participation by identifying people who fall into or move out of poverty or enter or leave programs, it is appropriate and useful to provide detailed information about associated changes in contextual variables. The detail is justified given that one can legitimately seek to identify causal relationships from tables of spell duration and other characteristics for people who enter or exit poverty and programs.
Using the marital status example, we suggest the following approach for categorizing contextual variables in tables of poverty or program participation spells, entrances, and exits. First, tabulate people who did not change marital status at the time of entering (or exiting) poverty or a program by the usual categories (never married, etc.). Then tabulate people who did experience a contemporaneous change18 by using the following categories: from never married to married, from married to separated or divorced, from married to widowed, from divorced or separated to remarried, and from widowed to remarried.

17 Sources of family income received over the 24-month period represent a somewhat different type of contextual variable. We suggest the following detail for each major source of income (e.g., earnings, asset income, social insurance income, public assistance; see above): income from this source in first year only, income from this source in second year only, income from this source in both years, no income from this source.

18 The definition of a "contemporaneous" change is open to question. The study by Ruggles and Williams (1987) of entrants and exits for AFDC and food stamps applied a rigorous definition: that a change in, say, marital status had to occur in the same pair of months as a program entrance or exit. However, they pointed out that there may be lags between such events as job loss (gain) and beginning (ending) assistance receipt. They proposed investigating a definition that would recognize changes in contextual variables that occurred a month or two earlier than the associated program change. David and Flory (1988) used a looser wave-to-wave-based definition of change in marital status.

Equivalence Scales

A controversial issue in the assessment of economic well-being, whether measured by income, consumption, or some other concept, concerns the development of appropriate equivalence scales, that is, adjustment factors to permit comparisons across families of different size and composition. There is general agreement that an equivalence scale is needed to avoid the conclusion that everyone, regardless of family type or size, with a comparable level of income has a comparable level of well-being. There is also broad agreement on the basic outlines of an appropriate equivalence scale, namely, that it should recognize that an additional family member increases the income requirements of a family but that the increase is not uniform (because larger families can achieve some economies of scale: for example, a two-bedroom apartment typically rents for less than twice as much as a one-bedroom apartment).

However, there is no agreement on the best form for an equivalence scale. The current official poverty measure includes an equivalence scale that is based on the differing nutritional requirements of elderly people, nonelderly adults, and children. Other needs for a family are assumed to be a simple ratio of 3 times the costs of meeting the minimal nutritional needs of its members. Ruggles (1990a:Ch. 4) offers a critique of the poverty measure's equivalence scale, particularly of the lower poverty thresholds for families headed by an elderly person compared with other families and the irregular patterns of increase in the thresholds by family size.

There is a large literature on equivalence scales (see, e.g., Buhmann et al., 1988; Danziger et al., 1984; Deaton and Muellbauer, 1986; Lazear and Michael, 1980, 1988; van der Gaag and Smolensky, 1982). Methods of developing such scales include relying on expert judgment about differences in needs, using the preferences revealed in consumption data, or using opinion surveys that ask people about the minimum needed "to make ends meet." Ruggles (1990a:Ch. 4) proposes that statistical agencies conduct long-term research on this issue, including research on the desirability of developing adjustment factors for other characteristics besides family size, in particular, place of residence.19 In the short term, Ruggles favors smoothing out the irregularities in the current poverty measure equivalence scale and eliminating the distinction between elderly and nonelderly people (just as distinctions by sex of family head and farm versus nonfarm residence were earlier eliminated from the official measure).

19 Price differences among geographic areas imply that residents of higher cost areas may need more income to maintain a similar level of consumption than residents of lower cost areas.

The issue of an appropriate equivalence scale for measuring poverty has broader significance because the ratio of income to poverty provides a convenient and widely used method for grouping people with equivalent income levels across the entire income distribution. For example, many of the tables in the SIPP reports on transitions in income and poverty status categorize a person's family income as a ratio of the relevant poverty threshold, using the categories of income less than 1 times the poverty level, 1 to 2 times poverty, 3 to 4 times poverty, and 5 or more times poverty. One can then answer such questions as what proportions of people in single-parent families compared with those in married-couple families are in the highest (or lowest) income-to-poverty ratio category.

We support the desirability of research into alternative methods of constructing an appropriate equivalence scale for the official poverty measure, but we focus here on the short-term question of what type of equivalence scale to use in SIPP publications at this juncture and for what purposes.20 SIPP core publications, like those from the March CPS, will include tabulations of the distribution of family income for people unadjusted for family composition. (We suggest that a useful approach is to display the mean and upper limit values for each fifth of the income distribution instead of using fixed income categories.) Such tables should always include family type and size as contextual variables, so that users can make rough-and-ready assessments of equivalence. In addition, we urge that SIPP publications include companion tables that explicitly categorize income as a ratio of the relevant poverty threshold.
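The parenthetical suggestion above, showing the mean and upper limit of each fifth of the income distribution rather than fixed income categories, can be illustrated with a short Python sketch. It is a simplified illustration only: it assumes unweighted data and at least five observations, and the function name `fifths_summary` is hypothetical. A production tabulation would apply survey weights when forming the fifths.

```python
def fifths_summary(incomes):
    """Return a list of (mean, upper_limit) pairs, one per fifth.

    Incomes are sorted and cut into five groups of (nearly) equal size.
    Observations are unweighted in this sketch; at least five values
    are required so that no group is empty.
    """
    ordered = sorted(incomes)
    n = len(ordered)
    if n < 5:
        raise ValueError("need at least five incomes")
    summary = []
    for k in range(5):
        group = ordered[k * n // 5:(k + 1) * n // 5]
        mean = sum(group) / len(group)
        upper_limit = group[-1]  # largest income in this fifth
        summary.append((mean, upper_limit))
    return summary


# Hypothetical monthly family incomes for ten people.
incomes = [900, 400, 2500, 1500, 3200, 700, 1800, 5000, 1200, 2100]
for number, (mean, upper) in enumerate(fifths_summary(incomes), start=1):
    print(f"fifth {number}: mean={mean:.0f}, upper limit={upper}")
```

Because the cut points move with the distribution, a table built this way stays comparable over time without having to redefine dollar categories.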
Both the average monthly cross-sectional series from SIPP and the various longitudinal series (e.g., year-to-year comparisons) should include income-to-poverty ratio tables. Such tables should also be developed for the March CPS to facilitate SIPP-CPS comparisons.

We agree with the use of the current poverty thresholds, despite their flaws, as the basis for grouping people with equivalent income levels because we see no significantly better alternative in the short term. The Census Bureau should examine, and might consider publishing on an experimental basis, income-to-poverty ratio tables that incorporate some modifications to the thresholds, such as those proposed by Ruggles.21 In any case, it will be important to explain clearly to users the purpose of the income-to-poverty ratio categories, namely, to facilitate comparisons across groups and, once a series is established, across time. It is particularly important to be clear on this point for the annual average monthly cross-sectional statistics from SIPP. Otherwise, users may be tempted to use the lowest income-to-poverty ratio category (less than 1 times the poverty level) to construct an average monthly poverty rate, which we note above may not be a particularly useful statistic.

20 We note that a new Committee on National Statistics panel is about to undertake a comprehensive assessment of all aspects of the current poverty measure, including the equivalence scale. We hope that this panel will make recommendations that can feed into SIPP income measures over the longer term.

21 Some Census Bureau analyses of March CPS data have used Ruggles's modified poverty thresholds (see McNeil, 1992; Weinberg and Lamas, 1992).

Analysis of Spells

Over the past decade, there has been growing policy and research interest in such questions as the extent to which program beneficiaries are dependent on assistance over the long term versus those needing help only for short periods.22 Paralleling this interest has been further development of the needed research tools that make it possible to analyze spells and duration of poverty and program participation, namely, large-scale longitudinal data sets, powerful computer hardware and software, and statistical methods for estimating spell duration. The SIPP longitudinal data have already been used by analysts to estimate duration of spells of low income, spells of participation in AFDC and food stamps, and spells without health insurance coverage (see Chapter 1; see also Gogan, 1988). In addition, SIPP has been used to study completed spells of job search and layoff (Bureau of the Census, 1989b). Extending the length of SIPP panels from 32 to 48 months will increase the utility of the data for spell analysis. However, there are methodological issues that have not yet been fully resolved with such analysis, and the Census Bureau will need to consider carefully how best to present spell statistics in the core SIPP publications.

To date, none of the SIPP income reports have included spell estimates. The report on participation in major assistance programs (P-70, No.
14) provided statistics on observed number of months of program participation within the 32-month period of the 1984 SIPP panel file; however, these estimates are not measures of the duration of spells.

22 As evidence of this interest, Senator Daniel Moynihan (D-N.Y.) introduced a bill in summer 1991 to require the Secretary of Health and Human Services to develop and publish statistics on welfare dependency.

Estimating Spell Duration

Because it is not feasible to follow people from the cradle to the grave, in any longitudinal data set there will be many spells (of low income, program participation, unemployment, etc.) that are not observed in their entirety. If no account is taken of such spells, the estimates of spell duration will be biased downwards. The standard approach to estimating spell duration is to define the population of spells to be those that are observed to begin within a specified period of observation (the length of a panel or a particular calendar year or other period that is covered by the panel).23 Survival analysis techniques are then used to estimate the probability that a given type of spell (e.g., welfare participation) will survive to the next time period (e.g., the next month) on the basis of the cumulative distribution of observed spell durations up to that point, including spells that are right-censored (i.e., spells for which the end date is not known).24 Alternatively, one could define the population to be all spells that ended within a specified period of observation and estimate survival probabilities backwards in time to the start of each spell. Such an analysis, which would include cases censored at the start rather than at the end of the panel, could be useful to carry out occasionally.

As Ruggles (1990b) notes, survival analysis of this type is a popular approach for analyzing spell durations and their determinants, and we support its use to develop estimates of spells of low income and program participation for inclusion in SIPP published reports. We propose that these reports contain tables that show the median estimated spell length and the survival rate or percentage of spells in progress after 1, 4, 8, 12, 16 months, etc., derived from the product-limit (Kaplan-Meier) estimation procedure. These results should be produced for the total sample and for various population subgroups. We suggest that the spell population for duration estimates from SIPP be defined to include spells that begin in a particular calendar year.
This population of spells can be readily explained, and, as a time series accumulates, users can begin to see changes in spell lengths across the yearly cohorts of new spells and relate them to other phenomena. It is probably sufficient to update the spell estimates every other year. For a given year, the estimates can then be based on the panel that starts in that year, thus permitting the maximum length of observation for the spells (up to 4 years with our new design).

23 If more than one spell is observed for the same person, then each spell is often treated as independent. Modeling multiple spells for the same person is also an important research priority, since total time spent on welfare or in poverty is of policy and analytic interest. Ellwood (1986) has shown that it is feasible to take information about the length of initial spells and combine it with information about the chance and length of subsequent spells to construct estimates of lifetime welfare use. The relatively short duration of SIPP makes it more difficult to examine multiple spells with data from SIPP than from other, longer-term surveys (e.g., the PSID or NLS).

24 See Tuma and Hannan (1984) for an overview of survival analysis methods; see also Blossfeld, Hamerle, and Mayer (1989); Yamaguchi (1991). See Ruggles (1990b) for a nontechnical discussion of spell analysis and other issues in longitudinal analysis of federal survey data.
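The product-limit (Kaplan-Meier) calculation behind the proposed tables can be sketched in a few lines. This is a minimal, unweighted illustration under stated assumptions: each spell is a pair of (duration in months, whether the end was observed), spells censored in a month are counted as still at risk in that month, and the sample data are invented. A production estimate would also need survey weights and variance estimates, which are not shown.

```python
from collections import Counter


def kaplan_meier(spells):
    """Product-limit survival curve.

    spells: list of (duration_months, ended); ended=False marks a
    right-censored spell (still in progress when observation stopped).
    Returns [(month, survival)] at each month with an observed ending.
    """
    ended = Counter(d for d, observed_end in spells if observed_end)
    censored = Counter(d for d, observed_end in spells if not observed_end)
    at_risk = len(spells)
    survival = 1.0
    curve = []
    for month in sorted(set(ended) | set(censored)):
        if ended[month]:
            # Conditional probability of surviving this month, given at risk.
            survival *= 1.0 - ended[month] / at_risk
            curve.append((month, survival))
        # Spells that ended or were censored this month leave the risk set.
        at_risk -= ended[month] + censored[month]
    return curve


def survival_at(curve, month):
    """Estimated share of spells still in progress after `month` months."""
    value = 1.0
    for t, s in curve:
        if t > month:
            break
        value = s
    return value


def median_duration(curve):
    """First month at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Invented spells that all begin in the same calendar year.
sample = [(1, True), (2, True), (2, False), (3, True),
          (4, False), (5, True), (6, False), (8, True)]
curve = kaplan_meier(sample)
for months in (1, 4, 8):
    print(months, round(survival_at(curve, months), 3))
print("median", median_duration(curve))
```

The censored spells contribute to the risk set up to the month they drop out of view, which is why the estimate is not biased downward in the way that an average of completed spells alone would be.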

The above approach produces duration estimates averaged across the calendar year. The approach assumes that spell durations do not differ systematically according to their starting time within the year. To the extent that external events (for example, legislative changes or changes in the state of the economy) affect spell durations over time, it may be misleading to pool spell observations across the period as a whole. Our recommendation to include spells that begin within a calendar year, rather than in some longer period (such as the 4-year period of observation of a SIPP panel under our proposed new design), reduces the likelihood of changes in spell durations over time, although at the cost of reducing the sample size for analysis. If insufficient sample size turns out to be a problem (which may be the case for spells on such programs as AFDC, although probably not for spells of low income), the spell population could be defined to include spells that begin within a 2-year rather than 1-year period. Also, the estimates could be developed from pooled panels (although this approach will increase the proportion of censored spells).

One issue that needs to be resolved is the sample of cases to include in the spell analysis. One possibility is to base the analysis on original sample persons in a panel file who provide data for every wave for which they are eligible, making use of the longitudinal weights in the analysis.25 With this approach, right-censoring occurs when a spell is still in existence at the end of the panel or when a panel member leaves the survey universe through death, institutionalization, or emigration. The former type of right-censoring, which is independent of the spell duration, is routinely handled by survival analysis techniques. Although the numbers of leavers are small, appropriate methods need to be developed to handle the latter type of right-censoring.
People who die may be simply treated as having ended their spell (e.g., of low income). Depending on the type of spell, people who leave the universe because they entered an institution or moved abroad can be treated either as having ended their spell or as cases whose spells were still in progress at the end of the survey. (Note that participation in social security can continue even if a person moves abroad, but not participation in AFDC.)

25 There is a concern, however, that these weights may not adequately compensate for nonresponse; see the next section.

An alternative approach to selecting sample cases for estimating spell durations is to include all spells observed to begin during the reference period, including those experienced by persons who left the panel through attrition. In some of these attrition cases, completed spells will be observed, but in others the spell will be still in existence when a person leaves the panel. A common method of handling the incomplete spells of attrition cases is to treat them in the same way as right-censoring that is due to the end of the survey, under the assumption that people with a spell in progress who drop out of the survey are at as much risk of ending the spell as people whose spell is censored by the completion of interviewing. Alternatively, attrition can be treated as a different type of exit from a spell that can be explicitly modeled. This approach recognizes that the decision to leave the survey is not random, but involves a questionable assumption of statistical independence between competing risks (see McBride and Swartz [1990:App.] for further discussion and references).

The inclusion of spells of attrition cases can appreciably increase the sample of spells for analysis. However, there are no readily available weights that can be used. For this reason, analysts who have included cases of attrition directly in their analyses have developed unweighted estimates (see discussion in the next section).

Finally, the standard approach to estimating spell durations reduces the sample size for analysis by virtue of excluding spells that exist at the start of the panel. These spells are a length-biased sample of spells since longer spells are more likely to be present at any particular date. One way to include such spells is to define the spell population to include all spells that begin in a period that began prior to the start of the survey; for example, in the case of the 1990 SIPP panel, one might seek to estimate the duration of all spells of low income that began in the period January 1986 to December 1990 (or 1991). The beginning date is chosen under the assumption that all of the spells existing at the start of the panel commenced after that beginning date. This assumption, with a January 1986 start date, might be reasonable in the case of food stamp spells, most of which are relatively short in duration, but not so reasonable for AFDC spells, which may last a long time.
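The length bias noted above can be made concrete with a small calculation. The sketch assumes a steady state in which the same number of spells of each duration begins every month, so that the number of spells of duration d in progress on any date is proportional to d times the monthly number of starts. The durations and counts are invented for illustration.

```python
def cohort_shares(starts_per_month):
    """Share of newly started spells by completed duration (months)."""
    total = sum(starts_per_month.values())
    return {d: n / total for d, n in starts_per_month.items()}


def in_progress_shares(starts_per_month):
    """Share of spells in progress on a given date, by completed duration.

    Steady-state assumption: a spell lasting d months is present on d
    different monthly dates, so the stock is starts * d.
    """
    stock = {d: n * d for d, n in starts_per_month.items()}
    total = sum(stock.values())
    return {d: s / total for d, s in stock.items()}


def mean_duration(shares):
    """Mean completed duration implied by a set of shares."""
    return sum(d * p for d, p in shares.items())


# Hypothetical flow: each month 90 spells start that last 2 months
# and 10 spells start that last 24 months.
starts = {2: 90, 24: 10}
new_spells = cohort_shares(starts)
existing_spells = in_progress_shares(starts)
print(round(mean_duration(new_spells), 2))
print(round(mean_duration(existing_spells), 2))
```

In this made-up case, long spells are one-tenth of new spells but more than half of the spells found in progress on any date, which is why spells existing at the start of a panel cannot simply be pooled with newly observed spells.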
Starting dates of program spells are collected in the personal history module. If found to be sufficiently accurate, these dates could be used to determine the beginning of spells existing at the start of a panel (see Miller and Martini, 1992).

An added complication is that a number of spells that started within the period defining the spell population will have already ended before the start of the survey (between February 1986 and fall-winter 1989-1990 in our example).26 These unobserved spells, which are referred to as "truncated," need to be included in the analysis. Their numbers and durations can be estimated by a model with restrictive assumptions about the stationarity of the occurrence and duration of spells, or they can be estimated from past SIPP panels. However, the extension of the reference period involved in this general approach seems inappropriate for standard analyses of spell durations. We do not recommend it for such analyses, although it may be useful for some special problems.

26 The first interviews in the 1990 SIPP panel were conducted in February through May 1990, with a 4-month reference period extending back to October, November, or December 1989 or January 1990, depending on the rotation group.

We suggest that the standard procedure of defining the population of spells as those that begin within a period that is observed in the survey is the best approach now for the Census Bureau to follow for SIPP for its regular publication series. However, it is important for the Bureau to keep up to date with new developments in methods for spell analysis and their potential application to SIPP. In particular, we urge that the Bureau staff become familiar with the more sophisticated survival analysis techniques (such as the Cox proportional hazards models) that are likely to be widely used for policy analysis and research. These techniques can incorporate vectors of explanatory factors relating to the type of spell, including both fixed factors (e.g., race and sex) and time-varying factors (e.g., employment status).

Defining a Spell

Another issue for spell analysis concerns the definition of a spell. SIPP makes it possible to identify spells as short as 1 month; however, it may not always be sensible to do so for published spell analyses of low income and program participation.

For duration estimates for low income, a primary concern of analysts has to do with the persistence of poverty over time. A number of individuals may have short spells of low income, but such spells may not represent spells of poverty. Thus, people may have little or no income in a given month (for example, when changing jobs) without being poor in any real sense. In addition, people who are close to the poverty line may experience small fluctuations in income (e.g., one fewer pay period in some months) that result in apparent short poverty spells with very little real change in income.
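One way to keep such one-month fluctuations from being counted as spells is a persistence rule of the kind this section goes on to suggest: a change of state is recognized only if the new state lasts a minimum number of consecutive months. The sketch below is an illustration under stated assumptions, not a Census Bureau procedure. The monthly indicator (1 = income below the monthly threshold), the treatment of the first observed month as the starting state, and the treatment of a short run at the end of the panel as no change are all choices made for the example.

```python
def smooth_states(states, min_months=2):
    """Recognize a change of state only if it persists for min_months.

    states: list of 0/1 monthly indicators (1 = below the poverty threshold).
    Assumptions: the first observed state is taken as given, and a run
    shorter than min_months at the end of the series is not a change.
    """
    if not states:
        return []
    smoothed = []
    current = states[0]
    i, n = 0, len(states)
    while i < n:
        if states[i] == current:
            smoothed.append(current)
            i += 1
            continue
        # Measure how long the candidate new state lasts.
        j = i
        while j < n and states[j] == states[i]:
            j += 1
        run = j - i
        if run >= min_months:
            current = states[i]  # the change is recognized
        smoothed.extend([current] * run)
        i = j
    return smoothed


def spells(states):
    """List of (start_index, length_in_months) for runs of 1s."""
    found = []
    start = None
    for month, state in enumerate(states):
        if state and start is None:
            start = month
        elif not state and start is not None:
            found.append((start, month - start))
            start = None
    if start is not None:
        found.append((start, len(states) - start))
    return found


# Invented 12-month history for one person.
raw = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
print(spells(raw))
print(spells(smooth_states(raw, min_months=2)))
```

Applied to this history, the rule drops the isolated low-income month and bridges the one-month break, so three raw spells become one longer spell. Changing `min_months` is a simple way to experiment with alternative definitions.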
Conversely, it could be misleading to recognize very short breaks in an otherwise long spell of poverty, as such breaks could result from small fluctuations in income that temporarily put long-term poor people over the poverty line.

We suggest that the duration estimates that are developed for inclusion in SIPP publications recognize an entrance into poverty only if the new state is maintained for at least 2 consecutive months. Conversely, the estimates should recognize an exit from poverty or an exit from a program only if the new state is maintained for a minimum of 2 months. We also encourage the Census Bureau staff to experiment with alternative definitions. Ruggles and Williams (1989) have done useful work in this area. In analyzing several definitions of poverty spells, they found that 25 percent of the 1984 SIPP panel sample had a spell of poverty defined as 1 month's income less

than the corresponding monthly poverty threshold. The percentage with a spell dropped to 15 percent for the most stringent definition that they used.

For program participation, it is appropriate to include short spells of recipiency, provided that they are measured accurately.27 However, because analysts are concerned with the persistence of welfare dependency, it could be misleading to recognize very short breaks in recipiency. For example, administrative actions (e.g., recertification) may deny benefits for a short period to people who are otherwise experiencing a long spell of program dependency.28

Finally, we believe that there is an interest in the characteristics of the long-term poor and long-term program recipients in comparison with the general population. Hence, we suggest that SIPP publications include tables for people who have been in poverty or on programs continuously over a span of 2 calendar years, using 2 years as a reasonable definition of "long term" in the SIPP context.29 There must be accompanying text that warns users not to interpret the population in these tables as representing all long-term poor people or long-term recipients; however, we believe this approach is a reasonable way to provide some information that is of considerable policy interest.

27 It is important to review the data to determine that confusion among program names is not present. For example, a short spell of AFDC recipiency for a person who was employed at other times could, in fact, be a spell of general assistance.

28 As noted above, it would also be useful to conduct analyses of total time on welfare, including repeat spells for the same individual (i.e., multiple spells that represent a failure to maintain independence rather than a purely administrative action). SIPP is not ideal for such a purpose, but extending the panel length from 32 to 48 months will be helpful.

29 The time span could be lengthened to 3 or 4 calendar years under our proposed design of 4-year SIPP panels; however, 2 years seems long enough and permits more rapid publication of updated estimates. The definition of "continuously" poor or on programs could be people who are in that state every month with possibly one or two exceptions during the period.

Treating Missing Data

Household surveys never obtain complete data for all respondents. A complex panel survey like SIPP has complex missing data patterns, including missing items, missing waves for people in otherwise interviewed households (Type Z nonresponse), and missing waves for whole households. Missing data cannot be ignored: restricting analysis to only cases with complete response can greatly reduce sample size and introduce bias. Weighting and imputation procedures are commonly used to adjust for nonresponse. However, these procedures may fail to fully compensate for the nonresponse bias or fail to make use of all available information. We raise concerns about the SIPP weighting and imputation procedures elsewhere in this report, for example, that the imputation of specific income and asset values does not take account of low income or program receipt (see Chapter 3) and that the weights do not adequately compensate for differential rates of undercoverage and attrition by income level and other characteristics that are important for analysis (see Chapters 4 and 7). We also express strong support for the development of a longitudinal imputation system for SIPP, so that, to the extent possible, imputations for each wave make use of information for the same individual from previous waves (see Chapter 5).

Here we consider a more narrow question, namely, the suitability of the currently available weights for the kinds of cross-sectional and longitudinal statistics on income and program participation that we suggest be developed from SIPP.

At present, the SIPP files for specific waves contain cross-sectional weights for the interview month and each reference month. These weights are assigned to all people in the sample that month, including original sample members and people who joined them after the first wave. The longitudinal panel files include up to three weights for each record: a weight for people with complete data for the first calendar year covered by the panel; a weight for people with complete data for the second calendar year; and a panel weight for people with complete data for all 32 months.30 Records for other people who participated in the survey are also included (so that their information can be used in analysis of the weighted cases) but are not assigned weights.

We see no problem in principle in the case of average monthly cross-sectional statistics from SIPP, which can readily make use of the monthly weights and thereby include the maximum number of observations.31 For tables of annual income that are designed for comparability with the March CPS, we suggest using the cross-sectional weights for December of the income year.
Such tables present a problem, not of weighting, but of how to treat the family income of people who have data for only part of the year. We suggest the following strategy: assign newborns the mother's monthly family income for the entire year and inflate to an annual amount the income of people who joined the household of an original sample member during the year, whose household missed a wave, or who were abroad for part of the year.32

30 The panel weights (which apply to March of the first year of the panel) and those for the first calendar year (which apply to January) exclude people who were not original sample members as well as original members who missed one or more waves (over the course, respectively, of the panel or the first year). However, they include original sample members with complete data up to the point when they left the universe (e.g., through death or institutionalization). The weights for the second calendar year include all people (both original sample members and people not part of the original sample) with complete data for that year, along with original sample people with complete data for that year up to the point when they left the universe.

31 However, as we note elsewhere in the report (see Chapters 3 and 7), it is important that the Census Bureau study further the problem of different rates of attrition for important subgroups and their implications for the cross-sectional weights.

For tables of year-to-year transitions (e.g., movement of people in and out of poverty from one year to another), we suggest that the Census Bureau develop 2-year longitudinal weights that are similar to the existing longitudinal panel and calendar-year weights. The 2-year weights would apply to all cases with complete data for the particular pair of calendar years, including people with complete data up to the time they left the universe.33 Newborns should also receive weights. Using 2-year weights would represent an improvement over using the full panel weights (as is currently done in the Census Bureau's reports on transitions in income and poverty), because more people would receive positive weights. It will be particularly important to have 2-year weights under our proposed design of 4-year panels.

Tables that present duration estimates and other information related to spells of low income and program participation can make use of the longitudinal panel weights. However, to date, most spell analyses conducted with the 1984 SIPP panel (e.g., Ruggles, 1989; Ruggles and Williams, 1989; McBride and Swartz, 1990) have not used these (or any) weights. Analysts have doubted that the weights adequately compensate for the differences between people who dropped out of the sample and the remaining sample members.
Also, a significant reduction in sample size occurred midway through the 1984 panel (because of budget cuts), and exclusion of these cases (which do not have longitudinal weights) would have further reduced the sample size available for analysis.34 We note that the Census Bureau is continuing to sponsor research on ways to impute information for people who are missing one or two waves in a panel. If successful, the result would be to increase the number of people who receive longitudinal panel or 2-year weights and thereby reduce the sampling error in the weighted estimates. We urge the Bureau to give high priority to this work and also to work to improve the weights themselves in terms of how adequately they compensate for differential sample attrition (see Chapter 7).

32 A longitudinal imputation system that can fill in a missing wave should reduce the number of part-year cases that result from wave nonresponse on the part of original sample members. (See Singh, Huggins, and Kasprzyk [1990] for a review of methods to handle single wave nonresponse; see also Ernst and Gillman [1988].) There is a problem of how to treat people who were institutionalized for part of the year because their income may have been very different in the two periods. Information to address this problem should result from implementing our recommendation (see Chapter 4) that SIPP collect some data for original sample members who enter institutions. Note that annual tables that categorize income as a ratio of poverty do not necessarily have to do anything special for part-year cases, as the ratio can be calculated by dividing the sum of monthly incomes by the sum of monthly poverty thresholds for the months for which information is available.

33 "Complete" data may include some imputed data for wave as well as item nonresponse; see below. Also, while weights can be developed for people who were in the universe for only part of the 2-year period, the year-to-year transition tables may want to exclude these people or at least show them in a separate category.

34 The Census Bureau report on spells of job search and layoff did use the longitudinal weights for the 1984 panel, but this study included only completed spells for which both the start and end dates were observed.

Summary

We have examined a range of conceptual and measurement issues that enter into the development of useful income and program statistics from SIPP, particularly those that make use of the SIPP longitudinal data. Given the complex nature of many of these issues and the advances that are occurring in analytical techniques (e.g., in approaches to spell analysis), we did not develop formal recommendations on these topics. Rather, we suggest the appropriate resolution of such issues as the unit of analysis and equivalence scales that we believe may be most useful for the Census Bureau to adopt at this time. As we recommend above, the Census Bureau should carry out research and development on measurement topics, looking to make continued improvements in the core statistics from SIPP.

MICRODATA PRODUCTS

Timely release of computer-readable files containing microdata (i.e., the coded values for the information furnished by individual respondents, suitably processed to protect the confidentiality of the replies) is as important a component of the data dissemination program for a rich, complex survey like SIPP as the regular release of publications. The availability of public-use microdata products can increase the return to the investment in a survey many times over. Microdata files permit researchers to perform extensive analyses of the data, constrained only by the limits of the questionnaire content, confidentiality restrictions, and a user's imagination.
Researchers can produce tabulations that disaggregate or recombine the data in ways not considered in the agency's publication program and can use advanced analysis techniques to investigate the relationships among the survey variables.

Available Products

We have described above the problems experienced at the outset of SIPP in data processing, with the result that, for a period, microdata files were released only after long delays. Moreover, many of the early files were recalled and reissued because of errors. At the present time, the Census Bureau is adhering to a schedule of releasing files containing the core information from each wave within 9-12 months of completion of data collection (e.g., the spring of the year following the first wave of a panel, for which the last interview month is May),35 and quality problems appear to have been markedly reduced.

Initially, the Census Bureau released separate cross-sectional files for each wave of a panel in two formats on magnetic tape. One format used a modified hierarchical or relational file structure that corresponded to the structure of the Bureau's in-house database management system. The hierarchical files included separate records for household, family, and personal characteristics and people's jobs and income sources. Even before the data were available, federal agency and other users expressed strong fears that they could not work with these files, and the Bureau responded by developing a rectangular file structure with a record for each person that repeated household and family characteristics and included space to record information for multiple jobs and income sources.

The rectangular files were suitable for processing with widely available programming languages and statistical software packages. However, these very large records greatly increased demands on users' hardware and software facilities. (One user developed a preprocessing system based on the COBOL language to minimize input and output costs for preparing extracts that were then fed into a statistical package; see Doyle, Citro, and Cohen [1987].) The large records also increased acquisition costs to users. The files were priced according to the number of magnetic tape reels, but a significant portion of the tapes (usually two) for a wave file were blank because many people only had one job or a few sources of income during the reference period or were not present for all months of the reference period.
This problem was even more pronounced for the longitudinal panel files (see below), which each required three, four, or five tapes.

Whether using the hierarchical or rectangular file format, users faced pitfalls if they were not thoroughly familiar with the contents and design of SIPP. For example, the records contained fields for every month of a wave, but not all people were in the sample for each month, and, hence, users had to carefully screen such people out of their analyses for those months. As another example, users had to aggregate different reference months for the four rotation groups to produce calendar-month or calendar-quarter estimates and, often, pull records from more than one wave. Users also experienced problems in trying to link wave files for longitudinal analysis. Linkages were complicated in the early panels by the fact that not all rotation groups received all interview waves (e.g., rotation group four in the 1984 panel was skipped for wave 2; hence, to develop, say, a 12-month file, users had to slot the information from waves 3 and 4 for this rotation group into the reference months covered by waves 2 and 3 for the other three groups).36

35 Beginning with the 1990 panel, topical module files are being issued separately a few months after the corresponding core file.

The SIPP ACCESS system developed at the University of Wisconsin, with National Science Foundation funding, put the hierarchical files into the INGRES relational database management system and provided many ancillary services to assist users. The SIPP ACCESS staff estimated that the system provided support for about two-thirds of the research on SIPP conducted by academic social scientists outside the Census Bureau during 1985-1990, when the facility was in operation (David and Robbin, 1992:i). However, users of SIPP ACCESS also found that various aspects of the SIPP design made the data hard to understand and work with, and, indeed, SIPP quickly gained a reputation for complexity that deterred at least some potential users from exploring the utility of the information for their needs (Committee on National Statistics, 1989:52).37

One problem confronting early users of SIPP was alleviated when the Census Bureau developed fully linked longitudinal public-use files. Initially, a 12-month file was developed from the 1984 panel, followed by a 32-month panel file. The longitudinal files include only information from the core questionnaire; users must merge topical module data from separate files.

Working with input from users (see, e.g., Smith, 1989), the Census Bureau recently redesigned the format of the wave files, beginning with the 1990 panel, to include "person-month" records, that is, a record for each month for which a person has data from either a self or proxy interview or by means of imputation (e.g., the Type Z people not interviewed in an otherwise cooperating household). This format reduces wasted space because records are omitted for months for which a person has no data.
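The practical appeal of a person-month layout can be sketched as follows. The record fields, identifiers, and values are hypothetical and do not reproduce the actual SIPP file layout; the point is only that, with one record per person per month of data, a calendar-month estimate or a household-level variable reduces to a simple grouping with no screening of empty month slots.

```python
from collections import defaultdict

# Hypothetical person-month records: one record per person per month with data.
# Person B joins household 1 in February; person C has data for January only.
records = [
    {"person": "A", "household": 1, "month": "1990-01", "income": 1000.0, "weight": 2.0},
    {"person": "A", "household": 1, "month": "1990-02", "income": 1200.0, "weight": 2.0},
    {"person": "B", "household": 1, "month": "1990-02", "income": 300.0, "weight": 2.0},
    {"person": "C", "household": 2, "month": "1990-01", "income": 4000.0, "weight": 1.0},
]


def monthly_weighted_mean(rows, field="income"):
    """Weighted mean of `field` for each calendar month present in the rows."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for row in rows:
        numerator[row["month"]] += row["weight"] * row[field]
        denominator[row["month"]] += row["weight"]
    return {month: numerator[month] / denominator[month] for month in numerator}


def attach_household_income(rows):
    """Copy of each record with the household's total income for that month."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["household"], row["month"])] += row["income"]
    return [dict(row, household_income=totals[(row["household"], row["month"])])
            for row in rows]


means = monthly_weighted_mean(records)
with_household = attach_household_income(records)
print(means)
print([(r["person"], r["month"], r["household_income"]) for r in with_household])
```

Months without data simply have no record, so neither function needs a presence flag or a rotation-group lookup to decide which observations belong in a given month.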
Also, records can readily be aggregated in a variety of ways (for example, to produce estimates for all people for a calendar month or estimates for all months of a wave, or to create new family or household variables to attribute to persons) without confusion about which records to include (see McMillen, 1990). Most users expect that the person-month format will be significantly easier to understand and use.38

36 This feature of the design, which was intended to better align certain topical module questions with a particular time of the year for all four rotation groups, was dropped at user insistence beginning with the 1987 panel. (See Committee on National Statistics [1989:Table 2-1] for a listing of the interviews received by each rotation group in the 1984-1986 panels.)

37 The CNSTAT report noted that a good deal of the complexity of the SIPP data reflects the real world and is not something that the Census Bureau should attempt to simplify. However, the report urged that unnecessary complexities in the survey design and file structures be reduced.

38 However, no file format for such a complex survey as SIPP will serve all users equally well. The Census Bureau will need to evaluate the successes and problems that users experience with the person-month format and consider ways to alleviate problems that arise. For example, it may be possible for the Census Bureau to provide illustrative SAS code that would help users with particular kinds of applications.

The Census Bureau has also made other improvements to the microdata files in consultation with users: for example, standardizing the codes used to indicate that the respondent was not in the universe for a particular item and hence not asked the question. This improvement is important because of the highly complex skip patterns in the SIPP questionnaire and hence the need to distinguish carefully between a not-in-universe situation and a true zero or negative response (e.g., to a question about income amounts).

Another recent innovation with regard to SIPP microdata products is that Census Bureau staff have developed an on-line system (SIPP On Call-Data Extraction System) for users to specify and receive extracts (as SAS or plain ASCII files) from the public-use SIPP microdata over dial-up telecommunication lines (Bureau of the Census, 1991c).39 At present, the capabilities of the system are limited; for example, there is no facility to extract records based on the value of a continuous variable such as income or to use a recode of one or more variables in the record retrieval specifications.40

39 The SIPP ACCESS system was transferred from the University of Wisconsin to the Census Bureau in mid-1990, and Bureau staff worked to make it operable at the Bureau for access to the 1984 and 1985 panels; however, the staff decided to develop SIPP On Call instead for access to the person-month files from the 1990 and subsequent panels. SIPP On Call also includes an electronic mail feature for users of the system to communicate with each other and the Census Bureau.

40 For a recent evaluation of SIPP On Call prepared for the Food and Nutrition Service (which provided funding to help develop the system), see Doyle and Cohen (1992).

Priorities for Improvements

Because of the critical importance of microdata for research, microsimulation, and other types of policy analysis with SIPP, we urge the Census Bureau to continue to seek ways to improve the timeliness, format, content, and other aspects of SIPP microdata products.

Timing

Although the Census Bureau has made commendable progress in improving delivery schedules for SIPP microdata files, we believe that further improvements in timeliness are both necessary and feasible. In order to achieve the goal of SIPP's serving as the nation's main source of income statistics, core data files must be available from SIPP on about as timely a basis as they are from the March CPS income supplement (currently about 6 months after data collection). The SIPP rotation group structure and the length and complexity of the SIPP questionnaire have made it difficult to contemplate releasing SIPP files on the same schedule as files from the March CPS. However, we believe that the implementation of computer-assisted personal interviewing (CAPI) and database management technology for SIPP should make it possible to move toward, and achieve, that goal.

Kinds of Files

Currently, the Census Bureau releases wave and panel files from each panel of SIPP (separate wave files for core and topical module information). The panel files are in the rectangular format, and we encourage the Census Bureau to consider converting them to the person-month format of the wave files. (The space-saving features of the person-month format would be particularly valuable for the lengthy panel files.)

We also urge the Bureau to release calendar-year files that contain data from both panels that are in the field at the same time. Although the ability to combine panels was originally viewed as an important feature of SIPP, the Census Bureau has, to date, approached the processing of each SIPP panel as a completely separate operation. The delays in releasing files from the early panels meant that users had to wait for very long periods to be able to combine, say, wave 6 of the 1984 panel with wave 3 of the 1985 panel. At present, the Census Bureau's data delivery schedule for SIPP specifies approximately simultaneous release of wave files from panels that are in the field at the same time (e.g., the core files for wave 8 of the 1990 panel and wave 5 of the 1991 panel are both targeted for release in April 1993). Hence, users can readily develop wave files that combine panels.
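As a rough sketch of what combining panels involves once records are in person-month form, the Python fragment below pools records from two overlapping panels and keeps those that fall in a single calendar year. The record layout, identifiers, and dates are hypothetical, and the sketch deliberately leaves out any adjustment of sample weights for the pooled panels.

```python
from collections import Counter

# Hypothetical person-month records from two panels in the field together.
panel_1990 = [
    {"panel": 1990, "person_id": "A", "month": "1991-12", "income": 1000},
    {"panel": 1990, "person_id": "A", "month": "1992-01", "income": 1100},
]
panel_1991 = [
    {"panel": 1991, "person_id": "X", "month": "1992-01", "income": 700},
    {"panel": 1991, "person_id": "X", "month": "1992-02", "income": 750},
]


def calendar_year_records(panels, year):
    """Pool the panels and keep only person-months that fall in `year`."""
    prefix = f"{year}-"
    return [
        rec
        for panel in panels
        for rec in panel
        if rec["month"].startswith(prefix)
    ]


year_1992 = calendar_year_records([panel_1990, panel_1991], 1992)
by_panel = Counter(rec["panel"] for rec in year_1992)
print(dict(by_panel))  # person-months contributed by each panel
```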
We propose that the Bureau take the next step of preparing calendar-year files from combined panels, as such files are likely to prove very useful for many research and policy analysis purposes (e.g., for use in microsimulation models of tax and transfer programs).

Content and Coding

We encourage the Census Bureau to continue working with user groups (see discussion of advisory mechanisms in Chapter 8) to identify and implement changes to the content and format of the SIPP microdata files in order to enhance their utility and ease of use. For example, users may identify recoded variables that it would be helpful for the Bureau to create rather than leaving them to the user.

We note that adding variables can add processing costs and result in records that are difficult for users to work with. However, in some instances the Census Bureau's efforts to keep the records in the SIPP data files to a manageable length may have gone too far. Thus, the person-month format excludes some variables that were available in the rectangular format (e.g., it is no longer possible to determine coverage by more than one health insurance plan in the person-month files). Also, some program-related variables for which the Food and Nutrition Service provided funding to include on the initial 12-month longitudinal file from the 1984 panel were never adopted for the 32-month panel files. We urge the Census Bureau to consult with users about the benefits of reinstating these variables.

We also encourage the Census Bureau to work closely with users to further improve the information in the data files about missing values. As a general policy, the Bureau prefers to provide users with complete records for every respondent in a survey by supplying values for missing items through some type of imputation procedure. Complete records have the advantage of maximizing the sample size for analysis, as cases do not have to be discarded because of missing information. Also, there is the advantage that the Bureau can implement imputations in a consistent manner (individual users may vary in the sophistication and care with which they supply values for missing items). In a separate set of fields, the Bureau generally provides yes-no indicators of whether an individual item was reported or imputed. Because the imputations performed by the Census Bureau may have disadvantages for certain analyses (e.g., see the discussion in Chapter 3 of problems with the imputation of income and assets for program recipients) and because the imputation rates can be quite large for some variables, it is important for users to have as much information as possible about them.
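The way a user might put the yes-no imputation indicators to work can be sketched in a few lines of Python. The records, the field names (`income`, `income_imputed`), and the values below are invented for illustration; the sketch simply computes an imputation rate and compares a mean with and without the imputed cases.

```python
# Hypothetical records: each income value carries a yes-no imputation flag.
records = [
    {"income": 1000, "income_imputed": False},
    {"income": 2000, "income_imputed": False},
    {"income": 3000, "income_imputed": False},
    {"income": 500, "income_imputed": True},
    {"income": 500, "income_imputed": True},
]


def imputation_rate(rows, flag):
    """Share of records whose value for the item was imputed."""
    return sum(1 for r in rows if r[flag]) / len(rows)


def mean_income(rows, include_imputed=True):
    """Mean income, optionally restricted to reported (non-imputed) values."""
    values = [
        r["income"] for r in rows if include_imputed or not r["income_imputed"]
    ]
    return sum(values) / len(values)


rate = imputation_rate(records, "income_imputed")
mean_all = mean_income(records)
mean_reported = mean_income(records, include_imputed=False)
print(rate, mean_all, mean_reported)
```

A large gap between the two means, as in this contrived case, is the kind of signal that would prompt an analyst to look more closely at how the item was imputed.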
We believe it would help users assess the quality of the imputations from the perspective of particular analyses if the imputation flags contained information about the reason for an imputation, that is, whether the respondent refused to answer a question or did not know the answer.

Delivery Media

We encourage the Census Bureau to explore alternative media for delivery of SIPP microdata to users that can make easier the process of obtaining extracts for analysis. We note that the days of 9-track magnetic tape as a medium for data dissemination are numbered, as more and more researchers are turning away from cumbersome mainframe systems to work with micro- or minicomputer hardware and software that use some type of direct access disk media for file storage and input and output.

The familiar floppy diskettes are much too small to serve as a file storage medium for a survey as large as SIPP. However, high-storage capacity CD-ROM (compact disk-read only memory) technology is rapidly gaining popularity for large data sets. Users of the National Longitudinal Surveys of Labor Market Experience (NLS) currently choose CD-ROM over tape by about four to one. The Census Bureau is now releasing CD-ROM versions of the public-use data sets from the 1990 census. CD-ROM with suitable extraction software could well be a useful access medium for SIPP (although further improvements in microcomputing hardware may be necessary before CD-ROM becomes sufficiently fast and easy to access for such a large data set as SIPP).

We also encourage the Census Bureau to further develop its on-line extraction system (SIPP On Call), which could save users the time and expense of acquiring and archiving complete SIPP files when they only require a subset of the data. To be most useful, the Bureau needs to add sophisticated retrieval capabilities to the system. In addition, if SIPP On Call is to be an effective means for users to work with the large volume of SIPP data, the Bureau needs to provide access to the system over high-speed communications lines (for example, those provided by Internet). Moving large amounts of data over regular telephone lines is tedious and costly.

Finally, we note the importance for the effective use of CD-ROM and on-line technology of having full documentation, including frequencies for each variable, integrated with the actual data (see discussion below and in Chapter 5).

Recommendation

Recommendation 6-3: The Census Bureau should continue to develop improved microdata products from SIPP to support policy analysis and social science research.
Priority improvements include:
· moving toward a goal of releasing core data files within 6 months after the end of data collection;
· producing calendar-year files that combine panels, in addition to wave and panel files;
· determining, in consultation with users, changes and additions to the file contents that would assist their analyses; and
· developing additional ways of delivering SIPP microdata products to users, such as by means of high-storage capacity compact disks (CD-ROM) and an improved on-line data extraction system.

DOCUMENTATION AND SERVICES FOR USERS

For effective use of large, complex data sets, users need not only the data, but also what has been termed the metadata, that is, information that enables the user to access, understand, and analyze the data appropriately (see David, 1991; David and Robbin, 1989, 1990). Users of computer-readable products most obviously need basic documentation that enables them to instruct a computer program how to "read" the data on the magnetic tape or other medium. In addition, users of computer products need information to help them understand the quality and meaning of the data. Users of printed publications require such information as well.

The larger and richer the data set, the more extensive must be the accompanying documentation; also, the greater the need for ancillary services, such as training sessions, working papers, and other means of reaching and educating users about the potentials and pitfalls of the data and data products. An investment in documentation and related services is amply justified in that it minimizes wasted time and resources and increases the return to users from their processing and analysis efforts. Good documentation makes a vital contribution to the development of a strong and growing community of users for a survey like SIPP.

Documentation and Related Services to Date

Microdata Documentation

From the beginning of SIPP, each microdata file has been accompanied by a codebook providing basic information on the file structure and tape location and content of each variable. Codebooks are available in printed form and as machine-readable files attached to the data files.

A SIPP Users' Guide containing additional explanatory information for users about SIPP and its microdata products was initiated at the start of the survey but took several years to prepare; the first edition was released in 1987 (Bureau of the Census, 1987).
The guide included chapters on survey design, survey content, structure of the cross-sectional public-use microdata files, use of cross-sectional files for estimation and analysis, linking waves, and assessing the reliability of SIPP data. A second edition that added a chapter about SIPP cross-sectional weighting procedures and appendix material about the 1990 panel and the new person-month format was released in late 1991 (Bureau of the Census, 1991e).

The initial documentation did not include frequencies for the variables; at the behest of users, the Census Bureau contracted in 1989 to have frequencies prepared for each file and made available on diskettes, with a subset of key control counts provided in printed form. Such frequencies, which indicate the distribution of responses to each item, are invaluable tools for users in making initial decisions about variables and population subgroups to analyze, hypotheses to explore, and analytical methods to use.

To inform data file users about problems with the files or documentation, the Census Bureau has a SIPP User Notes series that is sent to all file purchasers and can also be obtained on request. Notice of the user notes is contained in a supplement to the newsletter of the Association of Public Data Users (APDU) that is mailed to a large list of people who have inquired of the Census Bureau about SIPP.41 Finally, the SIPP Quality Profile (Jabine, King, and Petroni, 1990) is a very valuable tool for informing data file users about the quality of the survey information.

Documentation for Printed Reports

Each SIPP publication includes appendix material that describes the survey, defines key terms, indicates how to make approximate calculations of sampling errors of the estimates, and briefly reviews other sources of nonsampling error (e.g., underreporting). Also, reference is generally made to the additional detailed information on sampling and nonsampling errors provided by the SIPP Quality Profile.

Other User Services

The Census Bureau regularly publishes What's Available from SIPP (e.g., Bureau of the Census, 1991g), a highly useful basic reference source that lists the publications, data files, and working papers from the survey.

For a number of years, the Census Bureau supported a vigorous, proactive program to educate and inform users about SIPP and to keep users aware of others who were analyzing the data. This program included training sessions offered as part of the summer program of the Inter-university Consortium for Political and Social Research at the University of Michigan and as workshops in conjunction with many professional association meetings.
In addition, the Bureau published the SIPP Working Paper series, which, by the end of 1990, totaled 140 substantive and methodological papers by analysts both inside and outside the Census Bureau (Bureau of the Census, 1991g).42 Bureau staff also organized sessions that featured SIPP research at professional association meetings and published compilations of SIPP-related papers that were presented at meetings of the ASA. Census Bureau staff further encouraged SIPP researchers to apply for ASA/Census fellowships to use the data files on-site at the Bureau. In addition, Bureau staff regularly appeared at the monthly meetings of SIPP analysts in the Washington, D.C., area and were available to meet with groups of users in other locations on request. All of these were valuable activities that enabled users and potential users to become informed about SIPP and keep abreast of what others were learning from the data.

41 Arrangements for the APDU SIPP supplement and for an APDU SIPP committee to consult with the Census Bureau about the SIPP data products and documentation were made in early 1989.

42 For this series, Census Bureau staff identify papers using SIPP data that have been prepared for professional meetings or initiated in draft, solicit their inclusion in the series, have them reviewed by one or two others, and edit them and prepare reproducible copy.

About 2 years ago, the loss of SIPP staff who had been most active in this program led to a suspension or reduction of many of these services. Recently, with the appointment of a SIPP liaison in the HHES Division (see Chapter 8), activities such as releasing new titles in the SIPP Working Paper series and providing workshops about SIPP at professional association meetings have started up again. However, the level of activity has not yet reached that of the earlier years.

Recommendation

We urge the Census Bureau to continue regular consultations with users about needed kinds of documentation and other informational and instructional materials. We cannot stress enough the importance of having comprehensive, accurate, and intelligible documentation and related services to interest users in the potential of SIPP data and to enable them to make the most cost-effective use of the data. We see a number of areas in which improvements to the current documentation, information, and training package for SIPP would be useful.

First, it is vital, as part of the implementation of the proposed redesign of SIPP, to make use of CAPI and database management system technology to fully integrate the microdata file documentation with the actual data. Such integration should enable immediate calculation of frequencies for variables and inclusion of the frequencies in the printed and machine-readable forms of the codebook.
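One way to picture such integration is a single codebook structure that drives both the reading of the records and the labeling of the frequencies. The Python sketch below is a toy under stated assumptions: the variable names, field positions, codes, and labels (including the use of code 0 for "not in universe") are hypothetical and are not taken from any SIPP codebook.

```python
from collections import Counter

# Hypothetical codebook: the same entries supply the field positions used
# to read each record and the value labels used to print frequencies.
CODEBOOK = {
    "wave": {"start": 0, "width": 1, "labels": {"1": "Wave 1"}},
    "medicaid": {
        "start": 1,
        "width": 1,
        "labels": {"0": "Not in universe", "1": "Covered", "2": "Not covered"},
    },
}


def read_field(line, name):
    """Slice one variable out of a fixed-width record using the codebook."""
    spec = CODEBOOK[name]
    return line[spec["start"] : spec["start"] + spec["width"]]


def frequencies(lines, name):
    """Count the codes found in the data and attach the codebook labels."""
    counts = Counter(read_field(line, name) for line in lines)
    labels = CODEBOOK[name]["labels"]
    return {
        labels.get(code, f"Undocumented code {code!r}"): n
        for code, n in counts.items()
    }


data_lines = ["11", "12", "10", "11"]  # hypothetical fixed-width records
medicaid_freq = frequencies(data_lines, "medicaid")
print(medicaid_freq)
```

Because the positions and labels live in one place, a code that appears in the data but not in the codebook shows up immediately as an undocumented category in the frequency listing.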
Integration should also reduce the likelihood of errors in the documentation, such as field positions not matching the actual positions in the file, and make it possible to improve the description of skip patterns in the questionnaire that define the universe of respondents for particular items.

With regard to the microdata documentation, we note that an adequate description has never been developed for one important aspect of the SIPP processing system that affects the quality of a significant portion of the data: the procedures used to impute values for missing items.43 Although it may be infeasible and indeed unnecessary to provide detailed imputation specifications for each variable, it should certainly be possible to describe the procedures used for various classes of variables and to provide illustrative information on the effects of imputations (e.g., before-and-after distributions for selected variables). In addition, documentation is needed for variables in the data files that result from a process of recoding other variables.44 Again, integration of the data and documentation in a database management system may well facilitate the development of useful descriptions of imputation procedures as well as documentation of recoded variables.

43 Documentation has also never been provided for the elaborate routines that are used to edit inconsistent replies or, in some cases, to supply values for missing responses by means of an edit rather than imputation. These routines, which are highly specific to individual variables, present a daunting documentation task. A benefit from the use of CAPI technology for SIPP should be that inconsistent replies are either resolved in the field or accepted, thus minimizing the need for after-the-fact editing.

It is also important to develop means to more frequently update documents such as the SIPP Users' Guide that provide important contextual information. The material in a well-formulated, comprehensive guide can be invaluable in orienting users to the data and alerting them to processing and analytical pitfalls. The limited background information carried in the codebook or documentation of individual variables and codes is not sufficient for these needs. Only two editions of the SIPP Users' Guide have been issued to date, even though SIPP has gone through many changes since 1983, and not all of those changes are well reflected in the latest edition (Bureau of the Census, 1991e). Most notably, there is little information provided about the longitudinal panel files, although they represent a widely used, and complex, data product from SIPP. This deficiency needs to be remedied.

Similarly, we encourage the Census Bureau to evaluate and determine ways to enhance the text in SIPP reports that is intended to educate and warn readers about the data contents. Cross-references to such other documents as the SIPP Users' Guide and the SIPP Quality Profile are helpful, but many users will not seek out those references; hence, it is important to provide as much pertinent information in the report itself as possible.
We have commented on the valuable nature of the various ancillary informational and instructional materials (e.g., working papers, compilations of professional association papers) and training programs that were developed for SIPP. We urge the Census Bureau to restore and enhance these programs to serve the growing community of SIPP users. The published research report series that we recommend above will also play a valuable role in this regard.45 Preparation of a complete on-line bibliography that includes relevant Census Bureau staff memoranda that would not otherwise be known to most users is also an idea to consider.46

44 Work is in progress under a joint statistical agreement between the Census Bureau and the University of Michigan to develop documentation for the longitudinal imputations in the SIPP panel files, and the Census Bureau expects to make arrangements with another organization to obtain documentation for the cross-sectional imputations and edits. Also, work is in progress by Social and Scientific Systems, Inc., under contract to the Census Bureau, to develop documentation for recoded variables.

45 The research report series will not, at least until it is well established, substitute for the SIPP Working Paper series, which makes available the work in progress of outside analysts as well as Census Bureau staff.

Finally, we believe it is important for the Census Bureau to take steps to ensure that there are effective channels for individual users to communicate both problems and suggestions for SIPP data products and documentation and to obtain timely feedback from Bureau staff (see Chapter 8 for a discussion of more formal advisory mechanisms). Because of the decentralized system of operations at the Census Bureau for SIPP and other surveys, it has not always been clear to users which staff members to consult about problems and suggestions. Even when a responsive staff member has been reached, it has not always been clear that there is an effective, timely system of internal communications within the Census Bureau to ensure that all relevant staff members (such as those in data processing and data user services) are informed and able to take appropriate action. Nor is it always clear that there are effective means of informing the user, or other users, of the reasons for the problem and the nature of the proposed solution or of the response to a suggestion.

The recent establishment of a SIPP liaison position in HHES is helpful in this regard, as is the use of the APDU SIPP Supplement as a vehicle to reach users. We urge the Census Bureau to keep a vigilant eye on its user-staff communication channels and act promptly to keep them functioning in an open and timely manner. The upcoming redesign of SIPP, which will entail changes in data products and documentation, makes it all the more important to have good means of communication with individual users and the user community as a whole.

Recommendation 6-4: The Census Bureau should work to improve documentation and related user information services for SIPP.
Priority improvements include:
· making use of CAPI and database management system technology to fully integrate documentation (including frequency counts for variables) and data;
· developing documentation for recoded variables and the types of imputations that are performed for missing data in SIPP;
· developing means to update key explanatory documents, such as the SIPP Users' Guide, on a more frequent basis;
· restoring and expanding information and training programs, such as training sessions, working papers, and compilations of professional society presentations; and
· maintaining effective channels of communication for users to feed back problems and suggestions and learn of the Bureau's response, and for users to be informed of new developments in the survey and its data products.

46 The SIPP ACCESS project developed such an on-line bibliography of SIPP working papers, presentations, and memoranda, which could serve as a model.
