Survey Measurement of Disability
This chapter first provides a brief summary of the general features of the National Study of Health and Activity (NSHA) as planned and the experience to date in the planning and development of the survey. The remainder of the chapter is structured around the key statistical issues of measurement facing such complex surveys. For each issue the chapter describes the basis of the issue, gives examples of the issue as illustrated by the NSHA, and then draws more general implications for the Social Security Administration (SSA) research agenda in work disability.
As stated in Chapter 1, NSHA is a response to the recommendation made by the Board of Trustees of the Federal Old Age and Survivors Insurance and Disability Insurance Trust Funds (DHHS, 1992). SSA considers NSHA as the cornerstone of its long-term disability research plan aimed at understanding the growth of the disability programs. It is also needed to answer policy and research questions about the nature and extent of disability in the United States. SSA also needs to know the magnitude and characteristics of the population with disabilities who may be eligible for benefits and the factors that keep them employed. It needs answers to these and other questions in order to project future trends in its disability programs with a degree of confidence.
A major component of the committee’s deliberations has been to evaluate the NSHA—its information goals, the process of developing measurements to meet these goals, the method of data collection, and the sample selection and allocation required to adequately represent the potential recipients of disability benefits. The committee has focused on
measurement issues bearing on the Social Security Administration’s information needs, on the adequacy of the research design, and on the implementation plan for the NSHA. The committee issued two interim reports (IOM, 1997a, 1999b) on its findings and conclusions based on its review. The first interim report provided a preliminary review of the general features of the proposed survey design, data collection plans, coverage, sampling plans, and operational decisions as described in the scope of work prepared by SSA in the draft request for proposals (RFP) for the conduct of NSHA. The committee believed that SSA needed to make important decisions about the survey design, the research and development work for the survey, and other basic features before issuing an RFP for the survey. It also discussed some limitations in the efficiency of the sampling plan when judged against accepted statistical principles and practices. The committee’s third interim report reviewed and provided guidance on the sample design, instruments and procedures, and response rate goals for the pilot study. It also commented on the time line established by SSA for initiation of each phase of the survey. Both reports provided SSA with specific and detailed guidance on various aspects of the survey. SSA has responded by altering various features of the survey. All of the committee’s recommendations made in these reports can be found in Appendix C.
THE NATIONAL STUDY OF HEALTH AND ACTIVITY
The National Study of Health and Activity is a complex, national sample survey designed to estimate the number and characteristics of a broad range of people with disabilities that affect their ability to work and carry out activities of daily living. SSA has contracted with Westat to conduct the survey. As originally conceived, the principal information goals of the NSHA were to
Estimate the total number and characteristics of people who are severely enough impaired that, but for work or other reasons,1 they would meet SSA’s statutory definition of disability. (This group would represent the universe of potentially eligible nonbeneficiaries who could apply and meet the current criteria, but who are not now receiving benefits.)
Identify the number and characteristics of people who are not eligible under the current SSA definition of disability, but who could be included as a result of any changes in the disability decision process.
Identify the factors (e.g., accommodations, social support, and other factors) that permit persons with similar impairments, who could qualify for benefits, to continue working.
Examine the variables needed to monitor and assess in a cost-effective manner future changes in the prevalence of disability.
In addition, SSA plans included simulating the disability applicants’ folders developed at the Disability Determination Service (DDS) level using measures collected from the survey.
While the NSHA was being developed, efforts to redesign the disability decision process were on a parallel but separate track. NSHA assumed the additional roles of evaluating the proposed redesigned process and of serving as a source for testing functional assessment instruments and the decision process itself; the original goals and design of the study were modified accordingly. This part of the NSHA design was subsequently dropped when SSA decided to no longer pursue the redesign initiative.2
More recently the survey has assumed a further role: obtaining data to explore whether people with disabilities support SSA’s Disability Employment Strategy, an initiative designed to encourage people with disabilities to continue to work, or to leave the rolls and return to work, by providing incentives to keep more earned income relative to benefits. SSA is currently assessing the impact of the Ticket to Work program, which would allow Social Security Disability Insurance and Supplemental Security Income beneficiaries to keep $1 for every $2 earned. Another information goal added for the survey is to identify the effects of planned or possible increases in the retirement age on the disability program.
General Features of the NSHA Design
The sample design for the NSHA is driven by the following four core objectives (Westat, 1999b, p. 5). The design should yield samples of sufficient size to produce statistically precise estimates for
various subgroups of working-age people with disabilities severe enough to be eligible for disability benefits for SSA purposes if they applied;
a “borderline” group of people with disabilities, sufficient to permit estimates of the number and characteristics of those who might become eligible, or cease to be eligible, if the current SSA disability decision criteria are altered;
people with only mild or no disabilities, sufficient to permit comparisons with the population with disabilities on measures of physical and functional performance and medical conditions; and
people currently receiving disability benefits under the Social Security Disability Insurance (SSDI) program and/or the Supplemental Security Income (SSI) program.
In late 1999, SSA decided to abandon the redesign initiative. Chapter 6 of this report discusses the redesign initiative and the decision by SSA to shift away from it.
The sample for the NSHA is a dual-frame, multistage, stratified probability sample design. The first stage is a stratified sample of primary sampling units (PSUs) selected with probability proportional to size. Within the PSUs, households with persons 18–69 years of age are subsampled at rates designed to yield a nationally representative sample.
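The first-stage selection described above can be illustrated with a short sketch. This is not Westat’s actual procedure, only a generic systematic probability-proportional-to-size (PPS) draw; the PSU names and size measures below are invented for illustration.

```python
import random

def pps_systematic(psus, n, rng=None):
    """Systematic PPS draw: each PSU's chance of selection is
    proportional to its size measure (here, a population count)."""
    rng = rng or random.Random(12345)
    total = sum(size for _, size in psus)
    step = total / n                       # sampling interval
    start = rng.uniform(0, step)           # random start in [0, step)
    points = [start + i * step for i in range(n)]
    chosen, cum = [], 0.0
    for name, size in psus:
        lo, cum = cum, cum + size          # cumulate sizes along a line
        chosen += [name for p in points if lo <= p < cum]
    return chosen

# Hypothetical PSUs (e.g., county clusters) with population sizes
psus = [("PSU-A", 100_000), ("PSU-B", 300_000),
        ("PSU-C", 250_000), ("PSU-D", 350_000)]
sample = pps_systematic(psus, n=2)
print(sample)  # two PSUs; larger PSUs are selected more often
```

In an actual design the size measure would be chosen so that subsequent within-PSU subsampling yields roughly equal overall selection probabilities for households.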
The sample sizes appear to be driven primarily by the first objective and by cost considerations. With those two factors in mind, SSA set a target of identifying a sample of about 3,090 nonbeneficiaries with severe disabilities (the likely eligible group) out of a total sample of about 5,665 persons. Severe impairments are relatively rare in the general population. In fact, the severity and prevalence of a disabling condition are inversely related; the higher the prevalence of a condition, the lower the severity, and vice versa (LaPlante, 1991). Because SSA’s eligibility criteria tend to filter out people with less severe disabilities, SSA is faced with many low-prevalence disabling conditions, not all of which can be screened adequately into the sample. The exceptions may be mental conditions and low-back conditions. SSA is cognizant of this situation and therefore has built into its sampling plan a provision for oversampling persons with severe disabilities.
Accordingly, the sample was conceived to contain
a “core” group of nonbeneficiaries with severe disabilities (about 3,090);
persons with significant but lesser disabilities, the “borderline” cases (about 1,545);
nondisabled persons (about 515); and
current SSDI and/or SSI disability beneficiaries, who will be included primarily for the purpose of benchmarking the distinctive characteristics of the core group (about 515).
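The planned allocation can be tallied directly from the figures above; the group labels below are shorthand for the four strata.

```python
# Planned NSHA sample allocation across the four study groups
allocation = {
    "core nonbeneficiaries with severe disabilities": 3090,
    "borderline cases": 1545,
    "nondisabled persons": 515,
    "current SSDI/SSI beneficiaries": 515,
}
total = sum(allocation.values())
print(total)  # 5665, the total NSHA study group

# Share of the sample devoted to the core group
core_share = allocation["core nonbeneficiaries with severe disabilities"] / total
print(round(core_share, 3))  # about 0.545
```

The allocation is deliberately disproportionate: over half of the sample is reserved for the core group, which is rare in the general population.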
The first group, a core group of nonbeneficiaries, would consist of persons whose impairments are severe enough that they would likely be eligible for disability benefits if they applied. The other subgroups—current beneficiaries, people with lesser impairments (the “borderline” group), and nondisabled persons—are to be included in the survey to ensure full coverage as well as to provide the data needed to meet the NSHA objectives.
Data Collection Plans
Data collection for the NSHA involves
a screening interview of a household respondent;
a personal interview and physical performance tests;
an extensive medical and, if needed, psychological examination; and
a series of core and special medical tests.
In addition, SSA would obtain all medical evidence of record identified by the respondent, as well as third-party reports on all persons in the sample, to supplement information from the interviews and medical examinations in order to determine whether the person meets SSA’s current definition of disability.
SSA’s assumptions about the sample size that would have to be screened in order to obtain the required 5,665 persons distributed disproportionately in the four strata for the various components were based on achieving the following response rates:
90 percent for the initial screening interview;
90 percent for the subsequent in-person interview and medical examination among those screened; and
80 percent overall response rate for the combined interview and medical examination components.
Assuming that these high response rates could be achieved, Westat estimated that a sample of about 98,095 persons in about 57,712 households would be sufficient to yield 5,665 persons for the NSHA study group.
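These figures can be checked arithmetically. The eligibility-and-selection rate backed out below is not stated in the plan; it is derived here only to illustrate how the response-rate assumptions and the screening yield fit together.

```python
# Stated response-rate assumptions
screen_rr = 0.90       # initial screening interview
exam_rr = 0.90         # in-person interview and medical examination
overall_rr = screen_rr * exam_rr     # 0.81, reported as ~80 percent

# Stated sample figures
target = 5_665         # required NSHA study group
persons = 98_095       # persons to be screened
households = 57_712    # households containing them

print(round(overall_rr, 2))            # 0.81
print(round(persons / households, 2))  # ~1.7 persons per household

# Implied rate at which screened, responding persons enter the study
# group (a derived quantity, not a figure from the survey plan)
selection_rate = target / (persons * overall_rr)
print(round(selection_rate, 3))        # ~0.071
```

In other words, the large screening sample follows from the rarity of the target groups: only about 7 percent of screened, responding persons were expected to end up in the study group.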
The Pilot Study3
In response to a recommendation in the committee’s first interim report (IOM, 1997a), plans were developed for a large, comprehensive pilot study preceded by extensive testing before the conduct of the national study. These plans included a comprehensive series of tests and experiments covering all aspects of the survey operations, design, response rates, and the content and effectiveness of the questionnaires before the start of, and during, the pilot study. A sample of approximately 13,200 households was expected to be contacted in eight PSUs in the initial screener.
The purposes of the pilot study were to experiment with several data collection methods and procedures and to ensure that the questionnaires were clear and concise, that all procedures ran smoothly and efficiently, and that the burden and discomfort placed on the respondent were kept to a minimum. Other purposes included testing the effectiveness of the screening instruments and measuring the accuracy of the screening algorithm; evaluating procedures to maximize response rates—both total and item response; and developing estimates of prevalence rates to determine the final sample sizes for the main study. Finally, the pilot study was designed to test the operational procedures for medical examinations, including measuring the reliability of physician and nurse practitioner examinations; to evaluate medical examinations performed in the home and in mobile examination centers (MECs); and to measure the reliability and validity of the simulated disability decision process. It was also designed to test instrument designs for the DES and to test the screens and questionnaires more thoroughly; these tests concerned the screener methods used to allocate the general population into the four study groups.
The Time Demands to Achieve Survey Quality
A major lesson learned from the experience in planning and developing NSHA is that before starting a national survey, sufficient time should be allowed to (1) conduct and analyze the results of the various pretests,
focus groups, and cognitive tests; (2) conduct a comprehensive pilot study with the planned and other built-in experiments; and (3) analyze and test alternative solutions in areas that need resolution as a result of the pilot study.
NSHA implements survey measurement of complex concepts in the absence of a scientific consensus on what measures are best suited. It is on the frontiers of survey design. When survey measurements must be crafted without the benefit of years of prior development, great care must be taken in assessing whether they measure what is intended. Similarly, screening protocols and physical measurements require time for development and evaluation prior to their use in production settings.
Because of the committee’s significant uncertainty about the effectiveness of the survey instruments in measuring disability, the committee strongly recommended in its interim report (IOM, 1997a) that SSA set aside a significant amount of time and resources for NSHA questionnaire design and testing. The committee also recommended a rigorously designed field experimentation and development phase of the survey to identify mechanisms for enhancing participation in the survey, to establish the validity of the measures obtained, to improve the quality of the medical records obtained, and to guide decisions on issues relating to medical examinations. The rush to launch the national survey, however, caused serious logistical inflexibility during the various phases of the survey.
The pilot study is an example of allowing inadequate time for the development and testing that is required. SSA planned to complete developmental work and conduct a pilot six to nine months after award of the contract for the survey. Following the committee’s recommendations, SSA developed extensive plans for a large comprehensive pilot study including testing all exploratory information and procedures through focus groups, cognitive laboratory tests, and pretests.
The pilot study was conducted in the first half of 2000 with about 12,000 initially selected households and a completed database of nearly 4,000 cases. It was conducted in four counties (and not eight as previously planned) selected for their geographic and regional diversity. Only a short period of time was allowed in the schedule for development, testing, and making the necessary modifications before launching the national survey. Decisions had to be made throughout the process, and the results of the pilot study made it obvious that there was insufficient time to resolve issues and test alternatives before launching the national survey.
Several reports evaluating the results of the pilot study were prepared by Westat identifying corrective revisions made during and immediately following the pilot study, and recommendations to SSA for further revisions that would be tested before implementation in the main survey. The revisions were focused on achieving two goals: (1) reducing the
burden on respondents and (2) maximizing the capacity of the items to produce the data needed to answer the research questions posed by SSA. The revisions therefore became an iterative process aimed at striking a balance between these two goals, at times with a possible net result of no reduction in respondent burden. Several small-scale pretests were planned; some were already under way or had been completed at the time of this writing. These pretests should provide feedback on instrument length, flow, item clarity, and item sensitivity.
As a result of the pilot study experience, the data collection plans are being restructured, and the mode of data collection changed because of poor results with the random digit dialing (RDD) sampling frame. Westat will be using area sampling and will try to get telephone numbers for the sampled persons. If successful in obtaining telephone numbers, the screening interview will be conducted by telephone. If unsuccessful, a field interview will be administered. Westat expects to get about 25 percent of the responses by telephone. The screening interview also is being revised with the goal of reducing the respondent burden to about 20 minutes.
One of the primary concerns expressed by the DDSs was that the information presented to them in the NSHA data packet from the pilot study did not always seem complete. This undermined their confidence in making a simulated disability determination prior to the full survey. SSA and Westat are planning to conduct a small “end-to-end” test involving about 100 persons, almost all with known disability status. The main purpose of this test is to check that the revisions made to the data collection procedures do in fact improve the completeness of the data collected on respondents to determine medical and vocational eligibility for SSA benefits.
In its third interim report, the committee concluded that it seriously doubted that enough time had been allotted to determine what changes were needed and to implement those changes before the conduct of the national survey. In order to assess the findings of the pilot study and resolve the problem areas in a satisfactory manner, more time will be needed between the completion of the pilot study and the start of the national study than the two to three months allocated. The time frame provided little flexibility in the amount of time available to make deliberate and rigorous decisions on issues of design, procedures, and questionnaires if problems were uncovered during the pilot study. The committee recommended that SSA revise the project schedule to allow significantly more time to plan and analyze the pilot study and test alternative solutions for problem areas before starting the national study. Unless the period for testing, analysis, and development is extended, SSA could encounter serious problems during the national survey. The committee recognizes that increasing the time and level of research between the pilot
study and the national survey may have cost implications. The committee understands that SSA and Westat are already addressing many of the issues raised in this report. The committee notes that since then, SSA has approved a significant extension of the schedule to adequately evaluate the results of the pilot study and to test alternative solutions for problem areas before starting the national study.
Given the complexity of the NSHA, the committee in its interim report (IOM, 1999b) also suggested the conduct of a dress rehearsal once all the issues are resolved and before starting the national study. No time had been allocated for a dress rehearsal in the timetable for the study. In response to the committee’s recommendation, however, a dress rehearsal is included and will be the last step before nationally representative data are collected in the main survey. It is slated to begin only slightly ahead of data collection in the first year of the main survey. Preliminary work on the dress rehearsal is expected to begin in March 2002. The actual interviews and examinations will be conducted between December 2002 and January 2003. As of July 2001, plans called for the field work for the main survey to be carried out over multiple years beginning in early 2003. The full NSHA sample of 80 PSUs will be divided into two or more replicates, each of which will be nationally representative. This design will make it possible to assess response rates and to obtain preliminary estimates at the end of the first replicate.
In summary, not allocating sufficient time in the beginning for research, development, and testing prior to launching a major complex survey has resulted in the need to repeatedly revise the timetable for the various steps in the development and conduct of the survey. To illustrate: the original schedule for planning, development, and completion of the survey as reflected in SSA’s request for proposals for the contract covered a total of two and a half years, from January 1998 to August 2000. Ten months were allowed for the award of the contract, planning, and development, and a pilot study was planned to begin only 10 days later, with no time for iterative testing and experimentation before the pilot and between the pilot and the start of stage one of the survey. In response to the committee’s concerns and recommendations issued in its first interim report (IOM, 1997a), the pilot study was delayed, but only by about a month. SSA also assumed that all analysis and revisions could be done during the pilot study and so allowed only two to three months from the end of the pilot study (November 2000) to the start of the main survey (January 2001); very limited time, therefore, was allowed for research, development, testing, and making the needed changes. Although some decisions on instrumentation could be made prior to the end of the pilot study, a thorough analysis of issues was not possible until the end of the data collection phase in the pilot study. Even if analysis of some tests and experiments
could have begun earlier in the analysis phase of the NSHA pilot study, additional time would have been needed to examine the implications and plausibility of several different “adjustments” in the problem areas. As indicated earlier in this chapter, the results of the pilot study made it clear that revisions and more iterative testing of the revisions were needed. The most recently revised schedule available to the committee called for the end-to-end test data collection from December 2001 to February 2002; dress rehearsal data collection from December 2002 to January 2003; and the main study to start early in 2003. Thus, the survey that was originally planned for the middle of 2000 is now scheduled to start in 2003 and assumes a five-year data collection plan.
NEEDED RESEARCH IN THE MEASUREMENT OF DISABILITY IN A SURVEY CONTEXT
The experience to date with the NSHA, as well as work with other surveys that include measurement of disability, makes clear that the measurement of people with work disabilities is complex. The complexity stems, in part, from differences in conceptual models of the enablement–disablement process and alternative interpretations of the various conceptual models discussed in Chapter 3. In addition, there exists an incongruity between the various conceptual models and SSA’s statutory definition of work disability. The various constructs do not necessarily identify the same population. Finally, NSHA must address both the estimation of how many persons might apply for SSA benefits and the number that would be classified as persons with work disabilities in the SSA benefits decision process.
All complex surveys such as the NSHA require trade-offs between the cost of the survey, the timeliness of the survey statistics, and the quality of the statistics derived from the survey. For example, quickly mounted surveys, especially in new fields, can rarely produce high-quality statistics, although they may save the sponsor money. Quality in survey statistics, in turn, has a well-established structure in surveys, involving closeness of the responses obtained to the true underlying attributes of sample persons, on the one hand, and the ability of the resulting set of respondents to represent the characteristics of the full U.S. population, on the other hand.
Although a number of research activities are under way worldwide that address issues related to statistical error associated with the measurement of disability, these efforts are but a beginning with respect to understanding the properties of measurement error associated with disability-related questions. In addition, other sources of error are, for the most part, not addressed in current research activities. The committee sponsored a
workshop in May 1999 to bring together disability researchers and experts in survey methods to discuss conceptual and survey design and measurement issues, and to identify unanswered questions of measurement of persons with work disabilities (IOM, 2000). The discussion revealed several gaps in survey methods and measurement of work disability, leading to a framework for long-term research for SSA and others in the field. This framework encompassed four broad areas of research, paralleling the stages of survey measurement: (1) coverage error, (2) measurement error, (3) nonresponse error, and (4) the development of measures of the environment. Each of these areas is discussed briefly below, with specific references to NSHA.
Coverage error is produced by the failure to include all eligible people on the list or frame used for identifying and sampling the population of interest. The use of screening questions to identify the population of interest leads to an additional source of coverage error—the exclusion of persons due to inaccurate classification at the time of the screening.
Household-based surveys by definition eliminate from the sampling frame those members of the population who are homeless, as well as those living in institutions. Those residing in group homes, assisted-living facilities, and other new types of residences may or may not be included in the frame, depending on how the distinction is made between institutional and noninstitutional residence. SSA likewise has decided to exclude from the NSHA the institutionalized population and the segment of the homeless population who cannot be found in households or other quarters at the time of the interview.
However, the question of including or excluding homeless people in the NSHA is not as straightforward as in other household surveys. (The committee discussed the issues surrounding the inclusion of the homeless and institutionalized populations in its interim report; IOM, 1997a.) The committee recognizes the likelihood of relatively high rates of disability among homeless and institutionalized populations, and the negative bias resulting from their exclusion. The extent of this coverage error, when attempting to describe the entire U.S. population with respect to disabilities, is unknown. It is likely to be a function of the type of disability, with estimates of the population with mental retardation or mental health problems most likely subject to the highest rates of coverage error. Empirical data are needed to estimate the differences in the rate
and characteristics of the population with disabilities based on household surveys as compared to the entire population.
At the same time the committee has serious questions about the operational and methodological issues involved in attempting to include homeless and institutionalized populations in NSHA. Can reliable information be obtained, feasibly and economically, from homeless and institutionalized populations? Techniques have been developed to locate, sample, and obtain data about each of these populations. Yet locating and screening respondents for eligibility require special efforts involving careful, long-term planning, a large amount of staff resources, considerable time, and high levels of funding. Homeless people present problems in scheduling, interviewing, and administering performance tests and medical examinations. Maintaining contact with them and getting them to participate in adequate numbers in the medical examination also would be problematic. Likewise, obtaining permission from family members for the participation of people in long-term care institutions who are not able to grant permission themselves may be difficult.
The committee concurred with SSA that adding homeless and institutionalized populations to the sampling frame at this time would not be cost-effective. Much research and testing are required to develop the necessary protocols and procedures for conducting the NSHA among homeless people and those living in different types of institutions. The costs of sampling and interviewing in the various types of institutions would be prohibitive. Thus, limiting the target population to the household population seems appropriate. In its earlier report the committee urged SSA to undertake research as part of its long-term research plan leading to the inclusion of these populations in subsequent studies or a separate supplement to future surveys such as the NSHA.
Effects of Alternative Approaches to Screening
The use of a screening instrument to identify the population of interest often affects coverage error. The committee believes that three areas of research are particularly important with respect to the use of screening instruments:
the effect of alternative wording of questions on the identification of the population—given the discrepancy among rates of disability evident in the literature, establishing the reliability of screening items is particularly important,
comparisons of estimates based on simultaneous screening and interviewing with those based on separate screening operations—this research should also focus on understanding the mechanism by which the two operations result in different estimates, and
the effect on estimates when a subsample of cases classified as negative according to screening questions is included and re-screened as part of the extended interview (this approach is taken by Statistics Canada in its Health and Activity Limitations Survey).
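The third research area above can be sketched numerically. This is a simplified, unweighted illustration of how re-screening a subsample of screen-negatives yields an adjusted prevalence estimate; the counts are invented, and a real survey would apply sampling weights throughout.

```python
def adjusted_prevalence(n_screened, n_pos, n_neg_subsample, n_neg_found):
    """Adjust a screener-based prevalence estimate using a re-screened
    subsample of screen-negatives, as in the Statistics Canada approach
    (sketched without sampling weights)."""
    p_pos = n_pos / n_screened               # naive screener prevalence
    fn_rate = n_neg_found / n_neg_subsample  # estimated false-negative rate
    # Add back the share of screen-negatives the screener missed
    return p_pos + (1 - p_pos) * fn_rate

# Hypothetical numbers: 10,000 screened, 700 screen positive;
# of 500 re-screened negatives, 15 turn out to be eligible.
print(round(adjusted_prevalence(10_000, 700, 500, 15), 4))  # 0.0979
```

Even a small false-negative rate can move the estimate noticeably, which is why the committee flags screener misclassification as a coverage issue rather than a minor measurement detail.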
SSA in its survey plans had specified the use of telephone number frames for NSHA. Households with telephones were to be selected by list-assisted RDD sampling. This decision by SSA appeared to be driven primarily by cost considerations. The choice of sampling frame determines the nature of noncoverage error in any survey. Common choices in surveys in the United States are area frames, offering theoretically complete coverage of households and institutions; dual-frame designs combining telephone and area frames; dual-frame designs combining area and institutional list frames; and telephone number frames.
The committee expressed serious concerns about the adequacy of coverage of the general population based on RDD sampling. Noncoverage of persons in households with no telephones should be of particular concern for persons with disabilities. In addition, there was no indication of how SSA will deal with people with hearing loss, communication disorders, mental and cognitive impairments, and emotional disturbances, who are not likely to be covered well in a household frame.
Approximately 5 percent of households in the United States are without telephones. Moreover, persons in households without telephones have a higher rate of disability (17 percent) than those in households with telephones (15 percent) (Thornberry and Massey, 1988; LaPlante and Carlson, 1996). The availability of telephones also is negatively correlated with income.
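The figures just cited imply a small but systematic noncoverage bias, which can be worked out directly. The calculation below uses only the percentages in the text; it ignores differential response and is meant solely to show the direction and rough size of the bias.

```python
p_no_phone = 0.05      # share of households without telephones
rate_no_phone = 0.17   # disability rate in non-telephone households
rate_phone = 0.15      # disability rate in telephone households

# Full-population rate vs. what a telephone-only frame would measure
true_rate = (1 - p_no_phone) * rate_phone + p_no_phone * rate_no_phone
bias = rate_phone - true_rate

print(round(true_rate, 3))  # 0.151
print(round(bias, 4))       # -0.001, a 0.1-point understatement
```

The overall understatement is small here because non-telephone households are few, but the bias concentrates in exactly the low-income, higher-disability groups the NSHA most needs to cover.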
In addition, telephone sampling and screening would likely offer lower response rates than face-to-face screening (Groves, 1989; Lessler and Kalsbeek, 1992). As a consequence the screening sample would need to be increased to compensate for the losses from the sample because of nonresponse; the higher nonresponse rates are likely to increase the risk of bias in the estimates. Thus, although telephone screening may be less expensive, some aspects of the quality of the data collected are more suspect. Careful study of mechanisms to increase the screener response rate is required. These mechanisms might include incentives, refusal conversion efforts, switches to alternative modes of data collection, and so on.
SSA also faces the problem of response burden for the total household if more than one person in the household has a disability and proxy reporting is not encouraged. Similar problems will have to be faced in the main interview and in administering medical examinations and performance tests to persons with severe disabilities. The effect on response rates and bias could be significant. The committee advised in its first interim report (IOM, 1997a) that SSA should test several options for dealing with these problems in pretests prior to the start of the national survey.
In terms of coverage of the working-age adult population, survey response rates, and some features of the screening measurement, the preferred design is an area probability, face-to-face survey. It is also clear that such a design would cost more than the alternative proposed by SSA. The additional cost for a survey of this importance and complexity should be considered in the context of the size of the programs themselves (SSDI and SSI) and the implications of poor or imprecise information. The committee therefore urged a careful review of the costs of a full area probability survey, in light of the cost savings proposed in later recommendations.
These concerns about the exclusion of non-telephone households led the committee to recommend in its first interim report that NSHA be based on a design offering full coverage of the U.S. household population of adults. The committee recognized that including persons in non-telephone households would increase the costs of NSHA. It therefore recommended that if resources were lacking for an area probability sample with face-to-face interviews, the Social Security Administration should use a multiple-frame design combining a statistically optimal mix of RDD and area frame samples of the general population, followed by face-to-face interviews of the eligible population.
The NSHA pilot study demonstrated that although the cost of using a sample from the RDD frame was lower than that of an area frame, the resulting response rates (a risk indicator for nonresponse error, reviewed below) were much lower. After the pilot, and consistent with the committee's earlier recommendation, Westat recommended to SSA that an area frame design be used, offering greater coverage of the household population and likely better response rates, though at higher cost.
The issue of the use of proxies arises in this survey because a large number of people in the sample will have disabilities or some kind of functional limitation. Westat plans to avoid proxies whenever possible. However, it may be necessary to collect information from proxies to ensure the highest possible response rate and to obtain as much informa-
tion as possible from people who have difficulty responding on their own.
Westat's plans call for a household reporter to answer the initial screener questions about all working-age adults in the household. Westat is concerned, however, that such reporters may not be able to answer questions about the mental and cognitive health of other household members accurately and honestly. Westat is also concerned about the risk of very low response rates if it attempts to interview each person in the household about his or her own mental and cognitive health. During the follow-up screener and the comprehensive survey interview, Westat plans to use medical exam proxy assistants to interpret for and assist sample persons with medical needs or language problems (Westat, 1999c).
Proxy interviews have varying levels of accuracy depending on the topic of the interview and the relationship of the proxy to the subject. Westat believes that the use of proxies will make the initial screening process oversensitive; for purposes of the initial screener, however, oversensitivity is acceptable. Beyond the initial screener, Westat plans to avoid proxy reporters but does expect to conduct proxy-assisted interviews. The decision to use or not use a proxy respondent will be made when the sample person is initially contacted. If the respondent is available and able to complete the interview, interviewers will be discouraged from accepting a proxy (IOM, 1999b; Westat, 1999c).
The committee believes that the issue of proxy respondents is an area for fruitful research as noted below.
Most users of survey data know that larger samples reduce sampling error, the uncertainty that arises because survey results are based on only a subset of the full target population. Sampling error can also be reduced by stratifying the frame into separate subpopulations and making independent selections from each subpopulation, or stratum. Conversely, the use of clustered samples (e.g., sampling together persons who live in the same geographic area) and the assignment of vastly different probabilities of selection can increase the instability of survey statistics due to sampling error. NSHA samples will have to be clustered, given the use of the MECs to conduct the medical examinations and tests.
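The cost of clustering can be approximated with the standard design-effect formula, deff = 1 + (m - 1)ρ, where m is the average number of respondents per cluster and ρ is the intracluster correlation. The Python sketch below is illustrative only; the numeric inputs are invented, not NSHA design parameters:

```python
# Illustrative sketch: design effect and effective sample size under
# cluster sampling. deff = 1 + (m - 1) * rho; n_eff = n / deff.

def design_effect(avg_cluster_size, rho):
    """Variance inflation from clustering relative to simple random sampling."""
    return 1.0 + (avg_cluster_size - 1.0) * rho

def effective_sample_size(n, avg_cluster_size, rho):
    """Size of a simple random sample with equal precision."""
    return n / design_effect(avg_cluster_size, rho)

# Hypothetical values: 3,000 respondents examined in clusters of 20
# (e.g., around examination sites), intracluster correlation 0.05.
deff = design_effect(20, 0.05)
n_eff = effective_sample_size(3000, 20, 0.05)
print(round(deff, 2), round(n_eff))  # 1.95 1538
```

Even a modest intracluster correlation nearly halves the effective sample size in this example, which is why clustered designs such as the MEC-based examinations need larger nominal samples to meet precision targets.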
SSA assumed that the core group sample of 3,090 would be sufficient to support estimates for several subgroups of particular policy interest. These subgroups include potentially eligible nonbeneficiaries who are working; younger nonbeneficiaries with disabilities; nonbeneficiaries aged 62–69 years;
nonbeneficiaries with mental, emotional, or behavioral conditions; and nonbeneficiaries with disabilities from minority groups.
The committee expressed concerns in its interim reports about the adequacy of the size of the total sample and of the allocations among the four subgroups—nonbeneficiaries with severe disabilities, persons with significant but lesser impairments, nondisabled persons, and current beneficiaries—and questioned SSA about this disproportionate sample design and the basis for choosing the specific sample sizes for the four groups. The committee could not understand the logic that led to this particular disproportionate sample design. It believes that the targeted sample sizes would lack the condition specificity that SSA would require for estimation and analytical purposes. Even if SSA can achieve these planned sample sizes, the cells very likely will be much too small, especially if SSA stratifies on more than one disabling condition and/or demographic or socioeconomic characteristics such as age, gender, minority status, or working nonbeneficiaries with specific disabling conditions.
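The committee's concern about cell sizes can be illustrated with simple arithmetic. The cross-classification below is hypothetical (the category counts are invented for illustration, not SSA's actual stratification), but it shows how quickly even a few thousand cases thin out:

```python
# Illustrative sketch: how cross-classification shrinks analysis cells.
# All category counts below are hypothetical.

core_sample = 3090        # planned core group sample size cited in the text
n_conditions = 6          # disabling-condition groupings (hypothetical)
n_age_groups = 4          # age bands (hypothetical)
n_gender = 2
n_minority = 2            # minority status (hypothetical)

cells = n_conditions * n_age_groups * n_gender * n_minority
avg_cell = core_sample / cells
print(cells, round(avg_cell, 1))  # 96 cells, about 32 cases each on average
```

With roughly 32 cases per cell on average, and far fewer in rare combinations, condition-specific estimates would be very imprecise, which is the substance of the committee's concern.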
Similarly, the proposed sample size for the borderline group of persons with less severe disabilities may not provide sufficient analytical strength for assessing how alternative decisions and policies would affect outcomes. Differences in outcomes resulting from changes in policies or procedures are likely to be minimal, if present at all, for persons with severe disabilities, but real differences could show up among borderline cases under alternative conditions.
The committee expressed similar concerns in its third interim report and continues to have several questions and concerns about the adequacy of the total sample size and especially about the allocation of people among the four subgroups. The sample sizes may not support SSA’s requirements for estimation and analytical purposes. As stated above, the committee does not understand the logic that led to these sample sizes and allocations. It has not seen the statistical rationale for setting the sample size targets or the plans for analysis that would drive the sample and content of the survey.
Although adequate empirical data do not exist to measure the impact of nonresponse on estimates of persons with disabilities, the nature of a person's impairments or disabilities might easily result in differential nonresponse among members of the population with disabilities. This deficit in the literature suggests that a priority for nonresponse research is the assessment of differential nonresponse among persons with disabilities.
The role of gatekeepers and interviewers may represent sources of nonresponse error unique to the measurement of persons with disabilities. Gatekeepers may limit access to persons with disabilities who, if provided with the opportunity, might be quite willing to serve as respondents. The role of gatekeepers, their contribution to nonresponse, and the differential impact of gatekeepers for telephone surveys compared to face-to-face administration of interviews have never been addressed in the literature. Similarly, interviewers may classify sampled persons as incapable of serving as respondents, due to apparent cognitive, sensory, or other impairments. Research also is needed to address the extent to which such judgments by an interviewer result in nonresponse among the population of primary interest.
SSA had assumed that, with response rates of 90 percent for each component of the NSHA, it would achieve the planned sample sizes. The committee has stated repeatedly that the expected rates may be overly optimistic, especially for a population with disabilities. It raised these issues in its first interim report (IOM, 1997a); it reemphasized in its third interim report (IOM, 1999b) the problems that could arise as a result of sample selection, size, and allocation if adequate advance planning and testing are not undertaken.
The committee has learned recently that SSA is rethinking these targets. As a result of experience with the pilot study, SSA has reevaluated the response rates and now believes that response rates of 85 percent for the screening interview; 85 percent for the in-person interview; 90 percent for the medical examination; and an overall response rate of about 60 percent are more realistic to achieve. SSA also is now revising upward the sample size estimates on the basis of information from a number of sources including the simulation experience from the pilot study. This process will not be finished until the “end-to-end” test is completed. (Personal communication, John R. Kearney, Office of Research, Evaluation, and Statistics, SSA, March 21, 2002.)
Each of the NSHA survey instruments used in the pilot is lengthy and complex, thus creating a risk that respondents will be unwilling or unable to provide useful data to SSA. For example, SSA has noted that the Comprehensive Survey Interview will impose a burden on some respondents who have a complicated medical history, considerable income or assets, and a complex work history. The committee agreed and expected that other NSHA components will also impose a significant burden on these and other respondents. Another concern is the initial screener, because its results will be used to sort individuals into the four categories. For this
screener, one household member will be asked to respond to numerous questions, including questions about mental and emotional problems, for all household members 18–69 years of age. If the informant does not answer these questions correctly for all household members, individuals who have conditions that should result in their selection for the follow-up screener may be missed.
SSA and the committee agreed that, because of its length and complexity, the instrument would have to be shortened between the end of the pilot study and the start of the national study. SSA first must decide which questionnaire items are to be eliminated; the shortened version must then be evaluated and field-tested to ensure its viability as an instrument that can meet the study's goals. These steps will take several weeks or months to be done well. In its third interim report the committee recommended that SSA revise the project schedule to allow significantly more time to plan and analyze the pilot study and to test alternative solutions for problem areas before starting the national survey (IOM, 1999b).
Estimates of the population with disabilities appear to vary as a function of the essential survey conditions under which the data are collected: the mode of data collection, the wording of the specific question, the context of the question, the overall content of the survey, the survey's sponsorship, and the nature of the respondent providing the information (self versus proxy response).
Regardless of the type of impairment, the development of valid and reliable measures of disability, especially work disability, is a challenging undertaking; the episodic nature of mental and cognitive impairments and the social stigma attached to them make their measurement all the more difficult. Valid and reliable measures of participation in the social and economic environment are needed. Valid questions should reflect conceptual models that view work disability as a matter of degree, suggesting that disability be measured on a continuum rather than with the dichotomous measures used in many surveys.
Three areas of research are needed for developing valid and reliable measures of work disability:
Assessment of the effects of specific question wording and question context. This involves
research directed toward understanding respondents' comprehension of the key concepts within the question, such as “difficulty,” “work,” “performance,” and “ability”;
decomposing long questions used to screen for persons with disabilities and making comparisons between the approaches with respect to reliability, validity, and length of administration; and
assessment of the effect of context on estimates of the population, where context is broadly defined, ranging from subjective factors such as respondent mood to objective factors such as the survey sponsor, the questions immediately preceding the disability measures, and even the weather.
Assessment of the effects of self and proxy reporting: A limited empirical literature on the effects of self and proxy reporting of functional limitations suggests that the direction and magnitude of response error are related, in part, to whether the report is provided by the individual or by a proxy (see, for example, LaPlante and Carlson, 1996; Todorov and Kirchner, 2000).
Assessment of the effects of essential survey design features: Estimates of persons with disabilities or work disabilities vary as a function of essential survey design features. Examples include the survey's sponsorship, which could affect both nonresponse (the motivation to respond or not) and the measurement process (response editing and formation); the presence of others during survey administration, especially in the measurement of mental illness; the mode of interview; and the incorporation of new technology (e.g., audio computer-assisted self-interviewing) to enhance participation and privacy among persons with disabilities.
The Challenge of Measuring the Environment
A major gap between conceptual models of impairment and disability and existing survey measures is the inadequacy of survey questions for measuring the environment. Current data collection efforts, for the most part, fail to measure the environment and its impact, either as a facilitator of or as a barrier to participation in social and economic life.
Environmental factors are external factors that make up the physical, social, and attitudinal environment in which people live (Fougeyrollas, 1995; Friedman and Wachs, 1999; Schneider, 2001; Whiteneck, 2001). The classification of environmental features enumerated in the second revision of the International Classification of Functioning, Disability and Health (ICF; formerly the International Classification of Impairments, Disabilities, and Handicaps [ICIDH]) provides a well-defined architecture for developing questionnaire items designed to capture environmental factors that affect the disablement process. Among the environmental factors of importance in the ICF framework are products and technology; the natural environment; support and relationships; attitudes; and services, systems, and policies. Of interest with respect to disability is the extent to which environmental factors either facilitate or present barriers to participation in social roles. As part of the research to design questionnaires that map conceptually to the ICF coding framework, researchers are currently addressing the development of both objective and subjective environmental measures (Schneider, 2001).
The committee underscores the need to develop measures of both the physical and the social environments. The measurement of environmental context should examine both factors that accommodate impairments and those that serve as barriers. The development of objective measures of the physical environment may be facilitated by fostering collaboration with researchers in ergonomics and human factors engineering, fields in which a primary focus is the measurement of the environment.
To aid in the development of objective measures of the social environment, the committee notes the need to develop and test questions concerning social climate, barriers, and stigma. These questions are especially important for those with mental illness, but they are relevant for, and should be asked of, all persons with disabilities.
One of the challenges related to developing objective measures of the environment is the identification of a set of questions that can be asked of the general population. However, to fully understand either barriers to employment or factors that facilitate employment, questions must be tailored so as to be relevant to the individual’s situation. Ethnographic exploratory studies of workplace environments are one means by which to inform household measurement of accommodation and barriers. For those who are no longer working, questions that enumerate what accommodations would be necessary to facilitate, or what barriers prevent, participation in the workforce have to be designed and subjected to evaluation. Similarly, research is needed on developing subjective measures of both the physical and the social environments that either facilitate or limit participation.
In addition to research for developing such measures of the environment, research also is needed on two additional topics: (1) assessment of systematic differences in evaluating the environment among those for whom the environment is benign versus those for whom the environment is hostile and (2) assessment of the difference between self and proxy subjective reports of environmental conditions.
To summarize, the empirical literature examining measurement error associated with specific questions, albeit limited, suggests that items currently used to screen for or measure persons with disabilities have low reliability and questionable validity. The impact of both coverage error and survey nonresponse on estimates of the population with disabilities and work disabilities has not been addressed in the literature. In light of these points, the measurement of people with disabilities and work disabilities could be greatly improved by research directed toward one or more of the agenda topics above.
Although a number of research activities addressing response validity and reliability in the measurement of disability are under way in the federal agencies (Hale, 2001; Rand, 2001), these efforts are only a beginning with respect to understanding the measurement error properties of disability-related questions. Other sources of error identified above, most notably coverage and nonresponse error, are for the most part not addressed in current research activities. Without an understanding of the extent to which coverage error and nonresponse error affect estimates of work disability, it will be difficult for SSA to monitor the size and characteristics of the potential pool of applicants on the basis of survey data. SSA, in collaboration with other federal agencies, should engage in an ongoing program of research on measurement issues, taking into consideration the conceptual developments in the field.
One consequence of the research efforts designed to address measurement error is that, for subsequent rounds of NSHA and related data collection activities in the near and intermediate future, questionnaires incorporating measures of disability will be in a dynamic state. Changes to question wording and response options are likely as research reveals which question characteristics and design features yield higher-quality (more valid and reliable) measures of disability. Question wording used in the current NSHA for monitoring the pool of potential applicants for disability benefits, or models built on questions in current use, may become obsolete in the near future as surveys adopt new questions or design features to minimize response error.
Because SSA had not mounted an ongoing program of survey measurement of disability for many years, much of what it is attempting in NSHA is novel. New survey measurement demands careful, time-consuming development. For measurement involving questions, qualitative research probing comprehension by diverse respondent groups is needed. Cognitive interviewing techniques are used to examine the memory structures of respondents relevant to the material being measured. Computer-assisted interviewing software needs to be designed to improve memory cues and to reduce the psychological threats that produce measurement error. Reducing survey nonresponse requires that interviewers identify and address the concerns that different types of sample persons have about the survey request. Finally, all the components of the survey must be tested together in a pilot study or dress rehearsal.
Such research, conducted extramurally but guided by the mission of the agency, can provide the agency with proven measurement approaches when new concepts become integrated into the statutes guiding program designs. For example, the Disability Research Institute (DRI), established by SSA in May 2000, could serve as a useful vehicle for the conduct of the research discussed above.
FUTURE SURVEYS OF DISABILITY AND WORK
The enduring lesson of the NSHA for other survey efforts to be undertaken by SSA as part of the work disability program is clear: careful survey design and measurement require considerable development and field-testing prior to implementation. Cost savings that appear to arise when work is rushed are illusory, and corners can be cut only on the basis of careful, experience-based judgments and analysis. Delays in the original NSHA schedule, which accumulated over the course of the committee's interaction with SSA, often arose from unanticipated discoveries about the complexity of the survey design and implementation tasks. The total cost and total time of the project are likely greater than they would have been had more careful, deliberate developmental studies preceded the launch of the major national survey.
The committee has stated repeatedly, during the course of the study and in its interim reports, that the NSHA, if well designed, could be the cornerstone for long-term disability research. When completed it can be of fundamental importance to future analyses by SSA and other researchers. It will provide information to guide SSA in making decisions about its disability programs and will play a key role in projecting and understanding disability rolls in the future. Moreover, it will lay the groundwork for future surveys. Early in the study the committee strongly endorsed the conduct by SSA of a well-designed, carefully pretested, and statistically sound survey; it reiterated that position later in the study and reemphasizes its endorsement today. The value of the information diminishes with time, however, so it is critical that SSA update the comprehensive database at regular intervals. To ensure effective planning, SSA must examine the fundamental characteristics of who has work disabilities and how many more, or fewer, people will become eligible. SSA has not collected such information for more than 20 years, and it is long overdue. It is critically important that SSA not wait another 20 or more years before obtaining basic information so relevant to its policies and programs.
Recommendation 4-1: The committee recommends that prior to undertaking any future large-scale data collection effort, the Social Security Administration should allow sufficient time and provide adequate resources to
investigate, test, and incorporate conceptual developments; and
develop, pretest, pilot test, and revise measurement instruments and design.
In conclusion, the immediate need of the NSHA involves estimates of the size and characteristics of the pool of persons eligible for SSA disability benefits. A cross-sectional sample of the household population, drawn at a single point in time, provides useful estimates for such needs. When change over time is at issue, survey measurements must be repeated to provide estimates of change. When the only interest is whether the full target population has experienced a change in the prevalence of a phenomenon, an independent cross-sectional survey conducted at a later time provides useful change estimates. When the interest is in whether some types of individuals change while others do not, a longitudinal survey, with repeated interviews of the same persons, provides the most useful data.
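The precision advantage of a longitudinal design for change estimates follows from the variance of a difference: var(y2 - y1) = var(y1) + var(y2) - 2 cov(y1, y2), and the covariance term is positive only when the same persons are reinterviewed. A minimal Python sketch (the correlation value below is illustrative, not an NSHA figure):

```python
import math

# Illustrative sketch: standard error of an estimated change between
# two survey waves. Independent cross-sections imply zero covariance;
# a longitudinal panel typically has a positive between-wave correlation.

def se_of_change(se1, se2, rho=0.0):
    """SE of (estimate2 - estimate1) given wave SEs and correlation rho."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * rho * se1 * se2)

# Each wave estimates a prevalence with a 1-point standard error.
se_cross_sections = se_of_change(1.0, 1.0)   # ~1.414
se_panel = se_of_change(1.0, 1.0, rho=0.6)   # ~0.894
print(round(se_cross_sections, 3), round(se_panel, 3))
```

Under these assumed numbers, reinterviewing the same persons cuts the standard error of the change estimate by more than a third, which is why a longitudinal design is preferred when individual-level change is the question of interest.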
SSA’s needs for the estimation of change over time in the size and characteristics of the eligible population stem from the necessity to forecast the growth or decline of the applicant and beneficiary pool. SSA has stated that NSHA will permit forecasting of changes in the size of the beneficiary population. Such a goal implies ongoing measurement of the size and characteristics of the eligible population, with updated instrumentation to reflect any changes in conceptual and measurement issues and in SSA’s eligibility protocol that may have occurred in the intervening years.
The next chapter discusses the design choices for obtaining the needed information on an ongoing basis using a reduced set of measures in the intervening years between the conduct of the large surveys.