Monitoring Human Tissues for Toxic Substances

4
Sampling Methods

INTRODUCTION

As is both common and appropriate in large, continuing statistical studies, the NHATS is intended to serve many purposes. EPA has described its major objectives as the detection of toxic substances in human tissue, the establishment of baselines and trend data on chemical exposures, the identification and ranking of chemicals for toxicologic testing, the identification of populations at risk and their ranking for risk reduction, and the assessment of the effects of regulatory actions. The generally accepted method of developing a data base to satisfy such objectives is to define the target population and measurement method, select a probability sample of the population, and apply the measurement methods to this sample. To move from a set of measurements to desired statistics, it is necessary to decide on the kinds of statistics that best summarize the data, to carry out computations, and to present the resulting data for a representative sample in a useful form. Sampling errors are usually calculated to give analysts and other users of the data some understanding of the effects of random variation in the data. Nonrandom variation (bias) sometimes is also discussed and assessed by such means as sensitivity analysis and (when several sources exist) reviews, including meta-analysis.

When human populations are studied, many of those steps can be carried out only imperfectly. Problems in developing a true probability sample are common, serious measurement errors might be unavoidable, and compromises in the definition of the target population are sometimes necessary. Most statisticians recognize that such problems are inevitable and accept some small or moderate departures from an ideal survey. However, when the methods deviate in important ways from accepted standards and practice, the validity of results is questionable, and the extent to which the statistics accurately reflect what is going on is uncertain. As a minimum, a statistician would like to see an analysis of departures from norms to determine whether those departures affect the quality of data produced in ways and degrees that compromise their utility.

This chapter examines the degree to which the sampling methods used in the NHATS permit inferences to be made about the total U.S. population. It deals only with sampling; measurement methods are discussed elsewhere. We describe ideal sampling practices, the compromises that are often necessary because some parts of the population are inaccessible or because of budgetary restrictions, the sampling methods used in the NHATS, and implications for drawing conclusions about the U.S. population. We also describe possible actions to improve this aspect of the NHATS without revising its general structure and consider the long-term goals of the NHATS and methods of accomplishing them.

This chapter is intended not to provide detailed guidance on designing and selecting a sample, but rather to highlight important issues. Sample design and selection are, in general, complex activities that require special expertise in sampling, not just in statistics generally. The sponsor of a new program for monitoring human tissues should engage such expertise early in the process of program development. The greatest demand for special competence will extend through the whole period of development and through the first round or two of implementation, but some need for access to sound statistical advice on sampling will continue indefinitely and might well be part of the expertise of the permanent staff. Competence in sampling should also be represented on the advisory committee that we recommend in Chapter 7.

SOME FEATURES FOR SAMPLING IN A CONTINUING POPULATION SURVEY

An early step in the planning of any survey is to establish its primary goals in considerable detail.
That should include description of the target population and, if necessary, subpopulations to be studied separately. The goals should indicate the precision desired for key statistics. For example, the survey might be required to detect whether a year-to-year change in a particular statistic had occurred and have at least an 80% probability that a 10% change over 1 year will show up as statistically significant. Or it might be desired to measure the current prevalence of some condition with an error of no more than 5%, also at the 95% confidence level. Other possibilities are that both requirements must be met simultaneously, or that one of them must apply to particular subpopulations. Precision requirements must be specified to establish the minimum sample sizes for each population group to be analyzed separately. Any subpopulations for which the population sampling fraction does not produce the sample size required for adequate precision will need to be oversampled.

In national surveys based on personal observations or interviews, the sampling is almost always based on a multistage design (Hansen et al., 1953; Cochran, 1977). Simple random sampling and other forms of single-stage sampling are generally too expensive and impractical for other reasons. The sample design in a national survey is thus usually fairly complex, and the statisticians responsible for the sample-selection methods normally go through a series of steps:

• Consider the reasons for and effects of foreseeable failures in various possible sampling schemes, such as incomplete information on some sampled persons.
• Determine what the sampling stages should be; cost, logistics, and information that is or could be made available for sample selection at each stage should be considered.
• Decide on the sources for the sampling frame at each stage. The sampling frame is the list of units from which the sample will be selected.
• Establish the modes of stratification at each stage.
• Plan for the method of sample selection at each stage. We strongly advise that probability sampling be used, but several approaches are feasible within the rubric of probability sampling, e.g., simple random sampling, systematic selection, and sampling with probability proportionate to size.
• Determine targets for sample sizes that will satisfy the precision goals of the study, including the effects of probable departures from the plan. Sample sizes at the various stages are usually calculated to provide an approximately “optimal” sample design, i.e., one that produces the lowest sampling variances for the available budget.
• Consider whether oversampling of some subpopulations is desirable. If so, efficient methods of oversampling should be developed and included in the sampling plan.
• Develop the best method of estimating the characteristics of the target total population and subpopulations from the sample. Also plan for the calculation of reasonable estimates of uncertainty in results.

Methods to carry out the various operations necessary to implement the sample design also need to be planned.

• Which sampling operations should be carried out by field or contract personnel, and which should be done in or from the central office under the direct supervision of agency statisticians?

• How will the process of sampling be monitored, and what kinds of quality control will be imposed on sample selection? Some kinds of quality control can be effected by checking the work, and others only by comparing selected statistics that are estimated from the sample with known population data.

Statisticians generally consider that the development of a sample design should include procedures for estimating sampling errors. Most national surveys of human populations use complex sample designs, with multistage sample selection, stratification, and probabilities of selection that differ among subpopulations. The variances that are available in most of the multipurpose statistical software packages do not apply to such designs, and separate computations are needed for appropriate estimation of sampling errors. Several special software packages are available for that purpose (Flyer et al., 1989).

It is rare for statisticians to have all the data they need to design the most efficient sample. Population parameters are often only approximate. Even less is known about other factors that are normally taken into consideration, including unit costs at each stage, nonresponse rates, such logistical features as transportation, and quality control. Thus, it is advisable to conduct research on the sample design to see whether important improvements can be made. That might be a major task in initial design, but periodic review of continuing experience is usually sufficient to keep a basically sound design up to date.

Another reason for periodic review of the sample is that modifications in the objectives of the study often occur after initial survey data become available. For a continuing survey, it is sensible to see whether revisions in the sample design would support revised objectives better than the design initially chosen.
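One of the selection methods named in this section, sampling with probability proportionate to size, can be illustrated with a short sketch. It draws first-stage units by systematic selection along the cumulated size measures and then shows that taking a fixed number of persons within each selected unit gives every person roughly the same overall chance of selection. The area names, sizes, quota, and function names are hypothetical, and the sketch assumes no stratification and no unit larger than the sampling interval; it is not a description of any survey's actual procedure.

```python
import random

# Hypothetical first-stage units with measures of size (e.g., population counts).
AREA_SIZES = {
    "area_A": 500_000,
    "area_B": 300_000,
    "area_C": 200_000,
    "area_D": 600_000,
    "area_E": 400_000,
}


def pps_systematic(sizes, n_units, rng):
    """Select n_units names with probability proportionate to size.

    A random start is chosen within the sampling interval, and selection
    points are then laid out at equal spacing along the cumulated sizes.
    Assumes every size is no larger than the sampling interval.
    """
    total = sum(sizes.values())
    interval = total / n_units
    start = rng.random() * interval
    points = [start + k * interval for k in range(n_units)]

    selected = []
    cumulative = 0.0
    idx = 0
    for name, size in sizes.items():
        cumulative += size
        # A unit is selected when a selection point falls inside its range.
        while idx < len(points) and points[idx] < cumulative:
            selected.append(name)
            idx += 1
    return selected


def overall_probability(size, total, n_units, quota):
    """Chance that one person is selected: first-stage times within-unit."""
    first_stage = n_units * size / total  # PPS selection of the unit
    within_unit = quota / size            # fixed quota of persons in the unit
    return first_stage * within_unit


sample = pps_systematic(AREA_SIZES, n_units=2, rng=random.Random(1))
```

Because the size of the unit cancels in `overall_probability`, the result is the same for every unit, which is the sense in which this kind of design is approximately self-weighting. That property holds only if the persons within each unit are themselves chosen by a probability method.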
COMMONLY REQUIRED COMPROMISES IN SAMPLING METHODS

As indicated earlier, most population surveys require some compromises from the practices discussed above. We describe here the departures from strict probability sampling that occur in many national studies and are usually accepted as having only a modest effect on the validity of resulting data. This description will serve as a crude yardstick for review of the NHATS.

Shrinking of Target Population

It is fairly common to exclude small portions of the target population from the sampling frame. They might be omitted because it would be extraordinarily expensive to include them or because they are not accessible with the data-collection methods planned. Study conclusions must, of course, be appropriately qualified or modified. Some common examples follow.

• Omission of institutional populations from the sampling frame, even though one would prefer to consider them as part of the target population. This is done in the National Health and Nutrition Examination Survey (NHANES), the National Health Interview Survey, and many other major federal surveys (Chu and Waksberg, 1988; NCHS, 1989).
• Inclusion only of persons in households that have telephones, so that data can be collected by telephone. This has become common practice for one-time surveys for both federal agencies and other sponsors and for political polls and market research.
• Restriction of the survey to the 48 contiguous states and the District of Columbia to avoid high costs of operating in Alaska and Hawaii.
• In studies of minority-group populations, restriction of the sampling frame to geographic areas that contain high concentrations of the group members. This reduces costs of screening households to identify the minority members and reduces travel costs if personal interviewing is used. The practice was followed in the Hispanic Health and Nutrition Examination Survey (Gonzalez et al., 1985).

Another type of shrinkage occurs because particular segments of the population cannot be contacted with usual survey methods. For example, in the Census Bureau’s major population sample surveys, there is evidence of substantial undercoverage of homeless persons, young black males, and other minority groups. The same undercoverage is found in almost all population surveys. The target population thus excludes them, though not by design.
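The effect of shrinking the target population on an estimated mean can be reasoned about with simple arithmetic: the bias equals the excluded share of the population multiplied by the difference between the covered and excluded groups. The sketch below works through that identity. The function name and all numbers are hypothetical and are chosen only to show the calculation; in practice the mean for the excluded group is usually unknown and has to be bounded or assumed.

```python
def coverage_bias(excluded_share, mean_covered, mean_excluded):
    """Bias of the covered-population mean as an estimate of the full mean.

    excluded_share: fraction of the target population left out of the frame.
    mean_covered:   mean of the characteristic among those covered.
    mean_excluded:  mean among those excluded (assumed here for illustration).
    """
    full_mean = (1 - excluded_share) * mean_covered + excluded_share * mean_excluded
    return mean_covered - full_mean


# Hypothetical values: 2% of the population excluded, with a mean 50% higher.
bias_small_exclusion = coverage_bias(0.02, mean_covered=10.0, mean_excluded=15.0)

# The same difference in means matters far more when the excluded share is large.
bias_large_exclusion = coverage_bias(0.25, mean_covered=10.0, mean_excluded=15.0)
```

A negative result means the survey understates the full-population mean. The comparison shows why a small exclusion is usually tolerated while a large one is not, unless the excluded group is known to resemble the covered group.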
Revision of Goals to Fit Sample of Affordable Size

Standards of precision that are stated as desired in early planning stages of a survey often require sample sizes that are seriously inconsistent with the available budget. Adjustments can be made in the goals, the budget, or both. For example, one can combine data for several years, instead of analyzing each year separately, or one can broaden the demographic groups to be studied.
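The trade-off between precision goals and affordable sample size can be made concrete with the usual normal-approximation formulas for a proportion. The sketch below computes the sample size needed for a given margin of error and the margin obtained from a given sample size, with an optional design-effect multiplier for complex designs. It assumes that "an error of no more than 5%" means an absolute margin of plus or minus 5 percentage points, that pooled years are independent samples of a stable population, and that the function names and the design-effect parameter are illustrative choices rather than anything prescribed by the text.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p=0.5, deff=1.0):
    """Smallest n giving the requested margin of error for a proportion.

    p=0.5 is the most conservative choice; deff inflates n for the loss of
    precision caused by clustering and unequal weights (assumed value).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(deff * z**2 * p * (1 - p) / margin**2)


def margin_of_error(n, confidence=0.95, p=0.5, deff=1.0):
    """Margin of error for a proportion estimated from n observations."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(deff * p * (1 - p) / n)


n_simple = sample_size_for_proportion(0.05)             # simple random sample
n_complex = sample_size_for_proportion(0.05, deff=2.0)  # hypothetical design effect

one_year = margin_of_error(200)         # hypothetical annual sample of 200
three_years = margin_of_error(3 * 200)  # three annual samples combined
```

Combining three years of equal size reduces the margin of error by a factor of the square root of 3, at the cost of no longer being able to describe each year separately.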

Acceptance of Nonresponse

Most population surveys cannot contact all sample units or get 100% cooperation from persons contacted. Nonresponse reduces the integrity of the sample, because the sampling process is not being strictly followed. Survey practitioners are resigned to the inevitability of some nonresponse, and most statisticians feel that a small amount of attrition in the sample will usually have only a minor effect on the quality of the resulting data. However, some exceptions are important, and the effect of nonresponse must be considered separately for each survey. When refusal to cooperate is especially common among a particular segment of the population that differs in important ways from the population at large, even a small nonresponse rate can produce serious biases in the statistics.

Responsible survey organizations put considerable effort into attaining as high a response rate as feasible. Furthermore, statistical methods are used to adjust the resulting data to reduce, to the greatest extent possible, the potential biases introduced by nonresponse, and analytic reports explore the possible effects of nonresponse. When knowledgeable statisticians judge the nonresponse rate to be so high as to compromise the most critical findings, the survey might have to be abandoned.

Substitution for Sample Units

In most national surveys, the samples are selected in stages. Usually, all stages except the last consist of groupings of the sample units that are the subjects of the survey. For example, school children can be sampled by first selecting a sample of school districts, then subsampling schools within the sample districts, and finally choosing children within the schools. The school districts and schools, which contain groups of children, are stages in sample selection. Nonresponse sometimes occurs in the stages of the sample that consist of groups of the units of interest.
In the example above, this would happen if a school district or school failed to cooperate. Many statisticians will substitute a willing unit (e.g., school district or hospital) not initially picked for a sample for an uncooperative sample unit. An effort is made to choose the substitute within the substratum containing the uncooperative unit, on the assumption that there is reasonable homogeneity within substrata, so that the units are largely interchangeable.

Substitution is a form of adjustment for nonresponse. There is some disagreement among statisticians on whether substitution is preferable to statistical adjustment. However, all agree that stringent efforts should be made to keep nonresponse as low as practical before either substitution or some other adjustment procedure is used and that serious nonresponse at any stage should lead to reexamination of procedures used, adequacy of the data collection staffs, or other features of the survey operations that may contribute to nonresponse.

Analyses of Effects of Compromises

It is desirable for statistical agencies to expend considerable effort and resources in trying to control and understand the effects of necessary compromises on the statistics generated in their surveys. Three kinds of actions are taken:

• A substantial share of survey resources is applied to keeping nonresponse and undercoverage low. This usually requires numerous callbacks to sample units that are difficult to contact, as well as attempts to convert refusals through experienced staff members’ special appeals, the efforts of survey sponsors, or in some cases peer pressure.
• Statistical adjustments are made to reduce the effects of nonresponse, undercoverage, and exclusions (by design or otherwise) from the target population. This is frequently done through poststratification and other forms of weighting the resulting data.
• Research is carried out on how a missed part of a population is likely to differ from the observed part and on how much the differences are taken into account in the statistical adjustments. Research also investigates whether better methods of adjustment exist and sometimes whether techniques are available to reduce the size of the missing portion of the desired target population.

THE NATIONAL HUMAN ADIPOSE TISSUE SURVEY SAMPLE

Goals of Study and Target Population

NHATS goals are expressed in very general terms.
The committee could find no clear articulation either of its primary objectives and priorities or of the precision needed in detecting magnitudes of pollution or year-to-year changes. As a result, there is no way to determine the minimal sample sizes needed. Similarly, there is no guide to follow when choices have to be made, e.g., whether to use adipose tissue or blood, whether to choose broad-scan analysis of tissue samples, and whether to rely on composites (which make prevalence estimates impossible).

The target population is presumably the entire living U.S. population, but the subjects on whom measurements are made are an uncontrolled combination of recently deceased persons and surgical patients. There is no attempt to keep the mix of deceased and surgical patients constant from year to year. It is implicitly assumed that the distribution of pollutants is the same for deceased persons and surgical patients within the race, sex, and broad age groups for which sample quotas are designated. More important, it is assumed that substances in deceased persons and surgical patients reflect what is present in tissues of living Americans generally. The underlying assumption that long-lived contaminants in human tissue are spread fairly uniformly among recently deceased and living persons appears plausible (except perhaps for the effects of recent weight loss), but this has not been evaluated either empirically or theoretically.

The exclusion of most of the rural population from the sampling frame is a serious deficiency, particularly because the detection of pesticide residues was, at least at its onset, an objective of this survey, and exposure to pesticides in rural (farm) areas can be very different from that in urban areas. It is likely that underrepresentation of the rural population leads to understatement of contamination. There is no information on urban-rural differences that would permit inferences about the effect of the rural exclusion on the statistics.

Alaska and Hawaii were also excluded from the sampling frame. Those two states currently account for only 0.7% of the U.S. population.
Their exclusion should, thus, have only a trivial effect on nationwide statistics, far less than the exclusion of the approximately 25% of the population that is outside the metropolitan sampling areas (MSAs), but could be serious if there is a need for data specific to these areas (e.g., certain farm workers in Hawaii, or regular consumers of meat from wild animals in Alaska).

First Stage: Metropolitan Sampling Area

Probability sampling methods were used to select the initial sample of MSAs. They were selected with probability proportionate to size, and detailed geographic stratification was used.

Unfortunately, the integrity of the sample of SMSAs has been eroded, in that 10 of the 47 sample MSAs have been replaced. For example, one SMSA was replaced when the medical examiner failed to cooperate and no large cooperating hospital in the area could be found. The nonrandom replacement of more than 20% of the MSAs could have a substantial effect on the statistics. The substitutes for noncooperative MSAs were chosen from the same strata as the initially selected areas; they are presumably as much like the dropouts as possible. If replacements have to be made, the procedure followed in the NHATS appears to be sensible, but it should be recognized that it is the best among a set of unpleasant alternatives. There is no way to know whether refusals to cooperate are somehow related to magnitudes of contamination. In the absence of such knowledge, it is prudent to attempt to avoid substitutions.

Second Stage: Medical Examiner or Pathologist

NHATS does not use probability sampling at the second or following stages of sampling. In MSAs containing more than one county (virtually all the large MSAs and many of the smaller ones), a single county is first selected. Specific rules for county selection are not prescribed, and the choices are left to the contractor for the project. There seems to be a preference for the first choice to be the largest county or the one with the largest city, perhaps because it will contain more hospitals and thus make finding a cooperative one easier. However, the contractor can move to other counties if there is difficulty in getting cooperation in the initially selected counties.

It is likely that there is a bias in overrepresentation of city residents, compared with suburban residents. The extent of the bias is uncertain, because many of the large hospitals draw their patients from a wider area than a specific county. That does not hold for cases brought to the county medical examiner’s attention or to city- or county-operated hospitals, where the cases are much more likely to be restricted to city or county residents.
It is plausible that city and suburban residents are exposed to different magnitudes of pesticides and other contaminants. Sample selection methods are thus likely to influence the statistics to an unknown extent.

Within each selected county, an attempt is first made to procure the county medical examiner’s agreement to cooperate. There seem to be no strong efforts to persuade reluctant examiners to change their minds: no contacts are made by high-level EPA officials, state officials, or local medical society representatives emphasizing the importance of the study; and the Midwest Research Institute (MRI) operations manual on procedures to be followed in data collection does not provide any motivation for a contractor’s field representatives to attempt to convert refusals. It is implied that all potential respondents are equally acceptable. For example, the manual describes the contact with medical examiners and pathologists as aimed at determining whether “an interest exists in participating in the survey,” and there is no sense of importance in having those individuals cooperate. Similarly, the letters in the recruitment package use the phrases “invitation to participate” and “we hope that you will assist.” The attitude seems to be that it is not important who provides the specimens.

According to MRI, within each SMSA one or more hospitals and associated pathologists or medical examiners are selected and asked to supply adipose tissue specimens. If a medical examiner or pathologist cannot be recruited, a hospital pathologist in the same county is designated. Guidelines for choosing hospitals are more specific than those for counties. The choices are restricted to the ones that have the largest number of beds and that are full-service institutions (neonatal to geriatrics). If more than one hospital satisfies the requirements, the contractor picks one. There are no clear directions on how to do so; the manual states that “the decision is…then made as to which hospital would be the best choice….” If the first hospital contacted does not wish to cooperate, another hospital is picked, etc. If the list of available hospitals and medical examiners is exhausted, a replacement SMSA is selected.

There is no requirement for serious efforts to motivate reluctant hospital pathologists to cooperate. The operations manual states that “the pathologist is contacted to determine if an interest exists in participating in the survey. If the answer is ‘no’,…a notation and date of nonacceptance are made… so the hospital is not contacted again in the near future.” With the county medical examiners and hospital pathologists chosen subjectively, it is not obvious why it makes a difference which ones agree to cooperate.
However, the current system is close to self-selection. Unknown and unexpected biases are plausible under such conditions. For example, if large city hospitals are overrepresented, there is also overrepresentation of the kinds of persons who attend such hospitals, including those who are alcoholic, are homeless, are in chronic poor health, or have other characteristics that might be correlated with particular environmental exposures.

Third Stage: Specimen Donor

Each medical examiner is given a quota (by age, sex, and race) for the number of specimens to be provided in the course of a year. However, no sampling method is designated for choosing the persons or even for avoiding possible seasonal effects.

Attempts are made to attain the desired quotas for each cooperator. Field personnel are instructed to call the sample medical examiners and pathologists periodically to stimulate them to meet the quotas. Those calls appear to be ineffective: in the last few years, only about half the quotas were achieved.

The quota sizes are intended to be representative of the SMSA and to provide an approximately equal probability sample within the sex-age-race categories designated by the quotas. Weighting is used to adjust for departures from the quotas. With probability sampling, the selection of first-stage units with probability proportionate to size and the use of properly calculated fixed quotas of persons within the first-stage units provide close to equal probability samples for each of the designated sex-age-race categories. The quotas have to be revised periodically to reflect changes in the distribution of the population among the first-stage units. Weighting to adjust for failures to meet quotas retains the representativeness of the sample (although it increases the sampling variance), provided that the persons on whom measurements are taken are random samples of the population within the first-stage units. However, when shortages in the quotas appear with no indication of why they occur, it becomes uncertain how well the sample cases represent the population. Because there is no describable sampling system for choosing the specimens, it is not clear whether failing to meet quotas has an appreciable effect on the representativeness of the sample.

Fourth Stage: Body Source of Specimen

In developing a monitoring system, one can choose to monitor the adipose tissue in particular parts of the body if that tissue is considered to yield an effective measure of a contaminant concentration that affects health, or one can take specimens from random regions of the body to determine the body burden.
If a contaminant is uniformly distributed through the body’s adipose tissue, it does not matter where the specimens are taken from. For blood samples, the assumption of uniformity seems appropriate, and blood samples from any part of the peripheral circulation are reasonable. Pathologists are given no specific instruction on the body part for specimens. It is implicitly assumed that, as is true for blood, the contaminant in question is uniformly distributed through the body.

Sampling Errors

Sampling errors have not been calculated in the NHATS. Without proba-

We believe that several steps could improve participation rates. First, the field personnel responsible for recruitment should be instructed on the importance of persuading reluctant prospects to cooperate. The NHATS should revise the manual currently used, which gives the impression that it does not make any difference who cooperates, as long as someone is found. Second, recruiters need tools to help persuade reluctant prospects. One of the participants in the workshop (Appendix B) suggested that support from local medical societies might improve response rates. Another possibility is to try to get local health officials to help. Other health surveys, such as NHANES, generally enlist the support of such peer groups. Similar attempts for NHATS might be helpful. Finally, letters or telephone calls from high-level EPA officials could be effective. Invited respondents should know that the highest administrators at EPA recognize the NHATS as an important program.

Adherence to Quotas

We see no reason why medical examiners and hospital pathologists should regularly fall so far short of their assigned and accepted quotas. The MRI Operations Manual instructs field personnel to make calls to remind cooperating pathologists to check on progress, but does not emphasize the importance of meeting the quotas. Perhaps the field staff itself does not recognize that this is a major requirement or is too diffident in its dealings with medical examiners and pathologists. EPA and MRI should examine their procedures to see what might be done to attain the desired quotas.

Replacement of MSAs

If the procedures described above increase response rates substantially, it would be useful to go back to pathologists in the discarded MSAs to see whether they can be persuaded to reverse their original refusal to cooperate. Those who agree should be included in the sample, and the replacements dropped.
In addition, EPA should evaluate the impact of the replacements on the statistics. One way is to examine differences in average contamination concentrations among SMSAs that are selected from similar strata (e.g., drawn from neighboring geographic divisions). Small differences would support the hypothesis that the SMSAs are fairly homogeneous within broad geographic areas; replacements then might not seriously affect the statistics. Larger differences could indicate a major effect of replacements on statistics.
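As a rough illustration of the comparison suggested here, the sketch below splits the variability of a measured concentration into a between-area part and a within-area part, using the method-of-moments estimator for a balanced one-way layout. The function name, the input layout, and the sample values are hypothetical and are not drawn from NHATS data. The sketch also assumes that every area contributes the same number of individually analyzed specimens.

```python
from statistics import mean


def variance_components(groups):
    """Estimate (between_area, within_area) variance components.

    `groups` is a list of lists: one inner list of individual
    measurements (for example, log concentrations) per area.
    Assumes a balanced design: every area has the same number of
    specimens, and there are at least two areas and two specimens each.
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two areas")
    n = len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced design with n >= 2 per area")

    area_means = [mean(g) for g in groups]
    grand_mean = mean(area_means)

    # Mean square between areas and mean square within areas.
    ms_between = n * sum((m - grand_mean) ** 2 for m in area_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, area_means) for x in g
    ) / (k * (n - 1))

    # Method-of-moments estimate; truncated at zero if negative.
    between = max(0.0, (ms_between - ms_within) / n)
    return between, ms_within


# Hypothetical values for two areas with two specimens each.
between, within = variance_components([[1.0, 3.0], [5.0, 7.0]])
print(between, within)
```

A between-area component that is small relative to the within-area component would be consistent with areas being fairly homogeneous; a large one would suggest that replacing areas could matter.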

Such comparisons can best be made by calculating components of variance, i.e., estimating what part of the total sampling variance comes from sampling MSAs and what part from sampling hospitals and persons within the SMSAs. To estimate variance components, one must measure the variability of contaminants among persons within the same MSAs. That cannot be done when composite samples are used, because the composites merge the data on individuals. It will be necessary to use historical data from years when specimens were individually analyzed.

Nonrandom Selection of Counties, Medical Examiners, and Hospitals

In data collection, one does not need an uncontrolled system of choosing counties and institutions within counties. We believe that acceptable sampling methods could have been developed for those stages of sampling. The current method turns over the selection to the subjective choices of field personnel. Reasons for that approach should be reviewed. If it was for some minor conveniences of the contractor and field personnel, plans for the basic samples within the SMSAs should be revised. It would also be useful to evaluate possible effects of the lack of specificity in the sampling. Some light might be shed on the subject by examining differences in contamination concentrations among hospitals in the same SMSA, or by comparing data on SMSAs before and after some hospitals were changed.

Composite Specimens

Starting with tissues collected in 1982, the EPA stopped analyzing adipose tissue specimens from separate individuals and started analyzing “composites.” Individual specimens for immediate analysis (but not archived specimens) were combined within each of the nine U.S. Census divisions for three age groups. Some combinations were kept separate by sex or race, depending on the sample size in the census division.
The combined tissues were mixed to form 46 “composites,” which were the units subjected to chemical analysis. A linear model was fitted to the results of chemical analyses of the composite samples to derive estimates of average residue concentrations by geographic region, age, sex, and race in the broad scan work.

Two reasons were given for compositing:

To accumulate sufficient tissue in a sample to be analyzed to ensure a high probability of detecting toxic residues of interest. EPA indicated that that was necessary because the probability of detection is a function of the amount of analyte injected into the analytic instrument, in addition to the concentration.
To reduce costs of analysis by reducing the number of analyses. That was considered desirable because the cost per analysis is high. (For example, current dioxin analyses cost $1,500–2,000 per analysis.)

We discuss the amount of adipose tissue that can be extracted from a person in Chapter 5. EPA, through the NHATS, does not appear to be hindered in collecting sufficient tissue for chemical analyses of individuals. If amounts of tissue were not sufficient in the past, amounts can probably be increased in the future. Compositing will still help to reduce costs, however. Although individual specimens must still be selected and prepared, it obviously costs less to analyze a small number of composites than the much larger number of individual specimens.

The NHATS staff recognizes that costs are reduced at the expense of a substantial loss in ability to analyze and interpret NHATS results. Because individual specimens are not chemically analyzed, there is no direct way to derive prevalence estimates (e.g., the proportion of the population with detectable concentrations of some specific substance above a specified level). Statistical models from which prevalence can be computed have been discussed (Nisselson, 1987), but we are not sanguine about the prospects. However, if such models are attempted, their validity should be tested empirically. Perhaps it could be done by examining the data for the years in which individual specimens were analyzed separately.

Statistical modeling is currently used to estimate mean concentrations by geographic regions, three age groups, sex, and race.
Direct estimates are not possible, because the specimens included in each composite cut across the subpopulations. The statistical model assumes that the mean log concentration for any combination of the four variables (region, age, race, and sex) is the simple sum of individual concentration factors for each variable and that there are no “interactions,” or factors for combinations of the variables. For example, a model without interaction implies that the difference in the concentration of residues between males and females (although not the concentrations themselves) is the same for Caucasians and non-Caucasians and that the difference between Caucasians and non-Caucasians is the same in males and females. Similarly, lack of interaction implies that the sex difference and the race difference are the same in all regions. Nisselson (1987) has described the models and the mathematical expressions for the effects of interaction, but the adequacy of the models has not been tested sufficiently. As pointed out by Nisselson, the presence of interactions can drastically affect both subpopulation estimates and their standard errors. For the model evaluation, we suggest applying the model to a period in which individual specimens were analyzed and comparing the model results with estimates made from the individual observations.

COMPUTATION OF SAMPLING ERRORS

There has been little or no effort to establish statistical confidence levels around the estimates that have been prepared from NHATS data. It is therefore uncertain whether differences observed among subpopulations or changes over time are more likely to reflect real changes in exposure than random fluctuations among the sample specimens. Similar uncertainties arise when NHATS data are compared with data from other populations.

When essentially nonprobability samples are used (as in the NHATS), confidence intervals computed by standard statistical techniques do not adequately reflect the probability that the true values are within specified neighborhoods of the sample estimates. However, they are the best approximations that are possible and generally provide lower bounds on uncertainty. We suggest that the necessary computations be carried out for at least some of the substances analyzed and that the resulting information be made available both to the users of the data and to EPA personnel for consideration of the adequacy of NHATS sample sizes. Techniques are available for estimating standard errors (EPA, 1987a). The computation of standard errors in most statistical software packages is not appropriate for a multistage sample design that uses variable weights.

CRITICISMS OF NHATS STATISTICAL SAMPLING METHODS

The EPA staff involved in the planning of the NHATS are well aware of the limitations of and problems with the NHATS sample.
However, they believe that the NHATS nevertheless meets EPA’s goals for the National Human Monitoring Program (see Chapter 2) as stated in their preliminary response to NRC inquiries; we would agree if those goals are considered in a limited manner. The dramatic changes that the NHATS showed in the prevalence of PCBs in human tissue over a few years seem to reflect real changes in our environment. We would also generally assume that NHATS statistics showing increases of 100% or more in mean concentrations of some substances over a few years indicate important changes in our environment. Similarly, the NHATS can demonstrate even very low concentrations of some toxic substances in human tissue.

The problem with a mostly nonprobability sample is its inability to ascertain, with any measurable level of confidence, smaller annual changes in average concentration or population distributions that build up over time to have important effects in our society. With the present program, it cannot be determined whether moderate increases over several years indicate an increasing prevalence of higher body burden within the overall population sampled, a greater concentration in a small fraction of the population, erratic sampling effects, or a shift in the magnitude of the sampling bias due to lack of control on the sample. It could be a long time before the statistics show clearly recognizable patterns.

There are other problems in extrapolating from the NHATS sample to the total U.S. population. As mentioned earlier, one result of using composite samples is the loss of prevalence estimates. Composites might also create poorer estimates for subpopulations because of a need to rely on statistical models if compositing across subpopulations is used. The use of baseline data for comparison with results of studies of other population groups (e.g., persons living near Superfund sites, or those living in rural areas and subject to substantial pesticide exposure) is also weakened by the wide margins of uncertainty around the baseline statistics.

The current operating procedures probably do meet some set of limited EPA objectives. Those objectives would be met better if the improvements suggested above were adopted. However, we do not think that EPA, Congress, or the public should be satisfied with such limited objectives.
The federal government should assume a more comprehensive responsibility for informing the public and administrative agencies about potential public-health hazards as revealed by the accumulation of chemicals in the population. That will require fundamental changes in data collection. We discuss some major changes below.

The Benefits of Blood Collection for Probability Sampling

From a statistical point of view, a troublesome aspect of the entire NHATS program has been replacing the U.S. population as the target population with surgical patients and cadavers subject to autopsy. The replacement creates three serious problems: specimens taken in this way might well not represent the average living population, it is very difficult (probably impossible) to ensure a true probability sample, and it is not feasible to obtain important demographic or other data on the subjects.

Those problems are inevitable with a measurement system that uses human adipose tissue. Therefore, we have recommended that blood replace adipose tissue as the primary medium for measuring toxic, or potentially toxic, substances in human tissue. Blood specimens could be collected, in accord with procedures roughly similar to those in NHANES, from subjects that are close to a true probability sample of the U.S. population. We recognize that blood and adipose tissue differ in buildup of substances. However, the ability to have a good sample that would be accompanied by demographic and related descriptive material about each specimen makes blood the tissue of choice. Chapter 3 discusses the relationship of substances in blood to those in adipose tissues.

The use of blood specimens from a sample of live persons has other advantages. First, it would be possible to conduct interviews with the sampled persons to obtain information on covariates that would support a search for causal relationships and risk factors. The covariates could include type of drinking water used by the household (private well water vs. a community system), occupation and industry (particularly whether employment is in a chemical plant or refinery), and farm vs. nonfarm residence (and, if on a farm, use of pesticides, dietary information, etc.).

Second is the possibility of a longitudinal design in which sampled persons (all or some) are revisited every few years, instead of being selected independently each year. Such a design has two desirable features. It usually provides more precise estimates of year-to-year changes. It also permits more sophisticated analysis of sources of contamination by attempting to associate changes in the concentrations of toxic substances in a person with changes in the environment (e.g., construction of a new road or industrial facility) or in other factors peculiar to the person (e.g., a job or residence change).
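The gain in precision for year-to-year change can be sketched with a standard textbook result, offered here only as an illustration. If the same n persons are measured in both years, with equal variance in each year and correlation rho between a person's two measurements, the variance of the estimated change in the mean is 2 * sigma^2 * (1 - rho) / n. Independent samples correspond to rho = 0. The function name and the numeric values below are hypothetical, and simple random sampling is assumed rather than the clustered design a real survey would use.

```python
def change_variance(sigma2, n, rho=0.0):
    """Variance of the difference between two yearly sample means.

    sigma2: variance of an individual measurement (same in both years).
    n:      number of persons measured in each year.
    rho:    correlation between one person's measurements in the two
            years; 0.0 represents independently drawn yearly samples.
    Assumes simple random sampling and complete overlap of the panel.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return 2.0 * sigma2 * (1.0 - rho) / n


# Hypothetical comparison: 100 persons per year, unit variance.
independent = change_variance(1.0, 100)        # new sample each year
panel = change_variance(1.0, 100, rho=0.75)    # same persons revisited
print(independent, panel, independent / panel)
```

Under these assumed values the revisited panel gives a variance one quarter that of independent samples, which is the kind of gain the text refers to; the actual gain depends on how strongly a person's concentrations persist from year to year.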
We note, however, that a longitudinal survey also has some disadvantages. It is usually expensive to locate and visit the part of the sample that has moved between sample periods. Some movers cannot be located or have moved to areas that would be inordinately expensive to visit. Response rates thus tend to decrease over successive rounds of followup. Finally, there is a loss of ability to increase sample size for some analyses by combining data for several years. Thus, we do not unconditionally recommend introducing longitudinal features in the sample. However, the advantages and disadvantages should be carefully weighed, so that a reasonable decision on the best sample design is reached.

Third is the possibility of considerable flexibility in oversampling specific demographic or other subgroups of the population. There is some oversampling in the current NHATS, in that quotas are specified by sex, race, and three age groups. That could be extended to a finer division of age groups or to other domains. Some possibilities are the rural population, persons living in areas with heavy pesticide use, and persons living in the vicinity of particular types of industrial facilities.

Despite the value of those additional features of a program based on blood, it must be clearly understood that the primary rationale is the need for true probability sampling, and that the advantages of having a probability sample must be protected in other aspects of study design. For example, EPA’s plans for a National Blood Network (NBN) are based on the collection of blood specimens, through the cooperation of the major national blood collection agencies, from volunteer donors to those agencies. Such a program would have many of the problems inherent in the NHATS, because it would rest on the assumption that the population of volunteer blood donors is a useful surrogate for the general U.S. population, an unvalidated and even dubious assumption. Furthermore, organizations whose priorities are in their own programs generally do not give other projects close attention. It is likely that the NBN will be subject to many of the NHATS operating problems, such as the refusal of sampled units to cooperate, inability to meet quotas, and general lack of quality control. Those problems are not conducive to the high-quality statistical program that the public has a right to expect from EPA.

EPA should plan to have a sample and data collection system that is dedicated to EPA’s interests. NHANES provides a good model. We recognize that the costs to EPA would be much higher than the current budget; however, we believe that EPA has understated the need and importance of the data and thereby has been too modest in its budget requests and allocations.

In addition to a blood program, we recommend that, in spite of the statistical limitations of the NHATS program, an adipose tissue program be continued, although possibly on a reduced scale.
One reason is to retain the consistent series of specimens that goes back almost 20 years. A second reason is that the NHATS would supplement the blood analysis program and provide data on some substances that cannot be measured adequately in blood. If an NHATS program is retained, the improvements and modifications described elsewhere in this document should be implemented.

Use of Composite Specimens

The use of composites has required EPA to set aside some of the objectives initially stated for the NHATS. Prevalence estimates can no longer be supplied. Estimates of mean concentrations for subpopulations are less precise, because they assume model validity. The ability to carry out risk assessment is uncertain. Those compromises in the initially contemplated program have been made to permit broad scan analysis of an increased number of substances, which becomes quite expensive per specimen. Although the broad scan analysis provides a substantial amount of information not available with the original measurement method, it does not satisfy all the purposes of a human tissue monitoring program.

In deciding to use broad scan analysis, EPA acted as though it had a fixed budget for monitoring and as though the higher costs per chemical analysis had to be compensated for with a smaller number of samples analyzed. The realities of life in a government agency are such that current programs need to work within fixed budgets and it is difficult to change them, except under extraordinary conditions. However, those conditions do not necessarily apply to long-range programs. It is possible to request and obtain increased funding when it is necessary for the success of an important project. In fact, government agencies are obliged to work to obtain such increases.

EPA should consider the benefits of a mix of analyses, some performed on individual specimens and others on composites, in which the results from individual specimens can be used for prevalence estimates and to test the models. It might not be necessary to prepare prevalence estimates each year; if not, smaller samples can be used for the individual specimens, with prevalence estimates based on 2- or 3-year averages. The precision required for the most important uses of the data should be reconsidered, and the total sample sizes and the allocations to samples used for individual specimens and for composites should be recalculated to meet these requirements. Samples should be collected and stored in a manner that preserves the possibility of basing measurements on individual samples, and a substantial part of the new program should be based on individual analyses.
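A small numeric sketch may help show why prevalence cannot be recovered from composites while means can. It idealizes a composite as the plain average of its contributors' concentrations, which assumes that each person contributes an equal mass of tissue and that mixing is perfect. The function names, the threshold, and all concentration values are hypothetical.

```python
def composite_concentration(individual_concs):
    """Concentration of an idealized composite.

    Assumes equal tissue mass from each contributor and perfect
    mixing, so the composite equals the arithmetic mean.
    """
    if not individual_concs:
        raise ValueError("need at least one specimen")
    return sum(individual_concs) / len(individual_concs)


def prevalence_above(individual_concs, threshold):
    """Fraction of individual specimens exceeding `threshold`."""
    if not individual_concs:
        raise ValueError("need at least one specimen")
    above = sum(1 for c in individual_concs if c > threshold)
    return above / len(individual_concs)


# Two hypothetical groups of four specimens with the same mean.
group_a = [1.0, 1.0, 1.0, 9.0]   # one highly exposed person
group_b = [3.0, 3.0, 3.0, 3.0]   # everyone moderately exposed

print(composite_concentration(group_a), composite_concentration(group_b))
print(prevalence_above(group_a, 2.0), prevalence_above(group_b, 2.0))
```

Both groups would yield the same composite measurement, yet the share of persons above the threshold differs greatly between them. Only individual analyses distinguish the two cases.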
Compositing can reduce costs at the stage of chemical analysis and thus permit additional sampling or studies. When it can be shown explicitly that values based on individual samples are not needed (e.g., for estimating variances, presence above some specified concentration, or differences among population segments), some degree of compositing might be appropriate, though that degree may never reach the present EPA degree of compositing.

NHATS Sample Design

If our primary recommendation for a new program is not adopted, and if the present program is continued with modifications, its sample design should be reviewed and revised to institute probability sampling at all stages of sampling. Key aspects of the review should be as follows.

Nonmetropolitan counties should be included in the sample. The rural and urban populations may well differ in prevalence and body concentration of pesticides and possibly of other substances. We recognize that hospitals in rural counties are generally smaller and that medical examiners have fewer cases, but populations are smaller, too. Thus, quotas for the number of specimens in those rural counties may be smaller than in SMSAs. That will add some fixed costs per hospital to the project, but is necessary for true representation of the U.S. population.
The quotas for the sample size for each SMSA should be updated after the 1990 Census results become available. Current quotas are based on the population distribution in 1980, and there have been important changes in the last decade.
The subjective method of choosing counties (within the selected SMSAs), medical examiners, and hospitals should be replaced with strict probability sampling.
Stronger efforts should be made to attain cooperation of sample institutions and to have them meet their sample-size goals.
It might be difficult to have the medical examiners and pathologists select specimens with random sampling methods, but the possibility should be explored, particularly in the larger organizations. As a minimum, an attempt should be made to spread the selection of cases evenly across the year, to avoid possible seasonal effects.

A broad national program for human tissue monitoring might at first seem to be a suitable vehicle for studies and evaluation of groups in the population that appear to have unusually high exposure to some chemical substances. Examples include persons accidentally exposed in environmental disasters, persons with occupational risks, and persons who live near Superfund sites.
The committee believes that, in general, evaluation of other than background exposures should not be built into a national human monitoring program, although data collected in the broad basic program that turn out to be useful should of course be used. The basic problem is that a broad sample would include only a handful of persons in any small group of special concern, and design of a sample to answer questions about such groups would either require an enormous equal-probability sample or distort a weighted sample to the point where it might be unsuitable for its basic purpose, even with appropriate analyses to adjust for differential weights.

An agency staff qualified to maintain a strong human tissue monitoring program should have skills and facilities for investigation of risks in special populations, and special studies might well be assigned to them. However, such special studies should in general be regarded as add-ons, supported by separate budgets and explicit reallocations of staff responsibilities and time commitments, so that the basic monitoring program is protected. The national data will, of course, be invaluable for comparison with results of special studies and might determine whether the special populations have increased concentrations of contaminants.

SUMMARY AND RECOMMENDATIONS

There are serious deficiencies in the NHATS in that adipose tissues collected are not a representative sample of the U.S. population. Although most population surveys find it necessary to compromise somewhat on ideal standards, the departures from probability sampling in the NHATS are far in excess of what most statisticians would consider acceptable. The main deficiencies are these:

Although the target population is the living U.S. population, the subjects on whom measurements are taken are an uncontrolled mix of recently deceased persons and surgical patients.
The sample size has been driven by the budget, rather than by needs to satisfy important goals of the program.
Some important segments of the population are omitted from the sample. The exclusion of the rural population is the most serious omission.
Although probability sampling was used in the selection of the metropolitan areas that are the first stage of sampling, problems of cooperation forced substitutions for 20% of the areas. Consequently, the extent to which the sample of areas now represents all metropolitan areas is uncertain.
There is no designated sampling method for choosing the persons from whom specimens are taken. Each medical examiner or pathologist is given a quota by age, sex, and race, but the quotas are poorly adhered to; even if the quotas were met, the quota procedure would be inherently biased.
There are no specific instructions for pathologists on the body part to be used for specimens. It is implicitly assumed that contaminant concentrations are the same in all adipose tissue in the body.
Recent uses of composite measurements have made it impossible to provide prevalence estimates and seriously weakened the estimates of mean contamination concentrations for the sex, race, and age subdomains.
Sampling errors have not been calculated, so users are not informed about the precision of the data.
There is no plan for regular release of findings to the public.
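To give a sense of what computing the missing sampling errors could involve, the sketch below applies a delete-one-area jackknife to a weighted mean. This is one common replication technique for clustered, weighted samples; it is not presented as the method EPA has described, and it treats all first-stage areas as a single stratum for simplicity. The record layout, the area labels, the weights, and the values are hypothetical.

```python
import math


def weighted_mean(records):
    """Weighted mean of (area, weight, value) records."""
    total_weight = sum(w for _, w, _ in records)
    return sum(w * y for _, w, y in records) / total_weight


def jackknife_se(records):
    """Return (estimate, standard error) by a delete-one-area jackknife.

    Each first-stage area is dropped in turn and the weighted mean is
    recomputed from the rest. Assumes a single stratum with at least
    two areas; a stratified design would need a stratum-by-stratum
    version of the same idea.
    """
    areas = sorted({area for area, _, _ in records})
    k = len(areas)
    if k < 2:
        raise ValueError("need at least two first-stage areas")

    full = weighted_mean(records)
    squared_devs = 0.0
    for dropped in areas:
        remaining = [r for r in records if r[0] != dropped]
        squared_devs += (weighted_mean(remaining) - full) ** 2

    variance = (k - 1) / k * squared_devs
    return full, math.sqrt(variance)


# Hypothetical records: (area label, sampling weight, measured value).
sample = [("A", 1.0, 1.0), ("B", 1.0, 2.0), ("C", 1.0, 3.0)]
estimate, se = jackknife_se(sample)
print(estimate, se)
```

With one equally weighted specimen per area, as in this toy input, the result reduces to the familiar standard error of a simple mean. With unequal weights and several specimens per area, it reflects the clustering that ordinary software formulas ignore.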

Even within the limitations of the NHATS system, it is possible to improve the methods used to implement the NHATS so that it can come closer to a realization of the original plan. This chapter includes recommendations for improvements. Even with the improvements, however, the NHATS will have major limitations in its ability to reflect the accumulation of toxic substances in the U.S. population. The committee believes that the limitations are so serious that the NHATS should be replaced with another system for measuring the accumulation of contamination in human bodies.

The committee recommends that the NHATS be replaced with a blood monitoring program as the primary method of measuring toxic or potentially toxic substances in human tissue. The sampling plan can be patterned after the one used in NHANES, whose subjects are close to a true probability sample of the U.S. population. In addition to having a probability sample, the system would permit interviews to be conducted with the sampled persons to obtain data on covariates. Studies could then be made on sources of contamination as well as on amounts in human tissue. Other advantages of using blood are described in Chapter 3.

Collection of blood specimens should be designed in strict accord with the methods of probability sampling at all stages. The methods used should be efficient for giving virtually all persons in the United States a known probability of selection.

During the period in which the NHATS is continued, the selection methods should be revised to reduce the subjective elements in the choices of counties, hospitals, and specimens.

In addition to initiating a blood collection program, the committee recommends the continuation of the collection and analysis of adipose tissue, although possibly on a reduced scale. One reason is to have a continuous time series, and a second is to provide data on substances that cannot be measured adequately in blood.