3
Review of Relevant Data Issues
This chapter addresses the important issues relevant to acquiring and analyzing health data that would inform or comprise suggested sets of leading health indicators. Each issue is presented briefly in order to introduce the constraints on the selection of these indicators and their suggested measures.
Data Sources
A substantial amount of national data is available with which to measure leading health indicators. Among the most important ongoing federal health surveys are (1) the National Health Interview Survey, which continuously collects data on the American population; (2) the National Health and Nutrition Examination Survey (NHANES), which offers physiological as well as health status information at ten-year intervals; and (3) the Behavioral Risk Factor Surveillance System, which documents preventive behaviors and service receipt on a state-by-state basis. A critical additional data source is the state and federal vital records system, which is the most universal of the available data sources relevant to population health.
Additional research databases cover many important health and non-health topics, such as motor vehicle accident rates, injuries, and deaths (National Traffic Safety Board); environmental data (Environmental Protection Agency); tobacco consumption (Internal Revenue Service); patient counseling about primary prevention (National Ambulatory Medical Care Survey); and nutrition (Continuing Survey of Food Intake by Individuals, U.S. Department of Agriculture). These databases cover all ages, both genders, and a wide variety of social, cultural, and economic situations. However, although they can inform many possible leading indicators, availability issues may limit their usefulness: (1) question structure is often not uniform, so data sources may not be comparable; (2) data may not be collected at sufficiently frequent intervals, so information may not be available to measure leading indicators on a routine basis; and (3) investigators and data collectors may not wish to release information that could be related to forthcoming scientific papers and presentations. In addition, some health and non-health survey information may be of great value to leading indicators but may be collected only on local or regional populations, which precludes inferences about the national population.
Another data source that will have to be seriously considered, depending on the indicator sets suggested, is the collection of new survey data that contain the desired information and pertain to the American population and select population groups. There may be attendant costs, the magnitude of which will depend on the extent of the survey, but these can be constrained by “piggy-backing” relevant survey items onto existing surveys that are conducted on an ongoing basis. Some candidate surveys may not be devoted centrally to health issues but may instead explore labor, economic, or other themes.
Physiological Measures
In theory, it would be of great interest to have leading indicators represented by physiological or biochemical measures. Examples include blood pressure levels, blood levels of antibodies that indicate exposure to designated infectious diseases, blood levels of important nutrients such as vitamins, and population-average muscle strength or physical balance capacity as a reflection of physical dysfunction. Many, if not all, of these might be available from a national sample survey such as the NHANES. However, there are formidable impediments to these measures, the most prominent of which is cost: such routine data collection would be extremely expensive. In addition, such data may not be available in a timely manner or for select population groups. The NHANES is performed only once each decade and so would not be suitable to inform policy on a regular basis, and there are no other equivalent surveys of national scope. Finally, the processing of physiological and other laboratory data might add further delays in making the data available on a real-time basis.
Timeliness of Data and Indicator Availability
In addition to the timeliness of physiological information, timeliness is an issue for more conventional data sources. Provision of “final” vital records data may take up to 3 or 4 years after the year in which they were collected. Consequently, measures of leading indicators based on vital records, such as mortality or birth-related information, would have to be based on provisional information and would be subject to change at a later date. Even routinely collected national survey data may be subject to delays of 6 months to 2 years while the data are validated, analyzed, and presented for appropriate use. These delays will affect how frequently and how recently the status of specific indicators can be reported to the population.
Small-Area Analysis
Leading indicators based on survey data or vital records will be of great interest with respect to the health status of the nation. However, as the indicators garner increasing public attention, there will be a desire to assess their status at the state and local levels. Some potential indicators may be pertinent to smaller geographic areas, such as measures that reflect common occurrences (e.g., all-cause mortality or live births) and those that are collected at frequent intervals. However, it may be extremely difficult to infer local findings from national sample survey data. Depending on the indicators, there should be plans to help local communities interpret them, find alternate sources of similar information, or conduct local surveys that are responsive to the specific indicators. For the most part, this is a matter of the statistical stability of the measures: when the number of events in a given time period and jurisdiction is small, confidence in the local estimate of the indicator may be low. The same issue may apply to select population groups defined by age-, gender-, and race- or ethnicity-specific health measures. The number of health events of interest may be too small to yield confident measures, or in some cases there may simply be no substantial sample survey data available on the population of interest, such as elders over age 75 years, Mexican Americans, or persons with physical disability.
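The statistical-stability concern described above can be made concrete with a small calculation. The following is a minimal sketch, not a method prescribed by this report: it assumes that event counts follow a Poisson distribution and uses a normal approximation for the interval, which is only rough for very small counts. The function name and the two jurisdictions are hypothetical.

```python
import math


def rate_with_stability(events, population, per=100_000, z=1.96):
    """Return (rate, (low, high), rse) for a crude event rate.

    Assumes the event count is Poisson distributed, so its standard
    error is sqrt(events). The interval is a normal approximation.
    """
    if events <= 0 or population <= 0:
        raise ValueError("events and population must be positive")
    rate = events / population * per
    se = math.sqrt(events) / population * per
    rse = 1 / math.sqrt(events)  # relative standard error of the rate
    low = max(0.0, rate - z * se)
    high = rate + z * se
    return rate, (low, high), rse


# Hypothetical jurisdictions with the same crude rate (800 per 100,000).
county = rate_with_stability(events=40, population=5_000)
state = rate_with_stability(events=40_000, population=5_000_000)

for name, (rate, (low, high), rse) in [("county", county), ("state", state)]:
    print(f"{name}: {rate:.0f} per 100,000 "
          f"(approx. 95% CI {low:.0f} to {high:.0f}, RSE {rse:.1%})")
```

Both jurisdictions have the same crude rate, but the interval around the county estimate is far wider because it rests on only 40 events. Under this assumption the relative standard error depends only on the number of events, which is why small areas and small population groups yield less stable measures.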
These inherent limitations will to some extent be insurmountable. However, some mechanisms may partially address data requests from local jurisdictions and interest groups seeking information relevant to leading indicator measures. These include, but are not limited to, the following:
- Assist in locating existing local risk factor, vital records, or survey data that might be relevant to the indicator;
- Provide indicator information from a geographically or demographically similar population survey;
- Provide a statistical “toolkit” that would allow extrapolation of national or state statistical information to the demographic characteristics of the local population;
- Suggest a set of analytical techniques that would allow summarization of existing information over longer but more statistically secure intervals, such as “rolling averages”;
- Provide alternatives to the leading indicator measures for purposes of health planning and evaluation.
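Two of the mechanisms listed above, “rolling averages” and the extrapolation of national or state information to local demographic characteristics, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a toolkit specified by this report: the function names, the age groups, and all numbers are hypothetical, and the extrapolation assumes that reference rates for each group apply unchanged to the local population.

```python
def rolling_pooled_rates(events, population, window=3, per=100_000):
    """Pool events and population over consecutive periods of length `window`."""
    if len(events) != len(population):
        raise ValueError("events and population must have the same length")
    if not 1 <= window <= len(events):
        raise ValueError("window must be between 1 and the number of periods")
    rates = []
    for start in range(len(events) - window + 1):
        pooled_events = sum(events[start:start + window])
        pooled_population = sum(population[start:start + window])
        rates.append(pooled_events / pooled_population * per)
    return rates


def synthetic_rate(reference_rates, local_population, per=100_000):
    """Apply group-specific reference rates (per person) to local group counts."""
    if set(reference_rates) != set(local_population):
        raise ValueError("both inputs must use the same groups")
    expected_events = sum(
        reference_rates[group] * local_population[group]
        for group in local_population
    )
    return expected_events / sum(local_population.values()) * per


# Hypothetical small community: annual counts fluctuate widely.
annual_events = [4, 9, 5, 10, 2]
annual_population = [10_000] * 5
print([round(r, 1) for r in rolling_pooled_rates(annual_events, annual_population)])

# Hypothetical national age-specific rates applied to a local age structure.
national_rates = {"0-44": 0.001, "45-64": 0.006, "65+": 0.05}
local_counts = {"0-44": 6_000, "45-64": 3_000, "65+": 1_000}
print(round(synthetic_rate(national_rates, local_counts)))
```

Pooling over three years smooths the year-to-year swings at the cost of timeliness, since each value describes a multi-year period rather than a single year. The synthetic estimate reflects only the local demographic mix; it cannot detect local conditions that differ from the reference population.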
Representativeness and Data Accuracy
There are two general issues relating to the accuracy of health data. The first is the representativeness of sample survey data. Most sample surveys of the United States have complex sampling frames, and this complexity must be addressed in data analysis to obtain valid estimates. The second is data accuracy: does the measure approximate the real situation (validity), and if the question is asked again by the same or a different interviewer, is the same response obtained (reliability)? Most of the data sources that will inform the leading indicators are well established and in general can be presumed to be representative and accurate, as well as scientifically defensible. This is because most measures have been in use for several years and have been evaluated in both public health and research venues. For those health measures known to have modest to moderate deficits in accuracy, such as certain specific causes of death, the final report will indicate and address them. For indicators that are purely subjective, there are still ways to determine their construct and predictive validity. These will also be addressed in the final report if such subjective indicators are included in the recommended indicator sets.
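The point above about complex sampling frames can be illustrated with a simple weighted estimate. This is a minimal sketch with hypothetical data: it assumes that each respondent carries a sample weight equal to the number of people he or she represents, and it covers only the point estimate. Valid standard errors for a complex design also require the strata and sampling-unit information, which this sketch does not handle.

```python
def weighted_prevalence(responses, weights):
    """Estimate prevalence from 0/1 responses using sample weights."""
    if len(responses) != len(weights) or not responses:
        raise ValueError("responses and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(r * w for r, w in zip(responses, weights))
    return weighted_sum / total_weight


# Hypothetical survey that oversamples a small group with a high prevalence:
# the first two respondents each represent 100 people, the last two 900 each.
responses = [1, 1, 0, 0]
weights = [100, 100, 900, 900]

unweighted = sum(responses) / len(responses)
weighted = weighted_prevalence(responses, weights)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

In this example, ignoring the weights would overstate the prevalence fivefold, because the oversampled group counts as half of the respondents but represents only a tenth of the population.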
Ecological Measures: Social and Environmental
It is well established that many of the determinants of population health status do not reside within individuals but are characteristic of the social or physicochemical environment. Examples of important social exposures that pertain to the community include crime rates, poverty rates, the quality of primary and secondary education, and the availability of employment. Examples from the physicochemical environment include air or water pollution levels and the prevalence of workplace safety programs. In addition to general conceptual considerations, the same data-related issues apply. Are measures for these indicators available on a timely basis? Can they be summed to represent the entire nation or broken down to be meaningful at the level of select population groups? Furthermore, are the measures accurate and reliable?
Summary
Issues related to data quality, access, timeliness, and comprehensiveness are central to the selection process of leading indicators, and as specific sets are suggested, the overall feasibility of acquiring informative data will weigh heavily in the final determinations and recommendations of this IOM committee.