administrative data have not been used extensively. A good starting point for such analysis is to examine the frequencies of certain fields for anomalies, such as values that are out of range; to look for inexplicable variation by region, which may suggest variation in data entry practices; and to look for missing periods in the time series. Substantive consistency of the data is an important starting point as well. One example with which we have been wrestling is why less than 100 percent of the AFDC caseload was eligible for Medicaid. We were certain that we had made some error in our record linkage. When we conferred with the welfare agency staff, they also were stymied at first. We eventually discovered that some AFDC recipients are actually covered by private health insurance through their employers. With this information, we were at least able to explain an apparent error.

  • Finally, are the items in the data fields critical to the mission of the program? This issue is related to the first issue noted above. Cutting checks is critical for welfare agencies. If certain types of data are required to cut checks, the data may be considered to be accurate. For example, if a payment cannot be made to an individual until a status that results in a sanction is addressed, one typically expects that the sanction code will be changed so payment can be made. On the other hand, if a particular assessment is not required for a worker to do his or her job, or if an assessment is outside the skill set of the typical worker doing it, one should have concerns about its accuracy (Goerge et al., 1992). For example, foster care workers have been asked to record the disability status of the child on his or her computerized record. In the vast majority of cases, this status has no impact on the worker's decision making. Therefore, even if there is an edit check that requires a particular set of codes, one would not expect the coding to be accurate.
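The exploratory checks described above (field frequencies, out-of-range values, unexplained variation by region, and missing periods in a time series) can be sketched in a few lines of code. The following is a minimal illustration only, not a method taken from any agency system: the record layout, the field names (`region`, `month`, `sanction_code`, `child_age`), the list of valid sanction codes, and the age range are all hypothetical assumptions made for the example.

```python
from collections import Counter

# Hypothetical extract of case records; every field name and code value
# here is an assumption made for illustration.
records = [
    {"case_id": 1, "region": "A", "month": "1995-01", "sanction_code": "00", "child_age": 4},
    {"case_id": 2, "region": "A", "month": "1995-02", "sanction_code": "01", "child_age": 9},
    {"case_id": 3, "region": "A", "month": "1995-04", "sanction_code": "00", "child_age": 16},
    {"case_id": 4, "region": "B", "month": "1995-01", "sanction_code": "99", "child_age": 7},
    {"case_id": 5, "region": "B", "month": "1995-02", "sanction_code": "99", "child_age": 130},
    {"case_id": 6, "region": "B", "month": "1995-04", "sanction_code": "00", "child_age": 2},
]

VALID_SANCTION_CODES = {"00", "01", "02"}  # assumed code list


def field_frequencies(rows, field):
    """Count how often each value of `field` occurs."""
    return Counter(row[field] for row in rows)


def out_of_range(rows, field, low, high):
    """Return rows whose `field` is missing or outside [low, high]."""
    return [
        row for row in rows
        if row[field] is None or not (low <= row[field] <= high)
    ]


def share_by_region(rows, field, value):
    """Share of rows in each region where `field` equals `value`."""
    totals, hits = Counter(), Counter()
    for row in rows:
        totals[row["region"]] += 1
        if row[field] == value:
            hits[row["region"]] += 1
    return {region: hits[region] / totals[region] for region in totals}


def missing_months(rows, start, end):
    """List 'YYYY-MM' labels between start and end (inclusive) with no rows."""
    observed = {row["month"] for row in rows}
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    gaps = []
    while (year, month) <= (end_year, end_month):
        label = f"{year:04d}-{month:02d}"
        if label not in observed:
            gaps.append(label)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


if __name__ == "__main__":
    freq = field_frequencies(records, "sanction_code")
    print("Code frequencies:", dict(freq))
    print("Codes not in the valid list:", sorted(set(freq) - VALID_SANCTION_CODES))
    print("Out-of-range ages:", out_of_range(records, "child_age", 0, 17))
    print("Share coded '99' by region:", share_by_region(records, "sanction_code", "99"))
    print("Months with no records:", missing_months(records, "1995-01", "1995-04"))
```

A check like this only flags candidates for follow-up. As the AFDC and Medicaid example suggests, an apparent anomaly may turn out to have a substantive explanation once agency staff are consulted.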

We will continue to give examples of data quality issues as we discuss ways to address some of them. The following examples center on linking one administrative data set with another to compensate for inadequacies in a single data set when answering a particular question.

The choice-based nature of administrative data can be addressed in part by linking the data to a population-based administrative data set. Such linkages allow one to better understand who is participating in a program and perhaps how they were selected or selected themselves into the program. There are some obvious examples of linking choice-based data to population-based data. In analyzing young children, it is possible to use birth certificate data to better understand which children might be selected into programs such as Women, Infants, and Children (WIC); Early and Periodic Screening, Diagnosis, and Treatment (EPSDT); and foster care. If geographic identifiers are available, administrative data can be linked to census tract information to provide additional information on the context as well as the selection process. For example, knowing how many poor children live in a particular census tract and how many children participate
