names might be a good alternative while providing a better means of protecting individual identities.4
We recommend a number of activities in the cleaning of administrative data for research use. These include:
Examining the internal consistency of the data;
Examining how the data were collected, processed, and maintained before delivery to the researcher;
Taking every opportunity to compare with other data sets, either survey or administrative, through record linkage; and
Most important, getting to know the operations of the program: not just how the administrative data are collected, but also how services are provided, so that inconsistencies in the data can be better understood.
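As an illustration of the first activity, the following is a minimal sketch of an internal consistency check. The field names (`ssn`, `name`, `birth_date`, `start`, `end`), the two rules, and the sample records are hypothetical and chosen only for illustration; an actual program's data would call for rules drawn from knowledge of its operations.

```python
from collections import defaultdict
from datetime import date


def consistency_problems(records):
    """Return (row index, message) pairs for records that contradict themselves."""
    problems = []
    for i, r in enumerate(records):
        # A service spell cannot end before it begins.
        if r["end"] is not None and r["end"] < r["start"]:
            problems.append((i, "spell ends before it starts"))
        # Services cannot start before the person was born.
        if r["birth_date"] > r["start"]:
            problems.append((i, "spell starts before birth date"))
    return problems


def shared_ssns(records):
    """Return SSNs that appear with more than one (name, birth date) pair."""
    people = defaultdict(set)
    for r in records:
        people[r["ssn"]].add((r["name"], r["birth_date"]))
    return {ssn: who for ssn, who in people.items() if len(who) > 1}


# Hypothetical records; the SSN values are placeholders, not real numbers.
records = [
    {"ssn": "111", "name": "A. Smith", "birth_date": date(1990, 3, 1),
     "start": date(1995, 1, 1), "end": date(1996, 1, 1)},
    {"ssn": "111", "name": "B. Jones", "birth_date": date(1988, 7, 9),
     "start": date(1997, 5, 1), "end": date(1997, 2, 1)},
    {"ssn": "222", "name": "C. Lee", "birth_date": date(1999, 1, 1),
     "start": date(1998, 6, 1), "end": None},
]

print(consistency_problems(records))
print(sorted(shared_ssns(records)))
```

Flagged records are candidates for review rather than automatic deletion, since an apparent contradiction may reflect how the program operates rather than a data entry error.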
We also recommend using probabilistic record linkage and not relying on any one identifier for linking records. We believe our analysis above makes this case. The golden rule of record linkage is that there is no such thing as a unique identifier, because records for different individuals can agree on any single identifier. In many cases, for example, the same SSN has been provided to two or more individuals.
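To make the idea concrete, the following is a minimal sketch of one common form of probabilistic linkage, Fellegi-Sunter style match weights. It is not necessarily the procedure used in our analysis. The field names and the m- and u-probabilities are assumptions made up for illustration; in practice they would be estimated from the data.

```python
import math

# Hypothetical probabilities, for illustration only:
#   m = P(field agrees | the two records belong to the same person)
#   u = P(field agrees | the two records belong to different people)
FIELD_PROBS = {
    "ssn":        (0.95, 0.001),
    "last_name":  (0.90, 0.01),
    "first_name": (0.85, 0.02),
    "birth_date": (0.92, 0.005),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum log2 likelihood ratios over fields; missing values contribute nothing."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # no evidence either way from a missing field
        if a == b:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


# Hypothetical records; the SSN values are placeholders, not real numbers.
base = {"ssn": "111", "last_name": "Smith", "first_name": "Ann",
        "birth_date": "1980-01-01"}
same_ssn_only = {"ssn": "111", "last_name": "Jones", "first_name": "Bob",
                 "birth_date": "1975-05-05"}
ssn_typo = {"ssn": "112", "last_name": "Smith", "first_name": "Ann",
            "birth_date": "1980-01-01"}

print(match_weight(base, same_ssn_only))
print(match_weight(base, ssn_typo))
```

Under these assumed probabilities, the pair that agrees on everything except the SSN scores well above the pair that agrees on the SSN alone, which is the behavior that linking on a single identifier cannot provide. Pairs would then be classified as links, non-links, or candidates for clerical review by comparing the weight with chosen thresholds.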
Much of what is discussed above is required because public policy organizations are still, for the most part, in their first generation of information systems. These “legacy” systems are typically mainframe installations a decade old or older that do not take advantage of much of today’s technology. Data entry in the legacy systems, for example, is often quite cumbersome and requires a specialized data entry function. Frontline workers are typically not trained to do this or do not have the time or resources to take on the data entry task. An exception is entitlement programs in some jurisdictions, where the primary activity of eligibility workers is collecting information from individuals and entering it into a computerized eligibility determination tool. The development of new graphical user interfaces that are more worker friendly, in that the screens