"Appendix B: Matching Records Across Databases." Improving State Voter Registration Databases: Final Report. Washington, DC: The National Academies Press, 2010.
Box B.4
Illustrative Records
Record R-1: As written on registration form
County A
Daniel R Smith
123 Post Street
My City
DLN 0873457345
DOB 6/1944
Record R-2: As captured by the Social Security Administration
County B
Dan Randal Smith
456 Adele Lane
Your City
SSN4 5657
DOB 6/1944
Record R-3: As provided by credit header data (version 1 of Record R)
Daniel Randal Smith
DOB 6/1944
Current address: 123 Post Street, My City
Previous address: 456 Adele Lane, Your City
SSN4 5657
Record R-4: As recorded by credit header data (version 2 of Record R)
Daniel Richard Smith
DOB 6/1944
Current address: 123 Post Street, My City
Previous address: 789 Temple Hills, Some Other City
SSN4 1212
is illegible). If there is an error in the UID, a search on the name and the date of birth can retrieve every UID associated with that name and date; the retrieved UID most similar to the one recorded in error would likely be the “correct” UID for the person in question.
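The UID-repair search described above can be sketched as follows. This is a minimal illustration, not the report's method: the record layout, the function names `uid_similarity` and `recover_uid`, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions. The first record borrows the DLN and date of birth from Record R-1 in Box B.4; the other records are invented.

```python
from difflib import SequenceMatcher

# Hypothetical file: (uid, first_name, last_name, dob).
# The first row uses values from Record R-1; the rest are made up.
RECORDS = [
    ("0873457345", "DANIEL", "SMITH", "6/1944"),
    ("0873451111", "DANIEL", "SMITH", "2/1951"),
    ("5550001234", "DANIEL", "SMITH", "6/1944"),
    ("0873457346", "MARIA", "JONES", "6/1944"),
]

def uid_similarity(a, b):
    """Similarity in [0, 1]; SequenceMatcher is an assumed stand-in metric."""
    return SequenceMatcher(None, a, b).ratio()

def recover_uid(bad_uid, first, last, dob, records):
    """Return the UID on file for this name and DOB that is closest to bad_uid."""
    # Step 1: exact search on name and date of birth.
    candidates = [r[0] for r in records if r[1:] == (first, last, dob)]
    if not candidates:
        return None
    # Step 2: among those UIDs, pick the one most similar to the bad UID.
    return max(candidates, key=lambda uid: uid_similarity(bad_uid, uid))

# The query UID has its last two digits transposed.
best = recover_uid("0873457354", "DANIEL", "SMITH", "6/1944", RECORDS)
```

In practice a threshold on the similarity would probably be needed as well, so that a distant UID is not accepted merely because it is the only candidate.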
A more general strategy is needed when any field may contain a typographical error. Here the matching strategy is to search the entire file, applying suitable proximity metrics to determine whether the UID, first name, last name, and date of birth of each record are sufficiently close to those of the query record. The feasibility of this strategy depends on how often invalid UIDs are encountered, because it is not practical to read every record in the database sequentially and perform substantial computation on each one for every query.
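A full-file scan of this kind might look like the sketch below. The field names, the equal weighting of the four fields, the 0.85 threshold, and the use of `difflib.SequenceMatcher` as the proximity metric are assumptions made for illustration. The sample records are hypothetical apart from values borrowed from Record R-1.

```python
from difflib import SequenceMatcher

FIELDS = ("uid", "first", "last", "dob")

def sim(a, b):
    """Per-field proximity in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()

def proximity(query, record):
    """Unweighted mean of the per-field proximities (an assumed scoring rule)."""
    return sum(sim(query[f], record[f]) for f in FIELDS) / len(FIELDS)

def scan(query, records, threshold=0.85):
    """Score every record in the file; the cost grows with file size per query."""
    scored = [(proximity(query, r), r) for r in records]
    close = [(s, r) for s, r in scored if s >= threshold]
    return sorted(close, key=lambda pair: pair[0], reverse=True)

records = [
    {"uid": "0873457345", "first": "Daniel", "last": "Smith", "dob": "6/1944"},
    {"uid": "9912340000", "first": "Maria", "last": "Jones", "dob": "3/1971"},
]
# Query with errors in two fields: transposed UID digits and a misspelled name.
query = {"uid": "0873457354", "first": "Danial", "last": "Smith", "dob": "6/1944"}
hits = scan(query, records)
```

Because `scan` touches every record, its running time per query is proportional to the size of the file, which is the practicality concern raised above.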
The most general strategy involves substantial restructuring of the database to facilitate fast searches. Keys such as first character of first name plus last name plus date of birth, telephone number, or house number plus street name are defined and added to the database so that candidate records can be located without scanning the whole file. Proximity scores are then computed using all appropriate fields, and only records that score sufficiently close to the query record are retrieved for review. Defining the keys and choosing the order in which they are applied requires experience and skill.
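The key-based approach can be sketched as an in-memory index, shown below. This is an illustration under stated assumptions: the key formats, the function names, the crude address parsing (first two tokens), the 0.8 threshold, and `difflib.SequenceMatcher` as the proximity metric are not from the report. The first two records use values from Records R-1 and R-2 in Box B.4; the third is invented.

```python
from collections import defaultdict
from difflib import SequenceMatcher

RECORDS = [
    {"first": "Daniel", "last": "Smith", "dob": "6/1944", "address": "123 Post Street"},
    {"first": "Dan", "last": "Smith", "dob": "6/1944", "address": "456 Adele Lane"},
    {"first": "Maria", "last": "Jones", "dob": "3/1971", "address": "9 Elm Road"},
]

def blocking_keys(rec):
    """Build the three kinds of keys named in the text, where fields exist."""
    keys = []
    if rec.get("first") and rec.get("last") and rec.get("dob"):
        keys.append(("name_dob", rec["first"][0].upper() + rec["last"].upper() + rec["dob"]))
    if rec.get("phone"):
        keys.append(("phone", "".join(c for c in rec["phone"] if c.isdigit())))
    parts = rec.get("address", "").upper().split()
    if len(parts) >= 2:  # assumes "house-number street-name ..." ordering
        keys.append(("address", parts[0] + " " + parts[1]))
    return keys

def build_index(records):
    """Map each key to the positions of the records that carry it."""
    index = defaultdict(set)
    for i, rec in enumerate(records):
        for key in blocking_keys(rec):
            index[key].add(i)
    return index

def candidate_ids(query, index):
    """Union of the records sharing at least one key with the query."""
    ids = set()
    for key in blocking_keys(query):
        ids |= index.get(key, set())
    return ids

def proximity(query, rec, fields=("first", "last", "dob")):
    scores = [SequenceMatcher(None, query[f].upper(), rec[f].upper()).ratio() for f in fields]
    return sum(scores) / len(scores)

def search(query, records, index, threshold=0.8):
    """Score only the candidates and return (score, position), best first."""
    scored = [(proximity(query, records[i]), i) for i in candidate_ids(query, index)]
    return sorted((p for p in scored if p[0] >= threshold), reverse=True)

INDEX = build_index(RECORDS)
QUERY = {"first": "Danial", "last": "Smith", "dob": "6/1944", "address": "123 Post St"}
RESULTS = search(QUERY, RECORDS, INDEX)
```

This sketch takes the union of all keys at once. A production system might instead apply keys one at a time, in a chosen order, and stop once enough close candidates are found.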