B Matching Records Across Databases
Pages 29-40

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 29...
... THE BASIC PROCESS OF MATCHING RECORDS ACROSS DATABASES WITHOUT UNIQUE IDENTIFIERS[1]
The basic element of a voter registration database (VRD) is a record with data contained within specific fields associated with an individual: first name, last name, street address, date of birth, and so on. Databases may differ in the number of fields that a given record contains (for example, one database may include a field for telephone number and another might not)
From page 30...
... For example, if the voter registration database is being checked against a database of felons or dead people, a low rate of false positives is needed to reduce the likelihood that eligible voters are removed from the VRD. Just how low a rate is acceptable is a policy choice.
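One way to turn such a policy target into an operating rule is sketched below. It assumes, hypothetically, that each candidate pair carries a numeric match score and that a hand-reviewed sample records which pairs truly refer to the same person. "False match rate" here means the share of declared matches that are wrong. The function names and sample data are illustrative only and are not drawn from the report.

```python
from typing import List, Optional, Tuple

# Each pair: (match score, True if a reviewer confirmed both records are the same person).
# Scores and labels are assumed inputs; the report does not specify a scoring method.
ScoredPair = Tuple[float, bool]

def false_match_rate(pairs: List[ScoredPair], threshold: float) -> float:
    """Share of pairs declared matches (score >= threshold) that are not true matches."""
    declared = [is_same for score, is_same in pairs if score >= threshold]
    if not declared:
        return 0.0
    return sum(1 for is_same in declared if not is_same) / len(declared)

def lowest_acceptable_threshold(pairs: List[ScoredPair], target: float) -> Optional[float]:
    """Smallest observed score whose false match rate is at or below the policy target."""
    for threshold in sorted({score for score, _ in pairs}):
        if false_match_rate(pairs, threshold) <= target:
            return threshold
    return None  # no observed threshold meets the target

# Hypothetical reviewed sample.
sample = [(0.95, True), (0.90, True), (0.85, False), (0.80, True), (0.70, False), (0.60, False)]
print(lowest_acceptable_threshold(sample, 0.25))  # 0.8
print(lowest_acceptable_threshold(sample, 0.0))   # 0.9
```

The false match rate need not fall steadily as the threshold rises (in this sample it is 0.25 at 0.80 but about 0.33 at 0.85), so a reviewer applying a strict target may want to check every threshold above the one selected.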
From page 31...
... Records will be deemed not to match under the criteria listed above if they share common blank data fields among the fields listed above, except for cases in which the middle name field or suffix field is blank in both records. Records will be deemed not to match under the criteria listed above if one of the fields being compared contains data and the same field in the match record contains no data.
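The two blank-field rules quoted above can be expressed as a small predicate. This is a sketch under stated assumptions: the excerpt does not reproduce the "criteria listed above," so the field names in COMPARED_FIELDS are hypothetical, "blank" is taken to mean missing, empty, or whitespace-only, and non-blank fields are assumed to require exact agreement (ignoring case and surrounding spaces).

```python
from typing import Dict, Optional

# Hypothetical list of compared fields; the actual criteria are not shown in the excerpt.
COMPARED_FIELDS = ("first_name", "middle_name", "last_name", "suffix", "date_of_birth")
# Per the quoted rule, only these fields may be blank in BOTH records.
BOTH_BLANK_ALLOWED = {"middle_name", "suffix"}

def is_blank(value: Optional[str]) -> bool:
    return value is None or value.strip() == ""

def records_match(a: Dict[str, str], b: Dict[str, str]) -> bool:
    for field in COMPARED_FIELDS:
        blank_a, blank_b = is_blank(a.get(field)), is_blank(b.get(field))
        if blank_a and blank_b:
            if field in BOTH_BLANK_ALLOWED:
                continue  # shared blank middle name or suffix does not block a match
            return False  # shared blank in any other compared field: no match
        if blank_a != blank_b:
            return False  # data in one record, no data in the other: no match
        # Assumption: non-blank values must agree exactly, ignoring case and
        # surrounding spaces. The real criteria may differ.
        if a[field].strip().upper() != b[field].strip().upper():
            return False
    return True

vrd = {"first_name": "Ann", "middle_name": "", "last_name": "Lee", "suffix": "", "date_of_birth": "1970-01-01"}
other = {"first_name": "ANN", "middle_name": "", "last_name": "Lee ", "suffix": "", "date_of_birth": "1970-01-01"}
print(records_match(vrd, other))                             # True
print(records_match(vrd, {**other, "middle_name": "B"}))     # False
```

The second call fails because the middle name is present in one record and blank in the other, which the second quoted rule treats as a non-match even though every other field agrees.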
From page 32...
... Finally, the technically oriented comments above presume that the databases to be matched against the VRD are in fact available. In the real world of state voter registration databases, however, with fragmented state control over state social service agencies and departments of motor vehicles, and with state/county tensions over authority for voter registration, the politics of database availability are at least as challenging as the technology for matching.
From page 33...
... Table 12, "Verification of Applications," on page 72 of the EAC report[11] shows that each state has its own set of criteria for verifying applications. They range from states like Pennsylvania, which verifies only through the DMV and the SSA, to Montana, which verifies against the DMV, the SSA, Vital Records, "Match Against Voter Registration Databases," "Tracking Returned Voter ID Cards," "Tracking Returned Disposition Notices," and "Verify Through Other Agency." According to Table 13, "Data Fields for Comparison to Identify Duplications," in the EAC report, 15 states verify using the address; 48 verify the date of birth; 38 verify the driver's license number; 46 verify the names provided by the registrant; 40 verify "Social Security number"; and 10 verify "other" data. The Social Security number check is surely on just the last four digits in most cases, since according to Table 11 (pages 68-69) of the EAC report, only 7 states use the full SSN.
From page 34...
... Upon receipt of the applicable information, the SSA queries its database and returns one of five responses: no match found; one unique match, death indicator absent; one unique match, death indicator present; multiple matches found with at least one lacking a death indicator; multiple matches found but all with death indicator. As noted above, the query is based on searching for exact matches on the applicable information.
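The five-way response can be illustrated with the sketch below. The excerpt does not list the "applicable information," so the query fields used here (first name, last name, date of birth, last four digits of the SSN) are an assumption, as are all class and function names. It models only the decision logic described in the text, not any actual SSA interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable

class SsaResponse(Enum):
    NO_MATCH = "no match found"
    UNIQUE_NO_DEATH = "one unique match, death indicator absent"
    UNIQUE_DEATH = "one unique match, death indicator present"
    MULTIPLE_SOME_LIVING = "multiple matches, at least one lacking a death indicator"
    MULTIPLE_ALL_DEATH = "multiple matches, all with death indicator"

@dataclass(frozen=True)
class SsaEntry:
    # Hypothetical record layout for illustration.
    first_name: str
    last_name: str
    date_of_birth: str
    ssn_last4: str
    death_indicator: bool

def verify(entries: Iterable[SsaEntry], first_name: str, last_name: str,
           date_of_birth: str, ssn_last4: str) -> SsaResponse:
    key = (first_name, last_name, date_of_birth, ssn_last4)
    # Exact matching only: no string comparators, no tolerance for typos.
    hits = [e for e in entries
            if (e.first_name, e.last_name, e.date_of_birth, e.ssn_last4) == key]
    if not hits:
        return SsaResponse.NO_MATCH
    if len(hits) == 1:
        return SsaResponse.UNIQUE_DEATH if hits[0].death_indicator else SsaResponse.UNIQUE_NO_DEATH
    if all(e.death_indicator for e in hits):
        return SsaResponse.MULTIPLE_ALL_DEATH
    return SsaResponse.MULTIPLE_SOME_LIVING

# Hypothetical entries.
db = [
    SsaEntry("ANN", "LEE", "1970-01-01", "1234", False),
    SsaEntry("BOB", "RAY", "1940-03-03", "5555", True),
    SsaEntry("JOHN", "DOE", "1950-05-05", "9999", True),
    SsaEntry("JOHN", "DOE", "1950-05-05", "9999", False),
]
print(verify(db, "ANN", "LEE", "1970-01-01", "1234").value)  # one unique match, death indicator absent
print(verify(db, "ANN", "LEA", "1970-01-01", "1234").value)  # no match found
```

Because the lookup is exact, a one-letter difference in the name ("LEA" for "LEE") produces "no match found," which illustrates how exact matching alone can miss true matches.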
From page 35...
... For purposes of this report, "ad hoc matching" is used to mean matching developed on the basis of intuitive reasoning that is not further validated systematically or analyzed with mathematical rigor. By contrast, systematic matching is based on a formal mathematical approach that develops metrics to measure match efficacy.
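As a minimal illustration of the kind of metric that separates systematic from ad hoc matching, the sketch below measures a matching rule against a hand-labeled sample of record pairs and reports two error rates. The rule (exact agreement on last name and date of birth), the field names, and the sample are hypothetical; the point is only that any rule, however it was devised, can be measured this way.

```python
from typing import Callable, Dict, Iterable, Tuple

Record = Dict[str, str]
LabeledPair = Tuple[Record, Record, bool]  # (record a, record b, truly the same person?)

def evaluate_rule(rule: Callable[[Record, Record], bool],
                  labeled_pairs: Iterable[LabeledPair]) -> Dict[str, float]:
    """Return the false match rate and missed match rate of a rule on labeled pairs."""
    tp = fp = fn = 0
    for a, b, same_person in labeled_pairs:
        declared = rule(a, b)
        if declared and same_person:
            tp += 1
        elif declared:
            fp += 1  # declared a match, but they are different people
        elif same_person:
            fn += 1  # same person, but the rule missed it
    return {
        "false_match_rate": fp / (tp + fp) if tp + fp else 0.0,
        "missed_match_rate": fn / (tp + fn) if tp + fn else 0.0,
    }

def last_name_and_dob(a: Record, b: Record) -> bool:
    """A hypothetical ad hoc rule: exact agreement on last name and date of birth."""
    return a["last"] == b["last"] and a["dob"] == b["dob"]

labeled_sample = [
    ({"last": "SMITH", "dob": "1980-02-03"}, {"last": "SMITH", "dob": "1980-02-03"}, True),
    # Two different people who happen to share both values.
    ({"last": "SMITH", "dob": "1980-02-03"}, {"last": "SMITH", "dob": "1980-02-03"}, False),
    ({"last": "GARCIA", "dob": "1975-05-06"}, {"last": "GARCIA-LOPEZ", "dob": "1975-05-06"}, True),
    ({"last": "LEE", "dob": "1990-01-01"}, {"last": "KIM", "dob": "1990-01-01"}, False),
]
print(evaluate_rule(last_name_and_dob, labeled_sample))
# {'false_match_rate': 0.5, 'missed_match_rate': 0.5}
```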
From page 36...
... Winkler, "String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp.
From page 37...
... Winkler, "Automatically Estimating Record Linkage False Match Rates," Proceedings of the Section on Survey Research Methods, American Statistical Association, CD-ROM. Also available at http://www.census.gov/srd/papers/pdf/rrs2007-05.pdf.
From page 38...
... are optimal in the sense that they can minimize the size of the clerical review region. Further, in many situations, such as with voter registration databases or department of motor vehicle files, it is possible to estimate or give reasonable approximations of the error rates even without training data.[2] The earliest matching-parameter and error-rate estimation procedures are the easiest to implement and are most likely appropriate for VRD files.
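The decision rule behind the clerical review region can be sketched in the Fellegi-Sunter style. Each compared field has two probabilities: m, the chance the field agrees when the records truly match, and u, the chance it agrees when they do not. Agreement adds log2(m/u) to a pair's weight; disagreement adds log2((1-m)/(1-u)). Pairs above an upper threshold are declared matches, pairs below a lower threshold non-matches, and the rest go to clerical review. The m and u values and both thresholds below are hypothetical, fields are assumed to agree independently given match status, and the sketch does not attempt the parameter or error-rate estimation the text mentions.

```python
import math
from typing import Dict, Tuple

# Hypothetical (m, u) probabilities per field; in practice these are estimated.
PARAMS: Dict[str, Tuple[float, float]] = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "date_of_birth": (0.97, 0.003),
}

def field_weight(agrees: bool, m: float, u: float) -> float:
    """log2 likelihood ratio contributed by one field's agreement or disagreement."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))

def total_weight(agreement: Dict[str, bool]) -> float:
    # Summing per-field weights assumes conditional independence across fields.
    return sum(field_weight(agreement[f], m, u) for f, (m, u) in PARAMS.items())

def classify(agreement: Dict[str, bool], upper: float = 10.0, lower: float = 0.0) -> str:
    """Three-way decision; the thresholds here are hypothetical."""
    w = total_weight(agreement)
    if w >= upper:
        return "match"
    if w <= lower:
        return "non-match"
    return "clerical review"

print(classify({"last_name": True, "first_name": True, "date_of_birth": True}))    # match
print(classify({"last_name": True, "first_name": True, "date_of_birth": False}))   # clerical review
print(classify({"last_name": True, "first_name": False, "date_of_birth": False}))  # non-match
```

Moving the two thresholds apart widens the clerical review region and lowers both error rates among automatically decided pairs; the optimality result quoted above concerns how small that region can be made for given error-rate targets.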
From page 39...
... 274-279, 1993; William E Winkler, "Automatically Estimating Record Linkage False Match Rates," Proceedings of the Section on Survey Research Methods, American Statistical Association, CD-ROM, 2006, also at http://www.census.gov/srd/papers/pdf/rrs2007-05.pdf; Thomas R
From page 40...
... Other technical approaches to blocking and string comparators can be found in Fienberg et al.[2]
[1] William E Winkler, "String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp.
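For readers unfamiliar with the two techniques named here, the sketch below shows a simple blocking step (only records sharing a key, here date of birth, are compared) and the Jaro-Winkler string comparator associated with Winkler's work. The blocking key, the function names, and the common textbook settings (prefix scale 0.1, prefix capped at four characters) are assumptions for illustration. This is not the specific method of the report or of Fienberg et al., and production comparators often add refinements, such as applying the prefix bonus only above a similarity cutoff, that are omitted here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Tuple

Record = Dict[str, str]

def candidate_pairs(left: List[Record], right: List[Record],
                    key: Callable[[Record], str]) -> Iterator[Tuple[Record, Record]]:
    """Blocking: yield only pairs whose blocking key agrees."""
    buckets: Dict[str, List[Record]] = defaultdict(list)
    for r in right:
        buckets[key(r)].append(r)
    for l in left:
        for r in buckets.get(key(l), []):
            yield l, r

def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # A character "matches" if it appears within the window and is not already used.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order; half of them are transpositions.
    out_of_order, j = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3

def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings sharing a common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1 - sim)

# Hypothetical records.
registrations = [{"last": "SMITH", "dob": "1980-02-03"}]
dmv = [{"last": "SMYTHE", "dob": "1980-02-03"}, {"last": "SMITH", "dob": "1975-07-08"}]
for reg, other in candidate_pairs(registrations, dmv, key=lambda r: r["dob"]):
    print(other["last"], round(jaro_winkler(reg["last"], other["last"]), 3))  # SMYTHE 0.858
print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```

Blocking on date of birth means the second DMV record is never compared at all; the trade-off is that a true match with a mistyped birth date would also be skipped, which is why blocking keys are usually chosen carefully or several passes with different keys are run.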


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.