STATE VOTER REGISTRATION DATABASES: IMMEDIATE ACTIONS AND FUTURE IMPROVEMENTS

Appendix C
Data Issues

As noted in Appendix B, the quality of data with which matching procedures must work has a significant impact on the rate of false positives and false negatives that result from such procedures.

SOURCES OF VOTER REGISTRATION INFORMATION

The National Voter Registration Act (NVRA) requires state departments of motor vehicles (DMVs) to incorporate the voter registration application into the application for driver’s licenses in a way that does not require the applicant to duplicate any information (except for a second signature). Thus, the DMV is responsible for passing to voter registrars the information needed to register a voter. In most states, the forms are simply sent from DMV offices to the local elections office, where a second manual data entry into the voter registration database (VRD) takes place. In a few states, the data from the form are entered into DMV records, and then the proper information is extracted and sent to the registrar electronically (eliminating the need for a second data entry). State DMVs are also required to transmit changes of address received for driver’s licenses to the appropriate voter registrar for a change of registration address unless the individual involved indicates otherwise.

The NVRA also requires public assistance and disability service agencies to provide voters with voter registration forms that voters complete manually and then return to the agency or department for delivery to the voter registrar, or to certify in writing that the individual applying for assistance or service has declined the opportunity to register to vote.1 (However, the committee also recognizes that election officials are not generally in the chain of command for these agencies, a fact that often leads to a certain amount of bureaucratic politics as Agency A seeks to persuade Agency B to help carry out the mission of Agency A.) The availability of registration forms in these many locations increases the opportunities for eligible voters to register, but can also result in duplicate registrations that are sent to election agencies, and if voters themselves fill out the form manually, they can and do make mistakes.

DATA CAPTURE AND QUALITY

Under all procedures used for voter registration in the United States today, the prospective voter must take action to register to vote.2 Through such action, the voter provides certain pieces of information that eventually wind up in a voter registration database. If this process could be guaranteed to be error-free, many fewer problems of data quality would exist. But unfortunately, this is not the case.

It is useful to distinguish between three categories of error that may be introduced in the journey of these pieces of information from the voter’s head to the database. Usually, the voter provides

1

The committee received testimony during its second workshop that many state assistance and service agencies are not following through with this obligation.

2

Exceptions arise from the fact that some states allow same-day registration and that North Dakota does not require voter registration.




handwritten information on a form. The form is transmitted or carried to the voter registrar, where the data are transcribed from the form into machine-readable form, usually by a data-entry clerk who performs this task manually. Once in machine-readable form, the data may then be processed in some minimal fashion before they are stored permanently in the database. All of these steps can result in some kind of error.

A variety of problems complicate the data capture process. For example, data capture efforts are often compromised by:

• Illegibility. The information on most voter registration forms is handwritten, and in many cases the handwriting is difficult to read, entirely illegible, or misunderstood. This makes entering the information more challenging and increases the potential for errors in the voter registration records entered in the database.

• Inaccurate or incomplete voter registration information. Applicants may fill out the forms inaccurately or incompletely if they misunderstand what information is required. Although applicants make such errors in all venues in which they fill out applications, they are more likely to make errors when the venue is crowded, noisy, and chaotic and when those available to help applicants do not have time or are not knowledgeable enough to answer questions about the applications. These conditions are often met during voter registration drives that take place in locations other than election offices: shopping centers, university campuses, and other locations that attract large crowds. In addition, voter registration drives are frequently staffed by volunteers, some of whom may not have sufficient knowledge of the process and procedures for collecting voter information; this may be especially true when volunteers are brought in from out of town.

• Missing voter registrations. For example, Jim Dickson of the American Association of People with Disabilities testified to the committee that the volume of voter registration applications received from state social service and disability agencies (a service to potential voters that the NVRA directs these agencies to provide) has dropped significantly since the initial implementation of the law in 1995. The committee notes, however, that the cause of this drop remains unclear: it is unknown whether the drop reflects failures in the social service agencies to meet their legal obligations; a change in the demographics and/or preferences of those applying for social services; problems in conveying completed applications to voter registrars; or some other reason(s).

• Repeated (duplicate) registration applications. An individual may submit multiple voter registration applications “just to be sure,” or because s/he may have forgotten that s/he is already registered to vote. Although voter registrars are supposed to have mechanisms in place to screen duplicate registrations, the screening process does not always work smoothly, and sometimes the same individual may be registered more than once.

• Inconsistencies in submitted information. In filling out forms, individuals are often unintentionally inconsistent in the information they provide, especially if a period of time has elapsed between multiple form-fillings (either across registrations or between registrations and other activities such as applying for a driver’s license or an SSN). An individual may use a nickname in one case and the full legal name in another, or include a middle initial in one and omit it in another. Such inconsistencies may arise because of a lack of clarity in the instructions given to the individual about what specific information to provide or a lack of recall about what s/he entered on a previous occasion. In other cases, the information requested may have changed (names sometimes change upon marriage, for example).

• Data entry errors. Typographical errors are made by hitting one key when another was intended. Transposition errors transpose two letters in a field, or even two fields. Even with carefully handwritten registration forms, it is possible that transcription/keying error may approach 5 percent or more in fields such as first name, last name, and date of birth if the data entry clerks lack adequate training and monitoring.3

• Systematic errors stemming from different data representation conventions. Among the most important are those associated with dates and names.
  - In many countries (including most of Europe), 01/03/2007 means March 1, 2007, whereas in the United States it means January 3, 2007. A naturalized U.S. citizen is perhaps more likely to make such a mistake than an individual raised in the United States.
  - In many Asian nations, the family name is always stated first. Kim Jong-il is a Korean name; the family name is Kim, and the given name is Jong-il. However, it would be easy for an American to mistake Kim for a first name, perhaps an abbreviation of Kimberly, and Jong-il for a last name.
  - Names normally rendered in an alphabet other than a Roman alphabet may well be spelled inconsistently when transcribed into a Roman alphabet. This problem is of particular concern to those of Russian, Asian, Israeli, and Arabic descent.

TABLE C.1 Illustrative Sources of Error in Names

Source of Error | Name on Voter Registration Form (a) | Name in Database
Typos | Pierce | Peirce or Pearce or Perce or Pierrce
Transliteration | Mohammad | Muhammed
Marriage | Mary Pierce (maiden name Owens) | Mary Owens or Mrs. Martin Pierce
Nickname | Sam Pierce | Samuel Pierce
Transposed field | Bao Lu | Lu Bao
Double names | “Mary Ann” (first) “Pierce” (last) | “Mary” (first) “Ann” (middle) “Pierce” (last)
Hyphenated name | “Mary” (first) “Owens-Pierce” (last) | “Mary” (first) “Owens” (middle) “Pierce” (last)
Punctuation | al-Amin | al Amin
Omitted middle name or initial | John Philip Pierce | John Pierce

(a) Handwriting assumed to be readable.
SOURCE for all rows but the last: Justin Levitt, Wendy R. Weiser, and Ana Muñoz, Making the List: Database Matching and Verification Processes for Voter Registration, Brennan Center, New York University, 2006. Reprinted with permission.
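Several of the typographical variants in Table C.1 differ from the name on the form by a single keystroke. One common way to quantify such closeness, offered here only as an illustrative sketch and not as a technique the committee prescribes, is the optimal string alignment distance, which counts insertions, deletions, substitutions, and transpositions of adjacent characters. The function name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions, substitutions,
    and transpositions of adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i leading characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j leading characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ie" keyed as "ei".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# The "Typos" row of Table C.1, compared with the name on the form.
for variant in ["Peirce", "Pearce", "Perce", "Pierrce"]:
    print(variant, osa_distance("Pierce", variant))
```

A small distance only flags a pair of values as candidates for review; it cannot show which spelling is the correct one.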
These factors generate a wide range of errors. Table C.1 describes a variety of additional error types that may also exist in name fields; Table C.2 describes some possible errors in date-of-birth fields. Voter registrars are left with the problem of managing an environment in which such errors are common.

3

See J.J. Pollock and A. Zamora, “Automatic Spelling Correction in Scientific and Scholarly Text,” Communications of the ACM 27(4):358-368, 1984. In a highly controlled situation, keying error rates were in excess of 2 percent (in keystrokes). A 1-2 percent error rate in keystrokes could easily yield a 5 percent error rate in fields.
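The date-convention problem noted above (01/03/2007 read as January 3 or as March 1) can be made concrete with a short sketch. It assumes, purely for illustration, that a date arrives as a numeric string with a four-digit year; the helper name `possible_dates` is hypothetical and not drawn from any registration system.

```python
from datetime import date


def possible_dates(text: str) -> set[date]:
    """Return the calendar dates that a numeric A/B/YYYY string could denote
    under the month-first and the day-first conventions."""
    first, second, year = (int(part) for part in text.split("/"))
    candidates = set()
    for month, day in ((first, second), (second, first)):
        try:
            candidates.add(date(year, month, day))
        except ValueError:
            pass  # not a valid calendar date under this reading
    return candidates


print(sorted(possible_dates("01/03/2007")))  # two readings: ambiguous
print(sorted(possible_dates("01/13/2007")))  # only one valid reading
```

A string that yields two candidate dates cannot be resolved from the field alone and would need other information, such as confirmation from the voter.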

TABLE C.2 Illustrative Sources of Error in Dates of Birth

Source of Error | On Voter Registration Form | In Database (Voter, DMV, and/or SSA)
Typos | 01/03/05 | 02/03/05 or 1/00/05 or 1/03/05 or 11/03/05
Transposed field | 01/03/05 | 03/01/05 or 05/01/03
Invented default | 01/03/05 | 01/01/05 (submitted only as January 2005)

SOURCE: Justin Levitt, Wendy R. Weiser, and Ana Muñoz, Making the List: Database Matching and Verification Processes for Voter Registration, Brennan Center, New York University, 2006. Reprinted with permission.

Problems with data capture and errors in the voter registration database can have an important effect on the individuals whose data are involved. The voter believes that he or she is properly registered, but the registration may have been rejected as a result of the inaccurate, incomplete, or illegible information on the form, or the voter may not know to bring to the polls on Election Day the additional identification required because of a problem with his or her form. In some cases, the voter may be entirely absent from the voter registration rolls.

Errors in databases will accumulate if action is not taken to correct them promptly. For example, assume that 16 percent of all records in a database reflect at least one change in a field per year. After 3 years, about 40 percent of the records will be different (1 - 0.84^3 ≈ 0.41, if changes in successive years are independent). This means that if the database is not updated over that period, about 40 percent of its records will be in error. In addition, it may become more difficult over time to correct errors that occurred at previous time periods in the absence of mechanisms to keep track of individuals uniquely (for example, through driver’s license numbers or through secondary systems that keep history); that is, errors can compound as multiple matches and corrections take place. For instance, if a state VRD file has dates of birth corrected using a semiautomatic procedure that utilizes matching with a state DMV file, then incorrect matching or an erroneous date of birth in the DMV file will induce error in the state VRD file. Subsequent matching against state social services files or SSA files to determine whether an individual is deceased will either fail or possibly induce additional error.

IMPROVING DATA CAPTURE AND QUALITY

A number of approaches are available for improving the quality of data within a VRD. However, all such approaches require certain skills and resources on a continuing basis. This last point is important: because of ongoing changes in the population eligible to vote, a continuous effort to maintain data quality in a voter registration database is needed if the database is not to fall into an error-filled state. Inadequate resources for database maintenance will result in greater amounts of error.

The remainder of this section addresses a variety of ways for improving data quality. However, one often-used method for improving data quality is not an option for voter registrars: starting over from scratch. In many cases, databases with errors that accumulate over time eventually become so filled with erroneous data that it is more cost-effective to rebuild the databases from scratch than to try to clean them up. Voter registrars in Kentucky did so in 1973, requiring all voters to re-register. However, “starting from scratch” for a VRD would mean purging everyone from the VRD, and since the NVRA establishes specific criteria for removing voters from registration lists, such an act would be contrary to existing law.
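To see why continuous maintenance matters, the accumulation example given earlier in this appendix (16 percent of records changing per year) can be extended over several years. The sketch below assumes, as a simplification not stated in the text, that each record changes independently with the same probability every year; the function name is hypothetical.

```python
def fraction_changed(annual_change_rate: float, years: int) -> float:
    """Fraction of records with at least one change after `years`, assuming
    each record changes independently with the same probability each year."""
    return 1.0 - (1.0 - annual_change_rate) ** years


# Share of records out of date if no updates are applied for 1 to 5 years.
for year in range(1, 6):
    print(year, round(fraction_changed(0.16, year), 3))
```

Under these assumptions the share after 3 years is about 0.41, consistent with the 40 percent figure quoted in the text.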

Human-assisted Data Cleaning

Many traditional systems for managing administrative lists incorporate procedures that improve data capture and remove some typographical variations. The data-capture procedures are intended to improve the quality (legibility and completeness) of the information on written forms and the subsequent keying of the data-derived information into computer files. In traditional systems, list cleanup is often performed by skilled specialists who can determine name variations or possible missing information in the main administrative files. Using experience and auxiliary information, the specialists might determine that “Johm Smeth” must really be “John Smith.” They might determine that the date of birth “06139182” (in the form MMDDYYYY) was really meant to be “06131982.” The intent of the specialists’ corrections is to remove typographical errors in the main administrative list. A cleaned-up list allows more effective searching of large files and effective comparison of pairs of records. For a new record “John Smith” with date of birth “06131982,” it is much easier to search for “John Smith” in the corrected administrative list and compare dates of birth, or to search for “06131982.”

Note that some types of typographical error simply cannot be identified using such a technique. Although automated accounting for the presence of typographical errors in a database is often possible, certain “errors” may not in fact be errors. “Bill” is only one character away from “Bull,” and indeed the “i” in Bill may be a mistyped “u,” but “Bull” is used as a first name from time to time as well. There are no known ways to handle such “errors” automatically without the availability of tertiary reference data.

In some instances, such as UK national health files or U.S. SSA files, a full-time staff locates, follows up, and corrects for certain types of errors. This effort can significantly reduce the number of individuals who are represented in the lists two or more times. If these cleaned-up lists are used in verifying information associated with other lists, then these other lists are much less likely to induce additional error than are lists that have not undergone intense cleanup.

Voter-assisted Error Correction

New registrants can sometimes be given the opportunity to correct erroneous information. For example, the name and address provided on a registration card may be legible, but the date of birth illegible. If enough legible information is provided, voter registrars can contact the voter to inform him/her of the problem and ask him/her to resubmit correct information. In many polling places today, voters can correct registration information: a poll worker notes an error on the registry or on another log, and the election officials can update their registry as part of the postelection canvass. In addition, voters in many states now receive confirmation cards that confirm their registrations; these cards provide the voter with an opportunity to review the information that is part of their registration.

To help minimize keying errors, registrars might ask individuals with access to the relevant facilities to correct their information online through a Web site; security would be provided by a special code or password returned to the individual with the data correction request to ensure that only the proper individual could view or correct the information.

Electronic Transmission of Voter Registration Applications

Important sources of voter registration applications include departments of motor vehicles and social service agencies. Today’s processes usually require individuals to register using handwriting on paper forms, a process that is highly subject to error upon data entry.
But there is no reason in principle why the information relevant to voter registration that DMVs and social service agencies collect (and almost surely capture in electronic form for use in their own systems) could not be transmitted electronically to voter registrars, thereby eliminating errors associated with repeated keying (once for the agency in question and a second time for the VRD). Some states also require that the voter provide a signature for the voter registration record, which is used for verification against pollbooks or ballot return envelopes in the mail-in voting process. An electronic transfer of voter registration forms must therefore accommodate in some way the need for the signature. Though recommended by the Election Assistance Commission in its Voluntary Guidance on Implementation of Statewide Voter Registration Lists,4 electronic transmission is not required by any present regulation and would entail some nontrivial work to implement on a large scale, such as agreement on the format for transmission and the construction of additional software to permit the exchange of information.

Use of Other Databases (Including Third-party Data)

Yet another way to correct errors in an existing database is to match as many of its records as possible with those in another complete, (nearly) error-free database (or several such databases) and to use these other databases as “truth” for error correction. If no such complete high-quality databases are available, other databases can still be useful to triangulate on the correct information, but the error correction process will take a lot more work under these circumstances.

At the same time, the fact that other databases may contain data with fewer errors does not mean that the information they provide should automatically be used to update the voter’s registration. Discrepancies between the voter’s registration information as represented in the VRD and data in these other databases are indicators of possible errors in the VRD, but in most cases voter registrars are required by law or policy to follow up on such discrepancies by contacting the voter to inquire as to which information is accurate: the voter database or the other database used in the match.

Third-party data, or secondary data, of high quality can be used to reduce ambiguity in record-level matches because they can be used to associate the same identity with a different record using data values based on a different time period or on differences in the values recorded. Sources of such data include telephone books and credit header data (credit records), which can be used to determine or validate middle names, addresses, dates of birth, and so on. Other generally available sources of data sometimes worth consideration include databases of property ownership, magazine subscriptions, and so on. Data aggregators, such as Lexis-Nexis, Choicepoint, and Acxiom, collect data from a variety of disparate sources and sell data on a record-by-record request basis over an Internet connection, although the expense of access to such data may be a significant barrier to their use.

Third-party data vary in quality, with some sources worse than others. In addition, data collected to serve one purpose are sometimes less well suited for another purpose. These issues with quality may affect judgments about the suitability of available third-party data for correcting errors in a VRD.
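The follow-up policy described above, in which a discrepancy is treated as an indicator of a possible error and not as a correction, can be sketched as a comparison that reports differences without changing the VRD record. The record layout, field names, and function name below are hypothetical, and the comparison rule (ignoring case and surrounding spaces) is only one possible choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    field_name: str
    vrd_value: str
    other_value: str


def find_discrepancies(vrd_record: dict, other_record: dict, fields: list) -> list:
    """List the fields whose values differ between a VRD record and a record
    from another database. Nothing is overwritten; the result is a list of
    items for follow-up with the voter."""
    found = []
    for name in fields:
        vrd_value = vrd_record.get(name, "")
        other_value = other_record.get(name, "")
        if vrd_value.strip().lower() != other_value.strip().lower():
            found.append(Discrepancy(name, vrd_value, other_value))
    return found


# Hypothetical records for the same person in a VRD and a DMV file.
vrd = {"first_name": "Sam", "last_name": "Pierce", "date_of_birth": "1982-06-13"}
dmv = {"first_name": "Samuel", "last_name": "Pierce", "date_of_birth": "1982-06-13"}

for item in find_discrepancies(vrd, dmv, ["first_name", "last_name", "date_of_birth"]):
    print(f"Follow up with voter on {item.field_name}: "
          f"{item.vrd_value!r} vs {item.other_value!r}")
```

Keeping detection separate from any update leaves the decision about which value is accurate with the registrar and the voter.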
Note that 94 percent of the parties responding to a 2007 National Association of State Election Directors survey on voter registration practices indicated that they did not use secondary data sources such as phone directories or real-property records to reconstruct a voter’s information if information supplied by the voter on a voter registration card was missing or incomplete.5

A special source of third-party data for a given state is the VRDs of other states. That is, under most circumstances, an individual can vote in only one jurisdiction. Generally, it violates no law for an individual to be registered to vote in more than one jurisdiction, but the presence of the same person in the VRDs of two states suggests that one of those registrations does not accurately reflect the status of that individual.

A number of states have agreed to exchange voter registration data in a couple of ongoing collaborations. Only preliminary data from these collaborations are available at this point, and the committee looks forward to analyzing more detailed data from these projects in the future, including information on the fields they are matching, the number of potential duplicates on the lists, and the number of actual duplicates they remove from their lists. A start at tracking some efforts at interstate checking of duplicate registrations can be found in the EAC report Impact of the National Voter Registration Act on Federal Elections 2005-2006.6 Page 76 of that report notes that at least three groups of states have checked for such duplicates at least once: District of Columbia, Virginia, and Maryland; Minnesota, Missouri, Nebraska, Kansas, and Iowa; and Kentucky, South Carolina, and Tennessee.

Improving match accuracy can contribute to improved completeness of a VRD. Matching, whether performed by automated processes or manual review, can benefit from tertiary (third-party) data. When such external data are carefully harnessed for improved match accuracy, systems can more often resolve ambiguities without human involvement. Reducing the number of exceptions necessitating human review and judgment increases the repeatability of list maintenance.

Such data can be used in two ways. First, such data can be acquired across the entire population and made available for error-correction processes. Second, data can be selectively made available only when they are needed to resolve ambiguities in any putative record-level match, an approach that minimizes privacy concerns because it obtains additional data on individuals only when they are needed.7

When using third-party data to enhance matching accuracy, additional logging and accountability requirements must be introduced. Each third-party record requested and received must be retained in its original form until it is no longer needed (for example, until the point that the voter has confirmed any changes that may have resulted from the use of such data). Furthermore, any third-party record used to improve a match should be logged and accounted for similarly. In addition, government matching with third-party datasets raises privacy concerns (for example, if credit header data are merged with voter history data).

4

Available at http://www.eac.gov/election/docs/statewide_registration_guidelines_072605.pdf/attachment_download/file.

5

See http://www.surveymonkey.com/sr.aspx?sm=jK8QyNXCIwgdaY4SjASFyN0v4coilbBEvQxDuSyIS4s_3d.

6

Available at http://www.eac.gov/clearinghouse/docs/the-impact-of-the-national-voter-registration-act-on-federal-elections-2005-2006/attachment_download/file. See also Thad Hall and Michael Alvarez, “The Next Big Election Challenge: Developing Electronic Data Transaction Standards for Election Administration,” IBM Center for the Business of Government, 2005, available at http://www.vote.caltech.edu/media/documents/AlvarezReport.pdf.

7

This technique is explained in detail in Paul Rosenzweig and Jeff Jonas, “Correcting False Positives: Redress and the Watch List Conundrum,” Legal Memorandum 17, The Heritage Foundation, June 17, 2005, available at http://www.heritage.org/Research/HomelandSecurity/lm17.cfm.

COLLATERAL ISSUES IN IMPROVING DATA QUALITY

Application of the techniques discussed above is intended to improve the quality of the data in a VRD by making the data more accurate; that is, these techniques allow erroneous data to be changed into correct data. But their success in doing so is not guaranteed: use of the techniques may introduce additional error, or the original data may in fact have been correct. Thus, it may well be advisable to keep the old data as well as the new, but with a flag that indicates that the old data have been corrected. In addition, a policy must be established regarding notification of the voter if a field is changed. The cost of such notification must be weighed against the value of ensuring with high confidence that the updated data are correct.
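The suggestion to keep the old data alongside the new, with a flag marking the correction, can be sketched as a small data structure. This is one possible arrangement under stated assumptions, not a description of any state's system; the class, attribute, and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldValue:
    """A field that keeps superseded values instead of discarding them."""
    current: str
    corrected: bool = False                      # flag: original value was changed
    history: list = field(default_factory=list)  # (old value, source) pairs

    def correct(self, new_value: str, source: str) -> None:
        """Replace the current value, keeping the old one and noting the source."""
        self.history.append((self.current, source))
        self.current = new_value
        self.corrected = True

    def revert(self) -> Optional[str]:
        """Undo the latest correction, for example if the voter reports that
        the old value was right. Returns the value that was removed."""
        if not self.history:
            return None
        replaced = self.current
        self.current, _ = self.history.pop()
        self.corrected = bool(self.history)
        return replaced


# The date-of-birth example from the discussion of human-assisted cleaning.
dob = FieldValue("06139182")
dob.correct("06131982", source="clerical review")
print(dob.current, dob.corrected, dob.history)
```

Because the earlier value and the source of the change are retained, a correction that turns out to be wrong can be undone after the voter is notified.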