3
Technical Considerations for Voter Registration Databases

3.1
DATA CAPTURE AND QUALITY

The data contained in a VRD can be characterized with respect to two different attributes—accuracy and completeness. For purposes of this report, accuracy refers to the factual correctness of the data that exist in the database, whereas completeness refers to the presence in the database of all individuals who should be in the database. If the database is perfect, it is both 100 percent accurate and 100 percent complete—that is, all of the data in the database are correct (and thus the database contains no individual who should not be in the database), and the database includes all of the individuals who should be in the database. Notice that in this formulation, accuracy does not subsume completeness, so that a database must be characterized with respect to both attributes.

This usage of the term “accurate” appears to be consistent with the meaning of the word in common discourse. However, the reader is cautioned that some other commentators and analysts use the term “accurate” to mean both “factually correct” and “complete.”

As is the case with all other databases, the utility of a VRD depends strongly on the quality of the data it contains (the accuracy and completeness of the data), although a variety of processes can be applied to the data in order to improve their quality.

One common source of error in the data is data entry. Applicants typically submit handwritten voter registration forms that are sent to the election official. The applicant can make a mistake, forget to answer a question, or not write legibly. The form or its information could be altered in transmission (a field could get smudged or torn or otherwise damaged in postal handling, for example). Keying errors result in mistranscriptions.

Another source of error is the quality of other lists that are compared with VRDs. The quality of other lists similarly depends on the procedures for data collection and entry; methods employed to minimize errors in the data, such as removing duplicates and other anomalies from these secondary databases; and staff training and audits, among other aspects.

Moreover, the different purposes for which secondary data are collected can limit their use for other purposes and may not fully address what is needed for the purposes of voter registration databases. For instance, the USPS compiles change-of-address data when customers request mail forwarding through the USPS NCOA system. However, the USPS has defined its information services so as to serve its primary business function, that is, without particular concern for the needs of election officials—and



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

in particular, it does not collect date-of-birth information because such information is not related to the primary business purpose of the USPS. Thus, the NCOA system cannot be queried with name and date of birth to learn an individual’s new address. Furthermore, because of privacy considerations, the USPS limits the disclosure of change-of-address information, and thus a name and old address must be presented before a new address can be provided. This limitation is significant because it means that an election official cannot simply query the NCOA database for the new address of an individual known to have moved.

A more detailed discussion of data capture and quality can be found in Appendix C.

3.2
DATABASE INTEROPERABILITY

Database interoperability arises as a requirement because election officials must perform a variety of tasks that involve other databases, ranging from other state VRDs to lists of deceased persons as described above. From a technical standpoint, database interoperability refers to the capability of two databases to exchange data (perhaps with a third-party application) and to use the exchanged data.1 Data exchange involves transmitting and receiving data between two systems, by whatever means, in a way that maintains the usability (preserves the structure and formatting) of the data. Data use depends on the corresponding data fields having the same meaning in each database.

Transmitting and receiving data involve moving the electronic bits that represent the data in question through some channel. In practice, this involves either a communications network connecting the two database systems or use of a physical medium such as a CD-ROM to carry the data. Using a direct linkage (e.g., an Internet connection) provides for real-time communications—the data that are transferred to the receiving system can be kept current with changes.
Use of a physical medium generally “batches” the data to be transferred, and thus changes to the sending system’s database will reach the recipient with some delay and may not reflect the most recent changes.

As for the data that are passed through either approach, they must be formatted so that one system can write them and the other can read them. A common approach to achieving formatting compatibility is for the sending system to “export” its data into a known file format (e.g., a comma-delimited file) and for the resulting file to be transmitted or carried to the receiving system.

Data usability is guaranteed if all databases use the same data definitions.2 However, in the situations faced by election officials, data definitions of the comparison databases (the databases containing the data with which VRD data must be compared) may well be different. Ensuring the similarity of data definitions goes beyond classic definitions such as “integer” or “character string”—it also includes issues such as formatting and data semantics. For example, System A may define dates in a mm-dd-yyyy format, and System B in a dd-mm-yyyy format. The semantics of the two systems may differ: System A may use standardized addresses and strip all punctuation from name fields, whereas System B may not use standardized addresses and may retain punctuation in name fields. Or, System A may include name suffixes in the last name field, and System B may provide a separate field for name suffixes. Such definitional differences may increase the difficulties of comparing fields unless the definitions of these fields can be reconciled. A variety of technical approaches have been developed for dealing with differing standards or incompatible definitions; see Box 3.1. In any event, data definitions must either match or be transformed in a way that preserves the semantics of the data.
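As an illustration only, the reconciliation of the System A and System B differences described above can be sketched in a few lines of Python. Everything specific in the sketch is an assumption made for the example and does not come from any actual system: the column names (`last_name`, `suffix`, `dob`), the short suffix list, the choice of ISO dates as the common format, and the sample record itself.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical suffix list; a real system would need a longer, vetted one.
SUFFIXES = {"JR", "SR", "II", "III", "IV"}

def clean_name(text: str) -> str:
    """Uppercase a name field and strip punctuation and digits."""
    return re.sub(r"[^A-Z ]", "", text.upper()).strip()

def split_suffix(last_name: str) -> tuple[str, str]:
    """Split a trailing suffix out of a combined last-name field."""
    parts = clean_name(last_name).split()
    if len(parts) > 1 and parts[-1] in SUFFIXES:
        return " ".join(parts[:-1]), parts[-1]
    return " ".join(parts), ""

def from_system_a(row: dict) -> dict:
    """System A (assumed): mm-dd-yyyy dates, suffix inside the last-name field."""
    last, suffix = split_suffix(row["last_name"])
    dob = datetime.strptime(row["dob"], "%m-%d-%Y").date().isoformat()
    return {"last": last, "suffix": suffix, "dob": dob}

def from_system_b(row: dict) -> dict:
    """System B (assumed): dd-mm-yyyy dates, separate suffix field, punctuation kept."""
    dob = datetime.strptime(row["dob"], "%d-%m-%Y").date().isoformat()
    return {
        "last": clean_name(row["last_name"]),
        "suffix": clean_name(row["suffix"]),
        "dob": dob,
    }

# Hypothetical comma-delimited exports describing the same (invented) person.
export_a = "last_name,dob\nSMITH JR,04-28-1963\n"
export_b = "last_name,suffix,dob\nSmith,Jr.,28-04-1963\n"

rec_a = from_system_a(next(csv.DictReader(io.StringIO(export_a))))
rec_b = from_system_b(next(csv.DictReader(io.StringIO(export_b))))

print(rec_a)
print(rec_a == rec_b)  # True: both exports reduce to the same common-format record
```

Each system keeps its own definitions internally and is translated only at the point of exchange, which is one way to preserve the semantics of the data while making fields comparable.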
1 In colloquial usage, database interoperability sometimes has a broader meaning that entails data access, of which data exchange is a subset. Database interoperability without data exchange, for example, can refer to the ability of election officials in State A to view records and perform searches in the VRD of State B. Although such a capability can be helpful in individual instances, the inability to perform data exchange prevents any large-scale operation involving either database.

2 On the other hand, a release of a given database system may have data definitions that are somewhat different from those of an earlier release. System developers know that such changes create operational chaos, and thus avoid such changes whenever possible.

Box 3.1
Approaches for Achieving Data Compatibility

There are a number of approaches for reconciling data definitions:

• The data translator approach requires two systems that need to interoperate to have a translator that converts one set of data definitions into the other. Data translators are probably the simplest and most straightforward approach to achieving data interoperability, although the data translator approach does not scale upward if interoperability among all databases is required.

• The common format approach calls for each system to use its own data definitions internally. However, exchanges of data with other systems are conducted by using a common data standard into which data must be translated before being transmitted to another system. Any system using these data then downloads them in the common format and retranslates the data into locally meaningful terms before the data are used.

• The data server approach (an extension of the common format approach) is based on the separation of data from the applications that use the data. When a system requires data, it connects to a data server that provides the data. Thus, enforcement of definitions can be limited to just a few servers rather than a myriad of applications. By moving the data into a system separate from the individual applications, this approach facilitates reuse of data in new, unanticipated ways.

Achieving interoperability between different systems is potentially complicated by the fact that these systems are built or acquired by a variety of agencies (election officials, departments of motor vehicles, departments of vital statistics, departments of correction) that are not generally subject to the same overall chain of command and thus may not implement compatible data definitions. These agencies are concerned primarily with developing systems optimized to serve their own mission needs.
Thus, they generally have little interest (or funding or incentive) to focus very much attention on the needs of the other agencies with which they may some day need interoperability, and are likely to pay minimum attention even to mandated tasks that are outside their primary mission needs. As an illustration, consider that voter registration databases must provide for recording the physical residential address of record for the voter as well as a mailing address; the former is essential for the determination of voting eligibility, precinct boundaries, and ballot style assignment (making sure that a given voter registered for a specific address receives the correct ballot for that address). The database systems of other agencies may only support fields for mailing address.

Lastly, any discussion of database interoperability is incomplete without mentioning its organizational dimensions. Specifically, interoperability between a state VRD and another database operated by federal, state, or county authorities depends on cooperation between the election official and the relevant federal, state, or county authority to exchange and/or process the relevant data. No technical solution can force agencies to cooperate with each other, and if such cooperation is not forthcoming, data exchanges may well be more infrequent or the data could be prepared more poorly than would otherwise be the case.

3.3
MATCHING

Database interoperability can be regarded as the process through which the data from one database can be made available in a useful and understandable format for a meaningful comparison to the data from another database. The matching process is the essence of the comparison in the context of VRDs.

Adding new voters to the VRD and maintaining the VRD both require a procedure by which attributes of one data registration record are compared to attributes of another record (for example, a new voter registration application, a DMV driver’s license, an SSA record, a record in a database of felons, and so on). This procedure, variously known as record linkage, identity matching, identity resolution, or simply “record matching,” is “good” when it results in low rates of false positives (matches indicated when no match in fact exists) and false negatives (nonmatches indicated when a match does in fact exist).

In adding individuals to a VRD, poor procedures could have either or both of two undesirable consequences. First, they might result in improper indications of a nonmatch when a match should be indicated, a result that could be used (1) to disenfranchise voters (in the event that an applicant’s information cannot be verified when it should be verifiable), or (2) to inflate the size of the VRD list mistakenly (in the event that an earlier registration for an applicant cannot be found and a new record is improperly added as though the individual were a new registrant). Second, they might result in improper indications of a match when a nonmatch should be indicated, a result that could be used to add ineligible names to the VRD list.

In maintaining the VRD, procedures of poor quality will result in improper indications of a match between the voter registration list and one of the databases of ineligible-to-vote individuals when a nonmatch should be indicated (a result that tends to remove voters from the voter registration list improperly) or improper indications of a nonmatch when a match should be indicated (a result that would keep felons, mentally incompetent individuals, and deceased people in the VRD).

The consequences of false positives and false negatives may vary depending on the purpose of the matching (and thus depending on the other databases against which VRD records are being matched).
By law, the information on new voter registration applications must be checked against DMV or SSA records, and the consequences of a false negative (that is, no matches found when an individual is in fact represented in the DMV or SSA database) may be to wrongly keep the individual off the rolls—false negatives in this context may lead to a less complete VRD. List maintenance often calls for existing VRD records to be matched against felon or death records. The consequences of a false negative in the context of list maintenance are precisely the opposite: individuals may erroneously be kept on the rolls—false negatives in this context may lead to a less accurate VRD.

False positives (that is, a match improperly found when in fact the individual is not represented in the database being checked) have a different impact. In the context of adding individuals and checking against DMV or SSA records, false positives result in a less accurate VRD, because individuals may be improperly added to the list. In the context of list maintenance and checking against felon or death records, false positives result in a less complete VRD, because individuals may be improperly removed from the list. Box 3.2 summarizes these conclusions regarding false positives and false negatives.

Because of data quality issues and the lack of a universally used unique identifier, record matching cannot be done perfectly in this context, that is, with zero false positives and zero false negatives.3 The consequence is that achieving the goal of a simultaneously 100 percent accurate and 100 percent complete voter registration list is virtually impossible. At the same time, what counts as an acceptable rate of false positives or false negatives, or an acceptable tradeoff between accuracy and completeness, depends on the particular policy goals that are desired. For example, given that a choice is necessary, a state could prefer to emphasize completeness over accuracy in its VRD.
With this goal in mind, it may choose to minimize the rate of false positives in matching the VRD against a list of felons, a policy choice that almost certainly will increase the number of ineligible individuals on the list. Alternatively, a state could prefer to emphasize accuracy over completeness in its VRD. With this goal in mind, it may choose to minimize the rate of false negatives in matching the VRD against a list of felons, a policy choice that almost certainly will increase the number of legitimately eligible individuals removed from the list.4 Inevitably, a number of voters in a given state will be disenfranchised given one policy choice that would not have been disenfranchised under the other. Also, if State A makes the first policy choice and State B the second, some similarly situated voters in these states will not be treated identically. (The committee does not make any normative judgment regarding either of these policy choices, and observes that the federal government appears to be more concerned that voters within a single state are treated alike than with the possibility that voters in different states may be treated differently.)

3 If a unique identifier for every person were available, and if that identifier were used in all databases that were to be compared, and if those identifiers were always recorded correctly in the databases, perfect matching would be possible. But these conditions are essentially never realized in practice. When a DMV number or a full SSN is unavailable, the matching process becomes one of probabilistic inference rather than logical deduction.

Box 3.2
Consequences of False Positives and False Negatives

                        Adding new voters to the              List maintenance
                        voter registration list
                        ___________________________________   ________________________________________
Consequence of false    Less accurate VRD (ineligible         Less complete VRD (eligible voters
positive                persons may be added to               may be improperly removed from
                        the rolls)                            the rolls)

Consequence of false    Less complete VRD (eligible           Less accurate VRD (ineligible
negative                voters may be kept off the            persons may be kept on the rolls)
                        rolls)

From a technical standpoint, the hard problem in matching usually lies not in identifying potential matches (e.g., pairs of records that may have some but not all elements in common) but rather in how to handle the potential matches that are identified. (It is for this reason that the use of common unique identifiers greatly enhances matching outcomes—such use materially and significantly reduces the challenges of possible matches.) Determining whether two records refer to the same individual is usually the problematic step.
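The tradeoff between false positives and false negatives can be made concrete with a small sketch. It assumes, purely for illustration, that a matching procedure assigns each potential match a score between 0 and 1 and declares a match when the score reaches a chosen threshold. The scores and the "truly the same person" labels below are invented for the example and are not drawn from any real list.

```python
# Hypothetical potential matches between a VRD and a list of felons.
# Each tuple: (match score in [0, 1], whether the pair truly is the same person).
PAIRS = [
    (0.98, True), (0.91, True), (0.88, False), (0.83, True),
    (0.74, False), (0.69, True), (0.52, False), (0.35, False),
]

def error_counts(pairs, threshold):
    """Return (false positives, false negatives) when score >= threshold means 'match'."""
    false_pos = sum(1 for score, same in pairs if score >= threshold and not same)
    false_neg = sum(1 for score, same in pairs if score < threshold and same)
    return false_pos, false_neg

for threshold in (0.60, 0.80, 0.95):
    fp, fn = error_counts(PAIRS, threshold)
    print(f"threshold={threshold:.2f}  false positives={fp}  false negatives={fn}")
```

With these invented numbers, raising the threshold from 0.60 to 0.95 takes false positives from 2 to 0 and false negatives from 0 to 3. In the list-maintenance setting, a high threshold therefore corresponds to emphasizing completeness (fewer eligible voters removed) and a low threshold to emphasizing accuracy (fewer ineligible persons retained); where to set it is the policy choice described above, not a technical one.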
Record-matching procedures can, in principle, be executed by computer, by a human being, or both. Computer-based procedures for verification or maintenance have the advantages that they can perform matches very rapidly and can operate consistently (because they depend only on the specific data involved and the prescriptive rules as implemented). But computers using naïve matching rules (e.g., processing Liz and Elizabeth as different names) can also be “fooled” by data problems that suitably trained humans can often handle.

Human-based matching has the advantage of bringing to bear training and personal experience, which can be used to determine with confidence a match or nonmatch in more cases (Box 3.3). In some cases, humans can obtain additional information by contacting the individual(s) who may be involved, and use the information obtained to help resolve a match. A human can also compare signatures associated with each member of a proposed match and make a judgment about whether the signatures are sufficiently similar to indicate a genuine match.

4 Arguments might sometimes be put forth to make only a particular subset of the database maximally accurate or maximally complete. (Hypothetically, a particular subset might be “all female voters” or “all voters in precincts x, y, and z” which happen to have the highest fraction of registered Democrats or Republicans.) While legitimate policy reasons for doing so in some cases cannot be ruled out, such actions are inherently suspect and deserve the highest scrutiny before being implemented. For example, an election official might be motivated to maximize the number of voters in a particular socioeconomic class or other group in order to give his or her party of preference an advantage at the polls. Although the political motivation for wishing to take such action is clear, such an action would do serious injustice to the democratic process, and such a motivation would never be acknowledged publicly.

Box 3.3
An Illustrative Example of Human Exception Processing

• Example 1—Users entering new voter registrations must check existing rolls for matches.

New registration card              Existing voter
________________________________   ________________
Mary Sinclair                      Mary Sinclair
43 Bayberry Street                 73 Ascot Drive
4/28/63                            4/28/63
SSN XXXXX3434 (4-digit SSN)        DL 00767234633

To address this ambiguity, if the user could confirm that the driver’s license number is already known to be associated with an SSN with the same last four digits (3434), then this user could associate these records with high confidence. Another alternative would be to determine if Mary Sinclair on Bayberry Street used to live on Ascot Drive.

• Example 2—System must match new voter registrations against records in SSA databases.

New registration card              Closest record in SSA
________________________________   __________________________
Tom T Bowden                       Taylor T Bowden
32 Escondido Way
/04/77                             /04/77
SSN XXXXX087 (4-digit SSN)         SSN XXXXX087 (4-digit SSN)

With the match algorithm currently used by the SSA for matching inquiries from election officials, the SSA would return a “no-match” result. If the algorithm were changed to include the closest matches to the submitted inquiry, the “Taylor T Bowden” record would be displayed. To address the potential ambiguity, the election official could seek to confirm that either Tom has a middle name of Taylor or Taylor has a middle name of Tom or Thomas; if so, the election official could associate these records with some degree of confidence if he or she concludes that the first and middle names have been transposed.

The human review process involves review and best judgment based on the attributes at hand. Because having more attributes improves match accuracy, having more attributes reduces the number of voters inappropriately categorized as ineligible.
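The idea of returning the closest records for human review, rather than a bare match or no-match result, can be sketched as follows. This is a hypothetical illustration and not the algorithm used by the SSA or by any election system: the `Record` type, the field list, and the scoring rule (one point per agreeing field, with a field absent from either record counted as neither agreeing nor disagreeing) are all assumptions. The first two records reuse the values shown in Example 1 of Box 3.3; the third is invented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Record:
    name: str
    address: str
    dob: str
    ssn4: Optional[str] = None  # last four SSN digits, if known
    dl: Optional[str] = None    # driver's license number, if known

FIELDS = ("name", "address", "dob", "ssn4", "dl")

def compare(a: Record, b: Record) -> dict:
    """Label each field 'agree', 'disagree', or 'missing' (absent from either record)."""
    result = {}
    for field in FIELDS:
        x, y = getattr(a, field), getattr(b, field)
        if x is None or y is None:
            result[field] = "missing"
        elif x.strip().lower() == y.strip().lower():
            result[field] = "agree"
        else:
            result[field] = "disagree"
    return result

def rank(query: Record, candidates: list) -> list:
    """Return (score, record) pairs, best first; score = number of agreeing fields."""
    scored = [
        (sum(v == "agree" for v in compare(query, c).values()), c)
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

new_card = Record("Mary Sinclair", "43 Bayberry Street", "4/28/63", ssn4="3434")
existing = Record("Mary Sinclair", "73 Ascot Drive", "4/28/63", dl="00767234633")
unrelated = Record("Marie Sinclair", "12 Hypothetical Lane", "7/02/71", dl="00000000000")

for score, candidate in rank(new_card, [unrelated, existing]):
    print(score, candidate.name, compare(new_card, candidate))
```

For the Example 1 pair, the comparison shows agreement on name and date of birth, disagreement on address, and no shared identifier to compare. That is the kind of ambiguous result a reviewer would then resolve with additional information, as the box describes.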
On the other hand, human-based matching is slow and thus impractical when large numbers of records are involved. Human-based matching is generally less consistent than computer-based matching but may be better (though still somewhat subjective) in other areas, such as comparing signatures. Human-based matching may also be biased—for example, a human matcher may have prejudices against Hispanics, and may be less likely to resolve in a favorable manner apparent matches in the database involving people with common Hispanic surnames compared to others.

These procedures can be used in tandem, so that any possible match or nonmatch (which depends on context) found by a computer-based procedure is directed to a human being before any action is taken.5 For example, if the submission of a given name to the DMV and SSA results in a nonmatch, a human being may inspect the original voter registration form to compare the handwritten data on the form with the data as transcribed into the database, correcting the database record if necessary and resubmitting it with the correct spelling as indicated on the handwritten form. A human being may also resubmit the query with a different but equivalent name. This different-but-equivalent name may be a common nickname (e.g., Bill, Will, Willie, Willy for William) or a different spelling of a name (Jazmine for Jasmine).6

Helpful though such manual procedures are, they can break down under the stress of large numbers of applications, as may happen when applications are submitted near the deadline for submission of registrations. In addition, it is probably unrealistic even under normal conditions to expect a human to resubmit a large number of name variations—at most, trying a few alternatives is likely the best that can be expected.

In addition, match algorithms based on exact matches between corresponding data fields cannot account for typographical error. Blocking techniques and string comparators are helpful for dealing with this problem; when used, most query results would logically take the form of a list of records, sorted by a score indicating the likelihood of a match (that is, a fuzzy match) rather than a simple binary result (match or no match).7

A more detailed discussion of matching can be found in Appendix B. Some privacy issues that arise with matching are addressed in Appendix D.

5 These comments should not be taken to imply that the combination of computer plus human review is necessarily better than the computer alone in all circumstances. Indeed, the literature indicates that for human review to add to the quality of the outcome, human reviewers must be well trained (see, for example, H.B. Newcombe et al., “Reliability of Computerized Versus Manual Death Searches in a Study of the Health of Eldorado Uranium Workers,” Computers in Biology and Medicine 13(3):157-69, 1983). Nonetheless, it tends to be true that the combination of good computer matching procedures and well-trained human reviewers is often superior in performance to the use of those procedures alone.

6 Managing known name equivalents can also be performed in an automated fashion, but if automated assistance is not available, humans must undertake this task.

7 Match algorithms are based on comparisons made at the level of individual fields or at the record level. String comparators compare text strings within individual fields and generate a score that reflects the amount of difference between the two strings. Blocking techniques bring together pairs via characteristics that are believed to contain less typographical error, and the remaining (or all) information in pairs is used in computing a matching score.

3.4
SYSTEM AVAILABILITY

Availability is the property of a system related to a user’s ability to use the system when necessary. Many factors influence the accessibility of a system, including how many users are trying to use the system at the same time, what kinds of tasks the system is handling at any given time, and whether or not an adversary is trying to reduce system availability.

Systems that are subject to large variations in the user load they support pose technical challenges. As compared to other times of year, VRDs in particular must typically support intense usage from many users in the period before registration deadlines occur, on Election Day or during other periods of voting (e.g., early voting), just before primaries, and so on. Furthermore, the demands on the system are different during these different periods—data entry tasks are likely to be most plentiful just before registration deadlines expire, whereas user queries to the database are likely to be most plentiful when voting is occurring.

In addition, VRDs also depend on other systems being available. For example, election officials make heavy use of DMV and SSA databases for verifying applicant-provided registration information, as required by HAVA. If these systems are unavailable during peak demand times, election officials may be unable to verify such information in a timely manner and thus may not be able to register a voter in time for a primary or an election. For example, the Social Security Administration often performs system maintenance and upgrades over the Columbus Day weekend (mid-October). Although such actions are understandable given the SSA’s primary mission, they also have major negative effects on election officials trying to process the enormous influx of voter registration applications that arrive before Election Day (in November). Some voter registration databases do not have the capability to enter data from voter registration forms without verifying those data (that is, if verification cannot be attempted, data entry must stop). In short, the unavailability of SSA databases over the Columbus Day weekend means that election officials must halt all processing of applications if their VRDs do not support forms in a “verification pending” state.

Another issue, often classified as an issue of security, relates to deliberate denial-of-service (DOS) attacks against voter registration systems. A DOS attack attempts to flood a voter registration system with false requests for service, leaving no capability for processing legitimate requests. One DOS attack may target the servers hosting a VRD, thus preventing local election officials from accessing it. Another kind of DOS attack may target election officials by flooding them with fake voter registration forms. Although these forms will ultimately be rejected as being fake, it takes time to process each form, and processing fake forms prevents election officials from processing real forms.

3.5
SECURITY AND PRIVACY

Security

Security issues in VRDs arise for two reasons. First, state VRDs contain personal information associated with registered voters, and such information must be protected against disclosures not permitted by law. Second, the overall integrity of the VRD must be protected against unauthorized alterations (e.g., individual records being improperly added, deleted, or changed).

Insecure VRDs pose a number of dangers. Individual voters may be disenfranchised if records of their registration are improperly deleted from the VRD. Voter fraud may be possible if registration records are improperly added. A voter might fall victim to identity theft if sensitive personal information such as a Social Security number is compromised.
And improper changes to a voter’s record might also effectively disenfranchise him or her (e.g., an altered address might cause the voter to go to the wrong polling place) and at the very least have the potential for creating confusion and difficulty for a voter.

Security measures address both who is authorized to view or change information in the VRD and what information within any record in the VRD may be viewed or changed. In the security context, viewing information includes seeing individual records and sending or transferring records en masse; changing information includes adding entirely new records, altering one or more fields within one or more records, and deleting records.

Appendix D describes some important best practices in security. However, these practices work only for data that are under the control of the relevant election official. In the event that the election official shares information with another party (e.g., on demand to a requestor as required by policy or applicable law), there are few if any practical technical measures that the election official can take to ensure the subsequent security of the released data (though some actions can be taken to increase the accountability of the party to whom data are released). Perhaps the only action that the election official can take is to ensure that the data released consist only of those data that are required to be released and no other data. Once the data leave the control of the election official, it is up to the recipient to abide by the terms of use and enforce any relevant security measures. Accordingly, the election official should find a way to legally bind the recipient to take the necessary precautions. Appendix D addresses security issues in greater detail.

Privacy

Privacy is not the same as security, even though they are often discussed together. Privacy issues relate to policy regarding what information may be disclosed to which parties under what circumstances.
Thus, a hypothetical law requiring that any registered voter’s name and address (but not party affiliation or Social Security number) must be available without restriction to the public reflects a policy choice rather than a security issue. A security issue arises if an unauthorized party is able to gain access through the VRD to the voter’s Social Security number, which is supposed to be kept confidential.

Some of the information in VRDs is, by law, public information, although the specifics of which data items can be regarded as public information vary from state to state. In addition, states often limit the purposes for which such information may be used. Nevertheless, the electronic availability of such information raises concerns about the privacy of that information, because electronic access greatly increases the ease with which the information can be made available to anyone, including those who might abuse it.

Some transparency measures are required by law; for example, the NVRA requires public access to the outcomes of most list maintenance activities (excluding declinations and the source of registration). Access to such information has been a critical enabler for the efforts of public watchdog groups in discovering problems with state list maintenance activities. Election officials sometimes advocate transparency measures, and most importantly a philosophy of open access to registration-related data, as an approach that helps to ensure the maximum possible accuracy of their files.

Many analysts of privacy issues point to fair information practices (FIPs) as a gold standard for privacy protection that balances privacy rights against user needs for personal information, and in the context of voter registration, the 2006 USACM report on statewide databases recommends the adoption of such practices as the basis for privacy policy regarding voter registration activities.8 FIPs generally include notifying individuals that their personal information is being collected; providing individuals with choices about how their personal information may be used; enabling individuals to review the data collected about them in a timely and inexpensive way and to contest those data’s accuracy and completeness; taking steps to ensure that the personal information of individuals is accurate and secure; and providing individuals with mechanisms for redress if these principles are violated.

From an operational standpoint, a full implementation of FIPs for VRDs is likely to prove problematic or undesirable for many jurisdictions. Perhaps the most salient issue is the tension between privacy of personal information and openness and transparency for public records. In its starkest terms, maintaining privacy involves withholding from public view certain information associated with individuals, while transparency involves the maximum disclosure of information, even if such information is associated with individuals.

Although a number of states, such as California, Hawaii, Idaho, Kentucky, Massachusetts, Minnesota, New York, Ohio, and Virginia, have enacted state privacy acts based largely on the provisions of the federal Privacy Act, these acts are typically formulated in such a way that they bar the disclosure of personal information unless disclosure is required by the relevant state’s public records act, which may or may not allow the protection of all personal information associated with voter registration records. Such protection may be undesirable for policy reasons as well. Appendix D addresses privacy issues in greater detail.
3.6
BACKUP

Backed-up files provide users with the capability to restore the VRD in the event of hardware failure (e.g., a fire or flood in the machine room), database corruption as the result of hardware or software problems, operator error, or a successful malicious attack (e.g., a cyber attack) against the database or the hardware.

There are two basic ways (not mutually exclusive) to back up files. First, copies of the database can be stored and retrieved in the event of disaster. Second, mirrored or replicated facilities allow a system to continue operating even if the primary database is unavailable. How best to back up data should reflect an assessment of threats and vulnerabilities (both accidental and deliberate), acceptable parameters for data loss and time-to-restore capability, and available financial resources.

8 U.S. Public Policy Committee of the Association for Computing Machinery, Statewide Databases of Registered Voters: Study of Accuracy, Privacy, Usability, Security, and Reliability Issues, 2006, available at http://usacm.acm.org/usacm/PDF/VRD_report.pdf.

Copying the database has the primary virtues of simplicity and low cost. For databases of modest size, backing up files in this manner is a task that could be accomplished in just a few hours using techniques available to any home PC user: one would simply copy the database file to some backup media late at night. The database could be locked at night for routine maintenance, and the entire file could be copied and stored away. A variety of automated tools are also available to simplify this process.

On the other hand, this simple approach to backup works only when the database in question is sufficiently small. A reasonable upper-bound estimate on the size of a voter’s record is 200 bytes, assuming only textual information is stored.9 The largest state voter database is that of California, with approximately 18 million registered voters, corresponding to a total database size of at most 3.6 gigabytes; files of this size can be copied easily in an hour or two.

But for many voter registration databases, textual information is not the only thing stored. VRD systems are increasingly incorporating capabilities for imaging the paper forms on which voters submit information. If only the voter’s signature is stored, a high-quality image may require 100 kilobytes. If the entire filled-in form is imaged, 2 megabytes may be needed. Thus, the incorporation of image-handling capabilities into a VRD changes the storage requirements completely. A California-scale VRD that imaged the entire form for each registration might be 40 terabytes.
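The sizing arithmetic above can be reproduced with a short sketch. It uses only the per-record figures quoted in the text and assumes decimal units (1 kilobyte = 1,000 bytes); the function and constant names are illustrative.

```python
# Per-record estimates quoted in the text (decimal units assumed).
TEXT_RECORD_BYTES = 200            # upper bound, textual data only
SIGNATURE_IMAGE_BYTES = 100_000    # high-quality signature image
FORM_IMAGE_BYTES = 2_000_000       # image of the entire filled-in form
CALIFORNIA_VOTERS = 18_000_000     # approximate count given in the text


def vrd_size_bytes(voters: int, bytes_per_record: int) -> int:
    """Total storage if every voter record needs bytes_per_record bytes."""
    return voters * bytes_per_record


def human(n_bytes: int) -> str:
    """Format a byte count in decimal gigabytes or terabytes."""
    if n_bytes >= 10**12:
        return f"{n_bytes / 10**12:.1f} TB"
    return f"{n_bytes / 10**9:.1f} GB"


if __name__ == "__main__":
    for label, per_record in [
        ("text only", TEXT_RECORD_BYTES),
        ("signature images", SIGNATURE_IMAGE_BYTES),
        ("full form images", FORM_IMAGE_BYTES),
    ]:
        print(label, human(vrd_size_bytes(CALIFORNIA_VOTERS, per_record)))
```

Under these assumptions the text-only database comes to 3.6 gigabytes, signature images to 1.8 terabytes, and full form images to 36 terabytes, which is the same order of magnitude as the 40-terabyte figure cited above.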
Although databases of terabyte scale do not come anywhere near stressing the current state of the art in file management and backup, they call for the use of database technology and hardware platforms that are considerably more sophisticated, and more costly, than PC technology. These more sophisticated approaches, available in commercial database systems, provide for mirroring data in real time (that is, as they are written) onto redundant media and for differential or incremental backup.10 Such systems sometimes allow selective field backups, so that rarely used information (e.g., the large images of voter registration forms) would be backed up at a much lower frequency than fields that are used regularly (e.g., the much smaller text representations of the information contained in the voter registration forms).

3.7
THE IMPACT OF ELECTION DAY REGISTRATION AND PORTABLE REGISTRATION ON VOTER REGISTRATION DATABASES

Election Day Registration

A traditional VRD operates within a structure that requires a multi-week period between the deadline for new voters to submit voter registration forms and Election Day. Election officials use this period to enter the data from these forms into the VRD and to verify some of the data on these forms if required by HAVA (as in the case of mailed-in registration forms).

Election Day registration (EDR) eliminates this period, allowing voters to register on the same day on which they cast their ballot. On Election Day (or during a period of early voting), a person shows up at an appropriate location (which may be a polling place or a central election office) and presents the necessary identification to an election official. The official consults the registration list and, if the person is not already registered, registers the voter immediately.
9 In August 2008, the full VRD for Oregon consisted of 2,053,444 records, corresponding to approximately 280 megabytes of data, while the full VRD for Washington state consisted of 3,407,596 records, corresponding to approximately 465 megabytes of data. These totals point to an average record size of 136 bytes per voter. However, the records included only the minimum data needed to perform matching and pointers to the original records; thus, other information such as address, phone numbers, driver’s license number, voting histories, and the scanned image of the voter’s registration card, including the signature, was omitted.
10 Differential backup saves all records that have been changed since the last full backup. Thus, a complete data restore involves only two operations: restoring the last full backup and then applying the differential backup. Incremental backup saves all records that have been changed since the last incremental backup. Thus, a complete data restore may involve many operations: restoring the last full backup and then applying the complete sequence of incremental backups in order. On the other hand, incremental backups are much less storage-intensive than differential backups.
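The distinction drawn in footnote 10 can be made concrete with a toy sketch. It models the database as a dictionary of voter ID to address, treats each day's changes as a small dictionary of updated records, and ignores deletions for simplicity. All names and data are hypothetical, and real database backup tools work quite differently.

```python
def apply_delta(state: dict, delta: dict) -> dict:
    """Return a copy of state with the changed records in delta applied."""
    restored = dict(state)
    restored.update(delta)
    return restored


# Last full backup, followed by three days of changes (hypothetical data).
full_backup = {"V1": "12 Oak St", "V2": "4 Elm Ave", "V3": "9 Pine Rd"}
daily_changes = [
    {"V1": "77 Birch Ln"},
    {"V2": "5 Elm Ave"},
    {"V1": "80 Birch Ln", "V4": "1 Main St"},
]

# Incremental backups: only the records changed since the previous backup.
incrementals = [dict(changes) for changes in daily_changes]

# Differential backups: every record changed since the last full backup.
differentials = []
changed_since_full = {}
for changes in daily_changes:
    changed_since_full.update(changes)
    differentials.append(dict(changed_since_full))


def restore_differential(full: dict, diffs: list) -> dict:
    """Two operations: the full backup, then the latest differential."""
    return apply_delta(full, diffs[-1])


def restore_incremental(full: dict, incs: list) -> dict:
    """Many operations: the full backup, then every incremental in order."""
    state = dict(full)
    for inc in incs:
        state = apply_delta(state, inc)
    return state


def records_stored(backups: list) -> int:
    """Total number of records held across a series of backups."""
    return sum(len(b) for b in backups)


if __name__ == "__main__":
    print(restore_differential(full_backup, differentials))
    print(restore_incremental(full_backup, incrementals))
    print(records_stored(differentials), records_stored(incrementals))
```

Both restore paths reach the same final state; the differential series stores more records in total, while the incremental series requires more restore steps, which is the trade-off the footnote describes.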

A number of states allow EDR today. Although this report takes no stand on the desirability of EDR, EDR appears to be a trend in the evolution of voter registration, and represents a middle ground between those who would relax or eliminate voter registration requirements and those who would tighten voter registration requirements.

Depending on how EDR is implemented, it may have no implications at all for the design and deployment of a statewide VRD, or it may have many deep and significant implications. The description below is not intended to be a complete discussion of the relationship between EDR and VRDs, but rather a sketch of some of the important considerations that must be taken into account should any given state adopt EDR.

A VRD must perform two essential tasks for the registration of new voters. It must be able to take in information from a voter registration form (data entry), and it must be able to attempt to verify the necessary information with the DMV or SSA (data verification) for registration forms submitted by mail (a HAVA requirement).

If data entry is to take place on Election Day, sufficient data entry facilities must be available to handle the demand for EDR. These facilities may be located at some central location(s) or at polling places. Data entry at polling places has major disadvantages, such as a noisy and sometimes confused or chaotic environment that may make data entry more prone to error. It also requires a data entry station for each polling place, and additional training for poll workers. Data entry at a central location is likely to enable data entry facilities to be used in a more efficient and less error-prone manner, and voters should be able to cast their ballots at central locations in any event.
If EDR is implemented in such a way that data entry and data verification can take place after Election Day, there are few implications, if any, for the design of a VRD, simply because this operating scenario is no different from the traditional one involving a multi-week lag between submission of registration forms and Election Day. Assurances that a voter is legitimate would have to be provided by the first-time voter’s presentation of the necessary identification. And because HAVA requires a match to either SSA or DMV data only in the case of mailed-in applications, a person who registers in person is not subject to data verification.

From a HAVA standpoint, it is not necessary to perform data verification for individuals submitting voter registration forms to election officials on Election Day if these individuals provide appropriate identification at the same time. However, states may have their own verification requirements for nonfederal elections, and in this case, the VRD must have access to the relevant databases on Election Day. As in the case of data entry, verification is likely to be performed in a more cost-effective and more secure manner from one or a few central locations rather than from polling places.

Portable Registration

Portable voter registration (PVR), defined in this report as the ability of a previously registered voter to vote even if his or her address has changed, has several variations. (In the majority of cases where PVR is implemented, the voter shows up at his or her new polling place or at a central location, submits a change of address form, and is immediately allowed to vote based on the new address.) PVR is required by NVRA for voters who move within a county (more precisely, whose new address falls within the jurisdiction of the same election officials and also is not included within a new congressional district).
PVR is allowed but not required by federal law for changes of address within the same state, and several states allow in-state PVR as of this writing.11 PVR that crosses state lines has not been implemented by any state.

11 These states include Delaware, Florida, Oregon, Maryland, Ohio, Colorado, South Dakota, and Washington. These states variously allow the voter to use a regular ballot corresponding to the new address, a provisional ballot for the new address, and a regular ballot from the old address. In addition, eight other states have implemented EDR, which provides an in-place process for Election-Day address updates. See Adam Skaggs and Jonathan Blitzer, Permanent Voter Registration, New York University, 2009.

PVR can, in principle, help to mitigate problems arising from a major source of duplicate registrations in a statewide VRD: registered voters who change address.

PVR does not necessarily have implications for the design of a VRD. There is no reason that a voter’s change-of-address form must be entered into the VRD on Election Day; the only requirement is that the voter be allowed to vote (preferably based on the new address). It does mean that poll workers must have access to the statewide VRD (or a suitable local copy of it, such as one in paper form or more likely as a DVD or CD-ROM that can be loaded on a personal computer at the polling place) in order to confirm that a voter was indeed previously and properly registered.

3.8
THOUGHTS ON A NATIONAL VOTER REGISTRATION DATABASE

Proposals are sometimes made to establish a national voter registration database. In principle, such a database could serve one of two purposes. First, it could be used to coordinate statewide VRDs to eliminate duplicate voter registrations across state lines and to facilitate interstate portability of voter registration. Second, it could be used in support of universal or automatic voter registration, an approach to voter registration in which the need for individuals to take affirmative action to register to vote is eliminated by shifting the burden of voter registration to the states in which these individuals reside.12

Conceptually, the first purpose is an extension of intrastate portability of voter registration. As noted in Section 3.7, statewide VRDs can facilitate intrastate portability and help to address problems arising from duplicate voter registrations within a state. A national VRD for this purpose could easily be constructed by amalgamating the statewide VRDs of all states and other voting districts and using the statewide VRD data export functions to move the data to the national VRD.
Such a database would have to contain some 150 million to 200 million entries (the number of registered voters in the United States), and thus would be approximately 10 times as large as the largest statewide VRD in existence today.

Despite its larger size, however, performing list management (specifically, eliminating duplicates) on such a database is a relatively straightforward computational task. This task could not be managed on a single personal computer commercially available today in a reasonable time, but a mid-size departmental computer using commercially available software and a few dozen terabytes of disk storage would be able to do so with ease. Alternatively, generally available cloud computing services (an example of which is the Amazon Elastic Compute Cloud13) could be employed to perform the computational task. Cloud computing has the advantage of eliminating the need for capital investment in hardware. On the other hand, cloud computing is not a technology with which either the public or election administrators have much experience, and thus the use of cloud computing may suffer from a lack of transparency.

The substantially larger size of a national VRD would result in a large number of pairs of entries flagged as possible duplicate registrations. Even with a match rule based on an exact character-by-character matching on first name, last name, and date of birth, it can be expected that around 480 coincidental matches (that is, different individuals who share the same first name, last name, and date of birth) would be identified in comparing VRD lists from Oregon and Washington alone.14 The use of a universal national identifier would significantly increase the accuracy of any process intended to identify possible duplicate registrations. The notion of a universal national identifier is itself politically controversial, and the committee takes no stand on the desirability of adopting such an identifier. Accuracy in the resolution of the identified possible duplicates could be enhanced through the use of tertiary data, as discussed in Appendix C under the discussion of third-party data.

12 See, for example, Wendy Weiser and Margaret Chen, “America’s National Embarrassment: Why Is the Rest of the World So Much Better at Signing up the Vote?,” Foreign Policy, July 29, 2009, available online at http://www.foreignpolicy.com/articles/2009/07/29/americas_backward_voter_registration_system. Note, however, that advocates of universal or automatic voter registration do not necessarily support a national VRD, and a national VRD is not a necessary component of universal voter registration. For example, Weiser also argues that a national VRD could prove costly and unwieldy, and that errors in such a database might improperly disenfranchise voters. See Eliza Newlin Carney, “Looking Abroad for Answers on Voter Registration,” National Journal, July 20, 2009.
13 See http://aws.amazon.com/ec2/.
14 This estimate is based on the fact that individuals with a common name such as “Sharon Smith” may coincidentally agree on date of birth.

Which states would wish to participate in a national VRD? The most likely participants are jurisdictions likely to contain the majority of duplicate registrations, that is, adjoining jurisdictions, jurisdictions that serve as “bedroom” communities for another, and jurisdictions that experience seasonal migration. However, the committee notes that connection to a national VRD eliminates the need for multiple bilateral jurisdictional data exchanges. Thus, if most states will eventually participate in one bilateral data exchange, that exchange may as well be with a national VRD, and subsequent bilateral exchanges will not be necessary.

As for the second purpose, the committee recognizes the political controversy in universal voter registration, and is explicitly silent on the desirability of universal voter registration as a policy choice. Furthermore, a full examination of the technical dimensions of universal voter registration would require more time and resources than are available to this committee. It suffices here to make several observations:

• Universal or automatic voter registration generally calls for government authorities (especially state election officials) to use all available data sources (including those from state departments of motor vehicles (driver’s license records), tax rolls, social services agencies, and so on) to assemble lists of eligible voters.
Obtaining cooperation from all of these government data sources is likely to require significant effort on the part of political leaders.
• Significant coordination with federal immigration authorities and their databases may be needed to minimize the number of noncitizens added to the voter registration rolls. Such coordination is largely unnecessary today. In addition, noncitizens added inadvertently must be protected from legal harm so long as they do not try to vote.
• The use of tertiary data to identify eligible voters is likely to enhance the accuracy and completeness of automatically compiled voter registration rolls.
• Using any of these data sources is likely to be controversial from a privacy standpoint. As a general rule, privacy advocates are concerned when government authorities, whatever their mission, aggregate data from multiple sources. Some sources of data raise particular concerns; tax rolls, social services agencies, and private-sector sources might be included in this category.
• Standards for data quality assurance would have to be developed and adopted as a part of any attempt to implement universal voter registration.
• The overall cost of universal voter registration may be lower than today’s state-centric system, especially if the effort expended by individual voters in registration is taken into account. Resource-strapped counties particularly may benefit from universal voter registration.
• Sustained funding for the voter registration enterprise will be even more necessary than it is today, given the larger role of government authorities in the process.
• To the extent that a national VRD is used to support election officials for checking voter registrations in real time, security (e.g., against denial-of-service attacks) and system reliability and availability will be issues of concern.