Possible Future Improvements That Will Require Longer-Term Action
The material below describes actions that will almost certainly take more than several months to implement successfully. Indeed, given the time frame needed to implement changes that require the modification of computer systems (which involve at a minimum time to design, code, test, and document changes, and may require new procurements, procedures, and/or training), the committee cautions election officials against best-case planning scenarios in trying to implement any of them. In other words, none of the actions below should be placed on the critical path for an election that is coming up shortly.
As before, these longer-term changes are directed primarily at election officials at the state and local/county level, and the legislatures and county commissions that make policy regarding the conduct of elections at the state and local level. In some cases, the Election Assistance Commission has a useful role to play as well in facilitating and promoting their implementation. In addition, a number of the recommendations below are directed to the U.S. Congress, to the Social Security Administration, and to various nonelection agencies in the states and counties, because the effectiveness of statewide VRDs will depend on actions that these entities do or do not take in the future.
PROVIDE FUNDING TO SUPPORT VRD OPERATIONS, MAINTENANCE, AND UPGRADES
Recommendation L-1: Provide long-term funding for sustaining voter registration database operations.
The one-time infusion of federal funding provided by HAVA will not—and was never intended to—support VRD operations in the long run. A statewide VRD is a major investment in information technology, and its effective operation over time will require funding for operations, maintenance, and upgrades. The committee is silent on the appropriate source(s) for such funding, which might be some combination of federal, state, and local sources, but makes three critical points:
Funding for operations, maintenance, and upgrades must be sustained over time—whatever amounts are allocated for such purposes must be continued year after year.
The amount required annually to support these activities is likely to be a significant fraction of the sums spent for the initial procurement of a full VRD system—40-50 percent would not be surprising.
Giving short shrift to funding for operations, maintenance, and upgrades is likely to result in poorer performance and the occurrence of avoidable mishaps in the operation of VRD systems.
IMPROVE DATA COLLECTION AND ENTRY
Recommendation L-2: Develop and promote public access portals for online checking of voter registration status.
In anticipation of being able to vote on Election Day, prospective voters may wish to check their voter registration status so that any irregularities can be corrected in time. Web-based portals for checking the state VRD increase the ability of individuals to do so. For example, such a portal may ask the user to provide a name, birth date, and Zip code, and return either the user’s current registration status or an indication that there is no record on file that matches the information provided. A number of jurisdictions across the country, including Kentucky, Washington, Oregon, Nebraska, and Nevada, provide this service to voters today.
When protected against security and privacy violations, such portals serve the public interest in increasing transparency of the VRD and create another opportunity for the verification of voter information. They benefit individual voters who want to verify their information, and may provide an opportunity (if it is legal to do so, and if potential privacy concerns over retention of the data can be addressed) for third-party voter registration groups to confirm that the applications they have collected have been received, processed, and accurately entered in the voter registration database.
States that have developed such portals (for example, Nevada1 and Nebraska2) have generally integrated them into their voter registration Web sites. These portals must access information stored in a state’s VRD, which means that their development requires some sensitivity to and technical capacity for dealing with security issues. For example, data compromises have been reported in other instances when live queries have been allowed access to the primary database, suggesting that it may be safer to implement some sort of buffered arrangement whereby the portal provides access only to a synchronized copy containing only the minimum amount of information.
Another point to be considered is the prevention of automated exploitation that might circumvent existing legal restrictions on making the voter registration database available to commercial users; automated tests (“captchas”) that distinguish between human and automated responses (for example requiring the user to type the letters displayed in a distorted image3) may be relevant in this regard, although this is an ongoing battle. Special steps must also be taken to prevent the display of voter registration information for individuals who need protection, such as victims of domestic abuse or individuals in witness protection, and in any event, the information to be displayed at all should be the minimum information needed for the voter to know that he or she is registered to vote and to inform the voter of the proper polling place (for example, driver’s license numbers or SSNs (even SSN4) do not need to be displayed). Some states collect more information (for example, phone numbers, occupation, or e-mail addresses) on their application forms than is necessary for voter registration per se; such information poses increased privacy risks to the individual if needlessly disclosed.
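The buffered arrangement and minimum-disclosure points above can be illustrated with a minimal sketch. It assumes a hypothetical record layout (the field names `first`, `last`, `dob`, `zip`, `status`, `polling_place`, `ssn4`, and `protected` are illustrative and do not come from any actual VRD), and it assumes that records flagged as protected are simply left out of the portal copy; a real jurisdiction might handle such records differently.

```python
# Fields copied into the portal's synchronized copy. Everything else
# (SSN4, driver's license number, phone, e-mail, ...) stays in the primary VRD.
PUBLIC_FIELDS = ("first", "last", "dob", "zip", "status", "polling_place")


def build_portal_copy(vrd_records):
    """Project primary VRD records onto the minimum set of portal fields.

    Records flagged as protected (an assumed boolean field) are skipped.
    """
    portal_copy = []
    for rec in vrd_records:
        if rec.get("protected"):
            continue
        portal_copy.append({field: rec[field] for field in PUBLIC_FIELDS})
    return portal_copy


def lookup(portal_copy, first, last, dob, zip_code):
    """Return status and polling place for a matching record, else None."""
    wanted = (first.strip().lower(), last.strip().lower(), dob, zip_code)
    for rec in portal_copy:
        found = (rec["first"].lower(), rec["last"].lower(), rec["dob"], rec["zip"])
        if found == wanted:
            return {"status": rec["status"], "polling_place": rec["polling_place"]}
    return None  # the portal would report "no record on file"


vrd = [
    {"first": "Ana", "last": "Lopez", "dob": "1980-02-03", "zip": "89501",
     "status": "active", "polling_place": "Precinct 12", "ssn4": "1234",
     "protected": False},
    {"first": "Jo", "last": "Reed", "dob": "1975-07-09", "zip": "89502",
     "status": "active", "polling_place": "Precinct 4", "ssn4": "5678",
     "protected": True},
]
portal = build_portal_copy(vrd)
print(lookup(portal, "ana", "LOPEZ", "1980-02-03", "89501"))
```

The point of the sketch is only that the portal never queries the primary database and never holds fields it has no reason to display.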
Finally, for all states that provide online verification of voter registration information, it is important to inform voters that they can and should check their voter registration status well in advance of Election Day.
A CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) is a program that differentiates between humans and computers by generating and scoring tests that humans can pass but current computer programs cannot. For more information, see http://www.captcha.net/.
Recommendation L-3: Allow voters to register and to update missing or incorrect registration information online.
As noted in Appendix C, typographical errors could be reduced significantly by eliminating the data transcription process and importing most or all of the relevant data from another system and/or allowing the voter to enter data himself or herself when necessary. However, the voter will always have to provide at registration some means of authenticating himself or herself at the polls, such as a signature. A mail-in registration form can contain a box for the voter’s signature, but online registration requires the applicant to appear (or to have appeared) somewhere in person at some official government agency to provide a signature. If this signature is digitized, it can be made available to the election official along with the information needed to register to vote. A number of states today take advantage of the fact that applicants for driver’s licenses must provide a signature; these states have developed online registration portals that enable citizens with driver’s license application signatures on file to register to vote online without having to appear in person anywhere.
Registration portals can also leverage the fact that basic information about the individual, such as name, address, birth date, and so on, is often stored along with the signature, which suggests that importing the relevant data from the original state agency with the signature into the voter registration database is feasible in principle. When the voter registration application requires information not already on file, the user would enter the information himself or herself and then be given a chance to verify and correct the information.
In addition, individuals whose registration forms contain illegible or missing information could be notified of that fact and at the same time be given a special code or password that would grant entry to a secure Web page, whereupon the individual could correct or provide the missing data. In the longer term, it might be possible to realize real-time verification of an online application for voter registration, so that an applicant whose information did not match DMV or SSA information on file could be informed of that fact immediately, so that corrections could be made at the moment.
If the individual’s signature is not already online with some other government agency, the individual will have to provide an original signature on a physical registration form. But such a form can be provided to the individual online, filled in online and the data captured, and then printed (and signed) for submission.
Such a procedure has several advantages over this committee’s recommended short-term action regarding online fill-in forms, which would still require that the data be manually captured upon receipt at the election official’s office. With online data capture, the individual’s data can be stored temporarily and then entered officially into the VRD (i.e., made permanent) when the signed form is received. This procedure eliminates the need for further processing of the typed information on the form (i.e., no data reentry or Optical Character Recognition (OCR) scanning), reducing costs and increasing accuracy. In addition, during the period between online data capture and receipt of the form, election officials can “pre-verify” the data entered and contact the individual if the necessary match cannot be made. With contact information on file (such as e-mail addresses), election officials can also remind the individual to submit the form and can provide information regarding drop-off locations for the form at colleges, schools, and other locations. And online acknowledgment of receipt of the signed form can be provided as well.
Online registration would also help UOCAVA voters to register.4 Today, the registration process for military and civilian voters overseas is cumbersome, requiring transmittal of completed registration forms by physical mail. Transmitting the information on voter registration forms online would eliminate this sometimes-unreliable step.
Recommendation L-4: Encourage/require departments of motor vehicles as well as public assistance and disability service agencies to provide voter registration information electronically.
The NVRA requires state DMVs, public assistance agencies, and disability service agencies to facilitate the voter registration process. Today, this facilitation is mostly paper-based. Automatically providing information on new applications or changes of address to election officials would significantly reduce the burden of maintaining VRDs by reducing requirements for manual data entry and updating registrations with new addresses.5
As part of promoting cooperation and coordination between election officials and these other public service agencies, states may wish to develop and maintain performance metrics on the percentage of voter registration additions, modifications, and deletions that arrive electronically and on the number of electronic files that arrive from NVRA agencies that contain errors requiring correction. Making such figures public (e.g., through publication at www.data.gov) would provide a way of holding these agencies more accountable for their NVRA responsibilities.
The committee recognizes that election officials have no control over the budgets or operations of these agencies, a fact that often leads to a certain amount of bureaucratic politics as Agency A seeks to persuade Agency B to help carry out the mission of Agency A.
Recommendation L-5: Improve the design of voter registration forms.
The design of forms has a significant impact on their usability and their ability to capture the data that the form filler intends to record. For example, providing a specific separate space for each letter/number of the name/address often improves the legibility of forms completed, and may improve the suitability of the filled-out form for processing by optical character recognition software. In addition, the design of voter registration forms and data entry screens for VRD systems should be coordinated in order to minimize the data entry clerk’s effort necessary to find information on the form.
Form design is often challenging and generally requires a significant degree of empirical testing to assess the usability of any given design. The committee finds considerable value in the work of the Design for Democracy project in designing election-related forms that are highly usable by lay people.6
Recommendation L-4 is consistent with the EAC’s Voluntary Guidance on Implementation of Statewide Voter Registration Lists, III-D.2-d. This particular guidance notes that states should “ensure that the coordination of information in the verification process is accurate and efficient. Verification of voter registration information shall be accomplished through electronic transmission. Further, to the greatest extent allowed by State law and available technologies, this electronic transfer between statewide voter registration lists and coordinating, verification databases should be accomplished through direct, secure, interactive and integrated connections.” See www.eac.gov/election/docs/statewide_registration_guidelines_072605.pdf/attachment_download/file.
See, for example, the informative reference on the design of forms for use by election officials by Marcia Lausen, Design for Democracy: Ballot and Election Design, University of Chicago Press and American Institute of Graphic Arts, 2007. (The Design for Democracy project recommends for voter registration forms using capital and lowercase letters rather than all capital letters; prioritizing information for registrants over information for administrators; keeping type font, size, weight, and width variations to a minimum; not center-aligning text or headings; using contrast and graphics to support hierarchy and to aid legibility; and not using decorative art or illustration.) More information on the Design for Democracy project can be found at http://www.aiga.org/content.cfm/design-for-democracy.
Recommendation L-6: Encourage and if possible require departments of motor vehicles, public assistance and disability service agencies, tax assessors, and other public service agencies of state and local government in their communications with the public to remind voters to check and update their information.
Agencies of state and local government communicate with the public regularly, and each such communication is an opportunity to remind voters to check and update their information. Such reminders could be helpful in increasing the accuracy and completeness of the data contained in VRDs. Further, the online environment for state and local agencies provides opportunities for less passive forms of reminder; for example, individuals who use online government services to indicate a change of address (for example, on tax or property assessment records) can be offered reminders to update their registration information, or can even be routed automatically to online voter registration services to effect a similar change of address.
Because these additions would generally entail only small changes to existing applications in these other service agencies, they would be significantly less expensive than implementing the previous recommendation on developing and promoting portals for online checking of registration status and thus might well be a first long-term step that states could take.
Recommendation L-7: Consider providing tracking tags for voter registration forms to improve administrative processes.7
If a jurisdiction were to provide tracking tags, voter registration forms would have a tear-off tracking tag, and online registrants would be told to make a copy of the online form. First-time voters would be instructed to keep the tag or the copy of the online form and to bring it with them when they try to vote. States would keep data (and report such data to the EAC) on how many individuals attempted to vote and were not registered but had their tags or presented copies of their form. States would then be encouraged to lower the number of individuals in this category. In order to discourage attempts to improperly discredit or disrupt the voter registration process (e.g., through the use of fake tags and false claims that an individual was not registered), it might be necessary to provide for statutory penalties for the inappropriate use of these tags.
In addition, the tag might also include a tracking number or bar code to match it with the registration form itself, facilitating the association of specific individuals with specific forms. Blocks of numbers could also be allocated to different organizations to use. On the other hand, because including such a number or code would almost certainly have to be a government function, requiring such numbers or codes might run afoul of the NVRA, which specifically allows private duplication of voter registration forms in order to facilitate their widest possible distribution. In addition, numbered forms would entail additional costs for printing. Some states (e.g., Missouri and New Mexico) provide numbered registration forms today. The committee, however, takes no position on the general desirability of tracking numbers or codes at this time.
Although the use of these tags is not intended to substitute for a proper voter registration or for provisional voting, such tags would provide a factual basis for investigating, at least partially, claims from one political party that supporters of the other party have “pocketed” voter registration forms—that is, when conducting voter registration drives, receiving registrations for people of the opposite party and never turning them in. This activity is against the law, but there can be no proof as to whether it has occurred unless there is some form of receipt given to the person registering. If there were tags, then people who possessed them but were not in the VRD would be proof of some problem, including the possibility that registration forms had gone missing.
Election officials would also note in the language on the form explaining the tag that the tag is intended for administrative purposes, and is in no way a substitute for a valid and properly processed voter registration form. That is, in the absence of clear explanations to the contrary, citizens may believe that they will be allowed to vote, even if not properly registered, if they can present a tag or a copy of an online registration form to poll workers.
The committee recognizes that the NVRA (Section 8(a)(2)) already requires that election officials provide notice to applicants on the disposition of all voter registration applications. But this requirement can only be met when the applications indeed make it into the hands of these officials—if they never arrive, notice cannot be given, and individuals who never receive a notice cannot prove that they should have received notice.
Another important benefit of such tags is that they can facilitate reminders to third-party voter registration groups to turn in forms that they have been holding for an excessive period of time. Election officials can keep track of numbered registration forms as they are distributed to third-party groups, and if the applicant has the tag when he or she calls the election office, tardy groups can be identified and reminded to turn in the forms they are holding.
IMPROVE MATCHING PROCEDURES
Recommendation L-8: Upgrade the match algorithms and procedures used by election officials, the Social Security Administration, and departments of motor vehicles.
To the best of the committee’s knowledge, many (if not most) of the matching procedures used by the states have been developed on the basis of intuitive reasoning without further systematic validation or mathematically rigorous analysis, do not reflect the state of the art in matching techniques, and have not been validated scientifically, in the market, or otherwise. The best computer matching procedures that have been developed and compared by both researchers and industry do not appear to be widely used by the states for voter registration purposes. State-of-the-art matching techniques have been successfully used in a variety of commercial and government applications. The committee believes that there are several areas in which matching involving VRDs can be improved, and thus recommends that election officials engage the relevant technical community when considering improvements in matching techniques as described in the section “Improving Record-Level Matching” in Appendix B.
The enhanced methods should improve (1) the capability for locating duplicates in a state’s VRD, (2) the matching of voters against the state DMV file and the SSA files, and (3) the matching of registered voters against any secondary federal or state list (for example, of deaths, felons, and so on). The effectiveness of these enhanced methods could readily be demonstrated by applying them to a particular state’s VRD file and showing (especially through confirmed communication with the voters) how rates of false positives can be quite low even while significantly lowering rates of false negatives.
The committee believes that matching procedures can be substantially improved by implementing four changes that are described below and in greater detail in Appendix B.
Automated name rooting. Matching processes should handle equivalent common names (e.g., Bill, William, Will, Willie) and different spellings of those names (e.g., Jazmine and Jasmine, Mohamed and Muhammad) in a more automated fashion in order to avoid the problems associated with manual processing of equivalent names. This process should be implemented either at the election office or, more ideally, at the highest point of integration (e.g., at the DMV or the SSA, which provide lookup services for many users). One option is for the system to generate all the name variants. A second option is to assign to each name a most-rooted form (e.g., Bob = Robert and Rob = Robert), and when rooted forms match, putatively different names can be regarded as members of the same name family. The false positive rate will be low if other attributes such as date of birth and SSN4 or driver’s license number are taken into account.
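The second option (assigning each name a most-rooted form) might be sketched as follows. The nickname table is a tiny illustrative stand-in built from the examples in the text, not a real reference list, and the record fields (`first`, `last`, `dob`, `ssn4`) are assumed names; a production system would need a much larger, curated mapping.

```python
# Hypothetical, deliberately small table mapping name variants to a rooted form.
NAME_ROOTS = {
    "bill": "william",
    "will": "william",
    "willie": "william",
    "bob": "robert",
    "rob": "robert",
    "jazmine": "jasmine",
    "mohamed": "muhammad",
}


def root_name(first_name):
    """Return the most-rooted form of a first name, or the cleaned name itself."""
    key = first_name.strip().lower()
    return NAME_ROOTS.get(key, key)


def same_name_family(name_a, name_b):
    """Two first names belong to the same family when their rooted forms agree."""
    return root_name(name_a) == root_name(name_b)


def candidate_match(rec_a, rec_b):
    """Rooted first names must agree, and so must the other attributes.

    Requiring date of birth and SSN4 to agree is what keeps the false
    positive rate low when nicknames are treated as equivalent.
    """
    return (
        same_name_family(rec_a["first"], rec_b["first"])
        and rec_a["last"].strip().lower() == rec_b["last"].strip().lower()
        and rec_a["dob"] == rec_b["dob"]
        and rec_a["ssn4"] == rec_b["ssn4"]
    )


voter = {"first": "Bill", "last": "Turner", "dob": "1961-04-12", "ssn4": "4321"}
dmv = {"first": "William", "last": "Turner", "dob": "1961-04-12", "ssn4": "4321"}
print(candidate_match(voter, dmv))
```

Rooting both sides once is cheaper than the first option of generating every variant and resubmitting a query for each, which is one reason to do it where the queries are processed.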
Automated name ordering. Different cultural conventions may affect how names are represented in a database. For example, the Hispanic name Lucia Vega Garcia may be recorded in a database as Lucia Vega, Lucia Garcia, Lucia Vega-Garcia, or Lucia VegaGarcia, depending on how the data entry clerk chose to represent the fact that she uses Vega Garcia as her “last name.” Thus, matching processes should be able to handle in a more automated fashion different representations arising from ordering and spacing variations. As with name rooting, this process should be implemented either at the election office or, more ideally, at the highest point of integration (e.g., at the DMV or the SSA, which provide lookup services for many users). (More discussion of the issues associated with name ordering is contained in Appendix B and Appendix C.)
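One simple way to tolerate these spacing and ordering variations is to reduce each recorded surname to a set of comparison keys and treat two surnames as compatible when the sets overlap. The sketch below is an illustration under that assumption, not a method described in the report; the helper names are hypothetical.

```python
import re


def surname_keys(last_name):
    """Return comparison keys for a surname as it was recorded.

    The keys are each part of the surname (split on spaces and hyphens)
    plus the parts run together, all lowercased.
    """
    parts = [p for p in re.split(r"[\s\-]+", last_name.strip().lower()) if p]
    keys = set(parts)
    if parts:
        keys.add("".join(parts))
    return keys


def surnames_compatible(name_a, name_b):
    """True when the two recorded surnames share at least one key."""
    return bool(surname_keys(name_a) & surname_keys(name_b))


for recorded in ("Vega", "Garcia", "Vega-Garcia", "VegaGarcia"):
    print(recorded, surnames_compatible("Vega Garcia", recorded))
```

Because a shared key is weak evidence on its own (many unrelated people are named Garcia), a compatible surname should only make a pair a candidate; other attributes such as date of birth still have to agree before a match is declared.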
Wildcard matching capabilities. These capabilities may be useful for searching and matching in the presence of incomplete information.8 Wildcard matching, especially for “*” on name fields located at the beginning of the string, may be impractically slow on large databases because the match may require examining every record in the database. This would be especially true in searching SSA and DMV databases, given the need for relatively quick response. However, if the universe of relevant search can be narrowed (e.g., by using the first few characters of the name, or by using other fields such as a date of birth), a wildcard match can be performed in a much shorter amount of time.
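The narrowing idea can be sketched with Python's standard `fnmatch` module, whose `*` and `?` wildcards behave as described here (it also gives `[...]` a special meaning, which this sketch does not rely on). The record layout and the choice of date of birth as the narrowing field are assumptions made for illustration.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def build_dob_index(records):
    """Group records by date of birth so a search only scans one group."""
    index = defaultdict(list)
    for rec in records:
        index[rec["dob"]].append(rec)
    return index


def wildcard_search(index, dob, last_name_pattern):
    """Apply a wildcard pattern to last names, but only within one DOB group.

    A leading '*' no longer forces a scan of every record in the database,
    only of the records that share the given date of birth.
    """
    pattern = last_name_pattern.upper()
    return [
        rec for rec in index.get(dob, [])
        if fnmatchcase(rec["last"].upper(), pattern)
    ]


records = [
    {"id": 1, "last": "Smith", "dob": "1970-05-01"},
    {"id": 2, "last": "Smithson", "dob": "1970-05-01"},
    {"id": 3, "last": "Goldsmith", "dob": "1970-05-01"},
    {"id": 4, "last": "Smith", "dob": "1982-11-23"},
]
index = build_dob_index(records)
print([r["id"] for r in wildcard_search(index, "1970-05-01", "SMITH*")])
print([r["id"] for r in wildcard_search(index, "1970-05-01", "*SMITH")])
```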
Blocking and string comparators. Used in matching, these techniques—described in Appendix B—return a score indicating the degree of similarity of two fields, rather than the simple “match or no match” outcome of naïve matching algorithms.
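A minimal sketch of the two ideas together follows. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in string comparator; this is not one of the comparators discussed in Appendix B (record linkage work more commonly uses comparators such as Jaro-Winkler), and the blocking key (date of birth plus first letter of the last name) and field names are assumptions chosen only for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a score in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def block_key(rec):
    """Assumed blocking key: date of birth plus first letter of the last name."""
    return (rec["dob"], rec["last"][:1].upper())


def candidate_pairs(records):
    """Yield only pairs that share a blocking key, not every possible pair."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


def score_pairs(records):
    """Score the last names of each candidate pair with the comparator."""
    return [
        (a["id"], b["id"], similarity(a["last"], b["last"]))
        for a, b in candidate_pairs(records)
    ]


records = [
    {"id": 1, "last": "Johnson", "dob": "1970-01-01"},
    {"id": 2, "last": "Jonson", "dob": "1970-01-01"},
    {"id": 3, "last": "Smith", "dob": "1970-01-01"},
    {"id": 4, "last": "Johnson", "dob": "1985-06-30"},
]
for id_a, id_b, score in score_pairs(records):
    print(id_a, id_b, round(score, 3))
```

Blocking trades completeness for speed: a pair whose blocking fields disagree (for example, because of a mistyped birth date) is never compared, so the choice of key matters as much as the comparator.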
The above changes should be implemented in applications of the Social Security Administration and state departments of motor vehicles for processing verification queries from election officials. As a technical matter, it is easier to implement such changes in the query processing application rather than in the query generation application (if done in the generation application, an inordinately large number of queries would be generated). In addition, implementation of blocking and string comparators is likely to be a nontrivial programming task, and such a task may be beyond the resources and technical capabilities of many jurisdictions. Lastly, from the point of view of reducing duplication of effort, implementing it once at the SSA or the DMV makes much more sense than implementing it in multiple jurisdictions. Individual jurisdictions may also wish to adopt these changes to improve intrastate matching (such as in the case of lists of state felons) and when they compare their own VRDs with those of other states.
Although it is not likely that software for implementing these functions will be free for the taking from the Internet, a number of sites provide good points of departure for technical personnel interested in improving matching capabilities.9
Finally, any new matching procedure used in a VRD or to support a VRD should be rigorously evaluated and benchmarked in a public (and preferably peer-reviewed) study against the procedure currently in use in the existing VRD. Although an exact character-by-character match on the first name, middle name, last name, and date-of-birth fields is easily implemented, it must be regarded as a very weak default baseline (and calling such an algorithm a default baseline is not a recommendation that it should be used—it is only a recognition that it is often used). As discussed in Appendix B, it is somewhat common for two different individuals who have a common name such as “John Smith” to also agree on the full date of birth.
Traditionally, the “*” character refers to a string of arbitrary length and arbitrary content, while the “?” refers to a string of length one (1) and arbitrary content. Thus, the string “SMITH*” matches SMITH, SMITHSON, and SMITHSONIAN, or any other string starting with SMITH. The string “R?B” matches RIB, ROB, RUB, and RCB. Conceptually, using wildcard searches is a generalization of automated resubmission for different name variants: “?OB” matches both ROB and BOB, as well as COB, DOB, and so on.
See, for example, http://datamining.anu.edu.au/projects/linkage.html; http://www.cdc.gov/cancer/npcr/tools/registryplus/lp_tech_info.htm; http://www.mathcs.emory.edu/Research/Area/datainfo/FRIL/; http://www.cs.umd.edu/projects/linqs/ddupe/; http://www.the-link-king.com/; and http://members.shaw.ca/andre.wajda/linkpro.html.
Recommendation L-9: Use commonly used unique identifiers for voter identification when available and when necessary privacy safeguards are in place.
From a technical standpoint, the use of a commonly used unique identifier generally enhances the accuracy of matching. Today, the full SSN is the only commonly used unique identifier in the United States. Thus, if technical considerations were the only relevant considerations, the committee would recommend its use, even though today’s SSN has a number of technical flaws even as a unique identifier. (These flaws include the lack of a check digit and the scarcity of 9-digit SSNs relative to the population of the United States.10)
But technical considerations are not the only relevant ones. Because it is linked to so many other kinds of personal records, the use of the SSN for voter ID purposes inevitably raises significant privacy issues, especially when it must be disclosed under public records acts in the name of openness.11 Similar issues would arise in the United States with any effort to assign a new and unique voter ID identifier to every voter, because of concerns that its use could not be limited to the voter ID application. Thus, the use of a unique identifier for voter identification, which today could only be the SSN, is necessarily conditional on resolving these privacy issues (a task not within the committee’s charge and one that has been examined in many other contexts without definitive policy resolution).
It is also relevant that under today’s law, only six states are allowed to collect full SSNs for purposes of voter registration; these states were “grandfathered” at the time NVRA was passed because they were already using full SSNs as identifiers. The use of the full SSN nationally for voter registration purposes would require legislative change at the national level, and would quite likely be highly controversial.
Recommendation L-10: Establish standards or best practices for matching algorithms.
Standards or best practices for matching algorithms would have three components.
Pre-packaged software implementations of tested and debugged matching algorithms. A repository of such implementations to which states have free or low-cost access could significantly reduce the financial and logistical burden on individual states to implement such procedures and promote the adoption of these procedures. Broad adoption of such packages would provide greater uniformity in how similarly situated voters in different states are treated.
Specifications of acceptable levels of false positives and false negatives and the necessary thresholds for defining matches and nonmatches. When comparison algorithms return numerical scores rather than a binary result, it is necessary to define threshold values for those scores that determine matches and nonmatches. Best practice usually calls for establishing two thresholds, X and Y (X greater than Y), such that for scores greater than X, a match is indicated; for scores less than Y, a nonmatch is indicated; for scores between X and Y, manual review is indicated.
A standardized voter registration data set with known characteristics that can be used to evaluate the performance of specific algorithms and thresholds with respect to rates of false positives and false negatives. (In concept, a similar data set is the data set associated with the USPS CASS system.12) Vendors would then be able to demonstrate in a consistent manner how well their implementations of matching algorithms perform—results of tests involving these implementations could then be compared to the acceptable levels of false positives and false negatives described above.
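The second and third components can be tied together in a short sketch: a two-threshold decision rule, and a measurement of error rates against a labeled test set. The threshold values and the toy labeled pairs are invented for illustration; sending scores exactly equal to a threshold to manual review, and defining each rate relative to the number of true nonmatches or true matches, are assumptions rather than anything the text specifies.

```python
def classify(score, upper, lower):
    """Two-threshold rule: above upper is a match, below lower a nonmatch,
    and anything in between (boundaries included) goes to manual review."""
    if not lower < upper:
        raise ValueError("upper threshold must exceed lower threshold")
    if score > upper:
        return "match"
    if score < lower:
        return "nonmatch"
    return "manual_review"


def error_rates(labeled_pairs, upper, lower):
    """Measure automatic-decision errors on (score, is_true_match) pairs."""
    false_pos = false_neg = review = true_matches = true_nonmatches = 0
    for score, is_true_match in labeled_pairs:
        decision = classify(score, upper, lower)
        if is_true_match:
            true_matches += 1
        else:
            true_nonmatches += 1
        if decision == "manual_review":
            review += 1
        elif decision == "match" and not is_true_match:
            false_pos += 1
        elif decision == "nonmatch" and is_true_match:
            false_neg += 1
    total = true_matches + true_nonmatches
    return {
        "false_positive_rate": false_pos / true_nonmatches if true_nonmatches else 0.0,
        "false_negative_rate": false_neg / true_matches if true_matches else 0.0,
        "manual_review_share": review / total if total else 0.0,
    }


# Toy labeled data: (comparator score, whether the pair is truly the same person).
labeled = [(0.98, True), (0.95, False), (0.80, True),
           (0.50, False), (0.40, True), (0.30, False)]
print(error_rates(labeled, upper=0.9, lower=0.6))
```

Run against a shared, standardized data set, the same function would let different implementations and threshold choices be compared on equal terms, and would show how widening the gap between the thresholds moves errors into the manual review workload.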
A number of entities (such as the National Association of State Elections Directors, the EAC, or the National Institute of Standards and Technology) could establish a repository for algorithms, threshold values, and standardized data sets; such a repository would support the adoption of best practices and standards for improved matching algorithms.
William E. Winkler, “Should Social Security Numbers Be Replaced by Modern, More Secure Identifiers?,” Proceedings of the National Academy of Sciences USA 106(27):10877-10878, July 7, 2009, available at http://www.pnas.org/content/106/27/10975.full.
Indeed, in Greidinger v. Davis (92-1571), the 4th Circuit Court of Appeals held that a voter has a legitimate privacy interest in preventing disclosure of an SSN to the public when that number is provided for matching and other internal election-related purposes.
Recommendation L-11: Use the Social Security Death Master File and STEVE (when deployed) for list maintenance.
In order to purge VRDs of deceased voters, many jurisdictions rely on sources such as newspaper obituaries and information provided by their state departments of vital statistics. Relying only on such data means that these jurisdictions are likely to have a difficult time identifying voters on their voter registration rolls who die in other jurisdictions (e.g., a New Mexico voter who dies in Texas).
The SSA Death Master File (DMF) is widely regarded as a high-quality database. Use of this database—a national database—would enable jurisdictions to address the dying-out-of-state problem (some jurisdictions, such as Kentucky and occasionally Missouri, already do). Such use is consistent with HAVA, and the committee believes that when a very close match between the VRD record and the DMF record can be accomplished, such a match can be considered sufficient evidence to cancel a voter registration without further investigative action.13 (Similar points and conclusions apply to a high-quality database from a state department of vital statistics.)
However, nongrandfathered states are not allowed to capture a full SSN in a voter registration record, but rather simply SSN4. Matching is more difficult without using a full SSN, and in such cases, even a full match on the remaining fields (as well as the SSN4) should be taken only as an indicator of a possible death that warrants further investigation to see if the person is really dead. Furthermore, because the DMF is a relatively high-quality database, it is likely that with the use of high-quality matching algorithms, the number of false positives (death wrongly indicated) would be exceptionally small, and thus election officials would not be wasting significant resources on further investigation.
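The distinction drawn above between full-SSN and SSN4 matches can be sketched as a simple decision rule. The sketch assumes exact agreement on name and date of birth as a stand-in for a "very close match"; the record layout, the field names, and the three outcome labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    first: str
    last: str
    dob: str                    # assumed ISO format, e.g. "1950-03-14"
    ssn: Optional[str] = None   # full nine-digit SSN, if captured
    ssn4: Optional[str] = None  # last four digits only


def dmf_action(voter: Person, dmf: Person) -> str:
    """Return 'cancel', 'investigate', or 'no_action' for a VRD/DMF record pair."""
    same_name_dob = (
        voter.first.lower() == dmf.first.lower()
        and voter.last.lower() == dmf.last.lower()
        and voter.dob == dmf.dob
    )
    if not same_name_dob:
        return "no_action"
    if voter.ssn and dmf.ssn:
        # Full SSN available on both sides: agreement is treated as sufficient evidence.
        return "cancel" if voter.ssn == dmf.ssn else "no_action"
    dmf_last4 = dmf.ssn[-4:] if dmf.ssn else dmf.ssn4
    if voter.ssn4 and dmf_last4 and voter.ssn4 == dmf_last4:
        # Only SSN4 available: a possible death that warrants further investigation.
        return "investigate"
    return "no_action"


# Fabricated example records; the SSN shown is not a real number.
dmf_record = Person("Pat", "Doe", "1950-03-14", ssn="000001234")
full_ssn_voter = Person("Pat", "Doe", "1950-03-14", ssn="000001234")
ssn4_voter = Person("Pat", "Doe", "1950-03-14", ssn4="1234")
```

Here `dmf_action(full_ssn_voter, dmf_record)` yields a cancellation, whereas `dmf_action(ssn4_voter, dmf_record)` only flags the record for follow-up.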
At the time of this writing, the STEVE system for exchanging death information is not widely deployed, and thus does not yet provide such information comprehensively. But when it is widely deployed, it is likely to provide election officials with death information in a more timely fashion than does the SSA DMF, and election officials should either subscribe to STEVE on their own or work with their own state departments of vital statistics to obtain STEVE data. VRD systems will need to be configured to accept data from STEVE, perhaps accumulating them as they arrive and performing list maintenance on a “batch” basis.
Recommendation L-12: Use third-party data when available to resolve possible matches.
As discussed in Appendix C, third-party data such as telephone books or multiple previous addresses where an individual has resided can be used effectively to resolve pairs of records identified as possible matches. For example, two records may have the same name and similar dates of birth (12/01/80 and 01/12/80). Third-party data could be used to determine if these two records refer to the same individual; if, for example, these data indicated that both records shared a number of common addresses for the last 20 years, a higher likelihood of this possible match being a true match would be indicated.
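The date-of-birth example above can be sketched in code. This is an illustration under stated assumptions: the function names, the crude address normalization, and the minimum of two shared addresses are hypothetical choices, and the addresses are fabricated.

```python
from datetime import date
from typing import Set


def is_day_month_swap(a: date, b: date) -> bool:
    """True when two dates differ only by a swap of day and month."""
    return a != b and a.year == b.year and a.day == b.month and a.month == b.day


def _normalize(address: str) -> str:
    """Crude normalization: lowercase and collapse whitespace."""
    return " ".join(address.lower().split())


def shared_addresses(history_a: Set[str], history_b: Set[str]) -> int:
    """Number of addresses that appear in both third-party address histories."""
    return len({_normalize(s) for s in history_a} & {_normalize(s) for s in history_b})


def likely_same_person(dob_a: date, dob_b: date, history_a: Set[str],
                       history_b: Set[str], min_shared: int = 2) -> bool:
    """Treat a possible match as likely true when DOBs are compatible and histories overlap."""
    dob_compatible = dob_a == dob_b or is_day_month_swap(dob_a, dob_b)
    return dob_compatible and shared_addresses(history_a, history_b) >= min_shared


# The two dates of birth from the example: 12/01/80 and 01/12/80.
dob_1, dob_2 = date(1980, 12, 1), date(1980, 1, 12)
hist_1 = {"10 Main St", "22 Oak Ave", "5 Elm Rd"}
hist_2 = {"10 main st", "22  Oak Ave", "9 Pine Ct"}
```

With these inputs `likely_same_person(dob_1, dob_2, hist_1, hist_2)` returns True, since the dates differ only by a day/month swap and two addresses are shared.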
Third-party data are likely most useful in applications where a false positive has high consequences, that is, where individuals would be wrongly disenfranchised. In addition, today most uses of third-party data involve manual processing and review by humans. Automated processes to use third-party data would reduce the number of cases necessitating human review and judgment and would improve the overall accuracy, quality, and repeatability of matching.
Recommendation L-13: Develop procedures for handling potential disenfranchisement caused by mistaken removals from voter registration lists.
Any given removal of a name from a voter registration list may have been performed in error. Indeed, a great deal of experience with information technology suggests that even a combination of automated and human matching can sometimes result in inappropriate action because of data errors, inherent ambiguity in the data, algorithm deficiencies, human error, and so on. For example, a felony may have been reduced to a misdemeanor by the court without that fact being made known to election officials. Other sources of error exist as well, and there is an inherent unfairness in changing a voter’s status and potentially disenfranchising him or her without providing an opportunity for contesting the removal.
Procedures for addressing disenfranchisement could be handled in a number of different ways. For example, one approach is to provide the person removed from a voter registration list with the opportunity to contest that decision before the removal is made final, though understaffed and/or underfunded election offices might find this approach onerous in light of small staffs, high mailing costs, and other pertinent issues. In addition, notification of voters removed from the list may be upsetting to the families of those individuals suffering from the pain of a relative’s death or the person’s being declared mentally incompetent. Another approach might be to allow a voter who was inadvertently removed to vote provisionally. Such an approach is mandated by HAVA for federal elections, but it could be adopted for state and local elections as well.14
Developing such procedures might well require new legislation and administrative processes.
IMPROVE PRIVACY, SECURITY, AND BACKUP
Recommendation L-14: Implement basic practices for backing up important data.15
Basic backup practices include:
Backing up regularly. Backup of data every night (or at least every night after data are entered into the VRD) is a sensible practice.
Keeping backup media for as long as necessary, based on an explicit risk assessment for determining appropriate data retention periods.
Practicing restoration of backups. File backups are useless if they cannot be restored. Although in principle file backups should be easily usable, experience shows that such is not necessarily the case. Most installations learn a lot from the first time they try to restore a backup, and subsequent restores go much more smoothly. Of course, precisely because problems may occur, attempts to restore a backup should occur only at times when such problems would cause minimal disruption.
Storing backups offsite. Backup media should be stored in a physical location that is some distance away from the main site where the database is used—such a precaution protects against a single catastrophe destroying the database at the main site and the backup media. Offsite storage requires both backing up the data and arranging for an alternative facility. Many commercial facilities and services exist for this purpose.
14 Of course, provisional ballots will be counted only if, in fact, the caster of the provisional ballot is indeed eligible to vote. Since eligibility is determined by a proper and accurate registration of the voter, the caster of the provisional ballot must be able to challenge what he or she believes to be an improper removal from the VRD. Thus, such individuals must also receive the information they need to understand why they were removed and how they might correct the error(s).
15 See, for example, ca.com/files/whitepapers/backup_recov_wp.pdf.
Maintaining backup logs. Operators should know what is backed up, when it was backed up, and where the backups are located.
Encrypting backups. Backup media are a treasure trove of information for miscreants to steal. They are especially vulnerable to loss or theft while in transit and remain vulnerable while in storage. Thus, backups should be encrypted. It is true that encrypted files are often difficult to restore (passwords can be lost, files corrupted, and so on), but a combination of good backup logs and at least occasional practice of file restore procedures generally suffices to make encryption a reasonable safety precaution to take. Decryption keys should also be stored securely, preferably in locations separate from the backup media.
Performing full backups when possible, and incremental or differential backups when necessary. Because full backups are much easier to restore than incremental and/or differential backups, full backups are recommended if it is possible to perform them within the necessary time constraints (usually, an 8-hour night shift). Incremental or differential backups may be necessary if a full backup would take more than 8 hours, and in such cases, making full backups may have to be done on a weekly or a monthly basis. Incremental backups are also more likely to fit onto a single storage unit (e.g., one DVD), and the ability to use a single unit for backup rather than multiple units may make it feasible to fully automate the backup.
Cycling backups. A robust example of scheduling backups might be a schedule in which data are backed up every day separately (e.g., Monday through Sunday). At the end of the week, the Sunday backup is kept as the backup for the week, and the backup media from the other days are recycled or reused. Sunday backups are kept for the month, and then an end-of-month backup is stored. End-of-month backups are kept for the entire year, and the end-of-year backup is kept in perpetuity or until data destruction practices come into play. This type of cycling backup reduces the risk that by the time some data corruption is found (e.g., 3 months after it occurred), a backup prior to the data corruption cannot be located.
When possible, investing in real-time backup in the form of data replication or mirroring.
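The cycling schedule described above can be sketched as a rule that assigns each daily backup a retention class. The retention periods used here (7, 31, and 366 days) are assumptions chosen to approximate "the week," "the month," and "the entire year"; an actual schedule should follow from the risk assessment mentioned earlier.

```python
import calendar
from datetime import date


def retention_class(d: date) -> str:
    """Classify the backup taken on date d under the cycling scheme."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    if d.month == 12 and d.day == 31:
        return "yearly"    # end-of-year backup, kept in perpetuity
    if d.day == last_day:
        return "monthly"   # end-of-month backup, kept for the year
    if d.weekday() == 6:   # Monday is 0, Sunday is 6
        return "weekly"    # Sunday backup, kept for the month
    return "daily"         # recycled after the week


# Assumed retention periods in days; None means the backup is never recycled.
KEEP_DAYS = {"daily": 7, "weekly": 31, "monthly": 366, "yearly": None}


def can_recycle(backup_date: date, today: date) -> bool:
    """True when the media for this backup may be reused."""
    keep = KEEP_DAYS[retention_class(backup_date)]
    return keep is not None and (today - backup_date).days > keep
```

For example, a backup taken on Tuesday, March 5, 2024, is a daily backup and may be recycled after a week, whereas the backup taken on December 31, 2023, is never recycled.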
In addition, an Election Day full backup should be performed in order to have a permanent record of those who were deemed eligible to vote on Election Day. Such a record provides statistics that would not otherwise be available—and such data could be helpful both to election officials and to researchers.
All parties—state or county—that store data for any length of time should take some responsibility for backing up the data in their possession. However, the most essential backup points are located with the systems of record (that is, where the data are posted for all to use). Secondary aggregations such as state-level databases in bottom-up configurations have backup obligations as well, though they may need backup less frequently than local offices making daily changes that need backup every day, since the state-level database can in principle be recreated from the data contained in the local systems.
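The practices of maintaining backup logs and practicing restoration can be combined in a simple way: record a checksum when the backup is taken, and compare it with the checksum of the data after a test restore. The sketch below assumes SHA-256 as the checksum and uses hypothetical field and function names; it does not perform the backup itself.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BackupLogEntry:
    source: str     # what was backed up
    location: str   # where the backup is stored
    taken_at: str   # when it was taken (UTC, ISO 8601)
    sha256: str     # checksum of the backed-up bytes


def log_backup(source: str, location: str, data: bytes) -> BackupLogEntry:
    """Create a log entry recording what, when, where, and a checksum."""
    return BackupLogEntry(
        source=source,
        location=location,
        taken_at=datetime.now(timezone.utc).isoformat(),
        sha256=hashlib.sha256(data).hexdigest(),
    )


def restore_matches(entry: BackupLogEntry, restored: bytes) -> bool:
    """Check that restored data are identical to what was originally backed up."""
    return hashlib.sha256(restored).hexdigest() == entry.sha256


# Hypothetical usage with stand-in data.
entry = log_backup("county-vrd", "offsite-vault-1", b"voter records export")
```

A test restore that fails this check signals a problem with the backup media or the restore procedure before the backup is actually needed.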
Recommendation L-15: Implement basic security measures.
Good security policies and procedures start with a commitment to security being an integral part of an organization’s operating practice. All too often, organizations give lip service to security but in practice are never willing to pay any price (in either operational or fiscal terms) to improve security. The reality is that cybersecurity expenses must be regarded in the same way as expenses for disaster insurance and door locks. Such purchases entail some degree of expense and inconvenience for organizations and individuals and they are ideally never needed, but they are intended as a hedge against the presence of security threats.
Best practices (described further in Appendix D) for security include:
Establishing and enforcing access control policies that group people by established roles and assign to these roles the minimal level of access needed to carry out their job functions.
Limiting the number of people with administrative privileges that afford the ability to grant access to others.
Training authorized users of the system in security practices, such as choosing and protecting passwords and resisting “social engineering” attacks. (A “social engineering” attack is one based on duping an authorized VRD user into taking some action that compromises the security of the system.)
Securing all communications channels used by the system via end-to-end cryptography to protect both the confidentiality and the integrity of the data.
Limiting connectivity between internal and external networks through the use of mechanisms such as firewalls.
Deploying mechanisms such as commercially available intrusion detection and antivirus systems to reduce the risk of cyberattacks or insider misuse.
Minimizing the use of VRD systems for other purposes, and minimizing the amount of non-VRD-related software installed on them.
Limiting the number of access points to the VRD with access to particularly sensitive information such as complete or last-four digits of Social Security numbers.
Obtaining independent security review of the VRD system before deployment and periodically thereafter through penetration testing.
Tracking and logging all changes to VRD data and systems.
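Two of the practices listed above, role-based access with minimal privileges and logging of all activity, can be sketched together. The role names, the permission names, and the in-memory log are hypothetical; a production system would rely on the access control and audit facilities of its database and operating system.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set

# Each role receives only the permissions needed for its job function (assumed roles).
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "poll_worker": {"read_record"},
    "clerk": {"read_record", "edit_record"},
    "administrator": {"read_record", "edit_record", "grant_access", "view_ssn4"},
}

audit_log: List[dict] = []


def is_allowed(role: str, action: str) -> bool:
    """An action is allowed only if it is explicitly assigned to the role."""
    return action in ROLE_PERMISSIONS.get(role, set())


def perform(user: str, role: str, action: str, record_id: str) -> bool:
    """Check permission and log the attempt, whether or not it is allowed."""
    allowed = is_allowed(role, action)
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "record_id": record_id,
        "allowed": allowed,
    })
    return allowed
```

Logging denied attempts as well as permitted ones gives reviewers a record of possible insider misuse, not just of completed changes.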
Recommendation L-16: Take measures to help ensure system accessibility during critical times.
In some cases, technical fixes can be implemented to enhance system accessibility. For example, a VRD can be designed in such a way that applicant-provided data that cannot be immediately verified are accepted, stored, and flagged as “verification-pending.” Such a feature would enable election officials to continue with data entry if the nonelection databases on which they depend are unavailable during periods when the volume of voter registration forms is high.
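The “verification-pending” feature can be sketched as follows. The exception type, the status labels other than “verification-pending,” and the verifier callback are assumptions made for illustration; the callback stands in for a query to a nonelection database such as a DMV system.

```python
from typing import Callable, Dict


class SourceUnavailable(Exception):
    """Raised by a verifier when the external database cannot be reached."""


def enter_application(application: Dict[str, str],
                      verify: Callable[[Dict[str, str]], bool]) -> Dict[str, str]:
    """Store the application, flagging it as pending if verification cannot run now."""
    record = dict(application)
    try:
        record["status"] = "verified" if verify(application) else "verification-failed"
    except SourceUnavailable:
        # Data entry continues; the record is verified later when the source is back.
        record["status"] = "verification-pending"
    return record


def offline_verifier(application: Dict[str, str]) -> bool:
    """Stand-in verifier simulating an outage of the external database."""
    raise SourceUnavailable("external database is not responding")


pending = enter_application({"last_name": "Doe", "first_name": "Pat"}, offline_verifier)
```

Records flagged in this way can be collected and resubmitted for verification once the external database is reachable again.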
On the other hand, DOS attacks against Internet-based VRDs are difficult to mitigate—the only known solution with broad applicability is the acquisition of additional bandwidth to “soak up” falsified requests for service. Such a solution is expensive and is likely not to be cost-effective, given relatively few DOS attacks in the elections environment.
Absent such measures, election officials can only make contingency plans for a DOS (e.g., ensuring that copies of a statewide VRD are widely distributed on a computer-readable DVD to polling places on Election Day, saving paper forms for later entry when automated entry is not available). As a general rule, the best contingency plan for electronic outages is the ability to use (temporarily) whatever paper-based procedures were in place before the VRD system was introduced. Such a measure requires clear documentation and a modicum of training for Election Day poll workers and election officials.
In the election environment, a specific measure recommended by the committee is to make the entire VRD available to poll workers. (In some cases, the entire VRD may refer just to a county VRD, and in other cases, to the full state VRD.) The easiest and most secure method for doing so is probably to write the relevant fields of each record in the VRD to a file and then to distribute the file to every precinct.
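Writing only the relevant fields of each record to a file might look like the sketch below. The field list and the comma-delimited format are assumptions; the point illustrated is that sensitive fields (such as SSN4) present in the VRD record are simply not written to the distributed file.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Assumed set of fields that poll workers need; anything else is omitted.
POLLBOOK_FIELDS = ["voter_id", "last_name", "first_name", "address", "precinct"]


def export_pollbook(records: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write the poll-book fields of each record as comma-delimited text."""
    writer = csv.DictWriter(out, fieldnames=POLLBOOK_FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # fields not in POLLBOOK_FIELDS are ignored


# Fabricated record for illustration.
vrd_records = [{
    "voter_id": "V1", "last_name": "Doe", "first_name": "Pat",
    "address": "10 Main St", "precinct": "P-07", "ssn4": "9876",
}]
buffer = io.StringIO()
export_pollbook(vrd_records, buffer)
```

In practice the output would be written to a file on disk rather than to an in-memory buffer.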
Distribution could take place over the Internet (but would most likely require a broadband connection, which is not available to every county) or via CD-ROMs or even paper. The former has the advantage of currency—using broadband transmission, file creation could reflect the most recent updates. The latter have the disadvantage of latency—physical media take time to mail, and by the time they arrive at the election offices, they will be a few days out of date. On the other hand, they do not require any special technology to use.
Lastly, other agencies, such as state DMVs and the Social Security Administration, should take steps to ensure the availability of critical election-related databases during times of peak electoral business. The committee calls special attention to the Columbus Day weekend, which occurs in near proximity to Election Day. Although the holiday is recognized by the federal government and many state agencies, it is also a period in which election officials process enormous numbers of voter registration applications. Accessibility to DMV and SSA databases during this period is extraordinarily important from an elections management standpoint.
Recommendation L-17: Consider fair information practices as a point of departure for protecting privacy in voter registration databases.
Although fair information practices are often regarded as a reasonable framework for balancing privacy of personal information against the needs of users, judgments about protecting privacy have to be subject to a balancing test against other interests to be served by public policy. A full implementation of FIPs for voter registration databases is likely to conflict with other legal requirements for openness and to interfere with administrative efficiency. For this reason, the committee believes that FIPs should be only a starting point for election officials thinking through their privacy policies. That is, it is the spirit and philosophy underlying the FIPs rather than a literal reading that should guide the efforts of election officials in designing VRD systems.
For example, FIPs afford the individual a high degree of control over the disclosure of his or her personal information. But openness of individual voter registration information also serves valid public purposes, such as helping to prevent or reduce voter fraud and facilitating communications between election candidates and voters. One possible way of balancing these interests would be to provide selected individuals (but only those individuals) with opportunities for limited disclosure of information (such as addresses).
Recommendation L-18: Take steps to protect voter privacy when voter registration data are released on a large scale.
Although voter privacy is important even when just one voter’s information is at stake, large-scale compromises of personal information can be particularly damaging. Obviously, election officials must do what they can to protect information while it is within their control. But once the information has been released (putatively in accordance with the applicable law), election officials have no effective technical control over how that information will be actually used. To the extent that they can do so, election officials should find a way to bind the recipient—legally—to take the necessary precautions.
Election officials can take some steps to trace how data are used. As discussed in Appendix D, they can seed the data before they are transferred with one or more fake record(s) that can be used to indicate subsequent misuse.
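Seeding can be sketched as adding one fake record per recipient, derived from a secret known only to the election office, so that a copy found later can be traced to the release it came from. The record layout, the secret, and the recipient names below are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def seed_record(recipient: str, secret: str) -> Dict[str, str]:
    """Build a fake record whose identifier is unique to one recipient."""
    tag = hashlib.sha256(f"{secret}:{recipient}".encode()).hexdigest()[:8].upper()
    return {
        "voter_id": f"SEED-{tag}",
        "last_name": f"Zzseed{tag}",
        "first_name": "Test",
        "address": "1 Example Way",
    }


def release(records: Iterable[Dict[str, str]], recipient: str, secret: str) -> List[Dict[str, str]]:
    """Return the data set to hand over, with the recipient's seed record added."""
    return list(records) + [seed_record(recipient, secret)]


def identify_source(found: Iterable[Dict[str, str]], recipients: Iterable[str],
                    secret: str) -> List[str]:
    """List the recipients whose seed record appears in a data set found in the wild."""
    found_ids = {record["voter_id"] for record in found}
    return [name for name in recipients if seed_record(name, secret)["voter_id"] in found_ids]


real_records = [{"voter_id": "V1", "last_name": "Doe", "first_name": "Pat", "address": "10 Main St"}]
released_to_a = release(real_records, "Campaign A", secret="office-only-secret")
```

A real deployment would make the seed records less conspicuous than these and would keep the secret and the list of recipients under the same protections as the VRD itself.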
Recommendation L-19: Review appropriate nonelection uses of voter registration data.
States use voter registration data for a number of purposes other than election administration. One of the more common uses is for juror selection—voter registration lists are often one of the sources used to compile lists from which potential jurors are selected. Many states also make such lists available to political parties to facilitate communications with voters in their parties. A number of states regard voter registration lists as public information, and disseminate them to any party willing to pay a nominal fee, though they may place restrictions on the use of such data (e.g., not for commercial purposes).
Nonelection uses of and/or restrictions on voter registration data are sometimes contested by parties wishing to use the data for their own purposes, which may include commercial purposes, nonpartisan educational purposes, and so on. The committee notes that most state and local policy regarding how voter registration data can be used and by whom was developed in a technological environment that did not make it easy to aggregate personal information on a widespread basis, in which commercial use of all forms of individual data was not commonplace, and in which privacy concerns were not as salient in the public eye as they are today.
For this reason, the committee believes it is appropriate for state policy makers to review their policies regarding nonelection use of voter registration data (who should have access, under what circumstances, and for what purposes) with particular attention to whether the users, uses, and restrictions entailed are consistent with the purpose for which voter registration information is requested and collected.
IMPROVE DATABASE INTEROPERABILITY
Recommendation L-20: Encourage and if possible require state, local, and federal agencies to cooperate with election officials in providing data to support voter registration.
The starting point for achieving database interoperability between a VRD and the databases of other agencies is a willingness and a desire of those other agencies to share data with election officials. But because sharing data with election officials does not generally further the primary mission of those agencies, it is not difficult to imagine that devoting resources to this task would be low on their priority lists.
Broadly speaking, there are two ways to seek the necessary cooperation in the face of agency reluctance—invocation of higher authority, and incentives to cooperate. When the other agencies in question are under state control, the governor’s office may have an important role to play in persuading them to provide data to election officials in a timely fashion. When the other agencies in question are under local control, state incentives and/or directives may be necessary to secure cooperation.
The NVRA requires various state agencies to make voter registration forms available to people seeking the services that these agencies provide. In many cases, the data that service-seeking individuals must provide to the relevant agency include all of the information needed to register these individuals to vote (and these individuals must usually provide signatures to obtain services). These data are electronically captured and thus could, in principle, be easily available to VRDs.
But the reality is that the NVRA requirement is met by merely delivering completed forms to election officials. Data already captured in electronic form are generally not transmitted to election officials, leaving them with the same data entry task as always. Nothing in the NVRA forbids these agencies to transmit their electronic data to election offices, but doing so would require these agencies to identify individuals who would like to register to vote and then to make their data available to election officials in electronic form. At the very least, these agencies would have to invest in some redesign and reimplementation of some parts of their information technology systems. Raising the priority of these agencies for implementing changes that primarily benefit election officials is likely to require direction from higher authority, such as governors or state legislatures, and the NVRA may itself need to be clarified to allow electronic transfer of registration information or modified to require such transfer.
A second kind of support involves access to useful federal databases, such as the USPS National Change of Address database and the Social Security Death Master File. Although such databases are in principle available to state election officials, access suitable for the needs of these officials is often expensive relative to the financial resources available to them. In some cases, election officials cannot approach the relevant agency directly, but must instead go through a qualified commercial provider.
In general, these barriers to access arise from the fact that state/local election registrars are regarded as customers on a par with other private sector or commercial entities. Providing privileged low-cost access to these databases for election officials to help maintain accurate and complete voter registration rolls would seem to serve a worthy public purpose, and the committee would support providing such access even if it would require legislation to do so.
Recommendation L-21: Use inexpensive data export functions to facilitate data exchange.
It is commonly believed that direct linkages (real-time electronic interfaces) between systems are essential for effective data exchange. Although direct linkages generally provide the most current and recent data (because they have direct unmediated access to the database in question), they are often expensive to deploy and complex in operation. By contrast, data can also be exchanged using the sending system’s ability to “export” its data into a known file format (e.g., an Excel spreadsheet, or a comma-delimited file). Such a file can be written onto a physical medium or sent electronically. This approach may not capture the most recent data, but since most systems support a data-export function, it entails a very low cost of operation.
In addition, file comparisons can usually be performed offline, that is, using applications separate from the core VRD application that read the exported files. Offline applications have the major benefit that they can be developed without rewriting the VRD system itself, and they thus pose little danger to its functionality.
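An offline comparison of two exported files might be sketched as follows. The column names (`last_name`, `first_name`, `dob`) and the exact-match comparison key are assumptions for the example; the code reads only the exported text and never touches the VRD application itself.

```python
import csv
import io
from typing import Dict, Set, Tuple

Key = Tuple[str, str, str]


def load_keys(csv_text: str) -> Set[Key]:
    """Read a comma-delimited export and build a comparison key for each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        (row["last_name"].strip().lower(), row["first_name"].strip().lower(), row["dob"].strip())
        for row in reader
    }


def compare_exports(export_a: str, export_b: str) -> Dict[str, Set[Key]]:
    """Report which keys appear in both exports and which appear in only one."""
    keys_a, keys_b = load_keys(export_a), load_keys(export_b)
    return {
        "in_both": keys_a & keys_b,
        "only_in_a": keys_a - keys_b,
        "only_in_b": keys_b - keys_a,
    }


# Fabricated exports from two hypothetical systems.
county_export = "last_name,first_name,dob\nDoe,Pat,1950-03-14\nRoe,Sam,1972-11-02\n"
state_export = "last_name,first_name,dob\ndoe,pat,1950-03-14\nPoe,Lee,1988-06-30\n"
report = compare_exports(county_export, state_export)
```

Because the comparison runs on exported copies, an error in this kind of tool cannot alter records in the system of record.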
Recommendation L-22: Develop national standards for data-exchange formats for voter registration databases.
Standardized field definitions for database records greatly facilitate the common processing of records derived from different databases, as might be entailed, for example, in a search for voter registration records for the same person in two different databases. For example, one database may record dates in a yyyy-mm-dd format and the other in a dd-mm-yyyy format. Or, one database may include a suffix such as Jr. or Sr. as part of the last-name field, and another might include a separate field for suffixes.
One approach to implementing standardized field definitions is for every database to adopt the same conventions for recording data. Thus, any export of that data outside the system’s boundaries would automatically be rendered consistent with any other database. On the other hand, requiring all systems of record to convert to the same standard field definition necessarily entails operational costs (e.g., disruption to current ongoing operations, costs of reprogramming internal logic of the applications using the databases, and so on) and is thus impractical.
A second approach, and one recommended by the committee, focuses on standards only for data interchange—that is, standardized field definitions only for data that are intended to be used by another database system. With such an approach, the internal logic of applications can remain unchanged (because the data are stored in the same format as before). But the data are converted into this standard form when data are prepared for export.
Standards for name, date, and SSN representation are, in principle, not difficult to implement. The last-name field should include name suffixes, such as Jr. or III.
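Conversion at export time, leaving internal storage unchanged, might be sketched as follows. The choice of yyyy-mm-dd as the interchange date format is an assumption made for the example (no specific format is prescribed above), as are the field names; the placement of suffixes in the last-name field follows the convention just stated.

```python
from datetime import datetime
from typing import Dict


def to_interchange(record: Dict[str, str], date_format: str) -> Dict[str, str]:
    """Convert one internal record to the assumed interchange form.

    date_format is the strptime pattern the exporting system uses internally,
    e.g. "%d-%m-%Y" for dd-mm-yyyy.
    """
    dob = datetime.strptime(record["dob"], date_format).date().isoformat()
    last_name = record["last_name"].strip()
    suffix = record.get("suffix", "").strip()
    if suffix and not last_name.endswith(suffix):
        last_name = f"{last_name} {suffix}"  # suffix becomes part of the last-name field
    return {
        "first_name": record["first_name"].strip(),
        "last_name": last_name,
        "dob": dob,
    }


# Two hypothetical systems storing the same person under different conventions.
system_a = {"first_name": "John", "last_name": "Smith", "suffix": "Jr.", "dob": "14-03-1950"}
system_b = {"first_name": "John", "last_name": "Smith Jr.", "dob": "1950-03-14"}
exported_a = to_interchange(system_a, "%d-%m-%Y")
exported_b = to_interchange(system_b, "%Y-%m-%d")
```

Both exports produce the same interchange record even though neither system changed how it stores data internally.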
The implementation of this recommendation would do much to facilitate the matching of records within different VRDs. But the committee also notes that the data of interest for matching VRD records (name, SSN, date of birth) are sufficiently standardized in their definitions that with the use of string comparators, these data fields can be used for matching purposes.
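A string comparator returns a similarity score rather than a yes/no answer, so that near-identical values (such as a misspelled name) still score highly. The sketch below uses the ratio from Python's standard `difflib.SequenceMatcher` purely as a stand-in; it is not the comparator used in any particular VRD, and the normalization step is an assumption.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0.0 and 1.0; 1.0 means identical after normalization."""
    a_norm = " ".join(a.lower().split())
    b_norm = " ".join(b.lower().split())
    return SequenceMatcher(None, a_norm, b_norm).ratio()


# Hypothetical name pairs: a likely typographical variant and an unrelated name.
close_pair = similarity("Johnson", "Jonson")
distant_pair = similarity("Johnson", "Smith")
```

Scores of this kind are what the match, nonmatch, and manual-review thresholds discussed earlier would be applied to.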