in particular, it does not collect date-of-birth information because such information is not related to the primary business purpose of the USPS. Thus, the NCOA system cannot be queried with name and date of birth to learn an individual’s new address. Furthermore, because of privacy considerations, the USPS limits the disclosure of change-of-address information, and thus a name and old address must be presented before a new address can be provided. This limitation is significant because it means that an election official cannot simply query the NCOA database for the new address of an individual known to have moved.

A more detailed discussion of data capture and quality can be found in Appendix C.

3.2 DATABASE INTEROPERABILITY

Database interoperability arises as a requirement because election officials must perform a variety of tasks that involve other databases, ranging from other state VRDs to lists of deceased persons as described above. From a technical standpoint, database interoperability refers to the capability of two databases to exchange data (perhaps with a third-party application) and to use the exchanged data.1 Data exchange involves transmitting and receiving data between two systems, by whatever means, in a way that maintains the usability (preserves the structure and formatting) of the data. Data use depends on the corresponding data fields having the same meaning in each database.

Transmitting and receiving data involve moving the electronic bits that represent the data in question through some channel. In practice, this involves either a communications network connecting the two database systems or a physical medium such as a CD-ROM that carries the data. A direct linkage (e.g., an Internet connection) allows real-time communication, so the data transferred to the receiving system can be kept current with changes. A physical medium generally “batches” the data to be transferred, and thus changes to the sending system’s database arrive at the recipient with some delay, and the recipient’s copy may not reflect the most recent changes.

Whichever channel is used, the data must be formatted so that what one system writes the other can read. A common approach to achieving formatting compatibility is to use the sending system’s ability to “export” its data into a known file format (e.g., a comma-delimited file); the resulting file is then transmitted or carried to the receiving system.
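The export-and-import approach can be illustrated with a minimal Python sketch. The field names and the sample record are assumptions made for illustration; they are not drawn from any actual VRD. The sketch writes records to comma-delimited text and reads them back, using an in-memory buffer in place of a file that would be transmitted or carried.

```python
import csv
import io

# Hypothetical field names; a real VRD export would define its own.
FIELDS = ["last_name", "first_name", "date_of_birth", "address"]

def export_records(records, stream):
    """Write records as comma-delimited text with a header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)

def import_records(stream):
    """Read comma-delimited text back into a list of dictionaries."""
    return list(csv.DictReader(stream))

# Illustrative record; note the comma inside the address value.
records = [
    {"last_name": "O'Neil", "first_name": "Pat",
     "date_of_birth": "1970-03-04", "address": "12 Elm St, Apt 3"},
]

# The buffer stands in for the exported file.
buffer = io.StringIO(newline="")
export_records(records, buffer)
buffer.seek(0)
received = import_records(buffer)
```

Because the address contains a comma, the writer quotes that field, and the reader recovers it intact. A shared file format of this kind settles only how the data are structured; whether corresponding fields mean the same thing in both systems is a separate question.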

Data usability is guaranteed if all databases use the same data definitions.2 However, in the situations faced by election officials, data definitions of the comparison databases (the databases containing the data with which VRD data must be compared) may well be different. Ensuring the similarity of data definitions goes beyond classic definitions such as “integer” or “character string”—it also includes issues such as formatting and data semantics.

For example, System A may define dates in a mm-dd-yyyy format, and System B in a dd-mm-yyyy format. The semantics of the two systems may differ: System A may use standardized addresses and strip all punctuation from name fields, whereas System B may not use standardized addresses and may retain punctuation in name fields. Or, System A may include name suffixes in the last name field, and System B may provide a separate field for name suffixes.
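A short Python sketch may make the date and punctuation differences concrete. It assumes that each system's date format is known in advance and that stripping ASCII punctuation and ignoring letter case is an acceptable way to compare names; both are illustrative assumptions rather than rules from any particular system.

```python
import string
from datetime import datetime

# Assumed formats: System A uses mm-dd-yyyy, System B uses dd-mm-yyyy.
FORMAT_A = "%m-%d-%Y"
FORMAT_B = "%d-%m-%Y"

def parse_date(text, fmt):
    """Parse a date string using the format the sending system declares."""
    return datetime.strptime(text, fmt).date()

def normalize_name(name):
    """Uppercase a name, strip ASCII punctuation, and collapse whitespace."""
    no_punct = name.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.upper().split())

# The same birth date as each system would write it.
date_a = parse_date("03-04-1970", FORMAT_A)  # March 4, 1970
date_b = parse_date("04-03-1970", FORMAT_B)  # March 4, 1970

# Reading System A's string with System B's format gives a different,
# equally valid-looking date (April 3, 1970), so no error is raised.
misread = parse_date("03-04-1970", FORMAT_B)

# A name that retains punctuation matches one that has had it stripped.
names_match = normalize_name("O'Neil") == normalize_name("ONEIL")
```

The misread date shows why such differences are hazardous: when both day and month are 12 or less, a string in one format parses without complaint in the other, and the mistake surfaces only as a failed or false match.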

Such definitional differences may increase the difficulties of comparing fields unless the definitions of these fields can be reconciled. A variety of technical approaches have been developed for dealing with differing standards or incompatible definitions; see Box 3.1. In any event, data definitions must either match or be transformed in a way that preserves the semantics of the data.
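One way to reconcile the name-suffix difference is to transform records from both systems into a common representation before comparing them. The following Python sketch assumes a small, non-exhaustive list of suffixes and hypothetical function names; it illustrates the idea and is not one of the approaches described in Box 3.1.

```python
from dataclasses import dataclass

# Assumed, non-exhaustive suffix list, for illustration only.
SUFFIXES = {"JR", "SR", "II", "III", "IV"}

@dataclass(frozen=True)
class CommonName:
    last_name: str
    suffix: str  # empty string when there is no suffix

def clean_suffix(text):
    """Uppercase a suffix and drop a trailing period."""
    return text.strip().upper().rstrip(".")

def from_system_a(last_name_field):
    """System A stores any suffix inside the last-name field."""
    parts = last_name_field.split()
    if len(parts) > 1 and clean_suffix(parts[-1]) in SUFFIXES:
        return CommonName(" ".join(parts[:-1]), clean_suffix(parts[-1]))
    return CommonName(" ".join(parts), "")

def from_system_b(last_name, suffix):
    """System B already stores the suffix in a separate field."""
    return CommonName(" ".join(last_name.split()), clean_suffix(suffix))

a = from_system_a("Smith Jr.")
b = from_system_b("Smith", "Jr")
records_agree = a == b
```

Splitting on a fixed suffix list is a heuristic. It would need review for surnames that coincide with a listed suffix and for fields that separate the suffix with a comma, which this sketch does not handle.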

1 In colloquial usage, database interoperability sometimes has a broader meaning that entails data access, of which data exchange is a subset. Database interoperability without data exchange, for example, can refer to the ability of election officials in State A to view records and perform searches in the VRD of State B. Although such a capability can be helpful in individual instances, the inability to perform data exchange prevents any large-scale operation involving either database.

2 On the other hand, a release of a given database system may have data definitions that are somewhat different from those of an earlier release. System developers know that such changes create operational chaos, and thus avoid such changes whenever possible.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.