in particular, it does not collect date-of-birth information because such information is not related to the primary business purpose of the USPS. Thus, the NCOA system cannot be queried with name and date of birth to learn an individual’s new address. Furthermore, because of privacy considerations, the USPS limits the disclosure of change-of-address information, and thus a name and old address must be presented before a new address can be provided. This limitation is significant because it means that an election official cannot simply query the NCOA database for the new address of an individual known to have moved.
A more detailed discussion of data capture and quality can be found in Appendix C.
Database interoperability arises as a requirement because election officials must perform a variety of tasks that involve other databases, ranging from other state VRDs to lists of deceased persons as described above. From a technical standpoint, database interoperability refers to the capability of two databases to exchange data (perhaps with a third-party application) and to use the exchanged data.1 Data exchange involves transmitting and receiving data between two systems, by whatever means, in a way that maintains the usability (preserves the structure and formatting) of the data. Data use depends on the corresponding data fields having the same meaning in each database.
Transmitting and receiving data involve moving the electronic bits that represent the data in question through some channel. In practice, this involves either a communications network connecting the two database systems or use of a physical medium such as a CD-ROM to carry the data. Using a direct linkage (e.g., an Internet connection) provides for real-time communications, so the data transferred to the receiving system can be kept current with changes. Use of a physical medium generally “batches” the data to be transferred, and thus changes to the sending system’s database reach the recipient with some delay, and the transferred data may not reflect the most recent changes.
Whichever approach is used, the data must be formatted so that the sending system can write them and the receiving system can read them. A common way to achieve this formatting compatibility is for the sending system to “export” its data into a known file format (e.g., a comma-delimited file), which is then transmitted or carried to the receiving system.
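The export-and-import approach can be illustrated with a minimal sketch. It assumes a hypothetical four-field record layout (`last_name`, `first_name`, `date_of_birth`, `address`) and uses an in-memory buffer in place of a real file or network transfer; none of these names come from an actual VRD system.

```python
import csv
import io

# Hypothetical field layout that both systems have agreed on in advance.
FIELDS = ["last_name", "first_name", "date_of_birth", "address"]


def export_records(records, stream):
    """Sending system: write records as comma-delimited text with a header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


def import_records(stream):
    """Receiving system: read the comma-delimited text back into records."""
    return list(csv.DictReader(stream))


# Illustrative record; the address contains a comma, so the writer must quote it.
records = [
    {
        "last_name": "O'Brien",
        "first_name": "Pat",
        "date_of_birth": "03-04-1970",
        "address": "12 Elm St, Apt 4",
    },
]

buffer = io.StringIO()  # stands in for the file that is transmitted or carried
export_records(records, buffer)
buffer.seek(0)
received = import_records(buffer)
```

In this sketch the structure of each record survives the round trip, including the embedded comma in the address. Whether the receiving system interprets `date_of_birth` correctly is a separate question of data definitions, not of file format.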
Data usability is guaranteed if all databases use the same data definitions.2 However, in the situations faced by election officials, data definitions of the comparison databases (the databases containing the data with which VRD data must be compared) may well be different. Ensuring the similarity of data definitions goes beyond classic definitions such as “integer” or “character string”—it also includes issues such as formatting and data semantics.
For example, System A may define dates in a mm-dd-yyyy format, and System B in a dd-mm-yyyy format. The semantics of the two systems may differ: System A may use standardized addresses and strip all punctuation from name fields, whereas System B may not use standardized addresses and may retain punctuation in name fields. Or, System A may include name suffixes in the last name field, and System B may provide a separate field for name suffixes.
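One way to reconcile differences of this kind is to map both systems' records into a common form before comparing them. The following is a minimal sketch under stated assumptions: the field names (`last_name`, `suffix`, `dob`), the suffix list, and the choice of ISO dates with punctuation-free uppercase names as the common form are all hypothetical and chosen only for illustration.

```python
from datetime import datetime
import string

# Hypothetical list of recognized name suffixes (after punctuation is removed).
SUFFIXES = {"JR", "SR", "II", "III", "IV"}


def normalize_date(text, source_format):
    """Parse a date in the source system's format and return it as yyyy-mm-dd."""
    return datetime.strptime(text, source_format).date().isoformat()


def strip_punctuation(name):
    """Remove punctuation, uppercase, and collapse runs of whitespace."""
    cleaned = name.translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.upper().split())


def split_suffix(last_name):
    """Separate a trailing suffix from a last-name field that may contain one."""
    tokens = strip_punctuation(last_name).split()
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        return " ".join(tokens[:-1]), tokens[-1]
    return " ".join(tokens), ""


def from_system_a(rec):
    """System A: mm-dd-yyyy dates, suffix stored inside the last-name field."""
    last, suffix = split_suffix(rec["last_name"])
    return {
        "last_name": last,
        "suffix": suffix,
        "dob": normalize_date(rec["dob"], "%m-%d-%Y"),
    }


def from_system_b(rec):
    """System B: dd-mm-yyyy dates, punctuation retained, separate suffix field."""
    return {
        "last_name": strip_punctuation(rec["last_name"]),
        "suffix": strip_punctuation(rec["suffix"]),
        "dob": normalize_date(rec["dob"], "%d-%m-%Y"),
    }


# The same (fictional) person as each system might store the record.
a = from_system_a({"last_name": "OBRIEN JR", "dob": "03-04-1970"})
b = from_system_b({"last_name": "O'Brien", "suffix": "Jr.", "dob": "04-03-1970"})
same_person = (a == b)
```

The sketch normalizes toward the less detailed representation (punctuation removed), because punctuation that System A has already discarded cannot be recovered. Without the explicit date formats, the strings "03-04-1970" and "04-03-1970" would be compared as different dates even though they denote the same day.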
Such definitional differences may increase the difficulties of comparing fields unless the definitions of these fields can be reconciled. A variety of technical approaches have been developed for dealing with differing standards or incompatible definitions; see Box 3.1. In any event, data definitions must either match or be transformed in a way that preserves the semantics of the data.