Consider, by way of example, a memo in the form of a word-processor file obtained from a disk belonging to a White House staffer; the text of the memo identifies the author, the date it was written, and the recipient. These items could be recorded as metadata so that the metadata can later be indexed and searched, for example to find all documents written by a given author. Because these items are evident from the record itself, they can still be recorded later if they are not captured as the record is ingested. By contrast, some metadata about the record are ephemeral: items such as which staffer’s PC contained the disk, the date and time the file was written onto the disk, and the version of the operating system used. These are not evident from the record itself and will be lost unless they are explicitly recorded during ingest.
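The distinction between intrinsic and ephemeral metadata can be made concrete with a small ingest-time sketch. The function `capture_ephemeral_metadata` and its field names are hypothetical, not part of any ERA design. The sketch assumes that the identity and operating system of the source machine are supplied by whoever transfers the disk, because they cannot be read from the file. What it reads directly are file-system properties (modification time, size) that a careless copy could lose, plus a SHA-256 digest tying the metadata to the exact bits received.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def capture_ephemeral_metadata(path: str, source_machine: str, source_os: str) -> dict:
    """Record context about a file that cannot be recovered from its contents later.

    Hypothetical helper: source_machine and source_os are assumed to come from
    the transfer paperwork, since the file itself does not reveal them.
    """
    p = Path(path)
    st = p.stat()
    return {
        "file_name": p.name,
        # Provenance supplied by the transferring party.
        "source_machine": source_machine,
        "source_os": source_os,
        # File-system timestamp; lost if the file is later copied without preserving it.
        "modified_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "size_bytes": st.st_size,
        # Digest binds this metadata to the exact bit stream that was ingested.
        "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),
        "ingested_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Intrinsic items such as author, date, and recipient are deliberately absent: they can be extracted from the memo's text at any later time.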
Databases offer another example. One of the challenges of preserving older databases is that schema documentation is often absent. In particular, many critical integrity assumptions are not made explicit, nor can they be deduced by inspecting the data, although in many cases it may be possible to infer them by analyzing a corpus of queries. Metadata that should be saved include formal and informal schema documentation, query libraries, and so forth.
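As a rough illustration of mining a query corpus for unstated integrity assumptions, the sketch below counts pairs of columns that queries repeatedly equate. A pair that recurs across many queries is a candidate undocumented foreign-key relationship worth recording as metadata. `candidate_joins` is a hypothetical helper, and its regular expressions are a crude heuristic that assumes simple `FROM`/`JOIN ... ON` syntax; a real analysis would use a proper SQL parser, and its output would still need human review.

```python
import re
from collections import Counter
from typing import Iterable

# "FROM orders o", "JOIN customers AS c": capture the table and an optional alias,
# refusing to treat a following SQL keyword as an alias.
_TABLE = re.compile(
    r"\b(?:FROM|JOIN)\s+(\w+)"
    r"(?:\s+(?:AS\s+)?(?!(?:AS|ON|WHERE|JOIN|INNER|LEFT|RIGHT|FULL|CROSS|GROUP|ORDER|LIMIT)\b)(\w+))?",
    re.IGNORECASE,
)
# Equality between two qualified columns, e.g. "o.cust_id = c.id".
_EQUI_JOIN = re.compile(r"\b(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)\b")


def candidate_joins(queries: Iterable[str]) -> Counter:
    """Count column pairs that queries equate; frequent pairs hint at undeclared foreign keys."""
    counts: Counter = Counter()
    for q in queries:
        alias_to_table = {}
        for table, alias in _TABLE.findall(q):
            if alias:
                alias_to_table[alias] = table
        for a1, c1, a2, c2 in _EQUI_JOIN.findall(q):
            # Resolve aliases back to table names so different queries agree.
            left = f"{alias_to_table.get(a1, a1)}.{c1}"
            right = f"{alias_to_table.get(a2, a2)}.{c2}"
            counts[tuple(sorted((left, right)))] += 1  # order-independent key
    return counts
```

For example, a query library in which `orders.cust_id` is joined to `customers.id` hundreds of times suggests an integrity rule that the schema never declared.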
6. Save essential external references that are implicit or explicit in the record. As archivists well know, a digital resource will often refer to other resources. In some cases, this is because those resources are other components of the same “compound document”; for example, the image components of some document types are stored as physically separate resources. In general, digital records may comprise multiple explicit components, so preserving such a record requires vigilance in archiving all of its components. Implicit references to resources such as default style sheets or fonts must also be considered. It may be valuable to have a tool or process that verifies that records are in fact saved in their entirety.12
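A completeness check of the kind suggested above might, for one simple compound-document format, look like the following sketch. It assumes the record is an HTML file stored in a package directory alongside its components (a hypothetical layout), collects the explicit `src` and `href` references, and reports relative references whose targets are absent from the package. It deliberately does not detect implicit references such as default style sheets or fonts, and it skips absolute URLs and fragment-only links.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import List
from urllib.parse import unquote, urlparse


class _RefCollector(HTMLParser):
    """Collect the values of src and href attributes from every start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.refs: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.refs.append(value)


def missing_components(package_dir, entry_file: str) -> List[str]:
    """Return explicit relative references whose targets are not in the package."""
    root = Path(package_dir)
    entry = root / entry_file
    parser = _RefCollector()
    parser.feed(entry.read_text(encoding="utf-8"))
    missing = []
    for ref in parser.refs:
        parsed = urlparse(ref)
        # Absolute URLs (any scheme or host) and "#fragment" links are out of scope here.
        if parsed.scheme or parsed.netloc or not parsed.path:
            continue
        target = entry.parent / unquote(parsed.path)
        if not target.is_file():
            missing.append(ref)
    return missing
```

Whether a hyperlinked document counts as a component of the record or as a separate record is an archival judgment the tool cannot make; it can only report what is absent.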
In principle this is a straightforward goal, but no clear solutions exist for managing external references; it is an active area of research in the digital library community. Digital records present other challenging problems as well, including these: (1) cross-references are buried inside the representation rather than being explicitly visible, as citations in a paper report are; and (2) digital cross-references often use naming schemes (for example, file numbers, local file names, or URLs) that are unstable, are not standardized, and may not survive very long. Indeed, they may already have stopped working by the time the document is ingested.
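Because unstable naming schemes are the crux of the second problem, an ingest tool might at least sort references by scheme so that archivists can try to resolve the fragile ones while they may still work. The sketch below is one hypothetical way to do this; the category names, and the rule that a purely numeric reference is a file number, are assumptions for illustration only.

```python
import re
from typing import List
from urllib.parse import urlparse

# A Windows drive letter such as "C:\" or "D:/".
_DRIVE = re.compile(r"^[A-Za-z]:[\\/]")


def classify_reference(ref: str) -> str:
    """Assign a reference to a coarse naming-scheme category (hypothetical taxonomy)."""
    if ref.isdigit():
        return "file-number"   # assumed: bare numeric identifiers from a records system
    if _DRIVE.match(ref) or ref.startswith("\\\\") or ref.startswith("/"):
        return "local-path"    # meaningful only on the originating machine or network
    scheme = urlparse(ref).scheme.lower()
    if scheme == "file":
        return "local-path"
    if scheme in ("http", "https", "ftp"):
        return "url"           # may stop resolving at any time
    if scheme:
        return "other-scheme"
    return "relative"          # resolvable within the ingested package, if captured


def references_needing_resolution(refs: List[str]) -> List[str]:
    """Return references that cannot be resolved from the ingested package alone."""
    return [r for r in refs if classify_reference(r) != "relative"]
```

The drive-letter test runs before `urlparse` because `urlparse` would otherwise read `C:` as a URL scheme.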
One way to ensure that a record is saved in its entirety is to simulate, at ingest time, access to and presentation of the record, making sure that every resource used by the access and presentation processes is available in the archive. Under the emulation approach, the full set of external references also includes the files required to install the application, operating system, and other supporting facilities on bare hardware. These are all implicit external references that would need to be preserved in a software repository, perhaps as part of the ERA or perhaps shared with other digital archives.
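Simulated access can be sketched using CPython's audit hooks (`sys.addaudithook`, Python 3.8+; `Path.is_relative_to` needs 3.9+), which raise an `open` event whenever `open()` or `os.open()` is called. The hypothetical `resources_outside_archive` runs a caller-supplied rendering function, standing in for the real access and presentation software, and lists every file it opened that lies outside the archive directory. This observes only Python-level file opens in the same process; a real viewer running as a separate program would instead need operating-system tracing or an instrumented emulator.

```python
import os
import sys
from pathlib import Path
from typing import Callable, List, Optional

# Paths opened while a simulated access is in progress; None when idle.
_opened: Optional[List[str]] = None


def _record_open(event: str, args: tuple) -> None:
    # The "open" audit event's first argument is the path as passed by the caller
    # (or an integer file descriptor, which is ignored here).
    if event == "open" and _opened is not None and isinstance(args[0], (str, bytes, os.PathLike)):
        _opened.append(os.fsdecode(args[0]))


# Audit hooks cannot be removed once added, so the hook stays idle unless _opened is set.
sys.addaudithook(_record_open)


def resources_outside_archive(render: Callable[[], None], archive_root) -> List[str]:
    """Run render() and return every file it opened that lies outside archive_root."""
    global _opened
    _opened = []
    try:
        render()
    finally:
        used, _opened = _opened, None
    root = Path(archive_root).resolve()
    outside = [p for p in used if not Path(p).resolve().is_relative_to(root)]
    return list(dict.fromkeys(outside))  # drop duplicates, keep first-seen order
```

Any resource the check reports would need to be ingested with the record, or pinned in the shared software repository, before the record could be considered complete.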
The Vesta research project (Allan Heydon et al., 2002, The Vesta Software Configuration Management System, SRC Research Report 177, Compaq’s Systems Research Center, Palo Alto, Calif. Available online at <http://gatekeeper.dec.com/pub/DEC/SRC/research-reports/abstracts/src-rr-177.html>) developed this sort of capability for all of the code and other resources required to build a large software system.
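Vesta's actual design is described in the report cited above and is not reproduced here. As a much simpler, related idea, the sketch below records a content-addressed manifest of a software-repository directory (for example, installation files for an application and operating system) and later verifies that every listed file is still present and unaltered. `build_manifest` and `verify_manifest` are hypothetical names, and SHA-256 is an assumed choice of digest.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def build_manifest(root) -> Dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            manifest[p.relative_to(root).as_posix()] = hashlib.sha256(p.read_bytes()).hexdigest()
    return manifest


def verify_manifest(root, manifest: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (missing, altered): listed files that are gone or whose bytes changed."""
    root = Path(root)
    missing, altered = [], []
    for rel, digest in manifest.items():
        p = root / rel
        if not p.is_file():
            missing.append(rel)
        elif hashlib.sha256(p.read_bytes()).hexdigest() != digest:
            altered.append(rel)
    return missing, altered
```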