demonstrations address the problems of work flow and scale associated with ingest—for example, How will NARA be able to ingest electronic records fast enough to meet its needs?
The SDSC demonstrations also shed little light on the operational capabilities NARA will need to run ERA systems. Each demonstration was carried out by highly skilled programmers capable of diagnosing problems in the input documents, the ingest processing, and the data finally added to the archive. There was no attempt to build a system for ingest that could be operated on a routine basis by less skilled people. (Users of SDSC’s scientific data management systems are generally highly technically savvy, and they can turn to a highly proficient support staff. ERA users are less likely to have such skills or recourse to the same sort of IT support.) More generally, such exploratory work is no substitute for having NARA staff work with a system in a production environment.
The archival file system used, HPSS, should not be used as a model for NARA because it lacks important properties. HPSS is designed for manipulating large data sets on large computers (i.e., scientific data on supercomputers), which have a set of requirements different from those of an electronic archive. For example, HPSS does not provide facilities for replication (redundant copies must be created explicitly), for automatically refreshing the storage media (media refresh has been done under HPSS, but it requires explicit management by staff), or for geographic redundancy.
The SDSC project covered a very small number of file formats and cannot therefore serve as a model for preserving records across the federal government, where a large number of formats will be encountered. The project, for example, did not consider the following:
How to prioritize how much support to provide to which formats (quality of service),
How to determine what subset of formats may cover many of the commonly found record types, and
How to deal with formats for which there are no existing tools to extract the information from which an XML structure can be built.
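The second question above, which subset of formats covers most records, can be made concrete with a simple cumulative-coverage calculation. The following is only a minimal sketch under stated assumptions: the function name, the format labels, and the record counts are all hypothetical illustrations, not figures from the SDSC project or from NARA. It assumes that a per-format record count is available from some survey of incoming records, and it picks the most common formats until a chosen share of all records is covered.

```python
def smallest_covering_subset(format_counts, target_share):
    """Pick the most common formats until their combined share of
    all records reaches target_share (a fraction between 0 and 1).

    format_counts maps a format label to a record count. Both the
    labels and the counts are hypothetical inputs for illustration.
    Returns (chosen_formats, achieved_share).
    """
    total = sum(format_counts.values())
    if total == 0:
        return [], 0.0

    # Most common formats first; ties are broken alphabetically so
    # the result is deterministic.
    ranked = sorted(format_counts.items(), key=lambda kv: (-kv[1], kv[0]))

    chosen = []
    covered = 0
    for fmt, count in ranked:
        if covered / total >= target_share:
            break
        chosen.append(fmt)
        covered += count
    return chosen, covered / total


# Hypothetical survey counts, invented purely for illustration.
example_counts = {"pdf": 50, "doc": 30, "tiff": 15, "wpd": 5}
chosen, share = smallest_covering_subset(example_counts, 0.8)
print(chosen, share)
```

A calculation of this kind addresses only the coverage question. It does not say how much support each chosen format should receive, nor what to do with formats for which no extraction tools exist.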
The efforts to build a knowledge layer are not ready for deployment. Trying to express semantic constraints within records is a worthwhile long-term goal, but the demonstrations of how to “lift knowledge” from a document (such as the Senate legislative activity example discussed above) are not persuasive. These techniques are insufficiently developed to be planned for the NARA system.