5
Record Integrity and Authenticity

An archive is vitally concerned with the authenticity and integrity of its holdings. An “authentic” record is, loosely, one that is what it purports to be: it was duly issued by an authorized person or agency. A record has “integrity” if it is preserved without any alteration that would impair its use as an authentic record.

The issue of digital record assurance, the topic of this chapter, is discussed in greater detail than are other technical issues in this report for two reasons. First, the committee did not address this topic in depth in its first report, which emphasized other system design and related acquisition issues. Second, digital assurance is an area in which correct implementation requires great care.

To ensure the authenticity and integrity of paper records, archivists have worked out techniques such as establishing a chain of custody from a record’s issuer to a record’s user. Although the same techniques can be applied to digital records, the properties of digital records compel the use of additional techniques for ensuring authenticity and integrity. Digital techniques are available that can provide much stronger assurances than can existing techniques for paper records. Also, digital records are potentially more vulnerable to forgery and tampering—an attacker with access to the archive’s computer could add, delete, or alter records in a wholesale fashion or make subtle alterations that would be difficult to detect by inspection. Increased experience with personal computers has made the public aware of how easily digital documents can be altered undetectably. The committee firmly believes that within the decade, both the public and the courts will have little confidence in digital records that lack the best assurances that technology can provide.

Digital checks have another huge advantage: an archive can verify the integrity of all records periodically. If errors are found—whether due to hardware or media failures, operator errors, or unauthorized changes—a redundant, undamaged copy of the record can be retrieved. When this form of auditing is performed often enough, errors in the archive can be fixed before they spread (e.g., before a redundant copy is created from an erroneous record).
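The periodic audit described above can be sketched in a few lines of Python. This is a minimal illustration rather than a description of any NARA system: the in-memory replicas (dictionaries keyed by record identifier), the function names, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hash digest of a record's bits (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: list[dict[str, bytes]],
                     witnesses: dict[str, str]) -> list[tuple[int, str]]:
    """Check every record in every replica against its separately stored
    witness and overwrite damaged or missing copies with an intact one.

    Returns the (replica_index, record_id) pairs that were repaired.
    Raises RuntimeError if no replica holds an intact copy of a record.
    """
    repaired = []
    for record_id, witness in witnesses.items():
        # Collect the copies whose freshly computed digest matches the witness.
        intact = [replica[record_id] for replica in replicas
                  if record_id in replica and digest(replica[record_id]) == witness]
        if not intact:
            raise RuntimeError(f"no intact copy of record {record_id!r}")
        for index, replica in enumerate(replicas):
            if record_id not in replica or digest(replica[record_id]) != witness:
                replica[record_id] = intact[0]
                repaired.append((index, record_id))
    return repaired
```

Running such a scan on a schedule is what keeps a damaged copy from being propagated; the returned list also gives operators a starting point for investigating why each copy went bad.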



Building an Electronic Records Archive at the National Archives and Records Administration: Recommendations for a Long-Term Strategy

A comprehensive system design that addresses authenticity and integrity has many pieces and a great many design and operational details. This chapter first describes basic tools and principles for digital record assurance and then offers some detailed approaches. Those approaches are presented as examples of the level of care that must be applied; it is not claimed that they are the only possible approaches.

DIGITAL ASSURANCE TOOLS AND PRINCIPLES

Digital assurances for records are based fundamentally on maintaining multiple, geographically and administratively separated copies and on using cryptographic techniques to provide integrity checking and secure transmission of records to and from the archive. These techniques and their appropriate application to a long-term archive are discussed below.

Geographical Replication

Maintaining multiple, geographically and administratively separated replicas is an essential technique for protecting integrity. There are various ways to meet this requirement, involving complete replicas or multiple partial replicas. A detailed design starts with reasonable goals for the acceptable bit loss rates and desired availability for each local site. Given these reliability metrics for the local sites, one can compute the reliability of an N-way replicated archive, and the total archive can be designed to achieve a specified level of reliability. At any given point in time, the engineering and design question will be how to achieve these goals in a cost-effective manner with currently available storage technologies. Storage technologies and their relative prices are evolving rapidly, so the National Archives and Records Administration (NARA) needs a flexible design that can evolve along with them.
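As a rough illustration of computing the reliability of an N-way replicated archive, the sketch below assumes that each site loses a given record independently with the same probability over some period, and that no repair takes place during that period. Under those simplifying assumptions (which are the example's, not the report's) the record is lost only if every site loses it, so the loss probability is the per-site probability raised to the power N. The function names and the numbers in the usage note are hypothetical.

```python
def loss_probability(p_site: float, n_replicas: int) -> float:
    """Probability that all n replicas lose the same record,
    assuming sites fail independently with probability p_site."""
    if not 0.0 <= p_site <= 1.0 or n_replicas < 1:
        raise ValueError("need 0 <= p_site <= 1 and n_replicas >= 1")
    return p_site ** n_replicas


def replicas_needed(p_site: float, target: float) -> int:
    """Smallest number of independent replicas whose combined loss
    probability does not exceed the target."""
    if not (0.0 < p_site < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p_site and target must lie strictly between 0 and 1")
    n = 1
    while loss_probability(p_site, n) > target:
        n += 1
    return n
```

For instance, with a hypothetical per-site loss probability of 0.01 and a target of 0.00001, `replicas_needed(0.01, 1e-5)` asks for three replicas. Real sites are never perfectly independent, which is one reason the text stresses administrative as well as geographic separation; correlated failures make the true figure worse than this estimate.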
Cryptographic Techniques

As described below, cryptographic techniques provide basic tools for ensuring authenticity and integrity. [1] Both qualities depend on cryptographic algorithms for which forgery is computationally infeasible: that is, defeating the system (e.g., by altering the original record without altering the digest or its signature) would require so many samples of records or so much computing power that no attacker has the resources to succeed.

[1] More details on these and other assurance techniques can be found in the following reports from the National Research Council’s Computer Science and Telecommunications Board: Cryptography’s Role in Securing the Information Society (1996), Computers at Risk: Safe Computing in the Information Age (1991), and Trust in Cyberspace (1999), all published by National Academy Press, Washington, D.C.

Integrity Check Using Hash Digests

The technique of computing a hash digest or checksum of a record is used to check its integrity. A secure hash algorithm computes a compact hash digest from the digital bits that make up a record. There are several algorithms in common use; Federal Information Processing Standards (FIPS) Publication 180-2 [2] specifies four standard algorithms. The standard explains that a secure hash algorithm is one for which “it is computationally infeasible (1) to find a message that corresponds to a given message digest, or (2) to find two different messages that produce the same message digest. Any change to a message will, with a very high probability, result in a different message digest.” [3]

In the field of databases, “witness” is the technical term for a hash digest that is computed when a record first enters an integrity-management system. One later verifies the integrity of an offered record by computing a new hash digest from the offered record and comparing that new value with the witness.

Secure Transmission

Various cryptographic techniques can be used to ensure that what is received is what was sent. A digital record can be authenticated by creating a digital signature that can be verified by anyone using the record. If the record is altered in any way, the signature check will fail.

To sign a digital object, one performs a key-driven cryptographic transformation, typically on a witness of the digital object (such as a hash digest) rather than on the object itself, to create what is known as an authentication tag or signature for the given digital object. To verify the origin of a digital object, one runs an algorithm that involves a second key-driven cryptographic transformation, taking as input the authentication tag and a public key of verified authenticity; the output is a binary value, “origin verified” or “origin not verified.” The meaning of “origin verified” is that the object originated with someone who possessed the private key. One must trust that the putative possessor of the private key has properly protected it, so that he or she is the only one who could have created the authentication tag.

It is important to recognize that digital signatures have limited value for the long-term preservation of a chain of custody or for data integrity.
Their value is limited by their validity window: the time-to-compromise of the secret signing key, the time-to-compromise of the signature algorithm, or the time-to-obsolescence of the public key infrastructure, whichever is shortest. For example, if a private key used to form digital signatures for records becomes compromised as of a certain date, any records verified with the public key corresponding to that private key after that date are suspect. The case of a private key becoming compromised after it is no longer in use is more subtle: while the compromise does not endanger records already verified, it allows the attacker to forge a document and to record its creation date as one during which the key was still valid. Discovery of the compromise of a key may come a long time after it actually occurred. Compromise could also occur as a result of a cryptanalytic attack. [4] Digital signatures are, therefore, an excellent means of verifying that recently transmitted data (such as a set of records being transferred from an agency to NARA) actually came from the claimed source, and a poor means of authenticating the far-in-the-past origin of stored data.

[2] National Institute of Standards and Technology (NIST). 2002. Federal Information Processing Standards [FIPS] Publication 180-2. Information Technology Laboratory, NIST, U.S. Department of Commerce, Washington, D.C., hereafter referred to as FIPS Publication 180-2. Available online at <http://csrc.nist.gov/publications/>. Accessed May 23, 2005.

[3] NIST. 2002. FIPS Publication 180-2.

[4] The first widely distributed description of the RSA algorithm for public-key encryption was in Martin Gardner, 1977, “Mathematical Games: A New Kind of Cipher That Would Take Millions of Years to Break,” Scientific American 237(2):120-124. The article included a challenge ciphertext, which was decoded in 1994, as reported in Derek Atkins, Michael Graff, Arjen K. Lenstra, and Paul C. Leyland, 1995, “The Magic Words Are Squeamish Ossifrage,” Advances in Cryptology (ASIACRYPT’94), Lecture Notes in Computer Science 917:265-277.

Digital signatures can be used together with several other cryptographic tools to establish a secure communications channel. The Secure Sockets Layer (SSL) protocol, first implemented in Netscape browsers, later standardized by the Internet Engineering Task Force, and implemented today in all widely used Web browsers, is an example of a protocol that provides a secure channel. It establishes a secure channel on top of the insecure channel provided by the Internet’s Transmission Control Protocol (TCP) by an exchange of credentials (using digital signatures), negotiation of shared secret keys, and then exchanges of message authentication codes (MACs). The MACs are essentially a shared-secret-key signature mechanism.

One important step that is sometimes overlooked in the application of SSL is that the protocol reports who is at the other end of the secure connection. For authentication to be complete, the recipient must check that report to see that the correspondent that it identifies is the expected one. Many current Web browsers that use SSL either omit or deemphasize this step, with the consequence that their supposedly secure channel may not be secure at all.

Long-term record assurance depends on measures other than digital signatures or secure channels. It requires (1) at record ingest, securely transmitting the data from its origin to NARA and recording metadata about that assurance; and (2) from then on, maintaining the integrity of the stored data and associated metadata while the record resides in the archive. These two steps, ingest and retention, are examined in the following sections.
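The point made above about checking who is at the other end of an SSL connection can be illustrated with Python's standard `ssl` module, which implements TLS, the successor to SSL. The sketch below is an assumption-laden example, not a prescribed configuration: the function names are invented for illustration, and the minimum protocol version is an arbitrary choice. What it shows is that the client must both validate the peer's certificate and confirm that the certificate names the expected host.

```python
import socket
import ssl


def make_verifying_context() -> ssl.SSLContext:
    """Client-side context that validates the peer's certificate chain
    and checks that the certificate matches the expected host name."""
    context = ssl.create_default_context()
    # create_default_context() already enables both checks; they are set
    # explicitly here so that the intent is visible and cannot drift.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # illustrative floor
    return context


def open_verified_channel(host: str, port: int = 443) -> ssl.SSLSocket:
    """Connect to host and complete a handshake that fails unless the
    peer proves it is the host the caller expected."""
    raw = socket.create_connection((host, port), timeout=10)
    try:
        # server_hostname is the identity the certificate is checked against.
        return make_verifying_context().wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

A caller would pass the name of the system it intends to reach, for example a hypothetical `ingest.example.gov`; a certificate issued for any other name causes the handshake to raise `ssl.SSLCertVerificationError` instead of silently yielding a channel to the wrong party.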
ASSURANCE AT RECORD INGEST

Verification of Validity

Agencies that create and retain records must use appropriate techniques for authenticating the origin of records and for maintaining their integrity while they remain in the custody of the agency. When a record is transferred to NARA, the agency should transmit, along with the record, metadata that exhibit an audit of the record’s assurances using the agency’s internal techniques. If the agency maintained integrity checks, these should be included as part of the metadata.

Digital signatures may be helpful in this verification, depending on the particular circumstances. If a set of records was digitally signed, the digital signature system is still operational, and the validity window (see above) is believed not to have expired, agencies can provide additional information about record validity by checking the signatures and recording the result in metadata. On the other hand, it would be problematic for NARA itself to perform this verification, because the verification depends on the operation of a system outside NARA’s control. Moreover, if an agency retains custody of its records for a lengthy period, it must face the prospect that the validity window of its digital assurances expires before the records are transferred to NARA.

Once a record’s authenticity has been verified and the record ingested, the primary requirement from then on is to record the verification method and outcome in the metadata and to use integrity assurances to protect the integrity of both the record and the metadata while they are retained in the archive.
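One way to protect a record and its ingest metadata together is to compute a single witness over both. The following sketch is only an illustration under stated assumptions: the metadata field names, the canonical JSON encoding, the length-prefixed framing, and the use of SHA-256 are all choices made for the example and are not taken from the report.

```python
import hashlib
import json


def witness_for(record: bytes, metadata: dict) -> str:
    """Hash digest covering both the record's bits and its ingest metadata.

    The metadata is serialized in a canonical form (sorted keys, fixed
    separators) so that the same content always yields the same bytes.
    Each part is prefixed with its length so that bytes cannot be shifted
    between the record and the metadata without changing the digest.
    """
    meta_bytes = json.dumps(metadata, sort_keys=True,
                            separators=(",", ":")).encode("utf-8")
    hasher = hashlib.sha256()
    for part in (record, meta_bytes):
        hasher.update(len(part).to_bytes(8, "big"))
        hasher.update(part)
    return hasher.hexdigest()


# Hypothetical ingest metadata recording how authenticity was verified.
example_metadata = {
    "source_agency": "Example Agency",
    "verification_method": "agency digital signature check",
    "verification_outcome": "verified",
}
example_witness = witness_for(b"contents of the record", example_metadata)
```

Because the digest depends on both inputs, a later change to the recorded verification outcome is detected in the same way as a change to the record itself.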

Assurance in Transmission

As NARA evolves away from transfer and storage based on the delivery of physical media toward network-based transfers and online storage, the use of more robust safeguards in transferring records from agencies to NARA becomes even more important. For example, verifying that an electronic file transfer via File Transfer Protocol (FTP) yields a file with the right number of bytes provides little assurance against either unintentional corruption or deliberate tampering. Assurances for the transmission between the ingest component of the Electronic Records Archives (ERA) and the originating agency’s system are needed to establish that the transmitted data come from the claimed source (so that an unauthorized person cannot submit false records) and that the records have not been altered in transmission.

The best option is for NARA to use a standard and well-vetted scheme rather than a custom scheme (which would necessarily have been subject to less scrutiny). A good example of such a standard, widely used scheme today is a properly authenticated SSL connection. Records themselves can be sent securely via such a channel. Alternatively, a secure channel can be used to send individual hash values for the records, and the records can be sent by any convenient means as long as they are checked against the hash values at NARA’s end. If Standard Form 258 (“Agreement to Transfer Records to the National Archives of the United States”) becomes electronic in the future, it would be appropriate to authenticate and securely transmit these forms as well.

Guarding Against Human Error

Multiple copies and integrity checks can protect against mechanical threats to integrity, but they do not address the biggest remaining threat to integrity: human error.
To illustrate: a careful look at “agency report 493” reveals it to be another copy of “agency report 492,” accidentally created when someone clicked on “Revert” just before clicking on “Save as.” Such errors can occur within an agency before the records reach NARA or during the ingestion process. NARA ingest processes should be designed to minimize the opportunity for human error of this sort. In addition, NARA may be able to afford to have humans occasionally inspect the incoming records and query originating agencies to resolve problems. The results of such interactions with agencies would constitute additional metadata (e.g., “at ingest, so-and-so received assurances from so-and-so that this record, and not record xxyyxx999-04444, is the authentic representation of the document with the given title”). In other words, ingest negotiations produce additional metadata. This observation is not intended to suggest that NARA undertake comprehensive examination and repair of records at ingest.

ASSURANCE DURING RETENTION

The starting point of a full design for the assurance of archived records is to carry out a careful, comprehensive threat assessment (see the discussion below). This section considers several of the obvious threats and techniques available to address them.

Designing digital assurances into an electronic records archive is similar to designing security measures. First, the cryptographic techniques must be chosen carefully, and the archive must be ready to change the cryptographic algorithms that it uses. For example, if someone were to discover a way to create two different documents that generate the same hash digest with the current hash algorithm, the algorithm would need to be replaced. Second, the overall system design, not just the cryptographic mechanisms, must not allow openings for attackers. For example, if an attacker can surreptitiously tinker with the code that NARA runs to perform signature or hash code verification (to make it say “yes” when it should have said “no”), the assurances are worthless. If such vulnerabilities are discovered, the system will require modification.

Cryptographic Integrity Assurance

Multiple, geographically separated copies provide the fundamental basis for establishing integrity. Hash algorithms provide an efficient means of checking that records have not been corrupted, whether by hardware failure, system errors, or deliberate tampering. The integrity of a record can be verified by computing the hash digest for the bits in the record and comparing this digest to a previously calculated and separately protected witness for the record. If the two digests match, the integrity of the record is established to a very high degree of probability. A witness computed for a bit stream that includes the record plus metadata representing the time at which the hash was computed provides a digital time stamp that allows one to verify the contents of an object at the time the witness was calculated and published.

Although the description of integrity checking presented here has assumed that each record carries a separate hash digest, this need not be the case. After all, integrity considerations apply not only to individual records but also to collections, or series, of records.
A town clerk who issues 10 marriage licenses on July 12 passes 10 records to the archive, but also attests that these records constitute the entire collection of licenses issued on that date; that is, that these and only these licenses were issued on July 12. A hash can be computed for a set of records in a given order. Checking such a hash may require retrieving all of the records on which it is based and recomputing the hash, which might be time-consuming. In exchange, however, fewer known good hashes have to be published.

To protect against malevolent change by an attacker, a record’s witness must be separately protected so that an attacker who manages to gain access in order to change the record cannot also alter the witness. Since a hash digest can be written in a relatively small number of digits, one way to protect it from change is to publish it in a very public place, such as a classified advertisement in a major newspaper (which will, a short time after publication, be distributed on microfilm to many hundreds of libraries), or by otherwise depositing the hash value in hundreds of libraries. (Such a hash publication service could be provided by the Government Printing Office, which currently distributes government documents to Federal Depository Libraries.)

Because integrity checking requires being able to establish a relationship between a record and its published hash digest, a unique record identifier associated with each record is very useful. The published list is then simply a correspondence between record identifiers and hash digests.

Techniques for combining hash values can be used to reflect the integrity of a huge archive using a single hash digest, which is more practical to publish than a very large number of individual hashes. The most straightforward approach would be to compute the single hash digest formed by hashing the hash digests of every record in the archive, but such a sequential computation would be prohibitively time-consuming.
Also, a too-simple combination such as concatenating all hash values may lower the precision of the integrity check. Techniques have been devised that require far less computation to update a single hash value when the archive is changed and do not lower the precision of the check. [5] The important property of all techniques that compute a single hash value is that the value irrefutably depends on the precise bit-string contents of every record in the archive. [6]

Other Assurance Measures

Cryptographic protections are only a part of the solution, however. They detect damage or attack only after the fact. Prevention and repair are even more crucial, and they depend on careful procedures and system designs for handling digital records. They include the maintenance of multiple, geographically and administratively separated copies; access controls; audit logs; retention of old or deleted files to recover from unauthorized or faulty operations; and procedural safeguards for changing the archive contents to reduce human error (e.g., requiring multiple people to authorize a change). These protection measures are important in order to avoid or correct tampering with the archive. Although cryptographic techniques can detect that tampering has occurred, they are not able to deter or repair tampering.

Contingencies

What if stored records are found to contain errors? What if one or more of the cryptographic algorithms on which digital assurances are based are shown to be easy to subvert? Over the life of records in the archive, some of these events are sure to occur. Thus, the archive must be designed to anticipate corrective responses.

Correcting Post-Ingestion Errors

Errors may creep into the archive after records have been verified and ingested. This could happen as a result of hardware failure, operational error, a software bug, malicious attack, or something else.
Errors are detected by comparing each file in the archive with another copy or by reading the file, calculating its current hash, and comparing that hash with the separately protected witness for that file. There must be an ongoing process that performs such comparisons so that errors do not accumulate unnoticed.

[5] One such approach, first suggested by Ralph Merkle (1980, “Protocols for Public Key Cryptosystems,” pp. 122-133 in Proceedings of the 1980 Symposium on Security and Privacy, IEEE Computer Society Press, Los Alamitos, Calif.), is to build a binary tree out of the witness values (say, a million of them), as follows. The leaves are these values, and each internal node is the hash of the concatenation of its two children. The root of this tree is published so that it is widely available and hard to tamper with, such as in a major newspaper that is stored on microfilm in libraries across the nation. Stored with each of the million witness values is the following linking information: the list of 20 sibling hash values (each one accompanied by a bit indicating whether it is the right or the left sibling) along the path from the leaf up to the published hash value.

[6] The Merkle-tree technique was adapted for the purpose of digital time-stamping by D. Bayer, S. Haber, and W.S. Stornetta, 1993, “Improving the Efficiency and Reliability of Digital Time-Stamping,” pp. 329-334 in Sequences II: Methods in Communication, Security, and Computer Science, R.M. Capocelli, A. De Santis, and U. Vaccaro (eds.), Springer-Verlag, New York, N.Y., and by Josh Benaloh and Michael de Mare, 1991, Efficient Broadcast Time-Stamping, Technical Report No. TR-MCS-91-1, Clarkson University Department of Mathematics and Computer Science, Potsdam, N.Y. The technology is offered commercially by Surety, Inc.
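The tree construction outlined in note 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited authors' implementation: SHA-256 stands in for whatever hash function an archive would choose, the function names are invented, and a level with an odd number of nodes is padded by repeating its last node, which is one common convention and is not specified in the text.

```python
import hashlib


def node_hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Build the tree bottom-up; levels[0] holds the witness values
    and levels[-1] holds the single root that would be published."""
    if not leaves:
        raise ValueError("at least one witness value is required")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        if len(current) % 2:                  # assumption: pad odd levels
            current = current + [current[-1]]
        levels.append([node_hash(current[i] + current[i + 1])
                       for i in range(0, len(current), 2)])
    return levels


def merkle_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Linking information for one leaf: each sibling hash on the path to
    the root, with a flag that is True when the sibling is the left child."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1                   # the other child of the same parent
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def merkle_verify(leaf: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one witness value and its linking information."""
    node = leaf
    for sibling, sibling_is_left in path:
        node = node_hash(sibling + node) if sibling_is_left else node_hash(node + sibling)
    return node == root
```

Checking one record then requires only its own witness, the short sibling list stored with it, and the published root; with about a million leaves the list has 20 entries, matching the figure in the note.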

When integrity checks fail, both repair and investigation are required. First, invalid data must be restored from mirrors or backups provided by redundancy mechanisms designed into the archive. Second, the cause of the error must be investigated. Failure to rigorously investigate cases of failed integrity will eventually lead to failure to protect the archive; modifying the system design in response is an essential part of iterative design.

If it turns out that files have been improperly altered by the ERA software or its operators, it will be necessary to “undo” certain changes to the archive. Repair can be facilitated by designing the file system to allow operations that modify files to be “undone,” at least to some level; techniques similar to those used to implement “undo” in desktop word-processing software can be applied to a file system as well. The most effective technique will be to limit the amount of the archive that can be modified at all: ideally, most storage would be treated as “read only” by the software; only portions of the archive in which collections of records were being assembled during the ingest process would be modifiable. Read-only records may, of course, be subject to data-type transformations so as to combat obsolescence, but these processes should only add new forms, not delete the old forms.

Compromise of Cryptographic Algorithms over Time

Cryptographic algorithm compromise is a distinct possibility that must be assumed to happen at some point during the lifetime of the ERA. Breakthroughs in mathematics, cryptanalysis, or computing power over the lifetime of the ERA may result in successful attacks on the one-way functions that form the basis of digital assurances used in the archive.
For example, results reported at the 24th Annual International Cryptology Conference (held in Santa Barbara, Calif., in 2004) indicated some weaknesses in hash functions commonly used today. After the conference, NIST announced that it plans to phase out the use of Secure Hash Algorithm (SHA)-1, the recommended federal standard cryptographic hash function, by 2010. In mid-February 2005, it was announced that a team of cryptanalysts at universities in China had invented a new hash-collision algorithm attacking SHA-1 that is substantially faster than all previously known attacks. While these results are far short of a devastating compromise of existing standard algorithms, they remind us that the algorithms may not be impregnable.

If a cryptographic algorithm were to be compromised, hash values computed using that algorithm would not provide assurances of integrity. The following provisions are, therefore, essential for long-term archival preservation:

Archive systems should be designed to accommodate replacement of cryptographic functions and to allow records to be reprocessed to attach revised digital assurances. Well-designed security and signature systems accommodate more than one cryptographic algorithm for just this reason; such provisions are especially important owing to the very long lifetime of an archive system.

Replacement of the algorithm and recalculation of hash values should be performed both when compromise is threatened and when more-robust hash algorithms are developed and accepted by the cryptographic community. Better algorithms should be adopted early rather than waiting until the older algorithms are known to have been compromised. This action would guard against the old integrity schemes being compromised, unbeknownst to NARA, before the new one was applied.

BOX 5.1
How to Introduce a New Integrity Scheme into an Archive

One way of introducing a new integrity scheme would require that, in addition to the hashing system for integrity, there would also be a secure time-stamping system. Suppose that an implementation of a particular time-stamping system is in place, and consider the pair (r, c1), where “c1” is a valid time-stamp certificate [Box note 1] (in this implementation) for the digital record “r.” Now suppose that some time later an improved time-stamping system is implemented and deployed, either by replacing the hash function used in the original system with a new hash function or perhaps after the invention of a completely new algorithm. Is there any way to use the new time-stamping system to buttress the guarantee of integrity supplied by the certificate c1, in the face of potential later attacks on the old system?

One could simply submit r as a request to the new time-stamping system, but this would lose the connection to the original time of certification. Another possibility is to submit c1 as a request to the new time-stamping system, but that would be vulnerable to the later occurrence of a devastating attack on the hash function used in the computation of c1, as follows: if an adversary could find another record r′ with the same hash value as r (a hash collision), then the renewal system could be used to backdate r′ to the original time.

Suppose instead that the pair (r, c1) is time-stamped by the new system, resulting in a new certificate “c2,” and that some time after this is done (i.e., at a definite later date), the original method is compromised.
The certificate c2 provides evidence not only that the record contents r existed prior to the time of the new time-stamp, but also that it existed at the time stated in the original certificate, c1; prior to the compromise of the old implementation, the only way to create a valid time-stamp certificate was by legitimate means.2 1   Whatever output a time-stamping system produces in response to a time-stamp request for a particular bit string, such as a signature returned by a hash-and-sign system. 2   This approach first appeared in the technical literature in D. Bayer, S. Haber, and W.S. Stornetta, 1993, “Improving the Efficiency and Reliability of Digital Time-Stamping,” pp. 329-334 in Sequences II: Methods in Communication, Security, and Computer Science, R.M. Capocelli, A. De Santis, and U. Vaccaro (eds.), Springer-Verlag. The paper is available online at <http://www.surety.com/solutions/DN/bhspap.pdf>, accessed May 1, 2005. Archive systems should also support multiple algorithms simultaneously. It would be prudent to use two hash functions in parallel in anticipation of future cryptanalytic advances. In addition, migration to new hash schemes would likely be performed over time, as records are copied from one medium or storage system to another, rather than all at once. Box 5.1 describes an approach to putting a new integrity scheme in place. The issue of algorithm compromise will be an ongoing operational concern. Someone must pay enough attention to keep track of the need to apply a new integrity scheme and must make a judgment call about when to add the new scheme to the system. Once the new scheme is added, new hash values can readily be calculated during the course of routine scans of the archive for errors.
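The renewal step described in Box 5.1 can be sketched in a few lines. This is a toy illustration under stated assumptions, not a real time-stamping protocol: the class `ToyTimeStamper` and the helper `bundle` are hypothetical, a keyed HMAC stands in for the signature a real service would produce, and the time is supplied by the caller so that the example is repeatable (a real service would use its own trusted clock). What the sketch shows is that the new certificate c2 is computed over the pair (r, c1), not over r alone or c1 alone.

```python
import hashlib
import hmac
import json


class ToyTimeStamper:
    """Toy stand-in for a time-stamping service (HMAC plays the role of a signature)."""

    def __init__(self, key, hash_name):
        self.key = key
        self.hash_name = hash_name

    def _tag(self, data, time):
        digest = hashlib.new(self.hash_name, data).digest()
        return hmac.new(self.key, digest + time.encode(), hashlib.sha256).hexdigest()

    def stamp(self, data, time):
        """Return a certificate binding the hash of data to the given time."""
        return {"hash": self.hash_name, "time": time, "tag": self._tag(data, time)}

    def verify(self, data, cert):
        return cert["hash"] == self.hash_name and hmac.compare_digest(
            cert["tag"], self._tag(data, cert["time"])
        )


def bundle(record, cert):
    """Serialize the pair (r, c1) unambiguously: length-prefixed certificate, then record."""
    encoded = json.dumps(cert, sort_keys=True).encode()
    return len(encoded).to_bytes(8, "big") + encoded + record


old_system = ToyTimeStamper(b"old-service-key", "sha1")
new_system = ToyTimeStamper(b"new-service-key", "sha256")

r = b"record contents"
c1 = old_system.stamp(r, "1995-03-01")              # original certificate
c2 = new_system.stamp(bundle(r, c1), "2005-06-01")  # renewal over the pair (r, c1)

# A later verifier checks both links in the chain.
chain_ok = old_system.verify(r, c1) and new_system.verify(bundle(r, c1), c2)
```

Because c2 covers the full contents of r under the new hash function, a different record r′ that merely collides with r under the old hash function would not verify against c2.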

By using standard schemes, NARA can leverage the knowledge of a wide community of cryptographers and system designers to inform its decisions over time. For example, NARA can look to NIST, which monitors developments in this area and issues advisories, for advice.

ASSURANCE AT DELIVERY TO ARCHIVE USERS

When it delivers an electronic record or batch of records to a customer, NARA can use a secure transmission method or a digital signature, which provides assurance that the data can be trusted because they originated at NARA (which checked the origin of the data at ingest and then held the data in a trusted repository). Such protections also provide assurance that data have not been tampered with in transit after leaving the ERA system.

ASSURANCE FOR ADDITIONAL INFORMATION

The records themselves, in their original forms, are considered above. But additional information, such as derived forms and metadata, should also be protected with the same level of assurance.

Assurance for Record Migration and Derived Forms

During its operation, the ERA may need to create alternative forms of the records that it ingests. These alternatives may be needed to simplify meeting ERA users’ needs (derived forms, as described in the committee’s first report), to prevent records from becoming unusable if the data type in which they are encoded becomes obsolete, or to meet the needs of certain users (e.g., for the redaction of classified records). In any case, the new form of the record will not contain the same bits as those in the original, and as a result neither the authentication nor the integrity checking associated with the original record will apply to the new form. Nevertheless, a chain of trust can be established from the original record to the new form, assured by cryptographic techniques.
The basic idea is that the operation that transforms a record in order to create a new one must certify the following: (1) the record it used as input, (2) the authority imputed to the transformation process, and (3) the integrity of the output of the process. Whether the transformation involves only computer processing or includes manual processes (e.g., redaction), the transformation process must make such a certification. The certification includes an unambiguous identification of the input record (e.g., record identifier and hash digest), an unambiguous identification of the output record (e.g., record identifier and hash digest), and perhaps other information such as the identity of the person performing the redaction or the version number of the software that performs a data-type conversion. The transformation process or processes are thus unambiguously identified, and the user of the record in its new form is thus presented with an assurance history of the document. Users can then decide, based on the trust they impute to any transformation processes applied, whether to trust the new record.

The scientific community that records, processes, and analyzes sensor data is grappling with similar problems. Agencies and disciplines seeking to keep track of the transformations applied to streams of data include NOAA (weather), NASA (satellite imagery), and high-energy physics (records of particle accelerator experiments). Developments in these areas may prove useful to NARA in the future.
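The certification of a transformation described above can be sketched as a small data structure. The field names, the `TransformationCertificate` class, and the helper functions are hypothetical, and SHA-256 is assumed only for illustration; in practice the certificate itself would also need to be signed or time-stamped, which this sketch omits.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformationCertificate:
    input_id: str       # identifier of the record used as input
    input_sha256: str   # hash digest of the input record
    output_id: str      # identifier of the new form
    output_sha256: str  # hash digest of the new form
    process: str        # e.g., software name and version, or the redactor's identity


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def certify(input_id, input_bytes, output_id, output_bytes, process):
    """Record which input produced which output, and by what process."""
    return TransformationCertificate(
        input_id, sha256_hex(input_bytes), output_id, sha256_hex(output_bytes), process
    )


def verify_chain(original, steps):
    """Check an assurance history: steps is a list of (certificate, output bytes) in order."""
    current = sha256_hex(original)
    for cert, output in steps:
        if cert.input_sha256 != current or cert.output_sha256 != sha256_hex(output):
            return False
        current = cert.output_sha256
    return True


# Example: a redaction followed by a (hypothetical) data-type conversion.
original = b"Memo: the source is AGENT X."
redacted = b"Memo: the source is [REDACTED]."
converted = redacted.decode("ascii").encode("utf-16")

history = [
    (certify("rec-1", original, "rec-1-redacted", redacted, "manual redaction, reviewer 42"), redacted),
    (certify("rec-1-redacted", redacted, "rec-1-utf16", converted, "converter v1.0"), converted),
]
history_ok = verify_chain(original, history)
```

A user presented with the final form can walk the history back to the original record's digest and then decide how much trust to place in each named process.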

Assurances for Metadata

Some metadata is an essential accompaniment to a record and must be protected accordingly. For example, the date on which a record was created may not be a part of the record, but is instead recorded as metadata. This metadata must be protected by digital assurances, just as is the rest of the record. Some metadata may change over time, being revised as metadata standards change or as better techniques become available for extracting metadata from record contents. This information is similar to a derived form of a document; that is, it should be assured by an audit trail that identifies the original input record and the processes used to derive the new metadata.

THREAT MODELING AND THREAT COUNTERING

Threat modeling is a systematic technique for enumerating threats against digital systems. It includes, among other elements, developing scenarios involving accidental or deliberate misuse of systems that might lead to vulnerabilities. The model is then used to analyze a system design to understand how it performs in different scenarios.

After developing a threat model, designing ways to counter the expected threats, and building the system, one should then observe the system to see what kinds of integrity, authenticity, and security failures occur despite carefully laid plans. Next, this failure analysis is used to adjust the threat model, to design additional or better threat-countering procedures, and to upgrade the system. The process is repeated for the entire lifetime of the system. Systems that are to provide fault tolerance, integrity, security, or life safety all require this kind of continual iteration throughout their entire life cycle. The buzzword is “design for iteration,” and this style of system design makes the feedback mechanism an essential component of the system design.
EVOLUTION OF ASSURANCE OF RECORDS

An ideal scheme for digital assurance would afford an end-to-end verification of authenticity and integrity. That is, the reader of a record, many decades after its creation, would be able to directly verify the authenticity of the creator and the integrity of the record’s preservation. As described above, current techniques are limited to piecemeal assurances that together bridge the validity windows of key management systems, of cryptographic algorithms, and of data-type transformations. An ideal scheme is not currently available; perhaps research can improve on the techniques that we depend on today (see Chapter 4).

With present techniques, the chain-of-custody and assurance bridging techniques are only as strong as their weakest link. An archive that superbly guarantees the integrity of its records will not be useful if the agencies sending records to the archive have been sloppy about any aspect of stewardship of the records in their custody. To ensure that the ERA, decades hence, will be valued as a source of authentic government records, the entire chain of record custody must use robust assurances.

Promoting Digital Assurance Throughout the Federal Government

The preceding arguments clearly call for digital assurances to be applied to records throughout their life, not just starting at the time they are ingested by NARA. However, digital assurance techniques are not currently widespread in either government or commercial information technology (IT) systems or electronic records-management systems. NARA’s mission to preserve essential evidence suggests that it take a much more active role in promoting digital authentication and integrity assurance techniques that agencies can use to safeguard the records they create. If NARA cannot provide state-of-the-art attestations about the records it holds, the records will not be honored as valid in an environment in which significantly better practices exist.

A well-designed and well-operated ERA itself can serve as an exhibit of “best practices” in the digital retention of electronic records, including both technical and operational methods for assuring record authenticity and integrity. Also, as new government IT systems are developed and as new records-management systems are developed or procured, NARA should assist agencies in their adoption of acceptable digital assurance measures. These techniques require not only software to create and maintain digital signatures and the like, but also operational measures to issue and manage cryptographic keys and to ensure that the records-management system itself is not compromised.

The Records Management Redesign initiative, currently underway, provides an opportunity to inaugurate and promote the use of digital assurance techniques for government records. The thrust of this initiative is to engage the record-creating agencies in the overall records-preservation mission, taking on responsibility for defining, creating, and maintaining digital records in a form that streamlines the preservation and later use of the records. Strong assurance is a vital aspect of electronic records preservation that has not received adequate attention.
Until record creators use digital assurance methods that conform to NARA’s standards, agencies that create and hold electronic records should be held to stringent chain-of-custody standards for their holdings. When records without digital assurances are transferred to NARA’s custody, NARA should immediately augment these records with suitable digital assurances, which can then be maintained throughout the records’ life in the ERA. The ultimate goal must be to achieve and maintain the best digital assurances and other record-retention practices in both the creating agencies and the archive itself.