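Building an Electronic Records Archive at the National Archives and Records Administration: Recommendations for a Long-Term Strategy

3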
Partnering with Other Institutions

The National Archives and Records Administration (NARA) need not go it alone in providing services to its users. NARA’s work with electronic records affords a number of opportunities to partner with other organizations, commercial firms, and researchers, and to leverage their efforts, in providing access to NARA collections. Designing the Electronic Records Archives (ERA) to feature public interfaces can enable much broader use of NARA holdings. The inclusion of appropriate application programming interfaces (APIs) in the ERA design would enable third parties to access records and to provide access services that are layered on top of the NARA-provided records and access understructure. Federation would allow collections held by NARA and other organizations to be accessed as if they were a single collection. This chapter briefly describes ways of partnering with other institutions and the technologies that would make such partnerships possible.

APPLICATION PROGRAMMING INTERFACES

An application programming interface exposes selected internal data or operations of a system, enabling people to build new interfaces or applications on top of NARA’s systems. APIs would allow third parties to do such things as run queries against NARA holdings (catalogs or content) and retrieve individual records. In the commercial arena, the search service Google illustrates the potential of open API access: it provides simple API-level access to its search engine to anyone who registers, letting others spend their own research resources developing applications that might eventually be interesting or useful to Google.
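
As a rough illustration of the kind of interface described above, the sketch below models a read-only API offering the two operations mentioned: running a query against a catalog and retrieving an individual record. Everything in it is hypothetical. The `Record` fields, the `HoldingsAPI` class, its method names, and the sample identifiers are assumptions made for illustration, not part of any actual NARA or ERA design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A catalog entry; all field names here are illustrative assumptions."""
    record_id: str   # hypothetical identifier
    title: str
    description: str


class HoldingsAPI:
    """Hypothetical read-only interface over an in-memory catalog."""

    def __init__(self, records):
        # Index records by identifier so retrieval is a direct lookup.
        self._by_id = {r.record_id: r for r in records}

    def search(self, term):
        """Return records whose title or description contains the term."""
        needle = term.lower()
        return [
            r for r in self._by_id.values()
            if needle in r.title.lower() or needle in r.description.lower()
        ]

    def get_record(self, record_id):
        """Retrieve one record by identifier, or None if it is unknown."""
        return self._by_id.get(record_id)


# Sample data invented for this sketch.
api = HoldingsAPI([
    Record("rec-0001", "Snake River survey", "Hydrological survey data"),
    Record("rec-0002", "Census extract", "Population tables"),
])

hits = api.search("survey")
print([r.record_id for r in hits])        # identifiers of matching records
print(api.get_record("rec-0002").title)   # title of a single record
```

A third party could build its own interface entirely on these two calls, without any access to how the catalog is stored internally.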

Capabilities that might be built on top of the ERA include the following:

  • Data annotation to add and/or correct record metadata. Third parties with an interest in particular records could provide services to annotate records, using systems separate from the ERA but accessible to users as if they were part of it. These annotations could be held by external parties (provided there was a way for third parties to point to ERA records unambiguously, e.g., via a unique identifier), or the ERA could hold and provide access to the overlays.[1]

  • The continued introduction of specialized enhanced capabilities by partners and vendors to support particular needs. For example, a third party or NARA might supply an interface that provides translations of popular audio recordings (e.g., the Nixon White House tapes) from the format in which they are archived into commonly used formats (e.g., Windows Media or RealAudio). Some of these capabilities might prove useful enough to be folded back into the ERA system proper.

  • Entrepreneurial innovation to provide value-added services, through either APIs or bulk download. Today, for example, third parties provide value-added access to information in the Securities and Exchange Commission’s Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system. Similarly, commercial third parties currently provide enhanced access to old data of the U.S. Census Bureau.

  • Educational materials. Overlays onto NARA holdings could be a relatively low-cost way to provide a substantial new set of services that satisfy NARA’s educational mission. Educational publishers, as well as historians and social scientists, could develop online materials that draw on NARA holdings.

Such capabilities would allow NARA to tap at low cost the resources of third parties in order to provide enhanced services, help fulfill NARA’s mission, increase the perceived value of NARA’s holdings, and explore new technologies and approaches.

In allowing such capabilities to be built on top of the ERA, NARA will have to consider the problem of how a typical user distinguishes between results that are provided directly from NARA and those produced via a third-party extension.
It would probably not be practical for NARA to certify third-party software or services. However, NARA may want to find ways of letting users know when they are using NARA versus third-party software and services, because the guarantees that NARA would make about the authenticity of what is viewed in the two cases would be very different.

Additionally, NARA should support bulk download of records so that third parties can retrieve sets of records against which to run their own full-content searches, perform data mining, and so forth. A further design choice that would facilitate third-party access is the adoption within the ERA of a persistent (i.e., stable and durable) record-identifier scheme.

[1] The overlay concept also has application internally within NARA. For example, Thomson West undertakes to rebuild the full index for the Westlaw databases only every 10 to 15 years. Between rebuilds, it augments the index with new information, some generated automatically, some manually. The system is structured so that a query collects information from all of these sources: the original index and the subsequent overlays. NARA could accept overlaid annotations from partners to add value to its holdings.

FEDERATION

Federation is a technique whereby a software layer is used to make a collection of relatively different systems (which are often controlled administratively by different organizations and often integrated loosely) appear to be a single system.[2] Federation has technical (e.g., the definition of standards for the federation layer and connection to the systems below the federation layer) and organizational (e.g., agreement on standards and the forging of agreements to federate holdings) dimensions. Applications of federation to the ERA would include the following:

  • Federated access. There are several reasons why federated access to electronic records collections held by NARA and other organizations would be an attractive capability. First, it would provide a common interface to federal records regardless of which agency happened to have responsibility for managing them, allowing responsibility for electronic records preservation to be distributed among multiple agencies without giving up the advantages of a common access point for users. For example, users could retrieve in a single query records held by NARA as well as agency records that had not yet been transferred to NARA. Federated access would also be valuable in allowing access to span NARA and presidential library resources. It could likewise enable seamless access across multiple systems containing archived and active data. For example, someone searching for data about the Snake River would otherwise have to look across many U.S. federal agencies (e.g., the Bureau of Indian Affairs, the Environmental Protection Agency, the Department of the Interior, and the Library of Congress), and the archives and agency records of four states (Wyoming, Idaho, Oregon, and Washington) as well as NARA, all of which might have relevant records. With agreement on how to federate, access could be enabled across NARA and the archives and other holdings of international, state, and local archives, libraries, and other institutions.

  • Federation of archival systems. More generally, it may prove useful to federate the ingest of records, their access, and other functions across multiple archival systems. This arrangement would provide a way to combine multiple instances of the ERA, allow simultaneous operation of multiple versions of hardware and software, and generally avoid having to maintain a single monolithic archival system.

  • Federated storage. Placing a federation layer between an archival system and its file storage is a very useful technique. Among its benefits are that file systems at multiple sites can be treated the same by software using the file system, and new implementations of file systems (or storage types) can be added to the system as it evolves. Old file systems, no longer used, can easily be removed.

Federation is a mainstream design approach employed in a number of commercial applications today. In industry, databases are routinely federated, usually at the Structured Query Language (SQL) level. Federated file systems (the San Diego Supercomputer Center Storage Resource Broker[3] is one example) similarly appear in a variety of production systems today. The Z39.50 Information Retrieval Protocol[4] is used to support federated cross-library searches. Following are some other examples of applications that use federation:

  • Some comparison shopping systems. For example, Orbitz is not just a portal providing access to multiple sites but an actual federation of several airline reservation systems.

  • Identity servers managing identity in computer systems. In such an application, a (corporate) user logs on to a single identity server; that server then prepares credentials for various information technology (IT) systems when asked by the IT system, using whatever credential types the particular IT system requires. The Liberty Alliance is an industry group working to build federated identity capabilities.

  • New designs federating mail servers. With such a federation, when one queries a set of e-mail folders (e.g., in a large e-mail archive), the folders residing on different e-mail servers will be searched within their respective servers, but the results will be consolidated. This form of federation involves only a single product, so it is simpler than a federation in which the federated systems run different software (in such a case, the federation interfaces must be carefully defined).

Despite the use of federation in various commercial applications today, there is no common way to federate access to electronic records archives or digital libraries. There is already a significant technology base to draw on, including metasearch technologies and standards such as the Z39.50 Information Retrieval Protocol and the Open Archives Initiative Protocol for Metadata Harvesting.[5] There is, however, no established base of practice for federating access to archives. As NARA and other organizations start building such archives, federation is a technical area in which NARA might become a leader or engage the research community by informing it of its needs. If NARA is to take advantage of these opportunities, it will also have to address policy issues involved with allowing its resources to participate in federations (that might be run by groups other than NARA).

[2] By contrast, the term “distributed” usually means multiple instances of identical or similar systems, often under a single administrative control, often tightly integrated. The terms “federated” and “distributed” label end points in a spectrum.
[3] A detailed description is available online at <http://www.sdsc.edu/srb/>. Accessed May 1, 2005.
[4] ANSI/NISO Z39.50-2003 Information Retrieval: Application Service Definition and Protocol Specification.
[5] Additional information is available online at <http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm>. Accessed May 23, 2005.
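
The federated access described in this chapter can be pictured as a thin layer that forwards one query to several independently run systems and merges the answers. The following is a minimal sketch under stated assumptions: the `Source` protocol, the two adapter classes, and the sample holdings are all hypothetical, and a real federation would rest on an agreed protocol between institutions rather than on in-memory data. Each result is tagged with the system that produced it, reflecting the chapter's concern that users be able to tell NARA-provided results from those of other parties.

```python
from typing import Protocol


class Source(Protocol):
    """What the federation layer requires of each member system (assumed)."""
    name: str

    def search(self, term: str) -> list[str]: ...


class ListSource:
    """Adapter for a system that stores titles in a plain list."""

    def __init__(self, name, titles):
        self.name = name
        self._titles = titles

    def search(self, term):
        return [t for t in self._titles if term.lower() in t.lower()]


class KeyedSource:
    """Adapter for a differently structured system keyed by local identifier."""

    def __init__(self, name, by_id):
        self.name = name
        self._by_id = by_id

    def search(self, term):
        return [t for t in self._by_id.values() if term.lower() in t.lower()]


def federated_search(sources, term):
    """Send one query to every source and merge the results.

    Each hit is returned as (source name, title) so the caller can always
    see which system supplied it.
    """
    merged = []
    for source in sources:
        for title in source.search(term):
            merged.append((source.name, title))
    return merged


# Sample holdings invented for this sketch.
sources = [
    ListSource("NARA", ["Snake River dam records", "Nixon White House tapes"]),
    KeyedSource("Idaho State Archives", {"a1": "Snake River water rights"}),
]

for origin, title in federated_search(sources, "snake river"):
    print(f"{origin}: {title}")
```

In this sketch, adding or removing a member system changes only the `sources` list; the calling code is untouched, which is the property that makes a federation layer attractive when the underlying systems differ.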