I

Report of the Physics, Chemistry, and Materials Sciences Data Panel

Gerd Rosenblatt,* R. Stephen Berry, Edward Galvin, J.G. Kaufman, Kirby Kemper, David Lide, Jr., and Edgar Westrum, Jr.

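CONTENTS

  1 Introduction and Overview
  2 Characteristics of Data in Physics, Chemistry, and Materials Sciences
  3 Data Management Requirements
  4 Illustrative Examples of Electronic Records from Physics, Chemistry, and Materials Sciences
  5 Data Retention and Record Preservation Criteria
  6 Suggested Role of the National Archives and Records Administration
  7 Summary Conclusions and Recommendations
  Acknowledgments
  Bibliography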

1 INTRODUCTION AND OVERVIEW

This report is concerned with the long-term retention of scientific data—generated or held by the federal government—in the laboratory physical sciences exemplified by physics, chemistry, and materials sciences. “Long term” means essentially permanently; i.e., for longer than a hundred years, similar to the way that archived scientific data have been preserved in the journal scientific literature since at least the 17th century. Particular attention is paid to scientific and engineering data that exist in electronic form because of the large increase in the amount and kinds of data in computer-compatible formats brought about by recent advances in computers and data storage.

The major questions to be addressed are:

  • What data should be preserved—or conversely, what data need not be preserved—for the “long term”?

  • Who should save these data?

  • What role and modes of operation are appropriate for the National Archives and Records Administration (NARA) in the preservation of data from physics, chemistry, and materials sciences?

* Panel chair. The authors' affiliations are, respectively, Lawrence Berkeley Laboratory, University of California; University of Chicago; The Aerospace Corporation; The Aluminum Association; Florida State University; Consultant, North Potomac, Maryland; and University of Michigan.




Study on the Long-term Retention of Selected Scientific and Technical Records of the Federal Government: Working Papers

These questions are important because of the nature of scientific understanding, which depends upon accumulated knowledge, and because of the ways in which scientific understanding and technological knowledge have become woven into the fabric of our society and civilization.

What Data Should Be Preserved?

The areas of science being considered by this panel (physics, chemistry, and materials sciences) are laboratory physical sciences. Data from physics, chemistry, and materials sciences differ from data in fields being considered by other panels in this study because much of the data stem from experiments that, in principle, could be reproduced. However, closer examination reveals that it is simply not feasible to reproduce many data sets because the samples, apparatus, or expertise that led to them cannot be reproduced at an acceptable cost. Furthermore, some types of data cannot be reproduced at any cost.

This leads immediately to the major criterion to be used in this report: if the data will be important to science in the future, the primary criterion for determining whether a laboratory science data set is a candidate for long-term preservation is whether or not it is feasible to reproduce it. The discussion and examples in this panel report will illustrate the factors that make data sets impossible, or impractical, to reproduce.

Who Should Save the Data?

Data should be saved by organizations and in formats so that they are maximally available to the primary user, the scientific and technical community. In most cases, this means primary responsibility will continue to be held by technical libraries, government agencies, and professional societies that currently archive and make accessible scientific and technical data, records, and publications.
The system for generating, preserving, disseminating, and accessing scientific information is evolving (some would say too rapidly, others not rapidly enough) as data storage, scientific communications, and data acquisition are revolutionized by electronics and computers. But the system is not broken. There is little evidence that the organizations that have been meeting the needs of the scientific community will be unable to do so in the near future. Thus, we see primary responsibility for preserving electronically stored scientific data remaining with those currently preserving and supplying scientific information. There is, however, an enhanced role for NARA to play: in preserving electronically stored scientific information for access outside the scientific-technical community; in helping facilitate access to electronically stored scientific records by providing (or cooperating in the provision of) locator information; and in being ready to step in and preserve records when they might otherwise be lost.

Contents of This Report

Following this introduction, the panel's report begins, in Section 2, by considering some of the characteristics of scientific data generated and used in physics, chemistry, and materials sciences, particularly characteristics that distinguish data in these fields from data in fields covered by other panels in this study. Section 3 describes general issues and requirements critical to the subsequent use of data from physics, chemistry, and materials sciences. Section 4 describes six example databases. The examples illustrate data that are not feasible to reproduce because they are from a no-longer-reproducible “mega-experiment,” represent tremendous accumulated effort, or are measurements on unique samples. Section 5 summarizes criteria to be used in deciding what data are worth preserving for a very long time.
The criteria are modeled upon the processes that have been developed and used over the past 300 years by the scientific community in preserving printed records. Section 6 turns to the panel's suggestions of how NARA might add to the nation's ability to preserve and utilize irreplaceable scientific data. The panel's conclusions and recommendations are summarized in Section 7. The documents consulted by the panel are listed in the bibliography at the end of this report.

2 CHARACTERISTICS OF DATA IN PHYSICS, CHEMISTRY, AND MATERIALS SCIENCES

The primary purpose of data generated or stored by scientists and engineers in physics, chemistry, and materials sciences was, and in the great majority of cases will continue to be, to provide specific information to other scientists and engineers. More and more, these data are being generated or stored in electronic form. The electronically recorded data are based in one way or another, either directly or after considerable manipulation, on observations. Their long-term primary value is to add to the body of scientific and technical information so that other scientists and engineers can use the results, or generalizations based upon the results, and thereby eliminate the need to carry out many future observations or experiments.

Characteristics Distinguishing Data in Physics, Chemistry, and Materials Sciences from Data in Fields Covered Elsewhere in This Volume

Data derived from laboratory experiments, such as the hardness of a steel produced in a particular melt, differ from data based upon observations of transient natural phenomena, such as the records of the 1993 midwestern floods. Thus, they pose different questions when one examines data preservation issues. One difference arises from the fact that transient natural phenomena are not reproducible; the fact that the resulting observational data are “snapshots in time” sometimes means that the data have historical or evidential value in addition to their informational value. Observational data sets that provide a continuous time-series record of the physical universe, or of human impact upon it, are important to future generations for comparisons and the identification of trends. In addition, many observational data sets represent major engineering or worker-intensive collection activities that warrant documentation and could not feasibly be carried out again. However, there are not always clear distinctions between physical and chemical sciences data and observational databases.
For instance, the Hanford Environmental Information System (HEIS) database is a collection of environmental measurements made at the site for manufacture of much of U.S. nuclear weapons material. HEIS is an example of physical-chemical observational data that should be preserved both because they provide basic information needed to model the site's situation (i.e., the data enable a quantitative, consistent description of the measured and inferred characteristics of the site) and because they affect future decisions about health and safety of those who will live or work near that site and of those who might carry out similar operations in other geographic locations. It is important to note that, in addition to the site observations, laboratory physics and chemistry data are required for input into the environmental models. Good laboratory data have a critical, perhaps underestimated, role in the proper interpretation of such models. There is a need to build a proper relational record into scientific data holdings to note linkages and dependencies between observational data and laboratory data.

As just noted, one characteristic that distinguishes physics, chemistry, and materials sciences data from earth sciences data, for example, is the possibility of reproducing the conditions under which the measurements were made and, therefore, being able to repeat the measurements at a later time. However, the distinction between data sets from these various fields is blurred somewhat because it can become possible to infer what would have been transient earth science observations (such as temperature, precipitation, or carbon dioxide levels) of past events from reproducible laboratory measurements on, for example, ice borings or tree rings.

Another equally significant difference between the fields is associated with the maturity and established theoretical framework of the various sciences.
Physics, chemistry, and materials sciences have built up a powerful theoretical base that allows reliable prediction of many potential experiments and observations without actually performing them. This accumulated knowledge base also allows treatment (and, in the information sense, extensive compression) of acquired data. The net effect is that the sheer volume of data that it would be valuable to archive is much less in the physical sciences than in fields more dependent upon continued manipulation of raw observational results.

The Nature and Kinds of Electronically Recorded Data in Physics, Chemistry, and Materials Sciences

Electronically recorded data in the physical sciences are of two forms: original experimental results and evaluated compilations of published data from a variety of sources. Most scientific data are in one or more of three general formats: textual, numeric, or graphical.

Textual, numeric, and graphical data. At least three types of data need to be dealt with by a useful scientific information and data service. Textual data, exemplified by bibliographic records, are alphabetic character sets searchable as strings. They tend to be similar regardless of specific scientific discipline.

Numeric data, exemplified by the quantitative properties of materials, are numbers or searchable numeric ranges. Scientific numeric data have units, and these units must be specified for the numbers to be meaningful (for example, a pressure will have a very different numeric value if the units are atmospheres than if the units are pascals). Numeric values for a measurable property usually depend upon such independent variables as time, temperature, or pressure, and the values of those independent variables also must be specified for the property value to be meaningful. Numeric data tend to be substantially more complex in format than textual data because of the interdependence of numeric values for the independent and dependent variables and because the numbers are meaningless without their units.

Graphical data, exemplified by chemical structures or optical spectra, are typically two- (and increasingly three-) dimensional drawings or images. Their description may include both shape and type of corrections or fits among parts of the shapes. Graphical data too are more complex than textual data, and their retrieval and use typically require a capability for searching by total or partial graphical shape.

Original results. Significant changes have taken place in recent decades in the form of “original data.” A raw result was, in the past, typically a measured parameter, such as a voltage or distance. These measurements were acquired, written in a notebook, mathematically treated to obtain the desired scientific parameter from the raw measurement, analyzed, and often then discarded or ignored. Nowadays, many raw data are acquired and processed electronically “on the fly” so that only processed data exist long enough for anyone to look at.
With rapid automated data acquisition and manipulation, the option exists to keep (electronic) data and (re-)analyze them as required. However, speedy, automated data collection makes for large volumes of often insignificant data, so that in many experiments the data stream is screened and most of it discarded “in real time” by a computer program or an experimenter. For example, whereas spectroscopists used to keep the photographic plates or recorder charts from which they had taken measurements, the peaks now may be analyzed electronically, immediately upon being measured, and only relevant attributes of significant peaks recorded. The fraction of the raw data that is saved after initial processing may be very small, sometimes less than 1 part in 10^4. The protocols that pertained to laboratory notebooks to ensure that these documented the experiment completely for future checking and patentability still hold, but not everyone continues to use laboratory notebooks; additional new protocols need to be developed by the scientific community.

Another factor leading to “unpreserved data” is that many experiments in the physical sciences are undertaken to develop or suggest theoretical understanding. Once the theory is established by a multitude of complementary experimental results and consistency with other scientific knowledge, no one may care about revisiting the original data, which, if considered in isolation, might be unconvincing.

When considering laboratory data of the kind described in this and the preceding paragraph, it is usually best to accept that generally no one knows as much about the original data as the original experimenter. If the experimenter does not find the data worth preserving, the data are probably not going to be worth much to anyone else, at least not for scientific purposes. Of course, it is possible that the data may be of interest for historical or legal reasons quite different from the considerations influencing the experimenter.
This aspect of the rationale for preservation is well known to professional archivists.

However, there are other kinds of original data sets that should be preserved because the information in them is needed by future generations of scientists and engineers, and because it would be impossible or impractical to reproduce the measurements. As an obvious example, a supernova is not a reproducible laboratory experiment. Not so obviously, neither are most measurements of the mechanical and physical properties of engineering materials; the samples were consumed or lost and represented a unique fabrication technology or service exposure. For example, a cracked artificial heart valve is a unique sample. To obtain the understanding required to avoid future failures of similar valves, the physical properties of the cracked valve must be measured to ascertain how, why, and under what conditions of fabrication, installation, and exposure it failed.

Evaluated compilations. In addition to original experimental results, the other kinds of data from the physical sciences that exist and are disseminated in electronic form are compilations based upon (critical) analysis of a large body of data from the scientific literature. Well-known examples (cf. Section 4) include thermodynamic property compilations such as the National Institute of Standards and Technology's Joint Army-Navy-Air Force (JANAF) tables and the thermophysical properties disseminated by the Department of Defense Center for Information and Data Analysis and Synthesis at Purdue University. In this case, the data typically are not impossible to replace; they simply represent so much effort that it would be very costly and highly impractical to replace them. The costs of not having the data available, although usually difficult to measure other than anecdotally, can be much higher than the cost of preserving them.

On the whole, the data sets described above have been, and will continue to be, preserved by the scientific organizations, agencies, laboratories, libraries, and professional societies that played a major role in their original acquisition or dissemination. A key question to be examined later in this report is what happens when the originally responsible institution is unable or unwilling to continue preservation and dissemination. This question has a corollary that introduces what may be a third class of scientific record: how do we as a nation preserve enough documentation about dormant (but not obsolete) areas of experimental science that benefited from considerable accumulated expertise (materials for breeder reactors, bubble chambers, possibly calorimetry, and others) that the expertise can be recovered when needed at some future time? There is a need, which might become urgent, to identify those scientific and technological information sources (paper, as well as electronic) that are in danger of being lost if no one steps in to preserve them.

3 DATA MANAGEMENT REQUIREMENTS

There are a number of general requirements critical to subsequent use of data from physics, chemistry, and materials sciences, including the need

  • for detailed metadata along with the data set. This includes the need to preserve and describe the algorithms and models used to acquire, process, evaluate, or utilize the data set.

  • to save classified data and to ensure that they are declassified as soon as the reason for classification is no longer applicable.

  • for a comprehensive, up-to-date, easily accessible, national or international “locator system” that will enable scientific researchers to determine whether sought-for data exist and how to access them.

  • for prompt access. Scientific researchers need to know quickly if data exist and, for some types of data, to have rapid access to the data.

  • to handle different kinds of appropriate storage media and yet to have standards so that the number of types of media is limited.

  • to handle myriad appropriate data formats and yet to ensure that data are retrievable and usable.

  • to consider data management and preservation, and to provide guidance to scientists as to appropriate formats for data of long-term value, during project initiation.

Metadata, Algorithms, and References

The scientific community often lumps under the heading “metadata” all the information that is required to understand what is in a data set and how to access and use it. In other words, the metadata provide information and references equivalent to that which would be in a peer-reviewed, archived, research journal article, in the introductory chapter(s) of a book of tables, or in an instruction manual. This includes a description of the algorithms and models used to process or interpret the data, as well as the environmental variables, calibration procedures, and other experimental details.

The panel believes that it is crucial to preserve the algorithms and models along with the data. The standard methods in many areas of laboratory science use electronic and computational filters on the data, followed by computational analysis of the resulting data. The programs and program descriptions used by experimenters (and also evaluators) should be saved, along with the papers and reports that are referred to in the documentation for these programs.
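The documentation items named above can be pictured as a structured record that travels with the data set. The following is only a minimal sketch in Python, not a format proposed by the panel: the class name, the field names, and the sample values are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; every field name here is illustrative."""
    title: str
    acquisition_method: str = ""
    processing_algorithms: List[str] = field(default_factory=list)
    calibration_procedure: str = ""
    environmental_variables: Dict[str, str] = field(default_factory=dict)
    program_references: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of documentation fields that are still empty."""
        checked = (
            "acquisition_method",
            "processing_algorithms",
            "calibration_procedure",
            "environmental_variables",
            "program_references",
        )
        return [name for name in checked if not getattr(self, name)]


# A made-up, partially documented record (not taken from any real data set).
record = DatasetMetadata(
    title="Example optical spectra (hypothetical)",
    acquisition_method="grating spectrometer scan",
    processing_algorithms=["baseline subtraction", "peak fitting"],
)

for name in record.missing_fields():
    print("still undocumented:", name)
```

A check of this kind only reports that a field is empty; whether the documentation is adequate for a future user remains a judgment for the scientists and archivists involved.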
The information necessary to make full, effective use of scientific data includes the definitions of the data in each field; a description of the methods used to acquire and manipulate the data; references to the sources of the data; detailed descriptions of samples, conditions, assumptions, limitations, etc.; deviations from standard practice; and references and cross-references to other work and data sets. In contrast, archivists generally decide that the metadata are sufficient when they, as nonspecialists, can tell what is in the record and how to access it. On the one hand, institutions holding electronically stored scientific data should be encouraged strongly to ask researchers what documentation is required in order to utilize the data fully and to include that documentation in the archival record. On the other hand, documentation from the scientific community should keep this difference in mind and provide the specific pieces of information that archivists would not think of that will be necessary to users within and without the scientific field.

One way to check whether metadata are adequate is to ask a typical potential user to use data from an unfamiliar database in a mock (or real) situation. Can users obtain what they need? The users of many laboratory science data sets are a more homogeneous group than are the potential users of nontechnical databases. This may decrease somewhat the metadata requirements, as it may be assumed that users needing the most detailed documentation will be literate scientists from fields close to that of the data originators, just as when preparing a journal publication.

Because providing adequate metadata for a set of scientific data is just as arduous as writing a scientific paper, it is necessary for the scientific community (including its administrators) to provide incentives to encourage adequate documentation of all data sets that will be preserved for the long term. Researchers or centers would be more likely to produce adequate documentation if that documentation, in the form of a report on file with the database holder, were recognized as a bona fide scholarly document. Although usually much is lost when a non-scientist documents science, there is also the possibility of having a knowledgeable scientist or archivist other than the original investigator prepare the metadata. The result would be better than grossly insufficient documentation.

Ideally the appraisal of potentially archivable records should be accomplished jointly by an archivist and the originator of the record. But this cannot always be done. Too often a project or grant ends and no money or staff are left to ensure that the records are retained at all, let alone described effectively.
Appraisal and documentation should take place while a project is active, as part of the routine of documenting the results of experimentation.

As a final comment regarding data fields and metadata, it is important that the “searchable” information about a data set be both broad and deep so that the data can be found. For example, the investigator's institution can be very useful in finding particular sought-for information, but it does not help much if this field cannot be searched. Any keywords must be general enough, and the keyword list long enough, that secondary users (who do not know the right keywords or their nuances) can still find the record.

Classification

There is an obvious need to save classified, as well as unclassified, scientific data; the complete records of the atmospheric atomic bomb tests (cf. Example 1 in Section 4) are a clear example. However, it is important to recognize that it is more difficult to provide and assess metadata for a classified data set and it costs more to store classified data. Also, some of the potential value to society of having the data is not realized when only a limited group knows of their existence and nature. Thus, it is highly beneficial and cost-effective to have mechanisms in place that promote declassification as soon as the reason for classification is no longer applicable.

Locator System

Since scientific and technical electronically recorded data sets are, and will continue to be, held by numerous institutions, there is an urgent, ever-increasing need to develop a locator system so that those needing the data can be aware of their existence and can access them. Scientific data that have been preserved but are not used by other researchers might just as well not exist; they are not contributing to the accumulation of scientific knowledge and understanding. The panel's ideal system is one where a researcher sitting at a terminal (which might be at work, at home, or in an airplane) asks a question.
A typical question might be: “What is the density of solid nitrogen at 14 K?” After perhaps prompting for more specification of what is wanted, the computer provides an answer. It also directs the researcher to further information, at increasing levels of detail. The additional information enables the researcher, if desired, to evaluate the source of the data, its accuracy, its limitations and assumptions, alternatives, related information (such as crystal structure and thermal expansion coefficients, for the example cited), and so forth. There are already several directories of scientific data collections, but they go out of date rapidly. This task requires a significant dedicated effort if it is to be done properly.

An alternative to this hierarchical model is the “Internet model”: define standards and let individual nodes (in this case, individual data collection sites) maintain their data and keep the system informed in a fully distributed fashion. A distributed system might be more easily expanded to be truly international, which is clearly the ideal for the scientific community. On the other hand, the Internet has been criticized for being an undocumented resource, one difficult to learn about and to utilize fully.
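The locator idea described above, in which searchable keywords lead a researcher to the institution holding the data, can be illustrated with a toy catalog. This is a sketch under stated assumptions rather than a design from the report: the `CatalogEntry` type, the `locate` function, and both catalog entries (including the holder names) are invented for illustration, and the catalog holds pointers to holdings, not the data values themselves.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CatalogEntry:
    """One data holding known to the locator; all entries below are made up."""
    title: str
    holder: str
    keywords: FrozenSet[str]


def locate(catalog: List[CatalogEntry], *terms: str) -> List[CatalogEntry]:
    """Return the entries whose keywords include every search term."""
    wanted = {term.lower() for term in terms}
    return [entry for entry in catalog if wanted <= entry.keywords]


# Hypothetical holdings; the holder names are placeholders, not real centers.
catalog = [
    CatalogEntry(
        "Cryogenic solids property tables",
        "Hypothetical Data Center A",
        frozenset({"nitrogen", "density", "solid", "cryogenic"}),
    ),
    CatalogEntry(
        "Alloy fatigue test records",
        "Hypothetical Laboratory B",
        frozenset({"aluminum", "fatigue", "mechanical"}),
    ),
]

hits = locate(catalog, "Density", "nitrogen")
for entry in hits:
    print(f"{entry.title} (held by {entry.holder})")
```

In the distributed “Internet model,” each site would maintain its own entries; the sketch sidesteps how such entries would be kept current, which the text identifies as the hard part.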

Speed of Retrieval

The value of scientific data is often contingent on their existence being known in a matter of days or even less than an hour. If a researcher spends two hours on a computer or in a library trying to find out if information exists and does not find it, the researcher is very apt to conclude it does not exist and to stop looking. A researcher who needs a piece of data to treat that day's results, or to referee someone else's results, may not bother to use it, even after confirming its existence, if it will take weeks to months before the data can be looked at. Thus, an issue in who can provide useful custody may be who can support rapid confirmation of existence and reasonable retrieval times. As was emphasized above, it is the use of preserved scientific data that gives them value and makes their preservation worthwhile. Data only contribute to society when they are used.

Storage Media

Up to this point, this report has discussed “electronic data” because that is the way that the data of concern are transmitted, manipulated, and perhaps displayed. If preserved, they are also stored (on a tape, chip, or disk) magnetically or optically. There clearly is a need to handle different kinds of storage media and yet to have standards so that the number of types of media, and of media specifications and characteristics, is limited. There also is a need to have backward consistency so that data acquired using the technology of one decade can be utilized decades later without tremendous human effort to modify those data frequently during the intervening period.
We suggest a middle road between “accept any medium convenient to one researcher on one day” and “all data must be submitted on IBM-PC 5.25″ 360 kbyte floppy disks” or “electronic records must be hardware and software independent files in ASCII or EBCDIC with no internal control characters on 1/2″ 7- or 9-track open-reel magnetic tape or on 18-track 3480-class tape cartridges.” (The latter is the current NARA requirement (NARA, 1992a); suggestions for NARA on this issue are in Section 6.) The criteria should be accessibility, suitability, longevity, and stability of the storage medium.

Data Formats

Computer-compatible electronic data coming from the laboratory physical sciences are of many different types (graphics, images, multi-media, numbers, equations, symbols, text) and formats (word processing languages, database and spreadsheet formats, unique experimental data streams, image formats). Given adequate metadata, almost all of these should be preservable and usable in the future. As one panel member noted, we can still read ancient Greek and Sanskrit, so our grandchildren will probably be able to handle ASCII, WK1, DBF, and other widely used formats. Of course, it is important to remember that we cannot read all ancient scripts and that “machine-readable” formats require having proper hardware as well as the “key” to the language.

It is appropriate to aim for a reasonable degree of consistency, with the goal being to make all technical data and records, across the entire government, more accessible. The panel urges that, whenever possible, performance standards that ensure future utility be applied to data format requirements. Performance standards that focus on what the archive truly needs (permanence and usability) are much more likely to be adaptable to complex scientific data sets and new technologies than are process standards.
The distinction is analogous to that used to increase the fuel economy of the nation's automobile fleet: setting overall fuel-economy performance goals for the car manufacturers (the so-called CAFE requirements) was much more effective and long-lasting than would have been the enunciation of vehicle-design requirements that fixed the allowable vehicle weight, materials of construction, or engine or chassis design, for instance. Consistency standards for data formats should always be minimal standards; accessibility is the goal, not uniformity. Standards should be set jointly with all interested parties. There are many de facto software standards in scientific laboratories today, both commercial and shareware, and more are likely to emerge as Internet exchanges become more common.

A related issue is how to handle chemical structures. There are a number of different formats for storage, but no dominant standard as yet. A standard format for machine storage and readability would be helpful. But any one of the methods currently employed is a better solution than “standardizing” on a format (such as software-independent ASCII) that does not allow chemical structures to be written at all.

Finally, it should perhaps be noted explicitly that most of the electronically stored data of physics, chemistry, and materials sciences are of a type for which error-free electronic retrieval is critical. This may be different from some other fields that rely more on imaging data, where “lossy” data compression does not necessarily invalidate the data and may sometimes be desirable.

4 ILLUSTRATIVE EXAMPLES OF ELECTRONIC RECORDS FROM PHYSICS, CHEMISTRY, AND MATERIALS SCIENCES

Federal data sets in the physical sciences that are candidates for long-term preservation can be classified into three generic types, for which illustrative examples are presented in this section:

  • Massive records and data from an original experiment, particularly a “mega-experiment,” that there is no realistic chance of replicating, even though it is, in principle, reproducible.

  • Critically evaluated compilations of data from a large number of original sources that represent tremendous accumulated effort.

  • Unique, perhaps time- and environment-dependent, engineering data collected at federal facilities or as part of a government project (that may or may not ever be completed), much of which never reaches the published literature.

Original Experimental Records

Experiments carried out on a “grand scale” may become impossible to reproduce for many reasons, e.g., the sheer effort and cost involved, a change in societal circumstances, or the destruction of the apparatus (such as a unique particle accelerator).

Example 1: Atmospheric and Underground Nuclear Weapons Test Results

The Department of Defense (DOD) Nuclear Information Analysis Center, which goes by the acronym of its former name, DASIAC, is the Defense Nuclear Agency's (DNA) repository for information from nuclear weapons tests. Its holdings cover the entire era of nuclear weapons testing, including both atmospheric and underground tests, but include only data that have been sent to DASIAC, which has no authority to claim data.
DASIAC is not a mere repository, but rather has the explicit goal of keeping information in a form that is useful and accessible to interested parties. DASIAC's data document the effects of nuclear explosions in contrast to the performance of the weapons per se. The latter information, which was the primary goal of the majority of tests, is the responsibility of the Department of Energy (DOE). Consequently, DASIAC's data holdings are just from those tests and experiments that were designed to evaluate the effects of nuclear explosions, perhaps 10 percent of the total number. Its holdings from atmospheric tests are currently more comprehensive than from underground tests. The bibliographic citations of DASIAC's holdings (shelf lists) run to some 40 shelf feet. A more recent estimate was that the holdings, if all converted to electronic form, would require more than 2 terabytes of capacity. The collection is growing as people with long experience in nuclear effects testing retire, and the rate of accumulation is likely to accelerate with the end of the Cold War. Some of the major blocks of the DASIAC holdings include some 300,000 pages of published reports (the definitive DNA documentation of tests in which that agency was involved); about 50,000 documents of historical literature; DNA research, still being added at a high rate; 9000 data sets resulting from tests for the semiconductor industry of the effects of nuclear explosions on electronic parts; 40,000 data sets related to electromagnetic pulses; 130,000 still photos and 17,000 motion pictures of blasts; and about 8000 cubic feet (a major bulk of DASIAC data) of data on thermal radiation and air blast shock waves. Many of the data are classified as restricted or formerly restricted so that, in accordance with the Atomic Energy Act, no automatic declassification is scheduled. 
The data are stored on a variety of media in addition to the many paper records: silver disk, oscilloscope tracings, magnetic tape (not stored as required for good archiving practice), and photographic film. All the motion pictures are on safety film. Most of the holdings are in an environmentally controlled facility in Santa Barbara, California.

The data of interest to the physical science community cover such topics as:

  • Nuclear test detection, surveillance, control, weapon safety, and security.

  • Survival, vulnerability, hardness of military systems, civilian facilities, electronic and structural components.

  • Phenomena associated with the fireball, shock waves, fallout and radioactivity, electromagnetic pulse, atmospheric ionization, and electromagnetic propagation.

  • Weapons output (gamma rays, neutrons, x-rays, infrared, ultraviolet, thermal, and visible radiation and debris), radiation transport, cross sections, and shielding.

  • Research programs and test equipment.

As an example of the types of scientific holdings, the electromagnetic pulse data include some 1000 digital records of U.S. and British surface, near-surface, and high-altitude events plus scope traces and digital records of electromagnetic fields, currents, and voltages.

The DOE holdings include a Project Officer's Report for each test, which includes such information as experiment objectives and justifications, a summary of results, and conclusions. Additional holdings include much more detailed information about the experiments, such as drawings, photographs, and raw-data recorded waveforms, and even some exposed hardware. Background information has also been preserved in the form of research reports about the development of experiments, calibration details, and other aspects. However, there are still questions about how well these data will be preserved for long-term use by investigators unfamiliar with the original tests; a project called Test Information Preservation is beginning to address these issues.

DASIAC requests that metadata accompany all data submissions, but it accepts data with a range of documentation. For many purposes the agency considers the final test reports, with their documentation, to be the definitive data in the collection. A quandary exists in that many record disposition schedule instructions call for the destruction of experimental data after the results have been incorporated into the final test report.
With the retirement of Cold War scientists and the downsizing or phase-out of the nuclear testing program, increased efforts are needed in the near term to preserve the data that exist scattered in DNA, DOE, and elsewhere. DASIAC recognizes that this is important and has a working plan, the Data Archiving and Retrieval Enhancement (DARE) project, for addressing it. One of the DARE goals is to locate data and prioritize actions for their long-term preservation. For instance, DASIAC currently is digitizing air blast, electromagnetic pulse, and thermal records for better access and storage.

Panel comments

It is clear that the data from nuclear weapons tests are extensive and will be of significant future scientific and historical value. The tests will almost surely not be repeated. Furthermore, there appears to be some urgency in acquiring, cataloging, and preserving these data (those held by both DOE and DNA) while they are still intact and those who know about them are available to identify them.

Compiled Data

A number of federal agencies, particularly the Departments of Defense and Energy and the National Institute of Standards and Technology, have invested, over time, significant resources in building up collections of critically evaluated data. These collections are concentrated in the areas of physics, chemistry, and materials sciences. Many of these data are evaluated, archived, and disseminated by dedicated “centers” that serve both specialists and the broader scientific community. A major concern to the panel in considering these data collections is how the data and the underlying documentation can be preserved and made accessible if the “centers” producing them lose their funding or expert personnel. This concern is a side effect of the Departments of Defense and Energy “downsizing” their activities.
The example of the Metal and Ceramics Information Center, a DOD Information and Analysis Center run by Battelle in Columbus, Ohio, which was closed down and its holdings transferred to CINDAS (cf. example 4 below), is an illustration of the issues that arise when a database closes down.

Example 2: JANAF Thermochemical Tables¹

The Joint Army-Navy-Air Force (JANAF) data were funded for many years by the U.S. Air Force Office of Scientific Research, the Department of Energy, and the National Institute of Standards and Technology. In addition to their scientific importance, the JANAF tables may have historical value as their early years indirectly document the U.S. push in science following the Sputnik launch. The JANAF thermochemical tables provide recommended temperature-dependent values for chemical thermodynamic properties of inorganic substances and for organic substances containing only one or two carbon atoms. The tables cover the thermodynamic properties over a wide temperature range with single-phase and multiphase tables for the crystal, liquid, and ideal gas state. The properties tabulated are heat capacity, entropy, Gibbs energy function, enthalpy, enthalpy of formation, Gibbs energy of formation, and the logarithm of the equilibrium constant for formation of each compound from the elements in their standard reference states. Starting with the third edition, all values are given in SI units and are for a standard-state pressure of 100 kPa (1 bar). Each tabulation is accompanied by a critical evaluation of the literature upon which the thermochemical table is based. Literature references are given (Chase et al., 1985).²

1   This section draws heavily from the preface (p. ix), abstract (p. 1), introduction (p. 4), and reprinted preface to the previous edition (p. vii) of Chase, M.W., Jr., et al., “JANAF Thermochemical Tables, Third Edition,” J. Phys. Chem. Ref. Data, vol. 14, suppl. 1, 1985.

Between 1955 and 1958, severe difficulties were encountered by individuals attempting to conduct rigorous performance calculations for propellant systems that gave multiphase combustion products characterized by complex chemical and thermal equilibria. Several of these individuals approached the Armed Services requesting that a group be assembled to assess the validity of the calculation methods and thermochemical data which were being employed at that time.³

In January 1958, following the Sputnik launch, the Armed Services jointly instructed the Solid Propellant Information Agency to organize the Joint Army-Navy-Air Force Ad Hoc Panel on Performance Calculation Methods and Thermodynamic Data. This panel, which consisted of 38 representatives of military facilities, defense contractors, and research organizations, terminated its operations in June 1959, with a recommendation that future activities be handled by a smaller working group. An additional recommendation was that the working group initiate the establishment of a thermochemical data compilation, evaluation, and dissemination program utilizing the available personnel and facilities of the Dow Chemical Company.
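As a brief aside on the tabulated quantities listed earlier in this example: two of them, the Gibbs energy of formation and the logarithm of the equilibrium constant for formation, are tied together by the standard thermodynamic identity Δ_f G° = −RT ln K_f. The following is a minimal Python sketch of that relation, not a procedure taken from the tables themselves. The function names are hypothetical, the gas constant is the current CODATA value rather than whatever value the 1985 edition used, and the numerical input is illustrative only.

```python
import math

# Molar gas constant in J/(mol K). Assumption: the CODATA 2018 value;
# the published tables may have used a slightly different constant.
R = 8.314462618


def log10_kf(delta_f_g_kj_per_mol: float, temperature_k: float) -> float:
    """Return log10 of the formation equilibrium constant K_f.

    Uses Delta_f G = -R * T * ln(K_f), rearranged to
    log10(K_f) = -Delta_f G / (R * T * ln 10).
    """
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    delta_f_g_j = delta_f_g_kj_per_mol * 1000.0  # kJ/mol -> J/mol
    return -delta_f_g_j / (R * temperature_k * math.log(10.0))


def delta_f_g_from_log10_kf(log10_k: float, temperature_k: float) -> float:
    """Inverse relation: recover Delta_f G in kJ/mol from log10(K_f)."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return -log10_k * R * temperature_k * math.log(10.0) / 1000.0


if __name__ == "__main__":
    # Hypothetical input for illustration, not a value from the tables.
    print(log10_kf(-100.0, 1000.0))
```

Because the two columns are redundant in this sense, a check of one against the other is one simple way to detect transcription or retrieval errors in an archived copy, which is relevant to the earlier point that error-free retrieval is critical for this kind of data.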
On September 1, 1959, the JANAF Thermochemical Panel was formed under the sponsorship of the Bureau of Naval Weapons, Department of the Navy; Office, Chief of Ordnance, Department of the Army; Air Research and Development Command, Department of the Air Force; and the Advanced Research Projects Agency, Department of Defense. The JANAF Thermochemical Panel Membership, which consisted originally of approximately 15 individuals with special experience in the technological subject, reviewed and formally initiated the project at the Pentagon on November 16, 1959. The sense of urgency created by the launch of Sputnik resulted in a requirement that as large a set of consistent tables be assembled as quickly as possible. Computer programs had to be developed, and at first it was not possible to adequately assess the input information for every table. When the original data were evaluated, the table was printed on white paper; otherwise the table was printed on gray paper. By the end of 1960, the first set was ready for distribution to some 1000 qualified recipients. For a number of years, at the end of each quarter a supplement was issued that contained additional tables and revised tables. Some of the gray tables were revised to white tables. By June of 1971, the tables project was sponsored by the U.S. Air Force Office of Scientific Research (AFOSR) and prepared under the advice and assistance of a thermochemistry group, referred to as reviewers. In late 1976, the U.S. Energy Research and Development Administration (now the Department of Energy) joined in sponsoring the tables, with an expanded list of reviewers. AFOSR and DOE jointly funded this activity until late 1982. After 1982, AFOSR was the sole sponsor for a number of years. The tables, funded through AFOSR, were intended originally for calculating performance of thermochemical reactors, such as rocket engines. 
The computation of such performance figures as thrust and exhaust temperature for a rocket requires data such as those in the tables.⁴ The DOE and its predecessor agencies (Energy Research and Development Administration, Office of Coal Research, and Bureau of Mines) became interested in the JANAF thermochemical tables because the tables had become the benchmark for thermochemical data for performance calculations. However, the reactions and reagents of interest to DOE differed from those of primary concern to the U.S. Air Force (USAF). The DOE needed performance calculations for reactors that are of interest in fossil fuel research, e.g., air pollution control equipment, automotive internal combustion engines, coal gasifiers and furnaces, fuel cells, liquefaction reactors and their catalyst structures, and magnetohydrodynamic generators. The calculations needed for DOE research had as their basis the same mathematics and physical chemistry required for the performance calculations of USAF interest, but DOE needed tables for more and different chemicals. For example, a rocket fuel probably would not be formulated to include silicon and sulfur, but all known coals contain both these elements.

The published second edition of the tables called attention to some of the reasons for the phenomenal success of the JANAF thermochemical tables in achieving the initial limited objective of providing the standard data for the chemical rocket propulsion industry, and later, upon publication, worldwide recognition as thermodynamic reference data of the highest quality and timeliness.⁵ First, and most obvious, there was the selection and continued support of a highly competent evaluation team, themselves engaged in a broad spectrum of thermodynamic research. Moreover, the group remained productive in spite of many battles to retain continuing support, and the actual sharp reduction of funding just before publication of the second edition.

2   This paragraph adapted from Chase et al., op. cit., p. 1.
3   This paragraph and the next four were adapted from Chase et al., op. cit., p. 4.
4   This paragraph and the next were adapted from Chase et al., op. cit., p. ix.
5   This paragraph and the next three were adapted from Chase et al., op. cit., p. vii.

A second important factor was the unusual approach to format, evaluation, and distribution of the tables. The primary distribution was in frequently issued loose-leaf supplements. Each previously issued table could thus be revised as often as necessary to take account of improved data. Each loose-leaf table was accompanied on its reverse side by a complete explanation of the selection of the key data, together with all references.

The third vital distinction of these tables was the existence of a continuing cognizant working group composed of technological users of data, thermodynamicists, and government sponsors of both research and development. Independent prepublication review of the tables was an important contribution, but the annual technical meetings resulted in even more far-reaching benefits. Together, the users and generators of data established realistic priorities for the species to be included in the tables; at the same time experimental research was guided by the demonstration of absence or inadequacy of needed data.

The tables have been published by the National Institute of Standards and Technology (NIST) and are widely available in technical libraries. The documentation on which the tables are based (the catalogs of reactions and data employed, reprints and photocopies of published and unpublished literature and reports, worksheets, computer programs and input and output files from the computer processing) is held as microfilms, microfiche, backup tapes, 40 to 50 electronic diskettes and magnetic records, and hard-disk files in various formats. The annotated literature documents are especially valuable for future use.
It also should be noted that the files are completely separate from those of the NIST thermodynamic tables produced in the same department. Since 1959 approximately one person year per calendar year (i.e., ca. 34 person years, equivalent to perhaps $6 million in today's dollars) represents the scientific and professional manpower that has gone into the project. To replace the files would require at least an equal and probably a greater amount of work. Currently, the effort devoted to this project is about $100,000 per annum, with three scientists working on the project part time. All of the funding comes from the Navy or NASA.

There are two primary reasons to archive the JANAF tables and their associated data and documentation. The first is historical value. Because these tables were created to support the defense-related selection of rocket fuels, they may be compared with Soviet endeavors in the Sputnik era to show the ebb and flow of interest in particular fuels as a function of time (provided that one has access to the sets of gray and white tables in the sequence they were issued). Earlier interesting parallels between the USSR and USA interests in such fuels can be delineated. An equally important reason to archive these tables is to meet the scientific needs of primary and secondary users of chemical thermodynamic data.

Panel comments

The above description and history illustrate the scientific value of the JANAF tables, about which the editor of the Journal of Physical and Chemical Reference Data wrote,⁶ “Of the almost 300 titles we have published in the last 14 years, the four supplements to the JANAF Thermochemical Tables . . . have probably been the most widely distributed and used.” There are some compelling reasons for immediate concern about preservation of the files, documentation, and data underlying these tables.
One is anticipation of a need to extend, for example, the temperature range of the tables, or significant portions of them, to higher temperatures at some future date. Extension would be greatly facilitated by the preservation of the working papers. Whether this should happen in 10 or 50 years is debatable, of course, but the point is that these tables, unlike the more recent CODATA tables (Garvin et al., 1987), do not provide built-in and published “reaction catalogs,” and all the original work, if not carefully preserved, would have to be repeated since it is unpublished and therefore at present not protected by NIST. This would involve both delay in the endeavor and high cost. Physico-chemical studies and interest even today often exceed the 6000 K temperature range of the published JANAF tables. Users working at much lower temperatures also may need the unpublished documentation to interpolate the translated results, or to evaluate the impact of later measurements.

Another concern about the current array of supporting data and documentation, in an atmosphere in which thermodynamics has low market value, is that even before the retirement or reassignment of the present part-time compiler, someone may need the space occupied by the JANAF documentation. The files then get relegated to “storage” and are lost in being displaced. Hence, proper archiving is needed not only for the long term but for the present as well.

6   J. Phys. Chem. Ref. Data, vol. 14 (1985), suppl. 1, p. iii.

Example 3: Evaluated Neutron Cross Sections

The National Nuclear Data Center (NNDC) at Brookhaven National Laboratory is funded by the DOE. The nuclear physics data that are compiled and evaluated by centers around the United States and the rest of the world have been made available to users since 1967. Numerous data files are now available online at the Brookhaven center.

The different data sets at the NNDC are the computer index to neutron data, nuclear structure references, computerized experimental nuclear reaction data, computerized evaluated nuclear data, photo-atomic data, and medical internal radiation doses. The compilation and evaluation of the data that have gone into these data files has taken at least 200 person years of effort. There are over 3 million 80-character records that make up the data files. The users of these data are involved in nuclear technology, such as reactor design, waste management, and nuclear safeguards; in space exploration, where shielding the inhabitants of space stations from cosmic rays is needed; in applied physics work such as nuclear medicine, oil and mineral prospecting, airport security and archaeology; and in basic research in low- and medium-energy nuclear physics.

Panel comments

The funding for this center has been decreasing as the United States moves away from nuclear energy as a power source. However, the information collected at the NNDC should be preserved since there is a strong possibility that nuclear energy will be the predominant power source in the next century.

Example 4: Center for Information and Data Analysis and Synthesis

The Center for Information and Data Analysis and Synthesis (CINDAS) at Purdue University, funded by the Department of Defense, evolved from the Thermophysical Properties Research Center (TPRC). TPRC was started in the 1960s by Y. Touloukian in one of the earliest efforts to collect and organize the expanding literature on materials properties. The staff of TPRC systematically searched the world literature for all relevant papers and built an indexed bibliographical database that permitted easy location of all papers reporting data on any specific property.
This database was used as a resource from which specific data sets were extracted for further critical analysis. Experts evaluated the reported measurements, compared the results with theory, and attempted to select the most reliable values. Interpolations and extrapolations were made when possible. Thus, the center's functions included both analysis of existing data and “synthesis” of data that had not actually been measured; i.e., the objective was to extract the maximum information from the cumulative results of all the research on a given topic. In this respect, TPRC was typical of several dozen “data evaluation centers” established in the 1960s and 1970s with federal government support, located at federal laboratories (particularly the National Bureau of Standards, now NIST), universities, and the DOE national laboratories.

TPRC later expanded its technical scope to include topics such as electronic and optical properties, properties of rocks, and mechanical properties of metals and ceramics. In keeping with the broader scope, the name was changed to CINDAS. The same basic approach (collect and organize the literature, extract the data, and select or synthesize the most reliable values) was applied to each new technical area.

In its early days, TPRC received most of its funding from DOD and the National Bureau of Standards, with some support from the Atomic Energy Commission (later DOE) and other federal agencies. In the 1980s, financial support for this type of activity diminished sharply. CINDAS was forced to rely on DOD for most of its support, and its priorities were set by military needs. Its character gradually changed from providing a service to the broad technical community of the country to carrying out specific tasks for DOD.

CINDAS has extensive files in electronic form, accumulated over a generation.
These fall into three categories:

  • Bibliographic files that are indexed and annotated so as to permit efficient retrieval of references on any desired combination of materials and properties;

  • Files of raw numerical data extracted from the literature, along with the pertinent metadata; and

  • Files of evaluated data.

The last category is the smallest in bulk. Also, much of the evaluated data has been published in books and journals, and thus is available through many technical libraries. If CINDAS were to disappear today, the evaluated data they have produced would still be accessible. However, just as with the JANAF tables, the bibliographic and numerical files that they built up at substantial cost in terms of dollars and intellectual effort still have significant value, and that value will remain for many decades. If in the future a need appears for properties of a material covered by the CINDAS files, it will be far less costly to use these files than to carry out a complete new literature search. Stated differently, CINDAS has organized an immense amount of technical information into these files and has extracted what was needed to carry out specific tasks deemed important today. Other tasks will undoubtedly appear in the future for which these files will be an extremely valuable resource.

Example 5: Radiation Chemistry Data Center

The Radiation Chemistry Data Center (RCDC) at Notre Dame University is another example of the data evaluation centers started in the 1960s. This center is a part of the University's Radiation Chemistry Laboratory and receives most of its financial support from DOE and NIST. Its original focus was on the chemical effects of ionizing radiation: the reactions that occur and the products produced when x-rays and gamma rays interact with chemical substances. Over the years the scope has been extended to cover chemical kinetics and photochemistry more generally. The spectra of short-lived chemical species, solar energy utilization, and biomedical aspects of radiation chemistry now are included in the coverage.

As is the case with CINDAS, the RCDC systematically searches the world literature for papers in its scope of interest and indexes these papers according to a thesaurus that it developed for this purpose. The resulting electronic database now contains extracts from over 125,000 papers. It represents a comprehensive coverage of the world literature in this field back to 1966, with some earlier papers. The database is used to produce a printed service, the “Biweekly List of Papers on Radiation Chemistry and Photochemistry,” which is widely distributed in the pertinent research communities. Annual indexes are also distributed.

The data evaluation function occurs in the data center as well as in other laboratories. The RCDC has developed working relationships with a number of chemists throughout the world who use its bibliographic database as source material for preparing evaluated data compilations and critical reviews. The detailed indexing permits the center to provide customized reference lists to chemists who have agreed to prepare a compilation or critical review. This takes a major burden off the evaluator. The center also enters the numerical data resulting from the evaluations into electronic databases, which can be called up to support other evaluations and reviews. Updates of previous compilations, taking into account new literature, are readily accomplished because of the systematic management of the information.
Panel comments on CINDAS and RCDC

Both CINDAS and RCDC maintain electronic databases that represent the accumulated results of 30 years of intellectual effort and many millions of dollars. These are national resources, paid for by the public through taxes, which have lasting value. However, the financial support of centers of this type is often very tenuous. Although there is no reason to believe the centers chosen as examples are in danger of being closed, other data centers have had to close when their support was terminated. Funding decisions by federal agencies are sometimes dictated by factors that have nothing to do with the long-term value of the work.

If a data center is forced to close because of loss of funding, the disposition of its electronic files is a serious problem. While the laboratory or university where the center was located may agree to store the files, such arrangements tend to be done on an ad hoc basis. As staff retire or die, the historical memory is lost, and there is little chance that the files will be maintained in usable condition over the long term. However, there is a good chance that the value of the files will remain, even on a 100-year time scale. The importance of the field may reemerge when data are needed in connection with some new technological development. The history of science and technology contains many examples where a field once considered stagnant suddenly becomes interesting again because of an unanticipated breakthrough.

In any case, preserving files when a data center (or a military base) closes requires recognition that archiving of records takes financial resources and manpower, and that these must be available at the time the records are to be stored. If a data center must close, NARA may be the most appropriate agency to preserve the files of potential long-term value.
The files in question here are far smaller than those resulting from satellite monitoring or other environmental measurements, and the formats are generally simple. In comparison with geophysical databases, these databases are minuscule. However, the potential for future use may be very high because they represent so many years of intellectual effort.

Engineering Data

There also is a need to preserve unique, extensive collections of engineering data that involved a massive effort to collect. These would be virtually impossible or prohibitively expensive to reproduce. Such collections generally are complex numeric data, not textual data that are readily understood and searched by nontechnical personnel. The following example is representative of collections of mechanical properties data for engineering materials. Other examples include the measurements at Hanford discussed in Section 2, polymer and sensor data from the Strategic Defense Initiative (SDI), test results published by the Electric Power Research Institute on the toughness of steels that were put into a digital database by the Materials Property Council, and the superconducting materials measurements that were carried out to develop magnet fabrication techniques for the Superconducting Supercollider (SSC). Even though the SSC was not completed, the materials measurements should be saved.

Example 6: Aluminum Fracture Toughness Data Bank

Test results documenting the dependence on several variables of plane-strain fracture toughness of aluminum alloys used in critical aerospace applications were pulled together by the Aluminum Association and the Metals Properties Council in the years 1970-1985. These were subsequently made available in electronic format by the National Materials Property Data Network, Inc. with support from the NIST Standard Reference Data Program.
The information content includes description of material, including alloy, temper, product form, specimen orientation, producer, baseline tensile yield strength, and modulus of elasticity; results of complex tensile tests of

OCR for page 1
Study on the Long-term Retention of Selected Scientific and Technical Records of the Federal Government: Working Papers notched uniaxial specimens, including notch-yield ratio; results of plane-strain fracture toughness tests including degree to which all 13 criteria of validity were satisfied. Detailed data are quite complete for individual tests, and unique in that degree of validity of each individual test result is clearly shown, in accordance with ASTM Test Method E-399. There are about 30,000 test data sets, each representing about 50 individual pieces of information. All data are raw test results. The data are used to judge average and statistical distribution of toughness of aluminum alloys for critical applications, and to estimate the relationships among different test indices and their relative value for assessing fracture resistance. Panel comments Because the cost-effectiveness of making such data commercially available is marginal, this massive amount of information representing many commercial and military structures may someday be lost unless it is archived by the government. These examples and the earlier discussion illustrate a number of aspects of the burgeoning data of physics, chemistry, and materials sciences stored in electronic form. The data are very diverse in type and content; scientific data are not only sets of numbers. Considered as data in the information science sense, most of the data sets are small. Finally, there are many ways in which valuable, or useful, data in the reproducible laboratory sciences can be nonreproducible, and, if not preserved, irretrievable. NARA's Laboratory Data Holdings While picking illustrative examples and considering the relation of the above examples to the National Archives and Records Administration, the panel asked about the physical sciences data in NARA's Center for Electronic Records. 
Of the thousands of holdings in the Center, very few are related to the physical sciences and apparently none contain scientific data of the kind represented by the above examples. NARA's policy has been to acquire laboratory data only if they are of high historical value, under the presumption that the data would be repeatable. The decision whether or not to archive has not been influenced greatly by how often the data are or will be accessed. NARA's current electronic holdings of laboratory science records are limited to such items as a National Register of Scientific and Technical Personnel; a 1971 survey of scientists and engineers; and records of the investigation of the “Challenger” accident (NARA, 1992b). Paper holdings are somewhat more extensive and include such items as historical records from the Lawrence Berkeley Laboratory, including scientific notes from Nobelists McMillan, Alvarez, and Calvin and from some major high energy physics experiments; computations, calibrations, and comparisons on weights and measures from the National Bureau of Standards; reactor and reactor safety records from Argonne National Laboratory; and environmental contamination and toxic substance exposure records from the Atomic Energy Commission's Idaho Operations Office. The holdings illustrate that the agency is not yet playing a significant role as a repository for scientific data, especially not of electronically recorded data. A study performed in 1991 for NARA by the National Academy of Public Administration (NAPA) examined a large number of databases for possible consideration for accessioning by NARA (NAPA, 1991). 
When the panel went over the approximately 300 databases listed in Appendix C of the NAPA report, it found six titles that might be similar to the kind being considered here: USNRC Nuclear Materials Management and Safeguard System (item 183); DOD Naval Environmental Protection Support Service Data Files (item 289); EPA Environmental Monitoring and Assessment Program (item 369); NSF Academic Research Equipment in Selected Science/Engineering (item 385); USNRC Reactor Safety Data Bank (item 185); and DOE Test Data Database (item 935).

The fact that NARA does not currently play a role in the preservation of scientific data sets does not mean that it has no role to play. However, to take on this responsibility, NARA will have to operate somewhat differently than it has in the areas it has served well in the past. The remainder of the panel's report deals with the issues of what to save, who should save it, the challenges and opportunities that arise because of involvement by multiple institutions, and how NARA might modify its methods if it is to serve future generations most effectively as a preserver of scientific data. Just as the traditional means of transmitting and storing scientific information (technical journals and libraries) will have to change to adapt to new mechanisms of doing science and of scientific information transfer, so will the approaches to archiving and mechanisms of long-term preservation.

5 DATA RETENTION AND RECORD PRESERVATION CRITERIA

What data are worth keeping for a long time? Data can have long-term retention value either because of the difficulty of reproducing them (e.g., nuclear test data, data from obsolete accelerators, materials property data) or because of the effort put into collecting or processing them (e.g., extensive critical compilations). Archivists should be able to apply these criteria of retention value with a modest amount of outside input. When determining whether scientific data should be preserved, the panel suggests a few simple questions (analogous to those addressed before a scientific paper is published in an archival scientific journal):

  • Do the originators or current holders of the data think the data are of sufficient long-term value to be preserved as part of the archived scientific record?

  • Have they demonstrated this by annotating the data and by providing a written description sufficient for others to use the data set; i.e., have they made the effort to provide the necessary metadata? As noted earlier, this step is analogous to preparing research results for publication.

  • Have the data and metadata been certified by peer review or the appropriate equivalent? This review would attest to both the quality and value of the data set.

  • If the data are not preserved, but are needed in the future, could they be reproduced at reasonable cost?

If the organization considering preservation does not have the in-house expertise to answer these questions, it can easily request recommendations from outside referees as is done in refereeing technical articles and research proposals.

The Current System Is Effective Most of the Time

The above questions are modeled purposely upon those that have proven useful to the scientific community in preserving printed records over the past 300 years. The panel believes that a change in medium requires adapting and modifying the methods that have worked for science, not throwing them out and starting anew.
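The four questions listed earlier in this section lend themselves to a simple checklist. The sketch below, in Python, is illustrative only: the class, field, and function names are hypothetical, and the rule that every answer must favor retention is an assumption, since the panel does not say how the answers should be weighed against one another.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Appraisal:
    """Answers to the four questions for one candidate data set (hypothetical names)."""
    holders_judge_it_valuable: bool        # Question 1
    metadata_sufficient_for_reuse: bool    # Question 2
    peer_reviewed_or_equivalent: bool      # Question 3
    reproducible_at_reasonable_cost: bool  # Question 4


def reasons_against(a: Appraisal) -> List[str]:
    """List the answers that argue against long-term preservation."""
    reasons = []
    if not a.holders_judge_it_valuable:
        reasons.append("holders do not judge the data to be of long-term value")
    if not a.metadata_sufficient_for_reuse:
        reasons.append("metadata are insufficient for others to use the data")
    if not a.peer_reviewed_or_equivalent:
        reasons.append("data and metadata have not been peer reviewed")
    if a.reproducible_at_reasonable_cost:
        reasons.append("data could be reproduced at reasonable cost")
    return reasons


def is_candidate(a: Appraisal) -> bool:
    # Assumed rule: a data set is a candidate only if no answer argues against it.
    return not reasons_against(a)


# Hypothetical data set: valued and documented, but not yet reviewed.
example = Appraisal(True, True, False, False)
print(is_candidate(example))
print(reasons_against(example))
```

Returning the reasons rather than a bare yes or no mirrors the refereeing analogy: an appraiser, like a journal referee, would normally report which criterion was not met so that the holders can remedy it.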
Most of the data sets that contain accumulated knowledge about natural phenomena and measured properties will continue to be useful to and maintained by the scientific community for a long time. Their value will diminish only if and when theory advances sufficiently that such items as material properties can be computed rapidly, as needed, from first principles. As already noted, the scientific community's mechanisms for maintaining, updating, and disseminating such data are, on the whole, quite satisfactory. However, two areas that are of particular importance for electronically stored data sets and that are worthy of considerable attention are (1) establishment of an effective locator system so a potential user can determine in a short time whether searched-for data exist, where they are, and how to access them; and (2) an ability to identify and preserve data sets (meeting the above criteria for preservation) that might otherwise be lost because the responsible institution is no longer able or willing to maintain them.

6 SUGGESTED ROLE OF THE NATIONAL ARCHIVES AND RECORDS ADMINISTRATION

As discussed in Section 1, the panel's answer to the question of who should save scientific data is that these data should, whenever feasible, be saved by those institutions best equipped to make them accessible to the primary users of the data, the scientific community. In most cases, this means primary responsibility will continue to be held by technical libraries, government agencies, and professional societies that currently archive and make accessible scientific and technical data, records, and publications. In addition, in response to the third question raised in Section 1, the panel suggests that there is a helpful, enhanced role for NARA to play in the preservation of data from physics, chemistry, and materials sciences.

NARA might provide:

  • A repository of last resort;

  • A focus for interagency cooperation and communication;

  • A locator system;

  • Collaborative standards; and

  • Education and assistance with preservation and archiving.

To meet these needs it is necessary that NARA modify some of its practices regarding:

  • The definition of archivable (and perhaps of “secondary user”) for scientific data;

  • Data formats and storage media;

  • Customer orientation as opposed to rule promulgation; and

  • Distributed versus centralized record holdings.

To meet these needs adequately, it may also be necessary that NARA attain new competencies in:

  • Scientific and technological disciplines; and

  • Ability to provide rapid access to data.

Need for a “Repository of Last Resort”

There would be value in NARA or another federal agency keeping a watchful eye on scientific data of national importance. In the rare event that a federal scientific data set (that the scientific community feels should be stored long-term, and for which there is no other home) is threatened with disappearance, the panel sees a need for a “repository of last resort” to preserve valuable data that would otherwise be discarded. The JANAF and CINDAS examples illustrate the panel's concern. This responsibility may be appropriate for NARA. If a data set is about to be abandoned, NARA should have it appraised and consider accessioning it if asked by the holding agency or, if the data set is about to be destroyed, by scientific advisors. Perhaps, if NARA decides not to accept a threatened database, there should be a mechanism for appeal. In the role of backup agency, NARA should, at a minimum, store databases that would otherwise be lost from lack of funding or interest. If NARA stored a database, even without being able to maintain it, there would at least be a snapshot of the database contents in case the compilation effort were ever started up again. For example, if one of the 14 DOD contractor-operated Information Analysis Centers (IACs) is threatened with losing its funding and closing, there should be a systematic way of ascertaining this and notifying NARA. Typically, this information spreads only by word of mouth, and, at best, a stopgap solution is found. An early warning to NARA would be better, so that NARA could determine whether to hold the data, even if only temporarily, until interest might grow again and result in funding.
IAC products include handbooks, data tables, and descriptions of work done to produce or evaluate the data. Much of this information eventually ends up in the Defense Technical Information Center (DTIC), but the IACs do keep some separate sets. There also can be a discontinuity when a contractor is changed. Similar problems face data centers sponsored by other federal agencies. Data holdings of such centers fit NARA's definition of federal records.

Broadening the Definition of “Archivable” and of “Secondary User” for Scientific Data

One criterion for NARA's archiving of a record is that it be “of interest to more than one secondary user for more than just the existing season” (Peterson, 1993). On the whole, scientific data are being adequately kept by the community, the national laboratories, the journal system, and the professional societies, and there is not a major need for NARA to play a role. NARA's current priorities, which emphasize scientific data that are of more interest historically than for the originating community, have generally been adequate, but data sets should not be immediately excluded from consideration if they are of interest to only the primary users. Perhaps NARA should consider adopting a fairly broad interpretation of “secondary user” in the case of science. As a hypothetical example: if the JANAF database were archived and researchers in 2050 decided to extend those tables into a new temperature range, the current compilers might consider those users secondary rather than primary users. Even though the future scientists are from the same research community that produced the original data, they would be extending the value of the data in the way a secondary user does, rather than accessing just the original values in the way envisioned by the compilers for their primary users.
The panel thus suggests that a fifth question be added to the four in Section 5 to determine whether a data set should be accessioned by NARA: If the data are not preserved by NARA, will they be lost?

More Flexibility on Data Formats and Storage Media

Greater involvement in the archiving of scientific data would mean that NARA would be serving a community new to it. The scientific community uses, in addition to conventional modes, forms of communication quite different from the normal language and numerical tabulations used by NARA's current clients. Mathematical expressions, graphics, and schematic structural data, as well as complex, multivariate, numerical tabulations, are integral components of scientific data sets. The forms in which these are expressed and stored may be, in many cases, unfamiliar to NARA. Furthermore, these forms are changing rapidly because they are still being developed. In this situation, not only must NARA be able to talk knowledgeably with its clients about these formats, but NARA's acceptable data formats need to accommodate these different types of data.

The current acceptance criteria do not allow for electronic formats compatible with much scientific information, so that broadening the range of acceptable electronic forms is mandatory. For instance, NARA's current standards for electronic records do not allow for the accession of image data, in part because NARA cannot support all the graphic formats and software available. Likewise, NARA does not accept CD-ROMs and optical disks because they are not yet standardized. NARA does not want to keep a “computer museum” to accompany its data holdings. The needed graphic standards are not available, and this lack negatively affects the way science is done (much time is spent transferring data among media) and precludes safe long-term storage of some data. Clearly, both the scientific community and NARA would benefit if there were enough progress on media standardization to facilitate the interchange and safe preservation of image and graphic data.

NARA's restrictions on electronic records specify only ASCII or EBCDIC characters, so scientific symbols, Greek letters, chemical formulas, etc. are not allowable without clumsy substitutions. We urge NARA, at a minimum, to loosen its restrictions on this, for instance by allowing internal control characters. A manageable number of informal standards will likely emerge as Internet exchanges become more common, at which time NARA can concentrate on and support just a few standards (e.g., for different word processing systems).

Should NARA have the responsibility of setting standards for electronic records?
Certainly not by itself; this is an area that requires collaboration and cooperation among many players. NARA's mission and tradition have not made it a leader in this area. Agencies other than NARA with primary missions involving the retention of electronically stored data have had better opportunity to devote staff and energy to these issues. NARA also will need to devote more staff and energy to the technical aspects of media and formats to be able to cooperate effectively with other agencies, institutions, and committees dealing with electronically recorded data so that NARA standards (or better yet, NARA performance criteria, as discussed below) reflect those of the field at large and of the users of the preserved data.

It would be very unfortunate if irreplaceable records were lost because of unwarranted conservatism regarding “archivable” data formats. NARA needs to look at the data conversion issue and to be more flexible about the data records that it will accept. For instance, Lotus 1-2-3® is a commonplace de facto standard that is also useful for some scientific data applications. Standard Lotus data files (WKS and WK1) are readable by most competing software programs, but NARA will not accept them. The panel encourages NARA's Center for Electronic Records to be more responsive and flexible in this regard. Considering the growing volume and importance of electronic records, a significant infusion of resources will be needed, and perhaps some resources should be shifted from the paper side of the house to the electronic side.

New criteria for acceptable data formats would best be constructed with a high degree of flexibility, because of the changing ways in which scientific data are represented and stored. To this end, the panel encourages NARA to work with the scientific community to develop a set of performance criteria, rather than locked-in format standards, which would define acceptable formats for data acquisition.
As noted in the discussion of storage media in Section 3, the performance criteria will have to address the underlying need for accessibility, suitability, longevity, and stability of the storage medium. Changing from inflexible standard formats to flexible performance criteria will be a challenge, but the impact on the archiving of scientific data, and of other data as well, is likely to be very large indeed.

In summary, the panel is concerned that current NARA format and media requirements are unnecessarily restrictive. One guest of the panel observed that the current restrictions on electronic records are analogous to requiring all paper records to be typed double-spaced on 8.5″ × 11″ white rag paper using pica type and a carbon ribbon. NARA should be encouraged to accept more physical and electronic formats and recognize widely utilized data formats. The major concern should be the need for preservation and the usability of the archived data by future generations. The panel emphasizes that NARA's record transfer standards should not be allowed to constrain the archiving of data that otherwise would qualify for archiving.

Scientific and Technical Competence

The management and distribution of scientific data are different from those same responsibilities for data where the principal interest is historical. Access, evaluation, and updating, for example, are aspects in which the differences are so great as to be qualitative. If NARA is going to include scientific data in its collections in ways that meet the needs of the scientific community, it needs to build a staff with knowledge and skills in this different kind of data management. The present staff of NARA simply has different skills and a different orientation. Archiving of scientific data will require expanding the staff, not only expanding the physical capabilities. The new staff will have to be knowledgeable about the nature of the data at the scientific agencies, including numeric and structured data, and the differences between these kinds of data and textual or bibliographic data. They will need to understand and be sympathetic to the requirements of scientific users, including the kinds of access the scientific community needs, and the forms technical data can take when they are made into databases and stored. They will need to be able to discuss with scientists not only the decisions about storage of specific data sets, but also the nature of the metadata required for the data to keep their value.

Experience with materials science and engineering databases used in on-line searching has shown that nondisciplinary reference libraries and professional information specialists do very well in searching textual and bibliographic databases in the sciences, but have considerably more difficulty in understanding and searching numeric scientific databases. As noted previously in this report (cf. Section 2), numeric databases, of the kind most common in engineering, are complex in that (a) there are many numeric independent variables to be dealt with, (b) all properties have units that must be kept aligned with their respective numeric values, and (c) unique database structures are regularly encountered.
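Point (b) above can be illustrated with a small data structure in which every number carries its own unit, so that the two cannot drift apart when a record is exported or archived. The following Python sketch is only one possible arrangement: the class names, field names, unit strings, and all numeric values are hypothetical, chosen to resemble the kind of record described in Example 6 rather than taken from any actual database.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Quantity:
    """A numeric value that is never stored apart from its unit."""
    value: float
    unit: str


@dataclass
class ToughnessRecord:
    """One test result: descriptors, independent variables, measured properties."""
    material: Dict[str, str]
    conditions: Dict[str, Quantity]  # numeric independent variables
    results: Dict[str, Quantity]     # measured properties

    def to_row(self) -> Dict[str, str]:
        """Flatten to plain text columns, writing each unit into its column name."""
        row = dict(self.material)
        for group in (self.conditions, self.results):
            for name, q in group.items():
                row[f"{name} [{q.unit}]"] = repr(q.value)
        return row


# All values below are invented for illustration.
record = ToughnessRecord(
    material={"alloy": "7075", "temper": "T651",
              "product_form": "plate", "orientation": "L-T"},
    conditions={"test_temperature": Quantity(24.0, "degC"),
                "specimen_thickness": Quantity(25.4, "mm")},
    results={"fracture_toughness": Quantity(29.0, "MPa*m^0.5")},
)

for column, value in record.to_row().items():
    print(f"{column}: {value}")
```

Writing the unit into the column name is a deliberate choice for archival export: a flat text file produced this way remains interpretable even if the software that created it is no longer available.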
Either through a network of consultants or a few full-time staff members, NARA will need to have technical expertise in major disciplines in order to organize effectively the searching of archived technical files. An analogy to NARA's need for staff with scientific experience is provided by patent law: almost all successful patent lawyers have backgrounds in science or engineering, without which they hardly would be able to function.

Interagency Cooperation and Communication

The panel encourages more interaction between NARA and other agencies, with agencies alerting NARA to databases threatened with closure and NARA then accepting the role of data preservationist of last resort for those databases. Some DOD agencies or contractors incorrectly assume that sending records to the National Technical Information Service (NTIS) fulfills their obligation to archive federal records permanently. NARA has seen instances where permanent data that should be at NARA have been discarded by NTIS within 20 years, before NARA realized the data were there and could effect the transfer. A number of panel visitors and members observed that the NARA brochures are authoritarian, legalistic, and not conducive to establishing productive partnerships with NARA. Considering the many institutions, private as well as public, with interests and expertise in scientific data and in electronic data formats, NARA's future effectiveness in archiving such information requires that NARA improve its relations with other agencies and institutions in the data world. As a corollary, none of this panel's suggestions should be construed to imply that NARA should issue additional proclamations, regulations, or standardized procedures, which would be counterproductive. The goal should be to present more carrots than sticks; for instance, to make clear to more researchers, managers, and funders the value of having and following data retention plans, with appropriate metadata.
With better communications, NARA can play the role of a “service provider,” which is desirable. NARA is already working with the DOD Legacy Resource Management Program to identify and preserve cultural resources under DOD jurisdiction, and together with Legacy, has sponsored a conference to assist military contractors in preserving their documentary heritage. The panel suggests that NARA further pursue such collaborations in the same spirit of partnership. NARA should consider setting up an in-house database to track federal holdings, especially to anticipate problems with data sets housed in other agencies that may eventually need NARA protection or other assistance. To do this effectively would require establishing a set of contacts (at the working level; i.e., with people who understand the databases in the agency collections) in other agencies. CENDI may be a starting point or model for a good communication mechanism. It is a committee composed of representatives of the information departments of major federal agencies; the acronym stands for “Commerce, Energy, NASA, Defense, Interior,” although other agencies, such as the National Library of Medicine, are represented. While CENDI itself deals mostly with bibliographic databases, it could be augmented with a numerical data group.

Need for a Locator System

There is a need for a more general locator function, or “directory of directories.” Some sort of broad locator is needed even though it is not clear that NARA is the most appropriate government agency to perform this function. The National Technical Information Service (NTIS) in the Department of Commerce may be a more appropriate candidate. During the course of this study, NTIS expanded its FedWorld bulletin board gateway into such a locator (Jones, 1993). In addition, the Office of Management and Budget has organized an interagency effort to develop a Government Information Locator Service. Nevertheless, there is a need for a NARA-maintained directory of archival data within its own system. This should include archival records maintained by other government agencies that are recognized as part of a distributed archival system overseen broadly by NARA.

Approaching the Ideal: Guidelines for the Preservation of Archivable Information

The preceding sections of this report have emphasized two needs: (1) scientific data should be preserved if they will be of value to future scientists and cannot practically be reproduced; and (2) the single major criterion in how they are preserved is that they must be usable. Other aspects to consider are the possible broadening of the definition of data to include different data types (graphics, images, multi-media, audio), formats (word processing languages, optical disks, and others that violate one or more of the NARA records restrictions), and interlinkages (such as interactions between environmental, physical, and chemical data and environmental modeling, or the way that chemical thermodynamic data can have very widespread ramifications).
Turning to some of the more technical, as well as policy, aspects of an ideal archiving system, the panel concurs with, and endorses, the recommendations made by the National Academy of Public Administration in their 1991 report (NAPA, 1991). That report contains 13 detailed recommendations for NARA in its executive summary, 11 of which are pertinent to scientific data and are reproduced in Appendix B of this volume of panel reports. The overall goal of an ideal archiving system is to “protect the interests of the next generation of researchers” (Thibodeau, 1993). In order to help protect those interests, the panel suggests that NARA improve its capabilities to be the archivist of last resort for scientific and engineering data.

7 SUMMARY CONCLUSIONS AND RECOMMENDATIONS

What data should be preserved for the “long term”? The primary criterion for determining whether a laboratory science data set is a candidate for long-term preservation is whether or not it is feasible to reproduce it. Federal data sets in the laboratory sciences that are candidates for long-term preservation can be classified into three generic types:

  • Massive records and data from an original experiment, particularly a “mega-experiment,” that there is no realistic chance of replicating, even though it is, in principle, reproducible.

  • Critically evaluated compilations of data from a large number of original sources that represent tremendous accumulated effort.

  • Unique, perhaps time- and environment-dependent, engineering data collected at federal facilities or as part of a government project (that may or may not ever be completed), much of which never reaches the published literature.

In summary, data can have long-term retention value either because of the difficulty of reproducing them (e.g., nuclear test data, materials property data) or because of the effort put into collecting and processing them (e.g., extensive critical compilations).

Archivists should be able to apply these criteria of retention value with a modest amount of outside input. When determining whether scientific data should be preserved, the panel suggests a few simple questions (analogous to those addressed before a scientific paper is published in an archival scientific journal):

  • Do the originators or current holders of the data think the data are of sufficient long-term value to be preserved as part of the archived scientific record?

  • Have they demonstrated this by annotating the data and by providing a written description sufficient for others to use the data set; i.e., have they made the effort to provide the necessary metadata? This step is analogous to preparing research results for publication.

  • Have the data and metadata been certified by peer review or the appropriate equivalent? This review would attest to both the quality and value of the data set.

  • If the data are not preserved, but are needed in the future, could they be reproduced at reasonable cost?

During the course of this study, the panel became aware of two data sets, both among the examples described in Section 4, whose preservation has acquired some urgency:

  • Results of the atmospheric and underground nuclear weapons tests carried out by the U.S. government in the period 1945-1992.

  • The files and auxiliary data accumulated by the JANAF Thermochemical Data project at the National Institute of Standards and Technology.

Who should save the data? Scientific data should be saved by organizations and in formats so that they are maximally available to the primary users, the scientific and technical community. In most cases, this means primary responsibility will continue to be held by technical libraries, government agencies, and professional societies that currently archive and make accessible scientific and technical data, records, and publications. There is, however, an enhanced role for NARA to play (cf. conclusion 4).

Scientific data management requirements. There are general requirements critical to subsequent use of data from physics, chemistry, and materials sciences, including the need:

  • for detailed metadata along with the data set. This includes the need to preserve and describe the algorithms and models used to acquire, process, evaluate, or utilize the data set. Researchers or centers would be more likely to produce adequate documentation if that documentation were recognized as a bona fide scholarly document by the scientific community and its administrators.

  • to save classified data and to ensure that they are declassified as soon as the reason for classification is no longer applicable.

  • for a comprehensive, up-to-date, easily accessible, national or international “locator system” that will enable scientific researchers to determine whether data exist and how to access them. There would be value in having one federal agency responsible for staying informed about all federal databases; that is, acting as holder of the master “directory of directories.”

  • for prompt access. Scientific researchers need to know quickly if data exist and, for some types of data, to have rapid access to the data.

  • to handle different kinds of appropriate storage media and yet to have standards so that the number of types of media is limited.

  • to handle myriad appropriate data formats and yet to ensure that data are retrievable and usable. Performance criteria for data formats are much more likely to meet these underlying objectives than are rigid format requirements that become obsolete quickly and that may not accommodate all types of technical data.

  • to consider data management and preservation, and to provide guidance to scientists as to appropriate formats for data of long-term value, during project initiation.

There is an enhanced role for NARA to play in preserving scientific data. NARA would contribute greatly by providing:

  • A repository of last resort. We suggest that an additional question, “if the data are not preserved by NARA, will they be lost?” be added to the four under conclusion 1.b in determining whether a data set should be accessioned by NARA.

  • A focus for interagency cooperation and communication;

  • A locator system;

  • Collaborative standards;

  • Education and assistance with preservation and archiving.

NARA should modify some of its practices if it is to address effectively the needs of the scientific community.
If NARA is to play an enhanced role in effectively archiving scientific and engineering data, there is a need for immediate action by NARA in making changes along the lines suggested below and in the body of this report.

  • NARA should work with the scientific community and potential sources of scientific data to develop adaptable performance criteria for data formats and media. The goal would be to meet NARA's basic need to ensure long-term usability while also enabling accession of data, such as images and structures, that cannot be accommodated by NARA's current restrictive standards.

  • NARA should pursue collaborations with other agencies to ensure data preservation and accessibility in a spirit of partnership, rather than by promulgating additional rules.

  • NARA's current priorities, which emphasize scientific data that are of historical interest, are generally adequate for laboratory sciences data, but scientific data sets should not be immediately excluded from consideration if they are of interest to only the scientific community.

  • With scientific data, where detailed knowledge about the data is often a key component of usefulness, NARA should explore and encourage distributed, as opposed to centralized, record holdings.

  • Serving the scientific community requires new or enhanced capabilities to provide rapid access to data.

  • The ability to serve the scientific community requires increasing the amount of scientific and technological expertise within NARA.

The panel endorses the comprehensive recommendations made by the National Academy of Public Administration in the 1991 report, The Archives of the Future: Archival Strategies for the Treatment of Electronic Databases. The executive summary of the NAPA report contains 11 recommendations pertinent to scientific data that are presented in Appendix B of this volume.
ACKNOWLEDGMENTS

The Physics, Chemistry, and Materials Sciences Data Panel gratefully acknowledges the contributions of many individuals from the federal agencies, as well as of Paul Uhlir and Scott Weidman of the National Research Council staff, to the preparation of this report.

At the panel's first meeting, in July 1993, it benefited from presentations by and detailed discussions with Robert Billingsley, Defense Technical Information Center; Mark Conrad of NARA; Suzanne Leech, Bionetics, Inc.; and Patricia Schuette, Battelle Pacific Northwest Laboratory. In addition, Victoria McLane of Brookhaven National Laboratory, and the staff of the National Archives and Records Administration and the National Research Council, provided substantial amounts of printed material for the panel's consideration and discussion.

At its second meeting, in November 1993, the panel again benefited from a useful discussion with Mark Conrad of NARA and was given a comprehensive overview of the Defense Nuclear Agency's holdings of data from nuclear weapons tests by Donald Alderson of the Department of Defense Nuclear Information Analysis Center. Subsequently, Frank Biggs of Sandia National Laboratory provided written material on Department of Energy holdings that complement those described by Alderson.

BIBLIOGRAPHY

Billingsley, Robert. 1993. “Numeric Data and Archiving at DTIC” (briefing notes), Defense Technical Information Center, Alexandria, Va., July 8.

Brookhaven National Laboratory. (undated). “National Nuclear Data Center Products and Services,” pamphlet, Upton, N.Y.

Bundy, Dean. 1993. Written briefing presented to the Space Sciences Data Panel of the Committee for the Long-term Retention of Selected Scientific and Technical Records of the Federal Government, Sept. 21, on archiving experience at the Naval Research Laboratory.

Burrows, Thomas W. 1990. “The Evaluated Nuclear Structure Data File: Philosophy, Content, and Uses,” Nuclear Instruments and Methods in Physics Research A286, pp. 595-600.

Chase, M.W., Jr., et al. 1985. “JANAF Thermochemical Tables, Third Edition,” J. Phys. Chem. Ref. Data, vol. 14, suppl. 1.

Conrad, Mark. 1993. Listings of three record groups from the National Archives and Records Administration (Records of the National Institute of Standards and Technology, 1830-1974, record group 167; Records of the Atomic Energy Commission, 1942-1975, record group 326; and General Records of the Department of Energy, 1858-1990, record group 434), July 19.

Defense Technical Information Center (DTIC). 1993. Directory of the Department of Defense Information Analysis Centers, Alexandria, Va., August.

Haas, J.K., H.W. Samuels, and B.T. Simmons. 1985. Appraising the Records of Modern Science and Technology: A Guide, Society of American Archivists, Chicago, Ill.

Hamner, Richard M. 1992. Microgravity Archive Study and Implementation Plan (final report for phase I of the Marshall Space Flight Center tasks), NASA, Marshall Space Flight Center, Huntsville, Ala.

Jones, Jennifer. 1993. “NTIS takes aim at becoming public access hub,” Federal Computer Week, July 19, p. 16.

Leech, Susanne A. 1993. “MSAD Archive Development Program” (briefing notes concerning the Microgravity Sciences and Applications Division of NASA), Bionetics, Inc., Washington, D.C., July 7.

Marsh, K.E. (ed). 1992. “Physico-Chemical Data Centers,” CODATA Bulletin 24, July-Sept., pp. 31-43.

McLane, Victoria. 1993. “Comments for NRC Briefing” (briefing notes, including a description of data holdings at the National Nuclear Data Center), July.

McLane, V., C. Nordberg, H.D. Lemmel, and V.N. Manokhin. 1988. “Nuclear Reaction Data Centers,” Proceedings of the International Conference on Nuclear Data for Science and Technology, Mito, Japan, pp. 1157-1160.

National Academy of Public Administration (NAPA). 1991. The Archives of the Future: Archival Strategies for the Treatment of Electronic Databases, Washington, D.C.

National Archives and Records Administration (NARA). 1985. Saving the Right Stuff, Office of Records Administration, Washington, D.C.

National Archives and Records Administration (NARA). 1990. Managing Electronic Records, Office of Records Administration, Washington, D.C.

National Archives and Records Administration (NARA). 1992a. “Information About the Center for Electronic Records,” general information leaflet 36, Washington, D.C.

National Archives and Records Administration (NARA). 1992b. “Information About Electronic Records in the National Archives for Prospective Researchers,” general information leaflet 37, Washington, D.C.

Pearlstein, Sol. 1990. “Finding Nuclear Data: The National Nuclear Data Center.” Pp. 189-195 in Conference Proceedings, Heavy Ion Beams, Materials Research Society, Pittsburgh, Penn.

Peterson, Trudy. 1993. Presentation to the Committee on Long-Term Retention of Selected Scientific and Technical Records of the Federal Government, Irvine, Calif., July 7.

Reese, Ken. 1993. “How Chemists Came in From the Cold,” Today's Chemist at Work, October, pp. 69-70.

Schuette, Patricia. 1992. “Information Systems Inventory” (briefing notes on the Hanford Environmental Information System), Battelle Pacific Northwest Laboratory, Richland, Wash., December 11.

Tempo, Kaman. 1983. DASIAC Users Guide, Santa Barbara, Calif., May.

Thibodeau, Kenneth. 1993. Presentation to the Committee on Long-Term Retention of Selected Scientific and Technical Records of the Federal Government, Irvine, Calif., July 7.

Warnow-Blewett, J., L. Maloney, and R. Nilan. 1992. Documenting Collaborations in High-Energy Physics, report no. 2 of phase I (high-energy physics) of the AIP Study of Multi-Institutional Collaborations, Center for History of Physics, American Institute of Physics, New York.

Warnow-Blewett, Joan, and Spencer R. Weart. 1992. Summary of Project Activities and Findings; Project Recommendations, report no. 1 of phase I (high-energy physics) of the AIP Study of Multi-Institutional Collaborations, Center for History of Physics, American Institute of Physics, New York.

Westbrook, J.H. 1992. “Current Activity in North America on Numerical Databases on Materials Properties,” CODATA Bulletin 24, Jan-Mar, pp. 62-73.