Documentation of Genetic Resources
In the first decades of genetic resources activities, emphasis was given to the study and conservation of genetic variation. Now, general attention is focused on the management and utilization of the germplasm materials held in collections (Frankel and Brown, 1984). As a result, information on the germplasm in collections has become nearly as important as the germplasm itself.
In managing collections, comprehensive information is required to decide which material should be included in or excluded from a collection. Descriptive information about each accession is also required. Easily accessible information improves the efficiency of managing and using the collections, facilitates the exchange of materials and information among germplasm banks, and helps users experiment with conserved germplasm.
Increasingly, genetic resources activities are being coordinated on a global scale, leading toward the realization of a cohesive worldwide network. Eventually, this would make possible the more efficient allocation of available genetic resources. This can only be done successfully if information on existing germplasm collections is both available and compatible.
The distinction between the information and the systems that allow for the management of that information is important. Information is the raw material that can be used for decision making. Information systems are the mechanisms for storing and using large amounts of information.
This chapter provides an overview of the current status of the documentation of germplasm collections, of the information in the germplasm system and its management, and of some possibilities for future developments.
INFORMATION ON GERMPLASM COLLECTIONS
A distinction can be made between the information needed to manage and maintain the collection and the descriptive information needed for its utilization.
The information needed for germplasm-bank management is the information produced during the acquisition, monitoring, regeneration, and distribution of accessions (Ellis, 1985). The amount and type of information processed for germplasm-bank management depend entirely on the procedures used for these basic operations and are therefore very specific to a particular germplasm bank. Management information covers the quality, quantity, and type of material, its storage location, its distribution to users, and its treatment (Konopka and Hanson, 1985).
Since descriptive information should support all kinds of decisions concerning the choice of the material for utilization, it should contain the information most appropriate for that task, including information that is available from other germplasm-bank collections as well. The main reason cited by breeders for not using material in germplasm banks is that the information provided with accessions is either irrelevant to needs or too incomplete to be of use (Peeters and Williams, 1984).
Users must be able to make rational choices of the germplasm that they want to use in their research or breeding programs. The information at the germplasm bank should support this choice. Users must make decisions about their activities on the basis of information about their own and others' collections, including passport, characterization, and evaluation data, described in Chapter 5. Together these basic data describe the smallest unit contained in the germplasm bank, the accession.
Once the basic data have been entered into the information system, their quality can be enhanced. First, the integrity and plausibility of the data should be confirmed; then the data can be supplemented. For example, if the collection site is documented but the longitude and latitude are not available, these can be added. If the longitude and latitude are available, climatologic data on the collection site can often be added.
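The two steps described above, first confirming plausibility and then supplementing missing data, can be sketched as follows. This is a minimal illustration, not a procedure from the source: the record fields, the range checks, and the site-name gazetteer used to fill in coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PassportRecord:
    # Hypothetical minimal passport record; field names are illustrative only.
    accession: str
    site: str
    latitude: Optional[float] = None   # decimal degrees, north positive
    longitude: Optional[float] = None  # decimal degrees, east positive


def plausibility_problems(rec: PassportRecord) -> list[str]:
    """Return the problems found in one record (an empty list means plausible)."""
    problems = []
    if not rec.accession.strip():
        problems.append("missing accession number")
    if (rec.latitude is None) != (rec.longitude is None):
        problems.append("only one of latitude/longitude given")
    if rec.latitude is not None and not -90 <= rec.latitude <= 90:
        problems.append("latitude out of range")
    if rec.longitude is not None and not -180 <= rec.longitude <= 180:
        problems.append("longitude out of range")
    return problems


def enrich_coordinates(rec: PassportRecord, gazetteer: dict) -> PassportRecord:
    """Fill in missing coordinates from a (hypothetical) site-name gazetteer."""
    if rec.latitude is None and rec.longitude is None and rec.site in gazetteer:
        rec.latitude, rec.longitude = gazetteer[rec.site]
    return rec
```

A record with a documented site but no coordinates passes the plausibility check and can then be enriched, for example `enrich_coordinates(PassportRecord("ACC-0001", "Site A"), {"Site A": (52.0, 5.7)})`. Once coordinates are present, the same pattern could look up climatologic data for the site.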
Sources of Additional Information
Besides the data base on the particular collection, many additional sources of information can be used in managing a germplasm bank. These can be data bases on collections at other germplasm banks, central crop data bases, and data bases on related subjects.
Information on Other Collections
The aim of the user is to select the material most suitable for a particular purpose from a wide range of genetic variability (Holden, 1984). To do so, a curator or user may need to know where other collections are located, the type of material they include, and what is available from those collections.
There are several ways to gather information on other collections. Information at the accession level is available in printed material, such as catalogues and scientific papers. Information can also be obtained on magnetic media or on-line from the institute holding the collection. Another, probably more efficient, way of compiling information on other collections is the use of central crop data bases. Since many germplasm banks have computerized all or part of their collection data, it is possible to combine the data sets of individual germplasm banks for each crop into central crop data bases. This approach has great potential (Frese and van Hintum, 1989; Vanderborcht, 1988). From the curator's viewpoint, the collection data in central crop data bases would also make it possible to trace duplicate samples, which unnecessarily complicate the management of collections, more efficiently.
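Tracing duplicates in a central crop data base amounts to grouping records from different banks that appear to describe the same material. The sketch below assumes, purely for illustration, that duplicates are recognized by a shared donor accession number and species name after crude normalization; in practice more passport fields would be compared, and candidate groups would still need to be checked by a curator.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Crude normalization: lower-case and drop spaces, hyphens, and dots."""
    return "".join(ch for ch in text.lower() if ch not in " -.")


def probable_duplicates(records):
    """Group (bank, accession, donor_no, species) records sharing a normalized
    (donor number, species) key. Groups with more than one member are candidate
    duplicates; records without a donor number cannot be matched this way."""
    groups = defaultdict(list)
    for bank, accession, donor_no, species in records:
        if donor_no:
            key = (normalize(donor_no), normalize(species))
            groups[key].append((bank, accession))
    return [members for members in groups.values() if len(members) > 1]
```

The matching rule is deliberately simple. Its point is that once several banks' data sit in one data base, a single pass over the records can flag candidates that would be hard to spot in separate printed catalogues.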
Central data bases could be used to set priorities for collection missions and regeneration programs. Depending on the information they include, they can serve as a source of data for many types of studies. In addition to detailed information at the level of collection accessions, it can be useful to have summarized information that presents a general overview at the species level or for another convenient category. An important source of this kind of information is the crop directories of the International Board for Plant Genetic Resources (IBPGR). These directories, although sometimes not very discriminating, describe most major collections by crop (Bettencourt and Konopka, 1988; Lawrence et al., 1986). Other sources of information on collection locations include, for example, United Nations Development Program and International Board for Plant Genetic Resources (1986) and Sgaravatti (1986).
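The step from accession-level data to a species-level overview is a simple aggregation. This minimal sketch assumes that each accession-level row carries only the holding bank and the species name, which is a hypothetical simplification. It reports, per species, the number of accessions and the number of banks holding them.

```python
from collections import Counter


def species_summary(accessions):
    """Summarize (bank, species) rows into {species: (n_accessions, n_banks)}."""
    counts = Counter(species for _, species in accessions)
    banks = {}
    for bank, species in accessions:
        banks.setdefault(species, set()).add(bank)
    return {sp: (counts[sp], len(banks[sp])) for sp in sorted(counts)}
```

For example, three rice rows held by two banks and one maize row held by one bank summarize to `{"Oryza sativa": (3, 2), "Zea mays": (1, 1)}`.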
Information on Related Topics
A multitude of scientific disciplines, such as taxonomy, genetics, and seed technology, must be applied to work with genetic resources. Therefore, it would be logical to use the information available from these disciplines to cross-check or supplement specific information on germplasm. For instance, in the field of taxonomy, there are a variety of information systems that could be adapted for use in genetic resources (Aleva et al., 1986; Allkin and Bisby, 1984). Meteorological data bases can be linked with collection information to enable a more in-depth study of the distribution of characteristics under
environmental constraints. The results of these types of studies could be translated into more rational approaches toward collection, use, and management.
Standardization of Information
Information exchange is a vital part of genetic resources work. The compatibility between information systems determines the efficiency of this exchange. To allow an easy exchange of information, a certain degree of standardization is necessary, but to be functional it must be widely accepted.
Most problems concerning the exchange of information are caused by either technical or logical incompatibility. Problems of technical compatibility, those related to hardware and software, can be solved relatively easily and are discussed below. Logical incompatibility causes more serious problems. A first step in preventing logical incompatibility is to guarantee that data files can be interpreted, by always accompanying them with notes explaining their structure and the codes used in them. Before information from a foreign system can be entered, its format and coding must usually be converted, which is very laborious and sometimes causes loss of information. To minimize this effort and the loss of information, data-bank standards have been proposed. Early standardization efforts for morphological and varietal characters of rice were made at the International Rice Research Institute (IRRI) and led to a computer-printed catalog in 1970 (Chang, 1985a). Seidewitz (1974-1976) proposed standard terminology in his thesauri for descriptor names and defined descriptor states. At the workshop on the exchange of information held in Radzikow, Poland, in 1984 (International Board for Plant Genetic Resources, 1984b), a standardized list of passport descriptors and a format for data exchange were proposed. Since 1978 IBPGR and collaborators have published descriptor lists for various crops (International Board for Plant Genetic Resources, 1982; International Board for Plant Genetic Resources et al., 1985; International Rice Research Institute and International Board for Plant Genetic Resources, 1980), giving passport, characterization, and evaluation descriptors that generally follow those proposed at the Radzikow workshop.
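How converting a foreign coding scheme can lose information is easy to see in a small example. The descriptor ("seed colour"), both code tables, and the function name below are hypothetical. The sender's explanatory notes define five codes, while the receiving bank's coarser scheme merges two of them and has no code for a third. The sketch reports unmappable values instead of silently dropping them.

```python
# Hypothetical descriptor "seed colour", as documented in the sender's notes.
FOREIGN_CODES = {"1": "white", "2": "cream", "3": "yellow", "4": "red", "5": "black"}

# Hypothetical coarser local scheme: white and cream merge; black has no code.
LOCAL_CODES = {"white": "W", "cream": "W", "yellow": "Y", "red": "R"}


def convert(values):
    """Convert foreign codes to local codes.

    Returns (converted, unmapped). Unmapped values get None in `converted`
    and are listed separately so the loss of information is visible.
    """
    converted, unmapped = [], []
    for v in values:
        label = FOREIGN_CODES.get(v)  # None for codes missing from the notes
        local = LOCAL_CODES.get(label) if label else None
        if local is None:
            unmapped.append(v)
        converted.append(local)
    return converted, unmapped
```

Two kinds of loss appear here. Codes "1" and "2" both become `"W"`, so the white/cream distinction disappears. Codes "5" (no local equivalent) and "9" (not explained in the notes) cannot be converted at all. A shared descriptor list removes the need for this kind of conversion table.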
Efforts have been made to compile an international standard list with addresses and address codes. Currently, the only standard that is commonly used in germplasm-bank information systems is the three-letter country abbreviations supported by the Statistical Office of the United Nations (Anonymous, 1982). However, even this list of codes
is sometimes supplemented with extra codes that serve a different purpose.
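When a standard code list has been supplemented with local extras, a receiving system needs to separate the two before accepting the data. The sketch below assumes a small illustrative subset of three-letter codes in the ISO 3166-1 alpha-3 style. A real system would load the full official list, and the extra codes used in the example are hypothetical.

```python
# Illustrative subset of three-letter country codes; load the full list in practice.
STANDARD_CODES = {"DEU", "FRA", "GBR", "ITA", "NLD", "PHL", "SWE", "USA"}


def split_codes(codes):
    """Separate standard country codes from locally added extras.

    Returns (standard, extras): standard codes in input order (upper-cased),
    and the distinct non-standard codes, sorted, so they can be documented
    or translated before the data are exchanged.
    """
    standard = [c.upper() for c in codes if c.upper() in STANDARD_CODES]
    extras = sorted({c.upper() for c in codes if c.upper() not in STANDARD_CODES})
    return standard, extras
```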
The problems in the standardization of the logical aspects of germplasm-bank documentation are mainly caused by the fact that germplasm banks differ. This makes it unlikely that two will use exactly the same information system.
Documentation systems can be fully compatible even if different equipment and software are used, but this requires consensus over standards for exchange of information. The Radzikow workshop tried to develop such standards (International Board for Plant Genetic Resources, 1984b), but only very few germplasm banks thought that it was possible and necessary to implement the proposed standards.
COMPUTERIZATION OF GERMPLASM COLLECTION DATA
Management of information about accessions in germplasm collections is an integral part of genetic resources efforts (Blixt, 1984). As a result of the growth of collections in recent years, the amount of data available has increased considerably. More powerful systems are needed to manage this mounting quantity of data effectively and efficiently. Computers have become essential to this task.
In the early 1970s, many organizations recognized the need to computerize germplasm collection data to facilitate the maintenance and utilization of collections. Originally, plans were envisioned in which all major germplasm collection centers, as part of a global network, would be equipped with compatible computers and documentation systems to promote the free exchange of information (Williams, 1984). This approach proved to be impractical, because of the diversity of hardware and software available. As a result, a variety of computer systems is still used for documentation of genetic resources.
Since the early 1980s, the introduction of personal computers has greatly expanded the capacity to store and retrieve information. This equipment has been introduced in many germplasm banks and has the potential to greatly enhance data management capabilities.
A variety of software is used in documentation systems. Generally, it can be grouped into three types: software
developed in house, commercial data-base management systems, and commercial application software (International Board for Plant Genetic Resources, 1988a).
Software developed in house with third-generation programming languages (Pundir et al., 1988) such as PASCAL or FORTRAN can be tailored to the specific needs of the germplasm bank. However, sophisticated programming knowledge is needed both to develop it initially and to update it as new functions are required or other problems arise. Thus, it can be both expensive and time-consuming.
Commercial data-base management systems are becoming increasingly popular (Perry et al., 1988). They are specifically developed to enable the efficient and safe management of large quantities of data. Such software packages do not generally require sophisticated programming skills and are flexible enough to be used to create suitable applications. Because the manufacturer updates the data-base management software, updating is less expensive and time-consuming for the germplasm bank. In addition, manufacturers offer assistance to users in developing applications for their software.
Commercial application software comprises packages intended for purposes other than data-base management, such as word processors, text editors, spreadsheets, and statistical packages. Although these packages can be used for data storage, their facilities for data-base management functions, such as selective retrieval, can be limited or absent. They are suitable only for data sets of limited size and are not recommended as primary data management tools. They can, however, be quite helpful for preparing data for entry into a data base.
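Selective retrieval, the function that separates a data-base management system from general application software, can be illustrated with SQLite, which ships with Python as the `sqlite3` module. SQLite simply stands in for "a data-base management system" here. The table layout, the descriptor `days_to_flower`, and the accession data are all invented for the example.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory table holding invented accession data."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE accession ("
        "acc_no TEXT PRIMARY KEY, species TEXT, origin TEXT, days_to_flower INTEGER)"
    )
    con.executemany(
        "INSERT INTO accession VALUES (?, ?, ?, ?)",
        [
            ("ACC-1", "Hordeum vulgare", "ETH", 62),
            ("ACC-2", "Hordeum vulgare", "SWE", 75),
            ("ACC-3", "Triticum aestivum", "ETH", 70),
        ],
    )
    return con


def select_accessions(con, species, max_days):
    """Selective retrieval: accessions of one species flowering in fewer than max_days."""
    rows = con.execute(
        "SELECT acc_no FROM accession "
        "WHERE species = ? AND days_to_flower < ? ORDER BY acc_no",
        (species, max_days),
    ).fetchall()
    return [acc_no for (acc_no,) in rows]
```

The query states *what* to retrieve and leaves *how* to the data-base system. In a word processor or spreadsheet, the same selection would typically be done by manual sorting or scanning, which becomes impractical as collections grow.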
Standardization of Systems
A large variety of hardware systems and software packages are used in germplasm collections worldwide. Difficulties in the technical compatibility between systems are bound to arise. Problems with hardware compatibility can usually be addressed, however. It may be necessary to transfer the data from one medium or format to another to enable its transfer to another computer system. File transfer centers for this purpose were recommended by the workshop on the exchange of information in 1984 (International Board for Plant Genetic Resources, 1984b).
While standardization of all documentation software would be the most straightforward solution, the variety of computer systems,
software, and their availability make this impractical. As an alternative, a standard for data exchange, such as the American Standard Code for Information Interchange (ASCII) fixed-format files, could be adopted for data exchange. ASCII fixed-format files can be created, read, and processed by virtually all commercial software and require few adjustments by individual data systems.
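An ASCII fixed-format file gives every field a fixed width on every line, so any system that knows the layout can read it. This minimal sketch writes and reads such lines. The four-field layout and its widths are hypothetical; they are not the Radzikow exchange format or any other published standard.

```python
# Hypothetical layout: (field name, width in characters).
LAYOUT = [("acc_no", 10), ("species", 24), ("country", 3), ("year", 4)]


def to_fixed(record: dict) -> str:
    """Write one record as a fixed-width ASCII line (missing fields become blanks)."""
    parts = []
    for name, width in LAYOUT:
        value = str(record.get(name, ""))
        if len(value) > width:
            raise ValueError(f"{name!r} is longer than {width} characters")
        parts.append(value.ljust(width))  # pad with spaces to the field width
    line = "".join(parts)
    if not line.isascii():
        raise ValueError("record contains non-ASCII characters")
    return line


def from_fixed(line: str) -> dict:
    """Read a fixed-width line back into a dict, stripping the padding."""
    record, pos = {}, 0
    for name, width in LAYOUT:
        record[name] = line[pos:pos + width].rstrip()
        pos += width
    return record
```

The layout table is the only thing the sending and receiving banks must agree on. This is why such files need few adjustments by individual data systems, provided the layout travels with the data as explanatory notes.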
Standardization also assumes that data need to be exchanged extensively between germplasm banks. This is more likely to be true for large collections, such as those of the international genetic resources centers, some national programs, and regional collections. In these collections, however, there are usually individuals with the expertise to convert data to a format appropriate to the recipient. This is the current practice of the Germplasm Resources Information Network (GRIN) of the U.S. National Plant Germplasm System. Data in the GRIN system are stored in a large, sophisticated data management system that is incompatible with the systems of other manufacturers. However, an individual requesting information can receive data from GRIN that have been reformatted to be compatible with most major commercially available data-base management software, or reformatted in ASCII code. The problems of technical incompatibility can thus be minimized.
Information Supply to Users
Germplasm banks have classically presented their collections to users through listings of passport and characterization data. With current information technology, this seems an inefficient way of making the available information accessible to users. These lists are usually very incomplete selections of the available material and information. More extensive catalogues are often very bulky, expensive to produce, and difficult to update. Other media, such as microfiche (Porter and Smith, 1982) and compact disks (Centro Internacional de Mejoramiento de Maíz y Trigo, 1988), have been used to bring the information to the attention of users.
The media most commonly used for the exchange of computerized information are floppy diskettes and magnetic tapes. They provide a low-cost means of storing massive amounts of information. Magnetic and optical storage media also permit interactive retrieval of information, sometimes of a preselected subset. Suitably equipped users can process these data into the most suitable form and find precisely what they are looking for with a minimum of effort. Printed media are not totally obsolete, however. They are still an
effective way of presenting collection information in scientific articles or booklets containing a collection overview with summarized information (Pundir et al., 1988). Infrequent users of large collections or users without computer facilities may find printed listings to be sufficient and more appropriate for their limited needs.
An alternative way to provide information to users is through the use of information service units at germplasm banks. The incorporation of stand-alone applications into national and international computer networks provides users with increasing possibilities to consult remote data bases using on-line data communication. In the United States, GRIN is an example of such an information network used for genetic resources (Perry et al., 1988).
CURRENT STATUS OF GENETIC RESOURCES DOCUMENTATION
The current status of genetic resources documentation can best be typified as one that is rapidly evolving. The situation has changed since the comprehensive paper of Knuepffer (1983) and status report of the International Board for Plant Genetic Resources (1984c). Many institutions have altered their data management systems with respect to hardware, software, or both; the results of massive screening activities have been documented; and for many crops, central data bases have been organized, as have useful data bases on related topics. The situation is different for every germplasm bank and for every crop. In this diverse and rapidly changing situation, it is impossible to describe the current situation accurately or comprehensively, but some trends can be observed.
Collection Data Bases
Small collections, usually working collections, are just being computerized, usually with personal computers and a commercial data-base management system. Many breeding institutes and small germplasm banks use this relatively new option, for example, the University of Birmingham in the United Kingdom and the French Institut National de la Recherche Agronomique in Montpellier. Even some larger germplasm banks, such as the Zentralinstitut für Genetik und Kulturpflanzenforschung in Germany (formerly East Germany), the Nordic Gene Bank in Sweden, and the Center for Genetic Resources in The Netherlands (van Hintum, 1988), use a commercial data-base management system on stand-alone personal computers or on personal computers linked to a minicomputer.
Most large germplasm banks computerized their collections some
time ago. The very large ones, such as the IRRI, use internally developed programs on mainframe computers or minicomputers (International Board for Plant Genetic Resources et al., 1985). The smaller ones often use less customized packages, also on large computers. Consiglio Nazionale delle Ricerche in Bari, Italy, uses a statistical package on a mainframe computer. The technology of data-base management has evolved very quickly; the technological solutions applied in the first germplasm information systems have become outdated. The older systems often lack the flexibility and performance of modern data-base management systems. Therefore, many large germplasm banks have already replaced, or are in the process of replacing, their older systems with a commercial system, for example, the IRRI and Institut für Pflanzenbau und Pflanzenzüchtung der Bundesforschungsanstalt für Landwirtschaft in Braunschweig, Germany (formerly, West Germany).
In 1988 the International Board for Plant Genetic Resources (unpublished data) performed a survey for preparing crop inventories. Responses were received from 52 germplasm banks maintaining mostly minor collections of cereals or vegetables; these centers were located in Africa (2 centers), Asia (5 centers), South America (8 centers), Australia (2 centers), and Europe (35 centers). The following conclusions concerning documentation can be drawn.
Personal computers are increasingly being used at germplasm banks. They either stand alone or function in combination with mainframe or minicomputers. This development considerably decreases the problems associated with the exchange of information.
Germplasm banks are rapidly computerizing their data. Only 2 of 52 respondents to the survey mentioned above did not use computers for documentation purposes, but they were planning to start using them in the near future.
The software used for documentation is mainly applications of commercial data-base management system software (66 percent) or in-house developed software programs (30 percent). Few applications are based on other software products, such as spreadsheets (4 percent).
The extent to which the collection data are computerized is difficult to estimate since detailed information on this point was scarce. Overall, it appears that active and working collections in particular are in the process of computerizing characterization and evaluation data. Many of the respondents did not mention the status of passport data in their collections. Although many respondents expressed their willingness to exchange data, it appears that the major concern
of most germplasm banks at present is computerization of their own collection data.
Central Crop Data Bases
The importance of international central crop data bases was stressed above. For a number of crops, these data bases have been established, usually by germplasm banks that hold one of the collections. The extent to which these data bases are complete varies enormously, as do the type and quality of the information they contain. Often only passport data are included (for example, International Board for Plant Genetic Resources, 1985a). Sometimes, this is extended with characterization, evaluation (Chang, 1985a; International Board for Plant Genetic Resources, 1986; Vanderborcht, 1988), or seed management data (Frese and van Hintum, 1989; International Rice Research Institute, 1991). Problems in compiling central data bases are generally caused by lack of interest, lack of funds, lack of computer facility, or incompatibility of data (International Board for Plant Genetic Resources, 1985b, 1986, 1989b; International Rice Research Institute, 1991).
Other Sources of Information
The number of other sources of information is large. As a result of data computerization, the accessibility of those sources has increased considerably in recent years. There is a multitude of on-line and off-line data bases (Anonymous, 1989) and other possible sources of information (Food and Agriculture Organization, 1987a).
Developments in information technology have occurred rapidly. This has made it difficult to predict their impact on genetic resources data management activities. Nevertheless, it is useful to look at some of the major trends and how these could affect the documentation of genetic resources.
Four major developments in information technology have occurred:
Decreasing prices and physical sizes and the increasing quality and performance of hardware;
Increasing quality, performance, and user friendliness of commercial software;
Growing popularity and possibilities of computer networks; and
Growing familiarity of the public with computers.
These developments have several unavoidable consequences for germplasm-bank documentation:
The limitations in the documentation of germplasm collections will shift from problems with hardware and software to problems in using the information in germplasm information systems and related data bases.
All data on genetic resources will be computerized by using user-friendly information systems. This will stimulate the use of the information, for example, for the compilation of central crop data bases.
Improvements in computer networks and standardization of commercial data-base management software packages will allow for much easier communications between germplasm banks and between germplasm banks and users, leading to the flow of more information.
Information about the accessions in a germplasm collection is essential. The seeds in a collection are of little value if separated from the information that will enable researchers to select those appropriate to specific needs. Once obtained, this information must be recorded in data bases in a form that is accessible to the broadest possible range of potential users.
Germplasm collections must minimally make available passport information for the materials they hold.
Too often, loading data on accessions has been accorded less importance than design and development of the computer software. An earlier report of the committee (National Research Council, 1991a) recommended, for example, that the U.S. National Plant Germplasm System direct resources more to the addition of existing information to the data base than to software and hardware development. For materials in collections, basic passport, characterization, and evaluation data should be obtained from collectors, breeders, researchers, and germplasm curators. For materials being added to collections, strict policies against adding undocumented accessions should be enforced.
Compilation of data for germplasm applications should follow easily exchanged, readily available protocols.
The availability of computerized information in a collection data base, in central crop data bases, and on-line in other data bases will change the management of genetic resources collections considerably
and should be anticipated. Central crop data bases, combined with an infrastructure for easy exchange of information, will allow for a genetic resources strategy with networks of dispersed, rationalized collections of freely available material (Perret, 1989). This strategy would create many possibilities in the field of genetic resources management. The choice of data bases will be less critical than the quantity and quality of the data and the implementation of the procedures to acquire, enter, and distribute the data. However, the protocols implemented for data management should follow those used in the most widely available commercial data-base applications.
International crop germplasm data bases should be developed.
National and international centers should collaborate in developing a compatible and interlinkable data base for every major crop. National research programs would benefit greatly from access to an internationalized documentation and exchange system. The international agricultural research centers and other well-established national centers, such as the U.S. Department of Agriculture, may serve as international data-base centers.