Chapter 20: Data Management Issues at the Agricultural Research Service | Data for Science and Society: The Second National Conference on Scientific and Technical Data | U.S. National Committee for CODATA | National Research Council

20

Data Management Issues at the Agricultural Research Service

Floyd Horn

I want to take just a few moments to introduce you to the Agricultural Research Service (ARS) and give you some sense of the complexity of our organization. Although we are a relatively small government research and development program, we have some unique database management needs.

The ARS is the intramural research program of the Department of Agriculture. We are charged for the most part with addressing the concerns of the mission and regulatory aspects of the Department, as well as of the Environmental Protection Agency (EPA), the Food and Drug Administration, and other agencies that have an impact one way or another on agriculture. ARS also extends the nation's scientific knowledge across a broad range of program areas other than production agriculture.

Our budget, in government-wide terms, is relatively small. It is $834 million for fiscal year 2000. We have approximately 2000 scientists in the base program. We also have $60 million in cooperative agreements, largely administered to the land grant university system.

We have 104 locations nationwide. These conduct research on virtually everything related to production, protection, and natural resources utilization that in any way relates to agriculture, and oftentimes much more than just agriculture. These locations are grouped into eight geographic areas, each with its own area office and each with its own database management challenges. The areas coordinate with the 104 locations and with our partners nationally. We also have eight biological control laboratories around the world with which we need to communicate. So it's a fairly sophisticated field structure.

Our mission relates to ensuring high-quality, safe food and other agricultural products. We also have the challenge to maintain a firm understanding of the nutritional needs of Americans. Much of human nutrition research is done in-house. This research also relates to the social and behavioral sciences as they apply to food intake and the surveys upon which a lot of policy decisions are made. We are charged with sustaining a competitive agricultural economy. About 13.2 percent of the U.S. gross domestic product is agriculturally driven, as is one in every six jobs in this country. We enhance natural resource bases as best we can in order to ensure that agriculture is in fact in harmony with the environment. We help to provide economic opportunities for rural citizens, providing a way for them to stay at or near the farm rather than flock to the cities.

ARS programs are divided into three major categories: (1) animal production, (2) natural resources and sustainable agricultural systems, and (3) crop production. They are to some extent driven by administrative convenience as I'll point out, but by and large, they do reflect research themes. Figure 20.1 provides a description of our programs.

Figure 20.1

We use a number of independent databases. Now we need them to interact, and it's necessary that we have the platforms and the capacity to do just that. I am going to discuss a few examples, including human nutrition; germplasm collections; a relatively new set of sciences for us that relate to genomics; a classical success for us--the dairy herd improvement program; and our natural resources databases. Each of these databases is national in scope and is accessible by scientists, producers, and the public.

In the area of human nutrition, with regard to food composition, we are the primary source for nutrient data. Not only are nutrient data used to set government policy, for instance in food assistance programs, they are also used by dieticians, food supplement industries, and others around the country and around the world who are interested in improving people's nutritional well-being and preventing disease through better diets. We are forever refining these databases. For instance, we find that different ethnic populations have different dietary traces, different dietary requirements. These are constantly being entered into the food survey database, which is being updated. This database is accessible to virtually everyone. We have most of these data on CD-ROM so they can be accessed by databases. This is a very popular set of data, and we make them readily available.

We need food surveys for a number of reasons. The food surveys database determines what Americans eat by very sophisticated interview procedures. Again, these sorts of data are used to assist in setting government policy. One good example is the EPA. In order for EPA to use scientific data for setting guidelines on the use of pesticides in this country, it needs to know not only about the residues that occur on food and agricultural crops, but also about how much people eat--in particular, how much children eat. So, currently, we're spending a great deal of time trying to figure out exactly what children eat so that EPA can determine their exposure to various pesticides and collectively what these levels of exposure mean. The data have generated tremendous public interest and are available for public access.
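To make the link between food surveys and pesticide exposure concrete, here is a minimal sketch of a commonly used style of dietary exposure estimate: the amount of each food eaten is multiplied by the residue on that food, summed, and divided by body weight. This is an illustration only, not a description of EPA's actual procedure. The class name, function name, and every number below are hypothetical and are not survey or residue data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FoodIntake:
    """One food item for one person (all values here are hypothetical)."""
    food: str
    grams_per_day: float       # consumption, as a food survey might report it
    residue_mg_per_kg: float   # pesticide residue per kg of that food


def daily_exposure_mg_per_kg_bw(intakes: Iterable[FoodIntake],
                                body_weight_kg: float) -> float:
    """Estimate exposure in mg of residue per kg of body weight per day."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    # grams/day -> kg/day, times mg residue per kg food, summed over foods
    total_mg = sum(i.grams_per_day / 1000.0 * i.residue_mg_per_kg
                   for i in intakes)
    return total_mg / body_weight_kg


# Hypothetical diet, used for both a child and an adult.
diet = [
    FoodIntake("apples", grams_per_day=200.0, residue_mg_per_kg=0.5),
    FoodIntake("apple juice", grams_per_day=300.0, residue_mg_per_kg=0.2),
]

child = daily_exposure_mg_per_kg_bw(diet, body_weight_kg=20.0)
adult = daily_exposure_mg_per_kg_bw(diet, body_weight_kg=80.0)
print(f"child: {child:.4f} mg/kg bw/day, adult: {adult:.4f} mg/kg bw/day")
```

Under these assumed numbers, the same diet gives the child four times the adult's exposure per kilogram of body weight, which is one reason consumption data for children matter so much in this kind of estimate.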

The plant germplasm database is one that has helped us to make remarkable progress. The Germplasm Resources Information Network (GRIN) is used globally and is contributed to globally in order to put in one place all of the information we have available on a wide variety of food plants. We promote free exchange of germplasm information internationally with all countries, and we benefit from exchanges with those countries, much as a library does. Our library certainly does. If anyone writes to us and asks for germplasm, we provide it. If we write to them and ask for germplasm in return, we expect to get it, and we normally do. Of course, they would not ask for this germplasm unless they had looked at the GRIN to determine what the attributes of the plant materials are, which is the basis for their decision. The GRIN receives more than 4000 hits a day.

A relatively new aspect of our program is the animal germplasm program. We anticipate that in structure it will be similar to GRIN, but it will have a different set of attributes. We believe that ultimately, particularly as we become capable of storing ovaries and semen and the other things that will promote a breeding program into the future, we will have more interest in this database. In fact, decisions on how to introduce certain traits into a herd of cattle, for instance, will be dependent on decisions facilitated by these databases.

ARS is becoming more and more interested in the white-hat and the black-hat microbial germplasm. We have culture collections that primarily hold beneficial organisms, which reside in Peoria, Illinois. We also have beneficial fungi that help us to control weeds and the like in our biological control program in Ithaca, New York. We have the rhizobia that are important to legumes in our program in Beltsville, Maryland. We have a number of pathogen collections around the country. One of the new dimensions of our microbial germplasm collection is that we have to begin being concerned, it would seem, about those around the world who might intentionally increase the virulence of pathogens and introduce them into the United States in order to affect our capacity to trade in a global market. This too would be a part of this germplasm collection.

Then there are plant and livestock genomics databases. We are generating as best we can sequence information, DNA markers, phenotypes and phenotypic characteristics, pedigrees, bibliographic citations, and genetic maps. Our sense at the moment is that these data need to be in the public domain as much as we can promote them and accessible internationally through the Internet. Examples of databases in this area include GrainGenes, the Maize database, RiceGenes, and so forth. These are all searched regularly and globally for information relative to these particular genomic sequences and genome maps.

In a similar way, we are looking as best we can to functional genomics and computational genomics. We are developing new programs within the crop databases, in particular, to put on the Web, with the idea of allowing one to visualize and manipulate genomic and genetic resource data for crops. I think this will ultimately follow with livestock, but at the moment we are emphasizing crops. This is a matter of taking tremendous amounts of data and making them readily available and readily analyzed to determine gene sequences, and making decisions based on gene function. We are developing the software tools that are necessary to do just that. We are working mostly with the Cornell Theory Center and the genomics databases housed there.

The national databases for genetic research on milk and milk yield and composition, fitness of dairy cattle, disease resistance, and reproductive traits of dairy cattle are all incorporated in our dairy herd improvement program. This is the most famous piece of this laboratory's activities. We have records on 12 million dairy cattle in this database. These records make it quite simple to figure out exactly what you want and to request semen from the right bull to introduce into a herd of cows to get the characteristics you want out of that dairy herd. As a result, we have made very significant progress in the improvement of dairy herds around the United States. Internationally we have records on a lot of cattle, for example from the Netherlands, and we treat these in the same way.
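As a toy illustration of how records like these can support a breeding decision, the sketch below filters a handful of bull records on one trait and ranks the remainder on another. It does not reflect the actual schema, evaluation method, or scoring used in the dairy herd improvement program. The record fields, the merit scores, and the function name are all hypothetical.

```python
from typing import List, NamedTuple


class SireRecord(NamedTuple):
    """Hypothetical summary record for one bull (higher scores are better)."""
    sire_id: str
    milk_yield_merit: float
    fertility_merit: float


def rank_sires(records: List[SireRecord],
               min_fertility: float,
               top_n: int = 3) -> List[SireRecord]:
    """Keep bulls meeting a fertility floor, then rank by milk yield merit."""
    eligible = [r for r in records if r.fertility_merit >= min_fertility]
    ranked = sorted(eligible, key=lambda r: r.milk_yield_merit, reverse=True)
    return ranked[:top_n]


# Made-up records for illustration only.
records = [
    SireRecord("A", milk_yield_merit=100.0, fertility_merit=1.0),
    SireRecord("B", milk_yield_merit=150.0, fertility_merit=-0.5),
    SireRecord("C", milk_yield_merit=120.0, fertility_merit=0.2),
]

for sire in rank_sires(records, min_fertility=0.0):
    print(sire.sire_id, sire.milk_yield_merit, sire.fertility_merit)
```

In this made-up example, bull B has the highest yield score but is excluded by the fertility floor, so C ranks ahead of A. The point is only that a decision like this depends on having several traits recorded side by side for each animal.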

Our natural resources databases contain data from carefully monitored and equipped sites around the country, of which there are roughly 333. We take a look at such things as moisture in the soil, rainfall events, evapotranspiration, growth rates, carbon sequestration--a wide variety of different things. We also are looking at the effects of climate on productivity and profitability of the farming enterprise across the country. One of these in particular, AGROS, is related to the National Aeronautics and Space Administration's (NASA's) Global Change Master Directory, and we are in fact a contributor to many of NASA's climatological programs.

ARS is responsible for public access to research data developed in the federal sector with public funds. I'm very proud of the fact that we are moving very quickly in this direction. So I suppose in some ways we are in competition with those that would create a proprietary database function, but that's okay. We believe many of these things will benefit science much more if they are in the public domain.