in our natural-history collections whose likelihood of being mislabeled exceeds 0.75.” Assuming that some cases in the database can be identified as “labeled correctly” and others as “known to be mislabeled,” a training sample for a data-mining algorithm could be constructed. The algorithm would build a predictive model and retrieve the records that match it, in place of a structured query that a person would otherwise have to write. This is an example of a much-needed interface between humans and databases, far more natural than any currently available. Here the machine adapts to the user's needs, rather than the user having to adapt to the machine's. We must refine and augment the interactions between people and machines, expand the role of agentry in information systems, and discover more powerful and more natural ways of navigating the scientific record.
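A query-by-example of this kind can be sketched in a few lines. The sketch below is illustrative only: it assumes each specimen record has already been reduced to numeric features (the two features here, a hypothetical georeference-outlier score and a hypothetical determination-conflict score, are inventions for this example), and it uses plain logistic regression trained by gradient descent as one possible predictive model. Records with known labels train the model; unlabeled records whose predicted probability of being mislabeled exceeds 0.75 are returned.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Specimen:
    record_id: str
    # Hypothetical features: [georeference_outlier_score, determination_conflict_score]
    features: List[float]
    # 1 = known mislabeled, 0 = known labeled correctly, None = unknown
    label: Optional[int] = None


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    # Predicted probability that a record is mislabeled.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train(samples: List[Specimen], epochs: int = 2000, lr: float = 0.5) -> Tuple[List[float], float]:
    # Fit logistic regression by batch gradient descent on records with known labels only.
    labeled = [s for s in samples if s.label is not None]
    n = len(labeled[0].features)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for s in labeled:
            err = predict_proba(w, b, s.features) - s.label
            for j in range(n):
                gw[j] += err * s.features[j]
            gb += err
        m = len(labeled)
        w = [wj - lr * gj / m for wj, gj in zip(w, gw)]
        b -= lr * gb / m
    return w, b


def query_likely_mislabeled(records: List[Specimen], w: List[float], b: float,
                            threshold: float = 0.75) -> List[str]:
    # The "query": return IDs of records whose predicted probability exceeds the threshold.
    return [r.record_id for r in records if predict_proba(w, b, r.features) > threshold]


# Hypothetical training sample drawn from records whose status is already known.
training = [
    Specimen("A1", [0.0, 0.1], 0), Specimen("A2", [0.1, 0.0], 0), Specimen("A3", [0.2, 0.1], 0),
    Specimen("B1", [0.9, 1.0], 1), Specimen("B2", [1.0, 0.8], 1), Specimen("B3", [0.8, 0.9], 1),
]
# Hypothetical records of unknown status to be screened.
unlabeled = [Specimen("U1", [0.95, 0.9]), Specimen("U2", [0.05, 0.1])]

w, b = train(training)
flagged = query_likely_mislabeled(unlabeled, w, b)
print(flagged)
```

The user supplies examples and a probability threshold instead of a hand-written query; any classifier that yields calibrated probabilities could stand in for the logistic model used here.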
In return, research in computer and information science and technology in the biodiversity and ecosystem domain is likely to yield discoveries of value to other fields (Spasser 1998). Nowhere do we find the problems of heterogeneous database federation more challenging than in the life sciences. A fully implemented digital library for biology would include everything from ideas to physical objects, along with enormous amounts of information in every imaginable medium. Research on global climate change, habitat destruction, and the discovery of species is among the most distributed of our scientific activities and creates extraordinary opportunities to learn about computer-mediated project coordination and communication. At almost every turn, scale, complexity, and urgency conspire to create a particularly wicked set of problems. Working on these problems will undoubtedly advance our understanding and use of information technologies, perhaps more than work in any other setting.
We have laid out the case for building a fully digital, interactive research-library system for biodiversity and ecosystem information, along with the library's basic requirements and the goals of its research and service activities. But how much will it cost, and how long will it take to build?
We estimate that each of the regional nodes that will form the core of NBII-2 will require an annual operating budget of at least $8 million, and probably more. Supporting five such nodes would therefore require at least $40 million per year, a small fraction of the funds spent nationwide each year to collect data (conservatively estimated at $500 million for federal government projects alone). As with the Internet itself, the federal government should provide the “jump start” for this new infrastructure by investing heavily in its formative stages. Part of the investment should be devoted to developing incentives for private-sector partners to participate. Gradually, support and operation of the infrastructure should be shared by nongovernment participants, as has happened with the Internet.
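The budget comparison can be made explicit. Using only the figures stated in the estimate (a minimum of $8 million per node, five nodes, and the conservative $500 million figure for federal data collection), the minimum node budget works out to about 8 percent of federal data-collection spending alone; because the denominator excludes nonfederal spending, the share of all data-collection funds nationwide would be smaller still.

```latex
5 \times \$8\ \text{million} = \$40\ \text{million per year},
\qquad
\frac{\$40\ \text{million}}{\$500\ \text{million}} = 0.08 = 8\%
```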
The planning and request-for-proposals process should be conducted within a year. Merit review and selection of sites should be complete within the following six months. The staffing of the sites and initial coordination of research and