doesn’t have bioinformatics expertise at his or her university. I think that’s one of the real problems that we need to address.”
The other problem, Fraser added, is that the various genome projects generally make no allowance for taking care of the data they generate once the project is finished. “For the most part, even for sequencing projects with bioinformatics support during the term of the project, that support ends when the sequence is completed. There’s been no plan put in place for how to maintain and update all of this information.”
“That problem is going to get even worse as we begin to accumulate more data. There have been all sorts of models proposed, from letting people in the community who are passionately interested in an organism do it on an ad hoc basis, to having this done in a more centralized facility, to having this done in a distributed way but with clear rules for interoperability. I’ve even heard some people go so far as to suggest that perhaps we need to come up with some sort of tax on genome projects that goes to fund a bioinformatics trust managed by an inter-agency group responsible for maintaining these databases.”
Several participants pointed out that in order to maximize the value of the information generated by domestic animal genome projects, researchers and information technology specialists will have to pay more attention to data handling. In particular, programs need to be designed not only to maintain the data and make it accessible to any researcher who needs it but also to make sure the information can be integrated with new data and new understandings as they appear.
The biggest difficulty is scaling: a database must be designed so that it continues to work, and work well, when the amount of data it holds doubles or grows by a factor of ten or twenty. That will be a challenging job, Fraser noted.
“I’m not convinced,” she said, “that any of the existing databases that have been built so far to handle sequence information are robust enough to scale to the level that we know we are going to need in going forward.” The databases built to handle the sequence information are actually the easy part, she said. “We would like to begin to add in functional information, either directly or through links, to all of the existing gene and protein databases. When you start thinking about doing that, the challenge goes up by several orders of magnitude.”
Owen White, of The Institute for Genomic Research (TIGR), made a similar point. “The National Center for Biotechnology Information (NCBI) is doing a heroic job,” he said. “They are doing an amazing job managing sequence data and