pieces of data scattered among hundreds of thousands of data points derived from a single experiment. Microarray data are notoriously hard to interpret: the sheer volume of data makes analysis difficult, and it is challenging to separate results that are due to the intended variable from results that are due to factors for which there was no adequate control. The scientific community today does not fully understand what the transcriptional data from microarray experiments mean with respect to cellular function, and it would be hard to put the data to practical use in enhancing a pathogen.
It might be possible to distinguish access to genome data, such as primary sequences and annotations, from access to sophisticated analytic tools that allow the assembly of biological data into a coherent picture. Tools that link many kinds of biological data to computer programs that can be used to mine and analyze them are themselves among the most potent tools for conducting biological research ever constructed (see, for example, the work being done by the Synthetic Biology group at Massachusetts Institute of Technology, www.syntheticbiology.org). As the power of computer systems that integrate various kinds of data grows, one might argue that it will become easier for someone to use these tools anonymously through the Internet to further attempts to enhance pathogens. At the same time, that risk is balanced by the even higher likelihood that the data and the tools to analyze them will be used to create new therapies and prevention measures to control natural outbreaks and bioterror attacks.
The committee was charged with determining which types of pathogen-related genome data present the most concern. As evidenced by the categories above, it is possible to identify categories of data, but it is not clear that any particular type of data can be correlated with a specific level of risk of misuse for bioterrorist purposes. Data on all organisms present some level of concern, but although some organisms are inherently more dangerous than others, it does not necessarily follow that their genome sequences are more dangerous. The organisms themselves are beyond the scope of this study, and many organisms relevant here are governed by the select agent rules.
Access to digital data is notoriously difficult to limit to approved users. The recent experience of the recording and motion-picture industries with illicit transmission of copyrighted material is well documented. Files containing genome information would likewise be resistant to effec-