first? In addition, a host of logistical questions must be considered by the participating scientists and their Institutional Review Boards (IRBs). These include how the specimens will be collected, managed, and stored; how access to them will be regulated and monitored; how they will be used to extract relevant biomarkers; and whether and how the laboratory data and survey information will be made available to scientific collaborators, the broader scientific community, and the public.
There is no single formula for adding the collection of biospecimens to a survey; each investigation will have different requirements that will have to be weighed and balanced against constraints of budget, time, field conditions, and participant burden. Significant advance planning, piloting, and revision will be required, and even then it is wise to expect the unexpected.
These cautions are particularly important given the rapid pace of technological development, both in the collection of specimens (e.g., blood, urine, saliva, or hair) and in the ability to analyze data derived from them (ranging from blood glucose levels to C-reactive protein to mercury levels to DNA). The prime example is the analysis of genetic markers. Just a few years ago, analysis of large numbers of biological specimens was limited to examining a small number of candidate genes or, at best, a few thousand genetic markers. But the advent of microarrays that can measure the expression levels of unprecedented numbers of human genes in a single experiment, or profile a million single nucleotide polymorphisms (SNPs) across the genome, began to change the way scientists and the IRBs that oversee their projects conduct their work. For example, pooled data from genome-wide association studies (GWAS), representing the genomes of multiple individuals, were for some time viewed as acceptable for public release. But when recent work on forensic analysis of DNA samples showed that the presence of a single individual could be detected in a large pool of such samples (Homer et al., 2008), researchers and policy makers at the National Institutes of Health (NIH) reconsidered and changed data release policies.
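The intuition behind the Homer et al. (2008) result can be illustrated with a simplified sketch: an individual's genotype sits, on average, slightly closer to the allele frequencies of a pool that contains that individual than to reference population frequencies, and summing these tiny differences across many SNPs makes membership detectable. The statistic below (the difference of distances to the reference and to the pool) follows the spirit of that paper, but the simulation parameters, variable names, and thresholds here are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

n_snps = 10_000   # number of SNPs with published pooled frequencies
n_pool = 1_000    # number of individuals in the pooled GWAS cohort

# Simulated reference population allele frequencies for each SNP.
pop_freq = rng.uniform(0.05, 0.95, n_snps)

# Genotypes of the pooled cohort: 0, 1, or 2 copies of the allele per SNP.
pool_genotypes = rng.binomial(2, pop_freq, size=(n_pool, n_snps))
pool_freq = pool_genotypes.mean(axis=0) / 2  # the "released" pooled frequencies

def membership_statistic(genotype, pop_freq, pool_freq):
    """Mean over SNPs of D_j = |Y_j - pop_j| - |Y_j - pool_j|.

    Y_j is the individual's allele frequency at SNP j (0, 0.5, or 1).
    A clearly positive mean suggests the individual's genotype is closer
    to the pool than to the reference population, i.e., likely membership.
    """
    y = genotype / 2
    d = np.abs(y - pop_freq) - np.abs(y - pool_freq)
    return d.mean()

member = pool_genotypes[0]                    # someone who is in the pool
outsider = rng.binomial(2, pop_freq, n_snps)  # someone who is not

print(membership_statistic(member, pop_freq, pool_freq))    # typically positive
print(membership_statistic(outsider, pop_freq, pool_freq))  # typically near zero
```

The per-SNP signal is tiny (on the order of 1/(2 × pool size)), but with tens of thousands of SNPs it dominates the sampling noise, which is why even aggregate allele frequencies were judged too identifying for unrestricted release.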
Concerns about data release and protected health information have been compounded by the rapid development of next-generation DNA sequencing technologies. Sequencing the first human genome was a 15-year project that cost billions of dollars; new technologies, however, allow a human genome to be sequenced in roughly a month for less than $100,000, and further advances may reduce the cost to $1,000 or less. This capability raises the prospect of having to deal with unprecedented amounts of personal information: the entire genome sequence from large numbers of individuals. While such data may have great potential for the discovery of biomarkers and for functional studies, dealing with the data and their implications for privacy and the protection of human subjects will require addressing many as yet unanswered questions.
The focus in this chapter is on the specimens themselves, not the data