far apart they are supposed to be in the puzzle. These complications present both opportunities and challenges for mathematical analysis.
Human DNA is a long molecule shaped like a spiral staircase, in which each step contains a pair of chemical bases that fit together like a tongue-and-groove joint. Adenine (A) fits together with thymine (T), and cytosine (C) fits together with guanine (G). Each base fits only one of the others, so the sequence of letters along one side of the staircase (GATTCC…) uniquely determines the corresponding sequence on the other side (CTAAGG…), which is conventionally read in the opposite direction (… GGAATC). Like a photographic negative, one strand is a template for duplicating the other (see Figures 19 and 20).
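The pairing rule can be checked mechanically. The short Python sketch below uses only the A–T and C–G pairings stated above; the function names are illustrative choices, not part of any standard library.

```python
# Pairing rule from the text: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the opposite strand, letter by letter, in the same direction."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the opposite strand read in the conventional (reversed) direction."""
    return complement(strand)[::-1]


print(complement("GATTCC"))          # CTAAGG
print(reverse_complement("GATTCC"))  # GGAATC
```

Applying `reverse_complement` twice returns the original strand, which is one way of seeing why each strand can serve as a template for the other.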
In all, human DNA contains about 3 billion “base pairs,” or rungs of the staircase. The goal of the Human Genome Project was to list all of them in order. Unfortunately, chemists can sequence only a few hundred base pairs at a time. To sequence the whole genome, scientists had to chop it into millions of shorter pieces, sequence those pieces, and reassemble them.
The publicly funded Human Genome Project and the privately funded Celera Genomics adopted two different strategies, both of which eventually led to the same mathematical problem. You have millions of short (500-base) overlapping puzzle pieces that have been completely scrambled by the chopping process. There are enough pieces to cover the length of the genome seven or eight times over, so there are many overlaps between pieces. You want to use these overlaps as a guide to assemble the pieces into contiguous regions that are as long as possible.
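To make the puzzle concrete, here is a minimal Python sketch of one simple approach: repeatedly merge the two pieces with the longest overlap. This greedy rule is an illustration only, not the method either project used. The toy fragments, the function names, and the `min_len` threshold are all invented for the example, and the sketch ignores sequencing errors, repeated regions, and the fact that a piece may come from either strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Overlaps shorter than `min_len` are treated as coincidence and count as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Merge the best-overlapping pair until no overlaps remain.

    Returns the list of contiguous regions that are left.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return frags  # nothing left to merge
        # Glue the pair together, writing the shared part only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)


# Three scrambled pieces of a made-up 16-letter sequence.
pieces = ["GCTTAACG", "GATTCCAG", "CCAGGCTT"]
print(greedy_assemble(pieces))  # ['GATTCCAGGCTTAACG']
```

The sketch compares every piece with every other piece, which is fine for three fragments but hopeless at full scale: 3 billion bases covered seven or eight times by 500-base pieces means roughly 42 to 48 million pieces.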
Figure 19. Human DNA can be extracted from biological tissue such as skin and blood, and its unique sequence of bases can be determined. Image courtesy of the National Institutes of Health.