As we have illustrated, a variety of genes could be involved in evolutionary shifts in flower color in Aquilegia. Flower color could be affected by changes in the function or activity of enzymes encoded by genes in either the core or side branches of the flavonoid biochemical pathway, or by mutations affecting gene expression through either cis- or trans-regulation. Multiple copies of genes could also be involved. Some researchers have predicted that changes are more likely to occur at trans-regulators (Clegg and Durbin, 2000; Whittall et al., 2006a), because this would allow the structural genes to be used in other tissues. However, no study to date has considered the expression and function of the entire flavonoid pathway (Fig. 2.3). To initiate such a study, we sought to identify candidate genes for the flavonoid pathway in Aquilegia. Using genes of known function (primarily from Arabidopsis) as queries, we conducted tblastn searches of the Aquilegia gene index (http://compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/gimain.pl?gudb=aquilegia). This database is derived from EST sequencing of a broad range of tissues and developmental stages of greenhouse hybrids between A. formosa and A. pubescens, including floral tissue producing anthocyanins. The current gene index contains 13,556 tentative consensus (TC) sequences and 7,278 singleton ESTs and thus likely represents a large fraction of the expressed genes in Aquilegia. We then used the strongest Aquilegia hits from these initial searches as queries in tblastx searches of The Arabidopsis Information Resource and National Center for Biotechnology Information databases, and we retained as potential flavonoid pathway genes those whose best hit was the originally characterized gene.
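The reciprocal search described above can be illustrated with a minimal sketch. It assumes the BLAST results have already been reduced to (query, subject, bit score) rows with one row per query-subject pair; the function names, the cutoff of three forward hits, and all sequence identifiers below are hypothetical and serve only to illustrate the filtering logic, not to reproduce the actual searches.

```python
from collections import defaultdict


def top_hits(rows, n=1):
    """Return {query: [up to n subjects, ordered by descending bit score]}.

    rows is an iterable of (query, subject, bitscore) tuples, assumed to
    contain one row per query-subject pair.
    """
    by_query = defaultdict(list)
    for query, subject, bitscore in rows:
        by_query[query].append((bitscore, subject))
    return {
        query: [subject for _, subject in sorted(hits, key=lambda h: -h[0])[:n]]
        for query, hits in by_query.items()
    }


def reciprocal_candidates(forward_rows, reverse_rows, n_forward=3):
    """Keep Aquilegia hits whose best reverse hit is the original reference gene.

    forward_rows: reference gene -> Aquilegia sequence (the tblastn step).
    reverse_rows: Aquilegia sequence -> reference database (the tblastx step).
    n_forward is an assumed cutoff for how many "strongest hits" to test.
    """
    forward = top_hits(forward_rows, n=n_forward)
    reverse_best = {q: hits[0] for q, hits in top_hits(reverse_rows, n=1).items()}
    retained = defaultdict(list)
    for ref_gene, aquilegia_hits in forward.items():
        for aquilegia_seq in aquilegia_hits:
            if reverse_best.get(aquilegia_seq) == ref_gene:
                retained[ref_gene].append(aquilegia_seq)
    return dict(retained)


# Hypothetical identifiers and scores, for illustration only.
FORWARD_ROWS = [
    ("F3H_ref", "TC_a", 410.0),
    ("F3H_ref", "TC_b", 150.0),
    ("ANS_ref", "TC_c", 380.0),
    ("ANS_ref", "TC_b", 140.0),
]
REVERSE_ROWS = [
    ("TC_a", "F3H_ref", 405.0),
    ("TC_a", "ANS_ref", 120.0),
    ("TC_b", "OTHER_2ODD", 200.0),
    ("TC_b", "F3H_ref", 150.0),
    ("TC_c", "ANS_ref", 390.0),
]

print(reciprocal_candidates(FORWARD_ROWS, REVERSE_ROWS))
```

In this toy input, TC_b is a forward hit for both reference genes but is discarded because its best reverse hit is a different family member. This is the kind of ambiguity that arises in large multigene families.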
Several genes in the flavonoid pathway belong to large multigene families, which makes the reciprocal-best-hit criterion for identifying homologous genes suspect. However, phylogenetic analyses of a number of these multigene families have identified clades that contain proteins with a common function (Gebhardt et al., 2005; Bogs et al., 2006; Nakatsuka et al., 2008b). Thus, after identifying candidates for these genes through tblastx searches, we included their inferred protein sequences in ClustalW alignments of genes from these studies and conducted neighbor-joining analysis. We then assigned genes to functional groups based on their phylogenetic clustering. For instance, F3H, FLS, and ANS all belong to a large gene family of 2-oxoglutarate-dependent dioxygenases (2-ODDs). Nevertheless, across multiple species, genes with the same enzymatic function form monophyletic clades in phylogenetic analyses of 2-ODDs (Gebhardt et al., 2005) (Fig. 2.4A). Inclusion of Aquilegia sequences in the alignment of 2-ODDs resulted in a tree with one Aquilegia sequence in each of the F3H and ANS clades and two sequences in the FLS clade (Fig. 2.4A). We used a similar approach to identify F3′H and F3′5′H homologs using the