Divergence of Conserved Non-Coding Sequences: Rate Estimates and Relative Rate Tests
     * Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut; Bioinformatik, Institut für Informatik, Universität Leipzig, Leipzig, Germany; Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Wien, Austria

    E-mail: gunter.wagner@yale.edu

    Abstract

    In many eukaryotic genomes only a small fraction of the DNA codes for proteins, but the non-protein-coding DNA harbors important genetic elements that direct the development and physiology of the organism, such as promoters, enhancers, insulators, and micro-RNA genes. The molecular evolution of these genetic elements is difficult to study because their functional significance is hard to deduce from sequence information alone. Here we propose an approach to the study of the rate of evolution of functional non-coding sequences at a macro-evolutionary scale. We identify functionally important non-coding sequences as Conserved Non-Coding Nucleotide (CNCN) sequences from the comparison of two outgroup species. The CNCN sequences so identified are then compared to their homologous sequences in a pair of ingroup species, and we monitor the degree of modification these sequences suffered in the two ingroup lineages. We propose a method to test for rate differences in the modification of CNCN sequences between the two ingroup lineages, as well as a method to estimate their rate of modification. We apply this method to the full sequences of the HoxA clusters from six gnathostome species: a shark, Heterodontus francisci; a basal ray-finned fish, Polypterus senegalus; the amphibian Xenopus tropicalis; and three mammalian species, human, rat, and mouse. The results show that the evolutionary rate of CNCN sequences is not distinguishable among the three mammalian lineages, while the Xenopus lineage has a significantly increased rate of evolution. Furthermore, the estimates of the rate parameters suggest that in the stem lineage of mammals the rate of CNCN sequence evolution was more than twice the rate observed within the placental amniotes clade, suggesting a high rate of evolution of cis-regulatory elements during the origin of amniotes and mammals. We conclude that the proposed methods can be used for testing hypotheses about the rate and pattern of evolution of putative cis-regulatory elements.

    Introduction

    A major mode of developmental gene evolution is based on the modification of cis-regulatory elements (Arnone and Davidson 1997; Stern 2000; Carroll et al. 2001; Davidson 2001; Wray et al. 2003). Binding sites for transcription factors are usually short and variable and are thus hard to identify unambiguously, in particular if the transcription factors involved are not known a priori (Tautz 2000; Ludwig et al. 2000; Dermitzakis et al. 2003). Non-coding sequences, however, can contain islands of strongly conserved segments, so-called phylogenetic footprints (Tagle et al. 1988). In a number of cases it has been shown that these phylogenetic footprints are indicative of functional cis-regulatory elements (Tagle et al. 1988; Manen et al. 1994; Leung et al. 2000; Chiu et al. 2002; Blanchette and Tompa 2002; Santini et al. 2003), reviewed by Duret and Bucher (1997) and Fickett and Wasserman (2000). Hence it is possible in principle to gain insights into the extent and the phylogenetic timing of major changes in the cis-regulatory elements of a gene by studying the phylogenetic pattern of non-coding sequence conservation. In a recent study we presented an efficient computational tool, the tracker program, to simultaneously survey the orthologous intergenic regions in multiple large gene clusters (Prohaska et al. 2004a). This technique allowed us to demonstrate that footprint patterns contain sufficient phylogenetic information, e.g., to resolve the orthology of shark and mammalian Hox clusters (Prohaska et al. 2004b).

    The quantitative analysis of dynamical aspects of footprint loss and acquisition, however, is complicated by the fact that we cannot independently observe individual regulatory DNA regions. Instead, phylogenetic footprinting always detects regulatory elements in pairs of sequences. As a consequence, even very simplistic models of footprint loss lead to rather sophisticated inference and test methods, as we shall see in this contribution. We will focus on two questions: (1) How can we detect rate differences in footprint modification between two different lineages? (2) Can we identify periods in evolution with exceptionally high or low rates of footprint modification?

    Data Acquisition

    Sequence data of HoxA clusters were downloaded from GenBank: Homo sapiens HsA = reverse complement (r.c.) of AC004080, AC010990 r.c. (overlaps 200 nt with AC004080), and AC004079 (pos. 75001-end, r.c., overlaps 200 nt with AC010990), as in Chiu et al. (2002); Heterodontus francisci HfM = AF479755; Polypterus senegalus PsA = AC132195 and AC12632 as in Chiu et al. (2004); Mus musculus MmA = NT_039343 r.c.; Rattus norvegicus Rn = NW_043751; Xenopus tropicalis XtA = AC145789 (downloaded 14/Aug/2003).

    Conserved non-coding sequences are detected using the tracker program (Prohaska et al. 2004a). Very briefly, this approach is based on BLAST (Altschul et al. 1990) for the initial search of all pairs of input sequences restricted to homologous intergenic regions. The resulting list of pairwise sequence alignments is then assembled into groups of partially overlapping regions that are subsequently passed through several filtering steps and finally aligned using the segment-based multiple alignment tool DIALIGN2 (Morgenstern 1999). The final output of the program is the list of these aligned "footprint cliques" (see Supplementary Material online: http://www.bioinf.uni-leipzig.de/Publications/SUPPLEMENTS/04-007/).

    The alignments of all footprint cliques are concatenated and padded with gap characters where data are missing, i.e., where a footprint detected between some sequences does not have a counterpart in others. Consequently, all gap characters are treated as unknown nucleotides rather than as deletions. Conserved positions between groups of sequences are counted as specified in eq. (1) below. To take unknown nucleotides into account, we discount columns with gaps in the relevant sequences by a factor of 1/4 for each gap; data are summarized in tables 1 and 2.
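    The gap-discounting rule can be illustrated with a minimal sketch. It assumes that the aligned sequences are given as equal-length strings with "-" as the gap character, that a column counts as conserved when all known nucleotides in it agree, and that columns consisting only of gaps are skipped. The function name conserved_count and these conventions are illustrative assumptions, not part of the tracker program.

```python
from fractions import Fraction

GAP = "-"  # assumed gap character; adjust to the alignment format in use


def conserved_count(rows, gap_discount=Fraction(1, 4)):
    """Count conserved alignment columns among the given rows.

    A column contributes 1 if all rows carry the same nucleotide.
    Each gap (unknown nucleotide) in the column multiplies that
    contribution by gap_discount. Columns whose known nucleotides
    disagree, or that contain only gaps, contribute nothing.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    total = Fraction(0)
    for column in zip(*rows):
        residues = {c.upper() for c in column if c != GAP}
        if len(residues) != 1:
            continue  # mismatch among known nucleotides, or all gaps
        gaps = sum(1 for c in column if c == GAP)
        total += gap_discount ** gaps
    return total


# Columns: A/A/A -> 1, C/C/C -> 1, G/-/G -> 1/4, T/T/A -> 0
print(conserved_count(["ACGT", "AC-T", "ACGA"]))  # prints 9/4
```

    The factor 1/4 can be read as the chance that an unknown nucleotide happens to match the conserved one if all four bases were equally likely; that reading is an interpretation rather than a statement from the text.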

    Table 1 Summary of Relative Rate Tests for the Rate of Modification of Ancestrally Conserved Non-coding Sequences

    Table 2 Same Data as in Table 1 but Analyzed with the Tajima Relative Rate Test (see Appendix)

    A Model

    Consider the tree in figure 1. This tree has four taxa: the two ingroup taxa A and B to be compared and two outgroup taxa O and X. In contrast to a relative rate test for coding sequences, we need data from two outgroups because putative cis-regulatory elements cannot be reliably identified from a single sequence. Instead, we use the comparison between the two outgroup taxa O and X to identify conserved non-coding nucleotide (CNCN) positions. These CNCN sequences are then compared to the orthologous sequences in the two ingroup taxa and their degree of modification is assessed. To do this we introduce a simple model of loss of conservation along the lineages leading to the terminal ingroup taxa. Suppose we have a set $\mathcal{Q}$ of CNCN positions of footprint cliques at Q = lca(X, (A, B)), and let $q = |\mathcal{Q}|$ denote the number of CNCN positions. We assume that CNCN positions are lost according to a simple exponential decay law. Furthermore, suppose the rate is the same everywhere, with the exception of the lineage from P = lca(A, B) to B.

    FIG. 1.— Each test for the rate of change in CNCN sequences is based on the comparison of four sequences O, X, A, and B. O and X are outgroup sequences, which serve for the detection of conserved non-coding sequences. The additive evolutionary distance between O and X is assumed to be long enough to randomize sequences which are not under stabilizing selection. Following Tagle et al. (1988) we only accept outgroup sequences with at least 250 Myr of additive evolutionary time between them. A and B are the two ingroup sequences. Q is the most recent common ancestor of X and (A, B), which existed at a time T before the present, and P is the most recent common ancestor of A and B, which existed at a time T2 before the present. We test whether the rate of modification along the branch P – B is different from that along the branch P – A, where it is assumed that the rate of evolution along P – A is the same as in the rest of this tree.
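    The exponential loss model on this tree can be sketched numerically. The snippet below is only an illustration under stated assumptions: it follows the q ancestral CNCN positions from Q to the ingroup tips, ignores the outgroup branches, and treats losses on different branches as independent. The names expected_retention, rate, rate_pb, t_q, and t_p are hypothetical; t_q and t_p stand for T and T2 in figure 1.

```python
import math


def expected_retention(q, rate, rate_pb, t_q, t_p):
    """Expected numbers of the q ancestral CNCN positions still conserved.

    rate    : loss rate on all branches except P -> B
    rate_pb : loss rate on the branch P -> B
    t_q     : age of Q (T in figure 1)
    t_p     : age of P (T2 in figure 1), with 0 <= t_p <= t_q
    """
    if not 0 <= t_p <= t_q:
        raise ValueError("need 0 <= t_p <= t_q")
    stem = math.exp(-rate * (t_q - t_p))   # survival from Q to P
    branch_a = math.exp(-rate * t_p)       # survival from P to A
    branch_b = math.exp(-rate_pb * t_p)    # survival from P to B
    return {
        "A": q * stem * branch_a,              # retained in A
        "B": q * stem * branch_b,              # retained in B
        "AB": q * stem * branch_a * branch_b,  # retained in both
    }


# With equal rates the two ingroup lineages retain the same expected number;
# a higher rate on P -> B lowers the expected retention in B only.
print(expected_retention(1000, 0.01, 0.01, 100.0, 40.0))
print(expected_retention(1000, 0.01, 0.02, 100.0, 40.0))
```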

    Given an outgroup O we may consider all those CNCN sequences that appear in O and in at least one of the three species A, B, and X. The measurable variables then are

    (1)

    The values c_kl are counts of CNCN sites shared among the taxa indicated by the subscripts k and l. Given the model in figure 1 we can readily express the observable CNCN counts in terms of the model parameters

    (2)

    This model assumes rate homogeneity among nucleotide positions. This assumption is certainly not met in any real sequence. We thus note that the test proposed here is based on simplifying assumptions, similar to those of many methods for coding sequence evolution. A short computation yields

    (3)

    The last line of eq. (2) then becomes

    (4)

    Multiplying with q and solving for yields

    (5)

    The variance of an exponential process with decay constant a and initial value b is

    (6)
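    One common way to read such a variance, offered here as an assumption rather than as the authors' formula, is that each of the b initial positions survives independently to time t with probability exp(-a t), so that the number of survivors is binomially distributed. The sketch below computes the mean and variance in closed form and checks them against a direct summation over the binomial distribution; the function names are illustrative.

```python
import math


def survivor_moments(b, a, t):
    """Closed-form mean and variance of the number of survivors,
    assuming independent survival with probability exp(-a * t)."""
    p = math.exp(-a * t)
    return b * p, b * p * (1.0 - p)


def survivor_moments_exact(b, a, t):
    """Same moments obtained by summing over the binomial distribution."""
    p = math.exp(-a * t)
    pmf = [math.comb(b, k) * p**k * (1.0 - p) ** (b - k) for k in range(b + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var


print(survivor_moments(50, 0.2, 3.0))
print(survivor_moments_exact(50, 0.2, 3.0))
```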

    We are interested in the variance σ² of the difference of the loss rates along PA and PB, which equals twice the variance of the exponential process along one of the lineages. Thus

    (7)

    The number of CNCN positions exclusively lost along PA is for PB we have Thus The test statistic is therefore

    (8)

    Equation (8) gives a test statistic which assumes that the loss of conservation at each nucleotide position is stochastically independent. This assumption, however, is not plausible if the elementary event in the evolution of an enhancer is the loss or gain of a transcription factor binding site. Typically, transcription factor binding sites are between 5 and 20 nucleotide positions long, but they have various degrees of degeneracy. Evolutionary changes in the number and kind of transcription factor binding sites thus induce a stochastic dependency among the nucleotide positions compared here. To account for this stochastic dependency, we scale the predicted sampling variance with the average length of contiguous CNCN sequence elements in our data. This length is on the same scale as many known transcription factor binding motifs. The resulting test statistic then is

    (9)

    which is normally distributed with variance 1.
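    A statistic that is standard normal under the null hypothesis can be converted into a p-value as in the following sketch. Two points are assumptions made for illustration: the test is taken to be two-sided, and scaling the sampling variance by the mean block length is taken to mean dividing the uncorrected statistic by the square root of that length. The names rescale_z and two_sided_p are hypothetical.

```python
import math


def rescale_z(z, mean_block_length):
    """Divide z by sqrt(mean_block_length), i.e. inflate the variance
    by the mean length of contiguous conserved blocks (assumed reading)."""
    if mean_block_length <= 0:
        raise ValueError("mean_block_length must be positive")
    return z / math.sqrt(mean_block_length)


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


print(round(two_sided_p(1.959964), 4))  # prints 0.05
```

    Under these assumptions a value of 0.97 corresponds to a two-sided p-value of roughly 0.33, far from conventional significance thresholds.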

    Estimating Footprint Loss Rates

    As outlined in the previous section, the number of shared and unique CNCN positions among the four taxa X, O, A, and B can be interpreted in terms of the parameters of an exponential loss model. In particular, it is possible to derive expressions for T and T2. If, in addition, we have independent estimates for the time of divergence of the taxa compared we could, from all possible four-taxa comparisons, estimate the loss rate along the lineage from the most recent common ancestor lca(A, B) of A and B and one of the two taxa, A or B. While this exercise is computationally straightforward, the interpretation of the numbers so obtained needs careful attention to estimation biases.

    One problem with the raw estimates obtained from solving the equations of the model for the parameters T is that the CNCN sequences detected in the comparison between the two outgroup species, O and X, contain spurious CNCN positions. These are nucleotide positions which are identical between the sequences of O and X but are only identical by chance rather than as a consequence of purifying selection. While tracker and other alignment procedures are designed to identify significant stretches of conserved sequence, there is a possibility that at the borders of conserved sequence blocks spurious sites are included in the count of CNCN sites. There is no objective way to eliminate them from the sequence alignment, but it is possible to determine their influence on the estimates of the rate parameters.

    Let o(T) be the rate parameter observed from a comparison in which the most recent common ancestor of A and B lived T years before the present, and let us assume the loss of CNCN positions is time homogeneous. Then this estimated rate is determined by the true rate c, as well as by the number of spurious CNCN positions. The true CNCN positions evolve at a rate c, but the spurious sites randomize much more quickly than the true CNCN sites. Over the timescales we consider in this article, these spurious positions randomize instantaneously, and thus they contribute an additive term to the true rate to give the observed rate

    (10)

    Hence the observed rate o is predicted to be a linear function of 1/T with a slope which depends on the logarithm of the fraction of spurious CNCN positions, and an intercept equal to an estimate of the true rate c:

    (11)

    That means that if we have estimates for at least two time points we can do a linear regression of the observed rate parameters. The intercept is then a corrected rate estimate c, and the slope an estimate of the fraction of spurious CNCN sites in the alignment of sequences from O and X.
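    The regression can be sketched as follows. The sketch assumes that a fraction s of the CNCN positions is spurious and lost immediately, while the remainder decays at the true rate c, so that the retained fraction after time T is (1 - s) exp(-c T). The observed rate is then -ln[(1 - s) exp(-c T)] / T = c - ln(1 - s) / T, which is linear in 1/T with intercept c and slope -ln(1 - s). This derivation and the names used below are assumptions for illustration; times and rates must use matching time units.

```python
import math


def fit_rate_vs_inverse_time(times, observed_rates):
    """Ordinary least squares of observed rate on 1/T.

    Returns (true_rate, slope, spurious_fraction), where
    spurious_fraction = 1 - exp(-slope) under the assumed model.
    """
    if len(times) != len(observed_rates) or len(times) < 2:
        raise ValueError("need at least two (time, rate) pairs")
    x = [1.0 / t for t in times]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(observed_rates) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("times must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, observed_rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, 1.0 - math.exp(-slope)


# Synthetic illustration (not data from the study): generate observed rates
# from a known true rate and spurious fraction, then recover both.
true_rate, spurious = 0.2, 0.04
times = [0.407, 1.12, 3.6]
rates = [true_rate - math.log(1.0 - spurious) / t for t in times]
print(fit_rate_vs_inverse_time(times, rates))
```

    With only two time points the fitted line passes through both points exactly, so no residual information is available to judge linearity.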

    The HoxA Clusters of Gnathostomes

    We applied the method described above to a data set containing the full HoxA cluster sequences of three mammal species, human, rat, and mouse, as well as an amphibian, Xenopus tropicalis; the basal ray-finned fish bichir, Polypterus senegalus (Chiu et al. 2004); and the shark Heterodontus francisci (Kim et al. 2000; Chiu et al. 2002) (fig. 2). A subset of the tests done for this data set is presented in table 1.

    FIG. 2.— Phylogenetic tree of the taxa used in this study and their divergence times in Myr (Kumar and Hedges 1998). Hf: Heterodontus francisci; Ps: Polypterus senegalus; Xt: Xenopus tropicalis; Hs: Homo sapiens; Mm: Mus musculus, Rn: Rattus norvegicus.

    The comparison of the three mammalian species shows that the rate of modification of CNCN positions is similar, leading to the retention, r, of about 35% of the CNCN positions detectable in the outgroup species. The CNCN retention rate is the same whether shark and bichir are used as outgroups or whether bichir and frog are used. The z' statistic for differences in the rate of modification of CNCN positions is between 0.32 and 0.97, and those differences are all far from significant.

    The comparison of the Xenopus sequence with the three mammalian data sets shows that the rate of modification of CNCN positions in the Xenopus lineage is higher than in the mammalian lineages. The Xenopus lineage retains about 33% of the CNCN positions detected in the comparison of shark and bichir sequences, whereas the mammalian lineages retain about 35%. The comparison between Xenopus and human is significant at the 0.011 level and the comparison with mouse at the 0.039 level, whereas the comparison with rat is only marginally significant at the 0.067 level. Hence it seems that the Xenopus lineage experiences a higher rate of modification of CNCN positions than the mammalian lineages.

    The results from the new test were compared to the Tajima relative rate test (Tajima 1993), which can also be applied to the kind of data analyzed here (see Appendix). In table 2 the results for the Tajima test of the same data as in table 1 are summarized. The results are consistent with the ones from the z'-statistic (table 1), confirming that the Xenopus lineage evolves faster than the mammalian lineage. None of the comparisons of mammalian HoxA clusters are significant, but all the comparisons between Xenopus and the mammals are significant at least at the 5% level.

    The rate parameter estimated from the model for the three different mammalian lineages varies depending on the outgroup taxa used. The rate estimates are consistently smaller the more distant the most recent common ancestor of the compared taxa. This effect is anticipated based on the arguments put forward above (see under Estimating Footprint Loss Rates). The problem is that the comparison of the two outgroup species O and X will identify a number of spurious CNCN positions, which are identical in O and X due to chance. These CNCN positions then enter the estimation of the rate of evolution since the most recent common ancestor of A and B and inflate the rate estimate.

    To correct for this effect we performed a linear regression of rate estimates over the inverse of the time T since the most recent common ancestor of A and B. First we analyzed the rate estimates for the mammalian data with all possible combinations of outgroup species. The intercept was 0.218, but the data revealed a deviation from linearity in the plot of the residuals over 1/T. The regressions were thus repeated for data points using either only more distant (360 and 112 Myrs) or only the more recent common ancestors (112 and 40.7 Myrs). The rate estimates are 0.153 ± 0.071 (events 110g years) for the more recent time points and 0.378 ± 0.067 for the more distant time points. These results suggest that there is systematic rate variation in the evolution of mammalian lineages such that the rate of modification of CNCN positions is considerably higher in the stem lineage of amniotes and mammals than among placental mammals.

    The slope of the regression equation (11) over 1/T allows an estimate of the fraction of spurious CNCN positions. These values suggest that only between 2% and 5% of the CNCN positions entering these calculations are spurious and thus do not greatly affect the variance used in calculating the z' statistic for the relative rate test.

    Discussion

    In this article we describe a method for detecting rate heterogeneity in the evolution of putative cis-regulatory elements. Rate heterogeneity can be detected both between two lineages and along the same lineage over different time frames. The approach detects putative cis-regulatory elements through their conservation among two outgroup species and records the rate of modification of CNCN sequences along two ingroup lineages. This approach comes with advantages as well as disadvantages. The advantage is that one does not have to rely on notoriously noisy predictions of transcription factor binding sites to assess the presence of cis-regulatory sites. The phylogenetic conservation of non-coding sequences is taken as evidence for the functional importance of non-coding DNA sequences (Tagle et al. 1988; Manen et al. 1994; Duret and Bucher 1997; Leung et al. 2000; Fickett and Wasserman 2000; Chiu et al. 2002; Blanchette and Tompa 2002; Santini et al. 2003; Ghanem et al. 2003). The disadvantage of this approach is that it is known that functionally conserved cis-regulatory elements can quickly lose their sequence similarity and would thus not be detectable as conserved non-coding sequences (Ludwig 2002; Phinchongsakuldit et al. 2004). At the same time, there are examples of functionally conserved enhancers which also retain sequence conservation over long evolutionary distances [e.g., Shashikant et al. (1998)]. The reasons for the differences of sequence conservation of functionally conserved cis-regulatory elements are unknown but may be related to such general factors as population size and mutation rate (Carter and Wagner 2002). We thus propose that the method presented here should be used primarily in a hypothesis testing framework. Below we outline a few scenarios in which the proposed test might be useful.

    The method might be useful to test the following hypothesis. It is plausible that the adaptation of a gene to a new function is not limited to the coding region of the gene, but also affects the cis-regulatory elements determining the location, timing, and level of expression. Although it is relatively routine to detect selection in coding regions (Liberles et al. 2001), adaptive evolution of cis-regulatory elements is hard to detect in general [but see Kohn et al. (2004)]. The following hypothesis, however, is testable: if the coding region of a group of genes is under directional selection in one lineage (say, B) but not in another lineage (say, A), then the cis-regulatory elements will also evolve more quickly in lineage B than in lineage A. This hypothesis could be tested by comparing the rate of modification of CNCN sequences in these two lineages.

    Another hypothesis testable by the proposed approach is that cis-regulatory elements of duplicated genes diverge asymmetrically, i.e., that one of the duplicates diverges faster than the other. This has been shown to be the case for coding sequences [e.g., Wagner (2002), Conant and Wagner (2003), Kondrashov et al. (2002), Zhang et al. (2003), and Kellis et al. (2004)], but it has not to our knowledge been demonstrated for cis-regulatory elements. Another hypothesis is that putative cis-regulatory elements evolve faster when the expression patterns of the genes in the same genomic region undergo evolution. A limited result along these lines has been presented in the example data set analyzed in this article, i.e., the HoxA cluster sequences of gnathostomes. The results suggest that, in the stem lineage of mammals and amniotes, the rate of CNCN sequence evolution is more than twice as high as among the placental mammals human, mouse, and rat. This result is preliminary because of limited taxon sampling, but it is consistent with the idea that body-plan evolution involves major rewiring of transcriptional regulation of developmental genes (Davidson 2001).

    The usefulness of the proposed method strongly depends on the extent of taxon sampling. The example data set analyzed for this article consists of the complete sequences of HoxA clusters from six species. The continuing efforts to sequence the genomes from representatives of major clades will certainly increase the number of taxa that can be included in a comparative study of their non-coding sequences. Data sets from many different species will have considerable statistical and cladistic power if analyzed with appropriate statistical tools.

    Appendix

    Tajima's relative rate test (Tajima 1993) concerns the rates of evolution along the terminal edges PA and PB, where P = lca(A, B). In order to measure the rate of CNCN loss between P and A, however, we need an outgroup X, since only the numbers defined in eq. (1) can be obtained directly from the data. Let

    (12)

    be the numbers of CNCN positions that are present in X and are lost along PA but not along PB, and vice versa. The original Tajima statistic assumes that each residue compared is stochastically independent, which is not likely in the case of loss of conservation in putative cis-regulatory sequences, because the elementary evolutionary event is likely the loss of a transcription factor binding site. We thus correct for stochastic dependency in the same way we did for the z'-statistic proposed in this article, by dividing the χ² value, which is a variance, by the average length of contiguous conserved sequences. In order to test whether m_A and m_B are significantly different, we therefore consider the test statistic

    (13)

    Because we have a single degree of freedom, χ² is significant at the 95% level if χ² > 3.841.
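    The corrected test can be illustrated with a short sketch. It uses the standard form of Tajima's statistic, (m_A - m_B)² / (m_A + m_B), and divides it by the mean length of contiguous conserved blocks, as the text describes. The function names, the parameter mean_block_length, and the example counts are illustrative assumptions and are not taken from the study.

```python
import math

CHI2_CRIT_1DF_95 = 3.841  # 95% critical value, one degree of freedom


def tajima_chi2(m_a, m_b, mean_block_length=1.0):
    """Tajima's relative rate statistic, divided by the mean block length
    to account for dependence among neighboring positions."""
    total = m_a + m_b
    if total <= 0:
        raise ValueError("m_a + m_b must be positive")
    if mean_block_length <= 0:
        raise ValueError("mean_block_length must be positive")
    return (m_a - m_b) ** 2 / (total * mean_block_length)


def chi2_p_value_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical counts: 30 positions lost only along PA, 10 only along PB.
uncorrected = tajima_chi2(30, 10)                      # 10.0
corrected = tajima_chi2(30, 10, mean_block_length=5)   # 2.0
print(uncorrected > CHI2_CRIT_1DF_95, corrected > CHI2_CRIT_1DF_95)
# prints: True False
```

    In this made-up example the difference is significant when positions are treated as independent but not after the correction, which shows how strongly the assumed block length can affect the conclusion.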

    Acknowledgements

    This work was supported by National Science Foundation (NSF) grant INB-0321470 to G.P.W. and the DFG Bioinformatics Initiative BIZ-6/1-2.

    References

    Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.

    Arnone, M. I., and E. H. Davidson. 1997. The hardwiring of development: Organization and function of genomic regulatory systems. Development 124:1851–1864.

    Blanchette, M., and M. Tompa. 2002. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res. 12:739–748.

    Carroll, S. B., J. K. Grenier, and S. D. Weatherbee. 2001. From DNA to Diversity. Blackwell Science, Malden, Mass.

    Carter, A. J., and G. P. Wagner. 2002. Evolution of functionally conserved enhancers can be accelerated in large populations: a population-genetic model. Proc. R. Soc. Lond. B Biol. Sci. 269:953–960.

    Chiu, C.-H., C. Amemiya, K. Dewar, C.-B. Kim, F. H. Ruddle, and G. P. Wagner. 2002. Molecular evolution of the HoxA cluster in the three major gnathostome lineages. Proc. Natl. Acad. Sci. USA 99:5492–5497.

    Chiu, C.-H., K. Dewar, G. P. Wagner, K. Takahashi, F. Ruddle, C. Ledje, P. Bartsch, J.-L. Scemama, E. Stellwag, C. Fried et al. 2004. Bichir HoxA cluster sequence reveals surprising trends in ray-finned fish genomic evolution. Genome Res. 14:11–17.

    Conant, G. C., and A. Wagner. 2003. Asymmetric sequence divergence of duplicate genes. Genome Res. 13:2052–2058.

    Davidson, E. 2001. Genomic Regulatory Systems. Academic Press, San Diego, Calif.

    Dermitzakis, E. T., C. M. Bergman, and A. G. Clark. 2003. Tracing the evolutionary history of Drosophila regulatory regions with models that identify transcription factor binding sites. Mol. Biol. Evol. 20:703–714.

    Duret, L., and P. Bucher. 1997. Searching for regulatory elements in human noncoding sequences. Curr. Opin. Struct. Biol. 7:399–406.

    Fickett, J. W., and W. W. Wasserman. 2000. Discovery and modeling of transcriptional regulatory regions. Curr. Opin. Biotechnol. 11:19–24.

    Ghanem, N., O. Jarinova, A. Amores, L. Qiaoming, G. Hatch, B. K. Park, J. L. R. Rubenstein, and M. Ekker. 2003. Regulatory roles of conserved intergenic domains in vertebrate Dlx bigene clusters. Genome Res. 13:533–543.

    Kellis, M., B. W. Birren, and E. S. Lander. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428:617–624.

    Kim, C. B., C. Amemiya, W. Bailey, K. Kawasaki, J. Mezey, W. Miller, S. Minoshima, N. Shimizu, G. P. Wagner, and F. Ruddle. 2000. Hox cluster genomics in the horn shark, Heterodontus francisci. Proc. Natl. Acad. Sci. USA 97:1655–1660.

    Kohn, M. H., S. Fang, and C. I. Wu. 2004. Inference of positive and negative selection on the 5' regulatory regions of Drosophila genes. Mol. Biol. Evol. 21:374–383.

    Kondrashov, F. A., I. B. Rogozin, Y. I. Wolf, and E. V. Koonin. 2002. Selection in the evolution of gene duplications. Genome Biol. 3:RESEARCH0008.

    Kumar, S., and B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917–920.

    Leung, J. Y., F. E. McKenzie, A. M. Uglialoro, P. O. Flores-Villanueva, B. C. Sorkin, E. J. Yunis, D. L. Hartl, and A. E. Goldfeld. 2000. Identification of phylogenetic footprints in primate tumor necrosis factor-α promoters. Proc. Natl. Acad. Sci. USA 97:6614–6618.

    Liberles, D. A., D. R. Schreiber, S. Govindarajan, S. Chamberlin, and S. A. Benner. 2001. The adaptive evolution database (TAED). Genome Biol. 2:RESEARCH0028.

    Ludwig, M. Z. 2002. Functional evolution of noncoding DNA. Curr. Opin. Genet. Dev. 12:634–639.

    Ludwig, M. Z., C. Bergman, N. H. Patel, and M. Kreitman. 2000. Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403:564–567.

    Manen, J., V. Savolainen, and P. Simon. 1994. The atpB and rbcL promoters in plastid DNAs of a wide dicot range. J. Mol. Evol. 38:577–582.

    Morgenstern, B. 1999. DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15:211–218.

    Phinchongsakuldit, J., S. MacArthur, and J. F. Y. Brookfield. 2004. Evolution of developmental genes: molecular microevolution of enhancer sequences at the Ubx locus in Drosophila and its impact on developmental phenotypes. Mol. Biol. Evol. 21:348–363.

    Prohaska, S., C. Fried, C. Flamm, G. Wagner, and P. F. Stadler. 2004a. Surveying phylogenetic footprints in large gene clusters: applications to Hox cluster duplications. Mol. Phylogenet. Evol. 31:581–604.

    Prohaska, S. J., C. Fried, C. T. Amemiya, F. H. Ruddle, G. P. Wagner, and P. F. Stadler. 2004b. The shark HoxN cluster is homologous to the human HoxD cluster. J. Mol. Evol. 58:212–217.

    Santini, S., J. L. Boore, and A. Meyer. 2003. Evolutionary conservation of regulatory elements in vertebrate Hox gene clusters. Genome Res. 13:1111–1122.

    Shashikant, C., C. B. Kim, M. A. Borbley, W. C. Wang, and F. H. Ruddle. 1998. Comparative studies on mammalian Hoxc8 early enhancer sequence reveal a baleen whale–specific deletion of a cis-acting element. Proc. Natl. Acad. Sci. USA 95:15446–15451.

    Stern, D. L. 2000. Evolutionary developmental biology and the problem of variation. Evolution 54:1079–1091.

    Tagle, D. A., B. F. Koop, M. Goodman, J. L. Slightom, D. L. Hess, and R. T. Jones. 1988. Embryonic epsilon and gamma globin genes of a prosimian primate (Galago crassicaudatus). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol. 203:439–455.

    Tajima, F. 1993. Simple methods for testing molecular clock hypothesis. Genetics 135:599–607.

    Tautz, D. 2000. Evolution of transcriptional regulation. Curr. Opin. Genet. Dev. 10:575–579.

    Wagner, A. 2002. Asymmetric functional divergence of duplicate genes in yeast. Mol. Biol. Evol. 19:1760–1768.

    Wray, G. A., M. W. Hahn, E. Abouheif, J. P. Balhoff, M. Pizer, M. V. Rockman, and L. A. Romano. 2003. The evolution of transcriptional regulation in eukaryotes. Mol. Biol. Evol. 20:1377–1419.

    Zhang, P., Z. Gu, and W. H. Li. 2003. Different evolutionary patterns between young duplicate genes in the human genome. Genome Biol. 4:R56.

    (Günter P. Wagner*, Claudi)