Adaptive Evolution Drives the Diversification of Zinc-Finger Binding D
    Source: 分子生物学进展 (Molecular Biology and Evolution), 2004, issue 12
     Center for Applied Mathematics and Department of Mathematics, Cornell University, Ithaca, New York

    E-mail: rtd1@cornell.edu.

    Abstract

    The human genome is estimated to contain 700 zinc-finger genes, which perform many key functions, including regulating transcription. The dramatic increase in the number of these genes as we move from yeast to C. elegans to Drosophila and to humans, as well as the clustered organization of these genes in humans, suggests that gene duplication has played an important role in expanding this family of genes. Using likelihood methods developed by Yang and parsimony methods introduced by Suzuki and Gojobori, we have investigated four clusters of zinc-finger genes on human chromosome 19 and found evidence that positive selection was involved in diversifying the family of zinc-finger binding motifs.
    Key Words: tandem gene duplication; adaptive evolution; zinc-finger genes

    Introduction

    In the human genome, there are hundreds of zinc-finger genes organized into more than a dozen different families (Gell, Crossley, and Mackay 2003). Here, we will concentrate on the C2H2 type, a 28 amino acid motif that is named for the two cysteines and two histidines that form a tetrahedral complex around a zinc ion to produce the finger structure (Miller, McLachlan, and Klug 1985) (fig. 1). Zinc fingers are tandemly repeated at the end of zinc-finger genes. The number of repeats ranges from two up to three dozen or more. In rodents and in humans, about one-third of the zinc-finger genes carry the Krüppel-associated box (KRAB), a potent repressor of transcription (Margolin et al. 1994), which is named for the Drosophila segmentation gene Krüppel (Schuh et al. 1986; Bellefroid et al. 1991). There are more than 200 KRAB-containing zinc-finger genes in the human genome, about 40% of which reside on chromosome 19 and show a clustered organization suggesting an evolutionary history of duplication events (Dehal et al. 2001).
    FIG. 1.— Structure of a zinc finger. Stars indicate sites involved in DNA binding.

    The total number of zinc-finger genes appears to have increased through evolution. There are 564 to 706 in humans compared with 234 to 357 in D. melanogaster, 68 to 151 in C. elegans, and 34 to 48 in S. cerevisiae (Lander et al. 2001; Venter et al. 2001). The average number of fingers per gene has increased, numbering 8, 3.5, 2.5, and 1.5, respectively, in the four species just mentioned (Looman 2003).
    In addition to a general increase in the number of zinc-finger genes, some regions of the human genome contain many such genes with no homologs in rodents. Bellefroid et al. (1995) studied the ZNF91 gene family on human chromosome 19p12-p13.1. They found ZNF91 family members in a number of primate species but could find no murine gene with sequence similarity to ZNF91. They concluded that this cluster resulted from duplication events some 55 MYA.

    The structure and binding properties of zinc-finger genes have been extensively studied (see Wolfe, Nekludova, and Pabo [1999] for a review). A C2H2 zinc finger consists of an α-helix that begins between the first two asterisks in figure 1 and continues to the first histidine. The remainder of the finger consists of two antiparallel β sheets. The amino acids at positions –1, 3, and 6 with respect to the α-helix make contacts to bases 3, 2, and 1 in the primary DNA strand, whereas the amino acid at α-helix position 2 makes contact to the complement of base 4. The recognition code for zinc-finger binding has been widely studied (Choo and Klug 1997). However, recent research (Benos, Lapedes, and Stormo 2002) suggests that no simple one-to-one relationship exists but that different amino acid sequences bind to target nucleotide sequences with different efficiencies.
    The H/C link TGEKPY/F separating adjacent fingers (dark gray in figure 1), the two C and two H positions bound to the zinc atom to make the finger, as well as the hydrophobic phenylalanine (F) and leucine (L), are highly conserved. However, the four sites involved in binding the protein to DNA, indicated by asterisks in figure 1, are highly variable.

    These observations and the fact that even closely related genes display distinct patterns of tissue-specific expression (Shannon et al. 2003) suggest that gene duplication has aided in the diversification of zinc-finger binding motifs. Shannon et al. (2003) used pairwise dN/dS comparisons to examine selective pressures in what we will call clusters I and II below. The goal of this paper is to use the methods of Yang et al. (2000), Yang and Swanson (2002), and Suzuki and Gojobori (1999) to look for signs of positive selection in these clusters and others on human chromosome 19.
    Materials and Methods

    Using the Human Genome Resources on the NCBI Web site (http://www.ncbi.nlm.nih.gov/genome/guide/human/), we downloaded sequences for all genes on chromosome 19 that were described as zinc-finger genes. In regions where these genes clustered, we examined the Locus Link entries for nearby predicted genes and included those annotated as having C2H2 zinc fingers or KRAB domains, resulting in a total of 173 genes. To complete our data set, we found the annotated mouse (29) and rat (20) orthologs of the human genes.
    To examine the relationship between zinc finger genes, we aligned the KRAB domains and spacer sequences of our genes using ClustalW. We did not use the zinc fingers in the alignment because the number varied considerably between genes, and the repetitive zinc finger structure resulted in the alignment of fingers with much dissimilarity. Alignments were done using the European Bioinformatics Institute's server (http://www.ebi.ac.uk/clustalw/) with default parameters. As described in Thompson, Higgins, and Gibson (1994), ClustalW (1) performs a pairwise alignment of all sequences, (2) computes a distance matrix based on the percentage of identities between the two aligned sequences, (3) produces a tree by the neighbor joining algorithm, and then (4) uses the tree to guide the multiple alignment.
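    Step (2) of the ClustalW procedure can be illustrated with a minimal Python sketch. The function names and the dictionary layout are hypothetical, and the real ClustalW implementation offers more options than shown here. The sketch assumes that the distance between two aligned sequences is one minus the fraction of identical residues, counted over columns where neither sequence has a gap.

```python
def identity_distance(a: str, b: str, gap: str = "-") -> float:
    """Return 1 - (fraction of identical residues) over gap-free aligned columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 1.0  # nothing comparable: treat as maximally distant
    matches = sum(x == y for x, y in columns)
    return 1.0 - matches / len(columns)


def distance_matrix(aligned: dict) -> dict:
    """Pairwise distances for a dict mapping sequence name -> aligned sequence."""
    names = list(aligned)
    return {(p, q): identity_distance(aligned[p], aligned[q])
            for p in names for q in names}
```

    A matrix of this kind is what a neighbor-joining step would take as input to build the guide tree.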
    Using the clustering of genes on the tree and a comparison of their α-helix sequences (see Results for more details), we identified four sets of genes for study. For each gene cluster, we obtained the mRNA sequences from the NCBI Web site and located the fingers that were common to all of the genes to make our comparison data set. In each case, alignment of the selected fingers using ClustalW resulted in an alignment with no gaps in any sequence and trees that agreed with those that had been constructed from the alignment of KRAB domains and spacer sequences. To further confirm the phylogenies, we built trees using parsimony and neighbor-joining methods implemented in PHYLIP using the Web server at http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html. In clusters II to IV, the trees from all methods were identical. In cluster I, we found two tree topologies that differed in the positions of ZNF 224 and 225, which are almost equidistant from the pair ZNF 155 and 221, so we analyzed this cluster under both trees. Results of subsequent tests were very similar for the two trees. To look for signs of positive selection in our four clusters, we used the following three approaches.
    Site-Specific Models

    Nielsen and Yang (1998) and Yang et al. (2000) introduced various models to study how the distribution of ω = dN/dS varies along sequences. Model M7 has an ω for each site drawn from a beta distribution with parameters p and q. Model M8 uses the M7 recipe for a fraction p0 of the sites and assigns another ω to the remaining fraction. M7 and M8 are nested models, so they can be compared using a likelihood ratio test (LRT). Twice the difference in log-likelihood between models is compared with the value obtained under a χ² distribution with degrees of freedom equal to the difference in number of parameters between models (in this case 2). When M8 fits the data significantly better than M7 and the ω ratio estimated under model M8 is greater than 1, we need to ask whether it is significantly greater than 1. To do this, we recalculate the log-likelihood value in M8 while fixing ω to be 1 (model M8A from Swanson, Nielsen, and Yang [2003]) and compare the change in likelihood with a χ² distribution with 1 degree of freedom.
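    The arithmetic of the likelihood ratio test can be sketched in a few lines of Python. This is not part of PAML; the function names are hypothetical, and the sketch handles only the two cases used here (1 and 2 degrees of freedom), for which the upper tail of the chi-square distribution has a closed form. The log-likelihood values in the usage note are invented, and only their difference is chosen to reproduce the cluster IV statistic of 47.910 reported in Results.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with 1 or 2 degrees of freedom."""
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        # chi-square(1) is the square of a standard normal variable
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # chi-square(2) is exponential with mean 2
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def lrt(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta log-likelihood, P value) for two nested models."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, chi2_upper_tail(stat, df)
```

    For example, `lrt(-1000.0, -976.045, 2)` (hypothetical log-likelihoods for M7 and M8) gives a statistic of about 47.91 and a P value far below 0.0001; the M8 versus M8A comparison would use `df=1`.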
    Fixed-Sites Models

    The approach in the last paragraph does not take into account the fact that zinc fingers are periodic, so we will also use a method developed by Yang and Swanson (2002) that allows us to take advantage of a priori knowledge. We divide the sites into three classes: constrained sites (finger positions 1, 2, 4, 7, 11, 17, 20, 24, 25, 26, 27, and 28), the binding sites (13, 15, 16, and 19), and the remaining "unconstrained" sites. We have used quotation marks because it will turn out that these sites have ω values significantly smaller than 1.
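    The partition into three classes can be written down as a small Python sketch. The function names are hypothetical. The sketch assumes that the alignment consists of whole 28-codon fingers concatenated end to end, starting at finger position 1, and that sequence positions are counted from 1; this is consistent with the Results, where sequence position 182 is described as finger position 14.

```python
FINGER_LENGTH = 28
CONSTRAINED = {1, 2, 4, 7, 11, 17, 20, 24, 25, 26, 27, 28}
BINDING = {13, 15, 16, 19}


def finger_position(sequence_position: int) -> int:
    """Map a 1-based codon position in the alignment to a finger position 1..28."""
    if sequence_position < 1:
        raise ValueError("sequence positions are counted from 1")
    return (sequence_position - 1) % FINGER_LENGTH + 1


def site_class(sequence_position: int) -> str:
    """Assign a codon to one of the three a priori site classes."""
    pos = finger_position(sequence_position)
    if pos in CONSTRAINED:
        return "constrained"
    if pos in BINDING:
        return "binding"
    return "unconstrained"
```

    Under this division each finger contributes 12 constrained, 4 binding, and 12 "unconstrained" codons.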
    Let κ be the transition/transversion ratio, πi the frequency of amino acid i, and let rj denote the ratio of substitution rates for the jth site class to that of the first, with r1 = 1. Yang and Swanson (2002) introduced the following models. In model A, there is only one rate class, and all sites use the same κ, ω, and π values. In model B, the r values are different, but all sites use the same κ, ω, and π values. In model C, the r and π values are different, but all sites use the same κ and ω values. In model D, the r, κ, and ω values are different, but all sites use the same π values. In model E, each class has a different set of parameters. In model F, the sites are divided into three groups and analyzed separately. Tests were carried out using version 3.14 of the PAML software introduced by Yang (1997).
    Parsimony Analysis

    At the request of two referees, we used Suzuki and Gojobori's (1999) method as implemented in ADAPTSITE.p version 1.3 (http://mep.bio.psu.edu/adaptivevol.html) to look for positive selection in our four clusters. The test is based on comparing the observed total number of synonymous (sc) and nonsynonymous (nc) substitutions for a codon, to the binomial with tc trials and success probability p, where tc is the total number of changes and p is the fraction of synonymous changes expected in the tree. There are several reasons not to use this test. The first reason is that the distribution of sc conditioned on the observed values of tc and p is not binomial (R. Durrett, unpublished data). The second reason is that the test has very low power unless the number of sequences compared is large (see Wong et al. [2004]). Suzuki and Gojobori (1999) say that a tree length of at least 2.5 nucleotide changes per codon site is needed to detect positive selection. Adding the branch lengths of the maximum-parsimony trees shows that our clusters range from 0.45 to 0.6 changes per site. However, we can remedy this problem by taking advantage of the periodic structure of zinc-finger genes and grouping codons together by position in the 9 to 10 fingers being compared. This is similar to our second PAML analysis, but now our groups are the 28 finger positions rather than the three classes of sites. Because of our a priori beliefs, we performed one-tailed tests of positive selection at the four binding sites and of negative selection at the other sites.
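    The binomial comparison described above can be sketched as follows. This is not the ADAPTSITE.p code: the function names are hypothetical, the expected synonymous fraction is simply passed in as a number rather than computed from a tree, and pooling by finger position is represented by summing the counts of the codons that share a position.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def selection_pvalues(sc: int, nc: int, p_syn: float):
    """One-tailed binomial P values for a codon or a pooled group of codons.

    sc, nc : observed synonymous and nonsynonymous changes
    p_syn  : expected fraction of synonymous changes
    Returns (P for positive selection, P for negative selection).
    """
    tc = sc + nc
    # Positive selection: as few or fewer synonymous changes than observed.
    p_positive = sum(binom_pmf(k, tc, p_syn) for k in range(0, sc + 1))
    # Negative selection: as many or more synonymous changes than observed.
    p_negative = sum(binom_pmf(k, tc, p_syn) for k in range(sc, tc + 1))
    return p_positive, p_negative


def pool(counts):
    """Sum (sc, nc) pairs over the codons that share a finger position."""
    return sum(s for s, _ in counts), sum(n for _, n in counts)
```

    Pooling increases tc for each test, which is what gives the grouped analysis more power than tests on single codons.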
    Results

    Statistical Analysis

    Figure 2 displays a histogram of the number of zinc fingers, defined as a sequence of 28 amino acids having C's, H's, F's and L's in the expected location. The average number of fingers for genes in our data set is 10.92. The five genes with the largest number are LOC126502(28), LOC25893(29), ZNF91(31), and LOC126494(34) and its mouse ortholog, MMU380856(30). Here and in what follows, we will replace LOC in the name of mouse and rat genes by MMU and RNO to make it clear what species they come from.
    FIG. 2.— Histogram of the number of fingers in the 222 genes in our data set.

    Table 1 gives amino acid usage by position in the finger. Numbers at the top of each column refer to the positions, and letters in the second row of each column give the predicted residue for each location, with asterisks (*) indicating the four DNA binding sites. The predicted residues in the H/C link all appear in at least 1,848 of the 2,435 cases. We also note that the second binding site (position 15) is serine in 1,740 cases, but all the binding sites are clearly variable.
    Table 1 Amino Acid Counts for the 2435 Fingers in the 222 Genes in Our Data Set

    Clustering of Genes

    Figure 3 illustrates the tree for genes on the p arm of chromosome 19. The number after the human gene name locates the start of the gene in megabases. It is visually obvious that the tree structure reflects the geographic structure of genes on chromosome 19. For example, if we sever one arc of the tree, then we separate the 23 genes that reside at 2.79 to 12.40 megabases from the 18 genes at 20.07 to 24.06 megabases. The probability we could get this result by cutting one of 40 arcs in the tree is at most 40/C(41,23) = 1.98 × 10^-10, where C(41,23) is the number of ways of choosing 23 objects from a set of 41. There are 14 pairs of genes that are adjacent on the tree (i.e., both are connected to the same interior node). Eleven of these 14 pairs are adjacent on the chromosome, suggesting tandem duplication events. If we keep the tree fixed and randomly reshuffle the labels, then the probability we would see this pattern is at most
    FIG. 3.— Tree for KRAB containing zinc finger genes on human chromosome 19p. Numbers give their chromosomal location in megabases. Clusters III and IV are indicated. Note the close relationship of genes at 20 to 24 megabases.

    Figure 4 shows zinc-finger genes on the q arm of chromosome 19 that reside in the clusters at 49 and 58 megabases, which were earlier identified by Dehal et al. (2001). The 10 zinc-finger genes at 49.14 to 49.36 megabases are adjacent on the chromosome and can be separated from the rest of the tree by cutting one arc, an event of probability at most 43/C(44,10) = 1.73 × 10^-8. Tang, Waterman, and Yooseph (2002) studied the pattern of duplication in this cluster of human genes using specialized phylogenetic methods. Shannon et al. (2003) investigated this group of genes and also those that appear in the corresponding part of mouse chromosome 7, Zfp genes 61, 93, 108, 109, 111, and 235.
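    The two single-cut bounds quoted for the p-arm and q-arm trees can be checked with a short Python calculation. The function name is hypothetical. The reasoning assumed here is a union bound: each arc of the tree defines one split of the genes, and under a random relabeling a given split isolates exactly the specified group with probability 1/C(n, k).

```python
from math import comb


def single_cut_bound(n_arcs: int, n_genes: int, group_size: int) -> float:
    """Upper bound on the chance that cutting one arc isolates a given group."""
    return n_arcs / comb(n_genes, group_size)


# p arm: 23 of 41 genes separated by cutting one of 40 arcs
p_arm = single_cut_bound(40, 41, 23)

# q arm: 10 of 44 genes separated by cutting one of 43 arcs
q_arm = single_cut_bound(43, 44, 10)

print(f"p arm bound: {p_arm:.3g}")
print(f"q arm bound: {q_arm:.3g}")
```

    Both values agree with the figures given in the text to three significant digits.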
    FIG. 4.— Tree for zinc-finger genes in clusters at 49 and 58 megabases on human chromosome 19q, along with related mouse and rat genes. Numbers give their chromosomal location in megabases of the gene or of its human ortholog. Clusters I and II are indicated.

    Figure 5 in Shannon et al. (2003) indicates the relationship between the last five Zfp genes and the trio Zfp61, ZNF226, and ZNF234 by representing fingers as boxes with various shading. That picture and the reasons for considering fingers to be similar become clearer if we list the α-helix sequence for each finger, the seven amino acid sequence containing the four binding sites, as shown in tables 2 and 3. A square bracket indicates a finger that has lost one of its critically important C, H, F, or L residues, and a number indicates that insertions or deletions have changed the length from 28. As also indicated in figure 5 of Shannon et al. (2003), table 2 reveals that the fingers in columns 5 to 9 and 18 to 19 of Zfp61, ZNF226, and ZNF234 are closely related, whereas the fingers in columns 10 to 17 seem to have been added in the lineage leading to ZNF 226 and 234. To have genes with comparable fingers, we choose ZNF 230, 222, 223, 221, 155, 224, and 225 to be cluster I.
    FIG. 5.— Genes near the telomere of human chromosome 19 that have orthologs in rat and mouse. Human genes are numbered by their chromosomal location in megabases, and rat and mouse genes are numbered by the location of their human ortholog. The structure of the tree suggests that all of these genes were present in the common ancestor of humans and rodents.

    Table 2 Alpha Helix Sequences for the Cluster of Human Zinc-Finger Genes at 49 Megabases on Chromosome 19, Including Cluster I
    Table 3 Alpha Helix Sequences for Mouse and Rat Genes That Are Related to ZNF235, Including Cluster II

    Table 3 lays out the α-helix sequences for six of the genes considered in the left half of figure 5 of Shannon et al. (2003) and two rat genes that Locus Link once reported as being orthologs of Zfp93 and Zfp108, but which have recently been removed as being "pseudogenes" (NCBI Help Desk email correspondence). Many of the relationships depicted in table 2 of Shannon et al. (2003) are visible in ours. However, it is not clear why they concluded that the fingers in columns 17 to 19 of Zfp111 are duplicates of those in columns 13 to 15 and are, in turn, homologous to columns 11 to 12 of ZNF235 and Zfp235, 93, and 108. To have similar finger structures, we choose ZNF235, Zfp235, Zfp93, Zfp108, RNO308423, and RNO308422 to be cluster II.
    We examined α-helix sequences for all of our zinc-finger genes to identify other groups. Here and in what follows, the numbers in parentheses indicate the start of the gene in megabases. As shown in table 4, the α-helix sequences of ZNF440(11.78) and ZNF439(11.83) show strong signs of tandem duplication, as do ZNF44(12.22), LOC147837(12.28), ZNF442(12.36), LOC90576(12.36), and ZNF443(12.40). From the five intervening genes we choose ZNF20(12.10) to complete cluster III. Notice that these genes appear together in the tree in figure 3. Our final cluster, cluster IV, consists of ZNF90(20.07), LOC163233(20.51), ZNF85(20.89), ZNF430(20.99), LOC148206(21.04), ZNF431(21.11), and LOC163227(21.69). These genes appear in two groups in the tree (fig. 3), but their α-helix sequences given in table 5 are very similar to the others in the group.
    Table 4 Alpha Helix Sequences of Cluster III

    Table 5 Alpha Helix Sequences of Cluster IV

    In contrast to the four clusters considered above, one that occurs at the telomere of chromosome 19, which we will call cluster V, has been very stable. Table 6 lists the NCBI annotated genes in this region, and their orthologs in mouse and rat as given in NCBI's Locus Link. Apart from the somewhat unexpected location of Zfp35 on mouse chromosome 18 and of the A1BG orthologs on mouse chromosome 15 and rat chromosome 7, there has been little rearrangement. If one inverts the order of the last eight genes on the rat chromosome, then the order and orientation of the genes agree, with the exception of the two FLJs on lines 6 and 7. Figure 5 illustrates the relationship between genes in table 6 as inferred by ClustalW. In contrast to the other clusters considered earlier, there is no evidence of duplication since the divergence of humans from rodents.
    Table 6 Cluster V and Orthologous Genes in Mouse and Rat

    Tests for Positive Selection

    Using the codeml program in PAML, we first applied the LRT M7 versus M8 to our four zinc-finger clusters. As table 7 shows, we reject the null hypothesis of no sites under positive selection in clusters I, III, and IV, with the Bayesian posterior pointing to several sites potentially under positive selection. In the case of cluster IV, the test statistic is 2Δl = 2(lM8 – lM7) = 47.910, which is compared with χ² with df = 2, so P < 0.0001. Parameter estimates in cluster IV suggest that 5% of sites are under positive selection with ω = 6.58. There are seven sites for which the posterior probability of ω > 1 is greater than 0.95. Four of these appear at the first binding site (finger position 13) and three appear at the third binding site (16). In each of clusters I and III, PAML identifies a number of sites with posterior probability greater than 0.5 of positive selection, but there is only one site with a significant (> 0.95) posterior probability of positive selection. These appear at sequence position 1 (in the H/C link) in cluster I and position 182 (finger position 14 in the binding region) in cluster III. The fitted values of ω in clusters I, III, and IV are 2.42, 1.53, and 6.58, respectively. To test whether these are significantly greater than 1, we perform the LRT M8 versus M8A. Clusters I and IV yield significant results, but cluster III just misses the cutoff with P = 0.07.
    Table 7 Log-Likelihood Values and Fitted Parameters for Site-Specific Models

    In our second analysis of these models, we divide the sites into constrained, binding, and unconstrained sites as described above. Results of the fixed-sites models are given in table 8. It should not be surprising that model B, which allows the mutation rate to vary between classes, and model D, which allows κ (transition/transversion ratio) and ω to vary among partitions, in all cases emerge as significant improvements (P < 0.001) in the comparisons A versus B and B versus D.
    Table 8 Log-Likelihood Values for Fixed-Sites Models and Fitted Values of ω = dN/dS

    Models C and E allow the amino acid frequencies to vary between classes. This introduces a large number of additional parameters, but paradoxically, in most cases results in fits that have a much worse likelihood than their simpler counterparts B and D. For example, in cluster I, model C is 45.9 units worse than B, and E is 43.8 units worse than D. Our best guess for the cause of this phenomenon is that when the sites are divided into classes, the observed frequencies of amino acids at the constrained sites differ considerably from the overall usage of amino acids in the protein, and this causes trouble for the mutation model in PAML.
    Model F is a separate analysis of the three partitions (i.e., it runs model A for each partition separately). As expected, the estimated ω ratios at the constrained sites are small in all four clusters (0.20, 0.02, 0.16, and 0.48), and those at the unconstrained sites are larger (0.66, 0.23, 0.55, and 0.48). For the class of binding sites, we get ω values larger than 1 in clusters I, III, and IV: 1.14, 2.22, and 2.10. However, in cluster II, our estimate is 0.34. To test whether the ω values observed at the binding sites are significantly different from 1, we recalculate the log-likelihood values in model F by fixing ω to be 1 and perform the LRT as described above. Cluster I is not significant, but clusters III, IV, and II are significant with P values 0.05, 0.01, and 0.001, respectively.
    In the last analysis of our four clusters, we applied the parsimony-based program ADAPTSITE.p (Suzuki and Gojobori 1999) to look for selection at individual codon sites. No positively selected sites are identified in any cluster, but several nonbinding sites turn out to be under negative selection at the 5% significance level in clusters I (15 sites), II (24 sites), III (17), and IV (five sites).

    Results of our analysis using ADAPTSITE.p with data pooled by finger position are given in table 9. There are three binding sites with significant positive selection, finger position 13 in cluster III (P < 0.0004) and positions 16 and 19 in cluster IV (P < 0.0046 and P < 0.0458, respectively), but only the first two are smaller than the threshold of 0.0178 demanded by the Bonferroni correction for our 28 tests. Again, there are a large number of nonbinding sites that show negative selection at this level. In cluster II, this occurs for 21 of the 24 nonbinding sites, with four of the P values smaller than 10^-6. Indeed, two of the binding sites, positions 15 and 16, show negative selection with P values less than 0.0013 and less than 0.00001, respectively, which is consistent with the previous PAML analysis.
    Table 9 One-Sided P Values for Negative Selection at Nonbinding Sites and Positive Selection at Binding Sites

    Discussion

    Our study of four clusters of zinc-finger genes on human chromosome 19 has shown significant evidence for positive selection in cluster IV in all three analyses. In cluster III, the P values are borderline in the first PAML analysis (site-specific models) but significant in the second analysis (fixed-sites models), and there is strong support for binding site position 13 being under positive selection in the third analysis (parsimony analysis). In the case of cluster I, the significant result from the first test is not supported by the second and third. Finally, for cluster II, the second and third analyses show significant evidence of negative selection at the binding sites.
    The results for cluster II are consistent with those of Shannon et al. (2003), who examined dN/dS ratios at three of the binding sites (our finger positions 13, 16, and 19) and found no evidence of positive selection in cluster II genes but significant evidence of purifying selection in pairwise comparisons of ZNF235 with Zfp235, Zfp93, and Zfp109 (see their table 2). In ZNF genes near our cluster I, they find significant evidence of positive selection in comparisons of 226 with 230, 223, 284, and 222; 234 with 221; and 284 with 230. In no case are both of their compared genes within our cluster I (which consists of 155, 221 to 225, and 230). Some of the comparisons that Shannon et al. (2003) find significant are quite curious in view of the data presented in table 2. ZNF223 has nine zinc fingers versus 17 in ZNF226, and the overlapping fingers do not align well. ZNF284 and ZNF230 are more similar in length (11 versus nine fingers), but comparison of the α-helix sequences reveals very little overall similarity.
    Tandemly duplicated genes are subject to gene conversion events. Given the ability of gene conversion to homogenize gene families (see e.g., Chapter 11 of Li [1997]), it is natural to ask whether concerted evolution can introduce correlated changes in different lineages and hence invalidate the use of Yang's and Suzuki and Gojobori's methods, which assume independent substitutions. We cannot rule out the possibility that gene conversion acted soon after duplication to protect the duplicated copies from becoming pseudogenes (see Walsh [1987]), an effect that can cause the underestimation of divergence times (see Teshima and Innan [2004]). However, there are two reasons to doubt that this force has acted in the recent past.
    First, we observe that gene conversion acts to homogenize genes that perform the same function. Yet, Shannon et al.'s (2003) study of cluster I shows that these genes have different tissue-specific expression patterns. The second obvious point is that if gene conversion is still acting, it is not doing a very good job. At a gross level, the numbers of zinc fingers of the genes in cluster I are 9, 9, 9, 15, 11, 19, and 17, respectively (the first three appear to be recent duplicates). Within clusters, there is considerable divergence between sequences. For example, in cluster IV, 23 synonymous and 36 nonsynonymous differences separate the 840 nucleotides in the most closely related pair (ZNF431 and LOC148206), and there are more than 100 differences between a typical pair of genes.
    Several studies have presented evidence of gene conversion by examining patterns in the differences between genes and pointing out regions of unusually high similarity (see figure 5 in Sharon et al. [1999], figure 6 in Lazzaro and Clark [2001], and figure 6 in Bettencourt and Feder [2001]). To look for similar signals in our data, we conducted an analysis (fig. 6) in which we calculated the number of nucleotide differences in a 168-nucleotide window (the length of two fingers) between adjacent genes in each cluster, advancing the window by 7 nucleotides until the end of the sequence is reached. Successive differences in each cluster are indicated by hollow squares, diamonds, and triangles, followed by filled versions of the symbols and an X for the seventh comparison. We find a lot of variability in divergence, but with the exception of one gene pair at the end of cluster III, no other regions dip below 5 nucleotide differences and most are above 10, which represents 6% divergence in the window. Assuming a mutation rate of 2 × 10^-8 per nucleotide per generation, this suggests that gene conversion has not acted on these clusters in the past 3 million generations.
    FIG. 6.— A sliding window analysis calculates the number of nucleotide differences between successive sequences in the cluster. We use a 168-nucleotide window (the length of two zinc fingers) that advances by 7 nucleotides in each step.
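    The sliding window calculation can be sketched in Python as follows. The function name and the default arguments are hypothetical; the sketch assumes two already aligned, gap-free nucleotide sequences of equal length, which matches the ungapped finger alignments described in Materials and Methods.

```python
def sliding_window_differences(seq_a: str, seq_b: str,
                               window: int = 168, step: int = 7):
    """Count nucleotide differences between two aligned sequences per window.

    Returns a list of (window start, number of differences), with 0-based
    starts; only windows that fit entirely within the sequences are used.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatch = [x != y for x, y in zip(seq_a, seq_b)]
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        results.append((start, sum(mismatch[start:start + window])))
    return results
```

    A window with 10 differences corresponds to 10/168, or about 6%, divergence, which is the reference level used in the text; a run of windows with counts near zero would be the kind of signal expected from a recent gene conversion event.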

    One of the disappointing aspects of our research is that although there are other groups of zinc-finger genes on human chromosome 19 showing visible signs of a close relationship, we have found only two new clusters of genes where positive selection can be demonstrated. There is a large group of genes near the centromere on the p arm of chromosome 19 with no orthologs in rodents, but the reasons for the explosive growth of this gene family remain a mystery.
    Acknowledgements

    This work was supported by a joint NSF-NIGMS grant DMS-0201037 to R.D. and by a National Science Foundation graduate fellowship in applied mathematics to D.S. The authors would like to thank the two referees and the associate editor for their remarks, which helped to strengthen the paper's conclusions.

    References

    Bellefroid, E. J., D. A. Poncelet, P. J. Lecocq, O. Relevant, and J. M. Martial. 1991. The evolutionarily conserved Krüppel-associated box domain defines a subfamily of eukaryotic multifingered proteins. Proc. Natl. Acad. Sci. USA 88:3608–3612.
    Bellefroid, E. J., J. C. Marine, A. G. Matera, C. Bourginion, T. Desai, K. C. Healy, P. Bray-Ward, J. A. Martial, J. N. Ihle, and D. C. Ward. 1995. Emergence of the ZNF91 Krüppel-associated box-containing zinc finger gene family in the last common ancestor of the Anthropoidea. Proc. Natl. Acad. Sci. USA 92:10757–10761.

    Benos, P. V., A. S. Lapedes, and G. D. Stormo. 2002. Probabilistic code for DNA recognition by proteins of the EGR family. J. Mol. Biol. 323:701–727.
    Bettencourt, B. R., and M. E. Feder. 2001. Hsp70 duplication in the Drosophila melanogaster species group: How and when did two become five? Mol. Biol. Evol. 18:1272–1282.

    Choo, Y., and A. Klug. 1997. Physical basis of a protein-DNA recognition code. Curr. Opin. Struct. Biol. 7:117–125.

    Dehal, P., P. Predki, A. S. Olsen et al. (21 co-authors). 2001. Human chromosome 19 and related regions in mouse: conservative and lineage specific evolution. Science 293:104–111.
    Gell, D., M. Crossley, and J. Mackay. 2003. Zinc-finger genes. Pp. 823–828 in D. N. Cooper, ed. The nature encyclopedia of the human genome, Vol. 5. MacMillan Publishers, London.

    Lander, E. S., L. M. Linton, B. Birren et al. (256 co-authors). 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.

    Lazzaro, B. P., and A. G. Clark. 2001. Evidence for recurrent paralogous gene conversion and exceptional allelic divergence in the Attacin genes of Drosophila melanogaster. Genetics 159:659–671.
    Li, W. H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.

    Looman, C. 2003. The ABC of KRAB zinc finger proteins. Comprehensive summaries of Uppsala dissertations, Acta Universitatis Upsaliensis.

    Margolin, J. F., J. R. Friedman, W. K. Meyer, H. Vissing, H. J. Thiesen, and F. J. Rauscher III. 1994. Krüppel-associated boxes are potent transcriptional repressor domains. Proc. Natl. Acad. Sci. USA 91:4509–4513.
    Miller, J., A. McLachlan, and A. Klug. 1985. Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes. EMBO J. 4:1609–1614.

    Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929–936.

    Schuh, R., W. Aichler, U. Gaul et al. (11 co-authors). 1986. A conserved family of nuclear proteins containing structural elements of Krüppel, a Drosophila segmentation gene. Cell 47:1025–1032.
    Shannon, M., A. T. Hamilton, L. Gordon, E. Branscomb, and L. Stubbs. 2003. Differential expansion of zinc-finger transcription factor loci in homologous human and mouse gene clusters. Genome Res. 13:1097–1110.

    Sharon, D., G. Glusman, Y. Pilpel, M. Khen, F. Gruetzner, T. Haaf, and D. Lancet. 1999. Primate evolution of an olfactory receptor cluster: diversification by gene conversion and recent emergence of pseudogenes. Genomics 61:24–36.
    Suzuki, Y., and T. Gojobori. 1999. A method for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 16:1315–1328.

    Swanson, W. J., R. Nielsen, and Z. Yang. 2003. Pervasive adaptive evolution in mammalian fertilization proteins. Mol. Biol. Evol. 20:18–20.

    Tang, M., M. Waterman, and S. Yooseph. 2002. Zinc finger clusters and tandem gene duplication. J. Comp. Biol. 9:429–446.

    Teshima, K. M., and H. Innan. 2004. The effect of gene conversion on the divergence between duplicated genes. Genetics 166:1553–1560.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

    Venter, J. C., M. D. Adams, E. W. Myers et al. (274 co-authors). 2001. The sequence of the human genome. Science 291:1304–1351.
    Walsh, J. B. 1987. Sequence-dependent gene conversion: Can duplicated genes diverge fast enough to escape conversion? Genetics 117:543–557.

    Wolfe, S. A., L. Nekludova, and C. O. Pabo. 1999. DNA recognition by Cys2His2 zinc finger proteins. Annu. Rev. Biophys. Biomol. Struct. 3:183–212.

    Wong, W. S. W., Z. Yang, N. Goldman, and R. Nielsen. 2004. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. (in press).
    Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555–556.

    Yang, Z., R. Nielsen, N. Goldman, and A. M. K. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431–449.

    Yang, Z., and W. J. Swanson. 2002. Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol. Biol. Evol. 19:49–57.

    (Deena Schmidt and Rick Durrett)