Rates of Protein Evolution Are Positively Correlated with Developmental Timing of Expression During Mouse Spermatogenesis
Molecular Biology and Evolution, 2005, No. 4
     Department of Ecology and Evolutionary Biology, University of Arizona

    Correspondence: E-mail: jgood@email.arizona.edu.

    Abstract

    Male reproductive genes often evolve very rapidly, and sexual selection is thought to be a primary force driving this divergence. We investigated the molecular evolution of 987 genes expressed at different times during mouse spermatogenesis to determine if the rate of evolution and the intensity of positive selection vary across stages of male gamete development. Using mouse-rat orthologs, we found that rates of protein evolution were positively correlated with the developmental timing of expression. Genes expressed early in spermatogenesis had rates of divergence similar to the genome median, while genes expressed after the onset of meiosis were found to evolve much more quickly. Rates of protein evolution were fastest for genes expressed during the dramatic morphogenesis of round spermatids into spermatozoa. Late-expressed genes were also more likely to be specific to the male germline. To test for evidence of positive selection, we analyzed the ratio of nonsynonymous to synonymous changes using a maximum likelihood framework in comparisons among mouse, rat, and human. Many genes showed evidence of positive selection, and most of these genes were expressed late in spermatogenesis and were testis specific. Overall, these data suggest that the intensity of positive selection associated with the evolution of male gametes varies considerably across development and acts primarily on phenotypes that develop late in spermatogenesis.

    Key Words: Mus musculus · pleiotropy · sexual selection · sperm development · X chromosome

    Introduction

    Genes associated with male reproduction often evolve rapidly (Swanson and Vacquier 2002). For example, several mammalian proteins involved in sperm form and function have greatly accelerated rates of molecular evolution and appear to be under positive selection (Wyckoff, Wang, and Wu 2000; Torgerson, Kulathinal, and Singh 2002; Swanson, Nielsen, and Yang 2003). It is not clear, however, whether this rapid evolution is associated with genes expressed during particular developmental stages or whether it is a general feature of genes expressed throughout reproduction. There are several reasons why the developmental timing of expression may be important for understanding patterns of molecular evolution in male reproductive genes.

    First, sexual selection may be driving the evolution of male reproductive proteins, either through male-male competition or through male-female interactions (Swanson and Vacquier 2002). In mice, spermatogenesis begins at birth and consists of approximately 11 days of mitotic cell growth, 10 days of meiosis, and 14 days during which haploid spermatids undergo a dramatic morphological transformation into sperm (Eddy 2002). If sexual selection on male gametes acts predominantly on mutations that influence the form and function of sperm, then genes expressed late in spermatogenesis might be expected to evolve more quickly and to show a greater signature of positive selection. In particular, genes involved in the morphological transformation from spermatids into sperm are likely targets for sexual selection.

    Second, variation in evolutionary rate among reproductive genes might be due to differences in functional constraint rather than (or in addition to) the action of positive selection. Genes expressed early in development (Raff 1996) or in many different tissues (Duret and Mouchiroud 2000) are expected to exhibit more pleiotropic effects. Hence, genes that are expressed late in development or are tissue specific might be expected to evolve more rapidly on average. Some studies support these predictions while others do not. For example, in mammals, tissue-specific genes evolve more rapidly and exhibit a larger variance in evolutionary rate compared to widely expressed genes (Duret and Mouchiroud 2000; Winter, Goodstadt, and Ponting 2004). On the other hand, there is little evidence for a simple relationship between rates of molecular evolution and developmental timing of expression in Caenorhabditis elegans (Castillo-Davis and Hartl 2002, 2003; Cutter and Ward 2005).

    Thus, hypotheses for the rapid evolution of male reproductive genes based on sexual selection or reduced functional constraint both predict that genes expressed late in spermatogenesis will evolve more quickly than early-expressed genes. However, sexual selection may lead to a ratio of nonsynonymous to synonymous nucleotide changes that is greater than one for some late-expressed genes. This signature of positive selection is not expected for genes under relaxed functional constraint.

    A third way in which developmental timing of expression may be important for understanding the molecular evolution of male reproductive proteins relates to the genomic location of these genes. In mammals, recessive mutations that benefit males are expected to accumulate on the X chromosome because selection will be more effective in the hemizygous sex (Rice 1984; Charlesworth, Coyne, and Barton 1987). However, the X chromosome is inactivated in the male germline during meiosis, and this ought to select against X linkage for genes that are expressed late in spermatogenesis (Betrán, Thornton, and Long 2002; Wu and Xu 2003). Consistent with this, there appears to be an excess of X-linked genes in the mouse that are expressed early in spermatogenesis (Wang et al. 2001), while late-expressed genes are rarely on the X chromosome (Betrán, Thornton, and Long 2002; Khil et al. 2004). The overrepresentation of X-linked genes early in spermatogenesis might be caused by (1) an evolutionary shift towards earlier expression of genes that were always X linked (Wu and Xu 2003) or (2) an evolutionary increase in the number of X-linked genes with male-specific function, either through recruitment of genes from the autosomes or through the evolution of novel function at preexisting X-linked genes. The second hypothesis predicts that early-expressed, X-linked spermatogenic genes may show a signature of positive selection.

    The recent publication of patterns of gene expression in testes and other tissues in the house mouse, Mus musculus (Schultz, Hamra, and Garbers 2003), presents an opportunity to test these predictions and ask whether rates of evolution depend on when genes are expressed during development. Here, we investigate the molecular evolution of genes expressed in the male germline of mice at nine developmental time points during spermatogenesis (Schultz, Hamra, and Garbers 2003). We discovered a significant positive correlation between developmental timing of expression and the rate of protein evolution. Moreover, many late-expressed, rapidly evolving genes were testis specific in their expression, and many showed evidence of positive selection in a maximum likelihood analysis. These observations are consistent with a strong role for sexual selection in driving the evolution of male reproductive genes in mice, particularly in later developmental stages.

    Materials and Methods

    Expression Data

    We used the data of Schultz, Hamra, and Garbers (2003) to identify genes expressed in the germline of developing mouse testes. They used the Mouse U74v2 Affymetrix array set to generate expression profiles in testes 1, 4, 8, 11, 14, 18, 21, 26, 29, and 60 days after birth (fig. 1). Of the 21,374 transcripts expressed at one or more time points in this series, 3,794 changed at least threefold in expression level relative to day 1 and were not expressed in testis somatic cells (i.e., Sertoli or interstitial cells). Schultz, Hamra, and Garbers (2003) grouped 2,785 of these transcripts into eight distinct clusters based on overall similarity of expression profiles (see below). To ensure accurate annotation and to remove redundant probes, we queried the Ensembl database (Build 32; http://www.ensembl.org/Mus_musculus/) to link each of the 2,785 Affymetrix probe IDs to annotated genes. Probe IDs with unknown or ambiguous gene associations in Ensembl were excluded, resulting in a set of 1,420 genes.
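    The filtering and annotation steps above can be sketched as follows. This is a hedged illustration only, not the code used by Schultz, Hamra, and Garbers (2003) or in this study: the `Probe` record, its fields, the probe-to-gene dictionary, and the gene identifiers are hypothetical, and the sketch assumes that a threefold change in either direction relative to day 1 qualifies (early clusters decline over time, so decreases appear to count).

```python
from dataclasses import dataclass


@dataclass
class Probe:
    probe_id: str
    expression: dict        # hypothetical: day after birth -> signal (assumed > 0)
    in_somatic_cells: bool  # hypothetical flag: detected in Sertoli/interstitial cells


def passes_filter(p, min_fold=3.0):
    """Keep probes absent from somatic cells whose expression changes at least
    min_fold (up or down, an assumption) relative to day 1 at a later day."""
    if p.in_somatic_cells:
        return False
    base = p.expression[1]
    for day, value in p.expression.items():
        if day == 1:
            continue
        ratio = value / base
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            return True
    return False


def map_to_genes(probes, probe_to_genes):
    """Map probes to gene IDs, dropping probes with no or ambiguous (>1) gene
    association; redundant probes for the same gene collapse into one entry."""
    genes = set()
    for p in probes:
        hits = probe_to_genes.get(p.probe_id, set())
        if len(hits) == 1:
            genes.update(hits)
    return genes


probes = [
    Probe("p1", {1: 100.0, 4: 120.0, 21: 450.0}, False),  # 4.5-fold up: kept
    Probe("p2", {1: 100.0, 4: 90.0, 21: 150.0}, False),   # under 3-fold: dropped
    Probe("p3", {1: 100.0, 4: 20.0}, False),              # 5-fold down: kept
    Probe("p4", {1: 100.0, 21: 900.0}, True),             # somatic: dropped
]
kept = [p for p in probes if passes_filter(p)]
# Hypothetical mapping: p3 matches two genes and is therefore excluded.
genes = map_to_genes(kept, {"p1": {"GENE_A"}, "p3": {"GENE_B", "GENE_C"}})
print(sorted(p.probe_id for p in kept), sorted(genes))  # prints ['p1', 'p3'] ['GENE_A']
```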

    FIG. 1.— Overview of male germ cell development in the mouse testis (modified from Eddy 2002). Spermatogenesis starts shortly after birth and consists of three major phases: 11 days of mitotic cell growth, 10 days of meiosis, and 14 days of postmeiotic spermiogenesis. The major cell type is given for each phase. Transcription occurs until midway through the postmeiotic phase except on the X chromosome, which is inactivated during the pachytene step of prophase I (day 14; McCarrey 1993). Following the postmeiotic phase, spermatozoa are released into the lumen of the seminiferous tubule of the testis and transported to the epididymis for storage. The nine developmental time points sampled by Schultz, Hamra, and Garbers (2003) are indicated along the bottom.

    Previous studies have suggested that genes expressed early in spermatogenesis are disproportionately X linked and that genes expressed late in spermatogenesis are underrepresented on the X (Wang et al. 2001; Khil et al. 2004). To estimate the expected number of genes per chromosome, we calculated the observed proportion of genes per chromosome annotated in Ensembl (Build 32). Because some of the genes considered in this study correspond to genes predicted by Ensembl, we included both known and predicted genes in our null expectation.
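    Computing this null expectation amounts to multiplying the sample size by each chromosome's share of annotated (known plus predicted) genes. The sketch below assumes a simple dictionary of genome-wide counts; the counts are hypothetical placeholders rather than Ensembl Build 32 values. For example, if 5% of annotated genes were X linked, 1,420 × 0.05 = 71 X-linked genes would be expected.

```python
def expected_counts(genome_counts, n_sample):
    """Expected number of sampled genes per chromosome, assuming genes are
    drawn in proportion to each chromosome's share of annotated genes."""
    total = sum(genome_counts.values())
    return {chrom: n_sample * count / total for chrom, count in genome_counts.items()}


# Hypothetical genome-wide counts (known plus predicted genes).
genome = {"X": 500, "autosomes": 9500}
expected = expected_counts(genome, 1420)
print(expected)  # prints {'X': 71.0, 'autosomes': 1349.0}
```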

    Two approaches were used to associate genes with specific time points or phases of spermatogenesis based on their expression profiles. First, Schultz, Hamra, and Garbers (2003) applied a Gaussian clustering algorithm to their data and identified eight groups of genes based on overall similarity of expression profiles across all time points (ARRAYMINER 4.0, Optimal Designs, Brussels, Belgium). Here, we used these clusters to identify groups of genes whose expression increased or decreased at particular times during spermatogenesis. Second, for each gene we noted the day of highest expression. Expression data collected at day 60 were not considered here because spermatogenesis is cyclic and all stages overlap in adult testis (Eddy 2002).
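    The second approach, assigning each gene to its day of highest expression while ignoring the adult day-60 sample, can be sketched as below. The expression values are hypothetical; only the sampled days come from the study design, and the function name is illustrative.

```python
SAMPLED_DAYS = (1, 4, 8, 11, 14, 18, 21, 26, 29, 60)


def peak_day(profile, exclude=(60,)):
    """Day with the highest expression, ignoring excluded samples (day 60 is
    adult testis, where all spermatogenic stages overlap)."""
    candidates = {day: value for day, value in profile.items() if day not in exclude}
    return max(candidates, key=candidates.get)


# Hypothetical profile for one gene whose expression rises after meiosis.
profile = dict(zip(SAMPLED_DAYS, [5.0, 6.0, 7.5, 9.0, 20.0, 35.0, 60.0, 80.0, 75.0, 95.0]))
print(peak_day(profile))  # prints 26
```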

    Tissue-specific expression profiles were calculated using data from the Genomics Institute of the Novartis Research Foundation (GNF) SymAtlas (v0.8; symatlas.gnf.org/SymAtlas/), an Affymetrix array database generated by Su et al. (2004). Expression was quantified based on the average signal difference between match and mismatch probes. GNF probe numbers were matched to Ensembl genes using the University of California, Santa Cruz (UCSC) Mouse Gene Sorter (genome.ucsc.edu). For each gene, we measured tissue specificity as the proportion of expression in the most highly expressed tissue relative to the total level of expression summed over all tissues (Winter, Goodstadt, and Ponting 2004). Using the same approach, we also calculated the proportion of expression in the testis. We defined tissue-specific and testis-specific genes as those for which these two proportions, respectively, exceeded a threshold corresponding to approximately four times the median value of that measure across all genes in the GNF SymAtlas database (N = 36,118). To control for correlated expression across tissues, we considered only 31 tissues previously reported not to have significantly similar expression profiles in the mouse (Winter, Goodstadt, and Ponting 2004). Our measure of testis specificity should be broadly equivalent to male germline specificity because the genes identified by Schultz, Hamra, and Garbers (2003) were enriched for transcripts expressed in spermatogenic cells and not in the somatic cells of the testis.
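    The two specificity measures and the threshold rule can be sketched in Python under stated assumptions: the eight tissue names and expression values are hypothetical (the study used 31 GNF tissues), negative average-difference signals are clipped to zero, and the cutoff is taken as four times the median of each measure across the supplied genes with a greater-than-or-equal comparison, which is an assumed reading of the rule above.

```python
from statistics import median

TISSUES = ["testis", "liver", "brain", "muscle", "kidney", "heart", "lung", "spleen"]


def specificity(expr):
    """Return (proportion of total expression in the top tissue,
    proportion of total expression in testis) for one gene."""
    values = {t: max(v, 0.0) for t, v in expr.items()}  # clip negatives (assumption)
    total = sum(values.values())
    if total == 0:
        return 0.0, 0.0
    return max(values.values()) / total, values.get("testis", 0.0) / total


def classify(genes, factor=4.0):
    """Flag genes whose measure is >= factor x the median of that measure."""
    scores = {g: specificity(e) for g, e in genes.items()}
    max_cut = factor * median(s[0] for s in scores.values())
    testis_cut = factor * median(s[1] for s in scores.values())
    tissue_specific = {g for g, (m, _) in scores.items() if m >= max_cut}
    testis_specific = {g for g, (_, t) in scores.items() if t >= testis_cut}
    return tissue_specific, testis_specific


flat = {t: 1.0 for t in TISSUES}  # hypothetical widely expressed profile
genes = {
    "flat1": flat, "flat2": flat, "flat3": flat,
    "testis_gene": {**flat, "testis": 93.0},
    "liver_gene": {**flat, "liver": 93.0},
}
tissue_sp, testis_sp = classify(genes)
print(sorted(tissue_sp), sorted(testis_sp))  # prints ['liver_gene', 'testis_gene'] ['testis_gene']
```

In this toy set the median of both measures is 1/8, so the cutoff is 0.5; a gene with 93% of its signal in one tissue clears it.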

    Rates of Molecular Evolution

    Protein-coding domains for 987 mouse and rat one-to-one orthologs were obtained from the Ensembl database. For all evolutionary analyses, we considered only annotated gene pairs whose orthology was based on best reciprocal BLAST hits. We translated each pair into amino acid sequences, aligned them in-frame using the PileUp routine (Wisconsin Package v10.2, Genetics Computer Group, Madison, Wis.), and back-translated them to nucleotide sequence. We then calculated the number of nonsynonymous substitutions per nonsynonymous site (KA) and the number of synonymous substitutions per synonymous site (KS) using the method of Li (1993) as implemented in the Diverge routine (Wisconsin Package v10.2, Genetics Computer Group). All pairs with fewer than 100 aligned codons were discarded. For genes with multiple transcripts, we aligned all pairwise combinations and retained, for each gene, the pair with the fewest ambiguous codons and/or the lowest KA/KS value. In all group comparisons, we used R (www.r-project.org/) to construct 95% confidence intervals for group means and medians based on 10,000 bootstrap replicates. All other statistical tests were conducted using JMP (v5.0, SAS Institute Inc., Cary, N.C.) unless otherwise noted.
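    The confidence intervals were built in R; the sketch below shows the same idea in Python as a percentile bootstrap of the median. The interval type (percentile), the random seed, and the example KA/KS values are assumptions for illustration, since the text does not specify which bootstrap interval was used.

```python
import random
from statistics import median


def bootstrap_ci(values, stat=median, reps=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for a statistic (median by default)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the statistic for each replicate.
    boots = sorted(stat(rng.choices(values, k=n)) for _ in range(reps))
    lower = boots[int(reps * alpha / 2)]
    upper = boots[int(reps * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical KA/KS values for one expression cluster.
kaks = [0.05 + 0.01 * i for i in range(30)]
print(bootstrap_ci(kaks))
```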

    To test for positive selection, we examined 387 mouse-rat-human orthologs (also based on Ensembl annotation) using a maximum likelihood framework. Genes considered in the mouse-rat pairwise comparisons were excluded from this analysis if they lacked an annotated human ortholog or were represented by more than one transcript in at least one species. First, orthologous peptide sequences were aligned with CLUSTALW using default parameters (v1.8; Thompson, Higgins, and Gibson 1994). Then nucleotide sequences were aligned in-frame with RevTrans (v1.3, written by R. Wernersson) using the amino acid alignment as a guide. Several model-based tests of positive selection have been proposed (Yang 2002). We chose to conduct a relatively conservative test using the codeml program in the PAML package (Yang 1997). For each gene tree, we fit the data to two models (models M7 and M8; Yang et al. 2000). M7 assumes that KA/KS (dN/dS) values follow a beta distribution constrained between 0 and 1, while M8 allows for an additional category of sites with KA/KS > 1. Because M7 is a nested subset of M8, the likelihoods of the data under the two models can be compared directly using a standard likelihood ratio test. Positive selection is inferred if M8 provides a significantly better fit when tested against a chi-square distribution. We chose a significance threshold of P < 0.01 to be conservative. M8 has been reported to be sensitive to the starting value of ω (KA/KS) and tends to get stuck in local optima. We used an initial ω of 1 to identify genes with a significantly better fit to M8. We then re-evaluated M7 and M8 in this subset using an initial ω of 0.1 to check for dependence on the starting value. Only the highest likelihood scores are reported here.
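    The M7-versus-M8 comparison reduces to the statistic 2ΔlnL = 2(lnL_M8 − lnL_M7), referred to a chi-square distribution with 2 degrees of freedom (the df reported in the Results). For 2 df the chi-square survival function has the closed form exp(−x/2), so P < 0.01 corresponds to 2ΔlnL > −2 ln 0.01 ≈ 9.21. This sketch assumes log-likelihoods have already been obtained from codeml runs with initial ω values of 1 and 0.1; the numbers are hypothetical, and the function does not call PAML.

```python
import math


def lrt_m7_vs_m8(lnl_m7_runs, lnl_m8_runs):
    """Likelihood ratio test of site models M7 (null) vs M8 (alternative).

    Each argument lists log-likelihoods from runs started at different initial
    omega values; the highest value per model is kept. Returns the statistic
    2 * (lnL_M8 - lnL_M7) and its P value for 2 degrees of freedom.
    """
    lnl0 = max(lnl_m7_runs)
    lnl1 = max(lnl_m8_runs)
    # M7 is nested in M8, so clamp small negative values from imperfect optimization.
    stat = max(2.0 * (lnl1 - lnl0), 0.0)
    # Chi-square survival function with df = 2: P(X > x) = exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods for one gene (initial omega = 1, then 0.1).
stat, p = lrt_m7_vs_m8([-2345.6, -2345.9], [-2340.1, -2338.2])
print(round(stat, 2), p < 0.01)  # prints 14.8 True
```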

    Functional Annotation of Genes

    We retrieved functional annotation of genes from Ensembl (Build 32; http://www.ensembl.org/Mus_musculus/) based on the Gene Ontology database (Ashburner et al. 2000). All levels of the annotation hierarchy were retrieved for each of the three Gene Ontology categories (i.e., biological process, cellular component, and molecular function).

    Results

    Developmental Timing of Expression

    Of the 2,785 Affymetrix probe IDs identified by Schultz, Hamra, and Garbers (2003), 1,420 provided annotated matches to Ensembl genes. Table 1 gives the average change in expression of these genes relative to day 1 when partitioned into the eight expression clusters defined by Schultz, Hamra, and Garbers (2003). Although expression profiles were highly dynamic, the numbering of clusters provides a general temporal order of expression during spermatogenesis. Our numbering differs from that of Schultz, Hamra, and Garbers (2003), who focused primarily on the five late-expressed clusters. Genes in clusters 1–3 were expressed early and showed a steady decrease in expression following early premeiotic (cluster 1) or early meiotic (clusters 2 and 3) phases of spermatogenesis. Genes in clusters 4–8 showed a dramatic and continual increase in expression coincident with the onset of meiosis or later (table 1). Using the day of highest expression gave similar results in all analyses below; therefore, we report only analyses based on expression clusters.

    Table 1 Average Fold Change in Expression for Each Cluster Relative to Day 1

    Timing of Expression and KA/KS

    We aligned and calculated KA/KS values for 987 mouse-rat orthologous pairs. Genes expressed late in spermatogenesis (clusters 4–8) evolved significantly faster than genes expressed early in spermatogenesis (clusters 1–3; figs. 2 and 3). Forty-four of the 50 most rapidly evolving genes (KA/KS 0.56–1.95; table S1) were from late-expressed clusters (expected = 32.3), and 19 of these were in cluster 7 (expected = 7.7). Furthermore, KA/KS was positively correlated with developmental timing of expression for these 987 genes (Spearman's ρ = 0.201, P < 0.0001). In all comparisons, rates of synonymous substitution (KS) did not differ significantly among groups. Available functional data for the 50 most rapidly evolving genes are summarized in table S1.
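    Treating cluster number as an ordinal measure of developmental timing, Spearman's ρ is the Pearson correlation of the ranks, with tied values given their average rank. A minimal pure-Python sketch follows; the cluster assignments and KA/KS values in the usage lines are hypothetical, and the sketch computes only ρ, not its P value.

```python
def average_ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical genes: expression cluster (temporal order) and KA/KS.
clusters = [1, 1, 2, 3, 4, 5, 6, 7, 7, 8]
kaks = [0.10, 0.12, 0.09, 0.15, 0.20, 0.18, 0.25, 0.40, 0.31, 0.22]
print(round(spearman_rho(clusters, kaks), 3))
```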

    FIG. 2.— Evolutionary rates for genes expressed early and late in spermatogenesis. Comparisons are given for the entire data set (A), widely expressed genes (B), tissue-specific genes (C), and testis-specific genes (D). Values are based on pairwise comparisons between mouse and rat using the method of Li (1993). Error bars represent 95% confidence intervals about group medians based on 10,000 bootstrap replicates. The dashed line denotes the genome median KA/KS value of 0.11 (Gibbs et al. 2004).

    FIG. 3.— Rates of protein evolution across spermatogenesis based on 987 pairwise mouse-rat comparisons. Genes are clustered into groups with similar expression profiles as defined by Schultz, Hamra, and Garbers (2003). The numbering used here provides a general temporal order of timing of expression during spermatogenesis (table 1) with the major developmental phases indicated along the bottom. Error bars represent 95% confidence intervals, and the dashed line denotes the genome median KA/KS value of 0.11 (Gibbs et al. 2004). Overall rates of protein evolution were significantly correlated with developmental timing of expression (Spearman's ρ = 0.201, P < 0.0001).

    To examine the role of tissue specificity in influencing rates of evolution across spermatogenesis, we calculated the degree of tissue- and testis-specific expression for the 620 genes that were also found in the GNF SymAtlas database. In general, late-expressed genes had a higher degree of tissue and testis specificity (table 2). Furthermore, testis-specific genes were expressed almost exclusively late in spermatogenesis. We compared rates of evolution across spermatogenesis in three groups of genes: those expressed widely, those expressed primarily in one tissue (tissue specific), and those expressed primarily in the testis (testis specific). There is considerable overlap between the last two groups, with 244 of the 620 genes showing their highest expression in the testis. In each of the three groups, the mean KA/KS was higher late in spermatogenesis than early in spermatogenesis, although this difference was greatest for testis- and tissue-specific genes (fig. 2). Similarly, there was a positive trend between KA/KS and timing of expression for genes in each group (table 3). This pattern was significant for tissue-specific genes (Spearman's ρ = 0.166, P = 0.0012) but not for the other groups (testis specific, Spearman's ρ = 0.099, P = 0.1433; widely expressed, Spearman's ρ = 0.093, P = 0.1526). However, because most testis-specific genes are expressed late in spermatogenesis, there was little power to detect a correlation in this comparison. When testis-specific genes were excluded from the tissue-specific set, the positive correlation disappeared (N = 161, Spearman's ρ = 0.037, P = 0.6397). Thus, the high KA/KS values late in spermatogenesis in the complete data set appear to be associated largely, though not exclusively, with genes that have testis-specific patterns of expression.

    Table 2 Tissue Specificity and KA/KS Across Expression Clusters

    Table 3 KA/KS in Testis-specific, Tissue-specific, and Widely Expressed Genes

    Tests of Positive Selection

    To determine whether the high rates of evolution late in spermatogenesis are caused by relaxed constraint or positive selection, we analyzed a subset of the genes using PAML (Yang 1997). This subset consisted of 387 of the 987 genes, corresponding to those for which one-to-one human-rat-mouse orthologs could be identified and that were not represented by more than one transcript per species. Restricting the analysis to this subset is conservative for detecting selection because some of the excluded genes are likely to be rapidly evolving. Twenty-nine of the 387 genes showed evidence of positive selection in the maximum likelihood analysis (for each of these 29 genes, P < 0.01, χ² > 9.21, df = 2; table S2). Only nine of these remained significant after a sequential Bonferroni correction for multiple comparisons (for each of these nine genes, P < 0.000026; χ² > 21.09). This correction is likely to be overly conservative: at a threshold of P < 0.01, only about four false positives (0.01 × 387 ≈ 3.9) are expected by chance in a sample of 387 genes. Thus, many or most of the 29 genes we identified may be under selection. Selection was more common late in spermatogenesis (table 4; Fisher's Exact Test, P = 0.025) and among testis-specific genes (table 5; Fisher's Exact Test, P = 0.009). The latter comparison is based on a smaller sample because only 255 of the 387 genes considered here were represented in the GNF database. There was no significant relationship between the incidence of positive selection and overall tissue specificity (Fisher's Exact Test, P = 0.180). Available functional data for the 29 genes identified as under positive selection are summarized in table S2.
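    A sequential Bonferroni correction is commonly implemented as Holm's procedure, which is assumed here: P values are sorted, the k-th smallest (k = 1, 2, …) is compared with α/(m − k + 1), and testing stops at the first nonsignificant result. With α = 0.01 and m = 387, the first threshold is 0.01/387 ≈ 0.000026, matching the cutoff above, and about 387 × 0.01 ≈ 3.9 false positives are expected without correction. The P values below are hypothetical.

```python
def holm_rejections(pvalues, alpha=0.01):
    """Holm's sequential Bonferroni procedure.

    Tests are examined from smallest to largest P value; with 0-based index k,
    the k-th smallest is rejected if P <= alpha / (m - k), and testing stops at
    the first failure. Returns the indices of rejected tests.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = []
    for k, i in enumerate(order):
        if pvalues[i] <= alpha / (m - k):
            rejected.append(i)
        else:
            break
    return rejected


m = 387
print(m * 0.01)   # expected false positives at P < 0.01 if no gene is selected
print(0.01 / m)   # first (most stringent) Holm threshold, about 0.000026

# Hypothetical P values: four nominally significant genes among 387.
pvals = [1e-7, 2e-6, 3e-5, 0.004] + [0.5] * (m - 4)
print(holm_rejections(pvals))  # prints [0, 1]
```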

    Table 4 Number of Genes Under Selection Early and Late in Spermatogenesis

    Table 5 Number of Testis-specific and Nonspecific Genes Under Selection

    Genomic Location and KA/KS

    Others have shown that genes expressed prior to X inactivation in early meiosis are preferentially located on the X chromosome, while late-expressed genes are preferentially located on the autosomes (Betrán, Thornton, and Long 2002; Khil et al. 2004). Overall there were fewer genes on the X than expected at random (N = 1,420, Observed = 44 X-linked, Expected = 62; Fisher's Exact Test, P = 0.046). However, genes in clusters 1–3 were slightly but not significantly overrepresented on the X chromosome (N = 538, Observed = 28 X-linked, Expected = 24; Fisher's Exact Test, P = 0.286), while late-expressed genes were underrepresented on the X (N = 882, Observed = 16 X-linked, Expected = 39; Fisher's Exact Test, P = 0.001).
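    The X-linkage comparisons above rely on Fisher's exact test applied to 2 × 2 tables of X-linked versus autosomal counts. The text does not specify exactly how those tables were constructed, so the pure-Python two-sided test below is a generic sketch; the counts in the last usage line are hypothetical and are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Conditions on the row and column totals and sums the hypergeometric
    probabilities of every table that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def table_prob(x):
        # Probability that the top-left cell equals x given the fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = table_prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = table_prob(x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(total, 1.0)


print(fisher_exact_two_sided(3, 1, 1, 3))  # two-sided P = 34/70, about 0.486
# Hypothetical gene set vs background: rows = set/background, cols = X/autosome.
print(fisher_exact_two_sided(10, 190, 400, 9600))
```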

    We tested whether early-expressed genes showed higher rates of evolution when X linked. The median KA/KS value for early-expressed X-linked genes (0.142, C.I. 0.069–0.290, N = 21) was slightly but not significantly higher than that of early-expressed autosomal genes (0.118, C.I. 0.104–0.133, N = 329).

    Discussion

    Genes involved in sex and reproduction are among the most rapidly evolving genes known, both in terms of coding sequence and expression, presumably due to the action of sexual selection (Begun et al. 2000; Wyckoff, Wang, and Wu 2000; Torgerson, Kulathinal, and Singh 2002; Swanson, Nielsen, and Yang 2003; Glassey and Civetta 2004). Here, we examined the molecular evolution of genes expressed at different developmental time points during mouse spermatogenesis. We found a significant positive correlation between developmental timing of expression and rates of protein evolution (fig. 3). Genes expressed early in spermatogenesis evolved at rates similar to the genome median (KA/KS = 0.11; Gibbs et al. 2004), while genes expressed after the onset of meiosis evolved much faster. This pattern is consistent with relaxed constraint due to reduced pleiotropy, increased positive selection, or both during the later stages of spermatogenesis. Below we discuss each of these in turn.

    Pleiotropy and Constraint

    There is weak evidence for reduced constraint in some of the genes examined here. For example, widely expressed genes showed higher rates of evolution late in spermatogenesis compared to early in spermatogenesis, although this difference was not significant (fig. 2B). In addition, tissue-specific and testis-specific genes showed higher rates of evolution than widely expressed genes, a pattern that might reflect differences in level of constraint, but also likely reflects differences in the strength of positive selection (see below).

    Spermatogenesis in mice appears to be exceptionally sensitive to genetic disruption. This conclusion stems primarily from the observation that knockout phenotypes disproportionately involve male sterility (Escalier 2001), including a number of genetic targets not previously implicated in male reproduction (Tourtellotte et al. 1999; Xu et al. 1999). However, there is little evidence that early events in spermatogenesis are more vulnerable. Particularly delicate steps have been identified across all phases of spermatogenesis, including primordial germ cells and spermatogonia (premeiotic), primary spermatocytes (meiotic), and elongating spermatids (postmeiotic; Escalier 2001). Consistent with this, we found only a weak trend relating rates of protein evolution to developmental timing of expression independent of testis specificity.

    In mammals, tissue-specific genes evolve more rapidly than widely expressed genes (Duret and Mouchiroud 2000). However, there is a large variance in rates of evolution across tissues, and tissue-specific biology may play a larger role than general models of pleiotropic constraint in explaining this pattern (Winter, Goodstadt, and Ponting 2004). For example, genes specific to the liver evolve more than three times as fast as those specific to the dorsal root ganglion in rodents (Winter, Goodstadt, and Ponting 2004). The correspondence between increased rates of protein evolution and increased testis specificity late in spermatogenesis was striking (table 2). Moreover, variation in rates of protein evolution across spermatogenesis depended more heavily on variation in testis specificity than on overall levels of tissue specificity, illustrating the importance of testis-specific biology relative to overall levels of constraint in these data. We found that testis-specific genes expressed late in spermatogenesis have a median KA/KS (0.231, N = 216) more than twice the genomic rate of 0.11 and over 70% higher than previous estimates for testis-specific genes (Winter, Goodstadt, and Ponting 2004).

    Positive Selection and Spermatogenesis

    The high rate of protein evolution in the later stages of spermatogenesis also appears to be driven by a higher prevalence of positive selection. Forty-four of the 50 most rapidly evolving genes (KA/KS 0.56–1.95, pairwise mouse-rat comparisons) were from late-expressed clusters. Likewise, a significant proportion of the genes found to be under positive selection within the subset we analyzed using maximum likelihood were from late-expressed clusters (table 4).

    Why would positive selection be more frequent late in spermatogenesis? First, development of the male germline becomes increasingly specialized following the onset of meiosis, and most testis-specific genes are expressed late in spermatogenesis (table 2; Hecht 1993; Schultz, Hamra, and Garbers 2003). In general, there appears to be strong selection for the evolution of novel reproductive function in the mouse (Eddy 2002; Waterston et al. 2002). This increase in testis specificity might facilitate adaptive evolution if mutations that are beneficial when expressed in the testis are free from pleiotropic effects in other tissues.

    Second, many of the gametic phenotypes that are likely to be targets of sexual selection, either due to sperm competition or interaction with the female reproductive tract, develop during the dramatic postmeiotic morphogenesis of round spermatids into spermatozoa. During this period, the nucleus is reshaped and condensed, the acrosome and flagellum are formed, and much of the cytoplasm is removed (Meistrich 1993). Genes in cluster 7 showed on average an almost 40-fold increase in expression during this transformation (days 21–29, table 1). Furthermore, this cluster had one of the highest median rates of protein evolution (fig. 3), the largest proportion of very rapidly evolving genes (table S1), and the highest incidence of positive selection in our maximum likelihood analysis (table S2).

    Some of these genes are thought to be involved in the formation of the sperm cell surface and are logical candidates for playing a role in fertilization (Schultz, Hamra, and Garbers 2003). For example, we identified multiple genes involved in protein binding and sperm-egg interactions that are rapidly evolving and/or positively selected (e.g., Spam, Spaca3, Zpr3; tables S1 and S2), consistent with the findings of previous studies (Torgerson, Kulathinal, and Singh 2002; Swanson, Nielsen, and Yang 2003; Castillo-Davis et al. 2004). Given their direct interaction with the female gamete, these proteins are logical targets for sexual selection through sperm competition and/or sexual conflict (Torgerson, Kulathinal, and Singh 2002). Interestingly, several of the rapidly evolving genes we identified in cluster 7 are involved in transcription, nuclear reorganization, and condensation (tables S1 and S2). Usually proteins involved in DNA packaging are highly conserved across animal taxa (Thatcher and Gorovsky 1994). However, nuclear reorganization and DNA condensation may play a key role in sperm competition and ultimately fertilization ability, and, therefore, genes underlying these phenotypes could be major targets of selection. Consistent with this, multiple proteins involved in DNA packaging in sperm appear to be under positive selection within primates (Wyckoff, Wang, and Wu 2000). Extensive divergence in sperm head morphology has been documented within mammals, even among closely related species (Breed 2004). However, the genotypic basis of this variation remains poorly known. Given the large number of testis-specific genes expressed during spermiogenesis, the potential number of selective targets could be quite large.

    Finally, other forces might lead to strong positive selection on phenotypes that develop during spermatogenesis, independent of sexual selection. In particular, natural selection related to immune defense could contribute to divergence in reproductive genes. In mammals, genes involved in immunity often evolve rapidly (Waterston et al. 2002; Castillo-Davis et al. 2004). Furthermore, multiple genes involved in mammalian pathogen defense have been identified that are expressed in reproductive tissue and appear to be under positive selection (Maxwell, Morrison, and Dorin 2003; Lynn et al. 2004). However, it is unclear if testis-specific genes involved in immune defense are common enough to contribute to the general pattern of increased rates of molecular evolution across spermatogenesis that we observed.

    Adaptive Evolution and the X Chromosome

    Previous studies have reported that a disproportionately high number of X-linked genes are expressed in the mouse testis during early spermatogenesis (Wang et al. 2001; Khil et al. 2004) and that these genes evolve more quickly on average than comparable autosomal genes (Torgerson and Singh 2003). Both patterns are consistent with some theoretical predictions of the evolution of sexually selected genes (Rice 1984). In mammals, this phenomenon is expected to be mostly restricted to early spermatogenesis because X inactivation during early meiosis clearly selects against X linkage (Betrán, Thornton, and Long 2002; Wu and Xu 2003). However, we found little evidence of either rapid evolution or differential accumulation of early-expressed genes on the X chromosome. There was a weak, nonsignificant trend towards increased KA/KS in premeiotic X-linked genes. Moreover, only 3 of the 50 most rapidly evolving genes and 1 of the 29 genes identified with maximum likelihood–based tests of selection were X linked. However, none of the eight X-linked genes considered by Torgerson and Singh (2003) were found within our set of 1,420 genes. We also found only a slight trend towards increased X linkage early in spermatogenesis. A recently proposed theoretical model that incorporates pleiotropy does not predict increased X linkage of sexually selected genes involved in male reproduction (Fitzpatrick 2004). As mentioned above, we found a much lower degree of testis specificity early in spermatogenesis. If this lack of testis specificity reduces the efficacy of selection, it is not surprising that genes expressed early in spermatogenesis identified in the current study do not conform to the classic predictions that do not consider pleiotropy (Rice 1984). For this and the developmental reasons mentioned above, the X chromosome seems a less likely target of sexual selection on genes expressed in the male germline.

    However, our result contrasts with the stronger bias of early testis-expressed genes towards the X reported in recent analyses of numerous data sets (Wang et al. 2001; Khil et al. 2004), including some of the data collected by Schultz, Hamra, and Garbers (2003). Several differences between our study and the work of Khil et al. (2004) may account for this discrepancy. For example, we required that all genes show at least a threefold change in expression during spermatogenesis and be absent from expression profiles of somatic tissues of the testis. Consequently, there is little overlap between the set of testis-expressed genes considered by Khil et al. (2004) and those in the current study. It also remains unclear whether the previously described patterns are driven by sexual selection or by adaptive evolution in general. Other functional classes presumably not influenced by sexual selection, such as genes expressed in muscle and the brain, have also been reported to be overrepresented on the X (Bortoluzzi et al. 1998; Zechner et al. 2001), and many aspects of the unusual genetic composition of this chromosome have yet to be fully explained (Vallender and Lahn 2004).

    Conclusions

    Rates of protein evolution and tissue specificity increased dramatically with the developmental timing of expression in mouse spermatogenesis. The highest rates of protein evolution were found in testis-specific genes expressed during the morphological transformation of round spermatids into spermatozoa. Conversely, genes expressed early in spermatogenesis evolved at rates similar to the genome median. Likelihood-based tests of selection confirmed that at least part of this pattern was driven by a higher incidence of positive selection later in sperm development. Overall, these data suggest that the strength of positive selection commonly associated with the evolution of male gametes varies considerably across development and acts primarily on phenotypes that develop during the postmeiotic process of spermiogenesis.
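    Likelihood-based tests of selection of the kind summarized above compare nested codon models by a likelihood ratio test. The sketch below illustrates only that final comparison step, assuming a nested pair such as M7 (beta) versus M8 (beta&ω) of Yang et al. (2000), for which the statistic is commonly compared with a χ² distribution with 2 degrees of freedom. The function names, the α = 0.05 cutoff, and the log-likelihood values in the usage line are hypothetical; this section does not state which model pair, degrees of freedom, or thresholds were actually used, and in practice the maximized log-likelihoods would come from a program such as PAML (Yang 1997).

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Return 2 * (lnL_alt - lnL_null), floored at zero.

    The alternative model (e.g. M8, which adds a class of sites with
    omega > 1) nests the null model (e.g. M7), so its maximized
    log-likelihood should not be lower; small negative differences
    caused by optimizer noise are clipped to zero.
    """
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses closed forms, so only df = 1 or df = 2 are supported:
    df = 1 -> erfc(sqrt(x / 2)); df = 2 -> exp(-x / 2).
    """
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only implements df = 1 or df = 2")


def lrt_positive_selection(lnl_null: float, lnl_alt: float,
                           df: int = 2, alpha: float = 0.05):
    """Return (statistic, p-value, significant) for a nested-model LRT.

    Hypothetical helper: the default df = 2 assumes an M7 vs. M8 comparison.
    """
    stat = lrt_statistic(lnl_null, lnl_alt)
    p_value = chi2_sf(stat, df)
    return stat, p_value, p_value < alpha


# Hypothetical log-likelihoods for one gene under M7 (null) and M8 (alternative).
stat, p, significant = lrt_positive_selection(-2500.0, -2490.0)
print(stat, p, significant)  # stat = 20.0, p = exp(-10) (about 4.5e-05), True
```

The closed-form tail probabilities keep the sketch free of external dependencies; for other degrees of freedom, `scipy.stats.chi2.sf(x, df)` would be the usual general replacement for `chi2_sf`.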

    Supplementary Materials

    Table S1. Fifty most rapidly evolving genes based on pairwise KA/KS comparisons between mouse and rat.

    Table S2. Twenty-nine genes inferred to be under positive selection using a maximum likelihood framework.

    Acknowledgements

    We thank Nikolaus Schultz and Andrew Su for providing access to gene expression data in mice. We are grateful to Matthew Saunders, Matt Dean, and Asher Cutter for providing code to automate some of the analyses conducted in this study. Matthew Saunders, Matt Dean, and Tovah Salcedo provided critical comments on an early version of this manuscript. We also thank Bruce Walsh and Asher Cutter for useful discussions during the development of this project. This research was supported by a University of Arizona NSF IGERT Genomics Initiative fellowship to J.M.G. and an NSF grant to M.W.N.

    References

    Ashburner, M., C. A. Ball, J. A. Blake et al. (20 co-authors). 2000. Gene Ontology: tool for the unification of biology. Nat. Genet. 25:25–29.

    Begun, D. J., P. Whitley, B. L. Todd, H. M. Waldrip-Dail, and A. G. Clark. 2000. Molecular population genetics of male accessory gland proteins in Drosophila. Genetics 156:1879–1888.

    Betrán, E., K. Thornton, and M. Long. 2002. Retroposed new genes out of the X in Drosophila. Genome Res. 12:1854–1859.

    Bortoluzzi, S., L. Rampoldi, B. Simionati et al. (13 co-authors). 1998. A comprehensive, high-resolution genomic transcript map of human skeletal muscle. Genome Res. 8:817–825.

    Breed, W. G. 2004. The spermatozoon of Eurasian murine rodents: its morphological diversity and evolution. J. Morphol. 261:52–69.

    Castillo-Davis, C. I., and D. L. Hartl. 2002. Genome evolution and developmental constraint in Caenorhabditis elegans. Mol. Biol. Evol. 19:728–735.

    Castillo-Davis, C. I., and D. L. Hartl. 2003. Conservation, relocation and duplication in genome evolution. Trends Genet. 19:593–597.

    Castillo-Davis, C. I., F. A. Kondrashov, D. L. Hartl, and R. J. Kulathinal. 2004. The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Res. 14:802–811.

    Charlesworth, B., J. A. Coyne, and N. H. Barton. 1987. The relative rates of evolution of sex chromosomes and autosomes. Am. Nat. 130:113–146.

    Cutter, A. D., and S. Ward. 2005. Sexual and temporal dynamics of molecular evolution in C. elegans development. Mol. Biol. Evol. 22:178–188.

    Duret, L., and D. Mouchiroud. 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17:68–74.

    Eddy, E. M. 2002. Male germ cell gene expression. Recent Prog. Horm. Res. 57:103–128.

    Escalier, D. 2001. Impact of genetic engineering on the understanding of spermatogenesis. Hum. Reprod. Update 7:191–210.

    Fitzpatrick, M. J. 2004. Pleiotropy and the genomic location of sexually selected genes. Am. Nat. 163:800–808.

    Gibbs, R. A., G. M. Weinstock, M. L. Metzker et al. (230 co-authors). 2004. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428:493–521.

    Glassey, B., and A. Civetta. 2004. Positive selection at reproductive ADAM genes with potential intercellular binding activity. Mol. Biol. Evol. 21:851–859.

    Hecht, N. B. 1993. Gene expression during male germ cell development. Pp. 400–432 in C. Desjardins and L. L. Ewing, eds. Cell and molecular biology of the testis. Oxford University Press, New York.

    Khil, P. P., N. A. Smirnova, P. J. Romanienko, and R. D. Camerini-Otero. 2004. The mouse X chromosome is enriched for sex-biased genes not subject to selection by meiotic sex chromosome inactivation. Nat. Genet. 36:642–646.

    Li, W.-H. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96–99.

    Lynn, D. J., A. T. Lloyd, M. A. Fares, and C. O'Farrelly. 2004. Evidence of positively selected sites in mammalian α-defensins. Mol. Biol. Evol. 21:819–827.

    Maxwell, A. I., G. M. Morrison, and J. R. Dorin. 2003. Rapid sequence divergence in mammalian β-defensins by adaptive evolution. Mol. Immunol. 40:413–421.

    McCarrey, J. R. 1993. Development of the germ cell. Pp. 58–89 in C. Desjardins and L. L. Ewing, eds. Cell and molecular biology of the testis. Oxford University Press, New York.

    Meistrich, M. L. 1993. Nuclear morphogenesis during spermiogenesis. Pp. 67–97 in D. de Kretser, ed. Molecular biology of the male reproductive system. Academic Press, San Diego, Calif.

    Raff, R. A. 1996. The shape of life: genes, development, and the evolution of animal form. University of Chicago Press, Chicago, Ill.

    Rice, W. R. 1984. Sex chromosomes and the evolution of sexual dimorphism. Evolution 38:735–742.

    Schultz, N., F. K. Hamra, and D. L. Garbers. 2003. A multitude of genes expressed solely in meiotic or postmeiotic spermatogenic cells offers a myriad of contraceptive targets. Proc. Natl. Acad. Sci. USA 100:12201–12206.

    Su, A. I., T. Wiltshire, S. Batalov et al. (13 co-authors). 2004. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA 101:6062–6067.

    Swanson, W. J., R. Nielsen, and Q. F. Yang. 2003. Pervasive adaptive evolution in mammalian fertilization proteins. Mol. Biol. Evol. 20:18–20.

    Swanson, W. J., and V. D. Vacquier. 2002. The rapid evolution of reproductive proteins. Nat. Rev. Genet. 3:137–144.

    Thatcher, T. H., and M. A. Gorovsky. 1994. Phylogenetic analysis of the core histones H2a, H2b, H3, and H4. Nucleic Acids Res. 22:174–179.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

    Torgerson, D. G., R. J. Kulathinal, and R. S. Singh. 2002. Mammalian sperm proteins are rapidly evolving: evidence of positive selection in functionally diverse genes. Mol. Biol. Evol. 19:1973–1980.

    Torgerson, D. G., and R. S. Singh. 2003. Sex-linked mammalian sperm proteins evolve faster than autosomal ones. Mol. Biol. Evol. 20:1705–1709.

    Tourtellotte, W. G., R. Nagarajan, A. Auyeung, C. Mueller, and J. Milbrandt. 1999. Infertility associated with incomplete spermatogenic arrest and oligozoospermia in Egr4-deficient mice. Development 126:5061–5071.

    Vallender, E. J., and B. T. Lahn. 2004. How mammalian sex chromosomes acquired their peculiar gene content. BioEssays 26:159–169.

    Wang, P. J., J. R. McCarrey, F. Yang, and D. C. Page. 2001. An abundance of X-linked genes expressed in spermatogonia. Nat. Genet. 27:422–426.

    Waterston, R. H., K. Lindblad-Toh, E. Birney et al. (222 co-authors). 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.

    Winter, E. E., L. Goodstadt, and C. P. Ponting. 2004. Elevated rates of protein secretion, evolution, and disease among tissue-specific genes. Genome Res. 14:54–61.

    Wu, C. I., and E. Y. Xu. 2003. Sexual antagonism and X inactivation: the SAXI hypothesis. Trends Genet. 19:243–247.

    Wyckoff, G. J., W. Wang, and C. I. Wu. 2000. Rapid evolution of male reproductive genes in the descent of man. Nature 403:304–309.

    Xu, X., P. A. Toselli, L. D. Russell, and D. C. Seldin. 1999. Globozoospermia in mice lacking the casein kinase II alpha' catalytic subunit. Nat. Genet. 23:118–121.

    Yang, Z. H. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556.

    Yang, Z. H. 2002. Inference of selection from multiple species alignments. Curr. Opin. Genet. Dev. 12:688–694.

    Yang, Z. H., R. Nielsen, N. Goldman, and A. M. K. Pedersen. 2000. Codon substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431–449.

    Zechner, U., M. Wilda, H. Kehrer-Sawatzki, W. Vogel, R. Fundele, and H. Hameister. 2001. A high density of X-linked genes for general cognitive ability: a run-away process shaping human evolution? Trends Genet. 17:697–701.

    (Jeffrey M. Good and Micha)