A Gradient of Silent Substitution Rate in the Human Pseudoautosomal Region
     School of Biosciences, University of Birmingham, Birmingham, United Kingdom

    E-mail: d.filatov@bham.ac.uk.

    Abstract

    It has been demonstrated that recombination in the human p-arm pseudoautosomal region (p-PAR) is at least twenty times more frequent than the genomic average of 1 cM/Mb, which may affect substitution patterns and rates in this region. Here I report the analysis of substitution patterns and rates in 10 human, chimpanzee, gorilla, and orangutan genes across the p-PAR. Between-species silent divergence in the p-PAR forms a gradient, increasing toward the telomere. The correlation of silent divergence with distance from the p-PAR boundary is highly significant (ρ = 0.911, P < 0.001). After exclusion of the CpG dinucleotides this correlation is still significant (ρ = 0.89, P < 0.01); thus the substitution rate gradient cannot be explained solely by the differences in the extent of methylation across the p-PAR. Frequent recombination in the PAR may result in a relatively strong effect of biased gene conversion (BGC), which, because of the increased probability of fixation of the G or C nucleotides at (A or T)/(G or C) segregating sites, may affect substitution rates. BGC, however, does not seem to be the factor creating the substitution rate gradient in the p-PAR, because the gradient is still detectable if only A↔T and G↔C substitutions are taken into account (ρ = 0.82, P < 0.01). I hypothesize that the substitution rate gradient in the p-PAR is due to the mutagenic effect of recombination, which is very frequent in the distal human p-PAR and might be lower near the p-PAR boundary.

    Key Words: human, pseudoautosomal region, recombination, silent substitution rate, mutation rate

    Introduction

    Human X and Y chromosomes pair and recombine in two small pseudoautosomal regions (PARs) at the ends of the sex chromosomes (Cooke, Brown, and Rappold 1985; Freije et al. 1992). The short arms (Xp/Yp) of the sex chromosomes contain the p-PAR, which is over 2 Mb in size (Brown 1988; Petit, Levilliers, and Weissenbach 1988). Chiasmata between X and Y chromosomes in the p-PAR are essential for the correct segregation of the sex chromosomes in male meiosis (Ellis and Goodfellow 1989; Burgoyne et al. 1992). Obligate crossing over in the relatively small p-PAR results in a very high recombination rate in the region. High-resolution sperm typing has demonstrated that the p-PAR recombination rate in male meiosis is greater than 20 times the genomic average of 1 cM/Mb (Lien et al. 2000), and may even be as high as 350 cM/Mb (May et al. 2002). The long (Xq/Yq) arms contain the much smaller q-PAR, which is about 0.4 Mb long (Ciccodicola et al. 2000). The recombination rate in the q-PAR is lower than in the p-PAR at about 5 cM/Mb (Ciccodicola et al. 2000).

    As in the human p-PAR, recombination in the mouse PAR is also very frequent (Soriano et al. 1987). Perry and Ashworth (1999) demonstrated that the silent substitution rate in the pseudoautosomal part of the mouse Fxy gene substantially accelerated after this region was translocated into the PAR, suggesting that recombination may accelerate the silent substitution rate. This work motivated us to conduct a study of substitution rates in the human PARs, revealing a significantly accelerated silent substitution rate in three human p-PAR genes (Filatov and Gerrard 2003). This finding demonstrates that the elevated silent substitution rate (and perhaps, an elevated mutation rate) is probably a general feature of the pseudoautosomal regions in mammals.

    As the elevated recombination rate is the most prominent feature of the PARs, it is tempting to associate elevated substitution and recombination rates. Indeed, there is growing evidence that recombination may affect substitution patterns and rates. Several studies recently reported a weak but significantly positive correlation of recombination rate in humans with human/mouse fourfold degenerate site divergence (Lercher and Hurst 2002; Waterston et al. 2002; Hardison et al. 2003). Recombination rate was demonstrated to correlate positively with human/chimpanzee and human/baboon non-coding divergence (Hellmann et al. 2003). As the silent substitution rate is usually assumed to be neutral, its correlation with recombination suggests a causal relationship between the processes of recombination and mutation. The mutagenic effect of recombination reported for yeast (Strathern, Shafer, and McGill 1995) strengthens the evidence in favor of this hypothesis.

    Several other factors can also result in the correlation between recombination and silent substitution rates. Both recombination and substitution rates correlate positively with GC-content (Eyre-Walker 1993; Fullerton, Carvalho, and Clark 2001; Bielawski, Dunn, and Yang 2000; Yi, Ellsworth, and Li 2002), which may result in a covariation between the substitution and recombination rates. Frequent methylation-induced CpG→TpG mutations (Robertson and Wolffe 2000) may also influence the correlation of recombination rates and substitution rates, as the frequency of CpG dinucleotides has been reported to be associated with recombination rate in humans (Kong et al. 2002). Biased gene conversion (BGC), the preferential resolution of (A or T)/(G or C) heteroduplexes toward GC (Lamb 1986; Brown and Jiricny 1988), may be another such factor. In fact, BGC has been demonstrated to be mathematically equivalent to selection for (A or T)→(G or C) mutations (Nagylaki 1983); thus it may have a substantial effect on fixation rates at (A or T)/(G or C) segregating sites and on the overall substitution rates (Eyre-Walker and Bulmer 1995). As BGC might be more frequent in recombinational hot spots, it may also be one of the causes of the correlation between the substitution rates and recombination rates.

    In this paper I report the analysis of substitution rates in 10 genes across the human and ape p-PAR, aiming to distinguish between the possible causes of the elevated substitution rate in this peculiar genomic region. I demonstrate that the substitution rate in the p-PAR is not uniform. In fact, it forms a gradient, rising with distance from the p-PAR boundary. The results of the analysis suggest that variation in GC-content, methylation, and biased gene conversion are not sufficient to explain the existence of the gradient. I hypothesize that this substitution rate gradient is due to the mutagenic effect of recombination, which is very frequent in the distal region (as high as >300 cM/Mb in some regions [May et al. 2002]), and might be substantially lower near the PAR boundary, which probably acts as a suppressor of recombination, resembling the suppression of recombination in the proximity of chromosomal inversions (Novitski and Braver 1954; Coyne et al. 1993).

    Materials and Methods

    To study substitution patterns and rates in the frequently recombining human and ape p-arm pseudoautosomal region, nine p-PAR loci were selected (table 1). One region per gene was sequenced for all the genes except the XG gene, for which two regions were sequenced: a 1.5-kb pseudoautosomal region adjacent to the pseudoautosomal boundary (referred to as PAB below) and a 1.3-kb region located 7 kb distally from the PAB (referred to as XG below). Although the PAB region is just an intronic sequence of the XG gene, for convenience I will refer to it as a separate gene.

    Table 1 p-PAR Regions Analyzed.

    The primers for PCR amplification and sequencing of PAR genes were designed based on human genomic sequences (table 1). The primers for the PAB, XG, SHOX, and PPP2R3L genes were described in a previous paper (Filatov and Gerrard 2003). The primers for all the other genes are listed in table 2.

    Table 2 PCR and Sequencing Primers.

    All human sequences used in this study were taken from GenBank, and the orangutan sequences for PAB, XG, SHOX, and PPP2R3L were taken from the previous study (Filatov and Gerrard 2003). All the other ape sequences have not been previously published. The GenBank accession numbers for these sequences are AY296087 through AY296112.

    Chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), and orangutan (Pongo pygmaeus) genomic DNA samples were purchased from the Coriell cell repository. All the genes were amplified using the Roche High Fidelity polymerase chain reaction (PCR) kit under the following conditions: 95°C, 2.5 min, 57°C, 1 min, 68°C, 3 min, followed by 34 cycles of 94°C, 0.5 min, X°C, 0.5 min, and 68°C, 2.5 min, where X is the primer-specific annealing temperature (usually 55°C). The PCR products were gel-purified, extracted from the agarose gel using the Qiagen Gel Extraction kit, cloned into pCR4 plasmids with the TA cloning kit (Invitrogen), and sequenced with the BigDye v3 sequencing system (ABI) on an ABI3700 automated sequencer. Chromatograms were checked and corrected by eye, and contigs were assembled manually using ProSeq software (Filatov 2002). Pairwise alignments were constructed manually or with the mcalign program (Peter Keightley and Toby Johnson, unpublished, available at the Web site http://homepages.ed.ac.uk/eang33/mcinstructions.html). The multiple alignments were constructed manually from pairwise alignments with the ProSeq software.

    The maximum-likelihood (ML) analysis of the substitution patterns and rates was conducted using the baseml program (Yang 1997). All the ML analyses assumed that the closest human relative is chimpanzee, and that gorilla is the second closest, with orangutan being an outgroup. For the ML analysis of substitution patterns a user-defined general reversible model (REVu, Yang 1994) with different numbers of allowed rate parameters was used. For every model the analysis was conducted separately for individual genes, and the overall log-likelihood was obtained by summation of the log-likelihoods for individual genes (table 3). The significance in all the likelihood ratio tests was assessed using the approximation that twice the log-likelihood difference (2ΔlnL) is χ²-distributed with the degrees of freedom equal to the difference in the number of parameters of the models compared (e.g., Muse and Weir 1992).
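
    The likelihood ratio test described above can be sketched as follows. This is only an illustration under stated assumptions, not the baseml workflow itself: the function names are hypothetical, the per-gene log-likelihoods are invented placeholders rather than values from table 3, and the number of extra parameters must be supplied by the user.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-y)               # Q(1, y)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(y))    # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def likelihood_ratio_test(lnl_simple, lnl_complex, extra_params):
    """Return (2 * delta lnL, P) for two nested models."""
    stat = 2.0 * (lnl_complex - lnl_simple)
    return stat, chi2_sf(stat, extra_params)

# Hypothetical per-gene log-likelihoods (placeholders, NOT from table 3).
lnl_simple_per_gene = [-1510.2, -2304.7, -1892.1]
lnl_complex_per_gene = [-1498.9, -2287.3, -1880.4]

# The overall log-likelihood is the sum over genes, as in the text.
stat, p = likelihood_ratio_test(sum(lnl_simple_per_gene),
                                sum(lnl_complex_per_gene),
                                extra_params=1)  # assumed for illustration
print(f"2*delta lnL = {stat:.2f}, P = {p:.3g}")
```

    The tail probability is computed with the standard library only, using the closed-form recurrence for integer degrees of freedom, so the sketch does not depend on any statistics package.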

    Table 3 The Nested ML Analysis of Substitution Models in the Introns of 10 P-PAR Genes in Human, Chimpanzee, Gorilla, and Orangutan.

    The GC-content and CpG analyses were conducted with the "GC-content" and "show sites" tools in the ProSeq software (Filatov 2002). In all the correlation analyses the Pearson product-moment correlation coefficient, ρ (Sokal and Rohlf 1995), was used.
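
    For reference, the Pearson product-moment coefficient can be computed as in the sketch below. The function name and the example numbers are hypothetical; the data are invented for illustration and are not the values from table 1 or figure 1.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)

# Invented example: distance from the PAR boundary (kb) and tree length.
distance_kb = [1, 7, 200, 900, 1500, 2400]
tree_length = [0.05, 0.06, 0.07, 0.11, 0.13, 0.18]
print(round(pearson_correlation(distance_kb, tree_length), 3))
```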

    Results

    The 10 PAR genes analyzed in this study are listed in table 1. After exclusion of exon sequences and regions deleted in one of the four species (indels), the total number of aligned intron positions analyzed in this study was 13,554 bp. Because only noncoding sequences were analyzed, I will use "silent," "intron," and "noncoding" substitution rates as synonyms, referring to the substitution rate in introns of the genes analyzed.

    Substitution Rate Gradient in the p-PAR

    Before proceeding with the estimation of substitution rates in the p-PAR genes, an appropriate substitution model had to be selected. The nested ML-ratio analysis of substitution models (table 3) was conducted on all the genes considered separately, with the total log-likelihood value obtained by the summation of the log-likelihoods for individual genes. This analysis demonstrated that the HKY model (Hasegawa, Kishino, and Yano 1985) represents a dramatic improvement over the F81 model (Felsenstein 1981), reflecting the fact that C↔T and G↔A transitions occur much more frequently than transversions. Separate substitution parameters for the two transition types (T↔C and A↔G) resulted in a significant improvement of the likelihood value, suggesting that the two transition types occur with different rates. Further sequential addition of separate parameters for G↔C and A↔T transversions also improved the fit of the model to the data. However, the further addition of separate parameters for G↔T and A↔C transversions did not improve the likelihood value, suggesting that they are not significantly different. Thus, to adequately describe substitution patterns in the PAR, five substitution rate parameters (four free rates) for C↔T and G↔A transitions, C↔G, A↔T, and all the other transversions are required ("HKY+3 rates" model). This model will be used below, unless stated otherwise.

    The distribution of intron site divergence in the p-PAR is shown in figure 1. The substitution rate, expressed as a total tree length, forms a gradient with divergence between the species, increasing with distance from the PAR boundary. The correlation of the total tree length and the distance from the PAR boundary is highly significant (ρ = 0.947, P < 0.001).

    FIG. 1. The gradient of intron divergence in the human, chimpanzee, gorilla, and orangutan p-PAR expressed as the total tree length. The solid line joins the data points for all the intron sites included; the dashed line is the total tree length for the intron sequences after exclusion of the effect of CpGs by removing all the sites preceded by C or followed by G.

    To study whether the differences in the total tree length across the PAR are due to an acceleration of the substitution rate in some of the species, I compared the model with a single rate for all the branches to the model with branches having separate substitution rates ("no clock"). Because different genes have different substitution rates, the analysis was conducted for separate genes and the total log-likelihood was obtained by the summation of the log-likelihoods for individual genes. The no-clock model did not fit the data significantly better than the one-rate model in any of the genes or for the total data set (2ΔlnL = 0.476, P > 0.05), providing no evidence for a significant difference in substitution rates between the species.

    GC-Content

    GC-content is known to correlate with the silent substitution rate (Bielawski, Dunn, and Yang 2000; Yi, Ellsworth, and Li 2002); hence the distribution of GC-content across the p-PAR may be of interest for this study. It is clear from table 1 that GC-content is not equal among the studied genes and that there is a tendency for the GC-content to rise with distance from the PAR boundary. The proximal part adjacent to the PAR boundary is relatively GC-poor (intronic GC% < 40%). Intronic GC-content (GCi%) rises to 42%–47% in the region including L16, L15, and DHRSXY genes. GCi% reaches 56% in the region including ASMT and ASMTL genes farther away from the PAR boundary. More distally, in the SHOX gene GCi% = 59%, and increases further to 67% in the PPP2R3L gene located 150 kb from the telomere. The positive correlation of GCi% with the distance from the PAR boundary is highly significant (ρ = 0.96, P < 0.001). Because the substitution rate is also increasing with the distance from the PAR boundary, it is hardly surprising that GCi% in the p-PAR positively correlates with the silent substitution rate (ρ = 0.88, P = 0.001).

    Methylation

    C→T transitions due to deamination of 5-methylcytosine in methylated CpG pairs are known to occur with at least an order of magnitude higher frequency than the other mutations (Robertson and Wolffe 2000). Transversion rate is also somewhat higher at CpG dinucleotides (Nachman and Crowell 2000; Kondrashov 2002). Thus, mutations at CpG dinucleotides may be an important factor that influences the substitution rate gradient. To test this hypothesis, I repeated the substitution rate analysis, excluding all the sites preceded by C or followed by G. This is the most efficient way to correct for the effect of CpG hypermutability (A. Y. Eyre-Walker, personal communication). As is clear from figure 1, this type of analysis results in reduction of the tree length in the distal genes, whereas the proximal genes remain almost unaffected, suggesting that the effect of methylation is weak near the PAR boundary and increases toward the telomere. Thus, methylation may be one of the factors creating the substitution rate gradient. However, the exclusion of the effect of CpGs does not result in disappearance of the substitution rate gradient; the correlation of the total tree length with distance from the PAB is still highly significant (ρ = 0.89, P < 0.01), suggesting that methylation cannot be the only factor creating the substitution rate gradient in the PAR.
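
    The site-exclusion rule used above (drop every site preceded by C or followed by G) can be sketched as below. The function names are hypothetical, and two points are assumptions not stated in the text: the alignment is taken to be ungapped (indels already removed), and a column is dropped if the rule is met in any of the aligned sequences.

```python
def cpg_free_columns(alignment):
    """Indices of columns neither preceded by C nor followed by G.

    Assumption: a column is excluded if ANY sequence has a C in the
    previous column or a G in the next column.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    seqs = [seq.upper() for seq in alignment]
    keep = []
    for i in range(length):
        preceded_by_c = i > 0 and any(s[i - 1] == "C" for s in seqs)
        followed_by_g = i + 1 < length and any(s[i + 1] == "G" for s in seqs)
        if not (preceded_by_c or followed_by_g):
            keep.append(i)
    return keep

def mask_cpg_sites(alignment):
    """Return the alignment restricted to the retained columns."""
    keep = cpg_free_columns(alignment)
    return ["".join(seq[i] for i in keep) for seq in alignment]

# Toy alignment: the C and G of the CpG (columns 1 and 2) are removed.
print(mask_cpg_sites(["ACGTA", "ACGTA"]))
```

    The rule is deliberately conservative: it removes any site that could be part of a CpG in either orientation, at the cost of discarding some sites that were never methylated.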

    Substantial reduction of the substitution rate in the distal genes, but not the proximal genes, after the exclusion of CpGs (fig. 1) suggests that methylation might be fairly strong in the distal region. Surprisingly, the CpG/GpC ratio is fairly high in the distal genes, especially in the PPP2R3L gene (table 1). This ratio reflects methylation-driven depletion of the number of CpGs and is expected to be close to unity in the absence of methylation. The CpG/GpC ratio was reported to underestimate the effect of methylation in GC-rich regions (Duret and Galtier 2000), which may partially account for the observed high values of this ratio in the distal genes. Another explanation for this intriguing excess of CpGs in the region where methylation is apparently quite strong is that the number of CpGs in this region is not stationary, perhaps because it started to lose CpGs only recently. The PPP2R3L gene, where the CpG/GpC ratio is especially high (69/68 ≈ 1), might have been a CpG island that became heavily methylated only recently. Indeed, the adjacent human 40-kb intron sequence of the PPP2R3L gene (GenBank accession number AF215839) is significantly less CpG rich than the region studied in this paper (CpG/GpC = 2025/3096 = 0.65, G-test = 6.38, P = 0.012).
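
    A minimal sketch of how a CpG/GpC ratio can be obtained from a sequence is given below. The function names are hypothetical and the toy sequence is invented; only the two pairs of counts at the end (69/68 and 2025/3096) are taken from the text.

```python
def count_dinucleotide(seq, dinucleotide):
    """Count occurrences of a dinucleotide at every position of seq."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide)

def cpg_gpc_ratio(seq):
    """Ratio of CpG to GpC dinucleotide counts on one strand."""
    cpg = count_dinucleotide(seq, "CG")
    gpc = count_dinucleotide(seq, "GC")
    if gpc == 0:
        raise ValueError("no GpC dinucleotides; ratio is undefined")
    return cpg / gpc

# Invented toy sequence with two CpGs and two GpCs.
print(cpg_gpc_ratio("GCGCATCG"))

# Ratios implied by the counts quoted in the text.
print(round(69 / 68, 2), round(2025 / 3096, 2))
```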

    Biased Gene Conversion in the PAR

    If BGC operates in the p-PAR, the noncoding regions may not be neutral (Nagylaki 1983), and the substitution rate may be substantially accelerated or reduced by the BGC, depending on the GC-content and the mutation matrix (Eyre-Walker and Bulmer 1995). Although BGC may affect probabilities of fixation at (A or T)/(C or G) segregating sites, the probability of fixation at G/C and A/T segregating sites should remain unaffected by the process of BGC. If the gradient of silent substitution rate in the p-PAR is caused only by BGC, then the difference in substitution rates should be due to GC-changing substitutions, and not to G↔C and A↔T substitutions. It is clear from table 4 that in a pairwise human/orangutan comparison the number of A↔T substitutions per 100 A or T sites (DAT%) and the number of G↔C substitutions per 100 G or C sites (DGC%) correlate positively with the distance to the PAR boundary (ρ = 0.633, P < 0.05 and ρ = 0.729, P < 0.05, respectively), suggesting that, regardless of BGC, the silent substitution rate increases toward the telomere. As neither BGC nor methylation can affect G↔C and A↔T substitution rates, there should be another major factor creating the substitution rate gradient in the p-PAR. Below, I argue that a gradient in recombination rate in the p-PAR may be such a factor.
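
    The two within-class divergence measures can be sketched for a pairwise alignment as follows. The text does not spell out how "A or T sites" and "G or C sites" are counted, so the denominators here are an assumption: a column counts as an A or T site when both sequences carry A or T, and likewise for G or C. The function name is hypothetical and no correction for multiple hits is applied.

```python
def within_class_divergence(seq1, seq2):
    """Return (DAT%, DGC%) for two aligned, ungapped sequences.

    Assumed definition: substitutions within a base class per 100
    columns in which both sequences belong to that class.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    at_class, gc_class = set("AT"), set("GC")
    at_sites = at_subs = gc_sites = gc_subs = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in at_class and b in at_class:
            at_sites += 1
            at_subs += a != b
        elif a in gc_class and b in gc_class:
            gc_sites += 1
            gc_subs += a != b
        # Mixed columns (GC-changing differences) are ignored here.
    d_at = 100.0 * at_subs / at_sites if at_sites else float("nan")
    d_gc = 100.0 * gc_subs / gc_sites if gc_sites else float("nan")
    return d_at, d_gc

# Invented toy alignment: one A<->T and one G<->C difference.
print(within_class_divergence("AATTGGCC", "ATTTGCCA"))
```

    Because columns with one A or T and one G or C are skipped, these measures are insensitive to any process, such as BGC, that acts only on GC-changing differences.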

    Table 4 Correlation of the A↔T and G↔C Substitution Rates With Distance to the Pseudoautosomal Boundary (PAB) in a Pairwise Human/Orangutan Comparison.

    Discussion

    Substitution Rate Gradient

    Here I have reported a significant difference in the silent substitution rates between the genes in the human and ape pseudoautosomal region. The silent substitution rate forms a gradient, with significantly more substitutions occurring in the more distal p-PAR genes. Average silent divergence across the genome for human/chimpanzee (1%), human/gorilla (1.2%), and human/orangutan (3%) comparisons (Chen and Li 2001; Filatov and Gerrard 2003) is not significantly different from that found in the proximal p-PAR genes, but divergence beyond the proximal PAR region is significantly higher than that found in both the proximal PAR and in the non-PAR genes (Filatov and Gerrard 2003 and this study).

    Assuming that substitution rate differences at noncoding sites reflect the differences in underlying mutation rate, we can use silent divergence in these regions to estimate mutation rates across the PAR. Under neutrality, divergence is equal to twice the product of the divergence time and the mutation rate, which may be used to estimate the mutation rate for relatively distant species like humans and orangutans (e.g., Li 1997). If the time since human/orangutan divergence is 12 myr (Goodman et al. 1998) and we assume a generation time of 20 years, then the per-nucleotide per-generation mutation rate ranges from 2.7 × 10⁻⁸ in the proximal 200 kb to 9 × 10⁻⁸ in the distal PAR genes. The estimate of the mutation rate in the proximal PAR region is similar to the estimates published for other human genes, but the mutation rate estimates for the distal PAR regions are somewhat higher than those for the human non-PAR genes (Nachman and Crowell 2000; Kondrashov 2002).
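
    The arithmetic behind these estimates can be written out as a short sketch. The function name is hypothetical; the divergence time (12 myr) and generation time (20 years) are those assumed in the text, and the 3% input is the genome-wide human/orangutan average quoted earlier, used here only as an illustrative value.

```python
def per_generation_mutation_rate(divergence, divergence_time_years,
                                 generation_time_years):
    """Neutral estimate: divergence = 2 * T * mu_per_year."""
    mu_per_year = divergence / (2.0 * divergence_time_years)
    return mu_per_year * generation_time_years

# Illustrative input: 3% silent divergence, 12 myr, 20-year generations.
mu = per_generation_mutation_rate(0.03, 12e6, 20)
print(f"{mu:.2e}")  # 2.50e-08
```

    Run in reverse under the same assumptions, the rates quoted in the text, 2.7 × 10⁻⁸ and 9 × 10⁻⁸, correspond to human/orangutan divergences of about 3.2% and 10.8%, respectively.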

    If the mutation rate is indeed higher in the p-PAR, we have to expect elevated DNA diversity in the p-PAR genes. Only two estimates of DNA diversity in the PAR are available, for the SHOX (May et al. 2002) and for PPP2R3L (Schiebel et al. 2000) genes. The two estimates contradict each other: according to May et al. (2002), the DNA diversity in the SHOX gene is not higher than elsewhere in the genome (approximately 0.07%), whereas the estimate of DNA diversity from the PPP2R3L sequences reported by Schiebel et al. (2000) is almost an order of magnitude higher (approximately 0.5%). Given the much higher mutation rate (taken into account by the HKA test [Hudson, Kreitman, and Aguade 1987]) in the p-PAR genes, the level of diversity in PPP2R3L is comparable with the estimates of the non-PAR genes, whereas the diversity in SHOX appears to be significantly reduced compared to PPP2R3L and to the non-PAR genes (Filatov and Gerrard 2003). As the DNA diversity in the SHOX gene was obtained mostly by genotyping of known SNPs (May et al. 2002), the rare segregating sites might be substantially under-represented in this data set, and it is not possible to apply the standard frequency spectrum-based techniques (e.g., Tajima 1989) to test whether the reduced diversity is caused by a recent selective sweep in this region. In principle, DNA diversity can be reduced as a consequence of frequent BGC, resulting in much faster fixation of (A or T)/(G or C) segregating sites (Nagylaki 1983). This hypothesis, as well as the level of DNA diversity in the pseudoautosomal genes, requires further investigation.

    Possible Causes of the Substitution Rate Gradient in the PAR

    The existence of the substitution gradient in the human PAR raises questions as to why the substitution rate is elevated in the PAR and why it is not uniform across the PAR.

    Partial Y-Linkage

    Y chromosomes are known to have elevated substitution (and mutation) rates because there are more cell divisions in the male germ line (e.g., Makova and Li 2002). Apart from that, the Silene latifolia Y chromosome was reported to have a higher per-cell division mutation rate than the X chromosome (Filatov and Charlesworth 2002), a finding that may also be true for human sex chromosomes. Because pseudoautosomal genes spend an equal amount of time in male and female gametes, the first factor does not apply to the PAR genes. However, if the human Y chromosome has an elevated per-cell division mutation rate, partial Y-linkage of the pseudoautosomal genes could account for some elevation of the mutation rate in the PAR. If this is the case, however, one would expect the substitution rate in the PAR genes to be only a fraction of that in the Y-linked genes. The human/orangutan silent divergence in Y-linked regions ranges from 3.9% in ZFY introns (Shimmin et al. 1993) to 8.2% in TSPY introns (Kim and Takenaka 1996), and is not significantly higher than in the p-PAR (data not shown), suggesting that a higher mutation rate on the Y chromosome cannot explain the elevated substitution rate in the p-PAR. Moreover, the partial Y-linkage of the PAR genes cannot explain the gradient in substitution rate observed, as one would expect partial Y-linkage to affect all the PAR genes equally.

    Location of the PAR Boundary

    The gradient in substitution rate could be explained by different evolutionary histories of the proximal and distal PAR regions. If the proximal PAR region is X-linked in one of the species studied, then the total tree length for this region would be shorter compared to more distal regions. However, this would result in substantial differences in the branch lengths of the phylogeny for the proximal region, which is not the case. Moreover, the position of the p-PAR boundary is well described and is known to be the same in humans, apes, and Old World monkeys (Ellis et al. 1990). It is thought to have been formed in the progenitor of simian primates as a result of the translocation of the SRY locus into the larger ancestral PAR (Glaser et al. 1999). Thus, the lower substitution rate in the proximal PAR region cannot be explained by X-linkage of this region in one or several of the species studied here.

    Methylation

    Methylation may substantially affect substitution patterns and rates, as frequent C→T transitions from the deamination of 5-methylcytosine in methylated CpG dinucleotides occur at least an order of magnitude more frequently than other types of mutations (Robertson and Wolffe 2000). A much stronger reduction in the estimates of the substitution rate in the distal region after removal of CpG dinucleotides (fig. 1) suggests that methylation might be more severe in the distal genes than in those of the proximal PAR region, and it may be one of the factors creating the substitution rate gradient. However, the effect of methylation is clearly not sufficient to explain all the variation in substitution rates among the PAR genes because, after removal of all the CpGs, the substitution rate difference between the proximal region and the rest of the PAR is still significant.

    Biased Gene Conversion

    Taking into account only the A↔T and C↔G transversions, which are not affected by BGC, the substitution rate gradient is still significant, demonstrating that BGC is insufficient to explain the observed substitution rate gradient. However, I cannot reject the hypothesis that BGC influences, to some extent, the substitution rate gradient. One would expect BGC to accelerate the substitution rate in the GC-poor proximal region and to reduce it in the more distal GC-rich genes, as it accelerates fixation of AT→GC mutations (which should be fairly frequent in the AT-rich proximal region) and reduces the probability of fixation of the GC→AT mutations (which should be fairly frequent in the GC-rich distal region). Thus, if anything, BGC is likely to diminish rather than create the observed gradient in substitution rates in the PAR.

    Variation in GC-Content

    The substitution rate in the p-PAR demonstrates a highly significant correlation with GC-content. To a large extent this correlation seems to be due to abundant CpG dinucleotides in GC-rich distal genes. However, even after the removal of CpG, the value of the correlation coefficient is quite high (ρ = 0.57, P = 0.08), and with more data points it may well become significant, suggesting that even after exclusion of the effect of methylation there may be an association between GC-content and substitution rate. One way that GC-content can affect substitution rate is to change the frequencies of nucleotides which mutate with different rates. For example, if G and C mutate more frequently than A or T, then regions with higher GC-content will have a higher mutation (and silent substitution) rate. If this is the main cause of the substitution rate gradient, then the correction for the GC-content should result in the disappearance of the substitution rate gradient. However, this is clearly not the case for the A↔T and G↔C substitution rates after the correction for the GC-content (table 4).

    Mutagenic Recombination

    The mutagenic effect of recombination seems to be a very attractive explanation for the elevated substitution rate in the PAR. If the substitution gradient in the PAR is driven by variation in the recombination rate in different PAR regions, one would expect the frequency of recombination close to the pseudoautosomal boundary to be lower than in the distal PAR. Unfortunately, the best available estimates of recombination in the p-PAR (Lien et al. 2000) are not detailed enough, and no reliable estimates of the recombination rate close to the PAR boundary are available. According to cytological observations, pairing in the human p-PAR is much more frequent near the telomere than in the more proximal regions (S. Armstrong, unpublished data), suggesting that the recombination rate might drop toward the pseudoautosomal boundary. If the recombination rate is indeed lower near the PAR boundary, this would resemble the suppression of recombination near chromosomal inversions (Novitski and Braver 1954; Coyne et al. 1993). However, we need more precise estimates of the recombination rate in the p-PAR genes to corroborate or reject this hypothesis.

    Interestingly, in the mouse PAR there is also a gradient of substitution rate (e.g., Fig. 6 in Birdsell 2002), although it is much steeper than in the human PAR and extends over only a few kilobases. However, the mouse PAR is much smaller than the human p-PAR; thus, the per-nucleotide recombination rate might be much higher in the mouse PAR, which could result in this difference in gradient scale between mouse and human PARs. Indeed, the difference in substitution rates between the PAR and non-PAR genes is substantially higher in mice (Perry and Ashworth 1999) than the difference observed in humans (Filatov and Gerrard 2003; and this paper), suggesting that recombination (if it is the cause of the higher substitution rate) may be much more frequent in the mouse PAR.

    The mutagenic recombination hypothesis is supported by several reports of the positive correlation of human recombination rate with human/mouse (Lercher and Hurst 2002; Waterston et al. 2002; Hardison et al. 2003), human/chimpanzee, and human/baboon divergence (Hellmann et al. 2003). However, we failed to detect a positive correlation of human/orangutan silent divergence with the recombination rate in humans (Filatov and Gerrard 2003). Moreover, the SHOX region, which has previously been reported to be a hot spot of recombination in humans (as high as 300 cM/Mb [May et al. 2002]), does not show any signs of a significantly elevated substitution rate when compared with adjacent PAR genes (Filatov and Gerrard 2003 and this study). This may be because recombinational hot spots are very short-lived (Boulton, Myers, and Redfield 1997), and therefore it is possible that there has not been enough time for the SHOX region to accumulate substitutions.

    If recombination is mutagenic, it is surprising that it has not been detected in extensive Drosophila genetics and genomics studies. Takano-Shimizu (2001) reported changes in recombination rate, substitution rate, and GC/AT substitution bias in a sub-telomeric region of the X chromosome in three Drosophila species. On the one hand, a 10-fold reduction in the recombination frequency in the D. melanogaster lineage was coupled with an AT-biased substitution pattern and a significant acceleration of the rate of silent substitutions, which is in contrast to what would be expected if recombination were mutagenic. On the other hand, in D. orena a significant increase in the silent substitution rate coupled with GC-substitution bias was observed, compared to a closely related D. erecta (Takano-Shimizu 2001), resembling the mutagenic effect of recombination observed in human and mouse PARs. Unfortunately, the study does not contain the data on the recombination rate in D. orena or a comparison of the recombination rates between D. orena and D. erecta, and this makes it difficult to assess whether the acceleration of the substitution rate in the D. orena subtelomeric region was due to changes in the recombination rate.

    Conclusions

    I have described a silent substitution rate gradient in the human and ape p-PAR. The existence of this gradient cannot be fully explained by the differences in GC-content, methylation, or biased gene conversion across the p-PAR. I suggest that the substitution rate gradient might be due to an underlying gradient of recombination rate; however, the available estimates of recombination rate across the p-PAR do not allow a test of this hypothesis.

    Acknowledgements

    I thank Brian Charlesworth for a helpful discussion, Dave Gerrard for critical reading of the manuscript, and Adam Eyre-Walker and the anonymous reviewers for criticism and helpful comments. The work was supported by the Wellcome Trust under grant number 068193.

    Literature Cited

    Bielawski, J. P., K. A. Dunn, and Z. Yang. 2000. Rates of nucleotide substitution and mammalian nuclear gene evolution: approximate and maximum-likelihood methods lead to different conclusions. Genetics 156:1299-1308.

    Birdsell, J. A. 2002. Integrating genomics, bioinformatics and classical genetics to study the effect of recombination on genome evolution. Mol. Biol. Evol. 19:1181-1197.

    Boulton, A., R. S. Myers, and R. J. Redfield. 1997. The hotspot conversion paradox and the evolution of meiotic recombination. Proc. Natl. Acad. Sci. USA 94:8058-8063.

    Brown, W. R. A. 1988. A physical map of the human pseudoautosomal region. EMBO J. 7:2377-2385.

    Brown, T. C., and J. Jiricny. 1988. Different base/base mispairs are corrected with different efficiencies and specificities in monkey kidney cells. Cell 54:705-711.

    Burgoyne, P. S., S. K. Mahadevaiah, M. J. Sutcliffe, and S. J. Palmer. 1992. Fertility in mice requires X-Y pairing and a Y-chromosomal spermiogenesis gene mapping to the long arm. Cell 71:391-398.

    Chen, F.-C., and W.-H. Li. 2001. Genomic divergence between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet. 68:444-456.

    Ciccodicola, A., M. D'Esposito, and T. Esposito, et al. (21 co-authors). 2000. Differentially regulated and evolved genes in the fully sequenced Xq/Yq pseudoautosomal region. Hum. Mol. Genet. 9:395-401.

    Cooke, H. J., W. R. Brown, and G. A. Rappold. 1985. Hypervariable telomeric sequences from the human sex chromosomes are pseudoautosomal. Nature 317:687-692.

    Coyne, J. A., W. Meyers, A. Crittenden, and P. Sniegowski. 1993. The fertility effects of pericentric inversions in Drosophila melanogaster. Genetics 134:487-496.

    Duret, L., and N. Galtier. 2000. The covariation between TpA deficiency, CpG deficiency, and G+C content of human isochores is due to a mathematical artefact. Mol. Biol. Evol. 17:1620-1625.

    Ellis, N., and P. N. Goodfellow. 1989. The mammalian pseudoautosomal region. Trends Genet. 5:406-410.

    Ellis, N., P. Yen, K. Neiswanger, L. J. Shapiro, and P. N. Goodfellow. 1990. Evolution of the pseudoautosomal boundary in the old world monkeys and great apes. Cell 63:977-986.

    Eyre-Walker, A. 1993. Recombination and mammalian genome evolution. Proc. R. Soc. Lond. Ser. B Biol. Sci. 252:237-243.

    Eyre-Walker, A., and M. Bulmer. 1995. Synonymous substitution rates in enterobacteria. Genetics 140:1407-1412.

    Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368-376.

    Filatov, D. A. 2002. ProSeq: a software for preparation and evolutionary analysis of DNA sequence data sets. Mol. Ecol. Notes 2:621-624.

    Filatov, D. A., and D. Charlesworth. 2002. Substitution rates in the X- and Y-linked genes of the plants, Silene latifolia and S. dioica. Mol. Biol. Evol. 19:898-907.

    Filatov, D. A., and D. T. Gerrard. 2003. High mutation rates in human and ape pseudoautosomal genes. Gene 317:67-77.

    Freije, D., C. Helms, M. S. Watson, and H. Donis-Keller. 1992. Identification of a second pseudoautosomal region near the Xq and Yq telomeres. Science 258:1784-1787.

    Fullerton, S. M., A. B. Carvalho, and A. G. Clark. 2001. Local rates of recombination are positively correlated with GC content in the human genome. Mol. Biol. Evol. 18:1139-1142.

    Glaser, B., D. Myrtek, Y. Rumpler, K. Schiebel, M. Hauwy, G. A. Rappold, and W. Schempp. 1999. Transposition of SRY into ancestral pseudoautosomal region creates a new pseudoautosomal boundary in a progenitor of simian primates. Hum. Mol. Genet. 8:2071-2078.

    Goodman, M., C. A. Porter, J. Czelusniak, S. L. Page, H. Schneider, J. Shoshani, G. Gunnell, and C. P. Groves. 1998. Toward a phylogenetic classification of primates based on DNA evidence complemented by fossil evidence. Mol. Phylogenet. Evol. 9:585-598.

    Hardison, R. C., K. M. Roskin, and S. Yang, et al. (18 co-authors). 2003. Covariation in frequencies of substitution, deletion, transposition, and recombination during eutherian evolution. Genome Res. 13:13-26.

    Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160-174.

    Hellmann, I., I. Ebersberger, S. E. Ptak, S. Pääbo, and M. Przeworski. 2003. A neutral explanation for the correlation of diversity with recombination rates in humans. Am. J. Hum. Genet. 72:1527-1535.

    Hudson, R. R., M. Kreitman, and M. Aguade. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159.

    Kim, H. S., and O. Takenaka. 1996. A comparison of TSPY genes from Y-chromosomal DNA of the great apes and humans: sequence, evolution, and phylogeny. Am. J. Phys. Anthropol. 100:301-309.

    Kondrashov, A. S. 2002. Direct estimates of human per nucleotide mutation rates at 20 loci causing mendelian diseases. Hum. Mutat. 21:12-27.

    Kong, A., D. F. Gudbjartsson, and J. Sainz, et al. (17 co-authors). 2002. A high-resolution recombination map of the human genome. Nat. Genet. 31:241-247.

    Lamb, B. C. 1986. Gene conversion disparity—factors influencing its direction and extent, with tests of assumptions and predictions in its evolutionary effects. Genetics 114:611-632.

    Lercher, M. J., and L. D. Hurst. 2002. Human SNP variability and mutation rate are higher in regions of high recombination. Trends Genet. 18:337-340.

    Li, W.-H. 1997. Molecular evolution. Sinauer Associates. Sunderland, Mass.

    Lien, S., J. Szyda, B. Schechinger, G. Rappold, and N. Arnheim. 2000. Evidence for heterogeneity in recombination in the human pseudoautosomal region: high resolution analysis by sperm typing and radiation-hybrid mapping. Am. J. Hum. Genet. 66:557-566.

    Makova, K. D., and W.-H. Li. 2002. Strong male-driven evolution of DNA sequences in humans and apes. Nature 416:624-626.

    May, C. A., A. C. Shone, L. Kalaydjieva, A. Sajantila, and A. J. Jeffreys. 2002. Crossover clustering and rapid decay of linkage disequilibrium in the Xp/Yp pseudoautosomal gene SHOX. Nat. Genet. 31:272-275.

    Muse, S. V., and B. S. Weir. 1992. Testing for equality of evolutionary rates. Genetics 132:269-276.

    Nachman, M. W., and S. Crowell. 2000. Estimates of the mutation rate per nucleotide in humans. Genetics 156:297-304.

    Nagylaki, T. 1983. Evolution of a finite population under gene conversion. Proc. Natl. Acad. Sci. USA 80:6278-6281.

    Novitski, E., and G. Braver. 1954. An analysis of crossing over within a heterozygous inversion in Drosophila melanogaster. Genetics 39:197-209.

    Perry, J., and A. Ashworth. 1999. Evolutionary rate of a gene affected by chromosomal position. Curr. Biol. 9:987-989.

    Petit, C., J. Levilliers, and J. Weissenbach. 1988. Physical mapping of the human pseudoautosomal region; comparison with the genetic linkage map. EMBO J. 7:2369-2376.

    Robertson, K. D., and A. P. Wolffe. 2000. DNA methylation in health and disease. Nat. Rev. Genet. 1:11-19.

    Schiebel, K., J. Meder, A. Rump, A. Rosenthal, M. Winkelmann, C. Fischer, T. Bonk, A. Humeny, and G. Rappold. 2000. Elevated DNA sequence diversity in the genomic region of the phosphatase PPP2R3L gene in the human pseudoautosomal region. Cytogenet. Cell Genet. 91:224-230.

    Shimmin, L. C., B. H. J. Chang, and W.-H. Li. 1993. Male-driven evolution of DNA sequences. Nature 362:745-747.

    Sokal, R. R., and F. J. Rohlf. 1995. Biometry, 3rd edition. W. H. Freeman, San Francisco.

    Soriano, P., E. A. Keitges, D. F. Schorderet, K. Harbers, S. M. Gartler, and R. Jaenisch. 1987. High rate of recombination and double crossovers in the mouse pseudoautosomal region during male meiosis. Proc. Natl. Acad. Sci. USA 84:7218-7220.

    Strathern, J. N., B. K. Shafer, and C. B. McGill. 1995. DNA synthesis errors associated with double-strand-break repair. Genetics 140:965-972.

    Takano-Shimizu, T. 2001. Local changes in GC/AT substitution biases and in crossover frequencies on Drosophila chromosomes. Mol. Biol. Evol. 18:606-619.

    Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.

    Waterston, R. H., K. Lindblad-Toh, E. Birney, J. Rogers, and J. F. Abril, et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520-562.

    Yang, Z. 1994. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105-111.

    Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555-556.

    Yi, S., D. L. Ellsworth, and W.-H. Li. 2002. Slow molecular clocks in old world monkeys, apes, and humans. Mol. Biol. Evol. 19:2191-2198.

    (Dmitry A. Filatov)