Entropy and GC Content in the β-esterase Gene Cluster of the Drosophila melanogaster Subgroup
     * Department of Ecology and Evolutionary Biology, University of California, Irvine; Academy of Ecology, Marine Biology, and Biotechnology, Far Eastern State University, Vladivostok, Russia; Institute of Marine Biology, Vladivostok, Russia; and Troitsk Institute of Innovation and Thermonuclear Investigations (TRINITI), Theoretical Department of Division for Perspective Investigations, Moscow Region, Russia

    E-mail: fjayala@uci.edu.

    Abstract

    We perform spectral entropy and GC content analyses in the β-esterase gene cluster, including the Est-6 gene and the ψEst-6 putative pseudogene, in seven species of the Drosophila melanogaster species subgroup. ψEst-6 combines features of functional and nonfunctional genes. The spectral entropies show distinctly lower structural ordering for ψEst-6 than for Est-6 in all species studied. Our observations agree with previous results for D. melanogaster and provide additional support to our hypothesis that after the duplication event Est-6 retained the esterase-coding function and its role during copulation, while ψEst-6 lost that function but now operates in conjunction with Est-6 as an intergene. Entropy accumulation is not a completely random process for either gene. Structural entropy is nucleotide dependent. The relative normalized deviations for structural entropy are higher for G than for C nucleotides. The entropy values are similar for Est-6 and ψEst-6 in the case of A and T but are lower for Est-6 in the case of G and C. The GC content in synonymous positions is uniformly higher in Est-6 than in ψEst-6, which agrees with the reduced GC content generally observed in pseudogenes and nonfunctional sequences. The observed differences in entropy and GC content reflect an evolutionary shift associated with the process of pseudogenization and subsequent functional divergence of ψEst-6 and Est-6 after the duplication event.

    Key Words: Drosophila melanogaster subgroup; β-esterase gene cluster; Est-6; ψEst-6; entropy; GC content

    Introduction

    The β-esterase gene cluster is on the left arm of chromosome 3 of Drosophila melanogaster, at 68F7-69A1 in the cytogenetic map. The cluster comprises two tandemly duplicated genes, first described as Est-6 and Est-P (Collet et al. 1990), with coding regions separated by only 193 bp. The coding regions are 1,686 and 1,691 bp long, respectively, and consist of two exons (1,387 and 248 bp) and a small (51 bp in Est-6 and 56 bp in Est-P) intron (Oakeshott et al. 1987). The Est-6 gene is well characterized (reviews in Richmond et al. 1990; Oakeshott et al. 1993, 1995). The gene encodes the major β-carboxylesterase (EST-6) that is transferred by D. melanogaster males to females in the seminal fluid during copulation (Richmond et al. 1980) and affects the female's subsequent behavior and mating proclivity (Gromko, Gilbert, and Richmond 1984). Less information is available for Est-P. Collet et al. (1990) concluded that Est-P is a functional gene based on several lines of evidence: transcriptional activity, intact splicing sites, no premature termination codons, and presence of initiation and termination codons. Balakirev and Ayala (1996) and Balakirev et al. (2003) found premature stop codons within the Est-P–coding region and some other indications suggesting that Est-P might in fact be a pseudogene, which they labeled ψEst-6. Dumancic et al. (1997) showed that some alleles of Est-P produce a catalytically active esterase corresponding to the previously identified EST-7 isozyme (Healy, Dumancic, and Oakeshott 1991) and renamed the gene Est-7. The β-esterase gene cluster in other Drosophila species also includes two (or three, in Drosophila pseudoobscura and related species) closely linked genes (Yenikolopov et al. 1989; Brady, Richmond, and Oakeshott 1990; East, Graham, and Whitington 1990; Oakeshott et al. 1993, 1995; King 1998).

    We have detected in D. melanogaster different patterns of nucleotide variation in Est-6 and ψEst-6 (Balakirev and Ayala 2003a, 2003b, 2004; Balakirev et al. 2003). Total variation is 2.1 times higher in ψEst-6 than in Est-6. In ψEst-6 the rate of synonymous substitutions is higher than the rate of nonsynonymous substitutions and neutrality tests (Kelly 1997; Wall 1999) are significant; the ratio of replacement to synonymous polymorphic sites is 1.2 for ψEst-6 but 0.59 for Est-6. The number of amino acid replacements is 2.9 times higher in ψEst-6 than in Est-6 and some of them are drastic. In non-African populations the recombination rate is 2.6 times higher in Est-6 than in Est-6 so that linkage disequilibrium is more pronounced in Est-6; however, in the African population, the recombination rate is similar for both genes. We have detected much higher gene conversion within ψEst-6 than within Est-6. The intergenic gene conversion is limited. Within the ψEst-6–coding region we found 17 premature stop codons among 78 sequences. The structural entropy analysis reveals significantly lower structural regularity and higher structural divergence for ψEst-6 than for Est-6, as expected if ψEst-6 is a pseudogene or nonfunctional gene (Balakirev et al. 2003). However, as noted, the gene can be expressed (Collet et al. 1990) and some alleles of ψEst-6 produce a catalytically active esterase (Dumancic et al. 1997), although this is detected in late larvae and adults of both sexes, whereas the functional Est-6 gene transcripts are found in all life stages but predominantly in adult males (Collet et al. 1990; Dumancic et al. 1997), consistent with the significant role of EST-6 in male mating (Richmond et al. 1980; Gromko, Gilbert, and Richmond 1984).

    We have recently suggested that pseudogenes may be an important part of the genome, representing a repertoire of sequences evolving toward the acquisition of new or changing functions (Balakirev and Ayala 2003b, 2003c, 2003d, 2004; Balakirev et al. 2003). A pseudogene may lose the initial specific coding function but retain or acquire others, which may not be easily recognizable. Pseudogenes along with their parental sequences may constitute indivisible functionally interacting entities ("intergenic complex" or "intergene") in which each single component cannot successfully accomplish the final functional role. The Est-6/ψEst-6 complex in D. melanogaster may represent such an intergenic complex (Balakirev and Ayala 2003b, 2003d, 2004) where the Est-6 gene plays the structural role (coding for the EST-6 enzyme) while ψEst-6 may enhance genetic variation in the Est-6 gene and contribute to regulating its expression.

    Previously, we have investigated nucleotide variability of the Est-6 gene and ψEst-6 in a D. melanogaster sample from a natural population of California (Balakirev and Ayala 1996; Balakirev et al. 1999, 2003; Ayala, Balakirev, and Sáez 2002; E. S. Balakirev, E. I. Balakirev, and Ayala 2002) and also in three populations of East Africa (Zimbabwe), Europe (Spain), and South America (Venezuela) (Balakirev and Ayala 2003a, 2003b, 2004). Now we extend the analysis by comparing the nucleotide variability of Est-6 and ψEst-6 in seven species of the D. melanogaster subgroup. We analyze the entropy and GC content of the genes. Spectral entropies are significantly higher for ψEst-6 than for Est-6 in all species studied. The observations agree with our previous results for D. melanogaster. For the first time we show that the accumulation of structural entropy is not completely random but is nucleotide dependent and related to GC content of the genes.

    Materials and Methods

    Drosophila Strains and Species

    The D. melanogaster strains have been previously described (E. S. Balakirev, E. I. Balakirev, and Ayala 2002; Balakirev and Ayala 2003a). Drosophila sechellia, Drosophila mauritiana, Drosophila erecta, Drosophila teissieri, and Drosophila orena strains were obtained from the Drosophila Species Stock Center (Bowling Green, Ohio).

    DNA Extraction, Amplification, and Sequencing

    The procedures were described earlier (Balakirev et al. 1999, 2003; E. S. Balakirev, E. I. Balakirev, and Ayala 2002). For each line, the sequences of both strands were determined using 24 overlapping internal primers spaced, on average, 350 nt. At least two independent polymerase chain reaction (PCR) amplifications were sequenced in both directions to prevent possible PCR or sequencing errors. The new sequence data have been deposited in the GenBank under the following accession numbers: AY695919 (Drosophila simulans), AY695920 (D. sechellia), AY695921 (D. mauritiana), AY695922 (D. teissieri), AY695923 (D. erecta), and AY695924 (D. orena). The population data for D. melanogaster are from Balakirev and Ayala (2003a, 2003b, 2004); see GenBank accession numbers AF147095–147102, AF150809–AF150815, AF217624–AF217645, AF526538–AF526559, AY247664–AY247713, AY247987–AY248036, AY368077–AY368109, and AY369088–AY369115.

    DNA Sequence Analysis

    The esterase sequences were assembled using the program SeqMan (Lasergene, DNASTAR, Inc., 1994–1997). Multiple alignment was carried out manually and using the program ClustalW (Thompson, Higgins, and Gibson 1994). GC content was computed using the DnaSP program, version 3.50 (J. Rozas and R. Rozas 1999) and PROSEQ, version 2.4 (Filatov and Charlesworth 1999). The Wilcoxon and Mann-Whitney tests were used to evaluate the significance of the pairwise differences in GC content.
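    The GC measures used throughout (overall GC content and GC3, the GC content at third codon positions) can be illustrated with a minimal sketch. This is not the DnaSP or PROSEQ implementation; the function names and the toy sequence are hypothetical, and the sketch assumes an in-frame coding sequence with ambiguous bases simply ignored.

```python
def gc_fraction(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in counted) / len(counted)


def gc3_fraction(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3      # drop a trailing partial codon
    return gc_fraction(cds[2:usable:3])   # every third base, starting at codon position 3


# Hypothetical toy coding sequence (codons ATG GCC AAG TTC GGA), not real Est-6 data.
toy_cds = "ATGGCCAAGTTCGGA"
print(f"GC  = {100 * gc_fraction(toy_cds):.1f}%")
print(f"GC3 = {100 * gc3_fraction(toy_cds):.1f}%")
```

    Applying the same two functions to each exon separately would give the per-region values of the kind compared in the Results.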

    Spectral Structural Entropy Analysis

    Balakirev et al. (2003) have previously compared the characteristics of Est-6 and ψEst-6 in terms of spectral entropy (for a general review of methods and further references, see Lobzin and Chechetkin 2000). Spectral entropy characterizes the structural regularity of a nucleotide sequence. In our case it allows us conveniently to assess the comparative rates and positional distribution of mutations, as well as their influence on the regularity of the nucleotide sequences for Est-6 and ψEst-6, and to shed additional light on gene function. For the convenience of the reader, we reproduce the main definitions in the Supplementary Material online.

    Because methods based on spectral entropy are relatively novel, we address two questions: why use the Fourier transform, and what is spectral entropy useful for? Tandem repeats and scattered DNA repeats play important roles in the structural organization of chromatin and regulatory mechanisms (Lewin 2000). Moreover, underlying repeats in quasi-randomly modified form and quasi-periodically phased nucleotides also play important roles in protein-coding fragments attributed to the fraction of "unique DNA" in genomic sequences (Lobzin and Chechetkin 2000, and references therein). Such periodicities emerge because coding that is concordant with the B-DNA double-helix pitch, the quasi-repeated packaging of nucleosomes, cooperative binding with regulatory proteins, and so on, is evolutionarily preferred over nonphased synonymous or nearly synonymous (in the sense of proximity between physicochemical characteristics of encoded amino acids) counterparts. We illustrate these mechanisms with examples.

    For instance, GGGCCC tracts in the presence of Mg2+, or (A)n tracts with n = 3–8 phased with the dsDNA helix pitch, cause DNA curvature and facilitate packaging into nucleosomes (Harvey et al. 1995). Indeed, underlying periodicities with periods P = 10.3–10.5 and 200 ± 30 bp can be detected in many sequences (Chechetkin and Lobzin 1998; Thåström et al. 1999). The RecA protein participates in Escherichia coli recombination; it binds cooperatively to single-stranded and double-stranded DNA and forms helical structures with a pitch of 18.6 bp (Bar-Ziv and Libchaber 2001). Binding is preferential to T-rich tracts. A period P = 18.64 for T is observed in bacteriophage φX174, which infects E. coli (Chechetkin and Turygin 1995). Other highly reproducible features are related to the periodicity P = 3 in protein-coding regions (Tiwari et al. 1997; Lobzin and Chechetkin 2000, and references therein; see also figs. 2 and 3 below).

    FIG. 2.— Normalized structure factor spectra (see eqs. 1s–5s) for Est-6 in Drosophila orena.

    FIG. 3.— Normalized structure factor spectra (see eqs. 1s–5s) for ψEst-6 in Drosophila orena.

    The underlying quasi-periodic features may effectively be displayed via the Fourier technique, while their integral contribution is assessed with the spectral entropies. Therefore, Fourier transform and spectral entropy provide rather simple and effective ways to place evolutionary modifications into structural context. In the case of neutral mutations the higher the rate of mutations, the higher the randomization in nucleotide sequence and the higher the spectral entropy. As Fourier transform (eq. 1s) can be reciprocated, the structure factors (2s) contain information not only about particular periodicities but also convey general information about nucleotide positions in a DNA sequence. Normalization of structure factors (5s) ensures the independence of the structural ordering criterion from nucleotide composition. Thus, the presence or absence of correlations between ordering and composition provides valuable biological information.
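    The chain from Fourier transform to spectral entropy can be sketched in a few lines. The supplementary equations (1s–11s) are not reproduced in this text, so the sketch below assumes one common form of the definitions: a discrete Fourier transform of the 0/1 indicator of a nucleotide, structure factors given by squared amplitudes, normalization by the composition-dependent mean N(M − N)/[M(M − 1)], and an entropy S = −Σ f ln f summed over the harmonics n = 1, ..., [M/2]. All function names are hypothetical.

```python
import cmath
import math


def structure_factors(seq, base):
    """Squared Fourier amplitudes of the indicator of `base` for harmonics n = 1..M//2."""
    M = len(seq)
    indicator = [1.0 if b == base else 0.0 for b in seq]
    factors = []
    for n in range(1, M // 2 + 1):
        q = 2 * math.pi * n / M
        amplitude = sum(x * cmath.exp(-1j * q * m)
                        for m, x in enumerate(indicator)) / math.sqrt(M)
        factors.append(abs(amplitude) ** 2)
    return factors


def normalized_structure_factors(seq, base):
    """Structure factors divided by their mean over n = 1..M-1 (assumed normalization)."""
    M, N = len(seq), seq.count(base)
    if N == 0 or N == M:
        raise ValueError("base must be present but not fill the whole sequence")
    mean_factor = N * (M - N) / (M * (M - 1))
    return [F / mean_factor for F in structure_factors(seq, base)]


def spectral_entropy(seq, base):
    """Assumed form S = -sum f ln f; more negative values mean stronger ordering."""
    return -sum(f * math.log(f)
                for f in normalized_structure_factors(seq, base) if f > 0)


# Toy sequence with a perfect period P = 3 (not real data): the peak sits at n = M / 3.
toy = "GAT" * 20
spectrum = normalized_structure_factors(toy, "G")
peak_harmonic = 1 + max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_harmonic, len(toy) // 3)
```

    Under these assumptions a strong periodicity concentrates the spectrum in a few harmonics and lowers the entropy, while random point mutations flatten the spectrum and raise it. The direct summation is quadratic in sequence length; a fast Fourier transform would be the natural replacement for long sequences.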

    Results

    We present the entropy characteristics and GC content analysis of the β-esterase gene cluster in seven sibling species belonging to the D. melanogaster species subgroup. These seven species belong to three complexes: (1) the melanogaster complex, represented by D. melanogaster, D. simulans, D. mauritiana, and D. sechellia; (2) the yakuba complex, represented by D. teissieri (Drosophila yakuba and Drosophila santomea are the two other species in this complex); and (3) a complex represented by D. erecta and D. orena (Lemeunier et al. 1986; Cariou 1987; Lachaise et al. 1988, 2000). The phylogenetic relationships of Est-6 and ψEst-6 in the seven species are presented in figure 1. Alternative distance methods implemented in the MEGA 2.1 program (Kumar et al. 2001) yield identical topologies (data not shown). The tree in figure 1 is consistent with those derived from other genes (Kalantzi-Makri et al. 1999; Avedisov et al. 2001; Parsch et al. 2001; Ko, David, and Akashi 2003; Matsuo 2003).

    FIG. 1.— Neighbor-joining tree of the β-esterase genes using Kimura 2-parameter distances. The tree is based on the coding (exon I + exon II) sequence for each gene. The numbers at the nodes are bootstrap percentage probabilities based on 10,000 replications.

    Entropy Analysis

    In order to calculate the relevant entropy characteristics for Est-6 and ψEst-6, the lengths of the nucleotide sequences were first equalized by removing insertions/deletions. Thus, the lengths of all sequences were equal to M = 1,614 bp. All Fourier spectra for the structure factors (eq. 2s) appear to be very similar to the counterpart spectra for D. melanogaster (Balakirev et al. 2003); examples for Est-6 and ψEst-6 in D. orena are illustrated in figures 2 and 3. The highest peaks, at harmonic number n = 538 (n = M/P = 1,614/3), correspond to the periodicity P = 3 typical of all protein-coding regions (Tiwari et al. 1997; Lobzin and Chechetkin 2000).

    The results for the normalized deviations (eq. 9s) in spectral entropies are summarized in table 1. They show distinctly higher structural ordering for Est-6 than for ψEst-6 in all species. Using the probability distribution (eq. 10s) for the normalized deviations r, the entropy variations within species for Est-6 and ψEst-6 in table 1 may be assessed statistically with the corresponding rank order statistics (Johnson and Leone 1977). In particular, the tabulated mean and standard deviation of the range (the difference between the largest and smallest values, in units of the standard deviation) for a group of n = 7 species are 2.70 ± 0.83. Therefore, despite the seemingly large entropy variations, both groups of data for Est-6 and ψEst-6 can be considered as statistically homogeneous within each group.
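    The tabulated values 2.70 ± 0.83 can be checked numerically, assuming the statistic in question is the sample range (largest minus smallest value) of n = 7 independent standard normal deviates. The following Monte Carlo sketch is only an illustration of that reading; the function name, seed, and number of trials are arbitrary choices.

```python
import random
import statistics


def simulate_range(n, trials, seed=0):
    """Mean and standard deviation of (max - min) over samples of n standard normals."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return statistics.mean(ranges), statistics.stdev(ranges)


mean_range, sd_range = simulate_range(n=7, trials=20000)
print(f"range for n = 7: {mean_range:.2f} +/- {sd_range:.2f}")
```

    An observed spread among the seven species that falls within roughly one or two standard deviations of the expected range is what the text describes as statistically homogeneous.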

    Table 1 Relative Normalized Deviations for Structural Entropy, rα,rel and rrel

    The partial spectral entropies (6s) or the corresponding normalized deviations (9s) may formally be considered as vectors in four-dimensional parameter space. The appropriate metrics (e.g., Euclidean or maximum of modulus difference between components) in this space can be applied to the construction of phylogenetic trees as well. Using the data in table 1, it is easy to show nearly complete correspondence with the tree in figure 1 for Est-6 and to a lesser extent for Est-6.

    The entropy values for the nucleotides A and T are similar for Est-6 and ψEst-6, but for G and C (and for the "total," as a consequence) the entropy values are lower (or the structural ordering is higher) in Est-6 than in ψEst-6 (table 1). Thus, the present analysis shows that the Est-6 and ψEst-6 characteristics of structural entropy are nucleotide dependent. In part, the bias in entropy for G may be attributed to the more pronounced periodicity P = 3 for Est-6 than for ψEst-6, which is the common structural marker of protein-coding regions (see figs. 2 and 3). For Est-6, the normalized deviations are significant for G in all species, while A and T show particularly strong regularity in D. orena and D. erecta. For ψEst-6, there is significant regularity for D. erecta only for G. This observation is in good agreement with previous results for D. melanogaster and supports previous inferences about the functional roles of Est-6 and ψEst-6 (Balakirev et al. 2003; see below).

    To illustrate the higher rate of mutation accumulation in ψEst-6 with respect to Est-6, we performed a simulation by introducing random point mutations into a sequence for Est-6 and we analyzed the decrease of the normalized deviation rrel for the total entropy Stotal. The corresponding dependence is illustrated in figure 4. The simulation results averaged over 20 realizations indicate that it is necessary to introduce about 20% of random point mutations into the sequence for Est-6 in D. orena to obtain the same structural ordering as for ψEst-6.

    FIG. 4.— Est-6 in Drosophila orena: dependence of the normalized deviation for total entropy (see eqs. 9s and 11s) on the percentage of random point mutations introduced into the sequence with preservation of the initial nucleotide composition (solid curve) and without preservation of the initial nucleotide composition during mutation accumulation (dotted curve).
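    The text does not specify how the random point mutations were generated, so the following is only one possible scheme, with hypothetical function names. It shows two variants: one in which each chosen site receives a different random base (the nucleotide composition may drift), and one in which the bases at the chosen sites are shuffled among themselves (the composition is preserved exactly, although a shuffled site may occasionally keep its base).

```python
import random

BASES = "ACGT"


def mutate_free(seq, fraction, rng):
    """Replace a random subset of sites with a different random base."""
    seq = list(seq)
    n_sites = round(fraction * len(seq))
    for i in rng.sample(range(len(seq)), n_sites):
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(seq)


def mutate_preserving_composition(seq, fraction, rng):
    """Shuffle the bases found at a random subset of sites; base counts stay unchanged."""
    seq = list(seq)
    n_sites = round(fraction * len(seq))
    sites = rng.sample(range(len(seq)), n_sites)
    picked = [seq[i] for i in sites]
    rng.shuffle(picked)
    for i, base in zip(sites, picked):
        seq[i] = base
    return "".join(seq)


rng = random.Random(1)
original = "ACGT" * 50                      # toy sequence, not real Est-6 data
drifted = mutate_free(original, 0.20, rng)  # 20% of sites changed
shuffled = mutate_preserving_composition(original, 0.20, rng)
print(sum(a != b for a, b in zip(original, drifted)), "sites changed")
```

    Passing the mutated sequences to an entropy function for a grid of mutation percentages, and averaging over repeated realizations (20 in the text), would trace out a curve of the kind shown in figure 4.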

    Polymorphism Analysis with Spectral Entropy

    Spectral structural entropy may be used for analyzing intraspecific polymorphism in related genes. Positional variation is assessed by standard methods with sliding windows. A quantitative measure of polymorphism is given by the coefficient of variation,

    (1)

    where deviations of spectral entropy are calculated for fragment sequences extending from site m0 – W to a site m0 + W and averaged over a set consisting of P sequences. The mean and standard deviations are

    (2)

    and

    (3)

    respectively.

    The sliding window should be wide enough to ensure the robustness of the reference characteristics for comparable random sequences (Lobzin and Chechetkin 2000). We selected a sliding window of 2W + 1 = 201 sites with one-site increments. To avoid overestimating polymorphic variation when the mean values are small, the peaks in the coefficients of variation were cut at the threshold |CV| = 3.
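    The sliding-window procedure can be sketched as follows. Equations (1)–(3) did not survive in this copy of the text, so the sketch assumes the usual definition of the coefficient of variation, CV = (standard deviation)/(mean), taken over the P sequences for each window, with the sample (P − 1) standard deviation; both the divisor and the treatment of a zero mean are assumptions. The `deviation` argument stands for whatever per-fragment entropy deviation is being analyzed and is supplied by the caller; all names are hypothetical.

```python
import statistics


def coefficient_of_variation(seqs, center, half_width, deviation, cap=3.0):
    """CV of deviation() over aligned sequences for the window center +/- half_width."""
    windows = [s[center - half_width: center + half_width + 1] for s in seqs]
    values = [deviation(w) for w in windows]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD, an assumed choice
    if mean == 0:
        return 0.0 if sd == 0 else cap     # assumed handling of a zero mean
    return max(-cap, min(cap, sd / mean))  # cut peaks at |CV| = cap


def sliding_cv(seqs, half_width, deviation, cap=3.0):
    """CV profile along the alignment with one-site increments."""
    length = len(seqs[0])
    return [coefficient_of_variation(seqs, m0, half_width, deviation, cap)
            for m0 in range(half_width, length - half_width)]


# Toy usage with a stand-in deviation function (G count plus one), not a real entropy.
toy_alignment = ["ACGTACGTAC", "ACGTACGGAC", "ACGTACGTAC"]
profile = sliding_cv(toy_alignment, half_width=2, deviation=lambda w: w.count("G") + 1.0)
print(profile)
```

    With W = 100 and a real entropy deviation in place of the stand-in, the same loop would give the intraspecific profile (P = 78 strains) or the interspecific profile (P = 7 species) described in the text.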

    The coefficients of variation were applied to (1) polymorphic changes for Est-6 and ψEst-6 in 78 strains of D. melanogaster (P = 78) collected in Zimbabwe, Spain, California, and Venezuela (Balakirev and Ayala 2003a, 2004) and (2) corresponding evolutionary changes in the set of seven species (P = 7). The coefficients of variation CVintra (in case [1]) and CVinter (in case [2]) characterize intraspecific diversity and interspecific divergence, respectively. The sliding window plots for the coefficients of variation related to intraspecific variability in 78 strains of D. melanogaster as well as the corresponding plots characterizing interspecific divergence for the seven Drosophila species show noticeably more preserved regions in Est-6 than in ψEst-6 (data not shown).

    An important issue concerns the intraspecific and interspecific correlations in Est-6 versus ψEst-6, which may be assessed by the cross-correlation coefficients of intraspecific variability versus interspecific divergence within the sliding windows. For a window of width 2w + 1, these correlation coefficients are defined as

    (4)

    where the variation coefficients CV(m) are defined by equations (1–3) and

    (5)

    (6)

    According to Fisher's theory (Johnson and Leone 1977), the mean characteristics for the random correlations are given by

    (7)

    while the normalized variable

    (8)

    has an approximately Gaussian distribution with a unit variance.

    The relevant global cross-correlation coefficients (corresponding to 2w + 1 = M – 2W – 1 = 1413) for the total entropies Stotal (eq. 11s) for Est-6 are k = 0.11, r(k) = 4.08 and for Est-6 are k = 0.07, r(k) = 2.51. The probability of obtaining the observed correlations by chance is Pr ≈ 10^-5 (for Est-6) and Pr ≈ 8 × 10^-3 (for Est-6). Thus, the correlations between intraspecific variability and interspecific divergence are significant for both genes but turn out to be lower in Est-6 than in Est-6. In agreement with this observation, we have previously shown that the tests of neutrality of Kelly (1997) and Wall (1999) are significant for both Est-6 and ψEst-6 (Balakirev et al. 2003; Balakirev and Ayala 2004). Moreover, for ψEst-6 the tests are significant with a lower level of recombination than for the Est-6 gene. Using maximum likelihood estimates of nonsynonymous/synonymous rate ratios (ω = dN/dS) (reviewed by Yang and Bielawski 2000) we also have shown (Balakirev, Anisimova, and Ayala 2005) that the proportion of sites evolving under negative selection is substantially higher in Est-6 (ω = 0.11; 83.7% of sites) than in ψEst-6 (ω = 0.003; 48.1% of sites).
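    The order of magnitude of these figures can be checked with a short calculation. Equations (7) and (8) are missing from this copy, so the sketch assumes the standard large-sample result that a correlation coefficient between N uncorrelated points has zero mean and variance 1/(N − 1), giving the normalized variable r(k) = k·sqrt(N − 1); the one-sided Gaussian tail is likewise an assumed choice. The function names are hypothetical.

```python
import math


def normalized_correlation(k, n_points):
    """Assumed normalization: r(k) = k * sqrt(N - 1), approximately unit-variance Gaussian."""
    return k * math.sqrt(n_points - 1)


def one_sided_gaussian_tail(z):
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


for k in (0.11, 0.07):
    z = normalized_correlation(k, 1413)
    print(f"k = {k:.2f}: r(k) = {z:.2f}, tail probability = {one_sided_gaussian_tail(z):.1e}")
```

    With the rounded k values this gives r(k) of about 4.13 and 2.63, close to but not identical with the reported 4.08 and 2.51, and tail probabilities of the same order as those quoted. Exact agreement is not expected, because k is reported to two decimals and the precise form of equation (8) is not known here.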

    GC Content

    Table 2 shows the distribution of GC content. The average GC:AT ratios for all positions and coding positions are not significantly different between Est-6 and ψEst-6 (P > 0.05, Fisher's exact test), but the ratio is significantly different for the third codon position (P = 0.0143, Fisher's exact test). Total GC content is significantly lower in ψEst-6 than in Est-6 (46.6% vs. 49.7%; Wilcoxon test P = 0.0156; Mann-Whitney test P = 0.006) mostly due to GC3, the third codon position (46.0% vs. 55.1%; Wilcoxon P = 0.0156; Mann-Whitney P = 0.006). For all coding positions the difference in GC content (GCc) between Est-6 and ψEst-6 is not significant (47.0% vs. 46.9%; Wilcoxon P = 0.6875; Mann-Whitney P = 0.9015). Thus, the most pronounced difference in base composition between Est-6 and ψEst-6 is GC content at the third codon position.
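    The recurring value P = 0.0156 has a simple interpretation if the Wilcoxon test was the exact two-sided signed-rank test on the seven species pairs without ties (an assumption; the text does not say how the P values were computed). With n = 7 pairs there are 2^7 = 128 equally likely sign patterns, so the smallest attainable two-sided P is 2/128 = 0.0156, reached only when all seven differences have the same sign. The sketch below enumerates the null distribution; the function name is hypothetical.

```python
from itertools import product


def wilcoxon_exact_two_sided(n, w):
    """Exact two-sided P for the signed-rank statistic w = min(W+, W-), n untied nonzero pairs."""
    count = 0
    for signs in product((0, 1), repeat=n):
        w_plus = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if w_plus <= w:
            count += 1
    return min(1.0, 2 * count / 2 ** n)


for w in (0, 1, 2, 6):
    print(w, wilcoxon_exact_two_sided(7, w))
```

    Under the same assumption, the other Wilcoxon values quoted in this section (0.0312, 0.0469, and 0.2187) correspond to smaller-rank sums of 1, 2, and 6, respectively.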

    Table 2 GC Content (%) and Overall GC to AT Ratio of the Est-6 and ψEst-6 Genes in Seven Species of the Drosophila melanogaster Subgroup

    Exon I has significantly higher GC content than exon II for both genes (47.7% and 51.9% vs. 40.6% and 37.3%; Wilcoxon P = 0.0156; Mann-Whitney P = 0.006). The difference in GC content between exon I and exon II is more pronounced for Est-6 than for ψEst-6 (table 2). For Est-6 there is a pronounced difference between GC3 (58.5%) and GCc (48.6%) in exon I (Wilcoxon P = 0.0156; Mann-Whitney P = 0.006); for ψEst-6 the difference is much less pronounced: 47.0% versus 48.1%, although marginally significant (Wilcoxon test P = 0.0469; Mann-Whitney P = 0.0379). For exon II there is no difference between GC3 and GCc for either Est-6 (Wilcoxon P = 0.2187; Mann-Whitney P = 0.1649) or ψEst-6 (Wilcoxon P = 0.8125; Mann-Whitney P = 0.9015). The introns of both genes have significantly lower GC content than the two exons (Wilcoxon P = 0.0156; Mann-Whitney P = 0.006). Intron GC content is significantly higher in ψEst-6 than in Est-6 (Wilcoxon P = 0.0312; Mann-Whitney P = 0.0070).

    The dispersion of the GC values between regions (exon I, intron, and exon II) is lower for ψEst-6 than for Est-6. Est-6 GC content ranges from 22.9% in the intron to 58.5% at GC3 of exon I. GC content in ψEst-6 ranges from 30.4% in the intron to 48.1% in exon I. GC content varies narrowly in the ψEst-6 exons (40.2%–48.1%) but more broadly in Est-6 (36.1%–58.5%) (table 2).

    Discussion

    Two different approaches, entropy and GC content analyses, reveal significantly different patterns of evolution in two genes, Est-6 and ψEst-6, in seven species of the D. melanogaster subgroup. Higher values of entropy for ψEst-6 than for Est-6 that were first encountered in D. melanogaster (Balakirev et al. 2003) are also observed in six other species. The significantly lower structural ordering (regularity) of ψEst-6 in comparison with Est-6 is compatible with the suggestion that ψEst-6 might be a pseudogene or nonfunctional gene. An interesting feature of the entropy in Est-6 and ψEst-6 is that it is nucleotide dependent. The entropy values for nucleotides A and T are similar for Est-6 and ψEst-6, but for G and C the entropy values are lower for Est-6, which indicates that entropy increase is not a purely random process. In addition, there are significant correlations between intraspecific variability and interspecific divergence in entropy data for both genes. These observations are consistent with other ψEst-6 characteristics that combine features of functional and nonfunctional genes (Balakirev and Ayala 2003c, 2003d; Balakirev et al. 2003).

    The nucleotide-dependent entropy characteristics of Est-6 and ψEst-6 are related to their base composition. In all seven species the main difference is in the GC content of third codon positions, which is significantly higher in Est-6 than in ψEst-6. There are few differences between Est-6 and ψEst-6 in overall AT content and in GC content at the first and second codon positions. Interestingly, for Est-6 there are noticeable differences in average values of the GC content between the third position (GC3) and the overall coding (GCc) positions, while these differences do not exist for ψEst-6 (table 2). Thus, there is a relationship between the nucleotide-dependent entropy and the differences in GC content.

    The evolution of nucleotide composition has been extensively studied in different Drosophila lineages (for reviews see Moriyama and Hartl 1993; Akashi, Kliman, and Eyre-Walker 1998; Kliman and Eyre-Walker 1998; Rodríguez-Trelles, Tarrío, and Ayala 2000; Begun and Whitley 2002; Marais, Mouchiroud, and Duret 2003). Selection at the translational level, mutation bias, and recombination have been proposed as major factors determining base composition in Drosophila. All three factors seem to be involved in the evolution of the GC content of Est-6 and ψEst-6. Nucleotide variability and intragenic conversion events are significantly higher in ψEst-6 than in Est-6 (Balakirev and Ayala 2003a, 2004; Balakirev et al. 2003). GC content is significantly higher in exon I than in exon II for both genes, and this difference is more pronounced for Est-6 than for ψEst-6. Codon bias is uniformly higher for Est-6 than for ψEst-6, and codon bias is noticeably different between exon I and exon II (data not shown). Taking into account the fact that Est-6 and ψEst-6 are very closely linked genes (the intergenic sequence is only 193 bp long), these observations may indicate that translational selection is more pronounced in Est-6 than in ψEst-6; relaxed translational selection in ψEst-6 would lead to increased AT content.

    A similar trend in GC content has been observed in comparative investigations of pseudogenes and their functional homologs of Drosophila (Shields et al. 1988; Starmer and Sullivan 1989; Moriyama and Gojobori 1992; Currie and Sullivan 1994; Sullivan et al. 1994; Ramos-Onsins and Aguadé 1998). A comprehensive survey of Caenorhabditis elegans, Saccharomyces, D. melanogaster, and human pseudogenes shows that the nucleotide composition of pseudogenes is invariably intermediate between genes and intergenic regions (Echols et al. 2002). Drosophila pseudogenes have nearly the same composition as intergenic DNA. In D. melanogaster, GC content is uniformly higher at silent sites in coding regions than in the putatively neutrally evolving introns (Kliman and Hey 1994).

    In the amylase (Amy) gene family of Drosophila, decrease in GC3 content is coupled with gene and regulatory-element loss and decrease of selection intensity after duplication, suggesting that one of the two Drosophila types of Amy genes is undergoing functional decay (Zhang et al. 2002, 2003a, 2003b; Zhang and Kishino 2004). These authors suggest that one of the duplicated genes with low GC3 content (Amy3-type) is losing the ancient function. An unresolved question, however, is why the Amy3-type genes have not lost their function completely (Zhang et al. 2002). Inomata and Yamazaki (2000) have shown that the activity of amylase encoded by Amy1-type genes changes more in response to food quality and developmental stage than that encoded by Amy3-type genes, suggesting stronger regulation of the Amy1-type genes. The two genes have different expression and evolutionary patterns and may represent an intergene complex providing greater ability to respond to severe environments (see Balakirev and Ayala 2003c, 2003d, 2004).

    There is evidence of a general positive correlation between GC3 content and functionality (Epstein, Lin, and Tan 2000; Lin et al. 2003). GC-rich genes tend to be of a greater transcriptional and mitogenic significance than AT-rich genes (Epstein, Lin, and Tan 2000). Moreover, third-base GC retention also identifies critical amino acids within individual proteins, as indicated by nonrandom patterns of codon variation between gene homologs (Epstein, Lin, and Tan 2000; Lin et al. 2003). Sequence analysis of human receptor tyrosine kinase genes confirms that functionally important transmembrane hydrophobic amino acids are specified by codons containing GC third bases significantly more often than transmembrane neutral amino acids. Amino acids encoded by GC third bases thus appear more tightly linked to cell function and survival than those encoded by AT third bases. The same pattern appears in tumor-associated genes undergoing either loss-of-function mutation or rearrangements. As in gene-pseudogene comparisons, genes undergoing loss-of-function mutation tend to be GC poor, whereas those involved in rearrangements tend to be GC rich. Moreover, actively transcribed genes use more often C and G at synonymous sites than low-expressed genes (Shields et al. 1988; Duret and Mouchiroud 1999).

    The overall data on pseudogenes and nonfunctional sequences support the hypothesis that sites under low functional constraints tend to increase AT content (see however Duret and Hurst 2001). This AT bias has been observed for eukaryotic pseudogenes (Gojobori, Li, and Graur 1982; Li, Wu, and Luo 1984) and thoroughly investigated at the genomic level (Alvarez-Valin, Lamolle, and Bernardi 2002; Echols et al. 2002; Zhang and Gerstein 2003). There is evidence implying that in GC3-rich genes the majority of new mutations that are GC→AT are eliminated by negative selection (Alvarez-Valin, Lamolle, and Bernardi 2002). SNP analysis of mammalian genomes evinces that GC-rich genes undergo an excess of GC→AT mutations over AT→GC mutations (Eyre-Walker 1999; Smith and Eyre-Walker 2001), but AT→GC mutations have higher probability of fixation (Duret et al. 2002; Webster and Smith 2004). This fixation bias exhibits little variation across genome regions with different GC content (Lercher and Hurst 2002; Webster and Smith 2004) and could be explained by selection (Alvarez-Valin, Lamolle, and Bernardi 2002), biased gene conversion (Eyre-Walker 1993; Galtier et al. 2001; Birdsell 2002; Lercher and Hurst 2002; Marais 2003), or both (Smith and Eyre-Walker 2001; Lercher et al. 2002). Consequently, functional genes tend to be GC rich because negative selection eliminates the excess of GC→AT mutations, whereas in pseudogene or nonfunctional sequences this elimination is not as efficient, leading to increased AT content. The contrasting characteristics of GC3 content and entropy values between Est-6 and ψEst-6 suggest that the major determinant of these differences may be a balance between the fixation-mutation bias (Alvarez-Valin, Lamolle, and Bernardi 2002; Webster and Smith 2004) and selection favoring increase in the GC3 content of Est-6 but not in ψEst-6. The observed differences in entropy and GC content may reflect an evolutionary shift associated with ψEst-6 pseudogenization and consequent functional divergence of Est-6 and ψEst-6.
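
    The balance between mutation bias and fixation bias invoked in this paragraph can be illustrated with a standard two-state (GC versus AT) mutation-selection-drift model. This is a textbook sketch under stated assumptions, not an analysis performed in this study: the function name equilibrium_gc and both parameters are hypothetical, sites are treated as independent, and the fixation advantage of GC over AT (whether from selection or biased gene conversion) is summarized by a single population-scaled coefficient S. Under these assumptions the expected equilibrium GC fraction is 1 / (1 + (u/v) exp(-S)), where u/v is the ratio of GC→AT to AT→GC mutation rates.

```python
import math


def equilibrium_gc(mutation_bias: float, scaled_advantage: float = 0.0) -> float:
    """Equilibrium GC fraction in a two-state mutation-fixation model.

    mutation_bias    -- u/v, ratio of GC->AT to AT->GC mutation rates
                        (hypothetical illustrative parameter)
    scaled_advantage -- S, population-scaled fixation advantage of GC over
                        AT; S = 0 corresponds to no fixation bias, as might
                        be expected for an unconstrained sequence
    """
    if mutation_bias <= 0:
        raise ValueError("mutation_bias must be positive")
    # The ratio of fixation probabilities for AT->GC versus GC->AT changes
    # is exp(S), so GC->AT substitutions are weighted by exp(-S).
    return 1.0 / (1.0 + mutation_bias * math.exp(-scaled_advantage))
```

    With a twofold excess of GC→AT mutations and no fixation bias, equilibrium_gc(2.0) gives a GC fraction of 1/3; a fixation advantage of S = ln 2 exactly offsets that mutation bias and restores a fraction of 1/2. The qualitative point matches the argument above: where the fixation bias toward GC is relaxed, the same mutation pressure yields a more AT-rich sequence.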

    Supplementary Material

    The main definitions are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

    Acknowledgements

    We are grateful to W. M. Fitch, B. Gaut, R. R. Hudson, and A. Long for detailed and valuable comments. We thank Elena Balakireva and Iria Blanco Barca for encouragement and help. This work has been supported by National Institutes of Health grant GM42397 to F.J.A.

    References

    Akashi, H., R. M. Kliman, and A. Eyre-Walker. 1998. Mutation pressure, natural selection, and the evolution of base composition in Drosophila. Genetica 102/103:49–60.

    Alvarez-Valin, F., G. Lamolle, and G. Bernardi. 2002. Isochores, GC3 and mutation biases in the human genome. Gene 300:161–168.

    Avedisov, S. N., I. B. Rogozin, E. V. Koonin, and B. J. Thomas. 2001. Rapid evolution of a cyclin A inhibitor gene, roughex, in Drosophila. Mol. Biol. Evol. 18:2110–2118.

    Ayala, F. J., E. S. Balakirev, and A. G. Sáez. 2002. Genetic polymorphism at two linked loci, Sod and Est-6, in Drosophila melanogaster. Gene 300:19–29.

    Balakirev, E. S., and F. J. Ayala. 1996. Is esterase-P encoded by a cryptic pseudogene in Drosophila melanogaster? Genetics 144:1511–1518.

    ———. 2003a. Nucleotide variation of the Est-6 gene region in natural populations of Drosophila melanogaster. Genetics 165:1901–1914.

    ———. 2003b. Molecular population genetics of the β-esterase gene cluster of Drosophila melanogaster. J. Genet. 82:115–131.

    ———. 2003c. Pseudogenes: are they "junk" or functional DNA? Annu. Rev. Genet. 37:123–151.

    ———. 2003d. Pseudogenes are not junk DNA. Pp. 177–193 in S. P. Wasser, ed. Evolutionary theory and processes: modern horizons. Kluwer Academic Publishers, The Netherlands.

    ———. 2004. The β-esterase gene cluster of Drosophila melanogaster: is ψEst-6 a pseudogene, a functional gene, or both? Genetica 121:165–179.

    Balakirev, E. S., M. Anisimova, and F. J. Ayala. 2005. Positive and negative selection in the β-esterase gene cluster of the Drosophila melanogaster subgroup. J. Mol. Evol. (in press).

    Balakirev, E. S., E. I. Balakirev, and F. J. Ayala. 2002. Molecular evolution of the Est-6 gene in Drosophila melanogaster: contrasting patterns of DNA variability in adjacent functional regions. Gene 288:167–177.

    Balakirev, E. S., E. I. Balakirev, F. Rodríguez-Trelles, and F. J. Ayala. 1999. Molecular evolution of two linked genes, Est-6 and Sod, in Drosophila melanogaster. Genetics 153:1357–1369.

    Balakirev, E. S., V. R. Chechetkin, V. V. Lobzin, and F. J. Ayala. 2003. DNA polymorphism in the β-esterase gene cluster of Drosophila melanogaster. Genetics 164:533–544.

    Bar-Ziv, R., and A. Libchaber. 2001. Effects of DNA sequence and structure on binding of RecA to single-stranded DNA. Proc. Natl. Acad. Sci. USA 98:9068–9073.

    Begun, D. J., and P. Whitley. 2002. Molecular population genetics of Xdh and the evolution of base composition in Drosophila. Genetics 162:1725–1735.

    Birdsell, J. A. 2002. Integrating genomics, bioinformatics, and classical genetics to study the effects of recombination on genome evolution. Mol. Biol. Evol. 19:1181–1197.

    Brady, J. P., R. C. Richmond, and J. G. Oakeshott. 1990. Cloning of the esterase-5 locus from Drosophila pseudoobscura and comparison with its homologue in D. melanogaster. Mol. Biol. Evol. 7:525–546.

    Cariou, M.-L. 1987. Biochemical phylogeny of the eight species in the Drosophila melanogaster subgroup, including D. sechellia and D. orena. Genet. Res. 50:181–185.

    Chechetkin, V. R., and V. V. Lobzin. 1998. Nucleosome units and hidden periodicities in DNA sequences. J. Biomol. Struct. Dyn. 15:937–947.

    Chechetkin, V. R., and A. Y. Turygin. 1995. Search of hidden periodicities in DNA sequences. J. Theor. Biol. 175:477–494.

    Collet, C., K. M. Nielsen, R. J. Russell, M. Karl, J. G. Oakeshott, and R. C. Richmond. 1990. Molecular analysis of duplicated esterase genes in Drosophila melanogaster. Mol. Biol. Evol. 7:9–28.

    Currie, P. D., and D. T. Sullivan. 1994. Structure, expression and duplication of genes which encode phosphoglyceromutase of Drosophila melanogaster. Genetics 138:353–363.

    Dumancic, M. M., J. G. Oakeshott, R. J. Russell, and M. J. Healy. 1997. Characterization of the EstP protein in Drosophila melanogaster and its conservation in Drosophilids. Biochem. Genet. 35:251–271.

    Duret, L., and L. D. Hurst. 2001. The elevated GC content at exonic third sites is not evidence against neutralist models of isochore evolution. Mol. Biol. Evol. 18:757–762.

    Duret, L., and D. Mouchiroud. 1999. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA 96:4482–4487.

    Duret, L., M. Semon, G. Piganeau, D. Mouchiroud, and N. Galtier. 2002. Vanishing GC-rich isochores in mammalian genomes. Genetics 162:1837–1847.

    East, P. D., A. Graham, and G. Whitington. 1990. Molecular isolation and preliminary characterization of a duplicated esterase locus in Drosophila buzzatii. Pp. 389–406 in J. S. F. Barker, W. T. Starmer, and R. J. MacIntyre, eds. Ecological and evolutionary genetics of Drosophila. Plenum Press, New York.

    Echols, N., P. Harrison, S. Balasubramanian, N. M. Luscombe, P. Bertone, Z. Zhang, and M. Gerstein. 2002. Comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes. Nucleic Acids Res. 30:2515–2523.

    Epstein, R. J., K. Lin, and T. W. Tan. 2000. A functional significance for codon third bases. Gene 245:291–298.

    Eyre-Walker, A. 1993. Recombination and mammalian genome evolution. Proc. R. Soc. Lond. B Biol. Sci. 252:237–243.

    ———. 1999. Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics 152:675–683.

    Filatov, D. A., and D. Charlesworth. 1999. DNA polymorphism, haplotype structure and balancing selection in the Leavenworthia PgiC locus. Genetics 153:1423–1434.

    Galtier, N., G. Piganeau, D. Mouchiroud, and L. Duret. 2001. GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics 159:907–911.

    Gojobori, T., W.-H. Li, and D. Graur. 1982. Patterns of nucleotide substitution in pseudogenes and functional genes. J. Mol. Evol. 18:360–369.

    Gromko, M. H., D. F. Gilbert, and R. C. Richmond. 1984. Sperm transfer and use in the multiple mating system of Drosophila. Pp. 371–426 in R. L. Smith, ed. Sperm competition and the evolution of animal mating systems. Academic Press, New York.

    Harvey, S. C., M. Dlakic, J. Griffith, R. Harrington, K. Park, D. Sprous, and W. Zacharias. 1995. What is the basis of sequence-directed curvature in DNA containing A tracts? J. Biomol. Struct. Dyn. 13:301–307.

    Healy, M. J., M. M. Dumancic, and J. G. Oakeshott. 1991. Biochemical and physiological studies of soluble esterases from Drosophila melanogaster. Biochem. Genet. 29:365–388.

    Inomata, N., and T. Yamazaki. 2000. Evolution of nucleotide substitutions and gene regulation in the amylase multigenes in Drosophila kikkawai and its sibling species. Mol. Biol. Evol. 17:601–615.

    Johnson, N. L., and F. C. Leone. 1977. Statistics and experimental design in engineering and the physical sciences, Vol. 1. John Wiley, New York.

    Kalantzi-Makri, M. C., I. P. Trougakos, T. P. Tafas, J. Sourdis, and L. H. Margaritis. 1999. Phylogenetic and taxonomical relationships of the eight species in the melanogaster subgroup of the genus Drosophila (Sophophora) based on the electrophoretic mobility of the major chorion proteins and the eggshell ultrastructure. J. Zool. 249:295–306.

    Kelly, J. K. 1997. A test of neutrality based on interlocus associations. Genetics 146:1197–1206.

    King, L. M. 1998. The role of gene conversion in determining sequence variation and divergence in the Est-5 gene family in Drosophila pseudoobscura. Genetics 148:305–315.

    Kliman, R. M., and A. Eyre-Walker. 1998. Patterns of base composition within the genes of Drosophila melanogaster. J. Mol. Evol. 46:534–541.

    Kliman, R. M., and J. Hey. 1994. The effects of mutation and natural selection on codon bias in the genes of Drosophila. Genetics 137:1049–1056.

    Ko, W.-Y., R. M. David, and H. Akashi. 2003. Molecular phylogeny of the Drosophila melanogaster species subgroup. J. Mol. Evol. 57:562–573.

    Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244–1245.

    Lachaise, D., M.-L. Cariou, J. R. David, F. Lemeunier, L. Tsacas, and M. Ashburner. 1988. Biogeography of the Drosophila melanogaster species subgroup. Evol. Biol. 22:159–225.

    Lachaise, D., M. Harry, M. Solignac, F. Lemeunier, V. Benassi, and M.-L. Cariou. 2000. Evolutionary novelties in islands: Drosophila santomea, a new melanogaster sister species from Sao Tome. Proc. Biol. Sci. 267:1487–1495.

    Lemeunier, F., J. R. David, L. Tsacas, and M. Ashburner. 1986. The melanogaster species group. Pp. 147–256 in M. Ashburner, H. L. Carson, and J. N. Thompson Jr, eds. The genetics and biology of Drosophila, Vol. 3e. Academic Press, London.

    Lercher, M. J., and L. D. Hurst. 2002. Can mutation or fixation biases explain the allele frequency distribution of human single nucleotide polymorphisms (SNPs)? Gene 300:53–58.

    Lercher, M. J., N. G. C. Smith, A. Eyre-Walker, and L. D. Hurst. 2002. The evolution of isochores: evidence from SNP frequency distributions. Genetics 162:1805–1810.

    Lewin, B. 2000. Genes VII. Oxford University Press, Oxford.

    Li, W.-H., C.-I. Wu, and C.-C. Luo. 1984. Nonrandomness of point mutation as reflected in nucleotide substitutions in pseudogenes and its evolutionary implications. J. Mol. Evol. 21:58–71.

    Lin, K., S. B. Tan, P. R. Kolatkar, and R. J. Epstein. 2003. Nonrandom intragenic variations in patterns of codon bias implicate a sequential interplay between transitional genetic drift and functional amino acid selection. J. Mol. Evol. 57:538–545.

    Lobzin, V. V., and V. R. Chechetkin. 2000. Order and correlations in genomic DNA sequences. The spectral approach. Uspekhi Fizicheskikh Nauk 170:57–81.

    Marais, G. 2003. Biased gene conversion: implications for genome and sex evolution. Trends Genet. 19:330–338.

    Marais, G., D. L. Mouchiroud, and L. Duret. 2003. Neutral effect of recombination on base composition in Drosophila. Genet. Res. 81:79–87.

    Matsuo, Y. 2003. Evolution of the GC content of the histone 3 gene in seven Drosophila species. Genes Genet. Syst. 78:309–318.

    Moriyama, E. N., and T. Gojobori. 1992. Rates of synonymous substitutions and base composition of nuclear genes in Drosophila. Genetics 130:855–864.

    Moriyama, E. N., and D. L. Hartl. 1993. Codon usage bias and base composition of nuclear genes in Drosophila. Genetics 134:847–858.

    Oakeshott, J. G., C. Collet, R. Phillis, K. M. Nielsen, R. J. Russell, G. K. Chambers, V. Ross, and R. C. Richmond. 1987. Molecular cloning and characterization of esterase 6, a serine hydrolase from Drosophila. Proc. Natl. Acad. Sci. USA 84:3359–3363.

    Oakeshott, J. G., T. M. Boyce, R. J. Russell, and M. J. Healy. 1995. Molecular insights into the evolution of an enzyme; esterase 6 in Drosophila. Trends Ecol. Evol. 10:103–110.

    Oakeshott, J. G., E. A. van Papenrecht, T. M. Boyce, M. J. Healy, and R. J. Russell. 1993. Evolutionary genetics of Drosophila esterases. Genetica 90:239–268.

    Parsch, J., C. D. Meiklejohn, E. Hauschteck-Jungen, P. Hunziker, and D. L. Hartl. 2001. Molecular evolution of the ocnus and janus genes in the Drosophila melanogaster species subgroup. Mol. Biol. Evol. 18:801–811.

    Ramos-Onsins, S., and M. Aguadé. 1998. Molecular evolution of the Cecropin multigene family in Drosophila: functional genes vs. pseudogenes. Genetics 150:157–171.

    Richmond, R. C., D. G. Gilbert, K. B. Sheehan, M. H. Gromko, and F. M. Butterworth. 1980. Esterase 6 and reproduction in Drosophila melanogaster. Science 207:1483–1485.

    Richmond, R. C., K. M. Nielsen, J. P. Brady, and E. M. Snella. 1990. Physiology, biochemistry and molecular biology of the Est-6 locus in Drosophila melanogaster. Pp. 273–292 in J. S. F. Barker, W. T. Starmer, and R. J. MacIntyre, eds. Ecological and evolutionary genetics of Drosophila. Plenum Press, New York.

    Rodríguez-Trelles, F., R. Tarrío, and F. J. Ayala. 2000. Fluctuating mutation bias and the evolution of base composition in Drosophila. J. Mol. Evol. 50:1–10.

    Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174–175.

    Shields, D. C., P. M. Sharp, D. G. Higgins, and F. Wright. 1988. "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704–716.

    Smith, N. G. C., and A. Eyre-Walker. 2001. Synonymous codon bias is not caused by mutation bias in G+C-rich genes in humans. Mol. Biol. Evol. 18:982–986.

    Starmer, W. T., and D. T. Sullivan. 1989. A shift in the third-codon-position nucleotide frequency in alcohol dehydrogenase genes in the genus Drosophila. Mol. Biol. Evol. 6:546–552.

    Sullivan, D. T., W. T. Starmer, S. W. Curtiss, M. Menotti-Raymond, and J. Yum. 1994. Unusual molecular evolution of an Adh pseudogene in Drosophila. Mol. Biol. Evol. 11:443–458.

    Thåström, A., P. T. Lowary, H. R. Widlund, H. Cao, M. Kubista, and J. Widom. 1999. Sequence motifs and free energies of selected natural and non-natural nucleosome positioning DNA sequences. J. Mol. Biol. 288:213–229.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

    Tiwari, S., S. Ramachandran, A. Bhattacharya, S. Bhattacharya, and R. Ramaswamy. 1997. Prediction of probable genes by Fourier analysis of genomic sequences. Comput. Appl. Biosci. 13:263–270.

    Wall, J. D. 1999. Recombination and the power of statistical tests of neutrality. Genet. Res. 74:65–79.

    Webster, M. T., and N. G. C. Smith. 2004. Fixation biases affecting human SNPs. Trends Genet. 20:122–126.

    Yang, Z., and J. Bielawski. 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15:496–503.

    Yenikolopov, G. N., O. A. Malevantschuk, N. I. Peunova, P. V. Sergeev, and G. P. Georgiev. 1989. Est locus of Drosophila virilis contains two related genes. Dokl. Acad. Nauk SSSR 306:1247–1249.

    Zhang, Z., and M. Gerstein. 2003. Patterns of nucleotide substitutions, insertion and deletion in the human genome inferred from pseudogenes. Nucleic Acids Res. 31:5338–5348.

    Zhang, Z., N. Inomata, M.-L. Cariou, J.-L. Da Lage, and T. Yamazaki. 2003a. Phylogeny and the evolution of the amylase multigenes in the Drosophila montium species subgroup. J. Mol. Evol. 56:121–130.

    Zhang, Z., N. Inomata, T. Ohba, M.-L. Cariou, and T. Yamazaki. 2002. Codon bias differences between the duplicated amylase loci following gene duplication in Drosophila. Genetics 161:1187–1196.

    Zhang, Z., N. Inomata, T. Yamazaki, and H. Kishino. 2003b. Evolutionary history and mode of the amylase multigene family in Drosophila. J. Mol. Evol. 57:702–709.

    Zhang, Z., and H. Kishino. 2004. Genomic background drives the divergence of duplicated amylase genes at synonymous sites in Drosophila. Mol. Biol. Evol. 21:222–227.

    (Evgeniy S. Balakirev)