Context-Dependent Codon Bias and Messenger RNA Longevity in the Yeast Transcriptome
     Department of Biology, American University

    Correspondence: E-mail: carlini@american.edu.

    Abstract

    Context-dependent codon bias and its relationship with messenger RNA (mRNA) longevity were examined in 4,648 mRNA transcripts of the Saccharomyces cerevisiae transcriptome for which mRNA half-lives have been empirically determined. Surprisingly, rare codon usage (codons used <13 times per 1,000 codons in the genome) increased with mRNA half-life. However, it is shown that this pattern was not due to preference for rare codon use within codon families containing both rare and nonrare codons. Rather, the pattern was due to an increase in the frequency of amino acids encoded solely by rare codons, and a decrease in the frequency of amino acids never encoded by rare codons, with mRNA half-life. When standardized by open reading frame length, the use of consecutive rare codons was also positively correlated with mRNA half-life. There was a negative correlation between the usage of synonymous A|T dinucleotides spanning codon boundaries and mRNA half-life, despite the fact that the frequency of AT dinucleotide usage overall, and AT dinucleotide usage at other codon position contexts (e.g., 1–2, 2–3, or 3|1 total), was not correlated with mRNA half-life. The use of A|T dinucleotides at synonymous dicodon boundaries could potentially allow for more efficient 3'–5' degradation by endonucleolytic cleavage.

    Key Words: context-dependent codon bias; mRNA half-life; mRNA stability; gene expression; yeast

    Introduction

    Gene expression may be controlled at many stages during the process through which genomic information is converted into the functional components of living cells. These stages include transcription, posttranscriptional processing, translation, and messenger RNA (mRNA) degradation. Expression is controlled at each of these stages, and control of expression is also integrated across these stages. The present study is concerned with the control of expression through mRNA degradation. While specific motifs in the flanking regions of mRNAs clearly influence rates of mRNA decay, a definitive link between mRNA decay and general features of mRNAs, such as dinucleotide composition, codon bias, and context-dependent codon bias (CDCB), has not yet been established in genome-wide surveys. The central purpose of this study is to identify general features of mRNA sequences that may contribute to variation in rates of mRNA decay, and which may consequently relate to variation in levels of gene expression.

    Because expression is controlled at many levels, it is not surprising that genome-wide analysis of mRNA and protein levels has revealed that there is a substantial amount of "noise" (stochasticity) in gene expression (Elowitz et al. 2002; Blake et al. 2003; Fraser et al. 2004). Noise in gene expression refers to the random variation in transcription and translation, which results in different rates of synthesis of a specific protein in genetically identical cells maintained in identical environments. Despite this noise, the regulatory mechanisms that are responsible for modulating the expression of individual gene products in prokaryotes and eukaryotes have been studied extensively. In particular, the control of expression through differential transcription is well characterized. Cis-regulatory elements such as promoters and enhancers can affect the initiation and/or rate of transcription. In eukaryotes many of these regulatory elements are localized in the untranslated flanking regions of mRNAs but some are nested within the coding portions of exons. The rate of transcription is also controlled by trans-acting factors, which serve to activate or repress transcription through interactions with cis-regulatory sequences. In eukaryotes, gene expression is controlled primarily at the level of transcription, with the major control point at transcriptional initiation (Struhl 1999).

    Translational control of gene expression has also received much attention in two general areas: factors that control ribosome assembly and initiation (Zong et al. 1999; Kuhn et al. 2001; Arava et al. 2003) and the effects of codon bias on translational accuracy and efficiency (Bulmer 1991; Akashi 2001). The association of ribosomes and specific mRNAs may be influenced by factors such as mitogenic activation (Morris 1995; Sonenberg 1996), viral infection (Johannes et al. 1999), antibody presence (Mikulits et al. 2000), nutrient conditions (Kuhn et al. 2001), or long-range pairing between the 5' and 3' ends of mRNA (Parsch, Tanda, and Stephan 1997; Baines, Parsch, and Stephan 2004). General features of mRNAs are also known to affect translation. For example, it is known that highly expressed genes have significantly greater levels of codon bias than weakly expressed genes. Codon bias allows mRNA transcripts to be translated more quickly and accurately through efficient use of the transfer RNA pool (Ikemura 1981, 1982; Grosjean and Fiers 1982; Moriyama and Powell 1997; Akashi 2003). The efficiency of translation is determined by the rate of formation of the translation initiation complex. Because specific sequences upstream and downstream of the initiation codon are known to act as translational enhancers, these sequences also exert control on the overall level of gene expression (O'Connor et al. 1999; Stenström et al. 2001; Stenström and Isaksson 2002).

    Levels of gene expression may also be mediated by differential rates of mRNA decay. Because the steady-state levels of mRNAs are established by the joint effects of their rates of synthesis and decay, elevated rates of decay would lead to a reduction in expression if synthesis were held constant. Several factors contribute to mRNA longevity, including specific determinants, such as stability-instability elements in 5'-untranslated regions (UTRs), coding regions, and 3'-UTRs. Nonspecific determinants, general features of mRNAs, such as the 5' cap, poly(A) tail, mRNA length, and codon bias, also contribute to mRNA longevity (Sachs 1993; Caponigro and Parker 1996; Tucker and Parker 2000). In this paper, the terms "mRNA longevity" and "mRNA half-life" are used interchangeably. To avoid confusion, the term "mRNA stability" is used only in the context of mRNA secondary structure and the predicted folding free energy (ΔG) of an mRNA based on in silico folding applications such as mfold (Zuker 2003).

    Previous studies have demonstrated that nonspecific features of mRNAs do not affect mRNA longevity in a consistent fashion across many mRNAs, although such features may be important in specific transcripts (Caponigro and Parker 1996; Wang et al. 2002). For example, there is no transcriptome-wide relationship between codon bias and mRNA half-life (Wang et al. 2002), although several studies have demonstrated that rare codons, those used <13 times in 1,000 codons in the genome, negatively affect longevity of particular transcripts (Hoekema et al. 1987; Herrick, Parker, and Jacobson 1990; Caponigro, Muhlrad, and Parker 1993; Caponigro and Parker 1996). The clustering of several rare codons within a narrow region can lead to destabilization (Caponigro, Muhlrad, and Parker 1993). It was thought that ribosome packing could protect transcripts from degradation by preventing access to the mRNA by endoribonucleases, such that transcripts with rare codons would be more susceptible to degradation, but it was recently determined that ribosome packing densities do not differ between stable and unstable mRNAs (Arava et al. 2003). However, these conflicting observations do not exclude the possibility that codon choice in specific contexts may influence the rate of mRNA decay, even though it might only explain a small portion of the variance in mRNA half-life.

    The presence of rare codons at specific positions in a transcript is but one form of CDCB that may contribute to mRNA longevity. CDCB is the influence of surrounding nucleotides and/or codons on the choice of synonymous codons (Yarus and Folley 1984; Shpaer 1986; Fedorov, Saxonov, and Gilbert 2002). Very little is known about the relationship between other forms of CDCB and mRNA longevity, although a few studies suggest that dinucleotide composition across the reading frame may be important (Nussinov 1981; Antezana and Kreitman 1999; Duan and Antezana 2003; Katz and Burge 2003). A recent experimental study by Duan and Antezana (2003) demonstrated that a human DRD2 mRNA containing synonymous mutations causing enrichment in T|A dinucleotides (where "|" designates the codon boundary) at dicodon boundaries had a significantly shorter half-life than the wild-type DRD2 mRNA. Duan and Antezana (2003) also constructed a human DRD2 mRNA enriched in C|G dinucleotides and found it to have a significantly longer half-life than the wild-type DRD2 mRNA. Based on these results one might hypothesize that, on average, stable mRNAs should contain fewer synonymous T|A dinucleotides and more synonymous C|G dinucleotides than unstable mRNAs. This hypothesis can be tested by relating the frequencies of synonymous T|A and C|G dinucleotide use to mRNA half-life data.

    The decay rates of 4,677 mRNA transcripts in the yeast transcriptome have been measured through microarray analysis coupled with a global transcriptional shut-off assay (Wang et al. 2002). Three independent time courses of mRNA decay were each measured at nine time points after transcriptional shut-off. For 80% of the genes, the half-lives varied by <15% among the three independent replicates. The half-lives also agreed with those that had been previously determined by Northern analysis. For these reasons, the Wang et al. (2002) data are assumed to be highly reliable measures of mRNA half-life. The major conclusion of the Wang et al. (2002) study was that the decay rates of mRNAs encoding functionally related proteins, that is, proteins that work together in stoichiometric complexes, were similar. No relationship was observed between mRNA half-lives and codon bias, mRNA length, or mRNA abundance. However, a closer inspection of codon bias might reveal other properties of mRNAs that are associated with mRNA longevity. For example, although no overall relationship between codon bias and mRNA half-life was observed by Wang et al. (2002), it is still possible that the use of specific codons might be positively or negatively correlated with mRNA half-life. Because measures of overall codon bias such as the effective number of codons (ENC) (Wright 1990) distill the codon bias of all 18 degenerate codon families into a single metric, signatures of codon bias in individual codon families can go undetected. Furthermore, context-dependent effects could also influence mRNA longevity by either affecting the dinucleotide composition at codon boundaries (Duan and Antezana 2003) or through the effects of translational pausing due to the specific placement of rare codons.

    This study examines the relationship between CDCB and mRNA longevity in the yeast transcriptome (1) by relating patterns of rare codon usage to mRNA half-life and (2) by relating the frequency of synonymous dinucleotide usage at codon boundaries to mRNA half-life. The results from this study show that rare codons are used more frequently in transcripts with long half-lives. However, this pattern is not due to preference for rare codon use within codon families containing both rare and nonrare codons. Rather, the pattern is due to an increase in the frequency of amino acids encoded solely by rare codons and a decrease in the frequency of amino acids never encoded by rare codons, with mRNA half-life. Several interesting patterns of synonymous dinucleotide usage are revealed including a decrease in synonymous A|T dinucleotide usage with mRNA half-life.

    Methods

    In December 2002, 6,296 yeast open reading frames (ORFs) were parsed from the Saccharomyces cerevisiae genome downloaded from the Stanford Genome Database (ftp://genome-ftp.stanford.edu/pub/yeast/). mRNA decay data for 4,677 unique transcripts were also downloaded from the Stanford Genome Database (http://www-genome.stanford.edu/turnover.shtml). Genes encoded on the mitochondrial DNA were not examined in this study. Because the ORF identifiers were not identical in the two data sets, only those ORFs with identical identifiers were analyzed in reference to mRNA decay data, representing a total of 4,648 ORFs. The difference of 1,648 ORFs reflects those ORFs for which mRNA decay data are unavailable for one of two reasons: (1) putative ORFs that are not transcribed or (2) transcribed ORFs for which decay data were not obtained due to poor signal.

    The folding free energies of the 100 shortest half-life transcripts (t = 3–6 min) and 100 longest half-life transcripts (t = 86–469 min) were calculated on the mfold Web server using the default parameters (Zuker 2003). The folding free energy of any mRNA is strongly dependent on its length because longer sequences have more bases to pair. Length was taken into account by standardizing the free energies, that is, by dividing the free energy of each transcript by its length (bp). Previous work has demonstrated that linear scaling is the most appropriate normalization in the context of stability because the minimum free energy increases linearly with sequence length (Pervouchine, Graber, and Kasif 2003).

    For each ORF, the frequencies of rare codons, preferred codons, dicodons, and dinucleotide composition were obtained using Perl scripts, all of which are available upon request from D.C. Preferred codons, those that increase in frequency in highly expressed genes, were previously identified by Akashi (2003). The usage of each of the 3,904 theoretically possible dicodons (= 61² sense:sense codons + 61 × 3 sense:stop codons) was summed up over all ORFs. No instances of stop codons followed by sense codons were observed.
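
    The dicodon total quoted above can be checked by enumeration. The following is a minimal Python sketch, not the author's Perl scripts; it assumes the standard genetic code (stop codons TAA, TAG, and TGA), and the names SENSE, N_POSSIBLE_DICODONS, and dicodon_counts are illustrative only.

```python
from collections import Counter
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)
CODONS = ["".join(p) for p in product("TCAG", repeat=3)]
SENSE = [c for c in CODONS if c not in STOPS]

# sense:sense pairs plus sense:stop pairs; stop:sense pairs are excluded
N_POSSIBLE_DICODONS = len(SENSE) ** 2 + len(SENSE) * len(STOPS)


def dicodon_counts(orf):
    """Tally adjacent in-frame codon pairs in one ORF (hypothetical helper)."""
    usable = len(orf) - len(orf) % 3  # ignore any trailing partial codon
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    return Counter(zip(codons, codons[1:]))
```

    Summing the counters returned by dicodon_counts over all ORFs would give the genome-wide dicodon usage described in the text.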

    To facilitate comparisons with previous studies, rare codons were defined according to Caponigro, Muhlrad, and Parker (1993) as those occurring at a frequency of <13 per 1,000 codons. In yeast, there are 26 codons that meet this criterion (Supplementary Material online, table 1). Rare codon use is influenced by amino acid composition of the encoded protein because the number of rare codons varies among the different synonymous codon families. At one extreme are the amino acids Cys and Trp, which are always encoded by rare codons. At the other extreme are amino acids such as Lys or Asn, where none of the synonymous codons are rare. A protein enriched in amino acids with few or no rare codons will therefore contain few rare codons as a consequence of its amino acid composition. The proportion of synonymous codons that are rare codons can be grouped into six categories. These categories, ranked in descending potential for rare codon use, are as follows: all rare (Cys, Trp), 5/6 rare (Arg), 3/4 rare (Gly, Val), 1/2 rare (Ala, Gln, His, Leu, Pro, Thr), 1/3 rare (Ser), and 0 rare (Asn, Asp, Glu, Ile, Lys, Met, Phe, Tyr).

    To control for amino acid composition, the frequency of rare codon use in each amino acid was adjusted according to the expected frequency of rare codon use if codon usage were entirely random. For example, the amino acid Gln is encoded by two codons, CAA and CAG. Only CAG is a rare codon, so if codon usage were random one would expect 50% of Gln codons to be rare. If, for a particular gene, the observed frequency of Gln codons were 75% and 25% for the CAA and CAG codons, respectively, then the relative rare codon usage for Gln (RRCUGln) would be 0.25/0.50 = 0.50. RRCU values are analogous to the RSCU statistic introduced by Sharp, Tuohy, and Mosurski (1986), where RRCU values >1 indicate a higher than expected frequency of rare codon use, whereas RRCU values <1 indicate a lower than expected frequency of rare codon use. The RRCU statistic standardizes rare codon usage across the different categories so that they can be combined to obtain an overall index of rare codon usage for the entire gene. To calculate total RRCU use for a particular gene, the RRCU for each codon family with rare codons was calculated, weighted by the proportion of the gene encoded by that codon family:

    \[ \mathrm{RRCU}_{\mathrm{Total}} = \sum_{i} \mathrm{RRCU}_{i} \, \frac{X_{i}}{n} \]

    where RRCUi is the RRCU value for the ith codon family, Xi is the total number of codons of codon family i in the gene, and n is the difference between the total number of codons in the gene and the number of codons in the eight codon families without rare codons. This adjustment to the denominator was made because RRCU can only be calculated for codon families containing rare codons and is undefined for the eight amino acids never encoded by rare codons. However, because these codons contribute to the total number of codons in the gene, variation in the use of these eight amino acids would contribute to variation in RRCUTotal in the absence of the adjustment. For the two amino acids always encoded by rare codons, RRCU is always 1. To determine if the use of amino acids without rare codons varies with mRNA half-life, the proportion of codons in the eight codon families containing no rare codons was analyzed separately.
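
    The RRCU calculation can be illustrated with a short Python sketch. It is not the author's implementation. It assumes that RRCUTotal is the sum over codon families of RRCUi weighted by Xi/n, as the definitions above imply, and it uses a toy subset of codon families: the rare flags for CAG, Cys, and Trp follow the text, whereas the full list of rare codons is in the supplementary table and is not reproduced here. The names FAMILIES, rrcu, and rrcu_total are hypothetical.

```python
from collections import Counter

# Toy subset of codon families (assumption): True marks a rare codon.
FAMILIES = {
    "Gln": {"CAA": False, "CAG": True},
    "Cys": {"TGT": True, "TGC": True},
    "Trp": {"TGG": True},
    "Lys": {"AAA": False, "AAG": False},  # family without rare codons
}


def rrcu(counts, family):
    """Observed rare-codon fraction divided by the fraction expected at random."""
    flags = FAMILIES[family]
    total = sum(counts.get(c, 0) for c in flags)
    if total == 0 or not any(flags.values()):
        return None  # undefined: family absent or has no rare codons
    observed = sum(counts.get(c, 0) for c, rare in flags.items() if rare) / total
    expected = sum(flags.values()) / len(flags)
    return observed / expected


def rrcu_total(counts):
    """Weighted sum of RRCU_i * X_i / n over families that have rare codons."""
    weighted, n = 0.0, 0
    for family, flags in FAMILIES.items():
        value = rrcu(counts, family)
        if value is None:
            continue
        x_i = sum(counts.get(c, 0) for c in flags)
        weighted += value * x_i
        n += x_i
    return weighted / n if n else None


gene = Counter({"CAA": 3, "CAG": 1, "TGT": 2, "AAA": 4})
```

    For the hypothetical gene above, the Gln family has 25% rare codons against an expectation of 50%, and the four Lys codons are excluded from n, as described in the text.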

    Dinucleotide composition at synonymous dicodons was calculated as follows. A pair of adjacent amino acids may be encoded by a number of synonymous dicodons, depending on the degeneracy of the codon families. For example, a lysine followed by an aspartate residue can be encoded by four synonymous dicodons because lysine and aspartate are each encoded by a twofold degenerate codon family. The following four dicodons are therefore synonymous: (1) AAA GAC, (2) AAA GAT, (3) AAG GAC, and (4) AAG GAT. In this case, dicodons 1 and 2 contain A|G dinucleotides at the codon boundary, whereas dicodons 3 and 4 contain G|G dinucleotides at the codon boundary. The frequency of Lys Asp dicodons encoded by A|G dinucleotides at the codon boundary can be calculated as the sum of dicodons 1 and 2 divided by the sum of dicodons 1–4 (i.e., the total number of Lys Asp dicodons). This calculation can be extended to include all synonymous dicodons with A|G codon boundaries. There are 12 amino acids that can be encoded by A3 codons: Lys, Arg, Leu, Ser, Thr, Ile, Glu, Gly, Ala, Val, Gln, and Pro. There are five amino acids that are encoded by G1 codons: Asp, Glu, Gly, Ala, and Val. For each of these 60 dicodons, the frequency of synonymous A|G usage can be calculated as described for Lys Asp above.
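
    The Lys Asp example can be reproduced with a minimal Python sketch, given here as an illustration rather than as the scripts used in the study. It assumes the standard genetic code; the helper names codons_for, synonymous_dicodons, and observed_boundary_fraction are hypothetical, and amino acids are written in one-letter code (K = Lys, D = Asp).

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codons_for(aa):
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, residue in CODE.items() if residue == aa]


def synonymous_dicodons(aa1, aa2):
    """All codon pairs that encode the amino acid pair aa1-aa2."""
    return [(c1, c2) for c1 in codons_for(aa1) for c2 in codons_for(aa2)]


def observed_boundary_fraction(orf, aa1, aa2, dinuc):
    """Fraction of aa1-aa2 dicodons in an ORF whose 3|1 boundary equals dinuc."""
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    pairs = [(c1, c2) for c1, c2 in zip(codons, codons[1:])
             if CODE.get(c1) == aa1 and CODE.get(c2) == aa2]
    if not pairs:
        return None
    hits = sum(1 for c1, c2 in pairs if c1[2] + c2[0] == dinuc)
    return hits / len(pairs)


# Amino acids with at least one codon ending in A, or starting with G
A3_AMINO_ACIDS = {aa for codon, aa in CODE.items() if codon[2] == "A" and aa != "*"}
G1_AMINO_ACIDS = {aa for codon, aa in CODE.items() if codon[0] == "G"}
```

    Under these assumptions the two sets contain 12 and 5 amino acids, respectively, which gives the 60 amino acid pairs considered for A|G boundaries.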

    However, the number of synonymous A|G dinucleotides for all dicodons cannot be simply pooled to obtain the total frequency of synonymous A|G usage for each ORF because 3|1 synonymous dinucleotide usage is also influenced by amino acid composition. For example, with respect to the proportion of total dicodons potentially encoded by A|G dinucleotides, there are four different categories of synonymous A|G usage: 1/2, 1/3, 1/4, and 1/6. Lys Asp dicodons are an example of the 1/2 category, where two dicodons (AAA|GAY) have A|G dinucleotides and two dicodons do not (AAG|GAY). Therefore, 1/2 of the dicodons would contain A|G dinucleotides if synonymous dinucleotide use were entirely random. Ser Asp dicodons are an example of the 1/6 category, where only 2 (TCA|GAY) of the 12 possible Ser Asp dicodons (TCN|GAY + AGY|GAY) are encoded by A|G dinucleotides. Each of the 16 synonymous dinucleotide groups has a unique set of such categories. For example, only 1/18 of Arg Arg dicodons are encoded by C|A dinucleotides.
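
    The expected fractions quoted above follow directly from the genetic code and can be enumerated. The sketch below is illustrative only; it assumes the standard genetic code and equal usage of all synonymous dicodons, and the function name expected_fraction is hypothetical. Amino acids are given in one-letter code (K = Lys, D = Asp, S = Ser, R = Arg).

```python
from fractions import Fraction

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def expected_fraction(aa1, aa2, dinuc):
    """Share of synonymous aa1-aa2 dicodons whose 3|1 boundary equals dinuc."""
    first = [c for c, aa in CODE.items() if aa == aa1]
    second = [c for c, aa in CODE.items() if aa == aa2]
    hits = sum(1 for c1 in first for c2 in second if c1[2] + c2[0] == dinuc)
    return Fraction(hits, len(first) * len(second))


# Distinct nonzero A|G categories over all amino acid pairs (stops excluded)
amino_acids = sorted(set(AMINO) - {"*"})
AG_CATEGORIES = {expected_fraction(x, y, "AG")
                 for x in amino_acids for y in amino_acids} - {Fraction(0)}
```

    With these assumptions, AG_CATEGORIES contains exactly the four categories listed in the text, and the Arg Arg value for C|A is 1/18.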

    To control for variation in the amino acid composition of each ORF, the frequency of synonymous dinucleotide usage was adjusted by dividing the observed frequency of synonymous dinucleotide usage by the expected frequency of synonymous dinucleotide if synonymous dinucleotide use were random. This adjustment yields RSDUij, the relative synonymous dinucleotide usage for dinucleotide j for the ith dicodon:

    where Xij is the number of occurrences of dinucleotide j for the ith dicodon, which is encoded by ni synonymous dicodons. The overall RSDU value for synonymous dinucleotide j (=RSDUj) was then calculated as the weighted average of the RSDUij values for the dinucleotide of interest. When RSDUj < 1, synonymous dinucleotide j is used less frequently than expected, and when RSDUj > 1, synonymous dinucleotide j is used more frequently than expected under random synonymous dinucleotide usage.
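
    A possible reading of the RSDU calculation is sketched below in Python. It is a hedged illustration, not the author's method: it assumes that RSDUij is the observed fraction of dicodon i carrying dinucleotide j divided by the fraction expected under equal use of synonymous dicodons, and that the weights in the overall average are the numbers of occurrences of each amino acid pair (the text does not specify the weights). The standard genetic code is assumed, and the function names are hypothetical.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def expected_fraction(aa1, aa2, dinuc):
    """Share of synonymous aa1-aa2 dicodons whose 3|1 boundary equals dinuc."""
    pairs = [(c1, c2) for c1, x in CODE.items() if x == aa1
             for c2, y in CODE.items() if y == aa2]
    hits = sum(1 for c1, c2 in pairs if c1[2] + c2[0] == dinuc)
    return hits / len(pairs)


def rsdu(orf, dinuc):
    """Occurrence-weighted average of observed/expected boundary usage."""
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    totals, hits = Counter(), Counter()
    for c1, c2 in zip(codons, codons[1:]):
        aa1, aa2 = CODE.get(c1), CODE.get(c2)
        if aa1 in (None, "*") or aa2 in (None, "*"):
            continue  # skip stop codons and codons with unknown bases
        totals[(aa1, aa2)] += 1
        if c1[2] + c2[0] == dinuc:
            hits[(aa1, aa2)] += 1
    weighted, n = 0.0, 0
    for pair, count in totals.items():
        expected = expected_fraction(pair[0], pair[1], dinuc)
        if expected in (0.0, 1.0):
            continue  # dinuc is not a synonymous choice for this pair
        weighted += (hits[pair] / count) / expected * count
        n += count
    return weighted / n if n else None
```

    In the toy sequence AAA GAC AAG GAT AAA GAT, two of the three Lys Asp dicodons carry A|G against an expectation of 1/2, so the value for A|G under these assumptions is 4/3.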

    The data were placed into 15 bins by grouping mRNAs with similar half-lives into bins containing 270–371 ORFs (Supplementary Material online, table 1). It was not possible to bin the data into bins of exactly the same size because of the variation in the number of transcripts with half-lives of a given time. Linear regressions were performed to determine if there was any relationship between rare codon use, preferred codon use, or dinucleotide use and mRNA half-life. When appropriate, the Dunn-Sidak method of sequential Bonferroni tests was employed to correct for spurious statistical significance arising from multiple tests. Regressions were performed on the binned data and also on the raw data. There were no qualitative differences between the results from regression analysis of the raw data or binned data, but only the results from analyses of the binned data are presented here. The results from analyses of the raw data are available upon request from the author.
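
    The binning, regression, and multiple-test steps can be sketched in plain Python. This is a simplified illustration under stated assumptions: bins are formed as equal-count quantiles of half-life (the study's bins varied in size because of ties), the regression is ordinary least squares on the bin means, and the multiple-test procedure is the step-down Dunn-Sidak rule. All function names are hypothetical, and P values are taken as inputs rather than computed.

```python
from math import sqrt
from statistics import mean


def bin_means(half_lives, values, n_bins=15):
    """Sort by half-life, split into n_bins groups, return (mean x, mean y) pairs."""
    order = sorted(range(len(half_lives)), key=lambda i: half_lives[i])
    size = len(order) / n_bins
    bins = []
    for b in range(n_bins):
        idx = order[round(b * size):round((b + 1) * size)]
        if idx:
            bins.append((mean(half_lives[i] for i in idx),
                         mean(values[i] for i in idx)))
    return bins


def linear_fit(xs, ys):
    """Least-squares slope, intercept, and Pearson r (ys must not be constant)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)


def sequential_sidak(p_values, alpha=0.05):
    """Step-down Dunn-Sidak: test ranked P values against 1-(1-alpha)^(1/(k-rank))."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    significant = [False] * k
    for rank, i in enumerate(order):
        threshold = 1 - (1 - alpha) ** (1 / (k - rank))
        if p_values[i] > threshold:
            break  # stop at the first nonsignificant test
        significant[i] = True
    return significant
```

    A typical use would pass the per-ORF half-lives and a per-ORF statistic (for example, a dinucleotide frequency) to bin_means, then fit the resulting bin means with linear_fit.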

    Results

    When standardized by length, the average minimal free energy of the 100 transcripts with the shortest half-lives was not significantly different from the average minimal free energy of the 100 transcripts with the longest half-lives (Wilcoxon two-sample test, ts = 0.32, P = 0.401). The difference in standardized free energies ran contrary to expectations, with the short-lived transcripts having more stable global secondary structures than the transcripts with long half-lives (average ΔGunstable = –0.272 kcal/mol/bp, average ΔGstable = –0.258 kcal/mol/bp). When the free energy of each transcript was not standardized by length, the difference in free energies between the short-lived and long-lived transcripts was more pronounced, with the short-lived transcripts having a lower average free energy (ΔG = –404.6 kcal/mol) than the long-lived transcripts (ΔG = –286.9 kcal/mol). This difference is not statistically significant (Wilcoxon two-sample test, ts = 0.32, P = 0.113) and is probably due to the substantial difference in the lengths of the two sets of transcripts. The average length of the 100 most short-lived transcripts, 1,484.8 bp, was slightly, but not significantly, greater than the average length of the 100 most long-lived transcripts, 1,104.6 bp (Wilcoxon two-sample test, ts = 0.32, P = 0.106). Thus, global folding free energy appears to be a poor predictor of the mRNA half-life of yeast transcripts.

    Codon frequencies were calculated for all 6,296 ORFs, yielding 26 codons used at a frequency less than 13 per 1,000 codons (Supplementary Material online, table 1). In previous studies of yeast codon usage, these codons have been designated "rare" codons whose usage was predicted to be inversely correlated with mRNA longevity (Hoekema et al. 1987; Herrick, Parker, and Jacobson 1990; Caponigro, Muhlrad, and Parker 1993). Because the frequency of rare codon usage depends on the amino acid content, a standardized metric of rare codon usage, RRCUTotal, was regressed against mRNA half-life. There was no significant relationship between mRNA half-life and RRCUTotal (P = 0.238). Because the eight amino acids that are not encoded by rare codons are not included in the calculation of RRCUTotal, the proportion of each ORF encoded by amino acids without rare codons was also regressed against mRNA half-life. There was a significant negative correlation between mRNA half-life and the use of amino acids never encoded by rare codons (fig. 1, P = 0.0006, rs = –0.857). This negative trend is countered by a positive trend in the use of Trp and Cys, the two amino acids always encoded by rare codons, and mRNA half-life (fig. 1, P = 0.0026, rs = 0.893). Separate regressions for the four groups of amino acids with intermediate proportions of rare codons contributing to RRCU were not significant, nor were they indicative of any general trends for rare codon usage.

    FIG. 1.— Average proportion of codons encoded by amino acids with no rare codons ("none rare", open squares) or all rare codons ("all rare", filled circles, 2nd y axis) plotted as a function of mRNA half-life. mRNA half-life data were binned into 15 intervals, and the average proportion of codons encoded by "all rare" and "none rare" codons were calculated for each of the 15 bins. The average proportion of "all rare" codons increased as a function of average mRNA half-life (P = 0.0026, rs = 0.893), whereas the average proportion of "none rare" codons decreased with average mRNA half-life (P = 0.0006 and rs = –0.857).

    In addition to the overall frequency of rare codon use, the number and/or position of consecutive rare codons may influence the longevity of transcripts. There was no relationship between the maximum number of consecutive rare codons in an ORF and its half-life (P = 0.943). However, because longer ORFs are more likely to contain longer runs of rare codons due to chance alone, the data were also analyzed with the maximum consecutive rare codon run expressed as a fraction of the ORF length. The relationship between the maximum number of consecutive rare codons, expressed as a fraction of ORF length, and mRNA half-life was positive and highly significant (P = 0.0009, R2 = 0.585). A logistic regression improved the fit to the data (R2 = 0.820), indicating that there may be an upper limit to the number of consecutive rare codons "permitted" in a transcript.
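
    The length-standardized run statistic can be sketched as follows. This Python fragment is an illustration, not the script used in the study; it assumes that ORF length is measured in codons, and the rare codon set in the example is a small hypothetical subset (CAG and the Cys and Trp codons, which the text identifies as rare) rather than the full list of 26.

```python
def max_rare_run(orf, rare_codons):
    """Return (longest run of consecutive rare codons, run / ORF length in codons)."""
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    best = run = 0
    for codon in codons:
        run = run + 1 if codon in rare_codons else 0  # reset on a nonrare codon
        best = max(best, run)
    return best, (best / len(codons) if codons else 0.0)


# Hypothetical subset of rare codons, for illustration only
RARE_SUBSET = {"CAG", "TGT", "TGC", "TGG"}
longest, fraction = max_rare_run("ATGCAGTGTTGGAAACAGTAA", RARE_SUBSET)
```

    In this toy ORF of seven codons, the longest run (CAG TGT TGG) has length 3, so the standardized value is 3/7.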

    Rare codons represent a subset of the 38 unpreferred codons in the yeast genome, those whose usage decreases significantly with increasing expression level (Akashi 2003). There was no overall relationship between the use of unpreferred codons (or, conversely, preferred codons) and mRNA half-life (P = 0.362), nor was there a significant relationship between mRNA half-life and unpreferred (or preferred) codon use within each of the 18 degenerate codon families. These results are consistent with the results of similar analyses reported by Wang et al. (2002).

    Dinucleotide composition has also been implicated in mRNA secondary structure stability (Workman and Krogh 1999). In keeping with this idea, a relationship between overall dinucleotide composition and half-life is observed for 7 of the 16 dinucleotides (table 1). Three dinucleotides, GA, AG, and AA, are negatively correlated with mRNA half-life, while the CC, TT, CT, and TC dinucleotides are positively correlated with longevity. To obtain insight into which components of codons contributed most significantly to dinucleotide bias, the dinucleotide composition at each of the three positions relative to the reading frame was regressed against the binned half-life data. The dinucleotides at the first two positions of the reading frame, herein termed 1–2 dinucleotides, show a trend similar to that for overall dinucleotides described above. Two of the same dinucleotides, AG and GA, are significantly negatively correlated with half-life. Likewise, the CC dinucleotide is positively correlated with half-life. Two additional dinucleotides, TG and GT, are also positively correlated with half-life. Trends in dinucleotide usage at the second and third codon positions, herein termed 2–3 dinucleotides, are also similar to overall dinucleotide composition. AA and AG dinucleotide use is negatively correlated with half-life, while the 2–3 dinucleotides GC, TT, and TC are positively correlated with half-life. For dinucleotides occupying the 3|1 position, A|A, A|G, A|C, and T|A are negatively correlated with half-life, while C|C, C|T, and T|T are positively correlated with half-life, again similar to overall dinucleotide use. Thus, the relationship between mRNA half-life and dinucleotide composition is generally similar at different positions of the reading frame, with the purinic dinucleotides used less frequently in stable transcripts and the pyrimidinic dinucleotides showing the opposite trend.
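
    The four dinucleotide contexts (overall, 1–2, 2–3, and 3|1) can be tabulated with a short Python sketch. It is offered as an illustration under the assumption that the ORF starts in frame at its first base; the function name dinucleotide_frequencies and the context labels are hypothetical.

```python
from collections import Counter

CONTEXTS = ("1-2", "2-3", "3|1")  # frame position of each dinucleotide's first base


def dinucleotide_frequencies(orf):
    """Relative dinucleotide frequencies overall and by reading-frame position."""
    counts = {"overall": Counter(), "1-2": Counter(), "2-3": Counter(), "3|1": Counter()}
    for i in range(len(orf) - 1):
        dinuc = orf[i:i + 2]
        counts["overall"][dinuc] += 1
        counts[CONTEXTS[i % 3]][dinuc] += 1  # i % 3 == 2 spans a codon boundary
    return {context: {d: n / sum(c.values()) for d, n in c.items()}
            for context, c in counts.items()}


freqs = dinucleotide_frequencies("AAAGAC")
```

    For the toy sequence AAA GAC, the only codon-boundary dinucleotide is A|G, while AA occurs once in the 1–2 context and once in the 2–3 context.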

    Table 1. Results of Regression Analyses of Average Dinucleotide Composition Versus Average mRNA Half-life, Ranked by Statistical Significance

    In general, dinucleotide use is strongly affected by the amino acid sequences of the encoded proteins. With the exception of the sixfold degenerate codon families of arginine, leucine, and serine, there are no synonymous 1–2 dinucleotides due to the structure of the genetic code. In other words, at the 1–2 position, different dinucleotides encode different amino acids except when there is some first codon position degeneracy (e.g., AGA and CGA both encode arginine). Within a codon family, synonymous 2–3 dinucleotide use is similar to codon bias. In other words, for each of the 18 amino acids encoded by >1 codon, the same amino acid can be encoded by different 2–3 dinucleotides (e.g., AAA and AAG both encode lysine, where the 2–3 dinucleotides are different). Most relevant to CDCB is synonymous dinucleotide use across codon boundaries. If patterns of synonymous dinucleotide use across codon boundaries differ from overall or 3|1 dinucleotide use, it would suggest that CDCB is related to mRNA half-life. There are two patterns that would be suggestive of such an effect: first, if overall dinucleotide use is not significantly correlated with half-life, but synonymous 3|1 dinucleotide use is; second, if the slopes of both are significant, but opposite in sign. To account for differences among dicodons in the contribution of any given dinucleotide to synonymous dicodon usage, a standardized measure of synonymous 3|1 dinucleotide usage, RSDU, was adopted (see Methods for calculations of RSDU).

    Although three different synonymous dinucleotides are significantly correlated with mRNA half-life, only one, the synonymous A|T dinucleotide, is significant (P = 0.0105) without also being significantly correlated overall or at the 1–2, 2–3, or 3|1 positions (table 1). Figure 2 shows the negative correlation between RSDUA|T and mRNA half-life for the binned data. The slope is highly significant, indicating that CDCB may be associated with mRNA longevity. Although RSDUC|C and RSDUA|A are also correlated with mRNA half-life, the significance may be due to factors other than CDCB because the correlations at other frames and for overall dinucleotide use are also significant. Note that the difference in the magnitude of the slope for the RSDU correlations is due to the difference in the scale of variation in RSDU versus dinucleotide percentage. When uncorrected percentage synonymous 3|1 dinucleotide usage was regressed against mRNA half-life, the same results were obtained with the exception that one additional 3|1 dinucleotide, A|G, was also found to be significantly associated with mRNA half-life, but as with the RSDU data, only synonymous A|T usage was independent of patterns found at the other frames or for overall dinucleotide usage.

    FIG. 2.— Relative synonymous A|T dinucleotide usage (RSDUA|T) at codon boundaries versus mRNA half-life. mRNA half-life data were binned into 15 intervals, and the regression of average RSDUA|T and average half-life in each bin resulted in a significant negative association after controlling for multiple tests (P = 0.0105, rs = –0.7357).

    Discussion

    Silent variation has previously been shown to influence transcript abundance through effects on transcription, splicing, and translation; the present study suggests that it may also act through effects on mRNA decay. Although the underlying factors responsible for the correlations found in this study must be determined empirically, some plausible mechanisms are proposed below.

    Surprisingly, it appears that amino acid content is correlated with mRNA longevity. Proteins enriched in amino acids with rare codons have more stable mRNAs than proteins containing a high percentage of amino acids that cannot be encoded by rare codons. Cys and Trp, which are encoded exclusively by rare codons, are used more frequently in stable transcripts, whereas the eight amino acids encoded by codon families without rare codons are used less frequently in stable transcripts (fig. 1). There was no significant general trend for a relationship between rare codon use and mRNA half-life in codon families containing both rare and nonrare codons. Nevertheless, the longevity of an mRNA is probably determined in part by its nucleotide composition because some of the factors that govern the cellular residence time of an mRNA act directly on the mRNA molecule. Rare codons do not appear to confer ribosome protection as ribosome density is not correlated with rare codon use (Arava et al. 2003); some other property of rare codons, such as the nucleotide composition, must affect mRNA longevity. Rare codons may be lacking in nucleotide motifs recognized by the mRNA decay machinery, or the base composition of rare codons might affect mRNA secondary structure.

    The rare codons of yeast are deficient in A3 and T1 nucleotides (Supplementary Material online, table 1). This pattern is consistent with results of this study, in which A|T dinucleotides are avoided in transcripts with long half-lives. A greater tendency to use consecutive rare codons in stable transcripts, which would result in a dearth of A|T dinucleotides, is also consistent with the patterns of synonymous dinucleotide use observed. Ribonucleases are known to exhibit preferences for certain dinucleotides at cleavage sites (Beutler et al. 1989), and AT dinucleotides are known to be targets of RNase L endonucleolytic cleavage (Carroll et al. 1996). In mammals it is thought that T|A dinucleotides are avoided at codon boundaries because they are targets of cleavage by ribonucleases (Beutler et al. 1989; Qiu et al. 1998). In yeast, T|A usage was negatively correlated with mRNA half-life; synonymous T|A usage showed a negative, but nonsignificant, correlation (table 1). At present, it is not known whether AT dinucleotides are more common targets of ribonuclease cleavage than TA dinucleotides in yeast.

    Because the yeast rare codons have a high GC3 content (76%), transcripts with a higher frequency of rare codons might fold into more stable secondary structures. To date, predictive computational analyses of mRNA secondary structure have yielded conflicting conclusions regarding the influence of GC3 content (Seffens and Digby 1999; Workman and Krogh 1999; Carlini, Chen, and Stephan 2001; Katz and Burge 2003). No association between mRNA half-life and global secondary structure was found in the present study. Katz and Burge (2003) found no correlation between mRNA half-life and the folding potential of local secondary structures. Therefore, in yeast there is probably no general association between mRNA secondary structure (local or global) and mRNA longevity. However, in some instances slight changes in secondary structure may affect mRNA longevity, for example, through enhancing or preventing access of the degradation machinery to target sites (discussed below).

    CDCB considers the influence of surrounding nucleotides in determining codon choice (Yarus and Folley 1984; Shpaer 1986). CDCB has been studied much less extensively than codon bias. The most significant finding from the few studies of CDCB conducted to date is that the first nucleotide following a codon is the most important in determining codon choice (Karlin and Mrazek 1996; Berg and Silva 1997; Fedorov, Saxonov, and Gilbert 2002). CDCB may contribute to the accuracy of protein synthesis due to the spatial interaction of the ribosomal proteins with the paired codon-anticodon nucleotides in the A and P sites of the ribosomes (Fedorov, Saxonov, and Gilbert 2002; Boycheva, Chkodrov, and Ivanov 2003), with certain context-dependent interactions favored because they allow for more accurate translation.

    CDCB may also affect mRNA longevity, a possibility examined in this study. The negative relationship between synonymous A|T usage and mRNA longevity can be understood in terms of the effects of nucleotide composition on local secondary structural elements of mRNAs. Experimental work has demonstrated that the formation of secondary structures upstream of destabilizing AU-rich elements (AREs) inhibits 3'–5' exoribonucleolytic degradation of yeast transcripts (Curatola, Nadal, and Schneider 1995; Vasudevan and Peltz 2001). RNA secondary structures in the 5'-UTR have also been shown to affect the longevity of yeast transcripts (Decker and Parker 1993; Muhlrad, Decker, and Parker 1994, 1995). Computational prediction of mRNA secondary structures indicates that nucleotide composition can significantly influence the formation of local and global secondary structures in eukaryotic mRNAs (Seffens and Digby 1999; Workman and Krogh 1999; Carlini, Chen, and Stephan 2001). The use of A|T dinucleotides, rather than C|G dinucleotides, at synonymous dicodon boundaries could potentially result in weaker local secondary structures, allowing for more efficient 3'–5' degradation. Alternatively, A|T dinucleotides may mimic the destabilizing AREs, recruiting ARE-binding proteins and thereby triggering rapid degradation. However, the lack of a significant trend for synonymous T|A usage does not support this second hypothesis.

    The results do not necessarily run contrary to the predictions based on the Duan and Antezana (2003) study. The trends in 3|1 dinucleotide usage are qualitatively identical to the results of Duan and Antezana (2003): C|G use increased and T|A use decreased with mRNA half-life, although the slopes were not statistically significant. Although it appears that the major pathways of mRNA decay in yeast and in mammals are conserved, the relative importance and individual components of the pathways may differ (Decker and Parker 2002). The degradation pathways of individual transcripts are highly idiosyncratic. General trends obtained from analysis of transcriptome-wide half-life data will not necessarily apply to individual transcripts.

    Natural selection acts on multiple processes to optimize gene expression. The simultaneous effects of transcription and translation rates on noise in gene expression were recently measured, modeled, and related to observed levels of noise in yeast (Blake et al. 2003; Fraser et al. 2004). Blake et al. (2003) found that noise in expression was maximized in transcripts with intermediate transcription and high translation levels. Fraser et al. (2004) demonstrated that genes essential for viability, which are expected to exhibit low levels of variation in expression, tend to exhibit the highest transcription rates and lowest translation rates. mRNA decay affects noise in gene expression by altering translational efficiency, that is, the number of protein molecules translated from a single mRNA transcript. High decay rates reduce noise by reducing the effective translational efficiency because short-lived mRNAs are translated fewer times than long-lived mRNAs (assuming other factors are held constant). Fraser et al. (2004) argue that the inverse relationship between protein evolutionary rate and mRNA decay rate can best be explained by minimization of noise in gene expression, where evolutionarily conserved essential genes are expressed at high levels and have short-lived mRNAs. Whether the inverse relationship between evolutionary rate and mRNA decay rate and noise minimization in essential genes are general phenomena remains to be determined, as transcriptome-wide mRNA decay rates are not currently available for other organisms.

    Supplementary Material

    Supplementary materials are available at Molecular Biology and Evolution online (www.mbe.oupjournals.org).

    Acknowledgements

    I thank John Baines, Ying Chen, and Wolfgang Stephan for carefully reviewing drafts of the manuscript and for their invaluable comments/criticisms. Hiroshi Akashi generously provided helpful suggestions for improvement on this project. This research was supported in part by NSF grant 0315468 awarded to D.B.C.

    References

    Akashi, H. 2001. Gene expression and molecular evolution. Curr. Opin. Genet. Dev. 11:660–666.

    ———. 2003. Translational selection and yeast proteome evolution. Genetics 164:1291–1303.

    Antezana, M. A., and M. Kreitman. 1999. The nonrandom location of synonymous codons suggests that reading frame-independent forces have patterned codon preferences. J. Mol. Evol. 49:36–43.

    Arava, Y., Y. Wang, J. D. Storey, C. L. Liu, P. O. Brown, and D. Herschlag. 2003. Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 100:3889–3894.

    Baines, J. F., J. Parsch, and W. Stephan. 2004. Pleiotropic effect of disrupting a conserved sequence involved in a long-range compensatory interaction in the Drosophila Adh gene. Genetics 166:237–242.

    Berg, O. G., and P. J. N. Silva. 1997. Codon bias in Escherichia coli: the influence of codon context on mutation and selection. Nucleic Acids Res. 25:1397–1404.

    Beutler, E., T. Gelbart, J. H. Han, J. A. Koziol, and B. Beutler. 1989. Evolution of the genome and genetic code: selection at the dinucleotide level by methylation and polyribonucleotide cleavage. Proc. Natl. Acad. Sci. USA 86:192–196.

    Blake, W. J., M. Kaern, C. R. Cantor, and J. J. Collins. 2003. Noise in eukaryotic gene expression. Nature 422:633–637.

    Boycheva, S., G. Chkodrov, and I. Ivanov. 2003. Codon pairs in the genome of Escherichia coli. Bioinformatics 19:987–998.

    Bulmer, M. 1991. The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897–907.

    Caponigro, G., D. Muhlrad, and R. Parker. 1993. A small segment of the MATa1 transcript promotes mRNA decay in Saccharomyces cerevisiae: a stimulatory role for rare codons. Mol. Cell Biol. 13:5141–5148.

    Caponigro, G., and R. Parker. 1996. Mechanisms and control of mRNA turnover in Saccharomyces cerevisiae. Microbiol. Rev. 60:233–249.

    Carlini, D. B., Y. Chen, and W. Stephan. 2001. The relationship between third-codon position nucleotide content, codon bias, mRNA secondary structure and gene expression in the drosophilid alcohol dehydrogenase genes Adh and Adhr. Genetics 159:623–633.

    Carroll, S. S., E. Chen, T. Viscount, J. Geib, M. K. Sardana, J. Gehman, and L. C. Kuo. 1996. Cleavage of oligoribonucleotides by the 2',5'-oligoadenylate-dependent ribonuclease L. J. Biol. Chem. 271:4988–4992.

    Curatola, A. M., M. S. Nadal, and R. J. Schneider. 1995. Rapid degradation of AU-Rich Element (ARE) mRNAs is activated by ribosome transit and blocked by secondary structure at any position 5' to the ARE. Mol. Cell Biol. 15:6331–6340.

    Decker, C. J., and R. Parker. 1993. A turnover pathway for both stable and unstable mRNAs in yeast: evidence for a requirement for deadenylation. Genes Dev. 7:1632–1643.

    ———. 2002. mRNA decay enzymes: decappers conserved between yeast and mammals. Proc. Natl. Acad. Sci. USA 99:12512–12514.

    Duan, J., and M. A. Antezana. 2003. Mammalian mutation pressure, synonymous codon choice, and mRNA degradation. J. Mol. Evol. 57:694–701.

    Elowitz, M., A. Levine, E. Siggia, and P. Swain. 2002. Stochastic gene expression in a single cell. Science 297:1183–1186.

    Fedorov, A., S. Saxonov, and W. Gilbert. 2002. Regularities of context-dependent codon bias in eukaryotic genes. Nucleic Acids Res. 30:1192–1197.

    Fraser, H. B., A. E. Hirsh, G. Giaever, J. Kumm, and M. B. Eisen. 2004. Noise minimization in eukaryotic gene expression. PLoS Biology 2:1–5.

    Grosjean, H., and W. Fiers. 1982. Preferential codon usage in prokaryotic genes: the optimal codon-anticodon interaction energy and the selective codon usage in efficiently expressed genes. Gene 18:199–209.

    Herrick, D., R. Parker, and A. Jacobson. 1990. Identification and comparison of stable and unstable mRNAs in Saccharomyces cerevisiae. Mol. Cell Biol. 10:2269–2284.

    Hoekema, A., R. A. Kastelein, M. Vasser, and H. A. de Boer. 1987. Codon replacement in the PGK1 gene of Saccharomyces cerevisiae: experimental approach to study the role of biased codon usage in gene expression. Mol. Cell Biol. 7:2914–2924.

    Ikemura, T. 1981. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 151:389–409.

    ———. 1982. Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in its protein genes: differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J. Mol. Biol. 158:573–597.

    Johannes, G., M. S. Carter, M. B. Eisen, P. O. Brown, and P. Sarnow. 1999. Identification of eukaryotic mRNAs that are translated at reduced cap binding complex eIF4F concentrations using a cDNA microarray. Proc. Natl. Acad. Sci. USA 96:13118–13123.

    Karlin, S., and J. Mrazek. 1996. What drives codon choices in human genes? J. Mol. Biol. 262:459–472.

    Katz, L., and C. B. Burge. 2003. Widespread selection for local RNA secondary structure in coding regions of bacterial genes. Genome Res. 13:2042–2051.

    Kuhn, K. M., J. L. DeRisi, P. O. Brown, and P. Sarnow. 2001. Global and specific translational regulation in the genomic response of Saccharomyces cerevisiae to a rapid transfer from a fermentable to a nonfermentable carbon source. Mol. Cell Biol. 21:916–927.

    Mikulits, W., B. Pradet-Balade, B. Habermann, H. Beug, J. A. Garcia-Sanz, and E. W. Müllner. 2000. Isolation of translationally controlled mRNAs by differential screening. FASEB J. 14:1641–1652.

    Moriyama, E. N., and J. R. Powell. 1997. Codon usage bias and tRNA abundance in Drosophila. J. Mol. Evol. 45:514–523.

    Morris, D. R. 1995. Growth control of translation in mammalian cells. Prog. Nucleic Acid Res. Mol. Biol. 51:339–363.

    Muhlrad, D., C. J. Decker, and R. Parker. 1994. Deadenylation of the unstable mRNA encoded by the yeast MFA2 gene leads to decapping followed by 5'→3' digestion of the transcript. Genes Dev. 8:855–866.

    ———. 1995. Turnover mechanisms of the stable yeast PGK1 mRNA. Mol. Cell Biol. 15:2145–2156.

    Nussinov, R. 1981. Eukaryotic dinucleotide preference rules and their implications for degenerate codon usage. J. Mol. Biol. 149:125–131.

    O'Connor, M., T. Asai, C. L. Squires, and A. E. Dahlberg. 1999. Enhancement of translation by the downstream box does not involve base pairing of mRNA with the penultimate stem sequence of 16S rRNA. Proc. Natl. Acad. Sci. USA 96:8973–8978.

    Parsch, J., S. Tanda, and W. Stephan. 1997. Site-directed mutations reveal long-range compensatory interactions in the Adh gene of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 94:928–933.

    Pervouchine, D. D., J. H. Graber, and S. Kasif. 2003. On the normalization of RNA equilibrium free energy to the length of the sequence. Nucleic Acids Res. 31:e49.

    Qiu, L., A. Moreira, G. Kaplan, R. Levitz, J. Y. Wang, C. Xu, and K. Drlica. 1998. Degradation of hammerhead ribozymes by human ribonucleases. Mol. Gen. Genet. 258:352–362.

    Sachs, A. B. 1993. Messenger RNA degradation in eukaryotes. Cell 74:413–421.

    Seffens, W., and D. Digby. 1999. mRNAs have greater negative folding free energies than shuffled or codon choice randomized sequences. Nucleic Acids Res. 27:1578–1584.

    Sharp, P. M., T. M. F. Tuohy, and K. R. Mosurski. 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 14:5125–5143.

    Shpaer, E. G. 1986. Constraints on codon context in Escherichia coli genes. Their possible role in modulating the efficiency of translation. J. Mol. Biol. 188:555–564.

    Sonenberg, N. 1996. mRNA 5' cap-binding protein eIF4E and control of cell growth. Pp. 245–270. J. W. B. Hershey, M. Mathews, and N. Sonenberg, eds. Translational control. Cold Spring Harbor Laboratory Press, Plainview, N.Y.

    Stenström, C. M., and L. A. Isaksson. 2002. Influences on translational initiation and early elongation by the messenger RNA region flanking the initiation codon at the 3' side. Gene 288:1–8.

    Stenström, C. M., H. Jin, L. T. Major, W. P. Tate, and L. A. Isaksson. 2001. Codon bias at the 3'-side of the initiation codon is correlated with translational initiation frequency in Escherichia coli. Gene 263:273–284.

    Struhl, K. 1999. Fundamentally different logic of gene regulation in eukaryotes and prokaryotes. Cell 98:1–4.

    Vasudevan, S., and S. W. Peltz. 2001. Regulated ARE-mediated mRNA decay in Saccharomyces cerevisiae. Mol. Cell 7:1191–1200.

    Wang, Y., C. L. Liu, J. D. Storey, R. J. Tibshirani, D. Herschlag, and P. O. Brown. 2002. Precision and functional specificity in mRNA decay. Proc. Natl. Acad. Sci. USA 99:5860–5865.

    Workman, C., and A. Krogh. 1999. No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution. Nucleic Acids Res. 27:4816–4822.

    Wright, F. 1990. The ‘effective number of codons’ used in a gene. Gene 87:23–29.

    Yarus, M., and L. S. Folley. 1984. Sense codons are found in specific contexts. J. Mol. Biol. 182:529–540.

    Zong, Q., M. Schummer, L. Hood, and D. R. Morris. 1999. Messenger RNA translation state: the second dimension of high-throughput expression screening. Proc. Natl. Acad. Sci. USA 96:10632–10636.

    Zuker, M. 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31:3406–3415.

    Tucker, M., and R. Parker. 2000. Mechanisms and control of mRNA decapping in Saccharomyces cerevisiae. Annu. Rev. Biochem. 69:571–595.

    (David B. Carlini)