Source: 《分子生物学进展》, 2005, Issue 1
Phylogenetic Mapping of Intron Positions: A Case Study of Translation Initiation Factor eIF2
     Department of Genetics, University of Leipzig, Leipzig, Germany

    Correspondence: E-mail address: krauss@rz.uni-leipzig.de.

    Abstract

    Eukaryotic translation initiation factor 2 (eIF2) is a G protein that delivers the methionyl initiator tRNA to the small ribosomal subunit and releases it upon GTP hydrolysis after the recognition of the initiation codon. eIF2 is composed of three subunits, α, β, and γ. Subunit γ shows the strongest conservation, and it confers both tRNA and GTP/GDP binding. Using intron positioning and protein sequence alignment, here we show that eIF2 is a suitable phylogenetic marker for eukaryotes. We determined or completed the sequences of 13 arthropod eIF2 genes. Analyzing the phylogenetic distribution of 52 different intron positions in 55 distantly related eIF2 genes, we identified ancient introns and shared derived introns in our data set. Intron positioning in eIF2 is evidently evolutionarily conserved. However, there were episodes of complete and partial intron losses followed by intron gains. We identified 17 clusters of intron positions based on their distribution. The evolution of these clusters appears to be connected with preferred exon length and can be used to estimate the relative timing of intron gain, because nearby precursor introns had to be erased from the gene before the new introns could be inserted. Moreover, we identified a putative case of intron sliding that constitutes a synapomorphic character state supporting monophyly of Coleoptera, Lepidoptera, and Diptera excluding Hymenoptera. We also performed tree reconstructions using the eIF2 protein sequences and intron positioning as phylogenetic information. Our results support the monophyly of Viridiplantae, Ascomycota, Homobasidiomyceta, and Apicomplexa.

    Key Words: eIF2; intron evolution; molecular phylogenetics; intron clustering; arthropod phylogeny; intron sliding

    Introduction

    In recent decades, we have witnessed significant progress in reconstructing phylogenies based on molecular data (nucleotide or amino acid sequences). Good examples are the analysis of small-subunit ribosomal RNA of 2,551 species (Van de Peer et al. 2000), as well as the analysis of over 500 proteins of six genomes (Wolf, Rogozin, and Koonin 2004). However, the results of these studies often conflict with respect to phylogeny. Therefore, it is necessary to improve the phylogenetic analysis using additional molecular markers and denser species sampling. Beyond gene sequences, such markers include singular character states such as transposable element insertions, gene order changes, code variants, and intron positions (reviewed in Rokas and Holland [2000]).

    Among these singular character states, intron position appears to be rather unreliable (Krzywinski and Besansky 2002; Wada et al. 2002). Likely cases of intron insertion and loss have been documented (Rzhetsky et al. 1997; Logsdon, Stoltzfus, and Doolittle 1998; Feiber, Rangarajan, and Vaughn 2002; Roy, Fedorov, and Gilbert 2003; Brady and Danforth 2004). Based on recent genome projects, large-scale comparisons of intron positions have been made (Fedorov, Merican, and Gilbert 2002; Rogozin et al. 2003). Their results suggested that intron positioning is more dynamic than previously assumed. Therefore, comprehensive analyses of novel marker genes, focusing on both intron position and sequence data, would be useful.

    We sought to analyze the evolution of sequence and exon-intron structure of a strongly conserved single-copy gene in a representative sample of eukaryotic species. For this purpose, we have chosen the γ subunit of eukaryotic translation initiation factor 2 (eIF2). By delivering the initiator methionyl-tRNA to the small subunit of ribosomes, eIF2 ensures specificity of initiation codon selection (Kapp and Lorsch 2004; Roll-Mecak et al. 2004). The γ subunit is the most strongly conserved subunit of the heterotrimeric eIF2 and is found in Eukaryota and Archaea.

    In a preliminary study (Krauss and Reuter 2000), we described eIF2 gene structures of six arthropod species and showed that the gene is fused with the functionally unrelated Su(var)3-9 histone methyltransferase gene in holometabolous insects. Here, we have sequenced eIF2 genes of 13 other arthropod species and collected database sequences from 40 additional, selected eukaryotic species for our analysis. Examining the cladistic distribution of 52 different intron positions in 55 distantly related eIF2 genes, we identified ancient and shared derived introns. Our analysis has shown that intron positioning in eIF2 is evolutionarily conserved. However, there were episodes of complete or partial intron losses followed by intron gains. Using a maximum-parsimony analysis based on an intron presence/absence matrix, we showed that introns are phylogenetically informative. We note that in phylogenetic mapping of intron positions, sampling of taxa has to be as complete as possible.

    Materials and Methods

    Sources of Arthropods Utilized

    Species trapped in the vicinity of Leipzig (Sachsen, Germany) were Lithobius forficatus (centipede), Oniscus asellus (woodlouse), Enallagma cyathigerum (damselfly), Forficula auricularia (earwig), and Aphis sambuci (aphid). Arthropods captured around Ruhla (Thüringen, Germany) were Araneus quadratus (spider), Cercopis vulnerata (cicada), and Scoliopterix libatrix (butterfly). Allacma fusca (springtail) was trapped near Ilsenburg (Sachsen-Anhalt, Germany), and Lepismachilis spp. (bristletail) was found in the vicinity of Pfarrwerfen (Salzburg, Austria). Additional species obtained from commercial stocks were Daphnia magna (water flea), Locusta migratoria (locust), and Bombyx mori (silkworm).

    Isolation of eIF2 Genes Using PCR

    DNA was isolated by standard protocols. Trizol reagent (Invitrogen) was used to isolate total RNA. cDNA was synthesized using Hminus-M-MLV reverse transcriptase (Fermentas) and a polyT primer. Degenerate primers based on the amino acid sequences of already known eIF2 proteins were designed to partially amplify the eIF2 gene from genomic DNA and/or cDNA of arthropod species. The degenerate oligonucleotide primers used were Ef120 (5'-CARGCXATHAAYATHGGXACXATHGGXCAYGTXGCXCAYGG-3'), Ef440 (5'-CCRTTXARCATXGTXSYCATXARDATRTCRTGXCCXGGRCARTC-3'), Ef120c (5'-AATATAGGAACCATTGGTCATGTNGCNCAYGG-3'), Ef440c (5'-TCCATCACAGCTGCTCCGTTCAACATNGTNGCCAT-3'), Efdeg3 (5'-GARCAYTTRGCSGCYATHGARATHATG-3'), Efdeg4 (5'-GCKTCKRCTSAGDGCWATYTTYTCKCC-3'), Efdeg5 (5'-ATTCGATCGTTYGAYGTVAAYAARCCNGG-3'), and Efdeg6 (5'-TTTGTACCAACACCDATHARDCCNCCNGG-3'). Primer positions within eIF2 are shown in figure 1. PCR amplifications were done in a Gradient Cycler (Eppendorf) at annealing temperatures between 45°C and 65°C. The initial PCR product (320 bp to 900 bp) was purified using Spin PCRapid Kit (Macherey & Nagel) and sequenced. Species-specific primers were designed based on the sequence obtained, in order to recover the 5' ends and 3' ends of eIF2 transcripts by 5' RACE (rapid amplification of cDNA ends) and 3' RACE, respectively (GeneRacerKit, Invitrogen). Alternatively, inverse PCR products from digested and ligated genomic DNA preparations were purified, cloned, and sequenced. The specific sequencing strategy used for each of the analyzed species is given in figure 1 of Supplementary Material online. Species-specific primer sequences are available upon request.
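
    To give a feel for how degenerate such primers are, the following minimal sketch counts the distinct sequences encoded by a primer written in standard IUPAC ambiguity codes. The function name and the lookup table are illustrative and not part of the original protocol. Several primers above contain the symbol X, which is not a standard IUPAC code and is not defined in the text, so the sketch deliberately rejects it rather than guessing its meaning.

```python
from math import prod

# Number of bases each standard IUPAC nucleotide code can stand for.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Count the distinct sequences encoded by a degenerate primer.

    Raises KeyError for symbols outside the standard IUPAC set
    (for example the undefined 'X' used in some primers above).
    """
    return prod(IUPAC_CHOICES[base] for base in primer.upper())


efdeg3 = "GARCAYTTRGCSGCYATHGARATHATG"  # Efdeg3, copied from the text
print(len(efdeg3), degeneracy(efdeg3))
```

    Under this counting, a primer's degeneracy is simply the product of the choices at each position, which is why a few three- or four-fold ambiguous sites raise it quickly.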

    FIG. 1.— Alignment of selected eIF2 proteins. Four eukaryotic main groups are represented by Drosophila melanogaster (Dme, animals), Coprinus cinereus (Cci, fungi), Chlamydomonas reinhardtii (Cre, plants), and Theileria annulata (Tan, protists). Additionally, two structurally determined eIF2 proteins from Archaea are shown (Pab, Pyrococcus abyssi; Mja, Methanococcus jannaschii). Corresponding secondary structures are given according to Schmitt, Blanquet, and Mechulam (2002) and Roll-Mecak et al. (2004). The three domains of eIF2 (G domain, Domain II, and Domain III) are marked below the alignment. Above the sequences, binding sites of the used degenerate oligonucleotide primers are shown. Intron positions are marked and named according to their position in the Drosophila melanogaster eIF2 sequence. Intron positions that are found in two or more species are underlined.

    Sequencing

    Sequences were determined either by direct sequencing of the PCR fragment or by sequencing of two or three independent clones from different PCR reactions. PCR fragments were subcloned using pGEM-T PCR cloning kit (Promega). Transcribed regions were sequenced as RT-PCR products (directly or as a clone). Sequencing was performed on ABI 3100 equipment (ABI) using BigDye Sequencing Chemistry (ABI). For sequence analysis, MacVector version 7.2 (Accelrys) was used.

    Sequence Sampling and Annotation

    eIF2-orthologous DNA sequences from genome sequencing projects were sampled from databases using Blast. In particular, we used TBlastN (Altschul et al. 1997) based on nine already known eIF2 sequences (Krauss and Reuter 2000) to retrieve eIF2-like genomic sequences from finished and unfinished genome projects deposited at the NCBI database. Additionally, single-trace sequences were screened at the TraceSite of NCBI (http://www.ncbi.nih.gov/blast/tracemb.html) using discontiguous MEGABlast and were assembled manually. Independently, we screened for similar EST sequences utilizing TBlastN and assembled these sequences if possible. The orthology of these candidate sequences was verified by multiple alignment and phylogenetic analysis. We excluded all angiosperm sequences, with the exception of Arabidopsis and Oryza eIF2 genes, from the sequence set because the use of incomplete angiosperm EST and genome data would complicate both gene assembly and phylogenetic analysis owing to frequent gene duplications. We also excluded all vertebrate sequences, with the exceptions of Homo and Takifugu, because protein identity between these species is exceedingly high (>95%), and we could not find any gene structure differences between vertebrate eIF2 genes.

    Alignment and Mapping of Introns

    Amino acid sequences were aligned using MacVector 7.2 (Accelrys) and revised by eye. The divergent ends of eIF2 proteins were deleted from the final data set. Intron positions in the corresponding nucleotide sequences were deduced by co-occurrence of splice consensus sites and gaps in similarity. All identified exon boundaries are supported by typical splice-site sequences of U2-dependent spliceosomal introns. This exon-intron structure was confirmed by cDNA sequence if available. Introns located upstream or downstream of the conserved eIF2 ORF were not considered. We evaluated the location of introns with respect to (1) Drosophila melanogaster eIF2 amino acid residue numbering and (2) phase in the ORF, which results in a bipartite name for each identified intron position. Intron phase was named 0 if the intron lies between two consecutive codons; 1 if the intron lies between the first and the second nucleotide of a codon; and 2 if the intron lies between the second and the third nucleotide of a codon.
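
    The bipartite naming can be sketched in a few lines. The helper names below are hypothetical, and one detail is an assumption rather than something stated in the text: the codon number is taken to be that of the codon the intron interrupts (phase 1 or 2) or immediately precedes (phase 0). This reading was chosen because it reproduces the 16-nt exon between positions 39-0 and 44-1 and the 23-nt exon between 133-1 and 141-0 reported in the Results, ignoring any alignment gaps.

```python
def intron_label(coding_nt_upstream: int) -> str:
    """Name an intron as '<codon>-<phase>' in reference numbering.

    coding_nt_upstream: number of coding nucleotides that precede the
    intron in the reference ORF (assumed input, 0-based count).
    """
    codon = coding_nt_upstream // 3 + 1   # assumed codon-numbering convention
    phase = coding_nt_upstream % 3        # 0, 1, or 2 as defined in the text
    return f"{codon}-{phase}"


def label_to_offset(label: str) -> int:
    """Invert intron_label: coding nucleotides upstream of the intron."""
    codon, phase = (int(part) for part in label.split("-"))
    return (codon - 1) * 3 + phase


def nt_distance(label_a: str, label_b: str) -> int:
    """Distance in coding nucleotides between two intron positions."""
    return abs(label_to_offset(label_a) - label_to_offset(label_b))


print(intron_label(475), nt_distance("159-1", "160-1"))
```

    Two introns are treated as occupying the same position only when both parts of the label agree, so a plain string comparison of labels is enough to test identity in location and phase.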

    Phylogenetic Analysis

    The programs MrBayes version 3.0 (Ronquist and Huelsenbeck 2003), Tree-Puzzle version 5.0 (Schmidt et al. 2001), PAUP* version 4.0b10 (Swofford 2002), and MacVector 7.2 (Accelrys) were used for phylogenetic analyses. Tree constructions were performed through the Bayesian inference (BI) method by MrBayes using the JTT substitution model, 500,000 generations (every 100th was saved), and a burn-in of 2,000 saved trees, leaving 3,000 trees. The posterior probability tree from this analysis was computed using PAUP*. For a maximum-likelihood (ML) analysis, we used quartet-puzzling by Tree-Puzzle 5.0 with 25,000 puzzling steps and the WAG substitution model, and we assumed rate heterogeneity with eight gamma rate categories. A maximum-parsimony (MP) analysis was done by heuristic bootstrapping (1,000 steps) using PAUP* and the branch-swapping algorithm tree-bisection-reconnection (TBR). Finally, a neighbor-joining (NJ) analysis was calculated by MacVector using bootstrapping (1,000 steps) and a Poisson-corrected distance.
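
    The tree counts for the Bayesian run follow from simple bookkeeping, sketched here as a check. The function name is illustrative, and the sketch assumes that the burn-in of 2,000 is counted in saved samples rather than in chain steps, which is the reading consistent with the 3,000 trees reported.

```python
def retained_trees(chain_length: int, sample_every: int, burn_in_samples: int) -> int:
    """Trees left for the consensus after discarding the burn-in.

    Assumes burn_in_samples is expressed in saved samples, not chain steps.
    """
    saved = chain_length // sample_every
    return saved - burn_in_samples


# Values taken from the text: 500,000 steps, every 100th saved, burn-in 2,000.
print(retained_trees(500_000, 100, 2_000))
```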

    We have utilized all intron positions of 37 genes in an independent tree reconstruction. Intronless genes were excluded from this analysis because a total erasure of introns from the eIF2 gene has taken place several times in parallel during evolution (see below). We built an input matrix based on presence/absence of a given intron and implemented a branch-and-bound search (MP) using PAUP* 4.0b10 (Swofford 2002). Characters were considered as unordered.
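
    As an illustration of the input and the optimality criterion, the sketch below builds a presence/absence matrix and scores a given tree with Fitch parsimony for unordered characters. It is not the PAUP* branch-and-bound search used here: it only counts the changes a single, user-supplied binary tree requires. The gene names A to D and their intron sets are hypothetical toy data.

```python
def presence_absence(introns_by_gene):
    """Return (sorted positions, gene -> list of 0/1) from intron label sets."""
    positions = sorted(
        {p for labels in introns_by_gene.values() for p in labels},
        key=lambda label: tuple(int(x) for x in label.split("-")),
    )
    matrix = {
        gene: [int(p in labels) for p in positions]
        for gene, labels in introns_by_gene.items()
    }
    return positions, matrix


def fitch_changes(tree, states):
    """Minimum changes for one unordered character on a binary tree.

    tree: nested 2-tuples with gene names (str) at the leaves.
    states: gene name -> 0 or 1.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        shared = left & right
        if shared:
            return shared
        changes += 1          # child state sets disagree: one change needed
        return left | right

    visit(tree)
    return changes


def tree_length(tree, matrix):
    """Sum of Fitch changes over all characters (columns) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_changes(tree, {gene: row[i] for gene, row in matrix.items()})
        for i in range(n_chars)
    )


# Hypothetical toy data: four genes and the intron positions each carries.
toy = {
    "A": {"159-1", "212-1"},
    "B": {"159-1", "212-1"},
    "C": {"160-1"},
    "D": {"160-1", "295-0"},
}
positions, matrix = presence_absence(toy)
print(positions)
print(tree_length((("A", "B"), ("C", "D")), matrix))
print(tree_length((("A", "C"), ("B", "D")), matrix))
```

    In this toy case the first topology needs fewer changes than the second, so a parsimony search would prefer it; a real analysis compares all candidate topologies rather than two.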

    Results and Discussion

    Structure of eIF2 Genes

    We included 55 eIF2 genes from 52 different organisms in our structural analysis (table 1 in Supplementary Material online). Recent genome projects have shown that, except in angiosperms (see Materials and Methods) and mammals, the eIF2 gene is a single-copy gene. In mice and humans, a second gene copy is essential for male fertility (Mazeyrat et al. 2001) and is expressed only in the males of these species (Ehrmann et al. 1998). These second copies were excluded from our analysis because they show accelerated sequence evolution.

    eIF2 genes contain up to 11 introns in the conserved region of the ORF (Supplementary Material table 1), which ranges from amino acid alignment position 14 to position 477 in figure 1. Intronless as well as intron-rich eIF2 genes exist in species of each of the following taxa: protists, fungi, and deuterostomes (Supplementary Material table 1). This indicates that erasure of all introns from the gene structure (most probably by retrotransposition) has occurred several times independently during evolution. Nevertheless, eIF2 introns mapped onto the multiple protein alignment show a remarkable conservation of intron locations.

    An important initial assumption of our analysis is the homology of each specific intron position. We assume that parallel intron gain at homologous sites in different evolutionary lineages, as proposed by the proto–splice-site theory (Dibb and Newman 1989; Sadusky, Newman, and Dibb 2004), has been very rare compared with intron insertion at different sites. The strong conservation of the eIF2 protein sequence in the eukaryotic species (fig. 1) excludes alignment ambiguities that would lead to wrong homologization or distinction of intron positions found in different species. Therefore, we considered only those intron positions as homologous that were identical in both location and phase.

    Altogether, we found 52 different intron positions in eIF2 genes, and 22 of these introns were identified in only one of the analyzed species. The other 30 introns are present in two or more of those species and are very likely evolutionarily conserved. According to the intron-early theory, such introns would have predated the origin of eukaryotes and played an important role in assembling functional proteins from short coding DNA sequences (Gilbert 1987). Thus, we mapped the location of exon boundaries in the tertiary structure described for two archaeal orthologs (fig. 1) (Schmitt, Blanquet, and Mechulam 2002; Roll-Mecak et al. 2004). eIF2 consists of three domains: G domain, domain II, and domain III. There is no local correlation of conserved intron positions with domain borders (fig. 1). Nor could we find any preferred location of introns with respect to secondary structure elements. Furthermore, we noticed that the bacterial and mitochondrial protein EF-Tu, an elongation factor of translation, shows significant homology to all three domains of eIF2 in sequence and structure (Schmitt, Blanquet, and Mechulam 2002). Thus, all analyzed eIF2 genes likely evolved from an intronless ortholog. Accordingly, intron positioning in eIF2 has occurred independently of any exon shuffling during early evolution. It follows that the pattern of eIF2 intron-exon boundaries should reveal suitable markers of eukaryotic phylogeny.

    Distribution of Intron Positions and Exon Lengths

    Next, we examined the exon length distribution in eIF2 genes (fig. 2A), which shows a maximum between 150 and 180 nt. This differs from an estimate of 90 to 120 nt based on a database of gene structures sampled from several model organisms (Deutsch and Long 1999). In agreement with that study, exons smaller than 60 nt are rare. A 16-nt exon between intron positions 39-0 and 44-1 was found in the related fungi Coprinus and Phanerochaete, and a 23-nt exon between intron positions 133-1 and 141-0 was identified in the fungus Rhizopus. The rarity of small exons probably has functional reasons. It was shown that exons shorter than 50 nt are poorly included in mRNA unless accompanied by strengthened splice sites or accessory sequences that act as splice enhancers (Hwang and Cohen 1997; Carlo, Sierra, and Berget 2000). Thus, small exons should have evolved less frequently than larger ones. In addition, such exons may occur more frequently in fungi than in other eukaryotes, which is consistent with the data of Deutsch and Long (1999).

    FIG. 2.— Intron positions of eIF2 genes are clustered. (A) Exon length distribution. Only internal exons that code for the conserved portion of the ORF are represented. (B) Nucleotide distances of all intron positions identified in at least one eIF2 gene. (C) Correlation between the number of intron positions and the intron occupation rate of each cluster. The intron occupation rate is the ratio of the sum of introns found in all analyzed genes in each cluster to the number of analyzed genes. (D) Schematic representation of 17 intron clusters identified in eIF2 gene structures. Each intron position is shown as a vertical tick.

    To analyze the distribution of all intron positions occurring in eIF2 genes, we plotted the distances between them (fig. 2B). We obtained a distribution with two maxima around 10 and 50 nt. Between the two maxima, we identified a minimum covering intron position distances between 33 and 40 nt. Based on this observed distribution, we could arrange the 52 intron positions into 17 clusters, each consisting of one to six introns (fig. 2D). Inside each cluster, intron positions are separated by at most 33 nt, whereas intron clusters are at least 40 nt apart. This weak clustering of intron positions revives the question of whether only specific intron positions are homologous to each other or whether whole intron clusters, evolutionarily related by the hypothetical process of intron sliding, are the units of evolution.
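
    One simple way to derive such clusters is to sort the positions and start a new cluster whenever the gap to the previous position exceeds a threshold. The sketch below does this under the assumption that the 33-nt limit applies to neighboring positions within a cluster; the function name and the input offsets are hypothetical and do not reproduce the real data of figure 2D.

```python
def cluster_by_gap(offsets, max_gap=33):
    """Group nucleotide offsets into clusters of neighbors at most max_gap apart."""
    clusters = []
    for pos in sorted(set(offsets)):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)   # close enough to the previous position
        else:
            clusters.append([pos])     # gap too large: open a new cluster
    return clusters


# Hypothetical offsets (coding nucleotides upstream of each intron).
print(cluster_by_gap([10, 18, 40, 95, 100, 160]))
```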

    We examined the distribution of introns between the clusters. If different intron positions inside one cluster are homologous to each other, the abundance of introns found in each cluster should be independent of the number of different intron positions that belong to the cluster. However, we would expect a linear correlation between intron abundance and the number of intron positions in each cluster if each intron position had evolved independently. We found such a correlation (fig. 2C). An additional argument for evolutionary independence of each intron position is the rarity of intron sliding; that is, the movement of an existing intron to a nearby position (Stoltzfus et al. 1997; Rogozin, Lyons-Weiler, and Koonin 2000). Therefore, we suggest that apparent clusters of intron positions are mainly the result of intron erasure and subsequent insertion of novel introns at positions compatible with preferred exon sizes. A corresponding, relatively uniform exon size distribution was suggested to be based on functional limitations of the nonsense-mediated decay (NMD) pathway, which is involved in the cell's surveillance for transcripts harboring premature termination codons (Lynch and Richardson 2002; Lynch and Kewalramani 2003). Intron positions play a guiding role during the recognition of premature termination codons by NMD and, therefore, might have been evolutionarily forced toward a more uniform distribution than expected under a model of random insertion.
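
    The test described above can be sketched as follows: compute the intron occupation rate of each cluster (introns found in the cluster across all genes, divided by the number of genes, following the definition in the legend of figure 2) and correlate it with the number of positions in the cluster. All names and the small matrix are hypothetical toy data, and Pearson's r is used here only as one plausible measure of linear correlation; the text does not state which statistic was applied.

```python
from math import sqrt


def occupation_rate(matrix, cluster_columns):
    """Introns observed in one cluster over all genes, per analyzed gene."""
    found = sum(row[c] for row in matrix.values() for c in cluster_columns)
    return found / len(matrix)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical presence/absence rows (one per gene, one column per position).
toy_matrix = {
    "g1": [1, 1, 0, 1, 0, 1],
    "g2": [0, 1, 0, 1, 1, 0],
    "g3": [1, 0, 1, 0, 1, 0],
    "g4": [0, 0, 1, 1, 0, 1],
}
toy_clusters = [[0], [1, 2], [3, 4, 5]]   # column indices per cluster

sizes = [len(cols) for cols in toy_clusters]
rates = [occupation_rate(toy_matrix, cols) for cols in toy_clusters]
print(sizes, rates, pearson_r(sizes, rates))
```

    If positions within a cluster were alternative states of one homologous intron, the rates would be expected to stay roughly flat as cluster size grows, giving a correlation near zero instead.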

    It appears that a clustered distribution of introns is not specific for the eIF2 gene. Wada et al. (2002) demonstrated at least one similar intron cluster (four different small shifts of their intron position 7) in deuterostome EF-1 genes. A very similar pattern of intron distribution was revealed in the insect chemoreceptor superfamily of Drosophila melanogaster (Robertson, Warr, and Carlson 2003).

    Tree Analysis Based on eIF2 Gene Structure

    We identified at least partial gene structures of 51 out of 55 analyzed eIF2 genes. Seven intronless genes and seven incompletely analyzed gene structures were excluded from the data set (table 1 in Supplementary Material online). An MP tree reconstruction (see Materials and Methods) was performed using the remaining 37 gene structures, represented by a presence/absence matrix including all intron positions (table 2 in Supplementary Material online). The resulting unrooted tree (fig. 3) shows a remarkable phylogenetic information content and supports, for example, the following monophyletic groups: Apicomplexa, Nematoda, Viridiplantae, Angiospermae, Deuterostomia, Homobasidiomycetes, and Pezizomycotina (Ascomycota sensu stricto). Other groupings are clearly spurious, such as the branching of the diatom Thalassiosira with nematodes or the branching of the flatworm Schistosoma with Coelomata. Interestingly, most representatives of Endopterygota (metamorphosing insects) did not group with the other arthropods. The common branching of those other arthropods (Daphnia, Apis, Aphis, and Allacma), Schistosoma, and the represented deuterostomes can be explained by symplesiomorphic (shared ancient) intron positions (44-1, 127-2, 212-1, 394-0, and 451-2) in their eIF2 genes, probably inherited from the last common ancestor of all bilaterians. This finding may be related to results of recent studies (reviewed in Raible and Arendt [2004]), which revealed that human and flatworm genes seem to be closer to the bilaterian roots than are Drosophila and Caenorhabditis genes. That conclusion is based on comparisons of gene content and similarity between orthologs. Our results point to a possibly slower evolution of gene structures in deuterostomes and some flatworms as well. Additionally, arthropods appear to have evolved at different rates in this respect.

    FIG. 3.— MP analysis of presence/absence of introns from 37 eIF2 genes. A consensus tree computed on 65 most-parsimonious trees (requiring 79 changes) is shown, together with selected taxonomic groups that are supported by this analysis. Note that this intron tree contains only incomplete phylogenetic information (see text).

    Examination of other gene structures will show whether successive intron deletion and insertion is roughly as clocklike as sequence evolution. More likely, ancient episodes of general intron loss have erased most or all of the phylogenetically informative intron positions from some lineages. This might have occurred in the Pezizomycotina, which have lost all ancient eIF2 introns and have acquired two novel introns (see below). In this case, we cannot expect a correct phylogenetic reconstruction from intron position data alone, irrespective of the number of gene structure samples. Intron presence/absence trees reflect both phylogeny and lineage-specific modes of intron evolution; they should not be interpreted as typical phylogenetic trees.

    eIF2 Sequence Phylogeny

    Phylogenetic analysis was carried out using BI, ML, MP, and NJ methods and an eIF2 amino acid sequence alignment (see Materials and Methods, and see figure 2 of Supplementary Material online). A nucleotide sequence alignment was deliberately avoided because of the high level of homoplasy expected from coding sequences separated by long divergence times. The aIF2 sequences of the archaeal species Pyrococcus abyssi and Methanococcus jannaschii were used as outgroups. The BI tree, annotated with branching support from the other analyses, is presented (fig. 4). Interesting results of these analyses are (1) the significant support of the Coelomata hypothesis in contrast to the Ecdysozoa hypothesis and (2) the sister-relationship between Daphnia and Allacma. The latter result might support a novel phylogenetic hypothesis (Nardi et al. 2003), according to which hexapods are not monophyletic, and both Collembola (i.e., Allacma) and ectognathian insects evolved independently from crustacean-like arthropods. However, the strongly supported, but evidently untenable, relationship of Lithobius and Strongylocentrotus argues for a cautious interpretation of the eIF2 tree. Combined analyses of eIF2 and other sequence data will deliver more soundly based phylogenies.

    FIG. 4.— Bayesian inference (BI) tree of eIF2 proteins, with groups of interest highlighted. A cladogram as drawn by PAUP* is shown. The posterior probability (BI) is given above each node in percent of trees showing the same topology. The quartet-puzzling value (ML) is given below each supported node. If the same topology is also supported by MP and/or NJ analyses, the corresponding node is marked with a dot (MP) and/or a concentric circle (NJ).

    Phylogenetic Mapping of eIF2 Intron Gains and Losses

    Because all compared single-copy eIF2 genes are certainly orthologous to each other, intron insertion events were traced on the phylogenetic tree (fig. 5). For this purpose, a consensus tree resulting from eIF2 sequence analysis and commonly supported phylogeny was used. Fourteen intron locations appear to be ancestral within eIF2 genes, as determined by their common occurrence in at least two highly divergent lineages of animals, fungi, plants, or protists. Six of these intron positions (18-0, 34-0, 39-0, 189-0, 372-0, and 436-0) are present in no more than four of the analyzed eIF2 genes and in only two of those lineages (table 2 in Supplementary Material online). Those rarely identified, but seemingly ancient, introns might alternatively have been acquired by two relatively recent, parallel intron insertions in two different evolutionary lineages at "proto–splice sites." Such a scenario was anticipated based on the co-occurrence of cryptic and natural splice sites in actin genes of different species (Sadusky, Newman, and Dibb 2004). We therefore excluded those introns (18-0, 34-0, 39-0, 189-0, 372-0, and 436-0) from tracing (fig. 5).
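
    The decision rule described in this paragraph can be summarized as a small classifier. This is only a sketch of the stated criteria; the function name, the category labels, and the example counts are hypothetical, and the real assignment in figure 5 also relied on the consensus tree.

```python
def classify_position(genes_per_lineage, max_rare_genes=4):
    """Classify one intron position from its distribution across main lineages.

    genes_per_lineage: lineage name (animals, fungi, plants, protists)
    mapped to the number of analyzed genes that carry the intron.
    """
    lineages = [name for name, count in genes_per_lineage.items() if count > 0]
    total = sum(genes_per_lineage.values())
    if len(lineages) < 2:
        return "lineage-specific"
    if len(lineages) == 2 and total <= max_rare_genes:
        # Rare and confined to two lineages: possibly two parallel gains.
        return "ambiguous"
    return "ancestral candidate"


# Hypothetical counts, not taken from the supplementary tables.
print(classify_position({"animals": 5, "fungi": 1, "protists": 2}))
print(classify_position({"animals": 1, "plants": 2}))
print(classify_position({"fungi": 5}))
```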

    FIG. 5.— Phylogenetic mapping of eIF2 gene structure changes. Intron gains and losses as well as the Su(var)3-9 insertion in the intron 81-1 are shown. A consensus tree resulting from eIF2 sequence analysis and commonly supported phylogeny was used. Only one intron acquisition at each position is shown, assuming that introns with identical positions are homologous. Putative intron losses are only given for highly parsimonious cases, often connected with a nearby insertion of a novel intron.

    On the other hand, some of the introns that appear taxon-specific might be undetected ancient introns. However, most, if not all, of the other 38 introns were probably gained relatively late within one specific lineage, which is supported by their highly unequal distribution between the lineages (table 2 in Supplementary Material online). They are unlikely to be ancient, because then the original eIF2 gene would have been extraordinarily fragmented by introns, and multiple independent losses must have occurred in multiple different lineages. Twenty-two of those intron positions were found in only one of all analyzed eIF2 genes. More interestingly, 16 other intron positions are shared by two or more lineages as putative synapomorphic (shared derived) characters. The tree analysis of gene-structure data (fig. 3) showed that these intron positions are indeed of high phylogenetic value because they mostly override the impact of probable ancient intron positions, which were lost or gained in parallel in different lineages.

    Cladistic Patterns of Specific Introns

    We further examined whether specific intron positions of eIF2 might be phylogenetically informative. Several cases of successive losses and gains of only slightly different intron positions were documented (fig. 5), resulting in a nested distribution of the evolutionarily newer introns. Such nearby introns cannot coexist in one gene structure for two reasons. First, exons smaller than 50 nt are rare and functionally detrimental (see above) (Hwang and Cohen 1997; Carlo, Sierra, and Berget 2000). Second, co-occurring processes of intron gain and loss were suggested to be driven by a balance between the additional mutational load of intron-containing alleles and selective pressure for an efficient mechanism of NMD, which is provided by a sufficiently tight, overdispersed distribution of introns; that is, exon sizes are more uniform than expected under random insertion (Lynch and Kewalramani 2003). Therefore, such intron changes represent reliable synapomorphic character states.

    The following cases of nested intron distributions are particularly informative. First, intron 212-1 was found in several species of animals, fungi, and protists. The nearby intron position 212-0 was identified only in angiosperm plants and may demarcate a monophyletic group of plants, because intron 212-1 had very likely been lost before intron 212-0 evolved. Second, intron 127-2 was detected in protists, in some animals, and in one fungus (the basidiomycete Cryptococcus). A nesting intron position, only 9 nt away, is 130-2, which was found exclusively in all five analyzed Pezizomycotina species. In this case, we assume that ancient Ascomycota eIF2 genes were intronless, because all other analyzed ascomycete species (Saccharomyces, Candida, Kluyveromyces, Eremothecium, and Schizosaccharomyces) have completely intronless eIF2 genes, and all analyzed Pezizomycotina species contain introns specific to this taxon (130-2 and 460-2). Interestingly, the establishment of spliceosomal introns in formerly intronless genes has already been reported for Pezizomycotina species (Bhattacharya et al. 2000, and references therein). Third, nearly all analyzed eIF2 genes of Coelomata contain the intron 159-1, with the exception of the analyzed Coleoptera, Lepidoptera, and Diptera, which instead contain the taxon-specific intron 160-1 (Anopheles is intronless in this region). In contrast, both the aphid Aphis sambuci and the bee Apis mellifera have the plesiomorphic intron 159-1, as do deuterostomes and remotely related arthropods. Therefore, we propose a nested monophyletic taxon, a group including Diptera, Lepidoptera, and Coleoptera but not Hymenoptera (fig. 5). This novel taxon is additionally supported by the intron 295-0 of Clytus arietis (Coleoptera) and Bombyx mori (Lepidoptera), which is nested in the distribution of the nearby intron 289-0 (found in flatworms, deuterostomes, and several arthropods, including Apis mellifera). Our novel grouping contradicts the commonly supported insect phylogeny, which considers the Coleoptera an outgroup to Hymenoptera+Diptera (Wheeler et al. 2001, and references therein). However, Diptera+Lepidoptera+Coleoptera excluding Hymenoptera was at least supported by Ross (1965) based on morphological arguments.

    The taxonomic distribution of the intron positions 159-1 and 160-1 led us to assume that intron 160-1 might have originated by sliding of the 159-1 intron. Thus, we compared the 5' and 3' splice regions of both introns (fig. 6). We found a significant conservation of some nucleotide positions in the 3' splice sites of both introns, which argues, indeed, for the possibility of intron sliding involving at least three single-nucleotide substitutions. These substitutions do not necessarily need to occur simultaneously because of the implied one-codon shift of both splice sites.

    FIG. 6.— Comparison of intron 159-1 and 160-1 splice regions. Absolute consensus sequences are shown for seven 159-1 splice sites (different Coelomata including Hymenoptera) and six 160-1 splice sites (Diptera, Coleoptera, and Lepidoptera), respectively. Intron sequences are shown in lowercase, and the encoded amino acid sequence is given beneath the exon sequences. Nucleotides conserved between the two consensus sequences are underlined, whereas differences are shown in bold. Note that only three nucleotide exchanges are necessary to shift the 159-1 intron position 3 nt downstream, because the alanine-coding and glycine-coding nucleotide sequences downstream from the 159-1 3' splice site would support the shift.
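
    The distances quoted for neighbouring intron positions follow from simple codon arithmetic. The sketch below assumes that a label such as 159-1 means "codon 159 of the alignment, phase 1", with the phase counting the codon nucleotides that precede the intron; that reading of the labels, and the helper names `parse_position`, `nt_distance`, and `same_phase`, are assumptions introduced for illustration only.

```python
def parse_position(label):
    """Split a 'codon-phase' label such as '159-1' into two integers."""
    codon, phase = (int(part) for part in label.split("-"))
    if phase not in (0, 1, 2):
        raise ValueError(f"phase must be 0, 1, or 2, got {phase}")
    return codon, phase


def nt_distance(a, b):
    """Signed distance in nucleotides from position a to position b.

    Positive values mean that b lies downstream of a.
    """
    codon_a, phase_a = parse_position(a)
    codon_b, phase_b = parse_position(b)
    return 3 * (codon_b - codon_a) + (phase_b - phase_a)


def same_phase(a, b):
    """True if both introns interrupt their codons at the same phase."""
    return parse_position(a)[1] == parse_position(b)[1]


print(nt_distance("159-1", "160-1"))  # 3  (the proposed one-codon slide)
print(nt_distance("127-2", "130-2"))  # 9  (the "only 9 nt away" case)
print(nt_distance("212-1", "212-0"))  # -1 (1 nt upstream)
print(same_phase("159-1", "160-1"))   # True
```

    Under this reading, a slide from 159-1 to 160-1 moves the intron by exactly one codon and leaves its phase unchanged, which is consistent with the one-codon shift of both splice sites discussed above.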

    Of all analyzed intron positions in eIF2, 23-0 and 81-1 appear more robust against erasure than any others. The persistence of the intron 23-0, which was found in nearly all analyzed metazoan genes, may be caused by a regulatory element in this near-promoter intron. The pancrustacean-specific intron 81-1 is even more interesting. During the evolution of eIF2 in insects, a gene copy of Su(var)3-9 was inserted into this intron (Krauss and Reuter 2000). Since this event, both eIF2 and Su(var)3-9 use the same promoter and two or three common exons and, thus, are considered as one fused gene. Synthesis of the two structurally and functionally unrelated proteins is mediated by alternative splicing (Krauss and Reuter 2000). We also identified a Su(var)3-9–specific exon in the 81-1 intron of Apis mellifera, Cercopis vulnerata, and Enallagma cyathigerum (V. Krauss, unpublished data) (fig. 5). Thus, the eIF2 intron 81-1 might have been protected against erasure by the Su(var)3-9–specific exon. The intron itself is clearly older than this gene fusion and was almost certainly established during the early evolution of Pancrustacea. This is supported by (1) the presence of this intron in the woodlouse Oniscus asellus (Malacostraca) and in the springtail Allacma fusca, as well as by (2) the absence of intron 81-1 and instead the presence of the older intron 87-0 in the eIF2 gene of the centipede Lithobius forficatus. The absence of 81-1 in Daphnia (Branchiopoda) may be secondary or, more interestingly, may indicate a sister relationship between Hexapoda and Malacostraca, as already suggested (Burmester 2001).

    Intron Positioning in eIF2 Reveals Insights into Phylogenetics and Modes of Intron Evolution

    Our results support the following hypothesis about the evolutionary history of eIF2 in eukaryotes. Intronless eIF2 genes were probably inherited by unicellular eukaryotes. The first introns might have been acquired during early evolution and passed on to protists, plants, fungi, and animals. Other introns were gained significantly later. The value of these late introns for phylogenetics depends critically on their evolutionary polarization through nearby older introns, which have to be lost before the insertion of the novel introns, except in the case of intron sliding, where intron loss and gain occur simultaneously. As demonstrated by the ultrashort eIF2 exons found in fungal species (16 or 23 nt long, respectively), the detection of phylogenetically nested, synapomorphic introns can be complicated by unusual splicing features. Therefore, the reliability of this intron age classification depends critically (1) on the strong conservation of gene sequence, which makes the secure differentiation from nearby intron positions possible, and (2) on the sampling of gene structures, which has to be as dense as possible. The use of intron positioning for maximum-parsimony tree reconstruction of remotely related eukaryotes was relatively successful, which argues for a high phylogenetic information content of intron positions. This analysis is substantially backed by tree reconstruction based on protein sequences. For instance, both amino acid sequences and intron positions argue against an Ecdysozoa group containing both arthropods and nematodes. The strongly persistent intron 81-1 adds evidence to the Pancrustacea hypothesis. Two parallel, nested intron distributions delivered evidence for a novel monophyletic taxon, a Diptera+Lepidoptera+Coleoptera clade excluding Hymenoptera (fig. 5). Finally, we suggest that further analyses of strongly conserved gene structures will continue to improve the knowledge of both intron evolution and higher-level phylogenetics.
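
    How a single intron position contributes to a parsimony comparison can be illustrated with Fitch's small-parsimony count for one presence/absence character. This is a minimal sketch under stated assumptions, not the PAUP* analysis referred to in the text: the two rooted topologies are deliberately simplified stand-ins for the competing hypotheses, the tree encoding as nested tuples is hypothetical, and gains and losses are treated as unordered, equally weighted changes.

```python
def fitch(tree, states):
    """Fitch small parsimony for one character on a rooted binary tree.

    tree:   a leaf name (str) or a 2-tuple of subtrees
    states: dict mapping each leaf name to its character state
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Presence (1) or absence (0) of intron 160-1; toy coding at order level.
intron_160_1 = {
    "outgroup": 0, "Hymenoptera": 0,
    "Coleoptera": 1, "Lepidoptera": 1, "Diptera": 1,
}

# Simplified topology with Hymenoptera outside the other three orders.
tree_a = ("outgroup", ("Hymenoptera", ("Coleoptera", ("Lepidoptera", "Diptera"))))
# Simplified topology with Coleoptera as outgroup to the remaining orders.
tree_b = ("outgroup", ("Coleoptera", ("Hymenoptera", ("Lepidoptera", "Diptera"))))

print(fitch(tree_a, intron_160_1)[1])  # 1 change
print(fitch(tree_b, intron_160_1)[1])  # 2 changes
```

    In this toy comparison the character requires one change on the first topology and two on the second. A real analysis would sum such counts over all intron positions and search across topologies rather than score two fixed trees.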

    Supplementary Material

    The newly reported sequences of eIF2 from Araneus quadratus, Lithobius forficatus, Daphnia magna, Oniscus asellus, Allacma fusca, Lepismachilis spp., Enallagma cyathigerum, Forficula auricularia, Locusta migratoria, Aphis sambuci, Cercopis vulnerata, Scoliopterix libatrix, and Bombyx mori are deposited in the DDBJ/EMBL/GenBank database (accession numbers AJ290958 and AJ715857 to AJ715871). Supplementary tables and figures are available online at the MBE Web site.

    Acknowledgements

    We would like to thank J. Hebler, C. Gräbsch, R. Kirschner, A. Anton, G. Müller, C. Wierzchacz, I. Patties, and A. Howe for help in sequencing. We gratefully acknowledge the sequencing of the as yet unpublished genomes of Theileria annulata, Toxoplasma gondii, Thalassiosira pseudonana, Chlamydomonas reinhardtii, Dictyostelium discoideum, Phytophthora sojae, Magnaporthe grisea, Aspergillus fumigatus, Coccidioides immitis, Gibberella zeae, Ustilago maydis, Cryptococcus neoformans, Coprinus cinereus, Rhizopus oryzae, Schistosoma mansoni, Schmidtea mediterranea, Brugia malayi, Strongylocentrotus purpuratus, Apis mellifera, Drosophila virilis, and Drosophila pseudoobscura. We would like to thank S. Phalke for critical reading of the manuscript. This work was supported by a grant from the Deutsche Forschungsgemeinschaft to V.K. and H.S.

    References

    Altschul, S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

    Bhattacharya, D., F. Lutzoni, V. Reeb, D. Simon, J. Nason, and F. Fernandez. 2000. Widespread occurrence of spliceosomal introns in the rDNA. Mol. Biol. Evol. 17:1971–1984.

    Brady, S. G., and B. N. Danforth. 2004. Recent intron gain in elongation factor-1alpha of colletid bees (Hymenoptera: Colletidae). Mol. Biol. Evol. 21:691–696.

    Burmester, T. 2001. Molecular evolution of the arthropod hemocyanin superfamily. Mol. Biol. Evol. 18:184–195.

    Carlo, T., R. Sierra, and S. M. Berget. 2000. A 5' splice site-proximal enhancer binds SF1 and activates exon bridging of a microexon. Mol. Cell Biol. 20:3988–3995.

    Deutsch, M., and M. Long. 1999. Intron-exon structures of eukaryotic model organisms. Nucleic Acids Res. 27:3219–3228.

    Dibb, N. J., and A. J. Newman. 1989. Evidence that introns arose at proto-splice sites. EMBO J. 8:2015–2021.

    Ehrmann, I. E., P. S. Ellis, S. Mazeyrat et al. (12 co-authors). 1998. Characterization of genes encoding translation initiation factor eIF-2gamma in mouse and human: sex chromosome localization, escape from X-inactivation and evolution. Hum. Mol. Genet. 7:1725–1737.

    Fedorov, A., A. F. Merican, and W. Gilbert. 2002. Large-scale comparison of intron positions among animal, plant, and fungal genes. Proc. Natl. Acad. Sci. USA 99:16128–16133.

    Feiber, A. L., J. Rangarajan, and J. C. Vaughn. 2002. The evolution of single-copy Drosophila nuclear 4f-rnp genes: spliceosomal intron losses create polymorphic alleles. J. Mol. Evol. 55:401–413.

    Gilbert, W. 1987. The exon theory of genes. Cold Spring Harb. Symp. Quant. Biol. 52:901–905.

    Hwang, D. Y., and J. B. Cohen. 1997. U1 small nuclear RNA-promoted exon selection requires a minimal distance between the position of U1 binding and the 3' splice site across the exon. Mol. Cell Biol. 17:7099–7107.

    Kapp, L. D., and J. R. Lorsch. 2004. GTP-dependent recognition of the methionine moiety on initiator tRNA by translation factor eIF2. J. Mol. Biol. 335:923–936.

    Krauss, V., and G. Reuter. 2000. Two genes become one: the genes encoding heterochromatin protein Su(var)3-9 and translation initiation factor subunit eIF-2gamma are joined to a dicistronic unit in holometabolic insects. Genetics 156:1157–1167.

    Krzywinski, J., and N. J. Besansky. 2002. Frequent intron loss in the white gene: a cautionary tale for phylogeneticists. Mol. Biol. Evol. 19:362–366.

    Logsdon, J. M., A. Stoltzfus, and W. F. Doolittle. 1998. Molecular evolution: recent cases of spliceosomal intron gain? Curr. Biol. 8:R560–R563.

    Lynch, M., and A. Kewalramani. 2003. Messenger RNA surveillance and the evolutionary proliferation of introns. Mol. Biol. Evol. 20:563–571.

    Lynch, M., and A. O. Richardson. 2002. The evolution of spliceosomal introns. Curr. Opin. Genet. Dev. 12:701–710.

    Mazeyrat, S., N. Saut, V. Grigoriev, S. K. Mahadevaiah, O. A. Ojarikre, A. Rattigan, C. Bishop, E. M. Eicher, M. J. Mitchell, and P. S. Burgoyne. 2001. A Y-encoded subunit of the translation initiation factor Eif2 is essential for mouse spermatogenesis. Nat. Genet. 29:49–53.

    Nardi, F., G. Spinsanti, J. L. Boore, A. Carapelli, R. Dallai, and F. Frati. 2003. Hexapod origins: monophyletic or paraphyletic? Science 299:1887–1889.

    Raible, F., and D. Arendt. 2004. Metazoan evolution: some animals are more equal than others. Curr. Biol. 14:R106–R108.

    Robertson, H. M., C. G. Warr, and J. R. Carlson. 2003. Molecular evolution of the insect chemoreceptor gene superfamily in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 100 (suppl 2):14537–14542.

    Rogozin, I. B., J. Lyons-Weiler, and E. V. Koonin. 2000. Intron sliding in conserved gene families. Trends Genet. 16:430–432.

    Rogozin, I. B., Y. I. Wolf, A. V. Sorokin, B. G. Mirkin, and E. V. Koonin. 2003. Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr. Biol. 13:1512–1517.

    Rokas, A., and P. W. H. Holland. 2000. Rare genomic changes as a tool for phylogenetics. Trends Ecol. Evol. 15:454–459.

    Roll-Mecak, A., P. Alone, C. Cao, T. E. Dever, and S. K. Burley. 2004. X-ray structure of translation initiation factor eIF2gamma: implications for tRNA and eIF2alpha binding. J. Biol. Chem. 279:10634–10642.

    Ronquist, F., and J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.

    Ross, H. H. 1965. A textbook of entomology. 3rd edition. Wiley, New York.

    Roy, S. W., A. Fedorov, and W. Gilbert. 2003. Large-scale comparison of intron positions in mammalian genes shows intron loss but no gain. Proc. Natl. Acad. Sci. USA 100:7158–7162.

    Rzhetsky, A., F. J. Ayala, L. C. Hsu, C. Chang, and A. Yoshida. 1997. Exon/intron structure of aldehyde dehydrogenase genes supports the "introns-late" theory. Proc. Natl. Acad. Sci. USA 94:6820–6825.

    Sadusky, T., A. J. Newman, and N. J. Dibb. 2004. Exon junction sequences as cryptic splice sites: implications for intron origin. Curr. Biol. 14:505–509.

    Schmitt, E., S. Blanquet, and Y. Mechulam. 2002. The large subunit of initiation factor aIF2 is a close structural homologue of elongation factors. EMBO J. 21:1821–1832.

    Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2001. TREE-PUZZLE 5.0. Maximum likelihood analysis for nucleotide, amino acid, and two-state data. http://www.tree-puzzle.de/

    Stoltzfus, A., J. M. Logsdon, J. D. Palmer, and W. F. Doolittle. 1997. Intron "sliding" and the diversity of intron positions. Proc. Natl. Acad. Sci. USA 94:10739–10744.

    Swofford, D. L. 2002. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4.0b10. Sinauer Associates, Sunderland, Mass.

    Van de Peer, Y., S. L. Baldauf, W. F. Doolittle, and A. Meyer. 2000. An updated and comprehensive rRNA phylogeny of (crown) eukaryotes based on rate-calibrated evolutionary distances. J. Mol. Evol. 51:565–576.

    Wada, H., M. Kobayashi, R. Sato, N. Satoh, H. Miyasaka, and Y. Shirayama. 2002. Dynamic insertion-deletion of introns in deuterostome EF-1alpha genes. J. Mol. Evol. 54:118–128.

    Wheeler, W. C., M. Whiting, Q. D. Wheeler, and J. M. Carpenter. 2001. The phylogeny of the extant hexapod orders. Cladistics 17:113–169.

    Wolf, Y. I., I. B. Rogozin, and E. V. Koonin. 2004. Coelomata and not Ecdysozoa: evidence from genome-wide phylogenetic analysis. Genome Res. 14:29–36.

    (Veiko Krauss, Marek Pecyn)