Molecular Biology and Evolution, 2004, Issue 11
Retroelement Dynamics and a Novel Type of Chordate Retrovirus-like Element in the Miniature Genome of the Tunicate Oikopleura dioica
     * Biofuture Research Group, Physiologische Chemie I, Biozentrum, University of Würzburg, am Hubland, Würzburg, Germany; Institute for Molecular Genetics, Berlin, Germany; and Sars Centre for Marine Molecular Biology, Bergen High Technology Centre, Bergen, Norway

    E-mail: volff@biozentrum.uni-wuerzburg.de.

    Abstract

    Retrotransposable elements have played an important role in shaping eukaryotic DNA, and their activity and turnover rate directly influence the size of genomes. With approximately 15,000 genes within 65–75 megabases, the marine tunicate Oikopleura dioica, a nonvertebrate chordate, has the smallest and most compact genome found in animals so far. Consistent with a massive elimination of retroelements, only one apparently novel clade of non–long terminal repeat (non-LTR) retrotransposons was detected within 41 megabases of nonredundant genomic sequence. In contrast, at least six clades of non-LTR elements were identified in the less compact genome of the tunicate Ciona intestinalis. Unexpectedly, Ty3/gypsy-related Tor LTR retrotransposons showed an astonishing level of diversity in O. dioica. Most Tor copies carried few or no open reading frame–disrupting mutations, indicating recent activity. Both Tor3 and Tor4b families bore an envelope-like open reading frame, suggesting possible horizontal acquisition through infection. The Tor4b envelope-like gene might have been obtained from a paramyxovirus (RNA virus). Tor3 and Tor4b are phylogenetically clearly distinct from vertebrate retroviruses (Retroviridae) and are more reminiscent of certain insect and plant sequences. Tor elements potentially represent a so far unknown, ancient type of infectious retroelement in chordates. Their distribution and transmission dynamics in tunicates and other chordates deserve further study.

    Key Words: Oikopleura · retroelement · retrovirus · compact genome · reverse transcriptase · envelope

    Introduction

    An important part of eukaryotic genomes has been generated by the copying of RNA molecules into DNA through a process called reverse transcription (Brosius 1999, 2003; Boeke 2003). For example, at least 40% of the human genome consists of retroposed sequences, and even more extreme situations have been described in plants and other organisms (Kazazian and Moran 1998; SanMiguel et al. 1998; International Human Genome Sequencing Consortium 2001; Liu et al. 2003). Reverse transcription is generally catalyzed in vivo by reverse transcriptases encoded by autonomous endogenous retrotransposons. A simple classification, based on structural and mechanistic features and supported by the molecular phylogeny of reverse transcriptase sequences, makes a distinction between autonomous retroelements with long terminal repeats (LTR retrotransposons sensu lato, including all known retroviruses), non-LTR retrotransposons (also called LINEs), and Penelope-like elements (Volff, Hornung, and Schartl 2001; Eickbush and Malik 2002; Arkhipova et al. 2003). Short interspersed nuclear elements (SINEs) and other categories of nonautonomous sequences can be mobilized in trans, sometimes very efficiently, by the retrotransposition machinery of reverse transcriptase-encoding retroelements (Esnault, Maestre, and Heidmann 2000; Kajikawa and Okada 2002; Dewannieux, Esnault, and Heidmann 2003). Some retrotransposons can also retrotranspose 3' flanking genomic sequences (Moran, DeBerardinis, and Kazazian 1999; Pickeral et al. 2000).

    The impact of retrotransposons on genome size depends on both their rate of retrotransposition and their frequency of elimination (Petrov 2001; Kidwell 2002). Retrotransposition efficiency is determined by the number of retrotransposition-competent elements in the genome and by their level of activity. Genomes and transposable elements have both developed strategies to control transposition (Ketting et al. 1999; Jensen, Gassama, and Heidmann 1999; Tabara et al. 1999; Bestor 2003; Sundararajan, Lee, and Garfinkel 2003), which has allowed their coexistence over extremely long evolutionary periods (Burke et al. 1998). Retrotransposable elements can be eliminated either through various molecular mechanisms or by natural selection against individual insertions, against genomic rearrangements mediated by ectopic homologous recombination between nonallelic copies, and/or against retrotransposition itself (Eickbush and Furano 2002 and references therein). Finally, a larger or a smaller genome might be adaptive depending on circumstances (Petrov 2001), with natural selection then acting on the balance between retrotransposition and elimination.
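    To make the balance between retrotransposition and elimination concrete, the following toy model (our illustration, not a model used in this study) treats each retroelement copy as producing new copies at a constant per-generation rate u and being lost at a constant rate v. The parameter values, the 60 Mb non-repetitive core, and the 6 kb element length are all hypothetical.

```python
"""Toy deterministic model of retroelement copy number and genome size.

All parameter values are hypothetical and chosen only for illustration.
"""


def simulate_copy_number(n0: float, u: float, v: float, generations: int) -> list[float]:
    """Return copy numbers per generation under N(t+1) = N(t) * (1 + u - v).

    u: per-copy retrotransposition rate per generation (assumed constant)
    v: per-copy elimination rate per generation (assumed constant)
    """
    counts = [float(n0)]
    for _ in range(generations):
        counts.append(counts[-1] * (1.0 + u - v))
    return counts


def genome_size_mb(core_mb: float, copies: float, element_kb: float) -> float:
    """Genome size = non-repetitive core + DNA contributed by retroelement copies."""
    return core_mb + copies * element_kb / 1000.0


if __name__ == "__main__":
    # Same starting point, opposite balance between insertion and elimination.
    for label, u, v in [("insertion > elimination", 0.02, 0.01),
                        ("elimination > insertion", 0.01, 0.02)]:
        final = simulate_copy_number(n0=100, u=u, v=v, generations=100)[-1]
        print(f"{label}: {final:.0f} copies, {genome_size_mb(60.0, final, 6.0):.2f} Mb")
```

    Copy number, and with it genome size, grows when u exceeds v and shrinks when v exceeds u; in real genomes both rates vary among element families and over time.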

    Distinct organisms can respond to retrotransposons in different ways, possibly as a result of differing evolutionary constraints (Petrov 2001; Eickbush and Furano 2002). Species with a compact genome are of particular interest for exploring the sequences, molecular mechanisms, and selective pressures involved in genome size evolution. For example, the genomes of the smooth pufferfishes Takifugu rubripes (Fugu) and Tetraodon nigroviridis, which are about one-eighth the size of the human genome, are the most compact genomes described to date in vertebrates (Brenner et al. 1993; Crollius et al. 2000; Aparicio et al. 2002). Smooth pufferfish genomes contain a low percentage of repetitive DNA but, surprisingly, a high diversity of autonomous retrotransposable elements, some of which have been recently active (Crollius et al. 2000; Aparicio et al. 2002; Bouneau et al. 2003; Neafsey and Palumbi 2003; Volff et al. 2003). We now analyze the evolutionary behavior of retrotransposons in the genome of the tunicate Oikopleura dioica, which represents an even more extreme case of compaction.

    Chordates are divided into urochordates (tunicates), cephalochordates, and vertebrates. Urochordates such as Oikopleura dioica (larvacean) and the sea squirt Ciona intestinalis (ascidian) form the sister group of cephalochordates and vertebrates and are therefore instrumental for comparisons of chordates with nonchordates. Of particular importance for this study, O. dioica has the smallest animal genome reported so far (65–75 megabases [Mb]), in which partial sequencing has revealed a strong compaction of all noncoding regions (Seo et al. 2001). Gene density in O. dioica is approximately two to three times higher than in Ciona intestinalis (which has a similar set of genes; Dehal et al. 2002) and Takifugu rubripes (Aparicio et al. 2002), and about 20 times higher than in the human genome (International Human Genome Sequencing Consortium 2001). Whereas most of the C. intestinalis genome has recently been sequenced and assembled (Dehal et al. 2002), we have generated 41 Mb of nonredundant sequence data from O. dioica through whole-genome shotgun sequencing. Alignment of this data set with the sequences of twenty BAC inserts and 2,000 expressed sequence tags showed a very dense and uniform coverage of the genome, interrupted by numerous small gaps (data not shown).
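    The gene-density comparisons above are simple ratios of gene number to genome size. The sketch below reproduces the order of magnitude of the O. dioica versus human comparison. The O. dioica figures come from the text (about 15,000 genes; 70 Mb is taken as a midpoint of 65–75 Mb), whereas the human figures (about 30,000 genes in about 3,000 Mb) are assumed round numbers used only for illustration.

```python
def genes_per_mb(gene_count: int, genome_mb: float) -> float:
    """Gene density in genes per megabase."""
    return gene_count / genome_mb


# O. dioica: ~15,000 genes in 65-75 Mb (midpoint of 70 Mb used here).
oikopleura = genes_per_mb(15_000, 70.0)
# Human: assumed round figures, for illustration only.
human = genes_per_mb(30_000, 3_000.0)

print(f"O. dioica: {oikopleura:.0f} genes/Mb")
print(f"human:     {human:.0f} genes/Mb")
print(f"ratio:     {oikopleura / human:.1f}x")
```

    With these inputs the ratio is about 21, in line with the "about 20 times" stated above; other gene-count estimates shift the ratio somewhat but not its order of magnitude.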

    Here we report a detailed analysis of the reverse transcriptase retrotransposon complement with respect to the degree of genome compaction in chordates, using genomic information from Ciona, Oikopleura, and vertebrates. In addition, this study was expected to provide new hints concerning the evolution of retrotransposable elements near the transitions between chordates and nonchordates, on the one hand, and between invertebrate chordates and vertebrates on the other hand. A particularly interesting question concerns the possible presence and nature of (endogenous) retroviruses in Oikopleura, since vertebrate retroviruses are phylogenetically clearly distinct from those found in insects, nematodes, and plants.

    Materials and Methods

    Genome Sequencing and Assembly

    Briefly, sperm DNA was sonicated, size-fractionated to about 2 kilobases, and ligated into the pUC19 vector. Approximately 180,000 plasmid insert ends were sequenced. Only high-quality portions of reads were retained, giving a rather short average read length of 422 bp. Reads sharing 50 bp of sequence identity were assembled, yielding a data set of 44,797 contigs representing 40,983 kilobases (about 41 Mb) of nonredundant sequence. Alignment of 800 nonredundant expressed sequence tags (ESTs) with this shotgun data set showed an average coverage of 65%: 83% of ESTs were matched over more than one-quarter of their length and 38% over at least three-quarters of it. Sequence data have been submitted to GenBank/EMBL under accession numbers AY634216–AY634229.
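    The assembly rule above, joining reads that share 50 bp of identity, can be illustrated with a toy greedy merger. This is not the assembler used for the O. dioica data, which the text does not name. The sketch assumes error-free reads from a single strand, reads "50 bp identity" as a minimum exact suffix-prefix overlap, and ignores reverse complements and reads contained within other reads.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest exact suffix of `left` equal to a prefix of `right`.

    Returns 0 if no such overlap of at least `min_overlap` exists.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if k > 0 and left.endswith(right[:k]):
            return k
    return 0


def greedy_merge(reads: list[str], min_overlap: int = 50) -> list[str]:
    """Repeatedly merge the pair of reads with the longest exact overlap.

    Toy assumptions: error-free reads, one strand, no contained reads.
    """
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    k = overlap_length(left, right, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # no remaining pair overlaps by min_overlap or more
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
```

    The quadratic pairwise search is fine for a handful of reads; production assemblers index overlaps and tolerate sequencing errors instead.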

    Sequence Analysis

    Sequence analysis was performed using the GCG Wisconsin package (Version 10.3, Accelrys Inc., San Diego, Calif.). Multiple sequence alignments were generated using PileUp from the GCG Wisconsin package and ClustalX (Thompson et al. 1997). Phylogenetic analyses were performed on amino acid alignments using likelihood-based Bayesian inference with MrBayes v3.0b4 (Ronquist and Huelsenbeck 2003), the Neighbor-Joining method (Saitou and Nei 1987; 1,000 bootstrap pseudosamples) as implemented in PAUP* (Swofford 1998), and quartet-based maximum likelihood with Tree-Puzzle (Schmidt et al. 2002). No evidence of artifacts due to long-branch attraction was detected. BLAST analyses were mostly performed on sequence databases accessible from the National Center for Biotechnology Information (NCBI) server (www.ncbi.nlm.nih.gov/BLAST/).
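    Of the three tree-building approaches listed above, Neighbor-Joining is the simplest to sketch. The function below is a minimal textbook implementation of the Saitou and Nei (1987) algorithm in its common Q-criterion formulation, assuming a precomputed distance matrix. It does not reproduce PAUP*, the amino acid distance correction, or the 1,000 bootstrap pseudosamples, which would repeat the analysis on alignments resampled column by column.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Build an unrooted Neighbor-Joining tree and return it in Newick format.

    `dist` is a symmetric matrix of pairwise distances with zeros on the diagonal.
    """
    if len(names) < 3:
        raise ValueError("Neighbor-Joining needs at least three taxa")
    nodes = list(names)
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}

    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to the joined pair.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.3f},{b}:{lb:.3f})"
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]

    # Resolve the last three nodes as a trifurcation (unrooted tree).
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"
```

    On additive distances, such as those of a four-taxon tree ((A:1,B:2):1,(C:3,D:4)), the function recovers the original topology and branch lengths.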

    Other Web sites used were: the server for the prediction of signal peptides (http://bioinformatics.leeds.ac.uk/prot_analysis/Signal.html); the Takifugu rubripes Blast server at the U.K. Human Genome Mapping Project Resource Centre, Hinxton, Cambridge (http://fugu.hgmp.mrc.ac.uk/blast); the protein and nucleic acid sequence motif search server of the GenomeNet Database Service at the Bioinformatics Center, Institute for Chemical Research, Kyoto University (http://motif.genome.ad.jp); the Pfam database of protein families at Washington University in St. Louis (http://pfam.wustl.edu/index.html); the Ciona intestinalis Blast server at NCBI (www.ncbi.nlm.nih.gov/BLAST/Genome/cionaWGSBlast.html); the conserved protein domain search server at NCBI (www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi); a server for the prediction of coiled-coils in proteins (www.russell.embl.de/cgi-bin/coils-svr.pl); a server for the prediction of transmembrane domains in proteins (www.sbc.su.se/miklos/DAS/); and the Simple Modular Architecture Research Tool (SMART) server at the EMBL (Heidelberg; http://smart.embl-heidelberg.de).

    Results

    Paucity of Non-LTR Retrotransposons in the Genome of O. dioica

    We searched for the presence of reverse transcriptase retrotransposons in different sequence databases including those of both urochordate genomes using the TBlastN program (protein queries against six-frame translations of a nucleotide database; Altschul et al. 1990). At least 14 major clades of non-LTR retrotransposons have been described so far in eukaryotes (Malik, Burke, and Eickbush 1999; Lovsin, Gubensek, and Kordis 2001; Eickbush and Malik 2002). Clades contain elements that are grouped together with ample phylogenetic support and date back to the Precambrian era, i.e., clades are at least 600 Myr old (Malik, Burke, and Eickbush 1999; Volff et al. 2003). Strikingly, none of the known clades of non-LTR retrotransposons could be identified in O. dioica, whereas at least six of them were clearly detected in C. intestinalis (table 1 and fig. 1; see also Simmen and Bird 2000; Permanyer, Gonzalez-Duarte, and Albalat 2003; Kojima and Fujiwara 2004 for C. intestinalis retrotransposons). These C. intestinalis clades included the LOA and R2 clades, which thus far have not been found outside arthropods. As observed in arthropods, R2 retrotransposons are also integrated in 28S rDNA in C. intestinalis (Kojima and Fujiwara 2004); 28S ribosomal RNA genes were identified in the O. dioica database, but they were not associated with any R2 sequence. The four other clades (NeSL, L1, I, and L2) were present in nonchordates, C. intestinalis, and vertebrates but were not detected in Oikopleura. These observations suggested extinction, or at least strong copy number reduction, of major clades of non-LTR retrotransposons in the Oikopleura lineage.
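    TBlastN compares a protein query with all six conceptual translations (three reading frames on each strand) of a nucleotide database. The sketch below shows only that translation step, assuming the standard genetic code and writing X for codons containing ambiguous bases; it does not perform the alignment or scoring that BLAST carries out.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; stop codons become '*', unknown codons 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict[str, str]:
    """Translations of frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    forward, reverse = seq.upper(), reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

    For example, six_frame_translations("ATGAAATTTGGG") gives "MKFG" in frame +1 and "PKFH" in frame -1.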

    Table 1 Retrotransposable Elements in Nonchordate and Chordate Species

    FIG. 1.— Phylogenetic analysis of the reverse transcriptase domain of non-LTR retrotransposons. Phylogeny was performed according to Malik, Burke, and Eickbush (1999) and Volff et al. (1999) with an alignment of 543 amino acids using the Neighbor-Joining method (Saitou and Nei 1987; bootstrap values are given for 1,000 pseudosamples). The tree is rooted on the CRE clade (Malik, Burke, and Eickbush 1999). Branches with less than 50% support have been collapsed. Asterisks show clades encoding a restriction enzyme-like endonuclease; other clades encode an apurinic-apyrimidinic endonuclease. Arrows indicate elements identified in the tunicate Ciona intestinalis. Accession numbers: Babar Oryzias latipes, BAB83841; BfCR1 Branchiostoma floridae, AAL40415; Bgr-like Ciona intestinalis, AABS01001398; Bgr-like Danio rerio, BX323466; Bilbo Drosophila subobscura, AAB92389; CgT1 Glomerella cingulata, S72620; CR1 Gallus gallus, AAC60281; CRE2 Crithidia fasciculata, S58380; Dong Bombyx mori, AAA92147; DRE Dictyostelium discoideum, S20106; DER-like Ciona intestinalis, AABS01001700; GilM Giardia intestinalis, AF433875; I Drosophila melanogaster, AAA70222; Ingi Trypanosoma brucei, CAD21861; Jockey Drosophila melanogaster, P21328; Lian Aedes aegypti, T30319; LOA-like Ciona intestinalis, AABS01000398; L2-like Ciona intestinalis, AABS01001177; Maui Takifugu rubripes, AF086712; NeSL1 Caenorhabditis elegans, T25352; NeSL-like Ciona intestinalis, BAC82601; NLR1Cth Chironomus thummi, AAB26437; Q Anopheles gambiae, T43020; L1 Homo sapiens, B28096; L1-like Ciona intestinalis, AABS01000070; R1Dm Drosophila melanogaster, S09111; R2 Bombyx mori, T18197; R2-like Ciona intestinalis, AABS01001743; Rex1 Xiphophorus maculatus, AY298859; Rex1-like Calliactis parasitica, AF221986; Rex3 Xiphophorus maculatus, AY298859; Rex6 Oryzias latipes, AB021490; RT2 Anopheles gambiae, AAA29365; Rte1 Caenorhabditis elegans, NP_498529; Sam Caenorhabditis elegans, NP_500227; Samurai Bombyx mori, AB055391; Tad1 Neurospora crassa, AAA21792; TX1 Xenopus laevis, P14381; Zebulon Tetraodon nigroviridis, AAN12398.

    Odin, a Novel Family of Non-LTR Retrotransposons in the Genome of O. dioica

    Nevertheless, reverse transcriptase sequences showing preferential similarity to sequences from non-LTR retrotransposons were detected in the O. dioica database (fig. 2). These sequences, which form a single family containing divergent elements, were named Odin (for Oikopleura dioica non-LTR; figs. 1–3). Phylogenetic analysis based on reverse transcriptase sequences alone failed to detect any preferential relationship with particular known clades of non-LTR retrotransposons encoding either a restriction enzyme-like endonuclease (Yang, Malik, and Eickbush 1999) or an apurinic-apyrimidinic-like endonuclease (Feng et al. 1996; fig. 1). The sequences available in the O. dioica database did not allow reconstruction of a complete Odin retrotransposon, and its 5' and 3' ends could not be determined unambiguously (mainly because of the general state of decay of the copies and their low similarity outside the coding region). Nevertheless, partial reconstruction extended the Odin open reading frame approximately 3.5 kb upstream and 1 kb downstream of the reverse transcriptase domain. This revealed that Odin also encodes an apurinic-apyrimidinic endonuclease with all the classical conserved amino acid residues (fig. 2; Feng et al. 1996). Neither RNase H nor protease domains were detected. No evidence was found for an additional upstream open reading frame (called orf1 in some other non-LTR retrotransposons) encoding a putative nucleic acid-binding protein, and no obvious nucleic acid-binding motif was present in the conceptual translation product of the sequence upstream of the reverse transcriptase region. In contrast, an apparently conserved HHCC (histidine-histidine-cysteine-cysteine) motif was present in the C-terminal region of the putative Odin translation product and might correspond to a zinc finger with nucleic acid-binding properties (fig. 2).
Putative zinc fingers, generally as a CCHH motif, are frequently found at equivalent positions in other non-LTR retrotransposon proteins (Eickbush and Malik 2002 and references therein). The HHCC motif found in Odin might be more reminiscent of the HHCC zinc finger of the integrase domain of some LTR retrotransposons and vertebrate retroviruses (Craigie 2002).

    FIG. 2.— Conceptual amino acid sequence of the apurinic/apyrimidinic-like endonuclease, reverse transcriptase, and C-terminal domains of Odin non-LTR retrotransposons from Oikopleura dioica. Identical residues are in black, conservative substitutions in gray. Conserved residues in non-LTR retrotransposon apurinic/apyrimidinic endonucleases (Feng et al. 1996) and reverse transcriptases (Malik, Burke, and Eickbush 1999) as well as putative conserved cysteine and histidine residues in the C-terminal domain of Odin conceptual proteins are shown by asterisks. The minimal and maximal size of intervening regions between the different domains is given in amino acids (aa) for Odin. Accession numbers are given in the legend of figure 1.

    FIG. 3.— Phylogenetic analysis of concatenated apurinic/apyrimidinic endonuclease and reverse transcriptase sequences of non-LTR retrotransposons. Phylogenetic analyses were performed with an alignment of 722 amino acids using likelihood-based Bayesian inference with MrBayes v3.0b4 (first confidence values; Ronquist and Huelsenbeck 2003), Neighbor-Joining (second confidence values; Saitou and Nei 1987; bootstrap values are given for 1,000 pseudosamples), and quartet-based maximum likelihood with Tree-Puzzle (third confidence values; Schmidt et al. 2002). The tree shown was obtained by Bayesian inference and rooted on the L1 clade according to Eickbush and Malik (2002). Branches with less than 50% support by Bayesian inference have been collapsed. Accession numbers are given in the legend of figure 1.

    About 80 Odin reverse transcriptase sequences were detected by TBlastN analysis in the 41 Mb genome assembly of O. dioica using a 335 amino acid reverse transcriptase sequence as a query (we cannot exclude that some nonoverlapping short hits belong to the same partially sequenced element). About 30% of Odin endonuclease- and reverse transcriptase-encoding sequences were corrupted by stop codons or frameshift mutations, suggesting that a substantial proportion of Odin elements are not functional. Nucleotide sequence identity between Odin elements ranged from less than 60% up to 90%. The low nucleotide identity between certain copies indicates that Odin is an ancient family of non-LTR retrotransposons in the genome of O. dioica.
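    Pairwise nucleotide identities such as the "less than 60% up to 90%" range reported for Odin are computed from aligned sequences. The text does not state how alignment gaps were handled; the sketch below assumes one common convention, excluding gapped columns, and reports the minimum and maximum over all pairs of copies.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences, ignoring gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # one of several possible gap conventions
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def identity_range(aligned_copies: list[str]) -> tuple[float, float]:
    """Minimum and maximum pairwise identity among aligned element copies."""
    values = [percent_identity(a, b) for a, b in combinations(aligned_copies, 2)]
    return min(values), max(values)
```

    Counting gapped columns as mismatches instead would lower the values for indel-rich copies, so the convention should be reported alongside the numbers.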

    To improve phylogenetic resolution, the apurinic-apyrimidinic endonuclease and reverse transcriptase amino acid sequences were concatenated and submitted to different types of analysis (fig. 3). This revealed phylogenetic relationships between some clades that were not resolved using reverse transcriptase sequences alone (fig. 1). All methods indicated that Odin is more closely related to the evolutionarily younger clades of non-LTR retrotransposons (Jockey, I, LOA, Tad1, R1, CR1, Rex1, and L2) than to the more divergent clades (L1 and Rte; Malik, Burke, and Eickbush 1999; Eickbush and Malik 2002). This agreed with Blast searches of public databases using Odin sequences as queries, which always showed greater similarity to the evolutionarily younger clades than to the L1 and Rte clades (data not shown). On the other hand, none of the methods revealed any preferential phylogenetic relationship between Odin and a particular clade of non-LTR retrotransposons (fig. 3). Although we cannot exclude that Odin belongs to a known clade and that our methodology failed to detect this relationship, our results suggest that Odin is the first known member of a new clade of non-LTR retrotransposons.
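    Concatenating the endonuclease and reverse transcriptase alignments amounts to joining, for each element, its two aligned domain sequences end to end. The helper below is an illustrative sketch: it assumes each alignment is a dictionary keyed by element name and pads elements missing from one alignment with "?", a common missing-data symbol. The taxon names and sequences in the test are hypothetical toy values.

```python
def concatenate_alignments(*alignments: dict[str, str], missing: str = "?") -> dict[str, str]:
    """Join per-domain alignments (taxon -> aligned sequence) into one supermatrix.

    Taxa absent from an alignment are padded with `missing` characters.
    """
    taxa = sorted(set().union(*alignments))
    combined = {taxon: "" for taxon in taxa}
    for alignment in alignments:
        widths = {len(row) for row in alignment.values()}
        if len(widths) != 1:
            raise ValueError("rows within an alignment must have equal length")
        width = widths.pop()
        for taxon in taxa:
            combined[taxon] += alignment.get(taxon, missing * width)
    return combined
```

    Padding rather than dropping incomplete taxa keeps every element in the analysis, at the cost of missing data in one block.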

    Diversity and Probable Recent Activity of Ty3/gypsy LTR Retrotransposons in the Genome of O. dioica

    An unexpected variety of LTR retrotransposons from the Ty3/gypsy group was identified in the genome of O. dioica. They were classified into four major families, named Tor1 to Tor4 (for Ty3/gypsy Oikopleura retrotransposons), according to their phylogenetic position (figs. 4–5). Tor elements were very diverse and fairly divergent from the retrotransposons of other organisms (fig. 5). In particular, none of the four clades of Ty3/gypsy elements identified in the tunicate C. intestinalis (Simmen and Bird 2000; Goodwin and Poulter 2002; this study) was closely related to the Tor retrotransposons of O. dioica.

    FIG. 4.— Structure of Oikopleura dioica Ty3/Gypsy Tor LTR retrotransposons. Dashed lines indicate LTRs whose size could not be determined unambiguously and regions where the available sequences did not allow identification of the gag gene. The amino acid sequence and copy number of the different putative zinc finger domains encoded by the gag genes are shown. Predicted domains: cc, coiled-coils (predicted at www.russell.embl.de/cgi-bin/coils-svr.pl); tm, transmembrane domains (predicted at www.sbc.su.se/miklos/DAS/); sp, signal peptide (predicted at http://bioinformatics.leeds.ac.uk/prot_analysis/Signal.html). INT, integrase; PR, protease; RH, RNase H; RT, reverse transcriptase.

    FIG. 5.— Phylogenetic analysis of Ty3/Gypsy LTR retrotransposons. Phylogeny was performed with a 964 amino-acid alignment of concatenated reverse transcriptase and integrase sequences (Malik and Eickbush 1999) using the Neighbor-Joining method (Saitou and Nei 1987; bootstrap values are given for 1,000 pseudosamples). This topology was also well supported by likelihood-based analyses (Bayesian inference with MrBayes v3.0b4, Ronquist and Huelsenbeck 2003; and quartet-based analysis with Tree-Puzzle, Schmidt et al. 2002; data not shown). Branches with less than 50% support have been collapsed. Arrows indicate elements identified in the Ciona intestinalis genome database. Cigr3, a close relative of Cigr2 (Goodwin and Poulter 2002), is not shown. Accession numbers: Blastopia Drosophila melanogaster, AAK84933; Cer1 Caenorhabditis elegans, NP_498959; Cigr1 Ciona intestinalis, T31657; Cigr2 Ciona intestinalis, AABS01000095; Cigr2-like Danio rerio, BX248500; Cigr4 Ciona intestinalis, AABS01002378; Cigr4-like Takifugu rubripes, M002339 (HGMP); Cigr5 Ciona intestinalis, AABS01001248; Cigr5-like Takifugu rubripes, M000867 (HGMP); Gypsy Drosophila melanogaster, AAC82604; Human immunodeficiency virus 1, CAB89144; Mag Bombyx mori, S08405; Mdg1 Drosophila melanogaster, S70430; Mdg3 Drosophila melanogaster, T13798; Osvaldo Drosophila buzzatii, CAB39733; Rous sarcoma virus, P03354; Sushi Takifugu rubripes, AAC33526; Ty3 Saccharomyces cerevisiae, NP_011624; Ulysses Drosophila virilis, S18211; Zam Drosophila melanogaster, T13996.

    Tor retrotransposons generally showed the classical structural features of most other Ty3/gypsy retrotransposons (Levin 2002), with two overlapping open reading frames called gag (encoding a structural nucleic acid-binding protein with one or several putative zinc fingers) and pol (encoding protease, reverse transcriptase, RNase H, and integrase; fig. 4). In contrast to other Tor families, Tor3.1 retrotransposons showed no frameshift between gag and pol; instead, the two frames were separated by a stop codon located at the same position in divergent elements (less than 60% nucleotide identity). A similar arrangement has been reported for Moloney murine leukemia virus, in which about 5% of ribosomes translating the gag gene read through the UAG terminator and translate the in-frame pol gene to produce the gag-pol fusion polyprotein (Wills, Gesteland, and Atkins 1991 and references therein).

    The structure and number of zinc fingers varied among the Gag proteins of Tor retrotransposons (fig. 4). A classical CCHC motif (CX2CX4HX4C; Pfam accession PF00098; SMART accession SM00343), usually found in Gag sequences of retrotransposons and vertebrate retroviruses, was detected in Tor1, Tor4, and some Tor3 retrotransposons, but the number of motifs depended on the type of element (one, two, and three motifs in Tor1, Tor4, and Tor3, respectively). Interestingly, this generally conserved motif was absent from the Gag protein of Tor2 and of certain Tor3 retrotransposons. Instead, a putative CCCH zinc finger domain was identified in Tor2 (CX5/7CX5/7CX3H) and in Tor3 (CX8CX4/5CX3H). Such motifs are reminiscent of the CCCH zinc fingers (CX8CX5CX3H and similar motifs) found in diverse proteins, some of which bind RNA (Pfam accession PF00642; SMART accession SM00356; Blackshear 2002). CCCH motifs are also present in the matrix protein 2 of several paramyxoviruses from the Pneumovirinae subfamily (negative-strand ssRNA viruses) and in proteins encoded by the fowlpox and Chilo iridescent viruses (dsDNA viruses, no RNA stage). To the best of our knowledge, this is the first report of CCCH motifs in (putative) proteins encoded by retrotransposable elements.
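    The motif spacings given above translate directly into regular expressions. The sketch below is only illustrative: the Pfam and SMART entries cited (PF00098, PF00642, SM00343, SM00356) are profile models, which are more sensitive than exact spacing patterns, and reading "X5/7" as "5 or 7 residues" and "X4/5" as "4 or 5 residues" is our assumption.

```python
import re

# Spacing patterns quoted in the text ("X4/5" read as "4 or 5 residues").
MOTIFS = {
    "CCHC (CX2CX4HX4C)": r"C.{2}C.{4}H.{4}C",
    "Tor2-type CCCH (CX5/7CX5/7CX3H)": r"C(?:.{5}|.{7})C(?:.{5}|.{7})C.{3}H",
    "Tor3-type CCCH (CX8CX4/5CX3H)": r"C.{8}C.{4,5}C.{3}H",
}


def find_zinc_finger_motifs(protein: str) -> list[tuple[str, int, str]]:
    """Return (motif name, 0-based start, matched segment) for every motif hit.

    The zero-width lookahead lets overlapping candidate hits all be reported.
    """
    protein = protein.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in re.finditer(f"(?=({pattern}))", protein):
            hits.append((name, match.start(), match.group(1)))
    return hits
```

    For example, the toy sequence "MKCAACGGGGHTTTTCKK" yields a single CCHC hit starting at position 2.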

    About 180 Tor reverse transcriptase sequences were detected in the 41 Mb genome assembly (about 50 sequences for Tor2, Tor3, and Tor4, but only two for Tor1; the remaining sequences were too short to be classified unambiguously). Nucleotide identity within the Tor2, Tor3, and Tor4 families ranged from less than 60% up to 98–99% (the two Tor1 elements showed 70% nucleotide identity). On average, open reading frame–corrupting mutations occurred once every 7.5 kb, 2.8 kb, and 16 kb for Tor2, Tor3, and Tor4, respectively. In addition, elements with apparently intact gag and pol open reading frames were identified for all four Tor families. These observations, together with the high nucleotide identity between certain copies, strongly suggest that some Tor elements have been active recently or are still active. The high sequence divergence observed even within the same family indicates that Tor elements are of ancient origin.
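    The average spacings between corrupting mutations can be turned into a rough expectation of how many copies should still carry an intact open reading frame. The sketch assumes, for illustration only, that such mutations fall independently and uniformly along the sequence (a Poisson process) and uses a hypothetical 5 kb open reading frame; neither assumption comes from the study.

```python
import math

# Average spacing between ORF-corrupting mutations, from the text (kb).
SPACING_KB = {"Tor2": 7.5, "Tor3": 2.8, "Tor4": 16.0}


def expected_intact_fraction(orf_kb: float, spacing_kb: float) -> float:
    """P(no corrupting mutation in an ORF of length orf_kb), assuming mutations
    occur independently along the sequence (Poisson) at one per spacing_kb."""
    return math.exp(-orf_kb / spacing_kb)


if __name__ == "__main__":
    ORF_KB = 5.0  # hypothetical ORF length
    for family, spacing in SPACING_KB.items():
        print(family, round(expected_intact_fraction(ORF_KB, spacing), 2))
```

    Under these assumptions a 5 kb open reading frame would remain intact in roughly 51%, 17%, and 73% of Tor2, Tor3, and Tor4 copies, respectively. Real mutations are not uniformly distributed, so these figures are only rough expectations.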

    A Novel Diverse Group of Putative Chordate Retroviruses in the Genome of O. dioica

    Interestingly, multiple Tor elements belonging phylogenetically to the Ty3/gypsy group contained a third open reading frame (orf3) in addition to the gag and pol regions (figs. 4–7). A third open reading frame has previously been identified in several plant and insect Ty3/gypsy retrotransposons, including Gypsy (also called errantiviruses; Boeke et al. 1999), in some plant Ty1/copia elements, and in nematode BEL retrotransposons (Lerat and Capy 1999; Malik, Henikoff, and Eickbush 2000 and references therein). It encodes a transmembrane protein with structural similarities to the envelope (Env) protein of vertebrate retroviruses. Env proteins are required for virus infection and transmission, and some insect errantiviruses can indeed form infectious virus-like particles (Kim et al. 1994; Song et al. 1994).

    FIG. 6.— Diversity of envelope-like proteins encoded by Oikopleura dioica Tor-3 LTR retrotransposons. Identical residues are in black, conservative substitutions in gray. Predicted coiled-coils (cc), transmembrane domains (tm), signal peptide sequence (sp), and conserved cysteines (*) are shown in boxes.

    FIG. 7.— Comparison of the envelope-like protein of Tor4b with the fusion protein of two paramyxoviruses (Tupaia paramyxovirus TPMV and Sendai virus SeV). Identical residues are in black, conservative substitutions in gray. Predicted coiled-coils (cc), transmembrane domains (tm), and conserved cysteines (*) are shown. The horizontal bar shows the putative canonical fusion tripeptide F-X-G. Accession numbers: Tupaia paramyxovirus, NP_054695; Sendai virus, P04855.

    Although the putative orf3 products of Tor elements were very divergent, they displayed typical features of envelope proteins, such as predicted transmembrane domains, coiled-coil-like motifs, conserved cysteines (figs. 4–7), peptide cleavage sites, and N-glycosylation sites (data not shown). A signal peptide sequence was predicted in only one case; in other elements, the leader peptide might be encoded by an upstream region. Subgenomic envelope mRNA might be produced by splicing of the full-length transcript, removing the intervening gag and pol open reading frames, as observed for other retroviruses (Varmus and Brown 1989; Pelisson et al. 1994; Leblanc et al. 1997). Env-like genes were detected in very divergent Tor3 elements as well as in Tor4b, but not in Tor4a, suggesting relatively recent differential loss or gain of an envelope-like gene in the Tor4 family. Sequence divergence was very high between Tor3 and Tor4b elements (no significant nucleotide identity) and even between certain elements within the Tor3 family (less than 60% nucleotide identity). In some cases, no significant similarity was detected between Tor3 predicted Env-like proteins (E values higher than 100,000 using the Blast2 program; Tatusova and Madden 1999; fig. 6). This high diversity supports an ancient origin for Tor retrovirus-like elements.
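    The study used DAS for transmembrane prediction. As a simpler stand-in (a different method, not the one used here), the sketch below flags hydrophobic stretches with a Kyte-Doolittle sliding window (19 residues, mean above 1.6, the values usually quoted for membrane-spanning segments) and lists candidate N-glycosylation sequons with a PROSITE-style pattern. Both are screening heuristics and would not by themselves establish that an orf3 product is an envelope protein.

```python
import re

# Kyte-Doolittle (1982) hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P}; lookahead allows overlaps.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def hydrophobic_windows(protein: str, window: int = 19,
                        threshold: float = 1.6) -> list[tuple[int, float]]:
    """0-based start positions and mean hydropathy of windows above threshold.

    Residues missing from the scale (e.g. X) contribute 0.
    """
    protein = protein.upper()
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in segment) / window
        if mean > threshold:
            hits.append((start, round(mean, 2)))
    return hits


def n_glycosylation_sites(protein: str) -> list[int]:
    """0-based positions of the asparagine in each candidate N-glycosylation sequon."""
    return [m.start() for m in N_GLYCOSYLATION.finditer(protein.upper())]
```

    Consecutive windows above the threshold usually mark a single hydrophobic segment, so neighbouring hits would be merged before counting candidate transmembrane domains.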

    Possible Acquisition of the Tor4b Envelope-like Gene from a Paramyxovirus

    To trace the origin of the env-like genes of Tor3 and Tor4b elements, sequence databases were searched using different methods (see Malik, Henikoff, and Eickbush 2000). No protein strongly homologous to the Env-like sequences of Tor elements could be identified. Nevertheless, fusion glycoproteins of paramyxoviruses (negative-sense single-stranded RNA viruses), which mediate membrane fusion, were consistently among the five best hits when a Tor4b Env-like protein was used as a query. For instance, 50% similarity (26% identity) over 109 amino acids was detected with the fusion protein of Human parainfluenza virus 1 (E = 0.31). Although this E value is modest, it is more than 1,000 times more significant than those obtained in some comparisons between Env-like proteins within the Tor3 group (data not shown); the weak similarity within Tor3 might be explained by the rapid evolution of env genes in retroelements. Conversely, when the Human parainfluenza virus 1 fusion protein was used as a query against our O. dioica database, a Tor4b Env-like protein gave the best hit (E = 0.01). Sequence comparison revealed conserved positions for the putative canonical fusion tripeptide F-X-G (Misseri et al. 2003 and references therein), for cysteine residues, for the predicted transmembrane domain, and for one coiled-coil (the Tor4b Env-like protein has a second predicted coiled-coil; fig. 7). Together, these results suggest that the Tor4b family might have acquired its env-like gene from a paramyxovirus.
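    The E values compared above follow the standard BLAST relation between a normalized (bit) score S' and the expected number of chance hits in a search space of m query residues by n database residues, E = m·n·2^(-S'). The sketch below uses that relation with effective lengths simplified to raw lengths (an assumption; BLAST applies length corrections). Because E scales with the search space, values from Blast2 pairwise comparisons and from database searches are only approximately comparable.

```python
import math


def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def bits_for_evalue(evalue: float, query_len: int, db_len: int) -> float:
    """Inverse relation: bit score needed to reach a given E value."""
    return math.log2(query_len * db_len / evalue)


if __name__ == "__main__":
    # E values quoted in the text: 0.31 (Tor4b Env-like vs. a paramyxovirus
    # fusion protein) and >100,000 (some Tor3 Env-like comparisons).
    fold = 100_000 / 0.31
    print(f"{fold:,.0f}-fold difference, about {math.log2(fold):.1f} bits")
```

    Each additional bit halves the expected number of chance hits, so a difference of more than 1,000-fold in E corresponds to roughly 10 or more bits of score.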

    Other Reverse Transcriptase Retrotransposons

    Among other groups of LTR retrotransposons, only DIRS1-like elements were found in Oikopleura. The available sequences only allowed reconstruction of a 2.4 kb open reading frame encoding the reverse transcriptase and RNase H domains; neither the gag open reading frame nor the LTRs could be identified. In the only two elements whose sequence could be extended in the 3' direction, a single additional conserved open reading frame was found 2 kb downstream of the reverse transcriptase/RNase H-encoding region. We could not establish unambiguously that the 420 amino acid conceptual translation product of this open reading frame corresponds to the lambda-like recombinase found in other DIRS1-related elements (Goodwin and Poulter 2002), since some highly conserved residues were missing (data not shown). About 70 reverse transcriptase sequences from DIRS1-related elements were detected in the 41 Mb genome assembly, with nucleotide identities ranging from less than 60% up to 90%. DIRS1-related elements were apparently absent from the sea squirt genome and would therefore represent the only type of autonomous retroelement lost in C. intestinalis but present in O. dioica. Our results were also consistent with a possible extinction of Ty1/copia and BEL retrotransposons, which were not detected in Oikopleura (table 1). No "vertebrate" retroviruses (Retroviridae) were found in either tunicate, consistent with their being specific to the vertebrate lineage.

    Penelope-like retroelements were detected in the genomes of both O. dioica and C. intestinalis. In O. dioica, these elements formed a diverse monophyletic group with no obvious preferential relationship to any particular Penelope-like retrotransposon from other organisms (data not shown). Complete elements with an apparently intact single open reading frame encoding a reverse transcriptase and a C-terminal GIY-YIG endonuclease (Volff, Hornung, and Schartl 2001; Arkhipova et al. 2003) were identified, suggesting recent activity. This was supported by the presence of very similar elements sharing 98–99% nucleotide identity. However, the high divergence between some copies (less than 60% nucleotide identity) reflects the diversity and ancient origin of the Penelope-like retrotransposons found in O. dioica. About 60 reverse transcriptase sequences were detected in the 41 Mb genome assembly. We did not determine whether the 5' untranslated region of O. dioica Penelope-like elements contains an intron, as observed in flies and bdelloid rotifers (Arkhipova et al. 2003).
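    Judging whether a copy carries a single intact open reading frame starts with locating its longest ORF. The sketch below is a deliberate simplification under stated assumptions: it scans only the forward strand, uses the stop codons of the standard genetic code, requires an ATG start, and does not tolerate frameshifts. A real survey would also scan the reverse complement and compare the ORF length with that expected for a complete element.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG-initiated ORF on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Returns None if no ATG is followed in frame by a stop codon.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG downstream
    return best
```

    For instance, `longest_orf("CCATGAAATTTTAGGG")` returns `(2, 14)`, the span of `ATGAAATTTTAG`.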

    Discussion

    In this study we have analyzed the autonomous retrotransposons of O. dioica, a chordate with the smallest and most compact genome reported to date in the animal kingdom. Even though our shotgun sequencing data set covers the genome of Oikopleura densely and rather evenly, we cannot exclude that some elements representing conserved retrotransposon families are present but were missed, particularly if they are difficult to clone. For example, heterochromatic regions rich in repetitive sequences might be more refractory to shotgun sequencing and thus underrepresented in our data set. However, a genome size estimate based on the fraction of expressed sequence tags covered by the shotgun data set gave a total of 65 Mb of "clonable" DNA, very close to the 72 Mb obtained for the complete genome by flow cytometry (Seo et al. 2001). This suggests that "nonclonable" heterochromatic regions, if any, do not represent a substantial fraction of the Oikopleura genome.
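    One way to read the coverage-based estimate above: if expressed sequence tags (ESTs) sample genes roughly uniformly across the clonable genome, the fraction of ESTs found in the assembled data set approximates the fraction of the genome that the data set represents. The exact procedure is not spelled out here, so the sketch below is an assumption. The two reported sizes (41 Mb assembled, 65 Mb estimated) imply a coverage of roughly 0.63; the EST counts in the usage line are hypothetical values chosen to match that ratio.

```python
def genome_size_from_est_coverage(assembly_mb: float, ests_total: int, ests_found: int) -> float:
    """Estimate genome size as assembly size divided by the fraction of ESTs it contains.

    Assumes ESTs sample the clonable genome uniformly, so the fraction of ESTs
    found in the assembly approximates the fraction of the genome it represents.
    """
    if ests_total <= 0 or not 0 < ests_found <= ests_total:
        raise ValueError("need 0 < ests_found <= ests_total")
    coverage = ests_found / ests_total
    return assembly_mb / coverage
```

    With hypothetical counts of 630 ESTs found out of 1,000, `genome_size_from_est_coverage(41.0, 1000, 630)` gives about 65.1 Mb.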

    Taken together, our observations strongly support a massive elimination of major groups of autonomous retrotransposable elements in the O. dioica lineage. This is particularly true for non-LTR retrotransposons, for which only one possibly novel clade of elements encoding an apurinic/apyrimidinic endonuclease could be detected. In contrast, the genome of C. intestinalis contains at least six clades of non-LTR retrotransposons (Simmen and Bird 2000; Permanyer, Gonzalez-Duarte, and Albalat 2003; Kojima and Fujiwara 2004; this study).

    Ty1/copia and BEL LTR retrotransposons may also have been lost. This "purification" possibly coincided with an intense process of genome compaction. Hence, our analysis of non-LTR retrotransposons in O. dioica supports an association between genome compaction and a drastic reduction in both the copy number and the diversity of some autonomous retroelements. This situation differs from that observed in the compact but larger genome of the pufferfish Takifugu rubripes, where a relatively high level of retrotransposon diversity has been maintained despite a strong reduction in repetitive DNA content (Crollius et al. 2000; Aparicio et al. 2002; Bouneau et al. 2003; Neafsey and Palumbi 2003; Volff et al. 2003). Whether genome compaction in O. dioica is related to the exceptionally short life cycle of this organism (generation time of 4–5 days at 20°C, 1–2 days in the tropics) remains an open question; it is a pertinent one, because a correlation between genome size and longevity has been proposed in other animals (Monaghan and Metcalfe 2000).

    Among LTR retrotransposons, DIRS1 and four diverse families of Ty3/gypsy-related elements (Tor1–Tor4) were detected in O. dioica, but neither Ty1/copia nor BEL retrotransposons could be found. We estimated that about 400 reverse transcriptase (pseudo)genes are present in the 41 Mb genome data set, suggesting a total of around 600 autonomous retrotransposons. This number is similar to those observed in the slime mold Dictyostelium discoideum and the nematode Caenorhabditis elegans but higher than that in the three times larger genome of the fly Drosophila melanogaster (Kidwell 2002). Oikopleura apparently has far fewer retrotransposons than the pufferfish Takifugu rubripes and other vertebrates with less compact genomes. Retroelements lacking a reverse transcriptase region, including heavily truncated non-LTR retrotransposons, SINEs, and nonautonomous LTR retrotransposons, were not analyzed here. The genome of O. dioica is not devoid of DNA transposons (class II): four copies of a Tc1/mariner element were also identified (data not shown).
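    The step from about 400 reverse transcriptase (pseudo)genes in 41 Mb to a genome-wide total can be reproduced by simple proportional scaling. This is our reading rather than a procedure stated explicitly, and it assumes that the elements are distributed evenly and that the 65 Mb coverage-based estimate is the appropriate genome size:

```latex
N_{\text{genome}} \approx N_{\text{obs}} \times \frac{G}{A}
  = 400 \times \frac{65\ \text{Mb}}{41\ \text{Mb}} \approx 634
```

    This is of the same order as the figure of about 600 given above; using the 72 Mb flow-cytometry size instead yields about 700.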

    A major open question concerns the localization of retrotransposons in the compact genome of Oikopleura. In the pufferfish Tetraodon nigroviridis, retroelements accumulate in specific heterochromatic regions of the genome together with other types of repeats (Dasilva et al. 2002; Bouneau et al. 2003). We found that an incomplete copy of a Tor3 element is integrated 1 kb away from a stretch of 19 tandem telomeric repeats, and that several Tor4a insertions lie within tandem clusters of a 58 bp repeat (data not shown). Additional studies are required to determine whether repetitive DNA is compartmentalized in the compact genome of O. dioica.
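    Positional observations such as an insertion lying "1 kb away from a stretch of 19 tandem telomeric repeats" reduce to finding runs of perfect tandem copies of a unit and measuring the gap to the insertion. A minimal sketch under several assumptions: the telomeric unit is taken to be the vertebrate-type TTAGGG (the text does not state it), only perfect repeats are detected, and only the given strand is searched (the complementary strand would be scanned with CCCTAA). Function names and the toy coordinates are illustrative.

```python
import re
from typing import List, Tuple

Run = Tuple[int, int, int]  # (start, end_exclusive, copy_number)


def tandem_runs(seq: str, unit: str = "TTAGGG", min_copies: int = 2) -> List[Run]:
    """Find runs of at least `min_copies` perfect, adjacent copies of `unit`."""
    unit = unit.upper()
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]


def gap_to_nearest_run(ins_start: int, ins_end: int, runs: List[Run]) -> int:
    """Distance in bp between an insertion [ins_start, ins_end) and the closest run.

    Returns 0 if the insertion overlaps a run; raises ValueError if there are no runs.
    """
    if not runs:
        raise ValueError("no tandem runs found")
    gaps = []
    for start, end, _ in runs:
        if ins_end <= start:
            gaps.append(start - ins_end)  # insertion lies upstream of the run
        elif end <= ins_start:
            gaps.append(ins_start - end)  # insertion lies downstream of the run
        else:
            gaps.append(0)  # the two intervals overlap
    return min(gaps)
```

    On the toy sequence `"A" * 10 + "TTAGGG" * 3 + "C" * 20`, `tandem_runs` reports one run `(10, 28, 3)`, and an insertion spanning positions 40 to 50 lies 12 bp from it.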

    Common characteristics of the Ty3/gypsy-related Tor families were a low level of sequence corruption, suggesting rather recent retrotranspositional activity, and a modest copy number. If these elements were transmitted vertically (inherited), this would indicate a rapid turnover of Tor retrotransposons, i.e., frequent retrotransposition followed by elimination, maintaining a low number of functional copies. However, the identification of an envelope-like open reading frame in Tor3 and Tor4b suggests that these elements may have been introduced more recently into the genome of Oikopleura through infectious horizontal transfer. Nevertheless, envelope-mediated infectivity remains to be demonstrated for the Tor elements, as for the great majority of invertebrate retrovirus-like sequences; to date, infection has been shown only for the Gypsy element of Drosophila (Kim et al. 1994; Song et al. 1994). Furthermore, horizontal transmission of LTR retrotransposons can also occur in the apparent absence of any detectable env-like gene, as demonstrated for the transfer of a Copia element between two Drosophila species (Jordan, Matyunina, and McDonald 1999). Hence, we cannot exclude that other Tor elements without any obvious orf3 were also acquired recently by the genome of O. dioica.

    According to phylogenetic analyses of their reverse transcriptase and integrase domains, Tor3 and Tor4b belong to the Ty3/gypsy-related retrotransposons. Phylogenetically, they are therefore clearly distinct from vertebrate retroviruses (Retroviridae, represented by the Human Immunodeficiency Virus 1 and the Rous Sarcoma Virus in fig. 5; Varmus and Brown 1989; Eickbush and Malik 2002). Ty3/gypsy-related Env-encoding retrovirus-like elements have been described in plants (Peterson-Burch et al. 2000; Vicient, Kalendar, and Schulman 2001) and insects (Kim et al. 1994; Pelisson et al. 1994; Song et al. 1994; Leblanc et al. 1997; Pantazidis, Labrador, and Fontdevila 1999), but we found no evidence for a preferential phylogenetic relationship between these elements and those identified in O. dioica.
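    The reference list cites several phylogenetic programs and the neighbor-joining method of Saitou and Nei (1987), but the exact settings behind fig. 5 are not given in this section. Purely as a generic illustration of the distance-based approach, the sketch below computes uncorrected p-distances from aligned sequences and applies textbook neighbor-joining, returning a Newick string. It uses no distance correction and no bootstrapping, and branch lengths can come out negative on non-additive data, so it is not a reproduction of the published analysis.

```python
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no shared ungapped sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(labels: List[str], matrix: List[List[float]]) -> str:
    """Build an unrooted neighbor-joining tree and return it in Newick format."""
    # dist[a][b] holds the current distance between clusters a and b;
    # each new cluster is keyed by its own Newick substring.
    dist: Dict[str, Dict[str, float]] = {
        a: {b: matrix[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)
    }
    nodes = list(labels)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes) for a in nodes}
        # Choose the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * dist[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node u.
        la = 0.5 * dist[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        u = f"({a}:{la:.3f},{b}:{lb:.3f})"
        dist[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
                dist[u][c] = d
                dist[c][u] = d
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    # Put the full length of the last edge on one side of the unrooted tree.
    return f"({a}:{dist[a][b]:.3f},{b}:0.000);"
```

    On the classic five-taxon additive matrix `[[0,5,9,9,8],[5,0,10,10,9],[9,10,0,8,7],[9,10,8,0,3],[8,9,7,3,0]]` with labels `a` to `e`, the first pair joined is `(a:2.000,b:3.000)`, and `d` and `e` end up as `(d:2.000,e:1.000)`.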

    Invertebrate retroviruses have probably obtained their env genes independently from distinct sources (Malik, Henikoff, and Eickbush 2000). Different nematode BEL-like retroviruses might have acquired their env genes from phleboviruses (ambisense single-stranded RNA viruses) and from Herpesviridae (double-stranded DNA viruses), and insect Gypsy-related errantiviruses from baculoviruses (double-stranded DNA viruses). Our results suggest that a paramyxovirus (negative-sense single-stranded RNA virus) provided the Tor4b env gene. Interestingly, limited sequence similarity has also recently been reported between the Env protein of Gypsy and paramyxovirus fusion proteins (Misseri et al. 2003). The origin of the env gene in the Tor3 family remains unknown. Either env was acquired in a common ancestor of Tor3 and Tor4 and subsequently lost in the Tor4a subfamily, or env-like genes were gained independently in Tor3 and Tor4b. Another open question concerns the origin of the atypical CCCH zinc finger domains in Tor2 and certain Tor3 elements. These motifs might have evolved from a preexisting canonical CCHC domain. Alternatively, they might have been acquired from other coding sequences, for example from paramyxovirus matrix genes.
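    At its simplest, distinguishing the canonical CCHC "zinc knuckle" of Gag proteins from a CCCH-type motif is a pattern-matching task. The sketch below uses the classical retroviral consensus C-X2-C-X4-H-X4-C and, as a stand-in for a CCCH motif, the tristetraprolin-type consensus C-X8-C-X5-C-X3-H (see Blackshear 2002 in the reference list). The exact spacing of the atypical Tor motifs is not given in this section, so a real scan would need a pattern tailored to them.

```python
import re
from typing import List, Tuple

# Classical retroviral Gag zinc knuckle: C-X2-C-X4-H-X4-C
CCHC = r"C.{2}C.{4}H.{4}C"
# Tristetraprolin-type CCCH zinc finger: C-X8-C-X5-C-X3-H (used here as a stand-in)
CCCH = r"C.{8}C.{5}C.{3}H"


def scan_zinc_fingers(protein: str) -> List[Tuple[str, int, str]]:
    """Return (motif_type, 0-based start, matched segment), sorted by position.

    A zero-width lookahead is used so that overlapping candidate motifs are all reported.
    """
    protein = protein.upper()
    hits = []
    for name, pattern in (("CCHC", CCHC), ("CCCH", CCCH)):
        for m in re.finditer(f"(?=({pattern}))", protein):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])
```

    For example, `scan_zinc_fingers("MKCAACGGGGHAAAACKK")` returns `[("CCHC", 2, "CAACGGGGHAAAAC")]`.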

    The function of the Env-like sequences in Oikopleura Tor elements remains to be determined. Nevertheless, Tor3 and Tor4b potentially represent, besides vertebrate retroviruses, a second and so far unidentified family of infectious retroelements in the chordate phylum. The extreme divergence between Tor retrovirus-like elements suggests that they belong to an evolutionarily ancient group of infectious retroelements. They may therefore not be restricted to Oikopleura but may be much more widespread. In the wake of SARS, experts have suggested that surveillance for emerging diseases should extend to the sampling and characterization of the entire panoply of viruses circulating not only in people but also in animals (see Nature 424:113, 2003). Future prospects include studying the distribution, transmission dynamics, and infection mechanisms of the Tor retrovirus-like elements in tunicates and other chordates, as well as assessing their potential as tools for transgenesis.

    Acknowledgements

    We are grateful to Manfred Schartl for discussions and encouragement. J.N.V. is supported by the Biofuture program of the German Bundesministerium für Bildung und Forschung (BMBF).

    References

    Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.

    [Anonymous]. 2003. We have been warned. Nature 424:113.

    Aparicio, S., J. Chapman, E. Stupka et al. (41 co-authors). 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301–1310.

    Arkhipova, I. R., K. I. Pyatkov, M. Meselson, and M. B. Evgen'ev. 2003. Retroelements containing introns in diverse invertebrate taxa. Nat. Genet. 33:123–124.

    Bestor, T. H. 2003. Cytosine methylation mediates sexual conflict. Trends Genet. 19:185–190.

    Blackshear, P. J. 2002. Tristetraprolin and other CCCH tandem zinc-finger proteins in the regulation of mRNA turnover. Biochem. Soc. Trans. 30:945–952.

    Boeke, J. D. 2003. The unusual phylogenetic distribution of retrotransposons: a hypothesis. Genome Res. 13:1975–1983.

    Boeke, J. D., T. H. Eickbush, S. B. Sandmeyer, and D. F. Voytas. 1999. Metaviridae. Pp. 349–357 in F. A. Murphy, ed. Virus taxonomy: ICTV VIIth report. Springer-Verlag, New York.

    Bouneau, L., C. Fischer, C. Ozouf-Costaz, A. Froschauer, O. Jaillon, J.-P. Coutanceau, C. Körting, J. Weissenbach, A. Bernot, and J.-N. Volff. 2003. An active non-LTR retrotransposon with tandem structure in the compact genome of the pufferfish Tetraodon nigroviridis. Genome Res. 13:1686–1695.

    Brenner, S., G. Elgar, R. Sandford, A. Macrae, B. Venkatesh, and S. Aparicio. 1993. Characterization of the pufferfish (Fugu) genome as a compact model vertebrate genome. Nature 366:265–268.

    Brosius, J. 1999. Genomes were forged by massive bombardments with retroelements and retrosequences. Genetica 107:209–238.

    ———. 2003. The contribution of RNAs and retroposition to evolutionary novelties. Genetica 118:99–116.

    Burke, W. D., H. S. Malik, W. C. Lathe 3rd, and T. H. Eickbush. 1998. Are retrotransposons long-term hitchhikers? Nature 392:141–142.

    Craigie, R. 2002. Retroviral DNA integration. Pp. 613–630 in N. Craig, R. Craigie, M. Gellert, and A. Lambowitz, eds. Mobile DNA II. American Society of Microbiology Press, Washington, D.C.

    Crollius, H. R., O. Jaillon, C. Dasilva et al. (12 co-authors). 2000. Characterization and repeat analysis of the compact genome of the freshwater pufferfish Tetraodon nigroviridis. Genome Res. 10:939–949.

    Dasilva, C., H. Hadji, C. Ozouf-Costaz, S. Nicaud, O. Jaillon, J. Weissenbach, and H. R. Crollius. 2002. Remarkable compartmentalization of transposable elements and pseudogenes in the heterochromatin of the Tetraodon nigroviridis genome. Proc. Natl. Acad. Sci. USA 99:13636–13641.

    Dehal, P., Y. Satou, R. K. Campbell et al. (87 co-authors). 2002. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 298:2157–2167.

    Dewannieux, M., C. Esnault, and T. Heidmann. 2003. LINE-mediated retrotransposition of marked Alu sequences. Nat. Genet. 35:41–48.

    Eickbush, T. H., and A. V. Furano. 2002. Fruit flies and humans respond differently to retrotransposons. Curr. Opin. Genet. Dev. 12:669–674.

    Eickbush, T. H., and H. S. Malik. 2002. Origins and evolution of retrotransposons. Pp. 1111–1144 in N. Craig, R. Craigie, M. Gellert, and A. Lambowitz, eds. Mobile DNA II. American Society of Microbiology Press, Washington, D.C.

    Esnault, C., J. Maestre, and T. Heidmann. 2000. Human LINE retrotransposons generate processed pseudogenes. Nat. Genet. 24:363–367.

    Feng, Q., J. V. Moran, H. H. Kazazian Jr., and J. D. Boeke. 1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87:905–916.

    Goodwin, T. J., and R. T. Poulter. 2002. A group of deuterostome Ty3/gypsy-like retrotransposons with Ty1/copia-like pol-domain orders. Mol. Genet. Genomics 267:481–491.

    International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.

    Jensen, S., M. P. Gassama, and T. Heidmann. 1999. Taming of transposable elements by homology-dependent gene silencing. Nat. Genet. 21:209–212.

    Jordan, I. K., L. V. Matyunina, and J. F. McDonald. 1999. Evidence for the recent horizontal transfer of long terminal repeat retrotransposon. Proc. Natl. Acad. Sci. USA 96:12621–12625.

    Kajikawa, M., and N. Okada. 2002. LINEs mobilize SINEs in the eel through a shared 3' sequence. Cell 111:433–444.

    Kazazian, H. H. Jr., and J. V. Moran. 1998. The impact of L1 retrotransposons on the human genome. Nat. Genet. 19:19–24.

    Ketting, R. F., T. H. Haverkamp, H. G. van Luenen, and R. H. Plasterk. 1999. Mut-7 of C. elegans, required for transposon silencing and RNA interference, is a homolog of Werner syndrome helicase and RNaseD. Cell 99:133–141.

    Kidwell, M. G. 2002. Transposable elements and the evolution of genome size in eukaryotes. Genetica 115:49–63.

    Kim, A., C. Terzian, P. Santamaria, A. Pelisson, N. Prud'homme, and A. Bucheton. 1994. Retroviruses in invertebrates: the gypsy retrotransposon is apparently an infectious retrovirus of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 91:1285–1289.

    Kojima, K. K., and H. Fujiwara. 2004. Cross-genome screening of novel sequence-specific non-LTR retrotransposons: various multicopy RNA genes and microsatellites are selected as targets. Mol. Biol. Evol. 21:207–217.

    Leblanc, P., S. Desset, B. Dastugue, and C. Vaury. 1997. Invertebrate retroviruses: ZAM a new candidate in D. melanogaster. EMBO J. 16:7521–7531.

    Lerat, E., and P. Capy. 1999. Retrotransposons and retroviruses: analysis of the envelope gene. Mol. Biol. Evol. 16:1198–1207.

    Levin, H. L. 2002. Newly identified retrotransposons of the Ty3/gypsy class in fungi, plants and vertebrates. Pp. 684–701 in N. Craig, R. Craigie, M. Gellert, and A. Lambowitz, eds. Mobile DNA II. American Society of Microbiology Press, Washington, D.C.

    Liu, G., S. Zhao, J. A. Bailey, S. C. Sahinalp, C. Alkan, E. Tuzun, E. D. Green, and E. E. Eichler. 2003. Analysis of primate genomic variation reveals a repeat-driven expansion of the human genome. Genome Res. 13:358–368.

    Lovsin, N., F. Gubensek, and D. Kordis. 2001. Evolutionary dynamics in a novel L2 clade of non-LTR retrotransposons in Deuterostomia. Mol. Biol. Evol. 18:2213–2224.

    Malik, H. S., W. D. Burke, and T. H. Eickbush. 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16:793–805.

    Malik, H. S., and T. H. Eickbush. 1999. Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J. Virol. 73:5186–5190.

    Malik, H. S., S. Henikoff, and T. H. Eickbush. 2000. Poised for contagion: evolutionary origins of the infectious abilities of invertebrate retroviruses. Genome Res. 10:1307–1318.

    Misseri, Y., G. Labesse, A. Bucheton, and C. Terzian. 2003. Comparative sequence analysis and predictions for the envelope glycoproteins of insect endogenous retroviruses. Trends Microbiol. 11:253–256.

    Monaghan, P., and N. B. Metcalfe. 2000. Genome size and longevity. Trends Genet. 16:331–332.

    Moran, J. V., R. J. DeBerardinis, and H. H. Kazazian Jr. 1999. Exon shuffling by L1 retrotransposition. Science 283:1530–1534.

    Neafsey, D. E., and S. R. Palumbi. 2003. Genome size evolution in pufferfish: a comparative analysis of diodontid and tetraodontid pufferfish genomes. Genome Res. 13:821–830.

    Pantazidis, A., M. Labrador, and A. Fontdevila. 1999. The retrotransposon Osvaldo from Drosophila buzzatii displays all structural features of a functional retrovirus. Mol. Biol. Evol. 16:909–921.

    Pelisson, A., S. U. Song, N. Prud'homme, P. A. Smith, A. Bucheton, and V. G. Corces. 1994. Gypsy transposition correlates with the production of a retroviral envelope-like protein under the tissue-specific control of the Drosophila flamenco gene. EMBO J. 13:4401–4411.

    Permanyer, J., R. Gonzalez-Duarte, and R. Albalat. 2003. The non-LTR retrotransposons in Ciona intestinalis: new insights into the evolution of chordate genomes. Genome Biol. 4:R73.

    Peterson-Burch, B. D., D. A. Wright, H. M. Laten, and D. F. Voytas. 2000. Retroviruses in plants? Trends Genet. 16:151–152.

    Petrov, D. A. 2001. Evolution of genome size: new approaches to an old problem. Trends Genet. 17:23–28.

    Pickeral, O. K., W. Makalowski, M. S. Boguski, and J. D. Boeke. 2000. Frequent human genomic DNA transduction driven by LINE-1 retrotransposition. Genome Res. 10:411–415.

    Ronquist, F., and J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.

    Saitou, N., and M. Nei. 1987. The Neighbor-Joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.

    SanMiguel, P., B. S. Gaut, A. Tikhonov, Y. Nakajima, and J. L. Bennetzen. 1998. The paleontology of intergene retrotransposons of maize. Nat. Genet. 20:43–45.

    Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. Tree-Puzzle: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502–504.

    Seo, H. C., M. Kube, R. B. Edvardsen et al. (11 co-authors). 2001. Miniature genome in the marine chordate Oikopleura dioica. Science 294:2506.

    Simmen, M. W., and A. Bird. 2000. Sequence analysis of transposable elements in the sea squirt, Ciona intestinalis. Mol. Biol. Evol. 17:1685–1694.

    Song, S. U., T. Gerasimova, M. Kurkulos, J. D. Boeke, and V. G. Corces. 1994. An env-like protein encoded by a Drosophila retroelement: evidence that gypsy is an infectious retrovirus. Genes Dev. 8:2046–2057.

    Sundararajan, A., B. S. Lee, and D. J. Garfinkel. 2003. The Rad27 (Fen-1) nuclease inhibits Ty1 mobility in Saccharomyces cerevisiae. Genetics 163:55–67.

    Swofford, D. L. 1998. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.

    Tabara, H., M. Sarkissian, W. G. Kelly, J. Fleenor, A. Grishok, L. Timmons, A. Fire, and C. C. Mello. 1999. The rde-1 gene, RNA interference, and transposon silencing in C. elegans. Cell 99:123–132.

    Tatusova, T. A., and T. L. Madden. 1999. Blast 2 sequences—a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Lett. 174:247–250.

    Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.

    Varmus, H., and P. Brown. 1989. Retroviruses. Pp. 53–108 in D. E. Berg and M. M. Howe, eds. Mobile DNA. American Society of Microbiology Press, Washington, D.C.

    Vicient, C. M., R. Kalendar, and A. H. Schulman. 2001. Envelope-class retrovirus-like elements are widespread, transcribed and spliced, and insertionally polymorphic in plants. Genome Res. 11:2041–2049.

    Volff, J.-N., L. Bouneau, C. Ozouf-Costaz, and C. Fischer. 2003. Diversity of retrotransposable elements in compact pufferfish genomes. Trends Genet. 19:674–678.

    Volff, J.-N., U. Hornung, and M. Schartl. 2001. Fish retroposons related to the Penelope element of Drosophila virilis define a new group of retrotransposable elements. Mol. Genet. Genomics 265:711–720.

    Volff, J.-N., C. Körting, K. Sweeney, and M. Schartl. 1999. The non-LTR retrotransposon Rex3 from the fish Xiphophorus is widespread among teleosts. Mol. Biol. Evol. 16:1427–1438.

    Wills, N. M., R. F. Gesteland, and J. F. Atkins. 1991. Evidence that a downstream pseudoknot is required for translational read-through of the Moloney murine leukemia virus gag stop codon. Proc. Natl. Acad. Sci. USA 88:6991–6995.

    Yang, J., H. S. Malik, and T. H. Eickbush. 1999. Identification of the endonuclease domain encoded by R2 and other site-specific, non-long terminal repeat retrotransposable elements. Proc. Natl. Acad. Sci. USA 96:7847–7852.(Jean-Nicolas Volff*, Hans)