Molecular Biology and Evolution, 2005, Issue 4
Complex Pattern of Coalescence and Fast Evolution of a Mitochondrial rRNA Pseudogene in a Recent Radiation of Tiger Beetles
     Department of Entomology, The Natural History Museum, London, United Kingdom; and Department of Biological Sciences, Imperial College London, Ascot, Berkshire, United Kingdom

    Correspondence: E-mail: joap@nhm.ac.uk.

    Abstract

    Transposed copies of mitochondrial DNA into the nucleus (numts) are widespread, but to date they have not been described from the Coleoptera (beetles). Here we report the discovery of a numt derived from a mitochondrial ribosomal RNA gene in Australian tiger beetles (genus Rivacindela). The loss of function of the numt was confirmed by a high proportion of transversions, numerous noncompensatory substitutions in stem regions, and large deletions in functionally important sequences. Orthologous numt sequences were analyzed phylogenetically together with the corresponding mtDNA lineage to study the origination and establishment of the transposed copies in closely related populations and species. All numt sequences formed a strongly supported monophyletic group, indicating a single origin of this element. However, populations were polymorphic for the presence of the numt, and phylogenetic trees based on the numt sequences were inconsistent with the corresponding mtDNA phylogeny, suggesting slower fixation than in the mtDNA sequences. In a side-by-side comparison with their mtDNA sister lineage, the nucleotide substitution rate in the numts (1.66 × 10⁻⁸ substitutions/site/year) was approximately equal to the average rate of mtDNA in this group but substantially higher than previous estimates of neutral nuclear rates in vertebrates. The numt clade was affected by several deletions but no insertions, and the estimated rate of nucleotide loss exceeded the rate of nucleotide substitution approximately fivefold. The young age of the Rivacindela numt clade, its absence from species outside a narrow lineage of related individuals, and the high rate of deletions suggest that insertions do not persist in this group. This is consistent with the view that comparatively small genomes such as those of Coleoptera harbor fewer mitochondrial and other nuclear pseudogenes.

    Key Words: numts · population polymorphism · nucleotide substitution rate · Rivacindela · ribosomal RNA · secondary structure · compensatory substitutions · DNA taxonomy

    Introduction

    Transposed copies of mitochondrial genes into the nuclear genome (nuclear mtDNAs or "numts"; Lopez et al. 1994) are known from many groups of animals (e.g., Sunnucks and Hales 1996; Zhang and Hewitt 1996; Bensasson et al. 2001a, 2001b; Thalmann et al. 2004). When compared with the corresponding mtDNA sequences, numts can provide useful information on the molecular basis of gene and genome evolution and reveal the pattern and frequency of neutral change (Petrov and Hartl 1999; Bensasson et al. 2001b). Numts have frequently been discovered as a by-product of mtDNA-sequencing studies, in which they are coamplified with the target copy. More recently, scans of the human genome sequence detected 600 independent numt insertions (Mourier et al. 2001; Woischnik and Moraes 2002). These sequences span a range of phylogenetic distances from the authentic mtDNA copy, indicating repeated integration of mtDNA that can be traced over a large time window (Bensasson, Feldman, and Petrov 2003). Yet the abundance of numts varies widely. In insects, for example, numts were found to be extremely common in grasshoppers and in Sitobion aphids, where they originated from numerous independent insertions (Bensasson et al. 2001b). In contrast, for most insect orders, including the Coleoptera (beetles), none or only a few sequences have been described to date.

    When transferred to the nuclear genome, mitochondrial sequences become nonfunctional pseudogenes (Bensasson et al. 2001b). They can be detected by a higher proportion of nonsynonymous substitutions in protein-coding genes and a reduced transition bias compared with their mitochondrial counterparts (e.g., Gellissen and Michaelis 1987; Sunnucks and Hales 1996; Bensasson, Zhang, and Hewitt 2000). Ribosomal RNA (rRNA) numts should be recognizable from the disruption of their secondary structure, although this has not been confirmed in all cases (Olson and Yoder 2002). Numts generally exhibit a lower nucleotide substitution rate than the corresponding mtDNA sequences (Lopez et al. 1994; Arctander 1995; Dewoody, Chesser, and Baker 1999; Lu, Fu, and Zhang 2002). For example, nuclear rates in mammalian numts were up to six times lower than in protein-coding mtDNA and almost equal to those of the slowly evolving small (rrnS) and large (rrnL) rRNA subunits (Lopez et al. 1997). The absolute nucleotide substitution rates in numts were estimated at 3 × 10⁻⁹ to 4 × 10⁻⁹ substitutions/site/year. This agrees with the well-documented rate in other nuclear pseudogenes (4 × 10⁻⁹; Li 1997) and is clearly lower than the widely accepted mitochondrial rate of 2 × 10⁻⁸ to 2.5 × 10⁻⁸ substitutions/site/year (or 2%–2.5% divergence per million years) (Hasegawa, Kishino, and Yano 1985; Brower 1994). Because of these findings, numts have been regarded as "molecular fossils" that retain ancestral characteristics lost in the mtDNA lineages from which they originated (Olson and Yoder 2002; Woischnik and Moraes 2002). However, conclusions about the type and rate of sequence variation in numts can be problematic because sister lineages are difficult to identify across taxa. An example is the fitting of numt lineages from the human genome sequence to a tree inferred from the mtDNA sequences of extant primates (Bensasson, Feldman, and Petrov 2003).

    Similarly, precise knowledge about the relationships of mtDNA sequences and their nuclear paralogs is needed to establish the dynamics of numt origination and loss in lineages of organisms. In most studies, the time of the transposition event is inferred from evidence for gene duplication, i.e., from the phylogenetic reconstruction of a common ancestor that gave rise to the paralogous sequences. However, the transposed copy (numt) is subject to stochastic lineage sorting and hence, at the moment of origination, is not present in all individuals of a population. Because mitochondrial and nuclear DNA are inherited differently, the distributions of the numt and its source mtDNA copy may differ substantially, producing incongruent gene histories for the two markers. The potentially short life span of numts, and the similarly complex fixation of secondary numt losses, would lead to population polymorphisms. To date, only a few studies have estimated the parameters of variation and persistence of numts in an explicit comparison with the corresponding mtDNA in a single lineage of organisms. These were based on a small number of taxa and minimal sampling, aimed at establishing the common ancestor of mitochondrial and nuclear copies (Arctander 1995; Williams and Knowlton 2001; Lu, Fu, and Zhang 2002). As these studies were carried out mostly on vertebrates, it also remains to be established whether numts persist over similarly long periods in other organisms.

    Here we analyze numt variation in a large radiation of closely related species of tiger beetles (Rivacindela) occupying the ephemeral salt lakes of interior Australia. mtDNA variation in this group has been sampled extensively to establish the extent and number of species ("DNA taxonomy") (Pons et al., unpublished data). In a subset of individuals, we encountered difficulties with DNA sequencing in one of three mtDNA regions, indicating coamplification of a pseudogene. As such sequences can mislead the analysis of species trees and, in our case, the delimitation of species, we investigated the pattern of variation of these presumed numt sequences in detail. The analysis revealed an unexpectedly complex pattern of presence and absence within a narrow set of closely related populations, indicating a phylogenetic history of this element that was inconsistent with the mtDNA tree. Placing the numt sequences relative to their authentic mtDNA sequences was problematic because the types and rates of character change differed. Yet establishing the sister relationships of the numt clade is a prerequisite for comparing the tempo and mode of evolution in these elements and their mtDNA paralogs.

    Materials and Methods

    Sampling and DNA Procedures

    Specimens included in the analysis had been used in a previous study of mtDNA variation in Australian Rivacindela. That study was a population survey across the arid part of the continent, comprising 468 individuals from 65 collecting localities and some 47 species (Pons et al., unpublished data). Specimens from each population (a morphologically distinct set of individuals collected at a particular site) were labeled with a common population number followed by a unique specimen number (5.1, 5.2, 5.3, etc.). Sequences for three fragments of mtDNA had been obtained for each individual in the previous study: 728 base pairs (bp) of the cytochrome oxidase subunit I (cox1), 357 bp of the cytochrome b apoenzyme (cob), and 829 bp comprising the 3' end of the rrnL, the adjacent transfer RNA leucine 2 (trnL2), and part of NADH dehydrogenase subunit 1 (nad1). The nomenclature of mitochondrial genes used here follows Boore (2001). Sequences are available under GenBank accession numbers AJ617921–AJ618766 and AJ619087–AJ619548 (Pons et al., unpublished data). The present study focused on a set of individuals that gave poor DNA-sequencing reads with the standard cicindelid primer combination 16Sa (Simon et al. 1994) and Alf1 (Vogler et al. 1993). This primer pair was designed to amplify the mitochondrial region comprising the 3' end of rrnL, the complete trnL2, and part of nad1. Specific primers newly designed for the selective amplification of the mtDNA or numt copy of the rrnL gene were Riva16Sb (5'-CTGCCAAAGTAAYAATATTCTTC) and Riva16S (5'-CTGCCAAAGTAACAATATTAACAA), respectively (see below). Where necessary, polymerase chain reaction (PCR) products were cloned using the pMOSBlue blunt-ended cloning kit (Amersham Biosciences, Little Chalfont, UK). Clones were screened for inserts of the expected size by PCR amplification with the universal primers M13 and T7. Selected PCR products were purified with GENE CLEAN II (BIO101) and sequenced directly on both strands with the BigDye™ Terminator Cycle Sequencing Kit and an ABI PRISM™ 3700 DNA Analyzer (Applied Biosystems, Foster City, Calif.). Numt sequences have been deposited under GenBank accession numbers AJ620919–AJ620933.

    Phylogenetic Analysis, Secondary Structure, and Nucleotide Substitution Rates

    Alignment of mtDNA sequences was trivial because no indels were present. Numt sequences differed in length, but they could be aligned manually with unambiguous placement of indels. Assuming that multiple consecutive indels resulted from a single mutational event, gaps of multiple base pairs were recoded as binary characters according to Simmons and Ochoterena (2000) prior to parsimony analysis. The secondary structure prediction for the rrnL molecule was based on previously described models for Cicindela dorsalis (Buckley et al. 2000) and Drosophila melanogaster (Gutell et al., unpublished; see http://www.rna.icmb-utexas.edu/). Parsimony tree searches were conducted in PAUP*4.0b10 (Swofford 2002). They used 1,000 random addition searches with Tree Bisection-Reconnection branch swapping, keeping the 10 shortest trees per replicate. Shimodaira-Hasegawa (SH) tests were also performed in PAUP*, with 1,000 RELL replicates.
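    The gap recoding step can be illustrated with a minimal Python sketch of simple indel coding, in the spirit of Simmons and Ochoterena (2000). It assumes the alignment is held as a dictionary of equal-length strings with "-" for gaps. Each distinct run of consecutive gaps, identified by its start and end, becomes one binary character. A sequence is scored 1 where it has exactly that gap, 0 where it has no gap in that region, and ? where a different, overlapping gap makes the state ambiguous. The function names are hypothetical, and this is not the procedure used in the study, only an illustration of the idea.

```python
import re


def gap_spans(seq: str) -> list[tuple[int, int]]:
    """Half-open (start, end) spans of every run of consecutive gaps."""
    return [m.span() for m in re.finditer(r"-+", seq)]


def simple_indel_coding(alignment: dict[str, str]) -> dict[str, str]:
    """Recode each distinct multi-base gap as a single binary character.

    A run of consecutive gaps is treated as one mutational event, so a
    26-bp deletion contributes one character rather than 26.
    """
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    spans = {name: gap_spans(seq) for name, seq in alignment.items()}
    # One character per unique (start, end) gap found anywhere in the alignment.
    characters = sorted({span for taxon in spans.values() for span in taxon})
    coded = {}
    for name, taxon_spans in spans.items():
        states = []
        for start, end in characters:
            if (start, end) in taxon_spans:
                states.append("1")  # this exact gap is present
            elif any(s < end and start < e for s, e in taxon_spans):
                states.append("?")  # a different, overlapping gap: ambiguous
            else:
                states.append("0")  # no gap in this region
        coded[name] = "".join(states)
    return coded


print(simple_indel_coding({"a": "ACGT--AC", "b": "ACGTTTAC", "c": "A-----AC"}))
```

    The binary characters would then be appended to the nucleotide matrix, with gap positions themselves treated as missing data.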

    Nucleotide composition bias between mtDNA and numt sequences of the rrnL gene was assessed with the χ² test of nucleotide homogeneity implemented in PAUP*. This test does not take the phylogenetic relationships of the sequences into account. Significance was therefore also assessed against a null distribution of χ² statistics, obtained by simulating nucleotide composition on the tree and under the model of nucleotide evolution estimated from the real data (Foster 2004).
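    The χ² test of nucleotide homogeneity is a contingency-table test on base counts, with one row per sequence and one column each for A, C, G, and T. The Python sketch below is an illustrative assumption and not PAUP*'s implementation. It computes the statistic and its degrees of freedom, and derives an upper-tail p-value from a series for the regularized incomplete gamma function, using only the standard library. Like the PAUP* test, it ignores phylogeny. Foster's (2004) simulation-based null distribution is not implemented here. The function names are hypothetical.

```python
import math

BASES = "ACGT"


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with df
    degrees of freedom, via the series for the regularized lower incomplete
    gamma function P(df/2, x/2)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def composition_homogeneity(seqs: dict[str, str]) -> tuple[float, int, float]:
    """Chi-square test of base-composition homogeneity across sequences.

    Builds a sequences x bases table of A/C/G/T counts (gaps and ambiguity
    codes are ignored) and returns (statistic, df, p-value).
    """
    counts = [[s.upper().count(b) for b in BASES] for s in seqs.values()]
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(counts):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand
            if expected > 0:  # bases absent from every sequence are skipped
                stat += (observed - expected) ** 2 / expected
    # df is not reduced for absent bases in this simple sketch.
    df = (len(counts) - 1) * (len(BASES) - 1)
    return stat, df, chi2_sf(stat, df)


print(composition_homogeneity({"mt": "AATTAGCATT", "numt": "AATTATCATA"}))
```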

    Maximum likelihood branch lengths were estimated on the topology of the shortest parsimony trees, using the most appropriate model of nucleotide substitution as selected by Modeltest 3.06 (Posada and Crandall 1998). Nucleotide substitution rates and node ages were determined using r8s 1.50 (Sanderson 2003). This program estimates substitution rates and the divergence times of all unfixed nodes from a set of calibrated nodes, and it permits separate rate parameters for different parts of the tree (local molecular clocks). Ages of focal nodes were estimated with six nodes constrained to known ages based on a calibrated mtDNA clock (Pons et al., unpublished data). The absolute ages of that clock were obtained from biogeographic evidence (Barraclough and Vogler 2002; Pons et al. 2004).
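    The basic logic of clock-based dating can be sketched under strong simplifying assumptions. Within one rate partition (a local clock), the rate is taken as the mean node-to-tip path length below a calibrated node divided by that node's age. The age of any other node in the partition is then its own mean node-to-tip length divided by that rate. This is not the algorithm implemented in r8s, which fits rates and times jointly (e.g., by Langley-Fitch or penalized likelihood). The tree format, function names, and numbers below are hypothetical.

```python
from statistics import mean

# Hypothetical tree format: internal node -> list of (child, branch length in
# substitutions/site); tips are nodes without an entry.
Tree = dict[str, list[tuple[str, float]]]


def tip_depths(tree: Tree, node: str, depth: float = 0.0) -> list[float]:
    """Path lengths (substitutions/site) from `node` down to each of its tips."""
    children = tree.get(node, [])
    if not children:
        return [depth]
    depths: list[float] = []
    for child, length in children:
        depths.extend(tip_depths(tree, child, depth + length))
    return depths


def local_clock_rate(tree: Tree, calibrated_node: str, age_years: float) -> float:
    """Rate implied by a calibrated node: mean node-to-tip distance / age."""
    return mean(tip_depths(tree, calibrated_node)) / age_years


def node_age(tree: Tree, node: str, rate: float) -> float:
    """Age of an unfixed node under the same (local) clock rate."""
    return mean(tip_depths(tree, node)) / rate


# Toy example with made-up branch lengths and a made-up 2-MY calibration.
toy = {"root": [("X", 0.01), ("t3", 0.02)], "X": [("t1", 0.01), ("t2", 0.01)]}
rate = local_clock_rate(toy, "root", 2.0e6)
print(rate, node_age(toy, "X", rate))
```

    A separate local clock, as used for the numt clade, would amount to calling `local_clock_rate` on a different subtree with its own calibration.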

    Results

    Mitochondrial rrnL Amplification and Numt Detection

    In some individuals, PCR amplifications with the standard cicindelid 16Sa-Alf1 primer combination, designed to amplify rrnL and adjacent genes, produced poor sequence reads. Their profile indicated coamplification of fragments of different lengths. The sequencing difficulty was limited to a set of closely related individuals from the southeastern part of South Australia. This set was studied in more detail by including all 84 individuals from the 19 populations available from that region (populations 3a, 4, 5, 6, 8, 9, 10, 12c, 13, 14b, 16b, 18, 19, 20, 22, 39, 47, 48, and 49; Pons et al., unpublished data). For three individuals (specimens 6.5, 10.2, and 10.4), we obtained clean reads that showed a deletion of approximately 100 bp at the 3' end of the rrnL gene plus a 1-bp deletion in the adjacent nad1 gene. These deletions were not observed in any other group of cicindelids, indicating that the sequences constitute a nonfunctional copy of this mtDNA region. The newly designed primer Riva16Sb discriminates against the nonfunctional sequences (fig. 1). Using it in combination with 16Sa, we amplified DNA from individuals covering a representative range of Rivacindela. In all cases, we obtained a 617-bp sequence of the rrnL gene that did not differ from the corresponding part of the larger 16Sa-Alf1 fragment, demonstrating that the new primer combination amplified the mtDNA copy. In the group that produced poor sequence reads with the 16Sa-Alf1 combination, the new primer also yielded high-quality PCR products and sequences of the intact rrnL gene. This confirms the presence of a functional copy in these individuals.

    FIG. 1.— Design of primers discriminating between mtDNA and numt. The mitochondrial DNA region coding rrnL, trnL2, and nad1 genes (830 bp) is shown. Primer Riva16Sb was designed to amplify the 3' end of the mitochondrial copy of the rrnL gene only (in combination with primer 16Sa), as it extends into the part deleted in the presumed numt. Primer Riva16S differed in the terminal 5 bp matching the 5' end of the deletion and hence should be selective for the numt sequence. The positions of the primer binding sites are indicated with dark arrows, and the deletion of about 100 bp detected in numt sequences is shown in gray.
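    The discrimination relies on the two primers sharing their 5' portion and differing only at the 3' end, where mismatches hamper extension by the polymerase. A small Python sketch, added here for illustration with hypothetical helper names, compares the two primer sequences given in Materials and Methods. It treats the IUPAC code Y in Riva16Sb as C or T and reports how many 5' positions the primers share and which 3' bases differ.

```python
# IUPAC nucleotide codes (subset sufficient for these primers).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "N": "ACGT",
}

RIVA16SB = "CTGCCAAAGTAAYAATATTCTTC"   # selective for the mtDNA copy
RIVA16S = "CTGCCAAAGTAACAATATTAACAA"   # selective for the numt copy


def compatible(a: str, b: str) -> bool:
    """True if two (possibly degenerate) bases can denote the same nucleotide."""
    return bool(set(IUPAC[a]) & set(IUPAC[b]))


def shared_5prime_length(p1: str, p2: str) -> int:
    """Number of leading (5') positions at which two primers are compatible."""
    n = 0
    for a, b in zip(p1, p2):
        if not compatible(a, b):
            break
        n += 1
    return n


shared = shared_5prime_length(RIVA16SB, RIVA16S)
print(shared, RIVA16SB[shared:], RIVA16S[shared:])
```

    For these sequences, the primers agree over their first 19 bases. Riva16S then ends in 5 bases of its own, matching the terminal 5 bp described in the caption.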

    These individuals were further screened for the presence of the rrnL numt using primer Riva16S, which was designed to amplify selectively those sequences containing the large deletion (fig. 1). Fifty of the 84 individuals showed a fragment of approximately 550 bp, corresponding to the predicted size of the nonfunctional copy. The numt copy was detected in most populations. It was missing from populations 18, 20, 22a, 47, and 49, from a single individual of population 9 (specimen 9.2), and from two individuals of population 19 (19.3 and 19.6). Individuals 13.2, 13.4, 16b.1, 48.3, and 48.4 showed faint bands of different sizes and were not considered further. In individuals in which the numt was not detected, the standard 16Sa-Alf1 primers amplified the authentic rrnL gene and the adjacent trnL2 and nad1, suggesting that numts were not present. Screening of these individuals with the Riva16S primer did not result in amplification, supporting the absence of the numts. Moreover, at a lower annealing temperature, the Riva16S primer produced a PCR fragment corresponding to the authentic mtDNA copy. Although the primer was designed for selective amplification of the numt copy, the mtDNA apparently acted as a secondary template in the absence of the numt. These results further support the conclusion that numts were absent, rather than that PCR failed in these individuals. They also exclude the presence of a similar numt lacking the long deletion.

    Direct sequencing of the numt PCR product obtained with the Riva16S primer yielded clean sequences of 508–510 bp, although in some cases readable electropherograms were obtained only after cloning into multicopy vectors. The analysis of clones revealed two alleles of different lengths in several individuals. Allelic sequences, each found in multiple clones, differed as follows: by one substitution and a 1-base deletion in individual 9.4, by one substitution and a long 26-bp deletion in individual 39.2, and by 12 nucleotide substitutions and a 1-bp deletion in individual 5.5. Minor variants differing by one or two substitutions in singleton clones were also encountered but were attributed to PCR or cloning errors and ignored. In total, the analysis of the numt-containing clade produced 29 different rrnL mtDNA haplotypes in the 84 individuals screened. It also produced 15 different numt alleles in the 50 individuals of this clade that were positive for the numt. Most mtDNA haplotypes were limited to a single individual (19 haplotypes), seven were found in two or three individuals, and three were widespread, occurring in 6, 13, and 31 individuals. Ten of the 15 numt alleles were confined to a single individual, three alleles were found in a few individuals, and the remaining two were widespread, detected in 10 and 16 individuals (Table A1). The average pairwise divergence was 0.46% among mtDNA haplotypes and 1.14% among numt alleles, and the average divergence between mtDNA haplotypes and numts was 3.77%. The numt was not detected in any of 27 representative populations selected from throughout the wider range of the genus Rivacindela, confirming its phylogenetically and geographically narrow distribution.
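    Average pairwise divergences of the kind reported above can be computed as mean uncorrected p-distances. This applies both within one set of sequences and between two sets (mtDNA haplotypes versus numt alleles). The Python sketch below makes an assumption that is not stated in the text: sites with a gap in either sequence are skipped (pairwise deletion). The sequences are toy data, not the Rivacindela haplotypes, and the function names are hypothetical.

```python
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping sites with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def mean_within(seqs: list[str]) -> float:
    """Average divergence over all pairs within one set (e.g. mtDNA haplotypes)."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(group1: list[str], group2: list[str]) -> float:
    """Average divergence over all pairs drawn from two sets (e.g. mtDNA vs numts)."""
    pairs = list(product(group1, group2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


mt = ["AAAA", "AAAT", "AATT"]   # toy haplotypes, not real data
numt = ["CA-A", "CATA"]
print(round(mean_within(mt), 4), round(mean_between(mt, numt), 4))
```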

    Analysis of Nucleotide Substitution Pattern and Secondary Structure

    Patterns of nucleotide substitution in stems and loops were compared between numt and authentic mtDNA haplotypes, based on the hypothetical secondary structure of the rrnL sequence shown in figure 2. The best fit was achieved when the large deletion present in all numt sequences was separated into two portions of 89 and 12 bp, affecting closely adjacent regions. Three further 1-bp deletions found in several individuals were also mapped onto the secondary structure model, together with a long 26-bp deletion (positions 64–89) in allele 2 of individual 19.4 (fig. 2).

    FIG. 2.— The hypothetical secondary structure of the 3' end of the RNA molecule of the rrnL gene of the individual 13.1 based on the previously described models for the related tiger beetle Cicindela dorsalis (Buckley et al. 2000) and the fruit fly Drosophila melanogaster (Gutell et al., unpublished). The two long deletions found in all numts are indicated by boxes with dotted borders, and the deletion of 26-bp length found in allele 2 of individual 19.4 is indicated by a box with a continuous border. Positions indicated by asterisks mark nucleotides that differ from C. dorsalis.

    The small differences in nucleotide composition between mtDNA (72.7% A+T) and numt sequences (74.1%) were not statistically significant. This held both in the χ² test of nucleotide composition in PAUP* (P > 0.99) and in Foster's (2004) test of heterogeneity, which takes the phylogenetic relationships of sequences into account (P = 0.08). Hence, comparisons of the types of changes between mtDNA and numt sequences will not be affected by biased nucleotide composition. The number of variable sites per sequence was also similar (1.5 in mtDNA and 1.7 in numts). Just over half of the positions were designated as stem regions (53.6% and 52.3%). Their A + T content was very similar to that of the loop positions (χ² test of nucleotide heterogeneity in PAUP* not significant, P > 0.99; table 1). These stem positions contributed two thirds of the total number of substitutions in the numts but less than half in mtDNA. Stem positions in the numts showed a much higher percentage of transversions and a dramatically lower proportion of compensatory nucleotide substitutions than the mtDNA sequences (table 1).
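    The categories summarized in Table 1 can be sketched as a comparison of each sequence against a reference at the paired (stem) positions. The Python sketch below is an illustration under stated assumptions:

    - sequences are aligned, and T is read as U;
    - stem pairs are zero-based index pairs taken from a secondary-structure model;
    - Watson-Crick and G·U pairs count as paired;
    - a change is called compensatory when both partners of a pair change and pairing is retained;
    - gapped sites are skipped.

    The definitions used in the original analysis may differ in detail, and the function name is hypothetical.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CU")}
PAIRED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_changes(ref: str, query: str, stem_pairs: list[tuple[int, int]]) -> dict[str, int]:
    """Count substitution types and their effect on stem base pairs."""
    ref, query = ref.upper().replace("T", "U"), query.upper().replace("T", "U")
    counts = dict.fromkeys(
        ["transitions", "transversions", "compensatory", "pairing_kept", "noncompensatory"], 0
    )
    # Substitution type at every ungapped site.
    for r, q in zip(ref, query):
        if r != q and "-" not in (r, q):
            kind = "transitions" if frozenset((r, q)) in TRANSITIONS else "transversions"
            counts[kind] += 1
    # Effect of the changes on each stem base pair.
    for i, j in stem_pairs:
        if "-" in (query[i], query[j]):
            continue  # deleted pair: not scored here
        changed = (ref[i] != query[i], ref[j] != query[j])
        if not any(changed):
            continue
        still_paired = (query[i], query[j]) in PAIRED
        if all(changed) and still_paired:
            counts["compensatory"] += 1
        elif still_paired:
            counts["pairing_kept"] += 1  # e.g. G-C to G-U
        else:
            counts["noncompensatory"] += 1
    return counts


print(classify_changes("GAAAC", "AAAAU", [(0, 4)]))
```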

    Table 1 A + T Composition and Total Number and Percentage of Substitutions Found in the Stems or Loops of the Mitochondrial rrnL Gene in the 29 Sequences of Clade A, the Remaining mtDNA (27), and Numts (15)

    Phylogenetic Position of the Numt Sequences

    Parsimony analysis was conducted to establish the origin of the numt. When numt and authentic mtDNA sequences were analyzed in a single data matrix, several hundred trees of 227 steps were obtained, with 23 nodes resolved in the strict consensus. All trees recovered the numt sequences as a strongly supported monophyletic group (Clade P). This group was positioned within the Eastern Clade of Rivacindela, close to the corresponding mtDNA sequences from the same group of individuals (fig. 3a). However, because the tree was poorly resolved, it was not possible to establish the precise sister group of the numt Clade.
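    Tree lengths such as the 227 steps reported here are parsimony scores: the minimum number of character-state changes a tree requires. As an illustration only (PAUP* uses far more elaborate search and scoring machinery), the Fitch algorithm below computes this score for a fully bifurcating tree. It treats every site as an unordered, equally weighted character. The nested-tuple tree format and the function name are hypothetical.

```python
from typing import Union

Tree = Union[str, tuple]


def fitch_length(tree: Tree, seqs: dict[str, str]) -> int:
    """Minimum number of steps (state changes) on a bifurcating tree.

    Leaves are names in `seqs`; internal nodes are 2-tuples of subtrees.
    """
    def visit(node: Tree) -> tuple[list[set[str]], int]:
        if isinstance(node, str):
            return [{c} for c in seqs[node]], 0
        left_sets, left_cost = visit(node[0])
        right_sets, right_cost = visit(node[1])
        sets, cost = [], left_cost + right_cost
        for a, b in zip(left_sets, right_sets):
            common = a & b
            if common:
                sets.append(common)  # no change needed at this site
            else:
                sets.append(a | b)   # one extra step
                cost += 1
        return sets, cost

    return visit(tree)[1]


seqs = {"a": "AA", "b": "AC", "c": "CC", "d": "CC"}
print(fitch_length((("a", "b"), ("c", "d")), seqs))  # 2
print(fitch_length((("a", "c"), ("b", "d")), seqs))  # 3
```

    Comparing such scores across topologies is what distinguishes, for example, the 227-step unconstrained trees from the 237-step constrained ones.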

    FIG. 3.— (a) Cladogram representing the strict consensus of 540 trees of 227 steps obtained from the analysis of 29 mitochondrial haplotypes of rrnL gene (Clade A) and the 15 numt alleles (Clade P) found in the populations containing the numt. The tree also includes 27 representative mitochondrial sequences from different Rivacindela species of the Australian radiation selected outside of Clade A and Myriochile parasemicincta as out-group. Numbers above nodes are bootstrap support values above 50%. (b) Chronogram representing one of the 48 shortest trees (1,482 steps) obtained from the combined analysis of three mtDNA regions (cob, cox1, and the rrnL and adjacent regions) from the same 56 individuals as in panel (a). Branch lengths were estimated using the GTR + Γ + I model, assuming a single local clock in r8s software and constraining the age of six nodes (numbers below nodes) to those values estimated in a previous study including 468 terminal taxa (Pons et al., unpublished data). The scale bar shows the calibration of absolute ages in MYA. Numbers above nodes are bootstrap support values above 50%, and asterisks mark the nodes not present in the strict consensus.

    Additional phylogenetic signal for improving tree resolution and placing the numt Clade might be obtained by combining the rrnL data with the available cox1, cob, trnL2, and nad1 sequences (Pons et al., unpublished data). The simultaneous analysis of the same set of individuals yielded a tree of 1,482 steps that resolved the large polytomy at the base of the rrnL tree (fig. 3b). Under this topology, all individuals harboring numt sequences were placed in Clade A (fig. 3b), but this clade also contains several individuals without numt sequences. Within this clade, the numt-containing individuals were grouped mostly within subclade A1 (populations 3a, 4, 5, 6, 8, 9, 10, 12c, 13, 14b, 16b, 22, 39, 48, and 49). They were largely missing from its sister Clade A2 (populations 18, 19, 20, and 47; fig. 3b).

    Assuming that this tree is a good reflection of the mtDNA phylogeny, the analysis of rrnL sequences and their numts was repeated with a backbone constraint fitting the topology from the combined mtDNA analysis. This search preserves all relationships inferred in the mtDNA analysis but permits new sequences to be added anywhere in this topology. The search retrieved 24 trees of 237 steps, i.e., 10 steps longer than, and significantly worse than, the unconstrained trees (P = 0.013; SH test). However, this topology may still be accepted as a good hypothesis of the mtDNA phylogeny because it is based on 302 additional informative characters, which added 1,293 steps to the parsimony search. Surprisingly, in this tree the numt sequences were recovered in a position distant from Clade A, which is the expected position if a gene duplication in a common ancestor gave rise to the numt paralog. We also tested the fit of the numt Clade to several positions in the mtDNA tree by constraining it to be the sister of alternative groups, including Clade A and several other major clades (not shown). These searches produced trees of 238 or 239 steps, which were not significantly worse than those obtained with the simple backbone constraint alone (SH test). Hence, the use of additional mtDNA genes could not unequivocally resolve the placement of the numt Clade, but a sister relationship with the numt-containing mtDNA Clade A remains defensible. However, even this placement requires a complex scenario for the origin of the numt and its spread in populations. The numt-containing individuals were not monophyletic based on mtDNA: both subclades A1 and A2 contain individuals with and without numt sequences.

    Age and Nucleotide Substitution Rates of the Mitochondrial and Numt Sequences

    Evolutionary rates in mtDNA and numt sequences were compared based on the inferred ages of Clade A (mtDNA) and Clade P (numt) and the corresponding nucleotide substitution rates since their origin. Absolute ages were obtained from an existing calibration based on biogeographic evidence (see Materials and Methods). The rate of nucleotide substitution for the rrnL sequences alone was 3.78 × 10⁻⁹ substitutions/site/year (0.378% divergence per million years). This was very similar to the rate estimated on the topology obtained from the combined analysis of the three mtDNA regions (4.37 × 10⁻⁹). These values were about three to five times lower than the rates of cox1 (1.67 × 10⁻⁸), cob (2.11 × 10⁻⁸), or the three mtDNA regions combined (1.52 × 10⁻⁸). To calculate the rate of the numt Clade, two independent molecular clocks were applied separately to each DNA lineage. With the numt Clade constrained as sister to Clade A, its nucleotide substitution rate was 1.66 × 10⁻⁸, i.e., about four times faster than that of the corresponding mtDNA lineage, based on an estimated age for both clades of 1.83 MYA. The age estimated for Clade A using rrnL sequences alone was very similar (1.70 MYA).
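    The conversions in this paragraph are simple arithmetic. A short Python sketch, using only rates quoted in the text, converts substitutions/site/year to percent divergence per million years. It follows the convention used here, in which 3.78 × 10⁻⁹ corresponds to 0.378%, and expresses each rate relative to the rrnL rate. The dictionary keys and function name are illustrative labels, not identifiers from the study.

```python
# Substitution rates (substitutions/site/year) as reported in the text.
RATES = {
    "rrnL": 3.78e-9,
    "rrnL, combined-analysis topology": 4.37e-9,
    "cox1": 1.67e-8,
    "cob": 2.11e-8,
    "three mtDNA regions": 1.52e-8,
    "numt clade P (sister to clade A)": 1.66e-8,
}


def percent_per_million_years(rate: float) -> float:
    """Convert substitutions/site/year into % divergence per million years."""
    return rate * 1e6 * 100


for name, rate in RATES.items():
    ratio = rate / RATES["rrnL"]
    print(f"{name:34s} {percent_per_million_years(rate):6.3f} %/MY  {ratio:4.2f}x rrnL")
```

    The numt rate comes out at roughly 4.4 times the rrnL rate, consistent with "about four times faster".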

    Because of the uncertainty in the placement of the numt Clade, we also tested the age calibration under alternative topologies in which the numt Clade was constrained to four different positions in the mtDNA tree. The age of Clade A, and hence the nucleotide substitution rate of the mtDNA rrnL sequences, was very similar under each topological constraint. It deviated little from the above estimates from unconstrained searches of rrnL alone. However, the time of origin of the numt Clade varied greatly depending on its position in the tree. When the numt Clade was constrained to a selection of other deep nodes in the mtDNA tree, its age ranged from 3.02 to 3.96 MYA (corresponding to 8.96 × 10⁻⁹ to 6.85 × 10⁻⁹ substitutions/site/year). An unconstrained search together with rrnL sequences only placed the origin of the numt Clade at 2.22 MYA, with a nucleotide substitution rate of 1.07 × 10⁻⁸. However, this tree topology was highly inconsistent with an origin in a common ancestor with the numt-containing clade.

    Discussion

    Although numts have been found widely, they are still rarely studied in a phylogenetic context. Such a context would permit comparative analysis of the tempo and mode of sequence evolution in a presumably neutral nuclear genome and in the corresponding functional mtDNA. Equally, numts have rarely been studied across a geographical range of populations to investigate how they become established in a lineage and the dynamics of their defunctionalization and gradual deletion from the genome. The genus Rivacindela, confined to isolated salt flat systems of arid Australia, is a suitable system for studying these issues. It constitutes a flock of recently diverged, largely allopatric populations representing various hierarchical levels of divergence and known times of coalescence. To date, no numts have been described from the Coleoptera. The nonfunctional copies of the mitochondrial rrnL gene in Rivacindela nevertheless show all attributes of a nuclear pseudogene, even though their location in the nuclear genome was not directly tested here.

    The presumed numts exhibited extensive deletions not present in functional copies of rrnL in other insects. Their nucleotide substitution pattern showed numerous noncompensatory changes in rRNA stem regions, and several individuals carried two alleles, as expected for a single-copy Mendelian locus. These findings argue against other explanations for the presence of multiple copies of mtDNA genes, including contamination with parasitic or ingested organisms, laboratory artifacts, heteroplasmy, paternal leakage, and duplication of the mtDNA gene (Williams and Knowlton 2001). Although the evidence for a numt is strong, it proved remarkably difficult to determine its precise relationships with the mtDNA lineages and hence to infer the hypothetical ancestor that gave rise to the numt. There were only 49 informative sites in the rrnL gene, plus 17 sites in numt sequences that were invariant in the rrnL gene. The addition of many informative sites from other mtDNA regions improved the resolution of the tree but ultimately did not help to discriminate between several possible positions for the numt Clade. Numt sequences are the product of an idiosyncratic integration of mtDNA into the nuclear genome. They should therefore be detectable as a gene duplication (the origin of a paralog) in the ancestor in which the transposition occurred. For this reason, placement near Clade A, the only lineage containing this element, remains the most plausible scenario among several equally parsimonious possibilities.

    Despite the uncertainties about precise sister relationships, the dense sample of orthologous numt sequences is a unique resource for studying the establishment of these elements in a clade. The most conspicuous finding of this analysis was that numt sequences were distributed inconsistently throughout Clades A1 and A2, and some populations were polymorphic for their presence. All phylogenetic analyses and clock estimates agreed, however, that these sequences had a single origin predating the age of the subgroups in Clades A1 and A2 that lack the numt. This finding might imply a secondary loss of the numts in several parts of Clade A, and perhaps in its immediate mtDNA sister group, which is only very slightly older (fig. 3b). Alternatively, it might reflect difficulty in detecting these elements because of variation in primer-annealing sites. While the latter cannot be excluded, the rates of nucleotide substitution and loss of base pairs are low (see below) and perhaps insufficient to account for the repeated loss of the element.

    If secondary loss is not responsible for this pattern, other processes such as gene flow and the retention of ancestral polymorphisms could be invoked. Populations of Rivacindela are genetically highly structured throughout their range, and most populations were probably separated during the final stage of the drying of Australia, some 400,000–500,000 years ago (Pillans and Bourman 2001). The existing groups are largely the result of vicariance (Pons et al., unpublished data). They are therefore unlikely to experience the gene flow that could explain the observed numt distribution in Clades A1 and A2. Alternatively, lineage sorting after the initial transposition event could have affected the distributions of mtDNA and nuclear markers differently. The time to coalescence of numt sequences may exceed the age of recognizable mtDNA lineages, as expected for nuclear loci because of their greater effective population size. Only 1/(2Ne) of transposition events will be fixed (Sorenson and Fleischer 1996). Hence populations with intermediate frequencies of a numt should also be expected (Zischler et al. 1995; Bensasson, Feldman, and Petrov 2003), and such populations have been observed in some cases (Thomas et al. 1996; Yuan et al. 1999; Ricchetti, Tekaia, and Dujon 2004).
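    The expectation that only 1/(2Ne) of new insertions become fixed is the neutral fixation probability of a single new copy among 2N gene copies. A toy Wright-Fisher simulation in Python reproduces it. It rests on illustrative assumptions: constant population size (so Ne = N), random mating, no selection, and no further insertion or loss events. With N = 10 diploids, roughly 1 in 20 new insertions should reach fixation, while most are lost after passing through intermediate frequencies. The function name and parameters are hypothetical.

```python
import random


def fixation_fraction(n_diploid: int, replicates: int, seed: int = 1) -> float:
    """Fraction of new, neutral single-copy insertions that reach fixation."""
    rng = random.Random(seed)
    copies = 2 * n_diploid  # number of gene copies (2N)
    fixed = 0
    for _ in range(replicates):
        count = 1  # the insertion starts as a single copy
        while 0 < count < copies:
            p = count / copies
            # Binomial sampling of the next generation's copies.
            count = sum(rng.random() < p for _ in range(copies))
        fixed += count == copies
    return fixed / replicates


print(fixation_fraction(10, 10_000), 1 / (2 * 10))
```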

    The time calibration of the mtDNA tree was in good agreement with widely accepted estimates of mtDNA evolution (Hasegawa, Kishino, and Yano 1985; Brower 1994) and also showed the widely observed faster rates in protein-coding genes (Lopez et al. 1997). However, the age estimates for the numt Clade P were complicated and depended heavily on the precise placement of this clade in the mtDNA tree. If Clade P was set as the sister group of Clade A, the rate of nucleotide substitution estimated under a separate local clock was 1.66 × 10⁻⁸ substitutions/site/year, and hence four to five times faster than previous estimates for numts (3 × 10⁻⁹ to 4 × 10⁻⁹; e.g., Lu, Fu, and Zhang 2002) and other nuclear pseudogenes (4 × 10⁻⁹; Li 1997). The rates are lower when other phylogenetic positions of the numt clade are considered but still exceed previous estimates by a factor of 1.5–3. These differences in estimates of neutral substitution rates may be due to methodological differences from previous studies, which lacked precise information about sister relationships, or they may suggest that rates are generally higher in insects. DNA-DNA hybridization studies have established that the average rates of mtDNA and single-copy nuclear DNA evolution are similar in insects, in contrast to mammals, where the mtDNA rate is about 5–10 times higher (Caccone, Amato, and Powell 1988; Sharp and Li 1989). Our results would corroborate these findings and perhaps indicate that the neutral rate in the nuclear genome of insects equals or even exceeds the mtDNA rate.
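    The fold differences quoted above are simple ratios of the rates. A minimal sketch that recomputes them from the values given in the text (the constant and function names are illustrative, not taken from the original analysis):

```python
# Rates quoted in the text, in substitutions/site/year.
NUMT_RATE_CLADE_P = 1.66e-8          # numt Clade P placed as sister to Clade A (local clock)
PREVIOUS_NUMT_RATES = (3e-9, 4e-9)   # range cited from Lu, Fu, and Zhang (2002)
NUCLEAR_PSEUDOGENE_RATE = 4e-9       # value cited from Li (1997)


def fold_difference(rate: float, reference: float) -> float:
    """How many times faster `rate` is than `reference`."""
    return rate / reference


if __name__ == "__main__":
    comparisons = [
        ("numts, upper previous estimate", PREVIOUS_NUMT_RATES[1]),
        ("numts, lower previous estimate", PREVIOUS_NUMT_RATES[0]),
        ("other nuclear pseudogenes", NUCLEAR_PSEUDOGENE_RATE),
    ]
    for label, reference in comparisons:
        print(f"{label}: {fold_difference(NUMT_RATE_CLADE_P, reference):.2f}x")
```

    The ratios come out at roughly 4.2 against the upper previous estimate and roughly 5.5 against the lower one, consistent with the "four to five times" stated in the text.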

    Aside from nucleotide substitutions, variation in numts is greatly affected by indels. We found six deletions, 1–89 bp in length, in the Rivacindela numts, but no insertions. This agrees with studies of neutral changes in Drosophila non-LTR retrotransposons, where deletions outnumber insertions almost 9 to 1 and the average deletion size of 25 bp greatly exceeds the average insertion size of 2.8 bp (Petrov et al. 1998, and references therein). Deletions contribute greatly to the degradation of these pseudogenes and will lead to their eventual demise. The two long adjacent deletions found in all numts, and hence presumably fixed at the species level, must have arisen between 1.83 and 0.46 MYA in the calibrated tree (the internode between the origin of the numt clade and the inferred point of coalescence of the numt sequences). Thus, 101 of 890 bp were lost from the numt within a maximum time span of 1.37 Myr, corresponding to a rate of nucleotide loss of 8.2 × 10⁻⁸ per site per year. Similar rates are obtained from a 26-bp deletion plus two 1-bp deletions closer to the tips of the tree (28 of 789 bp lost in 0.46 Myr, equal to a minimum rate of 7.7 × 10⁻⁸), although these deletions were not found in all individuals, and hence they may not ultimately be fixed in the lineage. Taken at face value, the rates of deletion in the numt clade are about five times higher than the nucleotide substitution rate of 1.66 × 10⁻⁸ substitutions/site/year.
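    The deletion-rate arithmetic above can be checked directly: the proportion of sites deleted is divided by the time span in years. A minimal sketch (the function and constant names are hypothetical) using only the numbers given in the text:

```python
def loss_rate_per_site_per_year(bp_lost: int, bp_total: int, span_myr: float) -> float:
    """Proportion of sites deleted divided by the time span (Myr converted to years)."""
    return (bp_lost / bp_total) / (span_myr * 1e6)


SUBSTITUTION_RATE = 1.66e-8  # numt substitution rate from the local-clock analysis


if __name__ == "__main__":
    # Two long adjacent deletions shared by all numts: 101 of 890 bp
    # lost within the interval 1.83 - 0.46 = 1.37 Myr.
    shared = loss_rate_per_site_per_year(101, 890, 1.83 - 0.46)
    # Deletions near the tips (26 + 1 + 1 bp): 28 of 789 bp lost within 0.46 Myr.
    tips = loss_rate_per_site_per_year(28, 789, 0.46)
    print(f"shared deletions: {shared:.2e} per site per year")
    print(f"tip deletions:    {tips:.2e} per site per year")
    print(f"deletion/substitution ratio: {shared / SUBSTITUTION_RATE:.1f}")
```

    This gives about 8.3 × 10⁻⁸ for the shared deletions, close to the 8.2 × 10⁻⁸ quoted, about 7.7 × 10⁻⁸ for the tip deletions, and a deletion-to-substitution ratio of about 5.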

    In comparisons between groups of organisms, the ratio of deletion rate to nucleotide substitution rate varies widely, from 51 in Arabidopsis thaliana, 4.2–5.9 in Caenorhabditis elegans, 4.5 in Drosophila, and 3–4 in Strongylocentrotus purpuratus to much lower values of 0.13 in rodents and mammals, 0.34 in crickets, and 0.06 in grasshoppers (Petrov 2002b; Britten et al. 2003). This variation appears to be driven mostly by differences in the size spectrum of deletions rather than by the frequency at which deletions occur, and it is related to genome size, as the rate of DNA loss and the number of numts (and other nuclear pseudogenes) present in the nucleus are correlated (Bensasson et al. 2001a, 2001b; Petrov 2002a; Gregory 2004). Species with small genomes such as D. melanogaster (0.2 pg) and C. elegans (0.1 pg) have higher rates of DNA loss and consequently harbor fewer numts and other pseudogenes than species with large genomes such as humans (3.5 pg) or grasshoppers (8.2 pg). The size distribution of the deletions we found here for a beetle is closer to that of Drosophila than to those of grasshoppers and crickets. The low prevalence of numts in Rivacindela and many other groups of beetles (unpublished data), which have small genomes (0.5–0.6 pg; Gregory 2001), would confirm this relationship between deletion rate and genome size. The fact that numt sequences have not previously been described in the Coleoptera may therefore indicate that they are truly rare rather than overlooked. Numt sequences were found to be essentially absent from two fully sequenced genomes of Diptera, which also harbor few nuclear pseudogenes (Richly and Leister 2004). Although these findings are not fully conclusive, because genome sequencing projects routinely screen out mtDNA sequences, the infrequent detection of numts in holometabolan insects suggests that they are rarer in this group than in other insect groups with larger genomes.
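    Because the published deletion-to-substitution ratios span roughly three orders of magnitude, one way to see where the Rivacindela value (about 8.2 × 10⁻⁸ / 1.66 × 10⁻⁸, or roughly 4.9) falls is to rank the listed taxa by distance on a log scale. This is a minimal sketch under stated assumptions: ranges in the text are replaced by their midpoints, and the names are illustrative. Note that it compares rate ratios, not the deletion size spectra discussed in the paragraph.

```python
import math

# Deletion/substitution rate ratios as listed in the text (Petrov 2002b; Britten et al. 2003).
# Assumption for this illustration only: ranges are reduced to their midpoints.
PUBLISHED_RATIOS = {
    "Arabidopsis thaliana": 51.0,
    "Caenorhabditis elegans": (4.2 + 5.9) / 2,
    "Drosophila": 4.5,
    "Strongylocentrotus purpuratus": (3 + 4) / 2,
    "rodents and mammals": 0.13,
    "crickets": 0.34,
    "grasshoppers": 0.06,
}

# Rivacindela numt: deletion rate divided by substitution rate (values from the text).
RIVACINDELA_RATIO = 8.2e-8 / 1.66e-8


def rank_by_log_distance(query: float, ratios: dict[str, float]) -> list[str]:
    """Return taxa ordered by |log10(ratio) - log10(query)|, closest first."""
    target = math.log10(query)
    return sorted(ratios, key=lambda taxon: abs(math.log10(ratios[taxon]) - target))


if __name__ == "__main__":
    for taxon in rank_by_log_distance(RIVACINDELA_RATIO, PUBLISHED_RATIOS):
        print(f"{taxon:32s} {PUBLISHED_RATIOS[taxon]:6.2f}")
```

    Under these assumptions the Rivacindela ratio sits next to the small-genome taxa C. elegans and Drosophila and farthest from grasshoppers, in line with the genome-size trend described above.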

    Conclusions

    The evolutionary analysis of the Rivacindela numt sequences revealed the complicated history of origin and possible loss of the element in closely related populations, a faster-than-expected rate of evolution, and a high incidence of point mutations and deletions leading to the degradation of the element. A detailed phylogenetic analysis and dense sampling of closely related populations were necessary for these inferences. Our study adds to the growing number of case studies investigating the evolutionary dynamics of numt sequences. As we learn more about these elements, it appears that natural populations and major organismal lineages differ greatly in their abundance. Recent proposals to use mtDNA for a universal species identification system (Tautz et al. 2002; Hebert et al. 2003) could be seriously compromised by the presence of numts. Basic knowledge about their prevalence and evolutionary dynamics is therefore necessary for DNA-based taxonomic studies of this kind. From the limited data currently available, some groups such as the holometabolan insects appear less affected than others, but broader surveys will be needed to confirm this. Comparative studies at different taxonomic levels will also be needed to identify the cell-biological and genetic factors determining numt abundance and their significance for the study of genome evolution.

    Appendix

    Table A1 List of the Mitochondrial Haplotypes of the rrnL Gene in Clade A (Mx.x, Mx.x, ...) and Numt Alleles (Px.x, Px.x, ...) That Were Found in More Than One Individual Studied

    Acknowledgements

    We thank D. Broszka, D. Pearson, D. Sumlin, and F. Cassola for specimens and F. Kopliku for help with DNA sequencing. We are also indebted to Peter Foster for assessing the presence of nucleotide composition bias in our data using sequence simulations. The difficulties in obtaining clean sequence reads in Rivacindela that led to the discovery of the numt were first observed by D. Duran. This study was supported by Natural Environment Research Council grant NER/A/S/2000/00489 to the authors and T. Barraclough.

    References

    Arctander, P. 1995. Comparison of a mitochondrial gene and a corresponding nuclear pseudogene. Proc. R. Soc. Lond. B Biol. Sci. 262:13–19.

    Barraclough, T. G., and A. P. Vogler. 2002. Recent diversification rates in North American tiger beetles (genus Cicindela). Mol. Biol. Evol. 19:1706–1716.

    Bensasson, D., M. W. Feldman, and D. A. Petrov. 2003. Rates of DNA duplication and mitochondrial DNA insertion in the human genome. J. Mol. Evol. 57:343–354.

    Bensasson, D., D. A. Petrov, D. X. Zhang, D. L. Hartl, and G. M. Hewitt. 2001a. Genomic gigantism: DNA loss is slow in mountain grasshoppers. Mol. Biol. Evol. 18:246–253.

    Bensasson, D., D.-X. Zhang, D. L. Hartl, and G. M. Hewitt. 2001b. Mitochondrial pseudogenes: evolution's misplaced witnesses. Trends Ecol. Evol. 16:314–321.

    Bensasson, D., D.-X. Zhang, and G. M. Hewitt. 2000. Frequent assimilation of mitochondrial DNA by grasshopper nuclear genomes. Mol. Biol. Evol. 17:406–415.

    Boore, J. L. 2001. Mitochondrial gene arrangement source guide, version 6.0. http://evogen.jgi.doe.gov/second_levels/mitochondria/MGA_Guide.html. Department of Energy Joint Genome Institute, Walnut Creek, Calif.

    Britten, R. J., L. Rowen, J. Williams, and R. A. Cameron. 2003. Majority of divergence between closely related DNA samples is due to indels. Proc. Natl. Acad. Sci. USA 100:4661–4665.

    Brower, A. V. Z. 1994. Rapid morphological radiation and convergence among races of the butterfly Heliconius erato inferred from patterns of mitochondrial-DNA evolution. Proc. Natl. Acad. Sci. USA 91:6491–6495.

    Buckley, T. R., C. Simon, P. K. Flook, and B. Misof. 2000. Secondary structure and conserved motifs of the frequently sequenced domains IV and V of the insect mitochondrial large subunit rRNA gene. Insect Mol. Biol. 9:565–580.

    Caccone, A., G. D. Amato, and J. R. Powell. 1988. Rates and patterns of scnDNA and mtDNA divergence within the Drosophila melanogaster subgroup. Genetics 118:671–683.

    DeWoody, J. A., R. K. Chesser, and R. J. Baker. 1999. A translocated mitochondrial cytochrome b pseudogene in voles (Rodentia: Microtus). J. Mol. Evol. 48:380–382.

    Foster, P. G. 2004. Modeling compositional heterogeneity. Syst. Biol. 53:485–495.

    Gellissen, G., and G. Michaelis. 1987. Gene transfer: mitochondria to nucleus. Ann. NY Acad. Sci. 503:391–401.

    Gregory, T. R. 2001. Animal genome size database. http://www.genomesize.com/. Accessed July 2004.

    ———. 2004. Insertion–deletion biases and the evolution of genome size. Gene 324:15–34.

    Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human–ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.

    Hebert, P. D. N., A. Cywinska, S. H. Ball, and J. R. deWaard. 2003. Biological identifications through DNA barcodes. Proc. R. Soc. Lond. B Biol. Sci. 270:313–321.

    Li, W.-H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.

    Lopez, J. V., M. Culver, J. C. Stephens, W. E. Johnson, and S. J. O'Brien. 1997. Rates of nuclear and cytoplasmic mitochondrial DNA sequence divergence in mammals. Mol. Biol. Evol. 14:277–286.

    Lopez, J. V., N. Yuhki, R. Masuda, W. Modi, and S. J. O'Brien. 1994. Numt, a recent transfer and tandem amplification of mitochondrial DNA to the nuclear genome of the domestic cat. J. Mol. Evol. 39:174–190.

    Lu, X. M., Y. X. Fu, and Y. P. Zhang. 2002. Evolution of mitochondrial cytochrome b pseudogene in genus Nycticebus. Mol. Biol. Evol. 19:2337–2341.

    Mourier, T., A. J. Hansen, E. Willerslev, and P. Arctander. 2001. The human genome project reveals a continuous transfer of large mitochondrial fragments to the nucleus. Mol. Biol. Evol. 18:1833–1837.

    Olson, L. E., and A. D. Yoder. 2002. Using secondary structure to identify ribosomal numts: cautionary examples from the human genome. Mol. Biol. Evol. 19:93–100.

    Petrov, D. A. 2002a. DNA loss and evolution of genome size in Drosophila. Genetica 115:81–91.

    ———. 2002b. Mutational equilibrium model of genome size evolution. Theor. Popul. Biol. 61:531–544.

    Petrov, D. A., Y.-C. Chao, E. C. Stephenson, and D. L. Hartl. 1998. Pseudogene evolution in Drosophila suggests a high rate of DNA loss. Mol. Biol. Evol. 15:1562–1567.

    Petrov, D. A., and D. L. Hartl. 1999. Patterns of nucleotide substitution in Drosophila and mammalian genomes. Proc. Natl. Acad. Sci. USA 96:1475–1479.

    Pillans, B., and R. Bourman. 2001. Mid Pleistocene arid shift in southern Australia, dated by magnetostratigraphy. Aust. J. Soil Res. 39:89–98.

    Pons, J., T. G. Barraclough, K. Theodorides, A. Cardoso, and A. P. Vogler. 2004. Using exon and intron sequences of the gene Mp20 to resolve basal relationships in Cicindela (Coleoptera: Cicindelidae). Syst. Biol. 53:554–570.

    Posada, D., and K. A. Crandall. 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics 14:817–818.

    Ricchetti, M., F. Tekaia, and B. Dujon. 2004. Continued colonization of the human genome by mitochondrial DNA. PLoS Biol. 2:E273.

    Richly, E., and D. Leister. 2004. NUMTs in sequenced eukaryotic genomes. Mol. Biol. Evol. 21:1081–1084.

    Sanderson, M. J. 2003. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19:301–302.

    Sharp, P. M., and W.-H. Li. 1989. On the rate of DNA sequence evolution in Drosophila. J. Mol. Evol. 28:398–402.

    Simmons, M. P., and H. Ochotorena. 2000. Gaps as characters in sequence-based phylogenetic analyses. Syst. Biol. 49:369–381.

    Simon, C., F. Frati, A. Beckenbach, B. Crespi, H. Liu, and P. Flook. 1994. Evolution, weighting, and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved polymerase chain reaction primers. Ann. Entomol. Soc. Am. 87:651–701.

    Sorenson, M. D., and R. C. Fleischer. 1996. Multiple independent transpositions of mitochondrial DNA control region sequences to the nucleus. Proc. Natl. Acad. Sci. USA 93:15239–15243.

    Sunnucks, P., and D. F. Hales. 1996. Numerous transposed sequences of mitochondrial cytochrome oxidase I–II in aphids of the genus Sitobion (Hemiptera: Aphididae). Mol. Biol. Evol. 13:510–524.

    Swofford, D. L. 2002. PAUP*: phylogenetic analysis using parsimony. Version 4.0b. Sinauer Associates, Sunderland, Mass.

    Tautz, D., P. Arctander, A. Minelli, R. H. Thomas, and A. P. Vogler. 2002. DNA points the way ahead in taxonomy. Nature 418:479.

    Thalmann, O., J. Hebler, H. N. Poinar, S. Paabo, and L. Vigilant. 2004. Unreliable mtDNA data due to nuclear insertions: a cautionary tale from analysis of humans and other great apes. Mol. Ecol. 13:321–335.

    Thomas, R. H., H. Zischler, S. Paabo, and M. Stoneking. 1996. Novel mitochondrial DNA insertion polymorphism and its usefulness for human populations studies. Hum. Biol. 68:847–854.

    Vogler, A. P., C. B. Knisley, S. B. Glueck, J. M. Hill, and R. DeSalle. 1993. Using molecular and ecological data to diagnose endangered populations of the puritan tiger beetle Cicindela puritana. Mol. Ecol. 2:375–383.

    Williams, S. T., and N. Knowlton. 2001. Mitochondrial pseudogenes are pervasive and often insidious in the snapping shrimp genus Alpheus. Mol. Biol. Evol. 18:1484–1493.

    Woischnik, M., and C. T. Moraes. 2002. Pattern of organisation of human mitochondrial pseudogenes in the nuclear genome. Genome Res. 12:885–893.

    Yuan, J. D., J. X. Shi, G. X. Meng, L. G. An, and G. X. Hu. 1999. Nuclear pseudogenes of mitochondrial DNA as a variable part of the human genome. Cell Res. 9:281–290.

    Zhang, D.-X., and G. M. Hewitt. 1996. Nuclear integrations: challenges for mitochondrial DNA markers. Trends Ecol. Evol. 11:247–251.

    Zischler, H., H. Geisert, A. von Haeseler, and S. Paabo. 1995. A nuclear fossil of the mitochondrial D-loop and the origin of modern humans. Nature 378:489–492.

    (Joan Pons and Alfried P. )