Source: 《分子生物学进展》, 2005, Issue 2
Isolation and Molecular Evolution of the Selenocysteine tRNA (Cf TRSP) and RNase P RNA (Cf RPPH1) Genes in the Dog Family, Canidae
     Department of Ecology and Evolutionary Biology, University of California, Los Angeles

    Correspondence: E-mail: carolyne_bard@yahoo.com.

    Abstract

    In an effort to identify rapidly evolving nuclear sequences useful for phylogenetic analyses of closely related species, we isolated two genes transcribed by RNA polymerase III (pol III), the selenocysteine tRNA gene (TRSP) and an RNase P RNA (RPPH1) gene from the domestic dog (Canis familiaris). We focus on genes transcribed by pol III because their coding regions are small (generally 100–300 base pairs [bp]) and their essential promoter elements are located within a couple of hundred bps upstream of the coding region. Therefore, we predicted that regions flanking the coding region and outside of the promoter elements would be free of constraint and would evolve rapidly. We amplified TRSP from 23 canids and RPPH1 from 12 canids and analyzed the molecular evolution of these genes and their utility as phylogenetic markers for resolving relationships among species in Canidae. We compared the rate of evolution of the gene-flanking regions to other noncoding regions of nuclear DNA (introns) and to the mitochondrial encoded COII gene. Alignment of TRSP from 23 canids revealed that regions directly adjacent to the coding region display high sequence variability. We discuss this pattern in terms of functional mechanisms of transcription. Although the flanking regions evolve no faster than introns, both genes were found to be useful phylogenetic markers, in part, because of the synapomorphic indels found in the flanking regions. Gene trees generated from the TRSP and RPPH1 loci were generally in agreement with the published mtDNA phylogeny and are the first phylogeny of Canidae based on nuclear sequences.

    Key Words: selenocysteine tRNA · Canidae · molecular evolution · phylogeny · transcriptional start site · polymorphism

    Introduction

    Until recently, molecular phylogenetic studies have been dominated by analyses of sequence variation in mitochondrial (mt) DNA. MtDNA sequences have proven extremely useful in phylogenetic analyses because of their rapid rate of evolution and lack of recombination. A drawback of mtDNA analyses is that mtDNA represents essentially one locus, and thus the evolutionary history inferred from this locus may not be equivalent to the species' evolutionary history. Recently, to remedy this problem, nuclear loci from both coding and noncoding regions of the genome have been employed in phylogenetic analyses (Mitchell, Mitter, and Regier 2000; Matthee et al. 2001; Springer et al. 2001; Koepfli and Wayne 2003). Nuclear loci have been found to evolve at a slower rate than mtDNA. Though this is a useful feature for resolving deeper nodes in a phylogeny, we were interested in finding more rapidly evolving regions in the nuclear genome in order to study the evolutionary history of species in the dog family Canidae, a family with a relatively recent evolutionary history. Identifying regions of rapidly evolving nuclear DNA with a rate comparable to that of mtDNA would provide a useful tool for phylogenetic analyses of closely related species.

    Many different regions of the nuclear genome have been used to analyze phylogenetic relationships, including the transcription units of each of the three nuclear RNA polymerases found in the eukaryotic cell (Hillis and Dixon 1991; Olsen and Woese 1993; Suzuki, Moriwaki, and Sakurai 1994; Sbisa et al. 1996; Fiedorow and Szweykowska-Kulinska 1998; Becerra 2003 and refs. above). We chose to focus on genes transcribed by RNA polymerase III (pol III) because of the compact nature of their promoters. The essential promoter elements occur either within the coding region or within a couple of hundred base pairs upstream of the transcriptional start site (promoter structures reviewed in Geiduschek and Kassavetis [1992]). Mutational analyses employing in vivo transcriptional systems of pol III–transcribed genes containing upstream promoter elements (consisting of the TATA box, PSE, and DSE; fig. 1) suggest that regions between promoter elements can tolerate mutations and small insertions (varying the length between the PSE and DSE) (Das et al. 1988; Lobo and Hernandez 1989; Carbon and Krol 1991; Myslinski, Krol, and Carbon 1992; Myslinski et al. 1993). Transcriptional termination by pol III is specified by an extremely simple termination signal, consisting of four or more thymidylate residues in the noncoding strand (Bogenhagen and Brown 1981). In vivo transcription experiments with the Xenopus TRSP gene (gene symbol TRSP, transfer RNA phosphoserine) indicate that removal of the 3' flank does not affect the level of transcription (Carbon and Krol 1991). Several genes transcribed by pol III that have been isolated from human and mouse have a repetitive element directly downstream of the terminator, suggesting that no elements essential for transcription reside in the 3' flank (Murphy, Tripodi, and Melli 1986; Chang and Clayton 1989; Baer et al. 1990).
Given the relative simplicity of the pol III transcription system, DNA sequences upstream of the DSE and downstream of the terminator (and perhaps between the DSE and PSE) are candidate regions that might be under minimal selective pressure because they lie outside the regions that are essential for transcriptional activity of the gene. Another reason for focusing on gene-flanking regions of DNA is the absence of transcription-coupled DNA repair (Svejstrup 2002; Green et al. 2003).

    FIG. 1.— Bar graph representation of the number of sequences that differ at each position from the consensus sequence. Each species is represented by one sequence. (a) TRSP from 23 canid species and (b) RPPH1 from 12 canid species in the data set. The numbers on the x-axis indicate the position along the gene, where +1 corresponds to the putative transcriptional initiation site. The boxes underneath the x-axis indicate promoter elements: DSE, PSE, TATA, and the coding region, for tRNA(a) and RPPH1(b) genes. The y-axis indicates the number of sequences that differ from the consensus. The region highlighted in gray corresponds to the adjacent flanking region used in the statistical analysis. DSE = distal sequence element, PSE = proximal sequence element, T = terminator region.

    Although other phylogenetic studies have been carried out with genes transcribed by RNA polymerase III, these have used either the coding region, in which the calculated rate is too slow to resolve relationships between canids (0.3–12 MYA), or multi-copy genes, which may confound phylogenetic analyses (Sanderson and Doyle 1992; Suzuki, Moriwaki, and Sakurai 1994; Sbisa et al. 1996; Fiedorow and Szweykowska-Kulinska 1998; Wasko et al. 2001; Becerra 2003, and refs. therein). We chose to target genes reported to be single copy, namely TRSP and RPPH1 (gene symbol RPPH1, ribonuclease P RNA component H1), to ensure that orthologous regions of DNA are compared (Baer et al. 1990; Lee et al. 1990; Sbisa et al. 1996). In addition, we included the flanking regions of these genes because they may evolve at a faster rate than the coding region.

    TRSP, the gene encoding selenocysteine tRNA, is ubiquitous within the animal kingdom (Lee et al. 1989, 1990). The gene has been isolated from a number of vertebrates. A pseudogene was also found in the human and rabbit genomes, and two functional selenocysteine tRNA genes were found in zebrafish (Hatfield, Dudock, and Eden 1983; O'Neill et al. 1985; Pratt et al. 1985; Ohama et al. 1994; Bosl et al. 1995; Kolker et al. 1995; Xu et al. 1999a, 1999b).

    Ribonuclease P (RNase P) is an essential ribonucleoprotein enzyme that processes the 5'-leader sequences of precursor tRNA molecules (reviewed in Frank and Pace [1998]; Xiao et al. [2002]). While the RNA component of the bacterial RNase P is independently catalytically active, the H1 RNA subunit and two protein subunits are required to reconstitute human RNase P activity (Mann et al. 2003). Most studies on the RNA subunit have focused on determining accurate models of the secondary and tertiary structure to understand the difference in catalytic capability between the bacterial and nonbacterial RNA subunit through comparative phylogenetic analysis (Frank and Pace 1998). As a result, full or partial genes for RNase P RNA have been isolated from many species (Chen and Pace 1997; Pitulle et al. 1998; Frank et al. 2000). In vertebrates, RPPH1 has been isolated from primates, rodents, amphibians, reptiles, and fish, but only the flanking region of human and mouse has been analyzed (Baer et al. 1990; Doria et al. 1991; Altman, Wesolowski, and Puranam 1993; Eder et al. 1996; Pitulle et al. 1998; Ame et al. 2001; Myslinski et al. 2001).

    We isolate and characterize the TRSP and RPPH1 genes from Canis familiaris and describe the molecular evolution of these RNA encoding genes and their flanking regions within Canidae. We evaluate the utility of these genes for phylogenetic analyses and compare results to those based on the mt-encoded cytochrome oxidase (COII) gene.

    Materials and Methods

    Samples

    We sampled 23 species of Canidae representing 14 genera (table 1; sample source is listed in table S1 of the Supplementary Materials online). Some samples were received as extracted DNA. For tissue or blood samples, total genomic DNA was isolated with standard extraction protocols using phenol/chloroform followed by precipitation with ethanol (Sambrook, Fritsch, and Maniatis 1989).

    Table 1 Taxa Used in This Study: Species, Abbreviation, and Common Name

    Isolation of TRSP

    The 87-bp coding region of the Canis familiaris TRSP gene was amplified by polymerase chain reaction (PCR) with primers derived from the consensus sequence of the human and rabbit genes (O'Neill et al. 1985; Pratt et al. 1985). The PCR product was randomly labeled with α-32P dCTP and used as a probe to screen a dog genomic lambda phage library (Stratagene, La Jolla, Calif.). Duplicate nylon membrane filters were lifted from the library plates, annealed to the dog TRSP probe in Rapid-Hyb (Amersham) for 4 h at 65°C, washed at low stringency (2x SSC/0.1% SDS at 25°C), and then washed at higher stringency (0.1x SSC/0.1% SDS at 52°C). DNA was prepared from the lambda phage clones that were positive through three rounds of screening and then sequenced.

    Isolation of RPPH1

    An approximately 300-bp coding region of the rat RPPH1 gene was amplified by PCR with two primers derived from the primate and mouse consensus sequence (Sbisa et al. 1996). The PCR product was labeled for screening, as above. Duplicate nylon membrane filters were lifted from the above library plates, annealed to the rat RPPH1 probe in 6x SSPE/0.5% SDS/5x Denhardt's/50% formamide at 42°C overnight and washed at low and higher stringency as stated above. DNA was prepared from the lambda phage clones that were positive through three rounds of plaque screening, and then sequenced.

    Conditions and primers for amplification of Canidae TRSP, RPPH1 flanking sequences, and the COII gene are listed in table S2 of the Supplementary Material online.

    Statistical Analysis of Variability

    The distribution-free Wilcoxon signed rank test (Hollander and Wolfe 1999, pp 36–51) was carried out to determine whether the number of variable nucleotide positions in the flanking region directly adjacent to the RNA encoding gene differed significantly from the number in the remaining flanking region.

    Each sequence was divided into two groups: (1) the adjacent region and (2) the flanking region. For TRSP, the adjacent region consisted of the 16 nucleotides directly upstream of the transcription start site and the 11 nucleotides between the end of the tRNA coding region and the terminator (27 nucleotides total); the flanking region group consisted of 606 nucleotides (promoter elements excluded). For RPPH1, the adjacent region consisted only of the 16 nucleotides directly upstream of the transcription start site because there is no additional sequence between the coding region and the terminator. The number of nucleotides in the flanking region group differed among individual sequences (ranging from 144 to 275 nucleotides) because of differences in the amount of missing data. In addition, the analysis was repeated with all positions containing missing data excluded (leaving 140 to 150 nucleotides in the flanking region group).

    For each gene, a consensus sequence was generated and compared to each individual sequence, excluding indels and/or missing data. The number of mismatches in each of the two groups described above was recorded. For each sequence, the ratio of mismatches (No. of nucleotide differences/total No. of nucleotides) in the adjacent region was subtracted from the ratio of mismatches in the flanking region, and the differences were ranked from smallest to largest. T+ (Wilcoxon's sum of positive ranks) and T– (Wilcoxon's sum of negative ranks) were calculated, and the corresponding probability (P value) of rejecting the null hypothesis (H0: no difference in mutation rate between the flanking region directly adjacent to the RNA encoding gene and the remaining flanking region) was obtained from a table.
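    The procedure described above can be sketched in Python. This is an illustrative sketch only, not the script used in the study: the function names, the symbols treated as gaps or missing data, the majority-rule consensus, the ranking by absolute difference, the use of average ranks for ties, and the dropping of zero differences are all assumptions. The P value lookup from a table is not reproduced.

```python
from collections import Counter

GAP_OR_MISSING = set("-?N")  # assumed symbols for indels and missing data


def consensus(seqs):
    """Majority-rule consensus: most common non-gap base in each aligned column."""
    cons = []
    for column in zip(*seqs):
        bases = [b for b in column if b not in GAP_OR_MISSING]
        cons.append(Counter(bases).most_common(1)[0][0] if bases else "N")
    return "".join(cons)


def mismatch_ratio(seq, cons, positions):
    """Mismatches divided by compared sites, skipping gaps and missing data."""
    compared = [(seq[i], cons[i]) for i in positions
                if seq[i] not in GAP_OR_MISSING and cons[i] not in GAP_OR_MISSING]
    if not compared:
        return 0.0
    return sum(a != b for a, b in compared) / len(compared)


def signed_rank_sums(differences):
    """Return (T_plus, T_minus): rank sums of positive and negative differences.

    Differences are ranked by absolute value; zeros are dropped and tied
    absolute values share their average rank (both are assumptions).
    """
    nonzero = sorted((d for d in differences if d != 0), key=abs)
    t_plus = t_minus = 0.0
    i = 0
    while i < len(nonzero):
        j = i
        while j < len(nonzero) and abs(nonzero[j]) == abs(nonzero[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for d in nonzero[i:j]:
            if d > 0:
                t_plus += avg_rank
            else:
                t_minus += avg_rank
        i = j
    return t_plus, t_minus


def region_differences(seqs, adjacent, flanking):
    """Per-sequence difference: flanking ratio minus adjacent ratio (as in the text)."""
    cons = consensus(seqs)
    return [mismatch_ratio(s, cons, flanking) - mismatch_ratio(s, cons, adjacent)
            for s in seqs]
```

    With real data, `adjacent` and `flanking` would be lists of alignment column indices for the two groups, and the smaller of the two rank sums would be compared against a signed rank table for the number of nonzero differences.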

    Evolutionary Rate

    The evolutionary rate (substitutions per site per year) was calculated using r = K/2T, where K is the number of substitutions between two homologous sequences divided by the length of the sequence, and T is the time of divergence between the two sequences (Li and Graur 1991). To compare the evolutionary rate of the TRSP and RPPH1 flanking regions to other noncoding nuclear sequences, we used sequences from five introns (CHRNA1, CYPIA, fes, GHR, and VTN) that we generated for a more extensive study of Canidae systematics (manuscript in preparation). For TRSP and intron loci, the evolutionary rate was calculated between the red fox and the wolf, using 10 million years ago (MYA) as T, and between the wild dog and the wolf, using 6.7 MYA as T (Wayne et al. 1997). The evolutionary rate for RPPH1 was only calculated for the wolf-wild dog pair because of the failure to amplify the RPPH1 locus in the red fox. The promoter element sequences were removed when calculating the rate of the 5' flank because we were interested in calculating the rate of the putative "unconstrained" sequences.
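    As a minimal sketch of the rate formula r = K/2T, the Python below computes K as an uncorrected proportion of differing sites and then the rate. The function names, the symbols skipped as gaps or missing data, and the numbers in the usage note are hypothetical; whether any multiple-hit correction was applied to K in the study is not stated here, so none is assumed.

```python
def substitutions_per_site(seq_a, seq_b, skip="-?N"):
    """K: differing sites divided by compared sites (uncorrected, an assumption)."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b)
             if a not in skip and b not in skip]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def evolutionary_rate(k, divergence_time_years):
    """r = K / (2T), in substitutions per site per year.

    The factor 2 accounts for change accumulating along both lineages
    since their divergence T years ago.
    """
    return k / (2 * divergence_time_years)
```

    For example, a hypothetical K of 0.02 with T = 10 million years gives `evolutionary_rate(0.02, 10e6)`, which is 0.02 / (2 × 10^7) = 1 × 10^-9 substitutions per site per year.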

    The Wilcoxon rank sum test (Hollander and Wolfe 1999, pp 106–113) was carried out to determine whether the evolutionary rate of the flanking regions of TRSP and RPPH1 was significantly faster than the evolutionary rates of the five introns sequenced. The probability (P value) of rejecting the null hypothesis of no difference in evolutionary rate between the flanking regions and introns was obtained from a table.
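    The rank sum comparison can be illustrated with a small exact test in Python. This is a sketch under stated assumptions rather than the table lookup used in the study: the function names are hypothetical, ties receive average ranks, and the P value is the one-sided exact probability obtained by enumerating every way of assigning the pooled ranks to the second sample (feasible only for small samples such as a handful of introns and flanks).

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks in input order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in order[i:j]:
            ranks[k] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return ranks


def rank_sum_test(sample_x, sample_y):
    """Return (W, P): W is the rank sum of sample_y in the pooled data and
    P is the exact one-sided probability of a rank sum at least as large."""
    ranks = average_ranks(list(sample_x) + list(sample_y))
    w_obs = sum(ranks[len(sample_x):])
    sums = [sum(c) for c in combinations(ranks, len(sample_y))]
    p_value = sum(s >= w_obs for s in sums) / len(sums)
    return w_obs, p_value
```

    Here `sample_x` would hold the intron rates and `sample_y` the flanking-region rates, so a small P indicates that the flanking regions tend to rank higher.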

    Phylogenetic Analyses

    All TRSP and RPPH1 and all but four COII sequences were generated in this study (accession numbers are given in table S1 of the Supplementary Material online). For the wild dog and the raccoon dog, multiple attempts with multiple samples for PCR amplification of the COII gene generated products containing out-of-frame indels. These may represent nuclear pseudogenes (Ishiguro et al. 2002) and therefore were not included in the data set. GenBank sequences for the raccoon dog (Accession No. AF028221), the wild dog (Accession No. AF028219), the dog (Accession No. U96639), and the black bear (Accession No. AF303109) were used to complete the data set. Sequences were initially aligned using ClustalX v. 1.81 (Jeanmougin et al. 1998) with subsequent adjustment to minimize indels. In the case where heterozygosity was due to an indel, alleles are represented as A1 and A2 (in the RPPH1 data set). Indels were coded using the scheme of Barriel (1994). ModelTest (Posada and Crandall 1998) was used to determine the model and parameters of nucleotide substitution that best fit the data set. Each of the species is represented by 672–709, 548–674, and 684 nucleotides of sequence data in the TRSP, RPPH1, and COII data sets, respectively. The TRSP, RPPH1, and COII data sets contain 0.24%, 5.3%, and 0% missing data, respectively. Differences are mainly due to the quality of the sequence at the end of the fragment. Phylogenetic analyses were carried out with a data set where each species is represented by one sequence. Using different individuals of the same species did not change the highly supported nodes. Outgroups included the black bear (Ursus americanus) for the TRSP and COII data sets and the gray fox (Urocyon cinereoargenteus) for the RPPH1 data set. Phylogenetic trees were generated using minimum evolution (ME) and maximum parsimony (MP) with PAUP v. 4.0b (Swofford 2003). For parsimony analyses of nuclear data, all characters were weighted equally (unweighted).
For the analysis of the COII data set, we used a transition/transversion ratio of 6.4, estimated by averaging the ratios for all pairwise comparisons among the ingroup taxa, and a value of 12.5, estimated by maximum likelihood (ML) analysis. Both weighting schemes produced an MP strict consensus tree with the identical topology. Heuristic searches with 100 replicates of random stepwise addition and tree bisection-reconnection branch swapping were used. Nodal support was evaluated using 1,000 bootstrap (BS) pseudoreplicates (Felsenstein 1985).
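    The averaging of pairwise transition/transversion ratios mentioned above can be sketched as follows. The function names are hypothetical, the counts are uncorrected observed differences, and skipping pairs with no transversions (where the ratio is undefined) is an assumption; the ML estimate is not reproduced.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def count_ti_tv(seq_a, seq_b):
    """Count observed transitions and transversions between two aligned sequences."""
    ti = tv = 0
    for a, b in zip(seq_a, seq_b):
        if a == b or a not in BASES or b not in BASES:
            continue  # identical sites, gaps and ambiguity codes are skipped
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ti += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ti, tv


def mean_pairwise_ratio(seqs):
    """Average ti/tv over all sequence pairs that have at least one transversion."""
    ratios = []
    for a, b in combinations(seqs, 2):
        ti, tv = count_ti_tv(a, b)
        if tv > 0:  # ratio undefined otherwise; such pairs are skipped (assumption)
            ratios.append(ti / tv)
    return sum(ratios) / len(ratios) if ratios else None
```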

    Results

    Isolation and Characterization of TRSP from Canis familiaris

    Five positive clones were obtained from a dog genomic library as outlined in Materials and Methods. The sequences were identical except in the 5' and 3' flanks of two isolates, which had an alternative sequence at positions –7, –10, and –12 (relative to the presumed start site of transcription, +1, indicated with an arrow in fig. S1 in the Supplementary Material online) and 10 instead of 7 CATT repeats in a 3'-flanking microsatellite. In vitro transcription using a HeLa cell nuclear extract indicated that sequences are transcriptionally functional (data not shown).

    Alignment of the dog gene with the human, bovine, and mouse genes reveals three blocks of conserved sequence in the 5' flanking region (fig. S1 of the Supplementary Material online). These regions correspond to the TATA box and proximal sequence element (PSE) core promoter elements, as well as an enhancer-like distal sequence element (DSE) comprised of an SPH/Staf motif and an octamer sequence (Myslinski, Krol, and Carbon 1992; Schaub et al. 1997). Based on the presence and location of the DSE, PSE, and TATA box promoter elements, the conservation of the coding region, and the presence of a pol III terminator, we concluded that we have isolated TRSP from dog.

    Molecular Evolution of TRSP

    To determine whether TRSP is a phylogenetically informative marker, this locus was amplified and sequenced from 23 species within Canidae (table 1). The sequences consisted of the 87 bp tRNA coding region and 312 and 321 bp of the 5' and 3' flanking regions, respectively.

    Alignment of the sequences reveals a non-random distribution of positions that are variable. Point mutations cluster around the boundaries of the gene, directly upstream of the transcriptional start site from –1 to –16, the same region where polymorphisms are detected in the cloned dog genes, and between the 3' end of the coding region and the terminator (fig. 1; see also fig. S2 of the Supplementary Material online). Although point mutations occur throughout the 5' flanking region, they are less concentrated than those directly upstream of the transcriptional start site. A point mutation occurring within the PSE element is not conserved between mammals. As expected, no point mutations were detected in the highly conserved coding region. Nine of the 11 positions between the coding region and the terminator, which is cleaved from the nascent transcript during tRNA maturation (Lee et al. 1987), are variable. To determine whether there is a higher number of variable nucleotide positions adjacent to the coding region (see Materials and Methods), the substitution rates in this region and in the remaining flanking region were compared between each species and the consensus sequence. The number of point mutations in the positions directly adjacent to the coding region is significantly higher (Wilcoxon signed rank test, P value <0.001) than the number of differences in the remainder of the flanking region.

    Indels in the TRSP locus range from 1 nucleotide to 8 nucleotides in length, but most are 1–2 nucleotides in length. Two of the 11 inferred indels are associated with repeated elements (fig. S2 in the Supplementary Material online). It had been predicted that the 3' flanking region would display a higher substitution rate and more indels than the 5' flanking region because there were fewer functional constraints (lack of promoter elements and topological constraints) (Takahashi et al. 1986; Waibel and Filipowicz 1990; Goomer and Kunkel 1992). The similar number of variable sites within the two flanking regions (39 and 32, respectively) indicates that the 3' flank does not display a higher substitution rate; however, more indels occur in the 3' flank than in the 5' flank: 9 and 2, respectively (table 2). The two inferred indels in the 5' flank are upstream of all recognized promoter elements. Three of the 4 phylogenetically informative indels occur within the 3' flanking region (table 2; see also fig. S2 of the Supplementary Material online).

    Table 2 Sequence Characteristics of TRSP, RPPH1, introns and COII

    TRSP was amplified in more than one individual in 12 species to assess intraspecific variation. Intraspecific variation ranges from 0% to 0.8% and was found in both flanking regions (table S3 of the Supplementary Material online). The pattern of variation reflected at the interspecific level is also observed at the intraspecific level.

    Isolation and Characterization of Canis familiaris RPPH1

    To determine whether the region of high sequence variability directly adjacent to the RNA encoding gene is a general phenomenon, RPPH1 was isolated and analyzed in canids. Three RPPH1-related genes were obtained from a dog genomic library. To determine which of the three clones contained the RPPH1 gene, sequences were compared to other vertebrate RPPH1 sequences and scanned for promoter elements. RNase P RNA contains five conserved regions, termed CR-I through CR-V, that include several universally conserved nucleotides and several helical elements termed P1-4 and P7-12, which are variable in sequence but occur at the same positions in the different RNAs (Haas et al. 1994; Chen and Pace 1997). There are four additional helices characteristic of eukaryal RNase P RNA, termed eP8, eP9, eP15, and eP19 (Frank et al. 2000). Only one clone contained all of the CR elements with sequence identical to that of the human and mouse genes (fig. S3 of the Supplementary Material online). All of the predicted helices are present, and three stems (P2, eP8, and eP9) are supported by co-variation in the primary sequence. Helices P12 and eP19 in the putative dog gene differ in length from their human counterparts, but species-specific sequence and length variations in these helical structures have been noted previously (Pitulle et al. 1998).

    The promoter of the RPPH1 gene has been extensively studied in the human gene (Baer et al. 1990; Hannon et al. 1991; Myslinski et al. 2001). It contains a TATA box, a PSE element, and a DSE element composed of an octamer and SPH/Staf binding site (Baer et al. 1990; Myslinski et al. 2001). Only the dog clone that maintains the secondary structure of the RNase P RNA contains all three promoter elements at similar positions (fig. S3 of the Supplementary Material online).

    The spacing between the PSE element and the TATA box and between the TATA box and the beginning of the putative dog gene differs slightly among the human, mouse, and dog RPPH1 genes (fig. S3 of the Supplementary Material online). Analyses of the U6 gene indicate that the spacing between the promoter elements and the start site can only tolerate minor changes (Mattaj et al. 1988; Lobo et al. 1991; Goomer and Kunkel 1992). However, the promoter elements of the RPPH1 gene are more compressed, and the effect of changing their spacing has not been analyzed. Preliminary experiments suggest that the putative dog RPPH1 is active in an in vitro transcription assay with a HeLa nuclear extract (data not shown). In addition, the putative dog RPPH1 has a run of four thymidylate residues flanked by GC-rich sequences in the region where termination of transcription would be expected. Based on the presence, conservation, and location of promoter elements, the preservation of the conserved core coding regions and secondary structural elements, and the presence of a pol III terminator, we concluded that we have isolated the dog RPPH1 gene.

    Molecular Evolution of RPPH1

    RPPH1 was amplified and sequenced from 12 canids. Multiple attempts, including using alternative primers and different combinations of primers, failed to amplify this locus in the other species in Canidae. This is not unprecedented; failure to amplify the RPPH1 locus in closely related species has been noted in other studies (Altman, Wesolowski, and Puranam 1993). The sequence from this locus contains 97 bp of 5' flanking region, 326 bp encoding the RNase P RNA, and 278 bp of the 3' flanking region.

    Although not as pronounced as in TRSP, there is marginal statistical support that the 16 nucleotides directly upstream of the putative transcriptional start site contain a higher number of variable sites than the rest of the flanking region (Wilcoxon signed rank test, P value = 0.062) (fig. 1; see also fig. S4 of the Supplementary Material online). The RPPH1 transcript is not processed at the 3' end (Baer et al. 1990); therefore, there is no region comparable to that of TRSP. All point mutations in the 5' flank occur outside of the promoter elements, except for one mutation within the PSE of the wild dog. The pattern of sequence difference in the RNA coding region is consistent with data from other species (Altman, Wesolowski, and Puranam 1993; Pitulle et al. 1998). The majority of sequence differences occur in loops, including those bounded by helices P3a and P3b, and P7 and eP8 (fig. S4 of the Supplementary Material online). Eleven of the 17 variable sites that occur in the coding region are within helices P12 and eP19, helices that display substantial structure and sequence length variation in other species (Pitulle et al. 1998). Most of these differences occur in regions predicted to be loops by Mfold (Zuker 2003).

    Indels at this locus range in length from 1 to 10 nucleotides. The two indels that are inferred in the coding region occur within helices P12 and eP19. There are 12 indels downstream of the terminator, but no indels are observed in the 5' flank (table 2; see also fig. S4 of the Supplementary Material online). The compact structure of the RPPH1 promoter (Baer et al. 1990; Myslinski et al. 2001) may account for the low number of variable sites and lack of indels in the 5' flank. Alternatively, it may be due to the local gene arrangement; in human and mouse the RPPH1 gene is arranged head-to-head with another gene, PARP-2 (Ame et al. 2001).

    The data sets for both loci, RPPH1 and TRSP, show similar trends: (1) the 3' flanking region has a higher number of indels than the 5' flanking region and (2) the region directly upstream of the transcription start site shows higher sequence variation than the remaining flanking region, significantly so for TRSP (P < 0.001) and marginally so for RPPH1 (P = 0.062) (fig. 1 and table 2).

    Evolutionary Rate

    For comparison of evolutionary rates, the mt-encoded COII gene was chosen because the length of this gene is similar to the length of the amplified products of TRSP and RPPH1. Several measures, including the number of variable characters and uncorrected pairwise sequence ("p") divergence indicate that both nuclear loci evolve more slowly than the mt-encoded COII, a trend observed in many studies (table 2; see also table S4 of the Supplementary Material online) (Prychitko and Moore 2000; Matthee and Davis 2001; Koepfli and Wayne 2003).

    To get a more precise estimate, evolutionary rates were calculated as described in Materials and Methods. Comparison of evolutionary rates suggests that the mt COII gene evolves 2.5 to 10 times faster than the flanking regions of TRSP and RPPH1 (table 3). To compare the rate of evolution in the flanking regions of TRSP and RPPH1 to other noncoding regions of nuclear DNA, the rates of five nuclear introns were calculated (manuscript in prep.; table 3). The 5' and 3' flanking regions of TRSP and RPPH1 have significantly higher evolutionary rates than the nuclear introns in the Clu-Lpi pair comparison (Wilcoxon rank sum test, P = 0.048); however, this significance is lost if nucleotides –1 to –16 are excluded. Because we have only one estimate for the evolutionary rate of each of the 5' and 3' flanking regions in the Clu-Vvu comparison, a statistical analysis cannot be carried out. The calculated evolutionary rate of the TRSP 5' flanking region in the Clu-Vvu comparison is higher than the rates of the five introns, while the rate of the 3' flanking region is similar to those of the introns (table 3). When normalized for nucleotide length, 1.8 and 4.2 times more indels are inferred in the TRSP and RPPH1 data sets, respectively, than in the intron data set (table 2).

    Table 3 Evolutionary Rates of 5' and 3' flanks of TRSP and RPPH1, Introns and COII

    Gene Trees

    To assess the utility of TRSP and RPPH1 as phylogenetic markers, gene trees were generated with ME and MP analyses (Swofford 2003) and compared to the COII gene tree (data sets with similar numbers of nucleotides). For ME, the model of evolution determined by ModelTest (Posada and Crandall 1998) was employed (table 4). For MP, data sets in which the indels were coded for phylogenetic information (figs. S5 and S6 in the Supplementary Material online) resulted in generally higher bootstrap values than analyses carried out with indels coded as missing data.

    Table 4 MP Tree Statistics and Models of Evolution for TRSP, RPPH1, and COII

    Gene Tree–TRSP

    For the comparison between TRSP and COII, each gene tree was created from a data set in which the 23 species of Canidae were represented by one sequence and Ursus americanus was used as an outgroup. The COII data set contains almost 5 times as many parsimony informative (PI) sites (205) as the TRSP data set (39) (table 4). In the MP analyses, the COII data set generated a gene tree with twice as many nodes with bootstrap values ≥70% as the TRSP data set (fig. 2a and 2b); however, the TRSP data set was less homoplastic (TRSP: CI = 0.5889, RI = 0.8132; COII: CI = 0.4545 and RI = 0.6121). For both data sets, ME generated a more resolved gene tree than MP. Although the COII ME tree contained more nodes, both trees had a similar number of nodes with bootstrap values ≥70% (TRSP: 6 nodes, COII: 8 nodes) (fig. 2c and 2d). Both trees strongly supported two monophyletic clades: (1) a red fox–like canids clade (three PI indels in the TRSP data set support this clade) and (2) Pseudalopex. In general, the COII tree offered more support at the terminal branches; for example, V. vulpes and V. corsac are sister taxa in the COII tree, and they remain unresolved in the TRSP tree. However, both the COII and the TRSP trees show support for the terminal grouping of A. lagopus with V. macrotis as sister taxa and a grouping of Pseudalopex. The two genes also resolve different parts of the phylogeny. For example, the COII data set resolves some relationships within Canis, and the TRSP data set, including a PI indel, supports a monophyletic clade of all South American canids (fig. 2a). With ME analysis, TRSP generates higher support than COII for a monophyletic clade grouping the wolf-like and South American canids, suggesting that TRSP can contribute phylogenetic signal to basal positions in the tree. The gene trees contain no nodes in significant conflict with each other or with previously published mt trees (Wayne et al. 1997).
Some of the phylogenetic signal of TRSP is redundant with the COII gene; however, it contains signal for regions of the tree unresolved with COII, suggesting that TRSP is a useful phylogenetic marker for Canidae and closely related species.
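
    The homoplasy measures quoted above follow the standard ensemble definitions: the consistency index is CI = M/S and the retention index is RI = (G - S)/(G - M), where M is the minimum possible number of steps summed over characters, S is the number of steps observed on the tree, and G is the maximum possible number of steps. The following is a minimal sketch of those two formulas only; the function names and the step counts in the usage lines are hypothetical and are not taken from table 4.

```python
def consistency_index(min_steps: int, observed_steps: int) -> float:
    """Ensemble consistency index, CI = M / S."""
    if observed_steps <= 0:
        raise ValueError("observed_steps must be positive")
    return min_steps / observed_steps


def retention_index(min_steps: int, observed_steps: int, max_steps: int) -> float:
    """Ensemble retention index, RI = (G - S) / (G - M)."""
    if max_steps == min_steps:
        raise ValueError("RI is undefined when max_steps equals min_steps")
    return (max_steps - observed_steps) / (max_steps - min_steps)


# Hypothetical step counts, for illustration only (not values from the paper).
ci = consistency_index(min_steps=50, observed_steps=100)
ri = retention_index(min_steps=50, observed_steps=100, max_steps=150)
print(f"CI = {ci:.4f}, RI = {ri:.4f}")
```

    Both indices equal 1 when the data contain no homoplasy (S = M), so higher values, as reported for TRSP relative to COII, indicate less homoplasy.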

    FIG. 2.— Gene trees derived from the nuclear TRSP gene and the mt COII gene based on data sets containing sequences from 23 canids. (a) Strict consensus maximum parsimony (MP) tree based on the 709 nucleotide TRSP data set. (b) Strict consensus MP tree based on the 684 nucleotide COII gene data set with transversions weighted 6 times greater than transitions. (c) Minimum evolution (ME) cladogram (bootstrap consensus) of the TRSP data set using the GTR + I + G (General Time Reversible + Invariant sites + Gamma) model of sequence evolution. (d) Minimum evolution (ME) cladogram (bootstrap consensus) of the COII data set using the TVM + I + G (TransVersion Model + I + G) model of sequence evolution. Species names are indicated and the common names are listed in table 1. Species grouped with either the red fox–like species, the South American (SA) canids, or the gray wolf–like species are indicated with the respective bracket. Species belonging to Pseudalopex are indicated with an asterisk and those belonging to Canis are indicated with a black square. Numbers above the branches indicate nodes supported in >50% of bootstrap replicates. Each black bar indicates a parsimony-informative indel, SAC = South American canid.

    Gene Tree–RPPH1

    To determine whether RPPH1 is useful for inferring the phylogeny of closely related species, phylogenetic analyses using ME and MP were performed on the 12-taxon RPPH1 data set. The gray fox, thought to be the most basal canid (Wayne et al. 1997), was used as an outgroup. Out of 673 characters, 91 sites were variable, of which 27 were PI (table 2). ME and MP yielded trees with identical topologies, with the exception that C. alpinus was basal to C. aureus in the ME tree (bootstrap = 53%). All phylogenetic analyses strongly supported (bootstrap values ≥90%) a monophyletic clade consisting of Canis, Lycaon, and Cuon, as well as a group including the two South American foxes, P. griseus and C. thous. Three nodes received moderate support (bootstrap values: 63%–84%): (1) a monophyletic clade consisting of C. familiaris, C. lupus, and C. aureus; (2) C. mesomelas and C. adustus as sister taxa; and (3) C. lupus and C. familiaris as sister taxa. Groupings supported by PI indels are indicated (fig. 3a).
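
    Counts of variable and PI sites such as those above can be obtained directly from an alignment. A site is variable if it shows at least two character states, and parsimony informative if at least two of those states each occur in at least two sequences. The sketch below applies those definitions to a toy alignment; the function name, the choice to ignore gap and missing-data symbols, and the sequences themselves are assumptions made for illustration and do not reproduce the software or settings used in the study.

```python
from collections import Counter

# Symbols treated as missing data and skipped when classifying a site (assumption).
GAP_OR_MISSING = set("-?N")


def classify_sites(alignment):
    """Return (n_variable, n_parsimony_informative) for aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must be non-empty in number and equal in length")
    n_variable = 0
    n_informative = 0
    for column in zip(*alignment):
        counts = Counter(
            base.upper() for base in column if base.upper() not in GAP_OR_MISSING
        )
        if len(counts) < 2:
            continue  # constant site
        n_variable += 1
        # Informative: at least two states, each present in at least two sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            n_informative += 1
    return n_variable, n_informative


# Toy alignment (hypothetical sequences, not canid data).
toy = ["ACGTA", "ACGTA", "ATGAA", "ATGAC"]
print(classify_sites(toy))
```

    In the toy alignment, columns 2 and 4 are parsimony informative, whereas column 5 is variable but uninformative because one state occurs in a single sequence.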

    FIG. 3.— (a) The strict consensus maximum parsimony (MP) tree based on the 673 nucleotide data set of the RPPH1 locus. (b) The strict consensus MP tree based on the 709 nucleotide data set of TRSP. (c) The strict consensus MP tree based on the 684 nucleotide data set of the COII gene, with transversions weighted 6 times greater than transitions. (d) The strict consensus MP tree from the combined TRSP and RPPH1 data set. The trees are based on a 12-taxon data set. Species names are indicated; common names are listed in table 1. The gray fox (U. cinereoargenteus) was used as an outgroup. Species grouped with either the South American (SA) canids or the gray wolf–like species are indicated with the respective bracket. Numbers above the branches indicate nodes supported in >50% of bootstrap replicates. The minimum evolution analysis of the RPPH1 locus used the HKY + G (Hasegawa, Kishino, Yano + Gamma) model of sequence evolution with the transition/transversion ratio set at 1.5. Bootstrap values for the minimum evolution analysis (only in a) are indicated on the MP tree. Each black bar indicates a parsimony-informative indel.

    To compare the resolution power of RPPH1 to that of TRSP and COII, an MP analysis using the same 12 taxa as in the RPPH1 data set was carried out with the other two data sets. The RPPH1 data set contains a higher number of variable and PI sites than the TRSP data set, and it yielded a gene tree with five nodes with bootstrap values >70%, compared to the TRSP gene tree, which yielded one strongly supported node grouping the South American canids (fig. 3a and 3b, table 4). The TRSP and RPPH1 gene trees do not contain any significantly conflicting nodes. The RPPH1 data set was less homoplastic than TRSP, even after CI and RI were corrected for different amounts of missing data (RPPH1: CI = 0.7228, RI = 0.7831; TRSP: CI = 0.6435, RI = 0.7232; table 4).

    The COII gene tree yielded the same number of nodes with bootstrap values >70% as the RPPH1 gene tree (fig. 3a and 3c). Three of the five nodes support identical relationships. The discrepancy is with the placement of S. venaticus, which is basal to all other species within the ingroup in the COII tree, but grouped with C. brachyurus in the RPPH1 tree. The RPPH1 locus provides resolution similar to that of the COII gene, but because it is less homoplastic (table 4, COII: CI = 0.5469, RI = 0.4949), its usefulness as a phylogenetic marker may be greater. However, its full potential has been hindered by the difficulty of amplifying this locus.

    Combining phylogenetic data from multiple loci may reveal hidden support or conflict (Gatesy, O'Grady, and Baker 1999). The strict consensus tree of the combined TRSP and RPPH1 data set does not display any conflicting nodes with the trees generated from the individual loci (fig. 3d). A more detailed analysis of a combined nuclear dataset (TRSP + 5 introns) will be presented elsewhere (manuscript in prep).

    Discussion

    The TRSP and RPPH1 genes from the dog have been isolated, and their utility as phylogenetically informative markers within Canidae has been explored. TRSP and RPPH1 appear to be useful markers for phylogenetic inference of closely related species. Although both loci evolve more slowly than the mt COII gene, they contribute phylogenetic signal to basal and terminal regions of the gene tree. An advantage of using noncoding regions is that it increases the probability of detecting indels, some of which may be phylogenetically informative. As expected, no indels are inferred in the COII gene, whereas more than 20 were inferred within the noncoding regions of TRSP and RPPH1. Because indels in nonrepetitive elements are rare and the size of an indel does not reflect its phylogenetic value, loci containing indels are particularly informative (Giribet and Wheeler 1999; Rokas and Holland 2000; Simmons and Ochoterena 2000, and refs. therein). In the 12 taxa tested, RPPH1 has nearly the same resolution power as the COII gene, with less homoplasy. TRSP contributes signal to basal and terminal regions of the tree. The two nuclear loci are able to resolve the three major clades previously identified with mtDNA analyses (Geffen et al. 1992; Wayne et al. 1997), as well as some associations within these clades. In general, the gene trees generated from these nuclear loci are consistent with the mt tree (Wayne et al. 1997) except in regard to the placement of the bush dog. The divergence time between red fox and the other canids is thought to be about 9–10 MYA and that between the South American canids and wolf-like canids about 6 MYA (Wayne et al. 1997). Therefore, these loci may be useful phylogenetic markers in this time range.

    In terms of phylogenetic content, the two genes analyzed here are more informative than other noncoding regions of nuclear DNA. The percentage of variable sites in the TRSP (11.6%) and RPPH1 (13.0%) data sets is higher than the average of five nuclear introns amplified in Canidae for another study (8.5%) (manuscript in prep.). Of the variable sites, a higher percentage is PI in TRSP (42.7%) and RPPH1 (30.7%) than in the introns (27.7%), even when the smaller number of taxa in the RPPH1 data set is not taken into account (table 4). In addition, TRSP and RPPH1 contain more indels per unit length than the introns. In a phylogenetic analysis to be presented elsewhere, TRSP contributed nearly 50% of the phylogenetic signal in an analysis that combined the five nuclear introns with TRSP.

    A phylogenetic analysis using the coding region of RPPH1 to study the relationships among primates having divergence times between 8 and 9 MYA failed to resolve the evolutionary history of species within the primate clade (Sbisa et al. 1996). In contrast, we obtained some resolution of canids using this locus within a 6 MYA time frame, even when only the coding region was used (data not shown). The substitution rate between human and orangutan using a divergence time of 8 MYA was 0.4 × 10⁻⁹ substitutions/site/year, whereas the substitution rate between the wild dog and the wolf, using a divergence time of 6.7 MYA, was 1.5 × 10⁻⁹ substitutions/site/year. Although the rates of substitution in the primate and canid genomes appear more similar to each other than to that of the rodent genome (Kirkness et al. 2003), it appears that in Canidae, RPPH1 may evolve at a faster pace.
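
    Rates of this kind are conventionally estimated from a pairwise distance K (substitutions per site) and a divergence time T as r = K/(2T), the factor of 2 reflecting that both lineages have accumulated changes since the split. The section does not state how the rates above were computed, so the sketch below simply assumes that conventional estimate; the function name and the distance value in the usage lines are hypothetical.

```python
def substitution_rate(distance_per_site: float, divergence_time_years: float) -> float:
    """Pairwise rate estimate r = K / (2T), in substitutions/site/year."""
    if divergence_time_years <= 0:
        raise ValueError("divergence_time_years must be positive")
    return distance_per_site / (2.0 * divergence_time_years)


# Hypothetical distance of 0.02 substitutions/site (not a value from the paper),
# combined with the 6.7 MYA divergence time quoted in the text.
rate = substitution_rate(distance_per_site=0.02, divergence_time_years=6.7e6)
print(f"{rate:.2e} substitutions/site/year")
```

    Because the estimate scales inversely with T, any uncertainty in the assumed divergence time carries over proportionally to the rate.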

    TRSP may be a useful phylogenetic marker for closely related species in several orders of mammals. Preliminary experiments with TRSP primers indicate that regions of its 3' flank can be amplified in various carnivores, such as bear, raccoon, and several species of otters. Searches through the cat, cow, pig, mouse, and human genomes revealed regions of conserved sequence several hundred base pairs upstream and downstream of the TRSP coding region that are sufficient for primers to be designed and could be useful for taxonomic studies in Carnivora and (separately) Artiodactyla. Alignment of all available mammal sequences suggests that it may be difficult to design universal primers for mammals.

    One interesting finding is the nonrandom nature of the accumulation of mutations within the TRSP and RPPH1 genes. Directly upstream of the transcriptional start site (–1 to –16) there is a region of high sequence variability, as predicted for a region of the genome with no evolutionary constraint on sequence. This is consistent with a mutational analysis of the Xenopus laevis TRSP gene indicating that transcription was only minimally affected by sequence changes between –1 and –16 (Carbon and Krol 1991). Mutational analyses of other pol III genes containing upstream promoters also indicate that the sequence directly upstream of the transcriptional start site does not have a major effect on transcriptional activity as long as the pyrimidine/purine nature of the –1/+1 sites is preserved (Mattaj et al. 1988; Lobo and Hernandez 1989; Hannon et al. 1991; Zecherle, Whelen, and Hall 1996; Myslinski et al. 2001). Models based on the RNA pol II crystal structure, which can be applied to RNA pol III because of the conserved nature of eukaryotic RNA polymerases, explain the lack of required specific sequences near the transcriptional start site (Gnatt et al. 2001; Grove et al. 2002). The transcribed strand interacts through nonspecific interactions with the DNA backbone near the catalytic site of pol II, whereas the nontranscribed strand is not tightly held (Gnatt et al. 2001). High sequence variability also occurs between the 3' end of the coding region and the terminator. The high sequence variability adjacent to the start site and terminator does not appear to be due solely to low selective pressure. Other flanking regions that prior analyses indicate lack sequence-specific content, such as the sequence between the DSE and PSE elements and the sequence downstream of the terminator, do not display the high number of sequence differences found directly adjacent to the coding region (Murphy, Tripodi, and Melli 1986; Chang and Clayton 1989; Baer et al. 1990; Myslinski, Krol, and Carbon 1992).
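
    A simple way to make such local differences in variability visible is to count variable sites in a window that slides along the alignment. The sketch below is only an illustration of that idea under stated assumptions: the function names, the decision to ignore gap and missing-data symbols (so indels are not counted), the window size, and the toy sequences are all hypothetical, and this is not the analysis performed in the study.

```python
def variable_mask(alignment):
    """Return one boolean per alignment column: True if the site is variable."""
    mask = []
    for column in zip(*alignment):
        # Gap and missing-data symbols are ignored (assumption).
        states = {base.upper() for base in column if base not in "-?"}
        mask.append(len(states) > 1)
    return mask


def windowed_variability(alignment, window=10):
    """Count variable sites in every window of the given width (step of 1)."""
    mask = variable_mask(alignment)
    if window <= 0 or window > len(mask):
        raise ValueError("window must be between 1 and the alignment length")
    return [sum(mask[i:i + window]) for i in range(len(mask) - window + 1)]


# Toy alignment (hypothetical): a conserved stretch followed by a variable one.
toy = ["AAAAACGT", "AAAAAGCA"]
print(windowed_variability(toy, window=4))
```

    Windows overlapping a hypervariable stretch, such as the region adjacent to a coding sequence, would show higher counts than windows over the rest of the flank.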

    Determining whether the greater sequence variability directly adjacent to the TRSP coding region is due to differences in selection or to a higher mutation rate will require an analysis with more individuals from a single species. Our preliminary analysis suggests that the intraspecific pattern reflects the interspecific pattern, which would indicate that the DNA adjacent to the gene is either more prone to mutation or less likely to be repaired once a lesion has occurred. The apparent hypervariability seen in the 5' flank may stem from chromatin structure. In the U6 snRNA gene, which has a promoter comparable to that of the TRSP gene, the DNA between the DSE and PSE is wrapped around a nucleosome (Stunkel, Kober, and Seifart 1997). For a gene to be transcriptionally active, its start site must remain free of nucleosomes and accessible to RNA polymerase. Some types of DNA damage occur at lower frequencies in DNA wrapped around a nucleosome; therefore, the nucleosome-free DNA around the start site may be more prone to mutation (Mitchell, Nguyen, and Cleaver 1990; Boulikas 1992, and refs. therein).

    Another possibility is that the region of high variability includes the region of DNA upstream of the transcriptional start site that becomes single-stranded in an open promoter complex and the region just upstream of the transcriptional terminator that may remain single-stranded during the relatively slow process of termination (Kassavetis et al. 1992; Matsuzaki, Kassavetis, and Geiduschek 1994). It has been noted that single-stranded DNA or DNA in alternate forms is more prone to mutations and, in some cases, is less likely to be repaired (Boulikas 1992). An analysis of patterns of spontaneous mutations suggests that nucleotide substitutions are clumped at the scale of 10 nucleotides (among other scales) (Silva and Kondrashov 2002); perhaps some of these clumps correspond to transcriptional start sites. Determining whether increased divergence directly upstream of a transcriptional start site is a general phenomenon (for pol I–transcribed and pol II–transcribed genes as well) that is related to the level of transcriptional activity of a gene will require phylogenetic analyses of closely related species. If accessibility to the transcription apparatus or the presence of regions of DNA that become single-stranded contributes to higher mutation rates, we would expect constitutively expressed genes to be more polymorphic. If so, regions near transcription start sites might be useful for population-level studies.

    In conclusion, we have shown that TRSP and RPPH1 are phylogenetically informative markers for closely related species. Trees generated with the nuclear loci are consistent with the mtDNA based phylogeny. Determining whether regions of DNA directly upstream of transcriptional start sites are generally highly variable and the mechanism underlying this variability will be of interest to future studies.

    Acknowledgements

    We thank M. Raines-Casselman for use of her lab and reagents for screening the genomic library; A. Berk and G. Caintin for their generous gift of HeLa nuclear extracts; G. Kassavetis for his generous gift of yeast TBP and comments on the manuscript; K. Koepfli for advice on phylogenetic analyses; H. Chuang for advice on statistical analyses; I. Amorim do Rosario for helpful comments; C. Cicero, Museum of Vertebrate Zoology, for tissue samples of small-eared dog and black bear; O. Ryder from Conservation and Research for Endangered Species (CRES) projects for DNA samples of bush dog and maned wolf; and L. Waits for DNA samples of black bear. This work was supported in part by National Science Foundation grant DEB-9977072.

    References

    Altman, S., D. Wesolowski, and R. S. Puranam. 1993. Nucleotide sequences of the RNA subunit of RNase P from several mammals. Genomics 18:418–422.

    Ame, J.-C., V. Schreiber, V. Fraulob, P. Dolle, G. de Murcia, and C. P. Niedergang. 2001. A bidirectional promoter connects the poly(ADP-ribose) polymerase 2 (PARP-2) gene to the gene for RNase P RNA: structure and expression of the mouse PARP-2 gene. J. Biol. Chem. 276:11092–11099.

    Baer, M., T. W. Nilsen, C. Costigan, and S. Altman. 1990. Structure and transcription of a human gene for H1 RNA the RNA component of human RNase P. Nucleic Acids Res. 18:97–104.

    Barriel, V. 1994. Molecular phylogenies and how to code insertion/deletion events. C. R. Acad. Sci. Ser. iii, Sci. Vie 317:693–701.

    Becerra, J. X. 2003. Evolution of Mexican Bursera (Burseraceae) inferred from ITS, ETS, and 5S nuclear ribosomal DNA sequences. Mol. Phylogenet. Evol. 26:300–309.

    Bogenhagen, D. F., and D. D. Brown. 1981. Nucleotide sequences in Xenopus laevis 5S DNA required for transcription termination. Cell 24:261–270.

    Bosl, M. R., M. F. Seldin, S. Nishimura, and M. Taketo. 1995. Cloning, structural analysis and mapping of the mouse selenocysteine tRNA(Ser)-Sec gene (Trsp). Mol. Gen. Genet. 248:247–252.

    Boulikas, T. 1992. Evolutionary consequences of nonrandom damage and repair of chromatin domains. J. Mol. Evol. 35:156–160.

    Carbon, P., and A. Krol. 1991. Transcription of the xenopus-laevis selenocysteine transfer RNA-ser-sec gene a system that combines an internal B box and upstream elements also found in U6 small nuclear RNA genes. EMBO J. 10:599–606.

    Chang, D. D., and D. A. Clayton. 1989. Mouse RNase MRP RNA is encoded by a nuclear gene and contains a decamer sequence complementary to a conserved region of mitochondrial RNA substrate. Cell 56:131–139.

    Chen, J.-L., and N. R. Pace. 1997. Identification of the universally conserved core of ribonuclease P RNA. RNA 3:557–560.

    Das, G., D. Henning, D. Wright, and R. Reddy. 1988. Upstream regulatory elements are necessary and sufficient for transcription of a U6 RNA gene by RNA polymerase III. EMBO J. 7:503–512.

    Doria, M., G. Carrara, P. Calandra, and G. P. Tocchini-Valentini. 1991. An RNA molecule copurifies with RNase P activity from Xenopus laevis oocytes. Nucleic Acids Res. 19:2315–2320.

    Eder, P. S., A. Srinivasan, M. C. Fishman, and S. Altman. 1996. The RNA subunit of ribonuclease P from the zebrafish, Danio rerio. J. Biol. Chem. 271:21031–21036.

    Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791.

    Fiedorow, P., and Z. Szweykowska-Kulinska. 1998. Intergenic sequences of clustered tRNA genes: new type of genetic marker for phylogenetic studies, with application to the taxonomy of liverworts. Plant Mol. Biol. 38:1257–1261.

    Frank, D. N., C. Adamidi, M. A. Ehringer, C. Pitulle, and N. R. Pace. 2000. Phylogenetic-comparative analysis of the eukaryal ribonuclease P RNA. RNA 6:1895–1904.

    Frank, D. N., and N. R. Pace. 1998. Ribonuclease P: unity and diversity in a tRNA processing ribozyme. Pp. 153–180 in C. C. Richardson, ed. Annual Review of Biochemistry. Annual Reviews Inc, Palo Alto, Calif.

    Gatesy, J., P. O'Grady, and R. H. Baker. 1999. Corroboration among datasets in simultaneous analysis: hidden support for phylogenetic relationships among higher level artiodactyl taxa. Cladistics 15:271–313.

    Geffen, E., A. Mercure, D. J. Girman, D. W. MacDonald, and R. K. Wayne. 1992. Phylogenetic relationships of the fox-like canids: mitochondrial DNA restriction fragment, site and cytochrome b sequence analyses. J. Zool. 228:27–39.

    Geiduschek, E. P., and G. A. Kassavetis. 2001. The RNA polymerase III transcription apparatus. J. Mol. Biol. 310:1–26.

    Geiduschek, E. P., and G. A. Kassavetis. 1992. RNA polymerase III transcription complexes. Pp. 247–280 in S. L. McKnight, and K. R. Yamamoto, eds. Transcriptional Regulation. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.

    Giribet, G., and W. C. Wheeler. 1999. On gaps. Mol. Phylogenet. Evol. 13:132–143.

    Gnatt, A. L., P. Cramer, J. Fu, D. A. Bushnell, and R. D. Kornberg. 2001. Structural basis of transcription: an RNA polymerase II elongation complex at 3.3 Å resolution. Science 292:1876–1882.

    Goomer, R. S., and G. R. Kunkel. 1992. The transcriptional start site for a human U6 small nuclear RNA gene is dictated by a compound promoter element consisting of the PSE and the TATA box. Nucleic Acids Res. 20:4903–4912.

    Green, P., B. Ewing, W. Miller, P. J. Thomas, N. C. S. Program, and E. D. Green. 2003. Transcription-associated mutational asymmetry in mammalian evolution. Nat. Genet. 33:514–517.

    Grove, A., M. S. Adessa, E. P. Geiduschek, and G. A. Kassavetis. 2002. Marking the start site of RNA polymerase III transcription: the role of constraint, compaction and continuity of the transcribed DNA strand. EMBO J. 21:704–714.

    Haas, E. S., J. W. Brown, C. Pitulle, and N. R. Pace. 1994. Further perspective on the catalytic core and secondary structure of ribonuclease P RNA. Proc. Natl. Acad. Sci. USA 91:2527–2531.

    Hannon, G. J., A. Chubb, P. A. Maroney, G. Hannon, S. Altman, and T. W. Nilsen. 1991. Multiple cis-acting elements are required for RNA polymerase III transcription of the gene encoding H1 RNA the RNA component of human RNase P. J. Biol. Chem. 266:22796–22799.

    Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 21:160–174.

    Hatfield, D. H., B. S. Dudock, and F. C. Eden. 1983. Characterization and nucleotide sequence of a chicken gene encoding an opal suppressor transfer RNA and its flanking DNA segments. Proc. Natl. Acad. Sci. USA 80:4940–4944.

    Hillis, D. M., and M. T. Dixon. 1991. Ribosomal DNA molecular evolution and phylogenetic inference. Q. Rev. Biol. 66:411–454.

    Hollander, M., and D. A. Wolfe. 1999. Nonparametric Statistical Methods. Wiley Interscience, New York.

    Ishiguro, N., A. Nakajima, M. Horiuchi, and M. Shinagawa. 2002. Multiple nuclear pseudogenes of mitochondrial DNA exist in the canine genome. Mammal. Genome 13:365–372.

    Jeanmougin, F., J. D. Thompson, M. Gouy, D. G. Higgins, and T. J. Gibson. 1998. Multiple sequence alignment with Clustal X. Trends Biochem. Sci. 23:403–405.

    Kassavetis, G. A., J. A. Blanco, T. E. Johnson, and E. P. Geiduschek. 1992. Formation of open and elongating transcription complexes by RNA polymerase III. J. Mol. Biol. 226:47–58.

    Kirkness, E. F., V. Bafna, A. L. Halpern, S. Levy, K. Remington, D. B. Rusch, A. L. Delcher, M. Pop, W. Wang, C. M. Fraser, and J. C. Venter. 2003. The dog genome: survey sequencing and comparative analysis. Science 301:1898–1903.

    Koepfli, K.-P., and R. K. Wayne. 2003. Type I STS markers are more informative than cytochrome b in phylogenetic reconstruction of the Mustelidae (Mammalia: Carnivora). Syst. Biol. 52:571–593.

    Kolker, J. D., J. Sharma, R. Cruz, and A. M. Diamond. 1995. Sequence and unusual 3' flanking region of the rat tRNA(Ser)-Sec gene. Gene (Amsterdam) 164:375–376.

    Lee, B. J., P. De la Pena, J. A. Tobian, M. Zasloff, and D. Hatfield. 1987. Unique pathway of expression of an opal suppressor phosphoserine tRNA. Proc. Natl. Acad. Sci. USA 84:6384–6388.

    Lee, B. J., P. J. Worland, J. N. Davis, T. C. Stadtman, and D. L. Hatfield. 1989. Identification of a selenocysteylseryl transfer RNA in mammalian cells that recognizes the nonsense codon UGA. J. Biol. Chem. 264:9724–9727.

    Lee, B. J., M. Rajagopalan, Y. S. Kim, K. H. You, K. B. Jacobson, and D. Hatfield. 1990. Selenocysteine serine transfer RNA gene is ubiquitous within the animal kingdom. Mol. Cell. Biol. 10:1940–1949.

    Li, W. H., and D. Graur. 1991. Fundamentals of Molecular Evolution. Sinauer Associates, Inc., Sunderland, Mass.

    Lobo, S. M., and N. Hernandez. 1989. A 7 bp mutation converts a human RNA polymerase II small nuclear RNA promoter into an RNA polymerase III promoter. Cell 58:55–68.

    Lobo, S. M., J. Lister, M. L. Sullivan, and N. Hernandez. 1991. The cloned RNA polymerase II transcription factor IID selects RNA polymerase III to transcribe the human U6 gene in-vitro. Genes Dev. 5:1477–1489.

    Mann, H., Y. Ben-Asouli, A. Schein, S. Moussa, and N. Jarrous. 2003. Eukaryotic RNase P: role of RNA and protein subunits of a primordial catalytic ribonucleoprotein in RNA-based catalysis. Mol. Cell 12:925–935.

    Matsuzaki, H., G. A. Kassavetis, and E. P. Geiduschek. 1994. Analysis of RNA chain elongation and termination by Saccharomyces cerevisiae RNA polymerase III. J. Mol. Biol. 235:1173–1192.

    Mattaj, I. W., N. A. Dathan, H. D. Parry, P. Carbon, and A. Krol. 1988. Changing the RNA polymerase specificity of U small nuclear RNA gene promoters. Cell 55:435–442.

    Matthee, C. A., J. D. Burzlaff, J. F. Taylor, and S. K. Davis. 2001. Mining the mammalian genome for artiodactyl systematics. Syst. Biol. 50:367–390.

    Matthee, C. A., and S. K. Davis. 2001. Molecular insights into the evolution of the family Bovidae: a nuclear DNA perspective. Mol. Biol. Evol. 18:1220–1230.

    Mitchell, A., C. Mitter, and J. C. Regier. 2000. More taxa or more characters revisited: combining data from nuclear protein-encoding genes for phylogenetic analyses of Noctuoidea (Insecta: Lepidoptera). Syst. Biol. 49:202–224.

    Mitchell, D. L., T. D. Nguyen, and J. E. Cleaver. 1990. Nonrandom induction of pyrimidine pyrimidone 6–4 photoproducts in irradiated human chromatin. J. Biol. Chem. 265:5353–5356.

    Murphy, S., M. Tripodi, and M. Melli. 1986. A sequence upstream from the coding region is required for the transcription of the 7SK RNA genes. Nucleic Acids Res. 14:9243–9269.

    Myslinski, E., J.-C. Ame, A. Krol, and P. Carbon. 2001. An unusually compact external promoter for RNA polymerase III transcription of the human H1RNA gene. Nucleic Acids Res. 29:2502–2509.

    Myslinski, E., A. Krol, and P. Carbon. 1992. Optimal tRNA-sersec gene activity requires an upstream SPH motif. Nucleic Acids Res. 20:203–210.

    Myslinski, E., C. Schuster, J. Huet, A. Sentenac, A. Krol, and P. Carbon. 1993. Point mutations 5' to the tRNA selenocysteine TATA box alter RNA polymerase III transcription by affecting the binding of TBP. Nucleic Acids Res. 21:5852–5858.

    O'Neill, V. A., F. C. Eden, K. Pratt, and D. L. Hatfield. 1985. A human opal suppressor transfer RNA gene and pseudogene. J. Biol. Chem. 260:2501–2508.

    Ohama, T., I. S. Choi, D. L. Hatfield, and K. R. Johnson. 1994. Mouse selenocysteine tRNA(Ser)-Sec gene (Trsp) and its localization on chromosome 7. Genomics 19:595–596.

    Olsen, G. J., and C. R. Woese. 1993. Ribosomal RNA–a key to phylogeny. FASEB J. 7:113–123.

    Pitulle, C., M. Garcia-Paris, K. R. Zamudio, and N. R. Pace. 1998. Comparative structure analysis of vertebrate ribonuclease P RNA. Nucleic Acids Res. 26:3333–3339.

    Posada, D., and K. A. Crandall. 1998. MODELTEST: Testing the model of DNA substitution. Bioinformatics 14:817–818.

    Pratt, K., F. Eden, K. You, V. O'Neill, and D. L. Hatfield. 1985. Conserved sequences in both the coding and 5' flanking regions of mammalian opal suppressor tRNA genes. Nucleic Acids Res. 13:4765–4775.

    Prychitko, T. M., and W. S. Moore. 2000. Comparative evolution of the mitochondrial cytochrome b gene and nuclear beta-fibrinogen intron 7 in woodpeckers. Mol. Biol. Evol. 17:1101–1111.

    Rodriguez, F., J. L. Oliver, A. Marin, and J. R. Medina. 1990. The general stochastic model of nucleotide substitution. J. Theor. Biol. 142:485–502.

    Rokas, A., and P. W. H. Holland. 2000. Rare genomic changes as a tool for phylogenetics. Trends Ecol. Evol. 15:454–459.

    Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular Cloning—A Laboratory Manual, second edition, vols 1, 2, and 3. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.

    Sanderson, M. J., and J. J. Doyle. 1992. Reconstruction of organismal and gene phylogenies from data on multigene families: concerted evolution, homoplasy, and confidence. Syst. Biol. 41:4–17.

    Sbisa, E., G. Pesole, A. Tullo, and C. Saccone. 1996. The evolution of the RNase P- and RNase MRP-associated RNAs: phylogenetic analysis and nucleotide substitution rate. J. Mol. Evol. 43:46–57.

    Schaub, M., E. Myslinski, C. Schuster, A. Krol, and P. Carbon. 1997. Staf, a promiscuous activator for enhanced transcription by RNA polymerases II and III. EMBO J. 16:173–181.

    Silva, J. C., and A. S. Kondrashov. 2002. Patterns in spontaneous mutation revealed by human-baboon sequence comparison. Trends Genet. 18:544–547.

    Simmons, M. P., and H. Ochoterena. 2000. Gaps as characters in sequence-based phylogenetic analyses. Syst. Biol. 49:369–381.

    Simmons, M. P., H. Ochoterena, and T. G. Carr. 2001. Incorporation, relative homoplasy, and effect of gap characters in sequence-based phylogenetic analyses. Syst. Biol. 50:454–462.

    Springer, M. S., R. W. DeBry, C. Douady, H. M. Amrine, O. Madsen, W. W. de Jong, and M. J. Stanhope. 2001. Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction. Mol. Biol. Evol. 18:132–143.

    Stunkel, W., I. Kober, and K. H. Seifart. 1997. A nucleosome positioned in the distal promoter region activates transcription of the human U6 gene. Mol. Cell. Biol. 17:4397–4405.

    Suzuki, H., K. Moriwaki, and S. Sakurai. 1994. Sequences and evolutionary analysis of mouse 5S rDNAs. Mol. Biol. Evol. 11:704–710.

    Svejstrup, J. Q. 2002. Mechanisms of transcription-coupled DNA repair. Nat. Rev. Mol. Cell Biol. 3:21–29.

    Swofford, D. L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Sinauer Associates, Sunderland, Mass.

    Takahashi, K., M. Vigneron, H. Matthes, A. Wildeman, M. Zenke, and P. Chambon. 1986. Requirement of stereospecific alignments for initiation from the SV-40 early promoter. Nature 319:121–126.

    Waibel, F., and W. Filipowicz. 1990. RNA polymerase specificity of transcription of arabidopsis U small nuclear RNA genes determined by promoter element spacing. Nature 346:199–202.

    Wasko, A. P., C. Martins, J. M. Wright, and P. M. Galetti Junior. 2001. Molecular organization of 5S rDNA in fishes of the genus Brycon. Chromosome Res. 9:82.

    Wayne, R. K., E. Geffen, D. J. Girman, K. P. Koepfli, L. M. Lau, and C. R. Marshall. 1997. Molecular systematics of the Canidae. Syst. Biol. 46:622–653.

    Xiao, S., F. Scott, C. A. Fierke, and D. R. Engelke. 2002. Eukaryotic ribonuclease P: a plurality of ribonucleoprotein enzymes. Pp. 165–189 in C. C. Richardson, R. Kornberg, C. R. H. Raetz, and J. W. Thorner, eds. Annual Review of Biochemistry. Annual Reviews, Palo Alto, Calif.

    Xu, X.-M., B. A. Carlson, L. K. Kim, B. J. Lee, D. L. Hatfield, and A. M. Diamond. 1999a. Analysis of selenocysteine (Sec) tRNA(Ser)Sec genes in Chinese hamsters. Gene 239:49–53.

    Xu, X.-M., X. Zhou, B. A. Carlson, L. K. Kim, T.-L. Huh, B. J. Lee, and D. L. Hatfield. 1999b. The zebrafish genome contains two distinct selenocysteine tRNA(Ser)Sec genes. FEBS Lett. 454:16–20.

    Zecherle, G. N., S. Whelen, and B. D. Hall. 1996. Purines are required at the 5' ends of newly initiated RNAs for optimal RNA polymerase III gene expression. Mol. Cell. Biol. 16:5801–5810.

    Zuker, M. 2003. Mfold Web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31:3406–3415.

    (Carolyne Bardeleben, Rach)