Root of the Eukaryota Tree as Inferred from Combined Maximum Likelihood Analyses of Multiple Molecular Sequence Data
Molecular Biology and Evolution, 2005, No. 3
     * Department of Biosystems Science, Graduate University for Advanced Studies (Sokendai), Hayama, Kanagawa, Japan; The Institute of Statistical Mathematics, Minato-ku, Tokyo, Japan; The Rockefeller University; Institute of Biological Sciences, University of Tsukuba, Tsukuba, Japan

    Correspondence: E-mail: hashi@biol.tsukuba.ac.jp.

    Abstract

    Extensive studies aiming to establish the structure and root of the Eukaryota tree by phylogenetic analyses of molecular sequences have thus far not resulted in a generally accepted tree. To re-examine the eukaryotic phylogeny using alternative genes, and to obtain a more robust inference for the root of the tree as well as the relationship among major eukaryotic groups, we sequenced the genes encoding isoleucyl-tRNA and valyl-tRNA synthetases, cytosolic-type heat shock protein 90, and the largest subunit of RNA polymerase II from several protists. Combined maximum likelihood analyses of 22 protein-coding genes including the above four genes clearly demonstrated that Diplomonadida and Parabasala shared a common ancestor in the rooted tree of Eukaryota, but only when the fast-evolving sites were excluded from the original data sets. The combined analyses, together with recent findings on the distribution of a fused dihydrofolate reductase–thymidylate synthase gene, narrowed the possible position of the root of the Eukaryota tree to the branch leading to Opisthokonta or to the common ancestor of Diplomonadida/Parabasala. However, the analyses did not agree with the position of the root located on the common ancestor of Opisthokonta and Amoebozoa, which was argued by Stechmann and Cavalier-Smith [Curr. Biol. 13:R665–666, 2003] based on the presence or absence of a three-gene fusion of the pyrimidine biosynthetic pathway: carbamoyl-phosphate synthetase II, dihydroorotase, and aspartate carbamoyltransferase. The presence of the three-gene fusion recently found in the Cyanidioschyzon merolae (Rhodophyta) genome sequence data supported our analyses and argued against the rooting proposed by Stechmann and Cavalier-Smith in 2003.

    Key Words: Diplomonadida • Parabasala • eukaryote evolution • root • maximum likelihood • combined phylogeny

    Introduction

    The prominent part of eukaryotic diversity is represented by unicellular organisms, the protists. Based primarily on their morphology and biology, protists can be assigned to several dozen well-characterized groups (Lee, Leedale, and Bradbury 2000). Many attempts have been made to establish a natural, phylogenetic system of eukaryotes, but the relationships and the order of evolutionary emergence of many diverse groups remain unresolved, primarily because of a lack of clear synapomorphies. Phylogenetic inferences based on molecular sequences promised to provide a natural system, but thus far they have failed to give unequivocal results. As yet no consensus has been reached on either the structure or the root of the eukaryotic part of the universal tree. Recent articles have shown trees with essentially unresolved origins of many lineages (Dacks and Doolittle 2001; Roger and Silberman 2002; Simpson and Roger 2002; Baldauf 2003).

    Analysis of increasing numbers of sequences from more and more species resulted in trees that often resolved the relationships within individual lineages well, but more distant, deeper relationships were more often contradictory than not. Recent developments in our understanding of the processes of molecular evolution and the development of more discriminating techniques of sequence analysis uncovered several reasons for the difficulty in attaining the desired goal with single sequences (Philippe et al. 2000; Gribaldo and Philippe 2002). Among these are the progressive loss of phylogenetic information resulting from mutational saturation of diverging sequences, the long branch attraction (LBA) artifact of phylogenetic reconstruction (Felsenstein 1978), and the failure to model the group-specific and species-specific differences in the evolution of different positions of the macromolecules studied. Lateral gene transfers (LGT) have also been recognized recently to contribute to the discordance of different gene trees (Richards et al. 2003).

    To overcome the lack of resolution in single-gene phylogenies, various extensive analyses based on combined data sets with multiple genes have recently been performed, and a monophyletic origin of each of the higher-order groups, Opisthokonta (Metazoa + Fungi/Microsporidia), Amoebozoa (Lobosa + Conosa), Plantae (Viridiplantae + Rhodophyta + Glaucophyta), Euglenozoa + Heterolobosea, and Alveolata + stramenopiles has been established (Moreira, Le Guyader, and Philippe 2000; Baldauf et al. 2000; Arisue et al. 2002a, 2002b; Bapteste et al. 2002). In addition to these groups, the presence of several other higher-order groups has also been suggested by molecular and/or morphological findings (for review see Baldauf 2003). The group Cercozoa was supported by the actin and SSUrRNA phylogenies (Keeling 2001; Cavalier-Smith and Chao 2003a, 2003b) and by a shared insertion in the polyubiquitin genes (Archibald et al. 2003). The "excavate taxa" were proposed as a putative monophyletic or paraphyletic group (O'Kelly and Nerad 1999; Simpson and Patterson 1999), which includes organisms possessing a ventral feeding groove that collects suspended particles driven into it by the beating of a posterior flagellum. Cavalier-Smith (2002) further proposed a larger group, "Excavata," based on an unrooted SSUrRNA tree and morphological considerations. Excavata comprises Metamonada (including Diplomonadida), Parabasala, Percolozoa (including Heterolobosea), Euglenozoa, and Loukozoa (including Oxymonadida, Trimastix, Malawimonas, Carpediemonas, Jakobea). To date, however, examination of the molecular phylogenies of SSUrRNA and tubulins has not supported either monophyly or paraphyly of Excavata with any statistical confidence (Dacks et al. 2001; Edgcomb et al. 2001; Silberman et al. 2002; Simpson et al. 2002).

    Only a small number of higher-order groups are present in the tree of Eukaryota as mentioned above, and the overall picture of the biodiversity of Eukaryota seems to be rather good and comprehensive (Cavalier-Smith 2004). However, the phylogenetic relationships of the higher-order groups are still uncertain, and many alternative possibilities still exist regarding the root of the tree of Eukaryota. On the basis of the distribution of the fused dihydrofolate reductase (DHFR) and thymidylate synthase (TS) gene, Stechmann and Cavalier-Smith (2002) proposed that the root is likely to be located between Opisthokonta and the others. Later they argued that the root should be located between the bikonts and Opisthokonta/Amoebozoa (Stechmann and Cavalier-Smith 2003a), together with independent lines of evidence for other gene fusion events in the pyrimidine biosynthetic pathway. Particular attention was also paid to the presence of a fusion among three genes: carbamoyl-phosphate synthetase (CPS) II, which is composed of the glutamine amidotransferase (GAT) and CPS domains; dihydroorotase (DHO); and aspartate carbamoyltransferase (ACT). This fusion is found exclusively in Metazoa, Fungi, and the amoebozoan Dictyostelium discoideum (Nara, Hashimoto, and Aoki 2000).

    In the present study, in order to obtain a robust resolution for the evolutionary relationship among major eukaryotic groups and to gain a better insight into the possible root of the eukaryotic tree, we performed combined maximum likelihood (ML) analyses based on 24 genes concerning the relationships among seven major eukaryotic groups (Opisthokonta, Amoebozoa, Plantae, Euglenozoa/Heterolobosea, Alveolata/stramenopiles, Diplomonadida, and Parabasala) using an outgroup for rooting the tree. For this purpose we cloned and sequenced the genes from several protists coding for isoleucyl-tRNA and valyl-tRNA synthetases (IleRS, ValRS), cytosolic-type heat shock protein 90 (HSP90c), and the largest subunit of RNA polymerase II (RPB1). Analysis of all selected sites from original alignments for the 24 genes including two rRNAs was strongly affected by the LBA artifact, significantly positioning Diplomonadida at the base of the eukaryotic tree. However, analysis of 22 protein-coding genes using only slowly evolving amino acid sites demonstrated clearly that Diplomonadida and Parabasala are closely related, and that an early emergence of the common ancestor of these two groups is not necessarily the only supported possibility. Our present analyses, together with findings on the distribution of the fused DHFR-TS gene as mentioned above, narrowed the possible position of the root of the Eukaryota tree to the branches leading to Opisthokonta or to the common ancestor of Diplomonadida/Parabasala.

    Materials and Methods

    Sequencing of Protist Genes

    The original sequences (with GenBank accession numbers) reported in this work are as follows: IleRS of Glugea plecoglossi [Microsporidia] (AB092420), Encephalitozoon hellem [Microsporidia] (AB092421), Entamoeba histolytica (AB092423 and AB092424), Giardia intestinalis (AB092425), Trichomonas vaginalis (AB092426), Trypanosoma cruzi (AB092427), Plasmodium falciparum (AB092428); ValRS of G. plecoglossi (AB092429), E. hellem (AB092430), E. histolytica (AB092433 and AB092434), T. cruzi (AB092435), P. falciparum (AB092436); HSP90c of G. intestinalis (AB092407 and AB092408), T. vaginalis (AB092409 and AB092410), E. histolytica (AB092411); and RPB1 of G. intestinalis (AB092412), E. histolytica (AB092413). Details of cloning and sequencing strategies are described in the Supplementary Materials online.

    Sequence Alignments of the Genes Used for the Phylogenetic Analyses

    In addition to IleRS, ValRS, RPB1, and HSP90c, for which original sequences were established from several protists in this work, 20 other genes were used for phylogenetic analyses. These included small subunit (SSU) rRNA, large subunit (LSU) rRNA, EF1α, EF2, ribosomal proteins (RP) S14, S15a, L5, L8, L10a, cytosolic-type HSP70 (HSP70c), ER-type HSP70 (HSP70er), mitochondrial-type HSP70 (HSP70mit), chaperonin 60 (CPN60), four subunits of chaperonin-containing TCP-1 (CCT), actin (ACT), α-tubulin (TBα), and β-tubulin (TBβ). Genes related to metabolic pathways were not used in the present analyses, because LGT events are frequently observed for these genes, and thus inclusion of such genes would have compromised the inference of organismal phylogeny. We assumed that the genes used in this study have not been subjected to LGT as far as the analysis of the Eukaryota domain is concerned, because preliminary phylogenetic analyses of these genes did not suggest the presence of any LGT events.

    For the 22 protein-coding genes, amino acid sequences from diverse eukaryotes and several outgroup sequences were collected from various databases. Alignments, including the original sequences of the above four genes obtained in this study, were constructed using the SAM2.1 program (Hughey and Krogh 1996) and then adjusted manually. For SSUrRNA and LSUrRNA, alignments of diverse eukaryotic and four archaebacterial (outgroup) sequences were obtained using the secondary structure-based alignment database (http://oberon.fvms.ugent.be:8080/rRNA/index.html) (Wuyts et al. 2001, 2002). Several additional sequences not present in the database were inserted and then aligned manually. Unambiguously aligned sites were selected from each of the original alignments and used for phylogenetic analyses. Alignments and data sets used are available from T. H. upon request.

    Outgroup sequences used for individual genes are listed in Table S1 of the Supplementary Materials online. In brief, archaebacteria were used for the analyses of SSUrRNA, LSUrRNA, EF1α, EF2, RP-S14, RP-S15a, RP-L5, RP-L8, RP-L10a, and the four CCT (TCP-1) subunits. Eubacteria were used for IleRS, ValRS, HSP70mit, and CPN60. Paralogous eukaryotic sequences were used for the other genes. Combined maximum likelihood (ML) analyses in this study were carried out under the assumption that ingroup sequences in each gene have evolved independently.

    Programs and Models Used in the Phylogenetic Inference

    The NUCML and PROTML programs in the package MOLPHY (version 2.3) (Adachi and Hasegawa 1996) were used in the analyses that assumed a homogeneous rate across sites (Homogeneous model). To take across-site rate heterogeneity into consideration, the BASEML and CODEML programs in the package PAML (version 3.1) (Yang 1997) were used, where a discrete Γ-distribution with 8 categories for across-site rate heterogeneity was assumed (RAS model). The Γ-shape parameter (α) was estimated from the analyzed data for each gene. The combined ML analysis, which calculated the sum of the log-likelihoods, was carried out using the TOTALML program in MOLPHY with a variety of different gene combinations. The HKY85 and JTT-F models were assumed for nucleotide and amino acid substitution processes, respectively (Hasegawa, Kishino, and Yano 1985; Jones, Taylor, and Thornton 1992). The RELL bootstrap analysis (Kishino, Miyata, and Hasegawa 1990) was performed on alternative trees to obtain approximate bootstrap proportion (BP) values, because the limitation of computational time did not enable us to carry out real bootstrap analyses. The RELL method was shown to be a good approximation to the real bootstrap method (Hasegawa and Kishino 1994). The AU test (Shimodaira 2002) in the CONSEL program (Shimodaira and Hasegawa 2001) and the Shimodaira-Hasegawa (SH) test in the BASEML and CODEML programs were used for statistical comparisons among the alternative trees of interest.
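    The RELL idea mentioned above can be sketched in a few lines: rather than re-estimating each tree on every bootstrap replicate, the per-site log-likelihoods already computed for each candidate tree are resampled with replacement, and the tree with the highest resampled total is counted as the winner. The sketch below is an illustration only, not the MOLPHY implementation; the function name `rell_bootstrap`, the input layout, and the toy numbers are assumptions.

```python
import random


def rell_bootstrap(site_lnl, n_replicates=1000, seed=0):
    """Approximate bootstrap proportions by resampling site log-likelihoods.

    site_lnl maps a tree label to its list of per-site log-likelihoods
    (hypothetical input; all trees must cover the same sites in the same order).
    """
    rng = random.Random(seed)
    tree_ids = list(site_lnl)
    n_sites = len(site_lnl[tree_ids[0]])
    if any(len(site_lnl[t]) != n_sites for t in tree_ids):
        raise ValueError("all trees must have the same number of sites")

    wins = {t: 0 for t in tree_ids}
    for _ in range(n_replicates):
        # One pseudo-replicate: draw n_sites site indices with replacement.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        # The tree with the highest resampled log-likelihood wins this replicate.
        best = max(tree_ids, key=lambda t: sum(site_lnl[t][i] for i in sample))
        wins[best] += 1
    return {t: wins[t] / n_replicates for t in tree_ids}


# Toy data (invented): tree "A" is better at every site, so it wins every replicate.
bp = rell_bootstrap({"A": [-1.0] * 10, "B": [-2.0] * 10}, n_replicates=200)
```

    Because no branch lengths are re-optimized, the cost per replicate is a handful of sums, which is why the approach suits exhaustive comparisons of thousands of topologies.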

    Phylogenetic Analyses of Individual Genes

    In the preliminary stage of each individual gene analysis, an unrooted tree was considered for each gene, excluding sequences that belonged to an outgroup. The quick topology search option of the NUCML or PROTML program (–q –n2000) was used to produce candidate trees, which were subsequently analyzed by the ordinary ML method using the Homogeneous model. The best tree and alternative trees, of which the log-likelihood differences from the log-likelihood of the best tree were within 1 standard error (SE) (1SE criterion), were selected. These trees were further analyzed by the ML method using the RAS model, and the best tree was finally selected. Based on the best tree and widely accepted phylogenetic relationships, constraints on the subtrees for seven higher-order taxonomic groups of Eukaryota (Opisthokonta, Amoebozoa, Plantae, Alveolata/stramenopiles, Euglenozoa/Heterolobosea, Diplomonadida, and Parabasala) were assumed in advance. The subtree for the outgroup of each gene was also assumed in advance, based on established findings. Taxa and subtrees are shown in Table S1 of the Supplementary Materials online with other information.
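    The SE-based selection of candidate trees can be illustrated as follows. The text does not spell out how the standard error was computed; this sketch assumes the usual estimate from per-site log-likelihood differences (in the style of Kishino and Hasegawa), and the names `lnl_difference_and_se` and `select_candidates` as well as the toy values are hypothetical.

```python
import math


def lnl_difference_and_se(site_lnl_best, site_lnl_alt):
    """Return (total log-likelihood difference, its estimated standard error).

    Assumption: SE = sqrt(n / (n - 1) * sum((d_i - mean_d)^2)), where d_i are
    the per-site differences between the best tree and the alternative tree.
    """
    n = len(site_lnl_best)
    diffs = [b - a for b, a in zip(site_lnl_best, site_lnl_alt)]
    total = sum(diffs)
    mean = total / n
    squared_dev = sum((d - mean) ** 2 for d in diffs)
    return total, math.sqrt(n * squared_dev / (n - 1))


def select_candidates(site_lnl, k_se=1.0):
    """Keep the best tree plus every tree within k_se standard errors of it."""
    totals = {t: sum(v) for t, v in site_lnl.items()}
    best = max(totals, key=totals.get)
    kept = [best]
    for tree in site_lnl:
        if tree == best:
            continue
        diff, se = lnl_difference_and_se(site_lnl[best], site_lnl[tree])
        if diff <= k_se * se:
            kept.append(tree)
    return kept


# Toy per-site log-likelihoods (invented numbers).
toy = {
    "best": [-1.0, -1.0, -1.0, -1.0],
    "close": [-2.0, -0.5, -1.5, -0.5],
    "far": [-3.0, -3.0, -3.0, -3.0],
}
kept_1se = select_candidates(toy, k_se=1.0)
```

    Raising `k_se` to 3 or 4 gives the looser criteria used later for the combined analyses, at the price of a larger candidate set.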

    Thereafter, for each gene with the subtree constraints, a total of 10,395 possible trees for eight groups (seven groups + an outgroup) were exhaustively analyzed with the Homogeneous model for a data set including all the sites initially selected from an original alignment ("all" data set). Based on the best tree under the Homogeneous model, site-by-site rate categories, r1 (the slowest evolving sites, including constant sites) through r8 (the fastest evolving sites), were estimated by the analysis using the RAS model. To investigate the effect of removing constant or slowly evolving sites or rapidly evolving sites in the analyses, we made alternative data sets in a way similar to that previously examined by Hirt et al. (1999) and Dacks et al. (2002). The r8, r1, and r7 sites were removed stepwise from the "all" data set, producing another four data sets, –r8, –r18, –r78, and –r178. For each of these data sets, an exhaustive analysis of 10,395 trees using the Homogeneous model was carried out in the same manner as for the "all" data set.
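    The construction of the reduced data sets amounts to filtering alignment columns by their estimated rate category. A minimal sketch follows, assuming that each column has already been assigned a category from 1 (slowest) to 8 (fastest); the helper `remove_rate_categories`, the ASCII data set labels, and the toy alignment are illustrative assumptions rather than the authors' scripts.

```python
# Rate categories dropped for each data set named in the text
# (ASCII hyphens are used in the labels here).
DATA_SETS = {
    "all": set(),
    "-r8": {8},
    "-r18": {1, 8},
    "-r78": {7, 8},
    "-r178": {1, 7, 8},
}


def remove_rate_categories(columns, categories, drop):
    """Return the alignment columns whose rate category is not in `drop`.

    columns: list of alignment columns (one string per site).
    categories: rate category (1..8) assigned to each column.
    """
    if len(columns) != len(categories):
        raise ValueError("one rate category is required per column")
    return [col for col, cat in zip(columns, categories) if cat not in drop]


# Toy alignment of five sites across four taxa (invented data).
columns = ["AAAA", "ACAA", "ACGT", "TCGA", "AAAT"]
categories = [1, 4, 8, 7, 2]
reduced = {name: remove_rate_categories(columns, categories, drop)
           for name, drop in DATA_SETS.items()}
```

    In this toy case the "-r78" data set keeps three of the five columns, and "-r178" additionally drops the constant first column.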

    Combined Analyses of the Relationships among Seven Eukaryotic Higher-Order Groups with the Outgroup

    To evaluate the support for a given tree among the 10,395 trees from the total information residing in the individual genes, the phylogenetic information of individual genes was combined by summing up site-by-site log-likelihoods for each tree. Thereafter, the tree with the highest log-likelihood in total was selected as the best tree for the combined analysis. With this approach, parameters (such as branch lengths) were optimized for each gene, allowing the combined analysis to take into consideration heterogeneous phylogenetic information among the genes. For each of the five data sets ("all," –r8, –r18, –r78, and –r178) the summation was done over 24 genes including 2 rRNAs and over 22 protein-coding genes. Based on the analyses using the summation over 24 genes for the data sets "all" and –r78, candidate trees were selected from 10,395 alternatives based on the 4SE criterion (the best tree and trees with log-likelihood differences from the best tree less than 4SE), producing 137 and 572 trees for the data sets "all" and –r78, respectively. The union of these tree sets contained 577 trees and was exhaustively searched by the analysis using the RAS model for each of the 24 individual genes. The combined analysis was done in the same way as described above for each of the five data sets.
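    The summation step can be sketched as below: each gene contributes its own log-likelihood for every candidate topology (with parameters optimized separately per gene), and the totals are compared across topologies. This is an illustration of the principle, not the TOTALML program; `combined_best_tree`, the gene labels, and the numbers are assumptions.

```python
def combined_best_tree(per_gene_lnl):
    """Sum per-gene log-likelihoods for each tree and return (best tree, totals).

    per_gene_lnl maps gene name -> {tree label -> log-likelihood}, where each
    value was obtained with parameters optimized for that gene alone.
    """
    genes = list(per_gene_lnl)
    tree_ids = list(per_gene_lnl[genes[0]])
    for gene in genes:
        if set(per_gene_lnl[gene]) != set(tree_ids):
            raise ValueError(f"gene {gene!r} was not evaluated on the same trees")
    totals = {t: sum(per_gene_lnl[g][t] for g in genes) for t in tree_ids}
    best = max(totals, key=totals.get)
    return best, totals


# Toy example (invented values): neither gene prefers T3 on its own,
# yet T3 has the highest combined log-likelihood.
toy_lnl = {
    "geneA": {"T1": -1000.0, "T2": -1002.5, "T3": -1001.0},
    "geneB": {"T1": -2005.0, "T2": -2000.0, "T3": -2001.0},
}
best_tree, totals = combined_best_tree(toy_lnl)
```

    The toy case shows why a combined analysis can select a topology that is not the best tree of any single gene.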

    Combined Analyses with Additional Assumptions

    According to the results of the analyses as described in the following section, Diplomonadida and Parabasala were grouped in advance, and 945 possible trees for the six eukaryotic groups with an outgroup were exhaustively examined for each gene using the –r78 data set and the RAS model. Based on the combined analyses over all 24 genes and over the 22 protein-coding genes, candidate trees were selected using the 3SE criterion, producing 102 and 205 trees, respectively. The union of the two tree sets resulted in 214 trees, which were then used in AU tests to enable statistical comparisons with different combinations of genes.

    Finally, assuming that Euglenozoa/Heterolobosea are closely related to Diplomonadida/Parabasala, which corresponds to the Excavata monophyly hypothesis (Cavalier-Smith 2002), 105 possible trees for the five eukaryotic groups with an outgroup were exhaustively examined for each of the 22 protein-coding genes using the data sets "all," –r8, and –r78. Based on the combined analyses for these data sets, the 105 alternative trees were compared.
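    The tree counts quoted in this section (10,395, 945, and 105) follow from the standard formula for the number of unrooted, strictly bifurcating topologies on n labelled tips, (2n − 5)!!, when the outgroup is counted as one tip. The short check below is a sketch; the function name `count_unrooted_trees` is an assumption.

```python
def count_unrooted_trees(n_tips: int) -> int:
    """Number of unrooted bifurcating topologies for n_tips labelled tips: (2n - 5)!!."""
    if n_tips < 3:
        raise ValueError("at least three tips are required")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_tips - 4, 2):
        count *= k
    return count


# Seven groups + outgroup, six groups + outgroup, five groups + outgroup.
counts = {n: count_unrooted_trees(n) for n in (8, 7, 6)}
```

    Each constraint that merges two groups removes one tip, which is why the search space shrinks from 10,395 to 945 and then to 105 trees.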

    Results

    The ML trees for IleRS, ValRS, HSP90c, and RPB1 are shown in figure S1 of the Supplementary Materials online. Although the monophyly of most of the higher-order eukaryotic groups was reconstructed in these trees, the analyses did not statistically resolve the relationship among these groups. None of the best trees of the other 20 individual genes coincided with the accepted trees of Eukaryota either. The best tree of each gene showed some differences from the accepted trees (Moreira, Le Guyader, and Philippe 2000; Baldauf et al. 2000; Arisue et al. 2002a, 2002b; Bapteste et al. 2002).

    The Relationship among Seven Higher-Order Groups of Eukaryota

    At first, combined analyses were performed by the Homogeneous model using the "all" data set in order to provide an inference based on the most classical phylogenetic approach. The best tree by the 22 protein genes positioned the amitochondriate lineages, Diplomonadida and Parabasala, at the earliest and second-earliest branches of the eukaryotic tree, respectively, with 60% and 81% BP supports (fig. 1a). The two early branches were followed by the stepwise divergences of Opisthokonta, Amoebozoa, and Plantae. The tree reconstructed the monophyly of Plantae, Alveolata/stramenopiles, and Euglenozoa/Heterolobosea, as suggested by the presence of a fused DHFR-TS gene (Stechmann and Cavalier-Smith 2002; 2003a). Inclusion of two rRNA genes (fig. 1b) further supported the earliest branching of Diplomonadida (96%) and changed the branching order apart from the two early branches. This tree was congruent with a previous tree inferred by the combined analysis of approximately 100 proteins with 25,000 sites, which did not include Parabasala (Bapteste et al. 2002). Because the analysis including two rRNA genes seemed to be strongly affected by an LBA artifact, uniting Diplomonadida with outgroup sequences, analyses based on the 22 protein genes were mainly used for subsequent phylogenetic inference.

    FIG. 1.— Schematic representation of the best tree from 10,395 alternatives based on the analysis of the "all" data set using the Homogeneous model for across-site rate. (a), 22 protein genes. (b), 22 protein + 2 rRNA genes. Bootstrap proportion (BP) values are shown on internal branches.

    In contrast to the analysis using the "all" data set with Homogeneous modeling (fig. 1a), removal of the fast-evolving sites (–r8, –r78) and/or the use of the RAS model reduced the support for Diplomonadida as the earliest branch. The best tree of the analysis using the RAS model on the –r78 data set positioned Diplomonadida as the closest relative of Parabasala with 86% BP support (fig. 2a, Tree A). The branching order following the divergence of the common ancestor of Diplomonadida and Parabasala in Tree A was exactly the same as in the tree in figure 1a, although the BP support values for internal branches were reduced.

    FIG. 2.— Relationship between seven higher-order eukaryotic groups. (a), The best tree and the alternative trees based on the analysis of the –r78 data set, including 22 protein-coding genes with the rate across site (RAS) model ("–r78 with RAS"). Five alternative trees of interest: Tree A (the best tree); Tree B (same as the tree shown in Bapteste et al. (2002) but with Parabasala); Tree C (the tree shown in Baldauf et al. (2000) rooted by the common ancestor of Diplomonadida and Parabasala); Tree D (Stechmann and Cavalier-Smith 2002); Tree E (Stechmann and Cavalier-Smith 2003a). Internal nodes are represented by lowercase characters, a–l. In Tree A, multi-taxon groups are shown as triangles. The base of the triangle is proportional to the average number of taxa for different genes, and the number for each group is shown in brackets beside the group name. The width of the triangle is proportional to average branch length for taxa and for genes weighted by the number of sites used. BP values are shown in parentheses. In Tree B through Tree E, only topologies are shown schematically with p values of the AU test for comparison of the 577 trees examined. Abbreviations for the groups are as follows: Op, Opisthokonta; Am, Amoebozoa; Pl, Plantae; AS, Alveolata/stramenopiles; EH, Euglenozoa/Heterolobosea; DP, Diplomonadida/Parabasala. (b), Variations in BP values for the internal branches of the five trees shown in panel a. The values are shown for 10 different analyses using different data sets and different models for site rates.

    In the analysis of the –r78 data set with the RAS model, alternative trees of interest that have previously been suggested in the literature were compared (fig. 2a). Tree B corresponds to the tree of Bapteste et al. (2002). Tree C is a rooted version of the tree of Baldauf et al. (2000), which is based on the concatenated EF1α, actin, and tubulin genes; its root is located on the branch leading to the common ancestor of Diplomonadida and Parabasala. Trees D and E, suggested by Stechmann and Cavalier-Smith (2002, 2003a), are based on the distribution of fused DHFR-TS and/or CPSII-DHO-ACT genes. The AU test for comparing the 577 candidate trees significantly rejected Trees D and E (p < 0.05), but not at a level of p < 0.01.

    To investigate the effect of reducing sites and different model specifications, variations in BP support values among 577 alternative trees were compared between different data sets using different sample sites and different models for the site rates (fig. 2b). For node "d," 81% support was found for the data set "all" with the Homogeneous model, whereas only 45% support was found for the data set –r78 with the RAS model. For node "e," Homogeneous model analysis using the data set "all" did not support the node (38%), whereas analysis using the data set –r78, with either Homogeneous or RAS modeling, showed a support of greater than 85%. By removing the fast evolving sites, support for the close affinity between Diplomonadida and Parabasala increased, whereas support for the early branching status of these two was decreased. This tendency was particularly obvious in the analyses using the Homogeneous model (fig. 2b, nodes d and e). In contrast, removal of the slowest evolving sites (–r1) showed no significant effect on variation of the BP values for any of the nodes (fig. 2b), suggesting that removal of the slowest or constant sites did not affect phylogenetic inference as far as the ML method is concerned. For the other nodes of interest, represented by Trees B, C, D, and E, no high BP support was obtained for grouping Opisthokonta with Amoebozoa (node f) or Euglenozoa/Heterolobosea with Diplomonadida/Parabasala (node j), corresponding to the Excavata monophyly hypothesis (Cavalier-Smith 2002). Removal of the fast-evolving sites increased support for the earliest branching status of Opisthokonta (node l), especially with regard to the RAS model analysis (more than 10%), albeit without any clear support.

    Possible Root of the Tree of Eukaryota

    According to the above analyses, Diplomonadida and Parabasala were grouped together to form a new clade. To further analyze the relationship among higher-order groups and the root of the tree, 945 possible trees for the six higher-order eukaryotic groups (including the Diplomonadida-Parabasala clade) and an outgroup were exhaustively examined for each gene, using the –r78 data set with RAS modeling. Thereafter, subsequent combined analyses were performed. Because the alternative trees for the BP analyses were different from the previous ones (577 trees), the BP support values for the nodes in figure 2a except node "e" were slightly changed but were almost the same as shown in figure 2.

    Based on the criteria described in Materials and Methods, 214 trees were selected out of 945 trees for statistical comparisons. For these 214 trees, combined analyses were performed with various combinations of the genes. Based on the analysis with 22 protein genes, 60 trees were finally selected by the AU test with a criterion of p > 0.05. Figure 3 compares the p values of the 60 trees and three additional trees of interest (Tree D in fig. 2a by Stechmann and Cavalier-Smith (2002), the best tree from tubulins, and the best tree from rRNAs). The 214 trees did not include Tree E in figure 2a by Stechmann and Cavalier-Smith (2003a).

    FIG. 3.— Comparison of alternative candidate trees for the different combined analyses with different combinations of genes. Sixty trees were finally selected (Trees 1 to 60) based on analysis of 22 protein genes using the –r78 data set with RAS modeling, and are shown with the other trees of interest (Trees 61 to 63). Tree A to Tree D in figure 2a are shown in parentheses with Tree number. The symbols |, *, and in topologies are used to denote the root of Eukaryota, the presence of monophyly for Plantae, Euglenozoa/Heterolobosea, and Alveolata/stramenopiles, and the presence of the Excavata monophyly, respectively. Δli is a log-likelihood difference between the ML tree (Tree 44) and the corresponding tree. P values from the AU test are categorized into six groups by shading, as shown at the bottom of the figure. Each column corresponds to one of the different combined analyses with different gene combinations. The "15 proteins" include EF1α, EF2, IleRS, ValRS, RP-S14, RP-S15a, RP-L8, RPB1, HSP70c, HSP70er, HSP70mit, HSP90c, CPN60, TCP1-, and ACT. In the best tree for each of these 15 individual proteins, the branch length leading to the outgroup was less than 0.5 substitutions/site, and thus the outgroup was not as extremely distant as in the other seven proteins.

    In the analysis of the 22 proteins, Tree A (= Tree 44, shown in fig. 2a) was the best tree. When tubulins were removed from the analysis of the 22 proteins ("–Tubulin"), the best tree shifted to the tree with Opisthokonta rooting (Tree 19). This tree was also selected as the best tree by the combined analyses of the chaperone proteins and the "15 proteins," each of which had an outgroup that was not extremely distant (see the legend to fig. 3). Tree D was not rejected in the analyses of "–Tubulin" (p ≥ 0.1), "15 proteins" (p ≥ 0.2), translation-related proteins (p ≥ 0.1), and chaperone proteins (p ≥ 0.2), whereas analyses of tubulins (p < 0.01) and rRNAs (p < 0.01) significantly rejected Tree D.

    Analyses of tubulins and rRNAs significantly rejected most of the 60 trees. It would appear that the phylogenetic signals residing in the tubulins and rRNAs differ from those in the other protein data sets used in the present analyses. Interestingly, all the trees that were not rejected by the analysis of tubulins (p ≥ 0.05), except Tree 45, positioned Opisthokonta as the closest relative of Diplomonadida/Parabasala, indicating that the tubulin data sets strongly supported the close relationship between Opisthokonta and Diplomonadida/Parabasala. On the other hand, in the analysis by rRNAs, 59 of the 60 trees were significantly rejected, leaving only Tree B (Bapteste et al. 2002). Also in this analysis, the classical eukaryotic tree of SSUrRNA (Tree 62) was significantly supported, even when rapidly evolving sites were excluded from the analysis.

    Because Plantae, Euglenozoa/Heterolobosea, and Alveolata/stramenopiles share a fused DHFR-TS gene, these three groups are likely to be monophyletic, and the root of the tree of Eukaryota should not be located within these three groups (Stechmann and Cavalier-Smith 2002). The ML tree, from the combined analyses presented in this study, also reconstructed the monophyly for these three groups by sequence-based evidence, although the BP support was not high (fig. 2a). Therefore, from the 60 selected trees listed in figure 3, together with the DHFR-TS gene fusion–based findings, the possible position of the root of the eukaryotic tree could finally be narrowed to the branch leading either to Opisthokonta (Trees 5, 6, 16, 19, 20), to the common ancestor of Diplomonadida/Parabasala (Trees 44 [Tree A], 45, 51, 52 [Tree C]), to the common ancestor of Opisthokonta and Diplomonadida/Parabasala (Tree 55), or to the common ancestor of Plantae, Euglenozoa/Heterolobosea, and Alveolata/stramenopiles (Trees 57 and 58). In addition, if we also accept the close relationship between Excavata, Plantae, and Alveolata/stramenopiles as one of the candidate relationships (Cavalier-Smith 2002; Stechmann and Cavalier-Smith 2002; 2003a), Tree 1 cannot be ruled out either. The close relationship between Opisthokonta and Diplomonadida/Parabasala found in Trees 55 and 58 is likely to be an artifact of the tubulin signal mentioned above. The p values for Trees 6, 55, and 57 decreased to 0.01 ≤ p < 0.05 in the analysis of the "15 proteins" whose outgroups are not extremely distant. Based on these considerations Trees 6, 55, 57, and 58 can also be ruled out, further narrowing the possibilities to the Opisthokonta rooting or to the Diplomonadida/Parabasala rooting.

    Analysis with a Constraint on the Excavata Monophyly

    To seek a possible relationship among higher-order groups under the assumption that Excavata are monophyletic (Cavalier-Smith 2002), 105 possible alternative trees for the five higher-order groups (Opisthokonta, Amoebozoa, Plantae, Alveolata/stramenopiles, Excavata) and an outgroup were exhaustively examined using the RAS model with the –r78 data set. Tree D (Stechmann and Cavalier-Smith 2002) was selected as the best tree (fig. 4). Tree 60 in figure 3, in which Excavata was positioned at the base of Eukaryota, was not significantly different from the best tree (p = 0.433), and the tree (Tree 1 in figure 3) that exchanges the positions of Excavata and Plantae in Tree D was not significantly different either (p = 0.493). Sixty-nine trees were rejected by the AU test at the significance level, p < 0.05. Tree E (Stechmann and Cavalier-Smith 2003a) was included in these trees (p = 0.017). As shown in the BP values indicated in the internal branches of the trees in figure 4, the basal placings of Opisthokonta and Excavata were supported by 1% and 97%, respectively, when the "all" data set was used with Homogeneous modeling. In contrast, supports for these placings shifted to 74% and 18%, respectively in the analysis using the RAS model on the –r78 data set. A close relationship with 91% support was found between Opisthokonta and Amoebozoa when the Homogeneous model was used on the "all" data set. However, support decreased to 21% when the –r78 data set was used with RAS modeling. The analyses by removing fast-evolving sites with the RAS model favored the Opisthokonta rooting. This supported the hypothesis proposed by Stechmann and Cavalier-Smith in 2002 but not their later hypothesis of 2003 (Stechmann and Cavalier-Smith 2003a), under the assumption that Excavata monophyly really was the case.
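    The figure of 105 alternative trees mentioned above is consistent with the standard count of unrooted, fully bifurcating topologies for six labeled tips (five higher-order groups plus an outgroup), which is (2n − 5)!! for n tips. The following is only an illustrative check of that arithmetic; the function name is hypothetical and is not part of the software used in the study.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating topologies for n_taxa labeled tips.

    Uses the standard double-factorial formula (2n - 5)!! = 3 * 5 * ... * (2n - 5).
    """
    if n_taxa < 3:
        raise ValueError("at least three tips are needed for an unrooted tree")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


# Five higher-order groups plus one outgroup give six tips.
n_trees = count_unrooted_trees(6)
print(n_trees)
```

    With 69 of these 105 trees rejected by the AU test at p < 0.05, 36 trees remain that are not significantly different from the best tree.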

    FIG. 4.— Four alternative trees of interest selected from 105 possible trees in the analysis by the –r78 data set using the RAS model under the assumption that Excavata are monophyletic. Tree D, the best tree (Stechmann and Cavalier-Smith 2002, Tree 61 in fig. 3); Tree 1 in figure 3; Tree E, (Stechmann and Cavalier-Smith 2003a); Tree 60 in figure 3. P values of the AU test are shown in parentheses for Trees 1, E, and 60. BP values are shown over internal branches for different data sets analyzed and for the different models used.

    Discussion

    A close relationship between the amitochondriate lineages Diplomonadida and Parabasala in a rooted tree has been suggested by several phylogenies of non-metabolic genes: α-tubulin (Keeling and Doolittle 1996); EF1 (Hashimoto et al. 1997); ValRS (Hashimoto et al. 1998); CPN60 (Horner and Embley 2001); and five concatenated ribosomal proteins (Arisue et al. 2004), but it has not been reconstructed in other trees of non-metabolic genes. The present analyses clearly demonstrated for the first time that Diplomonadida and Parabasala are the closest relatives to each other in a rooted tree. The presence of mitochondrion-related genes and organelles in several amitochondriate protists has suggested that these protists secondarily lost their typical mitochondria separately in different lineages (Roger 1999; Rotte et al. 2000; Williams et al. 2002; Embley et al. 2003; Tovar et al. 2003). The grouping of Diplomonadida and Parabasala reduced the number of secondary losses of typical mitochondria in the Eukaryota tree, although these two lineages show deep differences in the organization of their amitochondriate phenotype (Martin and Müller 1998).

    The tree of Eukaryota has been examined by the SSUrRNA phylogeny during the past two decades, and various phylogenetic questions have been successfully addressed (e.g., Sogin and Silberman 1998). One of the most important implications of the SSUrRNA phylogeny was that the amitochondriate protists, Euglenozoa, Heterolobosea, and Amoebozoa diverged stepwise before the radiation of the terminal "Crown" groups (Sogin and Silberman 1998). The classical SSUrRNA tree as shown in Tree 62 of figure 3, however, was supported only by rRNAs in the present analyses. Averaged phylogenetic signals residing in the protein data sets did not favor the rRNA tree at all. Because figure 3 revealed that the rRNA tree was an extreme example of the tree of Eukaryota, the widely accepted, SSUrRNA-based scenario of eukaryotic evolution should be extensively revised (Cavalier-Smith 2004). At the same time, the tubulin phylogeny was also very discordant with the combined protein phylogeny without tubulins (fig. 3). Tubulins polymerize to form microtubules, the major component of the cytoskeleton, 9+2 axonemes, and mitotic spindles. The tubulins are the most important molecules for forming the structure and morphology of the cell. Compared with those of other proteins used in the present combined analyses, functional constraints on tubulins are more likely to be affected by the lifestyle and environment of the organisms. Thus, convergent evolution at the molecular level might have occurred and violated the organismal phylogeny in the tubulin data sets.

    From the combined analyses of the 22 protein sequences together with the findings for the distribution of the DHFR-TS gene fusion in the eukaryotic tree, two possibilities seem to exist for the root of the tree of Eukaryota, namely the branch leading either to Opisthokonta or that leading to the common ancestor of Diplomonadida/Parabasala. No strong support was detected for the latter possibility, in contrast to the prominent support generally obtained by the rRNA phylogeny. If the latter possibility could be discarded because of a widely recognized possible LBA artifact, Opisthokonta rooting would be the most likely option. The early emergence of Opisthokonta has been weakly recovered by a recent HSP90c phylogeny including six different, previously unsampled eukaryotic groups (Stechmann and Cavalier-Smith 2003b).

    Although the present analyses did not support the Excavata monophyly hypothesis (Cavalier-Smith 2002), we examined the eukaryotic phylum Excavata. This was because we could not entirely exclude the hypothesis by the present combined analyses alone, possibly because they were affected by serious LBA problems. Many unknown artifacts including LBA may critically influence the present analyses. One of the two higher-order groups in Excavata, Euglenozoa/Heterolobosea and Diplomonadida/Parabasala, might be artificially located at the base of the eukaryotic tree (fig. 3), and thus the monophyly of Excavata might be difficult to reconstruct. It is worth noting that the best tree obtained in the analysis with a constraint on the Excavata monophyly was exactly the same as Tree D (Stechmann and Cavalier-Smith 2002). This result demonstrates once again that Opisthokonta rooting is most likely, if the possibility of rooting on one of the fast-evolving groups, Diplomonadida/Parabasala and Euglenozoa/Heterolobosea, is not considered. If Tree D was really the case, then the DHFR-TS fused gene would have been lost in the parasites of the Diplomonadida/Parabasala group, because neither DHFR nor TS activity was detected in Giardia intestinalis, Trichomonas vaginalis, and Tritrichomonas foetus (Wang et al. 1983; Wang and Cheng 1984; Aldritt, Tien, and Wang 1985), and because no related gene was found in the genome sequencing database of Giardia intestinalis (McArthur et al. 2000). Examination of the presence or absence of the fused gene in the free-living organisms that belong to this group will be important to clarify the status of Diplomonadida/Parabasala in the phylogenetic tree based on the DHFR-TS gene fusion.

    Preliminary exploration of the distribution of the gene fusion, CPSII-ACT-DHO, which was exclusively found in Opisthokonta and Amoebozoa, suggested that these two groups are probably monophyletic and that the root is located on their common ancestor (Stechmann and Cavalier-Smith 2003a). However, this possibility is less well supported by the present study on the sequence-based phylogeny. Because information regarding the gene fusion events in the pyrimidine biosynthetic pathway is scant, further analyses examining gene organization of the pathway from diverse protist lineages are necessary to settle more precisely the discrepancy between the inferences based on the gene fusion and the sequence-based phylogeny. Interestingly, a sequence similarity search of the genome project database of a unicellular red alga, Cyanidioschyzon merolae (Matsuzaki et al. 2004) (http://merolae.biol.s.u-tokyo.ac.jp/) identified a fused CPSII-ACT-DHO gene in addition to separate CPS and GAT genes, demonstrating that the gene fusion event most likely occurred on the common ancestor of all eukaryotes including bikonts, Opisthokonta, and Amoebozoa. This finding reduces the possibility for rooting on the common ancestor of Opisthokonta and Amoebozoa, but instead gives more support for the Opisthokonta rooting suggested by the present molecular phylogeny. In addition to gene fusions in the pyrimidine biosynthetic pathway summarized by Nara, Hashimoto, and Aoki (2000), a novel gene fusion, ACT-DHOD (Dihydroorotate dehydrogenase), has recently been found in a euglenozoan protist, Bodo saliens (Annoura et al. 2004). Compared to the DHFR-TS fusion event, gene fusions in the pyrimidine biosynthetic pathway may be more complicated, with fusion and separation events possibly occurring more than once on the independent branches of the eukaryotic tree.

    If the constraints that were assumed in advance for each of the higher-order groups examined in the present analyses were very discordant with phylogenetic information for each of the genes analyzed, the constraints may have violated phylogenetic inference. Because constraints to RPB1 significantly affected the log-likelihood difference between the best trees with and without constraints (p < 0.01; see table S1 of the Supplementary Materials online), we excluded RPB1 from all 22 protein genes in the combined analysis, as shown in figure 3, and explored the influence of such exclusion. Analysis only by RPB1 was not significantly different from the analysis of the 22 proteins with regard to the patterns in figure 3. Removal of RPB1 from the 22 proteins ("–RPB1") shifted the best tree to Tree 25 with Euglenozoa/Heterolobosea rooting, but the overall pattern of p values for the 63 trees in figure 3 did not change after its removal, demonstrating that no significant influence was introduced by inclusion of RPB1. Instead, as already mentioned in the Results, inclusion of tubulins and/or rRNAs might significantly affect the present combined analyses. We also examined an alternative combined analysis of the 22 protein genes by excluding sequences of Rhodophyta and/or Glaucophyta, which are present in nine of the genes (see table S1 in the Supplementary Materials online), because the monophyletic origin of Plantae, including Viridiplantae, Rhodophyta, and Glaucophyta, has recently been challenged (Nozaki et al. 2003). No differences were found either with or without Rhodophyta and/or Glaucophyta, demonstrating that the constraint on the monophyletic origin of Plantae had no significant influence. The possibility for a polyphyletic origin of Viridiplantae, Rhodophyta, and Glaucophyta that was proposed by Nozaki et al. (2003) should be re-examined with more data.

    In our present analyses performed by removing slow- or fast-evolving sites, owing to the limitation of the computational time, we did not provide "control" experiments for assessing a specific effect of removing sites over what is expected with the random removal of sites. Although we roughly compared the BP values of the analyses with different numbers of sites in the present analyses, in general one cannot simply compare them because a positive correlation is present between BP values and numbers of sites used in the analysis. A control experiment should be done in the next step for each of the data sets with different numbers of sites. In spite of the use of the model for approximating rate heterogeneity among sites (RAS), the removal of fast-evolving sites (–r78) still showed an additional effect on the BP values. This is probably because an effect of model misspecification was apparent, especially on sites r7 and r8. Because violation of amino acid frequency constancy was not so evident for the 22 proteins analyzed (table S1 of the Supplementary Materials online), the misspecification can probably be attributed to evolutionary rate distribution differences across subtrees (covarion shifts), which were not taken into consideration in the present analyses with the RAS model. The presence of such model misspecification was discussed in detail in an EF1 analysis for the position of Microsporidia in light of LBA and covarion shifts (Inagaki et al. 2004). If the "all" data set in the present analyses contained such a covarion-like structure, then the RAS model could not fully approximate the data, resulting in a possible LBA artifact which locates Diplomonadida or a common ancestor of Diplomonadida and Parabasala at the base of the tree.
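    The kind of control experiment described above could be set up roughly as follows. This is a minimal sketch under stated assumptions, not the procedure used in this study: it assumes that each alignment column has already been assigned a rate category from 1 to 8, and all function names and the toy data are hypothetical. It builds a data set with the fastest categories removed and a control with the same number of sites retained at random, so that BP values can be compared at equal alignment length.

```python
import random


def remove_fast_sites(rate_categories, fast=(7, 8)):
    """Indices of sites kept after dropping those in the fast rate categories."""
    return [i for i, c in enumerate(rate_categories) if c not in fast]


def random_removal_control(n_sites, n_keep, seed=0):
    """Indices of n_keep sites chosen at random, regardless of rate category."""
    rng = random.Random(seed)  # fixed seed so the control is reproducible
    return sorted(rng.sample(range(n_sites), n_keep))


def subset_alignment(alignment, kept):
    """Restrict every sequence in the alignment to the kept columns."""
    return {name: "".join(seq[i] for i in kept) for name, seq in alignment.items()}


# Toy data (hypothetical): two aligned sequences and one rate category per column.
toy_alignment = {"taxonA": "MKVLAGTQ", "taxonB": "MKILSGTE"}
toy_categories = [1, 7, 3, 8, 2, 5, 7, 4]

kept = remove_fast_sites(toy_categories)
control = random_removal_control(len(toy_categories), len(kept), seed=1)

fast_removed = subset_alignment(toy_alignment, kept)
random_removed = subset_alignment(toy_alignment, control)
print(len(fast_removed["taxonA"]), len(random_removed["taxonA"]))
```

    Repeating the random removal with several seeds would give a distribution of BP values against which the effect of removing the fast-evolving sites could be judged.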

    The ML tree of the combined analysis using the RAS model on the –r78 data set, including 22 proteins (Tree A, shown in fig. 2a), clearly demonstrated the difficulty in solving the higher-order phylogeny of Eukaryota. The branch lengths leading to the outgroup and Parabasala or Diplomonadida are extremely long, and those leading to nodes a, b, and c are extremely short. Except for Opisthokonta, taxon sampling within each higher-order group is sparse. The relationships between the major eukaryotic groups and an outgroup cannot be clearly resolved apart from the close relationship between Diplomonadida and Parabasala. Although the present analyses could narrow the possible root of the tree of Eukaryota, the problem is still open because of the lack of phylogenetic information. With the accumulation of EST sequence data, large-scale analyses of eukaryotic phylogeny ("phylogenomics") have recently been carried out and conclusively demonstrated the position of choanoflagellates (Philippe et al. 2004). The "phylogenomics" approach, using more sequence data with adequate taxon sampling, together with the application of sophisticated data analysis for combined phylogeny, will be indispensable for providing more robust inference on the higher-order relationships and the root of the Eukaryota tree.

    Supplementary Material

    Materials and Methods for Cloning and Sequencing of the Protist Genes.

    FIG. S1. Unrooted maximum likelihood trees of Eukaryota. (a), IleRS; (b), ValRS; (c), Hsp90c; and (d), RPB1.

    Table S1. Constraints on the subtrees for seven higher-order taxonomic groups of Eukaryota and an outgroup.

    Acknowledgements

    We express sincere thanks to Dr. M. Müller for providing us with an opportunity to analyze genes from several amitochondriate protists, for invaluable discussions on the phylogenetic analyses, and for critical review of the manuscript. We also thank Dr. H. Philippe for provision of the IleRS and ValRS alignments and discussions, Dr. L. B. Sánchez for technical support for gene cloning, Dr. T. Shirakura for initial sequencing of the G. plecoglossi ValRS and E. hellem IleRS genes, A. Deguchi and S. Kikuchi for technical assistance, Drs. F. D. Gillin and S. A. Aley (San Diego, CA), P. J. Johnson (Los Angeles, CA), L. B. Sánchez (New York, NY), L. M. Weiss (New York, NY), and K. Kita (Tokyo, Japan) for provision of the gDNA and/or gDNA/cDNA libraries of Giardia intestinalis, Trichomonas vaginalis, Entamoeba histolytica, Encephalitozoon hellem, and Plasmodium falciparum, respectively. This work was carried out under the ISM Cooperative Research Programs (03ISMCRP-1015 and 04ISMCRP-1017) and the Research Project at Center for Computational Sciences, University of Tsukuba. Work carried out in the laboratory at the Rockefeller University in New York was supported by USPHS National Institutes of Health grant AI11942 to M.M. The visits of T.H. to the New York laboratory were supported by the US-Japan Cooperative Research Project by the National Science Foundation (USA), and by the Japan Society for the Promotion of Science (INT-9726707).

    References

    Adachi, J., and M. Hasegawa. 1996. MOLPHY version 2.3: program for molecular phylogenetics based on maximum likelihood. Comput. Sci. Monographs No. 28, The Institute of Statistical Mathematics, Tokyo.

    Aldritt, S. M., P. Tien, and C. C. Wang. 1985. Pyrimidine salvage in Giardia lamblia. J. Exp. Med. 161:437–445.

    Annoura, T., T. Nara, T. Makiuchi, T. Hashimoto, and T. Aoki. 2004. The origin of dihydroorotate dehydrogenase genes of kinetoplastids, with special reference to their biological significance and adaptation to anaerobic, parasitic conditions. J. Mol. Evol. in press.

    Archibald, J. M., D. Longet, J. Pawlowski, and P. J. Keeling. 2003. A novel polyubiquitin structure in Cercozoa and Foraminifera: evidence for a new eukaryotic supergroup. Mol. Biol. Evol. 20:62–66.

    Arisue, N., T. Hashimoto, J. A. Lee, D. V. Moore, P. Gordon, C. W. Sensen, T. Gaasterland, M. Hasegawa, and M. Müller. 2002a. The phylogenetic position of the pelobiont Mastigamoeba balamuthi based on sequences of rDNA and translation elongation factors EF-1 and EF-2. J. Eukaryot. Microbiol. 49:1–10.

    Arisue, N., T. Hashimoto, H. Yoshikawa, Y. Nakamura, G. Nakamura, F. Nakamura, T. Yano, and M. Hasegawa. 2002b. Phylogenetic position of Blastocystis hominis and of stramenopiles inferred from multiple molecular sequence data. J. Eukaryot. Microbiol. 49:42–53.

    Arisue, N., Y. Maki, H. Yoshida, A. Wada, L. B. Sánchez, M. Müller, and T. Hashimoto. 2004. Comparative analysis of the ribosomal components of the hydrogenosome-containing protist, Trichomonas vaginalis. J. Mol. Evol. 59:59–71.

    Baldauf, S. L. 2003. The deep roots of eukaryotes. Science 300:1703–1706.

    Baldauf, S. L., A. J. Roger, I. Wenk-Siefert, and W. F. Doolittle. 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972–977.

    Bapteste, E., H. Brinkmann, J. A. Lee, D. V. Moore, C. W. Sensen, P. Gordon, L. Duruflé, T. Gaasterland, P. Lopez, M. Müller, and H. Philippe. 2002. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl. Acad. Sci. USA 99:1414–1419.

    Cavalier-Smith, T. 2002. The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int. J. Syst. Evol. Microbiol. 52:297–354.

    Cavalier-Smith, T., and E. E. Chao. 2003a. Phylogeny of choanozoa, apusozoa, and other protozoa and early eukaryote megaevolution. J. Mol. Evol. 56:540–563.

    Cavalier-Smith, T., and E. E. Chao. 2003b. Phylogeny and classification of phylum Cercozoa (Protozoa). Protist 154:341–358.

    Cavalier-Smith, T. 2004. Only six kingdoms of life. Proc. R. Soc. Lond. Ser. B 271:1251–1262.

    Dacks, J. B., and W. F. Doolittle. 2001. Reconstructing/deconstructing the earliest eukaryotes: how comparative genomics can help. Cell 107:419–425.

    Dacks, J. B., J. D. Silberman, A. G. B. Simpson, S. Moriya, T. Kudo, M. Ohkuma, and R. J. Redfield. 2001. Oxymonads are closely related to the excavate taxon Trimastix. Mol. Biol. Evol. 18:1034–1044.

    Dacks, J. B., A. Marinets, W. F. Doolittle, T. Cavalier-Smith, and J. M. Logsdon Jr. 2002. Analyses of RNA polymerase II genes from free-living protists: phylogeny, long branch attraction, and the eukaryotic big bang. Mol. Biol. Evol. 19:830–840.

    Edgcomb, V. P., A. J. Roger, A. G. B. Simpson, D. T. Kysela, and M. L. Sogin. 2001. Evolutionary relationships among "jakobid" flagellates as indicated by alpha- and beta-tubulin phylogenies. Mol. Biol. Evol. 18:514–522.

    Embley, T. M., M. van der Giezen, D. S. Horner, P. L. Dyal, and P. Foster. 2003. Mitochondria and hydrogenosomes are two forms of the same fundamental organelle. Philos. Trans. R. Soc. Lond. B Biol. Sci. 358:191–201.

    Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:401–410.

    Gribaldo, S., and H. Philippe. 2002. Ancient phylogenetic relationships. Theoret. Popul. Biol. 61:391–408.

    Hasegawa, M., and H. Kishino. 1994. Accuracies of the simple methods for estimating the bootstrap probability of a maximum-likelihood tree. Mol. Biol. Evol. 11:142–145.

    Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.

    Hashimoto, T., L. B. Sánchez, T. Shirakura, M. Müller, and M. Hasegawa. 1998. Secondary absence of mitochondria in Giardia lamblia and Trichomonas vaginalis revealed by valyl-tRNA synthetase phylogeny. Proc. Natl. Acad. Sci. USA 95:6860–6865.

    Hashimoto, T., Y. Nakamura, T. Kamaishi, and M. Hasegawa. 1997. Early evolution of eukaryotes inferred from protein phylogenies of translation elongation factors 1 and 2. Arch. Protistenkd. 148:287–295.

    Hirt, R. P., J. M. Logsdon Jr., B. Healy, M. W. Dorey, W. F. Doolittle, and T. M. Embley. 1999. Microsporidia are related to Fungi: evidence from the largest subunit of RNA polymerase II and other proteins. Proc. Natl. Acad. Sci. USA 96:580–585.

    Horner, D. S., and T. M. Embley. 2001. Chaperonin 60 phylogeny provides further evidence for secondary loss of mitochondria among putative early-branching eukaryotes. Mol. Biol. Evol. 18:1970–1975.

    Hughey, R., and A. Krogh. 1996. Hidden Markov models for sequence analysis: extension and analysis of the basic method. Comput. Appl. Biosci. 12:95–107.

    Inagaki, Y., E. Susko, N. M. Fast, and A. J. Roger. 2004. Covarion shifts cause a long-branch attraction artifact that unites Microsporidia and Archaebacteria in EF-1 phylogenies. Mol. Biol. Evol. 21:1340–1349.

    Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275–282.

    Keeling, P. J. 2001. Foraminifera and Cercozoa are related in actin phylogeny: two orphans find a home? Mol. Biol. Evol. 18:1551–1557.

    Keeling, P. J., and W. F. Doolittle. 1996. Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Mol. Biol. Evol. 13:1297–1305.

    Kishino, H., T. Miyata, and M. Hasegawa. 1990. Maximum likelihood inference of protein phylogeny, and the origin of chloroplasts. J. Mol. Evol. 31:151–160.

    Lee, J. J., G. F. Leedale, P. Bradbury, eds. 2000. An Illustrated Guide of Protozoa, 2nd ed. Society of Protozoologists, Lawrence, Kans.

    Martin, W., and M. Müller. 1998. The hydrogen hypothesis of the first eukaryote. Nature 392:37–41.

    Matsuzaki, M., O. Misumi, T. Shin-i et al. (42 co-authors). 2004. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428:653–657.

    McArthur, A. G., H. G. Morrison, J. E. Nixon et al. (15 co-authors). 2000. The Giardia genome project database. FEMS Microbiol. Lett. 189:271–273.

    Moreira, D., H. Le Guyader, and H. Philippe. 2000. The origin of red algae and the evolution of chloroplasts. Nature 405:69–72.

    Nara, T., T. Hashimoto, and T. Aoki. 2000. Evolutionary implications of the mosaic pyrimidine-biosynthetic pathway in eukaryotes. Gene 257:209–222.

    Nozaki, H., M. Matsuzaki, M. Takahara, O. Misumi, H. Kuroiwa, M. Hasegawa, T. Shin-i, Y. Kohara, N. Ogasawara, and T. Kuroiwa. 2003. The phylogenetic position of red algae revealed by multiple nuclear genes from mitochondria-containing eukaryotes and an alternative hypothesis on the origin of plastids. J. Mol. Evol. 56:485–497.

    O'Kelly, C. J., and T. A. Nerad. 1999. Malawimonas jakobiformis n. gen., n. sp. (Malawimonadidae n. fam.): a jakoba-like heterotrophic nanoflagellate with discoidal mitochondrial cristae. J. Eukaryot. Microbiol. 46:522–531.

    Philippe, H., P. Lopez, H. Brinkmann, K. Budin, A. Germot, J. Laurent, D. Moreira, M. Müller, and H. Le Guyader. 2000. Early-branching or fast-evolving eukaryotes? An answer based on slowly evolving positions. Proc. R. Soc. Lond. B Biol. Sci. 267:1213–1221.

    Philippe, H., E. A. Snell, E. Bapteste, P. Lopez, P. W. Holland, and D. Casane. 2004. Phylogenomics of eukaryotes: impact of missing data on large alignments. Mol. Biol. Evol. 21:1740–1752.

    Richards, T. A., R. P. Hirt, B. A. P. Williams, and T. M. Embley. 2003. Horizontal gene transfer and the evolution of parasitic protozoa. Protist 154:17–32.

    Roger, A. J. 1999. Reconstructing early events in eukaryotic evolution. Am. Nat. 154:S146–S163.

    Roger, A. J., and J. D. Silberman. 2002. Mitochondria in hiding. Nature 418:827–829.

    Rotte, C., K. Henze, M. Müller, and W. Martin. 2000. Origins of hydrogenosomes and mitochondria. Curr. Opin. Microbiol. 3:481–486.

    Shimodaira, H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51:492–508.

    Shimodaira, H., and M. Hasegawa. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.

    Silberman, J. D., A. G. B. Simpson, J. Kulda, I. Cepicka, V. Hampl, P. J. Johnson, and A. J. Roger. 2002. Retortamonad flagellates are closely related to diplomonads–implications for the history of mitochondrial function in eukaryote evolution. Mol. Biol. Evol. 19:777–786.

    Simpson, A. G. B., and D. J. Patterson. 1999. The ultrastructure of Carpediemonas membranifera (Eukaryota), with reference to the "excavate hypothesis." Eur. J. Protistol. 35:353–370.

    Simpson, A. G. B., and A. J. Roger. 2002. Eukaryotic evolution: getting to the root of the problem. Curr. Biol. 12:R691–R695.

    Simpson, A. G. B., A. J. Roger, J. D. Silberman, D. D. Leipe, V. P. Edgcomb, L. S. Jermiin, D. J. Patterson, and M. L. Sogin. 2002. Evolutionary history of "early-diverging" eukaryotes: the excavate taxon Carpediemonas is a close relative of Giardia. Mol. Biol. Evol. 19:1782–1791.

    Sogin, M. L., and J. D. Silberman. 1998. Evolution of the protists and protistan parasites from the perspective of molecular systematics. Int. J. Parasitol. 28:11–20.

    Stechmann, A., and T. Cavalier-Smith. 2002. Rooting the eukaryote tree by using a derived gene fusion. Science 297:89–91.

    Stechmann, A., and T. Cavalier-Smith. 2003a. The root of the eukaryote tree pinpointed. Curr. Biol. 13:R665–666.

    Stechmann, A., and T. Cavalier-Smith. 2003b. Phylogenetic analysis of eukaryotes using heat-shock protein Hsp90. J. Mol. Evol. 57:408–419.

    Tovar, J., G. Leon-Avila, L. B. Sánchez, R. Sutak, J. Tachezy, M. van der Giezen, M. Hernandez, M. Müller, and J. M. Lucocq. 2003. Mitochondrial remnant organelles of Giardia function in iron-sulphur protein maturation. Nature 426:172–176.

    Wang, C. C., R. Verham, S. F. Tzeng, S. Aldritt, and H. W. Cheng. 1983. Pyrimidine metabolism in Tritrichomonas foetus. Proc. Natl. Acad. Sci. USA 80:2564–2568.

    Wang, C. C., and H. W. Cheng. 1984. Salvage of pyrimidine nucleosides by Trichomonas vaginalis. Mol. Biochem. Parasitol. 10:171–184.

    Williams, B. A., R. P. Hirt, J. M. Lucocq, and T. M. Embley. 2002. A mitochondrial remnant in the microsporidian Trachipleistophora hominis. Nature 418:865–869.

    Wuyts, J., P. De Rijk, Y. Van de Peer, T. Winkelmans, and R. De Wachter. 2001. The European Large Subunit Ribosomal RNA database. Nucleic Acids Res. 29:175–177.

    Wuyts, J., Y. Van de Peer, T. Winkelmans, and R. De Wachter. 2002. The European database on small subunit ribosomal RNA. Nucleic Acids Res. 30:183–185.

    Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556.

    (Nobuko Arisue, Masami)