An Assessment of Accuracy, Error, and Conflict with Support Values from Genome-Scale Phylogenetic Data
     Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, New York

    E-mail: djtaylor@buffalo.edu.

    Abstract

    Despite the importance of molecular phylogenetics, few of its assumptions have been tested with real data. It is commonly assumed that nonparametric bootstrap values are an underestimate of the actual support, Bayesian posterior probabilities are an overestimate of the actual support, and among-gene phylogenetic conflict is low. We directly tested these assumptions by using a well-supported yeast reference tree. We found that bootstrap values were not significantly different from accuracy. Bayesian support values were, however, significant overestimates of accuracy but still had low false-positive error rates (0% to 2.8%) at the highest values (>99%). Although we found evidence for a branch-length bias contributing to conflict, there was little evidence for widespread, strongly supported among-gene conflict from bootstraps. The results demonstrate that caution is warranted concerning conclusions of conflict based on the assumption of underestimation for support values in real data.

    Key Words: Bootstrap • incongruence • Bayesian phylogenetic inference • yeast genome

    Introduction

    Part of the molecular evolution revolution has been the application of more rigorous methods to assess the reliability of the phylogenetic results, such as nonparametric bootstrapping (Felsenstein 1985). Although the statistical interpretation of support is still controversial, most practitioners have used the value as a measure of phylogenetic accuracy (Hillis and Bull 1993; Sanderson and Shaffer 2002). Moreover, most investigators have assumed that the accuracy of nonparametric bootstrapping is biased downwards such that a 70% value indicates strong support (>95%) for a clade (Soltis and Soltis 2003). Simulation, theoretical, and empirical studies of a few gene loci have supported the conservative tendencies of the bootstrap, especially compared with Bayesian posterior probabilities (Suzuki, Glazko, and Nei 2002; Wilcox et al. 2002; Alfaro, Zoller, and Lutzoni 2003; Douady et al. 2003; Erixon et al. 2003; Simmons, Pickett, and Miya 2004). Nevertheless, Hillis and Bull (1993) showed that the bootstrap underestimate depended on several assumptions, such as a symmetric tree shape and near-equal rates of evolution for external branches. Still, there is little evidence that unequal rates of molecular evolution can lead to a lack of bootstrap conservativeness for real data. The consensus expectation remains that bootstraps will almost always underestimate accuracy (Hillis and Bull 1993; Soltis and Soltis 2003).

    The assumption of a conservative support value and its implementation in phylogenetics software has positioned nonparametric bootstrapping to be the gold standard by which phylogenetic assumptions such as among-gene conflict are assessed. For closely related organisms, the extent of conflict has been difficult to assess because genome projects have been largely restricted to distantly related disease-causing organisms. Recently, Rokas et al. (2003) concluded that phylogenetic conflict among 106 genes of seven closely related yeast species was strongly supported and genomically widespread. They assumed that the nonparametric bootstrap was biased downwards and used a cutoff of 70% or higher to designate "strong support" in their 106-gene data set. Rokas et al. (2003) concluded that more than 20 genes were necessary to attain confidence in yeast phylogenies and that there was no identifiable molecular evolutionary source of phylogenetic conflict. Phillips, Delsuc, and Penny (2004), however, identified a compositional bias in the concatenated yeast data that strongly misleads minimum-evolution (ME) results, but not maximum-likelihood (ML) or maximum-parsimony (MP) results. The role of other biases (such as long-branch attraction from substitution rate differences) in generating conflict among yeast genes has not been explored in detail. The implications of the pervasive conflict finding are profound (Gee 2003). If among-gene conflict is generally strongly supported and widespread, then the results of thousands of existing studies and major biodiversity projects may be unreliable.

    Nevertheless, among-gene conflict could be the result of chance alone if based on a bootstrap support value of 70% that is not an underestimate. Here we use the yeast genome data and its strongly supported reference tree to address three questions. What are the estimated accuracies and error rates of nonparametric bootstraps and Bayesian posterior probabilities for yeast? Is among-gene phylogenetic conflict for yeast strongly supported by bootstrap values? Can branch-length biases be ruled out as a source of conflict among yeast genes?

    No reliability method yielded support values that clearly underestimated accuracy (fig. 1A–F). Indeed, MP and ML bootstrap values were not significantly different from ideal values (table 1). This suggests that yeast bootstrap values of 70% should not be treated as strong support, because there is a 30% probability that such a value is caused by chance alone. The generality of the observed lack of underestimation is unknown because this is the first genome-scale assessment of bootstrap accuracy among closely related taxa. The yeast data did have at least two features shown by Hillis and Bull (1993) to erode underestimation: asymmetric trees and unequal rates of external branch evolution (the average external branch length ratio was 3.29).
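
    The point about a 70% value can be made concrete with a small calculation. The sketch below is illustrative only and is not part of the original analysis: it assumes that support values are perfectly calibrated (support equals the probability that the clade is correct) and, for the second function, that clades are independent of one another. The function names and the example values are hypothetical.

```python
import math

def expected_incorrect(supports):
    """Expected number of incorrect clades, assuming each support value
    (in percent) equals the probability that its clade is correct."""
    return sum(1.0 - s / 100.0 for s in supports)

def prob_at_least_one_incorrect(supports):
    """Probability that at least one clade is incorrect, under the extra
    assumption (made here for illustration) that clades are independent."""
    all_correct = math.prod(s / 100.0 for s in supports)
    return 1.0 - all_correct

# Hypothetical example: three clades, each with 70% support.
example = [70, 70, 70]
print(round(expected_incorrect(example), 3))           # 0.9
print(round(prob_at_least_one_incorrect(example), 3))  # 0.657
```

    Under these assumptions, even a handful of clades at 70% support would be expected to include an incorrect one more often than not, which is why calibrated 70% values cannot be read as strong support.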

    FIG. 1. Comparison of accuracy for six estimates of phylogenetic support among 106 genes from eight species of yeast. The solid lines represent the ideal support where support values are equal to accuracy, the error bars are ±1 SEM, and the solid boxes are mean correctness for support values in a bin. (A) Bootstrap values from maximum parsimony (MP) of nucleotide data; (B) bootstrap values from MP of amino acid data; (C) bootstrap values from maximum likelihood of nucleotide data; (D) Bayesian posterior probabilities (PP) from nucleotide data with estimated model parameters (nt1); (E) Bayesian PP from nucleotide data with model parameters fixed to match those from ML (nt2); and (F) Bayesian PP from amino acid data with a fixed WAG rate matrix and estimated among-site rate parameters.

    Table 1 Wilcoxon Signed Ranks Test Comparing Support Values from 106 Yeast Genes to Their Ideal Values for Six Phylogenetic Methods.

    In contrast to the bootstrap, the Bayesian posterior probabilities (PP) showed a clear overestimate of the ideal support value (fig. 1D–F). That is, 70% PP values were on average correct only 40% of the time for nucleotides. This overestimate was significantly different from ideal for both nucleotide and amino acid data (table 1). A lower type II error rate for Bayesian analysis (fig. 2) indicated that this method was considerably more effective at recovering strongly supported correct clades than was bootstrapping (a pattern also found by Erixon et al. [2003]). However, these lower rates were accompanied by an increased percentage of strongly supported incorrect clades, or type I error rates (4.2% to 8.8% for PP versus 0% to 2.6% for bootstrap). Still, the type I error rates were less severe than the highs of 15.8% to 78% for simulated data (Suzuki, Glazko, and Nei 2002; Erixon et al. 2003). The largest type I errors were from Bayesian methods with the worst model and parameter fit (aa and nt2), which suggests sensitivity to error from model violation. Importantly, when a cutoff of greater than 99% was used for PP values, the type I and type II error rates were less than, or similar to, those for bootstrap values greater than 95% (fig. 2).

    FIG. 2. The error rates of six phylogenetic methods with two levels of support (>95% and >99%) for 106 single-gene trees and eight yeast taxa. Black bars represent the frequencies of incorrect nodes or false positives (type I errors), assuming the reference tree of Rokas et al. (2003) is correct. The numerical values for type I errors are presented above each bar. Gray bars represent the frequencies of correct nodes that were unsupported (false negatives, or type II errors) for a given cutoff value
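
    As an illustration of how the two error rates in figure 2 could be tallied, the following minimal sketch may help. It is not the authors' code, and the denominators are assumptions: the type I rate is taken as the fraction of nodes above the cutoff that are absent from the reference tree, and the type II rate as the fraction of correct nodes whose support does not exceed the cutoff. The function name and the example data are hypothetical.

```python
def error_rates(nodes, cutoff):
    """Return (type_I, type_II) for a support cutoff given in percent.

    nodes: iterable of (support_percent, in_reference_tree) pairs.
    Assumed definitions (not stated in this form in the text):
      type I  = incorrect nodes among those with support > cutoff
      type II = correct nodes whose support is <= cutoff
    """
    nodes = list(nodes)
    strong = [ok for support, ok in nodes if support > cutoff]
    correct = [support for support, ok in nodes if ok]
    type_one = sum(1 for ok in strong if not ok) / len(strong) if strong else 0.0
    type_two = sum(1 for s in correct if s <= cutoff) / len(correct) if correct else 0.0
    return type_one, type_two

# Hypothetical nodes pooled across gene trees: (support, present in reference tree)
nodes = [(99, True), (97, False), (96, True), (80, True), (60, False), (100, True)]
print(error_rates(nodes, 95))  # (0.25, 0.25)
print(error_rates(nodes, 99))  # (0.0, 0.75)
```

    The toy data show the trade-off described above: raising the cutoff from 95% to 99% removes the false positive but leaves more correct nodes unsupported. A reference clade that is missing from a gene tree altogether could be entered with a support of 0 so that it counts toward the type II rate.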

    Given the demonstrated lack of underestimation in the bootstrap, a 95% cutoff should be used for yeast to assure reasonable error rates when conflict is estimated. Almost all (97.4% to 100%) of the strongly supported clades (>95%) agreed with the reference tree, and the standard errors of the mean decreased as the support values increased such that error bars were barely visible beyond 95% support (fig. 1). The number of gene trees that had nodes strongly conflicting with the reference tree was zero (MP nucleotide), one (MP amino acid), and three (ML nucleotide). This amount of conflict was less than expected by chance for 106 genes. The incorrect trees had two long external branches connected by a short internal branch as found in long-branch attraction bias. Overall the external/internal branch length ratios were greater for trees that were incongruent with the reference tree, t (127) = 4.55, P < 0.001. These results indicated that among-gene conflict exists, and one source was potentially identifiable as long-branch attraction bias. However, pervasive conflict was not strongly supported by bootstrap values.

    The yeast data show that the assumption of underestimated support values can be a dangerous one. Conflict support can be grossly overestimated if the values are, in fact, accurate. Despite detecting a possible branch-artifact bias, our results revealed no evidence that clades with strong bootstrap support from a single gene are pervasively unreliable. We note that Daubin, Moran, and Ochman (2003) also found negligible (<5%) incongruence among an average of 1,067 genes for closely related genomes of bacteria. Although we agree with Rokas et al. (2003) that genome-scale data will make phylogenetic results more robust, we conclude that studies using fewer than 20 loci, tests and corrections for biases (Phillips, Delsuc, and Penny 2004), and strong support values (>95%) are not inherently unreliable.

    Methods

    The yeast data of Rokas et al. (2003) were used for all analyses. These data comprise amino acid and nucleotide alignments from 106 genes from seven closely related yeast species and an outgroup. The reference species tree was ((Saccharomyces cerevisiae, S. paradoxus), (S. mikatae), (S. kudriavzevii), (S. bayanus), (S. castellii), (S. kluyveri), (Candida albicans)). Of course, the actual species tree is unknown for this yeast complex, but the overwhelming genomic evidence for this reference tree (100% bootstraps for all nodes for the combined analysis of all loci) makes it a reasonable proxy (Rokas et al. 2003; Phillips, Delsuc, and Penny 2004). The size of the data set is 127,026 nucleotides and 42,342 amino acids.

    Nucleotide substitution models were fitted for each of the 106 genes by using two methods: decision theory (for Bayes nt1), as implemented in DT-ModSel (Minin et al. 2003), and likelihood ratio tests (for Bayes nt2 and ML), as implemented in Modeltest version 3.06 (Posada and Crandall 1998). For nt1, we used six substitution parameters when the number of substitution parameters for the model chosen was estimated to be three. For nt2, all parameter values estimated by Modeltest were fixed in the priors to improve comparisons with ML bootstraps. The amino acid rate matrix prior was fixed to the WAG model (Whelan and Goldman 2001) and an among-site rate variation parameter (gamma) was estimated for each gene. All other priors were the default values. Posterior probabilities were estimated with MrBayes version 3.0 or version 3.1 (Huelsenbeck and Ronquist 2001). We ran 500,000 generations, sampled every 100th generation, and conservatively set the burn-in value at 2,500. Where parameters were estimated (nt1), we ran 100,000 generations, sampled every 10th generation, and set the burn-in value at 1,000. The nonparametric bootstrap values were from Rokas et al. (2003).
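
    The number of trees that contribute to each posterior probability follows from the run settings above. The short sketch below works through that arithmetic; it assumes that the burn-in is counted in sampled trees rather than in generations (the text does not say which) and ignores the starting state of the chain. The function name is hypothetical.

```python
def retained_samples(generations, sample_every, burn_in_samples):
    """Number of sampled trees left after discarding the burn-in.

    Assumption: burn_in_samples counts sampled trees, not generations.
    """
    total_samples = generations // sample_every
    return total_samples - burn_in_samples

# Fixed-parameter runs: 500,000 generations, every 100th sampled, burn-in 2,500
print(retained_samples(500_000, 100, 2_500))  # 2500
# Estimated-parameter runs (nt1): 100,000 generations, every 10th sampled, burn-in 1,000
print(retained_samples(100_000, 10, 1_000))   # 9000
```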

    To assess the accuracy of each method, the support values were binned in increments of 5% between 51% and 100%. The support value for each bin (the median value of the bin) was then plotted against the proportion of clades that were correct, as determined by presence in the reference tree. Nonparametric Wilcoxon signed-ranks tests were used to determine the statistical significance of the deviation of the support levels from ideal (Simmons, Pickett, and Miya 2004).
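
    The binning and the signed-ranks comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analyses: each bin is represented by its middle value (for example, 53 for the 51 to 55 bin), the pairs compared in the signed-ranks test are taken to be each bin's observed accuracy and its ideal value, and only the rank sums are computed (no P value). All function names and example values are hypothetical.

```python
from collections import defaultdict

def bin_support(records, width=5, low=51, high=100):
    """Group (support_percent, is_correct) pairs into bins 51-55, ..., 96-100.

    Returns a list of (bin_middle, proportion_correct, n) tuples.
    """
    bins = defaultdict(list)
    for support, correct in records:
        if low <= support <= high:
            bins[int((support - low) // width)].append(bool(correct))
    result = []
    for index in sorted(bins):
        middle = low + index * width + (width - 1) / 2  # e.g. 53.0 for 51-55
        flags = bins[index]
        result.append((middle, sum(flags) / len(flags), len(flags)))
    return result

def signed_rank_sums(observed, ideal):
    """Return (W_plus, W_minus) for paired values.

    Zero differences are dropped and tied absolute differences
    receive their average rank.
    """
    diffs = sorted((o - i for o, i in zip(observed, ideal) if o != i), key=abs)
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(diffs):
        end = pos
        while end + 1 < len(diffs) and abs(diffs[end + 1]) == abs(diffs[pos]):
            end += 1
        for k in range(pos, end + 1):
            ranks[k] = (pos + end) / 2 + 1  # average of the 1-based ranks
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical clades: (support, present in reference tree)
records = [(52, True), (55, False), (56, True), (97, False), (98, True), (100, True)]
binned = bin_support(records)
observed = [100 * accuracy for _, accuracy, _ in binned]
ideal = [middle for middle, _, _ in binned]
print(binned)
print(signed_rank_sums(observed, ideal))
```

    A method that overestimates support would show accuracies mostly below the ideal values, so the negative rank sum would dominate; a method whose support matches accuracy would give rank sums of similar size.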

    Acknowledgements

    We thank Antonis Rokas for sending us the yeast genome data and David Penny for sending us his manuscript on phylogenetic bias. We also thank the University at Buffalo CCR for access to their computing clusters. D.J.T. and W.H.P. were supported by grants 9984901 and 0331095 from the National Science Foundation.

    Literature Cited

    Alfaro, M. E., S. Zoller, and F. Lutzoni. 2003. Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov Chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence. Mol. Biol. Evol. 20:255-266.

    Daubin, V., N. A. Moran, and H. Ochman. 2003. Phylogenetics and the cohesion of bacterial genomes. Science 301:829-832.

    Douady, C. J., F. Delsuc, Y. Boucher, W. F. Doolittle, and E. J. P. Douzery. 2003. Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol. Biol. Evol. 20:248-254.

    Erixon, P., B. Svennblad, T. Britton, and B. Oxelman. 2003. Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Syst. Biol. 52:664-673.

    Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.

    Gee, H. 2003. Ending incongruence. Nature 425:782.

    Hillis, D. M., and J. J. Bull. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42:182-192.

    Huelsenbeck, J. P., and F. Ronquist. 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754-755.

    Minin, V., Z. Abdo, P. Joyce, and J. Sullivan. 2003. Performance-based selection of likelihood models for phylogeny estimation. Syst. Biol. 52:674-683.

    Phillips, M. J., F. Delsuc, and D. Penny. 2004. Genome-scale phylogeny and the detection of systematic biases. Mol. Biol. Evol. 21: (in press).

    Posada, D., and K. A. Crandall. 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics 14:817-818.

    Rokas, A., B. L. Williams, N. King, and S. B. Carroll. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798-804.

    Sanderson, M. J., and H. B. Shaffer. 2002. Troubleshooting in molecular phylogenetic analysis. Annu. Rev. Ecol. Syst. 33:49-72.

    Simmons, M. P., K. M. Pickett, and M. Miya. 2004. How meaningful are Bayesian support values? Mol. Biol. Evol. 21:188-199.

    Soltis, P. S., and D. E. Soltis. 2003. Applying the bootstrap in phylogeny reconstruction. Stat. Sci. 18:256-267.

    Suzuki, Y., G. V. Glazko, and M. Nei. 2002. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc. Natl. Acad. Sci. USA 99:16138-16143.

    Whelan, S., and N. Goldman. 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18:691-699.

    Wilcox, T. P., D. J. Zwickl, T. A. Heath, and D. M. Hillis. 2002. Phylogenetic relationships of the dwarf boas and a comparison of Bayesian and bootstrap measures of phylogenetic support. Mol. Phylogenet. Evol. 25:361-371.

    (Derek J. Taylor and Willi)