Molecular Biology and Evolution (《分子生物学进展》), 2005, No. 3
Comparative Analyses Reveal a Complex History of Molecular Evolution for Human MYH16
     * Department of Anthropology, and Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, Tempe

    Correspondence: E-mail: acstone@asu.edu.

    Abstract

    We describe the pattern of molecular evolution at a sarcomeric myosin gene, MYH16, using more than 30,000 bp of exon and intron sequence data from the chimpanzee and human genome sequencing projects to evaluate the timing and consequences of a human lineage–specific frameshift deletion. We estimate the age of the deletion at approximately 5.3 MYA. This estimate is consistent with the time of human and chimpanzee divergence and is significantly older than the first appearance of the genus Homo in the fossil record. We also find conflicting estimates of nonsynonymous fixation rates (dN) across different regions of this gene, revealing a complex pattern inconsistent with a simple model of pseudogene evolution for human MYH16.

    Key Words: MYH16 • sarcomeric myosin • Pan troglodytes • Macaca mulatta

    Introduction

    One observation to emerge from comparative genomic studies of humans and our close relative, the chimpanzee (Pan troglodytes), is that we may be differentiated by as little as 1% at the genomic level (Sibley and Ahlquist 1984; Yi, Ellsworth, and Li 2002). In addition, these studies consistently suggest that much interspecific functional variation results from differences in regions external to exons and operates at the level of expression through transcriptional and translational variation (King and Wilson 1975; Enard et al. 2002a; Carroll 2003). Therefore, because human-chimpanzee variation in exonic regions may be relatively uncommon, evolutionary genetic analysis of even a single amino acid mutation can have large implications for our effort to determine what differentiates us from our closest relative. One example comes from studies by Enard et al. (2002b) and Zhang, Webb, and Podlaha (2002), who have shown that two amino acid substitutions in the FOXP2 gene were recently fixed in the human lineage by selection and likely played a large role in the evolution of language ability. Because we are often working with very few amino acid substitutions in human coding regions that may have been under recent positive selection, our statistical power to estimate the strength of selection on these sites is often severely limited (Anisimova, Bielawski, and Yang 2002). Similarly, estimates of the origin or age of a small number of amino acid substitutions can have extremely large variances, depending on the mutation rate assumptions underlying the chosen model of molecular evolution (Enard et al. 2002b).

    Recently, an analysis by Stedman et al. (2004) of a sarcomeric myosin gene (MYH16) found a frameshift deletion in humans, but not in other primates, that results in a premature stop codon and appears to be associated with masticatory muscle fiber size reduction. In an effort to determine the age of the inactivation of this gene, they sequenced 1,153 bp from seven exons, including exon 18, where the deletion and stop codon occur. In their analyses, 1,065 bp from six exons downstream of exon 18 were examined, and a total of two nonsynonymous (N, 840 total sites) and two synonymous (S, 225 total sites) fixations between chimpanzee and human were found, resulting in a ratio of nonsynonymous substitutions per nonsynonymous site to synonymous substitutions per synonymous site (dN/dS) of 0.270. Stedman et al. (2004) used a method (Chou et al. 2002) that assumes purifying selection on nonsynonymous mutations occurred before the deletion event and that this selection was relaxed after the origin of the deletion. In this scenario, nonsynonymous mutations are expected to become fixed at the neutral fixation rate after the deletion. The authors inferred that both nonsynonymous fixations (which result in one amino acid difference) occurred along the human lineage using an ancestral reconstruction (based on dog and other primate MYH16 sequences) and estimated that the MYH16 deletion and inactivation occurred approximately 2.4 MYA. This estimate coincides with the first appearance of the genus Homo in the fossil record (Kimbel, Johanson, and Rak 1997). This led Stedman et al. (2004) to conclude that the deletion was likely associated with hominin masticatory gracilization and brain-size expansion, which are traits first observed in the genus Homo (Tobias 1991).
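    As a rough illustration of how a dN/dS ratio follows from such counts, the sketch below applies the Jukes-Cantor correction (the correction used in the Nei and Gojobori 1986 method) to the difference and site counts quoted above for Stedman et al. (2004). It takes the site counts as given rather than recounting them from codons, so it is an approximation and not a reimplementation of their analysis; the function names are illustrative.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(n_diffs, n_sites, s_diffs, s_sites):
    """Return (dN, dS, dN/dS) from difference counts and site counts."""
    d_n = jukes_cantor(n_diffs / n_sites)  # nonsynonymous subs per N site
    d_s = jukes_cantor(s_diffs / s_sites)  # synonymous subs per S site
    return d_n, d_s, d_n / d_s


# Counts quoted in the text: 2 differences at 840 N sites, 2 at 225 S sites.
d_n, d_s, ratio = dn_ds(2, 840, 2, 225)
print(f"dN = {d_n:.5f}, dS = {d_s:.5f}, dN/dS = {ratio:.3f}")
```

    With these inputs the ratio comes out close to the reported 0.270; a small difference is expected because published site counts are typically fractional and are rounded here.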

    Because this age estimate was based on an analysis of only one amino acid fixation in the human lineage in a total of 1,065 bp at the MYH16 gene, we obtained data from the chimpanzee and rhesus macaque (Macaca mulatta) genome sequencing projects to construct a much larger MYH16 data set. Nucleotide sequence data from genome project trace files have been used successfully in previous comparative genomic studies (Bejerano et al. 2004; Das, Miller, and Stern 2004), and, here, we use this resource to examine the molecular evolution of coding and noncoding regions of the MYH16 gene and consequently reevaluate the age estimate of the human lineage exon 18 deletion.

    Materials and Methods

    For our current study, we obtained data from the chimpanzee genome project by Blast searching the National Center for Biotechnology Information (NCBI) trace archive (http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?) with the annotated human MYH16 nucleotide sequence (GenBank accession number BK001410). Chimpanzee nucleotide sequence was aligned to the human MYH16 gene and manually reviewed with the Lasergene SeqMan program (DNAStar). Our new chimpanzee data set contains a total of 3,642 bp of coding sequence from 26 exons downstream of the exon 18 deletion (table 1). Unlike the Stedman et al. (2004) study, we also collected 1,821 bp of coding sequence from 15 exons upstream of the deletion, as well as a total of 25,079 bp of intron sequence from 10 introns upstream and 14 introns downstream of the deletion (table 1).

    Table 1 Gene Region Comparisons for Human-Chimpanzee MYH16 Locus

    Several factors confirm that the sequence we obtained is from the chimpanzee MYH16 ortholog. The divergence of MYH16 from all other MYH genes is ancient (McGuigan, Phillips, and Postlethwait 2004), likely predating the finned-fish and tetrapod split (Desjardins et al. 2002). Yet, our estimate of divergence (d, measured as the number of differences per nucleotide for introns and exon synonymous sites) between our constructed chimpanzee MYH16 sequence and the human MYH16 sequence is only approximately 1%. In addition, the seven chimpanzee exon sequences (1,153 bp) reported by Stedman et al. (2004) exactly match our chimpanzee trace file sequences for these same exons. Because of the methods by which the genome sequence is generated, there are likely to be sequencing errors in trace files. Therefore, to ensure that such sequencing errors were not tabulated, we included only exon and intron data for which we could obtain at least two independently generated sequence trace files. We found no ambiguities between overlapping trace files in exons and ambiguities in less than 0.04% of our intron sequence, none of which affected the tabulated number of fixations in our data. Our final data set includes a total of 30,542 bp of intron and exon sequence, which is more than 25 times larger than that used in the Stedman et al. (2004) study.

    We inferred the ancestral state of downstream nonsynonymous human-chimpanzee fixations by obtaining macaque sequence data from the NCBI trace archive for orthologous MYH16 exons. In support of the authenticity of the macaque trace file sequence, there is only one synonymous difference between it and the seven exons included in the Stedman et al. (2004) analysis (sequences from M. mulatta, M. nemestrina, and M. fascicularis).

    We used a comparative analysis of the separate upstream and downstream regions to test several hypotheses. First, Stedman et al. (2004) collected only downstream data, so we estimated the deletion age from a larger sample of sequence from this same region to determine whether the original age estimate may have been a result of so few sampled nucleotide sites. Second, although the exon 18 deletion results in a downstream stop codon, the gene is still apparently transcribed. Therefore, it is possible that, subsequent to the deletion event, the upstream and downstream regions experienced different historical periods of functional constraint (i.e., the upstream region still being translated), which can be assessed by comparing nucleotide fixation estimates between the two regions as well as between human and chimpanzee MYH16 sequences.

    Results

    Compared with the estimate of Stedman et al. (2004) from 1,065 bp of coding sequence downstream from the deletion, we find a much greater proportion of fixed nonsynonymous differences in the downstream region; dN/dS = 0.637 using the method of Nei and Gojobori (1986) as implemented in the software package MEGA version 2.1 of Kumar et al. (2001) (table 1). This method produces similar results to likelihood methods (e.g., Goldman and Yang 1994) for single sequences from two closely related species and has fewer parameters. Of the 16 downstream nonsynonymous human-chimpanzee differences, we infer from chimpanzee-macaque homology that all but one occurred along the human lineage. We then employed the same evolutionary model for nonsynonymous fixations and the same dating method as in Stedman et al. (2004) to make our deletion age estimates comparable. Based on a 6-Myr divergence of the human-chimpanzee lineages (Haile-Selassie 2001; Brunet et al. 2002) and 15 nonsynonymous human lineage substitutions, we estimate the age of the exon 18 deletion at 5.3 ± 1.0 MYA. Similar to Stedman et al. (2004), our confidence interval incorporates standard errors involving a 5 to 7 MYA range for human-chimpanzee lineage divergence as well as the genome-wide estimate of human-chimpanzee silent site nucleotide divergence (Yi, Ellsworth, and Li 2002). This age estimate is not only outside the confidence interval of the 2.4 ± 0.3 MYA estimate obtained by Stedman et al. (2004) and significantly older than the first appearance of Homo in the fossil record but also consistent with an origin around the time that human and chimpanzee lineages diverged.
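    The arithmetic behind such an age estimate can be sketched as follows. This is a simplified reading, not the exact calculation of Chou et al. (2002), which is not reproduced in this text: it assumes that no nonsynonymous substitutions were fixed before the deletion and that afterwards they accumulated at a neutral rate. The rate of 1.0e-9 substitutions per site per year is an assumed round value, and the downstream N site count is derived from the 3,642 bp of coding sequence and the N/S site ratio of 3.59 given in the Discussion.

```python
def deletion_age_mya(human_n_subs, n_sites, neutral_rate=1.0e-9):
    """Age (in Myr) at which neutral accumulation of nonsynonymous
    substitutions would have to start to reach the observed count.

    neutral_rate is an ASSUMED value (substitutions per site per year),
    not a figure taken from the paper.
    """
    subs_per_site = human_n_subs / n_sites
    return subs_per_site / neutral_rate / 1.0e6


# Downstream N sites derived from 3,642 bp and an N/S site ratio of 3.59.
coding_bp = 3642
n_to_s = 3.59
n_sites_downstream = coding_bp * n_to_s / (1.0 + n_to_s)

age_this_study = deletion_age_mya(15, n_sites_downstream)  # 15 human-lineage subs
age_stedman = deletion_age_mya(2, 840)                     # 2 subs at 840 N sites

print(f"larger data set: {age_this_study:.1f} Myr")
print(f"Stedman et al. counts: {age_stedman:.1f} Myr")
```

    Under these assumptions the two inputs give ages close to the reported 5.3 and 2.4 MYA, which shows how strongly the estimate depends on the number of substitutions sampled. The sketch does not attempt to reproduce the confidence intervals.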

    Discussion

    There are several possible explanations for the incongruent mutation age estimates. The most apparent explanation is that our data set of downstream exon sequences is considerably larger than that of Stedman et al. (2004) and is likely a more accurate reflection of the neutral fixation rate in the human lineage. Watanabe et al. (2004), in a comparison of genes on chimpanzee chromosome 22 and human chromosome 21, discovered discrepancies in dN/dS estimates between their study and a previous analysis that did not include all exons of each gene (Shi et al. 2003). Although this study certainly does not suggest that all exons must be analyzed to estimate fixation rates, it does, along with our current analysis, reflect how such estimates may be impacted by a relatively small number of sampled nonsynonymous fixations. In fact, several of the discrepancies found between the Watanabe et al. (2004) and Shi et al. (2003) studies are comparable to the magnitude of difference between the dN/dS ratios estimated by Stedman et al. (2004) and our analysis.

    It is also possible that ancestral states of downstream human-chimpanzee fixations were incorrectly inferred based on only one outgroup sequence. To investigate the likelihood of this possibility, we added dog (Canis familiaris) trace files to our downstream MYH16 exon data set. Although the macaque is certainly a more acceptable outgroup compared with the dog based on a 23-MYA macaque-human divergence estimate versus a 92-MYA dog-human divergence (Kumar and Hedges 1998), this comparison provides a relative estimate of convergence at MYH16 in the primate lineage. We were able to obtain exon data to polarize 13 of the 16 human-chimpanzee nonsynonymous fixations (2,495 N sites). Of these 13 sites, one is similar between dog, macaque, and human, 10 are similar between dog, macaque, and chimp, and two are shared by dog and human. As an approximation, we divided each of these two ambiguous human-chimpanzee nonsynonymous fixations equally between each lineage (human lineage N = 10 + [2 x 0.5] = 11) and estimate the age of the exon 18 deletion at 4.3 ± 0.9 MYA, using the same method described above. This conservative estimate is still significantly older than that of Stedman et al. (2004) and consistent with an origin around the time of human-chimpanzee divergence, which suggests that the incongruence of the age estimates is not caused by errors in ancestral sequence reconstruction.
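    The polarization logic in this paragraph can be sketched as a small tally. The residue letters below are hypothetical placeholders, and the two ambiguous sites are assumed to have the macaque matching the chimpanzee, which the text implies but does not state; only the 10/1/2 breakdown comes from the paragraph above.

```python
def polarize(human, chimp, outgroups):
    """Return (human_share, chimp_share) for one human-chimp difference.

    A change is assigned to the lineage whose state is not found in any
    outgroup; if the outgroups disagree, it is split equally.
    """
    chimp_is_ancestral = any(state == chimp for state in outgroups)
    human_is_ancestral = any(state == human for state in outgroups)
    if chimp_is_ancestral and not human_is_ancestral:
        return 1.0, 0.0  # change occurred on the human lineage
    if human_is_ancestral and not chimp_is_ancestral:
        return 0.0, 1.0  # change occurred on the chimpanzee lineage
    return 0.5, 0.5      # ambiguous: split between lineages


# Hypothetical states as (human, chimp, macaque, dog).
sites = (
    [("B", "A", "A", "A")] * 10  # dog, macaque, and chimp agree
    + [("A", "B", "A", "A")] * 1  # dog, macaque, and human agree
    + [("A", "B", "B", "A")] * 2  # dog matches human (macaque assumed to match chimp)
)

human_total = 0.0
chimp_total = 0.0
for human, chimp, macaque, dog in sites:
    h, c = polarize(human, chimp, [macaque, dog])
    human_total += h
    chimp_total += c

print(f"human lineage: {human_total}, chimpanzee lineage: {chimp_total}")
```

    The human-lineage total reproduces the N = 10 + (2 x 0.5) = 11 used for the conservative age estimate.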

    The deletion age estimate is based on the assumption that nonsynonymous differences in the downstream region became fixed because they were simply neutral subsequent to the inactivation of the human MYH16 gene. Although we cannot directly confirm or reject the pseudogene status of human MYH16, we were interested in describing the patterns of fixation in exons in the upstream region and in introns both upstream and downstream of exon 18 to address this question. Table 1 shows estimates of dN/dS in upstream and downstream regions as well as estimates of the neutral fixation rate at MYH16 based on divergence in introns from the same regions. Because the ratio of N/S sites is similar in the upstream (3.58) and downstream (3.59) coding regions, we may expect similar dN/dS across the two regions if they have been under similar functional constraint. However, we find that dN/dS in the downstream region is more than an order of magnitude greater than that in the upstream region (Fisher's exact test; P < 0.01). If this statistically significant observation were caused by a simple difference in underlying mutation rates, we may expect that introns dispersed among the exons would exhibit different rates as well; however, divergence estimates in the upstream (d = 0.0076) and downstream (d = 0.0087) intron regions were similar (Fisher's exact test; P = 0.36).
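    Because Table 1 is not reproduced here, the regional comparison can only be approximated from figures quoted in the text. The sketch below splits each coding region into N and S sites using the reported N/S ratios, computes the uncorrected proportion of nonsynonymous differences per N site (2 upstream and 16 downstream differences, as stated elsewhere in the paper), and then backs out the downstream dS implied by the reported dN/dS of 0.637. The variable names are illustrative, and the result is a consistency check rather than a repeat of the authors' test.

```python
def split_sites(coding_bp, n_to_s_ratio):
    """Split a coding length into (N sites, S sites) given the N/S ratio."""
    s_sites = coding_bp / (1.0 + n_to_s_ratio)
    return coding_bp - s_sites, s_sites


# (coding bp, N/S site ratio, nonsynonymous human-chimp differences)
regions = {
    "upstream": (1821, 3.58, 2),
    "downstream": (3642, 3.59, 16),
}

p_n = {}
for name, (bp, ratio, n_diffs) in regions.items():
    n_sites, s_sites = split_sites(bp, ratio)
    p_n[name] = n_diffs / n_sites  # uncorrected differences per N site
    print(f"{name}: N sites = {n_sites:.1f}, S sites = {s_sites:.1f}, "
          f"pN = {p_n[name]:.4f}")

# Downstream dS implied by the reported dN/dS of 0.637, for comparison
# with the downstream intron divergence of d = 0.0087.
implied_ds_downstream = p_n["downstream"] / 0.637
print(f"implied downstream dS = {implied_ds_downstream:.4f}")
```

    The implied downstream dS lands near the downstream intron divergence, which fits the argument that the regional difference lies in nonsynonymous fixation rather than in the underlying mutation rate.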

    Although the significantly different dN/dS ratios of the two regions imply historical differences in functional constraint, it is not clear whether the estimates from the upstream and downstream regions are too low or too high, respectively, compared with that expected under neutral evolution. However, given that the chimpanzee MYH16 gene is still active, we can conservatively assume that the dN estimate in this lineage reflects the fixation rate expected under a model of functional constraint. Therefore, if we consider the single chimpanzee-lineage nonsynonymous fixation in the downstream region and assume that one of the two upstream nonsynonymous human-chimpanzee differences occurred in each lineage, the chimpanzee lineage has an estimated two MYH16 nonsynonymous fixations. This dN estimate is similar to that in the human upstream MYH16 region, yet it is significantly less than that in the human downstream region. It is possible that CpG sites, which have an estimated 10-fold greater mutation rate (Yang et al. 1996; Ebersberger et al. 2002; Subramanian and Kumar 2003), occur at a higher frequency in the downstream coding regions and explain the greater dN/dS ratio. However, even when we account for a 10-fold greater rate for CpG sites, our downstream estimate of dN is consistent with that expected under a neutral model of evolution since human-chimpanzee divergence.

    Although the human MYH16 downstream region is consistent with neutral evolution in the past 5 Myr, the human-chimpanzee upstream region comparisons reflect functional constraint. It is possible that the upstream region still remains active and functional; however, further analyses at both the gene and nucleotide levels are needed to address this question. As a final note about testing null models of evolution, one must take care when testing for the presence of natural selection. As is true of many genes, the history of MYH16 has likely included periods of both neutral evolution and natural selection, and, therefore, even in our case, we cannot rule out the possibility that several MYH16 amino acid fixations were the result of positive selection. However, although any data set can eventually be fit to many adaptive scenarios, the appropriate null hypothesis to test, the neutral model, cannot be rejected in favor of positive selection at the MYH16 gene. It is also clear that models predicting equal fixation rates and neutrality for sites under different functional constraints can strongly affect how we interpret patterns of molecular and phenotypic evolution among closely related lineages.

    Conclusion

    Our results show conflicting estimates of upstream and downstream nonsynonymous fixation rates that are not consistent with a simple model of pseudogene evolution for human MYH16. In addition, the assumption that all human lineage nonsynonymous mutations are deleterious before the inactivation of a gene may be overly conservative. In fact, other studies have found a higher nonsynonymous than synonymous fixation rate for about 10% to 20% of functional genes in large-scale human-chimpanzee comparisons (Clark et al. 2003; Watanabe et al. 2004), which suggests that recent positive selection may not be a rare phenomenon. Our analyses also demonstrate that fixation rate estimates based on a few differences among recently diverged lineages can be associated with large variances, and, therefore, mutation age estimates based on dN/dS calculations are highly tenuous. Thus, our results illustrate how important sample size and different gene region analyses can be for estimates related to a single evolutionary event. Finally, our results question previous conclusions that suggest the inactivation of MYH16 was associated with masticatory gracilization and an increased cranial capacity in the genus Homo.

    Acknowledgements

    We thank Sankar Subramanian and Sudhir Kumar for their discussion and comments on an earlier draft of this manuscript.

    References

    Anisimova, M., J. P. Bielawski, and Z. Yang. 2002. Accuracy and power of Bayes prediction of amino acid sites under positive selection. Mol. Biol. Evol. 19:950–958.

    Bejerano, G., M. Pheasant, I. Makunin, S. Stephen, W. J. Kent, J. S. Mattick, and D. Haussler. 2004. Ultraconserved elements in the human genome. Science 304:1321–1325.

    Brunet, M., F. Guy, D. Pilbeam et al. (35 co-authors). 2002. A new hominid from the Upper Miocene of Chad, Central Africa. Nature 418:145–151.

    Carroll, S. B. 2003. Genetics and the making of Homo sapiens. Nature 422:849–857.

    Chou, H. H., T. Hayakawa, S. Diaz, M. Krings, E. Indriati, M. Leakey, S. Paabo, Y. Satta, N. Takahata, and A. Varki. 2002. Inactivation of CMP-N-acetylneuraminic acid hydroxylase occurred prior to brain expansion during human evolution. Proc. Natl. Acad. Sci. USA 99:11736–11741.

    Clark, A. G., S. Glanowski, R. Nielsen et al. (14 co-authors). 2003. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302:1960–1963.

    Das, J., S. T. Miller, and D. L. Stern. 2004. Comparison of diverse protein sequences of the nuclear-encoded subunits of cytochrome C oxidase suggests conservation of structure underlies evolving functional sites. Mol. Biol. Evol. 21:1572–1582.

    Desjardins, P. R., J. M. Burkman, J. B. Shrager, L. A. Allmond, and H. H. Stedman. 2002. Evolutionary implications of three novel members of the human sarcomeric myosin heavy chain gene family. Mol. Biol. Evol. 19:375–393.

    Ebersberger, I., D. Metzler, C. Schwarz, and S. Paabo. 2002. Genomewide comparison of DNA sequences between humans and chimpanzees. Am. J. Hum. Genet. 70:1490–1497.

    Enard, W., P. Khaitovich, J. Klose et al. (10 co-authors). 2002a. Intra- and interspecific variation in primate gene expression patterns. Science 296:340–343.

    Enard, W., M. Przeworski, S. E. Fisher, C. S. Lai, V. Wiebe, T. Kitano, A. P. Monaco, and S. Paabo. 2002b. Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418:869–872.

    Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725–736.

    Haile-Selassie, Y. 2001. Late Miocene hominids from the Middle Awash, Ethiopia. Nature 412:178–181.

    Kimbel, W. H., D. C. Johanson, and Y. Rak. 1997. Systematic assessment of a maxilla of Homo from Hadar, Ethiopia. Am. J. Phys. Anthropol. 103:235–262.

    King, M. C., and A. C. Wilson. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107–116.

    Kumar, S., and S. B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917–920.

    Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244–1245.

    McGuigan, K., P. C. Phillips, and J. H. Postlethwait. 2004. Evolution of sarcomeric myosin heavy chain genes: evidence from fish. Mol. Biol. Evol. 21:1042–1056.

    Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.

    Shi, J., H. Xi, Y. Wang et al. (18 co-authors). 2003. Divergence of the genes on human chromosome 21 between human and other hominoids and variation of substitution rates among transcription units. Proc. Natl. Acad. Sci. USA 100:8331–8336.

    Sibley, C. G., and J. E. Ahlquist. 1984. The phylogeny of the hominoid primates, as indicated by DNA-DNA hybridization. J. Mol. Evol. 20:2–15.

    Stedman, H. H., B. W. Kozyak, A. Nelson, D. M. Thesier, L. T. Su, D. W. Low, C. R. Bridges, J. B. Shrager, N. Minugh-Purvis, and M. A. Mitchell. 2004. Myosin gene mutation correlates with anatomical changes in the human lineage. Nature 428:415–418.

    Subramanian, S., and S. Kumar. 2003. Neutral substitutions occur at a faster rate in exons than in noncoding DNA in primate genomes. Genome Res. 13:838–844.

    Tobias, P. V. 1991. The skulls, endocasts and teeth of Homo habilis. University Press, Cambridge.

    Watanabe, H., A. Fujiyama, M. Hattori et al. (42 co-authors). 2004. DNA sequence and comparative analysis of chimpanzee chromosome 22. Nature 429:382–388.

    Yang, A. S., M. L. Gonzalgo, J. M. Zingg, R. P. Millar, J. D. Buckley, and P. A. Jones. 1996. The rate of CpG mutation in Alu repetitive elements within the p53 tumor suppressor gene in the primate germline. J. Mol. Biol. 258:240–250.

    Yi, S., D. L. Ellsworth, and W. H. Li. 2002. Slow molecular clocks in Old World monkeys, apes, and humans. Mol. Biol. Evol. 19:2191–2198.

    Zhang, J., D. M. Webb, and O. Podlaha. 2002. Accelerated protein evolution and origins of human-specific features: FOXP2 as an example. Genetics 162:1825–1835.

    (George H. Perry*, Brian C)