Source: 《分子生物学进展》, 2005, Issue 4
Evolutionary Diversification of DNA Methyltransferases in Eukaryotic Genomes
     Department of Ecology & Evolution, University of Chicago

    Correspondence: E-mail: ponger@uchicago.edu.

    Abstract

    In eukaryotes, C5-cytosine methylation is a common mechanism associated with a variety of functions such as gene regulation or control of genomic stability. Different subfamilies of eukaryotic methyltransferases (MTases) have been identified, mainly in metazoa, plants, and fungi. In this paper, we used hidden Markov models to detect MTases in completed or almost completed eukaryotic genomes, including different species of Protozoa. A phylogenetic analysis of MTases enabled us to define six subfamilies of MTases, including two new subfamilies. The dnmt1 subfamily that includes all the known MTases with a maintenance activity seems to be absent in the Protozoa. The dnmt2 subfamily seems to be the most widespread, being present even in the nonmethylated Dictyostelium discoideum. We also found two dnmt2 members in the bacterial genus Geobacter, suggesting that horizontal transfers of MTases occurred between eukaryotes and prokaryotes. Even if the direction of transfer cannot be determined, this relationship might be useful for understanding the function of this enigmatic subfamily of MTases. Globally, our analysis reveals a great diversity of MTases in eukaryotes, suggesting the existence of different methylation systems. Our results also suggest acquisitions and losses of different MTases in every eukaryotic lineage studied and that some eukaryotes appear to be devoid of methylation.

    Key Words: DNA methylation; DNA methyltransferases; eukaryotes; repeat-induced point mutation

    Introduction

    DNA methylation is a common epigenetic modification found throughout the three domains of life, Eubacteria, Archaebacteria, and Eukaryota. This natural process is catalyzed by specific enzymes, the DNA methyltransferases (MTases), which introduce a methyl group at the N6 position of adenine or the N4 or C5 position of cytosine. Among all these modifications, the methylation of C5-cytosine (referred to here simply as methylation) is the most prevalent, notably in eukaryotic genomes. So far, methylation has been found in invertebrates, fungi, protozoa, and all of the higher plants and vertebrates studied. However, methylation is not a ubiquitous feature of eukaryotic genomes because some organisms, including Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Caenorhabditis elegans, lack detectable methylation (for review, see Colot and Rossignol 1999).

    Different C5-cytosine MTases have been characterized in prokaryotes and eukaryotes. All MTases share a catalytic domain containing 10 conserved small motifs, suggesting a common origin (Posfai et al. 1989; Kumar et al. 1994). Based on sequence similarity, the eukaryotic MTases have been grouped into six subfamilies, namely DNMT1/MET1, DNMT2/PMT1, DNMT3/DRM, the chromomethylases (CMT), MASC1/RID, and MASC2/DIM2 (for review, see Chen and Li 2004). Two major types of activities can be distinguished among these MTases, namely "maintenance" and "de novo" methylation (Riggs 1975). Maintenance methylation occurs after DNA replication on hemimethylated symmetric motifs (CpG and CpNpG), whereas methylation that occurs at previously unmethylated cytosines (in symmetrical or asymmetrical sites) is known as de novo methylation. In eukaryotes, the catalytic domain is often associated with highly variable N-terminal extensions containing various evolutionarily conserved domains, which have been shown to be involved in functional specializations (Chen and Li 2004). Comparative analysis of model organisms reveals that different subsets of MTases are associated with different methylation patterns and cellular functions among eukaryotes (Colot and Rossignol 1999).

    In mammals, methylation occurs predominantly at the symmetrical dinucleotide CpG and defines a tissue-specific pattern concerning 60%–90% of the CpGs throughout the genome (Bird 1986). This methylation pattern is set up in the early embryo by the de novo MTases DNMT3a/b and is mostly maintained during development by the maintenance activity of DNMT1 (Reik, Dean, and Walter 2001). Numerous studies have revealed a strong negative correlation between DNA methylation of the promoter region and gene expression (Attwood, Yung, and Richardson 2002), and these MTases are indispensable for normal development (Li, Bestor, and Jaenisch 1992; Okano et al. 1999). DNA methylation is notably implicated in X chromosome inactivation and imprinting (Lee 2003) and also in cancer development or aging (Yuasa 2002). Methylation could also be used to inhibit transposition of transposable elements (Yoder, Walsh, and Bestor 1997). Indeed, mammalian transposable elements are heavily methylated, and the DNMT1 disruption in the mouse embryo shows that the intracisternal A-particle retroelements are inactivated by methylation (Walsh, Chaillet, and Bestor 1998).

    In higher plants, methylation is commonly found not only in the symmetrical motifs CpG and CpNpG but also in some asymmetrical context (CpN) and is needed for normal development (see Chen and Li 2004). Genetic experiments showed that the domain rearranged MTases (DRM1–3) are the principal de novo MTases in Arabidopsis thaliana (Cao et al. 2003). The maintenance of CpG methylation is carried out by MET1 (Bartee and Bender 2001), whereas the maintenance of CpNpG and CpN seems to be due to DRM3 and the CMT (Cao et al. 2003). Plant methylation contributes to gene and transposon silencing (Miura et al. 2001) and may involve RNA interference (Rabinowicz et al. 2003).

    Methylation is also well documented in the filamentous fungi Neurospora crassa and Ascobolus immersus and is associated with two mechanisms named repeat-induced point mutation (RIP) and methylation induced premeiotically (MIP), respectively (Bender 1998). In these species, methylation resides in the asymmetrical sites (CpN) located in the repeated sequences and was shown to protect their genomes against the deleterious effect of repeated sequences (Selker 2002). The repeated sequences are identified and methylated de novo during the sexual phase, and the methylation is maintained through the vegetative development. Methylation of the repeated elements inhibits the elongation of their transcription, reduces recombination rate (Maloisel and Rossignol 1998), and, in N. crassa (RIP), decreases the similarity between repeat copies (Selker et al. 2003). Two putative MTases were found in N. crassa. RIP defective (RID) shows no MTase activity in vitro, and its disruption eliminates RIP but has no effect on the methylation pattern (Freitag et al. 2002). On the other hand, disruption of DIM2 eliminates methylation in vegetative tissues, indicating that this enzyme is responsible for the maintenance of the methylation (Kouzminova and Selker 2001) and could also have a de novo activity. In A. immersus, MASC1, homologous to RID, is necessary for de novo methylation but not for maintenance methylation (Malagnac et al. 1997, 1999). Disruption of the putative MTase MASC2, homologous to DIM2, did not alter MIP or the methylation pattern. These results suggest that a third MTase with a maintenance activity remains to be found in this species.

    In eukaryotic genomes, there is another well-conserved class of MTases, named DNMT2. Indeed, DNMT2 MTases were found in vertebrate species (Yoder and Bestor 1998), higher plants, Drosophila melanogaster (Hung et al. 1999), Entamoeba histolytica (Fisher, Siman-Tov, and Ankri 2004), and even in the nonmethylated S. pombe (Wilkinson et al. 1995). Except in S. pombe, all these enzymes exhibit a weak de novo activity (Hermann, Schmitt, and Jeltsch 2003; Kunert et al. 2003; Fisher, Siman-Tov, and Ankri 2004; Mund et al. 2004). The homologue in S. pombe (PMT1) does not exhibit any activity because of a mutation inside the catalytic site (Pinarbasi, Elliott, and Hornby 1996). Like the prokaryotic MTases, all the DNMT2 homologues lack the N-terminal domains. The disruption of DNMT2 in the mouse or the fly does not produce any particular phenotype (Okano, Xie, and Li 1998; Kunert et al. 2003).

    Based on these observations, Colot and Rossignol (1999) proposed that methylation has divergent functions in different organisms, consistent with the notion that it is a dynamically evolving mechanism that can be adapted to perform various functions. In order to analyze the variability of methylation systems, we present a survey of MTases in complete or almost complete eukaryotic genomes, including several species of Protozoa. We also reconstructed a phylogeny of the putative MTases identified to study the evolutionary history of this function and to classify eukaryotic MTases.

    Materials and Methods

    Identification of MTases

    The putative MTases were obtained from the complete or almost complete eukaryotic proteomes listed in table 1 and GenBank (release 142). All sequences with at least 99% similarity over at least 100 amino acids (aa) were considered redundant or alternatively spliced variants, and only one copy was retained.
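
    The redundancy rule above can be sketched as a greedy filter. This is only an illustration: the text does not say how similarity was measured, so the comparison function is passed in as a parameter, and the toy `ungapped_identity` helper (position-wise identity over the shared prefix) is a hypothetical stand-in for a real pairwise alignment.

```python
from typing import Callable, List, Tuple

# (fraction of similar positions, number of compared positions)
Comparison = Tuple[float, int]


def drop_redundant(
    seqs: List[str],
    compare: Callable[[str, str], Comparison],
    min_similarity: float = 0.99,   # "at least 99% similarity"
    min_length: int = 100,          # "over at least 100 amino acids"
) -> List[str]:
    """Keep the first copy of each group of near-identical sequences."""
    kept: List[str] = []
    for seq in seqs:
        redundant = False
        for other in kept:
            similarity, length = compare(seq, other)
            if similarity >= min_similarity and length >= min_length:
                redundant = True
                break
        if not redundant:
            kept.append(seq)
    return kept


def ungapped_identity(a: str, b: str) -> Comparison:
    """Hypothetical toy measure: identity over the shared prefix, no gaps."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0, 0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / n, n
```

    The filter keeps whichever copy appears first in the input, so the outcome depends on input order; the text does not state which copy the authors retained.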

    Table 1 Eukaryotic Proteomes Analyzed in the Present Study

    To identify the putative MTases, we used the hidden Markov model (HMM) corresponding to the C5-cytosine–specific DNA MTase domain described in Pfam (named DNA_methylase, release 14). The domains were searched by using the program Hmmpfam (package Hmmer; S. Eddy, personal communication), and all the proteins with a score above –125 were regarded as members of the MTase family. This threshold was used instead of the default cutoff defined in Pfam (–84) in order to increase the sensitivity. For example, DNA methylation was described in Plasmodium falciparum (Pollack, Kogan, and Golenser 1991), but no MTase was found with the default cutoff. The new threshold enabled us to find a putative MTase with a score of –112, and the corresponding protein can be aligned with the other putative and known MTases. Unlike the proteins with a score below –125, all the eukaryotic proteins identified with a score between –84 and –125 can be correctly aligned with the other putative and known MTases.
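
    The two score cutoffs can be illustrated with a small classifier. The cutoff values are those quoted above; the function name, the category labels, and the treatment of a score exactly equal to –84 are assumptions made for this sketch, not part of the original pipeline.

```python
PFAM_DEFAULT_CUTOFF = -84.0   # default Pfam cutoff quoted in the text
RELAXED_CUTOFF = -125.0       # less stringent cutoff used in this study


def classify_hit(score: float) -> str:
    """Label an HMM score relative to the two cutoffs (labels are hypothetical)."""
    if score >= PFAM_DEFAULT_CUTOFF:   # assumption: the default cutoff is inclusive
        return "default"               # would be found with the Pfam default
    if score > RELAXED_CUTOFF:         # "a score above -125"
        return "relaxed"               # found only with the relaxed cutoff
    return "rejected"
```

    Under these assumptions, the P. falciparum protein (score –112) falls in the "relaxed" category, whereas a score of –169 is rejected.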

    We also constructed a new HMM to detect the DRM MTases found in higher plants (Cao and Jacobsen 2002). These MTases are homologous to the vertebrate DNMT3 and are characterized by a rearrangement of the MTase domain. The conserved motifs of the MTase domain are numbered from I to X from the N-terminus to the C-terminus. In the DRM MTases, a rearrangement occurred between domains IV and V and switched the N-terminus (I–V) with the C-terminus (VI–X) motifs (Cao and Jacobsen 2002). One consequence of this rearrangement is that the known DRM are not identified by the MTase HMM defined in Pfam. We constructed a specific HMM for the DRM by using the catalytic domains of the DRM described in GenBank. The domains were aligned, and the HMM was constructed by using the program Hmmbuild (S. Eddy, personal communication). The cutoff value for this model was set to 0 because it was not possible to align the proteins with a score below this value.

    We used an iterative method to detect putative MTases. The MTase domains of the proteins annotated in GenBank were used as query sequences in a first Blast (Altschul et al. 1997) search against the proteomes. All sequences with an E-value lower than 1 were searched for the MTase (DNA_methylase or DRM) domains. Then, the detected domains were used as query sequences for a second Blast search against the proteomes, and the candidate sequences were analyzed with the HMM. These two steps were repeated until no new putative MTases were identified.
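
    The iterative procedure amounts to searching until a fixed point is reached. The sketch below shows that control flow only. Both callables are hypothetical placeholders (one standing in for the Blast search with an E-value below 1, the other for the HMM check), and the loop uses a work queue rather than explicit rounds, which reaches the same final set.

```python
from typing import Callable, Iterable, List, Set


def iterative_search(
    seeds: Iterable[str],
    similarity_search: Callable[[str], Set[str]],  # placeholder for the Blast step
    has_mtase_domain: Callable[[str], bool],       # placeholder for the HMM step
) -> Set[str]:
    """Re-query with every accepted hit until no new putative MTase appears."""
    accepted: Set[str] = set()
    queried: Set[str] = set()
    queue: List[str] = list(seeds)
    while queue:
        query = queue.pop()
        if query in queried:
            continue
        queried.add(query)
        for hit in similarity_search(query):
            if hit not in accepted and has_mtase_domain(hit):
                accepted.add(hit)
                queue.append(hit)   # accepted hits become new queries
    return accepted


# Toy usage: identifiers stand in for sequences, a dict stands in for Blast.
toy_hits = {"seed": {"a", "x"}, "a": {"b"}, "b": {"a"}, "x": set()}
found = iterative_search(
    ["seed"],
    lambda q: toy_hits.get(q, set()),
    lambda s: s != "x",   # pretend "x" fails the HMM check
)
```

    In the toy run, "a" is found from the seed, "b" is found only in the second pass through "a", and "x" is discarded, so `found` contains "a" and "b".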

    Phylogenetic Trees

    The protein sequences of the 299 putative MTase domains were extracted and aligned by using ClustalW (Higgins 1994). For the DRM MTases, the C-terminus (motifs VI–X) and the N-terminus (motifs I–IV) of the domain were switched to fit the ancestral (or common) organization (motifs I–X). The phylogenetic trees were constructed by using the JTT model (Jones, Taylor, and Thornton 1992) with the neighbor-joining (NJ) method implemented in MEGA (Kumar, Tamura, and Nei 2004) and the maximum likelihood (ML) method implemented in PhyML (Guindon and Gascuel 2003). These methods were chosen because of the ancient divergence of the MTases, probably prior to the divergence between prokaryotes and eukaryotes. The consensus trees were derived from 250 bootstrap replicates for the ML (by using PHYLIP, J. Felsenstein, personal communication) and NJ methods. The topologies of the trees are congruent, and both support the results presented in this paper. These trees are too large to be presented in this paper and are available as Supplementary Material (figs. 1 and 2).
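
    The block switch applied to the DRM domains before alignment can be pictured as a simple string operation. In this sketch the position of the boundary between the two blocks is an input that is assumed to be known (for example, from the HMM alignment); the function name and the toy sequence are hypothetical.

```python
def restore_motif_order(drm_domain: str, boundary: int) -> str:
    """Swap the two blocks of a rearranged domain.

    `drm_domain[:boundary]` is assumed to hold the block that comes first in
    the DRM protein (motifs VI-X) and `drm_domain[boundary:]` the block that
    comes second (motifs I-IV). The result has the common order I-X.
    """
    if not 0 <= boundary <= len(drm_domain):
        raise ValueError("boundary must lie within the domain")
    return drm_domain[boundary:] + drm_domain[:boundary]


# Toy example: "VWXYZ" stands for the VI-X block, "ABCD" for the I-IV block.
reordered = restore_motif_order("VWXYZABCD", 5)
```

    Applying the same swap again at the complementary boundary (here, 4) returns the original string, which is a convenient sanity check.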

    The tree presented in figure 1 is a consensus tree including all the eukaryotic and 21 representative prokaryotic MTases. It was reconstructed by using the ML method and 250 bootstrap replicates. It is not known where the root of the tree is located, but there is no reason to believe that the MTase domain emerged in eukaryotes. Therefore, the trees were rooted by using a randomly chosen prokaryotic group of MTases.

    FIG. 1.— Phylogeny of C5-cytosine MTases. The sequences are represented by the species name and the score obtained with the HMMs. The domains were searched using SMART: white rectangle, DNA_methylase; black rectangle, CHROMO; gray rectangle, STYKc; white triangle, HTH_XRE; gray triangle, zinc finger domains; black triangle, ubiquitin associated domain (UBA); white pentagon, PWWP; gray pentagon, PHD; black pentagon, BAH; white circle, DEXDc; gray circle, RING finger; black circle, helicase_C. The MTase activities were extracted from the literature (Chen and Li 2004).

    Results

    We searched the MTases in the complete or almost complete eukaryotic proteomes listed in table 1. The MTases were identified by using the HMM described in Pfam (Bateman et al. 2004) and a modified HMM for the DRM (see Materials and Methods). These methods enabled us to detect the proteins with MTase domain, including the domains without any methylation activity. Thus, all the new MTases found should be considered as putative MTases.

    Distribution of MTases in Eukaryotes

    We identified 74 eukaryotic putative MTases among the genomes listed in table 1. Some of these proteins were already known, notably in vertebrates, insects, some plants, and some fungi. However, 34 of them have not been published before. Specifically, we describe new putative MTases in the Protozoa Dictyostelium discoideum (1), Leishmania major (1), P. falciparum (1) and Plasmodium yoelii (1), Trypanosoma brucei (1), Cyanidioschyzon merolae (1), and Thalassiosira pseudonana (4). We also found putative MTases in the fungi Cryptococcus neoformans (1), Magnaporthe grisea (3), Aspergillus nidulans (2), and Aspergillus fumigatus (2). In the chordates, we found four new putative MTases in Ciona intestinalis. Among plants, we detected four new MTases in addition to DMT1 in Chlamydomonas reinhardtii (Nishiyama et al. 2002) and eight new MTases in addition to MET1 and MET2 in Oryza sativa (Teerawanichpan et al. 2004). In contrast, no MTase was identified in the genomes of Giardia lamblia, Guillardia theta, Encephalitozoon cuniculi, and the hemiascomycetes (S. cerevisiae, Candida glabrata, Kluyveromyces lactis, Kluyveromyces waltii, Ashbya gossypii, Debaryomyces hansenii, and Candida albicans) (table 2).
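
    As a consistency check, the per-species counts given in parentheses above can be tallied against the stated total of 34 previously unpublished putative MTases. The counts are copied from the paragraph; the variable names are arbitrary.

```python
# New putative MTases per species, as listed in the paragraph above.
new_mtases = {
    "Dictyostelium discoideum": 1,
    "Leishmania major": 1,
    "Plasmodium falciparum": 1,
    "Plasmodium yoelii": 1,
    "Trypanosoma brucei": 1,
    "Cyanidioschyzon merolae": 1,
    "Thalassiosira pseudonana": 4,
    "Cryptococcus neoformans": 1,
    "Magnaporthe grisea": 3,
    "Aspergillus nidulans": 2,
    "Aspergillus fumigatus": 2,
    "Ciona intestinalis": 4,
    "Chlamydomonas reinhardtii": 4,
    "Oryza sativa": 8,
}

total_new = sum(new_mtases.values())
print(total_new)  # 34, matching the total stated in the text
```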

    Table 2 Diversity of Eukaryotic C5-Cytosine MTases

    For two species, we observed some discrepancy between the methylation status of the genome and the presence or absence of MTases (table 2). Indeed, a putative MTase was found in the proteome of D. discoideum, which does not show any detectable methylation (Smith and Ratner 1991). A similar case was already observed in S. pombe (Wilkinson et al. 1995). On the other hand, DNA methylation was described in C. albicans (Russell et al. 1987), but we were not able to identify any putative MTase in its proteome. The best score is –169 (see Materials and Methods), but the corresponding protein cannot be aligned with the other MTases. The genome of C. albicans is not yet published, so the absence of MTases in this species is not certain. Nevertheless, our failure to find any MTase in C. albicans is supported by the absence of MTases in the published proteomes of the closely related hemiascomycetes.

    We used a modified HMM to detect the DRM MTases that exhibit a rearrangement of the MTase domain and are not detectable by using the MTase domain defined in Pfam (Cao and Jacobsen 2002). However, no DRM MTases were detected in eukaryotes other than higher plant proteomes (A. thaliana and O. sativa), and surprisingly, no DRM was detected in C. reinhardtii, an organism belonging to the Viridiplantae.

    Phylogeny of Eukaryotic MTases

    A phylogeny of the MTase domains was constructed to define (1) the eukaryotic subfamilies, (2) the relationships among these subfamilies, and (3) their relationships with the prokaryotic MTases. Two full trees, each including all the eukaryotic MTases listed in table 1, two MTases from A. immersus described in GenBank, and all the prokaryotic MTases, were reconstructed by using the NJ and ML methods (figs. 1 and 2 in Supplementary Material). These trees are mostly congruent with the ML tree presented in figure 1 (see below).

    Except for two orphans in T. pseudonana and M. grisea, all the eukaryotic putative MTases can be grouped into six subfamilies. These subfamilies are named in figure 1. A classification based on the functional properties of the MTases was published by Chen and Li (2004), who defined six groups of MTases. However, we suggest that the maintenance MTases DNMT1, CMT, and MASC2/DIM2 belong to a single subfamily (named dnmt1), as previously observed by Cao et al. (2003), instead of three subfamilies, and we characterized two new subfamilies (dnmt5 and dnmt6).

    The dnmt1 subfamily includes different groups of MTases: DNMT1 from chordates, MET1 and CMT from plants, DIM2 from N. crassa and M. grisea, MASC2 from A. immersus, and an MTase from RaHV-1, a herpesvirus infecting the frog Rana pipiens (Davison et al. 1999). These MTases are grouped in most phylogenies with a bootstrap value varying between 40% and 60% (fig. 1 and fig. 2 in Supplementary Material). In some rare cases, this subfamily seems polyphyletic (fig. 1 in Supplementary Material). Except for the chordate DNMT1 and the higher plant MET1, which seem to be orthologous, the relationships among MTases are not well defined. The domain structures of MET1 and DNMT1 are similar, with two bromo-adjacent homology (BAH) domains in the N-terminal part. The other MTases have one or no BAH domain. The proteins exhibit many different MTase activities, but the subfamily includes all the MTases with a maintenance activity (fig. 1). No member of the dnmt1 subfamily was found in the analyzed Protozoa.

    The dnmt2 subfamily is represented in some fungi, some plants, some metazoa, and also in some protozoa. Despite the large taxonomic range of this subfamily, no duplication was observed. Indeed, all chordates studied, D. melanogaster, Anopheles gambiae, all plants studied, S. pombe (PMT1), E. histolytica, P. falciparum and P. yoelii, T. pseudonana, and D. discoideum each contain only one member of the dnmt2 subfamily. A dnmt2 homologue was also detected in the Eubacteria Geobacter sulfurreducens and Geobacter metallireducens. Surprisingly, the MTase found in the nonmethylated genome of D. discoideum belongs to the dnmt2 subfamily, as does PMT1, the MTase found in the nonmethylated genome of S. pombe (Smith and Ratner 1991). The dnmt2 subfamily is composed of MTases with a weak or no activity (PMT1) (fig. 1). Except in P. yoelii, dnmt2 proteins are composed of only an MTase domain (fig. 1). In P. yoelii, 351 aa without any known domain were found at the N-terminal part of the protein (fig. 1).

    The dnmt3 subfamily includes the MTase of C. merolae, one MTase of T. pseudonana, the DRM of plants, and some chordate MTases (DNMT3). The DRM observed in higher plants was not observed in any other species, suggesting that the rearrangement is specific to plants. Moreover, no DNMT3 or DRM homologue was detected in the genome of the plant C. reinhardtii, indicating that this species probably lost this gene. All MTases of the dnmt3 subfamily with a known function exhibit de novo methylation activity (DRM and DNMT3). Most, but not all, of the proteins of this subfamily share a Pro-Trp-Trp-Pro (PWWP) and a plant homeodomain (PHD) domain. The MTase of C. merolae has a PWWP domain, whereas in T. pseudonana, the protein only includes a short MTase domain, suggesting that this protein is partial.

    The dnmt4 subfamily is composed of the MTases of the ascomycetes N. crassa (RID), A. immersus (MASC1), and M. grisea and the close relatives A. fumigatus and A. nidulans. No copy of this subfamily was observed in C. neoformans (fungi, Basidiomycetes) or any other eukaryotes. In MASC1, the N-terminal part includes a BAH domain.

    We also found two completely new subfamilies. The dnmt5 subfamily includes the MTases of the fungi A. nidulans, A. fumigatus (Ascomycetes), and C. neoformans (Basidiomycetes). Surprisingly, these proteins share a long C-terminal part containing a DEXDc domain and a helicase_C domain. The dnmt6 subfamily includes the MTases of T. brucei, L. major (Euglenozoa), and T. pseudonana (Stramenopiles). The MTase domain found in T. pseudonana is very short compared to those of the other proteins.

    Discussion

    In this paper, we presented an in silico analysis of putative MTases in the completed or almost completed eukaryotic genomes. We included different species of Protozoa in order to reveal the diversity of MTases in eukaryotes.

    Methodological Issues

    We discuss several issues concerning the data set and the methods used in this analysis. First, several genomes analyzed are incomplete to date (table 1) (Colot and Rossignol 1999). The potential problems due to this shortcoming include (1) nondetection of some MTases, (2) inclusion of false MTases, or (3) some annotation errors on the gene structure. To increase the sensitivity, we used a less stringent threshold with the HMM method. This new threshold enabled us to detect an MTase with a low score in the methylated genome of P. falciparum (see Materials and Methods). Moreover, except for C. albicans, all the genomes without any MTases were already published, suggesting that the absence of MTases in these species is real. In C. albicans, the absence of MTases contradicts the experimental report of DNA methylation (Russell et al. 1987) but is supported by the analysis of the published genomes of all the hemiascomycetes included in this study (table 2). However, the case of dnmt2 in Takifugu rubripes (table 2) shows that we cannot exclude the possibility of nondetection.

    Second, all the putative MTases detected exhibit a sequence similarity high enough to be aligned together and to reconstruct their phylogeny, suggesting that all these proteins do belong to the MTase family. Finally, the same subsets of MTases are found in the closely related species P. falciparum and P. yoelii, L. major and T. brucei, and A. fumigatus and A. nidulans (table 2), and most of the MTases belong to families found in different species (fig. 1). The orphan MTases found in T. pseudonana and M. grisea are more suspicious and have to be confirmed by additional analysis or data. This is particularly true for M. grisea because the orphan MTase was not found in the published genome of the close relative N. crassa.

    Gene structure seems to be the most important problem because the first releases of gene annotation often include partial or fused genes. This kind of error might explain the structure observed for the really short MTases in T. pseudonana (dnmt6 and dnmt3) or the MTases with a long C-terminal part in O. sativa (dnmt1) and the fungi (dnmt5). An analysis of cDNA should confirm or modify the particular structure of these proteins. The domain structure of the protein will not be discussed further in this paper.

    Diversity of the MTase Subsets in Each Species

    One major result of our analysis is the great variability of the MTases detected in different species (table 2). This variability is characterized by the use of different subsets of the six MTase subfamilies and by some species-specific duplications inside each subfamily. Indeed, the known MTase subsets used by the vertebrates, by the higher plants, or by the fungi N. crassa and A. immersus were not found in the other groups of eukaryotes (table 2). Among the plants, the subset used by C. reinhardtii is different from the subset of the higher plants because it does not include dnmt3, which is known to be a de novo MTase and responsible for the methylation pattern on the CpG motifs. On the other hand, the subset used by C. merolae is only composed of a dnmt3 MTase. In T. pseudonana, there are one dnmt2, one dnmt3, one dnmt6, and one orphan MTase. T. brucei and L. major possess only one dnmt6 MTase. Among the fungi, Aspergillus species exhibit an original subset composed of dnmt4 and dnmt5. Because the known subsets in higher plants, vertebrates, and filamentous fungi are associated with different functions, the diversity of the subsets may reflect a great diversity of functions in these species and a fast-evolving system of methylation during eukaryotic speciation.

    The absence of methylation and MTases in many unrelated species (fungi, metazoa, protozoa) suggests that life is possible without methylation. However, some MTases were found in the reduced proteomes of the parasitic eukaryotes E. histolytica, Plasmodium, T. brucei, or L. major and the herpes virus RaHV-1. After the acquisition of a parasitic life cycle in each of these organisms, several genes have been lost, so the conservation of the MTases suggests that the methylation system is important for these species.

    Origin and Function of the dnmt2 Subfamily

    In fact, dnmt2 is the most widespread MTase among eukaryotes. The cellular function of this subfamily is not known, but its distribution suggests an ancestral origin and an essential function in eukaryotes. The members of this subfamily exhibit no or a weak MTase activity. The overexpression of dnmt2 in D. melanogaster was shown to methylate some Cp(A/T) motifs in a specific manner in randomly chosen DNA segments (Kunert et al. 2003) and could be responsible for the low level of methylation found in the early stage of development (Lyko, Ramsahoye, and Jaenisch 2000). The expression of the murine dnmt2 in D. melanogaster also exhibits a weak activity and an affinity for Cp(A/T) motifs (Mund et al. 2004). However, in vitro assays with the human dnmt2 show a weak activity on the CpG motifs (Hermann, Schmitt, and Jeltsch 2003). In vitro, E. histolytica dnmt2 exhibits a weak MTase activity with a small preference for CpA or CpT motifs in the rDNA regions (Fisher, Siman-Tov, and Ankri 2004). In yeast, dnmt2 is able to recognize and methylate the CC(A/T)GG motifs when the serine disrupting the catalytic site is removed (Pinarbasi, Elliott, and Hornby 1996). The biological function of this subfamily is not known because its disruption does not cause any observable phenotype in mouse, yeast, or fly (Pinarbasi, Elliott, and Hornby 1996; Okano, Xie, and Li 1998; Kunert et al. 2003).

    Surprisingly, the dnmt2 subfamily is also present in two nonmethylated species, S. pombe and D. discoideum (Smith and Ratner 1991). Why these enzymes are conserved in nonmethylated species is an open question because these enzymes are only composed of one MTase domain. In S. pombe, PMT1 is nonfunctional because of a mutation in the active site of the protein (Pinarbasi, Elliott, and Hornby 1996). This MTase was extracted from the laboratory Strain 972h– (Wilkinson et al. 1995; Wood et al. 2002), and we cannot exclude the possibility that the wild strain possesses some methylation and a functional MTase. In D. discoideum, some experimental assays are needed to check if the MTase is functional or not because we cannot exclude the possibility that a low level of methylation occurs in some specific development stage of this species.

    Interestingly, this subfamily also includes the MTases from the delta-proteobacterium G. sulfurreducens and G. metallireducens. This relationship suggests a horizontal transfer between the eukaryotic ancestor and an ancestor of Geobacter species. In bacteria, methylation is associated with the restriction/modification (R/M) system, which functions as a defense against infection of bacteriophages and is the source of the overwhelming majority of DNA MTases found in prokaryotes. The system is composed of a restriction endonuclease that targets specific DNA sequences and performs endonucleolytic cleavage and a modification MTase that renders these sequences resistant to cleavage. The genome of G. sulfurreducens, which is completely sequenced (Methe et al. 2003), contains two MTases. These enzymes are listed in Rebase (Roberts et al. 2003), but there is no argument to support a role in a potential R/M system. The predicted recognition motif of the MTase of G. metallireducens described in Rebase is 5'-CC(A/T)GG-3', which is partially consistent with the Cp(A/T) motifs methylated in vivo by some eukaryotic dnmt2 MTases. The evolution of the prokaryotic R/M system is characterized by extensive horizontal transfers between genomes (Kobayashi 2001), and our result indicates that eukaryotic MTases could also be implicated in these transfers. At present, an involvement of dnmt2 in a eukaryotic R/M system inherited from Eubacteria and the transfer of eukaryotic MTases to Eubacteria seem equally possible. The analysis of the methylation system in these bacteria could be useful for understanding the function of the dnmt2 subfamily in eukaryotes.
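
    The partial overlap between the two motifs mentioned above can be made concrete with regular expressions: every CC(A/T)GG site contains a Cp(A/T) dinucleotide at its second cytosine, but not every Cp(A/T) lies in a CC(A/T)GG site. The sketch scans a single strand only, and the example sequence is invented for illustration.

```python
import re
from typing import List, Pattern

CCWGG = re.compile(r"CC[AT]GG")   # 5'-CC(A/T)GG-3'
CW = re.compile(r"C[AT]")         # Cp(A/T)


def site_starts(pattern: Pattern[str], seq: str) -> List[int]:
    """Return the 0-based start positions of all matches on the given strand."""
    return [m.start() for m in pattern.finditer(seq)]


seq = "GGCCAGGTTCATCCTGGA"          # invented example sequence
ccwgg_sites = site_starts(CCWGG, seq)
cw_sites = site_starts(CW, seq)

# Each CC(A/T)GG site starting at i contains a Cp(A/T) starting at i + 1.
nested = all(i + 1 in cw_sites for i in ccwgg_sites)
```

    Neither pattern can overlap itself, so the non-overlapping scan of `finditer` does not miss any site here.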

    MTases in Fungi

    The subset of MTases used by M. grisea is similar to the subset of the close relative N. crassa including one dnmt1 and one dnmt4 MTase. Because the genome of M. grisea is not yet published, further analyses are needed to confirm the existence of the orphan MTase in this species. However, the conservation of dnmt1 and dnmt4 MTases supports the RIP system found in M. grisea (Ikeda et al. 2002).

    Some evidence of RIP was also found in some repeated elements of A. nidulans (Nielsen, Hermansen, and Aleksenko 2001) and A. fumigatus (Clutterbuck 2004). The subset used by Aspergillus species includes two MTases. The first one belongs to the dnmt4 subfamily and is orthologous to RID, a protein indispensable for RIP but without any MTase activity in N. crassa. On the other hand, the second MTase belongs to the dnmt5 subfamily and is found in C. neoformans but not in N. crassa. The active MTase found in N. crassa, DIM2, is not found in Aspergillus species and could be replaced by the dnmt5 MTase.

    Absence of dnmt1 in Many Eukaryotes

    The dnmt1 genes are absent from all the protozoa and fungi analyzed with the exceptions of the filamentous fungi N. crassa, M. grisea, and A. immersus. This distribution could correspond to an acquisition just before the speciation of the plants and the loss of this gene in the fungi C. neoformans (Basidiomycetes) and the hemiascomycetes. So far, this subfamily includes all the MTases with a characterized maintenance activity. The maintenance activity is used as a cellular memory for the methylation system during cell division. In vertebrates and plants, DNA methylation is used to set up and maintain the differentiated states of the cells during development. In N. crassa, methylation is set up during the sexual phase on the repeated DNA and simply maintained during the asexual phase, thereby recording which parts of the genome have to be repressed (Galagan et al. 2003). This observation suggests that the eukaryotes without dnmt1 (1) use another family of maintenance MTase, (2) use another mechanism to memorize the epigenetic state of the cell, or (3) set up the methylation pattern in each cell individually. The existence of another family of maintenance MTase has already been suggested by the analysis of the methylation in A. immersus. In this species, a maintenance activity was detected but cannot be associated with the known MTases. Surprisingly, disruption of MASC2 (in the dnmt1 subfamily) does not modify the methylation pattern, and MASC1 (in the dnmt4 subfamily) exhibits only a de novo activity. These observations suggest that a third MTase could play this function in this species (Malagnac et al. 1999). The function of the RID protein is not known, but its mutation does not modify the methylation pattern in N. crassa. The dnmt4 subfamily, to which RID belongs, also includes the functional MTase of A. immersus (MASC1), which exhibits a de novo methylation activity (Goyon 1998).

    On the other hand, the epigenetic information could be initiated by de novo methylation and conserved by some mechanisms directly involving the chromatin structure or the histones. In this case, no maintenance activity would be needed to transmit the epigenetic state.

    The maintenance activity, which seems to be specific to eukaryotes, could actually be restricted to some multicellular organisms that need a memory system for gene regulation or for the control of selfish DNA.
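
    The distribution discussed in this section can be restated as a small presence/absence table. The sketch below is only an illustration: the table `MENTIONED` and the helper names `lacking` and `shared` are hypothetical, and the table records only the subfamily memberships explicitly stated in this section, not the complete MTase repertoire of each species.

```python
# Presence/absence of MTase subfamilies, restricted to what this section states.
# Assumption: a subfamily missing from a set was simply not mentioned here;
# it is not evidence of absence except where the text says so (dnmt1).
MENTIONED = {
    "N. crassa": {"dnmt1", "dnmt4"},         # DIM2 (dnmt1), RID (dnmt4)
    "A. immersus": {"dnmt1", "dnmt4"},       # MASC2 (dnmt1), MASC1 (dnmt4)
    "M. grisea": {"dnmt1"},                  # one of the three fungi with dnmt1
    "Aspergillus spp.": {"dnmt4", "dnmt5"},  # RID ortholog plus a dnmt5 MTase
    "C. neoformans": {"dnmt5"},              # dnmt1 reported as lost
}


def lacking(subfamily, table=MENTIONED):
    """Return the sorted species with no recorded member of `subfamily`."""
    return sorted(sp for sp, fams in table.items() if subfamily not in fams)


def shared(species_a, species_b, table=MENTIONED):
    """Return the sorted subfamilies recorded for both species."""
    return sorted(table[species_a] & table[species_b])


if __name__ == "__main__":
    # Lineages in the table that would need an alternative to dnmt1
    # for any maintenance activity.
    print(lacking("dnmt1"))  # ['Aspergillus spp.', 'C. neoformans']
    print(shared("N. crassa", "Aspergillus spp."))  # ['dnmt4']
```

    Under these assumptions, the two lineages returned by `lacking("dnmt1")` are the ones for which the three alternatives listed above (another maintenance MTase, another memory mechanism, or cell-by-cell establishment of the pattern) would have to be considered.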

    Early Evolution of Eukaryotic MTases

    The nature of the mechanisms used for the acquisition of new MTases and the ancestral function of methylation are open questions. None of the reconstructed phylogenies supports a particular relationship among the six eukaryotic MTase subfamilies, which points to either ancestral duplication events or functional divergence. In fact, the protein divergences between the eukaryotic subfamilies are similar to the distances observed between the bacterial subfamilies, suggesting an ancient divergence among all these subfamilies, probably prior to the emergence of the first eukaryotes (Colot and Rossignol 1999). This is consistent with the view that horizontal transfers with bacteria are one of the main mechanisms for the acquisition of new MTases in eukaryotes. There is evidence that the MTases implicated in the R/M system have undergone extensive horizontal transfer between bacterial genomes (Kobayashi 2001). The presence of a bacterial MTase in the dnmt2 subfamily suggests that horizontal transfer of MTases also took place between eukaryotic and bacterial genomes. Thus, even if we cannot reject a potential artifact of reconstruction, horizontal transfer events from bacteria to eukaryotes could be an important factor in the diversification of eukaryotic MTases.

    The distribution of MTases could help in formulating hypotheses about the ancestral MTase subset and thus about the ancestral function of methylation. However, the interpretation of our results is limited mainly by our knowledge of the eukaryotic phylogeny. The widespread distribution of dnmt2 suggests that this MTase was acquired early in eukaryotic evolution and conserved in most lineages. On the other hand, the distribution of dnmt1, found only in plants, fungi, and metazoa, suggests either a later acquisition or a massive loss in most eukaryotic groups. What was the function of methylation before the divergence of plants from fungi and metazoa? The conservation of the dnmt1, dnmt2, and dnmt3 subfamilies in plants and metazoa suggests that these MTases were already present in the common ancestor. This conservation is also observed for the methylation pattern and the control of the host genes. Moreover, the phylogeny of the dnmt1 subfamily suggests that CMT (plants), DIM2 (N. crassa), and MASC2 (A. immersus) diverged prior to the plant radiation. These proteins are involved in the control of repetitive sequences, suggesting that the ancestral function of methylation could also be the control of repetitive sequences. Finally, these results suggest that the ancestral subset of MTases and the ancestral function were already complex and that the methylation system evolved through the loss of some or all MTases and of some associated functions.

    Supplementary Material

    Trees, the hidden Markov model for DRM MTases, gene names, and sequences are available at the URL: http://home.uchicago.edu/ponger/article/.

    Acknowledgements

    We thank Shinhan Shiu for his help with the methods used in this paper. This study was supported by National Institutes of Health grants.

    References

    Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

    Attwood, J. T., R. L. Yung, and B. C. Richardson. 2002. DNA methylation and the regulation of gene transcription. Cell Mol. Life Sci. 59:241–257.

    Bartee, L., and J. Bender. 2001. Two Arabidopsis methylation-deficiency mutations confer only partial effects on a methylated endogenous gene family. Nucleic Acids Res. 29:2127–2134.

    Bateman, A., L. Coin, R. Durbin et al. (10 co-authors). 2004. The Pfam protein families database. Nucleic Acids Res. 32:D138–141.

    Bender, J. 1998. Cytosine methylation of repeated sequences in eukaryotes: the role of DNA pairing. Trends Biochem. Sci. 23:252–256.

    Bird, A. P. 1986. CpG-rich islands and the function of DNA methylation. Nature 321:209–213.

    Cao, X., W. Aufsatz, D. Zilberman, M. F. Mette, M. S. Huang, M. Matzke, and S. E. Jacobsen. 2003. Role of the DRM and CMT3 methyltransferases in RNA-directed DNA methylation. Curr. Biol. 13:2212–2217.

    Cao, X., and S. E. Jacobsen. 2002. Role of the Arabidopsis DRM methyltransferases in de novo DNA methylation and gene silencing. Curr. Biol. 12:1138–1144.

    Chen, T., and E. Li. 2004. Structure and function of eukaryotic DNA methyltransferases. Curr. Top. Dev. Biol. 60:55–89.

    Clutterbuck, A. J. 2004. MATE transposable elements in Aspergillus nidulans: evidence of repeat-induced point mutation. Fungal Genet. Biol. 41:308–316.

    Colot, V., and J. L. Rossignol. 1999. Eukaryotic DNA methylation as an evolutionary device. Bioessays 21:402–411.

    Davison, A. J., W. Sauerbier, A. Dolan, C. Addison, and R. G. McKinnell. 1999. Genomic studies of the Lucke tumor herpesvirus (RaHV-1). J. Cancer Res. Clin. Oncol. 125:232–238.

    Fisher, O., R. Siman-Tov, and S. Ankri. 2004. Characterization of cytosine methylated regions and 5-cytosine DNA methyltransferase (Ehmeth) in the protozoan parasite Entamoeba histolytica. Nucleic Acids Res. 32:287–297.

    Freitag, M., R. L. Williams, G. O. Kothe, and E. U. Selker. 2002. A cytosine methyltransferase homologue is essential for repeat-induced point mutation in Neurospora crassa. Proc. Natl. Acad. Sci. USA 99:8802–8807.

    Galagan, J. E., S. E. Calvo, K. A. Borkovich et al. (74 co-authors). 2003. The genome sequence of the filamentous fungus Neurospora crassa. Nature 422:859–868.

    Goyon, C. 1998. Isolation and identification by sequence homology of a second putative C5-DNA-methyltransferase gene from Ascobolus immersus. DNA Seq. 9:109–112.

    Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696–704.

    Hermann, A., S. Schmitt, and A. Jeltsch. 2003. The human Dnmt2 has residual DNA-(cytosine-C5) methyltransferase activity. J. Biol. Chem. 278:31717–31721.

    Higgins, D. G. 1994. CLUSTAL V: multiple alignment of DNA and protein sequences. Methods Mol. Biol. 25:307–318.

    Hung, M. S., N. Karthikeyan, B. Huang, H. C. Koo, J. Kiger, and C. J. Shen. 1999. Drosophila proteins related to vertebrate DNA (5-cytosine) methyltransferases. Proc. Natl. Acad. Sci. USA 96:11940–11945.

    Ikeda, K., H. Nakayashiki, T. Kataoka, H. Tamba, Y. Hashimoto, Y. Tosa, and S. Mayama. 2002. Repeat-induced point mutation (RIP) in Magnaporthe grisea: implications for its sexual cycle in the natural field context. Mol. Microbiol. 45:1355–1364.

    Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275–282.

    Kobayashi, I. 2001. Behavior of restriction-modification systems as selfish mobile elements and their impact on genome evolution. Nucleic Acids Res. 29:3742–3756.

    Kouzminova, E., and E. U. Selker. 2001. Dim-2 encodes a DNA methyltransferase responsible for all known cytosine methylation in Neurospora. EMBO J. 20:4309–4323.

    Kumar, S., X. Cheng, S. Klimasauskas, S. Mi, J. Posfai, R. J. Roberts, and G. G. Wilson. 1994. The DNA (cytosine-5) methyltransferases. Nucleic Acids Res. 22:1–10.

    Kumar, S., K. Tamura, and M. Nei. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150–163.

    Kunert, N., J. Marhold, J. Stanke, D. Stach, and F. Lyko. 2003. A Dnmt2-like protein mediates DNA methylation in Drosophila. Development 130:5083–5090.

    Lee, J. T. 2003. Molecular links between X-inactivation and autosomal imprinting: X-inactivation as a driving force for the evolution of imprinting? Curr. Biol. 13:R242–R254.

    Li, E., T. H. Bestor, and R. Jaenisch. 1992. Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell 69:915–926.

    Lyko, F., B. H. Ramsahoye, and R. Jaenisch. 2000. DNA methylation in Drosophila melanogaster. Nature 408:538–540.

    Malagnac, F., A. Gregoire, C. Goyon, J. L. Rossignol, and G. Faugeron. 1999. Masc2, a gene from Ascobolus encoding a protein with a DNA-methyltransferase activity in vitro, is dispensable for in vivo methylation. Mol. Microbiol. 31:331–338.

    Malagnac, F., B. Wendel, C. Goyon, G. Faugeron, D. Zickler, J. L. Rossignol, M. Noyer-Weidner, P. Vollmayr, T. A. Trautner, and J. Walter. 1997. A gene essential for de novo methylation and development in Ascobolus reveals a novel type of eukaryotic DNA methyltransferase structure. Cell 91:281–290.

    Maloisel, L., and J. L. Rossignol. 1998. Suppression of crossing-over by DNA methylation in Ascobolus. Genes Dev. 12:1381–1389.

    Methe, B. A., K. E. Nelson, J. A. Eisen et al. (31 co-authors). 2003. Genome of Geobacter sulfurreducens: metal reduction in subsurface environments. Science 302:1967–1969.

    Miura, A., S. Yonebayashi, K. Watanabe, T. Toyama, H. Shimada, and T. Kakutani. 2001. Mobilization of transposons by a mutation abolishing full DNA methylation in Arabidopsis. Nature 411:212–214.

    Mund, C., T. Musch, M. Strodicke, B. Assmann, E. Li, and F. Lyko. 2004. Comparative analysis of DNA methylation patterns in transgenic Drosophila overexpressing mouse DNA methyltransferases. Biochem. J. 378:763–768.

    Nielsen, M. L., T. D. Hermansen, and A. Aleksenko. 2001. A family of DNA repeats in Aspergillus nidulans has assimilated degenerated retrotransposons. Mol. Genet. Genomics 265:883–887.

    Nishiyama, R., M. Ito, Y. Yamaguchi, N. Koizumi, and H. Sano. 2002. A chloroplast-resident DNA methyltransferase is responsible for hypermethylation of chloroplast genes in Chlamydomonas maternal gametes. Proc. Natl. Acad. Sci. USA 99:5925–5930.

    Okano, M., D. W. Bell, D. A. Haber, and E. Li. 1999. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99:247–257.

    Okano, M., S. Xie, and E. Li. 1998. Dnmt2 is not required for de novo and maintenance methylation of viral DNA in embryonic stem cells. Nucleic Acids Res. 26:2536–2540.

    Pinarbasi, E., J. Elliott, and D. P. Hornby. 1996. Activation of a yeast pseudo DNA methyltransferase by deletion of a single amino acid. J. Mol. Biol. 257:804–813.

    Pollack, Y., N. Kogan, and J. Golenser. 1991. Plasmodium falciparum: evidence for a DNA methylation pattern. Exp. Parasitol. 72:339–344.

    Posfai, J., A. S. Bhagwat, G. Posfai, and R. J. Roberts. 1989. Predictive motifs derived from cytosine methyltransferases. Nucleic Acids Res. 17:2421–2435.

    Rabinowicz, P. D., L. E. Palmer, B. P. May, M. T. Hemann, S. W. Lowe, W. R. McCombie, and R. A. Martienssen. 2003. Genes and transposons are differentially methylated in plants, but not in mammals. Genome Res. 13:2658–2664.

    Reik, W., W. Dean, and J. Walter. 2001. Epigenetic reprogramming in mammalian development. Science 293:1089–1093.

    Riggs, A. D. 1975. X inactivation, differentiation, and DNA methylation. Cytogenet. Cell Genet. 14:9–25.

    Roberts, R. J., T. Vincze, J. Posfai, and D. Macelis. 2003. REBASE: restriction enzymes and methyltransferases. Nucleic Acids Res. 31:418–420.

    Russell, P. J., J. A. Welsch, E. M. Rachlin, and J. A. McCloskey. 1987. Different levels of DNA methylation in yeast and mycelial forms of Candida albicans. J. Bacteriol. 169:4393–4395.

    Selker, E. U. 2002. Repeat-induced gene silencing in fungi. Adv. Genet. 46:439–450.

    Selker, E. U., N. A. Tountas, S. H. Cross, B. S. Margolin, J. G. Murphy, A. P. Bird, and M. Freitag. 2003. The methylated component of the Neurospora crassa genome. Nature 422:893–897.

    Smith, S. S., and D. I. Ratner. 1991. Lack of 5-methylcytosine in Dictyostelium discoideum DNA. Biochem. J. 277(Pt 1):273–275.

    Teerawanichpan, P., M. B. Chandrasekharan, Y. Jiang, J. Narangajavana, and T. C. Hall. 2004. Characterization of two rice DNA methyltransferase genes and RNAi-mediated reactivation of a silenced transgene in rice callus. Planta 218:337–349.

    Walsh, C. P., J. R. Chaillet, and T. H. Bestor. 1998. Transcription of IAP endogenous retroviruses is constrained by cytosine methylation. Nat. Genet. 20:116–117.

    Wilkinson, C. R., R. Bartlett, P. Nurse, and A. P. Bird. 1995. The fission yeast gene pmt1+ encodes a DNA methyltransferase homologue. Nucleic Acids Res. 23:203–210.

    Wood, V., R. Gwilliam, M. A. Rajandream et al. (131 co-authors). 2002. The genome sequence of Schizosaccharomyces pombe. Nature 415:871–880.

    Yoder, J. A., and T. H. Bestor. 1998. A candidate mammalian DNA methyltransferase related to pmt1p of fission yeast. Hum. Mol. Genet. 7:279–284.

    Yoder, J. A., C. P. Walsh, and T. H. Bestor. 1997. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 13:335–340.

    Yuasa, Y. 2002. DNA methylation in cancer and ageing. Mech. Ageing Dev. 123:1649–1654.

    (Loïc Ponger and Wen-Hsiun)