Evolutionary Process of Amino Acid Biosynthesis in Corynebacterium at the Whole Genome Level
     * Institute of Life Sciences, Ajinomoto Co., Inc., Kawasaki, Japan

    Center for Information Biology, National Institute of Genetics, Mishima, Shizuoka, Japan

    Fermentation & Biotechnology Laboratories, Ajinomoto Co., Inc., Kawasaki, Japan

    Research Center for Glycoscience, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan

    National Institute of Technology and Evaluation, Shibuya, Tokyo, Japan

    E-mail: tgojobor@genes.nig.ac.jp.

    Abstract

    Corynebacterium glutamicum, which is the closest relative of Corynebacterium efficiens, is widely used for the large-scale production of many kinds of amino acids, particularly glutamic acid and lysine, by fermentation. Corynebacterium diphtheriae, which is well known as a human pathogen, is also closely related to these two species of Corynebacteria, but it lacks such productivity of amino acids. It is an important and interesting question how these closely related bacterial species have undergone such significant functional differentiation in amino acid biosynthesis. The main purpose of the present study is to clarify the evolutionary process of functional differentiation among the three species of Corynebacteria by conducting a comparative analysis of genome sequences. When Mycobacterium and Streptomyces were used as out groups, our comparative study suggested that the common ancestor of Corynebacteria already possessed almost all of the gene sets necessary for amino acid production. However, C. diphtheriae was found to have lost the genes responsible for amino acid production. Moreover, we found that the common ancestor of C. efficiens and C. glutamicum acquired some of the genes responsible for amino acid production by horizontal gene transfer. Thus, we conclude that gene loss and horizontal gene transfer must have been responsible for the functional differentiation in amino acid biosynthesis among the three species of Corynebacteria.

    Key Words: Corynebacterium · evolution · comparative genomics · amino acid biosynthesis · horizontal gene transfer

    Introduction

    The genomes of several industrially useful bacteria as well as of pathogenic bacteria have been sequenced (Nelson et al. 2000), and one can now compare several genome sequences simultaneously. The reconstruction of metabolic pathways from genome sequences by means of tools such as KEGG (Kanehisa 1997) and WIT (Overbeek et al. 2000) can provide much useful information on differences in the metabolic pathways of bacteria that lead to variation in growth characteristics and in ability to assimilate different substances. It is especially important in applied technology to understand differences in metabolic pathways and their evolutionary history. Until now, there have been only a few studies of metabolic pathways based on comparison of the genome sequences of closely related species (Marais et al. 1999). The same is true of comparisons of specific pathways in more distantly related microorganisms with the aim of accounting for their differences from a biological and evolutionary point of view (Boucher and Doolittle 2000; Lange et al. 2000).

    We have previously sequenced and annotated the genome of Corynebacterium efficiens (Nishio et al. 2003). This bacterium is a close relative of Corynebacterium glutamicum, which has been widely used in the industrial production of glutamate, lysine, and other amino acids by fermentation. The two species are recognized as glutamic acid–producing Corynebacteria (Fudou et al. 2002). The optimal temperature for glutamate production by C. glutamicum is approximately 30°C, and the microorganism does not grow or produce glutamate at temperatures of 40°C or above. On the other hand, C. efficiens can grow and produce glutamate at temperatures greater than 40°C. On the basis of genome comparisons between these two species, three kinds of amino acid substitutions were suggested to be responsible for the thermostability of C. efficiens and the increase of 10% in genome GC content in C. efficiens (Nishio et al. 2003). In addition, the comparative genome-sequence analysis suggested that the absence of a RecBCD pathway may have been responsible for suppressing genome shuffling in Corynebacterium (Nakamura et al., 2003). One of our research interests is the extent to which the genetic control of amino acid biosynthesis differs between these closely related species. It is well known that C. glutamicum overproduces glutamic acid under a variety of conditions such as biotin limitation (Kimura 2003). We are interested in the evolutionary events responsible for the acquisition of this feature. Furthermore, C. glutamicum also overproduces lysine, arginine, threonine, isoleucine, valine, serine, tryptophan, phenylalanine, and histidine (supplementary table 1; Ikeda 2003). It is, therefore, of great interest to investigate the evolutionary processes involved in the acquisition of these productive capabilities. 
Corynebacterium diphtheriae is a well-known pathogen (Collins and Cummins 1986; Graevenitz and Krech 1991) whose genome has been sequenced by the Sanger Center (Cerdeno-Tarraga et al. 2003). Although the main focus of interest in the study of C. diphtheriae has been its pathogenicity, we were interested in understanding the evolutionary process of functional differentiation between the amino acid–producing species and this pathogenic species.

    Table 1 The Summary of Amino Acid Biosynthesis–Related Genes Examined in this Study.

    Complete genome sequence data suggest that some industrially useful phenotypes were acquired by horizontal gene transfer and gene duplication. For example, Streptomyces may have acquired many genes for antibiotic production by gene duplication (Bentley et al. 2002). Comparing the complete genome sequences of closely related species can clarify the functional differentiation among them. Different phenotypes in closely related species originate from differences in gene content and in the regulatory mechanisms of genes, and the comparison of complete genome sequences reveals both kinds of difference. Understanding the differences in gene content among Corynebacteria should be the first step toward studying the regulatory system of amino acid overproduction. We therefore compared gene content among Corynebacteria using the complete genome sequences, to understand when the amino acid overproduction system in C. glutamicum was acquired. Our analysis showed that the common ancestor of Corynebacterium had already possessed almost all the genes needed for the overproduction of amino acids, and that C. diphtheriae lost many of these genes. However, the difference in gene content between glutamic acid–producing Corynebacteria and C. diphtheriae may account for the amino acid productivity in C. glutamicum. Both C. efficiens and C. glutamicum acquired several genes that may be important for amino acid production, after their divergence from the common ancestor of the three Corynebacteria. Furthermore, the genes controlling amino acid biosynthesis differentiate C. glutamicum from C. efficiens. In particular, C. efficiens possesses a paralogous gene encoding glutamine synthetase I that may be responsible for its differences from C. glutamicum in glutamic acid productivity. 
Our results suggest that gene transfer and gene loss in Corynebacterium were responsible for the functional differentiation of the three bacterial species: the emergence of features favoring the capacity for amino acid production, and the acquisition of pathogenicity against humans.

    Materials and Methods

    The complete genome sequences of C. efficiens (Nishio et al. 2003), C. glutamicum (Ikeda and Nakagawa 2003; Kalinowski et al. 2003), C. diphtheriae (Cerdeno-Tarraga et al. 2003), Mycobacterium tuberculosis (Cole et al. 1998), Mycobacterium leprae (Cole et al. 2001), and Streptomyces coelicolor A3(2) (Bentley et al. 2002) were obtained from DDBJ/EMBL/GenBank (accession numbers: BA000035, BA000036, BX248353, AL123456, AL450380, and AL645882, respectively). In the case of S. coelicolor, we also used the web server (http://jic-bioinfo.bbsrc.ac.uk/S.coelicolor/index.html).

    Phylogenetic Analysis

    The BLAST (Altschul et al. 1997) and FASTA (Pearson 2000) programs were used for database searches, and ClustalW (Thompson et al. 1997) for multiple alignments. Phylogenetic trees were constructed by the Neighbor-Joining (NJ) method (Saitou and Nei 1987) with p-distance or Kimura's distance. Estimates of the numbers of synonymous (Ks) and nonsynonymous (Ka) substitutions per site and their standard deviations were calculated using Li's method (Li 1993) implemented in DAMBE (Xia and Xie 2001). We also used the Nei and Gojobori method (Nei and Gojobori 1986), which gave virtually the same results.
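The two distance measures named above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the trees were built with existing programs). It assumes that "Kimura's distance" refers to Kimura's empirical correction for protein sequences, d = -ln(1 - p - 0.2p^2), which is the correction ClustalW offers. The function names are hypothetical, and skipping gapped columns is also an assumption.

```python
import math

GAP = "-"


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns in which either sequence has a gap are skipped (an assumption;
    other gap treatments exist).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == GAP or b == GAP:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def kimura_protein_distance(p: float) -> float:
    """Kimura's empirical correction for multiple substitutions at a site."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        raise ValueError("p is too large for the correction to be defined")
    return -math.log(inner)
```

The corrected distance is always at least as large as p and grows faster as sequences diverge, which is why the choice between the two matters mainly for the distant out group comparisons.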

    Comparison of Gene Contents in Corynebacterium

    Multiple alignments and phylogenetic trees were constructed for the high-GC gram-positive bacteria, C. efficiens, C. glutamicum, C. diphtheriae, M. tuberculosis, M. leprae, and S. coelicolor, using all highly conserved proteins involved in amino acid biosynthesis. Criteria for highly conserved sequences were defined using the FASTA program. The query sequences used in the FASTA program searches were from C. glutamicum or C. efficiens. The Z scores of the FASTA program, identities of overlapping regions, and detected sequence lengths were used to establish the highly conserved sequences. All alignments were checked manually.
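A filter combining the three criteria above might look like the following Python sketch. The text does not give the cutoff values, so the function takes them as required arguments and supplies no defaults. The record type, the field names, and the reading of "detected sequence length" as overlap length relative to query length are all assumptions made for illustration, not the FASTA output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FastaHit:
    """One search hit; field names are illustrative only."""
    subject_id: str
    z_score: float        # Z score reported for the hit
    identity: float       # fraction of identical residues in the overlap (0 to 1)
    overlap_length: int   # length of the aligned overlap, in residues
    query_length: int     # length of the query protein, in residues


def is_highly_conserved(hit: FastaHit, *, min_z_score: float,
                        min_identity: float, min_coverage: float) -> bool:
    """Return True if the hit passes all three caller-supplied cutoffs."""
    coverage = hit.overlap_length / hit.query_length
    return (hit.z_score >= min_z_score
            and hit.identity >= min_identity
            and coverage >= min_coverage)
```

Requiring all three cutoffs at once guards against short, high-identity matches to a single domain, which a Z score or identity threshold alone would let through.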

    Results

    Differences Between C. efficiens and C. glutamicum in Genes Related to Amino Acid Biosynthesis

    To evaluate the evolutionary processes that led to the biological capacity for amino acid production on a large scale, we collected the amino acid sequences of amino acid biosynthetic enzymes and related enzymes from genome annotations of a number of high-GC gram-positive bacteria. All the phylogenetic trees of these enzymes were compared with a 16S rRNA–based phylogenetic tree (fig. 1). In this phylogenetic tree, C. diphtheriae diverged first from the common ancestor of Corynebacteria, and after that C. efficiens and C. glutamicum diverged from the common ancestor of glutamic acid–producing Corynebacteria. This representative topology was supported by the phylogenetic trees for most translation/transcription-related genes. We found that only five amino acid biosynthesis–related genes had paralogs in glutamic acid–producing Corynebacteria (table 1 and supplementary table 2). The topologies of the phylogenetic trees for these five genes were distinctly different from the representative topology for high-GC gram-positive bacteria. The same topologies were obtained for each of the five genes and for the 16S rRNAs not only by the NJ method (Saitou and Nei 1987) but also by the maximum-likelihood (ML) method (Adachi and Hasegawa 1996). For four of the five genes, the tree topologies, multiple alignments, and operon structures prompted us to examine genome-level events such as gene transfer, duplication, and loss in glutamic acid–producing Corynebacteria. We therefore focused further analysis on these four genes: trpB (encoding tryptophan synthase β chain), ilvD (encoding dihydroxy-acid dehydratase), aroQ (encoding 3-dehydroquinate dehydratase), and glnA (encoding glutamine synthetase I).

    FIG. 1. Phylogenetic tree of the 16S rRNA sequences of the high-GC gram-positive bacteria examined in this study. B. subtilis was used as out group. The tree was constructed by the Neighbor-Joining method (Saitou and Nei 1987), and numbers indicate bootstrap values for 100 replications

    Table 2 Average number of Ks, Synonymous, and Ka, Nonsynonymous, Substitution Rates and Ks–Ka Ratio for Glutamine Synthetase in Corynebacterium and Mycobacterium.

    In the phylogenetic tree of TrpB, the paralogs of C. efficiens (CE2880) and C. diphtheriae (DIP2351) were positioned outside the cluster of corynebacterial orthologs (CE2872, Cglu3034, DIP2360) (fig. 2A). In C. efficiens and C. diphtheriae, the paralogous trpB (CE2880 and DIP2351) was located very close on the genome to the orthologous trpB (CE2872 and DIP2360). From these results, we suggest that gene duplication took place in the common ancestor of Corynebacterium, and that gene loss was responsible for the single copy of this gene in C. glutamicum.

    FIG. 2. Phylogenetic trees of proteins related to amino acid biosynthesis in the high-GC gram-positive bacteria (a) tryptophan synthase beta chain (TrpB); TrpB in B. subtilis was used as out group. (b) dihydroxy-acid dehydratase (IlvD); IlvD in B. subtilis was used as out group. Sequences in the figure show the region of the multiple alignment, which contains the most critical differences (see text). (c) 3-dehydroquinate dehydratase (AroQ): 3-dehydroquinate dehydratase (AroC) in B. subtilis was used as out group. Arrows in the figure show the operon structure. The complete genome sequence of C. pseudotuberculosis was not available. (d) glutamine synthetase (GlnA): GlnA in B. subtilis was used as out group. The right part of figure shows the tree assuming gene duplication in C. efficiens. The position of CE2116 should be inside of C. glutamicum Cgl2214. (e) ornithine cyclodeaminase (Ocd): the ornithine cyclodeaminase homologues in Streptomyces avermitilis were used as a member of high-GC gram-positive bacteria. The ornithine cyclodeaminase homolog in Staphylococcus aureus was used as out group. The numbers indicate bootstrap values for 100 replications

    We constructed a multiple alignment and a phylogenetic tree of IlvD, again using high-GC gram-positive bacterial sequences (supplementary fig. 1 and fig. 2B, respectively). In the phylogenetic tree, the highly conserved sequence in Bacillus subtilis, a low-GC gram-positive bacterium, was used as out group. This phylogenetic tree contained two clusters whose topologies were unlike the tree obtained from the 16S rRNA sequences (figs. 1, 2B). In the multiple alignment of IlvD, a large insertion was observed between positions 412 and 450 in C. efficiens CE1362, C. glutamicum Cgl1268, C. diphtheriae DIP1096, and S. coelicolor SCO3345 (supplementary fig. 1), and these four sequences were clustered in the phylogenetic tree. A large insertion was also observed in the multiple alignment of the dehydratase family (PfamA, ILVD_EDD). This implies that the insertion took place a long time ago, even before the emergence of the common ancestor of high-GC gram-positive bacteria.

    The phylogenetic tree of AroQ was constructed in the same manner as that of IlvD, and its topology also differed from the 16S rRNA–based phylogenetic tree (figs. 1, 2C). C. efficiens CE1739, C. diphtheriae DIP1342, Corynebacterium pseudotuberculosis, M. leprae ML0519, and M. tuberculosis Rv2537c form a cluster in the phylogenetic tree. AroQ in C. efficiens CE1739, C. diphtheriae DIP1342, M. leprae ML0519, and M. tuberculosis Rv2537c is part of the aroCKBQ operon. Another AroQ cluster was composed of an additional aroQ in C. efficiens CE0442, C. glutamicum Cgl0423, and S. coelicolor SCO1961. The additional aroQ in C. efficiens CE0442 and aroQ in C. glutamicum Cgl0423 lie next to aroE on the chromosome, whereas in S. coelicolor SCO1961 there is no nearby aromatic amino acid biosynthesis gene. These results suggest that the evolution of the aroQ gene in high-GC gram-positive bacteria is related to operon organization, and it is curious that C. efficiens retained two aroQ genes within conserved operon structures.

    The phylogenetic tree of GlnA showed that the paralogous GlnA of C. efficiens CE2116 was positioned outside that of C. diphtheriae DIP1644 (fig. 2D). This result suggests that glnA of C. efficiens CE2116 was not acquired by gene duplication within its own evolutionary lineage (unless it is a pseudogene), but rather by gene duplication in the common ancestor of Corynebacterium, or by horizontal gene transfer. To find a more likely explanation, we compared the genome structures of the three Corynebacteria (fig. 3). In C. efficiens and C. diphtheriae, there were additional genes next to the orthologous glnA when compared to C. glutamicum. These additional genes are from CE2105 to CE2116 in C. efficiens and DIP1644 to DIP1661 in C. diphtheriae, as shown in fig. 3. These genes were dissimilar at both the DNA and amino acid levels, implying that they were acquired independently in each species. The C. diphtheriae–specific genes are annotated as a putative phage-related and antibiotic resistance–related pathogenicity island and showed unusual GC content and dinucleotide signature (Cerdeno-Tarraga et al. 2003). This result suggested that the C. diphtheriae–specific genes were acquired by horizontal gene transfer. On the other hand, the paralogous ocd gene (CE2115) encoding ornithine cyclodeaminase and the paralogous glnA (CE2116) (fig. 3) were C. efficiens–specific genes. The paralogous ocd gene (CE2115) was located next to the paralogous glnA (CE2116) in C. efficiens. The phylogenetic tree of Ocd showed that the paralogous Ocd of C. efficiens (CE2115) was positioned outside the orthologous corynebacterial Ocd (CE1700, Cgl1582) (fig. 2E). Moreover, C. diphtheriae has lost the ocd gene. The orthologous ocd gene was not located near the orthologous glnA in C. efficiens CE2104 and C. glutamicum Cgl2214, suggesting that the paralogous glnA (CE2116) and paralogous ocd (CE2115) genes of C. efficiens were not acquired by gene duplication in the common ancestor of Corynebacterium. 
A possible explanation is that, because of the lack of a RecBCD pathway (Nakamura et al. 2003), genome rearrangement could not take place, so duplicated genes must remain close to where they originated. Another possible explanation was that the paralogous glnA is a pseudogene. An analysis of the number of nonsynonymous versus synonymous substitutions showed a larger number of nonsynonymous substitutions in the paralogous glnA of C. efficiens (CE2116) than in the orthologous corynebacterial glnA (CE2104, Cgl2214, DIP1644); however, the number was not as high as in Mycobacterium, and GC-content analysis showed that there was no difference in the second-position GC content (tables 2, 3). If the paralogous glnA gene were a pseudogene on which there were no functional constraints, a significant difference would be observed between its second-position GC content and that of the orthologous gene. Evidently, the paralogous glnA of C. efficiens (CE2116) is not a pseudogene, but was acquired by horizontal gene transfer.
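The second-position GC content used in this argument can be computed as in the Python sketch below. It is an illustration, not the authors' procedure. The function name is hypothetical, and the input is assumed to be an in-frame coding sequence whose length is a multiple of three.

```python
from typing import Tuple


def gc_content_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """Return the GC fraction at the first, second, and third codon positions.

    Assumes `cds` is an in-frame coding sequence (length divisible by 3).
    """
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    fractions = []
    for position in range(3):
        bases = cds[position::3]  # every third base, starting at this codon position
        gc_count = sum(1 for base in bases if base in "GC")
        fractions.append(gc_count / len(bases))
    return fractions[0], fractions[1], fractions[2]
```

The second position is the informative one here because every change at that position alters the encoded amino acid, so its base composition is held in place by selection on a functional gene and is free to drift only once that constraint is lost.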

    FIG. 3. ORFs in the GlnA region of Corynebacteria. The numbers correspond to the gene designations in each species. Orthologous genes are connected with lines

    Table 3 GC contents of glutamine synthetase in Corynebacterium and Mycobacterium.

    Newly Acquired Genes in Amino Acid–Producing Species

    The genome of C. diphtheriae comprises 2,488,635 bp and is thus smaller than those of other high-GC gram-positive bacteria (Cerdeno-Tarraga et al. 2003). The evolutionary origin of this small genome must have been either massive gene loss in C. diphtheriae or massive gene acquisition in the other high-GC gram-positive bacteria. To clarify the evolutionary event responsible, we identified the common orthologous genes in the five high-GC gram-positive bacteria (C. efficiens, C. glutamicum, C. diphtheriae, M. tuberculosis, and S. coelicolor) by the reciprocal best-hit method using BLAST (Mineta et al. 2003), and likewise in each set of four species obtained by excluding one of the Corynebacteria. There were 748 orthologous genes in the five bacteria, 768 when excluding C. glutamicum, 773 when excluding C. efficiens, and 831 when excluding C. diphtheriae. Excluding C. diphtheriae thus adds 83 common orthologs, compared with only 20 and 25 for the other two Corynebacteria. This shows that C. diphtheriae is likely to have lost many orthologs that were found in the four other bacteria after it diverged from the common ancestor of Corynebacterium. For example, gltBD, metE, metB, malE, cysH, cysI, cysN, and cysD are missing from C. diphtheriae but present in C. efficiens, C. glutamicum, and the out group bacteria (table 1, supplementary table 2, and fig. 4).
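The reciprocal best-hit step and the leave-one-out counts can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure of Mineta et al. (2003). Best hits are assumed to be already extracted from the search output into dictionaries. Common orthologs are defined relative to a single reference genome that is never excluded, which is a simplification. All function and gene names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def reciprocal_best_hits(best_a_to_b: Dict[str, str],
                         best_b_to_a: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) in which a's best hit is b and b's best hit is a."""
    return {(a, b) for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a}


def common_orthologs(rbh_by_species: Dict[str, Dict[str, str]],
                     include: Iterable[str]) -> Set[str]:
    """Reference genes that have a reciprocal best hit in every included species.

    rbh_by_species maps species name -> {reference gene -> partner gene}.
    """
    species = list(include)
    if not species:
        raise ValueError("at least one species must be included")
    shared = set(rbh_by_species[species[0]])
    for name in species[1:]:
        shared &= set(rbh_by_species[name])
    return shared


def leave_one_out_counts(rbh_by_species: Dict[str, Dict[str, str]]) -> Dict[str, int]:
    """Number of common orthologs when each species in turn is excluded."""
    counts = {}
    for excluded in rbh_by_species:
        kept = [name for name in rbh_by_species if name != excluded]
        counts[excluded] = len(common_orthologs(rbh_by_species, kept))
    return counts
```

A species whose exclusion raises the count far more than the others is the one missing genes that the rest share, which is the pattern the text reports for C. diphtheriae.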

    FIG. 4. Overview of the amino acid biosynthesis pathways in Corynebacteria. Broad lines show the pathways conserved among the three Corynebacteria. Narrow lines show the pathways lost in C. diphtheriae. Glc, glucose; G6P, glucose-6-phosphate; F6P, fructose-6-phosphate; GAP, glyceraldehyde-3-phosphate; 3PG, 3-phosphoglycerate; PEP, phosphoenolpyruvate; Pyr, pyruvate; AcCoA, acetyl-coenzyme A; Cit, citrate; IsoCit, isocitrate; aKG, alpha-ketoglutarate; SucCoA, succinyl-coenzyme A; Suc, succinate; Mal, malate; Oxa, oxaloacetic acid; Ribu5P, ribulose-5-phosphate; X5P, xylulose-5-phosphate; Rib5P, ribose-5-phosphate; E4P, erythrose-4-phosphate; Sed7P, sedoheptulose-7-phosphate; His, histidine; DAHP, 3-deoxy-D-arabino-heptulosonate-7-phosphate; Chr, chorismate; Trp, tryptophan; Pre, prephenate; Phe, phenylalanine; Tyr, tyrosine; Glu, glutamate; Gln, glutamine; Pro, proline; Arg, arginine; Ser, serine; Gly, glycine; Cys, cysteine; aKVal, alpha-ketovaline; Leu, leucine; Val, valine; Ile, isoleucine; Thr, threonine; Asp, aspartate; Asn, asparagine; ASA, aspartate-semialdehyde; THP, tetrahydropicolinate; mDAP, meso-diaminopimelate; Lys, lysine; Hom, homoserine; AcHom, acetylhomoserine; hCys, homocysteine; Met, methionine

    C. diphtheriae does not possess a paralogous pyruvate kinase (pyk2) or phosphoenolpyruvate synthase (pps) gene in the anaplerotic pathway, nor aroG, encoding 3-deoxy-D-arabino-heptulosonate-7-phosphate synthase in aromatic amino acid biosynthesis, nor a diaminopimelate dehydrogenase (ddh) gene in lysine biosynthesis. These genes are also absent from the other high-GC gram-positive bacteria (supplementary table 2). There are only two homologs of Pyk2 of C. glutamicum and C. efficiens among known protein sequences: one in Thermosynechococcus elongatus, a cyanobacterium, and the other in Arabidopsis thaliana. In C. efficiens and C. glutamicum, pps (CE0560 and Cgl0551) and pps2 (CE0561 and Cgl0552) are adjacent to each other. The N-terminal region of the pps2 product of Corynebacteria (CE0561 and Cgl0552) is similar to that of bacterial phosphoenolpyruvate synthase. Among known protein sequences, we found only one species that has these two homologs in the same arrangement, and they were isolated as putative phenol phosphorylation–related genes (Breinig et al. 2000). Bacillus sphaericus and Clostridium tetani have homologous Ddh sequences at the amino acid level. Together these results suggest that pyk2, pps, pps2, and ddh were acquired by the common ancestor of the amino acid–producing species, rather than lost in C. diphtheriae. There are no homologs of AroG in Mycobacterium or Streptomyces among known protein sequences; however, other high-GC gram-positive bacteria of the Actinomycetales, Thermobifida fusca and Amycolatopsis mediterranei, have highly conserved sequences. We infer that aroG was lost in C. diphtheriae, Mycobacterium, and Streptomyces, but retained in C. efficiens and C. glutamicum.

    One of the biologically important characteristics of C. glutamicum is that it is a biotin-requiring organism (Kimura 2003). The biotin requirement is also observed in C. efficiens. These bacteria lack the complete biotin biosynthesis pathway from pimelate to biotin. Glutamic acid overproduction in C. glutamicum is caused by a shortage of biotin (Kimura 2003). It is of interest to note that C. diphtheriae may not be a biotin-requiring organism because it possesses the complete biotin biosynthesis pathway. For this reason, we strongly suspect that C. diphtheriae does not possess the glutamic acid overproduction mechanism induced by biotin limitation. Moreover, DIP1381, which encodes 6-carboxyhexanoate-CoA ligase, the first enzyme in biotin biosynthesis, may have been acquired by horizontal gene transfer in C. diphtheriae (table 1 and supplementary table 2), because no other high-GC gram-positive bacterium possessed an ortholog of DIP1381.

    Discussion

    Why do the glutamic acid–producing Corynebacteria have such a remarkable capacity for producing many different amino acids? To answer this question from an evolutionary point of view, we reconstructed metabolic pathways by using the complete genome sequences of high-GC gram-positive bacteria, and made a detailed comparison of their pathway genes. We first tried to determine whether C. efficiens and C. glutamicum had acquired the genes necessary for amino acid overproduction. Our analysis suggested that other high-GC gram-positive bacteria had orthologs for most of the characteristic genes needed for amino acid overproduction in C. glutamicum (Vrljic et al. 1996; Kimura et al. 1996; Kimura 2003; Simic et al. 2001). In a previous study, 2,101 orthologs were identified between C. efficiens and C. glutamicum (Nakamura et al. 2003). Only 177 orthologs failed to have any homologs in C. diphtheriae, Mycobacterium, and Streptomyces (data not shown). These results suggest that the capacity for overproducing amino acids was inherited from a common ancestor, and that actual overproduction may have emerged in the course of evolution of glutamic acid–producing Corynebacteria.

    The loss of genes in C. diphtheriae may be correlated with the microorganism's loss of amino acid–production capability. Our analysis suggested that C. diphtheriae has lost many genes present in the common ancestor and that this loss is reflected in its genome size. C. diphtheriae lacks the genes gltBD, ddh, metE, and metB, which encode enzymes of redundant pathways for glutamate, lysine, and methionine biosynthesis in the amino acid–producing species (fig. 4). Surprisingly, C. diphtheriae has also lost all genes of the sulfur incorporation pathway, suggesting that it cannot synthesize cysteine. The addition of cysteine was critical for toxin production and cell growth of C. diphtheriae (Nagarkar et al. 2002), consistent with the absence of the sulfur incorporation pathway.

    To estimate the evolutionary events needed for the capacity for amino acid overproduction in the industrially useful Corynebacteria genome, we compared amino acid biosynthesis–related genes of C. efficiens and C. glutamicum. We found that although amino acid biosynthesis pathways were well conserved, the number of paralogs related to amino acid biosynthesis differed (table 1). Our phylogenetic analysis suggested that the paralogs glnA (CE2116) (Schulz et al., 2001) and ocd (CE2115) of C. efficiens were acquired by horizontal gene transfer. If ocd and glnA paralogous genes were acquired together, then the creation of the ammonia recycle pathway could be achieved in C. efficiens. Gene transfer may, therefore, be one of the important factors in the evolution of the amino acid–producing species.

    The choice of particular genes may also have been important in the evolution of bacterial phenotypes. In the phylogenetic tree of AroQ (fig. 2C), one cluster contains only nonpathogenic bacteria, and the other contains only pathogenic bacteria apart from C. efficiens (the pathogenic cluster). Of all the phylogenetic trees for amino acid biosynthesis–related genes in the high-GC gram-positive bacteria, this was the only one in which Corynebacteria are separated into two clusters of pathogens and nonpathogens. One possible evolutionary explanation is that gene duplication occurred in the common ancestor of the high-GC gram-positive bacteria and that, as a result of this choice, each species except C. efficiens lost one of the two aroQ genes, depending on its phenotypic features. Mutating genes of the common aromatic amino acid biosynthetic pathway to inhibit folic acid biosynthesis is one of the strategies for vaccine development against pathogenic bacteria. In fact, it has been observed that the growth of more than 10 pathogens was attenuated by a single mutation in aromatic amino acid biosynthesis–related genes (aro genes) (Simmons et al. 1997). In C. pseudotuberculosis, mutation of aroQ weakened the pathogenicity of the microorganism in the mouse (Simmons et al. 1997). Thus, aroQ may be related to pathogenicity.

    Two possible evolutionary events could explain the phylogeny of IlvD: ancient gene duplication or horizontal gene transfer (fig. 2B). Our results suggested that ilvD in C. efficiens was acquired by ancient gene duplication rather than by horizontal gene transfer. In the case of TrpB, the phylogenetic tree clearly showed that gene duplication had occurred in the common ancestor and that C. glutamicum may have lost the duplicated ORF (fig. 2A). The paralogous trpB was located near the orthologous trpB in C. efficiens and C. diphtheriae. This location in the Corynebacteria supports the rule that duplicated genes are located next to one another owing to the absence of genome rearrangement resulting from the lack of a RecBCD pathway (Nakamura et al. 2003). It has been proposed that the paralogous trpB in C. diphtheriae is a pseudogene because of the long branch length (Xie et al. 2002). However, persistence of this paralog in C. diphtheriae but not in C. glutamicum seems strange, because C. diphtheriae appears to have lost many genes during its evolution and the selective pressure to discard unnecessary genes appears to have been much higher in C. diphtheriae than in C. glutamicum.

    Our findings suggest that almost all the genes required for amino acid production already existed in the common ancestor of Corynebacterium. We also believe that newly acquired genes in glutamic acid–producing Corynebacteria contribute to amino acid overproduction capacity. Indeed, ddh, one of the newly acquired genes in the amino acid–producing species, is known to contribute to lysine production in C. glutamicum. An interesting question is whether the newly acquired and previously unrecognized enzyme phosphoenolpyruvate synthase in C. efficiens and C. glutamicum contributes to the ability of these Corynebacteria to overproduce amino acids. Previous studies of glutamate and lysine production have not highlighted the existence of this enzyme. For example, Park et al. (1997) did not include this enzyme in the flux calculation for lysine production in C. glutamicum. In Escherichia coli, the same enzyme plays an important role in the production of aromatic compounds (Yi et al. 2002), and, furthermore, pps and its homolog in Thauera aromatica were isolated as phenol-induced proteins (Breinig et al. 2000). In fact, aroG encoding 3-deoxy-D-arabino-heptulosonate-7-phosphate synthase, which is on the aromatic amino acid biosynthesis pathway, may have been retained in C. efficiens and C. glutamicum, although it was lost in C. diphtheriae, Mycobacteria, and Streptomyces. Because the gene for benzoate 1,2-dioxygenase reductase, which is related to genes for benzoate degradation (CE2306, Cgl2405), was newly acquired in amino acid–producing species (table 1), phosphoenolpyruvate synthase may cooperate with that gene in these Corynebacteria. Thus, newly acquired genes may also contribute to productivity of amino acids. Only a small number of homologs of those genes are found among known protein sequences. Therefore, they may have been acquired by horizontal gene transfer.

    We have now shown differences in gene content among Corynebacteria. These differences may provide a clue for elucidating the regulatory mechanisms of amino acid overproduction. Although we do not know the regulatory sequences related to glutamic acid production in C. glutamicum, the comparison of regulatory regions of glutamate overproduction–related genes among different species may lead to an overview of the regulatory mechanism of amino acid production. In C. glutamicum, there may be a strong relationship between the attenuation of 2-oxoglutarate dehydrogenase (ODH) activity and glutamic acid production (Shimizu et al. 2003). One of our interests is the similarity of the regulatory regions among the three species of Corynebacteria. The regulatory regions of the odhA gene encoding ODH were more strongly conserved between C. efficiens and C. glutamicum than between C. diphtheriae and either C. glutamicum or C. efficiens (supplementary fig. 2). On the other hand, enhanced glutamate dehydrogenase (GDH) activity may not contribute to glutamic acid production (Shimizu et al. 2003). The conservation of the regulatory regions of the gdh genes encoding GDH was almost the same across these comparisons (supplementary fig. 3). These results are consistent with the previous knowledge of glutamic acid production, suggesting the lack of a glutamic acid overproduction mechanism in C. diphtheriae. The latter is also supported by the complete biotin biosynthesis pathway of C. diphtheriae. As mentioned earlier, the comparison of regulatory regions among the three species of Corynebacteria may be important for studying regulatory systems of amino acid production. In this case, we may have to assume that the important part of the regulatory regions is conserved in spite of a difference in the genome GC contents. Note that the genome GC content of C. efficiens was 10% higher than that of C. glutamicum or C. diphtheriae (Nishio et al. 2003).

    In this study, we have attempted to analyze the evolutionary process by which the capacity for amino acid overproduction was acquired by glutamic acid–producing Corynebacteria. Gene transfer, duplication, and loss events in Corynebacteria may have facilitated the formation of amino acid overproduction mechanisms. Retention of ancestral genes and gain of new genes by horizontal gene transfer may have been the major driving forces in establishing the capability of these bacteria for amino acid overproduction, whereas gene loss may have resulted in the loss of that capacity by C. diphtheriae. Through comparison and detailed analysis of their genome sequences, we have also found some genes that may be responsible for the difference in amino acid productivity between C. efficiens and C. glutamicum. Experimental analysis will be needed to clarify the contribution of these genes and their regulatory sequences to the overproduction of amino acids.
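
    The reasoning summarized above, in which gene presence or absence across the three Corynebacteria and the outgroups is read as retention, loss, or gain, can be sketched as a simple rule. The Python below is only an illustrative, parsimony-style simplification under stated assumptions: the category labels and the function name are hypothetical, and a pattern alone cannot replace the phylogenetic analysis needed to support a horizontal transfer or loss event (aroG, for instance, is described above as retained in the producers yet lost in both outgroups).

```python
# Species groups as used in this study; the labels are plain strings.
OUTGROUPS = ("Mycobacterium", "Streptomyces")
PRODUCERS = ("C. efficiens", "C. glutamicum")
PATHOGEN = "C. diphtheriae"


def classify_gene(present_in: set) -> str:
    """Assign a candidate evolutionary scenario from a presence pattern.

    `present_in` is the set of species/genera in which a homolog of the
    gene was detected. The returned labels are hypotheses only.
    """
    in_outgroup = any(name in present_in for name in OUTGROUPS)
    in_producers = all(name in present_in for name in PRODUCERS)
    in_pathogen = PATHOGEN in present_in

    if in_producers and in_outgroup and in_pathogen:
        return "retained from common ancestor"
    if in_producers and in_outgroup and not in_pathogen:
        return "candidate loss in C. diphtheriae"
    if in_producers and not in_outgroup and not in_pathogen:
        return "candidate gain in ancestor of producers"
    return "other pattern"
```

    For example, a hypothetical gene detected in Mycobacterium, C. efficiens, and C. glutamicum but not in C. diphtheriae would be labeled "candidate loss in C. diphtheriae", whereas a gene found only in the two amino acid–producing species would be labeled "candidate gain in ancestor of producers".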

    Acknowledgements

    The authors are grateful to Dr. T. Ito, Dr. J. Mashima, and Ms. M. Suzuki for annotation of the C. efficiens genome.

    Literature Cited

    Adachi, J., and M. Hasegawa. 1996. MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood. Computer Science Monographs 28. Institute of Statistical Mathematics, Tokyo.

    Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.

    Bentley, S. D., K. F. Chater, and A. M. Cerdeno-Tarraga, et al. (40 co-authors). 2002. Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 417:141-147.

    Boucher, Y., and W. F. Doolittle. 2000. The role of lateral gene transfer in the evolution of isoprenoid biosynthesis pathways. Mol. Microbiol. 37:703-716.

    Breinig, S., E. Schiltz, and G. J. Fuchs. 2000. Genes involved in anaerobic metabolism of phenol in the bacterium Thauera aromatica. J. Bacteriol. 182:5849-5863.

    Cerdeno-Tarraga, A. M., A. Efstratiou, and L. G. Dover, et al. (26 co-authors). 2003. The complete genome sequence and analysis of Corynebacterium diphtheriae NCTC13129. Nucleic Acids Res. 31:6516-6523.

    Cole, S. T., R. Brosch, and J. Parkhill, et al. (39 co-authors). 1998. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393:537-544.

    Cole, S. T., K. Eiglmeier, and J. Parkhill, et al. (41 co-authors). 2001. Massive gene decay in the leprosy bacillus. Nature 409:1007-1011.

    Collins, M. D., and C. S. Cummins. 1986. Genus Corynebacterium. Vol. 2. Pp. 1266–1766 in P. H. A. Sneath, ed. Bergey's Manual of Systematic Bacteriology. Williams & Wilkins, Baltimore.

    Fudou, R., Y. Jojima, A. Seto, K. Yamada, E. Kimura, T. Nakamatsu, A. Hiraishi, and S. Yamanaka. 2002. Corynebacterium efficiens sp. nov., a glutamic-acid-producing species from soil and vegetables. Int. J. Syst. Evol. Microbiol. 52:1127-1131.

    Graevenitz, A. V., and T. Krech. 1991. The Genus Corynebacterium—Medical. Vol. 2. Pp. 1173–1187 in A. Balows, H. G. Trüper, M. Dworkin, W. Harder, and K. H. Schleifer, eds. The Prokaryotes, 2nd edition. Springer-Verlag, New York.

    Ikeda, M. 2003. Amino acid production processes. Pp. 1–35 in R. Faurie and J. Thommel, eds. Adv. Biochem. Eng. Biotechnol., vol 79. Microbial production of l-amino acids. Springer, Berlin Heidelberg New York.

    Ikeda, M., and S. Nakagawa. 2003. The Corynebacterium glutamicum genome: features and impacts on biotechnological processes. Appl. Microbiol. Biotechnol. 62:99-109.

    Kalinowski, J., B. Bathe, and D. Bartels, et al. (27 co-authors). 2003. The complete Corynebacterium glutamicum ATCC 13032 genome sequence and its impact on the production of L-aspartate-derived amino acids and vitamins. J. Biotechnol. 104:5-25.

    Kanehisa, M. 1997. A database for post-genome analysis. Trends Genet. 13:375-376.

    Kimura, E. 2003. Metabolic engineering of glutamate production. Pp. 37–57 in R. Faurie and J. Thommel, eds. Adv. Biochem. Eng. Biotechnol., vol 79. Microbial production of l-amino acids. Springer, Berlin Heidelberg New York.

    Kimura, E., C. Abe, Y. Kawahara, and T. Nakamatsu. 1996. Molecular cloning of a novel gene, dtsR, which rescues the detergent sensitivity of a mutant derived from Brevibacterium lactofermentum. Biosci. Biotechnol. Biochem. 60:1565-1570.

    Lange, B. M., T. Rujan, W. Martin, and R. Croteau. 2000. Isoprenoid biosynthesis: the evolution of two ancient and distinct pathways across genomes. Proc. Natl. Acad. Sci. USA 97:13172-13177.

    Li, W. H. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96-99.

    Marais, A., G. L. Mendz, S. L. Hazell, and F. Megraud. 1999. Metabolism and genetics of Helicobacter pylori: the genome era. Microbiol. Mol. Biol. Rev. 63:642-674.

    Mineta, K., M. Nakazawa, F. Cebria, K. Ikeo, K. Agata, and T. Gojobori. 2003. Origin and evolutionary process of the CNS elucidated by comparative genomics analysis of planarian ESTs. Proc. Natl. Acad. Sci. USA 100:7666-7671.

    Nagarkar, P. P., S. D. Ravetkar, and M. G. Watve. 2002. The amino acid requirements of Corynebacterium diphtheriae PW 8 substrain CN 2000. J. Appl. Microbiol. 92:215-220.

    Nakamura, Y., Y. Nishio, K. Ikeo, and T. Gojobori. 2003. The genome stability in Corynebacterium species due to lack of the recombinational repair system. Gene 317:149-155.

    Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.

    Nelson, K. E., I. T. Paulsen, J. F. Heidelberg, and C. M. Fraser. 2000. Status of genome projects for nonpathogenic bacteria and archaea. Nat. Biotechnol. 18:1049-1054.

    Nishio, Y., Y. Nakamura, Y. Kawarabayasi, Y. Usuda, E. Kimura, S. Sugimoto, K. Matsui, A. Yamagishi, H. Kikuchi, K. Ikeo, and T. Gojobori. 2003. Comparative complete genome sequence analysis of the amino acid replacements responsible for the thermostability of Corynebacterium efficiens. Genome Res. 13:1572-1579.

    Overbeek, R., N. Larsen, G. D. Pusch, M. D'Souza, E. Selkov, Jr., N. Kyrpides, M. Fonstein, N. Maltsev, and E. Selkov. 2000. WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res. 28:123-125.

    Park, S. M., C. Shaw-Reid, A. J. Sinskey, and G. Stephanopoulos. 1997. Elucidation of anaplerotic pathways in Corynebacterium glutamicum via 13C-NMR spectroscopy and GC-MS. Appl. Microbiol. Biotechnol. 47:430-440.

    Pearson, W. R. 2000. Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol. 132:185-219.

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

    Schulz, A. A., H. J. Collett, and S. J. Reid. 2001. Nitrogen and carbon regulation of glutamine synthetase and glutamate synthase in Corynebacterium glutamicum ATCC 13032. FEMS Microbiol. Lett. 205:361-367.

    Shimizu, H., T. Tanaka, A. Nakato, K. Nagahisa, E. Kimura, and S. Shioya. 2003. Effects of the changes in enzyme activities on metabolic flux redistribution around the 2-oxoglutarate branch in glutamate production by Corynebacterium glutamicum. Bioprocess Biosyst. Eng. 25:291-298.

    Simic, P., H. Sahm, and L. Eggeling. 2001. L-threonine export: use of peptides to identify a new translocator from Corynebacterium glutamicum. J. Bacteriol. 183:5317-5324.

    Simmons, C. P., A. L. Hodgson, and R. A. Strugnell. 1997. Attenuation and vaccine potential of aroQ mutants of Corynebacterium pseudotuberculosis. Infect. Immun. 65:3048-3056.

    Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.

    Vrljic, M., H. Sahm, and L. Eggeling. 1996. A new type of transporter with a new type of cellular function: L-lysine export from Corynebacterium glutamicum. Mol. Microbiol. 22:815-826.

    Xia, X., and Z. Xie. 2001. DAMBE: Data analysis in molecular biology and evolution. J. Hered. 92:371-373.

    Xie, G., C. Forst, C. Bonner, and R. A. Jensen. 2002. Significance of two distinct types of tryptophan synthase beta chain in bacteria, archaea and higher plants. Genome Biol. 3:0004.1-0004.13.

    Yi, J., K. Li, K. M. Draths, and J. W. Frost. 2002. Modulation of phosphoenolpyruvate synthase expression increases shikimate pathway product yields in E. coli. Biotechnol. Prog. 18:1141-1148.

    (Yousuke Nishio*, Yoji Nak)