Molecular Biology and Evolution, 2004, Issue 2
Using Quaternary Structures to Assess the Evolutionary History of Proteins: The Case of the Aspartate Carbamoyltransferase
     * Institut de Génétique et Microbiologie, Université Paris-Sud, Orsay, France

    Microbiology, Free University of Brussels (VUB) and J. M. Wiame Research Institute, Brussels, Belgium

    E-mail: labedan@igmors.u-psud.fr.

    Abstract

    Many evolutionary scenarios describing the history of proteins are based solely on phylogenetic studies. We have designed a new approach that allows such questionable scenarios to be tested by taking quaternary structures into account, using aspartate carbamoyltransferase (ATCase) as a case study. Prokaryotic ATCases fall into different classes of quaternary structure according to the mode of association of the catalytic PyrB subunit with other polypeptides: either the PyrI regulatory subunit (class B) or a dihydroorotase (class A), which may be active (PyrC, subclass A1) or inactive (PyrC', subclass A2). Class C consists solely of PyrB trimers. The PyrB phylogenetic tree is not congruent with the tree of life, but it becomes coherent once two families of ATCases, ATC I and ATC II, are recognized. Remarkably, a very strong correlation was found between the pattern of PyrB phylogenetic clustering and the different classes of ATCase quaternary structures. All class B ATCases form a clade in family ATC II, which also contains all eukaryotic sequences. In contrast, family ATC I is made up of classes A and C. These results suggest an unexpected common ancestry for prokaryotic class B and eukaryotic ATCases on the one hand, and for classes A and C on the other. Thus, the emergence of specific quaternary structures appears to be more recent than the separation into the ATC I and ATC II families. We propose that different evolutionary constraints, depending on the identity of the partners interacting in the different kinds of holoenzymes, operated in a concerted way on the ancestral pyrB genes and the respective associated genes pyrI or pyrC, so as to maintain appropriate inter-polypeptide interactions at the level of quaternary structure. The coevolution of genes encoding proteins that interact in various holoenzymes was assessed by calculating the correlation coefficient between their respective phylogenetic trees.
Our approach, which integrates data from the separate fields of structural biology and molecular evolution, could be useful in other cases where purely statistical data need independent confirmation.

    Key Words: aspartate carbamoyltransferase • catalytic PyrB subunit • regulatory PyrI subunit • dihydroorotase • protein-protein interactions • coevolution process • linear correlation coefficient

    Introduction

    It is well known that protein evolution sometimes involves changes in quaternary structure that may result from gene rearrangements such as duplications, fusions, and fissions (Jensen 1992; Davidson et al. 1993; Snel, Bork, and Huynen 2000; Lieb et al. 2001) or even from single amino acid replacements (Shionyu, Takahashi, and Gō 2001). However, until now, evolutionary schemes proposed to trace back the history of specific proteins have scarcely ever been tested for coherence against the quaternary structures of these proteins. Here, we present such an approach, which integrates data from the separate fields of structural biology and molecular evolution. We chose aspartate carbamoyltransferase (ATCase, EC 2.1.3.2), the enzyme that catalyzes the first committed step of de novo pyrimidine biosynthesis, as a case study for two main reasons.

    First, ATCase has been studied intensively over the last 40 years as a model allosteric enzyme (see Hervé 1989; Lipscomb 1994, for reviews), and crystal structures have been solved at high resolution for several prokaryotic ATCases: those of Escherichia coli (Lipscomb 1994; Beernink et al. 1999 and references therein), Bacillus subtilis (Stevens, Reinisch, and Lipscomb 1991), and Pyrococcus abyssi (Van Boxstael et al. 2003). All ATCases are built from a basic subunit, the product of gene pyrB, which assembles into catalytic homotrimers. The PyrB polypeptide itself is composed of two structural domains, which bind the substrates carbamoylphosphate (N-half) and aspartate (C-half), respectively. Several classes of ATCase holoenzymes are distinguished according to the mode of association of PyrB with other proteins (table 1).

    Table 1 Summary of the Structural Properties of Prokaryotic ATCases and the Associated Polypeptides.

    Prokaryotic ATCases come in three classes (Bethell and Jones 1969; Wild and Wales 1990; Bergh and Evans 1993; Kenny, McPhail, and Shepherdson 1996).

    Class A contains a PyrB subunit complexed in a 1:1 ratio with either an active pyrC-encoded dihydroorotase (DHOase), which catalyzes the next step (EC 3.5.2.3) in the pathway (subclass A1; Hughes, Hooshdaran, and O'Donovan 1999), or a pyrC'-encoded, DHOase-like inactive protein (subclass A2; Schurr et al. 1995). Either polypeptide, PyrC or PyrC' (also termed PyrX in the SwissProt nomenclature and in this article), appears to be required for assembly of the pyrB gene product into a stable dodecameric holoenzyme (Schurr et al. 1995; Van de Casteele et al. 1997).

    Class B enzymes are also dodecameric holoenzymes but consist of two catalytic trimers united by three dimers of pyrI-encoded regulatory subunits. In class B', presently limited to Thermotoga maritima and Treponema denticola, pyrB and pyrI are fused (Chen et al. 1998).

    Class C enzymes contain catalytic trimers only (Brabson and Switzer 1975). The prototype of this class (Bacillus subtilis) appears to be insensitive to allosteric effectors.

    Eukaryotic ATCases are also of different types. In plants, the situation is similar to that of prokaryotic class C, except that the enzyme is sensitive to allosteric effectors (Khan, Chowdhry, and Yon 1999; Williamson and Slocum 1994). In animals and in Dictyostelium discoideum, pyrB is fused to the genes for carbamoylphosphate synthetase (EC 6.3.5.5) and DHOase in a multifunctional unit encoding the so-called CAD protein (Coleman, Suttle, and Stark 1977; Davidson et al. 1993). The native CAD structure is a hexamer of identical subunits in which the ATCase domain plays a central role in oligomer formation (Qiu and Davidson 2000). In fungi, a CAD-like protein occurs in which the DHOase segment is not catalytically functional, reminiscent of prokaryotic subclass A2 (Souciet et al. 1989). In trypanosomes, a non-fused pyrB gene is present, clustered in an operon-like arrangement with other pyr genes (Gao et al. 1999).

    Our second main reason for focusing on ATCase was a previous analysis of the evolutionary history of both aspartate carbamoyltransferases and ornithine carbamoyltransferases (OTCases), a pair of paralogous enzymes present in nearly all organisms (Labedan et al. 1999). The phylogeny of 33 ATCases and 44 OTCases diverged widely from the cognate SSU rRNA organismal tree because both gene trees turned out to be polyphyletic. This intricate topology could not be used to root the Tree of Life, but it could be rationalized by a rather simple scenario once we recognized that every ATCase belongs to one of two families, ATC I and ATC II, which could be traced back to gene duplications that occurred in the Last Common Ancestor (LCA) of all extant life or even before its emergence (Labedan et al. 1999). Likewise, present-day ornithine carbamoyltransferases belong to two ancient families, OTC α and OTC β (Labedan et al. 1999).

    Because these families of carbamoyltransferases had been defined solely on phylogenetic grounds, we attempted to correlate them with other properties of these enzymes. In this article, we propose a new approach which, when applied to ATCase, confirms our hypothetical model. First, we show that the PyrB phylogeny corresponds closely to the different classes of ATCase quaternary structures. We further show that the evolution of ATCases has been shaped by the interaction of PyrB with various partners in the different holoenzymes. This coevolution was assessed by directly estimating, with a statistical analysis, the degree of correlation between the phylogenetic trees of pairs of interacting proteins. This second major component of the present work is based on the method recently developed by Goh et al. (2000) and Pazos and Valencia (2001). In this step, we further stress how crucial it is to distinguish among speciation effects, structural interactions, and functional interactions. We anticipate that the approach described in this article could be useful for testing scenarios proposed to trace back the history of protein families that display polyphyletic trees or other uncertain phylogenies.

    Materials and Methods

    Collecting Sequences

    Nearly 400 carbamoyltransferase (ATCase and OTCase) sequences were collected from the public databases SwissProt, TREMBL, and TREMBLNEW. Because many recently published sequences have been annotated solely by homology, we systematically checked each new sequence with a BlastP query before adding it to our multiple alignment. To manage these data, which grow continuously as new genomes are completely sequenced, we assembled them in a relational database (available on request). This allowed us to filter out nearly identical sequences easily, for example those of Brucella melitensis and B. suis. Moreover, for unpublished but completely sequenced genomes, it was often possible to recover bona fide sequences from the sites of the sequencing groups concerned (see Acknowledgments) using either BlastP or tBlastN queries. We retained only sequences that aligned along their whole length with bona fide carbamoyltransferases and shared at least 30% identity with them, using at least two distantly related seeds.
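    The retention rule in the last sentence can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: it assumes each candidate has already been pairwise-aligned to every seed (aligned strings, '-' for gaps), and it approximates "aligning along the whole length" by a hypothetical minimum coverage of 90% (`min_coverage`), a value the text does not give.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a == b)
    return 100.0 * same / len(pairs)


def coverage(aln_query: str, aln_seed: str) -> float:
    """Fraction of the query's residues that are aligned to a seed residue."""
    residues = sum(1 for q in aln_query if q != "-")
    aligned = sum(1 for q, s in zip(aln_query, aln_seed) if q != "-" and s != "-")
    return aligned / residues if residues else 0.0


def keep_candidate(alignments_to_seeds, min_identity=30.0, min_coverage=0.9, min_seeds=2):
    """Keep a candidate if enough distantly related seeds pass both criteria.

    alignments_to_seeds: list of (aligned_query, aligned_seed) pairs, one per seed.
    min_coverage is a hypothetical stand-in for the 'whole length' criterion.
    """
    passing = sum(
        1
        for q, s in alignments_to_seeds
        if percent_identity(q, s) >= min_identity and coverage(q, s) >= min_coverage
    )
    return passing >= min_seeds
```

    Requiring at least two seeds (`min_seeds=2`) mirrors the text's use of "at least two distantly related seeds", which guards against a candidate passing only because it happens to be close to one particular seed.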

    Reconstructing Phylogenetic Trees

    Rooted phylogenetic trees were derived from multiple alignments of sets of filtered ATCases and OTCases using two different approaches.

    New sequences were manually added to, and aligned with, the previously published multiple alignment (Labedan et al. 1999) using the BioEdit sequence alignment editor (Hall 1999). These additions were made straightforward by introducing each new sequence next to its closest partner (the first hit in the routine BlastP check, see above). This procedure minimized the risk of introducing bias when adding numerous new sequences. Nevertheless, the soundness of this manual alignment was routinely checked with automatic programs (both ClustalX and DARWIN, see below) to verify that no conserved motifs had been missed. We further checked this multiple alignment (especially the placement of gaps) against the information available from the known 3D structures of ATCases and OTCases. Maximum parsimony and distance trees were derived from this alignment using the PROTPARS and NEIGHBOR programs of the PHYLIP package (version 4.1; Felsenstein 1996), respectively. The PHYLIP package was also used to derive confidence limits for each node of the parsimony and distance trees by bootstrap analysis (programs SEQBOOT and CONSENSE).
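    The bootstrap procedure mentioned above has two parts: resampling alignment columns with replacement (the role of SEQBOOT) and counting how often a clade recurs among the trees built from the replicates (reported by CONSENSE). The sketch below illustrates both ideas in Python; it is not PHYLIP code, the function names are hypothetical, and each replicate tree is assumed to be already reduced to a set of clades (frozensets of taxon names).

```python
import random


def bootstrap_alignment(alignment, rng):
    """One pseudo-replicate: resample alignment columns with replacement,
    applying the same column indices to every sequence."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def clade_support(clade, replicate_clades):
    """Percentage of replicate trees whose clade set contains `clade`."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


# Threshold used in the Results to flag poorly supported deep nodes
SUPPORT_THRESHOLD = 75.0
```

    In practice each pseudo-replicate would be passed to the tree-building program (parsimony or Neighbor-Joining), and the clades of the resulting trees collected; a node whose support falls below `SUPPORT_THRESHOLD` would be drawn as an open circle in figure 1.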

    The PhyloTree program of the DARWIN package (Gonnet, Cohen, and Benner 1992) makes it possible (1) to measure exhaustively the evolutionary distances (PAM distances) separating each sequence from all its homologs and (2) to build a multiple alignment and derive a distance tree that approximates a maximum likelihood tree, because each deduced evolutionary distance is weighted by the variance of the corresponding PAM distance during tree reconstruction.
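    To illustrate why weighting by variance matters, the sketch below scores a candidate tree with a weighted least-squares criterion (Fitch-Margoliash style), in which each squared residual between an observed distance and the tree's path length is divided by that distance's variance. This is an assumption made for illustration: the text does not state PhyloTree's exact objective, and this is not DARWIN code. The tree is given as a hypothetical edge list `(node_u, node_v, branch_length)`.

```python
from itertools import combinations


def patristic_distances(edges, leaves):
    """Path length between every pair of leaves in an unrooted tree."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    def dists_from(src):
        # Depth-first walk; in a tree each node is reached by a unique path.
        out, stack = {src: 0.0}, [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in out:
                    out[nxt] = out[node] + w
                    stack.append(nxt)
        return out

    result = {}
    for a in leaves:
        d = dists_from(a)
        for b in leaves:
            result[(a, b)] = d[b]
    return result


def weighted_ls_score(observed, variance, tree_dist, leaves):
    """Sum over leaf pairs of (observed - tree)^2 / variance; lower means a better fit.

    Distances with a large variance (e.g., long, saturated PAM distances)
    contribute less, so they pull the tree less strongly.
    """
    return sum(
        (observed[(a, b)] - tree_dist[(a, b)]) ** 2 / variance[(a, b)]
        for a, b in combinations(leaves, 2)
    )
```

    A tree-building program would search over topologies and branch lengths to minimize such a score; the sketch only evaluates one given tree.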

    Assessing the Correlation Between Phylogenetic Trees of Interacting Proteins

    As already proposed by Goh et al. (2000) and Pazos and Valencia (2001), the correlation between evolutionary trees was measured at the level of their respective distance matrices. Because the DARWIN approach is based on maximum likelihood (Gonnet, Cohen, and Benner 1992), we used the PhyloTree program to build matrices of both the PAM distances separating each sequence from all the others and their respective variances. A script was written to collect automatically, for each pair of interacting proteins from the same set of species, their trees and both matrices (PAM distances and the variances of these PAM distances). Routinely, after checking that both evolutionary trees displayed correlated topologies, we directly estimated Pearson's correlation coefficient r (Press et al. 2002) between the pairwise sequence distances. The coefficient r was calculated between the respective matrices using an automated Microsoft Excel 2000 procedure.
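    The core calculation, which the text performs with an Excel procedure, can be sketched in Python. The dictionary-of-pairs representation and the function names are our assumptions. Only the upper triangle of each symmetric matrix enters r: the diagonal is zero by definition, and using both triangles would count every species pair twice.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson's linear correlation coefficient r between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("constant input: r is undefined")
    return sxy / math.sqrt(sxx * syy)


def matrix_correlation(dist_a, dist_b, species):
    """Correlate two distance matrices computed over the same species.

    dist_a, dist_b: dicts mapping frozenset({sp1, sp2}) -> PAM distance.
    Each unordered species pair is used once (upper triangle only).
    """
    pairs = [frozenset(p) for p in combinations(species, 2)]
    return pearson([dist_a[p] for p in pairs], [dist_b[p] for p in pairs])
```

    The same function can be applied to the two variance matrices instead of the PAM distance matrices, which is how a correlation between variances, as mentioned for subclass A1, could be obtained.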

    Results

    Correlation Between Phylogenetic Families and Classes of Quaternary Structures of ATCases

    Some 185 aspartate carbamoyltransferase sequences are now (spring 2003) available in the public databases, although most of the new ones have been annotated by homology only. These sequences were multiply aligned with about 205 ornithine carbamoyltransferases available at the same date, using our previous alignment (Labedan et al. 1999) as a guide. Strikingly, no major change to the latter proved necessary, because most of the remarkable sites previously identified remained conserved in the new alignment (available on request). Because this article deals with ATCases, we used only the part of the multiple alignment containing the whole set of ATCases and a few selected α and β OTCases to reconstruct an updated ATCase phylogenetic tree, rooted by the selected OTCases. Despite the considerable (sixfold) increase in the number of sequences, the general tree topology was not modified (fig. 1): the previously determined ATC I and ATC II families were enlarged but not dislocated. This topology appears robust, because it was insensitive to the addition of new sequences whichever tree reconstruction method was used. However, as previously noted (Labedan et al. 1999), some deep nodes (shown in fig. 1 as open circles) were poorly supported (bootstrap values below 75%) in either the Neighbor-Joining or the maximum parsimony analyses.

    FIG. 1. Correspondence between the ATCase phylogeny and the structural affiliation of PyrB proteins (classes A1, A2, B, B', and C). A selection of ATCases (only one sequence per genus) has been used to reconstruct a distance tree with the Darwin PhyloTree program (Gonnet, Cohen, and Benner 1992). This distance tree, which approximates a maximum likelihood tree, has been rooted using a few OTCases as outgroup. The branch lengths (in PAM units, weighted by the respective variance) are drawn to scale. Species names follow SwissProt conventions (the detailed list is available on request). Archaea are in italics and Eucarya are underlined. Classes of quaternary structure are indicated by labeled boxes for sequences with experimental evidence for an unambiguous structural affiliation. The names of organisms harboring a pyrI gene are framed. The open circles indicate the deep nodes that are not strongly supported in bootstrap analyses of either the Neighbor-Joining or the parsimony trees.

    A second striking conclusion, illustrated in figure 1, is that the two families of pyrB genes previously defined on phylogenetic criteria correlate closely with the ATCase classes of quaternary structure (table 1). All prokaryotic ATCases belonging to class B (and B') are confined to the ATC II family, which also contains the homologous domain of the eukaryotic CAD and CAD-like proteins. Besides the two B' sequences (T. maritima [labeled PYRB_THEMA in figure 1] and T. denticola [labeled PYRB_TREDE]), which branch at the base of the ATC II subtree, the class B ATCases form a clade consisting of all available Archaea and of the Bacteria harboring a pyrI gene, either as part of a pyrBI operon or close to the pyrB gene. The branching of archaeal ATCases was found to parallel the SSU rRNA phylogeny of this domain, with a clear separation between Crenarchaeota and Euryarchaeota; this includes PyrB sequences from unfinished genomes (unpublished) for which no information about the presence of a pyrI gene is yet available. This strongly suggests that all archaeal ATCases are dodecameric holoenzymes made of two PyrB trimers united by three PyrI dimers, as already demonstrated for Pyrococcus abyssi [PYRB_PYRAB] (Purcarea et al. 1997) and Sulfolobus acidocaldarius [PYRB_SULAC] (Durbecq et al. 1999), both of which contain a pyrBI operon, and for Methanococcus jannaschii [PYRB_METJA] (Hack et al. 2000), where the two genes are distant. Among bacteria belonging to class B, we observe a striking increase in biodiversity: besides the previously characterized Proteobacteria (Labedan et al. 1999) (e.g., PYRB_ECOLI), we now have members of distant branches, such as four Bacteroidetes (e.g., PYRB_PORGI), one Actinobacterium (Bifidobacterium longum [PYRB_BIFLO]), and five Firmicutes belonging to the genus Clostridium (e.g., PYRB_CLOAB). Note that C. perfringens [PYRB_CLOPE], one of the three Clostridium species whose genomes have been completely sequenced, was reported to lack a pyrI gene (Shimizu et al. 2002). In fact, a partial pyrI sequence is present immediately downstream of the pyrB gene in this organism. The absence of the 5' part of this sequence could be due either to a trivial sequencing or annotation problem or to recent evolution of this gene toward a pseudogene, as described for other pathogens (Ogata et al. 2001).

    Interestingly, the few ATC II bacteria that lack pyrI do not cluster with the class B members. This is the case for the Spirochete Leptospira interrogans [PYRB_LEPIN] and for a small, monophyletic group of five bacterial genera (e.g., PYRB_BORPE), which branches far from the class B clade, inside the second main ATC II cluster containing all eukaryotes. In this cluster, all CAD (animals [e.g., PYR1_MOUSE] and D. discoideum [PYRB_DICDI]) and CAD-like (fungal) sequences (e.g., PYRB_YEAST) form a clade that shares a close common ancestor with the Trypanosomatidae sequences ([PYRB_LEIME], [PYRB_TRYCR]). This agrees well with previous data (Gao et al. 1999) suggesting that Trypanosoma sequences are progenitors in the evolution toward CAD. The plant sequences (e.g., [PYRB_ARATH]), which seem structurally analogous to class C enzymes, form another clade. The present topology of the ATC II family strongly suggests a common history for eukaryotic ATCases and prokaryotic class B ATCases, despite their different quaternary structures, and does not support previous models in which eukaryotic ATCases were supposed to derive from prokaryotic class A (Schurr et al. 1995).

    Figure 1 further shows that the ATC I family, made up only of bacterial enzymes, contains all sequences belonging to classes A and C. With the notable exception of the Xanthomonadales ([PYRB_XANCA], [PYRB_XYLFA]), the great majority of C enzymes form a clade grouping the low-GC Gram-positive bacteria (e.g., [PYRB_BACSU]), Fusobacterium nucleatum [PYRB_FUSNU], and Aquifex aeolicus [PYRB_AQUAE]. We also observed tight clustering of the different known A enzymes. The A1 sequences appear to be monophyletic and group all high-GC Gram-positive bacteria (e.g., [PYRB_MYCTU]), except the Bifidobacteriales [PYRB_BIFLO] and Tropheryma [PYRB_TROWH], together with two members of the Thermus-Deinococcus branch (T. aquaticus [PYRB_THEAQ] and D. radiodurans [PYRB_DEIRA]). Known A2 sequences belong to a large cluster, where they are interspersed with some uncharacterized ATCases. In this cluster, the Cyanobacteria (e.g., [PYRB_THEEL]) are monophyletic and group together with a set of phylogenetically distant bacteria, such as members of the newly defined Bacteroidetes-Chlorobi group (Cytophaga hutchinsonii [PYRB_CYTHU], Chlorobium tepidum [PYRB_CHLTE]), Fibrobacteres (Fibrobacter succinogenes [PYRB_FIBSU]), and either magnetotactic (Magnetococcus [PYRB_MAGMC]) or delta (Desulfovibrio [PYRB_DESDE], Geobacter metallireducens [PYRB_GEOME]) Proteobacteria. There is also a larger clade of A2 sequences grouping alpha (e.g., [PYRB_AGRTU]), beta (e.g., [PYRB_NITEU]), and gamma (e.g., [PYRB_PSEAE]) Proteobacteria. The fact that A and C ATCases appear together in the same family indicates that they descend from a common ancestor; they differ in whether or not they are stably associated with an active or inactive DHOase.

    Correlation Between Phylogenetic Families of ATCases and of DHOases

    Because a phylogenetic classification has previously been worked out for DHOases (Fields et al. 1999), we compared it with our ATCase data to obtain more information about the physical interaction of these two proteins at the holoenzyme level in class A ATCases. Two main types of DHOases, I and II, defined according to their sizes, appear to derive from one ancestral protein of the amidohydrolase superfamily (Holm and Sander 1997), type I being the more ancient because it occurs in all three domains.

    Several classes have been delineated within each type according to their mode of interaction with other pyrimidine enzymes (Fields et al. 1999). The larger type I comprises (1) a class a, made of non-interacting, active DHOases present in low-GC Gram-positive bacteria and in archaea; (2) a class b, made of DHOases that interact with PyrB and are either active, as in high-GC Gram-positive bacteria and the Thermus-Deinococcus clade, or inactive, as in Proteobacteria and Cyanobacteria; (3) a class c, made only of eukaryotic DHOases and subdivided into active DHOases present in animals and inactive DHOases present in fungi; and (4) a class d, which groups a special category of poorly defined DHOases present essentially in a few Proteobacteria and Cyanobacteria. The organisms possessing an inactive type I DHOase (Proteobacteria, Cyanobacteria, fungi) also contain a smaller, active type II DHOase. Plants also have an active type II DHOase, which is thought to have a bacterial endosymbiotic origin (Fields et al. 1999). Whether a DHOase-like protein is catalytically active can be tentatively inferred from the presence or absence of four catalytically critical histidine residues in the deduced amino acid sequence (Fields et al. 1999).
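    The activity inference in the last sentence amounts to checking a few alignment columns. A minimal sketch, assuming the DHOase-like sequence is already aligned to a reference and that the four catalytic histidines sit at known column indices; the indices used in the check are hypothetical placeholders, not the positions given by Fields et al. (1999).

```python
def check_catalytic_histidines(aligned_seq: str, his_columns: list[int]) -> tuple[bool, list[int]]:
    """Return (predicted_active, missing_columns).

    aligned_seq: DHOase-like sequence taken from a multiple alignment (gaps as '-').
    his_columns: hypothetical 0-based alignment columns of the four catalytic histidines.
    The protein is tentatively predicted active only if every listed column holds 'H';
    a substitution or a gap at any of them suggests an inactive, PyrX-like protein.
    """
    missing = [i for i in his_columns if aligned_seq[i].upper() != "H"]
    return (not missing, missing)
```

    As the text says, such a prediction is only tentative: the loss of a histidine argues for inactivity, but retaining all four does not by itself prove catalytic activity.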

    Figure 2 and table 2 show the correlations between the families and quaternary-structure classes we have delineated for prokaryotic ATCases and the different classes of DHOases (Fields et al. 1999). Figure 2 details, for each ATC structural class, the nature of the associated DHOase, and table 2 summarizes this rather complex situation. All species not belonging to class A2 contain a single DHOase, which is of type I, except for the Proteobacteria belonging to class B. As expected, this single DHOase is active and may belong to class a (accompanying an ATCase of class B or C), class d (with an ATCase of class B or C), or class b (with an ATCase of class A1). The organisms harboring an ATCase of class A2 present a more complex set of DHOases. Besides the inactive Ib form (PyrX), which is associated with the PyrB polypeptide at the quaternary level, there is another, active DHOase, which apparently may be of either type I or type II. Moreover, some A2 organisms, such as Pseudomonas, may even have both active forms, Id and IIa.

    FIG. 2. Correspondence between the ATCase phylogeny and the dihydroorotase type for each cluster of PyrB sequences. A simplified view of the distance tree shown in figure 1 has been used to indicate for each group of evolutionarily related sequences the nature of the dihydroorotase(s) present in the same set of species. Dihydroorotase types were ascertained by an unpublished homology search according to the nomenclature of Fields et al. (1999)

    Table 2 Summary of the Correlations Between the Phylogenetic and Structural Properties of Prokaryotic ATCases and the Associated DHOase(s).

    Table 2 further indicates whether the pyrB and pyrC (or pyrX) genes are genetically linked. There is some correlation between the close proximity of these genes and the structural interaction of their products at the quaternary level. For ATCases of class B or C, where no structural interaction is expected, a large majority of the species have pyrC distant from pyrB. In contrast, for class A, where a structural interaction is required, pyrC (subclass A1) or pyrX (subclass A2) is found next to pyrB, except in the Cyanobacteria.

    Study of these ATCase-DHOase correlations may help in assigning a putative structural class to a few of the experimentally uncharacterized ATC I sequences. For example, the epsilon Proteobacteria Helicobacter pylori and Campylobacter jejuni contain two DHOases, a Ib inactive and a IIa active, and their pyrB and pyrX genes are distant. This is reminiscent of the Cyanobacteria and would suggest an A2 quaternary structure for these epsilon proteobacterial ATCases (PYRB_HELPY and PYRB_CAMJE in fig. 2).

    Assessing the Coevolution of ATCase and of Its Partner Proteins in the Different Holoenzymes

    The correlations observed above between phylogenetic clustering and classes of quaternary structure suggest an underlying process of gene coevolution that governs interactions at the level of the quaternary structures of the proteins considered. To test this hypothesis, we estimated the extent of this coevolution using a recently published methodological approach (Goh et al. 2000; Pazos and Valencia 2001). This approach is based on comparing the phylogenetic trees obtained for a pair of intimately interacting proteins. Because a comparison of tree topologies may not be accurate enough, it has been proposed (Goh et al. 2000; Pazos and Valencia 2001) to compare directly the distance matrices of the two interacting proteins over the same set of species. The degree of correlation is estimated by the linear correlation coefficient r (Pearson's correlation coefficient, calculated as in Press et al. 2002) between the pairwise sequence distances. It is generally accepted that a Pearson's correlation coefficient r below 0.3 means no association, that an r value in the interval 0.3–0.7 indicates a weak association, and that the association is strong if r is larger than 0.7 (Cohen 1988). Moreover, for protein-protein interactions, Goh et al. (2000) and Pazos and Valencia (2001) agreed on a value of 0.79 as the significance threshold for the linear correlation coefficient between the divergent evolution of each of the interacting proteins. Although some interacting proteins were found below this threshold and a few non-interacting proteins above it (Goh et al. 2000; Pazos and Valencia 2001), this value appeared to be the best compromise between sensitivity and specificity (see also Goh and Cohen 2002). Accordingly, we chose an empirical cut-off of 0.8 to decide whether a positive correlation reflects such a coevolution process.
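    The interpretation rules in this paragraph (Cohen's verbal bands plus the 0.8 cut-off) can be written as a small decision function. This is a sketch for illustration only: the function names are hypothetical, and the boundary handling (0.7 counted as weak, r ≥ 0.8 counted as passing) is our reading of the wording rather than something the text states explicitly. The coefficients are those reported for Table 3 in the Results.

```python
def cohen_band(r: float) -> str:
    """Verbal label for a Pearson r, using the bands quoted in the text (Cohen 1988)."""
    if r < 0.3:
        return "no association"
    if r <= 0.7:
        return "weak association"
    return "strong association"


COEVOLUTION_CUTOFF = 0.8  # empirical cut-off adopted in the text


def passes_cutoff(r: float, cutoff: float = COEVOLUTION_CUTOFF) -> bool:
    """True when r reaches the cut-off taken to indicate structural coevolution."""
    return r >= cutoff


# Pearson coefficients reported for Table 3
reported = {
    "class C: PyrB vs PyrC": 0.613,
    "classes B/B': PyrB vs DHOase": 0.525,
    "classes B/B': PyrB vs PyrI": 0.857,
    "subclass A1: PyrB vs PyrC": 0.759,
    "subclass A2: PyrB vs PyrX": 0.917,
    "subclass A2: PyrB vs active DHOase": 0.250,
}

for pair, r in reported.items():
    print(f"{pair}: r = {r:.3f}, {cohen_band(r)}, passes cut-off: {passes_cutoff(r)}")
```

    Only the two experimentally known structural pairs, PyrB-PyrI and PyrB-PyrX, clear the cut-off; the subclass A1 value falls in Cohen's "strong" band yet just below 0.8, which is the borderline case discussed in the text.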

    Table 3 and the accompanying figure 3 show how we tried to distinguish, for each ATCase class, the functional, structural, and combined functional/structural interactions taking place at the holoenzyme level. First, there are two cases, which may be seen as internal negative controls in our test, where no structural interaction is expected although a functional interaction is possible. For the 23 species (16 genera) harboring a class C ATCase, a Pearson's correlation coefficient of 0.613 was found between the distance matrices established for the PyrB and PyrC polypeptides. This relatively low value may reflect some functional interaction between these two enzymes, which catalyze the first two steps of the pyrimidine biosynthesis pathway. Likewise, for ATCase classes B and B' (49 species, 32 genera), the correlation between PyrB and the DHOase remains weak (r = 0.525). In contrast, in the same classes B and B', we find a high correlation (r = 0.857) corresponding to the known structural interaction between PyrB and PyrI. A more complex case, involving both structural and functional interactions between PyrB and the active type Ib DHOase, can be evaluated with the set of species belonging to subclass A1. A significant correlation (r = 0.759) is obtained, but this Pearson's coefficient is slightly below the cut-off set above (Goh et al. 2000; Pazos and Valencia 2001). This difference may be due to the small size of this group of sequences (only 11 species, 7 genera). Note, however, the rather high correlation between the respective variances for subclass A1.
Finally, for subclass A2, the same set of 28 species (24 genera) shows a clear-cut difference between the r value for PyrB and its structural partner, the inactive type Ib DHOase (0.917), and the r value for PyrB and the other, active DHOase (0.250), with which it does not interact structurally.

    Table 3 Measuring the Linear Correlation Coefficient (Pearson) Between Distance Matrices for Various Sets of Proteins Belonging to the Same Species.

    FIG. 3. Distribution of the linear correlation coefficient (Pearson) as a function of the type of interaction between partner proteins. The different Pearson coefficients are shown as black (experimentally known structural interaction) and gray (no structural interaction) bars. The dotted line indicates the threshold value (0.8) we used as the best compromise to discriminate between sound and questionable interactions at the structural level

    A positive correlation between protein trees suggests a coevolution of their coding genes, which may be due to (1) a phylogeny (speciation) effect, or (2) a functional interaction between their products (e.g., proteins involved in the same cellular process), or (3) a structural interaction. Table 4 shows how we tried to ascertain the respective importance of these different possible causes. We focused on the A2 subclass, where cells are supposed to contain two DHOases, one (inactive) involved only in a structural interaction, the other (active) involved only in a functional interaction. Moreover, we introduced OTCase to check the speciation effect both on PyrB (known paralogous relationship) and on each DHOase (no expected parental relationship). Table 4 shows a rather high level of correlation (0.752) between ATCase and OTCase (speciation effect) which is close to that found for the PyrB-PyrC structural interaction in class A1 (table 3). This value, which is slightly below the 0.8 threshold, was not unexpected as we already underlined the strong parental relationships between these two paralogous enzymes (Labedan et al. 1999) which also share several functional properties, including their common substrate, the carbamoylphosphate. In contrast, there is no significant correlation (0.270) between the divergent evolutions of OTCase and of the active DHOases (IIa, Id). Here, the phylogeny effect appears to be undetectable because the Pearson's coefficient is in the same range as the level of functional interaction between PyrB and the active DHOases. Note also the absence of any correlation (0.213) between the two types of DHOases in the case of this A2 subclass, even though they descend from the same ancestor. Clearly, each paralogous DHOase has been able to evolve independently of the other, although both were present in the same ancestral organisms. 
Finally, and strikingly, the correlation (0.715) between the divergent evolution of OTCase and that of the inactive DHOase (PyrX) is in the same range as that found between OTCase and these A2 ATCases. Since OTCase has no expected relationship with the DHOases, the simplest explanation is that PyrX has tracked the evolution of its PyrB partner closely enough to share PyrB's correlation with OTCase. This may be interpreted as indirect but strong support for the notion that coevolution of the A2 pyrB and pyrX genes is necessary for adequate structural interaction of the polypeptides they encode.
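The tree comparison used throughout this section (a Pearson coefficient computed between the pairwise sequence distances of two proteins) can be sketched in a few lines. This is a minimal sketch in the spirit of the approach of Goh et al. (2000) and Pazos and Valencia (2001). It assumes that both distance matrices are already computed over the same set of species, with one sequence per protein per species. The function names, species labels and toy distances are hypothetical and are neither the authors' code nor data from this study.

```python
import math
from itertools import combinations

THRESHOLD = 0.8  # cut-off used in the text to separate sound from questionable interactions

def symmetric(pairs):
    """Build a nested dict d[a][b] == d[b][a] from {(a, b): distance}."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d

def upper_triangle(dist, species):
    """Distances for every unordered species pair, always listed in the same order."""
    return [dist[a][b] for a, b in combinations(species, 2)]

def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def tree_correlation(dist_a, dist_b, species):
    """Correlate two proteins' distance matrices over the species they share."""
    return pearson(upper_triangle(dist_a, species), upper_triangle(dist_b, species))

# Hypothetical toy distances for four hypothetical species (not data from this study).
species = ["sp1", "sp2", "sp3", "sp4"]
pyrB = symmetric({("sp1", "sp2"): 0.20, ("sp1", "sp3"): 0.50, ("sp1", "sp4"): 0.60,
                  ("sp2", "sp3"): 0.45, ("sp2", "sp4"): 0.55, ("sp3", "sp4"): 0.30})
# A partner whose distances scale exactly with PyrB's (perfect coevolution).
partner = symmetric({(a, b): 2 * pyrB[a][b] for a, b in combinations(species, 2)})
# A protein with the same distance values shuffled among species pairs.
unrelated = symmetric({("sp1", "sp2"): 0.55, ("sp1", "sp3"): 0.30, ("sp1", "sp4"): 0.20,
                       ("sp2", "sp3"): 0.60, ("sp2", "sp4"): 0.45, ("sp3", "sp4"): 0.50})

print(round(tree_correlation(pyrB, partner, species), 3))    # 1.0
print(round(tree_correlation(pyrB, unrelated, species), 3))  # -0.669
```

Because the pairwise distances share species and are therefore not independent observations, the sketch reports only the coefficient and does not attach a p-value to it.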

    Table 4 Measuring the Linear Correlation Coefficient (Pearson) Between Pairwise Sequence Distances for Various Sets of Proteins Belonging to Species Having a Subclass A2 ATCase.

    Discussion

    Aspartate carbamoyltransferases may adopt three rather different quaternary structures to carry out the same function (table 1). The basic functional unit is a trimer of the catalytic PyrB polypeptide, which corresponds to the class C holoenzymes. Such trimers may be assembled into larger holoenzymes containing a second kind of polypeptide: either the regulatory PyrI polypeptide (class B/B') or a dihydroorotase (class A). A further level of complexity was recently introduced when it was found that this dihydroorotase component may be either an active enzyme (PyrC polypeptide), as in subclass A1, or an inactive polypeptide (PyrC' or PyrX), as in subclass A2.

    This complex situation suggests an intricate evolutionary history, which we have tried to elucidate in part in this work. The precise structural and/or functional interaction of various polypeptides requires fine-tuned coevolution of the genes encoding them. To detect such concerted evolution and to distinguish it from other mechanisms of parallel evolution, we used several complementary approaches aimed at discriminating between underlying factors such as gene proximity and speciation.

    As a first and major step, we revealed a clear correlation between the different quaternary structures of ATCases and the two families, ATC I and ATC II, that we had previously described solely on phylogenetic grounds (Labedan et al. 1999). As shown in figure 1 and summarized in table 2, all ATCases belonging to class B form a single clade within the ATC II family. All members of this clade, as well as the two species belonging to class B', are prokaryotes containing a pyrI gene. Thus, the mere presence of a pyrI gene in a newly studied organism would be a signature of a class B ATCase. The ATC II family also contains a set of bacteria lacking a pyrI gene, whose ATCase quaternary structures are unknown. The rest of the ATC II family is made up of all known eukaryotic ATCases. Thus, contrary to a previous proposal by Schurr et al. (1995), the genes encoding eukaryotic ATCases do not appear to derive from a class A ATCase, but have probably evolved in parallel with those encoding prokaryotic class B ATCases.

    Remarkably, the ATC I family contains all the ATCases of classes A and C, suggesting a common ancestry for these two very different classes of quaternary structure. The situation appears more complex than in the ATC II family, because neither class A nor class C is monophyletic. For example, most class C members form a clade containing the low-GC Gram-positive bacteria, the Fusobacteria, and Aquifex, whereas the Xanthomonadales occupy a distant position. Note that the Xanthomonadales also differ from the other species harboring a class C ATCase by having an active DHOase of type Id rather than Ia (fig. 2). Similarly, while subclass A1 appears to be monophyletic, subclass A2 is split into two clades, separated by the class C Xanthomonadales and by several bacteria for which there is no experimental evidence about the quaternary structure of their ATCase.

    Moreover, the evolution of the ATC I family may have been influenced by the nature of the DHOase interacting with the ATCase in subclasses A1 and A2. We examined this influence using different approaches. Conservation of gene proximity in phylogenetically distant species is classically interpreted (see, for example, Dandekar et al. 1998; Marcotte et al. 1999) as strong evidence for a functional or structural interaction of their products, or for their common regulation. In classes B and C, there is at best a functional interaction between the two enzymes, which share a common molecule (N-carbamoyl-L-aspartate) in the pyrimidine pathway, and whether their genes (pyrB and pyrC) are clustered seems to matter little. In class A, some strong pressure has kept these genes (pyrB and either pyrC or pyrX) adjacent, with the notable exception of the Cyanobacteria. Because the Cyanobacteria show that the genes of structurally interacting partners can be separated, only the occurrence of gene proximity, not its absence, has any suggestive value in searches for interactions between gene products.
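The asymmetric reading of gene proximity described above can be written down as a small decision rule. This is a minimal sketch that assumes each genome is given as a hypothetical linear list of gene names in chromosomal order. It ignores circular chromosomes, strand and intergenic distance, and the genome labels and gene orders are invented for illustration. The key design choice mirrors the text: adjacency counts as suggestive, while its absence is reported as uninformative rather than as evidence against an interaction.

```python
def adjacent(gene_order, gene_a, gene_b):
    """True if gene_a and gene_b are immediate neighbours in a linear list of gene names."""
    if gene_a not in gene_order or gene_b not in gene_order:
        return False
    return abs(gene_order.index(gene_a) - gene_order.index(gene_b)) == 1

def proximity_evidence(genomes, gene_a, gene_b):
    """Return the genomes in which the two genes are adjacent, plus a verdict.

    Adjacency is 'suggestive' of an interaction; its absence is only
    'uninformative' and never counts as evidence against an interaction.
    """
    clustered = sorted(name for name, order in genomes.items()
                       if adjacent(order, gene_a, gene_b))
    verdict = "suggestive" if clustered else "uninformative"
    return clustered, verdict

# Hypothetical gene orders, for illustration only.
genomes = {
    "genome_1": ["orf1", "pyrB", "pyrC", "orf2"],
    "genome_2": ["pyrB", "orf3", "orf4", "pyrC"],
}

print(proximity_evidence(genomes, "pyrB", "pyrC"))
# (['genome_1'], 'suggestive')
print(proximity_evidence({"genome_2": genomes["genome_2"]}, "pyrB", "pyrC"))
# ([], 'uninformative')
```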

    A much better way to test whether genes have evolved in a concerted way because the proteins they encode interact is to assess the degree of correlation between their phylogenetic trees. The requirement for structural interaction at the quaternary level was measured with a statistical approach (Goh et al. 2000; Pazos and Valencia 2001). As summarized in table 3 and figure 3, the linear correlation coefficient (Pearson coefficient) between the pairwise sequence distances of PyrB and those of its partner proteins exceeded the threshold value of 0.8 (Goh et al. 2000; Goh and Cohen 2002; Pazos and Valencia 2001) when PyrB interacts with either PyrI (0.857) or PyrX (0.917). The known structural interaction PyrB-PyrC in subclass A1 gave a value slightly below this threshold (0.759). This lower value may be explained in two ways: either by a sampling effect (only 11 species available) or, more interestingly, by a compromise between antagonistic forces arising from the dual role of these DHOases, which are catalytically active but also act as structural partners of PyrB in the A1 species. Indeed, mutations of functional residues usually decrease activity, but they often concurrently increase the stability of the protein (Kirschner and Gerhart 1998). In A1 species, the genes encoding DHOases must avoid any deleterious change both in the residues necessary for the proper functioning of their product and in those required to maintain a 3D structure able to interact at the quaternary level with the PyrB partner. The equilibrium between these potentially antagonistic constraints on the pyrC genes of A1 species may be reached only at this lower level of correlation.
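The threshold comparison and the sampling argument above can be checked with simple arithmetic. The sketch below classifies the three Pearson coefficients quoted in the text against the 0.8 cut-off. It also counts how many pairwise distances enter a correlation built from 11 species, assuming one sequence per species. With 11 species the A1 coefficient rests on only 55 distances, and these are not independent, since each species enters 10 of them. The function names are hypothetical.

```python
THRESHOLD = 0.8  # cut-off cited in the text (Goh et al. 2000; Pazos and Valencia 2001)

# Pearson coefficients quoted in the text for PyrB and each structural partner.
reported = {
    "PyrB-PyrI (class B)": 0.857,
    "PyrB-PyrX (subclass A2)": 0.917,
    "PyrB-PyrC (subclass A1)": 0.759,
}

def passes(r, threshold=THRESHOLD):
    """True when a tree correlation exceeds the structural-interaction cut-off."""
    return r > threshold

def n_distance_pairs(n_species):
    """Pairwise distances available when each species contributes one sequence."""
    return n_species * (n_species - 1) // 2

for pair, r in reported.items():
    print(f"{pair}: r = {r:.3f}, {'above' if passes(r) else 'below'} {THRESHOLD}")
print(n_distance_pairs(11))  # 55
```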

    In conclusion: (1) We have revealed a very strong correlation between the phylogeny of ATCases and the different classes of quaternary structure of this enzyme, suggesting unexpected common ancestry for prokaryotic class B and eukaryotic ATCases on the one hand, and for classes A and C on the other. (2) We have quantitatively assessed the structural constraints underlying the interactions between the partner proteins of these classes. The emergence of specific quaternary structures appears to have been a more recent event than the separation into the ATC I and ATC II families. As outlined at the beginning of this article, the phylogeny of carbamoyltransferases diverges widely from the organismal tree of life based on small-subunit rRNA. The correlations we have established strengthen the significance of this polyphyletic pattern, whose origin will be discussed in a forthcoming article.

    Acknowledgements

    We thank the two anonymous reviewers for constructive comments. We thank the DOE (Department of Energy, USA) and the Wellcome Trust (United Kingdom) for making available unpublished sequences from genome projects produced by various sequencing groups at either the Joint Genome Institute (www.jgi.doe.gov/JGI_microbial/html/index.html) or the Sanger Institute (www.sanger.ac.uk/Projects).

    This work was supported by the Flanders Foundation for Joint and Fundamental Research and by the Centre National de la Recherche Scientifique (CNRS) (UMR 8621). Daniil Naumoff was supported by a postdoctoral grant from the French Ministère de la Recherche.

    Literature Cited

    Beernink, P. T., J. A. Endrizzi, T. Alber, and H. K. Schachman. 1999. Assessment of the allosteric mechanism of aspartate transcarbamoylase based on the crystalline structure of the unregulated catalytic subunit. Proc. Natl. Acad. Sci. USA 96:5388-5393.

    Bergh, S. T., and D. R. Evans. 1993. Subunit structure of a class A aspartate transcarbamoylase from Pseudomonas fluorescens. Proc. Natl. Acad. Sci. USA 90:9818-9822.

    Bethell, M. R., and M. E. Jones. 1969. Molecular size and feedback-regulation characteristics of bacterial aspartate transcarbamylases. Arch. Biochem. Biophys. 134:352-365.

    Brabson, J. S., and R. L. Switzer. 1975. Purification and properties of Bacillus subtilis aspartate transcarbamylase. J. Biol. Chem. 250:8664-8669.

    Chen, P., F. Van Vliet, M. Van De Casteele, C. Legrain, R. Cunin, and N. Glansdorff. 1998. Aspartate transcarbamylase from the hyperthermophilic eubacterium Thermotoga maritima: fused catalytic and regulatory polypeptides form an allosteric enzyme. J. Bacteriol. 180:6389-6391.

    Cohen, J. 1988. Statistical power analysis for the behavioral sciences, 2nd edition. Lawrence Erlbaum Associates, Mahwah, N.J.

    Coleman, P. F., D. P. Suttle, and G. R. Stark. 1977. Purification from hamster cells of the multifunctional protein that initiates de novo synthesis of pyrimidine nucleotides. J. Biol. Chem. 252:6379-6385.

    Dandekar, T., B. Snel, M. Huynen, and P. Bork. 1998. Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem. Sci. 23:324-328.

    Davidson, J. N., K. C. Chen, R. S. Jamison, L. A. Musmanno, and C. B. Kern. 1993. The evolutionary history of the first three enzymes in pyrimidine biosynthesis. Bioessays 15:157-164.

    Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model for evolutionary change. Pp. 345–352 in M. O. Dayhoff, ed. Atlas of protein sequence and structure, vol. 5, suppl. 3. National Biomedical Research Foundation, Washington, D.C.

    Durbecq, V., T. L. Thia-Toong, D. Charlier, V. Villeret, M. Roovers, R. Wattiez, C. Legrain, and N. Glansdorff. 1999. Aspartate transcarbamylase from the thermoacidophilic archaeon Sulfolobus acidocaldarius: cloning, sequence analysis, enzyme purification and characterization. Eur. J. Biochem. 264:233-241.

    Felsenstein, J. 1996. Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol. 266:418-427.

    Fields, C., D. Brichta, M. Shepherdson, M. Farinha, and G. A. O'Donovan. 1999. Phylogenetic analysis and classification of dihydroorotases: a complex history for a complex enzyme. Paths to Pyrimidines. An International Newsletter 7:49-63.

    Gao, G., T. Nara, J. Nakajima-Shimada, and T. Aoki. 1999. Novel organization and sequences of five genes encoding all six enzymes for de novo pyrimidine biosynthesis in Trypanosoma cruzi. J. Mol. Biol. 285:149-161.

    Goh, C. S., A. A. Bogan, M. Joachimiak, D. Walther, and F. E. Cohen. 2000. Co-evolution of proteins with their interaction partners. J. Mol. Biol. 299:283-293.

    Goh, C. S., and F. E. Cohen. 2002. Co-evolutionary analysis reveals insights into protein-protein interactions. J. Mol. Biol. 324:177-192.

    Gonnet, G. H., M. A. Cohen, and S. A. Benner. 1992. Exhaustive matching of the entire protein sequence database. Science 256:1443-1445.

    Hack, E. S., T. Vorobyova, J. B. Sakash, J. M. West, C. P. Macol, G. Hervé, M. K. Williams, and E. R. Kantrowitz. 2000. Characterization of the aspartate transcarbamoylase from Methanococcus jannaschii. J. Biol. Chem. 275:15820-15827.

    Hall, T. A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41:95-98.

    Hervé, G. 1989. Aspartate transcarbamylase from Escherichia coli. Pp. 61–79 in G. Hervé, ed. Allosteric enzymes. CRC Press, Boca Raton, Fla.

    Holm, L., and C. Sander. 1997. Unification of a broad set of amidohydrolases related to urease. Proteins 28:72-82.

    Hughes, L. E., M. Z. Hooshdaran, and G. A. O'Donovan. 1999. Streptomyces aspartate transcarbamoylase is a dodecamer with dihydroorotase activity. Curr. Microbiol. 39:175-179.

    Jensen, R. A. 1992. An emerging outline of the evolutionary history of aromatic amino acid biosynthesis. Pp. 205–236 in R. P. Mortlock, ed. The evolution of metabolic function. CRC Press, Boca Raton, Fla.

    Kenny, M. J., D. McPhail, and M. Shepherdson. 1996. A reappraisal of the diversity and class distribution of aspartate transcarbamoylases in Gram-negative bacteria. Microbiology 142:1873-1879.

    Khan, A. I., B. Z. Chowdhry, and R. J. Yon. 1999. Wheat-germ aspartate transcarbamoylase: revised purification, stability and re-evaluation of regulatory kinetics in terms of the Monod-Wyman-Changeux model. Eur. J. Biochem. 259:71-78.

    Kirschner, M., and J. Gerhart. 1998. Evolvability. Proc. Natl. Acad. Sci. USA 95:8420-8427.

    Labedan, B., A. Boyen, M. Baetens, D. Charlier, P. Chen, R. Cunin, V. Durbecq, N. Glansdorff, G. Hervé, C. Legrain, et al. (19 co-authors). 1999. The evolutionary history of carbamoyltransferases: a complex set of paralogous genes was already present in the last universal common ancestor. J. Mol. Evol. 49:461-473.

    Lieb, B., B. Altenhein, J. Markl, A. Vincent, E. van Olden, K. E. van Holde, and K. I. Miller. 2001. Structures of two molluscan hemocyanin genes: significance for gene evolution. Proc. Natl. Acad. Sci. USA 98:4546-4551.

    Lipscomb, W. N. 1994. Aspartate transcarbamylase from Escherichia coli: activity and regulation. Adv. Enzymol. Relat. Areas Mol. Biol. 68:67-151.

    Marcotte, E. M., M. Pellegrini, H. L. Ng, D. W. Rice, T. O. Yeates, and D. Eisenberg. 1999. Detecting protein function and protein-protein interactions from genome sequences. Science 285:751-753.

    Ogata, H., S. Audic, P. Renesto-Audiffren, P. E. Fournier, V. Barbe, D. Samson, V. Roux, P. Cossart, J. Weissenbach, J. M. Claverie, and D. Raoult. 2001. Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science 293:2093-2098.

    Pazos, F., and A. Valencia. 2001. Similarity of phylogenetic trees as indicator of protein–protein interaction. Protein Eng. 14:609-614.

    Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. 2002. Numerical recipes in C, the art of scientific computing, 2nd edition. Cambridge University Press. Available at http://www.ulib.org/webRoot/Books/Numerical_Recipes/bookcpdf.html.

    Purcarea, C., M. Ladjimi, G. Hervé, and R. Cunin. 1997. Aspartate transcarbamylase from the deep-sea hyperthermophilic archaeon Pyrococcus abyssi: genetic organization, structure and expression in Escherichia coli. J. Bacteriol. 179:4143-4157.

    Qiu, Y., and J. N. Davidson. 2000. Substitutions in the aspartate transcarbamoylase domain of hamster CAD disrupt oligomeric structure. Proc. Natl. Acad. Sci. USA 97:97-102.

    Schurr, M. J., J. F. Vickrey, A. P. Kumar, A. L. Campbell, R. Cunin, R. C. Benjamin, M. S. Shanley, and G. A. O'Donovan. 1995. Aspartate transcarbamoylase genes of Pseudomonas putida: requirement for an inactive dihydroorotase for assembly into the dodecameric holoenzyme. J. Bacteriol. 177:1751-1759.

    Shimizu, T., K. Ohtani, H. Hirakawa, K. Ohshima, A. Yamashita, T. Shiba, N. Ogasawara, M. Hattori, S. Kuhara, and H. Hayashi. 2002. Complete genome sequence of Clostridium perfringens, an anaerobic flesh-eater. Proc. Natl. Acad. Sci. USA 99:996-1001.

    Shionyu, M., K.-I. Takahashi, and M. Gō. 2001. Variable subunit contact and cooperativity of hemoglobins. J. Mol. Evol. 53:416-429.

    Snel, B., P. Bork, and M. Huynen. 2000. Genome evolution. Gene fusion versus gene fission. Trends Genet. 16:9-11.

    Souciet, J. L., M. Nagy, M. Le Gouar, F. Lacroute, and S. Potier. 1989. Organization of the yeast URA2 gene: identification of a defective dihydroorotase-like domain in the multifunctional carbamoylphosphate synthetase-aspartate transcarbamylase complex. Gene 79:59-70.

    Stevens, R. C., K. M. Reinisch, and W. N. Lipscomb. 1991. Molecular structure of Bacillus subtilis aspartate transcarbamoylase at 3.0 Å resolution. Proc. Natl. Acad. Sci. USA 88:6087-6091.

    Van Boxstael, S., R. Cunin, S. Khan, and D. Maes. 2003. Aspartate transcarbamylase from the hyperthermophilic Archaeon Pyrococcus abyssi: thermostability and 1.8 Å resolution crystal structure of the catalytic subunit complexed with the bisubstrate analogue N-Phosphonacetyl-L-aspartate. J. Mol. Biol. 326:203-216.

    Van de Casteele, M., P. Chen, M. Roovers, C. Legrain, and N. Glansdorff. 1997. Structure and expression of a pyrimidine gene cluster from the extreme thermophile Thermus strain ZO5. J. Bacteriol. 179:3470-3481.

    Wild, J. R., and M. E. Wales. 1990. Molecular evolution and genetic engineering of protein domains involving aspartate transcarbamoylase. Annu. Rev. Microbiol. 44:193-218.

    Williamson, C. L., and R. D. Slocum. 1994. Molecular cloning and characterization of the pyrB1 and pyrB2 genes encoding aspartate transcarbamoylase in pea (Pisum sativum L.). Plant Physiol. 105:377-384.

    (Bernard Labedan*, Ying Xu)