Genome Rearrangement Distances and Gene Order Phylogeny in γ-Proteobacteria
     Institut Cavanilles de Biodiversitat i Biologia Evolutiva and Departament de Genètica, Universitat de València, Valencia, Spain

    Correspondence: E-mail: francisco.silva@uv.es.

    Abstract

    Genome rearrangements have been studied in 30 γ-proteobacterial complete genomes by comparing the order of a reduced set of genes on the chromosome. This set included those genes fulfilling several characteristics, the main ones being that an ortholog was present in every genome and that none of them had been acquired by horizontal gene transfer. Genome rearrangement distances were estimated based on either the number of breakpoints or the minimal number of inversions separating two genomes. Breakpoint and inversion distances were highly correlated, indicating that inversions were the main type of rearrangement event in γ-Proteobacteria. In general, the progressive increase in sequence-based distances between genome pairs was associated with the increase in their rearrangement-based distances, but several groups of distances did not follow this pattern. Compared with free-living enteric bacteria, the lineages of Pasteurellaceae were evolving, on average, at relatively higher rates of between 1.64 and 2.02, while the endosymbiotic bacterial lineages of Buchnera aphidicola and Wigglesworthia glossinidia were evolving at moderately higher rates of 1.38 and 1.35, respectively. Because we know that the rearrangement rate in the Bu. aphidicola lineage was close to zero during the last 100–150 Myr of evolution, we deduced that a much higher rate took place in the first period of lineage evolution after the divergence of the Escherichia coli lineage. On the other hand, the lineage of the endosymbiont Blochmannia floridanus presented an almost identical rate to free-living enteric bacteria, indicating that the increase in the genome rearrangement rate is not a general change associated with bacterial endosymbiosis. Phylogenetic reconstruction based on rearrangement distances showed a different topology from the one inferred by sequence information. This topology broke the proposed monophyly of the three endosymbiotic lineages and placed Bl. floridanus closer to E. coli than to Yersinia pestis. These results indicate that the phylogeny of these insect endosymbionts is still an open question that will require the development of specific phylogenetic methods to confirm whether the sisterhood of the three endosymbiotic lineages is real or a consequence of a long-branch attraction phenomenon.

    Key Words: genome rearrangement; inversion distance; breakpoint distance; γ-Proteobacteria; endosymbiont; gene order phylogeny; Buchnera; Blochmannia; Wigglesworthia

    Introduction

    The construction of detailed genetic maps in several bacterial species soon revealed that the overall gene order was not conserved over a long evolutionary timescale. The sequencing of the complete genome in many bacterial species and strains clearly showed that closely related species had accumulated fewer rearrangements than the distant ones. However, these tendencies presented exceptions, with some phylogenetic lineages showing remarkable conservation and others extensive genome rearrangements (Casjens 1998; Nadeau and Sankoff 1998).

    There are four types of changes that may affect the order of the genes on the bacterial genome. First, inversions and translocations are frequently detected when the genomes of closely related species are compared (Hughes 2000). Inversions are frequently symmetric around the axis of DNA replication (Eisen et al. 2000; Tillier and Collins 2000), while translocations may be intra- or interchromosomal. Second, genes may be removed in a single event or as a consequence of a process of progressive disintegration, which produces gaps when the genomes of two species are compared (J. O. Andersson and S. G. Andersson 2001; Silva, Latorre, and Moya 2001; Moran 2002). Third, horizontal (or lateral) gene transfer (HGT), considered one of the major forces shaping the evolution of prokaryotic genomes (Koonin and Galperin 1997; Garcia-Vallve, Romeu, and Palau 2000; Ochman, Lawrence, and Groisman 2000; Boucher et al. 2003), may produce insertions throughout the genome. The extent of this phenomenon has been reported to be as high as 20%–30% in within-species genome comparisons, with many foreign DNA segments inserted in the genome (Welch et al. 2002). However, HGT varies considerably among species, with some of them being completely refractory to this phenomenon (Ochman, Lawrence, and Groisman 2000). Finally, partial duplications of the genome may produce redundant genomic segments.

    Genome rearrangements have been studied in several bacterial groups. γ-Proteobacteria is one of them, with inversions as one of the most frequent rearrangement types in interspecies comparisons (Hughes 2000). The study of this group is also very interesting because more than 30 genomes have been sequenced, showing a diverse degree of relatedness. Among them, the small genomes of the bacterial endosymbionts of aphids Buchnera aphidicola BAp (Shigenobu et al. 2000), Bu. aphidicola BSg (Tamas et al. 2002), and Bu. aphidicola BBp (van Ham et al. 2003), of tsetse flies Wigglesworthia glossinidia (Akman et al. 2002), and of carpenter ants Blochmannia floridanus (Gil et al. 2003) have been recently reported. The case of Bu. aphidicola has been studied in particular detail. It is transmitted maternally in aphids (Baumann et al. 1995), and the time of divergence of the three strains has been proposed to be as long as 164 Myr (Von Dohlen and Moran 2000). During this period of time, an almost complete stasis was observed. No HGT or duplication event was detected, and only four small rearrangements differentiated the chromosome of Bu. aphidicola BBp from those of the BAp and BSg strains (two small inversions affecting one and six genes and two small translocations from two plasmids affecting two and four genes) (Tamas et al. 2002; Silva, Latorre, and Moya 2003; van Ham et al. 2003). However, at least 164 gene-loss events had been placed during the evolution of the three lineages (Silva, Latorre, and Moya 2003; Gomez-Valero, Latorre, and Silva 2004). This period of genome structural stability contrasts with the remaining evolution of the Bu. aphidicola lineage after its divergence from the Escherichia coli lineage, with chromosomal rearrangements and more than 1,000 lost genes (Moran and Mira 2001; Silva, Latorre, and Moya 2001).

    During recent years several attempts have been made to produce a γ-proteobacterial phylogeny and to establish the relationship among the three endosymbiotic species and the remaining γ-proteobacterial genomes, especially the Enterobacteriaceae. Clustering of the three endosymbiotic species was proposed based on a 16S ribosomal DNA (rDNA) phylogeny (Sauer et al. 2000) or on a concatenated alignment of 61 proteins (Gil et al. 2003). The monophyly of Bu. aphidicola and W. glossinidia was also detected based on whole-genome phylogenies (Daubin, Moran, and Ochman 2003; Canback, Tamas, and Andersson 2004). On the other hand, 16S rDNA phylogenies rejecting the monophyly of the three species have also been reported (Charles, Heddi, and Rahbe 2001).

    Symbiotic lineages present particular problems in phylogeny reconstruction due to their accelerated sequence evolution (Moran 1996; Itoh, Martin, and Nei 2002) and their biases in base and amino acid compositions (Moran 1996; Clark, Moran, and Baumann 1999; Shigenobu et al. 2001; Palacios and Wernegreen 2002; Rispe et al. 2004). In fact, the production of conflicting topologies is frequent, as in the case of phylogenetic analyses with Bu. aphidicola genes, where more than two thirds of them did not support the sisterhood with E. coli and became basal to the γ-proteobacterial phylogeny (Itoh, Martin, and Nei 2002; Canback, Tamas, and Andersson 2004). Attempts may be made to solve these problems by using genome-based approaches (Wolf et al. 2002; Bapteste et al. 2004), which can be based on sequence information such as concatenated alignments (Hansmann and Martin 2000; Brown et al. 2001), supertrees (Sicheritz-Ponten and Andersson 2001; Bininda-Emonds 2004) or phylogenies with a putatively HGT-free core set of genes (Daubin, Gouy, and Perriere 2002), or on other comparative genome data, such as gene content (Fitz-Gibbon and House 1999) or gene order (Suyama and Bork 2001).

    The aims of this study were, first, the estimation of genome rearrangement distances (based on breakpoints and inversions) between pairs of γ-proteobacterial complete sequenced genomes. These distances were obtained from a subset of genes shared by all these genomes, which putatively did not contain HGT-acquired genes. We tried to analyze the movement of the genes that evolve slowly at the genome rearrangement level. For that reason, genes involved in HGT were not selected and only those shared by every genome were chosen. These genes are probably essential or functionally important, and their changes of position in the genome may be deleterious. We also determined which phylogenetic lineages had evolved faster or slower. And, finally, we used these distances to obtain a gene order–based phylogeny which was very similar to other known γ-proteobacterial phylogenies but presented the split of the endosymbiotic cluster as the main characteristic.

    Material and Methods

    Table of Orthology

    The first step to estimate the genome rearrangement distances was the construction of a table of orthology. Thirty γ-proteobacterial genomes were selected for our analyses (table 1). They corresponded to those completely sequenced and reported before August 2003, except the genome of Coxiella burnetii RSA 493. This latter genome was not included in our analysis because it lacked a small proportion of genes that were conserved in the rest of the genomes. We considered that including it would produce a small but important decrease in the total number of genes to compare. The aim of the table was to include only those genes that were present in all the genomes, either as a gene or a pseudogene, and to remove any gene acquired by HGT in at least one of the 30 genomes. Pseudogenes were included because estimating rearrangement distances requires only the position of the gene (or pseudogene) and its transcriptional orientation.
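    The selection rule stated above (present in every genome, flagged as HGT in none) can be sketched as a set operation. This is only a minimal illustration: the function name, the dictionary layout, and the toy genome contents are assumptions made for the example, and the manual checks described in this section (pseudogene detection, paralog resolution) are not modeled.

```python
def shared_hgt_free_genes(genes_by_genome, hgt_by_genome):
    """Return genes present in every genome and flagged as HGT in none.

    genes_by_genome: dict mapping genome name -> set of gene names
                     (genes and pseudogenes alike).
    hgt_by_genome:   dict mapping genome name -> set of gene names
                     suspected of horizontal acquisition in that genome.
    Both layouts are hypothetical, chosen only for this sketch.
    """
    # A gene must have an ortholog in all genomes to be retained.
    shared = set.intersection(*genes_by_genome.values())
    # A single suspected HGT event in any genome removes the gene.
    flagged = set().union(*hgt_by_genome.values())
    return shared - flagged


# Toy input, not data from the study.
toy_genes = {
    "eco": {"rpoB", "rpoC", "rho", "geneX"},
    "BAp": {"rpoB", "rpoC", "rho"},
    "vch": {"rpoB", "rpoC", "rho", "geneX"},
}
toy_hgt = {"vch": {"rho"}}
core = shared_hgt_free_genes(toy_genes, toy_hgt)
```

    In the toy input, geneX is dropped because it is missing from one genome, and rho is dropped because it is flagged in one genome.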

    Table 1 γ-Proteobacterial Genomes

    We started our analysis with the tables of orthology that were obtained from two previous studies. In the first, the genomes of Bu. aphidicola BAp, E. coli K12, and Vibrio cholerae were compared, removing paralogous or xenologous genes (Silva, Latorre, and Moya 2001). In the second, the genomes of five insect bacterial endosymbionts were compared to detect the orthologous genes (Gil et al. 2003). The presence of those genes detected in the seven previous genomes as either genes or pseudogenes was searched for in the remaining γ-proteobacterial genomes. The analysis was carried out in the Microbial Genome Database for Comparative Analysis (MBGD) (Uchiyama 2003) with a maximum Blast score of 0.0001 and a phylocut value of 0.4. The gaps (absence of a gene) detected in several genomes were treated in several ways to confirm the absence of an orthologous gene or pseudogene. In the case of absence confirmation, the gene was removed from the orthologous table. Genomes were searched for a similar sequence to the absent gene with the amino acid encoded sequence by using the TBlastN algorithm (Altschul et al. 1997) in the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa et al. 2004). In some cases, the detection of a sequence with significant similarity indicated incorrect annotations and in others the presence of a pseudogene. Both situations were analyzed in detail before taking the final decision to maintain or remove the gene from the table.

    When the MBGD comparison rendered more than one gene in any of the genomes, we analyzed the phylogenetic tree and the genomic context of the genes to decide which was the orthologous or paralogous gene and whether the gene was to be maintained in the table. In the case of two true orthologous genes being present in a genome as a consequence of a recent duplication, we retained one of the copies at random and removed the other.

    The decision to remove a gene when a putative HGT event had taken place in at least one genome was difficult because, after more than 600 Myr of evolution of γ-Proteobacteria, some conflicting phylogenies in fast-evolving genes were not related to HGT events but to problems with the phylogenetic methods for inferring the actual topology.

    We looked at whether any gene present in our orthologous table was included among the putative HGT events detected in the Horizontal Gene Transfer Database (Garcia-Vallve et al. 2003). Most of our genes were not included, and only four genomes (Bu. aphidicola BSg, Pasteurella multocida, Pseudomonas aeruginosa, and Xylella fastidiosa) showed more than two genes in these lists. We made phylogenies for these genes and looked at the genomic context, but we were unable to remove any of them with great confidence. In fact, the six genes observed in Bu. aphidicola BSg are artifacts associated with the special base composition of this species because Bu. aphidicola is refractory to HGT events (Tamas et al. 2002; Silva, Latorre, and Moya 2003). To confirm the absence of HGT events in our table, we checked whether any of our genes were present in a recent analysis where candidate xenologous genes were detected with several criteria including phylogenetic validation (Medrano-Soto et al. 2004). Our final table contained 244 genes (see Supplementary Material online) and was putatively free of xenologous genes, although it is not impossible that a very small number of them were not detected.

    Breakpoint, Inversion, and Amino Acid Substitution Distances

    Two genome rearrangement distances were estimated between genome pairs. The disruption of gene order (breakpoint) was introduced early into computational studies (Nadeau and Taylor 1984). This concept served to estimate the breakpoint (BP) distance (the number of pairwise gene adjacencies present in one genome but absent in the other). Subsequently, BP distances were used to reconstruct phylogenies by several methods. In our study, for each species or strain, its genome can be regarded as a small circular molecule composed of 244 genes in which the order and transcriptional orientation are the same as those observed in the real complete genome. When comparing two genomes (those composed of 244 genes), we took one of them as a reference (genome A) with its genes ordered and compared it with the other (genome B). The first step was to change the sign of the genes in genome B when they were not in the same transcriptional orientation as in the reference genome. We then checked whether the adjacent genes in A also maintained their adjacency in B. Taking two adjacent genes g1 and g2 in A, we considered that a breakpoint had occurred when they did not appear consecutively in genome B as either the pair (g1 g2) or (–g2 –g1). All the breakpoints between two genomes were counted, and we obtained the BP distance as the number of breakpoints transforming one genome into the other. This distance may be normalized by dividing this number by the total number of genes.
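    The breakpoint count described above can be sketched as follows. This is a minimal illustration and not the code used in the study: the function names are hypothetical, genomes are assumed to be given as lists of signed integers (sign = transcriptional orientation), and both genomes are treated as circular.

```python
def breakpoint_distance(genome_a, genome_b):
    """Count adjacencies of circular signed genome_a that are absent from genome_b."""
    if sorted(map(abs, genome_a)) != sorted(map(abs, genome_b)):
        raise ValueError("both genomes must contain the same set of genes")
    n = len(genome_b)
    adjacencies_b = set()
    for i in range(n):
        left, right = genome_b[i], genome_b[(i + 1) % n]  # wrap around: circular
        adjacencies_b.add((left, right))
        # The same adjacency read on the opposite strand: (g1 g2) == (-g2 -g1).
        adjacencies_b.add((-right, -left))
    m = len(genome_a)
    return sum(
        (genome_a[i], genome_a[(i + 1) % m]) not in adjacencies_b
        for i in range(m)
    )


def normalized_distance(distance, gene_count):
    """Divide a rearrangement distance by the number of genes compared."""
    return distance / gene_count


# Toy example: one inversion of genes 2 and 3 in a five-gene circular genome.
reference = [1, 2, 3, 4, 5]
inverted = [1, -3, -2, 4, 5]
toy_bp = breakpoint_distance(reference, inverted)
```

    In the toy example the adjacencies (1 2) and (3 4) are lost while (2 3) survives as (–3 –2), so a single inversion contributes two breakpoints. Rotating the circular genome or reading it entirely on the other strand leaves the count at zero.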

    Inversion (INV) distances are defined as the minimum possible number of inversions (reversals) needed to transform one genome into the other. We estimated these distances with the genome rearrangement Web server GRIMM (Tesler 2002). The genomes were considered as unichromosomal circular signed permutations. INV distances can also be normalized by dividing the inversion number between two genomes by the number of genes.
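    To make the definition concrete, the sketch below finds the minimum number of inversions by exhaustive breadth-first search. It is not the algorithm behind GRIMM and is only practical for a handful of genes; it also assumes linear rather than circular genomes to keep the example short. The function name is hypothetical.

```python
from collections import deque


def inversion_distance_bfs(source, target):
    """Minimum number of signed inversions turning source into target.

    Exhaustive search over linear signed permutations; exponential in the
    number of genes, so usable only for toy examples.
    """
    source, target = tuple(source), tuple(target)
    if sorted(map(abs, source)) != sorted(map(abs, target)):
        raise ValueError("both genomes must contain the same set of genes")
    if source == target:
        return 0
    n = len(source)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, steps = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                # An inversion reverses the segment i..j and flips every sign in it.
                segment = tuple(-g for g in reversed(perm[i:j + 1]))
                nxt = perm[:i] + segment + perm[j + 1:]
                if nxt == target:
                    return steps + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
    raise RuntimeError("target not reachable")  # cannot happen for valid input


one_step = inversion_distance_bfs([1, -3, -2, 4], [1, 2, 3, 4])
swap = inversion_distance_bfs([2, 1], [1, 2])
```

    The second call shows why the signs matter: exchanging two adjacent genes without changing their orientation takes three inversions, not one.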

    Amino acid substitution distances were calculated assuming a specific empirical model of protein evolution, using a substitution matrix with scores for all the possible exchanges of one amino acid for another inferred from the protein sequences data set. For our study, we used protein sequences encoded by 10 genes (rpoC, rpoB, rho, rpoA, rpsC, rpsD, nusG, rpsG, rplP, and rpsK) present in the 30 γ-proteobacterial genomes. They were selected by using the following protocol. First, we searched for slowly evolving proteins by selecting those with more than 80% identity between Bu. aphidicola BAp and E. coli K12. Second, we selected those involved in information transfer, and third, we selected the 10 with the largest amino acid sequences. The amino acid sequences of these proteins were obtained from MBGD and aligned with the ClustalX program (Thompson et al. 1997). Alignments were edited using the G-BLOCKS program (Castresana 2000) to select the most conserved sites of the alignment, deleting highly variable sites and sequence gaps with the goal of selecting the amino acid positions with the greatest phylogenetic information. Parameters were set to 19 for conserved positions, 22 for flanked positions, 1 for the maximum number of contiguous nonconserved positions, and 10 for the minimum block size.

    The 10 alignments were concatenated into a single one composed of 3,670 amino acid positions. This alignment was used to infer amino acid substitution distances with maximum likelihood (ML) using Tree-Puzzle 5.2 (Strimmer and vonHaeseler 1996; Schmidt et al. 2002) with the VT model of evolution (Muller and Vingron 2000).

    Relative INV Distances

    To estimate whether the rates of rearrangement by inversions were constant among lineages, lineages were compared as follows. Considering two species A and B which diverged from an ancestor O, and an out-group species C, we determined the ratio dAO/dBO. To calculate this ratio, we used the pairwise INV distances among species A, B, and C. To estimate if lineage A had had a different inversion rate from lineage B, we estimated dAO/dBO as (dAC – dBC + dAB)/(dBC – dAC + dAB). Among all possible pairwise comparisons, we decided to select as species A one representative of each endosymbiont species, Vibrionaceae spp. and Pasteurellaceae spp. (BAp, bfl, wgl, vch, vvu, vpa, hdu, hin, and pmu) and to compare them with the group of free-living enterics (species B: eco, sfl, sfx, stm, stt, sty, ype, and ypk) in order to find out whether the former lineages were evolving, on average, faster or slower than the latter. The genomes of pae, ppu, pst, and son were used as out-groups (species C). We estimated the dAO/dBO ratios as the average of those obtained with the four out-group species. BSg and BBp were not included because the order of their genomes was identical or almost identical to that from BAp. The genomes of ece, ecs, and ecc were also not included because they were identical to the one from eco.
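    The ratio above follows if the three pairwise distances are assumed to be additive along the tree: dAC = dAO + dOC, dBC = dBO + dOC, and dAB = dAO + dBO, so that dAC – dBC + dAB = 2dAO and dBC – dAC + dAB = 2dBO. A minimal sketch of the calculation, with hypothetical function names and a distance table keyed by unordered species pairs (both are assumptions of this example):

```python
def relative_rate(d_ac, d_bc, d_ab):
    """Estimate dAO/dBO from the three pairwise distances among A, B, and C."""
    numerator = d_ac - d_bc + d_ab    # equals 2 * dAO when distances are additive
    denominator = d_bc - d_ac + d_ab  # equals 2 * dBO when distances are additive
    if denominator <= 0:
        raise ValueError("non-positive estimate for the branch leading to B")
    return numerator / denominator


def mean_relative_rate(species_a, species_b, outgroups, distances):
    """Average the dAO/dBO estimates obtained with each out-group species."""
    def dist(x, y):
        return distances[frozenset((x, y))]

    rates = [
        relative_rate(dist(species_a, c), dist(species_b, c), dist(species_a, species_b))
        for c in outgroups
    ]
    return sum(rates) / len(rates)


# Toy additive tree (not data from the study): dAO = 6, dBO = 3, dOC = 10.
toy_rate = relative_rate(d_ac=16, d_bc=13, d_ab=9)
```

    With the toy values the estimate recovers dAO/dBO = 6/3. With real INV distances additivity holds only approximately, which is one reason for averaging over several out-groups.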

    Phylogeny

    The INV distance matrix obtained with GRIMM was used to reconstruct the γ-proteobacterial phylogenetic tree using the Fitch-Margoliash (FM) (Fitch and Margoliash 1967) and the neighbor-joining (NJ) (Saitou and Nei 1987) methods implemented in the FITCH and NEIGHBOR programs, respectively, from the PHYLIP software package (http://evolution.genetics.washington.edu/phylip.html). The input order of species was randomized, and global rearrangements were made to ensure that we obtained the optimum tree and that no species had fallen into a suboptimal region of the space of all possible trees.
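    A pairwise distance matrix has to be written in PHYLIP's distance-matrix layout before FITCH or NEIGHBOR can read it. The sketch below assumes the standard square layout (species count on the first line, then one row per species with a name padded to 10 characters followed by the distances). The function name and the toy values are illustrative only and are not taken from the study.

```python
def to_phylip_square(names, matrix):
    """Format a symmetric distance matrix in square PHYLIP layout."""
    if len(names) != len(matrix) or any(len(row) != len(names) for row in matrix):
        raise ValueError("matrix must be square and match the list of names")
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        label = name[:10].ljust(10)  # PHYLIP reads a fixed-width 10-character name
        lines.append(label + " ".join(f"{value:8.4f}" for value in row))
    return "\n".join(lines) + "\n"


# Toy values for three species abbreviations; not results from the paper.
toy_names = ["eco", "stm", "ype"]
toy_matrix = [
    [0.0, 1.0, 12.0],
    [1.0, 0.0, 13.0],
    [12.0, 13.0, 0.0],
]
phylip_text = to_phylip_square(toy_names, toy_matrix)
```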

    Phylogenetic reconstruction with the BP distance matrix was carried out with the same methods and conditions as those used with the INV distance matrix.

    To assess the reliability of the phylogenetic reconstruction with BP and INV distances, we could not do conventional bootstrap analyses because we did not work with sequence data with nucleotide or amino acid positions acting as characters, and randomly altering the order would render the data meaningless. To solve this problem we applied a jackknife resampling method that consisted of the random selection of 122 genes out of the initial 244 and the removal of the remaining genes from the genomes. We then produced new signed permutations, this time of 122 elements instead of the previous 244. Once we had the 30 genomes of 122 signed elements, we repeated the previous analyses to obtain INV and BP distance matrices. We implemented 100 jackknife random samples, obtaining 100 pairwise distance matrices for inversions and breakpoints. These 100 matrices were loaded into the FITCH and NEIGHBOR programs to obtain 100 breakpoint/inversion phylogenetic trees, and finally, the CONSENSE program of the PHYLIP software package was used to obtain a majority rule consensus tree with the numbers at each node reflecting the percentage of times that the clade defined by that node appears in the 100 jackknife trees. These values were assigned to the nodes of the initial 244-gene trees.
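    One jackknife replicate of the kind described above can be sketched as follows. The function name, the dictionary layout, and the relabeling of retained genes as 1..k are assumptions of this example; the essential point is that the same random subset of genes is kept in every genome, with order and orientation preserved.

```python
import random


def jackknife_genomes(genomes, sample_size, rng):
    """Keep the same random subset of genes in every signed permutation.

    genomes: dict mapping genome name -> list of signed integers over genes 1..n.
    Returns new permutations over 1..sample_size with order and signs preserved.
    """
    gene_count = len(next(iter(genomes.values())))
    kept = sorted(rng.sample(range(1, gene_count + 1), sample_size))
    new_label = {gene: index + 1 for index, gene in enumerate(kept)}
    resampled = {}
    for name, order in genomes.items():
        resampled[name] = [
            (1 if gene > 0 else -1) * new_label[abs(gene)]
            for gene in order
            if abs(gene) in new_label  # genes outside the sample are removed
        ]
    return resampled


# Toy genomes of 10 genes, halved to 5 (the study halves 244 genes to 122).
toy_genomes = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "B": [3, -2, -1, 4, 5, -10, -9, -8, -7, -6],
}
replicate = jackknife_genomes(toy_genomes, 5, random.Random(0))
```

    Repeating this with 100 different random subsets and recomputing the distance matrices each time yields the set of trees from which the node support values are derived.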

    The concatenated amino acid alignment described in Breakpoint, Inversion, and Amino Acid Substitution Distances was used to obtain a sequence-based phylogenetic tree for γ-Proteobacteria by ML using Tree-Puzzle 5.2 software and the quartet puzzling algorithm (Strimmer and vonHaeseler 1996; Schmidt et al. 2002). Options included exact parameter estimation by quartet puzzling plus NJ, VT model of evolution (Muller and Vingron 2000), heterogeneity rate (one invariable and eight gamma rates), and 10,000 puzzling steps. Because a few nucleotide indeterminations were detected in the sequences of the selected genes from E. coli CFT073 and E. coli O157:H7 EDL933, these genomes were removed from the phylogenetic analysis.

    Results

    Orthologous Genes Shared by γ-Proteobacterial Complete Genomes

    The first step in our analyses was to build a table of orthologous genes for 30 γ-proteobacterial complete genomes. To be included in the table, a gene had to have an ortholog in every genome, either as a gene or as a pseudogene. We tried to remove genes that had been putatively acquired by HGT in any of the genomes. This was very difficult because many genes produced abnormal phylogenies for several reasons not related to HGT. For that reason, we were very conservative. The final list of 244 genes may be obtained from the Supplementary Material online.

    Correlation of Breakpoint and INV Distances

    A matrix with the BP distances for the 30 genomes was obtained. The order of the genes in the four strains of E. coli was identical, leading to distances equal to zero. The same situation arose between the Bu. aphidicola strains BAp and BSg. The maximum distances were obtained between Haemophilus ducreyi and X. fastidiosa (162 breakpoints). This value corresponds to a normalized distance of 0.664. A matrix was also constructed for the INV distances with zero as the minimal value for the same pairwise comparisons and a maximum value for the distance between H. ducreyi and Xanthomonas axonopodis (159 inversions, with a normalized distance of 0.652).

    These two types of distances were expected to slightly underestimate the actual number of events because of the possibility of multiple breakpoints in the same place or because the optimal scenario to transform one genome into another via inversion events may provide a number of steps which are smaller than the actual one. However, we believe that these differences will be practically negligible because simulations comparing the estimated and the actual distances show very similar values for our range of normalized distances (Bourque and Pevzner 2002; Moret et al. 2002b). In fact, the expected underestimation for our maximum value, in the simulation for INV distances (Bourque and Pevzner 2002), was around three inversions.

    A strong correlation between BP and INV distances was detected (fig. 1, correlation coefficient r = 0.996). It indicated that inversion is the most common rearrangement event affecting the γ-proteobacterial genomes. Although other kinds of rearrangement such as transpositions may take place, their contribution would be very slight in this gene subset.

    FIG. 1.— Normalized breakpoint (BP) and inversion (INV) distance comparison between pairs of the γ-proteobacterial genomes. The discontinuous line represents the regression line between the two distance measurements (correlation coefficient r = 0.996).

    Genome Rearrangement Versus Amino Acid Substitution Distances Through γ-Proteobacterial Evolution

    In order to detect whether genome rearrangement rates had been constant between and within the different evolutionary γ-proteobacterial lineages, a comparison was carried out between sequence-based distances and BP and INV distances. Pairwise ML distances were computed for all the genomes from an amino acid concatenated alignment, and a phylogenetic tree was reconstructed by ML (fig. 2). The tree topology showed that the endosymbiotic species formed a well-supported monophyletic group. The clustering of these species in phylogenetic analyses was in agreement with previous observations (Sauer et al. 2000; Gil et al. 2003). The ML amino acid substitution distances were compared to BP and to INV distances (fig. 3; the BP distance plot is not shown because it is almost identical to the INV distance plot). In general, the progressive increase in sequence distances between two genomes is associated with the increase in their rearrangement-based distances, but several groups of distances presented abnormal behavior. In the first group, H. ducreyi, Haemophilus influenzae, and Pa. multocida showed large BP and INV distances in all pairwise comparisons, indicating that the lineage of Pasteurellaceae is evolving at a fast rearrangement rate. A second group was constituted by the distances between Bl. floridanus and free-living enterics. The reason for the abnormal position of these distances was the acceleration of the sequence substitution rate in the endosymbiotic bacterial lineage, which led to large ML distances. A third group was formed by the distances among the three Bu. aphidicola strains, with little or no rearrangement but large ML distances. These results indicate that, for some species, it is possible to observe, for a specific group of genes or proteins, a constant increase in the number of rearrangements and amino acid substitutions over time. However, others behave heterogeneously, with situations of an almost null genome rearrangement rate (within the Bu. aphidicola lineage) or an extremely high rate (Pasteurellaceae). The same situation may be observed for the sequence substitution rates, which are extremely high in endosymbiotic species.

    FIG. 2.— Phylogenetic relationships inferred by maximum likelihood using Tree-Puzzle 5.2 and an alignment of the concatenated amino acid sequences from the proteins encoded by rpoC, rpoB, rho, rpoA, rpsC, rpsD, nusG, rpsG, rplP, and rpsK genes. Values at nodes indicate proportion of quartets supporting the corresponding inner branch, as determined by the quartet puzzling method. See species abbreviations in table 1.

    FIG. 3.— Comparison of maximum likelihood (ML) distances to normalized inversion (INV) distances between pairs of γ-proteobacterial genomes. Symbols show the different groups observed in the graph. Open squares represent distances from hdu, hin, and pmu to the rest of the studied species. Open triangles are distances from bfl to free-living enterics (eco, ecs, sfx, sfl, stm, stt, sty, ype, ypk). Crosses are distances among the three Buchnera aphidicola strains (BSg, BAp, and BBp). Closed rhombuses represent the rest of the pairwise distance comparisons. See abbreviations in table 1.

    Relative INV Distances

    In order to confirm the faster evolutionary rearrangement rate of Pasteurellaceae and to characterize the situation of the three endosymbiotic species compared with free-living Enterobacteriaceae, we carried out a relative rate approach comparing the INV distance rates in the lineages of Bu. aphidicola, Bl. floridanus, W. glossinidia, V. cholerae, Vibrio vulnificus, Vibrio parahaemolyticus, H. ducreyi, H. influenzae, and Pa. multocida with those of the free-living enteric bacteria E. coli, Salmonella typhimurium LT2, Salmonella enterica subsp. enterica serovar Typhi str. CT18, Sa. enterica subsp. enterica serovar Typhi Ty2, Shigella flexneri 2a str. 301, Shi. flexneri 2a str. 2457T, Yersinia pestis CO92, and Y. pestis KIM, after the divergence of the two clusters. Shewanella oneidensis MR-1, Ps. aeruginosa, Pseudomonas putida, and Pseudomonas syringae were used as out-group species (fig. 4). The results showed that the branches leading to H. ducreyi, H. influenzae, and Pa. multocida were evolving faster than free-living enterics at average relative rates of 2.02, 1.90, and 1.64, respectively. This agrees with the abnormal position of their distances in the figure 3 plot. The behavior of the three endosymbiotic species was not identical. While Bu. aphidicola BAp and W. glossinidia were evolving, on average, at a slightly higher rate than free-living enterics (1.38 and 1.35), Bl. floridanus has evolved at almost the same rate as free-living enterics (1.04). Finally, Vibrio lineages, included as a control, have evolved at a slightly lower rate than free-living enterics (V. cholerae, 0.86; V. vulnificus, 0.81; and V. parahaemolyticus, 0.73).

    FIG. 4.— Relative inversion distance rate estimation. The inversion distance in the branch from ancestor O to the problem species A (dAO) was compared with that of the branch from ancestor O to the reference species B (dBO) by using the inversion distances to the out-group species C (see right tree). Problem species are BAp, bfl, wgl, vch, vvu, vpa, hdu, hin, and pmu. Reference species are eco (1), stm (2), sty (3), stt (4), sfl (5), sfx (6), ype (7), and ypk (8), and out-group species are son, pae, ppu, and pst. The number over each column corresponds to the reference species compared. The height of each column is the dAO/dBO ratio, estimated as the average of the values obtained after the comparisons with the four out-group species. The species name for each abbreviation may be found in table 1.

    To interpret these relative rates correctly, we must bear in mind that they average the number of chromosomal rearrangements over the whole period after divergence from the free-living enteric cluster. Because we know that during the last 100–150 Myr of evolution the Bu. aphidicola genomes have experienced a minimal number of rearrangements, the average relative rate of 1.38 requires a more precise interpretation. Assuming that the divergence of Bu. aphidicola from the free-living enteric cluster occurred at some moment between 200 and 300 MYA, we can consider that in a first phase of evolution, during the adaptation to endosymbiosis, the rearrangement rate was much higher (around 2.76, that is, twice the average, if we assume that the period between the divergence from E. coli and the divergence of the three strains was as long as the period elapsed since then). This was followed, in a second phase of stability, by a rate close to zero as a consequence of the inability to produce and fix new rearrangements in the genome.
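    The figure of 2.76 can be checked with a two-phase average: if the first phase covers a fraction f of the time since divergence, the average rate is f·r1 + (1 – f)·r2, so r1 = (average – (1 – f)·r2)/f. The sketch below assumes f = 0.5 and r2 = 0, the simplification used in the text; the function and parameter names are hypothetical.

```python
def first_phase_rate(average_rate, first_phase_fraction, second_phase_rate=0.0):
    """Solve average = f * r1 + (1 - f) * r2 for the first-phase rate r1."""
    if not 0 < first_phase_fraction <= 1:
        raise ValueError("first_phase_fraction must be in (0, 1]")
    remainder = (1 - first_phase_fraction) * second_phase_rate
    return (average_rate - remainder) / first_phase_fraction


# Two phases of equal length, with no rearrangements in the second one.
early_rate = first_phase_rate(1.38, 0.5)
```

    If the stable phase were a smaller share of the total time (for example 100 Myr out of 300), the same formula would give a lower first-phase rate, so 2.76 depends directly on the equal-length assumption.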

    Phylogenetic Reconstruction Based on BP and INV Distances (Gene Order Phylogenies)

    A phylogenetic reconstruction based on genome rearrangement distances was carried out with two aims. First, to detect the periods of faster or slower evolutionary rates according to the length of branches on the tree, and second, to use these distances to determine the relationship between the three endosymbiotic species and their position within the γ-proteobacterial phylogeny.

    The NJ and the FM methods were used with BP distances to reconstruct the phylogeny (fig. 5A). Both methods inferred the same topology. In order to obtain supporting values for each node, BP distances were estimated after obtaining 100 random samples with genomes containing half the number of genes. The inferred topology was similar to that obtained based on amino acid sequences (fig. 2), but with several important differences.

    FIG. 5.— Phylogenetic relationships between 30 γ-proteobacterial genomes inferred from a breakpoint distance (A) and an inversion distance matrix (B). Values at nodes reflect the percentage of times the clade defined by that node appears in the 100 jackknife trees. Distance-based phylogenetic methods were Fitch-Margoliash (upper values) and neighbor joining (lower values). See species abbreviations in table 1. The bar represents 20 breakpoints (A) or 20 inversions (B).

    The same approach was carried out with the INV distances (fig. 5B), yielding a topology even closer to the sequence-based one (fig. 2). The Shi. flexneri strains moved closer to E. coli, and She. oneidensis slightly changed its position.

    The most important differences from the sequence-based topology affected the position of the Pasteurellaceae cluster and the position of the bacterial endosymbionts. In the gene order phylogeny, the two Haemophilus spp. and Pa. multocida acquired a basal position between the pseudomonads and the out-groups. The second discordance with the sequence-based topology was the split of the monophyletic clade of the three endosymbiotic species. In addition, Bl. floridanus was positioned closer to E. coli than to Y. pestis. The Bu. aphidicola and W. glossinidia lineages maintained their position as out-groups of the E. coli–Y. pestis cluster, but as independent lineages. The gene order phylogeny reconstructed the relationships within the Vibrio and Pseudomonas genera well, although it was unable to produce a clade with the three Salmonella genomes. This is not surprising given the extremely low phylogenetic signal: a single inversion separates the genomes of E. coli and Sa. typhimurium LT2.

    The branch lengths in the INV distance phylogeny show that the fixation of inversions in the genome is an irregular process, with periods of stasis and periods of acceleration. One example is the high inversion rate in the Shi. flexneri lineage, with 9 and 11 inversions separating the genomes of the two strains from that of E. coli and 4 inversions separating the two strains from each other. On the other hand, the genomes of E. coli and Sa. typhimurium are separated by a single inversion, in spite of the estimated 100 Myr of divergence (Lawrence and Ochman 1998). The two Y. pestis strains are another example of recent fast evolution, with 14 inversions separating the two genomes in spite of the small ML distance (0.0003), which indicates a very recent divergence. Another example is the long branches between the Pasteurellaceae species, which show that they evolved, and continue to evolve, at a fast genome rearrangement rate.

    Discussion

    The availability of the sequences of many complete γ-proteobacterial genomes makes this group very useful for the study of rearrangements through evolution. Several factors make an overall analysis of gene order changes difficult: the large divergence time of many γ-proteobacterial species, the different sizes and gene contents of their genomes, the presence of duplications, the high frequency of HGT events, and the existence of restrictions on gene order change (Campo et al. 2004). To reduce the complexity of this study, as is done with sequence data, we selected for our genome rearrangement analyses a set of genes that are putatively slowly evolving and free of xenologous genes. The restrictive criterion that each gene must be shared by all 30 analyzed bacterial genomes means that we are probably working with a set close to the genome core (Jain, Rivera, and Lake 1999).

    To estimate the number of rearrangements between genomes we calculated two distances. The BP distance measures the number of breakpoints separating two genomes, which are produced as a consequence of inversions or transpositions. BP distances may underestimate the number of events because a breakpoint may be reused by a new rearrangement. INV distances measure the minimal number of inversions needed to transform one genome into another. Because INV distances do not consider transpositions, their use is only appropriate when inversions are the most important events fixed in the genome through evolution. We consider this distance appropriate, first, because inversions were detected as the main type of rearrangement in interspecific γ-proteobacterial genome comparisons (Hughes 2000) and, second, because we detected a high correlation between INV and BP distances. Pairwise INV distances, as we define them, are minimal distances, so in general the true evolutionary distance is longer. Better estimates may be obtained by solving the so-called median problem. For three genomes, for example, the median is the genome that minimizes the sum of the distances between itself and each of the three. Algorithms that attempt to solve these problems require great computing capacity and, for that reason, are usually applied to genomes with a small number of genes (Bourque and Pevzner 2002; Moret et al. 2002a; Tang and Moret 2003a). The INV distances estimated in our study should be quite close to the actual number of rearrangements, provided that the number of steps is much smaller than the number of genes (Bourque and Pevzner 2002).
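
    The relationship between the two distances, and the effect of breakpoint reuse, can be seen on a toy genome. The sketch below is an illustration only: it assumes small linear signed gene orders compared with the identity order 1..n, and it finds the minimal number of inversions by exhaustive breadth-first search, which is feasible only for a handful of genes and is not the algorithm used for genome-scale data. All function names are hypothetical.

```python
from collections import deque

def one_inversion_neighbours(perm):
    """Yield every signed gene order reachable from perm by one inversion."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            segment = tuple(-g for g in reversed(perm[i:j + 1]))
            yield perm[:i] + segment + perm[j + 1:]

def inversion_distance(source, target):
    """Minimal number of inversions from source to target (toy sizes only)."""
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        for nxt in one_inversion_neighbours(perm):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("the two gene orders do not share the same genes")

def breakpoints_vs_identity(perm):
    """Breakpoints of a linear signed order relative to 1..n, with the
    chromosome ends capped by 0 and n + 1."""
    ext = (0,) + tuple(perm) + (len(perm) + 1,)
    return sum(1 for a, b in zip(ext, ext[1:]) if b - a != 1)

identity = (1, 2, 3, 4)
one_step = (1, -3, -2, 4)    # one inversion of genes 2..3
two_steps = (1, -4, 2, 3)    # a second inversion reusing one breakpoint
```

    In this example, one inversion produces two breakpoints, but the second inversion reuses an existing breakpoint, so two inversions leave three breakpoints rather than four. Counting breakpoints alone would therefore understate the number of events.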

    The comparison of the rearrangement and sequence distances (fig. 3) and the branch lengths in the phylogenetic tree (fig. 5) show that, although rearrangement distances increase with time, rearrangements occur at a heterogeneous rate, with strong variations between lineages and during the evolution of each lineage. A great acceleration of the three Pasteurellaceae lineages was detected. We have estimated that, on average, they evolve at a relative rate up to about twice that of free-living enteric bacteria. The natural competence of these species (Dubnau 1999) is probably responsible for this acceleration. The DNA uptake system in these bacteria requires the presence of a specific short sequence, called the uptake signal sequence, for the binding and uptake of a DNA fragment. The high number of copies in the genomes of Pa. multocida (927) and H. influenzae (1,471) leads to the preferential uptake of DNA from close relatives (Bakkali et al. 2004). These genomes will therefore contain many HGT genes that are difficult to identify because of the similar nucleotide characteristics of the donor and recipient species. Consequently, a substantial proportion of the Pasteurellaceae genes should not have been included, which makes this family unsuitable for gene order phylogenies.

    Heterogeneity was also observed in endosymbiont lineages. The three Bu. aphidicola genomes allow us to partition the moderately high average relative rate (1.38) into two extremely different periods: the last 100 Myr of almost complete stasis and the initial period after the divergence from E. coli, with a high rate (at least 2.76). These results confirm that the high number of rearrangements observed between E. coli and Bu. aphidicola occurred more than 100 MYA, most of them probably in a short period during the adaptation of the genome to endosymbiosis, owing to the relaxation of the restrictions on the fixation of genome rearrangements. Because only a single genome is available for each of the other two endosymbiotic species, the frequency of rearrangements during the evolution of their lineages is less clear. However, it is reasonable to assume that W. glossinidia behaved like Bu. aphidicola, given the similar average relative rate (1.35). On the other hand, Bl. floridanus behaved very differently, with an average genome rearrangement rate similar to that of free-living enterics. Because we do not know whether this bacterium has had a genome stability phase like that of Bu. aphidicola, we cannot rule out the possibility that some acceleration in the genome rearrangement rate took place during the adaptation to symbiosis.

    Genome-based phylogenetic approaches may be classified into three groups: gene content methods, sequence methods, and gene order methods. Gene content methods (Snel, Bork, and Huynen 1999; Bapteste et al. 2004) require two steps. First, they must establish the orthology between genes, and second, they must convert the shared or unshared gene data into a tree structure. These methods may be strongly affected when the number of genes in the compared species is markedly different. Several methods have been used to try to overcome this problem, but Bu. aphidicola is still often placed incorrectly (Wolf et al. 2002; Dutilh et al. 2004). Genome-scale sequence-based methods help resolve incongruent phylogenies (Rokas et al. 2003). They have recently been used to reconstruct the phylogenetic relationships among the three bacterial endosymbiotic species mentioned above and the rest of the γ-proteobacterial species. In all cases, both the monophyly of the endosymbionts and their intermediate position in the γ-proteobacterial tree, between the E. coli–Y. pestis and the Pasteurellaceae clades, were inferred (Gil et al. 2003; Lerat, Daubin, and Moran 2003; Canback, Tamas, and Andersson 2004). Gene order data have recently been introduced in phylogenetics and are considered to be especially suitable for resolving the phylogeny of closely related species (Suyama and Bork 2001). Problems associated with these methods are the diversity in the shape and number of chromosomes, the variable number of genes, and the HGT problem (Blanchette, Kunisawa, and Sankoff 1999; Moret et al. 2001; Bourque and Pevzner 2002; Tesler 2002; Tang and Moret 2003b). Some of these methods have been included in SHOT, a Web server for gene content and gene order phylogenies (Korbel et al. 2002). However, at the moment the number of genomes is low and only one endosymbiont (Bu. aphidicola) has been included in the database. Its position in the gene order phylogeny obtained with default parameters was at the base of the γ-Proteobacteria (Korbel et al. 2002).

    For the resolution of this phylogeny with our gene order data, we used a set of 244 genes that were present in all analyzed genomes as either genes or pseudogenes. This number is slightly higher than the 205 orthologous gene sets used in an early sequence-based γ-proteobacterial phylogeny (Lerat, Daubin, and Moran 2003). Our reconstruction, using several distance matrix–based methods, rendered results very similar to other γ-proteobacterial phylogenies, with the discordant position of the family Pasteurellaceae probably being a consequence of a large set of HGT genes that are not easily detected. Our most surprising result was the breakage of the monophyly of the three insect bacterial endosymbionts. Most of the sequence-based phylogenies tend to infer the sisterhood of these species, which is what we would expect if the phenomenon of long-branch attraction (Nei 1996) had taken place, because not only do the long endosymbiont branches join together but also the two longest branches (Bl. floridanus and W. glossinidia) stick together. In addition, the branch leading to Bu. aphidicola BBp, which is the fastest evolving among the three Bu. aphidicola lineages, tends in some phylogenies to separate from the other Bu. aphidicola strains and to join Bl. floridanus and W. glossinidia. The selection of the slow-evolving positions of genes and proteins is required to avoid this artifact. In addition, endosymbionts share characteristics, such as a low GC content or a bias toward AT-encoded amino acids (Moran 1996; Clark, Moran, and Baumann 1999), that would lead to a wrong clustering. Our gene order phylogeny did not show a cluster of the three endosymbiotic species. The effect of long-branch attraction is expected to be smaller because the acceleration of the branches leading to the three endosymbionts is, relative to free-living enteric bacteria, much smaller in the gene order phylogeny than in the sequence phylogeny (see figs. 2 and 5). Several incorrect estimations may affect our gene order phylogeny. First, both rearrangement distances are underestimated, but simulation studies indicate that the underestimation is very small (Bourque and Pevzner 2002; Moret et al. 2002b), especially for the intermediate values of the normalized distances among endosymbionts or between them and free-living enterics. Second, NJ and least squares are not the most up-to-date methods in phylogenetics, although they are still valuable tools.

    We consider that endosymbiont phylogeny is still an open question, but we believe that the endosymbiotic clade is an artifact, even though it is reconstructed in most genome-scale phylogenies. The use of ML methods with sequence data does not guarantee that the correct phylogeny will be reconstructed. In fact, recent simulation studies show that both distance and ML methods can become inconsistent, with long-branch attraction as one of the common forms of inconsistency (Susko, Inagaki, and Roger 2004), producing artifacts such as the joining of the fast-evolving microsporidian parasitic fungi lineage to the Archaea (Inagaki et al. 2004), and that ML and Bayesian Markov chain Monte Carlo methods can become strongly biased and statistically inconsistent when the rates at which sequence sites evolve change nonidentically over time (Kolaczkowski and Thornton 2004). Recently, γ-proteobacterial phylogenies based on two genes have shown that, under nonhomogeneous evolutionary rate models, Bu. aphidicola is separated from the other AT-rich endosymbiont bacterial species (Herbeck, Degnan, and Wernegreen 2004).

    Finally, the estimation of genome rearrangement distances and the reconstruction of gene order phylogenies may be applied to the study of other bacterial groups. For example, an analysis of the Firmicutes group (including the small genomes of Mycoplasma spp.) could be carried out with the approximately 138 genes shared by all sequenced Firmicutes genomes. This number would increase to 159 if the symbiont Phytoplasma asteris were removed.
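
    The effect of removing one genome on the size of the shared gene set can be sketched with plain set operations. This is a minimal illustration that assumes each genome is represented by a set of ortholog identifiers; the function names and the toy data are hypothetical and do not reproduce the Firmicutes figures.

```python
def core_genes(gene_sets):
    """Orthologs present in every genome (intersection of all sets)."""
    sets = list(gene_sets.values())
    return set.intersection(*sets) if sets else set()

def core_without(gene_sets, excluded):
    """Core gene set after dropping one genome from the comparison."""
    return core_genes({name: genes for name, genes in gene_sets.items()
                       if name != excluded})

def leave_one_out_gain(gene_sets):
    """How many genes the core gains when each genome is removed in turn."""
    base = len(core_genes(gene_sets))
    return {name: len(core_without(gene_sets, name)) - base
            for name in gene_sets}

# Toy ortholog sets (invented identifiers).
toy = {
    "genome1": {"a", "b", "c", "d"},
    "genome2": {"a", "b", "c"},
    "genome3": {"a", "d"},
}
gains = leave_one_out_gain(toy)
```

    A genome with a strongly reduced gene content, such as genome3 in the toy data, is the one whose removal enlarges the core the most.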

    Supplementary Material

    Supplementary materials are available at Molecular Biology and Evolution online (www.mbe.oupjournals.org).

    Acknowledgements

    Financial support was provided by a fellowship to E.B. (BI04-66 UV). This work was supported by grant BMC2003-00305 from the Ministerio de Ciencia y Tecnología (Spain) and grant Grupos03/204 from Generalitat Valenciana (Spain).

    References

    Akman, L., A. Yamashita, H. Watanabe, K. Oshima, T. Shiba, M. Hattori, and S. Aksoy. 2002. Genome sequence of the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia. Nat. Genet. 32:402–407.

    Altschul, S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

    Andersson, J. O., and S. G. Andersson. 2001. Pseudogenes, junk DNA, and the dynamics of Rickettsia genomes. Mol. Biol. Evol. 18:829–839.

    Bakkali, M., T. Y. Chen, H. C. Lee, and R. J. Redfield. 2004. Evolutionary stability of DNA uptake signal sequences in the Pasteurellaceae. Proc. Natl. Acad. Sci. USA 101:4513–4518.

    Bapteste, E., Y. Boucher, J. Leigh, and W. F. Doolittle. 2004. Phylogenetic reconstruction and lateral gene transfer. Trends Microbiol. 12:406–411.

    Baumann, P., L. Baumann, C. Y. Lai, D. Roubakhsh, N. A. Moran, and M. A. Clark. 1995. Genetics, physiology, and evolutionary relationships of the genus Buchnera—intracellular symbionts of aphids. Annu. Rev. Microbiol. 49:55–94.

    Bininda-Emonds, O. R. P. 2004. The evolution of supertrees. Trends Ecol. Evol. 19:315–322.

    Blanchette, M., T. Kunisawa, and D. Sankoff. 1999. Gene order breakpoint evidence in animal mitochondrial phylogeny. J. Mol. Evol. 49:193–203.

    Blattner, F. R., G. Plunkett III, C. A. Bloch et al. (14 co-authors). 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453–1474.

    Boucher, Y., C. J. Douady, R. T. Papke, D. A. Walsh, M. E. R. Boudreau, C. L. Nesbo, R. J. Case, and W. F. Doolittle. 2003. Lateral gene transfer and the origins of prokaryotic groups. Annu. Rev. Genet. 37:283–328.

    Bourque, G., and P. A. Pevzner. 2002. Genome-scale evolution: reconstructing gene orders in the ancestral species. Genome Res. 12:26–36.

    Brown, J. R., C. J. Douady, M. J. Italia, W. E. Marshall, and M. J. Stanhope. 2001. Universal trees based on large combined protein sequence data sets. Nat. Genet. 28:281–285.

    Buell, C. R., V. Joardar, M. Lindeberg et al. (41 co-authors). 2003. The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. Proc. Natl. Acad. Sci. USA 100:10181–10186.

    Campo, N., M. J. Dias, M. L. Daveran-Mingot, P. Ritzenthaler, and P. Le Bourgeois. 2004. Chromosomal constraints in gram-positive bacteria revealed by artificial inversions. Mol. Microbiol. 51:511–522.

    Canback, B., I. Tamas, and S. G. Andersson. 2004. A phylogenomic study of endosymbiotic bacteria. Mol. Biol. Evol. 21:1110–1122.

    Casjens, S. 1998. The diverse and dynamic structure of bacterial genomes. Annu. Rev. Genet. 32:339–377.

    Castresana, J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17:540–552.

    Charles, H., A. Heddi, and Y. Rahbe. 2001. A putative insect intracellular endosymbiont stem clade, within the Enterobacteriaceae, inferred from phylogenetic analysis based on a heterogeneous model of DNA evolution. C. R. Acad. Sci. III 324:489–494.

    Clark, M. A., N. A. Moran, and P. Baumann. 1999. Sequence evolution in bacterial endosymbionts having extreme base compositions. Mol. Biol. Evol. 16:1586–1598.

    da Silva, A. C., J. A. Ferro, F. C. Reinach et al. (62 co-authors). 2002. Comparison of the genomes of two Xanthomonas pathogens with differing host specificities. Nature 417:459–463.

    Daubin, V., M. Gouy, and G. Perriere. 2002. A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history. Genome Res. 12:1080–1090.

    Daubin, V., N. A. Moran, and H. Ochman. 2003. Phylogenetics and the cohesion of bacterial genomes. Science 301:829–832.

    Deng, W., V. Burland, G. Plunkett III et al. (18 co-authors). 2002. Genome sequence of Yersinia pestis KIM. J. Bacteriol. 184:4601–4611.

    Deng, W., S. R. Liou, G. Plunkett III, G. F. Mayhew, D. J. Rose, V. Burland, V. Kodoyianni, D. C. Schwartz, and F. R. Blattner. 2003. Comparative genomics of Salmonella enterica serovar Typhi strains Ty2 and CT18. J. Bacteriol. 185:2330–2337.

    Dubnau, D. 1999. DNA uptake in bacteria. Annu. Rev. Microbiol. 53:217–244.

    Dutilh, B. E., M. A. Huynen, W. J. Bruno, and B. Snel. 2004. The consistent phylogenetic signal in genome trees revealed by reducing the impact of noise. J. Mol. Evol. 58:527–539.

    Eisen, J. A., J. F. Heidelberg, O. White, and S. L. Salzberg. 2000. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 1:RESEARCH0011.1–0011.9.

    Fitch, W. M., and E. Margoliash. 1967. Construction of phylogenetic trees. Science 155:279–284.

    Fitz-Gibbon, S. T., and C. H. House. 1999. Whole genome-based phylogenetic analysis of free-living microorganisms. Nucleic Acids Res. 27:4218–4222.

    Fleischmann, R. D., M. D. Adams, O. White et al. (37 co-authors). 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496–512.

    Garcia-Vallve, S., E. Guzman, M. A. Montero, and A. Romeu. 2003. HGT-DB: a database of putative horizontally transferred genes in prokaryotic complete genomes. Nucleic Acids Res. 31:187–189.

    Garcia-Vallve, S., A. Romeu, and J. Palau. 2000. Horizontal gene transfer in bacterial and archaeal complete genomes. Genome Res. 10:1719–1725.

    Gil, R., F. J. Silva, E. Zientz, F. Delmotte, F. Gonzalez-Candelas, A. Latorre, C. Rausell, J. Kamerbeek, J. Gadau, B. Holldobler, R. C. H. J. van Ham, R. Gross, and A. Moya. 2003. The genome sequence of Blochmannia floridanus: Comparative analysis of reduced genomes. Proc. Natl. Acad. Sci. USA 100:9388–9393.

    Gomez-Valero, L., A. Latorre, and F. J. Silva. 2004. The evolutionary fate of nonfunctional DNA in the bacterial endosymbiont Buchnera aphidicola. Mol. Biol. Evol. 21:2172–2181.

    Hansmann, S., and W. Martin. 2000. Phylogeny of 33 ribosomal and six other proteins encoded in an ancient gene cluster that is conserved across prokaryotic genomes: influence of excluding poorly alignable sites from analysis. Int. J. Syst. Evol. Microbiol. 50:1655–1663.

    Hayashi, T., K. Makino, M. Ohnishi et al. (19 co-authors). 2001. Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res. 8:11–22.

    Heidelberg, J. F., J. A. Eisen, W. C. Nelson et al. (23 co-authors). 2000. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406:477–483.

    Heidelberg, J. F., I. T. Paulsen, K. E. Nelson et al. (40 co-authors). 2002. Genome sequence of the dissimilatory metal ion-reducing bacterium Shewanella oneidensis. Nat. Biotechnol. 20:1118–1123.

    Herbeck, J. T., P. H. Degnan, and J. J. Wernegreen. 2004. Non-homogeneous model of sequence evolution indicates independent origins of primary endosymbionts within the Enterobacteriales (gamma-Proteobacteria). Mol. Biol. Evol.

    Hughes, D. 2000. Evaluating genome dynamics: the constraints on rearrangements within bacterial genomes. Genome Biol. 22:520–532.

    Inagaki, Y., E. Susko, N. M. Fast, and A. J. Roger. 2004. Covarion shifts cause a long-branch attraction artifact that unites microsporidia and archaebacteria in EF-1 alpha phylogenies. Mol. Biol. Evol. 21:1340–1349.

    Itoh, T., W. Martin, and M. Nei. 2002. Acceleration of genomic evolution caused by enhanced mutation rate in endocellular symbionts. Proc. Natl. Acad. Sci. USA 99:12944–12948.

    Jain, R., M. C. Rivera, and J. A. Lake. 1999. Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl. Acad. Sci. USA 96:3801–3806.

    Jin, Q., Z. Yuan, J. Xu et al. (30 co-authors). 2002. Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res. 30:4432–4441.

    Kanehisa, M., S. Goto, S. Kawashima, Y. Okuno, and M. Hattori. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277–D280.

    Kim, Y. R., S. E. Lee, C. M. Kim, S. Y. Kim, E. K. Shin, D. H. Shin, S. S. Chung, H. E. Choy, A. Progulske-Fox, J. D. Hillman, M. Handfield, and J. H. Rhee. 2003. Characterization and pathogenic significance of Vibrio vulnificus antigens preferentially expressed in septicemic patients. Infect. Immun. 71:5461–5471.

    Kolaczkowski, B., and J. W. Thornton. 2004. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431:980–984.

    Koonin, E. V., and M. Y. Galperin. 1997. Prokaryotic genomes: the emerging paradigm of genome-based microbiology. Curr. Opin. Genet. Dev. 7:757–763.

    Korbel, J. O., B. Snel, M. A. Huynen, and P. Bork. 2002. SHOT: a web server for the construction of genome phylogenies. Trends Genet. 18:158–162.

    Lawrence, J. G., and H. Ochman. 1998. Molecular archaeology of the Escherichia coli genome. Proc. Natl. Acad. Sci. USA 95:9413–9417.

    Lerat, E., V. Daubin, and N. A. Moran. 2003. From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-proteobacteria. PLoS Biol. 1:101–109.

    Makino, K., K. Oshima, K. Kurokawa et al. (14 co-authors). 2003. Genome sequence of Vibrio parahaemolyticus: a pathogenic mechanism distinct from that of V cholerae. Lancet 361:743–749.

    May, B. J., Q. Zhang, L. L. Li, M. L. Paustian, T. S. Whittam, and V. Kapur. 2001. Complete genomic sequence of Pasteurella multocida, Pm70. Proc. Natl. Acad. Sci. USA 98:3460–3465.

    McClelland, M., K. E. Sanderson, J. Spieth et al. (23 co-authors). 2001. Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 413:852–856.

    Medrano-Soto, A., G. Moreno-Hagelsieb, P. Vinuesa, J. A. Christen, and J. Collado-Vides. 2004. Successful lateral transfer requires codon usage compatibility between foreign genes and recipient genomes. Mol. Biol. Evol. 21:1884–1894.

    Moran, N. A. 1996. Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proc. Natl. Acad. Sci. USA 93:2873–2878.

    ———. 2002. Microbial minimalism: genome reduction in bacterial pathogens. Cell 108:583–586.

    Moran, N. A., and A. Mira. 2001. The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol. 2:RESEARCH0054.01–0054.12.

    Moret, B. M., L. S. Wang, T. Warnow, and S. K. Wyman. 2001. New approaches for reconstructing phylogenies from gene order data. Bioinformatics 17(Suppl. 1):S165–S173.

    Moret, B. M. E., A. C. Siepel, J. Tang, and T. Liu. 2002a. Inversion medians outperform breakpoint medians in phylogeny reconstruction from gene-order data. Pp. 521–536 in Algorithms in bioinformatics. Second International Workshop, WABI 2002, Rome, Italy, September 17–21, 2002, ed. R. Guigó and D. Gusfield. Springer-Verlag.

    Moret, B. M. E., J. J. Tang, L. S. Wang, and T. Warnow. 2002b. Steps toward accurate reconstructions of phylogenies from gene-order data. J. Comput. Syst. Sci. 65:508–525.

    Muller, T., and M. Vingron. 2000. Modeling amino acid replacement. J. Comput. Biol. 7:761–776.

    Nadeau, J. H., and D. Sankoff. 1998. Counting on comparative maps. Trends Genet. 14:495–501.

    Nadeau, J. H., and B. A. Taylor. 1984. Lengths of chromosomal segments conserved since divergence of man and mouse. Proc. Natl. Acad. Sci. USA 81:814–818.

    Nei, M. 1996. Phylogenetic analysis in molecular evolutionary genetics. Annu. Rev. Genet. 30:371–403.

    Nelson, K. E., C. Weinel, I. T. Paulsen et al. (40 co-authors). 2002. Complete genome sequence and comparative analysis of the metabolically versatile Pseudomonas putida KT2440. Environ. Microbiol. 4:799–808.

    Ochman, H., J. G. Lawrence, and E. A. Groisman. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405:299–304.

    Palacios, C., and J. J. Wernegreen. 2002. A strong effect of AT mutational bias on amino acid usage in Buchnera is mitigated at high-expression genes. Mol. Biol. Evol. 19:1575–1584.

    Parkhill, J., G. Dougan, K. D. James et al. (38 co-authors). 2001a. Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 413:848–852.

    Parkhill, J., B. W. Wren, N. R. Thomson et al. (33 co-authors). 2001b. Genome sequence of Yersinia pestis, the causative agent of plague. Nature 413:523–527.

    Perna, N. T., G. Plunkett III, V. Burland et al. (25 co-authors). 2001. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409:529–533.

    Rispe, C., F. Delmotte, R. C. H. J. Van Ham, and A. Moya. 2004. Mutational and selective pressures on codon and amino acid usage in Buchnera, endosymbiotic bacteria of aphids. Genome Res. 14:44–53.

    Rokas, A., B. L. Williams, N. King, and S. B. Carroll. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.

    Sauer, C., E. Stackebrandt, J. Gadau, B. Holldobler, and R. Gross. 2000. Systematic relationships and cospeciation of bacterial endosymbionts and their carpenter ant host species: proposal of the new taxon Candidatus Blochmannia gen. nov. Int. J. Syst. Evol. Microbiol. 50:1877–1886.

    Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502–504.

    Shigenobu, S., H. Watanabe, M. Hattori, Y. Sakaki, and H. Ishikawa. 2000. Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp APS. Nature 407:81–86.

    Shigenobu, S., H. Watanabe, Y. Sakaki, and H. Ishikawa. 2001. Accumulation of species-specific amino acid replacements that cause loss of particular protein functions in Buchnera, an endocellular bacterial symbiont. J. Mol. Evol. 53:377–386.

    Sicheritz-Ponten, T., and S. G. E. Andersson. 2001. A phylogenomic approach to microbial evolution. Nucleic Acids Res. 29:545–552.

    Silva, F. J., A. Latorre, and A. Moya. 2001. Genome size reduction through multiple events of gene disintegration in Buchnera APS. Trends Genet. 17:615–618.

    ———. 2003. Why are the genomes of endosymbiotic bacteria so stable? Trends Genet. 19:176–180.

    Simpson, A. J., F. C. Reinach, P. Arruda et al. (113 co-authors). 2000. The genome sequence of the plant pathogen Xylella fastidiosa. The Xylella fastidiosa Consortium of the Organization for Nucleotide Sequencing and Analysis. Nature 406:151–157.

    Snel, B., P. Bork, and M. A. Huynen. 1999. Genome phylogeny based on gene content. Nat. Genet. 21:108–110.

    Stover, C. K., X. Q. Pham, A. L. Erwin et al. (28 co-authors). 2000. Complete genome sequence of Pseudomonas aeruginosa PA01, an opportunistic pathogen. Nature 406:959–964.

    Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964–969.

    Susko, E., Y. Inagaki, and A. J. Roger. 2004. On inconsistency of the neighbor-joining, least squares, and minimum evolution estimation when substitution processes are incorrectly modeled. Mol. Biol. Evol. 21:1629–1642.

    Suyama, M., and P. Bork. 2001. Evolution of prokaryotic gene order: genome rearrangements in closely related species. Trends Genet. 17:10–13.

    Tamas, I., L. Klasson, B. Canback, A. K. Naslund, A. S. Eriksson, J. J. Wernegreen, J. P. Sandstrom, N. A. Moran, and S. G. E. Andersson. 2002. 50 Million years of genomic stasis in endosymbiotic bacteria. Science 296:2376–2379.

    Tang, J., and B. M. E. Moret. 2003a. Scaling up accurate phylogenetic reconstruction from gene-order data. Bioinformatics 19(Suppl. 1):i305–i312.

    ———. 2003b. Phylogenetic reconstruction from gene-rearrangement data with unequal gene content. Pp. 37–46 in Algorithms and data structures, ed. F. Dehne, J.-R. Sack, and M. Smid. Springer-Verlag.

    Tesler, G. 2002. GRIMM: genome rearrangements web server. Bioinformatics 18:492–493.

    Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.

    Tillier, E. R., and R. A. Collins. 2000. Genome rearrangement by replication-directed translocation. Nat. Genet. 26:195–197.

    Uchiyama, I. 2003. MBGD: microbial genome database for comparative analysis. Nucleic Acids Res. 31:58–62.

    van Ham, R. C., J. Kamerbeek, C. Palacios et al. (13 co-authors). 2003. Reductive genome evolution in Buchnera aphidicola. Proc. Natl. Acad. Sci. USA 100:581–586.

    Van Sluys, M. A., M. C. de Oliveira, C. B. Monteiro-Vitorello et al. (55 co-authors). 2003. Comparative analyses of the complete genome sequences of Pierce's disease and citrus variegated chlorosis strains of Xylella fastidiosa. J. Bacteriol. 185:1018–1026.

    Von Dohlen, C. D., and N. A. Moran. 2000. Molecular data support a rapid radiation of aphids in the Cretaceous and multiple origins of host alternation. Biol. J. Linn. Soc. 71:689–717.

    Wei, J., M. B. Goldberg, V. Burland et al. (14 co-authors). 2003. Complete genome sequence and comparative genomics of Shigella flexneri serotype 2a strain 2457T. Infect. Immun. 71:2775–2786.

    Welch, R. A., V. Burland, G. Plunkett III et al. (16 co-authors). 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc. Natl. Acad. Sci. USA 99:17020–17024.

    Wolf, Y. I., I. B. Rogozin, N. V. Grishin, and E. V. Koonin. 2002. Genome trees and the tree of life. Trends Genet. 18:472–479.

    (Eugeni Belda, Andrés Moya)