Molecular Evolution of Cadherin-Related Neuronal Receptor/Protocadherin (CNR/Pcdh) Gene Cluster in Mus musculus Subspecies
    KOKORO Biology Group, Laboratories for Integrated Biology, Graduate School of Frontier Biosciences, Osaka University, Suita, Japan; Department of Biology, Graduate School of Science, Osaka University; CREST of Japan Science and Technology Agency; Mouse Genomics Resource Laboratory, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka-ken 411-8540, Japan; Mammalian Genetics Laboratory, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka-ken 411-8540, Japan; and Laboratory of Neurobiology and Behavioral Genetics, National Institute for Physiological Sciences, Okazaki, Japan

    Correspondence: E-mail: yagi@fbs.osaka-u.ac.jp.

    Abstract

    The mouse cadherin-related neuronal receptor/protocadherin (CNR/Pcdh) gene clusters are located on chromosome 18. We sequenced single-nucleotide polymorphisms (SNPs) of the CNR/Pcdh-coding region among 12 wild-derived and four laboratory strains; these included the four major subspecies groups of Mus musculus: domesticus, musculus, castaneus, and bactrianus. We detected 883 coding SNPs (cSNPs) in the CNR/Pcdh variable exons and three in the constant exons. Among all the cSNPs, 586 synonymous (silent) and 297 nonsynonymous (amino acid exchanged) substitutions were found; therefore, the Ka/Ks ratio (nonsynonymous substitutions per synonymous substitution) was 0.51. The synonymous cSNPs were relatively concentrated in the first and fifth extracellular cadherin domain-encoding regions (ECs) of CNR/Pcdh. These regions have high nucleotide homology among the CNR/Pcdh paralogs, suggesting that gene conversion events in synonymous and homologous regions of the CNR/Pcdh cluster are related to the generation of cSNPs. A phylogenetic analysis revealed gene conversion events in the EC1 and EC5 regions. Assuming that the common sequences between rat and mouse are ancestral, the GC content of the third codon position has increased in the EC1 and EC5 regions, although biased substitutions from GC to AT were detected in all the codon positions. In addition, nonsynonymous substitutions were extremely high (11 of 13, Ka/Ks ratio 5.5) in the laboratory mouse strains. The artificial environment of laboratory mice may allow positive selection for nonsynonymous amino acid variations in CNR/Pcdh during inbreeding. In this study, we analyzed the direction of cSNP generation, and concluded that subspecies-specific nucleotide substitutions and region-restricted gene conversion events may have contributed to the generation of genetic variations in the CNR/Pcdh genes within and between species.

    Key Words: CNR; protocadherin; SNP; gene conversion

    Introduction

    Gene conversion is believed to be the primary mechanism for homogenization in duplicated genes (Ohta 1990). Clear evidence for gene conversion is seen when DNA polymorphism data are available for both of the duplicated genes. The cadherin-related neuronal receptor/protocadherin (CNR/Pcdh) gene cluster was first identified in a study of proteins that bound Fyn tyrosine kinase in the mouse brain (Kohmura et al. 1998). Genomic analyses revealed that the CNR/Pcdh genes are organized into three closely linked clusters named CNR/Pcdhα, Pcdhβ, and Pcdhγ; each cluster is arranged in a tandem array on mouse chromosome 18c (Sugino et al. 2000; Wu et al. 2001) and human chromosome 5q31 (Wu and Maniatis 1999). Recently, the CNR/Pcdh gene cluster was also identified and characterized in the rat (Yanase, Sugino, and Yagi 2004), chicken (Sugino et al. 2004b) and zebrafish (Noonan et al. 2004; Tada et al. 2004). The CNR/Pcdhα and Pcdhγ genes are each organized into variable and constant regions. In the variable regions, multiple variable exons encoding alternate extracellular, transmembrane, and short cytoplasmic domains are clustered in tandem. One variable exon codes for the extracellular domain, which is composed of a signal peptide and six extracellular cadherin (EC) domains. Each EC domain consists of about 100–120 amino acid residues, and has a cadherin motif ([LIV]-X-[LIV]-X-D-X-N-D-[NH]-X-P) in the tail. There is no conservation among the six cadherin domains within a single CNR/Pcdh variable exon, except for their cadherin motifs. Small constant exons encoding the cytoplasmic tail are located in the constant region. Multiple transcripts can be generated from promoters upstream of each CNR/Pcdh variable exon. Transcription from a given promoter generates a long pre–messenger RNA that contains multiple downstream variable exons as well as constant exons. Subsequently, the cap-proximal variable exon engages in cis-splicing with the first constant exon (Tasic et al. 2002; Wang et al. 2002). Interestingly, the clustered CNR/Pcdh genes in human, mouse, rat, chicken, and zebrafish are prone to gene conversion events. Strong sequence homogenizations in a specific EC domain–encoding region have been demonstrated in mammals, chicken, and zebrafish (Sugino et al. 2000; Sugino et al. 2004b; Noonan et al. 2004; Tada et al. 2004). Thus, gene conversion must act on all copies of the variable exons, because they are similar to each other, and these events occurred long after the gene duplication, which is seen in all vertebrates examined to date.
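
    The cadherin motif quoted above is written in a PROSITE-like notation, and it can be turned into a regular expression to locate candidate EC-domain tails in a protein sequence. The following is a minimal sketch under stated assumptions: the function names and the toy sequence are hypothetical and are not taken from the study.

```python
import re

# Motif as quoted in the text; X stands for any residue.
CADHERIN_MOTIF = "[LIV]-X-[LIV]-X-D-X-N-D-[NH]-X-P"


def motif_to_regex(motif):
    """Translate a dash-separated motif into a compiled regular expression."""
    parts = []
    for element in motif.split("-"):
        # 'X' is a wildcard; bracketed classes and single letters pass through.
        parts.append("." if element == "X" else element)
    return re.compile("".join(parts))


def find_motifs(protein, motif=CADHERIN_MOTIF):
    """Return (0-based start, matched text) for each non-overlapping match."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


# Hypothetical toy sequence (not a real CNR/Pcdh protein) carrying one motif.
toy = "MKTAYG" + "VRVSDANDNAP" + "EFSQ"
hits = find_motifs(toy)
```

    Because `finditer` reports non-overlapping matches, a real scan of a six-domain ectodomain would be expected to return one hit near the end of each EC domain, which could then be used to delimit domain boundaries.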

    The combination of lineage-specific gene conversion and adaptive variation in diversified ECs may drive the molecular evolution of the CNR/Pcdh cluster genes. To understand further the concerted evolutionary mechanisms of the CNR/Pcdh gene clusters, we analyzed the complete nucleotide sequence of the mouse CNR/Pcdh locus for 12 wild-derived and four laboratory inbred strains. The house mouse (Mus musculus) is the most important model animal in mammalian genetics because of its abundant polymorphisms. About 1 MYA, the progenitors of the house mouse diverged into four separate subspecies groups, domesticus, musculus, castaneus, and bactrianus, that occupied nonoverlapping ranges in and around the Indian subcontinent (Silver 1995; Guenet and Bonhomme 2003). These subspecies groups are genetically differentiated by mitochondrial DNA (Yonekawa et al. 1981), biochemical markers (Bonhomme et al. 1984), and ribosomal DNA (Suzuki et al. 1986). Recently, a high-quality draft sequence of the mouse genome (mainly the C57BL/6 strain) was presented, and many single-nucleotide polymorphisms (SNPs) were identified by the additional sequencing of other mouse strains. The distribution of SNPs reveals that genetic variation among mouse strains occurs in large blocks, mostly reflecting the contributions of the domesticus and musculus subspecies groups to current laboratory strains (Wade et al. 2002; Waterston et al. 2002). Here, we investigated the polymorphisms of the CNR/Pcdh gene clusters among 12 wild-derived and four major laboratory mouse strains, including the four subspecies groups listed above, to determine the mechanisms of the molecular evolution of CNR/Pcdh after speciation.

    Materials and Methods

    Inbred Mouse Strains

    We used 16 inbred mouse strains including 12 wild-derived (JF1/Ms, MSM/Ms, SWN/Ms, KJR/Ms, CHD/Ms, BLG2/Ms, NJL/Ms, HMI/Ms, CAST/Ei, AVZ/Ms, PGN2/Ms, BFM/2Ms) and four classic laboratory (C57BL/6J, BALB/cUCSD, DBA/1J, DBA/2J) strains, listed in table 1. Details about the establishment and maintenance of each strain were described in Furuse et al. (2002) and Koide et al. (2000). The genomic DNA of each strain was obtained from tail tips.

    Table 1 Inbred Mouse Strains Used in this Study

    Polymerase Chain Reaction and Sequencing

    Genomic polymerase chain reactions (PCRs) were performed in a total volume of 30 μl with 0.3 μM each primer, 0.5 U of TaKaRa LA Taq polymerase (TaKaRa, Otsu, Japan), 1x LA PCR Buffer II (TaKaRa), 2.5 mM MgCl2, 0.25 mM each deoxynucleoside triphosphate, and 10 ng of genomic DNA. Reactions were performed on GeneAmp 9700 thermal cyclers (Applied Biosystems, Foster City, Calif.) by denaturing at 95°C for 5 min, followed by 30 cycles at 95°C for 30 s, annealing temperature for 15 s, and 68°C for 240 s. A second PCR using nested primers and 25 cycles was performed when the amount of product from the first PCR was low (<10 ng/μl) or when a smeared electrophoresis band appeared. Primer pairs were designed to bind both –500 bp upstream and downstream of each protocadherin variable and constant exon to amplify all the coding sequences. All primers used for the PCRs and sequencing are listed as Supplementary Material online. The PCR products of the variable exons ranged from 3.0 to 3.5 kb. The PCR conditions were optimized by varying the annealing temperature between 54°C and 62°C. Eighteen PCR products including 15 variable and 3 constant exons of each strain were obtained. The products were purified by polyethylene glycol (PEG) precipitation. In brief, 10 μl of PEG solution (20% PEG6000, 2.5 M NaCl) was added to the same volume of each PCR product, and the mixture was incubated for 30 min at 4°C. Samples were then subjected to plate centrifugation at 5,000 rpm for 15 min at 4°C, washed with 100 μl of 70% ethanol, air-dried for 5 min, and eluted in 12 μl of distilled water. The products of the first PCR with exon-specific primers were directly sequenced. The sequencing products were run on Applied Biosystems 3100, 3700, and 3730xl sequencers. Sequence traces were assembled and analyzed with SeqScape software version 2.0 (Applied Biosystems, Foster City, Calif.). Samples giving ambiguous or low-quality genotypes were sequenced again.

    Phylogenetic Analysis

    The nucleotide sequences of the CNR/Pcdh variable and constant regions were aligned using the ClustalX program (Thompson et al. 1997; ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/) with the default parameters. Neighbor-joining phylogenetic analysis with 1,000 bootstrap trials was carried out using the ClustalX Bootstrap NJ tree option. All phylogenetic trees were drawn with the NJPLOT program (http://pbil.univ-lyon1.fr/software/njplot.html).
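
    To illustrate what a bootstrap neighbor-joining analysis consumes, the sketch below computes pairwise distances from an alignment and draws one bootstrap replicate by resampling alignment columns with replacement. It is not ClustalX code. The uncorrected p-distance, the gap handling, the function names, and the toy alignment are all assumptions made for illustration, and the tree-building step itself is omitted.

```python
import random


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns gapped in either sequence."""
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        differing += a != b
    return differing / compared if compared else 0.0


def distance_matrix(alignment):
    """Pairwise p-distances for a dict mapping sequence name to aligned sequence."""
    names = list(alignment)
    return {(x, y): p_distance(alignment[x], alignment[y]) for x in names for y in names}


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Hypothetical toy alignment; strain names are placeholders, not real data.
alignment = {"strainA": "ACGTACGT", "strainB": "ACGTACGA", "strainC": "AC-TTCGA"}
matrix = distance_matrix(alignment)
replicate = bootstrap_replicate(alignment, random.Random(0))
```

    In a full analysis, each of the 1,000 replicates would yield its own distance matrix and tree, and the bootstrap value of a branch would be the fraction of replicate trees that contain it.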

    dbSNP

    We have submitted all the polymorphism data from this study to the NCBI dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP/) under the NCBI assay ID numbers (ss28533130–ss28533995). We have added our assay and genotype data to these records.

    Results

    Identification of SNPs in CNR/Pcdh Genes

    To identify and characterize coding SNPs (cSNPs) of the CNR/Pcdh genes, we sequenced the coding regions of the CNR/Pcdh genes using genomic DNA from 12 wild-derived and four laboratory mouse strains; these included the four major mouse subspecies groups. Table 1 shows the strains and subspecies groups we used in this study. The laboratory strains were recently reported by Wade et al. (2002) to have been established from a very small number of domesticus or musculus progenitors. Among all the strains, we identified 883 cSNPs, three deletions, one nucleotide insertion, and one transposon insertion in the 36,492-bp mouse CNR/Pcdh-coding exons. There were 731 cSNPs in the 28,647-bp CNR/Pcdh variable exons v1 to v12 (255.2 cSNPs/10 kb), 81 cSNPs in the 2,391-bp exon relic1 and 2 (v7/8, 338.8 cSNPs/10 kb, without the data from strain PGN2/Ms, because the PCR product could not be obtained), which exists only in wild-derived strains between v7 and v8 (Sugino et al. 2004a), 68 cSNPs in the 4,998-bp exon vc1 and vc2 (136.1 cSNPs/10 kb), 3 cSNPs in the 456-bp CNR/Pcdh constant region exons (65.8 cSNPs/10 kb), and 26 polymorphisms (including SNP, tandem repeat, deletion, and insertion) in the 2,401-bp 3'-untranslated region of the CNR/Pcdh constant region (108.3 polymorphisms/10 kb). Among all the CNR/Pcdh variable and constant region–coding exons, we identified 297 nonsynonymous and 586 synonymous substitutions; the Ka/Ks ratio (nonsynonymous substitutions per synonymous substitution) was 0.51. Figure 1 shows the distribution of cSNPs by strain (fig. 1A) and by exon (fig. 1B).
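
    The per-region densities quoted above follow directly from the counts and exon lengths. As a worked check, the short sketch below recomputes them from the figures reported in the text; the region labels and function name are illustrative only.

```python
# (region label, cSNP count, coding length in bp), as reported in the text.
REGIONS = [
    ("variable exons v1-v12", 731, 28647),
    ("relic exons v7/8", 81, 2391),
    ("exons vc1 and vc2", 68, 4998),
    ("constant exons", 3, 456),
]


def per_10kb(count, length_bp):
    """Polymorphism density expressed per 10 kb, rounded to one decimal."""
    return round(count / length_bp * 10_000, 1)


densities = {name: per_10kb(count, length) for name, count, length in REGIONS}
total_csnps = sum(count for _, count, _ in REGIONS)
total_bp = sum(length for _, _, length in REGIONS)
```

    The four regions sum to the 883 cSNPs and 36,492 bp reported for the coding exons as a whole, so the regional breakdown is internally consistent.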

    FIG. 1.— Distribution of cSNPs among the Mus musculus CNR/Pcdh. The white bars are synonymous substitutions and gray bars are nonsynonymous substitutions. (A) Quantities of cSNPs in all coding regions by each strain against C57BL/6. (B) Quantities of cSNP loci by each variable exon.

    Phylogenetic Analysis of CNR/Pcdh

    Wade et al. (2002) reported large segmental blocks of either extremely high (40 SNPs/10 kb) or extremely low (0.5 SNPs/10 kb) polymorphism rates among laboratory strains. They concluded that the regions with high SNP rates within laboratory strains of M. musculus originated in differences between domesticus and musculus, or castaneus (Wade et al. 2002). To understand the sequence variation of each CNR/Pcdh variable and constant region exon within subspecies groups, we constructed neighbor-joining phylogenetic trees (fig. 2) and calculated Ka/Ks ratios for each tree (supplementary table 1, Supplementary Material online) using the sequences of all 16 mouse strains in this study. Within laboratory strains, there were only a few cSNPs in the CNR/Pcdh locus (13 SNPs/36,492 bp, 3.56 cSNPs/10 kb, see table 4). The laboratory strains (C57BL/6, BALB/c, DBA/1, and DBA/2) were always found to be in the same group when the variable region was analyzed. However, the sequence identity of the laboratory strains broke down in the constant region (between C57BL/6 and the others). Among the wild-derived strains examined in this study, BFM/2 (domesticus) was the closest to the laboratory strains, but the sequences were not identical throughout the CNR/Pcdh locus (fig. 2, exon v1, 7/8, 10, c2, and cp3). On the other hand, wild-derived strains were not always grouped into the same subspecies. For example, strain NJL (musculus) was closer to CAST/Ei and HMI (castaneus) than to the other musculus strains only in v6 to v7/8. Thus, one strain belonging to a particular subspecies group does not always have subspecies-specific sequences but is partially mixed with sequences from other subspecies.

    FIG. 2.— Phylogenetic relationship of each Mus musculus CNR/Pcdh variable and constant region exon among 12 wild-derived and four laboratory inbred strains on chromosome 18. Each symbol indicates a subspecies group. CNR/Pcdh variable exon 1 to c2 (v1 to v12, gray boxes; vc1 and vc2, white boxes) and three constant exons (cp1 to cp3, black boxes). Exon v7/8 is a relic sequence in laboratory strains (Wu et al. 2001) due to retroelement invasion (Sugino et al. 2004a). Constant exons 1 and 2 had no polymorphism among the 16 strains in this study.

    Table 4 A Breakdown of 13 cSNPs Detected Within Laboratory Strains

    Gene Conversion from the Analysis of cSNPs

    Vertebrate protocadherin evolution driven by restricted gene conversion is seen in the mouse and human CNR/Pcdh EC5 domain (Noonan et al. 2004). To identify homogenized loci resulting from gene conversion, we constructed neighbor-joining phylogenetic trees (fig. 3) and calculated Ka/Ks ratios for each EC domain (supplementary table 2, Supplementary Material online) using the sequences of five subspecies groups including the laboratory strains group. We chose one strain from each subspecies group that contained peculiar (subspecies-specific) nucleotide sequences. Figures 3 and S1 (Supplementary Material online) show the phylogenetic trees of all the CNR/Pcdh variable region exons by each EC domain, including the presumed exon v7/8 in the laboratory strains but not exons vc1 and vc2, for all the subspecies groups. We constructed two types of trees, one for synonymous and the other for nonsynonymous substitutions in each domain. In the EC1 synonymous tree (fig. 3A), we observed three obvious gene conversion events: between exons v3 and v4 of the MSM and CAST/Ei strain (musculus and castaneus, fig. 3G), exons v6 and v11 in MSM (not shown), and exons v6 and v12 in CAST/Ei (fig. 3H). In the EC5 synonymous tree (fig. 3C), we also observed three conversion events: between exons v6 and v11 in CAST/Ei (fig. 3I), exons v1 and v5 in CAST/Ei (fig. 3J), and exons v5 and v6 in AVZ (bactrianus, not shown). In the EC1 nonsynonymous tree (fig. 3B), however, we observed a gene conversion event only between exons v3 and v4 in MSM and CAST/Ei (fig. 3G). Similarly, there was no obvious conversion event indicated by the EC5 nonsynonymous tree (fig. 3D). These results suggested that homologous recombination events occurred in different highly conserved exons. On the other hand, the synonymous and nonsynonymous phylogenetic trees of EC2 (fig. 3E and F) indicated no apparent paralogous similarity but seemed to show orthologous conservation. Thus, synonymous gene conversion occurring in a specific region is a candidate mechanism for generating cSNPs within the M. musculus subspecies groups.
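
    Building separate synonymous and nonsynonymous trees presupposes that each cSNP has been assigned to one class. A minimal sketch of that assignment, assuming the standard genetic code and single-base changes within an in-frame codon, is shown below. The function name is hypothetical, and the authors' actual software for this step is not described in the text.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_csnp(codon, position, new_base):
    """Classify a single-base change in a codon.

    position is the 0-based index of the changed base within the codon.
    Returns 'synonymous' if the encoded residue is unchanged, else 'nonsynonymous'.
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    unchanged = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if unchanged else "nonsynonymous"


# GAT -> GAC keeps aspartate; GAT -> AAT replaces it with asparagine.
third_position_change = classify_csnp("GAT", 2, "C")
first_position_change = classify_csnp("GAT", 0, "A")
```

    Most third-position changes are synonymous under the standard code, which is why third-position GC content is a convenient readout of substitution bias that is largely decoupled from protein-level selection.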

    FIG. 3.— Discovery of region-restricted gene conversion events in Mus musculus CNR/Pcdh. (A–F) Phylogenetic analysis of CNR/Pcdh v1 to v12 ectodomain 1 (EC1, A and B), 5 (EC5, C and D), and 2 (EC2, E and F) among five subspecies groups (MSM; musculus, BFM/2; domesticus, CAST/Ei; castaneus, AVZ; bactrianus, and C57BL/6; laboratory strains). The phylogenies of A, C, and E were constructed using the sequences of synonymous substitutions. The phylogenies of B, D, and F were constructed using the sequence of nonsynonymous substitutions. (G–J) The footprints of gene conversion event of EC1 (G, H) and EC5 (I, J) shown in (A) and (C).

    Direction of Nucleotide Substitution

    Recently, the genomic sequence of rat CNR/Pcdh was released (Yanase, Sugino, and Yagi 2004). Phylogenetic and paleontological data suggest that mice and rats diverged from a common ancestor 10–15 MYA (Jaeger, Tong, and Denys 1986). As a result of frequent gene conversion events in the CNR/Pcdh locus, the GC content at the third codon position (GC3) in homogenized regions has increased (Noonan et al. 2004). Thus, we assumed that there is a certain direction of nucleotide substitution from ancestral rodents to Mus or Rattus. In brief, we assumed that the common sequences between mouse and rat are ancestral and counted the number of cSNPs among the ancestral mouse sequences. We found 804 loci of common sequences in all the CNR/Pcdh exons, 506 loci from GC to AT, 215 loci from AT to GC, and 83 loci from AT to TA or GC to CG. In the third codon position, we found 585 loci: 364 loci from GC to AT, 165 loci from AT to GC, and 56 loci from AT to TA or GC to CG. These data indicated that there is a direction of nucleotide substitution in the speciation of M. musculus from GC to AT. Next, we analyzed the direction of GC3 substitution by each domain-encoding region among the CNR/Pcdh variable exons, excluding vc1 and vc2. We revised these data using the original GC3 content from the sequence of the C57BL/6 strain. That is, we normalized the data to the GC3 numbers in the C57BL/6 strain. Interestingly, directional substitution from AT to GC was found in the EC1, EC5, and TM regions (1.06, 1.06, and 2.55, to GC/to AT ratio, respectively). Substitution in the reverse direction from GC to AT was seen in the other regions (less than 1.0, to GC/to AT ratio, table 2). These data suggested that the substitution direction toward GC was seen in relatively highly homogenized regions as a result of gene conversion events.
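
    The direction counts reported above can be reproduced by polarizing each cSNP against the inferred ancestral base and sorting it into one of three classes. The sketch below shows that bookkeeping and checks the reported totals. The function names are illustrative, and the normalization to C57BL/6 GC3 counts is not reproduced because the underlying counts are not given in this section.

```python
from collections import Counter

STRONG = set("GC")
WEAK = set("AT")


def direction(ancestral, derived):
    """Classify a substitution relative to the inferred ancestral base."""
    if ancestral == derived:
        raise ValueError("ancestral and derived bases must differ")
    if ancestral in STRONG and derived in WEAK:
        return "GC_to_AT"
    if ancestral in WEAK and derived in STRONG:
        return "AT_to_GC"
    return "other"  # A<->T or G<->C, GC content unchanged


def tally(pairs):
    """Count substitution directions for (ancestral, derived) base pairs."""
    return Counter(direction(a, d) for a, d in pairs)


# Counts as reported in the text for all codon positions and for third positions.
ALL_POSITIONS = {"GC_to_AT": 506, "AT_to_GC": 215, "other": 83}
THIRD_POSITION = {"GC_to_AT": 364, "AT_to_GC": 165, "other": 56}

# Fold excess of GC-to-AT over AT-to-GC changes across all positions.
at_bias = ALL_POSITIONS["GC_to_AT"] / ALL_POSITIONS["AT_to_GC"]
example = tally([("G", "A"), ("C", "T"), ("A", "G"), ("A", "T")])
```

    The class totals sum to the 804 and 585 polarized loci reported, and the GC-to-AT excess across all positions is a little over twofold.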

    Table 2 Direction of GC3 Content by Domain-Encoding Region

    High Ratio of Nonsynonymous Substitution Within Laboratory Strains

    Among all the wild-derived and laboratory inbred strains, we identified 297 nonsynonymous and 586 synonymous substitutions in the CNR/Pcdh variable and constant region exons. Figure 4 shows the nonsynonymous and synonymous substitution rates per 10 kb for each CNR/Pcdh domain-encoding region in the variable region exons. We detected the highest synonymous substitution rates in EC1 and EC5 (199 and 252 substitutions per 10 kb, respectively). EC1 and EC5 of CNR/Pcdh are paralogous regions in a given strain that are homogenized as a result of frequent gene conversion events in meiosis. Thus, synonymous gene conversion in the paralogous conserved regions, which is the driving force of lineage-specific sequence homogenization within species, is also responsible for generating the cSNP diversity in the ectodomain-encoding regions among the M. musculus subspecies groups.

    FIG. 4.— Itemization of all cSNPs in the CNR/Pcdh variable exons among the 16 strains used in this study. Nonsynonymous (gray) and synonymous (white) cSNPs rates per 10 kb by respective domain are shown.

    Between the group of wild-derived strains and the group of laboratory strains, we detected 14 cSNPs in CNR/Pcdh loci. In other words, these 14 cSNPs were thought to be generated in the early formative period of the classic laboratory strains. Table 3 shows an itemization of these cSNPs. The Ka/Ks ratio of these cSNPs was 0.4. However, within the classic laboratory strains, we detected only a few, but nonetheless characteristic, cSNPs in CNR/Pcdh loci. There were only 13 cSNPs between C57BL/6 and the other three laboratory strains, BALB/c, DBA/1, and DBA/2. These cSNPs could have been generated after lineage separation. Surprisingly, 11 of the 13 (84.6%, Ka/Ks ratio 5.5) substitutions were nonsynonymous, compared with 297 of 883 (33.6%, Ka/Ks ratio 0.51) nonsynonymous substitutions among all 16 strains. Table 4 shows the loci of the substitutions. There appeared to be no nucleotide homogenization in any of the laboratory strains. Each of the 13 polymorphic loci seemed to be random. Thus, we regard the cSNPs generated within laboratory strains as not resulting from gene conversion but rather from simple mismatch substitutions.
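
    The ratios quoted in this paragraph use the definition given in the text, nonsynonymous substitutions per synonymous substitution, which is a ratio of counts rather than of per-site rates. The following sketch recomputes them from the reported counts; the function names are illustrative only.

```python
def count_ratio(nonsynonymous, synonymous):
    """Nonsynonymous substitutions per synonymous substitution (count ratio)."""
    return nonsynonymous / synonymous


def nonsynonymous_percent(nonsynonymous, total):
    """Share of substitutions that change an amino acid, in percent."""
    return round(100 * nonsynonymous / total, 1)


# Counts reported in the text.
within_lab = {"nonsynonymous": 11, "total": 13}
all_strains = {"nonsynonymous": 297, "total": 883}

lab_ratio = count_ratio(11, 13 - 11)
all_ratio = round(count_ratio(297, 883 - 297), 2)
lab_percent = nonsynonymous_percent(**within_lab)
all_percent = nonsynonymous_percent(**all_strains)
```

    With only two synonymous changes in the denominator, the within-laboratory ratio is sensitive to a single additional substitution in either class, which is worth keeping in mind when comparing it with the ratio for all 16 strains.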

    Table 3 A Breakdown of 14 cSNPs Detected Between Wild-Derived and Laboratory Strains

    Discussion

    We performed a complete cSNP analysis of the M. musculus CNR/Pcdh locus from 16 mouse strains, including 12 wild-derived strains (table 1), by genomic PCR and direct sequencing. Thus, our polymorphism data are more accurate than data obtained using indirect detecting methods, such as high-density oligonucleotide arrays (Patil et al. 2001; Frazer et al. 2004). In our study, we eliminated the detection of false-positive and -negative polymorphisms by obtaining the complete nucleotide sequence data, which we used to analyze the paralogous and orthologous relationships at CNR/Pcdh loci. We identified 883 cSNPs, three deletions, one nucleotide insertion, and one transposon insertion in the 36,492-bp mouse CNR/Pcdh locus (all cSNPs detected in this study are available in dbSNP). Frazer et al. (2004) performed a genome-wide SNP analysis and submitted it to dbSNP. The CNR/Pcdh gene clusters are on M. musculus chromosome 18. Frazer et al. (2004) also analyzed the CNR/Pcdh gene cluster locus, and we confirmed their data in the CAST/Ei strain. The number of cSNPs they detected was only 20.2% of the number we found (94 cSNPs vs. 465). We also found an inconsistency of the cSNPs in our data for CAST/Ei compared with that of Frazer's group as 16 out of 134 (11.9%) did not match between the two studies. However, we believe the data presented in this paper are highly reliable because we analyzed two different castaneus strains, CAST/Ei and HMI/Ms, and detected extremely similar cSNPs in all the CNR/Pcdh loci (458 cSNPs out of 460).

    First, we constructed phylogenetic trees for each CNR/Pcdh variable and constant region exon (fig. 2) to determine the orthologous relationships among all 16 strains in this study, and this analysis indicated that the musculus, castaneus, and laboratory strains could be assembled into separate subspecies groups. The musculus, castaneus, and laboratory strains had a distinguishing (subspecies-specific) sequence among the CNR/Pcdh variable exons. However, the domesticus (BFM/2, PGN2) and bactrianus (AVZ) strains could not be assembled into a consistent tree because their relationships changed when different exons were analyzed. We confirmed that these three strains were closer in relationship to one another than to the other strains by comparing each variable exon. We concluded that the CNR/Pcdh variable region of the laboratory strains was composed of domesticus and bactrianus sequences, which formed a mosaic structure of the different exons. In the CNR/Pcdh locus of laboratory strains, there are two large phylogenetic segment blocks: the variable region and the constant region (Frazer et al. 2004). We and Frazer et al. (2004) analyzed laboratory strains (C57BL/6, BALB/c, DBA/2) and castaneus (CAST/Ei). The phylogenetic trees in figure 2 indicate that all the laboratory strains were grouped by each variable exon but were separated from C57BL/6 and the other three strains by constant exon 3 (cp3). We concluded that there was a segmental border between the variable and constant region in the CNR/Pcdh locus.

    We next analyzed the paralog sequence diversity among all the CNR/Pcdh variable region exons. The rate of nonsynonymous and synonymous substitutions per 10 kb in each domain is shown in figure 4. The nonsynonymous polymorphism rates were uniform among all the domains. On the other hand, the synonymous polymorphism rates in EC1 and EC5 were significantly higher than in the other domain-encoding regions. These findings suggested that within M. musculus, amino acid exchanges were restricted equally in each EC domain–encoding region. However, nucleotide exchanges were high in the specific domain-encoding regions of the CNR/Pcdh genes. The relationships of CNR/Pcdh by EC domains among five mouse subspecies are shown in figures 3 and S1 (Supplementary Material online). Sequence homogenization was mostly detected in the synonymous phylogenetic trees of EC1 (fig. 3A) and EC5 (fig. 3C), although less was detected in the nonsynonymous trees of EC1 (fig. 3B) and EC5 (fig. 3D). These findings suggest that frequent gene conversions in these regions may occur, but only suitable conversions are selected. Within M. musculus, nucleotide-restricted homogenization following frequent gene conversion events contributed to the generation of cSNPs in EC1 and EC5.

    We also confirmed the GC content variations of the third codon position among mouse strains. Assuming that the common sequences between mouse and rat are ancestral, we could examine the direction of substitution using the mouse SNP data in this study. Among all coding nucleotides of the CNR/Pcdh exons, we detected approximately twice as many substitutions from GC to AT as in the reverse direction. In general, it has been argued that in both prokaryotes and eukaryotes, mutation processes produce more AT mutations than GC mutations, leading to a mutational bias towards AT (Birdsell 2002). At the third codon position, however, we counted and analyzed the GC content (GC3) in each domain. We found that the GC3 content was slightly higher at EC1 and EC5 but lower at other relatively orthologous domains (table 2). The GC3 content appeared to be increased only in the EC1, EC5, and TM domain–encoding regions, although AT-biased substitutions were high in all positions of the codon among all the variable exons. Thus, some mechanism appears to raise the GC3 content at EC1 and EC5. In fact, we found obvious gene conversion events (fig. 3) and a high synonymous SNP rate (fig. 4) at these regions. We suggest that frequent gene conversion events between homogenized regions resulted in the increased GC3 content. It has been proposed that gene conversion can favor GC over AT base pairs, leading to the concept of biased gene conversion towards GC. The expected consequence of such a process is the GC enrichment of DNA sequences under gene conversion (Marais 2003; Galtier 2003). In this study, we found evidence for biased gene conversion towards GC at specific regions within M. musculus CNR/Pcdh, supporting this hypothesis.

    Sequence homogenization occurred in EC1 and EC5 of the CNR/Pcdh variable exons. However, most conversions we detected were synonymous substitutions. There were different evolutionary directions between EC1 and EC5 and the other domain-encoding regions in one large variable exon. In the generated proteins, the EC1 and EC5 domains, paralogous homogenized regions, may have a lineage-specific function. On the other hand, the EC2, EC3, EC4, and EC6 domains are orthologous conserved regions and may have a common function different from that of the EC1 and EC5 domains in protein interactions. The EC1 domains of the classical cadherins, the N, P, and E cadherins, are required for specific trans-homophilic interactions (Takeichi 1990). However, recently it has been reported that the trans-homophilic interaction activity does not always require just the EC1 domains but can involve multiple EC domains, for example, EC1 and EC4 and EC2 and EC3 (deep intercalation model) (Renaud-Young and Gallin 2002). The deep intercalation model may be adapted to the interactions of the CNR/Pcdh proteins. Lineage-specific EC1 and EC5 domains may cooperate in protein interactions in a lineage-specific function. A cis-heterophilic protein interaction between the products of CNR/Pcdh and Pcdh has also been proposed (Murata et al. 2004). These interactions occur in the cytoplasmic and extracellular domains. The EC2, EC3, EC4, EC6, and VCP domains are orthologous, conserved regions. Therefore, these conserved regions may contribute to the cis-heterophilic interaction activity.

    Interestingly, nonsynonymous substitutions were extremely high (11 of 13, Ka/Ks ratio 5.5) within the laboratory mouse strains. Our phylogenetic analysis indicated that the locus of all the CNR/Pcdh variable exons was derived from a single or limited number of domesticus strains (Frazer et al. 2004). The 13 cSNPs within the laboratory strains were not detected in the wild-derived strains. There are two possible mechanisms by which these 13 cSNPs emerged. The first is that they were generated over the last century since the classic laboratory strains were produced. Neutral theory asserts that nonsynonymous substitutions are almost entirely removed by natural selection (Kimura 1977). However, the special habitat of inbred laboratory mice may allow the generation of unusual replacement mutations. The second possibility is that these cSNPs persist from residual heterozygosity during the inbreeding of laboratory strains. C57BL/10J and C57BL/6J are closely related laboratory strains separated from the original C57BL after about 40 generations of inbreeding (Festing 1996). Forty-nine SNPs are detected between them, indicating that polymorphism between the two strains almost certainly represents old SNPs that were still segregating at the time the two strains separated. This suggests that there is selection for residual heterozygosity at some loci during the process of inbreeding (Petkov et al. 2004). Considering this possibility, nonsynonymous heterozygosity of CNR/Pcdh in laboratory strains may be positively selected for during inbreeding. A similar high ratio of nonsynonymous substitution is seen when human Pcdh is analyzed (Miki et al. 2005), suggesting that the positive selection of nonsynonymous substitutions occurs among closely related lineages.

    From the results of the detailed cSNP analysis in this study, we propose a molecular mechanism of nucleotide evolution in the M. musculus CNR/Pcdh cluster. We conclude that synonymous gene conversion occurring in a specific region is a candidate mechanism for generating the cSNPs, and the counterbalance of GC-biased gene conversion and the universal mutational bias towards AT by each CNR/Pcdh domain may be needed to drive the concerted evolution for speciation. Fascinatingly, in a recent study, single-cell analysis of Purkinje cells using multiple reverse transcription–PCR reactions showed the monoallelic and combinatorial expression of each variable exon in the CNR/Pcdh genes. This type of regulation could account for the wide variety of different combinations of the CNR/Pcdh genes expressed in individual neurons and, therefore, have a role in the generation of neurons with a range of different adhesions, functions, and interactions (Esumi et al. 2005). Considering this combinatorial expression of CNR/Pcdh genes in the brain, genetic variations in the CNR/Pcdh genes within and between species may contribute to lineage-specific neurological characteristics.

    Supplementary Material

    Supplementary tables 1 and 2 and figure S1 are available at Molecular Biology and Evolution online (www.mbe.oupjournals.org).

    Acknowledgements

    We thank the Yagi laboratory members for their assistance. This work was supported by Grants-in-Aid from the Ministry of Education, Science, Sports, and Culture of Japan (T.Y.), the Uehara Memorial Foundation, the Mitsubishi Foundation, and CREST (Core Research for Evolutional Science and Technology) of JST (Japan Science and Technology Agency).

    References

    Birdsell, J. A. 2002. Integrating genomics, bioinformatics, and classical genetics to study the effects of recombination on genome evolution. Mol. Biol. Evol. 19:1181–1197.

    Bonhomme, F., J. Catalan, J. Britton-Davidian, V. M. Chapman, K. Moriwaki, E. Nevo, and L. Thaler. 1984. Biochemical diversity and evolution in the genus Mus. Biochem. Genet. 22:275–303.

    Esumi, S., N. Kakazu, Y. Taguchi, T. Hirayama, A. Sasaki, T. Hirabayashi, T. Koide, T. Kitsukawa, S. Hamada, and T. Yagi. 2005. Monoallelic yet combinatorial expression of variable exons of the protocadherin-alpha gene cluster in single neurons. Nat. Genet. 37:171–176.

    Festing, M. F. W. 1996. Origins and characteristics of inbred strains of mice. Oxford University Press, Oxford.

    Frazer, K. A., C. M. Wade, D. A. Hinds, N. Patil, D. R. Cox, and M. J. Daly. 2004. Segmental phylogenetic relationships of inbred mouse strains revealed by fine-scale analysis of sequence variation across 4.6 Mb of mouse genome. Genome Res. 14:1493–1500.

    Furuse, T., D. A. Blizard, K. Moriwaki, Y. Miura, K. Yagasaki, T. Shiroishi, and T. Koide. 2002. Genetic diversity underlying capsaicin intake in the Mishima battery of mouse strains. Brain Res. Bull. 57:49–55.

    Galtier, N. 2003. Gene conversion drives GC content evolution in mammalian histones. Trends Genet. 19:65–68.

    Guenet, J. L., and F. Bonhomme. 2003. Wild mice: an ever-increasing contribution to a popular mammalian model. Trends Genet. 19:24–31.

    Jaeger, J.-J., H. Tong, and C. Denys. 1986. The age of Mus-Rattus divergence: paleontological data compared with the molecular clock. C. R. Acad. Sci. Paris 302:917–922.

    Kimura, M. 1977. Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature 267:275–276.

    Kohmura, N., K. Senzaki, S. Hamada, N. Kai, R. Yasuda, M. Watanabe, H. Ishii, M. Yasuda, M. Mishina, and T. Yagi. 1998. Diversity revealed by a novel family of cadherins expressed in neurons at a synaptic complex. Neuron 20:1137–1151.

    Koide, T., K. Moriwaki, K. Ikeda, H. Niki, and T. Shiroishi. 2000. Multi-phenotype behavioral characterization of inbred strains derived from wild stocks of Mus musculus. Mamm. Genome 11:664–670.

    Koide, T., K. Moriwaki, K. Uchida, A. Mita, T. Sagai, H. Yonekawa, H. Katoh, N. Miyashita, K. Tsuchiya, T. J. Nielsen, and T. Shiroishi. 1998. A new inbred strain JF1 established from Japanese fancy mouse carrying the classic piebald allele. Mamm. Genome 9:15–19.

    Marais, G. 2003. Biased gene conversion: implications for genome and sex evolution. Trends Genet. 19:330–338.

    Miki, R., K. Hattori, Y. Taguchi, M. N. Tada, T. Isosaka, Y. Hidaka, T. Hirabayashi, R. Hashimoto, H. Fukuzako, and T. Yagi. 2005. Identification and characterization of coding single-nucleotide polymorphisms within human protocadherin-alpha and -beta gene clusters. Gene (in press).

    Murata, Y., S. Hamada, H. Morishita, T. Mutoh, and T. Yagi. 2004. Interaction with protocadherin-gamma regulates the cell-surface expression of protocadherin-alpha. J. Biol. Chem. 279:49508–49516.

    Noonan, J. P., J. Grimwood, J. Schmutz, M. Dickson, and R. M. Myers. 2004. Gene conversion and the evolution of protocadherin gene cluster diversity. Genome Res. 14:354–366.

    Ohta, T. 1990. How gene families evolve. Theor. Popul. Biol. 37:213–219.

    Patil, N., A. J. Berno, D. A. Hinds et al. (19 co-authors). 2001. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294:1719–1723.

    Petkov, P. M., Y. Ding, M. A. Cassell, W. Zhang, G. Wagner, E. E. Sargent, S. Asquith, V. Crew, K. A. Johnson, P. Robinson, V. E. Scott, and M. V. Wiles. 2004. An efficient SNP system for mouse genome scanning and elucidating strain relationships. Genome Res. 14:1806–1811.

    Renaud-Young, M., and W. J. Gallin. 2002. In the first extracellular domain of E-cadherin, heterophilic interactions, but not the conserved His-Ala-Val motif, are required for adhesion. J. Biol. Chem. 277:39609–39616.

    Silver, L. M. 1995. Mouse genetics. Oxford University Press, Oxford.

    Sugino, H., S. Hamada, R. Yasuda, A. Tuji, Y. Matsuda, M. Fujita, and T. Yagi. 2000. Genomic organization of the family of CNR cadherin genes in mice and humans. Genomics 63:75–87.

    Sugino, H., T. Toyama, Y. Taguchi, S. Esumi, M. Miyazaki, and T. Yagi. 2004a. Negative and positive effects of an IAP-LTR on nearby Pcdh gene expression in the central nervous system and neuroblastoma cell lines. Gene 337:91–103.

    Sugino, H., H. Yanase, S. Hamada, K. Kurokawa, S. Asakawa, N. Shimizu, and T. Yagi. 2004b. Distinct genomic sequence of the CNR/Pcdh genes in chicken. Biochem. Biophys. Res. Commun. 316:437–445.

    Suzuki, H., N. Miyashita, K. Moriwaki, R. Kominami, M. Muramatsu, T. Kanehisa, F. Bonhomme, M. L. Petras, Z. C. Yu, and D. Y. Lu. 1986. Evolutionary implication of heterogeneity of the nontranscribed spacer region of ribosomal DNA repeating units in various subspecies of Mus musculus. Mol. Biol. Evol. 3:126–137.

    Tada, M. N., K. Senzaki, Y. Tai, H. Morishita, Y. Z. Tanaka, Y. Murata, Y. Ishii, S. Asakawa, N. Shimizu, H. Sugino, and T. Yagi. 2004. Genomic organization and transcripts of the zebrafish protocadherin genes. Gene 340:197–211.

    Takeichi, M. 1990. Cadherins: a molecular family important in selective cell-cell adhesion. Annu. Rev. Biochem. 59:237–252.

    Tasic, B., C. E. Nabholz, K. K. Baldwin, Y. Kim, E. H. Rueckert, S. A. Ribich, P. Cramer, Q. Wu, R. Axel, and T. Maniatis. 2002. Promoter choice determines splice site selection in protocadherin alpha and gamma pre-mRNA splicing. Mol. Cell 10:21–33.

    Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.

    Wade, C. M., E. J. Kulbokas 3rd, A. W. Kirby, M. C. Zody, J. C. Mullikin, E. S. Lander, K. Lindblad-Toh, and M. J. Daly. 2002. The mosaic structure of variation in the laboratory mouse genome. Nature 420:574–578.

    Wang, X., J. A. Weiner, S. Levi, A. M. Craig, A. Bradley, and J. R. Sanes. 2002. Gamma protocadherins are required for survival of spinal interneurons. Neuron 36:843–854.

    Waterston, R. H., K. Lindblad-Toh, E. Birney et al. (217 co-authors). 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.

    Wu, Q., and T. Maniatis. 1999. A striking organization of a large family of human neural cadherin-like cell adhesion genes. Cell 97:779–790.

    Wu, Q., T. Zhang, J. F. Cheng, Y. Kim, J. Grimwood, J. Schmutz, M. Dickson, J. P. Noonan, M. Q. Zhang, R. M. Myers, and T. Maniatis. 2001. Comparative DNA sequence analysis of mouse and human protocadherin gene clusters. Genome Res. 11:389–404.

    Yanase, H., H. Sugino, and T. Yagi. 2004. Genomic sequence and organization of the family of CNR/Pcdhalpha genes in rat. Genomics 83:717–726.

    Yonekawa, H., K. Moriwaki, O. Gotoh, J. I. Hayashi, J. Watanabe, N. Miyashita, M. L. Petras, and Y. Tagashira. 1981. Evolutionary relationships among five subspecies of Mus musculus based on restriction enzyme cleavage patterns of mitochondrial DNA. Genetics 98:801–816.

    (Yusuke Taguchi*,,, Tsuyos)