Predicting Mammalian SINE Subfamily Activity from A-tail Length
     Tulane Cancer Center, and Department of Epidemiology, Tulane University Health Sciences Center, New Orleans, Louisiana

    E-mail: pdeinin@tulane.edu.

    Abstract

    Based on previous observations that newly inserted LINEs and SINEs have particularly long 3' A-tails, which shorten rapidly during evolutionary time, we have analyzed the rat and mouse genomes for evidence of recently inserted SINEs and LINEs. We find that the youngest predicted subfamilies of rodent identifier (ID) elements, a rodent-specific SINE derived from tRNAAla, are preferentially associated with A-tails over 50 bases in the rat genome, as predicted. Furthermore, these studies detected a subfamily of ID elements that has made over 15,000 copies that is younger than any previously reported ID subfamily. We use PCR analysis of genomic loci to demonstrate that all subfamily members tested inserted after the divergence of Rattus norvegicus from Rattus rattus. We also found evidence that the rodent B1 family of elements is much more active currently in mouse than in rat. These data provide useful estimates of recent activity from all of the mammalian retrotransposons, as well as allowing identification of the most recent insertions for use as population and speciation markers in those species. Both the current rat ID and mouse B1 elements that are active have small, specific interruptions in their 3' A-tail sequences. We suggest that these interruptions stabilize the length of the A-tails and contribute to the activity of these subfamilies. We present a model in which the dynamics of the 3' A-tail may be a central controlling factor in SINE activity.

    Key Words: SINE • mobile element • retrotransposition • rodent • subfamily • microsatellite

    Introduction

    Of the 3 × 10^9 base pairs in mammalian genomes, only 3% to 5% encode exons. The remainder is composed of introns and intergenic regions, with transposable elements occupying between 33% and 45% of mammalian genomes (Lander et al. 2001; Smit 1996; Waterston et al. 2002). This is likely to be an underestimate of the contribution of mobile elements to the genome, as many of these elements have diverged over evolutionary time to the point where they are unrecognizable.

    In mammals, the most active mobile elements are non-LTR retrotransposons. The primary autonomous retrotransposon currently active in mammalian genomes appears to be L1 elements (Lander et al. 2001; Waterston et al. 2002; Deininger et al. 2003; Kazazian 1998; Ostertag and Kazazian 2001).

    SINEs (short interspersed repetitive DNA elements) are the most abundant of mammalian mobile elements. These elements, usually 90 to 300 bp in length, are classified as nonautonomous because they are thought to be dependent on LINE elements for their mobility (Ogiwara et al. 1999; Dewannieux, Esnault, and Heidmann 2003). Nearly all SINEs, with the exception of the human Alu and rodent B1 elements (Ullu and Tschudi 1984), are ancestrally derived from tRNA genes (Daniels and Deininger 1985b; Sakamoto and Okada 1985; Okada 1991). ID elements (originally termed R.dre.1) are members of a major SINE repetitive DNA family within the rodent genome and are believed to be derived from an alanine tRNA gene (Daniels and Deininger 1985a; Deininger and Slagel 1988; Sakamoto and Okada 1985). Features of SINEs include an internal RNA polymerase III promoter, an adenine-rich 3' region (which we will term A-tail in this article) of variable length, and flanking direct repeats created at the site of insertion.

    The rat ID family has been classified into subfamilies by diagnostic nucleotide changes (Daniels and Deininger 1985b; Kim et al. 1994). These subfamilies are of differing average ages. The oldest subfamily, ID1, parallels the evolution of the single-copy BC1 RNA locus throughout all rodent genomes. The other subfamilies are rat specific and are characterized by at least one shared base change. The latter subfamilies show a relatively young evolutionary age, ranging from 5 Myr to 1.9 Myr, with their relative ages ID2 > ID3 > ID4. The newer families of ID in rat do not follow the evolution of the BC1 locus and appear to have created a new, and much more actively amplifying, lineage (Kim and Deininger 1996; Kass, Kim, and Deininger 1996).

    Most SINE elements are thought to be incapable of retroposition, and only a relatively few active elements are responsible for the activity at any given time (Deininger et al. 1992; Schmid and Maraia 1992; Deininger et al. 1996). The features that separate the active copies from their inactive family members have not been experimentally demonstrated. However, circumstantial evidence suggests that a primary moderator of this activity is the length of the homopolymeric adenine flanking regions (Roy-Engel et al. 2002). For instance, the BC1 locus has served as a master locus for rodent ID elements (Kim et al. 1994), and the copy number of those elements in most rodent genomes parallels the length and homogeneity of the A-rich region at the 3' end of the BC1 locus in the respective genomes.

    In this article, we explore the ability to identify recently active ID subfamilies within the Rattus norvegicus genome by assessing the length of the A-tails on the elements. In addition, we have shown through genomic PCR analysis that a group of these elements were recently inserted within the R. norvegicus genome, while being absent from the R. rattus genome. Lastly, we present a model that may explain the observed pattern of SINE subfamily activity through time.

    Materials and Methods

    Polymerase Chain Reaction of Genomic Loci

    Rodent DNAs (Rattus norvegicus and Rattus rattus) were kindly provided by Drs. Anthony V. Furano (Section on Genomic Structure and Function, Laboratory of Molecular and Cellular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, Maryland) and Stefan Rothenburg (Institut für Immunologie, Universitätsklinikum Eppendorf, Hamburg, Germany). For rodent speciation, we adhere to the nomenclature and taxonomy presented by the Smithsonian Institution (Musser 1993). We amplified individual loci by polymerase chain reaction (PCR) using 50 ng of total genomic DNA as template. PCR primers (table 1) for flanking regions of each ID element locus of the 47/51 (ID4b) subtypes were designed using Primer3 software (Whitehead Institute for Biomedical Research, Cambridge, Mass.; http://genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi). The predicted PCR primers were screened against the GenBank nonredundant database to eliminate primers that hybridized at multiple locations in the genomic sequence. PCR reactions were performed in volumes of 25 μl using an MJ Research PTC 200 thermal cycler with the following conditions: 1× buffer (Promega), 1.5 mM MgCl2, 200 μM dNTPs, 0.25 μM for each primer, and 1.5 U Taq polymerase at 94°C for 2 min, then 94°C for 60 s, X°C for 45 s (X is the annealing temperature indicated in table 1), and 72°C for 1.5 min for 30 cycles, followed by 72°C for 20 min. The final PCR products were analyzed by agarose gel electrophoresis.

    Table 1 Primers Used for PCR Analysis of Rat ID Loci

    DNA sequences were obtained from PCR products of both R. norvegicus and R. rattus by cloning into TOPO TA cloning vector (Invitrogen), and three clones were sequenced for each using an ABI 3100 automated DNA sequencer.

    Bioinformatics

    To identify repetitive elements with long A-tails, Blast searches (BlastN) (Altschul et al. 1990) were performed with a homopolymer (A50) as the query sequence on the draft genome databases (rat [build 2], mouse [build 30], or human [build 33]) with the low complexity filter disabled and the maximum number of results returned (-v and -b flags) set at 1000. Expect values of 5e-19, 6e-19, and 6e-19 were used for the mouse, rat, and human genomes, respectively, to eliminate all but perfect matches. Elements containing a homopolymeric stretch of 50 adenine residues were tabulated and extracted, along with adjacent flanking unique DNA sequence of approximately 1 kb from GenBank accessions, and analyzed by RepeatMasker2 (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker).
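
    The search described above was done with BlastN. As a rough illustration of what it retrieves, the following sketch scans a sequence directly for uninterrupted runs of at least 50 A residues and extracts each run with flanking sequence. It is a simplified stand-in, not the procedure used here: the function names and the toy sequence are hypothetical, and only the given strand is scanned (runs of T on the opposite strand would need a second pass).

```python
import re

def find_a_runs(sequence, min_length=50):
    """Return (start, end) spans of uninterrupted A runs of at least min_length bases."""
    pattern = re.compile(r"A{%d,}" % min_length)
    return [match.span() for match in pattern.finditer(sequence.upper())]

def flanked_region(sequence, span, flank=1000):
    """Return the run plus up to `flank` bases on each side, clipped to the sequence ends."""
    start, end = span
    return sequence[max(0, start - flank):min(len(sequence), end + flank)]

# Toy sequence (hypothetical): one 55-base run qualifies, the 12-base run does not.
toy = "GGCT" + "A" * 55 + "TTGC" + "A" * 12 + "CG"
runs = find_a_runs(toy)
```

    With the default flank of 1000 bases, each extracted region corresponds to the approximately 1 kb of adjacent sequence that was passed to RepeatMasker.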

    ID Elements

    To identify potential subfamily relationships and copy numbers, the ID portion of individual elements, without its A-tail, was used to query the R. norvegicus genome. Blast searches were performed using a 1000 value for the number of one-line descriptions (-v) and alignments (-b), while including an "expect value" (-e) of 1e-40. This would allow the acquisition of all perfect matches, as well as those that were imperfect by a single nucleotide. These were then collated manually.

    The A-tails of individual elements were defined as the first adenine base after the ID body sequence to the first non-A base in the direct repeats that define the insertion. The means of the distributions of A-tail lengths were analyzed with the z-test.
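
    The tail definition and the comparison of means can be sketched as follows. The text does not specify the form of the z-test, so the version below (two-sample, unpooled variances, two-sided P value from the normal distribution) is an assumption, as are the function names and the coordinates `body_end` and `repeat_start`, which are taken to be known from an alignment.

```python
import math
import statistics

def a_tail_length(element, body_end, repeat_start):
    """Bases from the end of the ID body up to the first non-A base at or after repeat_start.

    body_end is the 0-based index where the tail begins; repeat_start is the
    0-based index where the flanking direct repeat begins (both assumed known).
    """
    seq = element.upper()
    pos = repeat_start
    while pos < len(seq) and seq[pos] == "A":
        pos += 1
    return pos - body_end

def two_sample_z(sample_a, sample_b):
    """Return (z, two-sided P) for the difference in means of two samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    z = (mean_a - mean_b) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

    Note that any interrupting bases between the body and the direct repeat are counted toward the length, and that leading A residues of the direct repeat are absorbed into the tail, following the definition above.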

    Mouse and Rat B1 Elements

    The B1 elements extracted from the A50 mouse search were aligned (DNA Star, Megalign version 5.0 Expert analysis software) to derive a consensus sequence for this group of elements. Insufficient A50 rat B1 elements were found for our purposes, so we used a search for the rat B1 A40 group. After the alignment of the rat and mouse consensus B1 sequences, the positions involving subfamily diagnostic positions, hypermutable CpG positions (Labuda and Striker 1989; Batzer et al. 1990), and a mouse-specific diagnostic (GA) were replaced with N's to create a generic consensus sequence (TGGTGGNNCANNCCTTTAATCCCAGCACTNNGGAGGCAGAGGCAGGNNGATTTCTGAGTTNNAGGCCAGCCTGGTCTACGAAGTNANTTCCAGGACAGCCAGGGCTACACAGAGAAACCCTGTCTC). This query sequence should show minimal search bias arising from species or subfamily characteristics, allowing a better estimation of insertion time based on divergence from this common consensus. The Blast program was then run with this query sequence on both the rat and mouse genomes, using various expect values and with the low complexity filter disabled.
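
    The masking step can be illustrated with a short sketch. The helper names and the toy sequences are hypothetical, and treating N as a wildcard when counting mismatches is an assumption of this example; it is not necessarily how BlastN scores an N in the query.

```python
def mask_positions(consensus, positions):
    """Replace the bases at the given 0-based positions with N."""
    bases = list(consensus)
    for index in positions:
        bases[index] = "N"
    return "".join(bases)

def mismatches(query, target):
    """Count mismatches between equal-length sequences, ignoring N positions in the query."""
    if len(query) != len(target):
        raise ValueError("sequences must have the same length")
    return sum(1 for q, t in zip(query, target) if q != "N" and q != t)

# Toy example (hypothetical): mask two diagnostic positions, then score a copy.
generic = mask_positions("ACGTACGT", [1, 5])
divergence = mismatches(generic, "ATGTAGGA")
```

    Because the masked positions do not contribute, differences that merely reflect species or subfamily identity are not counted as divergence from the consensus.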

    Results

    Long A-tails in Rodent and Human Genomes

    Previous studies demonstrated an inverse correlation between Alu element age and A-tail length (Roy-Engel et al. 2002). We wished to determine whether currently active subfamilies of SINEs could be identified from other genomes by searching the genome for elements associated with long A-tails. The rat, mouse, and human genomes were analyzed using a query sequence of 50 adenine residues (A50) in BlastN with an expect value that would only allow perfect matches for the presence of a 50 A-residue homopolymer. We identified 91, 121, and 119 such regions present in the rat, mouse, and human genomes, respectively. RepeatMasker was used to identify the repetitive elements that were adjacent to the retrieved long A-tails (table 2). A much higher proportion of the human long A-tails was associated with mobile elements than in rodents. Furthermore, there were no human L1 elements associated with A50. The majority of the retrieved A50 were associated with Alu elements in humans, whereas in rodents, there were relatively more L1 elements and fewer SINE elements. The rodent SINE element distribution showed a significant variation between the two species, with ID elements representing the majority in rat and B1 elements representing the majority in mouse.

    Table 2 Summary of Mobile Elements with A-tails Longer Than A50

    Characterization of the Rat A50 ID Elements

    The 22 ID elements associated with long A-tails in rat were further classified into subfamilies using the known diagnostic positions (Kim and Deininger 1996) (table 3). The elements were numbered arbitrarily, corresponding to the original A50 retrieved sequence. Ten of the 22 ID/A50 elements identified were classified as ID4, 1 as ID3.5, 2 as ID3, and 9 as ID2a. Two elements were too divergent for subfamily identification. We eliminated these latter two elements and the ID2a subfamily elements from further analysis because they were also generally more divergent and showed no additional identical copies in the genome. The elements 47, 39, and 51 all represented an identical sequence that diverged from the type 4 consensus (fig. 1 [Kim and Deininger 1996]) only at position 46. These three elements differed from one another only within the A-rich region, with element 39 having a homopolymeric A-tail and elements 47 and 51 having (A)nGAACC(A)50 with n being 7 and 8, respectively (see figure 1). We will refer to the ID elements related to the elements 39, 47, and 51 (table 3), with the G to A change at position 46, as being subfamily ID4b. There were over 15,000 perfect matches to the ID4b consensus. Of these, 90 #39-like elements had only simple A-tails, and 681 #47-like elements and 156 #51-like elements had A-tails matching the GAACC interruption (table 3). Visual inspection of the A-tails on the rest of the perfect ID4b matches revealed that the vast majority either had minor variations in the number of A residues before the interrupting sequence or had minor variations in the interrupting sequence (i.e., GAACCC). The element 77 also had about 7,000 perfect matches, whereas the other A50 ID elements had very few, if any, exact copies. These data suggest that the majority of these A50 ID elements are not actively amplifying but that two distinct subfamilies (ID4b and those related to element 77) seem to have amplified recently.
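
    The distinction between simple and GAACC-interrupted A-tails lends itself to a small pattern-matching sketch. The classifier below is illustrative only; the category labels and function name are hypothetical, and the tails in this study were collated manually. It assumes the tail has already been trimmed from the ID body and the direct repeat.

```python
import re

# (A)n GAACC (A)m, with both A runs at least one base long.
INTERRUPTED = re.compile(r"^(A+)GAACC(A+)$")

def classify_tail(tail):
    """Return (category, n), where n is the leading A count or the simple tail length."""
    tail = tail.upper()
    if tail and set(tail) == {"A"}:
        return ("simple", len(tail))
    match = INTERRUPTED.match(tail)
    if match:
        return ("GAACC-interrupted", len(match.group(1)))
    return ("other", None)
```

    Under this scheme an element 47-like tail is reported with n = 7 and an element 51-like tail with n = 8, while minor variants of the interruption such as GAACCC fall into the "other" category and would need a looser pattern.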

    Table 3 A50 Subfamily Designation and Copy Number

    FIG. 1.— Model of ID element structure. The ID body portion of the element consists of a 75-nucleotide core sequence with diagnostic base changes used to divide individual rat ID elements into four major subfamilies. Position 1 is associated with ID 3.5 and 4; position 2 is associated with ID2c; position 3 is associated with ID2b; position 4 is associated with ID4; position 5 is associated with ID 3, 3.5, and 4; position 6 is associated with ID 2a through 4; and position 7 is associated with ID1. ID subfamilies 2 to 4 have a unique sequence motif in the rat (GAACC) embedded within the oligo-(dA) tail at approximately positions 83 to 87. This GAACC motif is not found in all members of each subfamily. Black arrowheads indicate short flanking direct repeats formed at the position of insertion. The length of both A-regions and the exact sequence of the GAACC motif vary between individual elements.

    Analysis of Active SINE Loci Within the Rattus norvegicus Genome

    To confirm that the individual loci from the ID4b members represent a recently amplified subfamily, we PCR-amplified individual loci from R. norvegicus and R. rattus DNA. All of the 16 amplified loci were present in R. norvegicus but absent from the R. rattus genome (fig. 2) and, therefore, were inserted in the past 2 Myr (Verneau, Catzeflis, and Furano 1998). One of the loci (Rn 6.2, [fig. 2b]) amplified two bands from the R. norvegicus DNA. Sequence analysis demonstrated that the upper band contained the expected ID element, and the lower bands that were present in both genomes were the preintegration sequence. Thus, this ID element was found to be polymorphic within the R. norvegicus genome while being absent from the R. rattus genome, indicating a very recent insertion.

    FIG. 2.— Genomic PCR of ID4b loci. PCR analysis of selected loci of R. norvegicus and R. rattus to determine the presence/absence of the ID4b insert. Primers for randomly selected ID elements related to elements 47 and 51 were designed (table 1) from the R. norvegicus (Rn) sequence obtained from the original query. PCR products for the R. norvegicus and R. rattus (Rr) show an approximately 100-bp variation in size, indicative of ID insertions in Rn but absent from Rr. The subpanel demonstrates the presence of two bands in the Rn, with a single band in the Rr sample using the primer set 6.2. After the cloning of these PCR products, sequence analysis revealed the presence of the ID element in band 1 (the upper band) in Rn but its absence from bands 2 and 3 in Rn and Rr, respectively. Thus, the ID4b subfamily represents only very young, Rn-specific insertions, at least one of which is polymorphic in Rn.

    A-tail Distribution of the ID4b Elements

    We examined the A-tails of individual genomic loci of the ID4b subfamily members and found that they had a significantly different mean length (P < 0.01), mostly differing in the number of elements with shorter A-tails, comparable with the distribution observed for the most active human Alu family members, Ya5 (fig. 3) (Roy-Engel et al. 2002). Most of the analyzed ID4b subfamily members begin their A-rich region with A7/8GACCAn, as is commonly seen in many recent ID subfamilies (Kim and Deininger 1996). This suggests that the retrotransposition priming mechanism frequently occurs downstream of the GAACC interruption for this class of elements. We added the A7/8GACC sequence as part of the length of the A-rich regions because it has been shown that a similarly interrupted A-tail region of the BC1 RNA is still capable of binding poly(A) binding protein (West et al. 2002; Muddashetty et al. 2002) and supporting retrotransposition (Hagan, Sheffield, and Rudin 2003). If this region were not considered, the A-tail size distributions (fig. 3) would be shifted 11 bases to the left. The relatively inactive ID1 elements, derived from random selection of this older subfamily, show a much broader range with a bias toward the smaller lengths.

    FIG. 3.— Distribution of SINE A-tail lengths. A-tail lengths of young Alu (Ya5), ID-1 (older) elements, and ID elements of the ID4b subfamily were determined by analyzing random copies from the H. sapiens (Ya5) and R. norvegicus (ID-1 and ID4b) genomes. The length of the A-tail was considered as the number of bases between the last nucleotide of the consensus sequence before the homopolymeric adenine region up to the first non-A base in the direct repeat. The GAACC motif may help the elements maintain a long, although imperfect, A-tail. The Ya5 profile is from a previous study (Roy-Engel et al. 2002), and ID1 subfamily members are included to show the distribution of A-tail lengths from an older ID subfamily.

    Mouse B1/A50 Distribution

    Previous studies demonstrated that rat had a much higher ID amplification rate than mouse (Kim et al. 1994; Kass, Kim, and Deininger 1996). Our analysis of the mouse genome with the same A50 query detected only three ID elements with long A-tails. However, B1 elements with long A-tails were more prevalent in mouse, while being almost absent in rat (table 2). These data suggested the possibility that B1 elements may currently be much more active in mouse than rat.

    We utilized a generic rat/mouse B1 consensus to query these genomes under high stringency to allow only close matches to the consensus, to corroborate the observation that there is a higher proportion of young B1 elements in mouse than in rat. Using an expect value of 1e-24, there were 5,668 Blast sequence matches within this parameter for the mouse genome, compared with 409 Blast matches for the rat genome. Similar proportions were also obtained when using alternative expect values that would allow different mismatch levels relative to the consensus. Thus, the mouse genome contains a higher proportion of B1 elements with high similarity to the consensus, consistent with a higher recent rate of amplification.
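
    As a quick check of the magnitude involved, the ratio of the two hit counts reported above can be computed directly. The helper name is hypothetical, and the raw ratio is not normalized for genome size or total B1 copy number.

```python
def fold_difference(hits_a, hits_b):
    """Return how many times larger hits_a is than hits_b."""
    return hits_a / hits_b

mouse_hits = 5668  # Blast matches in mouse at an expect value of 1e-24
rat_hits = 409     # Blast matches in rat at the same expect value
fold = fold_difference(mouse_hits, rat_hits)
print(f"{fold:.1f}-fold more close matches in mouse")  # prints 13.9-fold
```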

    We also noted that the B1 elements selected with the A50 query included non-A bases in the consensus of their A-tail region (table 4). They routinely contained a short run of A residues (four to six) followed by one to three C residues and the An stretch in the majority of cases.
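
    The tail structure just described (four to six A residues, one to three C residues, then a run of A residues) can be written as a regular expression. The sketch below is a hypothetical helper for picking out tails of that form from already trimmed tail sequences; it is not the method used to compile table 4.

```python
import re

# Four to six A, then one to three C, then at least one A.
B1_TAIL = re.compile(r"^(A{4,6})(C{1,3})(A+)$")

def parse_b1_tail(tail):
    """Return (leading A count, C count, trailing A count), or None if the motif is absent."""
    match = B1_TAIL.match(tail.upper())
    if match is None:
        return None
    return tuple(len(group) for group in match.groups())
```

    A purely homopolymeric tail, or one with more than six A residues before the C residues, returns None under these bounds.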

    Table 4 A-tail Examples from Recent Mouse B1 Insertions

    Discussion

    Human Alu elements have been demonstrated to show a very strong correlation between the number of elements with long A-tails and the recent activity of those subfamilies of elements (Roy-Engel et al. 2002). We have extrapolated this result to demonstrate that we are able to predict the most recently active ID subfamilies, as judged by their species specificity and polymorphism, from the rat genome. Identification of the currently active subfamilies allows a simple approach for finding elements that are most likely to be polymorphic or that allow discrimination of recent population or evolutionary events (Watkins et al. 2003; Deininger et al. 2003; Nikaido et al. 2001). This type of analysis is likely to be effective on any genome containing the nonstringent, nonautonomous SINEs that are typical of mammalian genomes. Although similar SINEs are found sporadically throughout other phyla, not all of them necessarily utilize a homogeneous A-tail (Kaukinen and Varvio 1992). Even those that do utilize an A-tail in their insertion process may vary in the required length of the A-tail. Within the appropriate class of SINEs, however, finding the most recently active elements is also likely to prove valuable in tracking the evolution of relatively understudied SINE species in those genomes and at least crudely estimating the current activity of those elements in the genome.

    Long A-tails in Rodent Elements

    ID elements have amplified at a high rate specifically within the rat genome, resulting in approximately 10 times as many copies as in the mouse genome (Kim et al. 1994). In the rat, only about 10% of ID elements appear to have been generated by the BC1 master gene (Kim et al. 1994). Thus, although the BC1 RNA gene has dominated the evolution of ID family members in most rodent genomes (Kass, Kim, and Deininger 1996), the rat genome has new subfamilies of elements that have diverged from the BC1 RNA locus. The A50 query identified elements that were dominated by the youngest of these subfamilies, ID4 (table 3). In particular, three of these ID elements were identical in sequence to over 15,000 elements, suggesting that they represented a minor variant of ID4, termed ID4b, that is a currently active subfamily. This current activity was confirmed by identifying that 16/16 of the members of this new subfamily of elements found in the Rattus norvegicus genomic sequence were detected by PCR in Rattus norvegicus but not in Rattus rattus. This suggests that they have all inserted within the last 2 Myr. One of the elements, which matches the #47 group of ID4b elements, was even polymorphic for its presence in Rattus norvegicus, suggesting extremely recent insertion. Thus, this approach appears to have identified at least some of the more active ID elements in the rat genome.

    The mouse genome contained 24 long A-tailed B1 elements, whereas the rat had only one. Our observations predict that B1 elements have been more active recently in mouse than rat. This was strongly supported by finding that more than an order of magnitude more mouse elements were found to be a close match to a generic B1 consensus relative to rat B1 elements. There has also been a recent report of a mutation in mouse caused by a very recent B1 insertion (Gilbert et al. 2004). Thus, it appears that in parallel with the ID expansion in the rat, there was a similar increase of B1 elements in mouse. L1 elements showed fairly equal numbers of elements with long A-tails in the rodent genomes in this analysis, and there were only very low levels of B2 or B4 elements identified with long A-tails, suggesting only modest activities of those SINEs.

    It is possible that the parallel expansion of ID and B1 in rat and mouse genomes, respectively, reflects only a limited capacity for SINE retroposition in those genomes and competition for those levels between different SINE species. The ID elements may compete more effectively in rat and the B1 in mouse. However, we favor an explanation that involves stochastic change of either ID or B1 elements in those respective genomes that resulted in changes in their amplification potential, as discussed below.

    Interrupted A-tails As a Mechanism for Increased Amplification

    We have hypothesized that SINE amplification rates are proportional to the length of the A-tail on individual elements (Roy-Engel et al. 2002). We note that the distributions of A-tails for the recently active ID subfamilies are similar in length to those from the active human Alu subfamilies (fig. 3). In fact, the ID elements are longer on the average because they do not contain as many elements on the shorter end of the distribution. However, the A-tails calculated in figure 3 also include the length of the A7GAACC sequence present in most of the recent ID subfamilies. The BC1 RNA gene has an A-rich stretch that is interrupted by a number of other bases, and yet it has served as the master gene for amplification of ID elements throughout rodent evolution (Shen, Batzer, and Deininger 1991; Kim et al. 1994) and is capable of supporting retroposition in a cultured model system (Hagan, Sheffield, and Rudin 2003). This demonstrates that an A-rich sequence is capable of supporting retroposition, and the adenine residues do not necessarily have to be homopolymeric. Furthermore, we have demonstrated that the BC1 imperfect A-tail is still capable of forming an RNP with poly(A) binding protein (West et al. 2002). The interruptions in the A-tails are likely to result in an evolutionary stabilization of the length of the A-tail, as has been observed for other microsatellites (Bacon, Farrington, and Dunlop 2000; Rolfsmeier and Lahue 2000). Simple microsatellites, including A homopolymers, are extremely unstable. Thus, the initially long A-homopolymers rapidly degrade to shorter lengths. If these shorter lengths are less capable of amplification, then the elements containing them would not propagate as well. If the interruptions in the A-homopolymers result in their stabilization, then the elements containing them are likely to maintain their amplification capability, resulting in the spread of the interrupted A-tails. It seems likely that an imperfect A-tail would not be as favorable as a perfect A-tail of the same length but that if it increases the length by 11 bases per element on the average, it may provide a selective advantage to those elements.

    The A-tails on many of the recently amplified mouse B1 elements also included an interruption, with a consensus of A5CCAn (table 4). Although there is some diversity, consistent with the rapid evolution of simple A-tails to more complex microsatellites (Arcot et al. 1995), they almost all share the A5CCAn motif. A minor variant of this motif, A6CCAn, is also included in the B1 element that resulted in the jittery mouse strain (Gilbert et al. 2004), as well as its two possible progenitors. These data suggest the possibility that the same strategy has been utilized independently by several active SINE families to amplify more effectively in rodent genomes.

    There may be some basic property of rodent genomes that favors the interrupted A-tail strategy over human. For instance, if homopolymeric A-tails are less stable in rodent than human, it might be necessary to have the interruptions to help provide stability, thus maintaining SINE activity in rodents. Despite the observation that rat microsatellites are generally longer than those in human, homopolymeric adenine sequences are much less common in rat than human (0.09 versus 0.34%, respectively) (Beckman and Weber 1992). These data are consistent with the possibility that SINEs in rodents have more difficulty maintaining a sufficiently long A-tail and have evolved the A-tail interruptions to avoid tail shortening.

    A Model for the Genomic Balance of SINE Retrotransposition

    The finding of long A-tails associated with the most recent SINE inserts can be explained by the findings that Alu element insertions have longer stretches of A residues at their 3' end than the element that served as the source of the insertion event (Dewannieux, Esnault and Heidmann 2003; Hagan, Sheffield, and Rudin 2003). However, both of these studies utilized elements with A-tails approximately 50 bases in length and do not provide data on the relative influence of A-tail length and amplification rate. Furthermore, it has been demonstrated that most of the SINE RNA transcripts are from inactive subfamilies of SINEs (Shaikh et al. 1997; Liu et al. 1994; Sinnett et al. 1992) and that even older, inactive Alu subfamilies can be made active by overexpressing them with a long A-tail (Hagan, Sheffield, and Rudin 2003). Along with other data previously presented (Roy-Engel et al. 2002), it is likely that long A-tails are beneficial for SINE amplification.

    SINE insertion depends on L1 element activity (Dewannieux, Esnault, and Heidmann 2003). Recently inserted L1 elements have also been shown to have long A-tails that rapidly shrink in size (Ovchinnikov, Troxel, and Swergold 2001; Roy-Engel et al. 2002). However, the A-tail is generated by a true polyadenylation process and, therefore, reflects recent activity but is less likely to reflect the potential of the element to maintain activity. However, the lack of long A-tailed L1 elements in human compared with the similar numbers in rat and mouse would be consistent with the much higher number of active L1 elements that have been estimated to be present in the mouse genome (Goodier et al. 2001; Kazazian 2000; Naas et al. 1998). It would be reasonable to expect long A-tails at the end of an L1 element to show the same instability as those in the SINEs. We cannot directly estimate the relative activity of human SINEs versus rodent SINEs from our data, because we do not know that A-tails follow similar stabilities in those genomes. However, our data strongly suggest that Alu is much more effective relative to L1 elements in human than are the rodent SINEs relative to rodent L1 elements. This may be consistent with hypotheses that human Alu elements have specific properties that make them particularly effective at amplifying (Dewannieux, Esnault, and Heidmann 2003; Schmid and Maraia 1992). In addition, L1 elements have been proposed to gradually evolve to avoid co-opting of the L1 proteins by SINEs for their own purposes. Thus, there may be periods of evolution in which the existing SINEs interact less effectively with the LINE retrotransposition apparatus or other cellular proteins. The SINEs may then evolve by selection for those elements that can interact better.

    We propose a model (fig. 4) in which the rate of retrotransposition is primarily controlled by the length of the A-tails of the SINE elements. In this model, we show that the natural equilibrium for the A-tails of SINEs favors them shrinking to a modest length of about 20 bases (Roy-Engel et al. 2002). We have found a few examples, however, where existing A-tails of Alu elements sporadically grew to a long length. Thus, we believe that existing elements in the genome would have a strong tendency to lose amplification potential but might occasionally be reactivated until other parts of the SINE mutate sufficiently to eliminate amplification. The primary means of creating long A-tails on SINEs seems to be through the retrotransposition process itself (Dewannieux, Esnault, and Heidmann 2003; Hagan, Sheffield, and Rudin 2003). Thus, if SINE retrotransposition rates are high, lots of elements with long A-tails will be created, resulting in more active retrotransposition. Factors that destabilize the length of homopolymeric adenines in a genome will also result in suppression of SINE activity. Alternatively, factors that stabilize the length of the A-rich region, such as interruptions in the homopolymeric A-tail, would tend to maintain amplification rate.
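
    The qualitative behavior of this model can be illustrated with a toy calculation. Everything numerical below is an assumption made for illustration: the shrink rates, the starting length of 60 bases, the 40-base activity threshold, and the simple rule that each time step closes a fixed fraction of the gap to the equilibrium length. Only the equilibrium of about 20 bases comes from the text.

```python
def tail_trajectory(start_length, shrink_rate, steps, equilibrium=20.0):
    """Tail length over time when each step closes a fixed fraction of the gap to equilibrium."""
    lengths = [float(start_length)]
    for _ in range(steps):
        current = lengths[-1]
        lengths.append(current - shrink_rate * (current - equilibrium))
    return lengths

def steps_above(trajectory, threshold):
    """Number of leading time points at which the tail is still at least `threshold` bases."""
    count = 0
    for length in trajectory:
        if length < threshold:
            break
        count += 1
    return count

# Hypothetical rates: an interrupted tail is assumed to shrink more slowly.
homopolymer = tail_trajectory(60, shrink_rate=0.20, steps=50)
interrupted = tail_trajectory(60, shrink_rate=0.05, steps=50)
```

    Comparing `steps_above(homopolymer, 40)` with `steps_above(interrupted, 40)` shows the slower-shrinking tail staying above the threshold for longer, which is the sense in which a stabilizing interruption would tend to maintain amplification potential in this model.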

    FIG. 4.— SINE A-tail Cycle. We propose a model in which the rate of retrotransposition is influenced by the length of the A-tails of the SINE elements. In this model, new A-tail length increases with retrotransposition, as has been demonstrated recently (Dewannieux, Esnault, and Heidmann 2003; Hagan, Sheffield, and Rudin 2003), which makes them "potentially active." The natural equilibrium for the A-tails of SINEs favors them shrinking to a modest length of about 20 bases (negative instability) as has been shown previously (Roy-Engel et al. 2002), leading to "inactive SINEs." In rare cases, the A-tails on the older elements may also grow (positive instability), but the equilibrium is clearly towards smaller A-tails. The heavier arrows suggest those processes that are likely to have the most impact; however, the relative contributions may vary. Interruptions of the A-tail, such as the presence of the GAACC motif, may result in the increased genomic stability of the A-tail itself, enhancing the longevity or amplification potential of ID elements with long A-tails. The level of instability may also be different in different species or even in different chromosomal regions. If SINE retrotransposition rates are high, many elements with long A-tails will be created, resulting in more active retrotransposition. We should note, however, that not all newly created elements will be active, and their genomic environment or mutations may silence some of the loci immediately. Stimulatory factors (positive retroposition) might also include relative L1 activity, factors that stimulate SINE RNA production or the ability of SINE elements to interact with L1 retrotransposition machinery. Further, genomes evolve strategies to suppress mobile elements periodically. It has been proposed that both methylation and RNAi types of activities are involved in suppressing mobile elements (negative retroposition). Thus, whether a SINE family continues to amplify depends on a balance of these factors and other, as yet undefined, genetic and environmental influences on the amplification rate.

    The amplification of SINEs is unlikely to depend solely on L1 activity and on the number of elements with long A-tails. Genomes may periodically evolve strategies to suppress mobile elements. Both methylation and RNAi types of activities have been suggested to be involved in suppressing mobile elements (Carnell and Goodman 2003; Robertson and Wolffe 2000; Sijen and Plasterk 2003; Dawe 2003). Thus, whether a SINE family continues to amplify depends on a balance of these factors. Whether elements with long A-tails are accumulating, or are shrinking faster than they are being replaced, will determine whether the rate of amplification rises or the family dies off. Alu is an example of an element that went through a massive increase 40 to 50 MYA but is in serious decline at present (Lander et al. 2001; Shen, Batzer, and Deininger 1991). There are also examples of elements, such as MIR, that have only inactive copies left in the genome and have lost amplification capability completely, coinciding with the loss of L2 activity (Lander et al. 2001).
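
    The balance described here can be written as a simple rate equation. This is only a sketch under assumed first-order kinetics; the symbols are introduced for illustration and do not appear elsewhere in this work. Let N(t) be the number of elements with long A-tails, b the rate at which each such element gives rise to new long-tailed copies, and s the rate at which a long tail shrinks below the length needed for activity.

```latex
\frac{dN}{dt} = (b - s)\,N
\qquad\Longrightarrow\qquad
N(t) = N_0\, e^{(b - s)\,t}
```

    In this idealization the pool of potentially active elements expands when b > s and decays when b < s. Suppression by methylation or RNAi would act by lowering b, and stabilizing interruptions in the A-tail would act by lowering s.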

    Acknowledgements

    This research was supported by National Institutes of Health R01 GM45668 (PD). The authors would like to thank Drs. S. Rothenburg and A.V. Furano for generous gifts of rat DNAs, and Drs. A. Roy-Engel, S. Gasior, and V. Perepelitsa-Belancio for critical reading of the manuscript.

    References

    Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.

    Arcot, S. S., Z. Wang, J. L. Weber, P. L. Deininger, and M. A. Batzer. 1995. Alu repeats: a source for the genesis of primate microsatellites. Genomics 29:136–144.

    Bacon, A. L., S. M. Farrington, and M. G. Dunlop. 2000. Sequence interruptions confer differential stability at microsatellite alleles in mismatch repair-deficient cells. Hum. Mol. Genet. 9:2707–2713.

    Batzer, M. A., G. E. Kilroy, P. E. Richard, T. H. Shaikh, T. D. Desselle, C. L. Hoppens, and P. L. Deininger. 1990. Structure and variability of recently inserted Alu family members. Nucleic Acids Res. 18:6793–6798.

    Beckman, J. S., and J. L. Weber. 1992. Survey of human and rat microsatellites. Genomics 12:627–631.

    Carnell, A. N., and J. I. Goodman. 2003. The long (LINEs) and the short (SINEs) of it: altered methylation as a precursor to toxicity. Toxicol. Sci. 75:229–235.

    Daniels, G. R., and P. L. Deininger. 1985a. Integration site preferences of the Alu family and similar repetitive DNA sequences. Nucleic Acids Res. 13:8939–8954.

    ———. 1985b. Repeat sequence families derived from mammalian tRNA genes. Nature 317:819–822.

    Dawe, R. K. 2003. RNA interference, transposons, and the centromere. Plant Cell 15:297–301.

    Deininger, P. L., M. A. Batzer, C. A. Hutchison III, and M. H. Edgell. 1992. Master genes in mammalian repetitive DNA amplification. Trends Genet. 8:307–311.

    Deininger, P. L., J. V. Moran, M. A. Batzer, and H. H. Kazazian. 2003. Mobile elements and mammalian genome evolution. Curr. Opin. Genet. Dev. 13:651–658.

    Deininger, P. L., and V. K. Slagel. 1988. Recently amplified Alu family members share a common parental Alu sequence. Mol. Cell Biol. 8:4566–4569.

    Deininger, P. L., H. Tiedge, J. Kim, and J. Brosius. 1996. Evolution, expression, and possible function of a master gene for amplification of an interspersed repeated DNA family in rodents. Prog. Nucleic Acid Res. Mol. Biol. 52:67–88.

    Dewannieux, M., C. Esnault, and T. Heidmann. 2003. LINE-mediated retrotransposition of marked Alu sequences. Nat. Genet. 35:41–48.

    Gilbert, N., J. M. Bomar, M. Burmeister, and J. V. Moran. 2004. Characterization of a mutagenic B1 retrotransposon insertion in the jittery mouse. Hum. Mutat. 24:9–13.

    Goodier, J. L., E. M. Ostertag, K. Du, and H. H. Kazazian Jr. 2001. A novel active L1 retrotransposon subfamily in the mouse. Genome Res. 11:1677–1685.

    Hagan, C. R., R. F. Sheffield, and C. M. Rudin. 2003. Human Alu element retrotransposition induced by genotoxic stress. Nat. Genet. 35:219–220.

    Kass, D. H., J. Kim, and P. L. Deininger. 1996. Sporadic amplification of ID elements in rodents. J. Mol. Evol. 42:7–14.

    Kaukinen, J., and S. L. Varvio. 1992. Artiodactyl retroposons: association with microsatellites and use in SINEmorph detection by PCR. Nucleic Acids Res. 20:2955–2958.

    Kazazian, H. H. Jr. 1998. Mobile elements and disease. Curr. Opin. Genet. Dev. 8:343–350.

    ———. 2000. Genetics. L1 retrotransposons shape the mammalian genome. Science 289:1152–1153.

    Kim, J., and P. L. Deininger. 1996. Recent amplification of rat ID sequences. J. Mol. Biol. 261:322–327.

    Kim, J., J. A. Martignetti, M. R. Shen, J. Brosius, and P. Deininger. 1994. Rodent BC1 RNA gene as a master gene for ID element amplification. Proc. Natl. Acad. Sci. USA 91:3607–3611.

    Labuda, D., and G. Striker. 1989. Sequence conservation in Alu evolution. Nucleic Acids Res. 17:2477–2491.

    Lander, E. S., L. M. Linton, B. Birren et al. (252 co-authors) 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.

    Liu, W. M., R. J. Maraia, C. M. Rubin, and C. W. Schmid. 1994. Alu transcripts: cytoplasmic localisation and regulation by DNA methylation. Nucleic Acids Res. 22:1087–1095.

    Muddashetty, R., T. Khanam, A. Kondrashov et al. (11 co-authors) 2002. Poly(A)-binding protein is associated with neuronal BC1 and BC200 ribonucleoprotein particles. J. Mol. Biol. 321:433–445.

    Musser, G. G. 1993. Pp. 501–755 in M. D. Carleton, ed. Mammal species of the world: a taxonomic and geographic reference. Family Muridae. Smithsonian Institution Press, Washington DC.

    Naas, T. P., R. J. DeBerardinis, J. V. Moran, E. M. Ostertag, S. F. Kingsmore, M. F. Seldin, Y. Hayashizaki, S. L. Martin, and H. H. Kazazian. 1998. An actively retrotransposing, novel subfamily of mouse L1 elements. EMBO J. 17:590–597.

    Nikaido, M., F. Matsuno, H. Hamilton et al. (11 co-authors) 2001. Retroposon analysis of major cetacean lineages: the monophyly of toothed whales and the paraphyly of river dolphins. Proc. Natl. Acad. Sci. USA 98:7384–7389.

    Ogiwara, I., M. Miya, K. Ohshima, and N. Okada. 1999. Retropositional parasitism of SINEs on LINEs: identification of SINEs and LINEs in elasmobranchs. Mol. Biol. Evol. 16:1238–1250.

    Okada, N. 1991. SINEs. Curr. Opin. Genet. Dev. 1:498–504.

    Ostertag, E. M., and H. H. Kazazian Jr. 2001. Biology of mammalian L1 retrotransposons. Annu. Rev. Genet. 35:501–538.

    Ovchinnikov, I., A. B. Troxel, and G. D. Swergold. 2001. Genomic characterization of recent human LINE-1 insertions: evidence supporting random insertion. Genome Res. 11:2050–2058.

    Robertson, K. D., and A. P. Wolffe. 2000. DNA methylation in health and disease. Nat. Rev. Genet. 1:11–19.

    Rolfsmeier, M. L., and R. S. Lahue. 2000. Stabilizing effects of interruptions on trinucleotide repeat expansions in Saccharomyces cerevisiae. Mol. Cell Biol. 20:173–180.

    Roy-Engel, A. M., A. H. Salem, O. O. Oyeniran, L. Deininger, D. J. Hedges, G. E. Kilroy, M. A. Batzer, and P. L. Deininger. 2002. Active Alu element "A-tails": size does matter. Genome Res. 12:1333–1344.

    Sakamoto, K., and N. Okada. 1985. Rodent type 2 Alu family, rat identifier sequence, rabbit C family, and bovine or goat 73-bp repeat may have evolved from tRNA genes. J. Mol. Evol. 22:134–140.

    Schmid, C., and R. Maraia. 1992. Transcriptional regulation and transpositional selection of active SINE sequences. Curr. Opin. Genet. Dev. 2:874–882.

    Shaikh, T. H., A. M. Roy, J. Kim, M. A. Batzer, and P. L. Deininger. 1997. cDNAs derived from primary and small cytoplasmic Alu (scAlu) transcripts. J. Mol. Biol. 271:222–234.

    Shen, M. R., M. A. Batzer, and P. L. Deininger. 1991. Evolution of the master Alu gene(s). J. Mol. Evol. 33:311–320.

    Sijen, T., and R. H. Plasterk. 2003. Transposon silencing in the Caenorhabditis elegans germ line by natural RNAi. Nature 426:310–314.

    Sinnett, D., C. Richer, J. M. Deragon, and D. Labuda. 1992. Alu RNA transcripts in human embryonal carcinoma cells: model of post-transcriptional selection of master sequences. J. Mol. Biol. 226:689–706.

    Smit, A. F. 1996. The origin of interspersed repeats in the human genome. Curr. Opin. Genet. Dev. 6:743–748.

    Ullu, E., and C. Tschudi. 1984. Alu sequences are processed 7SL RNA genes. Nature 312:171–172.

    Verneau, O., F. Catzeflis, and A. V. Furano. 1998. Determining and dating recent rodent speciation events by using L1 (LINE-1) retrotransposons. Proc. Natl. Acad. Sci. USA 95:11284–11289.

    Waterston, R. H., K. Lindblad-Toh, E. Birney et al. (216 co-authors) 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.

    Watkins, W. S., A. R. Rogers, C. T. Ostler et al. (11 co-authors) 2003. Genetic variation among world populations: inferences from 100 Alu insertion polymorphisms. Genome Res. 13:1607–1618.

    West, N., A. M. Roy-Engel, H. Imataka, N. Sonenberg, and P. Deininger. 2002. Shared protein components of SINE RNPs. J. Mol. Biol. 321:423–432.

    (Guy L. Odom, Jennifer L. )