Evolution of small nucleolar RNAs in nematodes
ABSTRACT
In contrast to mRNAs, which are templates for translating proteins, non-protein coding (npc) RNAs (also known as
‘non-coding’ RNAs, ncRNAs) exhibit various functions in different compartments and developmental stages of the cell. Small
nucleolar RNAs (snoRNAs), one of the largest classes of npcRNAs, guide post-transcriptional modifications of other RNAs that
are crucial for appropriate RNA folding as well as for RNA–RNA and RNA–protein interactions. Although snoRNA genes comprise
a significant fraction of the eutherian genome, identifying and characterizing large numbers of them is not sufficiently
accessible by classical computer searches alone. Furthermore, most previous investigations of snoRNAs yielded only limited
indications of their evolution. Using data obtained by a combination of high-throughput cDNA library screening and
computational search strategies based on a modified DNAMAN program, we characterized 151 npcRNAs, and in particular 121
snoRNAs, from Caenorhabditis elegans and extensively compared them with those in the related nematode Caenorhabditis briggsae.
Detailed comparisons of paralog snoRNAs in the two nematodes revealed, in addition to trans-duplication, a novel
cis-duplication distribution strategy with insertions near the original loci. Some snoRNAs coevolved with their modification
target sites, demonstrating the close interaction of complementary regions. Some target sites modified by snoRNAs were
changed, added or lost, documenting a high degree of evolutionary plasticity of npcRNAs.
INTRODUCTION
Two very surprising discoveries have arisen from the Human Genome Project. One, humans do not have significantly more
protein-coding genes than other mammals; and two, sequences corresponding to protein open reading frames comprise only 1.5%
of our genome (1). The unavoidable conclusion to be drawn from this is that the differences that separate humans from other
species may reside in the remaining 98.5% of the genome, which encodes untranslated functional RNAs and regulatory regions or
constitutes non-genic regions. The present work focuses on a defined population of non-protein coding RNAs (npcRNAs), often
not quite correctly termed ‘non-coding’ RNA (ncRNA), derived from a Caenorhabditis elegans cDNA library generated with
size-fractionated RNA (70–600 nt). The size limitation, while excluding mature microRNAs (miRNAs), short interfering RNAs
and large ribosomal RNAs (rRNAs) that are well described elsewhere (2,3), yields predominantly small nucleolar RNAs (snoRNAs)
and spliceosomal RNAs. snoRNAs are 60–300 nt long and guide the post-transcriptional modifications of ribosomal and other
RNAs. Such modifications are crucial for appropriate RNA folding as well as for RNA–RNA and RNA–protein interactions (4).
Furthermore, snoRNAs are thought to be involved in epigenetic mechanisms regulating gene expression. In this context,
deletion of certain imprinted snoRNA clusters in the cerebral cortex is thought to play a causative role in the Prader–Willi
Syndrome of mental retardation (5–7).
Based on structural motifs and function the snoRNA family is divided into two subclasses: C/D-box snoRNAs (C-box
consensus UGAUGA; D-box consensus CUGA) and H/ACA snoRNAs (H-box consensus ANANNA and box ACA), which interact directly by
base complementarity to their target rRNA and spliceosomal RNA sequences to direct 2'-O-ribose methylation and
pseudouridylation, respectively. The complementary regions, known as ‘antisense elements’, reside at the 5' and/or 3' ends
of snoRNAs. Although snoRNA modifications were initially thought to be restricted to rRNA and to be localized strictly in the
nucleolus, a growing list of npcRNAs, including transfer RNAs (tRNAs), is also modified by snoRNAs (4,8), and snoRNAs have also
been found in Cajal bodies, nucleoplasmic substructures involved in processing npcRNAs (9). The spectrum of snoRNA targets
could potentially include even mRNAs, although it cannot be excluded yet that such existing base complementarities are simply
fortuitous and without biological significance (5). Most vertebrate snoRNAs are derived from introns of pre-mRNA transcripts,
especially those from ribosomal protein genes (RPGs) and other housekeeping proteins, and are processed in a complex sequence
involving endonucleases, exonucleases and helicases (10,11). Interestingly, a growing number of host genes do not yield
translatable mRNAs, and it appears that the main function of the corresponding genes and primary transcripts is the
expression of snoRNAs (12–14). Many miRNAs are also hosted by npcRNAs (15).
Systematic searches using experimental RNomics, an EST-like approach tailored for small RNAs, have successfully
identified large numbers of npcRNAs in mouse (16), Drosophila (17), Arabidopsis (18) and Archaea (8). To better elucidate the
evolutionary pathways of snoRNAs, we have now extended this search to the nematode, C.elegans, an extremely interesting model
eukaryote with a simple body plan but complicated genomics including, for example, cis, trans and alternative splicing
systems. As intermediates between single-celled organisms and ‘higher’ metazoan animals, nematodes offer an excellent system for
studies on metazoan genome function and evolution. To provide a large enough dataset for exhaustive analysis of snoRNAs in
C.elegans we have now combined high-throughput, experimental RNomic screening with computational methods focused on RPGs and
other introns of genes that harbor snoRNAs identified in our experimental approach. Furthermore, we have analyzed the
phylogeny of snoRNAs by comparing the above results with those of Caenorhabditis briggsae, a nematode that shared a common
ancestor with C.elegans some 100 million years ago (mya). Our three-pronged approach revealed possible mechanisms of how
novel snoRNAs arose, spread in the genome, changed targets or were lost over the course of evolution.
MATERIALS AND METHODS
The experimental procedures concerning construction and analysis of libraries are described in Hüttenhofer et al. (16).
Detailed methods for constructing the C.elegans library are given in Supplementary Data.
Computational strategies
The commercial software package DNAMAN was modified, in collaboration with the Lynnon Corporation, to computationally
screen defined databases of intronic sequences for snoRNAs and to identify snoRNA modification target sites from compiled RNA
databases. The modified DNAMAN version is available from http://www.lynnon.com (Mac OS X version 6018 or later). Note that
additional freeware is available at http://lowelab.ucsc.edu/snoscan to analyze C/D-box snoRNAs (19), at
http://lowelab.ucsc.edu/snoGPS for H/ACA-box snoRNAs (20) and at http://rna.tbi.univie.ac.at/cgi-bin/alifold.cgi to analyze
secondary structural prediction (21).
Computational search for snoRNAs in C.elegans
The modified DNAMAN software allowed us to apply complex search profiles to find potential snoRNAs in a compilation of
C.elegans introns of RPGs and genes that harbor experimentally identified snoRNAs. The following search was applied for C/D-
box snoRNAs:
TGATGA(N9-35)CTGA(N4-35)TGATGA(N9-35)CTGA
A maximum of three mismatches were allowed in the first, two in the second, three in the third and one in the last
sequence motif. N9-35 and N4-35 denote variable sequence stretches of at least 9 or 4, respectively, and a maximum of 35 nt.
The search motif for H/ACA-box snoRNAs was ANA(NN)A(N50-100)ACA. No mismatches were allowed. Both searches were accompanied
by intensive structural evaluations of the computationally predicted snoRNAs (Supplementary Figure 1).
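The C/D-box search profile above can be illustrated with a minimal Python sketch. It is not the modified DNAMAN code; the function names are hypothetical, and only the motifs, mismatch limits and spacer lengths are taken from the text. The sketch scans a DNA sequence for every placement of the four boxes that respects the stated limits.

```python
from typing import Iterator, List, Tuple

# (motif, maximum mismatches) in 5'->3' order, as stated in the text.
BOXES = [("TGATGA", 3), ("CTGA", 2), ("TGATGA", 3), ("CTGA", 1)]
# (minimum, maximum) lengths of the variable stretches between boxes.
SPACERS = [(9, 35), (4, 35), (9, 35)]


def mismatches(window: str, motif: str) -> int:
    """Count positions at which window and motif differ."""
    return sum(a != b for a, b in zip(window, motif))


def find_cd_candidates(seq: str) -> Iterator[Tuple[int, int, List[int]]]:
    """Yield (start, end, box_starts) for each valid placement of all boxes."""
    seq = seq.upper()

    def extend(box_idx: int, pos: int, starts: List[int]):
        motif, max_mm = BOXES[box_idx]
        end = pos + len(motif)
        if end > len(seq):
            return
        if mismatches(seq[pos:end], motif) > max_mm:
            return
        starts = starts + [pos]
        if box_idx == len(BOXES) - 1:
            yield starts[0], end, starts
            return
        shortest, longest = SPACERS[box_idx]
        # Try every permitted spacer length before the next box.
        for gap in range(shortest, longest + 1):
            yield from extend(box_idx + 1, end + gap, starts)

    for start in range(len(seq)):
        yield from extend(0, start, [])
```

Because up to half of each box may mismatch, such a profile is permissive and returns many placements for a typical intron, which is consistent with the need for the structural evaluation described above.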
The pattern search of DNAMAN is implemented in C language. Details of the search procedure are provided by the Lynnon
Corporation (Supplementary Data).
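The H/ACA-box motif given above, ANA(NN)A(N50-100)ACA, permits no mismatches and can therefore be sketched with a regular expression for the H box plus an explicit loop over spacer lengths. This is an illustrative Python sketch under the assumption that the input is an unambiguous DNA string; the function name is hypothetical and the code is not the DNAMAN implementation.

```python
import re
from typing import List, Tuple

# H box: A, any base, A, two arbitrary bases, A.
H_BOX = re.compile(r"A[ACGT]A[ACGT]{2}A")


def find_haca_candidates(seq: str) -> List[Tuple[int, int]]:
    """Return (start, end) of every H box ... ACA span with a 50-100 nt spacer."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):
        if not H_BOX.match(seq, start):
            continue
        h_end = start + 6
        for gap in range(50, 101):
            aca = h_end + gap
            # Slicing past the end yields a short string, so no bounds check is needed.
            if seq[aca:aca + 3] == "ACA":
                hits.append((start, aca + 3))
    return hits
```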
5' Extension of experimentally found snoRNAs
BLAST searches of the cDNA sequences were made against the C.elegans non-redundant (nr) NCBI database
(http://www.ncbi.nlm.nih.gov/blast), the Santa Cruz server (http://genome.ucsc.edu/cgi-bin/hgBlat), the Sanger database
(http://www.ensembl.org/Caenorhabditis_elegans/blastview) or the RPG databank (http://ribosome.miyazaki-med.ac.jp). Thus, the
sequences absent in the truncated cDNAs (usually some 10 nt) were extended with the aid of genomic sequences. The mature 5'
ends were estimated by structural requirements of mature snoRNAs (Supplementary Figure 1).
Target site search
A compiled library of all C.elegans rRNA, spliceosomal and tRNA genes was searched with the modified DNAMAN software for
potential snoRNA target motifs. For C/D-box snoRNA target sites we allowed a maximum of three G–U pairs and a minimum length
of 9 nt. For H/ACA-box snoRNAs we used a similar search profile but allowed a split of target sites in four or five
contiguous nucleotides. The detailed search process is shown in Supplementary Figure 1.
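As a rough illustration of the C/D-box target criteria above (at least 9 contiguous paired nucleotides, at most three G–U pairs), the following Python sketch reports every 9 nt duplex between an antisense element and a target RNA. The function and variable names are hypothetical, and the sketch simplifies the actual search procedure, which is documented in Supplementary Figure 1.

```python
from typing import List, Tuple

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def to_rna(seq: str) -> str:
    """Normalise a DNA or RNA string to upper-case RNA letters."""
    return seq.upper().replace("T", "U")


def find_target_seeds(antisense: str, target: str,
                      min_len: int = 9, max_gu: int = 3) -> List[Tuple[int, int, int]]:
    """Return (target_start, guide_offset, n_gu) for each min_len duplex.

    Both inputs are given 5'->3'. The antisense element is reversed so that
    its index runs in parallel with the target during the antiparallel pairing.
    """
    guide = to_rna(antisense)[::-1]
    target = to_rna(target)
    seeds = []
    for t in range(len(target) - min_len + 1):
        for g in range(len(guide) - min_len + 1):
            gu = 0
            for k in range(min_len):
                pair = (guide[g + k], target[t + k])
                if pair in WOBBLE:
                    gu += 1
                elif pair not in WATSON_CRICK:
                    break  # a true mismatch ends this window
            else:
                if gu <= max_gu:
                    seeds.append((t, g, gu))
    return seeds
```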
Comparison with C.briggsae sequences
The same database sources as mentioned above were used to computationally detect orthologous snoRNAs in C.briggsae.
Secondary structures of snoRNAs
The secondary structures of all experimentally and computationally identified snoRNAs were derived using the M-fold
program (22); http://www.bioinfo.rpi.edu/applications/mfold/old/rna.
RESULTS
Analysis of the size-fractionated cDNA library of C.elegans
Following high-density array hybridization of 38 400 cDNA sequence clones to exclude known small npcRNAs or fragments of
degraded large rRNAs (Supplementary Figure 2), we selected 4673 clones for sequencing. Exclusion of unreadable or very short
sequences, empty vectors, E.coli contaminations and other ambiguities yielded 3294 clones; among these we identified 15 known
spliceosomal RNAs (294 sequences), 41 known tRNAs (322 sequences), 3 isoforms of SRP (signal recognition particle) RNA (736
sequences), 29 different parts of known rRNAs that escaped prior exclusion (1180 sequences), 22 known mRNAs (31 sequences), 7
splice leader RNA sequences (SL; 64 sequences) and two histone hairpin RNAs (2 sequences) all of which were excluded from a
more detailed analysis (SL, SRP, histone hairpin and spliceosomal RNAs are listed in Supplementary Data). The remaining 665
sequences contained 120 npcRNAs including 91 snoRNAs (Figure 1).
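The clone accounting in this paragraph can be verified with simple bookkeeping. The Python sketch below uses only the sequence counts quoted above; the dictionary keys are descriptive labels chosen here for readability.

```python
# Sequence counts of the excluded RNA classes, as quoted in the text.
excluded = {
    "spliceosomal RNAs": 294,
    "tRNAs": 322,
    "SRP RNA isoforms": 736,
    "rRNA fragments": 1180,
    "mRNAs": 31,
    "splice leader RNAs": 64,
    "histone hairpin RNAs": 2,
}

usable_clones = 3294  # clones left after removing unreadable or ambiguous ones
remaining = usable_clones - sum(excluded.values())
print(remaining)
```

The result matches the 665 informative sequences reported in the text.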
Computational screening for additional npcRNA candidates
In addition to those npcRNAs identified experimentally, computer searches based on the following arguments yielded
another 23 snoRNAs (Figure 1). Yoshihama et al. (11) estimated that RPGs harbor about one-third of all snoRNAs in the human
genome. In our experimentally identified snoRNAs we also observed that genes harboring one snoRNA in an intron are likely to
encode additional snoRNAs in the same intron or in neighboring introns of the same gene. Consequently, we extracted and
analyzed snoRNA candidates from introns of all known C.elegans RPGs and from other intronic sequences that were found in the
proximity of our experimentally detected snoRNAs. We used the following stringent criteria to validate all computationally
detected snoRNA candidates (Supplementary Figure 1): (i) presence of all snoRNA structural requirements and box motifs; (ii)
identification of potential modification target site complementarities, (iii) sequence conservation in C.briggsae, and/or
signals in northern blots. From >100 potential candidates (Supplementary Data) an additional 23 novel snoRNA candidates met
these stringent criteria (Figure 1; comCe).
The reliability of the computational algorithm was confirmed in that we were also able to identify all but 17 of the
experimentally found or previously predicted snoRNAs with these search criteria. Those snoRNAs not confirmed in the computer
search were structurally modified and therefore did not match our search profile (data not shown). An additional BLAST search
of Genbank genomic sequences revealed seven snoRNA paralogs (Figure 1, blCe, blCb) and one additional spliceosomal RNA
(blCe378).
snoRNAs
Of all 154 experimentally or computationally identified sequences, 59 are novel snoRNA candidates (Figure 1, I), while
65 of the recovered snoRNA candidates were recently confirmed experimentally by Deng et al. (23) (59 candidates) or Wachi et al. (24) (6 candidates)
(Figure 1, II; Supplementary Data). For completeness, Figure 1 (III) also lists 20 other snoRNA candidates that were not
recovered by our screen, but were identified previously by either Deng et al. (23) (18 candidates) or Wachi et al. (24) (2
candidates), and is thus now a compilation of all presently known C.elegans snoRNAs.
Altogether, we found 76 unambiguous snoRNA candidates with motifs, secondary structure elements and recognizable target
modification complementarities characteristic of C/D-box snoRNAs. Based on their chromosomal locations, individual candidates
could be described as either intronic or intergenic snoRNAs (Figure 1 and Supplementary Data). All but 16 are also
potentially functional in C.briggsae and are located at orthologous loci; 10 of these 16 were recognizable, but diverged, at
orthologous positions in C.briggsae (Supplementary Figure 3). Presumably, they became inactive pseudogenes that lack motifs
and structures to function as bona fide snoRNAs. Interestingly, in all C/D-box snoRNA candidates (as well as the H/ACA-box
snoRNAs) we identified a characteristic uridine-rich region adjacent to the mature 3' ends. This sequence has previously been
implicated in maturation of H/ACA-box snoRNAs only (25). We also found a C/D-box homodimer (Ce234) and a chimeric C/D-H/ACA-
box snoRNA (Ce104). Northern blot analysis of both resulted in hybridization to only dimeric forms, indicating that the
respective dimers are the mature forms of these snoRNAs. Interestingly, we detected only six C/D-box snoRNA candidates in RPG
introns compared with 20 H/ACA-box snoRNAs (Figure 1).
We also identified 48 H/ACA-box snoRNAs that were localized to intronic and intergenic regions (Figure 1 and
Supplementary Data). Only seven of those are probably not functional in C.briggsae (Figure 1). The sequences of three H/ACA-
box snoRNA orthologs are apparently diverged pseudogenes in C.briggsae (Supplementary Figure 3; comCb17, comCb22, blCb176).
snoRNA modification target sites and distribution patterns
In keeping with their function, snoRNAs have dual binding capacity for both small RNA modifying proteins and, via
specific sequence complementarity, for their target RNAs. We identified complementarities for potential modification targets
in 5S, 5.8S, 18S and 26S rRNAs; in U1, U2, U4, U5 and U6 spliceosomal RNAs and in tRNAs. Twelve 26S rRNA target sites are
supported by the presence of nucleotide modifications (26) (Supplementary Figure 3, black dots). This is the first time that
C/D-box snoRNAs in eukaryotes have been identified with potential target sites in various tRNAs (Ce62-tRNAIle, Ce63-tRNASer,
Ce94-tRNAAsn, Ce246-tRNAIle, comCe3-tRNAThr, comCe18-tRNAArg) (Figure 2). tRNA modifications guided by snoRNAs have been
reported thus far only in Archaea (8,27). Another interesting observation was the presence of two antisense elements in some
of our snoRNAs (e.g. Ce173.3, Ce251, Ce298, Ce23) with complementary regions suggesting the potential to modify target RNAs
located in two different subcellular compartments. These snoRNAs are predicted to modify rRNAs that occur in the nucleolus,
as well as U1, U4, U5 spliceosomal RNAs that are present in Cajal bodies. Even an individual antisense element has the
potential to be complementary to more than one hypothetical target site (Supplementary Figure 3).
From our 121 experimentally and computationally identified snoRNAs in C.elegans, 98 potentially functional orthologs were
identified in C.briggsae (Figure 1). Forty of these orthologous pairs contain matching sequence complementarities to the same
RNA modification targets in C.briggsae and C.elegans (Supplementary Figure 3a and b). Surprisingly, the potential target
sites for the majority of them changed over a period of 100 million years (Supplementary Figure 3c and d).
snoRNA paralogs, generated perhaps by gene duplication, have been observed frequently and are a potential source for the
creation of novel snoRNAs (28). We identified 20 snoRNAs and their corresponding paralogs (11 pairs are orthologs in both
C.elegans and C.briggsae, 6 pairs in C.elegans and 3 pairs in C.briggsae only; Figure 3a). To help determine whether the
computationally identified H/ACA-box snoRNA paralogs are functional, we analyzed the compensatory nucleotide substitution
patterns in their double-stranded stem structures. Compensatory changes tend to maintain the secondary structure of stem
regions and indicate selection pressure for functionality. Characteristic compensatory changes could be found for all
identified H/ACA-box snoRNA paralogs suggesting that they retained their functionality, at least for a sufficient period to
form individual compensations after duplication (data not shown). Compensatory substitution pattern analyses in C/D-box
snoRNAs are not of much help in determining their functionality because they do not possess sufficient amounts of double-
stranded structures; thus, C/D-box snoRNAs were omitted from this analysis.
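The idea behind the compensatory substitution analysis can be sketched as follows. This is a minimal Python illustration, assuming two already aligned paralog sequences of equal length and a list of stem base pairs taken from a predicted secondary structure; the function name and the toy sequences in the usage note are hypothetical and are not data from this study.

```python
from typing import List, Tuple

# Watson-Crick and G-U wobble pairs that keep a stem intact.
PAIRING = {"AU", "UA", "GC", "CG", "GU", "UG"}


def compensatory_pairs(seq_a: str, seq_b: str,
                       stem_pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return stem pairs at which both nucleotides differ between the
    paralogs while base pairing is preserved in each of them."""
    a = seq_a.upper().replace("T", "U")
    b = seq_b.upper().replace("T", "U")
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    changed = []
    for i, j in stem_pairs:
        paired_in_both = a[i] + a[j] in PAIRING and b[i] + b[j] in PAIRING
        both_sides_differ = a[i] != b[i] and a[j] != b[j]
        if paired_in_both and both_sides_differ:
            changed.append((i, j))
    return changed
```

For a toy hairpin with stem pairs (0, 9), (1, 8) and (2, 7), comparing GGCAAAAGCC with GACAAAAGUC flags pair (1, 8), where G-C became A-U; a one-sided change such as GACAAAAGCC is not flagged because the A-C combination no longer pairs.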
The chromosomal localization of snoRNA paralogs could be categorized based on two distinguishing events: those in which
snoRNA paralogs inserted into different positions in the same gene (cis-duplication), and those in which snoRNA paralogs
inserted into target genes (or intergenic regions) other than the original host gene or, in the event of host gene
duplication, moved to a different chromosomal location along with the host gene (trans-duplication).
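The two categories defined above can be written down as a small decision rule. The Python sketch below is only an illustration of that definition: the Locus record and the function name are hypothetical, a duplicated host gene is assumed to carry its own gene identifier, and an intergenic location is represented by a missing host gene.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Locus:
    host_gene: Optional[str]      # None for an intergenic snoRNA
    intron: Optional[int] = None  # intron number, if known


def classify_duplication(original: Locus, paralog: Locus) -> str:
    """Return 'cis' if both copies lie in the same host gene, else 'trans'."""
    same_gene = (original.host_gene is not None
                 and original.host_gene == paralog.host_gene)
    return "cis" if same_gene else "trans"
```

With the loci described in the following sections, comCe12 (intron 2 of rps-29) and Ce236 (intron 1 of rps-29) would be classified as a cis-duplication, whereas comCe6 (rpl-7) and Ce280 (rps-13) would be classified as a trans-duplication.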
Cis-duplication of snoRNA paralogs
The presence and/or absence of 20 snoRNA paralogs were analyzed in C.elegans and C.briggsae. Figure 4a shows a C.elegans
snoRNA (comCe12) that is conserved at the orthologous locus (intron 2 of the rps-29 gene) in C.briggsae. A paralog of this
snoRNA (Ce236) was also present in our cDNA library. In C.elegans Ce236 is located in intron 1 of the same rps-29 host gene,
while the orthologous position in the rps-29 gene of C.briggsae is empty. Since the probability of a clean excision of the
snoRNA without parts of the flanking sequences at this position in C.briggsae is negligible, this indicates a duplication
process involving integration into the adjacent intron (cis-duplication) after C.elegans split from a common ancestor with
C.briggsae. Interestingly, the function of Ce236 in C.elegans may have been replaced in C.briggsae by another non-orthologous
snoRNA (Cb309) that has the potential to modify the identical nucleotide in 26S rRNA, while the Cb309 ortholog in C.elegans
(Ce309) modifies a target sequence in 18S rRNA. A similar scenario is shown in Figure 4b. We found a snoRNA (comCe7) present
at orthologous positions of the rpl-24 gene in C.elegans, C.briggsae and Caenorhabditis remanei. A paralog (Ce80) could be
detected in intron 1 of the same gene in C.elegans only. Figure 4c shows a snoRNA (comCe14) present in orthologous positions
of the hypothetical protein gene K07C5 in C.elegans and C.briggsae. A corresponding paralog is found in intron 7 of the same
gene in C.elegans (comCe15) but not in C.briggsae. In all presence/absence cases examined, the intronic sequences flanking
the duplicated snoRNAs were recognizable at the corresponding, empty loci.
Trans-duplication of snoRNA paralogs
We could distinguish two forms of trans-duplications, both of which are exemplified in Figure 5. In some instances of
segmental duplications of entire genes that harbor snoRNAs in one or more introns, the snoRNA did not move to another part of
the host gene but hitchhiked with the host to a new location after duplication. In other cases, snoRNAs inserted into introns
of a new host gene without traces of the original host gene, or into a new intergenic location. Figure 5a describes an
example of segmental duplication of a hypothetical protein gene (C06A1.3) yielding a duplicated pseudogene (with respect to
the protein-coding capacity) including the paralogous snoRNAs Ce173.1-3. Figure 5b shows two experimentally identified snoRNA
paralogs; Ce254a is located in intron 1 of a hypothetical protein gene (Y53F4B.12). The paralog Ce254b duplicated, along with
the 5' (100 nt) and 3' (50 nt) sequences of its original flanking intron, but without detectable surrounding exons, and
migrated into a new location on a different chromosome. Interestingly, the separate left and right antisense elements of
Ce254 (Figure 3b, bottom) modify bases in 26S rRNA that are shifted by only 6 nt. Hence, the sequences on 26S rRNA that are
complementary to the two snoRNA antisense elements overlap by 2 nt (Figure 3b, top). This indicates that modification of the
two methylation targets is not likely to occur at the same time. We found both paralogs at orthologous positions in
C.briggsae, indicating that the duplication event took place in a common ancestor of both worms. Both forms retained their
modification targets over 100 million years, demonstrating strong functional constraints. The fact that two conserved snoRNA paralogs
modify the same targets indicates that one may not be enough to perform modification of all rRNA molecules, and that
quantitative aspects play an important role in snoRNA function. Figure 5c and d describe snoRNA paralogs that are located in
totally different surroundings following duplication. In the latter case it is noteworthy that the comCe6 paralog moved from
one RPG (rpl-7) to another (rps-13) as the Ce280 paralog or vice versa.
Functional plasticity of snoRNA paralogs
Data provided by both the experimental and computational searches, as well as comparisons of paralogous snoRNAs in both
C.elegans and C.briggsae enabled us to analyze target sites and hence function of duplicated snoRNA genes. We observed three
different fates of the snoRNAs following duplication: (i) one of the paralogs apparently became inactive and decayed during
the course of evolution; (ii) the new paralog maintained the same function as the original snoRNA and (iii) the new paralog
either partially (one antisense element maintained the same target and the other acquired a new one) or fully diverged with
respect to the complementary targets. Of the 20 pairs of paralogs, we found 4, 16 and 10 examples for the above three
scenarios, respectively. One example of target site plasticity is illustrated in C/D-box snoRNA Ce246, which was detected
experimentally in the C.elegans cDNA library and computationally in C.briggsae. In C.briggsae one paralog differs from the
other mainly by a 2 nt deletion 5' adjacent to the D'-box, shifting the methylation site by 2 nt (blCb246a-blCb246b). In 26S
rRNA G860 is modified by one paralog and A862 by the other.
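The 2 nt shift follows from the general rule, established in the snoRNA literature rather than stated in this text, that a C/D-box snoRNA directs methylation of the target nucleotide paired with the fifth position upstream of the D or D' box. The Python sketch below illustrates this with toy sequences; it assumes that the antisense element pairs perfectly with the target and extends right up to the box, and all names and sequences are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def methylated_index(target: str, guide_upstream_of_box: str) -> Optional[int]:
    """Return the 0-based target index paired with the fifth guide nucleotide
    upstream of the box, or None if the guide has no perfect match."""
    site = reverse_complement(guide_upstream_of_box)
    start = target.find(site)
    if start < 0:
        return None
    # site[0] pairs with the guide nucleotide directly next to the box
    # (position 1), so site[4] pairs with position 5.
    return start + 4


# Toy example: deleting 2 nt next to the box moves the methylated position by 2.
toy_target = "CCCCCCCCCC" + "AUGGCAUCGAUUGC" + "CCCCCCCC"
toy_guide = reverse_complement("AUGGCAUCGAUUGC")
print(methylated_index(toy_target, toy_guide))
print(methylated_index(toy_target, toy_guide[:-2]))
```

In the toy example, removing the two guide nucleotides adjacent to the box moves the predicted site two positions toward the 3' end of the target, which mirrors the offset between G860 and A862 described above.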
Coevolution of snoRNA and rRNA modification target sites
Coevolution is defined as a change in the genetic composition of one species in response to a genetic change in another
(29,30). This definition can be adapted to molecular interactions within organisms. Biologically significant interactions
within macromolecules [e.g. RNA secondary structure; (31)] or between macromolecules, [e.g. RNA and proteins], can be
demonstrated by compensatory changes in one or the other (32). Two of the C/D-box snoRNAs (Ce138, Ce234.2) exemplify adaptive
evolution of the snoRNA complementary region to their 26S rRNA target sequence (Figure 6). In the lineage leading to
C.elegans, an A-to-U substitution occurred in the 26S rRNA target site of the Ce138 snoRNA. This base change is not present in
C.briggsae or C.japonica 26S rRNA sequences (data not shown). Accordingly, we found a compensatory U-to-A substitution in the
antisense element of the snoRNA ortholog in C.elegans (Figure 6a), but not in C.briggsae or C.japonica. At another 26S rRNA
position we found an A-to-G substitution in C.briggsae but not in C.elegans or C.japonica (data not shown). The corresponding
C.briggsae snoRNA Cb234.2 shows a compensatory change from U to C (Figure 6b).
DISCUSSION
The combined impact of experimental and computational npcRNA screening
Our goal was to obtain as comprehensive a view as possible of cellular snoRNA expression in C.elegans. Creating a cDNA
library based on size-fractionated, expressed RNAs, and enriched to remove large numbers of known RNAs, yielded highly
efficient experimental search results. We present here a detailed analysis of 120 different npcRNA species (Figure 1, groups
I, II) from 665 informative sequences selected from an initial 38 400 clones. Moreover, to complement and validate the
results of this experimental approach, we customized commercially available computer software to generate a search tool for
identifying snoRNA candidates and their modification target sites according to a set of stringent criteria. From >100
potential intronic snoRNA candidates, 23 additional candidates fulfilled these conditions; another 8 npcRNAs were found by
BLAST search. Thus, while computational approaches cannot supplant experimental work, they
are a very useful complement to it. This was particularly exemplified by our ability to analyze experimentally
identified snoRNAs that, although apparently still functional, had diverged from the canonical motifs used for the computer
search. In fact, the pitfalls of not complementing experimental results with such careful computational analyses can be
clearly seen in a recent experimental screen (23). Of the 56 novel C.elegans snoRNAs shown in Figure 1, 14 were also reported
recently but were analyzed either incorrectly or not at all (23). As examples, Ce96 (CeN25-2) or Ce135 (CeN25-1) were
described as members of a novel class of small nuclear-like RNAs (23) (see their Figure 3D). Nevertheless, we could discern
clear characteristics of C/D-box snoRNAs for both of these npcRNAs using computational analyses. Ce173.1-3 (CeN128) was
described as one single H/ACA-box snoRNA species. By comparative analyses of C.elegans and C.briggsae we could distinguish
them as three independent C/D-box snoRNAs. The same was true for Ce234.1-2 (CeN47), which they defined as one single snoRNA
species. Ce110 (CeN42) is clearly a C/D-box snoRNA but they defined it as an H/ACA-box snoRNA even though part of the
predicted H/ACA-box snoRNA would clearly overlap exonic sequences. They also identified six other unclassified npcRNAs [Ce86
(CeN35), Ce151 (CeN129), Ce254a (CeN23-1), Ce254b (CeN23-2), Ce282(CeN52), Ce105 (CeN66)] which we could clearly assign to
specific snoRNA categories.
Target site plasticity
Our in silico target site complementarity search provided evidence of a high degree of plasticity in target site
modification. In some cases we found evidence to suggest that the two complementary regions of particular snoRNAs modify
targets in different compartments of the nucleoplasm, namely rRNAs in the nucleolus and spliceosomal RNAs in the Cajal body.
In addition to the classical modification targets, we also found snoRNA complementarities for target sites in five different
tRNAs. Modification of tRNAs by snoRNAs has been demonstrated so far only for Archaea and not for Eukarya (8,27). There is
evidence that, following duplication, several snoRNA paralogs evolved new target site complementarities. Comparing C.elegans
and C.briggsae, we observed that several specific modification sites of rRNAs are targeted by otherwise unrelated snoRNAs in
both species. Losing, gaining or changing target sites are frequent phenomena that document the plasticity of modification
interactions. Another source of plasticity is the compensatory changes of snoRNA target site complementary sequences that
arose following base substitutions in their targets as illustrated in the case of Ce138 and Ce234.2 (Figure 6). Although
several of our predicted target sites were confirmed by experimental approaches (26), a more conclusive verification of other
target sites is necessary.
Birth and evolution of snoRNAs
Little is known about the origin and distribution of snoRNAs. Polycistronic clusters of snoRNAs are frequent in plants,
and propagation due to cluster duplication is generated by polyploidization (33). However, polycistronic clusters of snoRNAs
are the exception in vertebrates, as snoRNAs in those organisms are mainly singular and intron-encoded. To elucidate the
process of snoRNA propagation in a ‘model’ eukaryote, we analyzed presence/absence patterns of snoRNA paralogs in C.elegans
and compared them with those in C.briggsae. We identified three snoRNA paralogs with clear presence/absence patterns (Figure
4). These patterns suggest a copy/paste mechanism in the duplication of certain singular snoRNAs into neighboring introns of
the same gene (cis-duplication). Cis-duplication seems to be a dominant process for H/ACA-box snoRNA propagation, but thus
far, we did not identify any C/D-box snoRNA paralogs generated by cis-duplication. We found most genes harboring
predominantly one type of intronic snoRNA (e.g. H/ACA-box snoRNAs in RPGs; Figure 1 and Supplementary Data), one notable
exception being the C/D-box snoRNA comCe4 which is present in the midst of several other H/ACA-box snoRNAs in the
hypothetical protein gene K07C5.4 (Figure 4).
Our data also suggest that snoRNAs can be propagated by complete or partial gene duplication that includes the embedded
snoRNAs, an event that has been purported to precede evolutionary novelties (Figure 5) (34,35). Brosius (36) suggested that
snoRNAs could be propagated by retroposition, a mechanism that might be responsible for trans-duplicated snoRNAs, but,
because insertions of retroposed sequences are virtually random and should not lead to accumulations in neighboring introns,
seems not to be involved in cis-duplication. Local, unequal recombination is a more probable mechanism for cis-duplications,
especially in C.elegans, because of the A/T-rich surroundings of snoRNA sequences.
In summary, the gain, loss and change of targets of snoRNAs over relatively short evolutionary times, possibly similar to
the evolution of miRNAs (37–39), indicate that npcRNAs are not merely fossils from the long gone RNA/RNP world but continue
to contribute to the changing needs of cells and genomes. This constitutes an astounding and unexpected level of plasticity
for a primordial macromolecule such as RNA.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS
We thank Yue Huang for implementing modifications of the DNAMAN software and Marsha Bundman for editorial assistance.
This work was supported by the German Human Genome Project through the BMBF (#01KW9966), and grants from the Fonds der
Chemischen Industrie from the European Union (EU; LSHG-CT-2003-503022) to J.B., and the Nationales Genomforschungsnetz (NGFN;
0313358A) to J.B. and J.S. Funding to pay the Open Access publication charges for this article was provided by NGFN.
Conflict of interest statement. None declared.
REFERENCES
Lander, E.S. (2001) Initial sequencing and analysis of the human genome Nature, 409, 860–921
Stein, L., Bao, Z., Blasiar, D., Blumenthal, T., Brent, M.R., Chen, N., Chinwalla, A., Clarke, L., Clee, C., Coghlan, A.,
et al. (2003) The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics PLoS Biol., 1, 166–192
Lim, L.P., Lau, N.C., Weinstein, E.G., Abdelhakim, A., Yekta, S., Rhoades, M.W., Burge, C.B., Bartel, D.P. (2003) The
microRNAs of Caenorhabditis elegans Genes Dev., 17, 991–1008
Kiss, A.M., Jady, B.E., Bertrand, E., Kiss, T. (2004) Human box H/ACA pseudouridylation guide RNA machinery Mol. Cell.
Biol., 24, 5797–5807
Cavaillé, J., Buiting, K., Kiefmann, M., Lalande, M., Brannan, C.I., Horsthemke, B., Bachellerie, J.-P., Brosius, J.,
Hüttenhofer, A. (2000) From the cover: identification of brain-specific and imprinted small nucleolar RNA genes exhibiting an
unusual genomic organization Proc. Natl Acad. Sci. USA, 97, 14311–14316
Mattick, J.S. and Makunin, I.V. (2005) Small regulatory RNAs in mammals Hum. Mol. Genet., 14, R121–132
Kishore, S. and Stamm, S. (2006) The snoRNA HBII-52 regulates alternative splicing of the serotonin receptor 2C Science,
311, 230–232
Tang, T.-H., Bachellerie, J.-P., Rozhdestvensky, T., Bortolin, M.-L., Huber, H., Drungowski, M., Elge, T., Brosius, J.,
Hüttenhofer, A. (2002) Identification of 86 candidates for small non-messenger RNAs from the archaeon Archaeoglobus fulgidus
Proc. Natl Acad. Sci. USA, 99, 7536–7541
Darzacq, X., Jady, B.E., Verheggen, C., Kiss, A.M., Bertrand, E., Kiss, T. (2002) Cajal body-specific small nuclear RNAs:
a novel class of 2'-O-methylation and pseudouridylation guide RNAs EMBO J., 21, 2746–2756
Kiss, T. (2002) Small nucleolar RNAs: an abundant group of noncoding RNAs with diverse cellular functions Cell, 109,
145–148
Yoshihama, M., Uechi, T., Asakawa, S., Kawasaki, K., Kato, S., Higa, S., Maeda, N., Minoshima, S., Tanaka, T., Shimizu,
N., et al. (2002) The human ribosomal protein genes: sequencing and comparative analysis of 73 genes Genome Res., 12,
379–390
Higa, S., Maeda, N., Kenmochi, N., Tanaka, T. (2002) Location of 2'-O-methyl nucleotides in 26S rRNA and methylation
guide snoRNAs in Caenorhabditis elegans Biochem. Biophys. Res. Commun., 297, 1344–1349
Singh, S.K., Gurha, P., Tran, E.J., Maxwell, E.S., Gupta, R. (2004) Sequential 2'-O-methylation of archaeal pre-tRNATrp
nucleotides is guided by the intron-encoded but trans-acting box C/D ribonucleoprotein of pre-tRNA J. Biol. Chem., 279,
47661–47671
Chen, C.-L., Liang, D., Zhou, H., Zhuo, M., Chen, Y.-Q., Qu, L.-H. (2003) The high diversity of snoRNAs in plants:
identification and comparative study of 120 snoRNA genes from Oryza sativa Nucleic Acids Res., 31, 2601–2613
Ehrlich, P.R. and Raven, P.H. (1964) Butterflies and plants: a study in coevolution Evolution, 18, 586–608
Page, R.D.M. and Holmes, E.C. Molecular Evolution: A Phylogenetic Approach, (1998) Oxford Blackwell Science Ltd .
Woese, C., Magrum, L., Gupta, R., Siegel, R.B., Stahl, D.A., Kop, J., Crawford, N., Brosius, J., Gutell, R., Hogan, J.J.,
et al. (1980) Secondary structure model for bacterial 16S ribosomal RNA: phylogenetic, enzymatic and chemical evidence
Nucleic Acids Res, . 8, 2275–2293
Metzenberg, S., Joblet, C., Verspieren, P., Agabian, N. (1993) Ribosomal protein L25 from Trypanosoma brucei: phylogeny
and molecular co-evolution of an rRNA-binding protein and its rRNA binding site Nucleic Acids Res, . 21, 4936–4940
Brown, J.W., Clark, G.P., Leader, D.J., Simpson, C.G., Lowe, T. (2001) Multiple snoRNA gene clusters from Arabidopsis
RNA, 7, 1817–1832
Bridges, C.B. (1935) Salivary chromosome maps with a key to the banding of the chromosomes of Drosophila melanogaster.
J. Hered., 26, 60–64.
Muller, H. (1936) Bar duplication. Science, 83, 528–530.
Brosius, J. (2003) The contribution of RNAs and retroposition to evolutionary novelties. Genetica, 118, 99–116.
Pasquinelli, A.E. and Ruvkun, G. (2002) Control of developmental timing by microRNAs and their targets. Annu. Rev. Cell
Dev. Biol., 18, 495–513.
Grun, D., Wang, Y.-L., Langenberger, D., Gunsalus, K.C., Rajewsky, N. (2005) microRNA target predictions across seven
Drosophila species and comparison to mammalian targets. PLoS Comput. Biol., 1, e13.
Houbaviy, H.B., Dennis, L., Jaenisch, R., Sharp, P.A. (2005) Characterization of a highly variable eutherian microRNA
gene. RNA, 11, 1245–1257.
(Anja Zemann, Anja op de B)
In contrast to mRNAs, which are templates for translating proteins, non-protein coding (npc) RNAs (also known as ‘non-
coding’ RNA, ncRNA), exhibit various functions in different compartments and developmental stages of the cell. Small
nucleolar RNAs (snoRNAs), one of the largest classes of npcRNAs, guide post-transcriptional modifications of other RNAs that
are crucial for appropriate RNA folding as well as for RNA–RNA and RNA–protein interactions. Although snoRNA genes comprise
a significant fraction of the eutherian genome, identifying and characterizing large numbers of them is not sufficiently
accessible by classical computer searches alone. Furthermore, most previous investigations of snoRNAs yielded only limited
indications of their evolution. Using data obtained by a combination of high-throughput cDNA library screening and
computational search strategies based on a modified DNAMAN program, we characterized 151 npcRNAs, and in particular 121
snoRNAs, from Caenorhabditis elegans and extensively compared them with those in the related, Caenorhabditis briggsae.
Detailed comparisons of paralog snoRNAs in the two nematodes revealed, in addition to trans-duplication, a novel, cis-
duplication distribution strategy with insertions near to the original loci. Some snoRNAs coevolved with their modification
target sites, demonstrating the close interaction of complementary regions. Some target sites modified by snoRNAs were
changed, added or lost, documenting a high degree of evolutionary plasticity of npcRNAs.
INTRODUCTION
Two very surprising discoveries have arisen from the Human Genome Project. One, humans do not have significantly more
protein-coding genes than other mammals; and two, sequences corresponding to protein open reading frames comprise only 1.5%
of our genome (1). The unavoidable conclusion to be drawn from this is that the differences that separate humans from other
species may reside in the remaining 98.5% of the genome that encode untranslated functional RNAs and regulatory regions, or
constitutes non-genic regions. The present work focuses on a defined population of non-protein coding RNAs (npcRNAs), often
not quite correctly termed ‘non-coding’ RNA (ncRNA), derived from a Caenorhabditis elegans cDNA library generated with
size-fractionated RNA (70–600 nt). The size limitation, while excluding mature microRNAs (miRNAs), short interfering RNAs
and large ribosomal RNAs (rRNAs) that are well described elsewhere (2,3), yields predominantly small nucleolar RNAs (snoRNAs)
and spliceosomal RNAs. snoRNAs are 60–300 nt long and guide the post-transcriptional modifications of ribosomal and other
RNAs. Such modifications are crucial for appropriate RNA folding as well as for RNA–RNA and RNA–protein interactions (4).
Furthermore, snoRNAs are thought to be involved in epigenetic mechanisms regulating gene expression. In this context,
deletion of certain imprinted snoRNA clusters in the cerebral cortex is thought to play a causative role in the Prader–Willi
Syndrome of mental retardation (5–7).
Based on structural motifs and function the snoRNA family is divided into two subclasses: C/D-box snoRNAs (C-box
consensus UGAUGA; D-box consensus CUGA) and H/ACA snoRNAs (H-box consensus ANANNA and box ACA), which interact directly by
base complementarity to their target rRNA and spliceosomal RNA sequences to direct 2'-O-ribose methylation and
pseudouridylation, respectively. The complementary regions, known as ‘antisense elements’, reside at the 5' and/or 3' ends
of snoRNAs. Although snoRNA modifications were initially thought to be restricted to rRNA and to be localized strictly in the
nucleolus, a growing list of npcRNAs including transfer RNAs (tRNAs) are also modified by snoRNAs (4,8), and they have also
been found in Cajal bodies, nucleoplasmic substructures involved in processing npcRNAs (9). The spectrum of snoRNA targets
could potentially include even mRNAs, although it cannot be excluded yet that such existing base complementarities are simply
fortuitous and without biological significance (5). Most vertebrate snoRNAs are derived from introns of pre-mRNA transcripts,
especially those from ribosomal protein genes (RPGs) and other housekeeping proteins, and are processed in a complex sequence
involving endonucleases, exonucleases and helicases (10,11). Interestingly, a growing number of host genes do not yield
translatable mRNAs, and it appears that the main function of the corresponding genes and primary transcripts is the
expression of snoRNAs (12–14). Many miRNAs are also hosted by npcRNAs (15).
Systematic searches using experimental RNomics, an EST-like approach tailored for small RNAs, have successfully
identified large numbers of npcRNAs in Mouse (16), Drosophila (17), Arabidopsis (18) and Archaea (8). To better elucidate the
evolutionary pathways of snoRNAs, we have now extended this search to the nematode, C.elegans, an extremely interesting model
eukaryote with a simple body plan but complicated genomics including, for example, cis, trans and alternative splicing
systems. As intermediates between single-celled organisms and ‘higher’ metazoan animals, they offer an excellent system for
studies on metazoan genome function and evolution. To provide a large enough dataset for exhaustive analysis of snoRNAs in
C.elegans we have now combined high-throughput, experimental RNomic screening with computational methods focused on RPGs and
other introns of genes that harbor snoRNAs identified in our experimental approach. Furthermore, we have analyzed the
phylogeny of snoRNAs by comparing the above results with those of Caenorhabditis briggsae, a nematode that shared a common
ancestor with C.elegans some 100 million years ago (mya). Our three-pronged approach revealed possible mechanisms of how
novel snoRNAs arose, spread in the genome, changed targets or were lost over the course of evolution.
MATERIALS AND METHODS
The experimental procedures concerning construction and analysis of libraries are described in Hüttenhofer et al. (16).
Detailed methods for constructing the C.elegans library are given in Supplementary Data.
Computational strategies
The commercial software package DNAMAN was modified, in collaboration with the Lynnon Corporation, to computationally
screen defined databases of intronic sequences for snoRNAs and to identify snoRNA modification target sites from compiled RNA
databases. The modified DNAMAN version is available from http://www.lynnon.com (Mac OS X version 6018 or later). Note that
additional freeware is available at http://lowelab.ucsc.edu/snoscan to analyze C/D-box snoRNAs (19), at
http://lowelab.ucsc.edu/snoGPS for H/ACA-box snoRNAs (20) and at http://rna.tbi.univie.ac.at/cgi-bin/alifold.cgi to analyze
secondary structural prediction (21).
Computational search for snoRNAs in C.elegans
The modified DNAMAN software allowed us to apply complex search profiles to find potential snoRNAs in a compilation of
C.elegans introns of RPGs and genes that harbor experimentally identified snoRNAs. The following search was applied for C/D-
box snoRNAs:
TGATGA(N9-35)CTGA(N4-35)TGATGA(N9-35)CTGA
A maximum of three mismatches were allowed in the first, two in the second, three in the third and one in the last
sequence motif. N9-35 and N4-35 denote variable sequence stretches of at least 9 or 4, respectively, and a maximum of 35 nt.
The search motif for H/ACA-box snoRNAs was ANA(NN)A(N50-100)ACA. No mismatches were allowed. Both searches were accompanied
by intensive structural evaluations of the computationally predicted snoRNAs (Supplementary Figure 1).
The pattern search of DNAMAN is implemented in C language. Details of the search procedure are provided by the Lynnon
Corporation (Supplementary Data).
5' Extension of experimentally found snoRNAs
BLAST searches of the cDNA sequences were made against the C.elegans non-redundant (nr) NCBI database
(http://www.ncbi.nlm.nih.gov/blast), the Santa Cruz server (http://genome.ucsc.edu/cgi-bin/hgBlat), the Sanger database
(http://www.ensembl.org/Caenorhabditis_elegans/blastview) or the RPG databank (http://ribosome.miyazaki-med.ac.jp). Thus, the
sequences absent in the truncated cDNAs (usually some 10 nt) were extended with the aid of genomic sequences. The mature 5'
ends were estimated by structural requirements of mature snoRNAs (Supplementary Figure 1).
Target site search
A compiled library of all C.elegans rRNA, spliceosomal and tRNA genes was searched with the modified DNAMAN software for
potential snoRNA target motifs. For C/D-box snoRNA target sites we allowed a maximum of three G–U pairs and a minimum length
of 9 nt. For H/ACA-box snoRNAs we used a similar search profile but allowed a split of target sites in four or five
contiguous nucleotides. The detailed search process is shown in Supplementary Figure 1.
Comparison with C.briggsae sequences
The same database sources as mentioned above were used to computationally detect orthologous snoRNAs in C.briggsae.
Secondary structures of snoRNAs
The secondary structures of all experimentally and computationally identified snoRNAs were derived using the M-fold
program (22); http://www.bioinfo.rpi.edu/applications/mfold/old/rna.
RESULTS
Analysis of the size-fractionated cDNA library of C.elegans
Following high-density array hybridization of 38 400 cDNA sequence clones to exclude known small npcRNAs or fragments of
degraded large rRNAs (Supplementary Figure 2), we selected 4673 clones for sequencing. Exclusion of unreadable or very short
sequences, empty vectors, E.coli contaminations and other ambiguities yielded 3294 clones; among these we identified 15 known
spliceosomal RNAs (294 sequences), 41 known tRNAs (322 sequences), 3 isoforms of SRP (signal recognition particle) RNA (736
sequences), 29 different parts of known rRNAs that escaped prior exclusion (1180 sequences), 22 known mRNAs (31 sequences), 7
splice leader RNA sequences (SL; 64 sequences) and two histone hairpin RNAs (2 sequences) all of which were excluded from a
more detailed analysis (SL, SRP, histone hairpin and spliceosomal RNAs are listed in Supplementary Data). The remaining 665
sequences contained 120 npcRNAs including 91 snoRNAs (Figure 1).
Computational screening for additional npcRNA candidates
In addition to those npcRNAs identified experimentally, computer searches based on the following arguments yielded
another 23 snoRNAs (Figure 1). Yoshihama et al. (11) estimated that RPGs harbor about one-third of all snoRNAs in the human
genome. In our experimentally identified snoRNAs we also observed that genes harboring one snoRNA in an intron are likely to
encode additional snoRNAs in the same intron or in neighboring introns of the same gene. Consequently, we extracted and
analyzed snoRNA candidates from introns of all known C.elegans RPGs and from other intronic sequences that were found in the
proximity of our experimentally detected snoRNAs. We used the following stringent criteria to validate all computationally
detected snoRNA candidates (Supplementary Figure 1): (i) presence of all snoRNA structural requirements and box motifs; (ii)
identification of potential modification target site complementarities, (iii) sequence conservation in C.briggsae, and/or
signals in northern blots. From >100 potential candidates (Supplementary Data) an additional 23 novel snoRNA candidates met
these stringent criteria (Figure 1; comCe).
The reliability of the computational algorithm was confirmed in that we were also able to identify all but 17 of the
experimentally found or previously predicted snoRNAs with these search criteria. Those snoRNAs not confirmed in the computer
search were structurally modified and therefore did not match our search profile (data not shown). An additional BLAST search
of Genbank genomic sequences revealed seven snoRNA paralogs (Figure 1, blCe, blCb) and one additional spliceosomal RNA
(blCe378).
snoRNAs
Of all 154 experimentally or computationally identified sequences, 59 are novel snoRNAs candidates (Figure 1, I), while
65 of the recovered snoRNA candidates were recently confirmed experimentally (23) (59 candidates) or (24) (6 candidates)
(Figure 1, II; Supplementary Data). For completeness, Figure 1 (III) also lists 20 other snoRNA candidates that were not
recovered by our screen, but were identified previously by either Deng et al. (23) (18 candidates) or Wachi et al. (24) (2
candidates), and is thus now a compilation of all presently known C.elegans snoRNAs.
Altogether, we found 76 unambiguous snoRNA candidates with motifs, secondary structure elements and recognizable target
modification complementarities characteristic of C/D-box snoRNAs. Based on their chromosomal locations, individual candidates
could be described as either intronic or intergenic snoRNAs (Figure 1 and Supplementary Data). All but 16 are also
potentially functional in C.briggsae and are located at orthologous loci; 10 of these 16 were recognizable, but diverged, at
orthologous positions in C.briggsae (Supplementary Figure 3). Presumably, they became inactive pseudogenes that lack motifs
and structures to function as bona fide snoRNAs. Interestingly, in all C/D-box snoRNA candidates (as well as the H/ACA-box
snoRNAs) we identified a characteristic uridine-rich region adjacent to the mature 3' ends. This sequence has previously been
implicated in maturation of H/ACA-box snoRNAs only (25). We also found a C/D-box homodimer (Ce234) and a chimeric C/D-H/ACA-
box snoRNA (Ce104). Northern blot analysis of both resulted in hybridization to only dimeric forms, indicating that the
respective dimers are the mature forms of these snoRNAs. Interestingly, we detected only six C/D-box snoRNA candidates in RPG
introns compared with 20 H/ACA-box snoRNAs (Figure 1).
We also identified 48 H/ACA-box snoRNAs that were localized to intronic and intergenic regions (Figure 1 and
Supplementary Data). Only seven of those are probably not functional in C.briggsae (Figure 1). The sequences of three H/ACA-
box snoRNA orthologs are apparently diverged pseudogenes in C.briggsae (Supplementary Figure 3; comCb17, comCb22, blCb176).
snoRNA modification target sites and distribution patterns
In keeping with their function, snoRNAs have dual binding capacity for both small RNA modifying proteins and, via
specific sequence complementarity, for their target RNAs. We identified complementarities for potential modification targets
in 5S, 5.8S, 18S and 26S rRNAs; in U1, U2, U4, U5 and U6 spliceosomal RNAs and in tRNAs. Twelve 26S rRNA target sites are
supported by the presence of nucleotide modifications (26) (Supplementary Figure 3, black dots). This is the first time that
C/D-box snoRNAs in eukaryotes have been identified with potential target sites in various tRNAs (Ce62-tRNAIle, Ce63-tRNASer,
Ce94-tRNAAsn, Ce246-tRNAIle, comCe3-tRNAThr, comCe18-tRNAArg) (Figure 2). tRNA modifications guided by snoRNAs have been
reported thus far only in Archaea (8,27). Another interesting observation was the presence of two antisense elements in some
of our snoRNAs (e.g. Ce173.3, Ce251, Ce298, Ce23) with complimentary regions suggesting the potential to modify target RNAs
located in two different subcellular compartments. These snoRNAs are predicted to modify rRNAs that occur in the nucleolus,
as well as U1, U4, U5 spliceosomal RNAs that are present in Cajal bodies. Even an individual antisense element has the
potential to be complementary to more than one hypothetical target site (Supplementary Figure 3).
From our 121 experimentally and computationally identified snoRNAs in C.elegans, 98 potentially functional orthologs were
identified in C.briggsae (Figure 1). Forty of these orthologous pairs contain matching sequence complementarities to the same
RNA modification targets in C.briggsae and C.elegans (Supplementary Figure 3a and b). Surprisingly, the potential target
sites for the majority of them changed over a period of 100 million years (Supplementary Figure 3c and d).
snoRNA paralogs, generated perhaps by gene duplication, have been observed frequently and are a potential source for the
creation of novel snoRNAs (28). We identified 20 snoRNAs and their corresponding paralogs (11 pairs are orthologs in both
C.elegans and C.briggsae, 6 pairs in C.elegans and 3 pairs in C.briggsae only; Figure 3a). To help determine whether the
computationally identified H/ACA-box snoRNA paralogs are functional, we analyzed the compensatory nucleotide substitution
patterns in their double-stranded stem structures. Compensatory changes tend to maintain the secondary structure of stem
regions and indicate selection pressure for functionality. Characteristic compensatory changes could be found for all
identified H/ACA-box snoRNA paralogs suggesting that they retained their functionality, at least for a sufficient period to
form individual compensations after duplication (data not shown). Compensatory substitution pattern analyses in C/D-box
snoRNAs are not of much help in determining their functionality because they do not possess sufficient amounts of double-
stranded structures; thus, C/D-box snoRNAs were omitted from this analysis.
The chromosomal localization of snoRNA paralogs could be categorized based on two distinguishing events: those in which
snoRNA paralogs inserted into different positions in the same gene (cis-duplication), and those in which snoRNA paralogs
inserted into target genes (or intergenic regions) other than the original host gene or, in the event of host gene
duplication, moved to a different chromosomal location along with the host gene (trans-duplication).
Cis-duplication of snoRNA paralogs
The presence and/or absence of 20 snoRNA paralogs were analyzed in C.elegans and C.briggsae. Figure 4a shows a C.elegans
snoRNA (comCe12) that is conserved at the orthologous locus (intron 2 of the rps-29 gene) in C.briggsae. A paralog of this
snoRNA (Ce236) was also present in our cDNA library. In C.elegans Ce236 is located in intron 1 of the same rps-29 host gene,
while the orthologous position in the rps-29 gene of C.briggsae is empty. Since the probability of a clean excision of the
snoRNA without parts of the flanking sequences at this position in C.briggsae is negligible, this indicates a duplication
process involving integration into the adjacent intron (cis-duplication) after C.elegans split from a common ancestor with
C.briggsae. Interestingly, the function of Ce236 in C.elegans may have been replaced in C.briggsae by another non-orthologous
snoRNA (Cb309) that has the potential to modify the identical nucleotide in 26S rRNA, while the Cb309 ortholog in C.elegans
(Ce309) modifies a target sequence in 18S rRNA. A similar scenario is shown in Figure 4b. We found a snoRNA (comCe7) present
at orthologous positions of the rpl-24 gene in C.elegans, C.briggsae and Caenorhabditis remanei. A paralog (Ce80) could be
detected in intron 1 of the same gene in C.elegans only. Figure 4c shows a snoRNA (comCe14) present in orthologous positions
of the hypothetical protein gene K07C5 in C.elegans and C.briggsae. A corresponding paralog is found in intron 7 of the same
gene in C.elegans (comCe15) but not in C.briggsae. In all presence/absence cases examined, the intronic sequences flanking
the duplicated snoRNAs were recognizable at the corresponding, empty loci.
Trans-duplication of snoRNA paralogs
We could distinguish two forms of trans-duplications, both of which are exemplified in Figure 5. In some instances of
segmental duplications of entire genes that harbor snoRNAs in one or more introns, the snoRNA did not move to another part of
the host gene but hitchhiked with the host to a new location after duplication. In other cases, snoRNAs inserted into introns
of a new host gene without traces of the original host gene, or into a new intergenic location. Figure 5a describes an
example of segmental duplication of a hypothetical protein gene (C06A1.3) yielding a duplicated pseudogene (with respect to
the protein-coding capacity) including the paralogous snoRNAs Ce173.1-3. Figure 5b shows two experimentally identified snoRNA
paralogs; Ce254b is located in intron 1 of a hypothetical protein gene (Y53F4B.12). The paralog Ce254b duplicated, along with
the 5' (100 nt) and 3' (50 nt) sequences of its original flanking intron, but without detectable surrounding exons, and
migrated into a new location on a different chromosome. Interestingly, the separate left and right antisense elements of
Ce254 (Figure 3b, bottom) modify bases in 26S rRNA that are shifted by only 6 nt. Hence, the sequences on 26S rRNA that are
complementary to the two snoRNA antisense elements overlap by 2 nt (Figure 3b, top). This indicates that modification of the
two methylation targets is not likely to occur at the same time. We found both paralogs at orthologous positions in
C.briggsae, indicating that the duplication event took place in a common ancestor of both worms. Both forms retained their
modification targets over 100 my demonstrating strong functional constraints. The fact that two conserved snoRNA paralogs
modify the same targets indicates that one may not be enough to perform modification of all rRNA molecules, and that
quantitative aspects play an important role in snoRNA function. Figure 5c and d describe snoRNA paralogs that are located in
totally different surroundings following duplication. In the latter case it is noteworthy that the comCe6 paralog moved from
one RPG (rpl-7) to another (rps-13) as the Ce280 paralog or vice versa.
Functional plasticity of snoRNA paralogs
Data provided by both the experimental and computational searches, as well as comparisons of paralogous snoRNAs in both
C.elegans and C.briggsae enabled us to analyze target sites and hence function of duplicated snoRNA genes. We observed three
different fates of the snoRNAs following duplication: (i) one of the paralogs apparently became inactive and decayed during
the course of evolution; (ii) the new paralog maintained the same function as the original snoRNA and (iii) the new paralog
either partially (one antisense element maintained the same target and the other acquired a new one) or fully diverged with
respect to the complementary targets. Of the 20 pairs of paralogs, we found 4, 16 and 10 examples for the above three
scenarios, respectively. One example of target site plasticity is illustrated in C/D-box snoRNA Ce246, which was detected
experimentally in the C.elegans cDNA library and computationally in C.briggsae. In C.briggsae one paralog differs from the
other mainly by a 2 nt deletion 5' adjacent to the D'-box, shifting the methylation site by 2 nt (blCb246a-blCb246b). In 26S
rRNA G860 is modified by one paralog and A862 by the other.
Coevolution of snoRNA and rRNA modification target sites
Coevolution is defined as a change in the genetic composition of one species in response to a genetic change in another
(29,30). This definition can be adapted to molecular interactions within organisms. Biologically significant interactions
within macromolecules [e.g. RNA secondary structure; (31)] or between macromolecules, [e.g. RNA and proteins], can be
demonstrated by compensatory changes in one or the other (32). Two of the C/D-box snoRNAs (Ce138, Ce234.2) exemplify adaptive
evolution of the snoRNA complementary region to their 26S rRNA target sequence (Figure 6). In the lineage leading to
C.elegans, an AU substitution occurred in the 26S rRNA target site of the Ce138 snoRNA. This base change is not present in
C.briggsae or C.japonica 26S rRNA sequences (data not shown). Accordingly, we found a compensatory UA substitution in the
antisense element of the snoRNA ortholog in C.elegans (Figure 6a), but not in C.briggsae or C.japonica. At another 26S rRNA
position we found an AG substitution in C.briggsae but not in C.elegans or C.japonica (data not shown). The corresponding
C.briggsae snoRNA Cb234.2 shows a compensatory change from UC (Figure 6b).
DISCUSSION
The combined impact of experimental and computational npcRNA screening
Our goal was to obtain as comprehensive a view as possible of cellular snoRNA expression in C.elegans. Creating a cDNA
library based on size-fractionated, expressed RNAs, and enriched to remove large numbers of known RNAs, yielded highly
efficient experimental search results. We present here a detailed analysis of 120 different npcRNA species (Figure 1, groups
I, II) from 665 informative sequences selected from an initial 38 400 clones. Moreover, to complement and validate the
results of this experimental approach, we customized commercially available computer software to generate a search tool for
identifying snoRNA candidates and their modification target sites according to a set of stringent criteria. From >100
potential intronic snoRNA candidates, 23 additional candidates fulfilled these conditions; another 8 npcRNAs were found by
BLAST search. Thus, it is obvious that while computational approaches are not capable of supplanting experimental work, they
do constitute a very useful complementation. This was particularly exemplified by our ability to analyze experimentally
identified snoRNAs that, although apparently still functional, had diverged from the canonical motifs used for the computer
search. In fact, the pitfalls of not complementing experimental results with such careful computational analyses can be
clearly seen in a recent experimental screen (23). Of the C.elegans 56 novel snoRNAs shown in Figure 1, 14 were also reported
recently but were analyzed either incorrectly or not at all (23). As examples, Ce96 (CeN25-2) or Ce135 (CeN25-1) were
described as members of a novel class of small nuclear-like RNAs (23) (see their Figure 3D). Nevertheless, we could discern
clear characteristics of C/D-box snoRNAs for both of these npcRNAs using computational analyses. Ce173.1-3 (CeN128) was
described as one single H/ACA-box snoRNA species. By comparative analyses of C.elegans and C.briggsae we could distinguish
them as three independent C/D-box snoRNAs. The same was true for Ce234.1-2 (CeN47) that they defined as one single snoRNA
species. Ce110 (CeN42) is clearly a C/D-box snoRNA but they defined it as an H/ACA-box snoRNA even though part of the
predicted H/ACA-box snoRNA would clearly overlap exonic sequences. They also identified six other unclassified npcRNAs [Ce86
(CeN35), Ce151 (CeN129), Ce254a (CeN23-1), Ce254b (CeN23-2), Ce282(CeN52), Ce105 (CeN66)] which we could clearly assign to
specific snoRNA categories.
Target site plasticity
Our in silico target site complementarity search provided evidence of a high degree of plasticity in target site
modification. In some cases we found evidence to suggest that the two complementary regions of particular snoRNAs modify
targets in different compartments of the nucleoplasm, namely rRNAs in the nucleolus and spliceosomal RNAs in the Cajal body.
In addition to the classical modification targets, we also found snoRNA complementarities for target sites in five different
tRNAs. Modification of tRNAs by snoRNAs has been demonstrated so far only for Archaea and not for Eukarya (8,27). There is
evidence that, following duplication, several snoRNA paralogs evolved new target site complementarities. Comparing C.elegans
and C.briggsae, we observed that several specific modification sites of rRNAs are targeted by otherwise unrelated snoRNAs in
both species. Losing, gaining or changing target sites are frequent phenomena that document the plasticity of modification
interactions. Another source of plasticity is the compensatory changes of snoRNA target site complementary sequences that
arose following base substitutions in their targets as illustrated in the case of Ce138 and Ce234.2 (Figure 6). Although
several of our predicted target sites were confirmed by experimental approaches (26), a more conclusive verification of other
target sites is necessary.
Birth and evolution of snoRNAs
Little is known about the origin and distribution of snoRNAs. Polycistronic clusters of snoRNAs are frequent in plants,
and propagation due to cluster duplication is generated by polyploidization (33). However, polycistronic clusters of snoRNAs
are the exception in vertebrates, as snoRNAs in those organisms are mainly singular and intron-encoded. To elucidate the
process of snoRNA propagation in a ‘model’ eukaryote, we analyzed presence/absence patterns of snoRNA paralogs in C.elegans
and compared them with those in C.briggsae. We identified three snoRNA paralogs with clear presence/absence patterns (Figure
4). These patterns suggest a copy/paste mechanism in the duplication of certain singular snoRNAs into neighboring introns of
the same gene (cis-duplication). Cis-duplication seems to be a dominant process for H/ACA-box snoRNA propagation, but thus
far we have not identified any C/D-box snoRNA paralogs generated by cis-duplication. We found that most host genes harbor
predominantly one type of intronic snoRNA (e.g. H/ACA-box snoRNAs in RPGs; Figure 1 and Supplementary Data). One notable
exception is the C/D-box snoRNA comCe4, which is located among several H/ACA-box snoRNAs in the
hypothetical protein gene K07C5.4 (Figure 4).
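The distinction between cis- and trans-duplication drawn above can be expressed as a small classification rule. The following is only a sketch under stated assumptions: the record type SnoRnaCopy, the function names and the example gene and snoRNA names are all hypothetical, and the rule simply treats two paralogs in the same host gene as cis-duplicates and everything else as trans-duplicates.

```python
# Minimal sketch: label a pair of snoRNA paralogs as cis- or trans-duplicated.
# SnoRnaCopy, classify_duplication and all example names are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SnoRnaCopy:
    name: str
    host_gene: Optional[str]  # None if the copy lies outside any annotated gene
    intron: Optional[int]     # 1-based intron index within the host gene


def classify_duplication(original: SnoRnaCopy, duplicate: SnoRnaCopy) -> str:
    """Return "cis" if both copies sit in the same host gene, otherwise "trans"."""
    if original.host_gene is not None and original.host_gene == duplicate.host_gene:
        return "cis"
    return "trans"


def intron_gap(original: SnoRnaCopy, duplicate: SnoRnaCopy) -> Optional[int]:
    """Distance in introns between two copies of the same host gene, else None."""
    if classify_duplication(original, duplicate) != "cis":
        return None
    if original.intron is None or duplicate.intron is None:
        return None
    return abs(original.intron - duplicate.intron)


copy_a = SnoRnaCopy("snoX.1", "geneA", 2)
copy_b = SnoRnaCopy("snoX.2", "geneA", 3)  # neighboring intron, same gene
copy_c = SnoRnaCopy("snoX.3", "geneB", 1)  # different host gene

print(classify_duplication(copy_a, copy_b), intron_gap(copy_a, copy_b))
print(classify_duplication(copy_a, copy_c), intron_gap(copy_a, copy_c))
```

Applying such a rule to each species separately and comparing which copies are present in C.elegans but absent in C.briggsae (or vice versa) would give the kind of presence/absence pattern discussed in the text.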
Our data also suggest that snoRNAs can be propagated by complete or partial gene duplication that includes the embedded
snoRNAs, a type of event that has been proposed to precede evolutionary novelties (Figure 5) (34,35). Brosius (36) suggested
that snoRNAs could be propagated by retroposition, a mechanism that might account for trans-duplicated snoRNAs. However,
because insertions of retroposed sequences are virtually random and should not accumulate in neighboring introns,
retroposition seems not to be involved in cis-duplication. Local, unequal recombination is a more probable mechanism for
cis-duplication, especially in C.elegans, because of the A/T-rich surroundings of snoRNA sequences.
In summary, the gain, loss and change of targets of snoRNAs over relatively short evolutionary times, possibly similar to
the evolution of miRNAs (37–39), indicate that npcRNAs are not merely fossils from the long-gone RNA/RNP world but continue
to contribute to the changing needs of cells and genomes. This constitutes an astounding and unexpected level of plasticity
for a primordial macromolecule such as RNA.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS
We thank Yue Huang for implementing modifications of the DNAMAN software and Marsha Bundman for editorial assistance.
This work was supported by the German Human Genome Project through the BMBF (#01KW9966), and by grants from the Fonds der
Chemischen Industrie and from the European Union (EU; LSHG-CT-2003-503022) to J.B., and from the Nationales Genomforschungsnetz
(NGFN; 0313358A) to J.B. and J.S. Funding to pay the Open Access publication charges for this article was provided by NGFN.
Conflict of interest statement. None declared.
REFERENCES
Lander, E.S. (2001) Initial sequencing and analysis of the human genome Nature, 409, 860–921
Stein, L., Bao, Z., Blasiar, D., Blumenthal, T., Brent, M.R., Chen, N., Chinwalla, A., Clarke, L., Clee, C., Coghlan, A.,
et al. (2003) The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics PLoS Biol., 1, 166–192
Lim, L.P., Lau, N.C., Weinstein, E.G., Abdelhakim, A., Yekta, S., Rhoades, M.W., Burge, C.B., Bartel, D.P. (2003) The
microRNAs of Caenorhabditis elegans Genes Dev., 17, 991–1008
Kiss, A.M., Jady, B.E., Bertrand, E., Kiss, T. (2004) Human box H/ACA pseudouridylation guide RNA machinery Mol. Cell.
Biol., 24, 5797–5807
Cavaillé, J., Buiting, K., Kiefmann, M., Lalande, M., Brannan, C.I., Horsthemke, B., Bachellerie, J.-P., Brosius, J.,
Hüttenhofer, A. (2000) From the cover: identification of brain-specific and imprinted small nucleolar RNA genes exhibiting an
unusual genomic organization Proc. Natl Acad. Sci. USA, 97, 14311–14316
Mattick, J.S. and Makunin, I.V. (2005) Small regulatory RNAs in mammals Hum. Mol. Genet., 14, R121–132
Kishore, S. and Stamm, S. (2006) The snoRNA HBII-52 regulates alternative splicing of the serotonin receptor 2C Science,
311, 230–232
Tang, T.-H., Bachellerie, J.-P., Rozhdestvensky, T., Bortolin, M.-L., Huber, H., Drungowski, M., Elge, T., Brosius, J.,
Hüttenhofer, A. (2002) Identification of 86 candidates for small non-messenger RNAs from the archaeon Archaeoglobus fulgidus
Proc. Natl Acad. Sci. USA, 99, 7536–7541
Darzacq, X., Jady, B.E., Verheggen, C., Kiss, A.M., Bertrand, E., Kiss, T. (2002) Cajal body-specific small nuclear RNAs:
a novel class of 2'-O-methylation and pseudouridylation guide RNAs EMBO J., 21, 2746–2756
Kiss, T. (2002) Small nucleolar RNAs: an abundant group of noncoding RNAs with diverse cellular functions Cell, 109,
145–148
Yoshihama, M., Uechi, T., Asakawa, S., Kawasaki, K., Kato, S., Higa, S., Maeda, N., Minoshima, S., Tanaka, T., Shimizu,
N., et al. (2002) The human ribosomal protein genes: sequencing and comparative analysis of 73 genes Genome Res., 12,
379–390
Higa, S., Maeda, N., Kenmochi, N., Tanaka, T. (2002) Location of 2'-O-methyl nucleotides in 26S rRNA and methylation
guide snoRNAs in Caenorhabditis elegans Biochem. Biophys. Res. Commun., 297, 1344–1349
Singh, S.K., Gurha, P., Tran, E.J., Maxwell, E.S., Gupta, R. (2004) Sequential 2'-O-methylation of archaeal pre-tRNATrp
nucleotides is guided by the intron-encoded but trans-acting box C/D ribonucleoprotein of pre-tRNA J. Biol. Chem., 279,
47661–47671
Chen, C.-L., Liang, D., Zhou, H., Zhuo, M., Chen, Y.-Q., Qu, L.-H. (2003) The high diversity of snoRNAs in plants:
identification and comparative study of 120 snoRNA genes from Oryza sativa Nucleic Acids Res., 31, 2601–2613
Ehrlich, P.R. and Raven, P.H. (1964) Butterflies and plants: a study in coevolution Evolution, 18, 586–608
Page, R.D.M. and Holmes, E.C. (1998) Molecular Evolution: A Phylogenetic Approach. Blackwell Science Ltd, Oxford
Woese, C., Magrum, L., Gupta, R., Siegel, R.B., Stahl, D.A., Kop, J., Crawford, N., Brosius, J., Gutell, R., Hogan, J.J.,
et al. (1980) Secondary structure model for bacterial 16S ribosomal RNA: phylogenetic, enzymatic and chemical evidence
Nucleic Acids Res., 8, 2275–2293
Metzenberg, S., Joblet, C., Verspieren, P., Agabian, N. (1993) Ribosomal protein L25 from Trypanosoma brucei: phylogeny
and molecular co-evolution of an rRNA-binding protein and its rRNA binding site Nucleic Acids Res., 21, 4936–4940
Brown, J.W., Clark, G.P., Leader, D.J., Simpson, C.G., Lowe, T. (2001) Multiple snoRNA gene clusters from Arabidopsis
RNA, 7, 1817–1832
Bridges, C.B. (1935) Salivary chromosome maps with a key to the banding of the chromosomes of Drosophila melanogaster
J. Hered., 26, 60–64
Muller, H. (1936) Bar duplication Science, 83, 528–530
Brosius, J. (2003) The contribution of RNAs and retroposition to evolutionary novelties Genetica, 118, 99–116
Pasquinelli, A.E. and Ruvkun, G. (2002) Control of developmental timing by microRNAs and their targets Annu. Rev. Cell
Dev. Biol., 18, 495–513
Grun, D., Wang, Y.-L., Langenberger, D., Gunsalus, K.C., Rajewsky, N. (2005) microRNA target predictions across seven
Drosophila species and comparison to mammalian targets PLoS Comput. Biol., 1, e13
Houbaviy, H.B., Dennis, L., Jaenisch, R., Sharp, P.A. (2005) Characterization of a highly variable eutherian microRNA
gene RNA, 11, 1245–1257
(Anja Zemann, Anja op de B)