Nucleic Acids Research, 2004, Issue 22
The identification of novel RNA structural motifs using COMPADRES: an
     Department of Physics, Columbia University, New York, NY 10027, USA, 1 Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA and 2 Howard Hughes Medical Institute, 4000 Jones Bridge Road, Chevy Chase, MD 20815, USA

    * To whom correspondence should be addressed. Tel: +1 203 432 5633; Fax: +1 203 432 5316; Email: anna.pyle@yale.edu

    ABSTRACT

    Recurring RNA structural motifs are important sites of tertiary interaction and, as such, are integral to RNA macromolecular structure. Although numerous RNA motifs have been classified and characterized, the identification of new motifs is of great interest. In this study, we discovered four new conformationally recurring motifs: the π-turn, the Ω-turn, the α-loop and the C2'-endo mediated flipped adenosine motif. Not only do they have complex and interesting structures, but they participate in contacts of high biological significance. In a first for the RNA field, new motifs were discovered by a fully automated algorithm. This algorithm, COMPADRES, utilized a reduced representation of the RNA backbone and was highly successful at discerning unique structural relationships. This study also shows that recurring RNA substructures are not necessarily accompanied by consistent primary or secondary structure.

    INTRODUCTION

    The architectural richness of RNA tertiary structure has been revealed through investigations of self-cleaving and self-splicing RNAs (1), tRNA molecules (2,3) and ribosomal subunits (4). These studies have indicated that folded RNA molecules are largely constructed from elaborate networks of interactions among a defined set of molecular building blocks, or RNA motifs (5). Understanding the structural and functional characteristics of RNA motifs is therefore of fundamental importance to our knowledge of RNA tertiary structure form and stability.

    An RNA motif is typically classified in one of two ways. (i) It is physically superimposable with another element of RNA structure. (ii) It shares a particular mode of molecular interaction with other elements of RNA structure (5,6). For example, ‘A-minor motifs’ do not necessarily share identical local structures; rather, they are defined by a common mode of intermolecular interaction (7). Similarly, ‘ribose zipper motifs’ can be conformationally diverse and yet share common patterns of 2'-hydroxyl recognition (8,9). Base-pair classification schemes have been very useful for annotating units of secondary and tertiary structure in terms of nucleobase interactions (10). By classifying an RNA motif as a consensus set of sequences and base pairings, one can catalog units of structure and predict occurrences of a motif (11).

    An alternative approach for defining and identifying motifs employs only the conformational properties of an RNA molecule. RNA motifs that are categorized by a particular local structure include the GNRA tetraloop (12,13), the S1 and S2 turns (14,15), the kink turn (16) and the hook turn (17). Members of these individual motif classes have three-dimensional structures that are largely superimposable. Despite the importance of the nucleobase in nucleic acid structure, many RNA motifs have been successfully classified by focusing only on conformation of the RNA backbone. The six backbone torsion angles of RNA contain considerable information and may be uniquely capable of describing numerous motifs. In fact, the standard backbone torsion angles have been used as descriptors in the development of automated methods for grouping stretches of recurring RNA structure (18), although the approach has not identified new motifs. Future work to categorize and discover motifs may be facilitated by evidence showing that the RNA backbone prefers discrete values for individual torsion angles (19,20).

    In an alternative approach, the conformational space of the RNA backbone can be simplified using a reduced representation that defines each nucleotide as a set of two pseudobonds that stretch from successive P and C4' atoms (21–23). Associated with these pseudobonds are two pseudotorsion angles: η (C4'i–1 – Pi – C4'i – Pi+1) and θ (Pi – C4'i – Pi+1 – C4'i+1), for a given nucleotide, i. This shorthand convention has proven to be a conformationally meaningful way to define RNA structural features (23) (Leven M. Wadley, Carlos M. Duarte and Anna Marie Pyle, manuscript in preparation), particularly when used as a series of coordinates that describe a sequential string of nucleotides (13,15). An ordered set of η–θ coordinates for a given stretch of RNA is called an RNA worm (15), which is calculated using the program PRIMOS. RNA worms can be superimposed and used to detect subtle conformational differences between RNA structures. They have also been shown to accurately describe individual RNA motifs and to group them into classes of superimposable structure. For example, PRIMOS can use a defined worm as ‘bait’ to automatically search for recurring RNA motifs within structures deposited in the Protein Data Bank (PDB) (24). Given that PRIMOS can use RNA worms to sensitively detect differences between similar RNA structures, and to catalog the known motifs that occur within a structure, we wondered whether PRIMOS could be used to identify previously undetected RNA conformational motifs within the database of RNA structures.
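
    As an illustration of the pseudotorsion convention, the following minimal Python sketch computes η and θ for the interior nucleotides of a chain from P and C4' coordinates. It is not the PRIMOS code: the function names `dihedral` and `worm`, the input layout (one coordinate triple per nucleotide) and the mapping of angles onto 0–360° are assumptions made for this example.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, mapped onto 0-360 (assumed range)."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x)) % 360.0


def worm(p_atoms, c4_atoms):
    """Return [(eta, theta), ...] for nucleotides that have both neighbours.

    eta_i   = C4'(i-1) - P(i)   - C4'(i)  - P(i+1)
    theta_i = P(i)     - C4'(i) - P(i+1)  - C4'(i+1)
    """
    coords = []
    for i in range(1, len(p_atoms) - 1):
        eta = dihedral(c4_atoms[i - 1], p_atoms[i], c4_atoms[i], p_atoms[i + 1])
        theta = dihedral(p_atoms[i], c4_atoms[i], p_atoms[i + 1], c4_atoms[i + 1])
        coords.append((eta, theta))
    return coords
```

    The first and last nucleotides of a chain are skipped here because one of the four atoms needed for η or θ is missing; how terminal residues are treated in PRIMOS itself is not stated in the text.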

    Here, we apply a novel automated approach that implements the pseudotorsional convention for identifying entirely new RNA motifs. Adapting PRIMOS, we created a new program, entitled Comparative Algorithm to Discover Recurring Elements of Structure (COMPADRES), which is entirely designed to search for novel recurring elements of RNA structure. Underscoring the limitations in subjective, visual analysis of RNA structure, COMPADRES identified numerous complex and biologically relevant RNA motifs from the database of existing RNA structures. Four new motifs were discovered, including the π-turn, the Ω-turn, the α-loop motif and the C2'-endo mediated flipped adenosine motif. These motifs are defined mathematically by the RNA worms that characterize their backbones, and to the best of our knowledge, three of them have never been characterized structurally before. The work underscores the power of automated approaches for mathematically describing and analyzing RNA structure, and it highlights an important principle of RNA structure: invariant RNA conformations can exist without consistent primary or secondary structure.

    METHODS

    Data set selection

    In order to reduce complexity of the search, we first constructed a non-redundant data set of RNA structures. Certain RNA molecules have been structurally characterized multiple times, or appear multiple times within a unit cell, resulting in numerous PDB files that contain ‘redundant’ (or virtually identical) RNA structures. To eliminate this redundancy and create a database of unique RNA structures, an automated adaptation to the PRIMOS method (15) was used to calculate worm representations for all structures and to automatically compare them to one another. Identical structures were parsed into groups, and a representative example from each group was selected by choosing the longest, highest resolution structure from each set.

    We started with a preliminary structure set that was compiled from PDB files of ribosomal subunits, tRNA structures, ribozymes, aptamers and other crystal structures of RNA molecules with a resolution cutoff of 3 Å. Each PDB file contains one or more RNA ‘chains’, where a chain is defined as a stretch of residues that is labeled with a unique chain identifier and is typically biologically continuous. For example, the PDB file (1JJ2) for the 50S ribosomal subunit of Haloarcula marismortui (H50S) (16) has two RNA chains: one for 23S rRNA and another for 5S rRNA. A worm representation was made for every chain by calculating the η and θ pseudotorsions for all relevant nucleotides. Each of these worms was compared to every other on a nucleotide-by-nucleotide basis by using

    $$\Delta(\eta,\theta) = \sqrt{(\eta_1 - \eta_2)^2 + (\theta_1 - \theta_2)^2}$$

    where (η1,θ1) and (η2,θ2) are the η–θ coordinates for the two nucleotides. Two nucleotides were considered structurally identical if Δ(η,θ) < 25°. Shorter worms were shifted with respect to longer ones in order to make all possible comparisons, and a score, S, between two worms was defined as the maximum percent of nucleotides that were identical. Worms were checked in order from shortest to longest, and a worm was determined to be redundant and eliminated from the data set if S > 85% when compared to any longer worm. This empirical cutoff was stringent enough to remove most redundancy, but permissive enough to prevent elimination of structures with recurring motifs. The resulting data set consisted of 49 structures, 50 chains and 6697 nt with defined η and θ values (Supplementary Table 1 online).
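
    The redundancy filter described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the per-nucleotide metric is taken to be the Euclidean distance between two (η, θ) points, S is expressed as a percentage of the shorter worm, partial overlaps at the ends count as valid shifts, and the names `delta`, `redundancy_score` and `remove_redundant` are hypothetical. Angle wrap-around at 0°/360° is deliberately not handled.

```python
import math


def delta(a, b):
    """Assumed metric: Euclidean distance between two (eta, theta) points, in degrees."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def redundancy_score(short, long_, cutoff=25.0):
    """Maximum percent of nucleotides in `short` that match `long_` over all shifts."""
    n, m = len(short), len(long_)
    if n == 0:
        return 0.0
    best = 0
    for shift in range(-(n - 1), m):  # includes partial overlaps (assumption)
        hits = sum(
            1
            for k in range(n)
            if 0 <= shift + k < m and delta(short[k], long_[shift + k]) < cutoff
        )
        best = max(best, hits)
    return 100.0 * best / n


def remove_redundant(worms, s_cutoff=85.0):
    """Return indices of worms kept after checking from shortest to longest."""
    order = sorted(range(len(worms)), key=lambda i: len(worms[i]))
    keep = []
    for pos, i in enumerate(order):
        longer = order[pos + 1:]  # worms of equal or greater length
        if any(redundancy_score(worms[i], worms[j]) > s_cutoff for j in longer):
            continue  # redundant with a longer worm
        keep.append(i)
    return sorted(keep)
```

    Called on a list of worms, `remove_redundant` returns the indices of the chains that would be retained. The text additionally selects the longest, highest-resolution representative of each group, which this sketch does not attempt.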

    Motif searching

    The COMPADRES search for novel motifs was conducted in two stages:

    Stage 1: This stage automatically grouped pairs of identical stretches of five or more nucleotides in a pair-wise fashion. An RNA worm representation was calculated for the entire, non-redundant data set. That is, the RNA worm representations for all unique structures in the database were effectively strung together in a linear series that results in a readily searchable format for the data. The resulting whole data set worm was then slid against itself so that the η and θ values for every nucleotide were compared to every other using the Δ(η,θ) metric, and identical stretches were grouped.

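    Mathematically, this grouping was accomplished by constructing a square boolean matrix, M. Each entry, Mi,j, in the matrix represented a Δ(η,θ) comparison between two nucleotides, i and j, in the data set:

    $$M_{i,j} = \begin{cases} \text{true}, & \Delta(\eta,\theta)_{i,j} < 25^\circ \\ \text{false}, & \text{otherwise} \end{cases}$$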

    To establish the entries, a worm was calculated for each chain, and the chains were strung together to form a 6697 nt long worm for the entire data set. A continuous stretch of nucleotides was considered identical to another if a set of five or more comparisons {Mi,j, Mi+1,j+1, Mi+2,j+2, ...} (i ≠ j) were all ‘true’. All such pairs of identical stretches were generated at this stage.
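
    A minimal Python sketch of stage 1 is given here for illustration. It builds the boolean matrix for a concatenated worm and reports every diagonal run of at least five ‘true’ entries. The Euclidean form of the metric, the 25° cutoff carried over from the data set selection step and the function names are assumptions; the sketch also ignores chain boundaries and does not exclude stretches that overlap themselves, both of which a real search would need to address.

```python
import math


def delta(a, b):
    """Assumed metric: Euclidean distance between two (eta, theta) points, in degrees."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def identical_stretches(worm, cutoff=25.0, min_len=5):
    """Return (i_start, j_start, length) for each run of matches along a diagonal."""
    n = len(worm)
    # Square boolean matrix M; M[i][j] is True when nucleotides i and j match.
    M = [[delta(worm[i], worm[j]) < cutoff for j in range(n)] for i in range(n)]
    found = []
    for d in range(1, n):  # diagonal offset j - i = d, so i != j
        i = 0
        while i + d < n:
            if not M[i][i + d]:
                i += 1
                continue
            start = i
            while i + d < n and M[i][i + d]:
                i += 1
            length = i - start
            if length >= min_len:
                found.append((start, start + d, length))
    return found
```

    Only the upper triangle is scanned because M is symmetric. For a worm of 6697 nt the full matrix holds roughly 45 million entries, so a practical version might evaluate each diagonal on the fly instead of storing M.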

    Stage 2: This stage automatically eliminated known motifs from the output of stage 1 by employing a reference ‘motif library’ (Supplementary Table 2 online), which was established from several PRIMOS and PRIMOS-like searches for known motifs (13,15,25). Specifically, the library consists of worm representations for GNRA tetraloops, T-loops, S-turns, kink turns, adenosine platforms and tandem purine–purine base pairs. Worm representations for every pair of matching conformers, as identified in stage 1, were compared to all worms in the library. A candidate pair was eliminated if a match occurred with any motif in the library. The resulting set was further reduced by keeping only those stretches of nucleotides that contain three or more markedly non-helical nucleotides, as determined by their η and θ values. The remaining pairs of identical stretches of nucleotides are reported here as novel motifs.
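
    The stage 2 filter can be sketched as follows. Several details are assumptions made only for this example, since the text does not specify them: a library match is taken to mean that every nucleotide of the shorter worm lies within 25° of the aligned nucleotide of the longer one at some shift, ‘markedly non-helical’ is approximated as lying more than 50° from an assumed A-form reference point of (170°, 210°), and all names (`HELIX_CENTER`, `matches_library`, `filter_candidates`) are hypothetical.

```python
import math

HELIX_CENTER = (170.0, 210.0)  # assumed A-form (eta, theta) reference, in degrees


def delta(a, b):
    """Assumed metric: Euclidean distance between two (eta, theta) points, in degrees."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def is_non_helical(nt, radius=50.0):
    """Hypothetical criterion: far from the assumed helical reference point."""
    return delta(nt, HELIX_CENTER) > radius


def matches_library(stretch, library, cutoff=25.0):
    """Return the name of the first library motif that matches, else None."""
    for name, probe in library.items():
        short, long_ = sorted((stretch, probe), key=len)
        if not short:
            continue
        for start in range(len(long_) - len(short) + 1):
            if all(delta(short[k], long_[start + k]) < cutoff
                   for k in range(len(short))):
                return name
    return None


def filter_candidates(candidates, library, min_non_helical=3):
    """Drop known motifs and stretches with too few non-helical nucleotides."""
    novel = []
    for stretch in candidates:
        if matches_library(stretch, library) is not None:
            continue  # already catalogued
        if sum(is_non_helical(nt) for nt in stretch) < min_non_helical:
            continue  # essentially helical
        novel.append(stretch)
    return novel
```

    In use, `library` would map motif names to their worm representations and `candidates` would hold the worms for the stretches produced by stage 1.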

    Finally, a census was taken for each new motif identified, as certain examples were missed by the stringent cutoff employed in stage 1. A worm representation for each new motif was automatically used as a probe worm to search through the data set. A modified PRIMOS search was used to perform this analysis. PRIMOS was modified to include sugar pucker as an intrinsic parameter. Sugar pucker was calculated using the phase angle, P (26). The modified PRIMOS search yielded all stretches of nucleotides in which (P) < 50° and , with respect to the probe worm. Here,

    where the sum is taken over the length of the probe worm, and

    A cutoff was then empirically determined by inspecting the list of possible motifs. This inspection resulted in the comprehensive list of motifs reported in Table 1.

    Table 1. A census of the motifs discovered with COMPADRES

    Software availability

    The COMPADRES search can be reproduced using PRIMOS, our library of known motifs (Supplementary Table 2 online), and our data set of PDB files (Supplementary Table 1 online). All of these are available for download at http://www.pylelab.org. The integrated COMPADRES program will be made available upon request.

    RESULTS

    Summary of the search for motifs

    During the first stage of the COMPADRES analysis, structurally identical stretches of nucleotides were automatically grouped in a pair-wise fashion (see Methods). This resulted in 1300 non-helical stretches of five or more nucleotides that were conformationally identical to at least one other stretch of nucleotides in the data set. This requirement for duplication of candidate conformers derives from the generally accepted definition of an RNA motif: an RNA fold or substructure that occurs more than once in the database.

    During the second stage of analysis, the worm representation (string of η–θ coordinates) that describes each of the candidate motifs was automatically compared to the worm representations that describe the library of known motifs. All matches to known motifs were discarded, along with any substructures that contained two or fewer non-helical nucleotides. This procedure resulted in four novel substructures, each of which was repeated at least once in the database (i.e. four new recurrent RNA motifs were identified). It is worth noting that a search for novel RNA folds (in which there would be only one example in the database) is necessarily a different, simpler, computational procedure that yields a different set of novel RNA conformations (data not shown).

    After the COMPADRES search was completed, PRIMOS was used to conduct a census that determined the total population of each new motif in the database. All occurrences of each new motif were thereby identified, including examples that had been lost during the stringent selection process used by COMPADRES (see Table 1).

    New motifs identified by COMPADRES

    The π-turn

    The π-turn is mathematically characterized by the η–θ coordinate string described in Table 2. It consists of five conformationally similar nucleotides along one strand, and it mediates a 120° change in backbone direction (Figure 1A). Qualitatively, this results in tight pinching of one RNA strand and the exclusion of specific extrahelical bases. On the 5'-side of the π-turn, nucleotides 1 and 2 stack upon an adjacent helix. The 3'-side of the turn is terminated by nucleotide 5, which stacks upon a second helix. As a consequence of the acute backbone bend, the base planes of nucleotides 3 and 5 adopt a side-by-side arrangement, but they do not form a typical di-nucleotide platform motif. The combination of this side-by-side arrangement and the incoming and outgoing strands forms a shape that resembles the Greek letter π. A prominent signature of the π-turn is nucleotide 4, which is flipped out and tends to engage in numerous interactions with RNA, protein or both (Figure 1A). The only consistent internal interaction that appears to confer structural stability to the π-turn is a hydrogen bond between the 2'-hydroxyl groups of nucleotides 1 and 5. The change in strand direction in the π-turn resembles the kink turn and some examples of the hook turn motif. However, the conformational features of the π-turn are distinct, as determined by PRIMOS, by conventional RMSD superposition (Figure 1B) and even by simple visual inspection (Supplementary Figure 1 online).

    Table 2. Characteristic (η,θ) coordinates for constituent nucleotides in the reported motifs (the motif coordinate string)

    Figure 1. The π-turn motif. (A) A Ribbons (51) example of an isolated π-turn, 0:A408–C412, from H50S. The five structurally similar nucleotides (blue) are flanked by two helical strands (yellow). Numbering is from 5' to 3'. (B) A superposition generated by MOLMOL (52) of the backbones of the seven π-turns found in our data set. (C) Locations of the four H50S (PDB entry: 1JJ2) π-turns (highlighted in red) in secondary structure. Secondary structures are derived from the Comparative RNA Web Site (53). (D) A Ribbons (51) drawing of the two symmetric π-turns, 0:G1873–G1877 (magenta) and 0:C1854–A1858 (yellow), shown in their helical context. The nucleotides displayed comprise helix-66 from H50S. Nucleotides not part of the canonical π-turns are shown in blue.

    It is difficult to predict the presence of a π-turn from inspection of a secondary structure map. While some are located within large internal loops, others are located at apparent junctions between small internal loops and helices (Figure 1C). Unlike other well-known RNA motifs, π-turns display little evidence for consistent sequence (Table 1) or base pairing, a trait made clear when Leontis–Westhof base-pairing diagrams are constructed for the occurrences of the π-turn (data not shown). Only a couple of consistencies are evident: the π-turn is often flanked on its 5' side by a Watson–Crick base pair, and nucleotide 5 usually participates in a base pair, although its character varies. Interestingly, nucleotides 2–4 of the π-turn often do not participate in clear base pairs at all. Despite having a paucity of recurrent base pairing patterns and sequence, the bases of the prototypical π-turns superimpose quite well (see Supplementary Figure 2 online). As such, the π-turn exemplifies the fact that RNA can adopt a coherent and recurrent three-dimensional fold without having recurring primary or secondary structure.

    Seven structurally similar π-turns were discovered in the non-redundant data set. Five of these are from H50S (PDB identifier: 1JJ2) (16), one is from an aptamer that binds vitamin B-12 (PDB identifier: 1DDY) (27), and another is in the structure of domains P4–P6 from the Tetrahymena ribozyme (PDB identifier: 1HR2) (8,28) (Table 1). The most ‘generic’ π-turn was identified in chain 0 at positions G1873–G1877 (hereafter 0:G1873–G1877) in H50S, and the other six examples of the π-turn superimpose on this structure with an average backbone RMSD of 1.07 Å (range: 0.78–1.39 Å) (Figure 1B). As a basis of comparison, the well-known kink–turn motifs superimpose with an average RMSD of 1.7 Å (16).

    The π-turns at 0:G1873–G1877 and 0:C1854–A1858 of H50S (Figure 1D) are particularly interesting because they are located on opposing strands of the same bulge/duplex region (Figure 1C). As such, they are fused together into a single fold that imposes pseudo two-fold symmetry on helix-66 of H50S. They create a sharp bend in the helix axis, resulting in exposure of backbone residues and extrahelical nucleotides, facilitating extensive contacts with protein and nearby RNA strands. Also contributing to the symmetry of this structure is an S2II-turn (0:A1869–U1874, 0:C1856–U1860) (13) that partially overlaps with the two π-turns. The two strands of the S2II-turn almost appear as mirror images of one another and further enhance the striking symmetry of helix-66. These two π-turns are not the only examples that appear to reside in a larger structural motif. The π-turn at 0:A408–C412 of H50S is found near the conserved core of a loop E-like (bulged-G) motif (29). Although the π-turn is found near known motifs, in general the examples of π-turns have highly variable structural contexts.

    The π-turn participates in molecular interactions that have clear biological significance. One of the double π-turns in H50S, 0:G1873–G1877, provides the only contact with Arg-51 and Arg-120 of protein L2 (Figure 2). These amino acids are essential for L2 binding to the 23S rRNA (30), and L2 is necessary for association between the 30S and 50S subunits (31). It is therefore likely that the H50S 0:G1873–G1877 π-turn is a linchpin for holding the ribosomal subunits together.

    Figure 2. The hydrogen bonding network between a π-turn and protein. This π-turn binds to the ribosomal protein L2. G1873–G1877 is the only region of RNA that interacts with the amino acids Arg-120 and Arg-51 (not shown), which are necessary for association between the 50S and 30S subunits. The figure was made using PyMOL (http://www.pymol.org).

    The Ω-turn

    The Ω-turn motif is defined by a backbone trajectory that exhibits two conspicuous changes in strand direction (Figure 3A, Table 2). The first turn (180°) occurs between nucleotides 2 and 3, while the second (90°) occurs between nucleotides 4 and 5. This results in an overall bend of 90° between incoming and outgoing helical strands, creating a conformation that is shaped like an ‘Ω’. There is no apparent signature for an Ω-turn when inspecting secondary structure diagrams (Figure 3B) and sequence (Table 1), although they tend to appear as ‘single-stranded’ parts of loop regions. More common base-pairing features emerge in the ensemble of Ω-turns than in the π-turns. The motif is generally flanked on both its 5' and 3' sides by G–C base pairs. Within the motif, nucleotides 1 and 4 are involved in Watson–Crick base pairs with one or more neighboring strands. Nucleotide 5 also often (but not always) makes a Watson–Crick pair, and nucleotide 3 usually makes a base pair with its Watson–Crick edge, although its exact conformation varies. Finally, nucleotide 2 sometimes makes stabilizing hydrogen bonds, but no clear base-pairing pattern is evident. Despite some consistencies, the Ω-turn secondary structure appears less regular than other well-known motifs. It is also dominated by Watson–Crick base pairs. Nevertheless, the base plane orientations in the examples of the Ω-turn discovered with COMPADRES are remarkably similar (Figure 3C), and the fold of the motif is markedly non-helical (Figure 3A).

    Figure 3. The Ω-turn motif. (A) An example of an Ω-turn motif (0:G1416–C1420 of H50S). The five Ω-turn motif nucleotides (blue) are flanked by two helical strands (grey). Numbering is from 5' to 3'. The figure was made using PyMOL (http://www.pymol.org). (B) Secondary structure context for the three Ω-turns found in H50S. (C) A superposition made with MOLMOL (52) of the three Ω-turns found exclusively using COMPADRES (see Table 1). Backbone and sugar atoms were selected as the targets of the superposition since the sequences of the Ω-turns vary.

    Five Ω-turns (Table 1) were identified in the non-redundant data set and all have the same backbone morphology, as evidenced by their low average RMSD of 0.61 Å with 0:G1416–U1420 of H50S. Three of the Ω-turns were identified in H50S, one was found in the 30S ribosomal subunit of Thermus thermophilus (T30S) (PDB identifier: 1N32) (32), and one was identified in an aptamer that binds streptomycin (PDB identifier: 1NTB) (33).

    Like the π-turn, an example of the Ω-turn is found to be a constituent of a larger well-known motif. Specifically, one of the Ω-turns (0:C245–G249 of H50S) forms part of the strand opposite the ‘kinked’ strand of a kink turn found in helix-15. However, the four other Ω-turns do not share this context and have structural environments that vary considerably.

    The Ω-turn is observed in positions of great biological significance. For example, nucleotides A:U751–G755 of T30S bind to protein S15, and in studies of homologous organisms, these nucleotides are essential for the S15 interaction (34). S15 plays a critical role in ribosome biogenesis, as the rRNA–S15 complex initiates the earliest stages of ribosome assembly (35). Intriguingly, S15 autoregulates its own expression by binding directly to the S15 mRNA, and biochemical studies have indicated that the mRNA shares a similar conformation to that observed for the S15 binding site in 16S rRNA (36), suggesting that an Ω-turn may exist in S15 mRNA, as well.

    The α-loop

    The α-loop is a ‘loop-de-loop’ structure that looks like the letter α and extrudes from the side of an otherwise normal duplex (Figure 4A). The motif contains eight conformationally invariant nucleotides, six of which comprise the roughly circular loop. Strand reversals between nucleotides 2 and 3 and nucleotides 6 and 7 are essential for creating the tight circular shape, which places four negative phosphate groups in close proximity (Figure 4A and B). To stabilize this electrostatically unfavorable conformation, the α-loop binds a magnesium ion (Figure 4B) and positive protein side chains. As noted by Klosterman for a more diverse group of related motifs, the α-loop contains an array of perfectly stacked nucleotides (motif nucleotides 3–6, Figure 4C) that create apparent helical structure within a single-stranded section of RNA (37).

    Figure 4. The α-loop motif. (A) An example of the structure of the α-loop motif in isolation (A:C503–A510 of T30S). The eight structurally invariant nucleotides are shown in blue, and the helix in which it resides is shown in grey. Numbering is from 5' to 3'. (B) The α-loop motif is stabilized by an Mg2+ ion (in red) through both inner and outer sphere interactions. The solvation shell of the ion is shown in blue, and the grey spheres are the pro-S and pro-R oxygens from the α-loop that coordinate with the ion. (C) A Ribbons (51) drawing showing the α-loop (blue) interacting with the terminus of helix-18 (cyan) and the 5' end of 16S rRNA, helix-1 (magenta). Dashed lines show putative interactions between the α-loop and the 5' tail of 16S rRNA.

    COMPADRES identified two occurrences of the α-loop: 0:G1100–A1107 between helix-40 and helix-41 of H50S; and A:C503–A510 within helix-18 of T30S. The backbones of these two examples superimpose with an RMSD of 0.38 Å. The α-loop can be considered a subclass of the ‘extruded helical single-strand motif’ (37), which is a conformationally diverse grouping of substructures. It is notable that the two α-loops described here are more alike structurally than any other pair of extruded helical single-strand motifs. Thus, COMPADRES is capable of distinguishing a distinct conformational grouping in a larger but qualitative structural category.

    Unlike the π-turn and Ω-turn, the two α-loops have base pairings that are remarkably similar. Nucleotides 1–4 are involved in Watson–Crick base pairs. A di-nucleotide platform involves nucleotide 5. Nucleotide 6 makes an A-minor interaction, and nucleotide 7 is part of a base triple.

    The α-loop was identified in two locations of biological significance. The proteins that bind α-loops are S4 and L13. L13 is important during early assembly of the 50S subunit (38). S4 is required for in vitro assembly of the 30S subunit (39,40), and is the first protein to bind 16S rRNA in vivo (41). As such, the α-loop in T30S is of particular interest. It buttresses the pseudoknot containing the nucleotide 0:G530 (Figure 4C). The universally conserved G530 makes critical contacts to the codon–anticodon duplex during translation (42). The same α-loop also interacts with the 5' terminus of 16S rRNA, where it may play a role in the first steps of 30S assembly (Figure 4C) (43).

    The C2'-endo mediated flipped adenosine (C2FA)

    In addition to its unique η–θ coordinate string (Table 2), the six nucleotide C2'-endo mediated flipped adenosine (C2FA) motif is characterized by a single flipped adenosine (nucleotide 4) that interrupts a stretch of otherwise helical nucleotides (Figure 5A). The conformation is supported by a recurring pattern of sugar pucker conformations, which includes C2'-endo puckering at nucleotide 3. This results in a characteristic hydrogen bond between the 2'-OH of nucleotide 3 and the pro-R oxygen of nucleotide 5 (Figure 5A). In both examples identified by COMPADRES, the flipped adenosine engages in A-minor interactions with neighboring helices (Figure 5B).

    Figure 5. The C2'-endo mediated flipped adenosine motif. (A) In blue are nucleotides 0:G446–C451 of H50S, and its complementary helical strand is shown in grey. Dashed lines indicate the hydrogen bond between the 2'-OH of nucleotide 3 and the pro-R oxygen of nucleotide 5. (B) The flipped adenosine (A449, blue) participates in an A-minor interaction with the minor groove of a neighboring helix (grey). Hydrogen bonds are shown in red. The figure was drawn using PyMOL (http://www.pymol.org).

    This case demonstrates that COMPADRES can detect subtle conformational differences among substructures that would be missed by RMSD comparisons alone. For example, the two C2FA motifs that are reported here, 0:G446–C451 and 0:A2019–A2024 of H50S, superimpose on each other with a backbone RMSD of 0.78 Å. However, visual inspection of H50S revealed two other single base bulges that share a backbone morphology and sugar pucker pattern with that of the C2FA motif. COMPADRES did not group these bulges as members of the C2FA family, even though the RMSD between the two families is only 1.2 Å. The excluded examples lack the characteristic hydrogen bond between nucleotides 3 and 5, and their bulged bases are oriented differently from that of the C2FA motif. Hence, an RMSD comparison would have grouped structures that were more precisely distinguished by COMPADRES.

    DISCUSSION

    Biologically important RNA substructures contain many new motifs

    Many, if not all, of the most conformationally elaborate new motifs are located in biologically essential regions of RNA substructure, such as the interfaces between 23S rRNA and L2, 16S rRNA and S15, or the core architecture of a group I intron. These motifs have clearly evolved to maintain complex networks of interactions with both RNA and protein. Given their locations, it is remarkable that these motifs have remained unnoticed. This fact attests to the difficulty of evaluating large RNA structures exclusively through visual inspection. New mathematical and computational approaches are clearly essential for modern analyses of RNA structure and its function.

    Structural invariance without consistent primary or secondary structure

    From the perspective of bioinformatics, there are sobering implications from this study. The substructures described here are indisputably RNA motifs: they are structurally complex, superimposable by both PRIMOS and RMSD comparisons, and there are numerous cases of each example. Many are of critical importance to the biological function of their parent RNA molecules. However, each motif would be difficult to discern from primary sequence or secondary structure alone. This indicates that de novo prediction and modeling of RNA tertiary structure will remain a daunting challenge and it highlights the limitations in our understanding of the forces that drive RNA tertiary structure formation. There is clearly great redundancy in the way that nucleobases can interact with each other, and this extends to their participation in complex RNA substructures. Indeed, recent evidence has suggested that evolution can play a role in selecting three-dimensional structure, even when consistent primary or secondary structure is lacking (44). It may be possible that similar evolutionary pressures dictate the conformations of some structural motifs in RNA.

    In this work and elsewhere (13,15), RNA motifs have been defined mathematically as recurrent elements of consistent three-dimensional architecture. It is important to consider whether this definition is appropriate, given that the substructures identified here do not always share base-pairing and stacking patterns, they are based on single strands, and they may not have the ability to fold in isolation. Indeed, motif studies that consider these latter features have provided valuable insight into RNA interactions and structure (45). We contend that COMPADRES provides a valuable complementary approach because it defines motifs as spatially and mathematically discrete entities that can be rigorously compared to other elements of structure in order to conduct quantitative discourse on similarities and differences between structures, per se, and not their contexts or thermodynamic stability.

    Structural invariance reflected exclusively in backbone conformation

    This study underscores the extreme structural conservation of specific RNA motifs and the fact that they can be grouped by backbone configuration. Families of substructures identified in this report superimpose with average RMSDs of 1.1 Å (π-turn), 0.61 Å (Ω-turn), 0.38 Å (α-loop), and 0.78 Å (C2FA). This is consistent with the structural invariance of GNRA tetraloops and the various S-turn motifs. What is particularly surprising in this case, however, is that unique substructures and their structural consistency were detected with a search technique that contains no explicit information on conformation or positioning of the nucleobase. At its heart, COMPADRES relies only on the information contained in the artificial pseudotorsions η and θ. The only inherent information on base orientation is embedded in θ, which contains the conventional torsion angle δ that varies with the glycosidic torsion angle χ (26). Despite this apparently tenuous relationship between pseudotorsions and base orientation, motif superpositions reveal a striking picture: the base planes of each motif discovered with COMPADRES superimpose remarkably well (see Supplementary Figure 2 online and Figure 3C). The success of COMPADRES shows that RNA can be adequately described using highly reduced representations that diminish the conformational ‘noise’ from the many degrees of freedom inherent in conventional torsion angles.
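
    To make the reduced representation concrete, the sketch below computes the two pseudotorsions for each nucleotide from P and C4' coordinates alone. It assumes the usual definitions, with η as the torsion C4'(i-1)-P(i)-C4'(i)-P(i+1) and θ as the torsion P(i)-C4'(i)-P(i+1)-C4'(i+1). The function names and the input layout are illustrative choices and are not taken from COMPADRES itself.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def pseudotorsions(p_atoms, c4_atoms):
    """Return (eta, theta) for each interior residue of one continuous strand.

    p_atoms[i] and c4_atoms[i] are the (x, y, z) positions of the P and C4'
    atoms of residue i. The first and last residues lack a full set of
    neighbours and are skipped (an assumption of this sketch).
    """
    out = []
    for i in range(1, len(p_atoms) - 1):
        eta = dihedral(c4_atoms[i - 1], p_atoms[i], c4_atoms[i], p_atoms[i + 1])
        theta = dihedral(p_atoms[i], c4_atoms[i], p_atoms[i + 1], c4_atoms[i + 1])
        out.append((eta, theta))
    return out
```

    Angles are returned in the range -180° to 180°; adding 360° to negative values gives a 0° to 360° convention if that is preferred for plotting.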

    A search probe that utilized a full set of standard backbone torsions would be expected to miss many, if not all, of the motifs identified here. This is because most examples within each group contain anomalous standard torsion angles. For example, the 0:G1873–G1877 -turn of H50S is the ‘best’ -turn as determined from standard RMSD. However, four of the other six -turns each have at least one backbone torsion that differs from the corresponding one in 0:G1873–G1877 by more than 90° (data not shown). By this standard, all of the -turns would have been missed. As reported previously, large deviations in standard torsions belie the overall structural consistency in many RNA motifs, and even in A-form structure (23,46). It may be possible to remedy this situation by using a sophisticated scoring function that permits variability within certain sets of standard torsions. Nonetheless, PRIMOS and COMPADRES are inherently simpler and yet they can catalog all known motifs and find new motifs in a quantitatively rigorous manner.
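
    Comparing standard torsions requires some care because angles wrap around at ±180°. The sketch below shows one way to flag the torsions that differ by more than 90° between a reference nucleotide and a candidate. The torsion values are approximate A-form numbers with two made-up deviations, not data from this study, and the function names are hypothetical.

```python
def torsion_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def discordant_torsions(reference, candidate, cutoff=90.0):
    """Names of torsions whose wrapped difference exceeds the cutoff."""
    return [
        name
        for name in reference
        if name in candidate
        and torsion_difference(reference[name], candidate[name]) > cutoff
    ]


# Illustrative values only: roughly A-form torsions for the reference,
# and a candidate in which alpha and gamma have been altered by hand.
reference = {"alpha": -68.0, "beta": 178.0, "gamma": 54.0,
             "delta": 82.0, "epsilon": -153.0, "zeta": -71.0}
candidate = dict(reference, alpha=150.0, gamma=180.0)

flagged = discordant_torsions(reference, candidate)
```

    A probe that required every standard torsion to match within the cutoff would reject this candidate even if its overall backbone path were nearly identical to the reference, which is the failure mode described above.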

    While RMSD comparisons have been useful for demonstrating the degree of similarity among motif examples discovered here, an RMSD approach falls short when used as a de novo search tool for distinguishing conformational differences among a diverse grouping of structures. For example, a backbone RMSD superposition was calculated between the -turn at 0:C451–A455 of H50S and every other continuous stretch of five nucleotides in the data set. Using this approach, the ‘best match’ to 0:C451–A455 was found to be A:C1344–U1348 of T30S. Visual inspection immediately revealed a clear difference between A:C1344–U1348 and the canonical -turn. The backbone trajectory is similar to that of a -turn, but the base plane of G1347 is oriented 180° differently from that of U454. Thus, while a pseudotorsional search was sensitive to base plane orientation, a backbone RMSD search was not.
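
    The exhaustive comparison described above, one probe against every continuous stretch of the same length, can be sketched for the pseudotorsional case as follows. This is not an RMSD search and not the PRIMOS or COMPADRES code: the score is assumed here to be the mean Euclidean distance between (η, θ) pairs, with angle wrap-around handled explicitly, and all names and numbers are illustrative.

```python
import math


def angle_diff(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def window_score(probe, window):
    """Mean distance in (eta, theta) space between two equal-length fragments."""
    total = 0.0
    for (p_eta, p_theta), (w_eta, w_theta) in zip(probe, window):
        total += math.hypot(angle_diff(p_eta, w_eta), angle_diff(p_theta, w_theta))
    return total / len(probe)


def scan(probe, strand):
    """Score the probe against every window of the strand, best match first.

    Returns a list of (score, start_index) tuples sorted by increasing score.
    """
    k = len(probe)
    hits = [
        (window_score(probe, strand[i:i + k]), i)
        for i in range(len(strand) - k + 1)
    ]
    return sorted(hits)


# Made-up (eta, theta) values: a helical background with a two-residue
# deviation at positions 3 and 4, and a probe resembling that deviation.
strand = [(170.0, 210.0)] * 3 + [(60.0, 120.0), (300.0, 45.0)] + [(170.0, 210.0)] * 3
probe = [(62.0, 118.0), (298.0, 47.0)]

best_score, best_start = scan(probe, strand)[0]
```

    Because θ spans the part of the backbone that responds to sugar and base geometry, a score of this kind can separate fragments that a backbone-only RMSD would rank as near-identical.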

    Automated identification of novel RNA structure

    To the best of our knowledge, this report represents the first computationally automated discovery of novel RNA substructures. Previous studies sought RNA motifs using the semi-automated PRIMOS method (15,17), but an a priori search probe was necessary. The task was much like asking a program to find all the apples in an orchard that contained conventional fruit trees. But in this work, the task was more challenging: the program was asked if there was anything new in the garden. The fact that ‘prior knowledge’ was not required for discovery of RNA motifs is highly significant because it indicates that a program can be ‘trained’ to discern unique macromolecular forms.

    Parallels to COMPADRES have been implemented in the protein field. Automated methods have been used to group protein substructures, to determine relationships among them and to discover new motifs (47–49). One interesting difference between the results of the protein studies and those of PRIMOS and COMPADRES is that RNA ‘motifs’ are generally smaller than the substructures defined as protein ‘motifs’ (5–10 residues for RNA rather than 20–200 residues for protein). While this may be a result of the restricted size of the RNA structural database, a consequence of our limited analysis, or simply semantics, it may suggest that the building blocks for RNA tertiary structure tend to be relatively small units (5–10 nt).

    Conclusions and future directions

    This work suggests a more important role for the RNA phosphodiester backbone than previously imagined. Because the new structural motifs were defined exclusively by backbone conformation, and because they have recurring structure without consistency in sequence or interactions, the results imply an elegant interplay between the nucleoside and the backbone, rather than a purely base-driven view of structural form.

    Although COMPADRES has been engineered to identify new motifs, it is inherently capable of finding new examples of known motifs. For example, two new examples of the ‘hook turn’ were found during the course of this investigation (Table 1). While RMSD and PRIMOS comparisons show that previously reported ‘hook turns’ are a conformationally diverse grouping of substructures, the hook turns identified by COMPADRES superimpose with the strikingly low RMSD of 0.25 Å. This illustrates that COMPADRES could be used to redefine groupings of RNA substructure.

    Adaptations of COMPADRES could lead to additional discoveries and insights. For example, a more permissive COMPADRES search would uncover more motifs since this search was highly stringent and is likely to have missed other interesting motifs, particularly small ones. Also possible is the marriage of COMPADRES with additional parameters such as base-pairing or other constraints. Machine learning (50) could be incorporated more fully to understand patterns that would otherwise be missed. The need for sophisticated automated tools such as these will continue to grow along with the database of solved RNA structures. This work demonstrates that relatively unbiased, automated tools for the study of RNA are both powerful and amenable to development.

    SUPPLEMENTARY MATERIAL

    Supplementary Material is available at NAR Online.

    ACKNOWLEDGEMENTS

    The authors would like to thank Alexandre de Lencastre and other members of the Pyle lab for helpful discussions. This work was supported by a grant from NIH (GM50313) to A.M.P., who is an investigator of the Howard Hughes Medical Institute.

    REFERENCES

    1. Cech,T.R. (2002) Ribozymes, the first 20 years. Biochem. Soc. Trans., 30, 1162–1166.

    2. Kim,S.H., Suddath,F.L., Quigley,G.J., McPherson,A., Sussman,J.L., Wang,A.H.J., Seeman,N.C. and Rich,A. (1974) Three-dimensional tertiary structure of yeast phenylalanine transfer RNA. Science, 185, 435–440.

    3. Robertus,J.D., Ladner,J.E., Finch,J.T., Rhodes,D., Brown,R.S., Clark,B.F. and Klug,A. (1974) Structure of yeast phenylalanine tRNA at 3 Å resolution. Nature, 250, 546–551.

    4. Ramakrishnan,V. (2002) Ribosome structure and the mechanism of translation. Cell, 108, 557–572.

    5. Leontis,N.B. and Westhof,E. (2003) Analysis of RNA motifs. Curr. Opin. Struct. Biol., 13, 300–308.

    6. Moore,P.B. (1999) Structural motifs in RNA. Annu. Rev. Biochem., 68, 287–300.

    7. Nissen,P., Ippolito,J.A., Ban,N., Moore,P.B. and Steitz,T.A. (2001) RNA tertiary interactions in the large ribosomal subunit: the A-minor motif. Proc. Natl Acad. Sci. USA, 98, 4899–4903.

    8. Cate,J.H., Gooding,A.R., Podell,E., Zhou,K., Golden,B.L., Kundrot,C.E., Cech,T.R. and Doudna,J.A. (1996) Crystal structure of a Group I ribozyme domain: principles of RNA packing. Science, 273, 1678–1685.

    9. Tamura,M. and Holbrook,S.R. (2002) Sequence and structural conservation in RNA ribose zippers. J. Mol. Biol., 320, 455–474.

    10. Leontis,N.B. and Westhof,E. (2001) Geometric nomenclature and classification of RNA base pairs. RNA, 7, 499–512.

    11. Leontis,N.B., Stombaugh,J. and Westhof,E. (2002) Motif prediction in ribosomal RNAs: lessons and prospects for automated motif prediction in homologous RNA molecules. Biochimie, 84, 961–973.

    12. Heus,H.A. and Pardi,A. (1991) Structural features that give rise to the unusual stability of RNA hairpins containing GNRA loops. Science, 253, 191–194.

    13. Correll,C.C., Beneken,J., Plantinga,M.J., Lubbers,M. and Chan,Y.L. (2003) The common and the distinctive features of the bulged-G motif based on a 1.04 Å resolution RNA structure. Nucleic Acids Res., 31, 6806–6818.

    14. Yang,X., Gerczei,T., Glover,L.T. and Correll,C.C. (2001) Crystal structures of restrictocin-inhibitor complexes with implications for RNA recognition and base flipping. Nature Struct. Biol., 8, 968–973.

    15. Duarte,C.M., Wadley,L.M. and Pyle,A.M. (2003) RNA structure comparison, motif search and discovery using a reduced representation of RNA conformational space. Nucleic Acids Res., 31, 4755–4761.

    16. Klein,D.J., Schmeing,T.M., Moore,P.B. and Steitz,T.A. (2001) The Kink-turn: a new RNA secondary structure motif. EMBO J., 20, 4214–4221.

    17. Szep,S., Wang,J. and Moore,P.B. (2003) The crystal structure of a 26-nucleotide RNA containing a hook-turn. RNA, 9, 44–51.

    18. Hershkovitz,E., Tannenbaum,E., Howerton,S.B., Sheth,A., Tannenbaum,A. and Williams,L.D. (2003) Automated identification of RNA conformational motifs: theory and application to the HM LSU 23S rRNA. Nucleic Acids Res., 31, 6249–6257.

    19. Murray,L.J., Arendall,W.B.,III, Richardson,D.C. and Richardson,J.S. (2003) RNA backbone is rotameric. Proc. Natl Acad. Sci. USA, 100, 13904–13909.

    20. Schneider,B., Moravek,Z. and Berman,H.M. (2004) RNA conformational classes. Nucleic Acids Res., 32, 1666–1677.

    21. Olson,W.K. (1976) The spatial configuration of ordered polynucleotide chains. I. helix formation and base stacking. Biopolymers, 15, 859–878.

    22. Malathi,R. and Yathindra,N. (1982) Secondary and tertiary structural foldings in tRNA. A diagonal plot analysis using the blocked nucleotide scheme. Biochem. J., 205, 457–460.

    23. Duarte,C. and Pyle,A.M. (1998) Stepping through an RNA structure: a novel approach to conformational analysis. J. Mol. Biol., 284, 1465–1478.

    24. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

    25. Duarte,C.M. (2002) Computational approaches to the analysis and prediction of RNA structure. Ph.D. thesis, Columbia University, New York.

    26. Saenger,W. (1984) Principles of Nucleic Acid Structure, 1st edn. Springer Verlag, New York.

    27. Sussman,D., Nix,J.C. and Wilson,C. (2000) The structural basis for molecular recognition by the vitamin B12 RNA aptamer. Nature Struct. Biol., 7, 53–57.

    28. Juneau,K., Podell,E., Harrington,D.J. and Cech,T.R. (2001) Structural basis of the enhanced stability of a mutant ribozyme domain and a detailed view of RNA–solvent interactions. Structure (Camb.), 9, 221–231.

    29. Leontis,N.B. and Westhof,E. (1998) A common motif organizes the structure of multi-helix loops in 16 S and 23 S ribosomal RNAs. J. Mol. Biol., 283, 571–583.

    30. Harada,N., Maemura,K., Yamasaki,N. and Kimura,M. (1998) Identification by site-directed mutagenesis of amino acid residues in ribosomal protein L2 that are essential for binding to 23S ribosomal RNA. Biochim. Biophys. Acta, 1429, 176–186.

    31. Diedrich,G., Spahn,C.M., Stelzl,U., Schafer,M.A., Wooten,T., Bochkariov,D.E., Cooperman,B.S., Traut,R.R. and Nierhaus,K.H. (2000) Ribosomal protein L2 is involved in the association of the ribosomal subunits, tRNA binding to A and P sites and peptidyl transfer. EMBO J., 19, 5241–5250.

    32. Ogle,J.M., Murphy,F.V., Tarry,M.J. and Ramakrishnan,V. (2002) Selection of tRNA by the ribosome requires a transition from an open to a closed form. Cell, 111, 721–732.

    33. Tereshko,V., Skripkin,E. and Patel,D.J. (2003) Encapsulating streptomycin within a small 40-mer RNA. Chem. Biol., 10, 175–187.

    34. Serganov,A., Benard,L., Portier,C., Ennifar,E., Garber,M., Ehresmann,B. and Ehresmann,C. (2001) Role of conserved nucleotides in building the 16 S rRNA binding site for ribosomal protein S15. J. Mol. Biol., 305, 785–803.

    35. Held,W.A., Ballou,B., Mizushima,S. and Nomura,M. (1974) Assembly mapping of 30 S ribosomal proteins from Escherichia coli. J. Biol. Chem., 249, 3103–3111.

    36. Serganov,A., Polonskaia,A., Ehresmann,B., Ehresmann,C. and Patel,D.J. (2003) Ribosomal protein S15 represses its own translation via adaptation of an rRNA-like fold within its mRNA. EMBO J., 22, 1898–1908.

    37. Klosterman,P.S., Hendrix,D.K., Tamura,M., Holbrook,S.R. and Brenner,S.E. (2004) Three-dimensional motifs from the SCOR, structural classification of RNA database: extruded strands, base triples, tetraloops and U-turns. Nucleic Acids Res., 32, 2342–2352.

    38. Herold,M. and Nierhaus,K.H. (1987) Incorporation of six additional proteins to complete the assembly map of the 50 S subunit from Escherichia coli ribosomes. J. Biol. Chem., 262, 8826–8833.

    39. Nomura,M. (1973) Assembly of bacterial ribosomes. Science, 179, 864–873.

    40. Held,W.A. and Nomura,M. (1973) Rate determining step in the reconstitution of Escherichia coli 30S ribosomal subunits. Biochemistry, 12, 3273–3281.

    41. Nowotny,V. and Nierhaus,K.H. (1988) Assembly of the 30S subunit from Escherichia coli ribosomes occurs via two assembly domains which are initiated by S4 and S7. Biochemistry, 27, 7051–7055.

    42. Ogle,J.M., Brodersen,D.E., Clemons,W.M.,Jr., Tarry,M.J., Carter,A.P. and Ramakrishnan,V. (2001) Recognition of cognate transfer RNA by the 30S ribosomal subunit. Science, 292, 897–902.

    43. Culver,G.M. (2003) Assembly of the 30S ribosomal subunit. Biopolymers, 68, 234–249.

    44. Krasilnikov,A.S., Xiao,Y., Pan,T. and Mondragon,A. (2004) Basis for structural diversity in homologous RNAs. Science, 306, 104–107.

    45. Matsumura,S., Ikawa,Y. and Inoue,T. (2003) Biochemical characterization of the kink-turn RNA motif. Nucleic Acids Res., 31, 5544–5551.

    46. Portmann,S., Usman,N. and Egli,M. (1995) The crystal structure of r(CCCCGGGG) in two distinct lattices. Biochemistry, 34, 7569–7575.

    47. Holm,L. and Sander,C. (1994) Searching protein structure databases has come of age. Proteins, 19, 165–173.

    48. Orengo,C.A., Jones,D.T. and Thornton,J.M. (1994) Protein superfamilies and domain superfolds. Nature, 372, 631–634.

    49. Orengo,C.A., Sillitoe,I., Reeves,G. and Pearl,F.M. (2001) Review: what can structural classifications reveal about protein evolution? J. Struct. Biol., 134, 145–165.

    50. Mitchell,T. (1997) Machine Learning. McGraw-Hill, New York.

    51. Carson,M. (1991) Ribbons 2.0. J. Appl. Cryst., 24, 958–961.

    52. Koradi,R., Billeter,M. and Wuthrich,K. (1996) MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graph., 14, 29–32, 51–55.

    53. Cannone,J.J., Subramanian,S., Schnare,M.N., Collett,J.R., D'Souza,L.M., Du,Y., Feng,B., Lin,N., Madabusi,L.V., Muller,K.M. et al. (2002) The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics, 3, 2.

    (Leven M. Wadley and Anna Marie Pyle1,2,*)