Evolutionary Trace Residues in Noroviruses: Import
http://www.100md.com Journal of Virology, 2005, No. 1
    Verna and Marrs McLean Department of Biochemistry and Molecular Biology

    Keck Center for Computational Biology

    Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas

    ABSTRACT

    Noroviruses cause major epidemic gastroenteritis in humans. A large number of strains of these single-stranded RNA viruses have been reported. Due to the absence of infectious clones of noroviruses and the high sequence variability in their capsids, it has not been possible to identify functionally important residues in these capsids. Consequently, norovirus strain diversity is not understood on the basis of capsid functions, and the development of therapeutic compounds has been hampered. To determine functionally important residues in noroviruses, we have analyzed a number of norovirus capsid sequences in the context of the Norwalk virus capsid crystal structure by using the evolutionary trace method. This analysis has identified capsid protein residues that uniquely characterize different norovirus strains and provide new insights into capsid assembly and disassembly pathways and the strain diversity of these viruses. Such residues form specific three-dimensional clusters that may be of functional importance in noroviruses. One of these clusters includes residues known to participate in the proteolytic cleavage of these viruses at high pH. Other clusters are formed in capsid regions known to be important in the binding of antibodies to noroviruses, thereby indicating residues that may be important in the antigenicity of these viruses. The highly variable region of the capsid shows a distinct cluster whose residues may participate in norovirus-receptor interactions.

    INTRODUCTION

    Norwalk-like caliciviruses (noroviruses) cause over 90% of epidemic nonbacterial gastroenteritis in humans. The single-stranded RNA genome of noroviruses is organized into three open reading frames (ORFs). ORF1 encodes a 200-kDa polyprotein that is processed into at least six nonstructural proteins, ORF2 encodes the 60-kDa capsid protein VP1, and ORF3 encodes the basic minor structural protein VP2 (10). Comparisons of the ORF1 and ORF2 nucleotide sequences show a wide genetic diversity in noroviruses (2, 10). A large number (>100) of norovirus strains have been sequenced all over the world. Distinguishing these strains has been a major effort in the epidemiology of noroviruses (8) that has resulted in their classification into two major genogroups, GI and GII, with each genogroup consisting of seven genetic clusters, and three arbitrarily classified minor genogroups, GIII, GIV, and GV (10, 21, 31, 43). Such classification has relied on comparisons of the genomic and capsid sequences of noroviruses. A more complete understanding of the sequence relatedness among the different norovirus strains will be possible if functionally important regions in these sequences are identified.

    Little information is available about the molecular details of norovirus functions. Nucleoside triphosphatase and protease activities have been identified in two of the ORF1 proteins (5, 33, 39), while other enzymatic functions of the ORF1 proteins have mostly been deduced from their sequence similarities with the nonstructural proteins in picornaviruses. The experimental elucidation of VP1 functions and serotyping of noroviruses have not been possible because of the lack of propagation systems for these viruses. Cross-challenge studies with human volunteers and infected patients, along with immunoelectron microscopy and solid-phase immunoelectron microscopy, were previously the only means of identifying antigenic relationships between some of the native norovirus strains (20, 24). Despite these difficulties, empty recombinant capsids of noroviruses that show morphological and antigenic similarity with the native virions (18) have been used as surrogates of the virions for structural and functional studies, including enzyme immunoassay-based antigenic studies on many different norovirus strains.

    The capsids of noroviruses and all other caliciviruses are composed of a single predominant protein, VP1. Consequently, all major requirements for the assembly, receptor binding, host specificity, and antigenicity of noroviruses reside in VP1. Limited data on receptor binding and the antigenicity of noroviruses and only one norovirus crystal structure, that of the icosahedral T=3 recombinant Norwalk virus (rNV) capsid, are available. The rNV structure (Protein Data Bank [PDB] code 1IHM) shows that the VP1 subunit has a shell (S) and a protruding (P) domain consisting of a middle P1 and a distal P2 subdomain that is an insertion in the virus sequence (34). The S domain is primarily important in forming the icosahedral capsid shell (4). The P1 subdomain has been implicated in the antigenicity of noroviruses (11, 13), while the P2 subdomain has been suggested to bind to cellular receptors of these viruses (16). Available cryo-electron microscopy structural studies on various caliciviruses indicate that these viruses share a similar modular domain organization (7, 35).

    The norovirus capsid protein sequences show high variability. The S domain sequences are 30% identical, while the sequence identity is only about 11 and 8% in the P1 and the P2 subdomains, respectively (this study). Such variability among norovirus sequences makes it difficult to use conventional sequence comparison methods to detect conservation patterns that may indicate possible functional sites on norovirus capsids. The evolutionary trace (ET) method (25) that exploits phylogenetic tree-based sequence comparisons along with crystal structure information has been successfully applied to detect functional sites in a wide variety of proteins (26, 28, 40). We have applied the ET method for the first time to a viral system to detect capsid protein residues that uniquely characterize different norovirus strains and that may be important in the assembly and function of these human pathogens.

    MATERIALS AND METHODS

    The complete amino acid sequences of the capsid protein of 56 different noroviruses (Fig. 1, sequences S1 to S56) including all prototype and antigenic strains (10), among others, were aligned by using ClustalW1.8 (42) with default parameters on the European Bioinformatics Institute server. The aligned sequences and the rNV coordinates were submitted to the Cambridge University ET server (17). By using a Phylip distance matrix computed from all the sequences, a phylogenetic tree was constructed (Fig. 1). The sequences on different branches of the tree were grouped into different evolutionary classes according to their degree of similarity. To generate these classes, an evolutionary time cutoff line first split the phylogenetic tree into 10 evenly distributed partitions, P01 to P10, in order of increasing divergence (Fig. 1). In a given partition, the sequences that originated from a common node on the phylogenetic tree and shared the evolutionary time cutoff line that created the partition formed a class. This ensured that the most similar sequences belonged to the same class while the more distant ones belonged to different classes. Ten partitions sufficed to avoid random inclusion of sequences in any class because no distinct class-specific surfaces were created after partition P10. In a given partition, sequences within different classes were separately aligned, and the resultant aligned classes were compared to obtain their consensus residues, called the trace residues, for that partition. Three types of trace residues were identified: those that remained invariant across all classes were designated absolutely conserved residues (ACRs), those that remained strictly conserved within a class but differed between various classes were designated class-specific residues, while trace residues that showed no conservation within any class were designated neutral (17, 25). The ACRs and the class-specific residues were mapped onto the rNV structure. 
Class-specific residues forming structural clusters in the vicinity of the ACRs were analyzed for possible biological significance by using the rNV crystal structure and available biochemical information due to the propensity of such residues to form functional sites in a large number of proteins (27, 44). Because trace residues could not be defined for single sequence branches, such branches were not considered as independent classes. The extent of exposure of the class-specific residues was computed, using cutoff values of 0.3 of the occluded surface packing indices of the atoms (32). Such calculations were done individually for each of the icosahedral interfaces to account for the different degree of exposure of a given residue located at such interfaces. In order to compare the ET-based classification of noroviruses with their known classification based on conventional phylogenetic analyses (10), 10 additional sequences belonging to genogroups GI through GV were included in the alignment and in the tree (Fig. 1). However, these additional sequences were not included in most of the subsequent ET analyses. A few sequences were included twice to ensure positive controls in the alignment and in the tree generation process.
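    The partitioning step described above can be illustrated with a minimal sketch. It is not the ET server's implementation: the `Node` type, the `classes_at_cutoff` function, and the toy tree with its names and depths are all hypothetical, and the sketch assumes that each node carries a single "depth" (distance from the root) against which the evolutionary time cutoff line is compared. Sequences below a branch that crosses the cutoff line form one class, and single-sequence branches are discarded because they cannot define trace residues.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tree node; `depth` is its distance from the root in arbitrary units."""
    name: str
    depth: float
    children: list = field(default_factory=list)


def leaves(node):
    """Collect the names of all sequences (leaves) below a node."""
    if not node.children:
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaves(child))
    return names


def classes_at_cutoff(root, cutoff):
    """Group sequences by the branches that cross the cutoff line."""
    groups = []

    def walk(node):
        for child in node.children:
            if child.depth > cutoff:
                # This branch crosses the line: everything below it is one group.
                groups.append(leaves(child))
            else:
                walk(child)

    walk(root)
    # Single-sequence branches cannot define trace residues, so drop them.
    return [group for group in groups if len(group) >= 2]


# Toy tree (hypothetical names and depths, not the norovirus tree of Fig. 1).
toy_tree = Node("root", 0.0, [
    Node("n1", 0.2, [
        Node("n3", 0.5, [Node("seqA", 1.0), Node("seqB", 1.0)]),
        Node("seqC", 1.0),
    ]),
    Node("n2", 0.25, [Node("seqD", 1.0), Node("seqE", 1.0)]),
    Node("seqF", 1.0),
])

for cutoff in (0.1, 0.3):
    print(cutoff, classes_at_cutoff(toy_tree, cutoff))
```

    Moving the cutoff toward the leaves splits the classes into smaller groups of more similar sequences, which is the behavior the ten partitions P01 to P10 are meant to capture.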

    RESULTS AND DISCUSSION

    ET partition-dependent classes and genogroups. The different partitions P01 to P10 divide the phylogenetic tree into classes that vary with the partitions (Fig. 1). Individual partitions contain different numbers of classes (Table 1), where each class consists of a cluster of similar sequences originating from a given node within that partition. Nodes 1 and 2 create two branches in partition P01 (Fig. 1). One of these branches contains the murine norovirus sequence, while the other branch consists of the remaining norovirus sequences, indicating that the murine sequence may indeed constitute an independent and distinct norovirus genogroup (21). In partition P02, nodes 3 and 4 that diverge from node 1 distinguish the GI sequences A1 to A20, referred to as class AG1, from the GII sequences A21 to A69, referred to as class AG2 (Fig. 1 and Table 1). Class AG1 includes the bovine sequences while class AG2 includes the swine and the Alphatron sequences (Table 1), indicating thereby that the bovine sequences are similar to GI sequences, while the swine and the Alphatron-like sequences are similar to GII sequences in partition P02. In partition P03, however, the bovine sequences branch off independently into class AG3 at node 6, while the remaining GI and GII sequences of partition P02 remain in clusters AG1a (sequences A1 to A17, corresponding to node 5) and AG2 (sequences A21 to A69 including the swine and the Alphatron, corresponding to node 4), respectively (Table 1 and Fig. 1). Thus, the bovine sequences are quite similar to the GI sequences only up to partition P02. In contrast, the swine and the Alphatron sequences bear a relatively stronger resemblance to the GII sequences because of their grouping together even in partition P03. It is only in the two nodes 7 and 8 of partitions P04 and P05 that the Alphatron-like sequences become distinct from the other genogroup II sequences (Table 1, class AG2b; Fig. 1). 
The AG2b class containing the Alphatron sequences remains unchanged beyond partition P05. Such partition-dependent similarities indicate that the bovine sequences may cluster in genogroup GI while the Alphatron sequences may cluster in genogroup GII, in contrast to the independent genogroup GIII and GIV status proposed previously for these sequences, respectively (31, 43).

    Partition P06 creates further GI classes and distinguishes the swine sequences from other GII sequence classes. This partition creates two GI clusters, AG1b (A1 to A12) and AG1c (A13 to A18) from the AG1a cluster, and four GII clusters, AG2c (A21 to A35), AG2d (A36 to A40), AG2e (A41 to A65) and AG2f (A66 and A67), from the AG2a cluster (Table 1 and Fig. 1). Sequences in cluster AG1b are similar to the GI reference sequences NV (A1), Chiba (A4), Musgrove (A8), Southampton (SOV; A9), and the bs5_DE (A11), while sequences in cluster AG1c are similar to the GI reference sequences Desert Shield (DSV; A17) and Winchester (A18). The GII sequences A21 to A35 of cluster AG2c are similar to the Bristol virus (A32) reference sequence, while sequences A36 to A40 of cluster AG2d are similar to the Amsterdam (A38) and the Leeds (A39) reference sequences. The GII sequences A41 to A65 of cluster AG2e are similar to the Melksham (A42), Hillingdon (A44), Hawaii (HV; A54), Seacroft (A55), and the Toronto (A59) reference virus strains, while the swine sequences (A66 and A67) branch off into an independent GII class, AG2f in this partition (Table 1 and Fig. 1).

    Further distinctions create new classes of similar sequences in partition P07 and beyond. Five new GI classes, AG1d to AG1h, and two new GII classes, AG2h and AG2i, are created in partition P07 (Table 1) where each new class contains at least one reference strain. The NV (A1) sequence shows similarity with the Japanese Aichi (A2) and KY89 (A3) strains, and these sequences branch off together into the AG1d class. The AG1e class consists of the Chiba (A4), Koblenz (A5), Valetta (A6), Thistlehall (A7), and the Musgrove (A8) strains. The AG1f class consists of the SOV (A9) and the Whiterose (A10) strains; the AG1g class consists of the bs5 (A11) and the Sindlesham (A12) strains, while the AG1h class consists of the Norway (A13), Potsdam (A14), VA115 (A15), Birmingham (A16), and the DSV (A17) strains (Table 1). The AG2h class (A41 to A54) includes the Melksham (A42), Hillingdon (A44), and the HV (A54) reference strains, while the AG2i class (A56 to A65) includes the Toronto (A59) reference strain (Table 1). The swine strains remain in the AG2f class in all the remaining partitions (Table 1). One new GI class (AG1j) and five new GII classes (AG2j, AG2k, AG2m, AG2n, and AG2o) are created in partition P08 from the classes AG1e, AG2d, and AG2h of partition P07. The Idaho (A36) and VA207 (A37) strains are similar enough to occur in class AG2j, and Leeds (A39) and Gwynedd (A40) are also similar enough to occur in class AG2k of this partition. The AG2m class groups the similar Chesterfield (A41), Melksham (A42), and Snow-Mountain (SMV; A43) strains, while the Hillingdon (A44), MOH (A45), and Whiterose (A46) strains occur in the AG2n class. The Chitta (A48), Schwerin (A49), Wortley (A50), Pirna (A51), Dillingen (A52), Wiesbaden (A53), and HV (A54) strains are grouped into the AG2o class (Table 1 and Fig. 1). Partition P09 splits the AG2o class of partition P08 into two distinct classes, AG2q (A48 to A51) and class AG2r containing the A52 to A54 sequences. 
The other P08 classes remain unchanged in this partition. Similarly, partition P10 creates a new GII class, AG2s (Table 1 and Fig. 1). Thus, classes originate from distinct nodes in each partition. Such classes belonging to partitions P01 and P02 distinguish between the two major norovirus genogroups GI and GII, while classes in the remaining partitions P03 to P10 increasingly resolve the various sequence clusters within these genogroups (Table 1 and Fig. 1).

    The ET classes (Table 1 and Fig. 1) correctly identify all known genetic clusters obtained from conventional phylogenetic analysis (10). Such clusters lie in different partitions because, unlike conventional phylogenetic analyses that determine the clusters by comparing all the sequences together by using arbitrary cutoff values of a sequence similarity index (2, 10), the ET classes are determined separately for each partition by comparing only those sequences that belong to the nodes originating in that partition. This results in well-defined, nonoverlapping ET classes unlike the conventional histogram-based cluster analyses that may show overlapping regions in the histogram (2). Besides, many new classes containing sequence similarities that may not be obvious in conventional phylogenetic analyses are created in the various ET partitions. Such similarities may be further understood from the conservation patterns of 56 of the GI and GII sequences (Fig. 1, S1 to S56) that form well-defined ET classes (Tables 1 and 2; Fig. 1). These results remain unchanged even when all the 70 sequences A1 to A70 (Fig. 1) are included in the analysis.

    Partitioning of phylogenetic tree reveals hidden conservation patterns. The ET analysis reveals characteristic conservation patterns in the regions that appear to be variable in conventional sequence comparisons. This is a consequence of partitioning the phylogenetic tree (Fig. 1) and comparing the resultant sequence classes, in contrast to conventional sequence comparisons where all given sequences are compared together. In partition P01, all 56 sequences cluster into one class (Table 2). In partition P02, these sequences separate into the two classes GI and GII that correspond to the two major norovirus genogroups (Fig. 1). Class GI consists of sequences S1 to S10, while class GII consists of the sequences S12 to S56 (Table 2). No further classes are created in partitions P03 and P04. However, all subsequent partitions P05 to P10 create additional classes, each of which contains a number of sequence clusters (Table 2). Comparisons of classes in partitions P01 to P10 show interesting differences in their variable regions (Fig. 2A). Large variable regions are seen in the single class of partition P01, similar to results of a conventional comparison of all the 56 sequences taken together. However, when the sequences contained in classes GI and GII of partition P02 are separately aligned and the aligned classes are compared, class-specific and neutral trace residues emerge in the variable regions of partition P01. For example, residue 44 is variable if all 56 sequences are aligned together in the P01 partition, but it becomes a class-specific residue (X) in a comparison of the two aligned classes GI and GII in the P02 partition (Fig. 2A). This residue is a conserved Ala in class GI containing all the genogroup I norovirus sequences, while it is a conserved Pro in all the genogroup II norovirus sequences of class GII (Table 2). 
In contrast, residue 39S is neutral because it remains variable in both partitions P01 and P02, while residue 42A is an ACR because it is conserved in both of these partitions (Fig. 2A). Similar comparisons of classes in the subsequent P03 to P10 partitions reveal several class-specific residues in regions that appear to be highly variable in the preceding partitions (Fig. 2A). Thus, systematic comparisons of the different classes reveal class-specific conservation patterns that would otherwise be hidden if all the sequences were compared together.
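    The three kinds of trace residues can be illustrated with a small sketch. It is only an illustration under stated assumptions: `classify_columns` is a hypothetical helper, the three-residue fragments are invented toy data rather than norovirus sequences, and any column that is not strictly conserved within every class is labeled neutral, which simplifies the definition used by the ET method. The toy columns are chosen to mimic the behavior described for residues 39, 42, and 44.

```python
def classify_columns(classes):
    """Label each alignment column as 'ACR', 'class-specific', or 'neutral'.

    `classes` maps a class name to a list of aligned sequences of equal length.
    """
    lengths = {len(seq) for seqs in classes.values() for seq in seqs}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n_cols = lengths.pop()

    labels = []
    for col in range(n_cols):
        # Set of residues observed at this column, one set per class.
        per_class = [{seq[col] for seq in seqs} for seqs in classes.values()]
        if all(len(residues) == 1 for residues in per_class):
            across = set().union(*per_class)
            # Same residue in every class: absolutely conserved (ACR);
            # conserved within each class but different between classes:
            # class-specific.
            labels.append("ACR" if len(across) == 1 else "class-specific")
        else:
            # Simplifying assumption: anything else is treated as neutral.
            labels.append("neutral")
    return labels


# Hypothetical three-column fragments standing in for positions 39, 42, and 44.
gi = ["SAA", "TAA"]
gii = ["GAP", "NAP"]

print(classify_columns({"all": gi + gii}))        # one class, as in P01
print(classify_columns({"GI": gi, "GII": gii}))   # two classes, as in P02
```

    With all sequences pooled into one class, the third column looks variable; once the two classes are compared separately, it is conserved within each class and differs between them, so it is labeled class-specific.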

    Class-specific trace residues distinguish norovirus strains and may explain their antigenic diversity. The class-specific trace residues in different partitions uniquely distinguish different noroviruses including the antigenic strains. These include the NV (S8), SMV (S36), and the HV (S27) strains that are known to be antigenically distinct from each other (10) and SOV (S6), DSV (S3), Lordsdale virus (LSV; S45), and Mexico virus (MXV; S12) that have either been shown or suggested to be distinct antigenic strains (9). Partition P01 contains all these antigenic strains together (Table 2; Fig. 1). In partitions P02, P03, and P04, the class GI (genogroup I sequences S1 through S10), which includes the antigenic strains DSV, SOV, and NV, contains the following class-specific residues (NV numbering): 44A, 70L, 104V, 106N, 163E, 201V, 203A, and 204G in the S domain; residue 226Q in the hinge region; and residues 329H, 375W, 377S, 397S, 414F, 434P, 460H, 463D, 471G, 500V, 514K, and 519A in the P domain (Table 2). In contrast, the corresponding residues in class GII (genogroup II sequences S12 through S56), which includes the antigenic strains MXV, HV, SMV, and LSV, are 44P, 70V, 104A, 106G, 163P, 201T, 203S, 204C, 226S, 329G, 375P, 377G, 397G, 414P, 434D, 460R, 463N, 471F, 500Y, 514A, and 519G (Table 2). Moreover, class-specific insertions at residues 485P, 486N, and 527G occur in class GI only (Table 2).

    The GI class-specific residues of partitions P02 to P04, along with their structural neighbors as seen in the NV crystal structure, form 15 surfaces that may be important in distinguishing the different GI and GII strains. A majority (87%) of such surfaces contains at least one exposed residue, and six of the surfaces are highly exposed with each surface containing more than three exposed residues (Table 3). Most of these exposed surfaces are located in the S and the P1 domains. The S domain exposed surfaces 1, 4, and 5 consist of residues interacting across the quasi-threefold and the icosahedral threefold and the fivefold axes. The remaining exposed surfaces (6, 11, and the common C-terminal end occurring in surfaces 13 to 15) consist of residues lying in the hinge region and the P1 domain (Table 3). Both of the P2 domain surfaces 7 and 8 are quite buried, although one of these surfaces (8) is relatively more exposed than the other. Interestingly, the P2 domain surface 7 is the only class-specific surface that includes the dimeric axis (Table 3).

    The S and the P domain class-specific surfaces of the GII antigens are not known because of the lack of a GII crystal structure. However, it is reasonable to assume that the GII class-specific residues that align with the GI surface-forming residues in a multiple sequence alignment interact to form surfaces in GII antigens as well. Characteristic differences among the class-specific capsid protein residues result in differences in the surfaces formed by these residues (Tables 2 and 3). Such differences uniquely distinguish all genogroup I noroviruses from those of genogroup II including the antigenic strains.

    Similar comparisons of class-specific residues in the subsequent partitions P05 to P10 further distinguish the different norovirus antigenic strains belonging to a given genogroup and identify the class-specific trace residues that lead to such distinctions. The partition P05 contains the three classes GIa, GIIa, and GIIb (Table 2). The class GIa consists of the genogroup I sequences S1 to S10, while the remaining two classes GIIa and GIIb consist of the genogroup II sequences S12 to S42 and S43 to S56, respectively. The distribution of the antigenic strains in class GIa remains unchanged with respect to class GI of partition P02; the class GIIa retains the antigenic strains MXV, HV, and SMV from class GII of partition P02, while the antigenic strain LSV branches out independently into the genogroup II class GIIb (Table 2). The antigenic strains are further resolved in partition P06 where the strains DSV, SOV, and NV occur in genogroup I classes GIb, GIc, and GId, respectively. Among the genogroup II antigenic strains, MXV and LSV separate out into the distinct classes GIIc and GIIf, respectively, while HV and SMV occur in class GIId of this partition (Table 2). These two antigenic strains separate only in partition P07, where they occur in classes GIIg and GIIh, respectively (Table 2). Although all currently known antigenic strains are distinguished in partition P07, it is possible that additional antigenic strains occur in these partitions as well as in partitions P08 to P10. What about the class-specific residues that distinguish the strains belonging to a given genogroup? In partition P05, residues in locations 83, 181, 227, 306, and 492 uniquely distinguish the sequences in class GIIa of genogroup II from the sequences in class GIIb of the same genogroup. These residues are 83L, 181R, 227K, 306G, and 492P in class GIIa (and class GIa of genogroup I) but 83A, 181K, 227R, 306W, and 492D in class GIIb (Table 2). 
Similar variations in the class-specific residues distinguish more sequence clusters and the antigenic strains belonging to the classes GIb, GIc, GId, GIe, GIIc, GIId, GIIe, GIIf, GIIg, GIIh, and GIIi of partitions P06 and P07 (Table 2). Variations in the class-specific residues for subsequent partitions P08 to P10 further distinguish between the sequences in these partitions (see supplemental material found at http://ncmi.bcm.tmc.edu/chin).

    Detection of such subtle sequence variations by the ET method is a significant improvement over conventional cluster analysis methods in which sequence clusters are obtained by comparing the separation distance of all the sequences taken together. In conventional cluster analysis, distance histograms are plotted, using a distance filter in the form of an arbitrary cutoff value of the separation distance, and the highest peaks in these plots indicate the sequence clusters (2, 10). The fundamental drawback in this approach is that by comparing the separation distances of all the sequences together, no use is made of the sequence similarity information that is already embedded in the phylogenetic tree through its nodes and the connectivities among them. Consequently, the single-distance filter cannot detect the tree-structured similarities at various nodes of the tree. Instead, the filter detects only the large peaks in the histogram that indicate only the gross similarity patterns between the sequences. Smaller peaks, corresponding to features of closely related sequence clusters, are often not visible in such histograms. In other words, cluster analysis that uses a single scalar value of the distance filter can discriminate only large differences between sequences such as that between distinct genogroups but cannot trace the subtle variations that exist between closely related norovirus antigenic strains. In contrast, because the ET method retains the connectivity information present within the tree while comparing the partition-based classes, it can detect very small differences between the class-specific residues of the antigenic sequences. Such fine variations in the class-specific residues in various partitions uniquely characterize the different norovirus antigenic and other strains, thereby explaining the diversity of existing and emerging norovirus strains.

    Trace residues may uniquely characterize norovirus strains. Given a new norovirus strain whose capsid protein sequence may or may not be known completely, it is possible to identify the genogroup and the sequence cluster of the strain uniquely by systematically examining the class-specific trace residue locations. For example, if a new strain contains the class-specific residues of class GI, it would belong to genogroup I, while the strain would belong to genogroup II if it contained the class-specific residues of class GII (Table 2). Furthermore, if the strain belonged to genogroup II and it contained the class-specific residues 83L, 181R, 227K, 306G, and 492P, the strain would be similar to the genogroup II sequences S12 to S42 of class GIIa in partition P05 (Table 2). Otherwise, if the same genogroup II strain contained residues 83A, 181K, 227R, 306W, and 492D, it would be similar to the genogroup II sequences S43 to S56 of class GIIb in partition P05 (Table 2). Further categorization of the norovirus strain into a suitable cluster would be possible by comparing its residues at the class-specific locations corresponding to the subsequent P06 to P10 partitions of the phylogenetic tree. Such a procedure has been applied to the partial sequence of a norovirus isolate called Japanese oyster (29). On aligning this sequence with other norovirus sequences, the residues at locations 44 and 70 (NV numbering) are P and V, respectively (Fig. 2B). This indicates that the sequence belongs to class GII of partitions P02, P03, and P04 (Table 2). Furthermore, because position 83 is an L (Fig. 2B), this sequence belongs to class GIIa of partition P05 (Table 2). Further comparisons in partitions P06 and P07 show that the residues at locations 43, 46, 65, and 82 are A, T, G, and N, respectively, which places the Japanese oyster sequence in class GIIc along with the similar sequences S12 to S23 (Table 2), consistent with published results (29). 
Similar comparisons can be continued for subsequent partitions to place the oyster sequence in smaller classes if required.
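    The stepwise assignment described above lends itself to a simple lookup, sketched here under stated assumptions. The residue signatures are the ones quoted in the text for partitions P02 and P05 (only the first four S domain positions are used for P02), while the function name `assign_class`, the dictionary layout, and the rule that a class matches only if every observed signature position agrees are illustrative choices, not the authors' procedure.

```python
# Class-specific residues quoted in the text (NV numbering).
P02_SIGNATURES = {
    "GI": {44: "A", 70: "L", 104: "V", 106: "N"},
    "GII": {44: "P", 70: "V", 104: "A", 106: "G"},
}
# Partition P05 signatures that separate the two genogroup II classes.
P05_GII_SIGNATURES = {
    "GIIa": {83: "L", 181: "R", 227: "K", 306: "G", 492: "P"},
    "GIIb": {83: "A", 181: "K", 227: "R", 306: "W", 492: "D"},
}


def assign_class(observed, signatures):
    """Return the single class whose signature agrees with the observed residues.

    `observed` maps position -> residue for whatever part of the sequence is
    known. Only positions present in both the sequence and a signature are
    compared. Returns None if no class, or more than one class, matches.
    """
    matches = []
    for name, signature in signatures.items():
        shared = [pos for pos in signature if pos in observed]
        if shared and all(observed[pos] == signature[pos] for pos in shared):
            matches.append(name)
    return matches[0] if len(matches) == 1 else None


# Residues reported in the text for the partial Japanese oyster sequence.
oyster = {44: "P", 70: "V", 83: "L"}

genogroup = assign_class(oyster, P02_SIGNATURES)
print(genogroup)
if genogroup == "GII":
    print(assign_class(oyster, P05_GII_SIGNATURES))
```

    Because only the positions actually sequenced are compared, the same lookup works for partial sequences; the fewer the positions available, the coarser the class that can be assigned.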

    What about the murine (A70), bovine (A19 and A20), and the Alphatron-like (A68 and A69) sequences? All these sequences share some ACRs with GI and GII norovirus sequences when all the sequences are compared together in partition P01 (see supplemental material at http://ncmi.bcm.tmc.edu/chin). However, no class-specific residues characteristic of murine sequences can be defined because only one such sequence (A70) is known. More murine sequences are needed to characterize them uniquely into appropriate ET classes on the basis of their class-specific residues. In contrast, the bovine sequences (Table 2, sequences A19 and A20) share nearly 50% ACRs and class-specific residues with other GI noroviruses in partitions P01 and P02 (see supplemental material at http://ncmi.bcm.tmc.edu/chin), after which the bovine sequences become independent, indicating that these sequences may belong to a GI class instead of an independent genogroup as has been proposed (31). Similarly, the Alphatron-like sequences (Table 1, sequences A68 and A69) share class-specific residues with the GII sequences up to partition P03 (see supplemental material at http://ncmi.bcm.tmc.edu/chin), beyond which they branch off independently. This indicates that the Alphatron-like sequences are similar to GII sequences instead of being an independent genogroup (43). However, more such bovine and Alphatron-like sequences along with their structures are needed to evaluate whether such class-specific correlations with other GI and GII sequences are significant or random.

    Clearly, the ET nodes and classes may formally define norovirus genogroups. By definition, any class must contain at least two sequences in order to define their class-specific residues. Let a node be designated "parent" if it has further "child" nodes diverging from it. Thus, 1 and 2 are child nodes of the parent (root) node 0, while 5 and 6 are child nodes of parent node 3 (Fig. 1). Distinct genogroups may be defined as those classes that originate in nodes having only the root as the parent node. Thus, ignoring the single murine sequence at node 2, as this sequence does not form a class, genogroups I and II are clearly the classes that belong to nodes 3 and 4 because these are the only nodes that have the root as their parent node (Fig. 1).

    Such ET-based definitions and assignments of genogroups and classes have advantages. The main advantage of such assignments of genogroup and class is that knowledge of trace residues alone is sufficient to uniquely characterize complete or partial norovirus sequences. In addition, unlike conventional methods, such assignments may easily be automated to make the method cost-effective. As the trace residue locations are distributed throughout the genome, partial sequences from a large number of genomic regions may be used to carry out such analysis. However, the number of trace residues that should be known depends on the desired resolution with which the class of the isolate needs to be determined in the phylogenetic tree. The fewer the trace locations sequenced, the broader the classification will be, because fewer class-specific differences are considered (Table 2). In order to understand whether such differences have functional implications, the possible roles of the different types of norovirus class-specific trace residues need to be examined.

    S domain: significance of the ACRs and class-specific residues in norovirus assembly. The rNV structure shows that a majority (65%) of the ACRs are hydrophobic, buried, and located in the S domain. Most of these residues (58%) interact with neighboring subunits across the icosahedral interfaces (Fig. 3A and B). Therefore, hydrophobic interactions between the S-domain ACRs are important in maintaining the icosahedral structure in noroviruses. The numbers of interactions (NI) between the ACRs across the icosahedral interfaces of rNV follow the descending order $(\mathrm{NI})_{A\text{-}B5} > (\mathrm{NI})_{B5\text{-}C} > (\mathrm{NI})_{A\text{-}A5} > (\mathrm{NI})_{C\text{-}B2} > (\mathrm{NI})_{ABC}$, where the subscripts indicate the corresponding interfaces shown in Fig. 3A. The dimeric (A-B5) interface of the rNV structure contains the largest number of ACRs that participate in intersubunit interactions, while the quasi-trimeric (ABC) interface contains the smallest number of such interacting ACRs. The pentameric (A-A5) interface contains fewer interactions than the dimeric (A-B5) and the hexameric (B5-C) interfaces but more than the hexameric (C-B2) and the quasi-trimeric (ABC) interfaces. Such variations in NI across the different icosahedral interfaces match the variations in the calculated values of the buried surface areas (BA) of the rNV interfaces, $(\mathrm{BA})_{A\text{-}B5} > (\mathrm{BA})_{B5\text{-}C} > (\mathrm{BA})_{A\text{-}A5} > (\mathrm{BA})_{C\text{-}B2} > (\mathrm{BA})_{ABC}$, which is in agreement with similar calculations shown on the VIPER website (36).

    The interfaces and their BAs are important determinants of the energetics of the assembly of icosahedral viruses (6, 14, 19, 23, 37, 38). In modeling the assembly pathway of icosahedral capsids based on the energetics of the assembly process of the different subunits, the assumption is usually made that the association energies between the subunits during assembly are directly correlated with their BAs computed from X-ray coordinates of the assembled capsid (37), provided these BAs are not significantly altered due to interactions of the virus subassemblies with their environment. Following this assumption, the relative values of the NIs between the ACRs at these rNV interfaces (or the BAs of the corresponding interfaces) suggest a model for the assembly and disassembly of norovirus capsids. The inequality of these NIs indicates that the T=3 norovirus capsid should follow different pathways during its assembly and disassembly. The most stable capsid interface is likely to assemble first, followed by the assembly of the less stable interfaces. Because the stability of the capsid interfaces is related to their BAs (19, 37), it follows that the capsid interfaces with the largest BA should assemble the earliest, while those with progressively lower BA values should assemble later. Therefore, during assembly, monomers should associate first to form the AB5 and CC2 dimers (Fig. 3A). The dimers would associate among themselves to form A-B5-C-C2 (dimer of dimers) intermediates that form the pentamer around the fivefold axis. Association of the pentamers creates the quasi-threefold environment in a natural way. Disassembly of the virus is likely to follow the reverse pathway if no significant changes in the BAs of the subassemblies occur due to virus-environment interactions. Thus, the S domain ACRs, by virtue of their locations, appear to be important in the assembly and disassembly processes in noroviruses.
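
    The ordering argument above can be written down as a short sketch. Because this section reports only the relative order of the buried areas and not their values, the numbers below are rank scores that encode that order and nothing more; the function names are hypothetical. The sketch ranks interfaces only and does not model the dimer, dimer-of-dimers, and pentamer intermediates described in the text.

```python
# Rank scores (5 = largest) that encode only the ordering stated in the text,
# BA(A-B5) > BA(B5-C) > BA(A-A5) > BA(C-B2) > BA(ABC). They are NOT measured areas.
BA_RANK = {"A-B5": 5, "B5-C": 4, "A-A5": 3, "C-B2": 2, "ABC": 1}


def assembly_order(buried_area):
    """Interfaces sorted from largest to smallest buried area (formed first to last)."""
    return sorted(buried_area, key=buried_area.get, reverse=True)


def disassembly_order(buried_area):
    """Reverse of the assembly order, assuming the buried areas are unchanged."""
    return assembly_order(buried_area)[::-1]


if __name__ == "__main__":
    print("assembly:   ", assembly_order(BA_RANK))
    print("disassembly:", disassembly_order(BA_RANK))
```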

    What about the S domain class-specific residues? A majority of such residues are also found to cluster near the NV icosahedral interfaces at the early P02 partition in the evolutionary tree (Table 4). The class-specific residues 70I, 201V, 203A, and 204G cluster together to form surface patch 1, while residues 104V, 106N, and 163E cluster to form surface patch 2 in genogroup I viruses (Table 4). The corresponding genogroup II residues are 70V, 201T, 203S, and 204C for surface 1 and 104A, 106G, and 163P for surface 2. Because of the absence of a GII crystal structure, it is not possible to precisely determine whether these residues form surfaces in these noroviruses. However, covariance matrices of the pairwise volume correlations of the substitutions at these residue locations show relatively high values for the diagonal terms in comparison with the off-diagonal terms at a 95% statistical confidence limit (data not shown). Assuming that correlated residue substitutions often imply functional interactions between such substituted residue locations (1, 3), this indicates a significant probability of interactions among such GII S domain class-specific residues across the icosahedral interfaces. The class-specific residue 44A, in the N-terminal helix of the genogroup I rNV structure near the quasi-sixfold (CB2 and CB5) interfaces (Fig. 3A and B), is a 44P in the genogroup II viruses. Such differences in the class-specific residues and their surfaces (Tables 4 and 5) in various partitions may be important in distinguishing subtle aspects of the assembly and disassembly processes in noroviruses of the two genogroups and in the clusters that are generated in the respective partitions (Table 4; Fig. 4A).
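
    One way such a volume covariance matrix could be computed is sketched below. The paper does not give its exact procedure, so this is an assumption: each alignment column is converted to residue volumes, and the sample covariance is taken between columns. The volume table holds approximate, commonly tabulated residue volumes in cubic angstroms and should be treated as illustrative, and the four toy sequences are hypothetical strings built from the class-specific residues at positions 70, 201, 203, and 204 quoted above.

```python
from statistics import mean

# Approximate residue volumes in cubic angstroms (illustrative values, assumed here).
VOLUME = {
    "G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1,
    "P": 112.7, "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0,
    "Q": 143.8, "H": 153.2, "M": 162.9, "I": 166.7, "L": 166.7,
    "K": 168.6, "R": 173.4, "F": 189.9, "Y": 193.6, "W": 227.8,
}


def column_volumes(alignment, col):
    """Residue volumes observed at one alignment column."""
    return [VOLUME[seq[col]] for seq in alignment]


def covariance(x, y):
    """Sample covariance of two equally long lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def covariance_matrix(alignment, cols):
    """Covariance of residue volumes between every pair of selected columns."""
    vols = [column_volumes(alignment, c) for c in cols]
    return [[covariance(u, v) for v in vols] for u in vols]


if __name__ == "__main__":
    # Hypothetical toy alignment: columns stand for positions 70, 201, 203, 204.
    toy = ["IVAG", "IVAG", "VTSC", "VTSC"]
    for row in covariance_matrix(toy, [0, 1, 2, 3]):
        print(["%.1f" % value for value in row])
```

    Diagonal entries are the per-position volume variances and off-diagonal entries measure how volume changes at two positions move together; assessing significance at a given confidence limit would require a separate statistical test that is not shown here.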

    Hinge region: residues implicated in proteolytic cleavage form a class-specific surface patch. The class-specific residues of the hinge region connecting the S domain and the P1 subdomain indicate different functional specificities compared to the S domain. The exposed class-specific residues 226Q, 227K, and 229R in the hinge region cluster to form a surface patch CS-1 around the exposed ACR 225E (Table 4 and Fig. 4B and C). Because this region undergoes proteolytic cleavage at high pH (12) and this patch starts forming in the P02 partition where the two genogroups diverge (Tables 2 and 4), residues in this patch may be critical for such cleavages in both norovirus genogroups. Although the exact significance of this cleavage site in the norovirus life cycle is not known, the evolutionary importance of the hinge region shown by the ET analysis indicates biological significance for this region.

    P1 subdomain: observed antigenic regions contain class-specific surfaces. The class-specific residues in the P1 subdomain form two patches in surfaces 9 through 15 that contain the highly exposed C-terminal residues (Tables 3 and 4). These two patches CS-2 and CS-3 that begin developing in the P02 partition (Table 4 and Fig. 4B and C) may be antigenically significant in both norovirus genogroups. The patch CS-2, centered about the C-terminal ACR 515P, develops into a large exposed patch in partition P06. This patch, consisting of the exposed residues 414F, 514K, and 520S along with the buried residues 463D, 485P, 486N, and 519A (Table 4 and Fig. 4B and C), may be critical in defining the C-terminal epitopes reported in binding studies of noroviruses with monoclonal antibodies (13). The S domain patch 1 containing the exposed residues T192, T196, and S199 (Table 4) may be considered together with the CS-2 patch as part of a larger S and P domain surface patch that may be part of the epitopes (Fig. 4B). The moderately exposed CS-3 patch (Table 4), centered about the ACRs 465D and 466T that interact with the S domain across the fivefold axis, may define the additional nonoverlapping epitopes observed in NV (11).

    Residue differences in the class-specific patches CS-2 and CS-3 between the two genogroups may explain the observed genogroup-specific reactivity differences of certain monoclonal antibodies (11). Residues 414F, 463D, 514K, and 519A, which constitute patch CS-2 in all genogroup I noroviruses as shown in class GI of partition P02 (Tables 2 and 4), are 414P, 463N, 514A, and 519G in all genogroup II noroviruses, as shown in class GII for the same partition (Table 2). In partitions P05 and P06, insertions at residues 485P and 486N of patch CS-2 in all genogroup II norovirus strains (Table 2) introduce additional genogroup-specific differences in this patch. Similarly, residues 434P, 460H, and 500V, which comprise patch CS-3 in all genogroup I noroviruses (Table 2, class GI of partition P02), are 434D, 460R, and 500Y in all genogroup II noroviruses, as shown in class GII for the same partition (Table 2). Such differences indicate that the patch CS-2 may be important in defining genogroup-specific epitopes that distinguish genogroup I antigenicity from that of genogroup II (11, 22).

    Partitions P05 and P06 indicate cluster-specific modifications to the CS-2 and CS-3 patches. The CS-2 patch shows the addition of only one class-specific residue at location 520 in the P05 partition relative to the P02 partition (Table 4). In contrast, the patch CS-3 contains many differences in the P05 and the P06 partitions relative to the P02 partition. These differences are in class-specific residues corresponding to the locations 492, 236, 241, 257, 409, 419, and 420 (Tables 2 and 4). Because the CS-3 patch has a larger number of cluster-specific variations than the CS-2 patch, the CS-3 patch may be relatively more important in defining strain-specific epitopes that can distinguish the antigenicity of the different clusters in the two genogroups (22). Thus, the possible epitopes on the CS-2 and the CS-3 surface patches in the P1 subdomain may be part of genogroup-specific and strain-specific epitopes, respectively, in noroviruses. These surfaces are relatively more conserved in GI than in GII strains. Such conservation in these exposed NV surfaces may contribute to the observed homologous seroresponses of GI-infected patients to the recombinant NV antigen in contrast to that of GII infections (30). However, the hinge region patch CS-1 is the only class-specific patch that consists of residues that are nearly conserved between the two genogroups (Table 2). This patch may be involved in binding cross-reactive monoclonal antibodies that react with both norovirus genogroups (22). The relatively sharp distinction in the seroresponses of GII-infected patients to the GII antigens HV and Toronto strains (30) may be understood from partition P07, in which HV (A54) lies in class AGIIh, which is distinct from class AGIIi containing the Toronto (A59) antigen (Table 1). The class-specific residues 65N, 241L, and 509Q in the GIIc and GIId classes of partition P06 mainly distinguish between these two strains (Table 2).

    P2 subdomain: class-specific residues indicate a putative carbohydrate-binding site. Two class-specific surface patches are formed in the P2 subdomain. The patch CS-4 consists of the class-specific residues at locations (NV numbering) 329, 373, 375, and 377 (Table 4 and Fig. 4B). Residues corresponding to these locations in genogroup I noroviruses are 329H, 373L, 375W, and 377S. These residues are located near the only P2 subdomain cavity at the dimeric interface of rNV as seen in its crystal structure. Additional residues 267N, 322D, 327D, 374S, 331N, 333T, 334Q, and 341T also lie in the vicinity of this cavity. Because all of these residue types are known to bind carbohydrate and sugar molecules and because binding studies have suggested that blood group H antigens are receptors for noroviruses (16), it is possible that the CS-4 patch of this groove is involved in carbohydrate receptor binding of noroviruses. Recent studies have implicated a sequence motif that lies near one end of this cavity in the binding of carbohydrate to noroviruses (41).

    The two norovirus genogroups show significant differences in the class-specific residues of the CS-4 patch. Residues 329H, 375W, and 377S in genogroup I noroviruses of class GI are 329G, 375P, and 377G in the genogroup II noroviruses of class GII in the P02 partition (Table 2). Such residue variations in the CS-4 patch may explain the differences in carbohydrate binding to noroviruses of the two genogroups (16). Additional class-specific differences in this patch for the subsequent partitions may explain the host specificity in the receptor binding of the NV (GI) strain and the five GII (Grimsby, VA387, VA207, MOH, and MXV) strains (15). The earliest partition that distinguishes among these GII strains is P06, in which the Grimsby (A26) and VA387 (A27) strains belong to class AGIIc and VA207 (A37) belongs to class AGIId, while MOH (A45) and MXV (A60) belong to class AGIIe (Table 1). The class-specific surface residues of the proposed receptor-binding CS-4 surface patch are identical in VA387 and Grimsby, whereas they are similar but not identical in MXV and MOH (Table 5). This greater similarity may explain why the VA387 and Grimsby strains share very similar receptor binding patterns that differ from the patterns of the MXV and MOH strains (15). Because these residues are more highly conserved among the A21 to A35 strains of class AGIIc than among the A41 to A65 strains of class AGIIe in partition P06 (Table 1), the strains in class AGIIc may share a greater similarity in receptor binding characteristics than the strains of class AGIIe. The VA207 strain of class AGIId in partition P06 (Table 1) shows markedly different residues in the CS-4 surface in comparison with other noroviruses (Table 5). These differences may explain the distinct receptor binding characteristics observed for the VA207 strain (15). Interestingly, the GII class AGIId shows very low conservation of the CS-4 surface (Table 1), indicating that highly nonhomologous receptor binding patterns should be expected for the A36 to A40 strains of this class (Table 1 and Fig. 1). In contrast, the NV, Aichi, and KY-89 GI strains belonging to the AGId class in partition P07 (Table 1) show high sequence identity among themselves for the CS-4 surface (Table 5), indicating that a very similar receptor binding pattern is expected for these strains.
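
    The patch comparisons made in this paragraph amount to counting identical residues at the CS-4 positions. A minimal sketch of that count follows, using only the genogroup-level residues quoted in the text for partition P02; the strain-level residues of Table 5 are not reproduced here, and the function name is hypothetical.

```python
# Class-specific CS-4 residues (NV numbering) quoted in the text for partition P02.
# Position 373 is given only for genogroup I, so it is absent from the GII entry.
CS4 = {
    "GI": {329: "H", 373: "L", 375: "W", 377: "S"},
    "GII": {329: "G", 375: "P", 377: "G"},
}


def patch_identity(patch_a, patch_b):
    """Return (identical, compared) over positions present in both patches."""
    shared = sorted(set(patch_a) & set(patch_b))
    identical = sum(patch_a[pos] == patch_b[pos] for pos in shared)
    return identical, len(shared)


if __name__ == "__main__":
    identical, compared = patch_identity(CS4["GI"], CS4["GII"])
    print(f"GI vs GII: {identical} of {compared} CS-4 positions identical")
```

    The same function could be applied to per-strain residue maps, for example to compare two strains of one class, provided their CS-4 residues are supplied in the same position-to-residue form.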

    A second class-specific surface patch, CS-5, is also seen in the P2 subdomain. This patch, consisting of the buried residues 306G and 310H (Table 4; Fig. 4B and C), may define part of the same carbohydrate-binding site as CS-4, or it may be involved in additional antigenic binding in noroviruses. The exact functional significance of this patch is not clear, as is also the case for another surface patch that consists of the exposed residue 402A and the buried residue 447S near the CS-1 patch of the hinge region (Fig. 4B).

    Conclusion. The ET analysis of the norovirus sequences, an approach hitherto applied only to proteins, has allowed identification of class-specific trace residues of the capsid protein that may be functional signatures defining the strain diversity in noroviruses. In contrast to the existing classification schemes based on conventional phylogenetic analysis, the ET approach has the advantage that the classification of the viruses in a given genus or family can be understood on the basis of putative functional sites deduced from subtle evolution-related variations in their sequences and from independent biochemical evidence. The generality of the approach allows similar analyses to be performed with nucleotide sequences. Our preliminary analysis of norovirus ORF2 nucleotide sequences has identified class-specific nucleotides that vary between the strains of genogroups I and II (data not shown). Such nucleotides may possibly be used for designing primers for reverse transcription PCRs to identify genogroup-specific norovirus strains.
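
    The idea of class-specific nucleotides can be made concrete with a minimal sketch: a column of an alignment is treated as class specific when it is invariant within every class but differs between classes. This is a simplified reading of the ET criterion for a single partition, not the authors' implementation, and the short sequences in the example are hypothetical. Columns are reported with 0-based indices.

```python
def class_specific_columns(classes):
    """Return {column: {class_name: nucleotide}} for columns that are invariant
    within every class but not identical across all classes.

    `classes` maps a class name to a list of equal-length aligned sequences.
    """
    length = len(next(iter(classes.values()))[0])
    result = {}
    for col in range(length):
        consensus = {}
        for name, seqs in classes.items():
            letters = {seq[col] for seq in seqs}
            if len(letters) != 1:
                break  # variable within this class, so not class specific
            consensus[name] = letters.pop()
        else:
            # Reached only when every class was invariant at this column.
            if len(set(consensus.values())) > 1:
                result[col] = consensus
    return result


if __name__ == "__main__":
    # Hypothetical aligned fragments, two per genogroup.
    toy = {"GI": ["ACGTA", "ACGTT"], "GII": ["ATGTC", "ATGTG"]}
    print(class_specific_columns(toy))
```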

    ACKNOWLEDGMENTS

    We thank Tracy D. Parker, Rong Chen, and Robert L. Atmar for useful discussions.

    Grants from NIH (RO1AI-38036 and P01AI-57788 to M.K.E. and B.V.V.P.) and the R.A. Welch Foundation (B.V.V.P.) supported this work. NIH grants T32DK-07644 and CA-09197 supported A.M.H.

    REFERENCES

    1. Altschuh, D., A. M. Lesk, A. C. Bloomer, and A. Klug. 1987. Correlation of co-ordinated amino acid substitutions with function in viruses related to tobacco mosaic virus. J. Mol. Biol. 193:693-707.

    2. Ando, T., J. S. Noel, and R. L. Fankhauser. 2000. Genetic classification of Norwalk-like viruses. J. Infect. Dis. 181(Suppl. 2):S336-S348.

    3. Atchley, W. R., K. R. Wollenberg, W. M. Fitch, W. Terhalle, and A. W. Dress. 2000. Correlations among amino acid sites in bHLH protein domains: an information theoretic analysis. Mol. Biol. Evol. 17:164-178.

    4. Bertolotti-Ciarlet, A., L. J. White, R. Chen, B. V. Prasad, and M. K. Estes. 2002. Structural requirements for the assembly of Norwalk virus-like particles. J. Virol. 76:4044-4055.

    5. Blakeney, S. J., A. Cahill, and P. A. Reilly. 2003. Processing of Norwalk virus nonstructural proteins by a 3C-like cysteine proteinase. Virology 308:216-224.

    6. Caspar, D. L. D., and A. Klug. 1962. Physical principles in the construction of regular viruses. Cold Spring Harbor Symp. Quant. Biol. 27:1-24.

    7. Chen, R., J. D. Neill, J. S. Noel, A. M. Hutson, R. I. Glass, M. K. Estes, and B. V. Prasad. 2004. Inter- and intragenus structural variations in caliciviruses and their functional implications. J. Virol. 78:6469-6479.

    8. Glass, R. I., J. S. Noel, T. Ando, R. L. Fankhauser, G. Belliot, A. Mounts, U. D. Parashar, J. S. Bresee, and S. Monroe. 2000. The epidemiology of enteric caliciviruses from humans: a reassessment using new diagnostics. J. Infect. Dis. 181(Suppl. 2):S254-S261.

    9. Green, J., J. Vinje, C. I. Gallimore, M. Koopmans, A. Hale, and D. W. Brown. 2000. Capsid protein diversity among Norwalk-like viruses. Virus Genes 20:227-236.

    10. Green, K. Y., R. M. Chanock, and A. Z. Kapikian. 2001. Human caliciviruses, p. 841-874. In Fields virology, 4th ed., Lippincott, Williams & Wilkins, Baltimore, Md.

    11. Hale, A. D., T. N. Tanaka, N. Kitamoto, M. Ciarlet, X. Jiang, N. Takeda, D. W. Brown, and M. K. Estes. 2000. Identification of an epitope common to genogroup 1 "Norwalk-like viruses." J. Clin. Microbiol. 38:1656-1660.

    12. Hardy, M., L. White, J. Ball, and M. Estes. 1995. Specific proteolytic cleavage of recombinant Norwalk virus capsid protein. J. Virol. 69:1693-1698.

    13. Hardy, M. E., T. N. Tanaka, N. Kitamoto, L. J. White, J. M. Ball, X. Jiang, and M. K. Estes. 1996. Antigenic mapping of the recombinant Norwalk virus capsid protein using monoclonal antibodies. Virology 217:252-261.

    14. Harrison, S. C., A. J. Olson, C. E. Schutt, and F. K. Winkler. 1978. Tomato bushy stunt virus at 2.9 Å resolution. Nature 276:368-373.

    15. Huang, P., T. Farkas, S. Marionneau, W. Zhong, N. Ruvoën-Clouet, A. L. Morrow, M. Altaye, L. K. Pickering, D. S. Newburg, J. LePendu, and X. Jiang. 2003. Noroviruses bind to human ABO, Lewis, and secretor histo-blood group antigens: identification of 4 distinct strain-specific patterns. J. Infect. Dis. 188:19-31.

    16. Hutson, A. M., R. L. Atmar, D. M. Marcus, and M. K. Estes. 2003. Norwalk virus-like particle hemagglutination by binding to H histo-blood group antigens. J. Virol. 77:405-415.

    17. Innis, C. A., J. Shi, and T. L. Blundell. 2000. Evolutionary trace analysis of TGF-beta and related growth factors: implications for site-directed mutagenesis. Protein Eng. 13:839-847.

    18. Jiang, X., M. Wang, D. Y. Graham, and M. K. Estes. 1992. Expression, self-assembly, and antigenicity of the Norwalk virus capsid protein. J. Virol. 66:6527-6532.

    19. Johnson, J. E., and J. A. Speir. 1997. Quasi-equivalent viruses: a paradigm for protein assemblies. J. Mol. Biol. 269:665-675.

    20. Kapikian, A. Z., R. G. Wyatt, R. Dolin, T. S. Thornhill, A. R. Kalica, and R. M. Chanock. 1972. Visualization by immune electron microscopy of a 27-nm particle associated with acute infectious nonbacterial gastroenteritis. J. Virol. 10:1075-1081.

    21. Karst, S. M., C. E. Wobus, M. Lay, J. Davidson, and H. W. Virgin IV. 2003. STAT1-dependent innate immunity to a Norwalk-like virus. Science 299:1575-1578.

    22. Kitamoto, N., T. Tanaka, K. Natori, N. Takeda, S. Nakata, X. Jiang, and M. K. Estes. 2002. Cross-reactivity among several recombinant calicivirus virus-like particles (VLPs) with monoclonal antibodies obtained from mice immunized orally with one type of VLP. J. Clin. Microbiol. 40:2459-2465.

    23. Lee, W.-M., and W. Wang. 2003. Human rhinovirus type 16: mutant V1210A requires capsid-binding drug for assembly of pentamers to form virions during morphogenesis. J. Virol. 77:6235-6244.

    24. Lewis, D. 1991. Norwalk agent and other small round structured viruses in the United Kingdom. J. Infect. 23:220-222.

    25. Lichtarge, O., H. R. Bourne, and F. E. Cohen. 1996. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257:342-358.

    26. Lichtarge, O., and M. E. Sowa. 2002. Evolutionary predictions of binding surfaces and interactions. Curr. Opin. Struct. Biol. 12:21-27.

    27. Madabushi, S., H. Yao, M. Marsh, D. M. Kristensen, A. Philippi, M. E. Sowa, and O. Lichtarge. 2002. Structural clusters of evolutionary trace residues are statistically significant and common in proteins. J. Mol. Biol. 316:139-154.

    28. Mihalek, I., I. Res, H. Yao, and O. Lichtarge. 2003. Combining inference from evolution and geometric probability in protein structure evaluation. J. Mol. Biol. 331:263-279.

    29. Nishida, T., H. Kimura, M. Saitoh, M. Shinohara, M. Kato, S. Fukuda, T. Munemura, T. Mikami, A. Kawamoto, M. Akiyama, Y. Kato, K. Nishi, K. Kozawa, and O. Nishio. 2003. Detection, quantitation, and phylogenetic analysis of noroviruses in Japanese oysters. Appl. Environ. Microbiol. 69:5782-5786.

    30. Noel, J. S., T. Ando, J. P. Leite, K. Y. Green, K. E. Dingle, M. K. Estes, Y. Seto, S. S. Monroe, and R. I. Glass. 1997. Correlation of patient immune responses with genetically characterized small round-structured viruses involved in outbreaks of nonbacterial acute gastroenteritis in the United States, 1990 to 1995. J. Med. Virol. 53:372-383.

    31. Oliver, S. L., A. M. Dastjerdi, S. Wong, L. El-Attar, C. Gallimore, D. W. Brown, J. Green, and J. C. Bridger. 2003. Molecular characterization of bovine enteric caliciviruses: a distinct third genogroup of noroviruses (Norwalk-like viruses) unlikely to be of risk to humans. J. Virol. 77:2789-2798.

    32. Pattabiraman, N., and K. Ward. 1995. Occluded molecular surface: analysis of protein packing. J. Mol. Recogn. 8:334-344.

    33. Pfister, T., and E. Wimmer. 2001. Polypeptide p41 of a Norwalk-like virus is a nucleic acid-independent nucleoside triphosphatase. J. Virol. 75:1611-1619.

    34. Prasad, B. V., M. E. Hardy, T. Dokland, J. Bella, M. G. Rossmann, and M. K. Estes. 1999. X-ray crystallographic structure of the Norwalk virus capsid. Science 286:287-290.

    35. Prasad, B. V., R. Rothnagel, X. Jiang, and M. K. Estes. 1994. Three-dimensional structure of baculovirus-expressed Norwalk virus capsids. J. Virol. 68:5117-5125.

    36. Reddy, V., P. Natarajan, B. Okerberg, K. Li, K. Damodaran, R. Morton, C. Brooks III, and J. E. Johnson. 2001. Virus particle explorer (VIPER), a website for virus capsid structures and their computational analyses. J. Virol. 75:11943-11947.

    37. Reddy, V. S., H. A. Giesing, R. T. Morton, A. Kumar, C. B. Post, C. L. Brooks III, and J. E. Johnson. 1998. Energetics of quasiequivalence: computational analysis of protein-protein interactions in icosahedral viruses. Biophys. J. 74:546-558.

    38. Rossmann, M. G., C. Abad-Zapatero, M. R. Murthy, L. Liljas, T. A. Jones, and B. Strandberg. 1983. Structural comparisons of some small spherical plant viruses. J. Mol. Biol. 165:711-736.

    39. Someya, Y., N. Takeda, and T. Miyamura. 2002. Identification of active-site amino acid residues in the Chiba virus 3C-like protease. J. Virol. 76:5949-5958.

    40. Sowa, M. E., W. He, K. C. Slep, M. A. Kercher, O. Lichtarge, and T. G. Wensel. 2001. Prediction and confirmation of a site critical for effector regulation of RGS domain activity. Nat. Struct. Biol. 8:234-237.

    41. Tan, M., P. Huang, J. Meller, W. Zhong, T. Farkas, and X. Jiang. 2003. Mutations within the P2 domain of norovirus capsid affect binding to human histo-blood group antigens: evidence for a binding pocket. J. Virol. 77:12562-12571.

    42. Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

    43. Vinje, J., and M. P. Koopmans. 2000. Simultaneous detection and genotyping of "Norwalk-like viruses" by oligonucleotide array in a reverse line blot hybridization format. J. Clin. Microbiol. 38:2595-2601.

    44. Yao, H., D. M. Kristensen, I. Mihalek, M. E. Sowa, C. Shaw, M. Kimmel, L. Kavraki, and O. Lichtarge. 2003. An accurate, sensitive, and scalable method to identify functional sites in protein structures. J. Mol. Biol. 326:255-261.

    (Sugoto Chakravarty, Anne )