Nucleic Acids Research, 2004
Qgrid: clustering tool for detecting charged and hydrophobic regions i
     Department of Biochemical Science and Engineering, Kyushu Institute of Technology, Iizuka 820 8502, Fukuoka-ken, Japan

    * To whom correspondence should be addressed. Tel: +81 948 29 7841; Fax: +81 948 29 27841; Email: shandar@bse.kyutech.ac.jp

    ABSTRACT

    We have developed a simple but powerful method and web server to quickly locate charged and hydrophobic clusters in proteins (http://www.netasa.org/qgrid/index.html). For the charged clusters, each atom in the protein is first assigned a charge according to a standard force field. A box is then created with dimensions corresponding to the range of atomic coordinates and divided into cubic grids of a selected size, which now contain one or more charged atoms. Each grid therefore carries a certain amount of charge. Cubic grids whose charge exceeds a cutoff are then clustered using a hierarchical clustering method based on Euclidean distance. A tree diagram made from the resulting clusters indicates the distribution of charged and hydrophobic regions of the protein. Hydrophobic clusters are developed by grouping the positions of the Cα atoms of hydrophobic residues. We propose that such a tree representation will be helpful in detecting protein–protein interfaces, structural similarity and motifs.

    INTRODUCTION

    Positive and negative charge clusters in proteins have been implicated in a range of biologically important functions, including protein–protein interactions, DNA binding, Ca2+ and Na+ channeling and gating, electron transport and domain swapping (1–8). Similarly, hydrophobic clusters have been found to be of central importance in determining the stability, folding pattern, guanosine diphosphate (GDP) dissociation and similar properties of proteins (9–13). Despite the considerable need to detect such charged and hydrophobic clusters, there is no web server that allows molecular biologists to locate these regions in proteins quickly. Molecular structure visualization programs such as Rasmol (now Protein Explorer) (14), Chime (http://www.mdli.com/) and VMD (15) may at best be used to locate the surface distribution of residues and to generate Connolly surfaces. Apart from their inability to locate clusters in the interior of the protein, their three-dimensional rendering makes it impossible to see the overall distribution without rotating the structure and exploring all possible orientations. Here we present a web server that reports the distribution of charged and hydrophobic regions and allows them to be located quickly in two-dimensional cluster-tree diagrams. These diagrams allow us to inspect clusters in every part of the protein and present the results in a concise manner that is independent of the protein orientation and of overall symmetry properties such as molecular chirality.

    MATERIALS AND METHODS

    The principle of Qgrid is quite straightforward. The atomic coordinates are read from a PDB-formatted file (16). Charges are assigned to every atom according to a standard force field. The box parameters are then calculated by taking the extreme atomic coordinates and forming a box along those dimensions. This box is divided into cubic grids of a selected dimension. The center of each grid identifies it uniquely and (for charge clustering) the charge on each grid is calculated as the total charge of the one or more atoms falling inside it. Using these charge values, the cubic grids are clustered by hierarchical clustering with a simple Euclidean distance criterion. For generating the PostScript tree structures, cluster diagrams and the distance tables, we use the free, open-source software provided by P. Kleiweg (http://odur.let.rug.nl/~kleiweg/indexs.html). For hydrophobic clusters, charges are not assigned to all atoms. Only the Cα atoms of hydrophobic residues are assumed to have a pseudo-charge of 1.0 each. The rest of the clustering proceeds in the same way as for the charged grids. Chain breaks are implemented whenever clustering of only one chain is desired.
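
    The gridding step described above can be sketched as follows. This is a minimal illustration and not the Qgrid implementation: the function names `grid_charges` and `charged_cells`, the input layout (a list of `(x, y, z, charge)` tuples) and the use of the absolute charge when applying the cutoff are all assumptions made for the example.

```python
import math
from collections import defaultdict


def grid_charges(atoms, grid_size):
    """Sum atomic charges per cubic grid cell.

    atoms: list of (x, y, z, charge) tuples (hypothetical input layout).
    Returns a dict mapping each occupied cell's center to its total charge.
    """
    # The box origin is the minimum coordinate along each axis.
    origin = [min(atom[i] for atom in atoms) for i in range(3)]
    totals = defaultdict(float)
    for x, y, z, q in atoms:
        # Integer cell index of this atom along each axis.
        idx = tuple(
            int(math.floor((c - o) / grid_size))
            for c, o in zip((x, y, z), origin)
        )
        totals[idx] += q
    # Identify each occupied cell by the coordinates of its center.
    return {
        tuple(o + (i + 0.5) * grid_size for i, o in zip(idx, origin)): q
        for idx, q in totals.items()
    }


def charged_cells(cells, cutoff):
    """Keep cells whose absolute charge exceeds the cutoff (assumed criterion)."""
    return {center: q for center, q in cells.items() if abs(q) > cutoff}
```

    Under the same assumptions, hydrophobic clustering could reuse `grid_charges` by passing only the Cα positions of hydrophobic residues, each with a pseudo-charge of 1.0.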

    Clustering starts by calculating the pairwise (Euclidean) distances between the grid centers. Once this first distance matrix is available, the first branches of the tree are constructed from these distances. The first-level clusters (or pairs) are then joined by calculating the distances between them. The distance between two clusters (or pairs) can be defined in one of two ways. In the group average or average linkage method, each cluster is located at the geometric center of all the points (grid centers) it contains, and the distance is measured between these centers. In the single linkage method, the distance is that between the nearest two points (grid centers) in the two clusters. Group average clustering makes more sense if the cluster geometry is spherical or close to spherical. Single linkage may be useful if the cluster members in the original structure are arranged less spherically. Furthermore, single linkage will be preferable when a cluster of residues is formed by successive attachment of residues to a region of interest, e.g. hydrophobic residues in a transmembrane helix. On the other hand, group average or average linkage will reflect a more realistic situation if all members of a cluster are important to each other, as is the case for hydrophobic cores in proteins and charged patches in DNA-binding proteins. This process of joining pairs of grids and sub-clusters is continued until all grids have been joined, and is termed ‘hierarchical clustering’.
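
    The two linkage rules can be compared with a small, naive agglomerative sketch. It is an illustration only and not the clustering software used by Qgrid. The names `centroid`, `cluster_distance` and `hierarchical_clustering` are hypothetical, and the "average" option follows the description above (distance between the geometric centers of two clusters), which is not the same as the variant that averages all pairwise distances.

```python
import math
from itertools import combinations


def centroid(points):
    """Geometric center of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def cluster_distance(a, b, linkage):
    """Distance between two clusters (lists of 3D points)."""
    if linkage == "single":
        # Nearest pair of points, one taken from each cluster.
        return min(math.dist(p, q) for p in a for q in b)
    if linkage == "average":
        # Distance between geometric centers, as described in the text.
        return math.dist(centroid(a), centroid(b))
    raise ValueError("linkage must be 'single' or 'average'")


def hierarchical_clustering(points, linkage="single"):
    """Repeatedly merge the two closest clusters until one remains.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    which is the information a tree diagram would be drawn from.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(
                clusters[ij[0]], clusters[ij[1]], linkage
            ),
        )
        d = cluster_distance(clusters[i], clusters[j], linkage)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

    On points strung out along a line, single linkage joins each new point at its distance to the nearest member, whereas the center-based rule reports larger distances as the cluster grows, which is in line with the chain-like versus compact cases discussed above.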

    QGRID QUERY INTERFACE

    Qgrid has a simple HTML query interface, which takes the following inputs (Figure 1):

    File upload or PDB code: This is simply the four-character PDB code of the protein for which clusters are desired. We have a local mirror of the PDB from which the data will subsequently be retrieved for calculations.

    Cluster type: Here the users can decide if they want to generate a cluster of hydrophobic or charged regions. This field input is implemented by way of and