Nucleic Acids Research, 2005, No. 11
Binding properties and evolution of homodimers in protein–protein inte
    Ariadne Genomics Inc., 9700 Great Seneca Highway, Suite 113, Rockville, MD 20850, USA and (1) Department of Physics, Brookhaven National Laboratory, Upton, NY 11973, USA

    *To whom correspondence should be addressed. Fax: +1 240 453 6208; Email: slava@ariadnegenomics.com

    ABSTRACT

    We demonstrate that protein–protein interaction networks in several eukaryotic organisms contain significantly more self-interacting proteins than expected if such homodimers appeared randomly in the course of evolution. We also show that, on average, homodimers have twice as many interaction partners as non-self-interacting proteins. More specifically, the likelihood that a protein physically interacts with itself was found to be proportional to the total number of its binding partners. These properties of dimers agree with a phenomenological model in which individual proteins differ from each other in their degree of ‘stickiness’, or general propensity toward interaction with other proteins, including themselves. A duplication of a self-interacting protein creates a pair of paralogous proteins that interact with each other. We show that such pairs occur more frequently than could be explained by pure chance alone. Similar to homodimers, proteins involved in heterodimers with their paralogs have, on average, twice as many interacting partners as the rest of the network. The likelihood that a pair of paralogous proteins interact with each other was also shown to decrease as their sequence similarity decreases. This points to the conclusion that most interactions between paralogs are inherited from ancestral homodimeric proteins rather than established de novo after duplication. Finally, we discuss possible implications of our empirical observations from functional and evolutionary standpoints.

    INTRODUCTION

    Many functionally important proteins, such as receptors, enzyme complexes (3), ion channels (4) and transcription factors (5), are homo- or hetero-dimers. For example, 70% of the enzymes listed in the Brenda database (http://www.brenda.uni-koeln.de/) can self-interact to form dimers or higher-order oligomers. As another example, the G-protein-coupled receptor (1), chemokine (6), cytokine (7) and tyrosine kinase receptor (2) families all use oligomerization as a step in pathway activation in response to an agonist (3). Examples of multi-protein complexes containing homodimers include the proteasome (8), the ribosome (9) and the nucleosome (10). The function of most filamentous proteins of the cytoskeleton, such as actin, myosin, spectrin and tubulin, relies on their oligomerization or polymerization. The ability to self-interact confers several structural and functional advantages on proteins, including improved stability (11,12), control over the accessibility and specificity of active sites (3), and increased structural complexity. In addition, self-association can help to minimize genome size while maintaining the advantages of modular complex formation. Protein assembly into heterodimers has the combinatorial effect of producing multiple species with different affinities for their substrates and different biophysical characteristics, giving the cell an instrument for fine-tuning its regulatory responses. An even greater variety of complexes contain (or are formed by) interacting paralogs, such as the spliceosome (13), the actin-promoting complex Arp2/3, membrane receptors (14) and transcription factors (5).

    While many specific dimerizing proteins are well studied and their biological and structural properties have been established, little is known about the overall topological influence and high-level statistical properties of the distribution of dimers in protein networks. Protein networks have recently become a subject of extensive research by biologists as well as by scientists from other fields interested in networks and graphs. Among the various types of protein–protein networks studied, binding (physical interaction) networks have several appealing properties that make them a popular research subject: they are undirected, Boolean and the most extensive, in principle spanning all proteins present in a given organism. Several universal features of binding networks are believed to be fairly well established. Examples include an apparently broad (scale-free) degree distribution, suppression of interactions between high-degree (hub) proteins (17), a higher than randomly expected number of tightly linked sub-graphs or cliques (15) and evolutionary conservation of such tightly linked sub-graphs (18). In this paper, we describe a systematic empirical study of the topological properties of physical interaction networks in the neighborhood of homodimers (self-interacting proteins) as well as heterodimers formed by paralogous proteins.

    MATERIALS AND METHODS

    The protein interaction data for all four species were obtained from the Biological Association Network databases available from Ariadne Genomics (http://www.ariadnegenomics.com/). The database for Homo sapiens was derived from the Ariadne Genomics ResNet database, constructed from various literature sources using MedScan, Ariadne Genomics' proprietary natural language processing technology (20,21). The list of all human proteins used in our study, along with their degrees (numbers of binding partners), dimerization states and brief descriptions of their functional roles in the cell (where known), is available in the Supplementary Material. The databases for the baker's yeast Saccharomyces cerevisiae, the nematode worm Caenorhabditis elegans and the fruit fly Drosophila melanogaster were constructed by combining the data from published high-throughput experiments with the literature data obtained using MedScan. For more details on the construction of these databases, please refer to the PathwayAssist manual (http://www.ariadnegenomics.com/products/pathway.html).

    Most of the protein–protein interactions (PPIs) among fly proteins (20 496 out of 20 595, or 99.5%) were extracted from a single system-wide two-hybrid study (22), while most of the worm interactions (4027 out of 5309, or 75%) come from a large-scale two-hybrid study (23). The abnormally small average degree in the worm PPI network compared with that of the other organisms might be explained by the fact that, unlike in the yeast (24) and the fly (22) cases, the high-throughput two-hybrid assay of worm proteins was not truly genome-wide. Indeed, in (23) the authors experimentally investigated the interactions of only 1873 specially selected baits (out of some 22 000 worm proteins) against genome-wide libraries of preys. Because the probability that a given interaction is observed in both directions is small, proteins that were not tested as baits on average receive only half of their interaction partners. Indeed, we found that the average degree of the worm proteins tested as baits (or rather of the 729 of them that were found to have at least one prey partner) is 6.1, as opposed to an average degree of about 3 in the whole two-hybrid part of the worm network. This is remarkably close to the 5.7–6.6 range found in the other three organisms studied here. It is important to note that the number of homodimers found in this study (60 proteins) is a gross underestimate of the total number of homodimers among worm proteins, because a self-interaction can be detected only if both the bait and the prey hybrids of a protein are used in the study. A crude estimate gives the overall number of homodimers in the worm to be at least (60 × 22 000)/1873 ≈ 700.
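
    The arithmetic behind this crude estimate can be reproduced in a few lines. The following is only an illustrative sketch: the function name extrapolate_homodimers is hypothetical, and it assumes, as the estimate in the text does, that homodimers are as frequent among untested proteins as among the 1873 baits.

```python
def extrapolate_homodimers(observed: int, total_proteins: int, baits_tested: int) -> float:
    """Scale the homodimer count seen among tested baits up to the whole proteome.

    Assumes (as the text's estimate does) that untested proteins self-interact
    at the same rate as the proteins that were tested as baits.
    """
    if baits_tested <= 0:
        raise ValueError("baits_tested must be positive")
    return observed * total_proteins / baits_tested


# Numbers quoted in the text for the worm two-hybrid study.
estimate = extrapolate_homodimers(observed=60, total_proteins=22_000, baits_tested=1873)
print(round(estimate))  # 705, i.e. roughly 700
```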

    Lists of paralogous pairs and their sequence similarities for all four species studied here were obtained by the following procedure. Amino acid sequences of individual proteins were obtained from the RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/). For each organism, the sequences were compared against themselves using the BLASTp program with an expectation value cutoff of 0.001 (25). A global alignment similarity was then computed by adding together the numbers of similar amino acids from all non-overlapping locally aligned segments and dividing this sum by the geometric average of the two protein lengths. Thus, gaps between the aligned segments were considered to have zero similarity. In the case of overlapping segments, we took the one with the highest percent similarity. We estimated that 2% of the true homologs are not recovered by this approach, owing to the incompleteness of the BLASTp output for local alignments. Another cost of this quicker calculation is an underestimation of the global alignment score by 5–10% compared with a more precise calculation after alignment using the CLUSTALW algorithm (26).
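
    As an illustration of the similarity score described above, a minimal sketch follows. It is not the authors' code: the Segment record, the greedy overlap resolution (best percent similarity first) and the use of the query span as the segment length are all assumptions made here for the sake of a runnable example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Segment:
    """One locally aligned segment (hypothetical record; coordinates 1-based, inclusive)."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    positives: int  # number of similar amino acids in the segment

    @property
    def pct_similar(self) -> float:
        # Simplification: use the query span as the segment length.
        return self.positives / (self.q_end - self.q_start + 1)


def _overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start <= b_end and b_start <= a_end


def global_similarity(segments, len_a: int, len_b: int) -> float:
    """Sum similar residues over non-overlapping segments, divided by sqrt(len_a * len_b)."""
    kept = []
    # Assumed rule: among overlapping segments keep the one with the highest percent similarity.
    for seg in sorted(segments, key=lambda s: s.pct_similar, reverse=True):
        clash = any(
            _overlap(seg.q_start, seg.q_end, k.q_start, k.q_end)
            or _overlap(seg.s_start, seg.s_end, k.s_start, k.s_end)
            for k in kept
        )
        if not clash:
            kept.append(seg)
    # Unaligned regions contribute nothing, i.e. gaps count as zero similarity.
    return sum(s.positives for s in kept) / sqrt(len_a * len_b)


segs = [
    Segment(1, 100, 1, 100, 80),
    Segment(50, 120, 50, 120, 40),    # overlaps the first segment and scores lower
    Segment(150, 200, 150, 200, 30),
]
sim = global_similarity(segs, len_a=200, len_b=200)  # (80 + 30) / 200
```

    Dividing by the geometric mean of the two lengths penalizes pairs in which a short protein matches only a small part of a long one.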

    To reduce the number of false positives, we further restricted our set to include only protein pairs with a similarity >30%. Finally, all protein pairs that had been aligned by BLAST but omitted from the final paralog list because they failed the similarity cutoff were checked for common paralogs. If a common paralog was found, the pair was reinstated in the paralog list.
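
    The cutoff and reinstatement steps might be sketched as follows. The function name build_paralog_list and the input layout are hypothetical, and the sketch assumes a single reinstatement pass in which "common paralog" means a protein that passed the cutoff with both members of the rejected pair; the text does not say whether the check was iterated.

```python
from collections import defaultdict


def build_paralog_list(similarity, cutoff=0.30):
    """similarity maps a BLAST-aligned pair (a, b) to its global similarity (0..1)."""
    # Step 1: keep only the pairs above the similarity cutoff.
    accepted = {pair for pair, s in similarity.items() if s > cutoff}

    # Index the accepted paralogs of every protein.
    partners = defaultdict(set)
    for a, b in accepted:
        partners[a].add(b)
        partners[b].add(a)

    # Step 2 (single pass, an assumption): reinstate rejected pairs
    # whose members share at least one accepted paralog.
    reinstated = {
        (a, b)
        for (a, b) in similarity
        if (a, b) not in accepted and partners[a] & partners[b]
    }
    return accepted | reinstated


toy = {("A", "B"): 0.50, ("B", "C"): 0.40, ("A", "C"): 0.20, ("A", "D"): 0.10}
paralogs = build_paralog_list(toy)
# ("A", "C") is reinstated because both A and C are paralogs of B; ("A", "D") stays out.
```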

    RESULTS

    Basic observations

    We have assembled and analyzed the PPI (binding) networks from four organisms: the baker's yeast S.cerevisiae, the nematode worm C.elegans, the fruit fly D.melanogaster and the human H.sapiens (see Materials and Methods for details). The most apparent observation that follows from the network data (Table 1) is that the number of self-interacting proteins in all four organisms is substantially higher than one would expect purely by chance. Indeed, in a network with N proteins (each having at least one interaction), a straightforward estimate assuming equal affinity for itself and for other proteins suggests that a protein with connectivity (degree) k would bind to itself with a probability equal to k/N. The expected total number of dimers is then the sum of this expression over all proteins, which is simply the average connectivity, ⟨k⟩. The actual number of dimers is 25–200 times higher than expected from this simple-minded hypothesis (Table 1).
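
    The null estimate above is easy to check numerically: summing k/N over all N proteins gives the mean degree. The sketch below uses made-up toy degrees, not data from Table 1, and the function names are hypothetical.

```python
def expected_random_dimers(degrees):
    """Expected number of self-interacting proteins if each protein binds itself with probability k/N."""
    n = len(degrees)
    return sum(k / n for k in degrees)  # identical to the mean degree


def dimer_enrichment(observed_dimers, degrees):
    """How many times more homodimers are observed than the null estimate predicts."""
    return observed_dimers / expected_random_dimers(degrees)


toy_degrees = [1, 2, 3, 6]          # hypothetical degrees, N = 4
expected = expected_random_dimers(toy_degrees)
print(expected)                      # 3.0, the mean degree
```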

    Table 1 Estimated total number of proteins Ntotal, number of proteins involved in the PPI networks NPPI, the number of dimers or self-interacting proteins Ndimer, the average network degree (the number of neighbors) k over all NPPI proteins and the average degree kdimer of self-interacting proteins

    The abundance of dimers in all species suggests that their functional importance has been preserved through evolution. In support of this conclusion, we note that self-interacting proteins also have about twice as many interaction partners as non-dimers (Table 1). Indeed, the number of interaction partners of a protein was previously shown to be positively correlated with its probability of being essential for the survival of the cell and of being conserved in the course of evolution (18).

    Sometimes, the ease with which proteins form self-interactions has a purely structural (as opposed to functional) origin, explained, e.g. by the domain swapping model (27). Indeed, in the fully folded state the individual structural components of a protein are expected to make multiple binding contacts with each other. A pair of identical (or homologous) proteins might then be able to use the same set of contacts to physically interact with each other if they encounter each other in a partially unfolded state. It is interesting to note that the average degrees of dimers are almost equal in all four organisms studied here. The average degrees of all proteins in the network are also quite close to each other (a plausible experimental source of the anomalously low ⟨k⟩ ≈ 3 of the worm network is explained in Materials and Methods). At present, it is unclear whether this apparent similarity is just a coincidence or has a deeper explanation. In any case, the inter- and intra-species comparison of these networks indicates that the PPI data for any of these organisms are far from saturation, and a considerable number of new interactions is expected to be added to these networks in the future.

    Linear scaling

    To better understand the connectivity patterns of homodimers in protein interaction networks, we studied how the likelihood of a protein to interact with itself, Pdimer(k), depends on its overall number of binding partners (degree) k. Pdimer(k) is simply the fraction of homodimers among all proteins with degree k. Figure 1 shows Pdimer(k) versus k measured in the fly data, based mainly on the species-wide two-hybrid dataset described previously (22). As one can see, the probability of self-interaction increases linearly with the degree in the protein network (the dashed line on the log–log plot in Figure 1 has slope 1). The proportionality coefficient of this linear increase can be interpreted as the probability pself ≈ 3.5 × 10^-3 that a given edge of a physical interaction network starting at a certain protein ends up connecting this node with itself. It is 25 times larger than the probability pothers = 1/7000 ≈ 1.4 × 10^-4 that it will instead connect with a randomly selected other node among the 7000 proteins present in the fly interaction dataset. This is consistent with the larger than expected number of homodimers discussed above. The observation that the likelihood of a protein to interact with itself increases linearly with the total number of its interaction (binding) partners (Figure 1) contains important information about the general mechanisms of such interactions. We conjecture that every protein can be characterized by an intrinsic parameter that we refer to as its ‘stickiness’. This parameter quantifies a protein's overall propensity toward forming physical interactions. We further assume that both the probability of a protein to interact with itself and its probability to interact with other proteins are proportional to this stickiness (albeit with different coefficients, as we saw above) and thus should depend linearly on each other. 
This rather plausible conjecture of the existence of a ‘universal propensity toward interactions’ of individual proteins in an organism thus explains both the linear scaling in Figure 1 and our original observation that self-interacting proteins in several organisms tend to have a higher than average number of binding partners in the physical interaction network (Table 1). Indeed, by considering the homodimers, we automatically pick proteins with higher than average stickiness and thus end up with a subset of proteins characterized by a higher than average number of binding partners k. It is important to emphasize that the proposed ‘stickiness’ of a protein should not be interpreted literally, i.e. as the ability of a protein to bind other proteins unspecifically. In fact, all interactions in our datasets (with the exception of false positives) come from specific, functionally relevant binding between proteins. Instead, one should view the ‘stickiness’ as a complex quantitative characteristic of a protein, with contributions from such properties as the number and nature of its constituent domains, the hydrophobicity of its surface, the number of copies of the protein per cell, the extent of its evolutionary conservation, the overall level of ‘cooperativity’ of the functional task in which it is involved, etc. In some of our datasets (e.g. human), which are based on a large number of small-scale experiments instead of a single genome-wide assay, the ‘stickiness’ of a protein may also correlate with its overall popularity, i.e. the number of publications in which it was studied. Figure 2 shows the correlation between the propensity toward self-interaction and the number of binding partners in the human dataset. Here, as for the fly (see Figure 1), Pdimer(k) has a region of linear k-dependence. However, here this region is limited to small values of k. For larger values of k, Pdimer(k) starts to show saturation effects and completely saturates at 1 for k > 100. 
The saturation is expected to follow the linear region, since obviously no probability can exceed 1. Moreover, it can be qualitatively described by the following simple model. Suppose that each of the k interaction links starting at a given protein ends at the same protein with a probability pself, while with a probability 1 − pself it selects some other protein as its target. Then the chance that none of the k links results in the formation of a homodimer is (1 − pself)^k, while a homodimer is formed with a probability

    $$P_{\mathrm{dimer}}(k) = 1 - (1 - p_{\mathrm{self}})^{k}$$    (1)

    For k ≪ 1/pself, this expression yields a linear k-dependence for Pdimer(k), as was observed for the fly data (Figure 1). This general formula also fits Pdimer(k) nicely over the whole range of k (see the dashed lines in Figure 2). The fit with this formula provides an estimate of the propensity toward self-interaction among human proteins (see the pself values quoted in Figure 2), which is 10 times higher than in our fly dataset. This is why the saturation of Pdimer(k) is clearly visible in the human but not in the fly. However, owing to the vast differences in the extent of coverage and the sources of the data describing PPIs in the human (interacting protein pairs extracted from abstracts indexed in PubMed) and the fly (a genome-wide two-hybrid assay), different values of pself do not have to reflect actual differences between these two organisms. Finally, in Figure 3 we show the fraction of homodimers versus degree in our worm and yeast datasets. One can see that our previous observations remain valid. The worm dataset is well described by a linear scaling of Pdimer(k) with k, with a coefficient somewhere halfway between those of the fly and the human. The curve for the yeast closely follows that of the worm until its slope suddenly changes to a much smaller value around k = 10. The causes of this sudden change of behavior in yeast are unclear to us. It could be related to the popularity of yeast as a model eukaryotic organism: unlike in the worm or the fly, both large-scale and small-scale experimental techniques contribute significantly to our knowledge of PPIs in yeast.
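
    To make the two regimes concrete, Equation 1, read as Pdimer(k) = 1 − (1 − pself)^k, can be evaluated directly. The sketch below is illustrative only; the function names are hypothetical, and the pself values are those quoted for the fly fit (0.0035, Figure 1) and for one of the human fits (0.055, Figure 2).

```python
def p_dimer(k: int, p_self: float) -> float:
    """Probability that at least one of k links ends at the protein itself (Equation 1)."""
    return 1.0 - (1.0 - p_self) ** k


def p_dimer_linear(k: int, p_self: float) -> float:
    """Small-k approximation, valid when k is much smaller than 1 / p_self."""
    return p_self * k


# Fly-like regime: 1 / p_self is about 286, so k = 10 is deep in the linear region.
fly_exact = p_dimer(10, 0.0035)
fly_linear = p_dimer_linear(10, 0.0035)

# Human-like regime: 1 / p_self is about 18, so k = 100 is fully saturated.
human_exact = p_dimer(100, 0.055)

print(f"fly, k=10:    exact={fly_exact:.4f}  linear={fly_linear:.4f}")
print(f"human, k=100: exact={human_exact:.4f}")
```

    The linear form is the first term of the expansion of Equation 1 in powers of pself, which is why the fly curve with its small pself shows no visible saturation over the observed range of degrees.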

    Figure 1 The likelihood Pdimer(k) of a fly protein of a given degree k to self-interact plotted versus its degree k in the PPI network. The dashed line is the linear fit Pdimer(k) = 0.0035 k. To improve the statistics, the degree k in this and subsequent figures is logarithmically binned.
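
    The logarithmically binned fraction of homodimers plotted in this and the following figures could be computed along the following lines. This is a sketch under stated assumptions: the source does not give the bin boundaries, so base-2 bins are assumed here, and the function name and input layout are hypothetical.

```python
from collections import defaultdict
from math import floor, log2


def binned_dimer_fraction(degree, homodimers):
    """Fraction of homodimers among proteins in each logarithmic degree bin.

    degree     -- dict mapping protein id to its degree k (k >= 1)
    homodimers -- set of protein ids known to self-interact
    Bin b collects proteins with 2**b <= k < 2**(b + 1) (assumed binning).
    """
    totals = defaultdict(int)
    dimers = defaultdict(int)
    for protein, k in degree.items():
        if k < 1:
            continue  # proteins without interactions are not part of the network
        b = floor(log2(k))
        totals[b] += 1
        if protein in homodimers:
            dimers[b] += 1
    return {b: dimers[b] / totals[b] for b in sorted(totals)}


toy_degree = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 7, "F": 1}
toy_dimers = {"C", "E"}
fractions = binned_dimer_fraction(toy_degree, toy_dimers)
# bins: k=1 -> {A, F}, k=2..3 -> {B, C}, k=4..7 -> {D, E}
```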

    Figure 2 The fraction of homodimers Pdimer(k) among human proteins as a function of their degree k. Dashed and dotted–dashed lines are fits with Equation 1 for pself = 0.035 and pself = 0.055, respectively. The second value provides the best fit overall, while the first value better fits the low-k region. The inset demonstrates the linear scaling of log[1 − Pdimer(k)] with k in the large-k region, as expected from Equation 1. The solid line corresponds to pself = 0.051.

    Figure 3 The fraction of homodimers among yeast (open squares) and worm (filled circles) proteins plotted versus their total number of binding partners. The solid line corresponds to the linear fit to the worm data.

    Evolution of homodimers and interacting paralogs

    Interacting paralogous proteins (paralogous heterodimers) are often thought to be closely related to self-interacting proteins, or homodimers. Indeed, the duplication of a homodimer-encoding gene in evolution results in the appearance of a new pair (or several pairs for larger families) of interacting paralogous proteins. Such interaction links between paralogs can be destroyed with time, as the accumulation of mutations in the constituent proteins changes their 3D shapes. A binding between a pair of non-homodimeric paralogous proteins may also appear de novo after a duplication event. The relative importance of these two mechanisms of formation of paralogous heterodimers is not universally agreed upon. In this section we study the pairs of interacting paralogs present in our datasets. The purpose of this study is twofold:

    (i) We first make a number of empirical observations favoring the hereditary nature of interactions between paralogs and confirming the relationship between most such heterodimers and their homodimeric ancestors.

    (ii) We then use the set of proteins interacting with their paralogous partners to confirm and extend our empirical observations about homodimers discussed above. Owing to the incomplete and noisy nature of essentially any data describing genome-wide PPI networks, there is only a partial overlap between the sets of homodimers and interacting paralogs. Thus, adding the interacting paralogs to the set of homodimers allows us to considerably improve the statistics of our analysis.

    We first count the number of linked paralogous pairs, nlinked paralogs, in each dataset. If most links between paralogs were indeed inherited from homodimeric ancestors, nlinked paralogs should be significantly higher than nlinked random, the number of links one expects to find among the same number, Nparalogous pairs, of randomly selected pairs of non-paralogous proteins. Indeed, as we demonstrated in the previous sections, all four organisms included in our study are characterized by an unusually large number of homodimers. However, if most links between paralogous proteins were established de novo after duplication, there would be no reason to expect the number of such links to be unusually large compared with a random set of protein pairs. The results presented in Table 2 strongly support the hereditary rather than the de novo origin of most paralogous heterodimers: for all species, nlinked paralogs is much larger than nlinked random (by several orders of magnitude). Another strong argument for the hereditary hypothesis follows from Figure 4. This figure reveals that the further paralogs diverge in their amino acid sequences, the smaller the probability that they are linked to each other. This suggests that pairs of linked paralogs typically gradually lose inherited interactions rather than establish new ones. Thus, we conclude that most interacting paralogs present in our data were created by the duplication of homodimeric proteins. A final argument in support of this conclusion is that the average number of binding partners of interacting paralogs, klinked paralogs, is indistinguishable from that of homodimers, kdimer, and is 2–3 times higher than the average over the whole network (see Tables 1 and 2). Given that most paralogous heterodimers were at some point formed from homodimers, one might assume that most proteins involved in such heterodimeric complexes are homodimers. 
However, this is far from being the case (see Table 3). The discrepancy has two causes, one purely evolutionary and the other anthropogenic:

    (i) As a result of substitutions in its amino acid sequence, any protein might lose its ability to interact with its paralog or to homodimerize. From Figure 4, one can see that many ancient duplicates of homodimers have lost the links to their ancestors.

    (ii) The experimental data are far from complete, and many links, including self-interactions, are simply not registered. The comparison between the sets of homodimers and interacting paralogs may in principle be used to crudely estimate the completeness of our knowledge of the protein network in a given organism.

    Table 2 The number of linked pairs of paralogous proteins nlinked paralogs, the number of linked pairs nlinked random expected by pure chance alone, the average degree klinked paralogs of proteins known to interact with some of their paralogs and the average degree kdimer of self-interacting (dimer) proteins
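
    One simple way to obtain a chance expectation of this kind is to multiply the number of pairs by the overall link density of the network. The sketch below assumes that this is an adequate null model; the source does not spell out how nlinked random was computed, and the function names are hypothetical. With the fly figures quoted in the text (about 7000 proteins and 20 595 interactions) the density comes out near 0.8 × 10^-3, in line with the random probability quoted for the fly in the Figure 4 caption.

```python
def link_density(n_proteins: int, n_links: int) -> float:
    """Probability that a randomly chosen pair of distinct proteins is linked."""
    possible_pairs = n_proteins * (n_proteins - 1) / 2
    return n_links / possible_pairs


def expected_random_links(n_pairs: int, n_proteins: int, n_links: int) -> float:
    """Expected number of linked pairs among n_pairs randomly chosen protein pairs."""
    return n_pairs * link_density(n_proteins, n_links)


# Fly-like numbers taken from the text; the result is only an order-of-magnitude check.
fly_density = link_density(n_proteins=7000, n_links=20_595)
print(f"{fly_density:.2e}")

# Hypothetical example: 2000 candidate pairs in a toy network of 1000 proteins and 500 links.
toy_expected = expected_random_links(n_pairs=2000, n_proteins=1000, n_links=500)
```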

    Figure 4 The probability Plinked paralogs for two paralogous proteins to bind to each other versus their sequence similarity s for (top to bottom) human, yeast, worm and fly. Even the most distant paralogs are more likely to interact with each other than a randomly selected pair of proteins. The randomly expected probability is equal to 1.1 × 10^-3 in the human, 1.3 × 10^-3 in the yeast, 1.1 × 10^-3 in the worm and 0.8 × 10^-3 in the fly dataset.

    Table 3 Numbers of certain types of proteins for yeast, worm, fly and human

    DISCUSSION

    We demonstrated that self-interacting proteins tend to have connectivity significantly above the average in the PPI network. This phenomenon appears universally in the PPI networks of all four model organisms studied above. As a related phenomenon, we found that interacting paralogs also have increased connectivity, likely because most of them are descendants of ancient self-interacting proteins. We have also shown that the numbers of homodimers and interacting paralogs are both higher than expected by pure chance alone. We unify these phenomena by introducing the concept of a protein's ‘stickiness’, which measures its overall propensity for binding. Both the propensity of a protein toward self-interaction and its degree in the PPI network are proportional to this parameter. However, the dimerization probability apparently has a larger proportionality coefficient. This is not very surprising given the multitude of functional roles that dimers (or polymers) play in living cells. Dimerizing and oligomerizing proteins are ubiquitous in all organisms and are present in the most evolutionarily conserved protein complexes (3). On the evolutionary side, we have confirmed that most links between paralogs are most probably inherited from their dimerizing ancestors. This does not exclude the possibility that some of these links are formed after duplication as a result of random mutations, but the fraction of such de novo links is small. This conclusion has several implications for the network topology. If a given dimerizing protein has duplicated several times, this leads to the appearance of a fully interconnected complex, or clique, of paralogous heterodimers. In reality, some links inside this complex are lost owing to the divergence of the sequences of the paralogous proteins. Such loss of links may split a higher-order clique into several lower-order ones or turn it into a densely (yet not fully) interconnected motif. 
A higher density of links around dimers caused by these remaining heterodimeric links may provide a qualitative explanation for the empirically observed abundance of highly interconnected motifs and cliques in protein networks (15). Several simple models of network growth and evolution through gene duplication followed by functional divergence of the resulting pair of paralogous proteins lead to networks with an unrealistic bipartite topology, in which descendants of a particular protein never interact with their paralogs (19). Introduction of a large number of heterodimers to the ancestral network in these models generates frequent links between paralogs, which in the end gives rise to more realistic network topologies. Finally, we would like to speculate on a general role that highly connected self-interacting proteins might play in the cell. A single protein molecule can simultaneously bind only a limited number of partners, at most equal to the number of its functional domains. On the other hand, most biological processes require many different proteins, in numbers far greater than the binding capacity of a single protein molecule. The protein components of large signaling or biochemical pathways do not form large stable complexes containing all proteins simultaneously. Yet all the necessary molecules must be in physical proximity to each other to form a functional module. This contradiction poses a question: how can so many different proteins co-localize in a cell to correctly perform a physiological function? A possible answer involves highly connected self-interacting proteins serving as self-organizing centers for the co-localization of pathway components. The self-interaction (oligomerization) of such proteins might function as a general mechanism for sensing protein concentration (3). 
Indeed, a random increase of a local concentration of monomers leads to their oligomerization and subsequently to the increase in the concentration of binding sites for other pathway components, increasing in turn their effective concentration.

    SUPPLEMENTARY MATERIAL

    Supplementary Material is available at NAR Online.

    ACKNOWLEDGEMENTS

    This work was supported by grant 1 R01 GM068954-01 from NIGMS. Work at Brookhaven National Laboratory was carried out under Contract no. DE-AC02-98CH10886, Division of Material Science, U.S. Department of Energy. Two of us (I.I. and I.M.) thank the Theory Institute for Strongly Correlated and Complex Systems at BNL for the hospitality and financial support during the visit in which some of this work was accomplished. Funding to pay the Open Access publication charges for this article was provided by the NIGMS grant.

    REFERENCES

    1. Milligan, G., Ramsay, D., Pascal, G., Carrillo, J.J. (2003) GPCR dimerisation. Life Sci., 74, 181–188.

    2. Ronnstrand, L. (2004) Signal transduction via the stem cell factor receptor/c-Kit. Cell Mol. Life. Sci., 61, 2535–2548.

    3. Marianayagam, N.J., Sunde, M., Matthews, J.M. (2004) The power of two: protein dimerization in biology. Trends in Biochem. Sci., 29, 618–625.

    4. Simon, A.M. and Goodenough, D.A. (1998) Diverse functions of vertebrate gap junctions. Trends Cell Biol., 12, 477–483.

    5. Amoutzias, G.D., Robertson, D.L., Oliver, S.G., Bornberg-Bauer, E. (2004) Convergent evolution of gene networks by single-gene duplications in higher eukaryotes. EMBO Rep., 5, 274–279.

    6. Mellado, M., Vila-Coro, A.J., Martinez, C., Rodriguez-Frade, J.M. (2001) Receptor dimerization: a key step in chemokine signaling. Cell Mol. Biol., 47, 575–582.

    7. Langer, J.A., Cutrone, E.C., Kotenko, S. (2004) The Class II cytokine receptor (CRF2) family: overview and patterns of receptor–ligand interactions. Cytokine Growth Factor Rev., 15, 33–48.

    8. Bochtler, M., Ditzel, L., Groll, M., Hartmann, C., Huber, R. (1999) The proteasome. Annu. Rev. Biophys. Biomol. Struct., 28, 295–317.

    9. Matadeen, R., Patwardhan, A., Gowen, B., Orlova, E.V., Pape, T., Cuff, M., Mueller, F., Brimacombe, R., van Heel, M. (1999) The Escherichia coli large ribosomal subunit at 7.5 Å resolution. Structure Fold Des., 7, 1575–1583.

    10. Bentley, G.A., Lewit-Bentley, A., Finch, J.T., Podjarny, A.D., Roth, M. (1984) Crystal structure of the nucleosome core particle at 16 Å resolution. J. Mol. Biol., 176, 55–75.

    11. Hattori, T., Ohoka, N., Inoue, Y., Hayashi, H., Onozaki, K. (2003) C/EBP family transcription factors are degraded by the proteasome but stabilized by forming dimer. Oncogene, 22, 1273–1280.

    12. Dunbar, A.Y., Kamada, Y., Jenkins, G.J., Lowe, E.R., Billecke, S.S., Osawa, Y. (2004) Ubiquitination and degradation of neuronal nitric-oxide synthase in vitro: dimer stabilization protects the enzyme from proteolysis. Mol. Pharmacol., 66, 964–969.

    13. Mura, C., Cascio, D., Sawaya, M.R., Eisenberg, D.S. (2001) The crystal structure of a heptameric archaeal Sm protein: implications for the eukaryotic snRNP core. Proc. Natl Acad. Sci. USA, 98, 5532–5537.

    14. Rubin, I. and Yarden, Y. (2001) The basic biology of HER2. Ann. Oncol., 12, Suppl. 1, S3–S8.

    15. Spirin, V. and Mirny, L.A. (2003) Protein complexes and functional modules in molecular networks. Proc. Natl Acad. Sci. USA, 100, 12123–12128.

    16. Wagner, A. (2003) How large protein interaction networks evolve. Proc. R. Soc. Lond. B, 270, 457–466.

    17. Maslov, S. and Sneppen, K. (2002) Specificity and stability in topology of protein networks. Science, 296, 910–913.

    18. Wuchty, S., Oltvai, Z.N., Barabasi, A.-L. (2003) Evolutionary conservation of motif constituents within the yeast protein interaction network. Nature Genet., 35, 176–179.

    19. Kim, J., Krapivsky, P.L., Kahng, B., Redner, S. (2002) Infinite-order percolation and giant fluctuations in a protein interaction network. Phys. Rev. E. Stat Nonlin. Soft Matter Phys., 66, 055101–055105.

    20. Novichkova, S., Egorov, S., Daraselia, N. (2003) MedScan, a natural language processing engine for MEDLINE abstracts. Bioinformatics, 19, 1699–1706.

    21. Daraselia, N., Yuryev, A., Egorov, S., Novichkova, S., Nikitin, A., Mazo, I. (2004) Extracting human protein interactions from MEDLINE using a full-sentence parser. Bioinformatics, 20, 604–611.

    22. Giot, L., Bader, J.S., Brouwer, C., Chaudhuri, A., Kuang, B., Li, Y., Hao, Y.L., Ooi, C.E., Godwin, B., Vitols, E., et al. (2003) A protein interaction map of Drosophila melanogaster. Science, 302, 1727–1736.

    23. Li, S., Armstrong, C.M., Bertin, N., Ge, H., Milstein, S., Boxem, M., Vidalain, P.O., Han, J.D., Chesneau, A., Hao, T., et al. (2004) A map of the interactome network of the metazoan C.elegans. Science, 303, 540–543.

    24. Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., Sakaki, Y. (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA, 98, 4569–4574.

    25. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

    26. Thompson, J.D., Higgins, D.G., Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

    27. Bennett, M.J., Choe, S., Eisenberg, D. (1994) Domain swapping: entangling alliances between proteins. Proc. Natl Acad. Sci. USA, 91, 3127–3131.

    (Iaroslav Ispolatov*, Anton Yuryev, Ilya )