Max-Planck-Institut für Informatik

Structure - Function Relationships in Proteins

Naturally occurring proteins form stable but flexible tertiary structures, and this flexibility is essential for protein function. For example, both local and large-scale motions can be observed when a protein binds to other proteins or to other molecules. For many proteins, more than one experimentally determined structural model is currently available. These alternative structures provide different snapshots of the protein in action and allow us to better understand protein motions and protein function. So far, these structures have not been compared and classified in a systematic way. We have developed a method, STRuster, that compares the alternative models available for each protein and clusters them according to backbone structural similarity. Models in the same cluster should correspond to similar structural or functional states, or to similar experimental conditions. The clustering provides insight into how proteins function at the molecular level. It is also useful for identifying suitable structural templates for structure prediction and for experimental structure determination by molecular replacement. The method has been published [1] and applied to the structural models available in SCOP, a protein structure classification database. The results are available on the web.

The method is based on the calculation of distance matrices between alpha-carbon (Cα) atoms. Two filters are applied when calculating the dissimilarity measure so that both large and small (but significant) backbone conformational changes are identified. The resulting dissimilarity values are used for hierarchical clustering and for partitioning around medoids (PAM). The silhouette width, a measure of cluster validity, is used to select the best number of clusters for the PAM algorithm. Hierarchical clustering reflects the hierarchy of similarities between all pairs of models, while PAM groups the models into the "optimal" number of clusters.
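
The pipeline described above can be illustrated with a minimal sketch. This is not the STRuster implementation: the two filters of the published dissimilarity measure are not reproduced (a plain mean absolute difference between Cα distance matrices is assumed instead), an exhaustive k-medoids search stands in for the PAM heuristic, and hierarchical clustering is omitted. All function names and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations

def distance_matrix(coords):
    """Pairwise distances between the C-alpha atoms of one model."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def dissimilarity(coords_a, coords_b):
    """Assumed stand-in measure: mean absolute difference between the
    two distance matrices (the published filters are NOT applied)."""
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(da)
    diffs = [abs(da[i][j] - db[i][j])
             for i in range(n) for j in range(i + 1, n)]
    return sum(diffs) / len(diffs)

def k_medoids(diss, k):
    """Exhaustive search for the k medoids with minimal total
    dissimilarity; only practical for small ensembles. Returns, for
    each model, the index of its nearest medoid as cluster label."""
    n = len(diss)
    medoids = min(
        combinations(range(n), k),
        key=lambda meds: sum(min(diss[i][m] for m in meds) for i in range(n)),
    )
    return [min(medoids, key=lambda m: diss[i][m]) for i in range(n)]

def silhouette_width(diss, labels):
    """Average silhouette width of a partition with at least 2 clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster contributes s(i) = 0
        a = sum(diss[i][j] for j in own) / len(own)
        b = min(sum(diss[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

def best_partition(diss, k_max):
    """Choose the number of clusters with the largest silhouette width."""
    candidates = {k: k_medoids(diss, k) for k in range(2, k_max + 1)}
    best_k = max(candidates, key=lambda k: silhouette_width(diss, candidates[k]))
    return best_k, candidates[best_k]

# Toy ensemble: four models of a three-residue chain (coordinates in
# angstroms, invented for illustration), two bent and two extended.
models = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.9, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.5, 0.0, 0.0)],
]
diss = [[dissimilarity(a, b) for b in models] for a in models]
best_k, labels = best_partition(diss, k_max=3)
```

In this toy ensemble the two bent models and the two extended models end up in separate clusters, and the silhouette width favours two clusters over three. Because the measure compares internal distances only, no prior superposition of the models is needed.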

Alternative structural models of human transferrin. Left: multiple structure comparison of the 20 models; the iron atom and the associated carbonate are shown in space-fill. Right: STRuster hierarchical classification. Two major clusters can be observed in the dendrogram (red and gray regions). Models shown in red in the structure comparison correspond to the apo form and match the red region in the dendrogram. Models shown in gray correspond to the iron-binding holo form and match the gray region in the dendrogram.


  1. Domingues FS, Rahnenführer J, Lengauer T.
    Automated clustering of ensembles of alternative models in protein structure databases.
    Protein Eng Des Sel. 2004 Jun;17(6):537-43. Epub 2004 Aug 19.