CECAM Program

Scientific Report

Scientific background and objectives

The last 25 years have witnessed the development of powerful algorithms for docking macromolecules [1,2]. Launched in 2001, the CAPRI experiment (Critical Assessment of PRedicted Interactions) evaluates these methods on common ground and provides an incentive for the development of new methodology [3]. The aim is to build three-dimensional structures of molecular machines (the biologically active species) starting from their separate macromolecular components. Such a tool is an essential complement to experimental approaches as protein interaction maps become available for whole proteomes and as the number of proteins with known or modeled structures increases. Protein structures are generally available in their unbound form and may come from high-resolution X-ray or NMR structure determination, but also from low-resolution EM reconstruction or from homology modeling. In addition to the side-chain fluctuations that are likely to occur between the free and bound forms, the main chain can undergo conformational changes during association [4]. Structural elements may also be missing in the unbound form, particularly in the case of low-resolution structures, and protein loops are generally poorly defined in structures resulting from homology modeling [5].

Evaluation of the docking results in CAPRI has established that some complexes cannot be correctly predicted without explicitly taking into account the internal flexibility of both the receptor and the ligand macromolecule [6]. This is difficult to achieve while preserving the search speed necessary for post-genomic applications. Systematic exploration, in terms of position and rotation, of the possible arrangements of the two partners treated as rigid bodies already requires the generation of hundreds of thousands of trial conformations for the complex. Introducing every internal degree of freedom of the proteins from the beginning of a systematic search is clearly not manageable because of combinatorial explosion. However, there is still no systematic way to know which of these internal degrees of freedom are essential for a particular docking problem. Incorporating the internal flexibility of proteins in docking methods has in fact been identified as a major bottleneck to progress in the field, and this CECAM workshop was intended to tackle this problem specifically. The other identified bottleneck, the determination of scoring functions able to discriminate between a correct docking result and a false positive, was not the subject of this workshop but was nevertheless addressed in part during the discussions, since it is hardly separable from the flexible docking problem.
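The scale of the search described above can be illustrated with a rough count. The following Python sketch is only an order-of-magnitude estimate: the grid extent, grid spacing, angular step and the number of states per internal degree of freedom are assumed values chosen for illustration, not parameters reported at the workshop, and the rotation count is a crude volume-based approximation.

```python
import math

def n_translations(extent, spacing):
    """Grid points in a cube of side `extent` sampled every `spacing` (same units)."""
    per_axis = int(extent // spacing) + 1
    return per_axis ** 3

def n_rotations(step_deg):
    """Crude estimate: volume of rotation space (8*pi^2) divided by the cell volume."""
    step = math.radians(step_deg)
    return math.ceil(8 * math.pi ** 2 / step ** 3)

def n_trial_conformations(extent, spacing, step_deg,
                          n_internal=0, states_per_dof=3):
    """Rigid-body poses multiplied by the states of each internal degree of freedom."""
    rigid = n_translations(extent, spacing) * n_rotations(step_deg)
    return rigid * states_per_dof ** n_internal

# Assumed values: 20 A cube sampled every 4 A, 20 degree angular step.
rigid_only = n_trial_conformations(20.0, 4.0, 20.0)
# Same search with 10 internal degrees of freedom, 3 states each (hypothetical).
with_flex = n_trial_conformations(20.0, 4.0, 20.0, n_internal=10)
print(rigid_only, with_flex)
```

Even with this coarse sampling the rigid-body count falls in the hundreds of thousands, and each additional internal degree of freedom multiplies it by the number of states sampled, which is the combinatorial explosion referred to above.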

Many of the groups that develop docking programs have already devised ways of considering side chain conformational rearrangement during docking, whether implicitly or explicitly. Methods for considering higher levels of flexibility, involving the rearrangement of segments of the protein backbone (loops, domains or the whole protein), are currently being explored by an increasing number of groups. At the same time, protein flexibility is also being addressed in other fields of molecular modeling, whether for its own sake, to better understand the mechanism underlying induced fit, or to deal with related problems such as small molecule docking or protein folding. Since these explorations are not necessarily subject to the restrictions inherent in docking methods (systematic search, need for speed), the information they provide can be very helpful for the development of flexible docking methods. The present workshop on flexible macromolecular docking arose from the convergence of these different approaches. Its aim was to bring together the groups that have experience with some aspects of flexible docking and those working specifically on protein flexibility, in order to share the experience accumulated by each of them, to take stock of the methodologies for handling protein flexibility and evaluate their potential for flexible docking, to identify the difficulties inherent in flexible macromolecular docking, and to define the next steps to be taken in the field.

Program and Participants

The workshop gathered fifty participants who were developing docking methods in relation to the CAPRI experiment, were involved in specific docking projects, or were working specifically on protein flexibility in neighboring fields. In addition to academic institutions, four industrial companies were represented. The scientific background of the participants ranged from biophysics to robotics, informatics and genomics. The workshop consisted of a series of 35 min oral presentations followed by discussion sessions and poster sessions.

Three main sessions had been proposed as a guideline for the workshop (listed below). The issues raised in each session were all extensively covered by the speakers, although each speaker generally addressed several of them. Participants in the CAPRI experiment have already begun to address the flexibility problem, and several groups working on macromolecular docking are also involved in small molecule docking or protein folding. It must be emphasized that all the speakers made a considerable effort to focus their talks on the theme of the workshop and to extract from their work and experience what relates to flexibility. This resulted in lectures and discussions of high quality.

1. Presentation and analysis of the possible types of deformations which can be expected to occur within macromolecules during their association
- analysis of the impact of such deformations on the results of the CAPRI experiment
- examples where conformational changes have or have not impeded good prediction

2. How flexibility is accounted for in the present docking methods:
- implicitly or explicitly
- at the level of side chains, loops or domains
- at the refinement stage or during the docking process
- advantages and/or problems related with each level of representation, in terms of prediction efficiency and processing time

3. Possible methods that can be used to treat protein flexibility at different levels:
- local or global, full-atom or reduced representation, harmonic or anharmonic movements
- experience accumulated in neighboring fields, small molecule docking or protein folding

Results and Conclusion

The participants identified the following issues as key to further progress on the flexible macromolecular docking problem.

How do we know a protein is flexible; how can we know which parts are flexible?

Several methods presented by the participants can help to address these questions. The graph-theoretic algorithm FIRST [7] [http://firstweb.asu.edu] has been specifically developed to identify flexible regions in a protein. Other indicators have been proposed, some based on experimental data from NMR or biochemistry, others based on theoretical calculations such as Molecular Dynamics (MD) simulation, enhanced MD, normal mode analysis or Principal Component Analysis (PCA). These calculations may be performed on a single protein or, more generally, on a representative of a family of proteins. Indirect indications of flexibility based on genome analysis have also been reported. The Evolutionary Trace (ET) method [8], based on the analysis of sequences of divergently related proteins, identifies patches of residues involved in the docking interface. In flexible proteins, these patches are sometimes located at the protein surface only after a conformational change.
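As a simple illustration of how an ensemble of conformations (for example, snapshots from an MD simulation) can hint at flexible regions, the Python sketch below computes per-residue positional fluctuations and flags residues above a cutoff. It is not the FIRST algorithm, nor any method presented at the workshop; the function names, the toy coordinates and the threshold are hypothetical, and the conformations are assumed to be already superimposed on a common reference frame.

```python
import math

def rmsf_per_residue(ensemble):
    """Root-mean-square fluctuation of each residue over an ensemble.

    `ensemble` is a list of conformations, each a list of (x, y, z) positions,
    one per residue, assumed to be superimposed beforehand.
    """
    n_conf = len(ensemble)
    n_res = len(ensemble[0])
    rmsf = []
    for i in range(n_res):
        # Mean position of residue i over the ensemble
        mean = [sum(conf[i][k] for conf in ensemble) / n_conf for k in range(3)]
        # Mean squared deviation from that mean position
        msd = sum(
            sum((conf[i][k] - mean[k]) ** 2 for k in range(3))
            for conf in ensemble
        ) / n_conf
        rmsf.append(math.sqrt(msd))
    return rmsf

def flexible_residues(rmsf, threshold):
    """Indices of residues whose fluctuation exceeds `threshold` (hypothetical cutoff)."""
    return [i for i, value in enumerate(rmsf) if value > threshold]

# Toy ensemble: three conformations of four residues; only residue 2 moves.
ensemble = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (2, 2, 0), (3, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (2, -2, 0), (3, 0, 0)],
]
rmsf = rmsf_per_residue(ensemble)
print(flexible_residues(rmsf, threshold=1.0))
```

A fluctuation profile of this kind only indicates where the sampled ensemble varies; whether the flagged region matters for docking still depends on whether it lies at the interface.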

If the flexibility characteristics are known, what methods can be used?

Even if a protein has been found to be internally flexible, the flexible parts may be located outside the docking interface, as observed for one of the CAPRI targets. In this case the flexibility need not be taken into account. Different levels of flexibility should also be considered: side chains, loops, domains or the whole protein. Concerning side chains, it is not clear from the examples described during the workshop whether it is sufficient to account for side chain flexibility implicitly (using a soft representation) or whether an explicit treatment is necessary. Side chain refinement has been reported to improve the predictions in some cases, but in other cases it could degrade them. However, a soft representation is clearly not suited to loop or domain movements, since the volume swept by such moves is very large. One solution is to delete these flexible parts during a first step of systematic rigid body search, then possibly reintroduce them. Alternatively, methods have been presented that explicitly account for loop or domain movements during docking. In the case of loop movement, a multicopy approach was used with pre-generated loop conformations. In the case of domain movements of the hinge-bending type, a multi-component docking approach was used. Interestingly, multi-component docking appeared much more efficient in predicting correct protein arrangements than successively docking pairs of separate protein elements. Contributions from graph theory to multi-component docking, and from robotics to the generation of possible deformations and articulated movements, appear very promising. In the case of global deformations of the protein backbone, docking on a sample of conformations generated by MD or by PCA-enhanced MD appeared to improve the predictions, as did PCA-based conformational adjustment performed during the docking process.
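The idea behind a multicopy treatment of loops can be sketched as follows: for a given rigid-body pose of the ligand, each pre-generated loop conformation is scored and the best one is retained. The Python code below is a minimal sketch under stated assumptions, not the method presented at the workshop; the scoring function, its distance cutoffs and penalty, and the toy coordinates are all hypothetical.

```python
import math

def toy_score(loop, ligand, clash=2.0, contact=4.0):
    """Hypothetical score: +1 per loop-ligand pair in contact range, -10 per clash."""
    score = 0.0
    for a in loop:
        for b in ligand:
            d = math.dist(a, b)
            if d < clash:
                score -= 10.0   # steric overlap is penalized
            elif d < contact:
                score += 1.0    # favorable contact
    return score

def best_loop_copy(loop_copies, ligand, score=toy_score):
    """Return (index, score) of the best pre-generated loop copy for one ligand pose."""
    scores = [score(loop, ligand) for loop in loop_copies]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# One ligand pose (two pseudo-atoms) and three pre-generated loop conformations.
ligand_pose = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
loop_copies = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],    # within contact range of the ligand
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],    # clashes with the ligand
    [(0.0, 0.0, -5.0), (1.0, 0.0, -5.0)],  # too far away to interact
]
print(best_loop_copy(loop_copies, ligand_pose))
```

In a full search this selection would be repeated for every rigid-body pose, so the cost grows linearly with the number of loop copies rather than combinatorially with the loop's internal degrees of freedom.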

In all cases, any information that restricts the search space is welcome. It allows efficient use of an all-atom, internal variable representation of the protein during the final stages of docking, coupled with MD or minimization. Precise information on the docking interface can be obtained by NMR or ET. Information can also be deduced from published biochemical experiments. Such information must be used with caution, since a wrong interpretation of the data systematically leads to wrong predictions.

Other points under discussion

Several studies presented during the workshop aimed at better understanding the induced fit process during macromolecular docking. In particular, it was discussed whether protein deformation occurs as a result of the modification of the external field sensed by the protein, or if the docking process involves a selection between conformations already present in the solution ensemble.

The importance of the scoring function was also emphasized. In intermediate stages of the docking process, an inadequate scoring function may lead to the rejection of conformations that would have led to correct predictions. It was discussed whether a universal scoring function exists, based on a precise account of all free energy components, or whether it is more appropriate to use different scoring functions adapted to the different levels of protein representation during the docking process.

Finally, the need to learn from bad predictions was stressed. In particular, a more detailed analysis of the predictions may be useful. Poorly scored complexes in which zones of good interactions coexist with zones of bad interactions (“white” + “black” = “grey”) should be distinguished from complexes with uniformly poor interactions (all “grey”). This distinction is particularly important in flexible docking and may help to identify induced deformations in a protein. For such an analysis, much can be learned from the large amount of data generated by the various groups for the CAPRI experiment and discarded when selecting the structures to be submitted. Comparative studies under way between true dimeric proteins and proteins subject to crystal packing interactions are also very important for this purpose. It seems that the spatial distribution of interactions is more important than simply their total sum.
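The distinction between a "white + black" interface and an "all grey" one can be made concrete by looking at the spread of per-zone scores rather than only their average. The Python sketch below is purely illustrative: the partition of the interface into zones, the score scale from 0 (bad) to 1 (good) and the spread cutoff are assumptions, not quantities defined at the workshop.

```python
from statistics import mean, pstdev

def classify_interface(zone_scores, spread_cutoff=0.25):
    """Summarize per-zone interaction quality (0 = 'black', 1 = 'white').

    Returns (average, spread, label). Two interfaces with the same average
    can differ strongly in spread; the cutoff is a hypothetical value.
    """
    avg = mean(zone_scores)
    spread = pstdev(zone_scores)
    label = "mixed" if spread > spread_cutoff else "uniform"
    return avg, spread, label

# Same average score, very different spatial distribution.
white_plus_black = [1.0, 1.0, 0.0, 0.0]   # good zones coexisting with bad zones
all_grey = [0.5, 0.5, 0.5, 0.5]           # uniformly mediocre interactions

print(classify_interface(white_plus_black))
print(classify_interface(all_grey))
```

Both toy interfaces have the same mean score, so a total or average score alone cannot separate them; only the spread does, which is consistent with the remark that the spatial distribution of interactions matters more than their sum.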

In conclusion, the workshop reached its goal of specifically addressing the problem of accounting for flexibility in macromolecular docking. Thanks to the high-level contributions of all participants, we were able to determine the important issues to be addressed and to identify methods potentially useful for solving them, as well as promising directions to be explored. We hope the constructive discussions that took place during the workshop herald a new stage of development in the macromolecular docking field.

References

1. Smith GR, Sternberg MJE. Prediction of protein-protein interactions by docking methods. Curr. Opin. Struct. Biol. 2002, 12, 28-35

2. Wodak SJ, Janin J. Structural basis of macromolecular recognition. Adv. Prot. Chem. 2002, 61, 9-68

3. Janin J, Henrick K, Moult J, Eyck LT, Sternberg MJ, Vajda S, Vakser I, Wodak SJ. CAPRI: A Critical Assessment of PRedicted Interactions. Proteins 2003, 52, 2-9. [http://capri.ebi.ac.uk]

4. Betts MJ, Sternberg MJE. An analysis of conformational changes on protein-protein association: implications for predictive docking. Protein Engineering 1999, 12, 271-283
see also http://molmovdb.org/molmovdb for an overview of protein movements

5. Rodriguez R, Chinea G, Lopez N, Pons T, Vriend G. Homology modeling, model and software evaluation: three related resources. Bioinformatics 1998, 14, 523-528

6. Mendez R, Leplae R, De Maria L, Wodak SJ. Assessment of blind predictions of protein-protein interactions: Current status of docking methods. Proteins 2003, 52, 51-67

7. Jacobs DJ, Rader AJ, Kuhn LA, Thorpe MF. Protein flexibility predictions using graph theory. Proteins 2001, 44, 150-165

8. Lichtarge O, Bourne HR, Cohen FE. The evolutionary trace method defines the binding surfaces common to a protein family. J. Mol. Biol. 1996, 257, 342-358
