Scientific Report
Scientific background and objectives
The last 25 years have witnessed the
development of potent algorithms for docking macromolecules [1,2]. Launched in 2001, the CAPRI
experiment (Critical Assessment of PRedicted Interactions) provides a
common ground for evaluating these methods and an incentive for new
methodology development [3].
The motivation is to build three-dimensional structures of molecular
machineries (the biologically active species) starting from their separate
macromolecular components. Such a tool is an essential
complement to experimental approaches as protein interaction maps
become available for whole proteomes and as the number of proteins with
known or modeled structures increases. The structures of the proteins
are generally available in their unbound form and may come from
high-resolution X-ray or NMR structure determination, but also from
low-resolution EM reconstruction or from homology modeling. In addition to
side-chain fluctuations, likely to occur between the free and bound forms, the
main chain can undergo conformational changes during association [4]. Structural elements may also be
missing in the unbound form, particularly in the case of low-resolution
structures. Protein loops are generally poorly defined in structures
resulting from homology modeling [5].
Evaluation of the docking results in CAPRI has established that some
complexes cannot be correctly predicted without explicitly taking into
account the internal flexibility of the receptor as well as that of the
ligand macromolecule [6]. This is difficult to achieve while preserving
the search speed necessary for post-genomics applications. Systematic
exploration, in terms of position and rotation, of the possible
arrangements of the two partners considered as rigid bodies already
requires the generation of hundreds of thousands of trial
conformations for the complex. Introducing every degree of protein
internal flexibility from the beginning of a systematic search
procedure is clearly not manageable due to combinatorial explosion.
However, there is still no systematic way to know which of these
internal degrees of freedom are essential for a particular docking
problem. In fact, incorporating the internal flexibility of proteins into
docking methods has been identified as a major bottleneck to progress in
the field, and this CECAM workshop was intended to tackle this problem
specifically. The other identified bottleneck, the determination of
scoring functions able to discriminate between a correct docking result
and a false positive, was not the subject of this workshop but was
nevertheless addressed in part during the discussions, since it is
hardly separable from the flexible docking problem.
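The combinatorial argument above can be made concrete with a rough count. The sketch below is only illustrative: the translation count, the 30-degree Euler-angle grid, and the number of rotamer-like states per torsion are all hypothetical values chosen for the example, not figures from any docking program discussed at the workshop.

```python
import math


def rigid_body_pose_count(n_translations: int, rotation_step_deg: float) -> int:
    """Rough number of trial poses for a rigid-body search.

    Assumption: rotations are sampled on a regular grid of three Euler
    angles (360 x 180 x 360 degrees). Real programs often use more
    uniform rotational samplings.
    """
    n_alpha = math.ceil(360 / rotation_step_deg)
    n_beta = math.ceil(180 / rotation_step_deg)
    n_gamma = math.ceil(360 / rotation_step_deg)
    return n_translations * n_alpha * n_beta * n_gamma


def flexible_pose_count(rigid_poses: int, n_torsions: int,
                        states_per_torsion: int) -> int:
    """Pose count when every rigid pose is combined with every
    combination of discrete internal torsion states."""
    return rigid_poses * states_per_torsion ** n_torsions


# Hypothetical numbers: 500 surface positions, 30-degree rotation step.
rigid = rigid_body_pose_count(500, 30.0)
# Adding only 10 torsions with 3 states each multiplies the search by 3**10.
flexible = flexible_pose_count(rigid, 10, 3)
print(rigid, flexible)
```

With these assumed values the rigid search already reaches hundreds of thousands of poses, and ten discretized torsions multiply that by about 59,000, which is why flexibility cannot simply be added to the systematic search.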
Many of the groups that develop
docking programs have already devised a way of considering side-chain
conformational rearrangement during docking, whether implicitly or
explicitly. Methods for considering higher levels of flexibility,
involving the rearrangement of segments of the protein backbone, i.e. loops, domains or the whole
protein, are currently being explored by an increasing number of groups.
At the same time, protein flexibility is also being addressed in other
fields of molecular modeling, whether per se, to better understand the
mechanism underlying induced fit, or to deal with related problems such as
small molecule docking or protein folding. Since these explorations are
not necessarily subject to the restrictions inherent to docking
methods (systematic search, need for speed), the information they
provide can be very helpful for the development of flexible docking
methods. The present workshop on flexible macromolecular docking came as
a convergence between these different approaches. Its aim was to
bring together the groups that have dealt with some aspects of
flexible docking and those working specifically on protein
flexibility, in order to share the experience accumulated by each of
them, to take stock of the methodologies available for handling protein
flexibility and evaluate their potential for flexible docking, to identify
the difficulties inherent to flexible macromolecular docking, and to define
the next steps to be taken in the field.
Program and Participants
The workshop gathered fifty participants, who were either developing
docking methods in relation to the CAPRI experiment, involved in specific
docking projects, or working specifically on protein flexibility in
neighboring fields. In addition to academic institutions, four
industrial companies were represented. The scientific background of the
participants ranged from biophysics to robotics, informatics and
genomics. The workshop consisted of a series of 35-minute oral
presentations followed by discussion sessions and poster sessions.
As a guideline for the workshop, three
principal sessions had been proposed (listed below). The issues raised
in each session were all largely covered by the speakers, though
each speaker generally addressed several of these issues. People
involved in the CAPRI experiment have already begun to address the
flexibility problem, and several groups involved in the macromolecular
docking field are also involved in small molecule docking or protein
folding. It must be emphasized that all the speakers made a significant
effort to focus their presentations on the theme of the
workshop and to extract from their work and experience what is related
to flexibility. This resulted in high-quality lectures
and related discussions.
1. Presentation and analysis of the possible types of deformations which
can be expected to occur within macromolecules during their association
- analysis of the impact of such deformations on the results of the
CAPRI experiment
- examples where conformational changes have or have not impeded good
prediction
2. How flexibility is accounted for in the present docking methods:
- implicitly or explicitly
- at the level of side chains, loops or domains
- at the refinement stage or during the docking process
- advantages and/or problems related to each level of representation, in
terms of prediction efficiency and processing time
3. Possible methods that can be used to treat protein flexibility at
different levels:
- local or global, full-atom or reduced representation,
harmonic or anharmonic movements
- experience accumulated in neighboring fields, small
molecule docking or protein folding
Results and Conclusion
The participants identified the following issues as key to
further advancing the flexible macromolecular docking problem.
How do we know a protein is flexible; how
can we know which parts are flexible?
Several methods presented by the participants can be helpful to address
these questions. The graph-theoretic algorithm FIRST [7]
[http://firstweb.asu.edu] has been specifically developed to identify
flexible regions in a protein. Other hints have been proposed, some of
them based on experimental data from NMR or biochemistry, others based on
theoretical calculations such as Molecular Dynamics (MD) simulation,
enhanced MD, normal mode analysis or Principal Component Analysis (PCA).
These calculations may be performed on a single protein or, more generally,
on a representative member of a protein family. Indirect indications of
flexibility based on genome analysis have also been reported. The
Evolutionary Trace (ET) method [8], based on the analysis of sequences of
divergently related proteins, identifies patches of residues involved in
the docking interface. In the case of flexible proteins, these
patches are sometimes located on the protein surface only after a
conformational change.
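As one simple illustration of how a simulated ensemble can hint at flexible regions, the sketch below computes a per-atom root-mean-square fluctuation (RMSF) over a set of conformations and flags atoms above a cutoff. RMSF is used here as a stand-in chosen for the example; it is not a method named in the workshop report. The function names, the 1.0 Å cutoff, and the assumption that the conformations are already superposed on a common frame are all hypothetical.

```python
import math


def per_atom_rmsf(ensemble):
    """Per-atom RMSF over an ensemble of conformations.

    ensemble: list of conformations, each a list of (x, y, z) tuples,
    with the same atom order in every conformation.
    Assumption: conformations are already superposed on a common frame.
    """
    n_conf = len(ensemble)
    n_atoms = len(ensemble[0])
    rmsf = []
    for i in range(n_atoms):
        # Mean position of atom i over the ensemble.
        mean = [sum(conf[i][k] for conf in ensemble) / n_conf for k in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(
            sum((conf[i][k] - mean[k]) ** 2 for k in range(3))
            for conf in ensemble
        ) / n_conf
        rmsf.append(math.sqrt(msd))
    return rmsf


def flexible_atoms(ensemble, threshold=1.0):
    """Indices of atoms whose RMSF exceeds the (hypothetical) threshold."""
    return [i for i, v in enumerate(per_atom_rmsf(ensemble)) if v > threshold]


# Toy ensemble: atom 0 is fixed, atom 1 moves by 4 units between snapshots.
toy = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
print(per_atom_rmsf(toy), flexible_atoms(toy))
```

In practice the flagged atoms would be grouped into residues or segments before deciding which parts of the protein to treat as flexible.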
If the flexibility characteristics are
known, what methods can be used?
Even if a protein has been identified as
internally flexible, the flexible parts may lie
outside the docking interface, as observed for one of the CAPRI targets.
In that case the flexibility need not be taken into account. Different
levels of flexibility should also be considered: side chains, loops,
domains or the whole protein. Concerning side chains, it is not clear from
the examples described during the workshop whether it is sufficient to
account implicitly for side-chain flexibility (using a soft
representation) or whether an explicit treatment is necessary. Side-chain
refinement was reported to improve the predictions in some cases
but to degrade them in others. However, a soft
representation is clearly not suited to loop or domain
movements, since the volume swept by such movements is very large. One
solution is to delete these flexible parts during a first step of
systematic rigid-body search, then possibly reintroduce them.
Alternatively, methods have been presented to explicitly account for
loop or domain movements during docking. In the case of loop movements, a
multicopy approach was used with pre-generated loop conformations. In
the case of domain movements of the hinge-bending type, a multi-component
docking approach was used. Interestingly, multi-component docking
appeared much more efficient in predicting correct protein arrangements
than successively docking pairs of separate protein elements.
Contributions from graph theory for multi-component docking, or from
robotics to generate possible deformations and articulated movements,
appear very promising. In the case of global deformations of the protein
backbone, docking on a sample of conformations derived from MD or from
PCA-enhanced MD appeared to improve the predictions, as did PCA-based
conformational adjustment performed during the docking process.
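The idea of docking against several pre-generated conformations (loop copies or MD snapshots) can be sketched generically as a loop over receptor copies and ligand poses that keeps the best-scoring combination. This is a minimal sketch under stated assumptions, not the procedure of any group at the workshop: `multicopy_dock` and the `score` callback are hypothetical names, lower scores are assumed to be better, and the one-dimensional toy "structures" only serve to make the example runnable.

```python
def multicopy_dock(receptor_copies, ligand_poses, score):
    """Exhaustively score every (receptor copy, ligand pose) pair.

    score: hypothetical callable (receptor, pose) -> float, lower is better.
    Returns (best_score, copy_index, pose_index).
    """
    best = None
    for copy_index, receptor in enumerate(receptor_copies):
        for pose_index, pose in enumerate(ligand_poses):
            s = score(receptor, pose)
            if best is None or s < best[0]:
                best = (s, copy_index, pose_index)
    return best


# Toy example: each "conformation" and "pose" is a single coordinate, and
# the score is simply their distance (purely illustrative).
copies = [1.0, 3.0]
poses = [0.0, 2.5]
print(multicopy_dock(copies, poses, lambda r, p: abs(r - p)))
```

The cost grows linearly with the number of copies, which is why such approaches depend on a small, well-chosen set of pre-generated conformations.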
In any case, any information that restricts the search space is
welcome. It allows efficient use of an all-atom, internal-variable
representation of the protein during the final stages of docking,
coupled with MD or minimization. Precise information on the docking
interface can be obtained by NMR or ET. Information can also be deduced
from published biochemical experiments. Such information must be used
with caution, since a wrong interpretation of the data systematically
leads to wrong predictions.
Other points under discussion
Several studies presented during the workshop aimed at better understanding the
induced fit process during macromolecular docking. In particular, it was
discussed whether protein deformation occurs as a result of the
modification of the external field sensed by the protein, or if the
docking process involves a selection between conformations already
present in the solution ensemble.
The importance of the scoring function was also emphasized. In intermediate
stages of the docking process, an inadequate scoring function may lead
to the rejection of conformations that would have led to correct
predictions. It was discussed whether a universal scoring function exists,
based on a precise account of all free energy components, or whether it is
more appropriate to use different scoring functions, adapted to the
different levels of protein representation during the docking process.
Finally, the need to learn from bad predictions was stressed.
In particular, it may be useful to develop a more detailed analysis of
the predictions. Poorly scored complexes containing zones with good
interactions coexisting with zones with bad interactions (“white” +
“black” = “grey”) should be distinguished from complexes with uniformly
poor interactions (all “grey”). This distinction is particularly
important in the case of flexible docking and may help identify
induced deformations in a protein. To perform such an analysis, much can
be learned from the huge amount of data generated by the various groups
for the CAPRI experiment and discarded when selecting the
structures to be submitted. Comparative studies underway between true
dimeric proteins and proteins subject to crystal packing interactions
are also very important for this purpose. It seems that the spatial
distribution of interactions is more important than simply the total sum
of interactions.
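The distinction between “white + black” and uniformly “grey” interfaces can be illustrated by looking at the spread of per-zone scores instead of only their sum. The sketch below is an assumption-laden toy: the per-zone scores are taken to lie between 0 (bad) and 1 (good), and the function name and both cutoffs are hypothetical, not values proposed at the workshop.

```python
from statistics import mean, pstdev


def classify_interface(zone_scores, poor_mean=0.5, spread_cutoff=0.3):
    """Classify a predicted interface from per-zone interaction scores.

    zone_scores: scores in [0, 1], one per interface zone (1 = good).
    poor_mean and spread_cutoff are hypothetical thresholds.
    """
    avg = mean(zone_scores)
    spread = pstdev(zone_scores)  # population standard deviation
    if avg >= poor_mean:
        return "good"
    # Same poor average, but very different spatial distributions:
    return "mixed" if spread > spread_cutoff else "uniformly poor"


# Two good zones next to four bad ones: poor on average but heterogeneous.
print(classify_interface([1.0, 0.0, 1.0, 0.0, 0.0, 0.0]))
# All zones mediocre: poor on average and homogeneous.
print(classify_interface([0.3, 0.35, 0.3, 0.35]))
```

A "mixed" result is the kind of case that might point to a locally correct interface disturbed by an unmodeled deformation, whereas a "uniformly poor" one is more likely a false positive.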
In conclusion, the workshop
successfully reached its goal of specifically addressing the problem of
accounting for flexibility in macromolecular docking. Thanks to the
high-level contributions of all participants, we were able to determine
the important issues to be addressed and to identify methods potentially
useful for solving them, as well as interesting directions to be explored.
We hope the constructive discussions that took place during the workshop
herald a new stage of development in the macromolecular docking field.
References
1. Smith GR, Sternberg MJE. Prediction of protein-protein interactions by
docking methods. Curr. Opin. Struct. Biol. 2002, 12, 28-35.
2. Wodak SJ, Janin J. Structural basis of macromolecular recognition.
Adv. Prot. Chem. 2002, 61, 9-68.
3. Janin J, Henrick K, Moult J, Eyck LT, Sternberg MJ, Vajda S, Vakser I,
Wodak SJ. CAPRI: A Critical Assessment of PRedicted Interactions.
Proteins 2003, 52, 2-9. [http://capri.ebi.ac.uk]
4. Betts MJ, Sternberg MJE. An analysis of conformational changes on
protein-protein association: implications for predictive docking.
Protein Engineering 1999, 12, 271-283. See also
http://molmovdb.org/molmovdb for an overview of protein movements.
5. Rodriguez R, Chinea G, Lopez N, Pons T, Vriend G. Homology modeling,
model and software evaluation: three related resources.
Bioinformatics 1998, 14, 523-528.
6. Mendez R, Leplae R, De Maria L, Wodak SJ. Assessment of blind
predictions of protein-protein interactions: Current status of
docking methods. Proteins 2003, 52, 51-67.
7. Jacobs DJ, Rader AJ, Kuhn LA, Thorpe MF. Protein flexibility
predictions using graph theory. Proteins 2001, 44, 150-165.
8. Lichtarge O, Bourne HR, Cohen FE. The evolutionary trace method defines
the binding surfaces common to a protein family.
J. Mol. Biol. 1996, 257, 342-358.
Abstracts and Posters (.pdf)
List of participants (.pdf)