Protein structure prediction

Proteins are large macromolecules that are essential to the functioning of organisms and contribute to processes as diverse as the catalysis of reactions, immune response, structural support, transportation and messaging. The three-dimensional shape (tertiary structure) of a protein is crucial to its specific function. Computational protein structure prediction aims to develop methods that can reliably predict the tertiary structure of a protein from its amino acid sequence, and is a long-standing challenge in computational biology.

My research on computational protein structure prediction is centered around a number of different questions:

Knowledge-based energy functions for protein structure prediction are typically linear combinations of a number of different weighted energy terms. Are the current weight settings in these functions close to optimal, and do such optimal weight settings even exist? If they do not, can we benefit from multiobjective or interactive approaches that avoid the a priori selection of a single fixed weight setting? Our two PPSN 2008 papers (see References) explore the effect that the decomposition of an energy function (or of any other objective) has on the performance of a simple search method.
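To make the weighting question concrete, here is a minimal sketch in Python. It is not taken from the papers: the two energy terms, their values and the weight vectors are all hypothetical, and the function names are chosen only for illustration. It shows that the ranking of two conformations under a weighted sum can flip when the weights change, whereas a Pareto-dominance comparison of the decomposed terms needs no weights at all.

```python
from typing import Sequence


def weighted_energy(terms: Sequence[float], weights: Sequence[float]) -> float:
    """Scalar energy: a linear combination of weighted energy terms."""
    if len(terms) != len(weights):
        raise ValueError("terms and weights must have the same length")
    return sum(w * t for w, t in zip(weights, terms))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b, with every term to be minimised."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


# Hypothetical values of two energy terms for two conformations.
conf_a = (-12.0, 3.0)
conf_b = (-9.0, 1.0)

# Two hypothetical weight settings for the same pair of terms.
for weights in [(1.0, 1.0), (1.0, 3.0)]:
    e_a = weighted_energy(conf_a, weights)
    e_b = weighted_energy(conf_b, weights)
    better = "A" if e_a < e_b else "B"
    print(f"weights={weights}: E(A)={e_a:.1f}, E(B)={e_b:.1f}, lower energy: {better}")

# Without weights, neither conformation dominates the other.
print("A dominates B:", dominates(conf_a, conf_b))
print("B dominates A:", dominates(conf_b, conf_a))
```

With equal weights conformation A has the lower energy, while tripling the second weight favours B. A multiobjective method would retain both as mutually non-dominated candidates instead of committing to one weight setting in advance.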

How can we evaluate the performance of a given (single or multiobjective) energy function? What problems might we encounter if we compare energy functions on decoy sets of protein structures? Our 2009 Bioinformatics paper (see References) addresses some of these issues.

Fragment-assembly techniques currently represent the state of the art for de novo structure prediction, but they do not scale to large proteins or those with high contact order. Is this due to limitations of the energy functions, the optimization methods, the quality of the fragment libraries or a combination of these three factors? Answering these questions will require a better understanding of the working mechanisms of fragment-assembly methods, and is crucial to facilitate further improvement of these techniques. My recent work (see below) has analyzed the effects of fragment and insertion size on search space size and search performance.

References:

Julia Handl, Joshua Knowles, Robert Vernon, David Baker and Simon Lovell (2012). The dual role of fragments in fragment-assembly methods for de novo protein structure prediction. Proteins 80(2): 490-504

Julia Handl, Joshua Knowles and Simon Lovell (2009). Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction. Bioinformatics 25(10): 1271-1279

Julia Handl, Simon C. Lovell and Joshua Knowles (2008). Multiobjectivization by decomposition of scalar cost functions. Proceedings of the Tenth International Conference on Parallel Problem Solving from Nature (PPSN X).

Julia Handl, Simon C. Lovell and Joshua Knowles (2008). Investigations into the effect of multiobjectivization in protein structure prediction. Proceedings of the Tenth International Conference on Parallel Problem Solving from Nature (PPSN X).

Links and downloads:

Data sets:

Selected decoy sets from the Decoys 'R' Us repository.
Publicly available decoy sets generated by Rosetta (Rosetta all atom and Rosetta Tsai decoy sets). These are no longer available on the group pages of the Baker laboratory, but can be found here.
Loop decoy set.
Comparative modelling (MOULDER) decoy set.
Summaries of the Rosetta energies for the custom-designed decoy sets for 1zdd.

Software:

The R project for Statistical Computing. The specific functions used were cor(..,method="spearman") to compute Spearman rank correlations, cor.test(..,method="spearman") to test the statistical significance of a given correlation, and parcoord(..) from the MASS library to generate parallel axes plots.
The Rosetta method for protein structure prediction, version 2.3.
The TINKER molecular modelling software, which implements the Amber99 all-atom energy function.
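
For readers without R to hand, the following is a rough Python sketch of what a Spearman rank correlation between decoy energies and their distance to the native structure computes. It is an illustrative re-implementation, not the code used in the paper: the energy and RMSD values are hypothetical, and the helper names are my own. Ties receive average ranks, which is the usual convention for this statistic.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical decoy set: energy of each decoy and its RMSD to the native.
energies = [-120.5, -110.2, -98.7, -101.3, -85.0]
rmsds = [1.2, 3.8, 4.1, 2.5, 6.0]

rho = spearman(energies, rmsds)
print(f"Spearman rank correlation: {rho:.2f}")
```

A value close to 1 would indicate that lower energies tend to go with lower RMSD on this decoy set. The sketch does not compute a significance test of the kind provided by cor.test.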