Identification of computational hot spots in protein interfaces: English article with Chinese translation

Identification of computational hot spots in protein interfaces: combining solvent accessibility and inter-residue potentials improves the accuracy


ABSTRACT: Motivation: Hot spots are residues that comprise only a small fraction of an interface yet account for the majority of the binding energy. These residues are critical to understanding the principles of protein interactions. Experimental studies such as alanine scanning mutagenesis require significant effort; therefore, there is a need for computational methods to predict hot spots in protein interfaces.

Results: We present a new, intuitive and efficient method to determine computational hot spots based on conservation (C), solvent accessibility [accessible surface area (ASA)] and statistical pairwise residue potentials (PP) of the interface residues. Combinations of these features are examined comprehensively to study their effect on hot spot detection. The predicted hot spots match the experimental hot spots with an accuracy of 70% and a precision of 64% in the Alanine Scanning Energetics Database (ASEdb), and an accuracy of 70% and a precision of 73% in the Binding Interface Database (BID). Several machine learning methods are also applied to predict hot spots. The performance of our empirical approach exceeds that of the learning-based methods and of other existing hot spot prediction methods. Residue occlusion from solvent in the complexes and pairwise potentials are found to be the main discriminative features in hot spot prediction.

Conclusion: Our empirical method is a simple approach to hot spot prediction that nonetheless offers high accuracy and computational efficiency. We believe that this method provides insights for researchers working on the characterization of protein binding sites and the design of specific therapeutic agents for protein interactions.

1 INTRODUCTION

Proteins function by interacting with other molecules through their interfaces. Studies on protein interfaces have revealed that binding energies are not uniformly distributed. Instead, there are certain critical residues called hot spots, comprising only a small fraction of interfaces yet accounting for the majority of the binding energy (Bogan and Thorn, 1998; Clackson and Wells, 1995). Experimentally, a hot spot can be found by evaluating the free energy change upon mutating the residue to alanine; hot spots play key roles in the stability of the protein association. Thorn and Bogan (2001) deposited hot spots from alanine scanning mutagenesis experiments in the Alanine Scanning Energetics Database (ASEdb). The Binding Interface Database (BID) (Fischer et al., 2003) presents experimentally verified hot spots at interfaces collected from the literature.
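The experimental definition above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by ASEdb or BID: the 2.0 kcal/mol cutoff is an assumption (a common convention that this passage does not state), and the `AlanineMutation` record, the residue labels and the energy values are hypothetical.

```python
from dataclasses import dataclass

# Assumption: 2.0 kcal/mol cutoff; the passage above does not state a threshold.
DDG_CUTOFF_KCAL_PER_MOL = 2.0


@dataclass(frozen=True)
class AlanineMutation:
    """One alanine scanning measurement (hypothetical record layout)."""
    residue: str  # residue label, e.g. "TRP104" (made up for illustration)
    ddg: float    # change in binding free energy upon mutation to alanine, kcal/mol


def is_hot_spot(mutation, cutoff=DDG_CUTOFF_KCAL_PER_MOL):
    """Return True when the mutation destabilizes binding by at least `cutoff`."""
    return mutation.ddg >= cutoff


def label_hot_spots(mutations, cutoff=DDG_CUTOFF_KCAL_PER_MOL):
    """Map each residue label to its experimental hot spot label."""
    return {m.residue: is_hot_spot(m, cutoff) for m in mutations}


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    measurements = [
        AlanineMutation("TRP104", 3.1),
        AlanineMutation("SER55", 0.2),
        AlanineMutation("ARG43", 2.0),
    ]
    for residue, hot in label_hot_spots(measurements).items():
        print(residue, "hot spot" if hot else "not a hot spot")
```

Passing a different `cutoff` shows how sensitive the set of experimental hot spots is to the chosen threshold.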

Analysis of the amino acid composition of hot spots shows that some residues are more favorable. The most frequent ones, Tyr, Arg and Trp, are critical due to their size and conformation in hot spots (Bogan and Thorn, 1998). In addition, Bogan and Thorn reported that hot spots are surrounded by energetically less important residues that most likely serve to occlude bulk solvent from the hot spots. Occlusion of solvent is found to be a necessary condition for highly energetic interactions. Hot spot information from experimental studies is available for only a very limited number of complexes; therefore, there is a need for computational methods to identify hot spots at protein interaction sites (DeLano, 2002). In a pioneering work, Kortemme and Baker (2002) proposed a physical model (Robetta) to detect hot spots at protein–protein interfaces, accounting for the energies of packing interactions, hydrogen bonds and solvation. The computational hot spots, that is, the residues identified computationally with their model, agree with the experimental hot spots in ASEdb. Similarly, Gao et al. (2004) used non-covalent interactions to estimate the energetic contribution of interfacial residues to binding. They reported an 88% success rate in predicting hot spots obtained from alanine scanning mutagenesis experiments (Gao et al., 2004). Another energy-based model, developed by Serrano and co-workers (Guerois et al., 2002), was used to predict the energetic effect of mutations on protein complexes. The calculated energy changes of mutations agreed well with the experimental results. Their method is applicable to hot spot prediction as well.
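Agreement between computational and experimental hot spots is usually summarized with metrics such as accuracy and precision. The sketch below uses the standard confusion-matrix definitions, which are assumed here because this passage does not spell out how any of the cited methods computed their scores. The function names and the example labels are hypothetical.

```python
def confusion_counts(predicted, experimental):
    """Count (tp, fp, tn, fn) for two equal-length sequences of booleans."""
    if len(predicted) != len(experimental):
        raise ValueError("predicted and experimental must have the same length")
    tp = fp = tn = fn = 0
    for pred, truth in zip(predicted, experimental):
        if pred and truth:
            tp += 1
        elif pred and not truth:
            fp += 1
        elif not pred and not truth:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(predicted, experimental):
    """Fraction of residues whose predicted label matches the experimental one."""
    tp, fp, tn, fn = confusion_counts(predicted, experimental)
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def precision(predicted, experimental):
    """Fraction of predicted hot spots that are experimental hot spots."""
    tp, fp, _, _ = confusion_counts(predicted, experimental)
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    # Made-up labels for five interface residues (True = hot spot).
    predicted = [True, True, False, False, True]
    experimental = [True, False, False, True, True]
    print("accuracy:", accuracy(predicted, experimental))
    print("precision:", precision(predicted, experimental))
```

Because hot spots make up only a small fraction of an interface, accuracy can look high even for a predictor that rarely predicts hot spots, so reporting precision alongside it is more informative.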