Topics to be discussed: Introduction, Comparative modeling, Homology modeling, Energy Minimization, Validation methods, Protein modeling tools
Introduction
Protein structure prediction is the prediction of the three-dimensional structure of a protein from its amino acid sequence, that is, the prediction of its tertiary structure from its primary structure.
Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry. It is highly important in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes).
The two most common experimental methods for determining a protein structure are X-ray diffraction and nuclear magnetic resonance (NMR).
The resulting structure is stored in a database (PDB) as data giving the x, y, z coordinates of each atom.
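As a minimal sketch of how those per-atom coordinates can be read, the Python below slices the fixed-width ATOM/HETATM records of a PDB-format file. The function names and the two example records (including their coordinate values) are made up for illustration; a real project would more likely use an established parser.

```python
def parse_atom_line(line):
    """Parse one fixed-width PDB ATOM/HETATM record into a dict."""
    return {
        "name": line[12:16].strip(),      # atom name, columns 13-16
        "res_name": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],                # chain identifier, column 22
        "res_seq": int(line[22:26]),      # residue number, columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
    }


def read_atoms(pdb_text):
    """Return one dict per ATOM/HETATM record found in the text."""
    return [
        parse_atom_line(line)
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]


# Illustrative records only; the coordinates are invented for this sketch.
EXAMPLE = (
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N\n"
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C\n"
)

if __name__ == "__main__":
    for atom in read_atoms(EXAMPLE):
        print(atom["name"], atom["res_name"], atom["x"], atom["y"], atom["z"])
```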
March 25, 2012 3
We can apply docking algorithms to the structures (both with other proteins and with small molecules)
Comparative Modeling
Two primary methods:
1) Homology modeling
2) Threading (fold recognition)
Note: both rely on the availability of experimentally determined structures that are "homologous" or at least structurally very similar to the target.
Assumptions & Principles
- An increase in sequence identity correlates with an increase in structural similarity.
- The RMSD of core α-carbon coordinates for two homologous proteins sharing 50% identity is expected to be ~1 Å.
- Theoretical models are low resolution and depend on the quality of the input alignment!
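For reference, the RMSD between two equally sized sets of atom positions is the square root of the mean squared distance between paired atoms. The sketch below assumes the two structures have already been superimposed and that the atoms are listed in corresponding order; it does not perform the superposition itself, and the function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Assumes both structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Two toy "structures" whose atoms are all displaced by 1 unit along z.
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```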
What is Homology Modelling?
Given an unknown protein, make an informed guess on its 3D structure based on its sequence:
- Search structure databases for homologous sequences
- Transfer coordinates of the known protein onto the unknown
Stages in Homology Modeling
- Identify templates (or parents)
- Align the target sequence with the parent(s)
- Find the structurally conserved regions (SCR) and the structurally variable regions (SVR)
- Inherit the SCRs from the parent(s)
- Build the SVRs
- Build the side chains
- Refine the model
- Evaluate errors in the model
Identify templates (or parents)
- The first step in homology modeling is the assignment of the unknown protein structure to a protein family.
- It is then necessary to compare the new sequence with thousands of sequences already stored in protein databases and to identify, if possible, homologous ones.
- Rapid searching procedures have been developed for this purpose, such as FASTP and BLASTP.
Align the target sequence with the parent(s)
The central technique used for amino acid sequence comparison is sequence alignment.
Within the framework of homology modeling, the sequence alignment procedure is important for several reasons:
- Sequence alignment is used to search databases to find related sequences and to identify which regions of the detected proteins are conserved, thus suggesting where the unknown protein may also be structurally conserved.
- Sequence alignment is used to detect correspondences between amino acids of the structurally known reference protein and those of the protein to be modeled.
Align the target sequence with the parent(s)
A scoring scheme is required that dictates the weight for aligning a particular type of amino acid with another; such schemes are called homology matrices.
Homology matrices make use of the most probable amino acid substitutions according to physical, chemical or statistical properties.
Matrices in use:
- Identity matrix
- Codon substitution matrix
- Mutation matrix (also known as the Dayhoff or PAM250 matrix)
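To make the role of the scoring scheme concrete, here is a small sketch of a global (Needleman-Wunsch style) alignment that uses the identity matrix: 1 for identical residues, 0 otherwise. The linear gap penalty of -1, the function name and the test sequences are assumptions chosen for illustration; substituting a PAM250 lookup for the match/mismatch test would give a mutation-matrix alignment instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=-1):
    """Global alignment with identity-matrix scoring and a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACDE", "ADE"))
```

The gap penalty matters as much as the matrix: with no penalty, the alignment could insert gaps freely to maximise identities, which is why identity percentages are quoted "with appropriate gaps".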
Doolittle Rules of Thumb
Rules of thumb which can ease the decision:
- If the sequences are longer than 100 residues and are found to be more than 25% identical (with appropriate gaps), then they are very likely related.
- If the identity is in the range of 15-25%, then the sequences may still be related.
- If the sequences are less than 15% identical, they are probably not related.
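The three rules above can be written down as a small helper, sketched below. Two choices here are assumptions rather than part of the rules: percent identity is computed over aligned columns where neither sequence has a gap, and sequences above 25% identity but no longer than 100 residues are reported as undetermined because the rules do not cover that case. The function names are illustrative.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns without a gap ('-')."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no aligned residue pairs")
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def doolittle_verdict(identity_pct, length):
    """Apply the rules of thumb to a percent identity and a sequence length."""
    if identity_pct > 25:
        if length > 100:
            return "very likely related"
        # Assumption: the rules only speak about sequences longer than 100 residues.
        return "undetermined (sequence not longer than 100 residues)"
    if identity_pct >= 15:
        return "may still be related"
    return "probably not related"


if __name__ == "__main__":
    print(doolittle_verdict(30.0, 150))
    print(doolittle_verdict(20.0, 150))
    print(doolittle_verdict(10.0, 150))
```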
Construction of Structurally Variable Regions (SVRs)
- Structurally variable regions (SVRs) show little or no sequence homology and are the sites of addition and deletion of residues.
- SVRs occur preferentially in loop regions.
- A variety of methods for generating loops have been developed.
- Extensive investigations of variable regions in homologous proteins have shown that, where particular loops possess the same length and amino acid character, their conformation will be the same. The coordinates can then be transferred directly to the model protein.
If no comparable loop exists in the protein family, two other strategies can be applied for modeling the SVRs: the loop search method and de novo loop segment generation.
Loops generated by database or random search methods are usually far from optimal geometry. Loop regions (including the confining residues) must subsequently be refined by energy minimization techniques in order to remove steric hindrance and to relax the loop conformations.
In this work we examined differences in the structures of amino acid side chains around point mutations.
Conformation - a given set of dihedral angles which defines a structure.
Rotamer - an energetically favourable conformation.
(Figure: side-chain examples, Asn and Phe)
Construction of the side chains is done using the template structures when there is high similarity between the built protein and the templates.
Without such similarity, the construction can be done using rotamer libraries.
A compromise between the probability of the rotamer and its fitness in a specific position determines the score. Comparing the scores of all the rotamers for a given amino acid determines the preferred rotamer.
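One way to picture that compromise is a score that adds a library term to an environment term and picks the rotamer with the lowest total. The sketch below is purely illustrative: the functional form (negative log probability plus a weighted environment energy), the field names and the candidate values are all assumptions, not the scoring function of any particular modeling tool.

```python
import math


def rotamer_score(probability, environment_energy, weight=1.0):
    """Lower is better: rare rotamers and poor local fit are both penalised."""
    return -math.log(probability) + weight * environment_energy


def pick_rotamer(candidates, weight=1.0):
    """Return the candidate dict with the lowest combined score."""
    return min(
        candidates,
        key=lambda c: rotamer_score(c["probability"], c["environment_energy"], weight),
    )


if __name__ == "__main__":
    # Hypothetical candidates: library probability and a clash-like energy term.
    candidates = [
        {"name": "r1", "probability": 0.6, "environment_energy": 5.0},
        {"name": "r2", "probability": 0.3, "environment_energy": 0.0},
    ]
    print(pick_rotamer(candidates)["name"])
```

In this toy case the more probable rotamer loses because it fits its position badly; setting `weight=0` ignores the environment and falls back to the most probable rotamer.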
Three different types of methods can be employed for secondary structure prediction:
- Statistical
- Stereochemical
- Homology/neural network-based methods
Statistical methods: the underlying idea is that the 20 amino acids show statistically significant preferences for particular secondary structures.
- Ala, Arg, Gln, Glu, Met, Leu and Lys, for example, are preferentially found in α-helices.
- Cys, Ile, Phe, Thr, Trp, Tyr and Val occur more frequently in β-sheets.
Prediction is done by calculating the probability of an amino acid belonging to a particular type of secondary structure, such as α-helix, β-sheet or turn, based simply on its frequency of occurrence.
Examples:
- Chou and Fasman method
- Garnier, Osguthorpe and Robson (GOR) method
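The frequency-based idea can be sketched as a propensity calculation: how often a residue type occurs in a given secondary structure, relative to how common that structure is overall. This is only the propensity step in the spirit of the Chou and Fasman method, not the full prediction procedure; the function name and the toy sequence and state strings ('H' for helix, 'C' for other) are invented for illustration and carry no biological meaning.

```python
from collections import Counter


def propensities(sequence, states, target):
    """Propensity of each residue type for the secondary-structure state `target`.

    propensity(aa) = (fraction of aa residues found in `target`)
                     / (fraction of all residues found in `target`)
    Values above 1 indicate a preference for that state.
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have the same length")
    n_target = states.count(target)
    if n_target == 0:
        raise ValueError("target state does not occur in the data")
    overall = n_target / len(states)
    aa_counts = Counter(sequence)
    aa_in_target = Counter(aa for aa, s in zip(sequence, states) if s == target)
    return {aa: (aa_in_target[aa] / aa_counts[aa]) / overall for aa in aa_counts}


if __name__ == "__main__":
    # Toy data only: six residues with made-up helix/other labels.
    result = propensities("AALGGA", "HHHCCC", "H")
    for aa in sorted(result):
        print(aa, round(result[aa], 3))
```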
Optimisation Methods
Energy Minimisation is used to produce a chemically and conformationally reasonable model protein structure.
The two most commonly used optimisation algorithms are:
- Steepest Descent
- Conjugate Gradients
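As a minimal sketch of steepest descent, the code below relaxes a single harmonic "bond" between two atoms by repeatedly moving each atom a small step against the energy gradient. The energy function, the force constant k, the ideal length r0, the step size and the starting coordinates are all assumptions in arbitrary units; a real force field sums many such terms. The atoms are assumed not to coincide, since the gradient is undefined at zero distance.

```python
import math


def bond_energy_and_gradient(p, q, k=100.0, r0=1.5):
    """Harmonic bond energy 0.5*k*(d - r0)^2 and its gradient for atoms p and q."""
    diff = [a - b for a, b in zip(p, q)]
    d = math.sqrt(sum(c * c for c in diff))
    energy = 0.5 * k * (d - r0) ** 2
    grad_p = [k * (d - r0) * c / d for c in diff]  # dE/dp
    grad_q = [-g for g in grad_p]                  # dE/dq is equal and opposite
    return energy, grad_p, grad_q


def steepest_descent(p, q, step=0.001, max_iter=1000, tol=1e-8):
    """Move both atoms along the negative gradient until the energy is below tol."""
    p, q = list(p), list(q)
    energy = None
    for _ in range(max_iter):
        energy, grad_p, grad_q = bond_energy_and_gradient(p, q)
        if energy < tol:
            break
        p = [x - step * g for x, g in zip(p, grad_p)]
        q = [x - step * g for x, g in zip(q, grad_q)]
    return p, q, energy


if __name__ == "__main__":
    p, q, energy = steepest_descent((0.0, 0.0, 0.0), (3.0, 0.0, 0.0))
    print(math.dist(p, q), energy)
```

Conjugate gradients uses the same energy and gradient but combines the current gradient with the previous search direction, which usually needs fewer iterations near the minimum.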
Molecular Dynamics is used to explore the conformational space a molecule could visit
SWISSMODEL (www): http://swissmodel.expasy.org/SWISS-MODEL.html
3D-Jigsaw (www): http://www.bmm.icnet.uk/servers/3djigsaw/
ESyPred3D (www): http://www.fundp.ac.be/sciences/biologie/urbm/bioinfo/esypred/
WHATIF (www): http://swift.cmbi.kun.nl/WIWWWI
PROCHECK server: http://nihserver.mbi.ucla.edu/SAVES/