
Protein Modeling

Topics to be discussed
- Introduction
- Comparative modeling
- Homology modeling
- Energy minimization
- Validation methods
- Protein modeling tools

March 25, 2012

Introduction
- Protein structure prediction is the prediction of the three-dimensional structure of a protein from its amino acid sequence, that is, of its tertiary structure from its primary structure
- It is one of the most important goals pursued by bioinformatics and theoretical chemistry
- It is highly important in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes)
- The two most common experimental methods for determining a structure are X-ray diffraction and nuclear magnetic resonance (NMR)
- A determined structure is stored in a database, the Protein Data Bank (PDB), as data giving the x, y, z coordinates of each atom
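To make the "x, y, z coordinates of each atom" concrete, here is a minimal sketch of reading one atom record from a PDB-format file. It assumes the standard fixed-width ATOM/HETATM column layout; the function name `parse_atom_record` and the sample record are hypothetical and only for illustration.

```python
def parse_atom_record(line):
    """Parse one fixed-width PDB ATOM/HETATM record into a dict."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain_id": line[21],              # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }


# Hypothetical record, not taken from a real PDB entry.
SAMPLE = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"

atom = parse_atom_record(SAMPLE)
print(atom["res_name"], atom["atom_name"], atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real files can touch each other with no space between them.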

Why Is In Silico Modelling Necessary?


Current structure determination methods:
- NMR: needs a concentrated protein solution; limited in protein size and resolution
- X-ray crystallography: needs protein crystals; gives high resolution
- Both are expensive, slow and difficult (especially for membrane proteins!)

They can't keep up with the growth rate of the sequence databases


Why make a structural model for your protein?


- Designing (site-directed) mutants to test hypotheses about function
- The structure can provide clues to the function through structural similarity with other proteins
- With a structure it is easier to guess the location of active sites
- With a structure we can plan more precise experiments in the lab
- We can apply docking algorithms to the structures (both with other proteins and with small molecules)

Comparative Modeling
Two primary methods:
1) Homology modeling
2) Threading (fold recognition)

Note: both rely on the availability of experimentally determined structures that are "homologous" or at least structurally very similar to the target


Assumptions & Principles
- An increase in sequence identity correlates with an increase in structural similarity
- The RMSD of core α-carbon coordinates for two homologous proteins sharing 50% identity is expected to be ~1 Å
- Theoretical models are low resolution and depend on the quality of the input alignment!
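The RMSD figure quoted above can be computed as in the following minimal sketch. It assumes the two lists hold equivalent α-carbon positions in the same order and that the structures have already been superimposed; the function name `rmsd` is illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two equivalent atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two hypothetical two-atom "structures", offset by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, model))
```

Without a prior superposition step the value also reflects any rigid-body offset between the two structures, so it would overestimate the true structural difference.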


Protein structure prediction work flow


What is Homology Modelling?
Given an unknown protein, make an informed guess at its 3D structure based on its sequence:
- Search structure databases for homologous sequences
- Transfer the coordinates of the known protein onto the unknown

MQEQLTDFSKVETNLISW-QGSLETVEQMEPWAGSDANSQTEAY
| |..|. ||| ... |..||.|.| | |||..|
MHQQVSDYAKVEHQWLYRVAGTIETLDNMSPANHSDAQTQAA

| = Identity
. = Homology


Main steps in homology modeling


Stages in Homology Modeling
1) Identify templates (or parents)
2) Align the target sequence with the parent(s)
3) Find:
   - structurally conserved regions (SCRs)
   - structurally variable regions (SVRs)
4) Inherit the SCRs from the parent(s)
5) Build the SVRs
6) Build the side chains
7) Refine the model
8) Evaluate errors in the model


Identify templates (or parents)
- The first step in homology modeling is the assignment of the unknown protein structure to a protein family
- It is then necessary to compare the new sequence with the thousands of sequences already stored in protein databases and to identify, if possible, homologous ones
- Rapid searching procedures have been developed: FASTP, BLASTP
- Commercially available software packages: MODELLER, COMPOSER, WHAT IF, UWGCG


Align the target sequence with the parent(s)
- The central technique used for amino acid sequence comparison is sequence alignment
- Within the framework of homology modeling, the sequence alignment procedure is important for several reasons:
- It is used to search databases to find related sequences and to identify which regions of the detected proteins are conserved, thus suggesting where the unknown protein may also be structurally conserved
- It is used to detect correspondences between the amino acids of the structurally known reference protein and those of the protein to be modeled


Align the target sequence with the parent(s)
- A scoring scheme is required that dictates the weight for aligning one type of amino acid with another: the so-called homology matrices
- Homology matrices make use of the most probable amino acid substitutions according to physical, chemical or statistical properties
- Matrices in use: identity matrix, codon substitution matrix, and mutation matrix (also known as the Dayhoff or PAM250 matrix)
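As a rough illustration of how a scoring matrix drives an alignment, the sketch below computes a global alignment score by dynamic programming (Needleman-Wunsch style, with a linear gap penalty). The slides do not name an alignment algorithm, so this choice is an assumption, as are the function names and the gap penalty of -1. The identity matrix is used because it needs no tabulated values; a PAM250 lookup could be passed in as `score` instead.

```python
def identity_score(a, b):
    """Identity matrix: 1 for identical residues, 0 otherwise."""
    return 1 if a == b else 0


def global_alignment_score(seq1, seq2, score=identity_score, gap=-1):
    """Best global alignment score of seq1 vs seq2 with a linear gap penalty."""
    # prev[j] holds the best score for aligning seq1[:i-1] with seq2[:j].
    prev = [j * gap for j in range(len(seq2) + 1)]
    for i in range(1, len(seq1) + 1):
        curr = [i * gap]
        for j in range(1, len(seq2) + 1):
            diag = prev[j - 1] + score(seq1[i - 1], seq2[j - 1])  # align two residues
            up = prev[j] + gap                                    # gap in seq2
            left = curr[j - 1] + gap                              # gap in seq1
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACD", "ACD"))
print(global_alignment_score("ACD", "AD"))
```

Only the score is returned here; recovering the alignment itself would need a traceback through the full table, which is omitted to keep the sketch short.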

Doolittle Rules of Thumb
Rules of thumb which can ease the decision:
- If the sequences are longer than 100 residues and are found to be more than 25% identical (with appropriate gaps), then they are very likely related
- If the identity is in the range of 15-25%, then the sequences may still be related
- If the sequences are less than 15% identical, they are probably not related
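The rules above can be applied mechanically once an alignment is available, as in this minimal sketch. Two points are assumptions rather than part of the rules: identity is counted over aligned columns where neither sequence has a gap, and sequences of 100 residues or fewer with more than 25% identity are reported as "may be related", since the first rule only covers longer sequences. The function names are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def doolittle_verdict(identity_pct, length):
    """Classify a pair of sequences using the rules of thumb above."""
    if identity_pct > 25 and length > 100:
        return "very likely related"
    if identity_pct >= 15:
        return "may be related"
    return "probably not related"


print(percent_identity("ACDE", "ACDF"))
print(doolittle_verdict(30.0, 150))
```

Other denominators are in common use (for example the length of the shorter sequence), and the choice can shift a borderline pair across the 25% line.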


Determination and Generation of Structurally Conserved Regions (SCRs)


- Structurally conserved regions (SCRs) are secondary structural units (α-helices and β-strands) that show strong sequence homology
- SCRs are used as the basic framework for assigning atomic coordinates to another protein belonging to the same family
- To recognize the conserved parts of the proteins, they must be superimposed on each other; this is normally done using least-squares fitting methods
- Once correspondence between the reference and the target sequences has been established, the coordinates for the SCRs can be assigned
- In segments where the reference and target proteins have identical side chains, all coordinates of the amino acids are transferred
- In divergent regions only the backbone coordinates are transferred; the corresponding side chains are added after the complete backbone (SCRs and SVRs) has been generated

Construction of Structurally Variable Regions (SVRs)
- Structurally variable regions (SVRs) show little or no sequence homology and are the sites of addition and deletion of residues
- SVRs occur preferentially in loop regions
- A variety of methods for generating loops have been developed
- Extensive investigations of variable regions in homologous proteins have shown that where particular loops possess the same length and amino acid character, their conformation will be the same
- In that case the coordinates can be transferred directly to the model protein


Construction of Structurally Variable Regions (SVRs)

- If no comparable loop exists in the protein family, two other strategies can be applied for modeling the SVRs: the loop search method and de novo loop segment generation
- Loops generated by database or random search methods are usually far from optimal geometry
- Loop regions (including the flanking residues) must subsequently be refined by energy minimization techniques in order to remove steric hindrance and to relax the loop conformations


Build the side chains


- Once the peptide backbone has been constructed, the next step is to add the side chains
- It has generally been assumed that identical residues in homologous proteins adopt similar conformations
- In this work we examined differences in structures of amino acid side chains around point mutations.

Conformation: a given set of dihedral angles which defines a structure.
Rotamer: an energetically favourable conformation.

[Figure: side-chain examples for Asn and Phe]

- Construction of the side chains is done using the template structures when there is high similarity between the built protein and the templates
- Without such similarity, the construction can be done using rotamer libraries
- A compromise between the probability of the rotamer and its fitness in the specific position determines the score
- Comparing the scores of all the rotamers for a given amino acid determines the preferred rotamer
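The "compromise" between rotamer probability and fit can be sketched as a single score per candidate. Everything specific here is an assumption made for illustration: the scoring form (log probability minus a weighted clash penalty), the weight, and the three candidate rotamers with their numbers. Real side-chain builders use their own libraries and energy terms.

```python
import math


def rotamer_score(probability, clash_penalty, weight=1.0):
    """Higher is better: favour likely rotamers, penalise poor fit in the site."""
    return math.log(probability) - weight * clash_penalty


def pick_rotamer(candidates, weight=1.0):
    """Return the candidate with the best combined score."""
    return max(
        candidates,
        key=lambda c: rotamer_score(c["probability"], c["clash_penalty"], weight),
    )


# Hypothetical library entries for one residue position (values are made up).
candidates = [
    {"name": "r1", "probability": 0.60, "clash_penalty": 3.0},
    {"name": "r2", "probability": 0.30, "clash_penalty": 0.2},
    {"name": "r3", "probability": 0.10, "clash_penalty": 0.0},
]

best = pick_rotamer(candidates)
print(best["name"])
```

In this toy case the most probable rotamer loses because it clashes badly with its surroundings; setting `weight=0` ignores fit entirely and falls back to the most probable one.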

Secondary structure prediction


- The best method for generating a structure proposal for a protein with unknown 3D structure is to base it on a homologous protein whose 3D structure is available
- However, for cases where a homologous protein does not exist, several other methods have been developed that concentrate on the prediction of secondary structure
- The underlying idea derives from the fact that 90% of the residues in most proteins are engaged in α-helices, β-strands or reverse turns
- As a consequence it seems possible, if the secondary structural elements are predicted accurately, to combine the predicted segments in an effort to generate the complete protein structure

Secondary structure prediction methods

Three different types of methods can be employed:
- Statistical
- Stereochemical
- Homology/neural network-based methods

Statistical methods:
- The underlying idea is that the 20 amino acids show statistically significant preferences for particular secondary structures
- Ala, Arg, Gln, Glu, Met, Leu and Lys, for example, are preferentially found in α-helices
- Cys, Ile, Phe, Thr, Trp, Tyr and Val occur more frequently in β-sheets
- Prediction is done by calculating the probability that an amino acid belongs to a particular type of secondary structure, such as α-helix, β-sheet or turn, based simply on its frequency of occurrence
- Examples: the Chou and Fasman method; the Garnier, Osguthorpe and Robson (GOR) method
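The statistical idea can be illustrated with a toy window vote that uses only the two residue groups listed above. This is not the Chou-Fasman or GOR method and does not use their published parameters; the window size, the tie rule and the function name are assumptions chosen to keep the sketch short.

```python
# One-letter codes for the residue groups named in the slide.
HELIX_FORMERS = set("AREQMLK")  # Ala, Arg, Glu, Gln, Met, Leu, Lys
SHEET_FORMERS = set("CIFTWYV")  # Cys, Ile, Phe, Thr, Trp, Tyr, Val


def predict_secondary_structure(sequence, window=5):
    """Label each residue H (helix), E (strand) or - by a local majority vote."""
    half = window // 2
    states = []
    for i in range(len(sequence)):
        segment = sequence[max(0, i - half): i + half + 1]
        helix_votes = sum(res in HELIX_FORMERS for res in segment)
        sheet_votes = sum(res in SHEET_FORMERS for res in segment)
        if helix_votes > sheet_votes:
            states.append("H")
        elif sheet_votes > helix_votes:
            states.append("E")
        else:
            states.append("-")  # tie or no preference
    return "".join(states)


print(predict_secondary_structure("AAAAAGGVVVVV"))
```

The real statistical methods replace the yes/no membership test with per-residue propensities estimated from known structures, which is where the "frequency of occurrence" enters.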

Secondary structure prediction methods


Stereochemical method: Lim
- Based on interpreting the hydrophobic, hydrophilic and electrostatic properties of side chains in order to formulate rules for the folding of proteins
- For example, alternating hydrophobic and hydrophilic side chains are expected in a β-sheet strand, with hydrophilic residues exposed to the solvent and hydrophobic residues buried in the interior of the protein

Neural network-based methods: PHD
- An algorithm which uses evolutionary information contained in multiple sequence alignments as input to neural networks
- Neural networks potentially have a methodological advantage over other prediction methods because they can be trained


Refine the model: Molecular Mechanics


Many structural artifacts can be introduced while the model protein is being built:
- Substitution of large side chains for small ones
- Strained peptide bonds between segments taken from different reference proteins
- Non-optimal conformations of loops

The object of molecular mechanics is to predict the energy associated with a given conformation of a molecule

A simple molecular mechanics energy equation is given by:

Total Energy = Stretching Energy + Bending Energy + Torsion Energy + Non-Bonded Interaction Energy
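The four terms of the energy equation can be sketched as below. The slide does not give functional forms, so the ones used here (harmonic stretching and bending, a cosine torsion term, and a Lennard-Jones non-bonded term) are assumed as commonly used textbook choices. All function names and parameter values are hypothetical.

```python
import math


def stretch_energy(r, r0, k):
    """Harmonic bond stretching around the reference length r0."""
    return k * (r - r0) ** 2


def bend_energy(theta, theta0, k):
    """Harmonic angle bending around the reference angle theta0 (radians)."""
    return k * (theta - theta0) ** 2


def torsion_energy(phi, barrier, periodicity, phase=0.0):
    """Periodic torsion term for a dihedral angle phi (radians)."""
    return 0.5 * barrier * (1.0 + math.cos(periodicity * phi - phase))


def lennard_jones(r, epsilon, sigma):
    """Non-bonded 12-6 Lennard-Jones interaction at separation r."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)


def total_energy(stretches, bends, torsions, nonbonded):
    """Sum the four contributions; each argument is a list of parameter tuples."""
    return (
        sum(stretch_energy(*s) for s in stretches)
        + sum(bend_energy(*b) for b in bends)
        + sum(torsion_energy(*t) for t in torsions)
        + sum(lennard_jones(*n) for n in nonbonded)
    )


# One stretched bond and one non-bonded pair at its energy minimum.
energy = total_energy(
    stretches=[(1.6, 1.5, 100.0)],
    bends=[],
    torsions=[],
    nonbonded=[(2 ** (1 / 6), 1.0, 1.0)],
)
print(round(energy, 6))
```

Production force fields usually add an electrostatic term to the non-bonded part and tabulate the constants per atom type; both are left out here.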

Optimisation Methods
- Energy minimisation is used to produce a chemically and conformationally reasonable model protein structure
- The two most widely used optimisation algorithms are:
  - Steepest descent
  - Conjugate gradients
- Molecular dynamics is used to explore the conformational space a molecule could visit
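A minimal sketch of steepest descent follows: repeatedly step against the gradient of the energy until the gradient is nearly zero. The fixed step size, the finite-difference gradient, the convergence tolerance and the one-bond test energy are all assumptions for illustration; modelling packages use analytic gradients and adaptive step sizes.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient of f at the point x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad


def steepest_descent(f, x0, step=0.001, tol=1e-6, max_iter=10000):
    """Move downhill along the negative gradient until it (almost) vanishes."""
    x = list(x0)
    for _ in range(max_iter):
        grad = numerical_gradient(f, x)
        if max(abs(g) for g in grad) < tol:
            break
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


def bond_energy(x):
    """Hypothetical harmonic bond: force constant 100, reference length 1.5."""
    return 100.0 * (x[0] - 1.5) ** 2


relaxed = steepest_descent(bond_energy, [2.0])
print(round(relaxed[0], 4))
```

The step size has to be small relative to the stiffness of the energy surface: with this force constant a step of 0.01 would overshoot and oscillate instead of converging. Conjugate gradients reduce this sensitivity by reusing information from previous search directions.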


Model Validation
Every homology model contains errors. The two main factors are:
- The percentage sequence identity between reference and model
- The number of errors in the templates

Hence it is essential to check the correctness of the overall fold/structure, errors in localized regions, and stereochemical parameters: bond lengths, angles, geometries
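One simple geometric check of the kind listed above is sketched here: consecutive α-carbons joined by a trans peptide bond sit about 3.8 Å apart, so distances far from that value point to a local problem such as a chain break or a badly built loop. The tolerance of 0.2 Å and the function name are assumptions, and this is only one check among many that tools such as PROCHECK perform.

```python
import math


def flag_ca_distance_outliers(ca_coords, expected=3.8, tolerance=0.2):
    """Return (i, i+1, distance) for consecutive CA pairs with unusual spacing."""
    outliers = []
    for i in range(len(ca_coords) - 1):
        distance = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(distance - expected) > tolerance:
            outliers.append((i, i + 1, round(distance, 2)))
    return outliers


# Hypothetical CA trace: the second pair is stretched to 5.0 Å.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 5.0, 0.0)]
print(flag_ca_distance_outliers(trace))
```

A cis peptide bond gives a noticeably shorter spacing, so a flagged pair is a prompt for inspection rather than proof of an error.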


Validation of Protein Models


Current (Free) Servers & Software


MODELLER 9v8 (standalone for Windows, Mac, Linux; also web submission)
http://salilab.org/modeller
http://alto.compbio.ucsf.edu/modweb-cgi/main.cgi

SWISSMODEL (www)
http://swissmodel.expasy.org/SWISS-MODEL.html

3D-Jigsaw (www)
http://www.bmm.icnet.uk/servers/3djigsaw/

ESyPred3D (www)
http://www.fundp.ac.be/sciences/biologie/urbm/bioinfo/esypred/

WHATIF (www)
http://swift.cmbi.kun.nl/WIWWWI

PROCHECK server
http://nihserver.mbi.ucla.edu/SAVES/
