Protein Modeling

Topics to be discussed
- Introduction
- Comparative modeling
- Homology modeling
- Energy minimization
- Validation methods
- Protein modeling tools
March 25, 2012
Introduction
- Protein structure prediction is the prediction of the three-dimensional (tertiary) structure of a protein from its amino acid sequence (primary structure).
- It is one of the most important goals pursued by bioinformatics and theoretical chemistry, and is highly important in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes).
- The two most common experimental methods for determining a structure are X-ray diffraction and nuclear magnetic resonance (NMR).
- A structure is stored in a database, the Protein Data Bank (PDB), as data giving the x, y, z coordinates of each atom.
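As a rough illustration of what "x, y, z coordinates of each atom" looks like in practice, the sketch below reads ATOM records from PDB-format text using that format's fixed column positions. The two sample records and the function name `parse_atoms` are invented for illustration and are not taken from a real PDB entry.

```python
# Minimal sketch: read atom coordinates from PDB-format ATOM records.
# SAMPLE is made-up data laid out in the fixed PDB columns.
SAMPLE = """\
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38
"""

def parse_atoms(pdb_text):
    """Return one dict per ATOM record with its name, residue and coordinates."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK, TER and other record types
        atoms.append({
            "name": line[12:16].strip(),      # columns 13-16: atom name
            "res_name": line[17:20].strip(),  # columns 18-20: residue name
            "chain": line[21],                # column 22: chain identifier
            "res_seq": int(line[22:26]),      # columns 23-26: residue number
            "xyz": (float(line[30:38]),       # columns 31-38: x
                    float(line[38:46]),       # columns 39-46: y
                    float(line[46:54])),      # columns 47-54: z
        })
    return atoms

atoms = parse_atoms(SAMPLE)
for atom in atoms:
    print(atom["res_name"], atom["name"], atom["xyz"])
```

Slicing by column rather than splitting on whitespace matters because adjacent PDB fields can touch when values are wide.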
Why is in silico modelling necessary?

Current structure determination methods:
- NMR: needs a concentrated protein solution; limited in protein size and resolution
- X-ray crystallography: needs protein crystals; gives high resolution
- Both are expensive, slow and difficult (especially for membrane proteins!)
- They can't keep up with the growth rate of sequence databases
Why make a structural model for your protein?

- To design (site-directed) mutants that test hypotheses about function
- The structure can provide clues to the function through structural similarity with other proteins
- With a structure it is easier to guess the location of active sites
- With a structure we can plan more precise experiments in the lab
- We can apply docking algorithms to the structures (both with other proteins and with small molecules)
Comparative Modeling
Two primary methods:
1) Homology modeling
2) Threading (fold recognition)
Note: both rely on the availability of experimentally determined structures that are "homologous", or at least structurally very similar, to the target.
Assumptions & Principles
- An increase in sequence identity correlates with an increase in structural similarity.
- The RMSD of core α-carbon coordinates for two homologous proteins sharing 50% identity is expected to be ~1 Å.
- Theoretical models are low resolution and depend on the quality of the input alignment!
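The RMSD mentioned here is the root-mean-square deviation between matched atom positions. A minimal sketch of the calculation is given below; it assumes the two coordinate sets are already superimposed and listed in the same residue order, and the function name and toy coordinates are illustrative only.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the two structures have already been superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy example: every alpha-carbon is displaced by exactly 1 unit along z.
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
model_ca = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
value = rmsd(template_ca, model_ca)
print(value)
```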
Protein structure prediction work flow
What is Homology Modelling?
Given an unknown protein, make an informed guess at its 3D structure based on its sequence:
- Search structure databases for homologous sequences
- Transfer the coordinates of the known protein onto the unknown

MQEQLTDFSKVETNLISW-QGSLETVEQMEPWAGSDANSQTEAY
| |..|. ||| ... |..||.|.| | |||..|
MHQQVSDYAKVEHQWLYRVAGTIETLDNMSPANHSDAQTQAA
| = Identity    . = Homology
Main steps in homology modeling
Stages in Homology Modeling
1) Identify templates (or parents)
2) Align the target sequence with the parent(s)
3) Find the structurally conserved regions (SCRs) and the structurally variable regions (SVRs)
4) Inherit the SCRs from the parent(s)
5) Build the SVRs
6) Build the side chains
7) Refine the model
8) Evaluate errors in the model
Identify templates (or parents)
- The first step in homology modeling is the assignment of the unknown protein to a protein family.
- The new sequence must then be compared with the thousands of sequences already stored in protein databases to identify, if possible, homologous ones.
- Rapid searching procedures have been developed for this: FASTP, BLASTP.
- Available software packages: MODELLER, COMPOSER, WHAT IF, UWGCG
Align the target sequence with the parent(s)
- The central technique used for amino acid sequence comparison is sequence alignment.
- Within the framework of homology modeling, the sequence alignment procedure is important for several reasons:
  - It is used to search databases to find related sequences and to identify which regions of the detected proteins are conserved, thus suggesting where the unknown protein may also be structurally conserved.
  - It is used to detect correspondences between the amino acids of the structurally known reference protein and those of the protein to be modeled.
Align the target sequence with the parent(s)
- A scoring scheme is required that dictates the weight for aligning a particular type of amino acid with another: the so-called homology matrices.
- Homology matrices make use of the most probable amino acid substitutions according to physical, chemical or statistical properties.
- Matrices in use: the identity matrix, the codon substitution matrix, and the mutation matrix (also known as the Dayhoff or PAM250 matrix).
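To show how a scoring scheme drives an alignment, here is a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch approach, which the slides do not name). It uses the identity matrix as its scoring scheme (1 for a match, 0 for a mismatch) and a linear gap penalty of -1; the gap penalty, the function name and the toy sequences are assumptions made for this example.

```python
def global_align(seq_a, seq_b, match=1, mismatch=0, gap=-1):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align the two residues
                              score[i - 1][j] + gap,      # gap in seq_b
                              score[i][j - 1] + gap)      # gap in seq_a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

aligned_a, aligned_b, best = global_align("ACDE", "ACE")
print(aligned_a)
print(aligned_b)
print(best)
```

Swapping the match/mismatch values for a lookup into a substitution matrix such as PAM250 changes only the line that computes `sub`.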
Doolittle Rules of Thumb
Rules of thumb which can ease the decision:
- If the sequences are longer than 100 residues and are found to be more than 25% identical (with appropriate gaps) then they are very likely related
- If the identity is in the range of 15-25%, then the sequences may still be related
- If the sequences are less than 15% identical, they are probably not related
Determination and Generation of Structurally Conserved Regions (SCRs)

- Structurally conserved regions (SCRs) are secondary structural units (α-helices and β-strands) that show strong sequence homology.
- SCRs are used as the basic framework for assigning atomic coordinates to another protein belonging to the same family.
- To recognize the conserved parts of the proteins, the structures must be superimposed on each other. This is normally done using least-squares fitting methods.
- Once the correspondence between the reference and the target sequences has been established, the coordinates for the SCRs can be assigned.
- In segments where the reference and target proteins have identical side chains, all coordinates of the amino acids are transferred.
- In diverse regions, only the backbone coordinates are transferred; the corresponding side chains are added after the complete backbone (SCRs and SVRs) has been generated.
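The coordinate transfer rule described above (all atoms for identical residues, backbone only otherwise) can be sketched as follows. The data layout, the function name and the toy coordinates are hypothetical; taking the backbone to be the atoms N, CA, C and O is the usual convention and is assumed here.

```python
BACKBONE = {"N", "CA", "C", "O"}  # backbone atom names (assumed convention)

def transfer_scr_coordinates(aligned_residues):
    """Copy template coordinates onto the target for one conserved region.

    aligned_residues: list of (target_residue, template_residue, template_atoms),
    where template_atoms maps an atom name to its (x, y, z) position.
    """
    model = []
    for target_res, template_res, atoms in aligned_residues:
        if target_res == template_res:
            copied = dict(atoms)  # identical residue: keep every atom
        else:
            # different residue: keep the backbone, side chain is built later
            copied = {name: xyz for name, xyz in atoms.items() if name in BACKBONE}
        model.append((target_res, copied))
    return model

# Toy region: one identical residue (SER) and one substitution (VAL -> LEU).
region = [
    ("SER", "SER", {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "OG": (2.0, 1.2, 0.0)}),
    ("LEU", "VAL", {"N": (3.0, 0.0, 0.0), "CA": (4.5, 0.0, 0.0), "CG1": (5.0, 1.2, 0.0)}),
]
model = transfer_scr_coordinates(region)
for residue, atoms in model:
    print(residue, sorted(atoms))
```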
Construction of Structurally Variable Regions (SVRs)
- Structurally variable regions (SVRs) show little or no sequence homology and are the sites of addition and deletion of residues.
- SVRs occur preferentially in loop regions.
- A variety of methods for generating loops have been developed.
- Extensive investigations of variable regions in homologous proteins have shown that where particular loops have the same length and amino acid character, their conformation will be the same. The coordinates can then be transferred directly to the model protein.
Construction of Structurally Variable Regions (SVRs)
- If no comparable loop exists in the protein family, two other strategies can be applied for modeling the SVRs: the loop search method and de novo loop segment generation.
- Loops generated by database or random search methods are usually far from optimal geometry.
- Loop regions (including the confining residues) must subsequently be refined by energy minimization in order to remove steric hindrance and to relax the loop conformations.
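One simple way to picture the loop search method is as a filter over a library of known loop fragments: keep the fragments with the required number of residues whose end-to-end distance fits the gap between the two anchor residues of the model. The sketch below works under that assumption; the library entries, field names and the 1.0 Å tolerance are all hypothetical.

```python
import math

def search_loops(library, loop_length, start_anchor, end_anchor, tolerance=1.0):
    """Return fragment ids that fit the gap, best geometric fit first."""
    gap = math.dist(start_anchor, end_anchor)  # distance to be bridged by the loop
    hits = []
    for fragment in library:
        if fragment["length"] != loop_length:
            continue  # wrong number of residues
        diff = abs(fragment["anchor_distance"] - gap)
        if diff <= tolerance:
            hits.append((diff, fragment["id"]))
    return [fragment_id for _, fragment_id in sorted(hits)]

# Hypothetical fragment library (distances in angstroms).
library = [
    {"id": "loopA", "length": 4, "anchor_distance": 6.2},
    {"id": "loopB", "length": 4, "anchor_distance": 9.0},
    {"id": "loopC", "length": 5, "anchor_distance": 6.0},
    {"id": "loopD", "length": 4, "anchor_distance": 5.9},
]
candidates = search_loops(library, 4, (0.0, 0.0, 0.0), (6.0, 0.0, 0.0))
print(candidates)
```

Any fragment selected this way would still need the energy minimization step described above.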
Build the side chains

- Once the peptide backbone has been constructed, the next step is to add the side chains.
- It has generally been assumed that identical residues in homologous proteins adopt similar conformations.
- In this work we examined differences in structures of amino acid side chains around point mutations.
- Conformation: a given set of dihedral angles which defines a structure.
- Rotamer: an energetically favourable conformation.
[Figure labels: Asn, Phe]
- Side chains are built from the template structures when there is high similarity between the protein being built and the templates.
- Without such similarity, the construction can be done using rotamer libraries.
- A compromise between the probability of the rotamer and its fit at the specific position determines the score. Comparing the scores of all the rotamers for a given amino acid determines the preferred rotamer.
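The compromise between rotamer probability and local fit can be sketched as a single score. The particular form used here (log of the library probability minus a penalty per steric clash) is an assumption chosen for illustration, as are the probabilities and clash counts; real side-chain builders use more detailed energy terms.

```python
import math

def rotamer_score(probability, clash_count, clash_weight=1.0):
    """Higher is better: reward likely rotamers, penalise steric clashes."""
    return math.log(probability) - clash_weight * clash_count

def best_rotamer(rotamers, clash_weight=1.0):
    """Pick the rotamer with the highest combined score."""
    return max(rotamers,
               key=lambda r: rotamer_score(r["probability"], r["clashes"], clash_weight))

# Hypothetical rotamer library entries for one residue position.
rotamers = [
    {"name": "r1", "probability": 0.6, "clashes": 2},  # most common, but clashes
    {"name": "r2", "probability": 0.3, "clashes": 0},
    {"name": "r3", "probability": 0.1, "clashes": 0},
]
choice = best_rotamer(rotamers)
print(choice["name"])
```

In this toy case the most probable rotamer loses because it clashes with its surroundings, which is the trade-off the score is meant to capture.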
Secondary structure prediction

- The best way to generate a structure proposal for a protein with unknown 3D structure is to base it on a homologous protein whose 3D structure is available.
- Where no homologous protein exists, several other methods have been developed that concentrate on the prediction of secondary structure.
- The underlying idea stems from the fact that 90% of the residues in most proteins are found in α-helices, β-strands or reverse turns.
- As a consequence it seems possible, if the secondary structural elements are predicted accurately, to combine the predicted segments in an effort to generate the complete protein structure.
Secondary structure prediction methods
Three different types of methods can be employed:
- Statistical
- Stereochemical
- Homology/neural network-based

Statistical methods:
- The underlying idea is that the 20 amino acids show statistically significant preferences for particular secondary structures.
- Ala, Arg, Gln, Glu, Met, Leu and Lys, for example, are preferentially found in α-helices.
- Cys, Ile, Phe, Thr, Trp, Tyr and Val occur more frequently in β-sheets.
- Prediction is done by calculating the probability that an amino acid belongs to a particular type of secondary structure (α-helix, β-sheet or turn), based simply on its frequency of occurrence.
- Examples: the Chou and Fasman method; the Garnier, Osguthorpe and Robson (GOR) method
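The frequency-based idea behind the statistical methods can be sketched by computing a propensity for each amino acid: the fraction of its occurrences found in a given secondary structure, divided by the fraction of all residues found in that structure. Values above 1 indicate a preference. The labelled toy sequence below is invented and far too small to give meaningful numbers; published methods derive their tables from large sets of known structures.

```python
from collections import Counter

def propensities(sequence, states, state="H"):
    """Propensity of each amino acid for `state` ('H' = helix in this toy labelling)."""
    if len(sequence) != len(states) or not sequence:
        raise ValueError("sequence and states must be non-empty and equally long")
    overall = sum(1 for s in states if s == state) / len(sequence)
    if overall == 0:
        raise ValueError("no residue is labelled with the requested state")
    aa_total = Counter(sequence)
    aa_in_state = Counter(a for a, s in zip(sequence, states) if s == state)
    return {aa: (aa_in_state[aa] / aa_total[aa]) / overall for aa in aa_total}

# Toy data: one-letter residues with made-up labels (H = helix, C = coil).
toy_sequence = "AAEEGGPP"
toy_states = "HHHHCCCC"
helix_propensity = propensities(toy_sequence, toy_states, "H")
print(helix_propensity)
```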
Secondary structure prediction methods

Stereochemical method: Lim
- Based on interpreting the hydrophobic, hydrophilic and electrostatic properties of side chains in order to formulate rules for the folding of proteins.
- Example rule: alternating hydrophobic and hydrophilic side chains are expected in a β-sheet strand, with hydrophilic residues exposed to the solvent and hydrophobic residues buried in the interior of the protein.

Neural network-based methods: PHD
- An algorithm which uses the evolutionary information contained in multiple sequence alignments as input to neural networks.
- Neural networks potentially have a methodological advantage over other prediction methods because they can be trained.
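The alternation rule for β-strands can be sketched as a scan for windows in which hydrophobic and hydrophilic residues strictly alternate. This is only a toy version of one rule, not the Lim method itself; the hydrophobic residue set, the window length of 5 and the example sequence are assumptions.

```python
HYDROPHOBIC = set("AVLIMFWC")  # simplified classification (assumption)

def alternating_windows(sequence, window=5):
    """Return start indices of windows with strictly alternating hydrophobicity."""
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        flags = [residue in HYDROPHOBIC for residue in segment]
        # every neighbouring pair must differ in hydrophobic/hydrophilic class
        if all(flags[k] != flags[k + 1] for k in range(window - 1)):
            hits.append(start)
    return hits

strand_like = alternating_windows("VKVTVEGG")
print(strand_like)
```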
Refine the model: Molecular Mechanics

Many structural artifacts can be introduced while the model protein is being built:
- Substitution of large side chains for small ones
- Strained peptide bonds between segments taken from different reference proteins
- Non-optimal conformation of loops

The object of molecular mechanics is to predict the energy associated with a given conformation of a molecule.
A simple molecular mechanics energy equation is given by:

Total Energy = Stretching Energy + Bending Energy + Torsion Energy + Non-Bonded Interaction Energy
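The slides give only the four named terms. One common way of writing them out, used in many force fields, is shown below as an illustration: harmonic terms for bond stretching and angle bending, a cosine term for torsions, and Lennard-Jones plus Coulomb terms for the non-bonded interactions. The symbols (force constants $k_b$ and $k_\theta$, reference values $r_0$ and $\theta_0$, torsion barrier $V_n$ with periodicity $n$ and phase $\gamma$, Lennard-Jones coefficients $A_{ij}$ and $B_{ij}$, charges $q_i$, dielectric constant $\varepsilon$) are assumptions of this sketch, not taken from the slides.

```latex
E_{\text{total}} =
    \sum_{\text{bonds}} k_b \,(r - r_0)^2
  + \sum_{\text{angles}} k_\theta \,(\theta - \theta_0)^2
  + \sum_{\text{torsions}} \frac{V_n}{2}\,\bigl[1 + \cos(n\phi - \gamma)\bigr]
  + \sum_{i<j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}
  + \frac{q_i q_j}{\varepsilon\, r_{ij}} \right]
```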
Optimisation Methods
- Energy minimisation is used to produce a chemically and conformationally reasonable model protein structure.
- The two most commonly used optimisation algorithms are steepest descent and conjugate gradients.
- Molecular dynamics is used to explore the conformational space a molecule could visit.
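To make the steepest descent idea concrete, the sketch below minimises the energy of a single stretched bond by repeatedly stepping against the gradient. It assumes a harmonic bond energy; the force constant, reference length, step size and starting value are hypothetical, and a real minimiser works on all atomic coordinates at once.

```python
K_BOND = 100.0  # force constant (hypothetical value)
R_ZERO = 1.5    # reference bond length (hypothetical value)

def bond_energy(r):
    """Harmonic stretching energy of one bond of length r."""
    return K_BOND * (r - R_ZERO) ** 2

def bond_gradient(r):
    """Derivative of the stretching energy with respect to r."""
    return 2.0 * K_BOND * (r - R_ZERO)

def steepest_descent(r, step=0.001, tol=1e-6, max_iter=10000):
    """Move r downhill along the negative gradient until the gradient is tiny."""
    for _ in range(max_iter):
        grad = bond_gradient(r)
        if abs(grad) < tol:
            break
        r -= step * grad
    return r

start = 2.0  # a strained starting bond length
relaxed = steepest_descent(start)
print(relaxed, bond_energy(relaxed))
```

Conjugate gradients follows the same loop but chooses each search direction using information from earlier steps, which usually reduces the number of iterations near the minimum.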
Model Validation
Every homology model contains errors. Two main reasons:
- The % sequence identity between reference and model
- The number of errors in the templates
Hence it is essential to check the correctness of the overall fold/structure, errors in localized regions, and stereochemical parameters: bond lengths, angles, geometries.
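As a small example of a stereochemical check, the sketch below flags consecutive α-carbons whose separation is far from about 3.8 Å, the distance expected across a trans peptide bond. The 0.2 Å tolerance, the function name and the toy coordinates are assumptions; dedicated validation tools check many more parameters.

```python
import math

def flag_ca_distances(ca_coords, ideal=3.8, tolerance=0.2):
    """Return (i, i + 1, distance) for consecutive CA pairs outside the tolerance."""
    flagged = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - ideal) > tolerance:
            flagged.append((i, i + 1, round(d, 2)))
    return flagged

# Toy trace: the last two alpha-carbons are unrealistically close together.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (9.0, 0.0, 0.0)]
problems = flag_ca_distances(trace)
print(problems)
```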
Validation of Protein Models
Current (Free) Servers & Software

MODELLER 9v8 (standalone for Windows, Mac, Linux; also web submission)
http://salilab.org/modeller
http://alto.compbio.ucsf.edu/modweb-cgi/main.cgi
SWISS-MODEL (www)
http://swissmodel.expasy.org/SWISS-MODEL.html
3D-Jigsaw (www)
http://www.bmm.icnet.uk/servers/3djigsaw/
ESyPred3D (www)
http://www.fundp.ac.be/sciences/biologie/urbm/bioinfo/esypred/
WHATIF (www)
http://swift.cmbi.kun.nl/WIWWWI
PROCHECK server
http://nihserver.mbi.ucla.edu/SAVES/