
Dept. of Biotechnology, BCET
FOLD LIBRARY
MODULE IV
BT-701
INTRODUCTION

• Predicting the 3D structure of a protein from its linear
sequence is a great challenge.

• Proteins are made up of 20 amino acids that are folded
into unique 3D structures.

• These structures are determined by the sequence of the
amino acids.
INTRODUCTION

Two experimental methods are available for the
determination of 3D structures of proteins:

• X-Ray Crystallography.
• Nuclear Magnetic Resonance (NMR) Spectroscopy.

Disadvantages:

• Time consuming.
• Expensive.

There is therefore a need for fast and reliable computational
methods that predict three-dimensional structures from
protein sequences.
INTRODUCTION

Proteins can have similar structural folds even if they
have no sequence or functional similarity.

Fold recognition methods try to recognize the structural
fold of a protein from a structure template library, given its
sequence information, and then generate an alignment
between the query and the template.

Fold recognition methods are especially effective when:

• The query sequence has little or no similarity to any
sequence with known structure.

• Some model in the structure library represents the true
fold of the sequence.
PROBLEM DEFINITION

• Construct a protein structure template library.

• Design a scoring function to measure the fitness
between the target sequence and the template.

• Design an efficient algorithm for searching over all the
templates in the library.

• Find the best alignment between the target sequence
and the template by minimizing the scoring function.
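
The four steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the method of any tool named in these slides. The `Template` class, the buried/exposed environment labels, the `HYDROPHOBIC` set, the toy `position_score` energy and the gap penalty are all assumptions made for the example. The alignment uses a standard global dynamic-programming recurrence, and the recognized fold is the template with the lowest score.

```python
from dataclasses import dataclass

# Assumption: a crude hydrophobic residue set, chosen only for illustration.
HYDROPHOBIC = set("AVLIMFWC")


@dataclass
class Template:
    name: str
    environment: str  # hypothetical labels: "B" = buried, "E" = exposed


def position_score(residue: str, env: str) -> float:
    """Toy energy (lower is better): hydrophobic residues prefer buried sites."""
    hydrophobic = residue in HYDROPHOBIC
    buried = env == "B"
    return -1.0 if hydrophobic == buried else 1.0


def threading_score(sequence: str, template: Template, gap_penalty: float = 1.0) -> float:
    """Minimum total score of a global alignment of sequence to template."""
    n, m = len(sequence), len(template.environment)
    # dp[i][j] = best score for the first i residues and first j positions
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        dp[0][j] = j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + position_score(
                sequence[i - 1], template.environment[j - 1]
            )
            dp[i][j] = min(match, dp[i - 1][j] + gap_penalty, dp[i][j - 1] + gap_penalty)
    return dp[n][m]


def recognize_fold(sequence: str, library: list) -> tuple:
    """Search every template and return (best_score, template_name)."""
    return min((threading_score(sequence, t), t.name) for t in library)


# Hypothetical two-template library and query sequence.
library = [Template("buried_core", "BBBBEEEE"), Template("exposed_core", "EEEEBBBB")]
print(recognize_fold("LLLLKKKK", library))  # (-8.0, 'buried_core')
```

Real fold recognition methods use much richer scoring terms (pairwise contact potentials, profiles, secondary structure) and faster search strategies; the sketch only shows how library, scoring function, search and alignment fit together.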
Algorithms in use

• Neural Network (GenTHREADER).

• Bayesian Networks.

• Structural Pattern based methods (SPREK).

• Support Vector Machine (SVM).

• Evolutionary Methods (Genetic Algorithm, Monte Carlo).

• Parallel Evolutionary Methods for protein fold
recognition.
Fold Libraries

The two most important and well-known fold libraries are:

• SCOP (Structural Classification of Proteins).

• CATH (Class, Architecture, Topology, Homologous
superfamily).
SCOP

Nearly all proteins have structural similarities with other
proteins and, in some cases, share a common evolutionary
origin.

Knowledge of these relationships contributes substantially
to molecular biology and allied sciences.

It plays an important role in the interpretation of sequences
produced by genome projects and helps in understanding
the evolution of development.
SCOP
The number of proteins whose structures have been
determined by NMR and X-ray crystallography has grown
exponentially.

The main objective of SCOP is to facilitate understanding of
these structures and to provide access to them.

It provides a detailed and comprehensive description of the
structural and evolutionary relationships of proteins
whose structures have been determined.

It includes all the proteins in the PDB and also those whose
structures have been published but not deposited in the PDB.
SCOP

Classification is based on evolutionary relationships and
on the principles that govern protein 3D structures.

Striking regularities are observed in the ways in which
secondary structures are assembled (Levitt & Chothia, 1976)
and in the topologies of the polypeptide chains
(Richardson, 1976).

These regularities arise from the intrinsic physical and
chemical properties of the proteins.
SCOP

The method used to construct the protein classification in
SCOP is essentially visual inspection and comparison of
structures, though automatic tools are used to make the task
manageable.

The unit of classification is usually the protein domain.

Small proteins, and most medium-sized ones, contain only one
domain and are hence treated as a whole.

The domains in large proteins are usually classified
individually.
SCOP

FAMILY:

Proteins are clustered together into families on the basis
of either of the following criteria:

• All proteins in the family have residue identities of 30% or
more.

• The proteins have lower sequence identities, but their
functions and structures are very similar (e.g., globins
with a sequence identity of 15%).
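
The two family criteria can be written as a small check. The sketch below is an illustration under stated assumptions, not SCOP's actual procedure (which relies on expert inspection): the function names are hypothetical, the two sequences are assumed to be already aligned with `-` as the gap character, and percent identity is computed over columns where neither sequence has a gap (other denominators are also in common use).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def same_scop_family(identity_pct: float, similar_structure_and_function: bool) -> bool:
    """Criterion 1: identity of 30% or more. Criterion 2: lower identity,
    but very similar structure and function (a human judgment, passed in
    here as a boolean)."""
    return identity_pct >= 30.0 or similar_structure_and_function


# Hypothetical aligned fragments: 3 identical columns out of 4 ungapped ones.
print(percent_identity("AC-DE", "ACWDF"))  # 75.0
# A globin-like case: 15% identity, but structure and function agree.
print(same_scop_family(15.0, similar_structure_and_function=True))  # True
```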
SCOP

SUPERFAMILY:

Families whose proteins have low sequence identities, but
whose structures and functional features imply a
probable common evolutionary origin, are grouped into a
superfamily.
SCOP

COMMON FOLD:

Superfamilies and families are defined as having a
common fold if their proteins have the same major secondary
structures in the same arrangement with the same
topological connections.

Proteins placed together in the same fold category have
structural similarities that probably arise from the physics
and chemistry of the proteins.
SCOP

CLASS:

For convenience, different folds have been grouped into
classes. Most of the folds are assigned to one of five
structural classes on the basis of their secondary structure
composition:

• All alpha.
• All beta.
• Alpha and beta (α/β).
• Alpha plus beta (α+β).
• Multi domain (for those with domains of different fold
and for which no homologues are known at present).
SCOP

Two search facilities are available in SCOP:

• The “homology search” permits users to enter a
sequence and obtain a list of any structures to which it
has significant levels of similarity.

• The “keyword” search finds matches for a word entered by
the user in both the text of the SCOP database and the
headers of the PDB files.
CATH

The CATH classification of protein domain structures was
established in 1993 as a hierarchical clustering of protein
domain structures into families (evolutionary and
structural) depending on the sequence and structural
similarity.

There are four major levels:

• Class.
• Architecture.
• Topology.
• Homologous superfamily.
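
The four levels map naturally onto a small record type. CATH entries are commonly labelled with four dot-separated numbers, one per level, and the sketch below parses such a label and reports the deepest level two domains share. The `CathCode` class, its method names and the example codes are assumptions for illustration, not part of any CATH software.

```python
from dataclasses import astuple, dataclass

# Numeric codes for the three major classes (assumed labels for this sketch).
CLASS_NAMES = {1: "mainly alpha", 2: "mainly beta", 3: "alpha-beta"}
LEVELS = ("class", "architecture", "topology", "homologous superfamily")


@dataclass(frozen=True)
class CathCode:
    cath_class: int
    architecture: int
    topology: int
    homologous_superfamily: int

    @classmethod
    def parse(cls, text: str) -> "CathCode":
        """Parse a label such as '3.40.50.720' into its four levels."""
        parts = text.strip().split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated numbers, got {text!r}")
        return cls(*(int(p) for p in parts))

    def class_name(self) -> str:
        return CLASS_NAMES.get(self.cath_class, "other")

    def deepest_shared_level(self, other: "CathCode") -> str:
        """Walk down the hierarchy and stop at the first level that differs."""
        deepest = "none"
        for name, mine, theirs in zip(LEVELS, astuple(self), astuple(other)):
            if mine != theirs:
                break
            deepest = name
        return deepest


# Example codes used purely for illustration.
a = CathCode.parse("3.40.50.720")
b = CathCode.parse("3.40.50.300")
print(a.class_name())             # alpha-beta
print(a.deepest_shared_level(b))  # topology
```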
CATH

CATH uses both phylogenetic and phenetic
descriptors of protein domain relationships.

At the lowest level, proteins are grouped into homologous
superfamilies if they have either significant sequence
similarity (≥35% identity) or high structural similarity with
some sequence similarity (≥20% identity).

Structural similarity is assessed automatically, with a
score of 100 assigned to identical proteins. Homologous
proteins generally return a score of 80 or more; more distantly
related ones return a score of about 70.
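
The grouping rule above amounts to a two-branch test. The sketch below is a hedged illustration, not CATH's actual pipeline: the function name is hypothetical, and the cutoff of 80 for "high structural similarity" is an assumption taken from the score that homologous proteins typically return.

```python
def same_homologous_group(
    identity_pct: float,
    structure_score: float,
    high_structure_cutoff: float = 80.0,  # assumption: "high" means 80 or more
) -> bool:
    """Apply the two criteria from the slide to one pair of domains."""
    # Criterion 1: significant sequence similarity on its own.
    if identity_pct >= 35.0:
        return True
    # Criterion 2: high structural similarity plus some sequence similarity.
    return structure_score >= high_structure_cutoff and identity_pct >= 20.0


print(same_homologous_group(40.0, 50.0))  # True: sequence identity alone suffices
print(same_homologous_group(25.0, 85.0))  # True: structure plus some identity
print(same_homologous_group(25.0, 70.0))  # False: structural score too low
```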
CATH

The architecture level in CATH groups proteins whose
folds have similar 3D arrangements of secondary
structures regardless of their connectivity.

Examples: barrel, sandwich, propeller.
CATH

The class reflects the proportion of α-helix or β-strand
secondary structures.

The three major classes recognized are:

• Mainly α.
• Mainly β.
• α + β.

Before classification, multidomain proteins are first
separated into their constituent folds using a consensus
method which seeks agreement between three
independent algorithms.
Assignment of functions through structure

Structural data can help to assign functions in several
ways:

• It allows recognition of more distant homologues
than sequence data does.

• It allows detailed inspection of the functional site, to
suggest if and how the function may have evolved.

• For the superfolds, similarity of structure does not
necessarily mean similarity of function. However, the
active/binding sites are often conserved.
