Fold Lib

Dept. of Biotechnology, BCET
FOLD LIBRARY
MODULE IV
BT-701
INTRODUCTION
• Predicting the 3D structure of a protein from its linear sequence is a great challenge.

• Proteins are chains built from 20 kinds of amino acids that fold into unique 3D structures.
• These structures are determined by the amino acid sequence.
INTRODUCTION
Two experimental methods are mainly used to determine the 3D structures of proteins:

• X-Ray Crystallography.
• Nuclear Magnetic Resonance (NMR) Spectroscopy.
Disadvantages:
• Time consuming.
• Expensive.
There is therefore a need for fast and reliable computational methods that predict three-dimensional structures from protein sequences.
INTRODUCTION
Proteins can have similar structural folds even when they share no sequence or functional similarity.

Fold recognition methods try to identify, from a library of structure templates, the fold adopted by a protein given only its sequence, and then generate an alignment between the query and the template.
Fold recognition methods are especially useful when:
• The query sequence has little or no similarity to any sequence of known structure.
• Some template in the structure library represents the true fold of the query sequence.
PROBLEM DEFINITION
• Construct a protein structure template library.
• Design a scoring function to measure the fitness between the target sequence and the template.
• Design an efficient algorithm for searching over all the templates in the library.
• Find the best alignment between the target sequence and the template by minimizing the scoring function.
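The four steps above can be illustrated with a minimal threading sketch. It assumes a hypothetical template library in which each template is reduced to a string of per-position environments ('B' buried, 'E' exposed), a two-class residue alphabet (hydrophobic/polar) and made-up costs; real fold-recognition methods use richer structural environments and statistically derived potentials. The names `thread`, `recognize_fold`, `COST` and `GAP` are illustrative only. The alignment step is ordinary global dynamic programming that minimizes the total cost, and the search step simply scores every template.

```python
"""Toy fold-recognition (threading) sketch.

Assumptions (hypothetical, for illustration only):
- each template is a string of per-position environments: 'B' buried, 'E' exposed;
- residues are classed only as hydrophobic (H) or polar (P);
- lower scores are better, so the scoring function is minimized.
"""

HYDROPHOBIC = set("AVILMFWC")

# Hypothetical environment-preference costs: COST[(residue_class, environment)]
COST = {
    ("H", "B"): -1.0,  # hydrophobic residue buried: favourable
    ("H", "E"): 1.0,   # hydrophobic residue exposed: unfavourable
    ("P", "B"): 1.0,   # polar residue buried: unfavourable
    ("P", "E"): -1.0,  # polar residue exposed: favourable
}
GAP = 2.0  # hypothetical cost of leaving a residue or template position unaligned


def residue_class(aa: str) -> str:
    return "H" if aa in HYDROPHOBIC else "P"


def thread(seq: str, env: str) -> float:
    """Minimum cost of aligning seq onto a template's environment string
    (global dynamic-programming alignment with a linear gap cost)."""
    n, m = len(seq), len(env)
    # d[i][j] = best cost of aligning seq[:i] with env[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * GAP
    for j in range(1, m + 1):
        d[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = d[i - 1][j - 1] + COST[(residue_class(seq[i - 1]), env[j - 1])]
            d[i][j] = min(match, d[i - 1][j] + GAP, d[i][j - 1] + GAP)
    return d[n][m]


def recognize_fold(seq: str, library: dict) -> tuple:
    """Thread the query onto every template; return (best template, its cost)."""
    scores = {name: thread(seq, env) for name, env in library.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical two-template library.
    library = {"template_A": "BBBEEEBBB", "template_B": "EEEBBBEEE"}
    print(recognize_fold("LIVKDELAV", library))
```

Because the scoring function is minimized, the template with the lowest threading cost is reported as the predicted fold; here the query's hydrophobic/polar pattern matches `template_A` position by position, giving a cost of -9.0.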
Algorithms in use
• Neural networks (GenTHREADER).
• Bayesian networks.
• Structural pattern-based methods (SPREK).
• Support vector machines (SVM).
• Evolutionary methods (genetic algorithms, Monte Carlo).
• Parallel evolutionary methods for protein fold recognition.
Fold Libraries
The two most important and best-known fold libraries are:
• SCOP (Structural Classification of Proteins).
• CATH (Class, Architecture, Topology, Homologous superfamily).
SCOP
Nearly all proteins have structural similarities with other proteins and, in some of these cases, share a common evolutionary origin.

Knowledge of these relationships makes important contributions to molecular biology and allied sciences.
It plays an important role in interpreting the sequences produced by genome projects and helps in understanding the evolution of proteins and of development.
SCOP
The number of proteins whose structures have been determined by NMR and X-ray crystallography has grown exponentially.
The main objective of SCOP is to facilitate understanding of, and access to, these structures.
SCOP provides a detailed and comprehensive description of the structural and evolutionary relationships of proteins whose structures have been determined.
It includes all proteins in the PDB as well as those whose structures have been published but are not included in the PDB.
SCOP
The classification is based on evolutionary relationships and on the principles that govern the 3D structures of proteins.
Striking regularities are observed in the ways secondary structures are assembled (Levitt & Chothia, 1976) and in the topologies of polypeptide chains (Richardson, 1976).
These regularities arise from the intrinsic physical and chemical properties of proteins.
SCOP
The SCOP classification is constructed essentially by visual inspection and comparison of structures, though automatic tools are used to make the task manageable.
The unit of classification is usually the protein domain.
Small proteins and most medium-sized ones contain a single domain and are therefore treated as a whole.
The domains in large proteins are usually classified individually.
SCOP
FAMILY:
Proteins are clustered into families on the basis of either of the following criteria:
• Residue identities of 30% or more.
• Lower sequence identities, but very similar functions and structures (e.g. globins with sequence identities of 15%).
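The family criteria can be written as a simple rule. This sketch assumes percent identity is computed over aligned, gap-free positions (other conventions divide by the shorter sequence length), and it represents the "very similar functions and structures" criterion, which in SCOP is an expert judgement, as a plain boolean flag. Both function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned positions where neither sequence has a gap ('-').

    Assumption: gap-containing columns are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def same_scop_family(identity: float, similar_structure_and_function: bool) -> bool:
    """Family rule as stated in the slide: >= 30% identity, OR lower identity with
    very similar structure and function (a curator's judgement, modelled as a flag)."""
    return identity >= 30.0 or similar_structure_and_function
```

For example, `percent_identity("ACDEFG", "ACDQ-G")` gives 80.0 (4 identical residues out of 5 gap-free positions), while a globin pair at 15% identity qualifies only through the second criterion.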
SCOP
SUPERFAMILY:
Families whose proteins have low sequence identities, but whose structures and functional features suggest a probable common evolutionary origin, are grouped into superfamilies.
SCOP
COMMON FOLD:
Superfamilies and families are defined as having a common fold if their proteins have the same major secondary structures in the same arrangement and with the same topological connections.
Proteins placed in the same fold category need not share a common evolutionary origin: their structural similarities may arise simply from the physics and chemistry of proteins.
SCOP
CLASS:
For convenience, different folds have been grouped into classes. Most folds are assigned to one of five structural classes on the basis of their secondary structure composition:
• All alpha.
• All beta.
• Alpha and beta (α/β): mainly parallel β-sheets (β-α-β units).
• Alpha plus beta (α+β): mainly antiparallel β-sheets, with segregated α and β regions.
• Multi-domain (for proteins whose domains have different folds and for which no homologues are currently known).
SCOP
Two search facilities are available in SCOP:
• A “homology search”, which lets users enter a sequence and obtain a list of any structures to which it has significant similarity.
• A “keyword” search, which looks for a word entered by the user in both the text of the SCOP database and the headers of the PDB files.
CATH
The CATH classification of protein domain structures was established in 1993 as a hierarchical clustering of protein domain structures into evolutionary and structural families, according to their sequence and structural similarity.
There are four major levels:
• Class.
• Architecture.
• Topology.
• Homologous superfamily.
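CATH labels each domain with a numeric code that has one field per level, written C.A.T.H (codes such as `3.40.50.300`). The sketch below, using hypothetical domain names, splits such codes into their level prefixes and groups domains at a chosen level; domains that share a longer prefix are related at a deeper level of the hierarchy. The function names are illustrative, not part of any CATH tool.

```python
from collections import defaultdict

# The four CATH levels, from most general to most specific.
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def cath_levels(code: str) -> dict:
    """Map each CATH level to the prefix of the code that identifies it."""
    parts = code.split(".")
    if len(parts) != len(LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a code like '3.40.50.300', got {code!r}")
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(LEVELS)}


def group_by_level(domains: dict, level: str) -> dict:
    """Group domain names by the CATH node they share at the given level."""
    groups = defaultdict(list)
    for name, code in domains.items():
        groups[cath_levels(code)[level]].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical domain names with illustrative CATH codes.
    domains = {"dom1": "3.40.50.300", "dom2": "3.40.50.720", "dom3": "2.60.40.10"}
    print(group_by_level(domains, "Topology"))
```

Here `dom1` and `dom2` fall into the same topology node (`3.40.50`) but different homologous superfamilies, while `dom3` differs already at the class level.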
CATH
CATH uses both phylogenetic and phenetic descriptors of protein domain relationships.
At the lowest of these levels, proteins are grouped into homologous superfamilies if they have either significant sequence similarity (≥35% identity) or high structural similarity together with some sequence similarity (≥20% identity).
Structural similarity is assessed automatically with a score on which identical proteins score 100, homologous proteins generally score 80 or more, and more distantly related proteins score around 70.
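The grouping rule and score bands above combine into a small decision function. In this sketch the cut-off of 80 for "high structural similarity" is an assumption taken from the statement that homologous proteins generally score 80 or more, and the "weak" band below 70 is a hypothetical label; CATH's structural score comes from its own comparison program (SSAP), and real CATH curation is not reduced to this single test.

```python
def cath_homologous(identity: float, structure_score: float,
                    score_cutoff: float = 80.0) -> bool:
    """True if two domains would be grouped as homologues under the rule above.

    identity:        percent sequence identity (0-100).
    structure_score: structural similarity score on a 0-100 scale (100 = identical).
    score_cutoff:    assumed threshold for 'high structural similarity'.
    """
    if identity >= 35.0:  # significant sequence similarity alone is enough
        return True
    # Otherwise require some sequence similarity AND high structural similarity.
    return identity >= 20.0 and structure_score >= score_cutoff


def describe_score(structure_score: float) -> str:
    """Rough reading of the score bands quoted above (the last band is hypothetical)."""
    if structure_score >= 100.0:
        return "identical"
    if structure_score >= 80.0:
        return "typical of homologues"
    if structure_score >= 70.0:
        return "typical of more distant relatives"
    return "weak structural similarity"
```

A pair at 25% identity with a score of 85 is grouped, whereas the same identity with a score of 72 is not, and 15% identity is rejected whatever the structural score.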
CATH
The architecture level in CATH groups proteins whose folds have similar 3D arrangements of secondary structures, regardless of their connectivity.
E.g. barrel, sandwich, propeller.
CATH
The class level reflects the proportions of α-helix and β-strand secondary structure.
The three major classes recognized are:
• Mainly α.
• Mainly β.
• Alpha-beta (mixed α and β).
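Because the class depends only on secondary-structure composition, it can be approximated from a per-residue secondary-structure string (here 'H' helix, 'E' strand, any other character coil, in the style of DSSP assignments). The cut-offs in this sketch (at least 80% of secondary-structure residues helical for mainly α, at most 20% for mainly β, and a 20% minimum secondary-structure content) are hypothetical and are not CATH's published parameters.

```python
def cath_class(ss: str, min_content: float = 0.2,
               alpha_cutoff: float = 0.8, beta_cutoff: float = 0.2) -> str:
    """Assign a CATH-like class from a per-residue secondary-structure string.

    'H' = helix, 'E' = strand, any other character = coil.
    All cut-offs are hypothetical, chosen for illustration.
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    helix = ss.count("H")
    strand = ss.count("E")
    content = (helix + strand) / len(ss)
    if helix + strand == 0 or content < min_content:
        return "Low secondary-structure content"
    # Fraction of secondary-structure residues that are helical.
    alpha_share = helix / (helix + strand)
    if alpha_share >= alpha_cutoff:
        return "Mainly alpha"
    if alpha_share <= beta_cutoff:
        return "Mainly beta"
    return "Alpha-beta"
```

Using the share of helix among all secondary-structure residues, rather than raw helix content, keeps the decision independent of how much coil a domain has.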
Before classification, multidomain proteins are first separated into their constituent domains using a consensus method that seeks agreement between three independent algorithms.
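One way to express "agreement between three independent algorithms" is to accept domain boundaries only when every algorithm predicts the same number of domains and each boundary position agrees within a tolerance, leaving the protein for manual inspection otherwise. The agreement rule, the 20-residue tolerance and the function name are assumptions for illustration, not CATH's exact procedure.

```python
from typing import List, Optional, Sequence


def consensus_boundaries(predictions: Sequence[Sequence[int]],
                         tolerance: int = 20) -> Optional[List[int]]:
    """Return agreed domain-boundary positions, or None if the predictions disagree.

    predictions: one list of boundary residue numbers per algorithm
                 (an empty list means 'single domain').
    tolerance:   maximum spread, in residues, allowed between matching boundaries.
    """
    if len(predictions) < 2:
        raise ValueError("need predictions from at least two algorithms")
    # All algorithms must agree on how many boundaries (hence domains) there are.
    if len({len(p) for p in predictions}) != 1:
        return None
    consensus = []
    # Compare the k-th boundary predicted by each algorithm.
    for positions in zip(*(sorted(p) for p in predictions)):
        if max(positions) - min(positions) > tolerance:
            return None  # disagreement: leave for manual inspection
        consensus.append(round(sum(positions) / len(positions)))
    return consensus
```

For example, boundaries at residues 120, 125 and 118 give the consensus `[121]`, while a prediction of 160 from one algorithm, or a single-domain call, makes the sketch return `None`.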
Assignment of functions through structure
Structural data can help to assign functions in several ways:
• They allow recognition of more distant homologues than sequence data alone.
• They allow detailed inspection of the functional site, to suggest whether and how the function may have evolved.
• For the superfolds, similarity of structure does not necessarily mean similarity of function. However, the active/binding sites are often conserved.
