Multiple Seq Alignment

Multiple Sequences Alignment
Homology: Definition
Homology: similarity that is the result of inheritance from a common ancestor.
Paralogs: related genes within one organism.
Orthologs: related genes in different species.
An alignment is a hypothesis of positional homology between bases or amino acids.
Why are multiple sequence alignments used?

. Related proteins can often indicate the likely function, structure, and evolution of a sequence.
. Multiple alignment is more sensitive than pairwise alignment for detecting homologs.
. It reveals conserved residues or motifs.
. A database search in effect performs a multiple sequence alignment.
. The regulatory regions of many genes contain consensus sequences for transcription factor-binding sites.
Information in Multiple Alignment

Conserved regions
. Regions that are invariant across the whole alignment.
. These usually indicate regions with a specific function.
. Conservation can be total or partial.
Phylogenetic analysis
. Tells you which sequences are most closely related.
. Sequences are arranged from the most closely related to the most distantly related.
Multiple Sequences Alignment -- Goal

. To generate a concise, information-rich summary of sequence data.
. Used to illustrate the similarity between a group of sequences.
. Used to illustrate the dissimilarity between a group of sequences.
. Alignments can be treated as models that can be used to test hypotheses.
Alignment can be easy or difficult
Easy
Difficult: because of insertions or deletions
The Methods of Multiple Sequences Alignment
Multiple Sequences Alignment - methods

Methods of solving the multiple alignment problem:
. Manual
. Dynamic Programming
. Hidden Markov Models (HMMs)
. Progressive Alignment
Manual Alignment
Manual alignment may be carried out because:
. The alignment is easy.
. There is some extraneous information (for example, structural).
. Automated alignment methods have encountered the local minimum problem.
. An automated alignment can be improved by hand.
Dynamic Programming Alignment

Dynamic Programming
Consider 2 protein sequences, each 100 amino acids long. If it takes 100^2 seconds to align these two sequences exhaustively, it will take 100^3 seconds to align 3 sequences, 100^4 seconds to align 4 sequences, and so on. Aligning 20 sequences exhaustively would take 100^20 = 10^40 seconds, which is about 3.17 x 10^32 years.
Limited to a small number of sequences.
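The growth described on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the cost model is simply length^N seconds for N sequences and a 365-day year; the function names are illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: 365-day year


def exhaustive_alignment_seconds(num_sequences, length=100):
    """Assumed cost model: length ** num_sequences seconds."""
    return length ** num_sequences


def seconds_to_years(seconds):
    return seconds / SECONDS_PER_YEAR


for n in (2, 3, 4, 20):
    secs = exhaustive_alignment_seconds(n)
    print(f"{n:>2} sequences: {secs:.3e} s = {seconds_to_years(secs):.3e} years")
```

Each extra sequence multiplies the cost by the sequence length, which is why exhaustive dynamic programming is only practical for a handful of sequences.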
Pairwise Alignment
Aligning two sequences: GATTC & GAATTC
Scoring: matches: +1, mismatches: 0, indel: -1

GATTC-
GAATTC
Score = 2 (three matches, two mismatches, one indel)

GA-TTC
GAATTC
Score = 4 (column scores 1 1 -1 1 1 1)
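The scoring scheme on this slide (match +1, mismatch 0, indel -1) can be applied mechanically. The sketch below scores a given gapped alignment and also runs a small Needleman-Wunsch recurrence to find the best achievable global score. The gap placements in the usage lines are assumptions chosen to reproduce the scores 2 and 4; the function names are illustrative.

```python
MATCH, MISMATCH, INDEL = 1, 0, -1


def score_alignment(row_a, row_b):
    """Sum the column scores of two equal-length gapped rows ('-' marks an indel)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total += INDEL
        elif a == b:
            total += MATCH
        else:
            total += MISMATCH
    return total


def best_global_score(seq_a, seq_b):
    """Needleman-Wunsch: best global alignment score under the same scheme."""
    prev = [j * INDEL for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        curr = [i * INDEL]
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (MATCH if a == b else MISMATCH)
            curr.append(max(diag, prev[j] + INDEL, curr[j - 1] + INDEL))
        prev = curr
    return prev[-1]


print(score_alignment("GATTC-", "GAATTC"))   # gap at the end
print(score_alignment("GA-TTC", "GAATTC"))   # gap inside
print(best_global_score("GATTC", "GAATTC"))  # optimum found by dynamic programming
```

Only the current and previous rows of the dynamic programming table are kept, since the score (not the traceback) is all that is needed here.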
Hidden Markov Models

. HMMER was written by Sean Eddy (http://hmmer.wustl.edu); it runs on UNIX platforms.
. Profile HMMs are probabilistic models that describe the likelihood that an amino acid residue occurs at each given position of an alignment.
. Two main uses:
  - search a sequence database with a single profile HMM
  - search a single query sequence against a library of HMMs
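To make the idea of per-position residue likelihoods concrete, here is a deliberately simplified sketch. It is not HMMER and not a full profile HMM: it has no insert or delete states and no transition probabilities, only per-column emission probabilities estimated from an ungapped toy alignment. The alignment, the pseudocount, and all function names are assumptions for illustration.

```python
import math
from collections import Counter


def column_probabilities(alignment, alphabet, pseudocount=1.0):
    """Per-column residue probabilities with additive smoothing; gaps are ignored."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        profile.append({res: (counts[res] + pseudocount) / total for res in alphabet})
    return profile


def log_likelihood(sequence, profile):
    """Log-probability of an ungapped sequence under the column model."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must equal the number of columns")
    return sum(math.log(col[res]) for res, col in zip(sequence, profile))


toy_alignment = ["GATTC", "GATTC", "GAATC"]  # made-up example data
profile = column_probabilities(toy_alignment, "ACGT")
print(log_likelihood("GATTC", profile))
print(log_likelihood("CCCCC", profile))
```

A sequence that resembles the alignment gets a higher log-likelihood than one that does not, which is the property a database search with a profile relies on.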
Progressive Alignment
. Devised by Feng and Doolittle in 1987.
. A heuristic method; as such, it is not guaranteed to find the optimal alignment.
. Based on pairwise alignment.
. The most successful implementation is Clustal (by Des Higgins).
ClustalW
ClustalW - Introduction
. General purpose: the comparison and alignment of DNA or protein sequences.
. Biologists can study the sequence patterns conserved through evolution and the ancestral relationships between different organisms.
. ClustalW runs on different operating systems, including WinXP, UNIX (Linux), and Macintosh.
. History: the first Clustal programme (1988) by Des Higgins, ClustalV (1992), ClustalW (1994), ClustalX.
. The latest version is ClustalW 1.83.
ClustalW download & WWW

Download
http://www.imtech.res.in/pub/mirror_sites/ebi/dos/clustalw/
http://iubio.bio.indiana.edu/soft/iubionew/molbio/dna/analysis/ClustalW/
ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/
WWW
http://www.ebi.ac.uk/clustalw (version 1.83)
Three main stages for ClustalW :

1. Pairwise alignment: calculate the distance matrix
2. Guide tree: unrooted Neighbour-Joining tree, then rooted NJ tree (guide tree) and sequence weights
3. Progressive alignment: align following the guide tree
Pairwise Alignment
. Each sequence is aligned pairwise with every other sequence.
  For example, for n sequences,
  C(n, 2) = n(n-1)/2
  pairwise alignments are calculated.
. Accurate scores come from full dynamic programming alignment using 2 gap penalties (opening and extending) and a full amino acid weight matrix.
. Each pairwise alignment is completely independent
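The number of pairwise alignments, n(n-1)/2, can be enumerated directly. A minimal sketch, with made-up sequence names:

```python
from itertools import combinations


def pair_count(n):
    """Number of unordered pairs among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


def pairwise_jobs(names):
    """Every unordered pair of sequence names, each listed once."""
    return list(combinations(names, 2))


names = ["seqA", "seqB", "seqC", "seqD", "seqE"]  # hypothetical names
jobs = pairwise_jobs(names)
print(len(jobs), pair_count(len(names)))
print(jobs[:3])
```

Because each pair is independent of the others, the list of jobs can be processed in any order, or in parallel.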
Calculate distance matrix
The pairwise scores are initially calculated as percent identity scores and are converted to distances by dividing by 100 and subtracting from 1.0, giving the number of differences per site.
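The conversion from percent identity to distance is a one-line formula. The sketch below also computes percent identity from an aligned pair; counting identities only over columns where neither row has a gap is an assumption made here, and the function names are illustrative.

```python
def percent_identity(row_a, row_b):
    """Identities divided by compared positions (gap columns excluded), times 100."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = identical = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def identity_to_distance(pid):
    """Distance = 1.0 - (percent identity / 100)."""
    return 1.0 - pid / 100.0


pid = percent_identity("GATTC-", "GAATTC")
print(pid, identity_to_distance(pid))
```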

Guide Tree: unrooted NJ tree

[Figure: unrooted Neighbour-Joining tree with branch lengths such as 0.17 and 0.13]
Generate a Neighbour-Joining guide tree from these pairwise distances. This guide tree gives the order in which the progressive alignment will be carried out.
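As a rough illustration of how Neighbour-Joining chooses what to join, the sketch below performs only the first step of the standard Saitou and Nei procedure: it computes the Q criterion for every pair, picks the minimum, and derives the two branch lengths. It is not ClustalW's own code, and the distance matrix is made-up example data.

```python
def nj_first_join(names, dist):
    """Return (name_i, name_j, branch_i, branch_j) for the first NJ join."""
    n = len(names)
    if n < 3:
        raise ValueError("neighbour joining needs at least 3 taxa")
    totals = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: small when i and j are close to each other
            # but far from everything else.
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    branch_i = dist[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    branch_j = dist[i][j] - branch_i
    return names[i], names[j], branch_i, branch_j


names = ["a", "b", "c", "d", "e"]
dist = [  # symmetric example distances, not real data
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_join(names, dist))
```

A full implementation would replace the joined pair with a new node, recompute distances to it, and repeat until the tree is resolved.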
Guide Tree: rooted NJ tree
Each sequence receives a weight that depends on its distance from the root of the tree, but sequences that share a branch with other sequences share the weight derived from that branch.
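One way to read the weighting rule is: walk from the root to each sequence, and for every branch on the path add its length divided by the number of sequences below it. The sketch below implements that reading on a tiny hand-made tree. The tree encoding (a leaf is a name, an internal node is a list of (branch_length, subtree) pairs), the branch lengths, and the absence of any final normalisation are all assumptions for illustration.

```python
def count_leaves(tree):
    """Number of sequences (leaves) below this node."""
    if isinstance(tree, str):
        return 1
    return sum(count_leaves(child) for _, child in tree)


def sequence_weights(tree, inherited=0.0, weights=None):
    """Sum, along each root-to-leaf path, branch length / leaves sharing that branch."""
    if weights is None:
        weights = {}
    if isinstance(tree, str):
        weights[tree] = inherited
        return weights
    for length, child in tree:
        share = length / count_leaves(child)
        sequence_weights(child, inherited + share, weights)
    return weights


# Hypothetical rooted tree: A and B share a branch of length 0.3; C stands alone.
rooted = [(0.3, [(0.1, "A"), (0.2, "B")]), (0.5, "C")]
print(sequence_weights(rooted))
```

Sequences A and B each receive half of the shared 0.3 branch, so closely related sequences are down-weighted relative to the more isolated sequence C.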

Progressive Alignment
. Align the two most closely related sequences first.
. This alignment is then fixed and will never change: once a gap, always a gap.
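The "once a gap, always a gap" rule means that a later step may add new gap columns to an already aligned group, but never removes or moves existing gaps. The sketch below shows only that bookkeeping: given a fixed alignment and the column positions where a later step wants gaps, it inserts whole gap columns into every row. The positions are supplied by hand here; choosing them is the job of the dynamic programming step, which is not shown. Names are illustrative.

```python
def insert_gap_columns(aligned_rows, positions):
    """Insert an all-gap column before each given column index, in every row."""
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("all rows must have the same length")
    result = []
    for row in aligned_rows:
        pieces = []
        previous = 0
        for pos in sorted(positions):
            pieces.append(row[previous:pos])
            pieces.append("-")
            previous = pos
        pieces.append(row[previous:])
        result.append("".join(pieces))
    return result


fixed = ["GA-TTC", "GAATTC"]            # earlier alignment, now frozen
print(insert_gap_columns(fixed, [4]))   # a later step opens a gap before column 4
```

The gap that was already present in the first row stays exactly where it was; only new columns are added.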
Summary
There are three main stages in ClustalW.
Thompson J.D., Higgins D.G., Gibson T.J. (1994). Nucleic Acids Res. 22:4673-4680.
ClustalW spends around 96% of its running time in the first stage, the pairwise alignment of the n sequences; the second and third stages account for the rest.
Perform ClustalW alignment
Input File
Prepare the input file:
. All sequences should be in one file.
. 7 formats are accepted: NBRF/PIR, EMBL/SwissProt, FASTA, GDE, Clustal, GCG/MSF, RSF.
. The file can be edited with Notepad, for example.
. FASTA is the most common format.
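For readers unfamiliar with the FASTA layout, each record is a header line starting with ">" followed by one or more sequence lines. The sketch below is a minimal reader for checking that an input file holds all sequences in that layout; it is not part of ClustalW, and the record names and sequences are made up.

```python
def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line has no name")
            name = fields[0]           # first word of the header is the name
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


example = """>seq1 hypothetical example
GATTC
>seq2
GAAT
TC
"""
print(parse_fasta(example))
```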
Main Menu
1. Sequence input from disk
2. Multiple alignment
3. Profile / structure alignment
4. Phylogenetic tree

Multiple alignment menu
1. Do complete multiple alignment now (slow/fast)
2. Produce guide tree only
3. Do alignment using old guide tree file
4. Slow / fast pairwise alignment
5. Pairwise alignment parameters
6. Multiple alignment parameters
7. Reset gaps before alignment
8. Screen display
9. Output format options

Profile / structure alignment menu
1. Input 1st. profile
2. Input 2nd. profile / sequence
3. Align 2nd. profile to 1st. profile
4. Align sequences to 1st. profile

Phylogenetic tree menu
1. Input alignment
2. Exclude positions with gaps
3. Correct for multiple substitutions
4. Draw tree now
5. Bootstrap tree
Toggle slow/fast pairwise alignment

Slow/accurate alignment
. Fine for short sequences.
. With many (>100) long (>1000 residues) sequences, full dynamic programming becomes extremely slow.
Fast/approximate alignment
. How it is made fast:
  - only exactly matching fragments are considered
  - only the best diagonals are used
Pairwise Alignment Parameter (1)

Slow alignment:
. Gap Open Penalty (GOP): the penalty for opening a gap (the initial gap penalty).
. Gap Extension Penalty (GEP): the penalty for extending a gap by 1 residue.

  ACGTAAATTTTTGG
  ACGT------TTGG

  The GOP is charged where the gap opens; the GEP is charged as the gap is extended.
. Protein Weight Matrix: Gonnet, BLOSUM, PAM
. DNA Weight Matrix: the scores assigned to matches and mismatches

[Figure: example Gonnet, BLOSUM, and PAM scoring matrices]
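The two gap penalties combine into a cost that depends on gap length. The sketch below assumes the common affine form, one opening penalty per gap plus one extension penalty per gapped residue; conventions differ on whether the first residue is also charged an extension, so treat the formula as an assumption. The penalty values are illustrative only.

```python
from itertools import groupby


def gap_lengths(aligned_row):
    """Lengths of each run of '-' in one aligned row."""
    return [len(list(run)) for char, run in groupby(aligned_row) if char == "-"]


def affine_gap_cost(length, gap_open, gap_extend):
    """Assumed model: opening penalty once, extension penalty per gapped residue."""
    if length <= 0:
        return 0.0
    return gap_open + length * gap_extend


row = "ACGT------TTGG"
for length in gap_lengths(row):
    print(length, affine_gap_cost(length, gap_open=10.0, gap_extend=0.1))
```

With a large opening penalty and a small extension penalty, one long gap is much cheaper than several short ones, which matches how insertions and deletions tend to occur.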
Pairwise Alignment Parameters (2)

Fast alignment:
. K-Tuple Size: the size of the exactly matching fragments.
  Increase for speed (max = 2 for protein, 4 for DNA); decrease for sensitivity.
. Top Diagonals: the number of best diagonals (those with the most K-tuple matches) that are used.
  Decrease for speed; increase for sensitivity.
. Window Size: the number of diagonals around each of the best diagonals that are used.
  Decrease for speed; increase for sensitivity.
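The role of the K-tuple size and the top diagonals can be seen in a small sketch: find every exact k-tuple shared by two sequences, record which diagonal (offset i - j) each match lies on, and keep the diagonals with the most matches. This is an illustration of the idea only, not ClustalW's fast-alignment code; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def ktuple_diagonals(seq_a, seq_b, k):
    """Count exact k-tuple matches on each diagonal, keyed by offset i - j."""
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    hits = Counter()
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], ()):
            hits[i - j] += 1
    return hits


def top_diagonals(hits, how_many):
    """The diagonals with the most k-tuple matches."""
    return [diagonal for diagonal, _ in hits.most_common(how_many)]


hits = ktuple_diagonals("GATTC", "GAATTC", k=2)
print(dict(hits))
print(top_diagonals(hits, 1))
```

A larger k gives fewer, more specific matches (faster, less sensitive); keeping more diagonals or a wider window around them does the opposite.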
Multiple Alignment Parameter

. Increasing the Gap Opening Penalty makes gaps less frequent.
. Increasing the Gap Extension Penalty makes gaps shorter.
. Delay Divergent Sequences: delays the alignment of the most distantly related sequences until the most closely related sequences have been aligned.
. DNA Transition Weight: gives transitions (A<->G, C<->T) a weight between 0 and 1; 0 scores them as mismatches, 1 scores them as matches.
  For distantly related DNA sequences the weight should be close to 0; for closely related DNA sequences a higher weight can be used.
. Protein Weight Matrix: the matrix used depends on how similar the sequences to be aligned at this alignment step are.
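The effect of the DNA transition weight can be sketched as a scoring function in which a transition scores somewhere between a mismatch and a match. The linear interpolation and the match and mismatch values below are assumptions for illustration, not ClustalW's internal scoring.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def dna_pair_score(a, b, transition_weight, match=1.0, mismatch=0.0):
    """Score one DNA column; transitions are weighted between mismatch and match."""
    if not 0.0 <= transition_weight <= 1.0:
        raise ValueError("transition weight must lie between 0 and 1")
    if a == b:
        return match
    if frozenset((a, b)) in TRANSITIONS:
        return mismatch + transition_weight * (match - mismatch)
    return mismatch


for weight in (0.0, 0.5, 1.0):
    print(weight, dna_pair_score("A", "G", weight), dna_pair_score("A", "C", weight))
```

At weight 0 the A/G pair scores like any other mismatch, and at weight 1 it scores like a match, while the A/C transversion is unaffected.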
Output File
CLUSTAL output : [filename].aln
GUIDE TREE : [filename].dnd
