Sequences
Cédric Notredame
Our Scope
Outline
wheat
?????
--DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
  | | |||||||| || | |||    ||| | ||||| ||||
KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA
EXTRAPOLATE
Homology?
??????
SwissProt
Same Sequence
Same Function
Same Origin
Same 3D Fold
Many Counter-examples!
An Alignment is a STORY
ADKPKRPLSAYMLWLN
ADKPKRPLSAYMLWLN
ADKPKRPLSAYMLWLN
Mutations
+
Selection
ADKPKRPKPRLSAYMLWLN
ADKPRRPLS-YMLWLN
Insertion
Deletion
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Mutation
S
AFGP with (ThrAlaAla)n
NOT Similar to Trypsinogen
Cédric Notredame (12/01/16)
SIMILAR Sequences
BUT
DIFFERENT origin
Same Sequence
Same Origin
Same Function
Same 3D Fold
Similar Sequence
Historical Legacy
Family            KS    KA
Histone3          6.4   0
Insulin           4.0   0.1
Interleukin I     4.6   1.4
Globin            5.1   0.6
Apolipoprot. AI   4.5   1.6
Interferon G      8.6   2.8
Different molecular clocks for different proteins: another prediction
[Figure: Venn diagram grouping the amino acids by property. Labels: Aliphatic (L, V, I), Aromatic, Hydrophobic, Polar, Small, Charged (+)]
On the surface,
CHARGE MATTERS
OmpR, Cter Domain
In the core,
SIZE MATTERS
Their Structure
We Do Not Have Them !!!
Their Function
Same Function
Same 3D Fold
The table that contains the costs for all the possible
substitutions is called the SUBSTITUTION MATRIX
How to derive that matrix?
[Figure: amino acids grouped by property. Labels: Hydrophobic, Aromatic, Small, Polar]
-For each mutation, set the substitution score to the log-odds ratio:
Score = log(Observed / Expected by chance)
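The log-odds rule can be sketched in a few lines of Python. This is only an illustration: the function name, the base-2 logarithm, and the two frequencies are assumptions made for the example, not values taken from a real matrix.

```python
import math

def log_odds_score(observed, expected, base=2.0):
    """Return log(observed / expected by chance) in the given base."""
    if observed <= 0 or expected <= 0:
        raise ValueError("frequencies must be positive")
    return math.log(observed / expected, base)

# Hypothetical frequencies, chosen only to show the sign of the score.
# A substitution seen more often than chance gets a positive score.
seen_more_than_chance = log_odds_score(observed=0.02, expected=0.01)
# A substitution seen less often than chance gets a negative score.
seen_less_than_chance = log_odds_score(observed=0.005, expected=0.01)

print(seen_more_than_chance, seen_less_than_chance)
```

A positive score marks a substitution that selection tolerates, a negative score one that it tends to remove.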
Insertion
Deletion
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Mutation
Scoring an Alignment
Most popular Substitution Matrices
PAM250
Blosum62 (Most widely used)
Raw Score
TPEA
APGA
Score = 1 + 6 + 0 + 2 = 9
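The raw score is simply the sum of the substitution scores, column by column. The sketch below reproduces the TPEA / APGA example; it assumes that the four terms of the sum correspond to the four columns in order, and the tiny table `PAIR_SCORES` is a stand-in for a full substitution matrix.

```python
# Pair scores copied from the terms of the slide's sum (1 + 6 + 0 + 2).
# Assumption: the terms follow the column order T/A, P/P, E/G, A/A.
# A real matrix (PAM250, BLOSUM62) covers every pair of residues.
PAIR_SCORES = {
    ("T", "A"): 1,
    ("P", "P"): 6,
    ("E", "G"): 0,
    ("A", "A"): 2,
}

def pair_score(a, b):
    """Look up a substitution score; the table is treated as symmetric."""
    if (a, b) in PAIR_SCORES:
        return PAIR_SCORES[(a, b)]
    return PAIR_SCORES[(b, a)]

def raw_score(seq1, seq2):
    """Sum the substitution scores of an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))

score = raw_score("TPEA", "APGA")
print(score)
```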
gap
Seq  AAGARFIELDTHE----CAT
       |||||||||||    |||
Seq  BBGARFIELDTHELASTCAT

Seq  GARFIELDTHE----CAT
     |||||||||||    |||
Seq  GARFIELDTHELASTCAT
ADKPKRPLSAYMLWLN
ADKPKRPLSAYMLWLN
ADKPKRPLSAYMLWLN
Mutations
+
Selection
ADKPKRPKPRLSAYMLWLN
ADKPRRPLS-YMLWLN
[Figure: %Sequence Identity plotted against Length (ticks: 30%, 30, 100). Labels: Similar Sequence / Similar Structure (Same 3D Fold); Different Sequence / Structure ????; Twilight Zone]
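Percent identity can be computed directly from an alignment. The sketch below uses the ADKP... alignment shown earlier; counting only the columns without a gap is an assumption, since several conventions exist for the denominator.

```python
def percent_identity(aligned1, aligned2):
    """Percent of identical residues over the columns that contain no gap."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned1, aligned2):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared

pid = percent_identity("ADKPRRP---LS-YMLWLN", "ADKPKRPKPRLSAYMLWLN")
print(round(pid, 1))
```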
Dot Matrices
QUESTION
What are the elements shared by
two sequences ?
Dot Matrices
>Seq1
THEFATCAT
>Seq2
THELASTCAT
T H E F A T C A T
Window
Stringency
T H E F A S T C A T
Dot Matrices
Sequences
Window size
Stringency
Dot Matrices
Stringency
Window=1, Stringency=1
Window=11, Stringency=7
Window=25, Stringency=15
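The effect of Window and Stringency can be sketched as follows. This is a minimal illustration that counts identical positions only; tools such as dotlet may score windows with a substitution matrix instead. The function names are invented for the example.

```python
def dot_matrix(seq_x, seq_y, window=1, stringency=1):
    """Return rows of booleans, one row per window start in seq_y.

    A cell is True when the two windows starting at that position share
    at least `stringency` identical residues out of `window`.
    """
    rows = []
    for j in range(len(seq_y) - window + 1):
        row = []
        for i in range(len(seq_x) - window + 1):
            matches = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            row.append(matches >= stringency)
        rows.append(row)
    return rows

def render(matrix):
    """Draw the matrix with '*' for a dot and '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)

# Window=1, Stringency=1: every shared letter gives a dot (noisy).
noisy = dot_matrix("THEFATCAT", "THELASTCAT", window=1, stringency=1)
# Window=3, Stringency=3: only exact 3-letter words give a dot.
clean = dot_matrix("THEFATCAT", "THELASTCAT", window=3, stringency=3)
print(render(noisy))
print()
print(render(clean))
```

Larger windows with a high stringency remove the isolated dots and leave the diagonals that correspond to shared segments.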
Dot Matrices
http://myhits.isb-sib.ch/cgi-bin/dotlet
Limits
-Visual aid
-Best Way to EXPLORE the Sequence Organisation
-Does NOT provide us with an ALIGNMENT
wheat
?????
--DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
  | | |||||||| || | |||    ||| | ||||| ||||
KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA
Global Alignments
-Take 2 Nice Protein Sequences
-A good Substitution Matrix (blosum)
-A Gap opening Penalty (GOP)
-A Gap extension Penalty (GEP)
[Figure: gap Cost plotted against gap length L; opening a gap costs GOP, each extension costs GEP]
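A gap cost built from GOP and GEP can be sketched as below. The exact convention is an assumption: here a gap of length L costs GOP + L * GEP, while some programs charge GOP + (L - 1) * GEP. The penalty values are hypothetical.

```python
def gap_cost(length, gop, gep):
    """Cost of one gap of `length` positions (assumed: GOP + length * GEP)."""
    if length < 0:
        raise ValueError("gap length cannot be negative")
    if length == 0:
        return 0
    return gop + gep * length

GOP = 10  # hypothetical gap opening penalty
GEP = 1   # hypothetical gap extension penalty

one_long_gap = gap_cost(4, GOP, GEP)
four_short_gaps = 4 * gap_cost(1, GOP, GEP)
print(one_long_gap, four_short_gaps)
```

With an opening penalty larger than the extension penalty, one long gap is cheaper than several short ones, so a single insertion or deletion event is preferred over many.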
Parsimony:
Evolution takes the simplest path
(So We Think)
gap
Seq  AAGARFIELDTHE----CAT
       |||||||||||    |||
Seq  BBGARFIELDTHELASTCAT

Seq  GARFIELDTHE----CAT
     |||||||||||    |||
Seq  GARFIELDTHELASTCAT
Global Alignments
-Take 2 Nice Protein Sequences
-A good Substitution Matrix (blosum)
-A Gap opening Penalty (GOP)
-A Gap extension Penalty (GEP)
-DYNAMIC PROGRAMMING
>Seq1
THEFATCAT
>Seq2
THEFASTCAT
THEFA-TCAT
THEFASTCAT
DYNAMIC PROGRAMMING
Global Alignments
DYNAMIC PROGRAMMING
----FAT    --FAT    ---F-AT
FAST---    FAST-    FAST---
Number of alignments: (L1+L2)! / ((L1)! * (L2)!)
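Evaluating the expression shows why trying every alignment is hopeless. The sketch below simply computes the formula as written on the slide; the function name is invented for the example.

```python
import math

def alignment_count(l1, l2):
    """Evaluate (L1+L2)! / (L1! * L2!) with exact integer arithmetic."""
    return math.factorial(l1 + l2) // (math.factorial(l1) * math.factorial(l2))

small = alignment_count(len("FAT"), len("FAST"))
large = alignment_count(100, 100)
print(small)
print(len(str(large)), "digits for two sequences of length 100")
```

Even for two short proteins the count is far beyond what can be enumerated, which is the motivation for dynamic programming.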
Global Alignments
DYNAMIC PROGRAMMING
Match=+1 (implied by the first filled cell)
MisMatch=-1
Gap=-1
        F   A   S   T
    0  -1  -2  -3  -4
F  -1   1   0  -1  -2
A  -2   0   2   1   0
T  -3  -1   1   1   2
Traceback through the cells 1, 2, 1, 2:
F A S T
F A - T
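The FAT / FAST example can be reproduced with a short dynamic programming sketch. It uses MisMatch=-1 and Gap=-1 as on the slide; Match=+1 is an assumption read off the matrix, and a single linear gap penalty is used instead of GOP/GEP. The function name is invented for the example.

```python
def global_align(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Fill the global alignment matrix and trace back one optimal alignment."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: only gaps are possible.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell is the best of: substitution, gap in seq2, gap in seq1.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Traceback from the bottom-right corner to the origin.
    out1, out2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out1.append(seq1[i - 1])
                out2.append(seq2[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
    return score, "".join(reversed(out1)), "".join(reversed(out2))

matrix, aligned_fat, aligned_fast = global_align("FAT", "FAST")
for row in matrix:
    print(row)
print(aligned_fat)
print(aligned_fast)
```

The matrix is filled in L1 * L2 steps, instead of enumerating every possible alignment.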
Global Alignments
DYNAMIC PROGRAMMING
GOP
GEP
Local Alignments
GLOBAL Alignment
LOCAL Alignment
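One common way to obtain a local alignment is the Smith-Waterman variation of dynamic programming: a cell is never allowed to drop below 0, and the best cell anywhere in the matrix is reported. The slides do not spell out the algorithm, so the sketch below is only an illustration; the scores (Match=+1, MisMatch=-1, Gap=-1) and the function name are assumptions, and only the score is computed, not the traceback.

```python
def local_score(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Return the best local alignment score (Smith-Waterman style)."""
    best = 0
    previous = [0] * (len(seq2) + 1)
    for i in range(1, len(seq1) + 1):
        current = [0]
        for j in range(1, len(seq2) + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            cell = max(
                0,                      # restart: drop a bad prefix
                previous[j - 1] + sub,  # substitution
                previous[j] + gap,      # gap in seq2
                current[j - 1] + gap,   # gap in seq1
            )
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best

garfield = local_score("AAGARFIELDTHECAT", "BBGARFIELDTHELASTCAT")
print(garfield)
```

Unlike the global version, unrelated ends (AA and BB here) cost nothing, so only the best shared segment is scored.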
Local Alignments
We now have a PairWise Comparison Algorithm,
We are ready to search Databases
Database Search
[Figure: a QUERY (Q) is passed to a Comparison Engine that scans a Database (SW). Hits are listed with values such as 1.10e-100, 1.10e-20, 1.10e-2, 1.10e-1 and 10]
E-values
How many times do we expect such an
Alignment by chance?
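The slides do not give a formula, but a standard form used by BLAST-like programs (Karlin-Altschul statistics) is E = K * m * n * exp(-lambda * S), where S is the raw score, m the query length and n the database length. The sketch below evaluates it; the values of K and lambda are hypothetical, since they depend on the scoring scheme.

```python
import math

def expected_hits(score, query_len, db_len, k, lam):
    """E = K * m * n * exp(-lambda * S): hits of score >= S expected by chance."""
    return k * query_len * db_len * math.exp(-lam * score)

K = 0.1       # hypothetical statistical parameter
LAMBDA = 0.3  # hypothetical statistical parameter

weak_hit = expected_hits(30, 200, 1_000_000, K, LAMBDA)
strong_hit = expected_hits(100, 200, 1_000_000, K, LAMBDA)
bigger_db = expected_hits(30, 200, 2_000_000, K, LAMBDA)
print(weak_hit, strong_hit, bigger_db)
```

A higher score gives a smaller E-value, and the same score becomes less significant as the database grows.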
CONCLUSION
Sequence Comparison
A few Addresses