2007/02/16
Lecture 3
DNA mutations
Probability and matrix models
Phylogenetic distance
Phylogenetic trees and their construction
http://147.8.101.93/MATH0011/
Introduction
Traditional taxonomy
MATH0011 Lecture 3
2007/02/16
Numerical taxonomy
Evolutionary taxonomy
DNA structure
DNA replication
TCGGCGCATAATC
DNA mutations
Introduction to Probability
Event
Union of events
Intersection of events
Complementary event
The complementary event of E, denoted by Ē, is the
event composed of all those outcomes not in E.
E.g., when tossing a die, Ē_even = E_odd = {1,3,5} and
Ē_4 = {1,2,3,5,6}.
Fact: P(Ē) = 1 − P(E).
Example: in the previous DNA sequence example,
since P(A) = 0.2, one has P({C,G,T}) = 0.8.
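The complement rule can be checked by direct enumeration. The sketch below assumes a fair six-sided die (all outcomes equally likely); the helper names `prob` and `complement` are illustrative only and do not come from the lecture.

```python
from fractions import Fraction

# Sample space for one toss of a die; a fair die is assumed here.
SAMPLE_SPACE = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    """Probability of an event (a set of outcomes) when outcomes are equally likely."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

def complement(event):
    """All outcomes of the sample space that are not in the event."""
    return SAMPLE_SPACE - event

E_even = frozenset({2, 4, 6})
E_4 = frozenset({4})

print(sorted(complement(E_even)))               # outcomes not in E_even
print(sorted(complement(E_4)))                  # outcomes not in E_4
print(prob(complement(E_4)) == 1 - prob(E_4))   # checks P(not E) = 1 - P(E)
```

Exact fractions are used instead of floats so that the equality check is not affected by rounding.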
Independent events
Conditional probability
Caution:
S0: ACTTGTCGGATGATCAGCGGTCCATGCACCTGACAACGGT
S1: ACATGTTGCTTGACGACAGGTCCATGCGCCTGAGAACGGC
There are 9 sites in S0 which are As. At these 9
sites of S1, there are 7 As, 1 G, 0 Cs and 1 T. Hence
P(S1=A | S0=A) = 7/9, P(S1=G | S0=A) = 1/9,
P(S1=C | S0=A) = 0, P(S1=T | S0=A) = 1/9.
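The counting behind these four conditional probabilities can be reproduced as follows. This is a minimal sketch that assumes S0 and S1 are aligned site by site with no gaps; the function name `conditional_given` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

S0 = "ACTTGTCGGATGATCAGCGGTCCATGCACCTGACAACGGT"
S1 = "ACATGTTGCTTGACGACAGGTCCATGCGCCTGAGAACGGC"

def conditional_given(s0, s1, j):
    """Estimate P(S1=i | S0=j) for each base i from two aligned sequences.

    Assumes base j occurs at least once in s0.
    """
    # Bases of S1 at exactly those sites where S0 carries base j.
    at_j = [b1 for b0, b1 in zip(s0, s1) if b0 == j]
    counts = Counter(at_j)
    return {i: Fraction(counts[i], len(at_j)) for i in "AGCT"}

probs_A = conditional_given(S0, S1, "A")
for i in "AGCT":
    print(f"P(S1={i} | S0=A) = {probs_A[i]}")
```

The four values sum to 1, since every site with an A in S0 carries exactly one base in S1.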
Each column sum is the total number of times base j (=
A, G, C or T) occurs in S0.
Each row sum is the total number of times base i (=
A, G, C or T) occurs in S1.
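A frequency table with these row and column sums can be tallied directly from the two sequences. The sketch below assumes the convention described above (rows indexed by the base i in S1, columns by the base j in S0, ordering A, G, C, T); the name `frequency_table` is illustrative.

```python
S0 = "ACTTGTCGGATGATCAGCGGTCCATGCACCTGACAACGGT"
S1 = "ACATGTTGCTTGACGACAGGTCCATGCGCCTGAGAACGGC"
BASES = "AGCT"  # ordering A, G, C, T

def frequency_table(s0, s1):
    """table[i][j] = number of sites with base i in S1 and base j in S0."""
    table = {i: {j: 0 for j in BASES} for i in BASES}
    for b0, b1 in zip(s0, s1):
        table[b1][b0] += 1
    return table

table = frequency_table(S0, S1)
column_sums = {j: sum(table[i][j] for i in BASES) for j in BASES}
row_sums = {i: sum(table[i][j] for j in BASES) for i in BASES}

for i in BASES:
    print(i, [table[i][j] for j in BASES], "row sum:", row_sums[i])
print("column sums:", [column_sums[j] for j in BASES])
```

Comparing `column_sums` with the base counts of S0, and `row_sums` with those of S1, confirms the two statements about the sums.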
We assume that:
Note:
Always use the ordering A, G, C, T.
The column sums of M are all 1.
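One way to see why the column sums equal 1 is to estimate M from data. The sketch below assumes M is estimated by dividing each column of the S0/S1 frequency table by its column sum, so that entry (i, j) approximates P(S1=i | S0=j); this estimation step and the name `column_normalize` are assumptions, not the lecture's stated definition of M.

```python
from fractions import Fraction

BASES = "AGCT"  # fixed ordering A, G, C, T for both rows and columns

# Counts tallied from the S0/S1 example: COUNTS[r][c] is the number of sites
# with base BASES[r] in S1 and base BASES[c] in S0.
COUNTS = [
    [7, 0, 1, 1],
    [1, 9, 2, 0],
    [0, 2, 7, 2],
    [1, 0, 1, 6],
]

def column_normalize(counts):
    """Divide every column by its sum, giving conditional probabilities."""
    n = len(counts)
    col_sums = [sum(counts[r][c] for r in range(n)) for c in range(n)]
    return [[Fraction(counts[r][c], col_sums[c]) for c in range(n)]
            for r in range(n)]

M = column_normalize(COUNTS)
for base, row in zip(BASES, M):
    print(base, [str(x) for x in row])
print("column sums:", [str(sum(M[r][c] for r in range(4))) for c in range(4)])
```

Each column lists the probabilities of all four possible outcomes for a fixed starting base, which is why it must sum to 1.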
Note that
P_{i|j} P_j = P(S1=i | S0=j) P(S0=j) = P(S1=i and S0=j)
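This identity can be checked numerically on the S0/S1 example. The sketch below assumes the probabilities are estimated as relative frequencies from the 40 aligned sites; the final step, summing the joint probabilities over j to obtain P(S1=i), is the law of total probability and is added here for illustration.

```python
from fractions import Fraction

# COUNTS[r][c]: sites with base r in S1 and base c in S0 (ordering A, G, C, T),
# tallied from the S0/S1 example.
COUNTS = [
    [7, 0, 1, 1],
    [1, 9, 2, 0],
    [0, 2, 7, 2],
    [1, 0, 1, 6],
]
n = len(COUNTS)
total = sum(sum(row) for row in COUNTS)  # number of sites

col_sums = [sum(COUNTS[r][c] for r in range(n)) for c in range(n)]
P0 = [Fraction(col_sums[c], total) for c in range(n)]             # P(S0=j)
M = [[Fraction(COUNTS[r][c], col_sums[c]) for c in range(n)]
     for r in range(n)]                                           # P(S1=i | S0=j)

# Joint probabilities P(S1=i and S0=j) = P(S1=i | S0=j) * P(S0=j).
joint = [[M[r][c] * P0[c] for c in range(n)] for r in range(n)]

# Summing over j gives the base distribution of S1.
P1 = [sum(joint[r]) for r in range(n)]
print("P(S1=i):", [str(p) for p in P1])
```

The product recovers the relative frequency of each (i, j) pair in the table, so the conditional and joint descriptions carry the same information once P(S0=j) is known.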
it is a square matrix,
Hence the entries of M^t give the conditional probabilities
P(St=i | S0=j) where i, j = A, G, C, or T.
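A small computation illustrates this. The sketch assumes that M^t denotes the t-th matrix power and that the same matrix M applies at every time step; the numerical entries of M are illustrative values estimated from the S0/S1 example, and `mat_mul` and `mat_pow` are hypothetical helper names.

```python
from fractions import Fraction as F

# Illustrative M in the ordering A, G, C, T; column j holds P(S1=i | S0=j).
M = [
    [F(7, 9), F(0),     F(1, 11), F(1, 9)],
    [F(1, 9), F(9, 11), F(2, 11), F(0)],
    [F(0),    F(2, 11), F(7, 11), F(2, 9)],
    [F(1, 9), F(0),     F(1, 11), F(6, 9)],
]

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def mat_pow(m, t):
    """t-th power of a square matrix m, for an integer t >= 0."""
    n = len(m)
    result = [[F(int(r == c)) for c in range(n)] for r in range(n)]  # identity
    for _ in range(t):
        result = mat_mul(m, result)
    return result

M3 = mat_pow(M, 3)
# Entry (i, j) of M3 plays the role of P(S3=i | S0=j).
print("P(S3=A | S0=A) =", M3[0][0], "~", float(M3[0][0]))
print("column sums:", [str(sum(M3[r][c] for r in range(4))) for c in range(4)])
```

The powers are only defined because M is square, and each power again has column sums equal to 1, as a matrix of conditional probabilities should.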