$\hat{\mu} = (2, 3)$, we obtain
$$\hat{\Sigma} = \frac{1}{N} \sum_i (x_i - \hat{\mu}_{y_i}) \cdot (x_i - \hat{\mu}_{y_i})^T$$
• Note that each $x_i$ has its own class mean $\hat{\mu}_{y_i}$ subtracted prior to taking the outer product. This gives us the "pooled" estimate of $\Sigma$ (a sketch follows).
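As an illustration, here is a minimal NumPy sketch of this pooled estimate (the function name and argument layout are my own, not from the lecture):

```python
import numpy as np

def pooled_covariance(X, y, means):
    """Pooled covariance estimate for LDA.

    X     : (N, d) data matrix
    y     : (N,) integer class labels
    means : dict mapping class label k -> estimated mean vector mu_k
    """
    N, d = X.shape
    S = np.zeros((d, d))
    for x_i, y_i in zip(X, y):
        diff = x_i - means[y_i]       # subtract the point's own class mean
        S += np.outer(diff, diff)     # outer product (x - mu)(x - mu)^T
    return S / N                      # one Sigma-hat shared by all classes
```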
LDA learns an LTU
• Consider the 2-class case with a 0/1 loss function. Recall that
$$P(y = 0 \mid x) = \frac{P(x, y = 0)}{P(x, y = 0) + P(x, y = 1)}$$
$$P(y = 1 \mid x) = \frac{P(x, y = 1)}{P(x, y = 0) + P(x, y = 1)}$$
• Also recall from our derivation of the Logistic Regression classifier that we should classify into class $\hat{y} = 1$ if
$$\log \frac{P(y = 1 \mid x)}{P(y = 0 \mid x)} > 0$$
• Hence, for LDA, we should classify into $\hat{y} = 1$ if
$$\log \frac{P(x, y = 1)}{P(x, y = 0)} > 0$$
because the denominators cancel.
• Plugging in the Gaussian class-conditional densities (with shared $\Sigma$) gives
$$\log \frac{P(x, y = 1)}{P(x, y = 0)} = \log \frac{P(y = 1)}{P(y = 0)} - \frac{1}{2}\left( [x - \mu_1]^T \Sigma^{-1} [x - \mu_1] - [x - \mu_0]^T \Sigma^{-1} [x - \mu_0] \right)$$
• Expand the two quadratic forms and collect similar terms. Note that the quadratic terms in $x$ cancel! This leaves only terms linear in $x$:
$$[x - \mu_1]^T \Sigma^{-1} [x - \mu_1] - [x - \mu_0]^T \Sigma^{-1} [x - \mu_0] = x^T \Sigma^{-1}(\mu_0 - \mu_1) + (\mu_0 - \mu_1)^T \Sigma^{-1} x + \mu_1^T \Sigma^{-1} \mu_1 - \mu_0^T \Sigma^{-1} \mu_0$$
• Let
$$w = \Sigma^{-1}(\mu_1 - \mu_0)$$
$$c = \log \frac{P(y = 1)}{P(y = 0)} - \frac{1}{2} \mu_1^T \Sigma^{-1} \mu_1 + \frac{1}{2} \mu_0^T \Sigma^{-1} \mu_0$$
Then the rule "classify into $\hat{y} = 1$" becomes
$$w \cdot x + c > 0.$$
This is an LTU (linear threshold unit).
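As a concrete companion to the derivation, here is a minimal NumPy sketch of the resulting LTU (the names lda_ltu and classify are my own; the lecture gives only the math):

```python
import numpy as np

def lda_ltu(mu0, mu1, Sigma, prior1=0.5):
    """Compute the LTU parameters (w, c) from the LDA derivation."""
    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu1 - mu0)
    c = (np.log(prior1 / (1.0 - prior1))       # log P(y=1)/P(y=0)
         - 0.5 * mu1 @ Sigma_inv @ mu1
         + 0.5 * mu0 @ Sigma_inv @ mu0)
    return w, c

def classify(x, w, c):
    """Predict y-hat = 1 iff w . x + c > 0."""
    return int(w @ x + c > 0)
```

Note that $w$ and $c$ depend on the data only through the estimated means, shared covariance, and class priors.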
Two Geometric Views of LDA
View 1: Mahalanobis Distance
• The quantity $D_M(x, u)^2 = (x - u)^T \Sigma^{-1} (x - u)$ is known as the (squared) Mahalanobis distance between $x$ and $u$. We can think of the matrix $\Sigma^{-1}$ as a linear distortion of the coordinate system that converts the standard Euclidean distance into the Mahalanobis distance.
• Note that
$$\log P(x \mid y = k) \propto \log \pi_k - \frac{1}{2}\left[(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)\right]$$
$$\log P(x \mid y = k) \propto \log \pi_k - \frac{1}{2} D_M(x, \mu_k)^2$$
• Therefore, we can view LDA as computing
– $D_M(x, \mu_0)^2$ and $D_M(x, \mu_1)^2$
and then classifying $x$ according to which mean ($\mu_0$ or $\mu_1$) is closest in Mahalanobis distance, corrected by $\log \pi_k$ (see the sketch below).
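Here is a short NumPy sketch of this distance-based view (helper names are my own):

```python
import numpy as np

def mahalanobis_sq(x, u, Sigma_inv):
    """Squared Mahalanobis distance (x - u)^T Sigma^{-1} (x - u)."""
    d = x - u
    return d @ Sigma_inv @ d

def lda_nearest_mean(x, mus, priors, Sigma):
    """Classify x by prior-corrected Mahalanobis distance to each class mean."""
    Sigma_inv = np.linalg.inv(Sigma)
    scores = [np.log(pi_k) - 0.5 * mahalanobis_sq(x, mu_k, Sigma_inv)
              for mu_k, pi_k in zip(mus, priors)]
    return int(np.argmax(scores))
```

With equal priors this reduces to nearest-mean classification in the distorted metric, and in the 2-class case it draws exactly the same decision boundary as the LTU above.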
Mahalanobis Distance
• Which point is closer to the vertical line?
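The figure for this slide did not survive extraction; as a stand-in, here is a small numeric illustration (the covariance and points are my own) of the same idea: an anisotropic $\Sigma$ can make the Euclidean-farther point the Mahalanobis-closer one.

```python
import numpy as np

# Hypothetical covariance: large variance along the first axis, small along the second.
Sigma = np.array([[9.0, 0.0],
                  [0.0, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

origin = np.zeros(2)
a = np.array([3.0, 0.0])   # Euclidean distance 3, along the high-variance axis
b = np.array([0.0, 2.0])   # Euclidean distance 2, along the low-variance axis

def d_m_sq(x, u):
    d = x - u
    return d @ Sigma_inv @ d

print(d_m_sq(a, origin))   # 1.0 -> a is Mahalanobis-closer, despite being Euclidean-farther
print(d_m_sq(b, origin))   # 4.0
```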