PROBLEM
In this assignment, you will use the logistic regression model to perform face detection
in images. We will use a subset of the CBCL Face Database from MIT. The set contains 2429
faces and 4548 non-faces. Use the first 400 faces and 600 non-faces from the set to train
the model, and use the rest of the images as the test set. You are welcome to use more data
for training. Each image is of size 19x19 and is reshaped as a 1x361 vector. There are two
matrices in the Matlab data file, one for faces and one for non-faces. Each is an n-by-361 matrix,
where n is the number of images in that set and each row represents one image.
For each training image x, we assign a label y: y = 1 if x is a face image and y = -1 if x is not a face
image. The logistic regression model is
$$P(y \mid x, w) = \sigma(y\, w^T x) = \frac{1}{1 + \exp(-y\, w^T x)} \qquad (1)$$
Assume the prior probability distribution of w is Gaussian, with mean 0 and covariance $\lambda^{-1} I$,
where I is the identity matrix.
Given a data set $(X, y)$, where $X = (x_1, x_2, \ldots, x_n)$ represents the n training images and $y = (y_1, y_2, \ldots, y_n)$
represents the labels of the training images, we would like to find a parameter vector w which
maximizes the posterior probability $P(w \mid X, y)$. This is equivalent to minimizing
$$l(w) = \sum_{i=1}^{n} \ln\!\left(1 + \exp(-y_i\, w^T x_i)\right) + \frac{\lambda}{2}\, w^T w \qquad (2)$$
with $\lambda = 0.01$.
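To see where equation (2) comes from, note that by Bayes' rule the posterior is proportional to the likelihood times the Gaussian prior; taking the negative logarithm and dropping constants (a sketch of the standard MAP argument) gives

$$-\ln P(w \mid X, y) = -\sum_{i=1}^{n} \ln \sigma(y_i\, w^T x_i) + \frac{\lambda}{2}\, w^T w + \text{const} = \sum_{i=1}^{n} \ln\!\left(1 + \exp(-y_i\, w^T x_i)\right) + \frac{\lambda}{2}\, w^T w + \text{const},$$

so maximizing the posterior is the same as minimizing (2).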
1) Find the gradient of l with respect to w. Write a Matlab function myfunc.m that calculates both l(w) and its gradient.
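Differentiating equation (2) term by term, with $\sigma$ the logistic function from equation (1), gives the gradient that myfunc.m must return:

$$\nabla l(w) = -\sum_{i=1}^{n} \sigma(-y_i\, w^T x_i)\, y_i\, x_i + \lambda\, w,$$

since $\frac{d}{dw}\ln\!\left(1+\exp(-y_i w^T x_i)\right) = -\frac{\exp(-y_i w^T x_i)}{1+\exp(-y_i w^T x_i)}\, y_i\, x_i = -\sigma(-y_i w^T x_i)\, y_i\, x_i$.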
MATLAB CODE
../myfunc.m
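The listing of ../myfunc.m was not captured above. A minimal sketch of such a function, consistent with equation (2) and its gradient (the interface myfunc(w, X, y) and the value lambda = 0.01 are assumptions matching the driver script below, not the author's original file), could look like:

```matlab
function [f, g] = myfunc(w, X, y)
% l(w) and its gradient for L2-regularized logistic regression.
% X : n-by-361 data matrix, y : labels in {-1, +1}, w : 361-by-1.
lambda = 0.01;                          % assumed regularization weight
m = y(:) .* (X * w);                    % margins y_i * w'*x_i, n-by-1
f = sum(log(1 + exp(-m))) + (lambda/2) * (w' * w);
s = 1 ./ (1 + exp(m));                  % sigma(-y_i * w'*x_i), n-by-1
g = -X' * (s .* y(:)) + lambda * w;     % gradient, 361-by-1
end
```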
2) Find the optimal w using the Matlab function fminunc, starting from w = 0. Plot the
optimal w in a figure. Try the trust-region Newton-CG and BFGS methods.
[Figure 1: the optimal weight vector w (components 0-400, values roughly -10 to 10) obtained with trust-region Newton-CG (circles) and BFGS (stars); the two solutions nearly coincide.]
4) Comments
Both methods used for the minimization of equation (2) give very similar solutions, as seen
in Figure 1. However, after several tests on problems of different sizes, the trust-region
Newton-CG method took longer to converge than the BFGS method on my
personal computer. Since both methods obtained very similar estimates of the parameter
w of the regression model, the recognition accuracy does not change between minimization methods. What is interesting is that, in my results, the accuracy of recognizing a
face with this methodology is very poor, while the accuracy of identifying a non-face is better.
MATLAB CODE
clear all; close all;
% READ DATA
D = load('data.mat');
nFace = 600; nNonface = 600;
X = [D.facedata(1:nFace,:); D.nonfacedata(1:nNonface,:)];
y = [ones(1,nFace), -ones(1,nNonface)];   % +1 for faces, -1 for non-faces
w0 = zeros(361,1);
% fminunc: trust-region (large-scale) algorithm
options = optimoptions('fminunc','GradObj','on','MaxIter',1000,'Display','iter');
func = @(w) myfunc(w, X, y);
w1opt1 = fminunc(func, w0, options);
% fminunc: quasi-Newton BFGS algorithm
options = optimset('LargeScale','off','HessUpdate','bfgs','GradObj','on',...
    'MaxIter',1000,'Display','iter');
func = @(w) myfunc(w, X, y);
w1opt2 = fminunc(func, w0, options);
% PLOTTING
figure
plot(w1opt1,'ko')
hold on
plot(w1opt2,'k*')
pbaspect([5,2,1])
h_leg = legend('trust-region Newton-CG','BFGS');
set(h_leg,'FontSize',10)
set(gca,'FontSize',20)
print -painters -dpsc -r300 w.ps
% Probabilities on the training set
POpt1 = 1./(1 + exp(-(X*w1opt1)));
POpt2 = 1./(1 + exp(-(X*w1opt2)));
% PROBABILITIES COMPUTATION: P(face|x) on the held-out test images
PfOpt1    = 1./(1 + exp(-(D.facedata(nFace+1:end,:)*w1opt1)));
PnonfOpt1 = 1./(1 + exp(-(D.nonfacedata(nNonface+1:end,:)*w1opt1)));
PfOpt2    = 1./(1 + exp(-(D.facedata(nFace+1:end,:)*w1opt2)));
PnonfOpt2 = 1./(1 + exp(-(D.nonfacedata(nNonface+1:end,:)*w1opt2)));
% ACCURACY COMPUTATION: classify as face when P > 0.5
AccfOpt1    = sum(PfOpt1 > 0.5)/length(PfOpt1)
AccnonfOpt1 = sum(PnonfOpt1 < 0.5)/length(PnonfOpt1)
AccfOpt2    = sum(PfOpt2 > 0.5)/length(PfOpt2)
AccnonfOpt2 = sum(PnonfOpt2 < 0.5)/length(PnonfOpt2)
../hw4.m