
MACHINE LEARNING: COL774

ASSIGNMENT 1

1. Linear Regression
a. First, x was normalized using its mean and variance, and the intercept term was
added to x. In a while loop, the gradient-descent update equation was applied, and the
loop terminated only when the change in every component of θ was less than 1×10⁻⁶.
Different learning rates were tried. The final answer was more or less the same, but the
number of iterations required varied. 0.4 was finally chosen as the best value.
Results:
Learning rate = 0.4
Stopping criterion: the change in every component of θ is less than 0.000001
Final θ = [5.8391, 4.5930]
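
A minimal sketch of this procedure (the report's actual code is not shown here; NumPy is assumed, and the function name, tolerance, and synthetic data below are illustrative):

import numpy as np

def linear_regression_gd(x, y, lr=0.4, tol=1e-6):
    # Normalize the feature to zero mean and unit variance.
    x = (x - x.mean()) / x.std()
    # Add the intercept term as a column of ones.
    X = np.column_stack([np.ones_like(x), x])
    theta = np.zeros(2)
    while True:
        grad = X.T @ (X @ theta - y) / len(y)    # gradient of J(theta)
        new_theta = theta - lr * grad            # batch gradient-descent update
        if np.all(np.abs(new_theta - theta) < tol):
            return new_theta
        theta = new_theta

# Illustrative usage on synthetic data (the assignment's data files are not loaded here).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 5.8 + 4.6 * (x - x.mean()) / x.std() + rng.normal(scale=0.3, size=100)
theta = linear_regression_gd(x, y)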
b. The data points and the fitted line were simply plotted.

c. First, arrays of values for the two components of θ were taken over an interval at a
spacing of 0.1. J(θ) was calculated for each pair and plotted as a mesh using the surf()
function. θ was then updated using the update equation, and J(θ) was calculated and
plotted at each iteration. Hence we got the plot given below.
Shown here is J(θ), plotted as *s on the surface of the mesh.

d. This time we used contour() instead of surf(), plotting the contours of J(θ) over the
2-D plane of the two components of θ.
Shown below are the values of θ at each iteration of gradient descent.
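
The report's plots appear to have been produced with MATLAB's surf() and contour(); a rough equivalent sketch for parts (c) and (d) using NumPy/Matplotlib, with illustrative data, grid ranges, and placeholder gradient-descent iterates in theta_history:

import numpy as np
import matplotlib.pyplot as plt

def cost(x, y, t0, t1):
    # J(theta) = (1/2m) * sum((t0 + t1*x - y)^2)
    err = t0 + t1 * x - y
    return np.mean(err ** 2) / 2

# Illustrative data standing in for the assignment's (normalized) dataset.
rng = np.random.default_rng(0)
x_norm = rng.normal(size=100)
y = 5.8 + 4.6 * x_norm + rng.normal(scale=0.5, size=100)
theta_history = [(0.0, 0.0), (2.0, 1.5), (4.0, 3.0), (5.8, 4.6)]  # placeholder iterates

# Grid of theta values at a spacing of 0.1 (ranges are illustrative).
t0_vals = np.arange(-1.0, 10.0, 0.1)
t1_vals = np.arange(-1.0, 10.0, 0.1)
T0, T1 = np.meshgrid(t0_vals, t1_vals)
J = np.array([[cost(x_norm, y, a, b) for a in t0_vals] for b in t1_vals])

# Part (c): mesh of J(theta) with the gradient-descent iterates marked as stars.
ax = plt.figure().add_subplot(projection='3d')
ax.plot_surface(T0, T1, J, alpha=0.5)
ax.scatter([t[0] for t in theta_history],
           [t[1] for t in theta_history],
           [cost(x_norm, y, *t) for t in theta_history], c='r', marker='*')

# Part (d): contours of J(theta) with the same iterates overlaid.
plt.figure()
plt.contour(T0, T1, J, 25)
plt.plot([t[0] for t in theta_history], [t[1] for t in theta_history], 'r*')
plt.show()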

e. For learning rate = 0.1

For learning rate = 0.5

For learning rate = 0.9

For learning rate = 1.3

For learning rate = 2.1, the values diverge to infinity and hence no plot is formed.
For learning rates less than 1, the values converge slowly towards the minimum. The
lower the learning rate, the farther from the minimum the plotted descent starts and the
more slowly it moves towards it. With increasing learning rate, the descent reaches lower
values of J(θ) right from the start and attains the minimum in fewer iterations.

2. Locally Weighted Linear Regression


a. We implemented unweighted linear regression in this part, same as before.
The obtained θ = [1.0313, 0.8352]

b. This time, we took an array of query points from min(x) to max(x) at an interval of 0.1.
Then, for each query point x, the weights of all the training points were calculated. θ was
then computed using the given equation, and the estimated value of y at that x was
obtained from this θ, as sketched below.
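
A sketch of this procedure, assuming the usual closed form θ = (XᵀWX)⁻¹XᵀWy with Gaussian weights (NumPy assumed; the variable names and synthetic data are illustrative):

import numpy as np

def lwr_predict(x_train, y_train, x_query, tau):
    # Design matrix with intercept term.
    X = np.column_stack([np.ones_like(x_train), x_train])
    preds = []
    for xq in x_query:
        # Gaussian weights: points near the query get weight close to 1.
        w = np.exp(-(x_train - xq) ** 2 / (2 * tau ** 2))
        W = np.diag(w)
        # Weighted normal equation: theta = (X^T W X)^{-1} X^T W y
        theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_train)
        preds.append(theta[0] + theta[1] * xq)
    return np.array(preds)

# Example usage with illustrative data and tau = 0.8.
x_train = np.linspace(0, 10, 100)
y_train = np.sin(x_train) + 0.1 * np.random.randn(100)
x_query = np.arange(x_train.min(), x_train.max(), 0.1)
y_hat = lwr_predict(x_train, y_train, x_query, tau=0.8)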
Observation:
For τ = 0.8

c. For τ = 0.1

For τ = 0.3

For τ = 2

For τ = 10

With increasing τ, the curve tends towards being linear. This is because, as the value of
τ increases, the difference between the weights reduces (all points receive nearly equal
weight). As τ approaches infinity, the plot approaches a straight line, resembling linear
regression.
So, a τ that is too small results in overfitting, and a τ that is too large results in underfitting.

For τ = 1000, the result is a straight line, similar to linear regression.

3. Logistic Regression
a. Newton's method was used to obtain θ this time. We first expressed the log
likelihood in terms of the three parameters, θ = [θ₁, θ₂, θ₃]. The Hessian of the log
likelihood with respect to θ was calculated, along with the gradient. θ was then
initialized and a loop was started which applied the update equation after substituting the
current values of θ₁, θ₂ and θ₃ into the gradient and the Hessian. The loop was
terminated only when the change in the log likelihood was less than 1×10⁻⁹.
Observation:
Coefficients θ = [-2.6205, 0.7604, 1.1719]
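
A sketch of one way these Newton updates could be written, assuming the standard sigmoid-based log likelihood of logistic regression (NumPy assumed; the function name and synthetic data are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_newton(X, y, tol=1e-9):
    # X already contains the intercept column; theta = [theta1, theta2, theta3].
    theta = np.zeros(X.shape[1])
    prev_ll = -np.inf
    while True:
        h = sigmoid(X @ theta)
        ll = np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))  # log likelihood
        if abs(ll - prev_ll) < tol:
            return theta
        prev_ll = ll
        grad = X.T @ (y - h)                      # gradient of the log likelihood
        H = -(X.T * (h * (1 - h))) @ X            # Hessian of the log likelihood
        theta = theta - np.linalg.solve(H, grad)  # Newton update

# Illustrative usage on synthetic data with an intercept column.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = (X @ np.array([-1.0, 2.0, 1.5]) + rng.normal(size=100) > 0).astype(float)
theta_hat = logistic_newton(X, y)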
b. The data provided was plotted, and the decision boundary, a straight line, was drawn
using the coefficients of θ.
+ indicate y = 1, * denote y = 0
----- represents decision boundary

4. Gaussian Discriminant Analysis


a. Gaussian Discriminant Analysis was carried out using the equations given in the
notes.
Observation:
μ₀ = [98.3800, 429.6600]
μ₁ = [137.4600, 366.6200]
Σ  = [0.2875  0.0267; 0.0267  1.1232]
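
A sketch of the maximum-likelihood estimates assumed to be behind these numbers, using the standard GDA formulas with a shared covariance matrix (NumPy assumed; X is the m×2 feature matrix and y the 0/1 labels, and the synthetic data below is only illustrative):

import numpy as np

def gda_shared(X, y):
    # Class means.
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    # Shared covariance: average outer product of per-class deviations.
    D = X - np.where(y[:, None] == 0, mu0, mu1)
    sigma = (D.T @ D) / len(y)
    # Class prior for y = 1.
    phi = y.mean()
    return phi, mu0, mu1, sigma

# Illustrative usage (loading of the assignment's data files is omitted).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([100, 430], 15, size=(50, 2)),
               rng.normal([137, 367], 15, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
phi, mu0, mu1, sigma = gda_shared(X, y)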
b.
*s indicate Alaska, + indicate Canada

c. The equation of the decision boundary was derived by setting the posterior probability
to 0.5. This resulted in the following equation:
(μ₁ − μ₀)ᵀ Σ⁻¹ x + xᵀ Σ⁻¹ (μ₁ − μ₀) = μ₁ᵀ Σ⁻¹ μ₁ − μ₀ᵀ Σ⁻¹ μ₀
------ represents the Decision Boundary
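
A sketch of how this linear boundary could be plotted, rearranged as wᵀx = c with w = Σ⁻¹(μ₁ − μ₀) and c = (μ₁ᵀΣ⁻¹μ₁ − μ₀ᵀΣ⁻¹μ₀)/2, using the values reported in part (a) (Matplotlib assumed; the axis range is illustrative):

import numpy as np
import matplotlib.pyplot as plt

# Values reported in part (a).
mu0 = np.array([98.38, 429.66])
mu1 = np.array([137.46, 366.62])
sigma = np.array([[0.2875, 0.0267],
                  [0.0267, 1.1232]])

# w^T x = c form of the boundary equation above.
w = np.linalg.solve(sigma, mu1 - mu0)
c = (mu1 @ np.linalg.solve(sigma, mu1) - mu0 @ np.linalg.solve(sigma, mu0)) / 2

# Boundary as a line x2 = (c - w1*x1) / w2 over an illustrative range of x1.
x1 = np.linspace(60, 180, 100)
x2 = (c - w[0] * x1) / w[1]
plt.plot(x1, x2, 'k--')
plt.show()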

d. The given equations were implemented and results were obtained.


The values obtained are:
μ₀ = [98.3800, 429.6600]
μ₁ = [137.4600, 366.6200]
Σ₀ = 1.0 × 10³ × [0.2554  0.1843; 0.1843  1.3711]
Σ₁ = [319.5684  130.8348; 130.8348  875.3956]
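
For this part the covariances are estimated separately per class; a minimal sketch of that change relative to the shared-covariance version above (NumPy assumed; X and y as before):

import numpy as np

def gda_separate(X, y):
    # Class means, as before.
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    # Per-class covariance matrices instead of a single pooled one.
    D0 = X[y == 0] - mu0
    D1 = X[y == 1] - mu1
    sigma0 = (D0.T @ D0) / len(D0)
    sigma1 = (D1.T @ D1) / len(D1)
    return mu0, mu1, sigma0, sigma1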

e. The equation for the boundary is:


xᵀ (Σ₀⁻¹ − Σ₁⁻¹) x + (μ₁ᵀ Σ₁⁻¹ − μ₀ᵀ Σ₀⁻¹) x + xᵀ (Σ₁⁻¹ μ₁ − Σ₀⁻¹ μ₀)
= μ₁ᵀ Σ₁⁻¹ μ₁ − μ₀ᵀ Σ₀⁻¹ μ₀

There are two solutions to this equation. Only one serves as the boundary.
------ is the quadratic decision boundary.
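
One way to draw this boundary without solving the quadratic explicitly is to evaluate the difference of the two class log-densities on a grid and trace its zero level set; a sketch assuming equal class priors, the parameter values reported in part (d), and Matplotlib's contour() (the grid ranges are illustrative, and the log-determinant term from the full Gaussian densities is included):

import numpy as np
import matplotlib.pyplot as plt
from numpy.linalg import inv, slogdet

# Parameter values reported in part (d).
mu0 = np.array([98.38, 429.66])
mu1 = np.array([137.46, 366.62])
sigma0 = 1.0e3 * np.array([[0.2554, 0.1843], [0.1843, 1.3711]])
sigma1 = np.array([[319.5684, 130.8348], [130.8348, 875.3956]])

def log_density(P, mu, sigma):
    # Log of the Gaussian density (up to a constant), including the log-det term.
    D = P - mu
    return -0.5 * np.einsum('...i,ij,...j->...', D, inv(sigma), D) - 0.5 * slogdet(sigma)[1]

# Evaluate the decision function on a grid.
x1, x2 = np.meshgrid(np.linspace(50, 200, 300), np.linspace(250, 550, 300))
P = np.dstack([x1, x2])
f = log_density(P, mu1, sigma1) - log_density(P, mu0, sigma0)

# The quadratic boundary is the zero level set of f.
plt.contour(x1, x2, f, levels=[0], colors='k', linestyles='dashed')
plt.show()

contour() traces every point where f changes sign, so both roots of the quadratic are handled without solving for one coordinate explicitly.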

f.

The quadratic boundary fits the observations better and is likely better at predicting new
data; however, it involves more parameters. The classification error might therefore be
lower for the quadratic boundary.
