
Prepared By:
M Arman 36
Rana Riaz 02
M Jahangir Sarwar 48
Syed Ali Adnan 33
Nouman Pervaiz 62


Correlation and Regression
Nouman Pervaiz
Correlation and linear regression are the most
commonly used techniques for investigating the
relationship between two quantitative variables.

The goal of a correlation analysis is to see whether
two measurement variables covary, and to quantify
the strength of the relationship between the
variables, whereas regression expresses the
relationship in the form of an equation.
Correlation is a linear association between two random
variables.

Correlation analysis studies how closely the variables
are related.

The correlation coefficient lies between -1 and +1.

For example, for students taking a Maths test and an English test,
we could use correlation to determine whether students
who are good at Maths tend to be good at English as well,
and regression to determine whether the marks in English
can be predicted from given marks in Maths.




Correlation
A zero correlation indicates that there is no linear relationship
between the variables.

A correlation of -1 indicates a perfect negative
correlation.

A correlation of +1 indicates a perfect positive correlation.
Examples:
Heights and weights;
Household income and expenditure;
Price and supply of commodities;
Amount of rainfall and yield of crops.

Nature of Correlation

Type 1: Positive, Negative, No, Perfect
Positive: when one variable increases (decreases), the other also
increases (decreases).
Negative: when one variable increases (decreases), the other
decreases (increases).
No correlation: the two variables are independent.
Perfect: every point lies exactly on a straight line (r = +1 or r = -1).

Type 2: Linear, Non-linear
Linear: when plotted on a graph, the points tend to fall on a straight line.
Non-linear: when plotted on a graph, the points do not fall on a straight line.

Type 3: Simple, Multiple, Partial
Simple: two variables, one dependent and one independent.
Multiple: one dependent variable and more than one independent
variable.
Partial: one dependent variable and more than one independent
variable, but only one independent variable is
considered and the other independent variables are
held constant.
Rana Riaz
Methods
Scatter Diagram Method

Karl Pearson's Coefficient of Correlation Method

Spearman's Rank Correlation Method


[Figure: two scatter plots of Symptom Index against dose (mg). Drug A: very good fit. Drug B: moderate fit.]
Correlation: Linear Relationships
Strong relationship = good linear fit
Points clustered closely around a line show a strong correlation,
and the line is a good predictor (a good fit) for the data. The more
spread out the points, the weaker the correlation and the poorer
the fit. The line is a regression line (Y = a + bX).
Coefficient of Correlation
A measure of the strength of the linear relationship
between two variables, defined as the (sample)
covariance of the variables divided by the product of
their (sample) standard deviations.

Represented by r
r lies between -1 and +1
Magnitude and Direction
-1 ≤ r ≤ +1
The + and - signs are used for positive linear
correlations and negative linear correlations,
respectively.



$$
r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}
$$

Shared variability of X and Y variables on the top

Individual variability of X and Y variables on the bottom
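
As a rough illustration of the coefficient of correlation, the sketch below computes r in two ways: from raw sums, and as covariance divided by the product of the standard deviations. The function names are hypothetical, and the five data pairs are the ones used in the regression examples later in these slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from raw sums: n, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y            # shared variability
    denominator = sqrt((n * sum_x2 - sum_x ** 2) *    # individual variability
                       (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def pearson_r_from_covariance(xs, ys):
    """The same quantity as sample covariance / (sd of X * sd of Y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

xs = [3, 2, 7, 4, 8]
ys = [6, 1, 8, 5, 9]
r = pearson_r(xs, ys)
print(round(r, 3))  # 0.893
```

Both functions should agree up to floating-point error, since the n and (n - 1) factors cancel between numerator and denominator.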

Interpreting Correlation Coefficient r
strong correlation: r > .70 or r < -.70

moderate correlation: r is between .30 and .70
or r is between -.30 and -.70

weak correlation: r is between 0 and .30 or r is
between 0 and -.30
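
The thresholds above can be sketched as a small helper. The function name is hypothetical, and how the exact boundary values .30 and .70 are assigned is an assumption, since the rule of thumb does not say.

```python
def correlation_strength(r):
    """Label r using the rule-of-thumb cut-offs .30 and .70 (on |r|)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)                      # strength ignores the sign
    if size > 0.70:
        strength = "strong"
    elif size >= 0.30:                 # assumption: .30 and .70 count as moderate
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    return strength, direction

print(correlation_strength(0.89))   # ('strong', 'positive')
print(correlation_strength(-0.45))  # ('moderate', 'negative')
```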
Coefficient of Determination
The coefficient of determination lies between 0 and 1

Represented by r²

The coefficient of determination is a measure of how well
the regression line represents the data
If the regression line passed exactly through every point
on the scatter plot, it would explain all of the
variation
The further the line is from the points, the less it is
able to explain





r² is useful because it gives the proportion of the variance
(fluctuation) of one variable that is predictable from the other
variable

It is a measure that allows us to determine how certain one can
be in making predictions from a certain model/graph

The coefficient of determination is the ratio of the explained
variation to the total variation

The coefficient of determination is such that 0 ≤ r² ≤ 1, and
denotes the strength of the linear association between x and y

Expressed as a percentage, the coefficient of determination
gives the share of the variation in y that is accounted
for by the line of best fit

For example, if r = 0.922, then r² = 0.850

Which means that 85% of the total variation in y can
be explained by the linear relationship between x and
y (as described by the regression equation)

The other 15% of the total variation in y remains
unexplained
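
To make the "explained variation over total variation" reading concrete, here is a minimal sketch that fits a least squares line and splits the variation in y into explained and unexplained parts. The variable names are illustrative, and the data are the five pairs from the regression examples later in these slides, not the r = 0.922 case above.

```python
xs = [3, 2, 7, 4, 8]
ys = [6, 1, 8, 5, 9]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares line y = a + b*x
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
b = sxy / sxx
a = mean_y - b * mean_x
predicted = [a + b * x for x in xs]

# Total variation = explained variation + unexplained variation
total_variation = sum((y - mean_y) ** 2 for y in ys)
explained_variation = sum((p - mean_y) ** 2 for p in predicted)
unexplained_variation = sum((y - p) ** 2 for y, p in zip(ys, predicted))

r_squared = explained_variation / total_variation
print(round(r_squared, 3))  # 0.798
```

For this data roughly 80% of the variation in y is explained by the line and about 20% is left unexplained.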
M.Jahangir Sarwar
Spearman's Rank Correlation Coefficient is a different
way of describing the strength of the correlation between two
quantities.
It is used where items can be ranked even though they cannot be
measured on a numerical scale. The values of the two variables
are converted to their ranks, and the correlation is computed from
those ranks; the result is known as rank correlation.
Examples:
a) Measuring the correlation between the rankings given by two
judges to various exhibits at an industrial fair.
b) Measuring the relationship between two persons' preferences for
various kinds of food.


Spearman's rank coefficient




Let's take an example of 10 students, with the number of hours spent studying
(x) and their exam grades (y).













The highest value in x (18) gets rank 1. The two values of 8 would take
ranks 6 and 7, so each gets the mean rank (6 + 7)/2 = 6.5; the two values
of 5 likewise share rank 8.5. For this data, ρ = 0.982.

X    Y    Rank of X   Rank of Y   d      d²
8    56   6.5         7           -0.5   0.25
5    44   8.5         9           -0.5   0.25
11   79   4           3            1.0   1.00
13   72   3           4           -1.0   1.00
10   70   5           5            0.0   0.00
5    54   8.5         8            0.5   0.25
18   94   1           1            0.0   0.00
15   85   2           2            0.0   0.00
2    33   10          10           0.0   0.00
8    65   6.5         6            0.5   0.25

Σd² = 3.00
$$
\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}
$$
where
d = difference between the ranks of X and Y
n = number of pairs of values (x, y)
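
The ranking and the rank-difference calculation for the 10 students can be sketched as follows. The function names are hypothetical; the code assumes the usual formula ρ = 1 - 6Σd²/(n(n² - 1)), with tied values sharing the mean of their ranks. When ties are present this simple formula is only an approximation to a correlation computed directly on the ranks.

```python
def average_ranks(values):
    """Rank values with 1 = largest; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # best rank occupied by this value
        count = ordered.count(v)          # how many items tie at this value
        ranks.append(first + (count - 1) / 2)
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho from squared rank differences."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

hours = [8, 5, 11, 13, 10, 5, 18, 15, 2, 8]
grades = [56, 44, 79, 72, 70, 54, 94, 85, 33, 65]

print(average_ranks(hours))   # [6.5, 8.5, 4.0, 3.0, 5.0, 8.5, 1.0, 2.0, 10.0, 6.5]
rho = spearman_rho(hours, grades)
print(round(rho, 3))          # 0.982
```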
A monotonic relationship is a relationship that does one of the
following: (1) as the value of one variable increases so does the value
of the other variable or (2) as the value of one variable increases the
other variable value decreases.








Spearman Rank Correlation Coefficient is a non-parametric measure
of correlation. Spearman Rank Correlation Coefficient tries to assess
the relationship between ranks without making any assumptions
about the nature of their relationship.

Regression
Simple regression analysis provides an equation that can be
used to estimate or predict the value of one variable from a
given value of the other.

Suppose data on retail sales and population size are collected
for 50 cities. By regression analysis of these data, we obtain
an equation that relates retail sales to population size. Then if
we know the population of some city but do not know the
city's retail sales, we can use the equation to estimate retail
sales from the size of the city's population.

The variable to be estimated or predicted is termed the dependent variable,
and the variable it is predicted from is the independent variable.
For example, when the heights of children are estimated from their ages,
height is the dependent variable and age is the independent variable.

Importance of Regression Analysis
Regression analysis helps in three important ways:

It provides estimates of the values of dependent variables from
the values of independent variables.

It can be extended to two or more independent variables, which is
known as multiple regression.

It shows the nature of the relationship between two or more
variables.

Methods of Regression
Regression
  Graphically
    Free-hand curve
    Least squares
  Algebraically
    Least squares
    Deviation method from arithmetic mean
    Deviation method from assumed mean
Muhammad Arman

1. Least Squares Method


The regression equation of X on Y is:
X = a + bY
where
X = dependent variable
Y = independent variable
The regression equation of Y on X is:
Y = a + bX
where
Y = dependent variable
X = independent variable
The values of a and b in the above equations are found by the
method of least squares, with the help of the normal equations
given below:
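For the regression of Y on X:
$$
\sum Y = na + b\sum X, \qquad \sum XY = a\sum X + b\sum X^2
$$
For the regression of X on Y:
$$
\sum X = na + b\sum Y, \qquad \sum XY = a\sum Y + b\sum Y^2
$$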
Example 1: From the following data, obtain the two regression
equations using the method of least squares.



X 3 2 7 4 8
Y 6 1 8 5 9
X    Y    XY    X²    Y²
3    6    18    9     36
2    1    2     4     1
7    8    56    49    64
4    5    20    16    25
8    9    72    64    81

ΣX = 24   ΣY = 29   ΣXY = 168   ΣX² = 142   ΣY² = 207

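The normal equations for the regression of Y on X are
$$
\sum Y = na + b\sum X, \qquad \sum XY = a\sum X + b\sum X^2
$$
Substituting the values from the table, we get
29 = 5a + 24b ...(i)
168 = 24a + 142b, that is,
84 = 12a + 71b ...(ii)

Multiplying equation (i) by 12 and (ii) by 5:
348 = 60a + 288b ...(iii)
420 = 60a + 355b ...(iv)

Subtracting (iii) from (iv) gives 72 = 67b, so b = 1.07. Substituting
b = 1.07 into (i) gives a = 0.66 (with the unrounded b = 1.0746, a = 0.64).
Putting the values of a and b in the regression equation of Y on X, we get
Y = 0.66 + 1.07X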
Now, to find the regression equation of X on Y,
the two normal equations are
$$
\sum X = na + b\sum Y, \qquad \sum XY = a\sum Y + b\sum Y^2
$$
Substituting the values in the equations, we get

24 = 5a + 29b ...(i)
168 = 29a + 207b ...(ii)

Multiplying equation (i) by 29 and (ii) by 5, we get
696 = 145a + 841b ...(iii)
840 = 145a + 1035b ...(iv)

Subtracting (iii) from (iv) gives 144 = 194b, so
a = 0.49 and b = 0.74

Substituting the values of a and b in the
regression equation of X on Y:
X = 0.49 + 0.74Y
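
The elimination steps in Example 1 amount to solving two linear equations in a and b. A minimal sketch using Cramer's rule is shown below; `fit_line` is a hypothetical helper name, and passing the arguments in swapped order gives the regression of X on Y.

```python
def fit_line(xs, ys):
    """Solve the normal equations
         sum(y)  = n*a      + b*sum(x)
         sum(xy) = a*sum(x) + b*sum(x^2)
    for (a, b) in y = a + b*x, using Cramer's rule."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    det = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b

X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]

a_yx, b_yx = fit_line(X, Y)   # regression of Y on X
a_xy, b_xy = fit_line(Y, X)   # regression of X on Y (roles swapped)
print(f"Y = {a_yx:.2f} + {b_yx:.2f} X")
print(f"X = {a_xy:.2f} + {b_xy:.2f} Y")
```

Because the code carries full precision, the intercept of Y on X comes out near 0.64 rather than the 0.66 obtained by hand with b rounded to 1.07; the slopes and the X on Y intercept match the worked values.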
2. Deviation from the Arithmetic Mean Method:

The calculations by the least squares method are quite cumbersome when the
values of X and Y are large, so the work can be simplified by using this
method.
The formulas for calculating the regression equations by this method are:

Regression equation of X on Y:
$$
X - \bar{X} = b_{xy}\,(Y - \bar{Y})
$$
Regression equation of Y on X:
$$
Y - \bar{Y} = b_{yx}\,(X - \bar{X})
$$
where $b_{xy}$ and $b_{yx}$ are the regression coefficients:
$$
b_{xy} = \frac{\sum xy}{\sum y^2} \qquad \text{and} \qquad b_{yx} = \frac{\sum xy}{\sum x^2}
$$
Here x and y denote the deviations of X and Y from their means.
Example 2: From the previous data, obtain the regression equations by
taking deviations from the actual means of the X and Y series.
X 3 2 7 4 8
Y 6 1 8 5 9

With x = X - X̄ and y = Y - Ȳ, where X̄ = 4.8 and Ȳ = 5.8:

X    Y    x      y      x²      y²      xy
3    6    -1.8   0.2    3.24    0.04    -0.36
2    1    -2.8   -4.8   7.84    23.04   13.44
7    8    2.2    2.2    4.84    4.84    4.84
4    5    -0.8   -0.8   0.64    0.64    0.64
8    9    3.2    3.2    10.24   10.24   10.24

ΣX = 24   ΣY = 29   Σx = 0   Σy = 0   Σx² = 26.8   Σy² = 38.8   Σxy = 28.8
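Solution:
Regression equation of X on Y is
$$
X - \bar{X} = b_{xy}\,(Y - \bar{Y}), \qquad b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{28.8}{38.8} = 0.74
$$
$$
X - 4.8 = 0.74\,(Y - 5.8)
$$
$$
X = 0.74\,Y + 0.49 \qquad \ldots\text{(I)}
$$
Regression equation of Y on X is
$$
Y - \bar{Y} = b_{yx}\,(X - \bar{X}), \qquad b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{28.8}{26.8} = 1.07
$$
$$
Y - 5.8 = 1.07\,(X - 4.8)
$$
$$
Y = 1.07\,X + 0.66 \qquad \ldots\text{(II)}
$$
The intercepts depend on rounding: with the unrounded coefficients
($b_{xy} = 0.7423$, $b_{yx} = 1.0746$) they are 0.49 and 0.64 respectively.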
These regression equations are the same as those
obtained by the direct (least squares) method.
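
The deviation method can be sketched in a few lines. The helper name `deviation_method` is hypothetical; it takes deviations from the actual means and forms the two regression coefficients as Σxy/Σy² and Σxy/Σx².

```python
def deviation_method(xs, ys):
    """Regression coefficients from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dev_x = [x - mean_x for x in xs]          # x = X - mean of X
    dev_y = [y - mean_y for y in ys]          # y = Y - mean of Y
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_x2 = sum(dx * dx for dx in dev_x)
    sum_y2 = sum(dy * dy for dy in dev_y)
    b_xy = sum_xy / sum_y2                    # slope of X on Y
    b_yx = sum_xy / sum_x2                    # slope of Y on X
    return mean_x, mean_y, b_xy, b_yx

X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
mean_x, mean_y, b_xy, b_yx = deviation_method(X, Y)

# Rearranged to intercept form: X = b_xy*Y + (mean_x - b_xy*mean_y), and similarly for Y
print(f"X = {b_xy:.2f} Y + {mean_x - b_xy * mean_y:.2f}")
print(f"Y = {b_yx:.2f} X + {mean_y - b_yx * mean_x:.2f}")
```

The intercepts printed here use unrounded slopes, so they can differ in the second decimal from a hand calculation that rounds the slope first.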
3. Deviation from Assumed Mean Method:
When the actual means of the X and Y variables are fractions, the calculations
can be simplified by taking deviations from assumed means.
The regression equation of X on Y:
$$
X - \bar{X} = b_{xy}\,(Y - \bar{Y})
$$
The regression equation of Y on X:
$$
Y - \bar{Y} = b_{yx}\,(X - \bar{X})
$$
Here, however, the values of $b_{xy}$ and $b_{yx}$ are calculated by the
following formulas, where $d_x$ and $d_y$ are the deviations of X and Y from
their assumed means:
$$
b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2}
$$
$$
b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2}
$$
Example: From the data given in the previous example, calculate the regression
equations by assuming 7 as the mean of the X series and 6 as the mean of the Y series.

Deviations from the assumed means: d_x = X - 7 and d_y = Y - 6.

X    Y    d_x    d_x²   d_y    d_y²   d_x·d_y
3    6    -4     16     0      0      0
2    1    -5     25     -5     25     +25
7    8    0      0      2      4      0
4    5    -3     9      -1     1      +3
8    9    1      1      3      9      +3

ΣX = 24   ΣY = 29   Σd_x = -11   Σd_x² = 51   Σd_y = -1   Σd_y² = 39   Σd_x·d_y = 31
The regression coefficient of X on Y:
$$
b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2}
       = \frac{5(31) - (-11)(-1)}{5(39) - (-1)^2}
       = \frac{155 - 11}{195 - 1}
       = \frac{144}{194} = 0.74
$$
The means are
$$
\bar{X} = \frac{\sum X}{N} = \frac{24}{5} = 4.8, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{29}{5} = 5.8
$$
The regression equation of X on Y:
$$
X - \bar{X} = b_{xy}\,(Y - \bar{Y})
$$
$$
X - 4.8 = 0.74\,(Y - 5.8)
$$
$$
X = 0.74\,Y + 0.49
$$
The regression coefficient of Y on X:
$$
b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2}
       = \frac{5(31) - (-11)(-1)}{5(51) - (-11)^2}
       = \frac{155 - 11}{255 - 121}
       = \frac{144}{134} = 1.07
$$
The regression equation of Y on X:
$$
Y - \bar{Y} = b_{yx}\,(X - \bar{X})
$$
$$
Y - 5.8 = 1.07\,(X - 4.8)
$$
$$
Y = 1.07\,X + 0.66
$$
These regression equations are the same as those obtained by the
least squares method and by the deviation from arithmetic mean method.
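
One way to see why the assumed mean method works is that the coefficient formulas give the same answer whatever assumed means are chosen. The sketch below checks this for the example's assumed means (7 and 6) and for another arbitrary pair; the function name and the second pair (5, 5) are illustrative assumptions.

```python
def coefficients_from_assumed_means(xs, ys, assumed_x, assumed_y):
    """Regression coefficients from deviations about assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    sum_dx2 = sum(p * p for p in dx)
    sum_dy2 = sum(q * q for q in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    b_xy = numerator / (n * sum_dy2 - sum_dy ** 2)   # X on Y
    b_yx = numerator / (n * sum_dx2 - sum_dx ** 2)   # Y on X
    return b_xy, b_yx

X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]

b_xy, b_yx = coefficients_from_assumed_means(X, Y, 7, 6)
print(round(b_xy, 2), round(b_yx, 2))  # 0.74 1.07

# A different choice of assumed means gives the same coefficients
print(coefficients_from_assumed_means(X, Y, 5, 5) == (b_xy, b_yx))  # True
```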
Syed Ali Adnan
Application
Regression is widely used in the field of business.
Businesspeople are interested in predicting future
production, consumption, investment, prices, profits,
sales, etc., so the success of a business depends on
the correctness of the various estimates it is
required to make. Regression is also used in sociological studies and
economic planning to project population,
birth rates, death rates, etc.

There are three main uses for correlation and regression.

One is to test hypotheses about cause-and-effect
relationships. In this case, the experimenter determines
the values of the X-variable and sees whether variation in
X causes variation in Y. For example, giving people
different amounts of a drug and measuring their blood
pressure.

The second main use for correlation and regression is to
see whether two variables are associated, without
necessarily inferring a cause-and-effect relationship. In
this case, neither variable is determined by the
experimenter; both are naturally variable. If an
association is found, the inference is that variation in X
may cause variation in Y, or variation in Y may cause
variation in X, or variation in some other factor may
affect both X and Y.

The third common use of linear regression is estimating
the value of one variable corresponding to a particular
value of the other variable.

Thanks
