
Correlation and Regression

The existence of a relationship between two or more variables constitutes vital information for decision making.

Statistical Methods

Usha A. Kumar, IIT Bombay

Examples:

Sales and advertising expenditure.
Family income and consumption.
Demand and price.
Income and age.

Correlation

The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables.

The population correlation, denoted by ρ, can take on any value from -1 to 1:

ρ = Cov(X, Y) / (σX σY)

Illustrations of Correlation

[Figure: scatter plots of Y against X illustrating ρ = -1, ρ = -0.8, ρ = 0, ρ = 0.8, and ρ = 1.]

The sample correlation coefficient for the n pairs (x1, y1), …, (xn, yn) is

r = Σ(xi - x̄)(yi - ȳ) / √(Σ(xi - x̄)² Σ(yi - ȳ)²) = S_xy / √(S_xx S_yy)
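Computed directly from this formula, the sample correlation is easy to check by hand. A minimal Python sketch; the data sets are hypothetical, chosen for easy verification:

```python
def sample_correlation(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy) for paired samples."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    s_xx = sum((x - xbar) ** 2 for x in xs)
    s_yy = sum((y - ybar) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5

# A perfectly linear relationship (y = 2x) gives r = 1.
r_linear = sample_correlation([1, 2, 3], [2, 4, 6])

# A noisier hypothetical data set gives r strictly between -1 and 1.
r_noisy = sample_correlation([2, 4, 6, 8], [3, 7, 5, 10])
```

For the second data set, S_xy = 19, S_xx = 20 and S_yy = 26.75, so r ≈ 0.82.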

Regression analysis: deals with the investigation of the relationship between two or more variables that are related in a nondeterministic fashion.

Example

Income (X)   Consumption expenditure (Y)           Conditional means of Y
(in '00)     (in '00)                              E[Y | X]
80           55, 60, 65, 70, 75                    65
100          65, 70, 74, 80, 85, 88                77
120          79, 84, 90, 94, 98                    89
140          80, 93, 95, 103, 108, 113, 115        101
160          102, 107, 110, 116, 118, 125          113
180          110, 115, 120, 130, 135, 140          125
200          120, 136, 140, 144, 145               137
220          135, 137, 140, 152, 157, 160, 162     149
240          137, 145, 155, 165, 175, 189          161
260          150, 152, 175, 178, 180, 185, 191     173

Simple linear regression model

The model assumes that the expected value of Y is a linear function of x. For fixed x, the variable Y differs from its expected value by a random amount.

Yi = β0 + β1 xi + εi

where
Yi is the value of the dependent variable in the ith trial,
β0 and β1 are parameters,
xi is a known constant, namely, the value of the predictor variable in the ith trial,
εi is a random error term.

Estimated Regression Line

The estimated regression line is

ŷ = b0 + b1 x

where
b0 is the estimate of β0,
b1 is the estimate of β1,
ŷ is the predicted value.

Errors in Regression

[Figure: observed data points scattered around the fitted regression line ŷ = b0 + b1 x.]

The error (residual) for the ith observation is

ei = yi - ŷi

where ŷi is the predicted value of Y for xi.

Least Squares Regression

The sum of squared errors is

SSE = Σ ei² = Σ (yi - ŷi)²   (sums over i = 1, …, n)

The least squares regression line is the one that minimizes SSE with respect to the estimates b0 and b1.

Least Squares Estimates

b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = S_xy / S_xx

b0 = ȳ - b1 x̄
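These closed-form estimates translate directly into code. A minimal sketch with hypothetical data chosen so the answer can be checked by hand (y = 2x exactly, so the fit should give b0 = 0 and b1 = 2):

```python
def least_squares(xs, ys):
    """Closed-form least squares estimates: b1 = S_xy / S_xx, b0 = ybar - b1 * xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    s_xx = sum((x - xbar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = ybar - b1 * xbar
    return b0, b1

# Exact line y = 2x: the least squares fit recovers it.
b0, b1 = least_squares([1, 2, 3], [2, 4, 6])
```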

Example

Sales and amount spent on advertising for a retail store in Mumbai and nine of its branches.

Advertising expenditure   Sales
(in ten thousands)        (in lakhs)
18                        55
17                        …
14                        36
31                        85
21                        62
18                        …
11                        33
16                        41
26                        63
29                        87

How good is the estimated regression line?

When the Gauss-Markov conditions are met, the least squares estimates are good: they are the best linear unbiased estimators of the regression parameters.

Gauss-Markov Conditions

E(εi) = 0,   i = 1, 2, …, n
V(εi) = σ²,  i = 1, 2, …, n
Cov(εi, εj) = 0,  i ≠ j,  i, j = 1, 2, …, n

When these assumptions are met, the least squares estimators are unbiased and have minimum variance among all linear unbiased estimators of β0 and β1.
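A small simulation illustrates the unbiasedness claim: when the errors satisfy these conditions, the least squares estimates average out to the true parameters. The true values β0 = 2, β1 = 3 and the normal errors below are illustrative assumptions, not from the lecture:

```python
import random

random.seed(42)

def ols_fit(xs, ys):
    """Least squares estimates b0, b1 via the closed-form formulas."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    s_xx = sum((x - xbar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = ybar - b1 * xbar
    return b0, b1

beta0, beta1 = 2.0, 3.0            # true (hypothetical) parameters
xs = [i / 10 for i in range(100)]  # fixed design points

# Average the estimates over many simulated samples;
# the averages should be close to beta0 and beta1.
reps = 200
b0_mean = b1_mean = 0.0
for _ in range(reps):
    ys = [beta0 + beta1 * x + random.gauss(0, 1) for x in xs]
    b0, b1 = ols_fit(xs, ys)
    b0_mean += b0 / reps
    b1_mean += b1 / reps
```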

Residual Analysis

[Figure: four residual plots.]

Homoscedasticity: residuals (plotted against x or ŷ) appear completely random; no indication of model inadequacy.
Heteroscedasticity: the variance of the residuals changes as x changes.
Residuals plotted against time exhibit a linear trend.
A curved pattern in the residuals (plotted against x or ŷ) results from an underlying nonlinear relationship.
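A useful numerical fact behind these plots: least squares residuals always sum to zero and are orthogonal to the predictor, so any visible pattern signals model inadequacy rather than an artifact of the fitting procedure. A sketch with hypothetical data:

```python
# Hypothetical data; fit by the closed-form least squares estimates.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
s_xx = sum((x - xbar) ** 2 for x in xs)
s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = s_xy / s_xx
b0 = ybar - b1 * xbar

# Residuals e_i = y_i - yhat_i.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# The normal equations force sum(e) = 0 and sum(e * x) = 0.
sum_e = sum(residuals)
sum_ex = sum(e * x for e, x in zip(residuals, xs))
```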

The Coefficient of Determination

Relationship among SST, SSR and SSE:

SST = SSR + SSE
Σ(yi - ȳ)² = Σ(ŷi - ȳ)² + Σ(yi - ŷi)²

Coefficient of determination:

r² = SSR / SST

This is the proportion of observed y variation that can be explained by the simple linear regression model.
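The decomposition can be verified numerically for any least squares fit. With a small hypothetical data set:

```python
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
s_xx = sum((x - xbar) ** 2 for x in xs)
s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = s_xy / s_xx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)               # total variation
ssr = sum((yh - ybar) ** 2 for yh in yhat)           # explained by the line
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))  # residual variation

r2 = ssr / sst  # coefficient of determination
```

Here SST = 10, SSR = 9.8 and SSE = 0.2, so r² = 0.98: the line explains 98% of the observed y variation.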

Inference in Regression Analysis

Source of     Sum of    Degrees of
Variation     Squares   Freedom      Mean Square   F Ratio
Regression    SSR       1            MSR           MSR/MSE
Error         SSE       n-2          MSE
Total         SST       n-1          MST
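The F ratio MSR/MSE can be computed for a small hypothetical fit; in simple linear regression it equals the square of the slope t statistic:

```python
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
s_xx = sum((x - xbar) ** 2 for x in xs)
s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = s_xy / s_xx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

ssr = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))

msr = ssr / 1        # regression mean square, df = 1
mse = sse / (n - 2)  # error mean square, df = n - 2
f_ratio = msr / mse
```

For this data F = 9.8 / 0.1 = 98, to be compared with the F(1, n-2) distribution.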

Inference in Regression Analysis

Assumption: the model errors εi are normally distributed with mean 0 and variance σ². Then

yi ~ N(β0 + β1 xi, σ²)

b0 and b1 are also normally distributed, as they are linear combinations of the yi's.

Mean and Variance of Estimators

E(b0) = β0,   E(b1) = β1

S.E.(b0) = s √(1/n + x̄² / Σ(xi - x̄)²)

S.E.(b1) = s / √(Σ(xi - x̄)²)

(sums over i = 1, …, n)
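With s² = SSE/(n-2) estimating σ², these standard errors are straightforward to compute. A sketch with a small hypothetical data set:

```python
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
s_xx = sum((x - xbar) ** 2 for x in xs)
s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = s_xy / s_xx
b0 = ybar - b1 * xbar

# Residual standard deviation: s^2 = SSE / (n - 2).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s = (sse / (n - 2)) ** 0.5

# Standard errors of the intercept and slope estimates.
se_b0 = s * (1 / n + xbar ** 2 / s_xx) ** 0.5
se_b1 = s / s_xx ** 0.5
```

Here s = √0.1 ≈ 0.316, giving S.E.(b0) ≈ 0.387 and S.E.(b1) ≈ 0.141.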

Hypothesis Testing

H0: β1 = β10
H1: β1 ≠ β10

The test statistic is

t = (b1 - β10) / S.E.(b1),

which follows a t distribution with n - 2 degrees of freedom under H0.
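For the common special case β10 = 0 (test of no linear relationship), the statistic reduces to t = b1 / S.E.(b1). The numbers below are from a hypothetical four-point fit (b1 = 1.4, S_xx = 5, s² = 0.1, n = 4); the two-sided 5% critical value of the t distribution with 2 degrees of freedom is 4.303:

```python
# Hypothetical fitted quantities from a four-point least squares fit.
n = 4
b1 = 1.4                  # slope estimate
se_b1 = (0.1 / 5) ** 0.5  # s / sqrt(S_xx) = sqrt(0.1) / sqrt(5)

# Test H0: beta1 = 0 against H1: beta1 != 0.
t_stat = (b1 - 0) / se_b1

# Two-sided 5% critical value, t_{0.025, 2}.
t_crit = 4.303
reject_h0 = abs(t_stat) > t_crit
```

Here t ≈ 9.90 > 4.303, so H0 is rejected: the slope is significantly different from zero.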

Confidence Interval

A 100(1 - α)% confidence interval for β1 is

(b1 - t(α/2, n-2) S.E.(b1),  b1 + t(α/2, n-2) S.E.(b1))
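Using the same hypothetical values for the slope and its standard error (b1 = 1.4, S.E.(b1) ≈ 0.141, n = 4), a 95% confidence interval for β1:

```python
n = 4
b1 = 1.4                  # hypothetical slope estimate
se_b1 = (0.1 / 5) ** 0.5  # hypothetical standard error of the slope
t_crit = 4.303            # t_{0.025, n-2} with n - 2 = 2 df

# 95% confidence interval for beta1.
lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1
```

The interval is roughly (0.79, 2.01); since it excludes zero, it agrees with the two-sided test at the 5% level.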

Residual Analysis

Normal probability plot of the residuals.
Plot of residuals against ŷ.
Plot of residuals against xi.

Example

Sales and amount spent on advertising for a retail store in Mumbai and nine of its branches (same data as in the earlier example).
