
STAT1306/STAT1603

Introductory Statistics (2017-2018A)


Chapter 8

Correlation and Regression

STAT1306/STAT1603 Ch 8 1
Regression
• $Y$ – response variable
• (variable of our particular interest)
• $X$ – explanatory variable
• Assume $X$ causes $Y$, for example,
– $X$: length of the ears (the cause)
– $Y$: life span
– In this example, which variable is $X$?
– Which variable is $Y$?

Regression
• The model is
$$Y_i = f(X_i) + e_i$$
where $e_i$ is the random error.

• A simple linear regression model is
$$Y_i = \beta_0 + \beta_1 X_i + e_i$$
which describes a straight-line relationship.

• Linear regression analysis is VERY COMMONLY used in
various fields of study, e.g. science, economics and medical
research.
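
As a minimal sketch of what the simple linear regression model describes, the snippet below generates responses from $Y_i = \beta_0 + \beta_1 X_i + e_i$. The function name `simulate_simple_linear`, the parameter values, and the choice of normally distributed errors with standard deviation `sigma` are illustrative assumptions, not part of the slides.

```python
import random


def simulate_simple_linear(xs, beta0, beta1, sigma, seed=0):
    """Return Y_i = beta0 + beta1 * X_i + e_i for each X_i in xs.

    Assumption (not from the slides): the random errors e_i are
    independent Normal(0, sigma^2) draws.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# With sigma = 0 there is no random error: the points lie exactly on the line.
ys_noiseless = simulate_simple_linear(xs, beta0=2.0, beta1=0.5, sigma=0.0)

# With sigma > 0 the points scatter around the same straight line.
ys_noisy = simulate_simple_linear(xs, beta0=2.0, beta1=0.5, sigma=1.0)

print(ys_noiseless)
print(ys_noisy)
```

Setting `sigma` to zero recovers the straight line itself, which makes the role of the random error term easy to see.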
Covariance
• Covariance is a measure of the linear dependence
between two random variables $X$ and $Y$
• By definition, $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$
• The sample covariance is defined to be
$$\widehat{\mathrm{Cov}}(X, Y) = \frac{1}{n-1} S_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$
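
The sample covariance formula above can be sketched in a few lines of Python. The function name `sample_covariance` and the five data points are made up for illustration only.

```python
def sample_covariance(xs, ys):
    """Sample covariance: S_XY / (n - 1), where
    S_XY = sum of (X_i - X_bar) * (Y_i - Y_bar)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / (n - 1)


# Hypothetical data: X_bar = 3, Y_bar = 4, S_XY = 6, so the result is 6 / 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
cov_xy = sample_covariance(xs, ys)
print(cov_xy)  # 1.5
```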

Covariance

• If $\mathrm{Cov}(X, Y) > 0$, then $Y$ tends to increase as $X$ increases
(positive association);
• If $\mathrm{Cov}(X, Y) < 0$, then $Y$ tends to decrease as $X$ increases
(negative association).
• If $\mathrm{Cov}(X, Y) = 0$, then $X$ and $Y$ are not linearly dependent.

Correlation
• For two random variables $X$ and $Y$, their
correlation is
$$\mathrm{Corr}(X, Y) = \rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
• This is a measure of the strength and
direction of the “linear relationship” of two
quantities.
• Note: there exist other “non-linear”
relationships between variables which
cannot be quantified by $\rho$.
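
A sample version of the correlation formula above can be sketched as follows. The function name `sample_correlation` and the data are illustrative assumptions; the $n-1$ divisors of the sample covariance and sample variances cancel, so only the sums of squares and cross-products are needed.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation: S_XY / sqrt(S_XX * S_YY)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data with a moderately strong positive linear association.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = sample_correlation(xs, ys)

# Points lying exactly on a decreasing straight line give a correlation of -1
# (up to floating-point rounding).
ys_line = [3.0 - 2.0 * x for x in xs]
r_line = sample_correlation(xs, ys_line)

print(round(r, 4), round(r_line, 4))
```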
Correlation
• Note that
$$-1 \le \rho \le +1$$
• If $\rho < 0$, then $X$ and $Y$ have a negative
linear relationship. (negatively correlated)
• If $\rho = 0$, then $X$ and $Y$ are not linearly
related. (uncorrelated)
• If $\rho > 0$, then $X$ and $Y$ have a positive
linear relationship. (positively correlated)

Correlation
• In the extreme cases,
• $\rho = +1$
– $X$ and $Y$ are perfectly positively correlated.
• $\rho = -1$
– $X$ and $Y$ are perfectly negatively correlated.
• In these two cases, all the (bivariate)
observations fall on a straight line in the
scatter diagram.
Correlation

Zero correlation ⇒ no “linear relationship”, but no “linear relationship” ⇏ no relationship
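
To illustrate that zero correlation rules out only a linear relationship, the sketch below uses made-up data in which $Y$ is completely determined by $X$ through $Y = X^2$, yet the sample correlation is zero. The helper `sample_correlation` is a hypothetical name introduced for this example.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation: S_XY / sqrt(S_XX * S_YY)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# X is symmetric about zero and Y = X^2, a perfect but non-linear relationship.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

r = sample_correlation(xs, ys)
print(r)  # 0.0: the positive and negative cross-products cancel exactly
```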


Example 8.1

Some Data Management Skills…
Using Excel to find Correlation

Section 8.1.1 Hypothesis Testing
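• Testing for Significant Linear Association
$$H_0: \rho = 0 \quad \text{vs} \quad H_1: \rho \neq 0$$
• Then, with the bivariate normality assumption and
under $H_0$, (slide note: “Just skip it!”)
$$T = \frac{\hat{\rho}\sqrt{n-2}}{\sqrt{1-\hat{\rho}^2}} \sim t_{n-2}$$
• For testing $H_0: \rho = \rho_0$ vs $H_1: \rho \neq \rho_0$,
$$Z = \frac{\dfrac{1}{2}\ln\dfrac{1+\hat{\rho}}{1-\hat{\rho}} - \dfrac{1}{2}\ln\dfrac{1+\rho_0}{1-\rho_0}}{1/\sqrt{n-3}} \sim N(0, 1)$$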
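
The two test statistics above can be computed as in the following sketch. The function names, the sample correlation 0.6 and the sample size 18 are illustrative assumptions, not values from the slides. Python's standard library has no $t$ distribution, so the $T$ statistic is compared with a critical value from a $t$ table, while the two-sided p-value for $Z$ uses `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def t_statistic(r, n):
    """T = r * sqrt(n - 2) / sqrt(1 - r^2), for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def fisher_z_test(r, n, rho0):
    """Z statistic and two-sided p-value for testing H0: rho = rho0."""
    z_hat = 0.5 * math.log((1.0 + r) / (1.0 - r))
    z_null = 0.5 * math.log((1.0 + rho0) / (1.0 - rho0))
    z = (z_hat - z_null) / (1.0 / math.sqrt(n - 3))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample: correlation 0.6 from n = 18 pairs.
r_hat, n = 0.6, 18

t = t_statistic(r_hat, n)  # 0.6 * 4 / 0.8, i.e. about 3.0, on 16 d.f.
t_critical = 2.120         # two-sided 5% critical value of t with 16 d.f. (t table)
reject_rho_zero = abs(t) > t_critical

z, p = fisher_z_test(r_hat, n, rho0=0.3)
print(round(t, 3), reject_rho_zero, round(z, 3), round(p, 3))
```

With these illustrative numbers, $|T|$ exceeds the tabulated critical value, so $H_0: \rho = 0$ would be rejected at the 5% level.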
Section 8.1.2 Fallacies
• Consider
$X$ = Age, $Y$ = Weight, $Z$ = IQ
• Study:
– Growth of young children aged 3 to 11
• $\mathrm{Corr}(X, Y) = 0.8$
• $\mathrm{Corr}(X, Z) = 0.68$
• $\mathrm{Corr}(Y, Z) = 0.61$
[Diagram: $X$, $Y$ and $Z$, labelled “Masking effect”]

• Question:
– Are heavier children ($Y$) more intelligent ($Z$)?
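
One common way to probe this question, which the slide itself does not name, is the first-order partial correlation of $Y$ and $Z$ given $X$. The sketch below applies the standard formula to the three correlations quoted above. The function name `partial_correlation` is hypothetical, and the calculation assumes all three relationships are linear.

```python
import math


def partial_correlation(r_yz, r_xy, r_xz):
    """Correlation of Y and Z after removing the linear effect of X:
    (r_yz - r_xy * r_xz) / sqrt((1 - r_xy^2) * (1 - r_xz^2))."""
    numerator = r_yz - r_xy * r_xz
    denominator = math.sqrt((1.0 - r_xy ** 2) * (1.0 - r_xz ** 2))
    return numerator / denominator


# Correlations from the slide: X = Age, Y = Weight, Z = IQ.
r_yz_given_x = partial_correlation(r_yz=0.61, r_xy=0.8, r_xz=0.68)
print(round(r_yz_given_x, 2))  # 0.15
```

Under these assumptions, most of the 0.61 correlation between weight and IQ is accounted for by age: once age is held fixed, the remaining association is weak.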
Example 8.2

Example 8.3
