Centre For Foundation Studies

Department of Sciences and Engineering

FHMM1034 Mathematics III

Chapter 5
Correlation and Regression

Content
5.1 Introduction
5.2 Linear Correlation
5.3 Simple Linear Regression
5.4 Coefficient of Determination
5.5 Regression Analysis: A Complete Example

5.1 Introduction
The main objective of this chapter is to analyze a
collection of paired sample data (or bivariate data)
and determine whether there appears to be a
relationship between the two variables.
Example:
What is the relationship between cholesterol levels
and the incidence of heart disease?

The two most common procedures for examining
relationships between measured variables are:
1. Correlation Analysis
   - Is there a relationship between two (or more) variables?
   - If there is, what is the strength of the relationship?
2. Regression Analysis
   - Develop a model that relates Y to X.
   - Predict future values of the Y variable.

Bivariate Data
When two variables are measured on a single
experimental unit, the resulting data are called bivariate
data.
You can describe each variable individually, and you
can also explore the relationship between the two
variables.
Bivariate data can be described with:
- Graphs
- Numerical measures

Examining Relationships
Dependent variable (also known as the Y variable):
the variable that measures the outcome of a study. It is
the variable that is being predicted or estimated.
Independent variable (also known as the X variable):
a variable that attempts to explain the variation in Y.
It is the predictor variable.

Scatter Diagram
When both variables are quantitative, call one variable x
and the other y. A single measurement is a pair of numbers (x, y)
that can be plotted on a two-dimensional graph called a
scatter plot.
[Figure: the point (2, 5) plotted at x = 2, y = 5.]
A scatter diagram (scatter plot) is a plot of paired observations that
portrays the relationship between the X and Y variables.

Example 1
The incomes and food expenditures of seven households are
listed below. Using this information, draw a scatter
diagram.

Income               Food expenditure
(hundreds of RM)     (hundreds of dollars)
35                   9
49                   15
21                   7
39                   11
15                   5
28                   8
25                   9

Example 1 (cont.)
The scatter diagram:
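
As a rough stand-in for a drawn figure, the sketch below prints a text scatter plot of the seven pairs as read from the Example 1 table. The function name `text_scatter` is hypothetical, and the approach assumes integer data so that each value maps to one row or column; it is an illustration, not the plotting method used in the course.

```python
# Data as read from the Example 1 table (income, food expenditure).
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]


def text_scatter(xs, ys):
    """Return text rows of a crude scatter plot for integer data.

    One row per integer y value (largest first), one column per
    integer x value; '*' marks an observed (x, y) pair.
    """
    x_min, x_max = min(xs), max(xs)
    rows = []
    for y_val in range(max(ys), min(ys) - 1, -1):
        cells = [" "] * (x_max - x_min + 1)
        for x, y in zip(xs, ys):
            if y == y_val:
                cells[x - x_min] = "*"
        rows.append(f"{y_val:>3} | " + "".join(cells))
    return rows


plot_rows = text_scatter(income, food)
print("\n".join(plot_rows))
```

The points rise from lower left to upper right, which suggests a positive relationship between income and food expenditure.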

5.2 Linear Correlation

Correlation Analysis
A group of techniques for measuring the association
between variables.
Examples:
1. Time spent studying and exam grade.
2. Salary and years of working experience.
3. Age and blood pressure.
4. Smoking and lung cancer.

Linear Correlation Coefficient
Measures the strength of the linear
association (relationship) between two variables.
The linear correlation coefficient measures how closely
the points in a scatter diagram are spread around the
regression line.
The correlation coefficient calculated for population
data is denoted by $\rho$, and for sample data it is denoted by
$r$.
The value of the correlation coefficient always lies in
the range $-1$ to $1$; that is,
$$ -1 \le \rho \le 1 \quad \text{and} \quad -1 \le r \le 1 $$

The linear correlation coefficient, $r$ (also called
the Pearson product-moment correlation coefficient),
measures the strength of the linear relationship
between the paired x and y quantitative values in
a sample.

$$ r = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}} $$

where
$$ S_{XX} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} $$
$$ S_{YY} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} $$
$$ S_{XY} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} $$
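
The formulas for S_XX, S_YY, S_XY and r translate directly into code. The following is a minimal sketch; the function names `sums` and `correlation` and the small data set are hypothetical and chosen only so that the points lie exactly on a straight line.

```python
import math


def sums(x, y):
    """Return (S_XX, S_YY, S_XY) using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xx = sum(v * v for v in x) - sum_x ** 2 / n
    s_yy = sum(v * v for v in y) - sum_y ** 2 / n
    s_xy = sum(p * q for p, q in zip(x, y)) - sum_x * sum_y / n
    return s_xx, s_yy, s_xy


def correlation(x, y):
    """Sample linear correlation coefficient r = S_XY / sqrt(S_XX * S_YY)."""
    s_xx, s_yy, s_xy = sums(x, y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical illustrative data, not taken from the chapter.
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2 * v + 1 for v in x])     # points on a rising line
r_down = correlation(x, [10 - 3 * v for v in x])  # points on a falling line
print(r_up, r_down)
```

Points on a rising line should give r equal to 1 and points on a falling line r equal to -1, up to floating-point rounding.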

Linear Correlation
Perfect positive linear correlation:
When r = 1, all points in the scatter diagram lie on a
straight line that slopes upward from left to right.
[Figure: scatter diagram with r = 1.]

Perfect negative linear correlation:
When r = -1, all points in the scatter diagram fall on a
straight line that slopes downward from left to right.
[Figure: scatter diagram with r = -1.]

Properties of the linear correlation coefficient, r:
(i) The value of r is always between -1 and 1 inclusive.
That is, $-1 \le r \le 1$.
(ii) r measures the strength of a linear relationship. It is
not designed to measure the strength of a relationship
that is not linear.

Example 2
Calculate the correlation coefficient for the data
on incomes and food expenditures of the seven
households in Example 1.
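
One way to set out the calculation is sketched below, taking the seven (income, food expenditure) pairs as read from the Example 1 table, with x as income and y as food expenditure. The intermediate sums are computed here and do not appear in the original slides.

```latex
\begin{aligned}
n &= 7, \quad \sum x = 212, \quad \sum y = 64, \quad
\sum x^2 = 7222, \quad \sum y^2 = 646, \quad \sum xy = 2150 \\
S_{XX} &= 7222 - \frac{212^2}{7} \approx 801.4286 \\
S_{YY} &= 646 - \frac{64^2}{7} \approx 60.8571 \\
S_{XY} &= 2150 - \frac{(212)(64)}{7} \approx 211.7143 \\
r &= \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}}
   \approx \frac{211.7143}{\sqrt{(801.4286)(60.8571)}} \approx 0.96
\end{aligned}
```

A value this close to 1 points to a strong positive linear relationship between income and food expenditure in this sample.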

5.3 Simple Linear Regression

What is the relationship between
food expenditure and income?
The simple regression equation
(model) expresses a relationship
between two variables: one independent
variable and one dependent variable.

Variable x:
(i) Independent variable
(ii) Predictor variable
(iii) Explanatory variable
Variable y:
(i) Dependent variable
(ii) Response variable

A (simple) regression model that gives a straight-line
relationship between two variables is called a
linear regression model,
$$ y = A + Bx $$
where
y = dependent variable
x = independent variable
A = y-intercept
B = slope

Given a collection of paired sample data, the
regression equation describes the relationship
between the two variables algebraically.
The graph of the regression equation is called the
regression line.

$$ \hat{y} = a + bx $$

For the least squares regression line, $\hat{y} = a + bx$:
$$ b = \frac{S_{XY}}{S_{XX}} \quad \text{and} \quad a = \bar{y} - b\bar{x} $$
The least squares regression line $\hat{y} = a + bx$
is also called the regression of y on x.

Interpretation of a and b
Note:
The value of a is the y-intercept: the predicted value
of y when x = 0.
When b is positive, an increase in x leads to an
increase in y and a decrease in x leads to a
decrease in y. Such a relationship between x and y
is called a positive linear relationship.
When b is negative, an increase in x leads to a
decrease in y and a decrease in x leads to an
increase in y. Such a relationship between x and y
is called a negative linear relationship.

Using the Regression Equation for Prediction
If there is a linear correlation between x and y, the
best predicted y-value is found by substituting the
x-value into the regression equation.

Example 3
The table below shows the incomes and food expenditures
(in hundreds of dollars) of seven households.

Income     Food Expenditure
35         9
49         15
21         7
39         11
15         5
28         8
25         9

(a) Find the least squares regression line for the data on
incomes and food expenditures of the seven
households.
(b) What is the predicted food expenditure for a
household with an income of RM3000?
(c) Give a brief interpretation of the values of a and b
calculated in part (a).
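
Parts (a) and (b) can be checked numerically. The sketch below is one possible approach, using the seven pairs as read from the household table; the function name `least_squares_line` is hypothetical, and it assumes income is recorded in hundreds of RM so that RM3000 corresponds to x = 30.

```python
# Data as read from the household table (x = income, y = food expenditure).
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]


def least_squares_line(x, y):
    """Return (a, b) for the line y-hat = a + b*x."""
    n = len(x)
    s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    s_xy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n
    b = s_xy / s_xx                    # slope: S_XY / S_XX
    a = sum(y) / n - b * sum(x) / n    # intercept: y-bar - b * x-bar
    return a, b


a, b = least_squares_line(income, food)
# Assumption: income is in hundreds of RM, so RM3000 is x = 30.
predicted = a + b * 30
print(f"y-hat = {a:.4f} + {b:.4f} x")
print(f"predicted food expenditure at x = 30: {predicted:.4f}")
```

This gives roughly a = 1.14 and b = 0.26, with a predicted food expenditure of about 9.07 hundred. For part (c), b suggests that each additional hundred of income goes with about 0.26 hundred more spent on food, while a is the predicted expenditure at zero income, a value outside the observed income range and so to be read with caution.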

5.4 Coefficient of Determination

Error Sum of Squares, SSE
The error sum of squares, denoted by SSE, is
$$ \text{SSE} = \sum (y - \hat{y})^2 $$
The values of a and b that give the minimum SSE
are called the least squares estimates of A and B, and
the regression line obtained with these estimates is
called the least squares line.

Standard Deviation of Random Errors, $s_e$
The standard deviation of errors tells how widely the
errors, and hence the values of y, are spread for a given x.

$$ s_e = \sqrt{\frac{\text{SSE}}{n-2}} \quad \text{where } \text{SSE} = \sum (y - \hat{y})^2 $$

For calculation, we will use

$$ s_e = \sqrt{\frac{S_{YY} - b\,S_{XY}}{n-2}} \quad \text{where } \text{SSE} = S_{YY} - b\,S_{XY} $$
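
The shortcut SSE = S_YY - b S_XY can be derived in a few lines. The sketch below assumes the standard deviation-form identities S_XX = Σ(x - x̄)², S_YY = Σ(y - ȳ)² and S_XY = Σ(x - x̄)(y - ȳ), which are equivalent to the computational formulas but are not written out in the slides.

```latex
\begin{aligned}
\text{SSE} &= \sum (y - \hat{y})^2
            = \sum \bigl[(y - \bar{y}) - b\,(x - \bar{x})\bigr]^2
            && \text{since } \hat{y} = a + bx,\ a = \bar{y} - b\bar{x} \\
           &= S_{YY} - 2b\,S_{XY} + b^2 S_{XX} \\
           &= S_{YY} - 2b\,S_{XY} + b\,S_{XY}
            && \text{since } b\,S_{XX} = S_{XY} \\
           &= S_{YY} - b\,S_{XY}
\end{aligned}
```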


Example 4
Compute the standard deviation of errors, $s_e$, for
the data on monthly incomes and food expenditures
of the seven households given in Example 3.
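
A numerical check of this example might look like the sketch below. It uses the seven pairs as read from the household table and the computational form of the standard deviation of errors; the function name `std_error_of_estimate` is hypothetical.

```python
import math

# Data as read from the household table (x = income, y = food expenditure).
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]


def std_error_of_estimate(x, y):
    """Return s_e = sqrt((S_YY - b*S_XY) / (n - 2))."""
    n = len(x)
    s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
    s_xy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n
    b = s_xy / s_xx          # least squares slope
    sse = s_yy - b * s_xy    # shortcut form of the error sum of squares
    return math.sqrt(sse / (n - 2))


s_e = std_error_of_estimate(income, food)
print(f"s_e = {s_e:.4f}")
```

The result is a little under 1 (in hundreds). Hand calculations that round b to four decimals first may differ slightly in the last digits.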

Total Sum of Squares, SST
The total sum of squares, denoted by SST, is given
by

$$ \text{SST} = \sum (y - \bar{y})^2 = S_{YY} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} $$

Example 5
For the regression line in Example 3, find the values
of SSE and SST.
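
The sketch below computes both quantities directly from their definitions, using the seven pairs as read from the household table, and then compares the directly summed SSE with the shortcut S_YY - b S_XY. It uses the deviation form of the sums, which is an equivalent rewriting rather than the computational formulas shown in the slides.

```python
# Data as read from the household table (x = income, y = food expenditure).
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(food) / n

# Deviation forms of S_XX and S_XY, then the least squares line.
s_xx = sum((x - x_bar) ** 2 for x in income)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, food))
b = s_xy / s_xx
a = y_bar - b * x_bar

# SST: squared deviations of y from its mean (equals S_YY).
sst = sum((y - y_bar) ** 2 for y in food)
# SSE: squared deviations of y from the fitted line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(income, food))
# Shortcut: SSE = S_YY - b * S_XY.
sse_shortcut = sst - b * s_xy

print(f"SST = {sst:.4f}, SSE = {sse:.4f}, shortcut SSE = {sse_shortcut:.4f}")
```

SST comes out near 60.86 and SSE near 4.93, and the two ways of computing SSE agree up to rounding.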

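Regression Sum of Squares, SSR
Using the regression line instead of the mean $\bar{y}$ to
predict y reduces the sum of squared errors from SST to SSE.
This reduction in squared errors is called the
regression sum of squares and is denoted by SSR.
Thus,

$$ \text{SSR} = \text{SST} - \text{SSE} $$
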
Coefficient of Determination, $r^2$
Measures how well the independent variable
explains the dependent variable in the regression
model.

The coefficient of determination, denoted by $r^2$,
represents the proportion of SST that is explained
by the use of the linear regression model.

$$ r^2 = \frac{\text{SSR}}{\text{SST}} $$

The computational formula for $r^2$ is:

$$ r^2 = \frac{S_{XY}^2}{S_{XX}\,S_{YY}} $$

Example 6
For the data in Example 3, calculate the coefficient
of determination. Interpret your answer.
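
As a check, the sketch below computes the coefficient of determination two ways for the seven pairs as read from the household table: from the definition SSR / SST and from the computational formula. The variable names are illustrative only.

```python
# Data as read from the household table (x = income, y = food expenditure).
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]

n = len(income)
s_xx = sum(x * x for x in income) - sum(income) ** 2 / n
s_yy = sum(y * y for y in food) - sum(food) ** 2 / n
s_xy = sum(x * y for x, y in zip(income, food)) - sum(income) * sum(food) / n

# Definition: r^2 = SSR / SST, with SST = S_YY and SSE = S_YY - b * S_XY.
b = s_xy / s_xx
sst = s_yy
sse = s_yy - b * s_xy
ssr = sst - sse
r2_definition = ssr / sst

# Computational formula: r^2 = S_XY^2 / (S_XX * S_YY).
r2_formula = s_xy ** 2 / (s_xx * s_yy)

print(f"r^2 = {r2_definition:.4f} (definition), {r2_formula:.4f} (formula)")
```

Both routes give about 0.92, so roughly 92% of the total variation in food expenditure in this sample is explained by the linear regression on income.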

5.5 Regression Analysis: A Complete Example
A random sample of eight drivers insured with a
company and having similar auto insurance policies
was selected. The following table lists their driving
experiences (in years) and monthly auto insurance
premiums (in dollars).

Driving Experience     Monthly Auto Insurance
(in years)             Premium (in dollars)
[missing]              64
[missing]              87
12                     50
[missing]              71
15                     44
[missing]              56
25                     42
16                     60

(a) Does the insurance premium depend on the driving
experience or does the driving experience depend on
the insurance premium? Do you expect a positive or
negative relationship between these two variables?
(b) Compute $S_{XX}$, $S_{YY}$ and $S_{XY}$.
(c) Find the least squares regression line by choosing
appropriate dependent and independent variables
based on the answer in part (a).
(d) Interpret the meaning of the values of a and b
calculated in part (c).
(e) Plot the scatter diagram and the regression line.
(f) Calculate $r$ and $r^2$ and explain what they mean.
(g) Calculate the standard deviation of errors.
(h) Predict the monthly auto insurance premium for a
driver with 10 years of driving experience.

The End of Chapter 5