Introductory Mathematics & Statistics

Chapter 17
Correlation
Copyright 2010 McGraw-Hill Australia Pty Ltd PowerPoint slides to accompany Croucher, Introductory Mathematics and Statistics, 5e

Learning Objectives
- Understand correlation analysis and relationships between variables
- Draw and interpret a scatter diagram
- Understand and calculate the product-moment correlation coefficient
- Understand and calculate the rank correlation coefficient
- Recognise spurious correlation
- Test a correlation coefficient for significance

17.1 Introduction
The consideration of whether there is any relationship or association between two variables is called correlation analysis.
The correlation coefficient is the index that defines the strength of association between two variables.
There are many instances where you may want to test whether there is a relationship between two variables. Examples:
- the number of CDs sold in a district and the number of teenagers who live in that district
- the number of breakdowns of a certain type of machine and the age of the machine
- the level of education obtained by an individual and his or her income in later working life

If there is a relationship between any two variables, it may be possible to predict the value of one of the variables from the value of the other.

17.1 Introduction (cont)


Dependent and independent variables
To establish whether there is a relationship between two variables, an appropriate random sample must be taken and a measurement recorded of each of the two variables.
Such data are said to be bivariate data, since they consist of two variables.
Data may be written as ordered pairs, where they are expressed in a specific order for each individual, i.e. (first variable value, second variable value).

17.1 Introduction (cont)


To predict the value of one variable from the value of the other (if a relationship exists), it is useful to label the variables according to the following basic rule:

- The dependent variable is the one whose value is to be predicted. It is usually denoted by the letter y.
- The independent variable is the one whose value is used to make the prediction. It is usually denoted by the letter x.

Written as (x, y), the ordered pairs resemble points on a graph. Plotting them makes it possible to get a feel for what the actual relationship between the variables may be, even before any calculations are undertaken.

17.2 Scatter diagrams


A scatter diagram or scatter plot is a display in which ordered pairs of measurements are plotted on a system of coordinate axes.
- The independent variable (x) is represented on the horizontal axis.
- The dependent variable (y) is represented on the vertical axis.

The points representing the data are usually plotted as either dots or crosses.

17.2 Scatter diagrams (cont)


Example
An insurance company manager is concerned about the health of female adults, since the company is prepared to give a reduced premium rate to those who have a certain level of fitness. In particular, he would like to investigate how their height is related to their weight, with a view to possibly using these measurements as a fitness criterion. To this end, he selects a random sample of 12 adult females and measures both their height (in cm) and weight (in kg). The results are:
Number  Height (cm)  Weight (kg)
1       167          71.8
2       168          72.0
3       165          69.3
4       165          70.0
5       160          64.2
6       156          58.1
7       169          74.0
8       166          70.0
9       162          59.3
10      158          59.0
11      168          67.1
12      168          64.0

17.2 Scatter diagrams (cont)


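Solution
[Figure: scatter diagram of the 12 ordered pairs, with height (cm) on the horizontal axis and weight (kg) on the vertical axis]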
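Since the slide's figure is not reproduced here, the following is a minimal sketch of how the scatter diagram for the example could be approximated in plain text. The function name `ascii_scatter` and the grid size are illustrative assumptions, and height is taken as x and weight as y, as in the later calculation of r. The sketch assumes the x-values and the y-values are not all identical.

```python
# Heights (cm) and weights (kg) of the 12 adult females in the example.
heights = [167, 168, 165, 165, 160, 156, 169, 166, 162, 158, 168, 168]
weights = [71.8, 72.0, 69.3, 70.0, 64.2, 58.1, 74.0, 70.0, 59.3, 59.0, 67.1, 64.0]


def ascii_scatter(xs, ys, width=40, height=12):
    """Return text rows with '*' marking each (x, y) pair.

    x increases to the right and y increases upwards, as on a scatter diagram.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(line) for line in grid]


rows = ascii_scatter(heights, weights)
for line in rows:
    print("|" + line)
print("+" + "-" * len(rows[0]))
```

The points drift from the lower left to the upper right, which suggests that taller women in the sample tend to be heavier.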

17.3 The Pearson product-moment correlation coefficient


The numerical measure of the degree of association between two variables is given by the product-moment correlation coefficient.
This index provides a quantitative measure of the extent to which the two variables are associated.
The value of the correlation coefficient is calculated from the bivariate data by means of a formula that involves the values of the data points.
The value of the correlation coefficient calculated from a sample is denoted by the letter r.
The value of the correlation coefficient calculated from a population is denoted by the Greek letter ρ (rho).

17.3 The Pearson product-moment correlation coefficient (cont)


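The value of r

$$ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} $$

Where

$$ S_{xx} = \sum (x - \bar{x})^2 = (n - 1) S_x^2 $$

$$ S_{yy} = \sum (y - \bar{y})^2 = (n - 1) S_y^2 $$

$$ S_{xy} = \sum (x - \bar{x})(y - \bar{y}) $$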

17.3 The Pearson product-moment correlation coefficient (cont)


$S_x$ = the standard deviation of the x-variable
$S_y$ = the standard deviation of the y-variable
n = the number of pairs of observations

Alternative formulae for the calculation of r

$$ r = \frac{S_{xy}}{(n - 1) S_x S_y} $$

or

$$ r = \frac{\sum xy - n \bar{x} \bar{y}}{(n - 1) S_x S_y} $$
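As a rough check that the formulae for r agree, the sketch below computes r once from the sums of squared deviations and once from the standard deviations. The function names and the five data pairs are hypothetical, chosen only to exercise the formulae.

```python
import math
import statistics


def r_from_sums(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy), using sums of deviations from the means."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


def r_from_std_devs(xs, ys):
    """r = (sum(xy) - n * x_bar * y_bar) / ((n - 1) * Sx * Sy)."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # statistics.stdev divides by n - 1 (the sample standard deviation).
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    return (sum_xy - n * x_bar * y_bar) / ((n - 1) * s_x * s_y)


# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_from_sums(xs, ys), 4), round(r_from_std_devs(xs, ys), 4))
```

For these pairs, Sxx = 10, Syy = 6 and Sxy = 6, so both functions should return 6/√60, which is about 0.77.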

17.3 The Pearson product-moment correlation coefficient (cont)


Note:
- The value of r must always lie between −1 and +1 (both inclusive).
- If r = +1, the two variables have perfect positive correlation. This means that on a scatter diagram, the points all lie on a straight line that has a positive slope.
- If r = −1, the two variables have perfect negative correlation. This means that on a scatter diagram, the points all lie on a straight line that has a negative slope.
- If the two variables are positively correlated, but not perfectly so, the coefficient lies between 0 and 1.
- If the two variables are negatively correlated, but not perfectly so, the coefficient lies between −1 and 0.
- If the two variables have no overall upward or downward trend whatsoever, the coefficient is 0.
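The boundary cases in this note can be illustrated with a small sketch. The function `pearson_r` and the three data sets are hypothetical: two sets of points on straight lines and one symmetric curve with no overall trend.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [-2, -1, 0, 1, 2]
r_up = pearson_r(xs, [3 * x + 1 for x in xs])     # line with positive slope
r_down = pearson_r(xs, [-2 * x + 5 for x in xs])  # line with negative slope
r_flat = pearson_r(xs, [x * x for x in xs])       # symmetric curve, no trend
print(r_up, r_down, r_flat)
```

The third data set follows a clear curved pattern, yet it rises as much as it falls, so r is 0. A coefficient of 0 therefore indicates no overall linear trend rather than no relationship of any kind.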

17.3 The Pearson product-moment correlation coefficient (cont)


Positive and negative correlation
It can be seen that, if correlation exists, it can be in one of two directions: positive or negative.

If two variables x and y are positively correlated, this means that:
- large values of x are associated with large values of y, and
- small values of x are associated with small values of y.

If two variables x and y are negatively correlated, this means that:
- large values of x are associated with small values of y, and
- small values of x are associated with large values of y.

17.3 The Pearson product-moment correlation coefficient (cont)


Positive correlation
Examples of scatter diagrams

17.3 The Pearson product-moment correlation coefficient (cont)


Negative correlation
Examples of scatter diagrams

17.3 The Pearson product-moment correlation coefficient (cont)


Calculation of r
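Example
Calculate the correlation coefficient for the data in the previous example.

Solution
In this case, we denote height by x and weight by y.

n = 12, $\bar{x} = 164.3$, $\bar{y} = 66.6$

$$ S_{xx} = \sum (x - \bar{x})^2 = 206.68 $$

$$ S_{yy} = \sum (y - \bar{y})^2 = 337.44 $$

$$ S_{xy} = \sum (x - \bar{x})(y - \bar{y}) = 216.62 $$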

17.3 The Pearson product-moment correlation coefficient (cont)


Solution (cont)

$$ r = \frac{216.62}{\sqrt{206.68 \times 337.44}} = 0.82 $$
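The worked example can be reproduced with a short sketch using the 12 height and weight pairs from the scatter diagram example. The variable names are illustrative. Because the sketch works with unrounded means, the sums may differ from the slide's figures in the second decimal place.

```python
import math

# Height (cm) is x and weight (kg) is y, as in the solution.
heights = [167, 168, 165, 165, 160, 156, 169, 166, 162, 158, 168, 168]
weights = [71.8, 72.0, 69.3, 70.0, 64.2, 58.1, 74.0, 70.0, 59.3, 59.0, 67.1, 64.0]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Sums of squared deviations and of cross products about the means.
s_xx = sum((x - x_bar) ** 2 for x in heights)
s_yy = sum((y - y_bar) ** 2 for y in weights)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))

r = s_xy / math.sqrt(s_xx * s_yy)
print(n, round(x_bar, 1), round(y_bar, 1))
print(round(s_xx, 2), round(s_yy, 2), round(s_xy, 2), round(r, 2))
```

The value of r rounds to 0.82, in agreement with the solution.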

17.4 The Spearman rank correlation coefficient


An alternative measure of the degree of association between two variables is the rank correlation coefficient.
This coefficient does not strictly measure the degree of association between the actual observations, but rather the association between the ranks of the observations.

$$ r_s = 1 - \frac{6 \sum d^2}{n^3 - n} $$

Where:
d = difference between corresponding pairs of rankings
n = number of pairs of observations
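A minimal sketch of the rank correlation calculation follows. The helper names `ranks` and `spearman_rs` and the two sets of judges' scores are hypothetical, and the ranking helper assumes there are no tied values, since the formula in this form does not allow for ties.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """r_s = 1 - 6 * sum(d^2) / (n^3 - n), where d is the difference in ranks."""
    n = len(xs)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d_squared / (n ** 3 - n)


# Hypothetical scores given to five items by two judges (no ties).
judge_a = [86, 92, 71, 65, 78]
judge_b = [80, 95, 75, 60, 70]
print(ranks(judge_a), ranks(judge_b), spearman_rs(judge_a, judge_b))
```

Here the ranks are (4, 5, 2, 1, 3) and (4, 5, 3, 1, 2), so the sum of d² is 2 and r_s = 1 − 12/120 = 0.9.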

17.4 The Spearman rank correlation coefficient (cont)


Being a correlation coefficient, $r_s$ has the following properties:
- $r_s = +1$ for perfect positive correlation of the ranks, that is, when the x-rank = the y-rank in each case, and hence $\sum d^2 = 0$
- $r_s = -1$ for perfect negative correlation of the ranks, that is, when they run in precisely opposite order to each other, and hence

$$ \sum d^2 = \frac{n^3 - n}{3} $$

- All other values of $r_s$ lie between −1 and +1

The subscript s in $r_s$ stands for Spearman and is used to distinguish it from the Pearson product-moment correlation coefficient.
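The sum of squared rank differences quoted for perfectly opposite rankings can be checked with a short derivation. As an assumption for illustration, the x-ranks are taken to be i = 1, ..., n and the y-ranks n + 1 − i, so that each difference is d_i = 2i − (n + 1).

```latex
\begin{align*}
\sum_{i=1}^{n} d_i^2
  &= \sum_{i=1}^{n} \bigl(2i - (n+1)\bigr)^2 \\
  &= 4\sum_{i=1}^{n} i^2 - 4(n+1)\sum_{i=1}^{n} i + n(n+1)^2 \\
  &= \frac{2n(n+1)(2n+1)}{3} - 2n(n+1)^2 + n(n+1)^2 \\
  &= n(n+1)\,\frac{2(2n+1) - 3(n+1)}{3} \\
  &= \frac{n(n+1)(n-1)}{3} = \frac{n^3 - n}{3}
\end{align*}
```

Substituting this sum into the formula for the rank correlation coefficient gives 1 − 6(n³ − n)/(3(n³ − n)) = 1 − 2 = −1, as stated.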

17.5 Spurious correlation


- If two variables are significantly correlated, this does not imply that one must be the cause of the other.
- The degree of association is not directly proportional to the magnitude of the correlation coefficient.
- The correlation coefficient is subject to variations in sampling.
- Correlation between two variables that is really induced by other external variables is referred to as spurious correlation or false correlation.
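One way to picture a correlation induced by an external variable is the sketch below. All figures are hypothetical: town population is the external variable, and the counts of cinemas and pharmacies are each generated from population plus a small fixed wobble, with neither count depending on the other.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data for ten towns; population is in thousands.
population = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
wobble_a = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
wobble_b = [-2, -2, 2, 2, -2, -2, 2, 2, -2, -2]

# Each count depends only on population, not on the other count.
cinemas = [0.5 * p + a for p, a in zip(population, wobble_a)]
pharmacies = [0.8 * p + b for p, b in zip(population, wobble_b)]

r = pearson_r(cinemas, pharmacies)
print(round(r, 2))
```

The coefficient comes out close to +1 even though, by construction, cinemas have no effect on pharmacies; the association comes entirely from population.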

17.6 Interpretation of the correlation coefficient


Testing a value of r
The actual test involves the calculation of a t-statistic, which can be found from the values of r and n. The steps are:
1. Assume that the two variables are uncorrelated
2. Calculate the correlation coefficient (r)
3. Calculate the value of the t-statistic:

$$ t = r \sqrt{\frac{n - 2}{1 - r^2}} $$

4. Calculate the value of ν (the degrees of freedom), where ν = n − 2
5. Use Table 7 to find the critical value. This is the value in the 0.05 column

If |t| < critical value, conclude that there is no significant correlation between the two variables.
If |t| > critical value, conclude that there is a significant correlation between the two variables.
The risk of wrongly concluding that a correlation exists when it does not is 5%.
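The steps above can be sketched for the height and weight example, where r = 0.82 and n = 12. The function name `t_statistic` is illustrative. The critical value 2.228 is an assumption: it is the standard two-tailed 5% value of Student's t for 10 degrees of freedom and has not been checked against the book's Table 7. The formula is undefined when r is exactly +1 or −1.

```python
import math


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r, n = 0.82, 12  # values from the height and weight example
t = t_statistic(r, n)
degrees_of_freedom = n - 2

# Assumed critical value (two-tailed, 5%, 10 degrees of freedom);
# in practice it would be read from the table the text refers to.
critical_value = 2.228
significant = abs(t) > critical_value
print(round(t, 2), degrees_of_freedom, significant)
```

With these inputs t is roughly 4.5, well above the assumed critical value, so the assumption of no correlation would be rejected.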

17.6 Interpretation of the correlation coefficient (cont)


Testing a value of rs for significance
The value of $r_s$ is tested for significance using a different procedure from that for r. It is outlined below:
1. Assume that the two sets of rankings are uncorrelated
2. Find the critical value of $r_s$ for the given value of n, using Table 8
3. If $|r_s|$ > critical value, reject the assumption in Step 1; a significant relationship does exist between the two sets of rankings
4. If $|r_s|$ < critical value, accept the assumption in Step 1; a significant relationship does not exist between the two sets of rankings

17.7 Unusual claimed correlations


There is no shortage of research in which some very odd relationships are claimed to have been found. Some include the following:
1. Age at which a mother gives birth and life expectancy of the child (the chance of living to 100 is doubled if the child was born to a woman aged under 25 years)
2. Attractiveness of a couple and the sex of their first child (physically attractive couples are 36% more likely than an unattractive couple to produce a girl as their first child)
3. Height of children and mental development (short children perform more poorly on intelligence tests than tall ones)
4. Blood flow to the heart and type of movie watched (watching comedy boosts blood flow; sad or distressing movies lower it)

Summary
- We understood correlation analysis and relationships between variables
- We drew and interpreted a scatter diagram
- We understood and calculated the product-moment correlation coefficient
- We understood and calculated the rank correlation coefficient
- We recognised spurious correlation
- We tested a correlation coefficient for significance
