
Simple and Multiple Linear Regression

Todd Thomas
University of Arkansas
tjt001@uark.edu

5 May 2014

Todd Thomas (UARK)

Simple and Multiple Linear Regression

5 May 2014

1 / 23

Overview

1 Key Concepts
    Simple and Multiple Linear Regression
2 Linear Correlation
    Positive & Negative
    Strength of Correlation
3 Calculating r
    Notation for the Linear Correlation Coefficient
    Requirements
    Formula
    Interpretation of r
    Properties of r
    Example
4 Multiple Linear Regression
    Key Concept
5 Finding Multiple Regression Equation
    Notation
    Requirement

Key Concepts

Simple and Multiple Linear Regression

Key Concepts

In this section we introduce methods for determining whether a correlation, or association, exists between two variables, and whether that correlation is linear.
Definition
Linear Correlation: A linear correlation exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line.


Linear Correlation

Positive & Negative

Linear Correlation


Linear Correlation

Strength of Correlation

Measure the Strength of Linear Correlation

Because conclusions based on visual inspection of scatter plots are largely subjective, we use the linear correlation coefficient r, a number that measures the strength of the linear association between the two variables.
Definition
The linear correlation coefficient r measures the strength of the linear correlation between the paired quantitative x and y values in a sample.


Calculating r

Notation for the Linear Correlation Coefficient

Notation

$n$: number of pairs of sample data.
$\sum$: denotes addition of the items indicated.
$\sum x$: the sum of all x-values.
$\sum x^2$: indicates that each x-value should be squared and then those squares added.
$(\sum x)^2$: indicates that the x-values should be added and the total then squared.
$\sum xy$: indicates that each x-value should first be multiplied by its corresponding y-value; after obtaining all such products, find their sum.
$r$: linear correlation coefficient for sample data.
$\rho$: linear correlation coefficient for a population of paired data.
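To make the difference between $\sum x^2$ and $(\sum x)^2$ concrete, the short Python sketch below computes each quantity in the notation list for the shoe print/height data used in Example 1. The helper name `notation_sums` is illustrative only, not part of any library.

```python
def notation_sums(xs, ys):
    """Return the summary quantities that appear in the formula for r."""
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    return {
        "n": len(xs),                                  # number of (x, y) pairs
        "sum_x": sum(xs),                              # Σx
        "sum_y": sum(ys),                              # Σy
        "sum_x2": sum(x * x for x in xs),              # Σx²: square each x, then add
        "sum_x_squared": sum(xs) ** 2,                 # (Σx)²: add the x's, then square
        "sum_y2": sum(y * y for y in ys),              # Σy²
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),  # Σxy: multiply each pair, then add
    }


shoe = [29.7, 29.7, 31.4, 31.8, 27.6]
height = [175.3, 177.8, 185.4, 175.3, 172.7]

for name, value in notation_sums(shoe, height).items():
    print(f"{name:>14}: {value:.2f}")
```

The printed sums match the totals used in Example 1 (Σx = 150.2, Σx² = 4523.14, Σxy = 26649.69), and they show that Σx² = 4523.14 is quite different from (Σx)² = 22560.04.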


Calculating r

Requirements

Requirements

The sample of paired (x, y) data is a simple random sample of quantitative data.
Visual examination of the scatter plot must confirm that the points approximate a straight-line pattern.
Outliers that are known to be errors must be removed from the data set.
The pairs of (x, y) data must have a bivariate normal distribution.


Calculating r

Formula

Formula for Calculating r

For simpler manual calculations:

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}} \tag{1} $$

Or, to get a better understanding of r:

$$ r = \frac{\sum (z_x z_y)}{n - 1} \tag{2} $$

where $z_x$ denotes the z-score for an individual sample value x and $z_y$ is the z-score for the corresponding sample value y.
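As a check that formulas (1) and (2) describe the same number, the sketch below implements both in plain Python and applies them to the shoe print/height data of Example 1. The function names are illustrative. Formula (2) pairs the n − 1 divisor with sample standard deviations, which is what `statistics.stdev` computes.

```python
import math
import statistics


def r_computational(xs, ys):
    """Formula (1): shortcut form built from n, Σx, Σy, Σx², Σy² and Σxy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


def r_zscores(xs, ys):
    """Formula (2): sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sample standard deviations (divisor n - 1), matching the n - 1 in formula (2)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


shoe = [29.7, 29.7, 31.4, 31.8, 27.6]
height = [175.3, 177.8, 185.4, 175.3, 172.7]
print(round(r_computational(shoe, height), 3))  # 0.591
print(round(r_zscores(shoe, height), 3))        # 0.591
```

Formula (1) avoids computing means and standard deviations, which is why it suits hand calculation; formula (2) makes it visible that r is an average agreement between standardized x and y values.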


Calculating r

Interpretation of r

Interpretation of r

1. Using computer software to interpret r: If the P-value computed from r is less than or equal to the significance level, conclude that there is sufficient evidence to support a claim of linear correlation. Otherwise, there is not sufficient evidence to support a claim of linear correlation.
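Software commonly obtains this P-value by converting r into the test statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ and using Student's t distribution with n − 2 degrees of freedom (the standard two-tailed test of $H_0: \rho = 0$); that any particular package does exactly this is an assumption here. Python's standard library has no t-distribution CDF, so this sketch integrates the t density numerically with Simpson's rule. The function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p_from_r(r, n, steps=10_000):
    """Two-tailed P-value for H0: rho = 0, assuming |r| < 1 and n > 2."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))  # t = r*sqrt(n-2)/sqrt(1-r^2)
    # Simpson's rule (steps must be even) for the area under the density on [0, |t|]
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The density is symmetric about 0, so the two tails together hold 1 - 2*area
    return 1 - 2 * area


alpha = 0.05  # significance level
p = two_tailed_p_from_r(0.591, n=5)
print(f"P-value = {p:.3f}")
if p <= alpha:
    print("Sufficient evidence to support a claim of linear correlation.")
else:
    print("Not sufficient evidence to support a claim of linear correlation.")
```

For r = 0.591 and n = 5 this gives a P-value of about 0.294, the same value quoted for Example 1. With SciPy installed, `scipy.stats.pearsonr` returns r together with this two-sided P-value.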


Calculating r

Properties of r

Interpreting the Linear Correlation Coefficient r

The value of r is always between $-1$ and $1$ inclusive, that is, $-1 \le r \le 1$.
If all values of either variable are converted to a different scale, the value of r does not change.
The value of r is not affected by the choice of x or y: interchanging all x- and y-values leaves r unchanged.
r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear.
r is very sensitive to outliers in the sense that a single outlier can dramatically affect its value.
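Several of these properties can be checked numerically. The sketch below reuses the Example 1 data, rescales both variables with positive linear transformations (a negative multiplier would flip the sign of r), swaps x and y, and then appends one hypothetical outlier pair, (35.0, 150.0), that is not part of the original data.

```python
import math


def pearson_r(xs, ys):
    """Sample linear correlation coefficient r, computed with formula (1)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / (math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))


shoe = [29.7, 29.7, 31.4, 31.8, 27.6]
height = [175.3, 177.8, 185.4, 175.3, 172.7]

r = pearson_r(shoe, height)
# Change of scale: positive multiplier on x, positive linear transformation on y
r_rescaled = pearson_r([x / 2.54 for x in shoe], [10 * y + 3 for y in height])
# Interchanging the roles of x and y
r_swapped = pearson_r(height, shoe)
# One hypothetical outlier pair appended to the data
r_outlier = pearson_r(shoe + [35.0], height + [150.0])

print(f"r = {r:.3f}, rescaled: {r_rescaled:.3f}, swapped: {r_swapped:.3f}, "
      f"with outlier: {r_outlier:.3f}")
```

The first three values agree, while the single added point pulls r from about 0.591 to a negative value, illustrating how sensitive r is to outliers.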


Calculating r

Example

Example 1

The paired shoe print/height data from five males are listed below. Find the value of the linear correlation coefficient r for the paired sample data, using a significance level of α = 0.05.

x (Shoe Print)   y (Height)   x²        y²          xy
29.7             175.3        882.09    30730.09    5206.41
29.7             177.8        882.09    31612.84    5280.66
31.4             185.4        985.96    34373.16    5821.56
31.8             175.3        1011.24   30730.09    5574.54
27.6             172.7        761.76    29825.29    4766.52

Calculating r

Example

Example cont.
Now we calculate the value of r using equation (1):

$$
\begin{aligned}
r &= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}} \\
  &= \frac{5(26649.69) - (150.2)(886.5)}{\sqrt{5(4523.14) - (150.2)^2}\,\sqrt{5(157271.47) - (886.5)^2}} \\
  &= \frac{96.15}{\sqrt{55.66}\,\sqrt{475.10}} \\
  &= 0.591
\end{aligned}
$$

With r = 0.591 and significance level α = 0.05, is there sufficient evidence to support a claim that there is a linear correlation between shoe print length and height?

Calculating r

Example

Example 1 cont.

Using technology: If the computed P-value is less than or equal to the significance level, conclude that there is a linear correlation. For the example above, technology gives a P-value of 0.294. Because this is greater than the significance level α = 0.05, we conclude that there is not sufficient evidence to support a claim of linear correlation for this data set.


Multiple Linear Regression

Key Concept

Key Concepts

In the preceding section we learned how to determine whether there is a linear correlation between two variables. With multiple linear regression we analyze a linear relationship among more than two variables. We do this by:
finding the multiple regression equation;
using the value of adjusted $R^2$ and the P-value as measures of how well the multiple regression equation fits the sample data.


Multiple Linear Regression

Key Concept

Definition

Definition
A multiple regression equation expresses a linear relationship between a response variable y and two or more predictor variables $(x_1, x_2, \ldots, x_k)$, where k is the number of predictor variables. The general form of a multiple regression equation obtained from sample data is:

$$ \hat{y} = b_0 + b_1x_1 + b_2x_2 + \cdots + b_kx_k \tag{3} $$

Finding Multiple Regression Equation

Notation

Notation

$\hat{y} = b_0 + b_1x_1 + b_2x_2 + \cdots + b_kx_k$: multiple regression equation found from sample data.
$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k$: multiple regression equation for population data.
$\hat{y}$: predicted value of y (computed using the multiple regression equation found from sample data).
$n$: sample size.
$k$: number of predictor variables.


Finding Multiple Regression Equation

Requirement

Requirements

For any specific set of x values, the regression equation is associated with a random error, often denoted by $\varepsilon$.
We assume that such errors are normally distributed with a mean of 0 and a standard deviation of $\sigma$, and that the random errors are independent.
Manual calculations are not practical, so computer software such as StatCrunch, Excel, a TI-83/84 calculator, or STATDISK must be used to find a multiple regression equation.
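For readers curious about what such software computes, one standard approach is ordinary least squares: choose the coefficients $b_0, b_1, \ldots$ that minimize the sum of squared differences between observed and predicted y. The sketch below solves the least-squares normal equations $(X^\top X)\,b = X^\top y$ with Gaussian elimination in plain Python. It is an illustration under that assumption, not the internal method of any package named above, and the function name is hypothetical.

```python
def fit_multiple_regression(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bk] for y-hat = b0 + b1*x1 + ... + bk*xk.

    rows: one tuple (x1, ..., xk) of predictor values per observation.
    ys:   the observed response values, in the same order.
    Solves the normal equations (X^T X) b = X^T y by Gaussian elimination.
    """
    X = [[1.0, *row] for row in rows]   # leading 1.0 column produces the intercept b0
    n_obs, p = len(X), len(X[0])        # p = k + 1 coefficients
    # Augmented matrix [X^T X | X^T y]
    A = []
    for j in range(p):
        row = [sum(X[i][j] * X[i][m] for i in range(n_obs)) for m in range(p)]
        row.append(sum(X[i][j] * ys[i] for i in range(n_obs)))
        A.append(row)
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        b[r] = (A[r][p] - sum(A[r][c] * b[c] for c in range(r + 1, p))) / A[r][r]
    return b


# Hypothetical data generated exactly from y = 2 + 3*x1 - 1*x2 (no random error)
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
ys = [2 + 3 * x1 - x2 for x1, x2 in rows]
print([round(v, 6) for v in fit_multiple_regression(rows, ys)])  # [2.0, 3.0, -1.0]
```

Because the hypothetical data contain no random error, the fit recovers the generating coefficients exactly; with real data the residuals play the role of the random errors described above.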


Finding Multiple Regression Equation

Example 2

Example 2

The next page lists a random sample of heights of mothers, fathers, and daughters. Find the multiple regression equation in which the response variable (y) is the height of the daughter and the predictor variables (x) are the heights of the mother and the father.


Finding Multiple Regression Equation

Example 2

Data Set
Mother Height   Father Height   Daughter Height
63              64              58.6
67              65              64.7
64              67              65.3
60              72              61.0
65              72              65.4
67              72              67.4
59              67              60.9
60              71              63.1
58              66              60.0
72              75              71.1
63              69              62.2

Finding Multiple Regression Equation

Example 2

Results
Using technology, we find that our regression equation is:
daughter = 7.5 + 0.707 mother + 0.164 father
$R^2$ = 67.5%    $R^2$(adj) = 63.7%    P-value = 0.00
Using the notation from the earlier slide, we denote our multiple regression equation as:

$$ \hat{y} = 7.5 + 0.707x_1 + 0.164x_2 \tag{4} $$

where $\hat{y}$ is the predicted height of the daughter, $x_1$ is the height of the mother, and $x_2$ is the height of the father.
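Once equation (4) is available, it can be used for prediction. The sketch below plugs the first family of the Example 2 data set (mother 63, father 64) into the reported equation and compares the prediction with the observed daughter height of 58.6; the function name is hypothetical, and the coefficients are the ones reported above.

```python
def predict_daughter_height(mother, father):
    """Predicted daughter height from equation (4): y-hat = 7.5 + 0.707*x1 + 0.164*x2."""
    b0, b1, b2 = 7.5, 0.707, 0.164  # coefficients reported in Example 2
    return b0 + b1 * mother + b2 * father


predicted = predict_daughter_height(63, 64)
residual = 58.6 - predicted  # observed minus predicted
print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
# predicted = 62.54, residual = -3.94
```

The residual (observed minus predicted) is the sample counterpart of the random error in the regression model.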


Finding Multiple Regression Equation

Adjusted Coefficient of Determination

Adjusted Coefficient of Determination

$R^2$ denotes the multiple coefficient of determination, which is a measure of how well the multiple regression equation fits the sample data. A perfect fit would result in a value of $R^2 = 1$, and a very good fit results in an $R^2$ value near 1. A poor fit results in $R^2$ values near 0.
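The slide does not give a formula for $R^2$; a standard one for a least-squares fit that includes an intercept is $R^2 = 1 - \text{SSE}/\text{SST}$, the share of the total variation in y that the equation explains. The sketch below assumes that definition and uses hypothetical values to show the two extremes described above.

```python
def r_squared(observed, predicted):
    """Coefficient of determination R^2 = 1 - SSE/SST.

    Assumes `predicted` comes from a least-squares fit that includes an intercept.
    """
    mean_y = sum(observed) / len(observed)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))  # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in observed)                        # total variation
    return 1 - sse / sst


observed = [3.0, 5.0, 7.0, 9.0]                    # hypothetical responses
print(r_squared(observed, observed))               # 1.0 (perfect fit)
print(r_squared(observed, [6.0, 6.0, 6.0, 6.0]))   # 0.0 (no better than the mean)
```

Predicting every value perfectly leaves no unexplained variation, while predicting the mean for every observation explains none of it.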


Finding Multiple Regression Equation

Adjusted Coefficient of Determination

Definition & Equation

Definition
The adjusted coefficient of determination is the multiple coefficient of determination $R^2$ modified to account for the number of variables and the sample size. It is calculated by:

$$ R^2_{\text{adj}} = 1 - \frac{(n-1)(1-R^2)}{n-(k+1)} \tag{5} $$

where n is the sample size and k is the number of predictor variables.
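Formula (5) translates directly into code. The sketch below is illustrative (the function name is hypothetical); it keeps $R^2$ and an illustrative sample size n = 20 fixed and varies the number of predictors k to show the penalty that adjusted $R^2$ applies for extra variables.

```python
def adjusted_r_squared(r2, n, k):
    """Formula (5): R^2(adj) = 1 - (n - 1)(1 - R^2) / [n - (k + 1)].

    r2: multiple coefficient of determination as a proportion (0.675, not 67.5)
    n:  sample size
    k:  number of predictor variables
    """
    if n <= k + 1:
        raise ValueError("need more observations than coefficients (n > k + 1)")
    return 1 - (n - 1) * (1 - r2) / (n - (k + 1))


# Same R^2 and sample size; more predictors give a lower adjusted R^2
for k in (1, 2, 5):
    print(k, round(adjusted_r_squared(0.675, n=20, k=k), 3))
# 1 0.657
# 2 0.637
# 5 0.559
```

Because the penalty grows with k, adjusted $R^2$ rewards an additional predictor only if it raises $R^2$ enough to offset the lost degree of freedom.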


Finding Multiple Regression Equation

Adjusted Coefficient of Determination

Guidelines for Finding the Best Multiple Regression Equation

Use common sense and practical considerations to include or exclude variables.
Consider the P-value: select an equation having overall significance, as determined by the P-value found using technology.
Consider equations with high values of adjusted $R^2$, and try to include only a few variables.
This list illustrates the common sense and critical thinking required to be an effective statistician.
