Chapter 9:
LINEAR REGRESSION
CHAPTER REVIEW
9.1 Preamble
9.2 What is linear regression?
9.3 Linear regression for prediction
9.4 Linear regression analysis
9.5 Calculating relationship between two variables
9.6 Predicting English performance
9.1 PREAMBLE
This chapter discusses linear regression, which is used to analyse statistically the relationship
between an independent and a dependent variable. It is also used to predict the value of one
variable from what is known about another variable.
There are two key words: ‘linear’ and ‘regression’. When you think of ‘regression’ you are
thinking of ‘prediction’. A regression models the past relationship between variables to
predict their future behaviour. You are predicting the value of variable Y using the value of
variable X. To be able to predict the future value of Y, you need to have data showing the
relationship of variable X and variable Y. For example, to predict future performance in
university using an aptitude test, you need to have a set of data showing the relationship
between ‘Scores on the Aptitude Test’ and university ‘GPA’.
Figure 9.1 Graph showing a linear relationship between the time spent and the
acquisition of a skill
The other word is ‘linear’, which means that the relationship between the two variables should be
a ‘straight-line relationship’ or linear relationship. Figure 9.1 shows a ‘linear’
relationship between ‘time spent’ and ‘skill acquisition’. In other words, the more time a
person spends practicing a skill, the more competent the person becomes in that skill.
The opposite is a ‘non-linear relationship’ where the relationship between the two variables
is represented by a curved line.
Figure 9.2 Graph showing a non-linear relationship between performance and anxiety
Figure 9.2 shows a non-linear relationship between mathematics performance
and anxiety levels, where the line is curved. Very low or non-existent anxiety levels are
associated with low test scores. As anxiety levels go up, so do the test
scores (a little stress may increase concentration), until a point is reached beyond which
very high anxiety levels are again associated with low test scores (as very
high stress levels may impair concentration).
The main purpose of linear regression analysis is to assess associations between dependent
and independent variables. The most basic type of regression is that of simple linear
regression. A simple linear regression uses only one independent variable, and it describes
the relationship between the independent variable (for example, hours spent practicing mathematics) and
the dependent variable (for example, score in a mathematics test) as a straight line.
When two or more independent variables are used, it is called a multiple regression; for
example, using ‘attitude towards English’ and ‘reading books in English’ as predictors of
‘scores in English’. This chapter will focus on simple linear regression.
A simple linear regression models the past relationship between variables to predict their
future behaviour. For example, you collect data on the relationship between ‘attitude towards
English’ and ‘scores obtained in an English test’. Based on this data, you can use the
score a person obtains on an attitude towards English test to predict his or her score on an English test.
Businesses use regression to predict such things as future sales, stock prices, currency
exchange rates, and productivity gains resulting from a training program.
Let’s say you have collected data from 10 subjects on their attitudes towards English and
their English test scores, and the results are shown in the table below:
Attitude towards English    English test score
34                          80
37                          87
39                          89
29                          70
28                          69
30                          72
33                          79
37                          85
32                          81
32                          79
First, you want to test the relationship between Attitudes towards English
and Performance in English. You are expecting a positive relationship between
Attitude and English score. In other words, as Attitude increases, you expect English
score to also increase. How do you establish this to be true?
Second, you want to go further than just stating the relationship between Attitudes
towards English and English scores. You want to know whether you can PREDICT
values of one variable if you know or can estimate the other variable. In other words,
can you predict students’ performance in English based on what you know about their Attitudes
towards English?
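One common way to check the first point is to compute a correlation coefficient for the two columns of data. The following is a minimal sketch, not the chapter's SPSS procedure; it assumes the first column of the data table is the Attitude score and the second is the English score, and the function name `pearson_r` is introduced here only for illustration.

```python
from math import sqrt

# The 10 subjects from the data table
# (assumed column order: attitude first, English score second).
attitude = [34, 37, 39, 29, 28, 30, 33, 37, 32, 32]
english = [80, 87, 89, 70, 69, 72, 79, 85, 81, 79]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(attitude, english)
print(f"r = {r:.3f}")
```

A value of r close to +1 points to a strong positive linear relationship, a value close to -1 to a strong negative one, and a value near 0 to little or no linear relationship.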
We are saying that English performance depends on Attitude. In the language of Regression
Analysis, English performance is the dependent variable and Attitude towards English is the
independent variable. The distribution of the scores for the 10 students on the x axis
(attitude) and y axis (English performance) is shown in the graph below.
[Scatterplot: attitude (x axis) against English score (y axis). Annotation on the plot:
‘These are students with high attitude scores but doing poorly on the performance test.’]
What do you observe about the graph or scatterplot above?
You make English performance the dependent variable (y) and you ‘enter’ the independent
variable Attitude into the regression equation.
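What a statistics package does at this step can be sketched with the ordinary least-squares formulas. This is a minimal illustration under stated assumptions: it uses the ten rows of the data table given earlier (attitude assumed to be the first column), the helper name `fit_line` is hypothetical, and the figures it produces are for those ten rows only; they are not claimed to match the SPSS tables reproduced in this chapter.

```python
# The 10 subjects from the data table
# (assumed column order: attitude first, English score second).
attitude = [34, 37, 39, 29, 28, 30, 33, 37, 32, 32]
english = [80, 87, 89, 70, 69, 72, 79, 85, 81, 79]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                      # change in y per unit change in x
    intercept = mean_y - slope * mean_x    # the line passes through the two means
    return intercept, slope


intercept, slope = fit_line(attitude, english)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

The independent variable is passed first and the dependent variable second, mirroring the idea of ‘entering’ Attitude as the predictor of English performance.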
Using SPSS you will have several Tables showing the following:
TABLE 1 shows that ATTITUDE has been ‘entered’ as the independent variable and
ENGLISHSCORE as the dependent variable.
Table 1: Variables Entered/Removed (b)

Model   Variables Entered   Variables Removed   Method
1       ATTITUDE (a)        .                   Enter

a. All requested variables entered.
b. Dependent Variable: ENGLISHSCORE
TABLE 2 shows the Model Summary, which reports R square, a statistic that measures “goodness
of fit”, i.e. how well the regression line (the line that fits the points in the graph better
than any other) describes the data.
Table 2:
Model Summary
Table 3: ANOVA (b)

Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   476.202          1    476.202       27.076   .001 (a)
  Residual     140.698          8    17.587
  Total        616.900          9

a. Predictors: (Constant), ATTITUDE
b. Dependent Variable: ENGLISHSCORE
When you do a regression analysis, you also want to know whether the linear relationship
between Attitude (x) and English performance (y) is statistically significant, i.e. whether
there is any significant linear relationship between x and y. That is the reason for the
ANOVA table.
The F statistic shown in Table 3 is 27.076, which is significant at p<.05 (Sig. = .001), and so we reject
the null hypothesis that there is no linear relationship. In other words, there is a linear
relationship between English performance and Attitude, and the relationship is significant at p<.05.
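The arithmetic behind the ANOVA table can be retraced from its own entries. The sketch below is illustrative only: the sums of squares and degrees of freedom are copied from Table 3, the variable names are chosen here for readability, and the R square value is derived from Table 3 as an assumption, since the Model Summary figures are not reproduced in this text.

```python
# Values copied from the ANOVA table (Table 3)
ss_regression = 476.202
ss_residual = 140.698
df_regression = 1
df_residual = 8

# Mean square = sum of squares divided by its degrees of freedom
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the regression mean square to the residual mean square
f_statistic = ms_regression / ms_residual

# R square is the share of the total sum of squares accounted for by the regression
ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total

print(f"Mean square (residual) = {ms_residual:.3f}")
print(f"F = {f_statistic:.3f}")
print(f"R square = {r_squared:.3f}")
```

Small differences in the last decimal place from the printed table are to be expected, because the table entries are themselves rounded.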
a) What does the R square tell you?
b) Why is the ANOVA calculated?
Next you want to predict English performance from the score obtained on an Attitude Test. In
other words, you get students to take the Attitude Towards English Test and, based on the
score obtained, you want to predict their performance in English.
[Don’t panic seeing these mathematical symbols! We will analyse step-by-step what they
mean].
[Figure: illustration of a slope of 0.5]
Using the ‘Regression Equation’ you can predict the English score (Y) based on the Attitude
Towards English (X) score.
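For reference, a simple linear regression equation is conventionally written with an intercept and a slope. The notation below is an assumption based on that standard form, with β1 standing for the slope as in this chapter's discussion of slope and β0 for the intercept:

```latex
\[
  Y = \beta_0 + \beta_1 X
\]
% Y       : predicted English score (dependent variable)
% X       : Attitude Towards English score (independent variable)
% \beta_0 : intercept, the predicted value of Y when X = 0
% \beta_1 : slope, the change in predicted Y for a one-unit increase in X
```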
If Attitudes (x) and English performance (y) have a positive relationship, then the Slope
(β1) will be a positive number. Lines with positive slopes go from the bottom left toward
the upper right.
[Graph: line with a positive slope; y axis: English score]
If Attitudes (x) and English performance (y) have a negative relationship, then the Slope
(β1) will be a negative number. Lines with negative slopes go from the upper left to the
lower right. The following graph has a slope of -1: an increase of 1 on the X axis is
associated with a decrease of 1 on the Y axis.
[Graph: line with a slope of -1; x axis: Attitude, y axis: English score]
If Attitudes (x) and English scores (y) have NO relationship, then the Slope (β1) will
be ZERO.
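The three cases above can be summarised by reading the slope as the change in Y per unit change in X. In the short sketch below, (X1, Y1) and (X2, Y2) are any two points on the line; these symbols and the numbers in the example are introduced here purely for illustration.

```latex
\[
  \beta_1 = \frac{Y_2 - Y_1}{X_2 - X_1}
\]
% Illustrative example for a slope of -1:
% moving from X_1 = 1 to X_2 = 2 while Y falls from Y_1 = 5 to Y_2 = 4 gives
\[
  \beta_1 = \frac{4 - 5}{2 - 1} = -1
\]
```

A positive numerator (Y rising as X rises) gives a positive slope, a negative numerator gives a negative slope, and no change in Y gives a slope of zero.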
Using the Regression Equation to Predict
From the SPSS output, you get what is called the ‘Coefficients’ table (see Table 4).
In a regression equation, the slope and the intercept are referred to as the “Coefficients”
in the model (see Table 4).
Both coefficients for the equation are found in the column labeled “B” (see Table 4), where
the intercept (β0) is identified as the ‘Constant’, and the slope (β1) is identified as
‘Attitude’.
To predict, you will apply the regression equation with information about the ‘intercept’
and ‘slope’ obtained from the ‘Coefficients Table’, i.e. Y = β0 + β1X
o Intercept β0 = 0.70
o Slope β1 = 1.50
o X = 30
o Y = ? (Find ‘Y’)
To predict ENGLISH performance (Y):
Predicted English Performance Score (Y) = Intercept (β0) + Slope (β1) × Attitude score (X)
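Working through the values listed above (taking 1.50 as the slope, 0.70 as the intercept and 30 as the Attitude score, as stated), the substitution can be sketched as follows:

```latex
\begin{align*}
  Y &= \text{intercept} + \text{slope} \times X \\
    &= 0.70 + 1.50 \times 30 \\
    &= 0.70 + 45.00 \\
    &= 45.70
\end{align*}
```

So, under these coefficients, a student with an Attitude score of 30 has a predicted English performance score of 45.70.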
Examples:
2. Suppose it is possible to predict a person's score on Test B from the person's score
on Test A. The regression equation is: Y = 2.3 X + 9.5. What is a person's predicted
score on Test B assuming this person got a 40 on Test A?
Answer = 101.5
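The same substitution can be checked with a few lines of code. This is a minimal sketch; the function name `predict` and its parameter names are chosen here for illustration and do not come from SPSS or from the chapter.

```python
def predict(x, slope, intercept):
    """Predicted Y from the regression equation Y = slope * X + intercept."""
    return slope * x + intercept


# Example 2: regression equation Y = 2.3 X + 9.5, with a Test A score of 40
test_b = predict(40, slope=2.3, intercept=9.5)
print(round(test_b, 1))
```

Rounding to one decimal place guards against tiny floating-point differences; the result agrees with the answer of 101.5 given above.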