CORRECTION GUIDE
Section 1
1.2 Practise
1. a) Knowing that the range = max - min, the range = 1.85 - 0.42 = 1.43 m
b) By entering the values in the graphics calculator, you get Q3 = 1.415 and
Q1 = 1.165.
Knowing that the semi-interquartile range = $\frac{Q_3 - Q_1}{2}$, we obtain
semi-interquartile range = $\frac{1.415 - 1.165}{2} = 0.125$ m
c) Figure 1.1
xi         1.85  0.95  1.04  1.15  0.80  1.18  1.32  1.45  1.24  1.03  1.28
|xi − x̄|   .585  .315  .225  .115  .465  .085  .055  .185  .025  .235  .015

xi         1.75  1.42  1.53  1.22  1.24  1.27  1.18  1.53  1.29  1.41  0.99
|xi − x̄|   .485  .155  .265  .045  .025  .005  .085  .265  .025  .145  .275

xi         1.33  1.21  1.28  1.52  1.65  0.42  1.80  1.10  1.25  1.35  1.42
|xi − x̄|   .065  .055  .015  .255  .385  .845  .535  .165  .015  .085  .155

                                                                              Sum    Average
xi         1.26  1.32  1.18  1.32  1.22  1.05  1.23  1.42  0.75  1.15  1.32   55.67  1.265
|xi − x̄|   .005  .055  .085  .055  .045  .215  .035  .155  .515  .115  .055   7.990  .182 (mean deviation)
N.B. To complete the table more rapidly, you can use the graphics calculator.
Enter the list in L1 (STAT 1).
Move right to list L2, place the cursor on the L2 heading at the very top, and
enter the following formula: ABS( L1 − MEAN( L1 )) ENTER
(MEAN is found under LIST MATH.)
e) $Z \approx \frac{1.80 - 1.2652}{0.2601} \approx 2.0561$
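The measures in parts a), c) and e) can be checked with a short script. This is only a sketch: the list `values` is typed in from Figure 1.1, and the use of `pstdev` (division by n) is an assumption, chosen because it reproduces the 0.2601 used above.

```python
from statistics import mean, pstdev

# The 44 values of Figure 1.1, in metres.
values = [
    1.85, 0.95, 1.04, 1.15, 0.80, 1.18, 1.32, 1.45, 1.24, 1.03, 1.28,
    1.75, 1.42, 1.53, 1.22, 1.24, 1.27, 1.18, 1.53, 1.29, 1.41, 0.99,
    1.33, 1.21, 1.28, 1.52, 1.65, 0.42, 1.80, 1.10, 1.25, 1.35, 1.42,
    1.26, 1.32, 1.18, 1.32, 1.22, 1.05, 1.23, 1.42, 0.75, 1.15, 1.32,
]

value_range = max(values) - min(values)          # range = max - min
x_bar = mean(values)                             # mean of the list
# Mean deviation: average of the absolute deviations from the mean.
mean_deviation = mean([abs(x - x_bar) for x in values])
sigma = pstdev(values)                           # assumption: divide by n
z_180 = (1.80 - x_bar) / sigma                   # Z score of the value 1.80

print(f"range          = {value_range:.2f}")
print(f"mean           = {x_bar:.4f}")
print(f"mean deviation = {mean_deviation:.3f}")
print(f"std deviation  = {sigma:.4f}")
print(f"Z(1.80)        = {z_180:.4f}")
```

The absolute-deviation line does the same job as the ABS( L1 − MEAN( L1 )) formula entered on the calculator.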
Interpretation:
Population: Canada is slightly below the average of the other countries studied
(Z ≈ -0.134).
Pop. Density: Canada’s population density is clearly below the average (117
inhabitants per km²). The explanation is simple: Canada has a small
population in a very large total area (Z ≈ -1.014).
Birth Rate: The Canadian birth rate is very close to the average of the countries
studied (Z ≈ 0.064). Note that this is relatively low (13.5/1000 = 1.35%). Certain
specialists say that a birth rate of 2% is ideal for maintaining population
levels.
Death Rate: Canada is well below the average of these countries in death rate
(Z ≈ -1.3). One could suggest that this is due to the relatively good quality and
wide availability of health services.
1. Figure 1.3
Range
  Formula: max − min
  Method: Mental calculation
  Interpretation: Maximum variation
  Limitations: Mediocre; of no use when there are a few extreme values
Semi-interquartile Range
  Formula: $\frac{Q_3 - Q_1}{2}$
  Method: Box and whisker plot on graphics calculator
  Interpretation: Spread of the central 50% of the data
  Limitations: Limited value as half the data is not used
Mean Variation
  Formula: $\frac{\sum |x_i - \bar{X}|}{n}$
  Method: Table of values in which the variation from the mean column is added
  Interpretation: Average of the variations from the mean
  Limitations: Long process and not useful when there are extreme values
Standard Deviation
  Formula: $\sigma = \sqrt{\frac{\sum (x_i - \bar{X})^2}{n - 1}}$
  Method: Graphics calculator
  Interpretation: Ideal measure of dispersion
  Limitations: When n > 30, replace n − 1 by n in the formula. For a sample
  where n < 30, replace σ by S
2. Figure 1.4
Z score
  Formula: $Z = \frac{x_i - \bar{X}}{\sigma}$
  Method: Calculator
  Interpretation: Indicates position with respect to a set of values. Position
  indicated in terms of standard deviation
  Limitations: About 95% of values are situated between +/- 2 standard
  deviations on either side of the mean
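Figure 1.5
Xi     Xi − X̄    (Xi − X̄)²    Xi²
7      3         9            49
8      4         16           64
3      -1        1            9
1      -3        9            1
1      -3        9            1
4      0         0            16
24     0         44           140   (totals)

Formula 1: $s = \sqrt{\frac{\sum (x_i - \bar{X})^2}{n - 1}} \Rightarrow s = \sqrt{\frac{44}{5}} \approx 2.9664$

Formula 2: $s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}} \Rightarrow s = \sqrt{\frac{140 - \frac{24^2}{6}}{5}} \approx 2.9664$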
Although it is not clear from this example, the second formula is recommended
because it requires a less complicated table of values. With today’s calculators and
computers, this does not pose a problem, but it should be noted that the first formula
rapidly becomes much more complicated when the mean is not a whole number.
Furthermore, virtually all calculators have the ∑x² and ∑x buttons used in formula 2.
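As a quick check, the two formulas can be compared on the six values of Figure 1.5. This is a minimal sketch; the variable names are only illustrative.

```python
from math import sqrt

data = [7, 8, 3, 1, 1, 4]          # the Xi column of Figure 1.5
n = len(data)

# Formula 1: uses the deviations from the mean.
x_bar = sum(data) / n
s1 = sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

# Formula 2: uses only the two running totals, sum of x and sum of x squared.
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
s2 = sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(sum_x, sum_x2)               # totals of the Xi and Xi² columns
print(round(s1, 4), round(s2, 4))  # both formulas give the same value
```

Formula 2 never needs the deviation column, which is why it stays simple even when the mean is not a whole number.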
Section 2
2.2 Practise
1. Figure 2.1
Linear correlation Other
Situation positive negative non-existent strong correlation
a x x
b x x
c x
d x
e x
f x
g x x
h x x
i x x
j x
k x
l x
ii) [Scattergram: hours watching TV (y-axis) against hours spent on sport (x-axis)]
N.B. The variables could be reversed.
vi) a) From the graph, using the points (0, 23) and (32, 0):
$a = \frac{0 - 23}{32 - 0} = \frac{-23}{32} = -0.71875$ and $b = 23$
The equation is y = -0.71875x + 23
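The same two-point calculation can be written as a small helper. This is a sketch only; `line_through` is a hypothetical name introduced for illustration, not a calculator function.

```python
def line_through(p, q):
    """Return (a, b) such that y = a*x + b passes through points p and q."""
    (x1, y1), (x2, y2) = p, q
    a = (y2 - y1) / (x2 - x1)   # slope: change in y over change in x
    b = y1 - a * x1             # intercept: solve y1 = a*x1 + b for b
    return a, b

# Points read from the graph: (0, 23) and (32, 0).
a, b = line_through((0, 23), (32, 0))
print(f"y = {a}x + {b}")
```

The result depends on which two points are read from the graph, so a hand-drawn line will usually differ a little from the calculator's regression line.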
ii) [Scattergram: fuel consumption (L/100 km) on the y-axis]
iv) a) From the scattergram: $r \approx 1 - \frac{0.35}{2.2} \approx 0.84$
b) By the calculator: r ≈ 0.98
vi) a) From the graph: Using the points (1700, 8.20) and (3300, 14.5) on
the line:
vii) If your car has a 4 L or 4000 cm³ engine (1 L = 1000 cm³), you can
expect to use about 16.88 litres of gasoline per 100 km driven.
ii) [Scattergram: number of bounces (y-axis) against height in concrete blocks (x-axis)]
iv) a) From the scattergram: $r \approx 1 - \frac{1.2}{4.4} \approx 0.72$
b) By the graphics calculator: r ≈ 0.92
v) Is the calculator fooling us? If the ellipse reflects the form of the
scattergram, this method seems to be in error. In reality, the
relation is probably not linear but closer to a square root
relation.
ii) [Scattergram: number of candles burned (y-axis) against days without electricity (x-axis)]
iv) a) From the scattergram: $r \approx 1 - \frac{2.4}{3.5} \approx 0.31$
b) By the calculator: r ≈ 0.3085
ii) [Scattergram: price in $ (y-axis) against age in years (x-axis)]
iv) a) From the scattergram: $r \approx -\left(1 - \frac{1.2}{4.3}\right) \approx -0.72$
vi) a) From the graph: (1, 15 000) and (15, 0) are points on the linear
regression line. So:
$a = \frac{0 - 15\,000}{15 - 1} = \frac{-15\,000}{14} = -\$1\,071.43$ per year
Putting (1, 15 000) and a = −1 071.43 in y = ax + b
⇒ 15 000 = −1 071.43(1) + b
⇒ b = $16 071.43
3. $r = \pm\left(1 - \frac{l}{L}\right)$
4. 1. d 2. a 3. f 4. e 5. b 6. c
6. No, other forms of non-linear relations are possible, for example a parabolic
relation or an exponential relation.
1. a) Figure 2.2
x          y          x²          y²          xy
1          3          1           9           3
11         18         121         324         198
3          6          9           36          18
9          15         81          225         135
5          9          25          81          45
7          12         49          144         84
∑x = 36    ∑y = 63    ∑x² = 286   ∑y² = 819   ∑xy = 483
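b) $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$
$= \frac{6 \times 483 - 36 \times 63}{\sqrt{(6 \times 286 - 36^2)(6 \times 819 - 63^2)}}$
$= \frac{2\,898 - 2\,268}{\sqrt{(1\,716 - 1\,296)(4\,914 - 3\,969)}}$
$= \frac{630}{\sqrt{420 \times 945}}$
$= \frac{630}{630}$
$= 1$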
c) A perfect (maximal) positive correlation. The points are located exactly
on, and not dispersed around, a straight line.
e) r=1
f) Certainly!
2. a) Figure 2.3
x          y          x²           y²           xy
12         16         144          256          192
16         20         256          400          320
19         25         361          625          475
10         12         100          144          120
13         9          169          81           117
16         30         256          900          480
12         9          144          81           108
12         8          144          64           96
∑x = 110   ∑y = 129   ∑x² = 1 574  ∑y² = 2 551  ∑xy = 1 908
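b) $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$
$= \frac{15\,264 - 14\,190}{\sqrt{(12\,592 - 12\,100)(20\,408 - 16\,641)}}$
$= \frac{1\,074}{\sqrt{492 \times 3\,767}}$
$= \frac{1\,074}{1\,361.4}$
$= 0.79$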
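The calculation of r from the five column totals can be sketched in a few lines. `pearson_r` is a hypothetical helper name; the two data sets are the x and y columns of Figures 2.2 and 2.3.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient computed from the sums used in the tables."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Figure 2.2: the points lie exactly on a straight line.
r_fig_2_2 = pearson_r([1, 11, 3, 9, 5, 7], [3, 18, 6, 15, 9, 12])

# Figure 2.3: the points are scattered around a line.
r_fig_2_3 = pearson_r([12, 16, 19, 10, 13, 16, 12, 12],
                      [16, 20, 25, 12, 9, 30, 9, 8])

print(round(r_fig_2_2, 2), round(r_fig_2_3, 2))
```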
Section 3
Situation 1
Figure 3.1
                  Active pop.  Rate of   Employed     Unemployed   Unemployment  Time zone  Total pop.   Ratio (employed/
                  (thousands)  activity  (thousands)  (thousands)  rate (%)      (GMT)      (thousands)  total pop.) (%)
                               (%)
Newfoundland      237.8        53.1      196.0        41.9         17.6          4.5        447.8        43.8
P.E.I.            70.1         65.5      59.9         10.3         14.7          5          107.0        56.0
Nova Scotia       453.5        60.9      402.6        50.9         11.2          5          744.7        54.1
New Brunswick     368.1        60.9      320.4        47.7         13.0          5          604.4        53.0
Quebec            3669.7       61.7      3256.1       413.6        11.3          6          5947.6       54.7
Ontario           6011.1       66.4      5533.0       478.1        8.0           6          9052.9       61.1
Manitoba          575.2        66.8      542.4        32.8         5.7           7          861.1        63.0
Saskatchewan      510.8        67.0      482.4        28.4         5.6           7          762.4        63.3
Alberta           1589.1       72.5      1503.6       85.4         5.4           8          2191.9       68.6
British Columbia  2004.9       64.0      1818.2       186.6        9.3           9          3132.7       58.0
                  x            y         z            a = x - z    a/x*100                  b = x/y*100  z/b*100
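The calculated columns can be reproduced for any province from the first three. The sketch below assumes that the unemployed are the active population minus the employed (the only reading that matches the figures in the table); `derived_columns` is a hypothetical helper name.

```python
def derived_columns(x, y, z):
    """x = active pop. (thousands), y = rate of activity (%), z = employed (thousands)."""
    a = x - z                       # unemployed (thousands)
    unemployment_rate = a / x * 100 # unemployed as a % of the active population
    b = x / y * 100                 # total population (thousands)
    ratio = z / b * 100             # employed as a % of the total population
    return a, unemployment_rate, b, ratio

# Quebec row of Figure 3.1.
a, rate, b, ratio = derived_columns(3669.7, 61.7, 3256.1)
print(f"unemployed = {a:.1f}, rate = {rate:.1f}%, "
      f"total pop. = {b:.1f}, ratio = {ratio:.1f}%")
```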
You will notice that in Quebec only about 54% of the population has a job. You must
bear in mind that this includes children and retired people. You cannot conclude too
quickly, however, that half of the population takes care of the needs of the other half.
Retired people have, for the most part, significant pension income for which they have
already paid. They do not, therefore, live at the expense of others. Perhaps the word
“unemployed” itself has been devalued and has taken on a negative connotation
in our society.
b) The best way to consider the position of a data point with respect to a
distribution is to find its Z-score. To do this, you need the mean and the standard
deviation. The Z-score for Quebec is:
$Z = \frac{11.3 - 10.18}{4.152054} \approx 0.27$
This indicates that Quebec is situated in the middle of the list of Canadian
provinces with regard to the unemployment rate, even if its rate is a little above
the average of 10.2% for the Canadian provinces (10.18% before rounding).
From this table, it is possible to draw a scattergram, but the graphics calculator is
probably a more appropriate tool. By entering the two lists, you should obtain:
Figure 3.3
[Scattergram: unemployment rate (%) (y-axis) against time zone (x-axis)]
This suggests that the time zone influences the unemployment rate fairly
strongly. Thus, in Alaska (zone = 10 hours from GMT, Greenwich Mean Time), you
could predict that the unemployment rate will be y = -2.0764(10) + 23.1577 =
2.3937%, and that in France, it will be y = -2.0764(0) + 23.1577 = 23.1577%.
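The regression line used for these predictions can be recomputed from the time zone and unemployment rate columns of Figure 3.1. This is a sketch of an ordinary least-squares fit; `least_squares` is a hypothetical helper, not the calculator's own routine.

```python
time_zone = [4.5, 5, 5, 5, 6, 6, 7, 7, 8, 9]
unemployment = [17.6, 14.7, 11.2, 13.0, 11.3, 8.0, 5.7, 5.6, 5.4, 9.3]

def least_squares(xs, ys):
    """Return slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a = s_xy / s_xx
    b = mean_y - a * mean_x
    return a, b

a, b = least_squares(time_zone, unemployment)
alaska = a * 10 + b    # extrapolation well outside the observed zones
france = a * 0 + b

print(f"y = {a:.4f}x + {b:.4f}")
print(f"Alaska: {alaska:.4f}%   France: {france:.4f}%")
```

Both predictions lie far outside the range of the data (zones 4.5 to 9), which is worth keeping in mind when judging how much weight they deserve.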
Situation 2
Figure 3.4
[Scattergram: maximum average temperature (°C) on the y-axis]
You can see that one point appears to be out of line with the others. This point
represents the city of Bogota. This result can be explained: latitude is not the
only factor that influences temperature. Altitude is also particularly important,
and Bogota is located very high above sea level.
d) The equation for the line of regression is y = -0.4325x + 35.6, so the town of St.
Mêton would be expected to have a maximum average daily temperature of
about y = -0.4325(49) + 35.6 ≈ 14.4°C. This is, of course, highly unlikely since
St. Mêton is a town in Quebec!
Situation 3
a) Range = max - min ⇒ range = 126.3 - 56.9 = 69.4 (that is to say, the
values of the houses in this area differ by as much as $69 400. This is very large,
and is at the root of the problem).
Deviation from the mean 0.66 8.94 5.74 31.94 22.24 6.34 37.46 9.54
Note that the final column gives the mean evaluation and the mean deviation
from the mean. These two figures indicate that an average house is evaluated
at $88 400 and that the mean deviation is $16 650.
Standard deviation: From the calculator, the standard deviation = $21 095.
With these different measures of dispersion, it is clear that house values vary
considerably in Jean’s neighbourhood. This has a direct effect on house prices.
b) Since this is the only sample that Jean has, she has no choice but to use it.
Furthermore, she cannot disregard a sample just because it shows a very wide
dispersion of values.
c) Evidently, Jean’s house is right at the top of the list of evaluations of houses in
her neighbourhood. In spite of this, she wants to raise the evaluation.
d) Yes, the correlation coefficient (r ≈ 0.81) indicates that the size of a house
strongly influences its evaluation, and hence the tax imposed. The graphics
calculator gives the following rule:
y = 0.4196x + 18.22, where x and y represent the surface area and the
evaluation respectively.
e) The cost of reconstruction is not the only factor in determining evaluation levels
in a municipality. Another factor is the value of other properties in the
neighbourhood. The relatively low evaluation of Jean’s house is explained by the
fact that she has built a veritable palace compared to the other houses in the
neighbourhood. This is an error which has enormous consequences for sale
prices.
f) Because with a lower municipal evaluation, she will pay less municipal and
education taxes, since these are determined according to the municipal
evaluation.
Situation 4
Figure 3.6
Range     Mean     Standard Dev.
Situation 5
Draw the ellipse for each situation and find the measure of the major and minor axes.
First graph: $r \approx 1 - \frac{0.3}{2.5} \approx 0.88$
Second graph: $r \approx 1 - \frac{2.3}{3} \approx 0.23$
In the first situation, the coefficient (0.88) indicates a very strong relation. By
contrast, in the second graph, where some values that do not follow the same trend
have been added, the points no longer resemble a straight line at all.
Conclusion: We must be careful about the size of a sample, because if it is too limited,
one may conclude too rapidly that there is a correlation. In this case, things start to
go wrong from the 11th data point onward.
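The ellipse estimate used in this guide, r ≈ ±(1 − minor axis / major axis), takes only a line of arithmetic. The sketch below uses a hypothetical helper name, `ellipse_r`; the sign is supplied by the reader according to whether the cloud of points rises or falls.

```python
def ellipse_r(minor_axis, major_axis, sign=1):
    """Estimate r from the axes of the ellipse drawn around the scattergram.

    sign is +1 for a rising cloud of points and -1 for a falling one.
    """
    return sign * (1 - minor_axis / major_axis)

first_graph = ellipse_r(0.3, 2.5)    # narrow ellipse: strong relation
second_graph = ellipse_r(2.3, 3)     # nearly round ellipse: weak relation

print(round(first_graph, 2), round(second_graph, 2))
```

The estimate is only as good as the measurements of the two axes, so it should be read as a rough check on the calculator value rather than a replacement for it.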
Situation 6
A good way is to use the Z score which gives a comparison with respect to the mean
while also allowing for the dispersion in the data. Here are the tables of values for the
three schools with the Z scores of each of the students added.
Figure 3.7
St. Jim’s School
Student  A1     A2     A3     A4     A5     A6     A7     A8     A9     A10    A11    A12    A13    A14    A15    A16    A17    Mean
Average  85     80     86     71     72     86     75     72     60     84     87     72     78     82     79     76     76     77.7
Z Score  1.02   0.32   1.16   -0.94  -0.80  1.16   -0.38  -0.80  -2.49  0.88   1.31   -0.80  0.04   0.60   0.18   -0.24  -0.24
Z Score  -1.75  0.50   1.06   -0.62  -0.40  1.62   1.39   0.16   -0.40  -1.41  0.27   -0.74  -0.40  0.72
A1 (85%), A3 (86%), A6 (86%), A11 (87%), B5 (80%), B20 (89%), B8 (90%), C3 (85%),
C7 (88%) and C6 (90%). You may notice that the lowest mark among the students
selected is 80% although A10 was not selected despite a mark of 84%. Why?
Situation 7
b) y = 0.1318x - 1.8289
The rate of change of the cholesterol level is 0.1318 mg/100 ml per year.
It is not necessary to determine the equation for cholesterol level and mass,
since the correlation coefficient indicates that there is no significant relation
between them.
c) No, you would need data for cholesterol level prior to treatment. One would
guess that young people begin with a lower level of blood fat than older people.