Describing Distributions
with Numbers
Chapter 2
Chapter 2 Overview
Three characteristics of a quantitative variable's distribution:
Shape: described visually via graphs (Chapter 1).
Center (typical value): numeric summary (Chapter 2).
Spread (dispersion): numeric summary (Chapter 2).
For samples:
We know only part of the entire data.
Descriptive measures of samples are called
statistics.
Statistics are often written using Roman letters (for example, $\bar{x}$).
BPS - 5th Ed.
Population mean (a parameter): $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, where N = Population Size
Sample mean (a statistic): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where n = Sample Size
Review: Summation Notation
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
$\sum_{i=1}^{4} i = 1 + 2 + 3 + 4 = 10$
$\sum_{i=1}^{4} i^2 = 1^2 + 2^2 + 3^2 + 4^2 = 1 + 4 + 9 + 16 = 30$
Is $\sum_{i=1}^{4} i^2$ the same as $\left(\sum_{i=1}^{4} i\right)^2$?
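The sums in this review can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the slides.

```python
# Summation notation written as Python sums over i = 1, 2, 3, 4.
indices = range(1, 5)

sum_of_i = sum(i for i in indices)           # 1 + 2 + 3 + 4
sum_of_squares = sum(i**2 for i in indices)  # 1 + 4 + 9 + 16
square_of_sum = sum_of_i**2                  # (1 + 2 + 3 + 4) squared

print(sum_of_i, sum_of_squares, square_of_sum)  # 10 30 100
```

Summing the squares and squaring the sum give different results, so the order of the two operations matters.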
Example: the mean of the data 1, 4, 5, 10 is
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1 + 4 + 5 + 10}{4} = 5.0$
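As a quick check of the sample mean formula, the same calculation can be sketched in Python. The names `data` and `x_bar` are illustrative choices.

```python
# Sample mean: add up the observations and divide by the sample size n.
data = [1, 4, 5, 10]
n = len(data)
x_bar = sum(data) / n

print(f"{x_bar:.1f}")  # 5.0
```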
Median
Example 1 data: 2 4 6
[ x1 = 2, x2 = 4, x3 = 6 ]
L(M) = (3+1)/2 =2
Median: M = 4.0
Example 2 data: 2 4 6 8
[ x1 = 2, x2 = 4, x3 = 6, x4 = 8 ]
L(M)= (4+1)/2 = 2.5
so we must average data values x2 and x3 to get the
median: M = (4 + 6)/2 = 5.0
Median
Example 3 data: 6 2 4
The Median is not equal to 2.0 because we did not
order our data. After ordering our data, it is just
like example 1.
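The three median examples follow one procedure: order the data, compute the location L(M) = (n+1)/2, then read off or average the values at that location. A minimal Python sketch of that procedure (the function name `median` is our own choice, not from the slides):

```python
def median(values):
    """Median found via the location rule L(M) = (n + 1) / 2."""
    ordered = sorted(values)      # the data must be ordered first
    n = len(ordered)
    location = (n + 1) / 2        # 1-based position of the median
    if location.is_integer():
        return float(ordered[int(location) - 1])
    # Location ends in .5: average the two neighbouring values.
    lower = ordered[int(location) - 1]
    upper = ordered[int(location)]
    return (lower + upper) / 2


print(median([2, 4, 6]))     # Example 1: 4.0
print(median([2, 4, 6, 8]))  # Example 2: 5.0
print(median([6, 2, 4]))     # Example 3: 4.0 once the data are ordered
```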
Resistant Statistic
What if one value is extremely different from the
others?
Example: What if we made a mistake and
6, 1, 2
was recorded as
6000, 1, 2 ?
The mean is now ( 6000 + 1 + 2 ) / 3 = 2001.0
The median is still 2.0
Conclusion: The median is resistant to extreme values
while the mean is not resistant.
When data have extreme values (i.e., unusually large or small
values) or are skewed, the median is preferred to the mean because it is
more representative of the "typical" value of the variable.
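The recording-mistake example can be reproduced with Python's standard `statistics` module. This is a sketch; the list names are illustrative.

```python
from statistics import mean, median

correct = [6, 1, 2]
mistyped = [6000, 1, 2]   # 6 recorded as 6000 by mistake

for label, data in (("correct", correct), ("mistyped", mistyped)):
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data):.1f}")
# correct: mean = 3.0, median = 2.0
# mistyped: mean = 2001.0, median = 2.0
```

One bad value moves the mean from 3.0 to 2001.0 while the median stays at 2.0.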
Mean
Center of gravity.
Useful for roughly symmetric quantitative data
with no extreme values.
Median
Splits the ordered data into two halves.
Useful for skewed data or data with extreme values.
Question
A recent newspaper article in California said
that the median price of single-family homes
sold in the past year in the local area was
$136,000 and the mean price was $149,160.
Which do you think is more useful to
someone considering the purchase of a
home, the median or the mean?
Spread, or Variability
If all values are the same, then they
all equal the mean. There is no
variability.
Variability exists when some values
are different from (above or below)
the mean.
The variance
The standard deviation
Range
The range of a variable is the
largest data value (called the maximum)
minus the smallest data value (called the
minimum).
Example 6: Compute the range of 6, 1, 2, 6,
11, 7, 3, 3
The largest value is 11.
The smallest value is 1.
Subtracting the two, 11 - 1 = 10, so the
range is 10.0, reported to one more decimal place
than the data (the General Rounding Rule for Reporting Statistics).
Range
Note: The range uses only two values in the
data set, the largest value and the
smallest value. So, the range is not
resistant.
Example 2: Suppose we made a mistake and 6, 1, 2
was recorded as 6000, 1, 2.
The range is now (6000 - 1) = 5999.0
instead of (6 - 1) = 5.0.
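The range calculations above can be sketched in Python. The helper name `data_range` is our own.

```python
def data_range(values):
    """Range = maximum minus minimum."""
    return max(values) - min(values)


print(data_range([6, 1, 2, 6, 11, 7, 3, 3]))  # Example 6: 10
print(data_range([6, 1, 2]))                  # correct data: 5
print(data_range([6000, 1, 2]))               # mistyped data: 5999
```

Because only the two extreme values enter the calculation, a single mistyped value changes the range completely.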
Quartiles
Three numbers which divide the
ordered data into four equal sized
groups.
Q1 has 25% of the data below it.
Q2 (the Median) has 50% of the data below it.
Q3 has 75% of the data below it.
Quartiles
[Figure: Uniform Distribution with Q1, Q2, and Q3 marked.]
Example 7
Even Data Set: 5 25 7 23 10 11 15 21 18 20
1. Put the data set in order and find the location of the
median:
5 7 10 11 15 18 20 21 23 25
L(M) = (10+1)/2 = 5.5, so the median (2nd
quartile) is the average of the 5th and 6th data
values.
2. M = (15 + 18)/2 = 16.5
3. There are 5 data points below the median and 5 above it.
4. The median of the first 5 data points is
the 1st quartile.
Example 7 (cont.)
Data Set: 5 7 10 11 15 18 20 21 23 25
To find the 1st Quartile, we must find
the median of the first half of the
data (data to the left of the Median).
The L(M) of the first 5 data points is
L(M) = (5+1)/2 = 3
so the 1st quartile is x3 = 10.0
Example 7 (cont.)
Data Set: 5 7 10 11 15 18 20 21 23 25
To find the 3rd Quartile, we must find
the median of the second half of the
data (data to the right of the
Median). The L(M) of the last 5 data
points is
L(M) = (5+1)/2 = 3
so the 3rd quartile is the 3rd data point in
that set: 21.0
Example 8
Odd Data Set: 5 25 7 23 10 11 15 21 18 20 27
1. Put the data set in order and find the location of the
median:
5 7 10 11 15 18 20 21 23 25 27
L(M) = (11+1)/2 = 6, so the median (2nd
quartile) is the 6th data value, x6 = 18
2. M = 18.0
3. There are 5 data points below the median and 5 above it.
4. The median of the first 5 data points is
the 1st quartile.
Example 8 (cont.)
Data Set: 5 7 10 11 15 18 20 21 23 25 27
To find the 1st Quartile, we must find
the median of the first half of the
data (data to the left of the Median).
The L(M) of the first 5 data points is
L(M) = (5+1)/2 = 3
so the 1st quartile is x3 = 10.0
Example 8 (cont.)
Data Set: 5 7 10 11 15 18 20 21 23 25 27
To find the 3rd Quartile, we must find
the median of the second half of the
data (data to the right of the
Median). The L(M) of the last 5 data
points is
L(M) = (5+1)/2 = 3
so the 3rd quartile is the 3rd data point in
that set: 23.0
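Examples 7 and 8 use the same recipe: order the data, find the median, then take the median of the values to its left (Q1) and of the values to its right (Q3). A minimal Python sketch of that recipe follows. The function names are ours, and the rule of leaving the median itself out of both halves when n is odd is taken from Example 8; other software may use different quartile conventions.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # data to the left of the median
    upper = ordered[half + n % 2:]   # data to the right; skips M when n is odd
    return median(lower), median(ordered), median(upper)


print(quartiles([5, 25, 7, 23, 10, 11, 15, 21, 18, 20]))      # Example 7
print(quartiles([5, 25, 7, 23, 10, 11, 15, 21, 18, 20, 27]))  # Example 8
# (10.0, 16.5, 21.0)
# (10.0, 18.0, 23.0)
```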
L(M)=(53+1)/2=27
L(Q1)=(26+1)/2=13.5
Weight Data: Quartiles
10 | 0166
11 | 009
12 | 0034578    <- first quartile
13 | 00359
14 | 08
15 | 00257
16 | 555        <- median or second quartile
17 | 000255
18 | 000055567  <- third quartile
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0
Interquartile Range (IQR)
Five-Number Summary
Five-Number Summary gives a
concise description of the
distribution of a variable:
Example 7 data:
minimum = 5.0
Q1 = 10.0
M = 16.5
Q3 = 21.0
maximum = 25.0
Interquartile Range (IQR) = Q3 - Q1 = 11.0
Example 8 data:
minimum = 5.0
Q1 = 10.0
M = 18.0
Q3 = 23.0
maximum = 27.0
Interquartile Range (IQR) = Q3 - Q1 = 13.0
Weight data:
minimum = 100
Q1 = 127.5
M = 165.0
Q3 = 185.0
maximum = 260
Interquartile Range (IQR) = Q3 - Q1 = 57.5
Boxplot
A boxplot is a graphical representation of
the five-number summary
Central box spans Q1 and Q3.
A line in the box marks the median M (Q2).
Lines extend from the box out to the
minimum and maximum. (These lines are
sometimes called whiskers)
[Figure: boxplot of the Weight data on an axis from 100 to 275, with Q1, M, Q3, and max labeled.]
Boxplot
WARNING: When using MINITAB, the
whiskers extend to the smallest and
largest values that lie within the
lower and upper fences, not necessarily
to the min and max of the data set.
Numbers greater than the upper fence or
smaller than the lower fence are called
outliers. Asterisks are used to indicate
any outliers.
We will talk about what an outlier is and a
way we can check our data for outliers.
Outliers
Outliers are extreme observations in the data. They
are values that are significantly too high or too
low, based on the spread of the data.
Outliers should be identified and investigated.
Outliers could be:
Chance occurrences
Measurement errors
Data entry errors
Sampling errors
Outliers (cont.)
Fence Rule for checking for outliers using the
quartiles:
Calculate lower and upper fences:
Lower fence = LF = Q1 - (1.5 × IQR)
Upper fence = UF = Q3 + (1.5 × IQR)
Any value below the lower fence or above the upper fence is an outlier.
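The fence rule can be sketched in Python as follows. The function names are our own, and the quartile values passed in are the ones quoted in the five-number summaries of this chapter. The last data set is hypothetical: it is the Example 8 data with the largest value 27 replaced by 60, so that the rule has something to flag.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Values below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


def whisker_ends(values, q1, q3):
    """Most extreme values inside the fences (the MINITAB-style whiskers)."""
    lower, upper = fences(q1, q3)
    inside = [v for v in values if lower <= v <= upper]
    return min(inside), max(inside)


print(fences(10.0, 21.0))    # Example 7: (-6.5, 37.5)
print(fences(10.0, 23.0))    # Example 8: (-9.5, 42.5)
print(fences(127.5, 185.0))  # Weight data: (41.25, 271.25)

# Hypothetical data: Example 8 with 27 mistyped as 60.
# Q1 = 10 and Q3 = 23 are unchanged because only the maximum moved.
mistyped = [5, 7, 10, 11, 15, 18, 20, 21, 23, 25, 60]
print(find_outliers(mistyped, 10.0, 23.0))  # [60]
print(whisker_ends(mistyped, 10.0, 23.0))   # (5, 25)
```

For the weight data the minimum (100) and maximum (260) both lie inside the fences, so this rule flags no outliers there.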
Variance
The variance is based on the deviation from the mean. (How far is each observation from the typical value?)
Deviations: $(x_i - \mu)$ for populations, $(x_i - \bar{x})$ for samples
Squared deviations: $(x_i - \mu)^2$ for populations, $(x_i - \bar{x})^2$ for samples
Population Variance
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2}{N}$
Sample Variance
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$
Why do we use different formulas for the population variance and the
sample variance? See page 51 in your book.
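To make the difference between the two formulas concrete, here is a small Python sketch. The function names are ours, and reusing the Example 1 data (2, 4, 6) first as a whole population and then as a sample is an assumption made purely for illustration.

```python
def population_variance(values):
    """Average squared deviation from the population mean (divide by N)."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 6]   # squared deviations from the mean 4 are 4, 0, 4
print(population_variance(data))  # 8 / 3, about 2.667
print(sample_variance(data))      # 8 / 2 = 4.0
```

The only difference between the two functions is the divisor, N versus n - 1.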
Standard Deviation
The standard deviation is the square root of the
variance.
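The population standard deviation
Is the square root of the population variance ($\sigma^2$).
Is represented by $\sigma$.
The sample standard deviation
Is the square root of the sample variance ($s^2$).
Is represented by $s$.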
Deviations
what is a typical deviation from the
mean? (standard deviation)
small values of this typical deviation
indicate small variability in the data
large values of this typical deviation
indicate large variability in the data
Variance and Standard Deviation
Example from Text
$\bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600$
Variance and Standard Deviation
Observations   Deviations             Squared deviations
$x_i$          $x_i - \bar{x}$        $(x_i - \bar{x})^2$
1792           1792 - 1600 = 192      (192)^2 = 36,864
1666           1666 - 1600 = 66       (66)^2 = 4,356
1362           1362 - 1600 = -238     (-238)^2 = 56,644
1614           1614 - 1600 = 14       (14)^2 = 196
1460           1460 - 1600 = -140     (-140)^2 = 19,600
1867           1867 - 1600 = 267      (267)^2 = 71,289
1439           1439 - 1600 = -161     (-161)^2 = 25,921
               sum = 0                sum = 214,870
$s^2 = \frac{214{,}870}{7 - 1} = 35{,}811.67$
$s = \sqrt{35{,}811.67} = 189.24$ calories
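The worked example can be reproduced step by step in Python. This is a sketch using the seven observations from the table; the variable names are illustrative.

```python
from math import sqrt

observations = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
n = len(observations)

x_bar = sum(observations) / n                     # sample mean
deviations = [x - x_bar for x in observations]    # x_i minus the mean
squared_deviations = [d ** 2 for d in deviations]

sum_squared = sum(squared_deviations)
s2 = sum_squared / (n - 1)                        # sample variance
s = sqrt(s2)                                      # sample standard deviation

print(f"mean = {x_bar:.0f}")                              # mean = 1600
print(f"sum of squared deviations = {sum_squared:,.0f}")  # 214,870
print(f"s^2 = {s2:,.2f}")                                 # s^2 = 35,811.67
print(f"s = {s:.2f} calories")                            # s = 189.24 calories
```

The deviations always sum to zero, which is why they are squared before being averaged.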
10
30
10
99
12
13
14
14
15
15
20
20
[Figure: distribution of Number of books on an axis from 0 to 100. Mean = 7.06, s.d. = 14.43]
Statistical Problem
Tornadoes
Example
The following data give the number of tornadoes in
Oklahoma, Kansas, and Nebraska for the years 1990 to
2004.
Year
1990
1991
1992
1993
1994
1995
1996
1997
1998
Oklahoma
Nebraska
30
88
73
63
64
74
64
69
40
55
79
26
47
60
55
30
83
65
Kansas
88
116
92
113
42
73
68
62
71
68
MINITAB Tornadoes
Example
C1, which contains the
[Figure: Number of Tornadoes (Per Year) by state (Oklahoma, Kansas, Nebraska), on an axis from 20 to 160.]