Presenting and Describing Data
Frequency Distribution
A table or graph describing the number of
observations in each category or class of a
data set.
Example:
Consider the number of bottles of soda sold
in a snack bar during lunch hour, on 40 days.
(The numbers have been arranged in increasing order.)
63 71 76 81 85
66 73 76 82 85
67 73 76 82 86
68 74 77 84 86
68 74 78 84 89
70 75 79 84 90
71 75 79 85 92
71 75 79 85 94
In order to get a better grasp of this
distribution of numbers, we’ll organize
them into categories or classes.
We’ll look at
absolute frequency,
relative frequency,
cumulative absolute frequency, &
cumulative relative frequency
Notation
[20, 30] denotes all real numbers between 20 & 30,
including the 20 & the 30.
(20, 30) denotes all real numbers between 20 & 30,
including neither the 20 nor the 30.
[20, 30) denotes all real numbers between 20 & 30,
including the 20 but not the 30.
(20, 30] denotes all real numbers between 20 & 30,
including the 30 but not the 20.
So the square bracket means include that endpoint &
the round parenthesis means do not include that
endpoint.
[Figure: histogram of absolute frequency (vertical axis, 0 to 12) against bottles of soda (horizontal axis, 60 to 95).]
Cumulative Absolute Frequency
class      abs. freq.   rel. freq.   cum. abs. freq.   cum. rel. freq.
[60, 65)        1         0.025             1              0.025
[65, 70)        4         0.100             5              0.125
[70, 75)        8         0.200            13              0.325
[75, 80)       11         0.275            24              0.600
[80, 85)        6         0.150            30              0.750
[85, 90)        7         0.175            37              0.925
[90, 95)        3         0.075            40              1.000
Total          40         1.000
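The four kinds of frequency can be tallied with a short Python sketch. The function name `frequency_table` and its parameters are illustrative choices, not part of the original slides; the classes are half-open intervals [lower, upper), as in the notation above.

```python
# The 40 daily soda counts from the example.
sales = [63, 66, 67, 68, 68, 70, 71, 71,
         71, 73, 73, 74, 74, 75, 75, 75,
         76, 76, 76, 77, 78, 79, 79, 79,
         81, 82, 82, 84, 84, 84, 85, 85,
         85, 85, 86, 86, 89, 90, 92, 94]


def frequency_table(data, start, stop, width):
    """Return rows (lower, upper, abs, rel, cum_abs, cum_rel) for classes [lower, upper)."""
    n = len(data)
    rows = []
    cum_abs = 0
    for lower in range(start, stop, width):
        upper = lower + width
        # Count observations with lower <= x < upper (left endpoint included).
        abs_freq = sum(1 for x in data if lower <= x < upper)
        cum_abs += abs_freq
        rows.append((lower, upper, abs_freq, abs_freq / n, cum_abs, cum_abs / n))
    return rows


table = frequency_table(sales, 60, 95, 5)
for lower, upper, f, rel, cum, cum_rel in table:
    print(f"[{lower}, {upper})  {f:3d}  {rel:.3f}  {cum:3d}  {cum_rel:.3f}")
```

Each cumulative column is a running total of the column before it, so the last row always ends at N and at 1.000.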
Cumulative Relative Frequency
[Figure: step plot of cumulative relative frequency (vertical axis, 0.00 to 1.00) against bottles of soda (horizontal axis, 60 to 95).]
Cumulative Relative Frequency Ogive
[Figure: ogive of cumulative relative frequency (vertical axis, 0.00 to 1.00) against bottles of soda (horizontal axis, 60 to 95). The ogive is the line connecting the points at the back of the steps.]
Next we will consider two types of
summary measures:
1. Measures of the center of the distribution
(also called measures of central tendency)
2. Measures of the spread of the distribution
Measures of the center of the distribution, also called central tendency, typical value, or average
Measures of the Center of the Distribution
Observations 2, 2, 3, 4, 8, 10, 13
Mean 6
Median 4
Mode 2
Example 2
Observations 2, 3, 4, 4, 4, 7
Mean 4
Median 4
Mode 4
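The two examples above can be checked with Python's standard `statistics` module. This is only a sketch; the helper name `center` is made up for illustration.

```python
import statistics


def center(data):
    """Return (mean, median, mode) for a list of observations."""
    return (statistics.mean(data),
            statistics.median(data),
            statistics.mode(data))


example_1 = [2, 2, 3, 4, 8, 10, 13]
example_2 = [2, 3, 4, 4, 4, 7]

print(center(example_1))
print(center(example_2))
```

In the first example the three measures disagree because the large values pull the mean upward; in the second they coincide.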
Example 4
Salary x_i   Freq. f_i    x_i f_i
    700           8         5,600
    800          23        18,400
    900          75        67,500
   1000          90        90,000
   1100          43        47,300
   1200          11        13,200
Total           250       242,000
Computing the Mean
for a Frequency Distribution of a Population
The population mean is the total of the x_i f_i column divided by the total frequency N, where c is the number of distinct salary values:
\[ \mu = \frac{\sum_{i=1}^{c} x_i f_i}{N} = \frac{242{,}000}{250} = 968 \]
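As a rough check of the salary example, the mean of a frequency distribution can be computed by weighting each value by its frequency. The function name `frequency_mean` is a hypothetical helper, not something from the slides.

```python
def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum of x_i * f_i divided by sum of f_i."""
    total = sum(x * f for x, f in zip(values, freqs))
    n = sum(freqs)
    return total / n


salaries = [700, 800, 900, 1000, 1100, 1200]
counts = [8, 23, 75, 90, 43, 11]

print(sum(counts))                                      # total frequency N
print(sum(x * f for x, f in zip(salaries, counts)))     # total of x_i * f_i
print(frequency_mean(salaries, counts))
```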
Interval    frequency f
[0, 15)         10
[15, 30)        10
[30, 45)         5
[45, 60)         5
[60, 75)         5
Total           35
To calculate the median of interval data, we
need to make an assumption: the observations in the interval containing the median are spread evenly across that interval.
\[ \text{median} = L_{md} + \frac{(N/2) - \sum f_p}{f_{md}} \times \text{width} \]
L_md is the lower limit on the category containing the median.
N is the population size.
Σf_p is the sum of the frequencies of the categories
preceding the category containing the median.
f_md is the frequency of the category containing the median.
width is the width of the interval containing the median.
Let’s go through the parts of the formula, keeping in mind that the median is in the second category, [15, 30).
L_md, the lower limit on the category containing the median, is 15.
N, the population size, is 35.
Σf_p, the sum of the frequencies of the categories preceding the category containing the median, is 10.
f_md, the frequency of the category containing the median, is 10.
width, the width of the interval containing the median, is 15.
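Now we just assemble the pieces.
\[ \text{median} = L_{md} + \frac{(N/2) - \sum f_p}{f_{md}} \times \text{width} = 15 + \frac{17.5 - 10}{10} \times 15 = 26.25 \]
What does this mean?
If the 10 observations in [15, 30) are spread evenly across that interval, then 7.5 of them lie below 26.25. Together with the 10 observations in [0, 15), that puts 17.5 observations, half of the 35, below the median.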
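The interpolation above can be sketched in Python. The function name `grouped_median` and the (lower, upper, frequency) tuple layout are assumptions made for this illustration; the code relies on the same even-spread assumption within the median class.

```python
def grouped_median(classes):
    """Median of interval data.

    classes: list of (lower, upper, freq) tuples sorted by lower limit,
    each describing a half-open interval [lower, upper).
    """
    n = sum(freq for _, _, freq in classes)
    half = n / 2
    cum_before = 0  # sum of frequencies of the preceding categories
    for lower, upper, freq in classes:
        if freq > 0 and cum_before + freq >= half:
            width = upper - lower
            return lower + (half - cum_before) / freq * width
        cum_before += freq
    raise ValueError("the distribution contains no observations")


intervals = [(0, 15, 10), (15, 30, 10), (30, 45, 5), (45, 60, 5), (60, 75, 5)]
print(grouped_median(intervals))
```

The loop finds the median class as the first one whose cumulative frequency reaches N/2, which is why the second category is selected here (10 is below 17.5, and 20 is not).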
Mean Absolute Deviation (MAD)
Observations x: 4, 8, 10, 13, 15
First we need the mean.
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{50}{5} = 10 \]
    x    x - μ   |x - μ|
    4     -6        6
    8     -2        2
   10      0        0
   13      3        3
   15      5        5
Sum 50              16
\[ \text{MAD} = \frac{\sum_{i=1}^{N} |x_i - \mu|}{N} = \frac{16}{5} = 3.2 \]
MSD (mean squared deviation), or population variance
\[ \sigma^2 = \text{MSD} = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
Recall μ = 10.
    x    x - μ   (x - μ)^2
    4     -6        36
    8     -2         4
   10      0         0
   13      3         9
   15      5        25
Sum                 74
\[ \sigma^2 = \text{MSD} = \frac{74}{5} = 14.8 \]
Population standard deviation
\[ \sigma = \sqrt{\text{population variance}} = \sqrt{\sigma^2} \]
Example:
In the example we just did,
the population variance was 14.8.
So the population standard deviation is
\[ \sigma = \sqrt{14.8} \approx 3.847 \]
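The three spread measures for the observations 4, 8, 10, 13, 15 can be reproduced with a few lines of Python. This is a minimal sketch using population formulas (division by N); the function name `population_spread` is invented for the example.

```python
import math


def population_spread(data):
    """Return (mean, MAD, MSD, standard deviation) using population formulas."""
    n = len(data)
    mu = sum(data) / n
    mad = sum(abs(x - mu) for x in data) / n     # mean absolute deviation
    msd = sum((x - mu) ** 2 for x in data) / n   # population variance
    return mu, mad, msd, math.sqrt(msd)


mu, mad, msd, sd = population_spread([4, 8, 10, 13, 15])
print(mu, mad, msd, round(sd, 3))
```

Note that the signed deviations x - μ always sum to zero, which is why the MAD takes absolute values and the MSD squares them.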
Calculating the MAD, MSD, & Std. Dev.
for a Frequency Distribution
   x_i   f_i
    1     3
    2     5
    3     2
Total    10
The total number of observations N
is the sum of the frequencies, or 10.
Calculate the population mean μ.
   x_i   f_i   x_i f_i
    1     3       3
    2     5      10
    3     2       6
Total    10      19
\[ \mu = \frac{\sum_{i=1}^{c} x_i f_i}{N} = \frac{19}{10} = 1.9 \]
Calculate the Mean Absolute Deviation (MAD).
   x_i   f_i   x_i - μ   |x_i - μ|
    1     3     -0.9        0.9
    2     5      0.1        0.1
    3     2      1.1        1.1
\[ \text{MAD} = \frac{1}{N} \sum_{i=1}^{c} |x_i - \mu| f_i = \frac{(0.9)(3) + (0.1)(5) + (1.1)(2)}{10} = \frac{5.4}{10} = 0.54 \]
Calculate the MSD, or population variance.
\[ \sigma^2 = \text{MSD} = \frac{1}{N} \sum_{i=1}^{c} (x_i - \mu)^2 f_i \]
Here c is the number of distinct values x_i.
The standard deviation is still just the square root of the variance.
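For a frequency distribution, each deviation is weighted by its frequency f_i. The sketch below applies this to the values 1, 2, 3 with frequencies 3, 5, 2; the function name `frequency_spread` is a hypothetical helper, and the formulas are the population versions (division by N).

```python
import math


def frequency_spread(values, freqs):
    """Return (mean, MAD, MSD, standard deviation) for a population frequency distribution."""
    n = sum(freqs)
    mu = sum(x * f for x, f in zip(values, freqs)) / n
    mad = sum(abs(x - mu) * f for x, f in zip(values, freqs)) / n
    msd = sum((x - mu) ** 2 * f for x, f in zip(values, freqs)) / n
    return mu, mad, msd, math.sqrt(msd)


f_mu, f_mad, f_msd, f_sd = frequency_spread([1, 2, 3], [3, 5, 2])
print(round(f_mu, 2), round(f_mad, 2), round(f_msd, 2), round(f_sd, 2))
```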
The formulae we have been
using are for populations.
If we have samples instead,
we have some notational
changes and one change
in the calculation process.
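Notational Changes for Samples
instead of Populations
The sample size is written n instead of N, the sample mean X̄ instead of μ, and the sample standard deviation s instead of σ.
MAD for a sample of n numbers:
\[ \text{MAD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{X}| \]
Sample variance for a frequency distribution:
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{c} (x_i - \bar{X})^2 f_i \]
The change in the calculation is that the sample variance divides by n - 1 rather than by N.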
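To see the effect of dividing by n - 1, the sample variance for a frequency distribution can be sketched as follows. The function name `sample_variance_freq` is an assumption for illustration, and the data are the values 1, 2, 3 with frequencies 3, 5, 2, treated here as a sample rather than a population.

```python
def sample_variance_freq(values, freqs):
    """Sample variance s^2 of a frequency distribution (divides by n - 1)."""
    n = sum(freqs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(x * f for x, f in zip(values, freqs)) / n
    squared = sum((x - x_bar) ** 2 * f for x, f in zip(values, freqs))
    return squared / (n - 1)


s_squared = sample_variance_freq([1, 2, 3], [3, 5, 2])
print(round(s_squared, 4))
```

The sum of squared deviations is the same 4.9 as in the population calculation; only the divisor changes, from 10 to 9, so the sample variance comes out slightly larger.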
Coefficient of variation (CV)
\[ CV = \frac{s}{\bar{X}} \times 100\% \]
Example, with s = 10 and X̄ = 200:
\[ CV = \frac{10}{200} \times 100\% = 0.05 \times 100\% = 5\% \]
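The same arithmetic in Python, as a small sketch; the function name `coefficient_of_variation` is chosen for this example only.

```python
def coefficient_of_variation(s, x_bar):
    """Standard deviation as a percentage of the mean."""
    if x_bar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return s / x_bar * 100


cv = coefficient_of_variation(10, 200)
print(f"{cv:.1f} %")
```

Because the CV is a ratio, it has no units, which makes it usable for comparing the spread of data sets measured on different scales.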
\[ \text{Z-score} = \frac{x - \bar{X}}{s} \]
Example
Suppose that a particular sample has a mean of
100, and a standard deviation of 10.
Would a value of 120 be considered an outlier?
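\[ \text{Z-score} = \frac{x - \bar{X}}{s} = \frac{120 - 100}{10} = 2 \]
Since the Z-score is between -3 and +3, 120 is not an outlier in this sample.
For a value of 150:
\[ \text{Z-score} = \frac{x - \bar{X}}{s} = \frac{150 - 100}{10} = 5 \]
Since the Z-score is greater than +3, 150 is an outlier in this sample.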
Example
Is 60 an outlier in this sample (with mean 100 and
standard deviation 10)?
\[ \text{Z-score} = \frac{x - \bar{X}}{s} = \frac{60 - 100}{10} = -4 \]
Since the Z-score is less than -3, 60 is an outlier in this sample.
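The outlier rule used in these examples (a Z-score below -3 or above +3) can be written as a short Python check. The function names and the `cutoff` parameter are illustrative assumptions, not part of the slides.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def is_outlier(x, mean, sd, cutoff=3.0):
    """True when the Z-score is less than -cutoff or greater than +cutoff."""
    return abs(z_score(x, mean, sd)) > cutoff


for value in (120, 150, 60):
    print(value, z_score(value, 100, 10), is_outlier(value, 100, 10))
```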
Symmetric versus Skewed Distributions
Symmetric Distribution