[Figures: bar charts of frequency f for exam grades (A, B, C, D, F) and political affiliation (Rep, Dem, Ind); frequency charts of # of presentations]
Measure of Central Tendency: Mode
• The data entry that occurs with the greatest frequency.
• If no entry is repeated the data set has no mode.
• If two entries occur with the same greatest frequency,
each entry is a mode (bimodal).
c) 1, 2, 3, 6, 7, 8, 9, 10: no entry is repeated, so there is no mode.
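The three mode rules above can be sketched with Python's `collections.Counter`. The function name `modes` is our own, and the second data list in the usage lines is a hypothetical illustration, not from the slides.

```python
from collections import Counter

def modes(data):
    """Return every value tied for the greatest frequency.

    An empty list means no entry is repeated (no mode); two values
    means the data are bimodal.
    """
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:                       # nothing repeats, so there is no mode
        return []
    return sorted(value for value, c in counts.items() if c == top)

print(modes([1, 2, 3, 6, 7, 8, 9, 10]))   # []
print(modes([2, 3, 3, 5, 5, 7]))          # [3, 5]  (hypothetical bimodal data)
```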
The mode
• Calculation of the mode from a frequency
distribution
– The observation with the largest frequency is the mode
Example
A group of 13 real estate agents were asked how many houses they
had sold in the past month. Find the mode.
Number of houses sold (x)   Frequency (f)
0                           2
1                           5
2                           6
Total                       13
The observation with the largest frequency (6) is 2. Hence the mode of
these data is 2.
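The same lookup can be done directly on the frequency table. This is a minimal sketch that stores the real estate example as a Python dict mapping houses sold to frequency.

```python
# Real estate example: houses sold -> number of agents (frequency)
freq = {0: 2, 1: 5, 2: 6}

largest = max(freq.values())                          # largest frequency, 6
mode = [x for x, f in freq.items() if f == largest]   # every value with that frequency
print(mode)  # [2]
```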
The mode
• Calculation of the mode from a grouped frequency
distribution
– It is not possible to calculate the exact value
of the mode of the original data from a
grouped frequency distribution
– The class interval with the largest frequency
is called the modal class
$Mo = L + \dfrac{d_1}{d_1 + d_2} \times i$
Where
L = the real lower limit of the modal class
d1 = the frequency of the modal class minus the
frequency of the previous class
d2 = the frequency of the modal class minus the
frequency of the next class above the modal class
i = the length of the class interval of the modal class
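The grouped-mode formula can be sketched as a small function. Here it is applied to the class table from the Problem later in this section (class boundaries 2–6 through 18–22). Two behaviours are assumptions of this sketch, not rules from the slides:

- A missing neighbouring class, when the modal class sits at either end, is treated as having frequency 0.
- Ties for the largest frequency resolve to the first such class.

```python
def grouped_mode(boundaries, freqs):
    """Estimate Mo = L + d1 / (d1 + d2) * i from a grouped frequency table.

    boundaries: list of (lower, upper) real class limits, in order
    freqs: frequency of each class
    Assumes d1 + d2 > 0 (the modal class is strictly larger than a neighbour).
    """
    k = max(range(len(freqs)), key=lambda j: freqs[j])   # modal class index
    lower, upper = boundaries[k]
    prev_f = freqs[k - 1] if k > 0 else 0                 # assumption: 0 if no class below
    next_f = freqs[k + 1] if k + 1 < len(freqs) else 0    # assumption: 0 if no class above
    d1 = freqs[k] - prev_f
    d2 = freqs[k] - next_f
    i = upper - lower                                     # class interval length
    return lower + d1 / (d1 + d2) * i

bounds = [(2, 6), (6, 10), (10, 14), (14, 18), (18, 22)]
f = [7, 15, 22, 14, 2]
print(round(grouped_mode(bounds, f), 2))  # 11.87
```

For this table the modal class is 10–14, with d1 = 22 − 15 = 7 and d2 = 22 − 14 = 8.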
The median
The median is the middle observation in a set of ranked data.
50% of the data have a value less than the median, and 50% of the data
have a value greater than the median.
• If n is odd, $\tilde{x}$ is the $\frac{n+1}{2}$th observation.
• If n is even, the median is the mean of the $\frac{n}{2}$th observation and the $\left(\frac{n}{2} + 1\right)$th observation.
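The odd/even position rules translate directly into code. This is a minimal sketch that ranks the data first and converts the 1-based positions in the rules to Python's 0-based indices. It is illustrated with the accident counts used in the mean example later in this section.

```python
def median(data):
    """Median using the odd/even position rules (positions are 1-based)."""
    x = sorted(data)                      # rank the data first
    n = len(x)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]        # the (n+1)/2-th observation
    return (x[n // 2 - 1] + x[n // 2]) / 2  # mean of the n/2-th and (n/2+1)-th

print(median([8, 9, 3, 5, 2, 6, 4, 5]))   # 5.0
```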
The median
• Calculation of the median from a frequency distribution
– This involves constructing an extra column
(cf) in which the frequencies are cumulated
Number of pieces   Frequency f   Cumulative frequency cf
1                  10            10
2                  12            22
3                  16            38
                   Σf = 38
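With the cf column, the median position can be located without listing every observation. In this minimal sketch, the helper `value_at` (our own name) returns the value of the first class whose cumulative frequency reaches the requested position. For Σf = 38 the median is the mean of the 19th and 20th observations.

```python
from itertools import accumulate

values = [1, 2, 3]              # number of pieces
freqs = [10, 12, 16]            # frequency f
cf = list(accumulate(freqs))    # cumulative frequency: [10, 22, 38]
n = cf[-1]                      # Σf = 38

def value_at(position):
    """Value of the observation at a 1-based position, read from the cf column."""
    for v, c in zip(values, cf):
        if position <= c:
            return v
    raise IndexError(position)

if n % 2 == 1:
    med = value_at((n + 1) // 2)
else:
    med = (value_at(n // 2) + value_at(n // 2 + 1)) / 2
print(cf, med)   # [10, 22, 38] 2.0
```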
Median
Median: The value of the data that occupies the middle position when
the data are ranked in order according to size
Notes:
The median is denoted by “x tilde”: $\tilde{x}$
$\bar{X} = \dfrac{\sum_i X_i}{N}$
Statistical Notation
• Formula for mean: $\bar{X} = \dfrac{\sum X}{N}$
• Σ: summation
– add all that follows
• X: observation
– value of an observation
• N: number of observations
– or data points
Example
Example: The following data represents the number of accidents
in each of the last 8 years at a dangerous intersection.
Find the mean number of accidents: 8, 9, 3, 5, 2, 6, 4, 5:
Solution: $\bar{x} = \frac{1}{8}(8 + 9 + 3 + 5 + 2 + 6 + 4 + 5) = \frac{42}{8} = 5.25$
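The same arithmetic in Python is a one-line check of ΣX / N:

```python
accidents = [8, 9, 3, 5, 2, 6, 4, 5]
mean = sum(accidents) / len(accidents)   # ΣX / N = 42 / 8
print(mean)  # 5.25
```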
The mean
• Calculation of the mean from a frequency distribution
– It is useful to be able to calculate a mean
directly from a frequency table
$\bar{x} = \dfrac{\sum fx}{\sum f}$
where
Σf = the sum of the frequencies
Σfx = the sum of each observation multiplied by its
frequency
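This minimal sketch applies Σfx / Σf to the houses-sold frequency table from the mode example, stored as a dict of x → f.

```python
# Houses-sold frequency table: x -> f
table = {0: 2, 1: 5, 2: 6}
sum_f = sum(table.values())                     # Σf = 13
sum_fx = sum(x * f for x, f in table.items())   # Σfx = 0*2 + 1*5 + 2*6 = 17
mean = sum_fx / sum_f
print(sum_f, sum_fx, round(mean, 2))   # 13 17 1.31
```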
The mean
• Calculation of the mean from a grouped frequency
distribution
– The mean can only be estimated from a grouped frequency distribution
– Assume that the observations are spread evenly throughout each class
interval
$\bar{x} = \dfrac{\sum fm}{\sum f}$
where:
Σfm = the sum of the products of each class midpoint (m) and that class interval’s frequency (f)
Σf = the sum of the frequencies
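This is a minimal sketch of the grouped estimate. It uses the class table from the Problem later in this section, with each class represented by its midpoint m = (lower + upper) / 2, as the even-spread assumption above implies.

```python
# (lower boundary, upper boundary, f) from the Problem's class table
classes = [(2, 6, 7), (6, 10, 15), (10, 14, 22), (14, 18, 14), (18, 22, 2)]

sum_f = sum(f for _, _, f in classes)                      # Σf
sum_fm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)   # Σfm, m = class midpoint
mean = sum_fm / sum_f                                      # estimated mean
print(sum_f, sum_fm, round(mean, 2))   # 60 676.0 11.27
```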
1. Determine the mean, median, mode
53, 55, 56, 56, 58, 58, 59, 59, 60, 61, 61, 62, 62, 62,
64, 65, 65, 67, 68, 68, 70
Weighted mean: $\bar{x} = \dfrac{\sum xw}{\sum w}$
– The weights are usually expressed as
percentages or fractions
• A student's grade in a Psychology course is
comprised of tests (40%), quizzes (20%),
papers (20%), and a final project (20%). His
scores for each of the categories are 85 (tests),
100 (quizzes), 92 (papers) and 84 (final
project). Calculate his overall grade.
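The weighted-mean formula Σxw / Σw applies directly to the Psychology grade. In this sketch the weights are kept as percentages, so Σw = 100.

```python
# (score, weight in percent) for tests, quizzes, papers, final project
parts = [(85, 40), (100, 20), (92, 20), (84, 20)]

weighted_mean = sum(x * w for x, w in parts) / sum(w for _, w in parts)  # Σxw / Σw
print(weighted_mean)  # 89.2
```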
• In a History course, a student's grade is
composed of papers (40%), tests (40%) and a
final exam (20%). The student has earned a 90
on all papers and an 80 on all tests.
What is the minimum score the student needs to
earn on the final exam to achieve an overall
grade of 87.0?
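One way to answer the History question is to set the weighted mean equal to the target and solve for the one unknown score. This is a minimal sketch of that rearrangement.

```python
target = 87.0
known = [(90, 40), (80, 40)]   # (score, weight %) for papers and tests
final_weight = 20              # weight % of the final exam
total_weight = sum(w for _, w in known) + final_weight   # 100

# target = (Σ known x*w + final * final_weight) / total_weight, solved for final
needed = (target * total_weight - sum(x * w for x, w in known)) / final_weight
print(needed)  # 95.0
```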
Central Tendency
• Describes most typical values
– Depends on level of measurement
• Mode (all levels)
– Most frequently occurring value
• Median (only ordinal & interval/ratio)
– value where ½ observations above & ½ below
• Mean (only interval/ratio)
– Arithmetic average
• Determine the mode, the median, the mean
Choosing the “Best Average”
• The shape of your data and the existence of any
outliers may help you choose the best average:
Shapes of Distributions
Symmetry in Data Sets
The analysis of a data set often depends on whether the
distribution is symmetric or non-symmetric.
Symmetric distribution: the pattern of frequencies from a
central point is the same (or nearly so) from the left and right.
Non-symmetric distribution: the patterns from a central
point from the left and right are different.
Skewed to the left: a tail extends out to the left.
Skewed to the right: a tail extends out to the right.
Measures of Dispersion
• Measures of central tendency alone cannot
completely characterize a set of data. Two very
different data sets may have similar measures of
central tendency.
– Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
– Set B: 1, 10, 10, 10, 10, 10, 10, 10, 10, 10
Example
Example: Consider the sample {12, 23, 17, 15, 18}.
Find 1) the range and 2) each deviation from the mean.
Solutions:
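Both quantities for the sample {12, 23, 17, 15, 18} can be computed in a few lines. The deviations from the mean always sum to zero, which is a useful check.

```python
sample = [12, 23, 17, 15, 18]
data_range = max(sample) - min(sample)      # High - Low
mean = sum(sample) / len(sample)            # 85 / 5 = 17.0
deviations = [x - mean for x in sample]     # x - x̄ for each value
print(data_range, deviations)   # 11 [-5.0, 6.0, 0.0, -2.0, 1.0]
print(sum(deviations))          # 0.0
```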
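Sample Variance, $s^2$:
• $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
• $s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Sample Standard Deviation: $s = \sqrt{s^2}$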
• Round to one more decimal than the data.
• Don’t round until the end.
• Include the appropriate units.
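The two variance formulas above are algebraically equivalent. This sketch checks that on the sample {12, 23, 17, 15, 18}; the function names are our own.

```python
import math

def sample_variance_definition(xs):
    """s^2 = Σ(x - x̄)^2 / (n - 1)"""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """s^2 = (Σx^2 - (Σx)^2 / n) / (n - 1)"""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

xs = [12, 23, 17, 15, 18]
s2 = sample_variance_definition(xs)
print(s2, sample_variance_shortcut(xs))   # 16.5 16.5
print(round(math.sqrt(s2), 2))            # 4.06
```

In floating-point arithmetic the shortcut form can lose precision when the values are large and close together. The definitional form is the safer one to code.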
Find the Standard Deviation and Variance for
Bank B (1 wait line)
Wait time x (in min)   Deviation x – x̄   Squares (x – x̄)²
6.6
6.8
7.5
7.7
7.9
Σx = 36.5              Σ(x – x̄) =
$n = 5$, $\bar{x} = \dfrac{\sum x}{n} = \dfrac{36.5}{5} = 7.3$ min
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$, $s = \sqrt{s^2}$
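This is a minimal sketch that completes the Bank B computation with the definitional formula. Following the rounding note (one more decimal than the data), s would be reported as about 0.57 min.

```python
import math

wait = [6.6, 6.8, 7.5, 7.7, 7.9]           # Bank B wait times, minutes
n = len(wait)
mean = sum(wait) / n                        # 36.5 / 5 = 7.3
ss = sum((x - mean) ** 2 for x in wait)     # Σ(x - x̄)^2, about 1.30
s2 = ss / (n - 1)                           # sample variance
s = math.sqrt(s2)                           # sample standard deviation
print(round(mean, 2), round(s2, 3), round(s, 2))   # 7.3 0.325 0.57
```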
Example
Example: Find the 1) variance and 2) standard deviation for the
data {5, 7, 1, 3, 8}:
Solutions:
First: $\bar{x} = \frac{1}{5}(5 + 7 + 1 + 3 + 8) = \frac{24}{5} = 4.8$
x      x − x̄    (x − x̄)²
5       0.2      0.04
7       2.2      4.84
1      -3.8     14.44
3      -1.8      3.24
8       3.2     10.24
Sum: 24    0    32.8
1) $s^2 = \frac{1}{4}(32.8) = 8.2$   2) $s = \sqrt{8.2} \approx 2.86$
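Python's standard `statistics` module can serve as a cross-check. Its `variance` and `stdev` functions use the n − 1 (sample) denominator, matching the formulas used here.

```python
import statistics

data = [5, 7, 1, 3, 8]
print(statistics.mean(data))                # 4.8
print(statistics.variance(data))            # 8.2
print(round(statistics.stdev(data), 2))     # 2.86
```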
You grow 20 crystals from a solution and
measure the length of each crystal in
millimeters. Here is your data: 9, 2, 5, 4, 12,
7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4
Calculate the sample standard deviation of the
length of the crystals.
Sample versus Population
Standard Deviation
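Note: Unlike $\bar{x}$ and µ, the formulas for s and σ are not mathematically the same:
Sample Standard Deviation
• $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Population Standard Deviation
• $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$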
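The difference between the two formulas is visible in Python's `statistics` module, which provides `stdev` (divides by n − 1) and `pstdev` (divides by N). This sketch treats the same five values first as a sample and then as a whole population.

```python
import statistics

data = [5, 7, 1, 3, 8]
s = statistics.stdev(data)        # sample standard deviation: divides by n - 1
sigma = statistics.pstdev(data)   # population standard deviation: divides by N
print(round(s, 4), round(sigma, 4))   # 2.8636 2.5612
```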
Standard Deviation: Key Points
Range Rule: For most data sets, the majority of the data lies
within 2 standard deviations of the mean.
Recall: Range = High – Low
Estimate: Range ≈ 4s, so $s \approx \dfrac{\text{Range}}{4}$
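This sketch compares the rough Range / 4 estimate with the actual sample standard deviation, using the crystal-length data from the exercise above. Agreement is only approximate, as expected of a rule of thumb.

```python
import statistics

# Crystal lengths (mm) from the exercise above
lengths = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]
estimate = (max(lengths) - min(lengths)) / 4   # s ≈ Range / 4
actual = statistics.stdev(lengths)             # sample standard deviation
print(estimate, round(actual, 2))              # 2.5 3.06
```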
Using the Range Rule of Thumb
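Chebyshev’s Theorem
The proportion of any data set lying within k standard deviations of the mean, $\bar{x} - ks < x < \bar{x} + ks$, is at least $1 - \dfrac{1}{k^2}$, for any k > 1.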
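The bound is easy to tabulate. This minimal sketch (function name our own) shows that at least 75% of any data set lies within 2 standard deviations of the mean, and at least about 88.9% within 3.

```python
def chebyshev_min_fraction(k):
    """Lower bound on the fraction of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

for k in (2, 3):
    print(k, round(chebyshev_min_fraction(k), 4))
```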
Notes:
The empirical rule is more informative than Chebyshev’s theorem because it assumes more about the distribution (that it is normally distributed)
Also applies to populations
Can be used to determine whether a distribution is approximately normally distributed
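The Empirical Rule
For a distribution that is approximately normal (bell-shaped), approximately 68% of the data lie within 1 standard deviation of the mean, approximately 95% within 2 standard deviations, and approximately 99.7% within 3 standard deviations.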
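The empirical-rule percentages are rounded values of standard normal probabilities. This sketch uses `statistics.NormalDist` (available in Python 3.8 and later) to compute the exact proportions within 1, 2 and 3 standard deviations.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)        # P(-k < Z < k)
    print(k, round(100 * within, 1))     # about 68.3, 95.4, 99.7
```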
Example
Example: A random sample of plum tomatoes was selected
from a local grocery store and their weights recorded.
The mean weight was 6.5 ounces with a standard
deviation of 0.4 ounces. If the weights are normally
distributed:
1) What percentage of weights fall between 5.7 and 7.3?
2) What percentage of weights fall above 7.7?
Solutions:
1) $(\bar{x} - 2s, \bar{x} + 2s) = (6.5 - 2(0.4), 6.5 + 2(0.4)) = (5.7, 7.3)$
Approximately 95% of the weights fall between 5.7 and 7.3
2) $(\bar{x} - 3s, \bar{x} + 3s) = (6.5 - 3(0.4), 6.5 + 3(0.4)) = (5.3, 7.7)$
Approximately 99.7% of the weights fall between 5.3 and 7.7
Approximately 0.3% of the weights fall outside (5.3, 7.7)
Approximately (0.3/2)=0.15% of the weights fall above 7.7
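This sketch reproduces the tomato intervals and compares the empirical-rule answers with exact probabilities from a normal model built with `statistics.NormalDist`. The mean 6.5 and standard deviation 0.4 come from the example, and normality is the example's own assumption.

```python
from statistics import NormalDist

mean, sd = 6.5, 0.4
lo2, hi2 = mean - 2 * sd, mean + 2 * sd   # about (5.7, 7.3)
lo3, hi3 = mean - 3 * sd, mean + 3 * sd   # about (5.3, 7.7)

# Empirical-rule answer for part 2: half of the 0.3% outside 3 standard deviations
pct_above = (100 - 99.7) / 2              # about 0.15

# Exact normal-model values for comparison
weights = NormalDist(mu=mean, sigma=sd)
exact_between = 100 * (weights.cdf(7.3) - weights.cdf(5.7))   # about 95.45
exact_above = 100 * (1 - weights.cdf(7.7))                    # about 0.135
print(round(lo2, 1), round(hi2, 1), round(lo3, 1), round(hi3, 1))  # 5.7 7.3 5.3 7.7
print(round(pct_above, 2), round(exact_between, 2), round(exact_above, 3))
```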
A Note about the Empirical Rule
Note: The empirical rule may be used to determine whether or
not a set of data is approximately normally distributed
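Mean and variance of a frequency distribution:
$\bar{x} = \dfrac{\sum xf}{\sum f}$
$s^2 = \dfrac{\sum x^2 f - \frac{(\sum xf)^2}{\sum f}}{\sum f - 1}$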
Example
Example: A survey of students in the first grade at a local school
asked for the number of brothers and/or sisters for
each child. The results are summarized in the table
below. Find 1) the mean, 2) the variance, and
3) the standard deviation:
Solutions:
First:
x      f      xf     x²f
0      15     0      0
1      17     17     17
2      23     46     92
4      5      20     80
5      2      10     50
Sum:   62     93     239
1) $\bar{x} = \dfrac{93}{62} = 1.5$
2) $s^2 = \dfrac{239 - \frac{(93)^2}{62}}{62 - 1} = \dfrac{99.5}{61} \approx 1.63$
3) $s = \sqrt{1.63} \approx 1.28$
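The column sums Σf, Σxf and Σx²f plug straight into the frequency-distribution formulas. This minimal sketch reproduces the sibling example.

```python
import math

# number of siblings x -> frequency f
table = {0: 15, 1: 17, 2: 23, 4: 5, 5: 2}
sum_f = sum(table.values())                          # Σf = 62
sum_xf = sum(x * f for x, f in table.items())        # Σxf = 93
sum_x2f = sum(x * x * f for x, f in table.items())   # Σx²f = 239

mean = sum_xf / sum_f
var = (sum_x2f - sum_xf ** 2 / sum_f) / (sum_f - 1)
sd = math.sqrt(var)
print(mean, round(var, 2), round(sd, 2))   # 1.5 1.63 1.28
```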
Problem
• Find the mean and the variance for this
grouped frequency distribution:
Class Boundaries   f
2–6                7
6–10               15
10–14              22
14–18              14
18–22              2
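One approach to this problem is a sketch that assumes every value sits at its class midpoint m, as in the grouped-mean section, and then applies the frequency-distribution variance formula with m in place of x. Both results are estimates, not exact values for the underlying data.

```python
import math

# (lower boundary, upper boundary, f) from the table above
classes = [(2, 6, 7), (6, 10, 15), (10, 14, 22), (14, 18, 14), (18, 22, 2)]

sum_f = sum(f for _, _, f in classes)
mids = [((lo + hi) / 2, f) for lo, hi, f in classes]   # class midpoints m
sum_fm = sum(m * f for m, f in mids)                    # Σfm
sum_fm2 = sum(m * m * f for m, f in mids)               # Σfm²

mean = sum_fm / sum_f
var = (sum_fm2 - sum_fm ** 2 / sum_f) / (sum_f - 1)     # sample variance, midpoints as data
print(round(mean, 2), round(var, 2), round(math.sqrt(var), 2))  # 11.27 17.08 4.13
```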
z-Score
z-Score: The position a particular value of x has relative to the mean,
measured in standard deviations. The z-score is found by the
formula:
$z = \dfrac{\text{value} - \text{mean}}{\text{st.dev.}} = \dfrac{x - \bar{x}}{s}$
Notes:
Typically, the calculated value of z is rounded to the nearest
hundredth
The z-score measures the number of standard deviations that x lies above (positive z) or below (negative z) the mean
Solutions:
$z = \dfrac{x - \bar{x}}{s} = \dfrac{46 - 35.6}{7.1} \approx 1.46$
46 is 1.46 standard deviations above the mean
$z = \dfrac{x - \bar{x}}{s} = \dfrac{33 - 35.6}{7.1} \approx -0.37$
33 is 0.37 standard deviations below the mean.
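The z-score formula as a small function, checked against the two solutions above (mean 35.6, standard deviation 7.1). The function name is our own.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(round(z_score(46, 35.6, 7.1), 2))   # 1.46
print(round(z_score(33, 35.6, 7.1), 2))   # -0.37
```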
Quartiles
Quartiles: Values of the variable that divide the ranked data into
quarters; each set of data has three quartiles
1. The first quartile, Q1, is a number such that at most 25% of
the data are smaller in value than Q1 and at most 75% are
larger
2. The second quartile, Q2, is the median
3. The third quartile, Q3, is a number such that at most 75%
of the data are smaller in value than Q3 and at most 25%
are larger
Ranked data, increasing order
Notes:
The 1st quartile and the 25th percentile are the same: Q1 = P25
The median, the 2nd quartile, and the 50th percentile are all the same: $\tilde{x} = Q_2 = P_{50}$
Finding Pk (and Quartiles)
• Procedure for finding Pk (and quartiles):
1. Rank the n observations, lowest to highest
2. Compute A = (nk)/100
3. If A is an integer:
– d(Pk) = A.5 (depth)
– Pk is halfway between the value of the data in the Ath
position and the value of the next data
If A is a fraction:
– d(Pk) = B, the next larger integer
– Pk is the value of the data in the Bth position
Example
Example: The following data represents the pH levels of a
random sample of swimming pools in a California
town. Find: 1) the first quartile, 2) the third quartile,
and 3) the 37th percentile:
5.6 5.6 5.8 5.9 6.0
6.0 6.1 6.2 6.3 6.4
6.7 6.8 6.8 6.8 6.9
7.0 7.3 7.4 7.4 7.5
Solutions:
1) k = 25: (20) (25) / 100 = 5, depth = 5.5, Q1 = 6
2) k = 75: (20) (75) / 100 = 15, depth = 15.5, Q3 = 6.95
3) k = 37: (20) (37) / 100 = 7.4, depth = 8, P37 = 6.2
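The depth procedure for Pk can be sketched as follows. The sketch assumes an integer k strictly between 0 and 100, and it tests whether A = nk/100 is an integer with integer arithmetic, to avoid floating-point surprises. It reproduces the pH answers.

```python
def percentile_by_depth(data, k):
    """P_k by the depth procedure, for an integer k with 0 < k < 100."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    x = sorted(data)                      # 1. rank lowest to highest
    n = len(x)
    if (n * k) % 100 == 0:                # 2-3. A = nk/100 is an integer
        a = n * k // 100
        return (x[a - 1] + x[a]) / 2      #    depth A.5: halfway between A-th and (A+1)-th
    b = n * k // 100 + 1                  #    otherwise depth B = next larger integer
    return x[b - 1]                       #    value in the B-th position (1-based)

ph = [5.6, 5.6, 5.8, 5.9, 6.0, 6.0, 6.1, 6.2, 6.3, 6.4,
      6.7, 6.8, 6.8, 6.8, 6.9, 7.0, 7.3, 7.4, 7.4, 7.5]
print(round(percentile_by_depth(ph, 25), 2),
      round(percentile_by_depth(ph, 75), 2),
      round(percentile_by_depth(ph, 37), 2))   # 6.0 6.95 6.2
```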
Note: The mean, median, midrange, and midquartile are all measures
of central tendency. They are not necessarily equal. Can you
think of an example when they would be the same value?
5-Number Summary
5-Number Summary: The 5-number summary is composed of:
1. L, the smallest value in the data set
2. Q1, the first quartile (also P25)
3. $\tilde{x}$, the median (also P50 and 2nd quartile)
4. Q3, the third quartile (also P75)
5. H, the largest value in the data set
Notes:
The 5-number summary indicates how much the data is
spread out in each quarter
The interquartile range is the difference between the first and third
quartiles. It is the range of the middle 50% of the data
Box-and-Whisker Display
Box-and-Whisker Display: A graphic representation of the
5-number summary:
• The five numerical values (smallest, first quartile, median, third
quartile, and largest) are located on a scale, either vertical or
horizontal
• The box is used to depict the middle half of the data that lies
between the two quartiles
• The whiskers are line segments used to depict the other half of the
data
• One line segment represents the quarter of the data that is smaller
in value than the first quartile
• The second line segment represents the quarter of the data that is
larger in value than the third quartile
Example
Example: A random sample of students in a sixth grade class
was selected. Their weights are given in the table
below. Find the 5-number summary for this data and
construct a boxplot:
63 64 76 76 81 83
85 86 88 89 90 91
92 93 93 93 94 97
99 99 99 101 108 109
112
Solution:
L = 63, Q1 = 85, $\tilde{x}$ = 92, Q3 = 99, H = 112
[Figure: Boxplot for Weight Data, “Weights from Sixth Grade Class”; horizontal Weight axis from 60 to 110, marked at L, Q1, $\tilde{x}$, Q3, H]
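This is a minimal sketch of the 5-number summary. It reuses the depth procedure for Pk on the sixth-grade weights, and the helper names are our own. With n = 25 none of A = 6.25, 12.5 or 18.75 is an integer, so Q1, the median and Q3 are the 7th, 13th and 19th ranked values.

```python
def depth_percentile(x, k):
    """P_k of already-sorted data x by the depth procedure (integer 0 < k < 100)."""
    n = len(x)
    if (n * k) % 100 == 0:
        a = n * k // 100
        return (x[a - 1] + x[a]) / 2   # halfway between the A-th and (A+1)-th values
    return x[n * k // 100]             # B-th value (1-based), B = next larger integer

def five_number_summary(data):
    x = sorted(data)
    return (x[0], depth_percentile(x, 25), depth_percentile(x, 50),
            depth_percentile(x, 75), x[-1])

weights = [63, 64, 76, 76, 81, 83, 85, 86, 88, 89, 90, 91, 92,
           93, 93, 93, 94, 97, 99, 99, 99, 101, 108, 109, 112]
summary = five_number_summary(weights)
print(summary)                  # (63, 85, 92, 99, 112)
print(summary[3] - summary[1])  # interquartile range: 14
```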
Quartiles
• Quartiles divide data into four equal parts
– First quartile—Q1
• 25% of observations are below Q1 and 75% above Q1
• Also called the lower quartile
– Second quartile—Q2
• 50% of observations are below Q2 and 50% above Q2
• This is also the median
– Third quartile—Q3
• 75% of observations are below Q3 and 25% above Q3
• Also called the upper quartile
• Calculating quartiles
Example
The sorted observations are:
25, 29, 31, 39, 43, 48, 52, 63, 66, 90
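$Q_1 = L + \dfrac{\frac{n}{4} - C}{f} \times i$ and $Q_3 = L + \dfrac{\frac{3n}{4} - C}{f} \times i$
where: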
L = the real lower limit of the quartile class (containing Q1 or
Q3)
n = Σf = the total number of observations in the entire data set
C = the cumulative frequency in the class immediately before
the quartile class
f = the frequency of the relevant quartile class
i = the length of the real class interval of the relevant quartile
class
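This is a sketch of a grouped quartile computation that uses the symbols defined above. It assumes the standard grouped-quartile formula Q = L + ((qn/4 − C) / f) × i, with q = 1 for Q1 and q = 3 for Q3, is the one these definitions belong to. As an illustration it is applied to the class table from the Problem earlier in this section, whose class boundaries serve as the real limits.

```python
def grouped_quartile(classes, q):
    """Q_q = L + ((q*n/4 - C) / f) * i for a grouped frequency distribution.

    classes: list of (lower real limit, upper real limit, frequency), in order
    q: 1 for the first quartile, 3 for the third
    """
    n = sum(f for _, _, f in classes)
    target = q * n / 4                 # position of the quartile
    cum = 0                            # C: cumulative frequency before the class
    for lower, upper, f in classes:
        if cum + f >= target:          # first class reaching the target is the quartile class
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("target position beyond the data")

classes = [(2, 6, 7), (6, 10, 15), (10, 14, 22), (14, 18, 14), (18, 22, 2)]
print(round(grouped_quartile(classes, 1), 2),
      round(grouped_quartile(classes, 3), 2))   # 8.13 14.29
```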
Deciles, percentiles and fractiles
• Further division of a distribution into a number of equal parts is
sometimes used; the most common of these are deciles, percentiles,
and fractiles
• Distribution is symmetrical
Mean = Median = Mode
Steps to Calculating Percentile Ranks
Data   Order   Number   Assign Midpoint to Ties   Percentile Rank (Apply Formula)
9      1       1        1                         2.381
5      2       2        2                         7.143
2      3       3        4                         16.667
3      3       4        4                         16.667
3      3       5        4                         16.667
4      4       6        7                         30.952
8      4       7        7                         30.952
9      4       8        7                         30.952
1      5       9        10                        45.238
7      5       10       10                        45.238
4      5       11       10                        45.238
8      6       12       12                        54.762
3      7       13       14                        64.286
7      7       14       14                        64.286
6      7       15       14                        64.286
5      8       16       17.5                      80.952
7      8       17       17.5                      80.952
4      8       18       17.5                      80.952
5      8       19       17.5                      80.952
8      9       20       20.5                      95.238
8      9       21       20.5                      95.238
Example: $PR_3 = \dfrac{\text{Rank}_3 - .5}{N} \times 100 = \dfrac{4 - .5}{21} \times 100 = 16.667$
Psy 427 - Cal State Northridge
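The table's steps can be sketched in Python:

1. Sort the data.
2. Give each run of tied values the midpoint of its positions.
3. Apply PR = (Rank − .5) / N × 100.

The function name is our own. The data are the "Data" column of the table.

```python
def percentile_ranks(data):
    """PR = (rank - 0.5) / N * 100, with tied values given the midpoint of their positions."""
    x = sorted(data)
    n = len(x)
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and x[j + 1] == x[i]:   # extend over a run of ties
            j += 1
        ranks[x[i]] = ((i + 1) + (j + 1)) / 2   # midpoint of 1-based positions i+1 .. j+1
        i = j + 1
    return {v: (r - 0.5) / n * 100 for v, r in ranks.items()}

scores = [9, 5, 2, 3, 3, 4, 8, 9, 1, 7, 4, 8, 3, 7, 6, 5, 7, 4, 5, 8, 8]
pr = percentile_ranks(scores)
print(round(pr[3], 3), round(pr[8], 3))   # 16.667 80.952
```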
Percentile
$X_P = (p)(n + 1)$
• Where $X_P$ is the position (rank) of the score at the desired percentile, p is the desired percentile (a number between 0 and 1), and n is the number of scores
• If the number is an integer, then the desired percentile is the score in that position
• If the number is not an integer, then you can either round or interpolate; for this class we’ll just round (round up when p is below .50 and down when p is above .50)
Percentile
• Apply the formula $X_P = (p)(n + 1)$
1. You’ll get a number like 7.5 (think of it as
place1.proportion)
2. Start with the value indicated by place1 (e.g. 7.5,
start with the value in the 7th place)
3. Find place2, which is the next highest place number (e.g. the 8th place), and subtract the value in place1 from the value in place2; this is distance1
4. Multiply the proportion number by the distance1 value; this is distance2
5. Add distance2 to the value in place1, and that is the desired percentile
Example: Percentile
• Example 1: 25th percentile:
{1, 4, 9, 16, 25, 36, 49, 64, 81}
• X25 = (.25)(9+1) = 2.5
– place1 = 2, proportion = .5
– Value in place1 = 4
– Value in place2 = 9
– distance1 = 9 – 4 = 5
– distance2 = 5 * .5 = 2.5
– Interpolated value = 4 + 2.5 = 6.5
– 6.5 is the 25th percentile
Example: Percentile
• Example 2: 75th percentile
{1, 4, 9, 16, 25, 36, 49, 64, 81}
• X75 = (.75)(9+1) = 7.5
– place1 = 7, proportion = .5
– Value in place1 = 49
– Value in place2 = 64
– distance1 = 64 – 49 = 15
– distance2 = 15 * .5 = 7.5
– Interpolated value = 49 + 7.5 = 56.5
– 56.5 is the 75th percentile
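The interpolation steps can be sketched as a short function that reproduces both examples. As a cross-check, `statistics.quantiles` in Python's standard library uses the same (n + 1)-based interpolation in its default "exclusive" method. The function name is our own, and the sketch assumes 1 ≤ p(n + 1) ≤ n, so that both neighbouring places exist.

```python
import statistics

def interpolated_percentile(data, p):
    """Position = p * (n + 1); interpolate between the two neighbouring scores."""
    x = sorted(data)
    pos = p * (len(x) + 1)
    place1 = int(pos)                       # whole-number part
    proportion = pos - place1               # fractional part
    if proportion == 0:
        return x[place1 - 1]                # exact position: no interpolation needed
    distance1 = x[place1] - x[place1 - 1]   # value in place2 minus value in place1
    distance2 = proportion * distance1
    return x[place1 - 1] + distance2

squares = [1, 4, 9, 16, 25, 36, 49, 64, 81]
print(interpolated_percentile(squares, 0.25), interpolated_percentile(squares, 0.75))  # 6.5 56.5
print(statistics.quantiles(squares, n=4))   # [6.5, 25.0, 56.5]
```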
Quartiles
• To calculate quartiles, you simply find the scores
that correspond to the 25th, 50th, and 75th percentiles.
• Q1 = P25, Q2 = P50, Q3 = P75