STATISTICS AND STATISTICAL INFERENCE
Summary Measures
Central Tendency: Mean, Median, Mode
Location: Minimum, Maximum, Median, Percentile, Decile, Quartile
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Kurtosis
Measures of Central Tendency
A single value that is used to identify the center of the data.
It is thought of as a typical value of the distribution.
Precise yet simple.
The most representative value of the data.
Mean
The most common measure of the center
Also known as the arithmetic average
Population Mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
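Both formulas come down to one computation: add the observations and divide by how many there are (N for a population, n for a sample). A minimal Python sketch; the function name `arithmetic_mean` and the sample values are hypothetical:

```python
def arithmetic_mean(values):
    """Add the observations and divide by their count (N for a population, n for a sample)."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical sample of n = 5 observations
sample = [4, 8, 6, 5, 7]
print(arithmetic_mean(sample))  # (4 + 8 + 6 + 5 + 7) / 5 = 30 / 5 = 6.0
```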
Properties of the Mean
[Figure: dot plots of two data sets, one on a 0–10 axis (Mean = 5) and one with the axis extended to 14 (Mean = 6)]
Median
Divides the ordered observations into two equal parts.
If the number of observations is odd, the median is the middle number.
If the number of observations is even, the median is the average of the two middle numbers.
The sample median is denoted as $\tilde{x}$, while the population median is denoted as $\tilde{\mu}$.
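The odd/even rule above can be sketched in a few lines of Python. The function name and the data are hypothetical; the only requirement is that the values can be sorted:

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    array = sorted(values)
    n = len(array)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return array[mid]                      # odd n: the single middle observation
    return (array[mid - 1] + array[mid]) / 2   # even n: average of the two middle observations


print(median([7, 1, 5, 3, 9]))      # sorted: 1 3 5 7 9 -> 5
print(median([7, 1, 5, 3, 9, 11]))  # sorted: 1 3 5 7 9 11 -> (5 + 7) / 2 = 6.0
```

Note that 6.0 in the even case is not one of the observations, which is why a median need not be an actual value in the data set.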
Properties of a Median
may not be an actual observation in the data set
can be applied to data measured at least at the ordinal level
a positional measure; not affected by extreme values
[Figure: dot plots of the same two data sets on the 0–10 and 0–14 axes; Median = 5]
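To see the contrast between the mean and the median numerically, the sketch below uses Python's standard `statistics` module on two small hypothetical data sets that differ only in their largest value:

```python
from statistics import mean, median

# Hypothetical data: identical except that the largest value is replaced by an extreme one
original = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 50]

print(mean(original), median(original))          # mean 3, median 3
print(mean(with_extreme), median(with_extreme))  # mean 12, median still 3
```

The extreme value pulls the mean from 3 to 12, while the median, being a positional measure, stays at 3.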
Mode
The value that occurs most frequently
The nominal average
May or may not exist
[Figure: two dot plots; one data set on a 0–6 axis has No Mode, the other on a 0–14 axis has Mode = 9]
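A minimal Python sketch of finding the mode by counting frequencies. It assumes one common convention: if no value repeats, the data set has no mode (returned as an empty list), and if several values tie for the highest frequency, all of them are returned. The function name `modes` and the data are hypothetical. Python's `statistics.multimode` would instead return every value when none repeats, so the "no mode" case needs the explicit check below.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs exactly once: treated here as "no mode"
    return sorted(v for v, c in counts.items() if c == top)


print(modes([0, 1, 2, 3, 4, 5, 6]))       # [] (no mode)
print(modes([3, 9, 9, 9, 12, 14]))        # [9]
print(modes(["red", "blue", "red"]))      # ['red'] (works for nominal data too)
```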
Properties of a Mode
Measures of Location
A Measure of Location summarizes a data set by giving a value within the range of the data values that describes its location relative to the entire data set arranged according to magnitude (called an array).
Some Common Measures:
Minimum, Maximum
Percentiles, Deciles, Quartiles
Maximum and Minimum
Minimum is the smallest value in the data set, denoted as MIN.
Maximum is the largest value in the data set, denoted as MAX.
Percentiles
Numerical measures that give the position of a data value relative to the entire data set.
They divide an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.
The jth percentile, denoted as Pj, is the data value in the data set that separates the bottom j% of the data from the top (100 − j)%.
EXAMPLE
Suppose LJ was told that, relative to the other scores on a certain test, his score was the 95th percentile.
This means that (at least) 95% of those who took the test had scores less than or equal to LJ's score, while (at least) 5% had scores greater than or equal to LJ's.
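Textbooks and software use several slightly different rules for locating Pj. The Python sketch below assumes the nearest-rank convention: Pj is the value at position ceil(j·n/100) of the data arranged in increasing order. Interpolating conventions can give somewhat different values for the same data. The function name is hypothetical, and the data are the same 15 values used in this section's range and IQR example.

```python
import math


def percentile(data, j):
    """P_j by the nearest-rank convention (an assumption; other conventions exist)."""
    if not data:
        raise ValueError("data must be non-empty")
    if not 0 < j <= 100:
        raise ValueError("j must satisfy 0 < j <= 100")
    array = sorted(data)                     # arrange in increasing order of magnitude
    rank = math.ceil(j * len(array) / 100)   # 1-based position of P_j in the array
    return array[rank - 1]


data = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))  # 60 71 78
```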
Deciles
Divide an array into ten equal parts, each part containing ten percent of the data values; the jth decile is denoted by Dj.
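Because each decile cuts off a multiple of ten percent, the jth decile is the (10j)th percentile, Dj = P10j. The Python sketch below computes D1 to D9 under the same assumed nearest-rank convention (position ceil(10j·n/100) in the sorted data). The function name and data are hypothetical.

```python
import math


def decile(data, j):
    """D_j = P_(10j), using the nearest-rank convention (an assumption)."""
    if not data:
        raise ValueError("data must be non-empty")
    if not 1 <= j <= 9:
        raise ValueError("j must be one of 1, 2, ..., 9")
    array = sorted(data)
    rank = math.ceil(10 * j * len(array) / 100)  # 1-based position of D_j
    return array[rank - 1]


data = list(range(1, 21))  # hypothetical data: the integers 1 to 20
print([decile(data, j) for j in range(1, 10)])  # [2, 4, 6, 8, 10, 12, 14, 16, 18]
```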
Measures of Variation
A measure of variation is a single
value that is used to describe the
spread of the distribution
A measure of central tendency alone
does not uniquely describe a distribution
A look at dispersion
[Figure: Data A and Data B plotted on the same 11–21 axis; both have Mean = 15.5, but Data A has s = 3.338 while Data B has s = 0.9258]
Data (arranged in an array): 54 58 58 60 62 65 66 71 74 75 77 78 80 82 85
Range: R = MAX − MIN = 85 − 54 = 31
Interquartile Range: IQR = Q3 − Q1 = 78 − 60 = 18
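Both measures are differences of two positional values. A minimal Python sketch for the data above; the helper `nearest_rank` is hypothetical and assumes the nearest-rank convention for Q1 and Q3, which reproduces Q1 = 60 and Q3 = 78 here (other quartile conventions may give slightly different quartiles):

```python
import math


def nearest_rank(array, p):
    """Value at position ceil(p*n/100) of an already sorted array."""
    return array[math.ceil(p * len(array) / 100) - 1]


data = sorted([54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85])
data_range = max(data) - min(data)                     # R = MAX - MIN
iqr = nearest_rank(data, 75) - nearest_rank(data, 25)  # IQR = Q3 - Q1
print(data_range, iqr)  # 31 18
```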
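Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Population SD: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample SD: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$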
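The four formulas differ only in the divisor: N for a population, n − 1 for a sample. A minimal Python sketch (the function name and data are hypothetical), cross-checked against `pvariance` and `variance` from Python's standard `statistics` module:

```python
from statistics import pvariance, variance


def variance_sd(values, sample=True):
    """Return (variance, SD) with divisor n - 1 for a sample or N for a population."""
    values = list(values)
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    center = sum(values) / n                     # x-bar (sample) or mu (population)
    ss = sum((v - center) ** 2 for v in values)  # sum of squared deviations
    var = ss / (n - 1) if sample else ss / n
    return var, var ** 0.5


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(variance_sd(data, sample=False))  # (4.0, 2.0): sigma^2 = 32/8
print(variance_sd(data))                # s^2 = 32/7 and its square root
print(pvariance(data), variance(data))  # standard-library cross-check
```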
Computation of Standard Deviation
(Sample) Data: 10 12 14 15 17 18 18 24, with $\bar{x} = 128/8 = 16$ and $n - 1 = 7$
$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{7}} = \sqrt{\frac{130}{7}} = 4.309$
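The worked computation can be reproduced step by step in Python, then compared with `statistics.stdev`, which uses the same n − 1 divisor:

```python
from statistics import mean, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]
x_bar = mean(data)                                 # 128 / 8 = 16
squared_devs = [(x - x_bar) ** 2 for x in data]    # 36, 16, 4, 1, 1, 4, 4, 64
s = (sum(squared_devs) / (len(data) - 1)) ** 0.5   # sqrt(130 / 7)
print(round(s, 3))            # 4.309
print(round(stdev(data), 3))  # 4.309 from the standard library
```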
Comparing Standard Deviation
Example: Heights of five marathon players in inches
Team A: 65 65 65 65 65 (Mean = 65, s = 0)
Team B: 62 67 66 70 60 (Mean = 65, s = 4.0)
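A quick Python check of the comparison: both sets of heights have the same mean, but only the set with unequal heights has a nonzero sample standard deviation. The variable names are illustrative.

```python
from statistics import mean, stdev

equal_heights = [65, 65, 65, 65, 65]  # all five players are 65 inches tall
team_b = [62, 67, 66, 70, 60]         # Team B heights in inches

for name, heights in (("equal heights", equal_heights), ("Team B", team_b)):
    # Same mean (65) for both, but very different spread
    print(name, mean(heights), stdev(heights))
```

For Team B the squared deviations are 9, 4, 1, 25 and 25, so s² = 64/4 = 16 and s = 4.0.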
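Chebyshev's Rule
For any data set and any k > 1, at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of the data values lie within k standard deviations of the mean. For k = 2, this is at least 75%.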
Example
The midterm exam scores of 100 STAT 1 students
last semester had a mean of 65 and a standard
deviation of 8 points.
Applying Chebyshev's Rule, we can say that:
1. At least 75% of the students had scores between 49 and 81 (k = 2: 65 ± 2 × 8).
2. At least 88.9% of the students had scores between 41 and 89 (k = 3: 65 ± 3 × 8).
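The two statements come from one calculation: the interval mean ± k·SD together with the guaranteed minimum proportion 1 − 1/k². A minimal Python sketch; the function name is hypothetical, and it rejects k ≤ 1 because the bound says nothing useful there.

```python
def chebyshev_interval(mean, sd, k):
    """Return (lower, upper, minimum proportion) from Chebyshev's Rule for k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is informative only for k > 1")
    return mean - k * sd, mean + k * sd, 1 - 1 / k ** 2


for k in (2, 3):
    low, high, prop = chebyshev_interval(65, 8, k)  # midterm example: mean 65, SD 8
    print(f"at least {prop:.1%} of scores lie between {low} and {high}")
# at least 75.0% of scores lie between 49 and 81
# at least 88.9% of scores lie between 41 and 89
```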
Coefficient of Variation (CV)
measure of relative variation
usually expressed in percent
shows variation relative to mean
used to compare 2 or more groups
Formula: $CV = \frac{SD}{\text{Mean}} \times 100\%$
Comparing CVs
Stock A: Average Price = P50, SD = P5, CV = 10%
Stock B: Average Price = P100, SD = P5, CV = 5%
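The stock comparison in Python: both stocks have the same SD, but relative to its mean Stock A varies twice as much. The function name is hypothetical; multiplying by 100 before dividing keeps these particular results exact.

```python
def coefficient_of_variation(sd, mean):
    """CV = SD / Mean x 100%, returned as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd * 100 / mean


print(coefficient_of_variation(5, 50))   # Stock A: 10.0 (%)
print(coefficient_of_variation(5, 100))  # Stock B: 5.0 (%)
```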
Measure of Skewness
Describes the degree of departure of the distribution of the data from symmetry.
The degree of skewness is measured by the coefficient of skewness, denoted as SK and computed as
$SK = \frac{3(\text{Mean} - \text{Median})}{SD}$
What is Symmetry?
A distribution is said to be symmetric about the mean if the distribution to the left of the mean is the mirror image of the distribution to the right of the mean.
A symmetric distribution has SK = 0, since its mean is equal to its median (and, if it has a single mode, to its mode).
Measure of Skewness
SK > 0: positively skewed
SK < 0: negatively skewed
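The coefficient above (often called Pearson's second coefficient of skewness) can be sketched in Python. Which SD to use is not specified here; the sketch assumes the sample SD from `statistics.stdev`. The function name and data are hypothetical.

```python
from statistics import mean, median, stdev


def skewness_coefficient(values):
    """SK = 3(Mean - Median) / SD, using the sample SD (an assumption)."""
    sd = stdev(values)
    if sd == 0:
        raise ValueError("SK is undefined when SD = 0")
    return 3 * (mean(values) - median(values)) / sd


print(skewness_coefficient([1, 2, 3, 4, 5]))    # 0.0: symmetric
print(skewness_coefficient([1, 2, 3, 4, 20]))   # positive: mean 6 > median 3
print(skewness_coefficient([1, 17, 18, 19, 20]))  # negative: mean 15 < median 18
```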
Measure of Kurtosis
Describes the extent of peakedness or flatness of the distribution of the data.
Measured by the coefficient of kurtosis (K), computed as
$K = \frac{\sum_{i=1}^{N} (X_i - \mu)^4 / N}{\sigma^4} - 3$
Measure of Kurtosis
K = 0: mesokurtic
K > 0: leptokurtic
K < 0: platykurtic
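A minimal Python sketch of K using population moments, matching the formula: the average fourth power of the deviations, divided by σ⁴, minus 3. The function name and both data sets are hypothetical.

```python
def kurtosis_coefficient(values):
    """K = [sum((X_i - mu)^4) / N] / sigma^4 - 3, with population moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # sigma^2
    m4 = sum((x - mu) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("K is undefined when all values are equal")
    return m4 / m2 ** 2 - 3


print(kurtosis_coefficient([1, 2, 3, 4, 5]))               # about -1.3: K < 0, platykurtic
print(kurtosis_coefficient([-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]))  # 2.0: K > 0, leptokurtic
```

For the first data set, the sum of fourth powers of the deviations is 34 and σ² = 2, so K = (34/5)/4 − 3 = −1.3.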
Box-and-Whiskers Plot
[Figure: box-and-whiskers plot marking Q1 = 75, Md = 78, Q3 = 85]
Steps to Construct a Box-and-Whiskers plot
[Figures: step-by-step construction of the plot on an axis showing Q1 = 75, Md = 78 and Q3 = 85, together with the values 55, 60, 98 and 100]
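A box-and-whiskers plot is drawn from five numbers: the box runs from Q1 to Q3 with a line at the median, and the whiskers extend outward. The Python sketch below assumes whiskers run to MIN and MAX (some texts instead stop them at fences such as 1.5 × IQR beyond the quartiles) and uses nearest-rank quartiles. The function name is hypothetical, and the data are the 15 values from this section's range and IQR example.

```python
import math
from statistics import median


def five_number_summary(data):
    """(MIN, Q1, Md, Q3, MAX) with nearest-rank quartiles (an assumed convention)."""
    array = sorted(data)
    if not array:
        raise ValueError("data must be non-empty")

    def quartile(p):
        # Value at position ceil(p*n/100) of the sorted array
        return array[math.ceil(p * len(array) / 100) - 1]

    return array[0], quartile(25), median(array), quartile(75), array[-1]


scores = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]
print(five_number_summary(scores))  # (54, 60, 71, 78, 85)
```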