Measure of Location
A numerical value within the range of the data set which describes its location or position relative to the entire data set.
- Measures of Central Tendency
- Percentiles
- Deciles
- Quartiles
- Quintiles
Mean
POPULATION MEAN
$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$$
SAMPLE MEAN
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Total (Sum) of all observations in the data set divided by the Total number of observations.
Mean (Example)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90
$$\frac{25 + 45 + \cdots + 90}{20} = 68.55$$
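The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of the slides.

```python
# Exam scores from the example (20 observations).
scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

total = sum(scores)   # sum of all observations
n = len(scores)       # total number of observations
mean = total / n      # definition of the mean: total divided by count

print(total, n, mean)  # 1371 20 68.55
```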
Mean (Properties)
- Applicable only to QUANTITATIVE variables.
- All observations contribute to the mean value.
- Easily affected by extreme values.
- Amenable to further mathematical manipulation.
- Total deviation of all observations from the mean is equal to zero:
$$\sum_{i=1}^{N}(x_i - \mu) = \sum_{i=1}^{n}(x_i - \bar{x}) = 0$$
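The zero-total-deviation property can be verified numerically on the exam scores. The sketch below allows for a tiny floating-point rounding error, so the check uses a tolerance rather than exact equality.

```python
# Exam scores from the mean example.
scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

x_bar = sum(scores) / len(scores)

# Deviation of each observation from the mean.
deviations = [x - x_bar for x in scores]

# The deviations cancel out: their total is zero up to rounding error.
total_deviation = sum(deviations)
print(abs(total_deviation) < 1e-9)  # True
```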
Weighted Mean
In many situations, a numerical value is attached to each observation to denote its importance relative to the rest. Such values are called weights. When some observations are more important than others, the average is computed as the weighted mean, given by
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
where the weights $w_i$ are numerical values associated with each observation, representing its relative importance. The larger the weight, the more important the observation.
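A minimal sketch of the weighted mean follows. The grades and credit units are hypothetical values chosen for illustration; they do not come from the slides.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical example: course grades weighted by credit units.
grades = [85, 90, 78]
units = [3, 4, 2]

print(weighted_mean(grades, units))
```

With equal weights the function reduces to the ordinary mean, which is a quick sanity check on any implementation.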
One particular type of weighted mean is the GROUP MEAN: the mean value computed from GROUPED DATA, that is, data summarized in a frequency distribution table. Here, the $x_i$ are the class marks and the weights are the frequencies $f_i$. That is,
$$\bar{x}_G = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
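The group mean can be sketched on a small frequency table. The class marks and frequencies below are hypothetical and serve only to illustrate the computation.

```python
# Hypothetical frequency distribution table (k = 4 classes).
class_marks = [54.5, 64.5, 74.5, 84.5]   # x_i: midpoint of each class
frequencies = [3, 5, 8, 4]               # f_i: observations per class

# Group mean: frequencies act as the weights.
sum_fx = sum(f * x for x, f in zip(class_marks, frequencies))
sum_f = sum(frequencies)
group_mean = sum_fx / sum_f

print(sum_fx, sum_f, group_mean)  # 1420.0 20 71.0
```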
Median
- A value that divides an array into two equal parts. (Array: a data set arranged in either increasing or decreasing order.)
- Denoted by Md.
- Middle value of the array. That is, about 50% of the observations are less than the median value and about the same proportion are above it.
Median (Computation)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90
Even number of observations: the middle value lies between the 10th (N/2) and 11th (N/2 + 1) values in the array. In this case, the median is between 72 and 74. Median: the midpoint of these two values.
$$Md = \frac{72 + 74}{2} = 73$$
Median (Computation)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90
Odd number of observations: the middle value is the 11th ((N+1)/2) value in the array. In this case, the median value is 72.
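The even and odd cases can be handled by one small function. This is a sketch of the rule described in the two computation examples, not a reference implementation.

```python
def median(values):
    """Middle value of the sorted array; midpoint of the two middle values if N is even."""
    arr = sorted(values)          # the "array": data in increasing order
    n = len(arr)
    mid = n // 2
    if n % 2 == 1:
        return arr[mid]           # odd N: the ((N+1)/2)-th value
    return (arr[mid - 1] + arr[mid]) / 2   # even N: midpoint of N/2-th and (N/2+1)-th

even_scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
               74, 77, 78, 79, 79, 79, 82, 84, 88, 90]
odd_scores = even_scores + [72]   # 21 observations, as in the second example

print(median(even_scores))  # 73.0
print(median(odd_scores))   # 72
```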
Median (Properties)
- Applicable to quantitative variables.
- Not affected by extreme values. Thus, if the data set contains extreme values (either very large or very small), the median is the preferred measure of central tendency.
- Not amenable to further mathematical manipulation.
Mode
- A value or values in the data set that appear most frequently.
- Denoted by Mo.
- May or may not exist. If it exists, it may not be unique.
- Can be determined for both qualitative and quantitative variables.
Mode (Example)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90
Mo = 79
Remark: A modal value of 0 is different from a data set with no mode. In this case, we say the mode does not exist or there is no mode.
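The sketch below finds all modal values with a frequency count. It assumes one common convention that the slides do not spell out: when every value occurs exactly once, the data set is treated as having no mode and an empty list is returned.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:              # assumption: no repeated value means no mode
        return []
    return [v for v, c in counts.items() if c == highest]

scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

print(modes(scores))                      # [79]
print(modes(["A", "B", "A", "B", "C"]))   # two modes, qualitative data
print(modes([0, 0, 1]))                   # [0]: a mode equal to 0 exists
print(modes([1, 2, 3]))                   # []: no mode
```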
Percentiles
- Numbers that divide the data set into 100 equal parts.
- The jth percentile value, denoted by Pj, is the number (not necessarily an observation) that separates the bottom j% from the top (100 − j)%.
- Best applied to large data sets (at least 100 observations).
Percentiles (Example)
Suppose 10,000 students took a college entrance examination and that the 30th percentile score in mathematics is 78% (P30 = 78). This means that 30% of the examinees (3,000) got scores below 78% and 70% (7,000) got scores above 78%. Note that P50 = Median.
Deciles
- Numbers which divide the array into ten equal parts.
- Denoted by Di, i = 1, 2, ..., 10.
- D1 = P10, D2 = P20, D5 = P50 = Median.
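The relationships between percentiles, deciles, and the median can be checked with Python's standard `statistics.quantiles`. Several interpolation conventions exist for cut points; this sketch uses the library default ("exclusive") and only 20 scores, so the individual values are illustrative rather than definitive.

```python
from statistics import median, quantiles

scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

percentile_cuts = quantiles(scores, n=100)  # 99 cut points: P1 .. P99
decile_cuts = quantiles(scores, n=10)       # 9 cut points: D1 .. D9

def P(j):
    """j-th percentile, j = 1..99."""
    return percentile_cuts[j - 1]

def D(i):
    """i-th decile, i = 1..9."""
    return decile_cuts[i - 1]

# D1 = P10, D2 = P20, D5 = P50 = Median
print(D(1), P(10))
print(D(2), P(20))
print(D(5), P(50), median(scores))
```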
Measures of Dispersion
A single value that measures the spread or variability of the observations in a given data set. The larger the value, the more dispersed the data set. A measure equal to zero indicates no variation.
Measures:
- Range
- Interquartile Range
- Variance
- Standard Deviation
- Coefficient of Variation
Range
- Range, denoted by R, is defined as the difference between the largest and smallest values in the data set: R = MAX − MIN.
- Quick but rough (crude) measure of dispersion.
Interquartile Range
[Figure: number line marking MIN, Q1, MEDIAN, Q3, and MAX]
IQR = Q3 − Q1
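Both the range and the interquartile range are simple differences, as the sketch below shows for the exam scores. The quartiles come from `statistics.quantiles` with its default method; other quartile conventions can give slightly different Q1 and Q3, so the IQR value is convention-dependent.

```python
from statistics import quantiles

scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

# Range: R = MAX - MIN
data_range = max(scores) - min(scores)

# Quartiles are the three cut points that split the array into four parts.
q1, q2, q3 = quantiles(scores, n=4)

# Interquartile range: spread of the middle 50% of the data.
iqr = q3 - q1

print(data_range)  # 65
print(q1, q3, iqr)
```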
Variance
Spread is measured in terms of the average squared deviation of the observations from the mean. The farther the observations are from the mean, the larger the variance and hence the larger the spread of the data set.
POPULATION VARIANCE
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
SAMPLE VARIANCE
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
Variance (Example)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90
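The variance of these scores can be computed directly from the two definitions. This sketch evaluates both; the value 251.1475 that appears in the standard deviation example corresponds to the population formula (division by N).

```python
scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in scores)

population_variance = sum_sq_dev / n        # divide by N
sample_variance = sum_sq_dev / (n - 1)      # divide by n - 1

print(round(population_variance, 4))  # 251.1475
print(round(sample_variance, 4))      # 264.3658
```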
Weighted Variance
The weighted variance is defined as
$$s_w^2 = \frac{\sum_{i=1}^{n} w_i (x_i - \bar{x}_w)^2}{\sum_{i=1}^{n} w_i}$$
For GROUPED DATA
$$s_G^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x}_G)^2}{\sum_{i=1}^{k} f_i}$$
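A sketch of the weighted variance, applied to grouped data, is given here. It assumes the divisor is the sum of the weights (some texts subtract 1 from it), and the frequency table is hypothetical.

```python
def weighted_variance(values, weights):
    """Weighted average of squared deviations from the weighted mean.

    Assumption: the divisor is the sum of the weights.
    """
    total_weight = sum(weights)
    w_mean = sum(w * x for x, w in zip(values, weights)) / total_weight
    return sum(w * (x - w_mean) ** 2 for x, w in zip(values, weights)) / total_weight

# Hypothetical grouped data: class marks x_i with frequencies f_i as weights.
class_marks = [54.5, 64.5, 74.5, 84.5]
frequencies = [3, 5, 8, 4]

grouped_variance = weighted_variance(class_marks, frequencies)
print(grouped_variance)  # 92.75
```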
Standard Deviation
- (Positive) square root of the variance.
- Denoted by σ (population) or s (sample).
- Most common measure of variation.
- Has the same unit as the data set.
- A measure of spread about the mean.
- Affected by extreme values.
$$\sigma = \sqrt{251.1475} \approx 15.85$$
On the average, the exam scores deviate from the mean score of 68.55 by 15.85.
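The figure of 15.85 can be reproduced either from the variance or straight from the data. The sketch uses `statistics.pstdev`, which is the population standard deviation in Python's standard library.

```python
import math
from statistics import pstdev

scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

# From the population variance of the exam scores.
sigma_from_variance = math.sqrt(251.1475)

# Directly from the data (population standard deviation).
sigma_from_data = pstdev(scores)

print(round(sigma_from_variance, 2))  # 15.85
print(round(sigma_from_data, 2))      # 15.85
```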
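Coefficient of Variation
$$CV(\%) = \frac{\sigma}{\mu} \times 100 \quad \text{or} \quad CV(\%) = \frac{s}{\bar{x}} \times 100$$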
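As a sketch, the coefficient of variation for the exam scores can be computed from the population mean and standard deviation. Treating the 20 scores as a population is an assumption carried over from the variance example.

```python
from statistics import mean, pstdev

scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

mu = mean(scores)        # 68.55
sigma = pstdev(scores)   # population standard deviation

# CV expresses the standard deviation as a percentage of the mean,
# which makes it unit-free and comparable across data sets.
cv_percent = sigma / mu * 100

print(round(cv_percent, 1))
```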
Skewness
A measure of symmetry of distributions or departure from symmetry. Measured by the coefficient of skewness.
[Figure: distribution curves illustrating skewness]
SK = 0: Symmetric, Mean = Median
Positively skewed (SK > 0): Mode < Md < Mean
Kurtosis
- Measures the heaviness or lightness of the tails of a distribution (flatness or peakedness).
- Measured by the coefficient of kurtosis.
$$K = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^4}{\sigma^4} - 3$$
K=0, Normal
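A sketch of the two shape coefficients follows. It assumes the common moment-based definitions (third and fourth central moments scaled by powers of the standard deviation, with 3 subtracted from the kurtosis so that a normal distribution scores 0). The exact formulas on the original slides are not recoverable, so these are assumptions.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean)^k."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** k for x in values) / len(values)

def skewness(values):
    """Assumed definition: SK = m3 / m2^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Assumed definition: K = m4 / m2^2 - 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

print(skewness(scores))          # negative: the long tail is on the low side
print(skewness([1, 2, 3, 4, 5])) # symmetric data give 0
print(excess_kurtosis(scores))
```

For the exam scores the mean (68.55) lies below the median (73) and the mode (79), which agrees with the negative sign of the coefficient.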
Box-Plots
- Graphical representation of properties of distributions.
- Utilizes the 5-number summary measures [MIN, Q1, Median (Q2), Q3, MAX].
- A box-plot is a graph of the 5-number summary measures:
  - A central box spans Q1 and Q3.
  - A line in the box marks the median.
  - Lines extend from the box out to the smallest (MIN) and largest (MAX) values.
- Best used for side-by-side comparisons of distributions.
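The five numbers that a box-plot draws can be collected with a short helper. This is a sketch using the default quartile method of `statistics.quantiles`; plotting libraries may use a different quartile convention, so box edges can differ slightly.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (MIN, Q1, Median, Q3, MAX) for a data set."""
    q1, q2, q3 = quantiles(values, n=4)
    return (min(values), q1, q2, q3, max(values))

scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

summary = five_number_summary(scores)
print(summary)
```

Computing one summary per group (for example, one per city) gives the numbers needed for a side-by-side comparison.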
[Figure: side-by-side box-plots of Sales for Manila and Cebu, with MIN, Q1, Median, Q3, and MAX marked]
Correlation Coefficient
- A measure that describes the strength of the linear relationship between two quantitative variables (say X and Y, measured on the same individuals).
- Does not denote causality, i.e., cause and effect.
- Also referred to as the Pearson product-moment correlation.
- Denoted by r.
- Scatterplot: a plot of the X and Y values.
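$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n}(y_i - \bar{y})^2\right]}} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right]\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right]}}$$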
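Both forms of the formula, the deviation form and the computational form, give the same value. The sketch below checks this on a small hypothetical data set; the x and y values are invented for illustration.

```python
import math

def pearson_deviation_form(x, y):
    """r from sums of products of deviations from the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_computational_form(x, y):
    """r from raw sums: sum(xy) - n*x_bar*y_bar, and so on."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
    sxx = sum(a * a for a in x) - n * x_bar ** 2
    syy = sum(b * b for b in y) - n * y_bar ** 2
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r1 = pearson_deviation_form(x, y)
r2 = pearson_computational_form(x, y)
print(round(r1, 4), round(r2, 4))
```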
- The value of r ranges from −1 to +1. That is, −1 ≤ r ≤ +1.
- The closer r is to 1 (or −1), the stronger the LINEAR relationship.
- The closer r is to 0, the weaker the LINEAR relationship.
- r > 0 indicates a positive (direct) linear relation.
- r < 0 indicates a negative (inverse) linear relation.
- r = 0 does not indicate no relation; it indicates only that there is no LINEAR relation.
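The last point is easy to demonstrate: a perfect but non-linear relationship can still produce r = 0. In this sketch y is exactly x squared on hypothetical x values that are symmetric about zero.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [a ** 2 for a in x]        # y depends on x exactly, but not linearly

r_quadratic = pearson_r(x, y)
r_linear = pearson_r(x, [2 * a + 1 for a in x])   # exact linear relation

print(abs(r_quadratic) < 1e-12)  # True: no LINEAR relation detected
print(round(r_linear, 4))        # 1.0
```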
|r|          Qualitative Interpretation
0 – 0.2      VERY WEAK linear relation
0.2 – 0.4    WEAK linear relation
0.4 – 0.6    MODERATE linear relation
0.6 – 0.8    STRONG linear relation
0.8 – 1.0    VERY STRONG linear relation
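The table translates directly into a lookup function. The table does not say which band a boundary value such as 0.4 belongs to; this sketch assumes each boundary goes to the higher band.

```python
def interpret_r(r):
    """Map a correlation coefficient to the qualitative label in the table."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)                 # strength depends on |r|, not on the sign
    if size < 0.2:
        return "VERY WEAK"
    if size < 0.4:
        return "WEAK"
    if size < 0.6:                # assumption: 0.4 itself counts as MODERATE
        return "MODERATE"
    if size < 0.8:
        return "STRONG"
    return "VERY STRONG"

print(interpret_r(0.1))     # VERY WEAK
print(interpret_r(-0.95))   # VERY STRONG
print(interpret_r(0.77))    # STRONG
```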