
Descriptive Measures

What are descriptive measures?


Descriptive measures are numerical quantities computed from data (a population or a sample) and used to describe the properties of that data. They are helpful in describing data.

Some Descriptive Measures


- Measures of Location
- Measures of Dispersion/Spread
- Measure of Symmetry (skewness)
- Measure of Peakedness (kurtosis)
- Measures of Relationships

Measure of Location
A numerical value within the range of the data set which describes its location or position relative to the entire data set.
- Measures of Central Tendency
- Percentiles
- Deciles
- Quartiles
- Quintiles

Measures of Central Tendency (Averages)


Describes the center of the data set, or the value around which the observations tend to cluster.
- Arithmetic Mean (Mean, Average)
- Median
- Mode

Mean

POPULATION MEAN

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

SAMPLE MEAN

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The mean is the total (sum) of all observations in the data set divided by the total number of observations.

Mean (Example)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90

$$\text{Mean} = \frac{25 + 45 + \cdots + 90}{20} = 68.55$$
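As a quick check of the arithmetic in this example, the following Python sketch recomputes the mean of the 20 observations. The names `scores` and `arithmetic_mean` are illustrative and do not come from the slides.

```python
# Data set from the example (20 observations).
scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]


def arithmetic_mean(values):
    """Return the sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


mean_score = arithmetic_mean(scores)
print(f"mean = {mean_score:.2f}")  # mean = 68.55
```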

Mean (Properties)
- Applicable only to QUANTITATIVE variables.
- All observations contribute to the mean value.
- Easily affected by extreme values.
- Amenable to further mathematical manipulation.
- The total deviation of all observations from the mean is equal to zero:

$$\sum_{i=1}^{N}(x_i - \mu) = \sum_{i=1}^{n}(x_i - \bar{x}) = 0$$
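Two of these properties can be checked numerically. The sketch below uses the 20 observations from the mean example; replacing the largest value with 900 is a hypothetical change made only to illustrate sensitivity to extreme values.

```python
import math

scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]
mean_score = sum(scores) / len(scores)

# Property: the deviations from the mean add up to zero
# (up to floating-point rounding).
total_deviation = math.fsum(x - mean_score for x in scores)

# Property: a single extreme value shifts the mean noticeably.
# Hypothetical change: replace the largest score (90) with 900.
with_outlier = scores[:-1] + [900]
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print(f"|total deviation| < 1e-9: {abs(total_deviation) < 1e-9}")
print(f"mean = {mean_score:.2f}, mean with outlier = {mean_with_outlier:.2f}")
```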

Weighted Mean
In many situations, a numerical value is attached to each observation to denote its importance relative to the rest. Such values are called weights. If some observations are more important than others, the appropriate average is the weighted mean, given as

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

where the weights $w_i$ are numerical values associated with each observation that represent its relative importance. The larger the weight, the more important the observation.
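A minimal sketch of the weighted mean in Python follows. The grades and credit units are hypothetical values chosen for illustration, and `weighted_mean` is an assumed helper name.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical example: course grades weighted by credit units.
grades = [85, 90, 78]
units = [3, 4, 2]
average_grade = weighted_mean(grades, units)
print(f"weighted mean = {average_grade:.2f}")  # weighted mean = 85.67
```

With equal weights the function reduces to the ordinary arithmetic mean.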

Weighted Mean
One particular type of weighted mean is the GROUP MEAN: the mean computed from GROUPED DATA, that is, data summarized in a frequency distribution table. Here the $x_i$ are the class marks and the weights are the class frequencies $f_i$. That is,

$$\bar{x}_G = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$

where $k$ is the number of classes.
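The group mean can be sketched in Python as below. The frequency table is hypothetical, and each class mark is taken as the midpoint of the class limits, which is an assumption about how the classes are defined.

```python
# Hypothetical frequency distribution table:
# (lower class limit, upper class limit, frequency).
frequency_table = [
    (20, 29, 2),
    (30, 39, 5),
    (40, 49, 8),
    (50, 59, 5),
]


def group_mean(table):
    """Return sum(f_i * x_i) / sum(f_i), where x_i is the class mark."""
    total_frequency = sum(freq for _, _, freq in table)
    if total_frequency == 0:
        raise ValueError("the table contains no observations")
    # Class mark assumed to be the midpoint of the class limits.
    weighted_sum = sum(freq * (lower + upper) / 2 for lower, upper, freq in table)
    return weighted_sum / total_frequency


mean_grouped = group_mean(frequency_table)
print(mean_grouped)  # 42.5
```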

Median
- A value that divides an array into two equal parts. (An array is a data set arranged in increasing or decreasing order.)
- Denoted by Md.
- The middle value of the array: about 50% of the observations are less than the median and about 50% are greater.

Median (Computation)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90

Even number of observations (N = 20): the middle lies between the 10th (N/2) and 11th (N/2 + 1) values in the array. In this case, the median is between 72 and 74.

Median: the midpoint of these two values.

$$Md = \frac{72 + 74}{2} = 73$$

Median (Computation)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90

Odd number of observations (N = 21): the middle value is the 11th ((N+1)/2) value in the array. In this case, the median is 72.
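Both cases of the median computation can be sketched in one Python function. The function name `median_of` and the variable names are illustrative.

```python
def median_of(values):
    """Return the middle value of the sorted data.

    For an even number of observations, return the midpoint of the
    two middle values.
    """
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    array = sorted(values)
    n = len(array)
    middle = n // 2
    if n % 2 == 1:
        return array[middle]  # the ((N + 1) / 2)-th value, counting from 1
    return (array[middle - 1] + array[middle]) / 2


scores_even = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
               74, 77, 78, 79, 79, 79, 82, 84, 88, 90]
scores_odd = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 72,
              74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

print(median_of(scores_even))  # 73.0
print(median_of(scores_odd))   # 72
```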

Median (Properties)
- Applicable for quantitative variables.
- Not affected by extreme values. Thus, if the data set contains extreme values (either very large or very small), the median is the preferred measure of central tendency.
- Not amenable to further mathematical manipulation.

Mode
- A value or values in the data set that appear most frequently.
- Denoted by $M_0$.
- May or may not exist. If it exists, it may not be unique.
- Can be determined for both qualitative and quantitative variables.

Mode (Example)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90

$$M_0 = 79$$

Remark: A modal value of 0 is different from a data set with no mode. In the latter case, we say that the mode does not exist, or that there is no mode.
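The sketch below finds the mode or modes of a data set in Python. It assumes one common convention, namely that the mode does not exist when no value occurs more than once; the helper name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return the list of most frequent values, or [] if there is no mode.

    Assumed convention: if no value occurs more than once, the mode
    does not exist.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return [value for value, count in counts.items() if count == highest]


scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

print(modes(scores))                           # [79]
print(modes([3, 1, 2]))                        # [] (no mode)
print(modes([0, 0, 5]))                        # [0] (a mode equal to 0)
print(modes(["red", "blue", "red", "blue"]))   # ['red', 'blue'] (not unique)
```

The last call shows that the same approach works for qualitative data.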

Percentiles
- Numbers that divide the data set into 100 equal parts.
- The jth percentile, denoted by $P_j$, is the number (not necessarily an observation) that separates the bottom j% of the data from the top (100 - j)%.
- Best applied to large data sets (at least 100 observations).

Percentiles (Example)
Suppose 10,000 students took a college entrance examination, and that the 30th percentile score in mathematics is 78% ($P_{30} = 78$). This means that 30% of the examinees (3,000) scored below 78% and 70% (7,000) scored above 78%.

Note that $P_{50}$ = Median.

Deciles
- Numbers that divide the array into ten equal parts.
- Denoted by $D_i$, i = 1, 2, ..., 10.
- $D_1 = P_{10}$, $D_2 = P_{20}$, ..., $D_5 = P_{50}$ = Median.

Quartiles and Quintiles


- Quartiles are numbers that divide the array into 4 equal parts, whereas quintiles divide it into 5 equal parts.
- Quartiles: $Q_1 = P_{25}$, $Q_2 = P_{50}$ = Median, $Q_3 = P_{75}$.
- The first quartile is the median of the observations in the array whose values lie between the smallest value and the median of the entire data set.
- The third quartile is the median of the observations in the array whose values are greater than the median of the entire data set.
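The median-of-halves description of the quartiles can be sketched in Python as follows. The split is done by position in the sorted array, with the middle observation left out of both halves when the count is odd; that detail is an assumption, and other quartile conventions can give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    array = sorted(values)
    n = len(array)
    if n < 2:
        raise ValueError("at least two observations are required")
    lower_half = array[: n // 2]        # observations below the median position
    upper_half = array[(n + 1) // 2:]   # observations above the median position
    return median(lower_half), median(array), median(upper_half)


scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]
q1, q2, q3 = quartiles(scores)
print(q1, q2, q3)  # 58.5 73.0 79.0
```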

Measures of Dispersion
A single value that measures the spread or variability of the observations in a given data set. The larger the value, the more dispersed the data set. A measure equal to zero indicates no variation.

Measures:
- Range
- Interquartile Range
- Variance
- Standard Deviation
- Coefficient of Variation

Range
- The range, denoted by R, is the difference between the largest and smallest values in the data set.
- R = MAX - MIN
- A quick but rough (crude) measure of dispersion.

Interquartile Range (IQR)


Difference between the 3rd and 1st quartile values.

[Figure: number line marking MIN, Q1, MEDIAN, Q3, and MAX]

IQR = Q3 - Q1
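The sketch below computes the range and the IQR for the 20 observations used in the earlier examples. The quartiles are taken as medians of the lower and upper halves of the sorted data, which is one of several conventions, and replacing 90 with 900 is a hypothetical change used to show how each measure reacts to an extreme value.

```python
from statistics import median


def range_and_iqr(values):
    """Return (R, IQR) with R = MAX - MIN and IQR = Q3 - Q1."""
    array = sorted(values)
    n = len(array)
    if n < 2:
        raise ValueError("at least two observations are required")
    q1 = median(array[: n // 2])
    q3 = median(array[(n + 1) // 2:])
    return array[-1] - array[0], q3 - q1


scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]
data_range, iqr = range_and_iqr(scores)
print(f"R = {data_range}, IQR = {iqr}")  # R = 65, IQR = 20.5

# Hypothetical extreme value: the range changes, the IQR does not.
outlier_range, outlier_iqr = range_and_iqr(scores[:-1] + [900])
print(f"R = {outlier_range}, IQR = {outlier_iqr}")  # R = 875, IQR = 20.5
```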

Variance
Spread is measured as the average squared deviation of the observations from the mean. The farther the observations are from the mean, the larger the variance, and hence the larger the spread of the data set.

POPULATION VARIANCE

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

SAMPLE VARIANCE

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

Variance (Example)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90

$$\sigma^2 = \frac{1}{20}\left[(25 - 68.55)^2 + (45 - 68.55)^2 + \cdots + (90 - 68.55)^2\right] = 251.1475$$
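The variance of this data set can be recomputed with a short Python sketch. It shows both the population form (dividing by N, as in the example) and the sample form (dividing by n - 1); the function names are illustrative.

```python
def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance of an empty data set is undefined")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

print(f"{population_variance(scores):.4f}")  # 251.1475
print(f"{sample_variance(scores):.4f}")      # 264.3658
```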

Weighted Variance
The weighted variance is defined as

$$s_w^2 = \frac{\sum_{i=1}^{n} w_i (x_i - \bar{x}_w)^2}{\sum_{i=1}^{n} w_i}$$

Weighted Variance
For GROUPED DATA,

$$s_G^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x}_G)^2}{\sum_{i=1}^{k} f_i}$$
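A minimal Python sketch of the weighted variance follows, applied to grouped data with the class frequencies as weights. The class marks and frequencies are hypothetical, and the divisor is taken to be the sum of the weights, which is an assumption about the intended formula.

```python
def weighted_variance(values, weights):
    """Return sum(w_i * (x_i - xbar_w)^2) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    mean_w = sum(w * x for w, x in zip(weights, values)) / total_weight
    squared = sum(w * (x - mean_w) ** 2 for w, x in zip(weights, values))
    return squared / total_weight


# Hypothetical grouped data: class marks and class frequencies.
class_marks = [24.5, 34.5, 44.5, 54.5]
frequencies = [2, 5, 8, 5]

grouped_variance = weighted_variance(class_marks, frequencies)
print(grouped_variance)  # 86.0
```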

Standard Deviation
- The (positive) square root of the variance.
- Denoted by $\sigma$ (population) or $s$ (sample).
- The most common measure of variation.
- Has the same unit as the data set.
- A measure of spread about the mean.
- Affected by extreme values.

Standard Deviation (Example)


25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90

$$\sigma^2 = \frac{1}{20}\left[(25 - 68.55)^2 + (45 - 68.55)^2 + \cdots + (90 - 68.55)^2\right] = 251.1475$$

$$\sigma = \sqrt{251.1475} \approx 15.85$$

On average, the exam scores deviate from the mean score of 68.55 by about 15.85.

Coefficient of Variation (CV)


- The ratio of the standard deviation to the mean, usually expressed in percent.
- Measures the variability of the data set relative to the mean.
- A unitless quantity, which makes it more appropriate for comparing the variability of two data sets that have different units of measure.

$$CV(\%) = \frac{\sigma}{\mu} \times 100 \quad \text{(population)}, \qquad CV(\%) = \frac{s}{\bar{x}} \times 100 \quad \text{(sample)}$$
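The sketch below computes the CV in Python and uses it to compare two data sets measured in different units. The height and weight values are hypothetical, and the population standard deviation is used throughout.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the population CV in percent: 100 * sigma / mu."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return 100 * pstdev(values) / mu


# Hypothetical data sets with different units of measure.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(f"heights: {cv_heights:.1f}%")  # heights: 8.3%
print(f"weights: {cv_weights:.1f}%")  # weights: 20.2%
```

Both data sets have the same standard deviation, but relative to their means the weights vary more than the heights.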

Empirical Rule and Chebyshev's Rule

Chebyshev's Rule (any distribution):
- At least 75% of the observations are within 2$\sigma$ of the mean.
- At least 88.9% of the observations are within 3$\sigma$ of the mean.

Empirical Rule (if the distribution is mound-shaped and approximately symmetric):
- About 68% of the observations are within 1$\sigma$ of the mean.
- About 95% of the observations are within 2$\sigma$ of the mean.
- About 99.7% of the observations are within 3$\sigma$ of the mean.
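Chebyshev's bound can be checked against actual data. The sketch below counts the share of the 20 example observations that lie within k standard deviations of the mean and compares it with the general bound 1 - 1/k^2, which gives the 75% and 88.9% figures for k = 2 and k = 3. The function names are illustrative, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev


def proportion_within(values, k):
    """Share of observations within k standard deviations of the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    inside = sum(1 for x in values if abs(x - mu) <= k * sigma)
    return inside / len(values)


def chebyshev_bound(k):
    """Minimum share guaranteed by Chebyshev's Rule for k > 1."""
    return 1 - 1 / k ** 2


scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

for k in (2, 3):
    observed = proportion_within(scores, k)
    print(f"k = {k}: observed {observed:.3f}, bound {chebyshev_bound(k):.3f}")
# k = 2: observed 0.950, bound 0.750
# k = 3: observed 1.000, bound 0.889
```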

Skewness
- A measure of the symmetry of a distribution, or of its departure from symmetry.
- Measured by the coefficient of skewness:

$$SK = \frac{3(\text{Mean} - \text{Median})}{\text{Std. Dev.}}$$

- SK = 0 implies a symmetric distribution.
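The coefficient of skewness can be computed for the 20 example observations with the Python sketch below. Using the population standard deviation is an assumption, since the formula only says "Std. Dev.".

```python
from statistics import mean, median, pstdev


def pearson_skewness(values):
    """Return SK = 3 * (mean - median) / standard deviation."""
    sd = pstdev(values)  # population standard deviation (assumed)
    if sd == 0:
        raise ValueError("SK is undefined when all values are equal")
    return 3 * (mean(values) - median(values)) / sd


scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

sk = pearson_skewness(scores)
print(f"SK = {sk:.2f}")  # SK = -0.84
```

The negative value agrees with the ordering of the averages for this data set: mean (68.55) < median (73) < mode (79).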

Skewness
- SK = 0: symmetric; Mean = Median
- SK > 0: positively skewed (right-tailed); Mode < Md < Mean
- SK < 0: negatively skewed (left-tailed); Mean < Md < Mode

Kurtosis
- Measures the heaviness or lightness of the tails of a distribution (often described as peakedness or flatness).
- Measured by the coefficient of kurtosis:

$$K = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^4}{\sigma^4} - 3$$

Kurtosis
- K > 0: heavy-tailed (peaked)
- K = 0: normal
- K < 0: light-tailed (flat)
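A minimal Python sketch of a kurtosis coefficient follows. It assumes the coefficient is the fourth central moment divided by the squared variance, minus 3 (the population form of excess kurtosis), so that a normal distribution gives a value near 0. The sample data are hypothetical.

```python
def excess_kurtosis(values):
    """Return m4 / m2**2 - 3, where m2 and m4 are central moments."""
    n = len(values)
    if n == 0:
        raise ValueError("kurtosis of an empty data set is undefined")
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # population variance
    m4 = sum((x - mu) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3


# Hypothetical evenly spread data: light tails, so K is negative.
even_spread = [1, 2, 3, 4, 5]
print(f"K = {excess_kurtosis(even_spread):.2f}")  # K = -1.30
```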

Box-Plots
- A graphical representation of the properties of a distribution.
- Uses the 5-number summary [MIN, Q1, Median (Q2), Q3, MAX]; a box-plot is a graph of these five measures.
- A central box spans Q1 and Q3.
- A line in the box marks the median.
- Lines extend from the box out to the smallest (MIN) and largest (MAX) values.
- Best used for side-by-side comparisons of distributions.

Box-Plots
[Figure: side-by-side box plots of Sales for Manila and Cebu, with MIN, Q1, Median, Q3, and MAX marked]

Correlation Coefficient
- A measure that describes the strength of the linear relationship between two quantitative variables (say X and Y, measured on the same individuals).
- Does not denote causality, i.e., cause and effect.
- Also referred to as the Pearson product-moment correlation coefficient.
- Denoted by r.
- Scatterplot: a plot of the X and Y values.

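Scatterplot (Linear Relation)

[Figure: two scatterplots, one showing a positive linear relationship and one showing a negative (inverse) linear relationship]

Scatterplot (Linear Relation)

[Figure: two scatterplots, one showing a stronger linear relationship and one showing a weaker linear relationship]

Scatterplot (Possible non-linear relations)

[Figure: scatterplots showing possible non-linear relations]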

Correlation Coefficient
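$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n}(y_i - \bar{y})^2\right]}} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right]\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right]}}$$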

Correlation Coefficient
- The value of r ranges from -1 to +1; that is, $-1 \le r \le +1$.
- The closer r is to +1 or -1, the stronger the LINEAR relationship.
- The closer r is to 0, the weaker the LINEAR relationship.
- r > 0 indicates a positive (direct) linear relation.
- r < 0 indicates a negative (inverse) linear relation.
- r = 0 does not mean there is no relation; it only indicates no LINEAR relation.
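The sketch below computes r in Python from the deviation form of the formula and applies it to three small hypothetical data sets: a perfect positive linear relation, a perfect negative one, and a quadratic relation for which r is 0 even though Y depends entirely on X.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson product-moment correlation coefficient."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data.
x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))  # 1.0  (positive linear)
print(pearson_r(x, [-v for v in x]))         # -1.0 (negative linear)
print(pearson_r(x, [v ** 2 for v in x]))     # 0.0  (non-linear relation)
```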

Correlation Coefficient
|r|           Qualitative Interpretation
0.0 to 0.2    VERY WEAK linear relation
0.2 to 0.4    WEAK linear relation
0.4 to 0.6    MODERATE linear relation
0.6 to 0.8    STRONG linear relation
0.8 to 1.0    VERY STRONG linear relation
