Intsta1 Lecture3

Descriptive Measures
What are descriptive measures?

Descriptive measures are numerical quantities computed from data (population or sample) that are used to describe the properties of the data. They are helpful in describing and summarizing a data set.
Some Descriptive Measures

Measures of Location
Measures of Dispersion/Spread
Measure of Symmetry (skewness)
Measure of Peakedness (kurtosis)
Measures of Relationships
Measure of Location
A numerical value within the range of the data set which describes its location or position relative to the entire data set.
Measures of Central Tendency
Percentiles
Deciles
Quartiles
Quintiles
Measures of Central Tendency (Averages)

Describe the center of the data set, or the value around which the data tend to cluster.
Arithmetic Mean (Mean, Average)
Median
Mode
Mean
POPULATION MEAN: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
SAMPLE MEAN: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
The total (sum) of all observations in the data set divided by the total number of observations.
Mean (Example)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90
$\bar{x} = \frac{25 + 45 + \cdots + 90}{20} = \frac{1371}{20} = 68.55$
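The mean of the example scores can be checked with a minimal Python sketch. The names `scores` and `mean` are illustrative choices, not part of the lecture.

```python
# Exam scores from the example above.
scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]


def mean(values):
    """Sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


x_bar = mean(scores)
print(sum(scores), len(scores), x_bar)  # 1371 20 68.55
```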
Mean (Properties)
Applicable only to QUANTITATIVE variables.
All observations contribute to the mean value.
Easily affected by extreme values.
Amenable to further mathematical manipulation.
The total deviation of all observations from the mean is equal to zero:
$\sum_{i=1}^{N}(x_i - \mu) = 0$ and $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$
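Two of the properties above (zero total deviation, sensitivity to extreme values) can be illustrated with the example scores. This is only a sketch: the replacement value 900 is a made-up outlier, and the variable names are illustrative.

```python
import math

scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]
x_bar = sum(scores) / len(scores)

# Deviation of each observation from the mean.
deviations = [x - x_bar for x in scores]
total_deviation = sum(deviations)

# Floating-point arithmetic can leave a tiny rounding error,
# so the total is compared with zero using a tolerance.
print(math.isclose(total_deviation, 0.0, abs_tol=1e-9))  # True

# Hypothetical outlier: replace the largest score (90) with 900.
with_outlier = scores[:-1] + [900]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print(x_bar, mean_with_outlier)  # 68.55 109.05
```

A single extreme value moves the mean from 68.55 to 109.05, even though the other 19 scores are unchanged.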
Weighted Mean
In many situations, a numerical value is attached to each observation to denote its importance relative to the other observations. Such values are called weights. If some values are more important than others, the resulting average, called the weighted mean, is given as
$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
where the weights $w_i$ are numerical values associated with each observation, representing its relative importance. The larger the weight, the more important the observation.
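A minimal Python sketch of the weighted mean follows. The grades and credit units are made-up illustration data, and `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical example: course grades weighted by credit units.
grades = [85, 90, 78]
units = [3, 4, 2]
print(round(weighted_mean(grades, units), 2))  # 85.67

# With equal weights, the weighted mean reduces to the ordinary mean.
print(round(weighted_mean(grades, [1, 1, 1]), 2))  # 84.33
```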
Weighted Mean
One particular type of weighted mean is the GROUP MEAN: the mean value computed from GROUPED DATA, that is, data summarized in a frequency distribution table. Here the $x_i$ are the class marks and the weights are the class frequencies $f_i$. That is,
$\bar{x}_G = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$
where k is the number of classes.
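The group mean can be sketched in Python as below. The frequency table (class marks and frequencies) is made up for illustration and does not come from the lecture.

```python
def grouped_mean(class_marks, frequencies):
    """Weighted mean with class marks as values and frequencies as weights."""
    if len(class_marks) != len(frequencies):
        raise ValueError("each class mark needs exactly one frequency")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("the frequencies must not sum to zero")
    return sum(f * x for f, x in zip(frequencies, class_marks)) / total


# Hypothetical frequency distribution table with k = 4 classes.
class_marks = [24.5, 34.5, 44.5, 54.5]
frequencies = [3, 5, 8, 4]
print(grouped_mean(class_marks, frequencies))  # 41.0
```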
Median
A value that divides an array into two equal parts. (An array is a data set arranged in either increasing or decreasing order.)
Denoted by Md.
Middle value of the array: about 50% of the observations are less than the median and about 50% are greater than it.
Median (Computation)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90
Even number of observations (N = 20): the middle lies between the 10th (N/2) and 11th (N/2 + 1) values in the array. In this case, the median is between 72 and 74.
Median: midpoint of these two values.
$Md = \frac{72 + 74}{2} = 73$
Median (Computation)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90
Odd number of observations (N = 21): the middle value is the 11th ((N+1)/2) value in the array. In this case, the median is 72.
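Both cases (even and odd number of observations) can be sketched with one small Python function. The function name `median` and the variable names are illustrative.

```python
def median(values):
    """Middle value of the sorted array; midpoint of the two middle values if N is even."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    array = sorted(values)
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        return array[mid]                      # the (N+1)/2-th value
    return (array[mid - 1] + array[mid]) / 2   # midpoint of the N/2-th and (N/2 + 1)-th values


even_case = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
             74, 77, 78, 79, 79, 79, 82, 84, 88, 90]
odd_case = even_case + [72]  # the 21-observation example

print(median(even_case))  # 73.0
print(median(odd_case))   # 72
```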
Median (Properties)
Applicable for quantitative variables.
Not affected by extreme values. Thus, if the data set contains extreme values (either very large or very small), the median is the preferred measure of central tendency.
Not amenable to further mathematical manipulation.
Mode
A value or values in the data set that appear most frequently.
Denoted by Mo.
May or may not exist. If it exists, it may not be unique.
Can be determined for both qualitative and quantitative variables.
Mode (Example)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90
Mo = 79
Remark: A modal value of 0 is different from a data set with no mode. When no value occurs more often than the others, we say the mode does not exist or there is no mode.
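The distinction between "mode equal to 0" and "no mode" can be sketched in Python. One assumption is made here: a data set in which every value occurs exactly once is treated as having no mode, and ties for the highest count are all reported. Other conventions exist.

```python
from collections import Counter


def modes(values):
    """Return all most frequent values; an empty list means there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: every value occurs once, so no mode exists.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

print(modes(scores))                   # [79]
print(modes([0, 0, 1, 2]))             # [0]  (the mode exists and equals 0)
print(modes([1, 2, 3]))                # []   (no mode)
print(modes(["red", "blue", "red"]))   # ['red']  (qualitative data)
```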
Percentiles
Numbers that divide the data set into 100 equal parts.
The jth percentile, denoted by Pj, is the number (not necessarily an observation) that separates the bottom j% from the top (100 - j)%.
Best applied to large data sets (at least 100 observations).
Percentiles (Example)
Suppose 10,000 students took a college entrance examination, and the 30th percentile score in mathematics is 78% (P30 = 78). This means that 30% of the examinees (3,000) scored below 78% and 70% (7,000) scored above 78%.
P50 = Median
Deciles
Numbers which divide the array into ten equal parts.
Denoted by Di, i = 1, 2, ..., 10.
D1 = P10, D2 = P20, ..., D5 = P50 = Median
Quartiles and Quintiles

Quartiles are numbers that divide the array into 4 equal parts, whereas quintiles divide it into 5 equal parts.
Quartiles: Q1 = P25, Q2 = P50 = Median, Q3 = P75
The first quartile is the median of the observations in the array whose values are between the smallest value and the median of the entire data set.
The third quartile is the median of the observations in the array whose values are greater than the median of the entire data set.
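The "median of each half" description of Q1 and Q3 can be sketched in Python as follows. This assumes the halves are split by position and that the middle observation is left out of both halves when N is odd. Other quartile conventions (including those used by statistical software) can give slightly different values.

```python
def median_sorted(array):
    """Median of an already sorted, non-empty list."""
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        return array[mid]
    return (array[mid - 1] + array[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    if len(values) < 2:
        raise ValueError("at least two observations are needed")
    array = sorted(values)
    n = len(array)
    lower_half = array[: n // 2]          # values below the median position
    upper_half = array[(n + 1) // 2 :]    # values above the median position
    return median_sorted(lower_half), median_sorted(array), median_sorted(upper_half)


scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]
print(quartiles(scores))  # (58.5, 73.0, 79.0)
```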
Measures of Dispersion
A single value which measures the spread or variability of the observations in a given data set.
The larger the value, the more dispersed the data set.
A measure equal to zero indicates no variation.
Measures:
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation
Range
The range, denoted by R, is the difference between the largest and smallest values in the data set.
R = MAX - MIN
Quick but rough (crude) measure of dispersion.
Interquartile Range (IQR)

Difference between the 3rd and 1st quartile values.
[Figure: number line marking MIN, Q1, MEDIAN, Q3, MAX]
IQR = Q3 - Q1
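The range and the IQR of the exam scores used in the earlier examples can be computed with a short Python sketch. It assumes Q1 and Q3 are taken as the medians of the lower and upper halves of the array; a different quartile convention would change the IQR slightly.

```python
scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]


def median_sorted(array):
    """Median of an already sorted, non-empty list."""
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        return array[mid]
    return (array[mid - 1] + array[mid]) / 2


array = sorted(scores)
n = len(array)

# Range: largest value minus smallest value.
data_range = array[-1] - array[0]

# Assumed convention: Q1 and Q3 are the medians of the lower and upper halves.
q1 = median_sorted(array[: n // 2])
q3 = median_sorted(array[(n + 1) // 2 :])
iqr = q3 - q1

print(data_range, q1, q3, iqr)  # 65 58.5 79.0 20.5
```

The range uses only the two extreme scores, while the IQR describes the spread of the middle half of the data.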
Variance
Spread is measured in terms of the average squared deviation of the observations from the mean. The farther the observations are from the mean, the larger the variance and hence the larger the spread of the data set.
POPULATION VARIANCE: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
SAMPLE VARIANCE: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Variance (Example)
25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90
$\sigma^2 = \frac{1}{20}\left[(25 - 68.55)^2 + (45 - 68.55)^2 + \cdots + (90 - 68.55)^2\right] = 251.1475$
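The variance of the example scores can be reproduced with a minimal Python sketch. The function names are illustrative. The example treats the 20 scores as a population (divisor N); the sample version (divisor n - 1) is shown for comparison.

```python
scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]


def population_variance(values):
    """Average squared deviation from the mean (divisor N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


print(round(population_variance(scores), 4))  # 251.1475
print(round(sample_variance(scores), 4))      # 264.3658
```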
Weighted Variance
The weighted variance is defined as
$s_w^2 = \frac{\sum_{i=1}^{n} w_i (x_i - \bar{x}_w)^2}{\sum_{i=1}^{n} w_i}$
Weighted Variance
For GROUPED DATA
$s_G^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x}_G)^2}{\sum_{i=1}^{k} f_i}$
Standard Deviation
(Positive) square root of the variance.
Denoted by σ (population) or s (sample).
Most common measure of variation.
Has the same unit as the data set.
A measure of spread about the mean.
Affected by extreme values.
Standard Deviation (Example)

25, 45, 50, 55, 55, 62, 64, 66, 67, 72, 74, 77, 78, 79, 79, 79, 82, 84, 88, 90
$\sigma^2 = \frac{1}{20}\left[(25 - 68.55)^2 + (45 - 68.55)^2 + \cdots + (90 - 68.55)^2\right] = 251.1475$
$\sigma = \sqrt{251.1475} \approx 15.85$
On average, the exam scores deviate from the mean score of 68.55 by about 15.85.
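The standard deviation in this example can be checked in Python, both from the definition and against the standard library's `statistics.pstdev` (population) and `statistics.stdev` (sample). The variable names are illustrative.

```python
import math
import statistics

scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]

# Population standard deviation from the definition.
mu = sum(scores) / len(scores)
sigma_squared = sum((x - mu) ** 2 for x in scores) / len(scores)
sigma = math.sqrt(sigma_squared)
print(round(sigma, 2))  # 15.85

# Cross-check against the standard library.
print(math.isclose(sigma, statistics.pstdev(scores)))  # True

# Sample standard deviation (divisor n - 1) for comparison.
s = statistics.stdev(scores)
print(round(s, 2))  # 16.26
```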
Coefficient of Variation (CV)

Ratio of the standard deviation to the mean, usually expressed in percent.
Measures the variability of the data set relative to the mean.
Unitless quantity, which makes it more appropriate for comparing the variability of two data sets having different units of measure.
$CV(\%) = \frac{\sigma}{\mu} \times 100$ (population), $CV(\%) = \frac{s}{\bar{x}} \times 100$ (sample)
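As a rough illustration, the CV of the exam scores from the earlier examples can be computed in Python. This sketch uses the population standard deviation; `coefficient_of_variation` is a hypothetical helper name.

```python
import math

scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean, in percent."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sd / mean * 100


cv = coefficient_of_variation(scores)
print(round(cv, 2))  # 23.12
```

Because the units cancel, multiplying every score by a constant (for example, converting to a different scale) leaves the CV unchanged.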
Empirical or Chebyshev's Rule

Chebyshev's rule (any distribution):
At least 75% of the observations are within 2σ of the mean.
At least 88.9% of the observations are within 3σ of the mean.
Empirical rule: if the distribution is mound-shaped and approximately symmetric, then
About 68% of the observations are within 1σ of the mean.
About 95% of the observations are within 2σ of the mean.
About 99.7% of the observations are within 3σ of the mean.
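Chebyshev's bound can be checked against the exam scores from the earlier examples. The sketch below assumes the population standard deviation and uses the general form of the bound, 1 - 1/k², which gives 75% for k = 2 and about 88.9% for k = 3. The function names are illustrative.

```python
import math

scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]


def proportion_within(values, k):
    """Proportion of observations within k standard deviations of the mean."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    inside = sum(1 for x in values if abs(x - mu) <= k * sigma)
    return inside / n


def chebyshev_bound(k):
    """Minimum proportion guaranteed by Chebyshev's rule (for k > 1)."""
    return 1 - 1 / k ** 2


for k in (2, 3):
    print(k, proportion_within(scores, k), round(chebyshev_bound(k), 3))
# 2 0.95 0.75
# 3 1.0 0.889
```

For these scores, 95% fall within 2σ and 100% within 3σ, so both Chebyshev bounds are satisfied.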
Skewness
A measure of the symmetry of a distribution, or of its departure from symmetry.
Measured by the coefficient of skewness:
$SK = \frac{3(\text{Mean} - \text{Median})}{\text{Std. Dev.}}$

SK=0 implies symmetric distribution.
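The coefficient of skewness can be computed for the exam scores with a short Python sketch. It assumes the population standard deviation is used in the denominator; using the sample standard deviation would change the value slightly.

```python
import math
import statistics

scores = [25, 45, 50, 55, 55, 62, 64, 66, 67, 72,
          74, 77, 78, 79, 79, 79, 82, 84, 88, 90]


def skewness_coefficient(values):
    """SK = 3 * (mean - median) / standard deviation."""
    mean = sum(values) / len(values)
    med = statistics.median(values)
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    if sd == 0:
        raise ValueError("SK is undefined when there is no variation")
    return 3 * (mean - med) / sd


sk = skewness_coefficient(scores)
print(round(sk, 2))  # -0.84
```

The mean (68.55) is below the median (73), so SK is negative: a few low scores pull the mean to the left.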
Skewness
SK = 0: Symmetric, Mean = Median
SK > 0: Positively skewed (right-tailed), Mode < Md < Mean
SK < 0: Negatively skewed (left-tailed), Mean < Md < Mode
Kurtosis
Measures the heaviness or lightness of the tails of a distribution (flatness or peakedness).
Measured by the coefficient of kurtosis:
$K = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^4}{\sigma^4} - 3$
Kurtosis
K > 0: heavy-tailed (peaked)
K = 0: Normal
K < 0: light-tailed (flat)
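A coefficient of kurtosis can be sketched in Python under one stated assumption: the common population definition, the fourth central moment divided by σ⁴, minus 3 (so that a normal distribution gives K = 0). The function name and the sample data are illustrative.

```python
def excess_kurtosis(values):
    """Fourth central moment over the squared variance, minus 3 (assumed definition)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n   # population variance
    m4 = sum((x - mu) ** 4 for x in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when there is no variation")
    return m4 / m2 ** 2 - 3


# Evenly spread values: flatter than a normal distribution, so K < 0.
print(round(excess_kurtosis([1, 2, 3, 4, 5]), 4))  # -1.3
```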
Box-Plots
Graphical representation of the properties of a distribution.
Uses the five-number summary [MIN, Q1, Median (Q2), Q3, MAX]; a box-plot is a graph of these five values.
A central box spans Q1 and Q3.
A line in the box marks the median.
Lines extend from the box out to the smallest (MIN) and largest (MAX) values.
Best used for side-by-side comparisons of distributions.
Box-Plots
[Figure: side-by-side box plots of Sales for Manila and Cebu, with MIN, Q1, Median, Q3, and MAX marked]
Correlation Coefficient
A measure that describes the strength of the linear relationship between two quantitative variables (say X and Y, measured on the same individuals).
Does not denote causality, i.e. cause and effect.
Also referred to as the Pearson product-moment correlation.
Denoted by r.
Scatterplot: a plot of the X and Y values.
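Scatterplot (Linear Relation)
[Figure: two scatterplots, one showing a positive linear relationship and one showing a negative (inverse) linear relationship]
Scatterplot (Linear Relation)
[Figure: two scatterplots, one showing a stronger linear relationship and one showing a weaker linear relationship]
Scatterplot (Possible non-linear relations)
[Figure: scatterplots of possible non-linear relations]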
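$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n}(y_i - \bar{y})^2\right]}} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right]\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right]}}$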
The value of r ranges from -1 to +1, that is, -1 ≤ r ≤ +1.
The closer r is to +1 (or -1), the stronger the LINEAR relationship.
The closer r is to 0, the weaker the LINEAR relationship.
r > 0 indicates a positive (direct) linear relation.
r < 0 indicates a negative (inverse) linear relation.
r = 0 does not necessarily indicate no relation; X and Y may still be related non-linearly.
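The Pearson correlation coefficient can be sketched in Python from its deviation form. All data below are made up for illustration, and `pearson_r` is a hypothetical helper name. The last example shows a perfect non-linear (quadratic) relation for which r is 0.

```python
import math


def pearson_r(xs, ys):
    """Sum of cross-deviations over the root of the product of squared deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable has no variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations.
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746

# Perfect negative linear relation: y = 10 - 2x.
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 4))  # -1.0

# Perfect quadratic relation y = x^2: strongly related, yet r = 0.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0
```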
|r|          Qualitative Interpretation
0.0 - 0.2    VERY WEAK linear relation
0.2 - 0.4    WEAK linear relation
0.4 - 0.6    MODERATE linear relation
0.6 - 0.8    STRONG linear relation
0.8 - 1.0    VERY STRONG linear relation
