Chapter 3 Overheads

NUMERICAL DESCRIPTIVE MEASURES
Wish to describe data using summary statistics

1. Measures of Central Tendency (what is the middle of the data?)
Three Options: Arithmetic Mean, Median, Mode
(ignore geometric mean in text)
1.a. Arithmetic Mean
Most common measure of central tendency: the average
When referring to sample values, denoted as $\bar{X}$
When referring to population values, denoted as $\mu$

Sample mean
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
where n is sample size

Population Mean
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
where N is population size

Example: raw data: 1 3 5 7 9
mean = (1+3+5+7+9)/5 = 5
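As a quick check of the sample mean formula, here is a minimal Python sketch that recomputes the example above. The variable names are illustrative only and do not come from the text.

```python
from statistics import mean

# Raw data from the example above.
data = [1, 3, 5, 7, 9]

# Sample mean: sum of the observations divided by the sample size n.
manual_mean = sum(data) / len(data)

# The standard library function should give the same value.
library_mean = mean(data)

print(manual_mean, library_mean)
```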
Arithmetic Mean is affected by outliers
[Figure: two dot plots. Axis from 0 to 10: Mean = 5. Axis from 0 to 14: Mean = 6.]
1. b. Median
Raw data is arrayed in ascending order. The MEDIAN is the
middle of the data, i.e., a value such that half of the observations
lie below and half lie above the value.
If n or N is odd, the median is the middle number
If n or N is even, the median is the average of the two middle
numbers.
Example: a sample of 10 house prices (in thousands) yields:
144 98 204 177 155 316 100 177 177 170
To find the median, arrange in ascending order:
98 100 144 155 170 177 177 177 204 316
Since there is an even number of observations: median = (170+177)/2 = 173.5
Note: arithmetic mean = 171.8
Median is often preferred when data are skewed, since it is not
affected by extreme values:
Ex. Housing prices: 98 100 100 102 110 125 350
Median = 102
Mean = 140.7
[Figure: two dot plots. Axis from 0 to 10: Median = 5. Axis from 0 to 14: Median = 5.]
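The two house price examples above can be recomputed with a short Python sketch. This is only an illustration using the standard library; the variable names are assumptions, not from the text.

```python
from statistics import mean, median

# House prices (in thousands) from the two examples above.
sample_of_ten = [144, 98, 204, 177, 155, 316, 100, 177, 177, 170]
skewed_prices = [98, 100, 100, 102, 110, 125, 350]

# median() sorts the data itself; with an even number of
# observations it averages the two middle values.
median_ten = median(sample_of_ten)
mean_ten = mean(sample_of_ten)

# With an odd number of observations the median is the middle value.
median_skewed = median(skewed_prices)
mean_skewed = round(mean(skewed_prices), 1)

print(median_ten, mean_ten)        # 173.5 171.8
print(median_skewed, mean_skewed)  # 102 140.7
```

In the second data set the single price of 350 pulls the mean well above the median, which stays at a typical price.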
1. c. Mode
Value that occurs most often
There may be no mode or there may be more than one mode.
Unaffected by extreme values

Ex. In the sample of 10 house prices above, mode = 177
[Figure: two dot plots. Axis from 0 to 14: Mode = 9. Axis from 0 to 6: No Mode.]
Mode is infrequently used, but has applications in quality control.
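A minimal Python sketch of finding the mode is given here. The helper `modes_or_none` is hypothetical, and it assumes the convention that there is no mode when every distinct value occurs equally often; other conventions exist.

```python
from collections import Counter

def modes_or_none(values):
    """Return the list of modes, or [] if all distinct values tie (hypothetical helper)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Assumed convention: if several distinct values all occur
    # equally often, report "no mode".
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return [v for v, c in counts.items() if c == top]

# Sample of 10 house prices (in thousands) from the median example.
house_prices = [144, 98, 204, 177, 155, 316, 100, 177, 177, 170]

print(modes_or_none(house_prices))     # [177]
print(modes_or_none([1, 3, 5, 7, 9]))  # [] (no mode)
print(modes_or_none([1, 1, 2, 2, 3]))  # [1, 2] (more than one mode)
```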
Which is the best measure to use? It depends.
If histogram of data is symmetric about the middle, all three
measures give similar results.
If histogram is skewed right or left, measures will differ.
Choose in context. The remainder of the course will focus on the mean
(e.g., what is the mean income of Canadians? what is the mean
life expectancy of a component?) and how to estimate the
population mean using the sample mean.
2. Measures of Variation/Dispersion
Need a single number describing how spread out the data are.
Many options: discuss range, MAD, Variance
2. a. Range: simply distance between highest and lowest values
Not very informative: ignores all but two observations:
Ex. 1, 9, 9.5, 9.5, 10: Range = 9
Ex. 1, 3, 7, 9, 10: Range = 9
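The point that the range ignores all but two observations can be seen in a short Python sketch. The function name `value_range` is an illustrative assumption.

```python
def value_range(values):
    """Distance between the highest and lowest values."""
    return max(values) - min(values)

# The two data sets from the examples above.
clustered = [1, 9, 9.5, 9.5, 10]
spread_out = [1, 3, 7, 9, 10]

# Both give the same range even though the data are distributed
# very differently between the two extremes.
print(value_range(clustered), value_range(spread_out))  # 9 9
```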
2. b. Interquartile range (ignore)

2. c. Mean Absolute Deviation (MAD)
Not in text, but a simple idea: measure how spread out the
data are from the middle by reporting the average distance
from the middle to each observation. Let the mean be the
measure of the middle.
Ex. 4 observations 3 4 5 6
Mean = (3+4+5+6)/4 = 4.5
Observation 1 lies 1.5 units away from mean
Obs. 2 lies 0.5 units away
Obs. 3 lies 0.5 units away
Obs. 4 lies 1.5 units away.
Therefore, average distance away from mean is
(1.5+0.5+0.5+1.5)/4 = 1
This measure is the Mean Absolute Deviation (MAD). It can be
computed using the formula (for population)
$$\text{MAD} = \frac{\sum_{i=1}^{N} |X_i - \mu|}{N}$$
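The MAD calculation for the four observations above can be sketched in Python as follows. The function name is an illustrative assumption; the sketch also shows why the absolute value is needed.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each observation from the mean."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)

observations = [3, 4, 5, 6]
mu = sum(observations) / len(observations)

# Without absolute values the deviations cancel out.
sum_of_deviations = sum(x - mu for x in observations)

mad = mean_absolute_deviation(observations)

print(sum_of_deviations)  # 0.0
print(mad)                # 1.0
```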
Absolute value operators are difficult to work with, but are
required, since the simple mean of deviations will always equal
zero. An alternative to absolute value operators is to square
the deviations before adding and dividing by the number of
observations, so that positives do not cancel negatives.
This leads to an important concept in the course:
2. d. Variance
Defined: the average squared deviation from the mean.
Computed differently in population than in sample.
Population Variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Sample Variance:
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
Example of calculation:
Observation   Deviation from Mean   Squared Deviation
3             -1.5                  2.25
4             -0.5                  0.25
5              0.5                  0.25
6              1.5                  2.25
Total                               5
Therefore population variance = 5/4 = 1.25
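The calculation above, and the difference between dividing by N and by n - 1, can be checked with a minimal Python sketch using the standard library. Treating the same four observations as a sample is an assumption made purely for comparison.

```python
from statistics import pvariance, variance

observations = [3, 4, 5, 6]
mu = sum(observations) / len(observations)

# Sum of squared deviations from the mean (the "Total" row).
total_squared_deviation = sum((x - mu) ** 2 for x in observations)

# Population variance divides by N.
population_variance = pvariance(observations)

# Sample variance divides by n - 1, so it is slightly larger.
sample_variance = variance(observations)

print(total_squared_deviation)  # 5.0
print(population_variance)      # 1.25
print(round(sample_variance, 3))
```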

Interpretation of variance is difficult, since it is measured in squared units.
The most frequently used measure of variation/dispersion is the
STANDARD DEVIATION - square root of variance
In Population: $\sigma = \sqrt{\sigma^2}$
In Sample: $s = \sqrt{s^2}$
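As a sketch, the standard deviation of the four observations used in the variance example (3, 4, 5, 6) can be computed in Python as the square root of the variance. Treating the data first as a population and then as a sample is an assumption made for illustration.

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance, stdev, variance

observations = [3, 4, 5, 6]

# Population standard deviation: square root of the population variance.
sigma = pstdev(observations)

# Sample standard deviation: square root of the sample variance.
s = stdev(observations)

# Both agree with taking the square root of the variance directly.
print(isclose(sigma, sqrt(pvariance(observations))))  # True
print(isclose(s, sqrt(variance(observations))))       # True
print(round(sigma, 3), round(s, 3))
```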
[Figure: dot plots of three data sets on a common axis from 11 to 21, each with Mean = 15.5. Data A: s = 3.338. Data B: s = 0.9258. Data C: s = 4.57.]
The magnitude of the Standard Deviation is meaningful.
Standard Deviation uses original units of measurement.

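Value of standard deviation interpreted using the Empirical
Rule: If the data are fairly symmetric about the mean (i.e.,
histogram is bell-shaped), then the interval
$\mu \pm 1\sigma$ contains approximately 68% of the observations
$\mu \pm 2\sigma$ contains approximately 95% of the observations
$\mu \pm 3\sigma$ contains approximately 99% of the observations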
Ex. Suppose final grades in a class of 150 statistics students are
distributed in a symmetric manner about the mean, with $\mu = 72$
and $\sigma = 9$. Does this indicate a wide variation in grades?
In order to capture approximately 68% of final grades, we require
a range from 72 - 9 = 63 to 72 + 9 = 81,
and approximately 95% of students will have grades between
72 - 2 x 9 = 54 and 72 + 2 x 9 = 90.
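The grade intervals above can be reproduced with a small Python sketch. The function `empirical_rule_intervals` is a hypothetical helper, and the coverage percentages are approximations that only apply when the data are roughly bell-shaped.

```python
def empirical_rule_intervals(mu, sigma):
    """Return {k: (lower, upper)} for k = 1, 2, 3 standard deviations (hypothetical helper)."""
    return {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}

# Approximate share of observations within k standard deviations,
# valid only for fairly symmetric, bell-shaped data.
approx_coverage = {1: "68%", 2: "95%", 3: "99%"}

# Final grades example: mean 72, standard deviation 9.
intervals = empirical_rule_intervals(72, 9)

for k, (lower, upper) in intervals.items():
    print(f"about {approx_coverage[k]} of grades between {lower} and {upper}")
```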

2.e. Coefficient of Variation
We may wish to compare the dispersion of two sets of data. If
they have different means or are measured in different units, the
standard deviations are not comparable. In these cases, use
measures of dispersion relative to the mean
CV = (Standard Deviation/Mean) x 100%
where standard dev. and mean can either be pop. or sample
Note: measured in percentages
Ex.
Stock A: mean price last year = $50
         Std. dev. of prices last year = $5
Stock B: mean price last year = $100
         Std. dev. of prices last year = $5

Calculate CVs to compare degree of risk:
Stock A: CV = (5/50) x 100% = 10%
Stock B: CV = (5/100) x 100% = 5%
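The stock comparison can be sketched in Python as follows. The function name is an illustrative assumption.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation relative to the mean, as a percentage."""
    return 100 * std_dev / mean_value

# Stock A: mean price $50, std. dev. $5.
cv_a = coefficient_of_variation(5, 50)

# Stock B: mean price $100, std. dev. $5.
cv_b = coefficient_of_variation(5, 100)

# Same standard deviation, but Stock A varies more relative to its mean.
print(cv_a, cv_b)  # 10.0 5.0
```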
3. Measures of Shape
Measures of skewness indicate shape of distribution.
Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
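A rough Python sketch of reading skew direction from the mean and median follows. The helper `skew_direction` is hypothetical, and comparing the mean with the median is only a rule of thumb, not a formal measure of skewness.

```python
from statistics import mean, median

def skew_direction(values):
    """Rule-of-thumb skew direction from mean vs. median (hypothetical helper)."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"
    if m < med:
        return "left-skewed"
    # Exact equality is rare in real data; in practice one would
    # treat "close" values as roughly symmetric.
    return "roughly symmetric"

# Housing prices from the median example: one very high price.
house_prices = [98, 100, 100, 102, 110, 125, 350]

print(skew_direction(house_prices))     # right-skewed
print(skew_direction([1, 3, 5, 7, 9]))  # roughly symmetric
```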
4. Measures of Correlation
Require measure of how closely correlated two variables are.
Coefficient of Correlation indicates type and strength of linear
relationship in bivariate data.
Denoted r
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
Coefficient of correlation is unit free, ranges from -1 to +1

-1 indicates perfect negative linear relationship
+1 indicates perfect positive linear relationship
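A minimal Python sketch of the coefficient of correlation is given here. The function name and the small data sets are hypothetical, chosen only so that the relationships are exactly linear.

```python
from math import isclose, sqrt

def correlation(xs, ys):
    """Sample coefficient of correlation r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, not from the text.
x = [1, 2, 3, 4]
y_increasing = [2, 4, 6, 8]  # y rises exactly in line with x
y_decreasing = [8, 6, 4, 2]  # y falls exactly in line with x

r_pos = correlation(x, y_increasing)
r_neg = correlation(x, y_decreasing)

print(isclose(r_pos, 1.0), isclose(r_neg, -1.0))  # True True
```

Because the deviations are divided by quantities in the same units, r does not change if x or y is rescaled, which is why it is unit free.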
[Figure: five scatter plots illustrating r = -1, r = -0.6, r = 0, r = 0.6 and r = 1.]