Measures of Variation

From our raw data, we were able to calculate a measure of central location. Although we
found five measures of central location, we shall, for the remainder of this course,
concentrate only on the arithmetic mean.

Having found the mean of our data set, we can now proceed to calculate a statistic that
measures how much our observed data vary around the mean.

Suppose that we had two sections of students (X & Y) taking an exam graded out of ten.

Observation     X     Y

1               7     6
2               9    10
3               6     6
4               9     4
5               4     2
6               7     8
7               5    10
8               8     6
9               8     9
10              7     9

Σ              70    70

\[
\bar{X} = \frac{\sum X}{n} = 7, \qquad \bar{Y} = \frac{\sum Y}{n} = 7
\]

The mean of both data sets is 7, yet closer inspection reveals that there is greater variation
in data set Y than in data set X. For starters, the top score in X is 9, and the low score is
4. By contrast the high and low scores in Y are 10 and 2 respectively.
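
As a quick check of the comparison above, here is a minimal Python sketch that recomputes the two means and the high/low scores. The helper names `mean` and `score_range` are illustrative choices, not part of the original notes.

```python
# Exam scores for sections X and Y, copied from the table above.
x = [7, 9, 6, 9, 4, 7, 5, 8, 8, 7]
y = [6, 10, 6, 4, 2, 8, 10, 6, 9, 9]


def mean(data):
    """Arithmetic mean: the sum of the observations divided by their count."""
    return sum(data) / len(data)


def score_range(data):
    """Range: the highest observation minus the lowest."""
    return max(data) - min(data)


mean_x, mean_y = mean(x), mean(y)
range_x, range_y = score_range(x), score_range(y)

print(f"X: mean = {mean_x}, low = {min(x)}, high = {max(x)}, range = {range_x}")
print(f"Y: mean = {mean_y}, low = {min(y)}, high = {max(y)}, range = {range_y}")
```

The range uses only two observations from each section, which is exactly the weakness the next paragraph points out.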

But this is hardly rigorous. What we need is a statistic that is calculated from as much of
our data as possible, not merely the high and low scores.

The statistics of choice will be the Variance, the Standard Deviation and the Coefficient
of Variation.

But before I start throwing equations around the shop, I need to sell you on the idea of
why we should be interested in a measure of variation. What does it tell us?

Consider first meteorology. The average July temperature in Buffalo, NY is 69°F; the
average July temperature in Seattle, WA is also 69°F. However, the average January
temperature in Buffalo is 25°F, while in Seattle it’s a balmy 41°F. The two cities look
identical in July, yet Buffalo’s temperature varies far more over the year.

In finance, the riskiness of an asset is measured by the standard deviation (variation) of
its price over a period of time (say, 250 days). Indeed, there is a relationship between an
asset’s return and its riskiness: assets with low risk (such as US T-Bills) also have low
returns, while assets with higher risk have higher returns (for example, the stock of
Google). Even in the universe of stocks, some are considered stable (General Electric and
IBM, to name but two), while others are considered volatile, such as the stocks in
the bio-technology sector. For those of you who might be interested, the translation of the
word risk into Mandarin Chinese is ? ? . It means risk but opportunity, not just plain risk.

In sports, the idea of variation is pegged to consistency. For example in baseball, the
closer might not necessarily be the best pitcher on a team, but he’s probably the most
consistent. In golf, major tournaments are decided over four rounds. The winner is rarely
the golfer who scored the lowest round in the tournament, but is definitely the most
consistent.

Anyhow, the variation found in a data set is measured in the following way.

\[
\text{Variance} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}
\]

We subtract the mean from each observation, square each deviation, and sum the squares. We
then divide by the number of observations minus 1. We have to square the deviations from
the mean because the simple sum of the deviations is always zero. The reason we
divide by ‘n-1’ rather than ‘n’ is a bit tricky, but I’ll deal with that later. Let’s have a look
at the X data from the table above, where we found the mean to be 7.

Observation     X     (X − X̄)     (X − X̄)²

1               7        0            0
2               9        2            4
3               6       -1            1
4               9        2            4
5               4       -3            9
6               7        0            0
7               5       -2            4
8               8        1            1
9               8        1            1
10              7        0            0

Σ              70        0           24

Therefore our variance = 24 ÷ 9 = 2⅔, or 2.6667

We can repeat the process for our Y data and will find that its variance is 7.1111, or seven
and one ninth.
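
The calculation above can be sketched in a few lines of Python. This is only an illustration of the deviation method described in the text; the function name `sample_variance` is an assumption made for this example.

```python
# Exam scores for sections X and Y, copied from the first table.
x = [7, 9, 6, 9, 4, 7, 5, 8, 8, 7]
y = [6, 10, 6, 4, 2, 8, 10, 6, 9, 9]


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    deviations = [value - mean for value in data]
    # The raw deviations cancel out, which is why they must be squared.
    assert abs(sum(deviations)) < 1e-9
    sum_of_squares = sum(d ** 2 for d in deviations)
    return sum_of_squares / (n - 1)


var_x = sample_variance(x)
var_y = sample_variance(y)

print(f"Variance of X = {var_x:.4f}")
print(f"Variance of Y = {var_y:.4f}")
```

For the Y data the squared deviations sum to 64, so the variance is 64 ÷ 9.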

So, the variance of the Y data is larger than the variance of the X data, which is what we
expected when we first looked at the data. However, what we now want is a way to
interpret 2.6667 and 7.1111: what do those numbers mean? The answer is not
immediately obvious. We had to square the deviations from the mean in order to ensure
that they did not sum to zero, but in doing so we inflated each deviation and left the
variance in squared units (here, squared exam points).

The obvious thing to do would be to somehow undo the squaring, by taking the square
root of the variance. This statistic is called the Standard Deviation.

\[
\text{Standard Deviation} = \left[ \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \right]^{1/2}
\]

So the standard deviation of X = (2.6667)^(1/2) = 1.6330

And the standard deviation of Y = (7.1111)^(1/2) = 2.6667
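
These two results can be cross-checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the same (n − 1) divisor as the formula above. The variable names are illustrative only.

```python
import math
import statistics

# Exam scores for sections X and Y, copied from the first table.
x = [7, 9, 6, 9, 4, 7, 5, 8, 8, 7]
y = [6, 10, 6, 4, 2, 8, 10, 6, 9, 9]

# statistics.variance and statistics.stdev are the *sample* versions (n - 1).
var_x, var_y = statistics.variance(x), statistics.variance(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

# The standard deviation is simply the square root of the variance.
assert math.isclose(sd_x, math.sqrt(var_x))
assert math.isclose(sd_y, math.sqrt(var_y))

print(f"X: variance = {var_x:.4f}, standard deviation = {sd_x:.4f}")
print(f"Y: variance = {var_y:.4f}, standard deviation = {sd_y:.4f}")
```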

Note: The fact that the standard deviation of Y happened to be the variance of X was
purely accidental. The data sets are independent of each other.

Now we can interpret the standard deviation.

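Standard Deviation: The standard deviation is, roughly speaking, the typical deviation
of the individual observations from their mean. Strictly, it is the square root of the
average squared deviation, with (n − 1) in place of n.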

For practical purposes, we are only really interested in the standard deviation; the fact that
we have to calculate the variance first is neither here nor there.

Notation

The variance of a sample is denoted S².

The standard deviation of a sample is denoted S.



S²    The sample variance is a statistic. It is our best estimate of the population
      variance, which is denoted by σ² (sigma squared). σ² is a parameter.

S     The sample standard deviation is a statistic. It is our best estimate of the
      population standard deviation, which is denoted by σ (sigma). σ is a
      parameter.

Recall that we referred to the mean of a data set as its first moment. The variance
is the second moment about the mean, and the standard deviation is its square root.

Degrees of Freedom

We can now return to the thorny issue of why we divided by (n-1) rather than simply (n).

The short answer is that we lost one degree of freedom, but I would venture to guess that
this fact alone is not particularly helpful.

Formally,

Degrees of freedom are the number of independent pieces of information (our
observations) that are available to estimate another piece of information. More
concretely, the number of degrees of freedom is the number of independent observations
in a sample of data that are available to estimate a parameter of the population from
which that sample is drawn.

Example 1) If we have two observations, when calculating the mean we have two
independent observations; however, when calculating the variance, we
have only one independent observation, since the two observations are
equally distant from the mean.

Example 2) Suppose we have three observations (n = 3). If I tell you that the arithmetic
mean of this data set is five (5), we have lost a degree of freedom. Otherwise
stated, only two of the original three observations can actually vary; the third
has to be fixed and can no longer vary.

I have three volunteers, Tom, Dick and Harry, who are free to choose any
number they wish, but I tell them that the mean of their choices must be 5.

Tom scratches his head and comes up with three (3).

Dick chooses nine (9).

Now Harry, unlike Tom & Dick, cannot choose any number he wants; he
is constrained by the fact that the mean is five (5). The three choices must
sum to 3 × 5 = 15, so he must say three (3), because only three can give us
a mean of five: (3 + 9 + 3)/3 = 5.

Thus, having calculated the mean, we no longer have (n) observations that can vary; we now
only have (n-1), since the last has to be fixed.
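
The constraint in Example 2 can be sketched in Python. This is an illustration only; the function name `forced_last_value` and its parameters are hypothetical, introduced for this example.

```python
def forced_last_value(chosen, target_mean, n):
    """Return the only value the last observation can take.

    Once the mean of n observations is fixed, their total is fixed at
    n * target_mean, so the last value is whatever is left over after
    the first n - 1 free choices.
    """
    if len(chosen) != n - 1:
        raise ValueError("exactly n - 1 values must already be chosen")
    required_total = target_mean * n
    return required_total - sum(chosen)


# Tom picks 3 and Dick picks 9; the mean of all three choices must be 5.
harry = forced_last_value([3, 9], target_mean=5, n=3)
print(f"Harry has no choice: he must say {harry}")
```

Tom and Dick could have picked any two numbers at all, and the function would still return exactly one value for Harry. That is the sense in which only n − 1 of the observations are free.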

Alternative Way to Calculate the Standard Deviation/Variance

The formula given earlier in this note has the advantage of being intuitive: we can
immediately see that we are summing squares of deviations from the mean.
Pedagogically this is a desirable quality. Unfortunately, it is not computationally
efficient, in the sense that the standard deviation can be computed in fewer steps.

Rather than simply give you the alternative formula, I will derive it for you. I do this not
to aggravate you, or to show off, but to give you a sense of what Mathematical
Statistics looks like. Since this is a course in Business Statistics, you are not required to
learn this, but I believe you will benefit from the exercise.

We will start with the variance

\[
S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \tag{1}
\]

Since I am not going to play with the denominator, I will omit it for clarity and bring it
back later. Again for clarity, I’ll also lose the super & subscripts from the Sigma.

\[
\sum (X - \bar{X})^2 \tag{2}
\]

I expand the term in the bracket

\[
\sum \left( X^2 + \bar{X}^2 - 2X\bar{X} \right) \tag{3}
\]

Now, I will run the sigma operator through the equation. We treat Σ in exactly the same
way as we would a constant (like a fixed number) multiplying the bracket: it is applied to
each term in turn.

\[
\sum X^2 + \sum \bar{X}^2 - 2\sum X\bar{X} \tag{4}
\]

Now before this gets too unmanageable, why don’t we concoct a little data set to help us
unravel the three above terms.

Suppose X = {4, 8, 6}. So ΣX = 18, n = 3 and X̄ = 6.

X       X̄²      X·X̄      X²

4       36      24       16
8       36      48       64
6       36      36       36

ΣX = 18     ΣX̄² = 108     ΣX·X̄ = 108     ΣX² = 116

So, we notice that   a) ΣX̄² = ΣX·X̄     Think about why this has to be so.

                     b) ΣX̄² = n·X̄²
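
For readers who want to check their reasoning on a) and b), here is one way to see both identities, worked with the little data set X = {4, 8, 6}. The only facts used are that X̄ is the same number in every row and that ΣX = n·X̄ by the definition of the mean.

```latex
% b) \bar{X}^2 is a constant, so summing it over n rows adds it n times.
\sum_{i=1}^{n} \bar{X}^2 = n\bar{X}^2 = 3 \times 36 = 108

% a) \bar{X} can be pulled outside the sum, and \sum X_i = n\bar{X}.
\sum_{i=1}^{n} X_i \bar{X} = \bar{X} \sum_{i=1}^{n} X_i
                           = \bar{X}\,(n\bar{X}) = n\bar{X}^2 = 6 \times 18 = 108
```

Both sums equal n·X̄², which is why the two middle columns of the table have the same total.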

Returning to equation (4)

\[
\sum X^2 + \sum \bar{X}^2 - 2\sum X\bar{X} \tag{4}
\]

From a) above we get

\[
\sum X^2 + \sum \bar{X}^2 - 2\sum \bar{X}^2 \tag{5}
\]

\[
\sum X^2 - \sum \bar{X}^2 \tag{6}
\]

From b) above we get

\[
\sum X^2 - n\bar{X}^2 \tag{7}
\]

Replacing the denominator we get the variance

\[
\text{Variance} = \frac{\sum X^2 - n\bar{X}^2}{n - 1}
\]

And, taking the square root we recover the standard deviation

\[
\text{Standard Deviation} = \left[ \frac{\sum X^2 - n\bar{X}^2}{n - 1} \right]^{1/2} \tag{8}
\]


We can now double check if this new equation is actually correct, with our original X
Data.

Observation     X     X²

1               7     49
2               9     81
3               6     36
4               9     81
5               4     16
6               7     49
7               5     25
8               8     64
9               8     64
10              7     49

Σ              70    514

n = 10

X̄ = 7

\[
\text{Standard Deviation} = \left[ \frac{\sum X^2 - n\bar{X}^2}{n - 1} \right]^{1/2}
= \left[ \frac{514 - 10(49)}{9} \right]^{1/2} = 1.6330
\]

Yes!!!

I don’t care which method you use as long as the answer is correct. Calculators and simple
programs often use the above method because it needs only the running totals ΣX and ΣX²,
so it can be computed in a single pass through the data, unlike the mean deviation
method.
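
To compare the two routes side by side, the following Python sketch computes the standard deviation of the X data both ways. The function names `sd_from_deviations` and `sd_shortcut` are made up for this illustration.

```python
import math

# Exam scores for section X, copied from the table above.
x = [7, 9, 6, 9, 4, 7, 5, 8, 8, 7]


def sd_from_deviations(data):
    """Definition: square root of the summed squared deviations over n - 1."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((value - mean) ** 2 for value in data) / (n - 1))


def sd_shortcut(data):
    """Equation (8): uses only the totals of X and X squared."""
    n = len(data)
    total = sum(data)                              # sum of X
    total_squares = sum(value ** 2 for value in data)  # sum of X squared
    mean = total / n
    return math.sqrt((total_squares - n * mean ** 2) / (n - 1))


sd_definition = sd_from_deviations(x)
sd_short = sd_shortcut(x)

print(f"Deviation method: {sd_definition:.4f}")
print(f"Shortcut method:  {sd_short:.4f}")
```

One caveat worth knowing: in floating-point arithmetic the shortcut subtracts two numbers that can be very large and nearly equal, so for data with a large mean and a small spread it can lose precision. For small data sets like this one the two methods agree.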

The Coefficient of Variation

Suppose we have a random variable X.

The coefficient of variation is given by:

\[
\text{C.V.} = \frac{S}{\bar{X}}
\]

This quantity, which gives the standard deviation as a proportion of the mean, is
sometimes informative. For example, the value S = 10 has little meaning unless we can
compare it to something else. If S is observed to be 10 and X̄ is observed to be 1,000, the
amount of variation is small relative to the size of the mean. However, if S is observed to
be 10 and X̄ is observed to be 5, the variation is quite large relative to the size of the
mean.

Example: In statistics, the term precision has a special meaning.

Precision refers to the variation in repeated measurements.

Suppose we were interested in testing a measuring instrument, such as those
stupid plastic things nurses shove into one’s ear to take one’s temperature.

A coefficient of variation of 10/1,000 = 0.01 might be acceptable.

However, a coefficient of variation of 10/5 = 2 might be unacceptable.

Example: We have two stocks: ABC Corp. and XYZ Corp. Which has more risk?

ABC has a standard deviation of $12 and an average price of $50.

XYZ has a standard deviation of $6 and an average price of $24.

Coefficient of Variation for ABC Corp. is $12/$50 = 0.24

Coefficient of Variation for XYZ Corp. is $6/$24 = 0.25

Remember, in finance less risky is good, more risky is bad.

Thus, ABC Corp. is slightly less risky (but not by much).
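
The stock comparison can be reproduced with a short Python sketch. The function name `coefficient_of_variation` and the dictionary layout are assumptions made for this example; the figures are the ones given above.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a proportion of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return std_dev / mean


# (standard deviation, average price) in dollars, from the example above.
stocks = {
    "ABC Corp.": (12.0, 50.0),
    "XYZ Corp.": (6.0, 24.0),
}

cv = {name: coefficient_of_variation(sd, avg) for name, (sd, avg) in stocks.items()}
less_risky = min(cv, key=cv.get)  # smallest relative variation

for name, value in cv.items():
    print(f"{name}: C.V. = {value:.2f}")
print(f"Less risky on this measure: {less_risky}")
```

Note that ABC has the larger standard deviation in dollar terms ($12 against $6), yet relative to its price it is the slightly less variable of the two.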

Summary

1) Our main measure of variability is the sample Standard Deviation, denoted S.

2) S is given by either

\[
S = \left[ \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \right]^{1/2}
\quad \text{or} \quad
S = \left[ \frac{\sum X^2 - n\bar{X}^2}{n - 1} \right]^{1/2}
\]

3) S² is the sample variance, given by

\[
S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}
\quad \text{or} \quad
S^2 = \frac{\sum X^2 - n\bar{X}^2}{n - 1}
\]

4) The Coefficient of Variation, given by C.V. = S/X̄, allows us to compare the
relative variability of two data sets.

5) Degrees of freedom is the number of independent observations in a sample of data
that are available to estimate a parameter of the population from which that
sample is drawn. For the purposes of this course, whenever a mean is calculated
we lose one degree of freedom. Later on in the course we will be dealing with
two random variables and how they vary together (Covariance). Not surprisingly,
if we calculate the mean of both variables for the sake of calculating their
covariance, we lose two degrees of freedom.

6) S and S² are sample statistics; they are our best estimates of the population
standard deviation and variance, σ and σ², which are population parameters.

7) The coefficient of variation for a random variable X, given by S/X̄, is also a
sample statistic. It is our best estimate of the population coefficient of variation,
which is a parameter given by σ_X/µ_X, where µ_X is the population mean of X
(parameter), and σ_X is the population standard deviation of X (also a
parameter). There is no ancient Greek letter for the Coefficient of variation.

Incidentally, these ancient Greek letters were not chosen at random.

µ Is pronounced “mu”, chosen to represent the mean.

Σ Is the ancient Greek capital letter “sigma”, chosen to represent the sum.

σ Is the ancient Greek lower case letter “sigma”, chosen to represent the
Standard deviation.

Π Is the ancient Greek capital letter “pi”, chosen to represent the product.

π Is the ancient Greek lower case letter “pi”, which you know from grade
school to represent the mathematical constant approximately equal to
3.14159. It represents the ratio of any circle's circumference to its diameter
in Euclidean geometry.

Never be afraid of notation. It’s like manners: it’s there to put you at ease, not to
frighten you.