Central Tendency (Stats)

MEASURES OF CENTRAL TENDENCY (LOCATION)
Example: the average age of the 50 most powerful people in India in 2010 decreased from 58 years to 54 years.
Measures of Location or Central Tendency.

In a distribution, the observations tend to cluster around a central value. This property of concentration of the observations around a central value is called central tendency. The central value around which the observations concentrate is called a measure of central tendency (also called a measure of location, or an average). Example: the mean mark scored by the 1st PGDM class is 65%.
Objectives of Averaging:
1. To get a single value that describes the characteristics of the entire data.
2. To facilitate comparison.
3. To compute various other statistical measures, such as dispersion, skewness and kurtosis, and other basic characteristics of a mass of data.
Requisites of a Good Average.

1. It should be simple to understand and easy to calculate.
2. It should be based on all the items of the given data.
3. It should be rigidly defined.
4. It should be capable of further mathematical treatment.
5. It should be affected as little as possible by fluctuations of sampling.
6. It should not be affected by extreme observations (values).
Various measures of Central Tendency
1. Arithmetic Mean (A.M)
2. Median (M)
3. Mode (Z)
4. Geometric Mean (G.M)
5. Harmonic Mean (H.M)
1. Arithmetic Mean (A.M)
$\text{A.M} = \frac{\text{Sum of observations}}{\text{Number of observations}}$
Calculation of A.M
1. Ungrouped data (raw data)
2. Discrete data
3. Continuous data
Ungrouped Data (Raw Data): the weights of a sample of 30 students of a particular class are as follows.
62 57 52 58 56 56 58 46 57 52 48 52 48 53 52 53 56 53 54 57 54 63 59 58 69 58 61 63 53 63
Discrete Data
Number of postgraduates (x) | 0 | 1 | 2 | 3 | 4
Frequency (f) | 2 | 2 | 4 | 1 | 1
Continuous Data
Marks | 20-30 | 30-40 | 40-50
No. of students | 5 | 15 | 25
Exclusive method (overlapping)

In this method, the upper limit of one class interval is the lower limit of the next class. This keeps the data continuous.
Marks | 20-30 | 30-40 | 40-50
No. of students | 5 | 15 | 25
A student whose mark is between 20 and 29.9 is included in the 20-30 class. A mark of exactly 30 falls in the 30-40 class.
Inclusive method (non-overlapping)

Marks | 20-29 | 30-39 | 40-49
No. of students | 5 | 15 | 25
A student whose mark is 29 is included in the 20-29 class interval, and a student whose mark is 39 is included in the 30-39 class interval.
Ungrouped Data (Raw Data)

$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n}$
where X = observations and n = number of observations.
The following data give the value of the equity holdings of 20 of India's billionaires.

Name | Equity Holdings (Millions of Rs.)
Kiran Mazumdar-Shaw | 2717
The Nilekani family | 2796
The Punj family | 3098
Karsanbhai K. Patel & family | 3144
Shashi Ruia | 3527
K.K. Birla | 3534
B. Rama Linga Raju | 3862
Habil F. Khorakiwala | 4187
The Murthy family | 4310
Keshub Mahindra | 4506
The Kirloskar family | 4745
M.V. Subbiah family | 4784
Ajay G. Piramal | 4923
Uday Kotak | 5034
S.P. Hinduja | 5071
Subhash Chandra | 5424
Adi Godrej | 5561
Vijay Mallya | 6505
V.N. Dhoot | 6707
Naresh Goyal | 6874
$\bar{X} = \frac{\sum X}{n} = \frac{2717 + 2796 + \cdots + 6874}{20} = \frac{91309}{20} = \text{Rs. } 4565.45 \text{ million}$
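The raw-data formula can be checked with a short Python sketch using the 20 holdings from the table. The function name `arithmetic_mean` is illustrative and does not come from the source.

```python
# Equity holdings (millions of Rs.) of the 20 billionaires listed above.
holdings = [2717, 2796, 3098, 3144, 3527, 3534, 3862, 4187, 4310, 4506,
            4745, 4784, 4923, 5034, 5071, 5424, 5561, 6505, 6707, 6874]

def arithmetic_mean(values):
    """A.M of raw data: sum of observations / number of observations."""
    return sum(values) / len(values)

mean_holding = arithmetic_mean(holdings)
print(f"Sum = {sum(holdings)}, n = {len(holdings)}, mean = {mean_holding:.2f}")
```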
Discrete Data
$\bar{X} = \frac{\sum f x}{\sum f}$
where x = observations and f = frequency.
Problem on Discrete Data

The following is the frequency distribution of the number of telephone calls received in 245 successive one-minute intervals at an exchange:
No. of calls | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
Frequency | 14 | 21 | 25 | 43 | 51 | 40 | 39 | 12
Obtain the mean number of calls per minute.
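No. of calls (x) | Frequency (f) | fx
0 | 14 | 0
1 | 21 | 21
2 | 25 | 50
3 | 43 | 129
4 | 51 | 204
5 | 40 | 200
6 | 39 | 234
7 | 12 | 84
Total | Σf = 245 | Σfx = 922
$\bar{X} = \frac{\sum f x}{\sum f} = \frac{922}{245} = 3.763$ calls per minute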
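The discrete-series formula translates directly into code. This is a minimal sketch. `discrete_mean` is a hypothetical helper name, and the data are the telephone-call frequencies above.

```python
# x = number of calls in a minute, f = number of one-minute intervals.
calls = [0, 1, 2, 3, 4, 5, 6, 7]
freq = [14, 21, 25, 43, 51, 40, 39, 12]

def discrete_mean(x, f):
    """Mean of a discrete frequency distribution: sum(f*x) / sum(f)."""
    sum_fx = sum(fi * xi for xi, fi in zip(x, f))
    return sum_fx / sum(f)

print(round(discrete_mean(calls, freq), 3))  # 922 / 245
```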
Continuous Series
The calculation is illustrated with the equity holdings of the 20 billionaires given above, grouped into classes of width Rs. 1000 million.
Class Interval | Frequency (f) | Mid-value (x) | fx
2000-3000 | 2 | 2500 | 5000
3000-4000 | 5 | 3500 | 17500
4000-5000 | 6 | 4500 | 27000
5000-6000 | 4 | 5500 | 22000
6000-7000 | 3 | 6500 | 19500
Total | Σf = 20 | | Σfx = 91000
$\bar{X} = \frac{\sum f x}{\sum f} = \frac{91000}{20} = 4550$ (millions of Rs.)
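The following sketch represents each class by its mid-value, as in the table. The names `classes`, `freqs` and `grouped_mean` are illustrative. The result is 4550 rather than the raw-data mean of about Rs. 4565 million, because every observation in a class is replaced by that class's mid-value.

```python
# Class intervals (lower, upper) and frequencies of the grouped equity data.
classes = [(2000, 3000), (3000, 4000), (4000, 5000), (5000, 6000), (6000, 7000)]
freqs = [2, 5, 6, 4, 3]

def grouped_mean(classes, freqs):
    """Mean of a continuous series, with each class represented by its mid-value."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    sum_fx = sum(f * m for f, m in zip(freqs, mids))
    return sum_fx / sum(freqs)

print(grouped_mean(classes, freqs))  # 91000 / 20
```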
Properties of Arithmetic Mean
1. The sum of the deviations of all the values x from their arithmetic mean is always zero: $\sum (x - \bar{x}) = 0$.
2. The product of the arithmetic mean and the number of items gives the total of all items: $n\bar{x} = \sum x$.
3. If $\bar{X}_1$ and $\bar{X}_2$ are the arithmetic means of two samples of sizes $n_1$ and $n_2$ respectively, then the arithmetic mean of the distribution combining the two can be calculated as
$\bar{X}_{12} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$
4. The sum of squared deviations of the items from the mean is a minimum: it is smaller than the sum of squared deviations of the items from any other value A, i.e. $\sum (x - \bar{x})^2 \le \sum (x - A)^2$.
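The four properties can be checked numerically. This sketch uses the 30 weights listed earlier. The split into two sub-samples and the comparison values for property 4 are arbitrary illustrative choices.

```python
# The 30 weights listed under "Ungrouped Data (Raw Data)".
weights = [62, 57, 52, 58, 56, 56, 58, 46, 57, 52, 48, 52, 48, 53, 52,
           53, 56, 53, 54, 57, 54, 63, 59, 58, 69, 58, 61, 63, 53, 63]

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq_dev(xs, centre):
    """Sum of squared deviations of xs from a chosen centre."""
    return sum((x - centre) ** 2 for x in xs)

m = mean(weights)

# Property 1: deviations from the mean sum to zero (up to floating-point error).
dev_sum = sum(x - m for x in weights)

# Property 2: mean x number of items = total of all items.
total = m * len(weights)

# Property 3: combined mean from two sub-samples (split arbitrarily here).
part1, part2 = weights[:12], weights[12:]
n1, n2 = len(part1), len(part2)
combined = (n1 * mean(part1) + n2 * mean(part2)) / (n1 + n2)

# Property 4: squared deviations about the mean are no larger than about other values.
others = [50, 55, m - 0.5, m + 0.5, 60]
is_minimum = all(sum_sq_dev(weights, m) <= sum_sq_dev(weights, a) for a in others)

print(round(m, 4), round(dev_sum, 10), round(total, 6), round(combined, 4), is_minimum)
```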
Weighted Mean
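The weighted mean of a set of numbers X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed from the following formula:
$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w X}{\sum w}$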
EXAMPLE Weighted Mean

The Carter Construction Company pays its hourly employees $16.50, $19.00, or $25.00 per hour. There are 26 hourly employees: 14 are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. What is the mean hourly rate paid to the 26 employees?
$\bar{X}_w = \frac{14(\$16.50) + 10(\$19.00) + 2(\$25.00)}{14 + 10 + 2} = \frac{\$471.00}{26} = \$18.12$
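The weighted-mean formula applied to the Carter Construction example, as a minimal Python sketch. The helper name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

rates = [16.50, 19.00, 25.00]   # hourly rates in dollars
employees = [14, 10, 2]          # number of employees paid each rate

wm = weighted_mean(rates, employees)
print(f"${wm:.2f}")  # 471.00 / 26
```

For comparison, the unweighted mean of the three rates is $20.17. That figure overstates the typical rate because it ignores the fact that most employees earn the lowest rate.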
Merits:
1. Mean is based on all the items of the given data.
2. Mean is rigidly defined by a mathematical formula.
3. Mean is capable of further algebraic treatment.
4. Mean has good sampling stability.
Demerits:
1. Mean can be unduly affected by extreme values.
2. Mean cannot be calculated for open-end classes, since mid-points cannot be found for such classes.
3. Mean cannot be found graphically, unlike the median and mode.
Median (M)
The median is the value of the variable that divides the group into two equal parts: one part contains all the values greater than the median, and the other contains all the values less than it.
Calculation of Median
Raw Data
Steps:
1. Arrange the data in ascending order.
2. Find the value of (n + 1)/2.
3. Apply the formula M = size of the ((n + 1)/2)th item.
Sales: 9 6 12 10 13 15 16 14 14 16 17 16 24 21 22 18 19 18 20 17
Sorted sales: 6 9 10 12 13 14 14 15 16 16 16 17 17 18 18 19 20 21 22 24
(n + 1)/2 = (20 + 1)/2 = 10.5, so the median lies halfway between the 10th and 11th items: M = (16 + 16)/2 = 16.
The median is the middle value of data sorted in order of magnitude.
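The three raw-data steps, written as a short Python sketch. `median_raw` is a hypothetical name. When (n + 1)/2 is fractional, which happens when n is even, the sketch averages the two middle items.

```python
def median_raw(values):
    """Median of raw data: size of the ((n + 1) / 2)th item of the sorted data.

    When (n + 1) / 2 is not a whole number (even n), average the two middle items.
    """
    s = sorted(values)             # step 1: ascending order
    n = len(s)
    pos = (n + 1) / 2              # step 2: 1-based position of the median
    lo = int(pos)
    if pos == lo:                  # odd n: a single middle item
        return s[lo - 1]
    return (s[lo - 1] + s[lo]) / 2  # even n: mean of the two middle items

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
print(median_raw(sales))
```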
Discrete Data
Steps:
1. Find the cumulative frequencies (C.F).
2. Find the value of N/2, where N = total frequency.
3. Apply the formula M = size of the (N/2)th item. In other words, locate the cumulative frequency that is just more than the N/2 value. (Note: this cumulative frequency is not the median.)
4. Read the corresponding X value. This gives the value of the median.
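A minimal sketch of these steps. It applies the rule above, taking the first C.F. strictly greater than N/2. The telephone-call distribution from the mean example is reused only as an illustration, and `median_discrete` is a hypothetical name.

```python
from itertools import accumulate

def median_discrete(x, f):
    """Median of a discrete series: first x whose C.F. is just more than N/2."""
    cf = list(accumulate(f))       # step 1: cumulative frequencies
    if not cf or cf[-1] == 0:
        raise ValueError("empty distribution")
    half = cf[-1] / 2              # step 2: N/2
    for xi, c in zip(x, cf):       # step 3: first C.F. greater than N/2
        if c > half:
            return xi              # step 4: corresponding x value
    raise ValueError("no median found")

calls = [0, 1, 2, 3, 4, 5, 6, 7]
freq = [14, 21, 25, 43, 51, 40, 39, 12]
print(median_discrete(calls, freq))  # N/2 = 122.5; C.F. 154 is the first above it
```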
Continuous data
Steps:
1. Find the C.F.
2. Find the N/2 value.
3. Locate the first value in the cumulative frequency column that is more than N/2.
4. Read the corresponding class. This is the median class, i.e. the class in which the median lies.
5. Apply the formula
$M = l + \frac{N/2 - c.f.}{f} \times c$
where M = median, l = lower limit of the median class, N = total frequency, c.f. = cumulative frequency of the pre-median class, f = frequency of the median class, and c = width of the median class.
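Applied to the grouped equity-holdings table, the formula gives N/2 = 10. The median class is 4000-5000, because its C.F. of 13 is the first above 10. So M = 4000 + (10 - 7)/6 × 1000 = 4500. The sketch below assumes exclusive (continuous) classes, and `median_grouped` is an illustrative name.

```python
from itertools import accumulate

def median_grouped(classes, freqs):
    """M = l + ((N/2 - c.f.) / f) * c for a continuous (exclusive) series."""
    cf = list(accumulate(freqs))
    half = cf[-1] / 2
    for i, c_cum in enumerate(cf):
        if c_cum > half:                        # first C.F. more than N/2 -> median class
            lower, upper = classes[i]
            cf_before = cf[i - 1] if i > 0 else 0  # C.F. of the pre-median class
            f = freqs[i]                        # frequency of the median class
            width = upper - lower               # class width c
            return lower + (half - cf_before) / f * width
    raise ValueError("no median class found")

classes = [(2000, 3000), (3000, 4000), (4000, 5000), (5000, 6000), (6000, 7000)]
freqs = [2, 5, 6, 4, 3]
print(median_grouped(classes, freqs))
```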
Merits:
1. It is easy to understand and easy to calculate, even for a non-mathematical person.
2. It is not affected by extreme observations.
3. Median can be calculated for a distribution with open-end classes.
4. Median can be represented graphically.
5. Median is the most appropriate average for qualitative data that can be ranked.
Demerits:
1. For ungrouped data with an even number of observations, the median cannot be determined exactly; it is taken as the mean of the two middle values.
2. Median, being a positional average, is not based on each and every item of the distribution.
3. Median is not suitable for further mathematical treatment.
4. Median does not have sampling stability.
Mode (Z)
Mode is defined as the value that is repeated the maximum number of times in the data.
[Figure: dot plot of the sorted sales data (values 6 to 24). The tallest stack of dots, at 16 (three observations), marks the mode.]
Calculation of Mode
Ungrouped data
Here, Mode is calculated by mere inspection.
Discrete Data
Here, Mode is calculated by mere inspection.
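Inspection amounts to counting repetitions. The sketch below uses `collections.Counter` and returns every value that shares the top count, so a bimodal set yields two modes. The function name `modes` is illustrative. The sales data and the telephone-call table from the earlier examples serve as inputs.

```python
from collections import Counter

def modes(values):
    """All values that occur the maximum number of times (more than one if multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

# Ungrouped data: the sales figures used for the median.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
print(modes(sales))

# Discrete data: the x with the highest frequency (telephone-call example).
calls = [0, 1, 2, 3, 4, 5, 6, 7]
freq = [14, 21, 25, 43, 51, 40, 39, 12]
modal_calls = calls[freq.index(max(freq))]
print(modal_calls)
```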
Continuous data
Steps:
1. Locate the maximum frequency.
2. Read the corresponding class. This is the modal class, i.e. the class in which the mode lies.
3. Apply the formula
$Z = l + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c$
where Z = mode, l = lower limit of the modal class, $\Delta_1 = f_1 - f_0$, $\Delta_2 = f_1 - f_2$, $f_1$ = frequency of the modal class, $f_0$ = frequency of the pre-modal class, $f_2$ = frequency of the post-modal class, and c = width of the class interval.
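Applied to the grouped equity table, the modal class is 4000-5000 (f1 = 6), with f0 = 5 and f2 = 4. That gives Δ1 = 1, Δ2 = 2, and Z = 4000 + (1/3) × 1000 ≈ 4333.33. In the sketch below, a missing neighbour at either end of the table is treated as frequency 0, which is an assumption the text does not cover. The formula is also undefined when Δ1 + Δ2 = 0.

```python
def mode_grouped(classes, freqs):
    """Z = l + d1 / (d1 + d2) * c, with d1 = f1 - f0 and d2 = f1 - f2."""
    i = freqs.index(max(freqs))                       # modal class (first one if tied)
    lower, upper = classes[i]
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                 # pre-modal frequency (assumed 0 at the start)
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0    # post-modal frequency (assumed 0 at the end)
    d1, d2 = f1 - f0, f1 - f2
    return lower + d1 / (d1 + d2) * (upper - lower)

classes = [(2000, 3000), (3000, 4000), (4000, 5000), (5000, 6000), (6000, 7000)]
freqs = [2, 5, 6, 4, 3]
print(round(mode_grouped(classes, freqs), 2))
```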
Merits:
1. Its value can be easily ascertained without much calculation.
2. It is an average that is commonly used in day-to-day life.
3. It is not affected by extreme values.
4. The data need not be arranged.
5. Mode can be determined graphically.
6. Mode can be calculated for data with open-end classes.
Demerits:
1. Mode is not based on each and every item of the data.
2. Mode is not capable of further algebraic treatment.
3. Mode is not rigidly defined.
4. The modal value can be misleading.
5. Mode is ill-defined for bimodal or multimodal distributions.
6. Mode doesn't have sampling stability.
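Relation between Mean, Median and Mode
For moderately skewed distributions, the following empirical relation holds approximately:
Mode = Mean - 3(Mean - Median)
Mode = 3 Median - 2 Mean
Median = Mode + (2/3)(Mean - Mode)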
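A quick numerical check shows that the three forms of the relation agree. The inputs are the grouped mean of 4550 from the continuous-series example and a grouped median of 4500, which is what the median formula gives for the same table.

```python
# Grouped mean (continuous-series example) and grouped median (median formula,
# same equity table).
mean_g = 4550.0
median_g = 4500.0

# Empirical relation: Mode = 3 Median - 2 Mean
mode_est = 3 * median_g - 2 * mean_g

# The other two forms are algebraically equivalent.
mode_alt = mean_g - 3 * (mean_g - median_g)
median_back = mode_est + (2 / 3) * (mean_g - mode_est)

print(mode_est, mode_alt, round(median_back, 6))
```

The empirical estimate (4400) differs from the value the mode formula gives for this table (about 4333). This is expected, since the relation is only an approximation.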
SYMMETRICAL DISTRIBUTION
Mean = Median = Mode
NEGATIVELY OR LEFT SKEWED
Mean < Median < Mode
POSITIVELY OR RIGHT SKEWED
Mean > Median > Mode
Geometric Mean
It is defined as the nth root of the product of n positive values or items.
Calculation of G.M
Ungrouped data (raw data)
$G.M = \text{antilog}\left(\frac{\sum \log X}{n}\right)$
Grouped data (discrete and continuous data)
$G.M = \text{antilog}\left(\frac{\sum f \log X}{N}\right)$
where N = total frequency.
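The grouped formula can be sketched with base-10 logarithms, so that "antilog" means raising 10 to a power. This sketch applies it to the mid-values of the grouped equity table and checks the result against the definition (the Nth root of the product). `gm_grouped` is an illustrative name.

```python
import math

def gm_grouped(x, f):
    """G.M = antilog(sum(f * log10(x)) / N), with N = sum(f). All x must be positive."""
    if any(v <= 0 for v in x):
        raise ValueError("G.M cannot be computed for zero or negative values")
    N = sum(f)
    return 10 ** (sum(fi * math.log10(xi) for xi, fi in zip(x, f)) / N)

# Mid-values and frequencies of the grouped equity-holdings table.
mids = [2500, 3500, 4500, 5500, 6500]
freqs = [2, 5, 6, 4, 3]

g = gm_grouped(mids, freqs)
# Same result from the definition: Nth root of the product of all N values.
direct = math.prod(m ** f for m, f in zip(mids, freqs)) ** (1 / sum(freqs))
print(round(g, 2), round(direct, 2))
```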
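Example:
Suppose you receive a 5 percent increase in salary this year and a 15 percent increase next year. The average annual percent increase is 9.886, not 10.0. Why is this so? We begin by calculating the geometric mean of the growth factors 1.05 and 1.15:
$G.M = \sqrt{1.05 \times 1.15} = \sqrt{1.2075} = 1.09886$
So the average annual increase is about 9.886%. Two successive 10% increases would give 1.10 × 1.10 = 1.21, which is more than the actual 1.2075. Because the increases compound, the arithmetic mean of 10% overstates the average rate.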
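The salary example can be checked in Python. The sketch computes the geometric mean both as a root of the product and with the antilog-of-mean-log formula. It uses `math.prod`, which requires Python 3.8 or later.

```python
import math

factors = [1.05, 1.15]  # 5% increase, then 15% increase

# nth root of the product of the growth factors
gm = math.prod(factors) ** (1 / len(factors))
# Same value via the log formula: antilog(sum(log X) / n)
gm_log = 10 ** (sum(math.log10(x) for x in factors) / len(factors))

avg_increase_pct = (gm - 1) * 100
print(f"{avg_increase_pct:.3f}%")

# Applying the G.M rate twice reproduces the actual overall growth;
# applying the arithmetic mean (10%) twice overshoots it.
print(round(gm ** 2, 6), round(1.10 ** 2, 6))
```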
The Geometric Mean

It is useful in finding the average change of percentages, ratios, indexes, or growth rates over time. It has wide application in business and economics, because we are often interested in percentage changes in sales, salaries, or economic figures such as GDP, which compound or build on each other. For positive values, the geometric mean is always less than or equal to the arithmetic mean.
Combined Geometric Mean
G = Antilog [(n1 log G1 + n2 log G2)/ (n1 + n2)]
Geometric Mean
Merits:
1. It makes use of the full data.
2. Extremely large values have less impact.
3. It is useful for data relating to ratios and percentages.
4. It is useful for rates of change or growth.
Demerits (G.M):
1. It cannot be calculated if any observation is zero or negative.
2. It is difficult to calculate and interpret.
For positive values, AM, GM, and HM satisfy these inequalities:
AM ≥ GM ≥ HM
Equality holds only when all the elements of the given sample are equal.
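A quick numerical check of AM ≥ GM ≥ HM, using the 30 weights from the arithmetic-mean section and a set of equal values. In this sketch, the geometric mean is computed through logarithms to avoid a very large intermediate product.

```python
import math

def am(xs):
    return sum(xs) / len(xs)

def gm(xs):
    # exp(mean of natural logs) equals the nth root of the product
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def hm(xs):
    return len(xs) / sum(1 / x for x in xs)

weights = [62, 57, 52, 58, 56, 56, 58, 46, 57, 52, 48, 52, 48, 53, 52,
           53, 56, 53, 54, 57, 54, 63, 59, 58, 69, 58, 61, 63, 53, 63]
print(round(am(weights), 4), round(gm(weights), 4), round(hm(weights), 4))

equal = [7, 7, 7]   # all values equal: the three means coincide
print(am(equal), gm(equal), hm(equal))
```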
Harmonic Mean
It is defined as the reciprocal of the arithmetic mean of the reciprocals of the values.
Calculation of H.M:
Ungrouped Data
$H.M = \frac{n}{\sum \frac{1}{X}}$
Grouped Data
$H.M = \frac{N}{\sum \frac{f}{X}}$
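Both H.M formulas appear in the sketch below. The speed example is hypothetical and not from the source: a car covers two equal distances at 40 km/h and 60 km/h, and its average speed over the whole trip is their H.M, 48 km/h. The grouped form is checked against the ungrouped form on the expanded equity mid-values.

```python
def hm_ungrouped(xs):
    """H.M = n / sum(1/X)."""
    return len(xs) / sum(1 / x for x in xs)

def hm_grouped(xs, fs):
    """H.M = N / sum(f/X), with N = sum(f)."""
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))

# Hypothetical example: equal distances driven at 40 km/h and 60 km/h.
speeds = [40, 60]
print(round(hm_ungrouped(speeds), 6))   # average speed over the whole trip

# Grouped form on the equity mid-values equals the ungrouped form on the expanded list.
mids = [2500, 3500, 4500, 5500, 6500]
freqs = [2, 5, 6, 4, 3]
expanded = [m for m, f in zip(mids, freqs) for _ in range(f)]
print(round(hm_grouped(mids, freqs), 4), round(hm_ungrouped(expanded), 4))
```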
H.M Merits:
1. It is based on all the items of the given data.
2. It gives the best results where time and rates are under study.
3. It is rigidly defined.
4. It can be calculated even if the series contains negative values.
H.M Demerits:
1. It is difficult for a layman to understand and interpret.
2. It has limited practical application.
3. It cannot be calculated if any of the values is zero.
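Sales | March | April | May | June | July | Aug | Total | Average
Sales Executive A | 14 | 12 | 6 | 8 | 13 | 7 | 60 | 10
Sales Executive B | 10 | 10 | 10 | 10 | 10 | 10 | 60 | 10
Sales Executive C | 6 | 16 | 7 | 15 | 10 | 6 | 60 | 10
All three executives have the same total (60) and the same average (10), yet their month-to-month sales vary very differently. An average alone cannot reveal this, which is why measures of dispersion are needed.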
MEASURES OF DISPERSION
Why Study Dispersion?

A measure of location, such as the mean or the median, only describes the center of the data. It is valuable from that standpoint, but it does not tell us anything about the spread of the data. For example, if your nature guide told you that the river ahead averaged 3 feet in depth, would you want to wade across on foot without additional information? Probably not. You would want to know something about the variation in the depth. A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions.
The scatter of values about a measure of central tendency is called variation or dispersion.
Characteristics of an Ideal Measure of Dispersion

1. It should be rigidly defined.
2. It should be based on all the observations.
3. It should be amenable to further mathematical treatment.
4. It should not be affected by extreme observations.
Measures of Dispersion:
1. Range
2. Quartile Deviation
3. Mean Deviation
4. Standard Deviation
Range:
Range is simply the difference between the highest and lowest values in the distribution.
Example: weekly income of 10 people:
180 220 280 320 280 180 350 280 330 220
Range = maximum income - minimum income = 350 - 180 = 170.
Group A: 30, 40, 40, 40, 40, 50, 50
Group B: 30, 30, 30, 40, 50, 50, 50
Group C: 30, 35, 40, 40, 40, 45, 50
Range: 20 for each group (50 - 30), even though the values are spread quite differently within the groups.
Let us take two sets of observations. Set A contains the marks of five students in Mathematics out of 25 marks, and Set B contains the marks of the same students in English out of 100 marks.
Set A: 10, 15, 18, 20, 20
Set B: 30, 35, 40, 45, 50
The values of range and coefficient of range are calculated as:
 | Set A | Set B
Range | 20 - 10 = 10 | 50 - 30 = 20
Coefficient of range | (20 - 10)/(20 + 10) = 0.33 | (50 - 30)/(50 + 30) = 0.25
Although Set B has the larger range, its relative spread is smaller.
Coefficient of Range:
It is a relative measure of dispersion based on the value of the range. It is also called the range coefficient of dispersion. It is defined as
$\text{Coefficient of Range} = \frac{\text{Max} - \text{Min}}{\text{Max} + \text{Min}}$
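Range and coefficient of range as two small Python functions, applied to the weekly-income example and to Sets A and B. The name `value_range` is illustrative and avoids shadowing Python's built-in `range`.

```python
def value_range(xs):
    """Range = highest value - lowest value."""
    return max(xs) - min(xs)

def coefficient_of_range(xs):
    """Relative measure: (Max - Min) / (Max + Min)."""
    hi, lo = max(xs), min(xs)
    return (hi - lo) / (hi + lo)

income = [180, 220, 280, 320, 280, 180, 350, 280, 330, 220]
set_a = [10, 15, 18, 20, 20]   # Mathematics, out of 25
set_b = [30, 35, 40, 45, 50]   # English, out of 100

print(value_range(income))
for name, data in (("Set A", set_a), ("Set B", set_b)):
    print(name, value_range(data), round(coefficient_of_range(data), 2))
```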
Merits:
1. It is the simplest method of measuring variation.
2. It can be calculated quickly, since only two values are taken into consideration.
Demerits:
1. It is not based on each and every item of the given data.
2. It can be affected unduly by extreme values, since only those values are considered.
3. It cannot be calculated for data with open-end classes.
4. Range does not have sampling stability.
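Semi-Interquartile Range (Quartile Deviation)

The interquartile range (IQR) is another range-based measure, but this time it looks at the data in terms of quarters or percentiles.
The ordered data are divided into four equal parts (quarters), each containing 25% of the observations. The IQR is Q3 - Q1, and the quartile deviation (semi-interquartile range) is half of it: Q.D = (Q3 - Q1)/2.
[Figure: number line from Min to Max marking Q1 (25th percentile), Q2 = median (50th percentile) and Q3 (75th percentile); the IQR spans Q1 to Q3 and the range spans Min to Max.]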
Calculation of Q.D
1. Ungrouped data (raw data)
2. Discrete data
3. Continuous data
Raw Data:
$Q_1 = \text{size of the } \left(\frac{n+1}{4}\right)\text{th item}$
$Q_3 = \text{size of the } \left(\frac{3(n+1)}{4}\right)\text{th item}$
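A sketch of Q1, Q3 and Q.D for raw data, applied to the weekly-income example. It rests on one assumption: when (n + 1)/4 or 3(n + 1)/4 is fractional, the value is interpolated linearly between the neighbouring items. For n = 10, this gives Q1 at position 2.75 (210) and Q3 at position 8.25 (322.5).

```python
def value_at_position(sorted_xs, pos):
    """Value of the pos-th item (1-based) of sorted data.

    Fractional positions are interpolated linearly between neighbouring
    items (an assumption; the text only says "size of the ... item").
    """
    lo = int(pos)
    frac = pos - lo
    if lo < 1:
        return sorted_xs[0]
    if lo >= len(sorted_xs):
        return sorted_xs[-1]
    below, above = sorted_xs[lo - 1], sorted_xs[lo]
    return below + frac * (above - below)

def quartile_deviation(xs):
    """Return (Q1, Q3, Q.D) using the (n+1)/4 and 3(n+1)/4 positions."""
    s = sorted(xs)
    n = len(s)
    q1 = value_at_position(s, (n + 1) / 4)
    q3 = value_at_position(s, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2

# Weekly income of 10 people, from the Range example.
income = [180, 220, 280, 320, 280, 180, 350, 280, 330, 220]
q1, q3, qd = quartile_deviation(income)
print(q1, q3, qd)
```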
Discrete Data:
Merits:
1. It is simple to compute and easy to understand.
2. It can be computed for data with open-end classes.
3. It is not affected by extreme values.
Demerits:
1. It doesn't take all the values into consideration: it omits 50% of the items, i.e. the 25% of items below Q1 and the 25% above Q3.
2. It is not much capable of further algebraic treatment.
3. It doesn't have sampling stability.
3. Mean Deviation
It is defined as the mean of the absolute deviations of the items from either the mean, the median or the mode:
$M.D = \frac{\sum |X - A|}{n}$, where A is the mean, median or mode.
Calculation of M.D
1. Raw data
2. Discrete data
3. Continuous data
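Mean deviation for raw data, taken about the mean or about the median and applied to the weekly-income example. The sketch supports only those two centres, and `mean_deviation` is an illustrative name. For this data, the M.D is 51.2 about the mean (264) and 48 about the median (280). The M.D is always smallest when taken about the median.

```python
def mean_deviation(xs, about="mean"):
    """M.D = sum(|X - A|) / n, where A is the mean or the median of xs."""
    n = len(xs)
    if about == "mean":
        centre = sum(xs) / n
    elif about == "median":
        s = sorted(xs)
        mid = n // 2
        centre = s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2
    else:
        raise ValueError("about must be 'mean' or 'median'")
    return sum(abs(x - centre) for x in xs) / n

income = [180, 220, 280, 320, 280, 180, 350, 280, 330, 220]
print(mean_deviation(income), mean_deviation(income, about="median"))
```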
Merits of M.D:
1. It is based on every item of the series.
2. It is rigidly defined.
3. It is not much affected by extreme values.
Demerits of M.D:
1. It ignores algebraic signs while taking deviations of the items.
2. It is not much used for further algebraic treatment.
3. It cannot be computed for data with open-end classes.
4. Calculation of M.D becomes tedious when the values of the mean, median or mode are in decimals.
Variance
Whereas the mean is a measure of the centre of a group of numbers, the variance is a measure of its spread. It involves measuring the distance between each of the values and the mean. To calculate the variance:
1. Calculate the mean.
2. For each value in the distribution, subtract the mean and then square the result (the squared difference).
3. Calculate the average of those squared differences. For a sample, divide their sum by n - 1 rather than n, as in the formula below.
Variance
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\text{Sum of (observed value} - \text{mean score)}^2}{\text{Total number of scores} - 1}$
The larger the variance value the further the observed values of the data set are dispersed from the mean.
A variance value of zero means all observed values are the same as the mean.
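The three variance steps, with the n - 1 divisor of the formula, are shown below. The inputs are the monthly sales of the three sales executives, as read from the sales table in the central-tendency section. Executive B's constant sales give a variance of zero, which matches the statement above.

```python
def sample_variance(xs):
    """Variance following the three steps above, dividing by n - 1 (sample variance)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sum(xs) / n                                # step 1: the mean
    squared_diffs = [(x - m) ** 2 for x in xs]     # step 2: squared differences
    return sum(squared_diffs) / (n - 1)            # step 3: average, using n - 1

# Monthly sales (March to August) of the three sales executives.
exec_a = [14, 12, 6, 8, 13, 7]
exec_b = [10, 10, 10, 10, 10, 10]
exec_c = [6, 16, 7, 15, 10, 6]
for name, data in (("A", exec_a), ("B", exec_b), ("C", exec_c)):
    print(name, round(sample_variance(data), 2))
```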
4. Standard Deviation (S.D)
The square root of variance is known as standard deviation.
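Python's standard `statistics` module provides `stdev` (sample standard deviation, with an n - 1 divisor) and `pstdev` (population, with an n divisor). This sketch applies `stdev` to the three sales executives and confirms that it is the square root of the sample variance. The three executives share an average of 10 but have quite different standard deviations.

```python
import math
import statistics

# Monthly sales (March to August) of the three sales executives.
sales = {
    "A": [14, 12, 6, 8, 13, 7],
    "B": [10, 10, 10, 10, 10, 10],
    "C": [6, 16, 7, 15, 10, 6],
}

# Sample standard deviation (n - 1 divisor) for each executive.
sds = {name: statistics.stdev(data) for name, data in sales.items()}
for name, sd in sds.items():
    print(name, statistics.mean(sales[name]), round(sd, 3))

# S.D is the square root of the variance.
check_a = math.sqrt(statistics.variance(sales["A"]))
```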
