
Welcome to Business Statistics Lecture 1 & 2

Contents: Basic Statistical Concepts


1) Summarization of Data
2) Frequency Distribution
3) Measures of Central Tendency
4) Measures of Dispersion
5) Relative Dispersion, Skewness.


* A GOOD STUDENT of Statistics should ensure that the information resulting from a good
statistical analysis is always CONCISE, often PRECISE and never USELESS.


Types of Variables
1) Qualitative or Attribute variable - the characteristic being studied is generally nonnumeric.
EXAMPLES: gender, religious affiliation, type of automobile owned, state of birth, eye color.
Qualitative variables can also be described by numbers, although the numbering is arbitrary.
EXAMPLES: car registration number, state of birth coded as 1, 2, 3, 4, etc.
2) Quantitative variable - can be described by a number for which arithmetic operations such
as averaging make sense. EXAMPLES: balance in your mobile account, minutes remaining in
class, or number of children in a family.
A quantitative variable can be either discrete (e.g., number of children in a family) or
continuous (e.g., minutes remaining in class).


Summary of Types of Variables


Four Scales of Measurement (1 = weakest, 4 = strongest)
1) Nominal scale - data that is classified into categories and cannot be arranged in any particular
order. Numbers are just labels for groups or classes. Nominal comes from NAME.
EXAMPLES: eye color, gender, religious affiliation, platform number.
2) Ordinal scale - data arranged in some order according to relative size or quality. The
differences between data values cannot be determined or are meaningless: we know that one is
better than another, but not by how much.
EXAMPLE: During a taste test of 4 soft drinks, Coca Cola was ranked number 1, Sprite number
2, Seven-up number 3, and Orange Mirinda number 4.
3) Interval scale - similar to the ordinal scale, with the additional property that meaningful
differences between data values can be determined. There is no natural zero point.
EXAMPLE: Time of day. 10:00 a.m. is not twice 5:00 a.m., but the interval between 00:00 and
10:00 a.m. is twice the interval between 00:00 and 5:00 a.m.
4) Ratio scale - the interval scale with an inherent zero starting point. Both differences and
ratios are meaningful at this level of measurement.
EXAMPLES: monthly income of surgeons, or distance traveled by manufacturers'
representatives per month.

Population v/s Sample
A population is a collection of all possible individuals, objects, or measurements of interest. The
population is also called the UNIVERSE. Greek letters, like μ or σ, are used for a population and
are termed population parameters. A sample is a portion, or part, or subset of measurements
selected from the population of interest. Roman letters, like x̄ or s, are used to describe sample
statistics.
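As a small illustration of parameter versus statistic, Python's standard `statistics` module keeps the two cases apart: `pstdev` divides by the population size N (the parameter σ), while `stdev` divides by n - 1 (the usual sample statistic s). This is a sketch only; all the numbers are hypothetical.

```python
import statistics

# Hypothetical population: every value of interest is available.
population = [4, 8, 6, 5, 3, 7]
mu = statistics.mean(population)        # population mean (parameter): 5.5
sigma = statistics.pstdev(population)   # population SD: divides by N

# Hypothetical sample drawn from that population.
sample = [4, 8, 6]
x_bar = statistics.mean(sample)         # sample mean (statistic): 6
s = statistics.stdev(sample)            # sample SD: divides by n - 1, gives 2.0

print(f"mu = {mu}, sigma = {sigma:.4f}")
print(f"x_bar = {x_bar}, s = {s}")
```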


Types of Statistics
Data and Data Collection: A set of measurements obtained on some variable is called a data set.
Descriptive Statistics - methods of organizing, summarizing, and presenting data in an
informative way. When the entire population is considered, tabulating and presenting the data is
generally a challenge.
Inferential Statistics - a decision, estimate, prediction, or generalization about a population, based
on a sample.
Problems To Be Solved
1) Percentiles & Quartiles.
2) Measures of Central Tendency:
Mean: Arithmetic, Geometric, Harmonic.
Mean for individual, discrete and continuous distributions.
Mean by the assumed-mean (short-cut) method.
Median for individual, discrete and continuous distributions.
Mode for individual, discrete and continuous distributions.
3) Measures of Dispersion:
Range.
Mean Deviation.
Standard Deviation.
Coefficient of Variation.
Combined Standard Deviation.
4) Skewness:
Test for Skewness.
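For the "mean from assumed mean" problems, the short-cut method picks any convenient value A, works with the deviations d = x - A, and corrects at the end: \(\bar{x} = A + \frac{\sum f d}{\sum f}\). A minimal Python sketch for a discrete series, checked against the direct formula \(\bar{x} = \frac{\sum f x}{\sum f}\); the values, frequencies and the choice A = 30 are hypothetical. For a continuous series, x would be the class midpoints.

```python
def mean_direct(values, freqs):
    """Arithmetic mean of a discrete series: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def mean_assumed(values, freqs, assumed):
    """Short-cut method: mean = A + sum(f * d) / sum(f), where d = x - A."""
    sum_fd = sum(f * (x - assumed) for x, f in zip(values, freqs))
    return assumed + sum_fd / sum(freqs)


x = [10, 20, 30, 40, 50]  # hypothetical values
f = [5, 8, 12, 9, 6]      # hypothetical frequencies (sum = 40)

print(mean_direct(x, f))       # 1230 / 40 = 30.75
print(mean_assumed(x, f, 30))  # 30 + 30 / 40 = 30.75
```

Any choice of A gives the same answer; choosing a value near the middle of the data simply keeps the deviations small for hand calculation.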

Requisites of a Good Measure of Central Tendency
1) It should be rigidly defined, which means that it should be calculated and interpreted in the same
way by everyone.
2) It should be based on all values of the data.
3) It should not be unduly affected by extreme values.
4) It should be amenable to further algebraic treatment.
5) It should have sampling stability, by which we mean that the results obtained from various
samples should be similar.
6) It should be simple to compute.
Some Measures of Central Tendency
Arithmetic Mean: a mathematical average, obtained by dividing the sum of the
observations by the number of observations.
Median: the VALUE of the middle observation of the array (the data arranged in order); it is a
positional average.
Quartiles, Deciles, Percentiles: these are also positional averages and divide the series into
four, ten and 100 equal parts respectively.
MODE: the value of the data that occurs most frequently.
Geometric Mean: a specialized average, applicable when the quantities to be averaged come
from situations following an exponential law of growth or decline (e.g., growth rates).
Harmonic Mean: used to average rates (e.g., speeds over equal distances).
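All of these measures are available in Python's standard `statistics` module (`geometric_mean` and `quantiles` need Python 3.8+). The sketch below uses hypothetical numbers. Textbooks and software use slightly different quartile conventions; `quantiles` with its default `method='exclusive'` uses the (n + 1)p position rule, so a hand calculation using another convention can differ.

```python
import statistics

data = [3, 5, 5, 6, 8, 9, 13]  # hypothetical observations, already arranged

am = statistics.mean(data)     # (3+5+5+6+8+9+13) / 7 = 7
med = statistics.median(data)  # middle (4th) value of the array = 6
mo = statistics.mode(data)     # most frequent value = 5
q1, q2, q3 = statistics.quantiles(data, n=4)  # values at positions 2, 4, 6 -> 5, 6, 9

growth = [1.25, 1.80]  # hypothetical growth factors: +25%, then +80%
gm = statistics.geometric_mean(growth)  # sqrt(1.25 * 1.80) = 1.5, i.e. +50% per period

speeds = [60, 40]  # hypothetical km/h over two equal distances
hm = statistics.harmonic_mean(speeds)   # 2 / (1/60 + 1/40) = 48 km/h

print(am, med, mo, (q1, q2, q3), round(gm, 6), round(hm, 6))
```

The harmonic mean gives 48 km/h rather than the arithmetic 50 km/h because more time is spent at the slower speed.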

Arithmetic Mean
Merits
1) Easy to understand and simple to calculate.
2) It is based on all items of the series.
3) It is rigidly defined by a mathematical formula.
4) It is capable of further algebraic treatment.
5) It has sampling stability and is least affected by sampling fluctuations.
6) Arrangement of items is not required.
Demerits
1) It is affected by extreme values; thus, for distributions where the concentration is on small
or big values, the mean is not an ideal representative.
2) For open-ended distributions, the mean cannot be calculated with accuracy.
3) The mean is not useful for studying qualitative phenomena like beauty, intelligence, honesty,
etc.
4) The mean does not have a life of its own: it may not correspond to any actual value. An
average of 3.6 children per family in India is meaningless, since no family has 3.6 children.
5) The mean lets positive and negative deviations cancel out, which can be misleading.
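A quick way to see the first and last demerits: one extreme value drags the mean far away while the median stays put, and deviations from the mean always sum to zero. A sketch using Python's `statistics` module and hypothetical data.

```python
import statistics

usual = [20, 22, 23, 25, 30]     # hypothetical values
extreme = [20, 22, 23, 25, 300]  # same data with one extreme value

print(statistics.mean(usual), statistics.median(usual))      # 24 23
print(statistics.mean(extreme), statistics.median(extreme))  # 78 23

# Deviations from the mean always sum to zero: positive and negative cancel.
m = statistics.mean(usual)
print(sum(x - m for x in usual))  # 0
```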



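Median
Merits
1) Useful in open-ended series, as it is based on position and not on the values.
2) Easier to compute than the mean in case of unequal class intervals.
3) It is not affected by extreme values.
4) Suitable for qualitative data that can be ranked (ordinal data).
5) It minimizes the total absolute deviation: the sum of absolute deviations of the items from the
median is smaller than from any other value.
Demerits
1) Requires arrangement of data.
2) It is not based on all the items of the series.
3) It is incapable of any algebraic treatment, and the median of combined groups cannot be
obtained from the medians of the groups.
4) For grouped data, the formula assumes that the items in the median class are uniformly
distributed, which is not always true.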
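The "minimizes total absolute deviation" property can be checked by brute force. A minimal Python sketch with hypothetical data: it compares the total absolute deviation about the median with that about the mean, then searches every whole-number candidate between the smallest and largest values. The helper name `total_abs_deviation` is illustrative.

```python
import statistics


def total_abs_deviation(data, c):
    """Sum of |x - c| over all items."""
    return sum(abs(x - c) for x in data)


data = [2, 3, 7, 10, 30]       # hypothetical, already arranged
med = statistics.median(data)  # 7
mean = statistics.mean(data)   # 10.4

print(total_abs_deviation(data, med))   # 35
print(total_abs_deviation(data, mean))  # about 39.2

# Brute-force check over whole-number candidates between min and max.
best = min(range(min(data), max(data) + 1),
           key=lambda c: total_abs_deviation(data, c))
print(best)  # 7
```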





MODE
Merits
1) In certain situations the mode is the only suitable average, e.g., size of shoes, garments,
wages, etc.
2) It is not affected by extreme values.
3) It can be used for qualitative phenomena.
4) It indicates the point of maximum concentration in case of highly skewed distributions.
Limitations
1) In case of a bimodal or multimodal series, the mode cannot be uniquely defined.
2) It is incapable of further algebraic treatment.
3) It is not based on all the items of the series.
4) It is not rigidly defined, because different formulae give different answers.
5) Its value is affected by the size of the class interval.
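Two of these points in Python (3.8+): `statistics.multimode` returns every most-frequent value, which makes the "no unique mode" limitation visible, and `statistics.mode` also accepts non-numeric (qualitative) data. Note that for multimodal data `statistics.mode` returns the first mode encountered in 3.8+ rather than raising an error as older versions did. The data are hypothetical.

```python
import statistics

shoe_sizes = [7, 8, 8, 9, 9, 10]  # hypothetical: two sizes tie for most frequent
print(statistics.multimode(shoe_sizes))  # [8, 9] -> bimodal, no unique mode

eye_colours = ["brown", "blue", "brown", "green"]  # hypothetical qualitative data
print(statistics.mode(eye_colours))  # brown
```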




Case Study Descriptive Statistics

Constructing a Frequency Table Example
Step 1: Decide on the number of classes.
A useful recipe to determine the number of classes (k) is the "2 to the k rule": select the
smallest k such that \(2^k > n\).
There were 80 vehicles sold, so n = 80. If we try k = 6, which means we would use 6
classes, then \(2^6 = 64\), somewhat less than 80. Hence, 6 classes are not enough. If we let k = 7,
then \(2^7 = 128\), which is greater than 80. So the recommended number of classes is 7.
Step 2: Determine the class interval or width.
The formula is \(i \ge \frac{H - L}{k}\), where i is the class interval, H is the highest observed
value, L is the lowest observed value, and k is the number of classes.
($35,925 - $15,546)/7 = $2,911
Round up to some convenient number, such as a multiple of 10 or 100, and use a class width of
$3,000.
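Steps 1 and 2 are mechanical enough to script. A minimal Python sketch: rounding up to a multiple of 100 is one reading of "a convenient number", and the function and parameter names (`number_of_classes`, `class_width`, `multiple`) are hypothetical.

```python
import math


def number_of_classes(n):
    """'2 to the k' rule: the smallest k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def class_width(high, low, k, multiple=100):
    """Minimum width (H - L) / k, rounded UP to a convenient multiple."""
    return math.ceil((high - low) / k / multiple) * multiple


k = number_of_classes(80)           # 2**6 = 64 <= 80, 2**7 = 128 > 80
i = class_width(35_925, 15_546, k)  # 20,379 / 7 = 2,911.3 -> 3,000
print(k, i)  # 7 3000
```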
Step 3: Set the individual class limits



Constructing a Frequency Table

Step 4: Tally the vehicle selling prices into the classes.
Step 5: Count the number of items in each class.
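Steps 4 and 5 (tally, then count) amount to finding the class each value falls in. The 80 vehicle prices are not reproduced here, so this Python sketch uses ten hypothetical prices; the first lower limit of $15,000 is also an assumption, since the class limits from Step 3 are not reproduced. Each class includes its lower limit and runs "up to" its upper limit.

```python
def frequency_table(values, start, width, k):
    """Tally values into k classes [start + j*width, start + (j+1)*width)."""
    counts = [0] * k
    for v in values:
        j = int((v - start) // width)  # index of the class containing v
        if not 0 <= j < k:
            raise ValueError(f"{v} lies outside the classes")
        counts[j] += 1
    return [(start + j * width, start + (j + 1) * width, counts[j]) for j in range(k)]


prices = [15_546, 17_200, 19_850, 21_400, 22_990,
          23_500, 26_100, 28_750, 31_200, 35_925]  # hypothetical prices

table = frequency_table(prices, start=15_000, width=3_000, k=7)
for lower, upper, freq in table:
    print(f"${lower:,} up to ${upper:,}: {freq}")
```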

Relative Frequency Distribution
To convert a frequency distribution to a relative frequency distribution, each of the class
frequencies is divided by the total number of observations.
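In code the conversion is a single division per class. A sketch with hypothetical class frequencies totalling 80; the relative frequencies always sum to 1 (up to floating-point rounding).

```python
def relative_frequencies(freqs):
    """Divide each class frequency by the total number of observations."""
    total = sum(freqs)
    return [f / total for f in freqs]


freqs = [10, 20, 20, 15, 8, 5, 2]  # hypothetical frequencies, total 80
rel = relative_frequencies(freqs)
print([round(r, 4) for r in rel])  # [0.125, 0.25, 0.25, 0.1875, 0.1, 0.0625, 0.025]
print(round(sum(rel), 10))         # 1.0
```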


Graphic Presentation of a Frequency Distribution
The three commonly used graphic forms are:
1) Histograms
2) Frequency polygons
3) Cumulative frequency distributions

Histogram
A histogram for a frequency distribution based on quantitative data is very similar to a bar chart
showing the distribution of qualitative data. The classes are marked on the horizontal axis and
the class frequencies on the vertical axis. The class frequencies are represented by the heights of
the bars.

Histogram Using Excel

Frequency Polygon
A frequency polygon also shows the shape of a distribution and is similar to a histogram.
It consists of line segments connecting the points formed by the
intersections of the class midpoints and the class frequencies.
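The points of a frequency polygon are just (class midpoint, class frequency) pairs. A Python sketch assuming $3,000 classes starting at $15,000 and hypothetical frequencies; the optional zero-frequency points one class width beyond each end follow a common convention that makes the polygon meet the horizontal axis. The function name `polygon_points` is hypothetical.

```python
def polygon_points(lower_limits, width, freqs, close_ends=True):
    """(class midpoint, frequency) points for a frequency polygon."""
    mids = [lo + width / 2 for lo in lower_limits]
    points = list(zip(mids, freqs))
    if close_ends:
        # Common convention: add zero-frequency points one class width beyond
        # each end so the polygon meets the horizontal axis.
        points = [(mids[0] - width, 0)] + points + [(mids[-1] + width, 0)]
    return points


lowers = [15_000, 18_000, 21_000, 24_000, 27_000, 30_000, 33_000]
freqs = [10, 20, 20, 15, 8, 5, 2]  # hypothetical frequencies
pts = polygon_points(lowers, 3_000, freqs)
for x, y in pts:
    print(x, y)
```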







Cumulative Frequency Distribution
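A "less-than" cumulative frequency distribution records, for each upper class limit, how many observations fall below it; plotting these points gives the cumulative frequency polygon (ogive). A Python sketch using `itertools.accumulate`, with hypothetical frequencies and the assumed $3,000 classes starting at $15,000.

```python
from itertools import accumulate


def less_than_cumulative(upper_limits, freqs):
    """Pair each upper class limit with the count of observations below it."""
    return list(zip(upper_limits, accumulate(freqs)))


uppers = [18_000, 21_000, 24_000, 27_000, 30_000, 33_000, 36_000]
freqs = [10, 20, 20, 15, 8, 5, 2]  # hypothetical frequencies, total 80

cumulative = less_than_cumulative(uppers, freqs)
for upper, cum in cumulative:
    print(f"less than ${upper:,}: {cum}")
```

The last cumulative frequency always equals the total number of observations, which is a handy check on the tally.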




Standard Deviation
Merits
1) It is based on all items of the distribution.
2) It is amenable to algebraic treatment.
3) It is least affected by fluctuations in sampling.
4) It facilitates the calculation of the combined standard deviation of two or more groups.
5) It provides a unit of measurement for the normal distribution.
Demerits
1) It cannot be used for comparing the variability of two or more series of observations given
in different units.
2) It is difficult to compute as compared with other measures of dispersion.
3) It is very much affected by extreme values: because deviations are squared, values far from
the mean are given more importance than values near it.
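Several of the dispersion problems listed earlier reduce to short formulas. A Python sketch with hypothetical data, assuming population (divide-by-N) standard deviations throughout, which is what the combined-SD formula \(\sigma_{12} = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}\), with \(d_i = \bar{x}_i - \bar{x}_{12}\), expects. The coefficient of variation \(CV = \frac{\sigma}{\bar{x}} \times 100\) is unit-free, which addresses the demerit about series given in different units. The function names are illustrative.

```python
import math
import statistics


def data_range(xs):
    """Range = largest value - smallest value."""
    return max(xs) - min(xs)


def mean_deviation(xs):
    """Average absolute deviation from the arithmetic mean."""
    m = statistics.mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def coefficient_of_variation(xs):
    """CV = sigma / mean * 100, a unit-free (relative) measure."""
    return statistics.pstdev(xs) / statistics.mean(xs) * 100


def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined SD of two groups from their sizes, means and population SDs."""
    n = n1 + n2
    mean12 = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - mean12, mean2 - mean12
    return math.sqrt((n1 * (sd1**2 + d1**2) + n2 * (sd2**2 + d2**2)) / n)


a = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical group 1: mean 5, SD 2
b = [10, 12, 14]              # hypothetical group 2: mean 12

print(data_range(a))                # 7
print(mean_deviation(a))            # 12 / 8 = 1.5
print(coefficient_of_variation(a))  # 2 / 5 * 100 = 40.0

# The formula agrees with the SD of the pooled data.
sd12 = combined_sd(len(a), statistics.mean(a), statistics.pstdev(a),
                   len(b), statistics.mean(b), statistics.pstdev(b))
print(sd12, statistics.pstdev(a + b))
```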