
Lecture 2: Organizing Data, Measures of Central Tendency and Dispersion in Frequency Distributions

By: Dr. Saddam Hussain


Frequency Distributions

Organizing and graphing data reveals trends and patterns in the data. Organized data can help decision makers make educated guesses.
Data recorded in the sequence in which they are collected, before they are processed or ranked, are called raw data.
A frequency distribution is a table that organizes data into classes and shows the number of observations from the data that fall into each class. It makes meaningful patterns in the data visible.

Example

Somewhat   None       Somewhat   Very       Very       None
Very       Somewhat   Somewhat   Very       Somewhat   Somewhat
Very       Somewhat   None       Very       None       Somewhat
Somewhat   Very       Somewhat   Somewhat   Very       None
Somewhat   Very       Very       Somewhat   None       Somewhat

Construct a frequency distribution table for these data.

Solution

Table: Frequency Distribution of Stress on Job

Stress on Job   Tally              Frequency (f)
Very            |||| ||||          10
Somewhat        |||| |||| ||||     14
None            |||| |             6
                                   Sum = 30
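The tally above can be reproduced mechanically. The following is a minimal sketch in Python, assuming the 30 responses are typed in exactly as listed in the example; the variable names are illustrative and not part of the lecture.

```python
from collections import Counter

# The 30 responses from the example, row by row.
responses = [
    "Somewhat", "None", "Somewhat", "Very", "Very", "None",
    "Very", "Somewhat", "Somewhat", "Very", "Somewhat", "Somewhat",
    "Very", "Somewhat", "None", "Very", "None", "Somewhat",
    "Somewhat", "Very", "Somewhat", "Somewhat", "Very", "None",
    "Somewhat", "Very", "Very", "Somewhat", "None", "Somewhat",
]

# Counter tallies how many times each category occurs.
frequency = Counter(responses)

# Print the frequency distribution in the same order as the table.
for category in ("Very", "Somewhat", "None"):
    print(f"{category:<10}{frequency[category]:>3}")
print(f"{'Sum':<10}{sum(frequency.values()):>3}")
```

Counting by program gives the same frequencies as the hand tally (10, 14 and 6, summing to 30).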

Frequency Distributions (continued)

Discrete data: separate entities that do not progress from one class to the next without a break.
Continuous data: values that progress from one class to the next without a break.
Constructing a frequency distribution (FD)
1. Decide on the type and number of classes, typically 6 to 15 classes. The class intervals should be of equal size.
2. Sort the data points into the classes and illustrate them in a chart.
Calculation of class width:
$$\text{Width of class intervals} = \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}}$$
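The class-width calculation can be sketched in Python as follows. The smallest value (124), largest value (233) and the choice of five classes are assumptions taken from the class limits printed on the home-run histogram in this lecture. Rounding the width up to a whole number is a common convention rather than something the slides state, and the function names are hypothetical.

```python
import math


def class_width(largest, smallest, number_of_classes):
    """Width of class intervals = (largest - smallest) / number of classes."""
    return (largest - smallest) / number_of_classes


def class_limits(smallest, width, number_of_classes):
    """Return (lower, upper) limits of each class for integer-valued data."""
    return [
        (smallest + i * width, smallest + (i + 1) * width - 1)
        for i in range(number_of_classes)
    ]


raw_width = class_width(233, 124, 5)  # 109 / 5
width = math.ceil(raw_width)          # round up so every value fits in a class
limits = class_limits(124, width, 5)

for lower, upper in limits:
    print(f"{lower} - {upper}")
```

Under these assumptions the sketch yields the classes 124 - 145, 146 - 167, 168 - 189, 190 - 211 and 212 - 233.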
Organizing & Graphing Data

Graphs present data in a two-dimensional picture.
Horizontal axis: the values of the variable we are measuring.
Vertical axis: the frequencies of the classes shown on the horizontal axis.
• Graphing Data
– Histograms: a series of rectangles, each proportional in width to the range of values within a class and proportional in height to the number of items falling in the class.
– Polygons

Figure: Frequency histogram. Horizontal axis: total home runs (classes 124 - 145, 146 - 167, 168 - 189, 190 - 211, 212 - 233). Vertical axis: frequency.
Figure: Frequency polygon. Horizontal axis: classes 124 - 145, 146 - 167, 168 - 189, 190 - 211, 212 - 233. Vertical axis: frequency.
Figure: Frequency distribution curve. Vertical axis: frequency.
Measures of Central Tendency and Dispersion
Two characteristics of a frequency distribution are especially important when summarizing data or when making a prediction from one set of results to another:
Central tendency (CT): the middle point of a distribution.
Dispersion (D): the spread of the data in a distribution.
Two other characteristics of a data set provide useful information:
1. Skewness: the curve representing the data points in the data set may be either symmetrical or skewed.
2. Kurtosis: a measure of the peakedness of the distribution.
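Skewness and kurtosis can also be expressed as numbers. The sketch below uses the moment-based definitions (third and fourth central moments divided by powers of the second), which is one common convention and is not given in the slides. The data sets and function names are hypothetical.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over all values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """Moment-based skewness: 0 if symmetric, > 0 right-skewed, < 0 left-skewed."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment-based kurtosis: larger values indicate a sharper peak and heavier tails."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


symmetric = [1, 2, 3, 4, 5]            # hypothetical data
right_skewed = [1, 1, 2, 2, 3, 10]     # long tail to the right
left_skewed = [1, 8, 9, 9, 10, 10]     # long tail to the left

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    print(f"{name:<13} skewness={skewness(data):+.2f} kurtosis={kurtosis(data):.2f}")
```

With these definitions, the sign of the skewness indicates the direction of the longer tail.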
Figure: (a) A histogram skewed to the right. (b) A histogram skewed to the left.
Skewed Distributions
Figure panels: Voter Turnout - 1980; Voter Turnout - 1940.
Measures of Central Tendency and Dispersion (continued)
Three measures of central tendency are commonly used in statistical analysis: the mode, the median, and the mean.
Each measure is designed to represent a typical score.
The choice of which measure to use depends on:
• the shape of the distribution (whether normal or skewed), and
• the variable's "level of measurement" (whether the data are nominal, ordinal or interval).
1. Mean: the average, $\frac{\sum x}{N}$
2. Median: the central number, located at position $\frac{n + 1}{2}$ in the ordered data
3. Mode: the value repeated most often
Why can't the mean tell us everything?

• The mean describes central tendency: what the average outcome is.
• We also want to know how accurate the mean is when making predictions.
• The question becomes: how good is the mean as a description of central tendency, or as a predictor?
• Answer: it depends on the shape of the distribution. Is the distribution normal or skewed?
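The dependence on shape can be seen by comparing the mean and the median of a symmetric and a skewed data set. The sketch below uses two hypothetical data sets that differ only in their largest value.

```python
import statistics

symmetric = [30, 35, 40, 45, 50]   # hypothetical, evenly spread around 40
skewed = [30, 35, 40, 45, 250]     # same data with one extreme value

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name:<10} mean={mean} median={median}")
```

In the symmetric set the mean and the median coincide. In the skewed set the single extreme value pulls the mean well above the median, so the mean is a poorer description of a typical value.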
Measures of Variability

Central tendency doesn't tell us everything.
Dispersion (also called deviation or spread) tells us a lot about how a variable is distributed.
It deals with the average deviation from some measure of central tendency.
We are most interested in the standard deviation (σ) and the variance (σ²). The variance is the average squared distance of the observations in the data set from the mean of the distribution. The standard deviation is its square root, so it is expressed in the same units as the data.
Variance and Standard Deviation

• Instead of taking the absolute value, we square the deviations from the mean. This yields a non-negative value.
• This results in the measures we call the variance and the standard deviation.

Sample                    Population
s: Standard Deviation     σ: Standard Deviation
s²: Variance              σ²: Variance
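The squaring step can be followed in a short Python sketch. The data set is hypothetical. The slides do not show the divisors, so the sketch assumes the standard convention of dividing by N for a population and by n - 1 for a sample, and checks the results against Python's statistics module.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = statistics.mean(data)
# Square each deviation from the mean so negative and positive
# deviations do not cancel out.
squared_deviations = [(x - mean) ** 2 for x in data]

# Population: divide by N. Sample: divide by n - 1 (assumed convention).
population_variance = sum(squared_deviations) / len(data)
sample_variance = sum(squared_deviations) / (len(data) - 1)

# The standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(population_variance, population_sd)
print(round(sample_variance, 3), round(sample_sd, 3))
```

The hand computation agrees with statistics.pvariance and statistics.pstdev for the population, and with statistics.variance and statistics.stdev for the sample.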
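Using the Standard Deviation

• It enables us to determine with a great deal of accuracy where the values of a frequency distribution are located in relation to the mean.
Empirical rule (for bell-shaped, approximately normal distributions)
1. About 68% of the values in the population fall within ±1 standard deviation of the mean.
2. About 95% fall within ±2 standard deviations of the mean.
3. About 99.7% fall within ±3 standard deviations of the mean.
Chebyshev's Theorem (for any distribution)
At least 1 - 1/k² of the values fall within ±k standard deviations of the mean, for any k > 1: at least 75% within ±2 and at least 88.9% within ±3.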
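Statements of this kind can be checked on any data set by counting the share of values that lie within k standard deviations of the mean. The sketch below does this for a hypothetical data set and compares the result with the Chebyshev lower bound 1 - 1/k², which holds for any distribution when k > 1. The 68%, 95% and 99.7% figures apply only to approximately normal data. The function names are illustrative.

```python
import statistics


def share_within(values, k):
    """Fraction of values within k population standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)


def chebyshev_lower_bound(k):
    """Chebyshev's theorem: at least 1 - 1/k**2 of the values lie within k SDs."""
    return 1 - 1 / k ** 2


values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

for k in (2, 3):
    observed = share_within(values, k)
    bound = chebyshev_lower_bound(k)
    print(f"k={k}: observed share={observed:.3f}, Chebyshev bound={bound:.3f}")
```

The observed share can never fall below the Chebyshev bound, whatever the shape of the data.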
Standard Deviation (continued)
Standard Score (Z-Score)
• An extreme value, or outlier, is a value located far away from the mean. Z scores are useful in identifying outliers. The larger the absolute value of the Z score, the greater the distance from the value to the mean. The Z score is the difference between the value and the mean, divided by the standard deviation.

$$Z = \frac{X - \bar{X}}{S} = \frac{39.0 - 39.6}{6.77} = -0.09$$
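The worked value can be checked with a few lines of Python. This is a minimal sketch using the numbers from the example (value 39.0, mean 39.6, standard deviation 6.77); the function name is illustrative.

```python
def z_score(value, mean, sd):
    """Difference between the value and the mean, divided by the SD."""
    return (value - mean) / sd


z = z_score(39.0, 39.6, 6.77)
print(round(z, 2))
```

The result is negative because 39.0 lies below the mean, and its small magnitude shows that the value is far from being an outlier.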
Coefficient of Variation

The coefficient of variation is a relative measure of variation that is always expressed as a percentage rather than in terms of the units of the particular data.
The coefficient of variation is equal to the standard deviation divided by the mean, multiplied by 100%.

$$CV = \left(\frac{S}{\bar{X}}\right) 100\% = \left(\frac{6.77}{39.6}\right) 100\% = 17.10\%$$

The standard deviation is 17.1% of the size of the mean. The coefficient of variation is very useful when comparing two or more sets of data that are measured in different units.
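The same calculation as a minimal Python sketch, using the standard deviation (6.77) and mean (39.6) from the example; the function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation divided by the mean, expressed as a percentage."""
    return sd / mean * 100


cv = coefficient_of_variation(6.77, 39.6)
print(f"{cv:.2f}%")
```

Because the units cancel in the division, the result can be compared directly with the coefficient of variation of a data set measured in other units.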
