Chapter 2: DESCRIPTIVE STATISTICS
Upon completion of this chapter, you should be able to:
CHAPTER OVERVIEW
Chapter 1: Introduction
Chapter 2: Descriptive Statistics
Chapter 3: The Normal Distribution
Chapter 4: Hypothesis Testing
Chapter 5: T-test
Chapter 6: Oneway Analysis of Variance
Chapter 7: Correlation
Chapter 8: Chi-Square
Summary
Key Terms
References
This chapter discusses a variety of methods of displaying and describing data. The most
widely used are the measures of central tendency: the mean, median and mode.
Measures of spread or dispersion include the widely used range and standard
deviation. In terms of displaying data, the frequency table is discussed.
a) THE MEAN
The mean (or average) and the standard deviation are the most widely used statistical
tools in educational and psychological research. The mean is the most frequently used
measure of central tendency (or measure of centre), while the standard deviation is the
most frequently used measure of variability or dispersion.
Computing the Mean
The mean or X̄ (pronounced "X bar") is the figure obtained when the sum of all the
items in the group is divided by the number of items (N). Say, for example, you have
the scores of 10 students on a science test.
The sum (Σ) of all the ten scores = 23 + 22 + 26 + 21 + 30 + 24 + 20 + 27 + 25 + 32
= 250
$$\bar{X} = \frac{\sum X}{N} = \frac{250}{10} = 25.0$$
Notation:
Σ denotes the sum of a set of values or scores
N represents the number of values or scores in a population
n represents the number of values or scores in a sample.
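The computation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic using the ten science scores from the text; the variable names are chosen here for readability and are not part of the original chapter.

```python
# Scores of the 10 students on the science test (taken from the text).
scores = [23, 22, 26, 21, 30, 24, 20, 27, 25, 32]

total = sum(scores)   # the sum of all scores (sigma X)
n = len(scores)       # the number of scores (N)
mean = total / n      # the mean (X bar) = sum of scores / number of scores

print(total, n, mean)
```

Running the sketch reproduces the hand calculation: a sum of 250 over 10 scores gives a mean of 25.0.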
In the computation of the mean, every item counts. As a result, extreme values at
either end of the group or series of scores severely affect the value of the mean. The
mean can be "pulled towards" the extreme scores, which may give a
distorted picture of the group or series of scores or data.
However, in general, the mean is a good measure of central tendency for
roughly symmetric distributions but can be misleading in skewed distributions (see
example below) since it can be greatly influenced by extreme scores.
b) THE MEDIAN
The Median is the score found at the exact middle of the set of values. It is the middle
value when the values or scores are arranged in order of increasing (or decreasing)
magnitude. One way to compute the median is to list all scores in ascending order,
and then locate the score in the centre of the sample. For example, if we order the
following 7 scores as shown below, we would get:
12, 18, 22, 25, 30, 37, 40
Score 25 is the median because it represents the halfway point for the distribution of
scores.
15, 15, 15, 20, 20, 21, 25, 36
There are 8 scores and the fourth score (20) and the fifth score (20) represent the
halfway point. Since both of these scores are 20, the median is 20. If the two middle
scores had different values, you would have to interpolate to determine the median;
the usual convention is to take the average of the two middle scores.
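The procedure for locating the median can be sketched as follows. The helper name `median` is hypothetical, and the sketch assumes the common convention of averaging the two middle scores when the number of scores is even.

```python
def median(scores):
    """Return the middle value of a list of scores (illustrative helper)."""
    ordered = sorted(scores)          # arrange the scores in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle score
    # even count: average the two middle scores (assumed convention)
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_scores = [12, 18, 22, 25, 30, 37, 40]        # 7 scores from the text
even_scores = [15, 15, 15, 20, 20, 21, 25, 36]   # 8 scores from the text

print(median(odd_scores))
print(median(even_scores))
```

For the two sets of scores in the text the sketch gives 25 and 20 respectively, matching the values found by inspection.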
c) THE MODE
The Mode is the most frequently occurring value in the set of scores. To determine
the mode, you might again order the scores as shown below, and then count each one.
15, 15, 15, 20, 20, 21, 25, 36
The most frequently occurring value is the mode. In our example, the value 15 occurs
three times and is the mode. In some distributions there is more than one modal value.
For instance, in a bimodal distribution there are two values that occur most frequently.
If the distribution is truly normal (i.e., bell-shaped), the mean, median and mode are
all equal to each other.
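Counting how often each score occurs can be sketched with Python's standard `collections.Counter`. The helper name `modes` is hypothetical; it returns a list so that a bimodal set yields both modal values. The second data set is invented here purely to illustrate the bimodal case.

```python
from collections import Counter

def modes(scores):
    """Return every value that occurs most frequently (illustrative helper)."""
    counts = Counter(scores)              # how many times each score occurs
    highest = max(counts.values())        # the largest frequency
    return sorted(value for value, count in counts.items() if count == highest)

scores = [15, 15, 15, 20, 20, 21, 25, 36]   # the scores from the text
bimodal = [15, 15, 20, 20, 21]              # hypothetical bimodal example

print(modes(scores))
print(modes(bimodal))
```

The first call returns the single mode 15; the second returns both 15 and 20, since each occurs twice.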
If the measure of central tendency you have selected does not give a good indication
of the typical score, then you have most probably chosen the wrong one.
The MEAN is the most frequently used measure of central tendency and it
should be used if you are satisfied that it gives a good indication of the typical score
in your sample. However, there is a problem with the mean. Since it uses all the
scores in a distribution, it is sensitive to extreme scores.
Example:
The mean for this set of nine scores:
(20 + 22 + 25 + 26 + 30 + 31 + 33 + 40 + 42) / 9 = 269 / 9
is 29.89.
If we were to change the last score from 42 to 70, see what happens to the mean.
The mean:
(20 + 22 + 25 + 26 + 30 + 31 + 33 + 40 + 70) / 9 = 297 / 9 is 33.00.
Obviously, this mean is not a good indication of the typical score in this set of data.
The extreme score has changed the mean from 29.89 to 33.00. If these were test
scores, it may give the impression that students performed better in the latter test
when in fact only one student scored high.
NOTE: Keep in mind this characteristic when interpreting the means
obtained from a set of data.
If you find that you have extreme scores and you are unable to use the mean,
then you should use the MEDIAN. The median is not sensitive to extreme scores. If
you examine the above example, the median is 30 in both distributions. The reason is
simply that the median score does not depend on the actual scores themselves beyond
putting them in ascending order. So the last score in a distribution could be 80, 150 or
5000 and the median still would not change. It is this insensitivity to extreme scores
that makes the median useful when you cannot use the mean.
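The contrast between the two measures can be checked with a short sketch. It uses the two sets of nine scores from the example above and the `median` function from Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import median

original = [20, 22, 25, 26, 30, 31, 33, 40, 42]       # scores from the example
with_extreme = [20, 22, 25, 26, 30, 31, 33, 40, 70]   # last score changed to 70

# The mean uses every score, so the extreme value pulls it upwards.
mean_original = round(sum(original) / len(original), 2)
mean_extreme = round(sum(with_extreme) / len(with_extreme), 2)

# The median depends only on the ordering, so it stays where it was.
median_original = median(original)
median_extreme = median(with_extreme)

print(mean_original, mean_extreme)
print(median_original, median_extreme)
```

The mean moves from 29.89 to 33.0, while the median is 30 for both sets.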
Measures of Variability or Dispersion
Variability or Dispersion refers to the spread of the values around the central
tendency. There are two common measures of dispersion, the range and the standard
deviation.
a) RANGE
The range is simply the highest value minus the lowest value. For example, in a
distribution, if the highest value is 36 and the lowest is 15, the range is 36 - 15 = 21.
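As a minimal sketch, the range can be computed directly from the highest and lowest scores. The list below is the set of eight scores used earlier in the chapter, on the assumption that this is the distribution the example refers to (its highest value is 36 and its lowest is 15).

```python
# Assumed to be the distribution referred to in the example above.
scores = [15, 15, 15, 20, 20, 21, 25, 36]

highest = max(scores)
lowest = min(scores)
score_range = highest - lowest   # range = highest value - lowest value

print(highest, lowest, score_range)
```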
b) STANDARD DEVIATION
The Standard Deviation is a more accurate and detailed estimate of dispersion
than the range, because an outlier can greatly exaggerate the range. The Standard
Deviation shows the relation that a set of scores has to the mean of the sample. For
instance, when you give a test, there is bound to be variation in the scores obtained
by students. Variability (or variation, or dispersion) is determined by the distance of
a particular score from the 'norm' or measure of central tendency such as the mean.
The standard deviation is a statistic that shows the extent of variability or variation
for a given series of scores from the mean.
The standard deviation makes use of the deviations of the individual scores
from the mean. Each individual deviation is squared to avoid the problem of
plus and minus signs cancelling each other out. The standard deviation is the most
often used measure of variability or variation in educational and psychological research.
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$
Interpretation of the Formula
The standard deviation (s) is found by
taking the difference between the mean (X̄) and each item, (X - X̄),
squaring this difference, (X - X̄)²,
summing all the squared differences, Σ(X - X̄)²,
dividing by the number of scores (N) minus 1,
extracting the square root.
Column 1     Column 2             Column 3
X            X - X̄                (X - X̄)²
23           23 - 25 = -2          4
22           22 - 25 = -3          9
26           26 - 25 = +1          1
21           21 - 25 = -4         16
30           30 - 25 = +5         25
24           24 - 25 = -1          1
20           20 - 25 = -5         25
27           27 - 25 = +2          4
25           25 - 25 =  0          0
32           32 - 25 = +7         49
                     Σ(X - X̄)² = 134
$$\text{Std. Dev} = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} = \sqrt{\frac{134}{10 - 1}} = \sqrt{\frac{134}{9}} = 3.8586$$
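The worked example can be reproduced step by step with the sketch below. It uses the ten science scores from the text and follows the same order of operations (deviation, square, sum, divide by N - 1, square root); the variable names are illustrative.

```python
import math

scores = [23, 22, 26, 21, 30, 24, 20, 27, 25, 32]   # scores from the text

n = len(scores)
mean = sum(scores) / n                        # X bar = 25.0

deviations = [x - mean for x in scores]       # Column 2: X - X bar
squared = [d ** 2 for d in deviations]        # Column 3: (X - X bar) squared
sum_of_squares = sum(squared)                 # sum of the squared deviations

variance = sum_of_squares / (n - 1)           # divide by N - 1
std_dev = math.sqrt(variance)                 # extract the square root

print(sum_of_squares, round(std_dev, 4))
```

The sum of squared deviations comes to 134 and the standard deviation to 3.8586, as in the table. Python's standard `statistics.stdev` function uses the same N - 1 divisor and could be used as a cross-check.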
[Figure: Class A, standard deviation]
In Class B, there is low variance (a small standard deviation), which explains why
most of the scores are clustered, or 'bunched', around the mean; i.e. most of the
scores lie between -3 and +3 of the mean. If the mean is 50,
95% of the students scored between 47 and 53.
[Figure: Class B, standard deviation]
LEARNING ACTIVITY
Below are the scores obtained by students in two classes
on a history test:
Class A marks: 15, 25, 20, 20, 18, 22, 16, 24, 28, 12
Class B marks: 10, 30, 13, 27, 16, 24, 5, 35, 28, 12
a) Compute the mean of the two classes
b) Compute the standard deviation of the two classes
c) Explain the implication of differences in standard
deviations
Frequency Distribution
A frequency distribution is a way of displaying numbers in an organised
manner so that questions can be answered easily. A frequency distribution is simply a
table that, at minimum, displays how many times each response or "score" occurs in
a data set. A good frequency distribution will display more information than this,
although with just this minimum information many other pieces of information can be
computed.
TABLES
Tables can contain a great deal of information but they also take up a lot of space and
may overwhelm readers with detail. How should tables be represented in a manner
that can be easily understood? In general frequency tables are best for variables with
different numbers of categories (see Table below).
Category           Frequency   Percent   Valid Percent   Cumulative Percent
Strongly agree         1          7.7         7.7               7.7
Agree                  3         23.1        23.1              30.8
[label missing]        4         30.8        30.8              61.5
[label missing]        5         38.5        38.5             100.0
Total                 13        100.0       100.0
The table above summarises the responses of 13 teachers with regard to the teaching
of sex education in secondary school.
The first column contains the values or categories of the variable (opinion on
teaching sex education in schools, i.e. extent of agreement).
The frequency column indicates the number of people in each category.
The percentage column lists the percentage of the whole sample in each
category. These percentages are based on the total sample size, including
those who did not answer the question.
The valid percentage column contains the percentage of those who gave a
valid response to the question that belong to each category. This is the
percentage that is normally used.
The cumulative percentage column provides the rolling addition of
percentages from the first category to the last valid category. For example, 7.7
percent of teachers strongly agree that sex education should be taught in
secondary school. A further 23.1 percent simply agree that sex education
should be taught. The cumulative percentage column adds up the percentage
who strongly agree with those who agree (7.7 + 23.1 = 30.8). Thus 30.8
percent at least agree (agree or strongly agree) that sex education should be
taught in secondary school.
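The columns described above can be derived from the raw responses, as the following sketch suggests. The numeric codes are an assumption made here for illustration (1 = strongly agree, 2 = agree, 3 and 4 = the two remaining categories, whose labels are not given in the text); the counts of 1, 3, 4 and 5 are those in the table. Because there are no missing responses in this example, the valid percent equals the percent and is not computed separately.

```python
from collections import Counter

# Hypothetical coding of the 13 teachers' responses:
# 1 = strongly agree, 2 = agree, 3 and 4 = the remaining two categories.
responses = [1] + [2] * 3 + [3] * 4 + [4] * 5

counts = Counter(responses)      # frequency of each category
total = len(responses)           # total sample size (13)

rows = []
running_count = 0
for category in sorted(counts):
    frequency = counts[category]
    running_count += frequency                       # rolling addition
    percent = round(100 * frequency / total, 1)
    cumulative = round(100 * running_count / total, 1)
    rows.append((category, frequency, percent, cumulative))

for row in rows:
    print(row)
```

Each printed row holds the category code, frequency, percent and cumulative percent, and the figures agree with the table (for example, 7.7 + 23.1 gives a cumulative 30.8 percent for the second category).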
When using frequency tables, you should ensure that the information is accurate, avoid
clutter and, most important of all, make sure that the table is informative.
SPSS PROCEDURE:
To obtain a frequency table, measures of central tendency and
variability
1. Select the Analyze menu
2. Click on Descriptive Statistics and then on
Frequencies... to open the Frequencies dialogue box
3. Select the variable(s) you require (i.e. opinion on sex
education) and click on the button to move the variable into
the Variable(s): box
4. Click on the Statistics... command pushbutton to open the
Frequencies: Statistics sub-dialogue box
5. In the Central Tendency box, select the Mean, Median
and Mode check boxes
6. In the Dispersion box, select the Std. deviation and
Range check boxes
7. Click on Continue
8. Click on OK
Graphs
Graphs are widely used in describing data but they should be used appropriately. There
is a tendency for graphs to be cluttered, confusing and downright misleading.
A) BAR CHARTS
The following are elements of a graph that should be given due consideration:
The X-Axis represents the values of the variable being displayed. The x-axis
may be divided into discrete categories (bar charts) or into a range of
continuous values (line graphs). Which units are used depends on the level of
measurement of the variable being graphed.
In the example below, the x-axis represents car type according to country of
origin and is shown horizontally on the graph.
[Figure: bar chart. X-axis: CAR TYPE (Brit, Ger, Fren, Jap, Kor, Swed); Y-axis: Percent (0 to 60)]
o German cars were preferred by 50% of respondents while 40%
preferred French cars.
o Only 20% of respondents preferred British cars compared to 30% who
preferred Korean cars.
B) HISTOGRAM
Histograms are different from bar charts because they are used to display a
continuous interval-level variable (see histogram below).
The X-Axis represents the values of the variable being displayed. The x-axis is
arranged as continuous values.
The x-axis represents the different age groups and each bar represents one age
group in ascending order and is placed at the bottom of the graph horizontally.
[Figure: histogram. X-axis: AGE GROUP (18-28, 29-39, 40-50, 51-61, 62-72, 73+); Y-axis: Percent (0 to 60)]
o Among the 18-28 age group only 20% agreed that sex education
should be taught in schools compared to 60% in the 51-61 age group
who felt that sex education should be taught.
o About 40% in the 40-50 age group and 50% in the 29-39 age group agreed
that sex education should be taught in secondary schools.
o Only 10% of those 73 years and older agreed that secondary school
students should be taught sex education.
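A histogram groups a continuous variable into intervals along the x-axis. The sketch below illustrates only that grouping step, using the age groups from the histogram above. The list of ages is entirely hypothetical, and the text bar drawn with `#` characters is a stand-in for a plotted column.

```python
# Hypothetical ages of 12 respondents (not data from the chapter).
ages = [19, 23, 31, 35, 38, 44, 47, 52, 58, 60, 66, 75]

# Age groups from the histogram; None marks the open-ended last group (73+).
bins = [(18, 28), (29, 39), (40, 50), (51, 61), (62, 72), (73, None)]

def label(low, high):
    """Build the group label shown on the x-axis."""
    return f"{low}+" if high is None else f"{low}-{high}"

counts = {label(low, high): 0 for low, high in bins}
for age in ages:
    for low, high in bins:
        if age >= low and (high is None or age <= high):
            counts[label(low, high)] += 1   # place the age in its group
            break

total = len(ages)
for group, count in counts.items():
    percent = 100 * count / total
    print(f"{group:>6} {'#' * count} {percent:.1f}%")
```

Each respondent falls into exactly one age group, so the group counts add up to the sample size.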
C) LINE GRAPHS
The line graph serves a similar function to a histogram. It should be used for
continuous interval-level variables. The main differences between a line graph and a
histogram are that on a line graph the frequency of any value on the x-axis is
represented by a point on a line rather than by a single column, and the values of the
continuous variable are not automatically grouped into a smaller number of groups as
they are in histograms. As such, the line graph reflects the frequency of every value of the
x variable and thus avoids potential distortions due to the way in which values are
grouped.
[Figure: line graph with separate lines for Females and Males. X-axis: Frequency (None, Monthly, Fortnightly, Weekly, 2-3 Weekly); Y-axis: Percent (0 to 60)]
The line graph above shows the frequency of using the library among a group
of male and female respondents. The level of measurement of the y variable (the
variable being graphed, here frequency of library use) is ordinal or interval. Line
graphs are more suitable for y variables that have more than
five or six categories. They are less suited to variables with a very large number of
values as this can produce a very jagged and confusing graph.
Since a separate line is produced for each category of the x variable (the grouping
variable, here sex), only x variables with a small number of categories should be used.
This will normally mean that the x variable is a nominal or ordinal variable.
The y axis on a line graph should have a percentage scale rather than a
frequency scale. The appearance of the lines in a line graph will be misrepresented if
frequencies are used.
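One reason for preferring percentages is that the groups being compared may differ in size. The sketch below uses hypothetical counts (100 female and 50 male respondents, invented for illustration) across the library-use categories shown on the graph; the helper name `to_percent` is also hypothetical.

```python
# Library-use categories from the x-axis of the line graph.
categories = ["None", "Monthly", "Fortnightly", "Weekly", "2-3 Weekly"]

# Hypothetical raw frequencies; the female group is twice the size of the male group.
frequencies = {
    "Females": [10, 20, 30, 25, 15],
    "Males": [5, 10, 15, 12, 8],
}

def to_percent(freqs):
    """Convert raw frequencies to percentages of the group total."""
    total = sum(freqs)
    return [round(100 * f / total, 1) for f in freqs]

percents = {group: to_percent(freqs) for group, freqs in frequencies.items()}

for group in frequencies:
    print(group, frequencies[group], percents[group])
```

Plotted as raw frequencies, the male line would sit at about half the height of the female line throughout, suggesting a difference that only reflects group size. On a percentage scale the two lines nearly coincide, which is the fairer comparison.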
LEARNING ACTIVITY
Interpret the line graph showing how frequently a group of
respondents visit the library. A separate line is used for
male and female respondents.
SUMMARY
With inferential statistics, you are trying to reach conclusions based on the
sample that extend beyond the immediate data alone.
The mean (or average) and the standard deviation are the most widely used
statistical tools in educational and psychological research.
The mean or X̄ (pronounced "X bar") is the figure obtained when the sum of
all the items in the group is divided by the number of items (N).
The median is a measure of centre that is the middle value when the values or
scores are arranged in order of increasing (or decreasing) magnitude.
The Mode is the most frequently occurring value in the set of scores.
The range of a set of values or scores is the difference between the highest
value and the lowest value.
Histograms are different from bar charts because they are used to display a
continuous interval-level variable.
The line graph serves a similar function to a histogram. It should be used for
continuous interval-level variables.
KEY TERMS
Distribution
Measure of central tendency
Mean
Median
Mode
Measures of variability
Range
Standard deviation
Frequency
Bar chart
Line graph