Chapter 2 Descriptive Statistics

Upon completion of this chapter, you should be able to:
Explain what descriptive statistics are
Compute the mean
Compute the standard deviation
Explain the implication of differences in standard deviations
Identify the median and the mode
Explain the types of charts used to display data
CHAPTER OVERVIEW
What is descriptive statistics?
Measures of central tendency
o The Mean
o The Median
o The Mode
Measures of variability or dispersion
o Range
o Standard deviation
Frequency Table
Summary Table
Charts
Chapter 1: Introduction
Chapter 3: The Normal Distribution
Chapter 4: Hypothesis Testing
Chapter 5: T-test
Chapter 6: Oneway Analysis of Variance
Chapter 7: Correlation
Chapter 8: Chi-Square
Summary
Key Terms
References
This chapter discusses a variety of methods of displaying and describing data. The most
widely used are the measures of central tendency: the mean, median and mode.
Widely used measures of spread or dispersion include the range and the standard
deviation. For displaying data, the chapter covers the frequency table and charts such as
bar charts, histograms and line graphs.
What is Descriptive Statistics?
Descriptive statistics are used to summarise a collection of data and present it in a
way that is clearly understood. For example, suppose a researcher administered a scale
to measure self-esteem among 500 teenagers. How might these measurements be
summarised? There are two basic methods: numerical and graphical. Using the
numerical approach, one might compute the mean and the standard deviation. Using
the graphical approach, one might create a frequency table, a bar chart, a line graph or
a box plot. These graphical methods display detailed information about the
distribution of the scores. Graphical methods are better suited than numerical methods
for identifying patterns in the data, whereas numerical approaches are more precise and
objective.
Descriptive statistics are typically distinguished from inferential statistics.
With descriptive statistics you are simply describing what is or what the data shows
based on the sample. With inferential statistics, you are trying to reach conclusions
based on the sample that extend beyond the immediate data alone. For instance, we
use inferential statistics to try to infer from the sample data what the population might
think. Or, we use inferential statistics to make judgments of the probability that an
observed difference between groups is a dependable one or one that might have
happened by chance in this study. Thus, we use inferential statistics to make
inferences from our data to more general conditions; we use descriptive statistics
simply to describe what's going on in our data.
Descriptive Statistics are used to present quantitative descriptions in a
manageable form. In a research study we may have lots of measures. Or we may
measure a large number of people on any measure. Descriptive statistics help us to
simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots
of data into a simpler summary. For instance, consider the Grade Point Average
(GPA). This single number describes the general performance of a student across a
potentially wide range of course experiences. The single number describes a large
number of discrete events such as the grade obtained for each subject taken. However,
every time you try to describe a large set of observations with a single indicator you
run the risk of distorting the original data or losing important detail. The GPA doesn't
tell you whether the student was in difficult courses or easy ones, or whether they
were courses in their major field or in other disciplines. Even given these limitations,
descriptive statistics provide a powerful summary that may enable comparisons across
people or other units.
Measures of Central Tendency or Measures of Centre
A measure of central tendency or measure of centre is a value at the centre or middle
of a distribution of data or scores. There are several different ways to determine the
centre, such as the mean, median and mode. We begin with the mean.
a) THE MEAN
The mean (or average) and the standard deviation are the most widely used statistical
tools in educational and psychological research. The mean is the most frequently used
measure of central tendency (or measure of centre), while the standard deviation is the
most frequently used measure of variability or dispersion.
Computing the Mean
The mean, or X̄ (pronounced "X bar"), is the figure obtained when the sum of all the
items in the group is divided by the number of items (N). Say, for example, you have
the scores of 10 students on a science test.
The sum (Σ) of all ten scores = 23 + 22 + 26 + 21 + 30 + 24 + 20 + 27 + 25 + 32
= 250
$$
\bar{X} = \frac{\sum X}{N} = \frac{250}{10} = 25.0
$$
Notation:
Σ denotes the sum of a set of values or scores
N represents the number of values or scores in a population
n represents the number of values or scores in a sample.
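The calculation above can be sketched in a few lines of Python. The chapter itself contains no code, so the language and the function name `mean` are our own illustrative choices, not part of any particular package:

```python
def mean(scores):
    """Return the arithmetic mean: the sum of the scores (ΣX) divided by their count (N)."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)


# Science test scores of the 10 students in the example above
science_scores = [23, 22, 26, 21, 30, 24, 20, 27, 25, 32]
print(sum(science_scores))   # 250
print(mean(science_scores))  # 25.0
```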
In the computation of the mean, every item counts. As a result, extreme values at
either end of the group or series of scores strongly affect the value of the mean. The
mean can be "pulled" towards the extreme scores, which may give a distorted picture
of the group or series of scores.
However, in general, the mean is a good measure of central tendency for
roughly symmetric distributions but can be misleading in skewed distributions (see
example below) since it can be greatly influenced by extreme scores.
b) THE MEDIAN
The median is the score found at the exact middle of the set of values. It is the middle
value when the values or scores are arranged in order of increasing (or decreasing)
magnitude. One way to compute the median is to list all scores in ascending order
and then locate the score in the centre of the sample. For example, if we order the
following 7 scores, we get:
12, 18, 22, 25, 30, 37, 40
Score 25 is the median because it represents the halfway point for the distribution of
scores.
Look at this set of 8 scores, arranged in ascending order. What is the median score?
15, 15, 15, 20, 20, 21, 25, 36
There are 8 scores, so the fourth score (20) and the fifth score (20) represent the
halfway point. Since both of these scores are 20, the median is 20. If the two middle
scores had different values, you would have to interpolate between them; the usual
approach is to take the average of the two middle scores.
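A minimal Python sketch of this procedure: sort the scores, then take the middle one. For an even number of scores it averages the two middle scores, which is the usual way of interpolating between them. The function name `median` is our own illustration:

```python
def median(scores):
    """Return the middle score after sorting; for an even count, average the two middle scores."""
    if not scores:
        raise ValueError("median requires at least one score")
    ordered = sorted(scores)          # arrange in ascending order first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: a single middle score
        return ordered[middle]
    # even count: the two middle scores sit at 0-based positions middle - 1 and middle
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([12, 18, 22, 30, 25, 37, 40]))      # 25
print(median([15, 15, 20, 15, 20, 21, 25, 36]))  # 20.0
```

Because the function sorts its input, the scores can be passed in any order.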
c) THE MODE
The mode is the most frequently occurring value in the set of scores. To determine
the mode, you might again order the scores as shown below and then count how often
each one occurs.
15, 15, 15, 20, 20, 21, 25, 36
In our example, the value 15 occurs three times, more often than any other value, so
15 is the mode. In some distributions there is more than one modal value.
For instance, in a bimodal distribution there are two values that occur most frequently.
If the distribution is truly normal (i.e., bell-shaped), the mean, median and mode are
all equal to each other.
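Counting occurrences is easy to sketch with Python's `collections.Counter`. The helper name `modes` is hypothetical; it returns every value tied for the highest count, so a bimodal set yields two values. (Python 3.8 and later also offer `statistics.multimode`, which does the same job.)

```python
from collections import Counter


def modes(scores):
    """Return every score that occurs most often (more than one value for a bimodal set)."""
    if not scores:
        raise ValueError("modes requires at least one score")
    counts = Counter(scores)                  # score -> number of occurrences
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([15, 15, 15, 20, 20, 21, 25, 36]))  # [15]
print(modes([3, 3, 5, 7, 7]))                   # [3, 7]  (bimodal)
```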
SHOULD YOU USE THE MEAN OR THE MEDIAN?

The mean and the median are the two common measures of central tendency (or of the
typical score in a sample) discussed so far. Which of these two should you use when
describing your data? It depends on your data. In other words, you should ask yourself
whether the measure of central tendency you have selected gives a good indication of
the typical score in your sample. If you suspect that it does not, then you have most
probably chosen the wrong one.
The MEAN is the most frequently used measure of central tendency, and it
should be used if you are satisfied that it gives a good indication of the typical score
in your sample. However, there is a problem with the mean. Since it uses all the
scores in a distribution, it is sensitive to extreme scores.
Example:
The mean of this set of nine scores,
(20 + 22 + 25 + 26 + 30 + 31 + 33 + 40 + 42) / 9 = 269 / 9,
is 29.89.
If we change the last score from 42 to 70, see what happens to the mean:
(20 + 22 + 25 + 26 + 30 + 31 + 33 + 40 + 70) / 9 = 297 / 9 = 33.00
Obviously, this mean is not a good indication of the typical score in this set of data.
The single extreme score has changed the mean from 29.89 to 33.00. If these were test
scores, the new mean may give the impression that students performed better on the
latter test when in fact only one student scored higher.
NOTE: Keep in mind this characteristic when interpreting the means
obtained from a set of data.
If you find that you have extreme scores and are unable to use the mean,
then you should use the MEDIAN. The median is not sensitive to extreme scores. If
you examine the above example, the median is 30 in both distributions. The reason is
simply that the median does not depend on the actual values of the scores beyond
putting them in ascending order. So the last score in this distribution could be 80, 150
or 5000 and the median still would not change. It is this insensitivity to extreme scores
that makes the median useful when you cannot use the mean.
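The comparison above can be reproduced with Python's standard `statistics` module, which provides `mean` and `median`. This is only a check of the worked example, not a new method:

```python
from statistics import mean, median

original = [20, 22, 25, 26, 30, 31, 33, 40, 42]
with_extreme = [20, 22, 25, 26, 30, 31, 33, 40, 70]  # last score changed from 42 to 70

for label, scores in [("original", original), ("with extreme score", with_extreme)]:
    # the mean is rounded to 2 decimal places for display only
    print(f"{label}: mean = {mean(scores):.2f}, median = {median(scores)}")

# original: mean = 29.89, median = 30
# with extreme score: mean = 33.00, median = 30
```

The mean moves by more than three points while the median stays at 30.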
Measures of Variability or Dispersion
Variability or Dispersion refers to the spread of the values around the central
tendency. There are two common measures of dispersion, the range and the standard
deviation.
a) RANGE
The range is simply the highest value minus the lowest value. For example, in a
distribution where the highest value is 36 and the lowest is 15, the range is 36 - 15 = 21.
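As a one-line Python sketch, the range is the maximum minus the minimum. The name `score_range` is our own; it avoids shadowing Python's built-in `range`:

```python
def score_range(scores):
    """Return the range: the highest score minus the lowest score."""
    if not scores:
        raise ValueError("score_range requires at least one score")
    return max(scores) - min(scores)


print(score_range([15, 15, 15, 20, 20, 21, 25, 36]))  # 21
```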
b) STANDARD DEVIATION
The standard deviation is a more accurate and detailed estimate of dispersion than the
range, because a single outlier can greatly exaggerate the range. The standard deviation
shows the relation that a set of scores has to the mean of the sample. For instance, when
you give a test, there is bound to be variation in the scores obtained by students.
Variability (also called variation or dispersion) is determined by the distance of a
particular score from the 'norm' or measure of central tendency, such as the mean. The
standard deviation is a statistic that shows the extent of variability or variation of a
given series of scores about the mean.
The standard deviation makes use of the deviations of the individual scores
from the mean. Each individual deviation is then squared so that positive and negative
deviations do not cancel each other out. The standard deviation is the most often used
measure of variability or variation in educational and psychological research.
The following is the formula for calculating the standard deviation:
$$
s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}
$$
Interpretation of the Formula
The standard deviation (s) is found by:
taking the difference between the mean (X̄) and each item, X - X̄
squaring this difference, (X - X̄)²
summing all the squared differences, Σ(X - X̄)²
dividing by the number of scores (N) minus 1
extracting the square root.
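A minimal Python sketch of these steps, assuming (as the formula does) that the scores are a sample, so the sum of squares is divided by N - 1. The function name `standard_deviation` is hypothetical; the scores are those of the mathematics test in the worked example that follows:

```python
import math


def standard_deviation(scores):
    """Sample standard deviation, following the steps listed above (divisor N - 1)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to divide by N - 1")
    x_bar = sum(scores) / n                    # the mean
    deviations = [x - x_bar for x in scores]   # difference between each item and the mean
    squared = [d ** 2 for d in deviations]     # square each difference
    total = sum(squared)                       # Σ(X - X̄)²
    variance = total / (n - 1)                 # divide by N - 1
    return math.sqrt(variance)                 # extract the square root


math_scores = [23, 22, 26, 21, 30, 24, 20, 27, 25, 32]
print(round(standard_deviation(math_scores), 4))  # 3.8586
```

Python's standard `statistics.stdev` uses the same N - 1 divisor, so it should agree with this function.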
Computing the Standard Deviation

Example: A mathematics test was given to a group of 10 students and their scores are
shown in Column 1. The mean of these scores is 25.

Column 1     Column 2              Column 3
X            X - X̄                 (X - X̄)²
23           23 - 25 = -2          4
22           22 - 25 = -3          9
26           26 - 25 = +1          1
21           21 - 25 = -4          16
30           30 - 25 = +5          25
24           24 - 25 = -1          1
20           20 - 25 = -5          25
27           27 - 25 = +2          4
25           25 - 25 = 0           0
32           32 - 25 = +7          49
                       Σ(X - X̄)² = 134
Applying the formula:
$$
s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} = \sqrt{\frac{134}{10 - 1}} = \sqrt{\frac{134}{9}} = 3.8586
$$
Differences In Standard Deviations

A mathematics test was administered to Class A and Class B, and the distributions of
the scores are shown below.
In Class A, the scores are widely spread out, which means that there is high variance,
or a bigger standard deviation; i.e. most of the scores lie between -6 and +6 points of
the mean. If the mean is 50, then you can say that 95% of the students scored between
44 and 56.
[Figure: Distribution of scores for Class A (axis: Standard Deviation)]
In Class B, there is low variance, or a small standard deviation, which explains why
most of the scores are bunched closely around the mean; i.e. most of the scores lie
between -3 and +3 points of the mean. If the mean is 50, 95% of the students scored
between 47 and 53.
[Figure: Distribution of scores for Class B (axis: Standard Deviation)]
LEARNING ACTIVITY
Below are the scores obtained by students in two classes
on a history test:
Class A marks: 15, 25, 20, 20, 18, 22, 16, 24, 28, 12
Class B marks: 10, 30, 13, 27, 16, 24, 5, 35, 28, 12
a) Compute the mean of the two classes
b) Compute the standard deviation of the two classes
c) Explain the implication of differences in standard
deviations
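Once you have worked parts (a) and (b) by hand, you could check your answers with Python's standard `statistics` module. Its `stdev` function computes the sample standard deviation with the N - 1 divisor, matching the formula used in this chapter. The results are deliberately not shown here:

```python
from statistics import mean, stdev

class_a = [15, 25, 20, 20, 18, 22, 16, 24, 28, 12]
class_b = [10, 30, 13, 27, 16, 24, 5, 35, 28, 12]

# statistics.stdev is the sample standard deviation (divisor N - 1)
for name, marks in [("Class A", class_a), ("Class B", class_b)]:
    print(f"{name}: mean = {mean(marks):.2f}, SD = {stdev(marks):.2f}")
```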
Frequency Distribution
Frequency distribution is a way of displaying numbers in an organised
manner so that questions can be answered easily. A frequency distribution is simply a
table that, at minimum, displays how many times each response or "score" occurs in a
data set. A good frequency distribution will display more information than this,
although many other pieces of information can be computed from this minimum
information alone.
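The minimum frequency distribution described above, a count of how often each response occurs, can be sketched with Python's `collections.Counter`. The responses below are hypothetical, invented only for illustration:

```python
from collections import Counter

# Hypothetical responses, for illustration only
responses = ["Agree", "Disagree", "Agree", "Strongly Agree", "Disagree", "Agree"]

frequencies = Counter(responses)   # response -> number of times it occurs
for response, count in frequencies.most_common():
    print(f"{response:<15}{count}")
```

`most_common()` lists the responses from the most to the least frequent.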
TABLES
Tables can contain a great deal of information but they also take up a lot of space and
may overwhelm readers with detail. How should tables be represented in a manner
that can be easily understood? In general frequency tables are best for variables with
different numbers of categories (see Table below).
Question: Should Sex Education be Taught in Secondary School?

Response             Frequency   Percent   Valid Percent   Cumulative Percent
Strongly Agree       1           7.7       7.7             7.7
Agree                3           23.1      23.1            30.8
Disagree             4           30.8      30.8            61.5
Strongly Disagree    5           38.5      38.5            100.0
Total                13          100.0     100.0
The table above summarises the responses of 13 teachers with regards to the teaching
of sex education in secondary school.
The first column contains the values or categories of the variable (opinion on
teaching sex education in schools, i.e. the extent of agreement).
The frequency column indicates the number of people in each category.
The percentage column lists the percentage of the whole sample in each
category. These percentages are based on the total sample size, including
those who did not answer the question.
The valid percentage column contains the percentage of those who gave a
valid response to the question that belong to each category. This is the
percentage that is normally used.
The cumulative percentage column provides the rolling addition of
percentages from the first category to the last valid category. For example, 7.7
percent of teachers strongly agree that sex education should be taught in
secondary school. A further 23.1 percent simply agree that sex education
should be taught. The cumulative percentage column adds the percentage who
strongly agree to the percentage who agree (7.7 + 23.1 = 30.8). Thus 30.8
percent at least agree (agree or strongly agree) that sex education should be
taught in secondary school.
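The column definitions above can be sketched in Python, using the frequencies from the table. The function name `frequency_table` and its `missing` parameter (the number of people who did not answer) are our own illustrative assumptions; the table itself has no missing responses, so Percent and Valid Percent coincide:

```python
def frequency_table(counts, missing=0):
    """Return rows of (category, frequency, percent, valid percent, cumulative percent).

    counts  : category -> frequency, in the order the categories should appear
    missing : number of people who did not answer (included in the Percent base only)
    """
    valid_total = sum(counts.values())
    total = valid_total + missing
    rows, running = [], 0
    for category, freq in counts.items():
        running += freq  # accumulate frequencies, not rounded percentages
        rows.append((
            category,
            freq,
            round(100 * freq / total, 1),           # Percent: base includes non-respondents
            round(100 * freq / valid_total, 1),     # Valid Percent: valid responses only
            round(100 * running / valid_total, 1),  # Cumulative Percent of valid responses
        ))
    return rows


opinion = {"Strongly Agree": 1, "Agree": 3, "Disagree": 4, "Strongly Disagree": 5}
for row in frequency_table(opinion):
    print(row)
```

Cumulative percentages are built from the running frequency rather than by adding the rounded percentages: adding 7.7 + 23.1 + 30.8 would give 61.6 instead of the 61.5 shown in the table.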
When using frequency tables, you should ensure that the information is accurate and
avoid clutter; most important of all, make sure that the table is informative.
SPSS PROCEDURE:
To obtain a frequency table and measures of central tendency and
variability
1. Select the Analyze menu.
2. Click on Descriptive Statistics and then on Frequencies... to open
the Frequencies dialogue box.
3. Select the variable(s) you require (i.e. opinion on sex
education) and click on the arrow button to move the variable into
the Variable(s): box.
4. Click on the Statistics... command pushbutton to open the
Frequencies: Statistics sub-dialogue box.
5. In the Central Tendency box, select the Mean, Median
and Mode check boxes.
6. In the Dispersion box, select the Std. deviation and
Range check boxes.
7. Click on Continue.
8. Click on OK.
Graphs
Graphs are widely used in describing data, but they should be used appropriately.
There is a tendency for graphs to be cluttered, confusing and downright misleading.
A) BAR CHARTS
The following are elements of a graph that should be given due consideration:
The X-Axis represents the values of the variable being displayed. The x-axis
may be divided into discrete categories (bar charts) or into a range of
continuous values (line graphs). Which units are used depends on the level of
measurement of the variable being graphed.
In the example below, the x-axis represents car type according to country of
origin and is shown on the graph horizontally.
[Figure: Preference for Type of Cars. Bar chart; x-axis: CAR TYPE (Brit, Ger, Fren, Jap, Kor, Swed); y-axis: Percent (0 to 60)]
The Y-Axis represents either percentages or frequencies. The y-axis is usually
placed perpendicular to the x-axis.
Interpretation of the graph on Preference for Type of Car According to
Country of Origin
o About 60% of respondents preferred Japanese cars compared to only
10% who preferred Swedish cars.
o German cars were preferred by 50% of respondents, while 40%
preferred French cars.
o Only 20% of respondents preferred British cars compared to 30% who
preferred Korean cars.
B) HISTOGRAM
Histograms are different from bar charts because they are used to display a
continuous interval-level variable (see histogram below).
The X-Axis represents the values of the variable being displayed. The x-axis is
arranged as continuous values.
In the example below, the x-axis represents the different age groups. Each bar
represents one age group, the bars are arranged in ascending order of age, and the
axis is placed horizontally at the bottom of the graph.
[Figure: Percentage Who Agreed that Sex Education Should be Taught in Secondary School. Histogram; x-axis: AGE GROUP (18-28, 29-39, 40-50, 51-61, 62-72, 73+); y-axis: Percent (0 to 60)]
The Y-Axis represents either percentages or frequencies. The y-axis is usually
placed perpendicular to the x-axis.
Interpretation of the graph on Sex Education Should be Taught in Secondary
School
o Among the 18-28 age group, only 20% agreed that sex education
should be taught in schools, compared to 60% in the 51-61 age group.
o About 40% in the 40-50 age group and 50% in the 29-39 age group agreed
that sex education should be taught in secondary schools.
o Only 10% of those 73 years and older agreed that secondary school
students should be taught sex education.
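The grouping step behind a histogram, sorting each continuous value into one interval, can be sketched in Python. The ages are hypothetical, invented only to illustrate the step (they are not the survey data behind the figure), and the sketch assumes ages are recorded in whole years so that every age falls into exactly one of the age groups used in the graph:

```python
# Hypothetical ages in whole years, for illustration only
ages = [19, 23, 31, 35, 38, 44, 47, 52, 55, 58, 60, 66, 70, 75, 81]

# Age groups as (label, lowest age, highest age); None means "no upper limit"
age_groups = [("18-28", 18, 28), ("29-39", 29, 39), ("40-50", 40, 50),
              ("51-61", 51, 61), ("62-72", 62, 72), ("73+", 73, None)]


def group_counts(values, groups):
    """Count how many values fall into each group (each value lands in at most one group)."""
    counts = {label: 0 for label, _, _ in groups}
    for v in values:
        for label, low, high in groups:
            if v >= low and (high is None or v <= high):
                counts[label] += 1
                break
    return counts


print(group_counts(ages, age_groups))
# {'18-28': 2, '29-39': 3, '40-50': 2, '51-61': 4, '62-72': 2, '73+': 2}
```

Each count becomes the height of one bar; dividing by the total number of ages would give the percentage scale used in the figure.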
C) LINE GRAPHS
The line graph serves a similar function to a histogram. It should be used for
continuous interval-level variables. The main differences between a line graph and a
histogram are that on a line graph the frequency of any value on the x-axis is
represented by a point on a line rather than by a single column, and the values of the
continuous variable are not automatically grouped into a smaller number of groups as
they are in histograms. As such, the line graph reflects the frequency of every value of
the x variable and thus avoids potential distortions due to the way in which values are
grouped.
[Figure: Frequency of library use by sex. Line graph with separate lines for Females and Males; x-axis: Frequency (None, Monthly, Fortnightly, Weekly, 2-3 Weekly); y-axis: Percent (0 to 60)]
The line graph above shows the frequency of using the library among a group
of male and female respondents. The level of measurement of the y-axis variable is
ordinal or interval. Line graphs are more suitable for y-variables that have more than
five or six categories. They are less suited to variables with a very large number of
values, as this can produce a very jagged and confusing graph.
Since a separate line is produced for each category of the x variable, only x
variables with a small number of categories should be used. This will normally mean
that the x variable is a nominal or ordinal variable.
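The y-axis on a line graph should have a percentage scale rather than a
frequency scale. The appearance of the lines in a line graph will be misrepresented if
frequencies are used: if one group is larger than the other (for example, more female
than male respondents), its line will sit higher simply because of its size rather than
because of a real difference between the groups.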
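Converting each group's counts into percentages of that group's own total is what puts the lines on a comparable scale. A minimal Python sketch, using hypothetical counts invented to show the effect (they are not the data behind the figure):

```python
# Hypothetical counts of library use, for illustration only
library_use = {
    "Females": {"None": 10, "Monthly": 30, "Fortnightly": 40, "Weekly": 60, "2-3 Weekly": 60},
    "Males":   {"None": 20, "Monthly": 20, "Fortnightly": 5,  "Weekly": 3,  "2-3 Weekly": 2},
}


def row_percentages(counts_by_group):
    """Convert each group's counts to percentages of that group's own total."""
    result = {}
    for group, counts in counts_by_group.items():
        group_total = sum(counts.values())
        result[group] = {category: round(100 * n / group_total, 1)
                         for category, n in counts.items()}
    return result


for group, percents in row_percentages(library_use).items():
    print(group, percents)
```

In raw counts the 200 females outnumber the 50 males in almost every category, but as percentages 40% of males and only 5% of females never use the library, a difference that a frequency scale would hide.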
LEARNING ACTIVITY
Interpret the line graph showing how often a group of
respondents visit the library. A separate line is used for
male and female respondents.
SUMMARY
Descriptive statistics are used to summarise a collection of data and present it in a
way that is clearly understood.
With inferential statistics, you are trying to reach conclusions based on the
sample that extend beyond the immediate data alone.
The mean or average and the standard deviation are the most widely used
statistical tools in educational and psychological research.
The mean, or X̄ (pronounced "X bar"), is the figure obtained when the sum of
all the items in the group is divided by the number of items (N).
The median is a measure of centre that is the middle value when the values or
scores are arranged in order of increasing (or decreasing) magnitude.
The mode is the most frequently occurring value in the set of scores.
The range of a set of values or scores is the difference between the highest
value and the lowest value.
The standard deviation is a measure of variation of values or scores about the
mean.
Frequency distribution is a way of displaying numbers in an organised manner
so questions can be answered easily.
Bar charts are used to represent a frequency distribution of a categorical
variable, with the x-axis divided into discrete categories.
Histograms are different from bar charts because they are used to display a
continuous interval-level variable.
The line graph serves a similar function to a histogram. It should be used for
continuous interval-level variables.
KEY TERMS
Distribution
Measure of central tendency
Mean
Median
Mode
Measures of variability
Range
Standard deviation
Frequency
Bar chart
Line graph
