
W4 L8 Learning Outcomes
Measures of Position:
Describe and calculate the values for
Quartile
Interquartile Range, IQR
Quartile Deviation, QD
Percentile
Interpret the meaning.
Measures of Shape:
Describe and calculate the value for
Skewness
Interpret the meaning.

SQQS1013 W4 L8 ZZ
Quartiles, IQR
Quartiles are three summary measures that divide a
ranked data set into four equal parts, each holding 25% of the data (100%/4 = 25%).

2nd quartile (Q2) = median

$$\text{Depth of } Q_1 = \frac{n+1}{4}$$

$$\text{Depth of } Q_3 = \frac{3(n+1)}{4}$$

$$\text{IQR} = Q_3 - Q_1$$
*Grouped data: (p19)
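The depth rule above can be sketched in Python. This is a minimal sketch, not the course's prescribed method: it assumes linear interpolation between neighbouring ranked values when the depth is fractional (as in the worked examples in this lesson), and the function name `quartile_by_depth` is hypothetical.

```python
def quartile_by_depth(data, k):
    """Return Q_k (k = 1, 2, 3) using depth k(n + 1)/4 and linear interpolation.

    Hypothetical helper for illustration only.
    """
    values = sorted(data)                 # Step 1: rank the data
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values for the (n + 1)/4 depth rule")
    depth = k * (n + 1) / 4               # 1-based position (depth) of Q_k
    lower = int(depth)                    # ranked position just below the depth
    fraction = depth - lower              # how far to move towards the next value
    if fraction == 0:
        return values[lower - 1]          # depth falls exactly on a data value
    below, above = values[lower - 1], values[lower]
    return below + fraction * (above - below)


if __name__ == "__main__":
    odd = [9, 7, 5, 1, 3, 10, 7]
    even = [16, 12, 35, 21, 11, 44, 293, 50, 42, 39]
    for data in (odd, even):
        q1, q2, q3 = (quartile_by_depth(data, k) for k in (1, 2, 3))
        print(q1, q2, q3, "IQR =", q3 - q1)
```

Note that the function returns the quartile value, not its depth: the depth only tells us where to look in the ranked list.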
Quartiles, IQR
The first quartile (Q1) is at the 25% mark, the second quartile
(Q2) is at 50% mark, and the third quartile (Q3) is at the 75% mark.

Approx. 25% of values in a ranked data set are less than Q1, and
about 75% are greater than Q1.






The interquartile range (IQR) is the range from Q1 to Q3:
IQR = Q3 - Q1.
The IQR is the range of the middle 50% of the data.
It is used to identify outliers and to measure variability
in Exploratory Data Analysis (EDA).
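As one illustration of how the IQR can flag outliers, the sketch below uses the widely used 1.5 x IQR fences (often called Tukey's rule). That multiplier is a common convention and an assumption here, not something stated on this slide; the quartiles Q1 = 15 and Q3 = 45.5 are taken from the second worked example in this lesson, and the function name `iqr_outliers` is hypothetical.

```python
def iqr_outliers(data, q1, q3, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is a common convention (assumed here, not from the slides).
    """
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


data = [16, 12, 35, 21, 11, 44, 293, 50, 42, 39]
print(iqr_outliers(data, q1=15, q3=45.5))  # → [293]
```

With IQR = 30.5 the fences are 15 - 45.75 = -30.75 and 45.5 + 45.75 = 91.25, so only 293 is flagged.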
Examples
Find the values of the three quartiles and the interquartile range for the
following data set.
9 7 5 1 3 10 7
Step 1: arrange the data set in increasing order
1 3 5 7 7 9 10
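Step 2: use the formulas to find Q1, Q2, Q3 and IQR. n = 7 (odd)

$$\text{Position of } Q_1 = \frac{n+1}{4} = \frac{7+1}{4} = 2$$
Thus Q1 is the value in the 2nd position: Q1 = 3.

$$\text{Position of } Q_2 = \frac{2(n+1)}{4} = \frac{7+1}{2} = 4$$
Thus Q2 is the value in the 4th position: Q2 = 7.

$$\text{Position of } Q_3 = \frac{3(n+1)}{4} = \frac{3(7+1)}{4} = 6$$
Thus Q3 is the value in the 6th position: Q3 = 9.

IQR = Q3 - Q1 = 9 - 3 = 6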

(Remember: Q1, Q2 and Q3 are values, NOT positions!)
Examples
Find the values of the three quartiles and the interquartile range for the following
data set.
16 12 35 21 11 44 293 50 42 39

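Step 1: arrange the data set in increasing order.

Position:  1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th
Value:     11   12   16   21   35   39   42   44   50   293

Step 2: use the formulas to find Q1, Q2, Q3 and IQR. n = 10 (even)

$$\text{Position of } Q_1 = \frac{n+1}{4} = \frac{10+1}{4} = 2.75\text{th}$$

The 2.75th position lies 0.75 of the way from the 2nd value (12) to the 3rd value (16):
Q1 = 12 + 0.75(16 - 12) = 15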

Examples

$$\text{Position of } Q_2 = \frac{2(n+1)}{4} = \frac{10+1}{2} = 5.5\text{th}$$

$$\text{Position of } Q_3 = \frac{3(n+1)}{4} = \frac{3(10+1)}{4} = 8.25\text{th}$$

Q2 = 35 + 0.5(39 - 35) = 37
Q3 = 44 + 0.25(50 - 44) = 45.5
IQR = Q3 - Q1 = 45.5 - 15 = 30.5
(Remember: use the values, NOT the positions or depths!)
Quartile Deviation (FYI)
One way to avoid the problem of outliers affecting the
measure of dispersion (e.g. range) is to use the quartile
deviation.
QD is used to compare the level of dispersion between
data sets.
A relatively large quartile deviation indicates that there is
a relatively large level of dispersion in a data set.
QD is also known as the semi-interquartile range, since it is half of the IQR:
QD = IQR/2 = (Q3 - Q1)/2
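The QD formula is simple arithmetic on the quartiles. A minimal sketch, using quartiles taken from two worked examples in this lesson; the function name `quartile_deviation` is hypothetical.

```python
def quartile_deviation(q1, q3):
    """QD = (Q3 - Q1) / 2, i.e. half of the interquartile range."""
    return (q3 - q1) / 2


# Quartiles taken from two worked examples in this lesson
qd_scores = quartile_deviation(15, 45.5)          # 10-value data set
qd_revenue = quartile_deviation(79.525, 102.125)  # Example 28 revenue (RM million)
print(f"QD (10-value data set) = {qd_scores:.2f}")
print(f"QD (revenue)           = {qd_revenue:.2f}")
```

When comparing QDs between data sets, the data sets should be measured in the same units; the two values above are shown only to illustrate the calculation.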
Percentiles





The 25th percentile is also known as the first quartile (Q1), the 50th
percentile as the median or second quartile (Q2), and the 75th
percentile as the third quartile (Q3).
Percentiles measure position from the bottom.
They are most often used to determine the relative standing of an individual in a
population, or the individual's rank position.

$$\text{Percentile rank of } x_i = \frac{\text{Number of values less than } x_i}{\text{Total number of values in the data set}} \times 100$$

Example
To find the percentile rank of a score, x_i, out of a set of n scores,
where x_i itself is not included in the count:

Example: If Jason graduated 25th out of a class of 150 students,
then 125 students were ranked below Jason. Jason's percentile
rank would be:

$$\frac{125}{150} \times 100 = 83.33$$

Jason's standing in the class is at the 83rd percentile: he ranked higher than
about 83% of the graduates.

Source: Algebra Lesson Page, last accessed on 2 Oct 2011.
http://regentsprep.org/REgents/math/ALGEBRA/AD6/quartiles.htm
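The percentile rank formula can be sketched in two forms: one that counts values below x_i in a data set, and one for a ranked list such as a graduating class, where position 1 is the top. Both function names are hypothetical, and the second form assumes no ties in the ranking.

```python
def percentile_rank(data, x):
    """Percentage of values in data that are strictly less than x."""
    below = sum(1 for value in data if value < x)
    return below / len(data) * 100


def percentile_rank_from_position(position, class_size):
    """Ranked-list version (position 1 = top); assumes no ties."""
    below = class_size - position          # people ranked below this position
    return below / class_size * 100


print(round(percentile_rank_from_position(25, 150), 2))  # → 83.33 (Jason)
print(percentile_rank([11, 12, 16, 21, 35, 39, 42, 44, 50, 293], 39))  # → 50.0
```

In the second call, five of the ten values (11, 12, 16, 21, 35) are below 39, so its percentile rank is 50.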
Example 28
1. The total revenue (in RM million) for the 12 top
tourism companies in Malaysia is listed below. Find the
values of the three quartiles and the IQR for the data below.
109.7 79.9 74.1 121.2 76.4
80.2 82.1 79.4 89.3 98.0
103.5 86.8
Remember:
Arrange data in increasing order!
Solution:
Step 1: Arrange the data in increasing order.
74.1 76.4 79.4 79.9 80.2 82.1
86.8 89.3 98.0 103.5 109.7 121.2

Step 2: Determine the depths of Q1 and Q3.

$$\text{Depth of } Q_1 = \frac{n+1}{4} = \frac{12+1}{4} = 3.25$$

$$\text{Depth of } Q_3 = \frac{3(n+1)}{4} = \frac{3(12+1)}{4} = 9.75$$
Solution:
Step 3: Determine Q1 and Q3

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3
98.0 103.5 109.7 121.2

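Q1 (depth 3.25) lies 0.25 of the way from the 3rd value (79.4) to the 4th value (79.9):
Q1 = 79.4 + 0.25(79.9 - 79.4) = 79.525

Q3 (depth 9.75) lies 0.75 of the way from the 9th value (98.0) to the 10th value (103.5):
Q3 = 98.0 + 0.75(103.5 - 98.0) = 102.125

Therefore, IQR = Q3 - Q1 = 102.125 - 79.525
= 22.6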
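As a cross-check, Python's standard-library function `statistics.quantiles` with `method="exclusive"` (its default) places the cut points at positions i(n + 1)/4 and interpolates linearly between neighbouring values, which appears to match the depth rule used in this lesson. A short sketch with the Example 28 revenue figures:

```python
import statistics

# Total revenue (RM million) of the 12 top tourism companies, Example 28
revenue = [109.7, 79.9, 74.1, 121.2, 76.4, 80.2,
           82.1, 79.4, 89.3, 98.0, 103.5, 86.8]

# "exclusive" uses depths i(n + 1)/4; the data need not be pre-sorted
q1, q2, q3 = statistics.quantiles(revenue, n=4, method="exclusive")
print(f"Q1 = {q1:.3f}, Q2 = {q2:.3f}, Q3 = {q3:.3f}, IQR = {q3 - q1:.1f}")
```

Other software packages use different quartile conventions, so their results may differ slightly from the hand calculation.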
For Grouped Data
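Quartile:
We can get Q1 and Q3 using the equations:

$$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F_{Q_1}}{f_{Q_1}}\right) i$$

$$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F_{Q_3}}{f_{Q_3}}\right) i$$

where L is the lower class boundary of the quartile class, F is the cumulative frequency of the classes before the quartile class, f is the frequency of the quartile class, and i is the class width.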

(Recall its similarity to the equation for finding the median.)
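The grouped-data formula can be sketched as follows. This is a minimal sketch under stated assumptions: classes have equal width and integer limits such as 11 - 20, so the lower class boundary L is the lower limit minus 0.5 (as in the worked solutions, e.g. L = 10.5 for 11 - 20), and the width i is upper limit - lower limit + 1. The function name `grouped_quartile` is hypothetical.

```python
from itertools import accumulate


def grouped_quartile(classes, freqs, k):
    """Q_k for grouped data: L + ((k*n/4 - F) / f) * i, with k = 1, 2 or 3.

    Hypothetical helper. Assumes equal-width classes with integer limits,
    so the lower class boundary is the lower limit minus 0.5.
    """
    n = sum(freqs)
    position = k * n / 4                    # grouped data uses kn/4, not k(n + 1)/4
    cumulative = list(accumulate(freqs))    # cumulative frequency column
    # Quartile class: first class whose cumulative frequency reaches the position
    idx = next(j for j, cf in enumerate(cumulative) if cf >= position)
    lower_limit, upper_limit = classes[idx]
    boundary = lower_limit - 0.5            # L: lower class boundary
    width = upper_limit - lower_limit + 1   # i: class width
    cf_before = cumulative[idx - 1] if idx > 0 else 0  # F
    class_freq = freqs[idx]                 # f
    return boundary + (position - cf_before) / class_freq * width


travel_classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
travel_freqs = [8, 14, 12, 9, 7]
print(round(grouped_quartile(travel_classes, travel_freqs, 1), 4))  # → 13.7143
print(round(grouped_quartile(travel_classes, travel_freqs, 3), 4))  # → 34.3889
```

The same call with the Price (RM) classes and frequencies from the exercise in this lesson gives the quartiles of that data set.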
Example:
Find Q1 and Q3 for the following data.

Time to travel to work    Frequency
1 - 10                    8
11 - 20                   14
21 - 30                   12
31 - 40                   9
41 - 50                   7
Solution
Time to travel to work    Frequency    Cumulative Frequency
1 - 10                    8            8
11 - 20                   14           22
21 - 30                   12           34
31 - 40                   9            43
41 - 50                   7            50
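The cumulative frequency column is a running total of the frequencies, and the quartile class is the first class whose cumulative frequency reaches the quartile position. A short sketch of both steps with the travel-time frequencies, using the standard-library functions `itertools.accumulate` and `bisect.bisect_left`:

```python
from bisect import bisect_left
from itertools import accumulate

# Frequencies for "Time to travel to work": classes 1-10, 11-20, ..., 41-50
freqs = [8, 14, 12, 9, 7]
cumulative = list(accumulate(freqs))   # running totals: [8, 22, 34, 43, 50]
n = cumulative[-1]                     # total frequency

for k in (1, 3):
    position = k * n / 4
    # bisect_left returns the first index whose cumulative frequency >= position
    idx = bisect_left(cumulative, position)
    print(f"Q{k}: position {position}, quartile class = class {idx + 1}")
```

This reproduces the class choices used in the solution: position 12.5 falls in the 2nd class and position 37.5 in the 4th.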
Solution
$$\text{Position of } Q_1 = \frac{n}{4} = \frac{50}{4} = 12.5$$

Class Q1 is the 2nd class. Therefore,

$$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F_{Q_1}}{f_{Q_1}}\right) i = 10.5 + \left(\frac{12.5 - 8}{14}\right)(10) = 13.7143$$

Note the different formulations for ungrouped (p5) and grouped data: the position is (n + 1)/4 for ungrouped data but n/4 for grouped data.
Solution
$$\text{Position of } Q_3 = \frac{3n}{4} = \frac{3(50)}{4} = 37.5$$

Class Q3 is the 4th class. Therefore,

$$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F_{Q_3}}{f_{Q_3}}\right) i = 30.5 + \left(\frac{37.5 - 34}{9}\right)(10) = 34.3889$$
Exercise:
Find Q1 and Q3 for the following data.

Price (RM)    Frequency
12 - 14       5
15 - 17       14
18 - 20       25
21 - 23       7
24 - 26       6
27 - 29       3
Solution
Price (RM)    Frequency    Cumulative Frequency
12 - 14       5            5
15 - 17       14           19
18 - 20       25           44
21 - 23       7            51
24 - 26       6            57
27 - 29       3            60
Solution

$$\text{Position of } Q_1 = \frac{n}{4} = \frac{60}{4} = 15$$

Class Q1 is the 2nd class. Therefore,

$$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F_{Q_1}}{f_{Q_1}}\right) i = 14.5 + \left(\frac{15 - 5}{14}\right)(3) = 16.6429$$
Solution

$$\text{Position of } Q_3 = \frac{3n}{4} = \frac{3(60)}{4} = 45$$

Class Q3 is the 4th class. Therefore,

$$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F_{Q_3}}{f_{Q_3}}\right) i = 20.5 + \left(\frac{45 - 44}{7}\right)(3) = 20.9286$$
Measures of Shape
A distribution, or data set, is symmetric if it looks the same
to the left as it does to the right of the centre point.
A normal distribution is bell-shaped and
symmetric.
However, some data sets are not symmetric: they have a
longer tail on either the left-hand or the right-hand side.
Such data sets are said to be skewed.
Skewness is a measure of symmetry, or more precisely,
the lack of symmetry.
Skewness
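The skewness coefficient is also called
Pearson's coefficient of skewness:

$$S_k = \frac{\text{Mean} - \text{Mode}}{s} \quad \text{or} \quad S_k = \frac{3(\text{Mean} - \text{Median})}{s}$$

where s is the standard deviation.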
If Sk is positive (Sk > 1.0): right skewed.
If Sk is negative (Sk < -1.0): left skewed.
If Sk = 0: symmetric.
If -1.0 < Sk < 0 or 0 < Sk < 1.0: approximately symmetric.

Example
The duration of stay in wards for a sample of patients in
Hospital Seberang Jaya was recorded in days. From the
record, the mean is 28 days, the median is 25 days and
the mode is 23 days. Given the standard deviation is 4.2
days,
(a) What is the type of the distribution?
(b) Find the skewness coefficient.

Solution:

This distribution is right skewed because the mean is
the largest value (mean > median > mode).

$$S_k = \frac{\text{Mean} - \text{Mode}}{s} = \frac{28 - 23}{4.2} = 1.1905$$

OR

$$S_k = \frac{3(\text{Mean} - \text{Median})}{s} = \frac{3(28 - 25)}{4.2} = 2.1429$$

From the positive Sk value, this distribution is right skewed.
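The two Pearson coefficients and the rules of thumb listed earlier can be sketched as follows, using the hospital figures. The function names are hypothetical, and treating the boundary values Sk = 1.0 and Sk = -1.0 as approximately symmetric is an assumption, since the rules do not cover them.

```python
def pearson_skewness_mode(mean, mode, s):
    """Sk = (Mean - Mode) / s."""
    return (mean - mode) / s


def pearson_skewness_median(mean, median, s):
    """Sk = 3(Mean - Median) / s."""
    return 3 * (mean - median) / s


def classify_skewness(sk):
    """Rules of thumb from this lesson; Sk = +/-1.0 treated as approx. symmetric (assumption)."""
    if sk == 0:
        return "symmetric"
    if sk > 1.0:
        return "right skewed"
    if sk < -1.0:
        return "left skewed"
    return "approximately symmetric"


# Hospital Seberang Jaya: mean 28, median 25, mode 23, s = 4.2 days
for sk in (pearson_skewness_mode(28, 23, 4.2), pearson_skewness_median(28, 25, 4.2)):
    print(round(sk, 4), classify_skewness(sk))
```

Both coefficients are positive and above 1.0 here, so both versions lead to the same conclusion.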

Kurtosis
Kurtosis is a measure of whether the data are peaked or
flat relative to a normal distribution.
High kurtosis: a distinct peak near the mean that declines
rather rapidly, with heavy tails.
Low kurtosis: a flat top near the mean rather than a
sharp peak.
The kurtosis is classified as:
Mesokurtic (neither high nor low)
Leptokurtic (a thin, tall peak)
Platykurtic (a peak lower than that of a mesokurtic distribution)
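The slide classifies kurtosis only in words. One common way to put a number on it, assumed here since the lesson gives no formula, is the moment coefficient of excess kurtosis, m4/m2^2 - 3, which is about 0 for a normal (mesokurtic) distribution, positive for leptokurtic data and negative for platykurtic data. The function names, the sample data and the tolerance `tol` are hypothetical.

```python
def excess_kurtosis(data):
    """Moment coefficient of excess kurtosis: m4 / m2**2 - 3 (population moments).

    Assumes the data are not all equal (otherwise m2 = 0).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2 - 3


def classify_kurtosis(excess, tol=0.5):
    """Label the shape; tol is an arbitrary, hypothetical cut-off around 0."""
    if excess > tol:
        return "leptokurtic"
    if excess < -tol:
        return "platykurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5]                         # evenly spread, no distinct peak
peaked = [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]     # sharp peak at 0, heavy tails
for name, sample in (("flat", flat), ("peaked", peaked)):
    g2 = excess_kurtosis(sample)
    print(name, round(g2, 2), classify_kurtosis(g2))
```

Statistical software often reports a bias-corrected sample version of this coefficient, so its values can differ slightly for small samples.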

W4 L8: Revisions
Chapter 1: Introduction to Statistics
Chapter 2: Descriptive Statistics
Statistics
  Theoretical Statistics: development, derivation and proof of theorems,
  formulas, rules and laws.
  Applied Statistics: applications of those theorems, formulas, rules and
  laws to solve real problems.
    Descriptive Statistics: methods for collecting, organizing, analyzing and
    summarizing data obtained from either a sample or a population.
    Inferential Statistics: methods that use results obtained from a sample
    to derive conclusions about a population.












[Figure extracted from Bowman's website (2009)]
Survey
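Survey = data collection from a population or sample.
Census survey: data are collected from the whole population; a characteristic
or measure of the population is a parameter.
Sample survey: data are collected from a sample; a characteristic or measure
of the sample is a statistic.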
Types of Variables
[Figure: types of variables]
Discrete variables: can only be whole numbers.
Continuous variables: can have any numerical value.
Qualitative Variables
Qualitative variables can be further categorized as either nominal or
ordinal.
Nominal variables are variables that have two or more categories
but which do not have an intrinsic order.
For example, a real estate agent could classify their types of property
into distinct categories such as houses, condos, or bungalows. So "type
of property" is a nominal variable with 3 categories called houses,
condos, and bungalows. Can you think of other examples?
Ordinal variables are variables that have two or more categories, just
like nominal variables, except that the categories can also be ordered or
ranked.
So if I ask whether you like Statistics and your possible responses are "Not very
much", "OK" or "Yes, a lot", then we have an ordinal variable.
However, whilst we can rank the levels, we cannot place a "value" on
them; we cannot say that "OK" is twice as positive as "Not very much",
for example.
Quantitative Variables
Quantitative variables can be further categorized as either
interval or ratio variables.
Interval variables are variables that can be measured along
a continuum and they have a numerical value.
For example, temperature measured in degrees Celsius or Fahrenheit is an interval
variable: the difference between 20°C and 30°C is the same as that between 30°C and 40°C.
However, temperature measured in degrees Celsius or Fahrenheit is NOT a ratio variable.
Ratio variables are interval variables but with the added condition
that 0 (zero) of the measurement indicates that there is none of that
variable.
So, temperature measured in degrees Celsius or Fahrenheit is not a ratio variable
because 0°C does not mean there is no temperature.
However, temperature measured in Kelvin is a ratio variable as 0 Kelvin (often
called absolute zero) indicates that there is no temperature whatsoever.
Other examples of ratio variables include height, mass, distance and many more.
The name "ratio" reflects the fact that you can use the ratio of measurements. So,
for example, a distance of 10 metres is twice the distance of 5 metres.
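The interval/ratio distinction can be checked with a little arithmetic. A minimal sketch, using the standard offset of 273.15 between the Celsius and Kelvin scales; the function name `celsius_to_kelvin` is hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Shift the Celsius scale so that 0 means absolute zero."""
    return celsius + 273.15


warm, cool = 20.0, 10.0  # degrees Celsius

# Differences are meaningful on both scales (interval property)
print(warm - cool)                                        # 10 degrees
print(celsius_to_kelvin(warm) - celsius_to_kelvin(cool))  # also about 10 kelvin

# Ratios are only meaningful on a scale with a true zero (ratio property)
print(warm / cool)                                        # 2.0, but 20°C is not "twice as hot"
print(celsius_to_kelvin(warm) / celsius_to_kelvin(cool))  # about 1.035 in kelvin

# Distance has a true zero, so 10 metres really is twice 5 metres
print(10 / 5)
```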
Primary & Secondary Data
Primary data:
Specific data obtained for a particular study
conducted by the researcher.

Secondary data:
Pre-existing data, second-hand.
Data obtained from materials published by
governmental, industrial or individual
sources.


Sir Arthur Conan Doyle, through Sherlock Holmes, once wrote:
"It is a capital mistake to theorize before one has data."
Primary Data

Data are collected by researchers for a specific study.
1. Surveys:
describing, recording, analyzing and interpreting conditions
that exist or existed by asking respondents.
i. Face-to-face interview
ii. Phone interview
iii. Questionnaire
2. Observations:
the information is sought through the investigator's own direct
observation, without asking respondents.
3. Experiments:
investigators manipulate variables to study the effects on
respondents.


Descriptive Statistics
Graphical summaries
Construction
Interpretation
Application
Qualitative Data
Organizing & Graphing
Frequency Distribution/Table
Relative Frequency & Percentage Distribution
Bar Chart (simple/ vertical, horizontal, component, multiple)
Pie Chart
Line Graph/Time Series Graph
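For qualitative data, the frequency, relative frequency and percentage distributions are simple counts and proportions. A minimal sketch using `collections.Counter`; the property-type data below are hypothetical, made up only to illustrate the calculation.

```python
from collections import Counter

# Hypothetical sample of property types (illustrative data only)
properties = ["house", "condo", "house", "bungalow",
              "condo", "house", "house", "condo"]

counts = Counter(properties)   # frequency distribution
n = len(properties)
for category, freq in counts.most_common():
    relative = freq / n        # relative frequency = f / n
    print(f"{category:<9} f = {freq}  relative = {relative:.3f}  percent = {relative * 100:.1f}%")
```

The relative frequencies always sum to 1 (and the percentages to 100%), which is a quick check on the table.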

Quantitative Data
Organizing & Graphing
Stem & Leaf Display
Frequency Distribution
Histogram
Polygon
Ogive
Box Plot

Histograms
Bar graphs that show the distribution of data.
Symmetry: Symmetrical or skewed?
Modality: unimodal, multimodal, or no mode?
Often used in combination with other statistical
summaries, for example box plots, which show the
median, quartiles and range of the data.
Source: http://www.saferpak.com/histogram_articles/howto_histogram.pdf

Interpreting Histograms
13A: Most of the data were on target, with very little
variation from it.
13B: Although some data were on target, many others
were dispersed away from the target.
13C: Even when most of the data were close together,
they were located off the target by a significant amount.
13D: The data were off target and widely dispersed.
Extra Info
Pareto chart: popular!
W4 L8: Closure
You should now be able to:
Describe and calculate the values for the
Measures of Position
Measures of Shape
Interpret the meaning of each value.
Next Lesson W5 L9:
Quick Revision (Chapters 1 & 2)
Chapter 3: Introduction to Probability.
Quiz 2!

