
Introduction

Statistics is the study of the collection, organization, analysis, interpretation and
presentation of data. It deals with all aspects of data, including the planning of data
collection in terms of the design of surveys and experiments.
For my Additional Mathematics project work in 2013, I chose statistics as my topic.
I chose statistics because it has many uses in our daily life: it comes in handy for
tasks such as data analysis, and it helps us complete daily tasks such as analysing the
marks obtained by my classmates in a school examination.
In Part 1, I researched the importance of data analysis in our daily life. I also
listed the types of measures of central tendency and measures of dispersion.
In Part 2, I attached a frequency distribution table of the marks obtained by my
classmates in the Additional Mathematics monthly test. I calculated the mean, mode,
median and standard deviation for this frequency distribution table. Based on the
table, I stated the most appropriate measure to reflect the performance of my class and
identified which type of data gives a more accurate representation.
In Part 3, I made a conjecture about the new values of the mean, mode, median,
interquartile range and standard deviation when our teacher adds 3 marks for each student
in my class for completing all their assignments.
In addition, as further exploration, I determined which students are in the top 20% of
my class and will be awarded by the Additional Mathematics teacher. I also compared my
class with Mr. Ma's class in the same Additional Mathematics monthly test.












Importance of data analysis in daily life
Data analysis is a process used to transform, remodel and revise certain information
(data) with a view to reaching a conclusion for a given situation or problem. Data
analysis can be done by different methods according to the needs and requirements.

Analysis does not have to involve complex statistics. Data analysis in schools
involves collecting data and using that data to improve teaching and learning.
Interestingly, principals and teachers have it pretty easy. In most cases, the collection of
data has already been done. Schools regularly collect attendance data, transcript records,
discipline referrals, quarterly or semester grades, norm- and criterion-referenced test
scores, and a variety of other useful data. Rather than complex statistical formulas and
tests, it is generally simple counts, averages, percents, and rates that educators are
interested in.
There are many benefits of data analysis; the most important ones are as follows.
Data analysis helps in structuring the findings from different sources of data
collection, such as survey research. It is also very helpful in breaking a macro problem
into micro parts. Data analysis acts like a filter when it comes to extracting meaningful
insights from a huge data set. Every researcher has to sort through the huge pile of data
that he or she has collected before reaching a conclusion on the research question. Mere
data collection is of no use to the researcher; data analysis proves to be crucial in this
process. It provides a meaningful basis for critical decisions and helps to create a
complete dissertation proposal.

One of the most important uses of data analysis is that it helps to keep human bias
away from research conclusions with the help of proper statistical treatment. With the
help of data analysis, a researcher can filter both qualitative and quantitative data for
an assignment or writing project. Thus, it can be said that data analysis is of utmost
importance for both the research and the researcher. To put it another way, data
analysis is as important to a researcher as diagnosing the patient's problem is to a
doctor before giving any treatment.












Three types of measure of central tendency

Mean (Arithmetic)

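The mean, or arithmetic mean, of a set of data $x_1, x_2, x_3, \ldots, x_n$ is the sum of
the values of all observations divided by the total number of observations:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

This formula is usually written in a slightly different manner using the Greek capital
letter $\Sigma$, pronounced "sigma", which means "sum of...":

$$\bar{x} = \frac{\sum x}{N}$$

where $N$ is the total number of observations.

If $x_1, x_2, x_3, \ldots, x_n$ are the observations of a set of data and
$f_1, f_2, f_3, \ldots, f_n$ are their respective frequencies, the mean is

$$\bar{x} = \frac{\sum f x}{\sum f}$$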
The mean is essentially a model of your data set: a single value that represents the data as a whole.
You will notice, however, that the mean is not often one of the actual values that you
have observed in your data set. However, one of its important properties is that it
minimises error in the prediction of any one value in your data set. That is, it is the value
that produces the lowest amount of error from all other values in the data set.
An important property of the mean is that it includes every value in your data set as part
of the calculation. In addition, the mean is the only measure of central tendency where
the sum of the deviations of each value from the mean is always zero.
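The zero-sum property described above can be checked numerically. The following is a minimal sketch on a small illustrative data set (the values are an assumption chosen for the example, not marks from this project).

```python
# Illustrative data only (not taken from the class marks).
data = [2, 3, 5, 8, 3, 3]

# Mean = sum of all observations / number of observations.
mean = sum(data) / len(data)

# Signed deviation of each observation from the mean.
deviations = [x - mean for x in data]

print(mean)             # 4.0
print(sum(deviations))  # 0.0
```

The positive and negative deviations cancel exactly, which is why the mean is often described as the balance point of the data.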










Median
The median is the value at the centre of a set of data when the data is arranged in
ascending or descending order.
a) If a set of data has an odd number of observations, the median is defined as the
middle term of the list.
b) If a set of data has an even number of observations, the median is the mean
of the two middle terms of the list.
For example, consider the following 11 marks:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange the data in order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark, in this case 56 (the 6th value). It is the
middle mark because there are 5 scores before it and 5 scores after it. This works fine
when you have an odd number of scores, but what happens when you have an even
number of scores? What if you had only 10 scores? Then you simply take the
middle two scores and average them. So, if we look at the example below:
65 55 89 56 35 14 56 55 87 45
We again rearrange the data in order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89
Now we take the 5th and 6th scores in our data set (55 and 56) and average them to get a
median of 55.5.
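The two rules (odd and even number of observations) can be sketched as a short function. The function name `median` and the variable names are chosen for illustration; the marks are the ones used in the example above.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

eleven_marks = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
ten_marks = eleven_marks[:-1]  # the same marks without the last one (92)

print(median(eleven_marks))  # 56
print(median(ten_marks))     # 55.5
```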
For grouped data, the median can be calculated using the following formula:

$$m = L + \left( \frac{\frac{N}{2} - F}{f_m} \right) c$$

where L = lower boundary of the median class
N = total number of observations
F = cumulative frequency before the median class
$f_m$ = frequency of the median class
c = size of the median class
Mode
The mode of a set of data is the observation which occurs most often.
For the set of data 2, 3, 5, 8, 3, 3, the mode is 3 because 3 is the value which occurs
most frequently.
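Counting how often each value occurs gives the mode directly. This is a minimal sketch using Python's standard `collections.Counter` on the data set above; note that when several values tie for the highest count, `most_common(1)` reports only one of them.

```python
from collections import Counter

data = [2, 3, 5, 8, 3, 3]

counts = Counter(data)                      # value -> number of occurrences
mode, frequency = counts.most_common(1)[0]  # value with the highest count

print(mode, frequency)  # 3 3
```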







In the above histogram the mode has a value of 22.5. We can clearly see, however, that
the mode is not representative of the data. To use the mode to describe the central
tendency of this data set would be misleading.











Types of Measures of Absolute Dispersion:
(a) The Range,
(b) The Interquartile Range
(c) Variance
(d) The Standard Deviation.
(a) The Range:
1. The range is the difference between the largest value and the smallest
value in a set of data.

2. The range for grouped data is defined as the difference between the
midpoint of the highest class and the midpoint of the lowest class.
Range = midpoint of the highest class − midpoint of the lowest class
(b) Interquartile range
1. Quartiles are values which divide a set of data arranged in ascending or
descending order into four equal parts. The interquartile range is defined as
the difference between the third quartile and the first quartile.
Interquartile range = $Q_3 - Q_1$

(c) Variance
1. Variance is a measure of how far a set of numbers is spread out.
2. For a set of ungrouped data $x_1, x_2, x_3, \ldots, x_n$, the variance is denoted by

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N} \quad \text{or} \quad \sigma^2 = \frac{\sum x^2}{N} - \bar{x}^2$$

3. For grouped data, the midpoint of each class is used to represent the class
when finding the variance:

$$\sigma^2 = \frac{\sum f x^2}{\sum f} - \bar{x}^2$$



(d) Standard Deviation (SD):
1. The standard deviation is defined as the positive square root of the
mean of the squared deviations of the values from their mean.
2. In the case of a frequency distribution with $x_1, x_2, \ldots, x_k$ as class marks
and $f_1, f_2, \ldots, f_k$ as the corresponding class frequencies, the SD is
expressed as follows:

$$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}}$$
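The two forms of the variance (mean of squared deviations, and mean of squares minus the square of the mean) give the same result. The sketch below checks this on a small illustrative data set; the function names and the data are assumptions made for the example.

```python
import math

def variance_by_definition(values):
    """Mean of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def variance_shortcut(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2

data = [2, 3, 5, 8, 3, 3]  # illustrative data only

var_a = variance_by_definition(data)
var_b = variance_shortcut(data)
sd = math.sqrt(var_a)  # standard deviation = positive square root of the variance

print(var_a, var_b, sd)  # 4.0 4.0 2.0
```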























Uses of measure of central tendency in our daily life
a) Mean
Marks Number of Students Midpoint
41-50 2 45.5
51-60 5 55.5
61-70 8 65.5
71-80 10 75.5
81-90 10 85.5
91-100 5 95.5
The above is a frequency distribution table of the marks obtained by 40 students in a
certain examination. A teacher uses the mean to average the grades: he or she adds up all
the grades and divides the total by the number of grades. For grouped data, each class is
represented by its midpoint. For example:

$$\bar{x} = \frac{\sum f x}{\sum f} = \frac{2(45.5) + 5(55.5) + 8(65.5) + 10(75.5) + 10(85.5) + 5(95.5)}{40} = 74.5$$
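The same grouped mean can be computed from the table in a few lines. This is a minimal sketch; the list names are chosen for illustration and the values are the midpoints and frequencies from the table above.

```python
# Class midpoints and frequencies from the table of 40 students.
midpoints = [45.5, 55.5, 65.5, 75.5, 85.5, 95.5]
frequencies = [2, 5, 8, 10, 10, 5]

total_f = sum(frequencies)                                     # sum of f
total_fx = sum(f * x for f, x in zip(frequencies, midpoints))  # sum of f*x

grouped_mean = total_fx / total_f

print(total_f, total_fx, grouped_mean)  # 40 2980.0 74.5
```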
b) Median
Marks Lower Boundary Midpoint Number of Students Cumulative frequency
51-60 50.5 55.5 2 2
61-70 60.5 65.5 3 5
71-80 70.5 75.5 5 10
81-90 80.5 85.5 6 16
91-100 90.5 95.5 4 20

The above table shows the marks obtained by 20 students in an examination. A teacher
uses the table to find the median mark. Since the table shows grouped data, he or she
will use the formula

$$m = L + \left( \frac{\frac{N}{2} - F}{f_m} \right) c$$

The median class is 71-80, so

$$m = 70.5 + \left( \frac{\frac{20}{2} - 5}{5} \right) 10 = 80.5$$
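The grouped-median formula can be sketched as a function that walks down the cumulative frequencies until it reaches the class containing the (N/2)th value. The function name `grouped_median` and its parameters are hypothetical, introduced only for this illustration.

```python
def grouped_median(lower_boundaries, frequencies, class_size):
    """Median of grouped data: L + ((N/2 - F) / f_m) * c."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= half:
            # This is the median class.
            return lower + (half - cumulative) / f * class_size
        cumulative += f
    raise ValueError("median class not found")

# Lower boundaries and frequencies from the table of 20 students.
lower_boundaries = [50.5, 60.5, 70.5, 80.5, 90.5]
frequencies = [2, 3, 5, 6, 4]

print(grouped_median(lower_boundaries, frequencies, 10))  # 80.5
```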



c) Mode

Colour Number of students
Red 5
Orange 7
Yellow 13
Green 15
Blue 17
Purple 13


The above table shows the favourite colours of 70 students. The mode can be used to
identify the colour preferred by the largest number of students.
Based on the table, blue is the favourite colour of the highest number of students.
So,
Mode = Blue
Frequency of the mode = 17




PART 2












1. March Pendidikan Sivik dan Kewarganegaraan monthly test scores for 25
students.
No. Name Scores
1 Alberth Gampad 60
2 Anndysent Hedeon Chong 68
3 Clesty Chin Kher Sing 68
4 Darrel Dylan Paskol 62
5 Delia Rosanne 94
6 Esrine Esther 76
7 Grace Geoffry Mojiliu 70
8 Hani Farhana 70
9 JeffeLee Shang Huang 76
10 Jessica Grace David 86
11 Julian Julio 76
12 Liaw Vun Shein 72
13 Mohamad Rasyed 72
14 Mohd. Ameer Fiqri 90
15 Muhammad Fathin 86
16 Myron Jeremy Michael 72
17 Ng Shu Jeit 82
18 Prisca Yong Jing Jie 68
19 Sebastian Primus 80
20 Steveandey Lee Shang Yang 72
21 Tan Sze Yang 88
22 Tan Xin Min 82
23 Voo Yu Peng 58
24 Wilson Chong Wei Seng 82
25 Yasmin Hamzah 78












2.
a) Mean
Mean, $\bar{x} = \frac{\sum x}{N} = \frac{1888}{25} = 75.52$
b) Median

Median , m = 76

c) Mode

Mode = 72

d) Standard deviation

$$\sigma = \sqrt{\frac{\sum x^2}{N} - \bar{x}^2} = \sqrt{\frac{144672}{25} - 75.52^2} = 9.144$$
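These four values can be cross-checked against the 25 raw scores with Python's standard `statistics` module. This is a sketch for verification only; `pstdev` is the population standard deviation, which matches the formula used above (division by N).

```python
import statistics

# The 25 scores from the table in question 1, in table order.
scores = [60, 68, 68, 62, 94, 76, 70, 70, 76, 86, 76, 72, 72,
          90, 86, 72, 82, 68, 80, 72, 88, 82, 58, 82, 78]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
sd = statistics.pstdev(scores)  # population SD: divides by N, not N - 1

print(sum(scores))     # 1888
print(round(mean, 2))  # 75.52
print(median)          # 76
print(mode)            # 72
print(round(sd, 3))    # 9.144
```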












3.
Marks Frequency
55-59 1
60-64 2
65-69 3
70-74 6
75-79 4
80-84 4
85-89 3
90-94 2

a)
i) Mean
Marks Frequency ,f Midpoint,x f x
55-59 1 57 57
60-64 2 62 124
65-69 3 67 201
70-74 6 72 432
75-79 4 77 308
80-84 4 82 328
85-89 3 87 261
90-94 2 92 184

$\sum f = 25$, $\sum fx = 1895$

Mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{1895}{25} = 75.8$








ii) Mode
Based on the histogram,
Mode = 72.5

iii) Median
Marks Frequency ,f Midpoint,x Lower Boundary Cumulative Frequency
55-59 1 57 54.5 1
60-64 2 62 59.5 3
65-69 3 67 64.5 6
70-74 6 72 69.5 12
75-79 4 77 74.5 16
80-84 4 82 79.5 20
85-89 3 87 84.5 23
90-94 2 92 89.5 25

Method 1: Using the formula

$$m = L + \left( \frac{\frac{N}{2} - F}{f_m} \right) c = 74.5 + \left( \frac{\frac{25}{2} - 12}{4} \right) 5 = 75.125$$

Method 2: Using the ogive
Based on the ogive,
Median = 75.25







iv) Standard deviation
Method 1
Marks Frequency, f Midpoint, x $x^2$ $fx^2$
55-59 1 57 3249 3249
60-64 2 62 3844 7688
65-69 3 67 4489 13467
70-74 6 72 5184 31104
75-79 4 77 5929 23716
80-84 4 82 6724 26896
85-89 3 87 7569 22707
90-94 2 92 8464 16928

$\sum f = 25$, $\sum x^2 = 45452$, $\sum fx^2 = 145755$

$$\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{\frac{145755}{25} - 75.8^2} = 9.196$$
Method 2
Marks Frequency, f Midpoint, x $(x - \bar{x})^2$ $f(x - \bar{x})^2$
55-59 1 57 353.44 353.44
60-64 2 62 190.44 380.88
65-69 3 67 77.44 232.32
70-74 6 72 14.44 86.64
75-79 4 77 1.44 5.76
80-84 4 82 38.44 153.76
85-89 3 87 125.44 376.32
90-94 2 92 262.44 524.88

$\sum f = 25$, $\sum (x - \bar{x})^2 = 1063.52$, $\sum f(x - \bar{x})^2 = 2114$

$$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\frac{2114}{25}} = 9.196$$
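Both methods should give the same standard deviation, since they are two forms of the same formula. The following sketch recomputes them from the class midpoints and frequencies in the tables above; the variable names are chosen for illustration.

```python
import math

# Class midpoints and frequencies for the 25 students.
midpoints = [57, 62, 67, 72, 77, 82, 87, 92]
frequencies = [1, 2, 3, 6, 4, 4, 3, 2]

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Method 1: sqrt(sum(f*x^2) / sum(f) - mean^2)
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))
sd_method_1 = math.sqrt(sum_fx2 / n - mean ** 2)

# Method 2: sqrt(sum(f*(x - mean)^2) / sum(f))
sum_f_dev2 = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))
sd_method_2 = math.sqrt(sum_f_dev2 / n)

print(n, mean, sum_fx2)       # 25 75.8 145755
print(round(sd_method_1, 3))  # 9.196
print(round(sd_method_2, 3))  # 9.196
```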

v) Interquartile range
Method 1: Using the formula
Marks Frequency ,f Midpoint,x Lower Boundary Cumulative Frequency
55-59 1 57 54.5 1
60-64 2 62 59.5 3
65-69 3 67 64.5 6
70-74 6 72 69.5 12
75-79 4 77 74.5 16
80-84 4 82 79.5 20
85-89 3 87 84.5 23
90-94 2 92 89.5 25

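$$Q_1 = L + \left( \frac{\frac{N}{4} - F}{f_m} \right) c$$

where L, F and $f_m$ are taken from the class that contains $Q_1$.

Position of $Q_1$ = $\frac{25}{4}$th = 6.25th value, so the $Q_1$ class is 70-74.

$$Q_1 = 69.5 + \left( \frac{\frac{25}{4} - 6}{6} \right) 5 = 69.71$$

$$Q_3 = L + \left( \frac{\frac{3N}{4} - F}{f_m} \right) c$$

where L, F and $f_m$ are taken from the class that contains $Q_3$.

Position of $Q_3$ = $\frac{3 \times 25}{4}$th = 18.75th value, so the $Q_3$ class is 80-84.

$$Q_3 = 79.5 + \left( \frac{\frac{3 \times 25}{4} - 16}{4} \right) 5 = 82.94$$

Interquartile range = $Q_3 - Q_1$
= 82.94 − 69.71
= 13.23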
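The quartile formulas differ from the median formula only in the position used (N/4 or 3N/4 instead of N/2), so one function can cover all three. This is a sketch; the function name `grouped_quantile` and its `fraction` parameter are hypothetical, introduced for illustration.

```python
def grouped_quantile(lower_boundaries, frequencies, class_size, fraction):
    """Value at position fraction * N in grouped data: L + ((fraction*N - F) / f) * c."""
    n = sum(frequencies)
    position = fraction * n
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= position:
            return lower + (position - cumulative) / f * class_size
        cumulative += f
    raise ValueError("class containing the requested position not found")

# Lower boundaries and frequencies for the 25 students.
lower_boundaries = [54.5, 59.5, 64.5, 69.5, 74.5, 79.5, 84.5, 89.5]
frequencies = [1, 2, 3, 6, 4, 4, 3, 2]

q1 = grouped_quantile(lower_boundaries, frequencies, 5, 0.25)
q3 = grouped_quantile(lower_boundaries, frequencies, 5, 0.75)
iqr = q3 - q1

print(round(q1, 2), round(q3, 2), round(iqr, 2))  # 69.71 82.94 13.23
```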



Method 2: Using the ogive
Based on the ogive,
$Q_1$ = 74
$Q_3$ = 86.25
Interquartile range = $Q_3 - Q_1$
= 86.25 − 74
= 12.25
































b) Mean, $\bar{x}$ = 75.8    Median, m = 75.125 (or 76.0 ± 0.5)    Mode = 72.5 ± 0.5

Of the above measures of central tendency, the mean is the most suitable because the
raw data contain no extreme values and the marks are clustered together. The mode and
the median do not take all the values in the data into account, which makes them less
accurate measures of central tendency here.
c)

The standard deviation gives a measure of the dispersion of the data about the mean. A
direct analogy is the interquartile range, which gives a measure of dispersion about the
median. However, the standard deviation is generally more useful than the interquartile
range because it includes all the data in its calculation. The interquartile range
depends on just two values and ignores all the other observations in the data, which
reduces its accuracy if extreme values are present. Since the marks do not contain any
extreme values, the standard deviation gives a better measure than the interquartile
range.

4. a)
Grouped data gives a more accurate representation. This is because grouped data is
summarized in a table called a frequency distribution. For example, the mean for
grouped data is found by using the formula:

$$\bar{x} = \frac{\sum fx}{\sum f}$$

where $\sum fx$ is the sum of (midpoint × frequency) and $\sum f$ is the sum of the
frequencies.

Ungrouped data is just the numbers themselves. We can find the mean of ungrouped data
by adding the values up and dividing by the sample size. The formula for the mean of
ungrouped data is:

$$\bar{x} = \frac{\sum x}{N}$$

Therefore, grouped data shows a more accurate representation than ungrouped data.





b)
i) Grouped data
When analysing the marks of a certain examination, grouped data is used by
constructing a frequency distribution table. Constructing a frequency distribution table
is the same thing as grouping data. With the frequency distribution table, the teacher is
able to find the median, mode, mean, standard deviation and interquartile range of the
marks of the students who sat for the examination.

Marks Frequency
55-59 3
60-64 4
65-69 5
70-74 9
75-79 5
80-84 5
85-89 5
90-94 5

From the above frequency distribution table, we can easily find the mean, mode and
median of the marks obtained by the students. Therefore, grouped data (a frequency
distribution table) is suitable for analysing the marks obtained in an examination or
test.
ii) Ungrouped data
Ungrouped data comes in handy when analysing how students travel to school. The data
is prepared by constructing a table of each mode of transportation and its respective
number of students. For example:
Transportation Number of students
Car 7
Bus 14
Bicycle 6
Taxi 3

From the above table, we can easily interpret the data at first sight. Therefore,
ungrouped data is suitable for analysing the transportation used by students when coming
to school.



a) The teacher adds 3 marks for each student in my class for completing all their
assignments.
New marks for the Pendidikan Sivik dan Kewarganegaraan monthly test of class 5 Bakti,
with 25 students
No. Name Scores
1 Alberth Gampad 63
2 Anndysent Hedeon Chong 71
3 Clesty Chin Kher Sing 71
4 Darrel Dylan Paskol 65
5 Delia Rosanne 97
6 Esrine Esther 79
7 Grace Geoffry Mojiliu 73
8 Hani Farhana 73
9 JeffeLee Shang Huang 79
10 Jessica Grace David 89
11 Julian Julio 79
12 Liaw Vun Shein 75
13 Mohamad Rasyed 75
14 Mohd. Ameer Fiqri 93
15 Muhammad Fathin 89
16 Myron Jeremy Michael 75
17 Ng Shu Jeit 85
18 Prisca Yong Jing Jie 71
19 Sebastian Primus 83
20 Steveandey Lee Shang Yang 75
21 Tan Sze Yang 91
22 Tan Xin Min 85
23 Voo Yu Peng 61
24 Wilson Chong Wei Seng 85
25 Yasmin Hamzah 81
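Before regrouping the new marks, the effect of adding 3 to every score can be checked directly on the raw data. This is a sketch under the assumption that the original 25 scores from Part 2 are used; it works on raw (ungrouped) marks, so its values need not match the grouped calculations, which depend on how the marks fall into the class intervals.

```python
import math
import statistics

# Original 25 scores from Part 2, in table order.
scores = [60, 68, 68, 62, 94, 76, 70, 70, 76, 86, 76, 72, 72,
          90, 86, 72, 82, 68, 80, 72, 88, 82, 58, 82, 78]
new_scores = [s + 3 for s in scores]  # every student gains 3 marks

# Measures of central tendency shift by exactly 3 ...
mean_shift = statistics.mean(new_scores) - statistics.mean(scores)
median_shift = statistics.median(new_scores) - statistics.median(scores)
mode_shift = statistics.mode(new_scores) - statistics.mode(scores)

# ... while measures of spread are unchanged.
sd_old = statistics.pstdev(scores)
sd_new = statistics.pstdev(new_scores)
range_old = max(scores) - min(scores)
range_new = max(new_scores) - min(new_scores)

print(round(mean_shift, 6), median_shift, mode_shift)  # 3.0 3 3
print(round(sd_old, 3), round(sd_new, 3))              # 9.144 9.144
print(range_old, range_new)                            # 36 36
```

Adding a constant moves every value by the same amount, so the centre moves with it while the distances between values stay the same.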

a) Mean
Marks Frequency ,f Midpoint,x f x
60-64 2 62 124
65-69 1 67 67
70-74 5 72 360
75-79 7 77 539
80-84 2 82 164
85-89 5 87 435
90-94 2 92 184
95-99 1 97 97

$\sum f = 25$, $\sum fx = 1970$

Mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{1970}{25} = 78.8$

b) Mode

Based on the histogram,
Mode = 76

c) Median
Marks Frequency ,f Midpoint,x Lower Boundary Cumulative Frequency
60-64 2 62 59.5 2
65-69 1 67 64.5 3
70-74 5 72 69.5 8
75-79 7 77 74.5 15
80-84 2 82 79.5 17
85-89 5 87 84.5 22
90-94 2 92 89.5 24
95-99 1 97 94.5 25

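$\sum f = 25$

Method 1: Using the formula

$$m = L + \left( \frac{\frac{N}{2} - F}{f_m} \right) c$$

Position of the median = $\frac{25}{2}$th = 12.5th value, so the median class is 75-79,
which has a frequency of 7.

$$m = 74.5 + \left( \frac{\frac{25}{2} - 8}{7} \right) 5 = 77.71$$

Method 2: Using the ogive
Based on the ogive,
Median = 76.75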

d) Interquartile range
Marks Frequency ,f Midpoint,x Lower Boundary Cumulative Frequency
60-64 2 62 59.5 2
65-69 1 67 64.5 3
70-74 5 72 69.5 8
75-79 7 77 74.5 15
80-84 2 82 79.5 17
85-89 5 87 84.5 22
90-94 2 92 89.5 24
95-99 1 97 94.5 25

$\sum f = 25$

Method 1: Using the formula

$$Q_1 = 69.5 + \left( \frac{\frac{25}{4} - 3}{5} \right) 5 = 72.75$$

$$Q_3 = 84.5 + \left( \frac{\frac{3 \times 25}{4} - 17}{5} \right) 5 = 86.25$$

Interquartile range = $Q_3 - Q_1$
= 86.25 − 72.75
= 13.5



Method 2: Using the ogive
Based on the ogive,
$Q_1$ = 69.75
$Q_3$ = 83
Interquartile range = $Q_3 - Q_1$
= 83 − 69.75
= 13.25
e) Standard deviation
Method 1
Marks Frequency, f Midpoint, x $x^2$ $fx^2$
60-64 2 62 3844 7688
65-69 1 67 4489 4489
70-74 5 72 5184 25920
75-79 7 77 5929 41503
80-84 2 82 6724 13448
85-89 5 87 7569 37845
90-94 2 92 8464 16928
95-99 1 97 9409 9409

$\sum f = 25$, $\sum x^2 = 51612$, $\sum fx^2 = 157230$

$$\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{\frac{157230}{25} - 78.8^2} = 8.931$$


Method 2
Marks Frequency, f Midpoint, x $(x - \bar{x})^2$ $f(x - \bar{x})^2$
60-64 2 62 282.24 564.48
65-69 1 67 139.24 139.24
70-74 5 72 46.24 231.2
75-79 7 77 3.24 22.68
80-84 2 82 10.24 20.48
85-89 5 87 67.24 336.2
90-94 2 92 174.24 348.48
95-99 1 97 331.24 331.24

$\sum f = 25$, $\sum (x - \bar{x})^2 = 1053.92$, $\sum f(x - \bar{x})^2 = 1994$

$$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\frac{1994}{25}} = 8.931$$
















a) 2.
New Mean
Marks Frequency ,f Midpoint,x f x
60-64 2 62 124
65-69 1 67 67
70-74 5 72 360
75-79 7 77 539
80-84 2 82 164
85-89 5 87 435
90-94 2 92 184
95-99 2 97 194

$\sum f = 26$, $\sum fx = 2067$

Mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{2067}{26} = 79.5$
New Standard deviation
Method 1
Marks Frequency, f Midpoint, x $x^2$ $fx^2$
60-64 2 62 3844 7688
65-69 1 67 4489 4489
70-74 5 72 5184 25920
75-79 7 77 5929 41503
80-84 2 82 6724 13448
85-89 5 87 7569 37845
90-94 2 92 8464 16928
95-99 2 97 9409 18818

$\sum f = 26$, $\sum x^2 = 51612$, $\sum fx^2 = 166639$

$$\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{\frac{166639}{26} - 79.5^2} = 9.431$$


Further Exploration
1. The calculation is as follows:
Number of students in the top 20% = 25 × 20%
= 5 students
Therefore, the 21st to 25th students (with the marks arranged in ascending order) will
be awarded by the PSK teacher.
Based on the ogive,
the lowest mark for the top 20% of the students to be awarded by the Pendidikan Sivik
dan Kewarganegaraan teacher is 84.5.
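The ogive reading can be compared with the raw scores. The sketch below assumes the original 25 scores from Part 2 and simply picks the five highest; the variable names are chosen for illustration.

```python
# Original 25 scores from Part 2, in table order.
scores = [60, 68, 68, 62, 94, 76, 70, 70, 76, 86, 76, 72, 72,
          90, 86, 72, 82, 68, 80, 72, 88, 82, 58, 82, 78]

n_awarded = len(scores) * 20 // 100   # 20% of 25 students = 5
ranked = sorted(scores, reverse=True)  # highest mark first
top_scores = ranked[:n_awarded]
next_best = ranked[n_awarded]          # best mark that misses the award

print(n_awarded)   # 5
print(top_scores)  # [94, 90, 88, 86, 86]
print(next_best)   # 82
```

All five awarded marks lie above the class boundary of 84.5 read from the ogive, and the next mark (82) lies below it, so the two approaches agree on who is awarded.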


2.
Mr. Ma's class:
Mean = 76.79
Standard deviation = 10.36

My class:
Mean = 75.8
Standard deviation = 9.196

Between Mr. Ma's class and my class, Mr. Ma's class has the better achievement. This
is because Mr. Ma's class has a higher mean mark than my class, which means that they
performed better overall in the Pendidikan Sivik dan Kewarganegaraan monthly test. On
the other hand, my class's marks have a standard deviation of 9.196 while Mr. Ma's class
has a standard deviation of 10.36, which shows that my class has the lower standard
deviation. A smaller standard deviation means that the data are more concentrated around
the mean, while a larger standard deviation implies a greater spread of the data. So the
marks in my class are more consistent, but judged by the mean, Mr. Ma's class has the
better achievement.
Reflection
When my Additional Mathematics teacher first announced this project, we were
surprised and at the same time puzzled because we did not know how to do it. After
some explanation by our Additional Mathematics teacher, we were finally able to picture
how to begin our Additional Mathematics project work.
While doing my Additional Mathematics project work, I learnt how to work
independently without relying on the help of others. I learnt never to give up and to
keep moving forward. For instance, I frequently made mistakes in my project work, but I
kept making corrections instead of throwing in the towel. Besides, I learnt to share
the information I obtained with my class and to exchange points of view so that we could
finish our project work without further delay.
To improve our class's performance, I personally suggest that our class should have
its own study groups, so that we can tutor those who are weak in Additional
Mathematics. Our Additional Mathematics teacher could also organize an extra
class for us, so that we can finish the syllabus sooner. Our class could have a
mentor-mentee system, so that those who are good at Additional Mathematics can teach
those who are weak in it.
My feelings when doing the Additional Mathematics project work:



Additional Mathematics
MT oh MT,
Before PMR, I heard of you,
After PMR, I still hadn't met you,
When I got to Form 4, I finally met you,
And
From that moment, I started to get to know you,
I knew you would be my friend,
Who helps me overcome daily obstacles, whether easy or difficult,
I spent countless hours,
Countless days,
Countless seasons,
I even sacrificed my precious time just for you,
Sacrificing my hand phone,
Sacrificing my Facebook,
And even sacrificing my game time.
In the end, I realized that you are really important to me,
You are my problem solver,
My hero,
My companion,
And best of all,
My friend.






Conclusion
After doing research, answering questions, drawing graphs and solving problems, I
saw that the use of statistics is important in daily life. It is widely used not just in
markets but also in interpreting the condition of our surroundings, such as the air or
the water, especially when conducting air pollution surveys. In conclusion, statistics
is a necessity in daily life. Without it, surveys could not be conducted, the stock
market could not be interpreted, and so on. Therefore, we should be thankful to the
people who contributed to the ideas of statistics.
