
Module 1

Basic Concepts in Statistics

Introduction
Every day we hear our friends ask questions such as "How many
units have you enrolled in this semester?" or "What is your
average grade this semester?" and so on. Or we listen to a
barrage of statistics from our University Registrar telling us
that "the total enrollment this semester is almost 6,500, with
3,500 males and 3,000 females. The total enrollment for the
College of Arts and Sciences is 1,600, with 879 students enrolled
in the College of Agriculture," and so on.
We often hear the vital statistics of Miss CAS, Miss BA,
Miss Agriculture, etc. during Miss CLSU given as 34-24-34 or
36-25-35 and so on. Or you may ask questions such as "What is the
probability that you will graduate two years from now?" or "What
is the probability that Sir will be absent next meeting?" However,
statistics is more than these.
Statistics enters into almost every phase of life in some way
or another. A daily news broadcast may start with a weather
forecast and end with an analysis of the stock market. In a
newspaper at hand we see in the first five pages stories on an
increase in the wholesale price of sugar, an increase in the
number of crimes committed, new findings on mothers who
smoke, the urgent need for laws, a school plan for evaluation of
teachers, popularity of cell phones, a tuition fee increase and sex
bias in the government working force. Each article reports some
information, proposal or conclusion based on the organization
and analysis of numerical data.
Statistics, in systematic and penetrating ways, provides
bases for investigations in many fields of knowledge, such as
the social and physical sciences, engineering, education,
business, medicine and law. Information on a topic is acquired in
the form of numbers; an analysis of these data is made in order
to obtain a better understanding of the phenomenon of interest;
and some conclusions may be drawn. Often generalizations are
sought; their validity is assessed by further investigations.
One definition of statistics is "making sense out of figures."
Statistics is the methodology which scientists and
mathematicians have developed for interpreting and drawing
conclusions from data. This module begins with the formal
definition of statistics and the basic terminologies and concepts
underlying the subject of statistics.

Objectives:
At the end of this module, you should be able to:
1. Differentiate descriptive and inferential statistics.
2. Differentiate a continuous variable from a discrete
variable and quantitative variable from qualitative
variable.
3. Classify data according to level of measurement.
4. Employ summation notation and apply operations
involving the summation.
Definition of Statistics
Statistics is a branch of science which deals with the
collection, organization, summary, presentation and
analyses of quantitative data as well as drawing valid
conclusions and making reasonable decisions on the
bases of such analyses.
The analysis of data collected in the course of a study is
among the most important activities performed by the
researcher. Unfortunately, it is often not given much attention
until the moment it is scheduled to begin. This module stresses
that planning for data analysis begins when a study is just getting
underway and continues until all hypotheses or questions being
considered have been satisfactorily resolved. Data analysis is the
focus because:
1. The value of an entire study may depend upon the
analyses one chooses to make or not to make. All the
time spent obtaining permission to conduct a study and
selecting samples and instruments may be wasted, in
whole or in part, if careful attention is not given to how
data will eventually be analyzed.
2. Planning data analysis procedures before data are
collected ensures that the right information will be
collected and in a form suitable for later use. Learning
too late that additional data would have made your study
far more valuable is justification enough to plan ahead. It
is not unusual for researchers to find that for every hour
spent in planning the exact format in which data should
be collected, as many as ten hours are saved in the
analysis phase.
3. Knowledge of available procedures for data analysis
should lead you to make more useful sets of findings and
implications. This should cause others to take your work
more seriously than they would if this section were given
inadequate attention.
4. Data analysis is not that difficult anyway. There now
exists a multitude of computer programs, many of which
are designed in a user-friendly format. All it may take
to do an analysis is the ability to push the button on
the computer corresponding to the number of your
selection. Data analysis can be as simple as that and still
be powerful enough to accomplish the purposes discussed
above. With a bit more prodding, some computers can do
much more to help us generate study findings which
will make our previous efforts even more worthwhile.
DESCRIPTIVE AND INFERENTIAL STATISTICS

The emphasis on the decision-making aspects of statistics is
a recent one. In its early years, the study of statistics largely
consisted of methodology for summarizing or describing
numerical data. This area of study has become known as
descriptive statistics because it is concerned largely with
summary calculations and graphical displays. These methods are
in contrast to the modern statistical approach in which
generalizations are made about the whole by investigating a
portion. Thus, the average income of all families in the Philippines
can be estimated from figures obtained from a few hundred
families. Such a prediction or estimate is an example of inference.
The study of how inferences are made from numerical data is
called inferential statistics.
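The contrast between describing a sample and inferring about the whole can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of family incomes below is simulated with made-up numbers, and the names `population`, `sample`, `sample_mean` and `population_mean` are hypothetical, not data from this module.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: incomes of 10,000 families (simulated, made-up values).
population = [random.gauss(20000, 5000) for _ in range(10000)]

# Descriptive statistics: summarize the data we actually observed,
# here a sample of a few hundred families.
sample = random.sample(population, 300)
sample_mean = mean(sample)

# Inferential statistics: use the sample mean as an estimate of the
# population mean, which in a real study we could not compute directly.
population_mean = mean(population)

print(f"Sample mean (our estimate): {sample_mean:.2f}")
print(f"Population mean (normally unknown): {population_mean:.2f}")
```

In a real study only the sample would be available; the population mean is computed here only to show that the estimate from 300 families lands close to it.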
VARIABLES
Variables are the factors that we focus on in a given study.
They are the characteristics of interest in the study which are
inherent in the object or person. Examples of such variables are
sex of Grade I pupils, number of children in the family, age of
father, family income, color of the eye of the person, nationality,
attitude of farmers, behavior of kindergarten pupils, etc.
Kinds of Variables:
1. A continuous variable can take any value within a specified
range of values. It usually arises from measurement.
Example: height, weight, volume, age
2. A discrete variable takes only integer values. It usually
arises from counting. Example: number of children,
number of road accidents
Types of Variables:
1. Quantitative variables are those variables which are
expressed numerically. Example: height, weight, number of
children in the family
2. Qualitative variables are those variables which are
expressed categorically. Example: color of the eye, sex,
military rank
Levels of Measurements
1. Nominal: Nominal data consist of numbers which indicate
categories for purely classification or identification purposes.
With nominal data, the numbers themselves have no
mathematical value assigned to them. The number on the
back of a basketball player, for example, is at a nominal level.
It makes no sense to add the number of the center (12) to the
number of the guard (33). In effect the numbers are really
names. Mathematically, all one may do with nominal data is to
count how many are in each category. Another example is sex,
where we assign 1 for male and 2 for female.
2. Ordinal: Ordinal data possess a rank-order characteristic,
but they do not provide information about the distance
between ranks. Thus if I know that Marvin is ranked best in
Mathematics, Melvin is next and MJ is third best, they may be
assigned values of 1, 2, and 3, respectively. One does
not know, however, how much better in mathematics Marvin is
as compared to Melvin, or Melvin compared to MJ. With ordinal
data the intervals between the ranks are not necessarily equal.
Likewise, the only mathematical comparisons that are meaningful
for ordinal data are greater than (>) and less than (<).
3. Interval: Interval data possess equal intervals, providing
information about how much better one value is compared
with another. Usually, we assume that our mental ability test,
achievement test and attitudinal test scores are examples of
interval data. Further, interval data have no absolute zero, that
is, zero is just an arbitrary value. If the temperature reading is
0°C, it does not mean the absence of temperature but rather
that the temperature has reached the freezing point of water. Or
if the IQ score is zero, it does not mean the absence of knowledge
but rather that the individual belongs to the low (or very low)
performer category. Furthermore, the interval level can
differentiate between any two classes in terms of degrees of
difference. Aside from the mathematical symbols > and <,
addition and subtraction are meaningful.

4. Ratio: Ratio data possess the characteristics of interval
measures and they have a true zero or absolute zero which
indicates the total absence of the property being measured.
For example, measures of height, weight and age are typical
ratio scales since all of them have a true zero. All
mathematical procedures are appropriate with ratio scales.
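
The four levels above can be illustrated with a short Python sketch showing which kind of summary is meaningful at each level. The sample values and variable names are hypothetical and chosen only for illustration; the jersey numbers 12 and 33 echo the basketball example in the text.

```python
from statistics import median, mode

# Hypothetical sample values, one list per level of measurement.
jersey_numbers = [12, 33, 12, 7]     # nominal: numbers are only labels
contest_ranks = [1, 2, 3]            # ordinal: order matters, gaps are unknown
temps_celsius = [0.0, 10.0, 20.0]    # interval: equal steps, arbitrary zero
heights_cm = [150.0, 160.0, 180.0]   # ratio: equal steps and a true zero

# Nominal: counting how often each category occurs is the only
# meaningful operation, so the most frequent label is a valid summary.
most_common_jersey = mode(jersey_numbers)

# Ordinal: order-based summaries such as the median make sense.
middle_rank = median(contest_ranks)

# Interval: differences are meaningful (20 degrees is 10 degrees above 10),
# but 20 degrees Celsius is not "twice as hot" as 10.
temp_difference = temps_celsius[2] - temps_celsius[1]

# Ratio: ratios are meaningful (180 cm is 1.2 times 150 cm).
height_ratio = heights_cm[2] / heights_cm[0]

print(most_common_jersey, middle_rank, temp_difference, height_ratio)
```

Each higher level allows every operation of the levels below it, which is why the ratio example could also use differences, medians or counts.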

SAQ1
1. Categorize each of the following as either nominal,
ordinal, interval or ratio measurement:
a. first, second and third place in a singing contest
b. metric measurement of distance
c. house numbers
d. cell phone numbers
e. number of live births in December, 2000
f. attitude towards impeachment: 1=high 2=moderate 3=low
2. Categorize each of the following variables as either
discrete or continuous.
a. number of students who score 80 and above in
the NSAT exam
b. distance of the school to your house
c. number of chairs in the auditorium
d. number of faculty members in your school
e. floor area of our classroom (in sq. ft.)
f. scores of students in an examination

Notations and Symbols


In the study of statistics, we cannot avoid the use of different
notations and symbols. If our variable of interest is age, then we
let the symbol $X$ stand for the variable age. Similarly, if the
ages of 4 students are 15, 18, 19, and 15, then we can write
$X_1 = 15$, $X_2 = 18$, $X_3 = 19$ and $X_4 = 15$. Generally,
$X_1$, $X_2$, $X_3$ and $X_4$ can be written as $X_i$ (read as
"X sub i"), where $i$ is known as the index which locates the
value of the variable in the set. Note that $X_1$ and $X_4$ have
the same value; however, the value of $X_1$ refers to student
number 1 while $X_4$ refers to student number 4. Formally, we
write the symbol for the variable and the values of the variable
(known as the variates) as follows:
$$X = (X_1, X_2, \ldots, X_n)$$
Frequently, it is necessary to work with sums of numerical values.
Using the Greek letter $\Sigma$ (capital sigma) to indicate
"summation of", we can write the sum of the 4 ages as
$$\sum_{i=1}^{4} X_i$$
which we read as "summation of $X_i$, $i$ going from 1 to 4". The
numbers 1 and 4 are called the lower and upper limits of
summation. Hence
$$\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 15 + 18 + 19 + 15 = 67$$
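
The same sum can be checked with a small Python sketch. The helper `summation` is a hypothetical name introduced here for illustration; it takes 1-based lower and upper limits so that it mirrors the sigma notation, while the ages are the ones given in the text.

```python
# Ages of the 4 students from the text: X_1, X_2, X_3, X_4.
ages = [15, 18, 19, 15]

def summation(values, lower, upper):
    """Add values from index `lower` to `upper` (1-based, inclusive),
    mirroring the lower and upper limits of the sigma notation."""
    return sum(values[i - 1] for i in range(lower, upper + 1))

total = summation(ages, 1, 4)    # X_1 + X_2 + X_3 + X_4
partial = summation(ages, 2, 3)  # X_2 + X_3

print(total)    # 67
print(partial)  # 37
```

Changing the limits changes which terms are added, which is why the limits must always be stated unless the sum runs over all observations.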
In general, the symbol $\sum_{i=1}^{n}$ means that we replace $i$
wherever it appears after the summation symbol by 1, then by 2,
and so on up to $n$, and then add up the terms. The subscript may
be changed to any letter, although $i$ is the one used in most
textbooks.
When we are summing over all the values, instead of using
$\sum_{i=1}^{n}$ we use $\sum$ to mean that the sum is taken from
the first observation to the $n$th observation.
PROPERTIES OF SUMMATION SIGN

1. The sum of a constant $k$ taken $n$ times is
$$\sum_{i=1}^{n} k = nk$$
2. The sum of the constant $k$ times the variable $X_i$ is equal
to the constant $k$ times the sum of the variable $X_i$, that is,
$$\sum k X_i = k \sum X_i$$
3. The summation of the sum or difference of 2 or more
variables is equal to the sum or difference of the summations
taken separately, that is,
$$\sum (X_i \pm Y_i) = \sum X_i \pm \sum Y_i$$
4. The sum of the squares of a variable is obtained by first
taking the square of each observation and then taking the
sum, that is,
$$\sum X_i^2 = X_1^2 + X_2^2 + X_3^2 + \cdots + X_n^2$$
5. The square of the sum of a variable is obtained by first
taking the sum of all the observations and then taking the
square, that is,
$$\left(\sum X_i\right)^2 = (X_1 + X_2 + X_3 + \cdots + X_n)^2$$
6. The sum of the products of 2 variables X and Y is obtained by
first taking the product of each pair of values and then taking
the sum, that is,
$$\sum X_i Y_i = X_1 Y_1 + X_2 Y_2 + X_3 Y_3 + \cdots + X_n Y_n$$
7. The product of the sums of 2 variables X and Y is obtained by
first taking the sum of X and the sum of Y separately and
then taking the product, that is,
$$\left(\sum X_i\right)\left(\sum Y_i\right) = (X_1 + X_2 + X_3 + \cdots + X_n)(Y_1 + Y_2 + Y_3 + \cdots + Y_n)$$
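
The properties above can be verified numerically with a short Python sketch. The X values are the four ages used earlier in this module; the Y values and the constant k are hypothetical, chosen only for illustration.

```python
x = [15, 18, 19, 15]  # ages from the text
y = [2, 4, 3, 5]      # hypothetical second variable
k = 3                 # hypothetical constant
n = len(x)

# Property 1: summing the constant k over n terms gives n*k.
sum_of_constant = sum(k for _ in range(n))

# Property 2: a constant factor can be moved outside the summation.
sum_kx = sum(k * xi for xi in x)

# Property 3: the summation of a sum is the sum of the summations.
sum_x_plus_y = sum(xi + yi for xi, yi in zip(x, y))

# Properties 4 and 5: sum of squares versus square of the sum.
sum_of_squares = sum(xi ** 2 for xi in x)
square_of_sum = sum(x) ** 2

# Properties 6 and 7: sum of products versus product of sums.
sum_of_products = sum(xi * yi for xi, yi in zip(x, y))
product_of_sums = sum(x) * sum(y)

print(sum_of_constant, sum_kx, sum_x_plus_y)
print(sum_of_squares, square_of_sum)
print(sum_of_products, product_of_sums)
```

Note that the sum of squares (1135) differs from the square of the sum (4489), and the sum of products (234) differs from the product of sums (938); confusing these pairs is a common source of error.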

Activity

A. Write out in full the sums represented by the following
expressions:
8

1.

4Wi

4.

2.

i 4
6

j 1

Yj

i4

i 1

i 1

Xi / Yi 3. X i Yi 4

i2

i 3

5. X i 3 Yi 2

B. Write each of the following expressions in summation notation
with appropriate limits.
1. a1 b1 a 2 b2 a 3 b3 a 4 b4
3
2. R1 R2 R3 R4 R5 R6 R7 R8
3
4
3. X 1 X 2 X 3 X 4 Y1 Y2 Y3 Y4

2
2
2
2
2
4. W1 W2 W3 W4 W5

5.

X1 X2 X3 X4 X5 X6 X7 X8
Y1 Y 2 Y 3 Y 4 Y 5 Y 6 Y 7 Y 8

10

C. For the data set below, compute the following:

a.

b.

1
f. X i
n

j.

X Y

i i

1
g. Yi
n

c.

h.

2
i

2
i

d.

i.

e.

X Y
i

X Y
i

Data on the weekly frequency of watching TV (X) and frequency of
reading books (Y) of n = 10 pupils.

Pupil   1   2   3   4   5   6   7   8   9   10
X       5   3   4   2   1   0   6   5   2   3
Y       2   4   3   5   6   7   1   2   5   4
D. Given a set of values $X_1, X_2, \ldots, X_N$ and $Y_1, Y_2, \ldots, Y_N$.


If A

and B

Prove the following:


a.

b.

c.

d.

e.

A 0

A X i NA2
2

A X i B

X Y B
i

1 X i 2 X i N
2

A X i B

X Y NAB
i i

2
5

3
4