
STATISTICS

and

PROBABILITY

Basic Approach

By:

FELIPE M. SARMIENTO
WILFREDO N. DE CASTRO

TABLE OF CONTENTS
Introduction Page
0.1 Statistics 1
0.2 Mathematical Statistics 2
0.3 Two Fields of Statistics 2
0.4 Uses of Statistics 3
0.5 Terminologies 5

Chapter I - Collection of Data


1.1 Importance of Statistical Data in Management 7
1.2 Advantage of Primary Data 8
1.3 Reasons for Collecting Data 9
1.4 Methods of Gathering Data 9
1.5 Planning the Study 10
1.6 Types of Question 11
1.7 Contents of a Good Questionnaire 13
1.8 Sampling Technique 13
1.9 Random Sampling 13
1.10 Systematic Sampling 14
1.11 Types of Systematic Sampling 15
1.12 Non-Random Sampling 18

Chapter II - Presentation of Data


2.1 Organization of Quantitative Data 23
2.2 Textual Method 24
2.3 Tabular Presentation 24
2.4 Parts of Statistical Table 25
2.5 Frequency Table 26
2.6 Steps in Constructing Frequency Table 27
2.7 Organization of Data Using EXCEL 30

2.8 Graphical Methods 31
2.9 Kinds of Graphs 31
2.10 Cumulative Frequency Distribution 34
2.11 Relative Frequency Distribution 37

Chapter III - Central Tendency & Variation


3.1 Central Tendency 41
3.2 Mean, Mode, Median 43
3.3 Computation Using EXCEL 46
3.4 Analysis of the Mean 47
3.5 Ungrouped Data 49
3.6 Arithmetic Mean 50
3.7 Grouped Data 50
3.8 Computation of Median from Grouped Data 53
3.9 Computation of Mode from Grouped Data 54
3.10 Comparison of Mean, Mode, and Median 54
3.11 Positively Skewed Distribution 56
3.12 Negatively Skewed Distribution 57
3.13 Other Forms of Central Tendency 58
3.14 Non-Central Form of Measurement 59
3.15 Variations & Deviations 68
3.16 The Range 69
3.17 The Semi-Interquartile Range 69
3.18 Mean Absolute Deviation 70
3.19 Standard Deviation 70

Chapter IV - Basic & Discrete Probability


4.1 The Basic Concept of Probability 81
4.2 Mutually Exclusive Event 82
4.3 Repeated trials 83
4.4 n-Factorial 83
4.5 Sample Space 84

4.6 Contingency Table 84
4.7 Discrete Probability 85
4.8 Probability Distribution of Discrete Random Variable 86
4.9 Value of Discrete Variable 87

Chapter V - Normal Distribution


5.1 Properties of Normal Curve 97
5.2 Skewed Distribution 98
5.3 Areas under Normal Curve 99
5.4 Application of Normal Distribution 103
5.5 Standardized Normal Distribution 112
5.6 Table of Areas Under Normal Curve 116
5.7 Sampling Distribution of the Mean 120
5.8 Other Features of Random Sampling 120
5.9 Distribution of the Mean 121
5.10 Standard Error of the Mean 123
5.11 Sampling from Normally Distributed Population 123
5.12 Z-Value for Sampling Distribution of the Mean 124
5.13 The Central Limit Theorem 126
5.14 Normality and the Sampling Distribution of the Mean 126

Chapter VI - Interval Estimation


6.1 Interval Estimation 132
6.2 Confidence Interval of the Mean (σ Known) 132
6.3 Confidence Interval of the Mean (σ Unknown) 137
6.4 Student's t-Distribution 137
6.5 Properties of t-Distribution 137
6.6 Student's t-Distribution Table 138
6.7 Degrees of Freedom Concept 139
6.8 Sampling Distribution of the Proportion 143
6.9 Confidence Interval Estimate for Proportion 145

Chapter VII - Fundamentals of Hypothesis Testing
7.1 Hypothesis Testing Methodology 151
7.2 The Null and Alternative Hypotheses 152
7.3 Type I and Type II error 152
7.4 Level of Significance 153
7.5 One Tailed and Two Tailed Test 154
7.6 Steps in Hypothesis Testing 154

Chapter VIII - Simple Regression & Correlation
8.1 Simple Regression Analysis 175
8.2 Scatter Diagram 176
8.3 Simple Correlation Analysis 184
8.4 Pointers in Interpreting Computed Values of r 188

INTRODUCTION
Statistics

Statistics is the process of gathering, organizing, presenting, analysing,
interpreting, and evaluating data. It can be applied to scientific,
engineering, industrial, and social problems. It is conventional to start
with a statistical model or a population to be evaluated or studied. A
population can cover many kinds of subjects, such as people, animals,
things, gadgets, foods, and the like, or any simple matter that is
considered in a study or in an experiment.

Today, the use of statistics extends to electronics, marketing, trading,
medicine, foods, sports, entertainment, and other related things that can be
expressed in numbers. The government uses statistics in reporting its
operations and accomplishments, based on quantitative data given by its
different departments and bureaus. Statistics is now the language of people
from all walks of life.

In this work-textbook, we will take the term statistics to mean the
science that deals with gathering, presenting or tabulating, analysing, and
interpreting numerical or quantitative data. In addition, the work-textbook
will feature some basic topics of probability.

The gathering or collecting of data refers to sampling and gathering
quantitative measurements of some specific data, while the tabulation or
presentation of data deals with the process of combining similar data into
categories by way of organized tabulation, graphical diagrams, or charts in
order to arrive at logical conclusions or decisions.

A standard procedure in statistics is the assessment of the relationship
between two statistical data sets. Hypothesis testing is the basic method
used in comparing two statistical data sets, whether both are collected or
one is idealized. Other methods and processes will be discussed in the later
chapters of the work-textbook to give emphasis to other dimensions of
statistics.

Mathematical Statistics

The application of mathematics to statistics is called Mathematical
Statistics. Different mathematical techniques and methods, such as linear
algebra, probability theory, differential equations, and other applicable
methods, are used for the analysis of gathered or idealized data.

Two Fields of Statistics

Descriptive Statistics is the establishment of facts through the
gathering, classifying, and presenting of data. It summarizes the collected
values to describe the characteristics of a group or category of data. It
involves the measures of central tendency, variability, and skewness of the
organized and ordered data.

Inferential Statistics stresses a logical order of evaluation that leads to
a more specific conclusion from analytical findings. It is a process that
uses mathematical analysis to reach a reasonable conclusion from the given
facts or data. We use it to infer from samples what the population might be
like, or to make probability-based decisions between two sets of data. The
simplest inferential test compares the performance of two groups, or of a
sample and the population, on a single measure to see if there is a
difference. The methods include hypothesis testing, analysis of variance,
the chi-square test for enumeration data, regression, simple correlation,
and time series analysis.

Uses of Statistics

• In Education – Quantitative and qualitative data are gathered on
graduation and enrolment, number of teachers, school buildings, and other
related physical facilities. Based on these data, the DepEd can project its
budgetary requirements.
• In Government – Annual records are stored and kept as idealized data,
which become the basis of budget projection. Other data are gathered to
guide the heads of departments and offices in their administration and
operation.
• In Psychology – Data on human behaviour in relation to emotion and
intelligence, attitudes and aptitudes, and personality traits are gathered
and analysed in a systematic way. Psychologists are the ones who interpret
and evaluate these data.
• In Sociology – Statistics is used to determine the social condition of a
certain place or location. This covers the living conditions of the people,
the economic impact on the place, the social impact and responsibilities of
the people, and the peace and order of the place.
• In Business and Economics – Statistics has a great share in business and
economics. Before a business is established, statistics is used in
determining possible markets, forecasting sales, and determining business
feasibility. It is also used in personnel relations and in improving the
work attitudes of workers.
• In Sports and Medicine – In boxing, basketball, football, soccer, tennis,
and other kinds of sports, statistics plays a big role in assessing
athletes’ performances. Records of previous games are analysed
statistically in order to evaluate the present performance of a player or
athlete. In the same manner, statistical computations are used in medicine,
especially in the formulation of drugs and other medical treatments.
• Others – Statistics has a wide range of uses. It is presumed that
everything under the sun can be subjected to statistics.

A usual objective of a statistical analysis is to investigate causality or
connection, particularly to draw a conclusion on the effect that changes in
the value of an independent variable have on a dependent variable. These
effects are observed, and it is to measure them that a statistical research
study is conducted.

Terminologies
• Data are any gathered or idealized set of values under consideration in a
study, such as standards, significances, costs, components, things, items,
articles, or any other valuable matters or elements.
• Population refers to the complete set of all possible data under
consideration in the study or research.
• Sample refers to a portion of the population that is gathered by way of
any acceptable method of collecting data.
• Data Point refers to any element in the sample or population.
• Qualitative Data vary widely in nature. They include any information that
is not numerical. These are the results obtained when the information has
been sorted into categories.
• Quantitative Data are data that can be quantified and subjected to
statistical computation. These are the results of counting or measuring, in
contrast to the categories of qualitative data.
• Parameter is a variable for which the range of possible values identifies
a collection of distinct cases in a certain problem. In statistics, a
parameter is a quantity describing the population whose value is sought by
means of evidence from samples.
• Subscript is a number or letter placed at the lower right of a variable to
distinguish among several values of that variable.
• Primary Data refers to information that was gathered directly from an
original source.
• Secondary Data refers to information taken from published or unpublished
data that were previously gathered by other individuals or agencies.
• Numerical Scale – It is often necessary to group numerical data into
categories. The range of the data is divided into a number of intervals,
where each interval becomes a category in a numerical scale. This type of
numerical scale is implemented by the Numerical Scale Class.
• Ordinal Scale refers to the systematic arrangement of data by rank,
degree, capability, strength, and the like.
• Interval Scale refers not only to the arrangement of observations in order
but also to additional information attached to them, such as the size of
the differences between observations.

CHAPTER 1
COLLECTION OF DATA

TOPIC LESSON
1. Methods in Collecting Data.
2. Planning a Research.
3. Survey Questionnaire.
4. Sampling Techniques

OBJECTIVES
For the students to:

1. Know the methods of collecting data.
2. Prepare a research plan, including its requirements and expenses.
3. Compose a good questionnaire.
4. Have actual experience in collecting data through sampling
techniques.

Importance of Statistical Data in Management

Management requires available information when it comes to decision
making. The ability of every entrepreneur to carry out critical business
manoeuvres relies on the information available to him. The present issue in
any business endeavour is not only the shortage of data and information but
also the use of the available data to come up with good business decisions.

Any kind of business venture, project study, research, and the like
should be based on precise and correct data in order to ensure the accuracy
of the research or study. To this effect, there should be an excellent
method of sampling, that is, of gathering and collecting the data that will
be used in the analysis and interpretation of the research or study. Good
samples will produce sound results, while poor samples will lead to
unreliable results.

There are two types of data: primary data and secondary data. Examples of
primary data are first-person accounts, autobiographies, diaries, and the
like, while examples of secondary data are published books, newspapers,
magazines, biographies, business reports, and the like.

Data or observations collected from a prime or first-hand source have a
higher degree of accuracy, as long as they are acquired systematically. This
type of data is dependable and relevant because of the direct involvement of
the researchers in gathering it. If primary data are not available,
secondary data can be used, but with lesser precision.

Advantages of Primary Data:

1. Primary data are first-hand data coming from the original source.
2. Primary data are highly relevant to the research because they are
gathered directly for the project.
3. Primary data are more accurate because of their origin.

Reasons for Collecting Data:
1. To provide necessary input to a study and research.
2. To determine the performance of any existing service,
production, sales process and the likes.
3. To assess the quality of any product in accordance with
existing standards.
4. To support in the formulation of alternative measures in
the process of decision making.
5. To satisfy management curiosity towards the direction of
the business.

Methods of Gathering Data:

1. The direct or interview method. This is a person-to-person
conversation between the interviewer and the interviewee. The
researcher or interviewer is the one who accomplishes the set of
questions by writing down the answers of the respondents. The
researcher can modify the questions to suit the level of
understanding of the respondents.
2. The indirect or questionnaire method. In this method, the
respondents give written responses to prepared questions. The
respondents answer the questionnaire the way they understand it,
without further explanation by the researcher. This is done by
giving out questionnaires to members of the population who are
chosen as samples.
3. The registration method. This method is compulsory for the
respondents because it is prescribed by law. The advantage of this
method is that the information is stored by government or private
entities and made available to anybody who needs it. Examples of
this method are car registration, enrolment, census, and the like.
4. The observation method. In this method, the investigator observes
the behaviour of a certain phenomenon, person, or group and records
their activities or outcomes. It is usually used when the subjects
cannot talk or write, as with typhoons and other phenomena, special
persons, and the like.
5. The experiment method. It is commonly used by scientists,
chemists, and other people who conduct experiments. The objective
of this method is to record the cause and effect in a scientific
study that is carried out in a meticulous and organized manner.

Where to find Data


1. Published government data.
2. Published industrial and personal data.
3. Execute an experiment for the required data.
4. Make an observational study.
5. Plan a survey study.

Planning the Study

1. Establish the size of the population. Make an estimate of the
number of items to be considered in the study or research.
2. Organize the people who will gather and interpret the data.
Determine the duration of the gathering, evaluation, and
interpretation of the data (to be discussed in the later chapters,
the inferential part of the book). Assess the monetary resources or
budget available to pursue the research. If the budget does not
warrant the study of the entire population, the researcher can use
samples of the population.
3. Prepare all the documents pertaining to the study, especially the
questionnaire. Decide on the parameters to be followed in
collecting the population’s or sample’s data.

Types of questions

1. Structured questions – These are questions that can be answered by
choosing from a few options, for example:
a. Are the following equipment important to you? Please
check.
Laptop ( ) yes ( ) no
P.C. ( ) yes ( ) no
Cell phone ( ) yes ( ) no
Motorcycle ( ) yes ( ) no
b. Please check the items you want.
Car ( )
Motorcycle ( )
House & Lot ( )
c. Please check your civil status.
( ) Single
( ) Married
( ) Widow
( ) Widower

2. Unstructured or open-ended questions – These are questions that can
be answered in different ways. They are investigative questions
that elicit the reasons behind an answer.
Examples:
a. Do you like the system of government?
( ) yes ( ) no. Why?
b. In your opinion, can we produce a genius student in this
university?
( ) yes ( ) no. Why?
4. Determine the sample size needed using Slovin’s formula:

$$n = \frac{N}{1 + N e^{2}} \qquad \text{(eq. 2.1)}$$

Where: n = Sample size
N = Population size
e = Desired margin of error

Example: A researcher wants to make a socio-economic survey in a
certain community having a population of 10,000 families more
or less. If he allows a margin of error of ±3.0%, how many
families must he take into his sample?

Solution:

$$n = \frac{N}{1 + N e^{2}} = \frac{10{,}000}{1 + 10{,}000(0.03)^{2}} = \frac{10{,}000}{10} = 1{,}000 \text{ families}$$
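The computation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `slovin_sample_size` is an assumption of this example, not something defined in the book, and the result is rounded to the nearest whole number to match the worked examples in this chapter (some practitioners round up instead).

```python
def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Sample size n = N / (1 + N * e**2), rounded to the nearest whole unit."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a fraction between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return round(n)


# Worked example from the text: N = 10,000 families, e = 3%
families = slovin_sample_size(10_000, 0.03)
print(families)
```

The margin of error is passed as a fraction (0.03 for ±3%), which mirrors how it is substituted into the formula by hand.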

5. Collect the samples using one of the sampling techniques discussed
in the latter part of this chapter. The parameters decided on in
step 3 must be strictly observed in collecting the data.
6. Evaluate and interpret the data using the methods of inferential
statistics discussed in the later part of the book. After the
interpretation, a statement of conclusion has to be made.

Contents of a Good Questionnaire

1. It should contain short but clear questions.
2. Leading questions should be avoided. Ex. Instead of “Why do you
prefer SELECTA ice cream?”, ask “What brand of ice cream do you
prefer?”
3. The units stated in the questions should be precise so that the
results are easy to tabulate.
4. If possible, the questions should be answerable by just checking the
slots provided for each particular name, kind, type, or brand.
5. The questions should ask for important information.
6. The questions should be carefully planned to attain the desired flow
of thinking.

Sampling Techniques

Sampling is the actual collecting or gathering of data. It is not
required to cover the entire population if it is impossible to do so; only
the number of samples required for the desired margin of error is needed.
What is important is to have samples of the data required for the particular
research or study.

A. Random Sampling
Random sampling refers to the selection of a sample of size (n) from a
population (N) without a given pattern or system, so that each item or
member of the population has an equal chance of being included in the
sample. The number of samples will be in accordance with the required number
of samples as computed by the formula given in step 4 of Planning the Study.
When we speak of picking things at random, we mean picking things
fairly, without prejudice or any predetermined choice. On any occasion, for
example, the guests may be asked to pick a seat at random. This can be done
by assigning an individual number to each seat. The numbers are then written
on pieces of paper and placed in a box or container where they are mixed
thoroughly. When a participant draws a number from the box, he has drawn a
number at random.
Random sampling can also be done in awarding prizes through the
“raffle” system. The winners can be asked to pick their prizes at random. A
number is assigned to every prize so that whoever picks a number gets the
corresponding prize.
One way of getting samples through random sampling is the Lottery
Method. In this method, each member of the population is given an assigned
number. All the numbers are put in a lottery box, which is rolled or shaken,
after which the samples are picked one by one from the box.

There are other ways of selecting samples at random. The manner of
selecting can still be called random as long as every member of the
population has a chance to be a sample and no system is applied in the
selection.
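A computer can play the role of the lottery box. The sketch below is only an illustration: the function name `lottery_sample` and the population of 50 numbered seats are assumptions made for this example. It relies on Python's standard `random.sample`, which draws distinct members so that each one has an equal chance of being picked.

```python
import random


def lottery_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members; every member has an equal chance."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(list(population), sample_size)


# Hypothetical population: seats numbered 1 to 50; draw 5 of them
seats = range(1, 51)
drawn = lottery_sample(seats, 5, seed=1)
print(drawn)
```

Leaving `seed` unset gives a different draw on each run, which is closer to shaking a real lottery box.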

B. Systematic Sampling
Another way of selecting samples is systematic sampling. A number of
types have been developed which may be called systematic sampling methods.
These types are used when there is prior knowledge of the members of the
population. An example of this method is when samples are chosen by counting
off repeatedly and taking every first or every fifth member of each count.
This process of using a system is called systematic sampling.
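The counting-off procedure can be sketched as follows. The function name `systematic_sample` and the list of 20 numbered members are assumptions made for this illustration; the sketch simply takes every k-th member of an ordered list.

```python
def systematic_sample(members, interval, start=0):
    """Take every `interval`-th member, beginning at index `start`."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    if not 0 <= start < interval:
        raise ValueError("start must satisfy 0 <= start < interval")
    return list(members)[start::interval]


# Hypothetical list of 20 numbered members, counted off in groups of five
members = list(range(1, 21))
print(systematic_sample(members, 5))           # first member of each group of five
print(systematic_sample(members, 5, start=4))  # fifth member of each group of five
```

With 20 members and an interval of 5, either starting point yields four samples.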

Types of Systematic Sampling

1. Stratified Sampling – Stratum refers to a layer. In this method,
the population is divided into layers (strata) based on their
uniformity in order to avoid drawing samples whose members come
from only one layer.
The number of samples in each group is determined in proportion to
its size: the bigger the group, the larger the number of samples.
2. Cluster Sampling – Cluster also refers to a group, as in stratified
sampling, but it applies more to area sampling. In this method,
geographic locations are given more emphasis in selecting samples.
The number of samples per group can also be set in accordance with
the total number of members in each group. For example, in a
certain municipality, the number of samples will depend on the
population of each barangay.
3. Multi-Stage Sampling – This refers to phase-by-phase sampling.
The samples come from different stages of the whole area or
community. This method uses several stages in getting the sample
from the general population. However, the selection of samples at
each stage is done at random.
A concrete example of this is selecting samples from an entire
country. The first stage is to select regional samples. The size of
the regional samples is determined by the regional populations.
The selection then proceeds from regional to provincial, provincial
to municipal, and municipal to barangay.
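The stage-by-stage idea can be sketched as below. Everything in this example is hypothetical: the function name `multi_stage_sample`, the region and province names, and the number of areas drawn at each stage. For brevity the sketch draws a fixed number of areas per stage and does not size the samples by population as the text describes.

```python
import random


def multi_stage_sample(areas, units_per_stage, seed=None):
    """Randomly pick areas stage by stage from a nested dict.

    `areas` maps each area name to a dict of its sub-areas; the last stage
    maps to an empty dict. `units_per_stage` lists how many areas to draw
    at each stage. Returns the chosen paths as tuples.
    """
    rng = random.Random(seed)

    def draw(level, stage, path):
        # Stop when all stages are done or there is nothing below this area
        if stage == len(units_per_stage) or not level:
            return [path]
        count = min(units_per_stage[stage], len(level))
        chosen = rng.sample(sorted(level), count)
        paths = []
        for name in chosen:
            paths.extend(draw(level[name], stage + 1, path + (name,)))
        return paths

    return draw(areas, 0, ())


# Hypothetical two-stage frame: regions, then provinces within each region
frame = {
    "Region A": {"Province A1": {}, "Province A2": {}, "Province A3": {}},
    "Region B": {"Province B1": {}, "Province B2": {}},
    "Region C": {"Province C1": {}, "Province C2": {}, "Province C3": {}},
}
selected = multi_stage_sample(frame, units_per_stage=[2, 1], seed=7)
print(selected)
```

Here two regions are drawn first, and then one province is drawn at random inside each chosen region; further stages (municipal, barangay) would be added as deeper levels of the nested dictionary.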

Illustrative example in determining number of samples in systematic
sampling:

In a certain school, a survey on academic performance is being
conducted in each course. The following are the number of
students per course:

BSA - 540
BSBA - 1,950
BSEnt. - 450
BSE - 520
BSIT - 470
BSECE - 350
BSEE - 410
BSME - 560
BSCE - 570
BSComE - 480
Others - 650
TOTAL - 6,950
a) How many samples are required if the margin of error is ±3%?
b) Find the number of samples per course.

Given: Total population = 6,950, e = ±3%

Solution: Determine first the required number of samples based on the
total population, using Slovin’s formula:

$$n = \frac{N}{1 + N e^{2}} = \frac{6{,}950}{1 + 6{,}950(0.03)^{2}} = \frac{6{,}950}{7.255} \approx 958 \quad \text{(Answer)}$$
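Ratio of samples per course:

$$R = \frac{\text{number of students of course} \times \text{total number of samples}}{\text{total population}}$$

For BSA:
$$R = \frac{540 \times 958}{6{,}950} \approx 74 \text{ samples}$$

For BSBA:
$$R = \frac{1{,}950 \times 958}{6{,}950} \approx 269 \text{ samples}$$

For BSEnt:
$$R = \frac{450 \times 958}{6{,}950} \approx 62 \text{ samples}$$

For BSE:
$$R = \frac{520 \times 958}{6{,}950} \approx 72 \text{ samples}$$

For BSIT:
$$R = \frac{470 \times 958}{6{,}950} \approx 65 \text{ samples}$$

For BSECE:
$$R = \frac{350 \times 958}{6{,}950} \approx 48 \text{ samples}$$

For BSEE:
$$R = \frac{410 \times 958}{6{,}950} \approx 57 \text{ samples}$$

For BSME:
$$R = \frac{560 \times 958}{6{,}950} \approx 77 \text{ samples}$$

For BSCE:
$$R = \frac{570 \times 958}{6{,}950} \approx 79 \text{ samples}$$

For BSComE:
$$R = \frac{480 \times 958}{6{,}950} \approx 66 \text{ samples}$$

For Others:
$$R = \frac{650 \times 958}{6{,}950} \approx 90 \text{ samples}$$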

TOTAL - 6,950
Note: The sum of the samples per course is 959, not 958, because each
course’s ratio is rounded to the nearest whole number.
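The per-course computation can be repeated for all courses at once. This is a minimal sketch; the function name `allocate_samples` is an assumption of this example, and each share is rounded to the nearest whole number as in the solution above.

```python
def allocate_samples(course_sizes, total_samples):
    """Share total_samples among courses in proportion to their sizes."""
    population = sum(course_sizes.values())
    return {
        course: round(size * total_samples / population)
        for course, size in course_sizes.items()
    }


courses = {
    "BSA": 540, "BSBA": 1950, "BSEnt": 450, "BSE": 520, "BSIT": 470,
    "BSECE": 350, "BSEE": 410, "BSME": 560, "BSCE": 570, "BSComE": 480,
    "Others": 650,
}
samples = allocate_samples(courses, 958)
print(samples)
print(sum(samples.values()))  # rounding each course separately can shift the total
```

Because every course is rounded on its own, the total of the allocated samples need not equal the overall sample size exactly.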

C. Non-Random Sampling
In this method, not all members of the population are given an equal
chance of being chosen. Some elements of the population are deliberately
left out of the sample for various reasons. Non-random sampling is used
instead of random sampling if the budget is inadequate to pursue random
sampling. Most of the time, this method is used for studies limited to a
particular group, such as men only or women only, the rich only or the
poor only, and the like.

Name: ________________________ Course: _______
Classroom Activity No. 1.1 Section: _______

1. A researcher wants to make a socio-economic survey in a certain
municipality having a population of 350,000 people more or less. If he
allows a margin of error of ±4.0%, how many people must he take into his
sample?

2. A researcher gathered 1,560 samples in his research/study. If his desired
margin of error was ±3.0%, what would be the considered population in his
study?

3. A researcher gathered 2,200 samples in his research/study. If the total
population considered in his research/study was 18,000 people, what would
be the considered margin of error in his study?

4. In a certain school, a survey on the academic performance and socio-
economic status of the students is being conducted in each course. The
following are the number of students per course:
BSA - 140
BSBA - 650
BSEnt. - 350
BSE - 150
BSIT - 170
a) How many samples are required if the margin of error is ±2.5%?
b) Find the number of samples per course.

Name: ________________________ Course: ________
Homework No. 1.1 Section: ________

Evaluate the following problems:

1. A researcher wants to make a socio-economic survey in a certain
municipality having a population of 350,000 people more or less.
If he allows a margin of error of ±4.0%, how many people must he
take into his sample?

2. A researcher gathered 1,560 samples in his research/study. If his
desired margin of error was ±3.0%, what would be the considered
population in his study?

3. A researcher gathered 2,200 samples in his research/study. If the
total population considered in his research/study was 18,000
people, what would be the considered margin of error in his study?

4. In a certain school, a survey on the academic performance and
socio-economic status of the students is being conducted in each
course. The following are the number of students per course:
BSA - 140
BSBA - 650
BSEnt. - 350
BSE - 150
BSIT - 170
a) How many samples are required if the margin of error is ±2%?
b) Find the number of samples per course.

CHAPTER 2

PRESENTATION OF DATA

TOPIC LESSON
1. Methods of Presenting Data.
2. Statistical Table / Frequency Table
3. Frequency Distribution
4. Frequency Polygon

OBJECTIVES
For the students to:

1. Be familiar with the methods of presenting data.


2. Construct statistical / frequency table.
3. Know how to accomplish the frequency distribution.
4. Draw the frequency polygon.

Organization of Quantitative Data


Presentation of data comes after the gathering of data. When the
number of observations becomes very large, it is very difficult to arrange
them in either ascending or descending order. To understand what the data
convey, it is necessary to organize them for better understanding and
interpretation. The gathered data are raw data, and they have to be
organized in order to bring out their important qualities and attributes.
There are three forms of presenting data:
1. Textual – where data is presented in paragraph form
2. Tabular – where data is presented in rows or columns
3. Graphical – where data is presented in visual form.

Textual Method.

This method presents data by describing, in sentences and paragraphs,
the summary of the contents. Some people cannot easily understand a set of
data in tabular form unless an initial clarification of the data is given.
In textual presentation, the researcher or the one making the study can
emphasize the importance of some items or call attention to some values.

An example of the textual method is the one used by reporters on
television or radio. Reporters prefer the textual method of presentation
because of the manner in which they do their job. It is easier for them to
read the text than to interpret tables and graphs.

Tabular Presentation.

Tabular presentation can be done manually, but some big companies use
computer programs to tabulate data. In this book, we will show the process
of condensing a huge amount of collected data into an arranged and
organized tabulation.

The process of summarizing collected data and assembling them in an
organized table is called Tabulation. The collected data can be classified
into groups, and each group in the table can be compared with the others
more conveniently.

The process of combining the same items from the mass of collected data
based on their characteristics and features, such as occupation, sex,
height, weight, income, nationality, etc., is called classification of data.

The classified data, when arranged in an orderly form, simplify the
analysis of the relationship between groups. This arrangement is called a
statistical table. This table contains four parts, and they are:

Parts of Statistical Table.

The four important parts of a statistical table:

1. The table heading – it describes the contents of the table.
2. Stub – it shows the classification of the contents.
3. Box head – it defines the contents of each column.
4. Body – it is the content of the table.

Example:

Annual Enrolees of P. A. University
1995 - 2005

Year    Population    Rate of Increase
1995    1,280         -
1996    1,379         7.73%
1997    1,492         8.19%
1998    1,650         10.59%
1999    1,783         8.06%
2000    1,935         8.52%
2001    2,122         9.66%
2002    2,368         11.59%
2003    2,681         13.22%
2004    3,042         13.47%
2005    3,428         12.69%

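Table Heading - "Annual Enrolees of P. A. University, 1995 - 2005"
Box Head - the column titles "Year", "Population", and "Rate of Increase"
Stub - the entries of the Year column (1995 to 2005)
Body - the population and rate-of-increase figures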
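The text does not state how the Rate of Increase column is obtained. Assuming it is the year-over-year percentage change in enrolment, which reproduces the figures in the table above, it can be sketched as follows. The function name `rate_of_increase` is an assumption of this example, and only the first four years are included to keep it short.

```python
def rate_of_increase(previous, current):
    """Percentage change from previous to current, to two decimal places."""
    return round((current - previous) / previous * 100, 2)


# Enrolment figures for the first four years of the table above
enrolment = {1995: 1280, 1996: 1379, 1997: 1492, 1998: 1650}
years = sorted(enrolment)
for earlier, later in zip(years, years[1:]):
    print(later, rate_of_increase(enrolment[earlier], enrolment[later]))
```

The first year has no rate because there is no earlier year to compare it with, which is why the table shows a dash for 1995.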

Frequency Table

Preferred benefits of the faculty & employees of P. A. University

Welfare Benefits      Tally                       Frequency
Profit Sharing        IIII-IIII-IIII-IIII-IIII    25
Christmas Bonus       IIII-IIII-IIII-IIII-I       21
Quarterly Bonus       IIII-IIII-IIII-II           17
Night Differential    IIII-IIII-IIII-II           17
Rice Benefits         IIII-IIII-IIII              15
Monthly Groceries     IIII-IIII-III               13
Vacation Leave        IIII-IIII                   10
Sick Leave            IIII-III                    8
Clothing              IIII-II                     7
Provident Fund        IIII-I                      6
TOTAL                                             139

The tabular arrangement of data by categories, together with their
frequencies or occurrences, is called a Frequency Distribution. The number
of items or observations belonging to a category is called the Class
Frequency. The grouping of items described by a lower and an upper limit is
the Class Interval. The lower limit is the value of the lowest item that can
belong to a class interval, while the upper limit is the value of the
highest item that can belong to the same class interval.

The Class Boundaries state the class limits more precisely by extending
them by half a unit (0.5 for whole-number data). Each class boundary is
located midway between the upper limit of one class interval and the lower
limit of the next class interval. For example, the boundary between the
class intervals 123-131 and 132-140 is 131.5.

Steps in Constructing a Frequency Distribution

1. Determine the range of the set of data. It is the difference between
the highest and lowest values in the observations.
2. Assume the desired number of class intervals or categories. Five to
fifteen (5 – 15) is the ideal number of class intervals.
3. Establish the approximate size of the class interval. It is
computed as the quotient of the range over the desired number of
class intervals. Since the size and the number of class intervals
depend on each other, the researcher may sometimes fix the size
first and derive the number from it.
4. When constructing the frequency distribution table, it is suggested
that the class intervals start with the lowest lower limit, as
chosen by the researcher.
5. Determine the class frequency of each class interval using the
tally method or any other acceptable method.
6. Compute the class mark. The class mark is the average of the lower
and upper limits of a class interval.

Example: Consider the raw data below and construct a frequency
distribution table.

153 144 166 147 135 148
142 152 161 156 133 123
170 143 152 137 151 155
154 134 147 163 157 135
125 138 185 143 145 155
175 158 166 154 129 173
180 153 147 164 179 128

Solution: Notice how difficult it is to make any interpretation from the
raw data as given. A practical way to analyse the data is to put them in
a frequency distribution table.

1. Find the range in the observation.

𝑅 = 185 − 123 = 62

2. Assume the desired number of class intervals.

Assumption: 7 class intervals

3. Compute the size of the class intervals.

$i = \frac{R + 1}{\text{number of class intervals}}$

$i = \frac{62 + 1}{7} = 9$

4. Determine the Class Intervals in ascending order. The book
suggests to start from the lowest value of item.
a. First Class Interval
i. Lower Limit = 123
ii. Upper Limit = Lower Limit + Size -1
Upper Limit = 123 + 9 – 1 = 131
b. Second Class Interval & succeeding C. I.
i. Lower Limit = Lower Limit of Prior + Size
Second L.L = 123 + 9 = 132
Third L.L = 132 + 9 = 141
Fourth L.L = 141 + 9 = 150
Fifth L.L = 150 + 9 = 159
Sixth L.L = 159 + 9 = 168
Seventh L.L = 168 + 9 = 177
ii. Upper Limit = Upper Limit of Prior + Size
Second U.L = 131 + 9 = 140
Third U.L = 140 + 9 = 149
Fourth U.L = 149 + 9 = 158
Fifth U.L = 158 + 9 = 167
Sixth U.L = 167 + 9 = 176
Seventh U.L = 176 + 9 = 185
5. Count the frequency in each class interval.
6. Compute the class marks.

First CM $= \frac{123 + 131}{2} = 127$

The class marks of the succeeding class intervals are computed in the same
way as for the first class interval.

Class Interval Tally Frequency Class Mark
123-131 IIII 4 127
132-140 IIIII-I 6 136
141-149 IIIII-IIII 9 145
150-158 IIIII-IIIII-II 12 154
159-167 IIIII 5 163
168-176 III 3 172
177-185 III 3 181
Total 42
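The six steps above can be sketched in Python. This is only an illustrative sketch, not part of the original text: the function name `frequency_table` is hypothetical, and rounding the class size up when the division is not exact is an assumption made so that every item falls inside some class.

```python
# Raw data from the example (42 observations).
data = [
    153, 144, 166, 147, 135, 148,
    142, 152, 161, 156, 133, 123,
    170, 143, 152, 137, 151, 155,
    154, 134, 147, 163, 157, 135,
    125, 138, 185, 143, 145, 155,
    175, 158, 166, 154, 129, 173,
    180, 153, 147, 164, 179, 128,
]


def frequency_table(values, num_classes):
    """Return rows of (lower limit, upper limit, frequency, class mark)."""
    lowest, highest = min(values), max(values)
    value_range = highest - lowest              # step 1: R
    size = (value_range + 1) // num_classes     # step 3: i = (R + 1) / number of classes
    if (value_range + 1) % num_classes:
        size += 1                               # assumption: round up so all items are covered
    rows = []
    for k in range(num_classes):                # step 4: start from the lowest value
        lower = lowest + k * size
        upper = lower + size - 1
        freq = sum(1 for v in values if lower <= v <= upper)   # step 5
        rows.append((lower, upper, freq, (lower + upper) / 2)) # step 6: class mark
    return rows


table = frequency_table(data, 7)
for lower, upper, freq, mark in table:
    print(f"{lower}-{upper}  frequency={freq}  class mark={mark}")
```

With seven classes the sketch reproduces the class intervals and frequencies of the table above.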

Organization of Data Using the Excel Program:


Data gathered from any source can be organized using the Excel
program. Construct a table in the Excel display with the desired number of
columns and rows, as shown below. Encode the necessary information from
the gathered data into the rows and columns of the table.
The encoded data can then be analysed using the Excel program.
The data can be turned into different types of graphs. They can also be
evaluated and interpreted by finding the statistical measures that will be
discussed in the next chapters of the book.
Class Interval Tally Frequency Class Mark
123-131 IIII 4 127
132-140 IIIII-I 6 136
141-149 IIIII-IIII 9 145
150-158 IIIII-IIIII-II 12 154
159-167 IIIII 5 163
168-176 III 3 172
177-185 III 3 181
Total 42

Layout of Class Boundaries and Class Limits:
Class Boundaries:          131.5       140.5       149.5
Class Limits:     123-131     132-140     141-149     150 ...

Graphical Method:
Graphs are pictures of numerical data. They come in many styles
and are widely used because they give a clear picture of numerical
data. The viewer can instantly recognize the highest or the largest
value in any particular data such as population, births, registration,
and the like.

Kinds of Graphs:
1. Bar Graph – a graph that consists of several vertical or
horizontal bars. The magnitude of each item is represented by the
scaled length of its bar.
2. Pie Chart – a graph in the form of a pie or circle. It is used to
represent the share of each category in the entire observation or
data.
3. Line Graph – a graph that shows the magnitudes or frequencies
of the items or values in an observation as points connected by lines.
4. Compound Bar Chart – an ordinary bar graph in which two or
more bars are drawn for each item. This chart is used when a
comparison is needed.
There are many graphs that can be adapted to a presentation
relevant to any subject of study. Graphs are instruments that help
in the interpretation of data and other related matters.

[Figure: Bar graph of class frequencies against class marks (127 to 181)]

[Figure: Pie chart of the shares of the class marks 127 to 181]

[Figure: Line graph of class frequencies against class marks (127 to 181)]

[Figure: Compound bar chart of frequency and percent against class marks (127 to 181)]

Frequency Polygon – a line graph of class frequencies plotted
against class marks. It is made by connecting the midpoints of the
rectangular tops of a bar graph.

[Figure: Frequency polygon of frequency against class marks]
Cumulative Frequency Distribution: The cumulative frequency
distribution is a tabular arrangement of the cumulated frequencies of the
class intervals. There are two types of cumulative frequency
distribution, the “less than” and the “more than” cumulative
distribution.

1. The “less than” cumulative frequency distribution starts from the
frequency of the first class interval. The frequencies of the
succeeding class intervals are then added one by one to obtain the
cumulative frequencies of the rest of the class intervals. It is
represented by a line going upward, and the graph produced is
called the less than ogive. The frequencies in the <cf column are
the numbers of items less than the upper boundary of the
corresponding class interval.

Class Interval f <cf
123-131 4 4
132-140 6 10
141-149 9 19
150-158 12 31
159-167 5 36
168-176 3 39
177-185 3 42
Total 42

In the frequency distribution: There are 4 items less than
131.50; 10 items less than 140.50; 19 items less than 149.50; 31 items
less than 158.50; 36 items less than 167.50; 39 items less than 176.50;
and 42 items less than 185.50.

[Figure: Less than ogive, <cf plotted against the upper class boundaries 131.5 to 185.5]

2. The “more than” cumulative frequency distribution starts from the
total frequency. Each succeeding cumulative frequency is the
difference between the previous cumulative frequency and the
frequency of the previous class interval. It is represented by a line
going downward, and the graph produced is called the more than
ogive. The frequencies in the >cf column are the numbers of items
more than the lower boundary of the corresponding class interval.

Class Interval f >cf
123-131 4 42
132-140 6 38
141-149 9 32
150-158 12 23
159-167 5 11
168-176 3 6
177-185 3 3
Total 42

In the frequency distribution: There are 42 items more than
122.50; 38 items more than 131.50; 32 items more than 140.50; 23
items more than 149.50; 11 items more than 158.50; 6 items more than
167.50; and 3 items more than 176.50.
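Both cumulative columns can be checked with a short Python sketch. It is offered only as an illustration; the variable names are hypothetical, and the frequencies are those of the example table.

```python
from itertools import accumulate

# Class frequencies of the example, from class 123-131 up to class 177-185.
frequencies = [4, 6, 9, 12, 5, 3, 3]

# "Less than" cumulative frequency: running total from the first class.
less_than_cf = list(accumulate(frequencies))

# "More than" cumulative frequency: running total from the last class,
# then put back in the original class order.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

print(less_than_cf)
print(more_than_cf)
```

The first list corresponds to the <cf column and the second to the >cf column.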

[Figure: More than ogive, >cf plotted against the lower class boundaries 122.5 to 176.5]

Relative Frequency Distribution: The relative frequency distribution
is the arrangement of data in tabular form indicating the percentage of the
class frequencies over the total frequency. It is sometimes called the
percentage table.
Class Interval f rf(%)
123-131 4 9.52
132-140 6 14.29
141-149 9 21.43
150-158 12 28.57
159-167 5 11.90
168-176 3 7.14
177-185 3 7.14
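The percentage column can be reproduced with a few lines of Python. This is a minimal sketch with hypothetical variable names; rounding to two decimal places is assumed because that is how the table is printed.

```python
# Class frequencies of the example, from class 123-131 up to class 177-185.
frequencies = [4, 6, 9, 12, 5, 3, 3]
total = sum(frequencies)

# Relative frequency in percent: class frequency over total frequency.
relative = [round(100 * f / total, 2) for f in frequencies]

for f, rf in zip(frequencies, relative):
    print(f, rf)
```

Because each percentage is rounded separately, the printed column may not add up to exactly 100.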

[Figure: Compound bar chart of frequency and percent against class marks (127 to 181)]

Name: ________________________ Course: ________
Classroom Activity No.2.1 Section: ________
1. Prepare a frequency distribution table for the following random data.
148 253 268 372 387 493 408 513 528 633 648
753 768 873 888 491 406 511 526 631 646 751
766 472 487 592 507 612 627 732 747 453 468
573 588 693 608 713 728 534 549 654 669 576
681 696 701 517 622 637 547 552 564 576 588
690 503 519 621 535 644 556 666 374 584 794
804 712 820 835 641 556 761 876 683 699 501
517 422 235 647 758 869 374 485 293 505 610
625 730 745 850 865 973 187 292 207 312 327
432 447 552 567 672 687 792 707 812 827 932

Name: ________________________ Course: ________
Homework No.2.1 Section: ________
1. Prepare a frequency distribution table for the following random data.
57 42 25 67 78 89 34 45 23 55 60
65 70 75 80 85 93 17 22 27 32 37
42 47 52 57 62 67 72 77 82 87 92
18 23 28 32 37 43 48 53 58 63 68
73 78 83 88 41 46 51 56 61 66 71
76 42 47 52 57 62 67 72 77 43 48
53 58 63 68 73 78 54 59 64 69 56
61 66 71 57 62 67 57 52 54 56 58
60 53 59 61 55 64 56 66 34 54 74
84 72 80 85 61 56 71 86 63 69 51

2. Construct a frequency distribution table for the following data
representing the daily savings of employees in a certain company.
141 146 151 156 161 166 171 176 142 147 152 157
132 137 143 148 153 158 163 168 173 178 183 188
152 154 156 158 160 153 159 161 155 164 156 166
134 154 174 184 177 162 167 172 177 182 187 192
157 142 125 167 178 189 134 145 123 155 160 165
170 175 180 185 196 117 122 127 132 137 142 147
162 167 172 177 143 148 153 158 163 168 173 178
154 159 164 169 156 161 166 171 157 162 167 157
152 157 162 167 172 177 182 187 192 118 123 148

CHAPTER 3

CENTRAL TENDENCY and VARIATION
A. CENTRAL TENDENCY

TOPIC LESSON
1. The Three Measures of Central Tendency
2. Ungrouped Data
3. Grouped Data
4. Comparison of Mean, Median, and Mode
5. Which Measure to Use

OBJECTIVES
For the students to:

1. Identify the Three Measures of Central Tendency.
2. Deal with Ungrouped and Grouped Data.
3. Establish the Difference between the Three
Measures of Central Tendency.
4. Know the Uses of each of them.

Central Tendency and Variation:


Different descriptive measures representing central tendency and
variation can be used to summarize the foremost information about any
given set of data in any analysis and interpretation. When these measures
are computed from the entire population of data, they are called
parameters; when they are computed from a sample, they are called
statistics.

Central Tendency is a measure or a position in a set of data that
describes the entire data itself. It can be shown in three forms, which are
the Mean, the Median, and the Mode. Another measure that can describe a
set of data is the measure of variability. This will be taken up in the later
part of this chapter.

The Central Tendency is a single figure, which may or may not be
found in the set of data, that represents the values and items in the set. It is
a measure or a position in the set of data, or possibly a member of the set,
that represents the entire data in any evaluation or interpretation.

For instance, a store manager was asked by the store owner about the
daily sales of the store for a period of six months or 180 days. Instead of
enumerating the sales for all 180 days, the manager can simply give one
value that represents the entire 180 days of sales, like the average daily
sales, the highest one-day sale, or any particular value that can describe the
180-day sales.

In line with production, the values may differ from day to day but a
single figure will suffice to define the volume of production in any given
span or period. And so, instead of going into the details of a given
distribution, perhaps it would be easier to find out that single figure that can
represent the entire set of data.

In any aspect, there is one value or single figure that could be used to
describe a set of data. The most commonly used measures are the mean,
median, and mode.

The mean is defined as the average figure of all the items. It is the
“central value” of any set of observations and is computed as the sum of all
the items divided by the number of items.

$\bar{x} = \frac{\sum x}{N}$ for ungrouped data (eq. 3.1)

$\bar{x} = \frac{\sum fx}{N}$ for grouped data (eq. 3.2)

The median is defined as the value at the middle of any distribution or set
of data. It could be one of the items in the distribution or simply a value
that represents the middle figure after arranging the data in order. For
ungrouped data, the position of the middle item or score is found by eq. 3.3;
for grouped data, the median itself is computed by eq. 3.4:

$m = \frac{N+1}{2}$ for ungrouped data (eq. 3.3)

$m = L_m + \left( \frac{\frac{N}{2} - \sum f_{(m-1)}}{f_m} \right) i$ for grouped data (eq. 3.4)

The mode is defined as the figure or value that has the highest frequency,
that is, the value that appears most frequently in the set of data.

$mo = L_{mo} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i$ for grouped data (eq. 3.5)

Illustrative example: Consider the following set of observations.
35, 37, 40, 40, 48

Given: N=5

Solution: Refer to the formulas

Mean: $\bar{x} = \frac{\sum x}{N} = \frac{35 + 37 + 40 + 40 + 48}{5} = \mathbf{40}$

The Median: $m = \frac{N+1}{2} = \frac{5+1}{2} = 3$

Median is the third item: $m = \mathbf{40}$

The Mode: (By inspection)

The most frequent figure is:

𝑚𝑜 = 𝟒𝟎

We notice that in the given distribution, the three measures of central
tendency, the mean, median, and mode, are equal. Suppose we change
one item in the distribution, say 58 instead of 48. The measures would
then be:

The average figure now is:

Mean: $\bar{x} = \frac{35 + 37 + 40 + 40 + 58}{5} = \mathbf{42}$

The middle figure now is still: (By inspection)

Median: $m = \mathbf{40}$

The most frequent figure now is still: (By inspection)

Mode: $mo = \mathbf{40}$

We can see that the mode and median are still 40, but the mean of the
new set of terms is now 42 and no longer 40. If instead we change the third
term of the original set to 35 instead of 40, the mode is no longer 40 but 35
and the median is no longer 40 but 37. The value of the mean would also
change. Every change applied to any term brings a change in the mean, as
shown.

The average figure now is:

Mean: $\bar{x} = \frac{35 + 35 + 37 + 40 + 48}{5} = \mathbf{39}$

The middle figure is: (By inspection)

Median: 𝑚 = 𝟑𝟕

The most frequent figure is: (By inspection)

Mode: 𝑚𝑜 = 𝟑𝟓

We must remember that the mode and median respond only to some
changes in the terms, but the mean responds to every change in the terms.
This is the reason why the mean is often described as “sensitive” and as
reflecting the entire distribution.
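The three cases discussed above can be verified with Python's standard `statistics` module. This is only an illustrative check; the list names are hypothetical labels for the three sets of terms used in the text.

```python
from statistics import mean, median, mode

original = [35, 37, 40, 40, 48]        # the given set of observations
one_extreme = [35, 37, 40, 40, 58]     # 48 replaced by 58
third_changed = [35, 37, 35, 40, 48]   # third term 40 replaced by 35

results = {}
for label, items in [("original", original),
                     ("one_extreme", one_extreme),
                     ("third_changed", third_changed)]:
    # Store (mean, median, mode) for each set of terms.
    results[label] = (mean(items), median(items), mode(items))
    print(label, results[label])
```

In every case the mean moves, while the median and mode move only in the third case, which is the behaviour described in the text.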

That is the very reason why the mean is the most important among the
measures of central tendency. However, this sensitivity can lead to some
disadvantages, especially when the distribution contains some extreme
values. Extreme values are values that are very low or very high compared
with the rest of the distribution. If there are many extreme values, the mean
is not the best measure to represent the distribution.

Computation Using the Excel

The computation of the three forms can be done using the Excel
program. From the table, we can find the total by putting a formula in the
cell reserved for the sum of the items. The median and the mode can be
found by inspection, and the mean by using the sum.

A B
1 1 35
2 2 37
3 3 40
4 4 40
5 5 48
6 Total 200
7 𝑥̅ 40
8 m 40
9 mo 40

From the Excel Program:

Assign numbers to each of the items and encode the items in any order in
one column as shown above. After the encoding, highlight the entire data and click the
“Data” icon to show the options. Then click the sort order, ascending or descending
(Smallest to Largest or Largest to Smallest). The data will automatically be arranged in
the chosen order.

The sum or total can be found by clicking the “∑” AutoSum icon. The “Mean” can
be computed by dividing the “Total” cell by the cell holding the assigned number of the
last item. The “Median” can be found using the “Formula” while the “Mode” can be found
by inspection of the most frequent item.

Computation of the Mean Using the Excel Program: (From the Figure shown)

=B6/A5, then press Enter. The operation will give 40.

Analysis of the mean:

Theorem 1: In any set of terms or numerical distribution, if
the sum of the differences of the items from a
certain number equals zero, that certain number
is the mean.

Illustration: Differences from a certain number

Items Certain Number Differences


35 40 -5
37 40 -3
40 40 0
40 40 0
48 40 8
Total 0

Using another value as the certain number, we have the following
results. Try 42.

Items Certain Number Differences
35 42 -7
37 42 -5
40 42 -2
40 42 -2
48 42 6
Total -10

Based on the theorem, if the total of the differences is zero, the certain
number is the mean. The illustrations above confirm this: the total is zero
for 40, which is the mean, and is not zero for 42.

Theorem 2: In any set of terms or numerical distribution, the sum of the
squared differences from the mean is less than the sum of
the squared differences from any other number.

Illustration: Sum of the squared differences

Items Mean Difference Squared D


35 40 -5 25
37 40 -3 9
40 40 0 0
40 40 0 0
48 40 8 64
Total 98

Using other values, we have: Try 42

Items Other Number Difference Squared D
35 42 -7 49
37 42 -5 25
40 42 -2 4
40 42 -2 4
48 42 6 36
Total 118

Based on the theorem, the total of the squared differences is least
when taken from the mean. The illustrations above confirm this: the total
is 98 from the mean of 40 but 118 from 42.
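The two illustrations are connected by a short derivation, sketched here as a supplement to the text. The symbol $c$ for the "certain number" is introduced only for this sketch. Writing each difference as $(x - c) = (x - \bar{x}) + (\bar{x} - c)$ and using Theorem 1, which makes the middle term vanish, gives:

```latex
\sum (x - c)^2
  = \sum (x - \bar{x})^2 + 2(\bar{x} - c)\sum (x - \bar{x}) + N(\bar{x} - c)^2
  = \sum (x - \bar{x})^2 + N(\bar{x} - c)^2

% Check with the illustration: \bar{x} = 40, c = 42, N = 5
\sum (x - 42)^2 = 98 + 5(40 - 42)^2 = 98 + 20 = 118
```

Since $N(\bar{x} - c)^2$ is never negative and is zero only when $c = \bar{x}$, the sum of squared differences is smallest at the mean, which is what Theorem 2 states.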

Ungrouped Data:

Ungrouped data refers to individual items without any category or
class. For ungrouped data, the mean is simply the sum of all items divided
by the total number of items.

Example: What are the three measures of central tendency of the
following costs of a certain brand of shirts? P400, P550, P520,
P600, and P650.

Formula: $\bar{x} = \frac{\sum x}{n}$

Solution: $\bar{x} = \frac{400 + 520 + 550 + 600 + 650}{5}$

$\bar{x} = \text{P}544$

By inspection, the median and mode are:

𝑚 = 550

𝑚𝑜 = 𝑁𝑜𝑛𝑒

Arithmetic mean:

The weighted arithmetic mean applies when each value occurs in a
quantity of more than one. Each value is weighted by its quantity, and the
mean derived in this case is called the weighted arithmetic mean.

Example: Consider the following data, 1,000 shirts were sold at P250;
800 shirts at P300; 500 shirts at P320; 400 shirts at P400; and
300 shirts at P550. What is the weighted arithmetic mean price
of shirts?

Solution:

$WX = \frac{1}{3{,}000} \left[ (1{,}000 \times 250) + (800 \times 300) + (500 \times 320) + (400 \times 400) + (300 \times 550) \right]$

$WX = \text{P}325$
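The same computation can be written as a short Python sketch. The variable names are hypothetical; the quantities and prices are those given in the example.

```python
# (quantity sold, price in pesos) for each group of shirts in the example.
sales = [(1000, 250), (800, 300), (500, 320), (400, 400), (300, 550)]

total_quantity = sum(quantity for quantity, _ in sales)

# Weighted arithmetic mean: each price is weighted by the quantity sold.
weighted_mean = sum(quantity * price for quantity, price in sales) / total_quantity

print(total_quantity, weighted_mean)
```

Dividing by the total quantity of 3,000 shirts, rather than by the five price levels, is what makes this a weighted mean.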

Computation of mean, median, and mode from grouped data:

Data which are arranged in a frequency distribution are called grouped
data. When the number of items is too large, it is best to compute the
measures of central tendency and variability using the frequency distribution.

The Mean:

Formula: $\bar{x} = \frac{\sum fx}{N}$

Where: 𝑥̅ – Mean
f – Frequency of Class Interval
x – Class Mark of Class interval
N – Total Number of Items

The Median:

Formula: $m = L_m + \left( \frac{\frac{N}{2} - \sum f_{(m-1)}}{f_m} \right) i$

Where: m – Median
Lm – Lower class boundary of the median class
∑ f(m−1) – Sum of all frequencies before the median class
fm – Frequency of the median class
N – Total number of items
i – Size of class interval

The Mode:

Formula: $mo = L_{mo} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i$

Where: mo – Mode
Lmo – Lower class boundary of the modal class
∆1 – Difference between the highest frequency and the
frequency of the class before it
∆2 – Difference between the highest frequency and the
frequency of the class after it
i – Size of class interval

Example: Consider the raw data below. Determine the mean, the
median, and the mode.

153 144 166 147 135 148
142 152 161 156 133 123
170 143 152 137 151 155
154 134 147 163 157 135
125 138 185 143 145 155
175 158 166 154 129 173
180 153 147 164 179 128

Solution: Using the tabular presentation of data as discussed in Chapter 2,
construct the frequency distribution in preparation for the
determination of the three measures. Make a tally of the
frequencies and present them in the frequency distribution table
for analysis.

Class interval Frequency Class Mark
123-131 4 127
132-140 6 136
141-149 9 145
150-158 12 154
159-167 5 163
168-176 3 172
177-185 3 181
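The text does not carry out the grouped mean for this table, so the following Python sketch applies eq. 3.2 to it. It is an illustration only: the variable names are hypothetical, and the class marks are taken as the midpoints of the class limits, as defined in Chapter 2.

```python
# (lower limit, upper limit, frequency) for each class interval of the example.
classes = [
    (123, 131, 4), (132, 140, 6), (141, 149, 9), (150, 158, 12),
    (159, 167, 5), (168, 176, 3), (177, 185, 3),
]

total = sum(freq for _, _, freq in classes)                        # N
# Sum of f * x, where x is the class mark (midpoint of the limits).
sum_fx = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes)

grouped_mean = sum_fx / total                                      # eq. 3.2
print(total, sum_fx, round(grouped_mean, 2))
```

The result, about 151.21, is close to the grouped median and mode obtained for the same table.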

After arranging the data from lowest to highest, the middle values in
the set are 152 and 152. Hence, the median is $\frac{152 + 152}{2} = 152$.
The median implies that the first half of the ordered set of data has values
less than or equal to 152 while the other half has values greater than or
equal to 152.

Computation of median from grouped data:

Arranging a huge set of data in ascending or descending order will take a
lot of time just to find the median. The median, being one of the three forms
of central tendency, is also important in the analysis of the data. We have to
determine the value of the median, which divides the distribution into two
equal parts. Thus, we consider the “less than” cumulative frequency.

Median $m = L_m + \left( \frac{\frac{N}{2} - \sum f_{(m-1)}}{f_m} \right) i$

The median class is the class interval where the $\frac{N+1}{2}$th item is
found. In the example, the $\frac{N+1}{2}$th item is between the 21st and
22nd items. From the given data, the value of the 21st item is 152 and that of
the 22nd item is also 152, and both are within the cumulative frequency of 31.
Therefore, the median class is [150-158]. The lower class boundary of the
median class is 149.5, the frequency of the median class is 12, and the
cumulative frequency before the median class is 19.

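The median is then computed as follows:

Median $m = L_m + \left( \frac{\frac{N}{2} - \sum f_{(m-1)}}{f_m} \right) i$

$m = 149.5 + \left( \frac{\frac{42}{2} - 19}{12} \right) 9$

$m = 149.5 + \left( \frac{21 - 19}{12} \right) 9$

$m = 149.5 + \left( \frac{2}{12} \right) 9$

$m = 149.5 + 1.5$

Median m = 151 using grouped data.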
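The grouped median computation can be sketched in Python as follows. The function name `grouped_median` is hypothetical. Two assumptions are made for the sketch: the class limits are whole numbers, so the lower class boundary is the lower limit minus 0.5, and the median class is taken as the first class whose cumulative frequency reaches N/2, which selects the same class as the text for this example.

```python
def grouped_median(classes, size):
    """Median of grouped data.

    classes: list of (lower limit, upper limit, frequency), ascending.
    size: size of the class interval (i).
    """
    total = sum(freq for _, _, freq in classes)   # N
    half = total / 2                              # N / 2
    cumulative = 0                                # sum of frequencies before the class
    for lower, _, freq in classes:
        if cumulative + freq >= half:
            boundary = lower - 0.5                # lower class boundary (assumption)
            return boundary + (half - cumulative) * size / freq
        cumulative += freq
    raise ValueError("classes must contain at least one item")


classes = [
    (123, 131, 4), (132, 140, 6), (141, 149, 9), (150, 158, 12),
    (159, 167, 5), (168, 176, 3), (177, 185, 3),
]
median_value = grouped_median(classes, 9)
print(median_value)
```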

Computation of the mode from grouped data:

Based on its description, the mode can be found in the class interval
with the highest frequency. The class interval with the highest frequency is
known as the modal class. An observation with only one mode is known as
unimodal while an observation with two or more modes is called multimodal.
A distribution with two modes is called bimodal, one with three modes
trimodal, and so on.

The mode in this distribution is located in the class interval with a
frequency of 12. Using the given formula, the mode is:

Mode $mo = 149.5 + \left( \frac{12 - 9}{(12 - 9) + (12 - 5)} \right) 9$

$= 149.5 + \left( \frac{3}{3 + 7} \right) 9$

$= 149.5 + \left( \frac{3}{10} \right) 9$

$= 149.5 + 2.7$

Mode mo = 152.2 or 152
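The grouped mode computation can be sketched in the same way. The function name `grouped_mode` is hypothetical, and the sketch makes several assumptions that the text does not state: whole-number class limits (boundary = lower limit minus 0.5), the first class is used if several share the highest frequency, a missing neighbouring class is treated as having frequency 0, and the modal frequency exceeds that of at least one neighbour.

```python
def grouped_mode(classes, size):
    """Mode of grouped data.

    classes: list of (lower limit, upper limit, frequency), ascending.
    size: size of the class interval (i).
    """
    freqs = [freq for _, _, freq in classes]
    k = freqs.index(max(freqs))                        # position of the modal class
    before = freqs[k - 1] if k > 0 else 0              # frequency of the class before it
    after = freqs[k + 1] if k < len(freqs) - 1 else 0  # frequency of the class after it
    delta1 = freqs[k] - before
    delta2 = freqs[k] - after
    boundary = classes[k][0] - 0.5                     # lower class boundary (assumption)
    return boundary + delta1 * size / (delta1 + delta2)


classes = [
    (123, 131, 4), (132, 140, 6), (141, 149, 9), (150, 158, 12),
    (159, 167, 5), (168, 176, 3), (177, 185, 3),
]
mode_value = grouped_mode(classes, 9)
print(round(mode_value, 1))
```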

Comparison of mean, median and mode:

The mode is very noticeable in the distribution. It is the item having
the highest frequency in the frequency distribution and it is the most
common value in the distribution. Because it occurs most often, a dressmaker
would produce more garments of the modal size. In the apparel business, the
mode is the best measure of central tendency to use.

In mathematical analysis, the mode is less sensitive than the other
two forms; hence we would rather use either the mean or the median because
of their greater sensitivity and stability. The mean is unaffected by the size
while the median is only a location or position in the distribution. The mean
is very sensitive by its very nature: whenever the distribution changes, the
mean changes accordingly. It is sensitive to the magnitude of every score.
The median keeps its middle position in the ordered data whatever changes
occur in the observation.

The mean, the median, and the mode are all located at one point in a
symmetrical distribution. Data can be considered symmetrical if there are no
extreme values on both ends of the distribution so that the distribution is
balanced at the center of the data.

[Figure: Symmetrical distribution, frequencies against the x-axis, with the mean, median, and mode at the same point]

1. The median bisects the total area. The area is divided into two equal
parts, one to the left and the other to the right of the median.
2. The mode is the item with the greatest frequency, the item on the
x-axis which corresponds to the tallest point of the curve.
3. The mean is the score point on the x-axis which corresponds to the
balance point or fulcrum of the set of data.

If the distribution is skewed, the scores are concentrated more at one
end or the other. The three measures will then differ from each other. When
the mean is greater than the median, the skewness is to the right, while if
the mean is less than the median, the skewness is to the left of the frequency
polygon. If the mean is equal to the median, the skewness is zero and the
distribution is called symmetrical, as in the normal distribution.

Positively skewed distribution

[Figure: Positively skewed distribution, frequency against the x-axis, with the mode, median, and mean marked from left to right]

We notice from the graph that the skewness is to the right. This
signifies that there are many low values; the mode lies among the low
values and is lower than the mean. The mean, which is sensitive by
nature, is pulled in the direction of the extreme scores and has a high
value. The median has a value between the mode and the mean and is
little affected by extreme values.

Negatively skewed distribution

[Figure: Negatively skewed distribution, frequency against the x-axis, with the mean, median, and mode marked]

When do we use the three forms of central tendency?

The appropriate measure to use with a nominal scale is the mode,
because the mode can be located in the distribution merely by looking for
the most frequently occurring value. It gives essential information to
businessmen and producers who aim their commodities at specific markets.
Manufacturers are very particular about the most saleable size of products
such as dresses, shoes, bags, and the like. For them, the mode is the most
important among the three forms of central tendency.
The median is the measure to use if we want to place the distribution in
order (ordinal scale) and then determine the middle value. The salaries of
employees are a good example of when to use the median. The median
would be preferred over the mean because the salaries of a few highly paid
employees would distort the average. The mean in such a case cannot be
the typical measure because it will be pulled by the few high salaries.
In all occasions other than the above, especially if the data are on an
interval or ratio scale, the mean is to be used as the measure of central
tendency. Generally, if we are concerned with quantity, the mean will be an
appropriate measure.
A second consideration in choosing a measure of central tendency is
the purpose for which the measure is being used. The mean is the best
measure to use if we want the value of every single observation to contribute
to the average. If we want to estimate the cost of the average housing unit in
the community, the median would be more accurate to use. If we want to
find out the most frequently occurring item in a distribution, the mode is the
measure to be used.

Other form of Central Tendency:


The Midrange (mr)

The midrange is the average of the lowest and highest observations. It
is very easy to obtain: get the sum of the two extremes and divide it by
two. This measure is not frequently used in the detailed interpretation and
analysis of data. Nevertheless, many analysts use the midrange as a summary
measure because it gives an immediate and suitable figure that can describe
the given set of data. Some experts can compute it mentally. The midrange
can be found by:

$mr = \frac{x_1 + x_n}{2}$ (eq. 3.6)

Where: mr – Midrange
x1 – Lowest observation
xn – Highest observation

The Non-Central Forms of Measurement:

The Quartiles
The quartiles are commonly used measures of “non-central” location,
particularly for very large sets of observations. The quartiles divide the
entire observation into four quarters.

The First Quartile is the item or value wherein three-fourths of the observations
are higher while the remaining one-fourth are lower. Its position can be found by:

$Q_1 = \frac{n+1}{4}$ for ungrouped data

$Q_1 = LL_{q1} + \left( \frac{\frac{n+1}{4} - \sum f_{(q1-1)}}{f_{q1}} \right) i$ for grouped data

Where: Q1 – First Quartile
LLq1 – Lower Limit of First Quartile Class
n – Total Observations
Σf(q1−1) – Total Frequency before the First Quartile Class
fq1 – Frequency of the First Quartile Class
i – Size of the Class

The Third Quartile is the item or value wherein three-fourths of the observations
are lower while the remaining one-fourth are higher. Its position can be found by:

$Q_3 = \frac{3(n+1)}{4}$ for ungrouped data

$Q_3 = LL_{q3} + \left( \frac{\frac{3(n+1)}{4} - \sum f_{(q3-1)}}{f_{q3}} \right) i$ for grouped data

Where: Q3 – Third Quartile
LLq3 – Lower Limit of Third Quartile Class
n – Total Observations
Σf(q3−1) – Total Frequency before the Third Quartile Class
fq3 – Frequency of the Third Quartile Class
i – Size of the Class

The Quantiles

The quantiles are the natural extension of the median concept: they
are values which divide the set of data into equal parts, just as the median
divides the set of data into two equal parts. The quantiles which divide the
set of data into four parts are called quartiles. Deciles divide the distribution
into ten parts while percentiles divide the distribution into one hundred parts.

Example: Consider the frequency distribution below and determine the
midrange and the quartiles.
Solution: Identify the lowest and the highest item for the midrange, and
the location of the first and third quartiles.
Class interval Frequency Class Mark
123-131 4 127
132-140 6 136
141-149 9 145
150-158 12 154
159-167 5 163
168-176 3 172
177-185 3 181
Total 42

Midrange: $mr = \frac{123 + 185}{2}$

$mr = \mathbf{154}$ Answer

First Quartile: $Q_1 = \frac{n+1}{4} = \frac{42+1}{4} = 10.75$

Hence, we are looking for the 11th item in the set of data. From the
table, the 11th item belongs to Class Interval 141-149.
LLq1 = 141
n = 42
Σf(q1−1) = 10
fq1 = 9
i = 9

$Q_1 = LL_{q1} + \left( \frac{\frac{n+1}{4} - \sum f_{(q1-1)}}{f_{q1}} \right) i$

$Q_1 = 141 + \left( \frac{10.75 - 10}{9} \right) 9$

$Q_1 = \mathbf{141.75}$ Answer

Third Quartile: $Q_3 = \frac{3(n+1)}{4} = \frac{129}{4} = 32.25$

Hence, we are looking for the 33rd item in the set of data. From the
table, the 33rd item belongs to Class Interval 159-167.
LLq3 = 159
n = 42
Σf(q3−1) = 31
fq3 = 5
i = 9

$Q_3 = LL_{q3} + \left( \frac{\frac{3(n+1)}{4} - \sum f_{(q3-1)}}{f_{q3}} \right) i$

$Q_3 = 159 + \left( \frac{32.25 - 31}{5} \right) 9$

$Q_3 = \mathbf{161.25}$ Answer
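The quartile computations above can be sketched in Python. The function name `grouped_value_at` is hypothetical. The sketch follows the formulas exactly as given in this section, that is, it starts from the lower limit of the quartile class (not the class boundary) and uses the positions (n+1)/4 and 3(n+1)/4.

```python
def grouped_value_at(classes, size, position):
    """Value at a given item position in grouped data.

    classes: list of (lower limit, upper limit, frequency), ascending.
    size: size of the class interval (i).
    position: item position, e.g. (n + 1) / 4 for the first quartile.
    """
    cumulative = 0                      # total frequency before the current class
    for lower, _, freq in classes:
        if cumulative + freq >= position:
            return lower + (position - cumulative) * size / freq
        cumulative += freq
    raise ValueError("position is beyond the total number of observations")


classes = [
    (123, 131, 4), (132, 140, 6), (141, 149, 9), (150, 158, 12),
    (159, 167, 5), (168, 176, 3), (177, 185, 3),
]
n = sum(freq for _, _, freq in classes)

q1 = grouped_value_at(classes, 9, (n + 1) / 4)
q3 = grouped_value_at(classes, 9, 3 * (n + 1) / 4)
print(q1, q3)
```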

Name: ________________________ Course: ________
Classroom Activity No.3.1 Section: ________
1. Determine the three forms of central tendency and the non-central measures
regarding the observation below.
148 253 268 372 387 493 408 513 528 633 648
753 768 873 888 491 406 511 526 631 646 751
766 472 487 592 507 612 627 732 747 453 468
573 588 693 608 713 728 534 549 654 669 576
681 696 701 517 622 637 547 552 564 576 588
690 503 519 621 535 644 556 666 374 584 794
804 712 820 835 641 556 761 876 683 699 501
517 422 235 647 758 869 374 485 293 505 610
625 730 745 850 865 973 187 292 207 312 327
432 447 552 567 672 687 792 707 812 827 932

Name: ________________________ Course: ________
Homework No.3.1 Section: ________

1. Find the arithmetic mean of the grades of a student who obtained the
following examination grades: 90, 87, 92, 85, 88, 95, and 78.

2. Find the measures of central tendency and the non-central measures of
the following numbers: 40, 50, 60, 35, 65, 28, 31, 55, 61, 49, 34, 42, 38,
47, 55, 46.

3. The following are the salaries of 12 players of the top NBA teams: four
of them earn 15M a year, two earn 18M a year, three earn 20M a year, and
three earn 8M a year. What are the mean and median salaries of the players?

4. An achievement test in economics contained 30 questions. The
distribution below summarizes the result of the test.
Number of Answers Frequency
1–3 2
4–6 8
7–9 16
10 – 12 26
13 – 15 38
16 – 18 42
19 – 21 36
22 – 24 24
25 – 27 6
28 – 30 2
Find the mean, median, mode, midrange, and quartiles of the
distribution above.

5. In a selection of 15 lots of 200 electronic components, the following
numbers of defective electronic components were found:
3 10 12 5 11
8 7 4 13 4
9 4 6 5 15
Find the median and the mode of the defective electronics.

6. Given the following frequency distribution, estimate the mean, median,
and the mode.
Class Interval Frequency
71 – 75 4
66 – 70 18
61 – 65 24
56 – 60 42
51 – 55 47
46 – 50 29
41 – 45 27
36 – 40 11
31 – 35 3

7. Given the following raw data, determine the mean, median, and mode,
and the non-central measures.
57 42 25 67 78 89 34 45 23 55 60 65
70 75 80 85 98 15 22 27 32 37 42 47
52 57 62 67 72 77 82 87 92 18 23 28
32 37 43 48 53 58 63 68 73 78 83 88
41 46 51 56 61 66 71 76 42 47 52 57
62 67 72 77 43 48 53 58 63 68 73 78
54 59 64 69 56 61 66 71 57 62 67 57
52 54 56 58 60 53 59 61 55 64 56 66
34 54 74 84

B. VARIATION and DEVIATION

TOPIC LESSON
1. Mean Absolute Deviation
2. Quartile Deviation
3. Standard Deviation
4. Other Deviations

OBJECTIVES
For the students to:

1. Understand deviations and variances.
2. Analyse the differences of items in any observations.
3. Be familiar with the different types of variability.
4. Know the uses of different types of variability.

Variation is another essential property that can describe a set of
numerical observations. The amount of spread or dispersion in any
observation or set of data is called variation. There are five forms or
measures that can be considered in variation. They are the range, the
interquartile range, the mean absolute deviation, the standard deviation, and
the variance. Other forms of variation will be discussed at the end of the
chapter for basic information.

In the previous topic, we discussed the measures of central tendency as
a means to describe a given set of data. These measures indicate the point
where the items are centrally located. However, they do not show whether
the terms in the distribution are far from or close to each other.

The measures of position are of little value unless the measures of
spread or variability which occur about them are known. Therefore, the
description of a set of data becomes more meaningful if the degree of
clustering about a central point is measured. Information on how far apart
the observations are from each other in every set will be very useful.

We can answer this through the use of measures of spread or variability.
These measures describe the extent of “scattering” of individual items about
the average or point of central location. The measures we shall discuss are
the range, the quartile deviation, the mean absolute deviation, and the
standard deviation.

The Range

The range is the simplest measure of spread or variability. It is the
difference between the highest and the lowest items in the distribution. In a
frequency distribution table, the range is the difference between the upper
limit of the highest class interval and the lower limit of the lowest class
interval.

The semi-Interquartile Range

The semi-interquartile range is sometimes called the quartile deviation. It
is half the spread between the first quartile and the third quartile. In a
symmetrical distribution this equals the distance from the first quartile to
the median, or from the median to the third quartile.

The semi-interquartile range measures the dispersion in the middle half of
the items arranged in an array. Hence, it offsets the possible effect of
extreme values that are out of line. (It should not be confused with the
“midhinge”, which is the average of the first and third quartiles and is a
measure of location rather than of spread.)

Formula: $QD = \frac{1}{2}(Q_3 - Q_1)$    eq. 3.7

Where: QD – Quartile deviation
Q1 – First quartile
Q3 – Third quartile
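
The range and eq. 3.7 can be sketched in Python as follows. This is only an illustration: the function name `spread_summary` is hypothetical, the five values are book prices used purely as sample data, and `statistics.quantiles` applies its default "exclusive" quartile convention, which may differ slightly from the quartile method used elsewhere in this text.

```python
import statistics

def spread_summary(data):
    """Return (range, quartile deviation) for a list of numbers."""
    data_range = max(data) - min(data)
    # statistics.quantiles(data, n=4) returns the three cut points [Q1, Q2, Q3].
    # Its default "exclusive" method is one of several quartile conventions
    # (an assumption here, not necessarily the textbook's method).
    q1, _, q3 = statistics.quantiles(data, n=4)
    quartile_deviation = (q3 - q1) / 2  # eq. 3.7
    return data_range, quartile_deviation

prices = [400, 550, 520, 600, 650]  # sample data for illustration
data_range, qd = spread_summary(prices)
print(data_range, qd)
```

Because quartile conventions differ, the quartile deviation from a spreadsheet or a hand computation may not match this value exactly.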

Mean absolute deviation

To arrive at a more reliable indicator of the variability or spread in a
distribution, we should consider the value of each individual score and
determine the amount by which each varies from the mean of the
distribution. One way of doing this is to use the measure called the mean
absolute deviation.

To get the mean absolute deviation, we take the sum of the absolute values
of the deviations from the mean, then divide it by n − 1, one less than the
total number of items in the distribution. (This text uses n − 1 in its
formulas and examples; many references divide by n instead.)

Formula: $MAD = \frac{\sum |x - \bar{x}|}{n - 1}$    ungrouped data    eq. 3.8

Where: MAD – mean absolute deviation
x – individual score
x̄ – mean
n – total number of items

Formula: $MAD = \frac{\sum f\,|x - \bar{x}|}{n - 1}$    grouped data    eq. 3.9

Where: MAD – mean absolute deviation
x – class mark
x̄ – mean
f – frequency
n – total number of items (sum of the frequencies)

The standard deviation

The standard deviation is a special form of average deviation from the
mean: it is the square root of the sum of the squared deviations from the
mean divided by n − 1. It is affected by all the individual scores of the
items in the distribution.

Formula: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$    ungrouped data    eq. 3.10

Where: s – standard deviation
x – value of item
x̄ – mean
n – total number of items

Formula: $s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$    grouped data    eq. 3.11

Where: s – standard deviation
f – frequency
x – class mark
x̄ – mean
n – total number of items (sum of the frequencies)

Example 1. The prices of certain books are set at P400, P550, P520, P600,
and P650. Find the measures of variability.

Solution: Determine first the mean of the distribution.

$\bar{x} = \frac{\sum x}{N}$

$\bar{x} = \frac{400 + 550 + 520 + 600 + 650}{5}$

$\bar{x} = P544$

Then, using the computed mean, tabulate the given data together with the
mean in the EXCEL program in two separate columns. Also, provide columns
for the deviations needed for the mean absolute deviation and the standard
deviation. Since the analysis is for ungrouped data, the two formulas to be
used are given below. (Please refer to Example 2 for the detailed
step-by-step procedure in EXCEL.)

$MAD = \frac{\sum |x - \bar{x}|}{n - 1}$    For Mean Absolute Deviation

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$    For Standard Deviation

Tabulation from the Excel Program

              x       x̄     (x − x̄)   (x − x̄)²   |x − x̄|
1            400     544     -144      20736       144
2            520     544      -24        576        24
3            550     544        6         36         6
4            600     544       56       3136        56
5            650     544      106      11236       106
TOTAL       2720                       35720       336
35720/4                                 8930
x̄            544
MAD           84
SD         94.50

By Analytical Computation for Ungrouped Data:

Mean Absolute Deviation

$MAD = \frac{\sum |x - \bar{x}|}{n - 1}$

$MAD = \frac{336}{4}$

MAD = 84    Answer

Standard Deviation

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

$s = \sqrt{\frac{35{,}720}{4}}$

s = 94.50    Answer
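
The same computation can be sketched in Python as a check on the spreadsheet. This is a minimal illustration under the text's convention of dividing by n − 1 in both formulas; the function name `mad_and_sd` is hypothetical.

```python
import math

def mad_and_sd(values):
    """Return (mean, MAD, s) using the n - 1 divisor of eqs. 3.8 and 3.10."""
    n = len(values)
    mean = sum(values) / n
    # eq. 3.8: sum of absolute deviations from the mean over n - 1
    mad = sum(abs(x - mean) for x in values) / (n - 1)
    # eq. 3.10: square root of the sum of squared deviations over n - 1
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, mad, sd

mean, mad, sd = mad_and_sd([400, 550, 520, 600, 650])
print(mean, mad, round(sd, 2))
```

Running it on the five book prices gives the mean, the mean absolute deviation, and the standard deviation to compare against the tabulation.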

Example 2: Consider the raw data below (Chapter two) and determine the
measures of variability.

153 144 166 147 135 148
142 152 161 156 133 123
170 143 152 137 151 155
154 134 147 163 157 135
125 138 185 143 145 155
175 158 166 154 129 173
180 153 147 164 179 128

Construct first the frequency distribution:

Class Interval     f     Class Mark
123-131            4        127
132-140            6        136
141-149            9        145
150-158           12        154
159-167            5        163
168-176            3        172
177-185            3        181
Total             42
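
The tally behind this frequency distribution can be sketched in Python. The sketch assumes the class layout shown above (seven classes of width 9 starting at 123); the function name `frequency_table` is hypothetical.

```python
raw = [
    153, 144, 166, 147, 135, 148,
    142, 152, 161, 156, 133, 123,
    170, 143, 152, 137, 151, 155,
    154, 134, 147, 163, 157, 135,
    125, 138, 185, 143, 145, 155,
    175, 158, 166, 154, 129, 173,
    180, 153, 147, 164, 179, 128,
]

def frequency_table(data, start, width, classes):
    """Return rows of (lower limit, upper limit, frequency, class mark)."""
    table = []
    for k in range(classes):
        lower = start + k * width
        upper = lower + width - 1
        frequency = sum(1 for x in data if lower <= x <= upper)
        class_mark = (lower + upper) / 2  # midpoint of the class limits
        table.append((lower, upper, frequency, class_mark))
    return table

table = frequency_table(raw, start=123, width=9, classes=7)
for lower, upper, frequency, class_mark in table:
    print(f"{lower}-{upper}", frequency, class_mark)
```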

Mean Absolute Deviation

       A               B     C         D       E         F          G
1   Class Interval     f   C.M.(x)    fx       x̄      |x − x̄|   f|x − x̄|
2   123-131            4    127       508    151.21    24.21      96.86
3   132-140            6    136       816    151.21    15.21      91.29
4   141-149            9    145      1305    151.21     6.21      55.93
5   150-158           12    154      1848    151.21     2.79      33.43
6   159-167            5    163       815    151.21    11.79      58.93
7   168-176            3    172       516    151.21    20.79      62.36
8   177-185            3    181       543    151.21    29.79      89.36
9   Σ                 42             6351                        488.14
10  x̄             151.21
11  MAD            11.91

Transfer the frequency distribution to the EXCEL program and enter a
formula in each computed cell. The only given values in the table are those
in columns A, B, and C, which come from the frequency distribution. The
following procedure gives the values of the remaining columns.

1. The values in column D are obtained by multiplying the B-values (f)
   by the C-values (x): D2 = B2*C2, D3 = B3*C3, and so on.
2. Row 9 holds the column totals, which can be found using the
   ΣAutoSum icon. For example, D9 = SUM(D2:D8).
3. The mean (A10) is computed by dividing D9 by B9. Hence,
   A10 = D9/B9. This value fills column E.
4. The values in column F are the absolute differences of columns C and
   E: F2 = ABS(C2-E2), F3 = ABS(C3-E3), and so on.
5. The values in column G are the products of columns B and F: G2 =
   B2*F2, G3 = B3*F3, and so on.
6. The Mean Absolute Deviation (A11) is the quotient of G9 divided
   by B9-1: A11 = G9/(B9-1).

In solving for the other measures of variation, the same procedure is
followed. The headings in Row 1 indicate the formula to be entered in each
column.

By Analytical Computation for Grouped Data:

Mean Absolute Deviation

$MAD = \frac{\sum f\,|x - \bar{x}|}{n - 1}$

$MAD = \frac{488.14}{41}$

MAD = 11.91    Answer

Standard Deviation

       A               B    C       D       E        F         G          H
1   Class Interval     f   C.M.    fx       x̄     (x − x̄)   (x − x̄)²   f(x − x̄)²
2   123-131            4   127     508    151.21   -24.21    586.33     2345.33
3   132-140            6   136     816    151.21   -15.21    231.47     1388.85
4   141-149            9   145    1305    151.21    -6.21     38.62      347.56
5   150-158           12   154    1848    151.21     2.79      7.76       93.12
6   159-167            5   163     815    151.21    11.79    138.90      694.52
7   168-176            3   172     516    151.21    20.79    432.05     1296.14
8   177-185            3   181     543    151.21    29.79    887.19     2661.57
9   Σ                 42          6351                       2322.32    8827.07
10  x̄             151.21
11  SD             14.67

By Analytical Computation for Grouped Data:

Standard Deviation

$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$

$s = \sqrt{\frac{8{,}827.07}{41}}$

s = 14.67    Answer
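
The grouped-data formulas (eqs. 3.9 and 3.11) can be sketched in Python as a check on the two spreadsheets above. This is an illustration only, using the class marks and frequencies from the frequency distribution and the text's n − 1 divisor; the function name `grouped_mad_sd` is hypothetical.

```python
import math

def grouped_mad_sd(class_marks, freqs):
    """Return (mean, MAD, s) for grouped data, dividing by n - 1."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(class_marks, freqs)) / n
    # eq. 3.9: frequency-weighted absolute deviations over n - 1
    mad = sum(f * abs(x - mean) for x, f in zip(class_marks, freqs)) / (n - 1)
    # eq. 3.11: frequency-weighted squared deviations over n - 1
    sd = math.sqrt(
        sum(f * (x - mean) ** 2 for x, f in zip(class_marks, freqs)) / (n - 1)
    )
    return mean, mad, sd

class_marks = [127, 136, 145, 154, 163, 172, 181]
freqs = [4, 6, 9, 12, 5, 3, 3]
g_mean, g_mad, g_sd = grouped_mad_sd(class_marks, freqs)
print(round(g_mean, 2), round(g_mad, 2), round(g_sd, 2))
```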

Name: ________________________ Course: ________
Classroom Activity No.3.2 Section: ________
1. Determine the three forms of central tendency and the non-central measures
regarding the observation below.
148 253 268 372 387 493 408 513 528 633 648
753 768 873 888 491 406 511 526 631 646 751
766 472 487 592 507 612 627 732 747 453 468
573 588 693 608 713 728 534 549 654 669 576
681 696 701 517 622 637 547 552 564 576 588
690 503 519 621 535 644 556 666 374 584 794
804 712 820 835 641 556 761 876 683 699 501
517 422 235 647 758 869 374 485 293 505 610
625 730 745 850 865 973 187 292 207 312 327
432 447 552 567 672 687 792 707 812 827 932

Name: ________________________ Course: ________
Homework No.3.2 Section: ________

1. Find the measures of variability for a student who gathered the following
examination grades: 90, 87, 92, 85, 88, 95, and 78.

2. Find the measures of variability of the following numbers:
40, 50, 60, 35, 65, 28, 31, 55, 61, 49, 34, 42, 38, 47, 55, 46.

3. An achievement test in economics contained 30 questions. The distribution
below summarizes the result of the test. Find the measures of variability.
Number of Answers Frequency
1–3 2
4–6 8
7–9 16
10 – 12 26
13 – 15 38
16 – 18 42
19 – 21 36
22 – 24 24
25 – 27 6
28 – 30 2

4. Given the following raw data, determine the measures of variability.
57 42 25 67 78 89 34 45 23 55 60 65
70 75 80 85 98 15 22 27 32 37 42 47
52 57 62 67 72 77 82 87 92 18 23 28
32 37 43 48 53 58 63 68 73 78 83 88
41 46 51 56 61 66 71 76 42 47 52 57
62 67 72 77 43 48 53 58 63 68 73 78
54 59 64 69 56 61 66 71 57 62 67 57
52 54 56 58 60 53 59 61 55 64 56 66
34 54 74 84 51 56 61 62 67 72 77 36

CHAPTER 4

BASIC & DISCRETE PROBABILITY

TOPIC LESSON
1. Basic Concepts of Probability
2. Conditional Probability
3. Bayes’ Theorem
4. Discrete and Continuous Random Variable Probability Distribution
5. Binomial Distribution
6. Poisson Distribution
7. Applications of Probability

OBJECTIVES
For the students to:

1. Learn the basic concepts and conditional probability.
2. Study Bayes’ theorem.
3. Distinguish basic probability from discrete probability.
4. Determine the probable values of a random variable.
5. Demonstrate a probability distribution for a discrete random variable.
6. Evaluate the binomial and Poisson distributions.
7. Know the applications of probability.

The Basic Concepts of Probability

Probability is the chance that a particular event will occur. It may refer
to the chance of drawing a red card from a deck of cards, or the chance of
selecting one item over another. It may also refer to the chance of a new
product succeeding in the market against the old ones. In any of these,
probability is expressed as a portion or fraction of a whole, so its value
lies between zero and unity. An event with zero probability has no chance
of occurring.

The aforesaid examples refer to the a priori or classical method. Prior
knowledge of the process is the basis of the probability of success of any
event. In tossing a coin, there are only two possible outcomes, so each
outcome has a probability of 0.50. In throwing a die, there are six
possible outcomes. Therefore, each outcome has a probability of 1/6 of
occurring. Hence,

Single Event:

$P = \frac{H}{H + F}$    eq. 4.1

$Q = \frac{F}{H + F}$    eq. 4.1a

$N = H + F$    eq. 4.1b

Where:
P = Probability of Occurrence
Q = Probability of Failure
H = Number of Outcomes that the event will happen
F = Number of Outcomes that the event will fail
N = Total Number of Possible Outcomes

If a coin is tossed in more than one trial, we no longer rely on the prior
assumption of equally likely favourable and failure outcomes. The
probability is now taken as the number of successful outcomes over the total
number of trials. This method is called empirical classical probability.

Empirical Method

$P = \frac{O_s}{N_t}$    eq. 4.2

Where: P – Probability of Success
$O_s$ − Total Number of Successful Outcomes
$N_t$ − Total Number of Trials or Cases

We can consider this as a case of multiple events, whether dependent or
independent. Multiple events are dependent when the occurrence of one
affects the probability of occurrence of the other, and independent when
the occurrence of one does not affect the probability of occurrence of the
other.

Multiple Events

$P = P_1 \times P_2 \times P_3 \times \dots \times P_n$    eq. 4.3

For dependent events, each factor after the first is the probability of that
event given that the preceding events have occurred.

Mutually Exclusive Events

Mutually exclusive events are two or more events of which no more than one
can happen in a single trial. The probability that any one of them occurs is
the sum of their individual probabilities.

$P = P_1 + P_2 + \dots + P_n$    eq. 4.4
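
The multiplication rule (eq. 4.3) and the addition rule (eq. 4.4) can be illustrated with a short Python sketch. The function names are hypothetical, the coin and die examples are chosen only for illustration, and the multiplication rule is applied here to independent events.

```python
from fractions import Fraction

def joint_probability(probabilities):
    """eq. 4.3: probability that all of several independent events occur."""
    result = Fraction(1)
    for p in probabilities:
        result *= p
    return result

def either_probability(probabilities):
    """eq. 4.4: probability that one of several mutually exclusive events occurs."""
    return sum(probabilities, Fraction(0))

# Two heads in two tosses of a fair coin (independent events).
two_heads = joint_probability([Fraction(1, 2), Fraction(1, 2)])
# A 1 or a 2 in one throw of a fair die (mutually exclusive events).
one_or_two = either_probability([Fraction(1, 6), Fraction(1, 6)])
print(two_heads, one_or_two)
```

Exact fractions are used so that the results read as 1/4 and 1/3 rather than rounded decimals.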

Repeated Trials

Most of the time, an experiment is repeated over several trials in order to
record the occurrence of a desired event. Permutations and combinations are
part of analysing repeated trials. The permutation of n different things
taken r at a time is $P(n,r) = \frac{n!}{(n-r)!}$, while the combination of
n different things taken r at a time is $C(n,r) = \frac{n!}{(n-r)!\,r!}$.

n Factorial: $n! = n \times (n-1) \times \dots \times 3 \times 2 \times 1$

Hence, the occurrence of such an event exactly r times in n trials has a
probability of:

Repeated Trials

$P(n,r) = C(n,r)\, p^{r} q^{n-r}$    eq. 4.5

$C(n,r) = \frac{n!}{(n-r)!\,r!}$    eq. 4.6

Where: P(n,r) − Probability of exactly r occurrences in n trials (in eq. 4.5)
C(n,r) − Combination of n things taken r at a time
p − Probability of the occurrence of the event
q − Probability of the failure of the event
n − Number of trials
r − Number of occurrences
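
A minimal Python sketch of the repeated-trials formula follows. The function name `repeated_trials_probability` is hypothetical, and the coin example (exactly 2 heads in 5 tosses) is an assumed illustration, not taken from the text.

```python
import math

def repeated_trials_probability(n, r, p):
    """Probability of exactly r occurrences in n independent trials (eq. 4.5)."""
    q = 1 - p  # probability of failure in a single trial
    return math.comb(n, r) * p ** r * q ** (n - r)  # C(n, r) as in eq. 4.6

# Exactly 2 heads in 5 tosses of a fair coin.
p_two_heads = repeated_trials_probability(5, 2, 0.5)
# The probabilities for r = 0, 1, ..., n should add up to one.
total = sum(repeated_trials_probability(5, r, 0.5) for r in range(6))
print(p_two_heads, total)
```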

The first two methods (single-event and empirical) deal with objective
probability, whereas a third method determines the probability from
whatever data are available or from the belief that an event will or will
not occur. This method is called subjective probability. A concrete example
is the view of a manufacturer against the view of a seller on the success
of one particular product. Personal opinion and analysis of the definite
conditions of a particular event are the basis of a subjective probability.

Sample space (Ss)

The basic components of the theory of probability are the outcomes of a
particular study. Any possible occurrence in the process is called an event.
The collection of all the possible events is referred to as the sample
space, for example the total set of outcomes when throwing a die or dice.

Contingency Table

The simplest way of showing a defined sample space is by way of a
contingency table. The values in the cells of the contingency table are the
possible outcomes that can be obtained in any trial to be performed, or the
possible outcomes in a survey study. (See Table 4.1)

Example: Tabulate the probable outcomes of throwing two fair dice.

Table 4.1
                          Die 2
              1     2     3     4     5     6
         1   1 1   1 2   1 3   1 4   1 5   1 6
         2   2 1   2 2   2 3   2 4   2 5   2 6
Die 1    3   3 1   3 2   3 3   3 4   3 5   3 6
         4   4 1   4 2   4 3   4 4   4 5   4 6
         5   5 1   5 2   5 3   5 4   5 5   5 6
         6   6 1   6 2   6 3   6 4   6 5   6 6

The rows are the possible outcomes of die 1 while the columns are the
possible outcomes of die 2. Therefore, in throwing two fair dice, there are
36 possible outcomes in the sample space.

Probability is computed as:

$P = \frac{C_n}{S_s}$    eq. 4.7

Where: P – Probability
$C_n$ − Desired Number of Combinations
$S_s$ − Sample Space
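
Eq. 4.7 can be illustrated by enumerating the sample space of two dice in Python. This is a sketch only: the helper name `probability` and the chosen event (a total of 7) are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (die 1, die 2): the 36 cells of the contingency table.
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """eq. 4.7: favourable outcomes divided by the size of the sample space."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))

# Probability that the two dice add up to 7.
p_sum_seven = probability(lambda outcome: outcome[0] + outcome[1] == 7)
print(len(sample_space), p_sum_seven)
```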

Discrete Probability

When the outcome of a random experiment or study is determined by counting,
such as the number of times a head turns up when a coin is tossed five
times, the number of dots showing when a pair of fair dice is thrown, or the
number of kings drawn from a deck of cards, the resulting number is called a
random variable. It is denoted by x, y, z, … or any other accepted symbol.

If there is a definite gap between any permissible value of a random
variable and the next permissible value in a particular range, the random
variable is called a discrete random variable. Examples of discrete random
variables are daily production, monthly sales, annual expenditures, etc.

In contrast, a continuous random variable can take any value within a
particular range or interval, including values in between two values of a
discrete random variable. If a and b are two values of a discrete random
variable, a continuous random variable can take values in between a and b,
or even in between any two of its own values. The number of permitted
values is infinite.

Probability Distribution of Discrete Random Variable

The probability distribution of a discrete random variable is a mutually
exclusive listing of all the numerical outcomes of the random variable, with
a specific probability of occurrence attached to each outcome. For example,
the table below shows the distribution of housing units constructed per
schedule per month by a realty development company. For every batch of
units started in a month, a probability is given, covering the entire year
of operations. The monthly probabilities are based on the previous years'
history of the realty development company.

Table for Monthly Probability Distribution of Houses

Month    Number of Houses Constructed    Probability
1                  200                      0.06
2                  250                      0.07
3                  250                      0.10
4                  350                      0.11
5                  400                      0.12
6                  400                      0.10
7                  300                      0.10
8                  250                      0.06
9                  200                      0.05
10                 300                      0.08
11                 350                      0.09
12                 250                      0.06

To summarize a discrete probability distribution of a random variable, we
have to determine the major properties of the distribution as discussed in
chapter 3 of this work textbook. The two major properties are the mean (μ)
and the standard deviation (σ). Notice also that the sum of the
probabilities is equal to one (1), or unity.

The Histogram (By EXCEL Program)

[Figure: Monthly Probability Distribution. Horizontal axis: month (1 to 12);
vertical axis: 0 to 600; series: Number of Housing Units, Probability.]

Value of Discrete Random Variable

The weighted average, or mean μ, of all possible outcomes is the expected
value of the discrete random variable, wherein the weights are the
probabilities associated with each of the outcomes. In the given example,
the measures of central tendency and variation are computed as follows:

Formula: $\mu = \sum_{i=1}^{n} x_i P_i$    eq. 4.9

$\sigma = \sqrt{\sum_{i=1}^{n} (x_i - \mu)^2 P_i}$    eq. 4.10

Where: μ − Mean
σ − Standard Deviation
x − Outcomes (No. of Houses for Construction)
P − Probability of Occurrence of the Outcomes

Solution: From the formulas, the mean and standard deviation of the discrete
probability distribution are computed below (using EXCEL):

Mo.   Number of Housing Units   Probability    xP       μ      (x − μ)   (x − μ)² P
1              200                 0.06        12      306.5   -106.5     680.535
2              250                 0.07        17.5    306.5    -56.5     223.4575
3              250                 0.10        25      306.5    -56.5     319.225
4              350                 0.11        38.5    306.5     43.5     208.1475
5              400                 0.12        48      306.5     93.5    1049.07
6              400                 0.10        40      306.5     93.5     874.225
7              300                 0.10        30      306.5     -6.5       4.225
8              250                 0.06        15      306.5    -56.5     191.535
9              200                 0.05        10      306.5   -106.5     567.1125
10             300                 0.08        24      306.5     -6.5       3.38
11             350                 0.09        31.5    306.5     43.5     170.3025
12             250                 0.06        15      306.5    -56.5     191.535
Σ                                              306.5                     4482.75
μ     306.5
σ     66.95

By analytical computation:

The Mean μ of the Discrete Probability Distribution is:

$\mu = 200 \times 0.06 + 250 \times 0.07 + \dots + 350 \times 0.09 + 250 \times 0.06$

μ = 306.5 Units of Houses per month    Answer

The Standard Deviation σ is:

$\sigma = \sqrt{(200 - 306.5)^2 \times 0.06 + (250 - 306.5)^2 \times 0.07 + \dots + (250 - 306.5)^2 \times 0.06}$

$\sigma = \sqrt{4{,}482.75}$

σ = 66.95    Answer
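
Eqs. 4.9 and 4.10 can be checked with a few lines of Python. This is a sketch under the table's values; the function name `discrete_mean_sd` is hypothetical.

```python
import math

def discrete_mean_sd(outcomes, probabilities):
    """Return (mu, sigma) of a discrete distribution (eqs. 4.9 and 4.10)."""
    mu = sum(x * p for x, p in zip(outcomes, probabilities))
    variance = sum((x - mu) ** 2 * p for x, p in zip(outcomes, probabilities))
    return mu, math.sqrt(variance)

# Monthly number of houses and the associated probabilities from the table.
houses = [200, 250, 250, 350, 400, 400, 300, 250, 200, 300, 350, 250]
probs = [0.06, 0.07, 0.10, 0.11, 0.12, 0.10, 0.10, 0.06, 0.05, 0.08, 0.09, 0.06]

mu, sigma = discrete_mean_sd(houses, probs)
print(round(sum(probs), 2), round(mu, 1), round(sigma, 2))
```

The first printed value confirms that the probabilities add up to one before the mean and standard deviation are read.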

Example: Based on the contingency table, if a pair of fair dice is thrown,
there are 36 possible pairs of outcomes. Construct the probability
distribution.
Solution: Tabulate the 36 possible outcomes of throwing two fair dice.

                          Die 2
              1     2     3     4     5     6
         1   1 1   1 2   1 3   1 4   1 5   1 6
         2   2 1   2 2   2 3   2 4   2 5   2 6
Die 1    3   3 1   3 2   3 3   3 4   3 5   3 6
         4   4 1   4 2   4 3   4 4   4 5   4 6
         5   5 1   5 2   5 3   5 4   5 5   5 6
         6   6 1   6 2   6 3   6 4   6 5   6 6

Let x be the random variable on the sample space S of 36 outcomes that
assigns to each outcome the larger of the two numbers shown, with the set of
values:

$x(S) = \{x_1, x_2, \dots, x_n\}$

Then x(S) can be converted into a probability space wherein the probability
of x1 is given by the function f(x1), that of x2 by f(x2), …, and that of
xn by f(xn). This function is called the probability function.

x1      x2      x3      x4      x5      x6
f(x1)   f(x2)   f(x3)   f(x4)   f(x5)   f(x6)

Going back to the table, there are six groups of coloured cells representing
the possible outcomes:
1. f(x1) − coloured yellow
2. f(x2) − coloured green
3. f(x3) − coloured turquoise
4. f(x4) − coloured pink
5. f(x5) − coloured blue
6. f(x6) − coloured red

The summation of all the probabilities should be equal to 1, and no
probability may be negative:

$\sum f(x_i) = 1$

$f(x_i) \geq 0$

The probability distribution of f(x) is determined as follows:

f(1) : P(x = 1) = P(1, 1) = 1/36

f(2) : P(x = 2) = P(2, 1) + P(2, 2) + P(1, 2) = 3/36

f(3) : P(x = 3) = P(3, 1) + P(3, 2) + P(3, 3) + P(1, 3) + P(2, 3) = 5/36

f(4) : P(x = 4) = P(4, 1) + P(4, 2) + P(4, 3) + P(4, 4) + P(1, 4)
       + P(2, 4) + P(3, 4) = 7/36

f(5) : P(x = 5) = P(5, 1) + P(5, 2) + P(5, 3) + P(5, 4) + P(5, 5)
       + P(1, 5) + P(2, 5) + P(3, 5) + P(4, 5) = 9/36

f(6) : P(x = 6) = P(6, 1) + P(6, 2) + P(6, 3) + P(6, 4) + P(6, 5)
       + P(6, 6) + P(1, 6) + P(2, 6) + P(3, 6) + P(4, 6) + P(5, 6) = 11/36

Tabulating the results, we have:

x_i        1       2       3       4       5       6
f(x_i)    1/36    3/36    5/36    7/36    9/36    11/36

Computing the major properties of the discrete probability distribution, we
have the table below. (Using the EXCEL Program)

x      P       xP      μ      x − μ    (x − μ)² P
1     0.03    0.03    4.47    -3.47      0.33
2     0.08    0.17    4.47    -2.47      0.51
3     0.14    0.42    4.47    -1.47      0.30
4     0.19    0.78    4.47    -0.47      0.04
5     0.25    1.25    4.47     0.53      0.07
6     0.31    1.83    4.47     1.53      0.71
Σ     1.00    4.47                       1.97
μ     4.47
σ     1.40

By analytical computation:

The Mean μ of the Discrete Probability Distribution is:

$\mu = 1 \times 0.03 + 2 \times 0.08 + \dots + 5 \times 0.25 + 6 \times 0.31$

μ = 4.47    Answer

The Standard Deviation σ is:

$\sigma = \sqrt{(1 - 4.47)^2 \times 0.03 + (2 - 4.47)^2 \times 0.08 + \dots + (6 - 4.47)^2 \times 0.31}$

$\sigma = \sqrt{1.97}$

σ = 1.40    Answer

Hence, the major properties of the discrete probability distribution
regarding the outcomes when throwing a pair of fair dice are:

Mean: μ = 4.47
Standard Deviation: σ = 1.40
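
The whole dice example can be reproduced by enumeration in Python. The sketch assumes, as the listed outcomes indicate, that x is the larger of the two numbers shown; exact fractions are used so that no rounding enters before the final square root.

```python
import math
from fractions import Fraction
from itertools import product

# Build f(x) by visiting the 36 equally likely outcomes of two dice.
pmf = {}
for die1, die2 in product(range(1, 7), repeat=2):
    x = max(die1, die2)  # assumed definition of the random variable
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 36)

mu = sum(x * p for x, p in pmf.items())                    # eq. 4.9
variance = sum((x - mu) ** 2 * p for x, p in pmf.items())
sigma = math.sqrt(variance)                                # eq. 4.10

for x in sorted(pmf):
    print(x, pmf[x])
print(float(mu), round(sigma, 2))
```

Note that `Fraction` prints values in lowest terms, so 3/36 appears as 1/12 and 9/36 as 1/4.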

Name: ________________________ Course: ________
Classroom Activity No. 4.1 Section: ________

1. In a production process, the defective rate is 10%. If 15 items are
selected at random, what is the probability that 2 of them are defective?

2. 50% of the riding public at LRT and MRT do not have the correct change.
If 250 passengers ride at the terminal every 5 minutes, what is the
probability that 100 passengers have the correct change?

3. A random sample of 5 students is selected from a group of students
composed of 200 first-year and 100 second-year students. Determine the
probability that exactly two of them are second-year students.

4. The distribution of finished and completed housing units per schedule per
month of a realty development company is shown in the table. If there were
5,500 units completed in a year, what is the population standard deviation?
Table for Monthly Probability Distribution of Houses

Month Probability
1 0.08
2 0.09
3 0.10
4 0.12
5 0.13
6 0.09
7 0.08
8 0.06
9 0.05
10 0.07
11 0.08
12 0.05

Name: ________________________ Course: ________
Homework No. 4.1 Section: ________

1. The probability of a defective item in the production of ball pens is 0.005.
What is the probability that 150 will be proven to be defective in the
production of 20,000 pieces?

2. 1.00% of a certain product is defective in the long run. What is the
probability that there is one and only one defective item in a random
selection of 100 items?

3. The U.S. forces in Iraq used a missile that hit the target with a
probability of 0.20. How many missiles should be fired so that there is at
least a 75% probability of hitting the target?

4. Michael Jordan sinks 80% of his free throw attempts. What is the probability
that he will make exactly 7 of his next 10 attempts?

Chapter 5
Normal Distribution

TOPIC LESSON
1. Normal Random Variable
2. Normal Curve
3. Regions of Normal Curve
4. Probabilities and Percentiles using Normal Curve
5. Sampling Distribution of the Mean

OBJECTIVES
For the students to:
1. Illustrate a normal random variable and its characteristics.
2. Know how to construct a normal curve.
3. Identify regions under the normal curve corresponding to standard
   normal values.
4. Convert a normal random variable to a standard normal variable.
5. Compute probabilities and percentiles using areas under the normal
   curve.
6. Evaluate the sampling distribution of the mean.

One of the vitally important distributions in statistics is the Normal
Distribution. There are three reasons behind its importance:
1. It is a tool for approximating various discrete probability
   distributions.
2. Numerous continuous phenomena follow it, at least approximately.
3. It provides the basis for inferential statistics because of its relation
   to the central limit theorem. The Central Limit Theorem: for a
   sufficiently large sample size n, the sample mean x̄ approximately
   follows a normal distribution with mean equal to the population mean µ
   and standard deviation equal to σ/√n, where σ is the population standard
   deviation.
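
The central limit theorem can be illustrated by simulation. The following Python sketch is an assumed example, not taken from the text: it draws repeated samples of 36 throws of a fair die and compares the mean and standard deviation of the sample means with µ and σ/√n. The seed, sample size, and number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so that the sketch is repeatable

SAMPLE_SIZE = 36     # n, the number of throws in each sample
REPETITIONS = 2000   # number of samples drawn

def one_sample_mean(sample_size):
    """Mean of one sample of throws of a fair die."""
    rolls = [random.randint(1, 6) for _ in range(sample_size)]
    return statistics.mean(rolls)

sample_means = [one_sample_mean(SAMPLE_SIZE) for _ in range(REPETITIONS)]

population_mean = 3.5                  # mean of a fair die
population_sd = math.sqrt(35 / 12)     # standard deviation of a fair die
expected_sd_of_mean = population_sd / math.sqrt(SAMPLE_SIZE)

print(round(statistics.mean(sample_means), 3), population_mean)
print(round(statistics.stdev(sample_means), 3), round(expected_sd_of_mean, 3))
```

A histogram of `sample_means` would show the bell shape even though a single throw of a die follows a flat distribution.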

The normal distribution is the focus in the study of statistics.
Statistical problems of various types can be solved by way of the normal
distribution. The distributions of many variables, such as students’
grades, people’s weights or heights, family income, or persons’ IQs, usually
approximate a normal distribution.

The normal distribution is represented by a normal curve. Let us now
discuss the properties of a normal curve. A normal curve, which is a
bell-shaped figure, has the following properties:

Properties of Normal Curve

1. It is bell-shaped and thus symmetrical about the mean μ.
2. Its measures of central tendency (mean, median, mode, midrange) are
   equal.
3. The tails are asymptotic to the horizontal axis.
4. The areas to the left and to the right of the mean are each equal to
   0.50 or 50%, so that the total area under the normal curve is equal to
   one or 100%.
5. Its related random variable has an infinite range (−∞ < x < +∞).
6. There should be at least three standard scores on each side of the mean,
   to the left and to the right.
7. The distance between two standard scores is measured by the standard
   deviation.
8. Its middle spread (interquartile range) is approximately 1.33 standard
   deviations, which means the quartile deviation spans about two-thirds of
   a standard deviation below and above the mean.

Skewed distributions:

1. Skewed to the right – A distribution having a tail that is longer on the
   right end.
2. Skewed to the left – A distribution having a tail that is longer on the
   left end.

Examples of skewed distributions are:


Marriage and age
Mortality age of some diseases
Industrial measurements
Biological measurements

Illustration diagram of age at marriage:

[Figure: distribution of age at marriage; horizontal axis: age from 15 to 70.]

The great majority of persons shown in the diagram get married between the
ages of 18 and 35. The age of 25 has the highest frequency; it is the age
at which most people get married. Fewer persons marry at a much later age.
This distribution is skewed to the right.

The index of skewness introduced by Pearson is based on the measures of
central tendency:

$index = \frac{3(\mu - m)}{\sigma}$    Eq. 5.1

where μ is the mean, m is the median, and σ is the standard deviation.

If the index lies between −1 and +1, the skewness is not marked. If the
index is > +1, the distribution is skewed to the right, or positively
skewed. If the index is < −1, the distribution is skewed to the left, or
negatively skewed. In any case other than no marked skewness, the best
measure of central tendency is the median, and the interquartile range is
the best measure of variability.

Areas under normal curve:

The properties of the normal curve can be used in solving various types of
statistical problems. The first step is to learn how to use the table of
areas under the normal curve.

The values in the table are anchored on the values of the standard score
(z), given by a formula called the Transformation Formula:

$z = \frac{x - \mu}{\sigma}$    Eq. 5.2

Where: z = Standard Score
x = Value of the Variable
μ = Distribution Mean
σ = Standard Deviation

The formula for standard scores converts the difference between any value
of the variable (x) and the distribution mean (μ) into a number of standard
deviations (σ). As explained in chapter 3, the standard deviation measures
the spread of the data about the mean.

Example: In a given normal distribution, μ is 120 and σ is 15. What are the
value of the standard score (z) and its corresponding area under
the normal curve for each of the following values: a) x1 = 90,
b) x2 = 100, c) x3 = 145, and d) x4 = 160?

Solution: Substitute the given values of the variable, mean, and standard
deviation into the formula.

a) For x1 = 90:

$z = \frac{x - \mu}{\sigma} = \frac{90 - 120}{15}$

z = −2.0    Answer

The area corresponding to this z-value is:

Area = 0.4772    Answer

The meaning of z = −2 is that 90 lies 2 standard deviations to the left of
the mean, 120. The area is still +0.4772 although the z-value is negative:
the values from 90 to 120 have a probability p = 0.4772 and lie to the left
of the mean. (Note that a negative value of z tells us that the variable is
to the left of the mean.)
b) For x2 = 100:

$z = \frac{x - \mu}{\sigma} = \frac{100 - 120}{15}$

z = −1.33    Answer

To determine the area between this z-value and the mean of the
distribution, we have to refer to the table of areas under the normal curve
in Appendix A of this workbook.

The area corresponding to this z-value is:

Area = 0.4082    Answer

c) For x3 = 145:

$z = \frac{x - \mu}{\sigma} = \frac{145 - 120}{15}$

z = +1.67    Answer

The area corresponding to this z-value is:

Area = 0.4525    Answer

The meaning of z = +1.67 is that 145 lies 1.67 standard deviations to the
right of the mean, 120. (Note that a positive value of z tells us that the
variable is to the right of the mean.)

d) For x4 = 160:

$z = \frac{x - \mu}{\sigma} = \frac{160 - 120}{15}$

z = +2.67    Answer

The area corresponding to this z-value is:

Area = 0.4962    Answer
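
The transformation formula and the table lookup can be sketched in Python. The standard library's `statistics.NormalDist` is used here in place of the printed table of areas; the function names `standard_score` and `area_from_mean` are hypothetical, and z is rounded to two decimals to mimic reading the table.

```python
from statistics import NormalDist

def standard_score(x, mu, sigma):
    """Eq. 5.2: number of standard deviations between x and the mean."""
    return (x - mu) / sigma

def area_from_mean(z):
    """Area under the standard normal curve between the mean and z."""
    # The sign of z is dropped because the curve is symmetrical.
    return NormalDist().cdf(abs(z)) - 0.5

for x in (90, 100, 145, 160):
    z = round(standard_score(x, mu=120, sigma=15), 2)
    print(x, z, round(area_from_mean(z), 4))
```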

Mathematicians and statisticians have developed statistical tables giving
the areas under the standardized normal curve for different standard scores.

The table shows only the right portion of the area under the normal curve,
that is, half of the curve. Because the two halves of the normal curve are
symmetrical, it is not necessary to give the areas for the left half.
Disregard the negative (−) sign of the standard score when reading the
table, since the sign only tells us the location of the variable.

The normal curve of any normal distribution comes from the histogram drawn
from the frequencies of the class marks of every class interval or category.
A sample of this is shown below.

[Figure 5.1: Graph of Normal Curve. Relative frequency histogram with the
normal curve centred at μ.]

Application of Normal Distribution

The normal distribution can be a tool in determining the values and
probabilities of some undefined intervals in a normal random variable
distribution. With the aid of the table of areas under the normal curve, we
can find the probable values of random variables and the probabilities of
some intervals in the distribution.

Example: Draw the normal curve of the normal distribution taken from
the frequencies of the class marks of the incomes of employees
of a certain company as shown in Table 5.1. The incomes
specified in the column of class marks are in thousands of
pesos. Determine also the probabilities and the numbers of
employees having an income in each of the following ranges:
a) 27k and below
b) 27k to 36k
c) 36k to 54k

Table 5.1
Income of the Employees of ABC Co., Ltd.

C. Intervals    Frequencies    C. Marks
1                     2           21
2                     7           25
3                    38           29
4                   170           33
5                   420           37
6                   850           41
7                 1,400           45
8                   850           49
9                   420           53
10                  170           57
11                   38           61
12                    7           65
13                    2           69

Solution: With the help of the EXCEL program, the chart below was made
using the frequencies of the class intervals. Connecting the
midpoints of the tops of the bars in the histogram, we can now
draw the normal curve.

[Figure 5.2: Histogram of frequency (vertical axis, 0 to 1600) against class
interval (horizontal axis, 1 to 13).]
Using the EXCEL program, we can find the value of the standard
deviation as shown in the tabulation. (Table 5.2)

C.I.      f     C.M. (x)    μ     (x − μ)   (x − μ)²   f(x − μ)²
1          2       21       45     -24        576        1152
2          7       25       45     -20        400        2800
3         38       29       45     -16        256        9728
4        170       33       45     -12        144       24480
5        420       37       45      -8         64       26880
6        850       41       45      -4         16       13600
7       1400       45       45       0          0           0
8        850       49       45       4         16       13600
9        420       53       45       8         64       26880
10       170       57       45      12        144       24480
11        38       61       45      16        256        9728
12         7       65       45      20        400        2800
13         2       69       45      24        576        1152
Σ       4374                                           157280
σ       6.00

The standard deviation as computed is equal to 6.0. Hence, the
probabilities and the numbers of employees in each interval are:

a) 27k and below

$z_{27} = \frac{27 - 45}{6.0} = -3.00$

From the table, the area for z = −3.00 is 0.4987. Therefore, the area
covered by the interval of 27k and below is:
Area = 0.50 – 0.4987 = 0.0013
Probability p = Area = 0.0013    Answer
Number of employees (N):
N = pn = 0.0013 × 4374 ≈ 6    Answer

b) 27k to 36k

$z_{36} = \frac{36 - 45}{6.0} = -1.50$

From the table, the area for z = −1.50 is 0.4332. Therefore, the area
covered by the interval of 27k – 36k is:
Area = 0.4987 – 0.4332 = 0.0655
Probability p = Area = 0.0655    Answer
Number of employees (N):
N = pn = 0.0655 × 4374 ≈ 286    Answer

c) 36k to 54k

$z_{54} = \frac{54 - 45}{6.0} = +1.50$

From the table, the area for z = 1.50 is 0.4332. Therefore, the area
covered by the interval of 36k – 54k is:
Area = 0.4332 + 0.4332 = 0.8664
Probability p = 0.8664    Answer
Number of employees (N):
N = pn = 0.8664 × 4374 ≈ 3,790    Answer
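
The income example can be cross-checked in Python. This is a sketch under two assumptions: the standard deviation is computed with the total frequency n as divisor and then rounded to 6.0 as in the text, and `statistics.NormalDist` stands in for the printed table of areas, so the results may differ from the table-based answers in the last decimal place.

```python
import math
from statistics import NormalDist

class_marks = [21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61, 65, 69]
freqs = [2, 7, 38, 170, 420, 850, 1400, 850, 420, 170, 38, 7, 2]

n = sum(freqs)
mu = sum(f * x for x, f in zip(class_marks, freqs)) / n
sigma = math.sqrt(sum(f * (x - mu) ** 2 for x, f in zip(class_marks, freqs)) / n)

# The text rounds the standard deviation to 6.0 before computing z.
incomes = NormalDist(mu, round(sigma, 1))

p_below_27 = incomes.cdf(27)
p_27_to_36 = incomes.cdf(36) - incomes.cdf(27)
p_36_to_54 = incomes.cdf(54) - incomes.cdf(36)

for label, p in [("27k and below", p_below_27),
                 ("27k to 36k", p_27_to_36),
                 ("36k to 54k", p_36_to_54)]:
    print(label, round(p, 4), round(p * n))
```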
Name: ________________________ Course: ________
Classroom Activity No.5.1 Section: ________

Draw a diagram of the normal curve and shade the area of the following
intervals. Determine also the area covered by the intervals.
a. From z = -3.05 to z = - 1.52

b. From z = +1.04 to z = +1.59

c. From z = -0.73 to z = +2.17

d. Right of z = +1.03

e. Right of z = -0.47

f. Left of z = -1.12

g. Left of z = +1.85

Name: ________________________ Course: ________
Homework No.5.1 Section: ________

Evaluate the following problems


1. Show diagrams and shade the required area.
a. Find the area under normal curve from z = 0 to z = +0.87

b. From z = -1 25 to z = - 1.88

c. From z = +0.54 to z = +1.58

d. From z = -0.89 to z = +1.07

113
e. Right of z = +1,69

f. Right of z = -1.92

g. Left of z = -0.56

h. Find the probability that z is less than or equal to -0.76 and greater
than or equal to +1.23.

114
2. Find the z score corresponding to the given areas under the normal curve.
(show diagrams and shade given areas)
a. Area to the left of z = 0.6730

b. Area to the left of –z = 0.2345

c. Area to the right of z = 0.3456

d. Area to the right of –z = 0.7654

115
3. In a statistics examination, the mean grade is 80 and the standard
deviation is 5.
a. Find the corresponding scores of two students whose grades are 88
and 68 respectively.

b. Find the grades of two students whose z scores are 0.68 and -1.56
respectively.

c. If there were 120 students that took the examination, how many
students got 75 and below.

d. How many students who got 85 and above.

116
4. Four hundred skilled workers were given an examination to determine
how much they know about their job. If the scores are normally
distributed and the score of one worker measured in z score is 0.75,
a. How many of the workers who took the examination has scored
higher than or equal to this particular worker.

b. If the standard deviation is 4.5 and the lowest score is 60, what is
the probable highest score?

117
The Standardized Normal Distribution

From the previous example illustrated in the application of normal


distribution, we can change the frequency distribution into a relative
frequency distribution to support the standardized normal distribution. The
amounts in the class intervals are in thousands of pesos. The relative
frequency distribution representing the 4,374 incomes of employees still
forms a bell-shape curve called normal curve.

The continuous random variable of incomes follow the Normal


Probability Density Function or Gaussian. The measurement of 4,374
incomes, group in the interval of (43 < 47) thousand pesos and distributed
symmetrically upward and downward to form a bell-shape normal curve.

Table 5.3

Incomes of the Employees of ABC Co., Ltd.


Class
Intervals Relative Frequency
Below 23 2/4374 = 0.00046
23 < 27 7/4374 = 0.00160
27 < 31 38/4374 = 0.00869
31 < 35 170/4374 = 0.03887
35 < 39 420/4374 = 0.09602
39 < 43 850/4374 = 0.19433
43 < 47 1400/4374 = 0.32007
47 < 51 850/4374 = 0.19433
51 < 55 420/4374 = 0.09602
55 < 59 170/4374 = 0.03887
59 < 63 38/4374 = 0.00869
63 < 67 7/4374 = 0.00160
67 < 69 2/4374 = 0.00046
Σ 1.00000

118
The mathematical symbol representing the probability density
function is denoted by f(X). For the normal distribution, the formula is:

𝟏 𝒙−𝝁 𝟐
𝒆 𝟐 𝝈 ]
− [
𝒇(𝑿) = 𝑬𝒒. 𝟓. 𝟑
√𝟐𝝅𝝈

Where: 𝑒 − 𝑀𝑎𝑡ℎ𝑒𝑚𝑎𝑡𝑖𝑐𝑎𝑙 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡 𝑒𝑞𝑢𝑎𝑙 𝑡𝑜 2.71828


𝜋 − 𝐶𝑜𝑛𝑠𝑡𝑎𝑛𝑡 𝑒𝑞𝑢𝑎𝑙 𝑡𝑜 3.1416
𝜇 − 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑚𝑒𝑎𝑛
𝜎 − 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛
𝑥 − 𝐴𝑛𝑦 𝑣𝑎𝑙𝑢𝑒 𝑜𝑓 𝑡ℎ𝑒 𝑐𝑜𝑛𝑡𝑖𝑛𝑢𝑜𝑢𝑠 𝑟𝑎𝑛𝑑𝑜𝑚 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒

𝑥−𝜇
From the transformation formula 𝑧 = , the original data for the
𝜎

random variable x has a mean 𝜇 and standard deviation 𝜎, for the


standardized random variable z of a standardized normal distribution has a
mean 𝜇 of (0) zero and a standard deviation of (1) one. Substituting this
values in the formula, we have:

1 𝑧−0 2
− [ ]
𝑒 2 1
𝑓(𝑍) =
√2𝜋(1)

𝟏 𝟐
− [𝒛]
𝒆 𝟐
𝒇(𝒁) = 𝑬𝒒. 𝟓. 𝟒
√𝟐𝝅

Hence, we can convert normally distributed data into standardized


data and determine any desired probabilities. Let us consider the normal
distribution that was discussed in the early part of this chapter regarding the
incomes of ABC Co., Ltd. employees’ family. The normal distribution has a
mean 𝜇 of 45k and standard deviation 𝜎 of 6.

119
From Figure 5.3, we notice that any values of X has a corresponding
values of standardized measurement Z coming from the transformation
𝑥−𝜇
formula 𝑧 = as discussed in the early part of this chapter.
𝜎

Incomes of ABC Co. Ltd. Employees

x-scale 𝜇 − 4𝜎 𝜇 − 3𝜎 𝜇 − 2𝜎 𝜇 − 1𝜎 𝜇 𝜇 + 1𝜎 𝜇 + 2𝜎 𝜇 + 3𝜎 𝜇 + 4𝜎

𝜇 = 45, 𝜎 = 6 21 27 33 39 45 51 57 63 69

𝑧 − 𝑠𝑐𝑎𝑙𝑒 −4 −3 −2 −1 0 +1 +2 +3 +4

Figure 5.3
Illustrative Example:
Supposing we pick an employee at random and determine the
probability that the income of the employees’ family. What would be the
probability if the income of the employees’ family is between 39k to 45k.

Solution: Referring to Figure 5.3, the income of the employees’ family is


between -1 standard deviation below the mean. From the table
5-A of areas under normal curve, we can extract the area which
is in turn, the probability of the random selection.
From the table 5-A, the area for z = 1 is 0.3413. The area is
taken from z = 1 which is also the area of z = -1 because of the
symmetry property of the normal curve. Therefore p = 0.3413.

120
AREAS OF A NORMAL CURVE
Table 5-A
Z 0 1 2 3 4 5 6 7 8 9
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0754
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2258 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2518 0.2549
0.7 0.2580 0.2612 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2996 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4812 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4965 0.4966 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.1 0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.2 0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995
3.3 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.4 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
3.5 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998
3.6 0.4998 0.4998 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.7 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.8 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.9 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000

121
Name: ________________________ Course: ________
Classroom Activity No.5.2 Section: ________

In the last entrance examination, the mean score 𝜇 is 128 out of 180 items and the
standard deviation 𝜎 is 10.
a. What is the probable lowest score?

b. What is the probable highest score?

c. Find the corresponding standard z-scores of two students whose


examination scores are 135 and 118 respectively.

122
d. Find the examination scores of two students whose z-scores are
0.68 and -1.56 respectively.

e. If there were 1,150 students that took the examination, how many
students who 105 and below.

f. How many students got 95 and above.

123
g. What is the probability that a score of one examinee is between 95
and 105?

h. What is the probability that one examinee got a score of exactly


128?

i. If there were 720 students who passed the examination, what is the
probable lowest passing score?

j. If the lowest passing score is 105, how many passed the


examination?

124
Sampling Distribution of the Mean

Using statistics particularly the sample mean and the sample


proportion to estimate when estimating the corresponding parameters in the
population is a major objective data analysis. When making statistical
inference, we should realized that we are drawing some conclusions about
the entire population and not only for the sample. An example of this is a
political survey intended for a candidate who wishes to run for any position
in an election. A sample result is being used to estimate the proportion of the
electorate.

In actual practice, a set of sample of a predetermined size is chosen


among the population through random sampling of acceptable method. The
computation of size of sample has been discussed in chapter 2 of this work
textbook.

Other Features of Random Sampling

a. Lottery Sampling – This method of random sampling uses


numbers to be assigned to all the members of the population. The
numbers then will be placed in a container or lottery drum and be
rotated or shaken. Then, the needed samples will be picked in the
container drum.
b. Random Numbers – Another method of random sampling
technique is the use of random numbers. The selection of samples
is being done by direct selection if only few samples are required
and by remainder method if many samples are required.

125
Distribution of the Mean

The unbiasedness property of the mean includes the fact that the
population mean 𝜇 is equal to the average all possible sample means of a
possible sample size. We have discussed several measures of central
tendency in the previous chapter. Specifically, we took on the mean as the
most important measure among the measures of central tendency because of
its sensitivity. It is the best measure if the population is found to be normally
distributed. Other properties of the mean include efficiency and consistency.

Illustration: A group of five (5) friends are fond on fish balls. The
following data are the number of pieces of fish balls they
ate in one occasion.

Friends Number of Fish Balls


𝑃𝑒𝑑𝑟𝑜 𝑋1 = 10
𝐽𝑢𝑎𝑛 𝑋2 = 12
𝑃𝑎𝑏𝑙𝑜 𝑋3 = 15
𝐽𝑜𝑠𝑒 𝑋4 = 16
𝑀𝑎𝑟𝑖𝑜 𝑋5 = 17

Population Mean – The population mean 𝜇 is equal to the sum of the


𝑋 values in the population divided by the total number of values 𝑁.

∑𝑁
𝑖=1 𝑋
𝜇= Eq. 5.5
𝑁

Population Standard Deviation – The population standard deviation is


the square root of the summation of the squares of the difference of the
values from the mean divided by the total number of values.

126
∑𝑁
𝑖=1(𝑋𝑖 −𝜇)
2
𝜎=√ Eq. 5.6
𝑁

Using the two equations, we have:

The mean
10 + 12 + 15 + 17 + 16
𝜇= = 14 𝑝𝑖𝑒𝑐𝑒𝑠
5

The Standard Deviation

(10−14)2 +(12−14)2 + (15−14)2 +(17−14)2 +(16−14)2


𝜎=√ = 2.61
5

If two samples are selected without replacement, using the


fundamentals of combination, the following combinations are:

There are now 10 samples of n = 2 friends from N = 5 friends. We


noticed also that the mean 𝜇𝑥̅ of all the sample means is still 14.

Combinations Sample Outcomes ̅


Sample Mean 𝑿
Pedro/Juan 10, 12 11
Pedro/Pablo 10, 15 12.5
Pedro/Jose 10, 17 13.5
Pedro/Mario 10, 16 13
Juan/Pablo 12, 15 13.5
Juan/Jose 12, 17, 14.5
Juan/Mario 12, 16 14
Pablo/Jose 15, 17 16
Pablo/Mario 15, 16 15.5
Jose/Mario 17, 16 16.5
Σ 140
𝝁𝒙̅ 14

127
Standard error of the Mean

A population can comprise of outcomes that can take a wide


difference of values from small to large difference. If one extreme value
make it in a sample, the effect will be reduced because of the effect of other
values included in the sample, nevertheless it will affect the mean.

The standard error of the mean 𝜎𝑥̅ is equal to the standard deviation of
the population 𝜎 over the square root of sample size n.

𝜎
𝜎𝑥̅ = Eq. 5.7
√𝑛

Sampling from Normally Distributed Population

In the simplest case of sampling, if we select samples of size (n = 1),


the possible sample mean is:
𝑛
∑ 𝑋 𝑋
𝑋̅ = 𝑖=1 𝑖 = 𝑖 = 𝑋𝑖 Eq. 5.8
𝑛 1

When the population is normally distributed having the mean 𝜇 and


standard deviation 𝜎, then it follows that mean 𝜇𝑥̅ = 𝜇 and the standard
mean error 𝜎𝑥̅ = 𝜎 for samples n = 1.

Where: 𝑋̅ − 𝑆𝑎𝑚𝑝𝑙𝑒 𝑀𝑒𝑎𝑛


𝜇𝑥̅ − 𝑀𝑒𝑎𝑛 𝑜𝑓 𝑎𝑙𝑙 𝑆𝑎𝑚𝑝𝑙𝑒 𝑚𝑒𝑎𝑛𝑠
𝜎𝑥̅ − 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑒𝑟𝑟𝑜𝑟 𝑜𝑓 𝑚𝑒𝑎𝑛𝑠
𝜇 − 𝑀𝑒𝑎𝑛 𝑜𝑓 𝑛𝑜𝑟𝑚𝑎𝑙𝑙𝑦 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑑 𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛
𝜎 − 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛

128
Z – Value for Sampling Distribution of the Mean

The Z-value for sampling distribution of the mean is the quotient of


the difference between the sample mean 𝑋̅ and the population mean 𝜇 over
the standard error of the mean 𝜎𝑥̅ .

𝑋̅−𝜇𝑥̅
𝑍= Eq. 5.9
𝜎𝑥̅

𝜇𝑥̅ = 𝜇 − 𝑓𝑜𝑟 𝑡ℎ𝑒 𝑢𝑛𝑏𝑖𝑎𝑠𝑒𝑑𝑛𝑒𝑠𝑠 𝑝𝑟𝑜𝑝𝑒𝑟𝑡𝑦 𝑜𝑓 𝑡ℎ𝑒 𝑚𝑒𝑎𝑛

𝑋̅−𝜇
𝑍= 𝜎 Eq. 5.10
√𝑛

Illustrative example: If a sample of 16 sachets of 3 in 1 coffee labelled


“30 grams net” chosen from tens of thousands produced
in one day has a mean 𝑋̅ of 31 grams. What would be the
expectation on the entire one day production if the
population is normally distributed having a standard
deviation of 2 grams?

Solution: The 16 sachets of sample is the miniature of the day’s


production of tens of thousands sachets. If the population
is normally distributed, we can assume that the sample is
approximately normally distributed.

Given: 𝑋̅ = 31 grams
𝜇 = 30 grams
𝜎 = 2.0 grams
𝑛 = 16 sachets

129
𝑋̅−𝜇
Formula: 𝑍= 𝜎
√𝑛

31−30 1.0
𝑍= 2 = 2
√16 4

𝑍 = +2.0

From the table of areas under normal curve, the area of Z = +2.0 is
0.4772. This implies that 2.28% of all possible samples of size 16 could
have a sample mean over 31 grams.

Diagram of sampling distribution of the mean:

0.4772
0.0228

30 31
If we consider individual sachet, that percentage can be computed
immediately by:
𝑋−𝜇 31−30
𝑍= = = +0.50
𝜎 2

The area correspond to Z = +0.50 on the table is 0.50 – 0.1915 =


0.3085 which means that 30.85% of the individual sachets are containing
more than 31 grams. If we compare the two results, we may find that more
individual boxes than sample mean will be over 31 grams due to the fact that

130
each sample contains 16 sachets of different contents, some below and some
above the sample mean.

The Central Limit Theorem

We discussed already the sampling distribution of the mean for the


normally distributed population. If ever in so many instances that the
distribution of the population was found non-normal distribution, this brings
us an important theorem called the central limit theorem.

The Central Limit Theorem:

Regardless of any shape of the distribution of the individual


values in the population, the sampling distribution of the mean can
be approximated by the normal distribution as the sample size gats
bigger and larger.

The important factor of this theorem is that the average of all the
sample means is equal to the population mean. In other words, just simply
add all the sample means divide it by the number of sample units and that
figure is already the mean of the population.
Likewise, the average of all of the standard deviations of the sample,
is the actual standard deviation of the population. It’s a good and useful tool
that can help facilitating the nature and characteristics of a population.

Normality and the Sampling Distribution of the Mean

1. For non-normal distribution, for any sample size of not less than 30
observations selected at random, the sampling of the mean is
approximated to be normally distributed.

131
2. For a fairly normally distribution, for any sample size of not less
than 15 observations selected at random, the sampling of the mean
is approximated to be normally distributed.
3. For a normal distribution, the sampling of the mean is
approximated to be normally distributed regardless of sample size.

Illustrations: In a population of 10,000, the following sets of samples


are gathered independently. Determine the approximate
population mean.

Solution: Determine the mean of the samples 𝑋̅ and add them.


Divide the sum by number of sets to get the mean of the
population.
Sample 1 2 3 4 5 6 7
𝑋1 4 23 17 29 72 55 27
𝑋2 15 65 3 34 35 6 83
𝑋3 12 38 56 56 41 46 56
𝑋4 7 55 19 24 63 23 23
𝑋5 25 68 57 87 2 50 45
𝑋6 37 23 76 25 5 70 2
𝑋7 43 35 45 1 74 72 4
𝑋8 28 53 58 5 37 24 6
𝑋9 33 61 72 7 45 8 23
10 45 9 34 2 35 45 57
𝑋11 63 2 2 45 64 5 35
𝑋12 26 5 47 71 47 2 73
𝑋13 20 47 39 55 23 11 57
𝑋14 30 37 36 24 46 65 24
𝑋15 45 8 77 16 11 24 76
ΣX 433 529 638 481 600 506 591
𝑋̅ 28.87 35.27 42.53 32.07 40.00 33.73 39.40
Σ𝑋̅ 251.87
𝜇 35.98

132
∑𝑋
Formula: Sample Mean 𝑋̅ =
𝑛
433
𝑋̅1 = = 28.87
15
529
𝑋̅2 = = 35.27
15
638
𝑋̅3 = = 42.53
15
481
𝑋̅4 = = 32.07
15
600
𝑋̅5 = = 40.00
15
506
𝑋̅6 = = 33.73
15
591
𝑋̅7 = = 39.40
15

∑ 𝑋̅
Population Mean 𝜇 =
𝑁
251.87
𝜇=
7

𝝁 = 𝟑𝟓. 𝟗𝟖 𝑨𝒏𝒔𝒘𝒆𝒓

133
Name: ________________________ Course: ________
Classroom Activity No.5.3 Section: ________

1. If a sample of 36 bottles of beer labelled “330 cc. net” chosen from


hundreds of thousands produced in one day has a mean 𝑋̅ of 327 cc.
What would be the expectation on the entire one day production if the
population is normally distributed having a standard deviation of 5 cc?

134
2. Evaluate the population mean using the central limit theorem of the
tabulated samples shown below.
Sample 1 2 3 4 5 6 7
𝑋1 14 16 35 9 7 15 19
𝑋2 52 65 31 34 35 6 63
𝑋3 14 388 56 76 62 66 56
𝑋4 71 55 29 24 63 23 43
𝑋5 25 46 6 57 29 58 45
𝑋6 31 23 76 25 5 70 28
𝑋7 4 30 62 19 45 7 4
𝑋8 28 53 58 5 37 24 62
𝑋9 33 37 34 71 66 82 2
10 9 9 73 2 35 45 57
𝑋11 63 22 21 4 55 35 15
𝑋12 12 5 47 71 47 2 73
𝑋13 20 55 9 28 44 61 27
𝑋14 30 37 36 24 46 25 24
𝑋15 45 18 7 33 33 24 46

135
Chapter 6

Interval Estimation

TOPIC LESSON
1. Interval estimation of the mean(Known & Unknown 𝜎)
2. Interval estimate of the proportion
3. t – Distribution
4. Population mean
5. Population proportion
6. Confidence interval estimator

OBJECTIVES
For the students to:
1. Illustrate point and interval estimation
2. Identify point estimator for population mean
3. Identify the form of confidence interval estimator for
population mean(known & unknown 𝜎)
4. Illustrate and construct t – distribution.
5. Identify point estimator for population proportion
6. Identify the form of confidence interval estimator for
population proportion by central limit theorem.
7. Calculate the length of confidence interval
8. Draw conclusions based on confidence interval estimate

136
Interval Estimation

Interval estimation requires inductive reasoning. In inferential


statistics, we are trying to make some results of a single sample but draw
conclusions on the population. In real practice, the population mean is
usually the unknown quantity although most of the time it is attached on
most products like net weight of a sachet of shampoo or sachet of coffee, in
the bottle of soft drinks or any beverages, etc.. The confidence interval
estimation shows the location of the mean between the lower and upper
limit.

In general, if a 95% confidence interval estimate is being drawn in a


normal distribution, it will show 5% proportion in the two tails of the
distribution which is outside the confidence area.

Confidence interval of the mean (𝝈 𝒌𝒏𝒐𝒘𝒏)

When the standard deviation of the population or population variance


𝜎 is given or known, we can find the limits for the confidence interval for a
mean.

𝑍𝜎
Lower Limit: 𝑋̅ − Eq. 6.1
√𝑛

𝑍𝜎
Upper Limit: 𝑋̅ + Eq. 6.2
√𝑛

The Confidence Interval will be:

𝑍𝜎 𝑍𝜎
𝑋̅ − ≤ 𝜇 ≤ 𝑋̅ +
√𝑛 √𝑛

1−∝
Where: 𝑍 = 𝑐𝑜𝑟𝑟𝑒𝑠𝑝𝑜𝑛𝑑𝑖𝑛𝑔 𝑎𝑟𝑒𝑎 𝑜𝑓
2

137
Example 1: A certain brand of canned evaporated milk is labelled 331 cc.
Determine the confidence interval for a mean with a 98% level
of confidence if the sample mean is 330 cc. for n = 16 while the
population standard deviation 𝜎 is 2.5 cc.

Given: confidence level = 98%, 𝑋̅ = 330 𝑐𝑐, 𝜎 = 2.5 𝑐𝑐, n = 66

Solution: Convert the 98% confidence level into Z value.

0.98
𝐴= = 0.49 , 𝑍 = ±2.327
2

𝑍𝜎
Lower Limit: 𝑋̅ −
√𝑛

2.327×2.5
𝐿𝐿 = 330 −
√16

𝐿𝐿 = 328.55

𝑍𝜎
Upper Limit: 𝑋̅ +
√𝑛

2.327×2.5
𝑈𝐿 = 330 +
√16

𝑈𝐿 = 331.50

The Confidence Interval 𝟑𝟐𝟖. 𝟓𝟓 ≤ 𝟑𝟑𝟎 ≤ 𝟑𝟑𝟏. 𝟓𝟎 𝑨𝒏𝒔𝒘𝒆𝒓

Note: The mean is less than the upper limit which means that the mean is
still within the samples. Therefore the canning operation is still going
well.

138
Example 2: It is known for a paint manufacturer that the standard deviation
of canned paint is 0.02 gallon. A random of 64 cans sample is
done and found out that the average content of the gallon can is
0.995 gallon. At 95% confidence level of the true population
average, set up the confidence interval estimate.

Given: confidence level = 95%, 𝑋̅ = 0.995, 𝜎 = 0.02, n = 64

Solution: Convert the 95% confidence level into Z value.

0.95
𝐴= = 0.45 , 𝑍 = ±1.96
2

𝑍𝜎
Lower Limit: 𝑋̅ −
√𝑛

1.96×0.02
𝐿𝐿 = 0.995 −
√64

𝐿𝐿 = 0.9901

𝑍𝜎
Upper Limit: 𝑋̅ +
√𝑛

1.96×0.02
𝑈𝐿 = 330 +
√64

𝑈𝐿 = 0.9999

The Confidence Interval 𝟎. 𝟗𝟗𝟎𝟏 ≤ 𝟏. 𝟎 ≥ 𝟎. 𝟗𝟗𝟗𝟗 𝑨𝒏𝒔𝒘𝒆𝒓

Note: The mean of the population is greater than the upper limit which
means that the samples are really lesser in paint content. Therefore,
the production has a problem in canning operations.

139
Name: ________________________ Course: ________
Classroom Activity No. 6.1 Section: ________

1. A sample of 100 bottles of liquid detergent labelled “375 cc.” are chosen
from tens of thousands produced in one day has a mean 𝑋̅ of 374 cc. Set up a
99% confidence interval estimate if the population is normally distributed
having a standard deviation of 5 cc?

140
2. A light bulb factory wants to know the average life of their one shipment of
bulbs. They conducted a random sampling for 36 light bulbs and found out
the mean life of the samples was 220 hours. Set up a 99% confidence
interval estimate of the true population mean of 200 hours with a standard
deviation of 7 hours?

141
Confidence Interval of the Mean (𝝈 𝑼𝒏𝒌𝒏𝒐𝒘𝒏)

Usually the mean of the population 𝜇 is unknown the same with


the standard deviation 𝜎. Hence, by using the sample mean 𝑋̅, and
sample standard deviation S, we are getting the confidence interval
estimate of population mean 𝜇. To realize this, we have to work on the
student’s t-distribution.

Student’s t-Distribution:

William S. Gosset developed today’s Student’s t-Distribution.


The process is being used when the population standard deviation 𝜎 is
not available or unknown.

When the normally distributed random variable is:

𝑋̅−𝜇
𝑆 Eq. 6.3
√𝑛

The statistics is said to be a t – distribution with (n - 1) degrees


of freedom. The expression of t – distribution has similarity with the
expression of normal distribution except for the S.

Properties of t-Distribution

The appearance of t-distribution is the same as the normal


distribution. They are bell-shaped in appearance and also symmetrical
from the mean. They only differ on the areas under the curve. The t-
distribution has more on the tails than the normal while the normal is
more on the middle than the t-distribution. But, as the degrees of
freedom go higher, it slowly approaches

142
STUDENT'S t-DISTRIBUTION
df 0.400 0.300 0.200 0.100 0.050 0.025 0.010 0.005
1 0.325 0.727 1.376 3.078 6.314 12.706 31.821 63.657
2 0.289 0.617 1.061 1.886 2.920 4.303 6.965 9.925
3 0.277 0.584 0.978 1.838 2.353 3.182 4.541 5.841
4 0.271 0.569 0.941 0.155 2.132 2.776 3.747 4.604
5 0.267 0.559 0.920 1.476 2.015 2.571 3.365 4.032
6 0.265 0.553 0.906 1.440 1.943 2.447 3.143 3.707
7 0.263 0.549 0.896 1.415 1.895 2.365 2.998 3.499
8 0.262 0.546 0.889 1.397 1.860 2.306 2.896 3.355
9 0.261 0.543 0.883 1.383 1.833 2.262 2.821 3.250
10 0.260 0.542 0.879 1.372 1.812 2.228 2.764 3.169
11 0.260 0.540 0.876 1.363 1.796 2.201 2.718 3.106
12 0.259 0.539 0.873 1.356 1.782 2.179 2.681 3.055
13 0.259 0.538 0.870 1.350 1.771 2.160 0.031 3.012
14 0.258 0.537 0.868 1.345 1.761 2.145 2.624 2.977
15 0.258 0.536 0.866 1.341 1.753 2.131 2.602 2.947
16 0.258 0.535 0.865 1.337 1.746 2.120 2.583 2.921
17 0.257 0.534 0.863 1.333 1.740 2.110 2.567 2.898
18 0.257 0.534 0.862 1.330 1.734 2.101 2.552 2.878
19 0.257 0.533 0.861 1.328 1.729 2.093 2.539 2.861
20 0.257 0.533 0.860 1.325 1.725 2.086 2.528 2.845
21 0.257 0.532 0.859 1.323 1.721 2.080 2.518 2.831
22 0.256 0.532 0.858 1.321 1.717 2.074 2.508 2.819
23 0.256 0.532 0.858 1.319 1.714 2.069 2.500 2.807
24 0.256 0.531 0.857 1.318 1.711 2.064 2.492 2.797
25 0.256 0.531 0.856 1.316 1.708 2.060 2.485 2.787
26 0.256 0.531 0.856 1.515 1.706 2.056 2.479 2.779
27 0.256 0.531 0.855 1.314 1.703 2.052 2.473 2.771
28 0.256 0.530 0.855 1.313 1.701 2.048 2.467 2.763
29 0.256 0.530 0.854 1.311 1.699 2.045 2.462 2.756
30 0.256 0.530 0.854 1.310 1.697 2.042 2.457 2.750
40 0.255 0.529 0.851 1.303 1.684 2.012 2.423 2.704
60 0.254 0.527 0.848 1.296 1.671 2.000 2.390 2.660
120 0.254 0.526 0.845 1.289 1.658 1.980 2.358 2.617
0.254 0.524 0.842 1.282 1.645 1.960 2.326 2.576

143
Degrees of Freedom Concept

From the ∑𝑛𝑖=1(𝑋𝑖 − 𝑋̅)2 when computing the variance 𝑆 2 , we


have to know the value of the sample 𝑋̅, to determine the value of 𝑆 2 .
Hence, there are only (n – 1) values of the samples that are free to
vary. This is the reason why, there are (n – 1) degrees of freedom.

Illustration: There are 4 values whose mean is 10. This means that the
total value is 40. If there are three values that are free to
vary like 8, 9, and 10, the fourth number which is 13
cannot be varied because it will not give the total of 40.
That is the reason of the (n – 1) degrees of freedom.

The Confidence Interval of the Mean (𝝈 𝑼𝒏𝒌𝒏𝒐𝒘𝒏)

The following are the expressions for the (1−∝) × 100%


confidence interval estimate.

𝑡 𝑆
Lower Limit: 𝑋̅ − 𝑛−1 Eq. 6.4
√𝑛

𝑡 𝑆
Upper Limit: 𝑋̅ + 𝑛−1 Eq. 6.5
√𝑛

𝑡 𝑆 𝑡 𝑆
The Confidence Interval will be: 𝑋̅ − 𝑛−1 ≤ 𝜇 ≤ 𝑋̅ + 𝑛−1
√𝑛 √𝑛

Where: 𝑡𝑛−1 = critical value of the t-distribution having (n-1) as


degrees of freedom.

Example: The sample of 36 pieces of steel bars shows 35,000 psi strength
with 500 psi sample standard deviation. Evaluate the given data
setting a 95% level of confidence of estimating the population
mean.

144
Given: 𝑋̅ = 35,000 psi, S = 500 psi, n = 36
Level of confidence = 95%, hence, ∝ = 0.05
Solution: t = 2.045 taken from the table of Student’s t-distribution where
∝/2 = 0.025

𝑡 𝑆
Lower Limit: 𝑋̅ − 𝑛−1
√𝑛

2.045×500
𝐿𝐿 = 35,000 −
√36

𝐿𝐿 = 34,829.58

𝑡 𝑆
Upper Limit: 𝑋̅ + 𝑛−1
√𝑛

2.045×500
𝑈𝐿 = 35,000 +
√36

𝑈𝐿 = 35,170.42

The Confidence Interval 𝟑𝟒, 𝟖𝟐𝟗. 𝟓𝟖 ≤ 𝝁 ≤ 𝟑𝟓, 𝟏𝟕𝟎. 𝟒𝟐 𝑨𝒏𝒔𝒘𝒆𝒓

Note: The mean of the population is approximated between the lower limit
and the upper limit of the chosen samples. The validity of this is
dependent on the assumption that the distribution of the strength of
steel is normal.

145
Name: ________________________ Course: ________
Classroom Activity No. 6.2 Section: ________

1. A sample of 64 bottles of liquid detergent are chosen from tens of thousands


produced in one day with a mean 𝑋̅ of 328 cc and standard deviation of 4 cc.
Set up a 95% confidence interval estimate in evaluating the population mean
if the population is normally distributed.

146
2. A light bulb factory wants to know the average life of their one shipment of
bulbs. They conducted a random sampling for 100 light bulbs and found out
the mean life of the samples was 200 hours with standard deviation of 5
hours. Set up a 99% confidence interval estimate for the evaluation of
population mean.

147
Sampling distribution of the proportion
When dealing with categorical data, the sample mean 𝑋̅ is the
same as the sample proportion 𝑝𝑠 with the same characteristics. In a
trial observation of 1, 0, 1, 0, 1, wherein 1 is a success and 0 is a
failure, the total successes is 3 and the failure is 2. The mean of the
success trials is 0.60. That is also the proportion of the number of
successes in the trial observation. The sample proportion 𝑝𝑠 therefore
can be defined as:
𝑋
𝑝𝑠 = Eq. 6.6
𝑛

Where: 𝑝𝑠 − Sample Proportion


𝑋 − Number of successes
𝑛 − Sample size

The value of sample proportion 𝑝𝑠 ranges from 0 to 1 as its


special property. Whereas the sample mean 𝑋̅ is the estimator of the
population mean 𝜇.

Standard Error of the proportion

𝑝(1−𝑝)
𝜎𝑝𝑠 = √ Eq. 6.7
𝑛

In any cases, we can use the normal distribution in the


evaluation of the sampling distribution of the proportion because we
are dealing with sample proportions not sample mean. Hence,

𝑋̅−𝜇
𝑍= Eq. 6.8
𝜎𝑥̅

𝑝(1−𝑝)
Substitute: 𝑝𝑠 = 𝑋̅, 𝑝 = 𝜇𝑋̅ , and 𝜎𝑝𝑠 = √ = 𝜎𝑥̅
𝑛

148
Hence, the difference between the Sample and Population
Proportion in Standardized Normal Units is:
𝑝𝑠 −𝑝
𝑍= Eq. 6.9
𝑝(1−𝑝)

𝑛

Example: The registrar of a state university determines that 40% of all the
students exceeds the average grade of 2.50. If 200 students are
chosen at random, what is the probability that the sample
proportion of the students is more than 0.35?

Solution: np = 200(.40) = 80, n(1 – p) = 200(0.60) = 120, assuming


the sampling distribution to be approximately normally
distributed.
𝑝𝑠 −𝑝
Formula: 𝑍=
𝑝(1−𝑝)

𝑛

0.35−0.40
𝑍=
0.4(0.60)

200

𝑍 = −1.44

In the table of areas under normal curve, the corresponding area of


this is 0.4251, hence the area of the tail is (0.5000-0.4251) = 0.0749.
Therefore, the probability of obtaining a sample proportion of 0.30 is 0.0749
or a percentage of 7.49%. please refer of the diagram below.

Z = -1.44

149
Confidence Interval Estimate for Proportion

Following the discussions above regarding sample proportion, we


extend the concept to the confidence interval estimate for population
𝑋
proportion 𝑝 from the sample proportion 𝑝𝑠 = as long as np and n(1 - p)
𝑛

are at least 5. With this, we can approximate a normal distribution and set up
(1−∝) × 100% confidence interval estimate for population proportion 𝑝.

The Interval Estimate for Proportion is:

𝑝𝑠 (1−𝑝𝑠 )
Lower Limit: 𝑝𝑠 − 𝑍√ Eq. 6.10
𝑛

𝑝𝑠 (1−𝑝𝑠 )
Upper Limit: 𝑝𝑠 + 𝑍√ Eq. 6.11
𝑛

𝑝𝑠 (1−𝑝𝑠 ) 𝑝𝑠 (1−𝑝𝑠 )
The Interval: 𝑝𝑠 − 𝑍√ ≤ 𝑝 ≤ 𝑝𝑠 + 𝑍√
𝑛 𝑛

Where: 𝑝𝑠 − Sample Proportion


𝑝 − Population Proportion
𝑍 − Critical Value from Normal Distribution
𝑛 − Sample Size

Example: The production manager of a manufacturing company wants to


determine the proportion of the products with defects in terms
of length, width, color, and other standards the company is
implementing. The production manager selects 200 samples at
random for the analysis for the day’s production. For the 200
samples, 35 shows non-conformance to the standards. At 90%
confidence level, set up the confidence interval estimate.

Solution: Following the process, the confidence interval is computed as:

150
35
𝑝𝑠 = = 0.175
200

And with a 90% confidence level, the area = 0.90/2 = 0.4500,


from the table of areas under normal curve, Z = 1.645

𝑝𝑠 (1−𝑝𝑠 )
`Lower Limit: 𝑝𝑠 − 𝑍√
𝑛

0.175×0.825
𝐿𝐿 = 0.175 − (1.645)√
200

𝐿𝐿 = 0.1308

𝑝𝑠 (1−𝑝𝑠 )
Upper Limit: 𝑝𝑠 + 𝑍√
𝑛

0.175×0.825
𝑈𝐿 = 0.175 − (1.645)√
200

𝑈𝐿 = 0.2192

The Confidence Interval 𝟎. 𝟏𝟑𝟎𝟖 ≤ 𝒑 ≤ 𝟎. 𝟐𝟏𝟗𝟐 𝑨𝒏𝒔𝒘𝒆𝒓

Note: The production manager has determine with 90% confidence level
that between 13.08% to 21.92% of the produced on that day have
some defects on the standards of the company.

151
Name: ________________________ Course: ________
Classroom Activity No. 6.3 Section: ________

1. Based on experience, 12% of large shipments of machine parts are


defective. If random sample of 300 parts are chosen, what proportion of
the samples will have:
a. Between 8% to 10% defective parts?
b. Less than 6%?
c. If sample size is increased to 400, what would be the answer in (a)
and (b)?

152
2. Determine the critical value of t in each of the following:
a. 1 − α = .96, n = 16
b. 1 − α = .95, n = 25
c. 1 − α = .94, n = 36
d. 1 − α = .92, n = 49
e. 1 − α = .90, n = 64

3. The local branch manager of a universal bank desires an estimate of the
average amount in depositors' savings accounts in the bank. He selected
30 depositors at random and the results show a sample mean of P47,500
and a sample standard deviation of P12,000.
a. Assuming a normal distribution, set up a 90% confidence interval
estimate of the average amount held in all depositors' savings
accounts.
b. If a depositor has P40,000 in his passbook, is this unusual?

CHAPTER 7
FUNDAMENTALS OF HYPOTHESIS
TESTING
TOPIC LESSON
1. Null and Alternative Hypothesis
2. Level of Significance, Rejection Region
3. Types of Error in Hypothesis Testing
4. Z – Test
5. T - Test

OBJECTIVES
For the students to:
1. Illustrate null and alternative hypotheses including
significance level and rejection region.
2. Calculate the probabilities of committing a type I and II
error.
3. Formulate null and alternative hypotheses.
4. Identify the form of test to be done.
5. Compute for the test statistical value.
6. Formulate the hypotheses of population proportion.
7. Draw conclusions from the results of the hypothesis testing.
8. Analyse problems involving hypothesis testing.

We will focus on another phase of statistics, which is inferential
statistics. A step-by-step methodology for hypothesis testing on population
and sample parameters is presented in the middle part of this chapter. We
will compare the results we observe with the results we would expect to get
if a hypothetical assumption were actually true.

The word “Decision” is one of the most frequently used terms in
modern statistics. “Inferential Statistics” plays a very important role
in the construction and analysis of the standards and principles on which
decisions are based. The major function of many businessmen is to make
decisions every day. Their job is to decide whether or not to produce a
product, whether or not to enter into another product line, whether or not
to buy new equipment, and the like.

Inferential statistics is very useful in dealing with these tasks because it
enables a manager or a businessman to study a small sample and use it to
approximate the entire population. In many instances, a manager can rely
only on the information provided by the samples.

Hypothesis Testing Methodology

What is a hypothesis? A hypothesis is a tentative statement that aims to
explain evidence, proofs, and facts about reality. A hypothesis has its
origin in an inquiry into some practical problem. In looking for a probable
answer, an informed presumption or prediction and the relevant indications
are brought out and turned into the propositions of the hypothesis.

Hypotheses are subjected to testing and are either accepted or rejected based
on the available data, which are processed and interpreted using the
procedures discussed in the previous chapters. Hypothesis testing is used to
authenticate a claim, a declaration, or a statement called the null
hypothesis.

The Null and Alternative Hypotheses

When a statement declares that the population parameter is equal to a
specified value, the hypothesis is a null hypothesis. The null hypothesis
always expresses no difference or the status quo. It is the hypothesis that
we test and either accept or reject, and it is denoted by Ho. It always
expresses the idea of non-significance of difference, or equality. As the
starting point in the testing procedure, the null hypothesis is the working
hypothesis. The alternative hypothesis Ha is the one that must be true if the
null hypothesis is rejected or found not true.

- Rejection of Ho denotes the acceptance of Ha


- Acceptance of Ho denotes the rejection of Ha

Type I and type II errors:

There is always a risk that a wrong conclusion will be reached when
making a decision about a population parameter using sample data.

Table of Type I and Type II errors:

                 Actual condition
Decision         Ho is true          Ha is true
Reject Ho        Type I error        Correct decision
Accept Ho        Correct decision    Type II error

There is always the possibility of making an error when drawing a
decision based on the table shown. A single decision may result in a
Type I error or a Type II error, but not both at the same time. To
summarize:

Type I error (α error) – This error occurs if the null hypothesis Ho is
rejected when in fact it is true and should be accepted. The probability
of a Type I error is α.
Type II error (β error) – This error occurs if the null hypothesis is
accepted when in fact it is not true and should be rejected. The
probability of a Type II error is β.

Level of significance:

The level of significance α of a statistical test is the probability of
committing a Type I error. The Type I error is directly under the control of
the person who performs the test because α is specified before the test
starts. Once the value of α is specified, the rejection region is determined,
because α is the probability of rejection when the null hypothesis is true.

The significance level of a hypothesis test is the maximum probability of
rejecting the null hypothesis Ho when in fact it is true. For statistical
decisions involving the management or operations of industries, it is a
statistical tradition to use a level of significance of 5% or 1%. A 5%
significance level means accepting 5 chances out of 100 of rejecting a true
null hypothesis. Thus, it denotes 95% confidence that we have made the right
decision. A 1% significance level corresponds to a 99% confidence level.
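The meaning of α as the probability of a Type I error can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the population values (mean 48, standard deviation 2, sample size 36) are chosen for the demonstration, the function name `type_one_error_rate` is hypothetical, and a two-tailed z-test is used. Because the null hypothesis is true in every simulated sample, each rejection is a Type I error.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(mu, sigma, n, alpha, trials, seed=1):
    """Fraction of samples drawn under a true Ho that a two-tailed z-test rejects."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = sqrt(n) * (fmean(sample) - mu) / sigma
        if abs(z) >= z_crit:
            rejections += 1  # Ho is true here, so this is a Type I error
    return rejections / trials

rate = type_one_error_rate(mu=48, sigma=2, n=36, alpha=0.05, trials=5000)
print(f"Observed Type I error rate: {rate:.3f}")
```

With a large number of trials, the observed rate should be close to the chosen α of 0.05.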

One-tailed and two-tailed test:

This is a very important step in hypothesis testing. The significance
level is allocated to the tail or tails of the distribution. A one-tailed
test occurs when the rejection region is located at one extreme tail of the
distribution of values of the test statistic. When the rejection region is
located on both extreme tails, the test is called a two-tailed test.

The formulation of the alternative hypothesis Ha tells us whether a
test is one-tailed or two-tailed. A directional hypothesis, stated with terms
such as greater than, more than, or less than, calls for a one-tailed test.
If the alternative hypothesis Ha is a declaration of non-equality
characterized by the sign ≠, then the hypothesis is non-directional and we
have a two-tailed test.
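The effect of the choice between a one-tailed and a two-tailed test on the tabular value of z can be sketched as follows. This is a minimal illustration, assuming Python's standard library; the function name `critical_z` is hypothetical. For a two-tailed test, α is split equally between the two tails.

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed):
    """Positive critical value of z for a given significance level."""
    # A two-tailed test places alpha / 2 in each tail; a one-tailed test
    # places all of alpha in a single tail.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

one_tail_05 = critical_z(0.05, two_tailed=False)
two_tail_05 = critical_z(0.05, two_tailed=True)
two_tail_01 = critical_z(0.01, two_tailed=True)
print(round(one_tail_05, 3), round(two_tail_05, 3), round(two_tail_01, 3))
```

These computed values correspond to the tabular values 1.645, 1.96, and 2.575 used in the worked examples of this chapter, up to rounding.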

Steps in hypothesis-testing:

1. Construct the null hypothesis Ho.
a. State that there is no significant difference between the items being
compared.
b. For a directional hypothesis, use comparative terms such as
greater than, lower than, less than, etc.
c. Then, construct the alternative hypothesis Ha based on the
constructed null hypothesis. In case Ho is rejected, Ha will be
used.
2. Set the level of significance α based on the confidence level required
by the person performing the test.

3. When population standard deviation is given, use the area under
normal curve for z-test. If the given standard deviation is for the
samples, use the student’s t-distribution for t-test.
4. Determine the tabular value for the test.
a. For z-test:
i. Use the z-values on the table of areas under normal curve
(chapter 5)
b. For a t-test:
i. Determine the degrees of freedom.
ii. Look for the corresponding value from the t-distribution
(chapter 6).
1. Single sample: df = n-1.
2. For two samples, df = n1 + n2 – 2
Where:
n1 = number of items in the first sample
n2 = number of items in the second sample
5. Determine the computed value of z or t based on the given data using
the following formulas:
a. For z-test:
i. When sample mean, population mean and standard
deviation, and number of samples are given. Comparing
sample mean and population mean.

$Z = \dfrac{\sqrt{n}\,(\bar{X}-\mu)}{\sigma}$          Eq. 7.1

ii. When two sample means, population standard deviation,
and number of items of two samples are given.
Comparing two sample means.

$Z = \dfrac{\bar{X}_1-\bar{X}_2}{\sigma\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}$          Eq. 7.2

iii. When two sample proportions and number of items of two
samples are given. Comparing two proportions.

$Z = \dfrac{P_1-P_2}{\sqrt{\dfrac{P_1 q_1}{n_1}+\dfrac{P_2 q_2}{n_2}}}$          Eq. 7.3

b. For t-test:
i. When sample mean, population mean, sample standard
deviation, and number of items of sample are given.
Comparing sample mean and population mean.

$t = \dfrac{\sqrt{n-1}\,(\bar{X}-\mu)}{s}$          Eq. 7.4

ii. When two sample means, two sample standard deviations,
and number of items of two samples are given.
Comparing two sample means.

$t = \dfrac{\bar{X}_1-\bar{X}_2}{\sqrt{\dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}\times\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}$          Eq. 7.5

6. State conclusion by comparing the computed value and tabular value,


based on the following recommendations:
a. If the absolute computed value is equal to or greater than the
tabular value, reject null hypothesis 𝐻𝑜 .
b. If the absolute computed value is less than the absolute tabular
value, accept null hypothesis 𝐻𝑜 .

Example 1. The census in one school campus shows that the mean weight of
college students was 48 kilos, with a standard deviation of 2
kilos. A sample of 36 students was found to have a mean
weight of 47 kilos. Are the 36 students really lighter than the
rest, using 0.05 significance level?

Solution: Following the aforesaid steps and procedures, we have

Step 1. Ho: The 36 college students are not really lighter than
the rest.
Ha: The 36 college students are really lighter than the
rest. This hypothesis is directional and suited for a
one-tailed test.

Step 2. Set level of significance ∝= 0.05

Step 3. Use z-test, the population standard deviation is given.

Step 4. At α = 0.05, the area of the rejection region is 0.05. Based
on the diagram, the area of the acceptance region between the
mean and the critical value is 0.45. From the table below, the
tabular value of z for an area of 0.45 is ±1.645 (between
1.64 and 1.65).

[Figure: normal curve for a one-tailed test. Critical region α = 0.05; acceptance region area 0.45; critical value z = ±1.645.]

AREAS OF A NORMAL CURVE
Z 0 1 2 3 4 5 6 7 8 9
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0754
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2258 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2518 0.2549
0.7 0.2580 0.2612 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2996 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706

Step 5. The given values in the problem are:

$\bar{x}$ = 47 kilos (sample mean)
$\mu$ = 48 kilos (population mean)
$\sigma$ = 2 kilos
n = 36

Formula: $z = \dfrac{\sqrt{n}\,(\bar{x}-\mu)}{\sigma}$

$z = \dfrac{\sqrt{36}\,(47-48)}{2}$

$z = \dfrac{(6)(-1)}{2}$

$z = -3.0$

Step 6. The absolute computed value of 3.0 is greater than the absolute
tabular value of 1.645. Therefore, reject the null
hypothesis. The 36 students are really lighter than the
rest.
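The computation in Example 1 can be sketched in a few lines. This is an illustration only, assuming the values stated in the problem (sample mean 47, population mean 48, population standard deviation 2, n = 36) and the tabular value 1.645; the function name `one_sample_z` is hypothetical.

```python
from math import sqrt

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Computed z for comparing a sample mean with a population mean (Eq. 7.1)."""
    return sqrt(n) * (sample_mean - pop_mean) / pop_sd

z = one_sample_z(sample_mean=47, pop_mean=48, pop_sd=2, n=36)
z_tabular = 1.645  # one-tailed test at the 0.05 level

# Step 6 of the procedure: compare the absolute computed and tabular values.
decision = "reject Ho" if abs(z) >= z_tabular else "accept Ho"
print(z, decision)
```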

Example 2: A statistics researcher wanted to find out whether or not there is
a significant difference between the daily allowances of morning
and afternoon students in his school. By random sampling, a
sample of 200 students was taken from the morning session and found to
have a mean daily allowance of P200. A sample of 220 students was
also taken from the afternoon session and found to have a mean daily
allowance of P195. The daily allowance of the total population
of students in that school has a standard deviation of P20. Is
there a significant difference between the two samples at 0.01
level of significance?

Solution: This is comparing two sample means.


Step 1. Ho: There is no significant difference between the two
samples.
Ha: There is significant difference between the
samples. This hypothesis calls for a two-tailed test

Step 2. Set level of significance 𝛼 = 0.01

Step 3. Use z-test

Step 4. At α = 0.01, the area of the rejection region in each tail is
0.005. Based on the diagram, the area of the acceptance region
on each side of the mean is 0.495. From the table below, the
tabular value of z for an area of 0.495 is ±2.575 (between
2.57 and 2.58).
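[Figure: normal curve for a two-tailed test. Critical region α = 0.01, with area 0.005 in each tail; acceptance region area 0.495 on each side of the mean; critical values z = ±2.575.]
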
AREAS OF A NORMAL CURVE
Z 0 1 2 3 4 5 6 7 8 9
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0754
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2258 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2518 0.2549
0.7 0.2580 0.2612 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2996 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964

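Step 5. The given values in the problem are:
$\bar{x}_1$ = P200          n1 = 200
$\bar{x}_2$ = P195          n2 = 220
$\sigma$ = P20

Formula: $z = \dfrac{\bar{x}_1-\bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}$

$z = \dfrac{200-195}{20\sqrt{\dfrac{1}{200}+\dfrac{1}{220}}}$

$z = \dfrac{5}{20\sqrt{0.005+0.0045}}$

$z = \dfrac{5}{20(0.0977)}$

$z = \dfrac{5}{1.954}$

$z = +2.559$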

Step 6. The absolute computed value of 2.559 is less than the
absolute tabular value of 2.575. Therefore, accept the null
hypothesis. There is no significant difference between
the two samples.
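As a check on Example 2, the two-sample z computation can be sketched as below. The function name `two_sample_z` is hypothetical, and the tabular value 2.575 is taken from the example rather than computed.

```python
from math import sqrt

def two_sample_z(mean1, mean2, pop_sd, n1, n2):
    """Computed z for comparing two sample means with a known population
    standard deviation (Eq. 7.2)."""
    return (mean1 - mean2) / (pop_sd * sqrt(1 / n1 + 1 / n2))

z = two_sample_z(mean1=200, mean2=195, pop_sd=20, n1=200, n2=220)
z_tabular = 2.575  # two-tailed test at the 0.01 level

decision = "reject Ho" if abs(z) >= z_tabular else "accept Ho"
print(round(z, 3), decision)
```

The computed value falls just below the tabular value, so the decision is sensitive to rounding in the intermediate steps; carrying more decimal places, as the code does, is a reasonable precaution.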

Example 3. A television program survey in Metro Manila shows that 100 of
200 women dislike the program while 180 of 300 men do not
like the same program. We want to decide whether the
difference between the two sample proportions, 100/200 = 0.50
and 180/300 = 0.60, is significant or not at 0.05 level of
significance.

Solution: This is comparing two proportions.

Step 1. Ho: There is no significant difference between the two


proportions
Ha: There is significant difference between the two
proportions

Step 2. Set level of significance 𝛼 = 0.05

Step 3. Use z-test to compare two sample proportions.

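Step 4. The tabular value at 0.05 level of significance for a
two-tailed test is z = ±1.96.

[Figure: normal curve for a two-tailed test. Critical region α = 0.05, with area 0.025 in each tail; acceptance region area 0.475 on each side of the mean; critical values z = ±1.96.]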

Step 5. The given values in the problem are:

p1 = 0.50     q1 = 1 – p1 = 1 – 0.50 = 0.50
p2 = 0.60     q2 = 1 – p2 = 1 – 0.60 = 0.40
n1 = 200      n2 = 300

Formula: $z = \dfrac{p_1-p_2}{\sqrt{\dfrac{p_1 q_1}{n_1}+\dfrac{p_2 q_2}{n_2}}}$

$z = \dfrac{0.50-0.60}{\sqrt{\dfrac{0.50(0.50)}{200}+\dfrac{0.60(0.40)}{300}}}$

$z = \dfrac{-0.10}{\sqrt{0.00125+0.00080}}$

$z = \dfrac{-0.10}{\sqrt{0.00205}}$

$z = \dfrac{-0.10}{0.0453} = -2.21$

Step 6. The absolute computed value of 2.21 is greater than the
absolute tabular value of 1.96. Therefore, the null hypothesis is
rejected. There is a significant difference between the
two proportions.
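The two-proportion test can be sketched in code as follows. This is an illustration only: the function name `two_proportion_z` is hypothetical, and the sample sizes are taken as the 200 women and 300 men surveyed in the problem statement, with 100 and 180 being the counts of those who dislike the program.

```python
from math import sqrt

def two_proportion_z(count1, n1, count2, n2):
    """Computed z for comparing two sample proportions (Eq. 7.3)."""
    p1, p2 = count1 / n1, count2 / n2
    q1, q2 = 1 - p1, 1 - p2
    return (p1 - p2) / sqrt(p1 * q1 / n1 + p2 * q2 / n2)

z = two_proportion_z(count1=100, n1=200, count2=180, n2=300)
z_tabular = 1.96  # two-tailed test at the 0.05 level

decision = "reject Ho" if abs(z) >= z_tabular else "accept Ho"
print(round(z, 2), decision)
```

Note that the denominators in Eq. 7.3 are the sample sizes, not the counts of respondents who dislike the program.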

Example 4. The average height of Filipino women is 1.525 meters. A
random selection of 50 women was taken and found to have a
mean height of 1.55 meters with a standard deviation of 0.15
meter. Are the women in the sample significantly taller at
0.05 level of significance?

Solution: This is comparing sample and population means, but the sample
standard deviation is given instead of the population standard
deviation. (t-test)

Step 1. Ho: The sample is not significantly taller than other
Filipino women.
Ha: The sample is significantly taller than other
Filipino women.

Step 2. Set level of significance = 0.05

Step 3. Use t-test

Step 4. Determine the degrees of freedom

df = n – 1, df = 50 – 1, df = 49
The tabular value for one-tailed test when df = 49 and
level of significance = 0.05 is 1.676 (by Interpolation)
STUDENT'S t-DISTRIBUTION
df 0.400 0.300 0.200 0.100 0.050 0.025 0.010 0.005
1 0.325 0.727 1.376 3.078 6.314 12.706 31.821 63.657
2 0.289 0.617 1.061 1.886 2.920 4.303 6.965 9.925
3 0.277 0.584 0.978 1.638 2.353 3.182 4.541 5.841
4 0.271 0.569 0.941 1.533 2.132 2.776 3.747 4.604
5 0.267 0.559 0.920 1.476 2.015 2.571 3.365 4.032
6 0.265 0.553 0.906 1.440 1.943 2.447 3.143 3.707
7 0.263 0.549 0.896 1.415 1.895 2.365 2.998 3.499
8 0.262 0.546 0.889 1.397 1.860 2.306 2.896 3.355
9 0.261 0.543 0.883 1.383 1.833 2.262 2.821 3.250
10 0.260 0.542 0.879 1.372 1.812 2.228 2.764 3.169
11 0.260 0.540 0.876 1.363 1.796 2.201 2.718 3.106
12 0.259 0.539 0.873 1.356 1.782 2.179 2.681 3.055
13 0.259 0.538 0.870 1.350 1.771 2.160 2.650 3.012
14 0.258 0.537 0.868 1.345 1.761 2.145 2.624 2.977
15 0.258 0.536 0.866 1.341 1.753 2.131 2.602 2.947
16 0.258 0.535 0.865 1.337 1.746 2.120 2.583 2.921
17 0.257 0.534 0.863 1.333 1.740 2.110 2.567 2.898
18 0.257 0.534 0.862 1.330 1.734 2.101 2.552 2.878
19 0.257 0.533 0.861 1.328 1.729 2.093 2.539 2.861
20 0.257 0.533 0.860 1.325 1.725 2.086 2.528 2.845
21 0.257 0.532 0.859 1.323 1.721 2.080 2.518 2.831
22 0.256 0.532 0.858 1.321 1.717 2.074 2.508 2.819
23 0.256 0.532 0.858 1.319 1.714 2.069 2.500 2.807
24 0.256 0.531 0.857 1.318 1.711 2.064 2.492 2.797
25 0.256 0.531 0.856 1.316 1.708 2.060 2.485 2.787
26 0.256 0.531 0.856 1.315 1.706 2.056 2.479 2.779
27 0.256 0.531 0.855 1.314 1.703 2.052 2.473 2.771
28 0.256 0.530 0.855 1.313 1.701 2.048 2.467 2.763
29 0.256 0.530 0.854 1.311 1.699 2.045 2.462 2.756
30 0.256 0.530 0.854 1.310 1.697 2.042 2.457 2.750
40 0.255 0.529 0.851 1.303 1.684 2.021 2.423 2.704
60 0.254 0.527 0.848 1.296 1.671 2.000 2.390 2.660

Step 5. The given values in the problem are:
$\bar{x}$ = 1.55 meters
$\mu$ = 1.525 meters
$s$ = 0.15 meters
n = 50

Formula: $t = \dfrac{\sqrt{n-1}\,(\bar{x}-\mu)}{s}$

$t = \dfrac{\sqrt{49}\,(1.55-1.525)}{0.15}$

$t = \dfrac{7(0.025)}{0.15}$

$t = 1.17$

Step 6. The absolute computed value of 1.17 is less than the


absolute tabular value of 1.676. Therefore, the null
hypothesis is accepted. The sample is not significantly
taller than the others.
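The one-sample t computation of Example 4 can be sketched as below. The function name `one_sample_t` is hypothetical. The formula follows Eq. 7.4 of this text, which uses √(n − 1); the tabular value 1.676 is taken from the example, since the standard library has no t-distribution table.

```python
from math import sqrt

def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """Computed t for comparing a sample mean with a population mean,
    following Eq. 7.4 of this text."""
    return sqrt(n - 1) * (sample_mean - pop_mean) / sample_sd

t = one_sample_t(sample_mean=1.55, pop_mean=1.525, sample_sd=0.15, n=50)
t_tabular = 1.676  # one-tailed, df = 49, 0.05 level (from the example)

decision = "reject Ho" if abs(t) >= t_tabular else "accept Ho"
print(round(t, 2), decision)
```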

Example 5. The head of the science department wants to test the effectiveness
of the case method of teaching over the traditional teaching
method. He picked two classes of approximately equal, verified IQ.
He gathered a sample of 24 students on whom he used the case
method and another sample of 20 students on whom he used the
traditional method. After the experiment, an objective test
revealed that the first sample got a mean score of 30 with a
standard deviation of 3, while the second group got a mean
score of 25 with a standard deviation of 2.5. Based on the result
of the administered test, can we say that the case method is
more effective than the traditional method?

Solution: This is a comparison of two sample means with two sample
standard deviations.

Step 1. Ho: The case method is as effective as the traditional
method.
Ha: The case method is more effective than the
traditional method.

Step 2. Set level of significance = 0.05
(Note: if there is no given significance level, use 0.05)

Step 3. Use t-test

Step 4. Compute the degrees of freedom for a one-tailed t-test.

df = n1 + n2 – 2
df = 24 + 20 – 2 = 42
The tabular value of t from the table of Student's t-distribution
is t = 1.683 for α = 0.05. (By interpolation between df = 40 and
df = 60)

Step 5. The given values in the problem are:

$\bar{x}_1$ = 30          $\bar{x}_2$ = 25
s1 = 3          s2 = 2.5
n1 = 24          n2 = 20

Formula: $t = \dfrac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}\;\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}$

$t = \dfrac{30-25}{\sqrt{\dfrac{(24-1)(3)^2+(20-1)(2.5)^2}{24+20-2}}\;\sqrt{\dfrac{1}{24}+\dfrac{1}{20}}}$

$t = \dfrac{5}{\sqrt{7.756}\,\sqrt{0.0917}}$

t = 5.93

Step 6. The computed t value of 5.93 is greater than the tabular
value of 1.683. Reject the null hypothesis. The case method
is more effective than the traditional method.
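The pooled two-sample t computation of Example 5 can be sketched as below, applying Eq. 7.5 directly to the values given in the problem. The function name `pooled_t` is hypothetical.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Computed t for comparing two sample means with pooled variance (Eq. 7.5)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / (sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2))

t = pooled_t(mean1=30, sd1=3, n1=24, mean2=25, sd2=2.5, n2=20)
df = 24 + 20 - 2
print(df, round(t, 2))
```

Since the computed t is far larger than any tabular value near 1.68 for 42 degrees of freedom, the decision to reject the null hypothesis does not depend on small rounding differences.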

Name: ________________________ Course: ________
Classroom Activity No. 7.1 Section: ________

1. A manufacturer of batteries claims that the average strength capacity of
their product will exceed 14 volts. A retailer is willing to buy a very large
shipment of batteries if the claim is true. A random sample of 100
batteries is tested and the sample mean is found to be 13.5 volts.
If the population standard deviation is 0.5 volt, is it likely that the
batteries will be bought?

2. Capitol Steel Manufacturing Co. is producing steel wire with an average
tensile strength of 150 lbs. A random sample of 36 pieces in a laboratory
test shows that the mean tensile strength is 145 lbs. and the standard
deviation is 3 lbs. Is the sample really below the average tensile
strength?

3. A bus company is looking for a better tire to be used for their buses.
They would like to adopt the steel-belted brand (A) unless there is some
evidence that the nylon-belted brand (B) is better. An experiment was
conducted where 25 tires from each brand were used. The tires were run under
the same conditions until they wore out. The following are the results:
Brand A: $\bar{x}_1$ = 38,500 kms, s1 = 2,400 kms
Brand B: $\bar{x}_2$ = 37,000 kms, s2 = 1,200 kms
What would be the conclusion?

Name: ________________________ Course: ________
Homework No. 7.1 Section: ________

1. A random survey on the average total monthly expenditures of 150
students in a known university in Manila shows that the mean
expenditure per student, including tuition and books, was P12,500 with a
standard deviation of P1,000. How likely is it that the students spend an
average of P13,500 per month, as claimed by one parent, at 0.01
significance level?

2. Freshmen in a particular school are given entrance examinations in a
number of fields including Mathematics. Over a period of years, it has
been found that the average score in the Math examination is 85 with
standard deviation of 6. A Math professor examined the scores of his
class of 36 and found out that their average is 86. Can he claim that the
average score has increased?

3. A fisherman decides that he needs a line that can hold a weight of 15 lbs.,
the size of fish he wants. He randomly tests 16 pieces of brand P line and
finds a sample mean of 16.50 lbs. If the standard deviation of brand P
is 1.2 lbs., what can be concluded about brand P?

CHAPTER 8
SIMPLE REGRESSION &
CORRELATION

TOPIC LESSON
1. Bivariate Data
2. Scatter Diagram
3. Pearson Coefficient
4. Dependent and Independent Variables
5. Regression Line

OBJECTIVES
For the students to:
1. Illustrate the nature of bivariate data.
2. Construct a scatter diagram.
3. Calculate the Pearson coefficient.
4. Solve problems involving correlations.
5. Identify dependent and independent variables.
6. Plot the regression line in scatter diagram.
7. Compute the slope of regression line.
8. Determine values of dependent variables.
9. Evaluate problems involving regression analysis.

Simple Regression Analysis:

Regression analysis is used mainly for the purpose of
prediction. Problems of forecasting and estimation are the primary concerns
of regression analysis. When one variable is dependent on another variable,
then given a series of values of the independent variable, the corresponding
values of the dependent variable can be estimated.

For example, given the heights of certain people, their weights can be
taken correspondingly. We can estimate or predict the possible height of a
person whose weight is known, say, 105 lbs. In this case, the height
corresponding to 105 lbs. is not known in the original data.

Problems involving unknown values of the dependent variable can be
solved easily provided that the values of the independent variable are given
or known. The first type of problem is illustrated below based on the
following data.

Pairs of values of X and Y are shown. The independent variable is X
(weight, in lbs.) and the dependent variable is Y (height, in inches), as
shown in the scatter diagram below.

X    105    110    115    120    125    130    135
Y    62”    62.5”  63.5”  64”    64.5”  65”    65”

[Figure: Scatter Diagram. Vertical axis (Y) from 60 to 70; horizontal axis (X) from 105 to 155.]

We are assuming that the heights are dependent on the weights; hence, the
values of Y here depend on the values of X. Let us estimate the value of Y
when X is 150 lbs.

Note: While the values of X and Y are both increasing, it doesn’t mean that
their increments vary uniformly. If the variation were uniform, there
would be no need to resort to computing an estimated value of Y.

There are two ways of solving the problem:

1. The first method is the graphical approach. This method makes use of the
scatter diagram, which consists of plotting the points corresponding to
the paired values of X and Y on an X-Y axes system.
2. The second method uses the regression formula. This gives the computed
value of Y when X is 150 lbs.

Draw the trend line after plotting the points
corresponding to the 7 pairs of X and Y. This line represents the
many points that were plotted, so the line should approximate
the general direction of all the points in the scatter diagram.
Also, the line should approximately balance the vertical distances of
the scattered points below and above the trend line. To
get the direction of the line, all these conditions should be
fulfilled so that the value of Y can be fairly established. The trend
line is shown on the scatter diagram.

Note that the trend line fulfils the conditions mentioned earlier:
1. It approximates the general direction of the points.
2. It passes through the points.
3. The sum of the vertical distances (from the points to
the trend line) of the points above the line is
approximately equal to the sum of the distances of the
vertical points below the line. One can check this by
using a ruler.

[Figure: Scatter Diagram with Trend Line. Lines labeled N and P are marked on the diagram; vertical axis (Y) from 60 to 70; horizontal axis (X) from 105 to 155.]

Lines M and N do not fulfil the three conditions and are obviously
incorrect trend lines. Also, the trend line does not need to pass
through the first or last points. If it does, it is just a matter of
coincidence. Neither is there a need for an equal number of points above
and below the trend line.

The value of Y when X is 150 is read from the trend line at the
intersection of the trend line and the vertical line X = 150. (Note: The
estimated value of Y will vary slightly depending on the accuracy of the
drawn trend line.) In our example, the estimated value using the graphical
method is 67.10”.

The graphical method may give a result that differs slightly from that of
the second method. If the two methods give the same answer, it is
a matter of coincidence. The second method employs formulas for what is
called the Least Square Regression Line, or LSRL for short.

The trend line is a straight line. We know from algebra that a straight
line has an equation which follows the form:

𝑦 = 𝑚𝑥 + 𝑏 Eq. 8.1

Where: m = slope of the line

b = y-intercept.

The LSRL method reduces to determining the equation of
the trend line, which in turn is found by finding the values of m and b in
the equation. In algebra, we need exact points on the line to derive the
desired equation, but here there are no exact points that can be used as a
basis for the equation of the trend line, which, as shown in the graph, just
floats in between the plotted points.

Statistics provides the formulas for finding the values of m and b, but
let us first explain briefly why the LSRL is used. The LSRL is a regression
line of “Least Squares”, which means that the most precise trend line that
can be drawn is the one for which the sum of the squares of the vertical
distances of the points from the line is the least, or minimum. All lines
other than the LSRL will yield a higher sum.

This is similar to saying that the sum of the vertical distances of
the points above the line should be equal to the sum of the vertical distances
of the points below the line. If they are not equal, then the sum of the
squares of the vertical distances below and above the line is not minimum.

Let us illustrate the use of these formulas by using the same example
we used in explaining the graphical approach.

EXCEL Computation

No.  X  Y  XY  X²
1 105 62.00 6,510.00 11,025.00
2 110 62.50 6,875.00 12,100.00
3 115 63.50 7,302.50 13,225.00
4 120 64.00 7,680.00 14,400.00
5 125 64.50 8,062.50 15,625.00
6 130 65.00 8,450.00 16,900.00
7 135 65.00 8,775.00 18,225.00
Σ 840 446.50 53,655.00 101,500.00
b 50.93
m 0.11

The Formulas:

$b = \dfrac{(\sum Y)(\sum X^2)-(\sum X)(\sum XY)}{N(\sum X^2)-(\sum X)^2}$          Eq. 8.2

$m = \dfrac{N(\sum XY)-(\sum X)(\sum Y)}{N(\sum X^2)-(\sum X)^2}$          Eq. 8.3

Analytical Computations:

Substituting these values in the two equations, we have:

$b = \dfrac{(\sum Y)(\sum X^2)-(\sum X)(\sum XY)}{N(\sum X^2)-(\sum X)^2}$

$b = \dfrac{(446.5)(101{,}500)-(840)(53{,}655)}{7(101{,}500)-(840)^2}$

$b = 50.93$

$m = \dfrac{N(\sum XY)-(\sum X)(\sum Y)}{N(\sum X^2)-(\sum X)^2}$

$m = \dfrac{7(53{,}655)-(840)(446.5)}{7(101{,}500)-(840)^2}$

$m = 0.107$

The equation of the trend line is:

𝑌 = 0.107𝑋 + 50.93

When X = 150:

Y = 0.107X + 50.93

Y = 0.107 × 150 + 50.93

Y = 66.98 inches

This result is not very far from the result of 67.10 inches obtained
by the graphical method.
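The same computation can be sketched in code. This is a minimal illustration of Eq. 8.2 and Eq. 8.3 applied to the weight and height data, not the EXCEL procedure of the text; the function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Slope m and intercept b of the least square regression line
    (Eq. 8.3 and Eq. 8.2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x**2
    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    return m, b

weights = [105, 110, 115, 120, 125, 130, 135]        # X, in lbs.
heights = [62, 62.5, 63.5, 64, 64.5, 65, 65]         # Y, in inches

m, b = least_squares_line(weights, heights)
estimate = m * 150 + b  # estimated height when X = 150 lbs.
print(round(m, 3), round(b, 2), round(estimate, 2))
```

Because the code keeps the unrounded slope, its estimate at X = 150 differs slightly from the hand computation, which rounds m to 0.107 before multiplying.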

The LSRL method is very useful in providing a fairly accurate
estimate when the values of the dependent and independent variables are given.
The method presumes the dependency of one variable on the other variable.
It also presumes that the trend line is approximately straight.

Another type of problem is the time series. Here, the values
of one variable corresponding to several years are known. The LSRL trend
line equation is still employed, with a little modification.

Suppose these are the sales of a certain firm for seven years.
Assuming that the trend will continue in the near future, what would be the
forecast for 2011 and 2012?

Year Sales (in P Million)

2004 P 450
2005 470
2006 495
2007 510
2008 540
2009 580
2010 630

$b = \dfrac{(\sum Y)(\sum X^2)-(\sum X)(\sum XY)}{N(\sum X^2)-(\sum X)^2}$

$b = \dfrac{(3{,}675)(28{,}196{,}371)-(14{,}049)(7{,}376{,}530)}{7(28{,}196{,}371)-(14{,}049)^2} = -57{,}176.25$

$m = \dfrac{N(\sum XY)-(\sum X)(\sum Y)}{N(\sum X^2)-(\sum X)^2}$

$m = \dfrac{7(7{,}376{,}530)-(14{,}049)(3{,}675)}{7(28{,}196{,}371)-(14{,}049)^2} = 28.75$

To simplify the computations, let us again use the EXCEL.

No. of Yrs  X  Y  XY  X²
1 2004 450 901800 4016016
2 2005 470 942350 4020025
3 2006 495 992970 4024036
4 2007 510 1023570 4028049
5 2008 540 1084320 4032064
6 2009 580 1165220 4036081
7 2010 630 1266300 4040100
Σ 14049 3675 7376530 28196371
b -57176.25
m 28.75

Therefore, the equation of the trend line is:

Y = 28.75X – 57,176.25

In the year 2011 or year 8, the sales projection will be:

Y = 28.75(2011) – 57,176.25

Y = 640 Million

In the year 2012 or year 9, the sales projection will be:

Y = 28.75(2012) – 57,176.25

Y = 668.75 Million
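The forecast can be reproduced with a minimal Python sketch built directly on Eq. 8.2 and Eq. 8.3. The helper name `fit_line` is hypothetical. The second fit, which codes the years as 1 to 7, is an assumption added for illustration (the text only hints at it with "year 8" and "year 9"); it shows that the forecasts do not depend on how the years are labeled.

```python
years = [2004, 2005, 2006, 2007, 2008, 2009, 2010]
sales = [450, 470, 495, 510, 540, 580, 630]  # in P million


def fit_line(xs, ys):
    """Return (m, b) of the least squares line Y = mX + b (Eq. 8.2, 8.3)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    return m, b


# Fit with calendar years as X, as in the worked table.
m, b = fit_line(years, sales)
forecast_2011 = m * 2011 + b
forecast_2012 = m * 2012 + b

# Illustrative alternative: code the years as 1..7, so 2011 is year 8.
codes = list(range(1, len(years) + 1))
m_coded, b_coded = fit_line(codes, sales)
forecast_year_8 = m_coded * 8 + b_coded
forecast_year_9 = m_coded * 9 + b_coded

print(m, b)                              # 28.75 -57176.25
print(forecast_2011, forecast_2012)      # 640.0 668.75
print(m_coded, b_coded)                  # 28.75 410.0
print(forecast_year_8, forecast_year_9)  # 640.0 668.75
```

Coding the years leaves the slope unchanged and only shifts the intercept, which keeps the sums small enough for hand computation.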

Simple Correlation Analysis:

Correlation and regression analysis are closely related in
terms of the variables they analyze. Regression analysis deals with projecting
or estimating one variable from one or two other variables.
Correlation analysis deals with the relationship of one variable to another
variable. The strength of the relationship between two variables is
measured by the coefficient of correlation (r).

There are three degrees of correlation or relationship between two
variables:

1. Perfect correlation (positive and negative)
2. Some degree of correlation (positive and negative)
3. No correlation

Perfect correlation can be found in physics laboratories. Experiments
may be performed in connection with existing theories to find out the
changes and fluctuations of variables such as temperature,
air pressure, volume, and the like. Boyle's law can be verified in the
laboratory by experiment: at constant temperature, the volume of a gas
falls as the pressure is increased.

Outside of a physics laboratory, the correlation between any two
variables, whether related or not, can be measured by the coefficient of
correlation. There will only be some degree of correlation or none at all.

Examples

1. Weight versus height
2. Grades of a student against his IQ
3. Grades of a student against study time
4. GNP against total investment
5. Work output against number of years of experience

In all of these, as the independent variable rises, the dependent
variable may or may not rise, and when it does, not in a straight line. This is
because the independent variable is not the only determinant of the dependent
variable, which is also influenced by other variables, if they exist. This type
of correlation holds for the relationship between two or more variables.

When the points in the scatter diagram show no sign of correlation at
all, there is no correlation between the two variables. This is the case when
two or more dice are rolled simultaneously. There will be no definite pattern
when the points are plotted in the scatter diagram.
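The dice case can be illustrated with a small simulation. The sketch below is a hedged illustration, not part of the text: it rolls two simulated dice many times and computes the correlation coefficient r from the column sums, using the formula given later in this section as Eq. 8.4. The seed, the number of rolls, and the function name `correlation` are arbitrary choices.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r computed from the raw column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


rng = random.Random(2024)  # fixed seed so the run is repeatable
rolls = 10_000
die_1 = [rng.randint(1, 6) for _ in range(rolls)]
die_2 = [rng.randint(1, 6) for _ in range(rolls)]

r = correlation(die_1, die_2)
print(f"r between the two dice = {r:+.3f}")  # expected to be close to zero
```

Because the two dice do not influence each other, r should land near zero. It is rarely exactly zero in a finite sample, and it drifts closer to zero as the number of rolls grows.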

To summarize the different degrees of correlation between
two or more variables: Positive Correlation relates two variables whose
values increase together, while Negative Correlation describes a situation
where, as one variable increases, the other variable decreases.

The concept of correlation is expressed in terms of a computed value called
the Correlation Coefficient, represented by the symbol (r). This value ranges
from -1 to +1. The value -1 signifies perfect negative correlation while +1
indicates perfect positive correlation. The in-between values, except zero,
specify some degree of correlation, whether positive or negative. A zero
correlation coefficient indicates no correlation at all.

The Formula:

$$r = \frac{N(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[N(\sum X^2) - (\sum X)^2][N(\sum Y^2) - (\sum Y)^2]}}$$  Eq. 8.4

The terms in this formula are the same terms used in computing the
values of the parameters m and b in regression analysis. To solve for r, we
have to add one more column for Y².
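The shared terms can be made explicit. In the sketch below, the shorthand S_XY, S_XX, and S_YY is introduced purely for illustration and is not notation from the text. It rewrites Eq. 8.3 and Eq. 8.4 side by side:

```latex
\begin{aligned}
S_{XY} &= N\sum XY - \Big(\sum X\Big)\Big(\sum Y\Big), \\
S_{XX} &= N\sum X^2 - \Big(\sum X\Big)^2, \qquad
S_{YY}  = N\sum Y^2 - \Big(\sum Y\Big)^2, \\
m &= \frac{S_{XY}}{S_{XX}}, \qquad
r  = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}} .
\end{aligned}
```

Since S_XX and S_YY are positive whenever the X values are not all equal and the Y values are not all equal, r always carries the same sign as the slope m of the regression line.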

Example: Given the heights of twelve fathers and the corresponding
heights of their eldest sons, we know that the two variables, height
of father and height of son, are related. The height of the father,
although not the only determinant, is a factor that strongly
influences the height of the eldest son. Although there are
exceptions, we can say that, generally, the taller the father is, the
taller the eldest son will be. From the following data, determine the
correlation coefficient and interpret the result.

Given data: X = Height of fathers in inches
            Y = Height of eldest sons in inches
X Y
A 65 66
B 66 68
C 67 66
D 69 70
E 64 67
F 65 65
G 72 70
H 65 68
I 69 71
J 66 67
K 67 68
L 65 68

EXCEL Computation:

X Y XY 𝑿𝟐 𝒀𝟐
A 65 66 4290 4225 4356
B 66 68 4488 4356 4624
C 67 66 4422 4489 4356
D 69 70 4830 4761 4900
E 64 67 4288 4096 4489
F 65 65 4225 4225 4225
G 72 70 5040 5184 4900
H 65 68 4420 4225 4624
I 69 71 4899 4761 5041
J 66 67 4422 4356 4489
K 67 68 4556 4489 4624
L 65 68 4420 4225 4624
Σ 800 814 54300 53392 55252
r 0.728705

A cursory inspection of the data shows that the eldest sons are
generally taller than their fathers, except for C and G (son shorter) and
F (same height). The relationship between the heights of father and son
shows some degree of correlation.

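Summarizing the results:

$$r = \frac{N(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[N(\sum X^2) - (\sum X)^2][N(\sum Y^2) - (\sum Y)^2]}}$$

$$r = \frac{12(54{,}300) - (800)(814)}{\sqrt{[12(53{,}392) - (800)^2][12(55{,}252) - (814)^2]}}$$

$$r = \frac{651{,}600 - 651{,}200}{\sqrt{(640{,}704 - 640{,}000)(663{,}024 - 662{,}596)}}$$

$$r = \frac{400}{\sqrt{301{,}312}} = +0.7287 \approx +0.73$$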
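As a cross-check on the EXCEL table and the substitution above, the following Python sketch rebuilds the five column totals from the raw heights and applies Eq. 8.4. The list names are illustrative. The data are the twelve father and son pairs A to L from the example.

```python
from math import sqrt

fathers = [65, 66, 67, 69, 64, 65, 72, 65, 69, 66, 67, 65]  # X, inches
sons = [66, 68, 66, 70, 67, 65, 70, 68, 71, 67, 68, 68]     # Y, inches

n = len(fathers)
sum_x, sum_y = sum(fathers), sum(sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))
sum_x2 = sum(x * x for x in fathers)
sum_y2 = sum(y * y for y in sons)  # the extra column needed for r

# Eq. 8.4: same numerator as the slope m, new denominator.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 800 814 54300 53392 55252
print(f"r = {r:+.4f}")                       # r = +0.7287
```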

Interpretation: The result r = +0.73 could be initially interpreted
as follows: there is some degree of correlation
between the heights of the fathers and the eldest
sons.

- The correlation is positive since the value falls
between zero and +1.
- A value of 0.73 is above 0.5, so it indicates a strong positive
correlation between the heights of the fathers and
eldest sons.

Values of r from zero to ±0.5 are considered weak
correlation, either positive or negative.

Pointers in interpreting computed values of r:

1. The coefficient of correlation r does not necessarily imply a cause-and-effect
relationship, although r may suggest dependence of one variable on
another variable.
2. When the computed r is high, it does not necessarily mean that one
variable strongly depends on the other.
3. The dependence of one variable on another variable is reflected in the
computed r. If r is high, it is consistent with the dependence. If the computed r
is low, one has to find reasons or try to explain this low value.
4. The correlation coefficient tells us whether the changes in the two
variables are closely interrelated or not.

Name: ________________________ Course: ________
Classroom Activity No. 8.1 Section: ________
1. Given the values of:
X: 7, 12, 14, 16, 23, 27, 28, 34, 40
Y: 8, 11, 15, 18, 22, 28, 30, 36, 42

a. Draw the scatter diagram.
b. Using a ruler, draw a straight line (trend line) which you think best
estimates the general direction of the points in the scatter diagram.
c. Using the line drawn in (b), estimate Y when X is 11; when X is
12.
d. Find the equation of the best-fitting line or the least squares
regression line (LSRL).
e. Using the equation of the LSRL, determine the values of Y when X is
11 and 12, and compare them with the results of (c).

2. The following data show the grades of ten students in Algebra and
Statistics.

Alg. (X) 83 88 90 69 82 79 95 88 83 77

Stat. (Y) 80 85 86 78 86 87 94 84 88 82

a. Draw the scatter diagram.
b. Find the equation of the LSRL.
c. Draw the LSRL on the scatter diagram.
d. What is a student's expected grade in Statistics if his grade in
Algebra is 78? 82? 89? 95? 100?

Name: ________________________ Course: ________
Assignment No. 8.1 Section: ________

1. Suppose these are the sales of a certain firm for seven years. Assuming
that the trend will continue in the near future, what would be the forecast for
2016 and 2017?

Year Sales (in P Million)

2009 P 1,450
2010 1,670
2011 1,950
2012 2,510
2013 2,950
2014 3,580
2015 3,630

2. Suppose these are the sales and gross income of a certain firm for seven
years. Test the correlation of the two variables.

Year Sales (in P Million) Gross Income

2009 P 1,450 P 370
2010 1,670 450
2011 1,950 520
2012 2,510 710
2013 2,950 805
2014 3,580 965
2015 3,630 1,340
