and
PROBABILITY
Basic Approach
By:
TABLE OF CONTENTS
Introduction
0.1 Statistics
0.2 Mathematical Statistics
0.3 Two Fields of Statistics
0.4 Uses of Statistics
0.5 Terminologies
2.8 Graphical Methods
2.9 Kinds of Graphs
2.10 Cumulative Frequency Distribution
2.11 Relative Frequency Distribution
4.6 Contingency Table
4.7 Discrete Probability
4.8 Probability Distribution of Discrete Random Variable
4.9 Value of Discrete Variable
Chapter VII - Fundamentals of Hypothesis Testing
7.1 Hypothesis Testing Methodology
7.2 The Null and Alternative Hypotheses
7.3 Type I and Type II Error
7.4 Level of Significance
7.5 One-Tailed and Two-Tailed Test
7.6 Steps in Hypothesis Testing
INTRODUCTION
Statistics
categories by way of organized tabulation, graphical diagrams or charts in
order to arrive at logical conclusions or decisions.
Mathematical statistics
Inferential Statistics stresses a logical order of evaluation that leads to
a more specific conclusion from analytical findings. It is a process that
uses mathematical analysis to reach a reasonable conclusion from the given
facts or data. We use it to infer from samples what the population might be
like, or to make probability-based decisions between two sets of data. The
simplest inferential test compares the performance of two groups, or of a
sample against the population, on a single measure to see whether there is a
difference. The methods are: hypothesis testing, analysis of variance,
enumeration data (the chi-square test), regression, simple correlation, and
time series analysis.
Uses of Statistics
Terminologies
Data are any gathered set of values, items, or elements under consideration
in a study, such as standards, significances, costs, components, things,
articles, or any other valuable matters.
Population refers to a complete set of all possible data under
consideration in the study or research.
Sample refers to a portion of the population that is gathered by way
of any acceptable method of collecting data.
Data Point refers to any element in the sample or population.
Qualitative Data vary widely in nature. They include any information that
is not numerical. These are the results obtained when the information has
been sorted into categories.
Quantitative Data are data that can be quantified and subjected to
statistical computation. These are the results of counting or measuring.
Parameter is a variable for which the range of possible values identifies a
collection of distinct cases in a certain problem. In statistics, the
parameter is a variable whose value is sought by means of evidence from
samples.
Subscript is a number or letter, representing several numbers, placed at the
lower right of a variable.
Primary Data refers to information gathered directly from an original
source.
Secondary Data refers to information taken from published or unpublished
data previously gathered by other individuals or agencies.
Numerical Scale. It is often necessary to group numerical data into
categories. The range of the data is divided into a number of intervals,
where each interval becomes a category in a numerical scale. This type of
numerical scale is implemented by the Numerical Scale Class.
Ordinal Scale refers to the systematic arrangement of data by rank,
degree, capability, strength, and the like.
Interval Scale refers not only to the arrangement of observations in order
but also to other information attached to the study or research.
CHAPTER 1
COLLECTION OF DATA
TOPIC LESSON
1. Methods in Collecting Data.
2. Planning a Research.
3. Survey Questionnaire.
4. Sampling Techniques
OBJECTIVES
For the students to:
and information but also the use of the available data to come up with good
business decisions.
Any kind of business venture, project study, research and the like
should be based on precise and correct data in order to ensure the accuracy
of the research or study. To this effect, there should be an excellent
method of gathering and collecting (sampling) the data that will be used in
the analysis and interpretation of the research or study. Good samples will
produce reliable results, while poor samples will lead to unfavourable
results.
There are two types of data: primary data and secondary data. Examples of
primary data are first-person accounts, autobiographies, diaries and the
like, while examples of secondary data are published books, newspapers,
magazines, biographies, business reports and the like.
Reasons for Collecting Data:
1. To provide necessary input to a study and research.
2. To determine the performance of any existing service,
production, sales process and the like.
3. To assess the quality of any product in accordance with
existing standards.
4. To support in the formulation of alternative measures in
the process of decision making.
5. To satisfy management curiosity towards the direction of
the business.
advantage of this method is that the information is saved or
stored by the government or private entities and made available to
anybody who needs it. Examples of this method are car
registration, enrolments, census, and the like.
4. The observation method. In this method, the investigator observes
the behaviour of a certain phenomenon, person or group, together with
their activities or outcomes. It is usually used when the subjects cannot
talk or write, as with the occurrence of typhoons and other phenomena,
persons with special needs, and the like.
5. The experiment method. It is commonly used by scientists,
chemists, and other people connected to experiments. The objective
of this method is to record the cause and effect of a scientific
research or study that is done in a meticulous and organized
manner.
evaluation of the data, and the interpretation of the data (to be
discussed in the later chapters, the inferential part of the book).
Assess the monetary resources or budget available to pursue
the research. If the budget does not warrant the study of the entire
population, the researcher can use samples of the population.
3. Prepare all the documents pertaining to the study, especially the
questionnaire. Decide on the parameters to be followed in
collecting the population’s or sample’s data.
Types of questions
2. Unstructured or open-ended questions – are questions that can be
answered in different ways. These are the investigative questions
that elicit some relative reasons.
Examples:
a. Do you want the system of government?
( ) yes ( ) no. Why?
b. In your opinion, can we produce a genius student in this
university?
( ) yes ( ) no. Why?
4. Determine the sample size needed using Slovin's formula:
$$n = \frac{N}{1 + Ne^2} \qquad \text{eq. 2.1}$$
Where: n = Sample size
N = Population size
e = Desired margin of error
Solution:
$$n = \frac{N}{1 + Ne^2} = \frac{10{,}000}{1 + 10{,}000(0.03)^2} = 1{,}000 \text{ families}$$
5. Collect the samples using one of the sampling techniques that will
be discussed in the latter part of this chapter. The parameters will be
strictly observed in collecting the data.
6. Evaluate and interpret the data using the methods of inferential
statistics that will be discussed in the latter part of the book. After
the interpretation, a statement of conclusion has to be made.
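The sample-size computation in step 4 can be sketched in a few lines of Python. This is only an illustration: the function name `slovin_sample_size` is hypothetical, and rounding to the nearest whole number is an assumption chosen because it matches the worked answers in this chapter.

```python
def slovin_sample_size(population_size, margin_of_error):
    """Sample size n = N / (1 + N * e^2), rounded to a whole number.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return round(n)


# Population of 10,000 families with a 3% margin of error
families = slovin_sample_size(10_000, 0.03)
# Population of 6,950 students with a 3% margin of error
students = slovin_sample_size(6_950, 0.03)
print(families, students)
```

A smaller margin of error makes the denominator smaller, so the required sample grows quickly as e shrinks.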
Sampling Techniques
A. Random Sampling
Random sampling refers to selecting a sample of size (n) without a
given pattern or system from a population (N) so that each item or member
of the population has an equal chance of being included in the sample. The
number of samples will be in accordance with the required number of samples
as computed with the formula given in step 4 of planning the study.
When we speak of picking things at random, we mean picking things
fairly, without prejudice or any predetermined choice. On any occasion, for
example, the guests may be asked to pick a seat at random. This can be done
by assigning an individual number to each seat. The numbers are then written
on pieces of paper and placed in a box or container where they are mixed
thoroughly. When a participant draws a number from the box, he has
drawn a number at random.
Random sampling can also be done in awarding prizes through the
“raffle” system. The winners can be asked to pick their prizes at
random. A number is assigned to every prize so that whoever picks the
number gets the prize.
One way of getting samples through random sampling is the Lottery
Method. In this method, each member is given an assigned number. All the
numbers are put in a lottery box and rolled or shaken, after which the
samples are picked one by one from the box.
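The lottery method can be imitated on a computer. The sketch below is only an illustration under stated assumptions: the function name `lottery_sample`, the 50 numbered seats, and the fixed seed are all hypothetical choices, and Python's `random.sample` stands in for shaking the box and drawing without replacement.

```python
import random


def lottery_sample(members, n, seed=None):
    """Draw n distinct members so that each has an equal chance.

    The optional seed only makes the draw repeatable for checking.
    """
    rng = random.Random(seed)
    return rng.sample(list(members), n)


# 50 numbered seats in the "box"; draw 5 of them
seats = range(1, 51)
picked = lottery_sample(seats, 5, seed=1)
print(picked)
```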
B. Systematic Sampling
Another way of selecting samples is systematic sampling. Several
variants of this method exist, and together they may be called
systematic sampling methods. They are used when there is prior
understanding of the members of the population. An example of
this method is when samples are chosen by counting off the members
repeatedly and taking, for instance, every fifth one. This process of
selecting with a system is called systematic sampling.
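Taking every k-th member can be sketched as a list slice. The function name `systematic_sample` and the random starting point are assumptions made for illustration; the text above only specifies counting off and taking every k-th member.

```python
import random


def systematic_sample(members, k, start=None):
    """Take every k-th member, beginning at index `start`.

    If no start is given, a random start in [0, k) is used (an assumption).
    """
    if start is None:
        start = random.randrange(k)
    return members[start::k]


# 20 numbered members; take every fifth one, starting with the fifth
population = list(range(1, 21))
every_fifth = systematic_sample(population, 5, start=4)
print(every_fifth)
```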
A concrete example of this is selecting samples in the entire
country. The first stage is to select regional samples. The size of
the regional samples will be determined by the regional
populations. Then, from regional to provincial; provincial to
municipal; and municipal to barangay.
BSA - 540
BSBA - 1,950
BSEnt. - 450
BSE - 520
BSIT - 470
BSECE - 350
BSEE - 410
BSME - 560
BSCE - 570
BSComE - 480
Others - 650
TOTAL - 6,950
a) How many samples are required if the margin of error is ±3%?
b) Find the number of samples per course.
Solution: First determine the required number of samples based on the
total population.
Formula: Slovin's formula
$$n = \frac{N}{1 + Ne^2} = \frac{6{,}950}{1 + 6{,}950(0.03)^2}$$
n = 958 Answer
Number of samples per course:
$$R = \frac{\text{number of students of course} \times \text{total number of samples}}{\text{total population}}$$
Course     Students   Computation               Samples
BSA          540      (540 x 958) / 6,950          74
BSBA       1,950      (1,950 x 958) / 6,950       269
BSEnt        450      (450 x 958) / 6,950          62
BSE          520      (520 x 958) / 6,950          72
BSIT         470      (470 x 958) / 6,950          65
BSECE        350      (350 x 958) / 6,950          48
BSEE         410      (410 x 958) / 6,950          57
BSME         560      (560 x 958) / 6,950          77
BSCE         570      (570 x 958) / 6,950          79
BSComE       480      (480 x 958) / 6,950          66
Others       650      (650 x 958) / 6,950          90
TOTAL      6,950                                  959
Note: The sum of the samples per course is 959 rather than 958 because each
course's share is rounded to a whole number.
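The per-course computation above can be checked with a short sketch. The function name `allocate_samples` is hypothetical, and rounding each share to the nearest whole number is an assumption that reproduces the figures in the solution.

```python
def allocate_samples(course_sizes, total_samples):
    """Share of samples per course: size * total_samples / population."""
    population = sum(course_sizes.values())
    return {
        course: round(size * total_samples / population)
        for course, size in course_sizes.items()
    }


courses = {
    "BSA": 540, "BSBA": 1950, "BSEnt": 450, "BSE": 520, "BSIT": 470,
    "BSECE": 350, "BSEE": 410, "BSME": 560, "BSCE": 570,
    "BSComE": 480, "Others": 650,
}
allocation = allocate_samples(courses, 958)
print(allocation)
print(sum(allocation.values()))
```

Because each share is rounded separately, the shares need not add up to exactly 958.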
C. Non-Random Sampling
In this method, not all members of the population are given equal
chances to be chosen. Some elements of the population are deliberately
left out of the sample for various reasons. Non-random sampling is
used instead of random sampling if the budget is inadequate to pursue
random sampling. Most of the time, this method is used for studies
limited to a particular group, such as men only or women only, the rich
or the poor, and the like.
Name: ________________________ Course: _______
Classroom Activity No. 1.1 Section: _______
4. In a certain school, a survey on the academic performance and socio-
economic status of the students is being conducted in each course. The
following are the number of students per course:
BSA - 140
BSBA - 650
BSEnt. - 350
BSE - 150
BSIT - 170
a) How many samples are required if the margin of error is ±2.5%?
b) Find the number of samples per course.
Name: ________________________ Course: ________
Homework No. 1.1 Section: ________
4. In a certain school, a survey on the academic performance and
socio-economic status of the students is being conducted in each
course. The following are the number of students per course:
BSA - 140
BSBA - 650
BSEnt. - 350
BSE - 150
BSIT - 170
a) How many samples are required if the margin of error is ±2%?
b) Find the number of samples per course.
CHAPTER 2
PRESENTATION OF DATA
TOPIC LESSON
1. Methods of Presenting Data.
2. Statistical Table / Frequency Table
3. Frequency Distribution
4. Frequency Polygon
OBJECTIVES
For the students to:
data and they have to be organized in order to express the important qualities
and attributes. There are three forms to present data:
1. Textual – where data is presented in paragraph form
2. Tabular – where data is presented in rows or columns
3. Graphical – where data is presented in visual form.
Textual Method.
Tabular Presentation.
group data and each group in the table can be compared with each other in a
more comfortable manner.
The process of combining together the same items from the mass of
collected data based on their appearances and features like occupation, sex,
height, weight, income, nationality, etc. is called classification of data.
Example:
2000 1,935 8.52%
2001 2,122 9.66%
2002 2,368 11.59%
2003 2,681 13.22%
2004 3,042 13.47%
2005 3,428 12.69%
Frequency Table
The tabular arrangement or organization of data by categories,
including their frequencies or occurrences, is called a Frequency
Distribution. The number of items or observations belonging to any category
is called the Class Frequency. The grouping of items described by a lower
and an upper limit is the Class Interval. The lower limit is the value of
the lowest item that belongs to a class interval, while the upper limit is
the value of the highest item that belongs to the same class interval.
5. Determine the class frequencies for each class interval using the
tally method or any other acceptable method.
6. Compute for class mark. The class mark is the average of lower
and upper limit.
𝑅 = 185 − 123 = 62
4. Determine the Class Intervals in ascending order. This book
suggests starting from the lowest value in the data.
a. First Class Interval
i. Lower Limit = 123
ii. Upper Limit = Lower Limit + Size -1
Upper Limit = 123 + 9 – 1 = 131
b. Second Class Interval & succeeding C. I.
i. Lower Limit = Lower Limit of Prior + Size
Second L.L = 123 + 9 = 132
Third L.L = 132 + 9 = 141
Fourth L.L = 141 + 9 = 150
Fifth L.L = 150 + 9 = 159
Sixth L.L = 159 + 9 = 168
Seventh L.L = 168 + 9 = 177
ii. Upper Limit = Upper Limit of Prior + Size
Second U.L = 131 + 9 = 140
Third U.L = 140 + 9 = 149
Fourth U.L = 149 + 9 = 158
Fifth U.L = 158 + 9 = 167
Sixth U.L = 167 + 9 = 176
Seventh U.L = 176 + 9 = 185
5. Count the frequency in each class interval.
6. Compute the class marks.
First CM = $\frac{123 + 131}{2} = 127$
Class Interval   Tally             Frequency   Class Mark
123-131          IIII                  4          127
132-140          IIIII I               6          136
141-149          IIIII IIII            9          145
150-158          IIIII IIIII II       12          154
159-167          IIIII                 5          163
168-176          III                   3          172
177-185          III                   3          181
Total                                 42
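The class limits and class marks in the table follow mechanically from the lowest value (123), the class size (9), and the number of classes (7). The sketch below illustrates that rule; the function names are hypothetical.

```python
def class_intervals(lowest, size, count):
    """Build (lower limit, upper limit) pairs.

    Upper limit = lower limit + size - 1; the next lower limit is
    the previous lower limit + size.
    """
    intervals = []
    for k in range(count):
        lower = lowest + k * size
        upper = lower + size - 1
        intervals.append((lower, upper))
    return intervals


def class_marks(intervals):
    """Class mark = average of the lower and upper limits."""
    return [(lower + upper) / 2 for lower, upper in intervals]


intervals = class_intervals(lowest=123, size=9, count=7)
marks = class_marks(intervals)
print(intervals)
print(marks)
```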
Layout of Class Boundaries and Class Limits:
Class Limits:       123-131     132-140     141-149     150-...
Class Boundaries:         131.5       140.5       149.5
Graphical Method:
Graphs are pictures of numerical data. They come in many styles and
are widely used because they give a clear picture of numerical data.
The viewer can instantly recognize the highest or the largest value
among any particular data, such as population, births, registration,
and the like.
Kinds of Graphs:
1. Bar graph – is a graph that consists of several bars either vertical or
horizontal bars. The magnitude of the bars is represented by their
scaled lengths.
2. Pie Chart – is a graph in the form of a pie or circle. Pie chart is
used to represent the shares of all categories in the entire
observation or data.
3. Line Graphs – is a graph that shows the magnitudes or frequencies
of an item or value in any observation.
4. Compound Bar Chart – is an ordinary bar graph in which two or more
bars are drawn for each item. This chart is used when a comparison
is needed.
There are many graphs that can be adapted to any presentation
relevant to any subject of study. Graphs are instruments that can be helpful
in the interpretation of data and other related matters.
[Figure: BAR GRAPH of frequency (0 to 14) against class mark (127 to 181)]
[Figure: PIE CHART of the shares of class marks 127 to 181]
[Figure: LINE GRAPH of frequency against class mark (127 to 181)]
[Figure: COMPOUND BAR of frequency and percent against class mark (127 to 181)]
[Figure: Frequency Polygon of frequency against class mark]
Cumulative Frequency Distribution: The cumulative frequency
distribution is a tabular arrangement of the cumulated frequencies of the
class intervals. There are two types of cumulative frequency
distribution: the “less than” and the “more than” cumulative
distribution.
In the frequency distribution: There are 4 items less than
131.50; 10 items less than 140.50; 19 items less than 149.50; 31 items
less than 158.50; 36 items less than 167.50; 39 items less than 176.50;
and 42 items less than 185.50.
[Figure: "Less than" cumulative frequency curve (0 to 45) plotted against the upper class boundaries 131.5 to 185.5]
Class Interval    f    >cf
123-131           4    42
132-140           6    38
141-149           9    32
150-158          12    23
159-167           5    11
168-176           3     6
177-185           3     3
Total            42
[Figure: "More than" cumulative frequency curve (0 to 45) plotted against the lower class boundaries 122.5 to 176.5]
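Both cumulative distributions can be obtained by running totals over the class frequencies, one from the top of the table and one from the bottom. This is a minimal sketch using the frequencies of the example; the variable names are illustrative.

```python
from itertools import accumulate

frequencies = [4, 6, 9, 12, 5, 3, 3]

# "Less than" cumulative frequency: running total from the first class
less_than_cf = list(accumulate(frequencies))

# "More than" cumulative frequency: running total from the last class,
# then flipped back into table order
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

print(less_than_cf)
print(more_than_cf)
```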
Relative Frequency Distribution: The relative frequency distribution
is the arrangement of data in tabular form indicating the percentage of the
class frequencies over the total frequency. It is sometimes called the
percentage table.
Class Interval f rf(%)
123-131 4 9.52
132-140 6 14.29
141-149 9 21.43
150-158 12 28.57
159-167 5 11.90
168-176 3 7.14
177-185 3 7.14
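The percentages in the table are each class frequency divided by the total frequency, times 100. A minimal sketch, with rounding to two decimal places assumed to match the table:

```python
frequencies = [4, 6, 9, 12, 5, 3, 3]
total = sum(frequencies)

# Relative frequency in percent, rounded to two decimals
relative_frequencies = [round(100 * f / total, 2) for f in frequencies]
print(relative_frequencies)
```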
[Figure: Compound bar chart of frequency and percent against class mark (127 to 181)]
Name: ________________________ Course: ________
Classroom Activity No.2.1 Section: ________
1. Prepare a frequency distribution table for the following random data.
148 253 268 372 387 493 408 513 528 633 648
753 768 873 888 491 406 511 526 631 646 751
766 472 487 592 507 612 627 732 747 453 468
573 588 693 608 713 728 534 549 654 669 576
681 696 701 517 622 637 547 552 564 576 588
690 503 519 621 535 644 556 666 374 584 794
804 712 820 835 641 556 761 876 683 699 501
517 422 235 647 758 869 374 485 293 505 610
625 730 745 850 865 973 187 292 207 312 327
432 447 552 567 672 687 792 707 812 827 932
Name: ________________________ Course: ________
Homework No.2.1 Section: ________
1. Prepare a frequency distribution table for the following random data.
57 42 25 67 78 89 34 45 23 55 60
65 70 75 80 85 93 17 22 27 32 37
42 47 52 57 62 67 72 77 82 87 92
18 23 28 32 37 43 48 53 58 63 68
73 78 83 88 41 46 51 56 61 66 71
76 42 47 52 57 62 67 72 77 43 48
53 58 63 68 73 78 54 59 64 69 56
61 66 71 57 62 67 57 52 54 56 58
60 53 59 61 55 64 56 66 34 54 74
84 72 80 85 61 56 71 86 63 69 51
2. Construct a frequency distribution table for the following data
representing the daily savings of employees in a certain company.
141 146 151 156 161 166 171 176 142 147 152 157
132 137 143 148 153 158 163 168 173 178 183 188
152 154 156 158 160 153 159 161 155 164 156 166
134 154 174 184 177 162 167 172 177 182 187 192
157 142 125 167 178 189 134 145 123 155 160 165
170 175 180 185 196 117 122 127 132 137 142 147
162 167 172 177 143 148 153 158 163 168 173 178
154 159 164 169 156 161 166 171 157 162 167 157
152 157 162 167 172 177 182 187 192 118 123 148
CHAPTER 3
TOPIC LESSON
1. The Three Measures of Central Tendency
2. Ungrouped Data
3. Grouped Data
4. Comparison of Mean, Median, and Mode
5. What Measure to be used.
OBJECTIVES
For the students to:
from the entire population of data are treated and evaluated, it is called
statistics.
For instance, a store manager was asked by the store owner about the
daily sales of the store for a period of six months, or 180 days. Instead
of enumerating the sales for all 180 days, the manager can simply give a
single value that represents the entire period, such as the average daily
sales, the highest one-day sale, or any particular value that can describe
the 180-day sales.
In line with production, the values may differ from day to day but a
single figure will suffice to define the volume of production in any given
span or period. And so, instead of going into the details of a given
distribution, perhaps it would be easier to find out that single figure that can
represent the entire set of data.
In any aspect, there is one value or single figure that could be used to
describe a set of data. The most commonly used measures are the mean,
median, and mode.
The mean is defined as the average of all the items. It is the
“central value” of any set of observations and is computed as the sum of
all the items divided by the number of items.
$$\bar{x} = \frac{\sum x}{N} \quad \text{for ungrouped data (eq. 3.1)}, \qquad \bar{x} = \frac{\sum fx}{N} \quad \text{for grouped data (eq. 3.2)}$$
The median is defined as the value at the middle of any distribution or set
of data. It could be one of the data or an item in the distribution or just
simply a value that represents the middle figure after arranging the data
accordingly. The item or score can be found by:
$$m = \frac{N+1}{2} \quad \text{for ungrouped data (eq. 3.3)}$$
$$m = l_m + \left(\frac{\frac{N}{2} - \sum f_{m-1}}{f_m}\right) i \quad \text{for grouped data (eq. 3.4)}$$
A mode is defined as the value that has the highest frequency, that is, the
value that appears most frequently in the set of data.
$$m_o = l_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i \quad \text{for grouped data (eq. 3.5)}$$
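Illustrative example: Consider the following set of observations.
35, 37, 40, 40, 48
Given: N = 5
Mean: $\bar{x} = \frac{\sum x}{N} = \frac{35 + 37 + 40 + 40 + 48}{5} = 40$
The Median: $m = \frac{N+1}{2} = \frac{5+1}{2} = 3$, so the median is the 3rd item, 40
Mode: $m_o = 40$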
Mean: $\bar{x} = \frac{35 + 37 + 40 + 40 + 58}{5} = 42$
Median: $m = 40$
Mode: $m_o = 40$
We can see that the mode and median are still 40 but the mean of the
new set of terms is now 42 and no longer 40. If we change the third term to
35 instead of 40, the mode is no longer 40 but 35 and the median is no
longer 40 but 37. But, the value of the mean would also change. Every
change applied to any terms would bring change in the mean as shown.
Mean: $\bar{x} = \frac{35 + 35 + 37 + 40 + 48}{5} = 39$
Median: $m = 37$
Mode: $m_o = 35$
We must remember that the mode and median respond only to some
changes in the terms, but the mean responds to every change in the terms.
This is the reason why the mean is often described as “sensitive” and as
reflecting the entire distribution.
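The three sets of observations discussed above can be compared side by side with Python's standard `statistics` module. This is only an illustrative check; the helper name `summarize` is hypothetical.

```python
import statistics


def summarize(observations):
    """Return (mean, median, mode) of a list of observations."""
    return (
        statistics.mean(observations),
        statistics.median(observations),
        statistics.mode(observations),
    )


original = summarize([35, 37, 40, 40, 48])
last_term_raised = summarize([35, 37, 40, 40, 58])   # 48 changed to 58
third_term_lowered = summarize([35, 35, 37, 40, 48])  # one 40 changed to 35

print(original)
print(last_term_raised)
print(third_term_lowered)
```

Only the mean moves in the second set, while all three measures move in the third set.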
That is the very reason why the mean is the most important among the
measures of central tendency. However, this sensitivity can lead to some
disadvantages, especially when the distribution contains some extreme
values. Extreme values refer to the lowest and highest values in the
distribution. If there are many extreme values, the mean is not the measure
to represent the distribution.
Computation Using Excel
The three measures can be computed using the Excel program. From the
table, we can find the total by putting a formula in the cell reserved for
the sum of the items. The median and mode can be found by inspection, and
the mean can be found from the sum.
A B
1 1 35
2 2 37
3 3 40
4 4 40
5 5 48
6 Total 200
7 𝑥̅ 40
8 m 40
9 mo 40
Assign numbers to each of the items and encode the items in random manner in
one column as shown above. After the encoding, highlight the entire data and click the
“Data” icon to show the options. Then click the order for ascending or descending manner;
or Smallest to Largest or Largest to Smallest. The data will automatically arrange in any of
the two orders.
The sum or total can be found by clicking the “∑” AutoSum icon. The “Mean” can
be computed by dividing the “Total Cell” by the assigned “Cell” of the last number of the
items. The “Median” can be found using the “Formula”, while the “Mode” can be found by
inspection of the most frequent item.
Computation of the Mean Using the Excel Program: (From the Figure shown)
Analysis of the mean:
40 42 -2
48 42 6
Total -10
Based on the theorem, if the total of the differences between each item and
a certain number is zero, that number is the mean. The illustration above
demonstrates this theorem on the mean.
Using other values, we have: Try 42
Based on the theorem, the total of the squared differences is smallest
when the differences are taken from the mean. The illustration above again
demonstrates this theorem on the mean.
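Both theorems on the mean can be checked numerically for the observations 35, 37, 40, 40, 48, comparing the mean (40) with the trial value 42. The function names below are hypothetical and the sketch is only a numeric illustration, not a proof.

```python
def sum_of_differences(observations, value):
    """Total of (x - value) over all observations."""
    return sum(x - value for x in observations)


def sum_of_squared_differences(observations, value):
    """Total of (x - value)^2 over all observations."""
    return sum((x - value) ** 2 for x in observations)


data = [35, 37, 40, 40, 48]

print(sum_of_differences(data, 40))          # differences from the mean
print(sum_of_differences(data, 42))          # differences from the trial value
print(sum_of_squared_differences(data, 40))
print(sum_of_squared_differences(data, 42))
```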
Ungrouped Data:
Formula: $\bar{x} = \frac{\sum x}{n}$
Solution: $\bar{x} = \frac{400 + 520 + 550 + 600 + 650}{5}$
$\bar{x}$ = P544
m = 550
mo = None
Arithmetic mean:
Example: Consider the following data, 1,000 shirts were sold at P250;
800 shirts at P300; 500 shirts at P320; 400 shirts at P400; and
300 shirts at P550. What is the weighted arithmetic mean price
of shirts?
Solution:
$$WX = \frac{(1{,}000 \times 250) + (800 \times 300) + (500 \times 320) + (400 \times 400) + (300 \times 550)}{3{,}000} = \frac{975{,}000}{3{,}000}$$
WX = P325
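The weighted mean of the shirt prices can be sketched as the sum of price times quantity divided by the total quantity. The function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


prices = [250, 300, 320, 400, 550]        # price per shirt in pesos
quantities = [1000, 800, 500, 400, 300]   # shirts sold at each price

average_price = weighted_mean(prices, quantities)
print(average_price)
```

The plain average of the five prices would be P364, which overstates the typical price because most shirts were sold at the lower prices.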
The Mean:
Formula: $\bar{x} = \frac{\sum fx}{N}$
Where: $\bar{x}$ – Mean
f – Frequency of class interval
x – Class mark of class interval
N – Total number of items
The Median:
Formula: $m = L_m + \left(\frac{\frac{N}{2} - \sum f_{(m-1)}}{f_m}\right) i$
Where: m – Median
$L_m$ – Lower class boundary of the median class
$\sum f_{(m-1)}$ – Sum of all frequencies before the median class
$f_m$ – Frequency of the median class
N – Total number of items
i – Size of class interval
The Mode:
Formula: $m_o = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$
Where: $m_o$ – Mode
$L_{mo}$ – Lower class boundary of the modal class
$\Delta_1$ – Difference between the highest frequency and the frequency of the class above (before) it
$\Delta_2$ – Difference between the highest frequency and the frequency of the class below (after) it
i – Size of class interval
Example: Consider the raw data below. Determine the mean, the
median, and the mode.
After arranging the data from lowest to highest, the middle values in
the set are 152 and 152. Hence, the median is $\frac{152 + 152}{2} = 152$. The
median implies that the first half of the ordered set of data has values
less than or equal to 152, while the other half has values greater than or
equal to 152.
The median class is the class interval where the $\frac{N+1}{2}$th item is
found. In the example, the $\frac{N+1}{2}$th item lies between the 21st and
22nd items. From the given data, the 21st item is 152 and the 22nd item is
also 152, and both fall within the cumulative frequency of 31. Therefore,
the median class is [150-158]. The lower class boundary of the median class
is 149.5, the frequency of the median class is 12, and the cumulative
frequency before the median class is 19.
$$m = L_m + \left(\frac{\frac{N}{2} - \sum f_{(m-1)}}{f_m}\right) i$$
$$m = 149.5 + \left(\frac{\frac{42}{2} - 19}{12}\right) 9$$
$$m = 149.5 + \left(\frac{21 - 19}{12}\right) 9$$
$$m = 149.5 + \left(\frac{2}{12}\right) 9$$
m = 149.5 + 1.5 = 151
Based on its description, the mode can be found in the class interval
with the highest frequency. The class interval with the highest frequency is
known as the modal class. An observation with only one mode is known as
uni-modal, while an observation with two or more modes is called
multi-modal. Two modes make a set bimodal, three modes trimodal, and so on.
$$m_o = 149.5 + \left(\frac{(12 - 9)}{(12 - 9) + (12 - 5)}\right) 9$$
$$= 149.5 + \left(\frac{3}{3 + 7}\right) 9$$
$$= 149.5 + \left(\frac{3}{10}\right) 9$$
= 149.5 + 2.7
Mode mo = 152.2 or 152
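The grouped-data formulas for the median and the mode can be sketched as follows. The function names are hypothetical, and the sketch assumes whole-number class limits so that the lower class boundary is the lower limit minus 0.5, as in the example.

```python
def grouped_median(lower_limits, frequencies, size):
    """m = Lm + ((N/2 - cumulative frequency before) / fm) * i."""
    half = sum(frequencies) / 2
    cumulative = 0
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:           # this is the median class
            boundary = lower - 0.5           # assumed: integer class limits
            return boundary + (half - cumulative) / f * size
        cumulative += f


def grouped_mode(lower_limits, frequencies, size):
    """mo = Lmo + (d1 / (d1 + d2)) * i for the class with the top frequency."""
    k = frequencies.index(max(frequencies))
    before = frequencies[k - 1] if k > 0 else 0
    after = frequencies[k + 1] if k < len(frequencies) - 1 else 0
    d1 = frequencies[k] - before
    d2 = frequencies[k] - after
    return lower_limits[k] - 0.5 + d1 / (d1 + d2) * size


lower_limits = [123, 132, 141, 150, 159, 168, 177]
frequencies = [4, 6, 9, 12, 5, 3, 3]

median = grouped_median(lower_limits, frequencies, size=9)
mode = grouped_mode(lower_limits, frequencies, size=9)
print(round(median, 2), round(mode, 2))
```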
would produce more of its size. In business of apparel, the mode is the best
central tendency to use.
The mean, the median, and the mode are all located at one point in a
symmetrical distribution. Data can be considered symmetrical if there are no
extreme values on both ends of the distribution so that the distribution is
balanced at the center of the data.
[Figure: Symmetrical distribution curve, frequencies against the x-axis, with the mean, median, and mode at the same central point]
1. The median bisects the total area. The area is divided into two parts,
one to the left and the other to the right of the median.
2. The mode is the item with the greatest frequency, the item on the x-
axis which corresponds to the tallest point of the curve.
3. The mean is the score point on the x-axis which corresponds to the
balance or fulcrum of the set of data.
[Figure: Skewed distribution curve, frequency against the x-axis, with the mode, median, and mean marked along the x-axis]
We notice from the graph that the distribution is skewed to the right. This
signifies that there are many low values; the mode lies among the low
values and is lower than the mean. The mean, which is sensitive by
nature, will be pulled in the direction of the extreme scores and will have
a high value. The median will have a value between the mode and the mean,
and it is unaffected by extreme values.
cannot be the typical measure because it will be pulled by the few high
salaries.
On all occasions other than the above, especially if the data are on an
interval or ratio scale, the mean is to be used as the measure of central
tendency. Generally, if we are concerned with quantity, the mean will be an
appropriate measure.
A second consideration in choosing a measure of central tendency is
the purpose for which the measure is being used. The mean is the best
measure to use if we want the value of every single observation to contribute
to the average. If we want to estimate the cost of the average housing unit in
the community, the median would be more accurate to use. If we want to
find out the most frequent occurring item in a distribution, the mode is the
measure to be used.
Midrange: $m_r = \frac{x_1 + x_n}{2}$
Where: $m_r$ – Midrange
$x_1$ – Lowest observation
$x_n$ – Highest observation
The Non-Central Forms of Measurement:
The Quartiles
The quartiles are commonly used measures of “non-central” location,
particularly for very large sets of observations. The quartiles divide the
entire set of observations into four quarters.
The First Quartile is the item or value wherein three-fourths of the observations
are higher while the remaining one-fourth are lower. It can be found by:
$$Q_1 = \frac{n+1}{4} \quad \text{for ungrouped data}$$
$$Q_1 = LL_{q1} + \left(\frac{\frac{n+1}{4} - \sum f_{q1-1}}{f_{q1}}\right) i \quad \text{for grouped data}$$
The Third Quartile is the item or value wherein three-fourths of the observations
are lower while the remaining one-fourth are higher. It can be found by:
$$Q_3 = \frac{3(n+1)}{4} \quad \text{for ungrouped data}$$
$$Q_3 = LL_{q3} + \left(\frac{\frac{3(n+1)}{4} - \sum f_{q3-1}}{f_{q3}}\right) i \quad \text{for grouped data}$$
The Quantiles
The quantiles are a natural extension of the median concept: they are
values which divide the set of data into equal parts, just as the median
divides the set of data into two equal parts. The quantiles which divide
the set of data into four parts are called quartiles. Deciles divide the
distribution into ten parts, while percentiles divide the distribution into
one hundred parts.
Midrange: $m_r = \frac{123 + 185}{2}$
mr = 154 Answer
First Quartile: $Q_1 = \frac{n+1}{4} = \frac{42+1}{4} = 10.75$
Hence, we are looking for the 11th item in the set of data. From the
table, the 11th item belongs to Class Interval 141-149.
$LL_{q1}$ = 141
n = 42
$\sum f_{q1-1}$ = 10
$f_{q1}$ = 9
i = 9
$$Q_1 = LL_{q1} + \left(\frac{\frac{n+1}{4} - \sum f_{q1-1}}{f_{q1}}\right) i$$
$$Q_1 = 141 + \left(\frac{10.75 - 10}{9}\right) 9$$
Q1 = 141.75 Answer
Third Quartile: $Q_3 = \frac{3(n+1)}{4} = \frac{129}{4} = 32.25$
Hence, we are looking for the 33rd item in the set of data. From the
table, the 33rd item belongs to Class Interval 159-167.
$LL_{q3}$ = 159
n = 42
$\sum f_{q3-1}$ = 31
$f_{q3}$ = 5
i = 9
$$Q_3 = LL_{q3} + \left(\frac{\frac{3(n+1)}{4} - \sum f_{q3-1}}{f_{q3}}\right) i$$
$$Q_3 = 159 + \left(\frac{32.25 - 31}{5}\right) 9$$
Q3 = 161.25 Answer
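The quartile computation above can be sketched with one function that follows this book's grouped-data formula, which uses the position k(n+1)/4 and the lower class limit of the quartile class. The function name `grouped_quartile` is hypothetical; other texts use slightly different conventions, so treat this as an illustration of the formula as written here.

```python
def grouped_quartile(k, lower_limits, frequencies, size):
    """Qk = LL + ((k(n+1)/4 - cumulative frequency before) / f) * i."""
    n = sum(frequencies)
    position = k * (n + 1) / 4
    cumulative = 0
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= position:       # class containing the item
            return lower + (position - cumulative) / f * size
        cumulative += f


lower_limits = [123, 132, 141, 150, 159, 168, 177]
frequencies = [4, 6, 9, 12, 5, 3, 3]

q1 = grouped_quartile(1, lower_limits, frequencies, size=9)
q3 = grouped_quartile(3, lower_limits, frequencies, size=9)
midrange = (123 + 185) / 2
print(round(q1, 2), round(q3, 2), midrange)
```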
Name: ________________________ Course: ________
Classroom Activity No.3.1 Section: ________
1. Determine the three forms of central tendency and the non-central measures
regarding the observation below.
148 253 268 372 387 493 408 513 528 633 648
753 768 873 888 491 406 511 526 631 646 751
766 472 487 592 507 612 627 732 747 453 468
573 588 693 608 713 728 534 549 654 669 576
681 696 701 517 622 637 547 552 564 576 588
690 503 519 621 535 644 556 666 374 584 794
804 712 820 835 641 556 761 876 683 699 501
517 422 235 647 758 869 374 485 293 505 610
625 730 745 850 865 973 187 292 207 312 327
432 447 552 567 672 687 792 707 812 827 932
Name: ________________________ Course: ________
Homework No.3.1 Section: ________
3. The following are the salaries of 12 players of the top NBA teams: four
of them earn 15M a year, two earn 18M a year, three earn 20M a year, and
three earn 8M a year. What are the mean and median salaries of the players?
4. An achievement test in economics contained 30 questions. The
distribution below summarizes the result of the test.
Number of Answers Frequency
1–3 2
4–6 8
7–9 16
10 – 12 26
13 – 15 38
16 – 18 42
19 – 21 36
22 – 24 24
25 – 27 6
28 – 30 2
Find the mean, median, and the mode, midrange and the quartiles of the
distribution above.
5. In a selection of 15 lots of 200 electronic components, the following
numbers of defective electronic components were found:
3 10 12 5 11
8 7 4 13 4
9 4 6 5 15
Find the median and the mode of the defective electronics.
6. Given the following frequency distribution, estimate the mean, median,
and the mode.
Class Interval Frequency
71 – 75 4
66 – 70 18
61 – 65 24
56 – 60 42
51 – 55 47
46 – 50 29
41 – 45 27
36 – 40 11
31 – 35 3
7. Given the following raw data, determine the mean, median, and mode,
and the non-central measures.
57 42 25 67 78 89 34 45 23 55 60 65
70 75 80 85 98 15 22 27 32 37 42 47
52 57 62 67 72 77 82 87 92 18 23 28
32 37 43 48 53 58 63 68 73 78 83 88
41 46 51 56 61 66 71 76 42 47 52 57
62 67 72 77 43 48 53 58 63 68 73 78
54 59 64 69 56 61 66 71 57 62 67 57
52 54 56 58 60 53 59 61 55 64 56 66
34 54 74 84
II. VARIATION and DEVIATION
TOPIC LESSON
1. Mean Absolute Deviation
2. Quartile Deviation
3. Standard Deviation
4. Other Deviations
OBJECTIVES
For the students to:
The measures of position are of little value unless the measures of
spread or variability which occur about them are known. Therefore, the
description of a set of data becomes more meaningful if the degree of
clustering about a central point is measured. Information on how far apart
the observations are from each other in every set will be very useful.
The Range
Q1 – First quartile
Q3 – Third quartile
Mean absolute deviation
To get the mean absolute deviation, we get the sum of the absolute
values of the mean deviates then divide it by the total number of items in the
distribution.
Formula: $MAD = \frac{\sum |x - \bar{x}|}{n-1}$   ungrouped data   eq. 3.8
x – individual score
$\bar{x}$ – mean
Formula: $MAD = \frac{\sum f|x - \bar{x}|}{n-1}$   grouped data   eq. 3.9
x – class mark
$\bar{x}$ – mean
f – frequency
Formula: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$   ungrouped data   eq. 3.10
x – value of item
$\bar{x}$ – mean
Formula: $s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n-1}}$   grouped data   eq. 3.11
f – frequency
x – value of item
$\bar{x}$ – mean
Example 1. The prices of certain books are set at P400, P550, P520, P600,
and P650. Find the measures of variability.
𝑥̅ = 𝑃544
Then, using the computed mean, tabulate the given data together with
the mean in two separate columns in the Excel program. Also provide
columns for the mean absolute deviation and the standard deviation. Since the
analysis is for ungrouped data, the two formulas to be used are the following.
(Please refer to Example 2 for the detailed step-by-step procedure in Excel.)
$MAD = \frac{\sum |x - \bar{x}|}{n-1}$   For Mean Absolute Deviation
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$   For Standard Deviation
Tabulation from the Excel Program
No.                  x      x̄     (x − x̄)   (x − x̄)²   |x − x̄|
1                   400    544     -144       20736       144
2                   520    544      -24         576        24
3                   550    544        6          36         6
4                   600    544       56        3136        56
5                   650    544      106       11236       106
TOTAL              2720                       35720       336
Σ(x − x̄)²/(n − 1)                              8930
x̄                   544
MAD                  84
SD                94.50
𝑴𝑨𝑫 = 𝟖𝟒 𝑨𝒏𝒔𝒘𝒆𝒓
Standard Deviation
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$
$s = \sqrt{\frac{35{,}720}{4}}$
$s = 94.50$ Answer
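The arithmetic of Example 1 can be reproduced with a minimal Python sketch. It applies eq. 3.8 as printed, with the divisor n − 1; the variable mad_n is an added assumption that shows the more common convention of dividing the absolute deviations by n, and is not part of the text's formula.

```python
import statistics

prices = [400, 550, 520, 600, 650]       # book prices in pesos
n = len(prices)
mean = statistics.mean(prices)            # 2720 / 5

abs_dev = sum(abs(x - mean) for x in prices)
mad_text = abs_dev / (n - 1)              # eq. 3.8 as printed (divisor n - 1)
mad_n = abs_dev / n                       # assumption: common convention (divisor n)

sd = statistics.stdev(prices)             # sample standard deviation, divisor n - 1

print(mean, mad_text, mad_n, round(sd, 2))
```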
Example 2: Consider the raw data below (Chapter two) and determine the
measures of variability.
170 143 152 137 151 155
154 134 147 163 157 135
125 138 185 143 145 155
175 158 166 154 129 173
180 153 147 164 179 128
Transfer the frequency distribution to the Excel program and enter a
formula in each cell. The only given values in the table are those in
columns A, B, and C, which come from the frequency distribution. The
following procedure is used to get the values in the rest of the
columns.
Standard Deviation
      A                 B     C       D       E         F          G           H
1     Class Interval    f     C.M.    fx      x̄         (x − x̄)    (x − x̄)²    f(x − x̄)²
2     123-131           4     127     508     151.21    -24.21     586.33      2345.33
3     132-140           6     136     816     151.21    -15.21     231.47      1388.85
4     141-149           9     145     1305    151.21     -6.21      38.62       347.56
5     150-158           12    154     1848    151.21      2.79       7.76        93.12
6     159-167           5     163     815     151.21     11.79     138.90       694.52
7     168-176           3     172     516     151.21     20.79     432.05      1296.14
8     177-185           3     181     543     151.21     29.79     887.19      2661.57
9     Σ                 42            6351                         2322.32     8827.07
10    x̄                                       151.21
11    s                                       14.67
$s = \sqrt{\frac{8{,}827.07}{41}}$
$s = 14.67$ Answer
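As a cross-check on the worksheet, eq. 3.11 can be evaluated directly. This is a minimal sketch that assumes only the class marks and frequencies from columns B and C; the variable names are illustrative.

```python
import math

# (class mark, frequency) pairs from the worksheet
table = [(127, 4), (136, 6), (145, 9), (154, 12), (163, 5), (172, 3), (181, 3)]

n = sum(f for _, f in table)                       # total frequency
mean = sum(x * f for x, f in table) / n            # grouped mean
ss = sum(f * (x - mean) ** 2 for x, f in table)    # sum of f(x - mean)^2
s = math.sqrt(ss / (n - 1))                        # eq. 3.11

print(n, round(mean, 2), round(ss, 2), round(s, 2))
```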
Name: ________________________ Course: ________
Classroom Activity No.3.2 Section: ________
1. Determine the three forms of central tendency and the non-central measures
regarding the observation below.
148 253 268 372 387 493 408 513 528 633 648
753 768 873 888 491 406 511 526 631 646 751
766 472 487 592 507 612 627 732 747 453 468
573 588 693 608 713 728 534 549 654 669 576
681 696 701 517 622 637 547 552 564 576 588
690 503 519 621 535 644 556 666 374 584 794
804 712 820 835 641 556 761 876 683 699 501
517 422 235 647 758 869 374 485 293 505 610
625 730 745 850 865 973 187 292 207 312 327
432 447 552 567 672 687 792 707 812 827 932
Name: ________________________ Course: ________
Homework No.3.2 Section: ________
3. An achievement test in economics contained 30 questions. The distribution
below summarizes the result of the test. Find the measures of variability
Number of Answers Frequency
1–3 2
4–6 8
7–9 16
10 – 12 26
13 – 15 38
16 – 18 42
19 – 21 36
22 – 24 24
25 – 27 6
28 – 30 2
4. Given the following raw data, determine the measures of variability.
57 42 25 67 78 89 34 45 23 55 60 65
70 75 80 85 98 15 22 27 32 37 42 47
52 57 62 67 72 77 82 87 92 18 23 28
32 37 43 48 53 58 63 68 73 78 83 88
41 46 51 56 61 66 71 76 42 47 52 57
62 67 72 77 43 48 53 58 63 68 73 78
54 59 64 69 56 61 66 71 57 62 67 57
52 54 56 58 60 53 59 61 55 64 56 66
34 54 74 84 51 56 61 62 67 72 77 36
CHAPTER 4
TOPIC LESSON
1. Basic Concepts of Probability
2. Conditional Probability
3. Bayes’ Theorem
4. Discrete and Continuous Random Variable Probability
Distribution
5. Binomial Distribution
6. Poisson Distribution
7. Applications of Probability
OBJECTIVES
For the students to:
The Basic Concepts of Probability
Single Event:
$P = \frac{H}{H+F}$   eq. 4.1
$Q = \frac{F}{H+F}$   eq. 4.1a
$N = H + F$   eq. 4.1b
Where:
P = Probability of Occurrence
Q = Probability of Failure
H = Number of Outcomes that the event will happen
F = Number of Outcomes that the event will fail
N = Total Number of Possible Outcomes
If a coin is tossed in more than one trial, we no longer rely on the
prior assumption that favourable and failure outcomes are equally likely.
The probability is now taken as the number of successful outcomes over the
total number of trials. This method is called empirical probability.
Empirical Method
$P = \frac{O_s}{N_t}$   eq. 4.2
Multiple Events
$P = P_1 \times P_2 \times P_3 \times \dots \times P_n$   eq. 4.3
Mutually Exclusive Event
$P = P_1 + P_2 + \dots + P_n$   eq. 4.4
Repeated Trials
n Factorial (n!) = 𝑛 × (𝑛 − 1) × … × 3 × 2 × 1
Repeated Trials
The first two methods (single and empirical) dealt with objective
probability whereas the third one determines the probability by dealing with
available data or believing that an event will occur or not. This method is
called subjective probability. A concrete example of this method is the view
of a manufacturer against the view of a seller in the success of one particular
product. The personal opinion and analysis of a definite condition of a
particular event are the basis of a subjective probability.
Contingency Table
The columns are the possible outcomes of die 1, while the rows are
the possible outcomes of die 2. Therefore, in throwing two fair dice, there
are 36 possible outcomes, which make up the sample space.
Discrete Probability
chapter 3 of this work textbook. The two major properties are the mean (𝜇)
and standard deviation (𝜎). Please notice also the sum of the values of
probabilities is equal to one (1) or unity.
Mo.   Number of Housing Units (x)   Probability (P)   xP      μ       (x − μ)   (x − μ)²P
1     200                           0.06              12      306.5   -106.5    680.535
2     250                           0.07              17.5    306.5    -56.5    223.4575
3     250                           0.10              25      306.5    -56.5    319.225
4     350                           0.11              38.5    306.5     43.5    208.1475
5     400                           0.12              48      306.5     93.5    1049.07
6     400                           0.10              40      306.5     93.5    874.225
7     300                           0.10              30      306.5     -6.5    4.225
8     250                           0.06              15      306.5    -56.5    191.535
9     200                           0.05              10      306.5   -106.5    567.1125
10    300                           0.08              24      306.5     -6.5    3.38
11    350                           0.09              31.5    306.5     43.5    170.3025
12    250                           0.06              15      306.5    -56.5    191.535
Σ                                                     306.5                     4482.75
μ     306.5
σ     66.95
By analytical computation:
$\sigma = \sqrt{4{,}482.75}$
$\sigma = 66.95$ Answer
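The tabulation above can be reproduced with a short Python sketch. It assumes the twelve monthly values and probabilities exactly as tabulated, and takes the mean as the sum of xP and the variance as the sum of (x − μ)²P.

```python
import math

units = [200, 250, 250, 350, 400, 400, 300, 250, 200, 300, 350, 250]
probs = [0.06, 0.07, 0.10, 0.11, 0.12, 0.10, 0.10, 0.06, 0.05, 0.08, 0.09, 0.06]

total_p = sum(probs)                                      # should be 1 (unity)
mu = sum(x * p for x, p in zip(units, probs))             # mean = sum of xP
var = sum(p * (x - mu) ** 2 for x, p in zip(units, probs))
sigma = math.sqrt(var)

print(round(total_p, 2), round(mu, 1), round(var, 2), round(sigma, 2))
```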
                          Die 2
           1      2      3      4      5      6
       1   (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
       2   (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
Die 1  3   (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
       4   (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
       5   (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
       6   (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)
x1      x2      x3      x4      x5      x6
f(x)1   f(x)2   f(x)3   f(x)4   f(x)5   f(x)6
Going back to the table, there are six coloured cells representing the
possible outcomes:
1. f(x)1 – coloured yellow
2. f(x)2 – coloured green
3. f(x)3 – coloured turquoise
4. f(x)4 – coloured pink
5. f(x)5 – coloured blue
6. f(x)6 – coloured red
The probability distribution of f(x) is determined as follows:
f(1) : P(x = 1) = P(1, 1)
     = $\frac{1}{36}$
f(2) : P(x = 2) = P(2, 1) + P(2, 2) + P(1, 2)
     = $\frac{3}{36}$
f(3) : P(x = 3) = P(3, 1) + P(3, 2) + P(3, 3) + P(1, 3) + P(2, 3)
     = $\frac{5}{36}$
f(4) : P(x = 4) = P(4, 1) + P(4, 2) + P(4, 3) + P(4, 4) + P(1, 4)
     + P(2, 4) + P(3, 4)
     = $\frac{7}{36}$
f(5) : P(x = 5) = P(5, 1) + P(5, 2) + P(5, 3) + P(5, 4) + P(5, 5)
     + P(1, 5) + P(2, 5) + P(3, 5) + P(4, 5)
     = $\frac{9}{36}$
f(6) : P(x = 6) = P(6, 1) + P(6, 2) + P(6, 3) + P(6, 4) + P(6, 5)
     + P(6, 6) + P(1, 6) + P(2, 6) + P(3, 6) + P(4, 6) + P(5, 6)
     = $\frac{11}{36}$
$x_i$       1                2                3                4                5                6
$f(x_i)$    $\frac{1}{36}$   $\frac{3}{36}$   $\frac{5}{36}$   $\frac{7}{36}$   $\frac{9}{36}$   $\frac{11}{36}$
x     P       xP      μ       x − μ    (x − μ)²P
1     0.03    0.03    4.47    -3.47    0.33
2     0.08    0.17    4.47    -2.47    0.51
3     0.14    0.42    4.47    -1.47    0.30
4     0.19    0.78    4.47    -0.47    0.04
5     0.25    1.25    4.47     0.53    0.07
6     0.31    1.83    4.47     1.53    0.71
Σ     1.00    4.47                     1.97
μ     4.47
σ     1.40
By analytical computation:
𝝁 = 𝟒. 𝟒𝟕 𝑨𝒏𝒔𝒘𝒆𝒓
𝜎 = √1.97
𝝈 = 𝟏. 𝟒𝟎 𝑨𝒏𝒔𝒘𝒆𝒓
Mean: 𝜇 = 4.47
Standard Deviation: 𝜎 = 1.40
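The distribution and its two properties can be verified by enumerating all 36 outcomes. Reading the outcomes listed for each f(x), the variable x appears to be the larger of the two faces; that reading is an assumption of this sketch, as are the variable names.

```python
import math
from fractions import Fraction
from itertools import product

# Assumption: x is the larger of the two faces shown by die 1 and die 2.
pmf = {x: Fraction(0) for x in range(1, 7)}
for d1, d2 in product(range(1, 7), repeat=2):
    pmf[max(d1, d2)] += Fraction(1, 36)

mu = sum(x * p for x, p in pmf.items())                # mean
var = sum((x - mu) ** 2 * p for x, p in pmf.items())   # variance
sigma = math.sqrt(var)                                 # standard deviation

print({x: str(p) for x, p in pmf.items()})
print(round(float(mu), 2), round(sigma, 2))
```

Exact fractions are used so that the probabilities add up to exactly one before any rounding is applied.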
Name: ________________________ Course: ________
Classroom Activity No. 4.1 Section: ________
2. 50% of the riding public at LRT and MRT have the wrong change. If 250
passengers ride at the terminal every 5 minutes, what is the probability that
100 passengers have the correct change?
4. The distribution of finished and completed housing units per schedule per
month of a realty development company is shown in the table. If there were
5,500 units completed in a year, what is the population standard deviation?
Table for Monthly Probability Distribution of Houses
Month Probability
1 0.08
2 0.09
3 0.10
4 0.12
5 0.13
6 0.09
7 0.08
8 0.06
9 0.05
10 0.07
11 0.08
12 0.05
Name: ________________________ Course: ________
Homework No. 4.1 Section: ________
3. U.S. forces in Iraq used a missile that hits the target with a probability of
0.20. How many missiles should be fired so that there is at least a 75%
probability of hitting the target?
4. Michael Jordan sinks 80% of his free throw attempts. What is the probability
that he will make exactly 7 of his next 10 attempts?
Chapter 5
Normal Distribution
TOPIC LESSON
5. Normal random Variable
6. Normal Curve
7. Regions of Normal Curve
8. Probabilities and Percentiles using Normal Curve
9. Sampling Distribution of the Mean
OBJECTIVES
For the students to:
5. Illustrate a normal random variable and its
characteristics.
6. Know how to construct a normal curve
7. Identify regions under normal curve corresponding to
standard normal values.
8. Convert normal random variable to standard normal
variable.
9. Compute probabilities and percentiles using areas under
normal curve.
10.Evaluate sampling distribution from the mean.
One of the most important distributions in statistics is the Normal
Distribution. There are three reasons behind its importance:
1. It is a tool for approximating various discrete probability
distributions.
2. It closely approximates many continuous phenomena.
3. It provides the basis for inferential statistics because of its relation
to the Central Limit Theorem: for a sufficiently large sample size n,
the sample mean 𝑥̅ approximately follows the normal distribution with
mean µ and standard deviation 𝜎/√n, where µ and 𝜎 are the population
mean and the population standard deviation.
5. Its related random variable has infinite range (−∞ < 𝑥 < +∞).
6. There should be at least three standard scores, each to the left and to
the right of the mean.
7. The distance between two standard scores is measured by the standard
deviation.
8. Its middle spread is equal to 1.33 standard deviations which means,
the quartile deviation is within the interval of two-thirds of the
standard deviation below and above the mean.
Skewed distributions:
[Figure: illustration diagram of age at marriage; horizontal axis marked from 15 to 70]
3(𝜇 −𝑚)
𝑖𝑛𝑑𝑒𝑥 = 𝐸𝑞. 5.1
5
The values on the table are anchored by the values of standard score
(z) having a formula called the Transformation Formula:
$z = \frac{x - \mu}{\sigma}$   Eq. 5.2
Solution: Substitute the given values of the variables, mean and standard
deviation in the formula
a) For x1 = 90:
$z = \frac{x - \mu}{\sigma} = \frac{90 - 120}{15}$
$z = -2.0$ Answer
A value of z = -2 means that 90 lies 2 standard deviations to the
left of the mean of 120. The area is still +0.4772 although the
z-value is negative: the values from 90 to 120 have a
probability 𝑝 = 0.4772 and are located to the left of the mean.
(Note that a negative value of z tells us that the variable is to
the left of the mean.)
b) For x2 = 100:
$z = \frac{x - \mu}{\sigma} = \frac{100 - 120}{15}$
$z = -1.33$ Answer
c) For x3 = 145:
$z = \frac{x - \mu}{\sigma} = \frac{145 - 120}{15}$
$z = +1.67$ Answer
d) For x4 = 160:
$z = \frac{x - \mu}{\sigma} = \frac{160 - 120}{15}$
$z = +2.67$ Answer
The table only shows the right portion of the areas under normal curve
or the half of the normal curve. Because the two portions of the normal
curve are symmetrical, it is no longer necessary to give the area of the left
half of the normal curve. Disregard the negative (-) value of the standard
score, since the sign only tells us the location of the variable.
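The transformation formula and the table lookup can be sketched in Python. The helper names z_score and area_mean_to_z are hypothetical; the area between the mean and z is computed from the error function in the standard library instead of being read from the table, and the sign of z is dropped as described above.

```python
import math

def z_score(x, mu, sigma):
    """Transformation formula z = (x - mu) / sigma."""
    return (x - mu) / sigma

def area_mean_to_z(z):
    """Area under the standard normal curve between the mean and z.
    The sign of z is ignored because the curve is symmetrical."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

mu, sigma = 120, 15
for x in (90, 100, 160):
    z = z_score(x, mu, sigma)
    print(x, round(z, 2), round(area_mean_to_z(z), 4))
```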
The normal curve of any normal distribution comes from the graph of
histogram taken from the frequencies of the class marks of every class
interval or category. Sample of this is shown below.
[Figure: Relative Frequency Histogram, with the mean 𝜇 marked on the horizontal axis]
Application of Normal Distribution
Example: Draw the normal curve of the normal distribution taken from
the frequencies of class marks’ intervals of the incomes of
employees of a certain company as shown in table 5.1. The
incomes as specified in the column of class marks are in
thousands of pesos. Determine also the frequencies of random
variable and probabilities of the employees having an income
that range from:
a) 27k and below
b) 27k to 36k
c) 36k to 54k
Table 5.1
Income of the Employees of ABC Co., Ltd.
Class Interval    Frequency    Class Mark
1                 2            21
2                 7            25
3                 38           29
4                 170          33
5                 420          37
6                 850          41
7                 1,400        45
8                 850          49
9                 420          53
10                170          57
11                38           61
12                7            65
13                2            69
Solution: With the help of EXCEL program, the chart below was made
using the frequencies of the class intervals. Connecting the
middle top of the bars in the histogram, we can now draw the
normal curve.
[Figure 5.2: histogram of frequency (vertical axis, 0 to 1600) against class intervals 1 to 13]
Using the EXCEL program, we can find the value of the standard
deviation as shown in the tabulation. (Table 5.2)
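The same computation can be sketched in Python as a check on the spreadsheet. The sketch assumes the class marks and frequencies of Table 5.1 and uses the grouped-data formula of eq. 3.11 with the divisor n − 1; the variable names are illustrative.

```python
import math

marks = list(range(21, 70, 4))   # class marks 21, 25, ..., 69 (thousands of pesos)
freqs = [2, 7, 38, 170, 420, 850, 1400, 850, 420, 170, 38, 7, 2]

n = sum(freqs)
mean = sum(x * f for x, f in zip(marks, freqs)) / n
s = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(marks, freqs)) / (n - 1))

print(n, mean, round(s, 1))
```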
The standard deviation as computed is equal to 6.0. Hence, the
frequencies and probabilities of the random variables are:
a) 27k and below
$z_{27} = \frac{27 - 45}{6.0} = -3.00$
Draw a diagram of the normal curve and shade the area of the following
intervals. Determine also the area covered by the intervals.
a. From z = -3.05 to z = - 1.52
d. Right of z = +1.03
e. Right of z = -0.47
f. Left of z = -1.12
g. Left of z = +1.85
Name: ________________________ Course: ________
Homework No.5.1 Section: ________
b. From z = -1.25 to z = -1.88
e. Right of z = +1.69
f. Right of z = -1.92
g. Left of z = -0.56
h. Find the probability that z is less than or equal to -0.76 or greater
than or equal to +1.23.
2. Find the z score corresponding to the given areas under the normal curve.
(show diagrams and shade given areas)
a. Area to the left of z = 0.6730
3. In a statistics examination, the mean grade is 80 and the standard
deviation is 5.
a. Find the corresponding z scores of two students whose grades are 88
and 68 respectively.
b. Find the grades of two students whose z scores are 0.68 and -1.56
respectively.
c. If 120 students took the examination, how many
students got 75 and below?
4. Four hundred skilled workers were given an examination to determine
how much they know about their job. If the scores are normally
distributed and the score of one worker measured in z score is 0.75,
a. How many of the workers who took the examination scored
higher than or equal to this particular worker?
b. If the standard deviation is 4.5 and the lowest score is 60, what is
the probable highest score?
The Standardized Normal Distribution
Table 5.3
The mathematical symbol representing the probability density
function is denoted by f(X). For the normal distribution, the formula is:
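$f(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left[\frac{x-\mu}{\sigma}\right]^2}$   Eq. 5.3
From the transformation formula $z = \frac{x-\mu}{\sigma}$, the original data for the
$f(Z) = \frac{1}{\sqrt{2\pi}\,(1)}\, e^{-\frac{1}{2}\left[\frac{z-0}{1}\right]^2}$
$f(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}$   Eq. 5.4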
From Figure 5.3, we notice that any value of X has a corresponding
value of the standardized measurement Z, which comes from the transformation
formula $z = \frac{x-\mu}{\sigma}$ as discussed in the early part of this chapter.
x-scale          𝜇 − 4𝜎   𝜇 − 3𝜎   𝜇 − 2𝜎   𝜇 − 1𝜎   𝜇     𝜇 + 1𝜎   𝜇 + 2𝜎   𝜇 + 3𝜎   𝜇 + 4𝜎
𝜇 = 45, 𝜎 = 6    21       27       33       39       45    51       57       63       69
z-scale          −4       −3       −2       −1       0     +1       +2       +3       +4
Figure 5.3
Illustrative Example:
Suppose we pick an employee at random. What is the probability
that the income of the employee's family is between 39k and 45k?
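A minimal sketch of this computation, assuming the values 𝜇 = 45 and 𝜎 = 6 shown in Figure 5.3 and using the NormalDist class of the Python standard library in place of the printed table:

```python
from statistics import NormalDist

# Assumption: incomes (in thousands of pesos) follow a normal distribution
# with the mean and standard deviation shown in Figure 5.3.
income = NormalDist(mu=45, sigma=6)

z_low = (39 - 45) / 6                      # 39k is one standard deviation below the mean
p = income.cdf(45) - income.cdf(39)        # area between 39k and 45k

print(z_low, round(p, 4))
```

The result corresponds to the table entry for z = 1.0, since 39k to 45k covers the region from z = −1 to z = 0.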
AREAS OF A NORMAL CURVE
Table 5-A
Z 0 1 2 3 4 5 6 7 8 9
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0754
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2258 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2518 0.2549
0.7 0.2580 0.2612 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2996 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.1 0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.2 0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995
3.3 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.4 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
3.5 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998
3.6 0.4998 0.4998 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.7 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.8 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.9 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000
Name: ________________________ Course: ________
Classroom Activity No.5.2 Section: ________
In the last entrance examination, the mean score 𝜇 is 128 out of 180 items and the
standard deviation 𝜎 is 10.
a. What is the probable lowest score?
d. Find the examination scores of two students whose z-scores are
0.68 and -1.56 respectively.
e. If there were 1,150 students that took the examination, how many
students scored 105 and below?
g. What is the probability that a score of one examinee is between 95
and 105?
i. If there were 720 students who passed the examination, what is the
probable lowest passing score?
Sampling Distribution of the Mean
Distribution of the Mean
The unbiasedness property of the mean includes the fact that the
population mean 𝜇 is equal to the average of all possible sample means for a
given sample size. We have discussed several measures of central
tendency in the previous chapter. Specifically, we took on the mean as the
most important measure among the measures of central tendency because of
its sensitivity. It is the best measure if the population is found to be normally
distributed. Other properties of the mean include efficiency and consistency.
Illustration: A group of five (5) friends are fond of fish balls. The
following data are the number of pieces of fish balls they
ate on one occasion.
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$   Eq. 5.5
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$   Eq. 5.6
The mean
$\mu = \frac{10 + 12 + 15 + 17 + 16}{5} = 14$ pieces
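The unbiasedness property can be illustrated with the fish ball data. The sketch below evaluates Eq. 5.5 and Eq. 5.6, then lists every possible sample of two friends and averages the sample means. The sample size of 2, sampling without replacement, and the variable names are assumptions made only for this illustration.

```python
import math
from fractions import Fraction
from itertools import combinations

pieces = [10, 12, 15, 17, 16]        # fish balls eaten by the five friends
N = len(pieces)

mu = Fraction(sum(pieces), N)                                  # Eq. 5.5
sigma = math.sqrt(sum((x - mu) ** 2 for x in pieces) / N)      # Eq. 5.6

# Assumption: all possible samples of size 2, drawn without replacement.
sample_means = [Fraction(sum(s), 2) for s in combinations(pieces, 2)]
mean_of_means = sum(sample_means) / len(sample_means)

print(mu, round(sigma, 2), len(sample_means), mean_of_means)
```

The average of the ten sample means comes out equal to the population mean, which is what the unbiasedness property states.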
Standard error of the Mean
The standard error of the mean 𝜎𝑥̅ is equal to the standard deviation of
the population 𝜎 over the square root of sample size n.
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$   Eq. 5.7
Z – Value for Sampling Distribution of the Mean
$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$   Eq. 5.9
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$   Eq. 5.10
Given: 𝑋̅ = 31 grams
𝜇 = 30 grams
𝜎 = 2.0 grams
𝑛 = 16 sachets
Formula: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
$Z = \frac{31 - 30}{2 / \sqrt{16}} = \frac{1.0}{2/4}$
$Z = +2.0$
From the table of areas under normal curve, the area of Z = +2.0 is
0.4772. This implies that 2.28% of all possible samples of size 16 could
have a sample mean over 31 grams.
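The same result can be obtained without the printed table. This is a minimal sketch using the given values and the NormalDist class of the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

x_bar, mu, sigma, n = 31, 30, 2.0, 16

std_error = sigma / math.sqrt(n)            # Eq. 5.7
z = (x_bar - mu) / std_error                # Eq. 5.10

area = NormalDist().cdf(z) - 0.5            # area between the mean and Z
tail = 1 - NormalDist().cdf(z)              # share of sample means over 31 grams

print(z, round(area, 4), round(tail, 4))
```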
[Figure: normal curve with mean 30; the area between 30 and 31 is 0.4772 and the area to the right of 31 is 0.0228]
If we consider individual sachet, that percentage can be computed
immediately by:
$Z = \frac{X - \mu}{\sigma} = \frac{31 - 30}{2} = +0.50$
each sample contains 16 sachets of different contents, some below and some
above the sample mean.
The important point of this theorem is that the average of all the
sample means is equal to the population mean. In other words, simply
add all the sample means and divide the sum by the number of samples; that
figure is the mean of the population.
Likewise, the average of the standard deviations of all the samples
approximates the standard deviation of the population. It is a good and useful
tool that can help in describing the nature and characteristics of a population.
1. For a non-normal population, for any sample size of not less than 30
observations selected at random, the sampling distribution of the mean
is approximately normal.
2. For a fairly normal population, for any sample size of not less
than 15 observations selected at random, the sampling distribution of
the mean is approximately normal.
3. For a normal population, the sampling distribution of the mean is
normally distributed regardless of sample size.
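Formula: Sample Mean $\bar{X} = \frac{\sum X}{n}$
$\bar{X}_1 = \frac{433}{15} = 28.87$
$\bar{X}_2 = \frac{529}{15} = 35.27$
$\bar{X}_3 = \frac{638}{15} = 42.53$
$\bar{X}_4 = \frac{481}{15} = 32.07$
$\bar{X}_5 = \frac{600}{15} = 40.00$
$\bar{X}_6 = \frac{506}{15} = 33.73$
$\bar{X}_7 = \frac{591}{15} = 39.40$
Population Mean $\mu = \frac{\sum \bar{X}}{N}$
$\mu = \frac{251.87}{7}$
$\mu = 35.98$ Answer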
Name: ________________________ Course: ________
Classroom Activity No.5.3 Section: ________
2. Evaluate the population mean using the central limit theorem of the
tabulated samples shown below.
Sample 1 2 3 4 5 6 7
𝑋1 14 16 35 9 7 15 19
𝑋2 52 65 31 34 35 6 63
𝑋3 14 388 56 76 62 66 56
𝑋4 71 55 29 24 63 23 43
𝑋5 25 46 6 57 29 58 45
𝑋6 31 23 76 25 5 70 28
𝑋7 4 30 62 19 45 7 4
𝑋8 28 53 58 5 37 24 62
𝑋9 33 37 34 71 66 82 2
𝑋10 9 9 73 2 35 45 57
𝑋11 63 22 21 4 55 35 15
𝑋12 12 5 47 71 47 2 73
𝑋13 20 55 9 28 44 61 27
𝑋14 30 37 36 24 46 25 24
𝑋15 45 18 7 33 33 24 46
Chapter 6
Interval Estimation
TOPIC LESSON
1. Interval estimation of the mean(Known & Unknown 𝜎)
2. Interval estimate of the proportion
3. t – Distribution
4. Population mean
5. Population proportion
6. Confidence interval estimator
OBJECTIVES
For the students to:
1. Illustrate point and interval estimation
2. Identify point estimator for population mean
3. Identify the form of confidence interval estimator for
population mean(known & unknown 𝜎)
4. Illustrate and construct t – distribution.
5. Identify point estimator for population proportion
6. Identify the form of confidence interval estimator for
population proportion by central limit theorem.
7. Calculate the length of confidence interval
8. Draw conclusions based on confidence interval estimate
Interval Estimation
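Lower Limit: $\bar{X} - \frac{Z\sigma}{\sqrt{n}}$   Eq. 6.1
Upper Limit: $\bar{X} + \frac{Z\sigma}{\sqrt{n}}$   Eq. 6.2
$\bar{X} - \frac{Z\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{Z\sigma}{\sqrt{n}}$
Where: Z = standard score corresponding to an area of $\frac{1-\alpha}{2}$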
Example 1: A certain brand of canned evaporated milk is labelled 331 cc.
Determine the confidence interval for a mean with a 98% level
of confidence if the sample mean is 330 cc. for n = 16 while the
population standard deviation 𝜎 is 2.5 cc.
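$A = \frac{0.98}{2} = 0.49$, $Z = \pm 2.327$
Lower Limit: $\bar{X} - \frac{Z\sigma}{\sqrt{n}}$
$LL = 330 - \frac{2.327 \times 2.5}{\sqrt{16}}$
$LL = 328.55$
Upper Limit: $\bar{X} + \frac{Z\sigma}{\sqrt{n}}$
$UL = 330 + \frac{2.327 \times 2.5}{\sqrt{16}}$
$UL = 331.45$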
Note: The labelled content of 331 cc. lies between the lower limit and the
upper limit, which means that it is consistent with the samples.
Therefore, the canning operation is still going well.
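Example 1 can be reproduced with a short Python sketch. The function name z_interval is hypothetical, and the Z value is computed from the standard library's NormalDist instead of being read from the table, so it differs slightly from the rounded 2.327.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known (Eq. 6.1, Eq. 6.2)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # Z for an area of confidence/2
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

ll, ul = z_interval(x_bar=330, sigma=2.5, n=16, confidence=0.98)
print(round(ll, 2), round(ul, 2))
```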
Example 2: A paint manufacturer knows that the standard deviation of the
content of its canned paint is 0.02 gallon. A random sample of
64 cans is taken, and the average content of the one-gallon can
is found to be 0.995 gallon. Set up the confidence interval
estimate of the true population average at the 95% confidence level.
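$A = \frac{0.95}{2} = 0.475$, $Z = \pm 1.96$
Lower Limit: $\bar{X} - \frac{Z\sigma}{\sqrt{n}}$
$LL = 0.995 - \frac{1.96 \times 0.02}{\sqrt{64}}$
$LL = 0.9901$
Upper Limit: $\bar{X} + \frac{Z\sigma}{\sqrt{n}}$
$UL = 0.995 + \frac{1.96 \times 0.02}{\sqrt{64}}$
$UL = 0.9999$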
Note: The nominal content of one gallon is greater than the upper limit, which
means that the cans really contain less paint than they should. Therefore,
the production has a problem in its canning operations.
Name: ________________________ Course: ________
Classroom Activity No. 6.1 Section: ________
1. A sample of 100 bottles of liquid detergent labelled “375 cc.”, chosen
from tens of thousands produced in one day, has a mean 𝑋̅ of 374 cc. Set up a
99% confidence interval estimate if the population is normally distributed
with a standard deviation of 5 cc.
2. A light bulb factory wants to know the average life of their one shipment of
bulbs. They conducted a random sampling for 36 light bulbs and found out
the mean life of the samples was 220 hours. Set up a 99% confidence
interval estimate of the true population mean of 200 hours with a standard
deviation of 7 hours?
Confidence Interval of the Mean (𝜎 Unknown)
Student’s t-Distribution:
$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$   Eq. 6.3
Properties of t-Distribution
STUDENT'S t-DISTRIBUTION
df 0.400 0.300 0.200 0.100 0.050 0.025 0.010 0.005
1 0.325 0.727 1.376 3.078 6.314 12.706 31.821 63.657
2 0.289 0.617 1.061 1.886 2.920 4.303 6.965 9.925
3 0.277 0.584 0.978 1.638 2.353 3.182 4.541 5.841
4 0.271 0.569 0.941 1.533 2.132 2.776 3.747 4.604
5 0.267 0.559 0.920 1.476 2.015 2.571 3.365 4.032
6 0.265 0.553 0.906 1.440 1.943 2.447 3.143 3.707
7 0.263 0.549 0.896 1.415 1.895 2.365 2.998 3.499
8 0.262 0.546 0.889 1.397 1.860 2.306 2.896 3.355
9 0.261 0.543 0.883 1.383 1.833 2.262 2.821 3.250
10 0.260 0.542 0.879 1.372 1.812 2.228 2.764 3.169
11 0.260 0.540 0.876 1.363 1.796 2.201 2.718 3.106
12 0.259 0.539 0.873 1.356 1.782 2.179 2.681 3.055
13 0.259 0.538 0.870 1.350 1.771 2.160 2.650 3.012
14 0.258 0.537 0.868 1.345 1.761 2.145 2.624 2.977
15 0.258 0.536 0.866 1.341 1.753 2.131 2.602 2.947
16 0.258 0.535 0.865 1.337 1.746 2.120 2.583 2.921
17 0.257 0.534 0.863 1.333 1.740 2.110 2.567 2.898
18 0.257 0.534 0.862 1.330 1.734 2.101 2.552 2.878
19 0.257 0.533 0.861 1.328 1.729 2.093 2.539 2.861
20 0.257 0.533 0.860 1.325 1.725 2.086 2.528 2.845
21 0.257 0.532 0.859 1.323 1.721 2.080 2.518 2.831
22 0.256 0.532 0.858 1.321 1.717 2.074 2.508 2.819
23 0.256 0.532 0.858 1.319 1.714 2.069 2.500 2.807
24 0.256 0.531 0.857 1.318 1.711 2.064 2.492 2.797
25 0.256 0.531 0.856 1.316 1.708 2.060 2.485 2.787
26 0.256 0.531 0.856 1.315 1.706 2.056 2.479 2.779
27 0.256 0.531 0.855 1.314 1.703 2.052 2.473 2.771
28 0.256 0.530 0.855 1.313 1.701 2.048 2.467 2.763
29 0.256 0.530 0.854 1.311 1.699 2.045 2.462 2.756
30 0.256 0.530 0.854 1.310 1.697 2.042 2.457 2.750
40 0.255 0.529 0.851 1.303 1.684 2.021 2.423 2.704
60 0.254 0.527 0.848 1.296 1.671 2.000 2.390 2.660
120 0.254 0.526 0.845 1.289 1.658 1.980 2.358 2.617
∞ 0.253 0.524 0.842 1.282 1.645 1.960 2.326 2.576
Degrees of Freedom Concept
Illustration: There are 4 values whose mean is 10. This means that the
total value is 40. If three of the values are free to
vary, like 8, 9, and 10, the fourth number must be 13; it
cannot vary, because any other number will not give the
total of 40. That is the reason for the (n – 1) degrees of freedom.
Lower Limit: $\bar{X} - \frac{t_{n-1} S}{\sqrt{n}}$   Eq. 6.4
Upper Limit: $\bar{X} + \frac{t_{n-1} S}{\sqrt{n}}$   Eq. 6.5
The Confidence Interval will be: $\bar{X} - \frac{t_{n-1} S}{\sqrt{n}} \le \mu \le \bar{X} + \frac{t_{n-1} S}{\sqrt{n}}$
Example: The sample of 36 pieces of steel bars shows 35,000 psi strength
with 500 psi sample standard deviation. Evaluate the given data
setting a 95% level of confidence of estimating the population
mean.
Given: 𝑋̅ = 35,000 psi, S = 500 psi, n = 36
Level of confidence = 95%, hence, ∝ = 0.05
Solution: t = 2.045 taken from the table of Student’s t-distribution where
∝/2 = 0.025
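Lower Limit: $\bar{X} - \frac{t_{n-1} S}{\sqrt{n}}$
$LL = 35{,}000 - \frac{2.045 \times 500}{\sqrt{36}}$
$LL = 34{,}829.58$
Upper Limit: $\bar{X} + \frac{t_{n-1} S}{\sqrt{n}}$
$UL = 35{,}000 + \frac{2.045 \times 500}{\sqrt{36}}$
$UL = 35{,}170.42$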
Note: The mean of the population is approximated between the lower limit
and the upper limit of the chosen samples. The validity of this is
dependent on the assumption that the distribution of the strength of
steel is normal.
Name: ________________________ Course: ________
Classroom Activity No. 6.2 Section: ________
2. A light bulb factory wants to know the average life of their one shipment of
bulbs. They conducted a random sampling for 100 light bulbs and found out
the mean life of the samples was 200 hours with standard deviation of 5
hours. Set up a 99% confidence interval estimate for the evaluation of
population mean.
Sampling distribution of the proportion
When dealing with categorical data, the sample mean 𝑋̅ is the
same as the sample proportion 𝑝𝑠 with the same characteristics. In a
trial observation of 1, 0, 1, 0, 1, wherein 1 is a success and 0 is a
failure, the total successes is 3 and the failure is 2. The mean of the
success trials is 0.60. That is also the proportion of the number of
successes in the trial observation. The sample proportion 𝑝𝑠 therefore
can be defined as:
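$p_s = \frac{X}{n}$   Eq. 6.6
$\sigma_{p_s} = \sqrt{\frac{p(1-p)}{n}}$   Eq. 6.7
$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{x}}}$   Eq. 6.8
Substitute: $p_s = \bar{X}$, $p = \mu_{\bar{X}}$, and $\sigma_{p_s} = \sqrt{\frac{p(1-p)}{n}} = \sigma_{\bar{x}}$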
Hence, the difference between the Sample and Population
Proportion in Standardized Normal Units is:
$Z = \frac{p_s - p}{\sqrt{\frac{p(1-p)}{n}}}$   Eq. 6.9
Example: The registrar of a state university determines that 40% of all the
students exceed the average grade of 2.50. If 200 students are
chosen at random, what is the probability that the sample
proportion of such students is more than 0.35?
$Z = \frac{0.35 - 0.40}{\sqrt{\frac{0.40(0.60)}{200}}}$
$Z = -1.44$
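Eq. 6.9 and the requested probability can be sketched in Python as follows. The area to the right of Z is computed with the standard library's NormalDist in place of the table lookup; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n, p_s = 0.40, 200, 0.35

std_error = math.sqrt(p * (1 - p) / n)       # Eq. 6.7
z = (p_s - p) / std_error                    # Eq. 6.9

# Probability that the sample proportion is more than 0.35:
# the area to the right of Z under the standard normal curve.
prob_more = 1 - NormalDist().cdf(z)

print(round(z, 2), round(prob_more, 4))
```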
Confidence Interval Estimate for Proportion
are at least 5. With this, we can approximate a normal distribution and set up
(1−∝) × 100% confidence interval estimate for population proportion 𝑝.
Lower Limit: $p_s - Z\sqrt{\frac{p_s(1-p_s)}{n}}$   Eq. 6.10
Upper Limit: $p_s + Z\sqrt{\frac{p_s(1-p_s)}{n}}$   Eq. 6.11
The Interval: $p_s - Z\sqrt{\frac{p_s(1-p_s)}{n}} \le p \le p_s + Z\sqrt{\frac{p_s(1-p_s)}{n}}$
35
𝑝𝑠 = = 0.175
200
Lower Limit: $p_s - Z\sqrt{\frac{p_s(1-p_s)}{n}}$
$LL = 0.175 - (1.645)\sqrt{\frac{0.175 \times 0.825}{200}}$
$LL = 0.1308$
Upper Limit: $p_s + Z\sqrt{\frac{p_s(1-p_s)}{n}}$
$UL = 0.175 + (1.645)\sqrt{\frac{0.175 \times 0.825}{200}}$
$UL = 0.2192$
Note: The production manager has determined with a 90% confidence level
that between 13.08% and 21.92% of the units produced on that day have
some defects relative to the standards of the company.
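The interval can be checked with a minimal Python sketch of Eq. 6.10 and Eq. 6.11. The function name proportion_interval is hypothetical, and Z = 1.645 is the table value for the 90% confidence level used in the computation above.

```python
import math

def proportion_interval(successes, n, z):
    """Confidence interval estimate for a population proportion."""
    p_s = successes / n                              # sample proportion, Eq. 6.6
    margin = z * math.sqrt(p_s * (1 - p_s) / n)
    return p_s - margin, p_s + margin

ll, ul = proportion_interval(successes=35, n=200, z=1.645)
print(round(ll, 4), round(ul, 4))
```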
Name: ________________________ Course: ________
Classroom Activity No. 6.3 Section: ________
152
2. Determine the critical value of t in each of the following:
a. 1 − α = 0.96, n = 16
b. 1 − α = 0.95, n = 25
c. 1 − α = 0.94, n = 36
d. 1 − α = 0.92, n = 49
e. 1 − α = 0.90, n = 64
153
3. The local branch manager of a universal bank desires an estimate of the
average amount in depositors’ savings accounts in the bank. He selected
30 depositors at random, and the results show a sample mean of P47,500
and a sample standard deviation of P12,000.
a. Assuming a normal distribution, set up a 90% confidence interval
estimate of the average amount held in all depositors’ savings
accounts.
b. If a depositor has P40,000 in his passbook, is this unusual?
154
CHAPTER 7
FUNDAMENTALS OF HYPOTHESIS
TESTING
TOPIC LESSON
1. Null and Alternative Hypothesis
2. Level of Significance, Rejection Region
3. Types of Error in Hypothesis Testing
4. Z – Test
5. T - Test
OBJECTIVES
For the students to:
1. Illustrate null and alternative hypotheses including
significance level and rejection region.
2. Calculate the probabilities of committing a type I and II
error.
3. Formulate null and alternative hypotheses.
4. Identify the form of test to be done.
5. Compute the value of the test statistic.
6. Formulate the hypotheses on a population proportion.
7. Draw conclusions from the results of the hypothesis testing.
8. Analyse problems involving hypothesis testing.
We now turn to another phase of statistics, inferential statistics. A
step-by-step methodology for testing hypotheses about population and sample
parameters is presented in the middle part of this chapter. We will compare
the results we observe with the results we would expect to get if a
hypothetical assumption were actually true.
The word “decision” is one of the most frequently used terms in
modern statistics. Inferential statistics plays a very important role in
the construction and analysis of the standards and principles on which
decisions are based. A major function of many businessmen is to make
decisions every day: whether or not to produce a product, whether or not
to enter into another product, whether or not to buy new equipment, and
the like.
There is always the possibility of making an error when drawing a
decision based on the table shown. When a decision is made, the error
may be either a type I or a type II error, but not both at the same
time. To summarize them we have:
Level of significance:
158
One-tailed and two-tailed test:
Steps in hypothesis-testing:
159
3. When the population standard deviation is given, use the areas under
the normal curve for a z-test. If the given standard deviation is that
of the sample, use Student’s t-distribution for a t-test.
4. Determine the tabular value for the test.
a. For z-test:
i. Use the z-values on the table of areas under normal curve
(chapter 5)
b. For a t-test:
i. Determine the degrees of freedom.
ii. Look for the corresponding value from the t-distribution
(chapter 6).
1. Single sample: df = n-1.
2. For two samples, df = n1 + n2 – 2
Where:
n1 = number of items in the first sample
n2 = number of items in the second sample
5. Determine the computed value of z or t based on the given data using
the following formulas:
a. For z-test:
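i. When the sample mean, the population mean and standard
deviation, and the sample size are given (comparing the
sample mean with the population mean):
$$Z = \frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma} \qquad \text{(Eq. 7.1)}$$
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \qquad \text{(Eq. 7.2)}$$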
b. For t-test:
i. When the sample mean, the population mean, the sample standard
deviation, and the sample size are given (comparing the
sample mean with the population mean):
$$t = \frac{\sqrt{n-1}\,(\bar{X} - \mu)}{s} \qquad \text{(Eq. 7.4)}$$
Example 1. The census in one school campus shows that the mean weight of
college students is 48 kilos, with a standard deviation of 2
kilos. A sample of 36 students was found to have a mean
weight of 47 kilos. Are the 36 students really lighter than the
rest, using the 0.05 significance level?
Step 1. Ho: The 36 college students are not really lighter than
the rest.
Ha: The 36 college students are really lighter than the
rest. This hypothesis is directional and suited to a
one-tailed test.
Tabular value: z = ±1.645
AREAS OF A NORMAL CURVE
Z 0 1 2 3 4 5 6 7 8 9
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0754
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2258 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2518 0.2549
0.7 0.2580 0.2612 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2996 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
$$z = \frac{\sqrt{36}\,(47 - 48)}{2}$$
$$z = \frac{(6)(-1)}{2}$$
$$z = -3.0$$
Step 6. The absolute computed value of 3.0 is greater than the
tabular value of 1.645. Therefore, reject the null
hypothesis. The 36 students are really lighter than the
rest.
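Example 1 can be checked numerically with a small Python sketch of Eq. 7.1. It uses the figures stated in the problem (μ = 48, σ = 2, n = 36, sample mean 47). The helper names `one_sample_z` and `reject_null` are illustrative, and the tabular value 1.645 is supplied by hand.

```python
import math

def one_sample_z(sample_mean, pop_mean, sigma, n):
    """Computed z for comparing a sample mean with the population mean (Eq. 7.1)."""
    return math.sqrt(n) * (sample_mean - pop_mean) / sigma

def reject_null(computed, tabular):
    """Reject Ho when the absolute computed value exceeds the tabular value."""
    return abs(computed) > abs(tabular)

z = one_sample_z(sample_mean=47, pop_mean=48, sigma=2, n=36)
print(z)                        # -3.0
print(reject_null(z, 1.645))    # True
```

The negative sign only records that the sample mean lies below the population mean; the decision compares absolute values.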
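Step 5. The given values in the problem are:
$\bar{x}_1$ = P200, $n_1$ = 200
$\bar{x}_2$ = P195, $n_2$ = 220
$\sigma$ = P20
Formula: $$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
$$z = \frac{200 - 195}{20\sqrt{\dfrac{1}{200} + \dfrac{1}{220}}}$$
$$z = \frac{5}{20\sqrt{0.005 + 0.0045}}$$
$$z = \frac{5}{20(0.0977)}$$
$$z = \frac{5}{1.954}$$
$$z = +2.559$$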
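The same two-sample computation can be written as a short Python sketch of Eq. 7.2. The function name `two_sample_z` is an illustrative choice; the inputs are the given values of the problem.

```python
import math

def two_sample_z(mean1, mean2, sigma, n1, n2):
    """Computed z for comparing two sample means with a common sigma (Eq. 7.2)."""
    return (mean1 - mean2) / (sigma * math.sqrt(1 / n1 + 1 / n2))

z = two_sample_z(mean1=200, mean2=195, sigma=20, n1=200, n2=220)
print(round(z, 2))   # 2.56
```

Carrying full precision instead of rounding the square root to 0.0977 changes the result only in the fourth decimal place.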
significance.
Solution: This is comparing two proportions
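Formula: $$z = \frac{p_1 - p_2}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$$
$$z = \frac{0.50 - 0.60}{\sqrt{\dfrac{0.50(0.50)}{100} + \dfrac{0.60(0.40)}{180}}}$$
$$z = \frac{-0.10}{\sqrt{\dfrac{0.25}{100} + \dfrac{0.24}{180}}}$$
$$z = \frac{-0.10}{\sqrt{0.00383}}$$
$$z = \frac{-0.10}{0.062} = -1.62$$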
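A minimal Python sketch of the two-proportion formula follows. The function name `two_proportion_z` is illustrative, and q is taken as 1 − p, which matches the values substituted in the solution.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Computed z for comparing two sample proportions, with q = 1 - p."""
    q1, q2 = 1 - p1, 1 - p2
    return (p1 - p2) / math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)

z = two_proportion_z(p1=0.50, n1=100, p2=0.60, n2=180)
print(round(z, 2))   # -1.62
```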
Step 4. Determine the degrees of freedom:
df = n – 1, df = 50 – 1, df = 49
The tabular value for a one-tailed test when df = 49 and the
level of significance = 0.05 is 1.676 (by interpolation).
STUDENT'S t-DISTRIBUTION
df 0.400 0.300 0.200 0.100 0.050 0.025 0.010 0.005
1 0.325 0.727 1.376 3.078 6.314 12.706 31.821 63.657
2 0.289 0.617 1.061 1.886 2.920 4.303 6.965 9.925
3 0.277 0.584 0.978 1.638 2.353 3.182 4.541 5.841
4 0.271 0.569 0.941 1.533 2.132 2.776 3.747 4.604
5 0.267 0.559 0.920 1.476 2.015 2.571 3.365 4.032
6 0.265 0.553 0.906 1.440 1.943 2.447 3.143 3.707
7 0.263 0.549 0.896 1.415 1.895 2.365 2.998 3.499
8 0.262 0.546 0.889 1.397 1.860 2.306 2.896 3.355
9 0.261 0.543 0.883 1.383 1.833 2.262 2.821 3.250
10 0.260 0.542 0.879 1.372 1.812 2.228 2.764 3.169
11 0.260 0.540 0.876 1.363 1.796 2.201 2.718 3.106
12 0.259 0.539 0.873 1.356 1.782 2.179 2.681 3.055
13 0.259 0.538 0.870 1.350 1.771 2.160 2.650 3.012
14 0.258 0.537 0.868 1.345 1.761 2.145 2.624 2.977
15 0.258 0.536 0.866 1.341 1.753 2.131 2.602 2.947
16 0.258 0.535 0.865 1.337 1.746 2.120 2.583 2.921
17 0.257 0.534 0.863 1.333 1.740 2.110 2.567 2.898
18 0.257 0.534 0.862 1.330 1.734 2.101 2.552 2.878
19 0.257 0.533 0.861 1.328 1.729 2.093 2.539 2.861
20 0.257 0.533 0.860 1.325 1.725 2.086 2.528 2.845
21 0.257 0.532 0.859 1.323 1.721 2.080 2.518 2.831
22 0.256 0.532 0.858 1.321 1.717 2.074 2.508 2.819
23 0.256 0.532 0.858 1.319 1.714 2.069 2.500 2.807
24 0.256 0.531 0.857 1.318 1.711 2.064 2.492 2.797
25 0.256 0.531 0.856 1.316 1.708 2.060 2.485 2.787
26 0.256 0.531 0.856 1.315 1.706 2.056 2.479 2.779
27 0.256 0.531 0.855 1.314 1.703 2.052 2.473 2.771
28 0.256 0.530 0.855 1.313 1.701 2.048 2.467 2.763
29 0.256 0.530 0.854 1.311 1.699 2.045 2.462 2.756
30 0.256 0.530 0.854 1.310 1.697 2.042 2.457 2.750
40 0.255 0.529 0.851 1.303 1.684 2.021 2.423 2.704
60 0.254 0.527 0.848 1.296 1.671 2.000 2.390 2.660
Step 5. The given values in the problem are:
𝑥̅ = 1.55 meters
𝜇 = 1.525 meters
𝑠 = 0.15 meters
n = 50
Formula: $$t = \frac{\sqrt{n-1}\,(\bar{x} - \mu)}{s}$$
$$t = \frac{\sqrt{49}\,(1.55 - 1.525)}{0.15}$$
$$t = \frac{7(0.025)}{0.15}$$
$$t = 1.17$$
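The computed t can be verified with a short Python sketch that follows the chapter's form of the statistic, with the square root of n − 1 in the numerator. The function name `one_sample_t` is an illustrative choice, and the tabular value 1.676 is taken from Step 4.

```python
import math

def one_sample_t(sample_mean, pop_mean, s, n):
    """Computed t in the chapter's form of the statistic (Eq. 7.4)."""
    return math.sqrt(n - 1) * (sample_mean - pop_mean) / s

t = one_sample_t(sample_mean=1.55, pop_mean=1.525, s=0.15, n=50)
df = 50 - 1
print(df, round(t, 2))    # 49 1.17
print(abs(t) > 1.676)     # False
```

Since 1.17 does not exceed the tabular value of 1.676, the comparison in the last line comes out false.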
of the administered test, can we say that the case method is
more effective than the traditional method?
df = n1 + n2 – 2
df = 24 + 20 – 2 = 42
Tabular value of t from the table of Student’s t-distribution
is t = 1.683 for $\alpha$ = 0.05 (by interpolation between df = 40 and df = 60).
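Formula: $$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)(s_1)^2 + (n_2 - 1)(s_2)^2}{n_1 + n_2 - 2}}\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
$$t = \frac{30 - 25}{\sqrt{\dfrac{(24 - 1)(3)^2 + (20 - 1)(2.5)^2}{24 + 20 - 2}}\,\sqrt{\dfrac{1}{24} + \dfrac{1}{20}}}$$
$$t = \frac{5}{\sqrt{7.756}\,\sqrt{0.0917}}$$
$$t = 5.93$$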
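The two-sample t statistic is easy to get wrong by hand because of the nested square roots, so a small Python sketch is useful as a check. The function name `pooled_t` is illustrative; the inputs are the values substituted in the formula above (means 30 and 25, standard deviations 3 and 2.5, sample sizes 24 and 20).

```python
import math

def pooled_t(mean1, mean2, s1, s2, n1, n2):
    """Computed t for two independent samples using the pooled variance."""
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / standard_error

t = pooled_t(mean1=30, mean2=25, s1=3, s2=2.5, n1=24, n2=20)
df = 24 + 20 - 2
print(df, round(t, 2))   # 42 5.93
```

Here the pooled variance is 325.75 / 42, about 7.756, evaluated directly from the formula.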
172
Name: ________________________ Course: ________
Classroom Activity No. 7.1 Section: ________
173
2. Capitol Steel Manufacturing Co. is producing steel wire with an average
tensile strength of 150 lbs. A random sample of 36 pieces in a laboratory
test shows that the mean tensile strength is 145 lbs. and the standard
deviation is 3 lbs. Is the sample really below the average tensile
strength?
174
3. A bus company is looking for a better tire to be used on its buses.
It would like to adopt the steel belted brand (A) unless there is some
evidence that the nylon belted brand (B) is better. An experiment was
conducted in which 25 tires from each brand were used. The tires were run
under the same conditions until they wore out. The following are the results:
Brand A: $\bar{x}_1$ = 38,500 kms, s1 = 2,400 kms
Brand B: $\bar{x}_2$ = 37,000 kms, s2 = 1,200 kms
What would be the conclusion?
175
Name: ________________________ Course: ________
Homework No. 7.1 Section: ________
176
2. Freshmen in a particular school are given entrance examinations in a
number of fields including Mathematics. Over a period of years, it has
been found that the average score in the Math examination is 85 with
standard deviation of 6. A Math professor examined the scores of his
class of 36 and found out that their average is 86. Can he claim that the
average score has increased?
177
3. A fisherman decides that he needs a line that can hold a weight of 15 lbs.,
the size of fish he wants. He randomly tests 16 pieces of brand P line and
finds a sample mean of 16.50 lbs. If the standard deviation of brand P
is 1.2 lbs., what can be concluded about brand P?
178
CHAPTER 8
SIMPLE REGRESSION &
CORRELATION
TOPIC LESSON
1. Bivariate Data
2. Scatter Diagram
3. Pearson Coefficient
4. Dependent and Independent Variables
5. Regression Line
OBJECTIVES
For the students to:
1. Illustrate the nature of bivariate data.
2. Construct a scatter diagram.
3. Calculate the Pearson coefficient.
4. Solve problems involving correlations.
5. Identify dependent and independent variables.
6. Plot the regression line in scatter diagram.
7. Compute the slope of regression line.
8. Determine values of dependent variables.
9. Evaluate problems involving regression analysis.
179
Simple Regression Analysis:
For example, given the heights of certain people, their corresponding
weights can be taken. We can then estimate or predict the possible height of a
person whose weight is known, say, 150 lbs. In this case, the height
corresponding to 150 lbs. is not known in the original data.
Scatter Diagram
[Figure: scatter diagram of the paired data; vertical axis (Y) from 60 to 70, horizontal axis (X) from 105 to 155 in steps of 5.]
We are assuming that the heights depend on the weights; hence, the
values of Y here depend on the values of X. Let us estimate the value of Y
when X is 150 lbs.
Note: While the values of X and Y are both increasing, it doesn’t mean that
their increments vary uniformly. If the variations were uniform, there
would be no need to resort to computing an estimated value of Y.
2. The second method uses the regression formula. This gives the computed
estimate of Y when X is 150 lbs. This method makes use of the
scatter diagram, which consists of plotting the points corresponding to
the paired values of X and Y on an X-Y axes system.
Note that the trend line fulfils the conditions mentioned earlier:
1. It approximates the general direction of the points.
2. It passes through the points.
3. The sum of the vertical distances (from the points to
the trend line) of the points above the line is
approximately equal to the sum of the vertical distances
of the points below the line. One can check this by
using a ruler.
Scatter Diagram Trend Line
[Figure: scatter diagram with trend lines drawn, including lines labelled N and P; vertical axis (Y) from 60 to 70, horizontal axis (X) from 105 to 155 in steps of 5.]
Lines M and N do not fulfil the three conditions and are obviously
incorrect trend lines. Also, there’s no need for the trend line to pass
through the first or last points. If it ever does, it is just a matter of
coincidence. Nor is there any need for an equal number of points above
and below the trend line.
The trend line is a straight line. We know from algebra that a straight
line has an equation which follows the form:
$$y = mx + b \qquad \text{(Eq. 8.1)}$$
where m = slope of the line and b = y-intercept.
Statistics provides formulas for finding the values of m and b, but
let us first explain briefly why the LSRL is used. The LSRL is the regression
line of “Least Squares”, which means that the most precise trend line that
may be drawn is the one for which the sum of the squares of the vertical
distances of the points from the line is least, or minimum. Every line
other than the LSRL will yield a higher sum.
This is similar to saying that the sum of the vertical distances of
the points above the line should be equal to the sum of the vertical distances
of the points below the line. If they are not equal, then the sum of the
squares of the vertical distances below and above the line is not minimum.
Let us illustrate the use of these formulas by using the same example
we used in explaining the graphical approach.
EXCEL Computation
No.     X        Y          XY            X²
1      105     62.00     6,510.00     11,025.00
2      110     62.50     6,875.00     12,100.00
3      115     63.50     7,302.50     13,225.00
4      120     64.00     7,680.00     14,400.00
5      125     64.50     8,062.50     15,625.00
6      130     65.00     8,450.00     16,900.00
7      135     65.00     8,775.00     18,225.00
Σ      840    446.50    53,655.00    101,500.00
b = 50.93
m = 0.11
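The Formulas:
$$b = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N(\sum X^2) - (\sum X)^2}$$
$$m = \frac{N(\sum XY) - (\sum X)(\sum Y)}{N(\sum X^2) - (\sum X)^2} \qquad \text{(Eq. 8.3)}$$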
Analytical Computations:
$$b = \frac{(446.5)(101{,}500) - (840)(53{,}655)}{7(101{,}500) - (840)^2}$$
$$b = 50.93$$
$$m = \frac{N(\sum XY) - (\sum X)(\sum Y)}{N(\sum X^2) - (\sum X)^2}$$
$$m = \frac{7(53{,}655) - (840)(446.5)}{7(101{,}500) - (840)^2}$$
$$m = 0.107$$
$$Y = 0.107X + 50.93$$
When X = 150:
$$Y = 0.107(150) + 50.93$$
$$Y = 66.98 \text{ inches}$$
This result is not very far from the result of 67.10 inches obtained
by the graphical method.
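The LSRL computation can be reproduced from the raw X and Y columns with a short Python sketch. The function name `least_squares_line` is an illustrative choice; it applies the same sums used in the analytical computation. Because it keeps the slope at full precision instead of rounding it to 0.107, the estimate at X = 150 comes out at 67.0 and not 66.98.

```python
def least_squares_line(xs, ys):
    """Return slope m and intercept b of the least squares regression line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x**2
    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    return m, b

# X and Y columns of the EXCEL computation table
weights = [105, 110, 115, 120, 125, 130, 135]
heights = [62.0, 62.5, 63.5, 64.0, 64.5, 65.0, 65.0]

m, b = least_squares_line(weights, heights)
estimate = m * 150 + b
print(round(m, 3), round(b, 2))   # 0.107 50.93
print(round(estimate, 2))         # 67.0
```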
The LSRL method is very useful in providing a fairly accurate
estimate when the values of the dependent and independent variables are given.
The method presumes that one variable depends on the other variable.
It also presumes that the trend line is approximately straight.
Another type of problem of this kind is the time series. Here, the values
of one variable corresponding to several years are known. The LSRL trend
line equation is still employed, with a little modification.
Suppose these are the sales of a certain firm for seven years.
Assuming that the trend will continue in the near future, what would be the
forecast for 2011 and 2012?
2004 P 450
2005 470
2006 495
2007 510
2008 540
2009 580
2010 630
$$b = \frac{(3{,}675)(28{,}196{,}371) - (14{,}049)(7{,}376{,}530)}{7(28{,}196{,}371) - (14{,}049)^2} = -57{,}176.25$$
$$m = \frac{N(\sum XY) - (\sum X)(\sum Y)}{N(\sum X^2) - (\sum X)^2}$$
$$m = \frac{7(7{,}376{,}530) - (14{,}049)(3{,}675)}{7(28{,}196{,}371) - (14{,}049)^2} = 28.75$$
No. of Yrs     X       Y        XY          X²
1            2004     450     901800     4016016
2            2005     470     942350     4020025
3            2006     495     992970     4024036
4            2007     510    1023570     4028049
5            2008     540    1084320     4032064
6            2009     580    1165220     4036081
7            2010     630    1266300     4040100
Σ           14049    3675    7376530    28196371
b = -57176.25
m = 28.75
Y = 28.75X – 57,176.25
Y = 28.75(2011) – 57,176.25
Y = 640 Million
Y = 28.75(2012) – 57,176.25
Y = 668.75 Million
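The sales forecast can be checked with the same least squares sums, using the year as X. This is a minimal Python sketch; the helper names `least_squares_line` and `forecast` are illustrative choices.

```python
def least_squares_line(xs, ys):
    """Return slope m and intercept b of the least squares regression line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x**2
    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    return m, b

years = [2004, 2005, 2006, 2007, 2008, 2009, 2010]
sales = [450, 470, 495, 510, 540, 580, 630]

m, b = least_squares_line(years, sales)

def forecast(year):
    """Sales estimate from the fitted trend line Y = mX + b."""
    return m * year + b

print(m, b)             # 28.75 -57176.25
print(forecast(2011))   # 640.0
print(forecast(2012))   # 668.75
```

The slope of 28.75 is the average yearly increase in sales implied by the trend line, which is why the 2012 forecast is exactly 28.75 more than the 2011 forecast.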
Simple Correlation analysis:
Correlation and regression analysis are closely related in
terms of the analysis of variables. Regression analysis deals with the
projection or estimation of one variable from one or two other variables.
Correlation analysis deals with the relationship of one variable to another
variable. The strength of the relationship between two variables is
measured by the coefficient of correlation (r).
Examples
3. Grades of student against study time
4. GNP against total investment
5. Work output against number of years in experience
The Formula:
$$r = \frac{N(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[N(\sum X^2) - (\sum X)^2][N(\sum Y^2) - (\sum Y)^2]}} \qquad \text{(Eq. 8.4)}$$
The terms in this formula are the same terms used in computing the
values of the parameters m and b in regression analysis. To solve for r, we
have to add one more column for $Y^2$.
EXCEL Computation:
X Y XY X² Y²
A 65 66 4290 4225 4356
B 66 68 4488 4356 4624
C 67 66 4422 4489 4356
D 69 70 4830 4761 4900
E 64 67 4288 4096 4489
F 65 65 4225 4225 4225
G 72 70 5040 5184 4900
H 65 68 4420 4225 4624
I 69 71 4899 4761 5041
J 66 67 4422 4356 4489
K 67 68 4556 4489 4624
L 65 68 4420 4225 4624
Σ 800 814 54300 53392 55252
r 0.728705
A cursory inspection of the data shows that the eldest sons are
generally taller than their fathers, except for C, F and G. This relationship
between the heights of father and son describes some degree of
correlation.
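$$r = \frac{N(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[N(\sum X^2) - (\sum X)^2][N(\sum Y^2) - (\sum Y)^2]}}$$
$$r = \frac{12(54{,}300) - (800)(814)}{\sqrt{[12(53{,}392) - (800)^2][12(55{,}252) - (814)^2]}}$$
$$r = \frac{651{,}600 - 651{,}200}{\sqrt{(640{,}704 - 640{,}000)(663{,}024 - 662{,}596)}}$$
$$r = \frac{400}{\sqrt{301{,}312}} = +0.7287 = +0.73$$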
Interpretation: The result r = +0.73 could be initially interpreted
as follows: there is some degree of correlation
between the heights of the fathers and the eldest
sons.
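The coefficient can be recomputed from the X and Y columns of the table with a short Python sketch of Eq. 8.4. The function name `pearson_r` is an illustrative choice, and the two lists simply copy the X and Y columns for cases A to L.

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation r from paired values (Eq. 8.4)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator

# X and Y columns for cases A to L
x_heights = [65, 66, 67, 69, 64, 65, 72, 65, 69, 66, 67, 65]
y_heights = [66, 68, 66, 70, 67, 65, 70, 68, 71, 67, 68, 68]

r = pearson_r(x_heights, y_heights)
print(round(r, 4))   # 0.7287
```

The function reuses the sums already needed for m and b in regression and adds only the sum of Y squared, which is the extra column mentioned in the text.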
193
Name: ________________________ Course: ________
Classroom Activity No. 8.1 Section: ________
1. Given the values of:
X: 7, 12, 14, 16, 23, 27, 28, 34, 40
Y: 8, 11, 15, 18, 22, 28, 30, 36, 42
194
2. The following data shows the grades of ten students in Algebra and
Statistics.
Alg. (X) 83 88 90 69 82 79 95 88 83 77
Stat. (Y) 80 85 86 78 86 87 94 84 88 82
195
Name: ________________________ Course: ________
Assignment No. 8.1 Section: ________
1. Suppose these are the sales of a certain firm for seven years. Assuming
that the trend will continue in the near future, what would be the forecast for
2016 and 2017?
2009 P 1,450
2010 1,670
2011 1,950
2012 2,510
2013 2,950
2014 3,580
2015 3,630
196
2. Supposing, these are the sales and gross profit of a certain firm for seven
years. Test the correlation of the two variables.
197