
INTRODUCTION

Prepared by:
REBECCA K. CAJUCOM

rkcajucom 1
STATISTICS

• Statistics is defined in two ways:
• In its singular sense, it is the branch of science that deals with the collection, presentation, analysis and interpretation of quantitative data.
• In its plural sense, it is a set of quantitative data.
• Example: vital statistics of 36-24-36

Uses of Statistics
• Business Research
• Market Research
• Economic Research
• Product Control – quality control, price control, and volume of production
• Life Insurance
• Employer-Employee Relationships
• Etc.

Branches of Statistics

STATISTICS
• Descriptive Statistics
• Inferential Statistics
– Problem of Estimation: Point Estimation; Interval Estimation
– Tests of Hypotheses: Parametric Tests; Non-parametric Tests
• Mathematical Statistics

Two (2) Divisions of Statistics
• Descriptive Statistics – the division wherein one describes a given set of data by a single measure called a statistical description.
• Inferential Statistics – the division wherein one estimates population parameters from samples and accepts or rejects specific assertions about populations on the basis of samples. It covers:
a. Problems of Estimation
b. Tests of Hypotheses

Population vs. Sample
• Population – a set of data consisting of all conceivably possible (or hypothetically possible) observations of a given phenomenon.
• Sample – a set of data consisting of only a part of the observations of a population.

Terms Needed in the Discussion

                                         Sample                 Population
Method of gathering facts of interest    Survey                 Census
Collection of data                       Sample distribution    Population distribution
Computed value                           Statistic              Parameter
                                         (Roman alphabet)       (Greek alphabet)

Symbols Used

                            Sample Statistic    Population Parameter
Mean                        x̄                   μ (mu)
Standard Deviation          s                   σ (small Greek letter sigma)
Variance                    s²                  σ²
Proportion, Probability,    p                   π (pi)
  Percentage
Size                        n                   N
Regression Coefficients     a, b                α, β (alpha, beta)
Correlation Coefficient     r                   ρ (rho)

Variables and Their Classifications
• A variable is a characteristic that can vary in value among subjects in a sample or population.
• It can be classified as:
a. Qualitative vs. Quantitative
b. Discrete vs. Continuous
c. Dependent vs. Independent
d. Experimental vs. Observational

Qualitative vs. Quantitative
• Qualitative variable – a variable whose measurement scale is a set of unordered categories.
• Quantitative variable – a variable whose possible values differ in magnitude: each possible value is greater than or less than any other possible value.

Discrete vs. Continuous
• Discrete variable – a variable whose possible values form a set of separate numbers, such as 0, 1, 2, . . . (a finite or countable set).
Example: number of children, number of murders
• Continuous variable – a variable that can take any value in an infinite continuum of possible real-number values.
Example: measurements such as height, weight, age, and the amount of time it takes to read a passage of a book

Dependent vs. Independent
• Dependent (response) variable – the outcome variable about which comparisons are made. The goal is to investigate the degree to which the response on that variable depends on the group to which the subject belongs.
• Independent (explanatory) variable – the variable that defines the groups.

Experimental vs. Observational
• Experimental data – data resulting from a planned experiment. The major purpose of many experiments is to compare the responses of subjects on some outcome measure under different conditions, called treatments. To obtain these data, the researcher needs a plan (called an experimental design) for assigning subjects to the different conditions being compared.
• Observational data – data obtained from surveys. The researcher measures subjects' responses on the variables of interest but has no experimental control over the subjects.

Four (4) Basic Levels of Measurement:

1. Nominal data
2. Ordinal data
3. Interval data
4. Ratio data

Levels of Measurement
• Nominal data – when the measure assigned to an item is a label used to identify the item.
• Example:
a. Numbers on baseball uniforms – labels used to identify players.
b. Democrat, Republican or Independent – labels used to identify political category.

Levels of Measurement
• Ordinal data – when the measures assigned permit the items to be ordered with respect to some criterion.
• Example:
a. Size of automobile – compact, intermediate or full size.
b. Class rank assigned to each student based on grade point average.

Levels of Measurement
• Interval data – when there is a fixed numerical unit of measurement and each measure assigned is expressed as a quantity of those units.
• Example:
Measurement of temperature – the unit of measurement is the degree. A reading of 0° does not mean the absence of temperature, so ratios such as "twice as hot" are not meaningful.

Levels of Measurement
• Ratio data – when there is a fixed unit of measure and the zero point is inherently defined by the scale of measurement (a true zero).
• Example:
Physical distance such as length or width, since a value of zero denotes the absence of any distance.

Kinds of Distributions
• Qualitative Distribution
• Quantitative Distribution
a. Frequency Distribution
b. Probability Distribution
c. Sampling Distribution

Three Ways of Collecting Data
• One may ask people questions.
• One may observe the behavior of persons, groups or outcomes.
• One may use existing records of data from sources other than one's own research.

Three Ways of Presenting Data
• Tabular Form
• Textual (or Paragraph) Form
• Graphical Form
– line diagram or line curve
– pictograph
– pie chart
– bar chart
– statistical map
• dot map
• flow map
• cross-hatched map

Tabular Form

Table II
Percentage Distribution of the Students Included in the Sample
(According to Batch Year)

Batch Year    Number of Students    Percentage (%)
1993 – 94     20                    30.303
1994 – 95     14                    21.212
1995 – 96     10                    15.152
1996 – 97     22                    33.333
Total         66                    100.000

Textual (or Paragraph) Form
• Table II shows that, of the 66 students in the sample,
– 30.303% came from Batch year 1993 – 94,
– 21.212% from Batch year 1994 – 95,
– 15.152% from Batch year 1995 – 96, and
– 33.333% from Batch year 1996 – 97.

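Each percentage in Table II is the batch count divided by the total of 66, times 100. A minimal Python sketch of that conversion, rounding to three decimals as the table does (the function name `percentage_distribution` is only illustrative):

```python
def percentage_distribution(counts, decimals=3):
    """Convert a {category: count} table into {category: percent of total}."""
    total = sum(counts.values())
    return {cat: round(100 * n / total, decimals) for cat, n in counts.items()}

# Counts taken from Table II
batch_counts = {"1993-94": 20, "1994-95": 14, "1995-96": 10, "1996-97": 22}
shares = percentage_distribution(batch_counts)
for batch, pct in shares.items():
    print(f"{batch}: {pct:.3f}%")
```

Rounded percentages need not add up to exactly 100 in general; for Table II they happen to total 100.000%.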
Graphical Form
FIGURE 2. LINE DIAGRAM: number of students (0–10) by combined grades in ECO 4 & ECO 5 (classes 80–99, 100–119, 120–139, 140–159, 160–179, 180–199), with one line for each batch year (1993–94, 1994–95, 1995–96, 1996–97).

Graphical Form
FIGURE 3. BAR CHART: horizontal bars showing the number of students (0–10) in each class of combined grades in ECO 4 & ECO 5 (80–99 to 180–199), by batch year (1993–94 to 1996–97).

Graphical Form
FIGURE 3. BAR CHART: vertical bars showing the number of students (0–10) in each class of combined grades in ECO 4 & ECO 5 (80–99 to 180–199), by batch year (1993–94 to 1996–97).

Graphical Form
FIGURE 3. BAR CHART: vertical bars showing the number of students (axis 0–25) in each class of combined grades in ECO 4 & ECO 5 (80–99 to 180–199), by batch year (1993–94 to 1996–97).

Graphical Form
FIGURE 3. BAR CHART: another vertical-bar version (axis 0–10) of the number of students by combined grades in ECO 4 & ECO 5 (80–99 to 180–199) and batch year (1993–94 to 1996–97).

Graphical Form
FIGURE 1. PIE CHART: Percentage Distribution of Students Included in the Sample (According to Batch Year): 1993–1994, 30.303%; 1994–1995, 21.212%; 1995–1996, 15.152%; 1996–1997, 33.333%.

Sample Design
• A sample design is a definite plan, determined completely before any data are actually collected, for obtaining a sample from a given population.

Kinds of Sampling
• Probability Sampling – sampling wherein each element has a known, nonzero chance of being chosen (an equal chance, in simple random sampling).
• Non-probability Sampling – sampling wherein the elements are not chosen by a random mechanism, so their chances of being chosen are unknown and generally unequal.

Kinds of Probability Sampling
• Simple random sampling
• Systematic sampling with a random start
• Stratified sampling
• Multistage sampling
• Cluster sampling

Sampling Frame
• A sampling frame is a list of all N items in the population, so that each item can be assigned one of the numbers from 1 to N.
• It makes it easy to draw random samples with the aid of computers or a random number table (a table containing a computer-generated sequence of digits in which each digit is equally likely to be any of the integers 0, 1, 2, . . . , 9).

Sources of Variability among Samples
• Sampling error of a statistic – the error that results when a statistic computed from a sample is used to estimate or predict the value of a population parameter. For samples of size 1000, the sampling error for estimating percentages is usually no greater than 3% or 4%.
• Other factors:
(a) Undercoverage of the sampling frame: it may lack representation from some groups in the population of interest.
(b) Nonresponse: subjects who are supposed to be included may refuse to participate, or it may not be possible to reach them.
(c) Response bias created by the interviewer or by other factors affecting the response: respondents might lie if they think their response to a question is socially unacceptable.
(d) The way the variables are measured can have a large impact on the types of results observed.
(e) The order of the questions can influence the results dramatically.
• Missing data – some subjects do not provide responses for some of the variables measured.

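The "3% or 4%" figure for n = 1000 can be checked with the usual large-sample margin of error of a sample proportion, z·√(p(1 − p)/n). That formula is not on the slides; this sketch assumes it, together with simple random sampling, 95% confidence (z = 1.96) and the worst case p = 0.5:

```python
import math

def max_margin_of_error(n, z=1.96):
    """Worst-case (p = 0.5) margin of error of a sample proportion.
    Assumes simple random sampling and roughly 95% confidence (z = 1.96)."""
    p = 0.5
    return z * math.sqrt(p * (1 - p) / n)

print(f"n = 1000: +/- {100 * max_margin_of_error(1000):.1f} percentage points")  # about 3.1
```

The bound shrinks only with √n, so quadrupling the sample size merely halves the sampling error.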
Simple Random Sampling
• The population is listed, and the sampling plan and sample size are fixed. Each possible sample of that size has the same probability of being selected. A table of random numbers is used, and sampling is done without replacement; apart from this, the sampling units are drawn independently.
• Randomization is used as the mechanism for ensuring that the sample is representative enough for inferential methods to apply.

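With a sampling frame numbered 1 to N, a computer can play the role of the random number table. A minimal sketch using Python's standard `random` module; the `seed` argument is a hypothetical convenience for reproducible draws:

```python
import random

def simple_random_sample(N, n, seed=None):
    """Select n of the labels 1..N without replacement;
    every possible set of n labels is equally likely."""
    rng = random.Random(seed)  # seeding makes the draw reproducible
    return sorted(rng.sample(range(1, N + 1), n))

print(simple_random_sample(N=4000, n=10, seed=1))
```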
Systematic Sampling with a Random Start

• Every kth individual in the population list is included in the sample, beginning with an individual chosen at random from among the first k (the random start).
• Sampling interval (or skip number):
$$k = \frac{N}{n}$$
where
N = population size
n = sample size

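A minimal sketch of systematic sampling with a random start over a frame numbered 1 to N. It assumes, for simplicity, that N is a multiple of n, so that k = N/n is a whole number; the function name is illustrative:

```python
import random

def systematic_sample(N, n, seed=None):
    """Systematic sampling with a random start from a frame numbered 1..N.
    Assumes N is a multiple of n so that k = N / n is a whole number."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                                 # sampling interval (skip number)
    start = random.Random(seed).randint(1, k)  # random start among the first k labels
    return [start + i * k for i in range(n)]

print(systematic_sample(N=100, n=10, seed=3))
```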
Stratified Sampling
• Stratified sampling is a procedure that consists of stratifying (or dividing) the population into a number of non-overlapping subpopulations, or strata, and then taking a sample from each stratum.
• Sampling is called proportional if the proportions of the sample chosen from the various strata are the same as those existing in the entire population.

Sample Sizes for Proportional Allocation

• Formula:
$$n_i = \frac{N_i}{N}\, n \qquad \text{for } i = 1, 2, \ldots, k$$
where Ni is the size of the ith stratum and n = n1 + n2 + . . . + nk = the total size of the sample.
• When necessary, use the integers closest to the values given by this formula.

Proportional Allocation Example

• A stratified sample of size n = 60 is to be taken from a population of size N = 4,000, which consists of three strata of sizes N1 = 2,000, N2 = 1,200 and N3 = 800. If the allocation is to be proportional, how large a sample must be taken from each stratum?

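Applying ni = (Ni/N)·n to each stratum answers the example. A minimal sketch that rounds to the nearest integer, as the formula slide suggests (the function name is illustrative):

```python
def proportional_allocation(n, strata_sizes):
    """n_i = (N_i / N) * n for each stratum, rounded to the nearest integer."""
    N = sum(strata_sizes)
    return [round(n * Ni / N) for Ni in strata_sizes]

print(proportional_allocation(60, [2000, 1200, 800]))  # [30, 18, 12]
```

Here the allocations divide evenly and sum to 60. With other sizes, rounding can make the total differ from n by one or two, which then has to be adjusted by hand.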
Cluster Sampling
• The cluster sampling technique is useful when a complete listing of the population is not available. The total population is divided into a number of relatively small subdivisions, and some of these subdivisions, or clusters, are randomly selected for inclusion in the overall sample.
• A cluster sample consists of the subjects in a random sample of the clusters; the clusters themselves are the sampling units.
• If the clusters are geographic subdivisions, this kind of sampling is called area sampling.

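A minimal sketch of one-stage cluster sampling: draw a random sample of clusters, then keep every subject in each chosen cluster. The blocks and household labels below are hypothetical:

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly select m clusters and include every subject in each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # the clusters are the sampling units
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: city blocks and the households listed in each
blocks = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"], "C": ["c1", "c2", "c3", "c4"], "D": ["d1"]}
print(cluster_sample(blocks, m=2, seed=7))
```

Only the chosen blocks need a household listing, which is why the method suits populations without a complete frame.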
Multistage Sampling
• Multistage sampling methods use combinations of various sampling techniques. They are common in social science research because they are simpler to implement than simple random sampling but provide a broader sampling of the population than a single method, such as cluster sampling, provides.

**Cross Stratification
• In a system-wide survey designed to determine the attitude of its students toward, say, a new tuition plan, a state college system with 17 colleges might stratify its sample not only with respect to colleges but also with respect to students' class standing, sex and major. Such cross stratification increases the precision (reliability) of estimates and other generalizations, and it is widely used, particularly in opinion sampling and market research.

**Quota Sampling
• Quota sampling is a convenient, relatively inexpensive, and sometimes necessary procedure, but as it is often executed, the resulting samples do not have the essential features of random samples.
• For instance, in determining voters' attitudes toward increased medical coverage for elderly persons, an interviewer working in a certain area might be told to interview 6 male self-employed homeowners under 30 years of age, 10 female wage-earners in the 45–60 bracket who live in apartments, 3 retired males over 60 who live in trailers, and so on, with the actual selection of the individuals left to the interviewer's discretion.

FREQUENCY
DISTRIBUTION

Frequency Distribution

A frequency distribution is a distribution of the number of observations over arbitrarily defined classes or categories.

Frequency Distribution
• Example 1:
Population Distribution of College Students in a Certain School

Year Level    Number of Students
I             1000
II             500
III            300
IV             200
Total         2000

Frequency Distribution
• Example 2:

Classes     Frequency
2 - 3        2
4 - 5        3
6 - 7        9
8 - 9       11
10 - 11      4
12 - 13      1
            n = 30

Terms Defined
• Classes or class intervals (CI) – the symbols defined by the end numbers, e.g., 2 - 3, 4 - 5, 6 - 7, etc.
• Class limits (CL) – the end numbers of a CI.
a. Lower class limits
b. Upper class limits
• Class boundaries (CB) – the true limits of a CI.
a. Lower class boundaries
b. Upper class boundaries
• Class frequencies (fi, where the subscript i refers to the number of the class interval) – the number of observations falling in a CI.
• Class size (c) – the width of a CI, which is the absolute difference between 2 successive lower class limits or 2 successive upper class limits, OR the difference between the upper and lower class boundaries of a CI.
• Class mark (xi) – the midpoint or mid-value of a CI, obtained by taking ½ the sum of the class limits of the CI.
• Range (R) – the absolute difference between the lowest value (LV) and the highest value (HV).

Steps in the Frequency Distribution
1. Construction of the frequency distribution
Steps:
a. Determine the range (R).
b. Decide on the number of CIs desired. Choose any number from 6 to 16.
c. Compute the class size to be used:
$$c = \frac{R}{\text{No. of class intervals desired}}$$
d. Enumerate the class limits by starting with the lowest value as the first lower class limit, then adding the class size c from there on. Do the same with the upper class limits. Be sure that the highest value is included in the last CI.
e. Make a tally sheet of the observations under each CI to determine the frequency falling in each class.

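The construction steps can be sketched in Python for whole-number data. Two choices here are assumptions, not rules from the slides: c is rounded up to a whole number (and is at least 1), and classes keep being added until the highest value is covered. The raw data are hypothetical, chosen so that the result reproduces Example 2:

```python
import math

def frequency_distribution(data, num_classes):
    """Return (lower limit, upper limit, frequency) rows for whole-number data."""
    lv, hv = min(data), max(data)
    R = hv - lv                                 # a. range
    c = max(1, math.ceil(R / num_classes))      # c. class size, rounded up
    rows = []
    lower = lv                                  # d. first lower class limit = lowest value
    while lower <= hv:                          # ensure HV falls in the last CI
        upper = lower + c - 1                   # upper class limit for whole numbers
        freq = sum(lower <= x <= upper for x in data)  # e. tally
        rows.append((lower, upper, freq))
        lower += c
    return rows

# Hypothetical raw data (n = 30) consistent with Example 2
EXAMPLE_DATA = [2, 3, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7,
                8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 11, 11, 13]
for lo, hi, f in frequency_distribution(EXAMPLE_DATA, 6):
    print(f"{lo} - {hi}: {f}")
```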
Steps in the Frequency Distribution

2. Graphical presentation of the frequency distribution using:
a. Frequency polygon
b. Histogram
3. Cumulative frequencies and their ogives
4. Conversion of the frequency distribution into its percentage distribution

Stem and Leaf Plots
• A stem and leaf plot is a graphical representation that shows each observation by its leading digit(s) (the stem) and its final digit (the leaf). Each stem is a number to the left of the vertical bar, and each leaf is a number to the right of it.
• It conveys much of the same information as a histogram.
• It is useful for quick portrayals of small sets of data.
• It can provide simple visual comparisons of two relatively small samples on a quantitative variable.

Stem and Leaf Plot (example)
Stem Leaf
1 6 7
2 0 3 9
3 0 1 4 4 4 6 8 9 9 9
4 4 6
5 0 2 3 8
6 0 3 4 6 8 9
7 5
8 0 3 4 6 9
9 0 8
10 2 2 3 4
11 3 3 4 4 6 9
12 7
13 1 3 5

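A plot like the example above can be generated by splitting each value into its tens (stem) and units (leaf). A minimal sketch, assuming non-negative whole numbers; empty stems are still printed so that gaps stay visible:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Stem = all digits except the last, leaf = the last digit.
    Assumes non-negative integers."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    lines = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)

# The first two rows of the example: 16, 17, 20, 23, 29
print(stem_and_leaf([16, 17, 20, 23, 29]))
```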
STATISTICAL DESCRIPTIONS

Statistical Descriptions

• A statistical description is a single measure used to describe a given set of data.

MEASURES
of
CENTRAL TENDENCY

Measures of Central Tendency/Central Location
• Arithmetic Mean (x̄)
• Median (Md)
• Mode (Mo)
• Harmonic Mean (HM)
• Geometric Mean (GM)

Arithmetic Mean (x̄)
• The arithmetic mean is the most popularly known "average."
• It is unique and it always exists.
• Its serious weakness is that it is strongly influenced by extreme values called "outliers." (An observation is an outlier if it falls more than 1.5 IQR above the upper quartile or more than 1.5 IQR below the lower quartile.)

Other Characteristics of the Mean

• The algebraic sum of the deviations of the various values from the mean equals zero.
• The sum of the squared deviations is less when the deviations are taken from the mean than when they are taken from any other value.
• It cannot be computed when the distribution contains open-ended intervals, unless reasonably accurate estimates of the class marks for the open intervals are possible.

Kinds of Arithmetic Mean
• Simple Arithmetic Mean (S.A.M.)
$$\bar{x} = \frac{\sum x}{n}$$
• Weighted Arithmetic Mean (W.A.M.)
$$\bar{x}_w = \frac{\sum wx}{\sum w}$$

S.A.M. Example
• Find the average grade of a student in Statistics having the following grades in 5 quizzes: 86, 89, 94, 78 & 80.
• Solution:
$$\bar{x} = \frac{86 + 89 + 94 + 78 + 80}{5} = \frac{427}{5} = 85.4$$

W.A.M. Example
• Find the average price per kilo of rice if a mixture is formed out of the following varieties of rice:
10 k @ P25.70
18 k @ P18.75
24 k @ P14.65
• Solution:
$$\bar{x}_w = \frac{10(25.70) + 18(18.75) + 24(14.65)}{10 + 18 + 24} = \frac{946.1}{52} \approx 18.194$$
The average price is P18.194 per kilo.

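Both examples reduce to one line of arithmetic each. A minimal Python sketch of the S.A.M. and W.A.M. formulas (the function names are illustrative), checked against the quiz grades and the rice mixture:

```python
def simple_mean(values):
    """S.A.M.: sum of the values divided by their number."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """W.A.M.: sum of w*x divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

grades = [86, 89, 94, 78, 80]
print(simple_mean(grades))                     # 85.4

prices = [25.70, 18.75, 14.65]                 # pesos per kilo
kilos = [10, 18, 24]                           # weights
print(round(weighted_mean(prices, kilos), 3))  # 18.194
```

The weights matter: the unweighted mean of the three prices, about P19.70, overstates the cost because the cheapest rice makes up the largest share of the mixture.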
Computation of the Mean
• For ungrouped data:
$$\bar{x} = \frac{\sum x}{n}$$
where
x = individual item
n = the number of items
• For grouped data:
$$\bar{x} = x_o + c\,\frac{\sum f_i d_i}{n}$$
where
xo = assumed mean, which is one of the class marks
c = class size
fi = class frequency
di = class deviation = (xi − xo)/c
n = total frequencies

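The coded (assumed-mean) formula for grouped data, applied to Example 2. Its classes 2 - 3, 4 - 5, . . . , 12 - 13 have class marks 2.5, 4.5, . . . , 12.5 and c = 2. A minimal sketch:

```python
def grouped_mean(class_marks, freqs, x0, c):
    """x̄ = x0 + c * Σ f_i d_i / n, with d_i = (x_i - x0) / c."""
    n = sum(freqs)
    sum_fd = sum(f * (x - x0) / c for x, f in zip(class_marks, freqs))
    return x0 + c * sum_fd / n

marks = [2.5, 4.5, 6.5, 8.5, 10.5, 12.5]  # Example 2 class marks
freqs = [2, 3, 9, 11, 4, 1]
print(grouped_mean(marks, freqs, x0=6.5, c=2))  # 7.5
print(grouped_mean(marks, freqs, x0=8.5, c=2))  # 7.5
```

With six (an even number of) classes, either middle class mark (6.5 or 8.5) may serve as xo, and both give the same mean.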
How to Determine the Assumed Mean xo
• If the number of class intervals is odd, the assumed mean is the class mark of the middle class.
• If the number of class intervals is even, the assumed mean is either of the two class marks in the middle.

Median (Md)
• The median is a positional measure that divides the distribution into two equal parts: 50% of the data lie below Md and 50% lie above it.
• Interpretation: The lower 50% of the distribution lies below the Md, or the upper 50% of the distribution lies above the Md.

Characteristics of the Median
• Like the mean, it always exists and is unique for any set of data.
• Unlike the mean, it is not affected by extreme values.
• It can be used as an average when the distribution contains open-ended intervals.
• Since it is a positional measure, its computation does not make use of the actual values of all the individual items.
• Like the mean, it is meaningful only when the distribution is fairly normal.

Computation of the Median

• For ungrouped data:
Formula for the median's position:
Md's position = n/2 + 1/2
Case 1. If n is odd, the median is the value of the middle item of the array.
Case 2. If n is even, the median is the value halfway between the 2 values in the middle of the ordered series.

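The position rule n/2 + 1/2 covers both cases: it is a whole number when n is odd and ends in .5 when n is even. A minimal sketch using 1-based positions on the sorted data; the six-value list is the quiz grades plus a hypothetical sixth grade of 91:

```python
def median_ungrouped(values):
    """Median via the position formula n/2 + 1/2 on the sorted data (1-based)."""
    xs = sorted(values)
    n = len(xs)
    pos = n / 2 + 0.5
    if n % 2 == 1:                   # Case 1: the position is a whole number
        return xs[int(pos) - 1]
    lo = int(pos)                    # Case 2: halfway between items lo and lo + 1
    return (xs[lo - 1] + xs[lo]) / 2

print(median_ungrouped([86, 89, 94, 78, 80]))      # 86
print(median_ungrouped([86, 89, 94, 78, 80, 91]))  # 87.5
```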
Computation of the Median
• For grouped data:
Formula:
$$M_d = L_m + c\left(\frac{\frac{n}{2} - F_{m-1}}{f_m}\right)$$
where
Lm = lower boundary of the Md class
c = class size
n = total frequencies
Fm-1 = cumulative frequency of the class preceding the Md class
fm = class frequency of the Md class

Steps in the computation of the Median

1. Define the class boundaries.
2. Cumulate the frequencies from the lowest to the highest class.
3. Get n/2.
4. Locate the first number in the < cf column that is equal to or greater than n/2. The CI opposite this number is the median class. But if the number located is exactly equal to n/2, the median is the upper class boundary of the CI opposite this number.
5. Determine Lm, c, Fm-1 and fm and substitute them in the formula.

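The five steps translate almost line for line into code. A minimal sketch applied to Example 2, whose class boundaries are 1.5 - 3.5, 3.5 - 5.5, . . . , 11.5 - 13.5. When the cumulative frequency reached equals n/2 exactly, the formula itself returns the upper boundary of that class, so no special case is needed:

```python
def grouped_median(lower_boundaries, freqs, c):
    """Md = Lm + c * (n/2 - F_{m-1}) / f_m."""
    n = sum(freqs)
    half = n / 2
    cum = 0                                # running "< cf" before the current class
    for L, f in zip(lower_boundaries, freqs):
        if cum + f >= half:                # first cumulative frequency reaching n/2
            return L + c * (half - cum) / f
        cum += f

bounds = [1.5, 3.5, 5.5, 7.5, 9.5, 11.5]   # Example 2 lower class boundaries
print(round(grouped_median(bounds, [2, 3, 9, 11, 4, 1], c=2), 4))  # 7.6818
```

For Example 2, n/2 = 15 and the cumulative frequencies are 2, 5, 14, 25, . . . , so the median class is 8 - 9 (Lm = 7.5, Fm-1 = 14, fm = 11).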
Mode (Mo)
• The mode is the value with the greatest frequency.
• How to compute:
a. For ungrouped data:
Mo = the value that is repeated the most number of times.
Examples:
1. 1, 2, 6, 4, 8, 10, 0, -1
2. 1, 2, 3, 5, 10, -1, 6, 1, 2
3. 1, 3, 5, 7, 9

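A minimal sketch that returns the mode(s) of ungrouped data. It assumes the convention that a set with no repeated value has no mode (an empty list), matching the remark that the mode may not exist or may not be unique:

```python
from collections import Counter

def modes(values):
    """Value(s) repeated the most; [] if no value is repeated (no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, k in counts.items() if k == top)

print(modes([1, 2, 6, 4, 8, 10, 0, -1]))     # []      (no mode)
print(modes([1, 2, 3, 5, 10, -1, 6, 1, 2]))  # [1, 2]  (bimodal)
print(modes([1, 3, 5, 7, 9]))                # []      (no mode)
```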
Characteristics of the Mode
a. It is an extremely poor measure of central location in statistical inference, but it is useful in averaging qualitative data.
b. For some sets of numerical data it may not exist, and for others it may not be unique.
c. It is most meaningful when the distribution is strongly skewed, since it gives the best indication of the point of concentration or clustering.
d. Like the median, it is not affected by extreme values, and it can be used even when the distribution contains open-ended intervals, provided none of these is the modal class.
e. It does not have the stability or the desirable mathematical properties of the arithmetic mean.
f. If the graph of the distribution is available, the mode is easily identified as the abscissa of the peak of the distribution curve.

Computation of the Mode
• For grouped data:
Formula:
$$M_o = L_{mo} + c\left(\frac{f_{mo} - f_1}{2f_{mo} - f_1 - f_2}\right)$$
where
Lmo = lower boundary of the modal class
c = class size
fmo = class frequency of the modal class
f1 = class frequency of the class preceding the modal class
f2 = class frequency of the class following the modal class

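The grouped-mode formula applied to Example 2, where the modal class is 8 - 9 (fmo = 11, f1 = 9, f2 = 4, Lmo = 7.5). Two details are assumptions of this sketch: f1 or f2 is taken as 0 when the modal class is the first or last class, and ties for the largest frequency go to the first such class:

```python
def grouped_mode(lower_boundaries, freqs, c):
    """Mo = Lmo + c * (fmo - f1) / (2*fmo - f1 - f2)."""
    i = freqs.index(max(freqs))                 # modal class (first one if tied)
    fmo = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0           # preceding class frequency
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # following class frequency
    return lower_boundaries[i] + c * (fmo - f1) / (2 * fmo - f1 - f2)

bounds = [1.5, 3.5, 5.5, 7.5, 9.5, 11.5]        # Example 2 lower class boundaries
print(round(grouped_mode(bounds, [2, 3, 9, 11, 4, 1], c=2), 4))  # 7.9444
```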
Harmonic Mean (HM)
• The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of a given set of values:
$$HM = \frac{n}{\sum \frac{1}{x}}$$
• Characteristics:
a. Like the mean, median, and geometric mean, it is rigidly defined.
b. It makes use of all the individual values in the set.
c. It is difficult to understand, and its computation involves laborious and cumbersome operations.

Harmonic Mean (HM)

• The HM has limited usefulness, but it is appropriate for getting the average of speeds and of special rates such as P/dozen, P/gross, etc.
• Example:
If an aircraft flies 100 miles at 300 mph and the next 100 miles at 600 mph, what is its average speed?

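For equal distances, the average speed is the harmonic mean of the speeds, not their arithmetic mean (450 mph), because more time is spent at the slower speed. A minimal sketch of HM = n / Σ(1/x); Python's standard library also offers `statistics.harmonic_mean`:

```python
def harmonic_mean(values):
    """HM = n / Σ(1/x): the reciprocal of the mean of the reciprocals."""
    return len(values) / sum(1 / x for x in values)

print(f"{harmonic_mean([300, 600]):.1f} mph")  # 400.0 mph
```

As a check: 200 miles take 1/3 + 1/6 = 1/2 hour, i.e., 400 mph.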
Geometric Mean (GM)
• The geometric mean is the nth root of the product of the n values:
$$GM = \sqrt[n]{x_1 x_2 \cdots x_n}$$
• Characteristics:
a. It is less affected by extremely large values than the arithmetic mean; therefore, it is often used in place of the arithmetic mean.
b. It is used in averaging ratios, in estimating the average rate of change, and in computing the average of a series of values in geometric progression.
c. For positive values, it is always less than or equal to the arithmetic mean; the two are equal only when all the numbers are the same.

GM Example

• The GM is used mainly to average ratios, rates of change and index numbers.
• Examples:
1. Find the GM of 2/3, 3/4, 4/5, and 9/10.
2. If a quart of milk, a lb. of butter, a dozen eggs, and a loaf of bread cost 12, 3, 4, and 9 per cent more than they cost a year earlier, find the average percentage increase.

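A minimal sketch of the GM, computed through logarithms so that long products do not overflow. Example 2 is worked the way the slide's numbers suggest, treating the percentages 12, 3, 4 and 9 themselves as the values (12 × 3 × 4 × 9 = 1296 = 6⁴). The last line shows a different, also common, convention that averages the growth factors 1.12, 1.03, 1.04 and 1.09 instead:

```python
import math

def geometric_mean(values):
    """GM = (x1 * x2 * ... * xn) ** (1/n), via logarithms; values must be positive."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

print(round(geometric_mean([2/3, 3/4, 4/5, 9/10]), 4))   # 0.7746
print(round(geometric_mean([12, 3, 4, 9]), 4))            # 6.0
# Alternative convention: average the growth factors
print(round(geometric_mean([1.12, 1.03, 1.04, 1.09]), 4))  # 1.0694, about 6.94% per item
```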
MEASURES of DISPERSION

Measures of Variation/Dispersion

a. Absolute Measures
1. Range (R)
2. Interquartile Range (IR)
3. Average Deviation (AD)
4. Standard Deviation (s) and Variance (s²)
b. Relative Measures
1. Coefficient of Variation (CV)
2. Coefficient of Quartile Deviation (CQD)

Absolute Measures of Dispersion

• Range (R)
• Interquartile Range (IR)
• Average Deviation (AD)
• Standard Deviation (s) and Variance (s²)

Absolute Measures of Dispersion

• Range (R) – the simplest measure of dispersion.
• Formula:
1. For ungrouped data: R = HV − LV
2. For grouped data: R = upper limit of the highest CI − lower limit of the lowest CI

Interquartile Range

• Interquartile Range:
IR = Q3 − Q1
• Semi-interquartile Range or Quartile Deviation:
QD = ½ IR

Quantiles/Fractiles
• Quartiles – 4 equal parts
• Deciles – 10 equal parts
• Percentiles – 100 equal parts

The Quartiles
• Figure: Q1, Q2 and Q3 divide the distribution into four parts, each containing 25% of the data.
• Interpretation of Q1: The lower 25% of the data lies below Q1, or the upper 75% of the data lies above Q1.
• Interpretation of Q3: The lower 75% of the data lies below Q3, or the upper 25% of the data lies above Q3.

The Deciles
• Figure: D1, D2, . . . , D9 divide the distribution into ten parts, each containing 10% (1/10) of the data.
• Interpretation of D4: The lower 40% of the data lies below D4, or the upper 60% of the data lies above D4.

The Percentiles
• Figure: P1, . . . , P43, P50, . . . , P99 divide the distribution into one hundred parts, each containing 1% (1/100) of the data.
• Interpretation of P43: The lower 43% of the data lies below P43, or the upper 57% of the data lies above P43.

Computation of the Quantiles
• For ungrouped data:
1. Determine the position first using the formula, e.g.:
D3's position = 3n/10 + 1/2
P45's position = 45n/100 + 1/2
2. Then find the value of the quantile at that position in the array.

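A minimal sketch of the position method for ungrouped data: position = (fraction) × n + 1/2 on the sorted data. Two choices here are assumptions, not taken from the slides: a fractional position is handled by linear interpolation between the neighboring items (the same idea as the median's "halfway" rule), and positions outside 1 to n are clamped to the end items:

```python
def quantile_ungrouped(values, fraction):
    """Quantile at position fraction * n + 1/2 (1-based) of the sorted data,
    interpolating linearly when the position is fractional."""
    xs = sorted(values)
    n = len(xs)
    pos = min(max(fraction * n + 0.5, 1), n)   # clamp to the range of the data
    lo = int(pos)                              # whole part of the position
    frac = pos - lo
    if lo == n:
        return xs[-1]
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])

grades = [86, 89, 94, 78, 80]
q1 = quantile_ungrouped(grades, 1/4)   # position 1.75
q3 = quantile_ungrouped(grades, 3/4)   # position 4.25
print(q1, q3, q3 - q1)                 # 79.5 90.25 10.75  (Q1, Q3, IR)
```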
Computation of the Quartiles:
• Grouped data formula:
$$Q_1 = L_{Q_1} + c\left(\frac{\frac{n}{4} - F_{Q_1-1}}{f_{Q_1}}\right) \qquad Q_3 = L_{Q_3} + c\left(\frac{\frac{3n}{4} - F_{Q_3-1}}{f_{Q_3}}\right)$$
where
LQ1 = lower boundary of the Q1 class; LQ3 = lower boundary of the Q3 class
FQ1-1 = cumulative frequency of the class preceding the Q1 class; FQ3-1 = cumulative frequency of the class preceding the Q3 class
fQ1 = frequency of the Q1 class; fQ3 = frequency of the Q3 class
c = class size
n = total frequencies

Computation of the Deciles:
• Grouped data formula:
$$D_3 = L_{D_3} + c\left(\frac{\frac{3n}{10} - F_{D_3-1}}{f_{D_3}}\right)$$
where
LD3 = lower boundary of the D3 class
FD3-1 = cumulative frequency of the class preceding the D3 class
fD3 = class frequency of the D3 class
n = total frequencies
c = class size

Computation of the Percentiles:

• Grouped data formula:
$$P_{43} = L_{P_{43}} + c\left(\frac{\frac{43n}{100} - F_{P_{43}-1}}{f_{P_{43}}}\right)$$
where
LP43 = lower boundary of the P43 class
FP43-1 = cumulative frequency of the class preceding the P43 class
fP43 = class frequency of the P43 class
n = total frequencies
c = class size

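Because every grouped quantile is computed the way the median is, one function with a `fraction` parameter (1/4 for Q1, 3/10 for D3, 43/100 for P43, and so on) covers all three formulas. A minimal sketch applied to Example 2; classes with zero frequency are skipped when locating the quantile class:

```python
def grouped_quantile(lower_boundaries, freqs, c, fraction):
    """Lq + c * (fraction * n - F_prev) / f_q, where the quantile class is the
    first class whose cumulative frequency reaches fraction * n."""
    n = sum(freqs)
    target = fraction * n
    cum = 0
    for L, f in zip(lower_boundaries, freqs):
        if f > 0 and cum + f >= target:
            return L + c * (target - cum) / f
        cum += f
    raise ValueError("fraction must be between 0 and 1")

bounds = [1.5, 3.5, 5.5, 7.5, 9.5, 11.5]  # Example 2 lower class boundaries
freqs = [2, 3, 9, 11, 4, 1]
for name, frac in [("Q1", 1/4), ("Q3", 3/4), ("D3", 3/10), ("P43", 43/100)]:
    print(name, round(grouped_quantile(bounds, freqs, 2, frac), 4))
# Q1 6.0556, Q3 9.0455, D3 6.3889, P43 7.2556
```

With fraction = 1/2, the same function returns the grouped median, in line with Md = Q2 = D5 = P50.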
Note the following:
• Md = Q2 = D5 = P50
Q1 = P25, Q3 = P75
D1 = P10, D2 = P20, . . . , D9 = P90
• The quantiles are computed in the same way as the Md.
• Number of quantiles = number of equal parts − 1.

Four (4) Cases of Problems
• Below what value lies the lower ___% of the distribution?
• Above what value lies the upper ___% of the distribution?
• Between what values lies the middle ___% of the distribution?
• What is the middle ___% range?

Chebyshev’s Theorem
• For any set of data (population or sample) and any constant k greater than 1, at least 1 − 1/k² of the data must lie within k standard deviations on either side of the mean.
• Figure: the interval from x̄ − ks to x̄ + ks contains at least 1 − 1/k² of the data:
$$P(\bar{x} - ks < x < \bar{x} + ks) \ge 1 - \frac{1}{k^2}$$

CT Example 1
• The mean amount of time for chemical workers to vacate a chemical factory during a fire drill is 7 minutes, with a standard deviation of 0.5 minutes. Using CT, determine at least what percentage of the time the chemical factory can be vacated during a fire drill in between 6 and 8 minutes.

CT Example 2
• For a certain library, the mean daily number of books returned overdue is 45 and the standard deviation is 6 books. Use CT to determine between what 2 numbers at least 15/16 of the daily numbers of books returned overdue must lie.

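Both CT examples can be worked in code. Example 1 turns an interval into a guaranteed fraction (k = (8 − 7)/0.5 = 2, so 1 − 1/4 = 75%). Example 2 goes the other way, solving 1 − 1/k² = 15/16 for k = 4. A minimal sketch; the function names are illustrative:

```python
import math

def chebyshev_min_fraction(mean, sd, low, high):
    """Lower bound 1 - 1/k^2 on the fraction of data in (low, high),
    for an interval symmetric about the mean with k = (high - mean) / sd > 1."""
    k = (high - mean) / sd
    if k <= 1 or not math.isclose(mean - low, high - mean):
        raise ValueError("interval must be symmetric about the mean with k > 1")
    return 1 - 1 / k**2

def chebyshev_interval(mean, sd, fraction):
    """Interval (mean - k*sd, mean + k*sd) holding at least `fraction` of the data."""
    k = 1 / math.sqrt(1 - fraction)       # solve 1 - 1/k^2 = fraction for k
    return mean - k * sd, mean + k * sd

print(chebyshev_min_fraction(7, 0.5, 6, 8))  # 0.75, i.e., at least 75% of the time
print(chebyshev_interval(45, 6, 15 / 16))    # (21.0, 69.0)
```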
Average Deviation (AD)
• The average deviation is the arithmetic mean of the absolute deviations of the individual observations from the median or from the arithmetic mean of all the observations.
• Formula:
$$AD = \frac{\sum |x_i - M_d|}{n}$$
• Note the following:
a. AD is always less than or equal to the standard deviation.
b. AD is about 4/5 as large as the standard deviation when the distribution is approximately bell-shaped (normal).

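A minimal sketch of the AD formula, measured from the median by default (as in the formula) or from the mean on request, using the quiz grades from the S.A.M. example:

```python
from statistics import mean, median

def average_deviation(values, center=None):
    """AD = Σ|x_i - center| / n; the center defaults to the median."""
    if center is None:
        center = median(values)
    return sum(abs(x - center) for x in values) / len(values)

grades = [86, 89, 94, 78, 80]
print(average_deviation(grades))                          # 5.0  (about the median, 86)
print(round(average_deviation(grades, mean(grades)), 2))  # 5.12 (about the mean, 85.4)
```

The AD about the median is never larger than the AD about any other point, which is why the two results differ slightly.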
Standard Deviation & Variance
• The standard deviation (s) is the root mean square of the deviations from the mean; the variance (s²) is the square of the standard deviation.
• How to compute:
1. For ungrouped data:
$$s = \sqrt{\frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}}$$
2. For grouped data:
$$s = c\sqrt{\frac{n\sum f_i d_i^2 - \left(\sum f_i d_i\right)^2}{n(n-1)}}$$
• Interpretation of s: On the average, the data deviate from the mean by s units.

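Both computing formulas in Python, using the quiz grades (ungrouped) and Example 2 (grouped, xo = 6.5, c = 2). A minimal sketch; for ungrouped data the result matches `statistics.stdev` from Python's standard library:

```python
import math

def sd_ungrouped(values):
    """s = sqrt((n Σx² - (Σx)²) / (n(n - 1)))"""
    n = len(values)
    sx = sum(values)
    sxx = sum(x * x for x in values)
    return math.sqrt((n * sxx - sx**2) / (n * (n - 1)))

def sd_grouped(class_marks, freqs, x0, c):
    """s = c * sqrt((n Σ f d² - (Σ f d)²) / (n(n - 1))), with d = (x - x0) / c."""
    n = sum(freqs)
    d = [(x - x0) / c for x in class_marks]
    sfd = sum(f * di for f, di in zip(freqs, d))
    sfd2 = sum(f * di * di for f, di in zip(freqs, d))
    return c * math.sqrt((n * sfd2 - sfd**2) / (n * (n - 1)))

grades = [86, 89, 94, 78, 80]
s = sd_ungrouped(grades)
print(round(s, 3), round(s**2, 1))  # 6.542 42.8
marks = [2.5, 4.5, 6.5, 8.5, 10.5, 12.5]
print(round(sd_grouped(marks, [2, 3, 9, 11, 4, 1], x0=6.5, c=2), 4))  # 2.3342
```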
Box Plots
• The box plot is a graphical summary of both the central tendency and the variation of a set of data. It portrays the range and the quartiles of the data, and possibly some outliers.
• The box contains the central 50% of the distribution, from the lower quartile to the upper quartile. The Md is marked by a line drawn within the box. The lines extending from the box are called whiskers; these extend to the maximum and minimum values unless there are outliers.
• The box plot is particularly useful for comparing distributions side by side (or for back-to-back comparisons of two groups), and it identifies outliers separately.

Box Plot Example
• Box Plots for U.S. & Canadian Murder Rates
FIGURE: side-by-side box plots of murder rates (vertical axis from 0 to above 20) for the US and Canada.
• For the US: The upper whisker and the upper half of the central box are longer than the lower ones. This indicates that the right tail of the distribution, which corresponds to the relatively large values, is longer than the left tail. The plot reflects the skewness to the right of the distribution.
• These side-by-side plots reveal that the murder rates in the US tend to be much higher and to have much greater variability.

Relative Measures of Dispersion
• Coefficient of Variation (C.V.)
$$C.V. = \frac{s}{\bar{x}} \times 100\%$$
• Coefficient of Quartile Deviation (C.Q.D.)
$$C.Q.D. = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

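Both relative measures are unit-free ratios. A minimal sketch; the inputs are values obtained by applying the formulas of this section to the quiz grades (x̄ = 85.4, s ≈ 6.542) and to Example 2 (Q1 ≈ 6.0556, Q3 ≈ 9.0455):

```python
def coefficient_of_variation(s, mean):
    """C.V. = s / x̄ × 100%"""
    return s / mean * 100

def coefficient_of_quartile_deviation(q1, q3):
    """C.Q.D. = (Q3 - Q1) / (Q3 + Q1)"""
    return (q3 - q1) / (q3 + q1)

print(round(coefficient_of_variation(6.542, 85.4), 2))             # 7.66 (%)
print(round(coefficient_of_quartile_deviation(6.0556, 9.0455), 3))  # 0.198
```

Because they carry no units, these measures allow the spread of data sets with different units or very different means to be compared.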
Other Measures

• Measures of Skewness (Sk)

• Measures of Kurtosis (m̂4)

Measure of Skewness
• Skewness is the degree of asymmetry of a distribution; the measure of skewness quantifies it.
• Types of skewness
1. Positively skewed (Sk > 0)
2. Negatively skewed (Sk < 0)
• Formula:
$$Sk = \frac{3(\bar{x} - M_d)}{s}$$
Note: a. If Sk = 0, the distribution is symmetric, as the normal distribution is.
b. In moderately skewed distributions, the median usually lies between the mean and the mode.

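A minimal sketch of the formula Sk = 3(x̄ − Md)/s. The inputs are the grouped-data mean, median and standard deviation of Example 2, obtained from the formulas in this section (about 7.5, 7.6818 and 2.3342):

```python
def pearson_skewness(mean, median, s):
    """Sk = 3(x̄ - Md) / s"""
    return 3 * (mean - median) / s

print(round(pearson_skewness(7.5, 7.6818, 2.3342), 3))  # -0.234: slightly negatively skewed
```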
Positively Skewed
• Characteristics:
1. The distribution tapers more to the right than to the left.
2. The mean is generally greater than the median.
3. The value of skewness is greater than zero (positive).
• Figure: m̂3 > 0 (m̂3 = +); from left to right along the horizontal axis: Mo, Md, x̄.

Negatively skewed
• Characteristics:
1. The distribution tapers more to the left than to the right.
2. The mean is generally less than the median.
3. The value of skewness is less than zero (negative).
• Figure: m̂3 < 0 (m̂3 = −); from left to right along the horizontal axis: x̄, Md, Mo.

Computation of Skewness (general case)

• For ungrouped data:
$$\hat{m}_3 = \frac{\sum (x_i - \bar{x})^3}{n s^3}$$
where
n = total number of observations
xi = individual observation
x̄ = arithmetic mean
s = standard deviation
• For grouped data:
$$\hat{m}_3 = \frac{\sum f_i (x_i - \bar{x})^3}{n s^3}$$
where
fi = class frequency
n = total number of observations
xi = class mark
x̄ = arithmetic mean
s = standard deviation

Measure of Kurtosis

• Kurtosis is the degree of peakedness of a distribution, judged against the bell-shaped normal curve; the measure of kurtosis quantifies it.
• Types of kurtosis:
1. Platykurtic – the flat-topped curve.
2. Leptokurtic – the pointed curve.
3. Mesokurtic – the normal curve.

Leptokurtic
• Characteristics:
1. It is a pointed curve, more peaked than the normal curve.
2. The value of kurtosis is greater than 3.
• Figure: m̂4 > 3

Platykurtic
• Characteristics:
1. It is a flat-topped curve.
2. The value of kurtosis is less than 3.
• Figure: m̂4 < 3

Mesokurtic
• Characteristics:
1. It is the normal curve.
2. The value of kurtosis is equal to 3.
• Figure: m̂4 = 3

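Computation of Kurtosis
• For ungrouped data:
$\hat{m}_4 = \dfrac{\sum (x_i - \bar{x})^4}{n s^4}$
where $x_i$ = individual observation
• For grouped data:
$\hat{m}_4 = \dfrac{\sum f_i (x_i - \bar{x})^4}{n s^4}$
where $f_i$ = class frequency
$x_i$ = class mark
$\bar{x}$ = arithmetic mean
n = total frequencies
s = standard deviation (dividing by $s^4$ makes the value comparable with 3)
• Note: If $\hat{m}_4 = 3$, the distribution is mesokurtic, like the normal distribution.
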
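The third moment and the kurtosis coefficient can be computed with one helper. This is a sketch, not the author's procedure: it assumes the divisor n in the standard deviation, so that $s^4$ equals the square of the second moment, which is what makes a normal distribution come out at 3. The sample values are hypothetical; for grouped data the values are class marks.

```python
def moment_coefficients(values, freqs=None):
    """Return (m3, m4_hat) for ungrouped data, or for grouped data when
    class frequencies are given (values are then the class marks)."""
    if freqs is None:
        freqs = [1] * len(values)            # ungrouped: each observation counts once
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    m2 = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    m3 = sum(f * (x - mean) ** 3 for x, f in zip(values, freqs)) / n
    m4 = sum(f * (x - mean) ** 4 for x, f in zip(values, freqs)) / n
    # With divisor n, s**2 == m2, so s**4 == m2**2
    return m3, m4 / m2 ** 2

print(moment_coefficients([1, 2, 3, 10]))             # m3 > 0: long right tail
print(moment_coefficients([10, 20, 30], [1, 2, 1]))   # (0.0, 2.0): platykurtic
```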
PROBABILITY
AND
ITS RULES

Prepared by:
REBECCA K. CAJUCOM

Probabilities
In order to make intelligent decisions, two questions must be answered:
1. What is possible?
   a. The problem of listing everything possible:
      1. Tabular method
      2. Tree diagram
   b. The problem of determining the number of possible ways without listing everything: use counting techniques.
      1. Permutation
      2. Combination
2. What is probable?
   a. To assign probabilities:
      1. Classical or "a priori" probability
      2. Relative frequency approach
      3. Subjective probability
   b. To specify the odds at which it is fair to bet that events will occur.

Problem of listing down everything possible (example 1)
• Toss 2 coins. Give the sample space of the given experiment using:
a. tabular form
b. tree diagram

Problem of listing down everything possible (example 2)
• Two dice are tossed. Give the sample space of the given experiment using:
a. tabular form
b. tree diagram

Fundamental Principle (Multiplication of Choices)
• If an event can happen in any one of $n_1$ ways and, after it has occurred, another event can happen in any one of $n_2$ ways, then the number of ways in which both events can happen in the specified order is:
$N = n_1 \times n_2$
• Similarly, if there are more than 2 events:
$N = n_1 \times n_2 \times n_3 \times \cdots$

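The tabular listings in examples 1 and 2 and the fundamental principle can be checked with a short Python sketch: `itertools.product` builds every ordered outcome, so the length of each list is $n_1 \times n_2$.

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Example 1: every outcome of tossing 2 coins
two_coins = list(product(coin, repeat=2))
# Example 2: every outcome of tossing 2 dice
two_dice = list(product(die, repeat=2))

# Fundamental principle: the sample-space size is n1 x n2
assert len(two_coins) == len(coin) * len(coin)
assert len(two_dice) == len(die) * len(die)

print(two_coins)       # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
print(len(two_dice))   # 36
```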
The Counting Techniques
• Permutations
• Combinations

Permutation
• The permutation of n distinct objects is an arrangement of the objects with attention given to the order of arrangement. The number of permutations of n objects taken r at a time is denoted by $_nP_r$ and is defined by:
$_nP_r = \dfrac{n!}{(n - r)!}$, where n ≥ r.
• Examples:
1. $_6P_4$, $_6P_6$
2. In how many ways can the 3 letters (a, b, c) be arranged if taken two at a time?

Permutation
• The number of permutations of n objects taken all at a time (r = n) is
$_nP_n = n!$
• Example: In how many ways can the 3 letters (a, b, c) be arranged if taken all at a time?

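Python's standard library (3.8+) provides `math.perm` for $_nP_r$, and `itertools.permutations` lists the arrangements themselves. A sketch for the examples above:

```python
import math
from itertools import permutations

# nPr = n! / (n - r)!
print(math.perm(6, 4))   # 360
print(math.perm(6, 6))   # 720, the same as 6!

# Letters a, b, c taken two at a time, listed explicitly
pairs = ["".join(p) for p in permutations("abc", 2)]
print(pairs)             # ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']

# Taken all at a time: with r omitted, math.perm(n) returns n!
print(math.perm(3))      # 6
```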
Permutation
• The number of permutations of n objects consisting of groups of which $n_1$ are alike, $n_2$ are alike, . . . is:
$P = \dfrac{n!}{n_1!\, n_2!\, n_3! \cdots}$, where $n = n_1 + n_2 + n_3 + \cdots$
• Examples:
1. In how many ways can the letters a, a, b, b, b, c, c, c be arranged?
2. How many distinct words can be formed from the letters of the word "STATISTICS"?

Permutation
• The number of permutations of n objects arranged in a circle is:
$P = (n - 1)!$
• Example: In how many ways can 5 people be seated around a round table?

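A sketch (assuming Python 3.8+) of the two formulas above; `arrangements_with_alike` and `circular_arrangements` are illustrative helper names, not taken from the slides.

```python
import math
from collections import Counter

def arrangements_with_alike(letters):
    """n! / (n1! n2! ...) for a word with repeated letters."""
    total = math.factorial(len(letters))
    for count in Counter(letters).values():
        total //= math.factorial(count)    # each partial quotient is an integer
    return total

def circular_arrangements(n):
    """(n - 1)! arrangements of n objects around a circle."""
    return math.factorial(n - 1)

print(arrangements_with_alike("aabbbccc"))    # 560
print(arrangements_with_alike("STATISTICS"))  # 50400
print(circular_arrangements(5))               # 24
```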
Combination
• The combination of n objects is a selection of the objects with no attention given to the order of arrangement. The number of combinations of n objects taken r at a time is denoted by $_nC_r$ or $\binom{n}{r}$ and is defined by:
$_nC_r = \dbinom{n}{r} = \dfrac{n!}{(n - r)!\, r!}$, where n ≥ r.
• Example: In how many ways can the 3 letters (a, b, c) be grouped if taken 2 at a time?

Combination
• $_nC_r = {}_nC_{n-r}$
• Example: $_5C_2 = {}_5C_{5-2} = {}_5C_3$
• $_nC_n = 1$
• Example: $_5C_5 = 1$

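`math.comb` (Python 3.8+) computes $_nC_r$ and `itertools.combinations` lists the selections; a minimal sketch of the examples:

```python
import math
from itertools import combinations

groups = ["".join(c) for c in combinations("abc", 2)]
print(groups)                             # ['ab', 'ac', 'bc']
print(math.comb(3, 2))                    # 3
print(math.comb(5, 2), math.comb(5, 3))   # 10 10  (nCr = nC(n-r))
print(math.comb(5, 5))                    # 1
```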
Kinds of Probabilities
• Classical (or "a priori") Probability
• Relative Frequency Approach
• Subjective Probability

Classical Probability
• The probability of an event A is the ratio of the number of sample points corresponding to event A to the total number of sample points in the sample space (all sample points being equally likely). In symbols,
$P(A) = \dfrac{a}{a + b}$
where
a = number of successes (the number of sample points that correspond to event A)
b = number of failures (the number of sample points that do not correspond to event A)

Classical Probability (example)
• Toss 2 coins. What is the probability of getting:
a. exactly two heads?
b. at least one head?
c. at most one head?
d. at least two heads?
e. exactly three heads?

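Enumerating the sample space makes the classical ratio $a/(a+b)$ explicit. A sketch using exact fractions; `classical_probability` and `heads` are hypothetical helper names:

```python
from fractions import Fraction
from itertools import product

sample_space = list(product("HT", repeat=2))     # 4 equally likely outcomes

def classical_probability(event):
    """P(A) = a / (a + b): favourable sample points over all sample points."""
    a = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(a, len(sample_space))

def heads(outcome):
    return outcome.count("H")

print(classical_probability(lambda o: heads(o) == 2))  # a. 1/4
print(classical_probability(lambda o: heads(o) >= 1))  # b. 3/4
print(classical_probability(lambda o: heads(o) <= 1))  # c. 3/4
print(classical_probability(lambda o: heads(o) >= 2))  # d. 1/4
print(classical_probability(lambda o: heads(o) == 3))  # e. 0
```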
Relative Frequency Approach
• The probability of an event (happening or outcome) is the proportion of the time that events of the same kind will occur in the long run.
• Example: Toss a coin 100 times. What is the probability of getting heads in the given experiment?

The Law of Large Numbers (in relation to the relative frequency approach)
• If a situation, trial, or experiment is repeated again and again, the proportion of successes will tend to approach the probability that any one outcome will be a success.
• This theorem is known informally as the "Law of Averages."

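A small simulation illustrates the relative frequency approach and the law of large numbers. It assumes a fair coin simulated with Python's `random` module, so the printed proportions depend on the seed and are not listed here; they should drift toward 0.5 as the number of tosses grows.

```python
import random

def relative_frequency_of_heads(tosses, seed=0):
    """Simulate fair-coin tosses and return the proportion of heads."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses

for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```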
Subjective Probability
• Subjective probability is sometimes called personal probability. It reflects one's belief with regard to the uncertainties involved, and it applies especially when there is little or no direct evidence, so that there is really no choice but to consider collateral or indirect information, educated guesses, and perhaps intuition and other subjective factors.
• Example: A person believes there is a 95% chance that it will rain today.

Rules on Probabilities
• Addition Rules ("or" = ∪)
1. For mutually exclusive events (or disjoint sets), that is, events which cannot occur together (the occurrence of one event automatically precludes the occurrence of the other event/s):
Formula: $P(A \cup B) = P(A) + P(B)$
2. For events that are not mutually exclusive (or joint sets):
Formula: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, where $P(A \cap B)$ is the joint probability of A and B
• Multiplication Rules ("and" or implied "and" = ∩)
1. For independent events (with replacement):
Formula: $P(A \cap B) = P(A) \cdot P(B)$
2. For dependent events (without replacement):
Formula: $P(A \cap B) = P(A) \cdot P(B \mid A)$, where $P(B \mid A)$ is the conditional probability of B given A

Addition Rules (example)
• What is the probability of drawing an ace or a king from a deck of cards?
• What is the probability of drawing an ace or a heart from a deck of cards?
• What is the probability of drawing an ace, a king, a queen, or a heart from a deck of cards?

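The card questions can be checked by brute force over a standard 52-card deck (an assumption; the slide says only "a deck of cards"). The sketch also verifies the subtraction term in the rule for joint sets:

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))          # 52 cards as (rank, suit)

def prob(event):
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

# Mutually exclusive: P(ace or king) = P(ace) + P(king)
ace_or_king = prob(lambda c: c[0] in ("A", "K"))
# Not mutually exclusive: P(ace or heart) = P(ace) + P(heart) - P(ace and heart)
ace_or_heart = prob(lambda c: c[0] == "A" or c[1] == "hearts")
assert ace_or_heart == Fraction(4, 52) + Fraction(13, 52) - Fraction(1, 52)
akq_or_heart = prob(lambda c: c[0] in ("A", "K", "Q") or c[1] == "hearts")

print(ace_or_king, ace_or_heart, akq_or_heart)   # 2/13 4/13 11/26
```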
Multiplication Rules (example)
• A box contains 3 red, 4 black, and 5 white balls. What is the probability of drawing a red ball? A black ball? A white ball?
Suppose two balls are drawn at random:
a. with replacement
b. without replacement
What is the probability that the balls drawn are of the same color? Not of the same color?

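A sketch of both multiplication rules for the box of 12 balls, using exact fractions; the variable names are illustrative:

```python
from fractions import Fraction

balls = {"red": 3, "black": 4, "white": 5}
total = sum(balls.values())                      # 12 balls

single = {c: Fraction(k, total) for c, k in balls.items()}

# a. With replacement (independent): P(A and B) = P(A) * P(B)
same_with = sum(Fraction(k, total) * Fraction(k, total) for k in balls.values())
# b. Without replacement (dependent): P(A and B) = P(A) * P(B | A)
same_without = sum(Fraction(k, total) * Fraction(k - 1, total - 1) for k in balls.values())

print({c: str(p) for c, p in single.items()})   # {'red': '1/4', 'black': '1/3', 'white': '5/12'}
print(same_with, 1 - same_with)                 # 25/72 47/72
print(same_without, 1 - same_without)           # 19/66 47/66
```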
Further Rules on Probability
1. $0 \le P(A) \le 1$ for any event A
2. $P(\varnothing) = 0$
3. $P(A) + P(A') = 1$, or $P(A') = 1 - P(A)$, where A′ is the event that A does not happen

Venn Diagram
• The Venn diagram is a diagram, taken from set theory in mathematics, by which the events that can occur in a particular observation or experiment can be portrayed.
[Figure: a rectangle, the universal set (sample space), containing the elements (sample points) a, b, c; a circle inside it is subset A (event A).]

V.D. (in relation to operations on sets)
1. Union (∪)
A. Joint sets: [Figure: overlapping circles A and B; the shaded region is A ∪ B]
B. Disjoint sets: [Figure: separate circles A and B; the shaded region is A ∪ B]
C. Subsets: if A ⊆ B, A ∪ B = B; if B ⊆ A, A ∪ B = A

V.D. (in relation to operations on sets)
2. Intersection (∩)
A. Joint sets: [Figure: overlapping circles A and B; the shaded overlap is A ∩ B]
B. Disjoint sets: A ∩ B = ∅
C. Subsets: if B ⊆ A, A ∩ B = B; if A ⊆ B, A ∩ B = A

V. D. (in relation to operations on sets)
3. Complementation (A′)
[Figure: the region of the sample space outside A is shaded and labeled A′]

V. D. Example
• One of the 240 members of a tennis club is to be named Player of the Year. If 145 of the members are women, 85 use a two-handed backhand, and 50 are women who use a two-handed backhand, how many of the outcomes correspond to the choice of a man who does not use a two-handed backhand?

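The count follows from subtracting the known groups; a short arithmetic sketch that also cross-checks the answer against the complement of the union:

```python
members = 240
women = 145
two_handed = 85
women_two_handed = 50

men = members - women                              # 95
men_two_handed = two_handed - women_two_handed     # 35
men_not_two_handed = men - men_two_handed
print(men_not_two_handed)                          # 60
```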
V. D. Example
• Among 60 houses advertised for sale, there are 8 with swimming pools, three or more bedrooms, and wall-to-wall carpeting; 5 with swimming pools and three or more bedrooms but no wall-to-wall carpeting; 3 with swimming pools and wall-to-wall carpeting but fewer than 3 bedrooms; 8 with swimming pools but neither wall-to-wall carpeting nor 3 or more bedrooms; 24 with 3 or more bedrooms but neither a swimming pool nor wall-to-wall carpeting; 2 with 3 or more bedrooms and wall-to-wall carpeting but no swimming pool; 3 with wall-to-wall carpeting but neither a swimming pool nor 3 or more bedrooms; and 7 without any of these features. If one of these houses is to be chosen for a television commercial, how many outcomes correspond to the choice of: (a) a house with a swimming pool; (b) a house with wall-to-wall carpeting?

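One way to answer the house question is to record the count for each of the eight regions of a three-circle Venn diagram (pool, three or more bedrooms, carpeting) and add the relevant regions. A sketch; the tuple keys are an illustrative encoding:

```python
# Keys: (pool, three_plus_bedrooms, carpeting)
regions = {
    (True, True, True): 8,
    (True, True, False): 5,
    (True, False, True): 3,
    (True, False, False): 8,
    (False, True, False): 24,
    (False, True, True): 2,
    (False, False, True): 3,
    (False, False, False): 7,
}
assert sum(regions.values()) == 60    # every house falls in exactly one region

with_pool = sum(n for (pool, _, _), n in regions.items() if pool)
with_carpet = sum(n for (_, _, carpet), n in regions.items() if carpet)
print(with_pool, with_carpet)         # 24 16
```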
2-Circle Venn Diagram Example
• The probability that a person stopping at a service station will ask to have his oil checked is 0.28, the probability that he will ask to have his tire pressures checked is 0.11, and the probability that he will ask to have both checked is 0.04. What are the probabilities that a person stopping at this service station will ask to have:
a. his oil, his tire pressures, or both checked?
b. neither his oil nor his tire pressures checked?

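The two answers follow from the addition rule for joint sets and the complement rule; a sketch with exact decimal fractions:

```python
from fractions import Fraction

p_oil = Fraction("0.28")
p_tire = Fraction("0.11")
p_both = Fraction("0.04")

p_either = p_oil + p_tire - p_both     # addition rule for joint sets
p_neither = 1 - p_either               # complement rule
print(float(p_either), float(p_neither))   # 0.35 0.65
```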
3-Circle Venn Diagram example
• In a marketing study made by an SMC researcher, the following data were obtained from a sample of 100 male beer drinkers: 42 drink Gold Eagle, 68 drink Pale Pilsen, 54 drink Lagerlite, 22 drink both Lagerlite and Gold Eagle, 25 drink both Pale Pilsen and Gold Eagle, 7 drink Lagerlite but neither Gold Eagle nor Pale Pilsen, 10 drink all three kinds of beer, and 8 do not drink any of the three beers.
A. Construct a three-circle Venn diagram showing the number of drinkers in each of the 3 sets.
B. Suppose a drinker is selected at random. Find the probability that:
1. he drinks Pale Pilsen only;
2. he drinks Pale Pilsen and Lagerlite but not Gold Eagle;
3. he drinks all 3 beers, given that he drinks Pale Pilsen.

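The problem does not state how many drink both Pale Pilsen and Lagerlite, but that count can be recovered from the "Lagerlite only" figure. A sketch of that step and of the three probabilities:

```python
from fractions import Fraction

total, no_beer = 100, 8
G, P, L = 42, 68, 54
GL, PG, all_three, L_only = 22, 25, 10, 7

# L_only = L - GL - PL + all_three, so solve for the missing P-and-L count
PL = L - GL - L_only + all_three                     # 35
# Check with inclusion-exclusion: |G u P u L| = total - no_beer
assert G + P + L - GL - PG - PL + all_three == total - no_beer

P_only = P - PG - PL + all_three                     # 18
P_and_L_not_G = PL - all_three                       # 25

print(Fraction(P_only, total))          # 9/50
print(Fraction(P_and_L_not_G, total))   # 1/4
print(Fraction(all_three, P))           # 5/34
```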
Conditional Probabilities
• Formula:
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, if $P(B) \ne 0$
or
$P(B \mid A) = \dfrac{P(B \cap A)}{P(A)}$, if $P(A) \ne 0$

Conditional Probabilities (example 1)
• A consumer research organization has studied the service under warranty provided by the 200 tire dealers in a large city, and its findings are summarized in the following table:

Dealer               Good service under warranty    Bad service under warranty
Name brand (N/B)     84                             36
Off brand (O/B)      38                             42

A. Construct the joint probability table for the given table.
B. Find the probability of choosing:
1. a N/B dealer who provides good service under warranty;
2. a dealer who provides good service under warranty, given that he is a N/B dealer;
3. an O/B dealer who gives bad service; and
4. a dealer giving bad service, given that he is O/B.

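A sketch that builds the joint probability table from the counts and reads off the four probabilities; the dictionary layout and `marginal` helper are illustrative choices:

```python
from fractions import Fraction

counts = {
    ("N/B", "good"): 84, ("N/B", "bad"): 36,
    ("O/B", "good"): 38, ("O/B", "bad"): 42,
}
total = sum(counts.values())                                  # 200
joint = {k: Fraction(v, total) for k, v in counts.items()}    # A. joint probability table

def marginal(brand):
    return joint[(brand, "good")] + joint[(brand, "bad")]

print(joint[("N/B", "good")])                      # 1. 21/50
print(joint[("N/B", "good")] / marginal("N/B"))    # 2. 7/10
print(joint[("O/B", "bad")])                       # 3. 21/100
print(joint[("O/B", "bad")] / marginal("O/B"))     # 4. 21/40
```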
Conditional Probabilities (example 2)
• Among a company's replacement parts for a given assembly, 20% are defective and the rest are good; 60% were bought from external sources and the rest were made by the company itself; and of those bought from external sources, 80% are good and the rest are defective. What is the probability that a replacement part, randomly selected from this stock, is:
A. company-made and good?
B. either defective or bought?
C. neither company-made nor good?
D. bought, given that it is defective?

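Splitting the stock into its four cells (bought or company-made, good or defective) answers all four parts; a sketch with exact fractions:

```python
from fractions import Fraction as F

p_defective = F(20, 100)
p_bought = F(60, 100)
p_good_given_bought = F(80, 100)

p_bought_good = p_bought * p_good_given_bought     # 0.48
p_bought_def = p_bought - p_bought_good            # 0.12
p_made_def = p_defective - p_bought_def            # 0.08
p_made_good = (1 - p_bought) - p_made_def          # 0.32

print(float(p_made_good))                              # A. 0.32
print(float(p_defective + p_bought - p_bought_def))    # B. 0.68
print(float(p_bought_def))                             # C. 0.12 (bought and defective)
print(float(p_bought_def / p_defective))               # D. 0.6
```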
Conditional Probabilities (example 3)
• Of the many dwellings in a large district of a major city, 70 percent are single-family and the rest are multiple-family dwellings; 60% were built prior to 1939 and the rest since then; and of the pre-1939 dwellings, 3/4 are single and the rest are multiple units. If one dwelling is selected at random from all the dwellings in this district, what are the probabilities that this dwelling is:
A. a pre-1939 single one;
B. either a multiple or a pre-1939 one;
C. neither a multiple nor a pre-1939 one; and
D. a multiple one, given that it is not pre-1939?

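The same four-cell approach applies to the dwellings; a sketch with exact fractions:

```python
from fractions import Fraction as F

p_single = F(70, 100)
p_pre = F(60, 100)
p_single_given_pre = F(3, 4)

p_pre_single = p_pre * p_single_given_pre      # 0.45
p_pre_multi = p_pre - p_pre_single             # 0.15
p_post_single = p_single - p_pre_single        # 0.25
p_post_multi = (1 - p_pre) - p_post_single     # 0.15

print(float(p_pre_single))                            # A. 0.45
print(float((1 - p_single) + p_pre - p_pre_multi))    # B. 0.75
print(float(p_post_single))                           # C. 0.25
print(float(p_post_multi / (1 - p_pre)))              # D. 0.375
```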
Mathematical Expectation
• A labor union wage negotiator feels that the odds are 3 to 1 that the union members will get a raise of 80 cents in their hourly wage, the odds are 17 to 1 against their getting a raise of 40 cents in their hourly wage, and the odds are 9 to 1 against their getting no raise at all.
A. Find the corresponding probabilities that they will get a raise of 80 cents, a raise of 40 cents, or no raise at all in their hourly wage.
B. What is the expected raise in their hourly wage?

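A sketch that converts the stated odds to probabilities and sums $x \cdot P(x)$; `prob_from_odds` is a hypothetical helper name. Note that the three probabilities implied by the odds add up to 163/180 (about 0.906), not 1, so the sketch simply uses the three listed raises as the question does.

```python
from fractions import Fraction

def prob_from_odds(a, b, in_favor=True):
    """Odds of a to b in favour give a/(a+b); odds of a to b against give b/(a+b)."""
    return Fraction(a if in_favor else b, a + b)

outcomes = {                                       # raise in pesos -> probability
    0.80: prob_from_odds(3, 1),                    # 3/4
    0.40: prob_from_odds(17, 1, in_favor=False),   # 1/18
    0.00: prob_from_odds(9, 1, in_favor=False),    # 1/10
}
expected_raise = sum(x * float(p) for x, p in outcomes.items())

print({x: str(p) for x, p in outcomes.items()})
print(round(expected_raise, 4))                    # 0.6222
print(sum(outcomes.values()))                      # 163/180
```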
PROBABILITY DISTRIBUTIONS
By:
REBECCA K. CAJUCOM

Random Variables
• Quantities which can take on different values depending on chance.
Examples:
1. Number of tickets issued each day in a movie house
2. Annual production of rice in the Philippines
3. Number of students passing a course
4. Number of mistakes a student makes in a test
• Note: In our study of random variables, we are usually interested mainly in the probabilities with which they take on the various values within their range.

Kinds of random variables
• Discrete random variables – values expressed as integers or whole numbers only, or observed values at isolated points along a scale of values.
Example: number of persons per household, units of an item in an inventory.
• Continuous random variables – variables which can assume a value at any fractional point along a specified interval of values.
Example: weight of each shipment, average number of persons per household in a large community.

Probability Distributions (or probability functions)
• A probability function is a correspondence which assigns probabilities to the values of a random variable.

Probability Distributions (or probability functions)
• Mean of a probability distribution:
$\mu = E(x) = \sum x \cdot p(x)$
• Variance of a probability distribution:
$\sigma^2 = \mathrm{Var}(x) = \sum x^2 \cdot p(x) - \mu^2$
where: x = random variable
p(x) = probability of the random variable taking the value x

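The two formulas translate directly into a helper that takes a distribution as a mapping from x to p(x). A sketch, with `mean_and_variance` as a hypothetical name, applied to the number of heads in one coin toss:

```python
import math
from fractions import Fraction

def mean_and_variance(dist):
    """dist maps each value x to p(x). Returns (mu, sigma^2)."""
    assert math.isclose(sum(dist.values()), 1), "probabilities must add up to 1"
    mu = sum(x * p for x, p in dist.items())
    variance = sum(x * x * p for x, p in dist.items()) - mu ** 2
    return mu, variance

# Number of heads when one coin is tossed
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(mean_and_variance(coin))    # (Fraction(1, 2), Fraction(1, 4))
```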
Kinds of Probability Distributions
• Discrete probability distributions:
a. Binomial Distribution
b. Hypergeometric Distribution
c. Poisson Distribution
d. Negative Binomial Distribution
e. Geometric Distribution
f. Multinomial Distribution
• Continuous probability distributions:
a. Normal Distribution
b. Exponential Distribution

Probability Distribution (example 1)
• Toss a coin.
a. Construct the probability distribution of the number of heads.
b. Also, compute the mean and variance of the probability distribution obtained in (a).

Probability Distribution (example 2)
• Toss a die. (a) Construct the probability distribution of the different outcomes of the given experiment. (b) Compute the mean and standard deviation of the probability distribution obtained in (a).
• Solution:
Number of points rolled with a die, x    Probability of x, P(x)
1                                        1/6
2                                        1/6
3                                        1/6
4                                        1/6
5                                        1/6
6                                        1/6

Probability Distribution (example 3)
• Toss two coins.
a. Construct the probability distribution of the number of tails.
b. Also, compute the mean and variance of the probability distribution obtained in (a).

Probability Distribution (example 4)
• The following table gives the probabilities that a computer will malfunction 0, 1, 2, 3, 4, 5, or 6 times on any given day:
No. of malfunctions    Probability
0                      0.15
1                      0.22
2                      0.31
3                      0.18
4                      0.09
5                      0.04
6                      0.01
Calculate the mean and the standard deviation of this probability distribution.

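A sketch that applies the mean and variance formulas to examples 2 to 4 and takes the square root for the standard deviation; the helper is repeated here so the block runs on its own, and the tolerance check allows for floating-point probabilities.

```python
import math
from fractions import Fraction

def mean_and_variance(dist):
    """dist maps each value x to p(x); returns (mu, sigma^2)."""
    assert math.isclose(sum(dist.values()), 1), "probabilities must add up to 1"
    mu = sum(x * p for x, p in dist.items())
    return mu, sum(x * x * p for x, p in dist.items()) - mu ** 2

die = {x: Fraction(1, 6) for x in range(1, 7)}                       # example 2
tails = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}    # example 3
malfunctions = {0: 0.15, 1: 0.22, 2: 0.31, 3: 0.18, 4: 0.09, 5: 0.04, 6: 0.01}

for name, dist in [("die", die), ("tails", tails), ("malfunctions", malfunctions)]:
    mu, var = mean_and_variance(dist)
    print(name, round(float(mu), 4), round(float(var), 4), round(math.sqrt(var), 4))
```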
DISCRETE PROBABILITY
DISTRIBUTIONS

Binomial Distribution
• The binomial distribution is a discrete probability distribution which is applicable whenever a sampling process can be assumed to conform to a Bernoulli process.
• Characteristics:
a. There are 2 mutually exclusive possible outcomes on each trial or observation. For convenience, these are called success and failure.
b. The series of trials, or observations, constitute independent events, and the number of trials is fixed.
c. The probability of success, denoted by p, remains constant from trial to trial; that is, the process is stationary.

Binomial Distribution
• Formulas:
$P(x) = {}_nC_x\, p^x q^{n-x}$
where: $_nC_x = \dbinom{n}{x} = \dfrac{n!}{(n - x)!\, x!}$ = number of combinations
(values are tabulated for 2 < n < 15 with p = .05, .1, .2, …, .9, .95)
p = probability of success in each trial
n = number of trials or observations
x = designated number of successes
q = 1 – p = probability of failure
Mean of a BD: $\mu = np$    Variance of a BD: $\sigma^2 = npq$

BD Example
• The probability that a randomly chosen sales prospect will make a purchase is 0.2. If a salesman calls on six prospects, what is the probability that he will make:
a. exactly 4 sales
b. 4 or more sales
c. at most 2 sales

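A sketch of the binomial formula for the sales-prospect example (Python 3.8+ for `math.comb`); `binomial_pmf` is a hypothetical helper name:

```python
import math

def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n - x)."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

n, p = 6, 0.2
exactly_4 = binomial_pmf(4, n, p)
four_or_more = sum(binomial_pmf(x, n, p) for x in range(4, n + 1))
at_most_2 = sum(binomial_pmf(x, n, p) for x in range(0, 3))

print(round(exactly_4, 5))       # 0.01536
print(round(four_or_more, 5))    # 0.01696
print(round(at_most_2, 5))       # 0.90112
print(n * p, n * p * (1 - p))    # mean np and variance npq
```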
BD as approximation to HD
• If, among 16 delivery trucks, five (5) have worn brakes and ten (10) are chosen at random for inspection, what is the probability that at least three (3) trucks with worn brakes are chosen?

BD Note
• In actual practice, the BD is often used to approximate the HD. It is generally agreed that this approximation is "safe" as long as n (sample size) is less than 5% of N (population size), that is, n < .05N or n < .05(a + b).

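In the truck example, n = 10 is far above 5% of N = 16, so the rule in the note is not met. The sketch below computes the exact hypergeometric answer and the binomial approximation side by side to show how far apart they can be when the rule is broken:

```python
import math

def hypergeometric_pmf(x, n, a, b):
    """P(x) = aCx * bC(n-x) / (a+b)Cn."""
    return math.comb(a, x) * math.comb(b, n - x) / math.comb(a + b, n)

def binomial_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

a, b, n = 5, 11, 10        # 5 trucks with worn brakes, 11 without, sample of 10
exact = sum(hypergeometric_pmf(x, n, a, b) for x in range(3, a + 1))
approx = sum(binomial_pmf(x, n, a / (a + b)) for x in range(3, n + 1))

print(round(exact, 4))     # 0.7582
print(round(approx, 4))    # binomial approximation, noticeably lower
print(n < 0.05 * (a + b))  # False: the 5% rule is not met here
```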
Hypergeometric Distribution
• When sampling is done without replacement of each sampled item taken from a finite population of items, the Bernoulli process does not apply, because there is a systematic change in the probability of success as items are removed from the population. Therefore, the hypergeometric distribution is the appropriate discrete probability distribution.

Hypergeometric Distribution
• Formula:
$P(x) = \dfrac{{}_aC_x \cdot {}_bC_{n-x}}{{}_{a+b}C_n}$, for x = 0, 1, 2, …, n
where
n = size of the sample (the total of designated successes and failures)
a = number of successes in the population
b = number of failures in the population
x = designated number of successes

HD Example
• Of nine employees, three have been with the company five or more years. If 4 employees are chosen randomly from the group of 9, what is the probability that
a. exactly 2
b. fewer than 3
of the employees chosen will have 5 or more years of seniority?

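A sketch of the hypergeometric formula with exact fractions for the seniority example:

```python
import math
from fractions import Fraction

def hypergeometric_pmf(x, n, a, b):
    """Exact P(x) = aCx * bC(n-x) / (a+b)Cn as a fraction."""
    return Fraction(math.comb(a, x) * math.comb(b, n - x), math.comb(a + b, n))

a, b, n = 3, 6, 4          # 3 senior employees, 6 others, choose 4
exactly_2 = hypergeometric_pmf(2, n, a, b)
fewer_than_3 = sum(hypergeometric_pmf(x, n, a, b) for x in range(0, 3))

print(exactly_2)       # 5/14 (about 0.357)
print(fewer_than_3)    # 20/21 (about 0.952)
```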
Poisson Distribution
• The Poisson distribution can be used to determine the probability of a designated number of successes when the events occur in a continuum of time or space; such a process is called a Poisson process. It is similar to a Bernoulli process except that the events occur over a continuum rather than on fixed trials or observations.
• Examples: the number of complaints received by a telephone operator, the number of accidents happening at an intersection, the number of patients entering an ER, etc.

Poisson Distribution
• Formulas:
$P(x) = \dfrac{\lambda^x e^{-\lambda}}{x!} = \dfrac{(np)^x e^{-np}}{x!}$
where
n = number of trials
x = designated number of successes in a Poisson process
e = constant = 2.71828…
p = probability of success
λ = np = μ = number of expected successes (or average number of successes)
Mean of a PD: $E(x) = \lambda = \mu = np$
Variance of a PD: $\mathrm{Var}(x) = \sigma^2 = \lambda$

PD Example 1
• An average of 5 calls for service per hour is received by a machine repair department. What is the probability that
a. exactly three calls
b. fewer than 3 calls
for service will be received in a randomly selected hour?

PD Example 2
• On the average, 12 people per hour ask questions of a decorating consultant in a fabric store. What is the probability that three (3) or more people approach the consultant with questions:
a. during a 10-minute interval?
b. during a 15-minute period?
c. within 2 hours?

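A sketch of the Poisson formula for both examples. In example 2 the sketch assumes, as is standard for a Poisson process, that λ scales in proportion to the length of the interval (12 per hour gives 2, 3, and 24 expected people):

```python
import math

def poisson_pmf(x, lam):
    """P(x) = lam^x e^(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x)."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

# Example 1: lambda = 5 calls per hour
print(round(poisson_pmf(3, 5), 4))       # a. 0.1404
print(round(poisson_cdf(2, 5), 4))       # b. 0.1247

# Example 2: P(3 or more) = 1 - P(X <= 2) for each interval
rate_per_hour = 12
for label, minutes in [("10 min", 10), ("15 min", 15), ("2 hr", 120)]:
    lam = rate_per_hour * minutes / 60
    print(label, round(1 - poisson_cdf(2, lam), 4))
```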
PD Note
• When the number of observations or trials n in a Bernoulli process is large, computations are quite tedious. Further, tabled probabilities for very small values of p are not generally available. Fortunately, the PD is suitable as an approximation of binomial probabilities when n is large (n ≥ 30) and p or (1 − p) is very small (np < 5 or n(1 − p) < 5).

PD Approximation to BD
• For a large shipment of transistors from a supplier, 1% of the items are known to be defective. If a sample of 30 transistors is randomly selected, what is the probability that 2 or more transistors will be defective?

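A sketch comparing the Poisson approximation (λ = np = 0.3) with the exact binomial answer:

```python
import math

n, p = 30, 0.01
lam = n * p                                   # 0.3 expected defectives

# P(2 or more) = 1 - P(0) - P(1)
poisson = 1 - sum(lam ** x * math.exp(-lam) / math.factorial(x) for x in range(2))
binomial = 1 - sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(2))

print(round(poisson, 4))    # 0.0369  (Poisson approximation)
print(round(binomial, 4))   # 0.0361  (exact binomial)
```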
Negative Binomial Distribution
• If repeated independent trials can result in a success with probability p and a failure with probability q = 1 − p, then the probability distribution of the random variable X, the number of the trial on which the kth success occurs, is a negative binomial distribution given by:
$b^*(x; k, p) = \dbinom{x - 1}{k - 1} p^k q^{x - k}$, for x = k, k + 1, k + 2, . . .

NBD Example
• Find the probability that a person tossing 3 coins will get either all heads or all tails for the second time on the fifth toss.

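A sketch of the negative binomial formula; a toss of 3 coins counts as a success when it shows all heads or all tails, which is 2 of the 8 equally likely outcomes:

```python
import math
from fractions import Fraction

def negative_binomial_pmf(x, k, p):
    """b*(x; k, p) = (x-1)C(k-1) * p^k * q^(x-k): kth success on trial x."""
    return math.comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k)

p = Fraction(2, 8)                        # all heads or all tails with 3 coins
answer = negative_binomial_pmf(5, 2, p)   # second success on the fifth toss
print(answer, float(answer))              # 27/256 0.10546875
```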
Geometric Distribution
• If repeated independent trials can result in a success with probability p and a failure with probability q = 1 − p, then the probability distribution of the random variable X, the number of the trial on which the first success occurs, is a geometric distribution given by:
$g(x; p) = p\, q^{x - 1}$, for x = 1, 2, 3, . . .

GD Example
• Find the probability that a person flipping a balanced coin requires 4 tosses to get a head.

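A minimal sketch of the geometric formula with an exact fraction:

```python
from fractions import Fraction

def geometric_pmf(x, p):
    """g(x; p) = p * q^(x-1): first success on trial x."""
    return p * (1 - p) ** (x - 1)

print(geometric_pmf(4, Fraction(1, 2)))   # 1/16
```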
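Multinomial Distribution
• If a given trial can result in the k outcomes $E_1, E_2, \dots, E_k$ with probabilities $p_1, p_2, \dots, p_k$, then the probability distribution of the random variables $x_1, x_2, \dots, x_k$ (the numbers of occurrences of $E_1, E_2, \dots, E_k$ in n independent trials) is a multinomial distribution.
• Formula:
$P(x_1, x_2, \dots, x_k; p_1, p_2, \dots, p_k) = \dbinom{n}{x_1, x_2, \dots, x_k} p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$
with $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} p_i = 1$, where $\dbinom{n}{x_1, x_2, \dots, x_k} = \dfrac{n!}{x_1!\, x_2! \cdots x_k!}$
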
MD Example
• If a pair of dice is tossed 6 times, what is the probability of obtaining a total of 7 or 11 twice, a matching pair once, and any other combination 3 times?

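A sketch of the multinomial formula for the dice example. The three categories are mutually exclusive because a matching pair always has an even total, so it can never be 7 or 11:

```python
import math
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))                          # 36 outcomes
p_7_or_11 = Fraction(sum(a + b in (7, 11) for a, b in rolls), 36)     # 2/9
p_pair = Fraction(sum(a == b for a, b in rolls), 36)                  # 1/6
p_other = 1 - p_7_or_11 - p_pair                                      # 11/18

def multinomial_pmf(counts, probs):
    """n! / (x1! ... xk!) * p1^x1 * ... * pk^xk."""
    coeff = math.factorial(sum(counts))
    for x in counts:
        coeff //= math.factorial(x)
    result = Fraction(coeff)
    for x, p in zip(counts, probs):
        result *= p ** x
    return result

answer = multinomial_pmf([2, 1, 3], [p_7_or_11, p_pair, p_other])
print(answer, round(float(answer), 4))   # 6655/59049 0.1127
```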
CONTINUOUS PROBABILITY
DISTRIBUTIONS

Normal Distribution
• The normal distribution is a continuous probability distribution which is both symmetric and mesokurtic and is given by the function:
$y = f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$
where
μ = population mean
σ = population standard deviation
σ² = population variance
e = natural logarithmic constant = 2.71828…
π = 3.14159…
x = random variable

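A direct transcription of the density into Python, cross-checked against `statistics.NormalDist.pdf` (Python 3.8+); a sketch:

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sqrt(2*pi) * sigma) * exp(-0.5 * ((x - mu) / sigma) ** 2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

# Cross-check against the standard library's implementation
print(normal_pdf(1.0, 0.0, 1.0), NormalDist(0.0, 1.0).pdf(1.0))
```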
Characteristics of the Normal Distribution
1. It looks like a Mexican sombrero; that is, it has a bell-shaped curve.
2. Its mean, median, and mode are all equal.
3. It is symmetrical about its center.
4. Figure: [bell-shaped curve centered at μ, with tails extending toward −∞ and +∞]
5. Although its tails are prolonged indefinitely on both sides, they never touch the horizontal axis.
6. The probability (or area) under the curve bounded by the horizontal axis is always equal to one. In symbols, $P(-\infty < z < +\infty) = 1$.
7. To convert the random variable into its standard units, use
$Z = \dfrac{X - \mu}{\sigma}$

ND Note
• The mean μ of the standard normal curve is zero and its variance σ² is equal to one.
• Figure of the standard normal curve: [bell-shaped curve centered at μ = 0, with σ² = 1]

Importance of the Normal Distribution in Statistical Inference
• Measurements produced in many random processes are known to follow this distribution.
• Normal probabilities can often be used to approximate other probability distributions, such as the binomial and Poisson distributions.
• Distributions of such statistics as the sample mean and the sample proportion often approximately follow the normal distribution for sufficiently large samples, regardless of the distribution of the parent population.

ND Example 1
• On a final exam in Mathematics, the mean was 72 and the standard deviation was 15.
a. Determine the standard scores (i.e., grades in standard units) of students receiving grades of 60, 93, and 72.
b. Find the grades corresponding to the standard scores of –1.00 and 1.60.

ND Example 2
• Two students were informed that they received standard scores of 0.80 and –0.40, respectively, on a multiple-choice examination in English. If their grades were 88 and 64, respectively, find the mean and standard deviation of the examination grades.

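Both examples rest on $Z = (X - \mu)/\sigma$ and its inverse $X = \mu + Z\sigma$; in example 2 the two equations are solved for σ first, then μ. A sketch:

```python
def z_score(x, mu, sigma):
    return (x - mu) / sigma                   # Z = (X - mu) / sigma

def raw_score(z, mu, sigma):
    return mu + z * sigma                     # X = mu + Z * sigma

# Example 1: mu = 72, sigma = 15
print([z_score(x, 72, 15) for x in (60, 93, 72)])      # [-0.8, 1.4, 0.0]
print([raw_score(z, 72, 15) for z in (-1.00, 1.60)])   # [57.0, 96.0]

# Example 2: solve mu + 0.80*sigma = 88 and mu - 0.40*sigma = 64
z1, x1, z2, x2 = 0.80, 88, -0.40, 64
sigma = (x1 - x2) / (z1 - z2)
mu = x1 - z1 * sigma
print(round(mu, 2), round(sigma, 2))   # 72.0 20.0
```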
AREAS under the Normal Curve
Case 1. Area between 0 and ±z: get the area directly from the table.
Case 2. Area between two z values:
a. if the signs are alike, subtract their areas (bigger area − smaller area).
b. if the signs are unlike, add their areas.
Case 3. Area to the left or right of ±z: add the table area to, or subtract it from, 0.5.

Steps in finding areas under the Normal Curve
1. Draw the correct figure.
2. Write the area or probability notation.
3. Solve for, or find, the area using the table of areas under the normal curve.

ND Example 3
• Find the areas under the normal curve in each of the cases below:
a. between 0 and 1.23
b. between –0.68 and 0
c. between –0.46 and 2.21
d. between 0.81 and 1.94
e. to the left of –0.68
f. to the right of –2.05 and to the left of –1.44
g. to the left of 1.28
h. to the right of –0.78
i. to the right of 2.18
j. between –2.05 and –1.44

ND Example 4
• Determine the value(s) of z in each of the following cases, where area refers to the area under the normal curve:
a. the area between 0 and z is 0.3770
b. the area to the left of z is 0.8621
c. the area between –1.50 and z is 0.0217

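`statistics.NormalDist` (Python 3.8+) can stand in for the table of areas: `cdf` gives the area to the left of z, and `inv_cdf` goes from an area back to z. A sketch for some of the cases in examples 3 and 4, rounded to match a four-place table:

```python
from statistics import NormalDist

Z = NormalDist()           # standard normal: mu = 0, sigma = 1
Phi = Z.cdf                # area to the left of z

def area_between(z1, z2):
    return Phi(z2) - Phi(z1)

# Example 3 (a few of the cases)
print(round(area_between(0, 1.23), 4))       # a. 0.3907
print(round(area_between(-0.46, 2.21), 4))   # c. 0.6637
print(round(area_between(0.81, 1.94), 4))    # d. 0.1828
print(round(1 - Phi(-0.78), 4))              # h. 0.7823

# Example 4: invert the table with inv_cdf
print(round(Z.inv_cdf(0.5 + 0.3770), 2))     # a. 1.16 (z = -1.16 also works)
print(round(Z.inv_cdf(0.8621), 2))           # b. 1.09
target = Phi(-1.50) + 0.0217                 # c. z to the right of -1.50
print(round(Z.inv_cdf(target), 2))           # -1.35 (another z lies left of -1.50, near -1.69)
```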
ND Example 5
• The grapefruits grown in a large orchard have a mean weight of 19.3 ounces with a standard deviation of 2.2 ounces. Assuming that the distribution of the weights of these grapefruits has roughly the shape of a normal distribution, find:
a. what percentage of the grapefruits weigh:
1. less than 18.0 oz.
2. at least 20.0 oz.
3. between 18.5 and 20.5 oz.
b. the weight below which lies the lightest 15 percent of the grapefruits;
c. the weight above which lies the heaviest 25 percent of the grapefruits.

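A sketch of the grapefruit example with `statistics.NormalDist`, assuming the normal shape stated in the problem:

```python
from statistics import NormalDist

weights = NormalDist(mu=19.3, sigma=2.2)   # ounces

pct_below_18 = 100 * weights.cdf(18.0)
pct_at_least_20 = 100 * (1 - weights.cdf(20.0))
pct_between = 100 * (weights.cdf(20.5) - weights.cdf(18.5))
lightest_15 = weights.inv_cdf(0.15)        # b. 15% of the area lies below this weight
heaviest_25 = weights.inv_cdf(0.75)        # c. 25% of the area lies above this weight

print(round(pct_below_18, 1), round(pct_at_least_20, 1), round(pct_between, 1))  # 27.7 37.5 34.9
print(round(lightest_15, 2), round(heaviest_25, 2))                              # 17.02 20.78
```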
Normal Approximations
Rules on correction for continuity:
• In general, when use of the correction for continuity is appropriate, 0.50 is either added or subtracted according to the form of the probability value required. To treat discrete data as continuous data:
a. Subtract 0.50 from $x_i$ when $P(x \ge x_i)$ is required ("at least $x_i$" or "$x_i$ or more").
b. Subtract 0.50 from $x_i$ when $P(x < x_i)$ is required ("less than $x_i$").
c. Add 0.50 to $x_i$ when $P(x > x_i)$ is required ("more than $x_i$").
d. Add 0.50 to $x_i$ when $P(x \le x_i)$ is required ("at most $x_i$" or "$x_i$ or less").

Normal Approximation to BD
• When the number of observations or trials n is relatively large, the normal probability distribution can be used to approximate the BD. This is acceptable whenever n ≥ 30 and both np ≥ 5 and n(1 – p) ≥ 5.
• Example: For a large group of sales prospects, it is known that 20% of those contacted personally by a sales representative will make a purchase. If a sales representative contacts 30 prospects, determine the probability that 10 or more will make a purchase.

Normal Approximation to BD
• Find the probability of getting 4 heads in 12 flips of a balanced coin.

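A sketch of the normal approximation with the continuity-correction rules, for both binomial examples. The 12-flip example has n below 30, so the sketch also prints the exact binomial value for comparison:

```python
import math
from statistics import NormalDist

def normal_approx(n, p):
    """Normal curve with mu = np and sigma = sqrt(npq)."""
    return NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

# 10 or more purchases out of 30 (p = 0.2): P(X >= 10) -> subtract 0.5
sales = normal_approx(30, 0.2)
print(round(1 - sales.cdf(10 - 0.5), 4))     # 0.0551

# Exactly 4 heads in 12 flips: P(X = 4) -> area from 3.5 to 4.5
coin = normal_approx(12, 0.5)
approx = coin.cdf(4 + 0.5) - coin.cdf(4 - 0.5)
exact = math.comb(12, 4) / 2 ** 12
print(round(approx, 4), round(exact, 4))     # 0.1188 0.1208
```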
Normal Approximation to PD
• When the mean λ of a PD is relatively large, the normal probability distribution can be used to approximate the PD. A convenient rule of thumb is that such an approximation is acceptable when λ ≥ 10.
• Example: The average number of calls for service received by a machine repair department per 8-hour shift is 10. Determine the probability of more than 15 calls being received during a randomly selected 8-hour shift.

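A sketch of the normal approximation (μ = λ, σ = √λ) with the continuity correction for "more than 15", plus the exact Poisson value for comparison:

```python
import math
from statistics import NormalDist

lam = 10
approx = NormalDist(mu=lam, sigma=math.sqrt(lam))
# "More than 15" -> P(X > 15): add 0.5, i.e. the area to the right of 15.5
p_normal = 1 - approx.cdf(15 + 0.5)
# Exact Poisson value: 1 - P(X <= 15)
p_exact = 1 - sum(lam ** x * math.exp(-lam) / math.factorial(x) for x in range(16))
print(round(p_normal, 4), round(p_exact, 4))
```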
Exponential Distribution
• If events occur in the context of a Poisson process, then the length of time or space between successive events follows an exponential distribution. Since time or space is a continuum, the distribution is a continuous one.
• The exponential distribution applies whether we are concerned with the time (or space) until the very first event, the time between 2 successive events, or the time until the first event occurs after any randomly selected point. Here λ is the mean number of occurrences for the interval of interest.

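Exponential Distribution
• Formulas:
$P(T \le t) = 1 - e^{-\lambda} = 1 - e^{-t/\mu}$
Mean of an ED: $E(T) = \dfrac{1}{\lambda}$    Variance of an ED: $\mathrm{Var}(T) = \dfrac{1}{\lambda^2}$
where
T = variable designated as time
λ = mean number of occurrences for the interval of interest (from 0 to t); in the mean and variance, λ is the mean number of occurrences per unit of time
μ = mean time between occurrences
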
ED Example 1
• An average of 5 calls per hour is received by a machine repair department. Beginning at a random point in time, what is the probability that the first call for service will arrive within a half hour?

ED Example 2
• Find the probability that a random variable having an exponential distribution with μ = 10 will take on a value
a. between 0 and 4;
b. greater than 6;
c. between 8 and 12.

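A sketch of $P(T \le t) = 1 - e^{-t/\mu}$ for both examples; in example 1 the mean time between calls is μ = 1/5 hour, so a half hour corresponds to λ = 2.5 expected calls. `exp_cdf` is a hypothetical helper name.

```python
import math

def exp_cdf(t, mean):
    """P(T <= t) = 1 - e^(-t / mu), where mu is the mean time between events."""
    return 1 - math.exp(-t / mean)

# Example 1: 5 calls per hour -> mu = 1/5 hour; first call within half an hour
print(round(exp_cdf(0.5, 1 / 5), 4))                 # 0.9179

# Example 2: mu = 10
print(round(exp_cdf(4, 10), 4))                      # a. 0.3297
print(round(1 - exp_cdf(6, 10), 4))                  # b. 0.5488
print(round(exp_cdf(12, 10) - exp_cdf(8, 10), 4))    # c. 0.1481
```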
