
Business Statistics

PGDM(2017-19)
Term-II (Sep-Dec,2017)

Kakali Kanjilal
Associate Professor, Operations
IMI, Delhi
Continuous Probability
Distributions:
Normal Distribution
[Figure: bell-shaped normal density curve, f(x) plotted against x]
Continuous Probability Functions

Unlike a discrete random variable, a continuous random variable is one that can assume an uncountable number of values.

We cannot list the possible values because there is an infinite number of them.
Because there is an uncountably infinite number of values, the probability of each individual value is 0.
Point probabilities are zero.
Continuous Probability Functions

Thus, we can determine the probability of a range of values only.

So, for a continuous random variable, it is meaningful to talk about P(X ≤ 5) but not P(X = 5), unlike for a discrete random variable.
Discrete and Continuous Random
Variables - Revisited
A discrete random variable:
counts occurrences

has a countable number of possible values

has discrete jumps between successive values

has measurable probability associated with individual values

probability is height

For example: Binomial Distribution, n = 3, p = .5

x      P(x)
0      0.125
1      0.375
2      0.375
3      0.125
Total  1.000

[Figure: bar chart of the binomial distribution with n = 3, p = .5; P(x) against x = 0, 1, 2, 3]
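The table values can be reproduced from the binomial formula. The following is a minimal sketch; the helper name `binomial_pmf` is an illustrative choice, not something from the slides.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 3, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Each individual value has a measurable probability (the height of its bar).
for x, prob in enumerate(pmf):
    print(x, prob)
print("total", sum(pmf))
```

For a discrete variable the probabilities of the individual values sum to 1, which is what the last line checks.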
Discrete and Continuous Random
Variables - Revisited
A continuous random variable:
measures (e.g.: height, weight, speed, value, duration, length)
has an uncountably infinite number of possible values

moves continuously from value to value

has no measurable probability associated with individual values
probability is area
For example:
In this case, the shaded area represents the probability that the task takes between 2 and 3 minutes.
[Figure: Minutes to Complete Task; P(x) against minutes, 1 to 6, with the area between 2 and 3 shaded]
From a Discrete to a Continuous
Distribution
The time it takes to complete a task can be subdivided into:
Half-Minute (0.5) Intervals
Quarter-Minute (0.25) Intervals
Eighth-Minute (0.125) Intervals
[Figure: three histograms of Minutes to Complete Task, by half-minutes, fourths of a minute, and eighths of a minute; P(x) against minutes]

Or even infinitesimally small intervals:

When a continuous random variable has been subdivided into infinitesimally small intervals, a measurable probability can only be associated with an interval of values, and the probability is given by the area beneath the probability density function corresponding to that interval.

In this example, the shaded area represents P(2 ≤ X ≤ 3).
[Figure: Minutes to Complete Task: Probability Density Function; density against minutes, 0 to 7]
Cumulative Distribution Function and
Probability Density Function
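For a continuous random variable,

P(a ≤ X ≤ b) = F(b) - F(a)
             = Area under f(x) between a and b

[Figure: two panels. Top: cumulative distribution function F(x) against x, with F(a) and F(b) marked. Bottom: density f(x) against x, with the area between a and b shaded.]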
Continuous Random Variables
A continuous random variable is a random variable that can take
on any value in an interval of numbers.

The probabilities associated with a continuous random variable X are determined by the probability density function of the random variable. The function, denoted f(x), has the following properties.

1. f(x) ≥ 0 for all x.


2. The probability that X will be between two numbers a and b
is equal to the area under f(x) between a and b.
3. The total area under the curve of f(x) is equal to 1.00.

The cumulative distribution function of a continuous random variable:

F(x) = P(X ≤ x) = Area under f(x) between the smallest possible value of X (often -∞) and the point x.
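As a rough numerical check of the statement that an interval probability is both a difference of cumulative values and an area under the density, the sketch below uses Python's `statistics.NormalDist`. The choice of a standard normal variable and the endpoints a = -1 and b = 2 are assumptions made only for illustration.

```python
from statistics import NormalDist

# Assumption: X is standard normal; a and b are arbitrary illustration values.
X = NormalDist(mu=0.0, sigma=1.0)
a, b = -1.0, 2.0

# Interval probability from the cumulative distribution function: F(b) - F(a).
prob_from_cdf = X.cdf(b) - X.cdf(a)

# The same probability as an area: midpoint-rule sum of f(x) over [a, b].
steps = 10_000
width = (b - a) / steps
prob_from_area = sum(X.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

print(round(prob_from_cdf, 4), round(prob_from_area, 4))
```

The two numbers agree to several decimal places, which is the relationship P(a ≤ X ≤ b) = F(b) - F(a) in practice.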
Normal Probability Distribution
As n increases, the binomial distribution approaches a ...
[Figure: three bar charts of the binomial distribution with p = .5 and n = 6, n = 10, n = 14; P(x) against x]

Normal Probability Density Function:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \quad \text{for } -\infty < x < \infty$$

where e = 2.7182818... and π = 3.14159265...

[Figure: Normal Distribution: μ = 0, σ = 1; f(x) against x from -5 to 5]
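The density formula can be evaluated directly. The sketch below writes it out term by term and compares it with the standard library's `NormalDist.pdf`; the function name `normal_pdf` and the test point are illustrative assumptions.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

# Height of the standard normal curve at its peak (x = mu = 0).
print(normal_pdf(0.0, 0.0, 1.0))

# Cross-check against the library implementation at an arbitrary point.
print(normal_pdf(1.3, 2.0, 0.7), NormalDist(2.0, 0.7).pdf(1.3))
```

Note that the value returned is a height of the curve, not a probability; probabilities come from areas under it.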
Normal Probability Distribution

The normal probability distribution is the most important distribution for describing a continuous random variable.

It has been used in a wide variety of applications including:
Heights of people
Rainfall amounts
Test scores
Stock Prices
Normal Probability Distribution

If a random variable is affected by many independent causes, and the effect of each cause is not overwhelmingly large compared to other effects, then the random variable will closely follow a normal distribution.

It is widely used in statistical inference.


Normal Probability Distribution

Normal Probability Density Function

Developed by Carl Friedrich Gauss.

μ = mean
σ = standard deviation
π = 3.14159
e = 2.71828

So, it is described by two parameters: mean μ and standard deviation σ.
Normal Probability Distribution
Characteristics
The entire family of normal probability distributions is defined by its mean μ and its standard deviation σ.

[Figure: normal curve over x, with the mean μ and the standard deviation σ marked]
Normal Probability Distribution
Characteristics
The distribution is symmetric; its skewness measure is zero.
Normal Probability Distribution
Characteristics
The highest point on the normal curve is at the mean, which is also the median and mode.

Mean = Median = Mode
Mean, Median and Mode
The mean (or average) of a set of data values is the sum of all of the data values divided by the number of data values.

The median of a set of data values is the middle value of the data set when it has been arranged in ascending order. That is,
from the smallest value to the highest value.
Median = ((n+1)/2)th value of the ordered data; n is the number of data values in the sample
If the number of values in the data set is even, then the median is the average of the two middle values.
- Half of the values in the data set lie below the median and half lie above the median
- The median is the most commonly quoted figure used to measure property prices. The use of the median avoids the
problem of the mean property price which is affected by a few expensive properties that are not representative of the
general property market.

The mode of a set of data values is the value(s) that occurs most often.
- It is possible for a set of data values to have more than one mode.
- If there are two data values that occur most frequently, we say that the set of data values is bimodal.

If there is no data value or data values that occur most frequently, we say that the set of data values has no mode.
The mean, median and mode of a data set are collectively known as measures of central tendency as these three measures
focus on where the data is centred or clustered. To analyse data using the mean, median and mode, we need to use the
most appropriate measure of central tendency. The following points should be remembered:

The mean is useful for predicting future results when there are no extreme values in the data set. However, the impact of
extreme values on the mean may be important and should be considered. E.g. The impact of a stock market crash on
average investment returns.

The median may be more useful than the mean when there are extreme values in the data set as it is not affected by the
extreme values.

The mode is useful when the most common item, characteristic or value of a data set is required.
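To illustrate how an extreme value pulls the mean but not the median, here is a small sketch using Python's `statistics` module. The property prices are hypothetical numbers invented for this example, not data from the slides.

```python
from statistics import mean, median, multimode

# Hypothetical property prices (in thousands); the last value is an extreme one.
prices = [200, 210, 210, 220, 230, 240, 1500]

avg = mean(prices)         # pulled upward by the single expensive property
mid = median(prices)       # middle value of the ordered data, here the 4th of 7
modes = multimode(prices)  # list of the most frequent value(s)

print(avg, mid, modes)

# A bimodal data set has two values that occur most frequently.
print(multimode([1, 1, 2, 2, 3]))
```

With 7 ordered values the median is the ((7+1)/2) = 4th value, which stays at 220 however large the most expensive property is.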
Normal Probability Distribution
Characteristics
The mean can be any numerical value: negative,
zero, or positive.
Different Mean μ, Same Standard Deviation σ
[Figure: three normal curves over x, centered at -10, 0, and 25]
Normal Probability Distribution
Characteristics
The standard deviation determines the width of the
curve: larger values result in wider, flatter curves.
Normal Probability Distribution
Characteristics
Probabilities for the normal random variable are
given by areas under the curve. The total area
under the curve is 1 (.5 to the left of the mean and
.5 to the right).

[Figure: normal curve with an area of .5 on each side of the mean]
Properties of Normal Probability
Distribution
The normal is a family of bell-shaped and symmetric distributions.
Because the distribution is symmetric, one-half (.50 or 50%) lies on either side of the mean.

Each is asymptotic to the horizontal axis.

If several independent random variables are normally distributed then their sum will also be normally distributed.
The mean of the sum will be the sum of all the individual means.
The variance of the sum will be the sum of all the individual variances (by virtue of the independence).
Properties of Normal Probability
Distribution
If X1, X2, ..., Xn are independent normal random variables, then their sum S will also be normally distributed with

E(S) = E(X1) + E(X2) + ... + E(Xn)

V(S) = V(X1) + V(X2) + ... + V(Xn)
Note: It is the variances that can be added above and not the standard deviations.

For example: the sales of an automobile company are the sum of sales in the small-segment, middle-segment and high-segment cars. If the sales in the 3 segments are independent and normally distributed, the total sales will also be normally distributed.
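The additivity of means and variances can be sketched with `statistics.NormalDist`, which supports adding independent normal variables. The segment means and standard deviations below are hypothetical values chosen only to make the example run.

```python
from statistics import NormalDist

# Hypothetical monthly sales (units) for three independent segments.
small = NormalDist(mu=500, sigma=40)
middle = NormalDist(mu=300, sigma=30)
high = NormalDist(mu=100, sigma=20)

# Adding NormalDist objects treats them as independent random variables.
total = small + middle + high

print(total.mean)      # sum of the means
print(total.variance)  # sum of the variances: 40^2 + 30^2 + 20^2
print(total.stdev)     # square root of the summed variance, not 40 + 30 + 20
```

The standard deviation of the total is smaller than the sum of the three standard deviations, which is why the note says to add variances rather than standard deviations.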
Normal Probability Distribution
Characteristics (basis for the empirical rule)
[Figure: normal curve over x with the intervals μ ± 1σ, μ ± 2σ and μ ± 3σ marked, covering 68.26%, 95.44% and 99.73% of the area]
Normal Probability Distribution
Characteristics (basis for the empirical rule)

68.26% of values of a normal random variable are within +/- 1 standard deviation of its mean.

95.44% of values of a normal random variable are within +/- 2 standard deviations of its mean.

99.73% of values of a normal random variable are within +/- 3 standard deviations of its mean.
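The empirical-rule percentages can be recomputed from the standard normal cumulative distribution function. This is a minimal sketch using `statistics.NormalDist`; slide figures may differ from it in the last digit because they come from a four-decimal table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(f"within +/- {k} standard deviation(s): {prob * 100:.2f}%")
```

Because every normal variable can be standardized, the same three areas apply to any mean and standard deviation.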
Normal Probability Distribution
All of these are normal probability density functions, though each has a different mean and variance.

W ~ N(40, 1)    X ~ N(30, 25)    Y ~ N(50, 9)    Z ~ N(0, 1)

[Figure: four normal density curves: μ = 40, σ = 1; μ = 30, σ = 5; μ = 50, σ = 3; μ = 0, σ = 1]

Consider:
P(39 ≤ W ≤ 41)
P(25 ≤ X ≤ 35)
P(47 ≤ Y ≤ 53)
P(-1 ≤ Z ≤ 1)

The probability in each case is an area under a normal probability density function.
Standard Normal Probability
Distribution
Characteristics
A random variable having a normal distribution with a mean of 0 and a standard deviation of 1 is said to have a standard normal probability distribution.
The letter z is used to designate the standard normal random variable: Z = (X - μ)/σ

[Figure: standard normal curve over z, with μ = 0 and σ = 1]
Example: Demand at Gas Station
Suppose that at a gas station the daily demand for regular
gasoline is normally distributed with a mean of 1,000 gallons
and a standard deviation of 100 gallons.

The station manager has just opened the station for business and notes that there are exactly 1,100 gallons of regular gasoline in storage.

The next delivery is scheduled later today at the close of business. The manager would like to know the probability that he will have enough regular gasoline to satisfy today's demand, or P(X < 1,100).
Example: Demand at Gas Station
The demand is normally distributed with mean μ = 1,000 and standard deviation σ = 100. We want to find the probability P(X < 1,100). Graphically we want to calculate:

[Figure: normal curve with the area P(X < 1,100) shaded]

P(X < 1,100) = P(Z < 1.00) can be calculated using:

1. Excel (shown later)
2. Empirical Rule discussed before
3. Using Standard Normal Probability Table
Example: Demand at Gas Station
P(X<1,100) using Excel
Example: Demand at Gas Station
P(X<1,100) using Empirical Rule
Rule: 68.26% of values of a normal random variable are within +/- 1 standard deviation of its mean: P(μ - 1σ <= X <= μ + 1σ) = 68.26%.
So, the distribution being perfectly symmetrical, P(μ <= X <= μ + 1σ) = 34.13%.

With μ = 1000 and σ = 100:
P(X <= 1,100) = P(X <= μ + 1σ)

P(X <= 1000) = 0.5
So, P(X <= 1100) = 0.5 + 0.3413 = 0.8413

OR, 1 - 0.68 = 0.32 lies outside +/- 1 standard deviation, which is 0.16 in each tail.
So, P(X <= 1100) = 0.68 + 0.16 = 0.84
Example: Demand at Gas Station
using Standard Normal Prob Table
Step 1: Convert x to the standard normal distribution.

$$P(X < 1{,}100) = P\left(\frac{X - \mu}{\sigma} < \frac{1{,}100 - 1{,}000}{100}\right) = P(Z < 1.00)$$
Finding Probabilities of the Standard
Normal Distribution: P(0 < Z < 1.56)
Standard Normal Probabilities
Area under the standard normal curve between 0 and z:

z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

Look in row labeled 1.5 and column labeled .06 to find P(0 ≤ z ≤ 1.56) = 0.4406

[Figure: Standard Normal Distribution; f(z) against z from -5 to 5, with the area between 0 and 1.56 marked]
Example: Demand at Gas Station
using Standard Normal Prob Table
P(X < 1,100) = P( Z < 1.00) = 0.5 + 0.3413=0.8413
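The gas station calculation can also be checked in code. The sketch below uses Python's `statistics.NormalDist` as a stand-in for the table lookup; the variable names are illustrative.

```python
from statistics import NormalDist

# Daily demand for regular gasoline: mean 1,000 gallons, standard deviation 100.
demand = NormalDist(mu=1000, sigma=100)
stock = 1100

# Step 1: standardize the stock level.
z = (stock - demand.mean) / demand.stdev

# Step 2: probability that demand stays below the stock on hand.
p_enough = demand.cdf(stock)

print(z, round(p_enough, 4))
```

The result matches the table-based answer of 0.5 + 0.3413 = 0.8413.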
Problem: Pep Zone
Pep Zone sells auto parts and supplies including a popular multi-
grade motor oil. When the stock of this oil drops to 20 gallons, a
replenishment order is placed.

The store manager is concerned that sales are being lost due to
stock-outs while waiting for a replenishment order.

The manager would like to know the probability of a stock-out during replenishment lead-time. In other words, what is the probability that demand during lead-time will exceed 20 gallons?

It has been determined that demand during replenishment lead-time is normally distributed with a mean of 15 gallons and a standard deviation of 6 gallons.
Problem: Pep Zone

Solving for the Stock-out Probability P(X > 20)

Step 1: Convert x to the standard normal distribution.

z = (x - μ)/σ
  = (20 - 15)/6
  = .83

Step 2: To find P(Z > 0.83), we need to first find the area under the standard normal curve to the left of z = .83.
Problem: Pep Zone
using Standard Normal Prob Table
P(Z > 0.83) = 1 - (0.5 + 0.2967) = 1 - 0.7967 = 0.2033

Probability of no stock-out during replenishment lead-time = P(Z <= 0.83) = .7967
Probability of a stock-out during replenishment lead-time = P(Z > 0.83) = 0.2033

[Figure: standard normal curve over z, divided at z = .83]
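For comparison, the stock-out probability can be computed without rounding z to two decimals. This is a sketch using `statistics.NormalDist`; it gives roughly 0.2023 rather than the table-based 0.2033, because the table lookup uses z = .83 instead of 0.8333...

```python
from statistics import NormalDist

# Lead-time demand: mean 15 gallons, standard deviation 6 gallons.
lead_time_demand = NormalDist(mu=15, sigma=6)
reorder_point = 20

# P(X > 20) is the complement of the cumulative probability at 20.
p_stockout = 1 - lead_time_demand.cdf(reorder_point)

# Same calculation with z rounded to two decimals, as in the table lookup.
z_rounded = round((reorder_point - 15) / 6, 2)
p_stockout_table_style = 1 - NormalDist().cdf(z_rounded)

print(round(p_stockout, 4), z_rounded, round(p_stockout_table_style, 4))
```

Either way, the chance of a stock-out at a reorder point of 20 gallons is about .20.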
Problem: Pep Zone

If the manager of Pep Zone wants the probability of a stock-out during replenishment lead-time to be no more than .05, what should the reorder point be, that is, what should x be?

HINT: Given a probability, we can use the standard normal table in an inverse fashion to find the corresponding z value.
In the earlier case we moved from X to Z. Here we first find the Z value corresponding to the given probability, which is now 0.05, and then move from Z back to X.
Problem: Pep Zone
Solving for the Reorder Point

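Step 1: Find the z-value that cuts off an area of .05 in the right tail of the standard normal distribution.

[Figure: standard normal curve over z, with the right-tail area of .0500 beyond z.05 shaded]

P(Z > z.05) = 0.05 => P(0 <= Z <= z.05) = 0.5 - 0.05 = 0.45.
Looking up the table, z = 1.64 gives 0.4495 and z = 1.65 gives 0.4505, so z.05 ≈ 1.645.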
Problem: Pep Zone
Solving for the Reorder Point

Step 2: Convert z.05 to the corresponding value of x.

x = μ + z.05 σ
  = 15 + 1.645(6)
  = 24.87 or 25

A reorder point of 25 gallons will place the probability of a stock-out during lead time at (slightly less than) .05.
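The inverse lookup can be sketched with `NormalDist.inv_cdf`, which returns the value below which a given fraction of the distribution lies. The variable names are illustrative assumptions.

```python
from statistics import NormalDist

lead_time_demand = NormalDist(mu=15, sigma=6)
max_stockout_prob = 0.05

# z-value with 5% of the area to its right, i.e. 95% to its left.
z_05 = NormalDist().inv_cdf(1 - max_stockout_prob)

# Move from Z back to X: x = mu + z * sigma.
reorder_point = lead_time_demand.mean + z_05 * lead_time_demand.stdev

# Stock-out probability if the reorder point is rounded up to 25 gallons.
p_stockout_at_25 = 1 - lead_time_demand.cdf(25)

print(round(z_05, 3), round(reorder_point, 2), round(p_stockout_at_25, 4))
```

Rounding the reorder point up to 25 gallons keeps the stock-out probability just under .05, as the slide states.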
Problem: Pep Zone
Solving for the Reorder Point

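Probability of no stock-out during replenishment lead-time = .95
Probability of a stock-out during replenishment lead-time = .05

[Figure: normal curve over x with mean 15 and the reorder point 24.87 marked; area .95 to the left, .05 to the right]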
Problem: Pep Zone
Solving for the Reorder Point

By raising the reorder point from 20 gallons to 25 gallons on hand, the probability of a stock-out decreases from about .20 to .05.

This is a significant decrease in the chance that Pep Zone will be out of stock and unable to meet a customer's desire to make a purchase.
Problem: Investment Decisions
Consider an investment whose return is normally
distributed with a mean of 10% and a standard deviation
of 5%.
a. Determine the probability of losing money or P(X<0)
b. Find the probability of losing money when the
standard deviation is equal to 10%.
Problem: Investment Decisions
The investment loses money when the return is negative.
Thus we wish to determine
P(X < 0)
The first step is to standardize both X and 0 in the probability statement.

$$P(X < 0) = P\left(\frac{X - \mu}{\sigma} < \frac{0 - 10}{5}\right) = P(Z < -2.00)$$

which, by symmetry, is the same as P(Z > 2) = 0.5 - 0.4772 = 0.0228, using the standard normal probability table.

Therefore the probability of losing money is .0228


Problem: Investment Decisions

If we increase the standard deviation to 10%, the probability of suffering a loss becomes

$$P(X < 0) = P\left(\frac{X - \mu}{\sigma} < \frac{0 - 10}{10}\right) = P(Z < -1.00)$$

which, by symmetry, equals P(Z > 1).

P(Z > 1) = 0.5 - 0.3413 = .1587
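Both parts of the investment problem can be checked with a short sketch. It assumes returns are measured in percent and uses `statistics.NormalDist`; the function name `prob_of_loss` is illustrative.

```python
from statistics import NormalDist

def prob_of_loss(mean_return: float, sd_return: float) -> float:
    """P(X < 0) for a normally distributed return (both arguments in percent)."""
    return NormalDist(mu=mean_return, sigma=sd_return).cdf(0.0)

p_loss_sd5 = prob_of_loss(10, 5)    # part a: standard deviation 5%
p_loss_sd10 = prob_of_loss(10, 10)  # part b: standard deviation 10%

print(round(p_loss_sd5, 4), round(p_loss_sd10, 4))
```

Doubling the standard deviation while keeping the mean at 10% raises the chance of a loss from about 2% to about 16%.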
Standard Normal Probability
Notice that the largest value of z in the table is 3.09, and that P(Z < 3.09) = .9990. This means that

P(Z > 3.09) = 1 - .9990 = .0010

However, because the table lists no values beyond 3.09, we approximate any area beyond 3.10 as 0. That is,

P(Z > 3.10) = P(Z < -3.10) ≈ 0


Normal Approximation of Binomial
Distribution

The normal distribution is another approximation of the binomial distribution under the following conditions:

n, the number of trials, is very large

neither p (probability of occurrence) nor (1 - p) is very small
A rule of thumb:
np > 5 and n(1 - p) > 5
Normal Approximation of Binomial
Distribution
In the definition of the normal curve, set
$\mu = np$ and $\sigma = \sqrt{np(1 - p)}$
Add and subtract a continuity correction factor because a
continuous distribution is being used to approximate a discrete
distribution.

So, a column in the histogram of a binomial distribution, say x = 12, covers in the continuous sense [11.5, 12.5]. So, P(x = 12) for the discrete binomial probability distribution is approximated by P(11.5 < x < 12.5) for the continuous normal distribution.
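The continuity correction can be tried on a concrete case. The slides do not give n and p for the x = 12 example, so the values n = 25 and p = 0.5 below are assumptions chosen so that np > 5 and n(1 - p) > 5.

```python
from math import comb, sqrt
from statistics import NormalDist

# Assumed parameters (not from the slides): n = 25 trials, p = 0.5.
n, p = 25, 0.5
x = 12

# Exact binomial probability P(X = 12).
exact = comb(n, x) * p**x * (1 - p) ** (n - x)

# Normal approximation with mu = np and sigma = sqrt(np(1 - p)),
# applying the continuity correction: P(11.5 < X < 12.5).
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(x + 0.5) - approx_dist.cdf(x - 0.5)

print(round(exact, 4), round(approx, 4))
```

Without the half-unit adjustment on each side, the normal model would assign probability zero to the single point x = 12.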
Normal Distribution Problems
CASE: DAILY SENSEX RETURN
Mr. Arin is a short-term investor in the stock market, and he prefers to invest in indices rather than in stocks directly.
He works in the IT consulting division of Accenture. He holds an MBA degree in Information
Management from one of the top four IIMs. In Mar 2008, he invests 40% of his bonus in the BSE
SENSEX, expecting at least a 15-20% return in the second quarter, when the SENSEX would cross the
mark of 19000. However, at the end of the first quarter, the SENSEX starts sliding down rapidly and
investors start panicking about the future. Arin is also in doubt about the return at the end of the second
quarter. Before taking a final call on redemption, he wants to check the results of some basic
statistics (probability distribution, sampling distribution, estimation and confidence interval) by
analyzing the daily return of the SENSEX from the time he invested in it, Mar-2008. He thinks
this would give some guidance in his final decision. He decides to reach out to a market analyst after
analyzing some basic things on the SENSEX.
Mr. Arin recalls his statistics lessons from his MBA. He correctly recalls that if a random variable is
affected by many independent causes, and the effect of each cause is not overwhelmingly large
compared to other effects, then the random variable will closely follow a normal distribution. He
knows that the SENSEX is driven by multiple factors, though identifying and interpreting each factor is
a difficult task. If multiple forces work behind the SENSEX, does the return really follow a normal
distribution?
He first downloads the daily SENSEX values from Mar-08 to Jun-08 from the BSE website. He then derives the
daily return of the SENSEX, as he is interested in its return only.

12/12/2017 49
CASE: DAILY SENSEX RETURN
How should he proceed to get the following answers?
1. Does the daily return of SENSEX follow a normal distribution?
How do you derive this information?
2. What are the average daily return and its variation?
3. What is the chance that the return is negative?
4. What are the two values for which Arin can be 95% sure that the daily
return would lie between these values?
5. Does this help Arin in his decision?
6. If yes, explain the reasons. If not, what additional information does
Arin require? Should he seek advice from market analyst?

Data is sourced from Bombay Stock Exchange
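One possible way to approach questions 2 to 4 is sketched below, under the assumption that the daily returns are approximately normal (question 1 would need to be checked separately, for example with a histogram). The list `returns` holds placeholder numbers invented only so the code runs; it is not SENSEX data and should be replaced with the actual daily returns.

```python
from statistics import NormalDist, mean, stdev

# PLACEHOLDER daily returns in percent, invented purely to make the sketch run.
# Replace with the actual daily SENSEX returns before drawing any conclusion.
returns = [0.8, -1.2, 0.3, 1.5, -0.7, -2.1, 0.9, 0.4, -0.3, 1.1]

avg = mean(returns)  # question 2: average daily return
sd = stdev(returns)  # question 2: variation (sample standard deviation)

# Assumption: daily returns follow a normal distribution with these parameters.
model = NormalDist(mu=avg, sigma=sd)

p_negative = model.cdf(0.0)  # question 3: chance that the return is negative

# Question 4: the two values containing the central 95% of the fitted distribution.
z = NormalDist().inv_cdf(0.975)
low, high = avg - z * sd, avg + z * sd

print(round(avg, 3), round(sd, 3), round(p_negative, 3), round(low, 2), round(high, 2))
```

The interval here describes where a single day's return would fall under the fitted normal model; it is not a confidence interval for the mean return.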

Solution

