Stanley B. Gershwin
http://web.mit.edu/manuf-sys
Massachusetts Institute of Technology
Spring, 2010
I flip a coin 100 times, and it shows heads every time. Question: What is the probability that it will show heads on the next flip?
Probability ≠ Statistics. Probability: mathematical theory that describes uncertainty. Statistics: set of techniques for extracting useful information from data.
The probability that the outcome of an experiment is A is prob(A) if, when the experiment is performed a large number of times, the fraction of times that the observed outcome is A approaches prob(A).
The probability that the outcome of an experiment is A is prob (A) if the experiment is performed in each parallel universe and the fraction of universes in which the observed outcome is A is prob (A).
The probability that the outcome of an experiment is A is prob(A) = P(A) if, before the experiment is performed, a risk-neutral observer would be willing to bet $1 on A against a payoff of no more than $(1 − P(A))/P(A).
The probability that the outcome of an experiment is A is prob(A) if that is the opinion (i.e., belief or state of mind) of an observer before the experiment is performed.
The probability that the outcome of an experiment is A is prob(A) if prob() satisfies a certain set of axioms.
Axioms of probability
Let U be the universe of samples (the sample space). Let E1, E2, ... be subsets of U. Let ∅ be the null set (the set that has no elements).
0 ≤ prob(Ei) ≤ 1
prob(U) = 1
prob(∅) = 0
If Ei ∩ Ej = ∅, then prob(Ei ∪ Ej) = prob(Ei) + prob(Ej)
Probability Basics

If ∪_i Ei = U and Ei ∩ Ej = ∅ for i ≠ j, then

Σ_i prob(Ei) = 1
Venn diagrams

[Figure: Venn diagram showing the universe U with overlapping events A and B; the shaded region is A ∪ B.]
Conditional probability:

prob(A|B) = prob(A ∩ B) / prob(B)

[Figure: Venn diagram of A and B in the universe U; prob(A|B) is the fraction of B's probability that lies in A ∩ B.]
Example: roll a fair die. A is the event of getting an odd number (1, 3, 5). B is the event of getting a number less than or equal to 3 (1, 2, 3).
Then prob(A) = prob(B) = 1/2 and prob(A ∩ B) = prob({1, 3}) = 1/3. Also, prob(A|B) = prob(A ∩ B)/prob(B) = 2/3.
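A quick Monte Carlo check of this example (a sketch; the function name, trial count, and seed are arbitrary):

```python
import random

def estimate_conditional(trials=200_000, seed=1):
    """Estimate prob(A), prob(B), and prob(A|B) for one fair die roll."""
    random.seed(seed)
    n_a = n_b = n_ab = 0
    for _ in range(trials):
        roll = random.randint(1, 6)
        in_a = roll % 2 == 1          # A: odd number (1, 3, 5)
        in_b = roll <= 3              # B: at most 3 (1, 2, 3)
        n_a += in_a
        n_b += in_b
        n_ab += in_a and in_b
    # prob(A|B) is estimated as the fraction of B-outcomes that are also in A.
    return n_a / trials, n_b / trials, n_ab / n_b

p_a, p_b, p_a_given_b = estimate_conditional()
```

The estimates should be close to 1/2, 1/2, and 2/3.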
Note: prob (A|B ) being large does not mean that B causes A. It only means that if B occurs it is probable that A also occurs. This could be due to A and B having similar causes. Similarly prob (A|B ) being small does not mean that B prevents A.
Let B = C ∪ D and assume C ∩ D = ∅. We have
prob(A|C) = prob(A ∩ C)/prob(C) and prob(A|D) = prob(A ∩ D)/prob(D).
Also,
prob(C|B) = prob(C ∩ B)/prob(B) = prob(C)/prob(B) because C ∩ B = C.
Similarly, prob(D|B) = prob(D)/prob(B).
Or,
prob(A|B) prob(B) = prob(A|C) prob(C) + prob(A|D) prob(D),
so
prob(A|B) = prob(A|C) prob(C|B) + prob(A|D) prob(D|B).
More generally, if A and E1, ..., Ek are events, Ei ∩ Ej = ∅ for all i ≠ j, and ∪_i Ei = U (i.e., the set of Ej sets is mutually exclusive and collectively exhaustive), then

Σ_j prob(Ej) = 1
and

prob(A) = Σ_j prob(A and Ej) = Σ_j prob(A|Ej) prob(Ej).
Flip of One Coin
Let U = {H, T}. Let ω = H if we flip a coin and get heads; ω = T if we flip a coin and get tails. Let X(ω) be the number of times we get heads. Then X(ω) = 0 or 1.
prob(ω = T) = prob(X = 0) = 1/2,
prob(ω = H) = prob(X = 1) = 1/2.
Flips of Three Coins
Let X be the number of heads. The order does not matter! Then X = 0, 1, 2, or 3.
Dynamic Systems
The state can be scalar or vector. The state can be discrete or continuous or mixed. The state can be deterministic or random.
X is a stochastic process if X(t) is a random variable for every t. The value of X is sometimes written explicitly as X(t, ω) or X_ω(t).
Flip a biased coin. If X^B is Bernoulli, then there is a p such that
prob(X^B = 0) = p,
prob(X^B = 1) = 1 − p.
The sum of n independent Bernoulli random variables X_i^B with the same parameter p is a binomial random variable X^b:

X^b = Σ_{i=1}^n X_i^B,

prob(X^b = x) = [n! / (x! (n − x)!)] (1 − p)^x p^(n−x).
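The binomial formula can be checked against simulated sums of Bernoulli variables; this sketch follows the convention above, prob(X_i^B = 1) = 1 − p (the parameter values, trial count, and seed are arbitrary):

```python
import math
import random

def binomial_pmf(n, x, p):
    """prob(X^b = x) with the convention prob(X_i^B = 1) = 1 - p."""
    return math.comb(n, x) * (1 - p) ** x * p ** (n - x)

def simulate_binomial(n, p, trials=100_000, seed=2):
    """Empirical distribution of the sum of n Bernoulli variables."""
    random.seed(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        s = sum(1 if random.random() < 1 - p else 0 for _ in range(n))
        counts[s] += 1
    return [c / trials for c in counts]

n, p = 5, 0.3
empirical = simulate_binomial(n, p)
exact = [binomial_pmf(n, x, p) for x in range(n + 1)]
```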
The number of independent Bernoulli random variables X_i^B tested until the first 0 appears is a geometric random variable X^g:

X^g = min{i : X_i^B = 0}.

To calculate prob(X^g = t):
For t > 1,
prob(X^g > t) = prob(X^g > t | X^g > t − 1) prob(X^g > t − 1) = (1 − p) prob(X^g > t − 1),
so prob(X^g > t) = (1 − p)^t and prob(X^g = t) = (1 − p)^(t−1) p.
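These geometric formulas can be sanity-checked by simulation (a sketch; p, the trial count, and the seed are arbitrary):

```python
import random

def simulate_geometric(p, trials=100_000, seed=3):
    """Sample X^g = number of Bernoulli tests until the first 0 (which occurs w.p. p)."""
    random.seed(seed)
    samples = []
    for _ in range(trials):
        t = 1
        while random.random() >= p:   # a '1' occurs with probability 1 - p
            t += 1
        samples.append(t)
    return samples

p = 0.25
samples = simulate_geometric(p)
mean = sum(samples) / len(samples)                       # should be near 1/p = 4
frac_t3 = sum(1 for s in samples if s == 3) / len(samples)
exact_t3 = (1 - p) ** 2 * p                              # prob(X^g = 3) = (1-p)^(t-1) p
```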
[Figure: two-state transition graph; the transition from state 1 to state 0 has probability p, and the self-loop on state 1 has probability 1 − p.]

Consider a two-state system. The system can go from 1 to 0, but not from 0 to 1. Let p be the conditional probability that the system is in state 0 at time t + 1, given that it is in state 1 at time t. That is,
p = prob[α(t + 1) = 0 | α(t) = 1].

2.852 Manufacturing Systems Analysis. © 2010 Stanley B. Gershwin.
Let p(α, t) be the probability of the system being in state α at time t. Then, since
p(0, t + 1) = prob[α(t + 1) = 0 | α(t) = 1] prob[α(t) = 1] + prob[α(t + 1) = 0 | α(t) = 0] prob[α(t) = 0],
we have
p(0, t + 1) = p p(1, t) + p(0, t),
p(1, t + 1) = (1 − p) p(1, t),
and the normalization equation p(1, t) + p(0, t) = 1.
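These transition equations are easy to iterate numerically; a minimal sketch (the value of p is arbitrary):

```python
def iterate(p, T, p1_0=1.0):
    """Iterate p(0,t+1) = p*p(1,t) + p(0,t) and p(1,t+1) = (1-p)*p(1,t)."""
    p1 = p1_0
    p0 = 1.0 - p1_0
    hist = [(p0, p1)]
    for _ in range(T):
        p0, p1 = p0 + p * p1, (1 - p) * p1
        hist.append((p0, p1))
    return hist

hist = iterate(p=0.1, T=50)
```

Normalization is preserved at every step, and p(1, t) = (1 − p)^t when p(1, 0) = 1.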
Since p(1, t + 1) = (1 − p) p(1, t), if p(1, 0) = 1 we have
p(1, t) = (1 − p)^t and p(0, t) = 1 − (1 − p)^t. (Why?)
[Figure: p(0, t) and p(1, t) plotted against t; p(1, t) decays geometrically from 1 toward 0 while p(0, t) grows from 0 toward 1.]
Recall that once the system makes the transition from 1 to 0 it can never go back. The probability that the transition takes place at time t is
prob[α(t) = 0 and α(t − 1) = 1] = (1 − p)^(t−1) p.
The time of the transition from 1 to 0 is said to be geometrically distributed with parameter p. The expected transition time is 1/p. (Prove it!)
Note: If the transition represents a machine failure, then 1/p is the Mean Time to Fail (MTTF). The Mean Time to Repair (MTTR) is similarly calculated.
A linear difference equation with constant coefficients is one of the form
x(t + 1) = A x(t),
where A is a square matrix of appropriate dimension. Solution: x(t) = A^t c, where c = x(0). However, this form of the solution is not always convenient.
A more useful form is
x(t) = b1 λ1^t + b2 λ2^t + ... + bk λk^t,
where k is the dimensionality of x, λ1, λ2, ..., λk are scalars and b1, b2, ..., bk are vectors. The bj satisfy
c = b1 + b2 + ... + bk.
λ1, λ2, ..., λk are the eigenvalues of A and b1, b2, ..., bk are its eigenvectors, but we don't always have to use that explicitly to determine them. This is very similar to the solution of linear differential equations with constant coefficients.
The typical solution technique is to guess a solution of the form x(t) = b λ^t and plug it into the difference equation. We find that λ must satisfy a kth order polynomial, which gives us the k λs. We also find that b must satisfy a set of linear equations which depends on λ. Examples and variations will follow.
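Assuming NumPy is available, this sketch carries out the eigenvalue technique for a made-up 2 × 2 matrix A: expand c = x(0) in the eigenvector basis and compare the eigenvalue form of x(t) with direct matrix powers:

```python
import numpy as np

# x(t+1) = A x(t); closed form x(t) = sum_j b_j * lam_j**t with sum_j b_j = c.
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
c = np.array([1.0, 0.0])           # initial condition x(0)

lam, V = np.linalg.eig(A)          # eigenvalues lam_j, eigenvectors (columns of V)
coeffs = np.linalg.solve(V, c)     # expand c in the eigenvector basis
b = V * coeffs                     # b_j = coeffs[j] * V[:, j], stored as columns

def x_closed_form(t):
    """x(t) = sum_j b_j * lam_j**t."""
    return (b * lam ** t).sum(axis=1)

def x_power(t):
    """x(t) = A**t @ c, computed directly."""
    return np.linalg.matrix_power(A, t) @ c
```

Both forms agree, and the b_j sum to c as required.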
Markov processes

A Markov process is a stochastic process in which the probability of finding X at some value at time t + δt depends only on the value of X at time t. Or, let x(s), s ≤ t, be the history of the values of X before time t and let A be a set of possible values of X(t + δt). Then
prob{X(t + δt) ∈ A | X(s) = x(s), s ≤ t} = prob{X(t + δt) ∈ A | X(t) = x(t)}.
In words: if we know what X was at time t, we don't gain any more useful information about X(t + δt) by also knowing what X was at any time earlier than t.
States can be numbered 0, 1, 2, 3, ... (or with multiple indices if that is more convenient). Time can be numbered 0, 1, 2, 3, ... (or 0, δ, 2δ, 3δ, ... if more convenient). The probability of a transition from j to i in one time unit is often written P_ij, where
P_ij = prob{X(t + 1) = i | X(t) = j}.
[Figure: transition graph on states 1, ..., 7 with arcs labeled P_14, P_24, P_64, P_45, etc.]
The self-transition probability is
P_ii = 1 − Σ_{m, m≠i} P_mi.
Define p_i(t) = prob{X(t) = i}.
{p_i(t) for all i} is the probability distribution at time t. Transition equations: p_i(t + 1) = Σ_j P_ij p_j(t).
Initial condition: p_i(0) specified. For example, if we observe that the system is in state j at time 0, then p_j(0) = 1 and p_i(0) = 0 for all i ≠ j.
Let the current time be 0. The probability distribution at time t > 0 describes our state of knowledge at time 0 about what state the system will be in at time t. Normalization equation: Σ_i p_i(t) = 1.
Steady state: p_i = lim_{t→∞} p_i(t), if it exists. Steady-state transition equations: p_i = Σ_j P_ij p_j.
The steady-state probability distribution is a very important concept, but different from the usual concept of steady state. The system does not stop changing or approach a limit. The probability distribution stops changing and approaches a limit.
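A sketch of computing the steady-state distribution by iterating the transition equations (the matrix below is a made-up example whose columns sum to 1, matching P_ij = prob{X(t+1) = i | X(t) = j}):

```python
import numpy as np

# Transition matrix with P[i, j] = prob{X(t+1) = i | X(t) = j}; columns sum to 1.
P = np.array([[0.50, 0.30, 0.00],
              [0.25, 0.50, 0.40],
              [0.25, 0.20, 0.60]])

p = np.array([1.0, 0.0, 0.0])      # p_i(0): start in state 0
for _ in range(500):               # iterate p(t+1) = P p(t) until it stops changing
    p = P @ p
```

The limit satisfies the steady-state transition equations p = P p and the normalization equation.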
The probability of the system being in a given state at time 2 may be very different from the probability of it being in that state at time 3.
The probability of the system being in that state at time 1000, however, is very close to the probability of it being in that state at time 1001, which is very close to the probability of it being in that state at time 2000.
Unreliable machine, discrete time: 1 = up; 0 = down.
[Figure: two-state transition graph; from 1 to 0 with probability p (failure), from 0 to 1 with probability r (repair); self-loops 1 − p and 1 − r.]
The probability distribution satisfies
p(0, t + 1) = p(0, t)(1 − r) + p(1, t) p,
p(1, t + 1) = p(0, t) r + p(1, t)(1 − p).
Guessing p(α, t) = a(α) X^t and substituting gives the eigenvalues X1 = 1 and X2 = 1 − r − p, with a(1)/a(0) = r/p for X1 and a(1) = −a(0) for X2. The general solution is
p(0, t) = a1(0) X1^t + a2(0) X2^t = a1(0) + a2(0)(1 − r − p)^t,
p(1, t) = a1(1) X1^t + a2(1) X2^t = (r/p) a1(0) − a2(0)(1 − r − p)^t.
Applying normalization gives
a1(0) = p/(r + p), a1(1) = r/(r + p).
Solution
After more simplification and some beautification,
p(0, t) = p(0, 0)(1 − p − r)^t + [p/(r + p)][1 − (1 − p − r)^t],
p(1, t) = p(1, 0)(1 − p − r)^t + [r/(r + p)][1 − (1 − p − r)^t].
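A quick check that this closed form agrees with iterating the transition equations (the r and p values are arbitrary):

```python
r, p = 0.1, 0.02

def closed_form(t, p0_init=1.0):
    """p(0,t) and p(1,t) from the closed-form solution."""
    lam = (1 - p - r) ** t
    p0 = p0_init * lam + (p / (r + p)) * (1 - lam)
    p1 = (1 - p0_init) * lam + (r / (r + p)) * (1 - lam)
    return p0, p1

def iterate(t, p0_init=1.0):
    """p(0,t) and p(1,t) by iterating the one-step transition equations."""
    p0, p1 = p0_init, 1 - p0_init
    for _ in range(t):
        p0, p1 = p0 * (1 - r) + p1 * p, p0 * r + p1 * (1 - p)
    return p0, p1
```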
[Figure: transient behavior; the probabilities p(0, t) and p(1, t) plotted against t, converging to their steady-state values.]
Steady-state solution
If the machine makes one part per time unit when it is operational, the average production rate is
p(1) = r/(r + p) = 1/(1 + p/r).
Classification of states
A chain is irreducible if and only if each state can be reached from each other state. Let fij be the probability that, if the system is in state j, it will at some later time be in state i. State i is transient if fii < 1. If a steady-state distribution exists and i is a transient state, its steady-state probability is 0.
Classification of states

The states can be uniquely divided into sets T, C1, ..., Cn such that T is the set of all transient states and fij = 1 for i and j in the same set Cm and fij = 0 for i in some set Cm and j not in that set. If there is only one set C, the chain is irreducible. The sets Cm are called final classes or absorbing classes and T is the transient class. Transient states cannot be reached from any other states except possibly other transient states. If state i is in T, there is no state j in any set Cm such that there is a sequence of possible transitions (transitions with nonzero probability) from j to i.
[Figure: a transition graph partitioned into a transient class T and final classes C1, ..., C7.]
States can be numbered 0, 1, 2, 3, ... (or with multiple indices if that is more convenient). Time is a real number, defined on (−∞, ∞) or a smaller interval. The probability of a transition from j to i during [t, t + δt] is approximately λ_ij δt, where δt is small, and
λ_ij δt = prob{X(t + δt) = i | X(t) = j} + o(δt) for j ≠ i.
[Figure: continuous-time transition graph on states 1, ..., 7 with arcs labeled λ_24, λ_64, λ_45, etc. Note: no self-loops!]
Transition equations:
d p_i(t)/dt = −p_i(t) Σ_{j≠i} λ_ji + Σ_{j≠i} λ_ij p_j(t).
Normalization equation: Σ_i p_i(t) = 1.
Steady state: p_i = lim_{t→∞} p_i(t), if it exists. Steady-state transition equations: 0 = −p_i Σ_{j≠i} λ_ji + Σ_{j≠i} λ_ij p_j. Steady-state balance equations: p_i Σ_{m, m≠i} λ_mi = Σ_{j, j≠i} λ_ij p_j. Normalization equation: Σ_i p_i = 1.
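Assuming NumPy, a sketch of solving the steady-state balance and normalization equations for a made-up rate matrix λ_ij (rate from j to i, no self-loops):

```python
import numpy as np

# Rates lam[i, j] = rate of transitions from j to i; the diagonal is unused (no self-loops).
lam = np.array([[0.0, 2.0, 1.0],
                [3.0, 0.0, 4.0],
                [1.0, 1.0, 0.0]])

n = lam.shape[0]
# Generator matrix: A[i, j] = lam[i, j] for i != j, A[j, j] = -sum_m lam[m, j],
# so the steady-state transition equations read 0 = A p.
A = lam - np.diag(lam.sum(axis=0))

# Replace one (redundant) balance equation with the normalization equation.
M = np.vstack([A[:-1], np.ones(n)])
rhs = np.zeros(n)
rhs[-1] = 1.0
p = np.linalg.solve(M, rhs)
```

The solution satisfies all balance equations and sums to 1.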
Never draw self-loops in continuous-time Markov process graphs. Never write 1 − λ_14 − λ_24 − λ_64. Write
1 − (λ_14 + λ_24 + λ_64) δt, or −(λ_14 + λ_24 + λ_64).
Consider again a system that can go from 1 to 0 but not from 0 to 1, now in continuous time. Let p be the transition rate, so that
p δt = prob[α(t + δt) = 0 | α(t) = 1] + o(δt).
Then
p(0, t + δt) = prob[α(t + δt) = 0 | α(t) = 1] prob[α(t) = 1] + prob[α(t + δt) = 0 | α(t) = 0] prob[α(t) = 0],
so
p(0, t + δt) = p δt p(1, t) + p(0, t) + o(δt),
or
d p(0, t)/dt = p p(1, t).
Since p(0, t) + p(1, t) = 1,
d p(1, t)/dt = −p p(1, t).
If p(1, 0) = 1, then
p(1, t) = e^(−pt) and p(0, t) = 1 − e^(−pt).
Density function
The probability that the transition takes place in [t, t + δt] is
prob[α(t + δt) = 0 and α(t) = 1] = e^(−pt) p δt.
The exponential density function is p e^(−pt). The time of the transition from 1 to 0 is said to be exponentially distributed with rate p. The expected transition time is 1/p. (Prove it!)
[Figure: the exponential distribution function F(t) = 1 − e^(−pt), rising from 0 toward 1.]
Density function

Memorylessness: prob(T > t + x | T > x) = prob(T > t).
prob(t ≤ T ≤ t + δt) ≈ p δt for small δt.
If T1, ..., Tn are exponentially distributed random variables with parameters λ1, ..., λn and T = min(T1, ..., Tn), then T is an exponentially distributed random variable with parameter λ = λ1 + ... + λn.
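The minimum-of-exponentials property can be checked by simulation (a sketch; the rates, trial count, and seed are arbitrary):

```python
import random

def min_of_exponentials(rates, trials=200_000, seed=4):
    """Sample mean of min(T1,...,Tn), with Ti exponential with rate rates[i]."""
    random.seed(seed)
    total = 0.0
    for _ in range(trials):
        total += min(random.expovariate(lam) for lam in rates)
    return total / trials

rates = [1.0, 2.0, 3.0]
mean_min = min_of_exponentials(rates)
expected = 1.0 / sum(rates)   # the minimum is exponential with rate 1 + 2 + 3 = 6
```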
Continuous time
[Figure: two-state continuous-time transition graph; repair rate r from state 0 to state 1, failure rate p from state 1 to state 0.]
The probability distribution satisfies
d p(0, t)/dt = −r p(0, t) + p p(1, t),
d p(1, t)/dt = r p(0, t) − p p(1, t).
Steady-state solution
If the machine makes μ parts per time unit on the average when it is operational, the overall average production rate is
μ p(1) = μ r/(r + p) = μ/(1 + p/r).
M/M/1 queue:
Exponentially distributed inter-arrival times; the mean is 1/λ, and λ is the arrival rate (customers/time). (Poisson arrival process.) Exponentially distributed service times; the mean is 1/μ, and μ is the service rate (customers/time). 1 server. Infinite waiting area.
Exponential arrivals:
If a part arrives at time s, the probability that the next part arrives during the interval [s + t, s + t + δt] is λ e^(−λt) δt + o(δt). λ is the arrival rate.
Exponential service:
If an operation is completed at time s and the buffer is not empty, the probability that the next operation is completed during the interval [s + t, s + t + δt] is μ e^(−μt) δt + o(δt). μ is the service rate.
State Space
[Figure: birth-death chain on states 0, 1, ..., n − 1, n, n + 1, ...; arrivals move the state up at rate λ, services move it down at rate μ.]
If a steady-state distribution exists, it satisfies
0 = λ p(n − 1) + μ p(n + 1) − p(n)(λ + μ), n > 0,
0 = μ p(1) − λ p(0).
Why "if"?
Let ρ = λ/μ. Then p(n) = (1 − ρ) ρ^n, and the average number of customers in the system is
n̄ = Σ_n n p(n) = ρ/(1 − ρ).
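A numerical check of the M/M/1 steady-state distribution p(n) = (1 − ρ)ρ^n against the balance equations (λ and μ are arbitrary, and the distribution is truncated at a large n):

```python
lam, mu = 0.5, 1.0
rho = lam / mu
p = [(1 - rho) * rho ** n for n in range(200)]   # truncated distribution

# Residuals of the steady-state balance equations; all should be zero.
balance_0 = mu * p[1] - lam * p[0]
balance_n = [lam * p[n - 1] + mu * p[n + 1] - (lam + mu) * p[n]
             for n in range(1, 100)]

# Average number in system; should equal rho / (1 - rho).
mean_n = sum(n * pn for n, pn in enumerate(p))
```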
Performance Evaluation
[Figure: average delay W plotted against arrival rate λ for service rates μ = 1 and μ = 2; the delay grows without bound as λ approaches μ.]
There are multiple servers. There is finite space for queueing. The arrival process is not Poisson. The service process is not exponential.
1. Mathematically, continuous and discrete random variables are very different.
2. Quantitatively, however, some continuous models are very close to some discrete models.
3. Therefore, which kind of model to use for a given system is a matter of convenience.
Example: The production process for small metal parts (nuts, bolts, washers, etc.) might better be modeled as a continuous flow than as a large number of discrete parts.
[Figure: samples of a two-dimensional random variable at high density and at low density.]
The probability of a two-dimensional random variable being in a small square is the probability density times the area of the square. (Actually, it is more general than this.)
Probability distributions can be defined in one, two, three, ..., infinite dimensional spaces; in finite or infinite regions of the spaces. There are probability measures with the same dimensionality as the space; lower dimensionality than the space; or a mix of dimensions.
[Figure: a three-machine, two-buffer transfer line M1 → B1 → M2 → B2 → M3; x1 and x2 are the amounts of material in buffers B1 and B2.]
Dimensionality
Probability distribution of the amount of material in each of the two buffers: two-dimensional density in the interior of the (x1, x2) region, one-dimensional density on its edges, and zero-dimensional density (mass) at the corners.
Discrete approximation
[Figure: the (x1, x2) state space and a discrete grid approximating it.]
Production rate = μ r/(r + p). (Why?) Demand rate = d < μ r/(r + p).
Problem: producing more than has been demanded creates inventory and is wasteful. Producing less reduces revenue or customer goodwill. How can we anticipate and respond to random failures to mitigate these effects?
The control is the production rate u. If the machine is down (α = 0), u = 0. How do we choose u?
dx(t)/dt = u(t) − d
[Figure: cumulative production and demand versus t; cumulative demand is dt, and cumulative production tracks dt + Z.]
While the machine is up (α = 1):
if x < Z, u = μ;
if x = Z, u = d;
if x > Z, u = 0;
if the machine is down (α = 0), u = 0.
How do we choose Z?
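A simulation sketch of this hedging policy (all parameter values are made up; when the demand is feasible, the long-run average production rate should equal the demand rate d):

```python
import random

def simulate_hedging(mu=1.0, d=0.7, p=0.01, r=0.1, Z=10.0, dt=0.1,
                     steps=1_000_000, seed=5):
    """Discrete-time approximation: failure w.p. p*dt, repair w.p. r*dt per step."""
    random.seed(seed)
    alpha, x = 1, Z
    total_production = 0.0
    for _ in range(steps):
        if alpha == 1:
            u = mu if x < Z else (d if x == Z else 0.0)
        else:
            u = 0.0
        # x > Z is transient, so clamp at the hedging point (slightly overstates
        # production in the single step that reaches Z; negligible for small dt).
        x = min(x + (u - d) * dt, Z)
        total_production += u * dt
        if alpha == 1 and random.random() < p * dt:
            alpha = 0
        elif alpha == 0 and random.random() < r * dt:
            alpha = 1
    return total_production / (steps * dt)   # long-run average production rate

rate = simulate_hedging()
```

With these values μr/(r + p) ≈ 0.909 > d = 0.7, so the policy should keep up with demand on average.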
prob(Z, α, t) is a probability mass: prob(Z, α, t) = prob(x = Z and the machine state is α at time t). Note that x > Z is transient.
[Figure: the state space. When α = 1 and x < Z, dx/dt = μ − d; when α = 0, dx/dt = −d; there is a probability mass at the boundary x = Z.]
x < Z:
[Figure: probability flow into (x, α = 1): remain up (no failure), or be repaired from α = 0; the boundary is at x = Z.]
x < Z:
[Figure: probability flow into (x, α = 0): fail from α = 1, or remain down (no repair); the boundary is at x = Z.]
x < Z: accounting for the probability flows between t and t + δt,
f(x, 1, t + δt) δx = [f(x + d δt, 0, t) δx] r δt + [f(x − (μ − d) δt, 1, t) δx](1 − p δt) + o(δt) o(δx).
In steady state this becomes
f(x, 1) δx = [f(x + d δt, 0) δx] r δt + [f(x − (μ − d) δt, 1) δx](1 − p δt) + o(δt) o(δx).
Expanding f in a Taylor series, dividing by δx δt, and letting δt → 0 and δx → 0 (the higher-order terms, such as (μ − d) p δt² df(x, 1)/dx and o(δt) o(δx)/δx, vanish) gives, in steady state,
(μ − d) df(x, 1)/dx = r f(x, 0) − p f(x, 1).
Similarly, accounting for failures from α = 1 and the absence of repair,
f(x, 0, t + δt) δx = [f(x + d δt, 0, t) δx](1 − r δt) + [f(x − (μ − d) δt, 1, t) δx] p δt + o(δt) o(δx),
which gives, in steady state,
d df(x, 0)/dx = r f(x, 0) − p f(x, 1).
Mathematical model
[Figure: when α = 1 and x < Z, dx/dt = μ − d; when α = 0, dx/dt = −d; boundary at x = Z.]
P(Z, 0) = 0. Why?
[Figure: transitions at the boundary x = Z: from (Z, 1), failure with probability p δt; with α = 0, no repair with probability 1 − r δt carries the state away from the boundary.]
Normalization:
P(Z, 1) + ∫ [f(x, 0) + f(x, 1)] dx = 1.
The interior solution has the form f(x, α) = c_α e^(bx) for x < Z, with P(Z, 0) = 0, where
b = r/d − p/(μ − d).
[Figure: the steady-state density plotted for x between −40 and 20, decaying exponentially as x decreases from the hedging point.]
Observations
1. Meanings of b:
Mathematical: In order for the solution on the previous slide to make sense, b > 0. Otherwise, the normalization integral cannot be evaluated.
Intuitive: The average duration of an up period is 1/p. The rate that x increases (while x < Z) while the machine is up is μ − d. Therefore, the average increase of x during an up period while x < Z is (μ − d)/p. The average duration of a down period is 1/r. The rate that x decreases while the machine is down is d. Therefore, the average decrease of x during a down period is d/r. In order to guarantee that x does not move toward −∞, we must have (μ − d)/p > d/r.
Observations

If (μ − d)/p > d/r, then p/(μ − d) < r/d, or
b = r/d − p/(μ − d) > 0.
That is, we must have b > 0 so that there is enough capacity for x to increase on the average when x < Z.
Observations

Also, note that
b > 0 ⟺ r/d > p/(μ − d) ⟺ r(μ − d) > pd ⟺ rμ − rd > pd ⟺ rμ > rd + pd ⟺ μ r/(r + p) > d,
which we assumed.
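A small numerical check that b > 0 coincides with the feasibility condition μr/(r + p) > d (the test cases are made up):

```python
def b_exponent(mu, d, r, p):
    """b = r/d - p/(mu - d) from the solution above."""
    return r / d - p / (mu - d)

def feasible(mu, d, r, p):
    """Demand is feasible when mu * r / (r + p) > d."""
    return mu * r / (r + p) > d

# b > 0 exactly when the demand rate is within the machine's long-run capacity.
cases = [(1.0, 0.7, 0.1, 0.01),    # feasible
         (1.0, 0.95, 0.1, 0.01),   # infeasible
         (2.0, 1.5, 0.2, 0.1)]     # infeasible
checks = [(b_exponent(*c) > 0) == feasible(*c) for c in cases]
```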
The boundary mass satisfies P(Z, 1) = C(μ − d)/p, and P(Z, 0) = 0. That is, the probability distribution really depends on Z − x. If Z is changed, the distribution shifts without changing its shape.
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.