Introduction
Part I. Macros of Excel
  1. Setting of Excel 2003 and earlier versions
  2. Setting of Excel 2007
  3. Loading macros to Visual Basic Editor
Part II. Deciding probability weights
  1. Estimation of the distribution from a sample
  2. Simple considerations for distribution choice
  3. More detailed information from the histogram
  4. Probability density functions of distributions
  5. Selecting a distribution without a sample
Part III. Simple static simulation
  1. Running the simulation
  2. Appreciating the results
Part IV. More examples
  1. Portfolio analysis
  2. Project planning
  3. Travel time estimation
  4. Reliability of systems
Index
Introduction
Many decisions involve unpredictable figures, such as a share price six months from now.
Other examples are sales forecasts, the time needed to perform a completely new task,
future selling prices, inflation and exchange rates.
A general strategy for finding the most advisable decision is to assume several possible
realizations of the figures (called scenarios), calculate a numerical outcome for each
scenario and weigh the outcomes by the probabilities of the respective realizations.
Nowadays this procedure is computerized, so it is possible to check hundreds or
thousands of scenarios and thereby increase the credibility of the analysis. The
procedure is called static or Monte-Carlo simulation (or just simulation in the rest of
this tutorial). Excel can carry out static simulations with the aid of suitable macros.
This tutorial expects some familiarity with Excel but none with statistics or probability.
It illustrates static simulations and introduces the minimal theory underlying the
probabilities of realizations. The tools for these are the macros available at
http://my.jce.ac.il/bani/StaticSimulation/Sources.
The macros are free in the sense of the GNU Lesser General Public License (LGPL), which
means approximately that they can be freely used for any purpose, including commercial
ones; please read the LGPL for the exact meaning. The macros do essentially the same job
as the commercial programs ExpertFit and Crystal Ball, which sell for hundreds of
dollars.
Part II first explains how to attach probability weights, aided by the
ChooseDistribution macro; Part III then explains how to use the Simulate macro, which
performs the simulation run. If you have not done so yet, please install the macros and
set up Excel (as explained in Part I) so that you can rerun all the examples given in
this tutorial. Part IV shows an assortment of simple example problems.
An introductory example used in Parts II and III is the problem of an American call
option. The decision is whether it is worthwhile to pay a given sum now (called the
premium) for the right to buy a share on an agreed future date (called the expiration
date) at a fixed price agreed today (called the strike price). Obviously, this depends on
the price of the share at the expiration date. The outcome is supposed to aid the
decision maker. It is based on the difference between the gross profit and the expense,
where the expense is the sum of the premium and the strike price and the gross profit is
the share's market value at the expiration date. It is worthwhile to pay the premium now
only if profit is more probable than loss.
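The outcome just described can be sketched in a few lines. The sketch below uses Python only for illustration (the tutorial itself works with Excel formulas); the function name call_option_outcome and the figures in the usage lines are hypothetical and do not come from the tutorial.

```python
def call_option_outcome(premium, strike_price, market_value):
    """Outcome of one scenario: gross profit minus expense."""
    expense = premium + strike_price   # what is paid for the option and the share
    gross_profit = market_value        # share's market value at the expiration date
    return gross_profit - expense

# Hypothetical figures: premium 5, strike price 100, three possible market values
outcomes = [call_option_outcome(5, 100, value) for value in (98, 105, 112)]
print(outcomes)  # [-7, 0, 7]
```

A positive outcome corresponds to a profit and a negative one to a loss, so a simulation would count how often each occurs over many scenarios.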
Source file
BoundsFrm.frm
chooseFrm.frm
FitFrm.frm
mainForm.frm
htpFrm.frm
progressForm.frm
ChooseDistribution.bas
FitModule.bas
Simulate.bas
Utility.bas
BoundsFrm.frx
chooseFrm.frx
FitFrm.frx
mainForm.frx
htpFrm.frx
progressForm.frx
Empty.xls
Samples.xls
GNULesserGeneralPblicLicense.txt
GNUlicense.txt
DrawDistributionSetup.exe
MonteCralo.zip
File types
Source files
In Excel 2007 the macro source code can be examined only if the Developer tab is placed
on the ribbon. This is also not a default setting and has to be requested in the Popular
part of Excel Options. The setting is shown in Fig. 7. After this operation the
Developer tab can be opened, where clicking the Visual Basic button pops up the Visual
Basic Editor (see Fig. 8; about the Editor see section 3).
Check whether the Data tab of the ribbon contains the Data Analysis button in the
Analysis group, as in Fig. 9. If not, you have to request it by entering Excel Options
(Fig. 6), selecting Add-Ins and clicking the Go button at the bottom. A list of add-ins
pops up, in which Analysis ToolPak is to be selected (Fig. 3).
This advice is a good way to avoid malicious code (virus, Trojan horse, spyware, etc.) in a macro. Of
course, this is possible only when the source is available.
Distribution   Excel Function
Beta           =BETAINV(RAND(), alpha, beta, lower bound, upper bound)
Exponential    =-LN(RAND())*mean
Gamma          =GAMMAINV(RAND(), shape parameter, scale parameter)
Log-normal     =LOGINV(RAND(), mean of ln(X), standard deviation of ln(X))
Normal         =NORMINV(RAND(), mean, standard deviation)
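For readers who want to check these formulas outside Excel, the following is a rough Python counterpart, offered only as a sketch. The function names are hypothetical. Only the exponential case inverts the cumulative distribution explicitly, as the Excel formula does; the other functions rely on the samplers of Python's standard random module, which draw from the same distributions but are not assumed to use the same algorithm as the Excel functions.

```python
import math
import random

def exponential_sample(mean, u=None):
    """Counterpart of =-LN(RAND())*mean (inverse-transform method)."""
    if u is None:
        u = 1.0 - random.random()   # lies in (0, 1], so the logarithm is defined
    return -math.log(u) * mean

def beta_sample(alpha, beta, lower, upper):
    """Counterpart of =BETAINV(RAND(), alpha, beta, lower bound, upper bound)."""
    return lower + (upper - lower) * random.betavariate(alpha, beta)

def gamma_sample(shape, scale):
    """Counterpart of =GAMMAINV(RAND(), shape parameter, scale parameter)."""
    return random.gammavariate(shape, scale)

def log_normal_sample(mean_ln, sd_ln):
    """Counterpart of =LOGINV(RAND(), mean of ln(X), standard deviation of ln(X))."""
    return random.lognormvariate(mean_ln, sd_ln)

def normal_sample(mean, sd):
    """Counterpart of =NORMINV(RAND(), mean, standard deviation)."""
    return random.normalvariate(mean, sd)
```

Each call returns one realization, just as each recalculation of the corresponding Excel cell does.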
Recall from the Introduction that for the American call option problem you want to make
an educated guess about the share price at the expiration date. You may estimate the
price based on historical data (if available); in statistical parlance this is called
estimation based on a sample. The sample is the historical data and its individual items
are referred to as observations. Alternatively, you can guess the share price without a
previous sample. These are the two options presented in Fig. 11. We shall discuss the
first possibility in section 1 and the second in section 5. The intermediate sections
introduce important techniques and concepts.
1. Estimation of the distribution from a sample
Suppose that the expiration date is six months from now and it is possible to find
historical data about the prices of 20 shares that behave similarly to the one under
consideration. So we can find the price increase or decrease (in percent) from their
prices six months ago to their prices today. We assume, for the sake of illustration,
that a similar change will occur to our share in the future. The increases are displayed
in Table 3, where a negative number means that the price went down.
8.03
1.11
-2.58
7.66
1.59
7.14
9.17
14.82
1.61
8.95
7.30
10.58
15.66
7.11
0.59
5.62
9.00
10.86
17.98
2.62
distribution. The preferred distribution in the case under consideration is the
exponential distribution shifted by the offset in row 6. Thus, a realization of the
share price change six months from now is given by
=-4.165 -LN(RAND())*15.436.
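The effect of the offset can be checked numerically. The sketch below repeats the formula above in Python (used here only for illustration); the constant names, the seed and the number of draws are arbitrary choices, not part of the tutorial.

```python
import math
import random

OFFSET = -4.165   # shift quoted in the formula above
MEAN = 15.436     # mean of the exponential part quoted in the formula above

def price_change(rng):
    """Counterpart of =-4.165 -LN(RAND())*15.436."""
    u = 1.0 - rng.random()              # lies in (0, 1], so the logarithm is defined
    return OFFSET - math.log(u) * MEAN

rng = random.Random(7)
draws = [price_change(rng) for _ in range(50000)]
print(min(draws) >= OFFSET)             # the offset acts as the lower bound
print(sum(draws) / len(draws))          # should be near OFFSET + MEAN = 11.271
```

No realization falls below the offset, and the average of many realizations approaches the offset plus the mean of the exponential part.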
The log-normal distribution is, however, valid not for the percent change but for the ratio between the
present price and the price six months ago. This was raised for methodological reasons to open the
discussion for theoretical considerations.
10
The financial mathematicians' choice of the log-normal distribution is a result of much more sophisticated
considerations that are beyond the scope of this tutorial.
Are deviations of the realizations above the mode and below the mode equiprobable? If
so, the distribution is termed symmetric. Deviations from a nominal weight or size due
to slight technical variations in production processes (footnote 11) are often assumed
to be symmetric. The nominal values serve as the modes.
The absence of a lower bound means that any negative value is possible; the absence of
an upper bound means that values may be large without any limitation, although large
deviations above the mode are extremely rare.
The answers to the three questions above for the five distributions are shown in Table 4.
The alpha and beta in the table are the parameters of the beta distribution in Table 2.
Distribution   Is Lower Bound?   Is Upper Bound?   Symmetric?
Beta           Yes               Yes               Only if alpha=beta
Exponential    Yes               No                No
Gamma          Yes               No                No
Log-normal     Yes               No                No
Normal         No                No                Yes
11
This holds only if the process is under control. Indeed, a large part of statistical quality control is based on
analyses of possible asymmetries.
12
See the comparison in subsection 4 below.
13
In some installations Data Analysis does not appear under the Tools menu. See Part I for how to make it
appear.
button on the ribbon of the Data tab (in Excel 2007). This opens a list of the
available statistical methods. Histogram should be selected.
3. The selection opens the window shown in Fig. 15, where the location of the data
(A1:A20, supposing the situation from Fig. 12) is entered. The output range should
be an empty cell with empty cells below it and to its right, so that the output
can be written there. The output is shown in Fig. 17 in the range E1:F8. Column E
has the upper boundaries of the bins and column F the number of cases falling in
each bin. Thus, there are five numbers in the bin whose boundaries are -3 and 2.
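The counting done by the Histogram tool can be reproduced with a short Python sketch (used here only for illustration) applied to the 20 price changes of Table 3. Only the boundaries -3 and 2 are quoted in the text; the remaining boundaries, in steps of 5, are an assumption, as is the rule that a value equal to a boundary is counted in the bin ending at that boundary.

```python
from bisect import bisect_left

# Price changes (percent) of the 20 shares from Table 3
sample = [8.03, 1.11, -2.58, 7.66, 1.59, 7.14, 9.17, 14.82, 1.61, 8.95,
          7.30, 10.58, 15.66, 7.11, 0.59, 5.62, 9.00, 10.86, 17.98, 2.62]

# Upper boundaries of the bins; all values except -3 and 2 are assumed
upper_bounds = [-3, 2, 7, 12, 17, 22]

def histogram(data, bounds):
    """Count the values falling in each bin (previous boundary, boundary]."""
    counts = [0] * (len(bounds) + 1)          # the last slot collects larger values
    for x in data:
        counts[bisect_left(bounds, x)] += 1   # index of the first boundary >= x
    return counts

counts = histogram(sample, upper_bounds)
for bound, count in zip(upper_bounds + ["More"], counts):
    print(bound, count)
```

With these boundaries the bin between -3 and 2 receives five observations, in agreement with the text.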
14
If the number of bins tends to infinity and at the same time the range of each bin tends to 0, then the
histogram tends to the pdf.
Figs 19 and 21 differ only in the standard deviation. Both are centered at 0 (the mode
is 0), but the corresponding range in Fig. 21 is twice that of Fig. 19. Hence, the
standard deviation stretches the horizontal axis. Such a stretching parameter, which
does not influence the shape or the location, is called a scale parameter. A parameter
that changes the shape is called a shape parameter. All normal distributions have the
same overall shape. Another distribution whose shape is unchanged is the exponential
distribution (offset 0) shown in Fig. 22. Its mode is always 0 and its mean is its scale
parameter.
The gamma distribution also has a scale parameter. Its shape depends on the shape
parameter (see Figs 23 and 24). The larger the shape parameter, the farther the mode is
from the lower bound (0) and the more symmetric the pdf is.
The pdf of the log-normal distribution is shown in Figs 25 and 26. The shape is a
complicated function of the parameters. Log-normal distributions are similar to gamma
distributions.
mode tends to the upper bound as demonstrated by Figs 29 and 30. The tendency is
stronger when the imbalance between the two parameters is larger.
Figure 27. Beta pdf with alpha parameter=3 and beta parameter=3
Figure 28. Beta pdf with alpha parameter=5 and beta parameter=5
Figure 29. Beta pdf with alpha parameter=2 and beta parameter=5
Figure 30. Beta pdf with alpha parameter=5 and beta parameter=2
Finally, note that the beta pdf may be very similar to the normal pdf (compare Figs 21
and 28). The normal distribution may be closely approximated by a beta distribution
where alpha=beta=4.3 and the bounds are mean ± 3*(standard deviation); the − sign is for
the lower and the + sign for the upper bound. The beta approximation is preferable if
the normal distribution may lead to negative values with a probability that is not
negligible.
The beta pdf may also be similar to the gamma pdf (compare Figs 23 and 29). Moreover,
the beta distribution can also be applied to the case where any value between the
bounds is equally probable:
=BETAINV(RAND(),1,1,lower bound, upper bound)
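The quality of the beta approximation to the normal distribution can be checked by comparing standard deviations. The sketch below (Python, for illustration only) uses the textbook variance of a beta distribution; the function names and the mean and standard deviation in the usage lines are hypothetical.

```python
import math
import random

def scaled_beta_sd(alpha, beta, lower, upper):
    """Standard deviation of a beta distribution stretched to [lower, upper]."""
    variance_01 = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return (upper - lower) * math.sqrt(variance_01)

mean, sd = 10.0, 2.0                          # hypothetical normal parameters
lower, upper = mean - 3 * sd, mean + 3 * sd   # bounds suggested in the text

approx_sd = scaled_beta_sd(4.3, 4.3, lower, upper)
print(round(approx_sd / sd, 3))               # 0.968

def uniform_between(lower, upper):
    """Counterpart of =BETAINV(RAND(),1,1,lower bound, upper bound)."""
    return lower + (upper - lower) * random.betavariate(1, 1)
```

The approximating beta distribution has about 97% of the standard deviation of the normal distribution it replaces, while its realizations never leave the bounds.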
15
This is an approximation to the pdf or, more exactly, to the pdf multiplied by a constant. The constant has
no influence on the shape of the pdf, so the approximation is satisfactory for the purpose.
The reason is that the choice is random. In truth it only looks random, which is called pseudo-random.
This issue is, however, beyond the scope of this tutorial.
17
Such functions are widely used in modern probability theory and are called indicator functions.
18
Macro #9 of Table 1.
19
You cannot see macro Simulate if it has not been installed; see how to install it in Part I.
scenarios is performed and only the average and standard deviation of the outcomes are
computed. Besides this information, the following can be obtained if the corresponding
options are checked:
- the outcomes of all the scenarios
- a histogram of the outcomes
- a distribution fitted to the outcomes
Average   Standard    t           Lower 95%          Upper 95%
          deviation   statistic   confidence limit   confidence limit
0.56      0.53        2.31        0.15               0.96
0.36      0.49        2.06        0.16               0.56
0.47      0.50        1.98        0.37               0.57
0.55      0.50        1.96        0.52               0.59
0.57      0.50        1.96        0.55               0.59
0.55      0.50        1.96        0.54               0.57
0.55      0.50        1.96        0.55               0.55
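The confidence limits are computed as
average ± t*(standard deviation)/SQRT(number of scenarios)   (1)
where t is the t statistic from the table; the − sign is used for the lower and the +
sign for the upper confidence limit. For example, the lower limit for 25 scenarios is
computed as:
=0.36 - 2.06*0.49/SQRT(25)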
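The example above can be verified with a short Python sketch (used here only for illustration; the function name confidence_limits is hypothetical). It takes the average, the standard deviation and the t statistic quoted for 25 scenarios.

```python
import math

def confidence_limits(average, sd, t, n):
    """Lower and upper confidence limits of the average of n scenarios."""
    half_width = t * sd / math.sqrt(n)
    return average - half_width, average + half_width

lower, upper = confidence_limits(0.36, 0.49, 2.06, 25)
print(round(lower, 2), round(upper, 2))   # 0.16 0.56
```

Because the half width shrinks with the square root of the number of scenarios, quadrupling the number of scenarios roughly halves the distance between the limits.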
21
The formula is a result of approximating the value of the t statistic by 2 in (1), equating the difference
between the upper and lower confidence limits as given by (1) to and solving the equation for the
number of scenarios, N.
Expected return %
6
1
8
Assets
Real Property
unit portfolio
=C3*B3/200
State Bonds
=NORMINV(RAND(),1,0.1)
=C4*B4/200
Shares
=NORMINV(RAND(),8,3)
=C5*B5/200
Sum
=SUM(D3:D5)
The problem dealt with in this Section is called stochastic PERT in the project management literature.
Activity                        Immediate predecessors               Minimum time   Maximum time
Basic specification             none                                 2              4
Market research                 none                                 1              3
Functional design               Basic specification                  2              3
Documentation                   Basic specification                  4              5
Detailed design and prototype   Market research, Functional design   5              8
Figure 38. The model of project duration and its simulation setting
Fig. 38 also shows the simulation settings. The earliest time of milestone #1 is 0 and
the others are based on the formulas presented in Table 9. For example, the earliest
time of milestone #4 (end of project) is when both Documentation (its starting time is
in cell B3) and Detailed design and prototype (its starting time is in cell B4) are
done. Note that in this case the completion of the other activities is implicit in cells
B3 and B4. As only bounds are available in Table 8, any value between the bounds is
taken with equal probability, which leads to functions of the form
=BETAINV(RAND(),1,1,lower bound, upper bound)
(see end of Section 4 of Part II).
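The same model can be sketched outside Excel. The Python sketch below (for illustration only) takes the activities, immediate predecessors and time bounds from Table 8 and draws every activity time with equal probability between its bounds. It works with activities directly instead of the milestones of the Excel sheet, which is a simplification made here; the seed and the number of scenarios are arbitrary.

```python
import random

# Activity: (immediate predecessors, minimum time, maximum time), from Table 8
ACTIVITIES = {
    "Basic specification": ([], 2, 4),
    "Market research": ([], 1, 3),
    "Functional design": (["Basic specification"], 2, 3),
    "Documentation": (["Basic specification"], 4, 5),
    "Detailed design and prototype": (["Market research", "Functional design"], 5, 8),
}

def project_duration(rng):
    """One scenario: the finish time of the activity that ends last."""
    finish = {}
    # The dictionary lists every activity after its predecessors
    for name, (predecessors, t_min, t_max) in ACTIVITIES.items():
        start = max((finish[p] for p in predecessors), default=0.0)
        finish[name] = start + rng.uniform(t_min, t_max)
    return max(finish.values())

rng = random.Random(2024)
durations = [project_duration(rng) for _ in range(1000)]
print(sum(durations) / len(durations))   # average project duration
```

With the bounds of Table 8 no scenario can be shorter than 9 or longer than 15 time units, the extremes of the path through Basic specification, Functional design and Detailed design and prototype.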
Milestone #
Earliest time
2
3
4
minimum time
2
3
2
3
maximum time
4
9
4
7
Define the immediate predecessor nodes as the set of nodes from which there is a section
to a given node. A has no predecessors, nodes B and C each has a single predecessor
while node C has two predecessor nodes: A and B.
The Excel sheet (Fig. 40) calculates the shortest time from node A to a given node. The
shortest time for node A is clearly 0. For any other node, the shortest time is the
minimum, over all its immediate predecessor nodes, of the sum of the shortest time to
the predecessor and the traverse time from the predecessor to the node under
consideration. The actual formula for this problem is given in Table 11.
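The rule can be written as a short routine. The Python sketch below is for illustration only: the three-node network and its traverse times are hypothetical, since the sections of the tutorial's own network are not reproduced here.

```python
def shortest_times(predecessors, traverse_time, order):
    """Shortest time from the start node to every node.

    'order' must list each node after all of its immediate predecessors.
    """
    shortest = {}
    for node in order:
        preds = predecessors[node]
        if not preds:
            shortest[node] = 0.0     # the start node
        else:
            shortest[node] = min(shortest[p] + traverse_time[(p, node)]
                                 for p in preds)
    return shortest

# Hypothetical network: sections A-B, A-C and B-C with fixed traverse times
predecessors = {"A": [], "B": ["A"], "C": ["A", "B"]}
traverse_time = {("A", "B"): 3.0, ("A", "C"): 6.0, ("B", "C"): 2.5}

times = shortest_times(predecessors, traverse_time, ["A", "B", "C"])
print(times)  # {'A': 0.0, 'B': 3.0, 'C': 5.5}
```

In a simulation the traverse times would be drawn anew for every scenario and the routine applied to each draw.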
As a result, the true average is between 10.1 and 10.3 but any value approximately
between 6.5 and 13.5 is possible.
25
The problem dealt with in this Section is called stochastic shortest route in the network algorithms literature.
Figure 40. The model of travel time and its simulation setting
Town
Earliest time
Component
Camera
Battery A
Battery B
Life time
!" #$!!%&'%(
$) %&$
"% (*(!%)
Index
activity
American call
beta distribution
bin
bound: lower
bound: upper
call
ChooseDistribution macro
confidence limits
covariance matrix
Data analysis package in Excel
distribution
distribution: symmetric
estimation
exponential distribution
gamma distribution
histogram
immediate predecessor
Kolmogorov-Smirnov test
K-S test
log-normal distribution
lower bound
macro
macro: ChooseDistribution
macro: Simulate
milestone
mode
model
Monte-Carlo
Monte-Carlo simulation
normal distribution
numerical outcome
observations
one sided test
option
outcome
pdf
PERT
predecessor
probability density function
sample
scale parameter
scenario
security
shape parameter
shortest route
Simulate macro
stochastic
static simulation
statistical distribution
statistical distribution: beta
statistical distribution: exponential
statistical distribution: gamma
statistical distribution: log-normal
statistical distribution: normal
statistical distribution: symmetric
statistical distribution: t Student
statistical distribution: Weibull
statistical independence
statistical test: Kolmogorov-Smirnov
statistical test: K-S
statistical test: one sided
statistical test: two sided
symmetric distribution
t distribution
trust
two sided test
upper bound
Visual Basic Editor
Weibull distribution