
# An Introduction to Project Risk Management in Excel and R

Dr. Roberto Rossi (mailto:roberto.rossi@wur.nl), Logistics, Decision and Information Sciences, Wageningen University, the Netherlands, May 2011

## Introduction

In this tutorial we discuss some project evaluation and review techniques (PERT). This tutorial does not aim to be a comprehensive description of PERT; for a more thorough discussion the reader may refer to [1, page 60]. We discuss both Monte Carlo simulation and analytical techniques for carrying out a quantitative risk assessment in project planning. The tutorial makes extensive use of Excel (http://office.microsoft.com/nl-nl/products/) and R (http://www.r-project.org/): the former is used to demonstrate Monte Carlo simulation in practice, the latter to display the results of this simulation graphically. We expect this tutorial to be useful for getting a feel for the complexity that lurks behind uncertainty and risk in planning. Above all, the reader should become aware that even for very small and apparently simple projects, assessing the risk associated with random activity durations and delays is far from easy. Intuition is often misleading, and the results presented are meant to demonstrate this. After this tutorial, the reader should be able to carry out a risk assessment for small projects and to suggest effective actions for reducing the risk of delays in the completion of a project. This tutorial complements the lecture slides available at the following address (http://sites.google.com/site/robros/Home/stochasticprojectmanagement.pdf?attredirects=0). We expect a reader to be able to complete this tutorial in about 2 hours.

## A simple project

Consider a project comprising 5 activities. The durations of these 5 activities and the respective precedence constraints are reported in Table 1, where X → Y means that activity X must be completed before activity Y can start.

| Activity | Duration (hours) | Activity precedence constraints |
|---|---|---|
| A | 8 | A → B, C |
| B | 10 | B → E |
| C | 4 | C → D |
| D | 5 | D → E |
| E | 6 | n/a |

Table 1: A simple project

The corresponding project graph is presented in Fig. 1.

Figure 1: The project graph for the simple project in Table 1.

By using the critical path method (CPM) [1, page 56] it is possible to determine the critical path for this project, that is, the sequence of activities that determines the duration of the project. For the project in Table 1, it is immediately seen, even without applying the CPM, that the critical path is ABE: path ABE takes 8 + 10 + 6 = 24 hours, while path ACDE takes 8 + 4 + 5 + 6 = 23 hours. In real life, however, the durations of the activities in a project are not known in advance. Delays are the norm, especially in large and complex projects. Planners must take these potential risks into account, and a simple CPM analysis based on deterministic activity durations is often not sufficient to correctly estimate the duration of a project. In the following section we explore what happens when activities have uncertain durations. We will try to predict how likely it is that our project exceeds a certain deadline and which strategies we can adopt to analyze the risks associated with each activity.
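For larger graphs the critical path is no longer obvious by inspection. The forward pass of CPM can be sketched in a few lines of Python (our own minimal sketch, not part of the tutorial's Excel workflow; the function name `critical_path` and the successor-list encoding of Table 1 are illustrative choices, and ties between predecessors are broken by activity name):

```python
# Deterministic durations (hours) and successors of each activity, from Table 1.
durations = {"A": 8, "B": 10, "C": 4, "D": 5, "E": 6}
successors = {"A": ["B", "C"], "B": ["E"], "C": ["D"], "D": ["E"], "E": []}


def critical_path(durations, successors):
    """Forward pass of the critical path method (CPM).

    Returns (project duration, critical path). Assumes that `durations` lists the
    activities in a precedence-feasible (topological) order, as Table 1 does.
    """
    predecessors = {a: [] for a in durations}
    for a, succs in successors.items():
        for s in succs:
            predecessors[s].append(a)
    finish, via = {}, {}
    for a in durations:
        # An activity starts when its latest-finishing predecessor is complete.
        start, via[a] = max(((finish[p], p) for p in predecessors[a]), default=(0, None))
        finish[a] = start + durations[a]
    # Walk back from the activity that finishes last along the binding predecessors.
    path = [max(finish, key=finish.get)]
    while via[path[-1]] is not None:
        path.append(via[path[-1]])
    return finish[path[0]], path[::-1]


print(critical_path(durations, successors))  # (24, ['A', 'B', 'E'])
```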

## Activity duration uncertainty

When activity durations cannot be predicted with certainty, it is common practice to model them as random variables. For instance, we may represent the duration $d_A$ of activity A as a normally distributed random variable with mean $\mu_A = 8$ and standard deviation $\sigma_A = 2$. The cumulative distribution function (CDF) of this random variable is shown in Fig. 2. The CDF in Fig. 2 gives a complete overview of the likelihood that activity A exceeds a certain duration. For instance, A will last less than 11 hours with probability 0.93. This value can be obtained in Excel with the formula `=NORM.DIST(11,8,2,TRUE)`; for more information on the use of NORM.DIST() in Excel, please refer to the Excel help.
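Outside Excel, the same CDF evaluation can be sketched with Python's standard-library `statistics.NormalDist` (our choice of tool; the tutorial itself only uses Excel and R):

```python
from statistics import NormalDist

# Duration of activity A: normal with mean 8 and standard deviation 2 (hours).
d_A = NormalDist(mu=8, sigma=2)

# Probability that A lasts less than 11 hours; Excel equivalent: =NORM.DIST(11,8,2,TRUE)
p = d_A.cdf(11)
print(round(p, 4))  # 0.9332
```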

Figure 2: The CDF for the normally distributed duration of activity A.

The distribution of the duration of a given activity can be derived from past data (e.g. by fitting a distribution via the least squares method) or estimated via expert assessment (e.g. by eliciting the minimum and maximum possible durations of an activity and fitting a Triangular or a Beta distribution to these data). How to derive good distributions for activity durations is beyond the scope of this tutorial; in what follows we assume that these distributions are given. There are, however, several free or commercial packages (e.g. SPSS, http://en.wikipedia.org/wiki/SPSS) that can fit a distribution to data automatically. In the next section we discuss how to use Monte Carlo simulation in Excel to estimate the duration of a project involving activities with random durations.

## Project planning under activity duration uncertainty

Assume that the activities presented in Table 1 have uncertain durations. In particular, activity durations are normally distributed random variables with given mean ($\mu$) and standard deviation ($\sigma$), reported in Table 2.

| Activity | Duration $\mu$ (hours) | Duration $\sigma$ (hours) | Activity precedence constraints |
|---|---|---|---|
| A | 8 | 2 | A → B, C |
| B | 10 | 4 | B → E |
| C | 4 | 2 | C → D |
| D | 5 | 2 | D → E |
| E | 6 | 3 | n/a |

Table 2: A simple project under activity duration uncertainty

Our aims are to:

- (a) determine the empirical distribution of the completion time of a given activity;
- (b) use R to display the empirical distribution;
- (c) determine the probability that our project will exceed a given deadline;
- (d) determine an analytical approximation of this probability and understand the limits of this approximation;
- (e) determine the criticality index of a given activity (i.e. the probability of its being on the critical path);
- (f) perform a what-if analysis to identify actions that will improve project performance.

To accomplish these tasks, we now discuss how to apply Monte Carlo simulation in Excel to the project in Table 2.

## Monte Carlo simulation in Excel

The key idea of Monte Carlo simulation is to repeatedly draw random numbers from the assumed distributions and then to carry out a statistical analysis of the results obtained. Let us first draw a random activity duration in Excel. Once more, we refer to activity A discussed previously, whose duration is normally distributed with mean $\mu_A = 8$ and standard deviation $\sigma_A = 2$. We can draw a random duration by generating a random number $r$ uniformly distributed in [0,1] and inverting the CDF of the associated Normal distribution at $r$, i.e. by computing its $r$-quantile. Graphically, this process is shown in Fig. 3 for $r = 0.8$; in this case the $r$-quantile obtained by inverting the CDF is equal to 9.68. This number represents our observed random duration. In Excel we can perform this inversion with the formula `=NORM.INV(0.8,8,2)`; for more information on the function NORM.INV() in Excel, please refer to the Excel help.

Figure 3: The CDF for the normally distributed duration of activity A.
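The inversion step can be mirrored with `NormalDist.inv_cdf` from Python's standard library (a sketch for readers who want to check the Excel value; the tutorial itself does not use Python):

```python
from statistics import NormalDist

d_A = NormalDist(mu=8, sigma=2)

# Excel equivalent: =NORM.INV(0.8,8,2), i.e. the 0.8-quantile of the duration of A.
x = d_A.inv_cdf(0.8)
print(round(x, 2))  # 9.68

# Inverting the CDF and evaluating it again returns the original r = 0.8.
print(round(d_A.cdf(x), 6))  # 0.8
```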

Figure 4: Generating random numbers in Excel.

It is now easy to draw a random duration for each of the 5 activities in the project. To do so, we first generate 5 random numbers with the Excel function `=RAND()` (Fig. 4). Note that the first cell in the picture is positioned at coordinate G1.

Next, we input for each activity the respective mean and standard deviation for its duration (Fig. 5).

Figure 5: Problem parameters in Excel.

We can now generate a random duration for each activity by inverting the CDF as previously discussed (Fig. 6).
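One spreadsheet row of random durations can be sketched in Python as follows, assuming independent activities. `rng.random()` plays the role of `=RAND()` and `inv_cdf` the role of `=NORM.INV()`; the names `PARAMS` and `draw_durations` are our own and do not correspond to any cells in the spreadsheet:

```python
import random
from statistics import NormalDist

# Mean and standard deviation (hours) of each activity duration, from Table 2.
PARAMS = {"A": (8, 2), "B": (10, 4), "C": (4, 2), "D": (5, 2), "E": (6, 3)}


def draw_durations(rng):
    """Draw one random duration per activity by inverse-transform sampling."""
    durations = {}
    for activity, (mu, sigma) in PARAMS.items():
        r = rng.random()  # uniform in [0, 1), like =RAND()
        while r == 0.0:  # inv_cdf needs 0 < r < 1; random() can (very rarely) return 0.0
            r = rng.random()
        durations[activity] = NormalDist(mu, sigma).inv_cdf(r)  # like =NORM.INV(r, mu, sigma)
    return durations


rng = random.Random(42)  # fixed seed so that the draw is reproducible
print(draw_durations(rng))
```

Fixing the seed is a convenience for checking results; Excel's `=RAND()` instead redraws every time the sheet is recalculated.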

Figure 6: Generating random durations in Excel.

Next, we determine the completion times for this specific realization of the activity durations. We place the completion times of activities A, B, C, D and E in cells T4, U4, V4, W4 and X4, respectively (Fig. 7). The expression for each cell can be found in Table 3. Please spend some time understanding the logic behind these expressions.
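The logic that the completion-time expressions must encode follows directly from the precedence constraints in Table 2. A minimal Python sketch of that logic (the function name is ours; it mirrors the per-row computation, not the actual cell references of Table 3):

```python
def completion_times(d):
    """Completion time of each activity for one realization `d` of the durations.

    Encodes the precedence constraints of Table 2: A precedes B and C,
    C precedes D, and E can start only when both B and D are complete.
    """
    c = {"A": d["A"]}
    c["B"] = c["A"] + d["B"]
    c["C"] = c["A"] + d["C"]
    c["D"] = c["C"] + d["D"]
    c["E"] = max(c["B"], c["D"]) + d["E"]
    return c


# With the expected durations we recover the deterministic CPM result (24 hours).
print(completion_times({"A": 8, "B": 10, "C": 4, "D": 5, "E": 6}))
# {'A': 8, 'B': 18, 'C': 12, 'D': 17, 'E': 24}
```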

Figure 7: Determining completion times in Excel.

We are now ready to perform our Monte Carlo simulation. To do this, we select cells H4 to X4 and copy them down so that rows 4 to 1003 each hold one independent replication, 1000 in total. What we obtain is a large spreadsheet like the one shown in Fig. 8.
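The same 1000 replications can be sketched in Python. Note one assumed substitution: durations are drawn with `random.Random.gauss`, which samples the same normal distributions directly instead of via `=NORM.INV(RAND(),mu,sigma)`; activities are assumed independent, and `monte_carlo` is our own name:

```python
import random

# Mean and standard deviation (hours) of each activity duration, from Table 2.
PARAMS = {"A": (8, 2), "B": (10, 4), "C": (4, 2), "D": (5, 2), "E": (6, 3)}


def completion_times(d):
    """Completion times for one realization, following the precedence constraints."""
    c = {"A": d["A"]}
    c["B"] = c["A"] + d["B"]
    c["C"] = c["A"] + d["C"]
    c["D"] = c["C"] + d["D"]
    c["E"] = max(c["B"], c["D"]) + d["E"]
    return c


def monte_carlo(n=1000, seed=2011):
    """n independent replications, like the spreadsheet rows 4 to 1003 for n = 1000."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        d = {a: rng.gauss(mu, sigma) for a, (mu, sigma) in PARAMS.items()}
        rows.append(completion_times(d))
    return rows


rows = monte_carlo()
project = [row["E"] for row in rows]  # completion time of E = project duration
print(len(rows), round(sum(project) / len(project), 1))
```

The sample mean of the project duration typically lands above the deterministic 24 hours, because the project waits for the later of B and D.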

Table 3: Excel expressions for computing activity completion times (activities A, B, C, D, E).

Figure 8: Monte Carlo simulation in Excel.

To address point (a), we use the 1000 samples available in columns T, U, V, W, X. We select these samples and save them in a separate comma-separated values (.csv) file whose first row holds the column headers A, B, C, D, E. This file can be imported directly into R with the following instruction:

`ct <- read.csv(file = "completion_times.csv", sep = ",", header = TRUE)`

R can plot several kinds of graphs for the imported data. For instance, we can plot a histogram (Fig. 9) with the following instruction:

`hist(ct$E, breaks = 200, xlim = c(0, 50), main = "Completion time of activity E", xlab = "duration")`

However, a boxplot may often provide more insight into the completion time of a given activity or of the whole project (Fig. 10). The following instruction plots a boxplot in R:

`boxplot(ct$E, main = "Completion time of activity E", xlab = "E")`
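If the replications are produced outside Excel, the export step can be sketched with Python's `csv` module. This is an assumed workflow, not the tutorial's; the point is the header row A,B,C,D,E, which is what makes `ct$E` available after `read.csv(..., header = TRUE)`:

```python
import csv
import io

FIELDS = ["A", "B", "C", "D", "E"]


def write_completion_times(rows, f):
    """Write one line per replication under the header A,B,C,D,E."""
    writer = csv.DictWriter(f, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# For a real file: with open("completion_times.csv", "w", newline="") as f: ...
buf = io.StringIO()
write_completion_times([{"A": 8, "B": 18, "C": 12, "D": 17, "E": 24}], buf)
print(buf.getvalue())
# A,B,C,D,E
# 8,18,12,17,24
```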

[Histogram: "Completion time of activity E"; x-axis: duration; y-axis: Frequency]

Figure 9: Completion time histogram of activity E. Note that this is also the empirical distribution of the completion time of the project as a whole.

We have therefore addressed points (a) and (b). To address point (c) we must introduce a number of changes in our spreadsheet. Assume that a deadline d = 35 is fixed. We introduce the expression `=IF(X4<=35,1,0)` in cell Z4 (Fig. 11); this is our deadline indicator function, which equals 1 whenever the project (i.e. activity E, whose completion time is in X4) finishes within the deadline d and 0 whenever d is exceeded. We also introduce the relevant headers and labels as shown in the picture. Then we replicate this function for each of our 1000 samples (i.e. rows). The probability of not exceeding the deadline is the average of the indicator, `=AVERAGE(Z4:Z1003)`, which we insert in cell AC3 (Fig. 12). It is often good practice to perform a confidence interval analysis (http://en.wikipedia.org/wiki/Confidence_interval) on the results obtained by Monte Carlo simulation, in order to determine our confidence in the estimated values. This discussion is, however, beyond the scope of this tutorial.
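The indicator-and-average logic can be sketched in Python as below. It assumes independent normal durations drawn with `random.Random.gauss` (standing in for `=NORM.INV(RAND(),mu,sigma)`); the names `project_duration` and `prob_on_time` are our own:

```python
import random

# Mean and standard deviation (hours) of each activity duration, from Table 2.
PARAMS = {"A": (8, 2), "B": (10, 4), "C": (4, 2), "D": (5, 2), "E": (6, 3)}


def project_duration(d):
    # E starts once both B (after A) and D (after A and C) are complete.
    return d["A"] + max(d["B"], d["C"] + d["D"]) + d["E"]


def prob_on_time(deadline, n=1000, seed=2011):
    """Fraction of replications that finish within the deadline."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        d = {a: rng.gauss(mu, s) for a, (mu, s) in PARAMS.items()}
        hits += 1 if project_duration(d) <= deadline else 0  # like =IF(X4<=35,1,0)
    return hits / n  # like =AVERAGE(Z4:Z1003)


print(prob_on_time(35))
```

With the same seed, a tighter deadline is evaluated on the same samples, so its estimate can never be higher.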

[Boxplot: "Completion time of activity E"; axis label: E]

Figure 10: Completion time boxplot of activity E. Note that this is also representative of the project as a whole.

Figure 11: Deadline indicator function.

Figure 12: Probability of not exceeding the deadline.

We have now addressed point (c). It is often possible to approximate the result for point (c) analytically. The analytical approximation is carried out as follows:

1. consider the deterministic project in which each activity has a duration equal to its expected duration;
2. compute the critical path of this project and let $C$ be the set of activities $a_i$ on it;
3. compute the mean $\mu_C = \sum_{a_i \in C} \mu_{a_i}$ and the standard deviation $\sigma_C = \sqrt{\sum_{a_i \in C} \sigma_{a_i}^2}$ of the duration of the critical path (activity durations being independent);
4. using this new random variable, compute the probability that the duration of the critical path does not exceed a given deadline d.

We now carry out these steps for our example. As already discussed for the project in Table 1, the critical path of the deterministic project in which each activity has a duration equal to its expected duration is ABE. Since the durations of these activities are all normally distributed, their sum is also normally distributed, with mean $\mu_A + \mu_B + \mu_E = 8 + 10 + 6 = 24$ and standard deviation

$$\sqrt{\sigma_A^2 + \sigma_B^2 + \sigma_E^2} = \sqrt{2^2 + 4^2 + 3^2} = \sqrt{29} \approx 5.39.$$
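The steps of the approximation for path ABE can be checked with a short Python sketch using the standard-library `statistics.NormalDist` (our own check, assuming independent durations, which is what summing the variances requires):

```python
from math import sqrt
from statistics import NormalDist

# Mean and standard deviation (hours) of the activities on the critical path ABE (Table 2).
critical = {"A": (8, 2), "B": (10, 4), "E": (6, 3)}

mu = sum(m for m, _ in critical.values())  # 8 + 10 + 6 = 24
sigma = sqrt(sum(s ** 2 for _, s in critical.values()))  # sqrt(4 + 16 + 9) = sqrt(29)

path = NormalDist(mu, sigma)
print(mu, round(sigma, 3))  # 24 5.385
print(f"{path.cdf(35):.3f}")  # 0.979, like =NORM.DIST(35,24,SQRT(29),TRUE)
print(f"{NormalDist(24, 5.3).cdf(35):.4f}")  # 0.9810, with sigma rounded down to 5.3
```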

We therefore have a new normally distributed random variable, with mean $\mu = 24$ and standard deviation $\sigma = \sqrt{29} \approx 5.39$, representing the duration of the critical path. We can compute the probability that the duration of the critical path is less than a given deadline d with the Excel formula `=NORM.DIST(d,24,SQRT(29),TRUE)`, which for d = 35 returns approximately 0.979; rounding the standard deviation down to 5.3, as in `=NORM.DIST(35,24,5.3,TRUE)`, gives 0.9810. This approximation is slightly higher than the probability we observed in our Monte Carlo simulation. In fact, it tends to be optimistic, since it disregards the impact of other paths. On the other hand, especially for large projects, it is a quick and dirty way of getting an idea of the risk of exceeding a given deadline. This addresses point (d).

To address point (e) we must, once more, extend our spreadsheet. This extension is not easy, since it requires identifying which activities are on the critical path in each realization of our Monte Carlo simulation. There are software packages, such as Risky Project (http://www.intaver.com/), that can perform this task very effectively; unfortunately, these are not free. We will therefore focus only on simple projects, for which identifying the activities on the critical path remains a relatively easy task. Nevertheless, as we will see, despite the simple structure of the project graph analyzed, our spreadsheet will produce results that are far from trivial. This demonstrates the difficulty of analyzing project plans under uncertainty.

We introduce in cells AG4, AH4, AI4, AJ4, AK4 the formulas presented in Table 4. The reasoning behind these formulas is the following. Activities A and E are always on the critical path. Activity D is on the critical path only if its completion time (stored in cell W4) is greater than the completion time of activity B (stored in cell U4). Conversely, B is on the critical path if its completion time is greater than or equal to the completion time of D. C is on the critical path if D is on the critical path. Of course, depending on the structure of the network, working out these formulas may become very complicated.

| Activity | A | B | C | D | E |
|---|---|---|---|---|---|
| Critical path (1=true, 0=false) | `=1` | `=IF(U4>=W4,1,0)` | `=IF(AJ4=1,1,0)` | `=IF(W4>U4,1,0)` | `=1` |

Table 4: Excel expressions for identifying activities on the critical path.

After entering these formulas in Excel, we duplicate them for each of our 1000 samples. Then we also introduce in cells AM4, AN4, AO4, AP4, AQ4 the averages of these results: in AM4 we introduce the formula `=AVERAGE(AG4:AG1003)`, in AN4 the formula `=AVERAGE(AH4:AH1003)`, and so forth. The resulting spreadsheet is shown in Fig. 13. This analysis reveals that activity B is on the critical path with probability 0.88, while C and D are on the critical path with probability 0.12. These values are the criticality indexes of these activities; this addresses point (e).

Figure 13: Determining the probability for an activity to be on the critical path in Excel.

Software like Risky Project typically provides a list of the most critical paths with their respective probabilities of occurrence. This is particularly useful when one has to decide how to invest resources in order to tackle risks related to the duration of the project. Nevertheless, such reasoning can also be carried out by looking at the criticality index of each activity. In our case, the criticality indexes of activities A, B and E are much higher than those of activities C and D. Therefore, we should carefully invest resources and effort to reduce the durations of activities A, B and E in order to also reduce the duration of the whole project. For instance, assume that an additional consultant is assigned to activity B. According to your past experience as a manager, this reduces the duration of this activity by 3 hours on average. With our spreadsheet it is fairly easy to find out how this action affects the completion time of the project and the criticality index of activity B: we just need to reduce the value in cell C4 by 3 units and observe the changes in the remaining cells.

What is now the probability of finishing on time if d = 35? ....................

What is now the probability of finishing on time if d = 30? ....................

Save the new set of completion time realizations in a .csv file and plot the results again in R. Describe how the histogram is affected.

Describe how the boxplot is affected.
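The criticality rules of Table 4 and the what-if change to activity B can be sketched together in Python. This is a sketch under stated assumptions, not the spreadsheet itself: durations are independent and drawn with `random.Random.gauss`, and `analyse`, `BASE` and `faster_b` are our own names. Under Table 2's parameters, B's criticality index can also be computed exactly as $P(d_B \ge d_C + d_D) = \Phi(1/\sqrt{24}) \approx 0.58$, a useful check on any simulation estimate:

```python
import random

# Mean and standard deviation (hours) of each activity duration, from Table 2.
BASE = {"A": (8, 2), "B": (10, 4), "C": (4, 2), "D": (5, 2), "E": (6, 3)}


def analyse(params, deadline, n=20000, seed=2011):
    """Estimate P(project <= deadline) and the criticality index of each activity.

    Criticality follows the Table 4 rules: A and E are always critical, B is critical
    when it completes no earlier than D, and C and D are critical otherwise.
    """
    rng = random.Random(seed)
    on_time = 0
    critical = dict.fromkeys(params, 0)
    for _ in range(n):
        d = {a: rng.gauss(mu, s) for a, (mu, s) in params.items()}
        c_b = d["A"] + d["B"]  # completion time of B
        c_d = d["A"] + d["C"] + d["D"]  # completion time of D
        on_time += (max(c_b, c_d) + d["E"]) <= deadline
        b_critical = c_b >= c_d
        for a in ("A", "E"):
            critical[a] += 1
        critical["B"] += b_critical
        critical["C"] += not b_critical
        critical["D"] += not b_critical
    return on_time / n, {a: k / n for a, k in critical.items()}


# What-if: an extra consultant cuts the mean duration of B by 3 hours (cell C4 in the spreadsheet).
faster_b = dict(BASE, B=(BASE["B"][0] - 3, BASE["B"][1]))
for label, params in (("baseline", BASE), ("B 3 hours faster", faster_b)):
    for deadline in (35, 30):
        p, crit = analyse(params, deadline)
        print(f"{label}, d = {deadline}: P(on time) = {p:.3f}, criticality of B = {crit['B']:.2f}")
```

Reusing the same seed for both scenarios means the only difference between them is the shift in B, which makes the comparison less noisy than two independent runs.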


## Assignment

Consider the project in Table 5. The corresponding project graph is presented in Fig. 14.

| Activity | Duration $\mu$ (hours) | Duration $\sigma$ (hours) | Activity precedence constraints |
|---|---|---|---|
| A | 8 | 2 | A → B, D |
| B | 10 | 4 | B → C, E |
| C | 4 | 2 | C → F |
| D | 5 | 2 | D → E |
| E | 6 | 3 | E → F |
| F | 7 | 2 | n/a |

Table 5: A project under activity duration uncertainty

Address points (a), (b), (c), (d), (e) for this new project.

Figure 14: The project graph for the project in Table 5.

## References
[1] Michael L. Pinedo. Planning and Scheduling in Manufacturing and Services (Springer Series in Operations Research and Financial Engineering). Springer, June 2007.
