
Artificial Intelligence for Project Management

CT5813701
Assignment #5

Data Mining Software Tutorial: Ensemble Model


using WEKA

Lecturer:
Prof. Jui-Sheng Chou

By:
Richard Antoni Gosno
M10605820

National Taiwan University of Science and Technology


Department of Civil and Construction Engineering
December, 2017

1. Introduction
This assignment builds on the previous assignment, which demonstrated baseline models in WEKA. From that demonstration, we already know how to operate 4 baseline models: Artificial Neural Network (ANN), Linear Regression (LR), Support Vector Regression (SVR), and Classification and Regression Tree (CART). In this assignment, ensemble models are built on top of these 4 baseline models in WEKA. Four kinds of ensemble model are demonstrated:
 Voting
 Bagging
 Stacking
 Tiering

2. Problem Statement
This assignment uses the “Heating load data” as the dataset. The dataset contains 768 data points with 8 input variables (X1-X8) and 1 output variable (Y). The inputs and output of the dataset are shown in Table 1, and the statistical description of the dataset is shown in Table 2.
Table 1. Dataset Input and Output Variables

Data no. X1 X2 X3 X4 X5 X6 X7 X8 Y1
1 0.9 563.5 318.5 122.5 7 2 0.4 1 36.47
2 0.76 661.5 416.5 122.5 7 5 0.4 1 40.43
3 0.76 661.5 416.5 122.5 7 5 0.1 4 32.31
4 0.86 588 294 147 7 5 0.1 2 25.36
5 0.9 563.5 318.5 122.5 7 4 0.1 5 28.03
6 0.86 588 294 147 7 5 0.25 5 29.39
7 0.62 808.5 367.5 220.5 3.5 4 0.1 2 12.97
8 0.82 612.5 318.5 147 7 2 0 0 17.05
… … … … … … … … … …
… … … … … … … … … …
… … … … … … … … … …
764 0.86 588 294 147 7 3 0.1 5 26.45
765 0.62 808.5 367.5 220.5 3.5 4 0.25 2 14.6
766 0.82 612.5 318.5 147 7 3 0.4 5 29.5
767 0.82 612.5 318.5 147 7 3 0.4 2 29.49
768 0.9 563.5 318.5 122.5 7 5 0.1 4 29.79


Table 2. Statistical Description of Dataset

Attributes Average Minimum Maximum St. Deviation


X1 0.78 0.62 0.98 0.10
X2 659.27 514.50 808.50 81.11
X3 313.30 245.00 416.50 43.21
X4 172.98 110.25 220.50 45.30
X5 5.41 3.50 7.00 1.77
X6 3.45 2.00 5.00 1.20
X7 0.18 0.00 0.40 0.12
X8 2.76 0.00 5.00 1.39
Y 21.78 7.10 42.77 9.59

3. Preparation
After getting familiar with the dataset, we can start the demonstration in WEKA. Before anything else, make sure to convert the “Heating load data” file from Excel format (*.xls) to CSV format (*.csv). This is necessary because WEKA can only read data in a few specific formats, and CSV is a safe choice for this demonstration. You can convert the file by opening it in Excel and saving it as a CSV file (*.csv), as shown in Figure 1.

Figure 1. Save Dataset into CSV Format
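If you prefer to script the conversion instead of clicking through Excel, a minimal pandas sketch is shown below. The workbook name “Heating load data.xls” and the single-sheet layout are assumptions, not part of the original tutorial.

```python
# A scripted alternative to Excel's "Save As": read the workbook and write it back out as CSV.
import pandas as pd

df = pd.read_excel("Heating load data.xls")      # reading .xls needs the xlrd package
df.to_csv("Heating load data.csv", index=False)  # CSV file to be loaded into WEKA
print(df.shape)                                  # expect (768, 9): X1..X8 plus the output
```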


The next step is to load the dataset into WEKA. Open the WEKA application and select Explorer. The WEKA interface is shown in Figure 2.


Figure 2. WEKA Application Interface


Click Open file, choose *.csv as the file format, and then select the dataset. Figure 3 shows how the dataset looks in the Preprocess tab.

Figure 3. Dataset Loaded in WEKA
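As a quick sanity check that the CSV loaded correctly, the statistics in Table 2 can be reproduced outside WEKA with a short pandas sketch. The file name and the column headers X1..X8, Y1 (following Table 1) are assumptions about how the CSV is labelled.

```python
# Reproduce the Average/Minimum/Maximum/St. Deviation columns of Table 2.
import pandas as pd

df = pd.read_csv("Heating load data.csv")
summary = df.agg(["mean", "min", "max", "std"]).T       # one row per attribute
summary.columns = ["Average", "Minimum", "Maximum", "St. Deviation"]
print(summary.round(2))
```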


4. Ensemble Model Demonstration

After loading the dataset into WEKA, we are ready to create the ensemble models. As mentioned before, this assignment covers 4 ensemble methods: Voting, Bagging, Stacking and Tiering.

4.1. Ensemble Model: Voting


Choose the Classify tab; its appearance is shown in Figure 4.

Figure 4. Classify Tab of WEKA Explorer


To create a voting model, we first need to define a combination of baseline models. The combinations used to build the voting models are shown in Table 3. As there are many possible combinations of baseline models, the step-by-step demonstration below is given only for ANN+CART+SVR+LR.


Table 3. Combination of Baseline Models for 11 Voting Models

Voting Model Baseline 1 Baseline 2 Baseline 3 Baseline 4


1 ANN CART - -
2 ANN SVR - -
3 ANN LR - -
4 CART SVR - -
5 CART LR - -
6 SVR LR - -
7 ANN CART SVR -
8 ANN CART LR -
9 CART SVR LR -
10 ANN SVR LR -
11 ANN CART SVR LR
*. Voting Model 11 is demonstrated in this assignment

After determining all the combinations for the voting models, we will create the voting model for ANN+CART+SVR+LR. In the Classifier panel click “Choose”; the voting model is located in classifiers/meta/Vote (Figure 5).

Figure 5. Choose Vote to be the Classifier


Now we need to set the combination of base models in the Vote settings. Click the Vote text in the Classifier panel to open the object editor, then click the classifiers field until it looks like Figure 6.


Figure 6. Object Editor of Vote

The default base classifier is ZeroR. Since we do not need ZeroR, click it in the list and delete it. Click “Choose” to insert each baseline model into the voting model. The locations of the baseline models are the same as in the previous assignment: ANN is located in classifiers/functions/MultilayerPerceptron, LR in classifiers/functions/LinearRegression, SVR in classifiers/functions/SMOreg and CART in classifiers/trees/REPTree. If you decide to use RBFKernel as the SVR kernel, click the SMOreg text beside the “Choose” button and change the kernel to RBFKernel (Figure 7). Click “Add” after choosing every baseline model you want to include in the voting model. The classifier box should look like Figure 8 once all the baseline models (ANN+CART+SVR+LR) have been added.

Figure 7. SVR Kernel Setting


Figure 8. Combination of Baseline Models in Voting Model

Click “OK” to close the object editor. Set the number of folds for cross-validation to 10 in the Test options panel, and do not forget to click “More options” and choose PlainText or CSV as the output predictions format. After that, the ensemble model is ready to run. The result of the voting model is shown in Figure 9.

Figure 9. Result from Voting (ANN+CART+SVR+LR)
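Conceptually, WEKA's Vote meta-learner with its default combination rule averages the numeric predictions of the base models. The sketch below illustrates the same idea with scikit-learn rather than WEKA; it is only a conceptual equivalent, and the column name “Y1” plus all model parameters (library defaults, no feature scaling) are assumptions.

```python
# Conceptual equivalent of Vote (ANN+CART+SVR+LR): average the base models'
# regression predictions over 10-fold cross-validation. Not the WEKA code path.
import pandas as pd
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

df = pd.read_csv("Heating load data.csv")
X, y = df.drop(columns=["Y1"]), df["Y1"]

vote = VotingRegressor([
    ("ann", MLPRegressor(max_iter=2000)),     # stands in for MultilayerPerceptron
    ("cart", DecisionTreeRegressor()),        # stands in for REPTree
    ("svr", SVR(kernel="rbf")),               # stands in for SMOreg with RBFKernel
    ("lr", LinearRegression()),
])
pred = cross_val_predict(vote, X, y, cv=10)   # out-of-fold predictions, 10 folds
```

Changing the list of (name, estimator) pairs reproduces the other combinations in Table 3.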

Save the model if you want to use it for future testing. The steps for the other voting combinations are the same; only the baseline models added to the voting model differ. Repeat the steps above for the 10 other voting models.


4.2. Ensemble Model: Bagging


The initial steps for running a bagging model are much the same as for the previous ensemble model. The only difference is that in the Classifier panel you choose “Bagging”, which is located in classifiers/meta/Bagging (Figure 10).

Figure 10. Choose Bagging to be the Classifier


For bagging, there are 4 possible bagging models: bagging of ANN, LR, SVR and CART. In this assignment, the bagging of LR is demonstrated. After choosing “Bagging” as the classifier, go to the settings (object editor) and set the classifier to LR (Figure 11).

Figure 11. Object Editor of Bagging


Since we will use the default bagging settings, click “OK” to close the object editor. WEKA’s default bag size is 100, meaning each bag is the same size as the training set. Set the number of folds for cross-validation to 10 in the Test options panel, and do not forget to click “More options” and choose PlainText or CSV as the output predictions format. After that, the ensemble model is ready to run. The result of bagging is shown in Figure 12.

Figure 12. Result from Bagging (LR)
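As with voting, a conceptual (non-WEKA) sketch can make the bagging idea concrete: the base learner is trained on bootstrap resamples of the training data and the resulting predictions are averaged. The parameters below are illustrative; X and y are the inputs and output loaded as in the earlier voting sketch.

```python
# Conceptual equivalent of Bagging(LR): bootstrap resamples of the training data,
# one LinearRegression per resample, averaged predictions.
from sklearn.ensemble import BaggingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

bag_lr = BaggingRegressor(LinearRegression(),   # base learner; swap in ANN/CART/SVR equivalents
                          n_estimators=10,      # number of bags (illustrative)
                          max_samples=1.0)      # each bag is 100% of the training set size
pred = cross_val_predict(bag_lr, X, y, cv=10)   # X, y as loaded in the voting sketch
```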


Repeat the steps above for the bagging models with ANN, CART and SVR. A total of 4 bagging results are collected in this assignment.

4.3. Ensemble Model: Stacking


The initial steps for running a stacking model are also much the same as for the previous ensemble models. The only difference is that in the Classifier panel you choose “Stacking”, which is located in classifiers/meta/Stacking (Figure 13).

Figure 13. Choose Stacking to be Classifier


For stacking, there are 4 possible combinations, because we use 4 baseline models. The stacking method divides the classifiers into two parts: the stacked classifiers, which are trained on the input data, and the meta-classifier, which combines their outputs to make the final prediction. Since we do not want to repeat the same model (as in bagging), the meta-classifier must not be one of the stacked classifiers. For example, if we use ANN, LR and SVR as the stacked classifiers, the meta-classifier must be CART. The other combinations follow the same rule. All stacked combinations are shown in Table 4.

Table 4. Combination of Baseline Models in Stacking Model

Stacking Model Stacked Classifiers Meta Classifier


1 ANN SVR CART LR
2 ANN SVR LR CART
3 ANN LR CART SVR
4 SVR LR CART ANN

In this assignment, stacking model 1 (ANN+SVR+CART with LR as the meta-classifier) is demonstrated. Go to the settings (object editor) and set the classifiers to ANN, SVR and CART (Figure 14). Delete the ZeroR classifier and, for SVR, set the kernel to RBFKernel. After that, set the meta-classifier to LR (Figure 15).

Figure 14. Object Editor for Stacking (1)


Figure 15. Object Editor for Stacking (2)


Click “OK” to close the object editor. Set the number of folds for cross-validation to 10 in the Test options panel, and do not forget to click “More options” and choose PlainText or CSV as the output predictions format. After that, the ensemble model is ready to run. The result of the stacking model is shown in Figure 16.

Figure 16. Result from Stacking Model 1 (ANN+SVR+CART, LR meta-classifier)
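Again as a conceptual (non-WEKA) sketch, stacking trains the meta-learner on the stacked models' cross-validated predictions. The parameters are illustrative and X, y are as loaded in the voting sketch.

```python
# Conceptual equivalent of stacking model 1: ANN+SVR+CART stacked, LR as meta-learner.
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

stack1 = StackingRegressor(
    estimators=[("ann", MLPRegressor(max_iter=2000)),
                ("svr", SVR(kernel="rbf")),
                ("cart", DecisionTreeRegressor())],
    final_estimator=LinearRegression(),        # the meta-classifier in WEKA terms
)
pred = cross_val_predict(stack1, X, y, cv=10)  # X, y as loaded in the voting sketch
```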

Repeat the steps above for the 3 other stacking models. A total of 4 stacking results are collected in this assignment.


4.4. Ensemble Model: Tiering


For tiering, the procedure is a bit more involved. First, you must decide how many tiers to build from the dataset. For this assignment, a 3-tier classification is used: high, medium and low. To divide the dataset into 3 tiers, we need to determine a threshold value for classification. The threshold value is calculated by the equation shown below:

T = ( Max(Y) + Min(Y) ) / n        (4.1)

Where:
T: threshold value
Max(Y): maximum value of the output attribute in the dataset
Min(Y): minimum value of the output attribute in the dataset
n: number of tiers
Since we use a 3-tier classification, n is equal to 3. Using the values in Table 2, Equation (4.1) gives T = (42.77 + 7.10)/3 ≈ 16.62, so the class boundaries are approximately 16.62 and 33.25. After determining the threshold value, we can build classification rules to classify the dataset. The classification rules are shown in the equations below:

If Y < T, then Yclass = Low        (4.2)

If T ≤ Y ≤ 2T, then Yclass = Medium        (4.3)

If Y > 2T, then Yclass = High        (4.4)

Apply these classification rules to the dataset. Open the dataset and calculate the threshold value (Figure 17). The IF function in Excel can be used to classify the dataset based on the classification rules above; an equivalent script is sketched after Figure 17.

Figure 17. Threshold Value Calculation
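The same IF-rule can be scripted instead of done in Excel; a minimal sketch is below. The input and output file names are assumptions, and the column name Y1 follows Table 1.

```python
# Compute the threshold from Equation (4.1) with n = 3 and label every row
# Low / Medium / High according to rules (4.2)-(4.4).
import pandas as pd

df = pd.read_csv("Heating load data.csv")
T = (df["Y1"].max() + df["Y1"].min()) / 3        # Eq. (4.1)

def tier(y):
    if y < T:
        return "Low"                             # rule (4.2)
    if y <= 2 * T:
        return "Medium"                          # rule (4.3)
    return "High"                                # rule (4.4)

df["Y1Class"] = df["Y1"].apply(tier)
df.to_csv("Heating load classified.csv", index=False)
print(df["Y1Class"].value_counts())
```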


After determining all the output classes, we have to split the dataset into training and testing subsets. Since the Excel sheet for generating the cross-validation index was not provided in class, the writer uses a MATLAB program to split the dataset into testing and training subsets automatically. This assignment uses 10-fold cross-validation, which means the dataset is split into 10 parts. In each fold, one part is used as the testing subset while the remaining data points form the training subset. This is repeated for all 10 parts, so in the end each part is used as a testing subset exactly once and as part of the training subset exactly 9 times. The MATLAB program interface used to split the dataset is shown in Figure 18.

Figure 18. Program for Splitting the Dataset


All we need to do is set the source file, the number of folds, the destination name and the destination path. The program automatically splits the dataset into 10 sheets of testing subsets and 10 sheets of training subsets in the destination path in Excel format (*.xlsx). The sheets of the result file are shown in Figure 19.

Figure 19. Sheet of Result File
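Since the MATLAB splitting program is not distributed with the class materials, a small Python sketch of the same 10-fold split is shown below for readers who do not have it. It writes Train1.csv/Test1.csv ... Train10.csv/Test10.csv directly, so the manual xlsx-to-csv step described next can be skipped; the input file name is the one assumed in the classification sketch above.

```python
# Split the classified dataset into 10 train/test file pairs for 10-fold cross-validation.
import pandas as pd
from sklearn.model_selection import KFold

df = pd.read_csv("Heating load classified.csv")
kf = KFold(n_splits=10, shuffle=True, random_state=1)    # fixed seed for reproducibility

for fold, (train_idx, test_idx) in enumerate(kf.split(df), start=1):
    df.iloc[train_idx].to_csv(f"Train{fold}.csv", index=False)
    df.iloc[test_idx].to_csv(f"Test{fold}.csv", index=False)
```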

Now we have to save each sheet (training and testing) manually in CSV format, because we want to run these subsets in WEKA and WEKA can only read them once they are converted to CSV. Save the training subsets with names that indicate the fold number (for example Train1, Train2, etc.) and do the same for the testing subsets (for example Test1, Test2, etc.). After that, you can move the training files into a training folder and the testing files into a testing folder


to avoid loading the wrong files when running WEKA (this is optional). The training and testing folders are shown in Figure 20 and Figure 21.

Figure 20. Training Folder for 10-Folds

Figure 21. Testing Folder for 10-Folds


Once we have 10 training files and 10 testing files, we are ready to do the tiering in WEKA. Open WEKA, choose “Explorer” and, in the Preprocess tab, load Train1.csv (Figure 22).

Figure 22. Open Training Subset for Fold-1 in WEKA


Delete the Y1 attribute, which is the actual (numeric) output of the dataset. This has to be done because we are now doing classification, not regression, so Y1Class is used as the output.

Figure 23. Delete Y1 (Actual Output)


Go to the Classify tab and choose “SMO” as the classifier, which is WEKA’s SVM implementation for classification. You can also set the kernel to RBFKernel if you want. SMO is located in classifiers/functions/SMO. In the Test options panel choose “Use training set”, click “More options” and set the output predictions format to CSV. After that, click “Start”. All the settings are shown in Figure 24, and the result is shown in Figure 25.

Figure 24. SMO Settings for Classification (Training Fold-1)


Figure 25. Running Result for Classification (Training Fold-1)

Save the model under a specific name, for example Train1.model, then go back to the Preprocess tab. Now load the testing data (Test1) and remove the Y1 attribute using the same steps as before (Figure 26).

Figure 26. Open Testing Subset for Fold-1 in WEKA

After that, go back to the Classify tab and make sure the classifier is still SMO. In the Test options panel click “Supplied test set” and open the corresponding testing subset (Test1.csv); the selection is shown in Figure 27.


Figure 27. Set the Supplied Test Set


Now load the training model saved before (Train1.model), right-click the loaded model and choose “Re-evaluate model on current test set” (Figure 28).

Figure 28. Run Classification for Testing Subset Based on Training Subset
Copy the predicted results for the testing subset. You can either create a new column or overwrite the previous classification in the testing subset (we suggest creating a new column called “Predicted”). The predicted results of the classification model are shown in Figure 29.



Figure 29. Classification Result

We have now run the classification for the first fold. Repeat the same steps for the other folds until results are obtained for all 10 folds. After that, combine the testing files back into one file to recover the original dataset: since the testing subsets are disjoint across folds, combining them all reproduces the complete original dataset (Figure 30).

Figure 30. Combined Result of 10-Folds Testing Subset


If you followed the steps correctly, the number of rows in the combined testing subset will equal the number of data points in the original dataset. You can save it as a new file named “Combined Result”. The next step is to filter the classification results in Excel. Copy the “High” filter results into a new Excel file and save it in CSV format with the file name “High” (this is optional, but it helps avoid confusion since we are handling so many files). Do the same for the “Medium” and “Low” filter results. In the end we have 3 files, one per tier: High, Medium and Low (Figure 31).

Figure 31. 3 Separate Subset Based on Corresponding Tier

If everything was done correctly, the total number of data points in the High, Medium and Low files will equal that of the original dataset. In this assignment, we have 268 low-tier data points, 142 medium-tier data points and 358 high-tier data points, for a total of 768 data points, the same as the original dataset.

Now we are ready for the final step of tiering. Run the 3 files (High, Medium and Low) in WEKA using the 4 baseline models (ANN, LR, CART and SVR). The steps are the same as running the baseline models in the previous assignment; the only difference is that we now use the tier datasets (Low, Medium, High) instead of the full dataset. Each tier is run with the 4 baseline models, so in the end we have a total of 12 results for tiering. A conceptual sketch of this per-tier step is given below.
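A minimal sketch of this per-tier regression step, outside WEKA, might look as follows. CART (a decision tree) is used as the example learner, the tier file names come from the filtering step above, and the column names X1..X8, Y1 are assumptions.

```python
# Fit and evaluate one regressor per tier file (Low/Medium/High); any of the
# four baselines could be substituted for the decision tree.
import pandas as pd
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

for tier_name in ["Low", "Medium", "High"]:
    part = pd.read_csv(f"{tier_name}.csv")
    X_t = part[[f"X{i}" for i in range(1, 9)]]           # X1..X8
    y_t = part["Y1"]
    pred = cross_val_predict(DecisionTreeRegressor(), X_t, y_t, cv=10)
    rmse = ((pred - y_t) ** 2).mean() ** 0.5
    print(f"{tier_name}: RMSE = {rmse:.3f}")
```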


5. Result Analysis
Now, we have the result of all the ensemble model; Bagging, Stacking, Voting and
Tiering. It is the time to analyze the result of each ensemble model. This assignment
will use 4 statistical performance measures to compare the result of each baseline
model, which are coefficient of correlation (R), root mean squared error (RMSE),
mean absolute percentage error (MAPE) and also mean absolute error (MAE). The
formulation of the performance measures is show in Table 5.

Table 5. Formulation of Performance Measures

Performance measure                     Formula
Coefficient of correlation (R)          R = ( n Σ y·y' − (Σ y)(Σ y') ) / ( sqrt( n Σ y² − (Σ y)² ) · sqrt( n Σ y'² − (Σ y')² ) )
Root mean squared error (RMSE)          RMSE = sqrt( (1/n) Σ_{i=1..n} (y − y')² )
Mean absolute percentage error (MAPE)   MAPE = (1/n) Σ_{i=1..n} | (y − y') / y |
Mean absolute error (MAE)               MAE = (1/n) Σ_{i=1..n} | y − y' |

where y is the actual output value, y' is the predicted value and n is the number of data points.
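Because the analysis below recomputes these measures manually from WEKA's exported predictions, a short sketch of that calculation may help. The file name and the "actual"/"predicted" column labels are assumptions about the exported CSV.

```python
# Compute R, RMSE, MAPE and MAE (as defined in Table 5) from one fold's exported predictions.
import numpy as np
import pandas as pd

pred = pd.read_csv("fold1_predictions.csv")            # assumed export file name
y  = pred["actual"].to_numpy()
yp = pred["predicted"].to_numpy()

r    = np.corrcoef(y, yp)[0, 1]                        # Pearson correlation, i.e. R in Table 5
rmse = np.sqrt(np.mean((y - yp) ** 2))
mape = np.mean(np.abs((y - yp) / y)) * 100             # reported in percent, as in the result tables
mae  = np.mean(np.abs(y - yp))
print(f"R={r:.3f}  RMSE={rmse:.3f}  MAPE={mape:.3f}  MAE={mae:.3f}")
```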

Results from the voting models are shown in Table 6-Table 16.

Table 6. Vote Model: SVR+LR

Fold RMSE MAE MAPE R


Fold 1 3.150 2.152 10.257 0.949
Fold 2 2.694 1.779 8.868 0.959
Fold 3 3.265 2.189 9.125 0.950
Fold 4 3.051 2.059 9.324 0.956
Fold 5 2.652 1.778 8.678 0.964
Fold 6 2.911 2.032 9.686 0.959
Fold 7 3.047 2.115 10.787 0.954
Fold 8 2.935 2.113 9.360 0.953
Fold 9 3.105 2.032 8.122 0.961
Fold 10 2.712 1.832 8.725 0.962
Average 2.952 2.008 9.293 0.957


Table 7. Vote Model: ANN+CART

Fold RMSE MAE MAPE R


Fold 1 0.696 0.585 3.156 0.998
Fold 2 0.858 0.702 4.366 0.997
Fold 3 0.556 0.445 2.228 0.999
Fold 4 0.723 0.550 2.992 0.998
Fold 5 0.706 0.543 3.010 0.998
Fold 6 0.796 0.551 2.966 0.997
Fold 7 0.628 0.494 2.737 0.998
Fold 8 0.716 0.538 2.741 0.997
Fold 9 0.713 0.562 3.102 0.998
Fold 10 0.631 0.509 2.963 0.998
Average 0.702 0.548 3.026 0.998

Table 8. Vote Model: ANN+CART+LR

Fold RMSE MAE MAPE R


Fold 1 1.206 0.931 4.544 0.993
Fold 2 1.241 0.945 5.249 0.992
Fold 3 1.192 0.827 3.640 0.994
Fold 4 1.241 0.920 4.641 0.993
Fold 5 1.161 0.883 4.667 0.993
Fold 6 1.257 0.964 4.755 0.992
Fold 7 1.093 0.771 3.935 0.994
Fold 8 1.255 0.993 4.729 0.992
Fold 9 1.193 0.877 4.106 0.995
Fold 10 1.152 0.850 4.302 0.993
Average 1.199 0.896 4.457 0.993

Table 9. Vote Model: ANN+CART+SVR

Fold RMSE MAE MAPE R


Fold 1 1.225 0.919 4.538 0.993
Fold 2 1.227 0.929 5.092 0.993
Fold 3 1.293 0.859 3.556 0.993
Fold 4 1.271 0.909 4.396 0.993
Fold 5 1.197 0.888 4.513 0.993
Fold 6 1.248 0.950 4.753 0.993
Fold 7 1.147 0.814 4.231 0.994
Fold 8 1.223 0.959 4.526 0.993
Fold 9 1.334 0.974 4.489 0.993
Fold 10 1.128 0.830 4.125 0.994
Average 1.229 0.903 4.422 0.993


Table 10. Vote Model: ANN+LR

Fold RMSE MAE MAPE R


Fold 1 1.731 1.345 6.722 0.985
Fold 2 1.819 1.387 7.775 0.983
Fold 3 1.668 1.183 5.192 0.988
Fold 4 1.761 1.331 6.555 0.985
Fold 5 1.648 1.298 6.950 0.987
Fold 6 1.717 1.321 6.529 0.986
Fold 7 1.525 1.087 5.739 0.989
Fold 8 1.770 1.389 6.715 0.984
Fold 9 1.687 1.242 5.574 0.989
Fold 10 1.684 1.246 6.339 0.986
Average 1.701 1.283 6.409 0.986

Table 11. Vote Model: ANN+SVR

Fold RMSE MAE MAPE R


Fold 1 1.768 1.320 6.661 0.985
Fold 2 1.793 1.370 7.575 0.984
Fold 3 1.830 1.214 5.061 0.986
Fold 4 1.804 1.295 6.125 0.985
Fold 5 1.699 1.268 6.527 0.986
Fold 6 1.731 1.318 6.546 0.986
Fold 7 1.610 1.132 6.016 0.988
Fold 8 1.715 1.330 6.407 0.985
Fold 9 1.897 1.356 5.967 0.986
Fold 10 1.645 1.204 5.997 0.987
Average 1.749 1.281 6.288 0.986

Table 12. Vote Model: ANN+SVR+LR

Fold RMSE MAE MAPE R


Fold 1 2.170 1.548 7.599 0.977
Fold 2 2.041 1.436 7.652 0.978
Fold 3 2.225 1.497 6.262 0.978
Fold 4 2.163 1.535 7.209 0.978
Fold 5 1.961 1.429 7.314 0.981
Fold 6 2.069 1.525 7.427 0.979
Fold 7 2.025 1.413 7.357 0.980
Fold 8 2.095 1.582 7.393 0.977
Fold 9 2.195 1.532 6.503 0.982
Fold 10 1.974 1.403 6.863 0.980
Average 2.092 1.490 7.158 0.979


Table 13. Vote Model: ANN+SVR+LR+CART

Fold RMSE MAE MAPE R


Fold 1 1.655 1.191 5.754 0.987
Fold 2 1.547 1.086 5.768 0.987
Fold 3 1.720 1.154 4.828 0.987
Fold 4 1.665 1.180 5.619 0.988
Fold 5 1.512 1.093 5.549 0.989
Fold 6 1.620 1.197 5.833 0.987
Fold 7 1.561 1.094 5.589 0.989
Fold 8 1.625 1.244 5.757 0.986
Fold 9 1.686 1.180 5.152 0.989
Fold 10 1.496 1.070 5.221 0.989
Average 1.609 1.149 5.507 0.988

Table 14. Vote Model: CART+LR

Fold RMSE MAE MAPE R


Fold 1 1.607 1.170 5.416 0.987
Fold 2 1.406 0.972 4.960 0.989
Fold 3 1.671 1.167 5.027 0.988
Fold 4 1.608 1.126 5.454 0.989
Fold 5 1.396 0.964 4.778 0.990
Fold 6 1.639 1.212 5.766 0.987
Fold 7 1.603 1.105 5.540 0.988
Fold 8 1.629 1.234 5.437 0.986
Fold 9 1.549 1.035 4.484 0.991
Fold 10 1.424 0.999 4.899 0.990
Average 1.553 1.099 5.176 0.988

Table 15. Vote Model: CART+SVR

Fold RMSE MAE MAPE R


Fold 1 1.706 1.135 5.202 0.986
Fold 2 1.404 0.914 4.400 0.989
Fold 3 1.839 1.198 4.858 0.986
Fold 4 1.659 1.092 4.910 0.988
Fold 5 1.464 0.984 4.533 0.990
Fold 6 1.605 1.107 5.341 0.988
Fold 7 1.662 1.130 5.553 0.987
Fold 8 1.607 1.165 5.028 0.987
Fold 9 1.758 1.161 4.910 0.989
Fold 10 1.402 0.951 4.364 0.990
Average 1.611 1.084 4.910 0.988


Table 16. Vote Model: CART+SVR+LR

Fold RMSE MAE MAPE R


Fold 1 2.130 1.463 6.860 0.977
Fold 2 1.811 1.186 5.938 0.982
Fold 3 2.240 1.507 6.286 0.978
Fold 4 2.083 1.417 6.534 0.981
Fold 5 1.816 1.219 5.908 0.984
Fold 6 2.023 1.429 6.826 0.981
Fold 7 2.077 1.434 7.196 0.979
Fold 8 2.030 1.485 6.535 0.978
Fold 9 2.114 1.398 5.792 0.983
Fold 10 1.821 1.240 5.885 0.983
Average 2.014 1.378 6.376 0.981

Results from the bagging models are shown in Table 17-Table 20.

Table 17. Bagging Model: SVR

Fold RMSE MAE MAPE R


Fold 1 3.288 2.152 10.111 0.944
Fold 2 2.728 1.751 8.420 0.958
Fold 3 3.514 2.269 9.124 0.943
Fold 4 3.136 2.067 8.997 0.954
Fold 5 2.738 1.846 8.785 0.962
Fold 6 2.972 1.995 9.508 0.957
Fold 7 3.161 2.166 10.858 0.951
Fold 8 2.959 2.077 9.115 0.953
Fold 9 3.428 2.204 8.681 0.953
Fold 10 2.753 1.843 8.460 0.961
Average 3.068 2.037 9.206 0.954

Table 18. Bagging Model: LR

Fold RMSE MAE MAPE R


Fold 1 3.091 2.249 10.722 0.952
Fold 2 2.781 1.952 9.924 0.957
Fold 3 3.143 2.206 9.407 0.953
Fold 4 3.055 2.140 10.029 0.955
Fold 5 2.638 1.875 9.440 0.964
Fold 6 2.976 2.179 10.348 0.958
Fold 7 3.027 2.142 11.139 0.954
Fold 8 3.018 2.222 9.899 0.951
Fold 9 2.983 1.986 8.114 0.964


Fold 10 2.814 1.991 9.907 0.960


Average 2.952 2.094 9.893 0.957

Table 19. Bagging Model: CART

Fold RMSE MAE MAPE R


Fold 1 0.741 0.466 2.144 0.997
Fold 2 0.431 0.304 1.518 0.999
Fold 3 0.569 0.402 1.895 0.998
Fold 4 0.581 0.399 2.097 0.998
Fold 5 0.453 0.327 1.519 0.999
Fold 6 0.676 0.458 2.502 0.998
Fold 7 0.671 0.443 2.224 0.998
Fold 8 0.567 0.405 1.748 0.998
Fold 9 0.488 0.366 1.584 0.999
Fold 10 0.390 0.279 1.391 0.999
Average 0.557 0.385 1.862 0.998

Table 20. Bagging Model: ANN

Fold RMSE MAE MAPE R


Fold 1 0.901 0.768 4.278 0.996
Fold 2 1.008 0.787 4.594 0.995
Fold 3 1.116 0.816 4.259 0.995
Fold 4 0.898 0.698 3.807 0.996
Fold 5 1.236 1.073 6.438 0.995
Fold 6 0.916 0.671 3.871 0.996
Fold 7 0.847 0.699 4.264 0.997
Fold 8 0.905 0.724 3.738 0.996
Fold 9 0.927 0.707 3.879 0.996
Fold 10 0.762 0.575 2.947 0.997
Average 0.952 0.752 4.207 0.996

Results from the stacking models are shown in Table 21-Table 24.

Table 21. Stacking Model: CART+SVR+LR

Fold RMSE MAE MAPE R


Fold 1 0.929 0.637 3.494 0.997
Fold 2 0.587 0.345 1.780 0.999
Fold 3 0.620 0.434 1.933 0.998
Fold 4 0.560 0.433 2.408 0.999


Fold 5 0.503 0.346 1.769 0.999


Fold 6 1.249 0.863 5.358 0.995
Fold 7 0.735 0.486 2.426 0.998
Fold 8 0.728 0.478 2.037 0.997
Fold 9 1.354 1.227 6.144 0.999
Fold 10 0.487 0.366 2.007 0.999
Average 0.775 0.562 2.936 0.998

Table 22. Stacking Model: ANN+SVR+LR

Fold RMSE MAE MAPE R


Fold 1 1.381 0.955 5.148 0.992
Fold 2 0.809 0.523 3.226 0.997
Fold 3 0.828 0.617 3.215 0.997
Fold 4 0.980 0.725 4.323 0.996
Fold 5 0.762 0.498 2.516 0.997
Fold 6 0.999 0.686 3.929 0.995
Fold 7 0.781 0.594 3.646 0.997
Fold 8 1.581 1.138 5.775 0.989
Fold 9 0.681 0.532 2.704 0.998
Fold 10 1.292 0.811 4.307 0.993
Average 1.009 0.708 3.879 0.995

Table 23. Stacking Model: ANN+CART+SVR

Fold RMSE MAE MAPE R


Fold 1 0.800 0.521 2.575 0.997
Fold 2 0.533 0.349 1.868 0.999
Fold 3 0.545 0.387 1.795 0.999
Fold 4 0.519 0.402 2.209 0.999
Fold 5 0.524 0.392 2.029 0.999
Fold 6 0.852 0.559 3.260 0.997
Fold 7 0.660 0.462 2.500 0.998
Fold 8 0.626 0.440 2.045 0.998
Fold 9 0.564 0.428 2.008 0.999
Fold 10 0.512 0.374 1.860 0.999
Average 0.613 0.431 2.215 0.998

Table 24. Stacking Model: ANN+CART+LR

Fold RMSE MAE MAPE R


Fold 1 1.046 0.847 4.633 0.996
Fold 2 0.748 0.552 3.218 0.997
Fold 3 0.834 0.653 3.239 0.997
Fold 4 0.758 0.600 3.349 0.998


Fold 5 0.737 0.545 2.954 0.997


Fold 6 0.923 0.672 3.742 0.996
Fold 7 0.736 0.557 3.043 0.998
Fold 8 0.835 0.668 3.524 0.997
Fold 9 0.687 0.528 2.684 0.998
Fold 10 0.727 0.535 2.666 0.998
Average 0.803 0.616 3.305 0.997

Results from the tiering models (Low) are shown in Table 25-Table 28.


Table 25. Tiering Model (Low): LR

Fold RMSE MAE MAPE R


Fold 1 2.561 1.984 9.423 0.967
Fold 2 3.203 2.108 9.046 0.969
Fold 3 3.431 2.606 11.196 0.945
Fold 4 3.172 2.225 10.166 0.945
Fold 5 3.339 2.404 12.209 0.954
Fold 6 1.948 1.533 9.183 0.984
Fold 7 2.550 1.829 9.814 0.964
Fold 8 2.712 1.782 7.024 0.971
Fold 9 3.112 2.248 10.624 0.946
Fold 10 3.815 2.749 12.844 0.921
Average 2.984 2.147 10.153 0.957

Table 26. Tiering Model (Low): CART

Fold RMSE MAE MAPE R


Fold 1 0.406 0.321 1.398 0.999
Fold 2 0.840 0.507 2.296 0.998
Fold 3 0.666 0.530 2.417 0.998
Fold 4 1.003 0.692 3.519 0.995
Fold 5 1.634 0.789 5.911 0.990
Fold 6 0.543 0.417 2.265 0.999
Fold 7 0.816 0.573 2.919 0.997
Fold 8 1.042 0.594 2.385 0.995
Fold 9 1.190 0.744 4.108 0.992
Fold 10 0.706 0.471 2.568 0.997
Average 0.885 0.564 2.979 0.996

Table 27. Tiering Model (Low): ANN

Fold RMSE MAE MAPE R


Fold 1 1.715 1.494 7.481 0.993
Fold 2 2.266 1.936 10.482 0.990


Fold 3 0.792 0.578 2.597 0.998


Fold 4 1.809 1.537 8.616 0.988
Fold 5 2.974 2.384 16.640 0.986
Fold 6 2.164 1.783 12.935 0.989
Fold 7 1.939 1.325 6.601 0.981
Fold 8 1.618 1.273 6.521 0.993
Fold 9 1.763 1.451 8.967 0.981
Fold 10 2.869 2.540 12.851 0.976
Average 1.991 1.630 9.369 0.987

Table 28. Tiering Model (Low): SVR

Fold RMSE MAE MAPE R


Fold 1 2.106 1.676 7.633 0.976
Fold 2 3.941 2.656 9.644 0.957
Fold 3 3.703 2.561 10.573 0.937
Fold 4 3.892 2.852 12.903 0.916
Fold 5 3.616 2.519 11.741 0.947
Fold 6 1.752 1.283 7.397 0.985
Fold 7 2.544 1.760 8.729 0.964
Fold 8 3.438 2.290 8.473 0.947
Fold 9 2.874 2.004 9.137 0.949
Fold 10 3.916 2.736 12.513 0.917
Average 3.178 2.234 9.874 0.949

Results from the tiering models (Medium) are shown in Table 29-Table 32.

Table 29. Tiering Model (Medium): LR

Fold RMSE MAE MAPE R


Fold 1 3.853 2.866 13.068 0.926
Fold 2 3.197 2.195 11.890 0.948
Fold 3 3.579 2.704 11.854 0.952
Fold 4 3.332 2.432 10.720 0.952
Fold 5 2.654 1.719 8.866 0.977
Fold 6 2.779 2.161 9.257 0.962
Fold 7 3.558 2.475 10.654 0.949
Fold 8 3.599 2.943 14.505 0.972
Fold 9 2.719 1.897 8.703 0.949
Fold 10 3.336 2.383 10.639 0.957
Average 3.260 2.378 11.016 0.954


Table 30. Tiering Model (Medium): CART

Fold RMSE MAE MAPE R


Fold 1 2.633 1.325 5.307 0.962
Fold 2 1.793 1.166 5.095 0.983
Fold 3 2.511 1.663 6.255 0.975
Fold 4 2.667 2.193 9.196 0.967
Fold 5 2.877 1.928 7.390 0.969
Fold 6 1.818 1.240 4.342 0.979
Fold 7 2.460 1.774 10.212 0.976
Fold 8 1.355 0.901 5.409 0.995
Fold 9 1.493 0.845 3.957 0.986
Fold 10 0.719 0.507 3.050 0.998
Average 2.033 1.354 6.021 0.979

Table 31. Tiering Model (Medium): ANN

Fold RMSE MAE MAPE R


Fold 1 2.454 1.475 8.047 0.976
Fold 2 1.933 1.144 5.745 0.987
Fold 3 3.192 2.894 15.802 0.979
Fold 4 3.299 2.633 12.060 0.964
Fold 5 2.251 1.848 12.777 0.994
Fold 6 2.087 1.864 8.978 0.987
Fold 7 3.222 2.646 16.258 0.972
Fold 8 1.627 1.228 5.462 0.991
Fold 9 1.563 1.253 6.613 0.990
Fold 10 1.798 1.574 8.986 0.987
Average 2.343 1.856 10.073 0.983

Table 32. Tiering Model (Medium): SVR

Fold RMSE MAE MAPE R


Fold 1 4.709 3.083 13.609 0.876
Fold 2 3.693 2.068 9.151 0.930
Fold 3 3.805 2.597 8.884 0.952
Fold 4 4.297 3.062 10.529 0.944
Fold 5 5.198 3.481 15.412 0.905
Fold 6 3.016 2.031 7.767 0.948
Fold 7 5.569 4.040 18.701 0.862
Fold 8 3.983 2.148 7.396 0.962
Fold 9 3.453 2.211 9.600 0.915
Fold 10 2.220 1.757 9.305 0.975
Average 3.994 2.648 11.035 0.927


Results from the tiering models (High) are shown in Table 33-Table 36.

Table 33. Tiering Model (High): LR

Fold RMSE MAE MAPE R


Fold 1 2.246 1.594 9.082 0.957
Fold 2 3.305 2.254 8.485 0.954
Fold 3 2.633 1.920 9.941 0.959
Fold 4 2.871 2.146 9.448 0.960
Fold 5 2.331 1.735 9.313 0.960
Fold 6 3.386 2.578 11.607 0.952
Fold 7 2.109 1.443 7.171 0.972
Fold 8 2.981 1.879 7.700 0.973
Fold 9 3.407 2.226 11.386 0.954
Fold 10 2.744 2.011 10.414 0.953
Average 2.801 1.979 9.455 0.960

Table 34. Tiering Model (High): CART

Fold RMSE MAE MAPE R


Fold 1 0.952 0.587 3.918 0.992
Fold 2 0.893 0.567 2.260 0.997
Fold 3 0.796 0.570 3.195 0.996
Fold 4 0.737 0.596 2.620 0.998
Fold 5 1.449 0.849 4.121 0.985
Fold 6 2.134 1.276 4.908 0.984
Fold 7 1.280 0.720 3.859 0.990
Fold 8 1.055 0.580 2.137 0.996
Fold 9 2.797 1.651 10.791 0.952
Fold 10 1.009 0.761 3.666 0.996
Average 1.310 0.816 4.148 0.989

Table 35. Tiering Model (High): ANN

Fold RMSE MAE MAPE R


Fold 1 1.794 1.465 10.784 0.984
Fold 2 2.402 1.705 6.852 0.975
Fold 3 1.386 1.090 6.100 0.986
Fold 4 2.781 2.377 11.418 0.979
Fold 5 1.888 1.426 6.892 0.977
Fold 6 2.175 1.720 10.360 0.985
Fold 7 1.661 1.463 8.625 0.985
Fold 8 1.736 1.379 8.837 0.993
Fold 9 1.998 1.354 7.792 0.976
Fold 10 2.362 1.954 10.403 0.972


Average 2.019 1.593 8.806 0.981

Table 36. Tiering Model (High): SVR

Fold RMSE MAE MAPE R


Fold 1 2.532 1.533 9.026 0.943
Fold 2 4.187 2.397 7.286 0.941
Fold 3 2.790 1.705 8.691 0.945
Fold 4 3.750 2.685 9.877 0.953
Fold 5 2.516 1.494 6.769 0.954
Fold 6 4.314 2.971 12.889 0.936
Fold 7 2.342 1.358 6.071 0.973
Fold 8 3.816 2.009 6.376 0.957
Fold 9 2.651 1.745 9.542 0.961
Fold 10 3.066 2.265 10.694 0.936
Average 3.197 2.016 8.722 0.950

All the results of the ensemble models are summarized in one table to make them easier to analyze and compare (Table 37).

Table 37. Result of All Ensemble Models + Baseline Models

Method            Combination Model      RMSE   MAE    MAPE    R
Bagging           CART                   0.557  0.385  1.862   0.998
Bagging           ANN                    0.952  0.752  4.207   0.996
Bagging           LR                     2.952  2.094  9.893   0.957
Bagging           SVR                    3.068  2.037  9.206   0.954
Baseline          CART                   0.569  0.396  1.955   0.998
Baseline          ANN                    1.127  0.888  5.136   0.995
Baseline          LR                     2.941  2.086  9.853   0.957
Baseline          SVR                    9.015  2.028  9.147   0.949
Stacking          ANN+CART+SVR           0.613  0.431  2.215   0.998
Stacking          CART+SVR+LR            0.775  0.562  2.936   0.998
Stacking          ANN+CART+LR            0.803  0.616  3.305   0.997
Stacking          ANN+SVR+LR             1.009  0.708  3.879   0.995
Voting            ANN+CART               0.702  0.548  3.026   0.998
Voting            ANN+CART+LR            1.199  0.896  4.457   0.993
Voting            ANN+CART+SVR           1.229  0.903  4.422   0.993
Voting            CART+SVR               1.611  1.084  4.910   0.988
Voting            CART+LR                1.553  1.099  5.176   0.988
Voting            ANN+CART+SVR+LR        1.609  1.149  5.507   0.988
Voting            ANN+SVR                1.749  1.281  6.288   0.986
Voting            ANN+LR                 1.701  1.283  6.409   0.986
Voting            CART+SVR+LR            2.014  1.378  6.376   0.981
Voting            ANN+SVR+LR             2.092  1.490  7.158   0.979
Voting            SVR+LR                 2.952  2.008  9.293   0.957
Tiering (Low)     ANN                    1.991  1.630  9.369   0.987
Tiering (Low)     SVR                    3.178  2.234  9.874   0.949
Tiering (Low)     LR                     2.984  2.147  10.153  0.957
Tiering (Low)     CART                   0.885  0.564  2.979   0.996
Tiering (Medium)  ANN                    2.343  1.856  10.073  0.983
Tiering (Medium)  SVR                    3.994  2.648  11.035  0.927
Tiering (Medium)  LR                     3.260  2.378  11.016  0.954
Tiering (Medium)  CART                   2.033  1.354  6.021   0.979
Tiering (High)    ANN                    2.019  1.593  8.806   0.981
Tiering (High)    SVR                    3.197  2.016  8.722   0.950
Tiering (High)    LR                     2.801  1.979  9.455   0.960
Tiering (High)    CART                   1.310  0.816  4.148   0.989

If you run the same steps in WEKA, you may find that the performance measure values are slightly different. This is because we calculate the statistical performance measures manually with the equations shown in the previous section (not WEKA’s own output). We also include the baseline model results from the previous assignment for additional comparison. Since 4 statistical performance measures are used in this assignment, we need to rank the models based on the performance of each measure. For R, a higher value is better; for the other measures, a lower value is better. This assignment assumes the priority order R > RMSE > MAE > MAPE, meaning R has the highest priority and MAPE the lowest. All methods are ranked using a multiple-level sort in Excel, shown in Figure 32; an equivalent sort is sketched after the figure. The result of the ranking is shown in Table 38.

Figure 32. Ranking Method
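The same multi-level sort can be reproduced outside Excel. The sketch below assumes the summary of Table 37 has been saved as results.csv with columns Method, Combination, RMSE, MAE, MAPE and R (an assumed layout).

```python
# Rank all models: R descending first, then RMSE, MAE and MAPE ascending,
# mirroring the sort order in Figure 32.
import pandas as pd

results = pd.read_csv("results.csv")                   # assumed file name and columns
ranked = results.sort_values(by=["R", "RMSE", "MAE", "MAPE"],
                             ascending=[False, True, True, True]).reset_index(drop=True)
ranked["Rank"] = ranked.index + 1
print(ranked.head())
```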


Table 38. Ranked Result of Each Ensemble Model

Method Combination Model RMSE MAE MAPE R Rank


Bagging CART 0.557 0.385 1.862 0.998 1
Baseline CART 0.569 0.396 1.955 0.998 2
Stacking ANN+CART+SVR 0.613 0.431 2.215 0.998 3
Stacking CART+SVR+LR 0.775 0.562 2.936 0.998 4
Voting ANN+CART 0.702 0.548 3.026 0.998 5
Stacking ANN+CART+LR 0.803 0.616 3.305 0.997 6
Tiering (Low) CART 0.885 0.564 2.979 0.996 7
Bagging ANN 0.952 0.752 4.207 0.996 8
Stacking ANN+SVR+LR 1.009 0.708 3.879 0.995 9
Baseline ANN 1.127 0.888 5.136 0.995 10
Voting ANN+CART+LR 1.199 0.896 4.457 0.993 11
Voting ANN+CART+SVR 1.229 0.903 4.422 0.993 12
Tiering (High) CART 1.310 0.816 4.148 0.989 13
Voting CART+LR 1.553 1.099 5.176 0.988 14
Voting CART+SVR 1.611 1.084 4.910 0.988 15
Voting ANN+CART+SVR+LR 1.609 1.149 5.507 0.988 16
Tiering (Low) ANN 1.991 1.630 9.369 0.987 17
Voting ANN+LR 1.701 1.283 6.409 0.986 18
Voting ANN+SVR 1.749 1.281 6.288 0.986 19
Tiering (Medium) ANN 2.343 1.856 10.073 0.983 20
Tiering (High) ANN 2.019 1.593 8.806 0.981 21
Voting CART+SVR+LR 2.014 1.378 6.376 0.981 22
Voting ANN+SVR+LR 2.092 1.490 7.158 0.979 23
Tiering (Medium) CART 2.033 1.354 6.021 0.979 24
Tiering (High) LR 2.801 1.979 9.455 0.960 25
Baseline LR 2.941 2.086 9.853 0.957 26
Bagging LR 2.952 2.094 9.893 0.957 27
Voting SVR+LR 2.952 2.008 9.293 0.957 28
Tiering (Low) LR 2.984 2.147 10.153 0.957 29
Tiering (Medium) LR 3.260 2.378 11.016 0.954 30
Bagging SVR 3.068 2.037 9.206 0.954 31
Tiering (High) SVR 3.197 2.016 8.722 0.950 32
Tiering (Low) SVR 3.178 2.234 9.874 0.949 33
Baseline SVR 9.015 2.028 9.147 0.949 34
Tiering (Medium) SVR 3.994 2.648 11.035 0.927 35

From Table 38 we can see the top 5 models with the highest R value: Bagging-CART, Baseline-CART, Stacking-ANN+CART+SVR, Voting-ANN+CART and Stacking-CART+SVR+LR. All of them have an R value of 0.998, so the difference lies in the other performance measures (RMSE, MAE and MAPE). For ranks 1-3, the other measures agree with the ordering by R, so there is no problem sorting the first top 3. Ranks 4-5 are decided based on the priority assumed at the outset (R > RMSE > MAE > MAPE); in the end, Stacking-CART+SVR+LR is placed at rank 4 and Voting-ANN+CART at rank 5. The radar plot of the top 5 models is shown in Figure 33.

[Radar chart: “Top 5 Result from All Models”, axes RMSE, MAE, MAPE and R (radial scale 0.75-1); series: CART (Bagging), CART (Baseline), ANN+CART+SVR (Stacking), ANN+CART (Voting), CART+SVR+LR (Stacking)]
Figure 33. Radar Plot of Top 5 Models
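For readers who want to recreate a plot like Figure 33, the sketch below draws a radar chart with matplotlib from the top-5 values in Table 38. The normalization (each error measure scaled so that 1.0 is the best of the five models, R scaled likewise) is an assumption; the exact scaling used for the original figure is not documented.

```python
# Radar chart of the top 5 models over RMSE, MAE, MAPE and R.
import numpy as np
import matplotlib.pyplot as plt

top5 = {  # RMSE, MAE, MAPE, R taken from Table 38
    "CART (Bagging)":          [0.557, 0.385, 1.862, 0.998],
    "CART (Baseline)":         [0.569, 0.396, 1.955, 0.998],
    "ANN+CART+SVR (Stacking)": [0.613, 0.431, 2.215, 0.998],
    "CART+SVR+LR (Stacking)":  [0.775, 0.562, 2.936, 0.998],
    "ANN+CART (Voting)":       [0.702, 0.548, 3.026, 0.998],
}
labels = ["RMSE", "MAE", "MAPE", "R"]
vals = np.array(list(top5.values()))
# Lower is better for the first three measures, higher is better for R.
norm = np.column_stack([vals[:, :3].min(axis=0) / vals[:, :3],
                        vals[:, 3] / vals[:, 3].max()])

angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
angles += angles[:1]                                   # close the polygon

fig, ax = plt.subplots(subplot_kw=dict(polar=True))
for name, row in zip(top5, norm):
    ax.plot(angles, list(row) + [row[0]], label=name)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
ax.legend(loc="lower right", fontsize=8)
plt.show()
```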


Looking at the composition of the top 5 results, CART appears in every one of them. So it can be concluded from all the baseline and ensemble results that CART is the most suitable model for the dataset used in this assignment. Note that including the tiering results in this comparison involves an assumption: in the tiering method the original dataset is split into 3 parts (tiers), so each tiering model is evaluated on fewer data points than the other ensemble models (Voting, Stacking, Bagging), which are run on the full original dataset.


6. Conclusion
From the demonstration of all baseline models plus ensemble models in WEKA,
the writer concludes that CART method (Classification and Regression Tree) is the
best method for the dataset. In can be seen clearly from the top 5 results of all
models that CART is always shown in the contributing model. As further analysis,
the writer concludes that for Assignment 4 and Assignment 5, the best model is the
Bagging Model used for CART.

The results of all baseline and ensemble models in this assignment were obtained under the assumption of WEKA’s default settings. There is still a possibility that another ensemble model or different parameter settings could outperform the best model identified here. For that reason, the writer suggests that future trials consider more comparisons to give better insight into this data science practice.


APPENDICES

(Since this ensemble model trial runs the programs many times over many folds, it is not feasible to print all of the running results from every method used in the assignment.)

