Workshop 5 Solved

Workshop 5
The following table consists of training data from an employee database. The
data have been generalized. For example, "31-35" for age represents the age range of 31 to 35.
For a given row entry, count represents the number of data tuples having the values for
department, status, age, and salary given in that row.
department  status  age    salary   count
sales       senior  31-35  46K-50K  30
sales       junior  26-30  26K-30K  40
sales       junior  31-35  31K-35K  40
systems     junior  21-25  46K-50K  20
systems     senior  31-35  66K-70K  5
systems     junior  26-30  46K-50K  3
systems     senior  41-45  66K-70K  3
marketing   senior  36-40  46K-50K  10
marketing   junior  31-35  41K-45K  4
secretary   senior  46-50  36K-40K  4
secretary   junior  26-30  26K-30K  6
Let status be the class label attribute.
1) Using Weka,
a. Build the tree using Id3, J48 and random forest. Compare the results.
b. Bayes net
c. Multilayer perceptron
d. LibSVM, try 4 different kernel types. Compare the results.
e.
2) Another method for solving Bayesian networks is belief propagation, also known
as sum-product message passing. Briefly describe what it consists of.
3) Do exercise 9.1 from the book.
Solution
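Before running any classifier, the generalized table has to be turned into instances that Weka can load. The source does not show how this was done, so the following is only a sketch under assumptions: it repeats each row `count` times and writes the result as ARFF text. The relation name `employees` and the helper names `expand` and `to_arff` are hypothetical.

```python
# Rows of the generalized table: (department, status, age, salary, count).
ROWS = [
    ("sales", "senior", "31-35", "46K-50K", 30),
    ("sales", "junior", "26-30", "26K-30K", 40),
    ("sales", "junior", "31-35", "31K-35K", 40),
    ("systems", "junior", "21-25", "46K-50K", 20),
    ("systems", "senior", "31-35", "66K-70K", 5),
    ("systems", "junior", "26-30", "46K-50K", 3),
    ("systems", "senior", "41-45", "66K-70K", 3),
    ("marketing", "senior", "36-40", "46K-50K", 10),
    ("marketing", "junior", "31-35", "41K-45K", 4),
    ("secretary", "senior", "46-50", "36K-40K", 4),
    ("secretary", "junior", "26-30", "26K-30K", 6),
]
ATTRIBUTES = ["department", "status", "age", "salary"]


def expand(rows):
    """Repeat each generalized row `count` times, dropping the count column."""
    return [row[:4] for row in rows for _ in range(row[4])]


def to_arff(rows, relation="employees"):
    """Build ARFF text with one nominal attribute per column (assumed layout)."""
    tuples = expand(rows)
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(ATTRIBUTES):
        values = []
        for t in tuples:
            if t[i] not in values:  # keep first-seen order of nominal values
                values.append(t[i])
        lines.append("@attribute %s {%s}" % (name, ",".join(values)))
    lines += ["", "@data"]
    lines += [",".join(t) for t in tuples]
    return "\n".join(lines)


arff_text = to_arff(ROWS)
```

The expanded data set has 165 instances, 52 senior and 113 junior, which matches the totals in the confusion matrices reported in the outputs that follow.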
Id3
=== Classifier model (full training set) ===
Id3
salary = 46K-50k: senior

salary = 26K-30K: junior
salary = 31K-35K: junior
salary = 46K-50K
| department = sales: null
| department = systems: junior
| department = marketing: senior
| department = secretary: null
salary = 66K-70K: senior
salary = 41K-45k: junior
salary = 36K-40K: senior
Time taken to build model: 0.03 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      165    100 %
Incorrectly Classified Instances      0      0 %
Total Number of Instances           165
=== Confusion Matrix ===
  a   b   <-- classified as
 52   0 | a = senior
  0 113 | b = junior
J48
J48 pruned tree
------------------
salary = 46K-50k: senior (30.0)

salary = 26K-30K: junior (46.0)
salary = 31K-35K: junior (40.0)
salary = 46K-50K
| department = sales: junior (0.0)
| department = systems: junior (23.0)
| department = marketing: senior (10.0)
| department = secretary: junior (0.0)
salary = 66K-70K: senior (8.0)
salary = 41K-45k: junior (4.0)
salary = 36K-40K: senior (4.0)
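Number of Leaves : 10
Size of the tree : 12
Time taken to build model: 0.02 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      165    100 %
Incorrectly Classified Instances      0      0 %
Total Number of Instances           165
=== Confusion Matrix ===
  a   b   <-- classified as
 52   0 | a = senior
  0 113 | b = junior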
random forest
Random forest of 10 trees, each constructed while considering 3 random features.

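Out of bag error: 0
Time taken to build model: 0.04 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      165    100 %
Incorrectly Classified Instances      0      0 %
Mean absolute error                   0.0029
Root mean squared error               0.0199
Relative absolute error               0.6805 %
Root relative squared error           4.2893 %
Total Number of Instances           165
=== Confusion Matrix ===
  a   b   <-- classified as
 52   0 | a = senior
  0 113 | b = junior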
Id3 and J48 both build their trees from the counts of the tuples shown above, and they
arrive at the same splits: salary at the root, then department.
J48 also reports the number of leaves and the size of the tree, which Id3 does not.
Both methods show zero classification error.
Random forest also classifies all 165 instances correctly, but it reports small non-zero
error measures (0.0029 and 0.0199, or 0.6805 % and 4.2893 % in relative terms), so by those
measures J48 comes out slightly ahead of RandomForest.
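Both trees split on salary first. One way to see why is to compute the information gain of each attribute, the criterion Id3 uses to choose a split. The sketch below does this on the table as printed in the problem statement. Note that the Weka output lists both `46K-50k` and `46K-50K` as salary values, which suggests the data file used by the author spelled one row differently. That is an inference from the output, not something the source states, and the sketch does not reproduce it. The helper names are hypothetical.

```python
from collections import defaultdict
from math import log2

# Rows of the generalized table: (department, status, age, salary, count).
ROWS = [
    ("sales", "senior", "31-35", "46K-50K", 30),
    ("sales", "junior", "26-30", "26K-30K", 40),
    ("sales", "junior", "31-35", "31K-35K", 40),
    ("systems", "junior", "21-25", "46K-50K", 20),
    ("systems", "senior", "31-35", "66K-70K", 5),
    ("systems", "junior", "26-30", "46K-50K", 3),
    ("systems", "senior", "41-45", "66K-70K", 3),
    ("marketing", "senior", "36-40", "46K-50K", 10),
    ("marketing", "junior", "31-35", "41K-45K", 4),
    ("secretary", "senior", "46-50", "36K-40K", 4),
    ("secretary", "junior", "26-30", "26K-30K", 6),
]
ATTR_INDEX = {"department": 0, "age": 2, "salary": 3}  # status (index 1) is the class


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def class_counts(rows):
    """Weighted number of tuples per status value."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[1]] += row[4]
    return list(counts.values())


def info_gain(rows, attr):
    """Entropy of the class minus the weighted entropy after splitting on attr."""
    idx = ATTR_INDEX[attr]
    total = sum(r[4] for r in rows)
    groups = defaultdict(list)
    for r in rows:
        groups[r[idx]].append(r)
    remainder = sum(
        sum(r[4] for r in g) / total * entropy(class_counts(g))
        for g in groups.values()
    )
    return entropy(class_counts(rows)) - remainder


gains = {attr: info_gain(ROWS, attr) for attr in ATTR_INDEX}
best = max(gains, key=gains.get)
```

On this table, salary has the largest gain (roughly 0.54 bits, against about 0.42 for age and 0.05 for department), which is consistent with salary appearing at the root of both trees.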
b. Bayes Net
Bayes Network Classifier

not using ADTree
#attributes=4 #classindex=1
Network structure (nodes followed by parents)
department(4): status
status(2):
age(6): status
salary(7): status
LogScore Bayes: -671.2178322167754
LogScore BDeu: -734.0007744477259

LogScore MDL: -741.4529682985715
LogScore ENTROPY: -667.416758927013
LogScore AIC: -696.416758927013

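Time taken to build model: 0.01 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      161     97.5758 %
Incorrectly Classified Instances      4      2.4242 %
Kappa statistic                       0.945
Mean absolute error                   0.0273
Root mean squared error               0.0912
Relative absolute error               6.3016 %
Root relative squared error          19.6156 %
Total Number of Instances           165
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  Class
               1        0.035    0.929      1       0.963      senior
               0.965    0        1          0.965   0.982      junior
Weighted Avg.  0.976    0.011    0.977      0.976   0.976
=== Confusion Matrix ===
  a   b   <-- classified as
 52   0 | a = senior
  4 109 | b = junior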
This classifier reports the log scores of the learned network under several scoring
metrics: Bayes, BDeu, MDL, entropy and AIC. It also shows how many instances were
classified correctly (161) and incorrectly (4).
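The summary figures Weka prints can be recomputed from the confusion matrix alone. As a minimal sketch, the function below derives accuracy, Cohen's kappa, and per-class precision and recall, and is applied to the Bayes net matrix. The name `metrics` and its return layout are illustrative choices, not part of Weka.

```python
def metrics(cm):
    """cm[i][j] = number of instances of actual class i predicted as class j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]                       # actual totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted totals
    accuracy = sum(cm[i][i] for i in range(k)) / n
    # Agreement expected by chance, used by Cohen's kappa.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    precision = [cm[i][i] / col_tot[i] if col_tot[i] else 0.0 for i in range(k)]
    recall = [cm[i][i] / row_tot[i] if row_tot[i] else 0.0 for i in range(k)]
    return accuracy, kappa, precision, recall


# Bayes net confusion matrix: rows are actual senior/junior, columns predicted.
BAYES_NET = [[52, 0], [4, 109]]
accuracy, kappa, precision, recall = metrics(BAYES_NET)
```

For this matrix the accuracy is 161/165 = 97.5758 % and kappa rounds to 0.945, in agreement with the reported summary.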
c. Multilayer perceptron
Sigmoid Node 0
Inputs Weights
Threshold 2.7394370243645954
Node 2 -1.8508751465691542
Node 3 -2.354821458534509
Node 4 -1.4870386211949875
Node 5 -1.3229943341191277
Node 6 1.5278889125307347
Node 7 -2.2131181020972868
Node 8 -0.5914245531409845
Node 9 -2.0587896015776685
Node 10 2.270779949723552
Sigmoid Node 1
Inputs Weights
Threshold -2.7825884536066616
Node 2 1.85191579937515
Node 3 2.323626175638421
Node 4 1.477966841443966
Node 5 1.304880433881045
Node 6 -1.4594876918432313
Node 7 2.232715726449767
Node 8 0.6801335831028799
Node 9 2.051171802934638
Node 10 -2.297941872862322
Sigmoid Node 2
Inputs Weights
Threshold -0.07755042992069021
Attrib department=sales 0.22690884175757925
Attrib department=systems 0.1545650553116968
Attrib department=marketing -0.028916340821306043
Attrib department=secretary -0.2113862149709065
Attrib age=31-35 -0.02750617633512445
Attrib age=26-30 0.7708789960391705
Attrib age=21-25 0.9031969460307472
Attrib age=41-45 -0.24524874175681483
Attrib age=36-40 -0.9372910085639369
Attrib age=46-50 -0.3173972567786108
Attrib salary=46K-50k -1.2721688131980107

Attrib salary=26K-30K 0.596029591410492
Attrib salary=31K-35K 1.073614573206162
Attrib salary=46K-50K 0.10622100546611281
Attrib salary=66K-70K -0.8931249639552377
Attrib salary=41K-45k 0.9840722037777665
Attrib salary=36K-40K -0.32173827499151414
Sigmoid Node 3
Inputs Weights
Threshold -0.09160443154315269
Attrib age=31-35 0.01617542844596184
Attrib age=26-30 0.9011003972162315
Attrib age=21-25 1.0622717118790497
Attrib age=41-45 -0.2752012547415663
Attrib age=36-40 -1.084971185421792
Attrib age=46-50 -0.41474256360460926
Attrib salary=46K-50k -1.448381199332202
Attrib salary=26K-30K 0.6690483312961746
Attrib salary=31K-35K 1.2615937981296461
Attrib salary=46K-50K 0.11767253931489512
Attrib salary=66K-70K -1.0619222218399373
Attrib salary=41K-45k 1.1155548070459484

Attrib salary=36K-40K -0.4197992341286937
Sigmoid Node 4
Inputs Weights
Threshold -0.020643481201846912
Attrib age=31-35 0.010512235705337582
Attrib age=26-30 0.7498866595807208
Attrib age=21-25 0.7873805984482601
Attrib age=41-45 -0.1464959547224344
Attrib age=36-40 -0.7580525890150757
Attrib age=46-50 -0.2556075853274041
Attrib salary=46K-50k -1.0701392832809675
Attrib salary=26K-30K 0.5473953000497558
Attrib salary=31K-35K 0.992235520740865
Attrib salary=46K-50K 0.13349935484850442
Attrib salary=66K-70K -0.7751334030115394
Attrib salary=41K-45k 0.7376105027424174
Attrib salary=36K-40K -0.31165331988873113
Sigmoid Node 5
Inputs Weights
Threshold -0.10504649400363189

Attrib age=31-35 -0.017009595286887676
Attrib age=26-30 0.6482292510400863
Attrib age=21-25 0.6915153763340572
Attrib age=41-45 -0.19586354097092593
Attrib age=36-40 -0.7359220880188584
Attrib age=46-50 -0.29334007131163364
Attrib salary=46K-50k -1.0108613445175572
Attrib salary=26K-30K 0.5262317715574415
Attrib salary=31K-35K 0.9182024924100928
Attrib salary=46K-50K 0.1188196030377099
Attrib salary=66K-70K -0.670937831354511
Attrib salary=41K-45k 0.7010453789252467
Attrib salary=36K-40K -0.22576085497477247
Sigmoid Node 6
Inputs Weights
Threshold 0.09557922327970046
Attrib department=sales -0.018139053459302966
Attrib department=systems -0.10209290440196074
Attrib department=secretary 0.028572139713019157
Attrib age=31-35 -0.0739998323577446
Attrib age=26-30 -0.3457681026476964

Attrib age=21-25 -0.46939913970448427
Attrib age=41-45 -0.042763003576478054
Attrib age=36-40 0.39852936666748573
Attrib age=46-50 0.06929754238286273
Attrib salary=46K-50k 0.3741253602958736
Attrib salary=26K-30K -0.24297175284159717
Attrib salary=31K-35K -0.4818491178517223
Attrib salary=46K-50K -0.10589748590134765
Attrib salary=66K-70K 0.3434715967606044
Attrib salary=41K-45k -0.5752678422949488
Attrib salary=36K-40K 0.045120149940549435
Sigmoid Node 7
Inputs Weights
Threshold -0.06160115651344392
Attrib age=31-35 0.029991616105111238
Attrib age=26-30 0.8651532630509001
Attrib age=21-25 0.9906708595270766
Attrib age=41-45 -0.20575332717275158
Attrib age=36-40 -1.047645476787963
Attrib age=46-50 -0.3535834941223248
Attrib salary=46K-50k -1.4332784615759104

Attrib salary=26K-30K 0.6493573607930859
Attrib salary=31K-35K 1.1819371137499333
Attrib salary=46K-50K 0.12596930399229297
Attrib salary=66K-70K -1.064741079546573
Attrib salary=41K-45k 1.1431299292474153
Attrib salary=36K-40K -0.3944593594329362
Sigmoid Node 8
Inputs Weights
Threshold -0.0472094583119319
Attrib age=31-35 -0.08736108359059333
Attrib age=26-30 0.4530072183024529
Attrib age=21-25 0.5003986272198906
Attrib age=41-45 -0.03799085042854391
Attrib age=36-40 -0.3920240148745457
Attrib age=46-50 -0.1205586069197979
Attrib salary=46K-50k -0.605742149160759
Attrib salary=26K-30K 0.3802185575469776
Attrib salary=31K-35K 0.6066272566318488
Attrib salary=46K-50K 0.10660422983302109
Attrib salary=66K-70K -0.3320427568391321
Attrib salary=41K-45k 0.42400945288403125

Attrib salary=36K-40K -0.10285393387441728
Sigmoid Node 9
Inputs Weights
Threshold -0.08424420101244516
Attrib age=31-35 0.02784723014273439
Attrib age=26-30 0.8489877575101271
Attrib age=21-25 0.9380267979491645
Attrib age=41-45 -0.2608835537805671
Attrib age=36-40 -1.0141091744602124
Attrib age=46-50 -0.39718995571738797
Attrib salary=46K-50k -1.3435954859187489
Attrib salary=26K-30K 0.6385529218748827
Attrib salary=31K-35K 1.186853891625651
Attrib salary=46K-50K 0.14603610059077612
Attrib salary=66K-70K -0.9588501691188384
Attrib salary=41K-45k 1.0099505342882553
Attrib salary=36K-40K -0.362947971136462
Sigmoid Node 10
Inputs Weights
Threshold 0.09908187740254618
Attrib department=sales -0.05342457992841399

Attrib department=systems -0.1192290423182317
Attrib department=secretary 0.14338709613151754
Attrib age=31-35 -0.09957529167902422
Attrib age=26-30 -0.5969152237408825
Attrib age=21-25 -0.7313360476645168
Attrib age=41-45 0.11828014170484517
Attrib age=36-40 0.7424640467459025
Attrib age=46-50 0.2309416143733404
Attrib salary=46K-50k 0.854775905742494
Attrib salary=26K-30K -0.4117530818438961
Attrib salary=31K-35K -0.7689602300444687
Attrib salary=46K-50K -0.09902009869134437
Attrib salary=66K-70K 0.7183111474183733
Attrib salary=41K-45k -0.8460358832180466
Attrib salary=36K-40K 0.22686248383226118
Class senior
Input
Node 0
Class junior
Input
Node 1
With this algorithm we can inspect every node of the network together with its inputs
(attribute values or other nodes) and the weight assigned to each input.
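To make the listing easier to read, here is a minimal sketch of how one sigmoid node turns its inputs into an output. It assumes the listed Threshold acts as a bias term added to the weighted sum of the inputs; that is my reading of the output format, not something the source states. The weights are those listed for Sigmoid Node 0 (class senior), while the hidden activations are hypothetical placeholders.

```python
from math import exp


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def sigmoid_unit(threshold, weights, inputs):
    """Output of one node: sigmoid(threshold + sum of weight * input)."""
    return sigmoid(threshold + sum(w * x for w, x in zip(weights, inputs)))


# Weights of Sigmoid Node 0 as listed above, for inputs Node 2 .. Node 10.
NODE0_THRESHOLD = 2.7394370243645954
NODE0_WEIGHTS = [
    -1.8508751465691542, -2.354821458534509, -1.4870386211949875,
    -1.3229943341191277, 1.5278889125307347, -2.2131181020972868,
    -0.5914245531409845, -2.0587896015776685, 2.270779949723552,
]

# Hypothetical activations of hidden nodes 2..10, for illustration only.
hidden = [0.5] * 9
senior_score = sigmoid_unit(NODE0_THRESHOLD, NODE0_WEIGHTS, hidden)
```

Nodes 0 and 1 carry nearly opposite weights, so whatever pushes the senior output up pushes the junior output down.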
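d. LibSVM, kernel type: Linear

LibSVM wrapper, original code by Yasser EL-Manzalawy (= WLSVM)

=== Summary ===
Correctly Classified Instances        125     75.7576 %
Incorrectly Classified Instances       40     24.2424 %
Kappa statistic                         0.3429
Mean absolute error                     0.2424
Root mean squared error                 0.4924
Relative absolute error                56.0304 %
Root relative squared error           105.9548 %
Coverage of cases (0.95 level)         75.7576 %
Mean rel. region size (0.95 level)     50 %
Total Number of Instances             165
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
               0,346    0,053    0,750      0,346   0,474      0,386  0,647     0,466     senior
               0,947    0,654    0,759      0,947   0,843      0,386  0,647     0,755     junior
Weighted Avg.  0,758    0,465    0,756      0,758   0,726      0,386  0,647     0,664
=== Confusion Matrix ===
  a   b   <-- classified as
 18  34 | a = senior
  6 107 | b = junior

LibSVM, kernel type: Polynomial

=== Summary ===
Correctly Classified Instances        132     80 %
Incorrectly Classified Instances       33     20 %
Kappa statistic                         0.4409
Mean absolute error                     0.2
Root mean squared error                 0.4472
Relative absolute error                46.2251 %
Root relative squared error            96.2382 %
Coverage of cases (0.95 level)         80 %
Mean rel. region size (0.95 level)     50 %
Total Number of Instances             165
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
               0,365    0,000    1,000      0,365   0,535      0,532  0,683     0,565     senior
               1,000    0,635    0,774      1,000   0,873      0,532  0,683     0,774     junior
Weighted Avg.  0,800    0,435    0,845      0,800   0,766      0,532  0,683     0,708
=== Confusion Matrix ===
  a   b   <-- classified as
 19  33 | a = senior
  0 113 | b = junior

LibSVM, kernel type: Radial basis function

=== Summary ===
Correctly Classified Instances        165    100 %
Incorrectly Classified Instances        0      0 %
Mean absolute error                     0
Root mean squared error                 0
Coverage of cases (0.95 level)        100 %
Mean rel. region size (0.95 level)     50 %
Total Number of Instances             165
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
               1,000    0,000    1,000      1,000   1,000      1,000  1,000     1,000     senior
               1,000    0,000    1,000      1,000   1,000      1,000  1,000     1,000     junior
Weighted Avg.  1,000    0,000    1,000      1,000   1,000      1,000  1,000     1,000
=== Confusion Matrix ===
  a   b   <-- classified as
 52   0 | a = senior
  0 113 | b = junior

LibSVM, kernel type: Sigmoid

=== Summary ===
Correctly Classified Instances        143     86.6667 %
Incorrectly Classified Instances       22     13.3333 %
Kappa statistic                         0.6513
Mean absolute error                     0.1333
Root mean squared error                 0.3651
Relative absolute error                30.8167 %
Root relative squared error            78.5782 %
Coverage of cases (0.95 level)         86.6667 %
Mean rel. region size (0.95 level)     50 %
Total Number of Instances             165
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
               0,577    0,000    1,000      0,577   0,732      0,695  0,788     0,710     senior
               1,000    0,423    0,837      1,000   0,911      0,695  0,788     0,837     junior
Weighted Avg.  0,867    0,290    0,888      0,867   0,855      0,695  0,788     0,797
=== Confusion Matrix ===
  a   b   <-- classified as
 30  22 | a = senior
  0 113 | b = junior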
For each kernel, this classifier reports the number of correctly and incorrectly classified
instances with their percentages, together with the mean absolute error, the root mean
squared error, their relative counterparts, and the coverage of cases.
It also prints a table of per-class accuracy details: TP rate, FP rate, precision, recall
and F-measure, which range between 0 and 1, and MCC, which ranges between -1 and 1.
Each kernel type classifies a different number of instances correctly, so every derived
figure and percentage changes with it. Accuracy goes from 75.7576 % (linear) to 80 %
(polynomial), 86.6667 % (sigmoid) and 100 % (radial basis function).
With the radial basis function kernel, no instance was misclassified, so the error
measures and the root mean squared error are 0, and the TP rate is 1, the FP rate 0,
precision 1, recall 1, F-measure 1 and MCC 1.
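The four kernel types compared above differ only in how they measure similarity between two instances. The sketch below writes them out using the formulas given in the LibSVM documentation. The parameter defaults (`gamma=1.0`, `coef0=0.0`, `degree=3`) and the two example vectors are illustrative assumptions, not the settings used for the runs reported here.

```python
from math import exp, tanh


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear(u, v):
    return dot(u, v)


def polynomial(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf(u, v, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return tanh(gamma * dot(u, v) + coef0)


# Two hypothetical one-hot style instances that share one active feature.
u = [1, 0, 0, 1]
v = [1, 0, 1, 0]
values = {
    "linear": linear(u, v),
    "polynomial": polynomial(u, v),
    "rbf": rbf(u, v),
    "sigmoid": sigmoid_kernel(u, v),
}
```

Only the radial basis function kernel depends on the distance between instances rather than on their inner product, which may be part of why it separates this small categorical data set so cleanly, although the source does not analyze this.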
2.
Belief propagation
It is an algorithm for performing inference in graphical models such as Bayesian networks and
Markov random fields. It computes the marginal distribution of each node by having neighboring
nodes exchange messages that summarize the rest of the graph. It is used in artificial
intelligence and information theory. The result is exact on tree-structured graphs, and the
method has also proved useful as an approximate algorithm on general graphs.
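As a minimal sketch of the message-passing idea, the code below runs sum-product on a three-node chain A - B - C with binary variables and checks the marginal of B against brute-force enumeration. All potentials are hypothetical numbers chosen for illustration; they do not come from the employee data.

```python
from itertools import product

# Hypothetical unary potentials for three binary variables.
PHI = {"A": [0.6, 0.4], "B": [0.5, 0.5], "C": [0.2, 0.8]}
# Hypothetical pairwise potentials: PSI_AB[a][b] and PSI_BC[b][c].
PSI_AB = [[0.9, 0.1], [0.3, 0.7]]
PSI_BC = [[0.8, 0.2], [0.4, 0.6]]


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def marginal_b_sum_product():
    """Marginal of B from the two messages that reach it."""
    # Message A -> B: sum A out of phi_A(a) * psi_AB(a, b).
    msg_a_to_b = [sum(PHI["A"][a] * PSI_AB[a][b] for a in range(2)) for b in range(2)]
    # Message C -> B: sum C out of phi_C(c) * psi_BC(b, c).
    msg_c_to_b = [sum(PHI["C"][c] * PSI_BC[b][c] for c in range(2)) for b in range(2)]
    # Belief at B: local potential times all incoming messages.
    return normalize([PHI["B"][b] * msg_a_to_b[b] * msg_c_to_b[b] for b in range(2)])


def marginal_b_brute_force():
    """Same marginal by summing the full joint over every assignment."""
    totals = [0.0, 0.0]
    for a, b, c in product(range(2), repeat=3):
        totals[b] += (PHI["A"][a] * PHI["B"][b] * PHI["C"][c]
                      * PSI_AB[a][b] * PSI_BC[b][c])
    return normalize(totals)


bp_marginal = marginal_b_sum_product()
exact_marginal = marginal_b_brute_force()
```

The two results agree because a chain is a tree. The saving is that each message sums over one variable at a time instead of over the whole joint.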
