
Workshop 5

The following table consists of training data from an employee database. The
data have been generalized. For example, "31-35" for age represents the age range of 31 to 35.
For a given row entry, count represents the number of data tuples having the values for
department, status, age, and salary given in that row.
department   status   age     salary    count
sales        senior   31-35   46K-50K   30
sales        junior   26-30   26K-30K   40
sales        junior   31-35   31K-35K   40
systems      junior   21-25   46K-50K   20
systems      senior   31-35   66K-70K   5
systems      junior   26-30   46K-50K   3
systems      senior   41-45   66K-70K   3
marketing    senior   36-40   46K-50K   10
marketing    junior   31-35   41K-45K   4
secretary    senior   46-50   36K-40K   4
secretary    junior   26-30   26K-30K   6

Let status be the class label attribute.
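Because each table row stands for count identical tuples, the data must be expanded (or weighted) before a tool such as Weka can use it. The following is a minimal sketch of that step, not the procedure actually used for this report: the helper names expand, class_totals and to_arff and the relation name "employees" are assumptions introduced for illustration.

```python
# Generalized training table: (department, status, age, salary, count).
ROWS = [
    ("sales", "senior", "31-35", "46K-50K", 30),
    ("sales", "junior", "26-30", "26K-30K", 40),
    ("sales", "junior", "31-35", "31K-35K", 40),
    ("systems", "junior", "21-25", "46K-50K", 20),
    ("systems", "senior", "31-35", "66K-70K", 5),
    ("systems", "junior", "26-30", "46K-50K", 3),
    ("systems", "senior", "41-45", "66K-70K", 3),
    ("marketing", "senior", "36-40", "46K-50K", 10),
    ("marketing", "junior", "31-35", "41K-45K", 4),
    ("secretary", "senior", "46-50", "36K-40K", 4),
    ("secretary", "junior", "26-30", "26K-30K", 6),
]
NAMES = ["department", "status", "age", "salary"]


def expand(rows):
    """Repeat each generalized row `count` times, one entry per tuple."""
    return [row[:4] for row in rows for _ in range(row[4])]


def class_totals(rows):
    """Sum the counts per value of the class attribute (status)."""
    totals = {}
    for _dept, status, _age, _salary, count in rows:
        totals[status] = totals.get(status, 0) + count
    return totals


def to_arff(rows, relation="employees"):
    """Build ARFF text with nominal attributes and one data line per tuple."""
    lines = ["@relation %s" % relation, ""]
    for i, name in enumerate(NAMES):
        values = sorted({row[i] for row in rows})
        lines.append("@attribute %s {%s}" % (name, ",".join(values)))
    lines += ["", "@data"]
    lines += [",".join(instance) for instance in expand(rows)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(len(expand(ROWS)), class_totals(ROWS))
```

The expanded set has 165 instances, 52 senior and 113 junior, which matches the row totals of the confusion matrices in the solution.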

1) Using Weka:
a. Build the tree using ID3, J48, and random forest. Compare the results.
b. Bayes net
c. Multilayer perceptron
d. LibSVM: try 4 different kernel types. Compare the results.
e.
2) Another method for solving Bayesian networks is belief propagation, also known as
sum-product message passing. Briefly describe what it consists of.
3) Do exercise 9.1 from the book.

Solution

Id3

=== Classifier model (full training set) ===

Id3

salary = 46K-50k: senior
salary = 26K-30K: junior
salary = 31K-35K: junior
salary = 46K-50K
| department = sales: null
| department = systems: junior
| department = marketing: senior
| department = secretary: null
salary = 66K-70K: senior
salary = 41K-45k: junior
salary = 36K-40K: senior

Time taken to build model: 0.03 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances         165              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
Mean absolute error
Root mean squared error
Relative absolute error
Root relative squared error
Total Number of Instances              165

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               1        0        1          1       1                    senior
               1        0        1          1       1                    junior
Weighted Avg.  1        0        1          1       1

=== Confusion Matrix ===

a b <-- classified as
52 0 | a = senior
0 113 | b = junior

J48
=== Classifier model (full training set) ===

J48 pruned tree

------------------

salary = 46K-50k: senior (30.0)
salary = 26K-30K: junior (46.0)
salary = 31K-35K: junior (40.0)
salary = 46K-50K
| department = sales: junior (0.0)
| department = systems: junior (23.0)
| department = marketing: senior (10.0)
| department = secretary: junior (0.0)
salary = 66K-70K: senior (8.0)
salary = 41K-45k: junior (4.0)
salary = 36K-40K: senior (4.0)

Number of Leaves  : 10

Size of the tree : 12

Time taken to build model: 0.02 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances         165              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
Mean absolute error
Root mean squared error
Relative absolute error
Root relative squared error
Total Number of Instances              165

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               1        0        1          1       1                    senior
               1        0        1          1       1                    junior
Weighted Avg.  1        0        1          1       1

=== Confusion Matrix ===

a b <-- classified as
52 0 | a = senior
0 113 | b = junior
Random forest
=== Classifier model (full training set) ===

Random forest of 10 trees, each constructed while considering 3 random features.


Out of bag error: 0

Time taken to build model: 0.04 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances         165              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
Mean absolute error                      0.0029
Root mean squared error                  0.0199
Relative absolute error                  0.6805 %
Root relative squared error              4.2893 %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               1        0        1          1       1                    senior
               1        0        1          1       1                    junior
Weighted Avg.  1        0        1          1       1

=== Confusion Matrix ===

a b <-- classified as
52 0 | a = senior
0 113 | b = junior

The ID3 and J48 trees summarize how the tuples in the table above relate to the class; J48
also prints, at each leaf, the number of tuples that reach it.
J48 reports the number of leaves and the size of the tree, which ID3 does not.
Both methods show a zero error rate.
Judging by the error figures each method reports, J48 is more precise than RandomForest,
which classifies every instance correctly but still shows small nonzero errors (mean
absolute error 0.0029, relative absolute error 0.6805 %).
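Both trees test salary at the root. The sketch below computes the information gain that ID3 uses to choose a split, applied to the table as printed in the problem statement. It is an illustration under stated assumptions, not Weka's implementation: the function names entropy and info_gain are made up here, and the table is retyped with consistent capitalization, so the branches differ slightly from the Weka trees above.

```python
from collections import defaultdict
from math import log2

# (department, status, age, salary, count), retyped from the problem table.
ROWS = [
    ("sales", "senior", "31-35", "46K-50K", 30),
    ("sales", "junior", "26-30", "26K-30K", 40),
    ("sales", "junior", "31-35", "31K-35K", 40),
    ("systems", "junior", "21-25", "46K-50K", 20),
    ("systems", "senior", "31-35", "66K-70K", 5),
    ("systems", "junior", "26-30", "46K-50K", 3),
    ("systems", "senior", "41-45", "66K-70K", 3),
    ("marketing", "senior", "36-40", "46K-50K", 10),
    ("marketing", "junior", "31-35", "41K-45K", 4),
    ("secretary", "senior", "46-50", "36K-40K", 4),
    ("secretary", "junior", "26-30", "26K-30K", 6),
]
ATTRS = {"department": 0, "age": 2, "salary": 3}  # column index per attribute


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def info_gain(rows, attr_index, class_index=1):
    """Class entropy minus the count-weighted entropy after splitting."""
    class_counts = defaultdict(int)
    by_value = defaultdict(lambda: defaultdict(int))
    for row in rows:
        weight = row[4]
        class_counts[row[class_index]] += weight
        by_value[row[attr_index]][row[class_index]] += weight
    total = sum(class_counts.values())
    remainder = sum(
        sum(dist.values()) / total * entropy(dist.values())
        for dist in by_value.values()
    )
    return entropy(class_counts.values()) - remainder


gains = {name: info_gain(ROWS, index) for name, index in ATTRS.items()}
best = max(gains, key=gains.get)

if __name__ == "__main__":
    for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
        print("%-10s %.4f" % (name, gain))
```

On this table salary has the largest gain, followed by age and then department, which is consistent with salary appearing at the root of both trees.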

b. Bayes Net
=== Classifier model (full training set) ===

Bayes Network Classifier
not using ADTree
#attributes=4 #classindex=1
Network structure (nodes followed by parents)
department(4): status
status(2):
age(6): status
salary(7): status
LogScore Bayes: -671.2178322167754
LogScore BDeu: -734.0007744477259
LogScore MDL: -741.4529682985715
LogScore ENTROPY: -667.416758927013
LogScore AIC: -696.416758927013

Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances         161               97.5758 %
Incorrectly Classified Instances         4                2.4242 %
Kappa statistic                          0.945
Mean absolute error                      0.0273
Root mean squared error                  0.0912
Relative absolute error                  6.3016 %
Root relative squared error             19.6156 %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               1        0.035    0.929      1       0.963                senior
               0.965    0        1          0.965   0.982                junior
Weighted Avg.  0.976    0.011    0.977      0.976   0.976

=== Confusion Matrix ===

a b <-- classified as
52 0 | a = senior
4 109 | b = junior
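
Most of the summary and per-class figures can be recomputed from the confusion matrix alone. The sketch below does so for the Bayes net matrix above; the function name metrics and the dictionary keys are assumptions chosen for this illustration, and rows are taken as actual classes and columns as predicted classes, as in the Weka layout.

```python
def metrics(matrix, labels):
    """Accuracy, Cohen's kappa and per-class rates from a confusion matrix.

    matrix[i][j] is the number of instances of actual class i that were
    classified as class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    accuracy = sum(matrix[i][i] for i in range(n)) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    per_class = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = col_totals[i] - tp
        fn = row_totals[i] - tp
        tn = total - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        per_class[label] = {
            "tp_rate": recall,
            "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
            "precision": precision,
            "f_measure": f_measure,
        }
    return accuracy, kappa, per_class


accuracy, kappa, per_class = metrics([[52, 0], [4, 109]], ["senior", "junior"])

if __name__ == "__main__":
    print(round(100 * accuracy, 4), round(kappa, 3))
    print(per_class)
```

For this matrix the recomputed accuracy (97.5758 %), kappa (0.945), and per-class precision and F-measure agree with the values Weka printed.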

This classifier reports the log score of the learned network under several criteria
(Bayes, BDeu, MDL, entropy, and AIC). The output also shows how many instances were
classified correctly (161) and how many incorrectly (4).
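
In the network structure reported above, status is the only parent of department, age and salary, which is the naive Bayes structure. The sketch below shows how a posterior over status would be computed for such a structure from the table in the problem statement. It uses plain relative frequencies with no smoothing, so it is an assumption-laden illustration: Weka's estimator may smooth the counts and give different probabilities. The function name naive_bayes_posterior and the query tuple are hypothetical.

```python
from collections import defaultdict

# (department, status, age, salary, count), retyped from the problem table.
ROWS = [
    ("sales", "senior", "31-35", "46K-50K", 30),
    ("sales", "junior", "26-30", "26K-30K", 40),
    ("sales", "junior", "31-35", "31K-35K", 40),
    ("systems", "junior", "21-25", "46K-50K", 20),
    ("systems", "senior", "31-35", "66K-70K", 5),
    ("systems", "junior", "26-30", "46K-50K", 3),
    ("systems", "senior", "41-45", "66K-70K", 3),
    ("marketing", "senior", "36-40", "46K-50K", 10),
    ("marketing", "junior", "31-35", "41K-45K", 4),
    ("secretary", "senior", "46-50", "36K-40K", 4),
    ("secretary", "junior", "26-30", "26K-30K", 6),
]
ATTR_INDEX = {"department": 0, "age": 2, "salary": 3}


def naive_bayes_posterior(rows, query):
    """P(status | query) with status as the sole parent of every attribute."""
    class_count = defaultdict(int)
    joint = defaultdict(int)  # (attribute, value, status) -> tuple count
    for row in rows:
        status, weight = row[1], row[4]
        class_count[status] += weight
        for name, index in ATTR_INDEX.items():
            joint[(name, row[index], status)] += weight

    total = sum(class_count.values())
    scores = {}
    for status, n_class in class_count.items():
        score = n_class / total  # prior P(status)
        for name, value in query.items():
            score *= joint[(name, value, status)] / n_class  # P(value | status)
        scores[status] = score

    norm = sum(scores.values())
    return {s: v / norm for s, v in scores.items()} if norm else scores


# Hypothetical unseen tuple, chosen only for illustration.
QUERY = {"department": "systems", "age": "26-30", "salary": "46K-50K"}
posterior = naive_bayes_posterior(ROWS, QUERY)

if __name__ == "__main__":
    print(posterior)
```

No senior tuple in the table has age 26-30, so without smoothing the senior score is zero and the query is assigned to junior.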

c. Multilayer perceptron
=== Classifier model (full training set) ===

Sigmoid Node 0
Inputs Weights
Threshold 2.7394370243645954
Node 2 -1.8508751465691542
Node 3 -2.354821458534509
Node 4 -1.4870386211949875
Node 5 -1.3229943341191277
Node 6 1.5278889125307347
Node 7 -2.2131181020972868
Node 8 -0.5914245531409845
Node 9 -2.0587896015776685
Node 10 2.270779949723552

Sigmoid Node 1
Inputs Weights
Threshold -2.7825884536066616
Node 2 1.85191579937515
Node 3 2.323626175638421
Node 4 1.477966841443966
Node 5 1.304880433881045
Node 6 -1.4594876918432313
Node 7 2.232715726449767
Node 8 0.6801335831028799
Node 9 2.051171802934638
Node 10 -2.297941872862322
Sigmoid Node 2
Inputs Weights
Threshold -0.07755042992069021
Attrib department=sales 0.22690884175757925
Attrib department=systems 0.1545650553116968
Attrib department=marketing -0.028916340821306043
Attrib department=secretary -0.2113862149709065
Attrib age=31-35 -0.02750617633512445
Attrib age=26-30 0.7708789960391705
Attrib age=21-25 0.9031969460307472
Attrib age=41-45 -0.24524874175681483
Attrib age=36-40 -0.9372910085639369
Attrib age=46-50 -0.3173972567786108
Attrib salary=46K-50k -1.2721688131980107
Attrib salary=26K-30K 0.596029591410492
Attrib salary=31K-35K 1.073614573206162
Attrib salary=46K-50K 0.10622100546611281
Attrib salary=66K-70K -0.8931249639552377
Attrib salary=41K-45k 0.9840722037777665
Attrib salary=36K-40K -0.32173827499151414
Sigmoid Node 3
Inputs Weights
Threshold -0.09160443154315269
Attrib department=sales 0.19633050850394618
Attrib department=systems 0.14350673033770345
Attrib department=marketing -0.021132001738419306
Attrib department=secretary -0.22057503111112434
Attrib age=31-35 0.01617542844596184
Attrib age=26-30 0.9011003972162315
Attrib age=21-25 1.0622717118790497
Attrib age=41-45 -0.2752012547415663
Attrib age=36-40 -1.084971185421792
Attrib age=46-50 -0.41474256360460926
Attrib salary=46K-50k -1.448381199332202
Attrib salary=26K-30K 0.6690483312961746
Attrib salary=31K-35K 1.2615937981296461
Attrib salary=46K-50K 0.11767253931489512
Attrib salary=66K-70K -1.0619222218399373
Attrib salary=41K-45k 1.1155548070459484
Attrib salary=36K-40K -0.4197992341286937
Sigmoid Node 4
Inputs Weights
Threshold -0.020643481201846912
Attrib department=sales 0.15069976965985996
Attrib department=systems 0.12379838338640137
Attrib department=marketing -0.0943505094026181
Attrib department=secretary -0.14725072714787346
Attrib age=31-35 0.010512235705337582
Attrib age=26-30 0.7498866595807208
Attrib age=21-25 0.7873805984482601
Attrib age=41-45 -0.1464959547224344
Attrib age=36-40 -0.7580525890150757
Attrib age=46-50 -0.2556075853274041
Attrib salary=46K-50k -1.0701392832809675
Attrib salary=26K-30K 0.5473953000497558
Attrib salary=31K-35K 0.992235520740865
Attrib salary=46K-50K 0.13349935484850442
Attrib salary=66K-70K -0.7751334030115394
Attrib salary=41K-45k 0.7376105027424174
Attrib salary=36K-40K -0.31165331988873113
Sigmoid Node 5
Inputs Weights
Threshold -0.10504649400363189
Attrib department=sales 0.1888580478883902
Attrib department=systems 0.15300803282270037
Attrib department=marketing -0.02098097843654761
Attrib department=secretary -0.1280029758315231
Attrib age=31-35 -0.017009595286887676
Attrib age=26-30 0.6482292510400863
Attrib age=21-25 0.6915153763340572
Attrib age=41-45 -0.19586354097092593
Attrib age=36-40 -0.7359220880188584
Attrib age=46-50 -0.29334007131163364
Attrib salary=46K-50k -1.0108613445175572
Attrib salary=26K-30K 0.5262317715574415
Attrib salary=31K-35K 0.9182024924100928
Attrib salary=46K-50K 0.1188196030377099
Attrib salary=66K-70K -0.670937831354511
Attrib salary=41K-45k 0.7010453789252467
Attrib salary=36K-40K -0.22576085497477247
Sigmoid Node 6
Inputs Weights
Threshold 0.09557922327970046
Attrib department=sales -0.018139053459302966
Attrib department=systems -0.10209290440196074
Attrib department=marketing -0.1328719579969402
Attrib department=secretary 0.028572139713019157
Attrib age=31-35 -0.0739998323577446
Attrib age=26-30 -0.3457681026476964
Attrib age=21-25 -0.46939913970448427
Attrib age=41-45 -0.042763003576478054
Attrib age=36-40 0.39852936666748573
Attrib age=46-50 0.06929754238286273
Attrib salary=46K-50k 0.3741253602958736
Attrib salary=26K-30K -0.24297175284159717
Attrib salary=31K-35K -0.4818491178517223
Attrib salary=46K-50K -0.10589748590134765
Attrib salary=66K-70K 0.3434715967606044
Attrib salary=41K-45k -0.5752678422949488
Attrib salary=36K-40K 0.045120149940549435
Sigmoid Node 7
Inputs Weights
Threshold -0.06160115651344392
Attrib department=sales 0.2134113259865331
Attrib department=systems 0.172272506671537
Attrib department=marketing -0.029607224853917643
Attrib department=secretary -0.24106617744959774
Attrib age=31-35 0.029991616105111238
Attrib age=26-30 0.8651532630509001
Attrib age=21-25 0.9906708595270766
Attrib age=41-45 -0.20575332717275158
Attrib age=36-40 -1.047645476787963
Attrib age=46-50 -0.3535834941223248
Attrib salary=46K-50k -1.4332784615759104
Attrib salary=26K-30K 0.6493573607930859
Attrib salary=31K-35K 1.1819371137499333
Attrib salary=46K-50K 0.12596930399229297
Attrib salary=66K-70K -1.064741079546573
Attrib salary=41K-45k 1.1431299292474153
Attrib salary=36K-40K -0.3944593594329362
Sigmoid Node 8
Inputs Weights
Threshold -0.0472094583119319
Attrib department=sales 0.15217654915023798
Attrib department=systems 0.11722287931424957
Attrib department=marketing -0.06807690704999361
Attrib department=secretary -0.024785605611675223
Attrib age=31-35 -0.08736108359059333
Attrib age=26-30 0.4530072183024529
Attrib age=21-25 0.5003986272198906
Attrib age=41-45 -0.03799085042854391
Attrib age=36-40 -0.3920240148745457
Attrib age=46-50 -0.1205586069197979
Attrib salary=46K-50k -0.605742149160759
Attrib salary=26K-30K 0.3802185575469776
Attrib salary=31K-35K 0.6066272566318488
Attrib salary=46K-50K 0.10660422983302109
Attrib salary=66K-70K -0.3320427568391321
Attrib salary=41K-45k 0.42400945288403125
Attrib salary=36K-40K -0.10285393387441728
Sigmoid Node 9
Inputs Weights
Threshold -0.08424420101244516
Attrib department=sales 0.18487395033283963
Attrib department=systems 0.14441985599862148
Attrib department=marketing -0.014746998161577226
Attrib department=secretary -0.17691559953689145
Attrib age=31-35 0.02784723014273439
Attrib age=26-30 0.8489877575101271
Attrib age=21-25 0.9380267979491645
Attrib age=41-45 -0.2608835537805671
Attrib age=36-40 -1.0141091744602124
Attrib age=46-50 -0.39718995571738797
Attrib salary=46K-50k -1.3435954859187489
Attrib salary=26K-30K 0.6385529218748827
Attrib salary=31K-35K 1.186853891625651
Attrib salary=46K-50K 0.14603610059077612
Attrib salary=66K-70K -0.9588501691188384
Attrib salary=41K-45k 1.0099505342882553
Attrib salary=36K-40K -0.362947971136462
Sigmoid Node 10
Inputs Weights
Threshold 0.09908187740254618
Attrib department=sales -0.05342457992841399
Attrib department=systems -0.1192290423182317
Attrib department=marketing -0.05846524907628152
Attrib department=secretary 0.14338709613151754
Attrib age=31-35 -0.09957529167902422
Attrib age=26-30 -0.5969152237408825
Attrib age=21-25 -0.7313360476645168
Attrib age=41-45 0.11828014170484517
Attrib age=36-40 0.7424640467459025
Attrib age=46-50 0.2309416143733404
Attrib salary=46K-50k 0.854775905742494
Attrib salary=26K-30K -0.4117530818438961
Attrib salary=31K-35K -0.7689602300444687
Attrib salary=46K-50K -0.09902009869134437
Attrib salary=66K-70K 0.7183111474183733
Attrib salary=41K-45k -0.8460358832180466
Attrib salary=36K-40K 0.22686248383226118
Class senior
Input
Node 0
Class junior
Input
Node 1

Time taken to build model: 2.09 seconds

This algorithm's output lists every node of the network together with its inputs
(attributes or other nodes) and the weight assigned to each input.
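
To illustrate how a listing of this kind is read, the sketch below evaluates the two output nodes (Node 0 for senior, Node 1 for junior) from their thresholds and weights, rounded to four decimals. It assumes that each sigmoid node computes the logistic function of its threshold plus the weighted sum of its inputs, with the threshold acting as an additive bias. The hidden-layer activations are invented for the example and do not correspond to any tuple in the data.

```python
from math import exp


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def node_output(threshold, weights, inputs):
    """Assumed node rule: sigmoid(threshold + sum of weight * input)."""
    return sigmoid(threshold + sum(w * x for w, x in zip(weights, inputs)))


# (threshold, weights for hidden nodes 2..10), rounded from the listing.
NODE_0_SENIOR = (2.7394, [-1.8509, -2.3548, -1.4870, -1.3230, 1.5279,
                          -2.2131, -0.5914, -2.0588, 2.2708])
NODE_1_JUNIOR = (-2.7826, [1.8519, 2.3236, 1.4780, 1.3049, -1.4595,
                           2.2327, 0.6801, 2.0512, -2.2979])

# Hypothetical activations of hidden nodes 2..10 (not taken from the data).
hidden = [0.2, 0.2, 0.2, 0.2, 0.8, 0.2, 0.2, 0.2, 0.8]

senior = node_output(NODE_0_SENIOR[0], NODE_0_SENIOR[1], hidden)
junior = node_output(NODE_1_JUNIOR[0], NODE_1_JUNIOR[1], hidden)

if __name__ == "__main__":
    print("senior", round(senior, 3), "junior", round(junior, 3))
```

The two output nodes carry weights of nearly equal magnitude and opposite sign, so whichever hidden nodes push one output up push the other down; with the made-up activations above the senior output dominates.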

d. LibSVM, kernel type: linear


=== Classifier model (full training set) ===

LibSVM wrapper, original code by Yasser EL-Manzalawy (= WLSVM)

Time taken to build model: 0.14 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances         125               75.7576 %
Incorrectly Classified Instances        40               24.2424 %
Kappa statistic                          0.3429
Mean absolute error                      0.2424
Root mean squared error                  0.4924
Relative absolute error                 56.0304 %
Root relative squared error            105.9548 %
Coverage of cases (0.95 level)          75.7576 %
Mean rel. region size (0.95 level)      50      %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
               0,346    0,053    0,750      0,346   0,474      0,386  0,647     0,466     senior
               0,947    0,654    0,759      0,947   0,843      0,386  0,647     0,755     junior
Weighted Avg.  0,758    0,465    0,756      0,758   0,726      0,386  0,647     0,664

=== Confusion Matrix ===

a b <-- classified as
18 34 | a = senior
6 107 | b = junior

LibSVM, kernel type: polynomial
=== Classifier model (full training set) ===

LibSVM wrapper, original code by Yasser EL-Manzalawy (= WLSVM)

Time taken to build model: 0.06 seconds

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances         132               80      %
Incorrectly Classified Instances        33               20      %
Kappa statistic                          0.4409
Mean absolute error                      0.2
Root mean squared error                  0.4472
Relative absolute error                 46.2251 %
Root relative squared error             96.2382 %
Coverage of cases (0.95 level)          80      %
Mean rel. region size (0.95 level)      50      %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
               0,365    0,000    1,000      0,365   0,535      0,532  0,683     0,565     senior
               1,000    0,635    0,774      1,000   0,873      0,532  0,683     0,774     junior
Weighted Avg.  0,800    0,435    0,845      0,800   0,766      0,532  0,683     0,708

=== Confusion Matrix ===

a b <-- classified as
19 33 | a = senior
0 113 | b = junior

LibSVM, kernel type: radial basis function
=== Classifier model (full training set) ===

LibSVM wrapper, original code by Yasser EL-Manzalawy (= WLSVM)

Time taken to build model: 0.03 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances         165              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
Mean absolute error                      0
Root mean squared error                  0
Relative absolute error                  0      %
Root relative squared error              0      %
Coverage of cases (0.95 level)         100      %
Mean rel. region size (0.95 level)      50      %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
               1,000    0,000    1,000      1,000   1,000      1,000  1,000     1,000     senior
               1,000    0,000    1,000      1,000   1,000      1,000  1,000     1,000     junior
Weighted Avg.  1,000    0,000    1,000      1,000   1,000      1,000  1,000     1,000

=== Confusion Matrix ===

a b <-- classified as
52 0 | a = senior
0 113 | b = junior

LibSVM, kernel type: sigmoid
=== Classifier model (full training set) ===

LibSVM wrapper, original code by Yasser EL-Manzalawy (= WLSVM)

Time taken to build model: 0.04 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances         143               86.6667 %
Incorrectly Classified Instances        22               13.3333 %
Kappa statistic                          0.6513
Mean absolute error                      0.1333
Root mean squared error                  0.3651
Relative absolute error                 30.8167 %
Root relative squared error             78.5782 %
Coverage of cases (0.95 level)          86.6667 %
Mean rel. region size (0.95 level)      50      %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
               0,577    0,000    1,000      0,577   0,732      0,695  0,788     0,710     senior
               1,000    0,423    0,837      1,000   0,911      0,695  0,788     0,837     junior
Weighted Avg.  0,867    0,290    0,888      0,867   0,855      0,695  0,788     0,797

=== Confusion Matrix ===

a b <-- classified as
30 22 | a = senior
0 113 | b = junior

For each kernel, this algorithm reports the number of correctly and incorrectly classified
instances with their percentages, the mean absolute error, the root mean squared error, the
relative errors, and the coverage of cases with its percentage.

A table of per-class accuracy details follows, with measures such as TP rate, FP rate,
precision, recall, F-measure, and MCC, whose values here range between 0 and 1.
The number of correctly and incorrectly classified instances changes with the kernel type,
and every derived value and percentage changes with it: the linear kernel classifies 125 of
165 instances correctly (75.76 %), the polynomial kernel 132 (80 %), the sigmoid kernel 143
(86.67 %), and the radial basis function kernel all 165 (100 %).
With the radial basis function kernel the percentage of incorrectly classified instances
was 0, so the error percentages and the root mean squared error are 0, the TP rate is 1,
the FP rate is 0, and precision, recall, F-measure, and MCC are all 1.
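
For reference, the four kernel types compared above can be written as small functions. The sketch follows the kernel definitions given in the LIBSVM documentation (gamma, coef0 and degree are its parameter names); the function names and the example vectors are assumptions for illustration, and the parameter values actually used in the runs above are not recorded in this report.

```python
from math import exp, tanh


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear(u, v):
    """Linear kernel: u . v"""
    return dot(u, v)


def polynomial(u, v, gamma, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma * u . v + coef0) ** degree"""
    return (gamma * dot(u, v) + coef0) ** degree


def rbf(u, v, gamma):
    """Radial basis function kernel: exp(-gamma * |u - v|^2)"""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid_kernel(u, v, gamma, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * u . v + coef0)"""
    return tanh(gamma * dot(u, v) + coef0)


if __name__ == "__main__":
    # Two hypothetical binary-encoded instances.
    u, v = [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]
    print(linear(u, v))
    print(polynomial(u, v, gamma=0.5))
    print(rbf(u, v, gamma=0.5))
    print(sigmoid_kernel(u, v, gamma=0.5))
```

Only the radial basis function kernel depends on the distance between instances rather than on their inner product, which may help explain why it separates these nominal tuples when the other kernels do not.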

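2.
Belief propagation

Belief propagation is an algorithm for performing inference on graphical models such as
Bayesian networks and Markov random fields. It computes the marginal distribution of each
node by passing messages between neighboring nodes: each message sums, over the states of
the sending node, the product of that node's local factors and the messages it has received
from its other neighbors. It is used in artificial intelligence and information theory. It
is exact on tree-structured graphs and has been shown empirically to be a useful
approximate method on general graphs.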
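
As a small worked illustration of sum-product message passing, the sketch below runs it on a chain of three binary variables and checks the resulting marginals against brute-force enumeration. The chain A - B - C and all potential values are hypothetical, chosen only to make the example concrete.

```python
from itertools import product

# Hypothetical pairwise model on a chain A - B - C of binary variables.
ORDER = ["A", "B", "C"]
UNARY = {"A": [0.6, 0.4], "B": [0.5, 0.5], "C": [0.3, 0.7]}
PAIR = {
    ("A", "B"): [[0.9, 0.1], [0.2, 0.8]],  # indexed [a][b]
    ("B", "C"): [[0.7, 0.3], [0.4, 0.6]],  # indexed [b][c]
}


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def chain_marginals():
    """Marginals via one forward and one backward sweep of messages."""
    n = len(ORDER)
    # forward[i]: message arriving at variable i from its left neighbor.
    forward = [[1.0, 1.0] for _ in range(n)]
    for i in range(1, n):
        left = ORDER[i - 1]
        table = PAIR[(left, ORDER[i])]
        forward[i] = [
            sum(UNARY[left][a] * forward[i - 1][a] * table[a][b] for a in range(2))
            for b in range(2)
        ]
    # backward[i]: message arriving at variable i from its right neighbor.
    backward = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        right = ORDER[i + 1]
        table = PAIR[(ORDER[i], right)]
        backward[i] = [
            sum(UNARY[right][b] * backward[i + 1][b] * table[a][b] for b in range(2))
            for a in range(2)
        ]
    # Belief at a node: local factor times all incoming messages, normalized.
    return {
        name: normalize(
            [UNARY[name][x] * forward[i][x] * backward[i][x] for x in range(2)]
        )
        for i, name in enumerate(ORDER)
    }


def brute_force_marginals():
    """Reference marginals by summing the joint over all 8 assignments."""
    marginals = {name: [0.0, 0.0] for name in ORDER}
    for a, b, c in product(range(2), repeat=3):
        weight = (
            UNARY["A"][a] * UNARY["B"][b] * UNARY["C"][c]
            * PAIR[("A", "B")][a][b] * PAIR[("B", "C")][b][c]
        )
        for name, state in zip(ORDER, (a, b, c)):
            marginals[name][state] += weight
    return {name: normalize(m) for name, m in marginals.items()}


if __name__ == "__main__":
    print(chain_marginals())
    print(brute_force_marginals())
```

On a chain, which is a tree, the two results coincide. The message-passing version needs work that grows linearly with the number of variables, while enumeration grows exponentially.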
