
Assignment: Kapita Selekta Ilmu Komputer 1 (Selected Topics in Computer Science 1)

1. IRIS
a) Data Description
The “IRIS” data set contains length and width measurements of parts of iris flowers.
The measured parts are the sepals and the petals. Based on these measurements, each
flower belongs to one of 3 classes: Iris Setosa, Iris Versicolour, or Iris Virginica.

b) Attributes
Attributes in the database:
• sepallength : sepal length in cm
• sepalwidth : sepal width in cm
• petallength : petal length in cm
• petalwidth : petal width in cm

c) Class
There are 3 classes:
-- Iris Setosa
-- Iris Versicolour
-- Iris Virginica

d) Number of Instances
The database contains 30 instances (flowers).

e) Clustering Results
The class attribute is excluded from clustering. The remaining attributes are then
grouped into 3 clusters with each of the methods below:
(1). Simple K-Means
• Accuracy
With the following settings, the results below were obtained:

Scheme: weka.clusterers.SimpleKMeans -N 3 -A
"weka.core.EuclideanDistance -R first-last" -I 500 -S 10
Relation: iris
Instances: 30
Attributes: 5
sepallength
sepalwidth
petallength
petalwidth
Ignored:
class
Test mode: evaluate on training data

=== Model and evaluation on training set ===


kMeans
======

Number of iterations: 6
Within cluster sum of squared errors: 1.913551539144951
Missing values globally replaced with mean/mode

Cluster centroids:
Cluster#
Attribute Full Data 0 1 2
(30) (10) (14) (6)
=========================================================
sepallength 5.8433 4.86 6.7643 5.3333
sepalwidth 3.04 3.31 3.05 2.5667
petallength 3.8633 1.45 5.4357 4.2167
petalwidth 1.2133 0.22 1.8286 1.4333

Clustered Instances

0 10 ( 33%)
1 14 ( 47%)
2 6 ( 20%)
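Simple K-Means follows the usual k-means loop: assign every instance to its nearest centroid, recompute each centroid as the mean of its members, and repeat until the assignments stop changing. The "Within cluster sum of squared errors" is then the total squared distance from each instance to its own centroid. The sketch below shows that loop (Lloyd's algorithm) under stated assumptions: it is not Weka's implementation, the starting centroids are drawn with Python's `random` module (so seed 10 will not reproduce Weka's `-S 10` picks), and it does not rescale attributes the way Weka's distance function does by default. The function name `kmeans` and the toy points are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=500, seed=10):
    """Lloyd's algorithm; returns (centroids, assignment, sse, iterations)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    assignment = [None] * len(points)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))
    return centroids, assignment, sse, iteration


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, assignment, sse, iterations = kmeans(points, 2)
print(round(sse, 4))  # 2.6667
```

With two well-separated groups the sketch ends with one cluster per group; on the real data the result depends on the starting centroids, which is why Weka exposes the seed.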

• Time
14:42:13: Command: weka.clusterers.SimpleKMeans -N 3 -A
"weka.core.EuclideanDistance -R first-last" -I 500 -S 10
14:42:13: Finished weka.clusterers.SimpleKMeans

• Analysis
(2). Hierarchical Clusterer
• Accuracy
With the following settings, the results below were obtained:

Scheme: weka.clusterers.HierarchicalClusterer -N 3
-L COMPLETE -P -A "weka.core.EuclideanDistance -R first-last"
Relation: iris
Instances: 30
Attributes: 5
sepallength
sepalwidth
petallength
petalwidth
Ignored:
class
Test mode: evaluate on training data

=== Model and evaluation on training set ===

Cluster 0
(((((0.2:0.06988,0.2:0.06988):0.05654,0.2:0.12642):0.0552,
0.3:0.18162):0.2247,0.4:0.40632):0.29968,
(((0.2:0.07745,0.1:0.07745):0.06357,
(0.2:0.07942,0.2:0.07942):0.06161):0.06946,0.2:0.21048):0.49552)

Cluster 1
(((((1.4:0.08968,1.5:0.08968):0.1526,
(1.5:0.08968,1.6:0.08968):0.1526):0.07967,
(1.5:0.10875,1.3:0.10875):0.2132):0.23987,
((1.9:0.3375,1.8:0.3375):0.02214,
(1.8:0.19239,2.2:0.19239):0.16725):0.20218):0.21658,
((2.5:0.33855,2.5:0.33855):0.18954,
((2.1:0.17069,1.8:0.17069):0.03391,2.1:0.20459):0.3235):0.25032)

Cluster 2
((1.3:0.26857,1.0:0.26857):0.16948,
((1.3:0.20706,1.4:0.20706):0.14711,1.7:0.35417):0.08388)

Clustered Instances

0 10 ( 33%)
1 15 ( 50%)
2 5 ( 17%)
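The `-L COMPLETE` option selects complete linkage, and `-N 3` stops merging once 3 clusters remain. Under complete linkage the distance between two clusters is the largest distance between any member of one and any member of the other. A minimal sketch of that agglomerative procedure, assuming plain Euclidean distance on unscaled values (Weka's own implementation is more efficient and, by default, normalizes attributes); `complete_linkage` and the toy points are hypothetical:

```python
from itertools import combinations


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with complete linkage.

    Start with one cluster per point and repeatedly merge the pair of clusters
    whose farthest members are closest, until n_clusters remain.
    Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for (i, a), (j, b) in combinations(enumerate(clusters), 2):
            d = max(euclidean(points[p], points[q]) for p in a for q in b)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations() always yields i < j
    return clusters


points = [[0.0], [0.5], [5.0], [5.4], [20.0]]
print(complete_linkage(points, 3))  # [[0, 1], [2, 3], [4]]
```

This brute-force version recomputes every pairwise cluster distance on each merge, which is fine for 30 instances but not for large data.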

• Time
12:53:00: Command: weka.clusterers.EM -I 100 -N 3 -M 1.0E-6 -S 100
12:53:00: Finished weka.clusterers.EM

• Analysis
(3). Simple EM (expectation maximisation)
• Accuracy
With the following settings, the 3 clusters below were obtained:

Scheme: weka.clusterers.EM -I 100 -N 3 -M 1.0E-6 -S 100
Relation: iris
Instances: 30
Attributes: 5
sepallength
sepalwidth
petallength
petalwidth
Ignored:
class
Test mode: evaluate on training data

=== Model and evaluation on training set ===


EM
==

Number of clusters: 3

Cluster
Attribute 0 1 2
(0.16) (0.5) (0.33)
======================================
sepallength
mean 5.2338 6.6953 4.86
std. dev. 0.3181 0.4646 0.2764

sepalwidth
mean 2.5361 3.0257 3.31
std. dev. 0.1839 0.2617 0.2914

petallength
mean 4.0348 5.4087 1.45
std. dev. 0.4474 0.6904 0.1025

petalwidth
mean 1.3418 1.8305 0.22
std. dev. 0.2275 0.3694 0.0748

Clustered Instances

0 5 ( 17%)
1 15 ( 50%)
2 10 ( 33%)

Log likelihood: -1.62297
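The EM table above gives, for each cluster, a prior (the number in parentheses under the cluster index) and a mean and standard deviation for every attribute. Assuming, as the layout of the table suggests, that each attribute is modelled by an independent normal distribution within a cluster, the probability that an instance belongs to each cluster can be computed from those printed values alone. A minimal sketch; `responsibilities` and the example instances are hypothetical:

```python
import math

# Priors, means and standard deviations copied from the EM output above.
# Attribute order: sepallength, sepalwidth, petallength, petalwidth.
PRIORS = [0.16, 0.5, 0.33]
MEANS = [
    [5.2338, 2.5361, 4.0348, 1.3418],
    [6.6953, 3.0257, 5.4087, 1.8305],
    [4.86, 3.31, 1.45, 0.22],
]
STDS = [
    [0.3181, 0.1839, 0.4474, 0.2275],
    [0.4646, 0.2617, 0.6904, 0.3694],
    [0.2764, 0.2914, 0.1025, 0.0748],
]


def log_normal_pdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)


def responsibilities(instance):
    """P(cluster | instance), treating attributes as independent normals."""
    log_joint = [
        math.log(prior)
        + sum(log_normal_pdf(x, m, s) for x, m, s in zip(instance, means, stds))
        for prior, means, stds in zip(PRIORS, MEANS, STDS)
    ]
    top = max(log_joint)  # subtract the maximum before exp() for numerical stability
    weights = [math.exp(v - top) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


probs = responsibilities([5.0, 3.4, 1.5, 0.2])
print(probs.index(max(probs)))  # 2
```

An instance with short petals lands in cluster 2, whose petal means (1.45 cm, 0.22 cm) are far from those of clusters 0 and 1.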

• Time
12:53:00: Command: weka.clusterers.EM -I 100 -N 3 -M 1.0E-6 -S 100
12:53:00: Finished weka.clusterers.EM
• Analysis
2. Diabetes
a) Data Description
The “Diabetes” data contains patient examination results used to diagnose diabetes.

b) Attributes
Attributes in the database:
• Preg : number of times pregnant
• Plas : plasma glucose concentration at 2 hours in an oral glucose tolerance test
• Pres : diastolic blood pressure (mm Hg)
• Skin : triceps skin fold thickness (mm)
• Insu : 2-hour serum insulin (mu U/ml)
• Mass : body mass index (weight in kg / (height in m)^2)
• Pedi : diabetes pedigree function
• Age : age (in years)
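The Mass attribute is defined by a formula, so it can be checked directly. A minimal sketch of that formula; `body_mass_index` and the example patient are hypothetical:

```python
def body_mass_index(weight_kg, height_m):
    """Mass attribute: weight in kg divided by the square of height in metres."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m**2


# Hypothetical patient: 70 kg, 1.75 m tall.
print(round(body_mass_index(70, 1.75), 2))  # 22.86
```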
c) Class
There are 2 classes: tested_positive and tested_negative.

d) Number of Instances
The database contains 30 instances.

e) Clustering Results
The class attribute is excluded from clustering. The remaining attributes are then
grouped into 2 clusters with each of the methods below:
(1). Simple K-Means
• Accuracy
With the following settings, the results below were obtained:
Scheme: weka.clusterers.SimpleKMeans -N 2 -A
"weka.core.EuclideanDistance -R first-last" -I 500 -S 10
Relation: pima_diabetes
Instances: 30
Attributes: 9
preg
plas
pres
skin
insu
mass
pedi
age
Ignored:
class
Test mode: evaluate on training data

=== Model and evaluation on training set ===

kMeans
======

Number of iterations: 4
Within cluster sum of squared errors: 12.010726677127337
Missing values globally replaced with mean/mode

Cluster centroids:
Cluster#
Attribute Full Data 0 1
(30) (16) (14)
============================================
preg 5.4667 3.4375 7.7857
plas 130.0667 127.1875 133.3571
pres 68.5333 67.375 69.8571
skin 17.5 31.625 1.3571
insu 102.3 184.9375 7.8571
mass 31.6367 33.2313 29.8143
pedi 0.4608 0.5 0.416
age 38.2667 36.875 39.8571

Clustered Instances

0 16 ( 53%)
1 14 ( 47%)
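The k-means output reports "Missing values globally replaced with mean/mode": before clustering, each missing numeric value is filled with that attribute's mean over all instances, and each missing nominal value with its most frequent value. A minimal sketch of that kind of global replacement, assuming missing cells are represented as `None`; `replace_missing` and the rows are hypothetical:

```python
from collections import Counter
from statistics import mean


def replace_missing(rows):
    """Fill missing cells (None) with the column mean (numeric) or mode (nominal)."""
    fills = []
    for column in zip(*rows):
        present = [v for v in column if v is not None]
        if not present:
            fills.append(None)  # nothing to estimate from; leave the column missing
        elif all(isinstance(v, (int, float)) for v in present):
            fills.append(mean(present))
        else:
            fills.append(Counter(present).most_common(1)[0][0])
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in rows]


# Hypothetical rows: preg, plas, and a made-up nominal attribute.
rows = [[6, 148, "yes"], [1, None, "no"], [None, 85, "yes"]]
print(replace_missing(rows))
# [[6, 148, 'yes'], [1, 116.5, 'no'], [3.5, 85, 'yes']]
```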

• Time
13:25:18: Command: weka.clusterers.SimpleKMeans -N 2 -A
"weka.core.EuclideanDistance -R first-last" -I 500 -S 10
13:25:18: Finished weka.clusterers.SimpleKMeans

• Analysis

(2). Hierarchical Clusterer
• Accuracy
With the following settings, the results below were obtained:

Scheme: weka.clusterers.HierarchicalClusterer -N 2
-L COMPLETE -P -A "weka.core.EuclideanDistance -R first-last"
Relation: pima_diabetes
Instances: 30
Attributes: 9
preg
plas
pres
skin
insu
mass
pedi
age
Ignored:
class
Test mode: evaluate on training data

=== Model and evaluation on training set ===

Cluster 0
(((((50.0:0.46639,51.0:0.46639):0.28299,
(51.0:0.5039,57.0:0.5039):0.24547):0.16752,
(29.0:0.42072,41.0:0.42072):0.49617):0.27996,
(((32.0:0.59795,
((34.0:0.39844,43.0:0.39844):0.05075,41.0:0.44919):0.14876):0.32575,
(((30.0:0.19574,31.0:0.19574):0.15226,
(30.0:0.25324,38.0:0.25324):0.09476):0.29608,50.0:0.64408):0.27962):0.19358,
(54.0:0.85829,57.0:0.85829):0.25899):0.07957):0.2383,
(29.0:0.33937,32.0:0.33937):1.09579)

Cluster 1
((((((31.0:0.31405,26.0:0.31405):0.1483,32.0:0.46236):0.0514,
(21.0:0.26581,22.0:0.26581):0.24794):0.40688,
((31.0:0.33483,27.0:0.33483):0.40912,33.0:0.74395):0.17668):0.30059,
33.0:1.22122):0.36421,
(53.0:0.63725,59.0:0.63725):0.94818)

Clustered Instances

0 19 ( 63%)
1 11 ( 37%)
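Because of the `-P` option, each hierarchical cluster is printed as a tree in Newick notation: nested parentheses group subtrees, and `label:length` gives a node and the length of its branch. The number of leaves in a cluster's tree should equal its count under "Clustered Instances"; the leaf labels appear to be values of the last clustered attribute (age here, petalwidth for Iris). A minimal sketch that extracts the leaves, assuming labels never contain parentheses, commas or semicolons; `newick_leaves` is hypothetical:

```python
import re


def newick_leaves(newick):
    """Return the leaf labels of a Newick tree string.

    A leaf is the text right after '(' or ','; the ':number' that follows a ')'
    is an internal branch length, not a leaf, so the pattern never matches it.
    """
    compact = "".join(newick.split())  # the printed tree may be wrapped over lines
    return [m.group(1).split(":")[0] for m in re.finditer(r"[(,]([^(),;]+)", compact)]


# Cluster 1 of the Diabetes hierarchical run, copied from the output above.
cluster_1 = (
    "((((((31.0:0.31405,26.0:0.31405):0.1483,32.0:0.46236):0.0514,"
    "(21.0:0.26581,22.0:0.26581):0.24794):0.40688,"
    "((31.0:0.33483,27.0:0.33483):0.40912,33.0:0.74395):0.17668):0.30059,"
    "33.0:1.22122):0.36421,"
    "(53.0:0.63725,59.0:0.63725):0.94818)"
)
leaves = newick_leaves(cluster_1)
print(len(leaves))  # 11
```

Eleven leaves match the "1 11 ( 37%)" line above.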

• Time
13:28:21: Command: weka.clusterers.HierarchicalClusterer -N 2 -L COMPLETE -P
-A "weka.core.EuclideanDistance -R first-last"
13:28:22: Finished weka.clusterers.HierarchicalClusterer
• Analysis
(3). EM
• Accuracy
With the following settings, the 2 clusters below were obtained:

Scheme: weka.clusterers.EM -I 100 -N 2 -M 1.0E-6 -S 100
Relation: pima_diabetes
Instances: 30
Attributes: 9
preg
plas
pres
skin
insu
mass
pedi
age
Ignored:
class
Test mode: evaluate on training data

=== Model and evaluation on training set ===

EM
==

Number of clusters: 2

Cluster
Attribute 0 1
(0.36) (0.64)
==============================
preg
mean 1.2695 7.8785
std. dev. 0.9612 2.3326

plas
mean 121.4449 135.021
std. dev. 37.9009 27.2698

pres
mean 62.758 71.852
std. dev. 16.4957 26.1592

skin
mean 32.5661 8.8426
std. dev. 9.4485 13.5367

insu
mean 230.0458 28.8932
std. dev. 237.8522 56.9629

mass
mean 34.1669 30.1827
std. dev. 7.2617 8.8454

pedi
mean 0.5524 0.4082
std. dev. 0.5762 0.29

age
mean 33.4912 41.0108
std. dev. 11.4278 10.01

Clustered Instances

0 11 ( 37%)
1 19 ( 63%)

Log likelihood: -29.55391
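EM fits its mixture model by alternating two steps: the E-step computes how strongly each cluster "owns" each instance, and the M-step re-estimates every cluster's prior, mean and standard deviation from those weights. A minimal sketch for a single numeric attribute follows. Its `max_iter=100` and `min_std=1e-6` defaults were chosen to mirror the `-I 100` and `-M 1.0E-6` values in the Scheme line, which are understood to set the maximum number of iterations and the minimum allowed standard deviation; the starting parameters are passed in by hand rather than derived from the `-S` seed. `em_1d` and the data are hypothetical, and the log-likelihood returned is a per-instance average, which may not be exactly how Weka reports its figure.

```python
import math


def normal_pdf(x, mean, std):
    return math.exp(-((x - mean) ** 2) / (2 * std**2)) / (std * math.sqrt(2 * math.pi))


def em_1d(data, means, stds, priors, max_iter=100, min_std=1e-6, tol=1e-6):
    """Fit a 1-D Gaussian mixture with EM, starting from the given parameters.

    Returns (means, stds, priors, avg_log_likelihood); the log-likelihood is the
    per-instance average computed in the last E-step.
    """
    means, stds, priors = list(means), list(stds), list(priors)
    n, k = len(data), len(means)
    prev_ll = ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each data point.
        resp, ll = [], 0.0
        for x in data:
            weighted = [p * normal_pdf(x, m, s) for p, m, s in zip(priors, means, stds)]
            total = sum(weighted)
            ll += math.log(total)
            resp.append([w / total for w in weighted])
        ll /= n
        # M-step: re-estimate priors, means and standard deviations from the weights.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            priors[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), min_std)  # floor keeps a component from collapsing
        if ll - prev_ll < tol:
            break  # log-likelihood has stopped improving
        prev_ll = ll
    return means, stds, priors, ll


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, stds, priors, ll = em_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print([round(m, 3) for m in means])  # [1.0, 5.0]
```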

• Time
15:10:20: Command: weka.clusterers.EM -I 100 -N 2 -M 1.0E-6 -S 100
15:10:21: Finished weka.clusterers.EM

• Analysis
3. Companies
a) Data Description
The “Company” data contains facts about companies selected from the 1986 Forbes 500
list. It holds only 1/10 of the full list, taken as a systematic sample from the list
ordered alphabetically by company name. The Forbes 500 includes companies that rank
in the top 500 on one of several criteria.
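A 1/10 systematic sample from an alphabetical list keeps every tenth entry after sorting by name. A minimal sketch of that selection; `systematic_sample`, the starting offset and the company names are hypothetical:

```python
def systematic_sample(items, step=10, start=0):
    """Sort items alphabetically and keep every `step`-th one, beginning at `start`."""
    if not 0 <= start < step:
        raise ValueError("start must satisfy 0 <= start < step")
    return sorted(items)[start::step]


# Hypothetical company names, purely for illustration.
names = [f"Company {i:03d}" for i in range(95)]
sample = systematic_sample(names)
print(len(sample), sample[:2])  # 10 ['Company 000', 'Company 010']
```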

b) Attributes
Attributes in the database:
• Company : company name
• Assets : amount of assets (in millions)
• Sales : amount of sales (in millions)
• Market_Value : market value of the company (in millions)
• Profits : profits (in millions)
• Cash_Flow : cash flow (in millions)
• Employees : number of employees (in thousands)
• Sector : the market sector the company operates in

c) Class
There is no class attribute.
d) Number of Instances
The database contains 30 instances (companies).

e) Clustering Results
Four clusters are formed with each of the methods below (the Company and Sector
attributes are ignored):
(1). Simple K-Means
• Accuracy
With the following settings, the results below were obtained:

Scheme: weka.clusterers.SimpleKMeans -N 4 -A
"weka.core.EuclideanDistance -R first-last" -I 500 -S 10
Relation: relation
Instances: 30
Attributes: 8
Assets
Sales
Market_Value
Profits
Cash_Flow
Employees
Ignored:
Company
sector
Test mode: evaluate on training data

=== Model and evaluation on training set ===


kMeans
======

Number of iterations: 6
Within cluster sum of squared errors: 1.4707257092777661
Missing values globally replaced with mean/mode

Cluster centroids:
Cluster#
Attribute Full Data 0 1 2 3
(30) (19) (1) (4) (6)
=========================================================
Assets 5476.4667 1888.1053 44736 11097.75 6548.8333
Sales 2921.7333 880.0526 16197 8505.25 3452.1667
Market_Value 2103.2 673.4737 4653 7810.5 2400.8333
Profits 101.8067 45.0158 -732.5 374.525 238.8833
Cash_Flow 259.1667 76.9632 -651.9 967.75 515.6
Employees 25.7367 8.1632 48.5 109.625 21.6667

Clustered Instances

0 19 ( 63%)
1 1 ( 3%)
2 4 ( 13%)
3 6 ( 20%)
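The Company attributes sit on very different scales (Assets reaches tens of thousands while Employees stays in the tens), so raw Euclidean distances would be dominated by the largest-valued columns. Weka's EuclideanDistance is understood to rescale each numeric attribute by its range by default (its `-D` option turns this off), which also explains why the reported sums of squared errors are small. A minimal sketch of dropping the ignored attributes and applying min-max normalization; `drop_columns`, `min_max_normalize` and the rows are hypothetical:

```python
def drop_columns(rows, ignored):
    """Remove the columns whose indices are in `ignored` (e.g. name and sector)."""
    return [[v for j, v in enumerate(row) if j not in ignored] for row in rows]


def min_max_normalize(rows):
    """Rescale each numeric column to [0, 1] using its observed minimum and maximum."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column has no spread, so map it to 0.0 instead of dividing by zero.
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical rows: name, assets (millions), employees (thousands), sector.
rows = [
    ["Alpha", 1000.0, 10.0, "Retail"],
    ["Beta", 5000.0, 30.0, "Energy"],
    ["Gamma", 3000.0, 20.0, "Retail"],
]
numeric = drop_columns(rows, ignored={0, 3})
print(min_max_normalize(numeric))  # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
```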

• Time
13:41:32: Command: weka.clusterers.SimpleKMeans -N 4 -A
"weka.core.EuclideanDistance -R first-last" -I 500 -S 10
13:41:32: Finished weka.clusterers.SimpleKMeans

• Analysis
(2). Hierarchical Clusterer
• Accuracy
With the following settings, the results below were obtained:
Scheme: weka.clusterers.HierarchicalClusterer -N 4
-L COMPLETE -P -A "weka.core.EuclideanDistance -R first-last"
Relation: relation
Instances: 30
Attributes: 8
Assets
Sales
Market_Value
Profits
Cash_Flow
Employees
Ignored:
Company
sector
Test mode: evaluate on training data

=== Model and evaluation on training set ===

Cluster 0
(((((18.2:0.11383,21.9:0.11383):0.03855,6.2:0.15238):0.14323,
10.8:0.2956):0.04322,((((1.1:0.0295,
(2.1:0.01291,2.1:0.01291):0.01659):0.04114,
((4.1:0.01,4.1:0.01):0.00919,3.0:0.0192):0.05144):0.09718,
(4.8:0.11296,
(((2.8:0.02581,3.8:0.02581):0.0058,0.7:0.03161):0.05233,
(3.8:0.03522,2.8:0.03522):0.04872):0.02901):0.05487):0.03626,
(((20.8:0.0538,22.5:0.0538):0.02151,
(12.6:0.02879,16.0:0.02879):0.04652):0.04714,
((19.4:0.04674,15.4:0.04674):0.0245,13.2:0.07124):0.05121):0.08163):0.13474):0.35165,
(23.4:0.34169,49.5:0.34169):0.34879)

Cluster 1
(143.8:0.61292,(128.0:0.46779,87.3:0.46779):0.14513)

Clustered Instances

0 25 ( 83%)
1 3 ( 10%)
2 1 ( 3%)
3 1 ( 3%)

• Time
13:44:30: Command: weka.clusterers.HierarchicalClusterer -N 4 -L COMPLETE -P
-A "weka.core.EuclideanDistance -R first-last"
13:44:30: Finished weka.clusterers.HierarchicalClusterer

• Analysis
(3). Simple EM (expectation maximisation)
• Accuracy
With the following settings, the results below were obtained:

Scheme: weka.clusterers.EM -I 100 -N 4 -M 1.0E-6 -S 100
Relation: relation
Instances: 30
Attributes: 8
Assets
Sales
Market_Value
Profits
Cash_Flow
Employees
Ignored:
Company
sector
Test mode: evaluate on training data

=== Model and evaluation on training set ===

EM
==

Number of clusters: 4

Cluster
Attribute 0 1 2 3
(0.03) (0.13) (0.63) (0.2)
===========================================================
Assets
mean 44736 11097.7023 1888.3692 6543.0483
std. dev. 8673.648 5965.852 1649.3666 3380.1425

Sales
mean 16197 8505.2202 879.7877 3450.2544
std. dev. 3750.7583 861.5581 540.9635 1713.8357

Market_Value
mean 4653 7810.4384 673.3435 2399.41
std. dev. 2652.5976 2201.2806 360.6344 1003.3248

Profits
mean -732.5 374.5248 44.9984 238.7316
std. dev. 283.697 487.8764 30.6113 180.9456

Cash_Flow
mean -651.9 967.7468 76.9354 515.2209
std. dev. 522.2425 958.9774 41.8034 236.0391

Employees
mean 48.5 109.6243 8.1591 21.6649
std. dev. 36.9972 27.0078 7.2381 13.8197

Clustered Instances

0 1 ( 3%)
1 4 ( 13%)
2 19 ( 63%)
3 6 ( 20%)

Log likelihood: -41.37623


• Time
13:48:16: Command: weka.clusterers.EM -I 100 -N 4 -M 1.0E-6 -S 100
13:48:16: Finished weka.clusterers.EM

• Analysis
