digilib.uns.ac.id
UNDERGRADUATE THESIS (SKRIPSI)
Submitted in Partial Fulfillment of the Requirements for the Bachelor's Degree (Strata Satu)
in the Department of Informatics
Prepared by:
Andriyanto Dwi N
Student ID (NIM): M0508085
DEPARTMENT OF INFORMATICS
FACULTY OF MATHEMATICS AND NATURAL SCIENCES
UNIVERSITAS SEBELAS MARET
SURAKARTA
2013
commit to user
perpustakaan.uns.ac.id
TITLE PAGE
ABSTRACT
Spam is a serious nuisance, so email needs to be filtered and classified. Several
methods can classify email; two of them are Bayesian Chi-Square and the Naive Bayes
Classifier, both of which classify email mathematically based on the words, phrases,
and domains it contains.
This research analyzes spam filtering on a mail server using the Bayesian
Chi-Square and Naive Bayes Classifier methods. The two methods were compared to
determine which is more effective at spam filtering. Both can be integrated with the
mail server, and both were trained on the TREC2007 data set, whose messages are
already classified as ham or spam. The samples were drawn at random from TREC2007.
Training data was added in stages: up to 750 emails in the first phase, up to 1050
emails in the second, and up to 1350 in the last. After each training phase, both
methods were tested on 300 randomly sampled emails. For the Bayesian Chi-Square
method, the test was repeated while changing the threshold between spam and ham, in
order to find the best threshold to use.
Based on the test results, the Bayesian Chi-Square method achieved its best
accuracy, 87%, at thresholds of 40 and 60. The Naive Bayes Classifier method
performed better: with the required default value of 5, it reached a best accuracy of
92.6%. Even at this accuracy some errors remain. The first is spam classified as ham,
which interferes with the performance of the server. The second is ham classified as
spam, so that email that belongs in the inbox is marked as spam or deleted. Bayesian
Chi-Square also produces an "unsure" result, which means the user must classify the
email manually; the rate of unsure results is inversely proportional to accuracy.
Keywords: Bayesian Chi-Square, Email, Ham, Naive Bayes Classifier, Spam
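The evaluation outlined in the abstract (a score compared against thresholds, with results counted as correct, spam-as-ham, ham-as-spam, or unsure) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the 0-100 score scale, and the reading of 40 and 60 as a ham cutoff and a spam cutoff are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical cutoffs: treating 40 and 60 as a ham/spam cutoff pair on a
# 0-100 scale is an assumption, not something the abstract states.
def classify(score, ham_cutoff=40, spam_cutoff=60):
    """Map a spam score to 'ham', 'unsure', or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"

def evaluate(samples, ham_cutoff=40, spam_cutoff=60):
    """Count outcomes for (score, true_label) pairs and compute accuracy."""
    counts = Counter()
    total = 0
    for score, truth in samples:
        total += 1
        predicted = classify(score, ham_cutoff, spam_cutoff)
        if predicted == truth:
            counts["correct"] += 1
        elif predicted == "unsure":
            counts["unsure"] += 1        # user must classify manually
        elif truth == "spam":
            counts["spam_as_ham"] += 1   # spam that reaches the inbox
        else:
            counts["ham_as_spam"] += 1   # legitimate mail marked as spam
    if total == 0:
        raise ValueError("no samples to evaluate")
    return {
        "accuracy": counts["correct"] / total,
        "spam_as_ham": counts["spam_as_ham"],
        "ham_as_spam": counts["ham_as_spam"],
        "unsure": counts["unsure"],
    }

# Made-up scores, purely to exercise the function.
demo = [(10, "ham"), (90, "spam"), (50, "spam"), (70, "ham"), (20, "spam")]
result = evaluate(demo)
```

Narrowing the gap between the two cutoffs in this sketch reduces the number of unsure results but pushes borderline messages into one of the two error types.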
MOTTO
DEDICATION
PREFACE
Bismillahirrahmaanirrahiim
The author gives praise and thanks to Allah Subhanahu Wa Ta'ala, who continually
bestows His blessings and grace, so that the author was able to complete this thesis,
entitled "Analisis Spam Filtering pada Mail Server dengan Metode Bayesian-Chi Square
dan Naive Bayes Classifier" (Analysis of Spam Filtering on a Mail Server Using the
Bayesian Chi-Square and Naive Bayes Classifier Methods), which is one of the
requirements for obtaining the Bachelor of Informatics degree at Universitas Sebelas
Maret (UNS) Surakarta.
The author is aware of his own limitations and of the great amount of guidance,
help, and motivation received while preparing this thesis. The author therefore
expresses his thanks to:
1. Ms. Umi Salamah, S.Si., M.Kom., Head of the Undergraduate Informatics Department,
2. Mr. Abdul Aziz, S.Kom., M.Cs., First Supervisor, who patiently guided, directed,
and motivated the author throughout the preparation of this thesis,
3. Mr. Ristu Saptono, S.Si., M.T., Second Supervisor, who patiently guided, directed,
and motivated the author throughout the preparation of this thesis,
4. Mr. Wiharto, S.T., M.Kom., Academic Advisor, who gave a great deal of guidance and
direction during the author's studies in the Department of Informatics, FMIPA UNS,
5. The lecturers of the Department of Informatics, FMIPA UNS, who taught the author
during his studies and helped in the preparation of this thesis,
6. My mother, father, and older siblings, and the friends whose help made it possible
to complete this thesis.
The author hopes that this thesis will be useful to all interested parties.
Surakarta, May 2013
The Author
TABLE OF CONTENTS
TITLE PAGE
ABSTRAK (Indonesian abstract)
ABSTRACT
MOTTO
DEDICATION
PREFACE
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES
LIST OF SYMBOLS
CHAPTER I INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Scope of the Problem
1.4 Research Objectives
1.5 Research Benefits
1.6 Organization of the Thesis
CHAPTER II LITERATURE REVIEW
2.1 Theoretical Foundation
2.1.1 Mail
2.1.2 Spam Mail
2.1.3 Spam Filtering
2.1.4 Mail Server
2.1.5 Statistical Filtering
LIST OF TABLES
Table 3.1 Training Data 50%
Table 3.2 Training Data 70%
Table 3.3 Training Data 90%
Table 3.4 Email Identification
Table 4.1 Test Results for 50% Training Data
Table 4.2 Test Results, 70%
Table 4.3 Test Results, 90%
LIST OF FIGURES
Figure 2.1 Mail Server
Figure 2.2 Mail Delivery Process
Figure 3.1 Research Design Flow
Figure 4.1 Testing with 50% Training Data
Figure 4.2 Testing with 70% Training Data
Figure 4.3 Test Results with 90% Training Data
LIST OF APPENDICES
1.
2.
3.
4.
5.
6.
LIST OF SYMBOLS
p(S|W)
p(W|S)
P(W|H)
P(S)
P(H)
spam
q
f(w)
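For orientation, the probability symbols in this list are conventionally related by Bayes' theorem. The relation below is the standard textbook form, written with the list's own symbols. It assumes that S and H denote the spam and ham classes and W a word (token). These readings are inferred from the symbols themselves, since the list gives no descriptions, and the thesis body may define its own variant.

```latex
p(S \mid W) = \frac{p(W \mid S)\, P(S)}{p(W \mid S)\, P(S) + P(W \mid H)\, P(H)}
```

Under this reading, p(S|W) is the probability that a message is spam given that it contains W, and the denominator is the total probability of observing W in either class.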