Text Classification with TFIDF - Ipynb - Colaboratory

6/13/22, 10:18 AM Klasifikasi Teks dengan TFIDF.ipynb - Colaboratory
pip install sastrawi
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/p

Collecting sastrawi
Downloading Sastrawi-1.0.1-py2.py3-none-any.whl (209 kB)
|████████████████████████████████| 209 kB 29.3 MB/s
Installing collected packages: sastrawi
Successfully installed sastrawi-1.0.1
## for data
import pandas as pd
import numpy as np
## for plotting
import matplotlib.pyplot as plt
import seaborn as sns
## for preprocessing
import re
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory
from Sastrawi.StopWordRemover.StopWordRemoverFactory import StopWordRemoverFactory
## for bag-of-words
from sklearn import feature_extraction, model_selection, naive_bayes, metrics
df = pd.read_csv('data sentimen dki.csv')
df.head()
Id Sentiment Pasangan Calon Text Tweet
0 1 negative Agus-Sylvi Banyak akun kloning seolah2 pendukung #agussil...
1 2 negative Agus-Sylvi #agussilvy bicara apa kasihan yaa...lap itu ai...
2 3 negative Agus-Sylvi Kalau aku sih gak nunggu hasil akhir QC tp lag...
3 4 negative Agus-Sylvi Kasian oh kasian dengan peluru 1milyar untuk t...
4 5 negative Agus-Sylvi Maaf ya pendukung #AgusSilvy..hayo dukung #Ani...
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory # import the stemming module
from Sastrawi.StopWordRemover.StopWordRemoverFactory import StopWordRemoverFactory # import the stopword-removal module
algstem = StemmerFactory()
stemmer = algstem.create_stemmer() # create the stemmer (stemming algorithm)
stpw = StopWordRemoverFactory()
stplist = set(stpw.get_stop_words()) # the library's built-in stopword list, stored as a set for fast lookups
cleantext = "[^A-Za-z0-9]" # regex: any character that is not an ASCII letter or digit
https://colab.research.google.com/drive/1m7roGmVakScuYcw_ulzQrquQeD5MsvVh?authuser=1#scrollTo=-j_gKYhG1w9a&printMode=true 1/4
def preparasi(teks):
    tokens = []
    # remove noise (non-alphanumeric characters) and lowercase
    teks = re.sub(cleantext, ' ', str(teks).lower()).strip()
    # note: no spell checking is performed (an NLTK edit-distance check was planned but not implemented)
    for word in teks.split():
        # stemming with Sastrawi
        token = stemmer.stem(word)
        # remove stopwords: check whether the stemmed token is a stopword
        if token not in stplist:
            tokens.append(token)
    return " ".join(tokens)

df['preprocessing'] = df['Text Tweet'].apply(preparasi)
df.head()
Id Sentiment Pasangan Calon Text Tweet preprocessing
0 1 negative Agus-Sylvi Banyak akun kloning seolah2 pendukung #agussil... banyak
1 2 negative Agus-Sylvi #agussilvy bicara apa kasihan yaa...lap itu ai... agu
2 3 negative Agus-Sylvi Kalau aku sih gak nunggu hasil akhir QC tp lag... kala
3 4 negative Agus-Sylvi Kasian oh kasian dengan peluru 1milyar untuk t... kas
4 5 negative Agus-Sylvi Maaf ya pendukung #AgusSilvy..hayo dukung #Ani... maaf d
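The `preparasi` function lowercases the text, replaces every non-alphanumeric character with a space, stems each word, and then drops stemmed words that appear in the stopword list. The stdlib-only sketch below mirrors that order so it can run without Sastrawi. `toy_stem` and `TOY_STOPWORDS` are hypothetical stand-ins (a crude suffix stripper and a five-word list), not Sastrawi's algorithm or stopword list. The sketch also memoizes the stemmer with `functools.lru_cache`; this is our own assumption about speeding up per-word stemming, since many words repeat across tweets.

```python
import re
from functools import lru_cache

# Same idea as `cleantext`; after lower() only a-z and 0-9 need to survive.
NON_ALNUM = re.compile(r"[^a-z0-9]")

# Hypothetical stand-ins for Sastrawi's stopword list and stemmer.
TOY_STOPWORDS = frozenset({"yang", "dan", "di", "ya", "sih"})
TOY_SUFFIXES = ("nya", "kan", "an")


@lru_cache(maxsize=None)
def toy_stem(word: str) -> str:
    """Crude suffix stripper for illustration only (NOT Sastrawi's algorithm).

    lru_cache memoizes results, so a word repeated across tweets is stemmed once.
    """
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, stem=toy_stem, stopwords=TOY_STOPWORDS):
    """Lowercase, replace non-alphanumerics with spaces, stem, then drop stopwords."""
    text = NON_ALNUM.sub(" ", str(text).lower()).strip()
    stems = (stem(word) for word in text.split())
    # As in `preparasi`, the stopword check is applied to the stemmed token.
    return " ".join(s for s in stems if s not in stopwords)


print(preprocess("Maaf ya pendukung #AgusSilvy..hayo"))  # → maaf pendukung agussilvy hayo
print(preprocess("Kasian oh kasian"))  # → kasi oh kasi
```

With Sastrawi installed, the real components can be plugged in, for example `preprocess(text, stem=lru_cache(maxsize=None)(stemmer.stem), stopwords=stplist)`. The cache changes only the speed, not the output.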
df_baru = df[['Sentiment','preprocessing']]
df_baru.head()
df_train, df_test = model_selection.train_test_split(df_baru, test_size=0.3)
y_train = df_train['Sentiment'].values # training labels (y)
y_test = df_test['Sentiment'].values # test labels (y)
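`train_test_split` is called here without `random_state` or `stratify`, so each run draws a different split and the printed scores will vary between runs. Passing `stratify=df_baru['Sentiment']` and a fixed `random_state` keeps the negative/positive ratio the same in both parts and makes the split reproducible. The scores would then differ from the ones printed below. The plain-Python sketch illustrates what a stratified split does. It is a simplified illustration rather than scikit-learn's exact allocation algorithm, and `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.3, seed=42):
    """Return (train_idx, test_idx) with roughly the same class ratio in each part."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)  # fixed seed -> the same split on every run
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)  # take test_size of EACH class
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = ["negative"] * 60 + ["positive"] * 40
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # → 70 30
print(sum(labels[i] == "positive" for i in test_idx))  # → 12 (40% of 30)
```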
vectorTFIDF = feature_extraction.text.TfidfVectorizer(ngram_range=(1,2)) # TF-IDF vectorizer over unigrams and bigrams
x = df_train['preprocessing']
X_train = vectorTFIDF.fit_transform(x) # training features (X): vocabulary and IDF are learned from the training set only
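`TfidfVectorizer(ngram_range=(1,2))` builds a vocabulary of unigrams and bigrams and weights each term by its raw count times its inverse document frequency. With scikit-learn's defaults (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`, token pattern `(?u)\b\w\w+\b`), the IDF is ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing term t. Each row is then scaled to unit Euclidean length. The sketch reproduces these formulas in plain Python, using dictionaries instead of sparse matrices. The function names and toy documents are hypothetical, and details such as column ordering and `max_features` are left out.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token_pattern


def ngrams(doc, n_min=1, n_max=2):
    """Lowercased word n-grams (unigrams and bigrams by default)."""
    tokens = TOKEN_RE.findall(doc.lower())
    out = []
    for n in range(n_min, n_max + 1):
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out


def fit_idf(docs):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1, as if one extra doc held every term."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(ngrams(doc)))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in doc_freq.items()}


def transform(doc, idf):
    """Raw count * IDF for known terms, then L2-normalize; unseen terms are dropped."""
    counts = Counter(t for t in ngrams(doc) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


docs = ["dukung anies", "dukung agus", "tolak agus"]
idf = fit_idf(docs)
vec = transform("dukung anies", idf)
# "anies" occurs in fewer documents than "dukung", so it gets a larger weight.
print({t: round(w, 3) for t, w in vec.items()})
```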
# train the model on the training data
classifier = naive_bayes.MultinomialNB()
model = classifier.fit(X_train, y_train)
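`MultinomialNB` predicts the class c that maximizes log P(c) + Σ_f x_f · log P(f | c). Here P(c) is the class frequency in the training set, and P(f | c) = (total value of feature f in class c + alpha) / (total value of all features in class c + alpha · number of features), with `alpha=1.0` by default. Because the inputs are TF-IDF weights rather than integer counts, the "counts" are fractional; the scikit-learn documentation notes that this can also work in practice. The plain-Python sketch implements that rule on dictionaries. `fit_mnb`, `predict_mnb`, and the toy features are hypothetical illustrations, not scikit-learn internals.

```python
import math
from collections import defaultdict


def fit_mnb(X, y, alpha=1.0):
    """X: list of {feature: value} dicts (e.g. TF-IDF weights); y: list of labels."""
    classes = sorted(set(y))
    vocab = {f for row in X for f in row}
    log_prior, log_prob = {}, {}
    for c in classes:
        rows = [row for row, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(y))  # class frequency
        totals = defaultdict(float)
        for row in rows:
            for f, v in row.items():
                totals[f] += v
        denom = sum(totals.values()) + alpha * len(vocab)
        # Additive smoothing: features never seen in class c get alpha / denom, not zero.
        log_prob[c] = {f: math.log((totals[f] + alpha) / denom) for f in vocab}
    return classes, log_prior, log_prob


def predict_mnb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    classes, log_prior, log_prob = model

    def score(c):
        # Features outside the training vocabulary are ignored.
        return log_prior[c] + sum(
            v * log_prob[c][f] for f, v in row.items() if f in log_prob[c]
        )

    return max(classes, key=score)


X = [{"dukung": 1.0, "amanah": 1.0}, {"dukung": 1.0},
     {"tolak": 1.0, "gila": 1.0}, {"gila": 1.0}]
y = ["positive", "positive", "negative", "negative"]
model = fit_mnb(X, y)
print(predict_mnb(model, {"dukung": 1.0}))  # → positive
print(predict_mnb(model, {"gila": 1.0}))  # → negative
```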
x_test = df_test['preprocessing']
X_test = vectorTFIDF.transform(x_test)
# evaluate on the test set, whose true labels are known
hasil = model.predict(X_test)
hasil
accuracy = metrics.accuracy_score(y_test, hasil)
print("Accuracy:", round(accuracy,2))
print("Detail:")
print(metrics.classification_report(y_test, hasil))
Accuracy: 0.76
Detail:
              precision    recall  f1-score   support

    negative       0.84      0.64      0.73       138
    positive       0.70      0.87      0.78       132

    accuracy                           0.76       270
   macro avg       0.77      0.76      0.75       270
weighted avg       0.77      0.76      0.75       270
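Every number in this report follows from a 2×2 confusion matrix. The notebook does not print one. The counts used below are inferred rather than observed: they are the only integer counts consistent with the supports (138 and 132), the rounded recalls (0.64 and 0.87), and the rounded accuracy (0.76). Running `metrics.confusion_matrix(y_test, hasil)` would show the actual matrix for a given run. The sketch recomputes precision, recall, F1, and the macro and weighted averages from those counts. `report` is a hypothetical helper.

```python
def report(cm, labels):
    """cm[i][j] = number of samples with true label i that were predicted as label j."""
    n = sum(sum(row) for row in cm)
    rows = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(len(labels)))  # column sum
        support = sum(cm[k])  # row sum = number of true samples of this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows[label] = (precision, recall, f1, support)
    accuracy = sum(cm[k][k] for k in range(len(labels))) / n
    # macro: unweighted mean over classes; weighted: mean weighted by support
    macro = tuple(sum(r[i] for r in rows.values()) / len(rows) for i in range(3))
    weighted = tuple(sum(r[i] * r[3] for r in rows.values()) / n for i in range(3))
    return rows, accuracy, macro, weighted


# Inferred counts (see the lead-in), not printed by the notebook.
cm = [[89, 49],   # true negative: 89 predicted negative, 49 predicted positive
      [17, 115]]  # true positive: 17 predicted negative, 115 predicted positive
rows, accuracy, macro, weighted = report(cm, ["negative", "positive"])
for label, (p, r, f, s) in rows.items():
    print(f"{label:>12} {p:.2f} {r:.2f} {f:.2f} {s}")
print(f"accuracy {accuracy:.2f}")
print("macro avg", [round(v, 2) for v in macro])
print("weighted avg", [round(v, 2) for v in weighted])
```

Under these inferred counts, 49 of the 138 negative tweets were labeled positive. This explains why negative recall (0.64) is much lower than positive recall (0.87).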
# create a DataFrame of new, unlabeled data
d = {'teks': ['aku sih mendukung calon yang amanah seperti anies', 'kasian si anies, sudah
df_nolabel = pd.DataFrame(data=d)
# preprocess the new data with the same function used for training
df_nolabel['preprocessing'] = df_nolabel['teks'].apply(preparasi)
df_nolabel.head()
teks preprocessing
0 aku sih mendukung calon yang amanah seperti anies aku sih dukung calon amanah anies
1 kasian si anies, sudah dipecat jadi menteri ma... kasi si anies pecat jadi menteri gila jabat
X_nolabel = vectorTFIDF.transform(df_nolabel['preprocessing']) # TF-IDF vectors for the new data, using the vocabulary fitted on the training set
model.predict(X_nolabel) # predict labels for the new data
array(['positive', 'negative'], dtype='<U8')
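New text is vectorized with the already fitted `vectorTFIDF.transform` rather than a fresh `fit_transform`. The classifier's learned weights are tied to the column order of the training vocabulary, and refitting on new text would assign different words to those columns. Words never seen during training simply get no column and are ignored. The sketch shows this with whitespace tokens and raw counts instead of TF-IDF; `fit_vocabulary` and `to_counts` are hypothetical helpers.

```python
def fit_vocabulary(docs):
    """Map each training token to a fixed column index (like a fitted vectorizer)."""
    vocab = {}
    for doc in docs:
        for token in doc.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def to_counts(doc, vocab):
    """Count vector with one column per training token; unseen tokens are dropped."""
    vec = [0] * len(vocab)
    for token in doc.split():
        if token in vocab:
            vec[vocab[token]] += 1
    return vec


train_docs = ["dukung calon amanah", "pecat menteri gila"]
vocab = fit_vocabulary(train_docs)
print(to_counts("aku sih dukung calon amanah anies", vocab))  # → [1, 1, 1, 0, 0, 0]

# Refitting on the new text reassigns columns, so column 0 no longer means "dukung".
refit = fit_vocabulary(["aku sih dukung calon amanah anies"])
print(vocab["dukung"], refit["dukung"])  # → 0 2
```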