
3/8/23, 7:39 PM Tugas 2 SC.ipynb - Colaboratory

Name: Dimas Satria Prayoga
NPM: 20081010249
Class: A 081

Credit Risk Analysis Using KNN


We start by importing all the required libraries.

1 import numpy as np
2 import pandas as pd
3 import seaborn as sns
4 import matplotlib.pyplot as plt
5 from IPython.display import display, Markdown, Latex
6 sns.set_style('whitegrid')
7
8 from sklearn.preprocessing import LabelEncoder
9 from sklearn import model_selection
10 from sklearn.cluster import KMeans
11 from sklearn.neighbors import KNeighborsClassifier
12 from sklearn.metrics import f1_score

1 from google.colab import drive
2 drive.mount('/content/drive')

Mounted at /content/drive

The code above imports the libraries needed for data processing and machine learning. In the first cell, lines 1-6 import
numpy, pandas, seaborn, matplotlib.pyplot, and IPython.display, and set the plot style to 'whitegrid'. Lines 8-12 import
several scikit-learn components: LabelEncoder to convert categorical variables to numeric codes, the model_selection module to
split the dataset into training and test data, KMeans for clustering, KNeighborsClassifier for classification, and f1_score to
compute the F1 score of the classification results. In the second cell, line 1 imports drive from google.colab and line 2
mounts Google Drive.

1. Loading and Understanding the Dataset

df_loan = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Dataset/loan_data.csv")
df_loan.head(7)

        id  member_id  loan_amnt  funded_amnt  funded_amnt_inv       term  int_rate  installment  grade  sub_grade  ...  total_bal_il
0  1077501    1296599     5000.0       5000.0           4975.0  36 months     10.65       162.87      B         B2  ...           NaN
1  1077430    1314167     2500.0       2500.0           2500.0  60 months     15.27        59.83      C         C4  ...           NaN
2  1077175    1313524     2400.0       2400.0           2400.0  36 months     15.96        84.33      C         C5  ...           NaN
3  1076863    1277178    10000.0      10000.0          10000.0  36 months     13.49       339.31      C         C1  ...           NaN
4  1075358    1311748     3000.0       3000.0           3000.0  60 months     12.69        67.79      B         B5  ...           NaN
5  1075269    1311441     5000.0       5000.0           5000.0  36 months      7.90       156.46      A         A4  ...           NaN
6  1069639    1304742     7000.0       7000.0           7000.0  60 months     15.96       170.08      C         C5  ...           NaN

7 rows × 74 columns

df_loan.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3463 entries, 0 to 3462
Data columns (total 74 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 3463 non-null int64
1 member_id 3463 non-null int64
2 loan_amnt 3463 non-null float64
3 funded_amnt 3463 non-null float64
4 funded_amnt_inv 3463 non-null float64
5 term 3463 non-null object
6 int_rate 3463 non-null float64
7 installment 3463 non-null float64
8 grade 3463 non-null object
9 sub_grade 3463 non-null object
10 emp_title 3259 non-null object
11 emp_length 3361 non-null object
12 home_ownership 3463 non-null object
13 annual_inc 3463 non-null float64
14 verification_status 3463 non-null object
15 issue_d 3463 non-null object
16 loan_status 3463 non-null object
17 pymnt_plan 3463 non-null object
18 url 3463 non-null object
19 desc 1914 non-null object
20 purpose 3463 non-null object
21 title 3463 non-null object
22 zip_code 3463 non-null object
23 addr_state 3463 non-null object
24 dti 3463 non-null float64
25 delinq_2yrs 3462 non-null float64
26 earliest_cr_line 3462 non-null object
27 inq_last_6mths 3462 non-null float64
28 mths_since_last_delinq 1025 non-null float64
29 mths_since_last_record 94 non-null float64
30 open_acc 3462 non-null float64
31 pub_rec 3462 non-null float64
32 revol_bal 3462 non-null float64
33 revol_util 3462 non-null float64
34 total_acc 3462 non-null float64
35 initial_list_status 3462 non-null object
36 out_prncp 3462 non-null float64
37 out_prncp_inv 3462 non-null float64
38 total_pymnt 3462 non-null float64
39 total_pymnt_inv 3462 non-null float64
40 total_rec_prncp 3462 non-null float64
41 total_rec_int 3462 non-null float64
42 total_rec_late_fee 3462 non-null float64
43 recoveries 3462 non-null float64
44 collection_recovery_fee 3462 non-null float64
45 last_pymnt_d 3460 non-null object
46 last_pymnt_amnt 3462 non-null float64
47 next_pymnt_d 373 non-null object
48 last_credit_pull_d 3462 non-null object
49 collections_12_mths_ex_med 3462 non-null float64
50 mths_since_last_major_derog 0 non-null float64
51 policy_code 3462 non-null float64
52 application_type 3462 non-null object

2. Dropping Irrelevant Columns

# Keep only the columns used in this analysis and drop every other column.
# Passing columns= by keyword avoids the FutureWarning raised by the positional axis argument.
keep_cols = ['loan_amnt', 'term', 'int_rate', 'installment', 'grade', 'emp_length', 'home_ownership',
             'annual_inc', 'verification_status', 'loan_status', 'purpose']
df_loan.drop(columns=df_loan.columns.difference(keep_cols), inplace=True)

df_loan.isnull().sum()

loan_amnt 0
term 0
int_rate 0
installment 0
grade 0
emp_length 102
home_ownership 0
annual_inc 0
verification_status 0
loan_status 0
purpose 0
dtype: int64

The "emp_length" column has 102 missing values; the other retained columns, including "annual_inc", have none.

df_loan.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3463 entries, 0 to 3462
Data columns (total 11 columns):

# Column Non-Null Count Dtype
--- ------ -------------- -----
0 loan_amnt 3463 non-null float64
1 term 3463 non-null object
2 int_rate 3463 non-null float64
3 installment 3463 non-null float64
4 grade 3463 non-null object
5 emp_length 3361 non-null object
6 home_ownership 3463 non-null object
7 annual_inc 3463 non-null float64
8 verification_status 3463 non-null object
9 loan_status 3463 non-null object
10 purpose 3463 non-null object
dtypes: float64(4), object(7)
memory usage: 297.7+ KB

df_loan.head(10)

   loan_amnt       term  int_rate  installment  grade  emp_length  home_ownership  annual_inc  verification_status  loan_status  ...
0     5000.0  36 months     10.65       162.87      B   10+ years            RENT     24000.0             Verified   Fully Paid  ...
1     2500.0  60 months     15.27        59.83      C    < 1 year            RENT     30000.0      Source Verified  Charged Off  ...
2     2400.0  36 months     15.96        84.33      C   10+ years            RENT     12252.0         Not Verified   Fully Paid  ...
3    10000.0  36 months     13.49       339.31      C   10+ years            RENT     49200.0      Source Verified   Fully Paid  ...
4     3000.0  60 months     12.69        67.79      B      1 year            RENT     80000.0      Source Verified      Current  ...
5     5000.0  36 months      7.90       156.46      A     3 years            RENT     36000.0      Source Verified   Fully Paid  ...

df_loan.annual_inc = df_loan.annual_inc.fillna(0)
df_loan.isnull().sum()

loan_amnt 0
term 0
int_rate 0
installment 0
grade 0
emp_length 102
home_ownership 0
annual_inc 0
verification_status 0
loan_status 0
purpose 0
dtype: int64

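To remove null values, the annual income column is filled with 0. In this dataset annual_inc has no null values, so the call
changes nothing; the 102 null values in emp_length remain and are encoded as 0 in the next step.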

3. Creating the Label Column: Loan Status

In this column, the value 0 is assigned to loans in good standing: 'Fully Paid', 'Does not meet the credit policy. Status:Fully Paid', 'Current'.
The value 1 is assigned to loans in bad standing: 'Late (31-120 days)', 'Late (16-30 days)', 'In Grace Period',
'Charged Off', 'Default', 'Does not meet the credit policy. Status:Charged Off'.

# Binary classification: 0 = good standing, 1 = bad standing.
label_categories = [
    (0, ['Fully Paid', 'Does not meet the credit policy. Status:Fully Paid', 'Current']),
    (1, ['Late (31-120 days)', 'Late (16-30 days)', 'In Grace Period',
         'Charged Off', 'Default', 'Does not meet the credit policy. Status:Charged Off'])
]

# Return the category whose status strings occur in the text, or None if nothing matches.
def classify_label(text):
    for category, matches in label_categories:
        if any(match in text for match in matches):
            return category
    return None

df_loan.loc[:, 'label'] = df_loan['loan_status'].apply(classify_label)
df_loan = df_loan.drop('loan_status', axis=1)
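
The labeling rule can be checked on a few status strings before it is applied to the whole column. The sketch below repeats the notebook's category table and function; the sample list is hypothetical and chosen only to exercise each branch, and 'Issued' is an assumed example of a status that is not in the table.

```python
# Same category table and function as the notebook cell above.
label_categories = [
    (0, ['Fully Paid', 'Does not meet the credit policy. Status:Fully Paid', 'Current']),
    (1, ['Late (31-120 days)', 'Late (16-30 days)', 'In Grace Period',
         'Charged Off', 'Default', 'Does not meet the credit policy. Status:Charged Off']),
]

def classify_label(text):
    """Return 0 or 1 for a known loan status, or None if nothing matches."""
    for category, matches in label_categories:
        if any(match in text for match in matches):
            return category
    return None

# Hypothetical sample statuses (assumption: not read from loan_data.csv).
samples = [
    'Fully Paid',
    'Current',
    'Charged Off',
    'Late (16-30 days)',
    'Does not meet the credit policy. Status:Charged Off',
    'Issued',  # not listed in label_categories
]
results = {status: classify_label(status) for status in samples}
for status, label in results.items():
    print(f"{status!r} -> {label}")
```

A status that matches no entry yields None, which pandas stores as a missing value, so checking the label column for nulls after this step is a reasonable safeguard.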


# Encode ordered categories with hand-picked integer scales.
# Any value not listed below (another grade, another ownership type, or a missing value) is encoded as 0.
GRADE_CODES = {"E": 1, "D": 2, "C": 3, "B": 4, "A": 5}
EMP_LENGTH_CODES = {
    "< 1 year": 1, "1 year": 2, "2 years": 3, "3 years": 4, "4 years": 5,
    "5 years": 6, "6 years": 7, "7 years": 8, "8 years": 9, "9 years": 10,
    "10 years": 11, "10+ years": 12,
}
HOME_OWNERSHIP_CODES = {"RENT": 1, "MORTGAGE": 2, "OWN": 3}

def SC_LabelEncoder1(text):
    return GRADE_CODES.get(text, 0)

def SC_LabelEncoder2(text):
    return EMP_LENGTH_CODES.get(text, 0)

def SC_LabelEncoder3(text):
    return HOME_OWNERSHIP_CODES.get(text, 0)

df_loan["grade"] = df_loan["grade"].apply(SC_LabelEncoder1)
df_loan["emp_length"] = df_loan["emp_length"].apply(SC_LabelEncoder2)
df_loan["home_ownership"] = df_loan["home_ownership"].apply(SC_LabelEncoder3)

df_loan.head(10)

   loan_amnt       term  int_rate  installment  grade  emp_length  home_ownership  annual_inc  verification_status         purpose  ...
0     5000.0  36 months     10.65       162.87      4          12               1     24000.0             Verified     credit_card  ...
1     2500.0  60 months     15.27        59.83      3           1               1     30000.0      Source Verified             car  ...
2     2400.0  36 months     15.96        84.33      3          12               1     12252.0         Not Verified  small_business  ...
3    10000.0  36 months     13.49       339.31      3          12               1     49200.0      Source Verified           other  ...
4     3000.0  60 months     12.69        67.79      4           2               1     80000.0      Source Verified           other  ...
5     5000.0  36 months      7.90       156.46      5           4               1     36000.0      Source Verified         wedding  ...

df_loan.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3463 entries, 0 to 3462
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 loan_amnt 3463 non-null float64
1 term 3463 non-null object
2 int_rate 3463 non-null float64
3 installment 3463 non-null float64
4 grade 3463 non-null int64
5 emp_length 3463 non-null int64
6 home_ownership 3463 non-null int64
7 annual_inc 3463 non-null float64
8 verification_status 3463 non-null object
9 purpose 3463 non-null object
10 label 3463 non-null int64
dtypes: float64(4), int64(4), object(3)
memory usage: 297.7+ KB

4. Exploratory Data Analysis

fig, ax = plt.subplots(1,2,figsize=(15,5))
sns.countplot(data=df_loan, x='grade', hue="home_ownership", ax=ax[0]).set_title("Grade/Home Ownership distribution");
sns.countplot(data=df_loan, x='home_ownership', hue='grade', ax=ax[1]).set_title("Grade/Home Ownership distribution");

fig, ax = plt.subplots(1,2,figsize=(15,5))
sns.countplot(data=df_loan, x='label', hue='purpose', ax=ax[0]).set_title("Grade Distribution with verification_status distribution");
sns.countplot(data=df_loan, x='grade', hue='label', ax=ax[1]).set_title("Grade Distribution with loan_status");

Analysis:
1. There are fewer borrowers with a high grade than with a low grade
2. For both label 0 and label 1, most borrowers take the loan for debt consolidation
3. The grade with the most repaid loans is grade 4, while the grade with the most failed loans is grade 3
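
Point 3 compares raw counts, which also depend on how many loans each grade has. A complementary view is the share of label-1 loans within each grade. The sketch below uses hypothetical (grade, label) pairs, not the real data, only to show the calculation.

```python
from collections import Counter

# Hypothetical (grade, label) pairs; grade uses the notebook's 1-5 encoding,
# label 0 = good standing, 1 = bad standing. These are NOT the real data.
rows = [(4, 0), (4, 0), (4, 0), (4, 1), (3, 0), (3, 1), (3, 1), (5, 0), (5, 0)]

# Number of loans per grade, and number of label-1 loans per grade.
totals = Counter(grade for grade, _ in rows)
bad = Counter(grade for grade, label in rows if label == 1)

# Share of label-1 loans within each grade.
bad_rate = {grade: bad[grade] / totals[grade] for grade in sorted(totals)}
for grade, rate in bad_rate.items():
    print(f"grade {grade}: {totals[grade]} loans, bad rate {rate:.2f}")
```

On the real DataFrame, an equivalent calculation would presumably be a group-by on grade followed by the mean of label.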

plt.figure(figsize=(12,6))
sns.boxplot(x='purpose', y='loan_amnt', data=df_loan)
plt.xticks(rotation=30)
plt.title('Loan amounts grouped by purpose')


Text(0.5, 1.0, 'Loan amounts grouped by purpose')

Analysis:
The five purposes with the highest loan amounts are: credit card, small business, debt consolidation, home improvement, and
house purchase

fig, ax = plt.subplots(1,2,figsize=(15,5))
sns.histplot(df_loan, x='loan_amnt',hue="label", bins=30, ax=ax[0]).set_title("Loan Amount distribution");
sns.countplot(data=df_loan, x='term', hue="label", ax=ax[1]).set_title("Term distribution");

fig, ax = plt.subplots(1,2,figsize=(15,5))
sns.countplot(data=df_loan, hue='home_ownership', x='label', ax=ax[1]).set_title("Home ownership with loan_status");
sns.countplot(data=df_loan, x='verification_status', hue='label', ax=ax[0]).set_title("Verification Status Distribution with loan_status");

Analysis:
1. The loan amount that occurs most often is around 10,000 USD
2. Most loans have a 36-month term, while 60-month loans make up almost a third
3. Most of the loans that are paid off in full have the verification status "Verified".

Looking at the correlations between variables:

# 'purpose' is still a text column at this point, so it is left out of the correlation matrix.
corr = df_loan[['loan_amnt', 'int_rate', 'grade', 'emp_length', 'home_ownership', 'annual_inc', 'label']].corr()
sns.set(rc={'figure.figsize':(11,7)})
# Mask the upper triangle; the builtin bool replaces the deprecated np.bool alias.
sns.heatmap(corr, linewidths=.5, annot=True, cmap="YlGnBu", mask=np.triu(np.ones_like(corr, dtype=bool)))\
    .set_title("Pearson Correlations Heatmap");

Analysis:
The heatmap suggests that the loan amount is closely tied to the borrower's annual income
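
Each cell of the heatmap is a Pearson correlation coefficient. As a minimal sketch of how one such value is computed, the function below (pearson is a helper name introduced here, not part of the notebook) applies the standard formula to the six loan_amnt and annual_inc pairs shown in the head() output earlier. Six rows are far too few to draw conclusions; the point is only the calculation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Rows 0-5 of the head() output shown earlier in the notebook.
loan_amnt = [5000.0, 2500.0, 2400.0, 10000.0, 3000.0, 5000.0]
annual_inc = [24000.0, 30000.0, 12252.0, 49200.0, 80000.0, 36000.0]

r = pearson(loan_amnt, annual_inc)
print(f"Pearson r on six sample rows: {r:.3f}")
```

A coefficient near 1 or -1 indicates a strong linear relationship and a value near 0 a weak one; it does not by itself show that one variable depends on the other.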

5. Preprocessing the Discrete Columns

# Use LabelEncoder() to encode the remaining categorical columns.
for col in ["verification_status", "purpose","term"]:
    le = LabelEncoder()
    df_loan[col] = le.fit_transform(df_loan[col])
df_loan.head()

loan_amnt term int_rate installment grade emp_length home_ownership annual_inc verification_status purpose label

0 5000.0 0 10.65 162.87 4 12 1 24000.0 2 1 0

1 2500.0 1 15.27 59.83 3 1 1 30000.0 1 0 1

2 2400.0 0 15.96 84.33 3 12 1 12252.0 0 10 0

3 10000.0 0 13.49 339.31 3 12 1 49200.0 1 8 0

4 3000.0 1 12.69 67.79 4 2 1 80000.0 1 8 0
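
LabelEncoder numbers the distinct values of a column in sorted order, which is consistent with the codes in the table above ('Not Verified' = 0, 'Source Verified' = 1, 'Verified' = 2). The following is a plain-Python sketch of that mapping, not scikit-learn's implementation; fit_label_mapping is a hypothetical helper name.

```python
def fit_label_mapping(values):
    """Map each distinct value to an integer code, in sorted order of the values."""
    return {value: code for code, value in enumerate(sorted(set(values)))}

# verification_status of rows 0-3 in the head() output before encoding.
verification_status = ['Verified', 'Source Verified', 'Not Verified', 'Source Verified']

mapping = fit_label_mapping(verification_status)
encoded = [mapping[v] for v in verification_status]
print(mapping)
print(encoded)
```

Because the codes follow alphabetical order, the numeric gap between two codes of a nominal column such as purpose carries no meaning, which may matter for distance-based methods such as K-means and KNN.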

df_loan.mean()

loan_amnt 12974.927808
term 0.317644
int_rate 13.051363
installment 374.799012
grade 3.352296
emp_length 6.465492
home_ownership 1.542304
annual_inc 65231.249177

verification_status 1.049957
purpose 2.933295
label 0.170373
dtype: float64

df_loan.isnull().sum()

loan_amnt 0
term 0
int_rate 0
installment 0
grade 0
emp_length 0
home_ownership 0
annual_inc 0
verification_status 0
purpose 0
label 0
dtype: int64

df_loan.label = df_loan.label.fillna(1)

6. Clustering

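inertias = []

# Fit K-means for 2 to 15 clusters and record the inertia of each fit.
for i in range(2,16):
    # n_init=10 states the current default explicitly, which avoids the n_init FutureWarning.
    kmeans = KMeans(n_clusters=i, random_state=0, n_init=10).fit(df_loan)
    inertias.append(kmeans.inertia_)

plt.figure(figsize=(10,5))
plt.title('Inertias v.s. N_Clusters')
plt.plot(np.arange(2,16),inertias, marker='o', lw=2);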
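
The loop above records the inertia of each fit, that is, the sum of squared distances from every sample to its nearest cluster center. A minimal sketch of that quantity on hypothetical 2-D points follows; inertia is a helper written for this illustration and the centroids are chosen by hand rather than fitted.

```python
def inertia(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total

# Hypothetical points forming two obvious groups (not taken from the loan data).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_cluster = inertia(points, [(5.0, 1.0)])
two_clusters = inertia(points, [(0.0, 1.0), (10.0, 1.0)])
print(one_cluster, two_clusters)
```

Inertia keeps shrinking as clusters are added, so the elbow method looks for the point where the decrease levels off rather than for the minimum.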


Analysis:
The "elbow" in the plot above is at 4, so the number of clusters is set to 4.

# n_init=10 states the current default explicitly, which avoids the n_init FutureWarning.
km = KMeans(n_clusters=4, random_state=0, n_init=10)
clusters = km.fit_predict(df_loan)

# Work on a copy so that adding the Cluster column does not raise SettingWithCopyWarning.
cluster_cols = ['loan_amnt', 'int_rate', 'grade', 'emp_length', 'home_ownership', 'annual_inc', 'purpose']
df_clustered = df_loan[cluster_cols].copy()
df_clustered["Cluster"] = clusters
sns.pairplot(df_clustered, hue="Cluster");

7. Predicting Risk: Using the K-Nearest Neighbors Classification Model

X, y = df_loan.drop("label", axis=1), df_loan["label"]
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.20, random_state=0)

# Try K = 1..99 and keep the K with the highest micro-averaged F1 score on the test split.
max_score = 0
max_k = 0
for k in range(1, 100):
    neigh = KNeighborsClassifier(n_neighbors=k)
    neigh.fit(X_train,y_train)
    score = f1_score(y_test, neigh.predict(X_test),average='micro')
    if score > max_score:
        max_k = k
        max_score = score
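
For readers unfamiliar with the classifier used in this loop, the sketch below shows the core idea of K-Nearest Neighbors on hypothetical 2-D points: find the k training points closest to the query and return the most common label among them. knn_predict is a helper written for this illustration, not scikit-learn's implementation, and its tie-breaking may differ from KNeighborsClassifier.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k):
    """Predict the majority label among the k training points nearest to query."""
    # Indices of the training points, ordered by Euclidean distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: three points labeled 0 and two points labeled 1.
train_X = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0)]
train_y = [0, 0, 0, 1, 1]

near_low = knn_predict(train_X, train_y, (1.5, 1.5), k=3)
near_high = knn_predict(train_X, train_y, (8.5, 8.0), k=3)
print(near_low, near_high)
```

Because the vote depends on Euclidean distance, columns with large numeric ranges (such as annual_inc) can dominate columns with small ranges (such as grade) unless the features are rescaled; the notebook applies no rescaling.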

print('With K-Nearest Neighbors classification, the value of K that gives the best prediction is:', max_k)

With K-Nearest Neighbors classification, the value of K that gives the best prediction is: 6

print("The accuracy (micro-averaged F1) at that K is:", max_score)

The accuracy (micro-averaged F1) at that K is: 0.8297258297258298
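
The score printed above comes from f1_score with average='micro', yet it is reported as accuracy. For single-label classification the two coincide, because every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class). A small sketch with hypothetical labels, using helper functions written for this illustration:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    classes = set(y_true) | set(y_pred)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for c in classes for t, p in pairs if t == c and p == c)
    fp = sum(1 for c in classes for t, p in pairs if t != c and p == c)
    fn = sum(1 for c in classes for t, p in pairs if t == c and p != c)
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical true and predicted labels (not the notebook's test split).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1]

acc = accuracy(y_true, y_pred)
f1_micro = micro_f1(y_true, y_pred)
print(acc, f1_micro)
```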

Because KNN (K-Nearest Neighbors) classification needs a lot of time and memory at prediction time, other ML models such as
SVC, DecisionTree, RandomForest, and Gaussian Naive Bayes could be used instead.

In this notebook, however, we use only the KNN model, which reaches a good accuracy of about 83.0%.
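
One way to put that figure in context is to compare it with a rule that always predicts label 0. The arithmetic below uses two numbers printed earlier in the notebook: the mean of the label column over the full dataset and the best KNN score. It assumes that the share of label 1 in the 20% test split is close to the share in the full dataset, which the notebook does not report.

```python
# Values copied from the notebook outputs above.
label_mean = 0.170373            # df_loan.mean(): share of label 1 in the full dataset
knn_score = 0.8297258297258298   # best micro-averaged F1 on the test split

# A rule that always predicts label 0 is correct whenever the true label is 0.
baseline_accuracy = 1.0 - label_mean
difference = knn_score - baseline_accuracy

print(f"always-0 baseline: {baseline_accuracy:.4f}")
print(f"KNN (K = 6):       {knn_score:.4f}")
print(f"difference:        {difference:.4f}")
```

Under that assumption the two figures are nearly equal, so per-class metrics such as recall for label 1 would likely say more about how well risky loans are detected than accuracy alone.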
