
Practicum VII

Welcome to the featured practicum for week 7.

[Figure: stages of building an artificial intelligence model]

The figure above shows the stages you need to go through when building an
artificial intelligence model. This week you will carry out the data cleansing
phase. This stage is usually done before the EDA stage that you studied
previously. It is needed when your data is not 'clean', so the data must first
be processed (the pre-processing stage) before it is fed into the
model-building algorithm.

This practicum is divided into 2 parts that use two different datasets. The
operations you will perform include:

• Inspect the shape of the train and test sets
• Check for NaN values; if any exist, drop them
• Check for outliers; if any exist, drop them
• Convert the types of the relevant columns
• Transform the categorical data
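
The practicum does not say which rule to use for the outlier check. One common choice, assumed here purely for illustration, is the interquartile range (IQR) rule, which flags values more than 1.5 × IQR outside the quartiles. The `scores` column below is made-up data, not the survey.

```python
import pandas as pd

# Hypothetical numeric column; 40 is an obvious extreme value.
scores = pd.Series([3, 4, 4, 5, 3, 4, 40], name="score")

# Interquartile range (IQR) and the usual 1.5 * IQR fences.
q1, q3 = scores.quantile(0.25), scores.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag values outside the fences, then keep only the rest.
is_outlier = (scores < lower) | (scores > upper)
without_outliers = scores[~is_outlier]
print(without_outliers.tolist())
```

Whether dropping is appropriate depends on the data: on a 1 to 5 rating scale, for instance, an out-of-range value is more likely an entry error than a genuine extreme.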

The operations you perform during data cleaning depend heavily on the
characteristics of the problem, the nature of the data, and the data types
present in your dataset. For reference, the following chart shows the various
data types you may encounter in a dataset.

[Figure: chart of the data types that may appear in a dataset]

import numpy as np
import pandas as pd
import sklearn
import seaborn as sns
import matplotlib.pyplot as plt

Dataset 1
The dataset you will use in this practicum is a Starbucks customer satisfaction
survey.

# Load the train and test data into a pandas DataFrame

# dataset : https://gitlab.com/andreass.bayu/file-directory/-/raw/main/Sta
concrete = pd.read_csv("https://gitlab.com/andreass.bayu/file-directory/-/

# return the number of rows and columns (the shape) of the train data
concrete.shape

(122, 21)

# display the first 10 rows

concrete.head(10)

   Timestamp                     1. Your Gender  2. Your Age    3. Are you currently....?  4. What is your annual income?  5. How often do you visit Starbucks?  ...
0  2019/10/01 12:38:43 PM GMT+8  Female          From 20 to 29  Student                    Less than RM25,000              Rarely                                ...
1  2019/10/01 12:38:54 PM GMT+8  Female          From 20 to 29  Student                    Less than RM25,000              Rarely                                ...
2  2019/10/01 12:38:56 PM GMT+8  Male            From 20 to 29  Employed                   Less than RM25,000              Monthly                               ...
3  2019/10/01 12:39:08 PM GMT+8  Female          From 20 to 29  Student                    Less than RM25,000              Rarely                                ...
4  2019/10/01 12:39:20 PM GMT+8  Male            From 20 to 29  Student                    Less than RM25,000              Monthly                               ...
5  2019/10/01 12:39:39 PM GMT+8  Female          From 20 to 29  Student                    Less than RM25,000              Rarely                                ...
6  2019/10/01 12:39:42 PM GMT+8  Female          From 20 to 29  Student                    Less than RM25,000              Rarely                                ...
7  2019/10/01 12:40:58 PM GMT+8  Male            From 20 to 29  Employed                   RM50,000 - RM100,000            Rarely                                ...
8  2019/10/01 12:42:27 PM GMT+8  Female          From 20 to 29  Student                    Less than RM25,000              Rarely                                ...
9  2019/10/01 12:43:36 PM GMT+8  Male            From 20 to 29  Employed                   Less than RM25,000              Monthly                               ...

10 rows × 21 columns

# the describe() function shows summary statistics for the numeric data ...
concrete.describe()

Columns shown (the remaining columns are cut off in the output):
12. How would you rate the quality of Starbucks compared to other brands (Coffee Bean, Old Town White Coffee..) to be:
13. How would you rate the price range at Starbucks?
14. How important are sales and promotions in your purchase decision?
15. How would you rate the ambiance at Starbucks? (lighting, music, etc...)
16. You rate the WiFi quality at Starbucks as..

              12.         13.         14.         15.         16.  ...
count  122.000000  122.000000  122.000000  122.000000  122.000000  ...
mean     3.663934    2.893443    3.795082    3.754098    3.254098  ...
std      0.941343    1.081836    1.090443    0.929867    0.958317  ...
min      1.000000    1.000000    1.000000    1.000000    1.000000  ...
25%      3.000000    2.000000    3.000000    3.000000    3.000000  ...
50%      4.000000    3.000000    4.000000    4.000000    3.000000  ...
75%      4.000000    4.000000    5.000000    4.000000    4.000000  ...
max      5.000000    5.000000    5.000000    5.000000    5.000000  ...

# check for missing values in the train data
concrete.isnull().sum().sort_values(ascending=False)



19. How do you come to hear of promotions at Starbucks? Check all that apply.    1
6. How do you usually enjoy Starbucks?    1
Timestamp    0
11. On average, how much would you spend at Starbucks per visit?    0
18. How likely you will choose Starbucks for doing business meetings or hangout with friends?    0
17. How would you rate the service at Starbucks? (Promptness, friendliness, etc..)    0
16. You rate the WiFi quality at Starbucks as..    0
15. How would you rate the ambiance at Starbucks? (lighting, music, etc...)    0
14. How important are sales and promotions in your purchase decision?    0
13. How would you rate the price range at Starbucks?    0
12. How would you rate the quality of Starbucks compared to other brands (Coffee Bean, Old Town White Coffee..) to be:    0
10. What do you most frequently purchase at Starbucks?    0
1. Your Gender    0
9. Do you have Starbucks membership card?    0
8. The nearest Starbucks's outlet to you is. ?    0
7. How much time do you normally spend during your visit?    0
5. How often do you visit Starbucks?    0
4. What is your annual income?    0
3. Are you currently....?    0
2. Your Age    0
20. Will you continue buying at Starbucks?    0
dtype: int64

Missing values are values that are undefined in the dataset. They take various
forms: blank cells, or particular symbols such as NaN (Not a Number), NA
(Not Available), ?, -, and so on. Missing values can cause problems in data
analysis and can of course affect the results of machine learning modelling.
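
Because `isnull()` only counts real NaN/None values, placeholder symbols such as `?` or `-` stay hidden until they are converted. The following is a minimal sketch of that conversion on a made-up two-column frame. The placeholder list is an assumption based on the symbols named above; check it against the actual file before use.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the placeholder symbols follow the examples in the text.
df = pd.DataFrame({
    "age": ["From 20 to 29", "?", "Below 20", None],
    "income": ["Less than RM25,000", "-", "NA", "More than RM150,000"],
})

# Treat the placeholder symbols as real missing values (NaN).
placeholders = ["?", "-", "NA", ""]
cleaned = df.replace(placeholders, np.nan)

# Count missing values per column, largest first.
missing_counts = cleaned.isnull().sum().sort_values(ascending=False)
print(missing_counts)

# Drop every row that still contains a missing value.
complete_rows = cleaned.dropna()
print(complete_rows.shape)
```

Note that `-` can also be legitimate content, such as the separator in an income range, which is why the sketch replaces only whole-cell matches.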

Practicum instructions for SosHum (social sciences and humanities) students
Perform data encoding by renaming the following columns:

1. Your Gender -> gender

2. Your Age -> age

3. Are you currently....? -> status

4. What is your annual income? -> income

5. How often do you visit Starbucks? -> visitNo

Transform the data in the following columns:

gender = 0 - Male, 1 - Female
age = 0 - Below 20, 1 - From 20 to 29, 2 - From 30 to 39, 3 - 40 and above
status = 0 - Student, 1 - Self-Employed, 2 - Employed, 3 - Housewife
income = 0 - Less than RM25,000, 1 - RM25,000 – RM50,000, 2 - RM50,000 – RM100,000, 3 - RM100,000 – RM150,000, 4 - More than RM150,000
visitNo = 0 - Daily, 1 - Weekly, 2 - Rarely, 3 - Monthly, 4 - Never

Create a histogram analysis of how the income column relates to the visitNo
column. What can you conclude?
Give your final conclusion on the Starbucks satisfaction survey based on the
dataset used.
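
For the income versus visitNo question, the counts behind a grouped histogram can be computed with `pd.crosstab`. The sketch below uses a few made-up encoded rows, not the survey responses, so the numbers it prints say nothing about Starbucks customers.

```python
import pandas as pd

# Hypothetical encoded responses, NOT the survey data.
# income: 0 = Less than RM25,000 ... 4 = More than RM150,000
# visitNo: 0 = Daily, 1 = Weekly, 2 = Rarely, 3 = Monthly, 4 = Never
sample = pd.DataFrame({
    "income":  [0, 0, 0, 1, 2, 2, 4],
    "visitNo": [2, 2, 3, 3, 1, 2, 0],
})

# Count respondents for every (income, visitNo) combination.
counts = pd.crosstab(sample["income"], sample["visitNo"])
print(counts)

# Row-normalised shares make income groups of different sizes comparable.
shares = pd.crosstab(sample["income"], sample["visitNo"], normalize="index")
print(shares.round(2))
```

Calling `counts.plot(kind="bar")` would draw the same counts as grouped bars through matplotlib.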

ANSWER

# Load the train and test data into a pandas DataFrame

# dataset : https://gitlab.com/andreass.bayu/file-directory/-/raw/main/Sta
concrete = pd.read_csv("https://gitlab.com/andreass.bayu/file-directory/-/

# return the number of rows and columns (the shape) of the train data
concrete.shape

# display the first 10 rows

concrete.head(10)

[Output: the same first 10 rows as in the table above; 10 rows × 21 columns]

list(concrete.columns)

['Timestamp',
 '1. Your Gender',
 '2. Your Age',
 '3. Are you currently....?',
 '4. What is your annual income?',
 '5. How often do you visit Starbucks?',
 '6. How do you usually enjoy Starbucks?',
 '7. How much time do you normally spend during your visit?',
 "8. The nearest Starbucks's outlet to you is. ?",
 '9. Do you have Starbucks membership card?',
 '10. What do you most frequently purchase at Starbucks?',
 '11. On average, how much would you spend at Starbucks per visit?',
 '12. How would you rate the quality of Starbucks compared to other brands (Coffee Bean, Old Town White Coffee..) to be:',
 '13. How would you rate the price range at Starbucks?',
 '14. How important are sales and promotions in your purchase decision?',
 '15. How would you rate the ambiance at Starbucks? (lighting, music, etc...)',
 '16. You rate the WiFi quality at Starbucks as..',
 '17. How would you rate the service at Starbucks? (Promptness, friendliness, etc..)',
 '18. How likely you will choose Starbucks for doing business meetings or hangout with friends?',
 '19. How do you come to hear of promotions at Starbucks? Check all that apply.',
 '20. Will you continue buying at Starbucks?']
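
The rename call in the next cell is cut off at the page edge in this export. As a hedged sketch of what a complete call could look like, the example below assumes the five short names from the instructions and uses a one-row stand-in frame (`demo`, hypothetical) in place of the survey data.

```python
import pandas as pd

# Short names taken from the practicum instructions.
new_names = {
    "1. Your Gender": "gender",
    "2. Your Age": "age",
    "3. Are you currently....?": "status",
    "4. What is your annual income?": "income",
    "5. How often do you visit Starbucks?": "visitNo",
}

# Hypothetical one-row frame standing in for the survey data.
demo = pd.DataFrame([["Female", "From 20 to 29", "Student",
                      "Less than RM25,000", "Rarely"]],
                    columns=list(new_names))

# rename() returns a new DataFrame; the original keeps its long names.
renamed = demo.rename(columns=new_names)
print(list(renamed.columns))
```

Keys that do not match an existing column are silently ignored by `rename`, so a stray space in a long question title leaves that column unrenamed without raising an error.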

# Rename the columns
dataRename = concrete.rename(columns={'1. Your Gender': 'gender', '2. Your

dataRename.head(10)

   Timestamp                     gender  age            status    income                visitNo  6. How do you usually enjoy Starbucks?  ...
0  2019/10/01 12:38:43 PM GMT+8  Female  From 20 to 29  Student   Less than RM25,000    Rarely   Dine in                                 ...
1  2019/10/01 12:38:54 PM GMT+8  Female  From 20 to 29  Student   Less than RM25,000    Rarely   Take away                               ...
2  2019/10/01 12:38:56 PM GMT+8  Male    From 20 to 29  Employed  Less than RM25,000    Monthly  Dine in                                 ...
3  2019/10/01 12:39:08 PM GMT+8  Female  From 20 to 29  Student   Less than RM25,000    Rarely   Take away                               ...
4  2019/10/01 12:39:20 PM GMT+8  Male    From 20 to 29  Student   Less than RM25,000    Monthly  Take away                               ...
5  2019/10/01 12:39:39 PM GMT+8  Female  From 20 to 29  Student   Less than RM25,000    Rarely   Dine in                                 ...
6  2019/10/01 12:39:42 PM GMT+8  Female  From 20 to 29  Student   Less than RM25,000    Rarely   Dine in                                 ...
7  2019/10/01 12:40:58 PM GMT+8  Male    From 20 to 29  Employed  RM50,000 - RM100,000  Rarely   Dine in                                 ...
8  2019/10/01 12:42:27 PM GMT+8  Female  From 20 to 29  Student   Less than RM25,000    Rarely   Drive-thru                              ...
9  2019/10/01 12:43:36 PM GMT+8  Male    From 20 to 29  Employed  Less than RM25,000    Monthly  Take away                               ...

10 rows × 21 columns

# Transform gender: 0 - Male, 1 - Female
concrete['1. Your Gender'] = concrete['1. Your Gender'].map({'Female': 1,
dataRename = concrete.rename(columns={'1. Your Gender': 'gender'})
dataRename.head(10)

   Timestamp                     gender  2. Your Age    3. Are you currently....?  4. What is your annual income?  5. How often do you visit Starbucks?  ...
0  2019/10/01 12:38:43 PM GMT+8  1       From 20 to 29  Student                    Less than RM25,000              Rarely                                ...
1  2019/10/01 12:38:54 PM GMT+8  1       From 20 to 29  Student                    Less than RM25,000              Rarely                                ...
2  2019/10/01 12:38:56 PM GMT+8  0       From 20 to 29  Employed                   Less than RM25,000              Monthly                               ...
3  2019/10/01 12:39:08 PM GMT+8  1       From 20 to 29  Student                    Less than RM25,000              Rarely                                ...
4  2019/10/01 12:39:20 PM GMT+8  0       From 20 to 29  Student                    Less than RM25,000              Monthly                               ...
5  2019/10/01 12:39:39 PM GMT+8  1       From 20 to 29  Student                    Less than RM25,000              Rarely                                ...
6  2019/10/01 12:39:42 PM GMT+8  1       From 20 to 29  Student                    Less than RM25,000              Rarely                                ...
7  2019/10/01 12:40:58 PM GMT+8  0       From 20 to 29  Employed                   RM50,000 - RM100,000            Rarely                                ...
8  2019/10/01 12:42:27 PM GMT+8  1       From 20 to 29  Student                    Less than RM25,000              Rarely                                ...
9  2019/10/01 12:43:36 PM GMT+8  0       From 20 to 29  Employed                   Less than RM25,000              Monthly                               ...

10 rows × 21 columns

# Transform age: Below 20 = 0, From 20 to 29 = 1, From 30 to 39 = 2, 4
concrete['2. Your Age'] = concrete['2. Your Age'].map({'Below 20': 0, 'Fro
dataRename = concrete.rename(columns={'2. Your Age': 'age'})
dataRename.head(10)

   Timestamp                     1. Your Gender  age  3. Are you currently....?  4. What is your annual income?  5. How often do you visit Starbucks?  ...
0  2019/10/01 12:38:43 PM GMT+8  1               1    Student                    Less than RM25,000              Rarely                                ...
1  2019/10/01 12:38:54 PM GMT+8  1               1    Student                    Less than RM25,000              Rarely                                ...
2  2019/10/01 12:38:56 PM GMT+8  0               1    Employed                   Less than RM25,000              Monthly                               ...
3  2019/10/01 12:39:08 PM GMT+8  1               1    Student                    Less than RM25,000              Rarely                                ...
4  2019/10/01 12:39:20 PM GMT+8  0               1    Student                    Less than RM25,000              Monthly                               ...
5  2019/10/01 12:39:39 PM GMT+8  1               1    Student                    Less than RM25,000              Rarely                                ...
6  2019/10/01 12:39:42 PM GMT+8  1               1    Student                    Less than RM25,000              Rarely                                ...
7  2019/10/01 12:40:58 PM GMT+8  0               1    Employed                   RM50,000 - RM100,000            Rarely                                ...
8  2019/10/01 12:42:27 PM GMT+8  1               1    Student                    Less than RM25,000              Rarely                                ...
9  2019/10/01 12:43:36 PM GMT+8  0               1    Employed                   Less than RM25,000              Monthly                               ...

10 rows × 21 columns

# Transform status: 0 - Student, 1 - Self-Employed, 2 - Employed, 3 -
concrete['3. Are you currently....?'] = concrete['3. Are you currently....
dataRename = concrete.rename(columns={'3. Are you currently....?': 'status
dataRename.head(10)

   Timestamp                     1. Your Gender  2. Your Age  status  4. What is your annual income?  5. How often do you visit Starbucks?  6. How do you usually enjoy Starbucks?
0  2019/10/01 12:38:43 PM GMT+8  1               1            0.0     Less than RM25,000              Rarely                                Dine in
1  2019/10/01 12:38:54 PM GMT+8  1               1            0.0     Less than RM25,000              Rarely                                Take away
2  2019/10/01 12:38:56 PM GMT+8  0               1            2.0     Less than RM25,000              Monthly                               Dine in
3  2019/10/01 12:39:08 PM GMT+8  1               1            0.0     Less than RM25,000              Rarely                                Take away
4  2019/10/01 12:39:20 PM GMT+8  0               1            0.0     Less than RM25,000              Monthly                               Take away
5  2019/10/01 12:39:39 PM GMT+8  1               1            0.0     Less than RM25,000              Rarely                                Dine in
6  2019/10/01 12:39:42 PM GMT+8  1               1            0.0     Less than RM25,000              Rarely                                Dine in
7  2019/10/01 12:40:58 PM GMT+8  0               1            2.0     RM50,000 - RM100,000            Rarely                                Dine in
8  2019/10/01 12:42:27 PM GMT+8  1               1            0.0     Less than RM25,000              Rarely                                Drive-thru
9  2019/10/01 12:43:36 PM GMT+8  0               1            2.0     Less than RM25,000              Monthly                               Take away

10 rows × 21 columns

# Transform income: 0 - Less than RM25,000, 1 - RM25,000 – RM50,00
concrete['4. What is your annual income?'] = concrete['4. What is your ann
dataRename = concrete.rename(columns={'4. What is your annual income?': 'i
dataRename.head(10)

   Timestamp                     1. Your Gender  2. Your Age  3. Are you currently....?  income  5. How often do you visit Starbucks?  ...
0  2019/10/01 12:38:43 PM GMT+8  1               1            0.0                        0.0     Rarely                                ...
1  2019/10/01 12:38:54 PM GMT+8  1               1            0.0                        0.0     Rarely                                ...
2  2019/10/01 12:38:56 PM GMT+8  0               1            2.0                        0.0     Monthly                               ...
3  2019/10/01 12:39:08 PM GMT+8  1               1            0.0                        0.0     Rarely                                ...
4  2019/10/01 12:39:20 PM GMT+8  0               1            0.0                        0.0     Monthly                               ...
5  2019/10/01 12:39:39 PM GMT+8  1               1            0.0                        0.0     Rarely                                ...
6  2019/10/01 12:39:42 PM GMT+8  1               1            0.0                        0.0     Rarely                                ...
7  2019/10/01 12:40:58 PM GMT+8  0               1            2.0                        2.0     Rarely                                ...
8  2019/10/01 12:42:27 PM GMT+8  1               1            0.0                        0.0     Rarely                                ...
9  2019/10/01 12:43:36 PM GMT+8  0               1            2.0                        0.0     Monthly                               ...

10 rows × 21 columns

# Transform visitNo: 0 - Daily, 1 - Weekly, 2 - Rarely, 3 - Monthl
concrete['5. How often do you visit Starbucks?'] = concrete['5. How often
dataRename = concrete.rename(columns={'5. How often do you visit Starbucks
dataRename.head(10)

   Timestamp                     1. Your Gender  2. Your Age  3. Are you currently....?  4. What is your annual income?  visitNo  ...
0  2019/10/01 12:38:43 PM GMT+8  1               1            0.0                        0.0                             2        ...
1  2019/10/01 12:38:54 PM GMT+8  1               1            0.0                        0.0                             2        ...
2  2019/10/01 12:38:56 PM GMT+8  0               1            2.0                        0.0                             3        ...
3  2019/10/01 12:39:08 PM GMT+8  1               1            0.0                        0.0                             2        ...
4  2019/10/01 12:39:20 PM GMT+8  0               1            0.0                        0.0                             3        ...
5  2019/10/01 12:39:39 PM GMT+8  1               1            0.0                        0.0                             2        ...
6  2019/10/01 12:39:42 PM GMT+8  1               1            0.0                        0.0                             2        ...
7  2019/10/01 12:40:58 PM GMT+8  0               1            2.0                        2.0                             2        ...
8  2019/10/01 12:42:27 PM GMT+8  1               1            0.0                        0.0                             2        ...
9  2019/10/01 12:43:36 PM GMT+8  0               1            2.0                        0.0                             3        ...

10 rows × 21 columns
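
In the outputs above, the status and income codes display as 0.0 and 2.0 rather than 0 and 2. One possible cause, stated here as an assumption and not verified against the survey file, is that some responses had no key in the mapping dictionary: `Series.map` turns such values into NaN, which forces a float column. A minimal sketch with made-up responses shows how to check for this:

```python
import pandas as pd

# Hypothetical responses; "Unemployed" is deliberately absent from the mapping.
status = pd.Series(["Student", "Employed", "Unemployed", "Student"])

status_codes = {"Student": 0, "Self-Employed": 1, "Employed": 2, "Housewife": 3}

# map() looks every value up in the dict; values without a key become NaN.
encoded = status.map(status_codes)
print(encoded.dtype)  # float64, because NaN cannot be stored in an int column

# List the original categories that the mapping did not cover.
unmapped = status[encoded.isna()].unique()
print(unmapped)
```

Running the same check on each encoded survey column before analysis would reveal whether any category was lost in the transformation.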

Final conclusion
From the 10 rows examined, it can be concluded that the customers who buy at
Starbucks are mostly aged 20 to 29, are students, and have an annual income of
less than RM25,000. These customers rarely visit Starbucks, but according to the
survey responses they will buy from Starbucks again in the future.

Name : R. Ario Rafi ARTANTO

Class : 2PA33

NPM (student ID) : 11521141

Summary of the big data material

Data is a collection of facts about objects. In general, data passes through
processing stages and, once processed, yields information.

• Example of data

A. Students

A student's biodata, such as the student ID number (Nomor Pokok Mahasiswa), name,
and address, is data that can be processed into information.

Working with data brings you into contact with the Internet of Things. The
Internet of Things refers to objects connected to the internet that can exchange
data with other users connected to the same network. Examples include a smart
clock connected to the Internet of Things, a gate that opens automatically from
a mobile phone, switching off lights, and CCTV that can be viewed directly from
a phone connected to the internet. Data is not only related to big data; it also
involves the Internet of Things and, ultimately, the application of science in
artificial intelligence.

Big data is data in very large volumes. It involves processing very large
amounts of data, with a very complex architecture, so that the data becomes
useful again.

The 4 V characteristics of data

1. Volume

How much data there is: how large the data we hold is.

2. Variety

The different kinds of data we have.

3. Velocity

How quickly we obtain the data, for example real-time data, or data collected
once a month, once a week, and so on.

4. Veracity

How accurate the data is that we will process for our needs.

Big data concept: when our data has more than three of the V characteristics,
we can use it for analytics.

• The big data concept is divided into three parts:

1. Data integration

When we have two datasets stored in different places, how can we link them
together?

2. Data management

How the data is organised.

3. Data analysis

This requires specific techniques.
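
As a small illustration of the data integration step, two sources that share a key can be joined with pandas `merge`. The two tables and the `npm` key below are hypothetical, loosely following the student biodata example given earlier. Real big data integration would involve far larger sources and different tooling.

```python
import pandas as pd

# Two hypothetical sources that share a student ID column (npm).
students = pd.DataFrame({"npm": [101, 102, 103],
                         "name": ["Ani", "Budi", "Citra"]})
addresses = pd.DataFrame({"npm": [101, 103, 104],
                          "city": ["Depok", "Bekasi", "Bogor"]})

# An outer join keeps rows from both sources; indicator shows their origin.
combined = students.merge(addresses, on="npm", how="outer", indicator=True)
print(combined)
```

The `_merge` column added by `indicator=True` marks which rows exist in only one source, which is often the first thing to inspect after integrating data.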

• A technology called IoT can serve as a source of data, which then becomes one
of the sources that must be combined during data integration.

Benefits of big data

1. It can identify the causes of business problems. For example, if a fried
chicken company's sales fall in a given year, the data it holds can help the
company raise sales again the following year, for instance by identifying
which factors caused the drop in income.
2. It supports decision making.
3. It is effective and efficient, because the data has more than one V
characteristic, so no preliminary processing is needed.

Applications of big data are already very close to us. Google is one example: it
makes searching for things easy, and the methods Google uses keep developing.
Other examples are smartphones and social media, where comments or news can be
used for sentiment analysis; for instance, artificial intelligence can classify
a comment as positive, negative, or neutral. Sentiment analysis data can also be
carried into e-commerce, where it shows how people feel about our product so
that we can improve what the product lacks.

Big data, with its characteristics and benefits, also brings its own challenges,
including:

1. The system must always be up to date, because it takes in data from many
channels.
2. It requires large servers, which also need maintenance.
3. A shortage of human resources.
4. Security: data leaks are a risk, so we must know how to secure the data so
that it does not leak or spread.

Big data analysis applications

Hadoop