
Practicum V

Welcome to the featured practicum for week 5.

1_L0464qoX7pSkIQMBcF73Tg.png

The figure above shows the stages you need to go through when building an artificial intelligence model. This week you will carry out
the data cleansing phase. This stage is usually done before the EDA stage that you studied previously. It is needed when your data is
not 'clean' and therefore has to be processed first (the pre-processing stage) before it is fed into the model-building algorithm.

This practicum material is divided into 2 parts that use two different datasets. The operations you will perform include:

Inspecting the shape of the train and test sets
Checking for NaN values and, if any exist, dropping them
Checking for outliers and, if any exist, dropping them
Converting the types of the relevant columns
Transforming categorical data
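
The outlier step in the list above is not demonstrated on Dataset 1, so here is a minimal sketch of one common approach, the 1.5 * IQR rule. The column name `spend` and its values are made up for illustration; the practicum may intend a different outlier criterion.

```python
import pandas as pd

# Hypothetical toy data: spending per visit, with one extreme value.
df = pd.DataFrame({"spend": [10, 12, 11, 13, 12, 11, 95]})

# Interquartile range (IQR) of the column.
q1 = df["spend"].quantile(0.25)
q3 = df["spend"].quantile(0.75)
iqr = q3 - q1

# Conventional 1.5 * IQR fences; values outside them are flagged as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (df["spend"] < lower) | (df["spend"] > upper)

# Keep only the rows that fall inside the fences.
df_clean = df[~is_outlier].reset_index(drop=True)
print(df_clean)
```

The factor 1.5 is a convention rather than a rule; a wider fence removes fewer rows.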

The operations you perform in the data cleaning stage depend heavily on the characteristics of the problem, the character of the data,
and the types of data in your dataset. For reference, the following chart shows the various data types you may encounter in a
dataset.

8UUywzzaMhY2ZGHrWE7VkA_b.png
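
As a rough illustration of how data types show up in pandas, the sketch below assumes the chart distinguishes numeric data from categorical data (nominal and ordinal); that reading of the chart is an assumption. The toy column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame mixing several kinds of data.
df = pd.DataFrame({
    "age_group": ["Below 20", "From 20 to 29", "From 20 to 29"],  # ordinal category
    "gender": ["Male", "Female", "Female"],                       # nominal category
    "rating": ["4", "5", "3"],                                    # numbers stored as text
})

# Numbers stored as text can be converted with pd.to_numeric.
df["rating"] = pd.to_numeric(df["rating"])

# Nominal data: an unordered categorical dtype.
df["gender"] = df["gender"].astype("category")

# Ordinal data: an ordered categorical dtype with an explicit order.
age_order = ["Below 20", "From 20 to 29", "From 30 to 39", "40 and above"]
df["age_group"] = pd.Categorical(df["age_group"], categories=age_order, ordered=True)

print(df.dtypes)
```

Checking `df.dtypes` before and after each conversion is a quick way to confirm that a column ended up with the intended type.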

import numpy as np 
import pandas as pd 
import sklearn
import seaborn as sns
import matplotlib.pyplot as plt

Dataset 1
The dataset you will use in this practicum is a customer satisfaction survey for Starbucks.

# Load the dataset into a pandas DataFrame
# dataset : https://gitlab.com/andreass.bayu/file-directory/-/raw/main/Starbucks_satisfactory_survey.csv
concrete = pd.read_csv("https://gitlab.com/andreass.bayu/file-directory/-/raw/main/Starbucks_satisfactory_survey.csv")

# .shape returns the number of rows and columns (the shape) of the data
concrete.shape

(122, 21)

# show the first 10 rows
concrete.head(10)
   Timestamp                     1. Your  2. Your        3. Are you      4. What is your       5. How often  6. How do you  ...
                                 Gender   Age            currently....?  annual income?        do you visit  usually enjoy
                                                                                               Starbucks?    Starbucks?
0  2019/10/01 12:38:43 PM GMT+8  Female   From 20 to 29  Student         Less than RM25,000    Rarely        Dine in        ...
1  2019/10/01 12:38:54 PM GMT+8  Female   From 20 to 29  Student         Less than RM25,000    Rarely        Take away      ...
2  2019/10/01 12:38:56 PM GMT+8  Male     From 20 to 29  Employed        Less than RM25,000    Monthly       Dine in        ...
3  2019/10/01 12:39:08 PM GMT+8  Female   From 20 to 29  Student         Less than RM25,000    Rarely        Take away      ...
4  2019/10/01 12:39:20 PM GMT+8  Male     From 20 to 29  Student         Less than RM25,000    Monthly       Take away      ...
5  2019/10/01 12:39:39 PM GMT+8  Female   From 20 to 29  Student         Less than RM25,000    Rarely        Dine in        ...
6  2019/10/01 12:39:42 PM GMT+8  Female   From 20 to 29  Student         Less than RM25,000    Rarely        Dine in        ...
7  2019/10/01 12:40:58 PM GMT+8  Male     From 20 to 29  Employed        RM50,000 - RM100,000  Rarely        Dine in        ...
8  2019/10/01 12:42:27 PM GMT+8  Female   From 20 to 29  Student         Less than RM25,000    Rarely        Drive-thru     ...
9  2019/10/01 12:43:36 PM GMT+8  Male     From 20 to 29  Employed        Less than RM25,000    Monthly       Take away      ...

10 rows × 21 columns

# describe() reports summary statistics for the numeric columns: count, mean, standard deviation, minimum, maximum, and quartiles
concrete.describe()

       12. How would you     13. How would you     14. How important     15. How would you     16. You rate the      17. How would you
       rate the quality of   rate the price range  are sales and         rate the ambiance at  WiFi quality at       rate the service at
       Starbucks compared    at Starbucks?         promotions in your    Starbucks?            Starbucks as..        Starbucks?
       to other brands                             purchase decision?    (lighting, music,                           (Promptness,
       (Coffee Bean, Old                                                 etc...)                                     friendliness,
       Town White Coffee..)                                                                                          etc..)
       to be:
count  122.000000            122.000000            122.000000            122.000000            122.000000            122.000000
mean   3.663934              2.893443              3.795082              3.754098              3.254098              3.745902
std    0.941343              1.081836              1.090443              0.929867              0.958317              0.828834
min    1.000000              1.000000              1.000000              1.000000              1.000000              1.000000

# check for missing values in the data
concrete.isnull().sum().sort_values(ascending=False)

19. How do you come to hear of promotions at Starbucks? Check all that apply. 1
6. How do you usually enjoy Starbucks? 1
Timestamp 0
11. On average, how much would you spend at Starbucks per visit? 0
18. How likely you will choose Starbucks for doing business meetings or hangout with friends? 0
17. How would you rate the service at Starbucks? (Promptness, friendliness, etc..) 0
16. You rate the WiFi quality at Starbucks as.. 0
15. How would you rate the ambiance at Starbucks? (lighting, music, etc...) 0
14. How important are sales and promotions in your purchase decision? 0
13. How would you rate the price range at Starbucks? 0
12. How would you rate the quality of Starbucks compared to other brands (Coffee Bean, Old Town White Coffee..) to be: 0
10. What do you most frequently purchase at Starbucks? 0
1. Your Gender 0
9. Do you have Starbucks membership card? 0
8. The nearest Starbucks's outlet to you is...? 0
7. How much time do you normally spend during your visit? 0
5. How often do you visit Starbucks? 0
4. What is your annual income? 0
3. Are you currently....? 0
2. Your Age 0
20. Will you continue buying at Starbucks? 0
dtype: int64

Missing values are values that are undefined in the dataset. They take many forms: blank cells, or particular symbols such as
NaN (Not a Number), NA (Not Available), ?, -, and so on. Missing values can cause problems in data analysis and can also
affect the results of machine learning modelling.
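
A minimal sketch of the check-then-drop step, on hypothetical toy data rather than the survey itself. It assumes that `?` and `-` are used as placeholder symbols; in a real dataset you would first confirm which symbols actually occur.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one real NaN and two placeholder symbols.
df = pd.DataFrame({
    "enjoy": ["Dine in", np.nan, "Take away", "Drive-thru", "?"],
    "spend": ["10", "12", "-", "15", "9"],
})

# Turn placeholder symbols into real NaN so that pandas can detect them.
df = df.replace(["?", "-"], np.nan)

# Count missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Drop every row that has at least one missing value.
df_clean = df.dropna().reset_index(drop=True)
print(df_clean)
```

Dropping rows is the option named in this practicum; with only a handful of affected rows it costs little data, but on sparser datasets imputation may be worth considering instead.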

Practicum Instructions for SosHum (Social Sciences and Humanities) Students


Perform data encoding by first renaming the following columns:

1. Your Gender -> gender

2. Your Age -> age

3. Are you currently....? -> status

4. What is your annual income? -> income

5. How often do you visit Starbucks? -> visitNo


Transform the data in the following columns:

gender = 0 - Male, 1 - Female
age = 0 - Below 20, 1 - From 20 to 29, 2 - From 30 to 39, 3 - 40 and above
status = 0 - Student, 1 - Self-employed, 2 - Employed, 3 - Housewife
income = 0 - Less than RM25,000, 1 - RM25,000 - RM50,000, 2 - RM50,000 - RM100,000, 3 - RM100,000 - RM150,000, 4 - More than RM150,000
visitNo = 0 - Daily, 1 - Weekly, 2 - Rarely, 3 - Monthly, 4 - Never
Perform a histogram analysis of the effect of the income column on the visitNo column. What can you conclude?
Give your final conclusion about the Starbucks satisfaction survey based on the dataset used.

ANSWER

list(concrete.columns)

['Timestamp',
'1. Your Gender',
'2. Your Age',
'3. Are you currently....?',
'4. What is your annual income?',
'5. How often do you visit Starbucks?',
'6. How do you usually enjoy Starbucks?',
'7. How much time do you normally spend during your visit?',
"8. The nearest Starbucks's outlet to you is...?",
'9. Do you have Starbucks membership card?',
'10. What do you most frequently purchase at Starbucks?',
'11. On average, how much would you spend at Starbucks per visit?',
'12. How would you rate the quality of Starbucks compared to other brands (Coffee Bean, Old Town White Coffee..) to be:',
'13. How would you rate the price range at Starbucks?',
'14. How important are sales and promotions in your purchase decision?',
'15. How would you rate the ambiance at Starbucks? (lighting, music, etc...)',
'16. You rate the WiFi quality at Starbucks as..',
'17. How would you rate the service at Starbucks? (Promptness, friendliness, etc..)',
'18. How likely you will choose Starbucks for doing business meetings or hangout with friends?',
'19. How do you come to hear of promotions at Starbucks? Check all that apply.',
'20. Will you continue buying at Starbucks?']

## Data encoding: rename the columns
concreteRename = concrete.rename(columns={'1. Your Gender': 'gender', '2. Your Age': 'age', '3. Are you currently....?': 'status',
                                          '4. What is your annual income?': 'income', '5. How often do you visit Starbucks?': 'visitNo'})
concreteRename.head(5)
   Timestamp                     gender  age            status    income              visitNo  6. How do you  7. How much time do you       ...
                                                                                               usually enjoy  normally spend during your
                                                                                               Starbucks?     visit?
0  2019/10/01 12:38:43 PM GMT+8  Female  From 20 to 29  Student   Less than RM25,000  Rarely   Dine in        Between 30 minutes to 1 hour  ...
1  2019/10/01 12:38:54 PM GMT+8  Female  From 20 to 29  Student   Less than RM25,000  Rarely   Take away      Below 30 minutes              ...
2  2019/10/01 12:38:56 PM GMT+8  Male    From 20 to 29  Employed  Less than RM25,000  Monthly  Dine in        Between 30 minutes to 1 hour  ...
3  2019/10/01 12:39:08 PM GMT+8  Female  From 20 to 29  Student   Less than RM25,000  Rarely   Take away      Below 30 minutes              ...
4  2019/10/01 12:39:20 PM GMT+8  Male    From 20 to 29  Student   Less than RM25,000  Monthly  Take away      Between 30 minutes to 1 hour  ...

5 rows × 21 columns

concreteRename.gender.unique()

array(['Female', 'Male'], dtype=object)

concreteRename.age.unique()

array(['From 20 to 29', 'From 30 to 39', '40 and above', 'Below 20'],
      dtype=object)

concreteRename.status.unique()

array(['Student', 'Employed', 'Self-employed', 'Housewife'], dtype=object)

concreteRename.income.unique()

array(['Less than RM25,000', 'RM50,000 - RM100,000',
       'RM25,000 - RM50,000', 'RM100,000 - RM150,000',
       'More than RM150,000'], dtype=object)

concreteRename.visitNo.unique()

array(['Rarely', 'Monthly', 'Weekly', 'Never', 'Daily'], dtype=object)
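
The `unique()` calls above list every category present in each column. Before encoding, it can help to confirm that the mapping dictionary covers all of them, because `Series.map` returns NaN for any value that has no key in the dictionary. The sketch below uses a hypothetical toy column and a deliberately incomplete mapping (`incomplete_map`) to show the effect.

```python
import pandas as pd

# Hypothetical toy column; "Rarely" is deliberately absent from the mapping.
visits = pd.Series(["Daily", "Weekly", "Rarely", "Never"])
incomplete_map = {"Daily": 0, "Weekly": 1, "Monthly": 3, "Never": 4}

# Categories present in the data but absent from the mapping keys.
unmapped = sorted(set(visits.dropna().unique()) - set(incomplete_map))
print("Unmapped categories:", unmapped)

# Series.map returns NaN for every value that has no key in the dict.
encoded = visits.map(incomplete_map)
print(encoded)
```

Running the same check on each column before calling `.map` avoids silently introducing new missing values during encoding.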

## Transform the gender column
gender_map = {'Male':0, 'Female':1}

concreteRename['gender'] = concreteRename['gender'].map(gender_map)
concreteRename.head()
   Timestamp                     gender  age            status    income              visitNo  6. How do you  7. How much time do you       ...
                                                                                               usually enjoy  normally spend during your
                                                                                               Starbucks?     visit?
0  2019/10/01 12:38:43 PM GMT+8  1       From 20 to 29  Student   Less than RM25,000  Rarely   Dine in        Between 30 minutes to 1 hour  ...
1  2019/10/01 12:38:54 PM GMT+8  1       From 20 to 29  Student   Less than RM25,000  Rarely   Take away      Below 30 minutes              ...
2  2019/10/01 12:38:56 PM GMT+8  0       From 20 to 29  Employed  Less than RM25,000  Monthly  Dine in        Between 30 minutes to 1 hour  ...
3  2019/10/01 12:39:08 PM GMT+8  1       From 20 to 29  Student   Less than RM25,000  Rarely   Take away      Below 30 minutes              ...
4  2019/10/01 12:39:20 PM GMT+8  0       From 20 to 29  Student   Less than RM25,000  Monthly  Take away      Between 30 minutes to 1 hour  ...

5 rows × 21 columns
## Transform the age column
age_map = {'Below 20':0, 'From 20 to 29':1, 'From 30 to 39':2, '40 and above':3}

concreteRename['age'] = concreteRename['age'].map(age_map)
concreteRename.head()
                                                  
   Timestamp                     gender  age            status    income              visitNo  6. How do you  7. How much time do you       ...
                                                                                               usually enjoy  normally spend during your
                                                                                               Starbucks?     visit?
0  2019/10/01 12:38:43 PM GMT+8  1       1              Student   Less than RM25,000  Rarely   Dine in        Between 30 minutes to 1 hour  ...
1  2019/10/01 12:38:54 PM GMT+8  1       1              Student   Less than RM25,000  Rarely   Take away      Below 30 minutes              ...
2  2019/10/01 12:38:56 PM GMT+8  0       1              Employed  Less than RM25,000  Monthly  Dine in        Between 30 minutes to 1 hour  ...
3  2019/10/01 12:39:08 PM GMT+8  1       1              Student   Less than RM25,000  Rarely   Take away      Below 30 minutes              ...
4  2019/10/01 12:39:20 PM GMT+8  0       1              Student   Less than RM25,000  Monthly  Take away      Between 30 minutes to 1 hour  ...

5 rows × 21 columns

## Transform the status column
status_map = {'Student':0, 'Self-employed':1, 'Employed':2, 'Housewife':3}

concreteRename['status'] = concreteRename['status'].map(status_map)
concreteRename.head()

   Timestamp                     gender  age            status    income              visitNo  6. How do you  7. How much time do you       ...
                                                                                               usually enjoy  normally spend during your
                                                                                               Starbucks?     visit?
0  2019/10/01 12:38:43 PM GMT+8  1       1              0         Less than RM25,000  Rarely   Dine in        Between 30 minutes to 1 hour  ...
1  2019/10/01 12:38:54 PM GMT+8  1       1              0         Less than RM25,000  Rarely   Take away      Below 30 minutes              ...
2  2019/10/01 12:38:56 PM GMT+8  0       1              2         Less than RM25,000  Monthly  Dine in        Between 30 minutes to 1 hour  ...
3  2019/10/01 12:39:08 PM GMT+8  1       1              0         Less than RM25,000  Rarely   Take away      Below 30 minutes              ...
4  2019/10/01 12:39:20 PM GMT+8  0       1              0         Less than RM25,000  Monthly  Take away      Between 30 minutes to 1 hour  ...

5 rows × 21 columns

## Transform the income column
income_map = {'Less than RM25,000':0, 'RM25,000 - RM50,000':1, 'RM50,000 - RM100,000':2, 'RM100,000 - RM150,000':3, 'More than RM150,000':4}

concreteRename['income'] = concreteRename['income'].map(income_map)
concreteRename.head()
   Timestamp                     gender  age            status    income              visitNo  6. How do you  7. How much time do you       ...
                                                                                               usually enjoy  normally spend during your
                                                                                               Starbucks?     visit?
0  2019/10/01 12:38:43 PM GMT+8  1       1              0         0                   Rarely   Dine in        Between 30 minutes to 1 hour  ...
1  2019/10/01 12:38:54 PM GMT+8  1       1              0         0                   Rarely   Take away      Below 30 minutes              ...
2  2019/10/01 12:38:56 PM GMT+8  0       1              2         0                   Monthly  Dine in        Between 30 minutes to 1 hour  ...
3  2019/10/01 12:39:08 PM GMT+8  1       1              0         0                   Rarely   Take away      Below 30 minutes              ...
4  2019/10/01 12:39:20 PM GMT+8  0       1              0         0                   Monthly  Take away      Between 30 minutes to 1 hour  ...

5 rows × 21 columns

## Transform the visitNo column
visitNo_map = {'Daily':0, 'Weekly':1, 'Rarely':2, 'Monthly':3, 'Never':4}

concreteRename['visitNo'] = concreteRename['visitNo'].map(visitNo_map)
concreteRename.head()

   Timestamp                     gender  age            status    income              visitNo  6. How do you  7. How much time do you       8. The nearest      ...
                                                                                               usually enjoy  normally spend during your    Starbucks's outlet
                                                                                               Starbucks?     visit?                        to you is...?
0  2019/10/01 12:38:43 PM GMT+8  1       1              0         0                   2        Dine in        Between 30 minutes to 1 hour  within 1km          ...
1  2019/10/01 12:38:54 PM GMT+8  1       1              0         0                   2        Take away      Below 30 minutes              1km - 3km           ...
2  2019/10/01 12:38:56 PM GMT+8  0       1              2         0                   3        Dine in        Between 30 minutes to 1 hour  more than 3km       ...
3  2019/10/01 12:39:08 PM GMT+8  1       1              0         0                   2        Take away      Below 30 minutes              more than 3km       ...
4  2019/10/01 12:39:20 PM GMT+8  0       1              0         0                   3        Take away      Between 30 minutes to 1 hour  1km - 3km           ...

5 rows × 21 columns

Perform a histogram analysis of the effect of the income column on the visitNo column. What can you conclude?

In this histogram analysis, look at the frequency distribution of each income value within each visitNo category. This gives a
picture of how visits to Starbucks tend to vary with respondents' income.
The histogram can then be used to describe the relationship between the "income" and "visitNo" columns. For example, if
respondents with an income above RM150,000 show a high count in the "Monthly" category, one can conclude that
high-income respondents tend to visit Starbucks monthly.
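
The counts behind such a histogram can be tabulated directly. The sketch below is one possible way to do it, using made-up encoded responses rather than the real survey answers; only the column names `income` and `visitNo` come from this practicum.

```python
import pandas as pd

# Hypothetical encoded responses (not the real survey data).
df = pd.DataFrame({
    "income":  [0, 0, 0, 1, 2, 2, 4, 4],
    "visitNo": [2, 2, 3, 3, 1, 2, 0, 1],
})

# Rows: income category, columns: visit-frequency category, cells: respondent counts.
counts = pd.crosstab(df["income"], df["visitNo"])

# Row-wise shares make income groups of different sizes comparable.
shares = pd.crosstab(df["income"], df["visitNo"], normalize="index")

print(counts)
print(shares.round(2))
```

With matplotlib available, `counts.plot(kind="bar")` draws the table as grouped bars; `sns.countplot(data=df, x="income", hue="visitNo")` is a seaborn alternative. Because the survey has only 122 respondents and most of the rows shown fall in the lowest income category, shares for the smaller income groups should be read with caution.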

Give your final conclusion about the Starbucks satisfaction survey based on the dataset used.

The final conclusion about the Starbucks satisfaction survey depends on the results of the analysis carried out on the dataset.
With the information available, however, we can look at the effect of income on how often respondents visit Starbucks.

You can count the Starbucks visit frequencies for each income category and visualize them with a histogram. If the histogram
shows that visit frequency tends to be high in a particular income category (for example, more high-income respondents
visiting Starbucks regularly), this may hint that satisfaction tends to be positive, although visit frequency on its own does not
measure satisfaction.

However, without the actual data and additional information, it is not possible to give a definite final conclusion about the
Starbucks satisfaction survey based on the dataset used.
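
One way to ground the conclusion in the data is to summarise the rating questions and the "Will you continue buying at Starbucks?" question directly. The sketch below uses hypothetical short column names, made-up responses, and assumed "Yes"/"No" answer values; the real column names are the long survey questions listed earlier, and the real answer values should be checked with `unique()` first.

```python
import pandas as pd

# Hypothetical toy responses; column names and answer values are placeholders.
df = pd.DataFrame({
    "quality_rating": [4, 5, 3, 4],
    "service_rating": [4, 4, 3, 5],
    "will_continue": ["Yes", "Yes", "No", "Yes"],
})

# Mean score per rating question (a 1-5 scale is assumed here).
rating_cols = ["quality_rating", "service_rating"]
mean_ratings = df[rating_cols].mean()

# Share of each answer to the "will you continue buying" question.
continue_share = df["will_continue"].value_counts(normalize=True)

print(mean_ratings)
print(continue_share)
```

Mean ratings above the midpoint of the scale, together with a large share of respondents who intend to keep buying, would support a positive overall conclusion more directly than visit frequency does.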
