Understanding Exploratory Data Analysis
Let's Get Acquainted!
Data Scientist
Past Experiences
Hendra H Choiri
Data Scientist Education Background
Pararawendy Indarjo
Email : pararawendy19@gmail.com
Linkedin : https://www.linkedin.com/in/pararawendy-indarjo/
Blog : medium.com/@pararawendy19
About Me
Master of Science
Leiden University (2017-2019)
Focused on the application of theoretical machine learning and reinforcement learning
Hello!
[Figure: data science workflow cycle with the stages Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment, plus Insight and Feedback arrows]
Stage               Input                 Process                                Output
Data collection     -                     Survey/labelling; ETL                  Raw data
Data understanding  Raw data              Exploratory Data Analysis              Raw data; insights/to-do list
Data preparation    Raw data; to-do list  Pre-processing; feature processing     Training data; test/validation data
Modelling           Training data         Model training; hyperparameter tuning  ML model
Evaluation          Test/validation data  Validation                             Performance measure
Deployment          New data              Prediction                             Predictions
Why do we need EDA?
To answer the following questions:
Descriptive Statistics
Smallest value, 25%, 50%, 75%, largest value
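As a rough illustration of these five summary points, the sketch below uses pandas' `describe()`, which reports the minimum, the 25%, 50%, and 75% quantiles, and the maximum alongside count, mean, and standard deviation. The toy data and the column name `age` are assumptions made for this example, not part of the slides' dataset.

```python
import pandas as pd

# Hypothetical toy data; the column name "age" is an assumption for illustration.
df = pd.DataFrame({"age": [21, 25, 30, 35, 40]})

# describe() returns count, mean, std, min, 25%, 50%, 75%, and max.
summary = df["age"].describe()

# Keep only the five points shown on the slide.
five_points = summary[["min", "25%", "50%", "75%", "max"]]
print(five_points)
```

The 50% value is the median, so comparing it with the mean from the same `describe()` output gives a quick hint about skew.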
Pick + Separate Columns
Separate out the columns you want to analyze
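One possible way to separate columns in pandas is sketched below: either list the column names explicitly, or split them by data type with `select_dtypes`. The DataFrame and its column names (`age`, `weight`, `job`) are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; all column names are assumptions for illustration.
df = pd.DataFrame({
    "age": [21, 35, 48],
    "weight": [60.5, 72.0, 81.3],
    "job": ["student", "engineer", "teacher"],
})

# Option 1: pick the columns to analyze by name.
cols_to_analyze = ["age", "weight"]
subset = df[cols_to_analyze]

# Option 2: separate numeric from non-numeric columns by dtype.
numeric = df.select_dtypes(include="number")
categorical = df.select_dtypes(exclude="number")

print(subset.columns.tolist())
print(numeric.columns.tolist(), categorical.columns.tolist())
```

Splitting by dtype is handy because numeric and categorical columns usually call for different summaries and plots.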
Sample
df.sample(), df.head(), or df.tail()
display a few rows of the data directly
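A minimal sketch of the three calls on a small made-up DataFrame follows. The columns `id` and `age` are assumptions; `random_state` is set only so the random sample is repeatable.

```python
import pandas as pd

# Hypothetical toy data with 10 rows; column names are assumptions.
df = pd.DataFrame({"id": range(1, 11), "age": range(20, 30)})

first_rows = df.head()    # first 5 rows by default
last_rows = df.tail(3)    # last 3 rows
random_rows = df.sample(n=4, random_state=0)  # 4 random rows, fixed seed

print(first_rows)
print(last_rows)
print(random_rows)
```

Called without arguments, `df.sample()` returns a single random row, while `df.head()` and `df.tail()` return five rows each.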
Dataset:
1. botak.csv
Link: bit.ly/EDATrial
Dataset
botak.csv ("botak" is Indonesian for "bald")
- Description:
Synthetic dataset. The task is to predict the likelihood that a person is bald from
several attributes of that person.
- Data:
Each row represents one person, and each column holds one attribute of that
person.
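To check the "one row per person, one column per attribute" layout after loading, something like the sketch below could be used. The inline CSV text and its column names (`age`, `weight`, `is_bald`) are hypothetical stand-ins, since the actual columns of botak.csv are not listed here; with the real file, `pd.read_csv("botak.csv")` would replace the `StringIO` call.

```python
import io

import pandas as pd

# Stand-in for botak.csv; these columns and values are assumptions for illustration.
csv_text = """age,weight,is_bald
27,61.2,0
45,80.1,1
33,70.4,0
"""

# With the real file this would be: df = pd.read_csv("botak.csv")
df = pd.read_csv(io.StringIO(csv_text))

# shape is (number of rows, number of columns), i.e. (people, attributes).
n_rows, n_cols = df.shape
print(f"{n_rows} people, {n_cols} attributes")

# info() lists each column with its dtype and non-null count.
df.info()
```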
Done.
Q&A session