
Data Processing and Insights for Business

Vanessa Geraldine
March 2022
Agenda
01 Understanding the importance of data
02 Data preparation
03 Data processing methods
04 Presenting processed data
05 Interpreting the results of data processing
06 How to communicate data results

Objectives
● Understand the types of data and the appropriate methods for collecting and managing data
● Gain practical knowledge of data analysis and how to translate it into new insights
● Be able to present and communicate data
About me

Hi! My name is
Vanessa Geraldine.
Data Analytics Consultant
8+ years of experience in Data Analytics, Strategy, Management

Bachelor’s degree (S1): Universitas Indonesia
Master’s degree (S2): University of Melbourne, Melbourne Business School
Let’s have a short
menti! Go to www.menti.com and
enter the code 5620 5451
https://www.menti.com/j1udv2oew2
1.
Understanding the Importance of Data
Data?
Data is numbers, characters, images, or other methods of recording, in a form which can be assessed to make a determination or decision about a specific action.

• Many believe that data on its own has no meaning; only when interpreted does it take on meaning and become information.
• By closely examining data we can find patterns to perceive information, and then information can be used to enhance knowledge (Source: The Free Online Dictionary of Computing, 1993-2005 Denis Howe).
Data vs Information

Data: 1,099; 1,100; 1,111
Numbers in a table without context are data.

Information: The average number of people who buy education book X is 1,099 per month.
Data Analytics?
Data analysis is the science of extracting trends, patterns, and relevant information from raw data to draw conclusions.
It has many approaches, many dimensions, and a variety of techniques.
Besides supporting business decisions, it is used by data scientists and researchers to verify models and scientific theories.
Impact of Data Analytics
Data has the potential to provide a lot of value to businesses

● Improved Decision Making: Companies that use data to make decisions get better outcomes.
● More Effective Marketing: Data analytics also gives you useful insights into how your campaigns are performing so you can get optimal outcomes.
● Better Customer Experience: Personalization can improve customer satisfaction with the product.
● More Efficient Operations: Cut unnecessary costs and invest in more profitable areas.
Your Daily Data Wrangling Tool
Big Data Value Chain
From big data, to data you can consume, into insights.
Source: Dataversity
Use of Analytics in Business

● LinkedIn employs data analytics to revamp its job listings, track user profiles, and posts.
● Amazon uses data analytics to improve efficiency and reduce cost.
● Netflix gathers data from its subscribers to decide on customer preferences.
Benefits of Business Analytics
Types of Analytics
● Descriptive: What happened?
● Diagnostic: Why did this happen?
● Predictive: What will happen?
● Prescriptive: How can we make it happen?
Data Types

Quantitative (numeric)
Data that can be quantified and measured. This kind of data explains a trend or the results of research through numeric values. This category of data can be further subdivided into:
● Discrete: Data that consists of whole numbers (0, 1, 2, 3...). For example, the number of children in a family.
● Continuous: Data that can take any value within an interval. For example, people’s height (between 60 and 70 inches) or weight (between 90 and 110 pounds).

Qualitative (categoric)
This kind of data is divided into categories based on non-numeric characteristics. It may or may not have a logical order, and it measures qualities and generates categorical answers. It can be:
● Ordinal: Meaning it follows an order or sequence. That might be the alphabet or the months of the year.
● Categorical (also called nominal): Meaning it follows no fixed order. For example, varieties of products sold.
Data Analytics
Process
Data Analysis Process
Understanding
Business Problems
Business Question
Business understanding is the first stage of
the data science methodology and lays the
foundation for a successful end result.
It includes defining the problem, project
objectives, and solution requirements from
a business perspective.
How to Structure your Problem

● Without structure, your ideas won’t stand up.
● Use structure to strengthen your thinking.

● Feel free to be MECE
● Don’t reinvent the wheel
● Every problem is unique

Break the problem into its component elements.
Logic Tree
SMART Problem Statement
SMART is an acronym used when creating objectives to define a set of criteria that are easy to understand and whose fulfillment is easy to verify.

A good problem statement should be:
● Specific: targets a specific area for improvement
● Measurable: quantifies or shows an indicator of progress
● Attainable/Achievable: agreed upon, attainable, and able to be implemented
● Realistic: states what results can realistically be achieved, given available resources
● Time-bound: has deadlines, and those deadlines are reasonable
Good Problem Statement
• Limited in Scope
• Specific enough to be solvable
• Addresses the Root Cause of the Problem, not
secondary
• Can have multiple solutions: Who, what, where, when, how?

Example:
Why did sales decrease by 5% in the past 2 months?
How might we increase sales through social media by 10% in three months?

Source: Prabudhesai, 2018
Creating your hypothesis statement
Example: Testing how much we need to spend on vouchers in order to get the maximum number of customers with the lowest unit economics

Template: If I (independent variable) then (dependent variable) due to (reason)

If (I give a minimum IDR 100k voucher to customers with a minimum purchase of IDR 300k) then (I can increase the number of customers while keeping the unit economics the same) due to (the increased basket size)
How to improve my skills as a Supervisor?

How to improve my problem-solving skills (Specific)
from 50 points to 80 points (Measurable, Realistic)
within 3 months? (Attainable, Time-Bound)
MECE Principle
MECE stands for mutually exclusive, collectively exhaustive.
It is a framework for solving complicated problems. When you apply it to a problem, you break that problem into subproblems that are:
● mutually exclusive (they don’t overlap), and
● collectively exhaustive (they cover all possibilities).
MECE is most helpful when you have a complicated problem: partitioning it into smaller problems makes it easier to solve.
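
As a rough illustration of the two MECE conditions, the sketch below checks a proposed segmentation for overlaps (not mutually exclusive) and gaps (not exhaustive). The customer IDs, segment names, and the helper functions `check_mece` and `is_mece` are hypothetical and invented only for this example.

```python
def check_mece(population, segments):
    """Return (overlaps, gaps) for a proposed split of `population`.

    overlaps: items that appear in more than one segment
    gaps:     items of the population that no segment covers
    """
    seen = set()
    overlaps = set()
    for members in segments.values():
        members = set(members)
        overlaps |= seen & members  # already placed in an earlier segment
        seen |= members
    gaps = set(population) - seen
    return overlaps, gaps


def is_mece(population, segments):
    """True when the segments neither overlap nor leave anything out."""
    overlaps, gaps = check_mece(population, segments)
    return not overlaps and not gaps


# Hypothetical customer IDs and two candidate ways to segment them.
customers = {"c1", "c2", "c3", "c4", "c5"}
by_age = {"under_30": {"c1", "c2"}, "30_and_over": {"c3", "c4", "c5"}}
by_habit = {"buys_online": {"c1", "c2", "c3"}, "buys_in_store": {"c3", "c4"}}

print(is_mece(customers, by_age))       # split by age covers everyone once
print(check_mece(customers, by_habit))  # c3 overlaps, c5 is not covered
```

The age split passes both checks, while the shopping-habit split fails both: one customer sits in two buckets and another in none.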
Issue Tree
An issue tree is the evolved cousin
of the logic tree.
Where a logic tree is simply a
hierarchical grouping of elements,
an issue tree is the series of
questions or issues that must be
addressed to prove or disprove a
hypothesis.
An issue tree is simply the laying
out of issues and sub-issues into
a MECE visual progression
The 5 Why technique
Another example, here from business case...:

1. Write down the specific problem.
2. Ask why the problem happens and write the answer down below the problem.
3. If the answer doesn’t identify the root cause of the problem in Step 1, ask why again and write that answer down.
4. Loop back to step 3 until the team is in agreement that the problem’s root cause is identified.
Pareto Principle
The Pareto Principle states that 80% of consequences come from 20% of the causes.

● Advantage: The Pareto Principle becomes a guide for how to allocate resources efficiently.
● Disadvantage: While the 80/20 split held for Pareto's observation, that doesn't necessarily mean it is always true.
Pareto Principle
Some examples:
● 80% of car accidents are caused by 20% of young people
● 80% of lottery tickets are bought by 20% of society
● 80% of air pollution is caused by 20% of the population
● 80% of all firearms are used by 20% of the population
● 80% of all Internet traffic belongs to 20% of websites
● 80% of car crashes happen within the first 20% of the distance covered
● 80% of mobile phone calls come from 20% of the population
● 80% of the time people use 20% of the tools at their disposal
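
A Pareto analysis can be sketched in a few lines: sort the causes by impact, accumulate their share of the total, and stop once a chosen threshold is reached. The complaint counts and the function names `pareto_table` and `vital_few` below are made up for illustration, and the 80% threshold is a rule of thumb rather than a law.

```python
def pareto_table(values_by_cause):
    """Sort causes by impact and attach the cumulative share of the total."""
    total = sum(values_by_cause.values())
    running = 0
    rows = []
    ranked = sorted(values_by_cause.items(), key=lambda kv: kv[1], reverse=True)
    for cause, value in ranked:
        running += value
        rows.append((cause, value, running / total))
    return rows


def vital_few(values_by_cause, threshold=0.8):
    """Smallest set of top causes whose cumulative share reaches `threshold`."""
    selected = []
    for cause, _, cumulative in pareto_table(values_by_cause):
        selected.append(cause)
        if cumulative >= threshold:
            break
    return selected


# Hypothetical number of customer complaints per cause.
complaints = {
    "late delivery": 50,
    "damaged item": 35,
    "wrong item": 8,
    "rude staff": 4,
    "other": 3,
}

for cause, count, cumulative in pareto_table(complaints):
    print(f"{cause:15s} {count:3d} {cumulative:6.0%}")
print(vital_few(complaints))
```

In this invented data set, two of the five causes account for 85% of complaints, so those two would be the first candidates for resources.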
You need to understand the problem and have clean data to deliver meaningful business insights.
Metric

Think of metrics as the ‘what,’ and analytics as the ‘so what?’ Metrics are the numbers you track, and analytics implies analyses and decision making.

Metrics: What you measure to gauge performance or progress within a company or organization. Your most important metrics are your key performance indicators, or KPIs.
Vanity vs Actionable Metrics
Vanity metrics: Numbers or statistics that look good on paper, but don’t really
mean anything important. Vanity metrics show that your efforts are making a
difference but you don’t know what that difference is.

Actionable metrics: Statistics that link to specific tasks that you can improve on
and to stats that you can tie in to the goals of your business.
Typical actionable metrics include but are not limited to:

● Conversion volume/rate,

● Customer lifetime value,

● Active subscriptions,

● Average order value,

● Monthly Active Revenue

● New Members (Past 30 days)

● Members Lost (Past 30 days)

● Monthly Churn

● Retention Rate
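
Several of the metrics above are simple ratios. The sketch below uses one common set of definitions; exact formulas differ between companies, so treat these as assumptions. All input figures and function names are hypothetical.

```python
def conversion_rate(conversions, visitors):
    """Share of visitors who completed the desired action."""
    return conversions / visitors


def average_order_value(revenue, orders):
    """Revenue earned per order."""
    return revenue / orders


def monthly_churn(members_at_start, members_lost):
    """Share of the members present at the start of the month who left."""
    return members_lost / members_at_start


def retention_rate(members_at_start, members_lost):
    """Share of the starting members who stayed (1 minus churn)."""
    return 1 - monthly_churn(members_at_start, members_lost)


# Hypothetical figures for one month.
print(f"Conversion rate: {conversion_rate(40, 2_000):.1%}")
print(f"Average order value: IDR {average_order_value(12_000_000, 40):,.0f}")
print(f"Monthly churn: {monthly_churn(1_000, 50):.1%}")
print(f"Retention rate: {retention_rate(1_000, 50):.1%}")
```

Each function maps to a lever the business can act on, which is what separates these from vanity metrics.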
2.
Data Preparation:
Gathering & Cleaning Data
Data Analysis Process
Data Gathering
The term ‘Garbage In, Garbage Out’
Data cleaning
Data cleaning is the process of
identifying, deleting, and/or
replacing inconsistent or incorrect
information from the database.

IBM’s study shows that low data quality costs 3.1 trillion dollars every year in the U.S. alone.

https://www.iteratorshq.com/blog/data-cleaning-in-5-easy-steps/
Case Study

Is average monthly pricing in Jakarta higher than in Yogyakarta?

Before clean up: NO
After clean up: YES

Having wrong or bad-quality data can affect your business processes and analysis.
Case study
On the left, missing values have not been treated: the chance of playing cricket appears higher for males than for females.

On the right, after missing values are treated (based on gender), females have a higher chance of playing cricket than males.
https://www.iteratorshq.com/blog/data-cleaning-in-5-easy-steps/
Ways to Clean Data in Excel
● Trim white spaces (=TRIM)
● Select and treat all blank cells
● Convert numbers stored as text into numbers
● Highlight and remove duplicates
● Highlight errors with Data Validation
● Change text to lower/upper/proper case
● Parse data using Text to Columns
● Use Find & Replace to clean data
● Delete all formatting
● Spell check
● Use Filter

More functions to clean up the data:
● CONCATENATE
● LEFT, RIGHT, MID
● VLOOKUP, HLOOKUP
● INDEX, MATCH
● TEXT TO COLUMNS/SPLIT
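
The same clean-up steps can be scripted when a spreadsheet becomes too large to fix by hand. The sketch below is a Python analogue of trimming spaces, fixing case, converting numbers stored as text, handling blank cells, and removing duplicates. The column names `city` and `price` and the sample rows are hypothetical.

```python
def clean_row(row):
    """Trim whitespace, normalise case, and convert a text number to float."""
    # Like TRIM + PROPER: collapse extra spaces, then capitalise each word.
    city = " ".join(row["city"].split()).title()
    # Numbers stored as text: drop spaces and thousands separators.
    price_text = row["price"].strip().replace(",", "")
    price = float(price_text) if price_text else None  # blank cell -> None
    return {"city": city, "price": price}


def clean_rows(rows):
    """Clean every row and drop rows that become exact duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        tidy = clean_row(row)
        key = (tidy["city"], tidy["price"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(tidy)
    return cleaned


# Hypothetical raw rows with typical spreadsheet problems.
raw = [
    {"city": "  jakarta ", "price": "1,500"},
    {"city": "JAKARTA", "price": "1500"},
    {"city": "yogyakarta", "price": " 900 "},
    {"city": "Yogyakarta", "price": ""},
]

for row in clean_rows(raw):
    print(row)
```

The first two raw rows only look different because of spacing, case, and a thousands separator; after cleaning they collapse into a single record.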
Q&A
3.
Data Processing Methods
Explore Data

This stage explores what data you need. It includes summarizing the data, transforming it into more useful variables, and even creating basic visualizations to help understand it better.
Exploratory Data Analysis (EDA)

• Exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics.
• Data visualization in exploratory data analysis is the first step towards modeling.
• EDA primarily helps analyze data beyond the formal modeling.
Basic Statistical Parameters

1. Measures of Frequency (use this when you want to show how often a response is given):
a. Count, Percent, Frequency
b. Shows how often something occurs

2. Measures of Central Tendency (use this when you want to show an average or the most commonly indicated response):
a. Mean, Median, and Mode
b. Locates the distribution by various points

3. Measures of Dispersion or Variation (use this when you want to show how "spread out" the data are. It is helpful to know when your data are so spread out that it affects the mean):
a. Identifies the spread of scores by stating intervals
b. Range = Max - Min
c. Variance = average squared difference between each observed score and the mean; Standard Deviation = square root of the variance

4. Measures of Position (use this when you need to compare scores to a normalized score):
a. Percentile Ranks, Quartile Ranks (Q1, Q2, Q3)
b. Describes how scores fall in relation to one another. Relies on standardized scores
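
The four families of measures above can be computed with Python's standard `statistics` module. This is a minimal sketch on a made-up list of scores; it uses the population variance and standard deviation, and the quartile values depend on the interpolation method the library applies by default.

```python
import statistics
from collections import Counter

# Hypothetical survey scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# 1. Measures of frequency: count and percent per value.
frequency = Counter(scores)
percent = {value: count / len(scores) * 100 for value, count in frequency.items()}

# 2. Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# 3. Measures of dispersion (population formulas).
value_range = max(scores) - min(scores)
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

# 4. Measures of position: the three quartile cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4)

print("frequency:", dict(frequency))
print("mean / median / mode:", mean, median, mode)
print("range / variance / std dev:", value_range, variance, std_dev)
print("quartiles:", q1, q2, q3)
```

Here the mean (5) sits above the median (4.5) because the single high score of 9 pulls the average up, which is the situation item 3 warns about.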
Normal Distribution & Hypothesis Testing

A statistical hypothesis is a
hypothesis that is testable on
the basis of observed data
modelled as the realised values
taken by a collection of random
variables
Analyze Data

Using exploratory data analysis, data analysts analyze performance data, and data scientists try multiple algorithms to find the best model for the available data set.
Correlation Analysis

• Correlation is a statistical measure that indicates the extent to which two or more
variables fluctuate together.
• A positive correlation indicates the extent to which those variables increase or decrease
in parallel; a negative correlation indicates the extent to which one variable increases as
the other decreases.
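
One common way to quantify this is the Pearson correlation coefficient, which ranges from -1 (perfect negative) to +1 (perfect positive). The sketch below implements it directly so each step is visible; the variable names and the monthly figures are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two variables move together around their means.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariance / (spread_x * spread_y)


# Hypothetical monthly figures.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]          # rises with ad spend
unsold_stock = [10, 8, 6, 4, 2]   # falls as ad spend rises

print(round(pearson(ad_spend, sales), 3))         # close to +1
print(round(pearson(ad_spend, unsold_stock), 3))  # close to -1
```

A coefficient near zero would indicate little linear relationship between the two variables.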
Clustering Analysis

Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to other data points in the same group than to those in other groups. In simple words, the aim is to segregate groups with similar traits and assign them into clusters.
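
The slides do not name a specific algorithm, so as an assumption the sketch below uses k-means, one widely used clustering method, on one-dimensional data. It repeats two steps until nothing changes: assign each value to its nearest centroid, then move each centroid to the mean of its cluster. The spend figures and starting centroids are invented for illustration.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Group numbers around centroids using the k-means procedure."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(value - centroids[i]),
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved, so we are done
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical monthly spend per customer (in thousands).
spend = [10, 12, 11, 50, 52, 49, 95, 100, 98]
centroids, clusters = kmeans_1d(spend, centroids=[10, 50, 95])

for centre, members in zip(centroids, clusters):
    print(f"centroid {centre:.1f}: {members}")
```

The three resulting groups could be read as low, medium, and high spenders. Real projects usually cluster on several variables at once and choose starting centroids more carefully.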
4.
Presenting Processed Data
DATA VISUALIZATION
Data visualization is the graphical representation of data using charts, graphs, and maps.
Data visualization is a form of visual art that grabs our interest and keeps our eyes on the message.
Data visualization involves turning data into simple, easy-to-understand information using visuals.

Main Goal: It helps us to think and communicate better.


Why is Dataviz so important?
“The human brain processes images
60,000 times faster than text, and 80
percent of information transmitted to the
brain is visual.”

Source: t-sciences.com
Another Example
Data Relationships
Source:
Berkeley
Bar Charts
● The most straightforward chart type
● Shows trends, patterns, exceptions
● Uses: to compare discrete categories, to analyze changes over time, or to compare parts of a whole.
Histograms
● Represent a variable in the form of bars, where the area of each bar is proportional to the frequency of the values represented.
● Show the distribution of a population or sample with respect to a given characteristic.
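
Behind every histogram is a binning step: each value is placed in an interval and the intervals are counted. The sketch below shows that step with equal-width bins, in which case bar height and bar area are both proportional to frequency. The ages and the function name `histogram_counts` are hypothetical.

```python
def histogram_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = {}
    for value in values:
        start = (value // bin_width) * bin_width  # lower edge of the value's bin
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical customer ages.
ages = [21, 25, 29, 31, 34, 35, 38, 42, 47, 58]
bin_width = 10
counts = histogram_counts(ages, bin_width)

# A plain-text histogram: one '#' per customer in the bin.
for start, count in counts.items():
    print(f"{start}-{start + bin_width - 1}: {'#' * count}")
```

Changing `bin_width` changes the shape of the histogram, so it is worth trying a few widths before drawing conclusions about the distribution.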
Line Charts
● Show how continuous data evolves
● Display changes or trends over time
● Visualize changes in one value relative to another
● Showcase relationships, acceleration, deceleration, and volatility in a data set.
Pie Charts
● Display proportions and parts
● Powerful for adding detail to other visualizations
● Pro: can emphasize data when there are only a few units
● Cons: cannot compare more than a few pieces of data; unhelpful when observing trends over time
Heatmap
Patterns or relative concentrations

Data set containing many data points

Scatter Plot
● Present large volumes of data at once
● Show the relationship between two, three or
more measures: size, color, type of dots
● They also help us determine whether or not
different groups of data are correlated.

Dot Plot
Allows you to compare values across two dimensions.

In our example, each row shows sales by ship mode. The dots show sales for each ship mode, broken down by each segment. In the example, you can see that corporate sales are highest with standard class ship mode.
Bubble Chart
● Represent individual values from a data set on a
matrix using variations in color or color intensity.
● Patterns or relative concentrations: with intense
color refer to higher concentrations
● Data set containing many data points

Preattentive attributes
These are things that our brain processes in milliseconds, before we pay attention
to everything else.
Source: Storytelling with data

How many 9s are here?


Source: Storytelling with data
The Use of Color in Data Visualization

Source: Storytelling with data


Practical Approach to Visualizations (Few, 2009)

● Choose a graphic that will capture the viewer’s attention.
● Represent the information in a simple, clear, and precise way.
● Make it easy to compare data; highlight trends and differences.
● Establish an order for the elements based on the quantity that they represent; that is, detect maximums and minimums.
● Give the viewer a clear way to explore the graphic and understand its goals.
Example of Good Charts

● Concise, clear segmentation, ordered from highest to lowest
● Use simple color schemes and compare apples to apples
Example of Bad Charts: What’s Wrong?
Q&A
Assignment Part 1

bit.ly/SMDPDataAssignment
bit.ly/SMDPDatasetMar22
5.
Interpreting the Results of Data Processing
I have data. I need
insights.
Where do I start?
Insight Generation: Important Notes
Every business can be thought of as a complicated system with many moving parts. Nobody really understands it 100%. There’s a gap between their understanding of the business and how it actually works.

Given this, you can think of an “insight” as anything that increases your understanding of how the system actually works.

Insight occurs when people recognize relationships or make associations between objects and actions that can help solve new problems (Source: Britannica)
Example

Expectation vs Reality.

Our job is to investigate the ‘Hmm….’


Recognizing Relationships
● Ask thoughtful questions
● Look beyond the obvious
● Don’t be afraid to ask and reframe questions
Process

1. Write down a short list of hypotheses, or what you expect to see in the data.
2. Analyze data. Make plots, do summaries, whatever is needed to see if it matches your expectations.
3. Ask questions: Is there anything that doesn’t match? Anything that makes you go “That’s odd” or “That doesn’t make any sense”?
4. Zoom in and try to understand what happened. Gather other data if you need them.
5. You may have just found an insight into the business and increased your understanding!
Example
Sales have fluctuated over the past 2 years; however, the business has shown massive YoY growth, with December as the month generating the most orders.

● The top four revenue-generating categories are Basket, Art & Sculpture, Jewelry, and Home Decor, which represent 82% of total sales or 72% of total items sold.
● The Christmas category, which ranks 6th, is the reason why December’s sales peak compared to the other months, as can be seen in the middle chart.
Thinking Critically About the Data
Correlation vs Causation

Causation is often confused with correlation

Correlation does not imply causation


Be Careful With: Personal Bias

Bias, in general, is prejudice in favor of or against one thing, person, or group compared with another, usually in a way considered to be unfair.

Example:
● Confirmation bias: researchers are so sure about the result of an experiment that, when trying to prove it with data, they choose only the data that supports their own hypothesis.
● AI: Certain facial recognition systems are trained primarily on images of white men (gender & racial bias)
Other Example of Data Bias
● Sample/Selection bias: A survey of high
school students to measure teenage use of
illegal drugs will be a biased sample because
it does not include home-schooled students
or dropouts
● Analytics bias: incomplete data sets and a
lack of context around those data sets
● Outlier bias: including Jeff Bezos in an effort
to analyze mean American incomes, for
example, would drastically skew the results of
your study because of his wealth
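
The outlier effect is easy to demonstrate by comparing the mean and the median with and without one extreme value. The income figures below are invented for illustration and are not real statistics.

```python
import statistics

# Hypothetical annual incomes (USD) for five people.
incomes = [40_000, 45_000, 50_000, 55_000, 60_000]

# The same group plus one extremely wealthy individual.
with_outlier = incomes + [10_000_000_000]

for label, data in [("without outlier", incomes), ("with outlier", with_outlier)]:
    print(
        f"{label}: mean = {statistics.mean(data):,.0f}, "
        f"median = {statistics.median(data):,.0f}"
    )
```

The mean jumps from 50,000 to more than a billion, while the median barely moves. This is why the median is often the safer summary when a data set may contain extreme values.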
How to Reduce Bias
● Use multiple people to code the data
● Verify with more data sources
● Check for alternative explanations
● Review findings with peers
6.
How to Communicate Data Results
Why is communication with data important to discuss?
Communication is the lifeblood of an organization. If we can’t do it effectively, at best we are less effective, at worst we will fail.
Characteristics of a Great Business Communication

● Concise
● Easy to understand (by the audience(s))
● Clear conclusion/next steps
Why do we need structured communication?

A lot of stuff to communicate:
● Data
● Analysis
● Funnels
● Arguments
● Facts
● Findings
● Conclusions

Help the audience understand:
● Concise
● Easy to understand (by the audience(s))
● Clear conclusion/next steps
4 Key Questions for Successful Data Communication
Deductive vs Inductive Communication
Which one is better? Deductive.

For an inductive presentation you lay out all the evidence, then eventually sum it up and deliver the conclusion.

For a deductive presentation you deliver the conclusion, then lay out all the evidence.

The problem with inductive on the web is that your audience leaves well before getting to your main point.
Role of Design Thinking in Data Analytics
Generate revolutionary insights through empathy, not just getting caught with bias

Don’t fall in love with the idea, but the problem of the users!
Storyboarding: Example

Storyboarding helps to:


● Establish a structure for your
communication (using Minto
pyramid principle)
● Visual outline of the content you
plan to create
Planning Your Presentation Step by Step

1. LIST: Jot down all important facts related to the communication
2. SORT: Group into logical buckets of related items
3. OUTLINE: Organize into a logical storyline
4. SHARPEN: Remove stuff not critical to the communication
Typical outline for business presentations
Steps to Sharpen Your Presentation

● Highlight: headings, bulleting, highlighting
● Simplify: remove stuffy words
● Synthesis vs Summary: how to reduce length and increase understanding
Highlights
Example: McKinsey’s slide

● Headings
● Bullet points
● Highlights
Simplify: Fewer words, simple, concise.
Which one is better? Left/right?

Left:
To have research on financial planning especially for a young family, education fund, college fund, health insurance, retirement fund, do we need a life insurance(?) Action needed to have financial projection until 2024 based on research, then setup meeting again in the next 2 week via Google Calendar

Right:
Action plan:
● Research family financial planning
○ Education/college fund
○ Health insurance
○ Retirement fund
○ Life insurance
● Financial projection 2021-2024
● Setup meeting in 2 weeks (email)
Synthesis vs Summary

Summary: An objective, short written presentation in your own words of ideas, facts, events, in a SINGLE SENTENCE.
Aim: Reduces length

Synthesis: A combination of several SENTENCES into a single one, which aims to create an understanding or original perspective of the information.
Aim: Increases understanding
The importance of decluttering visualization
Which one is easier to see?
Preattentive attributes in text + visual hierarchy
Preattentive attributes in graphs

Original graph without preattentive attributes Leverage color to draw attention


Assignment Part 2

bit.ly/SMDPDataAssignment
bit.ly/SMDPDatasetMar22
Q&A
Thank you for
your attention!
