Hi! My name is
Vanessa Geraldine.
Data Analytics Consultant
8+ years of experience in Data Analytics, Strategy, Management
Bachelor's degree (S1) - Universitas Indonesia
Master's degree (S2) - University of Melbourne, Melbourne Business School
Let’s have a short
menti! Go to www.menti.com and
enter the code 5620 5451
https://www.menti.com/j1udv2oew2
1.
Understanding the Importance of Data
Data?
Data is numbers, characters, images, or other methods of recording, in a form which can be assessed to make a determination or decision about a specific action.
Data vs. Information
Data: 1,099; 1,100; 1,111; 1,099
Information: On average, 1,099 people buy education book X per month
Numbers in a table without context are data
Data analysis is the science of extracting trends, patterns, and relevant information from data
● Companies that use data to make decisions get better outcomes
● Data analytics also gives you useful insights into how your campaigns are performing, so you can get optimal outcomes
● Personalization that can improve customer satisfaction with the product
● Cutting unnecessary costs and investing in more profitable areas
Your Daily Data Wrangling Tool
Big Data Value Chain
Source: Dataversity
The Use of Analytics in Business
● Feel Free to be MECE
● Don’t Reinvent the Wheel
● Every Problem is Unique
Example:
Why did sales decrease by 5% in the past 2 months?
Source: Prabudhesai, 2018
within 3 months?
Attainable, Time-Bound
MECE Principle
MECE stands for mutually exclusive, collectively exhaustive.
It is a framework for solving complicated problems.
When you apply it to a problem, you break that problem into subproblems that are:
● mutually exclusive (they don’t overlap) and
● collectively exhaustive (they cover all possibilities)
MECE is most helpful when you have a complicated problem: partitioning the problem into smaller problems makes it easier to solve.
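The two MECE conditions can be checked mechanically once a problem is written down as a set of items and a proposed breakdown. The following is a minimal sketch only; the customer IDs, the segment names, and the helper `check_mece` are hypothetical and are not part of the original slides.

```python
def check_mece(universe, parts):
    """Return (overlaps, gaps) for a proposed breakdown of `universe`.

    overlaps: items that appear in more than one part (not mutually exclusive)
    gaps: items that appear in no part (not exhaustive)
    """
    seen = set()
    overlaps = set()
    for part in parts.values():
        overlaps |= seen & set(part)
        seen |= set(part)
    gaps = set(universe) - seen
    return overlaps, gaps


# Hypothetical customer IDs used only for illustration.
customers = {"c1", "c2", "c3", "c4", "c5"}

# A breakdown by age group: no overlap, nothing left out.
by_age = {"under_30": {"c1", "c2"}, "30_and_over": {"c3", "c4", "c5"}}
age_overlaps, age_gaps = check_mece(customers, by_age)

# A breakdown by channel: c3 is counted twice and c5 is missing.
by_channel = {"online": {"c1", "c2", "c3"}, "mobile": {"c3", "c4"}}
channel_overlaps, channel_gaps = check_mece(customers, by_channel)

print("by_age is MECE:", not age_overlaps and not age_gaps)
print("by_channel overlaps:", channel_overlaps, "gaps:", channel_gaps)
```

A breakdown is MECE only when both returned sets are empty.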
Issue Tree
An issue tree is the evolved cousin
of the logic tree.
Where a logic tree is simply a
hierarchical grouping of elements,
an issue tree is the series of
questions or issues that must be
addressed to prove or disprove a
hypothesis.
An issue tree is simply the laying
out of issues and sub-issues into
a MECE visual progression
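One way to picture an issue tree is as nested questions, where the questions with no sub-issues are the ones that actually need data to answer. The sketch below is an illustration under that assumption; the questions and the helper `leaf_questions` are hypothetical examples, not taken from the slides.

```python
# Each key is a question; its value holds the sub-issues beneath it.
# An empty dict marks a question with no further breakdown.
issue_tree = {
    "Why did sales decrease?": {
        "Did we sell fewer units?": {
            "Did fewer customers buy?": {},
            "Did each customer buy less?": {},
        },
        "Did the price per unit fall?": {},
    }
}


def leaf_questions(tree):
    """Collect the lowest-level questions, in the order they appear."""
    leaves = []
    for question, children in tree.items():
        if children:
            leaves.extend(leaf_questions(children))
        else:
            leaves.append(question)
    return leaves


for question in leaf_questions(issue_tree):
    print("-", question)
```

Each level splits its parent without overlap (units versus price, customers versus units per customer), which is what keeps the tree MECE.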
The 5 Why technique
Another example, here from business case...:
Actionable metrics: statistics that link to specific tasks you can improve on and that tie in to the goals of your business.
Typical actionable metrics include but are not limited to:
● Conversion volume/rate
● Active subscriptions
● Monthly churn
● Retention rate
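As a rough illustration of how such metrics are computed, here is a minimal sketch. Exact definitions vary between businesses; the simple formulas, function names, and all numbers below are assumptions for illustration, not figures from the slides.

```python
def conversion_rate(conversions, visitors):
    """Share of visitors who completed the desired action."""
    return conversions / visitors


def monthly_churn_rate(customers_at_start, customers_lost):
    """Share of the customers present at the start of the month who left."""
    return customers_lost / customers_at_start


def retention_rate(customers_at_start, customers_lost):
    """Share of the customers present at the start of the month who stayed."""
    return (customers_at_start - customers_lost) / customers_at_start


# Hypothetical month: 2,000 visitors, 50 sign-ups, 400 subscribers, 20 cancellations.
conv = conversion_rate(conversions=50, visitors=2000)
churn = monthly_churn_rate(customers_at_start=400, customers_lost=20)
retention = retention_rate(customers_at_start=400, customers_lost=20)

print(f"Conversion rate: {conv:.1%}")
print(f"Monthly churn:   {churn:.1%}")
print(f"Retention rate:  {retention:.1%}")
```

With these definitions, churn and retention for the same period add up to 100%.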
2.
Data Preparation:
Gathering & Cleaning Data
Data Analysis Process
Data Gathering
The term ‘Garbage In, Garbage Out’
Data cleaning
Data cleaning is the process of
identifying, deleting, and/or
replacing inconsistent or incorrect
information from the database.
https://www.iteratorshq.com/blog/data-cleaning-in-5-easy-steps/
Case Study
On the right, after treating missing values (based on gender), we can see that females have a higher chance of playing cricket than males.
https://www.iteratorshq.com/blog/data-cleaning-in-5-easy-steps/
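Treating missing values "based on gender" can mean filling each gap with the average of the same group rather than the overall average. The sketch below shows that idea only; the records, the `score` field, and the helper `impute_by_group` are hypothetical, and this is not the data or the exact method of the case study.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"gender": "F", "score": 60.0},
    {"gender": "F", "score": None},
    {"gender": "F", "score": 70.0},
    {"gender": "M", "score": 40.0},
    {"gender": "M", "score": None},
    {"gender": "M", "score": 50.0},
]


def impute_by_group(rows, group_key, value_key):
    """Return new rows with missing values replaced by their group's mean.

    Assumes every group has at least one non-missing value.
    """
    observed = {}
    for row in rows:
        if row[value_key] is not None:
            observed.setdefault(row[group_key], []).append(row[value_key])
    group_means = {group: mean(values) for group, values in observed.items()}

    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows stay untouched
        if new_row[value_key] is None:
            new_row[value_key] = group_means[new_row[group_key]]
        cleaned.append(new_row)
    return cleaned


cleaned = impute_by_group(records, "gender", "score")
for row in cleaned:
    print(row)
```

Filling by group keeps differences between groups visible, whereas a single overall average would pull the groups toward each other.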
Ways to Clean Data in Excel
● Trim White Spaces (=TRIM)
● Change Text to Lower/Upper/Proper Case
● Select & Treat all blank cells
● Parse Data Using Text to Columns
● Convert Numbers Stored as Text into Numbers
● Use Find & Replace to Clean Data in Excel
● Delete all Formatting
● Highlight and Remove Duplicates
● Spell Check
● Highlight Errors with Data Validation
● Use Filter
CONCATENATE
TEXT TO COLUMNS/SPLIT
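For readers who also work outside Excel, several of the cleaning steps above have close equivalents in plain Python. This is a minimal sketch with hypothetical sample values; the helper name `trim` is an assumption, and the Python calls are rough analogues of the Excel features rather than exact replacements.

```python
# Hypothetical messy inputs.
raw_names = ["  alice   smith ", "BOB JONES", "bob jones", " Carol  Lee"]
raw_amounts = ["1,099", " 1100 ", "1,111"]


def trim(text):
    """Rough analogue of =TRIM: strip the ends and collapse repeated spaces.

    str.split() splits on any whitespace, which is slightly broader than TRIM.
    """
    return " ".join(text.split())


# Trim white space and change text to proper case (like =PROPER).
clean_names = [trim(name).title() for name in raw_names]

# Remove duplicates while keeping the original order.
unique_names = list(dict.fromkeys(clean_names))

# Convert numbers stored as text into numbers.
amounts = [int(text.replace(",", "").strip()) for text in raw_amounts]

# Text to Columns / split, then CONCATENATE back together.
first, last = unique_names[0].split(" ")
full_name = first + " " + last

print(unique_names)
print(amounts)
print(first, "|", last, "|", full_name)
```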
Q&A
3.
Data Processing Methods
Explore Data
1. Measures of Frequency (use this when you want to show how often a response is given):
a. Count, Percent, Frequency
b. Shows how often something occurs
2. Measures of Central Tendency (use this when you want to show an average or most commonly indicated response):
a. Mean, Median, and Mode
b. Locates the distribution by various points
3. Measures of Dispersion or Variation (use this when you want to show how "spread out" the data are; it is helpful to know when your data are so spread out that it affects the mean):
a. Identifies the spread of scores by stating intervals
b. Range = Max - Min
c. Variance or Standard Deviation = how far observed scores typically lie from the mean
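The three families of measures above can be computed with Python's standard library. The sketch below uses hypothetical survey responses and scores; it uses the population versions of variance and standard deviation (`pvariance`, `pstdev`), which is an assumption about which version is wanted.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev, pvariance

# Hypothetical data for illustration.
responses = ["yes", "no", "yes", "yes", "no"]
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# 1. Measures of frequency: count and percent per response.
counts = Counter(responses)
percents = {answer: 100 * n / len(responses) for answer, n in counts.items()}

# 2. Measures of central tendency.
score_mean = mean(scores)
score_median = median(scores)
score_mode = mode(scores)

# 3. Measures of dispersion.
score_range = max(scores) - min(scores)
score_variance = pvariance(scores)  # average squared distance from the mean
score_std = pstdev(scores)          # square root of the variance

print(counts, percents)
print(score_mean, score_median, score_mode)
print(score_range, score_variance, score_std)
```

The standard deviation is in the same unit as the data, which usually makes it easier to explain than the variance.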
A statistical hypothesis is a
hypothesis that is testable on
the basis of observed data
modelled as the realised values
taken by a collection of random
variables
Analyze Data
• Correlation is a statistical measure that indicates the extent to which two or more
variables fluctuate together.
• A positive correlation indicates the extent to which those variables increase or decrease
in parallel; a negative correlation indicates the extent to which one variable increases as
the other decreases.
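A common way to put a number on this is the Pearson correlation coefficient, which ranges from -1 (perfect negative) to +1 (perfect positive) and only captures linear relationships. The sketch below is a minimal implementation for two variables; the variable names and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (co-movement of the two variables).
    co = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (spread of each variable).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co / sqrt(ss_x * ss_y)


# Hypothetical values for illustration.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]          # rises with ad_spend
stock_left = [10, 8, 6, 4, 2]     # falls as ad_spend rises

r_positive = pearson(ad_spend, sales)
r_negative = pearson(ad_spend, stock_left)

print(f"ad_spend vs sales:      {r_positive:+.2f}")
print(f"ad_spend vs stock_left: {r_negative:+.2f}")
```

The function divides by zero if either variable is constant, so a real analysis would need to handle that case.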
Clustering Analysis
Source: t-sciences.com
Another Example
Data Relationships
Source: Berkeley
Bar Charts
● The most straight to the point
● Showing trends, patterns, exceptions
● Uses: compare discrete categories, to analyze changes over time, or to compare parts of a whole.
Histograms
● Represent a variable in the form of bars, where the area of each bar is proportional to the frequency of the values represented.
● Distribution of a population or sample with respect to a given characteristic.
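Behind every histogram is a simple counting step: values are grouped into bins and the bar for each bin reflects how many values fall into it. The following sketch assumes equal-width bins and whole-number values; the ages and the helper `histogram` are hypothetical.

```python
def histogram(values, bin_width):
    """Count how many values fall into each equal-width bin.

    Returns a dict mapping each bin's starting value to its count.
    """
    counts = {}
    for value in values:
        start = (value // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical customer ages.
ages = [21, 25, 29, 31, 34, 35, 38, 42, 47, 55]
bins = histogram(ages, bin_width=10)

# Text rendering: one '#' per observation in the bin.
for start, count in bins.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

Because all bins here have the same width, bar length and bar area are both proportional to the frequency.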
Line Charts
● Continuous data evolution
● Display changes or trends over time
● Visualize changes in one value relative to another
● Showcasing relationships, acceleration,
deceleration, and volatility in a data set.
Pie Charts
Displaying proportions and parts
Heatmap
Patterns or relative concentrations
Scatter Plot
● Present large volumes of data at once
● Show the relationship between two, three or
more measures: size, color, type of dots
● They also help us determine whether or not
different groups of data are correlated.
Dot Plot
Allows you to compare values
across two dimensions.
Preattentive attributes
These are things that our brain processes in milliseconds, before we pay attention
to everything else.
Source: Storytelling with data
bit.ly/SMDPDataAssignment
bit.ly/SMDPDatasetMar22
5.
Interpreting the Results of Data Processing
I have data. I need
insights.
Where do I start?
Insight Generation: Important Notes
Every business can be thought of as a complicated system with many moving parts. Nobody really understands it 100%.
There’s a gap between their understanding of the business and how it actually works.
Given this, you can think of an “insight” as anything that increases your understanding of how the system actually works.
Insight occurs when people recognize relationships or make associations between objects and actions that can help solve new problems (Source: Britannica)
Example
Expectation vs Reality.
● The top four revenue-generating categories are Basket, Art & Sculpture, Jewelry, and Home Decor, which represent 82% of total sales or 72% of total items sold
● The Christmas category, which ranks 6th, is the reason December’s sales peak compared to the other months, as can be seen in the middle chart
Thinking Critically About the Data
Correlation vs Causation
Example:
● Confirmation bias: researchers are so sure about the result of an experiment that, when they try to prove it with data, they choose only the data that supports their own hypothesis.
● AI: certain facial recognition systems were trained primarily on images of white men (gender & racial bias)
Other Examples of Data Bias
● Sample/Selection bias: A survey of high
school students to measure teenage use of
illegal drugs will be a biased sample because
it does not include home-schooled students
or dropouts
● Analytics bias: incomplete data sets and a
lack of context around those data sets
● Outlier bias: including Jeff Bezos in an effort
to analyze mean American incomes, for
example, would drastically skew the results of
your study because of his wealth
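The outlier effect is easy to see by comparing the mean and the median before and after one extreme value is added. The figures below are hypothetical round numbers chosen for illustration, not real income data.

```python
from statistics import mean, median

# Hypothetical annual incomes.
incomes = [40_000, 45_000, 50_000, 55_000, 60_000]

# The same sample with one hypothetical extreme value added.
incomes_with_outlier = incomes + [10_000_000_000]

mean_before = mean(incomes)
median_before = median(incomes)
mean_after = mean(incomes_with_outlier)
median_after = median(incomes_with_outlier)

print(f"Before: mean={mean_before:,.0f}  median={median_before:,.0f}")
print(f"After:  mean={mean_after:,.0f}  median={median_after:,.0f}")
```

The mean jumps by several orders of magnitude while the median barely moves, which is why the median is often reported for skewed data such as incomes.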
How to Reduce Bias
● Use multiple people to code the data
● Verify with more data sources
● Check for alternative explanations
● Review findings with peers
6.
How to Communicate Data Results
Why is communication with data important to discuss?
Communication is the lifeblood of an organization. If we can’t do it effectively, at best we are less effective; at worst we will fail.
Characteristics of a Great Business Communication
● Concise
● Easy to Understand (by the audience(s))
● Clear conclusion/next steps
Why do we need structured communication?
● Data
● Analysis
● Funnels
● Arguments
● Facts
● Findings
● Conclusions

● Concise
● Easy to Understand (by the audience(s))
● Clear conclusion/next steps
4 Key Questions for Successful Data Communication
Deductive vs Inductive Communication
Which one is better?
Deductive.
For an inductive presentation you lay out all the evidence,
then eventually sum it up and deliver the conclusion.
Don’t fall in love with the idea; fall in love with the users’ problem!
Storyboarding: Example
1. Jot down all important facts related to communication
2. Group into logical buckets of related items
3. Organize into logical storyline
4. Remove stuff not critical to communication
Typical outline for business presentations
Steps to Sharpen Your Presentation
Synthesis vs Summary
Highlight
Simplify
● Headings
● Bullet points
● Highlights
Simplify: Fewer words, simple, concise.
Which one is better? Left/right?
Q&A
Thank you for
your attention!