
M. Malik, M.Ak, CPMA, CIA, CertDA
Hendro Sulistio, S.Kom, CertDA

TABLE OF CONTENTS

1 THE CRISP-DM FRAMEWORK
2 BIG DATA AND DATA ANALYTICS
3 SOURCES OF DATA
4 TYPES OF ANALYTICS
5 DATA ANALYTICS METHODOLOGIES
6 MAINSTREAM TOOLS AND KEY APPLICATIONS
7 DATA VISUALIZATION AND COMMUNICATION
8 SKEPTICISM AND ETHICAL CONSIDERATIONS


5
Data Analytics Methodologies
Artificial Intelligence
Artificial intelligence (AI) is an important branch of computer science with the broad goal of
creating machines that behave intelligently. The field has several subfields, including Robotics
and Machine Learning.

There are three main categories of artificial intelligence:

1. Artificial Narrow Intelligence
2. Artificial General Intelligence
3. Artificial Super Intelligence

Artificial Narrow Intelligence, or Weak AI, is so called because it is limited to performing
specialized and very specific tasks. Amazon's Alexa is an example of artificial narrow
intelligence. Most commercial applications of AI are examples of Artificial Narrow Intelligence.

Artificial General Intelligence, also known as Strong AI or Human-Level AI, is the term used for
artificial intelligence that gives machines the same capabilities as humans.

Artificial Super Intelligence goes beyond general intelligence and results in machines that have
capabilities superior to those of humans.

Most artificial intelligence today is narrow intelligence, while General Intelligence is expected
to become commonplace in the near future. Super Intelligence is not yet considered likely to be
realized.
Robotics
Robotics is an interdisciplinary branch of artificial intelligence that draws on the disciplines
of computer science, electronic engineering and mechanical engineering. It is concerned with the
development of machines that can perform human tasks and reproduce human actions. The human tasks
that robotics seeks to replicate include logic, reasoning and planning.

Not all robots are designed to resemble humans in appearance, but many are given human-like
features so that they can perform physical tasks otherwise performed by humans. The design of such
robots makes heavy use of sensor technology, including but not limited to computer vision systems,
which allow robots to "see" and identify objects.

Robots are often used on production lines in large manufacturing companies, but they can also be
found in autopilot systems on aircraft and in the newer, still-developing field of self-driving or
autonomous cars. All of these examples fall into the category of artificial intelligence in the
"narrow" sense.
Machine learning
Machine learning is the use of statistical models and other algorithms to enable computers to
learn from data. It is divided into two distinct types: unsupervised learning and supervised
learning.

Unsupervised Learning draws inferences and learns structure from data without being given any
labels, classifications or categories. In other words, unsupervised learning can take place
without any prior knowledge of the data or the patterns it may contain.
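
As an illustration of learning structure from unlabeled data, the sketch below groups a handful of
numbers into two clusters with a toy one-dimensional k-means. The function name `kmeans_1d`, the
starting rule for the centers and the purchase amounts are all assumptions made for this example;
they do not come from the slides.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group unlabeled numbers into k clusters (a toy k-means)."""
    ordered = sorted(values)
    # Assumption: start the centers at evenly spaced positions of the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to the nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical purchase amounts with no labels attached.
purchases = [12, 15, 14, 13, 95, 102, 98, 110]
centers, clusters = kmeans_1d(purchases, k=2)
print(centers)
print(clusters)
```

No one told the program which purchases are "small" or "large"; the two groups emerge from the
data alone.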

Supervised Learning is similar to the way humans learn concepts. At the most basic level, it
allows a computer to learn a function that maps a set of input variables to an output variable
using a set of example input-output pairs.
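
A minimal sketch of this idea is a least-squares straight line fitted to example input-output
pairs and then used to predict the output for a new input. The helper `fit_line` and the
advertising and sales figures are hypothetical and chosen only to keep the arithmetic easy to
follow.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept from example pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled examples: advertising spend (input) and sales (output).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 5.0 + intercept  # prediction for an input not seen in training
print(slope, intercept, predicted)
```

The "learned function" here is simply the pair (slope, intercept); real supervised models learn
far richer mappings, but the principle of learning from labeled pairs is the same.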
6
Mainstream Tools and Key Applications
Tools and applications for descriptive analytics
There are many tools available for descriptive analytics, some of which are briefly described below.

Microsoft Excel with the Analysis ToolPak add-in is a relatively easy-to-use yet powerful
application for descriptive analysis. Its main drawback is that the number of rows of data that
can be processed is limited to about one million (1,048,576 per worksheet). However, it is a viable
and readily available tool for descriptive statistical analysis of smaller datasets.

RapidMiner is a data science software platform developed by the company of the same name that
provides an integrated environment for data preparation, machine learning, deep learning, text
mining, and predictive analytics.

WEKA, the Waikato Environment for Knowledge Analysis, is a suite of machine learning software
written in Java, developed at the University of Waikato, New Zealand.
KNIME, the Konstanz Information Miner, is a free and open-source data analytics, reporting and
integration platform. KNIME integrates various components for machine learning and data mining
through its modular data pipelining concept.

R is a statistical programming language and computing environment supported by the R Foundation
for Statistical Computing. The R language is widely used among statisticians and data miners for
developing statistical software. It is particularly useful for data analysts because it can read
many types of data and supports much larger data sets than is currently possible with
spreadsheets.

Python is a general-purpose programming language that can make use of additional code in the
form of "packages" that provide statistical and machine learning tools.
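
As a small illustration, Python's built-in `statistics` module already covers the basic
descriptive measures. The sketch below summarizes a short list of monthly sales figures; the
figures are invented for the example.

```python
import statistics

# Hypothetical monthly sales figures, invented for illustration.
monthly_sales = [120, 135, 128, 150, 142, 135, 160]

summary = {
    "count": len(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "mode": statistics.mode(monthly_sales),
    "stdev": statistics.stdev(monthly_sales),  # sample standard deviation
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For larger datasets, third-party packages are normally used instead, but the measures computed are
the same.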

SAS is a commercial provider of Business Intelligence and data management software with a suite
of solutions that include artificial intelligence and machine learning tools, data management, risk
management and fraud intelligence.
SPSS Statistics is a commercial solution from IBM that, while originally designed for social
science research, is increasingly used in health sciences and marketing. In common with the other
applications listed here, it provides a comprehensive range of tools for descriptive statistics.

Stata is a commercial statistical software solution frequently used in economics and health
sciences.
Tools and applications for predictive analytics
All of the tools mentioned in the previous section can also be used for predictive analytics.
Some, such as Excel and SPSS Statistics, are limited in the range of predictive analytics tasks
they can perform. In particular, these tools do not offer the wide range of options for
classification or advanced regression available in the other tools.

Predictive analytics features are also provided by applications and services such as IBM
Predictive Analytics, SAS Predictive Analytics, Salford Systems SPM 8, SAP Predictive Analytics
and Google Cloud Prediction API. R and Python can also be used to perform predictive analytics.

Other tools in the predictive analytics space include SPSS Modeler from IBM, Oracle Data Mining,
Microsoft Azure Machine Learning and TIBCO Spotfire.
Tools and applications for prescriptive analytics
Tools in the prescriptive analytics space are fewer in number. One frequently overlooked solution
is the "What-If Analysis" feature in Excel. This simple yet effective small-scale predictive
analytics tool allows the user to model different scenarios by plugging different values into a
worksheet's formulas.
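
The same "what if" idea can be sketched in a few lines of Python: one formula, evaluated for
several sets of input values. The `profit` formula and every figure in the scenarios are
hypothetical and serve only to show the mechanics.

```python
def profit(units_sold, unit_price, unit_cost, fixed_costs):
    """The 'worksheet formula': profit for one set of input values."""
    return units_sold * (unit_price - unit_cost) - fixed_costs


# Hypothetical scenarios; every figure here is invented for illustration.
scenarios = {
    "base case": {"units_sold": 1000, "unit_price": 25, "unit_cost": 15, "fixed_costs": 6000},
    "price cut": {"units_sold": 1300, "unit_price": 22, "unit_cost": 15, "fixed_costs": 6000},
    "cost rise": {"units_sold": 1000, "unit_price": 25, "unit_cost": 18, "fixed_costs": 6000},
}

# Plug each scenario's values into the same formula.
results = {name: profit(**inputs) for name, inputs in scenarios.items()}
for name, value in results.items():
    print(f"{name}: {value}")
```

In this made-up case the price cut sells more units but still earns less than the base case, which
is the kind of comparison a what-if analysis is meant to surface.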

As mentioned earlier in the unit, there is also 'Scenario Manager', which allows the analyst to
test outcomes from different scenarios. The most powerful of these Excel tools, however, is
'Solver', a flexible and powerful optimization add-in. Examples of how 'Solver' can help solve
business problems and determine optimal solutions have already been illustrated.

Although spreadsheets are versatile tools that most people have access to and can easily use, R
and Python are two other widely used tools for more advanced prescriptive analytics. As
programming languages, they give the user the flexibility to design prescriptive analytical
models whose sophistication is limited only by the programmer's skill, ingenuity and imagination.
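
To give a flavor of a prescriptive model written in code, the sketch below searches for the most
profitable production plan for two products under two resource limits. It uses a plain exhaustive
search rather than the algorithms an optimization tool such as Solver would apply, and the
profits, hours and limits are all assumed values for illustration.

```python
def best_product_mix(max_units=50):
    """Exhaustively search integer production plans for the highest profit."""
    best_plan, best_profit = None, float("-inf")
    for a in range(max_units + 1):        # units of hypothetical product A
        for b in range(max_units + 1):    # units of hypothetical product B
            machine_hours = 1 * a + 2 * b
            labor_hours = 3 * a + 2 * b
            if machine_hours > 40 or labor_hours > 60:
                continue  # this plan breaks a resource limit
            plan_profit = 30 * a + 40 * b
            if plan_profit > best_profit:
                best_plan, best_profit = (a, b), plan_profit
    return best_plan, best_profit


plan, total = best_product_mix()
print(plan, total)
```

Exhaustive search only works for very small problems; it is shown here because it makes the
objective and the constraints easy to see.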
7
DATA VISUALIZATION AND COMMUNICATION
What is data visualization?
Data visualization expert Andy Kirk described data visualization as "the representation and
presentation of data to facilitate understanding."

User-experience (UX) specialist Vitaly Friedman describes the main benefit of data visualization as
"its ability to visualize data, communicating information clearly and effectively."

https://www.youtube.com/watch?v=9NUjHBNWe9M
The purpose of data visualization
Data visualization allows us to:
1. Summarize large quantities of data effectively.
2. Answer questions that would be difficult, if not impossible, to answer using non-visual
analysis.
3. Discover questions that were not previously apparent and reveal previously unidentified
patterns.
4. View the data in its context.

The benefits of data visualization


By utilizing data visualization techniques, we can:
1. Quickly identify emerging trends and hidden patterns in the data.
2. Gain rapid insights into data which are relevant and timely.
3. Rapidly process vast amounts of data.
4. Identify data quality issues.
Types of data visualization

Comparison, Composition, Relationship

1. Types of data visualization - Comparison

2. Types of data visualization - Composition

[Pie chart: Sales by product, Product 1 to Product 4]

[Stacked bar chart: Product 1, Product 2 and Product 3 by year, 2017 to 2020]

3. Types of data visualization - Relationship
The scatter plot is ideal for visualizing the relationship between two variables and identifying
potential correlations between them.
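
The strength of the relationship that a scatter plot suggests can be checked numerically with the
Pearson correlation coefficient. The sketch below computes it by hand for two made-up variables;
the helper name `pearson_r` and the temperature and sales figures are assumptions for
illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired observations: daily temperature and ice cream sales.
temperature = [14, 16, 20, 23, 27, 30]
ice_cream_sales = [210, 250, 320, 380, 450, 500]

r = pearson_r(temperature, ice_cream_sales)
print(round(r, 3))
```

A value close to +1 or -1 indicates a strong linear relationship, while a value near 0 indicates
little or none. A strong correlation does not by itself show that one variable causes the other.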

[Scatter plot: Variable 1 plotted against Variable 2]
What makes a good visualization?
Accessibility
Accessibility is offering the most insight for the least amount of viewer effort.

Color
Careful use of color is a fundamental part of ensuring an accessible design. The choice of colors
should be a deliberate decision. They should be limited in number, should complement each other
and should be used in moderation to draw attention to key parts of the design.

Metaphor
Visual metaphors include the "traffic light" design used to provide a high-level summary of the
state of a key business risk indicator, and the speedometer metaphor used to indicate
performance.

Elegance
A good design should avoid placing obstacles in the way of the viewer; it should flow seamlessly
and should guide the viewer to the key insights it is designed to impart.
QUIZ TIME
QUIZ
Question 6
Data visualization is described as “the representation and presentation of data to facilitate
understanding.”

Indicate the above statement as True or False.

A. True
B. False
QUIZ
Question 8
By using data visualization techniques, we can

1. Always make the correct decisions
2. Gain rapid insights into data
3. Automatically reveal the correct course of action for the future
4. Identify data quality issues

Indicate which of the above statements are correct.

A. Statements 1 and 3 only
B. Statements 2 and 4 only
C. Statements 1, 3 and 4 only
D. Statements 1, 2, 3 and 4
QUIZ
Question 5
Artificial General Intelligence results in machines having capabilities superior to those of humans.

Indicate the above statement as True or False.

A. True
B. False
8

Skepticism in data analytics
Although it is tempting to conclude that the answer produced by a data mining exercise, or the
prediction produced by a machine learning algorithm, is both true and precise, this is not always
the case.

We should never take the results of any data analysis at face value. If something looks too good
(or too strange) to be true, it probably isn't. Of course, this is not always the case, which makes
it important to verify the findings before drawing any firm conclusions from them.
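
One simple way to verify a result that looks too good is to compare it with a trivial baseline.
In the sketch below, a "model" that never flags fraud still scores 95% accuracy because fraud is
rare in the data. The data and the helper `accuracy` are hypothetical and built only to make this
point.

```python
def accuracy(predictions, actuals):
    """Share of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predictions, actuals)) / len(actuals)


# Hypothetical data: 95 legitimate transactions (0) and 5 fraudulent ones (1).
actuals = [0] * 95 + [1] * 5

# A "model" that never flags fraud at all.
model_predictions = [0] * 100

model_accuracy = accuracy(model_predictions, actuals)
frauds_caught = sum(p == 1 and a == 1 for p, a in zip(model_predictions, actuals))
print(model_accuracy, frauds_caught)
```

The headline accuracy looks impressive, yet not a single fraudulent transaction is caught, so the
figure on its own says little about whether the analysis is useful.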

https://towardsdatascience.com/be-skeptical-the-most-important-principle-as-a-data-analyst-903172222c65
Ethical considerations in the use of data
Data mining and the data sources used as part of the process are subject to increasing amounts of
government and supranational regulation because of:

i. The relative ease of access to data
ii. Issues of privacy
iii. The unethical use of data by certain individuals

Transparency is a fundamental ethical principle related to the collection and use of personal data.
The data subject, the person whose data is being stored, has a right to know why the data is
being collected, who will be storing and using the data, for what purposes the data will be
used, for how long they can expect it to be stored, and how they can go about amending the
data if any details are incorrect or have changed.
A well-known example of the deliberate misuse of data mining is the Facebook-Cambridge
Analytica scandal in which Cambridge Analytica used data from the profiles of millions of
Facebook users in order to influence public opinion in the 2015 and 2016 campaigns of United
States politicians Donald Trump and Ted Cruz, the 2016 Brexit vote and the 2018 Mexican General
Election. The result of this irresponsible use of data was not only a heightening of awareness of
privacy issues on social media, but also the loss of 100 billion dollars from Facebook's market
valuation.
QUIZ TIME
Question 1
The answer produced by a data mining exercise and the prediction produced by a machine learning
algorithm are always true and precise.

Indicate the above statement as True or False.

A. True
B. False
Question 2
A retailer acquired data from customers who agreed to participate in a survey. The terms and
conditions of the survey state that the customer agrees to the use of their data for any purpose
and this is completely ethical.

Indicate the above statement as True or False.

A. True
B. False
