Lecture plan
RPKPS (Rencana Program Kegiatan Pembelajaran Semester,
the semester learning activity plan)
Specific instructional objectives for each topic (subject matter)
Data Transformation
WEKA Data Mining Implementation.
Today's topic
What is data mining:
(1) data mining and machine learning;
(2) simple examples;
Alternative names:
9/13/2013
The motivation:
Big data in
databases and
other repositories
1960s:
1970s:
1980s:
1990s-2000s:
So, how are data mining and machine learning related?
machine learning,
statistics,
databases,
pattern recognition
Text mining
Simple example
Contact lens prescription
Classification
Presented as a decision tree
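A decision tree for the contact lens example can be read as nested conditions. The sketch below assumes the simplified tree commonly shown for the standard contact-lenses dataset distributed with WEKA; the attribute names, values and branches are assumptions and are not taken from this slide.

```python
def recommend_lens(tear_rate: str, astigmatism: str, prescription: str) -> str:
    """Return 'none', 'soft' or 'hard' using an assumed, simplified tree."""
    # Root test: a reduced tear production rate rules out contact lenses.
    if tear_rate == "reduced":
        return "none"
    # Second test: without astigmatism, soft lenses are recommended.
    if astigmatism == "no":
        return "soft"
    # Third test: with astigmatism, the spectacle prescription decides.
    if prescription == "myope":
        return "hard"
    return "none"


print(recommend_lens("normal", "no", "myope"))  # prints "soft"
```

Each path from the root to a leaf corresponds to one classification rule, which is why trees are a readable way to present a classifier.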
[Table: sample rows of a dataset with numeric attributes (including pelvic_tilt) and a class attribute; every row shown has the class Hernia.]
[Figure: data mining system architecture. Labels: database or data warehouse server; filtering; databases; data warehouse; knowledge base.]
Problem:
A vast and increasing number of marketing campaigns
Solution:
Directed campaigns with a strict and rigorous selection of
contacts.
Focus on targets that are presumably more receptive to the specific
product or service
More efficient: reduces cost and time
The dataset:
A Portuguese marketing
campaign for bank
deposit subscriptions.
Steps
1. Goal definition
Classification task
The dataset was randomly divided into training (2/3) and test (1/3) sets
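The random 2/3 and 1/3 division can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_train_test`, the fixed seed and the toy input are assumptions, not the procedure used in the original study.

```python
import random


def split_train_test(rows, train_fraction=2 / 3, seed=0):
    """Shuffle the rows and cut them into a training and a test set."""
    rows = list(rows)
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    rng.shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Toy input: nine record identifiers standing in for dataset rows.
train, test = split_train_test(range(9))
print(len(train), len(test))  # prints "6 3"
```

Shuffling before cutting keeps both sets representative when the original file is ordered, for example by contact date.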
Conclusion
Call duration is the most
relevant feature, meaning
that longer calls tend to
increase success.
In second place comes the
month of contact.
Success is most likely to
occur in the last month of
each quarter (March, June,
September and December).
Such knowledge can be
used to shift campaigns to
those months.
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
Functionality
Knowledge produced by data mining
Useful
Valid
Understandable
Cluster
Frequent itemset
Frequent subsequences
Frequent substructures
Cluster analysis
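To make the "frequent itemset" functionality listed above concrete, the sketch below counts how often each item and each pair of items occurs across a set of transactions and keeps those that reach a minimum support count. It is a brute-force enumeration for illustration, not the Apriori algorithm, and the baskets are made-up toy data.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count itemsets of up to max_size items; keep those seen min_support+ times."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # de-duplicate and fix the item order
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical market baskets, invented for this example.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
result = frequent_itemsets(baskets, min_support=2)
for itemset, n in sorted(result.items()):
    print(itemset, n)
```

With a minimum support of 2, the pair (butter, milk) is dropped because it appears in only one basket.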
A data mining system may generate thousands of patterns, but not all of them
are interesting.
Approaches
First generate all the patterns and then filter out the uninteresting
ones.
Generate only the interesting patterns (mining query optimization)
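The first approach, generating every pattern and then filtering, can be sketched as a simple post-processing step. The example assumes that interestingness is measured by support and confidence thresholds, which is only one possible choice; the `Pattern` type, the threshold values and the candidate rules are hypothetical.

```python
from typing import NamedTuple


class Pattern(NamedTuple):
    description: str
    support: float      # fraction of records the pattern covers
    confidence: float   # how often the pattern holds when it applies


def is_interesting(pattern, min_support=0.1, min_confidence=0.6):
    """Assumed interestingness test: both thresholds must be met."""
    return pattern.support >= min_support and pattern.confidence >= min_confidence


def filter_interesting(patterns, **thresholds):
    """Keep only the patterns that pass the interestingness test."""
    return [p for p in patterns if is_interesting(p, **thresholds)]


# Hypothetical candidate patterns, as if produced by an earlier mining step.
candidates = [
    Pattern("bread -> milk", 0.50, 0.67),
    Pattern("milk -> butter", 0.25, 0.33),
    Pattern("caviar -> champagne", 0.01, 0.90),
]
kept = filter_interesting(candidates)
print([p.description for p in kept])  # prints "['bread -> milk']"
```

The second approach would push the same thresholds into the mining step itself so that uninteresting candidates are never generated.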