Data Mining
Vikram Pudi
vikram@iiit.ac.in
IIIT Hyderabad

Originated from DB community…
- Traditional Database Systems
  - Indexing
  - Query languages
  - Query optimization
  - Transaction processing
  - Recovery …
  - XML, Semantic web
  - OO and OR DBMS …
- Data Mining

Data Mining
Automated extraction of interesting patterns from large databases

[Figure: data from Business, Government, Research Labs and the Internet → DATA → Mine/Explore → Patterns → Decision Making, with Feedback To Data Sources]

Types of Patterns
- Associations
  - Coffee buyers usually also purchase sugar
- Clustering
  - Segments of customers requiring different promotion strategies
- Classification
  - Customers expected to be loyal

Association Rules
“That which is infrequent is not worth worrying about.”

Association Rules
D (transaction database):
Transaction ID  Items
1               Tomato, Potato, Onions
2               Tomato, Potato, Brinjal, Pumpkin
3               Tomato, Potato, Onions, Chilly
4               Lemon, Tamarind

Rule: Tomato, Potato → Onions (confidence: 66.7%, support: 50%)
Support(X) = |transactions containing X| / |D|
Confidence(R) = support(R) / support(LHS(R)), where support(R) is the support of all items of R (LHS and RHS together)
Here {Tomato, Potato} occurs in 3 of 4 transactions (75%) and {Tomato, Potato, Onions} in 2 of 4 (50%), so confidence = 50% / 75% ≈ 66.7%.

Problem proposed in [AIS 93]: find all rules satisfying a user-given minimum support and minimum confidence.

Association Rule Applications
- E-commerce
  - People who have bought Sundara Kandam have also bought Srimad Bhagavatham
- Census analysis
  - Immigrants are usually male
- Sports
  - A chess end-game configuration with “white pawn on A7” and “white knight dominating black rook” typically results in a “win for white”.
- Medical diagnosis
  - Allergy to latex rubber usually co-occurs with allergies to banana and tomato
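
A minimal sketch, assuming Python and a brute-force search (not the [AIS 93] algorithm, which these slides do not detail), that checks the Support and Confidence definitions on the Association Rules slide against the example database D and then enumerates every rule meeting user-given thresholds. The names support, confidence and mine_rules are hypothetical.

```python
from itertools import combinations

# Example transaction database D from the slide (transaction ID -> items).
D = {
    1: {"Tomato", "Potato", "Onions"},
    2: {"Tomato", "Potato", "Brinjal", "Pumpkin"},
    3: {"Tomato", "Potato", "Onions", "Chilly"},
    4: {"Lemon", "Tamarind"},
}


def support(itemset, db):
    """Fraction of transactions in db containing every item of itemset."""
    itemset = set(itemset)
    hits = sum(1 for items in db.values() if itemset <= items)
    return hits / len(db)


def confidence(lhs, rhs, db):
    """support(LHS ∪ RHS) / support(LHS), as defined on the slide."""
    return support(set(lhs) | set(rhs), db) / support(lhs, db)


def mine_rules(db, min_sup, min_conf):
    """Hypothetical brute-force miner: try every itemset of size >= 2,
    keep those with enough support, then test every LHS -> RHS split.
    Exponential in the number of items, so only for tiny examples."""
    items = sorted(set().union(*db.values()))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            sup = support(itemset, db)
            if sup == 0 or sup < min_sup:
                continue  # also avoids dividing by a zero LHS support
            for k in range(1, size):
                for lhs in combinations(itemset, k):
                    rhs = tuple(i for i in itemset if i not in lhs)
                    conf = confidence(lhs, rhs, db)
                    if conf >= min_conf:
                        rules.append((lhs, rhs, sup, conf))
    return rules
```

For example, `confidence({"Tomato", "Potato"}, {"Onions"}, D)` evaluates to 2/3 (the ≈ 66.7% above), and `mine_rules(D, 0.5, 0.6)` returns 12 rules, including Potato, Tomato → Onions. Because every itemset is enumerated, the cost grows exponentially with the number of distinct items; practical miners prune itemsets whose subsets are already infrequent.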

Classification
“To be or not to be: That is the question.” - William Shakespeare

The Classification Problem
Play Outside?

Outlook   Temp (°F)  Humidity (%)  Windy?  Class
sunny     75         70            true    play
sunny     80         90            true    don’t play
sunny     85         85            false   don’t play
sunny     72         95            false   don’t play
sunny     69         70            false   play
overcast  72         90            true    play
overcast  83         78            false   play
overcast  64         65            true    play
overcast  81         75            false   play
rain      71         80            true    don’t play
rain      65         70            true    don’t play
rain      75         80            false   play
rain      68         80            false   play
rain      70         96            false   play

sunny     77         69            true    ?
rain      73         76            false   ?

- Model the relationship between class labels and attributes
  - e.g. outlook = overcast → class = play
- Assign class labels to new data with unknown labels
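
The slides do not name a particular classification method. As a hypothetical illustration, the sketch below uses the 1R ("one rule") approach, swapped in here for simplicity: for each value of a single attribute (Outlook) it learns the majority class in the training table, which yields the example rule outlook = overcast → class = play. The names TRAIN, fit_one_rule and predict are assumptions.

```python
from collections import Counter, defaultdict

# Training rows from the table: (outlook, temp °F, humidity %, windy, class).
TRAIN = [
    ("sunny", 75, 70, True, "play"),
    ("sunny", 80, 90, True, "don't play"),
    ("sunny", 85, 85, False, "don't play"),
    ("sunny", 72, 95, False, "don't play"),
    ("sunny", 69, 70, False, "play"),
    ("overcast", 72, 90, True, "play"),
    ("overcast", 83, 78, False, "play"),
    ("overcast", 64, 65, True, "play"),
    ("overcast", 81, 75, False, "play"),
    ("rain", 71, 80, True, "don't play"),
    ("rain", 65, 70, True, "don't play"),
    ("rain", 75, 80, False, "play"),
    ("rain", 68, 80, False, "play"),
    ("rain", 70, 96, False, "play"),
]


def fit_one_rule(rows, attr_index=0):
    """Hypothetical 1R learner: map each value of one categorical
    attribute (default: Outlook) to its most frequent class label."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attr_index]][row[-1]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def predict(model, row, attr_index=0):
    """Assign a class label to a row whose label is unknown."""
    return model[row[attr_index]]
```

On this table the learned model is sunny → don't play, overcast → play, rain → play, so the two unlabelled rows would be assigned don't play and play respectively, and 10 of the 14 training rows are classified correctly. Real classifiers usually combine several attributes, but the shape is the same: fit on labelled rows, then predict labels for new ones.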

Applications
- Text classification
  - Classify emails into spam / non-spam
  - Classify web pages into a Yahoo!-style topic hierarchy
- NLP problems
  - Tagging: classify words into verbs, nouns, etc.
- Risk management, fraud detection, computer intrusion detection
  - Given the properties of a transaction (items purchased, amount, location, customer profile, etc.), determine whether it is fraudulent
- Machine learning / pattern recognition applications
  - Vision
  - Speech recognition
  - etc.
- All of science and knowledge is about predicting the future in terms of the past
  - So classification is a very fundamental problem with an extremely wide range of applications

Clustering
“Birds of a feather flock together.”

The Clustering Problem
Find groups of similar records.

Outlook   Temp (°F)  Humidity (%)  Windy?
sunny     75         70            true
sunny     80         90            true
sunny     85         85            false
sunny     72         95            false
sunny     69         70            false
overcast  72         90            true
overcast  73         88            true
overcast  64         65            true
overcast  81         75            false
rain      71         80            true
rain      65         70            true
rain      75         80            false
rain      68         80            false
rain      70         96            false

Need a function to compute similarity, given two input records.
Unsupervised learning: no class labels are given.

Applications
- Targeting similar people or objects
  - Student tutorial groups
  - Hobby groups
  - Health support groups
  - Customer groups for marketing
- Organizing e-mail
- Spatial clustering
  - Exam centres
  - Locations for a business chain
  - Planning a political strategy
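
The Clustering Problem slide says only that a similarity function over two records is needed. The sketch below assumes one hypothetical choice: a distance that counts mismatches on Outlook and Windy? and adds range-normalised differences on Temp and Humidity. It pairs this with single-link clustering under a distance threshold, a technique chosen here for illustration rather than one the slides prescribe. RECORDS, distance and single_link_clusters are hypothetical names.

```python
from itertools import combinations

# Unlabelled records from the table: (outlook, temp °F, humidity %, windy).
RECORDS = [
    ("sunny", 75, 70, True), ("sunny", 80, 90, True),
    ("sunny", 85, 85, False), ("sunny", 72, 95, False),
    ("sunny", 69, 70, False), ("overcast", 72, 90, True),
    ("overcast", 73, 88, True), ("overcast", 64, 65, True),
    ("overcast", 81, 75, False), ("rain", 71, 80, True),
    ("rain", 65, 70, True), ("rain", 75, 80, False),
    ("rain", 68, 80, False), ("rain", 70, 96, False),
]

# Ranges used to scale the numeric attributes into [0, 1].
TEMP_RANGE = max(r[1] for r in RECORDS) - min(r[1] for r in RECORDS)
HUM_RANGE = max(r[2] for r in RECORDS) - min(r[2] for r in RECORDS)


def distance(a, b):
    """Hypothetical dissimilarity in [0, 4]: 0/1 mismatch for the
    categorical attributes plus normalised gaps for the numeric ones."""
    return (
        (a[0] != b[0])
        + abs(a[1] - b[1]) / TEMP_RANGE
        + abs(a[2] - b[2]) / HUM_RANGE
        + (a[3] != b[3])
    )


def single_link_clusters(records, threshold):
    """Merge any two records within `threshold` of each other, transitively
    (single-link), using a union-find structure over record indices."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if distance(records[i], records[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

The threshold controls granularity: at 0 every record (all 14 are distinct) is its own cluster, and at 4, the largest possible distance, all records fall into one group. No class labels are used anywhere, which is what makes the task unsupervised; a different similarity function would generally produce different groups.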


Take Home
- Data mining is a mature field.
- Good algorithms for core tasks are available.
- Focus on applications to challenging kinds of data
  - Streams, distributed data, multimedia, Web, …
- Most effort is in how to map domain problems to data mining problems
  - And how to make sense of the output.
