Presented By
Dr K SATYANARAYAN REDDY
31 December 2015
Contents
Recommender Systems
Fraud Detection
Sentiment Analysis
Applications and Trends in Data Mining
Recommender System
31 December 2015
Content-Based Recommenders
Find me things that I liked in the past.
Machine learns preferences through user
feedback and builds a user profile
Explicit feedback user rates items
Implicit feedback system records user activity
Clicksteam data classified according to page category
and activity, e.g. browsing a product page
Time spent on an activity such as browsing a page
Collaborative Filtering
31 December 2015
31 December 2015
Data
Mining
George
Mark
Peter
31 December 2015
Search Data
Engines Bases
XML
4
4
2
4
Challenges for CF
Sparsity problem when many of the items have not
been rated by many people, it may be hard to find like
minded people.
First rater problem what happens if an item has not
been rated by anyone.
Privacy problems.
Can combine CF with CB recommenders
Use CB approach to score some unrated items.
Then use CF for recommendations.
Serendipity - recommend to me something I do not know
already
Oxford dictionary: the occurrence and development of
by chance
in a 9happy
beneficial
way.
31 December 2015events
BITS WASE
Data Minig Session
(2015) By or
Dr. K.
Satyanarayan Reddy
Fraud Detection
Identify wrongful actions
Is right and wrong universal?
If so, why not just prevent wrong actions
31 December 2015
10
Outlier Detection
Assume non-fraudulent behavior is normal
Find the exceptions
Problems
31 December 2015
11
31 December 2015
12
Scaling Issues
What happens with millions of users?
Credit card
Cell phone
31 December 2015
13
Multi-user profiles
Cluster users
Develop profiles for clusters
E.g., differential profiling
31 December 2015
14
Sentiment Analysis
Sentiment
A thought, view, or attitude, especially one based
mainly on emotion instead of reason
Sentiment Analysis
aka opinion mining
use of natural language processing (NLP) and
computational techniques to automate the
extraction or classification of sentiment from
typically unstructured text
31 December 2015
15
Motivation
Consumer information
Product reviews
Marketing
Consumer attitudes
Trends
Politics
Politicians want to know voters views
Voters want to know policitians stances and who else supports them
Social
Find like-minded individuals or communities
31 December 2015
16
Problem
Which features to use?
Words (unigrams)
Phrases/n-grams
Sentences
17
Challenges
Harder than topical classification, with which
bag of words features perform well
Must consider other features due to
Subtlety of sentiment expression
irony
expression of sentiment using neutral words
Domain/context dependence
words/phrases can mean different things in different
contexts and domains
18
Approaches
Machine learning
Nave Bayes
Maximum Entropy Classifier
SVM
Markov Blanket Classifier
Accounts for conditional feature dependencies
Allowed reduction of discriminating features from
thousands of words to about 20 (movie review domain)
Unsupervised methods
Use lexicons
31 December 2015
19
20
21
22
23
24
25
31 December 2015
26
27
28
29
30
31
32
33
34
Clementine (SPSS)
An integrated data mining development
environment for end-users and developers
Multiple data mining algorithms and
visualization tools
31 December 2015
35
36
Multimedia
Systems
High Performance
Computing
31 December 2015
Human
Computer
Interfaces
Pattern
Recognition
37
Visualization
Purpose of Visualization
Gain insight into an information space by mapping data
onto graphical primitives
Provide qualitative overview of large data sets
Search for patterns, trends, structure, irregularities,
relationships among data.
Help find interesting regions and suitable parameters for
further quantitative analysis.
Provide a visual proof of computer representations derived
31 December 2015
38
data visualization
data mining result visualization
data mining process visualization
interactive visual data mining
Data visualization
Data in a database or data warehouse can be
viewed at different levels of abstraction
as different combinations of attributes or dimensions
39
Outliers
Generalized rules
31 December 2015
40
41
Understand
variations with
visualized data
31 December 2015
42
Example
Display the data distribution in a set of attributes using colored
sectors or columns (depending on whether the whole space is
represented by either a circle or a set of columns)
31 December 2015
43
31 December 2015
44
45
Regression
predict the value of a response
(dependent) variable from one or more
predictor (independent) variables where
the variables are numeric
31 December 2015
46
Mixed-effect models
For analyzing grouped data, i.e. data that can be classified according to one
or more grouping variables
31 December 2015
47
48
www.spss.com/datamine/factor.htm
31 December 2015
49
Survival Analysis
predicts the probability
that a patient undergoing
a medical treatment
would survive at least to
time t (life span
prediction)
31 December 2015
50
51
52
53
54
31 December 2015
55
56
57
58
59
60
31 December 2015
61
62
63
Summary
Domain-specific applications include biomedicine (DNA), finance, retail
and telecommunication data mining
There exist some data mining systems and it is important to know their
power and limitations
Visual data mining include data visualization, mining result
64
THANK YOU
31 December 2015
65
31 December 2015
66
Outlier Detection
Assume non-fraudulent behavior is normal
Find the exceptions
Problems?
Gives profile of
individual behavior
How do we do this?
Classification
Mining
Profile
Profile
Profile
Results
Accuracy
Time to Alarm
Scaling Issues
What happens with millions of users?
Credit card
Cell phone
Multi-user profiles
Cluster users
Develop profiles for clusters
E.g., differential profiling
Data mining
for detection and
prevention
Group behavior
using a clustering
algorithm
Find groups of
events using the
association
algorithms
Identify outliers
and investigate
Data Mining in
action: Fraud,
waste and abuse
case studies
Audit selection
The Washington State
Department of Revenue
needed to detect erroneous
tax returns and...
Predict
Prevent future instances