Data Mining Techniques

Submitted by:
Abhishek Chowdhury (10030141006)
Samradni Ghone (10030141009)
Puneet Gupta (10030141026)
Contents:
Data Mining
Next Generation Data Mining Techniques
  Decision Trees
  Neural Network
  Association Rules
Implementation of Techniques in Business
What is Data Mining?
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses.
Data Mining Techniques:
Decision Tree
Neural Network
Association Rules
What is a Decision Tree?
A decision tree is a predictive model that, as its name implies, can be viewed as a tree. Specifically, each branch of the tree is a classification question, and the leaves of the tree are partitions of the dataset with their classification.
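A minimal Python sketch of this structure: each branch holds a yes/no classification question, and each leaf holds the classification of its partition. The field names (income, age), the thresholds, and the class labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str  # classification assigned to records that reach this leaf


@dataclass
class Branch:
    question: Callable[[dict], bool]  # classification question asked at this branch
    yes: "Node"  # subtree followed when the answer is True
    no: "Node"   # subtree followed when the answer is False


Node = Union[Leaf, Branch]


def classify(node: Node, record: dict) -> str:
    """Walk from the root to a leaf, answering one question per branch."""
    while isinstance(node, Branch):
        node = node.yes if node.question(record) else node.no
    return node.label


# Hypothetical tree: the fields, thresholds and labels are illustrative only.
tree = Branch(
    question=lambda r: r["income"] > 50_000,
    yes=Leaf("likely buyer"),
    no=Branch(
        question=lambda r: r["age"] < 30,
        yes=Leaf("likely buyer"),
        no=Leaf("unlikely buyer"),
    ),
)

print(classify(tree, {"income": 30_000, "age": 45}))
```

Every record ends up in exactly one leaf, which is why the leaves can be read as partitions of the dataset.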
A decision tree consists of 3 types of nodes:
1. Decision nodes - commonly represented by squares
2. Chance nodes - represented by circles
3. End nodes - represented by triangles
Example: [figure not available]
Where can decision trees be used?
Using Decision Trees for Exploration
Using Decision Trees for Preprocessing
Using Decision Trees for Prediction
Growing the Decision Tree
When to start growing the decision tree?
When does the tree stop growing?
Why would a decision tree algorithm stop growing the tree if there wasn't enough data?
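One common set of stopping criteria is sketched below in Python. It is offered as an assumption, not as the method the slides had in mind: a segment stops being split when it holds too few records for another split to be statistically reliable, or when all of its records already share one class. The function name should_stop and the threshold min_records are hypothetical.

```python
def should_stop(labels, min_records=10):
    """Return True when a segment should become a leaf instead of being split again.

    labels      -- class labels of the records that fall in the segment
    min_records -- hypothetical threshold below which a split is not trusted
    """
    if len(labels) < min_records:
        return True  # not enough data to justify another question
    if len(set(labels)) == 1:
        return True  # segment is already pure, nothing left to separate
    return False


print(should_stop(["buyer"] * 3))
print(should_stop(["buyer"] * 10 + ["non-buyer"] * 10))
```

With very little data in a segment, any further split is more likely to reflect noise than a real pattern, which is the usual reason for the first check.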
Rule Induction
Rule induction is one of the major forms of data mining and is perhaps the most common form of knowledge discovery in unsupervised learning systems. In rule induction, all possible patterns are systematically pulled out of the data, and then accuracy and significance measures are attached to them that tell the user how strong each pattern is and how likely it is to occur again.
What is a Rule?
In rule induction systems the rule itself has a simple form: "if this and this and this, then this". For example, rules that a supermarket might find in the data collected from its scanners would be:
If pickles are purchased then ketchup is purchased.
If paper plates then plastic forks.
In order for the rules to be useful, two pieces of information must be supplied along with the actual rule:
Accuracy - How often is the rule correct?
Coverage - How often does the rule apply?
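As a minimal Python sketch, both measurements can be computed directly from a list of shopping baskets. Coverage is taken here as the share of baskets in which the antecedent appears ("how often does the rule apply"), and accuracy as the share of those baskets that also contain the consequent. The function name rule_stats and the five toy baskets are hypothetical.

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (accuracy, coverage) of the rule 'if antecedent then consequent'."""
    antecedent, consequent = set(antecedent), set(consequent)
    if not baskets:
        return 0.0, 0.0
    # Baskets in which the rule applies: every antecedent item is present.
    applies = [b for b in baskets if antecedent <= set(b)]
    coverage = len(applies) / len(baskets)
    # Among those, baskets in which the rule is correct.
    correct = sum(1 for b in applies if consequent <= set(b))
    accuracy = correct / len(applies) if applies else 0.0
    return accuracy, coverage


# Toy data, invented for illustration only.
baskets = [
    {"cereal", "milk"},
    {"cereal", "milk", "bread"},
    {"cereal"},
    {"bread"},
    {"milk"},
]

accuracy, coverage = rule_stats(baskets, {"cereal"}, {"milk"})
print(f"accuracy={accuracy:.2f} coverage={coverage:.2f}")
```

In this toy data the rule applies to three of the five baskets and is correct in two of those three.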
What is a Rule?
In some cases accuracy is called the confidence of the rule and coverage is called the support. Accuracy and coverage appear to be the preferred ways of naming these two measurements.

Rule                                                  Accuracy   Coverage
If breakfast cereal purchased then milk purchased.    85%        20%
If bread purchased then Swiss cheese purchased.       15%        6%

The left hand side is called the antecedent and the right hand side is called the consequent.
What to do with a Rule?

When the rules are mined out of the database, they can be used either for better understanding the business problems that the data reflects or for performing actual predictions against some predefined prediction target.
Target the antecedent.
Target the consequent.
Target based on accuracy.
Target based on coverage.
Target based on interestingness.
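A minimal Python sketch of the first four ways of targeting, using the two example rules given earlier (cereal and milk, bread and Swiss cheese). The Rule type and the helper names are hypothetical. Interestingness is left out because no measure for it is defined here.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: str
    consequent: str
    accuracy: float
    coverage: float


# The two example rules quoted earlier in the slides.
RULES = [
    Rule("breakfast cereal", "milk", 0.85, 0.20),
    Rule("bread", "swiss cheese", 0.15, 0.06),
]


def target_antecedent(rules, item):
    """Rules that fire when `item` is purchased."""
    return [r for r in rules if r.antecedent == item]


def target_consequent(rules, item):
    """Rules that predict the purchase of `item`."""
    return [r for r in rules if r.consequent == item]


def rank_by_accuracy(rules):
    """Most often correct rules first."""
    return sorted(rules, key=lambda r: r.accuracy, reverse=True)


def rank_by_coverage(rules):
    """Most often applicable rules first."""
    return sorted(rules, key=lambda r: r.coverage, reverse=True)


print(target_consequent(RULES, "milk"))
print(rank_by_accuracy(RULES)[0].antecedent)
```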
Discovery
These systems provide both a very detailed view of the data, in which significant patterns that occur only rarely can be found, and a broad overview of it. They thus display a nice combination of both micro and macro views:
Macro Level - Patterns that cover many situations are provided to the user; these can be used very often and with great confidence, and can also be used to summarize the database.
Micro Level - Strong rules that cover only a very few situations can still be retrieved by the system and proposed to the end user.
Prediction
Each rule by itself can perform prediction - the consequent is the target and the accuracy of the rule is the accuracy of the prediction. But because rule induction systems produce many rules for a given antecedent or consequent, there can be conflicting predictions with different accuracies. The rules can then be combined in a variety of ways, for example by summing the accuracies as if they were weights, or just by taking the prediction of the rule with the maximum accuracy.
Prediction
Antecedent   Consequent   Accuracy   Coverage
bread        milk         35%        30%
butter       milk         65%        20%
eggs         milk         35%        15%
cheese       milk         40%        8%
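The two ways of combining rules, summing accuracies as weights or taking the single most accurate rule, can be sketched in Python as follows. The four milk rules are the ones in the table. The rule "bread then jam" at 50% is hypothetical and is added only so that the predictions conflict. The function names are hypothetical as well.

```python
from collections import defaultdict

# (antecedent items, consequent, accuracy)
RULES = [
    ({"bread"}, "milk", 0.35),
    ({"butter"}, "milk", 0.65),
    ({"eggs"}, "milk", 0.35),
    ({"cheese"}, "milk", 0.40),
    ({"bread"}, "jam", 0.50),  # hypothetical rule, not from the table
]


def matching_rules(basket):
    """(consequent, accuracy) for every rule whose antecedent is in the basket."""
    return [(cons, acc) for ante, cons, acc in RULES if ante <= basket]


def predict_by_max(basket):
    """Take the prediction of the single rule with the maximum accuracy."""
    matches = matching_rules(basket)
    if not matches:
        return None
    return max(matches, key=lambda m: m[1])[0]


def predict_by_sum(basket):
    """Sum the accuracies per consequent as if they were weights."""
    totals = defaultdict(float)
    for cons, acc in matching_rules(basket):
        totals[cons] += acc
    if not totals:
        return None
    return max(totals, key=totals.get)


basket = {"bread", "eggs"}
print(predict_by_max(basket), predict_by_sum(basket))
```

For a basket of bread and eggs the two strategies disagree: the single most accurate rule predicts jam, while the two weaker milk rules together outweigh it when the accuracies are summed.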
The business importance of accuracy and coverage

From a business perspective, accurate rules are important because they imply that there is useful predictive information in the database that can be exploited - namely, that the antecedent and the consequent are far from independent.
                Accuracy Low                                          Accuracy High
Coverage High   Rule is rarely correct but can be used often.         Rule is often correct and can be used often.
Coverage Low    Rule is rarely correct and can be only rarely used.   Rule is often correct but can be only rarely used.
For instance, you may have a rule that is 100% accurate but is only applicable in 1 out of every 100,000 shopping baskets. You can rearrange your shelf space to take advantage of this, but because the rule so rarely applies the gain is likely to be small.
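The trade-off can be made concrete with a little arithmetic, sketched here in Python: the expected number of baskets in which a rule both applies and is correct is the number of baskets times coverage times accuracy. The figure of one million baskets and the function name are hypothetical. The 85% / 20% rule is the cereal and milk example from earlier.

```python
def expected_correct_uses(baskets, accuracy, coverage):
    """Expected number of baskets where the rule applies and is correct."""
    return baskets * coverage * accuracy


N = 1_000_000  # hypothetical number of shopping baskets

# 100% accurate, but applicable in only 1 out of every 100,000 baskets.
rare = expected_correct_uses(N, accuracy=1.00, coverage=1 / 100_000)

# The cereal -> milk rule: 85% accurate, 20% coverage.
common = expected_correct_uses(N, accuracy=0.85, coverage=0.20)

print(round(rare), round(common))
```

Out of a million baskets the perfectly accurate rule is useful in about 10, while the less accurate but far more widely applicable rule is useful in about 170,000.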
