
BY: MOHIT YADAV (096), JAYEETA CHATTERJEE (101), MONIKA KATARIA (112)

MEANING OF DATA MINING


Data mining (the analysis step of the knowledge discovery in databases process), a relatively young and interdisciplinary field of computer science, is the process of discovering new patterns in large data sets using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.

ROLE OF DATA MINING


Extract, transform, and load transaction data into the data warehouse system.

Store and manage the data in a multidimensional database system.

Provide data access to business analysts and information technology professionals.

Analyze the data with application software.

Present the data in a useful format, such as a graph or table.

EXAMPLE OF DATA MINING

ADVANTAGES AND DISADVANTAGES OF DATA MINING

Advantages: Marketing / Retail, Finance / Banking, Manufacturing, Government

Disadvantages: Privacy Issues, Security Issues, Misuse of Information / Inaccurate Information

MEMORY-BASED REASONING TECHNIQUE - MEANING


Memory-Based Reasoning (MBR) tries to mimic human behavior in an automatic way. Memories of specific events are used directly to make decisions, rather than indirectly (as in systems that use experience to infer rules). MBR is a two-step procedure: first, identify similar cases from experience; second, apply the information from these cases to the new case. MBR is particularly well suited to non-numerical data. It needs a distance measure to quantify the dissimilarity of two observations and a combination function to combine the results from the neighboring points into an answer. Generating examples is much easier than generating rules, which is what makes MBR attractive. However, applying rules to new observations is much easier and faster than comparing new cases to a large body of memorized examples.
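The two-step procedure can be sketched as a k-nearest-neighbor classifier. This is a minimal illustration, assuming numeric feature vectors, Euclidean distance as the distance measure, and a simple majority vote as the combination function; `MEMORY`, `euclidean`, `majority_vote`, and `mbr_classify` are hypothetical names, and the data is invented.

```python
import math
from collections import Counter

# Hypothetical memory of preclassified cases: (feature vector, class label).
MEMORY = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((5.0, 5.0), "B"),
    ((5.5, 4.5), "B"),
]


def euclidean(a, b):
    """Distance measure: larger values mean less similar observations."""
    return math.dist(a, b)


def majority_vote(labels):
    """Combination function: the most common label among the neighbors."""
    return Counter(labels).most_common(1)[0][0]


def mbr_classify(query, memory, k=3):
    # Step 1: identify the k stored cases most similar to the new case.
    neighbors = sorted(memory, key=lambda case: euclidean(query, case[0]))[:k]
    # Step 2: apply the information (labels) from those cases to the new case.
    return majority_vote([label for _, label in neighbors])


print(mbr_classify((1.1, 0.9), MEMORY))  # A
```

Note that nothing is "learned" up front: all the work happens at query time, which is why comparing against a large memory is slower than applying a rule.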

The human ability to reason from experience depends on the ability to recognize appropriate examples from the past. A doctor diagnosing a disease and a claims analyst identifying fraudulent insurance claims each first identify similar cases from experience and then apply knowledge of those examples to the problem at hand. This is the essence of memory-based reasoning: a database of known records is searched to find preclassified records similar to a new record, and these neighbors are used for classification and estimation.

ELEMENTS OF MBR
MBR uses known instances to predict unknown ones. It maintains a dataset of known records; when a new record arrives for evaluation, the algorithm finds the neighbors most similar to the new record, which are then used for:

Prediction

Classification
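The same neighbors can serve both purposes: a vote over their class labels gives a classification, while an average of their numeric values gives a prediction (estimate). A minimal sketch, assuming pre-scaled two-field records and k = 2; the records and the names `nearest`, `classify`, and `predict` are hypothetical.

```python
import math
from collections import Counter
from statistics import mean

# Hypothetical known records: (scaled features, class label, numeric value).
RECORDS = [
    ((0.1, 0.2), "no", 200.0),
    ((0.2, 0.3), "no", 260.0),
    ((0.8, 0.7), "yes", 900.0),
    ((0.9, 0.9), "yes", 1000.0),
]


def nearest(query, records, k):
    """Return the k records closest to the query (Euclidean distance)."""
    return sorted(records, key=lambda rec: math.dist(query, rec[0]))[:k]


def classify(query, records, k=2):
    # Classification: majority vote over the neighbors' class labels.
    labels = [label for _, label, _ in nearest(query, records, k)]
    return Counter(labels).most_common(1)[0][0]


def predict(query, records, k=2):
    # Prediction (estimation): average of the neighbors' numeric values.
    return mean(value for _, _, value in nearest(query, records, k))


print(classify((0.85, 0.8), RECORDS), predict((0.85, 0.8), RECORDS))  # yes 950.0
```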

HOW IT WORKS?
When a new record arrives, the tool first calculates the distance between the new record and each record in the training dataset. The distance function performs this calculation, and the results determine which training records qualify as neighbors.
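A distance function has to handle every field in the record, including non-numerical ones. The sketch below assumes a hypothetical record layout with one numeric field (`age`, scaled by an assumed range `AGE_RANGE`) and two categorical fields compared for equality, and adds up the per-field distances, which is one common choice among several. It then computes the distance from a new record to every training record and ranks them, so the closest ones can be taken as neighbors.

```python
# Hypothetical record layout: one numeric field and two categorical fields.
AGE_RANGE = 60.0  # assumed spread of ages in the training data, used for scaling


def field_distance(field, a, b):
    """Per-field distance, scaled so numeric and categorical fields are comparable."""
    if field == "age":
        return abs(a - b) / AGE_RANGE  # numeric: scaled absolute difference
    return 0.0 if a == b else 1.0      # categorical: same value or not


def record_distance(r1, r2):
    # Add up the per-field distances (one common way to combine fields).
    return sum(field_distance(f, r1[f], r2[f]) for f in r1)


def rank_training_records(new_record, training):
    """Distance from the new record to every training record, closest first."""
    scored = [(record_distance(new_record, rec), rec) for rec in training]
    return sorted(scored, key=lambda pair: pair[0])


TRAINING = [
    {"age": 27, "gender": "F", "region": "north"},
    {"age": 51, "gender": "M", "region": "south"},
    {"age": 30, "gender": "F", "region": "south"},
]
new = {"age": 28, "gender": "F", "region": "north"}
for dist, rec in rank_training_records(new, TRAINING):
    print(round(dist, 3), rec)
```

Sorting on the distance alone (the `key` argument) avoids ever comparing two dictionaries when distances tie.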

SOLVING A DATA MINING PROBLEM USING MBR


Selecting the most suitable historical records to form the training (base) dataset.

Establishing the best way to represent the historical records.

Determining the two essential functions:

Distance Function

Combination Function
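Treating the distance and combination functions as parameters makes these two choices explicit. This sketch assumes a hypothetical `mbr` helper and compares a plain majority vote with a distance-weighted vote (each neighbor weighted by roughly 1 / distance), which is one common alternative combination function; the one-field records are invented.

```python
import math
from collections import Counter, defaultdict


def mbr(query, memory, distance, combine, k=3):
    """Generic MBR: `distance` and `combine` are the two essential functions."""
    scored = sorted(((distance(query, x), y) for x, y in memory), key=lambda p: p[0])
    return combine(scored[:k])


def majority_vote(neighbors):
    """Combination function: every neighbor gets one vote."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def weighted_vote(neighbors):
    """Combination function: closer neighbors get larger votes."""
    weights = defaultdict(float)
    for dist, label in neighbors:
        weights[label] += 1.0 / (dist + 1e-9)  # small epsilon avoids division by zero
    return max(weights, key=weights.get)


# Hypothetical one-field records: (features, label).
MEMORY = [((0.0,), "X"), ((2.0,), "Y"), ((2.2,), "Y")]

print(mbr((0.5,), MEMORY, math.dist, majority_vote))  # Y
print(mbr((0.5,), MEMORY, math.dist, weighted_vote))  # X
```

With k = 3, the two more distant "Y" records outvote the single close "X" record under a plain vote, but the weighted vote favors the much closer "X".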

MBR APPLICATIONS
Fraud detection

Customer response prediction

Medical treatments

Classifying responses: MBR can process free-text responses and assign codes to them
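Coding free-text responses can be sketched as 1-nearest-neighbor classification over word sets. This assumes Jaccard distance (one minus the size of the shared word set divided by the size of the combined word set) as the distance function, and copying the code of the single closest response as the combination function; the responses and the codes `DELIVERY` and `PRICE` are hypothetical.

```python
import re

# Hypothetical pre-coded responses; the codes are invented for illustration.
CODED = [
    ("the delivery arrived late", "DELIVERY"),
    ("package came two weeks late", "DELIVERY"),
    ("the price is too high", "PRICE"),
    ("too expensive for what you get", "PRICE"),
]


def words(text):
    """Lower-case word set of a response."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard_distance(a, b):
    # 0.0 means identical word sets, 1.0 means no words in common.
    wa, wb = words(a), words(b)
    union = wa | wb
    return 1.0 - len(wa & wb) / len(union) if union else 0.0


def assign_code(response, coded=CODED):
    # 1-nearest neighbor: copy the code of the most similar coded response.
    _, code = min(coded, key=lambda pair: jaccard_distance(response, pair[0]))
    return code


print(assign_code("my delivery was late"))  # DELIVERY
```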

PREDICTIVE DATA MINING USED IN MBR


Honest: Tridas, Vickie, Mike

Crooked: Wally, Waldo, Barney

PREDICTION

Tridas

Vickie

Mike

Honest = has round eyes and a smile
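The rule above can be contrasted with the memory-based approach, which skips the rule and compares a new face directly with stored examples. In this sketch each face is encoded with hypothetical boolean features (`round_eyes`, `smile`, `hat`); the stored faces are illustrative placeholders, not the people pictured on the original slide, and the distance function is simply a count of differing features.

```python
# Hypothetical boolean encoding of faces; stored faces are placeholders.
def rule_says_honest(face):
    """The rule from the slide: Honest = has round eyes and a smile."""
    return face["round_eyes"] and face["smile"]


MEMORY = [
    ({"round_eyes": True, "smile": True, "hat": False}, "Honest"),
    ({"round_eyes": True, "smile": True, "hat": True}, "Honest"),
    ({"round_eyes": False, "smile": False, "hat": True}, "Crooked"),
    ({"round_eyes": False, "smile": True, "hat": False}, "Crooked"),
]


def mismatches(a, b):
    """Distance function: number of features on which two faces differ."""
    return sum(a[f] != b[f] for f in a)


def mbr_label(face):
    # 1-nearest neighbor: copy the label of the most similar stored face.
    return min(MEMORY, key=lambda example: mismatches(face, example[0]))[1]


new_face = {"round_eyes": True, "smile": False, "hat": False}
print(rule_says_honest(new_face), mbr_label(new_face))  # False Honest
```

For a face with round eyes but no smile, the rule says "not honest", whereas MBR answers "Honest" because the closest stored face differs from it in only one feature. The two approaches can therefore disagree on borderline cases.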



ADVANTAGES
Can use data as is.
Adapts easily to new data: adding or deleting an example has no side effects on the rest of the example base.

Explanation of answers is based on real examples.


It can be applied to ordered (ordinal) data as well as to nominal and ratio data.


High parallelism is possible.
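Because the "model" is just the stored examples, adding or deleting a record takes effect on the next query with no retraining. A minimal sketch with a hypothetical `Memory` class that uses 1-nearest-neighbor lookup with Euclidean distance; the records are invented.

```python
import math


class Memory:
    """Hypothetical MBR store: "training" is simply keeping the examples."""

    def __init__(self):
        self.examples = []

    def add(self, features, label):
        self.examples.append((features, label))

    def remove(self, features, label):
        self.examples.remove((features, label))  # raises ValueError if absent

    def nearest_label(self, query):
        # 1-nearest neighbor over whatever examples are currently stored.
        return min(self.examples, key=lambda ex: math.dist(query, ex[0]))[1]


memory = Memory()
memory.add((0.0, 0.0), "low")
memory.add((10.0, 10.0), "high")
print(memory.nearest_label((4.0, 4.0)))  # low
memory.add((5.0, 5.0), "high")
print(memory.nearest_label((4.0, 4.0)))  # high
memory.remove((5.0, 5.0), "high")
print(memory.nearest_label((4.0, 4.0)))  # low
```

Each distance computation in `nearest_label` is independent of the others, which is also why the search parallelizes well.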

DISADVANTAGES
Resource intensive

It cannot generate an answer that does not already exist in the example database.
Prediction accuracy strongly depends on the definition of similarity and on several design choices:

Choosing appropriate historical data for use in training

Choosing the most efficient way to represent the training data

Choosing the distance function, combination function, and number of neighbors
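To illustrate how sensitive the answer is to these choices, the sketch below classifies the same query with two different numbers of neighbors. The one-field data is invented for illustration, and the distance (`math.dist`, which reduces to an absolute difference here) and the majority-vote combination are assumptions.

```python
import math
from collections import Counter

# Invented one-field records: (features, label).
MEMORY = [((0.5,), "red"), ((1.0,), "blue"), ((1.2,), "blue"), ((5.0,), "red")]


def classify(query, k):
    # Take the k closest records and let them vote.
    nearest = sorted(MEMORY, key=lambda ex: math.dist(query, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


print(classify((0.0,), k=1))  # red
print(classify((0.0,), k=3))  # blue
```

With k = 1 the single closest record ("red") decides; with k = 3 the two "blue" records outvote it.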

CONCLUSION
It produces results that are readily understandable.

It is applicable to arbitrary data types, even nonrelational data.


It works efficiently on almost any number of fields.

Maintaining the training set requires a minimal amount of effort.

It is computationally expensive when doing classification and prediction.

It requires a large amount of storage for the training set.

Results can be dependent on the choice of distance function, combination function, and number of neighbors.
