
Introduction to Data Mining

By Poonam Bhargav

Outline
Why data mining?
What is data mining?
KDD and data mining
Data mining functionality
Applications of data mining

Why Data Mining


Wide availability of vast amounts of data
Huge data, starvation for knowledge
Strong competitive pressure
Traditional techniques are infeasible for raw data
The costs of data storage have decreased significantly. Similarly, computing power has continued to increase, while its relative cost has continued to decrease.

What is Data Mining


Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases.
Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns.
Data mining is a confluence of multiple disciplines, such as databases, machine learning, visualization, statistics, and pattern recognition.

Also known as knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

KDD and Data Mining


While data mining and knowledge discovery in databases (or KDD) are frequently treated as synonyms, data mining is actually part of the knowledge discovery process.

Data Mining: The Core of the Knowledge Discovery Process

Knowledge Discovery Process


Data cleaning: also known as data cleansing, a phase in which noisy and irrelevant data are removed from the collection.
Data integration: at this stage, multiple data sources, often heterogeneous, may be combined into a common source.
Data selection: at this step, the data relevant to the analysis are decided on and retrieved from the data collection.
Data transformation: also known as data consolidation, a phase in which the selected data are transformed into forms appropriate for the mining procedure.
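
The four preparation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sources (store_a, store_b), their field names, and the cleaning and discretization rules are hypothetical and do not come from the slides.

```python
# Hypothetical raw records from two sources with different field names.
store_a = [
    {"customer": "ann", "rentals": 34, "age": 29},
    {"customer": "bob", "rentals": None, "age": 41},  # missing value
    {"customer": "cy", "rentals": -3, "age": 35},     # impossible value
]
store_b = [
    {"name": "dee", "rental_count": 4, "age": 52, "newsletter": True},
    {"name": "eli", "rental_count": 12, "age": 23, "newsletter": False},
]

def clean(records, key):
    """Data cleaning: drop records whose rental count is missing or negative."""
    return [r for r in records if r.get(key) is not None and r[key] >= 0]

def integrate(a, b):
    """Data integration: map both sources onto one common schema."""
    merged = [{"customer": r["customer"], "rentals": r["rentals"], "age": r["age"]} for r in a]
    merged += [{"customer": r["name"], "rentals": r["rental_count"], "age": r["age"]} for r in b]
    return merged

def select(records, fields):
    """Data selection: keep only the attributes relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: discretize the rental count into coarse levels."""
    def level(n):
        return "high" if n > 30 else "low" if n < 5 else "medium"
    return [{**r, "rental_level": level(r["rentals"])} for r in records]

prepared = transform(
    select(
        integrate(clean(store_a, "rentals"), clean(store_b, "rental_count")),
        ["customer", "rentals"],
    )
)
for row in prepared:
    print(row)
```

Each step is a separate function so that the order used on the slide (clean, integrate, select, transform) stays visible in the final call.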

Knowledge Discovery Process Contd.


Data mining: the crucial step in which clever techniques are applied to extract potentially useful patterns.
Pattern evaluation: in this step, strictly interesting patterns representing knowledge are identified based on given measures.
Knowledge representation: the final phase, in which the discovered knowledge is visually represented to the user. This essential step uses visualization techniques to help users understand and interpret the data mining results.
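
As a rough sketch of these last three steps, the Python below mines co-occurring item pairs, keeps those that pass an interestingness measure, and renders the result as text bars. The data set, the choice of pair counting as the mining technique, and the 40% support threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-prepared data: one set of rented genres per customer.
baskets = [
    {"action", "comedy"},
    {"action", "comedy", "drama"},
    {"comedy", "drama"},
    {"action", "comedy"},
    {"drama"},
]

def mine_pairs(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in counts.items()
        if count / n_baskets >= min_support
    }

def present(patterns):
    """Knowledge representation: render each pattern as a text bar."""
    ranked = sorted(patterns.items(), key=lambda kv: -kv[1])
    return [
        f"{' + '.join(pair):<18}{'#' * round(support * 10)} {support:.0%}"
        for pair, support in ranked
    ]

patterns = evaluate(mine_pairs(baskets), len(baskets), min_support=0.4)
print("\n".join(present(patterns)))
```

A real system would replace the text bars with charts, but the division of labour between the three functions is the same.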

Data Mining Functionalities


Concept description: Characterization and discrimination
E.g., comparing the general characteristics of customers who rented more than 30 movies in the last year with those of customers who rented fewer than 5 from a video store.
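
A minimal Python sketch of this comparison follows. The customer records and the two summarized attributes (age and monthly visits) are hypothetical; only the two rental thresholds come from the example.

```python
from statistics import mean

# Hypothetical customer records: (rentals last year, age, monthly visits)
customers = [
    (42, 24, 9), (35, 31, 7), (51, 27, 11),  # rented more than 30 movies
    (3, 55, 1), (1, 48, 1), (4, 62, 2),      # rented fewer than 5 movies
    (15, 38, 4),                             # belongs to neither class
]

def characterize(group):
    """Characterization: summarize a class by its average attribute values."""
    return {
        "avg_age": mean(r[1] for r in group),
        "avg_visits": mean(r[2] for r in group),
    }

frequent = [c for c in customers if c[0] > 30]
infrequent = [c for c in customers if c[0] < 5]

target = characterize(frequent)
contrast = characterize(infrequent)

# Discrimination: contrast the target class against the comparison class.
difference = {key: target[key] - contrast[key] for key in target}
print(target)
print(contrast)
print(difference)
```

Characterization describes one class on its own, while discrimination reports how that description differs from another class.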

Association
major(x, CS) ∧ takes(x, DB) → grade(x, A) [support 1%, confidence 75%]
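
Assuming the bracketed pair follows the usual convention of rule support and rule confidence, both measures can be computed as in the sketch below. The student records are invented, so the resulting numbers differ from the 1% and 75% on the slide.

```python
# Hypothetical student records.
students = [
    {"major": "CS", "takes_db": True, "db_grade": "A"},
    {"major": "CS", "takes_db": True, "db_grade": "A"},
    {"major": "CS", "takes_db": True, "db_grade": "A"},
    {"major": "CS", "takes_db": True, "db_grade": "B"},
    {"major": "CS", "takes_db": False, "db_grade": None},
    {"major": "Math", "takes_db": True, "db_grade": "A"},
    {"major": "Math", "takes_db": False, "db_grade": None},
    {"major": "Physics", "takes_db": False, "db_grade": None},
]

def antecedent(s):
    """Left-hand side of the rule: a CS major who takes DB."""
    return s["major"] == "CS" and s["takes_db"]

def consequent(s):
    """Right-hand side of the rule: grade A in DB."""
    return s["db_grade"] == "A"

both = sum(1 for s in students if antecedent(s) and consequent(s))
matching_antecedent = sum(1 for s in students if antecedent(s))

# Support: share of all records that satisfy the whole rule.
support = both / len(students)
# Confidence: share of antecedent matches that also satisfy the consequent.
confidence = both / matching_antecedent
print(f"support = {support:.1%}, confidence = {confidence:.0%}")
```

Read this way, the rule on the slide would say that 1% of all students are CS majors who take DB and earn an A, and that 75% of the CS majors taking DB earn an A.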

Classification and Prediction


E.g., teachers classify students' grades as A, B, C, D, or F.

E.g., predicting a missing value, such as a user profile property that the user did not submit on a web form.
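
Both examples can be sketched in Python. The grade cutoffs, the profile fields, and the "most common value among similar profiles" strategy are assumptions chosen for illustration; real systems typically learn such rules from labeled data.

```python
from collections import Counter

def classify_grade(score):
    """Classification: assign a score to one of the predefined classes A-F.
    The cutoffs are assumed, not taken from the slides."""
    for cutoff, grade in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return grade
    return "F"

# Hypothetical user profiles; the last user left "country" blank on the form.
profiles = [
    {"language": "fr", "country": "France"},
    {"language": "fr", "country": "France"},
    {"language": "fr", "country": "Canada"},
    {"language": "de", "country": "Germany"},
    {"language": "fr", "country": None},
]

def predict_missing(profiles, target, given):
    """Prediction: fill a missing value with the most common value among
    profiles that share the same value of the `given` attribute."""
    filled = []
    for p in profiles:
        if p[target] is None:
            peers = [
                q[target]
                for q in profiles
                if q[target] is not None and q[given] == p[given]
            ]
            guess = Counter(peers).most_common(1)[0][0] if peers else None
            p = {**p, target: guess}
        filled.append(p)
    return filled

grades = [classify_grade(s) for s in (95, 83, 71, 64, 40)]
completed = predict_missing(profiles, "country", "language")
print(grades)
print(completed[-1])
```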

Data Mining Functionalities Contd.


Cluster analysis
E.g., in psychiatry, correctly diagnosing clusters of symptoms, such as paranoia or schizophrenia, is essential for successful therapy.
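
Cluster analysis groups records without predefined class labels. One common technique is k-means, sketched below in plain Python. The slides do not name an algorithm, so k-means, the two-dimensional points, and the hand-picked starting centroids are all assumptions.

```python
from math import dist

# Hypothetical 2-D points, e.g. two severity scores per patient.
points = [(1.0, 1.2), (0.8, 1.0), (1.3, 0.9), (5.0, 5.2), (5.4, 4.8), (4.9, 5.1)]

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Starting centroids are picked by hand to keep the run deterministic.
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

No labels are supplied: the two groups emerge only from the distances between points.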

Outlier analysis
E.g., a technique for fraud detection and network intrusion detection.
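
One simple way to flag outliers is the modified z-score based on the median absolute deviation, sketched here for a fraud-style example. The transaction amounts are hypothetical, and the 3.5 cutoff is a widely used rule of thumb rather than anything stated in the slides.

```python
from statistics import median

# Hypothetical card transaction amounts for one account.
amounts = [23.5, 41.0, 18.2, 36.9, 29.4, 950.0, 31.7, 27.3]

def find_outliers(values, threshold=3.5):
    """Flag values whose modified z-score, computed from the median absolute
    deviation (MAD), exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

suspicious = find_outliers(amounts)
print(suspicious)
```

The median is used instead of the mean because a single extreme value would otherwise inflate the baseline it is being compared against.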

Trend and evolution analysis


E.g., a decrease in a company's total sales for a month, compared with the same month of the previous year, is a deviation pattern.
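
The year-over-year comparison in this example can be sketched as follows. The sales figures and the 10% tolerance for reporting a deviation are hypothetical.

```python
# Hypothetical monthly sales totals keyed by (year, month).
sales = {
    (2023, 1): 120_000, (2023, 2): 115_000, (2023, 3): 130_000,
    (2024, 1): 126_000, (2024, 2): 98_000, (2024, 3): 133_000,
}

def year_over_year(sales, year):
    """Relative change of each month against the same month one year earlier."""
    changes = {}
    for (y, month), total in sales.items():
        previous = sales.get((y - 1, month))
        if y == year and previous:
            changes[month] = (total - previous) / previous
    return changes

def deviations(changes, tolerance=0.10):
    """Months whose sales fell by more than the tolerance."""
    return [month for month, change in changes.items() if change < -tolerance]

changes = year_over_year(sales, 2024)
flagged = deviations(changes)
print(flagged)
```

Comparing against the same month of the previous year, rather than the previous month, keeps seasonal effects from being reported as deviations.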

Applications of Data Mining


Some of the important application domains of data mining are:

Retail industry
Telecommunication industry
Biological data analysis
Scientific applications
Sports
Astronomy
Health industry
Finance
Law
Agriculture

Summary
Data mining: discovering interesting patterns from large amounts of data.
A KDD process includes data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.
Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier analysis, trend analysis, etc.


Thank You for Your Time and Support!

