Data Mining

• What is Data Mining?
• What is Data Warehousing?
• Why is Data Mining needed?
• Data Mining Architecture
• Data Mining Algorithms
Data mining
• Data mining, the extraction of hidden
predictive information from large
databases, is a powerful new technology
with great potential to help companies
focus on the most important information in
their data warehouses.
Data Warehousing
• A system for storing and delivering massive quantities of
data.
• The main reason to use a data warehouse is that a data
analyst can run complex queries and analysis (such as data
mining) on the information without slowing down the
operational systems.
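To make this separation concrete, here is a minimal sketch, assuming Python's built-in sqlite3 module and two in-memory databases: one standing in for the operational system and one for the warehouse. The table names, the full-copy load step, and the sample rows are hypothetical; real warehouses typically use scheduled ETL jobs on separate servers.

```python
import sqlite3

# Hypothetical operational database: handles day-to-day transactions.
ops = sqlite3.connect(":memory:")
ops.execute("CREATE TABLE sales (store TEXT, product TEXT, amount REAL)")
ops.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "milk", 2.5), ("north", "bread", 1.0),
     ("south", "milk", 2.5), ("south", "milk", 2.5)],
)
ops.commit()

# Hypothetical warehouse database: receives a periodic copy of the operational data.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_history (store TEXT, product TEXT, amount REAL)")


def load_into_warehouse(source, target):
    """Copy all operational sales rows into the warehouse (a simple full load)."""
    rows = source.execute("SELECT store, product, amount FROM sales").fetchall()
    target.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", rows)
    target.commit()


load_into_warehouse(ops, warehouse)

# The analytical query runs against the warehouse connection only,
# so it puts no query load on the operational database.
revenue_by_store = warehouse.execute(
    "SELECT store, SUM(amount) FROM sales_history GROUP BY store ORDER BY store"
).fetchall()
print(revenue_by_store)  # [('north', 3.5), ('south', 5.0)]
```

The GROUP BY query touches only the warehouse, which is the point of the slide: heavy analysis is kept off the system that serves live transactions.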
Data Mining
• Data mining tools predict future trends and behaviors,
allowing businesses to make proactive, knowledge-driven
decisions.
• Data mining tools can answer business questions that
were traditionally too time-consuming to resolve.
Uses
• Retailing
• Weather Forecasting
• Traffic Congestion
Algorithms
• Data mining algorithms traditionally fall
into one of four broad categories:
• Classification
• Clustering
• Association
• Sequence discovery
• Classification, or supervised induction, is perhaps the
most common of all data mining activities. The objective
of classification is to analyze the historical data stored in
a database and to automatically generate a model that
can predict future behavior.
• This induced model consists of generalizations over the
records of a training data set, which help distinguish
predefined classes.
• The hope is that this model can then be used to predict
the classes of other unclassified records.
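As an illustration of learning a model from labeled historical records and then predicting the class of unclassified ones, here is a minimal sketch using a nearest-centroid classifier. This technique is swapped in for brevity and is not one of the tools these slides name; the attributes (age, yearly visits) and the records are hypothetical.

```python
from math import dist


def train_centroids(records):
    """Build the model: the mean feature vector (centroid) of each predefined class."""
    sums, counts = {}, {}
    for features, label in records:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Assign an unclassified record to the class whose centroid is nearest."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical training data: (age, yearly visits) -> did the customer respond?
training = [
    ((25, 2), "no"), ((30, 1), "no"), ((45, 12), "yes"), ((50, 15), "yes"),
]
model = train_centroids(training)
print(predict(model, (48, 10)))  # yes
print(predict(model, (27, 3)))   # no
```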
• Common tools used for classification are neural networks,
decision trees and if-then-else rules that need not have a
tree structure.
• Neural networks are mathematical structures, loosely
modeled on networks of neurons, that learn from examples
by adjusting internal weights.
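A single artificial neuron (a perceptron) is the smallest example of a structure that learns: it adjusts its weights whenever it misclassifies a training example. The sketch below is a toy under stated assumptions, not a practical neural network; it learns the logical AND function, and the learning rate and epoch count are arbitrary choices.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Learn weights and a bias for one neuron with a step activation."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in samples:
            output = 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0
            error = target - output
            # Perceptron learning rule: nudge the weights toward the correct output.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    """Fire (1) if the weighted sum plus bias is positive, otherwise 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_samples)
print([predict(w, b, x) for x, _ in and_samples])  # [0, 0, 0, 1]
```

A single neuron can only separate classes with a straight line; practical neural networks stack many such units in layers.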
• Decision trees (DTs) classify data into a finite number of
classes based on the values of the variables. A DT is
essentially a hierarchy of if-then statements and is thus
generally significantly faster than a neural net.
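The hierarchy of if-then statements can be grown automatically from data. The sketch below follows the ID3 idea of splitting on the attribute that most reduces entropy (that is, maximizes information gain). It is a minimal version under stated assumptions: categorical attributes only, no pruning, and hypothetical weather-style records.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def build_tree(rows, attributes):
    """Grow a tree of if-then tests from rows of (attribute dict, class)."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class

    def remaining_entropy(attr):
        groups = {}
        for features, label in rows:
            groups.setdefault(features[attr], []).append(label)
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    best = min(attributes, key=remaining_entropy)  # lowest entropy = highest gain
    branches = {}
    for value in {features[best] for features, _ in rows}:
        subset = [(f, l) for f, l in rows if f[best] == value]
        branches[value] = build_tree(subset, [a for a in attributes if a != best])
    return (best, branches)


def classify(tree, features):
    """Follow the if-then tests from the root to a leaf.
    Raises KeyError for an attribute value never seen in training."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[features[attr]]
    return tree


rows = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "no"}, "play"),
    ({"outlook": "overcast", "windy": "yes"}, "play"),
]
tree = build_tree(rows, ["outlook", "windy"])
print(classify(tree, {"outlook": "sunny", "windy": "yes"}))     # stay
print(classify(tree, {"outlook": "overcast", "windy": "yes"}))  # play
```

Here the learned tree reads as: if outlook is overcast then play; if outlook is rainy then stay; if outlook is sunny, then test windy.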
• Rule induction: the extraction of useful if-then rules
from data based on statistical significance. The if-then
statements used here need not be hierarchical.
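Unlike a tree, rule induction can produce a flat list of independent rules. The sketch below builds single-condition rules of the form IF attribute = value THEN class and keeps those that cover enough records with high enough confidence. These two thresholds are a simplified stand-in, assumed here for brevity, for the statistical significance test a real rule-induction system would apply; the records are hypothetical.

```python
from collections import Counter, defaultdict


def induce_rules(rows, min_support=2, min_confidence=0.8):
    """Extract flat (non-hierarchical) rules 'IF attr = value THEN class'."""
    covered = defaultdict(Counter)  # (attr, value) -> class counts of covered rows
    for features, label in rows:
        for attr, value in features.items():
            covered[(attr, value)][label] += 1
    rules = []
    for (attr, value), counts in covered.items():
        label, hits = counts.most_common(1)[0]
        support = sum(counts.values())  # number of records the condition covers
        confidence = hits / support     # fraction of those with the predicted class
        if support >= min_support and confidence >= min_confidence:
            rules.append((attr, value, label, confidence))
    return rules


rows = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "no"}, "play"),
    ({"outlook": "overcast", "windy": "yes"}, "play"),
]
rules = induce_rules(rows)
for attr, value, label, conf in rules:
    print(f"IF {attr} = {value} THEN {label} (confidence {conf:.2f})")
# IF outlook = rainy THEN stay (confidence 1.00)
# IF outlook = overcast THEN play (confidence 1.00)
```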
• Clustering partitions the database into segments in
which each segment member shares similar qualities.
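k-means is one common way to do this kind of partitioning: each record is assigned to its nearest segment center, and each center then moves to the mean of its members. The sketch below is a minimal version under stated assumptions: numeric 2-D points, a fixed number of segments k, a fixed iteration count, and initial centers taken from the first k points (practical implementations usually use random or k-means++ initialization). The points are hypothetical.

```python
from math import dist


def kmeans(points, k, iterations=10):
    """Partition points into k segments of mutually nearby points."""
    centroids = list(points[:k])  # simple deterministic start (an assumption)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids


points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (1, 1.5), (8.5, 9)]
clusters, centroids = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 1.5)], [(8, 8), (9, 8.5), (8.5, 9)]]
```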
• Association establishes relationships between items that
occur together in the same record.
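Associations are commonly measured with support (the fraction of records that contain the items together) and confidence (how often the right-hand item appears when the left-hand item does). The sketch below applies these two measures to item pairs only, a simplified version of the Apriori-style approach; the baskets and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Find rules A -> B for item pairs that often occur in the same record."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, count / n, confidence))
    return sorted(rules)


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]
rules = association_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support {support:.2f}, confidence {confidence:.2f}")
# butter -> bread: support 0.50, confidence 1.00
# milk -> bread: support 0.50, confidence 1.00
```

Note that bread -> milk is not reported: bread appears in three baskets but with milk in only two, so its confidence (0.67) falls below the threshold.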
• Sequence discovery can be viewed as the
identification of associations over time. When
appropriate information is available (for instance, the
identity of a customer in a retail shop), a temporal
analysis can be conducted to identify behavior over time.
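A minimal sketch of this temporal analysis, assuming each customer's purchases are already grouped by customer identity and listed in time order: it counts how many customers bought one item and later another. The customer names, items, and the hypothetical `min_customers` threshold are illustrative only; full sequential-pattern algorithms (for example GSP) also handle longer sequences and time windows.

```python
from collections import Counter


def frequent_sequences(histories, min_customers=2):
    """Count 'X bought before Y' patterns, once per customer, across all customers."""
    counts = Counter()
    for purchases in histories.values():  # each list is in time order
        seen = set()
        for i, earlier in enumerate(purchases):
            for later in purchases[i + 1:]:
                if earlier != later:
                    seen.add((earlier, later))
        counts.update(seen)  # a customer contributes at most 1 per pattern
    return sorted((pair, n) for pair, n in counts.items() if n >= min_customers)


histories = {
    "alice": ["tent", "sleeping bag", "stove"],
    "bob": ["tent", "stove"],
    "carol": ["sleeping bag", "tent"],
}
patterns = frequent_sequences(histories)
for (first, then), n in patterns:
    print(f"{first} -> {then}: {n} customers")
# tent -> stove: 2 customers
```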