
NOVEMBER 30, 2017

DATA MINING, WEB TECHNOLOGY & WEB MINING


FINAL REPORT
COURSE INCHARGE: Dr. Syed Jamal Hussain

MOHSIN ALI
EP-1651023
MCS FINAL (EVE)
DATA MINING
Data mining is the process of discovering knowledge in large and complex data sets. It refers
to extracting or “mining” knowledge from large amounts of data.

DATA MINING PROCESS


Some elements of the data mining process are:

DATA SET
It is a collection of data, usually presented in tabular form. Each column represents a
particular variable.

PRE-PROCESSING
Data mining requires substantial pre-processing of data. This is especially true of
behavioural data. To make the data comparable, all values need to be normalized.
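
As an illustration of normalization, the sketch below rescales two columns onto the range 0 to 1 (min-max normalization). The report does not say which normalization method is meant, so this choice, the function name and the sample columns are all assumptions.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical behavioural columns measured on very different scales.
visits_per_month = [2, 10, 6]
seconds_on_site = [100, 500, 300]

norm_visits = min_max_normalize(visits_per_month)
norm_seconds = min_max_normalize(seconds_on_site)
```

After rescaling, both columns lie on the same scale and can be compared directly.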

GENERAL RESULTS
This activity is an overall assessment of the effort, carried out to find out whether some
important issues might have been overlooked.

DECISION TREES
Decision trees are powerful and popular tools for classification and prediction. They are
attractive because they represent rules.

ASSOCIATION RULES
Association rules describe events that tend to occur together. They are formal statements in
the form of X=>Y, where if X happens, Y is likely to happen (Márquez et al., 2008).
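
How likely Y is given X can be measured with support and confidence, the usual measures for association rules. The sketch below computes both for the rule bread => milk. The basket data and the function names are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Estimate of P(Y | X): support(X and Y) divided by support(X).

    Assumes X occurs in at least one transaction.
    """
    return support(transactions, set(x) | set(y)) / support(transactions, x)


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

Here bread and milk occur together in 2 of 4 baskets, and 2 of the 3 baskets that contain bread also contain milk.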

DATA MINING FUNCTIONS


 Association Rules
o Find interesting association or correlation relationships among data items
 Classification
o Predict classes
o Two steps: build model, apply model
 Clustering
o Find natural groups of data

ASSOCIATION RULE LEARNING


It is the search for relationships among a number of variables. It may include the
analysis of market surveys or of customer purchasing patterns and behaviour. It is also referred
to as market basket analysis.

CLUSTERING
It is the task of finding and extracting groups and structures in the data that are in some
way similar, without making use of known structures in the data.
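
A minimal sketch of one clustering method, k-means on one-dimensional data, is given below. The report does not name a clustering algorithm, so k-means is an assumed example, and the data points and starting centers are invented.

```python
def k_means_1d(points, centers, iterations=10):
    """Plain k-means on one-dimensional data, starting from the given centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its old center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical measurements with two natural groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, groups = k_means_1d(points, centers=[0.0, 5.0])
```

No class labels are supplied. The two groups emerge only from the distances between the points.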

CLASSIFICATION
It is the activity of generalizing known structure so that it can be applied to new data. For
example, an e-mail can be analysed to decide whether it is legitimate or spam.

REGRESSION
It tries to extract a function that models the data with the least error.
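
For a straight-line model, "least error" is commonly taken to mean least squared error. The sketch below fits y = slope * x + intercept by ordinary least squares. The squared-error criterion and the sample data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
```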

APPLICATIONS OF DATA MINING:


 Customer Analytics or Market Basket Analysis
 Cyber Forensics and Investigation
 Data mining in Agriculture
 Law Enforcement Agencies
 Ocean Analysis and Satellite Predictions
 Meteorology
 Surveillance

CATEGORIES OF DATA MINING:


 KDD
 Data Visualization
 Case-Based Reasoning
 Neural Networks
 Fuzzy Query Analysis

WEB MINING
Web mining is the application of data mining techniques to discover patterns from the World
Wide Web (WWW). It covers the extraction, filtering and evaluation of information from
web data and the related web services for the purpose of knowledge discovery.

WEB MINING TAXONOMY


Web mining can be broadly divided into three distinct categories, according to the kinds of
data to be mined.
WEB CONTENT MINING
Web content mining is the extraction of information from web page content. It falls into two
categories. The first, Web page/document content mining, directly mines the content of web
documents. The second, search result mining, improves on the content search of other tools
such as search engines.
Search engines can use data mining techniques to improve performance, efficiency and
scalability.

Web Content Mining Process


Web Content Mining – Classification
 Web page/site classification
o Assign a class label to each web page from a set of predefined topic categories
o Based on a set of examples of pre-classified documents
 Example
o Use Yahoo!'s taxonomy and its associated documents as training and test sets
o Derive a Web document classification model
o Use the model to classify new Web documents by assigning categories from the
same taxonomy
 Methods
o Keyword-based classification, use of hyperlink information, statistical models, …
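
The derive-then-apply procedure in the example above can be sketched with a very simple keyword-based classifier. The scoring rule (counting training-word overlaps), the category names and the training texts are invented. This is not the Yahoo! taxonomy or any particular published model.

```python
from collections import Counter


def build_model(labeled_docs):
    """Step 1: count how often each word appears under each category."""
    model = {}
    for text, category in labeled_docs:
        model.setdefault(category, Counter()).update(text.lower().split())
    return model


def classify(model, text):
    """Step 2: pick the category whose training words overlap the text most."""
    words = text.lower().split()
    scores = {cat: sum(counts[w] for w in words) for cat, counts in model.items()}
    return max(scores, key=scores.get)


# Hypothetical pre-classified documents used as the training set.
training = [
    ("football match score goal", "Sports"),
    ("league team player goal", "Sports"),
    ("stock market shares price", "Finance"),
    ("bank interest price loan", "Finance"),
]

model = build_model(training)
label = classify(model, "team wins match with late goal")
```

A real system would add stop-word removal, word weighting and hyperlink information, as the methods listed above suggest.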

WEB STRUCTURE MINING


Web structure mining, one of the three categories of web mining, is used to identify the
relationships between Web pages that are linked by related information or by direct link
connections. This structure data can be discovered by providing a web structure schema through
database techniques for Web pages. The connection allows a search engine to pull data relating to
a search query directly to the linking Web page from the Web site on which the content rests.

Minimize Problem of Search


Structure mining helps minimize two main problems of the World Wide Web that arise from its
vast amount of information.
 The first of these problems is irrelevant search results.
 The second of these problems is the inability to index the vast amount of
information provided on the Web.

PURPOSE OF STRUCTURE MINING:


The main purpose of structure mining is to extract previously unknown relationships
between Web pages. A business can use structure mining to link the information on its own Web
site, which enables navigation and lets it cluster information into site maps.

Web Structure Mining
 Finding authoritative Web pages
o Retrieving pages that are not only relevant, but also of high quality, or authoritative
on the topic
 The notion of authority can be inferred from hyperlinks
o A hyperlink pointing to another Web page can be considered the author's
endorsement of that page
 Problems
o Not every hyperlink represents an endorsement
o One authority will seldom point to its rival authority
o Authoritative pages are seldom particularly descriptive
 Hub
o Set of Web pages that provides collections of links to authorities
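
The interplay of hubs and authorities can be sketched with an iterative scoring scheme in the spirit of the HITS algorithm. The report does not name that algorithm, so the choice is an assumption, and the page names and link graph are invented.

```python
def hits(links, iterations=20):
    """Score pages as hubs and authorities from a link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize so the scores stay bounded across iterations.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical link graph: two link collections pointing at content pages.
graph = {
    "portal": ["guide", "manual"],
    "directory": ["guide"],
    "guide": [],
    "manual": [],
}
hub, auth = hits(graph)
best_authority = max(auth, key=auth.get)
best_hub = max(hub, key=hub.get)
```

In this graph "guide" becomes the strongest authority because both link collections point to it, and "portal" becomes the strongest hub because it links to both authorities.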

Website Structure Levels

WEB USAGE MINING


Web usage mining is the third category in web mining. This type of web mining allows for
the collection of Web access information for Web pages. This usage data provides the paths leading
to accessed Web pages. This information is often gathered automatically into access logs via the
Web server. CGI scripts offer other useful information such as referrer logs, user subscription
information and survey logs.

PROCESS OF WEB USAGE MINING

During data preparation for Web usage mining, the Web content and the Web site topology are
used as information sources, which ties Web usage mining to Web content mining and Web structure
mining. Web usage mining is divided into three distinct phases:
 Pre-processing
 Pattern Discovery
 Pattern Analysis

USE:

There are typically three main uses for mining:

 The first use is usage processing, which is used to complete pattern discovery. It is also the
most difficult, because only bits of information such as IP addresses, user information, and
site clicks are available.
 The second use is content processing, which consists of converting Web information such as
text, images, scripts and others into useful forms.
 The third use is structure processing, which consists of analysing the structure of each page
contained in a Web site.

MINING THE WORLD-WIDE WEB


 WWW provides rich sources for data mining
o Contents information
o Hyperlink information
o Usage information
 Challenges
o Too huge for effective data warehousing and data mining
o Too complex and heterogeneous
o Growing and changing very rapidly

WEB SEARCH ENGINES


 Index-based
o Search the Web, collect Web pages, index Web pages, and build and store huge
keyword-based indices
o Locate sets of Web pages containing certain keywords
 Deficiencies
o A topic of any breadth may easily contain hundreds of thousands of documents
o Many documents that are highly relevant to a topic may not contain keywords
defining them (synonymy, polysemy)

WEB USAGE MINING


 Mining Web log records
o Discover user access patterns
o Typical Web log entry: URL requested, the IP address from which the request
originated, timestamp, etc.
 OLAP on the Weblog database
o Find the top N users, top N accessed Web pages, most frequently accessed time
periods, etc.
 Data mining on Weblog records
o Find association patterns, sequential patterns, and trends of Web accessing
 Applications
o Target potential customers for electronic commerce
o Identify potential prime advertisement locations
o Enhance the quality and delivery of Internet information services to the end user
o Improve Web server system performance through Web caching, Web page prefetching,
and Web page swapping
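
The top-N queries listed above can be sketched on a small, made-up access log. The log format ("<ip> <ISO timestamp> <url>"), the addresses and the page names are assumptions. Real server logs use other layouts and would need a different parser.

```python
from collections import Counter

# Hypothetical log lines: "<ip> <ISO timestamp> <url>", whitespace separated.
log_lines = [
    "10.0.0.1 2017-11-30T09:15:00 /index.html",
    "10.0.0.2 2017-11-30T09:20:00 /products.html",
    "10.0.0.1 2017-11-30T10:05:00 /products.html",
    "10.0.0.3 2017-11-30T10:30:00 /index.html",
    "10.0.0.1 2017-11-30T10:45:00 /cart.html",
    "10.0.0.2 2017-11-30T10:50:00 /products.html",
]


def parse(line):
    """Split one log line into the requesting IP, the hour and the URL."""
    ip, timestamp, url = line.split()
    # Characters 11-12 of an ISO timestamp hold the two-digit hour.
    return {"ip": ip, "hour": timestamp[11:13], "url": url}


records = [parse(line) for line in log_lines]

top_users = Counter(r["ip"] for r in records).most_common(2)
top_pages = Counter(r["url"] for r in records).most_common(2)
busiest_hour = Counter(r["hour"] for r in records).most_common(1)[0][0]
```

The same records could then feed association or sequential pattern mining, for example by grouping the requested URLs by IP address.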

WEB MINING TASKS


MINING WEB SEARCH-ENGINE DATA:
An index-based Web search engine crawls the Web, indexes Web pages, and builds and
stores huge keyword-based indices that help locate sets of Web pages that contain specific
keywords. By using a set of tightly constrained keywords and phrases, an experienced user can
quickly locate relevant documents.
ANALYZING THE WEB’S LINK STRUCTURES
Given a keyword or topic, such as investment, we assume a user would like to find pages
that are not only highly relevant, but authoritative and of high quality. Automatically identifying
authoritative Web pages for a certain topic will enhance a Web search’s quality.

CLASSIFYING WEB DOCUMENTS AUTOMATICALLY


Although Yahoo and similar Web directory service systems use human readers to classify
Web documents, reduced cost and increased speed make automatic classification highly desirable.
Typical classification methods use positive and negative examples as training sets, then assign each
document a class label from a set of predefined topic categories based on preclassified documents.

APPLICATIONS AND FUTURE OF DATA AND WEB MINING

GENERATE USER PROFILES


User profiles improve web customization and provide users with web pages and web
advertisements of interest.

TARGETED ADVERTISING
Ads are a major source of revenue for web portals, web sites and e-commerce sites.
Internet advertising is probably the "hottest" web mining application today.

FRAUD
Maintain a signature for each user based on buying patterns on the web. If buying pattern
changes significantly, then signal fraud.
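
One simple way to realise such a signature is to keep the mean and standard deviation of a user's past purchase amounts and flag purchases that lie far outside them. The three-deviation threshold, the function names and the purchase history below are all assumptions. Real fraud systems use much richer signatures.

```python
from statistics import mean, stdev


def build_signature(purchase_amounts):
    """Summarize a user's past purchases as (mean, standard deviation)."""
    return mean(purchase_amounts), stdev(purchase_amounts)


def is_suspicious(signature, amount, threshold=3.0):
    """Flag a purchase more than `threshold` deviations away from the mean."""
    avg, spread = signature
    if spread == 0:
        # All past purchases were identical; any different amount stands out.
        return amount != avg
    return abs(amount - avg) / spread > threshold


# Hypothetical purchase history for one user.
history = [20.0, 25.0, 22.0, 18.0, 24.0, 21.0]
signature = build_signature(history)

normal = is_suspicious(signature, 23.0)
unusual = is_suspicious(signature, 400.0)
```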

PERFORMANCE MANAGEMENT
Annual bandwidth demand is increasing tenfold on average, while annual bandwidth supply is
rising only by a factor of three.

FAULT MANAGEMENT
Analyze alarm and traffic data to carry out root cause analysis of faults.

INFORMATION RETRIEVAL (SEARCH) ON THE WEB


Web Mining tools analyze web logs for useful customer-related information that can help
personalize web sites according to user behavior.
CONCLUSION:
Data mining is important for finding patterns, forecasting and discovering knowledge
in different business domains. Data mining techniques and algorithms such as classification
and clustering help find patterns that indicate future trends, which helps businesses
grow. Data mining has a wide application domain in almost every industry where data is
generated, which is why it is considered one of the most important frontiers in
database and information systems.
The World Wide Web today is the major source of web data for all domains. Data and
Web mining are challenging activities whose main aim is to discover new, relevant and reliable
information and knowledge by investigating the structure of web data, its content and its usage.
Web data mining can help us understand this data better and provide a basis for
decision making.
References:
 Jin Xu, Yingping Huang, Gregory Madey, "A Research Support System Framework for
Web Data Mining"
 Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Chapter 10
 M. Rajendra Prasad, B. Manjula, Ayesha Banu, "Comparison of Data Mining and Web
Mining", research journal
 http://www.web-datamining.net/usage/
 http://www.web-datamining.net/structure/
 "Data Mining in Web Applications", research journal, Aguascalientes University
(www.intechopen.com)