
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 3, Issue 2, March-April 2014, ISSN 2278-6856


Abstract: Analysis of web usage plays an important role in the growth of businesses and of the World Wide Web. As e-commerce has expanded, competition between companies has increased, so companies need to analyse their transaction logs using web usage mining. Web mining techniques make it possible to extract information about the customers involved in daily transactions, which can in turn help to increase the number of customers. Because millions of users interact with different websites every day, this analysis helps a company understand customer behavior and improve customer relationships. We have designed an algorithm, based on an artificial intelligence technique, that uses a production rule mechanism. Association rules are simply if-then rules that express the relationship between dissimilar data items. These rules always return the same results, no matter how many times the user accesses the same data. Production rules work on the same principle but have one additional property: they update the indexes automatically, which addresses this shortcoming of the currently applied technique based on association rules. The indexes are updated automatically when the user enters a new search keyword, which leads to better mining results. The advantage of this approach is that production rules are more adaptive than association rules and are therefore expected to give better results.

Keywords: web usage mining, web content mining, association
rules, indexes, Kays algorithm.

1. INTRODUCTION
A database index is a structure used in data mining to improve the speed of data retrieval whenever a search is made against the database, for example through a search box. With an index, data are located quickly because the overhead of accessing the database tables tuple by tuple is avoided. Most indexing technologies provide sublinear-time lookup, which improves performance because a linear search is inefficient in that case. Sublinear time means that the running time grows more slowly than the input size, i.e. T(n) = o(n). Indexes are also used to enforce database constraints such as Unique, Foreign Key, and Primary Key.

The data mining techniques used for analyzing patterns include association, clustering, classification, prediction, and time series analysis. These techniques describe the relationship of one item to another. In association, items are related to each other through association rules. The technique is mainly used in the analysis of buying and selling. For example, in a market, a customer who buys bread is likely to buy milk as well, because bread and milk are connected through an association rule. Association rules work on the concept of if-then rules. With these rules, every new search returns the same results, and with association rules or other traditional algorithms the index has to be updated manually. Production rules solve this problem with their auto-updating feature.

Clustering is the identification of classes, such as groups or clusters, for a set of objects. Objects that have the same characteristics are put in one cluster, and objects with different behavior are put in another cluster. Data mining can then be done on the basis of these clusters. Take the example of a bank that clusters its customers according to age, hometown, salary, etc. The bank can then recognize its customers more easily and provide different services according to the clusters already defined.

Prediction estimates the likely values of data that are missing. It finds the set of attributes that are relevant to the attribute of interest. It is mainly useful in credit approval, target marketing, medical diagnosis and fraud detection. For example, in a bank, an employee's salary can be predicted from the salary details of other employees with a similar designation and the same joining year.

Time series analysis works on time-ordered data by determining recurring characteristics, such as the same data being searched again and again. For example, a manager can predict stock rates by using the stock history of the last 5-10 years.
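The difference between a tuple-by-tuple scan and an indexed lookup can be illustrated with a minimal sketch. It assumes the index is simply a sorted list of keywords searched with binary search; the keyword list and the function names `linear_lookup` and `indexed_lookup` are hypothetical and are not part of the proposed system.

```python
from bisect import bisect_left

# Hypothetical sorted keyword list standing in for a database index.
keywords = sorted(["arrays", "data types", "decision making", "loops",
                   "priority queues", "queues", "stacks"])


def linear_lookup(items, key):
    """Scan item by item: O(n) comparisons in the worst case."""
    for position, item in enumerate(items):
        if item == key:
            return position
    return -1


def indexed_lookup(sorted_items, key):
    """Binary search over a sorted index: O(log n) comparisons."""
    position = bisect_left(sorted_items, key)
    if position < len(sorted_items) and sorted_items[position] == key:
        return position
    return -1


# Both strategies find the same entry; only the number of comparisons differs.
print(linear_lookup(keywords, "stacks"), indexed_lookup(keywords, "stacks"))
```

Real database indexes typically use B-trees or hash tables rather than a sorted list, but the sublinear behavior is the same idea.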

2. RELATED WORK
Automatic recommendation for e-learning personalization, based on web usage mining techniques and information retrieval, has been applied to provide online automatic recommendations for learners without requiring their feedback. The proposed recommender for e-learning platforms is composed of two modules: an off-line module and an online module. The online module helps to recognize the student's goals and prepares a recommendation list. In that paper, clustering is used to predict the behavior and needs of students, so that a particular student receives recommendations according to his/her interests. These recommendations depend on the user's needs, and association rule mining will

An Approach for Information Retrieval: Kays Algorithm

Karanbir Singh¹, Richa Sapra²

¹ Lovely Professional University, Department of Computer Science and Engineering, Jalandhar-Phagwara Road, Punjab, India
² School of Computer Science and Engineering, Lovely Professional University, Jalandhar-Phagwara Road, Punjab, India

be performed to analyze the behavior and goals of the student [1]. Web mining is developing rapidly, and so are its challenges. Because of the large amount of data available on the web, problems arise when extracting knowledge with different data mining techniques. The aim of process mining in an online organization is to maximize the expected sets from each visit, since large institutions and organizations obtain log file data from their websites. For this purpose, models are constructed for extracting Web content, Web structure, Web communities, authorities, etc. Data handling should be scalable so that data are easily available and provided to the user in an efficient and timely manner. Data mining is also a useful tool for tackling fraud, threat analysis, and terrorism. Many institutions also face the challenge of mining the right records where an online system is in place, so new techniques are continually being developed to tackle these problems as data mining trends change [2]. Automatic recommendation of web pages for online users is growing day by day, and the web generates a huge amount of information from which users want to extract what is useful. Every user wants information according to his/her requirements, and on that basis searches for data and filters it by data type. The data provided to the user may or may not be useful, so users apply different techniques to get suitable information. Users' interests can be inferred from their actions, browsed documents, past query history, etc. The main problem for the webmaster of a website is how to match user requirements with the available information and keep users' attention on the website. In that paper, a web recommendation approach is used that recommends to the user a list of pages based on the user's historic pattern, together with a list of web pages that have not been accessed yet. This approach improves the accuracy of the pages made accessible to users [3].

3. MATERIAL AND METHODOLOGIES

Figure 1. System architecture

3.1 Working of Proposed Architecture
The process is carried out as follows:
1. The user enters a query and the request is sent to the server.
2. The server looks up the keyword entered by the user in the database.
3. The data present in the database index are matched against the searched keyword, and results are returned accordingly.
4. The data are mined in the form of associations with the help of content, index and associability.
5. With production rules, the data are mined and the index is updated automatically every time a new search is made with a keyword that appears in that particular search.
6. The database index is updated according to the selected data, and associability is increased.
7. The steps are repeated for each mining request.

3.2 Working of Association Rules and Kays Algorithm
3.2.1 Association rules
Association rules work on the principle of if-then rules. Each rule consists of two parts: an antecedent (if) and a consequent (then). The antecedent is a data item present in the database, and the consequent is an item found in relation with the antecedent. These rules are used for analysing sequential patterns that have the if-then form. Association rules are helpful for online transactions in which the user requests access to data. Data present on the web are stored in the database in the form of an index. For example, suppose the following association rule is saved in a database:
{Course} → {Attendance}
If the user requests data related to {Course}, he/she gets the data of {Course} along with the data of {Attendance}, because Attendance and Course are associated with each other. This rule is saved somewhere in the database index, so every active user who accesses a Course-related file gets the attendance data along with it, and gets the same search results no matter how many times the search is repeated. The administrator therefore needs to update the index manually to obtain better search results.
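The static behavior described above can be illustrated with a minimal sketch. The rule table `ASSOCIATION_RULES` and the function `search_with_rules` are hypothetical names chosen for this example; they only assume that a rule maps an antecedent to a fixed list of consequents.

```python
# Hypothetical static rule table: antecedent -> list of consequents.
ASSOCIATION_RULES = {"Course": ["Attendance"]}


def search_with_rules(keyword, rules):
    """Return the keyword followed by every consequent associated with it."""
    return [keyword] + rules.get(keyword, [])


# Searching twice gives identical results, because nothing in the lookup
# ever changes the rule table; only a manual edit by the administrator would.
first = search_with_rules("Course", ASSOCIATION_RULES)
second = search_with_rules("Course", ASSOCIATION_RULES)
print(first, second)
```

Because the lookup never writes back to the rule table, any improvement in the results has to come from an administrator editing `ASSOCIATION_RULES` by hand.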

3.2.2 Kays Algorithm
Kays algorithm is implemented to overcome the disadvantage of association rules, namely that indexes have to be maintained manually. The algorithm works on the principle of production rules. These rules take the form of strings that consist of terminals and non-terminals. Non-terminal symbols are those that can be replaced by other strings. Production rules are of the form:
{Stacks} → {stacks}{Queues}
{Queues} → {queues}{priority queues}

Words starting with a capital letter represent non-terminal symbols, which are replaceable with data content; all the rest are terminals, which do not need to be replaced. When a user fires a search query for "Stacks", the server responds and shows results according to the searched keyword. If the results contain a link to Queues, the user can also access Queues as required. When the user clicks this Queues link from Stacks, Stacks is automatically added to the index of Queues. The next time the user searches with the keyword "queues", he gets all the data related to queues as well as stacks, because the indexes have been updated automatically. This algorithm therefore provides better mining results than the previous technique.
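The production-rule view can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: rules are held in a dictionary, capitalized keys are treated as non-terminals, and the hypothetical function `record_click` models the automatic update by appending the source topic to the rule of the topic that was opened.

```python
# Rule set taken from the text: capitalized symbols are non-terminals,
# lowercase symbols are terminals (content keywords).
rules = {
    "Stacks": ["stacks", "Queues"],
    "Queues": ["queues", "priority queues"],
}


def expand(symbol, rules, seen=None):
    """Replace non-terminals recursively and collect the terminals reached."""
    seen = set() if seen is None else seen
    if symbol not in rules:      # terminal: nothing left to replace
        return [symbol]
    if symbol in seen:           # guard against cyclic rules
        return []
    seen.add(symbol)
    terminals = []
    for part in rules[symbol]:
        for terminal in expand(part, rules, seen):
            if terminal not in terminals:
                terminals.append(terminal)
    return terminals


def record_click(source, target, rules):
    """Auto-update (assumed form): opening target from source links source to target."""
    body = rules.setdefault(target, [target.lower()])
    if source not in body:
        body.append(source)


before = expand("Queues", rules)         # only queue-related terminals
record_click("Stacks", "Queues", rules)  # user opens Queues from a Stacks search
after = expand("Queues", rules)          # now also reaches "stacks"
print(before, after)
```

The `seen` set matters once the update has run, because Stacks and Queues then refer to each other and a naive expansion would not terminate.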

4. RESULT AND DISCUSSIONS
4.1 Working of Association Rules

Table 1: Contents on web
| Chapter No. | Chapter Name | Data | Index |
|---|---|---|---|
| 1 | Data types | In the C programming language, data types refer to an extensive system used for declaring variables. Arithmetic types consists of the two types: (a) integer types and (b) floating-point types. | NULL |
| 2 | Loops | A loop statement allows us to execute a statement or group of statements multiple times. | NULL |
| 3 | Decision making | Decision making structures require that the programmer specify one or more conditions to be evaluated or tested by the program whether it's true or false. | NULL |
| 4 | Arrays | Array is a collection of same data types. | NULL |
| 5 | Stacks | It works as Last in First out (LIFO). | NULL |
| 6 | Queues | It works as First in First out (FIFO). It is opposite to stacks. | NULL |
| 7 | Priority Queues | Priority queues include number of queues that have assigned a priority and are executed accordingly. | NULL |

Now, if the user makes a search with the keyword "Stacks", he gets the data related to stacks, that is, every entry in the database where the keyword stacks occurs, as shown in Table 2.



4.1.1 Results and Output
Table 2: Contents after applying associations
| Chapter No. | Chapter Name | Data | Index |
|---|---|---|---|
| 5 | Stacks | It works as Last in First out (LIFO). | NULL |
| 6 | Queues | It works as First in First out (FIFO). It is opposite to stacks. | NULL |

4.2 Working of Kays Algorithm
Steps for Kays Algorithm
Begin
    // Initialization step
    Mine the 1 to n data items as the associations in content and indexes.
    // Update step
    Update the database index according to the selected data and increase associability.
    // Final step
    Repeat the steps as per the mining request.
End

Table 3: Contents before applying Kays algorithm
| Chapter No. | Chapter Name | Data | Index |
|---|---|---|---|
| 1 | Data types | In the C programming language, data types refer to an extensive system used for declaring variables. Arithmetic types consists of the two types: (a) integer types and (b) floating-point types. | NULL |
| 2 | Loops | A loop statement allows us to execute a statement or group of statements multiple times. | NULL |
| 3 | Decision making | Decision making structures require that the programmer specify one or more conditions to be evaluated or tested by the program whether it's true or false. | NULL |
| 4 | Arrays | Array is a collection of same data types. | NULL |
| 5 | Stacks | It works as Last in First out (LIFO). | NULL |
| 6 | Queues | It works as First in First out (FIFO). It is opposite to stacks. | NULL |
| 7 | Priority Queues | Priority queues include number of queues that have assigned a priority and are executed accordingly. | NULL |

Now, if the user makes a search with the keyword "Stacks", he gets the data related to stacks, that is, every entry in the database where the keyword stacks occurs, such as Queues. If the user then accesses the Queues data item, the index is updated automatically and Stacks is added to the index of Queues, as shown in Table 4. The next time the user searches for the keyword "Queues", he gets the Queues data as well as the Stacks data.

4.2.1 Step 1
In this step, the indexes are updated automatically in the database: Stacks is added to the index of Queues.

Table 4: Addition of indexes
| Chapter No. | Chapter Name | Data | Index |
|---|---|---|---|
| 1 | Data types | In the C programming language, data types refer to an extensive system used for declaring variables. Arithmetic types consists of the two types: (a) integer types and (b) floating-point types. | NULL |
| 2 | Loops | A loop statement allows us to execute a statement or group of statements multiple times. | NULL |
| 3 | Decision making | Decision making structures require that the programmer specify one or more conditions to be evaluated or tested by the program whether it's true or false. | NULL |
| 4 | Arrays | Array is a collection of same data types. | NULL |
| 5 | Stacks | It works as Last in First out (LIFO). | NULL |
| 6 | Queues | It works as First in First out (FIFO). It is opposite to stacks. | Stacks |
| 7 | Priority Queues | Priority queues include number of queues that have assigned a priority and are executed accordingly. | NULL |




4.2.2 Step 2
In the next step, a search is made with the keyword "Queues", and the results are shown in Table 5.
Table 5: Mined contents after applying Kays algorithm
| Chapter No. | Chapter Name | Data | Index |
|---|---|---|---|
| 6 | Queues | It works as First in First out (FIFO). It is opposite to stacks. | Stacks |
| 7 | Priority Queues | Priority queues include number of queues that have assigned a priority and are executed accordingly. | NULL |
| 5 | Stacks | It works as Last in First out (LIFO). | NULL |
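The walkthrough in Tables 1 to 5 can be reproduced with a short sketch. It is an illustration under stated assumptions rather than the authors' code: each row is a dictionary with `name`, `data` and `index` fields, matching is a case-insensitive substring test, and the hypothetical functions `search` and `access` stand for the mining and update steps. Only chapters 4 to 7 are included to keep the example short.

```python
# Rows mirror the tables above: chapter number -> name, data and index.
chapters = {
    4: {"name": "Arrays", "data": "Array is a collection of same data types.", "index": []},
    5: {"name": "Stacks", "data": "It works as Last in First out (LIFO).", "index": []},
    6: {"name": "Queues",
        "data": "It works as First in First out (FIFO). It is opposite to stacks.",
        "index": []},
    7: {"name": "Priority Queues",
        "data": "Priority queues include number of queues that have assigned "
                "a priority and are executed accordingly.",
        "index": []},
}


def search(keyword, chapters):
    """Return chapter numbers: direct matches first, then chapters named in their index."""
    needle = keyword.lower()
    direct = [no for no, row in chapters.items()
              if needle in row["name"].lower() or needle in row["data"].lower()]
    linked = []
    for no in direct:
        for name in chapters[no]["index"]:
            for other, row in chapters.items():
                if row["name"] == name and other not in direct and other not in linked:
                    linked.append(other)
    return direct + linked


def access(searched_keyword, chapter_no, chapters):
    """Update step (assumed form): store the searched keyword in the opened chapter's index."""
    row = chapters[chapter_no]
    if searched_keyword != row["name"] and searched_keyword not in row["index"]:
        row["index"].append(searched_keyword)


stacks_hits = search("Stacks", chapters)   # chapters 5 and 6, as in Table 2
access("Stacks", 6, chapters)              # user opens Queues; index becomes ["Stacks"]
queues_hits = search("Queues", chapters)   # chapters 6, 7 and 5, as in Table 5
print(stacks_hits, queues_hits)
```

Storing the keyword by chapter name is a simplification; a real index would more likely store row identifiers.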

4.2.3 Results and Output

Figure 2. Final results

5. CONCLUSION
In this paper, we have outlined the general principles of a new approach to mining web content in e-learning platforms by updating the database indexes, which should be useful when searching web content through search engines. The technique is expected to provide better mining results for educational resources. This automated indexing technique is used to build content models as well as learners' profiles. The enhancement made in this technique should increase the quality of learner objects. The new technique that we have developed can also be applied to software mining, application mining, or any other resources where filtering and mining are needed.

REFERENCES
[1] Mohamed Koutheair Khribi, Mohamed Jemni and Olfa Nasraoui, "Automatic Recommendations for E-Learning Personalization Based on Web Usage Mining Techniques and Information Retrieval", Educational Technology & Society, pp. 30-42, 2009.
[2] Md. Zahid Hasan, Jakaria Ahmad Chisty Khawja and Nur-E-Zaman Ayshik, "Research Challenges in Web Data Mining", International Journal of Computer Science and Telecommunications, IEEE, pp. 80-83, 2012.
[3] Ravi Bhushan, Dr. Nath Rajender, "Automatic Recommendation of Web Pages for Online Users Using Web Usage Mining", International Conference on Computing Sciences, IEEE, pp. 371-374, 2012.
[4] A. Gosain and M. Bhugra, "A comprehensive survey of association rules on quantitative data in data mining", International Conference on Information and Communication Technology, IEEE, pp. 1003-1008, 2013.
[5] Yihua Zhong and Yuxin Liao, "Research of mining effective and weighted association rules based on dual confidence", 4th International Conference on Computational and Information Sciences, IEEE, pp. 1228-1231, 2012.
[6] Bakar and Kadir, "Mining the positive and negative association rules from interesting frequent and infrequent data items", 9th International Conference on Fuzzy Systems and Knowledge Discovery, IEEE, pp. 650-655, 2012.
[7] Chang Liu and M.K. Agaram, "Vocabulary Model Requirement for Production Rule System", 16th International Conference on Enterprise Distributed Object Computing Conference Workshops, IEEE, pp. 132-139, 2012.
[8] L. Li and H. Yu, "Offline Learning Based Adaptive Dispatching Rule for Semiconductor Wafer Fabrication Facility", International Conference on Automation Science and Engineering, IEEE, pp. 1028-1033, 2013.


AUTHOR
Karanbir Singh received the B.Tech. and M.Tech. degrees in Computer Science and Engineering from Lovely Professional University in 2012 and 2014, respectively. During 2012-2014, he stayed at the Research Laboratory of India to study advanced data and web mining techniques along with their applications.
