International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 3, Issue 2, March – April 2014 ISSN 2278-6856
An Approach for Information Retrieval: Kays Algorithm
Karanbir Singh 1, Richa Sapra 2
1 Lovely Professional University, Department of Computer Science and Engineering, Jalandhar-Phagwara Road, Punjab, India
2 School of Computer Science and Engineering, Lovely Professional University, Jalandhar-Phagwara Road, Punjab, India

Abstract: Analysis of web usage plays an important role in the growth of business and of the World Wide Web. With the growth of e-commerce, competition between companies has increased, so companies need to analyse their transaction logs using web usage mining. Web mining techniques make it possible to extract information about the customers involved in daily transactions, which can also help to increase the number of customers. Because millions of users interact with websites every day, this analysis helps a company understand customer behaviour and improve customer relationships. We have designed an algorithm, based on an artificial intelligence technique, that generates a production rules mechanism. Association rules are simply if-then rules that express the relationship between dissimilar data items. These rules return the same results every time a user accesses the same data. Production rules work on the same principle but have the additional property of automatically updating the indexes, which removes this shortcoming of the currently applied association rule technique. The indexes are updated automatically when the user enters a new search keyword, which gives better mining results. The advantage of this approach is that production rules are more adaptive than association rules and therefore give better results.
Keywords: web usage mining, web content mining, association rules, indexes, Kays algorithm.

1. INTRODUCTION
A database index is a data mining concept that improves the speed of data retrieval for any search made against the database, for example through a search box. With an index, data is located quickly, without the overhead of scanning the database tables tuple by tuple. An index supports fast lookups. Most indexing technologies offer sub-linear lookup time, which improves performance because a linear search is inefficient in this setting. Sub-linear time means that T(n) grows more slowly than n, i.e. T(n) = o(n); a tree-based index, for example, typically gives O(log n) lookups. Indexes are also used to police database constraints such as Unique, Foreign Key and Primary Key.

The data mining techniques used for analysing patterns include association, clustering, classification, prediction and time series analysis. These techniques describe the relationship of one item to another.

Items are related to each other through association rules. The technique is mainly used in buying and selling. For example, in a market, a customer who buys bread is likely to buy milk as well, because bread and milk are connected through an association rule. Association rules work on the concept of if-then rules. With these rules, every new search returns the same results as before. With association rules or other traditional algorithms, the index has to be updated manually. Production rules overcome this problem with their auto-updating feature.

Clustering is the identification of classes, such as groups or clusters, for a set of objects. Objects with the same characteristics are put in one cluster, and objects with different behaviour are put in another. Data mining can then be carried out on the basis of these clusters. Take the example of a bank that clusters its customers according to age, hometown, salary and so on. The bank can then recognize its customers more easily and provide different services according to the clusters already defined.

Prediction estimates feasible values for data that is missing. It finds the set of attributes that must be linked to the attribute of interest. It is mainly useful in credit approval, target marketing, medical diagnosis and fraud detection. For example, in a bank, an employee's salary can be predicted from the salary details of other employees with a similar designation and the same joining year.

Time series analysis works on time-ordered data by determining similar characteristics, such as the same data being searched again and again. For example, a manager can predict stock rates by using the stock history generated over the last 5-10 years.

2. RELATED WORK
Automatic recommendation for e-learning personalization, based on web usage mining techniques and information retrieval, provides learners with online recommendations without requiring their feedback. The proposed recommender for e-learning platforms is composed of two modules: an off-line module and an online module. The online module helps to recognize the student's goals and prepares a recommendation list. In that work, clustering is used to predict the behaviour and needs of students, so that each student receives recommendations according to his/her interests. These recommendations depend on the user's needs, and association rule mining is performed to analyse the behaviour and goals of the student [1].

Web mining is developing rapidly, along with its challenges. Because of the large amount of data available on the web, problems occur when extracting knowledge with different data mining techniques. The aim of process mining in an online organization is to maximize what is extracted from each visit, because large institutions and organizations obtain their log-file data from the website. For this purpose, models are constructed for extracting Web content, Web structure, Web communities, authorities, etc. Data should be scalable so that it is easily available and can be provided to the user in an efficient and timely manner. Data mining is a useful tool for tackling the challenges of fraud and threat analysis and terrorism. Many institutional organizations also face the challenge of mining proper records where an online system is implemented. New techniques are therefore continually being developed to tackle these problems in line with changing data mining trends [2].

Automatic recommendation of web pages for online users is growing day by day, and web use generates a huge amount of information from which users want to extract what is useful. Every user wants information according to his/her requirements, searches for data on that basis and filters it according to data type. The data provided to the user may or may not be useful, so different techniques can be used to obtain suitable information. Users' interests can be inferred from their actions, browsed documents, past query history, etc. The main problem for the webmaster of a website is how to match user requirements with the available information and keep users' attention on the website. In that work, a web recommendation approach is used which recommends to the user a list of pages based on the user's historic pattern and a list of web pages which have not been accessed yet. This approach improves the accuracy of the pages made accessible to users [3].
3. MATERIAL AND METHODOLOGIES
Figure 1. System architecture
3.1 Working of Proposed Architecture
The process is carried out as follows:
1. The user enters the query and the request is sent to the server.
2. The server looks up the keyword entered by the user in the database.
3. The data in the database index is matched against the searched keyword and results are returned accordingly.
4. The data is mined in the form of associations with the help of content, index and associability.
5. With production rules, the data is mined and the index is updated automatically every time a new search is made with a keyword that exists in that particular search.
6. The database index is updated according to the selected data, and associability is increased.
7. The steps are repeated for each mining request.
3.2 Working of Association Rules and Kays Algorithm
3.2.1 Association rules
Association rules work on the principle of if-then rules. Each rule consists of two parts: an antecedent (if) and a consequent (then). The antecedent is a data item present in the database, and the consequent is an item found in relation to the antecedent. These rules are used for analysing sequential patterns of the if-then form. Association rules are helpful for online transactions in which the user requests access to data. Data present on the web is stored in the database in the form of an index. For example, suppose the following association rule is saved in a database:
{Course} → {Attendance}
If the user requests data related to {Course}, he/she gets the data of {Course} along with the data of {Attendance}, because Attendance and Course are associated with each other. The rule itself is saved somewhere in the database index. Every active user who accesses a Course-related file therefore gets the attendance data along with it, and gets the same search results however many times the search is repeated. The administrator has to update the index manually to obtain better search results.
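The static behaviour described above can be illustrated with a minimal Python sketch. The rule table, the content strings and the `search` function are hypothetical names chosen for this illustration and are not part of the original system; the point is only that a fixed rule table returns identical results on every search until someone edits it by hand.

```python
# Hypothetical, hand-maintained rule table: antecedent -> consequents.
RULES = {"Course": ["Attendance"]}

# Hypothetical stored content, keyed by item name.
CONTENT = {
    "Course": "Course details",
    "Attendance": "Attendance records",
}


def search(keyword):
    """Return the item for `keyword` plus every item its rule points to."""
    if keyword not in CONTENT:
        return []
    names = [keyword] + RULES.get(keyword, [])
    return [(name, CONTENT[name]) for name in names]


first = search("Course")
second = search("Course")
print(first)            # Course data together with Attendance data
print(first == second)  # the rule table never changes between searches
```

Nothing in `search` writes back to `RULES`, which is the manual-update limitation the text attributes to association rules.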
3.2.2 Kays Algorithm
Kays algorithm is implemented to overcome the disadvantage of association rules, namely that indexes have to be maintained manually. The algorithm works on the same principle as production rules. These rules take the form of strings consisting of terminals and non-terminals. Non-terminal symbols are those which can be replaced by other strings. Production rules are of the form:
{Stacks} → {stacks}{Queues}
{Queues} → {queues}{priority queues}
Words starting with a capital letter represent non-terminal symbols, which are replaceable with data content; all the rest are terminals, which do not need to be replaced. When a user fires a search query for Stacks, the server responds with results for the searched keyword. If those results contain a link to Queues, the user can follow it as required. As soon as the user clicks the Queues link from Stacks, Stacks is automatically added to the index of Queues. The next time the user searches with the keyword Queues, he gets all the data related to queues as well as stacks, because the indexes are updated automatically. This algorithm provides better mining results than the previous technique.
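To make the terminal/non-terminal distinction concrete, the following Python sketch expands the two production rules given above until only terminals remain. The function name `expand` and the dictionary layout are assumptions made for this illustration, and the sketch assumes the rule set is acyclic, as it is here.

```python
# Production rules from the text: a capitalised name is a non-terminal,
# a lower-case name is a terminal.
PRODUCTIONS = {
    "Stacks": ["stacks", "Queues"],
    "Queues": ["queues", "priority queues"],
}


def expand(symbol, productions):
    """Rewrite non-terminals recursively until only terminals remain.

    Assumes the rules contain no cycles; a cyclic rule set would
    recurse without end.
    """
    if symbol not in productions:
        return [symbol]  # terminal: nothing to replace
    result = []
    for part in productions[symbol]:
        result.extend(expand(part, productions))
    return result


print(expand("Stacks", PRODUCTIONS))
# ['stacks', 'queues', 'priority queues']
```

Starting from Stacks, the expansion reaches the queues content through the non-terminal Queues, which mirrors the linkage from Stacks to Queues described in the text.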
Steps for Kays Algorithm
Begin
// Initialization step
Mine the 1 to n data items as the associations in content and indexes.
// Update step
Update the database index according to the selected data and increase associability.
// Final step
Repeat the steps as per the mining request.

4. RESULT AND DISCUSSIONS
4.1 Working of Association Rules
Table 1: Contents on web
| Chapter No. | Chapter Name | Data | Index |
|---|---|---|---|
| 1 | Data types | In the C programming language, data types refer to an extensive system used for declaring variables. Arithmetic types consist of two types: (a) integer types and (b) floating-point types. | NULL |
| 2 | Loops | A loop statement allows us to execute a statement or group of statements multiple times. | NULL |
| 3 | Decision making | Decision making structures require that the programmer specify one or more conditions to be evaluated or tested by the program, whether true or false. | NULL |
| 4 | Arrays | An array is a collection of items of the same data type. | NULL |
| 5 | Stacks | It works as Last In First Out (LIFO). | NULL |
| 6 | Queues | It works as First In First Out (FIFO). It is opposite to stacks. | NULL |
| 7 | Priority Queues | Priority queues include a number of queues that have been assigned a priority and are executed accordingly. | NULL |

Now, if the user searches with the keyword Stacks, he gets the data related to stacks, i.e. every entry in the database where the keyword stacks occurs, as shown in Table 2.

4.1.1 Results and Output
Table 2: Contents after applying associations
| Chapter No. | Chapter Name | Data | Index |
|---|---|---|---|
| 5 | Stacks | It works as Last In First Out (LIFO). | NULL |
| 6 | Queues | It works as First In First Out (FIFO). It is opposite to stacks. | NULL |

4.2 Working of Kays Algorithm
Table 3: Contents before applying Kays algorithm
| Chapter No. | Chapter Name | Data | Index |
|---|---|---|---|
| 1 | Data types | In the C programming language, data types refer to an extensive system used for declaring variables. Arithmetic types consist of two types: (a) integer types and (b) floating-point types. | NULL |
| 2 | Loops | A loop statement allows us to execute a statement or group of statements multiple times. | NULL |
| 3 | Decision making | Decision making structures require that the programmer specify one or more conditions to be evaluated or tested by the program, whether true or false. | NULL |
| 4 | Arrays | An array is a collection of items of the same data type. | NULL |
| 5 | Stacks | It works as Last In First Out (LIFO). | NULL |
| 6 | Queues | It works as First In First Out (FIFO). It is opposite to stacks. | NULL |
| 7 | Priority Queues | Priority queues include a number of queues that have been assigned a priority and are executed accordingly. | NULL |
Now, if the user searches with the keyword Stacks, he gets the data related to stacks, including entries elsewhere in the database where the keyword stacks occurs, such as Queues. If the user then accesses the Queues data item, the index is updated automatically and Stacks is added to the index of Queues, as shown in Table 4. The next time the user searches for the keyword Queues, he gets the Queues data as well as the Stacks data.
4.2.1 Step 1
In this step, the indexes are updated automatically in the database. Stacks is added to the index of Queues.

Table 4: Addition of indexes
| Chapter No. | Chapter Name | Data | Index |
|---|---|---|---|
| 1 | Data types | In the C programming language, data types refer to an extensive system used for declaring variables. Arithmetic types consist of two types: (a) integer types and (b) floating-point types. | NULL |
| 2 | Loops | A loop statement allows us to execute a statement or group of statements multiple times. | NULL |
| 3 | Decision making | Decision making structures require that the programmer specify one or more conditions to be evaluated or tested by the program, whether true or false. | NULL |
| 4 | Arrays | An array is a collection of items of the same data type. | NULL |
| 5 | Stacks | It works as Last In First Out (LIFO). | NULL |
| 6 | Queues | It works as First In First Out (FIFO). It is opposite to stacks. | Stacks |
| 7 | Priority Queues | Priority queues include a number of queues that have been assigned a priority and are executed accordingly. | NULL |
4.2.2 Step 2
In the next step, a search is made with the keyword Queues, which gives the results shown in Table 5.
Table 5: Mined contents after applying Kays
| Chapter No. | Chapter Name | Data | Index |
|---|---|---|---|
| 6 | Queues | It works as First In First Out (FIFO). It is opposite to stacks. | Stacks |
| 7 | Priority Queues | Priority queues include a number of queues that have been assigned a priority and are executed accordingly. | NULL |
| 5 | Stacks | It works as Last In First Out (LIFO). | NULL |
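The walk-through from the unmodified contents to Table 5 can be reproduced with a small in-memory Python sketch. Everything here is an assumption made for illustration: the `chapters` dictionary is a shortened copy of the table contents, `search` uses plain case-insensitive substring matching and follows index entries one hop, and `record_click` stands in for the automatic index update. The paper does not specify these details.

```python
# Shortened, hypothetical copy of the chapter table; an empty list plays
# the role of a NULL index.
chapters = {
    "Arrays": {"data": "An array is a collection of items of the same data type.", "index": []},
    "Stacks": {"data": "It works as Last In First Out (LIFO).", "index": []},
    "Queues": {"data": "It works as First In First Out (FIFO). It is opposite to stacks.", "index": []},
    "Priority Queues": {"data": "Priority queues include a number of queues that have "
                                "been assigned a priority and are executed accordingly.", "index": []},
}


def search(keyword, table):
    """Return chapter names matching `keyword`, then chapters their indexes link to."""
    needle = keyword.lower()
    hits = [name for name, row in table.items()
            if needle in name.lower() or needle in row["data"].lower()]
    for name in list(hits):              # iterate over a copy while appending
        for linked in table[name]["index"]:
            if linked not in hits:
                hits.append(linked)
    return hits


def record_click(searched, clicked, table):
    """Update step: add the searched keyword to the index of the clicked chapter."""
    index = table[clicked]["index"]
    if searched != clicked and searched not in index:
        index.append(searched)


before = search("Queues", chapters)          # index still empty
stack_hits = search("Stacks", chapters)      # user searches for Stacks ...
record_click("Stacks", "Queues", chapters)   # ... and opens Queues from the results
after = search("Queues", chapters)           # Stacks now arrives via the index

print(before)      # ['Queues', 'Priority Queues']
print(stack_hits)  # ['Stacks', 'Queues']
print(after)       # ['Queues', 'Priority Queues', 'Stacks']
```

The only state that changes between the two Queues searches is the index entry written by `record_click`, which is the auto-updating property that distinguishes this approach from a fixed association rule table.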
4.2.3 Results and Output
Figure 2. Final results
5. CONCLUSION
In this paper, we have outlined the general principles of a new approach to mining web content in e-learning platforms by updating the database indexes used when searching web content through search engines. The technique is intended to provide better mining results for educational resources. This automated indexing technique is used to build content models as well as learners' profiles. The enhancement made in this technique is expected to improve the quality of the learner object. The technique we have developed can also be applied to software mining, application mining or any other resource where filtering and mining are needed.
REFERENCES
[1] Mohamed Koutheair Khribi, Mohamed Jemni and Olfa Nasraoui, "Automatic Recommendations for E-Learning Personalization Based on Web Usage Mining Techniques and Information Retrieval," Educational Technology & Society, pp. 30-42, 2009.
[2] Md. Zahid Hasan, Jakaria Ahmad Chisty Khawja and Nur-E-Zaman Ayshik, "Research Challenges in Web Data Mining," International Journal of Computer Science and Telecommunications, IEEE, pp. 80-83, 2012.
[3] Ravi Bhushan and Dr. Nath Rajender, "Automatic Recommendation of Web Pages for Online Users Using Web Usage Mining," International Conference on Computing Sciences, IEEE, pp. 371-374, 2012.
[4] A. Gosain and M. Bhugra, "A comprehensive survey of association rules on quantitative data in data mining," International Conference on Information and Communication Technology, IEEE, pp. 1003-1008, 2013.
[5] Yihua Zhong and Yuxin Liao, "Research of mining effective and weighted association rules based on dual confidence," 4th International Conference on Computational and Information Sciences, IEEE, pp. 1228-1231, 2012.
[6] Bakar and Kadir, "Mining the positive and negative association rules from interesting frequent and infrequent data items," 9th International Conference on Fuzzy Systems and Knowledge Discovery, IEEE, pp. 650-655, 2012.
[7] Chang Liu and M.K. Agaram, "Vocabulary Model Requirement for Production Rule System," 16th International Conference on Enterprise Distributed Object Computing Conference Workshops, IEEE, pp. 132-139, 2012.
[8] L. Li and H. Yu, "Offline Learning Based Adaptive Dispatching Rule for Semiconductor Wafer Fabrication Facility," International Conference on Automation Science and Engineering, IEEE, pp. 1028-1033, 2013.
AUTHOR
Karanbir Singh received the B.Tech. and M.Tech. degrees in Computer Science and Engineering from Lovely Professional University in 2012 and 2014, respectively. During 2012-2014, he stayed in Research Laboratory of India to study advanced data and web mining techniques along with their applications.