
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),

ISSN 0976 - 6375(Online), Volume 5, Issue 5, May (2014), pp. 129-139 IAEME
DATA MINING CLASSIFICATION APPROACH FOR WEB-BASED
EDUCATIONAL SYSTEM USING GENETIC ALGORITHM


Ch. Neelima¹, K. Sridhar², Prof. S.S.V.N. Sarma³

¹ Department of Computer Science, Kakatiya University - Warangal, Telangana, India
² Dravidian University Kuppam, Chittoor, A.P. India
³ Dept. of CSE, Vaagdevi College of Engineering - Warangal, Telangana, India




ABSTRACT

The ever-increasing progress of network-distributed computing, and particularly the rapid
development of the web, has had a broad impact on society. Online delivery of educational
instruction gives colleges and universities the opportunity to use online infrastructure for both
institutions and students. The main aim of this paper is to find similar patterns of use in the data
gathered from the Learning Online Network with Computer-Assisted Personalized Approach
(LON-CAPA), and eventually to be able to predict the most beneficial course of studies for each
learner based on their present usage. The system could then make suggestions to the learner as to
how best to proceed. The objective is to predict students' final grades based on their web-use
features, which are extracted from the homework data. Using the student and course data of a
LON-CAPA course as test data, we design, implement, and evaluate a series of pattern classifiers
with various parameters in order to compare their performance, and we use a genetic algorithm (GA)
to optimize a combination of these classifiers.

Keywords: Data Mining; Genetic Algorithm; Clustering; Classification; Prediction.

1. INTRODUCTION

Data mining is a knowledge discovery process that finds previously unknown, potentially useful
and non-trivial patterns in large repositories of data [4]. Data mining techniques such as
classification can be applied to extract knowledge from web data. There are three web
mining categories: web content mining, web structure mining and web usage mining [3]. Web usage
mining is the most relevant technique for e-learning systems. Web usage mining generally refers to
the application of data mining techniques to web logs and metadata. Frequently used methods in
web usage mining are:

Association rules. Associations between web pages visited.

Sequence analysis. Analyzing sequences of page hits in a visit or between visits by the same
user.

Clustering and classification. Grouping users by navigation behavior, grouping pages by
content, type, access, and grouping similar navigation behaviors.

The use of rule mining in education is not new; it has already been employed successfully in
several web-based educational systems. Data mining techniques can discover useful information that
can be used in formative evaluation to help educators establish a pedagogical basis for decisions
when designing or modifying an environment or teaching approach. The application of data mining
in educational systems is an iterative cycle of hypothesis formation, testing, and refinement
(Figure 1). Mined knowledge should enter the loop of the system and guide, facilitate and enhance
learning as a whole. The cycle therefore not only turns data into knowledge but also filters the
mined knowledge for decision making.


Figure 1: The cycle of applying data mining in educational systems

In a web-based system, educators and the academics responsible are in charge of
designing, planning, building and maintaining the educational systems. Students use and interact
with them. Starting from all the available information about courses, students, usage and interaction,
different data mining techniques can be applied in order to discover useful knowledge that helps to
improve the e-learning process. The discovered knowledge can be used not only by providers
(educators) but also by the users themselves (students).



2. PROPOSED SYSTEM

Many leading educational institutions are working to establish an online teaching
and learning presence. Several systems with different capabilities and approaches have been
developed to deliver online education in an academic setting.

In this paper, two kinds of large data sets are considered:

Educational resources such as web pages, demonstrations, simulations, and individualized
problems designed for use on homework assignments, quizzes, and examinations; and

Information about users who create, modify, assess, or use these resources.

In other words, we have two ever-growing pools of data.


Figure 2: Proposed System Architecture

The proposed framework has two modules: the User module and the Administrator module.
The administrator is responsible for capturing all user preferences, including the analysis of
the domain data. All users can communicate and cooperate through the web-based system network
for online learning and teaching.
The administrator maintains all data sets of examinations, quizzes and homework
assignments in the data repository. The User (Student) module is responsible for registering each
student's details in the framework application. After registration, a student can access the
web-based system network and participate in online teaching. Once a student has participated,
the administrator assigns the student's grade for participation in examinations, homework,
quizzes, assignments, etc. All of this data is stored in the data repository.



Figure 3: Process Flow Diagram


3. CLASSIFICATION

Classification is used to find a model that segregates data into predefined classes [3]. Thus
classification is based on the features present in the data. The result is a description of the present
data and a better understanding of each class in the database. Thus classification provides a model
for describing future data. Prediction helps users make a decision. Predictive modeling for
knowledge discovery in databases predicts unknown or future values of some attributes of interest
based on the values of other attributes in a database. Different methodologies have been used for
classification and developing predictive modeling including Bayesian inference, neural net
approaches, decision tree-based methods and genetic algorithms-based approaches.

3.1 Nearest Neighbor method:
The k-nearest neighbor algorithm classifies a given sample without making
any assumptions about the distribution of the training and testing data [4]. Each testing sample must
be compared to all the samples in the training set in order to classify it. To make a
decision using this algorithm, the distances between the testing sample and all the samples in the
training set must first be calculated. Any distance measure may be used for this purpose. The Euclidean
distance metric requires normalization of all features into the same range. The k closest
neighbors of the given sample are then determined, where k is an integer between 1 and
the total number of samples. The testing sample is assigned the label most frequently
represented among the k nearest samples. The value of k chosen for this decision rule has an
effect on its accuracy. The k-nearest neighbor classifier is a nonparametric
classifier that is said to yield efficient performance for optimal values of k.
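
The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: the function names, the two features (homework sets solved and total attempts) and the sample values are hypothetical, and min-max scaling is assumed as the normalization step.

```python
import math
from collections import Counter


def min_max_normalize(rows):
    """Rescale each feature column into [0, 1] (constant columns become 0)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def knn_predict(train_x, train_y, sample, k):
    """Assign `sample` the label most common among its k nearest neighbors."""
    # Euclidean distance from the test sample to every training sample.
    ranked = sorted(
        (math.dist(x, sample), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features: (homework sets solved, total attempts).
raw = [[1, 100], [2, 120], [9, 900], [10, 950], [8, 800]]
scaled = min_max_normalize(raw)  # training rows and test row scaled together
train_x, sample = scaled[:4], scaled[4]
train_y = ["Fail", "Fail", "High", "High"]
print(knn_predict(train_x, train_y, sample, k=3))  # High
```

Without the normalization step the second feature, whose raw range is roughly a hundred times larger, would dominate the Euclidean distance.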



Figure 4: K-Nearest Neighbor

Algorithm Steps:

The set of stored records

Distance Metric to compute distance between records

The value of k, the number of nearest neighbors to retrieve

The k-nearest neighbor classification algorithm




Figure 5: Nearest Neighbor for K=3


3.2 K-Means Method
The k-means algorithm is the simplest and most commonly used clustering algorithm
employing a square error criterion [3]. It is computationally fast, and iteratively partitions a data set
into k disjoint clusters, where the value of k is an algorithmic input. The goal is to obtain the partition
(usually of hyper-spherical shape) with the smallest square-error. Suppose there are k clusters
$\{C_1, C_2, \ldots, C_k\}$ such that $C_k$ has $n_k$ patterns. The mean vector, or center, of cluster $C_k$ is

$$\mu^{(k)} = \frac{1}{n_k} \sum_{i=1}^{n_k} x_i^{(k)}$$

where $n_k$ is the number of patterns in cluster $C_k$ (among exactly k clusters $C_1, C_2, \ldots, C_k$) and
$x_i^{(k)}$ is the point in space representing the i-th object of $C_k$.

The total squared error is

$$E_k^2 = \sum_{k} e_k^2$$

where

$$e_k^2 = \sum_{i=1}^{n_k} \left(x_i^{(k)} - \mu^{(k)}\right)^T \left(x_i^{(k)} - \mu^{(k)}\right)$$


The steps of the iterative algorithm for partitioned clustering are as follows:

1. Choose an initial partition with k < n clusters ($\mu_1, \mu_2, \ldots, \mu_k$ are the cluster centers and n is the
number of patterns).

2. Generate a new partition by assigning each pattern to its nearest cluster center $\mu_i$.

3. Recompute the new cluster centers $\mu_i$.

4. Go to step 2 unless there is no change in $\mu_i$.

5. Return $\mu_1, \mu_2, \ldots, \mu_k$ as the mean values of $C_1, C_2, \ldots, C_k$.
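
The five steps above can be sketched as follows. This is an illustrative reading only: the function names, the `max_iter` safeguard, the choice of initial centers and the four sample points are assumptions made for the example, not details taken from the paper.

```python
import math


def k_means(points, centers, max_iter=100):
    """Iterate assignment and re-centering until the centers stop changing."""
    centers = [list(c) for c in centers]      # step 1: initial cluster centers
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Step 2: assign every pattern to its nearest cluster center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Step 3: recompute each center as the mean of its members.
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)]
            if members else centers[j]         # keep the center of an empty cluster
            for j, members in enumerate(clusters)
        ]
        # Step 4: stop when no center has moved.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters                   # step 5


def total_squared_error(centers, clusters):
    """Sum of squared distances of every pattern to its own cluster center."""
    return sum(math.dist(p, c) ** 2
               for c, members in zip(centers, clusters) for p in members)


points = [(1, 1), (1, 2), (8, 8), (9, 8)]
centers, clusters = k_means(points, centers=[(0, 0), (10, 10)])
print(centers)                                 # [[1.0, 1.5], [8.5, 8.0]]
print(total_squared_error(centers, clusters))  # 1.0
```

The result depends on the initial centers, so in practice the procedure is often restarted from several initial partitions and the one with the smallest squared error is kept.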

3.3 Classifiers
Pattern recognition has a wide variety of applications in many different fields; therefore it is
not possible to come up with a single classifier that can give optimal results in each case. The
optimal classifier in every case is highly dependent on the problem domain. In practice, one might
come across a case where no single classifier can perform at an acceptable level of accuracy. In such
cases it would be better to pool the results of different classifiers to achieve the optimal accuracy.
Every classifier operates well on different aspects of the training or test feature vector. As a result,
assuming appropriate conditions, combining multiple classifiers may improve classification
performance when compared with any single classifier.
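
One simple way to pool the results of several classifiers is majority voting. The paper does not specify the pooling scheme at this point, so the sketch below is an assumption chosen for illustration; the function name and the three hypothetical prediction lists are not from the source.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` holds one list of labels per classifier, all of equal length.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical outputs of three classifiers on the same four students.
classifier_a = ["High", "Fail", "Medium", "High"]
classifier_b = ["High", "Medium", "Medium", "Fail"]
classifier_c = ["Medium", "Fail", "Medium", "High"]
print(majority_vote([classifier_a, classifier_b, classifier_c]))
# ['High', 'Fail', 'Medium', 'High']
```

Each classifier is wrong on a different student in this example, yet the combined vote agrees with the majority on every sample, which is the effect the paragraph above describes.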

4. OPTIMIZATION USING GENETIC ALGORITHM

A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution.
This heuristic is routinely used to generate useful solutions to optimization and search
problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which
generate solutions to optimization problems using techniques inspired by natural evolution, such as
inheritance, mutation, selection, and crossover [11].

In a genetic algorithm, a population of strings (called chromosomes or the genotype of the
genome), which encode candidate solutions (called individuals, creatures, or phenotypes) to an
optimization problem, evolves toward better solutions. Traditionally, solutions are represented in
binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts
from a population of randomly generated individuals and happens in generations. In each generation,
the fitness of every individual in the population is evaluated, multiple individuals are stochastically
selected from the current population (based on their fitness), and modified (recombined and possibly
randomly mutated) to form a new population. The new population is then used in the next iteration of
the algorithm [14]. Commonly, the algorithm terminates when either a maximum number of
generations has been produced, or a satisfactory fitness level has been reached for the population. If
the algorithm has terminated due to a maximum number of generations, a satisfactory solution may
or may not have been reached.
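
The generational loop described above (evaluate fitness, select stochastically, recombine, mutate, repeat) can be sketched as below. The operators chosen here (binary tournament selection, single-point crossover, bit-flip mutation), the parameter defaults and the toy fitness function that counts 1-bits are all assumptions for illustration; the paper does not state which operators or settings it used.

```python
import random


def run_ga(fitness, n_bits, pop_size=30, generations=60,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximize `fitness` over binary strings of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    # Start from a population of randomly generated individuals.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament(pop):
        # Stochastic, fitness-based selection: the better of two random picks.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_population = []
        while len(next_population) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)       # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to every gene.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population + [best], key=fitness)   # remember the best so far
    return best


best = run_ga(fitness=sum, n_bits=20)   # toy fitness: number of 1-bits
print(sum(best), best)
```

The loop here stops after a fixed number of generations, which is the first of the two termination criteria mentioned above; a fitness threshold could be added as a second exit condition.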

Genetic Algorithm Steps

Begin

1. X := choose an initial population;

2. Cost(X) := compute initial chromosome cost of X;

3. Best-fitness chromosome value := Cost(X); Best-Soln := X;

while (stopping criterion not met) do
repeat (pre-chosen number of times)

4. X' := select a random neighbor from N(X);

5. ΔC := Cost(X') - Cost(X);

6. prob := generate random number in (0,1);

i. if ((ΔC < 0) or (prob <= e^(-ΔC))) then

{

X := X';

Cost(X) := Cost(X');

if (Cost(X) < Best-fitness chromosome value) then

{

Best-fitness chromosome value := Cost(X); Best-Soln := X;

}

}

7. End repeat;

8. End while;

Output Best-fitness chromosome value;

End
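
A runnable reading of the listing above is sketched below. It assumes that the candidate is a neighbor X' of the current solution X, that the difference is Cost(X') - Cost(X), and that a worse candidate is accepted with probability e^(-difference); the two nested loops of the listing are flattened into a single iteration count. The function names, the integer search space and the quadratic cost are hypothetical.

```python
import math
import random


def neighbor_search(cost, neighbor, start, iterations=1000, seed=0):
    """Minimize `cost` by moving to random neighbors with probabilistic acceptance."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for _ in range(iterations):
        candidate = neighbor(current, rng)        # X' := random neighbor of X
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost     # Cost(X') - Cost(X)
        # Always accept improvements; accept worse moves with probability e^(-delta).
        if delta < 0 or rng.random() <= math.exp(-delta):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


# Hypothetical problem: find the integer closest to 3, starting from 10.
cost = lambda x: (x - 3) ** 2
neighbor = lambda x, rng: x + rng.choice([-1, 1])
best, best_cost = neighbor_search(cost, neighbor, start=10)
print(best, best_cost)
```

Because the best solution seen so far is stored separately, occasional acceptance of worse neighbors cannot lose a good solution once it has been visited.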

5. IMPLEMENTATION

The performance of the faculty of a college, school or institute has been found to
depend on a variety of parameters, ranging from the individual's qualifications, experience,
level of commitment and research activities undertaken to institutional support, financial
feasibility, top management's support, etc.

Use a GA as an optimization tool for resetting the parameters in other classifiers.

Most applications of GAs in pattern recognition optimize some parameters in the
classification process.

Implement and use a GA to optimize a combination of classifiers.

We design, implement, and evaluate a series of pattern classifiers with various parameters in
order to compare their performance on a dataset from LON-CAPA.

Some of the students dropped the course after doing a couple of homework sets, so they do not
have any final grades.
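
When a GA is used to tune another classifier, the link between the two is the fitness function: the GA proposes parameter values and the classifier's error rate scores them. The paper does not give its fitness function, so the sketch below is an assumption: it scores a vector of feature weights by the error rate of a 1-nearest-neighbor classifier on weight-scaled features. The function name and the tiny data set are hypothetical.

```python
import math


def weighted_error_rate(weights, train_x, train_y, test_x, test_y):
    """Fraction of test samples misclassified by 1-NN on weight-scaled features."""
    def scale(row):
        return [w * v for w, v in zip(weights, row)]

    scaled_train = [scale(row) for row in train_x]
    errors = 0
    for row, truth in zip(test_x, test_y):
        query = scale(row)
        nearest = min(range(len(scaled_train)),
                      key=lambda i: math.dist(scaled_train[i], query))
        if train_y[nearest] != truth:
            errors += 1
    return errors / len(test_x)


# Hypothetical data: feature 0 is informative, feature 1 is noise.
train_x, train_y = [[0.0, 0.9], [1.0, 0.1]], ["Fail", "High"]
test_x, test_y = [[0.2, 0.0], [0.8, 1.0]], ["Fail", "High"]
print(weighted_error_rate([1, 1], train_x, train_y, test_x, test_y))  # 1.0
print(weighted_error_rate([1, 0], train_x, train_y, test_x, test_y))  # 0.0
```

A GA that minimizes this value would be pushed toward weight vectors that suppress the noisy feature, which is the sense in which weighting the features can reduce the error rate.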

The project has 3 major actors.

1. System Administrator

2. Student

3. Faculty

System Administrator

The Administrator is responsible for

1. Adding multiple students to the system

2. Managing the faculty, such as assigning them to colleges

3. Deciding the number of examination questions for students

4. Deciding the questions for quizzes

5. Checking the summary report




Student

Role of Student:

1. Changing his/her password

2. Participating in the quizzes and examinations

6. RESULTS

We carried out experiments to evaluate the performance and usefulness of different
classification algorithms for predicting students' final marks based on the information in the
students' usage data in an e-learning system. The main objective is to classify students by final
marks into different groups depending on the activities carried out in a web-based course. Table I
shows the sample student dataset and Table II shows the sample faculty dataset.

Table I: Student Details
Name Sid Address Email-ID Phone Zip Pwd
Harshith 111 Warangal harivala@gmail.com 8823462345 506001 Hari
Ravi 333 Hyderabad ravi@yahoo.com 9987876543 500031 Ravi01
Ranjeet 544 Banglore ranjeet@gmail.com 8876523456 500015 Ran07
Meghana 344 Kazipet megha@rediffmail.com 8345562547 506006 Megha
Manvitha 123 Warangal manvi_45@gmail.com 9753256324 506004 Manvi88
Varshith 653 Hyderabad varshit_k@yahoo.com 9562453562 500021 Var667
Stalin 768 Banglore stalin33ch@gmail.com 9876589786 560007 Sta222
Maleeha 543 Mumbai maleeha@gmail.com 9976854634 400022 Mal99
Ashish 434 Warangal ashish.raj@yahoo.com 8765439834 500600 Ashish3
Nanditha 577 Warangal nandu@yahoo.com 9812367854 500603 nan345
Vikranth 666 Mumbai vikranth67@gmail.com 8876545987 400106 Vik11
Kranthi 888 Hyderabad pkrnathip@gmail.com 9745623456 500025 Kra77

The students answer the assignments and homework given by a particular faculty member for a
particular subject. Each assignment or homework consists of objective questions, which the system
evaluates against the answers submitted by the faculty member along with the assignment. The
students are graded as High, Medium or Fail, as shown in Figure 6.

Table II: Faculty Details
F_Id Faculty Pwd Sub Address Mobile Email-ID Zip
1 Neelima 123 1 Warangal 9987653453 neelima@gmail.com 506001
2 Vijay 658 7 Hyderabad 9987699871 vijay@yahoo.com 500041
3 Swetha 452 3 Mumbai 889765478 swethas@gmail.com 400071
4 Kamal 776 7 Hyderabad 9987862313 nkamal@gmail.com 500004







Figure 6: Grade Distribution

7. CONCLUSION

A new approach to classifying student usage of web-based instruction was proposed. Three
classifiers are used to group the students. Weighting the features and using a genetic algorithm to
minimize the error rate improves the prediction accuracy by at least 10%. The successful
optimization of student classification in all three cases demonstrates the merits of using the
LON-CAPA data to predict the students' final grades based on their features, which are extracted
from the homework data. This approach is easily adaptable to different types of courses and
different population sizes, and allows different features to be analyzed.
This work represents a rigorous application of known classifiers as a means of analyzing and
comparing the use and performance of students who have taken a technical course that was partially
or completely administered via the web.

REFERENCES

[1] Minaei Bidgoli, B., and Punch, Using Genetic Algorithms for Data Mining Optimization in
an Educational Web-based System.
[2] Freitas, A.A. A survey of Evolutionary Algorithms for Data Mining and Knowledge
Discovery, See: www.pgia.pucpr.br/~alex/papers. To appear in: A. Ghosh and S. Tsutsui.
(Eds.) Advances in Evolutionary Computation. Springer-Verlag.
[3] Duda, R.O, Hart, P.E, and Stork D.G. Pattern Classification. 2nd Edition, John Wiley &
Sons, Inc., New York NY.
[4] Kuncheva, L.I., and Jain, L.C., Designing Classifier Fusion Systems by Genetic
Algorithms, IEEE Transaction on Evolutionary Computation, Vol. 33 2000, pp 351-373
[5] Kortemeyer, G., Bauer, W., Kashy, D. A., Kashy, E., & Speier, C. ,The Learning Online
Network with CAPA Initiative, Proceedings of the Frontiers in Education conference, 2001.
http://www.lon-capa.org.
[6] Kashy, D. A., Albertelli, G., Ashkenazi, G., Kashy E. Ng, H. K., & Thoennessen, M.,
Individualized interactive exercises: A promising role for network technology, Proceedings
of the Frontiers in Education conference, 2001.

[7] Albertelli, G., Minaei-Bigdoli, B., Punch, W.F., Kortemeyer, G., & Kashy, E., Concept
Feedback in Computer-Assisted Assignments, Proceedings of the Frontiers in Education
conference, 2002.
[8] Guerra-Salcedo C. and Whitley D. Feature Selection mechanisms for ensemble creation: a
genetic search perspective. Freitas AA (Ed.) Data Mining with Evolutionary Algorithms:
Research Directions Papers from the AAAI Workshop, 13-17. Technical Report WS-99-06.
AAAI Press, 1999.
[9] Martin-Bautista MJ and Vila MA. A survey of genetic feature selection in mining issues.
Proceeding Congress on Evolutionary Computation (CEC-99), 1314-1321.
[10] Pei, M., Punch, W.F., and Goodman, E.D. "Feature Extraction Using Genetic Algorithms",
Proceeding of International Symposium on Intelligent Data Engineering and Learning 98
(IDEAL98), Hong Kong, Oct. 1998.
[11] Pei, M., Goodman, E.D., and Punch, W.F. "Pattern Discovery from Data Using Genetic
Algorithms", Proceeding of 1st Pacific-Asia Conference Knowledge Discovery & Data
Mining (PAKDD-97). Feb.1997.
[12] Jain, A. K.; Zongker, D. Feature Selection: Evaluation, Application, and Small Sample
Performance, IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 19,
No. 2, February 1997.
[13] Michalewicz Z. Genetic Algorithms + Data Structures = Evolution Programs, 3rd Ed.
Springer-Verlag, 1996.
[14] Bandyopadhyay, S., and Muthy, C.A. Pattern Classification Using Genetic Algorithms,
Pattern Recognition Letters, Vol. 16, 1995, pp.801-808.
[15] Bala J., De Jong K., Huang J., Vafaie H., and Wechsler H. Using learning to facilitate the
evolution of features for recognizing visual concepts. Evolutionary Computation 4(3) -
Special Issue on Evolution, Learning, and Instinct: 100 years of the Baldwin Effect. 1997.
[16] Rinal H. Doshi, Dr. Harshad B. Bhadka and Richa Mehta, Development of Pattern
Knowledge Discovery Framework using Clustering Data Mining Algorithm, International
Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, 2013,
pp. 101 - 112, ISSN Print: 0976 6367, ISSN Online: 0976 6375.
[17] Ravita Mishra, Web Usage Mining Contextual Factor: Human Information Behavior,
International Journal of Information Technology and Management Information Systems
(IJITMIS), Volume 5, Issue 1, 2014, pp. 12 - 29, ISSN Print: 0976 6405, ISSN Online:
0976 6413.
[18] R. Vijaya Prakash, Dr. A. Govardhan and Prof. S.S.V.N. Sarmaeswari, Mining Non-
Redundant Frequent Patterns in Multi-Level Datasets using Min Max Approximate Rules,
International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 2,
2012, pp. 271 - 279, ISSN Print: 0976 6367, ISSN Online: 0976 6375.
[19] Nathan Dlima, Anirudh Prabhu, Jaison Joseph and Shamsuddin S. Khan, Novel Approach
in E-Learning to Imbibe Environmental Awareness, International Journal of Computer
Engineering & Technology (IJCET), Volume 4, Issue 2, 2013, pp. 166 - 171, ISSN Print:
0976 6367, ISSN Online: 0976 6375.
[20] R. Manickam, D. Boominath and V. Bhuvaneswari, An Analysis of Data Mining: Past,
Present and Future, International Journal of Computer Engineering & Technology (IJCET),
Volume 3, Issue 1, 2012, pp. 1 - 9, ISSN Print: 0976 6367, ISSN Online: 0976 6375.
