
International Journal on Recent and Innovation Trends in Computing and Communication

Volume: 2 Issue: 12

ISSN: 2321-8169
3962 - 3965

_______________________________________________________________________________________________

Application of Data Mining Technique for Prediction of Academic Performance of Student: A Literature Survey
Mr. Bhushan S. Olokar
ME 3rd Sem Information Technology
Prof. Ram Meghe Institute of Technology & Research
Badnera-Amravati, India
bhushanolokae@outlook.com

Prof. Ms. V.M.Deshmukh
Associate Professor & Head Information Technology
Prof. Ram Meghe Institute of Technology & Research
Badnera-Amravati, India
msvmdeshmukh@rediffmail.com

Abstract: The application of data mining in educational systems can be directed to support the specific needs of each participant in the
education system and process. Students need recommendations for additional activities, teaching material and tasks that
would favor and improve their learning process. Professors would receive feedback and the possibility to classify students into groups based on
their need for guidance and monitoring, to find mistakes, and to find effective actions. Researchers have reported many prediction models
with different approaches and techniques for student performance prediction, but it remains unclear whether there are
any predictors that accurately determine whether a student will be a genius, a dropout, or an average performer. The target of this study is to
apply the kernel k-means method for mining data to analyze the relationships between students' success and their behavior, and to develop a model for the
prediction of academic performance of students. This would be done by using Support Vector Machine (SVM) classification and a kernel k-means
clustering mechanism. Predicting students' performance can help to identify the students who are at risk of failure, so that management can
provide timely help and take essential steps to coach the students to improve performance.

Keywords: Data mining, SVM, kernel-k-means, SOM, Student Performance.

__________________________________________________*****_________________________________________________
I. INTRODUCTION

Data mining is a powerful technology for analyzing important information from the data warehouse. It is a data analysis methodology used to identify hidden patterns in a large data set. The KDD process includes data mining. Knowledge discovery in databases (KDD) aims at the discovery of useful information from large collections of data [2]. The main goal of data mining in the KDD process is concerned with the algorithmic means by which patterns or structures are enumerated from the data under acceptable computational efficiency limitations. Data mining has a wide range of applications, including the educational environment. In this environment, data mining is an interesting research area which extracts useful, previously unknown patterns from the database for better understanding. This in turn improves the educational performance and the assessment of the student learning process [3].

There is increasing research interest in the education field using data mining. Data mining techniques are concerned with developing methods that discover knowledge from data and are used to uncover hidden or unknown information that is not readily visible but is highly useful [7]. The data can be personal or academic, and can be used to understand students' behavior, to improve coursework, to improve teaching, and for many other benefits.

The prediction of academic performance is widely researched. The prediction of student success is still one of the most topical debates in higher education institutions. In previous studies, the model of Tinto [15] is the predominant theoretical framework for considering the factors that contribute to academic goals. Tinto's model considers the process of student attrition as a psychological interplay between the characteristics of the student entering university and the experience at the institute. The use of data mining techniques in this field is relatively advanced. Many data mining techniques have been used in this field, such as decision trees, Bayesian networks, neural networks, and so on [1].

This study investigates the educational domain of data mining using a detailed case study of data that mostly comes

IJRITCC | December 2014, Available @ http://www.ijritcc.org

_______________________________________________________________________________________

from the behavior of students. It shows what type of data could be collected, how that data could be preprocessed, how to apply the kernel method for data mining on the collected data, and finally how we can benefit from the knowledge discovered in the collected data. In this case study, the final grades of university students were predicted by using SVM classification, and the students were grouped according to their similar characteristics by clustering. The clustering process was carried out using the kernel k-means algorithm.

1.1 Support Vector Machine (SVM) for Classification

Classification is a data mining task that predicts group membership for data instances [8]. In an educational application of the classification method, given the work of a student, one may predict his/her final grade. The Smooth Support Vector Machine (SSVM) is a further development of the Support Vector Machine (SVM) [10][14]. The SSVM generates and solves an unconstrained smooth reformulation of the SVM for pattern classification using a completely arbitrary kernel [10]. The SSVM is solved by a very fast Newton-Armijo algorithm and has been extended to nonlinear separation surfaces by using a nonlinear kernel technique. The numerical results show that SSVM is faster than other methods and has better generalization ability [8].

1.2 An Effective Kernel K-Means for Clustering

Clustering is making groups of objects such that the objects in one group are similar to one another and different from the objects in another group [4]. In the educational field, clustering would be used to group students according to their behavior and performance. In this study we used the kernel k-means algorithm to cluster the given data. A drawback of the original k-means is that it cannot separate clusters that are not linearly separable in the input space. Kernel k-means is one approach that has emerged for handling this problem. Before clustering, kernel k-means maps the points to a higher-dimensional feature space using a nonlinear function, and then partitions the points by linear separators in the new space [12]. Kernel k-means has been extended to efficient and effective large-scale clustering [4], since the original kernel k-means had serious problems, such as the high clustering cost due to the repeated calculation of kernel values, or insufficient memory to store the kernel matrix, that make it unsuitable for large corpora. The new clustering scheme is a large-scale clustering scheme for the kernel k-means algorithm [4].

II. LITERATURE REVIEW/SURVEY

The most cited literature survey in educational data mining is by Romero and Ventura [1], which indicates performance prediction as one of the emerging fields of educational data mining. Various Bayesian classification techniques have been used, and a comparative study suggests that ensemble methods give the best overall accuracy.

Cheewaprakobkit [13] considered 1600 student records between 2001 and 2011 at a Thailand university and applied decision tree and neural network classifiers to find the most important factors affecting students' academic achievement. The decision tree proved to be a better classifier than the neural network, with 1.311% more accuracy. Number of hours worked per semester, additional English course, number of credits enrolled per semester and marital status of the students are the major factors affecting the performance.

Bharadwaj and Pal [5] base their experiment only on previous semester marks, seminar performance, assignments, class test marks, attendance and lab work to predict end-semester marks. Records of 50 students of the 2007 to 2010 MCA session of Purvanchal University were considered. The paper calculates the split info and gain ratio of each predictor and produces prediction rules.

The dropout of students from the Open Polytechnic of New Zealand due to failure has been explored by Kovacic [9]. Enrollment data consisting of socio-demographic variables (age, gender, class, work status, education and disability) and study environment (course program and course block) of about 435 students of the Information Systems course were collected. The final label consisted of two categories: PASS (those who completed the course) and FAIL (those who did not complete). Feature selection indicated that the most important attributes for prediction are ethnicity, course program and course block.
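The split info and gain ratio that [5] reports for each predictor can be illustrated with a small sketch. The code below is not taken from any of the surveyed papers; it assumes categorical predictors, uses base-2 entropy, and runs on made-up toy records whose attribute names (`attendance`, `seminar`, `result`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain of a categorical predictor divided by its split info."""
    total = len(labels)
    # Group the class labels by the value the predictor takes.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class label after splitting on the predictor.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split info is the entropy of the partition sizes themselves.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records (assumed for illustration, not data from [5]).
attendance = ["good", "good", "poor", "poor", "good", "poor"]
seminar = ["good", "poor", "good", "poor", "good", "poor"]
result = ["pass", "pass", "fail", "fail", "pass", "fail"]

print("attendance:", gain_ratio(attendance, result))
print("seminar:", gain_ratio(seminar, result))
```

In this toy data, attendance separates the two outcomes completely while seminar performance barely does, so attendance receives the higher gain ratio and would be chosen first when building a decision tree.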
The research had been motivated by a number of
practical data mining projects where the Self-Organizing Map (SOM) has been a central data analysis tool [6]. It can easily be seen that while the SOM can be used to quickly create a qualitative overview of the data, turning this qualitative information into quantitative characterizations requires a great deal of expertise and manual work by the user. There is no wide consensus on, or understanding of, the methods needed for post-processing in SOM-based data analysis. The subsequent research has concentrated on devising such methods and on gaining a better understanding of the strengths, possibilities and weaknesses of the SOM in data exploration.

[8] applied classification as a data mining technique to estimate student performance, using the decision tree method for classification. This study is helpful for the early identification of dropouts and students who need special attention, and allows the teacher to provide appropriate advising. [9] also applied classification as a data mining technique to estimate student performance, using the decision tree method for classification. This study allows the university management to prepare the necessary resources for newly enrolled students to get the desired result, and indicates at an early stage which type of students will potentially be enrolled and what areas to concentrate upon in the higher education system for support. [10] applied association rule mining analysis based on students' failed courses to identify students' failure patterns. The main goal of their study is to identify hidden relationships between the failed courses and to suggest relevant causes of the failure in order to improve the performance of low-capacity students.

[11] used the k-means clustering algorithm for the prediction of students' learning activities. The information generated after the implementation of the data mining technique might be helpful for instructors and also for students. Using the Bayesian classification method as a data mining technique, it was found that students' grade in senior secondary exam, location, medium, mother's qualification, other habits, family annual income and status of the student's family were highly correlated with the student's academic performance. A study using simple linear regression analysis found that factors like mother's education and family income were highly correlated with the student's academic performance. Another study of student performance using the association rule technique found an interesting ratio of students opting for the class teaching language.

III. PROPOSED WORK

Educational data mining is an emerging field with regard to the prediction of future performance. The objective of the proposed methodology is to build a classification model that classifies a student's performance. It is built by following the Standard Process for Data Mining, which includes business and data understanding, data preparation, modeling, and finally the application of data mining techniques, which is classification in the present study. In particular, we will implement the rules in the SVM algorithm to predict the students' final grade. We also cluster the students into groups using kernel k-means clustering. This study expresses the strong correlation between the mental condition of students and their final academic performance. Data mining techniques (DMT) have potential in the performance monitoring of universities and other levels of education, offering historical perspectives on students' performance. The results may both supplement and complement education performance monitoring and assessment implementations.

CONCLUSION

As we have seen, the classification task has been used on a student database to predict students' performance on the basis of previous database records. There are many approaches that are used for data classification. Information such as class test, attendance, seminar, innovative activities and assignment marks was collected from the students' previous database records to predict the performance at the end of each semester. This study will help students and teachers to improve the performance of the students. This study will also help to identify those students who need special attention, to reduce the failure ratio and to take appropriate action for the next academic examination.

REFERENCES
[1] C. Romero and S. Ventura, "Educational data mining: a survey from 1995 to 2005," Expert Systems with Applications, no. 33, pp. 135-146, 2007.
[2] Heikki Mannila, "Data mining: machine learning, statistics, and databases," IEEE, 1996.
[3] Moucary et al., "Improving student performance using data clustering and neural networks in foreign language based higher education," The Research Bulletin of Jordan ACM, vol. II (III).
[4] Rong Zhang and Alexander I. Rudnicky, "A Large Scale Clustering Scheme for Kernel K-Means," School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA, 2006.
[5] B. K. Bharadwaj and S. Pal, "Mining Educational Data to Analyze Students' Performance," International Journal of Advanced Computer Science and Applications, Vol. 2, No. 6, 2011.
[6] T. Kohonen, Self-Organizing Maps, Series in Information Sciences, second edn., Springer, Heidelberg, 1997.
[7] Pavel Berkhin, "Survey of Clustering Data Mining Techniques," Accrue Software, Inc.
[8] Y. J. Lee and O. L. Mangasarian, "A Smooth Support Vector Machine for Classification," Journal of Computational Optimization and Applications, 20, 2001, pp. 5-22.
[9] Z. Kovacic, "Early prediction of student success: mining student enrolment data," paper presented at Proceedings of Informing Science & IT Education Conference (InSITE), Cassino, Italy, June 19-24, 2010.
[10] Santi W. P. and A. Embong, "Smooth Support Vector Machine for Breast Cancer Classification," IMT-GT Conference on Mathematics, Statistics and Applications (ICMSA), 2008.
[11] Christopher Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, 2(2), 1998.
[12] Mark Girolami, "Mercer Kernel Based Clustering in Feature Space," IEEE Trans. on Neural Networks.
[13] P. Cheewaprakobkit, "Study of Factor Analysis Affecting Achievements of Undergraduate," paper presented at International MultiConference of Engineers and Computer Scientists, IMECS, Hong Kong, HK, March 13-15, 2013.
[14] M. Furqan, A. Embong, A. Suryanti, Santi W. P., and S. Sajadin, "Smooth Support Vector Machine for Face Recognition Using Principal Component Analysis," Proceeding 2nd International Conference on Green Technology and Engineering (ICGTE), 2009, Faculty of Engineering, Malahayati University, Bandar Lampung, Indonesia.
[15] V. Tinto, "Limits of theory and practice in student attrition," Journal of Higher Education, no. 53, pp. 687-700, 1982.
