
International Journal of Advance Foundation and Research in Computer (IJAFRC)

Volume 1, Issue 8, August 2014. ISSN 2348 - 4853

56 | © 2014, IJAFRC All Rights Reserved www.ijafrc.org

Efficiency of Data Mining Techniques in Edifying Sector

Dr. S. Anupama Kumar ¹, Dr. Vijayalakshmi M N ²

Associate Professors, Dept of MCA, RV College of Engineering, Bangalore.
Kumaranu.0506@gmail.com, mnviju74@gmail.com

A B S T R A C T
In recent years, data mining techniques have been used in fields such as biometrics, biotechnology
and business analytics to support decision making. In this paper, data mining techniques such as
classification, clustering and association mining are applied to educational domains such as goal
setting, learning and teaching methodologies, and performance and achievement. The efficiency of
the algorithms for each domain is analyzed. Classification techniques proved very efficient in
predicting the course and program outcome of a higher education course, the clustering algorithm
proved very effective in grouping students by their performance, and association mining proved
very effective in bringing out interesting rules that explain student behavior in new learning
environments.

Index Terms: Classification, Clustering, Association mining, Performance, Analysis.

I. INTRODUCTION

Educational data mining (EDM) is increasingly recognized as an emerging discipline [1] that aims to
improve the quality of education. It is the process of transforming raw data compiled by education
systems into useful information that can be used to take informed decisions and answer research
questions [2]. It can be applied to various domains of education:
(i) To develop computational tools and techniques for determining the best practices for evaluation
metrics and model fitting on large educational data sets, and
(ii) To determine what can be extracted from the data set that will help the stakeholders make the right
decisions.
Data mining is a potent tool that can be applied in various research areas to bring out interesting and
useful information from huge datasets. In this research work we apply data mining techniques to
educational datasets and analyze the efficiency of the algorithms in performing the required tasks.
This paper explores the application of techniques such as classification, clustering and association
mining to educational data sets. The major contribution of the research work is the implementation of
Baker's taxonomy [3]. The techniques of Prediction, Clustering, Association mining, Distillation of
Human Judgment and Discovery with models have been applied over the data sets following this
taxonomy. The techniques are implemented either on their own or in combination to achieve results.
Classification techniques such as decision trees, Bayesian networks and rule based algorithms are
applied to the data sets to predict the program and course outcome of a higher education program. The
efficiency of the algorithms was analyzed using various metrics.

Clustering techniques are used to form clusters of students depending on their performance grades and
to identify the good, average and poor performers. Association mining has been applied to understand
the behavior of the students in different learning environments. The following section describes the
methodology of the research, and section III presents the overall conclusion of the research work.
II. METHODOLOGY

This section describes the methodology adopted in conducting the research work. We explain the
educational domains, the data mining tasks implemented on these domains and the results achieved.
Figure 1 shows the theoretical framework comprising the phases of the higher education system
implemented in this research work.


Figure 1. Theoretical Framework for Education System

Goal setting refers to the phase of the education domain in which goals are set by the stakeholders
(students, teachers and management). Learning Strategies and Methods refers to the implementation of
different learning methods such as traditional learning, e-learning, collaborative learning and
distance learning in an educational environment, and to understanding the behavior of the students
towards these learning techniques. Action control refers to the phase of monitoring the activities of
the stakeholders using different techniques. Performance and achievement is the final phase of the
framework and is the outcome of all the other educational domains. Implementing the research involves
the following phases:

(i) Pre-processing the data set according to the needs
(ii) Implementing it using classification / clustering / association rules
(iii) Comparing the algorithms based on their accuracy and efficiency

A. Implementation of Classification Techniques

Classification techniques are implemented in all the phases of the framework to

(i) Predict the program outcome and the course outcome of each student
(ii) Understand the behavior of the student towards achieving the goal
The educational domains pertaining to the goal setting, action control, and performance and
achievement phases were implemented in this research work. Decision trees, rule based algorithms
and Bayesian networks were applied to the data sets. Classification techniques are best suited for
predicting student performance [7]. The dataset used in this research work consists of personal and
academic records of students. The academic record of each student covers their performance from
X std to the pre-final semester of their post graduate program. The data set is preprocessed in order
to predict the outcomes mentioned above. For predicting the course / program outcome, attributes such
as the student's name and seat number are removed from the dataset since they do not help to predict
the outcome.
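
The removal of identifying attributes described above can be sketched as follows. This is a minimal illustration, not the preprocessing used in the study: the records, the field names (`name`, `seat_no`, `sem1`, `sem2`, `result`) and the helper `drop_identifiers` are all hypothetical.

```python
# Hypothetical records; field names and values are illustrative only
# and are not taken from the data set used in the paper.
records = [
    {"name": "Student A", "seat_no": "PG001", "sem1": 72, "sem2": 68, "result": "Pass"},
    {"name": "Student B", "seat_no": "PG002", "sem1": 38, "sem2": 41, "result": "Fail"},
]

# Attributes assumed to identify a student without helping prediction.
IDENTIFIER_FIELDS = {"name", "seat_no"}


def drop_identifiers(rows, identifier_fields=IDENTIFIER_FIELDS):
    """Return copies of the rows without the identifier attributes."""
    return [
        {key: value for key, value in row.items() if key not in identifier_fields}
        for row in rows
    ]


cleaned = drop_identifiers(records)
print(cleaned[0])
```

The original records are left untouched, so the identifiers remain available for reporting results back to individual students.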

To predict the program outcome / course outcome, decision tree algorithms such as ID3 and C4.5, the
Naïve Bayes algorithm, rule based algorithms such as One Rule, and Random trees were used. The
efficiency of the algorithms is analyzed using the following parameters:

1. The number of instances predicted as Pass/Fail.
2. The time taken by the algorithms to generate the output

Decision tree algorithms such as ID3 and C4.5 were found very effective in predicting the outcome of
the students when the target value is Pass/Fail. A decision tree is a set of conditions organized in a
hierarchical structure [4]. Table 1 shows the comparison of the algorithms.

Table 1 Comparison of ID3 and C4.5 algorithms

Algorithm                       ID3     C4.5 (J48)
Instances classified as Pass    110     107
Instances classified as Fail    3       9
Time taken (secs)               0.02    0

From the above table it is clear that the C4.5 algorithm performs efficiently in predicting the
outcome and takes less time than the ID3 algorithm.
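
As background to the decision tree comparison, ID3 chooses the attribute to split on by information gain (C4.5 refines this with the gain ratio). The sketch below computes information gain on a small made-up table; the attribute names (`sem1`, `attendance`), the values and the helper names are assumptions for illustration and do not come from the study's data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical, hand-made rows with a Pass/Fail target.
rows = [
    {"sem1": "high", "attendance": "good", "result": "Pass"},
    {"sem1": "high", "attendance": "poor", "result": "Pass"},
    {"sem1": "low", "attendance": "good", "result": "Fail"},
    {"sem1": "low", "attendance": "poor", "result": "Fail"},
]

# ID3 would place the attribute with the highest gain at the root of the tree.
best = max(["sem1", "attendance"], key=lambda a: information_gain(rows, a, "result"))
print(best)
```

In this toy table `sem1` separates the two classes completely while `attendance` carries no information, so the root split falls on `sem1`.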

The Naïve Bayes algorithm was found very effective in predicting the course outcome and program
outcome when the target has more than two values, such as pass, fail and pass/fail. In [5] the authors
identify student assessment data as a tool for predicting the student outcome and program outcome and
show how assessment can be used for this purpose. The system could predict the number of students who
will pass, the number who will fail, and a set of students who tend towards pass/fail depending on
their ability to learn more after being given extra coaching. Naïve Bayes classification needs only
one scan of the training data [6], i.e. the approach does not need many examples to classify. The
algorithm was also found effective in predicting the outcome of individual students, which makes it
easy for the tutor to identify the weak students and train them.
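
The single scan of the training data mentioned above can be illustrated with a small categorical Naïve Bayes sketch. Everything here is hypothetical: the attributes (`internal`, `attendance`), the rows and the function names are made up, and add-one (Laplace) smoothing is an assumption added so that unseen values do not produce zero probabilities.

```python
from collections import Counter, defaultdict


def train(rows, target):
    """Collect all the counts Naive Bayes needs in one pass over the rows."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (attribute, class) -> counts per value
    domains = defaultdict(set)           # attribute -> values seen in training
    for row in rows:
        label = row[target]
        class_counts[label] += 1
        for attribute, value in row.items():
            if attribute == target:
                continue
            value_counts[(attribute, label)][value] += 1
            domains[attribute].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class with the highest smoothed posterior score."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # class prior
        for attribute, value in row.items():
            seen = value_counts[(attribute, label)][value]
            # Add-one smoothing (an assumption of this sketch).
            score *= (seen + 1) / (count + len(domains[attribute]))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical rows with a three-valued target: Pass, Fail, Pass/Fail.
rows = [
    {"internal": "high", "attendance": "good", "outcome": "Pass"},
    {"internal": "high", "attendance": "good", "outcome": "Pass"},
    {"internal": "low", "attendance": "poor", "outcome": "Fail"},
    {"internal": "low", "attendance": "poor", "outcome": "Fail"},
    {"internal": "medium", "attendance": "good", "outcome": "Pass/Fail"},
    {"internal": "medium", "attendance": "poor", "outcome": "Pass/Fail"},
]

model = train(rows, "outcome")
print(predict(model, {"internal": "medium", "attendance": "good"}))
```

Because prediction works on one record at a time, the same model can be queried per student, which matches the individual use described above.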

Two rule based algorithms, the Decision table algorithm and the One R algorithm, were applied to the
learner assessment records to predict the results of the students in the final year from their
performance in the previous semesters. In [8] the author explains the efficiency of decision tables
by applying them to discrete values and shows that decision tables can be more efficient than C4.5.
The goal of rule induction is generally to induce a set of rules from data that captures all
generalizable knowledge within that data while being as small as possible [9]. A data set comprising
learners' assessment data from Semester I to Semester IV was used to predict the outcome of
Semester V, and the algorithms were found very effective in predicting it. Table 2 is the confusion
matrix of the two algorithms, which were found to give similar predictions.


Table 2 Confusion Matrix: Decision Table and One R algorithms

Total No. of Instances: 110        No. of Instances Predicted Pass    No. of Instances Predicted Fail
Correctly Predicted Instances      92                                 6
Incorrectly Predicted Instances    10                                 2
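
For readers unfamiliar with One R, the idea is to build one rule per attribute (each attribute value predicts its majority class) and keep the attribute whose rule makes the fewest training errors. The sketch below is a minimal illustration under made-up data; the grade attributes (`sem3`, `sem4`), the target `sem5` and the function name `one_r` are hypothetical and not the study's implementation.

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """Return (attribute, rule, errors) for the single best attribute.

    rule maps each attribute value to the majority class seen with it.
    """
    best = None
    attributes = [a for a in rows[0] if a != target]
    for attribute in attributes:
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attribute]][row[target]] += 1
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        errors = sum(1 for row in rows if rule[row[attribute]] != row[target])
        # Keep the attribute with the fewest misclassified training rows.
        if best is None or errors < best[2]:
            best = (attribute, rule, errors)
    return best


# Hypothetical grades from earlier semesters and the Semester V result.
rows = [
    {"sem3": "A", "sem4": "A", "sem5": "Pass"},
    {"sem3": "A", "sem4": "B", "sem5": "Pass"},
    {"sem3": "B", "sem4": "B", "sem5": "Pass"},
    {"sem3": "B", "sem4": "C", "sem5": "Fail"},
    {"sem3": "C", "sem4": "C", "sem5": "Fail"},
]

attribute, rule, errors = one_r(rows, "sem5")
print(attribute, rule, errors)
```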

These algorithms were analyzed using metrics such as the time taken to evaluate the model, the true
and false positive values obtained, and the ROC value obtained from the output. On these values, the
One R algorithm was found to be more effective than the Decision table. Precision, in the sense of
agreement between observers (inter-observer agreement), is often reported as a kappa statistic [10].
Figure 2 shows the analysis of the algorithms using Cohen's kappa, which measures the agreement
between two raters who classify N items into mutually exclusive categories. Kappa is at most 1; a
value below zero indicates less than chance agreement and is not acceptable, and a value between 0.81
and 0.99 indicates almost perfect agreement [10].

Figure 2. Comparison of Kappa Statistic Value.

The value of kappa for the Decision table is 0.3937 and for One R is 0.4463. From these values, it is
understood that the One R algorithm performs better than the Decision table.
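
As a worked check, Cohen's kappa can be computed directly from the counts in Table 2. The sketch assumes the table is read as 92 passes predicted as Pass, 6 fails predicted as Fail, 10 fails predicted as Pass and 2 passes predicted as Fail; that reading, and the function name `cohen_kappa`, are assumptions of this example. Under that reading the result agrees with the 0.4463 reported for One R.

```python
def cohen_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 confusion matrix.

    tp/tn are correct Pass/Fail predictions, fp are actual fails
    predicted as Pass, fn are actual passes predicted as Fail.
    """
    total = tp + fn + fp + tn
    observed = (tp + tn) / total
    # Chance agreement: product of actual and predicted marginals per class.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    return (observed - expected) / (1 - expected)


# Counts as read from Table 2 (layout interpretation is an assumption).
kappa = cohen_kappa(tp=92, fn=2, fp=10, tn=6)
accuracy = (92 + 6) / 110

print(round(kappa, 4))     # 0.4463
print(round(accuracy, 4))  # 0.8909
```

The gap between an accuracy of about 0.89 and a kappa of about 0.45 reflects the class imbalance: most instances are Pass, so a large share of the agreement is expected by chance.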

B. Implementation of Association Mining Techniques

The different types of teaching methods play a vital role in any education environment and may be one
of the factors affecting the performance of the student. The learner should be comfortable with the
new learning style and environment. Learning styles such as collaborative learning and adaptive
learning can be implemented at all stages of education. Association rule mining is used in this
research work to analyze the learners' response to new learning techniques and to understand their
interest in adapting to new learning environments. Association rules are if/then statements that
help to uncover relationships between seemingly unrelated data in an information repository [11].
Two association algorithms, Apriori and Tertius, were applied to the data set to understand student
behavior in the new learning environments. An online questionnaire of 20 questions on the new
learning style, the learning environment, the students' eagerness to learn and apply current
technology, their attitude towards self learning and so on was circulated among the students and the
responses were recorded. The responses were then pre-processed as needed and given to both
algorithms. The Apriori algorithm is driven by support and confidence values and was capable of
generating interesting rules over the data set, which helped us understand the students' attitude
towards new learning styles. Table 3 is a sample of the rules generated.
Table 3 : Interesting rules from Apriori Algorithm
Q10=Week ==> Q15=Google
Q1=no ==> Q15=Google
Q11=Week ==> Q15=Google
Q14=Dictionary ==> Q15=Google

The rules can be interpreted as:

1. A student who browses the internet weekly for assignments uses Google as the search engine.
2. A student who does not have a laptop for personal use also uses Google as the search engine.
3. A student who browses the internet weekly for sending / receiving emails regarding his studies
also uses Google as the search engine.
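
The support and confidence values that drive Apriori can be computed as in the sketch below. The responses are invented for illustration, the question labels are used only as hypothetical attribute names, and the helpers `support` and `confidence` are not part of any library.

```python
def support(transactions, items):
    """Fraction of responses that contain every attribute=value pair in items."""
    matching = sum(1 for t in transactions if items.items() <= t.items())
    return matching / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by antecedent support."""
    both = support(transactions, {**antecedent, **consequent})
    return both / support(transactions, antecedent)


# Hypothetical questionnaire responses (not the study's data).
responses = [
    {"Q10": "Week", "Q15": "Google"},
    {"Q10": "Week", "Q15": "Google"},
    {"Q10": "Week", "Q15": "Other"},
    {"Q10": "Daily", "Q15": "Google"},
]

rule_support = support(responses, {"Q10": "Week", "Q15": "Google"})
rule_confidence = confidence(responses, {"Q10": "Week"}, {"Q15": "Google"})
print(rule_support, rule_confidence)
```

Apriori keeps only the rules whose support and confidence exceed user-chosen minimum thresholds, so these two numbers decide whether a rule such as those in Table 3 is reported.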
It is clear that the learning style of students has undergone a paradigm shift from auditory and
visual learning to visual and tactile learning, and that students are willing to learn in new
environments. The Tertius association rule algorithm was presented by Flach & Lachiche in 2001 [12].
The algorithm is designed to extract first order rules using unsupervised learning [13]. Tertius
generates rules depending on the number of literals considered for a hypothesis. The hypothesis is a
set that depends on the number of literals appearing in the data set. The maximum number of literals
taken for the purpose of this research is 4, and the experiment is performed until it reaches the
minimum. The experiment is repeated with classification set to “ON” and the minimum confirmation
value set to 10. The rules generated are given below:

Q19 = No ==> Q1 = no
Q19 = Yes ==> Q1 = yes
Q17 = Never ==> Q1 = no
Q17 = Week ==> Q1 = yes

From the rule set it is understood that the association rules are framed in a symmetric pattern. This
reveals the behavior of the student in diverse situations. The interpretation of the rules indicates
that the students of today's community are more inclined towards internet technologies and new self
learning methods.

C. Implementation of Clustering Techniques

The objective of this part of the research work is to apply simple K means clustering to learner
records and bring out new information from them. A student database consisting of personal and
academic records was selected and data cleaning was applied to remove the unwanted data. The K means
clustering algorithm is applied to the dataset using two different distance metrics, namely (i)
Euclidean distance and (ii) Manhattan distance. K means clustering has been applied to position
elements of a database into specific groups according to some attributes [14]. The author in [15]
used learners' assessment data to study the performance of students in a support learning system
called REFLET. The data set is iterated three times to obtain the centroids of the data points. The
missing values in the data set are replaced by the mean value of the attributes. The size of the data
set is 170 and the algorithm resulted in two clusters, with 64 instances in one cluster and 106
instances in the other. Figure 3 displays the clusters formed for the given data set. The X-axis
shows the details of the clusters formed and the Y-axis shows the value of the gender attribute
against them.

Figure 3. Formation of Clustering Using K means algorithm.
The circle colored in brown shows cluster 1, whose data lie close to each other between 70 and 80,
and the circle colored in blue indicates cluster 0, whose centroids vary from 0 to 60. The number of
instances has been partitioned exactly as 84.5 out of the 169 instances. The dataset is then
clustered again using Manhattan distance, keeping all other criteria the same. The centroid values
under the two distances differ only minimally for all the attributes, which does not affect the
formation of clusters. From the clusters formed, it is understood that they follow the class
attribute of the learners, namely its FC (First Class) and FCD (First Class with Distinction)
values. No cluster could be formed based on the attribute value Fail. It is therefore clear that the
underperformers cannot be grouped, since the number of learners with this value is small. From the
research work it is clear that the K means algorithm can be effectively used to group the students
and analyze their performance.
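
The comparison of the two distance metrics can be sketched with a small K means loop in which the distance function is a parameter. This is an illustrative sketch only: the marks, the starting centroids and the function names are made up, and the mean is used to update centroids under both metrics for simplicity (some implementations use the median with Manhattan distance).

```python
def euclidean(a, b):
    """Straight-line distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_means(points, centroids, distance, iterations=3):
    """Assign points to the nearest centroid, then move centroids to cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical (semester 1, semester 2) marks for six learners.
points = [(45, 50), (50, 55), (55, 48), (75, 80), (78, 72), (72, 76)]
start = [(45, 50), (75, 80)]

centroids_e, clusters_e = k_means(points, start, euclidean)
centroids_m, clusters_m = k_means(points, start, manhattan)
print(centroids_e, centroids_m)
```

On well-separated data such as this toy set, both metrics produce the same grouping, which is consistent with the observation above that the choice of distance did not affect the clusters formed.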

III. CONCLUSION

Educational data mining can be effectively used on students' data sets to predict their course /
program outcomes, to understand their behavior towards new learning styles and environments, and to
analyze their records to bring out new knowledge. For the given data set, decision trees were found
very effective in predicting the student's program outcome from their marks when the target variable
is Pass/Fail. The Bayesian network was found very effective in predicting the program and course
outcome when the target variable has more than two values, and it could also predict the outcome of
an individual student. The rule based algorithms are very effective in predicting the program
outcome from the students' historical data. Interesting rules could be found in the learner records,
and the learners' behavior towards new learning environments and learning styles can be predicted.
Clustering algorithms were very effective in analyzing the student data and grouping students
according to their performance. In future, EDM methods can be further used for decision making and
visually depicted using learning analytics.

IV. REFERENCES
[1] Arruabarrena R., Perez, T. A., Lopez-Cuadrado, J., and Vadillo, J. G. J. (2002). “On evaluating
adaptive systems for education”, In International Conference on Adaptive Hypermedia and
Adaptive Web-Based Systems, Málaga, Spain, pp. 363–367.
[2] Cecily Heiner, Ryan Baker, Kalina Yacef (eds.), Proceedings of the Workshop on Educational Data
Mining at the 8th International Conference on Intelligent Tutoring Systems, Jhongli,
Taiwan, 2006.
[3] Baker, R.S.J.d. (in press) “Data Mining for Education”. To appear in McGaw, B., Peterson, P., Baker,
E. (Eds.) International Encyclopedia of Education (3rd edition). Oxford, UK: Elsevier.
[4] J. Han and M. Kamber, “Data Mining: Concepts and Techniques”, San Diego: Academic Press, 2001.
[5] Dennis K. George, PhD, Retta E. Poe, PhD, “Using Student Assessment Data for Program
Assessment”, Western Kentucky University, http://www.nvcc.edu/about-nova/directories
[6] Naive-Bayes Classification Algorithm, www.software.ucv.ro/~cmihaescu/ro/teaching/AIR/docs/
[7] Dr. A. Padmapriya, “Prediction of Higher Education Admissibility using Classification
Algorithms”, International Journal of Advanced Research in Computer Science and Software
Engineering, Volume 2, Issue 11, November 2012, ISSN: 2277 128X, pp. 330-337.
[8] Ron Kohavi, “The Power of Decision Tables”, www.cse.unsw.edu.au
[9] Cohen, W. (1995), “Fast effective rule induction”, Proceedings 12th International Conference on
Machine Learning, Morgan Kaufmann, pp. 115–123.
[10] Anthony J. Viera, MD; Joanne M. Garrett, PhD, “Understanding Interobserver Agreement: The
Kappa Statistic”, From the Robert Wood Johnson Clinical Scholars Program, University of North
Carolina, Family Medicine Research Series, May 2005.
[11] “Association rules in Data Mining”,
www.searchbusinessanalytics.techtarget.com/definition/association-rules-in-data-mining
[12] Peter A. Flach, Nicolas Lachiche, “Confirmation-Guided Discovery of First-Order Rules with
Tertius”, Kluwer Academic Publishers. Manufactured in The Netherlands, Machine Learning, 42,
61–95, 2001.
[13] Payam Refaeilzadeh, Lei Tang, Huan Liu, Arizona State University, “Cross-Validation”,
http://www.cse.iitb.ac.in
[14] Chady El Moucary, Marie Khair and et al, “Improving Student's Performance Using Data
Clustering and Neural Networks in Foreign-Language Based Higher Education”, The Research
Bulletin of Jordan ACM, Vol. II (III).
[15] Chady El Moucary, Marie Khair and et al, “Improving Student's Performance Using Data
Clustering and Neural Networks in Foreign-Language Based Higher Education”, The Research
Bulletin of Jordan ACM, Vol. II (III).
