
Mining Acute Inflammations of Urinary System

Using GAJA2: a New Data Mining Algorithm

Suwimon Kooptiwoot
Computer Science Program, Faculty of Science and Technology, Suan Sunandha Rajabhat University, 1 Uthong Nok, Dusit, Bangkok 10300, THAILAND, suwimonktw@yahoo.com
Abstract-Medical data mining is challenging. In this paper, we propose a new data mining algorithm called GAJA2, which is derived from GAJA [1]. We apply GAJA2 to mine the Acute Inflammations data set, a medical data set obtained from the UCI machine learning repository 2009 [2]. This data set covers the symptoms and diagnosis of two diseases of the urinary system: inflammation of urinary bladder and nephritis of renal pelvis origin. We compare the results from GAJA2 with the results from GAJA and from Rough Set Theory. We found that GAJA2 produces fewer rules, that its rules can be used by experts in the field, and that they are much easier to understand than the rules from GAJA and Rough Set Theory.

Keywords-Medical Data Mining; Acute Inflammations of Urinary System; inflammation of urinary bladder; Nephritis of renal pelvis origin; GAJA2

I. INTRODUCTION

Medical data mining is both interesting and challenging. Data mining is used mainly to extract implicit knowledge from data and to find the relationships between or among related attributes, as seen in [3-10]. We are interested in mining the relationships between symptoms and diseases. Many researchers have used data sets from the UCI machine learning repository, as seen in [11-18]. We obtained the Acute Inflammations data set from the UCI machine learning repository 2009. This data set contains symptoms and diagnoses for two diseases of the urinary system: inflammation of urinary bladder and nephritis of renal pelvis origin. We want to determine the occurrence of these two diseases from the occurrence of the symptoms, so we use data mining to find the relationships between the symptoms and the diseases.

We then consider which data mining algorithm to use. Classification algorithms and association rule algorithms both mine the relationships among attributes, but in different forms, and each has advantages and disadvantages. A classification algorithm requires the class attribute to be specified, whereas an association rule algorithm does not. Classification algorithms also produce significantly fewer rules than association rule algorithms, as seen in [1, 3-9]. Because we want the relationships between symptoms and diseases, a classification algorithm is the natural choice, and we define the two diseases as class attributes. The problem is that a classification algorithm can work with only one class attribute at a time, while we need to use two class attributes at the same time. So we cannot use any of the existing classification algorithms. We therefore create a new data mining algorithm called GAJA2, which finds the relationships between the related symptom attributes and several class attributes at a time.

We use GAJA2 to mine the medical data set mentioned above, which is also used in [10]. Compared with the background knowledge of the medical domain provided by experts in the field, the decision rules obtained are reasonable for medical diagnosis. The rules from GAJA2 are also better than the rules from the work in [10] and the rules from the original GAJA, in both the number of rules and the quality of the rules.

II. DATA

We obtained the Acute Inflammations data set from the UCI machine learning repository 2009. This data set is used in [10] to prepare the algorithm of an expert system that performs the presumptive diagnosis of two diseases of the urinary system: inflammation of urinary bladder and nephritis of renal pelvis origin. The important point is that the symptoms of acute inflammation of urinary bladder often appear together with acute nephritis of renal pelvis origin. So we need to know, using all the symptoms, whether the occurrence of one disease is accompanied by the other. This is why we use both diseases as class attributes.

This medical data set consists of 8 attributes: 2 class attributes, which are the diagnoses of the two diseases, and 6 symptom attributes. The two class attributes and their values are
1. Inflammation of urinary bladder: yes, no
2. Nephritis of renal pelvis origin: yes, no

The other 6 attributes and their values are
1. Body temperature of the patient: continuous
2. Occurrence of nausea: yes, no
3. Lumbar pain: yes, no
4. Urine pushing (continuous need for urination): yes, no
5. Micturition pains: yes, no
6. Burning of urethra, itch, swelling of urethra outlet: yes, no

We use our GAJA2 algorithm to mine this data set to get the rules.

978-1-4244-5540-9/10/$26.00 2010 IEEE

III. GAJA2 ALGORITHM

The algorithm called GAJA2 is the new data mining algorithm proposed in this work. It is derived from GAJA [1]. The GAJA2 algorithm is as follows.

For all attributes
    For all values of current attribute
        Group data cases which have the same value of current attribute
        Select the attributes whose values are the same in all cases of the current group
        Put selected attributes and their values in THEN table
        Put the name and value of current attribute used for grouping in IF table
    End
End
Group rules which have the same THEN part together
Generate rules from IF and THEN table
/*Extension part*/
Select rules which have class attribute(s) of interest
Select rules whose class attribute(s) of interest is/are in THEN part
Move class attribute(s) in THEN part to IF part
Move the other attribute(s) in IF part to THEN part
Group rules which have the same IF part together and delete the repetition part

IV. RESULTS

The rules obtained from GAJA2 are as follows.

Rule 1: IF Nephritis of renal pelvis origin = yes THEN Lumbar pain = yes, Micturition pain = yes, Nausea = yes
Rule 2: IF Nephritis of renal pelvis origin = no THEN Nausea = no, Lumbar pain = no
Rule 3: IF Inflammation of urinary bladder = yes THEN Urine pushing = yes
Rule 4: IF Inflammation of urinary bladder = no THEN Burning of urethra = no, Urine pushing = no

Steps of GAJA2

The first part is the same as the original GAJA. The results from GAJA are as follows.

Rule 1: IF Nephritis of renal pelvis origin = yes THEN Lumbar pain = yes
Rule 2: IF Nephritis of renal pelvis origin = no THEN Nausea = no
Rule 3: IF Inflammation of urinary bladder = yes THEN Urine pushing = yes
Rule 4: IF Burning of urethra = yes THEN Urine pushing = yes
Rule 5: IF Micturition pain = no THEN Nausea = no
Rule 6: IF Urine pushing = no THEN Burning of urethra = no, Inflammation of urinary bladder = no
Rule 7: IF Lumbar pain = no THEN Nausea = no, Nephritis of renal pelvis origin = no
Rule 8: IF Nausea = yes THEN Lumbar pain = yes, Micturition pain = yes, Nephritis of renal pelvis origin = yes

Then we select the rules which have class attribute(s). In this case, the class attributes are Nephritis of renal pelvis origin and Inflammation of urinary bladder, so the rules selected are Rules 1-3 and Rules 6-8, as follows.

Rule 1: IF Nephritis of renal pelvis origin = yes THEN Lumbar pain = yes
Rule 2: IF Nephritis of renal pelvis origin = no THEN Nausea = no
Rule 3: IF Inflammation of urinary bladder = yes THEN Urine pushing = yes
Rule 6: IF Urine pushing = no THEN Burning of urethra = no, Inflammation of urinary bladder = no
Rule 7: IF Lumbar pain = no THEN Nausea = no, Nephritis of renal pelvis origin = no
Rule 8: IF Nausea = yes THEN Lumbar pain = yes, Micturition pain = yes, Nephritis of renal pelvis origin = yes

Then we select the rules whose class attribute(s) is/are in the THEN part, move the class attribute(s) and their value(s) to the IF part, and move the other attributes and their values from the IF part to the THEN part. From this we get the following rules.

Rule 1: IF Nephritis of renal pelvis origin = yes THEN Lumbar pain = yes
Rule 2: IF Nephritis of renal pelvis origin = no THEN Nausea = no
Rule 3: IF Inflammation of urinary bladder = yes THEN Urine pushing = yes
Rule 6: IF Inflammation of urinary bladder = no THEN Burning of urethra = no, Urine pushing = no
Rule 7: IF Nephritis of renal pelvis origin = no THEN Nausea = no, Lumbar pain = no
Rule 8: IF Nephritis of renal pelvis origin = yes THEN Lumbar pain = yes, Micturition pain = yes, Nausea = yes

Then we group the rules which have the same IF part together and delete the repetition part. Here we group Rule 1 with Rule 8 and Rule 2 with Rule 7, which gives the following results.

Rule 1+8: IF Nephritis of renal pelvis origin = yes THEN Lumbar pain = yes, Micturition pain = yes, Nausea = yes
Rule 2+7: IF Nephritis of renal pelvis origin = no THEN Nausea = no, Lumbar pain = no

Finally we get the results from GAJA2 as follows.

Rule 1: IF Nephritis of renal pelvis origin = yes THEN Lumbar pain = yes, Micturition pain = yes, Nausea = yes
Rule 2: IF Nephritis of renal pelvis origin = no THEN Nausea = no, Lumbar pain = no
Rule 3: IF Inflammation of urinary bladder = yes THEN Urine pushing = yes
Rule 4: IF Inflammation of urinary bladder = no THEN Burning of urethra = no, Urine pushing = no

V. COMPARISON WITH GAJA

We compare the results from GAJA2 with the results from GAJA. The rules from GAJA2 and the rules from the original GAJA are shown in the previous section. GAJA2 gives only four rules while GAJA gives eight rules, so GAJA2 produces fewer rules than the original GAJA. The rules from GAJA2 are also of great interest. Moreover, experts in the field are satisfied with the rules from GAJA2.

VI. COMPARISON WITH ROUGH SET THEORY

We compare the results from GAJA2 with the results from Rough Set Theory as done in [10]. The final results from the work in [10] are as follows.

Rule 1: IF Inflammation of urinary bladder = no and Nephritis of renal pelvis origin = no THEN Nausea = no and Urine pushing = no and Micturition pains = no and Burning of urethra, itch, swelling of urethra outlet = no
Rule 2: IF Inflammation of urinary bladder = no and Nephritis of renal pelvis origin = yes THEN Temperature of patient = 38-40 °C and Nausea = no and Lumbar pain = yes and Urine pushing = yes and Micturition pains = no and Burning of urethra, itch, swelling of urethra outlet = yes
Rule 3: IF Inflammation of urinary bladder = no and Nephritis of renal pelvis origin = yes THEN Temperature of patient > 40 °C and Lumbar pain = yes
Rule 4: IF Inflammation of urinary bladder = yes and Nephritis of renal pelvis origin = no THEN Temperature of patient = 37-38 °C and Nausea = no and Lumbar pain = no and Urine pushing = yes
Rule 5: IF Inflammation of urinary bladder = yes and Nephritis of renal pelvis origin = no THEN Temperature of patient = 36-37 °C and Nausea = no and Lumbar pain = no and Urine pushing = yes and Micturition pains = yes and Burning of urethra, itch, swelling of urethra outlet = yes
Rule 6: IF Inflammation of urinary bladder = yes and Nephritis of renal pelvis origin = yes THEN Temperature of patient > 40 °C and Nausea = yes and Lumbar pain = yes and Urine pushing = yes and Micturition pains = yes

From these results, we find that Rough Set Theory gives more rules (six) than GAJA2 (four). Among the rules from Rough Set Theory, the IF part of Rule 2 is the same as the IF part of Rule 3, but their THEN parts differ. Likewise, the IF part of Rule 4 is the same as the IF part of Rule 5, but their THEN parts differ. We might regard such rules either as a conflict or as an extension. As a conflict, when the condition in the IF part occurs, it is unclear which THEN part should hold: that of the first rule or that of the second rule with the same IF part. As an extension, it might be said that if the condition in the IF part occurs, then the results could be as shown in the THEN part of the first rule and/or the second rule; both are possible. In contrast, the rules from GAJA2 have no repeated IF part.

VII. CONCLUSION

In this paper, we propose a new data mining algorithm called GAJA2 to mine the relationships between or among related attributes in the same style as classification algorithms. With a classification algorithm we can define only one class attribute, while with GAJA2 we can define more than one class attribute at a time. GAJA2 can be used to mine the relationships between the other related attributes and the class attributes. We applied GAJA2 to mine the Acute Inflammations data set, a medical data set obtained from the UCI machine learning repository 2009. We compared the results from GAJA2 with the results from the original GAJA and with the results from the work in [10]. The results show that GAJA2 gives better results than the original GAJA and Rough Set Theory.

REFERENCES

[1] Kooptiwoot, S. and M. A. Salam. GAJA: A New Consistent, Concise and Precise Data Mining Algorithm. In Proceedings of the Seventh International Conference on Information Integration and Web-based Applications and Services, IIWAS 2005. 2005. Kuala Lumpur, Malaysia.
[2] Blake and C. J. Merz. UCI repository of machine learning databases, 1998.
[3] Groth, R., Data Mining: Building Competitive Advantage. 2000, New Jersey, USA: Prentice-Hall.
[4] Quinlan, J.R., Data Mining Tools See5 and C5.0. 2001, RuleQuest.
[5] Agrawal, R., T. Imielinski, and A. Swami, Mining Association Rules between Sets of Items in Large Databases. In Proceedings of the ACM SIGMOD Conference on Management of Data. 1993. Washington, D.C.
[6] Agrawal, R. and R. Srikant. Fast Algorithms for Mining Association Rules. In Proceedings of the 20th International Conference on Very Large Data Bases. 1994.
[7] Kantardzic, M., Data Mining: Concepts, Models, Methods, and Algorithms, ed. E.i.C. Stamatios V Kartalopoulos. 2003, USA: IEEE Press.
[8] Berry, M.J.A. and G.S. Linoff, Data Mining Techniques and Algorithms, in Mastering Data Mining, R.M. Elliott, Editor. 2000, John Wiley & Sons, Inc.: USA.
[9] Tseng, S.-M., Mining Association Rules with Interestingness Constraints in Large Databases. International Journal of Fuzzy Systems, 2001. 3(2): p. 415-421.

[10] Czerniak, J. and H. Zarzycki, Application of rough sets in the presumptive diagnosis of urinary system diseases. In Proceedings of the 9th International Conference on Artificial Intelligence and Security in Computing Systems, ACS'2002. 2002. Miedzyzdroje, Poland.
[11] Plant, C. et al., Enhancing instance-based classification with local density: a new algorithm for classifying unbalanced biomedical data. Bioinformatics, 2006. 22(8): p. 981-988.
[12] Peng, L. et al., Data Gravitation-based Classification. Information Sciences 179, 2009. p. 809-819.
[13] Fraley, C. and Adrian E. Raftery. Model-based Methods of Classification: Using the mclust Software in Chemometrics. Journal of Statistical Software, 2007. 18(6).
[14] Lavesson, N. and Paul Davidsson. AMORI: A Metric-based One Rule Inducer. SIAM International Conference on Data Mining, 2009. Sparks, Nevada.
[15] Golzari, S., et al. The Effect of Noise on RWTSAIRS Classifier. European Journal of Scientific Research, 2009. 31(4): p. 632-341.
[16] Shichao Zhang, et al. "Missing is Useful": Missing Values in Cost Sensitive Decision Trees. IEEE Transactions on Knowledge and Data Engineering, 2005. 17(12).
[17] Winkler, S. M. et al. Using Enhanced Genetic Programming Technique for Evolving Classifiers in the Context of Medical Diagnosis - An Empirical Study. GECCO 2006. 2006. Seattle, Washington, USA: ACM.
[18] Bhatia, S. et al. SVM Based Decision Support System for Heart Diseases Classification with Integer-Coded Genetic Algorithm to Select Critical Features. In Proceedings of the World Congress on Engineering and Computer Science. 2008. San Francisco, USA.
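
As an illustration of the procedure listed in Section III, the following Python code gives a minimal sketch of one possible reading of the pseudocode. It is not the implementation used to obtain the results reported in this paper. The function names gaja_rules and gaja2_extension, the dictionary-based representation of IF and THEN parts, and the five-case table ROWS are all hypothetical, and the table is not taken from the Acute Inflammations data set. The step that groups rules with the same THEN part is omitted for brevity. If two merged rules assigned different values to the same THEN attribute, the later value would overwrite the earlier one, since the pseudocode does not say how such a case is handled. The merged rules are not re-checked against the data.

```python
from collections import defaultdict

# Hypothetical toy cases, NOT taken from the Acute Inflammations data set.
ROWS = [
    {"lumbar_pain": "yes", "urine_pushing": "yes", "nausea": "yes",
     "nephritis": "yes", "inflammation": "yes"},
    {"lumbar_pain": "yes", "urine_pushing": "no", "nausea": "yes",
     "nephritis": "yes", "inflammation": "no"},
    {"lumbar_pain": "yes", "urine_pushing": "no", "nausea": "no",
     "nephritis": "yes", "inflammation": "no"},
    {"lumbar_pain": "no", "urine_pushing": "yes", "nausea": "no",
     "nephritis": "no", "inflammation": "yes"},
    {"lumbar_pain": "no", "urine_pushing": "no", "nausea": "no",
     "nephritis": "no", "inflammation": "no"},
]


def gaja_rules(rows):
    """First part: build one rule per (attribute, value) group.

    The THEN part holds every other attribute whose value is the same
    in all cases of the group. Rules are (if_dict, then_dict) pairs.
    """
    rules = []
    attributes = list(rows[0].keys())
    for attr in attributes:
        for value in sorted({row[attr] for row in rows}):
            group = [row for row in rows if row[attr] == value]
            then_part = {
                other: group[0][other]
                for other in attributes
                if other != attr
                and all(row[other] == group[0][other] for row in group)
            }
            if then_part:  # a group with no constant attribute gives no rule
                rules.append(({attr: value}, then_part))
    return rules


def gaja2_extension(rules, class_attrs):
    """Extension part: class attributes go to IF, the rest go to THEN."""
    merged = defaultdict(dict)
    for if_part, then_part in rules:
        combined = {**if_part, **then_part}
        new_if = {a: v for a, v in combined.items() if a in class_attrs}
        new_then = {a: v for a, v in combined.items() if a not in class_attrs}
        if not new_if or not new_then:
            continue  # rule does not relate a class attribute to a symptom
        # Rules with the same IF part are grouped; repeated THEN items collapse.
        merged[frozenset(new_if.items())].update(new_then)
    return [(dict(key), then) for key, then in merged.items()]


for if_part, then_part in gaja2_extension(
        gaja_rules(ROWS), ["nephritis", "inflammation"]):
    print("IF", if_part, "THEN", then_part)
```

On this toy table the first part yields nine rules and the extension reduces them to four, one for each value of each class attribute, which mirrors the reduction from eight rules to four described in Section IV.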
