
EPIDEMIOLOGY AND DATA MINING IN HEALTH IN KENYA
Anne Njuguna Nyambura, Oyugi Steve Ouma, Lishba Mose Naisinkoi
Kimathi University College of Technology, Nyeri, Kenya
miunjuguna@yahoo.com, steveoyugi@gmail.com, mosenaisy@yahoo.com

ABSTRACT

This paper provides an overview of public health, epidemiology and data mining. It
examines how the Internet in particular can be used to collect and analyze data and to
generate patterns that the health industry can use to improve its services. The paper
takes Kenya as its area of study.

The paper proposes a website that shall gather data and apply various statistical and
data mining tools to provide accurate information for this purpose. The site shall make
use of Web 2.0 tools such as those provided by PHP, which has become one of the most
widely used web scripting languages the world over. PHP shall facilitate interaction
with a database that stores all the required data.

The same website shall connect all the medical centres in the country, which shall
enter their data manually. Analysis of the data shall be done using relevant data
mining methods and techniques.

INTRODUCTION

Health is the cornerstone of all societies and the apex of all social and economic
systems the world over. For any economy to succeed, its working population must
consist of healthy individuals: healthy in mind, body and soul. This paper studies a
large field in health: epidemiology.

Epidemiology is the study of patterns of health and illness and associated factors at
the population level. The field is widely credited with underpinning evidence-based
medicine: it is used to identify risk factors for disease and to determine optimal
treatment approaches in clinical practice and preventive medicine.

Epidemiologists rely on scientific disciplines such as biostatistics, geospatial
information science and social science to understand disease processes, obtain the raw
information currently available, store data and map disease patterns, and so better
understand the proximate and distal risk factors that affect the spread of major
diseases. We currently face great challenges in health, and the integration of data
mining methods will greatly reduce the effort spent on data collection and analysis.

BACKGROUND INFORMATION

In Kenya, some of the major diseases affecting the population include malaria, HIV,
cholera, typhoid, cancer and polio. The hindrances to the provision of health services
include lack of adequate access to information, lack of adequate VCT centres,
insufficient provision of treated mosquito nets, poor hygiene and poverty. This brings
us to the use of the Internet as a means of disseminating information.

A large share of the Kenyan population (about 50%) owns a mobile phone, which
suggests that much of this population could have access to the Internet.

We intend to use this to our advantage by employing online facilities that capture
data from the Kenyan Diaspora on issues revolving around health. This shall be done
through medical websites that seek and gather data from other sites on the various
variables of study.

PREVIOUS RESEARCH

Information on epidemiology in Kenya has been collected in the recent past, most of
it specific to particular diseases, especially HIV, malaria, bacterial infections and
typhoid, by the Centre for Viral Research under the Kenya Medical Research Institute
(KEMRI), Nairobi, Kenya, and through other initiatives actively undertaken by KEMRI.

These studies have, however, been heavily paper based, relying on traditional
techniques such as paper questionnaires. Our methodology intends to cover a larger
field, the whole of Kenya, and to be entirely digital. This should be cheaper to
implement and faster to carry out, and could reduce factors such as bias.

PRELIMINARY FINDINGS

The beauty of Web 2.0 is its ability to integrate virtually any technology under its
huge wings. PHP has emerged as a force to be reckoned with in the creation of dynamic
sites, whose content changes every fraction of a second. The current Kenyan
population, as stated earlier, includes many young and tech-savvy individuals with a
mind and eye for technology.

It is often joked that Kenyans live on Facebook: they board taxis while on Facebook
and are constantly on chat, and 2go currently seems far more popular than regular cell
phone service.

This gets even more interesting. A regular blogger, Moses Kemibaro, used Alexa, a
website ranking service, to determine the most visited websites in Kenya and came up
with these results, from first to tenth: Yahoo, Google, Facebook, Windows Live Search,
Hi5, YouTube, MSN, Blogger, RapidShare and Wikipedia, with others such as MySpace,
the Daily and Sunday Nation and the East African Standard ranked close behind. World
Internet usage is leaning towards the production of statistics on website usage and
site ratings, which are used most of the time for marketing purposes.

This highlights the key to the success of this research. It shall involve the creation
of a main website dedicated to data collection and analysis. The data shall be stored
in a central database. The main site shall reach users through other websites, such as
those frequented by Kenyans, by means of advertisements and posts, questionnaires and
other data collection mechanisms.

The same site shall be used by health centres in the country for data collection in the
form of regular data uploads. To facilitate this venture, all the health centres in the
country shall be linked to each other, allowing comparisons and further analysis of the
same data.
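
A minimal sketch of how the central database and the regular uploads from health
centres might be organized is given below. It is written in Python with the standard
sqlite3 module purely for illustration; the paper proposes PHP for the website itself.
The table names, column names, centre names and case counts are all hypothetical and
are not drawn from any collected data.

```python
import sqlite3


def create_schema(conn):
    """Create two hypothetical tables: health centres and their case reports."""
    conn.executescript(
        """
        CREATE TABLE health_centre (
            id     INTEGER PRIMARY KEY,
            name   TEXT NOT NULL,
            region TEXT NOT NULL
        );
        CREATE TABLE case_report (
            id          INTEGER PRIMARY KEY,
            centre_id   INTEGER NOT NULL REFERENCES health_centre(id),
            disease     TEXT NOT NULL,
            report_date TEXT NOT NULL,
            cases       INTEGER NOT NULL CHECK (cases >= 0)
        );
        """
    )


def upload_report(conn, centre_id, disease, report_date, cases):
    """Store one periodic upload from a health centre."""
    conn.execute(
        "INSERT INTO case_report (centre_id, disease, report_date, cases) "
        "VALUES (?, ?, ?, ?)",
        (centre_id, disease, report_date, cases),
    )


def cases_by_region(conn, disease):
    """Total reported cases of one disease per region, for cross-centre comparison."""
    rows = conn.execute(
        """
        SELECT hc.region, SUM(cr.cases)
        FROM case_report cr
        JOIN health_centre hc ON hc.id = cr.centre_id
        WHERE cr.disease = ?
        GROUP BY hc.region
        ORDER BY hc.region
        """,
        (disease,),
    )
    return dict(rows.fetchall())


# Illustrative, made-up data held in an in-memory database.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO health_centre (id, name, region) VALUES (?, ?, ?)",
    [(1, "Centre A", "Nyeri"), (2, "Centre B", "Kisumu"), (3, "Centre C", "Kisumu")],
)
upload_report(conn, 1, "malaria", "2000-01-31", 4)
upload_report(conn, 2, "malaria", "2000-01-31", 30)
upload_report(conn, 3, "malaria", "2000-01-31", 25)
upload_report(conn, 3, "typhoid", "2000-01-31", 7)
malaria_totals = cases_by_region(conn, "malaria")
```

Keeping the centres and their reports in separate tables lets every upload be traced
to its source while still allowing the comparisons across centres described above.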

This idea brings us to the implementation of this research, which shall rely greatly
on this phenomenon. Site visitation data shall include statistics on the health topics
and forums most frequented, medicines most purchased, family health histories,
lifestyle statistics, geographical factors, genomic factors, ethnic/tribal/race
factors, diseases most clicked on, prescriptions most filled and other variables of
research.
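
As an illustration of the simplest of these statistics, the following Python sketch
counts the diseases most clicked on from a log of page visits. The log format (a list
of visitor and topic pairs), the function name most_clicked and the entries themselves
are assumptions made for this example only.

```python
from collections import Counter


def most_clicked(click_log, top_n=3):
    """Return the top_n most visited topics as (topic, visits) pairs."""
    counts = Counter(topic for _visitor, topic in click_log)
    return counts.most_common(top_n)


# Made-up visit log: (visitor id, topic viewed).
click_log = [
    ("v1", "malaria"),
    ("v2", "malaria"),
    ("v1", "typhoid"),
    ("v3", "malaria"),
    ("v2", "hiv"),
    ("v3", "typhoid"),
]
top_topics = most_clicked(click_log, top_n=2)
```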

The data collected shall seek to answer some of the following medical practice
concerns: What factors affect the onset of disease within a population? What is the
likelihood that a patient will require follow-up treatment or hospitalization, or that
the case will worsen? Are there particular clusters of patients that are likely to
develop certain diseases? Which geographical regions in the country are likely to be
affected by some diseases and which ones are not susceptible? How often is a case
misdiagnosed? What particular treatments and drug combinations are most likely to cure
the diseases?

Other variables shall include Patient and family Histories, Treatment facilities, Drug
interactions, geographical factors, genomic influences, lifestyle influences and
Treatment Personnel.

The data shall be classified according to the following criteria: Environment in terms
of exposure, location, job risks and diet; Genetics in terms of genetic markers present;
Clinical in terms of blood and other diagnostic data; Familial in terms of other family
members; History in terms of past illnesses; Socio-economics in terms of job,
marriages, education, age and gender; Lifestyle in terms of Exercise, smoking
patterns and alcohol consumption issues; and Ethnicity, Tribal factors, and
Geographical placement.
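
One way to hold a record classified under these criteria is sketched below in Python.
The class name HealthRecord, the field names and the sample values are hypothetical;
the grouping simply mirrors the eight criteria listed above, so that incomplete
records can be detected before analysis.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# One attribute per classification criterion named in the text.
CRITERIA = (
    "environment",
    "genetics",
    "clinical",
    "familial",
    "history",
    "socio_economics",
    "lifestyle",
    "ethnicity_geography",
)


@dataclass
class HealthRecord:
    record_id: int
    environment: Dict[str, str] = field(default_factory=dict)      # exposure, location, job risks, diet
    genetics: List[str] = field(default_factory=list)              # genetic markers present
    clinical: Dict[str, float] = field(default_factory=dict)       # blood and other diagnostic data
    familial: List[str] = field(default_factory=list)              # conditions of other family members
    history: List[str] = field(default_factory=list)               # past illnesses
    socio_economics: Dict[str, str] = field(default_factory=dict)  # job, marriage, education, age, gender
    lifestyle: Dict[str, str] = field(default_factory=dict)        # exercise, smoking, alcohol
    ethnicity_geography: Dict[str, str] = field(default_factory=dict)

    def missing_criteria(self):
        """Return the criteria that carry no data yet for this record."""
        return [name for name in CRITERIA if not getattr(self, name)]


# Made-up example record with only three of the eight criteria filled in.
record = HealthRecord(
    record_id=1,
    environment={"location": "Nyeri", "diet": "mixed"},
    history=["malaria"],
    lifestyle={"smoking": "never", "exercise": "weekly"},
)
gaps = record.missing_criteria()
```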

METHODS

Various data mining techniques shall be utilized. These include: classification, which
shall seek the attributes that best describe the dependent variables; clustering,
which shall group records with similar attributes across the population being
investigated; association, which shall determine whether one event affects another;
attribute importance, which shall indicate how strongly each attribute affects a
dependent variable; and the lift model, which shall measure how well a variable can
identify a required target.
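
The association and lift ideas can be made concrete with the standard support,
confidence and lift measures for a rule of the form "antecedent implies consequent".
The Python sketch below assumes each record is a set of observed items; the function
name rule_metrics, the item names and the eight records are invented for illustration
and do not represent real findings.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    records is a list of sets; each set holds the items observed for one person.
    """
    n = len(records)
    if n == 0:
        raise ValueError("no records to analyze")
    n_a = sum(1 for r in records if antecedent in r)
    n_b = sum(1 for r in records if consequent in r)
    n_ab = sum(1 for r in records if antecedent in r and consequent in r)

    support = n_ab / n                           # share of all records with both items
    confidence = n_ab / n_a if n_a else 0.0      # share of antecedent records with the consequent
    base_rate = n_b / n                          # share of all records with the consequent
    lift = confidence / base_rate if base_rate else 0.0
    return support, confidence, lift


# Made-up records: does sleeping without a net go together with malaria?
records = [
    {"no_net", "malaria"},
    {"no_net", "malaria"},
    {"no_net"},
    {"net"},
    {"net", "malaria"},
    {"net"},
    {"net"},
    {"net"},
]
support, confidence, lift = rule_metrics(records, "no_net", "malaria")
```

A lift above 1 indicates that the target occurs more often among records with the
antecedent than in the population as a whole; a lift near 1 indicates no association.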

Several algorithms can be used to analyze the data, and their use shall depend on
their functionality and resource requirements. Methods such as Naïve Bayes would be
useful when dealing with new data, because Naïve Bayes assumes that, within a class,
the presence or absence of a particular feature is unrelated to the presence or
absence of any other feature. It treats every property as contributing independently
to the probability that an item belongs to the class. An example would be the
classification of an organism as a bacterium provided it has the common properties of
bacteria.
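
The independence assumption described above can be sketched as a small categorical
Naïve Bayes classifier in Python. This is a from-scratch illustration, not the
implementation of any particular tool; the class name, the add-one (Laplace) smoothing
and the toy organism data are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant; an assumption of this sketch

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[(label, feature_index)][value] -> number of training rows
        self.value_counts = defaultdict(Counter)
        self.feature_values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(label, i)][value] += 1
                self.feature_values[i].add(value)
        return self

    def log_scores(self, row):
        """Log of prior times the product of per-feature likelihoods, per class."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            for i, value in enumerate(row):
                seen = self.value_counts[(label, i)][value]
                n_values = len(self.feature_values[i])
                # Each feature contributes independently of the others.
                score += math.log((seen + self.alpha) / (count + self.alpha * n_values))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Made-up training data: (has cell wall, has nucleus, single celled).
rows = [
    ("yes", "no", "yes"),
    ("yes", "no", "yes"),
    ("yes", "no", "yes"),
    ("no", "yes", "no"),
    ("yes", "yes", "no"),
    ("no", "yes", "yes"),
]
labels = ["bacterium", "bacterium", "bacterium", "other", "other", "other"]
model = CategoricalNaiveBayes().fit(rows, labels)
prediction = model.predict(("yes", "no", "yes"))
```

Working in logarithms avoids numerical underflow when many features are multiplied
together, and the smoothing keeps a single unseen value from forcing a probability of
zero.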

Another possibility would be the use of Bayesian probability, which is similar to the
former, is probabilistic in approach and depends heavily on the availability of prior
knowledge about the variable under study. The assumptions that were used to formulate
policies from that knowledge are also considered, provided they exist. It is very fast
in analysis, and where data are unavailable, cross validation and bootstrapping are
implemented. Cross validation shall use successive portions of the submitted data to
generate and check patterns, while bootstrapping shall work on random samples of the
submitted data, with samples taken later used in place of earlier ones.
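
The three ideas in this paragraph can be sketched in Python as follows. The prior is
modelled here as a Beta distribution over the prevalence of a disease, which is one
common choice and an assumption of this sketch; the bootstrap is shown in its standard
form of resampling with replacement. The function names and all figures are invented
for illustration.

```python
import random


def beta_update(prior_a, prior_b, positives, negatives):
    """Combine a Beta(prior_a, prior_b) prior with newly observed test results."""
    return prior_a + positives, prior_b + negatives


def beta_mean(a, b):
    """Expected prevalence under a Beta(a, b) distribution."""
    return a / (a + b)


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds and yield (train, test) index lists."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def bootstrap_means(values, n_resamples, seed=0):
    """Means of n_resamples samples drawn from values with replacement."""
    rng = random.Random(seed)
    n = len(values)
    return [sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)]


# Prior knowledge: roughly 20% prevalence, held weakly (Beta(2, 8)).
# New made-up submissions: 30 positive and 70 negative results.
posterior = beta_update(2, 8, positives=30, negatives=70)
posterior_mean = beta_mean(*posterior)

folds = list(k_fold_indices(10, 3))
outcomes = [1] * 30 + [0] * 70  # the same made-up submissions as a 0/1 list
resampled = bootstrap_means(outcomes, n_resamples=200)
```

The spread of the resampled means gives a rough indication of how much an estimate
might vary when only a limited amount of data has been submitted.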

The data shall be collected periodically, after which analysis shall be conducted and
the results released on all available media of communication.

CONCLUSIONS

The collection and analysis of this data on epidemiology shall result in the provision
of better medical services throughout the country. The success of this venture shall
be a great leap for ICT in medicine in the country. It shall highlight the significance
of Information Technology in the Kenyan Diaspora as a cheap and efficient alternative.
Data mining as a field will be credited with facilitating the analysis of this data
and the provision of accurate and timely information.

ACKNOWLEDGEMENTS

We would like to acknowledge the work of the Ag. Director and Chairman of the
School of Computer Science, Mr. Nicholas Gachui, for his dedication to our research
work. We also appreciate the support of our university, Kimathi University College of
Technology, in this noble venture. Mistakes and omissions remain ours.

REFERENCES
1. Peter Lucas, Bayesian Analysis, Pattern Analysis and Data Mining in Health
Care, October 2004, 399-403
2. Scott A. Rappoport, Data Mining and Epidemiology, OCP MTS Technologies,
Oracle World 2003
3. Ramoni M, Sebastiani P, Cohen P, Bayesian Clustering by Dynamics, Machine
Learning, 2002; 47; 91-121
4. Couzin J. The New Maths of Clinical Trials. Science 303; 2004; 784-786
5. Beaumont MA, Rannala B, The Bayesian Revolution in Genetics, Nature
Reviews Genetics 5; 2004; 251-261
6. Greg Rogers, Ellen Joyner, Mining your Data for Health Care Quality
Improvement 1997; 3-5
7. Online Data Mining Projects http://www.ultragem.com/projxmpl.html
8. SAS Institute Inc., SAS Communications, Data Mining Reveals Diamonds in
your Database, Third Quarter, 1995
9. Eric V. Siegel, Predictive Analytics with Data Mining, DM Review Magazine,
February 2005
10. James A. Berkley, Brett S. Lowe, KEMRI, Bacteremia Among Children
Admitted to a Rural Hospital in Kenya; January 2005
11. ME Parise, JG Ayisi, Prevention of Placental Malaria in an Area of Kenya
with a high Prevalence of malaria and Human Immunodeficiency Virus
Infection; November 1998
