

2017 International Conference on Computation of Power,Energy,Information and Communication (ICCPEIC).

Identifying Symptoms and Treatment for Heart Disease from Biomedical Literature Using Text Data Mining
P. Sudeshna#1, S. Bhanumathi*2, M. R. Anish Hamlin#3
#P.G Student, Department of Computer Science and
Engineering, Sathyabama University, Chennai.
1Sudheshna.26794@gmail.com.
3annishhamlin@gmail.com.
* Assistant Professor, School of Computing,
Sathyabama University, Chennai.
2 banujun8@gmail.com

Abstract: Heart diseases have become the main cause of death worldwide, partly because identifying the disease from symptoms is very difficult and requires a great deal of experience and knowledge. Diagnosis is also time-consuming for doctors, who have to observe the health condition and food habits of the patient, and for the patient, who has to visit the doctor and take tests. Identification of the symptoms and prediction of the disease should therefore be done quickly and efficiently. We propose a technique that predicts the disease, the probability of the disease and the probability of a heart attack, based on the patient data set. The health sector holds a great deal of hidden information that may be useful for making effective and precise decisions. Evidence-based medicine analysis is achieved using data mining techniques. The process consists of analysing the heart patient's health condition, formulating questions about heart disease, and gathering and analysing evidence about heart disease. An automated technique is used for disease identification and for correct, evidence-based medicine analysis. Until the disease has been diagnosed, evidence-based medicine analysis is of no use. We also analyse the disease and the best drug to advise for that specific patient.

Keywords: Heart Disease, Difficult, Medical Practitioners, Medicine Analysis, Best Drug.

I. INTRODUCTION

Heart diseases are the main cause of death worldwide. If the disease is detected during the early stages, we can reduce the possibility of a heart attack and take precautions. A great deal of data hidden in medical records contains information that may be helpful for predicting the disease, if it is used properly and efficiently. To make detection easier, researchers convert this unused data into datasets in tabular form. Some deaths occur suddenly due to heart disease because patients cannot identify the disease or do not take the symptoms they are suffering seriously. For modelling, different data mining techniques are used. Doctors should predict diseases before they occur in their patients, and the symptoms should be identified. Many factors increase the possibility of heart disease, such as smoking, high or low blood pressure, excessive alcohol consumption, high blood sugar and an unhealthy diet.

Cardiovascular disease (CVD) includes coronary heart disease, congenital artery disease, mild stroke, hypertensive heart disease and damage to the heart's major blood vessels. Data mining is a technique for discovering knowledge: it examines data and turns it into useful information. We propose a method that predicts the probability of heart disease based on the particular symptoms given by the patient and on the patient's health record. The major goals of data mining are prediction and description. Prediction means inferring something from the given information: in data mining, it uses known variables in the dataset to find the unknown or future value of another attribute. Description emphasises the discovery of patterns in the datasets. It involves methods from statistics and database systems.

In data mining, prediction is used to help in finding the changes or developments in the

978-1-5090-4324-8/17/$31.00 ©2017 IEEE


P. Sudeshna et al: IDENTIFYING SYMPTOMS AND TREATMENT FOR HEART DISEASE FROM
BIOMEDICAL LITERATURE USING TEXT DATA MINING

patient data for improving the health condition of the patient. The attempt to use knowledge, experience and all the clinical details of patients to identify the nature of the symptoms and recognise diseases is seen as a major opportunity. Data mining plays a major role in the health sector for predicting diseases. Data mining is also known as knowledge discovery in databases, and may be defined as the extraction of implicit, previously unknown and useful information from large datasets. It is an automatic process whose task is to extract and discover features hidden in large datasets.

The essential task of data mining is to build accurate and efficient classifiers for large databases. Classification is the initial data analysis step: it explores a set of cases to check whether they can be matched and grouped by their similarity to each other. The main reason for classifying is better understanding and improved predictions compared with unclassified data.

There are many classification techniques, such as SVM and SMO. Clustering is another data mining technique; it segments the data into a number of clusters, so that data items are grouped into small groups based on similarity. A cluster's quality is measured by its diameter, that is, the maximum distance between any two objects in that cluster. Based on this, we choose k-means clustering for its effectiveness.

In data mining, research on heart diseases has been among the most important research in medical science in recent years. Heart disease has been a major cause of death for many people, including young people and children. Information is collected from different sources, so a large amount is available. It should be mined using text data mining, because the data in most databases is semi-structured.

II. RELATED WORK

The healthcare industry collects a huge amount of health-care data, but unfortunately this data is not mined to discover the hidden information that would make decision making effective. Efforts made to discover hidden patterns and relationships have not paid off. Techniques are used, but efficient mining has not been achieved, so the hidden information remains undiscovered [1]. Data mining provides methodologies and technologies for transforming large piles of data into useful information for decision making. Many of them are available, and data mining algorithms make higher accuracy and quick prediction of disease easier.

[2] Cardiovascular disease remains the biggest cause of death worldwide, so predicting the disease at the initial stage is very important. The data generated in medical organizations (hospitals, medical centres) is very large, but it is not used efficiently. Handling such large data in traditional ways affects the results [3].

Research on heart disease prediction is increasing in most categories, and this research has produced techniques for predicting heart disease in each category. With the help of data mining tools, questions can be answered in a way that respects what is held to be socially acceptable. In healthcare, diagnosis is made only by doctors, based on their knowledge and practice, and visiting a doctor for a check-up every time can be burdensome [4]. While prescribing drugs we need to consider many factors, such as the side effects and interactions of a particular drug, and it is very complicated to know the properties of that drug. These factors depend mainly on patient characteristics such as gender, age and medical record. A useful tool is therefore needed to assist doctors in prescribing drugs [5].

For querying drugs, a different and unique approach has been developed for finding answers for a given patient profile. The drug information drawn from different sources is noisy; it may be created manually or extracted automatically from text resources [6]. The data may be incomplete as well as noisy. To overcome this, both exactly matched and closely matched answers to a query should be considered and compared before the drug information is returned as the answer to that query [7].

Biomedical data comes in very high volumes and is used to identify indications for existing drugs. This can fail even after many experiments, so we take help from fields such as biomedical,
pharmaceutical and informatics research. We use pharmacology-related data, that is, we check how the genetic makeup of an individual affects the patient's response to drugs. To address problems related to drug repositioning, we use web technologies and the informatics applied to them. We explore pharmacological profiles, pharmacogenomics profiles and the Food and Drug Administration of the USA for the approved heart disease drugs. We then convert the related drugs, and the performance of each drug is taken into consideration [8].

Nowadays, web-scale information contains billions of entities, and querying these entities is a major problem. The information is also very noisy in nature, so it is difficult to match the exact answer to a given query; answers related to the query should nevertheless be matched accurately. An algorithm is therefore needed [9].

A. EXISTING SYSTEM

Nowadays heart disease is the leading cause of death in the world, largely because predicting a heart attack is a very difficult task. The health sector holds a great deal of hidden information that may be useful for making effective and precise decisions. The disadvantage of the existing system is that it is unreliable and cannot give the required output: it is not very effective, it cannot identify the disease, and it does not identify the best drug. We address these points in our proposed system.

III. PROPOSED SYSTEM

Data mining is a technique for knowledge discovery, used to analyse data and extract the most useful information. We propose a technique that predicts the disease, the probability of the disease and the probability of a heart attack, based on the patient data set; evidence-based medicine analysis is achieved using data mining techniques. The process consists of analysing the heart patient's health condition, formulating questions about the heart disease, and gathering and analysing evidence about the heart disease. The advantages of the proposed system are that it is reliable, easily understandable and adaptive; the disease is analysed automatically and more effectively and can be identified easily; and the best drug for the particular disease is identified with the help of the doctor and then suggested to the patient via email.

A technique is used for finding diseases, and evidence-based medicine analysis is achieved. Until the disease has been diagnosed, evidence-based medicine analysis is of no use. We also analyse the disease and the best drug to advise for that specific patient. We also arrange an appointment with the best doctor for consultation, based on user feedback.

Fig. 1: System Architecture

Figure 1 shows the system architecture. First the patient gives the symptoms of the disease; the details are then sent automatically to the server; the server uses the SVM algorithm to suggest some diseases and sends them to the doctor via email.

To deploy the analysis, a research process is used for data analysis. The input is a vast amount of data in unstructured format that nevertheless contains valid information. We applied the concept of extracting vast amounts of information, as done in the insurance domain, to obtain useful information in the health-care domain. The system then uses the user's machine for automatic diagnosis of the disease from the user's input of symptoms and reports. The system identifies the disease automatically, with the help of the doctor, and then the best drug and its side effects are mailed to the patient, so the patient knows the best drug.

A. SVM Algorithm

We chose the SVM algorithm for clarity; SVM together with DT is the method we implement here. SVM is a supervised machine learning algorithm used mainly for classifying data and for determining the relationship between the
variables. As a supervised algorithm, it has an initial training phase in which it is given data that has already been classified and labelled. After this initial training stage, further datasets are given to the algorithm, which classifies them with minimal human intervention.

The support vector machine is an example of supervised learning: it has labels that indicate whether the system is performing in the right or the wrong way. The accuracy of the system is validated by the responses obtained, and those responses are produced with the help of the information points. From the responses we can tell whether the system is running correctly, and the information points help the system learn how to act correctly.

SVM is combined with a step called feature selection, which identifies the features most closely connected with the known classes. Even when prediction of unknown samples is not needed, SVM and feature selection can be used together to identify the key sets involved in the processes that differentiate the classes.

B. Decision Tree

Among classification techniques, the decision tree (DT) is the most popular. Its internal nodes denote tests on attributes, the branches denote the outcomes of the tests, and each leaf node holds a class label. The tree used here has five leaf nodes and four decision nodes. Constructing a DT classifier does not require domain knowledge or parameter setting, so DT classifiers are simple to construct, popular, and well suited to exploratory knowledge discovery.

Entropy: The decision tree is built top-down from a root node by partitioning the data into subsets that contain instances with similar values. In the ID3 algorithm, the homogeneity of a sample is calculated using entropy. If the entropy is zero, the sample is completely homogeneous; if the sample is divided equally between two classes, the entropy is one. Two types of entropy are calculated while constructing the decision tree:
- using the frequency table of one attribute
- using the frequency table of two attributes

C. Sequential Minimal Optimization

Training an SVM requires solving a quadratic programming (QP) problem. We use SMO to solve it; SMO is an iterative algorithm that solves the optimization problems arising in SVM training. SMO breaks the problem into a number of small sub-problems, which are then solved separately and analytically.

IV. IMPLEMENTATION DETAILS

A. Patient Data Gathering

The patient has to register their details, such as personal details and symptoms, and is given an ID number, which is used for further reference and for sending the details to the doctor. Once the patient has registered, the details are stored in our database for later use. This includes information such as the treatment and the symptoms of the disease.

B. Patient Account

Once registered, the patient has their own account and can log in at any time to check the details and the suggestions given. The patient account is linked to the patient's mail ID, and the doctor's suggestions are sent directly to it.

C. Doctor Registration

First the doctor registers by giving a mail ID and other details, and receives a new login. Once the patient provides details and symptoms, these are sent to the doctor by mail; the doctor checks them and suggests the drug and the food habits the patient should follow. The doctor also informs the patient about the side effects of the drugs and about their usage.

D. Analysing the Disease

Based on the symptoms given by the patient, the SVM algorithm analyses the disease from the dataset we have, and the details are sent to the doctor for confirmation. The disease and its treatment are then analysed and given to the patient.

E. Identifying the Best Drug

Based on the patient's details and symptoms, the SVM algorithm generates the drug details and the side effects of that drug, and
those details are sent to the doctor's mail. The doctor then checks them and suggests the best drug, and the information is mailed to the patient.

V. CONCLUSION

The health sector holds a great deal of hidden information that may be useful for making effective and precise decisions. We propose a technique that predicts the disease, the probability of the disease and the probability of a heart attack, based on the patient data set. We recognise the symptoms; based on the symptoms, the disease is found and the drugs for that disease are identified; and based on the patient dataset, we identify the drug best suited to the patient's health condition and suggest it to the patient.

REFERENCES

[1] Neha Chikshe, Tejasweeta Dixit et al., "Hybrid approach for heart disease detection using clustering and ANN", International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Vol. 4, Issue 1, pp. 119-122, 2016.
[2] Haritha Jagad, Jehan Kandawalla et al., "Detection of Coronary Heart Diseases using Data Mining Techniques", International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Vol. 4, Issue 11, 2015.
[3] Anonymous, "KDIGO clinical practice guideline for the evaluation and management of chronic kidney disease", Kidney Int. Suppl. 3 (1), 1-150, 2013.
[4] Aussem, A., de Morais, S.R., Corbex, M., "Analysis of nasopharyngeal carcinoma risk factors with Bayesian networks", Artif. Intell. Med. 54 (1), 53-62, 2012.
[5] Barisic, I., Wilhelm, V., Stambuk, N., et al., "Machine learning based analysis of biochemical and morphologic parameters in patients with dialysis related amyloidosis", Croat. Chem. Acta 75 (4), 935-944, 2012.
[6] Blake, C., "Text mining", Annual Rev. Inf. Sci. Technol. 45, 123-155, 2011; Cohen, T., Widdows, D., Schvaneveldt, R.W., Davies, P., Rindflesch, T.C., "Discovering discovery patterns with predication-based semantic indexing", J. Biomed. Inform. 45 (6), 1049-1065, 2012. http://dx.doi.org/10.1016/j.jbi.2012.07.003
[7] Sudhakar M., Mayan J.A., Srinivasan N., "Intelligent data prediction system using data mining and neural networks", Advances in Intelligent Systems and Computing, March 2015.
[8] Albert Mayan J., Dr. T. Ravi, "Optimized Regression Testing using Genetic Algorithm and Dependency Structure Matrix", International Journal of Applied Engineering Research, Vol. 9, Issue 20, pp. 7679-7690, Nov 2014, ISSN: 1087-1090.
[9] James, M.T., Hemmelgarn, B.R., Tonelli, M., "Renal Medicine: Early recognition and prevention of chronic kidney disease", Lancet 375 (9722), pp. 1296-1309, 2010.
[10] S. Dhamodaran, K. R. Sachin and Rahul Kumar, "Big Data Implementation of Natural Disaster Monitoring and Alerting System in Real Time Social Network using Hadoop Technology", Indian Journal of Science and Technology, Vol. 8(22), IPL0278, September 2015.
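
The two entropy calculations named in the Decision Tree section (from the frequency table of one attribute, and from the frequency table of two attributes) can be illustrated with a minimal sketch. It assumes the standard ID3 definitions; the function names and the toy symptom data are hypothetical and are not taken from the system described in this paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (base 2) from the frequency table of one attribute (the class)."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def split_entropy(attribute_values, labels):
    """Entropy from the frequency table of two attributes:
    the class entropy of each branch, weighted by the branch size."""
    total = len(labels)
    branches = {}
    for value, label in zip(attribute_values, labels):
        branches.setdefault(value, []).append(label)
    return sum(len(b) / total * entropy(b) for b in branches.values())


# Hypothetical toy records: one symptom attribute and the class label.
chest_pain = ["yes", "yes", "no", "no", "yes", "no"]
disease = ["pos", "pos", "neg", "neg", "pos", "pos"]

h_target = entropy(disease)                    # entropy before splitting
h_split = split_entropy(chest_pain, disease)   # entropy after splitting
info_gain = h_target - h_split                 # ID3 picks the attribute with the largest gain
```

A homogeneous sample such as `["pos", "pos"]` gives an entropy of zero, and a sample split evenly between two classes gives an entropy of one, which matches the two boundary cases stated in the text.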

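
For reference, the quadratic programming problem mentioned in the Sequential Minimal Optimization section is usually written in the standard soft-margin dual form below. The notation is the conventional one and is an assumption here, since the paper does not state it: multipliers \alpha_i, class labels y_i in {-1, +1}, kernel function K and penalty parameter C.

```latex
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C \;\; (i = 1, \dots, n), \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 .
```

Because of the equality constraint, a single multiplier cannot change on its own, so each SMO sub-problem updates two multipliers at a time while the others are held fixed. A two-variable sub-problem of this kind has a closed-form solution, which is why the sub-problems can be solved analytically.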

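
The cluster-quality measure mentioned in the Introduction, the diameter of a cluster, can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and takes the diameter to be the largest pairwise distance; the function name and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist


def cluster_diameter(points):
    """Largest Euclidean distance between any two points of one cluster."""
    if len(points) < 2:
        return 0.0  # a single point has no spread
    return max(dist(p, q) for p, q in combinations(points, 2))


# Hypothetical clusters of two-dimensional feature vectors.
tight_cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
loose_cluster = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]

# A smaller diameter indicates a more compact cluster.
tighter = min([tight_cluster, loose_cluster], key=cluster_diameter)
```

Comparing diameters in this way gives a simple check on how compact the groups produced by a clustering method such as k-means are.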