fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2846626, IEEE Journal of
Biomedical and Health Informatics
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 1
Abstract—To keep pace with the developments in medical It is widely accepted that health information tools and
informatics, health medical data is being collected continually. But, machine learning techniques can be exploited successfully to
owing to the diversity of its categories and sources, medical data help doctors in diagnosing and treating their patients more
has become highly complicated in many hospitals that it now needs
Clinical Decision Support (CDS) system for its management. To efficiently [4]. Using their experience and knowledge, the
effectively utilize the accumulating health data, we propose a CDS physicians classify patients and diagnose their diseases, but in
framework that can integrate heterogeneous health data from doing so, it is probable that they commit some mistakes,
different sources, such as laboratory test results, basic information particularly when they lack adequate experience or when their
of patients, and health records into a consolidated representation faculty of judgment is poor. In such situations, Clinical
of features of all patients. Using the electronic health medical data
so created, multi-label classification was employed to recommend Decision Support (CDS) systems, including systems that
a list of diseases and thus assist physicians in diagnosing or provide diagnosis, personalized medical measurement,
treating their patients’ health issues more efficiently. Once the treatment and relevant knowledge, would be helpful to the
physician diagnoses the disease of a patient, the next step is to physicians by way of providing them with specific knowledge,
consider the likely complications of that disease, which can lead to patients’ information and intelligent applications, which can
more diseases. Previous studies reveal that correlations do exist
among some diseases. Considering these correlations, a k-nearest improve the efficiency of their decision-making processes [5].
neighbors algorithm is improved for multi-label learning by using CDS systems focus on extracting characteristics of patients,
correlations among labels (CML-kNN). The CML-kNN algorithm based on which they classify patients and provide
first exploits the dependency between every two labels to update corresponding clinical suggestions to the physicians. Patients’
the origin label matrix and then performs multi-label learning to medical information is extracted from their personal medical
estimate the probabilities of labels by using the integrated features.
Finally, it recommends the top N diseases to the physicians. data, such as the physiological data, electronic health records
Experimental results on real health medical data establish the (EHRs), 3D images, radiology images, genomic sequencing,
effectiveness and practicability of the proposed CDS framework. and clinical and billing data. Through CDS applications, the
physicians can avoid the mistakes that are likely to arise from
Index Terms—Diagnosis recommender systems, clinical medical negligence and thus improve the quality of their
decision support system, heterogeneous data sources, multi-label
medical service. In the medical field, the demand for high-
classification
quality clinical support systems has been steadily on the
I. INTRODUCTION increase [6]. In medical scenes, the specifies of scoring
standards and the context complexity of medical field are the
I n the era of fourth revolution of industry (Industry 4.0), the
associated services (smart services) develop rapidly [1]. In
this context, Health 4.0 has been growing as a vital strategic
challenges of clinical decision support system.
Medical institutions keep accumulating health medical data,
which is highly complex in most of the recognized research labs
concept for health domain, which is aimed at providing real- and hospitals. Government agencies have been working hard to
time and personalized health services (called smart healthcare utilize such complex and diverse types of medical data to
services) for patients and professionals [2]. Health data from diagnose patients’ diseases correctly and offer them the right
numerous medical sources (Cyber-Physical Systems, the treatment [7]. To make this happen, the physicians have to
Internet of Things) is collected continuously, facilitating the consider multiple types of health information of patients, like
growth of healthcare industry [3]. laboratory test results, basic attributes, health records and
monitoring data. Medical data comes from different sources,
Mengxing Huang, Huirui Han, Yu Zhang and Uzair Aslam Bhatti are with and most of it is unstructured. Integrating complex medical data
State Key Laboratory of Marine Resource Utilization in South China Sea,
and transforming it into appropriate format are challenging
College of Information Science & Technology, Hainan University, Haikou,
China (email: hanhr26@163.com; huangmx09@163.com; tasks for the CDS systems.
yuzhang_nwpu@163.com; uzairaslambhatti@hotmail.com) At present, although many CDS systems have been
Lefei Li, the Department of Industrial Engineering, Tsinghua University, developed to assist the doctors, they are still unsatisfactory,
Beijing, 100084, China (email: lilefei@tsinghua.edu.cn) because they consider only a single medical condition [8-9].
Hao Wang is with Big Data Lab, Dept. of ICT and Natural Sciences,
Norwegian University of Science and Technology, Aalesund, 6009, Norway But, it is not uncommon that a patient has more than one
(email: hawa@ntnu.no) medical condition, at the same time, because of the
Shared corresponding authors: Mengxing Huang, Hao Wang and Yu Zhang. complications of the first disease. For instance, a patient with
2168-2194 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2846626, IEEE Journal of
Biomedical and Health Informatics
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 2
diabetes mellitus type 2 may develop some cardiovascular several correlative diseases by formulating a multi-label
diseases [10] and a patient with hypertensive heart disease may estimation model. Thus, HDS CDS system improves its
get coronary disease or/and angina pectoris [11]. A clinical performance in terms of comprehensiveness and accurate
decision support system might be more complicated to discover diagnosis. In the ongoing HDS CDS system, the patients
more possible diseases as references to clinics. After analyzing initially offer information and, in return, they get information
real-world diagnostic information, it is found that the majority from both the system and the physician. As a result, the
of the patients have more than one disease. Therefore, the interaction among patients, physicians and system will be
clinical decision support system recommends a list of possible enhanced and information dissemination will improve. The
diseases rather than only a single disease. Consequently, the proposed clinical decision support framework for
task of recommendation transforms itself into a multi-label heterogeneous data sources is depicted in Fig. 2.
classification problem [12]. In dealing with this problem, ML- The main contributions of this paper are as follow:
kNN algorithm [13] has gained wide acceptance because of its 1) A novel framework is proposed for retrieving the most
simple procedure and effectiveness. But, this method estimates relevant information of patients from multiple data
each label independently and ignores the correlation among sources, such as laboratory test data, basic information of
labels. Considering the correlations among diseases, we applied patients, symptoms of patients and electrocardiogram data,
a novel multi-label classification approach in the clinical and for combining them to generate integrated features.
decision support framework. 2) Considering the likely complications due to multiple
What see today is the emergence of Health 4.0. It unfolds the medical diseases (conditions), k-nearest neighbors
coming-together of all these technologies, coupled with real- algorithm is proposed for multi-label learning, by using
time data collection, increased use of Artificial Intelligence (AI) correlations among labels (CML-kNN) and for
and an overlay of invisible user interfaces. By focusing on anticipating more potential diseases of a patient, so that a
collaboration, coherence, and convergence the healthcare can list of diseases can be recommended to the physician
be made more accurately predictive and personalized. The main simultaneously.
focus of this work is to build a clinical decision support 3) Using the laboratory test data and basic information of
framework for heterogeneous data sources (HDS CDS) for patients, a set of experiments of different multi-label
assisting doctors in diagnosing the diseases of their patients and learning methods were performed to confirm the
treating them more efficiently. Fig. 1 illustrates the proposed effectiveness and practicality of the proposed framework.
improvement in HDS CDS system. In a tech-free system, the The remainder of the paper is organized as follows. Section
patients need to wait for a long time before they get to consult II presents some related works about clinical decision support
the doctor; besides, their repeated inquiries reduce diagnosis systems and multi-label learning algorithms. We describe
efficiency of their doctors. Also, inexperienced doctors may methods in section III, and the analysis of the correlations in the
find it difficult to diagnose complicated illnesses. Traditional health data is given in section IV. In Section V, the paper
CDS systems (clinical decision support systems) improve introduces the model validation. Finally, section VI concludes
efficiency and effectiveness of diagnosis by giving decision the paper and gives an outlook on future works.
support to the physicians. They integrate historical medical
records, which would be helpful in identifying potential II. RELATED WORKS
diseases and thus in reducing fault risks. HDS CDS systems Big data has five attributes: Volume, Velocity, Variety,
further improve the efficiency and effectiveness of CDS Value and Veracity [14]. The number of visiting patients in a
systems, especially by enhancing the wholeness of the system. general hospital is generally large. For example, Haikou
HDS CDS systems collect and analyze healthcare data from peoples’ hospital had received 46,000 inpatients and 95,000
diverse sources, rather than a single source. They suggest outpatients approximately last year. The health medical data of
2168-2194 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2846626, IEEE Journal of
Biomedical and Health Informatics
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 3
Fig. 2. The Description of Clinical Decision Support Framework for Heterogeneous Data Sources
such a large number of patients would obviously be of peta or 3) Through the applicable channels (mobile devices,
zeta bytes, and this refers to the Volume of Big data. The working stations)
patient’s information will be updated as soon as he/she visits 4) Applying the right intervention format (alarm, graphics,
the hospital again, and this represents the Velocity of health buttons)
medical data. The health medical data consists of structured, 5) At the right time within clinical working process (before
semi-structured and unstructured data. Moreover, the health working on the drug prescription, at the point of nursing)
medical data is of different categories: electronic health medical The CDS system can utilize appropriate computing
records written by physicians; data from real-time monitors; technology to improve the efficiency of decision-making [20].
images collected by computed tomography (CT); nuclear The big data of health has been recognized as a great
magnetic resonance images (MRI); cardiac angiographs etc. opportunity for improving CDS systems [21]. Many machine
Each patient’s medical records from professional physicians learning techniques, such as ensemble learning [22], SVMs [23],
and medical instruments reflect his/her real physical condition, deep neural networks [24] and rule-based algorithm [25] were
which represent the Veracity of health medical data. employed in developing clinical decision support systems for
Furthermore, the amenability of the collected health data for some specific diseases. Although these systems achieved high
transformation into useful and meaningful knowledge, which accuracy, the effectiveness of improvement is unsatisfactory,
represents the Value of the data. Therefore, health medical data because they considered only a small number of selected
is a kind of “Big data” to some extent. features [26]. These methods may prove ineffective if they are
Nowadays, the data-intensive applications require a large to deal with a large number of different kinds of features. The
number of efficient models. Numerous stochastic methods [15] existing clinical decision support systems are poor in
were exploited by different researchers for healthcare parameter processing large volumes of multi-structured healthcare data
analysis. Furthermore, physicians consider the similarity and in providing accurate health recommendation, in practice
between the health parameters of a patient for accurate [27-28].
diagnosis decision [16]. Analysis of big data is applied in Traditional classification methods belong mainly to binary-
healthcare to identify clusters of patients and groups of diseases, class or multi-class, in which each sample belongs to a single
which are used to estimate future health condition with the help class. For analyzing some samples with multiple labels, those
of different machine learning techniques [17]. Utilization and methods are transformed to multi-label learning methods,
analysis of health medical data play a vital role in healthcare whose task is to estimate possible labels of target samples. It is
system. possible for a sample from multi-label data to own multiple
The clinical decision support (CDS) system is an information labels, while the sample from single-label data owns only one
system that offers knowledge and personalized information to distinct label. Thus, multi-label data is more complex and
users in enhancing health and healthcare outcomes [18]. The varied. Multi-label learning methods can be divided into two
system is aimed at aiding physicians in diagnosis and treatment groups: problem transformation method (PTM) and algorithm
planning. CDS system can be used for routine requirements or adaption method (AAM) [29].
applied to a specific situation, like implant placement, and the Problem transformation method first transforms one multi-
output can be submitted to users before, during or after clinical label dataset into multiple single-label datasets, and then
decision [19]. For designing and implementing CDS system, exploits existing single-label learning algorithm to process each
the following five concepts must be followed: single-label dataset. Problem transformation method makes
1) Correct information (treatment planning and drug multi-label data to adapt algorithms. Traditional problem
interaction) transformation approaches are BR (binary relevance) method
2) Appropriate user (clinicians, patients) [30], LP (label powerset) method etc. However, algorithm
2168-2194 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2846626, IEEE Journal of
Biomedical and Health Informatics
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 4
2168-2194 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2846626, IEEE Journal of
Biomedical and Health Informatics
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 5
X i = ( xi1 , xi2 , , xib ) denotes a feature vector of the sample X i and where Pij is the set of samples with label I i and I j . rik is 1, when
xij is the value of X i in feature f j , and Yi = ( yi1 , yi2 ,, yib ) denotes sample k got label i; otherwise, rik is 0. sim( I i , I j ) falls into
2168-2194 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2846626, IEEE Journal of
Biomedical and Health Informatics
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 6
B. Correlation between information and diseases 0% 20% 40% 60% 80% 100%
1) Laboratory test data: A patient, who was diagnosed with
hyperlipidemia, was selected from the health medical database. <44 45-60 >60
This patient’s exceptions of laboratory test results and basic
information are listed in Fig. 4. This patient is a female and 53 Fig. 6. The Statistics of Different Diseases, in terms of age
years old, and her condition reflects the fact that the incidence
of hyperlipidemia is higher in menopausal women [47]. disease, diabetes mellitus type 2 and brain infraction, the
Triglyceride (TG), Total Cholesterol (TC), Total Lipids (TL) numbers of males are higher than those of females. The number
and High Density Lipoprotein Cholesterol (HDL-C) are shown of men who drink and/or smoke is much higher than that of
as the exceptions of her lab tests. In the patients with women who drink and/or smoke. Therefore, men are more
hyperlipidemia, the results of TG, TC and TL are usually higher prone to be affected by these illnesses. However, the number of
than their biological reference intervals, whereas the result of males with osteoporosis is smaller than that of females, because
HDL-C is lower [48]. Hence, physicians require laboratory test osteoporosis is more common in women than in men [49].
results to diagnose some diseases. These statistics illustrate that some diseases are correlative with
2) Gender: Fig. 5 shows the number of patients, in terms of gender.
gender, suffering from coronary heart disease, brain infarction, 3) Age: The numbers of patients, in terms of age, suffering
fatty liver and diabetes mellitus type 2. For coronary heart from coronary heart disease, brain infraction, fatty liver and
2168-2194 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2846626, IEEE Journal of
Biomedical and Health Informatics
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 7
2168-2194 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2846626, IEEE Journal of
Biomedical and Health Informatics
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 8
2168-2194 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2846626, IEEE Journal of
Biomedical and Health Informatics
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 9
(a) (b)
Fig. 9. Two Screenshots of the Clinical Decision Support System for Heterogeneous Data Sources Prototype
Sometimes, the physician needs to check the correctness of his which can be applied it in the proposed framework to
or her diagnosis for which he or she needs the laboratory test recommend diseases to physicians.
results of the patient. Once these results are uploaded into the Some experiments were performed on real medical data to
system, the physician can click the left green button to access evaluate the proposed framework. For conducting the
the laboratory test results (see Fig. 9 (b)). Also, the physician experiments, patients with 9 common diseases were selected as
can see the abnormal laboratory test results (see the blue region samples. Five kinds of basic information and 278 laboratory test
in Fig. 9 (b)) and review all the laboratory test results by results of the patients were combined to generate integrated
clicking the green button. Based on the laboratory test results features and carry out multi-label learning methods. The
and basic attributes, the system identifies two possible diseases experimental results show that the improved multi-label
of the patient and shows them to the physician (see orange classification method performs better than the existing methods.
region in Fig. 9 (b)). The physician can click the blue button to Based on the design of the proposed framework, the clinical
add the recommended disease to his or her diagnosis if he or decision support system prototype was developed as well.
she agrees with the recommended disease. Once the key of “add For their future work, the authors propose to continue
the recommended disease to diagnosis” is clicked, the disease integrating textual and monitoring data to generate more
will be added to the diagnosis text box automatically. Then the comprehensive integrated features for each patient. The
physician can click the return key of the browser to get back to increasing diversity in data types calls for an appropriate
Fig. 9 (a) and continue to record diagnosis. method to decrease the number of integrated features for
ensuring the efficiency of the clinical decision support system.
VI. CONCLUSIONS AND FUTURE WORK Because of the scale of labels, the processing of improved
Promoted by Industry 4.0 and Health 4.0, Health data from multi-label algorithm will be a little slow. Therefore, a more
massive medical infrastructures (Cyber-Physical Systems, the appropriate and efficient method to correlate labels will have to
Internet of Things) is collected continually to generate “Big be developed.
data” in the health domain. To fully utilize the health data and
support smart health services, the authors propose here a ACKNOWLEDGMENTS
clinical decision support framework, which integrates This research received financial support from
heterogeneous health medical data from different sources, such the National Natural Science Foundation of China (Grant #:
as laboratory test results, basic information of patients, health 61462022), the National Key Technology Support Program
medical records and monitoring data, including structured and (Grant #:2015BAH55F04, Grant #:2015BAH55F01), Major
unstructured data, to construct an integrated representation of Science and Technology Project of Hainan province (Grant #:
features for all the patients. An improved multi-label ZDKJ2016015), Natural Science Foundation of Hainan
classification was applied to this representation of features to province (Grant#:617062), Scientific Research
recommend suggested diseases to physicians in the framework. Staring Foundation of Hainan University (Grant #: kyqd1610).
After the physician diagnoses the disease of a visiting patient,
he or she has to consider the complications of that disease for REFERENCES
recognizing possible other diseases. It is an accepted fact that
correlations do exist among some diseases. Therefore, a k- [1] Hermann M, Pentek T, Otto B. Design Principles for Industrie 4.0
nearest neighbors algorithm was improved for multi-label Scenarios, pp. 3928-3937, 2016.
[2] Thuemmler C, Bai C. Health 4.0: How Virtualization and Big Data are
learning by using correlations among labels (CML-kNN), Revolutionizing Healthcare, Springer International Publishing, 2017.
2168-2194 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2018.2846626, IEEE Journal of
Biomedical and Health Informatics
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 10
[3] Cao P, Xian Medical University. “The Impact of Big Data on the [26] P. Anooj, “Clinical decision support system: Risk level estimation of heart
Development of Chinese Health Service Industry”, Journal of Electronic disease using weighted fuzzy rules”, Journal of King Saud University-
Test, 2014. Computer and Information Sciences, vol. 24, no. 1, pp. 27-40, 2012.
[4] M. Akay, D. I. Fotiadis, K. S. Nikita and R. W. Williams, “Guest Editorial: [27] J. Xu, D. Sow, D. Turaga and M. van der Schaar, “Online Transfer
Biomedical Informatics in Clinical Environments,” Biomedical and Learning for Differential Diagnosis Determination,” AAAI Workshop
Health Informatics, IEEE Journal of, vol. 19, no. 1, pp. 149-150, 2015. onthe World Wide Web and Public Health Intelligence, 2014.
[5] Black A D, Car J, Pagliari C, et al. “The impact of eHealth on the quality [28] S. Molinaro, S. Pieroni, F. Mariani and M. N. Liebman, “Personalized
and safety of health care: a systematic overview”, PLoS Medicine, vol. 8, medicine: Moving from correlation to causality in breast cancer”,
no.1, 2011. NewHorizons in Translational Medicine, vol. 2, no. 2, 2015.
[6] Horsky J, Schiff G D, Johnston D, et al. “Methodological Review: [29] Tsoumakas G, Katakis I, Vlahavas I, “Mining Multi-label Data”, Data
Interface design principles for usable decision support: A targeted review Mining and Knowledge Discovery Handbook. 2010:667-685.
of best practices for clinical prescribing interventions”, Journal of [30] Boutell M R, Luo J, Shen X, et al. “Learning multi-label scene
Biomedical Informatics, vol. 45, no. 6, pp. 1202-1216, 2012. classification”, Pattern Recognition, vol. 37, no.9, pp. 1757-1771, 2004.
[7] Valdez A C, Ziefle M, Verbert K, et al. “Recommender Systems for [31] Clare A, King R D, “Knowledge Discovery in Multi-label Phenotype
Health Informatics: State-of-the-Art and Future Perspectives”, Machine Data”, Lecture Notes in Computer Science, vol. 2168, no.2168, pp. 42-53,
Learning for Health Informatics, Springer International Publishing, 2016. 2001.
[8] Yu W D, Pratiksha C, Swati S, et al. “A Modeling Approach to Big Data [32] Elisseeff A, Weston J, “A kernel method for multi-labelled classification”,
Based Recommendation Engine in Modern Health Care Environment”, International Conference on Neural Information Processing Systems:
Computer Software and Applications Conference, IEEE Computer Natural and Synthetic. MIT Press, pp.681-687, 2001.
Society, pp.75-86, 2015. [33] Min-Ling Zhang, Zhi-Hua Zhou, “Multilabel Neural Networks with
[9] Bhatti U A, Huang M, Zhang Y, et al. Research on the Smartphone Based Applications to Functional Genomics and Text Categorization”, IEEE
eHealth Systems for Strengthing Healthcare Organization[M]// Smart Transactions on Knowledge and Data Engineering, vol.18, no.10, pp.
Health. 2017. 1338-1351, 2006.
[10] Vancampfort D, Mugisha J, Hallgren M, et al. “The prevalence of diabetes [34] Spyromitros E, Tsoumakas G, Vlahavas I, “An Empirical Study of Lazy
mellitus type 2 in people with alcohol use disorders: a systematic review Multilabel Classification Algorithms”, Artificial Intelligence: Theories,
and large scale meta-analysis”, Psychiatry Research, vol. 246, pp. 394- Models and Applications. Springer Berlin Heidelberg, pp. 401-406, 2008.
400, 2016. [35] Xu J, Xu Z X, Lu P, et al, “Classifying syndromes in Chinese medicine
[11] Drazner M H. “The progression of hypertensive heart disease”, using multi-label learning algorithm with relevant features for each label”,
Circulation, vol. 123, no. 3, pp. 327-34, 2011. Chinese Journal of Integrative Medicine, vol. 22, no. 11, pp. 867-871,
[12] Wang Y, Li P, Tian Y, et al. “A Shared Decision Making System for 2016.
Diabetes Medication Choice Utilizing Electronic Health Record Data”, [36] Frazer-Abel A, Sepiashvili L, Mbughuni M M, et al. “Overview of
IEEE Journal of Biomedical & Health Informatics, vol.1, no.1, Laboratory Testing and Clinical Presentations of Complement
pp.99,2016. Deficiencies and Dysregulation”, Advances in Clinical Chemistry, vol. 77,
[13] Zhang M L, Zhou Z H. ML-KNN, “A lazy learning approach to multi- no. 1 , 2016.
label learning”, Pattern Recognition, vol.40, no.7, pp.2038-2048,2007. [37] Shivade C, Raghavan P, Fosler-Lussier E, et al, “A review of approaches
[14] Hu H, Wen Y, Chua T S, et al. “Toward Scalable Systems for Big Data to identifying patient phenotype cohorts using electronic health records”,
Analytics: A Technology Tutorial”, IEEE Access, vol. 2, no. 1, pp. 652- Journal of the American Medical Informatics Association, vol. 21, no. 2,
687, 2017. pp.221-30, 2014.
[15] Z. Yu et al., “Incremental semi-supervised clustering ensemble for high [38] Hoffman M D, Blei D M, Bach F R. “Online Learning for Latent Dirichlet
dimensional data clustering,'' IEEE Trans. Knowl. Data Eng., vol. 28, no. Allocation”, Advances in Neural Information Processing Systems, vol. 23,
3, pp. 701-714, Mar. 2016. pp.856-864,2010.
[16] M. Mukaka, “A guide to appropriate use of correlation coefficient in [39] C. De Oliveira L S, V. Andreão R, M. Sarcinelli-Filho, “Premature
medical research”, Malawi Med. J., vol. 24, no. 3, pp. 69-71, 2012. ventricular beat classification using a dynamic Bayesian network in
[17] S. Rallapalli, R. R. Gondkar, and U. P. K. Ketavarapu, “Impact of Engineering, in: Medicine and Biology Society, EMBC, Annual
processing and analyzing healthcare big data on cloud computing International Conference of the IEEE, IEEE, pp. 4984–4987, 2011.
environment by implementing hadoop cluster”, Procedia Comput. Sci., [40] H.F. Huang, G.S. Hu, L. Zhu, Sparse representation-based heart beat
vol. 85, pp. 16-22, May 2016. classification using independent component analysis, J. Med. Syst. Vol.
[18] D. Demner-Fushman , W.W. Chapman , C.J. McDonald, “What can 36, no. 3, pp.1235–1247, 2012.
natural language processing do for clinical decision support?”, J. Biomed, [41] S. Banerjee, M. Mitra, Application of cross wavelet transform for ecg
Inform, vol. 42, no. 5, pp. 760–772, 2009. pattern analysis and classification, IEEE Trans. Instrum. Meas. Vol. 63,
[19] Wei Li, Jue Jiang, Guoguang Xia, et.al, “A study on the relation between no.2, pp.326–333, 2014.
the age and gender factors for fatty liver, hyperlipidemia and the insulin [42] MIT-BIH. Arrhythmia DataBase Directory, viewed 18/10/2017,
resistance”, General Medical Journal, vol.3, no.5, pp.355-356, 2000. Available:http://www.physionet.org/physiobank/database/html/mitdbdir/
[20] Brian H R, Wilczynski N L. “Effects of computerized clinical decision mitdbdir.html
support systems on practitioner performance and patient outcomes: [43] Smith L I, “A Tutorial on Principal Components Analysis”, Information
Methods of a decision-maker-researcher partnership systematic review”, Fusion, vol. 51, no. 3, pp. 52, 2002.
Implementation Science, vol. 5, no. 1, pp.12, 2010. [44] Kak A C, Martínez A M, “PCA versus LDA”, IEEE Transactions on
[21] M. Viceconti, P. Hunter, and R. Hose, “Big data, big knowledge: big data Pattern Analysis & Machine Intelligence, vol. 23, no. 3-4, pp. 228-233,
for personalized healthcare”, Biomedical and Health Informatics, IEEE 2001.
Journal of, vol. 19, no. 4, pp. 1209-1215, 2015. [45] Kang F, Jin R, Sukthankar R, “Correlated Label Propagation with
[22] K. Zarkogianni and K. S. Nikita, “Personal health systems for diabetes Application to Multi-label Learning”, IEEE Computer Society
management, early diagnosis and prevention”, Handbook of Research on Conference on Computer Vision and Pattern Recognition, IEEE
Trends in the Diagnosis and Treatment of Chronic Conditions, pp. Computer Society, pp. 1719-1726, 2006.
465.2015. [46] Weikum G, “Foundations of statistical natural language processing”,
[23] E. Çomak, A. Arslan and İ. Türkoğlu, “A decision support system Information Retrieval Journal, vol. 4, no. 1, pp. 80-81, 2001.
basedon support vector machines for diagnosis of the heart valve [47] Wei Li, Jue Jiang, Guoguang Xia, et.al, “A study on the relation between
diseases”, Computers in Biology and Medicine, vol. 37, no. 1, pp. 21-27, the age and gender factors for fatty liver, hyperlipidemia and the insulin
2007. resistance”, General Medical Journal, vol.3, no.5, pp. 355-356, 2000.
[24] S. Molinaro, S. Pieroni, F. Mariani and M. N. Liebman, “Personalized [48] Dayi Hu, Zhimin Xiang, “How to correctly diagnose and treat
medicine: Moving from correlation to causality in breast cancer”, hyperlipidemia?”, Chinese Journal of Medicine, vol. 24, no. 4, 2000.
NewHorizons in Translational Medicine, vol. 2, no. 2, 2015. [49] Lin T H, Lung C C, Su H P, et al. “Association Between Periodontal
[25] Song L, Hsu W, Xu J, et al, “Using Contextual Learning to Improve Disease and Osteoporosis by Gender: A Nationwide Population-Based
Diagnostic Accuracy: Application in Breast Cancer Screening”, Biomed Cohort Study Medicine”, vol. 94, no. 7, pp. 553, 2015.
Health Inform, IEEE Journal of, vol 20, no. 3, pp. 902-914, 2016. [50] http://www.who.int/healthinfo/survey/ageing_mds_report_en_daressalaa
m.pdf
2168-2194 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.