Proposed System
A Healthcare-as-a-Service system must be built on a vast collection of
heterogeneous data, one of the biggest challenges in this domain and one that
requires a specialized approach. To address this challenge, a new fuzzy rule-based
classifier is presented in this paper with the aim of providing
Healthcare-as-a-Service. The proposed scheme is based on initial cluster
formation, retrieval, and processing of the big data in a cloud environment. A
fuzzy rule-based classifier is then designed for efficient decision making during
data classification. To perform inferencing on the collected data, membership
functions are designed for the fuzzification and defuzzification processes.
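The fuzzification/defuzzification step can be sketched as follows. The triangular membership functions, the blood-pressure ranges, and the centroid values below are illustrative assumptions, not the paper's actual parameters:

```python
# Sketch of fuzzification and centroid defuzzification for one parameter.
# Membership function shapes and BP ranges are assumed for illustration.

def tri_membership(x, a, b, c):
    """Triangular membership: rises from a to the peak b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify_bp(systolic):
    """Map a systolic BP reading to degrees of three linguistic terms."""
    return {
        "low":    tri_membership(systolic, 70, 90, 110),
        "normal": tri_membership(systolic, 100, 120, 140),
        "high":   tri_membership(systolic, 130, 160, 200),
    }

def defuzzify(degrees, centroids):
    """Centroid (weighted-average) defuzzification to a crisp score."""
    num = sum(degrees[t] * centroids[t] for t in degrees)
    den = sum(degrees.values())
    return num / den if den else 0.0

degrees = fuzzify_bp(135)
risk = defuzzify(degrees, {"low": 0.1, "normal": 0.3, "high": 0.9})
```

A reading of 135 mmHg is thus partially "normal" and partially "high", and the defuzzified score blends the two degrees into a single crisp value for the decision rule.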
Block Diagram
Fig. 1.1: Network Model Framework (healthcare data acquisition and processing
module).
Methodology
The cloud is divided into a set of sub-clouds based on the parameters used to
find the list of patients suffering from a particular disease. For example,
sub-cloud SA may be an age cloud and sub-cloud SB a Blood Pressure (BP) cloud;
similar sub-clouds can be formed for other parameters. These sub-clouds collect
the various parameters that cover most diseases, and each is further divided
into clusters as shown in Fig. 2. The clustering is done with a modified EM [1]
algorithm, by calculating the membership of each value in the specified clusters
and updating the clusters accordingly. For example, sub-cloud SA, the age cloud,
can be partitioned into three clusters for different age groups, such as
infants, adults, and old-aged persons (the actual clusters would be formed after
consulting an expert).
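A minimal 1-D sketch of this membership-based clustering is shown below. It uses a plain fuzzy c-means style update rather than the paper's "modified EM" variant, and the age values and cluster count are illustrative:

```python
# Fuzzy-membership clustering sketch (fuzzy c-means style, not the paper's
# exact modified EM algorithm). Ages and the three clusters are assumed.

def memberships(x, centers, m=2.0):
    """Fuzzy membership of x in each cluster center (fuzzifier m)."""
    dists = [abs(x - c) or 1e-9 for c in centers]
    out = []
    for d in dists:
        inv = sum((d / dk) ** (2.0 / (m - 1)) for dk in dists)
        out.append(1.0 / inv)
    return out

def update_centers(data, centers, m=2.0):
    """Recompute each center as the membership-weighted mean of the data."""
    new = []
    for j in range(len(centers)):
        w = [memberships(x, centers, m)[j] ** m for x in data]
        new.append(sum(wi * xi for wi, xi in zip(w, data)) / sum(w))
    return new

ages = [1, 2, 3, 30, 35, 40, 70, 75, 80]   # infants, adults, elderly
centers = [5.0, 35.0, 70.0]
for _ in range(10):                        # alternate membership/center updates
    centers = update_centers(ages, centers)
```

Each age value keeps a degree of membership in every cluster rather than a hard assignment, which is what the scheme later exploits when storing records in the sub-clouds.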
Moreover, a fuzzy rule-based classifier is presented to classify fuzzy data and
to answer fuzzy queries specified by the expert. Once the clusters are formed,
whenever a new record is received, cloudlets send its parameters to their
respective sub-clouds, where the data is stored in different clusters according
to its membership value. A detailed description of the membership value of data
is given in the next section. The major benefit of using different sub-clouds to
store the various parameters is that the scheme classifies records intelligently
and retrieves results quickly. For example, the doctor or expert can specify the
symptoms (in terms of parameters), and the cloudlets will generate results based
on the parameters of the patients stored in the various sub-clouds. The novelty
of our approach lies in the fact that whenever a new disease breaks out, the
doctors can simply pass the symptoms of that disease to the system, and the
cloud will return the probable patients who manifest those symptoms. A doctor
can easily quantify the parameters according to the symptoms of the disease;
parameters that cannot be exactly quantified are fuzzified and processed
accordingly.
The flow of information in the proposed scheme is shown in Fig. 1.2. The data
gathered from the patient using various sensors is stored in the cloud as
explained above. The doctor then issues queries against the cloud for a
particular disease, and the cloud returns the patients that satisfy the query.
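This retrieval step can be sketched as a toy query that intersects per-parameter matches across sub-clouds. The record layout, patient ids, and crisp ranges are illustrative assumptions; the actual scheme would match by fuzzy membership rather than hard thresholds:

```python
# Toy query flow: symptoms arrive as parameter ranges, each sub-cloud returns
# the matching patient ids, and the intersection gives the probable patients.
# Patient records and ranges are made up for illustration.

sub_clouds = {
    "age": {"p1": 72, "p2": 34, "p3": 68},
    "bp":  {"p1": 155, "p2": 118, "p3": 149},
}

def query(symptoms):
    """symptoms: {parameter: (low, high)} -> ids satisfying every range."""
    matches = None
    for param, (lo, hi) in symptoms.items():
        ids = {pid for pid, v in sub_clouds[param].items() if lo <= v <= hi}
        matches = ids if matches is None else matches & ids
    return sorted(matches or [])

# Doctor's query: elderly patients with high systolic BP.
result = query({"age": (60, 100), "bp": (140, 200)})
```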
Modules Description
1. Data Acquisition Layer: The data acquisition layer is responsible for
gathering the data of patients from different geographical domains. This data
can be generated by body sensor networks, vehicular ad-hoc networks, and
networks of devices present in homes or at various hospitals. The data from the
vehicular network is generated either by body sensors or by sensors placed in
the vehicles. These vehicles are generally hospital vans and ambulances;
however, patients can also send data from their personal vehicles if these are
equipped with appropriate sensors. The devices in these networks send data
individually or can choose a network head which oversees the transmission of
data for each network. Intra-network communication amongst the devices uses
short-range techniques such as Bluetooth, ZigBee, Passive Radio Frequency
IDentifier (RFID), Ultra WideBand (UWB), and 60 GHz millimeter wave.
3. Computational Layer: All the data generated by the networks or devices goes
to the cloud server through the transmission layer, where programmed cloudlets
receive the data and store it in different sub-clouds based on its context.
Results are then computed to serve requests arising from different clients.
Cloud computing is used for the management of healthcare services, which is the
basis of Healthcare-as-a-Service (HaaS). The cloud services provide real-time
storage of data sensed from the patient's body, collected through body sensors
deployed on the patient's body or worn as wearable sensors. Extending the
traditional storage utility of HaaS, our work not only gives fast and reliable
information to doctors but can also predict future diseases that a patient may
suffer from. This requires the symptoms of a disease to be provided by an expert
or specialized doctor.
Open Issues
Block Diagram
Methodology
When working with large-scale data, the vocabulary size W can easily reach
millions. In those cases, training the neural network becomes challenging, as
updates of the word vectors become computationally expensive. For that reason,
recent approaches propose log-linear models which aim to reduce the
computational complexity. The use of hierarchical softmax or negative sampling
has been shown to be effective in substantially speeding up the training
process.
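The saving from negative sampling can be made concrete with a small sketch: instead of normalizing over all W output words, the objective touches one positive context word plus k sampled negatives. The vector dimension, toy vocabulary, and k below are illustrative assumptions:

```python
# Skip-gram negative-sampling objective for a single (center, context) pair:
# each update involves k+1 output vectors instead of the full W-sized softmax.
# Vectors here are random placeholders, not trained embeddings.

import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ns_loss(center, context, negatives):
    """Negative log-likelihood of the negative-sampling objective."""
    loss = -math.log(sigmoid(dot(center, context)))      # positive pair
    for neg in negatives:                                # k sampled negatives
        loss += -math.log(sigmoid(-dot(center, neg)))
    return loss

random.seed(0)
dim = 8
vec = lambda: [random.uniform(-0.5, 0.5) for _ in range(dim)]
center, context = vec(), vec()
negatives = [vec() for _ in range(5)]                    # k = 5
loss = ns_loss(center, context, negatives)
```

Gradient descent on this loss pushes the center vector toward the true context word and away from the sampled negatives, at a per-update cost independent of W.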
3. Predictive Model: Several penalized linear models for regression and
classification tasks are used in our experiments. In particular, for regression
problems we apply linear regression; for classification problems we use the
logistic regression model. Vector w is the unknown set of weights for both
prediction models, and I(·) is an indicator function equal to 1 if the argument
is true and 0 otherwise. In addition, for both models we explored a number of
regularization approaches, ranging from the l1 Lasso penalty to overlapping
group Lasso penalties. We summarize the training objectives of the five
penalized linear models in Table 3, where l1 indicates the Lasso norm and lq the
norm over the non-overlapping groups; wi and wGi indicate a single dimension of
the weight vector and a group of dimensions defined by the index set Gi,
respectively. The index sets Gi for the group Lasso models were defined as
groups of ten consecutive features, indexed 1 to 10, 11 to 20, and so on up to
M − 9 to M (smaller groups showed better performance). For the overlapping group
Lasso, the index sets were defined as 1 to 20, 11 to 30, and so on. All
regularization parameters were set to be equal and chosen from the range
[0.01, 0.1], determined through cross-validation. In the conducted experiments
we used an implementation from the efficient SLEP package.
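The two basic penalty terms can be sketched directly. The toy weight vector and the group size of ten follow the grouping described above; this is only the penalty computation, not the full SLEP solver:

```python
# Penalty terms from the regularized objectives: the l1 (Lasso) norm and the
# l1/l2 group Lasso norm over non-overlapping groups of ten consecutive
# features. The weight vector is a toy example.

import math

def l1_penalty(w):
    """Lasso penalty: sum of absolute weights."""
    return sum(abs(wi) for wi in w)

def group_lasso_penalty(w, group_size=10):
    """Sum of l2 norms over consecutive, non-overlapping groups wGi."""
    total = 0.0
    for start in range(0, len(w), group_size):
        group = w[start:start + group_size]
        total += math.sqrt(sum(wi * wi for wi in group))
    return total

w = [0.0] * 10 + [0.3] * 10      # first group inactive, second group active
```

Because the group penalty sums unsquared l2 norms, it drives entire feature groups to zero at once, whereas plain l1 zeroes individual coordinates.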
Open Issues
3. “A Hybrid Feature Selection with Ensemble Classification for
Imbalanced Healthcare Data: A Case Study for Brain Tumor
Diagnosis”, Shamsul Huda, John Yearwood, Herbert Jelinek,
Mohammad Mehedi Hassan, Giancarlo Fortino, and Michael
Buckland, 2017
Proposed System
The proposed approach develops a globally optimized Artificial Neural Network
Input Gain Measurement Approximation (GANNIGMA) based hybrid feature selection,
which is combined with an ensemble classification (GANNIGMA-ensemble) technique
to generate the diagnostic decision rule. The GANNIGMA [5] hybrid feature
selection in the proposed approach finds the significant features that help to
generate a simplified rule, and the ensemble classifier improves the
classification accuracy.
Block Diagram
Methodology
The data collection procedure was approved by the University of Sydney Human
Ethics Committee. The dataset comprises neurology patients at the Royal Prince
Alfred Hospital, Sydney, Australia, with and without the 1p19q co-deletion
variant of oligodendroglioma. On this data, the system applies the
GANNIGMA-based hybrid feature selection combined with the ensemble
classification technique described above to generate the diagnostic decision
rule: the hybrid feature selection finds the significant features that yield a
simplified rule, and the ensemble classifier improves the classification
accuracy.
Modules
1. Feature Selection: A filter approach can find the intrinsic relationships
between individual diagnosis features and the tumor class; however, it does not
use any accuracy-based performance evaluation criterion. In contrast, a wrapper
approach evaluates classification accuracy during training, so the subset
selected by the wrapper can be expected to achieve better performance, though at
a higher computational cost. The proposed hybrid approach injects the knowledge
about the intrinsic relationship between a particular feature and the
corresponding class, as estimated by the filter, into the wrapper search
process, taking advantage of the complementary properties of both approaches. In
our proposed tumor feature selection approach, a mutual information (MI) based
Maximum Relevance Minimum Redundancy (MRMR) [47], [48] filter ranking heuristic
is combined with the wrapper heuristic, and an Artificial Neural Network (ANN)
is used as the wrapper.
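The MRMR filter heuristic can be sketched on discrete toy data: rank each candidate feature by its mutual information with the class minus its mean mutual information with the features already selected. The data below is made up, and the ANN wrapper and GANNIGMA steps are not reproduced here:

```python
# Greedy MRMR filter ranking on discrete toy data (relevance to the class
# minus mean redundancy with already-selected features). Illustrative only.

import math
from collections import Counter

def mutual_info(xs, ys):
    """Mutual information between two discrete sequences, in nats."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def mrmr_select(features, labels, k):
    """features: {name: column}. Returns k names in selection order."""
    selected = []
    while len(selected) < k:
        best, best_score = None, -float("inf")
        for name, col in features.items():
            if name in selected:
                continue
            rel = mutual_info(col, labels)
            red = (sum(mutual_info(col, features[s]) for s in selected)
                   / len(selected)) if selected else 0.0
            if rel - red > best_score:
                best, best_score = name, rel - red
        selected.append(best)
    return selected

labels = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "informative": [0, 0, 1, 1, 0, 1, 0, 1],   # copies the class
    "noise":       [1, 0, 1, 0, 0, 1, 1, 0],   # independent of the class
}
ranked = mrmr_select(features, labels, 2)
```

In the full pipeline, only the top-ranked features would then be passed to the ANN wrapper for accuracy-based evaluation.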
Open Issues
Combining the filter and wrapper approaches introduces complications, although
the accuracy increases manifold. The GANNIGMA feature selection based ensemble
is computationally intensive, and the imbalanced nature of healthcare datasets
remains an inherent limitation.
4. “Healthcare Big Data Voice Pathology Assessment Framework”, M.
Shamim Hossain and Ghulam Muhammad, 2016
Proposed System
A cloud server receives medical data from many different hospitals and clinics.
In this paper, the authors mainly focus on speech signals from patients with or
without vocal fold pathology. The speech signals are processed on the cloud
server, where feature extraction is performed. These features are fed into a
classification unit in the cloud, in which three machine learning algorithms,
SVM, GMM [6], and ELM, are used. A ranking system based on the scores from these
algorithms is introduced to provide a classification.
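The score-based ranking fusion can be sketched as follows: each classifier emits a pathology score per sample, samples are ranked under each classifier, and the ranks are averaged. The scores below are made-up placeholders, not outputs of actual SVM/GMM/ELM models:

```python
# Rank-fusion sketch across three classifiers: average the per-classifier
# ranks of each sample; a lower fused rank means more pathological. The
# score values are illustrative placeholders.

def rank_of(scores):
    """Rank positions (0 = highest score) for each sample index."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for pos, idx in enumerate(order):
        ranks[idx] = pos
    return ranks

def fuse(score_lists):
    """Average rank per sample across all classifiers."""
    all_ranks = [rank_of(s) for s in score_lists]
    n = len(score_lists[0])
    return [sum(r[i] for r in all_ranks) / len(all_ranks) for i in range(n)]

svm = [0.9, 0.2, 0.6]     # pathology scores per sample from each classifier
gmm = [0.8, 0.1, 0.7]
elm = [0.7, 0.3, 0.9]
fused = fuse([svm, gmm, elm])
decision = min(range(len(fused)), key=lambda i: fused[i])   # top-ranked sample
```

Averaging ranks rather than raw scores sidesteps the fact that the three algorithms produce scores on incomparable scales.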
Block Diagram
Fig. 4.1 Block diagram of the proposed big data for voice pathology assessment.
Methodology
Big healthcare data is contributed by various data sources. From these sources,
relevant information is sent to the healthcare cloud datacenters for analysis by
a big data application. The data is delivered through intermediate communication
and processing stages, where it is pre-processed for noise reduction,
unreliability, inconsistency, and analog-to-digital conversion. Sometimes this
pre-processing is guided by the opinion of healthcare professionals.
With the increasing volume of data collected from heterogeneous sources, this
large dataset must be stored in a database or a distributed file system for
aggregation and processing by the different healthcare stakeholders. In the
aggregation phase, features are first extracted from the original data (e.g.,
the signal) and then further processed by classification, normalization, fusion,
or modelling. The features can be extracted based on the advice of healthcare
professionals or domain experts. The processed data (for training) and test data
are then fed through the machine learning algorithm; training is sometimes based
on historical data. Classification algorithms have the potential to detect
abnormalities, such as in voice pathology detection, or to assist with diagnosis
as a reference for physicians.
A popular platform for parallel programming is Google's MapReduce, which is
characterized by its sophisticated load balancing and fault tolerance. Apache
Hadoop's big data platform can implement the MapReduce algorithm for healthcare
big data analysis. MapReduce is used for processing healthcare data where
response time is not critical. For large healthcare data, MapR (MapReduce
version 2) can be used to implement machine learning algorithms that provide
useful and accurate data for improved patient care. However, for fast processing
and continuous streaming of data, Apache Spark may be used; it has a set of
application programming interfaces (APIs) for machine learning. After data
processing, the meaningful healthcare recommendations are sent to the relevant
stakeholders.
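The batch aggregation style described above can be illustrated with a toy map/reduce pass in plain Python: map each record to key-value pairs and reduce by key. A real deployment would run on Hadoop or Spark; the record fields here are assumptions:

```python
# Toy map/reduce aggregation: count records per contributing hospital.
# In production this pattern would run on Hadoop MapReduce or Spark;
# the record layout is illustrative.

from collections import defaultdict

records = [
    {"hospital": "A", "signal": "sample-a1"},
    {"hospital": "B", "signal": "sample-b1"},
    {"hospital": "A", "signal": "sample-a2"},
]

def map_phase(record):
    """Emit a (key, value) pair per record, as a MapReduce mapper would."""
    yield (record["hospital"], 1)

def reduce_phase(pairs):
    """Sum values per key, as a MapReduce reducer would."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

pairs = [kv for r in records for kv in map_phase(r)]
counts = reduce_phase(pairs)
```

The same two-phase shape extends to per-hospital feature aggregation or per-patient signal statistics by changing what the mapper emits.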
Modules
1. Hospital Data: The hospital data for the VPA may vary in many ways. As we are
concerned only with speech signals, we focus on the diversity of the signals
captured in different hospitals. The speech signals may contain only sustained
vowels /a/, /i/, or /o/, or a combination of them; alternatively, they may
contain more complex speech such as whole words, phrases, or sentences from a
conversation, spontaneous speech, or reading aloud. Speech signals therefore
differ in both content and length. They may be recorded by different media and
captured in different environments, such as a sound-treated room or a normal
office room, and the sampling frequencies may also differ. For each patient
there is information about gender, age, smoking status, weight, diagnosis,
severity, and so on; however, some of these fields may be missing. This results
in a very large volume of data that is transferred to the cloud for processing.
At the processing stage, we process only voice or speech signals; EHR data are
kept in a separate index. Due to the unstructured nature of the data, we also
have to filter out poor-quality or incorrectly labeled samples, at least for the
training data.
2. Feature Extraction: Many feature extraction techniques are used in VPA,
including Mel-frequency cepstral coefficients (MFCC), the multi-dimensional
voice program (MDVP), MPEG-7 low-level audio descriptors, IDP, and glottal noise
parameters. Each has its own advantages and disadvantages; however, after
careful consideration we find that both the MPEG-7 audio features and the IDP
features have provided good results in the literature and are not greatly
affected by the diversity of the recorded signals. Therefore, we adopt these two
feature sets in the proposed assessment.
Open Issues
There are many possible combinations of feature extraction and classification
methods, which makes it tedious to work through every combination and run the
corresponding experiment to obtain one final output. The accuracy differs across
combinations, and it is sometimes difficult to tell which combination is better.
APPLICATIONS
Helps to filter highly heterogeneous healthcare data into coherent data that can
be used for mining.
Helps to take general symptoms during data acquisition and use them to predict
the disease and the required diagnosis.
Has huge implications in day-to-day life, as everyone comes across medical
issues every now and then.
REFERENCES
[7] J. F. Parkinson et al., ‘‘The impact of molecular and clinical factors on patient
outcome in oligodendroglioma from 20 years’ experience at a single centre,’’ J. Clin.
Neurosci., vol. 18, no. 3, pp. 329–333, Mar. 2011.