Fuzzy Decision Tree

International Journal of Fuzzy Systems, Vol. 16, No. 2, June 2014, p. 265
Apply Fuzzy Decision Tree to Information Security Risk Assessment

Zne-Jung Lee and Li-Yun Chang
Corresponding Author: Zne-Jung Lee is with the Department of Information Management, Huafan University, No. 1, Huafan Rd., Shihding District, New Taipei City, 22301, Taiwan.
E-mail: johnlee@gm.hfu.edu.tw
Li-Yun Chang is the Information Security Assistant Manager, Department of Mechatronic Engineering, Huafan University, No. 1, Huafan Rd., Shihding District, New Taipei City, 22301, Taiwan.
E-mail: a9999b@gmail.com
Manuscript received 5 Oct. 2013; revised 24 Feb. 2014; accepted 9 May 2014.
© 2014 TFSA

Abstract

As computers become popular and the internet advances rapidly, information application systems are used extensively in organizations. Systems such as attendance systems, accounting systems, and statistical systems have already replaced manual operations. With such a drastic change, the information security issues encountered by organizations have become increasingly significant. Information security risk assessment is the core of information security. It focuses on assessing assets in terms of confidentiality, integrity, and availability. The vulnerabilities of information systems and threats from the outside are also within its scope. This study adopts a fuzzy decision tree to evaluate information security risk for decision-makers. The data set consists of 155 input-output records with 22 attributes used to measure the value at risk, derived from the ISO/IEC 27001 information security management system standard and ISO/IEC 27005:2008 Information technology. The zoo dataset from the UCI repository is also used to test the performance of the proposed algorithm. In the simulation results, the proposed approach outperforms the other approaches compared.

Keywords: Information Security, Risk Assessment, Fuzzy Decision Tree, ISO 27001, ISO 27005:2008.

1. Introduction

With the development of office automation systems and the fast spread of the internet, information security has become a significant concern. As information security standards have been developed, corporations and organizations are paying more and more attention to information security problems. To prevent unauthorized access to information systems, which could cause a loss of confidential data or even damage the reputation of the company or organization, it is important to deal properly with information security risk assessment [1]. Information security risk assessment ascertains the threats and vulnerabilities associated with assets. With respect to information security risk assessment, the ISO 27001 and ISO/IEC 27005:2008 standards describe the development of risk management and the attributes of risk assessment [1, 2].

Information security risk assessment has recently been developed using both qualitative and quantitative methods. Qualitative risk evaluation methods include factor analysis, logical analysis, the historical comparative method, the Delphi method, and the analytic hierarchy process. Their major disadvantage is that they are heavily subjective, because they are based on judgment, intuition, and experience [3-7]. Quantitative risk evaluation methods use statistical data to build models. Typical quantitative methods include cluster analysis, time series models, regression models, and decision trees [2, 6]. The methods mentioned above share a drawback: they are too cumbersome to implement [1, 7]. For information security risk assessment, it is important that decision-makers can verify the behavior of the assessment. In this paper, a fuzzy decision tree is applied to information security risk assessment. The proposed algorithm combines the fuzzy method with a decision tree (DT) and performs well for information security risk assessment. The purpose of this study is to apply the fuzzy method to increase the testing accuracy of DT.

The remainder of this paper is organized as follows. Section 2 reviews information security risk assessment and DT. Section 3 introduces the proposed algorithm. Simulation results are compared with other existing approaches in Section 4. Conclusions are drawn in Section 5.

2. Literature Review

This research primarily uses the fuzzy method and the decision tree to explore information security risk assessment. In this section, information security risk assessment and the approaches used are briefly described.
A. Information security risk assessment

Information is an asset like other important business assets. Because it is essential to the operation of an organization, it needs to be properly protected. This is especially important as operating environments become increasingly interconnected. Information security risk assessment is the basis for developing safety measures that ensure information security [2]. It protects information from a variety of threats in order to ensure continuity of operations, minimize operational risk, earn a substantial return on investment, and win greater business opportunities [2]. Information security risk assessment is mainly a combination of risk and vulnerability level assessments and the potential impact of adverse events on operations. The analysis model for information security risk assessment is introduced in ISO 27001:2005 and ISO/IEC 27005. ISO 27001:2005, Information Security Management Systems (ISMS) - Requirements, mainly specifies the implementation and documentation requirements for establishing an ISMS. It can also be used as a verification standard. The purpose of ISO/IEC 27005 is to provide guidelines for information security risk management. It is designed to assist the satisfactory implementation of information security based on a risk management approach. This research also refers to the relevant recommendations of ISO/IEC 27005.

B. Decision tree

A decision tree is built by a greedy algorithm that uses a divide-and-conquer strategy to construct the tree recursively [8]. It consists of a root node, internal nodes, branches, and leaves. A decision tree represents rules that categorize data according to their attributes [8]. A node specifies an attribute (feature) in the dataset. A branch connects either two nodes or a node and a leaf. Each node has a number of branches, which are labeled with the possible values of the attribute in the parent node [9-12]. Leaves are labeled with the decision value of the classification. DT starts from the root node, selects the attribute with the maximum information gain, and then divides the dataset into subsets. This process terminates when all the data in the current subset belong to the same class. Let the classes be denoted $C_1, C_2, \ldots, C_k$; when all cases in T belong to a single class $C_i$, the decision tree for T is a leaf identifying class $C_i$. The information gain is calculated as follows [9]:

$$\text{Information Gain}(X) = \text{Info}(S) - \text{Info}_X(T)$$
$$\text{Info}(S) = -\sum_{i=1}^{k} \frac{\text{freq}(C_i, S)}{|S|} \log_2 \frac{\text{freq}(C_i, S)}{|S|} \qquad (1)$$
$$\text{Info}_X(T) = \sum_{j=1}^{n} \frac{|T_j|}{|T|} \, \text{Info}(T_j)$$

where $\text{Info}(S)$ is the average amount of information needed to identify the class of a case in S, $\text{Info}_X(T)$ is the expected information value for feature X over the partition of T, $|S|$ is the number of cases in the training set, $C_i$ is a class, $i = 1, 2, \ldots, k$, k is the number of classes, $\text{freq}(C_i, S)$ is the number of cases in S that belong to $C_i$, n is the number of outputs for feature X, $T_j$ is the subset of T corresponding to the jth output, and $|T_j|$ is the number of cases in the subset $T_j$.

3. The Proposed Algorithm

Information security risk assessment consists mainly of the information value of assets, threats, and vulnerabilities. This research uses an attendance management system at a government agency as an example. This system is the core operation system of the agency. All of the agency's leave and attendance records must be processed, reviewed, and cancelled through this system. In this paper, a fuzzy decision tree is proposed to evaluate information security risk. The proposed algorithm combines a tree topology with the fuzzy method. The tree grows using a decision test based on the splitting point information (SplitInfo) in the internal nodes. Each decision test is described by two sigmoidal membership functions, $\mu(x) = 1 / (1 + \exp(-(x - \theta)/\sigma))$, representing the concepts "less than" and "greater than" [13]. For example, the membership functions describe the concepts "less than 3" and "greater than 3" assuming $\sigma = -0.3$ (less than) and $\sigma = 0.3$ (greater than). Each leaf of the decision tree is associated with a region of the input space and its related tree model. To compute the output for a given input, the algorithm starts at the root and finds the membership values for each pair of membership functions in the internal nodes of all paths from the root node to the leaf nodes. The membership values are then aggregated by a t-norm operator and linked to the decision tree model of each leaf. The recursive partitioning strategy results in a tree that is consistent with the training patterns [12]. To construct the decision tree, these procedures continue to subdivide the set of training patterns until each subset in the partition contains only one class of patterns. The procedure of the proposed algorithm is as follows:

Step 1: Calculate $\text{Info}(S)$ to identify the class in the training set S.
Step 2: Calculate the expected information value $\text{Info}_X(S)$ for feature X over the partition of S.
Step 3: Calculate the Information Gain(X) after partitioning according to feature X.
Step 4: Calculate the partition information value
SplitInfo(X) acquired for S partitioned into L subsets:

$$\text{SplitInfo}(X) = -\sum_{i=1}^{L} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|} \qquad (2)$$

Step 5: Calculate the output y:

$$y = \frac{\sum_{i=1}^{m} y_i w_i}{\sum_{i=1}^{m} w_i} \qquad (3)$$

where $w_i = T_{j=1}^{l} \mu_j(x)$, $y_i$ is the output of the model at leaf i, m is the number of leaves, T is a t-norm, l is the number of internal nodes reached from the root node to leaf node i, and $\mu_j$ is one of the sigmoidal membership functions associated with internal node j.
Step 6: Continue the above procedures to subdivide the set of training patterns until each subset in the partition contains only one class of patterns.

4. Simulation Results

In this paper, the proposed algorithm is applied to information security risk assessment. The input-output data and attributes are obtained from a practical attendance system. There are 155 input-output records with 22 attributes in this study. The 22 attributes are listed in Table 1. Each attribute value is an integer between 1 and 4. k-fold (k = 5) cross validation with random partitions is used to evaluate the accuracy [14]. The dataset was split into 5 parts, with each part sharing the same proportion of each class of data. Four parts were used in the training process, while the remaining one was used in the testing process.

Table 1. The 22 attributes used for the information security risk assessment.

No.  Attribute name
1    The value of asset
2    Maintenance error
3    Hardware failures
4    Theft
5    Misuse of resources
6    Operational staff error
7    Incorrect use of software and hardware
8    Lack of documentation
9    Lack of efficient configuration change control
10   Lack of audit-trail
11   Insufficient security training
12   Complicated user interface
13   Insufficient maintenance
14   Lack of periodic replacement schemes
15   Insufficient professional training
16   Inadequate service maintenance response
17   Inadequate recruitment procedures
18   Unsupervised work by outside or cleaning staff
19   Lack of security awareness
20   Wrong allocation of access rights
21   Inadequate or careless use of physical access control to buildings and rooms
22   Lack of strict operation process procedure

To verify the performance of the proposed algorithm, its simulation results are compared with those of several other approaches. Among the compared approaches, the back propagation network (BPN) is the most widely used neural network model, and its network behavior is determined on the basis of input-output learning pairs [15]. For the BPN, three layers are used in this paper. The number of hidden nodes, the learning rate, and the number of training iterations are set as in [16]. The fuzzy c-means (FCM) algorithm is one of the most widely used fuzzy clustering algorithms. FCM attempts to partition a finite collection of n elements into a collection of c fuzzy clusters with respect to some given criterion. Given a finite set of data, the algorithm returns a list of c cluster centers and a partition matrix, where each element gives the degree to which a data element belongs to a cluster [17]. For the FCM, the number of clusters c is chosen as the value giving the best accuracy over c = 2, 3, ..., N, where N is the number of attributes. The support vector machine (SVM), introduced by Vapnik and co-workers, is a learning system [18]. SVM uses a hypothesis space of linear functions in a high dimensional feature space and has been successfully applied in a wide variety of fields [19]. The radial basis kernel function (RBF) is used in SVM. The two parameters of the RBF applied in SVM, C and r, are set as in [20]. The testing accuracy is shown in Table 2. The testing accuracy of the proposed algorithm is 96.8%. From Table 2, the proposed algorithm has the best performance among the compared approaches.

Table 2. The testing accuracy of the proposed approach and other learning algorithms for the information security risk assessment.

Algorithm                 Accuracy
The proposed algorithm    96.8%
Decision tree             90.3%
FCM                       83.9%
BPN                       84.6%
SVM                       87.1%

Furthermore, the zoo dataset from the UCI repository is also used to test the performance of the proposed algorithm [21]. The zoo dataset has 101 instances with 17 attributes, as shown in Table 3. The simulation results are shown in Table 4. From Table 4, the testing accuracy of the proposed algorithm on the zoo dataset is 99.5%, the best among the compared approaches. For both datasets, the proposed algorithm obtains the best testing accuracy among the compared approaches and therefore outperforms the other existing approaches.
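The evaluation protocol described in this section (5-fold cross validation in which every part keeps the same class proportions) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names `stratified_folds` and `train_test_splits`, the fixed random seed, and the toy label list are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split record indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0  # running counter keeps the fold sizes balanced overall
    for indices in by_class.values():
        rng.shuffle(indices)  # random partition within each class
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices): k-1 folds train, 1 fold tests."""
    folds = stratified_folds(labels, k, seed)
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical labels: 155 records in two made-up risk classes.
labels = ["low"] * 100 + ["high"] * 55
splits = list(train_test_splits(labels))
for train, test in splits:
    print(len(train), len(test))
```

Each test fold is then scored by a classifier trained on the other four folds, and the five accuracies are averaged to give one testing accuracy per algorithm.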
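Steps 1 to 5 of the proposed algorithm in Section 3 can be made concrete with a small numerical sketch of Eqs. (1) to (3). It is written under stated assumptions and is not the authors' implementation: the product is used as the t-norm (the paper only says "a t-norm"), the membership function takes the sigmoidal form 1/(1 + exp(-(x - theta)/sigma)) implied by the sign convention for sigma, and the toy attribute values, class labels, tree paths, and leaf outputs are hypothetical.

```python
import math
from collections import Counter


def info(labels):
    """Info(S), Eq. (1): average bits needed to identify the class of a case."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_x(values, labels):
    """Info_X(S), Eq. (1): expected information after partitioning by X."""
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    n = len(labels)
    return sum(len(s) / n * info(s) for s in subsets.values())


def information_gain(values, labels):
    """Information Gain(X) = Info(S) - Info_X(S)."""
    return info(labels) - info_x(values, labels)


def split_info(values):
    """SplitInfo(X), Eq. (2): information carried by the partition sizes."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def membership(x, theta, sigma):
    """Sigmoidal membership: sigma < 0 is 'less than', sigma > 0 'greater than'."""
    return 1.0 / (1.0 + math.exp(-(x - theta) / sigma))


def tree_output(x, paths, leaf_outputs):
    """Eq. (3): weighted average of leaf outputs.

    paths[i] lists the (attribute index, theta, sigma) tests met on the way
    from the root to leaf i; the product t-norm is an assumption.
    """
    weights = []
    for tests in paths:
        w = 1.0
        for attr, theta, sigma in tests:
            w *= membership(x[attr], theta, sigma)
        weights.append(w)
    return sum(y * w for y, w in zip(leaf_outputs, weights)) / sum(weights)


# Hypothetical attribute (values 1..4) and class labels for eight records.
theft = [1, 1, 2, 2, 3, 3, 4, 4]
risk = ["low", "low", "low", "low", "high", "high", "high", "high"]
print(information_gain(theft, risk))  # 1.0
print(split_info(theft))              # 2.0

# Hypothetical one-split tree: "less than 3" leads to leaf 0, "greater than 3" to leaf 1.
paths = [[(0, 3.0, -0.3)], [(0, 3.0, 0.3)]]
leaf_outputs = [1.0, 4.0]
print(tree_output([3.0], paths, leaf_outputs))
```

At the split point x = 3 both memberships equal 0.5, so the output is the plain average of the two leaf outputs; away from the split point one leaf dominates and the output approaches that of a crisp decision tree.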
Table 3. The 17 attributes used for the zoo data.

No.  Attribute name   Data type
1    Animal name      Continuous
2    Hair             Nominal
3    Feathers         Continuous
4    Eggs             Nominal
5    Milk             Nominal
6    Airborne         Nominal
7    Aquatic          Nominal
8    Predator         Nominal
9    Toothed          Nominal
10   Backbone         Nominal
11   Breathes         Nominal
12   Venomous         Nominal
13   Fins             Nominal
14   Legs             Nominal
15   Tail             Nominal
16   Domestic         Nominal
17   Cat size         Nominal

Table 4. The testing accuracy of the proposed approach and other learning algorithms for the zoo data.

Algorithm                 Accuracy
The proposed algorithm    99.5%
Decision tree             94%
FCM                       89.6%
BPN                       90.5%
SVM                       96.3%

5. Conclusions

Information security risk assessment includes the valuation of assets and the analysis of threats and vulnerabilities. In this paper, a fuzzy decision tree is applied to evaluate information security risk. In the proposed algorithm, the fuzzy method increases the testing accuracy of DT. Moreover, the extracted decision rules can be used by decision-makers to analyze the gathered information security risk assessment data. The testing accuracy of the proposed algorithm is 96.8%, the best among the compared approaches for information security risk evaluation.

References

[1] G. H. Gao, X. Y. Li, B. J. Zhang, and W. X. Xiao, "Information security risk assessment based on information measure and fuzzy clustering," Journal of Software, vol. 6, no. 11, pp. 2159-2166, 2011.
[2] S. Fu and Y. Xiao, "An effective process of information security risk assessment," International Conference on Computer and Automation Engineering, vol. 3, pp. 124-128, 2011.
[3] Z. Yu and Z. Ji, "A survey on the evolution of risk evaluation for information systems security," Energy Procedia, vol. 17, pp. 1288-1294, 2011.
[4] D. G. Feng, Y. Zhang, and Y. Q. Zhang, "Survey of information security risk assessment," Journal-China Institute of Communications, vol. 25, no. 7, pp. 10-18, 2004.
[5] Y. Yang and S. Z. Yao, "Risk assessment method of information security based on threat analysis," Computer Engineering and Applications, vol. 45, no. 3, pp. 94-96, 2009.
[6] F. Liu, K. Dai, Z. Y. Wang, and Z. P. Cai, "Research on the technology of quantitative security evaluation based on fuzzy number arithmetic operations," Fuzzy Systems and Mathematics, vol. 4, pp. 20, 2004.
[7] C. C. Lo and W. J. Chen, "A hybrid information security risk assessment procedure considering interdependences between controls," Expert Systems with Applications, vol. 39, no. 1, pp. 247-257, 2012.
[8] H. Kim and G. J. Koehler, "Theory and practice of decision tree induction," Omega, vol. 23, pp. 637-652, 1995.
[9] J. R. Quinlan, "Induction of decision trees," Machine Learning, pp. 81-106, 1986.
[10] J. R. Quinlan, "Decision trees as probabilistic classifiers," in Proc. of 4th International Workshop on Machine Learning, Irvine, California, pp. 31-37, 1987.
[11] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann: San Francisco, USA, 1993.
[12] J. R. Quinlan, "Improved use of continuous attributes in C4.5," Journal of Artificial Intelligence Research, pp. 77-90, 1996.
[13] A. Lemos, W. Caminhas, and F. Gomide, "Fuzzy evolving linear regression trees," Evolving Systems, vol. 2, no. 1, pp. 1-14, 2011.
[14] S. L. Salzberg, "On comparing classifiers: pitfalls to avoid and a recommended approach," Data Mining and Knowledge Discovery, vol. 1, pp. 317-327, 1997.
[15] O. Lezoray and H. Cardot, "A neural network architecture for data classification," International Journal of Neural Systems, vol. 11, no. 1, pp. 33-42, 2001.
[16] Z. J. Lee, K. C. Ying, S. C. Chen, and S. W. Lin, "Applying PSO-based BPN for predicting the yield rate of DRAM modules produced using defective ICs," International Journal of Advanced Manufacturing Technology, vol. 49, pp. 987-999, 2010.
[17] A. F. Gomez-Skarmeta, M. Delgado, and M. A. Vila, "About the use of fuzzy clustering techniques for fuzzy model identification," Fuzzy Sets and Systems, vol. 106, no. 2, pp. 179-188, 1999.
[18] V. N. Vapnik, Statistical Learning Theory, 1998.
[19] C. J. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
[20] S. W. Lin, Z. J. Lee, S. C. Chen, and T. Y. Tseng, "Parameter determination of support vector machines and feature selection using simulated annealing approach," Applied Soft Computing, vol. 8, no. 4, pp. 1505-1512, 2008.
[21] C. Blake, E. Keogh, and C. J. Merz. (1998) UCI repository of machine learning databases [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
Copyright of International Journal of Fuzzy Systems is the property of Taiwan Fuzzy System
Association and its content may not be copied or emailed to multiple sites or posted to a
listserv without the copyright holder's express written permission. However, users may print,
download, or email articles for individual use.
