P. Jayashree¹, Dr. K. S. Easwarakumar²
¹Department of Information Technology, Anna University, MIT, Chennai, India
²Department of Computer Science and Engineering, Anna University, CEG, Chennai, India
pjshree@annauniv.edu
ABSTRACT
With the development and deployment of ever more internet services, driven by emerging technologies that meet the growing demands of web users, the need to keep these services available is equally pressing. But the web has become a necessary evil because of the cyber attacks that spring up in abundance every day. One of the most threatening is the denial of service (DoS) attack, originating from a single source or from multiple sources, which starves legitimate users of the requested services. Many solutions have been proposed in the literature to defend against such attacks, each with its own strengths and weaknesses. In this paper, an optimal data-mining-based defense and protection mechanism, which identifies and uses the candidate packet attributes that most accurately demarcate attack packets from legitimate traffic, is devised as a complement to existing solutions, and its detection efficiency is tested using ANTS, an active network test bed.
Figure 1: Detection mechanism. Training Phase: the Attribute Tree Populator builds the attribute trees from attack traffic packets. Detection phase: the Packet Elicitor, Tree Attributer and Traffic Classifier process real traffic packets into normal and attack packets; attack packets are fed back to the attribute trees.
score point based on the degree of relevance to the attack characteristics possessed by them. This information is fed back to the set of attribute trees that are used for classifying the traffic. This positive feedback helps fine-tune the classifier so that it classifies the traffic more accurately. Packets that score above a predefined threshold value are stamped as attack packets and dropped at the router, which prevents them from entering the network. The detection mechanism outlined here is depicted in Fig. 1.

4 SYSTEM DESIGN DETAILS

4.1 Attribute Tree construction

The attributes of a packet do not contribute equally to the decision about its legitimacy. Therefore the necessary attributes that most clearly distinguish legitimate packets from attack packets, called the deciding set ($S_A$), are identified first, and the packet elicitor extracts this deciding set of attributes from the incoming packets. The deciding set is selected such that when some attributes fail to classify a packet correctly, the others in the set are able to do so. Hence the attributes are not considered independent quantities; instead, they are highly interrelated, and each feature cooperates with the rest in deciding the legitimacy of the packets.

Each element $A_i$ in the deciding set $S_A$ is represented by a binary search tree $T_i$. The tree $T_i$ is a collection of nodes $N_1, N_2, \ldots$ corresponding to the various values taken by the attribute. Each node $N_j$ has two fields that hold the attribute value ($V_j$) and its frequency ($F_j$). The tree is constructed dynamically by repeated insertion of nodes as and when a packet with that attribute value arrives. The set of trees for the deciding set of attributes used for attack detection is represented as in Eq. (1).

$$\begin{aligned} S_A &= \{A_i\}, && i \text{ is an integer} \\ T_i &= \{N_j\}, && j \text{ is an integer} \\ N_j &= (V_j, F_j) \end{aligned} \tag{1}$$

During the training phase the trees are initially populated with DoS attack packets of varying classes, and during the detection phase they are dynamically updated with the attributes of incoming real packets that are analysed to be attack packets.

The range and type of values taken by the various attributes in $S_A$ are not confined to a defined boundary. To search the trees effectively, it is proposed to convert the actual values to equivalent integer hash values. The field $V_j$ of the $j$th node of an attribute tree holds the hash equivalent of the actual attribute value. Pearson hashing [35] is simple and less likely to have collisions. Given an input ($C$) consisting of any number of bytes, it produces as output a single byte ($h$) that is strongly dependent on every byte of the input. Its implementation requires only a few
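Figure 2: An example attribute tree of depth 4 (Heaviness = 51) and its equivalent optimized tree (Heaviness = 42). Both trees hold the nodes (value, frequency) (25, 2), (46, 4), (60, 2), (62, 6), (67, 1), (78, 2), (80, 3); the original tree is rooted at 60 and the optimized tree at 62.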
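The attribute tree of Section 4.1, with Pearson-hashed values and per-node frequencies, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `AttributeTree`, `Node`, `pearson_hash` and `PEARSON_TABLE` are hypothetical, and since the paper does not give its Pearson permutation table, the sketch builds one from a seeded shuffle of 0..255. Attribute values whose hashes collide share a node, so their frequencies are merged.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical Pearson permutation table: any permutation of 0..255 works;
# a seeded shuffle keeps the hashes reproducible between runs.
PEARSON_TABLE = list(range(256))
random.Random(2024).shuffle(PEARSON_TABLE)


def pearson_hash(data: bytes) -> int:
    """8-bit Pearson hash: XOR in each byte, then look the result up in the table."""
    h = 0
    for byte in data:
        h = PEARSON_TABLE[h ^ byte]
    return h


@dataclass
class Node:
    value: int                      # V_j: hash of the attribute value
    frequency: int = 1              # F_j: packets seen with this value
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class AttributeTree:
    """Binary search tree T_i for one attribute A_i of the deciding set (hypothetical)."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, raw_value: str) -> None:
        """Add one packet's value: bump F_j if the hash exists, else add a node."""
        v = pearson_hash(raw_value.encode())
        if self.root is None:
            self.root = Node(v)
            return
        node = self.root
        while True:
            if v == node.value:
                node.frequency += 1
                return
            if v < node.value:
                if node.left is None:
                    node.left = Node(v)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(v)
                    return
                node = node.right

    def frequency(self, raw_value: str) -> int:
        """F_j for the node holding this value's hash, or 0 if absent."""
        v = pearson_hash(raw_value.encode())
        node = self.root
        while node is not None:
            if v == node.value:
                return node.frequency
            node = node.left if v < node.value else node.right
        return 0

    def size(self) -> int:
        """Tree size: the sum of all node frequencies (packets inserted)."""
        total, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            if node is not None:
                total += node.frequency
                stack.extend((node.left, node.right))
        return total


tree = AttributeTree()  # e.g. a hypothetical "protocol_type" attribute
for proto in ["tcp", "tcp", "icmp", "udp", "tcp"]:
    tree.insert(proto)
print(tree.frequency("tcp"), tree.size())
```

Storing only the one-byte hash keeps every node the same size regardless of the attribute's type or range, which is the motivation the text gives for hashing; the price is that colliding values are counted together.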
performing a search throughout its height, though it takes longer, will add only a very small delay. So restructuring the tree helps in achieving search efficiency. An optimal rearrangement is required such that the heaviness $H$ is minimized. For a tree $T_i$ with node $N_j$ having frequency $F_j$ and depth $D_j$, the heaviness $H_i$ is defined as in Eq. (3).

$$H_i = \sum_{\text{all nodes } j} F_j \, D_j \tag{3}$$

For optimizing the tree, the tree heaviness is taken as the objective function. The root node is defined as level 1, and its descendants occupy successive levels. The optimal tree is obtained using a dynamic programming approach, as applied in the Matrix Chain Multiplication problem. From a given set of nodes, the root node that yields the lowest heaviness is chosen, and the same procedure is applied recursively at all levels to arrive at the optimized tree. The resulting tree satisfies all the constraints and is optimal. An example attribute tree with depth 4 and its equivalent optimized tree are shown in Fig. 2.

5 DEFENSE STRATEGY – PACKET SCORE

The size of a tree is defined by the number of packets that have been used to construct it. Numerically, it is equal to the sum of the frequencies of all the nodes of the tree. Let this size factor be $S_i$ for the tree $T_i$, and let $F_i$ be the frequency of the node corresponding to the attribute value of the incoming packet. The value $F_i / S_i$ gives the contribution of that node, or attribute value, in that particular tree. The packet score is the weighted ratio of the number of attack packets having that value for the feature to the total number of packets that have been used to construct the tree. The decision whether to pass the packet or drop it is taken based on this packet score value.

$$\text{Score} = \frac{\sum_{\text{attributes } i} W_i \, (F_i / S_i)}{\sum_{\text{attributes } i} W_i} \tag{4}$$

Packets scoring a high value are detected as attacks, since they resemble the more frequently occurring packets stored in the attribute trees for attack traffic. Packets scoring a lower value may not be attack packets. A threshold value for the score is therefore used to classify packets as attack or not, and it should classify the packets correctly. It is determined through sensitivity analysis, by plotting the response curves of the traffic classification for various threshold values. The statistics are collected for legitimate, attack and mixed traffic. Let the attack threshold value so obtained be $Th_a$. If Score > $Th_a$, the packet is classified as an attack, used to update the trees, and then dropped at the router itself. This feedback of attack characteristics refines the detection accuracy by making attack and legitimate packets score values with distinct margins.
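Eq. (3), the dynamic-programming tree optimization and the packet score of Eq. (4) can be sketched in Python as below. This is a sketch under stated assumptions, not the authors' code: the helper names are hypothetical, the optimization is written as the textbook optimal-binary-search-tree recurrence (an interval DP of the same shape as matrix chain multiplication), and the weights $W_i$ of Eq. (4) are taken as given inputs because the text here does not say how they are chosen.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class Node:
    value: int                      # V_j (hashed attribute value)
    frequency: int                  # F_j
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def heaviness(node: Optional[Node], depth: int = 1) -> int:
    """Eq. (3): sum of F_j * D_j over all nodes, with the root at depth 1."""
    if node is None:
        return 0
    return (node.frequency * depth
            + heaviness(node.left, depth + 1)
            + heaviness(node.right, depth + 1))


def optimal_tree(items: Sequence[Tuple[int, int]]) -> Tuple[int, Optional[Node]]:
    """Minimum-heaviness BST over (value, frequency) pairs sorted by value.

    cost[i][j] covers items[i:j]. Trying every root r gives
    cost[i][j] = min_r(cost[i][r] + cost[r+1][j]) + sum of F over items[i:j],
    because hanging both sub-trees one level lower adds each of their
    frequencies once more, and the root itself counts once at depth 1.
    """
    n = len(items)
    prefix = [0]
    for _, f in items:
        prefix.append(prefix[-1] + f)
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    root = [[-1] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # min() keeps the first root on ties.
            best_r = min(range(i, j), key=lambda r: cost[i][r] + cost[r + 1][j])
            cost[i][j] = cost[i][best_r] + cost[best_r + 1][j] + prefix[j] - prefix[i]
            root[i][j] = best_r

    def build(i: int, j: int) -> Optional[Node]:
        if i >= j:
            return None
        r = root[i][j]
        return Node(items[r][0], items[r][1], build(i, r), build(r + 1, j))

    return cost[0][n], build(0, n)


def packet_score(freqs: Dict[str, int], sizes: Dict[str, int],
                 weights: Dict[str, float]) -> float:
    """Eq. (4): weighted mean of F_i / S_i over the attributes in `weights`.

    freqs[a] is F_i for the packet's value of attribute a (0 if unseen),
    sizes[a] is S_i of that attribute's tree; the W_i are assumed inputs.
    """
    total = 0.0
    for a, w in weights.items():
        ratio = freqs.get(a, 0) / sizes[a] if sizes.get(a) else 0.0
        total += w * ratio
    return total / sum(weights.values())


# Fig. 2 nodes as (value, frequency) pairs, sorted by value.
items = [(25, 2), (46, 4), (60, 2), (62, 6), (67, 1), (78, 2), (80, 3)]
# The depth-4 tree of Fig. 2, rooted at 60.
original = Node(60, 2,
                Node(46, 4, Node(25, 2)),
                Node(78, 2, Node(62, 6, None, Node(67, 1)), Node(80, 3)))
best, tree = optimal_tree(items)
print(heaviness(original), best, tree.value)  # 51 42 62
```

On the Fig. 2 nodes the DP reproduces the figure's heaviness values (51 for the original, 42 for the optimized tree rooted at 62). For the sub-tree {67, 78, 80}, roots 78 and 80 tie, so the sketch may place a sub-tree differently from the figure while reaching the same heaviness. A packet would then be dropped when `packet_score(...) > Th_a`.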
Figure 3: Test topology in active network

The DARPA dataset is the standard dataset in the field of intrusion detection [37], [38]. The KDD 99 intrusion detection datasets, which are based on the DARPA 98 dataset, provide labeled data for feature identification and are the only labeled datasets publicly available. 10% of the data set corresponds to DoS attacks. From the training data set, which contains 24 attack types classified into 4 broad classes, only the DoS class of records was taken as the data set for evaluation. The relevance of each feature in the KDD 99 intrusion detection datasets with 41

Based on various simulation runs performed using generic, nominal and SYN-flood attacks, the false alarm rate is evaluated. The average false positive percentage is 2.65 for nominal traffic and 0 for the others, while the average false negative percentage is 2.5, 2.08 and 3.55 for generic, nominal and SYN-flood attacks respectively. Since the solution deployed at the routers employs feedback loops that combine learning with detection to fine-tune the detection process, this explains why the false negative rate exceeds the false positive rate: some attack packets get through the routers undetected during the initial time instances of the testing time window.
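The threshold selection by sensitivity analysis in Section 5 and the false positive and false negative percentages reported above can be made concrete with a short sketch. The paper does not state how the percentages are normalised or which criterion picks $Th_a$ from the response curves, so the per-class definitions and the minimise-FP%-plus-FN% rule below are assumptions; the function names are hypothetical, and the scores in the usage lines are toy values, not data from the experiments.

```python
from typing import List, Sequence, Tuple


def error_rates(scores: Sequence[float], is_attack: Sequence[bool],
                threshold: float) -> Tuple[float, float]:
    """(false positive %, false negative %) when packets with score > threshold are dropped.

    Assumed definitions: FP% = legitimate packets dropped / legitimate packets,
    FN% = attack packets passed / attack packets.
    """
    fp = sum(1 for s, a in zip(scores, is_attack) if not a and s > threshold)
    fn = sum(1 for s, a in zip(scores, is_attack) if a and s <= threshold)
    n_legit = sum(1 for a in is_attack if not a)
    n_attack = len(is_attack) - n_legit
    fp_pct = 100.0 * fp / n_legit if n_legit else 0.0
    fn_pct = 100.0 * fn / n_attack if n_attack else 0.0
    return fp_pct, fn_pct


def response_curve(scores: Sequence[float], is_attack: Sequence[bool],
                   thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, FP%, FN%) for every candidate threshold."""
    return [(t, *error_rates(scores, is_attack, t)) for t in thresholds]


def pick_threshold(scores: Sequence[float], is_attack: Sequence[bool],
                   thresholds: Sequence[float]) -> float:
    """Illustrative rule only: minimise FP% + FN%, preferring the lower threshold on ties."""
    return min(thresholds, key=lambda t: (sum(error_rates(scores, is_attack, t)), t))


# Toy scores: three legitimate packets, then three attack packets.
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [False, False, False, True, True, True]
for row in response_curve(scores, labels, [0.15, 0.5, 0.75]):
    print(row)
print(pick_threshold(scores, labels, [0.15, 0.5, 0.75]))  # 0.5
```

Lowering the threshold trades false negatives for false positives and raising it does the reverse; plotting the rows of `response_curve` for legitimate, attack and mixed traffic gives the kind of response curves the text describes.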
[26] W. Lee, S. J. Stolfo, and K. Mok: A data mining framework for building intrusion detection model, IEEE Symposium on Security and Privacy, pp. 120–132 (1999).
[38] S. D. Moitra and S. L. Konda: An empirical investigation of network attacks on computer systems, Computers and Security, vol. 23, no. 1, pp. 43–51 (2004).