
Autonomic Intrusion Detection System

Wei Wang 1, Thomas Guyet 2,3, and Svein J. Knapskog 1

1 Centre for Quantifiable Quality of Service in Communication Systems, Norwegian University of Science and Technology (NTNU), {wei.wang, knapskog}@q2s.ntnu.no
2 Project DREAM, INRIA Rennes/IRISA, France
3 AGROCAMPUS OUEST, Rennes, France

Abstract. We propose a novel framework for autonomic intrusion detection that performs online and adaptive intrusion detection in unlabeled audit data streams. The framework is self-managing: it is self-labeling, self-updating and self-adapting. The framework uses Affinity Propagation (AP) to learn a subject’s behavior through dynamic clustering of the streaming data. Test results on a large real HTTP log stream demonstrate the effectiveness and efficiency of the method.

1 Problem statement, motivation and solution

Anomaly Intrusion Detection Systems (IDS) are an important part of current network security frameworks. Because the data in current network environments evolves continuously and the normal behavior of a subject may change over time, a static anomaly IDS is often ineffective. The detection models should be updated frequently by incorporating new incoming normal examples, and should be adapted to behavioral changes. Achieving this goal raises at least two main difficulties: (1) precisely labeled data is very difficult to obtain in practice; (2) the data is streaming in nature and exhibits behavioral changes.

To tackle these difficulties, we propose a framework for autonomic intrusion detection that detects anomalies in an online and adaptive fashion through dynamic clustering of audit data streams. The autonomic IDS is self-managing: it adapts to unpredictable changes while hiding its intrinsic complexity from operators. It is capable of self-labeling, self-updating and self-adapting in order to detect attacks over unlabeled audit data streams. Self-updating consists in updating the detection model to take into account the normal variability of the data items. In contrast, self-adapting consists in rebuilding the model in case of behavioral changes.

The framework assumes that abnormal data is rare. We thus “capture” the anomalies by finding outliers in the data streams. Given a segment of a data stream, our method identifies outliers through an initial clustering. In the framework, the detection model is a set of clusters of normal data items. The outliers generated during the clustering, as well as any incoming item that is too far from the current model, are suspected to be attacks. To refine our diagnosis, we define three states of a data item: normal, suspicious and anomalous. If an outlier is identified, it is marked as suspicious and put into a reservoir. Otherwise, the detection model is updated with the normal incoming data until a change is found, which triggers model rebuilding to adapt to the current behavior. A suspicious item is considered truly anomalous if it is again marked as suspicious after the adaptation.
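The three-state logic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: a greedy leader clustering stands in for Affinity Propagation, the change test is simplified to "the reservoir is full", and all names and parameters (`AutonomicDetector`, `build_model`, `radius`, `min_size`, `reservoir_size`, `window`) are hypothetical.

```python
import math
from collections import deque

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_model(items, radius, min_size):
    """Greedy leader clustering (a stand-in for AP, not AP itself).
    Returns [center, count] pairs for clusters with at least min_size
    members; members of smaller clusters are treated as outliers."""
    clusters = []
    for x in items:
        for c in clusters:
            if dist(x, c[0]) <= radius:
                c[1] += 1
                break
        else:  # no existing cluster is close enough: start a new one
            clusters.append([list(x), 1])
    return [c for c in clusters if c[1] >= min_size]

class AutonomicDetector:
    def __init__(self, initial_items, radius=1.0, min_size=3,
                 reservoir_size=5, window=50):
        initial = list(initial_items)
        self.radius, self.min_size = radius, min_size
        self.reservoir_size = reservoir_size      # assumed change trigger
        self.recent = deque(initial, maxlen=window)
        self.model = build_model(initial, radius, min_size)
        self.reservoir = []                       # suspicious items
        self.anomalies = []                       # confirmed after adaptation

    def nearest(self, x):
        return min(self.model, key=lambda c: dist(x, c[0]), default=None)

    def is_outlier(self, x):
        c = self.nearest(x)
        return c is None or dist(x, c[0]) > self.radius

    def process(self, x):
        """Self-labeling: return 'normal' or 'suspicious' for one item."""
        self.recent.append(x)
        if self.is_outlier(x):
            self.reservoir.append(x)
            if len(self.reservoir) >= self.reservoir_size:
                self.rebuild()
            return "suspicious"
        # Self-updating: move the nearest center toward the normal item.
        c = self.nearest(x)
        c[1] += 1
        c[0] = [m + (v - m) / c[1] for m, v in zip(c[0], x)]
        return "normal"

    def rebuild(self):
        """Self-adapting: rebuild the model, then re-check the reservoir.
        Items that are still outliers are recorded as anomalous."""
        self.model = build_model(self.recent, self.radius, self.min_size)
        self.anomalies += [x for x in self.reservoir if self.is_outlier(x)]
        self.reservoir = []
```

In this sketch, a lone far-away item stays an outlier after the rebuild and is confirmed as anomalous, whereas a group of items that form a new dense cluster is absorbed into the rebuilt model as new normal behavior.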

2 Implementation and discussion

The autonomic IDS is effective for detecting rare attacks [1]. Detecting bursty attacks is a challenge, as such attack scenarios do not match the rareness assumption well. We therefore add two further mechanisms to the autonomic detection. First, if a data item is very far from the model, it is flagged as anomalous immediately (rather than being considered suspicious). Second, a change is triggered if the percentage of outliers is high (e.g., larger than 60%) during a time period.
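As a rough illustration of these two mechanisms, the following Python sketch labels an item from its distance to the model and monitors the recent outlier rate. The names (`label_item`, `OutlierRateMonitor`, `far_radius`) are hypothetical, and the "time period" is assumed here to be a fixed-size window of the most recent items; only the 60% figure comes from the text.

```python
from collections import deque

def label_item(distance, radius, far_radius):
    """Label an item by its distance to the current model.
    radius and far_radius are assumed thresholds (radius < far_radius)."""
    if distance > far_radius:
        return "anomalous"    # mechanism 1: very far, flagged immediately
    if distance > radius:
        return "suspicious"   # ordinary outlier, goes to the reservoir
    return "normal"

class OutlierRateMonitor:
    """Mechanism 2: signal a change when the share of outliers among the
    last `window` items exceeds max_outlier_rate."""

    def __init__(self, window=10, max_outlier_rate=0.6):
        self.flags = deque(maxlen=window)
        self.max_outlier_rate = max_outlier_rate

    def observe(self, is_outlier):
        """Record one item; return True if a model rebuild should trigger."""
        self.flags.append(bool(is_outlier))
        full = len(self.flags) == self.flags.maxlen
        return full and sum(self.flags) / len(self.flags) > self.max_outlier_rate
```

Waiting until the window is full before testing the rate is a design choice of this sketch; it avoids triggering a rebuild on the first few items of a stream.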

Bursty attacks can thus be identified by their large dissimilarity to the model and by the prompt model rebuilding. We use Affinity Propagation (AP) and StrAP [2] to detect bursty attacks within the framework.

We use a real HTTP log stream to test the method. The character distribution of each HTTP request is used as the feature, and the IDS has to identify whether a request is normal or not. After filtering out static requests, the data contains 40,095 requests, including 239 attacks that occur in a very short interval (requests 7923 to 9743; see Fig.1(a), which plots the k-NN distance between a data item and the training items). To facilitate comparison, we also use three static methods for the detection: k-NN, PCA and one-class SVM. The first 800 attack-free requests are used for training the static models, while the first 800 requests are used for the AP initial clustering. Testing results are shown in Fig.1(b).
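To make the feature concrete, here is a minimal Python sketch of a character-distribution vector and a k-NN distance. The text does not specify the exact encoding, so the 256-bin relative frequency over UTF-8 bytes and the "mean distance to the k nearest training items" are assumptions, and the function names are hypothetical.

```python
import math

def char_distribution(request):
    """Relative frequency of each byte value (0-255) in one request.
    The 256-bin layout is an assumption, not taken from the paper."""
    data = request.encode("utf-8")
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1          # avoid division by zero on empty input
    return [c / total for c in counts]

def knn_distance(x, training, k=1):
    """Mean Euclidean distance from x to its k nearest training vectors."""
    dists = sorted(math.dist(x, t) for t in training)
    return sum(dists[:k]) / k
```

For example, a request carrying an injected payload contains characters such as quotes that are rare in ordinary requests, so its distribution tends to lie farther from the training items than that of a benign request with a similar path.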

(a) Distance distribution of the log stream (x-axis: HTTP requests in order, x 10^4; y-axis: anomaly index; series: attacks, normal)

(b) Testing results with comparison (x-axis: False Positive Rate (%); y-axis: Detection Rate (%); series: kNN, AP, PCA, SVM)

Fig. 1. Dynamic normal behaviors and testing results with comparison

Fig.1(a) shows that the normal behavior changes over time, and Fig.1(b) indicates that the autonomic detection method achieves better results than the three static methods when the detection rates are higher than 50%. Note that the autonomic IDS does not need a priori knowledge, whereas the static methods need labeled data for training. Our future work is to combine the autonomic IDS with effective static methods to prevent mimicry attacks (e.g., large-scale attacks launched to evade the autonomic IDS).

References

1. Wang, W., Masseglia, F., Guyet, T., Quiniou, R., Cordier, M.O.: A general framework for adaptive and online detection of web attacks. In: WWW. (2009) 1141–1142

2. Zhang, X., Furtlehner, C., Sebag, M.: Data streaming with affinity propagation. In: ECML/PKDD (2). (2008) 628–643