
Using Neuro-Fuzzy Approach to Reduce False Positive Alerts

Riyad Alshammari, Sumalee Sonamthiang, Mohsen Teimouri, Denis Riordan
Faculty of Computer Science, Dalhousie University
6050 University Ave., Halifax, NS, Canada B3H 1W5
Email: {riyad,sumalee,teimouri,riordan}@cs.dal.ca

Abstract: One of the major problems of Intrusion Detection Systems (IDS) at present is the high rate of false alerts that the systems produce. These alerts force human analysts to repeatedly and intensively analyze false alerts before initiating appropriate actions. We demonstrate the advantages of using a hybrid neuro-fuzzy approach to reduce the number of false alarms. The neuro-fuzzy approach was tested with different background knowledge sets on the DARPA 1999 network traffic dataset. The approach was evaluated and compared with the RIPPER algorithm. The results show that the neuro-fuzzy approach reduces the number of false alarms significantly more than the RIPPER algorithm and requires fewer background knowledge sets.

Index Terms: Intrusion Detection, False Positive, Neuro-Fuzzy, Classification, Security

Fifth Annual Conference on Communication Networks and Services Research (CNSR'07) 0-7695-2835-X/07 $20.00 2007

I. INTRODUCTION

Intrusion detection systems (IDSs) are valuable tools for defending against computer attacks. IDSs are considered to be the last line of defense in a secure network. However, IDSs have some problems, such as generating a high number of false positive alarms [2,8,9]. The goal of an IDS is to precisely differentiate intrusive actions from normal behavior; therefore the false alarm rate, which consists of the false positive and false negative rates, is the most important factor in the accuracy and performance of an IDS [7, 10]. A high volume of false positive alerts is a central problem in network security, and network analysts spend a great deal of time determining whether an alert is an attack when it is in fact benign [2,9,11,14].

To reduce the false positive alarms of an IDS, we need an approach that can deal with uncertainty in network traffic and classify unforeseen and noisy data accurately. Furthermore, the information provided for alerts through audit data and logs does not hold sufficient facts about the characteristics of the connections made on the network [17]. Fuzzy rule-based systems are able to explain the fuzzy patterns of alert attributes. However, the alert attributes used to train the fuzzy rules for an IDS are usually high in dimensionality. For example, alerts generated from the DARPA 1999 dataset contain many attributes to be analyzed. Each attribute has a different number of possible values, ranging from a small number (e.g. the number of protocols) to a huge number (e.g. the IP address). Therefore, it is not an easy task to explicitly determine the membership functions for the fuzzy rules. For this type of background knowledge, the Neural Network (NN) approach is acceptable as a powerful learning method to learn from scratch. For this reason, we hypothesized that the NN can be a useful learning approach to refine the fuzzy sets and membership functions so that they fit the dataset. For the reasons mentioned above, the neuro-fuzzy hybrid approach was investigated to reduce false positive alarms.

The remainder of this paper is organized as follows: Section 2 reviews previous work on false alarm reduction for IDSs. Section 3 describes the machine learning techniques. Section 4 specifies the background knowledge used in the experiment and the data preprocessing. Section 5 gives details of the experimental setup and the results. Section 6 presents the conclusions and future work.

II. RELATED WORKS

Our work was motivated largely by the work of Pietraszek [1]. He built an incremental classifier called ALAC using background knowledge and human analyst feedback. Pietraszek used weighting and a batch learner to circumvent the limitations of the RIPPER algorithm (described in Section 3.1) when constructing the incremental classifier. In our experiment, we applied his background knowledge approaches.

Another work on false alarm reduction is the Intrusion Alert Quality Framework (IAQF) [14]. That work addresses the data preparation stage for higher-level IDS alert operations. The IAQF uses quality parameters to enrich the alert data and generate rule weights, which reduced false positive alerts by 35.04% on the DARPA 2000 dataset. A quality score for each parameter (e.g. correctness, accuracy, reliability, and sensitivity) is calculated and added to the data. In our work, by contrast, we take into account the effect of adding background knowledge to the data (alerts) to reduce false positive alarms. The next section explains the machine learning techniques used in our experiment.

III. THE NEURO-FUZZY APPROACH

A neural network can approximate a function from scratch; however, its result cannot be interpreted in terms of natural language. A fuzzy rule base overcomes this disadvantage, but fuzzy rules need prior knowledge, which cannot be obtained from scratch. The fusion of neural networks and fuzzy logic provides learning as well as readability [18]. Modern neuro-fuzzy systems are usually represented as a multilayer feed-forward neural network, but fuzzifications of other neural network architectures are also considered. In [15], the author gives a list of properties that characterize a neuro-fuzzy system. In our work we used the Java implementation of NEFCLASS to improve the classification of an IDS.

NEFCLASS [19] (NEuro Fuzzy CLASSification) is a 3-layer feed-forward neural network. The first layer is the fuzzification layer, where each node (an input variable) is represented by fuzzified values (fuzzy sets) based on the membership function. The second layer is the fuzzy rule layer, where the AND operation is used to group the matching nodes of similar degree to generate the corresponding rule. The last layer is the defuzzification layer, where the output is produced. Figure 1 shows the NEFCLASS architecture.

Figure 1: A NEFCLASS system adapted from [12].

NEFCLASS implements Mamdani's fuzzy inference method, which produces fuzzy classification rules of the form: IF $X_1$ is $\mu_1$ and $X_2$ is $\mu_2$, ..., and $X_n$ is $\mu_n$ THEN $Y$ is related to class $i$, where $\mu_1, \mu_2, \ldots, \mu_n$ are fuzzy sets [12, 13].

The learning algorithm in NEFCLASS [12] uses a reinforcement learning algorithm to learn the rule base (structure learning) and a fuzzy backpropagation algorithm to learn the fuzzy sets (parameter learning). NEFCLASS uses a fuzzy error based on a fuzzy goodness measure for the reinforcement learning. However, the performance of the system depends heavily on heuristic factors [16] such as the learning rate, number of epochs, error measurement, membership functions, number of variables in the fuzzy sets, and rule weights. Therefore, in our experiment we fine-tuned these factors to optimize the reduction of false positive alarms, as described in the next section.

A. RIPPER algorithm

RIPPER (Repeated Incremental Pruning to Produce Error Reduction) is a rule learning algorithm that William W. Cohen proposed as an optimized version of IREP. RIPPER is able to generate an optimized rule set from a training dataset. We chose RIPPER because it performs better than decision tree algorithms such as C4.5. For instance, Cohen showed that RIPPER performs better than the C4.5 decision tree on large noisy datasets [2]. RIPPER can extract the confidence of the rules based on the training data parsed by it. We used the Weka implementation of the RIPPER algorithm in our experiment [5].

IV. THE NEURO-FUZZY APPROACH TO FALSE ALARMS REDUCTION

A. Preprocessing

Data preprocessing consisted of the following steps. Firstly, we initiated the IDS classification using Snort [3], a lightweight intrusion detection tool that can be deployed on a TCP/IP network to detect attacks and generate alerts. We intentionally used the default configuration and rule sets of Snort to show how much we can reduce the number of false positive alarms. We have no intention of evaluating the performance of Snort as an IDS. Secondly, we used the Tcpreplay tool [4], which replays packets captured by Tcpdump, to resend the raw Tcpdump data of the DARPA 1999 dataset to Snort, running in batch mode, to generate the alarms and log them to a file. Each alert is represented as a record containing the following attributes: timestamp, signature ID and revision, source and destination IP addresses, message name, and protocol type.

The DARPA 1999 dataset from MIT Lincoln Lab is a collection of four types of data: inside and outside Tcpdump traffic, audit data (BSM), and file system data. The dataset consists of 5 weeks of traffic. The first three weeks of traffic are attack-free except for the second week, which includes labeled known attacks. The fourth and fifth weeks are the testing dataset, which contains new attacks [6]. Thirdly, we used the Tcpdump binary files of the outside traffic of the DARPA 1999 dataset for our experiment. The first three weeks of the dataset were used for training, while the last two weeks were used for evaluation.

B. Background Knowledge Data Attributes

In our experiment, we applied Pietraszek's background knowledge implementation to add attributes [1]. He added 19 attributes, whereas we added only 18 (we did not use the alerts classified as intrusions for aggregates1) to the alerts generated by Snort. The background knowledge sets used in our experiment are:

Classification of IP addresses results in extra attributes for the source and destination IP addresses. The classification of an IP address is based on its known subnet (e.g. Internet, intranet, etc.).

Classification of hosts results in extra attributes for both the source and destination IP addresses. The classification of a host is based on its known operating system (e.g. Linux, Macintosh, Windows, etc.) and host type (e.g. gateway, router, workstation, server, etc.).

Aggregates1 results in extra attributes counting the number of alerts in a time window of one minute for alerts with the same source IP address, the same destination IP address, the same source and destination IP addresses, and the same signature revision.

Aggregates2 and 3 count the number of alerts in the same way as aggregates1 but in time windows of 5 minutes and 30 minutes, respectively.

The performance of our approach in reducing false alarms was tested with these background knowledge sets. The background knowledge results in three classes: the IP class, the W1 class (IP class plus a time window of one minute), and the W30 class (IP class plus time windows of 1, 5, and 30 minutes).
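The aggregate attributes described above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiment: the `Alert` record, the attribute names, and the choice to count only earlier alerts inside a backward-looking window are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical alert record: timestamp in seconds, source IP, destination IP, signature.
Alert = namedtuple("Alert", "ts src dst sig")

# The four grouping keys named in the text (attribute names are illustrative).
KEYS = {
    "same_src": lambda a: a.src,
    "same_dst": lambda a: a.dst,
    "same_src_dst": lambda a: (a.src, a.dst),
    "same_sig": lambda a: a.sig,
}

def window_counts(alerts, window_s):
    """For each alert, count earlier alerts within window_s seconds sharing each key."""
    alerts = sorted(alerts, key=lambda a: a.ts)
    rows = []
    for i, cur in enumerate(alerts):
        row = {name: 0 for name in KEYS}
        # Walk backwards over earlier alerts until we leave the time window.
        for prev in reversed(alerts[:i]):
            if cur.ts - prev.ts > window_s:
                break
            for name, key in KEYS.items():
                if key(prev) == key(cur):
                    row[name] += 1
        rows.append(row)
    return rows

sample = [
    Alert(0, "a", "x", 1),
    Alert(30, "a", "y", 1),
    Alert(50, "b", "x", 2),
    Alert(200, "a", "x", 1),
]
w1 = window_counts(sample, 60)     # one-minute window, as in aggregates1
w5 = window_counts(sample, 300)    # five-minute window, as in aggregates2
```

Calling the same function with 60, 300, and 1800 seconds would give one attribute group per window, mirroring how aggregates1, 2, and 3 differ only in window length.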

C. Alert Labeling

The alerts generated by Snort are classified as true or false positive alerts based on the truth table provided by MIT Lincoln Lab [6]. The classification of the alerts for weeks 4 and 5 is shown in Table 1.

Week      True alerts   False alerts
Week 4    1,994         4,308
Week 5    5,124         16,451
Total     7,118         20,759

Table 1: Classified alerts

D. Training the Fuzzy Sets

NEFCLASS-J was used in the experiment to verify our hypothesis that the hybrid neuro-fuzzy approach can improve the classification of the alarms. However, the performance of NEFCLASS-J depends on heuristic learning factors. We ran many tests with different values for NEFCLASS-J's learning factors and fine-tuned these factors as follows:

Background knowledge sets: the experiment started with the least background knowledge (fewest attributes, i.e. the IP class). Then, we added more knowledge to the data, as in W1 and W30 respectively. We conducted the same experimental process for each background knowledge set.

Membership function (MF): the MFs provided by NEFCLASS include the triangular, trapezoidal, bell-shaped, and list functions. The parameters of these membership functions are described in [19]. Figure 2 shows the trapezoidal function, which requires four parameters (a, b, c, d).

Figure 2: Trapezoidal MF adapted from [19].

Fuzzy set variables: we experimented with three-, four-, and five-variable fuzzy sets. The three-variable fuzzy set has the linguistic values small, medium, and large. The four-variable fuzzy set comprises the linguistic values very small, small, medium, and large. The five-variable fuzzy set has the same values as the four-variable set plus one more linguistic value, very large.

Rule weights: we experimented with how the rule weight function affects the classification performance, running tests both without rule weights and with the rule weight function.

Learning rate: different learning rates were tested. Based on heuristic knowledge, the commonly used learning rate of 0.01 was used. We also tried learning rates of 0.1 and 0.001.

Cross validation: we used 10-fold cross validation.

Number of epochs: the neural network was trained for 1,000 and 10,000 epochs, respectively.

V. RESULTS AND DISCUSSION

The hybrid neuro-fuzzy classifier was trained on the first 3 weeks of the DARPA 1999 dataset using different background knowledge sets and membership functions. The last two weeks of the DARPA 1999 dataset were used for evaluation. The result of the experiment is the reduction rate of false alarms. A three-variable fuzzy set, with the maximum function as the aggregation function, was tested with four membership functions (MFs) on the background knowledge classes with a Confidence Factor (CF) of 99%, to inspect the effect of adding background knowledge on reducing false alarms and to decide which MF is the most appropriate for the characteristics of this dataset. The four MFs are triangular, trapezoidal, bell-shaped, and list. Figure 3 shows the result of the experiment in reducing the false positive alerts with 99% CF. The CF was calculated by NEFCLASS-J.
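To make the trapezoidal membership function and the rule evaluation concrete, the following Python sketch evaluates a three-term fuzzy partition (small, medium, large) and a tiny rule base. It is an illustration under stated assumptions, not NEFCLASS-J itself: the piecewise-linear form of the trapezoid, the use of min for AND, the breakpoints, the attribute names, and the three rules are all hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoid with parameters a <= b <= c <= d."""
    if b <= x <= c:
        return 1.0                      # plateau
    if a < x < b:
        return (x - a) / (b - a)        # rising edge
    if c < x < d:
        return (d - x) / (d - c)        # falling edge
    return 0.0

# Hypothetical three-term partition of an alert-count attribute.
FUZZY_SETS = {
    "small": (0, 0, 10, 30),
    "medium": (10, 30, 50, 70),
    "large": (50, 70, 100, 100),
}

# Hypothetical rules: (antecedent mapping attribute -> linguistic term, class label).
RULES = [
    ({"same_src_1min": "large", "same_sig_1min": "large"}, "false_alert"),
    ({"same_src_1min": "small", "same_sig_1min": "small"}, "true_alert"),
    ({"same_src_1min": "medium", "same_sig_1min": "small"}, "true_alert"),
]

def classify(sample):
    """AND the antecedents with min, then aggregate rules per class with max."""
    scores = {}
    for antecedent, label in RULES:
        activation = min(
            trapezoid(sample[var], *FUZZY_SETS[term])
            for var, term in antecedent.items()
        )
        scores[label] = max(scores.get(label, 0.0), activation)
    return max(scores, key=scores.get), scores

label, scores = classify({"same_src_1min": 20, "same_sig_1min": 5})
```

In NEFCLASS the breakpoints (a, b, c, d) are what parameter learning adjusts, while the rule list is what structure learning produces; here both are fixed by hand to keep the sketch short.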

Figure 3: a) False positive rate for background classes; b) false negative rate for background classes.

The experimental results on the different background knowledge sets show that the full knowledge approach had the best result, while the no knowledge approach had the lowest ability to capture the false alarms. The full knowledge approach showed the lowest false negative ratio. The trapezoidal membership function with the full knowledge approach had the lowest false negative rate (8.12%) with a high false positive detection rate (91.81%). Experimental results not recorded here showed the same outcome with the five-variable fuzzy set. However, using the four-variable fuzzy set with the trapezoidal MF gave a different result, as shown in Table 2. We ran the test using two different aggregation functions for the neural network. The result shows that the full knowledge class had no effect whatsoever in improving the classifier. The IP class with 10 attributes detected 90.92% of false positive alerts with a 5.07% false negative ratio using the Weight function as the aggregation function.
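The two ratios reported in this section can be computed as in the short Python sketch below. The definitions are an assumption inferred from the text: the false positive detection rate is taken as the share of false alerts that the classifier correctly marks as false, and the false negative ratio as the share of true alerts that it wrongly marks as false. The label strings and the toy data are hypothetical and are not results from the experiment.

```python
def alert_filter_rates(labels, predictions):
    """Return (false positive detection rate, false negative ratio).

    labels and predictions are equal-length sequences of "true" or "false",
    meaning the alert is (or is predicted to be) a real attack or a false alarm.
    """
    pairs = list(zip(labels, predictions))
    false_total = sum(1 for y, _ in pairs if y == "false")
    true_total = sum(1 for y, _ in pairs if y == "true")
    if false_total == 0 or true_total == 0:
        raise ValueError("need at least one true and one false alert")
    caught_false = sum(1 for y, p in pairs if y == "false" and p == "false")
    missed_true = sum(1 for y, p in pairs if y == "true" and p == "false")
    return caught_false / false_total, missed_true / true_total

# Toy data: four false alerts and two true alerts.
toy_labels = ["false", "false", "false", "false", "true", "true"]
toy_preds = ["false", "false", "false", "true", "true", "false"]
fp_detected, fn_ratio = alert_filter_rates(toy_labels, toy_preds)
```

Under these definitions a useful alert filter pushes the first value up while keeping the second close to zero, which is the trade-off the comparison with RIPPER turns on.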

                Max. Aggregation         Weight Aggregation
Classes         FP ratio    FN ratio     FP ratio    FN ratio
No knowledge    97.76%      70.4%        97.76%      70.4%
IP class        90.55%      6.79%        90.92%      5.07%
W1 class        90.55%      6.79%        90.55%      6.79%
W30 class       90.55%      6.79%        90.55%      6.79%

Table 2: Comparison between Maximum Aggregation function and Weight Aggregation function.

Table 3 illustrates the performance of the neuro-fuzzy approach and the RIPPER algorithm in terms of classification ability. Our approach shows a better classification ability in reducing the false positive alerts (90.92%). Additionally, the neuro-fuzzy learning required less knowledge (IP class) to achieve a better result than RIPPER, which had to use the full knowledge approach. However, the number of false negative alerts classified by our approach (5.07%) is higher than the number classified by the RIPPER algorithm (0.02%). Thus, the neuro-fuzzy approach seems too modest with respect to false negative attacks. The RIPPER algorithm may be combined with our approach for better alert classification that is able to capture both false positive and false negative alerts.

               False Positive   False Negative
RIPPER         34.24%           0.02%
Neuro-fuzzy    90.92%           5.07%

Table 3: Comparison of false positive ratio and false negative ratio for RIPPER using the full knowledge class and the four-variable trapezoidal function using the IP class.

In summary, as shown in Figures 3a and 3b, the trapezoidal membership function using the weight as an aggregation function for the neural network substantially reduces the number of false positive alerts with fewer mistakes. More background knowledge should provide better classification results than less knowledge. However, the experimental results showed that the less knowledge approach (IP class) reduces the number of false positive alerts more than the other knowledge sets (aggregates1, 2, and 3). This may be because the DARPA dataset was created for the purpose of evaluating detection techniques rather than alert analysis. The IP class seems to be sufficient for alert classification, while additional knowledge reduces the ability of the neuro-fuzzy classification.

VI. CONCLUSIONS AND FUTURE WORK

We used a hybrid neuro-fuzzy approach to generate fuzzy rules that classified alerts as true or false positives using background knowledge. The experiment was conducted with different fuzzy variables, membership functions, aggregation functions, and background knowledge sets. In our experiment, we used the first 3 weeks of the DARPA 1999 dataset for training and the last 2 weeks for testing. The experimental results showed that the trapezoidal membership function with IP class background knowledge had the best ability to detect false alarms. However, they also showed that adding more background knowledge had no effect in reducing the false positive alerts. Our neuro-fuzzy approach can significantly reduce the number of false positive alerts, by 90.92%, using less background knowledge (the IP class with 10 attributes) than the RIPPER algorithm. However, the number of false negative alerts classified by our approach (5.07%) is higher than the number classified by the RIPPER algorithm (0.02%). Thus, the neuro-fuzzy approach seems too modest with respect to false negative alerts. The RIPPER algorithm may be combined with our approach for better alert classification that is able to capture both false positive and false negative alerts.

The work may be extended by using feature selection algorithms such as rough set theory to determine the most important attributes for alarm classification. The reduced and selected attributes can improve the classifier and reduce the training time for the neural network. Moreover, this work can be extended with multiple classes representing different attacks. Each attack class may have a different attribute set that is more appropriate to that attack class.

REFERENCES

[1] Pietraszek, T., Using Adaptive Alert Classification to Reduce False Positives in Intrusion Detection, Recent Advances in Intrusion Detection: 7th International Symposium RAID 2004, pp. 102-124, September 2004.
[2] Cohen, W.W., Fast effective rule induction, In Machine Learning: Proceedings of the Twelfth International Conference, pp. 115-123, 1995.
[3] Roesch, M., Snort: lightweight intrusion detection for networks, The 13th USENIX Systems Administration Conference (LISA 99), pp. 228-238, November 1999. Available http://www.snort.org. Accessed August 2006.
[4] Turner, A., Tcpreplay, http://tcpreplay.synfin.net/trac/. Accessed August 2006.
[5] Witten, Ian H., Frank, E., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann Publishers, 2000.
[6] Lincoln Laboratory, Massachusetts Institute of Technology, DARPA intrusion detection evaluation, http://www.ll.mit.edu/IST/ideval/docs/1999/schedule.html. Accessed August 2006.
[7] Axelsson, S., The base-rate fallacy and the difficulty of intrusion detection, ACM Trans. Inf. Syst. Secur., Vol. 3, No. 3, pp. 186-205, August 2000.
[8] Portnoy, L., Eskin, E., and Stolfo, S., Intrusion detection with unlabelled data using clustering, Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), pp. 76-105, 2001.
[9] Mahoney, M. and Chan, P., An analysis of the 1999 DARPA Lincoln Laboratory evaluation data for network anomaly detection, In Recent Advances in Intrusion Detection (RAID2003), Lecture Notes in Computer Science, Vol. 2820, pp. 220-237, Springer-Verlag, 2003.
[10] Debar, H. and Wespi, A., Aggregation and correlation of intrusion-detection alerts, Recent Advances in Intrusion Detection (RAID2001), Lecture Notes in Computer Science, Vol. 2212, pp. 85-103, Springer-Verlag, 2001.


[11] Wang, J. and Lee, I., Measuring false-positive by automated realtime correlated hacking behavior analysis, Information Security 4th International Conference, Lecture Notes in Computer Science, Vol. 2200, pp. 512, Springer-Verlag, 2001.
[12] Nauck, D. and Kruse, R., NEFCLASS: A Neuro-Fuzzy Approach for the Classification of Data, In K. M. George, J. H. Carrol, E. Deaton, D. Oppenheim, and J. Hightower, editors, Applied Computing 1995, Proc. 1995 ACM Symposium on Applied Computing, Nashville, Feb. 26-28, pp. 461-465, ACM Press, New York, Feb. 1995.
[13] Nauck, D. and Kruse, R., A neuro-fuzzy method to learn fuzzy classification rules from data, Fuzzy Sets and Systems, Vol. 89, pp. 277-288, 1997.
[14] Bakar, N. A., Belaton, B. and Samsudin, A., False Positives Reduction via Intrusion Alert Quality Framework, Joint IEEE Malaysia International Conference on Communications and IEEE International Conference on Networks, pp. 547-552, November 2005.
[15] Nauck, D., Neuro-Fuzzy Systems: Review and Prospects, Proceedings of Fifth European Congress on Intelligent Techniques and Soft Computing (EUFIT97), pp. 1044-1053, 1997.
[16] Abraham, A., Beyond Neuro-Fuzzy Systems: Reviews, Prospects, Seventh International Mendel Conference on Soft Computing, Brno, MENDEL 2001, Matousek Radek et al. (Eds.), pp. 376-372, 2001.
[17] Schuba, C. L., et al., Analysis of a denial of service attack on TCP, Proceedings of 1997 IEEE Symposium on Security and Privacy, pp. 208-223, IEEE Computer Society Press, 1997.
[18] Jantzen, J., Neurofuzzy modelling, Technical Report No. 98-H-874 (nfmod), Technical University of Denmark, Department of Automation, October 1998.
[19] Nauck, D., Nauck, U. and Kruse, R., NEFCLASS for JAVA - New Learning Algorithms, Proceedings of Fuzzy Information Processing Society (NAFIPS), 18th International Conference of the North American, pp. 472-476, July 1999.

