
ISSN: 2277 9043 International Journal of Advanced Research in Computer Science and Electronics Engineering, Volume 1, Issue 1, March 2012

Anomaly Detection by Naive Bayes & RBF Network


Dilip Kumar Ahirwar, Sumit Kumar Saxena, M. S. Sisodia
Abstract: Intrusion detection by automated means is gaining widespread interest due to the serious impact of intrusions on computer systems and networks. Several techniques have been introduced in an effort to reduce, at least to some extent, the risk associated with intrusion attacks. In this respect, we have used a novel machine learning technique that combines a Naive Bayes approach with a weighted radial basis function (RBF) network approach. The proposed scheme exhibits very promising results in detecting intrusions on the KDDCup99 data set in comparison with many earlier techniques.

Keywords: Network intrusion detection, Naive Bayes, RBF Network.

I. INTRODUCTION

Intrusion Detection Systems (IDS) are software or hardware systems that automate the process of monitoring and analyzing the events that occur in a computer network in order to detect malicious activity. Since the severity of attacks occurring in networks has increased drastically, intrusion detection systems have become a necessary addition to the security infrastructure of most organizations. Intrusion detection allows organizations to protect their systems from the threats that come with increasing network connectivity and reliance on information systems. Given the level and nature of modern network security threats, the question for security professionals should not be whether to use intrusion detection, but which intrusion detection features and capabilities to use. Intrusions are caused by: attackers accessing the systems; authorized users of the systems who attempt to gain additional privileges for which they are not authorized; and authorized users who misuse the privileges given to them. Intrusion detection systems take either a network-based or a host-based approach to recognizing and deflecting attacks. In either case, these products look for attack signatures (specific patterns) that usually indicate malicious or suspicious intent. When an IDS looks for these patterns in network traffic, it is network based; when it looks for attack signatures in log files, it is host based. Various algorithms have been developed to identify different types of network intrusions; however, there is no heuristic to confirm the accuracy of their results. The exact effectiveness of a network intrusion detection system's ability to identify malicious sources cannot be reported unless a concise measurement of performance is available. In this paper we consider two main approaches for intrusion detection: Naive Bayes and the radial basis function network (RBF Network).

II. INTRUSION DETECTION

An Intrusion Detection System (IDS) inspects the activities in a system for suspicious behaviors or patterns that may indicate system attack or misuse. There are two main categories of intrusion detection techniques: anomaly detection [2] and misuse detection. The former analyses the information gathered and compares it to a defined baseline of what is seen as normal service behavior, so it has the ability to detect network attacks that are currently unknown. Misuse detection is based on signatures for known attacks, so it is only as good as the database of attack signatures that it uses for comparison. Misuse detection has a low false positive rate but cannot detect novel attacks; anomaly detection can detect unknown attacks but has a high false positive rate. In this paper, we review the performance of classifiers when trained to identify signatures of specific attacks. These attacks are discussed in more detail in the following section.

III. NETWORKING ATTACKS

The simulated attacks were classified according to the actions and goals of the attacker. Each attack type falls into one of the following four main categories [3]; a small lookup-table sketch follows this list.

Denial-of-Service (DoS) attacks have the goal of limiting or denying services provided to the user, computer, or network. A common tactic is to severely overload the targeted system (e.g. apache, smurf, Neptune, ping of death, back, mailbomb, udpstorm, SYN flood).

Probing or surveillance attacks have the goal of gaining knowledge of the existence or configuration of a computer system or network. Port scans or sweeps of a given IP-address range typically fall into this category (e.g. saint, portsweep, mscan, nmap).

User-to-Root (U2R) attacks have the goal of gaining root or super-user access on a particular computer or system on which the attacker previously had user-level access. These are attempts by a non-privileged user to gain administrative privileges (e.g. perl, xterm).

Remote-to-Local (R2L) attacks are attacks in which a user sends packets to a machine over the Internet to which the user does not have access, in order to expose the machine's vulnerabilities and exploit privileges that a local user would have on the computer (e.g. xclock, dictionary, guest password, phf, sendmail, xsnoop).
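To make this taxonomy concrete, the following sketch shows one possible way of encoding it as a lookup table in Python. The specific label spellings (for example guess_passwd) are assumptions chosen only for illustration and would have to be matched against the labels actually present in the data set.

```python
# Illustrative sketch: mapping KDD'99-style attack labels to the four
# categories described above (plus "Normal"). Label spellings are assumed.
ATTACK_CATEGORY = {
    "normal": "Normal",
    # Denial-of-Service
    "smurf": "DoS", "neptune": "DoS", "back": "DoS",
    "mailbomb": "DoS", "udpstorm": "DoS",
    # Probing / surveillance
    "saint": "Probe", "portsweep": "Probe", "mscan": "Probe", "nmap": "Probe",
    # User-to-Root
    "perl": "U2R", "xterm": "U2R",
    # Remote-to-Local
    "guess_passwd": "R2L", "phf": "R2L", "sendmail": "R2L", "xsnoop": "R2L",
}

def categorize(label: str) -> str:
    """Return the coarse category for a raw connection label."""
    return ATTACK_CATEGORY.get(label.rstrip("."), "Unknown")

print(categorize("smurf"))   # -> DoS
print(categorize("nmap"))    # -> Probe
```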



IV. RELATED WORK

ADAM (Audit Data Analysis and Mining) [4] is an intrusion detector built to detect intrusions using data mining techniques. It first absorbs training data known to be free of attacks. Next, it uses an algorithm to group attacks, unknown behaviour, and false alarms. ADAM has several useful capabilities, namely: classifying an item as a known attack, classifying an item as a normal event, classifying an item as an unknown attack, and matching audit trail data to the rules it gives rise to. IDDM (Intrusion Detection using Data Mining techniques) [5] is a real-time NIDS for misuse and anomaly detection. It applies association rules, meta rules, and characteristic rules; it employs data mining to produce descriptions of network data and uses this information for deviation analysis. MADAM ID (Mining Audit Data for Automated Models for Intrusion Detection) [6] is one of the best-known data mining projects in intrusion detection. It is an off-line IDS that produces anomaly and misuse intrusion detection models. Association rules and frequent episodes are applied in MADAM ID to replace hand-coded intrusion patterns and profiles with learned rules. In [7], the authors propose a method of intrusion detection using an evolving fuzzy neural network. This type of learning algorithm combines artificial neural networks (ANN) and fuzzy inference systems (FIS), as well as evolutionary algorithms. They create an algorithm that uses fuzzy rules and allows new neurons to be created in order to accomplish this. They use Snort to gather data for training the algorithm and then compare their technique with that of an augmented neural network. In [8], a statistical neural network classifier for anomaly detection is developed, which can identify UDP flood attacks. Comparing different neural network classifiers, the back propagation neural network (BPN) has been shown to be more efficient for developing IDS [9]. In [9], the author uses the back propagation method with Sample Query and Attribute Query for intrusion detection, whereby the most important components of the training data are analyzed and identified; this can reduce processing time, storage requirements, etc. In [10], Axelsson wrote a well-known paper that uses the Bayesian rule of conditional probability to point out the implications of the base-rate fallacy for intrusion detection. In [11], a behavior model is introduced that uses Bayesian techniques and an RBF network for classification. There, the neural networks are trained to estimate posterior probabilities of class membership by means of mixtures of Gaussian basis functions and hyper-planes, so as to obtain model parameters with maximal a-posteriori probabilities. Their work is similar to ours to the extent that Bayesian and RBF network statistics are employed; the difference is that we use naive Bayes and an RBF network for our model.

V. PROPOSED WORK

The Naive Bayes method is based on the work of Thomas Bayes (1702-1761). In Bayesian classification, we have a hypothesis that the given data belongs to a particular class; we then calculate the probability for the hypothesis to be true. This is among the most practical approaches for certain types of problems. The approach requires only one scan of the whole data. Also, if at some stage there are additional training data, each training example can incrementally increase or decrease the probability that a hypothesis is correct. Thus, a Bayesian network is used to model a domain containing uncertainty [12], [13]. The second component is evolutionary optimization of RBF network architectures (feature and model selection), which is applicable to a wide range of data mining problems (in particular, classification problems). For this to be practical, the overall runtime of the EA had to be reduced substantially; we decided to optimize the most important architecture parameters only and to use standard techniques for representation, selection, and reproduction. A minimal sketch of such a search loop is given below.
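The paper gives no pseudocode for this evolutionary search, so the following is only a minimal sketch under several assumptions of ours: an individual encodes a feature mask and a number of basis functions, fitness is a validation score returned by a hypothetical train_and_score helper minus a size penalty (the soft-constraint idea), and plain truncation selection with bit-flip mutation stands in for the standard operators mentioned above.

```python
import random

# Minimal sketch of evolutionary architecture selection for an RBF network.
# Assumptions (not from the paper): an individual is (feature_mask, n_centers),
# fitness = validation score - penalty * n_centers, truncation selection,
# bit-flip / integer mutation. `train_and_score` is a hypothetical helper
# that trains an RBF network on the selected features and returns accuracy.

def random_individual(n_features, max_centers):
    mask = [random.random() < 0.5 for _ in range(n_features)]
    return mask, random.randint(2, max_centers)

def mutate(ind, n_features, max_centers, p=0.1):
    mask, k = ind
    mask = [(not b) if random.random() < p else b for b in mask]
    if random.random() < p:
        k = min(max_centers, max(2, k + random.choice([-1, 1])))
    return mask, k

def fitness(ind, train_and_score, penalty=0.001):
    mask, k = ind
    if not any(mask):
        return 0.0
    acc = train_and_score(mask, k)      # validation accuracy in [0, 1]
    return acc - penalty * k            # soft constraint: prefer small networks

def evolve(train_and_score, n_features, max_centers=50,
           pop_size=20, generations=30):
    pop = [random_individual(n_features, max_centers) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda i: fitness(i, train_and_score),
                        reverse=True)
        survivors = scored[:pop_size // 2]          # truncation selection
        children = [mutate(random.choice(survivors), n_features, max_centers)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=lambda i: fitness(i, train_and_score))
```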

Naive Bayes:
The naive Bayes model is a heavily simplified Bayesian probability model [14]. In this model, we consider the probability of an end result given several related evidence variables. The probability of the end result is encoded in the model, along with the probability of each evidence variable occurring given that the end result occurs. The probability of an evidence variable given the end result is assumed to be independent of the probability of the other evidence variables given that end result. As an illustration, consider an alarm example using a naive Bayes classifier. Assume that we have a set of examples that monitor some attributes, such as whether it is raining or whether an earthquake has occurred. Assume also that, using the monitor, we know the behavior of the alarm under these conditions. In addition, having knowledge of these attributes, we record whether or not a theft actually occurred. We take the category of whether a theft occurred or not as the class for the naive Bayes classifier; this is the knowledge we are interested in. The other attributes are treated as evidence that the theft has occurred. A small worked version of this example is sketched below. Figure 1 below shows the framework for a naive Bayesian model to perform intrusion detection. The naive Bayes classifier operates on a strong independence assumption [14]: the probability of one attribute does not affect the probability of another. Given a series of n attributes, the naive Bayes classifier makes 2n! independent assumptions. Nevertheless, the results of the naive Bayes classifier are often correct. The work reported in [15] examines the circumstances under which the naive Bayes classifier performs well and why. It states that the error is a result of three factors: training data noise, bias, and variance. Training data noise can only be minimized by choosing good training data. The training data must be divided into various groups by the machine learning algorithm. Bias is the error due to groupings in the training data being very large. Variance is the error due to those groupings being too small.
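As a concrete illustration of the alarm/theft example just described, the short sketch below applies Bayes' rule under the naive independence assumption. All probability values are hypothetical and chosen only to show the computation.

```python
# Toy version of the alarm/theft example with hypothetical probabilities.
# Evidence variables: the alarm rang, it was raining. Class: theft / no theft.
p_theft = 0.01                                   # prior p(theft)
p_alarm_given = {True: 0.95, False: 0.05}        # p(alarm | theft), p(alarm | no theft)
p_rain_given  = {True: 0.30, False: 0.10}        # p(rain | theft),  p(rain | no theft)

def score(theft: bool) -> float:
    # Numerator of Bayes' rule under the naive independence assumption:
    # p(class) * p(alarm | class) * p(rain | class)
    prior = p_theft if theft else 1.0 - p_theft
    return prior * p_alarm_given[theft] * p_rain_given[theft]

s_theft, s_no_theft = score(True), score(False)
p_posterior = s_theft / (s_theft + s_no_theft)   # normalize by the evidence
print(f"p(theft | alarm, rain) = {p_posterior:.3f}")
```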




Radial basis function network (RBF Network):
1. RBF networks [21]-[24] are used for classification. These neural networks are trained to estimate posterior probabilities of class membership by means of mixtures of Gaussian basis functions and hyper-planes. From a structural viewpoint, RBF networks are closely related to direct kernel methods and support vector machines (SVM) with Gaussian kernel functions [21], [26].
2. Evolutionary algorithms (EA) [27], [28] are used for architecture optimization (combined feature and model selection) of the RBF networks. This class of optimization algorithms is chosen because the search space is high-dimensional and the objective function is noisy, deceptive, multimodal, and non-differentiable. Evolutionary optimization of RBF architectures is in no way a new idea, but existing approaches typically suffer from high runtime and premature convergence to local minima. Often, these two problems are closely related: due to a high runtime, smaller populations with lower diversity are evolved, and convergence to a local minimum becomes more likely. Hence, we take over the best from existing approaches, such as standard encoding schemes for the representation of individuals and standard recombination and mutation operators, and we integrate various techniques that reduce the runtime of the EA radically. In particular, we use methods for:
a) fast fitness evaluation of individuals (hybrid training of RBF networks, lazy evaluation);
b) consideration of soft constraints by means of penalty terms (e.g., to prefer smaller network structures);
c) adaptive control of the evolutionary optimization process by means of a temperature coefficient.
Due to a significantly reduced runtime and a goal-oriented search, more and fitter solutions can be evaluated within a shorter time; therefore, it can be expected that better solutions with higher classification rates can be obtained. Altogether, an algorithm is introduced that can be utilized for a wide variety of data mining tasks to obtain and to apply new, domain-specific knowledge. This approach efficiently instantiates already known techniques as well as innovative, novel ideas; this kind of approach has been applied to a variety of real-world data mining tasks, including intrusion detection in computer networks, biometric signature verification, customer acquisition with direct marketing methods, and optimization of chemical production processes. Compared to some earlier work on EA-based RBF optimization, the runtime is reduced by up to 99% and the error rates are decreased by up to 86%, depending on the application. A minimal structural sketch of such an RBF classifier is given below.
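To make the structure of such a classifier concrete, the sketch below implements a minimal RBF network with Gaussian basis functions. It is not the authors' model: centers are drawn at random from the training data, a single shared width is used, and the output weights are fitted by regularized least squares on one-hot targets, whereas the paper selects the architecture with an evolutionary algorithm and hybrid training.

```python
import numpy as np

# Minimal structural sketch of an RBF network classifier with Gaussian basis
# functions. Assumptions (not from the paper): random centers, one shared
# width, and ridge-regularized least-squares output weights.

class SimpleRBFNet:
    def __init__(self, n_centers=20, width=1.0, ridge=1e-3, seed=0):
        self.n_centers, self.width, self.ridge = n_centers, width, ridge
        self.rng = np.random.default_rng(seed)

    def _phi(self, X):
        # Gaussian activations: phi_j(x) = exp(-||x - c_j||^2 / (2 * width^2))
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        idx = self.rng.choice(len(X), size=min(self.n_centers, len(X)),
                              replace=False)
        self.centers = X[idx]                        # centers from training data
        Phi = self._phi(X)
        T = np.eye(len(self.classes_))[y_idx]        # one-hot class targets
        A = Phi.T @ Phi + self.ridge * np.eye(Phi.shape[1])
        self.W = np.linalg.solve(A, Phi.T @ T)       # least-squares output weights
        return self

    def predict(self, X):
        scores = self._phi(np.asarray(X, dtype=float)) @ self.W
        return self.classes_[scores.argmax(axis=1)]
```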

Figure 1. The framework of the Intrusion Detection Model.

Returning to the naive Bayes classifier: abstractly, the probability model for a classifier is a conditional model

p(C | F1, ..., Fn)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F1 through Fn. The problem is that if the number of features n is large, or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable. Using Bayes' theorem, we write

p(C | F1, ..., Fn) = p(C) p(F1, ..., Fn | C) / p(F1, ..., Fn).

In plain English, the above equation can be written as

posterior = (prior × likelihood) / evidence.
In practice, we are only interested in the numerator of that fraction, since the denominator does not depend on C and the values of the features Fi are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model

p(C, F1, ..., Fn),

which can be rewritten as follows, using repeated applications of the definition of conditional probability:

p(C, F1, ..., Fn) = p(C) p(F1, ..., Fn | C)
                  = p(C) p(F1 | C) p(F2, ..., Fn | C, F1),

and so on.
In the training phase, the naive Bayes algorithm calculates the probability of a theft given each particular attribute value and then stores this probability. This is repeated for each attribute, so the time taken to calculate the relevant probabilities grows with the number of attributes. In the testing phase, the time taken to calculate the probability of the given class for each example is, in the worst case, proportional to n, the number of attributes; thus, in the worst case, the time taken for the testing phase is the same as that for the training phase. A compact sketch of these two phases is given below.
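A minimal sketch of these two phases is given below, assuming categorical attributes. The add-one smoothing and the tiny example records are our own additions, introduced only so that unseen attribute values do not zero out the product and so the snippet can be run as-is.

```python
from collections import Counter, defaultdict

# Minimal sketch of the training/testing phases described above, assuming
# categorical attributes and add-one smoothing (an assumption added here).

def train_naive_bayes(X, y):
    """X: list of attribute tuples, y: list of class labels."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)   # (class, attr_index) -> value counts
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            value_counts[(c, i)][v] += 1
    return class_counts, value_counts

def predict(row, class_counts, value_counts):
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total                         # p(C)
        for i, v in enumerate(row):
            counts = value_counts[(c, i)]
            # p(F_i = v | C): add one to the count of v, and one extra
            # "unseen" bucket to the denominator so unseen values stay > 0
            score *= (counts[v] + 1) / (n_c + len(counts) + 1)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Tiny usage example with hypothetical connection records (protocol, flag):
X = [("tcp", "SF"), ("udp", "SF"), ("tcp", "S0"), ("tcp", "S0")]
y = ["normal", "normal", "dos", "dos"]
model = train_naive_bayes(X, y)
print(predict(("tcp", "S0"), *model))    # expected to print "dos"
```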



VI. EXPERIMENT AND RESULTS

In this section, we summarize our experimental results on detecting network intrusions with the naive Bayes algorithm and the RBF network over the KDDCup99 data set. We first describe the data set used in this experiment and then discuss the results obtained. Finally, we evaluate our approach and compare the results with those obtained by other researchers using BPN algorithms and with the best result of the KDD99 contest. For our experiments we use the KDD CUP 99 dataset. KDD CUP 1999 contains 41 fields as attributes and a 42nd field as the label; in our algorithm we have taken selected features. The 42nd field can be generalized as Normal, DoS, Probing, U2R, or R2L. The performance of each method is measured according to the Accuracy, Detection Rate, and False Positive Rate using the following expressions (a short computational sketch of these metrics is given at the end of this section):

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Detection Rate = TP / (TP + FP)

False Alarm = FP / (FP + TN)

where FN is False Negative, TN is True Negative, TP is True Positive, and FP is False Positive. The detection rate is the number of attacks detected by the system divided by the number of attacks in the data set. The false positive rate is the number of normal connections that are misclassified as attacks divided by the number of normal connections in the data set.

Table 1: Number of examples in the randomly selected subset of the KDD99 dataset.

Attack Type           Training Examples
Normal                4633
Denial of Service     93
Remote to User        599
User to Root          1057
Probing               1091
Total                 7473

A confusion matrix is used to represent the results, as shown in Table 2 (RBF Network) and Table 3 (Naive Bayes). The advantage of using this matrix is that it tells us not only how many examples were misclassified but also which misclassifications occurred. For our model we obtain the following confusion matrices.

Table 2: Confusion matrix for the RBF Network classifier using the training data set.

Actual    Pred. Normal   Pred. DoS   Pred. Probe   Pred. U2R   Pred. R2L   Accuracy (%)
Normal    4487           1           52            62          31          96.84869
DoS       36             41          4             4           8           44.08602
Probe     74             21          452           35          17          75.4591
U2R       2              4           2             4048        1           99.77816
R2L       552            3           335           17          184         16.86526

Table 3: Confusion matrix for the Naive Bayes classifier using the training data set.

Actual    Pred. Normal   Pred. DoS   Pred. Probe   Pred. U2R   Pred. R2L   Accuracy (%)
Normal    2079           30          22            1145        1357        44.87373
DoS       0              59          9             4           21          63.44086
Probe     1              47          143           3           405         23.87312
U2R       7              13          3             937         97          88.64711
R2L       7              13          54            14          1003        91.93401

Figure 2: Accuracy comparison graph for the RBF Network and Naive Bayes classifiers on the training data set (bar chart over the classes Normal, DoS, Probe, U2R, and R2L).

Table 4: Detailed accuracy of the RBF Network and Naive Bayes classifiers (detection rate and false alarm per class).

Class     RBF Network                     Naive Bayes
          Detection Rate   False Alarm    Detection Rate   False Alarm
Normal    0.968            0.234          0.449            0.005
DoS       0.441            0.004          0.634            0.014
Probe     0.755            0.057          0.239            0.013
U2R       0.991            0.018          0.886            0.182
R2L       0.169            0.009          0.919            0.295
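The sketch below applies the formulas above to a confusion matrix, treating each class one-vs-rest to obtain TP, FP, TN, and FN. This one-vs-rest counting is our interpretation, and the resulting per-class values need not coincide with the Accuracy (%) column of Tables 2 and 3, which appears to be the diagonal entry divided by the row total.

```python
# Sketch: per-class Accuracy, Detection Rate, and False Alarm computed from a
# confusion matrix using the formulas above (one-vs-rest counting is assumed).
# The rows below are the RBF Network confusion matrix from Table 2
# (rows = actual class, columns = predicted class).

CLASSES = ["Normal", "DoS", "Probe", "U2R", "R2L"]
CONFUSION = [
    [4487, 1, 52, 62, 31],
    [36, 41, 4, 4, 8],
    [74, 21, 452, 35, 17],
    [2, 4, 2, 4048, 1],
    [552, 3, 335, 17, 184],
]

def one_vs_rest_metrics(matrix, k):
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                            # actual k, predicted other
    fp = sum(matrix[i][k] for i in range(len(matrix))) - tp
    tn = total - tp - fn - fp
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    detection_rate = tp / (tp + fp) if tp + fp else 0.0  # formula as printed
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, detection_rate, false_alarm

for k, name in enumerate(CLASSES):
    acc, dr, fa = one_vs_rest_metrics(CONFUSION, k)
    print(f"{name:7s} accuracy={acc:.3f} detection_rate={dr:.3f} false_alarm={fa:.3f}")
```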

VII. CONCLUSION AND FUTURE WORK

In this paper, we have proposed a framework for a NIDS based on the naive Bayes algorithm and an RBF network. The framework builds the patterns of the network services over data sets labeled by the services. With the built patterns, the framework detects attacks in the data sets using the naive Bayes classifier algorithm. Compared to the neural network based approach, our approach achieves a higher detection rate,



is less time consuming, and has a lower cost factor. However, it generates somewhat more false positives. Evolutionary optimization of RBF network architectures (feature and model selection) is applicable to a wide range of data mining problems (in particular, classification problems); for it to be practical here, the overall runtime of the EA had to be reduced substantially. We decided to optimize the most important architecture parameters only and to use standard techniques for representation, selection, and reproduction. Each genotype produced during evolution describes a valid solution of the optimization problem, and each solution can be reached from any other point in the search space. Runtime reduction, as well as improvement of classification rates or lift factors, is achieved by a combination of various techniques, in particular fast fitness evaluation (hybrid training, lazy evaluation), integration of soft constraints (side conditions), and temperature-based control of the EA (to perform the tradeoff between fine tuning and coarse tuning of the optimization process). A naive Bayesian network is a restricted network that has only two layers and assumes complete independence between the information nodes; this poses a limitation to this research work. In order to alleviate this problem and reduce the false positives, active platform or event based classification using a Bayesian network may be considered. We will continue our work in this direction in order to build an efficient intrusion detection model.

REFERENCES
[1] Symantec, "Internet Security Threat Report highlights," http://www.prdomain.com/companies/Symantec/Newreleases/Symantec_internet_205032.htm
[2] R. Durst, T. Champion, B. Witten, E. Miller, and L. Spagnuolo, "Testing and evaluating computer intrusion detection systems," Communications of the ACM, vol. 42, no. 7, pp. 53-61, 1999.
[3] A. Sung and S. Mukkamala, "Identifying important features for intrusion detection using SVM and neural networks," in Symposium on Applications and the Internet, pp. 209-216, 2003.
[4] D. Barbara, J. Couto, S. Jajodia, and N. Wu, "ADAM: A test bed for exploring the use of data mining in intrusion detection," SIGMOD Record, vol. 30, no. 4, pp. 15-24, 2001.
[5] T. Abraham, "IDDM: Intrusion Detection using Data Mining techniques," Technical Report, DSTO Electronics and Surveillance Research Laboratory, Salisbury, Australia, May 2001.
[6] W. Lee and S. J. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Transactions on Information and System Security (TISSEC), vol. 3, no. 4, Nov. 2000.
[7] S. Chavan, K. Shah, N. Dave, S. Mukherjee, A. Abraham, and S. Sanyal, "Adaptive neuro-fuzzy intrusion detection systems," in Proc. ITCC, vol. 1, 2004.
[8] Z. Zhang, J. Li, C. N. Manikopoulos, J. Jorgenson, and J. Ucles, "HIDE: A hierarchical network intrusion detection system using statistical preprocessing and neural network classification," in Proc. IEEE Workshop on Information Assurance and Security, 2001, pp. 85-90.
[9] R.-I. Chang, L.-B. Lai, et al., "Intrusion detection by backpropagation network with sample query and attribute query," International Journal of Computational Intelligence Research, vol. 3, no. 1, pp. 6-10, 2007.
[10] S. Axelsson, "The base-rate fallacy and its implications for the difficulty of intrusion detection," in Proc. 6th ACM Conference on Computer and Communications Security, 1999.
[11] R. Puttini, Z. Marrakchi, and L. Me, "Bayesian classification model for real-time intrusion detection," in Proc. 22nd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, 2002.
[12] P. Jenson, Bayesian Networks and Decision Graphs, Springer, New York, USA, 2001.
[13] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1997.
[14] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach (International Edition), Pearson, Nov. 2002.
[15] P. Domingos and M. J. Pazzani, "On the optimality of the simple Bayesian classifier under zero-one loss," Machine Learning, vol. 29, no. 2-3, pp. 103-130, 1997.
[16] M. Mahoney and P. Chan, "An analysis of the 1999 DARPA/Lincoln Laboratory evaluation data for network anomaly detection," in Proc. Recent Advances in Intrusion Detection (RAID) 2003, Pittsburgh, USA, Sept. 2003.
[17] MIT Lincoln Laboratory, DARPA Intrusion Detection Evaluation, http://www.ii.mit.edu.
[18] C. Elkan, "Results of the KDD'99 classifier learning," SIGKDD Explorations, vol. 1, no. 2, pp. 63-64, 2000.
[19] WEKA: machine learning software, University of Waikato, Hamilton, New Zealand.
[20] K. M. Faroun and A. Boukelif, "Neural network learning improvement using K-means clustering algorithm to detect network intrusions," April 17, 2006, http://www.dcc.ufla.br/infocomp/artigos/v5.3/art04.pdf
[21] S. Haykin, Neural Networks: A Comprehensive Foundation, New York: Macmillan, 1994.
[22] T. Poggio and F. Girosi, "A theory of networks for approximation and learning," Artificial Intelligence Lab., Center for Biological Information Processing, Mass. Inst. Technol., Cambridge, A.I. Memo 1140, C.B.I.P. Paper 31, 1989.
[23] J. Moody and C. J. Darken, "Fast learning in networks of locally-tuned processing units," Neural Computation, vol. 1, pp. 281-294, 1989.
[24] D. S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Systems, vol. 2, pp. 321-355, 1988.
[25] M. J. Embrechts, B. Szymanski, and K. Sternickel, "Introduction to scientific data mining: Direct kernel methods & applications," in Computationally Intelligent Hybrid Systems: The Fusion of Soft Computing and Hard Computing, S. J. Ovaska, Ed., New York: Wiley, 2004, ch. 10, pp. 317-362.
[26] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge, U.K.: Cambridge Univ. Press, 2000.
[27] T. Bäck, "Evolutionary algorithms in theory and practice," Ph.D. dissertation, Univ. Dortmund, Dortmund, Germany, 1994.
[28] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 3rd ed., New York: Springer-Verlag, 1996.

