
AN INTRUSION DETECTION SYSTEM USING FUZZY DATA MINING AND GENETIC ALGORITHMS

B. SAI PRAVEEN, K. SUBRAHMANYAM
Sai_praveen17@yahoo.com, rajaa.0321@yahoo.com
3RD YEAR CSE, ANIL NEERUKONDA INSTITUTE OF TECHNOLOGY AND SCIENCES, SANGIVALASA, VIZAG.

ABSTRACT: Intrusion detection systems are increasingly a key part of systems defense. Various approaches to intrusion detection are currently in use. Artificial intelligence plays a driving role in security services. This paper presents a dynamic, intelligent intrusion detection system model based on an AI approach that combines fuzzy logic and simple data mining techniques to process network data. The system combines two distinct intrusion detection approaches: 1) anomaly-based intrusion detection using fuzzy data mining techniques, and 2) intrusion detection using genetic algorithms.

1. INTRODUCTION: Information has become an organization's most precious asset. Organizations have become increasingly dependent on information, since more information is being stored and processed on network-based systems. Hacking, viruses, worms and Trojan horses are some of the major attacks. A significant challenge in providing an effective defense mechanism for a network is the ability to detect novel attacks or intrusions and to implement countermeasures. Intrusion detection is a critical component in securing information systems. It is implemented by an intrusion detection system (IDS), which can detect, prevent and react to attacks. Intrusion detection has become an integral part of the information security process.

2.0 INTRUSION DETECTION SYSTEMS

2.1 AN OVERVIEW OF CURRENT INTRUSION DETECTION SYSTEMS: Intrusion detection is defined [1] as the process of intelligently monitoring the events occurring in a computer system or network and analyzing them for signs of violations of the security policy. The primary aim of an IDS is to protect the availability, confidentiality and integrity of critical networked information systems. IDS are defined both by the method used to detect attacks and by the placement of the IDS on the network. An IDS may perform either misuse detection or anomaly detection and may be deployed as a network-based or a host-based system. This results in four general groups: misuse-host, misuse-network, anomaly-host and anomaly-network. Misuse detection relies on matching known patterns of hostile activity against databases of past attacks. Such systems are highly effective at identifying known attacks and vulnerabilities, but rather poor at identifying new security threats. Anomaly detection searches for something rare or unusual by applying statistical measures or artificial intelligence methods to compare current activity against historic knowledge.

Common problems with anomaly-based systems are that they often require extensive training data for their learning algorithms, and that they tend to be computationally expensive, because several metrics are often maintained and need to be updated on every system activity. Some IDS combine qualities from all these categories and are known as hybrid systems.

FIG 1 [19]

2.2 COMPUTER ATTACK CATEGORIES:

DARPA [2] categorizes attacks into five major types based on the goals and actions of the attacker.

DoS (denial-of-service) attacks try to restrict or deny the services provided by or to computer users. For example, in a SYN-flood attack, the attacker floods the victim host with more TCP connection requests than it can handle, leaving the host unable to respond even to valid requests.

Probe attacks attempt to get information about an existing computer or network configuration. In this type of attack, the attacker scans a network of computers to gather information or find known vulnerabilities.

Remote to local (R2L) attacks are caused by an attacker who has only remote access rights. These attacks occur when the attacker tries to get local access to a computer or network.

User to root (U2R) attacks are performed by an attacker who has user-level access rights and tries to obtain superuser access.

Data attacks are performed to gain access to information that the attacker is not permitted to access. Many R2L and U2R attacks have the goal of accessing secret files.

2.3 IDS DESIGN PRINCIPLES:

IDS are designed and implemented on modelled network systems. Several points should be defined and stated in advance in order to find a proper model for the network:

Normal behavior of a network system is the most dominant and frequent behavior of the network in a certain time period.
Anomaly within the network system is the least frequent and abnormal behavior of the network in a certain time period.

Modelling a dynamic and complex system such as a network is very difficult; for this reason, abstraction and partial modelling are used as a good solution. The network components can be divided into:

Host
User
Network environment

The user can in turn be divided into legitimate user and malicious user (intruder). Many other nested divisions could occur according to the designer's point of view and areas of focus. An intrusion detection system basically raises an alarm whenever an anomalous event occurs, which could be caused by an intruder to the system. These systems do not react equally at all times; false alarms can sometimes occur, and such an alarm is called a False Positive (FP). The lower the FP rate, the higher the value of the IDS. [3][4]

2.4 IDS DESIGN TRENDS

There are a number of different ways to classify IDS in order to distinguish between their different types. The most generic classification found for IDS is by:

Analysis approach
Placement of IDS

Under each of these categories several classifications are possible. [5]

2.4.1. ANALYSIS APPROACH: De Boer and Pels [6] gave three types of IDS which can be listed under this approach:

NIDS: Network-based IDS, which monitors the network for malicious traffic.
HIDS: Host-based IDS, which monitors the activities of a single host.
DIDS: Distributed IDS, which correlates events from different host-based or network-based IDS.

2.4.2. PLACEMENT OF IDS: In this respect IDS are usually divided into:

SIDS: Signature-based IDS, which studies attack patterns and defines a signature for each, enabling security specialists to design a defense against that attack.
AIDS: Anomaly-based IDS, which learns the usual behavior patterns of a network and suspects an attack once an anomaly occurs.

2.5 DATA CAPTURING USING SNORT: Snort is mainly a Network Intrusion Detection System (NIDS); it is open source and available for a variety of Unix systems. Snort can also be used as a sniffer to troubleshoot network problems. There are three modes in which Snort can be configured:

Sniffer mode simply reads packets off the network and displays them in a continuous stream on the console.
Packet logger mode logs the packets to disk.
Network intrusion detection mode is the most complex and configurable, allowing Snort to analyze network traffic for matches against a user-defined rule set and to perform several actions based on what it sees.

3. DATA MINING AND FUZZY LOGIC

3.1 DATA MINING: Data mining methods are used to automatically discover new patterns in large amounts of data [7]. Data mining is the automated extraction of previously unrealized information from large data sources for the purpose of supporting actions. The rapid development of data mining has made available a wide variety of algorithms, drawn from the fields of statistics, pattern recognition, machine learning and databases. In particular, data mining approaches have been proposed and used for anomaly detection.

3.1.1. ASSOCIATION RULES: Association rules were first developed to find correlations in transactions using retail data [8]. For example, if a customer who buys a soft drink (A) usually also buys potato chips (B), then potato chips are associated with soft drinks by the rule A->B. Suppose that 25% of all customers buy both A and B and that 50% of the customers who buy A also buy B. Then the degree of support for the rule is s=0.25 and the degree of confidence in the rule is c=0.50. Agrawal and Srikant developed the fast Apriori algorithm for mining association rules. The Apriori algorithm requires two thresholds, minconfidence and minsupport. These two thresholds determine the degree of association that must hold before a rule will be mined.
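The support and confidence figures in the soft drink example can be checked with a few lines of Python. This is only a minimal sketch: the basket data, the function names and the threshold values passed to `is_mined` are illustrative assumptions, and it does not implement the Apriori candidate generation itself.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of all transactions that contain every item in `items`."""
    wanted = frozenset(items)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    antecedent = frozenset(antecedent)
    s_a = support(transactions, antecedent)
    if s_a == 0.0:
        return 0.0
    return support(transactions, antecedent | frozenset(consequent)) / s_a


def is_mined(transactions, antecedent, consequent, minsupport, minconfidence) -> bool:
    """A rule is kept only if it reaches both thresholds."""
    both = frozenset(antecedent) | frozenset(consequent)
    return (support(transactions, both) >= minsupport
            and confidence(transactions, antecedent, consequent) >= minconfidence)


# Hypothetical toy data: 8 baskets, 4 contain A, 2 of those also contain B.
baskets = [frozenset(b) for b in (
    {"A", "B"}, {"A", "B"}, {"A"}, {"A", "C"},
    {"B"}, {"C"}, {"C"}, {"B", "C"},
)]

s = support(baskets, {"A", "B"})       # 2 of 8 baskets
c = confidence(baskets, {"A"}, {"B"})  # 2 of the 4 baskets with A
```

With this toy data the rule A->B has s = 0.25 and c = 0.50, the same values as in the example, so it would be mined with minsupport = 0.2 and minconfidence = 0.5 but not with minconfidence = 0.6.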

3.2 FUZZY LOGIC: Fuzzy logic was introduced as a means to model the uncertainty of natural language. Because of the uncertain nature of intrusions, fuzzy sets are widely used in discovering attack events while reducing the rate of false alarms. Basically, intrusion detection systems distinguish between two distinct types of behavior, normal and abnormal, which creates two distinct sets of rules and information. Fuzzy logic can create sets that have in-between values where the difference between two sets is not well defined. In this case the logic depends on linguistic terms, taking the minimum or maximum of the membership degrees of a set of events instead of evaluating crisp AND, OR or NOT operations in the if-then-else condition. This feature contributes strongly to reducing the false positive alarm rate of the system [9][10]. Applying fuzzy methods to the development of IDS yields some advantages compared to the classical approach, and fuzzy logic techniques have been employed in the computer security field since the early 1990s. Fuzzy logic provides some flexibility for the uncertain problem of intrusion detection and allows much greater complexity for IDS. Most fuzzy IDS require human experts to determine the fuzzy sets and the set of fuzzy rules, and these tasks are time consuming. If the fuzzy rules are generated automatically, however, less time is needed to build or update a good intrusion classifier. A dynamic fuzzy boundary is developed from labelled data for different levels of security.

4. ANOMALY DETECTION VIA FUZZY DATA MINING

Fuzzy logic is based on fuzzy set theory. In contrast to standard set theory, in which each element is either completely in or not in a set, fuzzy set theory allows partial membership in sets. This provides a powerful mechanism for representing vague concepts. Data mining methods are used to automatically learn patterns from large quantities of data. The integration of fuzzy logic with data mining methods helps to create more abstract patterns at a higher level than the data level. Patterns that are more abstract and less dependent on the data are helpful for intrusion detection.

In the intrusion detection domain, we may want to reason about a quantity such as the number of different destination IP addresses in the last 2 seconds. Suppose one wants to write a rule such as

If the number of different destination addresses during the last 2 seconds was high Then an unusual situation exists.

Using traditional logic, one would need to decide which values for the number of destination addresses fall into the category high. As shown in fig 4a, one would typically divide the range of possible values into discrete buckets, each representing a different set. The value 10, for example, is a member of the set low to the degree 1 and a member of the other two sets, medium and high, to the degree 0. In fuzzy logic, a particular value can have a degree of membership between 0 and 1 and can be a member of more than one fuzzy set. In fig 4b, for example, the value 10 is a member of the set low to the degree 0.4 and a member of the set medium to the degree 0.75. In this example, the membership functions for the fuzzy sets are piecewise linear functions. Using fuzzy logic terminology, the number of destination ports is a fuzzy variable (also called a linguistic variable), while the possible values of the fuzzy variable are the fuzzy sets low, medium, and high. In general, fuzzy variables correspond to nouns and fuzzy sets correspond to adjectives.
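Piecewise linear membership functions of this kind are easy to sketch in Python. The breakpoints below (7, 11, 12, 17, 22) are assumptions, since the figure is not reproduced here; they were picked only so that the value 10 gets the degrees quoted above (0.4 in low, 0.75 in medium). The function names are likewise hypothetical.

```python
def falling(x: float, full: float, zero: float) -> float:
    """Membership 1 up to `full`, decreasing linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising(x: float, zero: float, full: float) -> float:
    """Membership 0 up to `zero`, increasing linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


# Fuzzy sets for the fuzzy variable "number of different destinations in the
# last 2 seconds". All breakpoints are illustrative assumptions.
def low(x: float) -> float:
    return falling(x, 7, 12)


def medium(x: float) -> float:
    # Trapezoid: rises on [7, 11], flat on [11, 17], falls on [17, 22].
    return min(rising(x, 7, 11), falling(x, 17, 22))


def high(x: float) -> float:
    return rising(x, 17, 22)


def unusual_situation(x: float) -> float:
    """Degree to which the rule 'If count is high Then unusual' is activated."""
    return high(x)


memberships_at_10 = {"low": low(10), "medium": medium(10), "high": high(10)}
```

A count of 10 belongs partly to low and partly to medium and does not activate the rule at all, whereas a count of 20 activates it to the degree 0.6. With crisp buckets the same rule could only switch between 0 and 1.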

FIG 4a. NON FUZZY SETS

Using fuzzy logic, a rule like the one shown above could be written as

If the DP=high Then an unusual situation exists.

where DP is a fuzzy variable and high is a fuzzy set. The degree of membership of the number of destination ports in the fuzzy set high determines whether or not the rule is activated.

5. IDS USING GENETIC ALGORITHMS:

A genetic algorithm (GA) is a programming technique that mimics biological evolution as a problem solving strategy [11]. It is based on Darwin's principle of evolution and survival of the fittest, and optimizes a population of candidate solutions towards a predefined fitness [12][13]. A GA applies evolution and natural selection to a chromosome-like data structure and evolves the chromosomes using selection, recombination, and mutation operators. The process usually begins with a randomly generated population of chromosomes, which represent candidate solutions to the problem. The different positions of each chromosome are encoded as bits, characters or numbers. These positions can be referred to as genes. An evaluation function, known as the fitness function, is used to calculate the goodness of each chromosome with respect to the desired solution. During evolution, two basic operators, crossover and mutation, are used to simulate the natural reproduction and mutation of species. The selection of chromosomes for survival and combination is biased towards the fittest chromosomes [14][15][17].

The figure taken from [16] shows the structure of a simple genetic algorithm: it starts with the random generation of an initial population, then evaluates and evolves it through selection, recombination, and mutation. Finally, the best individual (chromosome) is picked as the final result once the optimization meets its target. Many authors and researchers are strongly drawn to genetic algorithms as a strong and efficient method used in different fields of artificial intelligence, noting that several AI techniques can be combined in different ways in different systems for several purposes. The genetic algorithm is employed to derive a set of classification rules from network audit data, and the support-confidence framework is used as the fitness function to judge the quality of each rule. The generated rules are then used to detect or classify network intrusions in a real-time environment.
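The overall flow of a simple genetic algorithm (random initial population, evaluation, selection, recombination, mutation, and picking the best individual) can be sketched as below. This is a generic illustration rather than the IDS rule learner itself: the bit-string encoding, the tournament selection, the default rates and the toy fitness function `sum` (which counts the 1 bits) are all assumptions made for the example.

```python
import random
from typing import Callable, List

Chromosome = List[int]


def evolve(fitness: Callable[[Chromosome], float],
           n_genes: int = 20,
           pop_size: int = 30,
           generations: int = 40,
           crossover_rate: float = 0.9,
           mutation_rate: float = 0.02,
           seed: int = 0) -> Chromosome:
    """Return the fittest chromosome seen over all generations."""
    rng = random.Random(seed)

    # Random generation of the initial population (genes encoded as bits).
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]

    def tournament() -> Chromosome:
        # Selection biased towards the fitter of two random individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = []
        while len(next_population) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                # Single-point crossover (recombination).
                cut = rng.randint(1, n_genes - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population
        best = max(population + [best], key=fitness)
    return best


# Toy problem: maximize the number of 1 bits in the chromosome.
best_individual = evolve(fitness=sum)
```

In an IDS setting the bit string would be replaced by an encoded classification rule and `sum` by a support-confidence fitness, as described in the text.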

5.1. A GA BASED INTRUSION DETECTION APPROACH:

To summarize what has been presented on AI-based IDS, the work of these systems is divided into two main stages. The first is the training stage, which provides the system with the information it initially requires. The second is the detection stage, where the system detects intrusions according to what was learned in the previous stage. Applying this to a GA-based IDS, the GA is trained to produce classification rules from previous network audit data. The second stage is applied in real time by classifying the incoming network connections according to the generated rules. Many systems, simple and advanced, have been proposed by researchers; to give a general idea of the components of such a system and its basic mechanism, the following three components are highlighted:

1-Data Representation

Genes should be represented in some format using different data types such as byte, integer and float. They may also have different data ranges and other features, given that the genes are generated randomly in each population-generating iteration. Genetic algorithms can be used to evolve rules for the network traffic; these rules are usually of the form if {condition} then {act} [16]. A rule basically contains an if-then clause, a condition and an act. The condition usually matches the current network behaviour against behaviour stored in the IDS, for example by comparing an intruder's source IP address and port number with ones already stored in the system. The act could be an alarm indicating that the intruder's IP address and port number are related to an attacker who is already known to the system. [16][8]

2-GA Parameters

A GA has some common elements and parameters which should be defined:

Fitness Function: According to [11], the fitness function is defined as a function which scales the value of an individual relative to the rest of the population. It identifies the best possible solutions among the candidates in the population.

GA Operators: Selection, mutation and crossover are the most effective parts of the algorithm, as they participate in the generation of each population.

Selection is the phase where the individuals of the population with better fitness are selected; the others are discarded.
Crossover is a process where each randomly selected pair of individuals exchanges parts of their chromosomes with each other, until an entirely new population has been generated.
Mutation flips some bits in an individual, and since any bit may be flipped, the change is hard to predict.

3-Detection Algorithm overview

In [8], a genetic algorithm has been presented which contains a training process. This algorithm is designed to generate a set of classification rules from the given input data. It follows the simple flow of genetic algorithms presented in the figure (the operation of GA).

PROCEDURE: Rule set generation using genetic algorithm.
INPUT: Network audit data, number of generations, and population size.
OUTPUT: A set of classification rules
PARAMETERS:
NAD: Network Audit Data
PS: Population Size
N: Number of records in training set

PSEUDO CODE:
Ruleset (NAD, PS, N)
{
    W1=0.2, W2=0.8, T=0.5;
    For each chromosome in PS
    Begin 1:
        A=0; AB=0;
        For each record in training set
        Begin 2:
            If (record matches chromosome) AB=AB+1;
            If (record matches condition part of chromosome) A=A+1;
        End 2
        Fitness=W1*AB/N+W2*AB/A;
        If (Fitness>T) Select chromosome into new population;
    End 1
    For each chromosome in new population
    Begin 3:
        Crossover (chromosome);
        Mutation (chromosome);
    End 3
    If (generations completed < number of generations) Goto Begin 1;
}
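The procedure above can be turned into a small runnable sketch. The weights W1, W2 and the threshold T are taken from the pseudocode; everything else is an assumption made for illustration: rules and records are tuples of small integers whose last field is the outcome, `WILDCARD` is a hypothetical gene value that matches any field, the toy records are invented, and when fewer than two chromosomes pass the threshold the sketch falls back to breeding from the whole population so that the loop can continue.

```python
import random

W1, W2, T = 0.2, 0.8, 0.5   # weights and threshold from the pseudocode
WILDCARD = -1               # assumption: a condition gene that matches anything


def matches_condition(rule, record):
    """True if every non-wildcard condition gene equals the record's field."""
    return all(g == WILDCARD or g == f
               for g, f in zip(rule[:-1], record[:-1]))


def fitness(rule, records):
    """Support-confidence fitness: W1 * AB / N + W2 * AB / A."""
    n = len(records)
    a = sum(1 for r in records if matches_condition(rule, r))
    ab = sum(1 for r in records
             if matches_condition(rule, r) and rule[-1] == r[-1])
    if n == 0 or a == 0:
        return 0.0  # rule matches nothing, so it carries no evidence
    return W1 * ab / n + W2 * ab / a


def crossover(p1, p2, rng):
    """Single-point crossover of two rules."""
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:]


def mutate(rule, domains, rng, rate=0.1):
    """Replace each gene by a random value from its domain with probability `rate`."""
    return tuple(rng.choice(domains[i]) if rng.random() < rate else g
                 for i, g in enumerate(rule))


def generate_rules(records, domains, pop_size=40, generations=20, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.choice(d) for d in domains)
                  for _ in range(pop_size)]
    selected = set()
    for _ in range(generations):
        # Selection: keep chromosomes whose fitness exceeds the threshold.
        survivors = [c for c in population if fitness(c, records) > T]
        selected.update(survivors)
        parents = survivors if len(survivors) >= 2 else population
        # Crossover and mutation produce the next population.
        population = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(parents, 2)
            population.append(mutate(crossover(p1, p2, rng), domains, rng))
    return sorted(selected, key=lambda c: fitness(c, records), reverse=True)


# Hypothetical audit records: (protocol, port_class, label), label 1 = attack.
records = ([(1, 1, 1)] * 6 + [(1, 0, 0)] * 2 +
           [(0, 0, 0)] * 8 + [(0, 1, 0)] * 4)
domains = [[WILDCARD, 0, 1], [WILDCARD, 0, 1], [0, 1]]
rules = generate_rules(records, domains)
```

For the toy data, the rule (1, 1, 1), read as "if protocol is 1 and port class is 1 then attack", matches 6 records, all of them attacks, so its fitness is 0.2 * 6/20 + 0.8 * 6/6 = 0.86 and it passes the threshold T = 0.5.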

6. CONCLUSION AND FUTURE WORK: In this review, we have integrated data mining techniques with fuzzy logic to provide new techniques for intrusion detection. We also presented two approaches: anomaly-based intrusion detection using fuzzy data mining techniques, and IDS using genetic algorithms. Genetic algorithms and fuzzy logic, along with neurocomputing techniques, are major parts of soft computing, a set of computing technologies aimed at producing the human-centered intelligent systems of tomorrow. Fuzzy logic techniques can dynamically control the parameter settings of genetic algorithms. In future work, these techniques could be implemented so that the resulting IDS works more efficiently, with a low false positive alarm rate, and detects intrusions effectively.
7. REFERENCES
1. Bace, R.G., Intrusion Detection, Technical Publishing (ISBN 1-57870-185-6).
2. MIT Lincoln Laboratory, 1999 DARPA Intrusion Detection Evaluation: Design and Procedures, DARPA Technical Report, Feb 2001.
3. Kabiri, P., and Ali A. Ghorbani, Research on Intrusion Detection and Response: A Survey.
4. Gorodetsky, V., I. Kotenko, and O. Karsaev, Multi-Agent Technologies for Computer Network Security: Attack Simulation, Intrusion Detection and Intrusion Detection Learning.
5. Luan Qinglin, Lu Huibin, Research of Intrusion Detection Based on Neural Network Optimized by Adaptive Genetic Algorithm, Computer Engineering and Design, vol. 29, no. 12, pp. 3022-3025, 2008.
6. De Boer, P., and Martin Pels, Host-Based Intrusion Detection Systems, Technical Report 1.10, Faculty of Science, Informatics Institute, University of Amsterdam, 2005.
7. Lee, W., S. Stolfo, and K. Mok, Mining Audit Data to Build Intrusion Detection Models, 1998.
8. Agrawal, R., and R. Srikant, Fast Algorithms for Mining Association Rules, 1994.
9. Yao, J.T., S.L. Zhao, L.V. Saxton, A Study on Fuzzy Intrusion Detection, Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security, 28 March-1 April 2005, Orlando, Florida, USA.
10. Gomez, J., and D. Dasgupta, Evolving Fuzzy Classifiers for Intrusion Detection.
11. Bobor, V., Efficient Intrusion Detection System Architecture Based on Neural Networks and Genetic Algorithms.
12. Li, W., Using Genetic Algorithms for Network Intrusion Detection.
13. Marczyk, A., Genetic Algorithms and Evolutionary Computation Techniques, 24 April 2004.
14. Song, D., A Linear Genetic Programming Approach to Intrusion Detection, GECCO 2003.
15. Sinclair, C., L. Pierce, S. Matzner, An Application of Machine Learning to Network Intrusion Detection System.
16. Li, W., Using Genetic Algorithm for Network Intrusion Detection, Proceedings of the United States Department of Energy Cyber Security Group 2004 Training Conference, May 24-27, 2004, Kansas City, USA.
17. Gong, R.H., M. Zulkernine, P. Abolmaesumi, A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection System.
18. Gong, R.H., M. Zulkernine, P. Abolmaesumi, A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection, Proceedings of the Sixth IEEE ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, May 2005, Maryland, USA.
19. Norbik Bashah Idris and Bharanidharan Shanmugam, Novel Attack Detection Using Fuzzy Logic and Data Mining.
