“DNA CHARACTERIZATION”
ABSTRACT

With growing Internet connectivity come growing opportunities for attackers to attack computers over the network. Research continues towards a completely dependable “Intrusion Detection” system, with most of it channeled towards developing new systems rather than adapting pre-existing ones. The research is mostly focused on strengthening security as opposed to recovery.

In this paper we present a new method in the field of Intrusion Detection, namely “DNA Characterization”, which draws its inspiration from the human genome, to detect intrusion as early as possible. The method is based on the “Teiresias” algorithm, which helps us generate a DNA sequence against which newly generated sequences are compared; if there are any conflicts, the user is informed that a possible intrusion is happening. This is illustrated by several examples. The long-term goal of our paper is not only to develop a system with more “dependability” but also to enable “survivability”, so that it can hold its place in the ever-challenging world of network intrusion and security.

1. INTRODUCTION

With the increasing use of personal computers in business and at home, the topic of computer security must be continually addressed. Only with adequate computer security can users be certain that their computers are not infected by malicious programs or being used for malicious purposes. For a computer to have a dangerous program installed on it, or to be used inappropriately, it must first be intruded upon. Computer intrusion may occur via an e-mail attachment, a download from a website, physically via a disc, or by unauthorized access. Two defense techniques are therefore employed against computer intrusion:

Intrusion Prevention: This scheme is used to prevent viruses, malicious software and unauthorized users from entering our computer. Protection of this nature can be accomplished with the use of
> Password controlled access.
> Software signature recognition.
> Firewalls.
But inevitably even the best intrusion prevention systems fail, due to the continuous evolution of malicious software and the persistence of unauthorized users.

Intrusion Detection: Due to the shortcomings of the above method, this defence scheme gains importance. In this paper, we define a method called COMPUTER DNA CHARACTERIZATION for determining whether our computer is being intruded upon or attacked through networks.
1.1 Why we use COMPUTER DNA CHARACTERIZATION?

The benefits of decoding the human genome are a better understanding of diseases and more efficient development of drugs and techniques to treat them. Early detection of diseases thus becomes possible. By analogy, computer characteristics such as network traffic, modification of system files, and modification of data files are determined by the computer’s DNA sequence. As with humans, the initial DNA sequences are predetermined on inception and ideally do not cause the system any harm. But any user trying to intrude into the system brings about an abnormal change in the generated DNA sequence, leading to early intrusion detection, which limits the damage and quickens the process of recovery.

A COMPUTER ILLUSTRATION

2. COMPUTER NETWORK TRAFFIC DNA SEQUENCES

2.1. Base Structure Definition

Before a computer DNA sequence can be generated, the structure of the base pairs must first be determined. There are two base structures:

2.1.1. Fundamental Base Structure:

The first base structure contains only three pieces of information, namely the amount of TCP, UDP, and ICMP traffic over a certain time period. Sequences generated from this base present only the most basic information about a computer system’s network activity. A sequence of this type is illustrated in Figure 1.

To retrieve information pertaining to packet type, the IP header must be examined. Although various protocols exist, we use only the TCP, UDP and ICMP protocols. Although very basic in nature, sequences based on this fundamental base structure can provide important information about computer network usage. For example, a sequence may be generated for a home computer that is commonly used to retrieve information from the Internet but is not used to download streaming audio or video files. This sequence would therefore predict average TCP network traffic but minimal UDP network traffic. If an attacker attempts to access this computer via a UDP flood or a similar attack, the increase in UDP network traffic would be detected and the user can be alerted to a possible intrusion.

2.1.2. Detailed Base Structure:

As with the fundamental base structure, this sequence will include information about the number of UDP packets and ICMP packets over a given time period. However, TCP packets are further separated into three segments.
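A minimal sketch of how the fundamental base structure, a triple of TCP, UDP and ICMP packet counts per time period, might be computed. It assumes packets have already been captured and reduced to (timestamp, protocol) pairs; the function name `fundamental_bases`, the input format and the `window_seconds` parameter are illustrative assumptions, not part of the method as described.

```python
from collections import Counter

PROTOCOLS = ("TCP", "UDP", "ICMP")  # the only protocols the base structure tracks


def fundamental_bases(packets, window_seconds):
    """Group (timestamp, protocol) pairs into fixed time windows.

    Returns one [tcp, udp, icmp] count list per window, in time order.
    `packets` is a hypothetical pre-parsed capture; other protocols are
    ignored. Empty windows between the first and last are [0, 0, 0].
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    counts = {}  # window index -> Counter of protocol names
    for timestamp, protocol in packets:
        if protocol not in PROTOCOLS:
            continue
        index = int(timestamp // window_seconds)
        counts.setdefault(index, Counter())[protocol] += 1
    if not counts:
        return []
    first, last = min(counts), max(counts)
    return [
        [counts.get(i, Counter())[p] for p in PROTOCOLS]
        for i in range(first, last + 1)
    ]


# Hypothetical capture: timestamps in seconds, protocol taken from the IP header.
packets = [(0.5, "TCP"), (1.2, "TCP"), (3.9, "UDP"),
           (6.0, "ICMP"), (7.5, "TCP"), (8.0, "ARP")]
bases = fundamental_bases(packets, window_seconds=5)
print(bases)
```

Each inner list corresponds to one base structure; a sudden jump in the UDP position relative to the usual values is the kind of change the text describes for a UDP flood.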
The logic behind using a structure of this form is to detect a common computer attack known as a SYN flood. Normally, communication between two computers is initiated by a three-way handshake. When the destination host receives a TCP packet with the SYN flag set, it replies with a SYN ACK packet and then waits for an ACK packet from the source host. While waiting for the ACK packet, a connection queue of finite size on the destination host keeps track of connections waiting to be completed. This queue typically empties quickly, since the ACK is expected to arrive a few milliseconds after the SYN ACK. An attacker exploits this design by generating numerous SYN packets with random source IP addresses. These packets are directed towards a victim, which replies with a SYN ACK packet to each random source IP address and adds an entry to its connection queue. Because the SYN ACK packets are sent to random IP addresses, the victim does not receive the final ACK packet and the last part of the three-way handshake is never completed. However, the entry remains in the connection queue until a timer expires, typically after about one minute. When the connection queue is full, legitimate users will not be able to access TCP services such as e-mail and web browsing.

Suggestions for improvement: Base Structure using Port Numbers

A base structure containing information about the computer ports being used for communications may be beneficial. Most network communications occur via commonly used ports such as ports 23, 80 and 25. A base structure could therefore enumerate the number of packets processed by commonly used ports as well as the packets received and sent by other ports.

3. DNA SEQUENCE GENERATION USING TEIRESIAS ALGORITHM

Once the base structures have been defined, sequences of these bases must be generated to form the Computer DNA using the Teiresias algorithm.

Algorithm:

Step 1: Collection of the raw data pertaining to network activity.
Step 2: The raw data must be processed into the two base structures. For example, [10, 3, 1] indicates 10 TCP, 3 UDP and 1 ICMP packets.
Step 3: After all base structures have been generated for a given time interval, repeated structures are determined via the Teiresias method.
Step 4: A collection of the commonly repeated base structures then forms the computer DNA sequence used to predict computer network activity. In Figure 4, the base structure [10, 3, 1] occurs 17 times in the given time period.

Once the computer DNA sequence is created, new base structures generated by real-time network activity can be compared to the structures contained within the DNA sequence. For example, if a new base structure of the form [1000, 500, 40] is generated for the predetermined time period but is absent from the DNA sequence of the computer in question, a flag would be raised indicating that a computer attack may be occurring.

Suggestions for improvement: DNA Evolution

Once the initial DNA sequence has been established, a mechanism should be developed so that the sequence is continuously updated by
> Adding sequences that are not currently present.
> Eliminating initial sequences that were absent in future monitoring.
By continuously evolving, the DNA sequence for a computer will more accurately predict the levels of computer network activity currently being experienced, by adapting to changes in users and user activities.

4. Parameters in Computer DNA Sequence Generation

Time period allocated to a particular instance of a base structure: Each base structure in Figure 4 indicates computer network activity for a time period of five seconds. Generally, if a longer time period is used to generate base structures, the number of packets that the computer system must process per base structure increases.

Time period for collecting base structures to create computer’s DNA: Base structures indicating computer network activity are collected for a certain time interval and then processed by Teiresias to determine commonly repeated structures. Generally, if a longer time interval is allocated for base structure collection, the number of repeated base structures will increase, thereby generating a larger DNA sequence.

Tolerance: The tolerance factor is an integer used as a parameter of a function that rounds a number to the closest multiple of the tolerance factor. It is included to allow similar numbers to be treated as equal.

Suggestions for improvement:

Time allocated per base structure:
(1). A shorter time period may be more effective for a computer subjected to continuously high levels of network activity.

Tolerance:
(1). If security requirements are higher, a lower tolerance factor should be used to generate a more specific DNA sequence capable of detecting smaller variations in network activity.
(2). If network activity volume is high, a larger tolerance factor should be used to limit the number of false positives.

period is large we get a larger DNA sequence
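The sequence generation, comparison and tolerance ideas described in Sections 3 and 4 can be sketched roughly as follows. This is a simplified illustration under stated assumptions: plain frequency counting stands in for the Teiresias pattern-discovery step, ties in rounding are resolved upwards, and the names `round_to_tolerance`, `build_dna`, `is_suspicious` and the `min_repeats` threshold are hypothetical rather than taken from the method itself.

```python
from collections import Counter


def round_to_tolerance(value, tolerance):
    """Round a non-negative integer to the closest multiple of `tolerance`.

    Ties round up (an assumption; the text does not specify tie handling).
    """
    if tolerance < 1:
        raise ValueError("tolerance must be a positive integer")
    return ((value + tolerance // 2) // tolerance) * tolerance


def normalize(base, tolerance):
    """Apply the tolerance rounding to every count in a base structure."""
    return tuple(round_to_tolerance(v, tolerance) for v in base)


def build_dna(bases, tolerance, min_repeats=2):
    """Return the set of rounded base structures seen at least `min_repeats` times.

    Simple counting is used here as a stand-in for Teiresias.
    """
    counts = Counter(normalize(b, tolerance) for b in bases)
    return {base for base, n in counts.items() if n >= min_repeats}


def is_suspicious(base, dna, tolerance):
    """Flag a new base structure whose rounded form is absent from the DNA."""
    return normalize(base, tolerance) not in dna


# Hypothetical [TCP, UDP, ICMP] base structures collected during normal use.
training = [[10, 3, 1], [11, 3, 1], [9, 2, 0], [10, 3, 1], [40, 0, 0]]
dna = build_dna(training, tolerance=5)

print(dna)
print(is_suspicious([12, 4, 1], dna, tolerance=5))
print(is_suspicious([1000, 500, 40], dna, tolerance=5))
```

With a tolerance of 5, [10, 3, 1] and [11, 3, 1] collapse to the same rounded structure, so small fluctuations do not raise a flag, whereas [1000, 500, 40] does. A smaller tolerance makes the comparison stricter, and a larger one more forgiving, in line with the suggestions above.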
5. NETWORK MONITORING