Anda di halaman 1dari 35

Intelligent Dictionary Based Encryption

And Compression Algorithm


(For Network Security and High Speed Text Data Transfer)

Hanly Y Nadackal
R7A 29
Guided By : Sindhu M P
16/09/’09
Current Scenario:-

 Computers - now a days are a part of a network


 Data residing inside a system is much important
 But, This Data is Vulnerable to exploits
 Of course, Data can be made secure within a
computer by the User
 Anyhow, The Data sent over a network is
vulnerable to external attacks…
So, Why Network Security?:-

 Confidentiality- Only sender, intended receiver


should “understand” message contents

 Authentication- Sender, receiver want to


confirm identity of each other

 Message Integrity- Sender, receiver want to


ensure message not altered (in transit, or
afterwards) without detection
Security now in practice:-

 Firewall- Is designed to block unauthorized


access while permitting authorized
communications

 Security features built up in Application, Transport,


Network, Link layers
Friends and enemies: Alice, Bob, Trudy:
 well-known in network security world
 Bob, Alice want to communicate “securely”
 Trudy (intruder) may intercept, delete, add
messages

Alice Bob
channel data, control
messages

data secure secure data


sender receiver

Trudy
Who might Bob, Alice be?

 … well, real-life Bobs and Alices!


 Web browser/server for electronic transactions
(e.g., on-line purchases)
 On-line banking client/server
 DNS servers
 Routers exchanging routing table updates
There are bad guys (and girls) out there!

Q: What can a “bad guy” do?


A: A lot!
 Eavesdrop: Intercept messages
 Actively Insert messages into connection
 Impersonation: Can fake (spoof) source address in
packet (or any field in packet)
 Hijacking: “Take over” ongoing connection by
removing sender or receiver, inserting himself in
place
 Denial Of Service(DoS): Prevent service from being
used by others (e.g., by overloading resources)
That’s the reason why we need the network
to be secure………………
Why Compression of data essential?:-

 There has been an unprecedented explosion in the


amount of digital data transmitted via the Internet
 It is estimated that in the year 2004 the National
Service Provider backbone will have an estimated
traffic around 30000Gbps and that the growth will
continue to be 100% every year.
 With this trend expected to continue, it makes
sense to pursue research on developing algorithms
that can most effectively use available network
bandwidth by maximally compressing data.
What Compression algorithms do?:-

 Compression algorithms reduce the redundancy in


data to decrease the storage required for that data.
 Offers an attractive approach to reduce the
communication costs by using available bandwidth
effectively.
So, What if the Encoding and compression
done in a single Algorithm?

 In early times we achieved this by using a “STAR


ENCODING”
Star Encoding:-

 Here each word has a star-encoded equivalent, in


which as many letters possible are replaced by the
'*’ character.
 For example, a commonly used word such ‘the’
might be replaced by the string t**.
 The star-encoding transform simply replaces ‘every
occurrence’ of the word ‘the’ in the input file with
t**.
Star Encoding continued….

 Ideally, the most common words will have the highest


percentage of '*' characters in their encoding.
 If done properly, this means that transformed file will
have a huge number of '*’characters.
 This ought to make the transformed file more
compressible than the original plain text.
Star Encoding continued….

 The existing star encoding does not provide any


compression as such but provide the input text a
better compressible format for a later stage
compressor.
 The star encoding is very much weak and
vulnerable to attacks.
Star Encoding continued….
Consider the following text taken from Romeo and Juliet:

“But soft, what light through yonder window breaks?


It is the East, and Iuliet is the Sunne,
Arise faire Sun and kill the enuious Moone,
Who is already sicke and pale with griefe,
That thou her Maid art far more faire then she...”
Star Encoding continued….

Running this text through the star-encoder yields:-

“B** *of*, **a* **g** *****g* ***d*r ***do* b*e***?


It *s *** E**t, **d ***i** *s *** *u**e,
A***e **i** *un **d k*** *** e****** M****,
*ho *s a****** **c*e **d **le ***h ****fe,
***t ***u *e* *ai* *r* f*r **r* **i** ***n s**…”
Star Encoding continued….

 You can clearly see that the encoded data has


exactly the same number of characters, but is
dominated by stars.
 It certainly looks as though it is more compressible
and at the same time does not offer any serious
challenge to the hacker!
 That is, the star encoding provide a compressible
form of output compromising the Security of data
transfer
Do an Algorithm gives both the “Security”
and “Compression” at the same time?

Yes!!!
The Algorithm is:
 “An Intelligent Dictionary Based Encoding and
Compression Algorithm”
 ..Which will offer higher compression ratios and
better security towards all possible ways of attacks
while transmission.
IDBE Algorithm:-

Consider the following paragraph:


“Our philosophy of compression is to trasfom the txt
into som intermedate form which can be
compresed with bettr efficency and which
xploits the natural redundancy of the language
in making this tranformation.”
 The above block is written with a lot of spelling
mistakes, but most people will have no problem to
read it…
IDBE Algorithm continued….

 Because our visual perception system recognizes


each word with an approximate signature pattern
and we have a dictionary in our brain, which
associates each misspelled word with a
corresponding, correct word.
 This is the core concept in the development of this
algorithm.
IDBE Algorithm continued….

The algorithm is developed in two steps:

I. Make an intelligent dictionary


II. Encode the input text data using the dictionary
IDBE Algorithm continued….

Dictionary Making Algorithm:


1. Extract all words from input files.
2. If a word is already in the table increment the number
of occurrence by 1, otherwise add it to the table and set
the number occurrence to 1.
3. Sort the table by frequency of occurrences in
descending order.

A
IDBE Algorithm continued….
A

4. Start giving codes using the following method:


i). Give the first 218 words the ASCII characters 33
to 250 as the code.
ii). Now give the remaining words each one
permutation of two of the ASCII characters (in the
range 33 - 250), taken in order. If there are any
remaining words give them each one permutation of
three of the ASCII characters and finally if required
permutation of four characters and so on.
5. Create a new table having only words and their
codes. Store this table as the Dictionary in a file.
6. Stop.
IDBE Algorithm continued….

Encoding Algorithm:

A. Read the dictionary and store all words and their


codes in a table
B. While inp is not empty
1.Read the characters from inp and form tokens.
2. If the token is longer than 1 character, then
1.Search for the token in the table
2. If it is not found,
Write the token as such in to the output file.

B
IDBE Algorithm continued….
B

3.Else(that is, if the token is found in the dictionary)


1. Find the length of the code for the word(which

is used when decoding is to be done)


2. Write the actual code into the output file.
3. Read the next token and neglect the it
if it is a space. If it is any other character,
go back to B

C
IDBE Algorithm continued….
C

3.Else(That is, if it is a ‘1 character token’)


1. Write the 1 character token
2. If the token is one of the ASCII characters
251 - 255, write the character once more so as to

show that it is part of the text and not a marker


C. Stop
IDBE Algorithm continued….
Consider the following data taken from “THE BIBLE”

“In the beginning God created the heaven and the earth.
And the earth was without form, and void; and darkness
was upon the face of the deep. And the Spirit of God
moved upon the face of the waters. And God said, Let
here be light: and there was light. And God saw the light,
that it was good: and God divided the light from the
darkness. And God called the light Day, and the darkness
he called Night. And the evening and the morning were
the first day.…”
IDBE Algorithm continued….
Running the text through the Intelligent Dictionary Based
Encoder (IDBE) yields the following text:

“û©û!ü%;ûNü’.û!ü".û"û!û.ÿ. û*û!û.û5ü"8ü"}ÿ, û"ü2Óÿ;


û"ü%Lû5ûYû!ü"nû#û!ü&.ÿ.û*û!ü%Ìû#ûNü&ÇûYû!ü"nû#û!
ü#Éÿ.û*ûNûAÿ, ü"¿û]û.ü".ÿ: û"û]û5ü".ÿ.û*ûNü"Qû!ü".ÿ,
û’û1û5û²ÿ: û"ûNü(Rû!ü".û;û!ü%Lÿ….”

 It is clear from the above sample output that the


encoded text provide a better compression and a stiff
challenge to the hacker!
Performance analysis:-

 The performance issues such as Bits Per Character


(BPC) and conversion time are compared for the two
cases i.e., Star encoding and our proposed Intelligent
Dictionary Based Encoding (IDBE).

 The results are shown below:


BPC(Bits Per Character) and Time comparison of *Encode & IDBE
BPC & Conversion time comparison of transform with *Encoding

and IDBE for the above data


Conclusion :-

 In an ideal channel, the reduction of transmission time


is directly proportional to the amount of compression.
 But in a typical Internet scenario with fluctuating
bandwidth, congestion and protocols of packet
switching, this does not hold true.
 But, Results have shown excellent improvement in text
data compression and added levels of security over
the existing methods.
References :-

 [ZiLe77] J. Ziv and A. Lempel. .A Universal


Algorithm for Sequential Data Compression., IEEE
Trans. Information Theory, IT-23, pp.237-243.
 [Welc84] T. Welch, .A Technique for High-
Performance Data Compression., IEEE Computer,
Vol. 17, No. 6, 1984.
 [Moff90] A. Moffat. .Implementing the PPM Data
Compression Scheme., IEEE Transactions on
Communications, COM-38, 1990, pp. 1917-1921.
Questions?

Anda mungkin juga menyukai