
Data Compression

LZ77 and LZ78 Algorithms


Subrahmanya Akhil Koundinya C
Dept. of Computer Science and Engineering
J.B.I.E.T
Hyderabad, India
akhil.cse7@gmail.com

Sravan Kumar Reddy M
Dept. of Electronics and Communications Engineering
J.B.I.E.T
Hyderabad, India
sravan.chat@gmail.com

Abstract: This document describes two data compression
algorithms, LZ77 and LZ78. Data compression, or bit-rate
reduction, involves encoding information using fewer bits than
the original representation. Compression can be either lossy or
lossless. Each type has advantages and disadvantages, which
depend on the purpose and area of deployment.

I. INTRODUCTION
LZ77 and LZ78 are named after Abraham Lempel and Jacob Ziv,
with the years of publication, 1977 and 1978, as suffixes. Both
are dictionary-based algorithms. LZ77 compression keeps track
of the last n bytes of data seen; when it encounters a phrase
that has already been seen, it replaces the phrase with a key
(a reference to the earlier occurrence). LZ78 is a
substitutional compression scheme that works by entering
phrases into a dictionary and then, when that particular phrase
occurs again, outputting the dictionary index instead of the
phrase.
II. EASE OF READING
A. Abbreviations and Acronyms
ZIP From "zip", meaning to move at high speed
PKZIP Derived from the company name PKWARE
GZIP The G stands for GNU
PNG Portable Network Graphics
LZW Lempel-Ziv-Welch
AVC Advanced Video Coding
MPEG Moving Picture Experts Group
ISO International Organization for Standardization
ITU-T International Telecommunication Union, Telecommunication Standardization Sector
IEC International Electrotechnical Commission
SMPTE Society of Motion Picture and Television Engineers

B. Units

Byte 8 bits
Mbps Megabits per second
Kbps Kilobits per second
III. ALGORITHMS

A. LZ77 pseudo code

begin
  fill view from input
  while (view is not empty) do
  begin
    find the longest prefix p of view that also occurs starting
      in the coded part (window), leaving at least one char of
      view after p
    i := position of p in window
    j := length of p
    X := first char after p in view
    output(i, j, X)
    move j+1 chars from view into the window and refill view
      from input
  end
end

B. LZ78 pseudo code

ENCODING
begin
  initialize a dictionary by the empty phrase
  initialize phrase F by the empty phrase
  while (not EOF) do
  begin
    readSymbol(X)
    if (F.X is in the dictionary) then
      F := F.X
    else
    begin
      output(pointer(F), X)
      add F.X to the dictionary
      initialize phrase F by the empty phrase
    end
  end
  if (F is not empty) then output(pointer(F))
end

DECODING
begin
  initialize a dictionary by the empty phrase
  while (not EOF) do
  begin
    read pair of index and character (i, X) from input
      (X is empty if the final pair carries no character)
    put new phrase phrase(i).X into the dictionary
    output phrase(i).X
  end
end
IV. DESCRIPTION
A. Figures
Fig. 1. Classification of compression

B. Lossy
Dropping non-essential detail from the data source can save
data storage space. Some of the examples: JPEG is a lossy
image format; digital cameras use lossy algorithms to increase
data capacity, and DVDs use the MPEG-2 video coding format for
video compression.
C. Lossless
Lossless data compression algorithms usually exploit
statistical redundancy to represent data more concisely
without losing information. Some of the examples are:
1. LZ77 and LZ78.
2. DEFLATE and LZW, which are variations on LZ.
3. DEFLATE is used in PKZIP, GZIP and PNG; LZW is used in GIF images.
4. ZIP and UNZIP use LZH techniques.

D. LZ77
LZ77 algorithms achieve compression by replacing
repeated occurrences of data with references to a single
copy of that data existing earlier in the uncompressed data
stream. A match is encoded by a pair of numbers called a
length-distance pair, which is equivalent to the statement
"each of the next length characters is equal to the characters
exactly distance characters behind it in the uncompressed
stream". (The "distance" is sometimes called the "offset"
instead.)
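The meaning of a single length-distance pair can be illustrated with a minimal Python sketch. The function name `copy_match` and the use of a plain `str` as the decoded buffer are illustrative assumptions, not part of the original algorithm description. This first version copies with a slice, so it deliberately rejects pairs whose length exceeds the distance; that case is discussed further below.

```python
def copy_match(output: str, distance: int, length: int) -> str:
    """Append `length` characters copied from `distance` characters back.

    Hypothetical helper for illustration. The slice-based copy assumes
    length <= distance, i.e. the whole source range already exists in
    the decoded data.
    """
    if not 0 < distance <= len(output):
        raise ValueError("distance must point into already-decoded data")
    if length > distance:
        raise ValueError("slice copy cannot handle length > distance")
    start = len(output) - distance
    # Each appended character equals the one exactly `distance` positions behind it.
    return output + output[start:start + length]


print(copy_match("abcde", distance=3, length=2))  # abcdecd
```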
To spot matches, the encoder must keep track of some
amount of the most recent data, such as the last 2 kB, 4 kB,
or 32 kB. The structure in which this data is held is called a
sliding window, which is why LZ77 is sometimes called
sliding window compression. The encoder needs to keep this
data to look for matches, and the decoder needs to keep this
data to interpret the matches the encoder refers to. The
larger the sliding window is, the farther back the encoder
can search when creating references.
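A brute-force sliding-window encoder makes the window idea concrete. This is a minimal sketch, assuming tokens of the form (distance, length, next character), which combines the length-distance pair described above with the "first char after p" of the Section III pseudo code; it reports a distance rather than a window position. The parameters `window_size` and `max_length` are illustrative assumptions, and real encoders use hashing or similar structures instead of scanning every window position.

```python
from typing import List, Tuple

Token = Tuple[int, int, str]  # (distance, length, next_char)


def lz77_encode(data: str, window_size: int = 4096, max_length: int = 255) -> List[Token]:
    """Brute-force LZ77 encoder over a sliding window (illustrative sketch).

    A token (d, l, c) means: copy l chars from d chars back, then emit c.
    (0, 0, c) means no match was found and c is emitted literally.
    """
    tokens: List[Token] = []
    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        window_start = max(0, pos - window_size)
        # Leave at least one character after the match to serve as next_char.
        limit = min(max_length, len(data) - pos - 1)
        for start in range(window_start, pos):
            length = 0
            # The match may run past `pos` into the data being encoded (overlap).
            while length < limit and data[start + length] == data[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - start
        tokens.append((best_dist, best_len, data[pos + best_len]))
        pos += best_len + 1
    return tokens


print(lz77_encode("abababX"))  # [(0, 0, 'a'), (0, 0, 'b'), (2, 4, 'X')]
```

Shrinking `window_size` limits how far back matches can be found, which is the trade-off described in the paragraph above.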
It is not only acceptable but frequently useful to allow
length-distance pairs to specify a length that actually
exceeds the distance. As a copy command, this is puzzling:
"Go back four characters and copy ten characters from that
position into the current position". How can ten characters be
copied over when only four of them are actually in the
buffer? If the copy is performed one byte at a time, the
request can be served without difficulty, because each byte,
once copied over, can be fed again as input to the copy
command. When the copy-from position reaches the initial
destination position, it is consequently fed data that was
pasted from the beginning of the copy-from position.
The operation is thus equivalent to the statement "copy the
data you were given and repetitively paste it until it fits".
As this type of pair repeats a single copy of data multiple
times, it can be used to incorporate a flexible and easy form
of run-length encoding.
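The "go back four, copy ten" request can be served with a byte-at-a-time copy. The following is a minimal sketch, assuming a `bytearray` output buffer; the function name `copy_match_overlapping` is hypothetical. Because each appended byte becomes available as a source for later iterations, the length may exceed the distance, which yields the run-length behaviour described above.

```python
def copy_match_overlapping(output: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes to `output`, each equal to the byte `distance` behind it.

    Hypothetical helper for illustration. Copying one byte at a time lets
    `length` exceed `distance`: bytes written earlier in this same call
    become the source for later ones.
    """
    if not 0 < distance <= len(output):
        raise ValueError("distance must point into already-decoded data")
    start = len(output) - distance
    for k in range(length):
        output.append(output[start + k])


buf = bytearray(b"abcd")
copy_match_overlapping(buf, distance=4, length=10)
print(buf.decode())  # abcdabcdabcdab
```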
Another way to see things is as follows: While encoding, for
the search pointer to continue finding matched pairs past the
end of the search window, all characters from the first
match at offset D and forward to the end of the search
window must have matched input, and these are the
(previously seen) characters that comprise a single run unit of
length LR, which must equal D. Then as the search pointer
proceeds past the search window and forward, as far as the
run pattern repeats in the input, the search and input pointers
will be in sync and match characters until the run pattern is
interrupted. Then L characters have been matched in total,
L>D, and the code is [D,L,c].
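The run case can be checked on a small input. This sketch assumes the run unit "xyz" (so D = LR = 3) has just been seen, and counts how far the match continues when the search pointer is allowed to run past the end of the search window; the helper name `run_match_length` and the sample string are illustrative assumptions.

```python
def run_match_length(data: str, pos: int, distance: int) -> int:
    """Count how many chars starting at `pos` equal the char `distance` back.

    Hypothetical helper for illustration. The comparison source moves past
    `pos` into the data being encoded, so the count can exceed `distance`
    for a repeating run of period D.
    """
    if not 0 < distance <= pos:
        raise ValueError("distance must point back into already-seen data")
    length = 0
    while (pos + length < len(data)
           and data[pos + length] == data[pos + length - distance]):
        length += 1
    return length


data = "xyzxyzxyzxyzW"
D = 3                                   # run unit "xyz", so D = LR = 3
L = run_match_length(data, pos=3, distance=D)
c = data[3 + L]                         # first character that breaks the run
print([D, L, c])  # [3, 9, 'W']
```

Here L = 9 > D = 3, which is the [D,L,c] situation described above.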

Upon decoding [D,L,c], again, D=LR. When the first LR
characters are read to the output, this corresponds to a single
run unit appended to the output buffer. At this point, the read
pointer could be thought of as only needing to return
int(L/LR) + (1 if L mod LR does not equal 0) times, that is,
ceil(L/LR) times, to the start of that single buffered run
unit, read LR characters (or fewer on the last return), and
repeat until a total of L characters are read. But mirroring
the encoding process, since the pattern is repetitive, the read
pointer need only trail in sync with the write pointer by a
fixed distance equal to the run length LR until L characters
have been copied to output in total.
Considering the above, especially if the compression of data
runs is expected to predominate, the window search should
begin at the end of the window and proceed backwards. First,
run patterns, if they exist, will be found first and allow the
search to terminate: absolutely if the current maximum matching
sequence length is met, or judiciously if a sufficient length
is met. Second, there is the simple possibility that the most
recent data may correlate better with the next input.
For example, look at table 2.

E. LZ78
To avoid the problems that occurred with LZ77, Ziv and
Lempel developed a different form of dictionary-based
compression. LZ78 abandons the concept of a text window.
In LZ77, the dictionary of phrases was defined by a
fixed-length window of previously seen text. Under LZ78, the
dictionary is a potentially unlimited collection of previously
seen phrases. LZ78-based schemes work by entering phrases into
a dictionary and then, when a repeat occurrence of that
particular phrase is found, outputting a token that consists of
the dictionary index instead of the phrase, as well as a single
character that follows that phrase. Unlike LZ77, there is no
need to pass the phrase length as a parameter, because the
decoder already has this information.
Look at figure 2 for an example.
A. Tables

TABLE I. TABLE FOR VIDEO COMPRESSION

Year | Standard | Publisher | Popular Implementations
1984 | H.120 | ITU-T | -
1988 | H.261 | ITU-T | Video conferencing, Video telephony
1993 | MPEG-1 Part 2 | ISO, IEC | Video-CD
1995 | H.262/MPEG-2 Part 2 | ISO, IEC, ITU-T | DVD Video, Blu-ray, Digital Video Broadcasting, SVCD
1996 | H.263 | ITU-T | Videoconferencing, Video telephones, Video on Mobile Phones (3GP)
1999 | MPEG-4 Part 2 | ISO, IEC | Video on Internet (DivX, Xvid...)
2003 | H.264/MPEG-4 AVC | Sony, Panasonic, Samsung, ISO, IEC, ITU-T | Blu-ray, HD DVD, Digital Video Broadcasting, iPod Video, Apple TV, videoconferencing
2009 | VC-2 (Dirac) | SMPTE | Video on Internet, HDTV broadcast, UHDTV

TABLE 2. AN EXAMPLE FOR LZ77 ALGORITHM
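The LZ78 scheme described in Section E can be sketched in Python as follows. This is a minimal sketch, assuming tokens of the form (index, next character) with index 0 standing for the empty phrase, an unbounded dictionary (matching the "potentially unlimited collection" above), and a final token whose character is `None` when the input ends inside a known phrase. The names `lz78_encode` and `lz78_decode` are illustrative, not from the original.

```python
from typing import Dict, List, Optional, Tuple

# (index of longest known phrase, next character); index 0 is the empty phrase.
# The last token may carry None if the input ends in the middle of a known phrase.
Token = Tuple[int, Optional[str]]


def lz78_encode(data: str) -> List[Token]:
    dictionary: Dict[str, int] = {"": 0}
    tokens: List[Token] = []
    phrase = ""                                   # current phrase F
    for ch in data:                               # readSymbol(X)
        if phrase + ch in dictionary:
            phrase += ch                          # F := F.X
        else:
            tokens.append((dictionary[phrase], ch))        # output(pointer(F), X)
            dictionary[phrase + ch] = len(dictionary)      # add F.X to the dictionary
            phrase = ""
    if phrase:                                    # flush a pending phrase at EOF
        tokens.append((dictionary[phrase], None))
    return tokens


def lz78_decode(tokens: List[Token]) -> str:
    phrases: List[str] = [""]                     # phrases[i] is dictionary entry i
    out: List[str] = []
    for index, ch in tokens:
        phrase = phrases[index] + (ch or "")      # phrase(i).X
        phrases.append(phrase)                    # put new phrase into the dictionary
        out.append(phrase)                        # output phrase(i).X
    return "".join(out)


tokens = lz78_encode("ABBCBCABA")
print(tokens)               # [(0, 'A'), (0, 'B'), (2, 'C'), (3, 'A'), (2, 'A')]
print(lz78_decode(tokens))  # ABBCBCABA
```

The decoder never receives a phrase length: it rebuilds the same dictionary as the encoder, one entry per token, which is why only the index and one character are transmitted.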
