
1

Digital Still Image and Digital Video


Definitions, DCT compression
Jouko Kurki, 6.1.2014

Copyright Jouko Kurki, 2005-2014
References:
Michael Robin and Michael Poulin, Digital Television Fundamentals, 2nd ed., 2000, ISBN 0-07-135581-2; Iain E. G. Richardson, H.264 and MPEG-4 Video Compression, Wiley, England, 2003, ISBN 0-470-84837-5; Jerry D. Gibson (ed.), Multimedia Communications, Academic Press, 2001, ISBN 0-12-282160-2; information from the Web and other documents.
DVF_2_DigitalImage_Video_Definitions_Compression_DCT.ppt
2
Digital image
A digital image is formed by an array of equal-size picture elements, PIXELS (or PELs).
Pixels can be square or non-square (non-square pixels are typical with video).
Each pixel holds a color value and a brightness value.
The color value is represented in the color space used, e.g. RGB.
In a typical RGB video system each color component (R/G/B) is coded with 8 bits and thus has 256 possible values (0..255). There can therefore be 256*256*256 = 2^(3*8) ≈ 16.7 million different colors, and the color depth is said to be 3*8 = 24 bits.

(Figure: a digital image as an array of pixels - horizontal resolution N pixels, vertical resolution M pixels.)
3
Image size and resolution
A picture with N horizontal and M vertical pixels contains N x M pixels in total.

For a good-quality 10x13 cm snapshot from a digital camera the pixel count would be e.g. 1300 horizontal and 1000 vertical pixels, i.e. N x M = 1 300 000 pixels = 1.3 Megapixels. If each pixel is coded with the 24-bit RGB color scheme, the image holds 1.3 M * 24 bits = 1.3 M * 3 bytes ≈ 3.9 MB of information.

However, a typical file size from such a camera is approximately 0.5 MB, so where is the error?
The answer is that the image information is strongly compressed, e.g. by the JPEG compression algorithm, to a fraction of about 10 %. (A sketch of this arithmetic follows below.)

The density of pixels in the (printed) picture is called resolution. A larger number of pixels in a given picture size makes the picture look sharper. The unit of resolution is pixels per inch (ppi). Printer resolution is defined in dots per inch (dpi), which gives the number of ink spots per inch. A typical figure for inkjet printers is 300 dpi, and a suitable resolution for the pictures is 100 ppi.
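As a rough sanity check of the numbers above, the arithmetic can be written out as a short Python sketch (the 1300x1000 pixel count and the 0.5 MB JPEG size are the example values from this slide, not measurements):

```python
# Minimal sketch: raw size of a 1300 x 1000 pixel, 24-bit RGB image vs. a typical JPEG file.
width, height = 1300, 1000          # pixels (assumed example values)
bits_per_pixel = 3 * 8              # 8 bits per R, G and B component

raw_bytes = width * height * bits_per_pixel // 8
raw_megabytes = raw_bytes / 1e6     # using 1 MB = 10**6 bytes

jpeg_megabytes = 0.5                # typical camera file size quoted on the slide
compression_ratio = raw_megabytes / jpeg_megabytes

print(f"Raw image size : {raw_megabytes:.1f} MB")       # ~3.9 MB
print(f"JPEG file size : {jpeg_megabytes:.1f} MB")
print(f"Compression    : {compression_ratio:.0f}:1 (~{100 / compression_ratio:.0f} % of original)")
```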

4
Picture sizes: TV monitor (CIF) and computer monitor (VGA) resolution formats

CIF formats:    Sub-QCIF    QCIF        CIF
No. of pixels:  128 x 96    176 x 144   352 x 288

VGA formats:    QQVGA      QVGA       VGA        SVGA       XGA         SXGA         UXGA         HDTV         QXGA
No. of pixels:  160 x 120  320 x 240  640 x 480  800 x 600  1024 x 768  1280 x 1024  1600 x 1200  1920 x 1080  2048 x 1536

The old PC monitor had a pixel size of 640x480 and an aspect ratio of 4:3. This still serves as the basis for many NTSC video applications. In Europe (the PAL world) the numbers are slightly different for TV and many video applications.
The table above summarizes the pixel sizes of many popular formats. Note that for TV (CIF) the numbers may differ between North America and Europe.
CIF = Common Intermediate Format: 352 x 288 picture size, with a 30 fps frame rate for video. Used e.g. for videoconferencing. 4CIF = 2x the pixel count in both dimensions -> 720x576 (digital TV resolution); QCIF = Quarter CIF, half the pixel count in both dimensions.
5
Color systems and Color Spaces
A color space is the system under which colors are defined; e.g. the RGB system is used in TV and PC monitors to define the color.

Two regions in our visual field that appear to have the same color need not have
the same spectrum.

Color reproduction schemes rely on the fact that any color visible by humans can
be approximated by the combination of a limited subset of visible light frequencies.

The main color spaces have at least three dimensions:
RGB (red, green, blue)
CMY (cyan, magenta, yellow)
HSB (hue, saturation, brightness)
HLV (hue, lightness, value)
XYZ (tristimulus)

Some printing schemes use more than three colors of ink:
CMYK (cyan, magenta, yellow, key)
CMYK+spot (cyan, magenta, yellow, key, special color)
Hexachrome (cyan, magenta, yellow, black, green, orange)

6
The additive color wheel (RGB color space)
The basic rules of additive color mixing:
red + green = yellow
green + blue = cyan
blue + red = magenta
red + green + blue = white

Additive color mixing is used in TV, PC monitors etc.
Used to express color: all possible colors are presented on the circle. Different colors (hue or tint) are expressed as degrees on the color wheel: 0° = red, 120° = green, 240° = blue. These are the primary colors (in Finnish päävärit); the colors between the primary colors are secondary colors (Fi. välivärit).

The absence of light is darkness; light is added to it to create the desired color.
Colors are superpositioned (overlapping lamps) or mixed from small elements (TV pixels, halftones).
Usually 8 bits / color (24 bits total), thus 2^24 ≈ 16.7 million colors.

7
Subtractive color mixing
The basic rules of subtractive color mixing:
cyan + magenta = blue
magenta + yellow = red
yellow + cyan = green
cyan + magenta + yellow = black
(the subtractive color wheel)

Used in printing, where inks are added.
The applied inks reflect certain wavelengths to give the appearance of the desired color.
In most cases, each of the four channels is a value between 0 and 255. While this provides up to 4,294,967,296 different colour combinations (32-bit), you actually only get the same number (16,777,216) of discrete colours as with 24-bit colour, because many of the possible colour combinations duplicate each other. E.g. grey can be presented as Cyan=128, Magenta=128, Yellow=128, Black=0, or as Cyan=0, Magenta=0, Yellow=0, Black=128.
CMYK is not able to present as many colours as RGB.
When most computer programs display a graphic that uses CMYK, they first convert the image to CMY by adding the value of the black channel to each of the other three and then removing the black channel. This CMY can easily be converted to RGB, as sketched below.
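A minimal sketch of the conversion described above, assuming 8-bit channels and simple additive folding of the K channel (real colour-management pipelines use ICC profiles and are considerably more involved):

```python
def cmyk_to_rgb(c, m, y, k):
    """Naive CMYK -> RGB conversion for 8-bit channels (0..255).

    Step 1: fold the black (K) channel into C, M and Y (clamped to 255).
    Step 2: CMY is the complement of RGB, so subtract from 255.
    """
    c = min(255, c + k)
    m = min(255, m + k)
    y = min(255, y + k)
    return 255 - c, 255 - m, 255 - y

# Both greys from the slide map to the same RGB value:
print(cmyk_to_rgb(128, 128, 128, 0))   # (127, 127, 127)
print(cmyk_to_rgb(0, 0, 0, 128))       # (127, 127, 127)
```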
8
HSL Color Space (Hue, Saturation, and Luminance)
The acronym stands for hue, saturation, and luminance. This method of describing colors is also known as HSB (hue, saturation, and brightness), HSI (hue, saturation, and intensity), or HSV (hue, saturation, and value).

The hue describes the position on the spectrum where the color is located (the angle on the color wheel), with red at the low end of the spectrum and violet at the high end. This number can be either an 8-bit value (0-255), a percentage (0-100 %), or a number between 0-359 (degrees on the color wheel).

The saturation describes how pure or vivid the color is, from gray at the low end to a fully saturated color at the high end. This number can be either an 8-bit value or a percentage.

The luminance (intensity or brightness) describes where the color falls on the scale between black and white. This method of describing color is easy for many artists to use, and it is usually used only in the interface of a graphics program. Once the graphic is saved, it is converted to RGB, palettized, or CMYK color.

The only time this color definition method is used natively is in color television, where it is referred to as YUV (Y-, U- and V-signals). The Y-signal represents the intensity and is the only part of the signal a black-and-white television set uses. The U- and V-signals define the color spectrum the color television uses to choose the color of each pixel.
9
CAMERA PICTURE / TV PICTURE FORMATION
AND INTERFACES
In cameras and the color TV system the picture is broken into RGB components in the still / video camera. This can be done with filters or prisms. The number of CCDs also varies: there can be 3 in high-quality / professional video cameras. In digital still cameras and lower-cost video cameras there is one CCD with a color filter in front of it for the different color components of the picture.
10
RGB <=> YCbCr signal processing
The matrix converts the RGB signal to luminance (brightness) Y:

Y = 0.299R + 0.587G + 0.114B

and two color-difference signals (Cr, Cb):

Cr = 0.713 (R - Y) = 0.500R - 0.419G - 0.081B   (V-signal)
     (R - Y = 0.701R - 0.587G - 0.114B)
Cb = 0.564 (B - Y) = -0.169R - 0.331G + 0.500B  (U-signal)

The Y, Cr and Cb signals are carried by the TV system to the TV receiver, which converts them back to RGB for display.

The human visual system (HVS) is less sensitive to lack of detail in color (chrominance = chroma) than in brightness (luminance = luma). To take advantage of this, the picture signal is divided into luminance and chrominance signals for separate treatment in compression, storage and transport. A sketch of the conversion is given below.
The conversion RGB <-> YCbCr (YUV) is linear and works in both directions. The display uses RGB.
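A minimal sketch of the matrix conversion above in Python (full-range 8-bit values; no CCIR-601 headroom or offset is applied here, whereas real systems scale Y to 16..235 and add an offset of 128 to Cr and Cb):

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one full-range RGB pixel to (Y, Cr, Cb) using the slide's matrix."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 0.713 * (r - y)          # = 0.500R - 0.419G - 0.081B  (V)
    cb = 0.564 * (b - y)          # = -0.169R - 0.331G + 0.500B (U)
    return y, cr, cb

def ycbcr_to_rgb(y, cr, cb):
    """Inverse of the above; the conversion is linear and exactly reversible."""
    r = y + cr / 0.713
    b = y + cb / 0.564
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

print(rgb_to_ycbcr(255, 255, 255))                 # white -> Y = 255, Cr = 0, Cb = 0
print(ycbcr_to_rgb(*rgb_to_ycbcr(200, 30, 100)))   # round-trips to ~(200, 30, 100)
```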
11
Y Cr Cb / Y/C / S-Video interfaces
(Figure: block diagrams of the analogue video interfaces. An RGB source plus stereo audio is fed through a matrix producing luma Y and the colour-difference signals B-Y (U) and R-Y (V). For composite video the chroma is QAM-modulated, combined with the luminance on one cable and modulated onto an IF carrier. In Y/C (S-Video) the luma (Y) and chroma (C) travel on separate wires, and in component video the Y, Cr, Cb (YUV) signals are carried separately.)
12
Digital Still Picture Standards
Most important standards:
JPEG (JPEG = Joint Photographic Experts Group)
Picture compression to about 10 % by DCT compression and quantization
Some degradation of quality
Very widely spread: digital cameras, e-mail attachments etc.
JPEG 2000
Picture compression by wavelets
Improves picture quality compared to JPEG
Special applications: mapping, satellite pictures etc.
GIF format
Used especially in animation and graphics on the web
8-bit color (palette of colors), saves space
13
JPEG
14
Syntax of non-hierarchical JPEG data
15
Picture and Video Transform Compression steps
1. In picture and video compression, compression is applied to the component signals Y, Cr and Cb, so this transform needs to be done first on the picture data. The picture is also divided into (typically) 8x8-sample BLOCKs.
2. The next step is color sub-sampling. This takes advantage of the human eye's lower capability to see fast color variations, so the color signals Cr and Cb can be presented with less accuracy. The process is called chroma subsampling.
3. After that the Y, Cr and Cb signals are transform coded with the Discrete Cosine Transform (DCT).
The DCT is a discrete form of the Fourier cosine transform. The cosine transform is the Fourier transform for the case where the signal to be transformed is symmetric.
In the Fourier transform the signal is usually in the time domain (t) and the transform domain is frequency f (Hz). In a picture transform the picture is in the spatial domain (e.g. coordinate x), and instead of frequencies we talk about spatial frequencies: a lot of variation within a small area of the picture means high spatial frequencies. Nevertheless the mathematics is much the same!
16
Chroma (color) sub-sampling
4:2:2 video (studio video)
4:2:0 video (Digital TV system / Europe, DVD video disc, DV tape)
In colour (chroma) sub-sampling the chroma signals are sampled at a lower frequency (i.e. there are fewer chroma pixels). The most used systems are:
4:4:4 There are 4 Cr and 4 Cb samples for every 4 Y samples, thus no chroma sub-sampling. Best quality for the most demanding studio use.
4:2:2 There are 2 Cr and 2 Cb samples for every 4 Y samples. This is the normal system for a lot of studio work.
4:2:0 There is one Cr and one Cb sample for every 4 Y samples (the name is odd for historical/technical reasons). This format is used in DV video compression, digital TV and on DVD video discs. It is also the recommendation for JPEG digital still images. A subsampling sketch follows below.
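A minimal sketch of 4:2:0 subsampling, assuming the chroma planes are reduced by simply averaging each 2x2 block (real encoders may use different filters and sample-siting conventions):

```python
import numpy as np

def subsample_420(chroma):
    """Average each 2x2 block of a chroma plane (H x W, H and W even) -> (H/2 x W/2)."""
    h, w = chroma.shape
    blocks = chroma.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

# Toy 4x4 Cb plane: 16 chroma samples become 4 after 4:2:0 subsampling.
cb = np.arange(16, dtype=float).reshape(4, 4)
print(subsample_420(cb))
```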
17
DCT for JPEG & MPEG-2 compression
JPEG, MPEG-1/2 and MPEG-4 use the Discrete Cosine Transform (DCT) for compression.
In the method an 8x8 pixel area of the picture is taken at a time. This is called a BLOCK. For chroma we consider 4:2:0 subsampling (used in JPEG, DTV and DVD).
A block of 8x8 pixels consists of (in 4:2:0):
8x8 luminance pixels of 8 bits each, and 4x4 chroma samples of 8 bits each for Cr and Cb.
Other color sampling structures and block sizes (like 4x4) are also possible.
In the 4:2:0 format, 4 luminance blocks and 1 chroma block each for Cr and Cb form a MACROBLOCK. This is arranged so that the color signal data can also be handled in 8x8 blocks for the DCT, and the same HW/SW can be used for luma and chroma.
Example 8x8 pixel BLOCK of pixel values (-128...+127)
18
Macroblocks and Scaling of Luminance Data
DCT is the basic method used in JPEG still image compression as well as in the MPEG-1/2 and MPEG-4 Visual video compression standards.
For compression the image is split into 8x8 pixel blocks (BLOCKs). The DCT is applied to these blocks to yield 8x8 blocks of DCT coefficients.
The DCT coefficients are calculated for Y, Cr and Cb separately.

Scaling of pixel data before and after the DCT:
RGB signal values are normally presented with 8 bits; some professional video applications use 10 bits.
When applying the DCT to a component video signal, Cr and Cb already have binary values between -127 and +127 (due to the way these components are calculated).
To simplify the DCT encoder / decoder, the luminance values Y are downshifted by subtracting 128 from each luminance pixel value. At the decoder 128 is added back to the pixel data.
After the DCT, the DC coefficient has a value range of -1024...+1016 (8 x the input value range); the AC coefficients have a range of approximately +/-1027.
Note that, corresponding to the analogue video signal range of 0...700 mV, the binary signal has a range of 16...235 according to CCIR-601. This allows some headroom; the exact values may vary between standards and systems. Compression / decompression systems should take this into account.
The DCT process is reversible. The accuracy of the calculation should be 13-14 bits to avoid round-off errors.
19
DCT coefficient calculation for JPEG and MPEG-1/2

The Discrete Cosine Transform (DCT) operates on an NxN pixel block (X matrix) and produces an NxN matrix of coefficients (Y matrix); N = 8 for JPEG and MPEG-1/2. The forward DCT (FDCT) for JPEG and MPEG-1/2 is given by:

F(u,v) = [C(u)·C(v)/4] · Σ_{j=0..7} Σ_{k=0..7} f(j,k) · cos[(2j+1)uπ/16] · cos[(2k+1)vπ/16]

f(j,k) = pixel values (luminance or chrominance) in the 8x8 pixel block
F(u,v) = coefficients of the 8x8 DCT block
u = normalized horizontal frequency (0 <= u <= 7)
v = normalized vertical frequency (0 <= v <= 7)
The scaling factors C(u) and C(v) are 1/√2 for u, v = 0, and 1 for u, v ≠ 0. E.g. when calculating F(0,7), we have C(0) = 1/√2 and C(7) = 1.
F(0,0) is the DC coefficient: it is the sum of all pixel values in the block multiplied by 1/8,

F(0,0) = 1/8 · Σ_{j=0..7} Σ_{k=0..7} f(j,k)

Pixel values range from -128 to +127, so the DC coefficient ranges from 1/8 · 64 · (-128) = -1024 to 1/8 · 64 · 127 = +1016. The DC coefficient represents the average value of the block (times 8); the other coefficients represent variation within the block.
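A direct, unoptimized implementation of the formula above, useful mainly for checking the numbers in the worked examples on the following slides (practical codecs use fast DCT algorithms instead):

```python
import numpy as np

def fdct_8x8(f):
    """Forward 8x8 DCT exactly as in the slide formula.

    f : 8x8 array of level-shifted pixel values (-128..127).
    Returns the 8x8 array of DCT coefficients F(u, v).
    """
    C = lambda i: 1 / np.sqrt(2) if i == 0 else 1.0
    F = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            s = 0.0
            for j in range(8):
                for k in range(8):
                    s += (f[j, k]
                          * np.cos((2 * j + 1) * u * np.pi / 16)
                          * np.cos((2 * k + 1) * v * np.pi / 16))
            F[u, v] = C(u) * C(v) / 4 * s
    return F

# Sanity check: a flat block of value 100 gives only a DC coefficient of 8 * 100 = 800.
flat = np.full((8, 8), 100.0)
print(np.round(fdct_8x8(flat), 1))   # F(0,0) = 800, all AC coefficients ~0
```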
20
Weighting Table
JPEG standard weighting table for luminance, Q(u,v). In MPEG-1/2 the weighting tables can be varied. Different weighting tables are used for luminance and chrominance.
After calculating the DCT coefficients, a weighting table is applied: each calculated DCT coefficient is divided by the corresponding value (same u and v indices) in the weighting table.
Weighting: the DCT coefficient matrix is divided element by element by the weighting table (different for luma and chroma) and the result is rounded to the nearest integer.
21
Quantizing & Rounding
The DCT itself is not lossy! But it enables lossy compression by weighting the high-frequency components more heavily (dividing them by larger values) than the low-frequency components; in effect this is low-pass filtering.
As a result the detail in the picture is reduced, but so is the amount of data. Using the properties of the Human Visual System (HVS) this can be done so that the degradation of visual quality is small while a very large reduction of data rate is achieved (on the order of 5:1).
Weighting: the DCT coefficient matrix is divided element by element by the weighting table (different for luma and chroma) and the result is rounded to the nearest integer.
After weighting, the high-frequency components become even smaller and many are rounded to zero. A sketch of this step follows below.
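A sketch of the weighting / quantization step, using the widely published example luminance quantization table from the JPEG standard (this table is informative, not mandatory; MPEG and individual encoders use their own tables):

```python
import numpy as np

# Informative JPEG luminance quantization (weighting) table, Q(u, v).
Q_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def quantize(F):
    """Divide each DCT coefficient by its weighting value and round to the nearest integer."""
    return np.rint(F / Q_LUMA).astype(int)

def dequantize(Fq):
    """Decoder side: multiply back by the weighting table (the rounding error remains)."""
    return Fq * Q_LUMA
```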
22
Weighting Table Examples
Examples of JPEG standard weighting tables.
In MPEG-2 the weighting tables can be varied at frame level; the tables then need to be transmitted with the video data.
The size of the weighting values can be used to adjust the picture data rate.
23
DCT example (1)
DCT coefficient calculation and weighting
24
DCT in detail - indexing (2)
Original scaled luminance data of an 8x8 pixel block, f(j,k).
Form of the resulting matrix of DCT coefficients, F(u,v); note the capital letter, as in the Fourier transform.
25
DCT coefficient calculation

The general DCT equation for JPEG and MPEG-1/2 reads:

F(u,v) = [C(u)·C(v)/4] · Σ_{j=0..7} Σ_{k=0..7} f(j,k) · cos[(2j+1)uπ/16] · cos[(2k+1)vπ/16]

where the scaling factors C(u) and C(v) are 1/√2 for u, v = 0, and 1 for u, v ≠ 0. E.g. when calculating F(0,7), we have C(0) = 1/√2 and C(7) = 1. With brackets around the inner sum:

F(u,v) = [C(u)·C(v)/4] · Σ_{j=0..7} { Σ_{k=0..7} [ f(j,k) · cos[(2j+1)uπ/16] · cos[(2k+1)vπ/16] ] }

Example procedure: when calculating ONE DCT coefficient F(u,v) we can first set j = 0 and calculate the inner summation. We take the pixel value f(j,k) = f(0,0) (the top-left value in the pixel block) and multiply it by the two cosine terms with the appropriate u and v values (j = 0, and k = 0 first). We then repeat this for k = 1...7 and sum all these values together; that is the first inner summation, i.e. a sum of 8 terms. Next we set j = 1 and again calculate and sum all 8 terms. This is repeated for all j = 0...7, i.e. 8 times. We then sum the eight inner sums together to get the sum of all 64 terms, and finally multiply the result by the scaling factor C(u)·C(v)/4.
All of this is repeated for each of the 64 coefficients of the 8x8 DCT coefficient matrix.
26
DCT coefficient calculation - DC coefficient
The DCT of an 8x8 pixel block produces an 8x8 DCT coefficient matrix:

F(u,v) = [C(u)·C(v)/4] · Σ_{j=0..7} Σ_{k=0..7} f(j,k) · cos[(2j+1)uπ/16] · cos[(2k+1)vπ/16]

with C(u), C(v) = 1/√2 for u, v = 0 and 1 for u, v ≠ 0.

DC coefficient: this is the DCT coefficient F(0,0) in the upper left corner of the DCT matrix. For F(0,0), u and v are 0, so both cosines are cos(0) = 1. What is left in the inner sum is the sum of the pixel values f(j,k) for one value of j; repeating this over all j gives the sum of all pixel values, scaled:

F(0,0) = [1/√2 · 1/√2]/4 · Σ_{j=0..7} Σ_{k=0..7} f(j,k) = 1/8 · Σ_{j=0..7} Σ_{k=0..7} f(j,k)

i.e. F(0,0) is the sum of all pixel values in the block multiplied by 1/8. Pixel values can be at most 127, so the maximum DC coefficient value is 1/8 · 64 · 127 = 1016 (and the minimum is 1/8 · 64 · (-128) = -1024). The DC coefficient represents the average value of the block; the other coefficients represent variation within the block.
27
Calculation of coefficients for u or v = 0 (1/2)

F(u,v) = [C(u)·C(v)/4] · Σ_{j=0..7} Σ_{k=0..7} f(j,k) · cos[(2j+1)uπ/16] · cos[(2k+1)vπ/16]

with C(u), C(v) = 1/√2 for u, v = 0 and 1 for u, v ≠ 0.

For u = 0 or v = 0 one of the cosine terms becomes one (cos(0) = 1). This allows simplification. E.g. if v = 0, we get:

F(u,0) = [C(u)·C(0)/4] · Σ_{j=0..7} { cos[(2j+1)uπ/16] · Σ_{k=0..7} f(j,k) }

In the calculation of the inner sum, j is constant, so the cosine term is the same for all terms (and can be taken out as a common factor); what is left is the sum of the pixel values over k = 0...7 (one line of the block), multiplied by one cosine factor. This is repeated for all 8 values of j, and the results are then summed together and scaled.
28
Calculation of coefficient for u or v=0 (2/2)
F(u,v) =[C(u) C(v)] /4 { f(j,k) * cos[(2j+1)u/16] * cos[(2k+1)v/16] }
Scaling factor C(u) and C(v) is 1/2 for v, u = 0, and 1 for u, v 0.

k=0..7 j=0..7
This becomes useful when calculating coefficients with u=0:
F(0,v) =[C(0) C(v)] /4 { f(j,k) * cos[(2k+1)v/16] }
k=0..7 j=0..7
Again in the calculation of the inner sum now k = constant, so the cosine
term is the same for all terms (and can be takes as common factor); and
then we have left sum of pixel values with j=0..7, i.e. sum of one rows pixel
values multiplied by one cosine factor. This is repeated for all 8 columns
(k=0..7) and the sums are then summed together and scaled.
In cases where u and v are both 0,these procedures cannot be used.
Mathematically in the DCT equation we can exchange the summation order in
DCT calculation (after all we calculate all the coefficients according to
equation, but summation order does not matter), so:
29
Review of the DCT process
DCT coefficient calculation and weighting:
1. Take an 8x8 block of Y, Cb or Cr pixel data, f(j,k).
2. Calculate the DCT coefficient matrix F(u,v), 8x8 values. This can be considered a two-dimensional Fourier-type transform of the pixel data block. A calculation accuracy of around 14 bits is needed; values are rounded to the nearest integer.
3. Divide each value of the DCT matrix by the corresponding value in the weighting table and round to the nearest integer -> normalized and quantized DCT coefficient values. These are what applications store or transmit.
4. The receiver (decoder) does the reverse process: instead of the forward DCT, the inverse DCT. The equation for the IDCT has essentially the same form as the forward DCT. A round-trip sketch of steps 1-4 follows below.
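A compact round-trip sketch of steps 1-4, using scipy's orthonormal DCT (which matches the scaling of the slide's FDCT formula) and a toy uniform weighting table; the table and the random test block are assumptions for illustration only:

```python
import numpy as np
from scipy.fft import dctn, idctn

# With norm='ortho', scipy's 8x8 dctn matches the slide's FDCT formula
# (including the C(u)C(v)/4 scaling), and idctn is its exact inverse.

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)   # hypothetical 8-bit luma block

f = block - 128                        # step 1: level shift to -128..127
F = dctn(f, norm='ortho')              # step 2: forward DCT

Q = np.full((8, 8), 16.0)              # step 3: toy uniform weighting table (assumption)
Fq = np.rint(F / Q)                    #         weight + round -> values to store/transmit

F_rec = Fq * Q                         # step 4 (decoder): de-quantize ...
f_rec = idctn(F_rec, norm='ortho')     #         ... and apply the inverse DCT
block_rec = np.clip(np.rint(f_rec) + 128, 0, 255)

print("max pixel error:", np.abs(block_rec - block).max())   # small; caused only by rounding
```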


30
Example of the calculation of one DCT coefficient
Calculation of F(3,0): F(u,v) => u = 3, v = 0
Pixel data f(j,k); weighting table Q(u,v).
After the DCT process we get F(u,v) = 157.6, rounded to 158. This is divided by the weighting table value Q(3,0) = 16 to give the normalized and quantized DCT coefficient Fq(3,0) = 2.
Scaling factors: C = 1 for u or v ≠ 0; C = 1/√2 for u or v = 0.
31
Zig-zag scanning, RLC and VLC coding
After the DCT, the remaining bit stream is reduced further:
1. RLC = Run-Length Coding. Runs of zeros are represented by the number of zeros followed by the next non-zero value. Instead of transmitting the zeros at the end of the block, an End Of Block (EOB) marker is inserted.

2. VLC = Variable-Length Coding. The more frequently occurring combinations are coded with fewer bits.

Note: the zig-zag scan shown here is the one used for JPEG! (A sketch of the scan order and run-length coding follows below.)
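A sketch of the zig-zag scan order and a simple (run, value) run-length coding with an EOB marker; the exact symbol format differs between JPEG and MPEG, so this only illustrates the principle:

```python
import numpy as np

def zigzag_order(n=8):
    """Return the (row, col) visiting order of the JPEG-style zig-zag scan."""
    # Sort positions by anti-diagonal; alternate the direction on every diagonal.
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_length_encode(Fq):
    """Encode quantized AC coefficients as (run_of_zeros, value) pairs plus an EOB marker.

    The DC coefficient Fq[0, 0] is returned separately (in real codecs it is
    DPCM-coded against the previous block's DC value).
    """
    scan = [Fq[r, c] for r, c in zigzag_order()][1:]   # AC coefficients in zig-zag order
    symbols, run = [], 0
    for v in scan:
        if v == 0:
            run += 1
        else:
            symbols.append((run, int(v)))
            run = 0
    symbols.append("EOB")                              # trailing zeros are not sent
    return int(Fq[0, 0]), symbols

# Toy example: a mostly-zero quantized block.
Fq = np.zeros((8, 8), dtype=int)
Fq[0, 0], Fq[0, 1], Fq[1, 0], Fq[2, 1] = 26, -3, 2, 1
print(run_length_encode(Fq))   # (26, [(0, -3), (0, 2), (5, 1), 'EOB'])
```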
32
Zig-Zag scanning, RLC, VLC coding
Note: the zig-zag scan shown here is the one used for JPEG!
33
VLC coding
34
What happens in DCT ?
We calculate the F(u,v) coefficients.
The top-left coefficient is F(0,0). Here u = v = 0, so the cosines in the DCT equation are all equal to one, and by the DCT equation F(0,0) = 1/8 * (sum of all pixel values). This represents the average luminance over the whole block (for luminance; over the macroblock for chrominance) multiplied by 8, so the maximum range is -1024...+1016. F(0,0) is the DC coefficient of the block / macroblock (DC = Direct Current, compare with the Fourier transform).
The AC coefficients (AC = Alternating Current) in turn measure the correlation of the pixel pattern against the cosine wave with the given u and v values. If the picture varies according to this cosine pattern we get a large DCT coefficient value; if not, the value is small. If the variation is in antiphase, the coefficient is negative.
Most of the AC coefficients (before rounding) are zero. What does that mean? It means that the picture does not have a lot of fast variation. An even area of the picture, with a single color, gives just the DC coefficients: one for luma (Y) and one for each of the two chrominance components (Cr and Cb). I.e. the matrix of 3 x 64 values is transformed to 3 values, and no error is made! This is because the spatial data is represented by frequency components, and for that kind of picture only the DC coefficients are non-zero. In this case we achieve 64:1 compression without making any error.
However, in quantization and rounding we deliberately cause some error. In weighting we divide the usually small AC coefficients by relatively large values, which often yields zeros after rounding to the nearest integer. Thus weighting causes some error, namely a reduction of the high-order, high-frequency components. In effect this means low-pass filtering, where the detail of the picture is reduced.
35
Illustration of the DCT coefficients with DCT basis patterns
4x4 DCT basis patterns used in Advanced Video Coding (AVC), MPEG-4 Part 10 / H.264.
8x8 DCT basis patterns.
36
DCT based compression process, continued
Depending on the detail of the picture, the DCT process can produce a varying amount of data. To make things work over a fixed-rate transmission channel, a buffer is needed, together with a feedback control that adjusts the weighting (quantization) parameters. E.g. for a vivid picture, larger quantization values are used, resulting in reduced picture quality.
An example might be sports programs, where quality can be clearly degraded if not enough transmission capacity is available.
In MPEG-2 the weighting tables can be varied at frame level. Naturally the weighting table values then need to be transmitted within the data.
37
Full chain of MPEG-1/2 compression
38
DCT decoder
The inverse DCT is similar in form to the forward DCT, with the roles of DCT coefficients and sample values exchanged, so the same hardware/software can be used for both operations.
First a reverse quantization is done: every DCT coefficient is multiplied by the weighting table value Q(u,v). The second step is the inverse DCT.
39
Errors in video compression
Errors can be calculated as differences between the original picture and the encoded/decoded video as follows.

In practice the Root Mean Square Error (RMSE) and the Peak Signal-to-Noise Ratio (PSNR) are calculated: the error of each pixel value is computed, the rms error is calculated from these, and it is related to the maximum value of 255 (for 8-bit video). A sketch follows below.
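A minimal sketch of RMSE and PSNR between an original and a reconstructed 8-bit image, using the usual definition PSNR = 20·log10(255/RMSE) (assumed here, since the slide does not spell out the formula):

```python
import numpy as np

def rmse_psnr(original, decoded, peak=255.0):
    """Root-mean-square error and peak signal-to-noise ratio for 8-bit images."""
    err = original.astype(float) - decoded.astype(float)
    rmse = np.sqrt(np.mean(err ** 2))
    psnr = float("inf") if rmse == 0 else 20 * np.log10(peak / rmse)
    return rmse, psnr

# Toy example: a reconstruction that is off by at most +/-2 per pixel.
rng = np.random.default_rng(1)
orig = rng.integers(0, 256, size=(64, 64))
dec = np.clip(orig + rng.integers(-2, 3, size=orig.shape), 0, 255)
print(rmse_psnr(orig, dec))   # RMSE ~1.4, PSNR ~45 dB
```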
40
JPEG 2000 compression: Wavelet (1)
The wavelet transform is the newer transform technique used e.g. in JPEG 2000.
The idea is based on successively decomposing the picture data into higher and lower frequency bands, i.e. sub-band formation.
41
Wavelet (2)
First level decomposition
Sub-band structures in verification model
Refinement of decomposition
42
Visualization of Wavelet Transform
43
Picture after wavelet decomposition
44
VIDEO DEFINITIONS
45
Video is a sequence of pictures
The aspect ratio = physical picture width / picture height. For normal TV this is 4:3 and for widescreen TV 16:9. The pixels can be square or non-square; e.g. in widescreen TV the pixels are stretched horizontally. Thus the pixel aspect ratio does not need to be the same as the physical aspect ratio.
Picture size can also be given as a physical size, e.g. 50 cm x 67 cm. Resolution is then sometimes defined as pixels/inch or pixels/cm. For printers the common definition is dots/inch (dpi).
In video we need to define the spatial resolution of one picture of the sequence and the temporal resolution, i.e. how many pictures per second are transmitted.
The accuracy of the picture is called resolution, and it is defined as the number of horizontal and vertical picture elements (pixels). E.g. the resolution of a European (PAL) TV picture is 576x720 (576 lines, each containing 720 pixels), so there are in total about 0.415*10^6 pixels = 0.415 Megapixels = 0.415 MPix.

46
Progressive (p) and Interlaced video (i)
The other basic property of video concerns the repetition of frames. The possibilities for transmitting frames are progressive mode (p) and interlaced mode (i).
Frame rate is the number of full frames transmitted and displayed per second, unit frames/second = fps or f/s. In European (PAL) TV the frame rate is 25 fps; in the North American TV system (NTSC) it is about 30 fps (29.97 fps, 59.94 fields/s).

The video frames can be full frames containing all the information of the frame. This is called progressive video, abbreviated p. E.g. the marking 25p means progressive video with a frame rate of 25 fps.
In interlaced video the frame is split into two fields: a top field containing e.g. the odd-numbered lines (1, 3, 5, ...) and a bottom field containing the even-numbered lines (2, 4, 6, ...). Fields are transmitted at twice the frame rate. In this way the visual repetition rate appears to the eye as 50 Hz, while full frames are produced at a frequency of 25 Hz.


47
PAL and NTSC TV resolutions

PAL = European standard, also used in the Far East. NTSC = North America and Japan.
PAL and NTSC differ in the number of lines, the frame rate and in how the color signal is modulated. Different versions of PAL differ mainly in the sound carrier frequency.
The digital sampling frequency is the same for both, 13.5 MHz.
Both have the same number of active pixels per line (720), so no horizontal interpolation is needed in conversion, but careful design of the line blanking is needed. Conversion is a problem for the consumer: vertical interpolation or equivalent is needed.
Widescreen TV (aspect ratio 16:9) uses the same 720 horizontal pixels as 4:3 TV, but the pixels are stretched (non-square): pixel aspect ratio 1.422 vs. 1.067 in 4:3 TV.

PAL: 625 lines / 50 half-frames (fields) per second / 25 fps. 625 total lines, of which 576 carry picture information; 864 total samples per line, 720 active horizontal pixels. Aspect ratio 4:3 (clean-aperture pixel array 690x566).
NTSC: 525 lines / 59.94 half-frames (fields) per second / 29.97 fps. 525 total lines, of which 480 carry picture information; 858 total samples per line, 720 active horizontal pixels. Aspect ratio 4:3 (clean-aperture pixel array 708x480).

48
ITU-R BT.601-5 specification for NTSC (30 Hz) and
PAL (25 Hz) video
Uncompressed SD-quality digital video has a bit rate of 216 Mb/s. This does not fit through any ordinary bit pipe, so data reduction is absolutely necessary.
HD video in uncompressed form takes > 1 Gb/s. Very efficient compression is the key to making this practical for consumers.
BUT: technologies now exist to do this, and at low cost!
49
Video bitrate and compression
Video is a sequence of pictures. The picture (or frame) rate is typically 10-30 frames per second (fps).
In PAL TV the frame rate is 25 fps and the resolution 576x720 pixels (the eye sees this as continuous movement). Low-quality videoconferencing uses some 10-15 fps (jerky picture) at a resolution of 176x144 pixels (QCIF).
Consider TV resolution. The amount of picture data is 576*720 pixels x 3 x 8 bits (each RGB color component coded with 8 bits) = 0.415 Mpixels x 24 bits/pixel = 9 953 280 bits ≈ 1.24 MB per frame.
When this is repeated at 25 fps the bit rate is 1.24 MB * 25 /s = (1.24 * 8) * 25 ≈ 250 Mbit/s! (The arithmetic is sketched below.)
This is huge, so compression is necessary. The current technology is MPEG-2 (MPEG = Moving Picture Experts Group) compression in the digital TV system (and on DVD). This compresses the data to less than 5 Mb/s, a ratio of about 50:1.
Improvements in compression (to about 100:1) have been obtained with newer codecs such as MPEG-4 AVC (= H.264) and Windows Media. Note, however, that compression efficiency is not the only variable: the processor power needed for encoding and decoding (decompression) is also an important parameter for mobile devices.
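The bitrate arithmetic above as a short sketch (the PAL SD resolution and the ~5 Mb/s MPEG-2 figure are the values quoted on this slide):

```python
# Uncompressed PAL SD bitrate vs. a typical MPEG-2 broadcast bitrate (slide values).
width, height, fps = 720, 576, 25
bits_per_pixel = 3 * 8                                  # RGB, 8 bits per component

bits_per_frame = width * height * bits_per_pixel        # 9 953 280 bits
raw_bitrate = bits_per_frame * fps / 1e6                # in Mbit/s

mpeg2_bitrate = 5.0                                     # Mbit/s, digital TV / DVD
print(f"Uncompressed: {raw_bitrate:.0f} Mbit/s")        # ~249 Mbit/s
print(f"MPEG-2      : {mpeg2_bitrate:.0f} Mbit/s  (ratio ~{raw_bitrate / mpeg2_bitrate:.0f}:1)")
```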

50
Digital video compression - the key enabler for digital video applications

Reference point: uncompressed video at PAL standard TV resolution (SD) is 216 Mb/s, and High Definition video about 1.2 Gbit/s!
Thus efficient compression of video without major degradation of quality is the key to applications.
Picture quality vs. compression ratio, basic estimates:
10:1    ~20 Mbit/s   High definition (MPEG-2)
20:1    ~10 Mbit/s   Enhanced definition
40:1    ~5 Mbit/s    PAL digital TV and DVD (MPEG-2)
100:1   ~2 Mbit/s    VHS quality
Newer technologies, e.g. H.264, Microsoft Windows Media, Apple QuickTime and Real's RealVideo, achieve even better compression efficiencies.
In compression, the efficiency and cost of hardware, battery power consumption, video quality and the transfer bitrate / required storage space are all related. In general the older methods (MPEG-1/2) are not as efficient, but playback also works on older computers with less CPU power.
In audio the compression ratios are around 10:1 (e.g. MP3). The standard formats used today in digital TV and DVD (stereo) are MPEG-1 Layer II (224 kb/s) and Dolby Digital (AC-3) at around 192 kb/s. Some newer technologies (MPEG-4 audio, Windows Media etc.) achieve even better efficiencies.
For wireless transmission a high compression ratio is the key issue, due to the limited bandwidth of the transmission channel.

51
Video compression - workflow and tools
Key elements: spatial and temporal correlation of the picture data; flow of motion -> motion vectors.
Video -> Encoding -> Transmission or storage -> Decoding -> ~Original video
52
VIDEO COMPRESSION STEPS
1. Data reduction by colour sub-sampling. Due to the human visual system, the resolution of the colour signals needs to be only roughly 50 % of that of the luminance signal. The common format is 4:2:0, which takes studio video of approximately 216-249 Mb/s down to ~124 Mb/s: about 2:1 compression.
2. Data reduction by intraframe compression using the Discrete Cosine Transform (DCT): 124 Mb/s -> approximately 25 Mb/s (the data rate of DV video): about 5:1 compression.
3. Data reduction using the similarity between successive frames: e.g. in PAL MPEG-2 a Group Of Pictures (GOP) of 12 frames is coded using motion estimation and interpolation. Data is reduced from approx. 25 Mb/s -> 5 Mb/s, so about 5:1 compression.

Total compression: 2 * 5 * 5 = 50:1!

Newer compression tools (MPEG-4 Part 10 / H.264, Windows Media, RealMedia, QuickTime) achieve even better efficiency, up to about 100:1.
53
Temporal and spatial video coding
Spatial compression (intraframe) is based on the Discrete Cosine Transform (DCT) in most video standards.
Temporal compression is based on motion estimation and the transmission of motion vectors.
The coding process for MPEG-1 and MPEG-2 is in many respects similar to JPEG: the same DCT process, quantization, RLC and VLC. For video, temporal redundancy coding by motion estimation is the main new element.
54
Chroma (color) sub-sampling
4:2:2 video (studio video)
4:2:0 video (Digital TV system / Europe, DVD video disc, DV tape)
In colour (chroma) sub-sampling the chroma signals are sampled at a lower frequency (i.e. there are fewer chroma pixels). The most used systems are:
4:4:4 There are 4 Cr and 4 Cb samples for every 4 Y samples, thus no chroma sub-sampling. Best quality for the most demanding studio use.
4:2:2 There are 2 Cr and 2 Cb samples for every 4 Y samples. This is the normal system for a lot of studio work.
4:2:0 There is one Cr and one Cb sample for every 4 Y samples (the name is odd for historical/technical reasons). This format is used in DV video compression, digital TV and on DVD video discs. It is also the recommendation for JPEG digital still images.
55
CODING OF COMPONENT VIDEO SIGNALS AND CHROMA SUBSAMPLING STRUCTURES (Y, Cr, Cb)

4:4:4  For every four Y sample points, four Cr and four Cb sample points.
4:2:2  For every four Y sample points, two Cr and two Cb sample points.
4:1:1  For every four Y sample points, one Cr and one Cb sample point.
4:2:0  For every four Y sample points, two Cr or Cb sample points in turn; the Cr and Cb sample points are taken from the underlying (alternating) lines.
56
Intraframe compression - DCT
The first key element in video compression is intraframe compression, i.e. reducing the picture data within one video frame.
The technology used is the Discrete Cosine Transform (DCT), the same as used for JPEG.
The difference in the MPEG methods is that different weighting tables and zig-zag scans may be used.
Video compression systems also generally include a feedback system that adjusts the quantization level to maintain a constant bitrate.

For the DCT, see the previous slides on the DCT process.
57
Removing Temporal Redundancy
The next step in compression is the reduction of the temporal redundancy between successive frames.
Idea: in video, successive pictures often change relatively little. We therefore transmit only the first full frame, and for the following frames only the changes.
In MPEG-2 this is typically done in groups of 12 frames for PAL (15 frames for NTSC), called a Group of Pictures (GOP). This corresponds to about 0.5 seconds of video.
In Windows Media the GOP length can be several seconds, e.g. 3-5 seconds.
What do we lose? In quality, theoretically nothing, since the differences to previous pictures can be transmitted nearly perfectly. However, the associated motion search takes a lot of computing power.
But: if there are transmission errors, several of the pictures in a GOP will be degraded, resulting in severe degradation of picture quality. This gets worse with a longer GOP (compare Windows Media and MPEG).
To reduce these effects, very strong error correction is needed in the transmission system. A Quasi Error Free (QEF) channel (BER ~10^-12) is needed, resulting in < 1 error per hour.
58
Temporal data reduction
(Figure: Frame 1, Frame 2 and their difference.)
Successive video frames typically differ from each other only to some extent. The biggest change is often movement, i.e. parts of the picture have moved to different positions.
The example shows that simply subtracting successive pictures results in a dramatically reduced amount of data.
This process is called Differential PCM (DPCM). Many modern codecs use a combination of DPCM and DCT to achieve good coding efficiency (MPEG-1/2, MPEG-4 etc.), but add a technology called motion estimation to improve efficiency further.
59
Differential PCM for video
Calculate the difference between successive frames; the difference is transmitted. Straightforward, but it does not meet today's efficiency requirements on its own. However, DPCM is used for the DC coefficients of successive macroblocks. A frame-difference sketch follows below.
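A minimal frame-differencing sketch (plain DPCM between frames, no motion compensation); the random frames are placeholders for real video:

```python
import numpy as np

rng = np.random.default_rng(2)
frame1 = rng.integers(0, 256, size=(72, 88)).astype(np.int16)  # hypothetical luma frames
frame2 = frame1.copy()
frame2[20:40, 30:50] = np.clip(frame2[20:40, 30:50] + 5, 0, 255)  # only a small area changes

residual = frame2 - frame1              # what plain DPCM would transmit
print("non-zero residual samples:", np.count_nonzero(residual), "of", residual.size)

reconstructed = frame1 + residual       # decoder adds the residual to the previous frame
assert np.array_equal(reconstructed, frame2)
```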
60
Motion estimation
Motion estimation is the key process for temporal data reduction; it takes advantage of the similarity of successive video frames.
The idea of motion estimation is to form a prediction of the new picture from the previous picture, in which every macroblock (16x16 pixels in MPEG-1/2) has been moved to its new position in the current picture. This forms an improved estimate of the new picture.
Motion estimation is applied to the luminance signal only; the same movement is assumed for chroma.
The process results in motion vectors for every macroblock, indicating the position of each macroblock in the current frame relative to its position in the previous frame. This prediction is then subtracted from the current frame, and the residual is transmitted (in the form of DCT coefficients) together with the motion vectors to the receiver.
61
Model for DPCM / DCT video compression with motion compensation
This is the main method used e.g. in MPEG-1/2 and MPEG-4.

Step 1: Intra-frames every N frames (typically every 12 frames in PAL video).
Step 2: Motion estimation for each macroblock -> prediction for the next frame.
Step 3: Subtract the prediction from the current frame. The residual contains only little information.
Step 4: Apply the DCT to the residual.
Step 5: Transmit the residual as DCT coefficients, together with the motion vectors.
The decoder does the reverse.
62
Motion vector search
An effective search for motion is a key element of a codec. Its effectiveness depends on the search area and on the accuracy of the motion estimation.
As an example, in MPEG-2 the search area is 64x64 pixels for a 16x16-pixel macroblock, and the accuracy of the motion estimation is 0.5 pixels.
Motion estimation can be based on block matching, where the current block is compared with all possible search positions in the previous frame. The rms error at each position is calculated, and the best position is the one with the smallest error. A block-matching sketch follows below.
The intelligence of the motion estimation is a differentiating factor between codecs.
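A brute-force block-matching sketch with integer-pixel accuracy and a mean-squared-error criterion (real encoders use sub-pixel refinement and much smarter search strategies; the small search range here is an assumption for readability):

```python
import numpy as np

def best_motion_vector(prev, cur, top, left, block=16, search=8):
    """Find the (dy, dx) that best matches cur's block at (top, left) in prev.

    Exhaustive search over +/-search pixels, minimizing mean squared error.
    """
    target = cur[top:top + block, left:left + block].astype(float)
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                continue                      # candidate falls outside the previous frame
            cand = prev[y:y + block, x:x + block].astype(float)
            err = np.mean((target - cand) ** 2)
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best, best_err

# Toy test: the current frame is the previous frame's content shifted by (3, -2).
rng = np.random.default_rng(3)
prev = rng.integers(0, 256, size=(64, 64))
cur = np.roll(prev, shift=(-3, 2), axis=(0, 1))
print(best_motion_vector(prev, cur, top=24, left=24))   # -> ((3, -2), 0.0)
```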

63
Types of MPEG Video Frames
Intra frames (I): compression is based on the DCT alone; only one frame is processed. These frames take the most space: e.g. in PAL digital TV, if the video consisted only of I-frames, the bitrate would be about 25 Mb/s.
Predicted frames (P): these frames are based on motion estimation. The prediction is based on the previous I or P frame.
Bidirectional frames (B): compression is based on motion estimation (a kind of interpolation) between the previous and next I or P frames. These frames take the least space.
All combined into a 12-picture Group of Pictures (GOP), this results in a bitrate of about 5 Mb/s in PAL digital TV.
64
Group of Pictures (GOP) structures in MPEG-2
In MPEG-2 broadcast video and DVD, pictures are compressed in groups. One such group is called a Group Of Pictures (GOP). Typically there are 12 pictures in the GOP for PAL and 15 for NTSC.

The first frame is always an intra frame (I). The successive frames are based on predictions from the intra frame (P or B pictures).

There are several possible GOP structures depending on many factors. Typical for PAL TV/DVD is a 12-frame GOP:
I B B  P B B  P B B  P B B
65
DCT based compression process, continued
Depending on the detail of the picture, the DCT process can produce a varying amount of data. To make things work over a fixed-rate transmission channel, a buffer is needed, together with a feedback control that adjusts the weighting (quantization) parameters. E.g. for a vivid picture, larger quantization values are used, resulting in reduced picture quality.
An example might be sports programs, where quality can be clearly degraded if not enough transmission capacity is available.
In MPEG-2 the weighting tables can be varied at frame level. Naturally the weighting table values then need to be transmitted within the data.
66
Full chain of MPEG-1/2 compression
67
Overview of Digital Video Standards
Copyright Jouko Kurki, 2005-2006
Based on: Michael Robin and Michael Poulin, Digital Television Fundamentals, 2nd ed., 2000, ISBN 0-07-135581-2; Iain E. G. Richardson, H.264 and MPEG-4 Video Compression, Wiley, England, 2003, ISBN 0-470-84837-5; Jerry D. Gibson (ed.), Multimedia Communications, Academic Press, 2001, ISBN 0-12-282160-2; information from the Web and other documents.
68
Summary of TV monitor (CIF) and computer monitor (VGA) resolution formats

CIF formats:    Sub-QCIF    QCIF        CIF
No. of pixels:  128 x 96    176 x 144   352 x 288

VGA formats:    QQVGA      QVGA       VGA        SVGA       XGA         SXGA         UXGA         HDTV         QXGA
No. of pixels:  160 x 120  320 x 240  640 x 480  800 x 600  1024 x 768  1280 x 1024  1600 x 1200  1920 x 1080  2048 x 1536

The size of the picture determines the amount of data, so together with the compression scheme it defines the required transmission bitrate.
The old PC monitor had a pixel size of 640x480 and an aspect ratio of 4:3. This still serves as the basis for many NTSC video applications. In Europe (the PAL world) the numbers are slightly different for TV and many video applications.
The table above summarizes the pixel sizes of many popular formats. Note that for TV (CIF) the numbers may differ between North America and Europe.
CIF = Common Intermediate Format: 352 x 288 picture size, with a 30 fps frame rate for video. Used e.g. for videoconferencing. 4CIF = 2x the pixel count in both dimensions -> 720x576 (digital TV resolution); QCIF = Quarter CIF, half the pixel count in both dimensions.
69
H.261 and H.263 (ITU-T) video standards
H.261 was the first widely used standard for videoconferencing over the circuit-switched ISDN network. Bitrate n x 64 kb/s. Hybrid DPCM / DCT compression model with integer-accuracy motion compensation.
H.263 offers better compression than H.261 and supports basic-quality video at 30 kb/s. Designed to operate over circuit- and packet-switched networks. Uses the hybrid DPCM / DCT compression model with half-pixel motion compensation.
The baseline H.263 coding model was adopted as the core of the MPEG-4 Visual Simple profile.
There are also H.263+ and H.263++ standards with enhanced characteristics (see Richardson).
70
MPEG-1 and MPEG-2
Both use the DCT for compression. The process is basically the same as for JPEG still images.
Good compression efficiency with low processor power.
MPEG-2 is dominant in digital TV, DVD and High Definition TV (HDTV).

However, newer compression standards, like MPEG-4 Part 2 and Part 10, achieve better compression and have other properties, like error resilience, digital rights management, etc., that make them better suited for mobile applications.
Due to its better compression efficiency, MPEG-4 may also become the standard for HDTV.
71
MPEG Set of Standards
MPEG = Moving Picture Experts Group. MPEG standards are ISO standards (ISO = International Organization for Standardization).

MPEG-1   CD-ROM storage compression standard (Video CD). Stereo audio standards: MPEG-1 Layer 1 (high quality), Layer 2 (audio data transmission over networks; DVD and digital TV), Layer 3 (= MP3), music delivery for many applications.
MPEG-2   DVB (digital video broadcasting) and DVD (digital versatile disc) compression standard. Also a profile for HDTV, ~20 Mb/s with MPEG-2.
MPEG-3   Originally meant for HDTV, but HDTV was made part of MPEG-2.
MPEG-4   Efficient object-based compression technology for natural and synthetic audio and video. Audio and video streaming and complex media manipulation. Great promise for mobile multimedia. Scales from 30 kbps to HDTV quality (HDTV ~8 Mb/s with MPEG-4).
MHEG-5   Multimedia / hypermedia standard ("MPEG-4 for set-top boxes").
MPEG-7   Standard for content description / identification.
MPEG-21  Network quality, content quality, conditional access rights (multimedia umbrella standard).
72
MPEG-4 Part 10 and H.264
MPEG-4 Part 10 is the newest addition to MPEG-4, standardized in 2003.

It achieves very high compression efficiency for natural video.

Potential applications in mobile / wireless / 3G.

Also the strongest proposal for Mobile TV using DVB-H as the carrier.

H.264 is essentially the same standard under its ITU-T designation; the original specification defines only 3 profiles.

The H.264 video format is 4:2:0. It supports progressive and interlaced video.

73
Some compressed video bitrates and applications

MPEG-1
Media: hard disk, CD-ROM, tape etc.
Bitrate: 1.5 Mb/s
Typical storage capacity: a 1-2 hr movie on a 600 MB CD-ROM
Applications: backup of VHS-quality video, training videos, video clips on the Web

MPEG-2
Media: hard disk, DVD disc, tape, flash memory
Bitrate: typically 3-7 Mb/s
Typical storage capacity: a 1-2 hr movie on a 4.7 GB DVD disc
Applications: high-quality movie distribution via DVD and digital TV broadcasting, HDTV (20 Mb/s)

DV / HDV
Media: DV cassette 60 min (small), DV cassette 180 min (large), flash memory
Bitrate: 25 Mb/s
Typical storage capacity: 60 / 180 min of video on a small / large DV cassette
Applications: video camera shooting format, video editing format; broadcast news gathering, corporate / business use, home videos, education etc. HDV is a consumer High Definition format.

H.264 / MPEG-4 Part 10
Media: hard disk, DVD disc, tape, flash memory
Bitrate: 2 Mb/s for SD and 8 Mb/s for HD video; 0.3 Mb/s for mobile video
Typical storage capacity: one HDTV movie on an HD DVD (15/30 GB) or Blu-ray (25/50 GB) single/dual-layer disc
Applications: future HDTV broadcasting format, news gathering, mobile video format; format for high-definition video discs