
-1-

2004. 10. 20.


Overview of H.264 /
MPEG-4 Part10
Soon-kak Kwon, A. Tamhankar, K. R. Rao
Dongeui University, T-Mobile, University of Texas at Arlington

-2-
Contents
1. Introduction
2. Layered Structure
3. Video Coding Algorithm
4. Error Resilience
5. Comparison of Coding Efficiency
6. Conclusions

-3-
Introduction
Scope of Image and Video Coding Standards
Only the Syntax and Decoder are standardized:
Permits optimization beyond the obvious
Permits complexity reduction for implementation
Provides no guarantees of quality
[Figure: processing chain Input (image / video) -> Pre-Processing -> Encoding -> Decoding -> Post-Processing & Error Recovery -> Output (image / video), with the Scope of Standard marked on the chain]
-4-
Introduction
Video Coding Standards
Year | Main Applications | Standard
1992-1999, 2000 | Image | JPEG, JPEG2000
1995-2000 | Fax | JBIG
1990 | Video Conferencing | H.261
1995, 2000 | DTV, SDTV | H.262, H.262+
1998, 2000 | Videophone | H.263, H.263++
1992 | Video CD | MPEG-1
1995 | DTV, SDTV, HDTV, DVD | MPEG-2
2000 | Interactive video | MPEG-4
2001 | Multimedia Content description Interface | MPEG-7
2002 | Multimedia Framework | MPEG-21
2003 | Advanced Video Coding | H.264/MPEG-4 part 10
2004 August | Fidelity Range Extensions (High profile), Studio editing, Post processing, Digital cinema | H.264/MPEG-4 part 10
-5-
Introduction
MPEG-1
Formally ISO/IEC 11172-2 ('93), developed by ISO/IEC JTC1
SC29 WG11 (MPEG). Use is fairly widespread, but it has mostly
been overtaken by MPEG-2
Superior quality compared to H.261 when operated at higher bit
rates ( > 1Mbps for CIF 352x288 resolution)
Provides approximately VHS quality between 1-2Mbps using SIF
352x240/288 resolution
Additional technical features :
Bi-directional motion prediction (B-pictures)
Half-pel motion vector resolution
Slice-structured coding
DC-only D pictures
-6-
Introduction
Predictive Coding with B Pictures
I B P B P
-7-
Introduction
MPEG-2 / H.262
Formally ISO/IEC 13818-2 & ITU-T H.262, developed (1994)
jointly by ITU-T and ISO/IEC SC29 WG11 (MPEG). Now in wide
use for DVD and standard & high-definition DTV (the most
commonly used video coding standard)
Primary new technical features:
Support for interlaced-scan pictures
Also
Various forms of scalability (SNR, Spatial, Temporal and hybrid)
I-picture concealment motion vectors
Essentially same as MPEG-1 for progressive-scan pictures, and
MPEG-1 forward compatibility is required
Not especially useful below 2-3Mbps (range ~2-5Mbps SDTV
broadcast, 6-8Mbps DVD, 18Mbps HDTV), picture skipping not easy
-8-
Introduction
H.263 : The Next Generation
ITU-T Rec. H.263 (v1: 1995): The next generation of video
coding performance, developed by ITU-T. The current premier
ITU-T video standard (has overtaken H.261 as the dominant
videoconferencing codec)
Superior quality to prior standards at all bit rates (except perhaps for
interlaced video)
Wins by a factor of two at very low rates
Version 2 (late 1997 / early 1998) & version 3 (2000) later developed
with a large number of new features
Profiles defined early 2001
H.263+ & H.263++ (Extensions to H.263)
-9-
Introduction
MPEG-4 Visual : Baseline H.263 and Many Creative
Extras
MPEG-4 Visual (formally 14496-2, v1: early 1999): Contains the
H.263 baseline design and adds essentially all prior features and
many creative new extras:
Segmented coding of shapes
Scalable wavelet coding of still textures
Mesh coding
Face animation coding
Coding of synthetic and semi-synthetic content
10 & 12-bit sampling
More
v2 (early 2000) & v3 (early 2001) added later
-10-
Introduction
Relationship to Other Standards
Same design to be approved in both ITU-T / VCEG and ISO/IEC
/ MPEG
In ITU-T / VCEG this is a new & separate standard
ITU-T Recommendation H.264
ITU-T Systems (H.32x) is modified to support it
In ISO/IEC / MPEG this is a new part in the MPEG-4 suite
Separate coded design from prior MPEG-4 visual (Part 2)
New part 10 called Advanced Video Coding (AVC, analogous to
AAC in MPEG-2 as a separate audio codec)
Not backward or forward compatible with prior standards
MPEG-4 Systems / File Format being modified to support it
H.222.0 | MPEG-2 Systems is also being modified to support it
IETF working on RTP payload packetization

-11-
Introduction
History of H.264 / MPEG-4 part 10
ITU-T Q.6/SG16 started work on H.26L (L: Long Range)
July 2001: H.26L demonstrated at MPEG (Moving Picture
Experts Group) call for technology
December 2001: ITU-T VCEG (Video Coding Experts Group) and
ISO/IEC MPEG started a joint project Joint Video Team (JVT)
May 2003: Final approval from ISO/IEC and ITU-T
The standard is named H.264 by ITU-T and MPEG-4 part 10 by
ISO/IEC
Fidelity Range Extensions (August 2004): Amendment 1
Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3

-12-
Introduction
Purpose of H.264 / MPEG-4 part 10
Higher coding efficiency than previous standards (MPEG-1, MPEG-2,
MPEG-4 part 2, H.261, H.263)
Simple syntax specifications
Seamless integration of video coding into all current protocols
More error robustness
Various applications like video broadcasting, video streaming,
video conferencing, D-Cinema, HDTV
Network friendliness
Balance between coding efficiency, implementation complexity
and cost, based on the state of the art in VLSI design technology
-13-
Introduction
H.264 / MPEG-4 part 10 Architecture
-14-
Introduction
Applications of H.264 / MPEG-4 part 10 : A Broad range of applications
for video content including but not limited to the following:
Video Streaming over the internet
CATV Cable TV on optical networks, copper, etc.
DBS Direct broadcast satellite video services
DSL Digital subscriber line video services
DTTB Digital terrestrial television broadcasting, cable modem,
DSL
ISM Interactive storage media (optical disks, etc.)
MMM Multimedia mailing
MSPN Multimedia services over packet networks
RTC Real-time conversational services (videoconferencing,
videophone, etc.)
RVS Remote video surveillance
SSM Serial storage media (digital VTR, etc.)
D Cinema Content contribution, content distribution, studio editing, post
processing
-15-
Introduction
Profiles and Levels for particular applications
Profile : a subset of the entire bitstream syntax;
decoder design differs according to the Profile
Four profiles : Baseline, Main, Extended and High

Profile | Applications
Baseline | Video Conferencing, Videophone
Main | Digital Storage Media, Television Broadcasting
Extended | Streaming Video
High | Content contribution, Content distribution, Studio editing, Post processing
-16-
Introduction
Specific coding parts for the Profiles

-17-
Introduction
Common coding parts for the Profiles
I slice (Intra-coded slice) : a slice coded using prediction only
from decoded samples within the same slice
P slice (Predictive-coded slice) : a slice coded using inter
prediction from previously-decoded reference pictures, with at
most one motion vector and reference index to predict the sample
values of each block
CAVLC (Context-based Adaptive Variable Length Coding) for
entropy coding


-18-
Introduction
Coding parts for Baseline Profile
Common parts : I slice, P slice, CAVLC
FMO Flexible macroblock order : macroblocks need not be in
raster scan order. A map assigns each macroblock to a slice
group
ASO Arbitrary slice order : the macroblock address of the first
macroblock of a slice of a picture may be smaller than the
macroblock address of the first macroblock of some other
preceding slice of the same coded picture
RS Redundant slice : a slice carrying a redundant coded
representation of data already coded in another slice, obtained at
the same or a different coding rate


-19-
Introduction
Coding parts for Main Profile
Common parts : I slice, P slice, CAVLC
B slice (Bi-directionally predictive-coded slice) : a slice coded
using inter prediction from previously-decoded reference pictures,
with at most two motion vectors and reference indices to predict
the sample values of each block
Weighted prediction : scaling operation that applies a weighting
factor to the samples of motion-compensated prediction data in a P
or B slice
CABAC (Context-based Adaptive Binary Arithmetic Coding) for
entropy coding


-20-
Introduction
Coding parts for Extended Profile
Common parts : I slice, P slice, CAVLC
SP slice : the specially coded slice for efficient switching between
video streams, similar to coding of a P slice
SI slice : the switched slice, similar to coding of an I slice
Data partition : the coded data is placed in separate data partitions;
each partition can be placed in a different layer unit
Flexible macroblock order (FMO)
Arbitrary slice order (ASO)
Redundant slice (RS)
B slice
Weighted prediction


-21-
Introduction
Profile specifications

Feature | Baseline | Main | Extended | High
I & P Slices | X | X | X | X
Deblocking Filter | X | X | X | X
1/4-Pel Motion Compensation | X | X | X | X
Variable Block Size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error Resilience Tools (Flexible MB Order, ASO, Red. Slices) | X | - | X | -
SP/SI Slices | - | - | X | -
B Slice | - | X | X | X
Interlaced Coding | - | X | X | X
CABAC | - | X | - | X
Data Partitioning | - | - | X | -
-22-
Introduction
Application requirements








Application | Requirements | H.264 Profiles | MPEG-4 Profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile



-23-
Introduction
Level : corresponding to processing power and memory
capability of a codec

Level number Picture type & frame rate
1 QCIF @ 15fps
1.1 QCIF @ 30fps
1.2 CIF @ 15fps
1.3 CIF @ 30fps
2 CIF @ 30fps
2.1 HHR @15 or 30fps
2.2 SDTV @ 15fps
3 SDTV: 720x480x30i,720x576x25i 10Mbps(max)
3.1 1280x720x30p
3.2 1280x720x60p
4 HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p 20Mbps(max)
4.1 HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p 50Mbps(max)
4.2 HDTV: 1920x1080x60i, 2Kx1Kx60p
5 SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1 SHDTV/D-Cinema: 4Kx2Kx30p
-24-
Introduction
Parameter set limits for each Level

Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64,+63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128,+127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128,+127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128,+127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128,+127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256,+255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256,+255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256,+255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512,+511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512,+511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512,+511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512,+511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512,+511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512,+511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512,+511.75] | 2 | 16
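As a rough illustration of how the level limits are used, the sketch below picks the lowest level whose frame-size and macroblock-rate limits fit a given picture format. It checks only those two columns of the table (bit rate, buffer sizes and MV limits are ignored), and the helper names `macroblocks` and `lowest_level` are hypothetical, not part of the standard.

```python
# (level, max MB processing rate in MB/s, max frame size in MBs), copied from the table.
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396),
    ("1.3", 11880, 396), ("2", 11880, 396), ("2.1", 19800, 792),
    ("2.2", 20250, 1620), ("3", 40500, 1620), ("3.1", 108000, 3600),
    ("3.2", 216000, 5120), ("4", 245760, 8192), ("4.1", 245760, 8192),
    ("4.2", 491520, 8192), ("5", 589824, 22080), ("5.1", 983040, 36864),
]


def macroblocks(width, height):
    """Number of 16x16 macroblocks needed to cover a width x height picture."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def lowest_level(width, height, fps):
    """Hypothetical helper: first level whose frame-size and MB-rate limits fit."""
    frame_mbs = macroblocks(width, height)
    mb_rate = frame_mbs * fps
    for level, max_rate, max_frame in LEVEL_LIMITS:
        if frame_mbs <= max_frame and mb_rate <= max_rate:
            return level
    return None  # exceeds every level in the table


qcif_15 = lowest_level(176, 144, 15)    # QCIF @ 15fps
cif_30 = lowest_level(352, 288, 30)     # CIF @ 30fps
hd720_30 = lowest_level(1280, 720, 30)  # 1280x720 @ 30p
```

For these three formats the two limits alone already reproduce the picture formats listed per level: QCIF at 15 fps fits Level 1, CIF at 30 fps needs Level 1.3, and 1280x720 at 30 fps needs Level 3.1.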

-25-
Layered Structure
Two Layers : Network Abstraction Layer (NAL),
Video Coding Layer (VCL)
NAL
Abstracts the VCL data hence the name Network Abstraction
Layer
Header information about the VCL format
Appropriate for conveyance by the transport layers or storage
media
NAL unit (NALU) defines a generic format for use in both packet
based and bit-streaming systems
VCL
Core coding layer
Concentrates on attaining maximum coding efficiency
-26-
Layered Structure
Elements of VCL
-27-
Layered Structure
Supporting picture format : 4:2:0 chroma sampling

[Figure: 4:2:0 sample arrays.
CIF format: Y 352 (360 pels) x 288 lines; Cb and Cr 176 (180 pels) x 144 lines each.
QCIF format: Y 176 (180 pels) x 144 lines; Cb and Cr 88 (90 pels) x 72 lines each.]
-28-
Video Coding Algorithm
Block diagram for H.264 encoder
[Figure: H.264 encoder block diagram. Blocks: Video Input, Intra Prediction, Motion Estimation, Motion Compensation, Intra/Inter Mode Decision, Transform & Quantization, Entropy Coding, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Bitstream Output]
-29-
Video Coding Algorithm
Block diagram for H.264 Decoder
[Figure: H.264 decoder block diagram. Blocks: Bitstream Input, Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering, Video Output]
-30-
VC Algorithm : Intra Prediction
Exploits Spatial redundancy between adjacent macroblocks in a
frame
4 x 4 luma block
9 prediction modes : 8 Directional predictions and 1 DC prediction
(vertical : 0, horizontal : 1, DC : 2, diagonal down left : 3, diagonal down right : 4,
vertical right : 5, horizontal down : 6, vertical left : 7, horizontal up : 8)
[Figure: 4 x 4 block with its neighbouring samples; arrows show the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8]
M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p
samples a, b, ..., p : the predicted ones for the current block,
above and left samples A, B, ..., M : previously reconstructed ones
-31-
VC Algorithm : Intra Prediction
Example of 4 x 4 luma block
Sample a, d : predicted by round(I/4 + M/2 + A/4), round(B/4 + C/2 + D/4) for
mode 4
Sample a, d : predicted by round(I/2 + J/2), round(J/4 + K/2 + L/4) for mode 8
[Figure: the 4 x 4 block (samples a-p, neighbours A-M) shown twice, with the prediction directions of mode 4 and mode 8]
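The prediction rules above can be sketched in a few lines. The code below is only an illustration: it implements modes 0, 1 and 2 for the whole block and the two sample formulas quoted for modes 4 and 8, writing round(x/4 + y/2 + z/4) as the integer expression (x + 2y + z + 2) >> 2. The function names and the neighbour values are made up for the example.

```python
def predict_4x4(mode, above, left):
    """above = [A, B, C, D], left = [I, J, K, L]: reconstructed neighbours."""
    if mode == 0:   # vertical: each column repeats the sample above it
        return [list(above) for _ in range(4)]
    if mode == 1:   # horizontal: each row repeats the sample to its left
        return [[v] * 4 for v in left]
    if mode == 2:   # DC: mean of the eight neighbours, rounded
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")


def samples_a_d(mode, above, left, corner):
    """Samples a and d for mode 4 or mode 8, per the formulas on the slide."""
    A, B, C, D = above
    I, J, K, L = left
    M = corner
    if mode == 4:   # diagonal down right
        return (I + 2 * M + A + 2) >> 2, (B + 2 * C + D + 2) >> 2
    if mode == 8:   # horizontal up
        return (I + J + 1) >> 1, (J + 2 * K + L + 2) >> 2
    raise ValueError("only modes 4 and 8 are sketched here")


above = [10, 20, 30, 40]   # hypothetical values for A, B, C, D
left = [50, 60, 70, 80]    # hypothetical values for I, J, K, L
dc_block = predict_4x4(2, above, left)
a4, d4 = samples_a_d(4, above, left, corner=0)
a8, d8 = samples_a_d(8, above, left, corner=0)
```

An encoder would form such a prediction for every candidate mode and keep the one that leaves the smallest residual; that selection step is not shown.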
-32-
VC Algorithm : Intra Prediction
16 x 16 luma
4 prediction modes
(vertical : 0, horizontal : 1, DC : 2, plane : 3)
Plane: works well in smoothly varying luminance.
A linear plane function is fitted to the upper (H) and left side (V) samples

(8x8) luma (FRExt only) similar to 4x4 luma with low pass filtering of the predictor
to improve prediction performance

Plane
-33-
VC Algorithm Intra Prediction
Chroma always operates using full MB prediction
(8x8) 4:2:0 Format
(8x16) 4:2:2
(16x16) 4:4:4

(Similar to 16x16 luma block but different mode order)

4 Prediction modes

(DC: 0, Horizontal: 1, Vertical: 2, Plane: 3)
-34-
VC Algorithm : Inter Prediction
Exploits temporal redundancy
Prediction of variable block sizes
Sub-pel motion compensation
Deblocking filter
Management of multiple reference pictures



-35-
VC Algorithm : Inter Prediction
Prediction of variable block size
A MB can be partitioned into smaller block sizes
4 cases for 16 x 16 MB, 4 cases for 8 x 8 Sub-MB
Large partition size : homogeneous areas, small : detailed areas











The two kinds of partition cannot be mixed, i.e., a MB cannot have both 16x8 and 4x8 partitions
When sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned
-36-
VC Algorithm : Inter Prediction
Sub-pel motion compensation
Better compression performance than integer-pel MC
Expense of increased complexity
Outperforms at high bit rates and high resolutions



[Figure: encoder block diagram emphasizing Motion Estimation and Motion Compensation: motion vector accuracy 1/4 (6 tap filter)]
[Figure: MB partitions 16x16 (one partition, 0), 16x8 (0, 1), 8x16 (0, 1), 8x8 (0-3); sub-MB partitions 8x8 (0), 8x4 (0, 1), 4x8 (0, 1), 4x4 (0-3)]
-37-
VC Algorithm : Inter Prediction
Sub-pel accuracy












A distinct MV can be sent for each sub-MB partition. ME can be based
on multiple pictures that lie in the past or in the future in display order.
Reference picture for ME is selected at the MB partition level. Sub-MB
partitions within the same MB partition must use the same reference
picture.

[Figure legend: integer position pixels; 1/2 and 1/4 pixels; 1/8 pixels]
-38-
VC Algorithm : Inter Prediction
Half-pel : interpolated from neighboring integer-pel samples
using a 6-tap Finite Impulse Response filter with weights (1, -5,
20, 20, -5, 1)/32
Quarter-pel : produced using bilinear interpolation between
neighboring half- or integer-pel samples



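[Figure: integer-position samples (upper-case letters A to U) and the fractional-position samples between them (lower-case letters a to s and aa to hh); E, F, G, H, I, J lie on one row, b is the half-pel sample between G and H, and a is the quarter-pel sample between G and b]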
b = round((E-5F+20G+20H-5I+J)/32)
a = round((G+b)/2)
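The two formulas above translate directly into integer arithmetic. In this minimal sketch the division by 32 with rounding is written as adding 16 and shifting right by 5, and the result is clipped to the 8-bit range; the clipping and the function names are assumptions added for the illustration.

```python
def clip255(value):
    """Clamp to the 8-bit sample range (assumed here for 8-bit video)."""
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    """6-tap filter (1, -5, 20, 20, -5, 1)/32 across six integer-pel samples."""
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """Bilinear average of two neighbouring half- or integer-pel samples."""
    return (p + q + 1) >> 1


# Hypothetical row of samples: a sharp edge between G = 0 and H = 255.
b = half_pel(0, 0, 0, 255, 255, 255)
a = quarter_pel(0, b)   # quarter-pel sample between G and b
```

On a flat region the filter returns the input value unchanged, and across the edge above it lands midway between the two sides.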
-39-
VC Algorithm : Inter Prediction
Adaptive deblocking filter
To reduce the blocking artifacts at block boundaries and
prevent the propagation of accumulated coding noise
Filtering is applied to horizontal or vertical edges of 4 x 4 blocks
in a macroblock, adaptively at several levels (slice, block-edge,
sample)



[Figure: two 16x16 macroblocks showing the filtered vertical edges (luma and chroma) and horizontal edges (luma and chroma)]
-40-
VC Algorithm : Inter Prediction
Management of multiple reference pictures
To take care of marking some stored pictures as unused and
deciding which pictures to delete from the buffer


[Figure: encoder block diagram emphasizing Picture Buffering: management of multiple reference pictures (short term, long term)]
-41-
VC Algorithm : Transform & Quantization
Transform
Integer transform, multiplier free : additions and shifts in 16-bit arithmetic
Hierarchical structure : 4 x 4 Integer DCT + Hadamard transform
[Figure: coding order of the sixteen 4 x 4 luma blocks]
0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15
[Figure: indices of the DC coefficients of the 4 x 4 blocks]
00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33
Assignment of the indices of DC (dark samples) to luma 4 x 4 blocks:
the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT;
(0,0), (0,1), (0,2), ..., (3,3) are the DC coefficients of each 4x4 block.
The Hadamard transform is applied only when (16x16) intra prediction mode is used
with (4x4) IntDCT. Similarly for the chroma: the MB size for chroma depends on the 4:2:0,
4:2:2 and 4:4:4 formats
-42-
VC Algorithm : Transform
4 x 4 integer DCT
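X : input pixels, Y : output coefficients

$$Y = \left(C_f\, X\, C_f^{T}\right) \otimes E_f$$

$$Y = \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right) \otimes
\begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

$$a = \frac{1}{2}, \quad b = \sqrt{\frac{2}{5}}, \quad d = \frac{1}{2}$$

$\otimes$ implies element by element multiplication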
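To make the structure of the forward transform concrete, the sketch below computes the integer core transform C_f X C_f^T with plain list arithmetic and then applies the scaling matrix E_f element by element. In a real codec the scaling is folded into quantization rather than applied separately; the floating-point scaling here, and all function names, are only for illustration.

```python
import math

# Forward core transform matrix C_f from the slide.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

A = 0.5
B = math.sqrt(2 / 5)
# E_f[i][j] = s[i] * s[j] with s = (a, b/2, a, b/2): gives a^2, ab/2 and b^2/4.
S = [A, B / 2, A, B / 2]
EF = [[S[i] * S[j] for j in range(4)] for i in range(4)]


def matmul(p, q):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Integer part of the transform: W = C_f X C_f^T."""
    return matmul(matmul(CF, x), transpose(CF))


def forward_transform(x):
    """Y = (C_f X C_f^T) (x) E_f, with (x) the element-by-element product."""
    w = core_transform(x)
    return [[w[i][j] * EF[i][j] for j in range(4)] for i in range(4)]


flat = [[10] * 4 for _ in range(4)]        # hypothetical flat block
w_flat = core_transform(flat)
y_flat = forward_transform(flat)

ramp = [[r * 4 + c for c in range(4)] for r in range(4)]   # hypothetical block
y_ramp = forward_transform(ramp)
energy_in = sum(v * v for row in ramp for v in row)
energy_out = sum(v * v for row in y_ramp for v in row)
```

A flat block produces a single DC coefficient, and with the scaling applied the transform is orthonormal, so the energy of the coefficients equals the energy of the input samples up to floating-point error.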

-43-
4x4 Inverse IntDCT
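$$E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

In both forward and inverse transforms QP (Quantization step) is embedded in matrices $E_f$ and $E_i$

$$[Y'] = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

Here
$$X = C_i^{T}\left(Y \otimes E_i\right) C_i$$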
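The inverse relation can be sketched the same way. The matrix C_i is not legible on the slide; the values used below are the ones commonly published for the H.264 4x4 inverse core transform and should be read as an assumption, as should the function names. The scaling is done in floating point purely for illustration.

```python
import math

# Assumed inverse core transform matrix C_i (not listed on the slide).
CI = [[1, 1, 1, 1],
      [1, 0.5, -0.5, -1],
      [1, -1, -1, 1],
      [0.5, -1, 1, -0.5]]

A = 0.5
B = math.sqrt(2 / 5)
# E_i[i][j] = s[i] * s[j] with s = (a, b, a, b): gives a^2, ab and b^2.
S = [A, B, A, B]
EI = [[S[i] * S[j] for j in range(4)] for i in range(4)]


def matmul(p, q):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def inverse_transform(y):
    """X = C_i^T (Y (x) E_i) C_i, with (x) the element-by-element product."""
    scaled = [[y[i][j] * EI[i][j] for j in range(4)] for i in range(4)]
    return matmul(matmul(transpose(CI), scaled), CI)


# A block whose only non-zero coefficient is the DC term.
dc_only = [[0.0] * 4 for _ in range(4)]
dc_only[0][0] = 40.0
x_dc = inverse_transform(dc_only)

# A block whose only non-zero coefficient is at position (1, 1).
one_ac = [[0.0] * 4 for _ in range(4)]
one_ac[1][1] = 1.0
x_ac = inverse_transform(one_ac)
```

A DC-only coefficient block reconstructs to a flat block, which matches the forward direction where a flat block of value 10 gives a DC coefficient of 40.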
-44-
VC Algorithm : Transform
Luma DC coefficients for Intra 16x16 MB
16 DC coefficients of 16 (4x4) blocks are transformed using
Walsh Hadamard transform
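$$Y_D = \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right) \,//\, 2$$

where // = rounding to the nearest integer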
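A small sketch of the DC Hadamard step, assuming the 16 DC coefficients are already arranged in a 4x4 array. The halving is written as (v + 1) >> 1, which rounds ties upward; that tie rule and the function names are assumptions made for the example.

```python
# 4x4 Walsh Hadamard matrix.
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(p, q):
    """Product of two 4x4 integer matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def hadamard_dc(xd):
    """Y_D = (H X_D H) // 2, halving with rounding to the nearest integer."""
    t = matmul(matmul(H4, xd), H4)
    return [[(v + 1) >> 1 for v in row] for row in t]


# Sixteen equal DC coefficients (hypothetical values): energy collapses into one term.
equal_dc = [[8] * 4 for _ in range(4)]
y_equal = hadamard_dc(equal_dc)

# A single non-zero DC coefficient spreads evenly over all sixteen outputs.
single = [[0] * 4 for _ in range(4)]
single[0][0] = 4
y_single = hadamard_dc(single)
```

The first case shows why this second-level transform helps in smooth Intra 16x16 macroblocks: when the DC terms of neighbouring blocks are similar, almost all of their energy ends up in a single coefficient.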
-45-
VC Algorithm : Transform
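Chroma DC coefficients, Intra prediction mode
(4x4) IntDCT
Walsh Hadamard transform : 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$, 4:2:0

[Figure: chroma blocks for U and V: 2x2 DC blocks numbered 16 and 17, AC blocks numbered 18 to 25]
For 4:2:2 and 4:4:4 chroma formats the Hadamard block size is increased.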
-46-
VC Algorithm : Transform
Block diagram emphasizing transform
[Figure: encoder block diagram emphasizing the Transform & Quantization block]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients
for 16 x 16 Intra luma and 8 x 8 chroma blocks
-47-
VC Algorithm : Quantization
Multiplication operation for the exact transform
is combined with the multiplication of scalar
quantization
Encoder : post-scaling and quantization
Decoder : inverse quantization and pre-scaling

$$Y_{ij} = \mathrm{round}\left(X_{ij}\,\frac{SF_{ij}}{Qstep}\right)$$

$$X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

X : quantizer input
Y : quantizer output
Qstep : quantization step size, selected by the quantization parameter QP; a total of 52 values
for 8 bits per decoded sample; doubles in size for every increment of 6 in QP.
FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
SF : scaling term
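The relation between QP and Qstep can be illustrated with a short sketch. The six base step sizes below are the values commonly tabulated for QP 0 to 5 and are an assumption here, since the slide does not list them; the scaling term SF is passed in as a plain number, and the helper names are hypothetical.

```python
import math

# Assumed base step sizes for QP = 0..5; Qstep doubles for every increment of 6 in QP.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51 (52 values)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit samples")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def round_nearest(v):
    """Round to the nearest integer, ties away from zero."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize(x, sf, qp):
    """Y = round(X * SF / Qstep)."""
    return round_nearest(x * sf / qstep(qp))


def rescale(y, sf, qp):
    """X' = Y * Qstep * SF."""
    return y * qstep(qp) * sf


level = quantize(100, 1.0, 28)       # hypothetical coefficient, SF = 1 for clarity
reconstructed = rescale(level, 1.0, 28)
```

With QP = 28 the step size is 16, so a coefficient of 100 becomes level 6 and is reconstructed as 96; the difference of 4 is the quantization error.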
-48-
VC Algorithm : Transform, Quantization
Rescale and Inverse transform
[Figure: transform and quantization flow.
Encoder part: Input block; Forward transform; 2x2 or 4x4 DC transform (Chroma or Intra-16 Luma only, i.e. Intra (16x16) prediction mode only for luma); Post-scaling and quantization; Encoder output / decoder input.
Decoder part: 2x2 or 4x4 DC inverse transform (Chroma or Intra-16 Luma only); Inverse quantization and pre-scaling; Inverse transform; Output block.]
-49-
VC Algorithm : Entropy Coding
All syntax elements other than residual transform
coefficients are encoded by the Exp-Golomb codes (UVLC)
Scan order to read the residual data (quantized transform
coefficients) : zig-zag, alternate
Context-based Adaptive Variable Length Coding (CAVLC) in
All Profiles
Context-based Adaptive Binary Arithmetic Coding (CABAC)
in Main Profile


a) Zig-zag scan (scan position of each coefficient in the 4 x 4 block)
0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15
b) Alternate scan
0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
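The two scan tables can be applied mechanically: each entry gives the position in the one-dimensional output at which the coefficient at that row and column is placed. A minimal sketch, with a made-up coefficient block and a hypothetical function name:

```python
# Scan position of each coefficient, indexed [row][column], as in the tables above.
ZIGZAG_INDEX = [[0, 1, 5, 6],
                [2, 4, 7, 12],
                [3, 8, 11, 13],
                [9, 10, 14, 15]]

ALTERNATE_INDEX = [[0, 2, 8, 12],
                   [1, 5, 9, 13],
                   [3, 6, 10, 14],
                   [4, 7, 11, 15]]


def scan(block, index_map):
    """Reorder a 4x4 block of quantized coefficients into a 16-entry list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[index_map[r][c]] = block[r][c]
    return out


# Hypothetical quantized block with energy in the top-left corner.
block = [[9, 5, 0, 0],
         [3, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]

zigzag = scan(block, ZIGZAG_INDEX)
alternate = scan(block, ALTERNATE_INDEX)
```

Both orders move the non-zero coefficients to the front of the list, leaving a long tail of zeros for the entropy coder; the alternate scan reaches further down the first column earlier than the zig-zag scan does.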
-50-
Exponential Golomb codes (used for data elements other than
transform coefficients; these codes are actually fixed,
and are also called Universal Variable Length Codes
(UVLC))
-51-
These are variable length codes with a regular construction
[M Zeroes] [1] [INFO]

INFO is an M-bit field carrying information.
The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6
have a two-bit INFO field and so on.

The length of each Exp-Golomb codeword is (2M+1) bits.
M = Floor(Log2[code_num + 1])
INFO = code_num + 1 - 2^M
-52-
Decoding
1. Read in M leading zeroes followed by 1
2. Read in M-bit INFO field
3. Code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: Codes transform coefficients
CABAC: Codes transform coefficients and MV

All other syntax elements are coded with the
Exp_Golomb codes
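The construction and decoding rules for Exp-Golomb codewords fit in a few lines. The sketch below represents bits as a string of "0" and "1" characters for readability; the function names are made up for the example, and only the unsigned code_num mapping described above is covered.

```python
def exp_golomb_encode(code_num):
    """Return the codeword [M zeroes][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos + m] == "0":                # step 1: count M leading zeroes
        m += 1
    start = pos + m + 1                        # skip the separating 1
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m      # step 3: 2^M + INFO - 1


codewords = [exp_golomb_encode(n) for n in range(8)]

# Decode three concatenated codewords back into their code numbers.
stream = exp_golomb_encode(3) + exp_golomb_encode(0) + exp_golomb_encode(6)
decoded = []
position = 0
while position < len(stream):
    value, position = exp_golomb_decode(stream, position)
    decoded.append(value)
```

Because every codeword announces its own length through the run of leading zeroes, the decoder needs no table and can parse a concatenated stream without separators.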
-53-
VC Algorithm : Entropy Coding
CAVLC : handles zeros and +/-1 coefficients differently from
the levels of the other coefficients. The total numbers of zeros
and +/-1 are coded. For the other coefficients, their levels are
coded.
Encoding steps
step 1 : encode the total number of nonzero coefficients and +/-1 (trailing
ones) values
step 2 : encode the sign of each trailing one in reverse order
step 3 : encode the levels of the remaining non-zero coefficients in reverse
order
step 4 : encode the total number of zeros before the last coefficient
step 5 : encode each run of zeros

H.264 maintains 11 different sets of codes (4 for # of coefficients and 7 for
the actual coefficients)
These are adapted to the current stream or context (thus CAVLC)


-54-
VC Algorithm : Entropy Coding
Example of CAVLC

order  :  0   1   2   3   4   5   6   7   8   9  ...  16
coeff. : c0  c1  c2   0   1   1   0  -1   0   0  ...   0
Step 1 : encode the number of total nonzero coefficients and of +1 or -1 values (trailing ones)
from look-up table
no. of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7)
no. of trailing ones = 3 (order 4, 5, 7)
Step 2 : encode the sign of each trailing one in reverse order
- (order 7) , + (order 5), + (order 4)
Step 3 : encode the level of the remaining non-zero coefficients in reverse order
c2 (order 2), c1, c0
Step 4 : encode the total no. of zeros before the last coefficient
2 (order 3, 6)
Step 5 : encode each run of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
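The five steps can be reproduced by a short routine that extracts the symbols CAVLC would go on to code; the variable-length tables themselves are not shown. The values chosen for c0, c1 and c2 are hypothetical, and the function name is made up for the example.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols for a scanned list of quantized coefficients."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    # Trailing ones: up to three +/-1 values found first when reading in reverse.
    trailing = []
    for i in reversed(nonzero):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break
    signs = ["+" if coeffs[i] > 0 else "-" for i in trailing]

    # Levels of the remaining non-zero coefficients, in reverse order.
    remaining = nonzero[:total_coeffs - len(trailing)]
    levels = [coeffs[i] for i in reversed(remaining)]

    # Zeros located before the last non-zero coefficient.
    total_zeros = nonzero[-1] + 1 - total_coeffs if nonzero else 0

    # Runs of zeros preceding each coefficient, in reverse, until none are left.
    runs = []
    zeros_left = total_zeros
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nonzero[k] - nonzero[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return total_coeffs, len(trailing), signs, levels, total_zeros, runs


# The example above with hypothetical levels c0 = 7, c1 = -3, c2 = 2.
coeffs = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8
symbols = cavlc_symbols(coeffs)
```

For this input the routine returns 6 total coefficients, 3 trailing ones with signs -, +, +, the levels c2, c1, c0, 2 total zeros, and the runs 1, 0, 1, matching Steps 1 to 5.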
-55-
VC Algorithm : Entropy Coding
CABAC : uses arithmetic coding; to achieve good
compression, the probability model for each
symbol element is updated as coding proceeds. Both MV and residual
transform coefficients are coded by CABAC.
Encoding steps
step 1 : context modeling: choose a suitable model
step 2 : binarization: if a symbol is non-binary valued it will be
mapped into a sequence of binary decisions called bins
step 3 : binary arithmetic coding using probability estimates provided
by context modeling

-56-
CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive
-57-
VC Algorithm : B Slice
Generalized Bidirectional prediction
Supports not only forward/backward prediction pair, but also
forward/forward and backward/backward pairs
Direct mode
Derives reference picture, block size, and motion vector data
from the subsequent inter picture.
Weighted prediction
Scaling operation by applying a weighting factor to the samples
of motion-compensated prediction data in P or B slice.

Pictures coded using B slices can be used as references for
decoding of subsequent pictures in decoding order (with an
arbitrary relationship to such pictures in display order)


-58-
VC Algorithm : B Slice
Generalized Bidirectional prediction
Multiple reference pictures mode
Two forward references : proper for a region just before scene
change
Two backward references : proper for a region just after scene
change


[Figure: current picture between previous pictures and next pictures, showing the three cases: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]
-59-
VC Algorithm : B Slice
Direct mode
Forward / backward pair of bi-directional prediction
Prediction signal is calculated by a linear combination of two
blocks that are determined by the forward and backward
motion vectors pointing to two reference pictures.


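[Figure: List 0 Reference, Current Picture and List 1 Reference; the direct-mode partition in the Current Picture and the co-located partition in the List 1 Reference, with motion vectors mvCol, mvL0 and mvL1 and temporal distances tb and td]
mvL0 = tb x mvCol / td
mvL1 = (td - tb) x mvCol / td

where mvCol is a MV used
in the co-located MB of
the subsequent picture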
-60-
VC Algorithm : B Slice
Weighted prediction
Different weights of reference signals for gradual transitions
from scene to scene, i.e., fade to black (the luma samples of
the scene gradually approach zero), fade from black
Different weighted prediction method for a macroblock of P
slice or B slice
A prediction signal p for B slice is obtained by different weights
from two reference signals, r1 and r2.
p = w1 x r1 + w2 x r2
where w1 and w2 are weighting factors
Implicit type : the factors are calculated based on the temporal
distance between the pictures
Explicit type : the factors are transmitted in the slice header
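A simplified sketch of the B-slice weighted prediction formula. The explicit case just applies the two given weights; for the implicit case the code assumes that each reference is weighted by its closeness in time, using tb for the distance from the first reference to the current picture and td for the distance between the two references. That weighting rule is a floating-point simplification rather than the standard's fixed-point derivation, and the function names are hypothetical.

```python
def weighted_prediction(r1, r2, w1, w2):
    """p = w1 * r1 + w2 * r2, applied sample by sample."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Assumed implicit rule: the temporally closer reference gets the larger weight."""
    w2 = tb / td
    return 1 - w2, w2


# Hypothetical reference samples from the two motion-compensated blocks.
r1 = [100, 120]
r2 = [200, 40]

# Explicit type: equal weights, as if transmitted in the slice header.
p_explicit = weighted_prediction(r1, r2, 0.5, 0.5)

# Implicit type: current picture one interval after r1, references four intervals apart.
w1, w2 = implicit_weights(tb=1, td=4)
p_implicit = weighted_prediction(r1, r2, w1, w2)
```

During a fade the two references differ mainly in brightness, so weighting them unequally lets the prediction track the intermediate brightness of the current picture instead of a plain average.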

-61-
VC Algorithm: SP and SI Slices (Extended profile only)
Switched slice
SP slice : the specially coded slice for efficient switching
between video streams, similar to coding of a P slice
SI slice : the switched slice, similar to coding of an I slice


[Figure: Bitstream A and Bitstream B, two coded versions of the same sequence with pictures P(1,1) to P(1,5) and P(2,1) to P(2,5), and a switching slice S(3) connecting one bitstream to the other]
Allows bit stream switching and additional functionalities such as random access, fast forward,
reverse and stream splicing.
-62-
Error Resilience
Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice SP/SI (only in Extended Profile)
Data partitioning (only in Extended Profile)
Arbitrary Slice Order ASO
-63-
Data partitioning slices (Extended profile only)
1. Coded data of a slice is placed in three separate data
partitions A,B & C.
2. A has slice header and header data for each MB in the
slice
3. B has coded residual data for intra and SI slice MBs
4. C has coded residual data for inter coded MB
5. Place each partition A, B & C in a separate NAL unit
and transport separately
-64-
Error Resilience : Parameter setting
The sequence parameter set contains all information
related to a sequence of pictures
A picture parameter set contains all information related to
all the slices belonging to a single picture.
The encoder chooses the appropriate picture parameter set
to use by referencing the storage location in the slice
header of each coded slice.
[Figure: H.264 Encoder and H.264 Decoder each hold parameter sets 1, 2 and 3, kept in step by Reliable Parameter Set Exchange; VCL Data transfer with PS #3. Parameter Set #3: Video format NTSC; Motion Resolution; Enc: CABAC; Frame width: 11]
-65-
Error Resilience : FMO
Flexible macroblock ordering allows macroblocks to be assigned
to slices in an order other than the scan order.
Assume that all macroblocks of the picture are allocated
either to slice group 0 or slice group 1, and the
macroblocks in each slice group are dispersed through the
picture.
If the packet containing the information of slice group 1 is lost
during transmission, then the lost macroblocks can be
recovered by the error concealment mechanism, since every
lost macroblock has several spatial neighbors that belong to
the other slice group.
ASO is similar to FMO: it randomizes data prior to
transmission, so that errors are distributed more randomly over
the video frames rather than in a single block of data.
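The two-slice-group case described above can be illustrated with a checkerboard map, in which horizontally and vertically adjacent macroblocks always fall in different slice groups. The map formula and the helper names are assumptions chosen for the illustration; the standard defines several map types.

```python
def dispersed_map(width_mbs, height_mbs):
    """Checkerboard assignment of macroblocks to slice group 0 or 1."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def surviving_neighbours(slice_map, x, y):
    """Count the 4-connected neighbours that lie in the other slice group."""
    group = slice_map[y][x]
    height, width = len(slice_map), len(slice_map[0])
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and slice_map[ny][nx] != group:
            count += 1
    return count


# QCIF picture: 11 x 9 macroblocks.
qcif_map = dispersed_map(11, 9)
fewest = min(surviving_neighbours(qcif_map, x, y)
             for y in range(9) for x in range(11))
```

If one slice group is lost, every missing macroblock still has at least two correctly received neighbours (four in the interior of the picture) from which a concealment algorithm can interpolate.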

-66-
Error Resilience : Redundant Slice
Redundant slices allow one or more redundant
representations of the same macroblocks to be placed in the bitstream.
For example, the primary representation can be coded with
a low quantization parameter (hence in good quality),
whereas the redundant slice can be coded with a high
quantization parameter (hence, in a much coarser quality,
but also utilizing fewer bits).
A decoder reacts to redundant slices by reconstructing only
the primary slice, if it is available, and discarding the
redundant slice. However, if the primary slice is missing,
the redundant slice can be reconstructed.

-67-
Comparison of Coding Efficiency
Subjective verification test
Comparison of the H.264 Baseline Profile (BP) and MPEG-4 part
2 Simple Profile (SP) for the multimedia definition (MD). The
numbers in the table indicate the coding efficiency improvement
achieved by the H.264 where the codecs being compared
provide statistically equivalent picture quality. The letter T
indicates that H.264 achieved transparency.
H.264 Baseline Profile achieves a coding efficiency improvement
of 2 times or greater in 14 out of 18 statistically conclusive cases.



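Sequence | QCIF 24 kbps | QCIF 48 kbps | QCIF 96 kbps | QCIF 192 kbps | CIF 96 kbps | CIF 192 kbps | CIF 384 kbps | CIF 768 kbps
Foreman | > 1x | 2x | 2x | T | 2x | > 2x | T | T
Paris | > 1x | 2x | 2x | - | 2x | 2x | T, 2x | T
Head | - | > 2x | 2x | - | 2x | - | T | T
Zoom | > 1x | 1x | 2x | - | 2x | - | - | -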

-68-
Comparison of Coding Efficiency
Subjective verification test
Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2
Advanced Simple Profile (ASP) for the MD.
H.264 Main Profile achieves a coding efficiency improvement
of 2 times or greater in 18 out of 25 statistically conclusive
cases.







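Sequence | QCIF 24 kbps | QCIF 48 kbps | QCIF 96 kbps | QCIF 192 kbps | CIF 96 kbps | CIF 192 kbps | CIF 384 kbps | CIF 768 kbps
Football | 2x / 1x | 2x | 2x | - | > 1x | > 1x | 1x | > 1x
Mobile | 2x / 1x | 2x | 2x | - | > 2x | 4x | > 2x | T
Husky | 2x | 2x | > 1x | - | 2x | 2x | 2x | -
Tempete | 2x | 2x | > 2x | T | 2x | 2x | T, 2x | T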



-69-
Comparison of Coding Efficiency
Subjective verification test
Comparison of H.264 Main Profile and MPEG-2 for the Standard
Definition (SD)
When compared to MPEG-2 HiQ (real-time High Quality), H.264
Main Profile achieves a coding efficiency improvement of 1.5
times or greater in 8 out of 12 statistically conclusive cases.
When compared to MPEG-2 TM5, H.264 Main Profile achieves a
coding efficiency improvement of 1.8 times or greater in 9 out
of 12 statistically conclusive cases.












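Bitrate [Mbps] for MPEG-2 HiQ
Sequence | 1.5 | 2.25 | 3 | 4 | 6
Football | > 1.5x | > 1.3x | 1.3x | 1.5x | -
Mobile | 4x | 2.7x | 2x | T | T
Husky | > 1.5x | 1.3x | 1x / 1.3x | 1.5x | -
Tempete | T, 2x | T | T | T | T

Bitrate [Mbps] for MPEG-2 TM5
Sequence | 1.5 | 2.25 | 3 | 4 | 6
Football | 2x | 1.8x | 1.3x | 1.5x | -
Mobile | > 4x | > 2.7x | > 2x | T | T
Husky | 2.7x / 2x | 1.8x | 2x | > 1.5x | -
Tempete | T, 4x | T | T | T | T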



-70-
Comparison of Coding Efficiency
Subjective verification test
Comparison of H.264 Main Profile and MPEG-2 for the High
Definition (HD)
When compared to MPEG-2 HiQ, H.264 Main Profile achieves a
coding efficiency improvement of 1.7 times or greater in 7 out
of 9 statistically conclusive cases.
When compared to MPEG-2 TM5, H.264 Main Profile achieves a
coding efficiency improvement of 1.7 times or greater in 8 out
of 9 statistically conclusive cases.

Format      Sequence               Bitrate [Mbps] for MPEG-2 HiQ    Bitrate [Mbps] for MPEG-2 TM5
                                   6          10         20         6          10         20
720 (60p)   Crew                   1.7x       2x         T          1.7x       2x         T
            Harbour                T, 3.3x    T          T          T, 1.7x    T          T
1080 (30i)  Stockholm Pan                     1x                               2x
            New Mobile & Calendar             T, 2x      T                     T, 2x      T
1080 (25p)  River Bed              > 1.7x     > 1x       T          > 1.7x     > 1x       T
            Vintage Car            1.7x       T, 2x      T          1.7x       T, 2x      T

-71-
Comparison of Coding Efficiency
Objective test
PSNR (between original and reconstructed pictures) and bitrate
saving results of Tempete CIF 15Hz sequence for the video
streaming application
HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile
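
As a rough illustration of the PSNR measure used in these objective tests, the sketch below computes PSNR between an original and a reconstructed picture. It assumes 8-bit samples (peak value 255) held in flat lists; the function name `psnr` and the sample values are illustrative and are not taken from the test report.

```python
import math

def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized sequences of samples."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("pictures must be non-empty and the same size")
    # Mean squared error between original and reconstructed samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical sample values, for illustration only.
orig = [16, 128, 235, 64]
recon = [17, 127, 235, 66]
print(round(psnr(orig, recon), 2), "dB")
```

A higher PSNR at the same bitrate, or the same PSNR at a lower bitrate, is what the bitrate saving figures summarize.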
-72-
Comparison of Coding Efficiency
Objective test
PSNR and bitrate saving results of Paris CIF 15Hz sequence for
the video conferencing application
CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile
-73-
Conclusions
H.264 outperforms the previous standards
Comparison of standards

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes
-74-
Conclusions
Comparison of standards (continued)

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, Data partitioning, Header extension, Reversible VLCs | Data partitioning, Parameter setting, Flexible macroblock ordering, Redundant slice, Switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High
-75-
Conclusions
Commercial H.264 codecs are currently being developed by
several companies to replace or complement existing products.
Related companies
- UBVideo website http://www.ubvideo.com
- LSI Logic website http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com


-76-
Conclusions
Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson media AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and
Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec)
are mandatory in the Blu-ray Disc BD-ROM specification
-77-
Conclusions
Related group
- MPEG website http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org
Test software
http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software:
http://bs.hhi.de/~suehring/tml/download
Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences/
- http://trace.eas.asu.edu/yuv/yuv.html
-78-
Conclusions
H.264 licensing : MPEG LA and Via Licensing are now coordinating
the licensing terms, decoder-encoder royalties for product
manufacturers and participation fees for video streaming services
regardless of Profile(s)
MPEG LA website : http://www.mpegla.com
Via Licensing : http://www.vialicensing.com

FRExtensions (Fidelity Range Extensions)
Extension to 4:2:2 and 4:4:4 chroma formats
12 bit resolution for medical imaging
Scalable coding / lossless coding for digital cinema applications
High fidelity coding for the next generation optical discs
Extension for various applications: H. Schwarz, D. Marpe and T.
Wiegand, SNR-scalable extension of H.264/AVC, ICIP 2004,
vol. , pp. , Singapore, Oct. 2004.
FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications
-79-
Contacts for Further Information
JVT documents and software on open ftp website:
ftp://standards.polycom.com
http://iphome.hhi.de/suehring
JVT reflector subscription:
http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
JVT reflector e-mail:
jvt-experts@mail.imtc.org
JVT management team:
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)
Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de
-80-
References
[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, ISO/IEC 13818-2:
Information Technology-Generic Coding of Moving Pictures and
Associated Audio Information: Video, ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, ISO/IEC 14496-2:2000:
Information Technology-Coding of Audio-Visual Objects-Part 2:
Visual, ISO/IEC, 2000.
[3] H.263 : International Telecommunication Union, Recommendation
ITU-T H.263: Video Coding for Low Bit Rate Communication, ITU-T,
1998.
[4] H.264 : International Telecommunication Union, Recommendation
ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services,
ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, H.26L/JVT
Coding Network Abstraction Layer and IP-based Transport, IEEE ICIP
2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.
-81-
[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz,
Adaptive Deblocking Filter, IEEE Trans. CSVT, Vol. 13, pp. 614-619,
July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press,
1990.
[8] I. E.G. Richardson, H.264 and MPEG-4 Video Compression : Video
Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, Low-
Complexity Transform and Quantization in H.264/AVC, IEEE Trans.
CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, Run-Length Encoding, IEEE Trans. on
Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, Context-Based Adaptive
Binary Arithmetic Coding in the H.264/AVC Video Compression
Standard, IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.
-82-
[12] M. Flierl and B. Girod, Generalized B Picture and the Draft
H.264/AVC Video-Compression Standard, IEEE Trans. CSVT, Vol. 13, pp.
587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, The SP- and SI-Frames Design for
H.264/AVC, IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, H.264/AVC Over IP, IEEE Trans. CSVT, Vol. 13, pp.
645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, Report of The Formal Verification Tests
on AVC (ISO/IEC14496-10 | ITU-T Rec. H.264), MPEG2003/N6231,
December 2003.
[16] M. Ghanbari, Standard Codecs : Image Compression to Advanced
Video Coding, Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan,
Performance Comparison of Video Coding Standards using Lagrangian
Coder Control, IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-
504, Sept. 2002.
-83-
[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra,
Overview of the H.264/AVC Video Coding Standard, IEEE Trans. CSVT,
Vol. 13, pp. 560-576, July 2003.
[19] MPEG website : http://www.mpeg.org
[20] JVT website : ftp://standards.polycom.com
[21] MPEG LA website : http://www.mpegla.com
[22] H.264 / AVC JM Software :
http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com
-84-
[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com www.thomson.net
[42] www.conexant.com (H.264 decoder ICs _ HDTV & SDTV)
[43] www.pixtree.com
-85-
[44] BT Exact--http://www.btexact.bt.com/
[45] DemoGaFrX--www.dolby.com
[46] Equator--http://www.equator.com/
[47] Moonlight--www.elecard.com
[48] Sand Video--www.broadcom.com/
[49] VideoLocus-
http://www.lsilogic.com/technologies/industry_standards/mpeg_based_
standards_h_264.html
[50] W&W Communications (and DSP Research)--
http://www.wwcoms.com/
[51] Cisco Systems -- www.cisco.com
[52] Deutsche Telekom-- http://www.telekom3.de/en-p/home/cc-
startseite.html
-86-
[53] FastVDO-- http://www.fastvdo.com/
[54] Glance Networks---http://www.glance.net
[55] RADVISION-- www.radvision.com/
[56] Sun Microsystems--http://www.sun.com/
[57] S. Srinivasan et al, Windows media video 9: Overview and
applications, Signal Processing: Image Communication, vol.19, pp.
851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, Video compression from
concepts to H.264/AVC standard, Proc. IEEE, vol.93, pp. 18-31,
Jan. 2005.
[57b] C. Gomila, The H. 264/MPEG -4 AVC video coding standard,
Short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-
[59] N. Kamaci and Y. Altunbasak, Performance comparison of the
emerging H.264 video coding standard with the existing standards,
IEEE ICME, pp. , Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, SNR-scalable
extension of H.264/AVC, ICIP 2004, vol. , pp. , Singapore, Oct.
2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra The H.264/AVC
advanced video coding standard: Overview and introduction to the
fidelity range extensions, SPIE Conf. on applications of digital image
processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al, Video coding with H.264/AVC: Tools,
performance and complexity, IEEE CAS Magazine, vol. pp.7-34, I
quarter, 2004.
[63] W. Gao et al, AVS The Chinese next-generation video coding
standard, NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups/ JVT-EXPERTS LIST (FAQ)
-88-
[65] H.264 / AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al Overview of error resiliency schemes in
H.264/AVC standard, JVCIR, Special Issue on H.264/AVC, VOL. ,
pp. , June-Aug. 2005.
[68] www.stmicroelectronics.com WMV 9 and HD H.264/AVC decoder
chip (STB7100)
[69] a. MainConcept
http://www.mainconcept.com/index_flash.shtml
b. Mpegable
http://www.mpegable.com/show/home.html
c. Moonlight
http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry and it
supports AAC. All the codecs have a trial version for download, and
sample video clips are also available.

-89-
[70] ST Thomson, Broadcom and Ateme have decoder chips for H.264.
http://www.ateme.com/products/h264.php
Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).
[71] Moscow State University has published a study of current
implementation of H.264 standard, including a widely-used
implementation of MPEG-4 ASP as a reference.
The study is available at:
http://compression.ru/video/codec_comparison/mpeg-
4_avc_h264_en.html
Some of the results and observations in the study may be
interesting to H.264/AVC community.

Another interesting test has been performed in December 2004.
http://www.doom9.org/codecs-104-1.htm The methodology is
completely different than the one used by the Moscow State
University.
It features H264, WM9, RV10, VP6 and MPEG-4 ASP.
-90-
http://www.avc-alliance.org


http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum).

http://tinyurl.com/3u9ww (up to 3 recommendations can be
downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10 - MPEG-4 part 10
published standard costs CHF 260.00.)

-91-
Fidelity Range Extensions
Slices in a picture are compressed as follows:
"Intra" spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes
(directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions)
for prediction
4:2:2, 4:4:4 Formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy
coding of prediction errors
Residual color transform
Source editing such as Alpha blending
High bit rates [use RGB color format] YCgCo
High resolution

-92-
"Inter" temporal prediction: block-based motion estimation and
compensation

o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation
Seven block sizes:
16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma
interpolation)
o Weighted prediction
o Frame or Field based motion estimation for interlaced scanned
video
-93-
Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF)
Choice of compression (frame or field) is
selected at the frame level
MacroBlock Adaptive Frame Field (MBAFF)
Choice of compression (frame or field) is
selected for each vertical pair of macroblocks
o Field scan
Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless
macroblocks (FRExt-only)

-94-
8x8 (FRExt-only) or 4x4 Integer Inverse Transform
(conceptually similar to the well-known DCT)

Residual color transform for efficient RGB coding
without conversion loss or bit expansion (FRExt-only)

Scalar quantization

Encoder-specified perceptually weighted quantization
scaling matrices (FRExt-only)

Logarithmic control of quantization step size as a
function of quantization control parameter
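
The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as follows. This is a minimal illustration, not the normative derivation: the base table for QP 0 to 5 is an assumption taken from commonly cited values for H.264, and the function name `qstep` is hypothetical. The step size doubles for every increase of 6 in QP, which corresponds to a growth of roughly 12% per QP step.

```python
# Assumed step sizes for QP 0..5; higher QPs reuse them scaled by powers of 2.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Quantizer step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    # Doubling every 6 QP values gives the logarithmic control.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))

for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```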
-95-
Deblocking filter (within the motion compensation loop)

Coefficient scanning
o Zig-Zag (Frame)

o Field (alternate scan)

Lossless Entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)
-96-
Error Resilience Tools
o Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

SP and SI synchronization pictures for streaming and other uses

-97-
Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc.
especially in FRExt)

4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats

Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of
these tools, a slice can be I, P, B, SP or SI. A picture may contain different
slice types.

-98-
Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted)
(Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)
-99-
I Slice
(MB in I slice and intra MB in P and B slices)

Spatial intra prediction
9 directional modes for (4x4) or (8x8) blocks.

Apply (4 x4) or (8x8) IntDCT to Intra prediction errors.


Note (8x8) IntDCT for FRExt-only.

After (8x8) IntDCT, HVS weighting is applied to coefficients
(FRExt-only).

-100-
Quantized transform coefficients are scanned (zigzag or
field) and then entropy coded (CAVLC or CABAC)

PICAFF: Field processing similar to frame mode

MBAFF: If MB pair in field mode (frame mode), field
(frame) neighbors are used for spatial prediction.
-101-
I Slice (Spatial Prediction)

(16x16) Luma & Corresponding chroma block size
for full MB prediction

(8x8) luma prediction (FRExt-only)

(4x4) Luma prediction
-102-
For (16x16) luma, full MB prediction has four modes

Vertical: pels in MB are predicted from the pels just above the MB

Horizontal: pels in MB are predicted from the pels just left of the MB

DC: pels in MB are predicted as the average value of the
neighboring pels

Planar prediction: assume MB covers diagonally increasing luma values.
Predictor is formed based upon the planar equation.
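
Three of the four 16x16 modes above can be sketched in a few lines. This is a simplified illustration that assumes both the row above and the column to the left are available; the planar mode is omitted, and the function name `predict_16x16` and its string mode names are hypothetical.

```python
def predict_16x16(mode, above, left):
    """Return a 16x16 predicted block as a list of rows.

    above: the 16 reconstructed pels in the row just above the MB
    left:  the 16 reconstructed pels in the column just left of the MB
    """
    n = 16
    if len(above) != n or len(left) != n:
        raise ValueError("need 16 neighboring pels on each side")
    if mode == "vertical":
        # Every row repeats the row above the macroblock.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Rounded average of the 32 neighboring pels.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)

above = list(range(100, 116))
left = [50] * 16
print(predict_16x16("dc", above, left)[0][0])
```

The encoder would then transform and code only the difference between the actual macroblock and the chosen prediction.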
-103-
Chroma spatial prediction (operates on entire MB)

4:2:0 (8x8) Similar to (16x16) Luma MB prediction

4:2:2 (8x16) Vertical, Horizontal, DC, Planar

4:4:4 (16x16)

-104-
For (8x8) luma intra prediction
Nine Intra_8x8 prediction modes similar to the nine
modes for Intra_4x4
FRExt Only
-105-
Integer 8x8 Transform (luma only)
FRExt Only
-106-
FRExt Only
HVS Weighting Matrices
Matrix can be transmitted in SPS and PPS
Separate Matrix for 4x4 and 8x8 transforms
Separate Matrix for Inter and Intra
Encoder can design and use customized scaling matrices.
These are to be sent to the decoder at the sequence or picture level.

Default matrices
-107-
HVS Weighting Matrices
Scaling matrix reflecting visual perception is simply a multiplier
applied during the inverse quantization. (This itself is a
multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y
4x4 Intra Cb, Cr
4x4 Inter Y
4x4 Inter Cb, Cr
8x8 Intra Y
8x8 Inter Y
-108-
Two 8x8 scans, similar to those of the 4x4 transform, switched for frame/field coding (FRExt only)
Frame: Zig-Zag scan; Field: field scan
Coefficient scanning follows decreasing coefficient variance, so as to maximize
the runs of zero-valued coefficients along the scan
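
The frame (zig-zag) scan can be sketched generically for an NxN block. This is an illustration of the classic zig-zag ordering only; the field scan uses a different, table-defined order that is not reproduced here, and the function names are hypothetical.

```python
def zigzag_order(n):
    """Raster indices (row * n + col) of an n x n block in zig-zag order."""
    coords = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; the direction alternates on each diagonal.
    coords.sort(key=lambda rc: (rc[0] + rc[1],
                                rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [r * n + c for r, c in coords]

def scan_block(block):
    """Flatten a square block (list of rows) along the zig-zag order."""
    n = len(block)
    flat = [value for row in block for value in row]
    return [flat[i] for i in zigzag_order(n)]

print(zigzag_order(4))
```

Because low-frequency coefficients come first, the nonzero values tend to cluster at the start of the scanned list and the zeros at the end.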
-109-
Examples of parameters to be encoded

Parameters                                          Description
Sequence, picture and slice-layer syntax elements   Headers and parameters
Macroblock type mb_type                             Prediction method for each coded macroblock
Coded block pattern                                 Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter                                 Transmitted as a delta value from the previous value of QP
Reference frame index                               Identify reference frame(s) for inter prediction
Motion vector                                       Transmitted as a difference (mvd) from predicted motion vector
Residual data                                       Coefficient data for each 4x4 or 2x2 block
-110-
Exponential Golomb Codes (for data elements other than transform coefficients;
these codes are actually fixed, and are also called Universal Variable Length
Codes (UVLC))
-111-
These are variable length codes with a regular construction
[ M Zeros] [ 1 ] [ INFO ]

INFO is an M-bit field carrying information.
The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a
two-bit INFO field and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits.
M = floor(log2(code_num + 1))
INFO = code_num + 1 - 2^M
-112-
Decoding
1. Read in M leading zeros followed by 1
2. Read M-bit INFO field
3. code_num = 2^M + INFO - 1
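
The construction and the three decoding steps above can be sketched as follows. Codewords are represented as strings of "0" and "1" for readability; the function names are illustrative, and a real bitstream parser would work on packed bits instead.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits

def exp_golomb_decode(bits):
    """Decode one codeword from the start of bits.

    Returns (code_num, number_of_bits_consumed).
    """
    m = 0
    while bits[m] == "0":                    # 1. count M leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # 2. read M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1    # 3. code_num = 2^M + INFO - 1

for n in range(7):
    print(n, exp_golomb_encode(n))
```

Each codeword has length 2M + 1, so small and frequent values of code_num receive the shortest codes.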


CAVLC: Codes transform coefficients
CABAC: Code transform coefficients and MV

All other syntax elements are coded with the Exp_Golomb codes
-113-

ADOPTIONS

DVD Forum: High Profile is mandatory for HD DVD players.

The BD-ROM Video specification of the Blu-ray Disc Association:
FRExtensions are mandatory.

The DVB (Digital Video Broadcasting) standards for European broadcast
television: for SD, Main is mandatory and High is optional; for HD, High is
mandatory.

ATSC has preliminarily selected High profile.
Several other environments in the U.S., including various designs for
satellite and cable television, may soon embrace it as well.
-114-
For applications such as content-contribution,
content-distribution, and studio editing and post-
processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in
consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to 4:2:0
chroma sampling format)

Perform source editing functions such as alpha blending (a process for
blending of multiple video scenes, best known for use in weather reporting
where it is used to superimpose video of a newscaster over video of a
map or weather-radar scene)
-115-
Use very high bit rates

Use very high resolution

Achieve very high fidelity even representing some parts of the video
losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-
High Profiles

High profile (HP), supporting 8-bit video with 4:2:0
sampling, addressing high-end consumer use and other
applications using high-resolution video without a need for
extended chroma formats or extended sample accuracy

High 10 profile (Hi10P), supporting 4:2:0 video with up to
10 bits of representation accuracy per sample

High 4:2:2 profile (H422P), supporting up to 4:2:2
chroma sampling and up to 10 bits per sample, and
-117-
High 4:4:4 profile (H444P), supporting up to 4:4:4
chroma sampling, up to 12 bits per sample, and
additionally supporting efficient lossless region coding
and an integer residual color transform for coding RGB
video while avoiding color-space transformation error

All of these profiles support all features of the Main
profile, and additionally support an adaptive transform
block size and perceptual quantization scaling matrices.
-118-
MB structure in 4:2:2 and 4:4:4 formats (FRExt only)
4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16
4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16
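
To make the effect of the chroma format on macroblock size concrete, the sketch below counts the samples in one macroblock. The chroma block sizes are written as (width, height) and follow the slides (8x8 for 4:2:0, 8x16 for 4:2:2, 16x16 for 4:4:4); the names `CHROMA_BLOCK` and `samples_per_macroblock` are hypothetical.

```python
# Chroma block size (width, height) per macroblock for each chroma format.
CHROMA_BLOCK = {"4:2:0": (8, 8), "4:2:2": (8, 16), "4:4:4": (16, 16)}

def samples_per_macroblock(chroma_format):
    """Total luma plus chroma samples in one 16x16 macroblock."""
    width, height = CHROMA_BLOCK[chroma_format]
    return 16 * 16 + 2 * width * height   # one Y block, plus Cb and Cr

for fmt in CHROMA_BLOCK:
    print(fmt, samples_per_macroblock(fmt))
```

Going from 4:2:0 to 4:4:4 doubles the number of samples per macroblock, which is one reason the extended formats call for higher bit rates.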
-119-
RGB to YCbCr

Y = K_R * R + (1 - K_R - K_B) * G + K_B * B

Cb = (B - Y) / (2 * (1 - K_B))
Cr = (R - Y) / (2 * (1 - K_R))

K_R = 0.2126; K_B = 0.0722; K_R + K_B + K_G = 1

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B - Y); Cr = 0.6350 (R - Y)

(ITU-R Rec. BT.601 defines K_B = 0.114, K_R = 0.299)
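
The conversion above can be checked numerically with a short sketch. It assumes R, G and B are floating-point values in the range 0 to 1 and uses the K_R and K_B values from the slide as defaults; the function name `rgb_to_ycbcr` is illustrative.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) for given luma coefficients."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # Cb lies in -0.5..0.5
    cr = (r - y) / (2.0 * (1.0 - kr))   # Cr lies in -0.5..0.5
    return y, cb, cr

print(rgb_to_ycbcr(1.0, 1.0, 1.0))                      # white
print(rgb_to_ycbcr(0.0, 0.0, 1.0))                      # pure blue
print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))  # pure red, BT.601 coefficients
```

The scale factors 1 / (2 * (1 - K_B)) and 1 / (2 * (1 - K_R)) are chosen so that pure blue and pure red map to the extreme chroma value of 0.5. Because the coefficients are not exact binary fractions, an integer implementation has to round, which is the rounding error the next slide addresses.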

-120-
Rounding error in RGB to YCbCr
FRExt Only: YCgCo

Y  = (1/2) * [G + (R + B)/2]
Cg = (1/2) * [G - (R + B)/2]
Co = (R - B)/2

Cg = Green Chroma; Co = Orange Chroma
To further avoid any rounding error, add only one bit of precision to
chroma samples
-121-

In 4:4:4 video, FRExt has residual color transform.

Keep RGB domain (same depth) for input, output and stored
reference pictures and use the forward and inverse color
transformations inside the encoder and decoder for processing of
the residual data only.

Eliminates color-space conversion error without significantly
increasing the overall complexity of the system.
-122-
Forward color space conversion
Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

Inverse color space conversion
t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co

Where t is an intermediate temporary variable and >> denotes
an arithmetic right shift operation.
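
The reversibility of this integer transform can be demonstrated with a short sketch. It applies the forward and inverse steps to integer samples and checks that the original RGB values come back exactly; the function names are illustrative. Python's `>>` on integers is an arithmetic (flooring) shift, which matches the definition used on the slide.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting-based color transform on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def ycgco_to_rgb(y, cg, co):
    """Inverse transform; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b

sample = (10, 20, 30)
print(rgb_to_ycgco(*sample))
print(ycgco_to_rgb(*rgb_to_ycgco(*sample)))
```

Each inverse step subtracts exactly the quantity that the matching forward step added, so no rounding error accumulates. The cost is one additional bit of dynamic range for Cg and Co.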
-123-
Auxiliary pictures, which are extra monochrome pictures sent
along with the main video stream, and can be used for such
purposes as alpha blend compositing (specified as a different
category of data than SEI).

Film grain characteristics SEI, which allow a model of film
grain statistics to be sent along with the video data, enabling
an analysis-synthesis style of video enhancement wherein a
synthesized film grain is generated as a post-process when
decoding, rather than burdening the encoder with the
representation of exact film grain during the encoding
process.
SEI : Supplemental Enhancement Information
-124-
Deblocking filter display preference SEI, which allows the
encoder to indicate cases in which the pictures prior to the
application of the deblocking filter process may be
perceptually superior to the filtered pictures.

Stereo video SEI indicators, which allow the encoder to
identify the use of the video on stereoscopic displays, with
proper identification of which pictures are intended for
viewing by each eye.
-125-
Higher profile supports all capabilities of the lower ones
Also capable of decoding all bit streams encoded for the lower nested
profiles
All high profiles support all features of the main profile
New Profiles in the H.264/AVC FRExt Amendment
-126-
Levels in H.264/AVC

Level 1b added in FRExt. For some 3G wireless environments
-127-
Levels in H.264/AVC
1. If a picture size is smaller than the typical picture size
then frame rate can be higher up to a maximum of 172
frames/sec
2. Horizontal and vertical maximum sizes cannot be more
than sqrt[(Total # of pixels/frame)x8]
3. If at a given level, picture size is less than that in the
table, # of reference frames for ME and MC can be up
to 16.
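
The second constraint can be illustrated with a small calculation. The sketch takes the maximum number of pixels per frame at a level as input; the value used in the example (1920x1088) is only an illustration and is not quoted from the level table, and the function name `max_dimension` is hypothetical.

```python
import math

def max_dimension(max_pixels_per_frame):
    """Upper bound on picture width or height: sqrt(pixels per frame * 8)."""
    return math.sqrt(max_pixels_per_frame * 8)

# Example frame budget (assumed value for illustration).
budget = 1920 * 1088
print(int(max_dimension(budget)))
```

The bound keeps extreme aspect ratios out: a picture may be wider or taller than the typical size for the level, but only up to this limit and only while the total pixel count stays within the budget.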
-128-
To meet more demanding high fidelity applications
Compressed Bit Rate Multipliers for FRExt Profiles
Multipliers for fourth column of table in page 125
-129-
24 Frames/sec film
1920x1080 progressive
The High profile of FRExt produced nominally better video quality
than MPEG-2 when using only one third as many bits (8 Mbps
versus 24 Mbps)
The High profile of FRExt produced nominally transparent (i.e.,
difficult to distinguish from the original video without
compression) video quality at only 16 Mbps.
[9] T. Wedi, Y. Kashiwagi, Subjective quality evaluation of H.264/AVC FRExt for HD movie content,
JVT document JVT-L033, July 2004.
-130-
Courtesy: Advanced Technology Group of Motorola BCS
-131-
Courtesy: Advanced Technology Group of Motorola BCS
-132-
Fig. 7: (a)-(e) Comparison of R-D curves for MPEG-2 (MP2),
MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were
inserted every 15 frames (N=15) and two non-reference B frames
per reference I or P frame were used (M=3).

Courtesy: Advanced Technology Group of Motorola BCS
MPEG-4 ASP yields a coding gain of 1.5x over MPEG-2.
MPEG-4 AVC yields a coding gain of 2.0x over MPEG-2.
-133-
High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps
Nominally transparent video quality on 1080p24 at 16 Mbps
-134-
(Fast VDO)
Sub-optimal uses of B frames and other aspects make the plotted
performance conservative for FRExt, thus the remark in the figure about
potential future performance
-135-
High Profile Details:
Deblocking Filter, CABAC, Signaling
Deblocking Filter:
Only control of filter is adjusted: do not filter 4x4 blocks
No change to filter operation itself

CABAC:
61 new contexts and corresponding initialization values
No change to CABAC engine

Signaling:
8x8 transform on/off flag at PPS level
8x8 transform on/off flag per macroblock allows adaptive use
-136-
High vs. Main Profile Summary
High Profile contains:
Main profile
Adaptive MB level switching between 8x8 and 4x4 transform block sizes.
Encoder specified perceptual based quantization scaling matrices
Encoder specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
HD Film: 12%
HD Video (progressive): 12%
HD Video (interlace): 4% (only 2 test clips)
SD Video (interlace): 6%

Complexity impact:
Implementation beyond Main Profile affects Intra prediction,
transform, deblocking filter control, CABAC decoding
No increase in computational requirements
Slight increase in memory requirements (CABAC, transform)
-137-
Licensing of H.264/AVC Technology
Two patent pools to obtain the license
1. MPEGLA www.mpegla.com
2. Via licensing www.vialicensing.com

These two patent pools do not guarantee that they
cover the entire technology of H.264 as participation of
a patent owner in a patent pool is voluntary.
-138-
AUDIO coding & systems
H.264 is limited to video
Audio coder: Bit rates, Quality levels and # of channels
left to industry and standards groups (ATSC, SCTE,
ARIB, DVB etc.)
DVB is considering AAC with SBR (AAC plus);
MPEG calls it HE-AAC (HE: High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG etc. will continue to use
MPEG-1 Audio, MPEG-2, AAC and AC-3.
