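$$ Y_D = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right) // 2 $$
where // = rounding to the nearest integer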
VC Algorithm : Transform
Chroma DC coefficients, Intra prediction mode
(4x4) IntDCT
Walsh Hadamard transform : 2 x 2 DC coefficients
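$$ Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} $$
[Figure: chroma blocks of a 4:2:0 macroblock. Blocks 16 and 17 hold the 2x2 DC coefficients; blocks 18-25 hold the AC coefficients of U and V.]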
For 4:2:2 and 4:4:4 chroma formats Hadamard block size is increased.
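As an illustration of the 2 x 2 Hadamard transform of the chroma DC coefficients, here is a minimal Python sketch. The function name `hadamard_2x2` and the sample values are hypothetical; the only thing taken from the slide is the product of the DC block with the matrix [[1, 1], [1, -1]] on both sides.

```python
def hadamard_2x2(dc):
    """Return H * dc * H for H = [[1, 1], [1, -1]] and a 2x2 block dc."""
    (a, b), (c, d) = dc
    return [
        [a + b + c + d, a - b + c - d],
        [a + b - c - d, a - b - c + d],
    ]


# Hypothetical DC values of the four 4x4 chroma blocks of one component.
dc_block = [[1, 2], [3, 4]]
transformed = hadamard_2x2(dc_block)
print(transformed)  # [[10, -2], [-4, 0]]
```

Applying the same function twice returns the input scaled by 4, which is why the inverse can reuse the same butterfly followed by a rescale.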
VC Algorithm : Transform
Block diagram emphasizing transform
[Figure: encoder block diagram with the Transform & Quantization block emphasized. Blocks: Video Input, Intra/Inter Mode Decision, Intra Prediction, Motion Estimation, Motion Compensation, Picture Buffering, Deblocking Filtering, Inverse Quantization & Inverse Transform, Transform & Quantization, Entropy Coding, Bitstream Output.]
- 4 x 4 integer DCT transform
$$ H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} $$
- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks
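The 4 x 4 integer transform can be sketched in a few lines of Python. This is a minimal illustration, assuming the core transform is computed as H X H^T with H = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]; the helper names are hypothetical, and the scaling that turns this into an approximate DCT is assumed to be folded into quantization rather than applied here.

```python
# Forward 4x4 core transform matrix (integer approximation of the DCT).
H = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Compute H * block * H^T using integer arithmetic only."""
    return matmul(matmul(H, block), transpose(H))


# A flat residual block puts all of its energy into the DC coefficient.
flat = [[1] * 4 for _ in range(4)]
print(forward_core_transform(flat)[0][0])  # 16
```

Only additions, subtractions and multiplications by 2 are needed, so the transform is exact in integer arithmetic.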
VC Algorithm : Quantization
Multiplication operation for the exact transform
is combined with the multiplication of scalar
quantization
Encoder : post-scaling and quantization
Decoder : inverse quantization and pre-scaling
$$ Y_{ij} = \mathrm{round}\left( X_{ij} \cdot \frac{SF_{ij}}{Qstep} \right) $$
$$ X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij} $$
X : quantizer input
Y : quantizer output
Qstep : quantization step size, selected by the quantization parameter QP. There are
52 values in total for 8 bits per decoded sample, and Qstep doubles for every increment of 6 in QP.
FRExt extends the QP range beyond 52 values by 6 for each additional bit per decoded sample
SF : scaling term
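The relationship between QP and Qstep, and the two scaling formulas, can be sketched as follows. This is an illustrative sketch only: the six base step sizes for QP 0 to 5 are the values commonly tabulated in the H.264 literature and are an assumption here (they are not given on the slide), the function names are hypothetical, and the scaling term SF is passed in as a plain number.

```python
# Assumed base step sizes for QP = 0..5; every +6 in QP doubles Qstep.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Return the quantization step size for an 8-bit QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


def round_half_away(value):
    """Round to the nearest integer, with halves rounded away from zero."""
    sign = -1 if value < 0 else 1
    return sign * int(abs(value) + 0.5)


def quantize(x, sf, qp):
    """Encoder side: Y = round(X * SF / Qstep)."""
    return round_half_away(x * sf / qstep(qp))


def rescale(y, sf, qp):
    """Decoder side: X' = Y * Qstep * SF."""
    return y * qstep(qp) * sf


print(qstep(4), qstep(10), qstep(51))  # 1.0 2.0 224.0
print(quantize(100, 1.0, 28))          # 6
print(rescale(6, 1.0, 28))             # 96.0
```

With six steps per doubling, each unit increase in QP raises Qstep by roughly 12.5% on average.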
VC Algorithm : Transform, Quantization
Rescale and Inverse transform
Intra (16x16) prediction mode only
[Figure: transform and quantization flow.
Encoder part: input block; forward transform; 2x2 or 4x4 DC transform (chroma or Intra-16 luma only); post-scaling and quantization; encoder output.
Decoder part: decoder input; 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only); inverse quantization and pre-scaling; inverse transform; output block.]
VC Algorithm : Entropy Coding
All syntax elements other than residual transform
coefficients are encoded by the Exp-Golomb codes (UVLC)
Scan order to read the residual data (quantized transform
coefficients) : zig-zag, alternate
Context-based Adaptive Variable Length Coding (CAVLC) in
All Profiles
Context-based Adaptive Binary Arithmetic Coding (CABAC)
in Main Profile
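a) Zig-zag scan (scan position of each coefficient in the 4x4 block)
0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15
b) Alternate scan (scan position of each coefficient in the 4x4 block)
0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15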
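The two scan orders can be applied with a small table lookup. The sketch below is illustrative: each table entry is read as the scan position of the coefficient at that row and column, and the function and variable names are hypothetical.

```python
# Scan position of each coefficient, indexed as table[row][column].
ZIGZAG_POSITIONS = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE_POSITIONS = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan_block(block, positions):
    """Reorder a 4x4 block into a 16-element list following a scan table."""
    out = [0] * 16
    for row in range(4):
        for col in range(4):
            out[positions[row][col]] = block[row][col]
    return out


# Label every coefficient with its raster index to make the order visible.
raster = [[4 * r + c for c in range(4)] for r in range(4)]
print(scan_block(raster, ZIGZAG_POSITIONS))
# [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]
```

The alternate scan visits coefficients column by column more quickly, which suits field-coded material.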
Exponential Golomb codes (for data elements other than
transform coefficients these codes are actually fixed,
and are also called Universal Variable Length Codes
(UVLC))
These are variable length codes with a regular construction
[M Zeros] [1] [INFO]
INFO is an M-bit field carrying information.
The first codeword has no leading zero or trailing INFO.
Code words 1 and 2 have a single-bit INFO field, code words 3-6
have a two-bit INFO field and so on.
The length of each Exp-Golomb codeword is (2M+1) bits.
$$ M = \lfloor \log_2(\text{code\_num} + 1) \rfloor $$
$$ \text{INFO} = \text{code\_num} + 1 - 2^M $$
Decoding
1. Read in M leading zeros followed by 1
2. Read in M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$
(For codeword 0, INFO and M are zero)
CAVLC: Codes transform coefficients
CABAC: Codes transform coefficients and MV
All other syntax elements are coded with the
Exp-Golomb codes
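The construction and decoding rules for Exp-Golomb codewords can be checked with a short Python sketch. The function names are hypothetical and codewords are represented as strings of '0' and '1' purely for readability; a real codec would operate on a bitstream.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of a bit string."""
    m = 0
    while bits[m] == "0":                    # count the M leading zeros
        m += 1
    info_bits = bits[m + 1:2 * m + 1]        # the M bits after the '1'
    info = int(info_bits, 2) if m else 0
    return (1 << m) + info - 1               # code_num = 2^M + INFO - 1


for n in range(7):
    print(n, exp_golomb_encode(n))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
# 5 00110
# 6 00111
```

Each codeword has length 2M + 1, so small, frequent values cost only a few bits.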
VC Algorithm : Entropy Coding
CAVLC : handles zero and +/-1 coefficients differently from the
levels of the other coefficients. The total number of zeros and the
number of trailing +/-1 values are coded. For the other coefficients,
their levels are coded.
Encoding steps
step 1 : encode the total number of nonzero coefficients and +/-1 (trailing
ones) values
step 2 : encode the sign of each trailing one in reverse order
step 3 : encode the levels of the remaining non-zero coefficients in reverse
order
step 4 : encode the total number of zeros before the last coefficient
step 5 : encode each run of zeros
H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for
the actual coefficients)
These are adapted to the current stream or context (hence CAVLC)
VC Algorithm : Entropy Coding
Example of CAVLC
order :  0  1  2  3  4  5  6  7  8  9 ... 16
coeff.: c0 c1 c2  0  1  1  0 -1  0  0 ...  0
Step 1 : encode the total number of nonzero coefficients and the number of +1 or -1 values (trailing ones)
from look-up table
no. of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7)
no. of trailing ones = 3 (order 4, 5, 7)
Step 2 : encode the sign of each trailing one in reverse order
- (order 7) , + (order 5), + (order 4)
Step 3 : encode the level of the remaining non-zero coefficients in reverse order
c2 (order 2), c1, c0
Step 4 : encode the total no. of zeros before the last coefficient
2 (order 3, 6)
Step 5 : encode each run of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
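The five quantities in this example can be derived mechanically from the scanned coefficient list. The sketch below only extracts the symbols; it does not perform the table look-ups that produce the actual codewords. The function name is hypothetical, and the values 3, -2 and 4 standing in for c0, c1 and c2 are invented for illustration.

```python
def cavlc_symbols(coeffs):
    """Extract the symbols CAVLC encodes from coefficients in scan order."""
    nonzero = [i for i, v in enumerate(coeffs) if v != 0]

    # Trailing ones: up to three +/-1 values at the end of the scan.
    trailing = []
    for i in reversed(nonzero):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break

    signs = ["-" if coeffs[i] < 0 else "+" for i in trailing]
    levels = [coeffs[i] for i in reversed(nonzero) if i not in trailing]
    total_zeros = nonzero[-1] + 1 - len(nonzero) if nonzero else 0

    # Runs of zeros before each coefficient, in reverse order, until no
    # zeros remain to be signalled.
    runs, zeros_left = [], total_zeros
    for k in range(len(nonzero) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nonzero[k] - nonzero[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeffs": len(nonzero),
        "trailing_ones": len(trailing),
        "signs": signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


# c0, c1, c2 replaced by hypothetical values 3, -2, 4.
example = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8
print(cavlc_symbols(example))
```

For this input the function reports 6 nonzero coefficients, 3 trailing ones with signs -, +, +, 2 zeros before the last coefficient, and runs 1, 0, 1, matching the steps listed above.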
VC Algorithm : Entropy Coding
CABAC : uses arithmetic coding and, to achieve good compression,
updates the probability model for each syntax element as coding
proceeds. Both MVs and residual transform coefficients are coded
by CABAC.
Encoding steps
step 1 : context modeling: choose a suitable probability model
step 2 : binarization: if a symbol is non-binary valued, it is
mapped into a sequence of binary decisions called bins
step 3 : binary arithmetic coding using probability estimates provided
by context modeling
CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive
VC Algorithm : B Slice
Generalized Bidirectional prediction
Supports not only forward/backward prediction pair, but also
forward/forward and backward/backward pairs
Direct mode
Derives reference picture, block size, and motion vector data
from the subsequent inter picture.
Weighted prediction
Scaling operation by applying a weighting factor to the samples
of motion-compensated prediction data in P or B slice.
Pictures coded using B slices can be used as references for
decoding of subsequent pictures in decoding order (with an
arbitrary relationship to such pictures in display order)
VC Algorithm : B Slice
Generalized Bidirectional prediction
Multiple reference pictures mode
Two forward references : suited to a region just before a scene
change
Two backward references : suited to a region just after a scene
change
[Figure: the current picture between the previous pictures and the next pictures, showing three prediction pairings: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV.]
VC Algorithm : B Slice
Direct mode
Forward / backward pair of bi-directional prediction
Prediction signal is calculated by a linear combination of two
blocks that are determined by the forward and backward
motion vectors pointing to two reference pictures.
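[Figure: the direct-mode partition in the current picture lies between the List 0 reference and the List 1 reference. The co-located partition has motion vector mvCol; tb and td are the temporal distances marked in the figure; mvL0 and mvL1 are the derived motion vectors.]
mvL0 = tb × mvCol / td
mvL1 = (td - tb) × mvCol / td
where mvCol is the MV used in the co-located MB of the subsequent picture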
VC Algorithm : B Slice
Weighted prediction
Different weights of reference signals for gradual transitions
from scene to scene, e.g., fade to black (the luma samples of
the scene gradually approach zero) or fade from black
Different weighted prediction method for a macroblock of P
slice or B slice
A prediction signal p for B slice is obtained by different weights
from two reference signals, r1 and r2.
p = w1 × r1 + w2 × r2
where w1 and w2 are weighting factors
Implicit type : the factors are calculated based on the temporal
distance between the pictures
Explicit type : the factors are transmitted in the slice header
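A small sketch of the weighted combination p = w1 × r1 + w2 × r2 is given below. It is illustrative only: the function names are hypothetical, samples are assumed to be 8-bit, and the implicit weights are modeled as simple linear interpolation over temporal distance, which is a simplifying assumption rather than the exact derivation used by the standard.

```python
import math


def weighted_prediction(r1, r2, w1, w2):
    """Combine two reference sample lists as p = w1*r1 + w2*r2, clipped to 8 bits."""
    out = []
    for a, b in zip(r1, r2):
        p = math.floor(w1 * a + w2 * b + 0.5)   # round to nearest integer
        out.append(max(0, min(255, p)))         # clip to the 8-bit range
    return out


def implicit_weights(dist_to_r1, dist_r1_to_r2):
    """Assumed implicit rule: the temporally closer reference gets the larger weight."""
    w2 = dist_to_r1 / dist_r1_to_r2
    return 1.0 - w2, w2


# Fade example: the current picture is one step after r1 and three before r2.
w1, w2 = implicit_weights(1, 4)
print(w1, w2)                                     # 0.75 0.25
print(weighted_prediction([200], [100], w1, w2))  # [175]
```

In the explicit type the same combination would be used, but with w1 and w2 read from the slice header instead of being derived.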
VC Algorithm: SP and SI Slices (Extended profile only)
Switched slice
SP slice : the specially coded slice for efficient switching
between video streams, similar to coding of a P slice
SI slice : the switched slice, similar to coding of an I slice
[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5); the switching picture S(3) connects the two streams.]
Allows bit stream switching and additional functionalities such as random access, fast forward,
reverse and stream splicing.
Error Resilience
Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice SP/SI
Data partitioning
Arbitrary Slice Order (ASO)
Only in Extended Profile
Data partitioning slices (Extended profile only)
1. Coded data of a slice is placed in three separate data
partitions A,B & C.
2. A has slice header and header data for each MB in the
slice
3. B has coded residual data for intra and SI slice MBs
4. C has coded residual data for inter coded MB
5. Place each partition A, B & C in a separate NAL unit
and transport separately
Error Resilience : Parameter setting
The sequence parameter set contains all information
related to a sequence of pictures
A picture parameter set contains all information related to
all the slices belonging to a single picture.
The encoder chooses the appropriate picture parameter set
to use by referencing the storage location in the slice
header of each coded slice.
[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, exchanged over a reliable parameter set exchange; VCL data is then transferred with PS #3. Parameter Set #3: Video format NTSC; Motion Resolution; Enc: CABAC; Frame width: 11.]
Error Resilience : FMO
Flexible macroblock ordering allows macroblocks to be assigned
to slices in an order other than the scan order.
Assume that all macroblocks of the picture are allocated
either to slice group 0 or slice group 1, and the
macroblocks in each slice group are dispersed through the
picture.
If the packet containing the information of slice group 1 is lost
during transmission, then the lost macroblock can be
recovered by the error concealment mechanism, since every
lost macroblock has several spatial neighbors that belong to
the other slice.
ASO is similar to FMO. Randomizes data prior to
transmission. Errors are distributed more randomly over
the video frames rather than in a single block of data.
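The two-slice-group case described for FMO can be illustrated with a checkerboard map. This is a sketch under the assumption that "dispersed" means a checkerboard pattern, which is one possible assignment; the function names are hypothetical.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbours_in_other_group(mb_map, x, y):
    """Count the 4-connected neighbours that belong to a different slice group."""
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= ny < len(mb_map) and 0 <= nx < len(mb_map[0]):
            if mb_map[ny][nx] != mb_map[y][x]:
                count += 1
    return count


mb_map = checkerboard_slice_groups(4, 3)
for row in mb_map:
    print(row)
# [0, 1, 0, 1]
# [1, 0, 1, 0]
# [0, 1, 0, 1]
print(neighbours_in_other_group(mb_map, 1, 1))  # 4
```

If the packet carrying one slice group is lost, every lost interior macroblock still has four received neighbours available for concealment.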
Error Resilience : Redundant Slice
Redundant slices allow one or more redundant representations
of the same macroblocks to be placed in the bitstream.
For example, the primary representation can be coded with
a low quantization parameter (hence in good quality),
whereas the redundant slice can be coded with a high
quantization parameter (hence, in a much coarser quality,
but also utilizing fewer bits).
A decoder reacts to redundant slices by reconstructing only
the primary slice, if it is available, and discarding the
redundant slice. However, if the primary slice is missing,
the redundant slice can be reconstructed.
Comparison of Coding Efficiency
Subjective verification test
Comparison of the H.264 Baseline Profile (BP) and MPEG-4 part
2 Simple Profile (SP) for the multimedia definition (MD). The
numbers in the table indicate the coding efficiency improvement
achieved by the H.264 where the codecs being compared
provide statistically equivalent picture quality. The letter T
indicates that H.264 achieved transparency.
H.264 Baseline Profile achieves a coding efficiency improvement
of 2 times or greater in 14 out of 18 statistically conclusive cases.
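Columns: bitrate [kbps] for QCIF: 24, 48, 96, 192; bitrate [kbps] for CIF: 96, 192, 384, 768
Entries are listed per sequence in the order they appear; the positions of empty cells are not recoverable.
Foreman: > 1x | 2x | 2x | T | 2x | > 2x | T | T
Paris: > 1x | 2x | 2x | 2x | 2x | T, 2x | T
Head: > 2x | 2x | 2x | T | T
Zoom: > 1x | 1x | 2x | 2x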
Comparison of Coding Efficiency
Subjective verification test
Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2
Advanced Simple Profile (ASP) for the MD.
H.264 Main Profile achieves a coding efficiency improvement
of 2 times or greater in 18 out of 25 statistically conclusive
cases.
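Columns: bitrate [kbps] for QCIF: 24, 48, 96, 192; bitrate [kbps] for CIF: 96, 192, 384, 768
Entries are listed per sequence in the order they appear; the positions of empty cells are not recoverable.
Football: 2x / 1x | 2x | 2x | > 1x | > 1x | 1x | > 1x
Mobile: 2x / 1x | 2x | 2x | > 2x | 4x | > 2x | T
Husky: 2x | 2x | > 1x | 2x | 2x | 2x
Tempete: 2x | 2x | > 2x | T | 2x | 2x | T, 2x | T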
Comparison of Coding Efficiency
Subjective verification test
Comparison of H.264 Main Profile and MPEG-2 for the Standard
Definition (SD)
When compared to MPEG-2 HiQ (real-time High Quality), H.264
Main Profile achieves a coding efficiency improvement of 1.5
times or greater in 8 out of 12 statistically conclusive cases.
When compared to MPEG-2 TM5, H.264 Main Profile achieves a
coding efficiency improvement of 1.8 times or greater in 9 out
of 12 statistically conclusive cases.
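Columns: bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6; bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6
Entries are listed per sequence in the order they appear; the positions of empty cells are not recoverable.
Football: > 1.5x | > 1.3x | 1.3x | 1.5x | 2x | 1.8x | 1.3x | 1.5x
Mobile: 4x | 2.7x | 2x | T | T | > 4x | > 2.7x | > 2x | T | T
Husky: > 1.5x | 1.3x | 1x / 1.3x | 1.5x | 2.7x / 2x | 1.8x | 2x | > 1.5x
Tempete: T, 2x | T | T | T | T | T, 4x | T | T | T | T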
Comparison of Coding Efficiency
Subjective verification test
Comparison of H.264 Main Profile and MPEG-2 for the High
Definition (HD)
When compared to MPEG-2 HiQ, H.264 Main Profile achieves a
coding efficiency improvement of 1.7 times or greater in 7 out
of 9 statistically conclusive cases.
When compared to MPEG-2 TM5, H.264 Main Profile achieves a
coding efficiency improvement of 1.7 times or greater in 8 out
of 9 statistically conclusive cases.
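Columns: bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20; bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20
Entries are listed per sequence in the order they appear; the positions of empty cells are not recoverable.
720 (60p)
Crew: 1.7x | 2x | T | 1.7x | 2x | T
Harbour: T, 3.3x | T | T | T, 1.7x | T | T
1080 (30i)
Stockholm Pan: 1x | 2x
New Mobile & Calendar: T, 2x | T | T, 2x | T
1080 (25p)
River Bed: > 1.7x | > 1x | T | > 1.7x | > 1x | T
Vintage Car: 1.7x | T, 2x | T | 1.7x | T, 2x | T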
Comparison of Coding Efficiency
Objective test
PSNR (between original and reconstructed pictures) and bitrate
saving results of Tempete CIF 15Hz sequence for the video
streaming application
Legend: HLP = High Latency Profile; ASP = Advanced Simple Profile; H.26L = H.264 Main Profile
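For reference, the PSNR between an original and a reconstructed picture is 10 log10(peak^2 / MSE). The sketch below assumes 8-bit samples (peak value 255) held in flat lists; the function name is hypothetical and no measured results are implied.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Return the PSNR in dB between two equally sized sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf          # identical pictures
    return 10 * math.log10(peak ** 2 / mse)


# Every sample off by one gives MSE = 1, i.e. about 48.13 dB for 8-bit video.
print(round(psnr([10, 20, 30, 40], [11, 19, 31, 39]), 2))  # 48.13
```

Bitrate savings are then read off by comparing the bitrates at which two codecs reach the same PSNR.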
Comparison of Coding Efficiency
Objective test
PSNR and bitrate saving results of Paris CIF 15Hz sequence for
the video conferencing application
Legend: CHC = Conversational High Compression; SP = Simple Profile; ASP = Advanced Simple Profile; H.26L = H.264 Baseline Profile
Conclusions
H.264 outperforms the previous standards
Comparison of standards
Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes
Conclusions
Comparison of standards (continued)
Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, Data partitioning, Header extension, Reversible VLCs | Data partitioning, Parameter setting, Flexible macroblock ordering, Redundant slice, Switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High
Conclusions
Currently the commercial H.264 codecs are widely developed by
several companies for replacing / complementing existing products.
Related companies
- UBVideo website http://www.ubvideo.com
- LSI Logic website http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com
Conclusions
Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson media AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and
Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec)
are mandatory (Blu-ray Disc BD-ROM specification)
Conclusions
Related group
- MPEG website http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org
Test software
http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software:
http://bs.hhi.de/~suehring/tml/download
Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences/
- http://trace.eas.asu.edu/yuv/yuv.html
Conclusions
H.264 licensing : MPEG LA and Via Licensing are now coordinating
the licensing terms, decoder-encoder royalties for product
manufacturers and participation fees for video streaming services
regardless of Profile(s)
MPEG LA website : http://www.mpegla.com
Via Licensing : http://www.vialicensing.com
FRExtensions
to 4:2:2 and 4:4:4 chroma formats
12 bit resolution for medical imaging
Scalable coding/ Lossless coding for digital cinema application
High fidelity coding for the next generation optical discs
Extension for various applications: H. Schwarz, D. Marpe and T.
Wiegand, SNR-scalable extension of H.264/AVC, ICIP 2004,
vol. , pp. , Singapore, Oct. 2004.
FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications
Contacts for Further Information
JVT documents and software on open ftp website:
ftp://standards.polycom.com
http://iphome.hhi.de/suehring
JVT reflector subscription:
http:/mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
JVT reflector e-mail:
jvt-experts@mail.imtc.org
JVT management team:
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)
Dr. K. R . Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de
http://www.avc-alliance.org
http://ftp3.itu.int/av-arch/jvt-site
http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum).
http://tinyurl.com/3u9ww (up to 3 recommendations can be
downloaded per year)
http://tinyurl.com/6dnck (ISO/IEC 14496-10 - MPEG-4 Part 10;
the published standard costs CHF 260.00)
Fidelity Range Extensions
Slices in a picture are compressed as follows:
"Intra" spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes
(directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions)
for prediction
4:2:2, 4:4:4 Formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy
coding of prediction errors
Residual color transform
Source editing such as Alpha blending
High bit rates [use RGB color format] YCgCo
High resolution
"Inter" temporal prediction block based motion estimation and
compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation
Seven block sizes:
16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma
interpolation)
o Weighted prediction
o Frame or Field based motion estimation for interlaced scanned
video
Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF)
Choice of compression (frame or field) is
selected at the frame level
MacroBlock Adaptive Frame Field (MBAFF)
o Field scan
Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless
macroblocks (FRExt-only)
In MBAFF, the choice of compression (frame or field) is
made for each vertical pair of macroblocks.
8x8 (FRExt-only) or 4x4 Integer Inverse Transform
(conceptually similar to the well-known DCT)
Residual color transform for efficient RGB coding
without conversion loss or bit expansion (FRExt-only)
Scalar quantization
Encoder-specified perceptually weighted quantization
scaling matrices (FRExt-only)
Logarithmic control of quantization step size as a
function of quantization control parameter
Deblocking filter (within the motion compensation loop)
Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)
Lossless Entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)
Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices
SP and SI synchronization pictures for streaming and other uses
Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc.
especially in FRExt)
4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats
Auxiliary pictures for alpha blending (FRExt-only)
Each slice need not use all these tools. Depending upon the subset of
these tools, a slice can be I, P, B, SP or SI. A picture may contain different
slice types.
Slice
I (Intra)
P (Predicted)
B (Bidirectionally predicted)
(Reference for temporal prediction or non-reference)
SP (Switching P)
SI (Switching I)
I Slice
(MB in I slice and intra MB in P and B slices)
Spatial intra prediction
9 directional modes for (4x4) or (8x8) blocks.
Apply (4 x4) or (8x8) IntDCT to Intra prediction errors.
Note (8x8) IntDCT for FRExt-only.
After (8x8) IntDCT, HVS weighting is applied to coefficients
(FRExt-only).
Quantized transform coefficients are scanned (zigzag or
field) and then entropy coded (CAVLC or CABAC)
PICAFF: Field processing similar to frame mode
MBAFF: If MB pair in field mode (frame mode), field
(frame) neighbors are used for spatial prediction.
I Slice (Spatial Prediction)
(16x16) Luma & Corresponding chroma block size
for full MB prediction
(8x8) luma prediction (FRExt-only)
(4x4) Luma prediction
For (16x16) luma, full MB prediction has four modes
Vertical: pels in the MB are predicted from the pels just above the MB
Horizontal: pels in the MB are predicted from the pels just left of the MB
DC: pels in the MB are predicted as the average value of the
neighboring pels
Planar: assumes the MB covers diagonally increasing luma values;
the predictor is formed based upon the planar equation.
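Three of the four full-macroblock modes are simple enough to sketch directly. The code below is illustrative: the function names are hypothetical, the block size is a parameter so the example can be shown on a 4x4 block, both neighbouring rows are assumed to be available, the DC rounding is an assumption, and the planar mode is omitted because its equation is not given here.

```python
def predict_vertical(above, n):
    """Every row of the n x n block copies the n pels just above the block."""
    return [list(above) for _ in range(n)]


def predict_horizontal(left, n):
    """Every column of the n x n block copies the n pels just left of the block."""
    return [[left[row]] * n for row in range(n)]


def predict_dc(above, left, n):
    """Every pel is the rounded average of the 2n neighbouring pels."""
    dc = (sum(above) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


above = [1, 2, 3, 4]   # hypothetical reconstructed pels above the block
left = [5, 6, 7, 8]    # hypothetical reconstructed pels left of the block
print(predict_vertical(above, 4)[3])    # [1, 2, 3, 4]
print(predict_horizontal(left, 4)[2])   # [7, 7, 7, 7]
print(predict_dc(above, left, 4)[0])    # [5, 5, 5, 5]
```

An encoder would typically form each candidate prediction, compare the resulting residuals, and signal the chosen mode.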
Chroma spatial prediction (operates on entire MB)
4:2:0 (8x8) Similar to (16x16) Luma MB prediction
4:2:2 (8x16) Vertical, Horizontal, DC, Planar
4:4:4 (16x16)
For (8x8) luma intra prediction
Nine Intra_8x8 prediction modes similar to the nine
modes for Intra_4x4
FRExt Only
Integer 8x8 Transform (luma only)
FRExt Only
FRExt Only
HVS Weighting Matrices
Matrix can be transmitted in SPS and PPS
Separate Matrix for 4x4 and 8x8 transforms
Separate Matrix for Inter and Intra
Encoder can design and use customized scaling matrices.
These are to be sent to the decoder at the sequence or picture level.
Default matrices
HVS Weighting Matrices
Scaling matrix reflecting visual perception is simply a multiplier
applied during the inverse quantization. (This itself is a
multiplication)
Weighting matrices can be customized separately for
4x4 Intra Y
4x4 Intra Cb, Cr
4x4 Inter Y
4x4 Inter Cb, Cr
8x8 Intra Y
8x8 Inter Y
Two scans similar to 4x4 transform switched for frame/field coding
Coefficient scanning is based on the decreasing variances and to maximize
number of zero-valued coefficients along the scan
Frame Zig-Zag Field
FRExt Only
Examples of parameters to be encoded
Parameters | Description
Sequence, picture and slice-layer syntax elements | Headers and parameters
Macroblock type mb_type | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identify reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block
-110-
Exponential Golomb Codes (for data elements other than transform coefficients;
these codes are fixed, and are also called Universal Variable Length
Codes (UVLC))
-111-
These are variable length codes with a regular construction
[ M Zeros] [ 1 ] [ INFO ]
INFO is an M-bit field carrying information.
The first codeword has no leading zero or trailing INFO.
Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a
two-bit INFO field and so on.
The length of each Exp-Golomb codeword is (2M + 1) bits.
$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$
$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$
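The codeword construction above can be sketched as follows. The function name `exp_golomb_encode` and the use of a string of '0'/'1' characters to stand in for the bitstream are assumptions made for readability, not part of the standard.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)), computed exactly with integer arithmetic.
    m = (code_num + 1).bit_length() - 1
    # INFO = code_num + 1 - 2^M, written as an M-bit field.
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


for n in range(8):
    print(n, exp_golomb_encode(n))
```

The printed table shows the regular structure: code_num 0 maps to a single bit, code_nums 1 and 2 carry a one-bit INFO field, and 3 to 6 carry a two-bit INFO field.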
-112-
Decoding
1. Read in M leading zeros followed by 1
2. Read M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$
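The three decoding steps can be sketched as below. The function name `exp_golomb_decode`, the string-of-bits input, and the returned read position are assumptions chosen to keep the example self-contained.

```python
def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, next_pos), where next_pos is the index of the
    first bit after the codeword.
    """
    # Step 1: count M leading zeros, then consume the '1' marker.
    m = 0
    while pos < len(bits) and bits[pos] == "0":
        m += 1
        pos += 1
    if pos >= len(bits) or pos + 1 + m > len(bits):
        raise ValueError("truncated codeword")
    pos += 1
    # Step 2: read the M-bit INFO field (empty when M == 0).
    info = int(bits[pos:pos + m], 2) if m > 0 else 0
    pos += m
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, pos


# Three codewords packed back to back: "1", "010", "00111".
stream = "1" + "010" + "00111"
pos = 0
while pos < len(stream):
    value, pos = exp_golomb_decode(stream, pos)
    print(value)
```

Because the leading zeros announce the codeword length, consecutive codewords can be read from a stream without any separators.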
CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs
All other syntax elements are coded with the Exp-Golomb codes
-113-
ADOPTIONS
DVD Forum: High profile is mandatory for HD DVD players.
The BD-ROM Video specification of the Blu-ray Disc Association:
the FRExt extensions are mandatory.
The DVB (Digital Video Broadcasting) standards for European broadcast
television: for SD, Main profile is mandatory and High profile is optional;
for HD, High profile is mandatory.
ATSC has preliminarily selected High profile.
Several other environments may soon embrace it as well, in the U.S. and
in various designs for satellite and cable television.
-114-
For applications such as content-contribution,
content-distribution, and studio editing and post-
processing:
Use more than 8 bits per sample of source video accuracy
Use higher resolution for color representation than what is typical in
consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to 4:2:0
chroma sampling format)
Perform source editing functions such as alpha blending (a process for
blending multiple video scenes, best known for its use in weather reporting,
where it superimposes video of a newscaster over video of a
map or weather-radar scene)
-115-
Use very high bit rates
Use very high resolution
Achieve very high fidelity even representing some parts of the video
losslessly
Avoid color-space transformation rounding error
Use RGB color representation
-116-
High profile (HP), supporting 8-bit video with 4:2:0
sampling, addressing high-end consumer use and other
applications using high-resolution video without a need for
extended chroma formats or extended sample accuracy
High 10 profile (Hi10P), supporting 4:2:0 video with up to
10 bits of representation accuracy per sample
High 4:2:2 profile (H422P), supporting up to 4:2:2
chroma sampling and up to 10 bits per sample, and
High Profiles
-117-
High 4:4:4 profile (H444P), supporting up to 4:4:4
chroma sampling, up to 12 bits per sample, and
additionally supporting efficient lossless region coding
and an integer residual color transform for coding RGB
video while avoiding color-space transformation error
All of these profiles support all features of the Main
profile, and additionally support an adaptive transform
block size and perceptual quantization scaling matrices.
-118-
FRExt Only
MB structure in 4:2:2 and 4:4:4 formats
4:2:2 MB: Y is 16x16; Cb and Cr are each 8 wide by 16 high
4:4:4 MB: Y, Cb and Cr are each 16x16
-119-
RGB → YCbCr
$Y = K_R R + (1 - K_R - K_B) G + K_B B$
$K_R = 0.2126$; $K_B = 0.0722$; $K_R + K_B + K_G = 1$
$Y = 0.2126 R + 0.7152 G + 0.0722 B$
$C_b = \frac{B - Y}{2(1 - K_B)} = 0.5389 (B - Y)$ ; $C_r = \frac{R - Y}{2(1 - K_R)} = 0.6350 (R - Y)$
(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
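A minimal numeric sketch of the conversion defined on this slide, using the general form with K_R and K_B as parameters. The function name `rgb_to_ycbcr` and the assumption that R, G, B are normalised to the range 0 to 1 are illustrative choices, not taken from the slides.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalised R, G, B (0..1) to Y, Cb, Cr.

    Defaults are the K_R, K_B values quoted on the slide; pass
    kr=0.299, kb=0.114 for the BT.601 constants.
    """
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: chroma is (close to) zero
print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb reaches its maximum
print(rgb_to_ycbcr(1.0, 0.0, 0.0))   # pure red: Cr reaches its maximum
```

The divisors 2(1 - K_B) and 2(1 - K_R) scale the colour differences so that Cb and Cr each span -0.5 to +0.5. The results are real-valued, which is why storing them as integers introduces rounding error.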
-120-
Rounding error in RGB → YCbCr
FRExt Only : YCgCo
Cg = Green Chroma ; Co = Orange Chroma
To further avoid any rounding error, add only one bit of precision to
chroma samples
$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$
$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$
$C_o = \frac{R - B}{2}$
-121-
In 4:4:4 video, FRExt has residual color transform.
Keep RGB domain (same depth) for input, output and stored
reference pictures and use the forward and inverse color
transformations inside the encoder and decoder for processing of
the residual data only.
Eliminates color-space conversion error without significantly
increasing the overall complexity of the system.
-122-
Forward color space conversion
Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)
Inverse color space conversion
t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
Where t is an intermediate temporary variable and >> denotes
an arithmetic right shift operation.
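The forward and inverse conversions on this slide can be checked for exact reversibility with a short sketch. The function names `rgb_to_ycgco` and `ycgco_to_rgb` are illustrative; the sketch relies on Python's `>>` being an arithmetic (floor) shift on negative integers, which matches the definition given on the slide.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward conversion (integer, reversible)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion: undoes each forward step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round-trip check over a coarse grid of 8-bit RGB values.
step = 15
mismatches = 0
for r in range(0, 256, step):
    for g in range(0, 256, step):
        for b in range(0, 256, step):
            if ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) != (r, g, b):
                mismatches += 1
print("mismatches:", mismatches)
```

Each inverse step subtracts exactly the shifted quantity that the forward step added, so the bits discarded by `>> 1` never need to be recovered. This is why the transform introduces no rounding error even though it uses integers only.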
-123-
Auxiliary pictures, which are extra monochrome pictures sent
along with the main video stream, and can be used for such
purposes as alpha blend compositing (specified as a different
category of data than SEI).
Film grain characteristics SEI, which allow a model of film
grain statistics to be sent along with the video data, enabling
an analysis-synthesis style of video enhancement wherein a
synthesized film grain is generated as a post-process when
decoding, rather than burdening the encoder with the
representation of exact film grain during the encoding
process.
SEI : Supplemental Enhancement Information
-124-
Deblocking filter display preference SEI, which allows the
encoder to indicate cases in which the pictures prior to the
application of the deblocking filter process may be
perceptually superior to the filtered pictures.
Stereo video SEI indicators, which allow the encoder to
identify the use of the video on stereoscopic displays, with
proper identification of which pictures are intended for
viewing by each eye.
-125-
Higher profile supports all capabilities of the lower ones
Also capable of decoding all bit streams encoded for the lower nested
profiles
All high profiles support all features of the main profile
New Profiles in the H.264/AVC FRExt Amendment
-126-
Levels in H.264/AVC
Level 1b added in FRExt, for some 3G wireless environments
-127-
Levels in H.264/AVC
1. If a picture size is smaller than the typical picture size
then frame rate can be higher up to a maximum of 172
frames/sec
2. Horizontal and vertical maximum sizes cannot be more
than sqrt[(Total # of pixels/frame)x8]
3. If at a given level, picture size is less than that in the
table, # of reference frames for ME and MC can be up
to 16.
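Rule 2 above can be sketched as a simple check. The function names and the per-frame pixel budget used in the example (2,097,152 pixels) are illustrative assumptions; the actual per-level limits come from the levels table, which is not reproduced here.

```python
import math


def max_dimension(max_pixels_per_frame: int) -> int:
    """Largest allowed width or height: floor(sqrt(pixels_per_frame * 8))."""
    return math.isqrt(8 * max_pixels_per_frame)


def picture_fits(width: int, height: int, max_pixels_per_frame: int) -> bool:
    """True if the picture respects both the area and the per-side limits."""
    limit = max_dimension(max_pixels_per_frame)
    return (
        width * height <= max_pixels_per_frame
        and width <= limit
        and height <= limit
    )


budget = 2_097_152  # hypothetical per-frame pixel budget for some level
print(max_dimension(budget))
print(picture_fits(1920, 1088, budget))
print(picture_fits(8192, 16, budget))
```

The last call shows the purpose of the rule: a very wide, very short picture can stay within the pixel budget and still be rejected because one side exceeds the square-root bound.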
-128-
To meet more demanding high fidelity applications
Compressed Bit Rate Multipliers for FRExt Profiles
Multipliers for fourth column of table in page 125
-129-
24 Frames/sec film
1920x1080 progressive
The High profile of FRExt produced nominally better video quality
than MPEG-2 when using only one third as many bits (8 Mbps
versus 24 Mbps)
The High profile of FRExt produced nominally transparent (i.e.,
difficult to distinguish from the original video without
compression) video quality at only 16 Mbps.
[9] T. Wedi, Y. Kashiwagi, Subjective quality evaluation of H.264/AVC FRExt for HD movie content,
JVT document JVT-L033, July 2004.
-130-
Courtesy: Advanced Technology Group of Motorola BCS
-131-
Courtesy: Advanced Technology Group of Motorola BCS
-132-
Fig. 7: (a)-(e) Comparison of R-D curves for MPEG-2 (MP2),
MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were
inserted every 15 frames (N=15) and two non-reference B frames
per reference I or P frame were used (M=3).
Courtesy: Advanced Technology Group of Motorola BCS
MP4 ASP yields 1.5 coding gain over MPEG-2.
MPEG-4 AVC yields 2.0 coding gain over MPEG-2.
-133-
High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps
Nominally transparent video quality on 1080p24 at 16 Mbps
-134-
(Fast VDO)
Sub-optimal uses of B frames and other aspects make the plotted
performance conservative for FRExt, thus the remark in the figure about
potential future performance
-135-
High Profile Details:
Deblocking Filter, CABAC, Signaling
Deblocking Filter:
Only control of filter is adjusted: do not filter 4x4 blocks
No change to filter operation itself
CABAC:
61 new contexts and corresponding initialization values
No change to CABAC engine
Signaling:
8x8 transform on/off flag at PPS level
8x8 transform on/off flag per macroblock allows adaptive use
-136-
High vs. Main Profile Summary
High Profile contains:
Main profile
Adaptive MB level switching between 8x8 and 4x4 transform block sizes.
Encoder specified perceptual based quantization scaling matrices
Encoder specified separate control of each chroma component QP
Coding efficiency impact (measured as average bit-rate reduction):
HD Film: 12%
HD Video (progressive): 12%
HD Video (interlace): 4% (only 2 test clips)
SD Video (interlace): 6%
Complexity impact:
Implementation beyond Main Profile affects Intra prediction,
transform, deblocking filter control, CABAC decoding
No increase in computational requirements
Slight increase in memory requirements (CABAC, transform)
-137-
Licensing of H.264/AVC Technology
Two patent pools to obtain the license
1. MPEG LA (www.mpegla.com)
2. Via Licensing (www.vialicensing.com)
These two patent pools do not guarantee that they
cover the entire technology of H.264 as participation of
a patent owner in a patent pool is voluntary.
-138-
AUDIO coding & systems
H.264 is limited to video
Audio coder: Bit rates, Quality levels and # of channels
left to industry and standards groups (ATSC, SCTE,
ARIB, DVB etc.)
DVB is considering AAC with SBR (AAC plus)
ATSC has selected AC-3 plus from Dolby
MPEG calls AAC with SBR "HE-AAC" (HE: High Efficiency)
ATSC, SCTE, ARIB, MPEG etc. will continue to use
MPEG-1 Audio, MPEG-2, AAC and AC-3.