
Research on Fast Block Partition Mode Selection Algorithm in H.264
Jianjun Guo, Kui Dai, Yun Cheng and Zhiying Wang
School of Computer Science, National University of Defense Technology, Changsha Hunan
410073, China
jjguo@tom.com

Abstract
Variable block size motion compensation using
multiple reference frames is one of the key
technologies that provide a notable performance gain in
H.264. However, it is also the main bottleneck that
increases the overall computational complexity. For
this reason, based on some test results, this paper
proposes a new biased fast mode decision method for
the H.264 video coding standard. Experimental results
indicate that this method can speed up the H.264
encoder efficiently without noticeable loss in quality.

1. Introduction
Video compression is a necessity for the efficient
storage and transmission of digital video signals and
thus has numerous applications in critical digital
technologies such as High Definition Television
(HDTV), teleconferencing and CD-ROM video data
storage. In particular, motion compensated predictive
coding has received considerable attention because it
removes the high temporal redundancy between
successive frames. It has therefore been adopted by
many video coding standards, such as H.261, H.262 and
H.263 from ITU-T (the Telecommunication
Standardization Sector of the International
Telecommunication Union) and MPEG-1, MPEG-2 and
MPEG-4 from ISO (the International Organization for
Standardization). However, all these standards are
constrained by limited network bandwidth. A more
efficient video coding standard, H.264 [1], has emerged
to address this problem.
The new JVT standard, H.264 or MPEG-4 AVC, offers
much higher objective and subjective quality than
previous coding standards. According to some tests, it
saves about 38% of the bitrate relative to MPEG-4 and
about 48% relative to H.263. The basic encoding
algorithm of H.264 is very similar to H.26x or MPEG-x,
except that an integer 4x4 discrete cosine transform
(DCT) is used instead of the traditional 8x8 DCT.
Additional features include intra prediction modes for
I-frames, multiple block sizes for motion estimation,
multiple reference picture selection for higher coding
efficiency, and motion vectors with one-fourth or
one-eighth fractional pixel accuracy [2].
The complexity of the H.264 encoder is greatly
increased, and the motion estimation (ME) module
contributes the largest share. In H.264, the encoder
supports ME with seven different block sizes, using
tree-structured hierarchical macroblock partitions as
shown in Figure 1. The processing time increases
linearly with the number of block types used, because
motion estimation needs to be performed for each block
type. The full search process (the examination of all
seven block modes) provides the best coding result, but
the increase in computation is very high.

Figure 1. Different modes for partitioning a macroblock in H.264

In this paper, we propose a fast multi-block
selection scheme that can efficiently reduce the
computational cost with little degradation of PSNR.
It is based on the proportion of each mode observed in
our tests. Instead of searching through all the possible
block types, the proposed scheme stops the search
process when certain criteria are met and uses a search
order biased toward the modes that account for a large
proportion. This is very useful for real-time
applications.
The rest of this paper is organized as follows. Some
observations on multi-block motion estimation are
presented in Section 2. Section 3 describes the proposed
fast multi-block selection algorithm, and experimental
results are presented in Section 4. A conclusion is
given in Section 5.

Proceedings of the Fourth Annual ACIS International Conference on Computer and Information Science (ICIS05)
0-7695-2296-3/05 $20.00 2005 IEEE

2. Observations on Multi-block Motion Estimation
Motion estimation is an important technology in
video compression. In motion estimation, the current
image is divided into macroblocks (MBs), and for each
MB a similar one is chosen in a reference frame by
minimizing a distortion measure. The best match found
represents the predicted MB, while the displacement
from the original MB to the best match gives the
so-called motion vector (MV). Only the MV and the
residual (i.e. the difference between the original MB
and the predicted MB) need to be encoded and
transmitted in the final stream.
Motion estimation/compensation (ME/MC) with multiple
block sizes is adopted in H.264 to further reduce the
error between the original image and the predicted
image by increasing the accuracy of prediction. An MB
may contain more than one object, and the objects may
not move in the same direction. Therefore a single
motion vector may not be enough to describe the motion
of all objects in one MB. With only one motion vector,
only part of the MB will be well described, and the
resulting residual energy can still be large due to the
mismatch in the remaining part of the MB. If
multi-block motion estimation is allowed, the MB is
segmented into smaller zones. Each of them gets a
motion vector pointing to the best matched zone in the
preceding pictures. In H.264, seven block sizes with
different shapes are supported. The best match is
found by minimizing:

$$J(m, \lambda_{MOTION}) = \mathrm{SA(T)D}(s, c(m)) + \lambda_{MOTION} \cdot R(m - p)$$

Here s is the original video signal, c is the coded
video signal, m is the motion vector, p is the
prediction for the motion vector, and $\lambda_{MOTION}$
is the Lagrange multiplier. The rate term R(m-p)
represents the motion information only and is
computed by a table lookup. The distortion measure
SAD (the Sum of Absolute Differences) is defined as
follows:
$$\mathrm{SAD}(s, c(m)) = \sum_{x=1,\,y=1}^{B,\,B} \left| s[x, y] - c[x - m_x,\, y - m_y] \right|$$
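
As an illustration, the cost J above can be sketched in Python. This is a minimal sketch under stated assumptions, not the JM implementation: frames are plain lists of rows indexed [y][x], the displaced block stays inside the reference frame, and the rate R(m - p) is approximated by the length of the signed Exp-Golomb code of each motion vector difference component instead of the encoder's lookup table. The names sad, se_golomb_bits and motion_cost are hypothetical.

```python
def sad(cur, ref, bx, by, mv, B):
    """SAD between the BxB block of `cur` at (bx, by) and the block of
    `ref` displaced by the motion vector mv = (mx, my)."""
    mx, my = mv
    total = 0
    for y in range(by, by + B):
        for x in range(bx, bx + B):
            # c[x - mx, y - my] in the paper's notation
            total += abs(cur[y][x] - ref[y - my][x - mx])
    return total


def se_golomb_bits(v):
    """Bit length of the signed Exp-Golomb code of integer v
    (assumed stand-in for the rate table)."""
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1


def motion_cost(cur, ref, bx, by, mv, pred, lam, B):
    """J(m, lambda) = SAD(s, c(m)) + lambda * R(m - p)."""
    rate = se_golomb_bits(mv[0] - pred[0]) + se_golomb_bits(mv[1] - pred[1])
    return sad(cur, ref, bx, by, mv, B) + lam * rate


# Toy data: `cur` is `ref` shifted by (2, 1), so mv = (2, 1) matches exactly.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(8)]
       for y in range(8)]

# Pick the cheaper of two candidate vectors for the 4x4 block at (2, 2).
best = min([(0, 0), (2, 1)],
           key=lambda mv: motion_cost(cur, ref, 2, 2, mv, (0, 0), 4, 4))
```

In this toy case the exact match wins even though its motion vector costs more bits, because the distortion term dominates; a larger multiplier would shift the balance toward vectors close to the prediction.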

Multi-block motion estimation (ME) is an important
part of H.264, but unfortunately it is also the most
computationally intensive function of the entire
encoding process. Including the SKIP mode and the two
intra modes, up to 10 modes have to be searched, which
makes it unsuitable for real-time coding. Many efforts
have been made to speed up the mode decision process,
such as the additional MV information used in [3] and
the adaptive partitioning used in [4]. But do all the
modes need to be searched? To answer this, we tested
several video sequences. The results are listed in
Table 1. From Table 1, we find that only the modes of
size 8x8 and larger take up a large proportion of the
final coding modes.
Table 1. Proportion (%) of each mode for several sequences

Mode          carphone.qcif   foreman.qcif   akiyo.qcif
SKIP              22.47           15.31         59.60
16x16             26.65           26.63         10.90
16x8              11.52            9.25          3.28
8x16              11.81           21.64          4.21
8x8               26.20           27.06         21.95
intra16x16         0.33            0.05          0.04
intra4x4           1.01            0.05          0.02
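
The figures in Table 1 can be summarized with a few lines of Python. The sketch below copies the values from Table 1 and ranks the modes by their average proportion over the three sequences. The variable names are hypothetical, and the unweighted average is an assumption made only for illustration.

```python
MODES = ["SKIP", "16x16", "16x8", "8x16", "8x8", "intra16x16", "intra4x4"]

# Proportion (%) of each mode, copied from Table 1.
TABLE1 = {
    "carphone.qcif": [22.47, 26.65, 11.52, 11.81, 26.20, 0.33, 1.01],
    "foreman.qcif": [15.31, 26.63, 9.25, 21.64, 27.06, 0.05, 0.05],
    "akiyo.qcif": [59.60, 10.90, 3.28, 4.21, 21.95, 0.04, 0.02],
}

# Unweighted average share of each mode across the sequences.
average = {
    mode: sum(shares[i] for shares in TABLE1.values()) / len(TABLE1)
    for i, mode in enumerate(MODES)
}
ranking = sorted(MODES, key=lambda mode: average[mode], reverse=True)

# Combined share of SKIP, 16x16 and 8x8 in each sequence.
top3_share = {
    seq: sum(shares[MODES.index(m)] for m in ("SKIP", "16x16", "8x8"))
    for seq, shares in TABLE1.items()
}

for mode in ranking:
    print(f"{mode:<11}{average[mode]:6.2f}")
```

With these numbers, SKIP, 8x8 and 16x16 rank first on average, and together they account for roughly 69% to 92% of the selected modes in each sequence.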

3. Proposed Algorithm
Inspired by the common principle "Make the Common
Case Fast" [5], we use a biased search order to speed
up mode selection and reduce the total coding time.
Unlike the process in [3], this needs no additional MV
information. In [4], a threshold is used to terminate
the search process early. We also use a threshold
during the search process to terminate it as quickly as
possible.
Our search algorithm is described as follows:
Step 1: Search SKIP mode and 16x16 mode. If
cost(SKIP) < cost(16x16), use SKIP mode to code and
stop; otherwise go to Step 2.
Step 2: Search 8x8 mode. If cost(16x16) < cost(8x8)
and cost(8x8) - cost(16x16) > TH1, use 16x16 mode to
code and stop searching. If cost(16x16) < cost(8x8) but
the difference does not exceed TH1, search 16x8 mode
and 8x16 mode, select the mode with the least cost to
code, and stop searching. Otherwise go to Step 3.
Step 3: If cost(16x16) > cost(8x8), search 4x4 mode.
If cost(8x8) < cost(4x4), use 8x8 mode to code and
stop searching. If cost(8x8) > cost(4x4), search 8x4
mode and 4x8 mode and select the mode with the least
cost to code.
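
The three steps can be sketched as a small Python function. This is one reading of the steps under stated assumptions, not the authors' code: `cost` is a hypothetical callable that runs motion estimation for a mode name and returns its cost, the 16x8/8x16 branch of Step 2 is taken only when 16x16 is cheaper than 8x8 by no more than TH1, and ties (which the steps leave open) go to Step 3 in the first comparison and to 8x8 in the second.

```python
def select_mode(cost, th1):
    """Return (chosen_mode, searched_modes) following Steps 1 to 3.

    `cost` is a hypothetical callable mapping a mode name to its cost;
    `th1` is the threshold TH1.
    """
    costs = {}

    def search(mode):
        costs[mode] = cost(mode)
        return costs[mode]

    def cheapest(modes):
        return min(modes, key=lambda m: costs[m])

    # Step 1: SKIP against 16x16.
    if search("SKIP") < search("16x16"):
        return "SKIP", list(costs)

    # Step 2: 8x8 against 16x16.
    search("8x8")
    if costs["16x16"] < costs["8x8"]:
        if costs["8x8"] - costs["16x16"] > th1:
            return "16x16", list(costs)  # early termination
        search("16x8")
        search("8x16")
        # Assumption: the minimum is taken over the inter modes tried so far.
        return cheapest(["16x16", "16x8", "8x16", "8x8"]), list(costs)

    # Step 3: 8x8 is at least as good as 16x16 (tie handling is assumed).
    if costs["8x8"] <= search("4x4"):
        return "8x8", list(costs)
    search("8x4")
    search("4x8")
    return cheapest(["4x4", "8x4", "4x8"]), list(costs)


# Made-up costs: 16x16 beats 8x8 by less than TH1, so 16x8 and 8x16 are tried.
example = {"SKIP": 200, "16x16": 100, "8x8": 110, "16x8": 95, "8x16": 105}
mode, searched = select_mode(example.__getitem__, th1=25)
```

The returned list of searched modes makes the saving visible: in the early-termination cases only two or three of the modes are evaluated.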
In the above process, the threshold TH1 is not
constant, but it cannot change arbitrarily. It should
be related to the current cost and adjust itself to
different sequences. Experimental results indicate that
1/4 or 1/5 of the current cost works best.
Additionally, we use the DS (diamond search) [6]
strategy. The DS method searches for the best position
in two steps: a coarse-grained search and a
fine-grained search. We use a modified distortion
measure, the Half Sum of Absolute Differences (HSAD),
during the coarse-grained search. HSAD is defined as
follows:


$$\mathrm{HSAD}(s, c(m)) = \sum_{x,\,y} \left| s[x, y] - c[x - m_x,\, y - m_y] \right|, \quad x = 0, 1, \ldots, B-1, \quad y = 0, 2, \ldots, B-2$$
Because of the internal features of video sequences,
the HSAD is about half of the SAD, with only a small
difference. We can use a coefficient $c_0$ to recover
the SAD from the HSAD:
$$\mathrm{SAD} = c_0 \times 2 \times \mathrm{HSAD}$$
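
To make the relation between the two measures concrete, here is a small Python sketch. It assumes the candidate block has already been fetched at the displaced position, so both arguments are plain BxB lists of rows. The function names and the default coefficient of 1.0 are hypothetical, since the text does not give a value for the coefficient.

```python
def sad_block(s, c):
    """Sum of absolute differences over every row of two BxB blocks."""
    return sum(abs(a - b)
               for row_s, row_c in zip(s, c)
               for a, b in zip(row_s, row_c))


def hsad_block(s, c):
    """Half SAD: the same sum restricted to rows y = 0, 2, ..., B - 2."""
    return sum(abs(a - b)
               for row_s, row_c in zip(s[::2], c[::2])
               for a, b in zip(row_s, row_c))


def estimate_sad(hsad, c0=1.0):
    """SAD estimated from HSAD as c0 * 2 * HSAD (c0 is an assumed value)."""
    return c0 * 2 * hsad


# Toy 4x4 blocks whose per-pixel differences are small and similar per row.
s = [[10 * y + x for x in range(4)] for y in range(4)]
d = [[1, 2, 1, 0], [2, 1, 1, 1], [0, 3, 1, 1], [1, 1, 1, 0]]
c = [[s[y][x] - d[y][x] for x in range(4)] for y in range(4)]

print(sad_block(s, c), hsad_block(s, c), estimate_sad(hsad_block(s, c)))
```

On this toy block the exact SAD is 17 and the estimate from the even rows is 18, which illustrates the "about half, with a small difference" behavior; real blocks with strong vertical detail can deviate more.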
Suppose the DS method searches n1 times during the
coarse-grained step and 4 times during the fine-grained
step. For a 4x4 block, DS needs to compute
(n1 × 16 + 4 × 16) points. Using HSAD, it needs
(n2 × 8 + 4 × 16) points, where n2 is close to n1. If
n1 and n2 are very large, we can save about half of the
computation spent on the distortion measure.
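
The arithmetic above is easy to check numerically. The following Python sketch counts the pixel differences evaluated for one block, assuming the same number n of coarse steps with and without HSAD and 4 fine steps as in the text. The function name and its parameters are hypothetical.

```python
def distortion_points(n_coarse, block_pixels=16, n_fine=4, use_hsad=False):
    """Pixel differences evaluated by DS for one block.

    With HSAD only half of the rows are visited in the coarse step;
    the fine step always uses the full SAD.
    """
    coarse_pixels = block_pixels // 2 if use_hsad else block_pixels
    return n_coarse * coarse_pixels + n_fine * block_pixels


for n in (4, 20, 1000):
    full = distortion_points(n)
    half = distortion_points(n, use_hsad=True)
    print(n, full, half, round(half / full, 3))
```

For a 4x4 block the ratio is 0.75 at n = 4, about 0.583 at n = 20 and about 0.502 at n = 1000, so the saving approaches one half only when the coarse step dominates.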
In the proposed mode search method, we favor the
modes that account for the larger proportion and bias
the search order toward them. Most of the time, the
search terminates at a large mode and does not need to
examine the additional modes. Because the mode decision
stage requires an MV search to obtain the MV and the
cost of each mode, searching fewer modes means
searching fewer MVs, which saves considerable time.

4. Experimental Results
The proposed algorithm is implemented in the JVT
reference software version 8.2 [7]. We have tested our
proposed method on a series of test sequences. Three
typical sequences with low, medium and high motion are
selected to show the results. The sequences are encoded
at 30 fps with QP ranging from 10 to 42. The PSNR (Peak
Signal-to-Noise Ratio) is used to evaluate the
degradation of image quality. A PSNR comparison between
our method and JM8.2 using the full search is shown in
Figure 2, and the speedup is shown in Figure 3. Figure 3
shows that the average speedup for all the typical
sequences is about 5 times and varies little between
sequences.

5. Conclusion
The new video coding standard H.264 is very
efficient in addressing the limited bandwidth problem
and improves image quality, and thus has many potential
application fields. However, its computational load is
a bottleneck for real-time applications. In this paper,
we proposed an efficient mode search method to optimize
the H.264 encoder. According to our experimental
results, the proposed method provides a significant
speedup of about 5 times compared with JM8.2, with
little degradation in PSNR.

Figure 2. The luma PSNR comparison for the sequence carphone.qcif

Figure 3. Speedup of the selected sequences as a function of QP (the
Quantization Parameter). The run time of JM8.2 is taken as 1 unit.

References
[1] H.264, Draft ITU-T Recommendation and Final Draft
International Standard, Pattaya, Thailand, 2003.
[2] T. Wiegand, G. J. Sullivan, G. Bjøntegaard and A. Luthra,
"Overview of the H.264/AVC Video Coding Standard,"
IEEE Transactions on Circuits and Systems for Video
Technology, vol. 13, no. 7, July 2003.
[3] Andy Chang, P. H. W. Wong, Y. M. Yeung and Oscar C.
Au, "Fast Multi-block Selection for H.264 Video
Coding," Proc. of 2004 IEEE Int. Sym. on Circuits &
Systems (ISCAS), Vancouver, 2004.
[4] Mani VT, Anup Shah and G Chandrashekar Reddy, "A
fast block motion estimation algorithm based on motion
adaptive partitioning," Tata Elxsi Ltd., Bangalore.
[5] John L. Hennessy and David A. Patterson, Computer
Architecture: A Quantitative Approach, Third Edition,
Elsevier Science Pte Ltd.
[6] S. Zhu and K. K. Ma, "A new diamond search algorithm
for fast block-matching motion estimation," IEEE
Transactions on Image Processing, vol. 9, no. 2,
February 2000.
[7] Joint Video Team (JVT), Test Model JM8.2,
http://bs.hhi.de/~suehring/tml/download/
