Multiresolution Motion Estimation and Compensation For Video Coding

ICSP2010 Proceedings
Najib BEN AOUN, Maher EL’ARBI, Chokri BEN AMAR
REGIM: REsearch Group on Intelligent Machines
National School of Engineers (ENIS)
BP 1173, Sfax, 3038, Tunisia
{Najib.benaoun, Maher.elarbi, Chokri.benamar}@ieee.org
Manuscript received June 14, 2010; revised August 10, 2010. Corresponding author: N. BEN AOUN (najib.benaoun@ieee.org).
978-1-4244-5899-8/10/$26.00 ©2010 IEEE

Abstract: The quantity of data has grown considerably in recent years, especially with the emergence of video applications over networks, such as videophony and videoconferencing, and of multimedia devices such as high-definition TV and personal digital assistants. It has therefore become crucial to reduce the quantity of data stored or transmitted by compressing it spatially and temporally. Motion estimation and compensation are employed in video coding systems to remove temporal redundancy while keeping a high visual quality. They are the most important parts of the video coding process since they require the most computational power and consume the most resources and bandwidth. Many techniques have therefore been developed to estimate motion between successive frames. In this paper, we present our motion estimation and compensation method, which is applied to the discrete wavelet transform coefficients and based on the block matching algorithm, the simplest, most efficient and most popular technique. Additional techniques are introduced to accelerate the estimation process and improve the prediction quality.

Keywords: Discrete Wavelet Transform; motion estimation; multiresolution domain; video coding.

I. INTRODUCTION

The development of new applications and the spread of the Internet all over the world have produced a large quantity of data that must be stored and transmitted. The increasing capacity of storage media has resolved the storage problem, but the transmission problem remains, especially over channels with limited bandwidth.

The need for efficient signal encoding has therefore made signal compression central to digital communications. Motion estimation and compensation (ME/MC) reduce the quantity of data by eliminating the temporal redundancy between adjacent frames in an image sequence. Motion estimation predicts the motion between two successive frames and produces the motion vectors (MVs), which are the displacements between these two frames. So, instead of transmitting two frames, we send only the reference frame, the motion vectors and the residue, which is the difference between the current frame and the frame reconstructed by motion compensation. The combination of motion estimation and motion compensation is a key part of video coding.

There are many methods to achieve ME/MC. They can be divided into two families: indirect methods (applied to features), such as the statistical and the differential methods, and direct methods (applied to pixels), such as optical flow and the block based methods. The block matching algorithm (BMA) [1] is an effective and popular technique for block based motion estimation. It has been widely adopted in various video coding standards and is highly desirable since it maintains an acceptable prediction error.

Conventional video coding systems apply ME in the spatial domain, directly on the frame without transformation, and follow it with a discrete cosine transform (DCT). However, given the promising performance of multiresolution analysis, especially the discrete wavelet transform (DWT), which provides a multiresolution representation of the signal with localization in both space and frequency, many methods have been developed to construct wavelet based video coding systems [2], and the DWT has been integrated in coding standards such as JPEG2000 and MPEG-4.

For this reason, we have developed a block based ME/MC method in the wavelet domain. Our method exploits the benefits of the DWT and the hierarchical (quadtree) relationship between its subbands to drive ME/MC on wavelet coefficients, especially in the low frequency subband, which holds the most significant visual information. The method is complemented by several techniques that improve the results. With it, we have achieved good results in terms of prediction quality, compression performance and computational complexity.

This paper is organized as follows. Sec. II presents the motion estimation principle and techniques, focusing on the wavelet domain. Sec. III describes the proposed method. Sec. IV introduces supplementary techniques used to improve our method. Sec. V evaluates our method against some conventional methods and shows that it outperforms them on several criteria. Finally, Sec. VI summarizes the key findings and suggests future research possibilities.

II. MOTION ESTIMATION AND COMPENSATION

ME/MC are the fundamental parts of video coding systems and form the core of many video processing applications. Motion estimation eliminates temporal redundancy from video by exploiting the temporal correlation between successive frames, which reduces the amount of data to be transmitted or
stored while maintaining sufficient data quality. ME extracts temporal motion information from video sequences, while motion compensation uses this motion information for efficient interframe coding.

Motion estimation predicts the motion between two successive frames to generate motion vectors (MVs) which represent the change between them. Consequently, these motion vectors and the prediction error are transmitted instead of the frame itself. With this process, the decoder has sufficient information to faithfully reproduce the frame sequence.

Block-based motion estimation is the most widely used method because of its simplicity and performance, which have made it the standard approach in video coding systems. The BMA divides the frames into blocks of N×N pixels, matches every block of the current frame (CF) with its most similar block inside a search window in the reference frame (RF), and generates the motion vector. The most important parameters of this method are therefore the block size N and the search window size P. Block matching is based on minimizing a criterion such as the Mean Absolute Difference (MAD) or the Mean Square Error (MSE), which is the most common block distortion measure for matching two blocks and provides more accurate block matching. The MV applies to every pixel of the same block, which reduces the computational requirement.

To identify the best corresponding block, the simplest way is to evaluate every candidate block in the reference frame (exhaustive search, ES). Although this method generally finds the appropriate block, it is computationally expensive. Hence, other fast search strategies [3] have been developed in which the search follows a particular order: the Three Step Search (TSS), the Simple and Efficient Search (SES), the Four Step Search (4SS), the Adaptive Rood Pattern Search (ARPS) and the Diamond Search (DS), which has proved to be the best search strategy, coming close to the ES results. The DS has in turn been improved in many variants such as the Cross DS (CDS), the Small CDS (SCDS) and the New CDS (NCDS).

In conventional coding systems such as H.261 and MPEG-1/2, BMA is conducted directly on the frame, which needs a large computing power. That is why many studies have been carried out, showing that it is better to transform the frame before executing the ME techniques. With the development of new video coding standards, wavelets have received considerable interest since they have shown good and effective results. The main idea behind wavelets is to generate a space-frequency representation focusing only on the spatial frequencies that are most significant to the human eye. The wavelet decomposition is a reversible procedure performed by successive approximations of the initial information (original frame). This process improves the coding efficiency since the wavelet coefficients are highly correlated, and this representation reduces the blocking effects, especially at the edges.

Exploiting the hierarchical relationship between the wavelet coefficients of the different subbands at different levels, several hierarchical ME methods adapted to the wavelet transform have been developed. The hierarchical relationship means that every wavelet coefficient has four descendants in the next lower level of the DWT. There are two main categories of DWT based ME schemes: forward and backward schemes.

The forward approach conducts the ME in the DWT detail subbands of the low level and uses it to determine the motion in the higher level subbands (fine-to-coarse). F. G. Meyer, A. Averbuch and R. R. Coifman [4] followed the forward scheme to propose a ME method with a new pyramid structure. More recently, researchers [5], [6] have developed a backward scheme (coarse-to-fine) in which the motion is estimated at the coarsest DWT resolution (highest level) and then progressively refined by incorporating the finer levels. This scheme has proved superior to the forward scheme.

The effectiveness of the BMA and the suitability of the DWT for video coding have led us to develop a block based motion estimation method in the wavelet domain.

III. OUR PROPOSED METHOD

The proposed method makes use of the wavelet properties to apply motion estimation directly to the wavelet coefficients. By adopting a coarse-to-fine motion estimation strategy, which starts from the approximation subband, we obtain a better estimation since the approximation contains the most visual information. The motion vectors of the approximation are calculated directly, and the motion vectors of the detail subbands are then deduced using the hierarchical relationship that exists between the DWT subbands, as shown in Fig. 1. Working with a 3-level DWT, we compute the motion vectors of the detail subbands with the following formula:

$$V_{i,j}(x, y) = 2^{3-i}\, V_{3,1}(x, y) + \delta_{i,j} \qquad (1)$$

where i ∈ {1, 2, 3}, j ∈ {1, 2, 3, 4}, V_{i,j}(x, y) is the motion vector for subband j at level i, and δ_{i,j} is the refinement factor (equal to 0 if i = 3).

Moreover, by predicting the motion only in the approximation, which is small compared to the original frame, the computation requirement is greatly reduced and the compression ratio increases, while our method maintains a good prediction quality.

Figure 1. DWT subbands motion vectors representation
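As an illustration of Eq. (1), the following Python sketch propagates a motion vector estimated in the level-3 approximation to the subbands of another level. It is only a sketch under stated assumptions: the function name `detail_motion_vector`, the tuple representation of vectors and the `refinement` argument (standing for the refinement factor) are hypothetical and are not taken from the authors' implementation.

```python
def detail_motion_vector(v_approx, level, refinement=(0, 0), levels=3):
    """Apply Eq. (1): scale the approximation MV to the given DWT level.

    v_approx   -- (dx, dy) estimated in the approximation at the coarsest level
    level      -- target level i, with 1 <= i <= levels
    refinement -- hypothetical per-subband correction (the refinement factor)
    levels     -- depth of the DWT (3 in the paper)
    """
    if not 1 <= level <= levels:
        raise ValueError("level must be between 1 and levels")
    # The refinement factor is defined as 0 at the coarsest level.
    if level == levels:
        refinement = (0, 0)
    # Each finer level doubles the resolution, hence the factor 2^(levels - i).
    scale = 2 ** (levels - level)
    return (scale * v_approx[0] + refinement[0],
            scale * v_approx[1] + refinement[1])


if __name__ == "__main__":
    for i in (3, 2, 1):
        print(i, detail_motion_vector((1, -2), i))
```

With these assumptions, a vector (1, -2) found in the approximation becomes (2, -4) at level 2 and (4, -8) at level 1 before any refinement is added.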
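The block matching step on which the method relies (Sec. II) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it uses the exhaustive search for brevity rather than the diamond search adopted in the experiments, frames are assumed to be lists of rows of numbers, and the names `block_mse` and `full_search`, as well as the convention that the vector points from the block in the current frame to its match in the reference frame, are assumptions made for this example.

```python
def block_mse(cur, ref, bx, by, dx, dy, n):
    """MSE between the n x n block of `cur` at (bx, by) and the block
    of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            diff = cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
            total += diff * diff
    return total / (n * n)


def full_search(cur, ref, bx, by, n, p):
    """Exhaustive search: test every displacement within +/- p pixels
    and return the one with the smallest MSE, together with that MSE."""
    height, width = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_err = block_mse(cur, ref, bx, by, 0, 0, n)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidates that fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            err = block_mse(cur, ref, bx, by, dx, dy, n)
            if err < best_err:
                best_mv, best_err = (dx, dy), err
    return best_mv, best_err


if __name__ == "__main__":
    # Reference frame: an 8 x 8 ramp; the current block copies the
    # reference block located 2 pixels right and 1 pixel down.
    ref = [[y * 8 + x for x in range(8)] for y in range(8)]
    cur = [[0] * 8 for _ in range(8)]
    for y in range(2):
        for x in range(2):
            cur[2 + y][2 + x] = ref[3 + y][4 + x]
    print(full_search(cur, ref, bx=2, by=2, n=2, p=2))
```

The same routine could be run on the approximation subband instead of the spatial frame; the fast strategies listed in Sec. II differ only in which displacements are visited.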
The BMA is an efficient method for motion estimation, which encourages us to use it in our multiresolution based method. Unfortunately, despite their encouraging properties and promising results, the BMA and the DWT suffer from some problems. Several improvement techniques have therefore been implemented to overcome these problems and make our method more robust.

IV. IMPROVEMENT TECHNIQUES

The proposed method outperforms the conventional motion estimation methods but still has some problems, which led us to develop additional techniques to overcome them. These techniques are: detecting the motion zone to limit the estimation to it; adding sub-pixel precision to the motion vector computation; shifting the frame to better predict the motion; overlapping the frame blocks to correct each motion vector using its neighboring vectors; and, finally, refining the prediction by changing the block size and re-predicting the blocks that are poorly predicted. In this section we describe these techniques as well as the reasons that led us to implement them.

A. Temporal Segmentation

To accelerate the ME process, we reduce the motion estimation area with a background subtraction technique [7] that detects the zone containing the movements. This detection is based on a temporal segmentation which allows us to predict motion only in a limited area. The technique reduces the computations of the ME process since it assumes that the motion vectors of the blocks outside the detected area are null. The gain increases when the movement is concentrated in a very limited area.

B. Sub-pixel precision

Block based motion estimation assumes that every block has an integer displacement, which is not true in reality. Therefore, to improve the motion estimation and increase the accuracy of the prediction, we have moved to sub-pixel precision by developing a sub-pixel technique with a bilinear interpolation process. A line is interposed between every two lines and a column between every two columns of the image, and ME is then applied to the new image. With this technique, a motion vector can point to a half-pixel or quarter-pixel position, or an even finer one. A block whose real location lies at a fraction of a pixel is then better predicted. Sub-pixel accuracy not only increases the accuracy of the motion vectors (raising the PSNR of the reconstructed image by more than 2 dB) and reduces errors, but also filters the image, eliminating noise and rapid changes. It is true that this technique doubles the image size, but it also allows a quick search by minimizing the path to the corresponding block. For all these reasons, the sub-pixel technique is becoming crucial in block based ME methods.

C. Shifting technique

The DWT has many of the advantages of the multiresolution domain, which has made this space-frequency transform very useful for ME. However, the shift-variant property of the DWT, caused by the decimation process, makes ME/MC less efficient in the wavelet domain. In other words, there is a big difference between the DWT of an image and the DWT of the same image shifted by one pixel. This property is most visible at the edges of the image and is less important in the low-pass frequencies, which reinforces our choice to conduct ME in the approximation of the DWT.

To overcome the shift-variant property of the DWT, a shifting technique is used to increase the prediction quality [6]. Before applying ME, we shift the frame in the spatial domain by one pixel in all directions. The shifted frames are then transformed to the wavelet domain to obtain a more precise and more realistic motion estimation. After calculating a motion vector for the block in every direction, we generate the final motion vector as the mean of all calculated vectors. This technique improves the estimation results by smoothing the predicted vectors and reducing the aliasing effect.

D. Overlapping block matching technique

A supplementary technique for improving the motion estimation is to overlap the neighboring blocks in order to smooth the motion vectors and obtain a more realistic prediction. Each motion vector becomes the average of itself and its direct neighboring motion vectors.

This overlapped block matching technique overcomes false predictions, especially the discontinuities at the block edges, which produce high frequencies in the estimated image, because it effectively averages the possible candidates for each pixel. Hence, this technique makes the visual quality clearer and sharper.

E. Refinement techniques

The basic idea of the BMA is to divide the frame into blocks of a fixed size N×N, which means that all the pixels of the same block have the same displacement. This is not true in most cases, since there may be different movements within the same block. We have therefore divided the poorly predicted blocks and re-estimated the motion on them. This adapts the block sizes to the movements and results in variable block sizes [8]. This technique is very powerful since it corrects the motion vectors by a hierarchical procedure based on modifying the block sizes. It provides a good estimation and tries to minimize the error by taking into account the intra-block movements.

Another refinement technique is also carried out in our method, which is to move the estimation to a lower level (larger resolution) of the DWT. This process is not performed for all blocks; it runs only on poorly predicted blocks. The refinement re-estimates the motion of the blocks whose error is greater than a certain threshold. This technique has given a more accurate prediction. We have
shown that the second refinement technique gives better results, which has encouraged us to use it in our method.

All these techniques combine to make our method fast, efficient and accurate. In addition, we can even exploit the human visual system and remove the small variations between the two frames that are not perceived by the human eye. The motion vectors and the prediction error are encoded, after being transformed by the DWT, using the Embedded Zerotree Wavelet (EZW) algorithm, which exploits the wavelet structure for efficient coding.

V. EXPERIMENTAL RESULTS

In our block based method, we have fixed the DS as the block searching strategy and the MSE as the block matching criterion since it gives better compression performance without sacrificing image quality. We have also fixed the size of the search window to 7 and the size of the block to 2, since we work in the approximation at the third level of the DWT. Furthermore, we have integrated all the techniques mentioned previously, with quarter-pixel precision.

Our method has shown its performance and robustness on several video benchmarks used to test ME/MC methods, such as the "Tennis", "Foreman", "Susie" and "Claire" sequences, and even the "Football" sequence, which contains large movements. The results show high performance in terms of the quality of the reconstructed frame, as shown in Table I (we use the PSNR to compare the original frame to the frame reconstructed after estimation), and also in terms of compression ratio. This is due to the accuracy of the estimation and the corrections made to the motion vectors.

TABLE I. PSNR OF THE RECONSTRUCTED IMAGE (dB)

Methods \ Sequences   Tennis    Foreman   Susie     Claire
Spatial domain        34.3983   33.5550   36.6450   37.7992
DCT domain            28.2568   31.3646   31.2833   33.0233
Conventional DWT      31.7586   31.2889   33.1613   32.5908
Proposed method       35.6263   34.6025   38.3417   38.5418

Our experiments verify the superiority of the proposed algorithm, not only over several other well-known algorithms in the frequency and multiresolution domains, but also over the ME/MC method in the spatial domain. Moreover, it is faster than the other methods and the compression ratio is greatly increased because it works on the approximation level of the DWT, which is 8 times smaller than the original image in each dimension.

Fig. 2 shows an increase in the visual quality of the estimated image. When motion estimation is applied in the DCT domain, blocking effects appear. In the classical DWT domain there are also blocking effects, although it is superior to the DCT domain. Our method gives a good visual estimate that resembles the estimate in the spatial domain.

Figure 2. The 129th frame of the "Foreman" sequence and the 17th frame of the "Tennis" sequence. (a) The original image. The estimated image: (b) in the DCT domain, (c) in the DWT domain, (d) with our method.

VI. CONCLUSION

Whatever the motion estimation algorithm considered, a large computing power is needed. That is why many studies have been carried out to improve and simplify the algorithms. This paper proposes a multiresolution motion estimation and compensation method based on block matching applied to the wavelet coefficients. In future work, we will reinforce our method with other techniques, such as spatial segmentation, to identify the moving objects.

ACKNOWLEDGMENT

The authors would like to acknowledge the financial support of this work by grants from the General Direction of Scientific Research (DGRST), Tunisia, under the ARUB program.

REFERENCES

[1] H. Gharavi and M. Mills, "Block Matching Motion Estimation Algorithms: New Results," IEEE Trans. Circuits and Systems for Video Technology, Vol. 37, pp. 649-651, 1990.
[2] P. C. Shenolikar and S. P. Narote, "Motion estimation on DWT based image sequence," International Journal of Recent Trends in Engineering, Vol. 2, No. 4, November 2009.
[3] A. Barjatya, "Block Matching Algorithms For Motion Estimation," DIP 6620 Spring 2004 Final Project Paper.
[4] F. G. Meyer, A. Averbuch, and R. R. Coifman, "Motion compensation of wavelet coefficients for very low bit rate video coding," Proc. IEEE Inter. Conference on Image Processing, Vol. 3, pp. 638-641, 1997.
[5] A. Lundmark, H. Li, and R. Forchheimer, "Motion vector certainty reduces bit rate in backward motion estimation video coding," Proc. of SPIE Visual Comm. and Image Processing, pp. 95-104, 2000.
[6] Y. Yuan and M. K. Mandal, "Low-Band-Shifted Hierarchical Backward Motion Estimation and Compensation for Wavelet-Based Video Coding," ICVGIP'02, 2002.
[7] Z. Zivkovic and F. van der Heijden, "Efficient adaptive density estimation per image pixel for the task of background subtraction," Pattern Recognition Letters, 27(7):773-780, May 2006.
[8] M. G. Arvanitidou et al., "Global motion estimation using variable block sizes and its application to object segmentation," Workshop on Image Analysis for Multimedia Interactive Services, 2009.