
2004 IEEE International Conference on Multimedia and Expo (ICME)

A Statistical Approach for Object Motion Estimation with MPEG Motion Vectors
Xiaodong Yu¹, Ping Xue¹ and Qi Tian²
¹Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore
²Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore
¹{exdyu, epxue}@ntu.edu.sg, ²tian@i2r.a-star.edu.sg
Abstract

In this paper we propose a statistical approach to estimate object motion with MPEG motion vectors. A model with two normal distribution terms is applied to represent the simplified object motion. One term models the noise embedded in the motion vector field produced in the encoding stage and the other term models the randomness of the true object motion. Experiments with vehicle motion estimation from MPEG traffic video are used to evaluate the proposed algorithm. The influences of time window, frame size and reference frame distance are investigated. The vehicle speeds can be estimated with a high accuracy of 85%-92%.

1. Introduction

Object motion estimation is a classic problem in the computer vision field. In recent years, with the popularity of MPEG video, much research effort has been devoted to estimating object motion with MPEG motion vectors. Although the MPEG motion vector was originally designed to minimize the motion prediction error in coding, it also embeds rich motion information among frames [1]. Since motion vectors are readily available in MPEG streams, we need neither fully decode the compressed video stream nor calculate the optical flow, so a great deal of computation can be saved.

Motion-vector-based object motion estimation is composed of two components: motion segmentation and object tracking. It is assumed that objects are rigid or their parts are rigidly connected to one another and that objects have continuous motion [1]. Thus an object can be segmented from the background by clustering motion vectors according to their similarities in direction or amplitude [2,3,10]. In the next step, motion parameters are derived from the motion vectors associated with this object for tracking. Such algorithms are analogues of those in the optical flow field and they all rely on the success of moving object segmentation. However, the granularity of the motion vector field limits the performance of motion-vector-based object segmentation. To solve this problem, several approaches have been proposed. Eng and Ma [5] used unbiased fuzzy clustering to replace the well-known fuzzy c-means clustering. They found that this algorithm was sensitive to the existence of small motion vector clusters and resulted in accurate identification of small objects. Babu and Ramakrishnan [6] accumulated and interpolated motion vectors over a few frames to enrich the motion information. Nevertheless, these approaches are inefficient if the object is too small. For example, wherever two or more objects smaller than a macroblock contribute distinct motions within a macroblock, the encoded motion vector cannot represent the motion correctly [4], hence motion segmentation is infeasible. Furthermore, if an object is similar in size to one or two macroblocks, only one or two motion vectors cannot provide sufficient information to distinguish object motion from noisy vectors, so it is still difficult to segment this object from the background. These problems motivate us to seek another way to estimate object motion with motion vectors. We argue that it is possible to extract some useful motion information at the macro level even when the objects are too small to estimate each individual object's motion, provided that these objects follow some kind of common motion pattern.

In this paper we propose a statistical model to estimate the mean object motion with MPEG motion vectors under the stationary assumption. Two normal distribution terms are used to model the randomness of the object motion and the noise embedded in the motion vector field respectively. By applying the statistical analysis within a time window, we alleviate the granularity of the motion vector field at the cost of instantaneous motion information.

The rest of this paper is organized as follows. In Section 2, we formulate our research question and propose a statistical model. Then we present the test bed for the proposed model in Section 3. In Section 4 we present experimental results of the model and the influential factors presented in Section 2 with the test bed. Finally, a conclusion and a discussion of future work are given in Section 5.

0-7803-8603-5/04/$20.00 ©2004 IEEE

2. Theoretical analysis
In this paper, we assume that the object motions are homogeneous in both the spatial and temporal domains, and we call this the stationary assumption. This assumption requires that object motions are similar to one another in terms of amplitude or direction and that their motions change slowly. In light of the difficulties of object segmentation with motion vectors discussed in Section 1, we aim to describe the object motion with a few statistical parameters within a short period rather than identifying the instantaneous motion of every single object. The period within which the stationary assumption is satisfied is defined as the time window. We define the displacement that is associated with object motion as the true object motion, the displacement that is not as noise, and the motion derived directly from the raw motion vector field, i.e., the true object motion plus noise, as the observed object motion,

$$X_i' = X_i + n_i \qquad (1)$$

where $X'$, $X$ and $n$ are the variables of the observed object motion, the true object motion and the noise respectively, and $i$ denotes the i-th sample, i.e., the i-th motion vector within a time window. The variables in the statistical model can be either amplitudes or directions of motion vectors. They can be extracted either directly from the motion vector field or from the motion vector field after some transformation, e.g., camera calibration, as long as the stationary assumption is still satisfied after such a transformation.

The statistical model for the true object motion is application-tailored. In this paper, we are interested in the amplitude of object motion. Under the stationary assumption, we can expect that within a time window, most true object motions concentrate around a center value, i.e., the mean, and are distributed symmetrically about the center in a bell shape. This deduction coincides with the experimental observations (see Figure 2). Combining the deduction and the experimental observations, we propose to model the true object motion with a normal distribution in this paper. For the noise, we employ an additive zero-mean, constant-variance normal distribution, which has been used to model the noise in optical flow [7]. Thus we have

$$X_i \sim N(\mu, \sigma_x^2), \quad n_i \sim N(0, \sigma_n^2), \quad X_i' \sim N(\mu, \sigma_x^2 + \sigma_n^2) \qquad (2)$$

where $\mu$ and $\sigma_x^2$ are the mean and variance of the true object motion, and $\sigma_n^2$ is the variance of the noise. We approximate $\mu$ with the sample mean of the observed motion $\bar{X'}$ from $N$ samples in a time window $T$:

$$\mu \approx \bar{X'} = \frac{1}{N} \sum_{i=1}^{N} X_i' \qquad (3)$$

The mean of the true object motion is a parameter of interest to users in applications because it represents the dominant motion characteristics. It is desirable to improve its estimation accuracy. This can be achieved by either reducing the variance of the estimation error or improving the signal-to-noise ratio (SNR). The estimation error follows the normal distribution,

$$\xi = (\bar{X'} - \mu) \sim N\left(0, \frac{\sigma_x^2 + \sigma_n^2}{N}\right) \qquad (4)$$

and the SNR is given by

[equation not recoverable from the source] (5)

Now let us analyze the influential factors in $\xi$ and the SNR.

The estimation error is controlled by three factors: the sample size $N$, the variance of the true object motion $\sigma_x^2$ and the variance of the noise $\sigma_n^2$. The sample size $N$, i.e., the total number of motion vectors used to calculate the sample mean, is positively correlated with the time window $T$ and the object density $d$. The variance of the true object motion $\sigma_x^2$ is in direct proportion to the time window $T$ and in inverse proportion to the object density $d$. The variance of the noise $\sigma_n^2$ is composed of the variance of the motion estimation error and the variance of the error caused by the resolution limitation. The motion estimation error results from the block-matching algorithm in MPEG video. For a specific application, its variance can reasonably be assumed to be constant. The error caused by the resolution limitation comes from the half-pixel accuracy of the MPEG motion vector. It implies that the motion estimation error has a lower bound: there is always a minimum random error introduced by this half-pixel accuracy. Its variance is also constant. As a result, the variance of the noise $\sigma_n^2$ is a constant in equation (2).

The SNR is controlled by three factors too: the variance of the true object motion $\sigma_x^2$, the variance of the noise $\sigma_n^2$ and the sample mean of the true object motion $\bar{X}$. The variances $\sigma_x^2$ and $\sigma_n^2$ have been discussed above. To improve the estimation accuracy of $\mu$, a larger $\bar{X}$ is preferred. The amplitude of the true object motion $\bar{X}$ is influenced by the speed of objects, the frame size $F$, and the distance between the current frame and its reference frame $D_f$. For a given application, we cannot control the speed of objects, but we can improve the SNR by increasing the amplitude of motion vectors. By observation, the amplitude of a motion vector is roughly in direct proportion to the square root of the frame size $\sqrt{F}$ and the reference frame distance $D_f$. Hence, we can heuristically approximate $\bar{X}$ as

$$\bar{X} \approx k \times \sqrt{F} \times D_f \qquad (6)$$

where $k$ is a constant.
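
As a rough numerical check of equations (1)-(4), the following Python sketch simulates many time windows and compares the empirical variance of the estimation error with the value predicted by equation (4). It is only an illustration under stated assumptions: the samples are taken to be independent, the parameter values (mean 20, standard deviations 4 and 2) are arbitrary, and the function names `simulate_error_variance` and `predicted_error_variance` are hypothetical and not part of the original work.

```python
import random
import statistics


def simulate_error_variance(mu, sigma_x, sigma_n, n_samples, n_trials, seed=0):
    """Empirical variance of (sample mean of X' - mu) over simulated time windows."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_trials):
        # Each observed sample is true motion plus independent noise, eq. (1)-(2).
        observed = [rng.gauss(mu, sigma_x) + rng.gauss(0.0, sigma_n)
                    for _ in range(n_samples)]
        # Estimation error of the sample mean, eq. (3)-(4).
        errors.append(statistics.fmean(observed) - mu)
    return statistics.pvariance(errors)


def predicted_error_variance(sigma_x, sigma_n, n_samples):
    """Variance of the estimation error according to eq. (4)."""
    return (sigma_x ** 2 + sigma_n ** 2) / n_samples


if __name__ == "__main__":
    # Illustrative (assumed) parameters: mu = 20, sigma_x = 4, sigma_n = 2.
    for n in (10, 40, 160):
        empirical = simulate_error_variance(20.0, 4.0, 2.0, n, 2000)
        predicted = predicted_error_variance(4.0, 2.0, n)
        print(f"N={n:4d}  empirical={empirical:.4f}  predicted={predicted:.4f}")
```

In this sketch, quadrupling N cuts the predicted error variance to a quarter, which is the benefit a longer time window brings as long as the stationary assumption still holds.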

3. A test bed from a traffic monitoring application

We present a case study for the traffic monitoring application in this section and use it to test the proposed model and the influential factors discussed above. It is extended from our previous work [8]. In this application, we estimate vehicle speed with motion vectors in MPEG traffic video. The traffic video is collected from a Skycam, i.e., a camera mounted at a high position with a much wider view. Figure 1 shows a sample image of such video with the motion vectors. Most of the vehicles in the traffic video are smaller than a single macroblock and there are only one or two motion vectors associated with each vehicle. Hence, this is a good example to show the advantage of the proposed method over conventional clustering-based counterparts. Within a short period of time, the amplitudes and the directions of vehicle speeds should be similar and will not change significantly. Thus, the stationary assumption is easily satisfied in this application. Due to the perspective effect, motion vectors from different vehicles with similar speeds may present different amplitudes. Hence camera calibration is employed to obtain the displacement of the vehicles in the ground plane from the MPEG motion vectors. In this way, the variable of the object motion in the statistical model and equations (1)-(6) is the mapped motion vector, or equivalently, the vehicle speed.

Figure 1. Sample image from Skycam and motion vectors (scaled by 10)

4. Experimental results

We test the proposed model and the impact of the influential factors with the test bed described in Section 3. The test videos are two MPEG videos collected from two Skycams respectively. Each of them is 5 minutes long and includes 6 lanes, representing various traffic conditions at a certain place. They are digitized by an MPEG card in MPEG-1 format at a resolution of 352x288, a frame rate of 25 fps, a reference frame distance $D_f$ = 3 and a constant bitrate of 1150 kbps. The variable of object motion is the speed of vehicles in this case study. The mean speed of each lane is calculated and compared with the ground truth independently. The ground truth is obtained manually at 2-second intervals.

First of all, we test the normal approximation of object motion. Figure 2.a shows the distribution of the estimated speed within a lane for a 30-second test sequence. It is bell-shaped and symmetric about its mean. The normal fits demonstrate that the normal distribution properly approximates the speed distributions. To test our assumption objectively, we conduct the Ryan-Joiner (similar to Shapiro-Wilk) normality test. The plots and the test results are presented in Figure 2.b. As the plot shows, the ordered observed values and the respective cumulative frequencies lie almost along a straight line, so it is safe to assume that the vehicle speed is normally distributed.

Figure 2. The distributions of the estimated speed (a) and its normal probability plots (b). In (a), the discrete values are the frequencies of speed and the continuous values are the normal fits.

Then we test the impact of the time window $T$ on speed estimation. The standard deviation of the speed estimation error and the mean accuracy of speed estimation at different time windows are illustrated in Figure 3.a and Figure 3.b respectively. For $T$ less than 60 seconds, a larger time window improves the performance of speed estimation since more samples are used. For $T$ more than 60 seconds, the stationary assumption is less likely to be satisfied and the increase in the variance of the true motion overwhelms the benefit of a larger sample size. Thus the mean accuracy of speed estimation decreases accordingly.

Figure 3. The impact of time window on speed estimation. (a) and (b) show the standard deviation of the speed estimation error and the mean accuracy of speed estimation at different time windows respectively.

Finally, we test the influence of frame size and reference frame distance on speed estimation. We re-encode the two test videos at two frame sizes, CIF and QCIF, and three reference frame distances, $D_f$ = 3 (normal), $D_f$ = 6 and $D_f$ = 12. Then we evaluate the test bed with the re-encoded videos. Figure 4.a illustrates the mean amplitude of motion vectors in the test videos. We find that the mean motion vector increases approximately linearly with the reference frame distance and the square root of the frame size. With a larger reference frame distance or a larger frame size, the motion vector becomes longer. Consequently, the influence of the half-pixel error is suppressed and the accuracy of speed estimation is improved. Note that the mean motion vector with $D_f$ = 12 in CIF size is slightly smaller than double the mean motion vector with $D_f$ = 12 in QCIF size and the one with $D_f$ = 6 in CIF size. Meanwhile, the mean accuracy of speed estimation with $D_f$ = 12 in CIF size deteriorates compared with the others in CIF size. A similar observation is also reported in Gonzales, Yeo and Kuo's experiments [9]. The reason is that with a larger reference frame distance, a larger motion vector is selected, so more bits are needed to code this motion vector. When these additional bits are not sufficiently compensated by the corresponding savings in coding smaller macroblock residual errors, a sub-optimal, shorter motion vector is used instead of the optimal, longer one. This is an inherent limitation of the MPEG motion-vector-based approach.
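
The scaling implied by equation (6) can be illustrated with a short Python sketch. It assumes that the frame size F is the pixel count of the frame (352x288 for CIF, 176x144 for QCIF) and that the constant k is the same across encodings, so that k cancels in any ratio; the names `FRAME_SIZES`, `predicted_amplitude` and `amplitude_ratio` are hypothetical. Under these assumptions the heuristic predicts that the mean motion vector at D_f = 12 in CIF is double that at D_f = 12 in QCIF and that at D_f = 6 in CIF, which is the idealized value the measured "slightly smaller than double" result is compared against.

```python
import math

# Frame sizes in pixels (assumption: F in eq. (6) is the pixel count).
# CIF matches the 352x288 test videos; QCIF is the standard 176x144 format.
FRAME_SIZES = {"CIF": 352 * 288, "QCIF": 176 * 144}


def predicted_amplitude(frame_size, ref_distance, k=1.0):
    """Heuristic mean motion-vector amplitude from eq. (6): k * sqrt(F) * D_f."""
    return k * math.sqrt(frame_size) * ref_distance


def amplitude_ratio(size_a, dist_a, size_b, dist_b):
    """Ratio of two predicted amplitudes; the unknown constant k cancels out."""
    return predicted_amplitude(size_a, dist_a) / predicted_amplitude(size_b, dist_b)


if __name__ == "__main__":
    cif, qcif = FRAME_SIZES["CIF"], FRAME_SIZES["QCIF"]
    # Same reference distance, CIF versus QCIF.
    print("CIF D_f=12 vs QCIF D_f=12:", amplitude_ratio(cif, 12, qcif, 12))
    # Same frame size, reference distance 12 versus 6.
    print("CIF D_f=12 vs CIF  D_f=6: ", amplitude_ratio(cif, 12, cif, 6))
    # Largest versus smallest configuration tested in this section.
    print("CIF D_f=12 vs QCIF D_f=3: ", amplitude_ratio(cif, 12, qcif, 3))
```

Comparing ratios rather than absolute amplitudes avoids having to guess k, which the text only describes as a constant.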

Figure 4. The mean motion vectors (a) and the mean accuracies of speed estimation (b) for test videos in different frame sizes and reference frame distances. T = 60 s.

5. Conclusion and future work

In this paper, an algorithm that estimates object motion from MPEG compressed video with a statistical model was presented. This algorithm complements the existing clustering-based approaches in small-object scenarios where the latter are inefficient. Theoretical analysis and experimental evaluation were conducted to investigate the influential factors of the proposed model.

Unlike clustering-based object motion estimation techniques, we did not attempt to segment and track every object. Instead, we tried to estimate a few statistical motion parameters for all objects within a time window. In this way, motion estimation accuracy can be substantially improved with proper spatial and temporal processing (85%-92% in our test bed) and the granularity of the motion vector field is alleviated.

Although the test bed in this paper is a typical application for traffic monitoring, the method is applicable in other scenarios where the objects are small while moving in a common pattern. It can also be extended to a moving camera by compensating for the camera motion. These are included in our future work.

References

[1] Nevenka Dimitrova and Forouzan Golshani, Motion Recovery for Video Content Classification, ACM Transactions on Information Systems, Vol. 13, No. 4, October 1995, pp. 408-439.
[2] F. Bartolini, V. Cappellini, and C. Giani, Motion Estimation and Tracking for Urban Traffic Monitoring, Proceedings of IEEE International Conference on Image Processing, 1996, pp. 87-90.
[3] Heitou Zen, Tameharu Hasegawa, Shinji Ozawa, Moving Object Detection from MPEG Coded Picture, Proceedings of IEEE International Conference on Image Processing, vol. IV, pp. 25-29, Oct. 1999.
[4] Kyongil Yoon, Daniel DeMenthon, David Doermann, Event Detection from MPEG Video in the Compressed Domain, International Conference on Pattern Recognition, Volume 1, pp. 1819-1825, Barcelona, Spain, September 03-08, 2000.
[5] How-Lung Eng, Kai-Kuang Ma, Motion Trajectory Extraction Based on Macroblock Motion Vectors for Video Indexing, International Conference on Image Processing, pp. 284-288, 1999.
[6] R.V. Babu, K.R. Ramakrishnan, Compressed Domain Motion Segmentation for Video Object Extraction, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 4, 2002, pp. 3788-3791.
[7] Christophe Garcia, Georgios Tziritas, Optimal Projection of 2-D Displacements for 3-D Translational Motion Estimation, Image and Vision Computing, Vol. 20, pp. 793-804, 2002.
[8] Xiaodong Yu, Lingyu Duan, Qi Tian, Highway Traffic Information Extraction from Skycam MPEG Video, Proceedings of IEEE 5th Intelligent Transportation System Conference, pp. 37-42, Sep. 3-6, 2002.
[9] C.A. Gonzales, H. Yeo and C.J. Kuo, Requirements for Motion Estimation Search Range in MPEG-2 Coded Video, IBM Journal of Research and Development, Vol. 43, No. 4, July 1999.
[10] Jim Wang and Ze-Nian Li, Kernel-based Multiple Cue Algorithm for Object Segmentation, IS&T/SPIE Symp. on Electronic Image and Video Communications and Processing, 2000.
