
3852 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 10, OCTOBER 2013

Monocular Human Motion Tracking by Using DE-MC Particle Filter
Ming Du, Xiaoming Nan, and Ling Guan, Fellow, IEEE

Abstract— Tracking human motion from monocular video sequences has attracted significantly increased interest in recent years. A key to accomplishing this task is to efficiently explore a high-dimensional state space. However, the traditional particle filter method and many of its variants have not been able to meet expectations, as they lack a strategy for efficient sampling or stochastic search. We present a novel approach, namely differential evolution - Markov chain (DE-MC) particle filtering. By taking advantage of the DE-MC algorithm's ability to approximate complicated distributions, substantial improvement can be made to the traditional structure of the particle filter. As a result, an efficient stochastic search can be performed to locate the modes of likelihoods. Furthermore, we apply the proposed algorithm to solve the 3D articulated model-based human motion tracking problem. A reliable image likelihood function is built for visual tracker design. Based on the proposed DE-MC particle filter and the image likelihood function, we perform a variety of monocular human motion tracking experiments. Experimental results, including the comparison with the performance of other particle filtering methods, demonstrate the reliable tracking performance of the proposed approach.

Index Terms— Articulated human motion tracking, importance sampling, particle filtering, DE-MC.

Manuscript received March 6, 2012; revised July 24, 2012, November 19, 2012, and April 14, 2013; accepted May 1, 2013. Date of publication May 14, 2013; date of current version August 28, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jenq-Neng Hwang.
M. Du is with the Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742 USA (e-mail: mingdu@umd.edu).
X. Nan and L. Guan are with the Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON M5B 1Z2, Canada (e-mail: xnan@ee.ryerson.ca; lguan@ee.ryerson.ca).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2013.2263146
1057-7149 © 2013 IEEE

I. INTRODUCTION

Human motion tracking plays a critical role in human movement understanding and analysis in that it quantifies human motion and is of interest for many applications, including security and surveillance systems, video indexing and retrieval, sports training, and movie special effect production. For human beings, the visual system can locate a person and track body parts easily, accurately and quickly, but the same task remains arduous for computers. Many challenges have been impeding the progress of research in this field. First, the articulated human body model used in practical human motion tracking introduces a large number of degrees of freedom (DOF), and thus the tracking task needs to search for the optimal position in a high dimensional state space. Second, the crucial factors in a tracking problem are how to allocate the samples and how to guide their movement in the state space. In the sense of importance sampling, if the proposal distribution is biased and no corrections are made, the tracker will fail quickly. In addition, the relationship between human motion and image appearance is too complex to be formulated mathematically. Due to motion blur, depth ambiguity and self occlusion, the extracted low-level visual features cannot describe the human motion effectively and robustly. Although a variety of approaches have been proposed in recent years, the high dimensional human motion tracking problem is still inherently difficult.

In this paper, we propose a novel tracking algorithm, namely the Differential Evolution - Markov Chain (DE-MC) particle filter. It is based on the particle filter framework but makes substantial changes to the core structure of the CONDENSATION algorithm. Guided by the DE-MC algorithm, samples can move progressively toward the important regions of the conditional distribution as defined by the image likelihood. Furthermore, we apply the DE-MC particle filter to the 3D articulated model-based human motion tracking problem. A 3D articulated human body model is proposed in this paper, and we also design a robust multi-cue based measurement function which describes the resemblance between hypothesis and image observations. Based on the proposed algorithm and the multi-cue measurements, we perform a variety of monocular human motion tracking experiments. The experimental results demonstrate the reliable tracking performance of the proposed tracking algorithm.

Our major contributions in this paper are listed as follows:
1) Since the traditional CONDENSATION algorithm ignores the most recent observation and thus produces unreliable human motion tracking results, we introduce the DE-MC algorithm from statistics and optimization theory to address this problem. The proposed DE-MC particle filter incorporates both the advantage of the Differential Evolution algorithm in global optimization and the ability of the Markov Chain Monte Carlo in reasonably sampling a high dimensional state space.
2) We apply the proposed DE-MC particle filter to address the 3D articulated model-based human motion tracking problem. Specifically, we present our 3D articulated human body model and fuse silhouette, color and boundary information to build a robust multi-cue measurement function. Among them, the boundary information represented by the Fourier Descriptors (FD) sets up a novel and effective connection between the estimated model parameters and the image likelihoods. Based on these novelties, our human motion tracking system achieves great improvement over the traditional particle filtering methods.

The remainder of this paper is organized as follows. Section II reviews the related work in the field of human
DU et al.: MONOCULAR HUMAN MOTION TRACKING BY USING DE-MC PARTICLE FILTER 3853

motion tracking. Section III first introduces the background of Markov Chain Monte Carlo (MCMC), the Differential Evolution (DE) algorithm and the DE-MC algorithm, and then proposes the DE-MC particle filter algorithm. The 3D articulated human body model and image likelihoods are built in Section IV. Experimental results are shown in Section V and the conclusions are drawn in Section VI.

II. RELATED WORK

The core of the tracking algorithm is its mechanism of searching for configurations which interpret observations best. For this reason, a rich body of technical literature has been devoted to designing an efficient sampling and search strategy. Basically the two tasks are towards the same goal and are closely linked: a good sampling strategy will substantially increase the efficiency of searching, and a proper search strategy will increase the possibility of finding extrema. Most human motion trackers are based on particle filters. Application of particle filters (or Sequential Monte Carlo Sampling) in the computer vision community can be traced back to the CONDENSATION algorithm [1]. Though paving a path to solving many visual tracking problems, the CONDENSATION algorithm and most of its variants turn out to be inadequate when the dimensionality of the state space increases by a certain degree. In such a space, the samples are distributed extremely sparsely. Unless enough of them fall into the neighboring region of a solution, there is no guarantee of reaching this solution. Unfortunately, the simplification which differentiates the CONDENSATION (or generic particle filter) from the Sequential Monte Carlo sampling (SMC) makes a nontrivial move away from the objective.

A possible compensation is to reduce the dimensionality of the search space and hence make the samples relatively more dense in their distribution. The well-known dimensionality reduction algorithms such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are sometimes very useful. In [2], for example, PCA is used to explore the 3D head motion and pose estimation problem. However, the linear nature of PCA and ICA usually makes them inadequate to handle the complicated relations between joint angles. There are many non-linear dimensionality reduction techniques available, but some of them, such as Isomap, Laplacian Eigenmaps, and Locally Linear Embedding, are non-invertible. This shortcoming prevents them from being qualified candidates in visual tracking applications, since we must return to the state space to retrieve the tracking result. In addition, non-linear dimensionality reduction methods with inverse mapping ability such as the Locally Linear Coordination (LLC) [3]–[5] algorithm and the Gaussian Process Latent Variable Model (GPLVM) [6]–[10] have been used for tracking human motion. In [5], the LLC preserves the clustering behavior of similar high-dimensional data points and separates the different clusters in the global coordinate system, and the models learned from the LLC are then used for tracking. Authors in [10] employ GPLVM for pose synthesis given kinematics constraints. Some researchers also try to partition the search space by looking for the conditional dependencies between different state variables, as has been done through using the Rao-Blackwellised Particle Filter in [11]. Generally speaking, most of these works rely on training sets and are successful when the types of motion to be tracked are close to those in the training set.

From a different perspective, many researchers focus their attention on refining dynamical models. Given the posterior history, they try to accurately predict the region that covers the solution at the current time step by deriving process noise from an uncertainty description matrix [12], [13], by building an adaptive velocity model [14]–[16], and especially, by learning motion templates. Authors in [17]–[19] present a real-time full body tracker. In their work, the parameter space is first partitioned into Gaussian clusters each representing an elementary motion. A prior dynamics model is then learnt from this low dimensional representation by using an unsupervised EM clustering algorithm. The temporal dependencies of high-level behaviors are captured by a variable length Markov model (VLMM). By using the learnt dynamics model, the propagation of candidate poses is biased in the low dimensional space. Authors in [17]–[19] apply their 3D human motion tracker to multi-view videos. They propose a hierarchical algorithm to merge silhouette extraction and volumetric reconstruction together, which can efficiently evaluate candidate poses against evidence captured from multiple views. The hierarchical volumetric reconstruction algorithm can effectively fuse information from different views and resolve spatial ambiguities. A rich body of literature [20]–[23] has discussed how to utilize motion templates to guide the tracking even though it can only be applied to limited motion types. All of these efforts rely heavily on the knowledge about the dynamical prior. They usually make assumptions too rigid to be applied to general tracking scenes.

Because of the aforementioned limitations of those schemes, we realize that the improvement should be made on the core of particle filtering, for instance, the sampling strategy. Importance sampling is first introduced in the form of ICONDENSATION [24], [25], in which the samples are drawn not only according to the dynamical priors but also according to an auxiliary importance function. Poon and Fleet's Hybrid Monte Carlo Filtering (HMCF) [26] bears some similarity to our work in that it uses MCMC to generate samples. But unlike our work, the HMCF follows the gradient of the posterior distribution to find promising samples. It requires an analytical form of the image likelihood gradient, which is not available for most image cues. Therefore it cannot maximize the use of abundant visual information contained in a video for tracking. Sminchisescu and Triggs [27] use kinematic jump sampling to handle the forwards/backwards ambiguities in the estimation of limb pose. While this sampling strategy targets a specific problem structure, our work is concerned with the more general case of sampling. In [28], Sminchisescu and Triggs develop a proposal density based on uncertainty of local parameter estimation. Along the eigen-directions which account for the most uncertain parameters (usually the ones that describe the movement in depth), sampling covariances are inflated by a large scale. So the generated samples can focus on the modes that cause ambiguities. The unscented

particle filter (UPF) [29]–[31] draws samples from a proposal distribution which is determined by the calculation result of the unscented Kalman filter (UKF). Alongside approaches that try to “generate good samples”, there are also research projects devoted to “turning bad samples into good ones”, i.e. driving the existing biased samples to a global or quasi-global extremum. The Kernel Particle Filter (KPF) [32]–[34] is based on kernel density estimation and moves the samples along the gradient ascent direction of the posterior distribution density. Wu et al. [35] model the 2D tracking problem using a dynamic Markov Random Field, in which the node potential at each time instance is determined by the dynamical model and edge potentials are location constraints for body part pairs. They use the mean field algorithm for inference. In [25], [36], [37], the simulated annealing algorithm and the genetic algorithm (GA) together constitute the foundation of the annealed particle filter (APF). In this case samples are pushed gradually toward the global maximum of a weighting function by progressively adjusting the sensitivity of the weighting function. In [38], the articulated human body motion is decomposed into limb motions. The coupling between limb motions is captured by Sequential Ancestral Simulation (SAS) of Bayes Networks and by a Markov Random Field model in the sample weights. Although this work relies on a cyclic motion prior and is thus not applicable to general motions, iterative sampling is found to improve the quality of sampling, which resembles the idea of the APF.

The preceding works have demonstrated that the decisive factors in a high-dimensional visual tracking problem are how to allocate the samples and how to guide their movement in the state space. Motivated by this perspective, we present a new algorithm which makes substantial changes to the core structure of the CONDENSATION algorithm. An information exchange scheme is built between the sampling and weighting steps to guide the moves of samples in the state space. This adjustment also has the effect of stochastic global optimization. In addition, the proposed algorithm does not require any training for dynamical priors. As our work is based on the DE-MC, a method originally developed for approximating target density functions in statistics, we name the proposed algorithm the DE-MC particle filter. We will demonstrate its effectiveness with application in monocular 3D human motion tracking. The generality of the algorithm, however, allows it to be readily used in other visual tracking contexts.

III. DE-MC PARTICLE FILTER

In this section, we will first give a brief introduction to the particle filter, the Markov Chain Monte Carlo (MCMC) algorithm, the Differential Evolution (DE) algorithm and the DE-MC algorithm. Then we propose the DE-MC particle filter.

A. Particle Filter

In an articulated model-based human motion tracking problem, joint angles together with global translation and rotation parameters constitute a state vector. This vector gives a complete description of the human pose. Therefore, the visual tracking problem can be formulated as recursively estimating the state at each time step according to a posterior distribution $p(X_{0:k} \mid Y_{1:k})$, where $X_{0:k} = \{X_0, X_1, \ldots, X_k\}$ are the state vectors up to and including time $k$ and $Y_{1:k} = \{Y_1, \ldots, Y_k\}$ are the observations in the same period of time. By Bayesian inference [1]:

$$
\begin{aligned}
p(X_k \mid Y_{1:k}) &= \lambda_k P(Y_k \mid X_k)\, p(X_k \mid Y_{1:k-1}) \\
&= \lambda_k P(Y_k \mid X_k) \int p(X_k \mid X_{0:k-1}, Y_{1:k-1})\, p(X_{0:k-1} \mid Y_{1:k-1})\, dX_{0:k-1} \\
&= \lambda_k P(Y_k \mid X_k) \int p(X_k \mid X_{k-1})\, p(X_{k-1} \mid Y_{1:k-1})\, dX_{k-1},
\end{aligned}
\tag{1}
$$

where $\lambda_k$ is a normalization constant that is independent of $X_k$.

By applying the Monte Carlo principle, the continuous distribution can be represented by discrete samples through particle filtering. Since direct sampling from the complicated posterior distribution is difficult, the importance sampling principle is adopted. We summarize a typical particle filter step as follows:

At time step $k$, starting with a sample set $\{X_{k-1}^{(i)}, \omega(X_{k-1}^{(i)})\}_{i=1}^{N}$:
1) Selection: Select a new set of samples $\{\hat{X}_k^{(i)}\}_{i=1}^{N}$ from $\{X_{k-1}^{(i)}\}_{i=1}^{N}$ according to $\omega(X_{k-1}^{(i)})$. The samples with a larger weight should be selected with a higher probability. The detailed selection strategy can be found in [1] and [39].
2) Prediction: Sample from the proposal function,

$$\{X_k^{(i)}\} \sim g(X_k^{(i)} \mid \hat{X}_k^{(i)}, Y_k), \quad i = 1, 2, \ldots, N. \tag{2}$$

3) Measurement: Evaluate the weight for each sample,

$$\omega(X_k^{(i)}) = \frac{p(Y_k \mid X_k^{(i)})\, p(X_k^{(i)} \mid \hat{X}_k^{(i)})}{g(X_k^{(i)} \mid \hat{X}_k^{(i)}, Y_k)}, \quad i = 1, 2, \ldots, N, \tag{3}$$

where $p(Y_k \mid X_k^{(i)})$ is the image likelihood and $p(X_k^{(i)} \mid \hat{X}_k^{(i)})$ is the dynamical model. Then normalize the weights so that $\sum_{i=1}^{N} \omega(X_k^{(i)}) = 1$.
4) Representation: Estimate the state at time step $k$ as

$$\tilde{X}_k = \arg\max_{X_k^{(i)}} \omega(X_k^{(i)}), \quad i = 1, \ldots, N, \tag{4}$$

or

$$\tilde{X}_k = E[X_k] = \sum_{i=1}^{N} \omega(X_k^{(i)})\, X_k^{(i)}. \tag{5}$$

Particle filters often suffer from the degeneracy problem, referring to the cases in which all but a few particles have negligible weights after some iterations. An indicator of the degree of degeneracy is the effective sample size, or survival diagnostic [40]:

$$N_{\mathrm{eff}} = \frac{1}{\sum_{i=1}^{N} \left(\omega(X_k^{(i)})\right)^2}. \tag{6}$$
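As an illustration of the four steps and of the survival diagnostic in Equation (6), the following minimal Python sketch runs one particle filter cycle on a scalar state. It assumes that the proposal equals the dynamical prior, so the ratio in Equation (3) reduces to the likelihood. The function names, the Gaussian random-walk dynamics, and the Gaussian likelihood are hypothetical choices made only for this example and are not the image likelihood used for tracking.

```python
import math
import random


def normalize(weights):
    """Scale the weights so that they sum to one (end of step 3)."""
    total = sum(weights)
    return [w / total for w in weights]


def effective_sample_size(weights):
    """Survival diagnostic of Equation (6), for normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def sir_step(particles, weights, observation, rng,
             process_std=1.0, obs_std=0.5):
    """One selection / prediction / measurement / representation cycle."""
    n = len(particles)
    # 1) Selection: resample with probability proportional to the weights.
    selected = rng.choices(particles, weights=weights, k=n)
    # 2) Prediction: assumed proposal = dynamical prior (Gaussian random walk).
    predicted = [x + rng.gauss(0.0, process_std) for x in selected]
    # 3) Measurement: toy Gaussian likelihood; the tiny floor avoids a
    #    division by zero if every likelihood underflows.
    raw = [math.exp(-0.5 * ((observation - x) / obs_std) ** 2) + 1e-300
           for x in predicted]
    new_weights = normalize(raw)
    # 4) Representation: weighted mean, Equation (5).
    estimate = sum(w * x for w, x in zip(new_weights, predicted))
    return predicted, new_weights, estimate


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [rng.uniform(-5.0, 5.0) for _ in range(500)]
    weights = [1.0 / 500] * 500
    particles, weights, estimate = sir_step(particles, weights, 1.0, rng)
    print("estimate:", estimate)
    print("N_eff:", effective_sample_size(weights))
```

With uniform weights the diagnostic equals $N$, and it drops toward 1 as the weight mass concentrates on a few particles, which is the degeneracy described above.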

A small effective sample size indicates severe degeneracy. In this paper, we try to use a better proposal distribution to tackle the degeneracy problem. The optimal proposal distribution is given by

$$g(X_k^{(i)} \mid \hat{X}_k^{(i)}, Y_k) = p(X_k^{(i)} \mid \hat{X}_k^{(i)}, Y_k). \tag{7}$$

However, in practice it is very difficult to either obtain the analytical form of the optimal proposal distribution or draw samples from it. Generic sequential importance resampling (SIR) particle filters use the dynamical prior as a substitute for the proposal distribution,

$$g(X_k^{(i)} \mid \hat{X}_k^{(i)}, Y_k) = p(X_k^{(i)} \mid \hat{X}_k^{(i)}). \tag{8}$$

In this case, the sample weight in Equation (3) is nothing but the image likelihood,

$$\omega(X_k^{(i)}) = p(Y_k \mid X_k^{(i)}). \tag{9}$$

However, a state space of 20 or so dimensions appears to be so vast that the dynamical model alone cannot project the samples into the most probable locations of the solution. Indeed, the sparse samples thus generated are almost certain to miss many local or global modes. Therefore, finding a new strategy for efficient stochastic search in such a high-dimensional space has become an urgent issue. This is our motivation in developing the DE-MC particle filter.

B. Markov Chain Monte Carlo

A Markov Chain (MC) can be described by a transition matrix $T$ in which

$$T_{mn} = p(X_k = S_n \mid X_{k-1} = S_m), \tag{10}$$

where $S_n$ and $S_m$ are two of the probable states. Regardless of the initial state from which it starts, the Markov chain will always reach a steady state distribution $p(X)$ if the transition matrix $T$ possesses the irreducibility and aperiodicity properties [41]. These two properties guarantee a finite path from each state to every other state with non-zero transition probability, which is the so-called ergodicity.

The Markov Chain Monte Carlo (MCMC) algorithm aims to construct a Markov chain which has the given target distribution as its invariant distribution [41]. Normally we ensure the stationarity by making the chain satisfy the reversibility property,

$$p(X_k)\, T(X_{k-1} \mid X_k) = p(X_{k-1})\, T(X_k \mid X_{k-1}). \tag{11}$$

The most frequently adopted MCMC method is the Metropolis-Hastings (MH) algorithm [41]. According to this algorithm the transition probability is given by

$$T(X_k \mid X_{k-1}) = \alpha(X_{k-1}, X_k)\, g(X_k \mid X_{k-1}), \tag{12}$$

where $g(X_k \mid X_{k-1})$ is the proposal distribution we can directly sample from and

$$\alpha(X_{k-1}, X_k) = \min\left\{1, \frac{p(X_k)\, g(X_{k-1} \mid X_k)}{p(X_{k-1})\, g(X_k \mid X_{k-1})}\right\} \tag{13}$$

is the acceptance rate.

Now we verify that the transition defined by Equations (12) and (13) has the property of reversibility. Firstly, we verify the case when $\big(p(X_k)\, g(X_{k-1} \mid X_k)\big) / \big(p(X_{k-1})\, g(X_k \mid X_{k-1})\big) > 1$. Given this inequality condition, we have:

$$\alpha(X_{k-1}, X_k) = 1$$

and

$$\alpha(X_k, X_{k-1}) = \frac{p(X_{k-1})\, g(X_k \mid X_{k-1})}{p(X_k)\, g(X_{k-1} \mid X_k)},$$

so:

$$
\begin{aligned}
p(X_{k-1})\, T(X_k \mid X_{k-1}) &= p(X_{k-1})\, g(X_k \mid X_{k-1}) \\
&= p(X_k)\, \frac{p(X_{k-1})\, g(X_k \mid X_{k-1})}{p(X_k)\, g(X_{k-1} \mid X_k)} \times g(X_{k-1} \mid X_k) \\
&= p(X_k)\, T(X_{k-1} \mid X_k).
\end{aligned}
$$

The cases of $\big(p(X_k)\, g(X_{k-1} \mid X_k)\big) / \big(p(X_{k-1})\, g(X_k \mid X_{k-1})\big) < 1$ and $\big(p(X_k)\, g(X_{k-1} \mid X_k)\big) / \big(p(X_{k-1})\, g(X_k \mid X_{k-1})\big) = 1$ can be proved the same way.

One condition that ensures the quality of convergence is that $g(X)/p(X) > 0$ everywhere, so $g(X)$ is usually chosen such that it is similar in shape to $p(X)$, the target distribution. $g(X_k \mid X_{k-1})$ determines how the state space is explored. This is especially important to a high-dimensional problem such as human tracking. A popular choice is to generate new samples from a symmetric random-walk sampler, as the Metropolis algorithm does. It means the sampling proposal is determined only by the samples' separation from $X_{k-1}$:

$$g(X_k \mid X_{k-1}) = g(|X_k - X_{k-1}|). \tag{14}$$

Thus the acceptance rate reduces to

$$\alpha(X_{k-1}, X_k) = \min\left\{1, \frac{p(X_k)}{p(X_{k-1})}\right\}. \tag{15}$$

The calculations of the MH algorithm and the Metropolis algorithm are especially convenient because they can be applied to situations in which we cannot directly draw samples from the target distribution, but know how to roughly evaluate its value everywhere. This is precisely the case we encounter in human motion tracking.

C. Differential Evolution Algorithm

The Differential Evolution (DE) algorithm deals with the parallel search for a global maximum through a high dimensional state space [42]. Similar to other evolutionary programming methods such as the Genetic algorithm, it is also based on evolution theory and the competition mechanism: stronger members can more easily survive to the next generation so as to guarantee that the new generation is better than the previous one as a whole. Compared with the Genetic algorithm, the Differential Evolution algorithm is defined in real parameter spaces instead of binary code parameter spaces, so it is much simpler to implement. The DE algorithm can explore non-isotropic structures such as ridges in the target function because the vector differences are usually aligned with the direction of the ridges.
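To make the Metropolis rule of Section III-B concrete, the following minimal Python sketch runs a random-walk chain with the symmetric proposal of Equation (14) and the acceptance rate of Equation (15). Only pointwise evaluation of an unnormalized target is required. The function names, the Gaussian step, and the toy Gaussian target are illustrative assumptions and are not part of the original method.

```python
import math
import random


def metropolis(target, x0, steps, step_std, rng):
    """Random-walk Metropolis chain using the acceptance rate of Eq. (15)."""
    chain = [x0]
    x, px = x0, target(x0)
    for _ in range(steps):
        # Symmetric proposal: depends only on the separation from x, Eq. (14).
        candidate = x + rng.gauss(0.0, step_std)
        pc = target(candidate)
        # alpha = min(1, p(X_k) / p(X_{k-1})); written to avoid dividing by 0.
        alpha = 1.0 if pc >= px else pc / px
        if rng.random() < alpha:
            x, px = candidate, pc
        chain.append(x)
    return chain


def unnormalized_gaussian(x, mean=3.0, std=1.0):
    """Toy target known only up to a normalization constant (assumed)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2)


if __name__ == "__main__":
    rng = random.Random(1)
    chain = metropolis(unnormalized_gaussian, 0.0, 20000, 1.0, rng)
    tail = chain[2000:]  # discard an initial burn-in segment
    print("sample mean:", sum(tail) / len(tail))
```

Because the proposal is symmetric, the proposal densities cancel in Equation (13) and only the ratio of target values is evaluated, which is why the normalization constant of the target is never needed.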

Assume that a complicated function $f(E)$ is defined over a $D$-dimensional state space $\varepsilon$. Assume also that we do not know the analytical form of this function (or the analytical form is too complicated for a gradient-based method to apply), but can evaluate its value indirectly. We can use the DE algorithm to search for the global maximum with an initial population $E_{n,0}$, $n = 0, 1, \ldots, N-1$, where $N$ is the population size. The simplest version of the DE algorithm generates a new generation of the population in time step $k+1$ according to

$$E_{n,k+1} = E_{n,k} + \lambda (E_{r1,k} - E_{r2,k}), \tag{16}$$

where $r1$, $r2$ are random integers drawn from $[0, 1, \ldots, N-1]$ and mutually different, and $\lambda > 0$ controls the amplification of the differential variation. Its typical value is in the range of $[0.5, 1]$. Whether a newly generated state vector will be accepted is solely dependent on the value evaluated by the target function. In a global maximum search, if the candidate $E_{n,k+1}$ yields a larger function value than $E_{n,k}$, the state will be updated; otherwise it will be kept unchanged. Several variants of the DE algorithm are available and the use of a crossover operator is optional [42].

D. Differential Evolution Markov Chain (DE-MC) Algorithm

By examining the characteristics of the MCMC and the DE algorithm we find that they aid each other in searching for an optimal solution. The acceptance rule in the DE part is controlled by the MCMC acceptance mechanism, whilst the step size and orientation of the random walk of the MCMC part are produced by the DE algorithm. By constructing multiple MCMCs in parallel, the state space is more efficiently explored since the state vectors can be more reasonably distributed than with a single chain. All these Markov chains can interact with each other, sharing information with the aid of the DE algorithm. Under the guidance of the DE algorithm, the MCMCs will gradually concentrate on the important regions of the target distribution without being trapped in local basins. The DE-MC algorithm [43] is summarized as follows.

DE-MC Algorithm
1) Start with a target function $f(E)$ and an initial population $(E_{0,0}, E_{1,0}, \ldots, E_{N-1,0})$, whose members are $D$-dimensional vectors.
2) In the $k$th iteration, for each member of the population $E_{n,k-1}$, $n = 0, 1, \ldots, N-1$, randomly choose two integers $r1$ and $r2$ so that $r1 \neq r2 \neq n$.
3) Create a new member $E^{*}_{n,k}$ by

$$E^{*}_{n,k} = E_{n,k-1} + \lambda (E_{r1,k-1} - E_{r2,k-1}) + g. \tag{17}$$

$\lambda$ is the scalar parameter as used in Equation (16). It controls the tradeoff between the jumping pace of the samples and the degree of diffusion of the samples. It is suggested that the value should be set around

$$\lambda = 2.38 / \sqrt{2D} \tag{18}$$

and $g$ is drawn from a symmetric distribution such as a Gaussian or uniform distribution with small variance compared to that of $E$.
4) Compute the ratio $R = f(E^{*}_{n,k}) / f(E_{n,k-1})$.
5) Choose a number $h$ from $U(0, 1)$. If $R > h$, $E_{n,k} = E^{*}_{n,k}$; otherwise $E_{n,k} = E_{n,k-1}$.
6) Repeat steps 2–5 for iteration $k+1$ until convergence or a preset end point is reached.

Note that to make the DE-MC algorithm match our work, here we are searching for a maximum instead of a minimum. Equation (17) satisfies the detailed balance condition if $E^{*}_{n,k}$ is accepted:

$$E_{n,k-1} = E_{n,k} + \lambda (E_{r2,k-1} - E_{r1,k-1}) - g \tag{19}$$

and if it is rejected, the value remains unchanged.

Fig. 1. The DEMC algorithm simulation result: bimodal distribution (200 samples, 20 iterations).

We run simulations to verify the performance of the DE-MC algorithm. As it is difficult to visualize a space of more than 3 dimensions, we only choose target density functions with one or two variables, which require 2D and 3D plots to display, respectively. The first one is a bimodal distribution $p(x) \propto 0.3 e^{-0.2 x^2} + 0.7 e^{-0.2 (x-10)^2}$. The number of samples is 200 and the value of $\lambda$ is set to 1.19 according to Equation (18). We start with a population of samples drawn from the uniform distribution on $[-10, 20]$. Fig. 1 shows the result after 20 simulation iterations. The approximated distribution is represented in the form of a histogram and the ground-truth density curve is overlaid. The second example is the Muller potential surface as it appears in [44], [45], with the form

$$V(X, Y) = \sum_{i=1}^{4} A_i\, e^{a_i (X - x_i)^2 + b_i (X - x_i)(Y - y_i) + c_i (Y - y_i)^2}. \tag{20}$$

We set $A = (-200, -100, -170, 15)$, $a = (-1, -1, -6.5, 0.7)$, $b = (0, 0, 11, 0.6)$, $c = (-10, -10, -6.5, 0.7)$, $x = (1, 0, -0.5, -1)$, $y = (0, 0.5, 1.5, 1)$ and initially put all the samples at the minimum $(-0.5, 1.5)$. We can view it as a bivariate distribution with respect to random variables $X$ and $Y$ and dependent on parameters $A$, $a$, $b$, $c$, $x$ and $y$. It has been shown in [44] that with the standard dynamic sampling the samples are still trapped in the starting cost basin even after 6000 iterations, while from Fig. 2(b) we can see that the important regions of the target distribution are well approximated by 200 samples with the DE-MC algorithm after only 30 iterations. Most samples finally occur around the three minima. Here we do not compare our result with that obtained by hyperdynamics importance sampling because that algorithm attempts to attract the samples to saddle points instead of extrema.

Fig. 2. The DEMC algorithm simulation result: Muller potential surface (200 samples, 30 iterations).

E. DE-MC Particle Filter

Based on the DE-MC algorithm, we propose a novel sequential Monte Carlo sampling approach, namely the DE-MC particle filter. The DE-MC particle filtering iteration at time step $k$ can be summarized as follows.

DE-MC Particle Filter Algorithm
Start from the set of particles which are the filtering result of time step $k-1$, $\{X_{k-1}^{(i)}, \omega(X_{k-1}^{(i)})\}_{i=1}^{N}$.
1) Selection: Select a new set of samples $\{\hat{X}_k^{(i)}\}_{i=1}^{N}$ from $\{X_{k-1}^{(i)}\}_{i=1}^{N}$ with probability proportional to $\omega(X_{k-1}^{(i)})$.
2) Prediction and Measurement: Apply a constant velocity dynamical model to the samples:

$$X_k^{(i)-} = \hat{X}_k^{(i)} + V_{k-1}, \tag{21}$$

where $V_{k-1}$ is the velocity vector computed in time step $k-1$. The particle set $\{X_k^{(i)-}\}_{i=1}^{N}$ then acts as the initial population for a $T$-iteration DE-MC processing. The processing follows the DE-MC algorithm listed in Section III-D. The fitness function is the image likelihood in the case of visual tracking. For Equation (17), we choose $g \sim U(-c\sigma, c\sigma)$, wherein $\sigma = [\sigma_0, \sigma_1, \ldots, \sigma_{D-1}]^T$ is a vector whose elements equal the standard deviations of the elements in $X$. A normal distribution can be used here instead of the uniform distribution. $c$ is a small number which can be flexibly chosen. Also in the same equation, the value of $\lambda$ is determined by

$$\lambda = (1 - c) \times \frac{2.38}{\sqrt{2D}}. \tag{22}$$

At the end of this step, we take the output population as the particle set of the current time step: $\{X_k^{(i)}, \omega(X_k^{(i)})\}_{i=1}^{N}$.
3) Representation and Velocity Updating: Estimate the state at time step $k$ as

$$\tilde{X}_k = \arg\max_{X_k^{(i)}} \omega(X_k^{(i)}), \quad i = 1, \ldots, N, \tag{23}$$

and calculate the velocity vector of the current time step

$$V_k = \tilde{X}_k - \tilde{X}_{k-1}. \tag{24}$$

We adopt a strategy inspired by [36] to help the filter adapt to changing situations: the value of $\sigma$ in step 2 is calculated as proportional to the survival diagnostic as defined in Equation (6). The step size of random jumping for the current DE-MC iteration is reduced if the survival rate of the last DE-MC iteration is high and is inflated otherwise.

As previously mentioned, current observations are invisible to a generic SIR particle filter prior to the measurement stage. As such, there is a high risk that these sparse samples are not placed near the modes of the conditional distribution as defined by the likelihood. The DE-MC particle filter, on the other hand, can iteratively use the current observation to adjust the distribution of particles (see Fig. 3). The success of the DE-MC particle filter relies on two facts. First, as an importance sampling algorithm, MCMC guarantees that the samples are in accordance with the target conditional distribution. As a result, samples will be concentrated around the modes of the observation likelihood. Second, as a stochastic optimization algorithm, the DE algorithm reduces the chance that the samples are trapped in some of the local modes.

Fig. 3. The structure comparison of the DE-MC particle filter and the generic particle filter.

IV. 3D HUMAN BODY MODEL AND IMAGE LIKELIHOODS

In Section III, we presented the proposed DE-MC particle filter, which performs an efficient stochastic search to locate the modes of likelihoods. In this section, we will discuss some details and application level issues of the proposed DE-MC particle filter when it is used for tracking human motion in monocular video sequences.

A. 3D Articulated Human Body Model

Articulated human body models are consistent with the natural mechanism of human motion. Therefore, we are able to directly apply our knowledge about human motion to them. The model usually has a hierarchical structure, so the motion of a parent node will constrain that of its child or grandchild nodes. This relationship is reflected by the rigid geometric transformations between the local coordinate systems of the body parts:

$$P' = RP + T, \tag{25}$$

where the same point is represented in two different body part coordinate systems by $P = [x, y, z]^T$ and $P' = [x', y', z']^T$. $R$ is a $3 \times 3$ rotation matrix and $T = [T_x, T_y, T_z]^T$ is the translation vector.

The model we built has 14 body segments, and 21 DOFs (Degrees of Freedom) are associated with them. The segments
are represented by truncated cones or spheres, which approximate the real body parts acceptably while saving considerable computational cost. Geometric parameters such as the radius and height of the truncated cones correspond to human anthropometric data, so they vary between subjects but remain constant within a video sequence. We do not model the hands separately. Instead, we regard them as part of the lower arms, because they make little difference to the observation model and modeling them would introduce additional DOFs. For the same reason, we do not assign DOFs to the ankle joints. Joint limit constraints are imposed to keep the model kinematics consistent with common sense. The articulated model and its hierarchical structure are shown in Fig. 4.

Fig. 4. The proposed 3D articulated human body model and its hierarchical structure. The number of DOFs assigned to each joint is shown in the diagram. (LUA means Left-Upper-Arm, RLL means Right-Lower-Leg, LF means Left-Foot, etc.)

In our human motion tracking application, the state vector is defined as $X = [T_0, \theta_0, \theta_1, \ldots, \theta_{13}]$, where $T_0$ and $\theta_0$ are the global translation and rotation parameters of the root body part (the torso), and $\theta_i$ are the joint angle vectors of the remaining body parts as shown in Fig. 4. Correspondingly, $X_k^{(i)}$ is a hypothesis state vector, that is, a sample generated by the DE-MC particle filter. In the following, we model the conditional probability $p(Y_k \mid X_k^{(i)})$, which is used to calculate the sample weights $\omega(X_k^{(i)})$ in the DE-MC particle filter.

B. Image Likelihoods

In this work we choose silhouette, color and boundary as image cues to define the image likelihood function. Considering the task scenario, an improved scheme for histogram distance calculation is proposed. We also characterize the boundary of human bodies with the Fourier Descriptor (FD). It provides a reliable measurement for image likelihood and is especially suitable for the current task.

1) Silhouette: Assuming a static camera and video background, a ground-truth silhouette $S$ is extracted through background subtraction. We compare it with $S'$, which is generated by a hypothesis pose $X_k^{(i)}$. The pixels are categorized into two groups, $R_1$ and $R_2$, with $R_1 = S \cap S'$ and $R_2 = S \cup S'$. Let the numbers of pixels in $R_1$ and $R_2$ be $N_1$ and $N_2$, respectively. The silhouette area measurement density can then be represented by

$$p_1(Y_k \mid X_k^{(i)}) = \frac{N_1}{N_2}. \tag{26}$$

Clearly, if the hypothesis silhouette and the ground-truth silhouette fit exactly, $p_1(Y_k \mid X_k^{(i)}) = 1$; if they do not overlap at all, $p_1(Y_k \mid X_k^{(i)}) = 0$.

2) Color: 2D image pixels are generated by points in the real world through perspective projection. We make an inverse projection from the image to the 3D model:

$$P_m = M_{lm}^{-1}\bigl(X_0, M_{mw}^{-1}(Q^{-1}(P_i, E))\bigr), \tag{27}$$

where $P_m$ and $P_i$ are the local coordinates of a human body surface point and the image coordinates of its projection, respectively; $Q$ is the perspective projection; $E$ is the extrinsic camera parameter solved from calibration; $M_{mw}$ and $M_{lm}$ are the rigid transformations that relate the model coordinates to the world coordinates and the body segment local coordinates to the model's global coordinates, respectively; and $X_0$ is the initial pose learned from initialization.

After the correspondences between model points and image pixels have been built through Equation (27), we construct a reference appearance model for the visible image patches of individual body segments in the form of a normalized color histogram with 10 × 10 × 10 bins. This is done after initialization. For invisible body parts, or the invisible side of a part, we make justified assumptions such as left-right symmetry of human appearance and a uniform appearance distribution over the surface of a body segment. At each time step $k$, the difference between the reference histogram $H_r$ and the hypothesis histogram $H_k^{(i)}$ for hypothesis pose vector $X_k^{(i)}$ is measured. A popular way of doing so is to sum the Bhattacharyya distances of all the individual body segment histograms [46], [47]:

$$D(H_r, H_k^{(i)}) = \sum_{m=1}^{14} \sum_{j=1}^{3} D(H_r, H_{k,(j,m)}^{(i)}), \tag{28}$$

where

$$D(H_r, H_{k,(j,m)}^{(i)}) = 1 - \sum_{n=1}^{f} \sqrt{H_{r,(n,j,m)}\, H_{k,(n,j,m)}^{(i)}}, \tag{29}$$

$n$ is the bin index, $j$ is the RGB channel index, $m$ is the body segment index and $f$ is the total number of bins. Because body segments contribute differently to the observations, we propose assigning different weight factors to their histograms, which turns Equation (28) into

$$D(H_r, H_k^{(i)}) = \sum_{m=1}^{14} \sum_{j=1}^{3} \alpha_m D(H_r, H_{k,(j,m)}^{(i)}), \tag{30}$$

where $\alpha_m$ denotes the weights. They are proportional to the area of the image patches projected by the different body parts. We will show in Section V how this change improves the performance of the tracker. The color measurement distribution can then be formulated as

$$p_2(Y_k \mid X_k^{(i)}) = e^{-\beta D^2(H_r, H_k^{(i)})}, \tag{31}$$

where $\beta$ is a scalar that keeps the result reasonably distributed over the range (0, 1). To make the reference appearance model adaptive to the variation of lighting
conditions throughout the whole sequence, an update process can be applied:

$$H_{r,k}^{+} = \lambda H_{r,k}^{-} + (1 - \lambda) H_{r,k-1}^{+}, \tag{32}$$

where the signs $+$ and $-$ distinguish the reference appearance model after and before the update has occurred.

3) Boundary: Boundaries are often confused with edges and contours. Here we define the boundary as the outer border of an object that does not enclose any holes. Consequently, we cannot use typical edge and contour extraction methods for boundary extraction. Instead, a morphology operator is applied to the silhouette $S$:

$$B = S - (S \ominus M), \tag{33}$$

where $M$ is a 3 × 3 uniform structuring element and $\ominus$ signifies erosion. We choose the Fourier Descriptor as the feature to represent a boundary $B$:

$$B(f) = \frac{1}{N} \sum_{n=0}^{N-1} b(n)\, e^{-j 2\pi f n / N}, \tag{34}$$

where

$$b(n) = x(n) + j\, y(n), \tag{35}$$

and $[x(n), y(n)]$ $(n = 0, 1, \ldots, N-1)$ are the image coordinates of the pixels on $B$. $N$ is the total number of pixels on $B$. The boundary information-based measurement density is then formulated as

$$p_3(Y_k \mid X_k^{(i)}) = e^{-\rho D(B(f)_k, B(f)_k^{(i)})}, \tag{36}$$

where $\rho$ plays a role similar to that of $\beta$ in Equation (31), and $D(B(f)_k, B(f)_k^{(i)})$ is the Euclidean distance between the FD of the ground-truth boundary and that of the boundary generated by hypothesis pose $X_k^{(i)}$.

The first 50 or 100 coefficients in $B(f)$ are usually sufficient to reconstruct a boundary of thousands of pixels without losing much fidelity, as we can see from Fig. 5(a). Using only the low-frequency components of the FD places a strong emphasis on the gross shape of the boundary, since the high-frequency components correspond to noise or trivial details. Moreover, we can use the Fast Fourier Transform (FFT) to accelerate the computation to a large extent. The FD can also be directly integrated into the human tracking framework. We do not need to make any modifications to render the FD insensitive to translation, rotation and scaling, which we would normally have to worry about in many shape analysis scenarios, such as image retrieval. On the contrary, the FD needs to be sensitive to those transformations to reflect the human motion. However, we do wish to avoid the FD's sensitivity to the starting point, so we set a fixed corner of the boundary as the starting point.

Fig. 5(b) demonstrates the effectiveness of the FD as a measurement feature for tracking. The images on the left and the top of the checkerboard are the ground-truth observations and the hypothesis ones, respectively. The gray-level values of the blocks are proportional to the Euclidean distance between the FDs of the ground truths and those of the hypotheses. A dark block indicates a strong resemblance and a bright one indicates otherwise. As expected, the blocks along the diagonal are the darkest in their row and column.

Fig. 5. Fourier descriptors as image cue. (a) Boundary extraction result containing more than 2300 pixels (left) and the boundary reconstructed from the first 100 (middle) and the first 50 (right) FD coefficients. (b) Euclidean distance between Fourier Descriptors extracted from ground-truth image and that extracted from hypothesis image.

4) Fusion: We fuse the three image cues into an overall image likelihood density function

$$p(Y_k \mid X_k^{(i)}) = p_1(Y_k \mid X_k^{(i)}) \cdot p_2(Y_k \mid X_k^{(i)}) \cdot p_3(Y_k \mid X_k^{(i)}). \tag{37}$$

In our experiments, we assume that the three cues are of the same importance. Fig. 6 shows the surfaces of the proposed image likelihood function for two different frames. For visualization, the likelihood function is shown with respect to only 2 DOFs of the human body model, with all the other DOFs held constant. The significant peak suggests the validity of our image likelihood design scheme.

Fig. 6. Image likelihood function surface with respect to 2 DOFs (the translational movement of left knee joint and right hip joint).

V. EXPERIMENTAL RESULTS

We performed experiments with the proposed DE-MC particle filter and the measurement function. We used seven monocular human motion video sequences in the experiments. Sequences 1 to 4 are walking, hopping, running and jumping sequences, respectively, in a regular setting. Sequence 5 is a public test sequence downloaded from www.csc.kth.se/hedvig/movies.html, which is a walking sequence in a circular trace with a complex background. Sequences 6 and 7 are public test sequences from the HumanEva Database [48], with the subject performing boxing actions in Sequence 6, and a combination of walking and jogging actions in an elliptical path in Sequence 7. The lengths of
video sequences vary from 1.2 seconds to 5.5 seconds and the frame size is 720 × 480 for Sequences 1 to 4, 320 × 240 for Sequence 5, and 640 × 480 for Sequences 6 and 7. For Sequences 1 to 4 we do not ask the subjects to wear shorts and T-shirts or other clothes that label the body parts with different colors or patterns as clues. The human subject wears loose-fitting clothing with uniform color and pattern, which makes the task even more difficult. Although Sequences 1–4 are shot in side view, the hopping and jumping videos also contain many limb motions in the depth direction. Sequence 5 is more challenging in that not only is its background more complicated, but it also contains considerably more motion perpendicular to the image plane. Due to serious self-occlusions and depth ambiguity, monocular image sequences pose a huge challenge for any human motion tracking algorithm. Moreover, there are both cyclic and non-cyclic motions in those test sequences. To test the performance of the proposed algorithm, we carry out different kinds of experiments. In these experiments, model parameters are manually initialized for each test video. However, this stage can be replaced with an automatic initialization module driven by any state-of-the-art bottom-up detector. Please refer to Yang and Ramanan's "Mixture of Deformable Parts" model [49] and Viola and Jones' object detection work [50] for examples of such detectors.

A. Full-Body Human Motion Tracking Experiments

In our work, the captured human motion video sequence is the ground-truth observation, while an image generated by the articulated human model is called a hypothesis observation. Since the 3D joint angles can only be measured by asking the subject to wear motion capture devices, and there is always an unsolvable ambiguity in the depth direction when only 2D observations are available, we currently use the overlap between the two as the basis for qualitative analysis. This method is adopted by many related works [9], [28], and [36].

Fig. 7. Snapshots of the DE-MC particle filter's tracking results for: (a) walking, (b) hopping, (c) running, and (d) jumping sequences, respectively.

Figs. 7 and 8 show part of the tracking results for the test sequences. For the walking sequence, a 7-layer DE-MC particle filter is used (here we use "layer" to represent a DE-MC iteration). The number of particles is 500, which can lead to a satisfactory balance between the reliability and the
computational cost of the tracker. For the hopping, running and jumping sequences, a 9-layer DE-MC particle filter with 600 particles is used, in order to handle faster motions and more depth ambiguities than those present in Sequence 1. In Fig. 8, a 12-layer, 500-sample DE-MC particle filter is used for the walking in circle sequence. As the results show, motion blur does affect the performance of the tracker, and so do self-occlusions. Misplacement of limbs is inevitable in these cases. However, when the blur or self-occlusions disappear, the tracker is able to correct the errors in time.

Fig. 8. Tracking results for the walking in circle sequence by the DE-MC particle filter.

Fig. 9 shows tracking results for several representative frames of Sequence 6 and Sequence 7 from the HumanEva Database [48]. In the first sequence, the subject performs boxing actions, and, in the second, the subject moves in a circle and performs a combination of walking and jogging actions, followed by certain free limb motions. Both sequences contain complex limb motions. We used a DE-MC particle filter with 5 layers and 500 samples per layer for the first sequence, and a filter with 12 layers and 500 samples per layer for the second. Even though more samples were used for the second sequence to offset the difficulty caused by view variations, some tracking errors were still observed in this case, especially when the limbs of the subject were bending and projected only to small areas on the image plane. However, overall, our algorithm produces satisfactory results on both sequences.

B. Comparison Experiment 1

In Fig. 10, we compare the tracking results obtained with a 7-layer DE-MC particle filter with those obtained with other popular particle filtering-based algorithms. For a fair comparison, the experiment is based on almost the same number of measurement function evaluations, because this is the most time-consuming part of particle filtering. The experiment settings are: CONDENSATION: 5000 samples; annealed particle filter: 10 layers, 500 samples per layer; DE-MC particle filter: 7 layers, 500 samples per layer. The other factors, such as the initialization result, the initial standard deviation of the state vector, the constant velocity model, the adaptive strategy and the measurement function, are given the same settings for each algorithm. Despite consuming only 70% of the computations spent by the other two algorithms, the DE-MC particle filter still
outperforms them. It estimates every human body part with relatively high accuracy. The annealed particle filter, which is based on the Simulated Annealing algorithm, is shown in [36] to successfully track human walking from multi-view video sequences. In our monocular experiment, however, it only roughly captures the global location of the human subject and makes invalid estimates of the limb positions in many cases. The classic CONDENSATION algorithm cannot even find the human body accurately after 20 or so frames. This experiment clearly demonstrates the performance of the proposed approach.

Fig. 9. Tracking results for Sequences 6 and 7 from the HumanEva Database [48]. The first two rows are the 50th, 100th, 150th, and 200th frames of the Box-1(C1) video and the rest are the 500th, 600th, 700th, 800th, 900th, 1600th, 1800th, and 1900th frames of the Combo-2(C3) video.

Quantitative analysis of this experiment is conducted by measuring the average of the sample weights $\omega(X_k^{(i)})$ at each frame. The average sample weight can be formulated as $\bar{\omega}(X_k^{(i)}) = \frac{1}{N}\sum_{i=1}^{N} \omega(X_k^{(i)})$ at each time step $k$. It is regarded as a "score" of a tracker's efficiency since it reflects, to a certain degree, how many of the samples are considered "valid". The results are plotted in Fig. 11. As we have pointed out, DE-MC particle filter's
success in tracking is largely due to the efficient usage of particles, which is reflected by its stable score in Fig. 11. In contrast, tracking failure causes accumulated errors for the annealed particle filter and CONDENSATION trackers and leads to a consecutive drop in their scores. We can observe a consecutive drop of the score over the first several frames. This, however, is because the impact of a precise initialization gradually decreases as the tracking proceeds.

Fig. 10. Comparison of the tracking performance of the DE-MC particle filters with CONDENSATION algorithm and the annealed particle filter algorithm.

Fig. 11. The quantitative analysis of comparison experiment 1 in terms of the average sample weights.

Fig. 12. Comparison of the performance of the DE-MC particle filters with two layers, five layers, and seven layers (from left to right, respectively). The original image is the 48th frame of Sequence 1.

Fig. 13. The quantitative analysis of comparison experiment 2 in terms of the average sample weights.

Fig. 14. Comparison of the tracking performance based on different image likelihood functions.
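As a rough illustration of the score used in Comparison Experiment 1, the following Python sketch computes the average sample weight for one frame and the per-frame number of image likelihood evaluations for each tracker. The function names and the toy weight values are assumptions made for illustration, and the sketch assumes one likelihood evaluation per sample per layer; only the sample and layer counts are taken from the experiment settings quoted in the text.

```python
from typing import Sequence


def average_sample_weight(weights: Sequence[float]) -> float:
    """Mean of the particle weights at one time step (the tracker's "score")."""
    if not weights:
        raise ValueError("at least one particle weight is required")
    return sum(weights) / len(weights)


def likelihood_evaluations(samples_per_layer: int, layers: int = 1) -> int:
    """Per-frame likelihood evaluations, assuming one per sample per layer."""
    return samples_per_layer * layers


# Settings quoted in the text for Comparison Experiment 1.
budget = {
    "CONDENSATION": likelihood_evaluations(5000),
    "annealed particle filter": likelihood_evaluations(500, layers=10),
    "DE-MC particle filter": likelihood_evaluations(500, layers=7),
}

# Hypothetical weights for a single frame (not taken from the paper).
toy_weights = [0.2, 0.4, 0.6]
score = average_sample_weight(toy_weights)
```

Under this assumption the DE-MC filter performs 3500 evaluations per frame against 5000 for each of the other two trackers, which matches the 70% figure stated in the text.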

C. Comparison Experiment 2

In this experiment we compare the tracking results for Sequence 1 obtained by using 2-layer, 5-layer and 7-layer DE-MC particle filters. The numbers of samples per layer are 1750, 700 and 500, respectively. The goal of this experiment is to verify that, for an equal number of image likelihood function evaluations, more DE-MC iterations can boost the tracker's performance. This speculation is supported by the results shown in Fig. 12 and the quantitative analysis in terms of average sample weights shown in Fig. 13. Note that the results shown here should not be confused with the intermediate output of a 7-layer DE-MC particle filter obtained after the computation in the second, fifth and seventh layers. It also differs from the DE-MC particle filtering demonstration experiment in that the result is an accumulative one and that the three trackers have different particle number settings.

D. Comparison Experiment 3

In this experiment we compare the efficacy of different image cues. We compare the tracking results for Sequence 1 based on 4 different image likelihood functions and list some of the results in Fig. 14. This experiment demonstrates the necessity of fusing multiple image features for tracking. We also test our weighting scheme for calculating the Bhattacharyya distance between two histograms. Fig. 15 shows the results of tracking Sequence 1 with a DE-MC particle filter before and after adopting the proposed color cue scheme as formulated in Equation (30). The performance gain from this simple change is significant and evident.

E. Computation Time Analysis

The computation time varies with a number of factors with the two most important being the number of particles and
the number of layers. On a workstation with a quad-core Xeon 2.4 GHz CPU, it takes 0.5 to 1.4 seconds for a 500-particle DE-MC particle filter to finish the computation of one layer, of which the overwhelming majority is spent on the calculation of image likelihoods and feature extraction. The ratio of computation time for the three cues (silhouette : boundary : color) is approximately 1 : 5.6 : 31.3. This is as expected considering their respective computational complexities.

Fig. 15. Tracking results before (top) and after (bottom) adopting weighting scheme for color-cue calculation.

VI. CONCLUSION

In this paper, we propose a novel approach for human motion tracking, the DE-MC particle filter, based on the Differential Evolution algorithm, Markov Chain Monte Carlo theory and particle filtering. This method improves on the traditional particle filter based approach in that it bridges the gap between sampling and measurement. This is realized by incorporating a stochastic optimization method, the DE algorithm, and a statistical sampling method, the MCMC algorithm, into the particle filtering framework. In searching for the globally optimal pose configuration in a high-dimensional state vector space, the DE-MC particle filter achieves a very good balance between exploration and exploitation. In addition, we apply the proposed DE-MC particle filter to track monocular human motion video sequences. An articulated 3D human body model is built. Our robust image likelihood model contributes substantially to the satisfactory performance of the proposed algorithm. The multi-cue fusion scheme is shown to help shape a peaky observation density function. We evaluate the proposed DE-MC particle filter in a variety of experiments, which include comparison with other popular trackers. Experimental results show that a boosted tracking performance is achieved compared to traditional particle filter based algorithms.

REFERENCES

[1] M. Isard and A. Blake, "Condensation conditional density propagation for visual tracking," Int. J. Comput. Vis., vol. 29, no. 1, pp. 5–28, 1998.
[2] K. An and M. Chung, "3D head tracking and pose-robust 2D texture map-based face recognition using a simple ellipsoid model," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Sep. 2008, pp. 307–312.
[3] S. Roweis, L. Saul, and G. Hinton, "Global coordination of local linear models," in Advances in Neural Information Processing Systems, vol. 2. Cambridge, MA, USA: MIT Press, 2002, pp. 889–896.
[4] H. Chang and D. Yeung, "Robust locally linear embedding," Pattern Recognit., vol. 39, no. 6, pp. 1053–1065, 2006.
[5] R. Li, M. Yang, S. Sclaroff, and T. Tian, "Monocular tracking of 3D human motion with a coordinated mixture of factor analyzers," in Proc. 9th Eur. Conf. Comput. Vis., 2006, pp. 137–150.
[6] N. Lawrence, "Gaussian process latent variable models for visualisation of high dimensional data," in Advances in Neural Information Processing Systems, vol. 16. Cambridge, MA, USA: MIT Press, 2004, pp. 329–336.
[7] N. Lawrence and A. Moore, "Hierarchical Gaussian process latent variable models," in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 481–488.
[8] C. Ek, P. Torr, and N. Lawrence, "Gaussian process latent variable models for human pose estimation," in Proc. 4th Int. Conf. Mach. Learn. Multimodal Interact., 2007, pp. 132–143.
[9] R. Urtasun, D. Fleet, and P. Fua, "3D people tracking with Gaussian process dynamical models," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 1. Jun. 2006, pp. 238–245.
[10] K. Grochow, S. Martin, A. Hertzmann, and Z. Popović, "Style-based inverse kinematics," ACM Trans. Graph., vol. 23, no. 3, pp. 522–531, 2004.
[11] X. Xu and B. Li, "Adaptive Rao–Blackwellized particle filter and its evaluation for tracking in surveillance," IEEE Trans. Image Process., vol. 16, no. 3, pp. 838–849, Mar. 2007.
[12] S. Wachter and H. Nagel, "Tracking persons in monocular image sequences," Comput. Vis. Image Understand., vol. 74, no. 3, pp. 174–192, 1999.
[13] C. Stauffer and W. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 747–757, Aug. 2000.
[14] S. Zhou, R. Chellappa, and B. Moghaddam, "Visual tracking and recognition using appearance-adaptive models in particle filters," IEEE Trans. Image Process., vol. 13, no. 11, pp. 1491–1506, Nov. 2004.
[15] E. Kim, "Output feedback tracking control of robot manipulators with model uncertainty via adaptive fuzzy logic," IEEE Trans. Fuzzy Syst., vol. 12, no. 3, pp. 368–378, Jun. 2004.
[16] S. Zhou, R. Chellappa, and B. Moghaddam, "Appearance tracking using adaptive models in a particle filter," in Proc. 6th Asian Conf. Comput. Vis., 2004, pp. 1–9.
[17] F. Caillette, A. Galata, and T. Howard, "Real-time 3-D human body tracking using learnt models of behaviour," Comput. Vis. Image Understand., vol. 109, no. 2, pp. 112–125, 2008.
[18] S. Hou, A. Galata, F. Caillette, N. Thacker, and P. Bromiley, "Real-time body tracking using a Gaussian process latent variable model," in Proc. IEEE 11th Int. Conf. Comput. Vis., Oct. 2007, pp. 1–8.
[19] F. Caillette, A. Galata, and T. Howard, "Real-time 3-D human body tracking using variable length Markov models," in Proc. Brit. Mach. Vis. Conf., vol. 1. 2005, pp. 469–478.
[20] Y. Ouyang, Y. Ling, and J. Xing, "Based-APF human motion tracking from monocular videos," in Proc. IEEE Int. Conf. Image Anal. Signal Process., Apr. 2009, pp. 179–183.
[21] I. Patras and E. Hancock, "Template tracking with observation relevance determination," in Proc. IEEE Int. Conf. Image Process., vol. 1. Sep.–Oct. 2007, pp. 501–504.
[22] I. Patras and E. Hancock, "Regression-based template tracking in presence of occlusions," in Proc. 8th Int. Workshop Image Anal. Multimedia Interact. Services, Jun. 2007, pp. 15–19.
[23] P. Perez and B. Gidas, "Motion detection and tracking using deformable templates," in Proc. IEEE Int. Conf. Image Process., vol. 2. Nov. 1994, pp. 272–276.
[24] M. Isard and A. Blake, "Icondensation: Unifying low-level and high-level tracking in a stochastic framework," in Proc. 5th Eur. Conf. Comput. Vis., 1998, pp. 893–901.
[25] M. Fontmarty, F. Lerasle, and P. Danes, "Data fusion within a modified annealed particle filter dedicated to human motion capture," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct.–Nov. 2007, pp. 3391–3396.
[26] E. Poon and D. Fleet, "Hybrid Monte Carlo filtering: Edge-based people tracking," in Proc. Workshop Motion Video Comput., Dec. 2002, pp. 151–158.
[27] C. Sminchisescu and B. Triggs, "Kinematic jump processes for monocular 3D human tracking," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 1. Jun. 2003, pp. 69–76.
[28] C. Sminchisescu and B. Triggs, "Covariance scaled sampling for monocular 3D body tracking," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 1. Dec. 2001, pp. 447–452.
[29] R. Van Der Merwe, A. Doucet, N. De Freitas, and E. Wan, "The unscented particle filter," in Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2001, pp. 584–590.
[30] K. Okuma, A. Taleghani, N. Freitas, J. Little, and D. Lowe, "A boosted particle filter: Multitarget detection and tracking," in Proc. 8th Eur. Conf. Comput. Vis., 2004, pp. 28–39.
[31] X. Wang, S. Wang, and J. Ma, "An improved particle filter for target tracking in sensor systems," J. Sensors, vol. 7, no. 1, pp. 144–156, 2007.
[32] C. Chang and R. Ansari, "Kernel particle filter for visual tracking," IEEE Signal Process. Lett., vol. 12, no. 3, pp. 242–245, Mar. 2005.
[33] C. Chang, R. Ansari, and A. Khokhar, "Multiple object tracking with kernel particle filter," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 1. Jun. 2005, pp. 566–573.
[34] J. Schmidt, J. Fritsch, and B. Kwolek, "Kernel particle filter for real-time 3D body tracking in monocular color images," in Proc. 7th Int. Conf. Autom. Face Gesture Recognit., Apr. 2006, pp. 567–572.
[35] Y. Wu, G. Hua, and T. Yu, "Tracking articulated body by dynamic Markov network," in Proc. 9th IEEE Int. Conf. Comput. Vis., Oct. 2003, pp. 1094–1101.
[36] J. Deutscher, A. Blake, and I. Reid, "Articulated body motion capture by annealed particle filtering," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., vol. 2. Jun. 2000, pp. 126–133.
[37] G. Klein and D. Murray, "Full-3D edge tracking with a particle filter," in Proc. 17th Brit. Mach. Vis. Conf., 2006, pp. 256–260.
[38] C. Chang, R. Ansari, and A. Khokhar, "Cyclic articulated human motion tracking by sequential ancestral simulation," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 2. Jun.–Jul. 2004, pp. 45–52.
[39] M. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Trans. Signal Process., vol. 50, no. 2, pp. 174–188, Feb. 2002.
[40] J. MacCormick and M. Isard, "Partitioned sampling, articulated objects, and interface-quality hand tracking," in Proc. 6th Eur. Conf. Comput. Vis., 2000, pp. 3–19.
[41] C. Andrieu, N. De Freitas, A. Doucet, and M. Jordan, "An introduction to MCMC for machine learning," Mach. Learn., vol. 50, no. 1, pp. 5–43, 2003.
[42] R. Storn, "On the usage of differential evolution for function optimization," in Proc. Biennial Conf. North Amer. Fuzzy Inf. Process. Soc., Jun. 1996, pp. 519–523.
[43] C. Ter Braak, "Genetic algorithms and Markov chain Monte Carlo: Differential evolution Markov chain makes Bayesian computing easy," Biometris, Wageningen UR, Wageningen, The Netherlands, Tech. Rep. 010404, 2004.
[44] C. Sminchisescu and B. Triggs, "Hyperdynamics importance sampling," in Proc. 7th Eur. Conf. Comput. Vis., 2002, pp. 769–783.
[45] C. Sminchisescu and B. Triggs, "Building roadmaps of minima and transitions in visual models," Int. J. Comput. Vis., vol. 61, no. 1, pp. 81–101, 2005.
[46] P. Perez, J. Vermaak, and A. Blake, "Data fusion for visual tracking with particles," Proc. IEEE, vol. 92, no. 3, pp. 495–513, Mar. 2004.
[47] T. Roberts, S. McKenna, and I. Ricketts, "Online appearance learning for 3D articulated human tracking," in Proc. 16th IEEE Int. Conf. Pattern Recognit., vol. 1. Aug. 2002, pp. 425–428.
[48] L. Sigal, A. O. Balan, and M. J. Black, "HumanEva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion," Int. J. Comput. Vis., vol. 87, nos. 1–2, pp. 4–27, 2010.
[49] Y. Yang and D. Ramanan, "Articulated pose estimation with flexible mixtures-of-parts," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 1385–1392.
[50] P. Viola and M. Jones, "Robust real-time object detection," Int. J. Comput. Vis., vol. 57, no. 2, pp. 137–154, 2001.

Ming Du received the B.S. degree in electrical engineering from the Beijing Institute of Technology, Beijing, China, in 2002, and the M.S. degree from the Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada, in 2005. He is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, USA. His current research interests include computer vision and statistical pattern recognition, especially video-based face detection, and tracking and recognition. He is working on video analysis based on machine learning algorithms.

Xiaoming Nan received the M.S. degree in telecommunication engineering from the Beijing University of Posts & Telecommunications, Beijing, China, in 2010. He is currently pursuing the Ph.D. degree with the Ryerson Multimedia Research Laboratory, Ryerson University, Toronto, ON, Canada. His current research interests include multimedia cloud computing, content based video retrieval, immersive communication, and human body tracking. He received the Meritorious Prize in Mathematical Contest in Modeling in 2006.

Ling Guan (S'88–M'90–SM'96–F'08) received the Ph.D. degree in electrical engineering from the University of British Columbia, Vancouver, BC, Canada, in 1989. He is currently a Professor and a Tier I Canada Research Chair with the Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada. He held visiting positions with British Telecom, London, U.K., in 1994, Tokyo Institute of Technology, Tokyo, Japan, in 1999, Princeton University, Princeton, NJ, USA, in 2000, National ICT Australia, Sydney, Australia, in 2007, Hong Kong Polytechnic University, Hong Kong, from 2008 to 2009, and Microsoft Research Asia, Cambridge, U.K., in 2002 and 2009. He has published extensively in multimedia processing and communications, human-centered computing, machine learning, and adaptive image and signal processing. He is a fellow of the Engineering Institute of Canada and an Elected member of the Canadian Academy of Engineering. He is the IEEE Circuits and System Society Distinguished Lecturer from 2010 to 2011 and he is a recipient of the 2005 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS Best Paper Award.
