some way), our approach introduces few visual artifacts and little blur, which are common issues in many existing techniques.

We now present the details of the steps.

2.1. Background inpainting

Two types of backgrounds need to be repaired: that occluded by stationary foreground objects and that occluded by moving foreground objects. For the first type, many well-established still-image inpainting methods can be used to repair the background (e.g. [8-9]). The second type of occlusion can be resolved by using the background obtained from other frames where the occlusion does not occur.

2.2. Motion state estimation

We define feature points as the joints of the moving human's body. In our current experiments with video containing a walking human, we manually detect 7 feature points in the moving human to be repaired. For a man walking from the left side to the right side, these feature points correspond to the head, the elbow and the wrist of the right hand, both knees, and both ankles, as illustrated in Fig. 1. Then we compute a motion-state vector for each undamaged frame. The motion-state vector is defined as follows. We first set one of the feature points as the reference point (in our experiments we use the feature point corresponding to the head). Then we set up a polar coordinate system with the reference point as the origin. The other six feature points then correspond to 6 angles with respect to the origin, θ1, θ2, ..., θ6 (see Fig. 1). The motion-state vector is defined as:

V_i = (tg θ_i1, tg θ_i2, ..., tg θ_i6)    (1)

where the subscript i indicates the frame index.

2.3. Motion state clustering

The undamaged frames are then clustered according to the distance between their motion-state vectors. In the distance computation, w is a weight vector for controlling the contribution of the feature points, since according to our observation, the feature points in the lower limbs play a bigger role than the feature points in the upper limbs in determining a moving human's motion state.

Figure 2. Left: the 8th frame of one sequence. Right: the 25th frame of the same video. Since the moving human has similar motion/pose in both frames, these frames are regarded as having the same motion state.

Upon the completion of the clustering, we label the clusters sequentially according to the temporal order of the underlying frames. Then each undamaged frame can be associated with a state index. (Again, in the case of multiple humans, there will be multiple indices.)

2.4. Motion state prediction

In this step, we attempt to predict the motion state for those frames with the human missing. This step is inspired by human motion synthesis techniques, such as [10], where typically a Markov model is used to model and synthesize human motion. In our current study, since we handle only simple and periodic motion, e.g. walking, running, and jumping, we adopt a simple first-order motion state transition model, as illustrated in Fig. 3, in which each state can transit only to its next state and to itself.
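As a concrete illustration of Eq. (1) (a sketch, not the authors' code; the joint coordinates and their ordering below are invented for the example), the motion-state vector can be computed from the seven marked feature points as follows:

```python
import math

def motion_state_vector(feature_points):
    """Compute the motion-state vector of Eq. (1).

    feature_points: list of 7 (x, y) joint positions; the first is the
    reference point (the head), used as the origin of a polar coordinate
    system. Each remaining point contributes tg(theta) of its polar angle
    about that origin, giving a 6-component vector.
    """
    ox, oy = feature_points[0]                 # reference point (head) = origin
    vector = []
    for (x, y) in feature_points[1:]:
        theta = math.atan2(y - oy, x - ox)     # polar angle w.r.t. the origin
        vector.append(math.tan(theta))         # tg(theta_ik), k = 1..6
    return vector

# Example: head plus six limb joints (synthetic coordinates).
points = [(100, 40), (90, 70), (85, 95), (95, 130), (105, 130), (92, 160), (108, 160)]
V_i = motion_state_vector(points)
```

Using atan2 keeps each angle in the correct quadrant; the tangent itself is simply the slope of the ray from the head to the joint.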
Figure 1. The seven feature points and the angles θ1, ..., θ6, defined in a polar coordinate system whose origin is the reference point (the head).
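The clustering step of Sec. 2.3 can be sketched as follows. Since the distance formula (Eq. (2)) is not reproduced in this excerpt, the weighted Euclidean distance, the weight values, and the greedy threshold clustering below are all assumptions chosen to match the description (lower-limb points weighted more heavily; clusters labeled in the temporal order of first appearance):

```python
import math

# Weight vector w for the six components (elbow, wrist, knee, knee, ankle, ankle).
# Larger weights on the lower-limb entries reflect the observation that lower-limb
# feature points matter more for the motion state; the exact values are assumptions.
W = [0.5, 0.5, 1.0, 1.0, 1.5, 1.5]

def weighted_distance(v_a, v_b, w=W):
    """Weighted Euclidean distance between two motion-state vectors."""
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, v_a, v_b)))

def cluster_states(vectors, threshold=0.5):
    """Greedy clustering: assign each frame's vector to the first cluster whose
    representative lies within `threshold`, else open a new cluster. Because
    frames are scanned in temporal order, cluster labels come out sequentially."""
    reps, labels = [], []
    for v in vectors:
        for idx, rep in enumerate(reps):
            if weighted_distance(v, rep) <= threshold:
                labels.append(idx)
                break
        else:
            reps.append(v)
            labels.append(len(reps) - 1)
    return labels

labels = cluster_states([[0.1] * 6, [0.12] * 6, [2.0] * 6])
```

Here the first two (similar) vectors land in state 0 and the third opens state 1.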
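A minimal sketch of the first-order transition model of Fig. 3 and the prediction rule: assuming (this is not stated in the excerpt) that transition probabilities are estimated by counting transitions between the state labels of consecutive undamaged frames, prediction reduces to an arg-max over the current state and its cyclic successor:

```python
from collections import Counter

def transition_counts(state_sequence):
    """Count first-order transitions in the labeled undamaged frames.
    Under the model of Fig. 3, only self-transitions and moves to the next
    state should occur (cyclically, for periodic motion)."""
    return Counter(zip(state_sequence, state_sequence[1:]))

def predict_next(alpha, counts, n_states):
    """Eq. (3): choose the state beta maximizing P(beta | alpha).
    The candidates are alpha itself and its cyclic successor."""
    candidates = [alpha, (alpha + 1) % n_states]
    return max(candidates, key=lambda beta: counts[(alpha, beta)])

seq = [0, 0, 1, 1, 1, 2, 2, 0]      # state labels of the undamaged frames
counts = transition_counts(seq)
beta = predict_next(1, counts, n_states=3)
```

Since `counts` is a Counter, unobserved transitions default to zero, so the arg-max never needs explicit smoothing in this toy setting.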
When the model is extended into a general probabilistic graph model to repair humans with more complex motion, the prediction of a state β for a given frame can still be achieved using

β = arg max_β [P(β | α)]    (3)

which means that the most likely motion state β transiting from state α will be chosen.

2.5. Moving human inpainting

Assume that frame i is a damaged frame, and its predicted motion state is state β. In the video, we can detect the set of frames {F_i1, F_i2, ..., F_im} all having the state label β. In this step we choose a frame f ∈ {F_i1, F_i2, ..., F_im} and paste the human contained in f into frame i. Our criterion for making this choice is that we choose the frame f whose neighboring frames are most similar to the corresponding neighbors of the damaged frame, in terms of the moving human. Furthermore, if there is a series of damaged frames, e.g. frames i, i+1, ..., i+m, and we choose frame f for frame i+t, then we choose frame f+1 for frame i+t+1. These strategies are intended to maximally maintain the motion continuity.

Once the best source frame f has been determined, the moving object segmentation technique of [11] is used to obtain the moving human in f. An example is illustrated in Fig. 4.

Figure 4. Moving human segmentation.

The rest of the work is to paste the moving human into the damaged frames. Simply pasting the moving human from an undamaged frame into a damaged frame may lead to obvious artificial effects. We largely avoid such problems using a graph cut based image inpainting technique. Graph cut based image inpainting finds the least visible seam between the target and the source fragments in the overlapped region, and the pasting path runs along this least visible seam. More details can be found in, e.g., [4]. Another noticeable artificial effect may be caused by the difference in brightness between frames, since the exposure of the camera may change if the scene or background changes. To avoid this drawback, we bring the image editing technique of [12] into our algorithm. Using this technique, we can merge an object from one image into a new image and eliminate the artificial effects caused by the difference in brightness or contrast between the images.

3. EXPERIMENTAL RESULTS

We apply our approach to different video sequences with a moving human. In these video sequences, we manually crop out a block covering the human, and then attempt to repair the frames using our approach. Figures 5-7 demonstrate three samples from our experiments, with the following table summarizing the characteristics of the sequences (in the table, the "source frame" column indicates which frames are used to repair the damaged frames, and the number of states is the state number n obtained in the clustering stage).

Video    # of frames    Resolution    # of states    Damaged frames    Source frames
Fig. 5   60             320*240       14             13th-14th         13th, 15th
Fig. 6   80             300*100       22             42nd-44th         29th, 30th
Fig. 7   70             256*192       14             26th-29th         12th-15th

For easy appreciation of the results, the corresponding video is posted at http://home.ustc.edu.cn/~whaomian/

4. CONCLUSION AND FUTURE WORK

In this paper, we proposed a novel approach to repairing a largely occluded moving human in video. Our approach is based on the estimation and prediction of the motion state of the frames. Experiments show that our method can avoid common artifacts of typical existing video inpainting methods.

As a future direction, we will extend our work to repair moving humans with more complex motion, potentially with a probabilistic graphical model with sophisticated state transitions. Another important future direction is to develop an automatic method for detecting the feature points.

5. ACKNOWLEDGEMENT

This work is supported by NSFC General Program under contract No. 60672161, 863 Program under contract No. 2006AA01Z317, and NSFC Key Program under contract No. 60632040.

6. REFERENCES

[1] Y. Wexler, E. Shechtman, and M. Irani, "Space-time video completion," Proc. IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 120-127, 2004.
[2] K. A. Patwardhan, G. Sapiro, and M. Bertalmio, "Video inpainting of occluding and occluded objects," Proc. IEEE Intl. Conf. Image Processing (ICIP), Genoa, Italy, 2005.
[3] K. Patwardhan, G. Sapiro, and M. Bertalmio, "Video inpainting under camera motion," IEEE Trans. Image Processing, to appear.
[4] Y.T. Jia, S.M. Hu, and R. Martin, "Video completion using tracking and fragment merging," Visual Comput (2005) 21:601-610.
[5] Y. Zhang, J. Xiao, and M. Shah, "Motion layer based object removal in videos," Proc. IEEE Workshop on Applications of Computer Vision, pp. 516-521, 2005.
[6] J. Jia, T. Wu, Y. Tai, and C. Tang, "Video Repairing: Inference of Foreground and Background Under Severe
Occlusion," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 1, pp. 364-371, 2004.
[7] T. Shiratori, Y. Matsushita, S. B. Kang, and X. Tang, "Video Completion by Motion Field Transfer," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.
[8] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester, "Image inpainting," SIGGRAPH 2000, pp. 417-424, 2000.
[9] M. Bertalmio, L. Vese, G. Sapiro, and S. Osher, "Simultaneous texture and structure image inpainting," IEEE Trans. on Image Processing, vol. 12, no. 8, pp. 882-889, 2003.
[10] Y. Li, T. Wang, and H. Shum, "Motion texture: a two-level statistical model for character motion synthesis," Proc. Conference on Computer Graphics and Interactive Techniques, July 23-26, 2002, San Antonio, Texas.
[11] S.Y. Chien, S.Y. Ma, and L.G. Chen, "Efficient moving object segmentation algorithm using background registration technique," IEEE Trans. Circuits and Systems for Video Technology, vol. 12, pp. 577-586, July 2002.
[12] P. Perez, M. Gangnet, and A. Blake, "Poisson image editing," Proceedings of ACM SIGGRAPH, pp. 313-318, 2003.
Figure 5 - Figure 7: In each figure, the top row is the original video, the middle row the video with damaged
frames, and the bottom row the video with repaired frames.
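The source-frame selection criterion of Sec. 2.5 can be sketched as follows; the per-frame descriptors and the similarity function below are hypothetical stand-ins for the paper's comparison of the moving human in neighboring frames:

```python
def choose_source_frame(i, beta, state_labels, similarity):
    """Sec. 2.5 sketch: among undamaged frames with state label beta, pick the
    frame f whose neighbors (f-1, f+1) are most similar to the damaged frame's
    neighbors (i-1, i+1). `similarity(a, b)` compares the moving human in
    frames a and b and is left abstract here (assumption: higher is better)."""
    candidates = [f for f, s in enumerate(state_labels)
                  if s == beta and f != i]
    def score(f):
        return similarity(f - 1, i - 1) + similarity(f + 1, i + 1)
    return max(candidates, key=score)

# Toy usage: a period-4 motion encoded by hypothetical per-frame descriptors.
desc = {f: float(f % 4) for f in range(9)}
sim = lambda a, b: -abs(desc[a] - desc[b])     # negated distance as similarity
f = choose_source_frame(3, 1, [0, 1, 0, 1, 0, 1, 0, 1], sim)
```

In this toy run the candidates sharing state 1 are frames 1, 5, and 7, and frame 7 wins because its neighbors match the damaged frame's neighbors in motion phase.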