
VIDEO INPAINTING FOR LARGELY OCCLUDED MOVING HUMAN

Haomian Wang1, Houqiang Li1, Baoxin Li2


1University of Science and Technology of China, Hefei, 230027, P.R. China
2Computer Science & Engineering, Arizona State University, Tempe, Arizona, U.S.A.
whaomian@mail.ustc.edu.cn, lihq@ustc.edu.cn, Baoxin.Li@asu.edu
ABSTRACT

In this paper, a video inpainting approach is proposed which aims at repairing a video containing moving humans that are largely or completely occluded or missing in some of the frames. The proposed approach first categorizes typically periodic human motion in a video into a set of temporal states (called motion states), and then estimates the motion states of the frames with missing humans so as to repair the missing parts using other undamaged frames with the same motion states. This deviates from common approaches that directly repair the pixels of the damaged parts. Experiments demonstrate that the proposed method can repair damaged video sequences well without introducing the strong artifacts that exist in many existing techniques.

1. INTRODUCTION

Image/video inpainting serves to fill in the missing parts or to replace the undesired parts of an image or a video sequence. The technique has many applications, including restoration of corrupted images, surveillance video processing, and video editing during post-production in the movie industry. In the case of video, one important task in inpainting is to repair a moving object/human.

Current video inpainting methods (e.g., [1-7]) for repairing moving objects in video range from those based on extensions of 2D image inpainting techniques to those that directly employ spatiotemporal analysis. While most of these methods work reasonably well for repairing not-so-large moving objects, they often suffer from noticeable artifacts (e.g., loss of motion consistency) and/or over-smoothing in the face of largely occluded moving objects. Methods relying on a local motion field (e.g., [7]) may also be very sensitive to noise.

In this paper, we propose a novel method for the specific task of repairing a largely damaged moving human in video sequences. Specifically, our experiments deal with video with damaged frames in which a moving human is completely missing. The potential application scenarios for our method include, for example, virtually revealing a walking human that is occluded by some other objects for a short period in a video, or repairing video in which regions containing humans were contaminated during transmission. Instead of directly repairing the pixels of the missing part, as done in most existing methods, our approach first categorizes typically periodic human motion in a video into a set of temporal states (called motion states), and then estimates the motion state of the missing human for a given frame. The estimate is then used to select a set of candidate frames from other parts of the video. Finally, an optimal frame is chosen from the candidates, using the best motion continuity as the criterion. The moving human in the chosen frame is then used to repair the given damaged frame.

In Section 2 we present the details of the proposed approach, followed by sample experimental results in Section 3 and a brief discussion of future work in Section 4.

2. PROPOSED VIDEO INPAINTING ALGORITHM

This section presents the proposed method. We first give an overview of the algorithm in terms of its computational steps, and then discuss the steps in detail. We make the following assumptions about the video to be processed: the camera is fixed; the scene is composed of a stationary background with a moving human (multiple humans can be processed similarly in principle); and the moving human to be repaired has periodic motion. These assumptions are approximately true for video from many applications. In practice, some of these assumptions can be relaxed by including additional processing modules (e.g., using a sensor motion compensation module to remove the first assumption).

Our approach consists of the following major steps:

Step 1 Background inpainting: This is needed for repairing damaged background regions. We employ established existing approaches for this task.

Step 2 Motion state estimation: The feature points of the moving human in each undamaged frame are detected (manually in the current experiments), and a motion state vector is computed from the feature points.

Step 3 Motion state classification: The motion state vectors from the undamaged frames are clustered and then labeled with the cluster indices.

Step 4 Motion state prediction: A motion-state transition model is constructed, which is used to predict the motion states of the damaged frames.

Step 5 Moving human inpainting: The moving human of each damaged frame is copied from other undamaged frames according to the predicted motion states. In this step, graph-cut-based image inpainting [4] is used (a small illustration of the blending involved follows this list).
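Step 5 above hides the paste boundary with a graph-cut seam [4] and gradient-domain image editing, as detailed later in Section 2.5. As a small, self-contained illustration of the blending part only, the sketch below uses OpenCV's Poisson-based seamless cloning on synthetic stand-in images; this is in the spirit of the image editing technique cited by the authors, not their actual implementation, and all image data, sizes, and coordinates are made up for the example.

```python
# Illustration of gradient-domain blending for the paste step (Step 5 / Section 2.5).
# Our own sketch using OpenCV's built-in Poisson seamless cloning, not the authors'
# code; the images below are synthetic stand-ins for real video frames.
import numpy as np
import cv2

# A fake "damaged frame with repaired background" (dst) and a fake "source frame
# containing the human" (src); in the real system these come from the video.
dst = np.full((240, 320, 3), 120, dtype=np.uint8)            # plain background
src = np.full((240, 320, 3), 128, dtype=np.uint8)
cv2.rectangle(src, (140, 60), (180, 200), (60, 40, 30), -1)  # stand-in "human"

# Binary mask marking the region of src to transfer (here, a box around the "human").
mask = np.zeros(src.shape[:2], dtype=np.uint8)
cv2.rectangle(mask, (130, 50), (190, 210), 255, -1)

# Poisson blending: the masked region of src is cloned into dst around `center`,
# with its colors adapted so brightness/contrast differences do not show a seam.
center = (160, 130)  # (x, y) position in dst where the mask centre is placed
repaired = cv2.seamlessClone(src, dst, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite("repaired_frame.png", repaired)
```

NORMAL_CLONE keeps the source content inside the mask and only adapts its colors to the destination boundary, which matches the goal of removing brightness or contrast mismatches between frames described in Section 2.5.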



The primary contribution of our approach lies in the use of motion state prediction in repairing a missing human. An accurate prediction of the motion of the missing human ensures optimal repair of the video in terms of maintaining good motion continuity, which is a significant factor for good visual quality of the repaired video. Since we repair the damaged frame with nearby frames of the same motion state in the same video (as opposed to creating the pixels in some way), our approach introduces little visual artifact and blur, which are common issues in many existing techniques.

We now present the details of the steps.

2.1. Background inpainting

Two types of background need to be repaired: background occluded by stationary foreground objects and background occluded by moving foreground objects. For the first type, many well-established still-image inpainting methods can be used to repair the background (e.g., [8-9]). The second type of occlusion can be resolved by using the background obtained from other frames where the occlusion does not occur.

2.2. Motion state estimation

We define feature points as the joints of the moving human's body. In our current experiments with video containing a walking human, we manually detect 7 feature points on the moving human to be repaired. For a man walking from the left side to the right side, these feature points correspond to the head, the elbow, the wrist of the right hand, both knees, and both ankles, as illustrated in Fig. 1. Then we compute a motion-state vector for each undamaged frame. The motion-state vector is defined as follows. We first set one of the feature points as the reference point (in our experiments we use the feature point corresponding to the head). Then we set up a polar coordinate system with the reference point as the origin. The other six feature points then correspond to six angles with respect to the origin, θ1, θ2, ..., θ6 (see Fig. 1). The motion-state vector is defined as:

V_i = (\tan\theta_{i1}, \tan\theta_{i2}, \ldots, \tan\theta_{i6})    (1)

where the subscript i indicates the frame index.

Figure 1. Illustration of the state vector: the origin is placed at the head, and the six angles θ1-θ6 are measured to the other joints.

2.3. Motion state classification

After the above step, each undamaged frame has its own motion state vector, and the vector represents the motion state of the moving human in this frame. (In the case of multiple humans, we may keep multiple vectors for each frame.) Fig. 2 shows an example of two frames having the same motion state. We then classify the motion state vectors using K-means clustering. In the algorithm, we define the distance between two motion-state vectors i and j as

D(i, j) = \sum_{t=1}^{6} w_t \left| V_{it} - V_{jt} \right|    (2)

where w_t is a weight vector controlling the contribution of each feature point to the distance, since, according to our observation, the feature points on the lower limbs play a bigger role than the feature points on the upper limbs in determining a moving human's motion state.

Figure 2. Left: the 8th frame of one sequence. Right: the 25th frame of the same video. Since the moving human has similar motion/pose in both frames, these frames are regarded as having the same motion state.

Upon completion of the clustering, we label the clusters sequentially according to the temporal order of the underlying frames. Then each undamaged frame can be associated with a state index. (Again, in the case of multiple humans, there will be multiple indices.)

2.4. Motion state prediction

In this step, we attempt to predict the motion state for those frames with the human missing. This step is inspired by human motion synthesis techniques, such as [10], where typically a Markov model is used to model and synthesize human motion. In our current study, since we handle only simple and periodic motion, e.g. walking, running, and jumping, we adopt a simple first-order motion-state transition model, as illustrated in Fig. 3, where a state can transit only to its next state or to itself.

Figure 3. A simple first-order transition model for the motion states of a video. State n goes back to state 1 due to the periodicity of the motion (such as in walking).

With this motion state transition model, we can predict the motion states of the damaged frames. Assume that frame i is a damaged frame and its neighbor frame i-1 (or i+1) is a good frame. Further assume that the motion state of frame i-1 is state α. Then the motion state of frame i should be state α or α+1 (or α-1 when reasoning from frame i+1).
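To make Sections 2.2-2.3 above concrete, the following is a minimal Python/NumPy sketch (our own illustration, not the authors' code) of the state vector of Eq. (1), the weighted distance of Eq. (2), and the clustering step. The joint ordering, the weight values, and the use of scikit-learn's K-means on weight-scaled features (as an approximation of the weighted comparison in Eq. (2)) are all assumptions made for the example.

```python
# Minimal sketch of Sections 2.2-2.3: motion-state vector (Eq. 1), weighted
# distance (Eq. 2), and K-means clustering of the per-frame vectors.
import numpy as np
from sklearn.cluster import KMeans

# Weights w_t of Eq. (2); larger values for the lower-limb joints (knees, ankles),
# reflecting the paper's observation. The exact values here are our assumption.
WEIGHTS = np.array([0.5, 0.5, 1.0, 1.0, 1.5, 1.5])

def motion_state_vector(joints):
    """joints: array of shape (7, 2) with (x, y) image coordinates ordered as
    [head, elbow, wrist, left knee, right knee, left ankle, right ankle].
    Returns the 6-D vector of Eq. (1): tan of each joint's polar angle about the head."""
    head = joints[0]
    rel = joints[1:] - head                   # six joints relative to the origin (head)
    theta = np.arctan2(rel[:, 1], rel[:, 0])  # polar angle of each joint
    return np.tan(theta)

def state_distance(vi, vj, weights=WEIGHTS):
    """Weighted L1 distance between two motion-state vectors, as in Eq. (2)."""
    return float(np.sum(weights * np.abs(vi - vj)))

def cluster_motion_states(vectors, n_states, weights=WEIGHTS):
    """vectors: dict {frame index: 6-D motion-state vector} for undamaged frames.
    Returns dict {frame index: state label}, with cluster labels renumbered so
    that they follow the temporal order of the frames (Section 2.3)."""
    idx = sorted(vectors)
    X = np.array([vectors[i] * weights for i in idx])  # weight-scaled features
    raw = KMeans(n_clusters=n_states, n_init=10, random_state=0).fit_predict(X)
    order, relabel = [], {}
    for r in raw:                      # relabel clusters by first appearance in time
        if r not in relabel:
            relabel[r] = len(order)
            order.append(r)
    return {i: relabel[r] for i, r in zip(idx, raw)}
```

In practice the weights would be tuned so that the knee and ankle angles dominate the distance, as the observation quoted above suggests.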

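Given the state labels, the prediction rule of Section 2.4 reduces to a few lines. The sketch below (again our own illustration, reusing the labels produced by the previous sketch) returns the candidate states for a damaged frame from its nearest labelled neighbour, using the first-order transition model of Fig. 3 with wrap-around for periodicity.

```python
# Sketch of the motion-state prediction of Section 2.4: a damaged frame either
# keeps its good neighbour's state or advances to the next state (state n wraps
# back to state 1 because the motion is periodic).

def predict_state(frame_idx, labels, n_states):
    """labels: dict {undamaged frame index: state label in 0..n_states-1}.
    Returns the candidate states for the damaged frame `frame_idx`, inferred
    from its nearest labelled neighbour on either side."""
    prev_good = max((i for i in labels if i < frame_idx), default=None)
    next_good = min((i for i in labels if i > frame_idx), default=None)
    if prev_good is not None:
        a = labels[prev_good]
        return [a, (a + 1) % n_states]      # state a itself, or the next state
    # No earlier good frame: reason backwards from the later neighbour instead.
    b = labels[next_good]
    return [(b - 1) % n_states, b]

# Example: if frame 13 is damaged, frame 12 is labelled state 4, and n = 14,
# then predict_state(13, {12: 4, 15: 6}, 14) returns [4, 5].
```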
It is worth mentioning that, if the model is extended into a general probabilistic graphical model to repair humans with more complex motion, the prediction of a state β for a given frame can be achieved using

\beta = \arg\max_{\beta} P(\beta \mid \alpha)    (3)

which means that the most likely motion state β transiting from state α will be chosen.

2.5. Moving human inpainting

Assume that frame i is a damaged frame, and its predicted motion state is state β. In the video, we can detect the set of frames {F_i1, F_i2, ..., F_im} all having the state label β. In this step we choose a frame f ∈ {F_i1, F_i2, ..., F_im} and paste the human contained in f into frame i. Our criterion in making the choice is that we choose the frame f whose neighboring frames are most similar to the corresponding neighbors of the damaged frame, in terms of the moving human. Furthermore, if there is a series of damaged frames, e.g. frames i, i+1, ..., i+m, and for frame i+t we choose frame f, then for frame i+t+1 we choose frame f+1. These strategies are intended to maximally maintain the motion continuity.

Once the best source frame f has been determined, the moving object segmentation technique of [11] is used to obtain the moving human in f. An example is illustrated in Fig. 4.

Figure 4. Moving human segmentation.

The rest of the work is to paste the moving human into the damaged frames. Simply pasting the moving human from an undamaged frame into a damaged frame may lead to obvious artificial effects. We largely avoid such problems by using a graph-cut-based image inpainting technique. Graph-cut-based image inpainting finds the least visible seam between the target and the source fragments in the overlapped region, and the pasting path follows that least visible seam. More details can be found in, e.g., [4].

Another noticeable artificial effect may be caused by the difference in brightness between frames, since the exposure of the camera may change if the scene or background changes. To avoid this drawback, we bring the image editing technique of [12] into our algorithm. Using this image editing technique, we can merge one object from an image into a new image and eliminate the artificial effects caused by the difference in brightness or contrast between different images.

3. EXPERIMENTAL RESULTS

We apply our approach to different video sequences with a moving human. In these video sequences, we manually crop out a block covering the human, and then attempt to repair the frames using our approach. Figures 5-7 demonstrate three samples from our experiments, with the following table summarizing the characteristics of the sequences (in the table, the "source frames" column indicates which frames are used to repair the damaged frames, and the number of states is the state number n obtained in the clustering stage).

Video    # of frames   Resolution   # of states   Damaged frames   Source frames
Fig. 5   60            320x240      14            13th, 14th       42nd-44th
Fig. 6   80            300x100      22            13th-15th        29th, 30th
Fig. 7   70            256x192      14            26th-29th        12th-15th

For easy appreciation of the results, the corresponding video is posted at http://home.ustc.edu.cn/~whaomian/

4. CONCLUSION AND FUTURE WORK

In this paper, we proposed a novel approach to repairing a largely occluded moving human in video. Our approach is based on the estimation and prediction of the motion states of the frames. Experiments show that our method can avoid common artifacts in typical existing video inpainting methods.

As a future direction, we will extend our work to repair moving humans with more complex motion, potentially with a probabilistic graphical model with sophisticated state transitions. Another important future direction is to develop an automatic method for detecting the feature points.

5. ACKNOWLEDGEMENT

This work is supported by NSFC General Program under contract No. 60672161, 863 Program under contract No. 2006AA01Z317, and NSFC Key Program under contract No. 60632040.

6. REFERENCES

[1] Y. Wexler, E. Shechtman, and M. Irani, "Space-time video completion," Proc. IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 120-127, 2004.
[2] K. A. Patwardhan, G. Sapiro, and M. Bertalmio, "Video inpainting of occluding and occluded objects," Proc. IEEE Intl. Conf. on Image Processing (ICIP), Genoa, Italy, 2005.
[3] K. A. Patwardhan, G. Sapiro, and M. Bertalmio, "Video inpainting under camera motion," IEEE Trans. on Image Processing, to appear.
[4] Y. T. Jia, S. M. Hu, and R. Martin, "Video completion using tracking and fragment merging," The Visual Computer, vol. 21, pp. 601-610, 2005.
[5] Y. Zhang, J. Xiao, and M. Shah, "Motion layer based object removal in videos," Proc. IEEE Workshop on Applications of Computer Vision, pp. 516-521, 2005.
[6] J. Jia, T. Wu, Y. Tai, and C. Tang, "Video repairing: inference of foreground and background under severe occlusion," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 1, pp. 364-371, 2004.
[7] T. Shiratori, Y. Matsushita, S. B. Kang, and X. Tang, "Video completion by motion field transfer," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2006.
[8] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester, "Image inpainting," Proc. ACM SIGGRAPH, pp. 417-424, 2000.
[9] M. Bertalmio, L. Vese, G. Sapiro, and S. Osher, "Simultaneous texture and structure image inpainting," IEEE Trans. on Image Processing, vol. 12, no. 8, pp. 882-889, 2002.
[10] Y. Li, T. Wang, and H.-Y. Shum, "Motion texture: a two-level statistical model for character motion synthesis," Proc. ACM SIGGRAPH (Computer Graphics and Interactive Techniques), San Antonio, Texas, July 23-26, 2002.
[11] S. Y. Chien, S. Y. Ma, and L. G. Chen, "Efficient moving object segmentation algorithm using background registration technique," IEEE Trans. on Circuits and Systems for Video Technology, vol. 12, pp. 577-586, July 2002.
[12] P. Perez, M. Gangnet, and A. Blake, "Poisson image editing," Proc. ACM SIGGRAPH, pp. 313-318, 2003.

Figure 5 - Figure 7: In each figure, the top row is the original video, the middle row the video with damaged frames, and the bottom row the video with repaired frames.

