An Iterative Optimization Approach for Unified Image Segmentation and Matting
Jue Wang
Dept. of Electrical Engineering
University of Washington
Seattle, WA 98195, USA
juew@ee.washington.edu

Michael F. Cohen
Microsoft Research
Redmond, WA 98052, USA
mcohen@microsoft.com
Abstract
Separating a foreground object from the background in a static image involves determining both full and partial pixel coverages, also known as extracting a matte. Previous approaches require the input image to be pre-segmented into three regions: foreground, background and unknown, which is called a trimap. Partial opacity values are then computed only for pixels inside the unknown region. This pre-segmentation-based approach fails for images with large portions of semi-transparent foreground, where the trimap is difficult to create even manually. In this paper we combine the segmentation and matting problems and propose a unified optimization approach based on Belief Propagation. We iteratively estimate the opacity value for every pixel in the image, based on a small sample of foreground and background pixels marked by the user. Experimental results show that, compared with previous approaches, our method is more efficient at extracting high quality mattes for foregrounds with significant semi-transparent regions.
1. Introduction
Image matting refers to the problem of estimating an opacity (alpha value) and foreground and background colors for each pixel in the image. Specifically, the observed image I(z) (z = (x, y)) is modeled as a linear combination of a foreground image F(z) and a background image B(z) by an alpha map:

I(z) = \alpha_z F(z) + (1 - \alpha_z) B(z)    (1)
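For illustration, a minimal NumPy sketch of Equation 1 (not part of the original paper; the function and array names are ours) composites a foreground over a background given an alpha map:

    import numpy as np

    def composite(alpha, F, B):
        """Equation 1 applied per pixel: I = alpha*F + (1 - alpha)*B.
        alpha: HxW array in [0, 1]; F, B: HxWx3 float images."""
        a = alpha[..., None]               # broadcast alpha across the color channels
        return a * F + (1.0 - a) * B

For example, a hair-strand pixel with alpha = 0.3 takes 30% of its color from the foreground and 70% from the background.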
Figure 1. (a). Original image with user-specified foreground and background strokes. (b). Applying graph-cut based segmentation followed by erosion-dilation operations to automatically create a trimap. The unknown region (shown in yellow) is insufficient to cover all hair strands. (c). Directly applying Bayesian matting by treating all unmarked pixels as unknown results in an erroneous matte. (d). Manually specified trimap. (e). Bayesian matting result on trimap (d). (f)-(j). Our algorithm iteratively estimates a matte based on the user-specified strokes in (a). From (f) to (j): initial matte, matte after 3 iterations, matte after 6 iterations, matte after 9 iterations, final matte after 15 iterations.
2.2.1 MRF Construction
We model each iteration of the matte estimation as an MRF optimization problem and employ a belief propagation (BP) based approach to solve each step of the iteration. BP has proven to be an efficient method for solving a number of low-level vision tasks [6, 11].
In detail, each pixel in U_c is treated as a node in the MRF. In addition, pixels adjacent to U_c that are not themselves in U_c are also included in the MRF. All the nodes are connected to their spatial neighbors with 4-connectivity. Since U_c changes at each iteration, the graph structure is dynamically updated. The optimization goal in each iteration is to minimize the total energy defined as:
V = \sum_{p} V_d(\alpha_p) + \sum_{(p,q)} V_s(\alpha_p, \alpha_q)    (2)
where V_d is the data energy describing how well the estimated alpha value α_p, and foreground and background colors for p, fit with the actual color C_p, and V_s is the smoothness energy which penalizes inconsistent alpha value changes between two neighbors p and q. V_d and V_s will be described in detail in the next section.
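Before detailing V_d and V_s, the structure of Equation 2 can be made concrete with a short sketch (ours, not the paper's; the two cost functions are placeholders) that sums data and smoothness terms over a 4-connected grid of nodes:

    def total_energy(alpha, nodes, data_cost, smoothness_cost):
        """Evaluate Equation 2 on a 4-connected grid.
        alpha: dict mapping node (x, y) -> alpha value
        nodes: set of (x, y) pixels currently in the MRF
        data_cost(p, a): placeholder for V_d; smoothness_cost(a_p, a_q): placeholder for V_s."""
        V = 0.0
        for p in nodes:
            V += data_cost(p, alpha[p])                              # data term V_d
        for (x, y) in nodes:
            for q in ((x + 1, y), (x, y + 1)):                       # count each neighbor pair once
                if q in nodes:
                    V += smoothness_cost(alpha[(x, y)], alpha[q])    # smoothness term V_s
        return V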
2.2.2
w_i = \exp\left(- s(p, p_i)^2 / (2\sigma_w^2)\right)    (3)
We first fit a Gaussian mixture model (GMM) to the user-marked foreground (background) pixels, then assign each marked pixel to a single Gaussian in the GMM. We then randomly select pixels from each Gaussian to assemble the sample set. Note that this initial global sampling is not accurate, since it samples a mixture of all possible foreground or background colors, but as the algorithm progresses, foreground and background regions will be propagated close to p and the local model will take over.
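A minimal sketch of this global sampling step, assuming scikit-learn's GaussianMixture and a fixed number of components (both are our choices; the paper does not prescribe them here):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def global_samples(marked_colors, n_components=5, per_component=4, seed=0):
        """Fit a GMM to user-marked foreground (or background) RGB colors, assign each
        marked pixel to one Gaussian, and draw a few sample pixels per Gaussian."""
        rng = np.random.default_rng(seed)
        gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(marked_colors)
        labels = gmm.predict(marked_colors)
        samples = []
        for k in range(n_components):
            idx = np.flatnonzero(labels == k)
            if idx.size > 0:
                pick = rng.choice(idx, size=min(per_component, idx.size), replace=False)
                samples.append(marked_colors[pick])      # random pixels from this Gaussian
        return np.vstack(samples)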
Given the foreground and background samples and corresponding weights, we can build a probability histogram of alpha levels. Specifically, we compute the likelihood of each alpha level α_k as

L_k(p) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} w_i^F w_j^B \exp\left(- d_c\left(C_p,\ \alpha_k F_i^p + (1-\alpha_k) B_j^p\right)^2 / (2\sigma_k^2)\right)    (4)
where F_i^p and B_j^p are foreground and background samples and C_p is the actual color of p. We use the Euclidean distance in RGB space as the distance between colors, denoted d_c(·, ·). To compute the covariance σ_k, we first compute the distance covariance among the foreground and background samples, denoted σ_F and σ_B, and then calculate σ_k as α_k σ_F + (1 - α_k) σ_B.
The data term in Equation 2 can be computed for each of the possible states, α_k, for α_p. The data cost is defined as

V_d(\alpha_p^k) = 1 - \frac{L_k(p)}{\sum_{k'=1}^{K} L_{k'}(p)}    (5)
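Equations 4 and 5 map directly to code. The following sketch (ours; the number of discrete alpha levels K and all names are assumptions) computes the likelihood of each alpha level for one pixel and turns it into a data cost:

    import numpy as np

    def data_costs(C_p, F, B, wF, wB, sigma_F, sigma_B, K=25):
        """Return V_d over K discrete alpha levels at pixel p (Equations 4 and 5).
        C_p: (3,) pixel color; F, B: (N, 3) sample colors; wF, wB: (N,) sample weights."""
        alphas = np.linspace(0.0, 1.0, K)
        L = np.zeros(K)
        for k, a in enumerate(alphas):
            sigma_k = a * sigma_F + (1.0 - a) * sigma_B
            pred = a * F[:, None, :] + (1.0 - a) * B[None, :, :]    # composites for all sample pairs
            d2 = np.sum((pred - C_p) ** 2, axis=-1)                 # squared RGB distances
            L[k] = np.mean(wF[:, None] * wB[None, :] * np.exp(-d2 / (2.0 * sigma_k ** 2)))
        return 1.0 - L / L.sum(), alphas                            # V_d per alpha level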
Figure 2. (a). One frame of a video sequence with user-specified foreground and background strokes. (b). Extracted matte by our approach. (c). Automatically propagating foreground and background marks onto the next frame. (d). Extracted matte on the next frame. (e)-(h). Automatically generated mattes for the 5th, 10th, 15th and 20th frames.
are applied when α_p = 0 for background pixels. Otherwise we choose the pair of foreground and background colors from the group of foreground and background samples which minimizes the fitting error:

(F(p), B(p)) = \arg\min_{F_i^p, B_j^p} \left\| C_p - \alpha_p F_i^p - (1-\alpha_p) B_j^p \right\|^2    (9)
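A brute-force sketch of the pair selection in Equation 9 (our code, not the paper's; it simply searches all sample pairs):

    import numpy as np

    def best_pair(C_p, alpha_p, F, B):
        """Select the (foreground, background) sample pair minimizing
        ||C_p - alpha_p*F_i - (1 - alpha_p)*B_j||^2 (Equation 9)."""
        pred = alpha_p * F[:, None, :] + (1.0 - alpha_p) * B[None, :, :]   # (N, N, 3) composites
        err = np.sum((pred - C_p) ** 2, axis=-1)                           # (N, N) fit errors
        i, j = np.unravel_index(np.argmin(err), err.shape)
        return F[i], B[j]                                                  # real sample colors, not a blend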
Note that we use real foreground and background colors as the estimated ones; in this way we avoid the color bleeding artifact that the Bayesian matting approach may exhibit, as addressed in [2], in which case the estimated foreground color is a mixture of real foreground and background colors. The uncertainty value u(p) is updated as
u(p) = 1 - \frac{1}{|H(p)|} \sum_{q \in H(p)} \sqrt{w_i^F w_i^B}    (10)
where w_i^F and w_i^B are weights for the selected pair of foreground and background samples. Since the sample weights will generally increase as the U_c region grows, the uncertainty values will decrease during the iterative process. The algorithm halts when the total uncertainty of the whole matte cannot be reduced any further.
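The outer loop can then be summarized by the following sketch (ours; solve_mrf, update_samples and total_uncertainty are placeholders for the steps described above), which only illustrates the halting rule:

    def iterate_matte(state, solve_mrf, update_samples, total_uncertainty, max_iters=30, tol=1e-3):
        """Iterate matte estimation until the total uncertainty stops decreasing."""
        prev, alpha = float("inf"), None
        for _ in range(max_iters):
            alpha = solve_mrf(state)                 # BP solve of the current MRF (Equation 2)
            state = update_samples(state, alpha)     # refresh samples, weights and U_c
            total = total_uncertainty(state)         # total uncertainty of the matte
            if prev - total < tol:                   # no further reduction: halt
                break
            prev = total
        return alpha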
Extension to Video
Figure 3. (a). Original images with user-specified foreground and background strokes. (b). Extracted alpha mattes by our approach. (c). For Bayesian matting, complex trimaps must be manually created by the user. (d). Bayesian matting results based on the trimaps in (c).
foreground and background colors; thus it shares a similar weakness with the Bayesian matting method for images where foreground and background colors are highly ambiguous. Building more accurate color models is the crucial challenge for both this approach and many other color-based segmentation and matting approaches.
Acknowledgement
The authors would like to thank Colin Ke Zheng and Yi
Li for sharing their initial experimental code, and Qi Miao
for help collecting test images. One of the authors of this
work is supported by a grant from Microsoft Research.
References
[1] A.R. Smith and J.F. Blinn. Blue screen matting. Proceedings of ACM SIGGRAPH, pp. 259-268, 1996.
[2] C. Rother, V. Kolmogorov and A. Blake. GrabCut - Interactive foreground extraction using iterated graph cut.
Proc. of ACM SIGGRAPH, pp. 309-314, 2004.
Figure 5. (a). Original image with foreground and background strokes. (b). Extracted matte by our approach is erroneous due to color ambiguity. (c). The user-specified rough trimap. (d). Bayesian matting result based on trimap (c). (e). Our matting result based on trimap (c).