Abstract
This contribution reports Nokia's results of CE7 on improving the FGS layer coding performance
of closed-loop P frames. Temporal prediction is introduced into FGS layer coding and is formed
adaptively from the enhancement layer reference and the base layer reference, based on the
information coded in the base layer. The new algorithm controls the drift caused by partial
decoding while achieving high coding efficiency. Considerable effort was also spent on
reducing the complexity of the new solution by using low-complexity motion compensation and
minimizing additional transform operations. With a minimal increase in complexity, the FGS
coding performance of closed-loop P frames can be improved by as much as 4 dB for one FGS
layer on a base layer coded at QP 42.
Introduction
For real-time video communication applications, such as video conferencing, minimizing
end-to-end delay is critical to ensuring good interaction among the participants. The low-delay
requirement is usually met by encoding each video frame as either an I-frame or a P-frame.
JSVM currently uses prediction only from the base layer when coding the FGS layer of
closed-loop P frames, in order to avoid drift. One problem with using the enhancement layer in
prediction is the mismatch between the reference frame used by the encoder and that used by the
decoder when the bitstream is only partially decoded. This mismatch in the reference frames can
cause errors to accumulate and result in drift. The drift can be kept under control if the
accumulated error is bounded. Leaky prediction is an effective way to achieve this: it uses a
reference signal that is a weighted average of the base layer reference and the
enhancement layer reference.
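Why a leak bounds the drift can be seen in a simplified scalar model, sketched below in Python. The model is an assumption made for illustration only: every frame adds the same mismatch `d` between the encoder and decoder references, the enhancement layer reference carries the weight `alpha` in the weighted average, and motion compensation and clipping are ignored. The function name `drift_after` and the parameter values are hypothetical and are not taken from JSVM.

```python
def drift_after(num_frames, alpha, d):
    """Accumulated reference mismatch after num_frames P frames.

    Scalar model (an assumption for illustration): the mismatch inherited
    from the previous frame is attenuated by the weight alpha given to the
    enhancement layer reference, and a fresh mismatch d is added.
    """
    err = 0.0
    for _ in range(num_frames):
        err = alpha * err + d
    return err


# With a leak (alpha < 1) the error converges towards d / (1 - alpha).
leaky = drift_after(200, alpha=0.75, d=1.0)
# Without a leak (alpha = 1) the error keeps growing with every frame.
no_leak = drift_after(200, alpha=1.0, d=1.0)
print(leaky, no_leak)
```

In this model a smaller weight on the enhancement layer reference gives a tighter bound on the drift, at the cost of using less of the enhancement layer information for prediction.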
JVT-O054 proposed a solution based on leaky prediction to improve the FGS coding of
closed-loop P frames. It uses a temporal prediction signal that is adaptively formed from both
the enhancement layer reference frame and the base layer reference frame, based on the
information coded in the base layer. This solution is referred to as AR-FGS (FGS coding with
adaptive reference). AR-FGS significantly improves the FGS layer coding efficiency while
effectively controlling the drift. However, the complexity of the FGS coder is increased
because additional motion compensation and transform operations are needed.
In this experiment, the idea in JVT-P087 is incorporated into AR-FGS. We also tuned the design
further to reduce the complexity and improve the coding performance.
In JVT-O054, the coefficients being coded in the enhancement layer are classified based on the
information coded in the base layer, and different leaky factors are used in forming the
prediction for coefficients in different categories. Specifically, it proposed the following algorithm
to form the prediction signal used in FGS layer coding.
For a block $X^n$ of size $M \times N$ being coded in the FGS layer, the actual reference signal, $R_a^n$,
is formed as a weighted average between the base layer reconstruction, $X_b^n$, and the
enhancement reference signal, $R_e^{n-1}$, if the coefficients, $Q_b^n$, coded in the base layer
collocated block are all zero.
$$R_a^n = (1-\alpha)\,X_b^n + \alpha\,R_e^{n-1} \quad \text{if } Q_b^n = 0 \qquad (1)$$
Otherwise, a transform is performed on $X_b^n$ and $R_e^{n-1}$ to obtain the transform coefficients
$FX_b^n = f(X_b^n)$ and $FR_e^{n-1} = f(R_e^{n-1})$, respectively. A coefficient block $FR_a^n$ is formed based on
$$FR_a^n(u,v) = (1-\beta)\,FX_b^n(u,v) + \beta\,FR_e^{n-1}(u,v) \quad \text{if } Q_b^n(u,v) = 0 \qquad (2)$$
$$FR_a^n(u,v) = FX_b^n(u,v) \quad \text{if } Q_b^n(u,v) \neq 0 \qquad (3)$$
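Equations (1) to (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the JSVM implementation: the transform f() is replaced by an orthonormal 4x4 DCT, the names `alpha` and `beta` stand for the two leaky factors, and the helper names (`forward`, `inverse`, `adaptive_reference`) are hypothetical.

```python
import math

N = 4  # block size assumed for this sketch


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix, a stand-in for the codec transform f()."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


T = dct_matrix()


def forward(block):
    """f(X) = T X T^t."""
    return matmul(matmul(T, block), transpose(T))


def inverse(coef):
    """Inverse of forward(); valid because T is orthonormal."""
    return matmul(matmul(transpose(T), coef), T)


def adaptive_reference(x_b, r_e, q_b, alpha, beta):
    """Form the reference block R_a^n from X_b^n, R_e^(n-1) and Q_b^n."""
    if all(q == 0 for row in q_b for q in row):
        # Equation (1): block-level weighted average in the pixel domain.
        return [[(1 - alpha) * xb + alpha * re for xb, re in zip(rx, rr)]
                for rx, rr in zip(x_b, r_e)]
    fx_b, fr_e = forward(x_b), forward(r_e)
    # Equations (2) and (3): per-coefficient decision in the transform domain.
    fr_a = [[(1 - beta) * fx_b[u][v] + beta * fr_e[u][v]
             if q_b[u][v] == 0 else fx_b[u][v]
             for v in range(N)] for u in range(N)]
    return inverse(fr_a)
```

Note that the second branch needs two forward transforms and one inverse transform per block, which is the extra cost the text attributes to AR-FGS.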
$$FR_a^n(u,v) = FR_b^{n-1}(u,v) + DQ_b^n(u,v) \quad \text{if } Q_b^n(u,v) \neq 0 \qquad (6)$$
Equations (4), (5), and (6) can be combined into a unified equation. $P_b^n$ is the reconstructed
prediction residual coded in the base layer.
The high-level structure of AR-FGS is illustrated in Figure 1. The highlighted area is the
additional module that needs to be added to JSVM.
The adjusted differential reference block, ${R_d^{n-1}}'$, is calculated from the differential reference
block, $R_d^{n-1} = R_e^{n-1} - R_b^{n-1}$, as follows.
If the base layer collocated block does not have any nonzero coefficients, the differential
reference block is scaled by a scaling factor $\alpha$.
$${R_d^{n-1}}' = \alpha\,R_d^{n-1} \quad \text{if } Q_b^n = 0 \qquad (8)$$
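Otherwise, a transform is performed on $R_d^{n-1}$ to obtain the transform coefficients
$FR_d^{n-1} = f(R_d^{n-1})$. A coefficient $FR_d^{n-1}(u,v)$ is scaled by a scaling factor $\beta$ if the base
layer collocated coefficient is zero, and is set to zero otherwise.
$${FR_d^{n-1}}'(u,v) = \beta\,FR_d^{n-1}(u,v) \quad \text{if } Q_b^n(u,v) = 0 \qquad (9)$$
$${FR_d^{n-1}}'(u,v) = 0 \quad \text{if } Q_b^n(u,v) \neq 0 \qquad (10)$$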
The adjusted differential reference block is added to the base layer reconstructed block to
obtain the reference block used in FGS layer coding.
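The differential-reference steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the JSVM code: an orthonormal 4x4 DCT stands in for the transform f(), the names `alpha` and `beta` are chosen here for the block-level and coefficient-level scaling factors, and the function names are hypothetical.

```python
import math

N = 4  # block size assumed for this sketch

# Orthonormal DCT-II matrix, standing in for the codec transform f().
T = [[math.sqrt((1 if k == 0 else 2) / N)
      * math.cos(math.pi * (2 * i + 1) * k / (2 * N))
      for i in range(N)] for k in range(N)]
T_t = [list(col) for col in zip(*T)]


def matmul(a, b):
    """Product of two N x N matrices stored as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def adjusted_differential(r_e, r_b, q_b, alpha, beta):
    """Return the adjusted differential block computed from R_e - R_b."""
    r_d = [[e - b for e, b in zip(e_row, b_row)]
           for e_row, b_row in zip(r_e, r_b)]
    if all(q == 0 for row in q_b for q in row):
        # No nonzero base layer coefficient: scale the whole block
        # in the pixel domain, no transform needed.
        return [[alpha * d for d in row] for row in r_d]
    fr_d = matmul(matmul(T, r_d), T_t)  # forward transform of R_d
    # Scale coefficients whose collocated base layer coefficient is zero
    # and drop the others.
    fr_d_adj = [[beta * fr_d[u][v] if q_b[u][v] == 0 else 0.0
                 for v in range(N)] for u in range(N)]
    return matmul(matmul(T_t, fr_d_adj), T)  # inverse transform


def fgs_reference(x_b, r_e, r_b, q_b, alpha, beta):
    """Reference block: base layer reconstruction plus adjusted differential."""
    r_d_adj = adjusted_differential(r_e, r_b, q_b, alpha, beta)
    return [[x + d for x, d in zip(x_row, d_row)]
            for x_row, d_row in zip(x_b, r_d_adj)]
```

In this form only the single block `r_d` is transformed, and only when the base layer collocated block has nonzero coefficients, rather than transforming the base layer reconstruction and the enhancement layer reference separately.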
A transform is needed only for blocks whose base layer collocated block has nonzero
coefficients. Under normal coding conditions, the percentage of such blocks is usually
negligible.
According to the experimental results, the enhancement layer reference block no longer helps in
coding the current block once the base layer collocated block has more than a certain number of
nonzero coefficients. In generating the CE7 results, only those 4x4 blocks whose base layer
collocated blocks have 1 to 4 nonzero coefficients are transformed, and coefficient-based
scaling is performed on them. If there are more than 4 nonzero coefficients in the base layer
collocated block, no enhancement reference is used, and no transform is needed.
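The per-block decision described above can be summarized in a short Python sketch. The function name `reference_mode` and the three mode labels are hypothetical and are used only for illustration; the threshold of 4 is the value stated for the CE7 results.

```python
def reference_mode(q_b, max_nonzero=4):
    """Choose how the reference for one 4x4 block is formed.

    q_b holds the quantized coefficients of the base layer collocated block.
    The returned labels are illustrative names, not JSVM identifiers.
    """
    count = sum(1 for row in q_b for q in row if q != 0)
    if count == 0:
        # Whole differential block is scaled in the pixel domain.
        return "block_scaling"
    if count <= max_nonzero:
        # Block is transformed and scaled coefficient by coefficient.
        return "coefficient_scaling"
    # Too many nonzero coefficients: base layer reference only.
    return "base_only"
```

Only the middle case requires a transform, which is what limits the additional transform operations.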
For the single FGS layer coding tests, four different FGS coders are used:
1. Original JSVM
2. Modified JSVM with AR-FGS using the AVC interpolation filter in enhancement layer
motion compensation
3. Modified JSVM with AR-FGS using bilinear interpolation in differential reference frame
motion compensation
4. Modified JSVM with AR-FGS using 4-tap polyphase filters in differential reference frame
motion compensation
The 4-tap polyphase filters originally proposed in JVT-D029 were tested. As analyzed in
JVT-D029, direct interpolation using a 4-tap polyphase filter requires much less computation
than the 2-step AVC interpolation.
{ 0, 16, 0, 0},
{-2, 14, 5, -1},
{-2, 10, 10, -2},
{-1, 5, 14, -2}
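As an illustration of direct interpolation with the coefficients listed above, the following Python sketch interpolates one sample in a single row. Several points are assumptions of this sketch rather than statements from the contribution: the four rows are taken to be the quarter-sample phases 0, 1/4, 1/2 and 3/4, the taps are aligned to the samples at x-1, x, x+1 and x+2, the result is rounded with an offset of 8 before the shift by 4 (each row sums to 16), and border samples are repeated. The function name `interpolate` is hypothetical.

```python
# Quarter-sample phases 0, 1/4, 1/2, 3/4 (assumed order); each row sums to 16.
TAPS = [
    [0, 16, 0, 0],
    [-2, 14, 5, -1],
    [-2, 10, 10, -2],
    [-1, 5, 14, -2],
]


def interpolate(samples, x, phase):
    """Sample at position x + phase/4 in a 1-D row of 8-bit samples."""
    last = len(samples) - 1
    acc = 0
    for k, tap in enumerate(TAPS[phase]):
        # Taps cover x-1, x, x+1, x+2; indices are clamped at the borders.
        idx = min(max(x - 1 + k, 0), last)
        acc += tap * samples[idx]
    # Rounding offset and clipping to 0..255 are assumptions of this sketch.
    return min(max((acc + 8) >> 4, 0), 255)


row = [0, 16, 32, 48]
print([interpolate(row, 1, p) for p in range(4)])
```

Each output sample needs a single 4-tap pass per direction, whereas AVC quarter-sample interpolation first computes half-sample values with a 6-tap filter and then averages, which is the 2-step process referred to above.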
All the experiment data are listed in the attached file JVT-Q039.xls.
Among the three types of filters, the AVC filters give the best performance. However, direct
interpolation using the 4-tap polyphase filter is almost as good as the AVC filters, except for
the City sequence. Bilinear filters also give very good performance, especially considering that
the complexity of bilinear interpolation is minimal.
In generating the results, motion estimation is performed using a reference frame that is
upsampled with the AVC interpolation filter. The performance of the FGS coder using either the
4-tap polyphase filter or the bilinear filter can potentially be improved if the filter used in
motion estimation matches that used in motion compensation.
For the combined scalability tests, the improvement in coding performance can be up to 0.19 dB
for luma and 0.38 dB for chroma, even though the new algorithms are applied only to the coding
of the FGS layer of closed-loop P frames.
Software modifications
The new algorithms have been integrated into the latest JSVM software. The integrated
software was extensively tested with different configurations and has been distributed to other
participants of the CE.
Conclusions
The proposed solution significantly improves the coding performance of the FGS layer of
closed-loop P frames. The additional complexity is minimal, since the large coding gain can be
achieved with an interpolation filter as simple as bilinear interpolation.
References
1. Yiliang Bao, Marta Karczewicz, Justin Ridge, Xianglin Wang, JVT-O054, Improvements
to Fine Granularity Scalability for Low-Delay Applications, April 2005, Busan, Korea.
2. Julien Reichel, Heiko Schwarz, and Mathias Wien, JVT-P201, Scalable Video Coding
Working Draft 3, July 2005, Poznan, Poland.
3. Yiliang Bao, JVT-P307r1, Core Experiment on FGS coding for low-delay applications
(CE-7), July 2005, Poznan, Poland.
4. Antti Hallapuro, Jani Lainema, and Marta Karczewicz, JVT-D029, 4-tap filter for
bi-predicted macroblocks, July 2002, Klagenfurt, Austria.
This form provides the ITU-T | ISO/IEC Joint Video Coding Experts Group (JVT) with information about the patent
status of techniques used in or proposed for incorporation in a Recommendation | Standard. JVT requires that all
technical contributions be accompanied with this form. Anyone with knowledge of any patent affecting the use of
JVT work, of their own or of any other entity (third parties), is strongly encouraged to submit this form as well.
This information will be maintained in a living list by JVT during the progress of their work, on a best effort basis.
If a given technical proposal is not incorporated in a Recommendation | Standard, the relevant patent information
will be removed from the living list. The intent is that the JVT experts should know in advance of any patent
issues with particular proposals or techniques, so that these may be addressed well before final approval.
This is not a binding legal document; it is provided to JVT for information only, on a best effort, good faith basis.
Please submit corrected or updated forms if your knowledge or situation changes.
This form is not a substitute for the ITU ISO IEC Patent Statement and Licensing Declaration, which should be
submitted by Patent Holders to the ITU TSB Director and ISO Secretary General before final approval.
2.0 The submitter is not aware of having any granted, pending, or planned patents associated with the
technical content of the Recommendation | Standard or Contribution.
or,
The submitter (Patent Holder) has granted, pending, or planned patents associated with the technical content of the
Recommendation | Standard or Contribution. In which case,
2.1 The Patent Holder is prepared to grant on the basis of reciprocity for the above Recommendation |
Standard a free license to an unrestricted number of applicants on a worldwide, non-discriminatory
basis to manufacture, use and/or sell implementations of the above Recommendation | Standard.
2.2 The Patent Holder is prepared to grant on the basis of reciprocity for the above Recommendation |
Standard a license to an unrestricted number of applicants on a worldwide, non-discriminatory basis
and on reasonable terms and conditions to manufacture, use and/or sell implementations of the above
Recommendation | Standard.
Such negotiations are left to the parties concerned and are performed outside the ITU | ISO/IEC.
x 2.2.1 The same as box 2.2 above, but in addition the Patent Holder is prepared to grant a royalty-free license
to anyone on condition that all other patent holders do the same.
2.3 The Patent Holder is unwilling to grant licenses according to the provisions of either 2.1, 2.2, or 2.2.1
above. In this case, the following information must be provided as part of this declaration:
patent registration/application number;
an indication of which portions of the Recommendation | Standard are affected.
a description of the patent claims covering the Recommendation | Standard;
In the case of any box other than 2.0 above, please provide the following:
Patent number(s)/status
Inventor(s)/Assignee(s)
Relevance to JVT
3.1 The submitter is not aware of any granted, pending, or planned patents held by third parties associated
with the technical content of the Recommendation | Standard or Contribution.
3.2 The submitter believes third parties may have granted, pending, or planned patents associated with the
technical content of the Recommendation | Standard or Contribution.
For box 3.2, please provide as much information as is known (provide attachments if more space needed) - JVT will
attempt to contact third parties to obtain more information:
Mailing address
Country
Contact person
Telephone
Fax
Email
Patent number/status
Inventor/Assignee
Relevance to JVT