Diversity
VMon and VQuad Results Description
Manual
The firmware of the instrument makes use of several valuable open source software packages. For information, see the "Open
Source Acknowledgement" on the user documentation CD-ROM (included in delivery).
Rohde & Schwarz would like to thank the open source community for their valuable contribution to embedded computing.
SwissQual AG
Allmendweg 8, 4528 Zuchwil, Switzerland
Phone: +41 32 686 65 65
Fax: +41 32 686 65 66
E-mail: info@swissqual.com
Internet: http://www.swissqual.com/
Printed in Germany. Subject to change. Data without tolerance limits is not binding.
R&S is a registered trademark of Rohde & Schwarz GmbH & Co. KG.
Trade names are trademarks of the owners.
SwissQual has made every effort to ensure that the instructions contained in this document are adequate and free of errors and
omissions. SwissQual will, if necessary, explain issues that may not be covered by the documents. SwissQual's liability for any
errors in the documents is limited to the correction of errors and the aforementioned advisory services.
Copyright 2000 - 2013 SwissQual AG. All rights reserved.
No part of this publication may be copied, distributed, transmitted, transcribed, stored in a retrieval system, or translated into any
human or computer language without the prior written permission of SwissQual AG.
Confidential materials.
All information in this document is regarded as commercially valuable, protected and privileged intellectual property, and is provided
under the terms of existing Non-Disclosure Agreements or as commercial-in-confidence material.
When you refer to a SwissQual technology or product, you must acknowledge the respective text or logo trademark somewhere in
your text.
SwissQual, Seven.Five, SQuad, QualiPoc, NetQual, VQuad, Diversity as well as the following logos are registered trademarks of SwissQual AG.
Diversity Explorer™, Diversity Ranger™, Diversity Unattended™, NiNA+™, NiNA™, NQAgent™, NQComm™, NQDI™, NQTM™,
NQView™, NQWeb™, QPControl™, QPView™, QualiPoc Freerider™, QualiPoc iQ™, QualiPoc Mobile™, QualiPoc Static™, QualiWatch-M™, QualiWatch-S™, SystemInspector™, TestManager™, VMon™, VQuad-HD™ are trademarks of SwissQual AG.
The following abbreviations are used throughout this manual: R&S® is abbreviated as R&S.
SwissQual... Diversity
Contents
1 Introduction............................................................................................ 5
2 Visual Quality Overview........................................................................ 6
2.1 Visual Quality................................................................................................................ 6
2.2
2.3
2.4 Technical Requirements.............................................................................................12
3.1.1 Frame Rate................................................................................................... 13
3.1.2
3.2 Technical Background................................................................................................19
4.2
4.2.1 Blockiness..................................................................................................... 20
4.2.1.1 Root Causes................................................................................................. 21
4.2.2 Tiling..............................................................................................................21
4.2.2.1 Root Causes................................................................................................. 22
4.2.3 Blurring..........................................................................................................23
4.2.3.1 Root Causes................................................................................................. 23
4.2.4 Jerkiness....................................................................................................... 23
4.2.4.1 Root Causes................................................................................................. 24
4.2.5 Additional Results......................................................................................... 24
4.2.6 MOS Prediction............................................................................................. 25
4.3
4.4
4.5
4.6
4.6.1
4.6.2
5.1.1 Perceptual Difference....................................................................................31
5.1.2 Additional Results......................................................................................... 31
5.1.3 MOS Prediction............................................................................................. 31
5.2
5.3 VQuad08 Application.................................................................................................. 33
5.3.1 Lip-Sync in VQuad........................................................................................ 33
A Acknowledgements............................................................................. 34
Glossary: Abbreviations......................................................35
1 Introduction
The following sections describe the technical background, the application scenarios as
well as the parameters that SwissQual video quality measurements record.
To objectively predict visual quality, SwissQual uses the VMon algorithm for the no-reference approach and the VQuad algorithm for the full-reference approach. SwissQual
has successfully used these algorithms in quality measurement systems for several
years. For the full-reference approach, SwissQual provides a set of video clips that
cover different types of videos. For higher confidence in the measurement results,
SwissQual has adjusted the algorithms to perfectly harmonize with these video clips.
To keep pace with rapidly evolving video compression and transmission techniques,
SwissQual is constantly improving the VMon and VQuad algorithms.
The main indicators and the presentation scheme have not been changed.
The latest versions of these algorithms are VMon08 and VQuad08, which improve and
extend detectors as well as the perceptive weighting for individual degradations. The
new versions are more robust with respect to the latest coding technologies than the
previous versions and are less dependent on content. The redesign of the internal
structure also provides a framework for High Definition (HD) resolution.
English      German         French        Spanish
excellent    ausgezeichnet  excellent     excelente
good         gut            bonne         buena
fair         ordentlich     assez bonne   regular
poor         dürftig        médiocre      mediocre
bad          schlecht       mauvaise      mala
Each individual score is influenced by the global experience of the user, expectation,
and individual preferences. That is, different people tend to assign different quality
scores to the same clip. Scores are also subject to short term focus and accidental
assignment. Consequently, a MOS value is the average of a wider or narrower distribution of individual scores.
The main disadvantage of this approach is that individuals assign different scores to a
clip of perfect quality due to a lack of confidence, accidental down-scoring, or from
being overly critical. The highest MOS value in subjective tests is usually around 4.5.
Conversely, people tend to assign a score of 'bad' to most of the lower-quality clips.
The main reason is that the lower end of the quality range is much wider, and a clip
could be 'worse than bad', while at the upper end one cannot assign a quality value
that is better than undisturbed speech or video.
However, we also have to consider that the MOS is an average value of the scores
from a group of at least 24 people. In scientific papers, the standard deviation of the
MOS is also included to represent the distribution width of the individual scores. An
additional value that is often included with the MOS is the 95 % confidence interval,
which represents the range within which the 'true' MOS lies with 95 % probability.
This interval allows you to determine how close the measured MOS is to the
'true quality' of the clip. Logically, this confidence interval is smaller for larger test
groups. In a well designed traditional test, the interval is about 0.2.
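As an illustration (not part of any SwissQual tool), the MOS, the standard deviation, and an approximate 95 % confidence interval for a group of individual scores can be computed as follows; the score values are made up:

```python
import math

def mos_statistics(scores):
    """MOS, standard deviation, and 95 % confidence-interval half-width
    for a list of individual opinion scores (1..5)."""
    n = len(scores)
    mos = sum(scores) / n
    # Sample standard deviation: width of the score distribution.
    std = math.sqrt(sum((s - mos) ** 2 for s in scores) / (n - 1))
    # 95 % confidence-interval half-width (normal approximation, z = 1.96);
    # it shrinks with the square root of the group size.
    ci95 = 1.96 * std / math.sqrt(n)
    return mos, std, ci95

# 24 viewers (the minimum group size mentioned above) rate one clip:
scores = [4, 5, 4, 3, 4, 4, 5, 4, 3, 4, 4, 4,
          5, 4, 4, 3, 4, 4, 5, 4, 4, 4, 3, 4]
mos, std, ci95 = mos_statistics(scores)  # MOS 4.0, interval about +/- 0.24
```

With 24 viewers the interval comes out near the 0.2 figure quoted above; doubling the group size would shrink it by a factor of about 1.4.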
The term 'MOS' is only a generic approximation of a measurement unit and is meaningless if you do not specify the kind of quality perception that the MOS describes. A
MOS can be obtained for listening quality and visual quality.
Objective measurements do not evaluate quality in the traditional sense, but rather
estimate or predict quality as if the clip had been observed by a large group of people.
More than 5000 subjectively scored samples are used to train the VQuad, VMon,
and SQuad algorithms. These objective measures are based on sophisticated psycho-acoustic, psycho-visual, and perceptive models that process signals in a similar way
to the human auditory and visual systems. The signal analysis and the subsequent
comparison to the undistorted original signal lead to a quality value that is mapped to
the common 5-to-1 scale.
The performance of the objective measures is usually represented by the correlation coefficient and the residual prediction error, shown on a scatter plot where the subjective and
objective data are plotted on the X and Y axes, respectively. On such a diagram, a
good objective measure is narrowly distributed along the 45° line.
Both of these quality assessment methods predict the Mean Opinion Score (MOS) that
would be obtained from a subjective test. Figure 2-1 provides an overview of the basic
relationship between subjective and objective assessments as well as the full-reference
and no-reference approaches.
(Figure 2-1 diagram, labels: internal reference, expectation, human viewer, quality rating)
Non-intrusive In-Service Monitoring: Assesses the video signals in real applications, such as IPTV or video telephony, by parallel monitoring in the core network.
This method includes no-reference approaches.
The single-ended models use signal analysis methods to look for known types of distortions. For example, the models search for typical coding artefacts such as visible
block structures or freezing events. More advanced methods apply perceptual models
to the detected distortions that consider the effects of the human visual system such as
local contrast adaptation or masking.
The accuracy of a no-reference approach is lower than the full-reference approach.
However, the accuracy is more than sufficient for a basic classification of the video
quality and the detection of consistently poor quality links.
Since the reference signal is not available, no-reference video quality models are subject to a content dependency. If the video contains natural objects and a small amount
of motion, the extraction of the individual features performs well. However, if the video
contains unnatural content such as cartoons, moving or fixed graphical objects or still
sequences, the feature extraction can lead to inaccurate results. Such results are
caused by the similarity of the content characteristics to typical compression and transmission distortions.
Cartoons, for example, contain a restricted number of colours as well as entire areas
that are filled with the same colour and lack natural texture, which is acceptable in a
cartoon. However, unlike in a cartoon, such effects in a video with natural content are
seen as a strong distortion. Since the measure has no a-priori knowledge of the content, such content is predicted with low quality, even though this is not true for the
cartoon.
A similar case is a graphically animated background, for example, during a TV newscast. This type of background can contain solid colour areas with horizontal and vertical sharp edges or even moving blocks. These objects can easily be misinterpreted as
unnatural coding artefacts.
The analysis results from one short clip might provide information about serious distortions. However, for a more accurate quality analysis, SwissQual strongly recommends
evaluating several video sequences with a no-reference model and using the average
of the results to completely characterize a transmission channel.
VQuad uses a reference signal, which must be in uncompressed format, that is, perfect
quality, with a frame rate of 25 or 30 fps.
The reference signal must have the same image resolution as the degraded video.
VQuad does not rescale the video.
The VMon and VQuad methods analyze a video clip in a raw non-encoded format such
as RGB24, where each frame is considered a bitmap and the RGB values for each
pixel are available. In addition to this spatial information, these methods also require
the display time of each individual frame to calculate temporal effects.
A Diversity system only uses RGB24 to store and analyze uncompressed video clips.
The VMon08 algorithm can also use YUV format.
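For illustration, a raw RGB24 clip as described above can be read frame by frame with a few lines of code; the function below is a hypothetical sketch, not an API of VMon or VQuad:

```python
def read_rgb24_frames(path, width, height):
    """Read an uncompressed RGB24 video file frame by frame.
    Each frame is width * height pixels, 3 bytes (R, G, B) per pixel."""
    frame_size = width * height * 3
    frames = []
    with open(path, "rb") as f:
        while True:
            data = f.read(frame_size)
            if len(data) < frame_size:
                break  # end of file; a trailing partial frame is ignored
            frames.append(data)
    return frames

# A QCIF (176 x 144) clip would yield 176 * 144 * 3 = 76032 bytes per
# frame; at 25 fps that is roughly 1.9 MB of raw data per second.
```

Note that the display time of each frame is not part of such a raw stream and must be recorded separately, as the text above points out for temporal effects.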
VMon evaluation, as measured on an Intel Xeon processor at 2.33 GHz, is faster than
playback time due to consistent run-time optimization, even for larger image sizes
such as a VGA signal sampled at 25 fps. Due to a pre-evaluation of the
reference video, VQuad has a slightly longer evaluation time.
As the VMon solution can dynamically adjust the algorithm computations to the available processing resources, VMon can be run on the Symbian mobile OS platform. As a
result, VMon is an ideal component for lower performing platforms such as mobile
phone operating systems and digital signal processors.
On low performing platforms, the estimation of quality related values can be less accurate due to the dynamic adjustment of calculation depth.
Still images or completely frozen video sequences are signalled, but MOS values are
not calculated.
Usually, the prediction accuracy is given as a single number: the correlation coefficient
between the objective scores and the subjective MOS.
A score close to 1.0 indicates a high prediction accuracy, while lower scores indicate a
lower prediction accuracy. In general, correlations of less than 0.7 describe a model as
weak, while correlations of less than 0.5 describe a model as unusable. A more detailed
view is possible with scatter plots, which plot the subjective MOS versus the objective
scores. Figure 3-1 contains an example of a detailed analysis of VQEG QCIF datasets.
Each point in the diagram represents a video sample that has been scored subjectively
and objectively. For points above the 45° line, the objective measure indicates a higher
quality than was derived in the subjective test. Similarly, points below
the 45° line indicate a more pessimistic quality prediction.
In Figure 3-1, the accuracy of VQuad08 predictions is noticeably better than that of VMon08
predictions. The VQuad scores are closely grouped and are nearly symmetrically distributed along the 45° line. However, due to content dependencies, a few outliers are
incorrectly predicted, that is, VQuad rates individual files in one condition either too
high or too low.
To avoid under-predictions, VMon08 searches for known distortions based on
a general expectation. If a distortion is found with confidence, the score is calculated
correctly. An over-prediction can occur if VMon08 does not detect a visible distortion. In
essence, VMon08 tends to yield over-predictions for missed distortions but no under-predictions. For applications such as a trigger-based troubleshooting system, VMon08
tends toward 'false acceptance' but avoids 'false rejection', which is useful for systems
where false alarms require more operational effort.
(Scatter plot, per-file analysis, QCIF data VQEG q05: VMon08 and VQuad08 scores vs. visual MOS; correlation coefficients VMon08 = 0.84, VQuad08 = 0.93)
Fig. 3-1: Per-sample comparison between VMon08 and VQuad08 data on example data in QCIF resolution
The results that have been discussed up to now have been on a per sample basis. To
evaluate a channel or video system, a set of different samples with different contents
are typically used and transmitted through the system.
In voice quality tests, a so-called per-condition analysis is usually performed as well.
This analysis averages the scores of a condition, that is, a given codec setting, for
each talker and each sentence. This averaging minimizes dependencies on individual
characteristics and instead focuses more on the system being tested.
This approach can also be applied to video analysis by averaging across different contents. The deviation of the per-sample scores for the same condition is wider for video
than for speech, which is mainly caused by the wider variation of the video content that
was transmitted. However, content averaging provides a good real-life overview for a
channel or codec performance in which a wide range of contents must be processed.
In the example data set, eight different contents were always processed with the same
condition. A so-called per-condition evaluation is obtained when these eight individual
scores are averaged in the subjective and objective domains. Figure 3-2 displays these
results, which are based on the same example data taken from the VQEG
data set in QCIF.
(Scatter plot: VMon08 and VQuad08 scores vs. subjective MOS; correlation coefficients VMon08 = 0.97, VQuad08 = 0.98)
Fig. 3-2: Per-condition comparison between VMon08 and VQuad08 data on example data in QCIF resolution
The charts show that the prediction accuracy of VMon08 and VQuad08 increases significantly, while under- or over-predictions that are caused by individual contents (see
Figure 3-1) are 'averaged out' completely.
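The per-condition averaging described above can be sketched as follows; the condition label and score values are illustrative, not VQEG data:

```python
from collections import defaultdict

def per_condition_average(samples):
    """Average per-sample scores across contents for each condition.
    `samples` is an iterable of (condition, content, score) tuples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for condition, _content, score in samples:
        sums[condition] += score
        counts[condition] += 1
    return {c: sums[c] / counts[c] for c in sums}

# Eight contents processed with the same (hypothetical) codec condition;
# the content-dependent scatter is averaged out per condition.
samples = [("codec_A", "content_%d" % i, s)
           for i, s in enumerate([3.8, 4.1, 3.5, 4.0, 3.9, 3.6, 4.2, 3.7])]
averages = per_condition_average(samples)  # averages['codec_A'] is about 3.85
```

Applying the same averaging to the subjective scores yields the per-condition points plotted in this kind of analysis.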
For a complete overview of the algorithm accuracy, the correlation coefficients for the
14 QCIF data sets are shown in Figure 3-3. Initially, a 'per-sample' evaluation is performed, during which each score for each video sample file is considered individually.
The statistical evaluation procedure is equivalent to the VQEG primary analysis.
Fig. 3-3: Correlation coefficients between MOS values obtained in subjective tests and objective scores
based on the VQEG QCIF data set
For comparison, the VQEG performance of VMon06, the predecessor to VMon08, has
also been included in Figure 3-3. As the chart clearly shows, the VMon08 performance
is significantly better than that of VMon06. The chart also shows the performance of VQuad08
on the same data set. Because VQuad is a full-reference model and can
perform a more detailed analysis, its prediction accuracy is higher than that of the
no-reference models.
The statistical evaluation of VQuad08 is equivalent to the method applied by VQEG to
the full-reference models within its evaluation. Note that there are small differences in
the evaluation method for full-reference and no-reference models.
Figure 3-4 shows the accuracy in a 'per-condition' analysis for all 14 data sets.
Taking the discussion of Figure 3-1 into consideration, the performance increases in
Figure 3-4 are due to averaging across the individual video contents.
The applied method for content averaging is different from VQEG's so-called secondary
analysis and should not be directly compared to those results.
VMon08 and VQuad08 have been optimized for the QCIF-like resolutions that
mobile phone applications and devices use today. As the data channels in mobile
networks widen and IPTV solutions progress, SwissQual is
continuing to improve VMon08 and VQuad08, especially for larger video resolutions.
For comparison, the evaluation of the VQEG data at higher resolutions, that is, CIF and
VGA, is shown in Table 3-1. The data is obtained from 14 databases for CIF and 13 for
VGA. The evaluation follows the same rules as the results in Figure 3-3 and Figure 3-4.
Fig. 3-4: Correlation coefficients between MOS values obtained in subjective tests and objective
scores based on the 14 VQEG QCIF data sets on a per-condition evaluation
The main value is the correlation coefficient, which is averaged over the databases. The
value in parentheses, the average r.m.s.e., allows for a rough estimate of the size of the
prediction errors. For a good prediction accuracy, the correlation coefficient must be
close to 1.0 and the r.m.s.e. must be small.
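Both figures of merit, the correlation coefficient and the r.m.s.e., can be computed from paired subjective and objective scores; a minimal sketch with made-up numbers:

```python
import math

def correlation_and_rmse(subjective, objective):
    """Pearson correlation coefficient and root-mean-square error (r.m.s.e.)
    between subjective MOS values and objective predictions."""
    n = len(subjective)
    mx = sum(subjective) / n
    my = sum(objective) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(subjective, objective))
    sx = math.sqrt(sum((x - mx) ** 2 for x in subjective))
    sy = math.sqrt(sum((y - my) ** 2 for y in objective))
    r = cov / (sx * sy)
    rmse = math.sqrt(sum((x - y) ** 2
                         for x, y in zip(subjective, objective)) / n)
    return r, rmse

# Five samples with made-up scores: predictions close to the 45-degree
# line give a correlation near 1.0 and a small r.m.s.e.
r, rmse = correlation_and_rmse([1, 2, 3, 4, 5], [1.2, 1.9, 3.1, 4.2, 4.8])
```

As in the tables below, reporting both values together is informative: a high correlation with a large r.m.s.e. would indicate a systematic offset in the predictions.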
Table 3-1: Results of VMon08 and VQuad08 for all three resolution sizes

Resolution   VMon08 (per sample)   VMon08 (per condition)   VQuad08 (per sample)   VQuad08 (per condition)
QCIF         0.73 (0.70)           0.91 (0.37)              0.88 (0.50)            0.95 (0.27)
CIF          0.63 (0.78)           0.81 (0.51)              0.86 (0.51)            0.97 (0.22)
VGA          0.52 (0.92)           0.75 (0.57)              0.86 (0.54)            0.93 (0.31)
As mentioned before, VMon08 has been optimized for the smaller image sizes that are
used in mobile services, and has the highest accuracy for QCIF and similar resolutions.
Although VMon08 is still acceptable for CIF resolutions, only a rough categorization of
the visual quality is possible for VGA video.
The full-reference VQuad08 method has far more information available for a quality
estimation and achieves a per-sample correlation of at least 0.86 for all image sizes.
● Blockiness: Visible block borders that are caused by compression during the
encoding process
● Tiling: Visible macro-block and slice edges that are caused by encoding or transmission errors
● Blurring: Loss of sharp edges, which is caused by strong compression or decoding filters
These perceptual degradation measures are the basis for MOS prediction.
The degradation measures use a technical scale that ranges from 0 % to 100 %, where 0 % represents no degradation and 100 % represents the maximum possible degradation.
The percentage values of one degradation measure do not relate directly to the perceived quality, which depends on a combination of all degradations. That is, you cannot interpret VMon results in the form of "30 % jerkiness is poor quality". However, the
individual values are of importance for relative measurements of the form "video A has
20 % blockiness and video B has 25 % blockiness, therefore video A has less blockiness than video B".
Due to the nature of the content, a small amount of degradation is often present. In
general, results below 10 % might be caused by the actual content of the video and
have no considerable influence on the quality prediction.
4.2.1 Blockiness
Blockiness is an effect that is caused by the division of an image into smaller squares,
that is, blocks, by the encoding process. Almost all current video encoders use a
block-based transformation. Due to the lossy encoding of these blocks, a resulting block
structure can be seen in the decoded video sequence. Various block sizes are used,
with 8 x 8 and 16 x 16 pixels being the most frequent.
Traditionally, that is, in MPEG-4 part 2 or H.263, the luminance information is encoded in
blocks of 8 x 8 pixels (so-called micro-blocks). The chrominance information is encoded
in so-called macro-blocks of 16 x 16 pixels. The entire information related to one macro-block
consists of the related chrominance information and the corresponding micro-blocks with
their luminance information. The macro-block is the smallest entity of the encoded image;
position and update information refer to macro-blocks. Macro-blocks are displayed at a
fixed position in a frame.
More recent video encoders, such as H.264, even allow a scalable micro-block size of
4 x 4, 8 x 8, or 16 x 16 pixels.
The image information in a block is normally transformed with a DCT-based transformation. Usually, luminance and chrominance information are encoded separately, possibly
with different block sizes, and only the most significant coefficients of the transformed
values are retained.
For strong compression, only a few coefficients are retained; in extreme cases,
only one coefficient is retained, which most of the time is the one that represents a
uniform colour or luminance of the whole block. As a result of strong compression, a
block contains little or no spatial detail and has visible transitions along its borders.
Due to the lack of spatial detail, the border area with the neighboring blocks
becomes more visible.
The blockiness value is an estimate of the visibility of these block borders. This value is
based on a measure of the luminance differences at block borders and is related to the
amount of spatial detail, since a block border is more visible in the absence of
spatial detail.
Although the blockiness measure takes into account that blocks might have different
sizes, the block borders must always be oriented horizontally or vertically and form a
right angle. The blockiness value also takes into account the luminance of the neighboring area. In very bright or very dark areas, the degradation by block borders is less
visible even though the borders are clearly measurable.
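As a toy illustration of the principle (not the VMon08 algorithm), the visibility of block borders can be estimated by comparing luminance steps at assumed 8-pixel block borders with the steps elsewhere:

```python
def blockiness_estimate(luma, width, height, block=8):
    """Toy blockiness indicator: average absolute luminance step across
    vertical block borders relative to the average step inside blocks.
    `luma` is a flat, row-major list of luminance values.
    Illustration of the principle only, not the VMon08 measure."""
    border_sum, border_n = 0.0, 0
    inner_sum, inner_n = 0.0, 0
    for y in range(height):
        for x in range(1, width):
            step = abs(luma[y * width + x] - luma[y * width + x - 1])
            if x % block == 0:  # transition across an assumed block border
                border_sum, border_n = border_sum + step, border_n + 1
            else:
                inner_sum, inner_n = inner_sum + step, inner_n + 1
    avg_border = border_sum / border_n
    avg_inner = inner_sum / inner_n
    # A ratio well above 1 means block borders stand out from the detail.
    return avg_border / (avg_inner + 1e-9)

# Two uniform 8x8 blocks side by side: the only luminance step is at the
# block border, so the indicator becomes very large.
luma = ([100] * 8 + [150] * 8) * 8
ratio = blockiness_estimate(luma, 16, 8)
```

A real measure would, as described above, additionally weight the result with the local luminance and handle variable block sizes and masking effects.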
Table 4-1: Blockiness scale (example images at 0 % and 100 % blockiness)

4.2.1.1 Root Causes
The main root cause for blockiness is strong compression during encoding. In addition,
packet loss during transmission might increase blockiness.
4.2.2 Tiling
During the encoding process, a video frame is divided into blocks. A significant loss of
information corresponding to one or multiple blocks, either during encoding or during
transmission, leads to tiling: visible tile-like artefacts in the image or video
frame.
The tiling value focuses on distortions at block borders that are caused by transmission errors. Transmission errors are handled differently by the receiving decoder. The
simplest way of handling this type of error is to freeze the last successfully updated
image until the next key-frame provides a complete image. Other strategies include
replacing the incorrectly transmitted parts of the image with the same area of the previous frame. Advanced concealments predict missing data by using the neighboring
areas, that is, using the same motion compensation or similar spatial textures. Of
course, simple implementations just display the erroneous data, which can lead to
strange effects. Since no concealment strategy is perfect, the residual error is
propagated through the differential frames up to the next key-frame.
Since the transmission is organized with macro-blocks as the smallest entity, transmission errors or residual errors often have a visible macro-block structure. At least the
border lines of the erroneous areas are always oriented horizontally or vertically. The
VMon08 tiling detector is specifically designed to recognize such erroneous areas by
checking for incoherent vertical and horizontal edges.
A threshold is applied to avoid falsely detecting tiling in the actual content of a video
sequence, which would otherwise lower the scores.
Visible macro-block borders that are caused by spatial compression were counted as
tiling in previous versions of VMon. Due to the high correlation with blockiness, the blockiness value in the 08 series includes visible macro-block borders that are caused by
compression.
However, suddenly appearing macro-block structures due to a highly compressed key-frame or temporarily increased spatial compression can also be considered tiling.
In the case of high motion, the affected macro-blocks in this area might be encoded as so-called intra-blocks. Due to the limited amount of bits, the intra-blocks are highly compressed and suddenly become visible.
Table 4-2: Tiling scale (example videos at 0 % and 100 % tiling)

4.2.2.1 Root Causes
The main root cause for tiling is packet loss during transmission. Strong compression
during encoding might also increase tiling.
4.2.3 Blurring
In VMon, blurring is measured indirectly by measuring sharpness, that is, the sharpness of the luminance edges in the frames. More specifically, sharpness measures the
luminance offset at the edge borders and relates this offset to the local contrast at the
edge location. In addition, the sharpness measure tries to avoid block border edges,
which are the result of strong compression.
The blurring value is the decrease in sharpness of the video sequence being tested
with respect to the sharpness of an average high-quality video sequence.
Sharpness is a value that strongly depends on the content of a video signal. For example, a cloudy sky over a meadow does not contain sharp edges. In such an image, the
sharpness is measured at the position of the sharpest edges in the frame.
Table 4-3: Blurring scale (example images at 0 % and 100 % blurring)

4.2.3.1 Root Causes
The main root cause for blurring is the use of de-blocking filters in the video decoder.
4.2.4 Jerkiness
Jerkiness is a perceptual value that measures jerks from one frame to the next. High
jerkiness is the result of a bad representation of moving objects in the video sequence.
In other words, jerkiness measures the loss of information due to a freezing period or a
low frame rate.
In case of freezing, jerkiness considers the freezing period and the assumed loss of
information during this period. This loss of information is estimated by the inter-frame
difference at the end of the period. The jerkiness measure comprises freezing, the
anticipated loss of information, and the dominating frame rate, which is described in
the "Additional Results" section.
In the absence of explicit freezing periods, jerkiness is mainly related to the technical value
of the dominating frame rate. For moderate or high motion videos, jerkiness and
frame rate are highly negatively correlated. In low motion videos, a low frame rate
does not necessarily imply that jerkiness is high. In this situation, the jerkiness
measure takes into account the amount of motion in the video, whereas the frame rate only
measures the display time of frames.
In most cases, a regular temporal degradation such as a lower frame rate is usually
better accepted than an irregular freezing. The jerkiness calculation takes this effect
into account.
This might lead to the effect that, in a moderately moving clip, a consistent frame rate of
5 fps causes a jerkiness of only 15 %, whereas two longer freezing events, for example, 2 × 500 ms, in a 15 fps clip result in a jerkiness of more than 50 %.
Table 4-4: Jerkiness scale (example videos at 0 % and 100 % jerkiness)

4.2.4.1 Root Causes
Large jerkiness values are the result of a reduction to a low encoder frame rate, or of
transmission delays and strong packet loss during transmission.
● Dominating Frame Rate in fps (frames per second): As for jerkiness, the basis
for this value is the display time of a frame, that is, the amount of time an image
remains visible until the image information changes in the next update.
In the case of a constant frame rate, the dominating frame rate is equal to the constant frame rate. In the case of a variable frame rate, the dominating frame rate is
the median of the frame rates.
Black Frame Ratio: This value provides the ratio of detected black frames with
respect to all frames in the sequence.
More specifically, the black frame ratio in percentage is the total time black frames
are displayed divided by the video sequence length. In NQDI, intervals of black
frames have a grey background in the time analysis graph.
All mono colour frames, including black frames, are discarded before the
MOS estimation. In the time analysis graph of NQDI, all intervals of mono colour
frames have a grey background. In the previous version, VMon06, only blue
frames were discarded for the MOS calculation, and sequences of other mono colour frames were treated as highly blurred frames. In QualiPoc, all mono colour
frames are counted and reported as black frames.
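The ratio itself is a simple time-weighted fraction. The sketch below assumes a black-frame detector already produced one flag per frame and only shows the bookkeeping; the data layout and names are illustrative:

```python
def black_frame_ratio(frames):
    # frames: (display_time_ms, is_black) per frame. The ratio is the
    # total display time of black frames divided by the sequence
    # length, expressed in percent.
    total_ms = sum(t for t, _ in frames)
    black_ms = sum(t for t, is_black in frames if is_black)
    return 100.0 * black_ms / total_ms

# 4 s clip in which black frames are shown for a total of 200 ms
frames = [(40, False)] * 95 + [(40, True)] * 5
print(black_frame_ratio(frames))  # 5.0
```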
Freezing: If the display time of a frame exceeds 350 ms, the frame is considered
frozen.
The freezing value is displayed as a percentage: the total freezing time divided by
the video sequence length. On the time analysis graph in NQDI, freezing intervals
have a blue background.
Sequences of black or other mono colour frames are not considered freezing even
if the display time exceeds the given limits.
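A sketch of the freezing computation under these rules: the 350 ms limit and the exclusion of mono colour frames come from the text, while the data layout and names are assumptions.

```python
FREEZE_LIMIT_MS = 350  # display times above this limit count as frozen

def freezing_percent(frames):
    # frames: (display_time_ms, is_mono_colour) per frame. Mono colour
    # frames (e.g. black) are excluded: they are reported via the black
    # frame ratio, not as freezing, even if displayed for a long time.
    total_ms = sum(t for t, _ in frames)
    frozen_ms = sum(t for t, mono in frames
                    if not mono and t > FREEZE_LIMIT_MS)
    return 100.0 * frozen_ms / total_ms

# 10 s clip with one 500 ms freeze and one 400 ms black interval:
# only the 500 ms freeze counts towards freezing
frames = [(50, False)] * 182 + [(500, False), (400, True)]
print(freezing_percent(frames))  # 5.0
```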
The left column displays technical values, such as freezing and frame rate. The next
column displays the outcomes of the different detectors as described in this document.
Furthermore, the window contains some basic information, such as the application scenario, protocol, and player information.
The lower part of the NQDI window displays per-frame information. The upper chart
shows the inter-frame differences along with the blurring and blockiness value for each
frame. The bars that indicate the inter-frame difference are green for regular frames,
blue for repeated frames, and black for detected black frames. Freezing intervals and
mono colour frames are marked with a shaded background for easy visibility.
The second chart displays results from the content and scene analysis, which is
restricted to the audio activity (channel active / inactive) and detected scene changes
(vertical black lines). The scene analysis will be extended in upcoming releases
of VMon.
Format  VMon08 (per file)  VMon08 (per condition)
QCIF    0.73 (0.70)        0.91 (0.37)
CIF     0.63 (0.78)        0.81 (0.51)
VGA     0.52 (0.92)        0.75 (0.57)
Since VMon is optimized for the smaller image sizes that are currently used in mobile
services, its accuracy is best for QCIF and similar resolutions and is still acceptable
for CIF resolutions; for VGA resolutions, VMon can only provide a rough categorization
of visual quality. The next use case described below, however, allows the use of VMon
even for VGA.
Criteria A, false rejection: MOS > 2.7 and VMon < 2.5
Criteria B, false acceptance: MOS < 2.3 and VMon > 2.5
Table 4-6: False acceptance and false rejection ratio of all experiments for each format

Format  False rejection  False acceptance
QCIF    7.6 %            2.8 %
CIF     11.5 %           3.0 %
VGA     15.6 %           4.8 %
The results in table 4-6 show that an alarm is incorrectly raised in only approximately 3
to 4 % of the cases on a per-sample basis. However, quality problems that are not identified remain in a range of 8 % to 15 %. This asymmetry is useful in practice: it avoids
false alarms and focuses attention on cases where the quality has dropped with high confidence.
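Criteria A and B translate directly into threshold checks. The sketch below shows how false rejection and false acceptance rates could be counted for a set of (subjective MOS, VMon) pairs; the function name and sample data are invented for illustration:

```python
def false_rates(pairs):
    # pairs: (subjective_mos, vmon_mos) per sample. The band between
    # 2.3 and 2.7 around the VMon decision point of 2.5 excludes
    # borderline cases from both counts.
    rejections = sum(1 for mos, vmon in pairs if mos > 2.7 and vmon < 2.5)
    acceptances = sum(1 for mos, vmon in pairs if mos < 2.3 and vmon > 2.5)
    n = len(pairs)
    return 100.0 * rejections / n, 100.0 * acceptances / n

samples = [(3.1, 2.2), (2.0, 3.0), (4.0, 3.8), (1.5, 1.4)]
print(false_rates(samples))  # (25.0, 25.0)
```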
In a real-world application, such decisions are not based exclusively on a MOS.
Instead, they also take partial results of the analysis into account, which leads to
even more reliable decisions.
In summary, no-reference models can deliver worthwhile results in certain applications
that cannot be addressed by full-reference approaches.
Blockiness: Visible block borders caused by compression during encoding
Tiling: Visible macro-block and slice edges caused by encoding or transmission errors
Blurring: Loss of sharp edge details caused by strong compression or decoding filters
Perceptual difference: Perceived difference between matched frames of the reference and the degraded video sequence
These perceptual degradation measures are the basis for the MOS prediction. The degradation measures for blockiness, tiling, blurring, and jerkiness are the same as for
VMon.
For more information, see chapter 4.2, "Perceptual Degradation and MOS Prediction",
on page 20.
Root causes use a technical scale that ranges from 0 % to 100 %, where 0 % is no
degradation and 100 % is the maximum possible degradation.
The degradation percentage values are reported with respect to the reference value,
that is, a 0 % value means that the transmitted sequence has not been degraded with
respect to the reference sequence.
Dominating Frame Rate, Freezing, Black Frame Ratio: For more information,
see chapter 4.2, "Perceptual Degradation and MOS Prediction", on page 20.
Matched Frames: Relative number of frames in the coded and transmitted video
sequence that match a frame in the reference video sequence.
This value is calculated with respect to the total number of frames in the transmitted video sequence. In other words, 100 % of matched frames means that all
frames of the coded and transmitted video sequence could be matched to a frame
of the reference sequence.
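As a sketch of this value, assuming the frame matcher already produced one flag per transmitted frame (the names are illustrative):

```python
def matched_frames_percent(match_flags):
    # match_flags: True for every frame of the coded and transmitted
    # sequence that could be matched to a frame of the reference.
    return 100.0 * sum(match_flags) / len(match_flags)

print(matched_frames_percent([True] * 18 + [False] * 2))  # 90.0
```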
Frame Jitter: Standard deviation of the frame display time where a high value of
the frame jitter is the result of irregular video playback.
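Frame jitter is then simply the standard deviation of the per-frame display times. Whether the population or the sample form is used is not stated; the sketch below uses the population standard deviation:

```python
import statistics

def frame_jitter(display_times_ms):
    # Perfectly regular playback gives a jitter of 0; alternating
    # short and long display times give a large jitter.
    return statistics.pstdev(display_times_ms)

print(frame_jitter([40.0] * 25))               # 0.0
print(frame_jitter([20.0, 60.0, 20.0, 60.0]))  # 20.0
```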
where f_temporal is a function of the temporal degradations and f_spatial_fullRef is a function of the spatial degradations, which is dominated by the perceptual difference measure.
The maximum predictable MOS is 4.5 and the minimum is 1.0.
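The functions f_temporal and f_spatial_fullRef themselves are not published; only the output range is documented. That range implies a final clipping step, which can be sketched as follows (the function name and the raw score are illustrative):

```python
def clip_mos(raw_score):
    # Predictions are limited to the documented MOS range [1.0, 4.5].
    return min(4.5, max(1.0, raw_score))

print(clip_mos(5.2))  # 4.5
print(clip_mos(0.3))  # 1.0
print(clip_mos(3.7))  # 3.7
```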
The individual results are explained in the earlier sections of this document. The main
value, visual quality (estimated MOS), is rated into one of five categories. You can configure these categories individually in NQDI.
If a video sequence has an audio track, VQuad also performs an audio-video synchronization evaluation. The result of this lip-sync evaluation is displayed on the lower right
of the window.
In the time domain charts, the results are sub-divided into two sections. The first diagram shows the inter-frame differences and the IP throughput during IP streaming services. The inter-frame differences provide information about the movement and the
temporal complexity. The IP-throughput is drawn along the time axis as a red line.
The lower chart shows the per-frame results for blurring, blockiness, and PSNR.
For performance reasons, the clips that are used for VQuad have a small watermark in
the last lines of each clip. VQuad uses this watermark to assign the correct reference
clip. An individual marker is included in each frame so that the match between a
received frame and the corresponding reference frame can be found efficiently. The
marker lines are ignored in the quality analysis.
You cannot analyze VGA or SD contents on the Diversity platform. Instead, these
image sizes must be hosted by PC server applications.
A Acknowledgements
SwissQual would like to thank VQEG and the parties that were involved in the multimedia test phase I.
The video data and the subjective scores that were used for the development of VMon and
VQuad were provided to VQEG by the companies listed below:
Acreo AB (Sweden)
CRC (Canada)
FUB (Italy)
KDDI (Japan)
INTEL (USA)
IRCCyN (France)
NTIA/ITS (USA)
NTT (Japan)
OPTICOM (Germany)
Psytechnics (UK)
Symmetricom (USA)
Glossary: Abbreviations
M
MOS: Mean Opinion Score
V
VQEG: Video Quality Experts Group, an independent international forum for video quality evaluation metrics