

3, SEPTEMBER 2010 269

Full-Reference Video Quality Metric for Fully Scalable and Mobile SVC Content
Hosik Sohn, Hana Yoo, Wesley De Neve, Cheon Seog Kim, and Yong Man Ro, Senior Member, IEEE

Abstract—Universal Multimedia Access (UMA) aims at enabling a straightforward consumption of multimedia content in heterogeneous usage environments. These usage environments may range from mobile devices in a wireless network to high-end desktop computers with wired network connectivity. Scalable video content can be used to deal with the restrictions and capabilities of diverse usage environments. However, in order to optimally tailor scalable video content along the temporal, spatial, or perceptual quality axis, a metric is needed that reliably models subjective video quality. The major contribution of this paper is the development of a novel full-reference quality metric for scalable video bit streams that are compliant with the H.264/AVC Scalable Video Coding (SVC) standard. The scalable video bit streams are intended to be used in mobile usage environments (e.g., adaptive video streaming to mobile devices). The proposed quality metric allows modeling the temporal, spatial, and perceptual quality characteristics of SVC-compliant bit streams by taking into account several properties of the compressed bit streams. These properties include the temporal and spatial variance of the video content, the frame rate, the spatial resolution, and PSNR values. An extensive number of subjective experiments have been conducted to construct and validate our quality metric. Experimental results show that the average correlation coefficient for the video sequences tested is as high as 0.95 (compared to a value of 0.60 when only using the traditional PSNR quality metric). The proposed quality metric also shows a performance that is uniformly high for video sequences with different temporal and spatial characteristics.

Index Terms—QoS, quality measurement, quality metric, SVC.

Manuscript received January 11, 2009; revised February 11, 2010; accepted May 05, 2010. Date of publication June 07, 2010; date of current version August 20, 2010.
The authors are with the Image and Video Systems Laboratory, Korea Advanced Institute of Science and Technology, Daejeon 305-732, Republic of Korea.
Color versions of one or more of the figures in this paper are available online.
Digital Object Identifier 10.1109/TBC.2010.2050628
0018-9316/$26.00 © 2010 IEEE

I. INTRODUCTION

The demand for ubiquitous consumption of multimedia resources is steadily increasing. In order to have access to multimedia resources, consumers are relying on a plethora of networks and terminals, ranging from mobile devices in a wireless network to high-end desktop computers with wired network connectivity. All of these different networks and terminals come with particular restrictions and capabilities, such as bandwidth availability, display resolution, energy consumption, and computational power. Moreover, dependent on their physical capabilities, end-users may have different preferences on how to consume multimedia content. Universal Multimedia Access (UMA) [1], [2] strives for enabling a straightforward consumption of multimedia content in diverse usage environments. Scalable coding can be seen as one of the most important tools to realize the UMA vision, as it allows optimizing the End-to-End Quality of Service (E2E QoS) [3]–[6] in systems for multimedia delivery and consumption.

Scalable Video Coding (SVC) is a new standard developed by the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The SVC specification enables the creation of video bit streams that can be adapted along the temporal, spatial, and Signal-to-Noise Ratio (SNR) scalability axis, respectively resulting in an adjustment of the frame rate, the spatial resolution, and the perceptual quality. Consequently, using a combination of the aforementioned adaptation possibilities, it is possible to create video bit streams that offer a diverse set of spatial resolutions, frame rates, and perceptual quality levels for a given target bit rate, without requiring complicated transcoding operations [7], [8]. As such, it should be clear that SVC-compliant bit streams are of particular interest for realizing a UMA-enabled multimedia system.

To satisfy the high E2E QoS requirements of present-day and future multimedia systems, it is essential to have a thorough understanding of subjective video quality, i.e., video quality as experienced by end-users. Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR) are two frequently used methods for the objective assessment of video quality. Their computation is simple and straightforward. However, it is also well-known that significant discrepancies can often be observed between MSE or PSNR values on the one hand and the perceived video quality on the other hand. Furthermore, when the frame rate, spatial resolution, and perceptual quality are jointly adjusted for a scalable video bit stream, the computed MSE and PSNR values often do not reflect the subjective quality [9], [10].

A significant number of research efforts have been dedicated to the construction of objective video quality metrics that aim at better modeling of subjective quality than MSE and PSNR. The Video Quality Experts Group (VQEG) is the most well-known group of experts working in the field of video quality assessment [11]. Their work has been used by the ITU as the basis for several recommendations. An example of such a recommendation is ITU-T J.247, which deals with objective perceptual multimedia video quality measurement in the presence of a full reference [12]. S. Winkler and E. Ong et al. have been working on video quality assessment techniques using particular properties of the Human Visual System (HVS), such as human color perception and the theory of opponent colors [13]–[16]. The National Telecommunications and Information Administration (NTIA) [16] has proposed a video quality metric that provides

an estimate of the subjective quality of video content by comparing several parameters between the original and the distorted video sequence. These parameters include edge, texture, color, and angle information.

Quality assessment in broadcasting systems has also been investigated [17], [18]. Using the reduced-reference quality metric discussed in [16], M. Pinson and S. Wolf discuss a broadcasting system in which the quality of the video content is measured in the terminal responsible for the playback of the video content [18]. A low-complexity quality metric had to be devised, as delay needs to be minimized in real-time broadcasting systems. In [19], a comparison can be found of different quality metrics, as well as a system that classifies quality metrics according to their objective or subjective nature. In their conclusions, the authors claim that the construction of a “general-purpose” quality metric might be too complex, and that application-specific quality evaluation is more sensible. The authors also conclude that more research should be focused on quality evaluation of recent image and video coding formats.

Quality metrics have also been proposed that directly make use of coded bit stream characteristics, such as frame and bit rate, instead of relying on properties of the human visual system [9], [10], [20]. Video quality is also strongly dependent on the coding format used. Hence, in the scientific literature, quality metrics have been proposed that are format specific. In [21]–[23], quality metrics are proposed for H.262/MPEG-2 Video and H.264/AVC. However, all of the aforementioned quality metrics are not able to take into account both the spatial resolution and the frame rate of video bit streams. This is in practice a significant problem when measuring the quality of SVC bit streams, because these bit streams can be adapted along the temporal, spatial, and perceptual quality axis.

As video content is increasingly consumed in mobile usage environments, several quality metrics have been proposed that take into account a low spatial resolution or wireless connectivity [24]–[26]. The quality metric in [9] is for instance parameterized in terms of frame rate, motion magnitude, and PSNR. That way, it is possible to deal with joint adjustments of the quantization parameter and the frame rate when a given target bit rate has to be maintained. However, information regarding the spatial resolution is not incorporated in the quality metric, although the importance of spatial resolution is high in the context of mobile video content [27]. In [10], a quality metric for fully scalable SVC bit streams has been proposed, taking into account the frame rate, motion information, PSNR, and the spatial resolution. However, the effect of the spatial resolution is only partially considered, as the quality metric does not consider the influence of video characteristics such as edge information. Also, in the experiments, video sequences are used that are similar in terms of spatial detail.

The spatial and temporal characteristics of video sequences exert a dominant influence on video quality [9], [16], [23], [28]. Therefore, it is necessary to take into account these characteristics in order to improve the performance of a quality metric. In [29], [30], a quality metric has been proposed that allows dealing with several video genres, like action and drama. In this method, the use of the quality metric is divided into two steps: the first step consists of genre classification of the consumed video content, while the second step measures the video quality by taking into account the classified genre.

This paper discusses a full-reference quality metric for video bit streams compliant with SVC. The proposed quality metric takes into account the frame rate and the standard deviation of the motion magnitude, as well as information regarding the spatial resolution, the PSNR, and the edge complexity of each picture. Further, we also target a mobile usage environment in this paper. Therefore, the spatial resolution of the video bit streams varies between QCIF (Quarter Common Intermediate Format; 176 × 144) and CIF (Common Intermediate Format; 352 × 288). Note that this paper does not address the impact of packet loss on video quality, which is also an important characteristic of mobile usage environments.

The potential use of video quality metrics is diverse. Video quality metrics can be used for determining optimal compression parameters during encoding [30] and for monitoring the video quality in digital broadcasting networks [31]. In addition, video quality metrics can be used for guiding the decision-making process in extraction software for scalable bit streams. The latter functionality makes it possible to achieve a maximum E2E QoS when a bit stream extractor has several adaptation possibilities to meet a particular target bit rate.

This paper is organized as follows. Section II discusses the overall setup of the experiments that were conducted to collect subjective quality data, as well as the methodology used to measure PSNR. This section also discusses how the spatial and temporal variance of video content affects the quality of the video content, and how to quantify their influence. The analysis of the collected quality data and the process of quality metric modeling are presented in Section III. Finally, a performance evaluation of the proposed quality metric is provided in Section IV, while Section V concludes this paper.

II. SUBJECTIVE QUALITY ASSESSMENT

In order to construct a metric that is able to reflect video quality in a reliable way, several subjective quality assessments need to be performed. The obtained experimental results are used for two purposes in our research: for the construction of a quality metric in a step-by-step approach on the one hand (where each step relies on experiments using different settings and video sequences), and for the independent verification of the reliability of our quality metric on the other hand. Therefore, the overall setup of our subjective experiments is described first in this section. Further, this section also pays attention to the methodology used for measuring PSNR (as PSNR measurement is more complicated for scalable video content than for non-scalable video content). Finally, this section also discusses how quality is influenced by the spatial and temporal characteristics of video content, as well as how to quantify the spatial and temporal complexity of a particular video sequence.

A. Experimental Environment

The settings for our subjective experiments were in line with the requirements of ITU-R recommendation BT.500-11 [32]. The total number of human observers per experiment was 16. A 30 inch LCD monitor was used (DELL3007WFB), with the viewing distance between the participants and the monitor
fixed to six times the height of a video sequence. To avoid constructing a quality metric that can only deal with video content of a specific nature, twelve video sequences were used, covering a wide range of spatial and temporal characteristics: Harbor, Mobile, Silent, and Crew, as well as video content originating from regular TV programs. Harbor, Mobile, Silent, and Crew are online available [33]. Snapshots of the video sequences used are shown in Fig. 1: (a) Harbor, (b) Soccer Game, (c) Mobile, (d) Interview, (e) Snow Forest, (f) Silent, (g) Child, (h) Crew, (i) Rain, (j) Soccer, (k) Mountain, and (l) Soccer Ground.

Fig. 1. Video sequences used for quality metric construction and validation.

The original version of each video sequence has CIF resolution, a fixed frame rate of 30 frames per second (fps), and a total duration of eight seconds. The video sequences were encoded and decoded using the Joint Scalable Video Model (JSVM) 9.16 reference software. Given a particular video sequence and a Quantization Parameter (QP) value, a scalable bit stream was generated with six spatial layers, making it possible to vary the spatial resolution between QCIF and CIF. To be more precise, the following spatial resolutions were used: 176 × 144, 224 × 176, 256 × 208, 288 × 224, 320 × 256, and 352 × 288. While a number of these spatial resolutions are not frequently used in practice for the coding of mobile video content (i.e., 256 × 208 and 288 × 224), the higher amount of variation in terms of spatial resolution allowed to more accurately model the spatial component of the quality metric proposed in this paper.

Also, the need for fine-grained spatial scalability during the construction of our quality metric implied that our bit streams are compliant with the Scalable High Profile of SVC. Indeed, in the Scalable Baseline Profile, which explicitly targets low-complexity decoding in mobile usage environments, support for spatial scalable coding is restricted to resolution ratios of 1.5 and 2 between successive spatial layers in both horizontal and vertical direction. This restriction does not hold true for the Scalable High Profile. It is also important to note that, while the Scalable High Profile is used to create bit streams needed for constructing our quality metric, the Scalable Baseline Profile is used to encode bit streams for verifying the reliability of the proposed quality metric (see Section IV).

Fig. 2. Scalable bit stream creation.

Further, each spatial layer consisted of three temporal layers, making it possible to offer three different frame rates for a given spatial resolution: 7.5 fps, 15 fps, and 30 fps. Due to a restriction of the reference software regarding the maximum number of layers in a scalable bit stream, SNR scalability was realized by encoding a video sequence six times using the spatial and temporal configuration as previously discussed, each time using a fixed QP value for all spatial layers. Encoding settings were also selected in order to reflect the typical requirements of mobile usage environments (e.g., use of CAVLC instead of CABAC, a base layer compatible with single layer H.264/AVC, use of an IPPPPPPP coding pattern in the original bit streams, etc.). The way the scalable bit streams were created is also summarized in Fig. 2. The construction of our quality metric consisted of several steps. For each step, a subset of the total number of available video sequences was used. This selection process is described in more detail in Section III.
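
To make the size of the resulting test space concrete, the following Python sketch enumerates the operating points that one video sequence offers under the configuration described in this subsection: six spatial resolutions, three frame rates, and six QP values. The six QP values (20 to 45 in steps of 5) are the ones listed in Section III-B. The constant names and the dictionary layout are assumptions made for this sketch, and the sketch does not imply that every combination was shown to observers, since each construction step only used a subset.

```python
from itertools import product

# Spatial layers between QCIF and CIF, as (width, height) in pixels.
SPATIAL_RESOLUTIONS = [
    (176, 144), (224, 176), (256, 208),
    (288, 224), (320, 256), (352, 288),
]
# Temporal layers available for every spatial layer.
FRAME_RATES = [7.5, 15.0, 30.0]
# One encoding run per QP value (SNR scalability by repeated encoding).
QP_VALUES = [20, 25, 30, 35, 40, 45]


def operating_points():
    """Return every (resolution, frame rate, QP) combination for one sequence."""
    return [
        {"width": width, "height": height, "fps": fps, "qp": qp}
        for (width, height), fps, qp in product(
            SPATIAL_RESOLUTIONS, FRAME_RATES, QP_VALUES
        )
    ]


if __name__ == "__main__":
    points = operating_points()
    print(len(points))  # 6 resolutions * 3 frame rates * 6 QP values
```

With twelve source sequences, exhaustively grading every operating point would be impractical, which is consistent with the step-by-step construction in which each experiment fixes some of the three axes.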

When conducting subjective tests, the way the video quality is graded may depend on “context effects”. Context effects are observed when the perceived video quality of one video sequence is influenced by the perceived video quality of the other video sequences included in the subjective experiment. As explained in [34], among several double stimulus methods, the influence of contextual effects is minimal when the Double Stimulus Continuous Quality Scale (DSCQS; [32]) is used. Further, DSCQS was also used in [9] for grading test sequences with a low spatial resolution of 320 × 192 (the quality metric discussed in [9] was used as a starting point for the construction of the quality metric proposed in this paper). Therefore, in order to model and evaluate quality metrics in our research, we have selected DSCQS using the Differential Mean Opinion Score (DMOS) for measuring video quality using subjective experiments. Note that a single stimulus method could also have been used for the purpose of grading low-resolution and low-quality videos [35]. Further, it is worth mentioning that the perceived quality may also depend on the order in which the video sequences are shown to the participants, an observation known as “the order effect”. Order effects, which are present with all subjective methodologies, are typically addressed by randomizing the presentation order of the video sequences in subjective experiments.

The DMOS method consists of randomly displaying two video sequences A and B, where one of the video sequences is the reference sequence and where the other video sequence is the impaired sequence. An opinion score is subsequently assigned to the video sequences A and B. This opinion score may range from 0 to 100. The differential mean opinion score (DMOS) for the perceived quality of the impaired sequence is obtained by subtracting the DMOS score of the reference sequence from the DMOS score of the impaired sequence. In this paper, a DMOS value close to zero means that the impaired sequence is similar to the reference one. On the other hand, a DMOS value far from zero means that the impaired sequence is distorted. The worse the quality of the impaired sequence, the farther from zero the DMOS value. As such, a DMOS value of ‘−100’ is considered to represent the worst quality possible.

The score of the quality metric proposed in this paper ranges from 0 to 100, where a quality score of ‘100’ is considered to represent the best Subjective Quality (SQ). Therefore, for convenience of modeling, DMOS is converted to SQ using (1). That way, SQ ranges from 0 (the lowest possible quality) to 100 (the highest possible quality).

$$SQ = 100 + DMOS \qquad (1)$$

Strictly speaking, the differential score could also have been determined by subtracting the DMOS score of the impaired sequence from the DMOS score of the reference sequence. This would have avoided using the transformation in (1). Further, the obtained DMOS data were not scanned for unreliable and inconsistent results. As such, outliers were not removed. However, despite the fact that no outliers were removed, the average size of the 95% confidence intervals is 2.13 on the 0-100 scale used, indicating a reasonable level of agreement between observers.

B. PSNR Measurement

SVC bit streams may offer an arbitrary combination of spatial, temporal, and SNR scalability. Adjusting the spatial resolution or frame rate of an SVC bit stream results in problems when measuring PSNR, as the use of this video quality metric assumes that the spatial resolution and the frame rate of the original and the impaired video sequence are the same. Therefore, a number of measures had to be taken in order to make the computation of PSNR values possible for arbitrarily adjusted scalable bit streams.

To achieve spatial scalability, encoding by the JSVM reference software is done using different versions of a particular video sequence, where each version has a different spatial resolution. These different versions of the original video sequence can be generated using a downsampling tool that is part of the JSVM reference software (in our experiments, the downsampling method used was based on the Sine-windowed Sinc-function). As such, in our experiments, the PSNR was measured between a spatially downsampled version of the original video sequence and the impaired video sequence. That way, both the reference and the impaired video sequence have the same spatial resolution.

Fig. 3. Process for measuring PSNR.

As previously discussed, for ease of display during the subjective quality tests, frame copy operations were applied to video sequences that were temporally down-sampled to 15 fps and 7.5 fps. This approach also makes it possible to perform PSNR measurements in a straightforward way. The way PSNR is measured is also visualized in Fig. 3. Note that the PSNR values in our research are computed in the luma domain, relying on the same formulas as used in [36]. In particular, the PSNR values in our research are computed using (2):

$$PSNR = 10 \log_{10}\left(\frac{MAX^2}{MSE}\right) \qquad (2)$$

where $MAX$ denotes the maximum luma value ($MAX$ is equal to 255 when the pixel depth is equal to eight bits per pixel component). The MSE is computed using (3):

$$MSE = \frac{1}{W \cdot H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left[ O(x, y) - E_{q,f}(x, y) \right]^2 \qquad (3)$$

where $x$ and $y$ respectively represent the coordinate of a pixel along the horizontal axis and the vertical axis, and where $H$ and $W$ respectively represent the height and width of the video sequence. As such, $O(x, y)$ denotes a pixel value in the original video sequence at coordinate $(x, y)$, where the video sequence in question has a spatial resolution of $W \times H$. Similarly, the notation $E_{q,f}(x, y)$ denotes a pixel value in an experimental video sequence, having a spatial resolution of $W \times H$, encoded with a QP value equal to $q$, and having a frame rate equal to $f$ (where $f$ is the frame rate of the adapted video sequence, i.e., the video sequence obtained after adaptation, but before applying frame copy operations).

C. Spatial Characteristics

The spatial complexity of a video sequence affects the subjective quality [16], [28]. If pictures have the same distortion measured in terms of MSE or PSNR, then the subjective quality increases when the spatial complexity increases [16]. Moreover, people have a stronger preference for the use of higher spatial resolutions in case of pictures with a high spatial complexity.

The spatial complexity of a picture can be computed using several approaches. In [16], the luma values of the original and impaired pictures are first processed with horizontal and vertical edge filters. These filters allow enhancing the edges in the pictures in question, while reducing noise. The enhanced edges are then used for computing an estimate of the spatial complexity. In [12], [13], another technique called spatial texture masking is used. In our research, the spatial complexity of a video sequence is computed by relying on a metric known as Spatial Variance (SV). This spatial complexity metric is derived from the MPEG-7 edge histogram algorithm [36], [38]. The SV value that is computed using this algorithm is resolution invariant [36]. Consequently, only one SV value needs to be calculated for a scalable video sequence encoded at different spatial resolutions. In particular, the SV value is computed using the formula

(4)

where the two indices respectively represent the frame index and the type of edge histogram value, and where two further symbols respectively represent the total number of frames and the total number of types of edge histogram values. In the MPEG-7 edge histogram algorithm, one picture is divided into 16 sub blocks. For each sub block, 5 types of edge histogram values (vertical, horizontal, 45°, 135°, and non-direction) are calculated [36]. The averaged term in (4) denotes the average edge histogram value, for a given type of edge histogram value, of all sub blocks in a given frame. The SV values for the video sequences used in our research are presented in Table I. The higher the SV value, the higher the spatial complexity of a video sequence.

TABLE I
SV VALUES FOR THE VIDEO SEQUENCES USED

D. Temporal Characteristics

Besides spatial characteristics, the subjective quality of a video sequence is also affected by temporal characteristics, such as the speed of motion and the variation of the motion speed. For example, the difference in subjective quality at 7.5 fps and 30 fps is negligible for static video sequences as such video sequences are typically characterized by slow changes in motion. However, discrepancies in subjective quality are often easily noticeable for dynamic video sequences, which are usually characterized by fast changes in motion. Indeed, the subjective quality perceived for dynamic video sequences is highly sensitive to dropped or repeated frames.

The temporal complexity of a video sequence can be determined by investigating its motion vectors. In this paper, we use a temporal complexity metric known as Temporal Variance (TV). Similar to the spatial complexity metric, our temporal complexity metric is derived from an MPEG-7 concept known as motion activity, which denotes the intuitive notion of ‘intensity of action’ or ‘pace of action’ in a video sequence [39], [40]. Motion activity is computed using the standard deviation of the motion magnitude, which denotes the average value of the motion vectors. A high value for the motion magnitude indicates high activity, while a low value for the motion magnitude indicates low activity. Both motion magnitude and its standard deviation can be used to reflect how the subjective quality is influenced by the frame rate. However, the standard deviation of the motion magnitude reflects motion slightly better than the motion magnitude [39]. The TV values for the video sequences used in our research are presented in Table II. The higher the TV value, the higher the temporal complexity of a video sequence.

III. QUALITY METRIC CONSTRUCTION

The subjective quality of a video sequence is affected by SV and TV [12], [13], [16], [28]. Hence, our proposed quality metric considers both SV and TV in order to quantify the subjective quality. The construction of our quality metric is done using the step-by-step approach summarized in Fig. 4.
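
As an illustration of the measurement procedure in Section II-B, the following Python sketch computes a luma PSNR in the standard form PSNR = 10 log10(MAX^2 / MSE), after restoring the original frame count by frame repetition. The function names, the nested-list representation of a luma picture, and the averaging of per-frame PSNR values over the sequence are assumptions made for this sketch; they are not taken from the JSVM tools or from the paper. The reference frames are assumed to be already downsampled to the resolution of the impaired frames.

```python
import math


def repeat_frames(frames, factor):
    """Frame copy: repeat each frame `factor` times (e.g. 15 fps -> 30 fps with factor 2)."""
    return [frame for frame in frames for _ in range(factor)]


def mse(reference, impaired):
    """Mean squared error between two luma pictures given as lists of rows."""
    height, width = len(reference), len(reference[0])
    if len(impaired) != height or any(len(row) != width for row in impaired):
        raise ValueError("pictures must have the same spatial resolution")
    total = sum(
        (ref_px - imp_px) ** 2
        for ref_row, imp_row in zip(reference, impaired)
        for ref_px, imp_px in zip(ref_row, imp_row)
    )
    return total / (width * height)


def psnr(reference, impaired, max_value=255):
    """PSNR in dB; max_value is 255 for eight-bit luma samples."""
    error = mse(reference, impaired)
    if error == 0:
        return math.inf  # identical pictures
    return 10 * math.log10(max_value ** 2 / error)


def sequence_psnr(reference_frames, impaired_frames, copy_factor=1, max_value=255):
    """Average per-frame PSNR after undoing temporal downsampling by frame copy."""
    restored = repeat_frames(impaired_frames, copy_factor)
    if len(restored) != len(reference_frames):
        raise ValueError("frame counts differ after frame copy")
    values = [psnr(ref, imp, max_value) for ref, imp in zip(reference_frames, restored)]
    return sum(values) / len(values)
```

For a sequence adapted from 30 fps to 7.5 fps, `copy_factor` would be 4, so that every reference frame is compared against the most recent decoded frame.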

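
A minimal Python sketch of the temporal complexity measure described in Section II-D is given below. It takes the motion magnitude of each block as the Euclidean length of its motion vector and uses the standard deviation of those magnitudes within a frame. How the per-frame values are pooled into one value per sequence is not specified in the text above, so the simple mean used here is an assumption, as are the function names and the input layout (one list of `(dx, dy)` pairs per frame). The values produced by this sketch are therefore not expected to match Table II.

```python
import math
import statistics


def motion_magnitudes(motion_vectors):
    """Euclidean length of each (dx, dy) motion vector of one frame."""
    return [math.hypot(dx, dy) for dx, dy in motion_vectors]


def temporal_variance(frames):
    """Mean over frames of the standard deviation of the motion magnitude.

    `frames` is a list with one entry per frame; each entry is a list of
    (dx, dy) motion vectors. Frames without motion vectors are skipped.
    """
    per_frame_std = [
        statistics.pstdev(motion_magnitudes(vectors))
        for vectors in frames
        if vectors
    ]
    if not per_frame_std:
        return 0.0
    return statistics.mean(per_frame_std)
```

A static scene, where all vectors are near zero, yields a value near zero, while a scene in which some blocks move quickly and others stay still yields a large value, matching the intuition of low versus high activity.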

Fig. 4. Quality metric construction.

First, an experiment is discussed in which we quantify our proposal to replace the motion magnitude by the standard deviation of the motion magnitude in the quality metric presented in [9]. This modification makes it possible to better take into account the temporal variance of different video sequences. Second, we investigate the influence of the spatial variance on the subjective quality and the PSNR. To quantify this influence, an experiment is conducted that uses video sequences with different QP values, but with a fixed frame rate and a fixed spatial resolution. Third, a quality metric is constructed that is able to take into account both spatial variance and spatial scalability. To quantify the influence of the spatial variation on the subjective quality and the spatial resolution, video sequences are used with different spatial resolutions, but with a fixed frame rate and a fixed QP. Finally, the quality metrics constructed in the different steps are integrated. To realize this, video sequences are used having different QP values, frame rates, and spatial resolutions. The construction of our quality metric is described in more detail in the sections below.

A. Improved QM for Temporal and SNR Scalability

For quality metrics that do not consider the spatial resolution of a video sequence, [9] indicates that the subjective quality is linear with the PSNR at a fixed frame rate. This linear relation is defined by (5):

(5)

In (5), QM denotes Quality Metric, while the abbreviation FR represents a fixed frame rate. The two coefficients in (5) are used to maximize the correlation between QM and the subjective quality. Their precise values are experimentally determined. The second term, which is parameterized on TV and FR, is used to compensate for the linearity between PSNR and SQ. In this paper, TV is derived from the standard deviation of the motion magnitude. This is in contrast to [9], where TV is derived from the motion magnitude. To quantify the improvement that comes with using the standard deviation of the motion magnitude instead of the motion magnitude, we compare the correlation coefficients between the subjective quality SQ and the scores as obtained by the quality metrics defined in [9] and [10]. More precisely, the Pearson product-moment correlation coefficient is used for measuring the correlation [41]. The quality metrics in [9] and [10] only take into account the PSNR and the frame rate. The difference between the two quality metrics is completely determined by the definition of TV. The coefficients are different as well. However, the values of these coefficients are optimized according to the definition of TV used (in other words, changing the definition of TV in the quality metric defined in [9] also requires recalculating the values of the coefficients).

The subjective experiment, performed to quantify the improvement that comes with using the standard deviation of the motion magnitude instead of the motion magnitude, relied on the use of the video sequences Harbor, Soccer Ground, and Akiyo. These video sequences were not used in [9] and [10]. The video sequences in question were encoded at two spatial resolutions, QCIF (176 × 144) and CIF (352 × 288), a frame rate of 7.5, 15, and 30 fps at each spatial resolution, and a QP value of 25, 35, and 45 at each frame rate, resulting in a total of 54 video sequences used. The results obtained for all video sequences are presented in Table III. Note that the correlation coefficients in Table III were measured using the coefficient values that best fit the subjective quality, making use of multiple regression analysis based on the Least Squares Method (LSM).

B. Influence of SV on SQ-PSNR Relation

In addition to TV, the relation between SQ and PSNR is also influenced by SV [12], [13], [16]. To incorporate this observation in the modeling of our quality metric, an understanding is

needed of how SV influences the SQ-PSNR relation. The resolution and frame rate of the video
sequences used in this experiment are fixed to CIF and 30 fps. Further, six different QP values
(20, 25, 30, 35, 40, 45) are used, as described in Section II-A. Consequently, the total number
of video sequences used is 72, all having the same spatial resolution and frame rate, but a
different PSNR. The influence of SV on the SQ-PSNR relation is illustrated in Fig. 5.

Fig. 5. Influence of SV on the SQ-PSNR relation.

In Fig. 5, the SV value is divided into three levels: Low, Normal, and High. The video sequences
used in the experiment are classified according to these three levels. Each line in Fig. 5
denotes the average SQ value as obtained for the video sequences that are classified into the
same SV level. The criterion used for the classification of the video sequences is the
quantization table that comes with the computation of an MPEG-7 edge histogram [42]. This
quantization table is presented in Appendix A. The range of each level is given as follows:
, , . For instance, a video sequence with an SV value of 0.13 is assigned to Normal SV.
According to this classification, Harbor, Soccer Game, and Mobile are assigned to High SV,
Interview, Snow Forest, Silent, Child, and Crew are assigned to Normal SV, while Rain, Soccer,
Mountain, and Soccer Ground are assigned to Low SV.

As shown in Fig. 5, the shape of the SQ-PSNR relation denotes a log function that is fully
saturated when the SQ value is equal to 100. When the value of SV is increasing, the slope of
the curve in Fig. 5 is decreasing. This means that an observer grades an experimental video
sequence with a high SV as being of better quality, although the PSNR value for all video
sequences is the same, an observation also made in [13] and [16]. For example, when the PSNR
value is close to 30 dB, the SQ value at High SV, Normal SV, and Low SV is respectively equal
to 85, 70, and 50. Therefore, the quality metric as shown in (5), which demonstrates a linear
relationship between the PSNR and the SQ, will not perform well for video sequences with a high
SV. In particular, the error will be enlarged for video sequences with a high SV, as the SQ-PSNR
relation is far from linear in these cases (as can be seen in Fig. 5, the SQ-PSNR relation is
only linear for video sequences having a low SV). To solve this problem, we propose to make use
of instead of the traditional PSNR metric. is able to better reflect the subjective quality of
a particular video sequence by taking into account the SV of the video sequence in question. As
a result of the conducted subjective experiments, the SQ-PSNR relation as affected by SV is
described in (6).

In (6), represents a normalized PSNR value, from [20, 45] to [0, 1]. When a PSNR value exceeds
a value of 45, we have found that an observer cannot distinguish the experimental video sequence
from the original video sequence. The SQ value is considered to be equal to 100 in that
particular case. Further, we also assume that the underlying network guarantees a minimum
picture quality that is equal to a PSNR value of 20. In (6), 1.54–7.61 is derived from the
relation between the SV values and the shape of the SQ-PSNR relation (using the Findgraph v1.87
graphing tool [43], minimizing the standard deviation error). Finally, taking into account our
observations regarding the use of the standard deviation of the motion magnitude and the
influence of SV on the SQ-PSNR relation, the quality metric that considers temporal scalability,
SNR scalability, and spatial variation is defined as :

(7)

Note that our perceptual quality metric is built on top of PSNR, taking advantage of its ease of
computation and the fact that this quality metric is well-understood and widely used by the
video coding community. The proposed quality metric addresses a number of shortcomings of PSNR
that become apparent when targeting quality measurement in the context of fully scalable video
content (that is, support for temporal, spatial, and SNR scalability). In particular, our
quality metric uses PSNR as a parameter that reflects the visual distortion that results from
the use of quantization, whereas other parameters in our perceptual quality metric take into
account the temporal and spatial variance of the video content, as well as the frame rate and
the spatial resolution of the video content.

C. Influence of SV on SQ-Spatial Resolution Relation

The subjective quality of a particular video sequence is affected by both the spatial resolution
and SV [28]. In order to add support for varying spatial resolutions to our quality metric, an
understanding is needed of the relation between SQ and the spatial resolution, and how this
relation is influenced by the spatial variance. The video sequences used in the resulting
subjective experiment have the following characteristics: a fixed frame rate of 30 fps, a fixed
QP with a value of 35, and six different spatial resolutions uniformly ranging from QCIF to CIF.
In total, 72 video sequences were used in this subjective experiment. The outcome of this
subjective experiment is shown in Fig. 6.

In Fig. 6, the criterion for classifying the SV is again the quantization table of the MPEG-7
edge histogram, which is the same criterion as used in Fig. 5 [42]. We have normalized the
height value of the spatial resolution using (8), which implies that the range 144 to 288 is
replaced on the X-axis by the range 20 to 0:

(8)
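The two normalizations described in this section, PSNR from [20, 45] to [0, 1] for (6) and frame height from the range 144 to 288 for (8), can be illustrated with a simple linear rescaling. The sketch below is an assumption rather than a reproduction of (6) or (8): the function names are hypothetical, a linear map is assumed in both cases, PSNR values outside [20, 45] are clamped (following the saturation above 45 and the assumed minimum of 20), and the height is mapped to -20 at QCIF and 0 at CIF, where the negative sign is an assumption motivated by the left half of a Gaussian centred at zero.

```python
def rescale(x, src_lo, src_hi, dst_lo, dst_hi):
    """Linearly map x from [src_lo, src_hi] to [dst_lo, dst_hi]."""
    return dst_lo + (x - src_lo) * (dst_hi - dst_lo) / (src_hi - src_lo)


def normalize_psnr(psnr_db, lo=20.0, hi=45.0):
    """Map a PSNR value (dB) from [lo, hi] to [0, 1], clamping values outside."""
    clamped = min(max(psnr_db, lo), hi)
    return rescale(clamped, lo, hi, 0.0, 1.0)


def normalize_height(height, qcif=144, cif=288, dst_lo=-20.0, dst_hi=0.0):
    """Map frame height so that QCIF -> dst_lo and CIF -> dst_hi.

    The linear form and the sign of dst_lo are assumptions for illustration.
    """
    return rescale(height, qcif, cif, dst_lo, dst_hi)


if __name__ == "__main__":
    for psnr in (15, 20, 30, 45, 50):
        print("PSNR", psnr, "->", normalize_psnr(psnr))
    for height in (144, 216, 288):
        print("height", height, "->", normalize_height(height))
```

Keeping the endpoints as parameters makes it easy to swap in a different target range if the convention of (8) differs from the one assumed here.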

In (8), represents the height, while represents the normalization of . We assume that a video
sequence makes use of a meaningful aspect ratio. Therefore, we also assume that the height is
sufficient to represent the spatial resolution of a particular video sequence, without needing
information about the width of the video sequence. Normalization is performed in order to
simplify our video quality metric.

Fig. 6. Influence of SV on the SQ-spatial resolution relation.

In Fig. 6, each point indicates the average value of the observed subjective quality for each
SV level, while the dotted and dashed lines (Low QM, Normal QM, High QM) are the result of
modeling the subjective quality using (9). For comparison purposes, the quality metric proposed
in [10] and having support for spatial scalability is presented as a bold black line (denoted as
Default in Fig. 6). Note that the quality metric proposed in [10] is not able to take into
account the spatial variance of a video sequence.

In (9), denotes our quality metric with support for spatial scalability. By giving the
coefficients a value of 50, it is possible to obtain a range of 0 to 100 for , which is similar
to the range of the final quality metric proposed in this paper. As shown in Fig. 6, the shape
of the different curves is similar to the left-half of a Gaussian function, with zero being the
position of the center of the peak. Hence, is modeled using a Gaussian function. The width of
the Gaussian function is dependent on the value of SV. A low SV results in a broad width of the
Gaussian function. This observation makes clear that a video sequence with a high SV gets a
lower SQ than a video sequence with a low SV, although both video sequences have the same
spatial resolution. In (9), the constants in the exponent are computed using the relation
between the width of the Gaussian function and SV. As shown in Fig. 6, the shape and position of
Default are similar to the shape and the position of the curve representing a Normal SV.
However, a clear gap can be observed between the Default curve and the curves denoting a Low SV
and a High SV. The reason for this is the fact that the quality metric proposed in [10] is not
able to take into account the spatial variance of a video sequence.

D. QM for Temporal, Spatial, and SNR Scalability

This section describes the construction of our proposed quality metric, having support for
temporal, spatial, and SNR scalability. Using (7) and (9), our quality metric is defined as:

where the PSNR, as defined in [36], is dependent on the spatial resolution and the frame rate.
and respectively indicate (5), i.e., , and (7), i.e., . Multiple regression analysis using the
Least Squared Method (LSM) was used to compute the coefficients , , , and . These coefficients
determine the correlation between the SQ and the quality score obtained by evaluating our
proposed QM.

In order to collect subjective quality data for several combinations of different types of
scalability, Harbor, Crew, and Soccer Ground were used. The three selected video sequences have
diverse and values. Further, each selected video sequence was coded using three different QP
values (25, 35, 45), three different frame rates (7.5, 15, 30), and three different spatial
resolutions (QCIF, 256 × 208, CIF), resulting in a total of 81 video sequences used in this
experiment.

The values of , , and are computed for each video sequence using information about the , ,
PSNR, frame rate, and spatial resolution of the video sequence in question. The SQ value for
each video sequence was experimentally determined. As previously mentioned in this section,
LSM-based multiple regression analysis was applied using the obtained and values. The final
shape of the proposed quality metric is presented below, consisting of an SNR, temporal, and
spatial

(11)

The value is affected by the content characteristics and the bit rates used. Specifically, and
values are highly dependent on the content characteristics, while the PSNR has a high dependency
on the bit rate. The dependency between the parameters in (11) can be summarized as follows: the
second term on the right side of (11) implies that the perceptual video quality becomes higher
as increases at the same bit rate. The third term on the right side of (11) is used to
compensate for the linearity between PSNR and . In particular, this term compensates the
influence of frame copying on PSNR. Finally, the last term on the right side of (11) implies
that a video sequence with a high spatial resolution gets a higher value at the same .

Fig. 7. Video sequences used in the verification experiment.

Note that and seem to resemble the Spatial Information and Temporal Information terms used in
the SITI metric proposed in [44]. However, the motivation and usage of , and , are different in
the respective quality metrics. In [44], and are used to measure the perceptual impairments
that are the result of spatial and temporal artifacts of digitally compressed video. As such,
Sobel-filtered images and difference images between successive frames are used to extract and
values, respectively. On the other hand, the and values are used in our research to reflect the
spatial and temporal characteristics of scalable video content, where the scalable video content
may have been the subject of adaptations along the spatial, temporal, or SNR quality axis.
Strictly speaking, in our quality metric, the second term and the third term in (11) are more
similar to the use of and rather than and .
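The coefficient fitting and the correlation check described in this section can be sketched as follows. This is a minimal illustration, assuming ordinary least squares solved through the normal equations and the Pearson correlation coefficient. The feature columns, the synthetic scores, and all function names are hypothetical; they do not reproduce the terms of (11) or the data of the subjective experiments.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(features, targets):
    """Return coefficients w minimizing sum((X w - y)^2) via the normal equations."""
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(features, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


if __name__ == "__main__":
    # Hypothetical design matrix: a constant column plus three placeholder terms.
    rows = [
        [1.0, 0.2, 0.5, 0.1],
        [1.0, 0.4, 0.1, 0.9],
        [1.0, 0.9, 0.3, 0.4],
        [1.0, 0.6, 0.8, 0.2],
        [1.0, 0.1, 0.7, 0.7],
        [1.0, 0.8, 0.9, 0.6],
    ]
    # Synthetic "subjective" scores generated from known coefficients.
    true_w = [10.0, 2.0, 3.0, -1.0]
    scores = [sum(w * x for w, x in zip(true_w, row)) for row in rows]

    w = fit_least_squares(rows, scores)
    predicted = [sum(wi * x for wi, x in zip(w, row)) for row in rows]
    print("coefficients:", w)
    print("correlation:", pearson(predicted, scores))
```

With real subjective scores the fit would not be exact, and the correlation between predicted and observed scores would serve as the figure of merit.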


This section describes an experiment that was used to verify
the reliability of our video quality metric, as defined by (9).
Three video sequences were used in this experiment. Note that
these video sequences were not used for the actual construc-
tion of our quality metric (discussed in Section III). Each of the
video sequences was coded using three different QP values (25,
35, 45), three different frame rates (7.5, 15, 30), and three different
resolutions (QCIF, 264 × 216, CIF). As such, 81 video sequences
were used for verification purposes. Further, since our
quality metric is targeting deployment in mobile usage environ-
ments, the generated video sequences are strictly compliant with
the Scalable Baseline Profile of SVC.
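The count of 81 test sequences follows from crossing three sequences with three QP values, three frame rates, and three resolutions. The sketch below enumerates that grid; the variable and function names are hypothetical, the sequence names are the ones given later in this section, and the resolution labels are plain strings chosen for illustration.

```python
from itertools import product

SEQUENCES = ["Akiyo", "Foreman", "Fall Road"]
QP_VALUES = [25, 35, 45]
FRAME_RATES = [7.5, 15, 30]
RESOLUTIONS = ["QCIF", "264x216", "CIF"]


def build_test_conditions():
    """Return every (sequence, QP, frame rate, resolution) combination."""
    return list(product(SEQUENCES, QP_VALUES, FRAME_RATES, RESOLUTIONS))


if __name__ == "__main__":
    conditions = build_test_conditions()
    print(len(conditions))  # 3 * 3 * 3 * 3 combinations
    print(conditions[0])
```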
The three original video sequences have significantly different and values. This makes it
possible to show that our quality metric is not specialized for specific types of video content.
The video sequences used are Akiyo, Foreman, and Fall Road. Representative screenshots for the
three video sequences can be found in Fig. 7. Note that Akiyo has low and low ; Foreman has
high and low ; and Fall Road has high and low (the and values can also be found in Tables I
and II). The experimental conditions were similar to the ones described in Section II.

Fig. 8. The proposed QM versus the assessed subjective quality: (a) Akiyo; (b) Foreman; and
(c) Fall Road.

TABLE IV
CORRELATION WITH SQ

The results of the verification experiment are summarized in Fig. 8 and Table V. Fig. 8 plots
the values as computed by the proposed quality metric, defined in equation (11), versus the
assessed subjective quality, defined in equation (1). It can be seen that the points on each
chart in Fig. 8 are closely distributed around a straight line, which implies that a high
correlation exists between the estimated subjective quality and the proposed QM. Table IV shows
the correlation coefficient between the quality metric score and the subjective quality.

Four metrics were used in the verification experiment, including the traditional PSNR metric,
[9] (a quality metric that does not take into account spatial resolution and spatial variance),
[10] (a quality metric that does not consider the influence of the SV on the subjective
quality), and QM (the quality metric proposed in this paper). In order to measure the PSNR of
scalable video content, frame copying

was used to deal with temporal scalability and down-sampling was used to deal with spatial
scalability (see Fig. 3). These two operations explain the low correlation between PSNR and SQ.
Although it is natural that the other quality metrics used in the verification experiment
outperform PSNR, the correlation between PSNR and SQ was included in the comparative experiment
to illustrate the minimum baseline performance of the proposed quality metric. Note that the
quality metric in Table IV is different from the quality metric in Table III: is parameterized
on the temporal, spatial, and SNR properties of a bit stream, while is only able to take into
account the temporal and SNR characteristics of a bit stream.

As shown in Table IV, PSNR cannot efficiently deal with changes in frame rate and spatial
resolution. The low correlation between PSNR and SQ can be attributed to frame copying and
down-sampling operations that alter the video signal. Further, [9] cannot take into account the
spatial resolution of a video sequence. Consequently, as shown in Table IV, [10] is
characterized by a higher performance than PSNR and [9], as [10] is parameterized in terms of
frame rate and spatial resolution. However, [10] shows a decreased performance for Fall Road.
Indeed, the performance of [10] abruptly decreases for video sequences with a high SV. This is
due to the fact that the construction of [10] relied on video sequences that were similar to
Foreman in terms of SV (the SV value of Foreman is equal to 0.12, which is Normal SV). As shown
in Figs. 5 and 6, the discrepancy between subjective quality and PSNR, as well as subjective
quality and Default [10], is higher in Normal than others. PSNR is the main factor used in
matching the subjective quality [9]. Hence, the performance was the worst for video sequences
with a high . On the other hand, our quality metric considers both and , and where the value
may vary from a low to a high value. Hence, the proposed quality metric shows a uniformly high
performance for all tested video sequences, regardless of their temporal and spatial
characteristics.

TABLE V
CORRELATION WITH SQ AT CIF RESOLUTION

Table V shows the correlation between the quality metric score and the subjective quality at CIF
resolution. Since [9] does not take into account spatial scalability, the spatial resolution was
set to CIF in order to allow for a fair comparison. As shown in Table V, the results are in line
with the results provided in Table IV. The correlation for is higher since [9] does not take
into . When is high, then better reflects the subjective quality.

In this paper, three video sequences were used to determine the coefficients in (11) and to
validate the proposed quality metric, mainly for practical reasons as the scalable coding of the
three video sequences already resulted in a total of 81 test bit streams. It can be expected
that the construction and the experimental validation of the proposed quality metric would
benefit from the use of a higher number of video sequences with diverse values for and . The
number of viewers per experiment can also be increased in order to obtain more subjective
quality data.

V. CONCLUSIONS

Multimedia resources are increasingly consumed in diverse usage environments. The use of
SVC-compliant bit streams allows taking into account the constraints and capabilities of
heterogeneous usage environments. However, in order to optimally adapt scalable video bit
streams along the spatial, temporal, or perceptual quality axis, a quality metric is needed that
reliably models subjective quality. In this paper, we have proposed a quality metric that offers
support for fully scalable SVC bit streams that are suited for delivery in mobile usage
environments.

The proposed quality metric allows modeling the temporal, spatial, and perceptual quality
characteristics of SVC-compliant bit streams by taking into account several properties of the
compressed bit streams, such as the temporal and spatial variance of the video content, the
frame rate, the spatial resolution, and PSNR values. Our experimental results show that the
average correlation coefficient for the video sequences tested is as high as 0.95 (compared to a
value of 0.60 when only using the traditional PSNR quality metric). The proposed quality metric
also shows a uniformly high performance for video sequences with different temporal and spatial
characteristics.

Future work will investigate the use of the proposed quality metric for the optimal adaptation
of SVC bit streams. The influence of packet losses and bandwidth fluctuations on the behavior of
the proposed quality metric will also be analyzed.

REFERENCES

[1] S.-F. Chang and A. Vetro, “Video adaptation: Concepts, technologies, and open issues,” Proc. IEEE, vol. 93, no. 1, pp. 148–158, Jan. 2005.
[2] A. Perkis, Y. Abdeljaoued, C. Christopoulos, T. Ebrahimi, and J. F. Chicaro, “Universal multimedia access from wired and wireless systems,” Circuits, Syst., Signal Process., vol. 20, no. 3, pp. 387–402, Feb. 2005.
[3] R. V. Babu and A. Perkis, “Evaluation and monitoring of video quality for UMA enabled video streaming systems,” Multimedia Tools Appl., vol. 32, no. 2, pp. 211–231, Apr. 2008.
[4] A. T. Compbell and G. Coulson, “QoS adaptive transports: Delivering scalable media to the desktop,” IEEE Netw. Mag., vol. 11, no. 2, pp. 18–27, Mar. 1997.
[5] J. W. Kang, S.-H. Jung, J.-G. Kim, and J.-W. Hong, “Development of QoS-aware Ubiquitous Content Access (UCA) testbed,” IEEE Trans. Consum. Electron., vol. 53, no. 1, pp. 197–203, Feb. 2007.
[6] L. Skorin-Kapovm and M. Matijasevic, “End-to-end QoS signaling for future multimedia services in the NGN,” in Proc. Next Gen. Teletraffic Wired/Wireless Adv. Netw., LNCS, Sep. 2006, pp. 408–419.
[7] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103–1120, Sep. 2007.
[8] Joint draft 9 SVC amendment, ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, Jan. 2007.
[9] R. Fechali, F. Speranza, D. Wang, and A. Vincent, “Video quality metric for bit rate control via joint adjustment of quantization and frame rate,” IEEE Trans. Broadcast., vol. 53, no. 1, pp. 441–446, Mar. 2007.
[10] C. S. Kim, D. Suh, T. M. Bae, and Y. M. Ro, “Quality metric for H.264/AVC scalable video coding with full scalability,” Proc. SPIE, pp. 64921P-1–64921P-12, Jan. 2007.

[11] “Final report from the video quality experts group on the validation of objective model of video quality assessment, phase II,” VQEG, Aug. 2003.
[12] Objective Perceptual Multimedia Video Quality Measurement in the Presence of a Full Reference, ITU-T Recommendation J.247, Aug. 2008.
[13] E. Ong, X. Yang, W. Lin, Z. Lu, and S. Yao, “Perceptual quality metric for compressed videos,” in Proc. Int. Conf. Acoust., Speech, Signal Process., Mar. 2005, pp. 581–584.
[14] E. Ong, W. Lin, Z. Lu, S. Yao, X. Yang, and F. Moschetti, “Low bit rate video quality assessment based on perceptual characteristics,” in Proc. Int. Conf. Image Process., Sep. 2003, pp. 182–192.
[15] S. Winkler, “A perceptual distortion metric for digital color video,” Proc. SPIE, pp. 175–184, Jan. 1999.
[16] S. Wolf and M. Pinson, “Video quality measurement techniques,” NTIA Report 02-392, 2002.
[17] G.-M. Muntean, P. Perry, and L. Murphy, “Subjective assessment of the quality-oriented adaptive scheme,” IEEE Trans. Broadcast., vol. 53, no. 3, pp. 1–11, Sep. 2005.
[18] M. Pinson and S. Wolf, “A new standardized method for objectively measuring video quality,” IEEE Trans. Broadcast., vol. 50, no. 3, pp. 312–446, Sep. 2004.
[19] U. Engelke and H.-J. Zepernick, “Perceptual-based quality metrics for image and video services: A survey,” in Proc. 3rd Euro-NGI Conf. Next Gen. Internet Netw., May 2007, pp. 190–197.
[20] E. C. Reed and F. Dufaux, “Constrained bit-rate control for very low bit-rate streaming-video applications,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 7, pp. 882–888, Jul. 2001.
[21] P. Cuenca, L. Orozco-Barbosa, A. Carrido, and F. Quiles, “Study of video quality metrics for MPEG-2 based video communications,” in Proc. IEEE Pacific Rim Conf. Commun., Comput. Signal Process., Aug. 1999, pp. 280–283.
[22] E. Ong, W. Lin, Z. Lu, S. Yao, and M. H. Loke, “Perceptual quality metric for H.264 low bit rate videos,” in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2006, pp. 677–680.
[23] O. Nemethova, M. Ries, M. Rupp, and E. Siffel, “Quality assessment for H.264 coded low-rate and low-resolution video sequences,” in Proc. Conf. Internet Inf. Technol. (CITT), Nov. 2004, pp. 136–140.
[24] S. Winkler and F. Dufaux, “Video quality evaluation for mobile applications,” in Proc. Visual Commun. Image Process. Conf. (VCIP), Jul. 2003, pp. 593–603.
[25] D. Gill, J. P. Cosmas, and A. Pearmain, “Mobile audio-visual terminal: System design and subjective testing in DECT and UMTS networks,” IEEE Trans. Veh. Technol., vol. 49, no. 4, pp. 1378–1391, Jul. 2000.
[26] M. Ries, O. Nemethova, and M. Rupp, “Video quality estimation for mobile H.264/AVC video streaming,” J. Commun., vol. 3, no. 1, pp. 41–50, Jan. 2008.
[27] H. Knoche, J. D. McCarthy, and M. A. Sassse, “Can small be beautiful? assessing image resolution requirements for mobile TV,” in Proc. ACM Multimedia, Nov. 2005, pp. 829–838.
[28] R. C. Gonzalez and R. E. Woods, Digital Image Process., 2nd ed. London: Prentice Hall, 2002, pp. 61–63.
[29] Y. S. Kim, Y. J. Jung, T. C. Thang, and Y. M. Ro, “Bit-stream extraction to maximize perceptual quality using quality information table in SVC,” in Proc. SPIE, Jan. 2006, pp. 607723-1–607723-11.
[30] H. Koumaras, A. Kourtis, D. Martakos, and J. Lauterjung, “Quantified PQoS assessment based on fast estimation of the spatial and temporal activity level,” Multimedia Tools Appl., vol. 34, no. 3, pp. 355–374, Sep. 2007.
[31] N. Montard and P. Bretillon, “Objective quality monitoring issues in digital broadcasting networks,” IEEE Trans. Broadcast., vol. 51, no. 3, pp. 269–275, Sep. 2005.
[32] Methodology for the Subjective Assessment of the Quality of Television Pictures, ITU-R Recommendation BT.500-11, 2002.
[33] “FTP directory,” [Online]. Available:
[34] P. Corriveau, C. Gojmerac, B. Hughes, and L. Stelemach, “All subjective scales are not created equal: The effects of context on different scales,” Signal Process., vol. 77, no. 1, pp. 1–9, Aug. 1999.
[35] Q. Huynh-Thu and M. Ghanbari, “A comparison of subjective video quality assessment methods for low-bit rate and low-resolution video,” in Proc. IASTED Int. Conf. Signal Image Process., Aug. 2005, pp. 70–76.
[36] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate-constrained coder control and comparison of video coding standards,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 688–703, Jul. 2000.
[37] T. Sikora, “The MPEG-7 visual standard for content description-an overview,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 6, pp. 696–702, Jun. 2001.
[38] C. S. Won, D. K. Park, and S.-J. Park, “Efficient use of MPEG-7 edge histogram descriptor,” ETRI Journal, vol. 24, no. 1, pp. 23–30, Feb. 2002.
[39] B. S. Manjunath, P. Salembier, and T. Sikora, Introduction to MPEG-7: Multimedia Content Description Interface. Hoboken, NJ: John Wiley & Sons Ltd., 2002.
[40] A. Divakaran, “An overview of MPEG-7 motion descriptors and their applications,” in Proc. 9th Int. Conf. Comput. Anal. Images Patterns, LNCS, 2001, pp. 29–40.
[41] J. L. Rodgers and A. W. Nicewander, “Thirteen ways to look at the correlation coefficient,” The American Statistician, vol. 42, no. 1, pp. 59–66, Feb. 1988.
[42] Information Technology—Multimedia Content Description Interface—Part 3: Visual, ISO/IEC 15938-3, May 2002, First edition, pp. 63–65.
[43] “Find graph quick and easy,” [Online]. Available: http://www.uniphiz.com/findgraph.htm
[44] A. A. Webster, C. T. Jones, M. H. Pinson, S. D. Voran, and S. Wolf, “An objective video quality assessment system based on human perception,” in Proc. SPIE, Feb. 1993, pp. 15–26.

Hosik Sohn received the B.S. degree from Korea Aerospace University, Goyang, South Korea, in
2007 and the M.S. degree from the Korea Advanced Institute of Science and Technology (KAIST),
Daejeon, South Korea, in 2009. He is currently working toward the Ph.D. degree at KAIST. His
research interests include video adaptation, visual quality measurement, bio-cryptography,
multimedia security, Scalable Video Coding (SVC), and JPEG XR.

Hana Yoo received the B.S. degree from the Department of Electronics at Inha University,
Incheon, South Korea. She worked as an engineer for the Department of Liquid Crystal Displays at
Samsung Electronics from 2005 to 2006. In 2008, she received the M.S. degree from the Korea
Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea. She is currently
working as a research staff member for the Advanced Infotainment Research Team at Hyundai
Motors. Her research interests include Scalable Video Coding (SVC) and video quality measurement
in mobile environments.

Wesley De Neve received the M.Sc. degree in computer science and the Ph.D. degree in computer
science engineering from Ghent University, Ghent, Belgium, in 2002 and 2007, respectively.
He is currently working as a senior researcher for the Image and Video Systems Lab (IVY Lab), in
the position of assistant research professor. IVY Lab is part of the Department of Electrical
Engineering of KAIST, the Korea Advanced Institute of Science and Technology (Daejeon, South
Korea). Prior to joining KAIST, he was a post-doctoral researcher at both Ghent University -
IBBT in Belgium and the Information and Communications University (ICU) in South Korea. His
research interests and areas of publication include the coding, annotation, and adaptation of
image and video content, GPU-based video processing, efficient XML processing, and the Semantic
and the Social Web.

Cheon Seog Kim received the B.S. degree from the Department of Electrical Engineering at Hong-Ik
University, Seoul, South Korea, in 1981 and the M.S. degree from the School of Engineering at
Korea University, Seoul, South Korea, in 1983. He received the Ph.D. degree from the Korea
Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2009.
In 1984, he worked for Taihan Electric Wire Co., Ltd. From 1992 to 2008, he worked as a chief
scientist for security service provider ADT CAPS and for multimedia solutions provider Curon. He
currently works as a CTO for Woori CSt. He participated in the MPEG-21 international
standardization effort, contributing to the definition of the MPEG-21 DIA visual impairment
descriptors and modality conversion. His major research interests are video quality measurement,
video coding, Scalable Video Coding (SVC), and the design of multimedia systems.

Yong Man Ro (M’92–SM’98) received the B.S. degree from Yonsei University, Seoul, South Korea,
and the M.S. and Ph.D. degrees from the Korea Advanced Institute of Science and Technology
(KAIST), Daejeon, South Korea.
In 1987, he was a visiting researcher at Columbia University, and from 1992 to 1995, he was a
visiting researcher at the University of California, Irvine and KAIST. He was a research fellow
at the University of California, Berkeley and a visiting professor at the University of Toronto
in 1996 and 2007, respectively. He is currently holding the position of full professor at KAIST,
where he is directing the Image and Video Systems Lab. He participated in the MPEG-7 and MPEG-21
international standardization efforts, contributing to the definition of the MPEG-7 texture
descriptor, the MPEG-21 DIA visual impairment descriptors, and modality conversion. His research
interests include image/video processing, multimedia adaptation, visual data mining, image/video
indexing, and multimedia security.
Dr. Ro received the Young Investigator Finalist Award of ISMRM in 1992 and the Scientist Award
in Korea in 2003. He served as a TPC member of international conferences such as IWDW, WIAMIS,
AIRS, and CCNC, and he was the program co-chair of IWDW 2004.