Database for video quality assessment in license plate recognition

Conference Paper · January 2013



Database for Video Quality Assessment in
License Plate Recognition
Mikołaj Leszczuk
AGH University of Science and Technology
al. Mickiewicza 30
PL-30059 Kraków, Poland
leszczuk@kt.agh.edu.pl

Lucjan Janowski
AGH University of Science and Technology
al. Mickiewicza 30
PL-30059 Kraków, Poland
janowski@kt.agh.edu.pl

Abstract
The Quality of Experience (QoE) concept for video content used for entertainment differs significantly from the QoE of surveillance video used for recognition tasks. Consequently, the latter requires specific reference video databases for research. In this paper, such a database is described. The paper reports the process of preparing test video sequences and related subjective scores, and presents the methods used to form the visual sequences and the collections of accompanying subjective scores. Following common research practice, the resulting data is freely available to the research community via a public library.

Keywords: CCTV, free content database, objective evaluation, recognition video quality, coding impairments

1 Introduction
Users who perform tasks with video require sufficient video quality to recognize the information needed for their application. Therefore, the fundamental measure of video quality in these applications is the success rate of the recognition tasks, which is referred to as visual intelligibility or acuity. One of the major causes of reduced visual intelligibility is loss of data through various forms of compression. Additionally, the characteristics of the captured scene have a direct effect on visual intelligibility and on the performance of a compression operation: specifically, the size of the target of interest, the lighting conditions, and the temporal complexity of the scene [1]. For example, low image resolution can reduce the reliability of face detection and recognition [2].
Consequently, there is a need for developments in quality assessment for target recognition video, including series of tests studying the effects and interactions of compression and scene characteristics. An additional goal is to test existing objective measurements and to develop new ones [1].
The Quality of Experience (QoE) concept for surveillance video used for recognition tasks requires specific reference video and ground truth databases for research. Currently available task recognition databases (such as those prepared by the Video Quality in Public Safety Working Group) frequently lack ground truth and are therefore insufficient for such research. Consequently, this paper proposes a ground truth database. The paper describes the process of preparing test video sequences and related subjective scores, and presents the methods used to form the visual sequences and the collections of accompanying subjective scores. The resulting data is freely available to the research community via a public library [3] and at our webpage http://vq.kt.agh.edu.pl/videoLibrary.html.
The remainder of this article is organized as follows. Section 2 presents Source Reference Circuits (SRC). Section 3 discusses Hypothetical Reference Circuits (HRC) related to this research. In Section 4, we present Processed Video Sequences (PVS), while in Section 5 we present a psycho-physical experiment. Possible applications for state of the art research are provided in Section 6. The paper is concluded in Section 7.

2 Source Reference Circuits (SRC)


All Source Reference Circuits (SRC video sequences) were collected at the AGH
University of Science and Technology, Kraków, by filming a car park during hours of
high traffic volumes. In this scenario (one of the major video surveillance scenarios),
the camera was located 50 meters from the parking lot entrance in order to simulate
typical video recordings. Using a ten-fold optical zoom, a 6m × 3.5m field of view was
obtained. The camera was placed statically without changing the zoom throughout the
recording time. An example frame is shown in Figure 1.
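As a quick geometric sanity check (our own, not part of the original setup description), a 6 m wide field of view at 50 m corresponds to a horizontal viewing angle of roughly 7 degrees, consistent with a strong optical zoom. A minimal sketch:

```python
import math

def angular_fov_deg(width_m: float, distance_m: float) -> float:
    """Horizontal angular field of view subtended by a flat target of a
    given width at a given distance."""
    return math.degrees(2.0 * math.atan(width_m / (2.0 * distance_m)))

# 6 m wide field of view seen from 50 m: roughly a 6.9-degree viewing angle.
horizontal_fov = angular_fov_deg(6.0, 50.0)
```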
Video sequences were acquired using a 2-megapixel (1080p) camera with a CMOS sensor, set (for performance reasons) to record at a lower resolution of 720p. The recorded material was stored on an SDHC memory card inside the camera.
All the video content collected by the camera was analyzed and cut into 20-second shots including cars entering or leaving the car park. On average, each license plate was visible for 17 seconds in each sequence. The length of the sequences was dictated mostly by the need to capture the vehicles not only when they were stopped by the entrance barrier, but also when they were in motion. The parameters of each source sequence are as follows: resolution: 1280 × 720 pixels, frame rate: 25 frames/s, average bit-rate: 5.6–10.0 Mbit/s (depending on the amount of local motion), and video compression: H.264/AVC in the Matroska Multimedia Container (MKV).
The owners of the vehicles captured on film were asked for their written consent, which made it possible for us to use the video content for testing and publication purposes. 22 of them allowed us to make the videos publicly available. The remaining 8 sequences can be used by AGH only and are kept as a separate test set.

Figure 1: Example frame.

3 Hypothetical Reference Circuits (HRC)


The motivation to create Hypothetical Reference Circuits (HRC) was as follows: if quality is not acceptable, we must question the reasons behind it. The sources of potential problems are located in different parts of the end-to-end video delivery chain. The first group of distortions (1) can be introduced at the time of image acquisition; the most common problems are noise, lack of focus, or incorrect exposure. Other distortions (2) appear as a result of further compression and processing; problems can also arise when scaling video sequences in the quality, temporal, and spatial domains, when introducing digital watermarks, and so on. Then (3), during transmission over the network, artifacts may be caused by packet loss. At the end of the transmission chain (4), problems may relate to the equipment used to present the video sequences.
Considering this, all SRCs were encoded with a fixed quantization parameter (QP) using the H.264/AVC video codec (x264 implementation). Prior to encoding, some modifications involving resolution change and cropping were applied in order to obtain diverse aspect ratios between car plates and video size (see Figure 2 for details of the processing). Each SRC was modified into 6 versions, and each version was encoded with 5 different QPs. Three sets of QPs were selected: 1) {43, 45, 47, 49, 51}, 2) {37, 39, 41, 43, 45}, and 3) {33, 35, 37, 39, 41}. The selected QP values were adjusted to the different video processing paths in order to cover the number plate recognition ability threshold. Network streaming artifacts were not considered; an experiment with packet losses requires a specific methodology in which the packet loss and the position of the loss in a picture are considered, and this topic is left for future studies. As a result of these assumptions, 30 different HRCs were obtained.
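The 30 HRCs (6 processing paths, 5 QPs each) can be enumerated programmatically. The sketch below is illustrative only: the pairing of processing paths to QP sets follows our reading of Figure 2, and the function name is our own, not the authors' tooling.

```python
# QP sets as listed in the text.
QP_SETS = {
    1: [43, 45, 47, 49, 51],
    2: [37, 39, 41, 43, 45],
    3: [33, 35, 37, 39, 41],
}

# (operation, target resolution, QP set) per SRC version; the exact
# path-to-set assignment here is an assumption based on Figure 2.
PATHS = [
    ("scale", "960x576", 1),
    ("scale", "640x360", 2),
    ("crop", "704x576", 1),
    ("scale", "352x288", 2),
    ("crop", "704x576", 1),
    ("scale", "352x288", 3),
]

def enumerate_hrcs(paths, qp_sets):
    """Cross each processing path with its QP set, numbering HRCs 1..30."""
    hrcs = []
    for op, resolution, qp_set in paths:
        for qp in qp_sets[qp_set]:
            hrcs.append({"id": len(hrcs) + 1, "op": op,
                         "resolution": resolution, "qp": qp})
    return hrcs
```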

[Figure 2 diagram: the SRC (1280 × 720) is first scaled down to 960 × 576. Six versions are produced: the 960 × 576 version itself, plus versions further scaled down to 640 × 360 or 352 × 288, or cropped to 704 × 576. Each version is compressed with H.264/AVC using one of the QP sets (1, 2, 1, 2, 1, and 3, respectively), yielding HRCs 1–5, 6–10, 11–15, 16–20, 21–25, and 26–30.]

Figure 2: Generation of HRCs (based on literature [4]).

Each HRC is described by its QP; nevertheless, in the rest of the paper we use bit-rate as a more network-driven parameter. Note that the same QP applied to different views results in different bit-rates.
Based on the above parameters, it is easy to determine that the whole test set consists of 900 sequences (each of the 30 SRCs encoded into each of the 30 HRCs).
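The full cross product can be enumerated in a few lines. The naming scheme below is a hypothetical convention for illustration only, not the one used in the library:

```python
from itertools import product

def pvs_names(num_src=30, num_hrc=30):
    """Every SRC crossed with every HRC yields the full 900-sequence set."""
    return [f"src{s:02d}_hrc{h:02d}"
            for s, h in product(range(1, num_src + 1), range(1, num_hrc + 1))]
```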
Figure 3 presents example frames of four SRC versions.

4 Processed Video Sequences (PVS)


In order to share the research with the scientific community, as is common practice, the authors placed the Processed Video Sequence (PVS) material recorded using a CCTV camera in a public library. The files with the set of PVSes created for the recognition experiment are shared on the Consumer Digital Video Library (http://www.cdvl.org/) [3] and mirrored at [5]. The video sequences are available to the research community free of charge.

5 Recognition Task Psycho-Physical Experiment


The shared video data is accompanied by the results of a recognition task psycho-physical experiment. The recognition task was threefold: 1) type in the license plate number, 2) select the car color, and 3) select the make of the car. Thirty non-expert subjects participated in the study: 11 women and 19 men, with an average age of 23 years (ranging from 17 to 29). The subjects provided a total of 960 answers.
Each answer obtained can be interpreted as 1 or 0, i.e. correct or incorrect recognition. The goal of this analysis is to find the detection probability as a function of certain parameters, i.e. the explanatory variables. The most obvious choice for the explanatory variable is bit-rate, which has two useful properties. The first property is that it monotonically increases the amount of information, because higher bit-rates indicate that more information is being sent. The second advantage is that if a model predicts the bit-rate needed for a particular detection probability, it can be used to optimize network utilization.
Moreover, if the network link has limited bandwidth, the detection probability as a function of bit-rate provides key information allowing a practitioner to decide whether the system is sufficient or not.
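The analysis above, binary answers aggregated into a detection probability versus bit-rate, can be sketched as follows. The sample data is invented for illustration, and the logistic curve is only one plausible model family for a binary outcome; neither is the authors' actual analysis.

```python
import math
from collections import defaultdict

# Hypothetical (bit-rate in kbit/s, correct recognition 0/1) observations.
answers = [(90, 0), (90, 0), (120, 0), (120, 1),
           (180, 1), (180, 1), (250, 1), (250, 1)]

def empirical_detection_probability(samples, bin_width=50):
    """Group binary answers into bit-rate bins and average each bin."""
    bins = defaultdict(list)
    for rate, correct in samples:
        bins[(rate // bin_width) * bin_width].append(correct)
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}

def logistic_model(rate_kbit, a, b):
    """One plausible model shape: P(detect) rises smoothly with bit-rate."""
    return 1.0 / (1.0 + math.exp(-(a + b * rate_kbit)))
```

Fitting the parameters of such a model to the collected scores would then let a practitioner read off the bit-rate required for a target detection probability.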
One important conclusion is that for a bit-rate as low as 180 kbit/s the detection probability is over 80%, even if the visual quality of the video is very low. Moreover, the detection probability depends strongly on the SRC (the overall detection probability varies from 0 (sic!) to over 90%) [1, 6].
Please see [1, 4, 7, 6] for more details related to the experiment results.

6 Possible Applications for State of the Art Research


Existing and emerging applications for state of the art research are suggested below.
License plate recognition is a field that has been extensively researched in recent years; the main published references include [8, 9].
Extensive work is being carried out in the area of video quality, mainly driven by
the Video Quality Experts Group. Recently, a new project, Quality Assessment for
Recognition Tasks, was created for task-based video quality research.
In the paper [4], Leszczuk et al. attempted to develop quality thresholds in li-
cense plate recognition tasks, based on video originating from the described database,
streamed in constrained networking conditions. The measures that have been devel-
oped for this kind of task-based video provide specifications and standards that will
assist users of task-based video to determine the technology that will successfully al-
low them to perform the function required.
Since the number of surveillance cameras is still growing, it is extremely likely
that automatic systems will be used to carry out the tasks. Research based on the
same database and presented by Janowski et al. in [10] includes analysis of automatic
recognition algorithms.
In the paper [11], Dumke explores using visual acuity as a video quality metric for
public safety applications. An experiment has been conducted to track the relationship
between visual acuity and the ability to perform a forced-choice object recognition
task with digital video of varying quality. Visual acuity is measured according to the
smallest letters reliably recognized on a reduced LogMAR chart.
The paper [7] by Leszczuk introduces a typical usage of task-based video: surveil-
lance video for accurate license plate recognition. The author presents the field of
task-based video quality assessment, from subjective psycho-physical experiments to
objective quality models. Example test results and models, based on the described
database, are provided alongside the descriptions. The continuation of this research
is the paper [6] by Leszczuk, presenting a quality optimization approach driven by
recognition rates.

Finally, in the paper [12], Ukhanova et al. show a related objective quality metric
that considers frame rate. The proposed metric uses PSNR, frame rate and a content-
dependent parameter that can easily be obtained from spatial and temporal activity
indexes. The results have been validated on data from a subjective quality study.

7 Conclusions
In this paper, a specific reference video database set for research has been described. The paper has reported the process of preparing test video sequences and related subjective scores, and has presented the methods used to form the visual sequences and the collections of accompanying subjective scores. Following common research practice, the resulting data is freely available to the research community via a public library.

8 Acknowledgments
The research leading to these results has received funding from the European Commission's Seventh Framework Programme (FP7/2007-2013) under grant agreement №218086 (INDECT). Preparation of the recordings has been co-financed by the European Regional Development Fund under the Innovative Economy Operational Programme, INSIGMA project №POIG.01.01.02-00-062/09.

References
[1] M. Leszczuk and J. Dumke, “Survey of recent developments in quality assessment for target recognition video,” in Multimedia Communications, Services and Security, ser. Communications in Computer and Information Science, A. Dziech and A. Czyżewski, Eds. Springer Berlin Heidelberg, 2013, vol. 368, pp. 59–70. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-38559-9_6

[2] T. Marciniak, A. Dabrowski, A. Chmielewska, and R. Weychan, “Face recognition from low resolution images,” in Multimedia Communications, Services and Security, ser. Communications in Computer and Information Science, A. Dziech and A. Czyżewski, Eds. Springer Berlin Heidelberg, 2012, vol. 287, pp. 220–229. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-30721-8_22

[3] The Consumer Digital Video Library, Institute for Telecommunication Sciences, http://www.cdvl.org/, June 2013.

[4] M. Leszczuk, L. Janowski, P. Romaniak, A. Głowacz, and R. Mirek, “Quality assessment for a licence plate recognition task based on a video streamed in limited networking conditions,” in Multimedia Communications, Services and Security, ser. Communications in Computer and Information Science, A. Dziech and A. Czyżewski, Eds. Springer Berlin Heidelberg, 2011, vol. 149, pp. 10–18. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-21512-4_2

[5] Video Quality, AGH University of Science and Technology, http://vq.kt.agh.edu.pl/, June 2013.

[6] M. Leszczuk, “Optimising task-based video quality,” Multimedia Tools and Applications, pp. 1–18, 2012. [Online]. Available: http://dx.doi.org/10.1007/s11042-012-1161-6

[7] M. Leszczuk, “Assessing task-based video quality — a journey from subjective psycho-physical experiments to objective quality models,” in Multimedia Communications, Services and Security, ser. Communications in Computer and Information Science, A. Dziech and A. Czyżewski, Eds. Springer Berlin Heidelberg, 2011, vol. 149, pp. 91–99. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-21512-4_11

[8] J. Gao, E. Blasch, K. Pham, G. Chen, D. Shen, and Z. Wang, “Automatic vehicle license plate recognition with color component texture detection and template matching,” in Proc. SPIE, vol. 8739, 2013, pp. 87390Z–87390Z-6. [Online]. Available: http://dx.doi.org/10.1117/12.2014595

[9] D. Findley, C. Cunningham, J. Chang, K. Hovey, and M. Corwin, “Effects of license plate attributes on automatic license plate recognition,” Transportation Research Record: Journal of the Transportation Research Board, vol. 2327, pp. 34–44, 2013. [Online]. Available: http://trb.metapress.com/content/31H7577377247203

[10] L. Janowski, P. Kozłowski, R. Baran, P. Romaniak, A. Glowacz, and T. Rusc, “Quality assessment for a visual and automatic license plate recognition,” Multimedia Tools and Applications, pp. 1–18, 2012. [Online]. Available: http://dx.doi.org/10.1007/s11042-012-1199-5

[11] J. Dumke, “Visual acuity and task-based video quality in public safety applications,” in Image Quality and System Performance, 2013, pp. 865306–865306-7. [Online]. Available: http://dx.doi.org/10.1117/12.2004882

[12] A. Ukhanova, J. Korhonen, and S. Forchhammer, “Objective assessment of the impact of frame rate on video quality,” in Proc. IEEE International Conference on Image Processing, 2012, pp. 1513–1516.

Figure 3: Example frames of four SRC versions (with relative sizes maintained): (a) “Full frame + kept scale + compressed”; (b) “Full frame + scaled down + compressed”; (c) “Cropped frame + kept scale + compressed”; (d) “Cropped frame + scaled down + compressed”.
