
University of Leeds

SCHOOL OF COMPUTING
RESEARCH REPORT SERIES
Report 2005.02

Hide and Seek:


Robust Digital Watermarking

by

Simon Wilkinson

June 2005
Summary
This project1 investigates robust digital watermarking within the domain of image copyright protec-
tion, and explores methods by which robustness properties can be quantified and measured. Specifically,
the concepts of digital watermarking are investigated and correlated with the requirements of copyright
protection mechanisms. Following this, a framework for benchmarking a watermark’s robustness against
image distortions is presented, which builds on previous work in the field. The new benchmark strength-
ens current techniques via a novel method of measuring watermark perceptibility, including low-level
modelling of the human visual system. This framework is then used to benchmark four existing water-
marking systems, hence allowing a detailed evaluation of their robustness strategies. The results show
that redundant watermark embedding and exploitation of the human visual system’s insensitivity to the
blue colour channel are effective strategies. The results are also shown to validate the benchmarking
approach, particularly the importance of an accurate, colour-based perceptibility measure.

1 This work was undertaken as the author’s final-year undergraduate project (BSc Computer Science).

Contents

1 Background to Watermarking 1
1.1 The Digital Revolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 What is a Digital Watermark? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Applications of Digital Watermarking in Image Copyright Protection . . . . . . . . . . . 2
1.4 Invisible Digital Image Watermarking . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4.1 Steganography - A Distinction . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4.2 A Brief History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 A General Watermarking System Model . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5.1 Watermark Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5.2 Watermark Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Classes of Watermarking System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.7 Properties of Digital Watermarking Systems . . . . . . . . . . . . . . . . . . . . . . . . 7
1.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Benchmarking Watermarking Systems 10


2.1 The Need to Benchmark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Existing Benchmarking Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.2 Problems with Existing Techniques . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 A New Benchmark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.1 Constraining the Problem Space . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.2 High Level Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.3 Experimental Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.4 Performance Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.5 Creating Comparable Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Perceptual Distance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4.1 Existing Benchmark Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.2 A New Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4.2.1 S-CIELAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4.2.2 SSIM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.4.2.3 A Hybrid Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.3 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3 Experimental Configuration 24
3.1 Benchmarking Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.1 Keys and Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.2 Normalisation Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.3 Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.4 Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.5 Experimental Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Watermarking Systems Under Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.1 Barni System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.2 Corvi System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.3 Cox System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.4 Kutter System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4 Implementation Commentary 36
4.1 High-level Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2 Implementation Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.1 Development Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.2 Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.3 C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.4 Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.5 A Hybrid Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3 Performance Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 Additional Driver Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

5 Benchmarking Results and Evaluation 43


5.1 Results Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.1.1 Brightness Adjust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.1.2 Contrast Adjust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.1.3 Mean Blur / Gaussian Blur . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.1.4 Sharpen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.1.5 Crop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.1.6 Rotate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.1.7 Horizontal Shear . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.1.8 Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

5.1.9 Impulse Noise / Gaussian Noise . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.1.10 JPEG Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2 Results Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.3 Benchmark Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.3.1 How scientifically valid is the benchmark? . . . . . . . . . . . . . . . . . . . . 52
5.3.2 How accurate are the results? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.3.3 How does the benchmark compare to previous work? . . . . . . . . . . . . . . . 55
5.4 Further Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

6 Conclusion 58

Bibliography 60

A Test Images 64

Chapter 1

Background to Watermarking

1.1 The Digital Revolution


The pace of the current digital revolution in multimedia distribution is constantly accelerating. The
development of the CD, DVD, digital radio and digital television as distribution mechanisms has marked
a series of major milestones in this growth. Internet-based distribution has also become economically viable [37],
and digital devices have converged into integrated playback and recording systems for many media and
content types. However, this digital revolution has presented new problems in copyright protection as
perfect copies of digital content can easily be made, and the barrier of physicality that has historically in-
hibited illegal distribution has been removed. This is particularly true of digital images, whose relatively
small storage and bandwidth requirements make them easy targets for casual copyright infringement.
Additionally, as images have no temporal dimension, they are inherently more pliable than other content
types, i.e. copying only a central region of an image is generally much more useful than copying a
central region of a song.
Practical solutions for protecting image copyright have been driven mainly by the ‘stock-photography’
industry, as its revenue model depends on the identification of illegal distribution and also enforcement
of legitimate purchasers’ ‘usage-rights’. Specifically, full copyrights of an image are not generally sold
to an image consumer; rather, a licence is sold to use a copy of the image in a certain context. Cox et al. [10]
note that standard cryptographic techniques could provide a partial solution: an image could be encrypted
by the copyright holder prior to transit, and only decrypted by the legitimate purchaser via a secret
key. However, the copyright holder clearly has no control over how the purchaser then further
distributes or uses the decrypted image. A complementary technology is required that can protect the
image after it has been decrypted; digital watermarking has been proposed as a mechanism for this
protection.

1.2 What is a Digital Watermark?
A digital watermark is a piece of information embedded into a digital image that in some way acts as
meta-data for the image [29]. The key idea is that of embedding rather than appending the information.
Embedding in this context means to add the information directly into the image data in such a way that
it is not easily removed. If the watermark is removed, then it should render its host image unusable for
its original purpose. This is not the case for appended information (for example in the header of a file),
as this can generally be removed without destroying the image it describes. Although this report focuses
on digital watermarking of images, it should be noted that techniques have been developed for many
digital content types, including audio [3], video [16], text [4], and computer software [42].

1.3 Applications of Digital Watermarking in Image Copyright Protection


Consider the traditional '©' copyright notice; its application to a copyrighted work does not immediately
prove ownership or automatically stop copyright infringement; it is simply a tool to be used within a
larger legal framework. The same is true of a digital watermark: in itself it is not a standalone solution
to image copyright protection, but a mechanism that can be used within a larger 'rights management'
to image copyright protection, but is a mechanism that can be used within a larger ‘rights management’
architecture. Craver et al. [11] discuss these issues extensively, arguing: “resolving rightful ownerships
of digital images may require, in addition to invisible watermarks, the inclusion of protocols, formal
requirements and standardization similar to traditional legal channels that are currently used to copyright
images and photographs”. Within such a framework, distinct areas of application for digital watermarks
are identified by Cox et al. [10]:

• Owner Verification enables the copyright holder of an image to assert their claim of ownership
by ‘signing’ an image with a unique identifier in the form of an invisible watermark.

• Transaction Tracking provides an audit trail of transactions that have taken place in the distri-
bution of a digital image. Generally, this will involve embedding an invisible watermark into an
image that uniquely identifies the image itself (a ‘fingerprint’ of the image), the copyright holder,
the image seller, the image buyer and the usage rights of the buyer. Recovery of watermark infor-
mation from an image deemed by the copyright holder as being used illegally will allow ‘traitor
tracing’, i.e. identification of who is responsible for the misuse of the image.

Cox et al. [10] identify other applications for digital watermarks, however these would only be
required by niche copyright protection models, and are not considered within this report.

1.4 Invisible Digital Image Watermarking
This report focuses exclusively on invisible image watermarking. For the applications discussed
in Section 1.3, however, either visible or invisible watermarks can be employed; in both cases the
watermark must be reliably detectable by its intended receiver [34]. In the case of invisible watermarks,
the receiver will be hardware or software, whereas the intended receiver of a visible watermark will
generally be the same as the intended receiver of the image (i.e. the human visual system).
Visible watermarks are a very overt method of asserting copyright ownership, and have a limited
scope of application as they reduce the usability of an image (see Figure 1.1). Invisible watermarks
present a more pragmatic solution as legitimate users have access to an unflawed copy of an image,
whilst the copyright holder maintains some control over copyright issues via the watermark and the
overall copyright protection model employed. For this reason, much of current watermarking research
is interested in invisible watermarks [34], particularly answering the questions:

• How can information be embedded so that perceivably, the image remains unchanged?

• How can the information be embedded robustly so that it remains detectable despite any process-
ing the image undergoes?

Figure 1.1: Visibly (left) and invisibly (right) watermarked image carrying copyright information.

1.4.1 Steganography - A Distinction


A distinction should be drawn between invisible digital watermarking and the related field of steganog-
raphy. Both come under the general area of ‘information hiding’ and share many techniques, however
their primary goals are different:

• Steganography is the art of 'concealed communication', where a medium is used as a cover
for the transmission of an imperceptible message, with the presence of the message kept secret.
The message will typically have no relation to the content of the cover medium, and any tampering
with the medium should render the message unreadable [1].

• Invisible digital watermarking does not require that the presence of the message be kept secret;
the message relates directly to the content of the media, and should be robust to
tampering [29].

1.4.2 A Brief History
Digital image watermarking is a relatively new discipline, the term only becoming widely known in
the early nineties, having first been coined in Komatsu and Tominaga’s 1988 paper [21]. During the
early nineties, research output was as little as five to ten papers per year, until 1995 when interest in the
area (and the rigor applied) increased greatly. Since then, research papers have approximately doubled
in number each year [10]. The International Society for Optical Engineering (SPIE) began devoting a
specific conference to ‘Security and Watermarking of Digital Contents’ in 1999.
Commercial exploitation of digital watermarking has followed suit, with few notable commercial
applications being available until the formation of Digimarc Corporation in 1995. Digimarc released its
first digital image watermarking product in 1996, and now has revenues in excess of ninety million dol-
lars per year and is backed by large industry players such as Adobe, Macrovision and Philips Electronics
[5]. Digimarc currently provides watermarking solutions for use in the copyright protection models of
many large stock photography firms, such as Corbis [18] and Getty Images [19].

1.5 A General Watermarking System Model


Many digital watermarking systems follow the same basic model, and only differ in their choice of a
specific strategy at some stage in the embedding or detection process [29]. Therefore a general model
of a watermarking system can be presented, which encapsulates most of the techniques given in the
literature.

1.5.1 Watermark Embedding


Figure 1.2 depicts the general watermark embedding process. Embedding consists of two distinct steps
(although these are often regarded implicitly as one [17]):

1. Generation of a watermark W requires the message to be conveyed, M (e.g. a unique identifier of
the copyright owner), and a watermarking key K:

W = f0(M, K)    (1.1)

The generated watermark may not have any direct spatial representation, i.e. it may not be in
the (more intuitive) form of an image itself [10]. The watermark is simply an encoding of the
message in a form which is usable by the system’s embedding strategy, e.g. a string of bits [20] or
a sequence of floating point numbers with certain properties [7]. The watermarking key is used to
enforce a certain level of security by introducing deterministic randomness into the watermark
generation process. Security in this context relates to the requirement of the watermark being
hard to remove, i.e. removing the watermark would generally be easier if the watermark itself was
known. Hence, the watermarking key ensures that recreating the watermark is difficult even if the
generation algorithm and message are public entities [17].
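The key-dependence described above can be sketched in code. The helper below is purely illustrative (the function name, the string-based seeding, and the Gaussian form of the watermark are assumptions, the latter in the style of Cox et al. [7]):

```python
import random

def generate_watermark(message, key, length=1000):
    """Sketch of W = f0(M, K): the key seeds a deterministic PRNG, so the
    same (message, key) pair always regenerates the same watermark, while
    an attacker without the key cannot recreate it even if the generation
    algorithm and message are public."""
    rng = random.Random("%d:%s" % (key, message))  # deterministic randomness from K and M
    # A sequence of floating point numbers in the style of Cox et al. [7];
    # a bit-string encoding [20] would be an equally valid form for W.
    return [rng.gauss(0.0, 1.0) for _ in range(length)]
```

Note that the watermark here has no direct spatial representation; it is simply an encoding of the message in a form usable by an embedding strategy.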

Figure 1.2: General model of watermark generation and embedding. (Inputs: message M, secret key K, image I and embedding strength α; output: the watermarked image Î.)

2. Embedding of the watermark W into the host image I, giving watermarked image Î:

Î = f1(W, I, α)    (1.2)

The image data will present a certain number of sites into which the embedding algorithm can em-
bed watermark data. Depending on the algorithm, these sites may be actual pixels, or coefficients
from some transform domain representation of the image. The strategy used to select these sites
is generally the watermarking system’s main distinguishing feature [29]. The sites may be chosen
based on some perceptual significance to decrease visibility within the final image (using some
model of the human visual system), or based on a strategy to increase robustness to expected im-
age processing operations. Actual embedding is usually additive between a site’s value and a data
element from the watermark, and is performed in a deterministic sequence [10]. If more sites are
available than watermark data elements, then an algorithm may choose to embed multiple copies
of the watermark. α is a parameter common to all watermark embedding techniques that in some
way controls the strength of the additive operation. In some cases this may be further adjusted per
site by the embedding algorithm, based on some perceptual weighting (again, using some model
of the human visual system).

1.5.2 Watermark Detection


Figure 1.3 depicts the general watermark detection process. Recovery of information Y from a host
image Î requires at a minimum the watermark key K:

Y = g(Î, K),    (1.3)

or additionally, the original watermark W:

Y = g(Î, K, W),    (1.4)

or at a maximum, the addition of the original, unwatermarked image I:

Y = g(Î, K, W, I)    (1.5)

Figure 1.3: General model of watermark recovery. (Inputs: watermarked image Î, secret key K, and optionally the watermark W and original image I; output: the message M, or a confidence measure that watermark W is embedded in the image.)

The recovered information Y can take two forms: either the watermark itself (Equation 1.3) or a
confidence measure that a given watermark is present (Equations 1.4 and 1.5). In the case of Equation 1.5,
detection is trivial, as all of the necessary information is available to invert the embedding operation
and extract the embedded watermark. Note that even in this case the watermark generation function is
not invertible, so the message cannot be extracted; rather, some form of correlation between the original and
extracted watermarks is applied to gain a confidence measure. In the other two cases, many systems
attempt to invert the embedding operation using estimates for any missing data elements. For example,
a very common additive embedding technique is given by the equation:

Ŝi = Si (1 + α Wi ), (1.6)

where Ŝi and Si are the modulated and original site values respectively, and Wi is the corresponding wa-
termark value. Assuming that the site selection is deterministic, this is clearly invertible at the detector:

Wi′ = (Ŝi − Si) / (α Si),    (1.7)

where W′ is the extracted watermark. Hence, if I (and therefore Si) is not available, then Si must be
estimated.
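Equations (1.6) and (1.7) can be expressed directly in code. The sketch below is illustrative only (the function names and the normalised-correlation confidence measure are assumptions, and non-zero site values are assumed, e.g. transform-domain coefficients):

```python
import math

def embed(sites, watermark, alpha=0.1):
    """Equation (1.6): modulate each selected site, Ŝ_i = S_i(1 + α·W_i)."""
    return [s * (1 + alpha * w) for s, w in zip(sites, watermark)]

def extract(modulated, original, alpha=0.1):
    """Equation (1.7): invert the embedding, W'_i = (Ŝ_i − S_i) / (α·S_i).
    Requires the original site values (or estimates of them)."""
    return [(m - s) / (alpha * s) for m, s in zip(modulated, original)]

def confidence(w, w_extracted):
    """Normalised correlation between the original and extracted watermarks,
    used as a confidence measure that the watermark is present."""
    num = sum(a * b for a, b in zip(w, w_extracted))
    den = math.sqrt(sum(a * a for a in w) * sum(b * b for b in w_extracted))
    return num / den
```

With the original sites available the inversion is exact, so the correlation is 1; when the sites must be estimated, the correlation degrades gracefully rather than failing outright.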

1.6 Classes of Watermarking System


Using this general model, Kutter and Petitcolas [23] classify watermarking systems according to the
inputs and outputs of their detection scheme1, and relate these classes directly to the
watermarking applications discussed in Section 1.3. (The following uses the notation of Kutter and
Petitcolas [23].)
1 Kutter and Petitcolas actually refer to these schemes as 'Private' and 'Public' watermarking; however, as noted in other
literature, this could be confused with the notion of private- and public-key cryptography. Therefore the other accepted names,
'Blind' and 'Informed', are used.

• Informed Watermarking systems produce a yes/no answer to whether a given watermark is
present in an image: g(Î, K, W, I) → {0, 1}. This is clearly the most restrictive scheme, and could
only be used for copyright owner verification [29] as no message information is recovered.
Additionally, to maintain the integrity of the copyright protection model, the verification can
conceptually only be executed by the copyright holder, and within a suitable environment (for instance, in
court). Clearly distributing the original image to other parties for purposes of watermark detection
would circumvent the original intentions of applying a watermarking scheme.

• Semi-Informed Watermarking systems produce a yes/no answer to whether a given watermark
is present in an image: g(Î, K, W) → {0, 1}, but do not require the original image to be present.
This can also only be used for copyright owner verification [29], however the absence of the
original image in the detection process gives a wider scope of use. The detection procedure
can be done remotely, with the practicalities of secret material distribution (the watermark and
the watermark key) addressed by standard cryptographic protocols. Many of the watermarking
schemes proposed in the literature fall into this class [17].

• Blind Watermarking systems recover the original, embedded message using only the watermark
key: g(Î, K) → M. Clearly this type of system has the most practical application, as it is the
only class which performs true extraction rather than detection of information, and requires minimal
distribution of secret material. Few practical blind watermarking systems have been proposed, as
generally they are difficult to design within the necessary requirements of robustness and security
(i.e. removing the watermark typically does not render its host image unusable [10]).
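To make the class distinction concrete, a sketch of an informed detector (the most restrictive class) follows. Everything here is illustrative rather than taken from any particular system: the threshold value, the function name, and the correlation decision rule are all assumptions.

```python
import math

DETECTION_THRESHOLD = 0.5  # assumed value; real systems tune this against false-positive rates

def informed_detect(marked, original, watermark, alpha=0.1):
    """g(Î, K, W, I) → {0, 1}: with the original image available, invert the
    additive embedding of Equation (1.6) exactly, then correlate the extracted
    sequence against the known watermark W. (The key K is implicit here: it
    determined W and the deterministic site sequence.)"""
    extracted = [(m - s) / (alpha * s) for m, s in zip(marked, original)]
    num = sum(w * e for w, e in zip(watermark, extracted))
    den = math.sqrt(sum(w * w for w in watermark) * sum(e * e for e in extracted))
    return 1 if num / den > DETECTION_THRESHOLD else 0
```

A semi-informed detector has the same shape but must estimate the original site values; a blind detector instead decodes message bits directly from the marked sites.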

1.7 Properties of Digital Watermarking Systems


Although a watermarking system’s class will constrain its potential applications, further properties must
be examined to determine its specific suitability. Three properties are generally used to describe a
watermarking system [17]:

• Fidelity - The visibility of artefacts introduced into an image by the watermarking process. Cox
et al. [10] describe this as the ‘perceptual similarity’ between watermarked and unwatermarked
versions of an image.

• Capacity - The information carrying capacity of the watermarking system. A watermarking system
can carry N + 1 bits of information, comprising one of 2^N different messages encoded into
the watermark, and one additional bit gained from the hypothesis test 'is the given watermark
present?'. Clearly semi-informed and informed watermarking systems have only a one-bit
capacity, which is sufficient for owner verification applications [10]. Kutter and Petitcolas [23] suggest
that a capacity of approximately seventy bits is necessary for practical applications of transaction
tracking, which would be sufficient to store the fields discussed in Section 1.3.

• Robustness - The ability of the watermark to be resilient to passive distortions that do not ren-
der the image unusable for its intended purpose (i.e. distortions that still preserve the value of
the image [10]). These operations include standard image processing (e.g. affine transformation,
spatial filtering), transmission distortion (e.g. impulse noise) and storage distortion (e.g. lossy
compression). Robustness also relates to the ability of the watermark to withstand active attempts
at unauthorised removal. Such attacks can include statistical analysis [35], non-linear
geometric distortions [8] and 'presentation attacks' (e.g. where an image is split into small
sub-images which are juxtaposed to form the original image when rendered [33]). As discussed in
Section 1.5.1, the watermark key provides some element of robustness against ‘brute-force’ active
attack [24].

The relationship between these properties is complex; typically they are conflicting or ‘mutually
competitive' [13]. For example, if a watermark is applied with a high embedding strength (α), then
correlation values at a binary detector will be higher, but this will lead to the modulated sites being more
visible in the host image. Similarly, if error-correcting coding is used to encode an M-bit message into a
watermark, then detection (and message recovery) will be more robust; however, capacity will be reduced
as the coding method will force M < N. Figure 1.4 visualises this concept. A watermarking system's
generation, embedding and detection techniques will be designed to optimise its position within this
fidelity-capacity-robustness property space, and also aim to reduce the relative size of the space, so that
any trade-off is less apparent.
Figure 1.4: The conflicting properties (fidelity, capacity, robustness) of watermarking systems (adapted from [13]).
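The fidelity-robustness conflict can also be illustrated numerically. The sketch below is an assumption-laden toy (the function name, the choice of mean-squared error as a fidelity proxy, and the noise parameters are all illustrative): it embeds at a given strength, applies additive Gaussian noise as a passive distortion, and reports the distortion introduced by embedding alongside the detector correlation after attack.

```python
import math
import random

def fidelity_vs_robustness(sites, watermark, alpha, noise_sigma=5.0, seed=0):
    """Embed with strength alpha (Equation 1.6), attack with additive Gaussian
    noise, then measure: the MSE of the watermarked sites against the originals
    (higher = worse fidelity) and the normalised correlation of the extracted
    watermark (higher = more robust detection)."""
    rng = random.Random(seed)
    marked = [s * (1 + alpha * w) for s, w in zip(sites, watermark)]
    attacked = [m + rng.gauss(0.0, noise_sigma) for m in marked]
    mse = sum((m - s) ** 2 for m, s in zip(marked, sites)) / len(sites)
    extracted = [(a - s) / (alpha * s) for a, s in zip(attacked, sites)]
    num = sum(w * e for w, e in zip(watermark, extracted))
    den = math.sqrt(sum(w * w for w in watermark) * sum(e * e for e in extracted))
    return mse, num / den
```

Sweeping α traces one edge of the property space in Figure 1.4: raising α increases both the embedding distortion (worse fidelity) and the post-attack correlation (better robustness).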

1.8 Summary
The digital multimedia revolution has presented new challenges to copyright protection, particularly ca-
sual copyright infringement of digital images. To aid a solution to this, digital watermarking techniques
have been developed that invisibly embed copyright information within an image. The requirements of
such watermarks are dependent on the copyright protection application, but generally all watermarking
systems are constrained by the properties of capacity, robustness and fidelity, which are conflicting and
cannot be independently optimised. Additionally, based on the outputs of a watermarking system’s de-
tection strategy, systems can be identified to belong to one of three separate classes. A system’s class

will place a limit on the scope of possible copyright protection applications to which it can be applied.
Watermarking systems are also characterised by their watermark generation, embedding and de-
tection techniques. Typically, these follow the same overall model, but present diverse solutions in
optimising performance within the fidelity-capacity-robustness property space.

Chapter 2

Benchmarking Watermarking Systems

2.1 The Need to Benchmark


As watermarking systems have such complex dependencies, it is difficult to answer the question "How good
is a watermarking system?" and, more importantly, "Which is the better watermarking system for a given
application?". However, as argued by Nikolaidis et al. [30], if watermarking systems are to be used
practically, these questions must be addressed.
Generally, when a new watermarking scheme is proposed, its creators will include some empirical
results to validate their property claims. However, the tests that are performed are usually far from com-
prehensive, and the lack of any standard metrics or testing techniques make it difficult to compare results
[30]. Therefore, efforts have been made to devise scientific techniques for benchmarking watermarking
systems, hence giving a standard mechanism for their evaluation and comparison.

2.2 Existing Benchmarking Techniques


Many such benchmarking systems have been proposed, of which Stirmark [23] and Checkmark [28]
are arguably the most well-known, and are relatively mature applications1. Both follow a similar
benchmarking model; however, a more thorough abstract framework has recently been developed by Solachidis
et al. [41], which highlights weaknesses in these earlier systems and attempts to extend their model to be
more scientifically valid. The following sections discuss the framework of Solachidis et al. and where
necessary highlight the weaknesses of Stirmark and Checkmark. 2
1 The software versions discussed within this report are Stirmark v4.0 and Checkmark v1.2.
2 Throughout the remaining sections of this chapter, Stirmark, Checkmark and the work of Solachidis et al. are referred
to without citation in the interests of brevity. However, all discussions of these works are derived from [23], [28] and [41]
respectively.

2.2.1 Performance Metrics
Ideally, the output from a benchmark would be a scalar value expressing how ‘good’ a watermarking
system is. A number of performance tests would be administered, the results weighted and an over-
all score produced. However, this black-box approach is not suited to the complex problem space of
watermarking, and does not provide enough information to evaluate a watermarking system against
application-centric requirements.
Solachidis et al. suggest that a number of separate performance metrics are required to fully describe
a system’s performance. Firstly, the tri-dimensional property space can be divided into the more man-
ageable metrics of fidelity vs. robustness, robustness vs. capacity and capacity vs. fidelity, where in
each case the third parameter is set to a constant value. Generally, these metrics require embedding a
watermark into an image and measuring the relevant values when varying the external parameters of α
(the embedding strength) and N (the size of the encoded message). Measurement of capacity is trivial,
as this is simply some function of N. Measurement of fidelity can be achieved through either human ob-
servation or some algorithmic perceptual distance measure (e.g. [46], [44]). Measurement of robustness
for non-blind watermarking classes could be some function of the correlation value at detection, and for
blind schemes, some function of the bit-error-rate at detection.
These metrics are useful, but do not encapsulate the true performance of the watermarking system;
rather they measure the relative impact of the trade-off within the property space. Solachidis et al.
suggest therefore that a further metric is required that deals solely with robustness, as in practice both
the capacity and the fidelity of a watermarking system would be fixed. This correlates strongly with the
actual definition of robustness, as it is not simply a function of the embedding strength, but a dynamic
property that describes how resilient a watermark is to attack (both passive and active). It follows that
this metric assesses detection performance, following the application of different types and strengths of
attack to a watermarked image, whilst maintaining a constant capacity and fidelity.

2.2.2 Problems with Existing Techniques


Stirmark and Checkmark calculate these four performance measures, and offer varying levels of compactness
when presenting results, from detailed plots for all metrics down to a single weighted, combined ‘score’
for the watermarking system under evaluation. Although this seems a thorough benchmarking model,
Solachidis et al. discuss a number of flaws:

• The results do not reflect the non-functional requirements of a watermarking system, so metrics
should be expanded to include more practical aspects such as execution time.

• From a scientific and experimental perspective, Stirmark and Checkmark do not parameterise their
metrics fully. Specifically, Stirmark and Checkmark will generate a single metric by performing
a number of experimental permutations on a range of host images and averaging the results. However,
a single watermark key and message are reused for all permutations. Solachidis et al. argue that
detection performance is key and message dependent, hence a large number of keys and messages
should be used when calculating a metric involving detection accuracy.

Figure 2.1: The RGB colour channels of a digital image.

• Each of the four metrics requires some form of fidelity measure to enable its calculation. Both
Stirmark and Checkmark use an algorithmic perceptual distance measure to enable automated
calculation of fidelity. As demonstrated in Section 2.4, and also discussed extensively by Wang
et al. [45], this is difficult to achieve accurately. Stirmark and Checkmark do not approach
this naively; however, more recent methods are available than those they currently employ.

• A watermarking system could exhibit artificially high robustness through false-positive detection.
Although this issue is acknowledged by both Stirmark and Checkmark, false-positive rates are not
included in any final performance measures.

A further problem with these existing systems is in their treatment of issues regarding colour. A
colour can be mathematically described in terms of three independent variables [40], hence within a
digital colour image, each pixel is represented by a coordinate within a particular tri-dimensional colour
space. The Red-Green-Blue (RGB) colour space is typically used within digital imaging, however many
other colour spaces can be derived from RGB through either a linear or non-linear transformation; there-
fore an image can be modelled as being composed of three independent ‘channels’ or ‘bands’ of data
(see Figure 2.1). Using this model, a watermarking system’s embedding function may choose embed-
ding sites derived from one or many of these colour channels.
One would expect therefore that a benchmarking system should treat colour images and colour image
embedding as the norm, particularly as ‘real-world’ scenarios of watermarking would typically involve
colour images. Checkmark and Stirmark3 use colour images for benchmarking purposes; however, when
calculating the fidelity between an original and watermarked image, they are first converted to greyscale
images (i.e. only the intensity channel of the hue-saturation-intensity colour space is used). Many early
watermarking systems were designed for greyscale images, with an implicit extension to colour images
via intensity channel embedding [38]. For such systems, the approach of Stirmark and Checkmark
is valid, however more recent watermarking techniques use other colour channels to exploit aspects
of human colour perception within their robustness strategy (e.g. [12], [22]). Clearly this could lead to
3 As discussed in Section 2.4.1, the Stirmark system is described as using a colour fidelity metric in its original paper;
however, examination of the software source code showed no evidence of its implementation.

inaccuracies in fidelity measurement, and hence unfair comparisons between greyscale and colour image
embedding algorithms.

2.3 A New Benchmark


A primary objective of this project is to evaluate the performance of a number of watermarking systems.
Clearly, this could have been performed experimentally using either Checkmark or Stirmark. However,
in the following sections, a new benchmark is presented that attempts to overcome some of the limi-
tations of Stirmark and Checkmark, both by constraining the problem space and employing the more
scientifically valid techniques suggested by Solachidis et al. Additionally, this new benchmark explores
the problems of performing algorithmic perceptual distance measurement, including native handling of
colour images and colour channel embedding strategies.

2.3.1 Constraining the Problem Space


It has been shown that Checkmark and Stirmark deal with the complex interdependence of watermark
properties by partitioning the problem space and using a variety of performance metrics. This allows
both systems to benchmark the various classes of watermarking system discussed in Section 1.6, and
hence provide an evaluation against the application-centric requirements discussed in Section 1.3. How-
ever, if the possible applications are constrained to be only ‘owner verification’, then the properties of
capacity and fidelity are inherently fixed to a constant value (one bit and ‘approximately invisible’
respectively). This constraint removes the need for the fidelity vs. robustness, robustness
vs. capacity and capacity vs. fidelity metrics, as the watermarking system needs only to optimise along
the single dimension of robustness against attack.
If the dimensionality of possible attacks is further reduced to include only those that are ‘passive’ (see
Section 1.7), then a practical measure of ‘owner verification’ watermark performance can be attained by
assessing the watermark’s resilience to such attacks. Limiting the benchmark to only passive attack
implies an application-centric constraint, i.e. the worth of an image would not be regarded as high
enough to expect anything other than the size of the watermarking key space to provide protection against
naive (i.e. average-user) approaches to watermark removal.
The proposed benchmarking scheme is designed within these constraints. However, as demonstrated
in the following sections, its scientific validity is increased over that of Checkmark and Stirmark as a
result of this specialisation.

2.3.2 High Level Design


The design of the benchmark is essentially that of Checkmark and Stirmark’s approach to passive attack
evaluation, with embellishments from the framework of Solachidis et al. A singular metric data point
is derived by watermarking an image with a given message and key, attacking the watermarked image
with an image processing operation, and finally attempting to detect the watermark in the attacked,
watermarked image (see Figure 2.2).

Figure 2.2: The conceptual benchmarking model.
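The single data point just described (embed, attack, detect) can be sketched as follows. `ToySystem` and the identity ‘attack’ are entirely hypothetical stand-ins, invented here only so the pipeline is runnable; they do not represent any of the watermarking systems evaluated in this project.

```python
# Hedged sketch of one benchmark data point: embed, attack, detect.
# ToySystem is an invented stand-in, not a real watermarking scheme.

class ToySystem:
    """'Embeds' by adding a key-derived constant to every pixel;
    'detects' by measuring how much of that constant survives."""

    def generate(self, key, message):
        # Encode message and key into a small additive strength.
        return (key + len(message)) % 7 + 1

    def embed(self, image, watermark):
        return [[p + watermark for p in row] for row in image]

    def detect(self, attacked, watermark, original):
        diffs = [a - o
                 for arow, orow in zip(attacked, original)
                 for a, o in zip(arow, orow)]
        return (sum(diffs) / len(diffs)) / watermark  # 1.0 if intact

def benchmark_point(system, image, key, message, attack):
    """One data point: the raw detection value after a single attack."""
    watermark = system.generate(key, message)
    marked = system.embed(image, watermark)
    attacked = attack(marked)
    return system.detect(attacked, watermark, image)

identity = lambda img: img  # the degenerate 'no attack' case
d = benchmark_point(ToySystem(), [[10, 20], [30, 40]], 42, "owner-id", identity)
print(d)  # 1.0 for the identity attack
```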

Although the benchmark only deals with classes of watermarking system that have a one-bit capacity
(i.e. the message is conceptually zero bits), a message must still be provided to the watermark generation
stage. This is non-intuitive; however, it is only the (hypothesis-testing) detection strategy that constrains
the capacity to one bit. The generation stage will encode the message into a certain representation,
and the key will then permute this representation. So in the case of owner verification applications, the
message is still required (e.g. a unique identifier of the copyright holder), otherwise the space of all
possible watermarks is simply the key space, rather than the message×key space.
The original watermark is available at detection (as the watermarking system will be informed or
semi-informed), so rather than thresholding the detector output to produce a binary value as to whether
the given watermark is present (clearly it is), the actual detection value is recorded. This detection value
provides one data point in the benchmark of a certain watermarking system, with a given key, message,
image and attack.
The value of the embedding strength, α , will clearly have a direct effect on the detection value,
however it also has a direct (conflicting) effect on the fidelity of the watermarked image. The benchmark
design calls for the fidelity to be at a constant value, therefore the embedding strength must somehow
be derived for any given watermarking system, key, message and image so that the fidelity is constant,
and hence results derived from the detection values are comparable. Section 2.4 discusses these issues
extensively.

2.3.3 Experimental Parameters


A single benchmark data point clearly does not encapsulate the true robustness of a watermarking sys-
tem. For experimental validity, many data points must be generated and consolidated. Solachidis et al.
propose the following experimental parameters (using their notation):

• A set of images I = {Ii |i = 1 . . . NI }. These images should represent a broad range of image ‘types’,
i.e. have different sizes and content types to reflect the differing statistical properties of possible
image data (e.g. images containing bright colours, textures, fine details, lines, edges or smooth
areas).

• A set of keys K = {Ki |i = 1 . . . NK }. A range of keys should be used as detection performance
is key dependent. As discussed in Section 2.2.2, this parameter is not present in Checkmark or
Stirmark and hence affects their accuracy.

• A set of messages M = {Mi |i = 1 . . . NM }. Even in the case of this constrained benchmark, detection
performance is message dependent. In many non-blind watermarking systems the key does
not actually change watermark values; rather, it permutes the values’ positions in the watermark
vector. Hence using different messages will ensure a good statistical variation in embedded watermarks,
as the messages will be encoded to different watermark vectors before any permutation.
Again, as discussed in Section 2.2.2, this parameter is not present in Checkmark or Stirmark and
hence affects their accuracy.

• A set of attacks A = {Ai |i = 1 . . . NA }. The set of attacks should reflect a variety of common image
processing operations, transmission distortions and storage distortions.

The above sets are combined to produce many experimental permutations and hence compute many
detection values. For a given watermarking system, watermarks will be embedded using each Ii, Ki and
Mi, giving Ne = NI × NK × NM watermarked images. The embedding strength will be adjusted for each
embedding operation so that the constant fidelity requirement is satisfied across all watermarked images.
For each watermarked image, attack Ai is applied, giving Nd = Ne × NA watermarked, attacked images.
Detection can then be performed on each watermarked, attacked image, giving a set D of Nd detection
values.
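Generating these permutations is a direct Cartesian product of the four sets; the sketch below uses placeholder image, key, message and attack values purely to make the counts concrete.

```python
# Generating the Ne = NI x NK x NM embedding permutations and the
# Nd = Ne x NA detection permutations. All values are placeholders.
from itertools import product

images   = ["image-1", "image-2", "image-3"]   # I,  NI = 3
keys     = [101, 202]                          # K,  NK = 2
messages = ["owner-a", "owner-b"]              # M,  NM = 2
attacks  = ["jpeg", "blur", "noise"]           # A,  NA = 3

embeddings = list(product(images, keys, messages))
detections = list(product(embeddings, attacks))

print(len(embeddings))  # Ne = 3 * 2 * 2 = 12
print(len(detections))  # Nd = 12 * 3 = 36
```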

2.3.4 Performance Measures


One approach to combining the set of detection values into a usable metric would be to simply average
them:
X = (1/Nd) ∑(i=1…Nd) Di   (2.1)

Using many detection values would clearly reduce the influence of any outliers in the results, and the
value of X could be used as an indication of a practical threshold for a binary detector. Similarly, the
variance of the detection values could give some indication of detection ‘accuracy’.
However, this approach is not sufficient to describe the watermark’s performance, as outliers may
be of particular relevance to an owner verification application, specifically if they are clustered around a
certain attack [41]. Clearly this will highlight that a watermarking system’s robustness strategy is weak
with regard to the statistical distortions introduced by the attack. Therefore, separating the results into
attack classes (image processing, transmission distortion and storage distortion) and further into specific
attacks gives a more informative performance measure.
As a further extension, and following the approach of Stirmark and Checkmark, attacks should
be applied at varying strengths, and the detection values averaged over each strength/attack rather than
choosing an arbitrary (constant) strength for each attack. The aim of the benchmark is to allow evaluation
through measurement of watermark degradation, i.e. its dynamic robustness properties, so grouping the

Figure 2.3: An example attack strength/detection value plot.

results using this method will allow a plot of detection value against attack strength (Figure 2.3 shows
an example).
Note that for each data point, the detection value will have been averaged over Ne = NI × NK × NM
detections. Therefore the set of attacks, A, is conceptually extended to include each attack type, applied
at each attack strength. As discussed in Section 1.7, a watermark need only be robust to attacks that
do not render its host image unusable for its intended purpose, so the range and granularity of strengths
will be inherently different for each attack type, and bounded by the requirements of the watermarking
application.
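The per-strength averaging behind such a plot can be sketched as follows; `detect_after_attack` is a hypothetical stand-in for the full embed/attack/detect pipeline, and the toy decay model is invented purely for illustration.

```python
# Hedged sketch: average detection values per attack strength, giving
# one curve per watermarking system as in the strength/detection plot.
# `detect_after_attack` is a hypothetical stand-in for the full
# embed/attack/detect pipeline.
from statistics import mean

def strength_curve(detect_after_attack, permutations, strengths):
    """For each strength, average detections over all permutations."""
    return [mean(detect_after_attack(p, s) for p in permutations)
            for s in strengths]

# Toy model: detection decays linearly with strength, plus a small
# per-permutation offset that averages out.
perms = [-0.1, 0.0, 0.1]
toy_detect = lambda p, s: max(0.0, 1.0 - s / 100.0) + p
curve = strength_curve(toy_detect, perms, strengths=[0, 50, 100])
print(curve)  # approximately [1.0, 0.5, 0.0]
```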

2.3.5 Creating Comparable Results


Ultimately, the metric should allow watermarking systems to be compared. Therefore, the notion of
‘detection value’ should be normalised across watermarking systems to allow this comparison. Although
most informed and semi-informed watermarking systems use some form of correlation to calculate a
detection value, they are generally not directly comparable [17].
To produce a normalised detection value across all systems, Stirmark and Checkmark take the rather
draconian approach of requiring actual detection schemes to be changed, so that detection values range
0 . . . 1. It should be noted that detection of a watermark in an unattacked host image may produce a
detection value less than 1. This is due to the statistical properties of the image affecting the correlation
measure, and is highly probable if the embedding function is non-invertible. It follows that the range
0 . . . 1 is interpreted by Stirmark and Checkmark as the range of detection values for all possible images,
keys and messages. Therefore, given a detection value of 0.75 from an attacked image, no distinction is
made between the effect of the attack and the intrinsic maximum detection value for the particular image,
key and message used.
The process of normalising detection values is not addressed by Solachidis et al.; however, the approach
of Stirmark and Checkmark can be improved by leveraging the application-centric constraints of
the proposed new benchmark. The strategy employed by Stirmark and Checkmark is a result of their
architecture being generalised to deal with all classes of watermarking system, hence binary detection
is treated as a sub-set of full watermark (and message) extraction. This disregards the fact that in both
informed and semi-informed watermarking, the original image and watermark are either explicitly or im-
plicitly known at the time of detection, hence the absolute maximum of detection value can be calculated
for a particular image, key and message. In owner verification applications, this absolute maximum is
often used as the basis of the binary detection threshold; hence the proposed new benchmark normalises
detection results using this premise. Specifically, for each watermarked image, an absolute maximum
detection value is calculated by detecting the watermark before any attack is applied:

DMAX = g(Î, K, W, I),   (2.2)

where W is the watermark generated from key K and message M, using the appropriate watermark
generation function (Equation 1.1). Î is the watermarked image created by the watermarking system’s
embedding function (Equation 1.2), using watermark W and original image I. Similarly, an absolute
minimum detection value is calculated by attempting to detect many incorrect watermarks in the image
before any attack is applied:
DMIN = (1/N) ∑(x=1…N) g(Î, K, Wx, I),   (2.3)

where Wx is generated using a key Kx ≠ K, and a message Mx ≠ M. DMAX and DMIN are then used to
normalise the detection value Di, for each detection performed on the attacked version of watermarked
image Î:

D̂i = (Di − DMIN) / (DMAX − DMIN)   (2.4)
Due to the application-specific nature of the benchmark, this ‘local’ normalisation presents a more
practical measure of watermark degradation. It also allows the watermarking system’s original detection
measure to be used, rather than changing the actual detection function to output a value in the
range 0 . . . 1.
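The normalisation of Equations 2.2–2.4 reduces to a small routine; the detection values below are invented purely for illustration of the mapping.

```python
# Sketch of the 'local' normalisation: DMAX from detecting the true
# watermark pre-attack, DMIN averaged over many wrong-watermark
# detections, then each Di mapped into that range (Equations 2.2-2.4).
# The numeric detection values are invented for illustration.

def normalise(d, d_max, wrong_watermark_detections):
    d_min = sum(wrong_watermark_detections) / len(wrong_watermark_detections)
    return (d - d_min) / (d_max - d_min)

# A correlation-style detector: DMAX = 0.8, wrong-key detections ~0.1.
d_hat = normalise(0.45, 0.8, [0.08, 0.12, 0.10])
print(d_hat)  # approximately 0.5
```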

2.4 Perceptual Distance Metrics


Any watermark benchmarking model requires a method of measuring fidelity, i.e. the ‘perceptual dis-
tance’ between an unwatermarked and watermarked image [41]. In the above sections, Checkmark,
Stirmark, and the proposed new benchmarking framework are described as having a function thus:

P = h(I, Î),   (2.5)

where P is a value that quantifies the perceptual difference between the unwatermarked image I, and the
watermarked image Î. This function is utilised in all of Checkmark and Stirmark’s metrics to measure
fidelity, and in the case of the proposed benchmark, it is used to adjust the embedding strength until
a particular fidelity is reached (a constant fidelity across all experimental permutations). It is also this

function that determines the benchmark’s ability to compare watermarking systems in a fair manner.
Specifically, if this function includes effects of human colour perception when measuring the fidelity
between two images, then benchmarking will be independent of the colour channel and colour space
used for embedding.
A large amount of current research is directed at deriving such a function, as it is useful not only
for watermark benchmarking, but also in many other areas of image processing, e.g. optimisation of
JPEG quantisation matrices [46]. It is generally agreed, however, that it is a hard problem to solve
due to the subjective nature of the question being asked: “How perceptually different are two images
in terms of how a human would see them?” [45]. The approach taken by many authors when giving
empirical results on a new watermarking system is to use the mean-squared error (MSE) of the two
images [44]. The MSE is calculated as the average of the squared intensity difference of corresponding
pixels from the two images. (The MSE measure could also be applied to different colour channels in a
naive attempt to produce an overall error value for a colour image.) Similarly, the peak signal-to-noise
ratio (PSNR) is also used as a simple measure, and is recommended for use by Solachidis et al. within
their benchmarking framework as “no other globally applicable and acceptable quality measure has been
proposed so far” [41].
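The MSE and PSNR measures just described can be written down directly; this sketch assumes single-channel images given as nested lists of 8-bit values (peak 255).

```python
# The MSE and PSNR measures, for single-channel images given as nested
# lists of 8-bit values (peak 255).
import math

def mse(a, b):
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2
               for ra, rb in zip(a, b)
               for x, y in zip(ra, rb)) / n

def psnr(a, b):
    m = mse(a, b)
    return float("inf") if m == 0 else 10 * math.log10(255 ** 2 / m)

a = [[10, 20], [30, 40]]
b = [[12, 18], [30, 40]]
print(mse(a, b))             # (4 + 4 + 0 + 0) / 4 = 2.0
print(round(psnr(a, b), 1))  # 45.1
```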
Figure 2.4 demonstrates the shortcomings of using the MSE (the PSNR measure is similar). Clearly
the watermarked images have different types of errors introduced, however the MSE does not differ-
entiate the visibility of these errors [44]. Specifically, the MSE treats all errors in an image equally,
however human visual perception is not uniform and responds differently with regard to spatial frequen-
cies, brightness and colour [10]. It is therefore universally accepted that such simplistic measures are
inaccurate, as they are not correlated with the complex nature of the human visual system [14] [45].
Hence, if used within a benchmark, constant fidelity would not be guaranteed, particularly when testing
a watermarking system that exploits aspects of psycho-visual colour perception [23].

2.4.1 Existing Benchmark Approaches


Many advanced perceptual distance metrics (PDM’s) are presented in the literature. Also, Checkmark
and Stirmark pay particular attention to this topic, and utilise PDM’s developed by Watson [46] and
Lambrecht et al. [43] respectively. Initially, these two PDM’s were candidates for use within the pro-
posed new benchmark, however they were rejected for a number of reasons:

• Checkmark employs a PDM developed in 1993 by A. B. Watson [46]. Originally designed to
aid the optimisation of JPEG quantisation matrices for individual images, it has been adapted by
Checkmark and also by a number of watermark systems to aid fidelity optimisation [10]. It has
also been evaluated against a number of other PDM’s by Mayache et al. [26], and shown to be
accurate against subjective experiments. However, it is designed exclusively for monochrome
images, which is impractical with regard to watermark benchmarking. Checkmark overcomes this
by converting any colour images to greyscale before applying the metric. As discussed above, this
may lead to inaccuracies where a watermarking system exploits aspects of psycho-visual colour
perception.


Figure 2.4: ‘Lena’ image (intensity channel only) with various attacks applied. (a) Unattacked image.
(b) Contrast stretched image, MSE ≈ 200. (c) Gaussian noise distorted image, MSE ≈ 200. (d) Impulse
noise distorted image, MSE ≈ 200. (e) Mean blurred image, MSE ≈ 200. (f) JPEG compressed image,
MSE ≈ 200. This particular type of visualisation of the effects of the MSE is adapted from [45].

• Stirmark presents a more recent (1996) colour-based metric proposed by Lambrecht et al. [43].
No evaluations or further reviews of this metric could be found, and the original paper is incomplete
in some of the implementation details. Although the Stirmark paper states that this metric
is used as its PDM, investigation of the publicly available software source code provided evidence
of only a PSNR measure. Again, this is only applied to the intensity channel of an image.

2.4.2 A New Approach


As neither of these metrics was acceptable, an extensive investigation was undertaken to find applicable
metrics that could be harnessed as a PDM. Several more recent techniques were found to be available,
of which two are presented below.

2.4.2.1 S-CIELAB

S-CIELAB [48] has been developed by X. Zhang and B. Wandell of the Department of Psychology,
Stanford University as a spatial extension to the CIE L*a*b* DeltaE colour difference formula [31].

CIE L*a*b* DeltaE is a ‘perceptual colour fidelity’ metric which is used extensively for measuring the
perceptual colour difference between two large, unvarying regions of colour. S-CIELAB attempts to ex-
tend this metric to general colour images by including spatial frequency information into the calculation,
arguing that colour discrimination is a “function of spatial pattern”. The reader is referred to the cited
paper for a more detailed explanation of the algorithm.
In its original form the S-CIELAB metric will produce an error map of the perceptual colour distance
between two images, i.e. for each pair of corresponding pixels, an error value is produced in (perceptually
uniform) CIE L*a*b* DeltaE units. However, this map cannot be used directly; rather, a method is required
to combine these per-pixel errors into one error value that encapsulates the total perceptual distance
between the two images. This is a common problem for PDM’s, and the solution presented by many
(including the Watson metric [46] used in Checkmark) is to pool the error values using a Minkowski
summation, in an attempt, as Watson describes it, “to combine the probabilities that individual errors
will be seen” [46]. The summation generally takes the form:
E = ( ∑i ∑j |ei,j|^β )^(1/β),   (2.6)

where ei,j is the error at position i, j in the error map and β is a constant that determines the ‘degree’ of
pooling. When β = 1, E is simply a summation of the individual absolute error values. As β grows,
the summation tends towards a ‘maximum-of’ operation on the error values. Watson notes that in
controlled psycho-physical experiments, a value of β = 4 has been cited as matching human observation
[46].
Unexpectedly, the Minkowski summation does not appear to have been proposed in the literature
as a pooling technique for an S-CIELAB error map. However, as an exploratory solution, the proposed
benchmark applies a Minkowski summation to the results of the S-CIELAB measure to produce a per-
ceptual distance metric for full colour images.
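The pooling of Equation 2.6 is straightforward to implement; the toy error map below illustrates how the pooled value moves from a sum of absolute errors (β = 1) towards a ‘maximum-of’ operation as β grows.

```python
# Minkowski pooling of an error map (Equation 2.6). With beta = 1 the
# pooled value is the sum of absolute errors; as beta grows it tends
# towards a 'maximum-of' operation on the errors.

def minkowski_pool(error_map, beta):
    return sum(abs(e) ** beta
               for row in error_map
               for e in row) ** (1 / beta)

errors = [[0.0, 1.0], [2.0, 3.0]]
print(minkowski_pool(errors, 1))    # 0 + 1 + 2 + 3 = 6.0
print(minkowski_pool(errors, 100))  # approximately 3.0, the maximum error
```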

2.4.2.2 SSIM

A recent proposal (2004) by Wang et al. [44] presents a radically different model from typical PDM
techniques. They argue that the human visual system is “highly adapted to extract structural informa-
tion”, and hence a measure of the change in structure between two images will provide a good metric
for their similarity. A qualitative justification for this can be inferred from the distorted images shown
in Figure 2.4. Clearly the blurred image has a greater perceptual difference to the original image when
compared to the version that has been contrast-stretched. The authors argue that this is due to the greater
loss of structural information from the blurred image, whereas the contrast-stretched image preserves
much of its original structure. Note that this metric specifically ignores any effects that colour percep-
tion may have upon the similarity of the images, hence the images are converted to monochrome prior
to assessment. A brief description of the SSIM metric is given below, however the reader is referred to
the cited paper for a more detailed discussion.

The SSIM metric compares two images in an overlapping block-wise fashion, using a circular-symmetric
Gaussian weighting function to reduce blocking artefacts. Each pair of corresponding blocks
is compared for luminance, contrast and structural similarity, with the results combined over all blocks
to give a similarity measure in the range 0 . . . 1. The luminance comparison is a function of corresponding
blocks’ mean intensity, and the contrast comparison is a function of corresponding blocks’ standard
deviation. The structural comparison is computed as the correlation coefficient of the two blocks. The
product of the luminance, contrast and structural comparison functions is then taken as the combined
similarity. Although the metric takes a relatively simple approach, the authors report good results in
subjective tests, and provide a favourable comparison against the commercially available ‘JNDmetrix’
image quality assessment software produced by Sarnoff Corporation4.
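A heavily simplified, single-window sketch of these components is given below. The published metric slides a Gaussian-weighted window over the image and pools over all blocks; this global version only illustrates the combined luminance, contrast and structure term, with stabilising constants set to the defaults commonly quoted for 8-bit images. It is a sketch of the idea, not the authors' implementation.

```python
# Heavily simplified, single-window SSIM sketch. The published metric
# applies a sliding Gaussian-weighted window and pools over all blocks;
# this global version only illustrates the combined
# luminance * contrast * structure term.

def ssim_global(x, y, L=255):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                     # mean (luminance)
    vx = sum((v - mx) ** 2 for v in x) / (n - 1)        # variance (contrast)
    vy = sum((v - my) ** 2 for v in y) / (n - 1)
    cov = sum((a - mx) * (b - my)
              for a, b in zip(x, y)) / (n - 1)          # correlation (structure)
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2           # stabilising constants
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

signal = [10, 20, 30, 40, 50, 60]
print(ssim_global(signal, signal))                      # 1.0: identical signals
print(ssim_global(signal, [v + 25 for v in signal]))    # < 1.0: luminance shift
```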

2.4.2.3 A Hybrid Model

The pooled S-CIELAB metric could have been used solely as the PDM for the proposed new benchmark,
however it presents a number of problems that lower confidence in its accuracy. Firstly, the Minkowski
summation is widely acknowledged as being highly speculative, as it assumes statistical independence
of errors at different locations, which has been shown empirically not to be the case [45]. Secondly,
the overall pooled error value is not intrinsically bounded and hence any threshold must be derived
experimentally, as must an appropriate value for the exponent, β .
To raise confidence in the measure, the proposed benchmark uses both the pooled S-CIELAB metric
and the SSIM metric as a combined PDM, weighting their outputs as appropriate (Section 2.4.3 describes
the particular weighting employed). Using this combined metric is advantageous, as the high-level
approach of the SSIM metric complements the S-CIELAB metric’s low-level modelling of human visual
system. Additionally, the SSIM metric’s bounded output will aid thresholding and calibration of the
pooled S-CIELAB metric.

2.4.3 Calibration
For both metrics, a suitable range of outputs must be experimentally derived that match the constant
fidelity specification of the benchmarking system. A secondary investigation must also be undertaken to
derive a suitable value for the Minkowski summation’s exponent. Therefore, a set of experiments was
designed to quantify these ranges for a conceptual set of seven metrics: SSIM, and pooled S-CIELAB
with β = 2 . . . 7.
Many studies have been undertaken to investigate how perceived levels of distortion can be exper-
imentally derived [10]. The method used here is a much simplified version of that proposed by D. M.
Green and J. A. Swets [15], with experiments undertaken by two subjects using the calibration tool
discussed in Section 4.4. Specifically, an unwatermarked and watermarked image were shown to the ex-
perimental subject, and the embedding strength increased in small steps until the watermark was judged
to be ‘just visible’ (i.e. the watermarked image had a perceptual distance of one ‘Just Noticeable Differ-
4 Please see http://www.sarnoff.com/

          SSIM              S-CIELAB
                  β = 2    β = 3    β = 4    β = 5    β = 6    β = 7
Upper     0.997   551.2    102.4    35.8     25.6     14.1     20.2
Lower     0.996   210.0    15.6     9.4      5.0      12.0     15.9
Table 2.1: Experimental PDM calibration results.

ence’ (JND)5 to the original image). The PDM value for each of the seven metrics was then recorded.
This was then repeated for five different images using four different watermarking systems and the re-
sults averaged for each metric. In order to produce a range of acceptable PDM outputs, the experiments
were repeated but the embedding strength was decreased in small steps from a high value. Again, the
PDM value for each of the seven metrics was recorded and the results averaged for each metric. Table
2.1 shows the derived upper and lower boundaries for each metric.
Many outliers within the results have been removed from the ranges shown in Table 2.1, however
these exhibited a particular trend for each of the metrics. The SSIM outliers appeared as low SSIM values
when assessing an image with a watermark containing mostly high-frequency components. The converse
is true of the S-CIELAB metric, which performed poorly when assessing images with a low-frequency
watermark. A qualitative explanation for this can be derived from how the metrics are calculated. SSIM’s
correlation measure for corresponding blocks’ structural similarity will react strongly to widespread small
changes in pixel intensity, whilst the S-CIELAB metric will weight these as frequency masking effects
and hence produce a more accurate value. A similar explanation can be applied to the converse case,
however a more detailed study of these observations is beyond the scope of this report.
Although sensitivities were found within the metrics, it should be noted that they are opposing.
Hence, combining the metrics’ results should neutralise these effects to some extent, allowing confidence
in the hybrid metric. An equal weighting is placed on each metric, i.e. given the acceptable fidelity
ranges SSIMMIN to SSIMMAX and SCIELABMIN to SCIELABMAX, the final embedding strength α
for image I and watermark W is calculated as α = (X + Y)/2, where X and Y are the embedding strengths
such that, by Equations 1.2 and 2.5,

SSIMMIN ≤ SSIM(I, f(W, I, X)) ≤ SSIMMAX   (2.7)

and,
SCIELABMIN ≤ SCIELAB(I, f(W, I, Y)) ≤ SCIELABMAX   (2.8)
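The automated strength search described above can be sketched as a simple stepping procedure: increase α until the PDM output enters its acceptable range, then average the SSIM- and S-CIELAB-derived strengths. The monotone `pdm` callables below are toy stand-ins, not the real metrics; the ranges used are the SSIM and S-CIELAB (β = 6) rows of Table 2.1.

```python
# Hedged sketch of the automated embedding-strength search: step alpha
# upward and accept the first value whose PDM output lies inside the
# acceptable fidelity range, then set alpha = (X + Y) / 2. The pdm
# callables are toy monotone stand-ins, not the real metrics.

def find_strength(pdm, lo, hi, step=0.01, max_alpha=10.0):
    """Return the first stepped alpha whose PDM value lies in [lo, hi]."""
    alpha = 0.0
    while alpha <= max_alpha:
        if lo <= pdm(alpha) <= hi:
            return alpha
        alpha += step
    raise ValueError("no alpha satisfies the fidelity range")

ssim_like    = lambda a: 1.0 - 0.002 * a  # falls from 1.0 as alpha grows
scielab_like = lambda a: 30.0 * a         # pooled error rises with alpha

x = find_strength(ssim_like, 0.996, 0.997)   # SSIM range, Table 2.1
y = find_strength(scielab_like, 12.0, 14.1)  # S-CIELAB beta = 6 range
alpha = (x + y) / 2
print(round(alpha, 2))  # a strength consistent with both fidelity ranges
```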

The SSIM range can be used directly within the benchmark, however the choice of S-CIELAB range
and the exponent β is more complex. With lower exponent values the acceptable range is large, clearly
showing that the pooled metric is sensitive to the statistical properties of an image and watermark.
As the search for a suitable embedding strength will be automated within the benchmark, the chosen
5 The JND is a standard unit used within psychophysical experimentation, and is defined as “the minimum amount
by which stimulus intensity must be changed in order to produce a noticeable variation in sensory experience” -
http://www.usd.edu/psyc301/WebersLaw.htm

strength will be taken as soon as the PDM value falls within the acceptable range. Hence, for a particular
image and watermark, the embedding strength may be inaccurate if the acceptable PDM range is large.
Therefore β = 6 was chosen for the Minkowski summation, as this proved to be the exponent at which
the results were least sensitive to image and watermark contents.
As discussed in Section 2.4.2.1, it is generally accepted that a value of β = 4 matches results from
psychophysical experiments, and this value is used extensively [10]. This conflicts with the results of
this (albeit simplified) experiment, however some authors discuss larger values as having some validity,
including [32] where β = 6 was found to be optimal.
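The automated strength search described above can be sketched as follows. This is a minimal illustration, assuming hypothetical `metric` and `embed` callables standing in for the SSIM / S-CIELAB metrics and the embedding function f; the real benchmark applies it once per metric and averages the two resulting strengths, α = (X + Y)/2.

```python
def find_strength(metric, embed, image, watermark,
                  fid_min, fid_max, step=0.01, alpha_max=1.0):
    """Step the embedding strength upward and accept the first value
    for which the perceptual distance metric falls inside the
    acceptable fidelity range."""
    alpha = step
    while alpha <= alpha_max:
        fidelity = metric(image, embed(watermark, image, alpha))
        if fid_min <= fidelity <= fid_max:
            return alpha
        alpha += step
    return None  # no acceptable strength was found
```

Note that if the acceptable range is narrow relative to the step size, the search may skip over it entirely, which is why a large acceptable range (low β) trades against accuracy as discussed above.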

2.5 Summary
The evaluation and comparison of watermarking systems is difficult due to the complex interdepen-
dence of their properties. Scientifically valid benchmarking is one technique that can provide a solution
as it decouples these dependencies to produce a usable metric. Many such benchmarks have been pro-
posed, of which, the Checkmark and Stirmark systems are currently the most well known. Although
these benchmarks provide a thorough set of metrics, their experimental validity has been questioned,
due primarily to their over-generalisation of watermarking classes and applications. Additionally, the
‘perceptual distance metric’ components used within their design are far from optimal, particularly in
being sensitive to the colour channel and colour space employed by a watermarking system’s embedding
strategy.
A new benchmarking framework has been proposed that attempts to increase the experimental valid-
ity of Stirmark and Checkmark. This is achieved by expanding their range of experimental parameters
whilst constraining the range of watermarking systems and applications under evaluation, hence allow-
ing a more specific design. Further to this, a new ‘perceptual distance metric’ component is used that
is a hybrid of two recent techniques. This new metric models human colour perception, allowing a fair
benchmark which is independent of a watermarking system’s colour image strategy.

Chapter 3

Experimental Configuration

3.1 Benchmarking Parameters


To satisfy this project’s objectives, four watermarking systems are evaluated using the benchmarking
framework presented in Chapter 2. The systems under test are discussed in Section 3.2, however many
benchmarking parameters must also be populated with relevant data and algorithms, including the sets
described in Section 2.3.3. Hence, the following subsections describe and justify the parameterisation
of the benchmarking system for the experiments performed.

3.1.1 Keys and Messages


The set of keys K and the set of messages M are arbitrary in terms of content as conceptually each
element is used only as the seed of some pseudo-random number generator, i.e. it is the responsibility of
the watermark generation function to ensure that for a given key / message pair, a unique watermark is
produced. The sets K and M are required to be large, as discussed in Section 2.3.3; however, for each
additional key or message used, the number of experimental permutations grows rapidly. Therefore,
to maintain computational feasibility, the experiments use a set of ten keys and ten messages.

3.1.2 Normalisation Calculation


Calculating DMAX (Equation 2.2) requires only one detection per watermarked image; however, calculating
DMIN (Equation 2.3) requires the averaging of many detection values. Many incorrect watermarks
should be used to calculate DMIN to gain an accurate value for a ‘true-negative’ detection; again, however,
this must be constrained to maintain computational feasibility. One hundred incorrect watermarks are
generated for this purpose, using ten keys and ten messages. As discussed in Section 2.3.3, different

keys and messages are used to gain a good statistical variation in generated watermarks.
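The DMIN estimate is a simple average over all incorrect watermarks. A minimal sketch, assuming hypothetical `detect` and `generate` callables with the signatures implied by the benchmark design:

```python
def estimate_d_min(detect, watermarked_image, keys, messages, generate):
    """Estimate the 'true-negative' detection value D_MIN by averaging
    the detector's response to every incorrect watermark built from
    the given keys and messages (10 x 10 = 100 in the experiments)."""
    responses = [detect(watermarked_image, generate(key, msg))
                 for key in keys for msg in messages]
    return sum(responses) / len(responses)
```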

3.1.3 Images
Fabien A. P. Petitcolas (an author of the Stirmark [23] benchmarking system) maintains a database of
suitable images for benchmarking purposes, of which twenty-eight are used as test images within the
experiments (set I). These satisfy the statistical variation requirements discussed in Section 2.3.3, and
are free to use for research purposes. Following image-research tradition, the ubiquitous ‘Lena’ image
is included. A full list of test images is given in Appendix A.

3.1.4 Attacks
Fifteen separate attacks are used within the experiments, satisfying the required passive attack classes
of ‘image processing’, ‘transmission distortion’ and ‘storage distortion’. Each attack has a particular
number of attack strengths; however, as discussed in Section 2.3.4, the range and granularity of strengths
is unique to each attack. For every attack the range of strengths varies from zero (no attack) to either
an intrinsic or specified maximum. If the benchmarking framework were being used to evaluate a
watermarking system for a specific application, then the ‘maximum’ could be described as the strength at
which the attack distorts an image just past the point of usability. Clearly ‘usability’ is application
specific. In the case of these experiments, the benchmarking framework is being used to compare
watermarking systems, so the maximum attack strength is set to that at which each watermarking system
has a sustained detection value of approximately DMIN. Using this extreme value for the maximum
attack strength allows evaluation of the ‘break-down’ point of each watermarking system. Note that
some attacks have a finite range of strengths (e.g. JPEG compression); in this case, DMIN may never be
reached. The granularity of attack strengths was chosen to balance computational feasibility with the
required level of detail in the results. A brief description of each attack is given below:

• Image Processing (Enhancement)

– Brightness Adjust - A constant offset x is applied to the intensity of each pixel in the image.
Offsets are both positive and negative, and the resulting modified pixel values are clamped
at their extremes. Attack strength determines the absolute value of x.
– Contrast Adjust - A constant multiplier x is applied to the intensity of each pixel in the
image. Attack strength determines the value of x. Contrast is both reduced (x < 1) and
increased (x > 1), and the resulting modified pixel values are clamped at their extremes.
– Sharpen - The visibility of high frequency features within the image is increased. Sharpen-
ing is performed via an ‘unsharpen mask’ operation, i.e. a high-frequency mask is derived
by subtracting the original image from a version convolved with a 3 × 3 mean filter. The
sharpened image is created by adding corresponding pixels from the original image and the
mask according to the formula PAT TACKED = PORIGINAL + xPMASK , where x is a scale factor
determined by the attack strength.

– Mean Blur - The image is convolved with a x × x mean filter, where the attack strength
determines x.
– Gaussian Blur - The image is convolved with a 5 × 5 Gaussian filter over x ‘passes’, where
the attack strength determines x.

• Image Processing (Geometric Transformation)


Geometric transformations are dissimilar to other attacks as they imply that a watermarked image’s
shape or dimensions will be modified. Some informed watermarking systems attempt to perform
automatic registration of the original and host images prior to detection, i.e. invert any geometric
transformation [10]. The simplest of these would be to naively rescale the watermarked image
to the dimensions of the original image. Additionally, in some applications of semi-informed
watermarking, the images are manually registered prior to detection. This presents an ambiguous
area for benchmarking using geometric attacks, as generally the attacked version of the image
will be of different dimensions to the original (certainly this is the case for crop, scale, rotate
and shear). Hence, benchmarking values would represent the watermarking system’s registration
performance (if any) rather than base detection performance. Therefore all geometric attacks
used within the benchmark produce an attacked image of the same dimensions as the original
image, and are pre-registered where applicable. This removes the ambiguity and divergence in
how watermarking systems deal with transformed images, and allows a comparable metric.

– Crop - A symmetrical border region of the image is removed and replaced by black pixels.
Attack strength determines the proportion of the region cropped in relation to the total image
area.
– Rotate (No interpolation / Bi-linear interpolation) - The image is rotated anti-clockwise
through an angle θ about its bottom-left corner. The image is then cropped to the size
of the original image, with a black border applied where necessary. Two variants of the
attack are applied, using the differing interpolation schemes. Attack strength determines the
rotation angle θ.
– Horizontal Shear (No interpolation / Bi-linear interpolation) - The image is sheared
horizontally (to the right) by a proportion x of its width. The image is then cropped to the
size of the original image, with a black border applied where necessary. Two variants of
the attack are applied, using differing interpolation schemes. Attack strength determines the
shear proportion x.
– Scale (No interpolation / Bi-linear interpolation) - The image is uniformly scaled down
to a proportion x of its original size. The image is then rescaled by a factor 1/x. Two
variants of the attack are applied, using the differing interpolation schemes for the down-
scaling operation (no interpolation is applied when up-scaling). Attack strength determines
the scale factor x.

• Transmission Distortion

– Impulse Noise - A random selection of image pixels is set to either the minimum or
maximum intensity, in equal proportions. The signal-to-noise ratio is determined by the attack
strength.
– Gaussian Noise - A Gaussian distributed offset is applied to all pixel intensities. The stan-
dard deviation of the offset is determined by the attack strength.

• Storage Distortion

– JPEG Compression - The image is subjected to lossy compression by applying the quan-
tisation step of the JPEG algorithm. As all other components of JPEG compression are
lossless [10], they will have no effect on detection performance. The attack strength deter-
mines the quantisation factor of the 8 × 8 quantisation matrix applied to the block-wise DCT
of the image, i.e. the JPEG ‘quality specification’.
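To make the attack parameterisation concrete, the two simplest enhancement attacks above can be sketched on a flat list of 8-bit intensities (a simplified stand-in for the benchmark's 2-D image representation):

```python
def brightness_attack(pixels, offset):
    """Brightness adjust: add a constant offset to every 8-bit
    intensity, clamping the result at the extremes 0 and 255."""
    return [max(0, min(255, p + offset)) for p in pixels]

def contrast_attack(pixels, factor):
    """Contrast adjust: multiply every intensity by a constant
    factor, again clamping at the extremes."""
    return [max(0, min(255, round(p * factor))) for p in pixels]
```

The `offset` and `factor` parameters correspond directly to the attack strength x described in the text.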

3.1.5 Experimental Permutations


Summarising the above quantities, the total number of experimental permutations can be calculated for
each functional stage of the benchmarking system using the simple formulae given in Section 2.3.3:

• The quantity of embedding operations and hence watermarked images:


Ne = NI × NK × NM = 28 × 10 × 10 = 2,800

• The quantity of attack operations and hence watermark detections (attack strengths are approximated
to 20 levels for each of the fifteen attack types, giving NA = 15 × 20 = 300):
Nd = Ne × NA = 2,800 × 300 = 840,000

• The total quantity of embed-attack-detect cycles for all watermarking systems:


Nd × 4 = 3,360,000
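These counts can be verified directly; the constants mirror Sections 3.1.1 to 3.1.4:

```python
# Permutation counts from Section 3.1.5, checked directly.
n_images, n_keys, n_messages = 28, 10, 10
n_attacks = 15 * 20            # fifteen attacks, ~20 strength levels each

n_embed = n_images * n_keys * n_messages   # embedding operations
n_detect = n_embed * n_attacks             # attack/detect operations
n_total = n_detect * 4                     # four watermarking systems

assert (n_embed, n_detect, n_total) == (2800, 840000, 3360000)
```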

3.2 Watermarking Systems Under Evaluation


The watermarking systems under evaluation were chosen to reflect a broad range of current techniques,
particularly in their diverse embedding algorithms. Each can be considered a ‘classic’ scheme, and
together they introduced many of the fundamental robustness strategies used in later watermarking systems.
Note that two of the systems are of the ‘informed’ class and two are of the ‘semi-informed’ class, which
allows a secondary evaluation of inter- and intra-class performance. Each algorithm is briefly presented in
the following sections, with a discussion of its robustness strategies. For each system, two figures are
also shown:

• The ‘Lena’ image after having a watermark embedded by the watermarking system.

• A ‘difference’ image of the watermarked and original Lena image, which can be interpreted as
the spatial representation of the complete watermark. It is calculated as the normalised, absolute
difference of the intensities of the two images, where the darker areas have the greater difference.
The normalisation step linearly scales the range of absolute differences per image to the range of
available grey-scale levels.
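The difference-image computation described above can be sketched in one dimension (a pixel list stands in for the 2-D image; the greatest difference maps to black, per the stated convention):

```python
def difference_image(original, watermarked):
    """Normalised absolute difference of two intensity images,
    linearly rescaled so the largest difference maps to black (0)
    and the smallest to white (255)."""
    diffs = [abs(a - b) for a, b in zip(original, watermarked)]
    lo, hi = min(diffs), max(diffs)
    if hi == lo:                       # identical images: all white
        return [255] * len(diffs)
    return [round(255 * (1 - (d - lo) / (hi - lo))) for d in diffs]
```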

3.2.1 Barni System


A semi-informed watermarking system proposed in 1997 by M. Barni, F. Bartolini, V. Cappellini and A.
Piva of the Università di Firenze, Italy [2]. The system was originally proposed for use with monochrome
images, with an implicit extension to colour via embedding in a colour image’s intensity channel.

Figure 3.1: Watermarked and difference images of ‘Lena’ using the Barni System.

• Watermark Generation
The watermark is formatted as a Gaussian-distributed sequence of floating-point numbers of length
X, with zero mean and unit variance. The message is used as a seed for the pseudo-random
generation of this sequence, and the watermarking key acts as a seed for a pseudo-random ‘shuffle’
of the sequence.

• Watermark Embedding
The watermark is embedded into certain coefficients of the full-frame DCT of an image’s intensity
channel using the embedding formula given in Equation 1.6. The DCT coefficient sites are chosen
by a simple formula, i.e. the coefficients are visited in a zigzag order, and the first L are skipped.
The next X coefficients are chosen as the embedding sites. The modified DCT representation is
then inversely transformed to give an intermediate watermarked intensity channel. The original
intensity channel I and intermediate watermarked intensity channel I′ are then added pixel by
pixel, with the contribution from each decided by a local weighting factor βx,y, to produce the

final watermarked intensity channel, I′′:

I′′x,y = Ix,y(1 − βx,y) + I′x,y βx,y        (3.1)

Finally, the watermarked intensity channel I′′ replaces the intensity channel within the original
image to yield the watermarked image. The weighting factor is designed to exploit frequency
masking characteristics of the human visual system [9], i.e. highly textured regions of the image
have a lower sensitivity to noise, so in these regions βx,y tends to 1, and hence I′′x,y ≈ I′x,y. The
authors suggest a simple scheme for deriving βx,y, where I is split into non-overlapping fixed-sized
blocks, and βx,y is defined as the normalised local intensity variance of the block in which
Ix,y is contained.

• Watermark Detection
Although the embedding function is not invertible, given a watermarked, possibly corrupted im-
age, the locations of the embedding sites are known as they follow the simple pattern described
above. It follows that detection requires taking the DCT of the watermarked image’s intensity
channel, and performing linear correlation between watermarked sites and the original, known
watermark. The resulting value is output as the detection value.

• Parameterisation
Embedding parameters recommended by the authors are X = 16,000, L = 16,000 and a block size
of 32. These values were used for benchmarking.

• Robustness Strategy
Many authors claim that embedding a watermark into the frequency domain representation of an
image increases its robustness [17]; arguing that coefficients can be chosen that have the greatest
perceptual significance, and hence cannot be distorted by image processing operations without
destroying the visual content of the image [27]. This argument conflicts with the fidelity re-
quirements of watermarking, as modifying coefficients with perceptual significance will increase
visible artefacts. To overcome this dilemma, frequency domain embedding is usually modelled
using some spread spectrum technique. In terms of communication theory, Cox et al. describe
this as “transmitting a narrowband signal over a much larger bandwidth such that the signal en-
ergy present in any single frequency is imperceptible” [7]. Applied to watermarking, this implies
spreading the watermark over many coefficients so that each coefficient is only modified slightly
and hence almost undetectably. The Barni system uses both of these strategies, arguing that the
low-frequency components (coefficients L + 1 to L + X) are perceptually significant, yet not the
most sensitive to modification (i.e. these would be coefficients 0 to L). As the inverse DCT spreads
the watermark over all pixels within the image, localised perceptual shaping is applied within the
spatial domain. This further step yields a net gain in correlation value. (The block-wise scaling
can clearly be seen in the watermark difference image shown as Figure 3.1.)
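Equation 3.1's spatial weighting can be sketched in one dimension, assuming the normalised block-variance scheme the authors suggest for deriving β (a pixel list and a fixed 1-D block stand in for the 2-D image and block grid):

```python
import statistics

def perceptual_blend(original, intermediate, block_size):
    """Equation 3.1 in 1-D: I'' = I(1 - b) + I'b, where b for each
    pixel is the normalised intensity variance of the fixed-size
    block containing it (textured blocks give b near 1, so the
    watermarked value I' dominates there)."""
    variances = [statistics.pvariance(original[i:i + block_size])
                 for i in range(0, len(original), block_size)]
    v_max = max(variances) or 1.0      # guard against division by zero
    return [p * (1 - variances[i // block_size] / v_max)
            + q * (variances[i // block_size] / v_max)
            for i, (p, q) in enumerate(zip(original, intermediate))]
```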

3.2.2 Corvi System
An informed watermarking system proposed in 1997 by Marco Corvi and Gianluca Nicchiotti of the El-
sag Bailey Research Department, Italy [6]. The system was originally proposed for use with monochrome
images, with an implicit extension to colour via embedding in a colour image’s intensity channel.

Figure 3.2: Watermarked and difference images of ‘Lena’ using the Corvi System.

• Watermark Generation
An identical watermark generation procedure to the Barni scheme is used, yielding a watermark
of length X.

• Watermark Embedding
The intensity channel of an image is decomposed to its Nth-level representation in the wavelet
domain via a DWT, where N is chosen so that the resulting ‘approximation image’ has
approximately X coefficients (see ‘Robustness Strategy’ for a description of this particular wavelet
component). The watermark is then embedded into a deterministic sequence of X approximation
image coefficients using the embedding formula given in Equation 1.6. The modified wavelet
representation is then reconstructed via an inverse DWT, with the resulting intensity channel re-
placing the original within the watermarked image.

• Watermark Detection
The watermarked image’s intensity channel is decomposed to its Nth-level wavelet representation.
The embedding function is invertible (as the original image is available at detection), so the
watermark can be recovered in full from the approximation coefficients using Equation 1.7. The
original and recovered watermarks are then compared using linear correlation. The resulting value
is output as the detection value.

• Parameterisation
Embedding parameters recommended by the authors are X = 1000, and decomposition via Daubechies

six-tap filters. These values were used for benchmarking.

• Robustness Strategy
The Corvi scheme uses the standard spread spectrum model as employed in the Barni system.
However the authors argue that embedding in the wavelet domain allows for a greater embed-
ding strength, as this representation is closely related to the operation of the human visual system
(HVS). Studies ([25], [47]) have shown a correspondence between the hypothetical ‘cortex trans-
form’ of the HVS and the wavelet transform, as both split a signal into individual bands that can
be processed independently [27]. Specifically, a one dimensional wavelet decomposition can be
thought of as splitting a signal into two parts, usually its high and low frequency components. The
low frequency part is then split again into high and low frequency components, and this process
is repeated recursively until a required decomposition level is reached. A two-dimensional decomposition
is further defined as the application of a one-dimensional decomposition, computed
separately for each of the dimensions x and y. This results in a pyramidal representation of an
image (see Figure 3.3), where the ‘approximation image’ used by the Corvi system contains the
low-frequency components of the original image, and the remaining HLN, LHN and HHN sub-bands
contain the horizontal, vertical and diagonal high-frequency components at resolution N.
Therefore, the authors argue that two advantages are gained: the visibility of artefacts caused by
the embedding operation is minimised due to implicit perceptual masking effects introduced
by the wavelet / HVS correlation, and the watermark is robust against image processing
operations as the modified (low-frequency) coefficients are perceptually significant.
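The recursive splitting described above can be sketched in one dimension. Here Haar averaging/differencing filters stand in for the Daubechies six-tap filters actually used by the Corvi system, so this is an illustration of the pyramidal structure only:

```python
def haar_decompose(signal, levels):
    """Pyramidal 1-D decomposition: at each level the signal is split
    into low- and high-frequency halves and the low half is split
    again, mirroring the structure shown in Figure 3.3."""
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])  # high-frequency band
        approx = [(a + b) / 2 for a, b in pairs]         # low-frequency band
    return approx, details
```

After N levels, `approx` plays the role of the approximation image into which the watermark is embedded.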

Figure 3.3: Three-level pyramidal representation of the ‘Lena’ image.


Left - The decomposed image. Right - The decompositional structure.

3.2.3 Cox System
An informed watermarking system proposed in 1996 by Ingemar Cox, Joe Kilian, Tom Leighton and
Talal Shamoon of the NEC Research Institute, Princeton, USA [7]. The system was originally proposed
for use with monochrome images, with an implicit extension to colour via embedding in a colour image’s
intensity channel.

Figure 3.4: Watermarked and difference images of ‘Lena’ using the Cox System.

• Watermark Generation
An identical watermark generation procedure to the Barni scheme is used, yielding a watermark
of length X.

• Watermark Embedding
The embedding operation is similar to the Barni scheme, in that the watermark is embedded into
certain coefficients of the full-frame DCT of an image’s intensity channel, using the embedding
formula given in Equation 1.6. However, the coefficients are chosen using a different strategy,
i.e. the algorithm chooses the X greatest valued coefficients as the embedding sites. The modu-
lated DCT representation is then inversely transformed, and replaces the image’s original intensity
channel to yield the watermarked image. Unlike the Barni scheme, a block-wise weighting is not
applied to the spatial representation of the watermark.

• Watermark Detection
As the original image is available at detection, the positions of the modulated coefficients can be
calculated. Also, the embedding function is invertible, so the watermark can be recovered in full
from the correct DCT coefficients using Equation 1.7. The original and recovered watermarks are
then compared using normalised correlation. The resulting value is output as the detection value.

• Parameterisation
The singular embedding parameter recommended by the authors is X = 1000. This value was

used for benchmarking.

• Robustness Strategy
This scheme extends the concept of embedding into perceptually significant coefficients by calcu-
lating these per image, rather than simply using a set range of frequency bins. The authors suggest
that selecting the X highest valued coefficients has a secondary advantage, as they will be less
sensitive to modification and hence a greater embedding strength can be applied.
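The Cox scheme's two distinctive steps, per-image site selection and normalised-correlation detection, can be sketched as follows (a simplified illustration on a flat coefficient list; in practice the DC coefficient would be excluded from selection):

```python
import math

def embedding_sites(dct_coeffs, x):
    """Indices of the x greatest-magnitude coefficients: the Cox
    scheme's per-image choice of perceptually significant sites."""
    ranked = sorted(range(len(dct_coeffs)),
                    key=lambda i: abs(dct_coeffs[i]), reverse=True)
    return ranked[:x]

def normalised_correlation(w, w_recovered):
    """Detection value: normalised correlation between the original
    and recovered watermark sequences."""
    dot = sum(a * b for a, b in zip(w, w_recovered))
    norm = (math.sqrt(sum(a * a for a in w))
            * math.sqrt(sum(b * b for b in w_recovered)))
    return dot / norm
```

Normalised correlation yields 1.0 for a perfectly recovered (possibly rescaled) watermark and values near 0 for an unrelated sequence.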

3.2.4 Kutter System


A semi-informed watermarking system proposed in 1997 by Martin Kutter, Frederic Jordan and Frank
Bossen of the EPFL Signal Processing Laboratory, Switzerland [22]. This is the only system of those
under test to be explicitly designed for use with colour images. Note also that this system can be trivially
modified to act as a blind watermarking system, however for benchmarking purposes it is used in its
semi-informed guise.

Figure 3.5: Watermarked and difference images of ‘Lena’ using the Kutter System.

• Watermark Generation
The watermark is formatted as a bit string of length X. The message is used as a seed for the
pseudo-random generation of this sequence. An additional bit is added to the front and end of
the sequence to form a final watermark of length X + 2. These front and end ‘signature’ bits are
always set to 0 and 1 respectively. This system is unusual in that the randomisation provided by
the watermarking key is applied during the embedding stage.

• Watermark Embedding
The embedding operation is applied directly to pixel values in the spatial domain and only to the
blue channel of the image. For every pixel a pseudo-random number x is generated in the range
0 . . . 1, where the watermarking key is used as a seed. If the value of x for a particular pixel is

smaller than a global embedding density parameter ρ, then the pixel is used for embedding. ρ
also lies in the range 0 . . . 1, so the expected number of pixels used for embedding is ρ times
the number of pixels in the image. To embed into a pixel, a pseudo-random bit is chosen from the
watermark (again using the watermarking key as a seed), and encoded by modifying the pixel’s
blue channel by a fraction of its luminance:

B′x,y = Bx,y + (2s − 1)Lx,y α        (3.2)

where s is the value of the chosen bit and α is the embedding strength. The luminance of the pixel
Lx,y is calculated as Lx,y = 0.299Rx,y + 0.587Gx,y + 0.114Bx,y . The algorithm is designed to embed
multiple copies of the watermark within the image, i.e. X should be small compared to ρ times
the total number of pixels. Note that as both the selected pixels and bit positions are randomly
chosen, the distribution of the watermark bits is highly irregular.

• Watermark Detection
Each watermarked pixel is visited using the same deterministic sequence as used for embedding.
As the original image is not available at detection, the original value of a pixel’s blue channel is
estimated by taking a combination of the values of its neighbours. The authors suggest a cross-
shaped area of size c should be used. The estimate is calculated thus:

B̂x,y = (1/4c) ( ∑ i=−c..c Bx+i,y + ∑ i=−c..c Bx,y+i − 2Bx,y )        (3.3)

The difference between the value of a watermarked pixel and the estimate of its original value is
averaged over all pixels used to embed bit b:

δ̄b = (1/Nb) ∑ i=1..Nb (Bx,y − B̂x,y)        (3.4)

where Nb is the number of embedding sites for bit b. The value of each δ̄b is then compared
against an adaptive threshold τ to determine the value of watermark bit sb. τ is computed from
the signature bits, which are of known values:

sb = 1 if δ̄b > τ, otherwise sb = 0,   where τ = (δ̄0 + δ̄1)/2        (3.5)

The Hamming distance between the original and recovered bit string is output as the detection
value.

• Parameterisation
The authors suggest a bit-string length X = 32, however this is implicitly intended for the ‘blind’
version of the algorithm. A larger string is needed to achieve an acceptable range and granularity
of Hamming distances for semi-informed detection, therefore strings of length X = 256 were used
for benchmarking. Both the embedding density ρ , and the detection cross-size c were set to the
author recommended values of 0.55 and 3 respectively.

• Robustness Strategy
The human visual system is least sensitive to the blue colour channel [22], hence this algorithm
attempts to mask watermarking artefacts by exclusively embedding into this channel, allowing a
greater embedding strength. The algorithm also makes use of redundancy within its robustness
strategy as a simple form of error correction coding.
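Equations 3.2 and 3.3 can be sketched directly; this is a simplified illustration on small 2-D lists (in practice the modified blue value would be clamped to the valid pixel range, and image borders would need handling in the estimate):

```python
def embed_bit(blue, lum, x, y, s, alpha):
    """Equation 3.2: encode bit s into pixel (x, y) by shifting its
    blue value by a fraction alpha of the pixel's luminance."""
    return blue[y][x] + (2 * s - 1) * lum[y][x] * alpha

def estimate_blue(blue, x, y, c):
    """Equation 3.3: estimate a pixel's original blue value from a
    cross-shaped neighbourhood of size c (borders ignored here).
    The row and column sums each include the centre pixel, hence
    the -2B term; 4c neighbours remain after its removal."""
    row = sum(blue[y][x + i] for i in range(-c, c + 1))
    col = sum(blue[y + i][x] for i in range(-c, c + 1))
    return (row + col - 2 * blue[y][x]) / (4 * c)
```

At detection, the sign of the difference between the watermarked blue value and this estimate votes for the embedded bit, and the votes are averaged over all of a bit's embedding sites per Equation 3.4.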

3.3 Summary
Four diverse watermarking systems have been presented and their robustness strategies analysed. To
evaluate these watermarking systems with the proposed new benchmark, it must be fully parameterised
to be experimentally valid. Sets of keys, message and images have been chosen to produce a good
statistical variation in watermarked images, and the size of these sets has been set at a maximum, whilst
allowing computational feasibility. Fifteen passive attacks have been defined that satisfy a range of
image processing, transmission distortion and storage distortion operations. These attacks are fully
parametric with regard to attack strength, and the range of applied strengths is set to an extreme value to
allow analysis of the ‘breakdown point’ of the evaluated watermarking systems.

Chapter 4

Implementation Commentary

4.1 High-level Design


The modular character of the benchmark design provides a natural blueprint for its implementation. Soft-
ware components are needed for each of the watermark generation algorithms, embedding algorithms,
detection algorithms, attack algorithms and perceptual distance metrics. Further to this, an adminis-
tration system is necessary to configure and automate the execution of experimental permutations, and
record their results. Two high level designs were considered:

• A loosely-coupled system of distinct programs that would be configured into experimental runs
via scripts. The distinct programs would mirror the system components listed above, and the
scripting would automate the experiments. This design would depend on output from each of
the component programs being delivered to the subsequent component either through pipes or
intermediate files (see Figure 4.1).

• A tightly-coupled system based around an experiment driver program that would provide a plug-
in architecture for watermark generation, embedding, detection, perceptual distance metric and
attack modules. The driver program would permit the configuration of a number of experiments
that would be executed completely in memory, and would record and possibly consolidate results
(see Figure 4.2).

The tightly-coupled design was chosen as it presented a number of advantages:

• Performance - The minimal I/O requirements would improve speed due to the large number of
experimental permutations. All system components would only be loaded into memory once, and
no intermediate output would be produced. Further to this, ‘stateful’ software components could

Figure 4.1: Loosely-coupled system design.

Figure 4.2: Tightly-coupled system design.

be used that would ensure minimal re-computation of data. For example, a semi-informed wa-
termarking scheme’s embedding and detection algorithms will generally perform some identical
computation on unwatermarked image data. A stateful watermarking component could cache and
reuse this data more efficiently. Section 4.3 describes further performance enhancements.

• Extensibility - Definition of specific interfaces for the plug-in system components would allow
for a completely generalised driver program, whilst maintaining full flexibility of actual plug-in
component functionality. This would achieve controlled extensibility, which suits the complex
nature of the system; rather than the loosely coupled model which would define no interfaces
other than the structure of the intermediate files.

• Configurability - By design, all system components have to be highly configurable. With over
three million different embed-attack-detect cycles to be performed, having a tightly-coupled sys-
tem would allow a more structured approach to experiment configuration, i.e. the driver program
would act as a central point of experiment management, and assist in permutation generation.
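The actual driver was written in Java using its reflective API (see Section 4.2.2); the plug-in contract itself can be sketched language-neutrally. Below is a minimal Python illustration with a hypothetical `Attack` interface, showing how the driver iterates plug-ins and strengths entirely in memory:

```python
from abc import ABC, abstractmethod

class Attack(ABC):
    """Plug-in interface: the driver only ever sees this contract,
    so new attack modules can be added without changing the driver."""
    @abstractmethod
    def apply(self, image, strength): ...

class BrightnessAttack(Attack):
    """Example plug-in: brightness adjust with clamping (Section 3.1.4)."""
    def apply(self, image, strength):
        return [max(0, min(255, p + strength)) for p in image]

def run_experiments(attacks, image, strengths):
    """Driver fragment: every attack at every strength, executed
    in memory with no intermediate files."""
    return {(type(a).__name__, s): a.apply(image, s)
            for a in attacks for s in strengths}
```

Equivalent interfaces exist for the generation, embedding, detection and perceptual-distance-metric components.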

4.2 Implementation Technologies
4.2.1 Development Requirements
Implementing the benchmark framework and watermarking systems presented challenges due to the
diverse requirements placed on the chosen programming language and execution environment. Further to this,
being a research project rather than a conventional piece of software engineering, implementation needed
to be focused on what was to be done, rather than how it was to be done. Ideally a platform was
required that provided a high-level paradigm for the driver program, allowing straightforward handling
of large amounts of highly structured data, file-handling and configuration management. However, the
platform needed to also provide facilities for high performance, complex numerical computing, and
preferably have available libraries for Fourier analysis, wavelet analysis, and image processing. Finally,
the platform should provide high-level facilities for the plug-in architecture of the system.

4.2.2 Java
Java’s rapid development style would suit the complex nature of the problem, as would its large standard
library which includes configuration management and image processing capability. Additionally, Java’s
garbage-collection features would aid developmental focus on functionality rather than memory man-
agement. Java also features a ‘reflective’ API, which allows program components to be loaded at runtime
and their structure analysed. Combined with its object-oriented model, this presents an elegant solution
to plug-in functionality. A major drawback is that Java is not suited to intensive numerical processing,
as its interpreted execution and array bounds checking lead to relatively slow performance. Extensive
research was undertaken to find suitable numerical Java libraries; however, this search was unsuccessful.
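The reflective loading mechanism described above can be sketched in a few lines. The class name would in practice come from experiment configuration; a standard library class stands in for a real plug-in class here.

```java
/** Minimal sketch of reflective plug-in loading via Java's reflection API. */
public class PluginLoader {
    /** Loads and instantiates a class by name, assuming a no-argument constructor. */
    public static Object load(String className) {
        try {
            Class<?> cls = Class.forName(className);           // locate the class at runtime
            return cls.getDeclaredConstructor().newInstance(); // instantiate it reflectively
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Could not load plug-in: " + className, e);
        }
    }
    public static void main(String[] args) {
        // A standard library class stands in for a real plug-in class name here.
        Object plugin = load("java.util.ArrayList");
        System.out.println(plugin.getClass().getName()); // java.util.ArrayList
    }
}
```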

4.2.3 C
C’s speed and numerical computing pedigree offer an attractive solution to the processing components
of the system. Several Fourier analysis, wavelet analysis, and image processing C libraries are available
that could be used within the system. However, due to its low-level characteristics, using C for other
areas of the system would equate to a longer and more complex implementation.

4.2.4 Matlab
Matlab is becoming a de-facto standard for research-focused numerical computing and provides libraries
for Fourier analysis, wavelet analysis, and image processing. A key advantage of utilising Matlab would
be that reference Matlab implementations of both the SSIM and S-CIELAB perceptual distance metrics
are available. However, being more suited to monolithic design, employing Matlab for the whole system
would present challenges in implementing the plug-in architecture.


Figure 4.3: Ideal hybrid system design.

4.2.5 A Hybrid Solution


It is clear that using only one of these platforms would mean compromising some aspect of the design,
accepting slower performance, or extending development time. Therefore a hybrid solution was sought
that could leverage the strengths of each platform whilst maintaining the integrity of the design. The
loosely-coupled model would present no challenges in doing this, as each component is a separate program
and the format of the intermediate data files could resolve any differences between development platforms.
In the tightly-coupled design, runtime binary compatibility is necessary between each system component.
Ideally, the architecture illustrated in Figure 4.3 would be employed. Using this model, the system would
be implemented essentially in Java, with the intensive numerical processing delegated to either the
FFTW1 C library for computation of two-dimensional DCTs, or Matlab for wavelet decomposition
and perceptual distance metric computation. The attack modules would be implemented in Java, as
they represent only a relatively small amount of computation.
The realisation of this model required extensive research into integration technologies. In particular,
it was apparent that no existing technology could facilitate invoking a Matlab function from a Java
program.2 So a solution was developed that utilises external interface technologies from both platforms
to bridge the gap ‘half-way’ from both sides:

• Java’s Native Interface (JNI) extension allows the invocation of a method from a Java class to
be handled directly by a platform-dependent shared library, e.g. a shared object file (.so) for
1 FFTW is a high performance C library for “computing the discrete Fourier transform (DFT) in one or more dimensions,
of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms
or DCT/DST)” - http://www.fftw.org
2 Matlab can be configured to have a JVM running within it; however, a Java method can only be invoked from a Matlab
function - not the other way around. Also, some Java-to-Matlab adapters do exist, but they are for very old versions of both
systems.


Figure 4.4: Final hybrid system design with external integration.

GNU/Linux or a dynamic link library (.dll) for Microsoft Windows. Facilities include moving
data between the Java Virtual Machine and the memory space of the library. Using this, a binary
adapter can be built between a Java class and any C library.

• Matlab’s MCC Compiler compiles inbuilt or user-defined Matlab functions into a platform-
dependent C library. This library then utilises (re-distributable) runtime components of the Matlab
engine to provide an API for moving data in and out of the Matlab workspace, and linkage for
invoking the compiled inbuilt or user-defined functions. Using this library, a binary adapter can
be built to invoke a Matlab function from any C library.

Together, these two external interface technologies can be harnessed to produce the necessary binary
compatibility between the software components. (Relative to the processing time involved, the overhead
introduced by using these adapters is negligible.) The final test-bed design, including external integration,
is illustrated in Figure 4.4. Note that, excluding the perceptual distance metrics, all software components
were built from scratch, including the watermarking systems, which were implemented directly from
the original cited papers.

4.3 Performance Issues
Although performance (in terms of execution speed) was not an explicit requirement of the system,
it remained an implicit requirement throughout all stages of the design and implementation. A large
emphasis on numerical processing, combined with large data sets (an 800×600 24-bit image will require
approximately 11 MB of memory when represented in double-precision floating-point format) and a large
number of experimental permutations meant that optimisation was necessary at all levels of the design.
In addition to the delegation of computationally intensive tasks to either C libraries or Matlab, performance
was enhanced via the following:

• Caching - It was recognised that each experimental permutation re-computed much of the data
from other experiments. The shallower in the functional nesting this data was computed, the more
more it could be re-used. For example, the Cox watermarking system [7] requires the locations of
the one thousand largest DCT coefficients of the original image for both embedding and detection.
Following the first calculation of these positions for a certain image, the positional data could
be stored in a look-up table and re-used for all future operations on the image by the relevant
algorithms. This notion of pre-computation and caching was employed in all embedding and
detection modules, attack modules, and FFTW/Matlab adapter functions to efficiently re-use any
appropriate data.

• Heuristic Algorithms - A performance bottleneck detected in the system was the determination,
via the perceptual distance metrics, of the optimum embedding strength for each experimental
permutation. Initially this used a brute-force approach; a dynamic algorithm was later introduced
to improve the search time using heuristics based on previous results.

• Parallelisation - Even with the above optimisations, the projected running time for the experi-
ments was approximately one thousand hours (based on an AMD Athlon 1300 workstation with
Fedora Core 2 Linux). However, the problem space can be partitioned and the experiments paral-
lelised to deliver a linear gain in performance. This was realised by distributing the experiments
over one hundred and twelve School of Computing Linux workstations, where each node concur-
rently processed a single image with a single watermarking system, whilst utilising the full set of
messages, keys, attacks and attack strengths. The implementation of the parallelisation was built
into the experiment driver module, and administered by remote command execution over SSH.
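The partitioning arithmetic works out neatly: four watermarking systems across twenty-eight benchmark images yields exactly one hundred and twelve (system, image) tasks, one per workstation. A hypothetical sketch of this partitioning (illustrative names only; the real driver dispatched the jobs over SSH):

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the experiment partitioning: one (system, image) pair per node. */
public class Partitioner {
    public static List<String> tasks(List<String> images, List<String> systems) {
        List<String> out = new ArrayList<>();
        for (String sys : systems)
            for (String img : images)
                out.add(sys + ":" + img); // each entry is one workstation's workload
        return out;
    }
    public static void main(String[] args) {
        List<String> images = new ArrayList<>();
        for (int i = 1; i <= 28; i++) images.add("image" + i);
        List<String> systems = List.of("barni", "corvi", "cox", "kutter");
        // 4 systems x 28 images = 112 tasks, one per available workstation.
        System.out.println(tasks(images, systems).size()); // 112
    }
}
```

Each node then iterates its single task over the full set of messages, keys, attacks and attack strengths, as described above.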

4.4 Additional Driver Modules


The plug-in architecture of the software allowed the processing components to be utilised with additional
driver modules. The most important of these was a GUI-based driver (see Figure 4.5) that provided in-
teractive facilities for watermark generation, embedding and detection. Additionally, attacks could be
performed on watermarked images, fidelity measurements could be taken using both perceptual distance
metrics, and ‘difference images’ could be calculated to show watermark spatial representations. The

tool allowed for every parameter of the components to be adjusted, and used Java’s reflective API to ex-
amine and dynamically configure the GUI controls per watermarking system, attack and fidelity metric.
This tool was used extensively for component testing, and provided the framework for undertaking the
perceptual distance metric calibration experiments, as discussed in Section 2.4.3.

Figure 4.5: GUI based calibration tool.

4.5 Summary
Several implementation designs were evaluated, each of which matched the modular character of the
benchmark design. The tightly-coupled design was chosen as it offered advantages of performance
through minimal I/O, extensibility through a plug-in architecture and configurability through a central
experimental driver module.
The requirements of the implementation technology were diverse and no individual programming
language or platform could satisfy these without compromising some aspect of the design. Therefore a
hybrid solution was sought that enabled runtime compatibility between components written in
incompatible languages and running in heterogeneous execution environments. To provide this solution,
runtime adapters were written in C that utilised the external APIs of Java and Matlab to perform the
necessary linkage and data interpretation.
Performance was an implicit requirement of the implementation due to the large number of experimental
permutations and the computational complexity of the various system components. Performance was
enhanced through a combination of optimised processing via caching, heuristic algorithms and paral-
lelisation over a large number of workstations.
All software components (excluding the perceptual distance metrics) were built from scratch. The
watermark generation, embedding and detection modules were implemented directly from the original
cited papers.

Chapter 5

Benchmarking Results and Evaluation

5.1 Results Analysis


The following sections present the results of the benchmarking experiments. For each attack, a graph is
shown which summarises the benchmarking results, together with an example attacked image.

5.1.1 Brightness Adjust

[Graph: detection value against intensity offset (−255 to 255) for the Barni, Corvi, Cox and Kutter systems.]

Figure 5.1: ‘Brightness Adjust’ attack results and attacked ‘rose’ image (Intensity offset = 153).

See Figure 5.1. Both the Cox and Barni systems embed in the DCT domain of an image, but do
not use the DC coefficient as an embedding site. As only the DC term will be affected by a uniform

intensity offset, this provides excellent robustness. The Kutter system’s adaptive threshold serves a sim-
ilar purpose, in removing the average intensity of pixels before correlation. The Corvi system attempts
no specific intensity normalisation, hence performs poorly. With either a positive or negative intensity
offset, there is the potential for pixels to be clamped at their extreme values. As the offset becomes large,
many pixels may become clamped, and hence the statistical properties of the image change non-linearly.
In this case the Kutter system performs well, as its redundant embedding strategy increases the probabil-
ity that ‘unclamped’ watermarked sites are available to reconstruct the watermark correctly. Conversely,
as the other three systems spread one copy of the watermark over every pixel in the image, clamped
pixel values will consistently affect detection performance.
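The clamping effect described above can be illustrated with a small sketch (assuming 8-bit pixel values; this is illustrative code, not the benchmark's actual attack module):

```java
/** Sketch of a uniform intensity offset with 8-bit clamping. */
public class BrightnessAdjust {
    public static int[] apply(int[] pixels, int offset) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++)
            out[i] = Math.min(255, Math.max(0, pixels[i] + offset)); // clamp to [0, 255]
        return out;
    }
    public static void main(String[] args) {
        int[] px = {0, 100, 200, 250};
        // With a +60 offset the two brightest pixels clamp at 255, so the mean
        // shifts by less than 60: the image statistics change non-linearly.
        System.out.println(java.util.Arrays.toString(apply(px, 60))); // [60, 160, 255, 255]
    }
}
```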

5.1.2 Contrast Adjust

[Graph: detection value against intensity scale (0 to 2) for the Barni, Corvi, Cox and Kutter systems.]

Figure 5.2: ‘Contrast Adjust’ attack results and attacked ‘rose’ image (Intensity scale = 0.5).

See Figure 5.2. When contrast is increased (contrast scale > 1), pixel values may be clamped as
discussed with the ‘brightness adjust’ attack, hence detection values follow a similar pattern. However
decreasing contrast has no clamping effect, and detection values remain high for the Barni, Cox and
Kutter systems. These high values can be explained as both the Barni and Cox systems normalise the
watermark and embedding site vectors before correlation, hence scaled DCT spectra have no effect
on detection performance. Similarly, the Kutter system’s adaptive threshold serves to normalise over the
range of intensity differences. The Cox system has an extra dependency, i.e. the positions and ordering
of the greatest one thousand DCT coefficients must remain stationary; clearly this dependency holds
with a linearly scaled spectrum. The Corvi system again performs poorly as it makes no attempt at
intensity normalisation.
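The contrast invariance attributed to normalisation can be verified with a small sketch; this is illustrative code rather than the Barni or Cox systems' actual detectors:

```java
/** Sketch showing why vector normalisation before correlation makes
 *  detection invariant to a uniform contrast scaling of the coefficients. */
public class NormCorr {
    public static double correlate(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / Math.sqrt(na * nb); // normalised correlation in [-1, 1]
    }
    public static void main(String[] args) {
        double[] w = {0.3, -1.2, 0.8, 2.0};       // watermark / site values
        double[] scaled = new double[w.length];
        for (int i = 0; i < w.length; i++) scaled[i] = 0.5 * w[i]; // contrast halved
        // The common scale factor cancels in the normalisation, so both values
        // are 1.0 (up to floating-point rounding).
        System.out.println(correlate(w, w));
        System.out.println(correlate(w, scaled));
    }
}
```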

5.1.3 Mean Blur / Gaussian Blur

[Graphs: detection value against mean filter size N×N (top) and against number of 5×5 filter passes (bottom) for the Barni, Corvi, Cox and Kutter systems.]

Figure 5.3: Top - ‘Mean Blur’ attack results and attacked ‘rose’ image (Filter size = 11×11). Bottom -
‘Gaussian Blur’ attack results and attacked ‘rose’ image (Filter passes = 8).

See Figure 5.3. The high-frequency Kutter watermark performs the poorest, as much of the water-
mark information will be lost to the mean blur operation. The mean blur acts similarly to the low-pass
filters used within the wavelet decomposition of the Corvi system; therefore embedding into approx-
imation sub-band coefficients derived from such a low-pass filter leads to high detection values. The
inflection present within the Barni results may be due to the edge conditions of the mean blur operation,
i.e. as the filter size grows, more edge pixels cannot be convolved with a full size filter, hence low to
mid-frequency coefficients are distorted to a lesser extent.
The Gaussian blur presents differing results to that of the mean blur. This may be explained by the
differing statistical properties introduced by applying a constant sized filter over multiple passes, rather
than applying a varying filter size over a single pass. Clearly the Barni and Kutter systems are less erratic

and have higher detection values. Again, this may be due to the edge conditions being less prevalent
with a 5×5 filter.

5.1.4 Sharpen

[Graph: detection value against unsharpen mask scale (1 to 10) for the Barni, Corvi, Cox and Kutter systems.]

Figure 5.4: ‘Sharpen’ attack results and attacked ‘rose’ image (Unsharpen mask scale = 10).

See Figure 5.4. The Kutter system maintains strong detection values as it creates a spatial watermark
with a high frequency pattern (see Figure 3.5); hence a sharpening operation will enhance these features
and effectively increase detection value. The Barni system maintains equally strong detection values as
it embeds exclusively into low to mid-range DCT coefficients, which will be unaffected by the attack. A
possible explanation of the Cox system’s relatively poor performance is that the sharpening operation
permutes the positions of the greatest valued DCT coefficients; hence leading to correlation of non-
corresponding site and watermark values. The Corvi system shows unexpectedly poor performance, as
one would assume that the low-frequency, wavelet approximation image into which it embeds would be
unaffected by high-frequency changes.

5.1.5 Crop
See Figure 5.5. The crop operation results in the most divergent performance of the watermarking sys-
tems. The Kutter system performs well due to its spatially embedded, highly redundant watermark; this
will clearly be invariant to large areas of removal. The large difference in performance between the
two DCT embedding algorithms (Barni and Cox), is again possibly due to the crop operation permut-
ing the positions of the greatest valued coefficients within the Cox scheme. The Corvi system shows
unexpectedly poor performance, as intuitively, the non-cropped areas should remain unaffected in their
corresponding wavelet domain approximation coefficients.

[Graph: detection value against removed border area (5% to 85%) for the Barni, Corvi, Cox and Kutter systems.]

Figure 5.5: ‘Crop’ attack results and attacked ‘rose’ image (Removed area = 75%).

5.1.6 Rotate
See Figure 5.6. Clearly the rotate attack has a large impact on watermark detection. Considering the
spatial embedding of the Kutter system, site positions will be transposed, leading to correlation with
values from incorrect sites. Rotating an image will also distort its frequency spectrum, which severely
impacts the Cox and Barni systems. The wavelet domain approximation sub-band of a rotated image
will be rotated to the same degree, hence the Corvi system suffers a similar problem to the Kutter system,
that of transposed embedding sites.
The interpolation scheme had little effect on detection values, hence only the graph for ‘rotation with
bi-linear interpolation’ is shown.

[Graph: detection value against angle of rotation (0.25° to 4.25°) for the Barni, Corvi, Cox and Kutter systems.]

Figure 5.6: ‘Rotate with bi-linear interpolation’ attack results and attacked ‘rose’ image (Rotation = 5°).

5.1.7 Horizontal Shear

[Graph: detection value against shear proportion (0.3% to 5.7%) for the Barni, Corvi, Cox and Kutter systems.]

Figure 5.7: ‘Horizontal shear with bi-linear interpolation’ attack results and attacked ‘rose’ image (Shear
proportion = 6%).

See Figure 5.7. The horizontal shear operation is comparable to that of rotation, however as the
pixels are only shifted horizontally there is a greater opportunity for embedding sites to be in the correct
positions (in both the spatial and transform domains). For example, this can be seen in the results of
the Kutter system, which embeds across scan-lines of an image. When the shear proportion is small, a
number of scan-lines at the top of the image are not affected by the operation, hence the embedding sites
are in the correct positions and the detection value across the entire image is increased.
The interpolation scheme had little effect on detection values, hence only the graph for ‘horizontal
shear with bi-linear interpolation’ is shown.

5.1.8 Scale
See Figure 5.8. In terms of perceptibility, the scale attack arguably affects an image the most severely; however
all watermarking systems maintain strong detection values. The Kutter system performs unexpectedly
well, as intuitively, one would expect its high-frequency watermark to be adversely distorted by down-
sampling. However, its redundant embedding strategy may provide this robustness as some sites will
remain unaffected by the attack (i.e. the nearest-neighbour interpolation used for the image rescaling
operation will select correct corresponding site positions in both the down-sampled and up-sampled
image). The transform domain embedding systems will clearly suffer from Nyquist limit 1 aliasing intro-
duced by the re-sampling of the image. This can be seen particularly in the Cox results, as Nyquist limit
artefacts may permute the positions of the highest valued DCT coefficients within an image’s spectrum.
Each watermarking scheme exhibits an increase in detection value when the image is scaled by 50%.
1 The Nyquist limit states that a sampled signal can only represent frequencies at or below half the sampling rate. See [39].

[Graph: detection value against size reduction (5% to 95%) for the Barni, Corvi, Cox and Kutter systems.]

Figure 5.8: ‘Scale with bi-linear interpolation’ attack results and attacked ‘rose’ image (Size reduction
= 80%).

This is caused by the nature of the attack, rather than any inherent robustness to this particular value,
i.e. when scaling an image back to its original size, the resulting image may not be perfectly registered
with the original image. This is due to the integer arithmetic of the nearest-neighbour interpolated re-
sampling (e.g. a 512×512 image reduced by 80% and then rescaled by 500% will result in a 510×510
image). The scale attack was implemented to reduce this effect by performing further registration of the
original and rescaled image. However, over the range of all images, the reduce 50% / increase 200%
operation resulted in the most accurate registration performance, yielding the slight peak in the results.
The down-sampling interpolation scheme had little effect on detection values, hence only the graph
for ‘scale with bi-linear interpolation’ is shown.
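The integer size arithmetic described above can be checked directly with a short sketch (illustrative, not the attack module's actual implementation):

```java
/** Sketch of the integer size arithmetic behind the scale attack's
 *  registration problem when an image is reduced and then rescaled. */
public class ScaleRoundTrip {
    public static int roundTrip(int size, double reduction) {
        int down = (int) (size * (1.0 - reduction));        // truncating integer size
        int up   = (int) (down * (1.0 / (1.0 - reduction)));
        return up;
    }
    public static void main(String[] args) {
        // 512 reduced by 80% gives 102 pixels; rescaling by 500% gives 510, not 512.
        System.out.println(roundTrip(512, 0.80)); // 510
        // The 50% / 200% case happens to round-trip exactly, matching the
        // registration peak observed in the results.
        System.out.println(roundTrip(512, 0.50)); // 512
    }
}
```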

5.1.9 Impulse Noise / Gaussian Noise


See Figure 5.9. Only the Kutter system withstands a significant amount of impulse noise; showing a
gradual, linear fall in detection value. This linear trend reflects the strength of the system’s embedding
density parameter, which controls the ratio of modified (watermarked) pixels. Clearly as the ratio of
impulse noise affected pixels increases, the ratio of attacked embedding sites increases proportionally.
The transform domain systems exhibit similar trends, however the results are poorer than expected as
each use low-frequency coefficient embedding sites. The Corvi system has unexpectedly poor
results as, theoretically, the high-frequency noise should not affect the approximation image at a low
level of wavelet decomposition. Note that although it is traditional to measure impulse noise in terms
of signal-to-noise ratio, the non-logarithmic ‘ratio of modified pixels’ scale is used here to highlight the
linear trend of the Kutter system.
The Gaussian noise results follow a very similar trend to that of the impulse noise attack, however,
the overall performance of each of the watermarking systems is increased. It is possible that this is due to the

less severe nature of the attack, i.e. although every pixel in the image is affected by Gaussian noise, the
probability that a pixel’s value is clamped because of the noise is much less than that of impulse noise
(in which every affected pixel is clamped).

[Graphs: detection value against ratio of modified pixels (1% to 19%, top) and against noise standard deviation (5 to 95, bottom) for the Barni, Corvi, Cox and Kutter systems.]

Figure 5.9: Top - ‘Impulse Noise’ attack results and attacked ‘rose’ image (Modified ratio = 19%).
Bottom - ‘Gaussian Noise’ attack results and attacked ‘rose’ image (Noise Std. Dev. = 95).

5.1.10 JPEG Compression


See Figure 5.10. Many authors treat robustness to lossy compression as the most important property
of practical watermarking systems [10]. Particularly, robustness to the JPEG compression algorithm is
a desirable property due to its ubiquitous use in software applications. However, as Cox et al. [10]
discuss, many authors have recognised a fundamental conflict between lossy compression and water-
marking. Analogous to watermarking, lossy compression deals with issues of fidelity, i.e. the com-
pression algorithm will aim to produce a compressed image that is perceptually identical to the original

[Graph: detection value against JPEG quality specification (95% down to 5%) for the Barni, Corvi, Cox and Kutter systems.]

Figure 5.10: ‘JPEG Compression’ attack results and attacked ‘rose’ image (JPEG Quality = 5%).

image. Conceptually therefore, two perceptually identical images would ideally compress to an identical
bit-wise representation. Herein lies the conflict, as one of the two perceptually identical images may
contain a watermark. Hence, if lossy compression were indeed optimal, watermark recovery would be
impossible unless the watermarked and original images were perceptually distinct (which, of course,
conflicts with the fidelity requirements of watermarking).
As can be seen from the results, the JPEG compression algorithm is far from optimal in this sense,
leaving much redundant information for the watermark to exploit. The Kutter system performs the
poorest, as its high-frequency, spatial watermark is quantised heavily by the JPEG algorithm. The three
transform domain algorithms maintain strong detection values; clearly selecting the most perceptually
significant coefficients as embedding sites is a highly robust strategy, as the JPEG algorithm quantises
the perceptually insignificant coefficients of an image.

5.2 Results Discussion


The results show that geometric transformation attacks present the greatest difficulty in achieving
watermarking robustness. This is widely acknowledged by many authors and is an active area of current
research [23]. The problem is so acute that geometric transformations form the basis of many active
attacks [33]. (It can be seen from the results that even a small, unnoticeable distortion of this type can
render the watermark undetectable.) For the remaining image processing, transmission distortion and
storage distortion attacks, the watermarking systems show sufficient robustness to allow confidence in
a high binary detection threshold. Note that the strengths of the attacks used in the benchmarking
experiments were generally much greater than a watermark would be expected to withstand in a practical
application (i.e. the attacked images were distorted beyond the threshold of ‘usability’).
The Kutter system is generally the most robust throughout the experiments, suggesting that redun-

dant embedding and exploitation of the human visual system’s insensitivity to the blue colour channel
are effective strategies. However, the high-frequency spatial nature of the watermark leaves the system
weak to any attack that modifies the high-frequency components of an image (e.g. mean blur, JPEG
compression).
The Barni system is the most robust of the transform domain embedding schemes. Its strategy of
embedding into low to mid-frequency intensity channel DCT coefficients is highly robust to JPEG com-
pression, which as discussed in Section 5.1.10 is an important property. The Barni system is also the
only transform domain embedding scheme to perform additional spatial perceptual shaping of the water-
mark. The authors argue that this simple step increases net detection values by allowing a higher average
embedding strength [2]. It is unclear what effect this had on the results; however, a useful exercise
would be to repeat the Barni experiments without this post-processing step to assess its contribution. An
additional positive feature of the Barni results is the generally linear or near-linear decrease of its detec-
tion values, which would be advantageous when evaluating the system against any application-centric
requirements.
The Cox and Corvi systems generally perform the poorest in all of the experiments. This is an
unexpected result, as initially their robustness strategies appear astute. The Cox system’s selection of
embedding sites appears flawed in its dependency on actual image data. Rather than adding a layer of
robustness, this actually appears to add a layer of fragility to the algorithm. The Corvi system suffers
from a poor detection algorithm which is sensitive to both uniform brightness and contrast change,
however its poor performance in other attacks appears to conflict with the known properties of wavelet
decomposition and reconstruction. The cause of this is unclear, and further investigation would be
necessary to offer an explanation.
A further unexpected result is that both the Kutter and Barni schemes are of the semi-informed
class. Many authors argue that informed watermarking systems are more robust as they allow invertible
operations on image data [23]. This is shown not to be the case with the above results, however the
argument does appear valid and is widely accepted. Therefore, the Cox and Corvi algorithms appear not
to be representative of the achievable robustness of informed watermarking systems.

5.3 Benchmark Evaluation


The previous sections analyse the results of the benchmarking experiments and evaluate the watermark-
ing systems within the context of these results. However, a further evaluation is required of the bench-
mark framework itself, to review its strengths and limitations, and specifically what effect these issues
had on the results. This evaluation is presented in terms of three criteria: the scientific validity of the
benchmark, a comparison to previous work, and factors that affect the accuracy of the results.

5.3.1 How scientifically valid is the benchmark?


Throughout this report it has been shown that watermarking and watermarking performance measurement
have complex inter- and intra-dependencies that make benchmarking a difficult task. In view of

this, the scientific validity of the benchmarking model is very important, as it must allow confidence in
the decoupling of these dependencies and the creation of comparable results.
The main argument for the validity of the new benchmark is its adherence to experimental method.
Specifically, the conflicting capacity-fidelity-robustness property space is simplified by fixing the ca-
pacity and fidelity elements to constant values. This allows the manipulation of a single independent
variable (attack strength), and the measurement of a single dependant variable (detection value). All
other variables are controlled, and the design allows for the results of multiple experiments to be com-
bined in a statistically sound manner (i.e. via the ‘local’ normalisation of detection values discussed in
Section 2.3.5).
A weakness in scientific validity is the method used for calibration of the perceptual distance metrics.
The actual accuracy of the metrics is discussed in Section 5.3.2, however the accuracy of calibration
could potentially magnify any errors in fidelity measurement. The method used for the calibration was
constrained by the time available; however, it was performed as accurately as possible. Specifically, the
calibration exercise did not include important external parameters such as lighting conditions, display
device dependencies or visual angle. Additionally, with such a subjective exercise, appropriate psycho-
physical experimental technique should be used. These factors were beyond the remit of this project,
and the effects of their absence are difficult to quantify within the benchmarking results. However, they
are acknowledged as a potential weakness within the design.

5.3.2 How accurate are the results?


Although the benchmarking model has been judged to be scientifically valid within its remit, this does
not automatically guarantee accurate results. Two main factors affect results accuracy: the size of the
data sets and the accuracy of the perceptual distance metrics.
The data set size is important in that a sufficiently large number of tests must be performed to allow a representative, average performance of the watermarking systems to be inferred. In terms of experimental permutations, as discussed in Section 3.1, the number of detection values averaged to produce one benchmarking data point is set to a computationally feasible maximum. Specifically, two thousand eight hundred detection values are averaged to produce each data point of the graphs presented in the previous sections, consisting of twenty-eight images, each embedded with one hundred different watermarks generated from ten keys and ten messages. The ‘local’ normalisation of the detection values (discussed in Section 2.3.5) is performed before averaging; conceptually, the averaging operation therefore serves only to deduce ‘typical’ performance, rather than suppressing the effects of a priori low detection values on the results. For this reason, two thousand eight hundred detection values is judged to be a sufficiently large set.
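The permutation structure described above can be expressed directly. The loop organisation below is illustrative; only the counts come from the text.

```python
from itertools import product

N_IMAGES, N_KEYS, N_MESSAGES = 28, 10, 10   # counts from Section 3.1

# One detection value per (image, key, message) permutation;
# each benchmark data point is the average of all of them.
permutations = list(product(range(N_IMAGES),
                            range(N_KEYS),
                            range(N_MESSAGES)))

print(len(permutations))  # 2800 detection values per data point
```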
Above all else, the accuracy of the perceptual distance metrics will decide the accuracy of the benchmark: its design depends on watermarks being embedded with a constant fidelity, independent of the watermarking system, image, key or message being used. Note that the metrics’ accuracy in predicting actual human observation is less important than their consistency across all experimental permutations, i.e. each metric should be monotonic with human observation. If the perceptual distance metrics are inaccurate, then watermarks will be embedded with a higher or lower strength than necessary, and hence exhibit experimentally invalid robustness properties.

[Figure 5.11: Embedding strengths calculated by the perceptual distance metrics when embedding one hundred different watermarks into the ‘Lena’ image. The plot shows embedding strength (0 to 1) against watermark index, with S-CIELAB and SSSIM curves for each of the Barni, Corvi, Cox and Kutter systems.]
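A constant-fidelity embedding of this kind can be implemented as a one-dimensional search over embedding strength. The sketch below uses bisection and assumes, as the text requires, that the perceptual distance metric increases monotonically with strength; `embed` and `distance` are hypothetical callables, not the report's actual interfaces.

```python
def strength_for_fidelity(embed, distance, image, watermark,
                          target, lo=0.0, hi=1.0, iters=40):
    """Bisect for the embedding strength at which the perceptual
    distance between the original and watermarked image reaches the
    fixed fidelity target used across all permutations."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if distance(image, embed(image, watermark, mid)) < target:
            lo = mid   # watermark too faint: increase strength
        else:
            hi = mid   # watermark too visible: decrease strength
    return (lo + hi) / 2.0
```

If the metric is not monotonic, the search can settle on a strength that only coincidentally meets the target, which is exactly the failure mode the text warns about.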
The perceptual distance metrics’ accuracy could be checked manually, but this would require eleven thousand two hundred images to be compared for perceptual similarity. Clearly this is impractical, and it would suffer the same problems of subjective experimentation as the metrics’ calibration. Therefore, any
accuracy evaluation must be inferred from the benchmark data itself. To aid this, the embedding strength
calculated by each metric for each embedding operation was recorded as benchmarking meta-data. Fig-
ure 5.11 shows the embedding strengths derived by each metric when embedding into the ‘Lena’ image.
Intuitively, the embedding strength should change relatively little when only varying the key or message,
as it is the properties of the image that affect watermark perceptibility more than the relatively low en-
ergy of the watermark itself. The values shown in Figure 5.11 correlate with this supposition, showing
that for each watermarking system, embedding strength remained relatively constant. Therefore the met-
rics can be judged to be consistent for all watermarks embedded into a particular image by a particular
watermarking system.
Any further evaluation is difficult, as embedding strengths will intrinsically differ between images, and hence any direct comparison is invalid. As discussed in Section 2.4.2.3, the embedding strengths calculated by the two metrics are combined with equal weighting in an attempt to reduce the effects of inaccurate values. On this premise, a simple observation can be made: for a particular watermarking system, image and watermark, a large difference between the calculated embedding strengths indicates an inaccurate result from one of the perceptual distance metrics. Although this is a somewhat speculative measure of the metrics’ accuracy, Figure 5.11 does show encouraging results, i.e. the perceptual distance metrics generally agree in their calculation of embedding strength for a particular watermarking system.
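The equal weighting of Section 2.4.2.3, together with the disagreement check suggested above, can be sketched as follows. The 50% relative-disagreement threshold is an arbitrary illustrative choice, not a value from the report.

```python
def combine_strengths(s_scielab, s_sssim, rel_tolerance=0.5):
    """Equally weight the embedding strengths derived from the two
    perceptual distance metrics, and flag cases where they differ
    by more than rel_tolerance of their mean -- a hint that one
    metric has produced an inaccurate value."""
    combined = 0.5 * (s_scielab + s_sssim)
    suspect = abs(s_scielab - s_sssim) > rel_tolerance * combined
    return combined, suspect
```

Recording the `suspect` flag as benchmarking meta-data would allow disagreeing permutations to be inspected, rather than silently averaged in.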
Measuring the consistency of the perceptual distance metrics between watermarking systems is not
possible using the above method, as their respective embedding strength values are not directly com-
parable, and normalisation methods would require research beyond the remit of this report. Therefore
the above discussion gives weight to the accuracy of the results for individual watermarking systems,
but accuracy of the relative performance of the watermarking systems is unqualified, and would require
extensive calibration and experimental evaluation of the perceptual distance metrics.

5.3.3 How does the benchmark compare to previous work?


The benchmark presents a number of advantages and disadvantages over previous work. As discussed
throughout this report, many watermarking schemes and watermark benchmarking systems either ignore
or simplify the properties of colour images and colour perception. Therefore a major contribution of this
benchmarking system is its explicit treatment of colour, specifically within the problems of fidelity
measurement and the solution it provides via the hybrid perceptual distance metric. This is a novel
approach, and the results of the benchmarking experiments highlight the importance of this topic, i.e. the
most robust watermarking system exploits colour perception within its robustness strategy. Additionally,
the benchmark’s native handling of colour image perception provides an advantage in terms of designing
watermarking systems. Specifically, variants of watermarking systems that did not previously exploit colour perception could be tested. For example, the many systems that implicitly embed into an image’s intensity channel could be modified and tested when embedding into other channels and, more interestingly, into other colour spaces.
It has been shown that the constraints imposed on the benchmark have aided its accuracy; however, these have also limited its range of application. The existing benchmarking systems and frameworks
discussed in Section 2.2 are general with regard to watermarking system class, and provide many more
metrics with which to measure properties other than robustness. In this respect the new benchmark
does not compare favourably. However, it could be argued that completely generalised benchmark-
ing, although theoretically important, may not be of any greater practical use than application specific
benchmarking. Specifically, until watermarking techniques allow the decoupling of the capacity-fidelity-
robustness property space, systems will be optimised to only one of these dimensions. Therefore, spe-
cific performance measurement of this optimisation will remain an important application.
In Section 2.2.2 the benchmarking framework proposed by Solachidis et al. [41] is presented as a
model for the design of the new benchmark. The new benchmark implements a subset of their proposals,

however some metrics applicable to robustness measurement are not implemented. Specifically, information regarding the false-positive performance of a watermarking system is helpful when interpreting any robustness evaluation. The new benchmark deals only with true-positive performance; however, in practical circumstances a watermarking system could exhibit artificially high robustness through false-positive detection. The absence of this metric therefore does not invalidate the benchmark within its remit, but false-positive measurement would be a useful and complementary feature.
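Such a false-positive metric could be estimated empirically, for example by running the detector on unwatermarked images with randomly drawn keys. This is a sketch under assumptions: `detect(image, key)` is a hypothetical detector interface returning a detection value, and the trial count is illustrative.

```python
import random

def false_positive_rate(detect, images, threshold,
                        trials=1000, seed=0):
    """Fraction of trials in which the detector reports a value
    above threshold on an image that carries no watermark."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        image = rng.choice(images)    # unwatermarked image
        key = rng.getrandbits(64)     # key never used to embed
        if detect(image, key) > threshold:
            hits += 1
    return hits / trials
```

A system with a non-trivial false-positive rate would show artificially high robustness in the true-positive experiments, which is why this metric complements rather than replaces them.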

5.4 Further Work


Based on the presented results and the above evaluation, there are many aspects of this work that could
be extended, in both the context of watermarking systems and watermark benchmarking:

• Additional watermarking systems could be evaluated, to give a more detailed picture of successful
robustness strategies.

• Colour perception effects on robustness could be investigated by modifying existing watermarking systems to embed into different colour channels or colour spaces, and systematically comparing performance. The benchmarking system could be used to perform these experiments scientifically.

• Specific results from the experiments discussed in Section 5.1 do not match expected observations. The causes of these discrepancies could be investigated more extensively, particularly the poor overall performance of the Corvi system.

• An evaluation of the new benchmarking system in comparison to Checkmark [28] and Stirmark [23] could be performed by repeating the benchmarking experiments with these two systems. Analysis of the resulting benchmark data could add quantitative evidence to the arguments posed within this research.

• The range of attacks could be broadened to include further passive attacks. One of particular relevance would be simulation of the recent JPEG-2000 lossy compression standard [36]. This standard has implications for watermarking systems, as many assume ‘normal’ JPEG (i.e. the ISO/IEC 10918-1 standard) will be applied and design their robustness strategies accordingly. JPEG-2000 is based on wavelet-domain rather than DCT-domain compression, so such strategies may be invalidated.

• Active attacks could be included within the benchmarking design, to enable the definition of
robustness to include watermark ‘security’.

• Metrics to evaluate the false-positive performance of watermarking systems would be a very useful
extension to the current scheme, as discussed in Section 5.3.3.

Perhaps the largest area of extension is within the topic of fidelity measurement. As discussed in Section 2.4, this is a very active research area, and the perceptual distance metrics used within this benchmark represent only a small proportion of current techniques. Improving existing metrics or developing new ones presents a difficult challenge, as it would involve detailed research into human psycho-visual perception and the design of methods for extensive subjective experimentation; an undergraduate project may not be suited to such a multi-disciplinary topic. However, a systematic review of current techniques would be a useful complement to this work, through evaluation of the metrics’ applicability to fidelity measurement within benchmarking. Additionally, as discussed in the previous section, the calibration of the current metrics was rudimentary and presents wide scope for improvement. In particular, the development of quantitative methods for evaluating the accuracy of the current metrics would strengthen confidence in the benchmarking results.

Chapter 6

Conclusion

Digital watermarking is an important new technology within the field of image copyright protection. To succeed it must satisfy many requirements, of which the ability of a watermark to be robust against image distortions is fundamental. However, watermark robustness is a complex property, and its measurement requires a scientific method of benchmarking. This work has aimed to address both of these issues, and from it a number of conclusions can be drawn.
It has been shown that benchmarking of watermarking systems is a difficult task due to the inter-
dependence of the properties that are to be measured, and those that are to be held constant. Several
existing benchmarks have been discussed, and shown to take a generalised approach, allowing flexibil-
ity in terms of the classes of watermarking system that can be evaluated and the performance metrics
available. Recent studies of these benchmarks have highlighted weaknesses in their testing models; in particular, they do not acknowledge the statistical importance of using multiple watermarking keys and messages, and do not explore the impact of human colour perception on embedding strength (and hence robustness).
A new benchmarking strategy has been proposed that builds on this previous work and has been shown to address the problems identified. By constraining the problem space this benchmark is less flexible, but more scientifically valid for the watermarking system classes under evaluation. The importance of an accurate measure of fidelity has been highlighted, and an innovative solution employed which models both high- and low-level characteristics of the human visual system. This solution explicitly models psycho-visual colour perception, allowing the benchmark to evaluate watermarking systems with full-colour test images. As watermarking systems can exploit masking effects of human colour perception to enhance robustness, this is an important advancement in benchmarking accuracy, specifically with regard to performance comparison.
It has been shown that watermarking systems adopt diverse strategies to optimise their robustness to

image distortions. The results of the benchmarking experiments show that spatially redundant embed-
ding and exploitation of the human visual system’s insensitivity to the blue colour channel are effective
techniques. Additionally, there is strong evidence that targeting image content that will have a relatively
low probability of being distorted is also successful (specifically, embedding into perceptually significant
low- to mid-frequency DCT coefficients). An unexpected result has been highlighted: the watermarking systems found to be the most robust are of the semi-informed class. Traditionally, these are accepted to be less robust than informed schemes; these results are therefore encouraging for watermarking applications, as semi-informed systems have a wider and more practical scope of use.
This work’s findings have shown that present watermarking techniques would satisfy many of the
requirements of ‘real-world’ copyright protection scenarios, in showing robustness to diverse image
distortions. However, further optimisation of the trade-off between fidelity and robustness is crucial
to their evolution, as invisibility will remain a primary application requirement. As it has also been shown that an accurate fidelity measurement is an essential element of benchmarking, it is clear that advances in understanding of the human visual system are fundamental both to the development of watermarking systems and to methods of measuring their performance.

Bibliography

[1] Ross J. Anderson and Fabien A. P. Petitcolas. On the limits of steganography. IEEE Journal of
Selected Areas in Communications, 16(4):474–481, 1998.

[2] M. Barni, F. Bartolini, V. Cappellini, and A. Piva. Robust watermarking of still images for copy-
right protection. In Proc. 13th Inter. Conf. Digital Signal Processing, volume 2, pages 499–502,
1997.

[3] Laurence Boney, Ahmed H. Tewfik, and Khaled N. Hamdy. Digital watermarks for audio signals.
In International Conference on Multimedia Computing and Systems, pages 473–480, 1996.

[4] J. Brassil, S. Low, N. Maxemchuk, and L. O’Gorman. Electronic marking and identification techniques to discourage document copying. In Proceedings of IEEE Infocom ’94, pages 1278–1287, 1994.

[5] Digimarc Corporation. Corporate website. http://www.digimarc.com/.

[6] Marco Corvi and Gianluca Nicchiotti. Wavelet-based image watermarking for copyright protec-
tion. In Scandinavian Conference on Image Analysis SCIA ’97, 1997.

[7] Ingemar Cox, Joe Kilian, Tom Leighton, and Talal Shamoon. Secure spread spectrum watermark-
ing for multimedia. IEEE Transactions on Image Processing, 6(12):1673–1687, 1997.

[8] Ingemar J. Cox and Jean-Paul M. G. Linnartz. Some general methods for tampering with water-
marks. IEEE Journal on Selected Areas in Communications, 16(4):587–593, 1998.

[9] Ingemar J. Cox and Matt L. Miller. A review of watermarking and the importance of perceptual
modeling. In Proc. of Electronic Imaging ’97, 1997.

[10] Ingemar J. Cox, Matthew L. Miller, and Jeffrey A. Bloom. Digital Watermarking. Morgan Kauf-
mann Publishers, 2002.

[11] Scott Craver, Nasir D. Memon, Boon-Lock Yeo, and Minerva M. Yeung. Can invisible watermarks
resolve rightful ownerships? In Storage and Retrieval for Image and Video Databases (SPIE),
pages 310–321, 1997.

[12] David J. Fleet and David J. Heeger. Embedding invisible information in color images. In IEEE
Signal Processing Society 1997 International Conference on Image Processing (ICIP’97), 1997.

[13] J. Fridrich. Applications of data hiding in digital images. In Proceedings of the ISPACS Conference,
1998.

[14] Bernd Girod. What’s wrong with mean-squared error? Digital Images and Human Vision, pages
207–220, 1993.

[15] D.M. Green and J.A. Swets. Signal Detection Theory and Psychophysics. Krieger Publishing Co.,
1974.

[16] F. Hartung and B. Girod. Digital watermarking of raw and compressed video. In Proc. European
EOS/SPIE Symposium on Advanced Imaging and Network Technologies, 1996.

[17] Frank Hartung and Martin Kutter. Multimedia watermarking techniques. Proceedings of the IEEE
(USA), 87(7):1079–1107, 1999.

[18] Corbis Inc. Corporate website. http://pro.corbis.com/.

[19] Getty Images Inc. Corporate website. http://creative.gettyimages.com/source/home/home.aspx.

[20] E. Koch and J. Zhao. Towards robust and hidden image copyright labeling. In Proc. of 1995 IEEE
Workshop on Nonlinear Signal and Image Processing, pages 452–455, 1995.

[21] N. Komatsu and H. Tominaga. Authentication system using concealed images in telematics. Mem-
oirs of the School of Science and Engineering, Waseda University, 52:45–60, 1988.

[22] Martin Kutter, Frederic Jordan, and Frank Bossen. Digital signature of color images using ampli-
tude modulation. In Proc. SPIE Storage and Retrieval for Image and Video Databases, volume
3022, pages 518–526, 1997.

[23] Martin Kutter and Fabien A. P. Petitcolas. A fair benchmark for image watermarking systems. In
Proc. SPIE Security and Watermarking of Multimedia Contents, pages 226–239, 1999.

[24] J. Linnartz, A. Kalker, G. Depovere, and R. Beuker. A reliability model for detection of electronic
watermarks in digital images. In Proc. Benelux Symposium on Communication Theory, Enschede,
pages 202–208, 1997.

[25] Ross Martin and Douglas Cochran. Generalized wavelet transforms and the cortex transform. In
Proceedings of the 28th Asilomar Conference on Signals, Systems and Computers, 1994.

[26] A. Mayache, T. Eude, and H. Cherifi. A comparison of image quality models and metrics based on human visual sensitivity. In Proceedings of the IEEE International Conference on Image Processing, ICIP ’98, pages 409–413, 1998.

[27] Peter Meerwald. Digital Image Watermarking in the Wavelet Transform Domain. PhD thesis,
University of Salzburg, 2001.

[28] Peter Meerwald and Shelby Pereira. Attacks, applications and evaluation of known watermarking
algorithms with checkmark. In Proceedings of SPIE, Electronic Imaging, Security and Watermark-
ing of Multimedia Contents IV, 2002.

[29] S. Mohanty. Digital watermarking: A tutorial review. http://citeseer.ist.psu.edu/mohanty99digital.html.

[30] N. Nikolaidis, V. Solachidis, A. Tefas, V. Arguriou, and I. Pitas. Benchmarking of still image
watermarking methods: Principles and state of the art. In Proc. of Electronic Imaging and the
Visual Arts 2002 (EVA2002), 2002.

[31] International Commission on Illumination. Recommendations on uniform color spaces, color difference equations, psychometric color terms. Supplement No. 2 to CIE Publication No. 15 (E-1.3.1) 1971/(TC-1.3), 1978.

[32] D. Pearson. Viewer response to time-varying video quality. In Proceedings of the SPIE - Human
Vision and Electronic Imaging, pages 16–25, 1999.

[33] Fabien A.P. Petitcolas, Ross J. Anderson, and Markus G. Kuhn. Attacks on copyright marking
systems. In Information Hiding, pages 218–238, 1998.

[34] Fernando Pérez-González and Juan R. Hernández. A tutorial on digital watermarking. In Proc. of the 33rd IEEE Annual Carnahan Conference on Security Technology, pages 286–292, 1999.

[35] Lintian Qiao and Klara Nahrstedt. Watermarking schemes and protocols for protecting rightful ownership and customer’s rights. Journal of Visual Communication and Image Representation, 9:194–210, 1998.

[36] Majid Rabbani and Rajan Joshi. An overview of the JPEG2000 still image compression standard. Signal Processing: Image Communication, 17(1), 2001.

[37] Tomas Sander. Golden times for digital rights management? http://citeseer.ist.psu.
edu/489047.html. InterTrust Technologies.

[38] Elisa Sayrol and Josep Vidal. Optimum watermark detection in color images. In Proc. IEEE
International Conference on Image Processing, ICIP99, 1999.

[39] C. E. Shannon. Communication in the presence of noise (reprint). Proceedings of the IEEE,
86:447–457, 1998.

[40] G. Sharma and H. J. Trussell. Digital color imaging. IEEE Transactions on Image Processing,
6(7):901–932, 1997.

[41] V. Solachidis, A. Tefas, N. Nikolaidis, S. Tsekeridou, A. Nikolaidis, and I. Pitas. A benchmarking protocol for watermarking methods. In Proc. of 2001 IEEE Int. Conf. on Image Processing (ICIP ’01), pages 1023–1026, 2001.

[42] J. P. Stern, G. Hachez, F. Koeune, and J. J. Quisquater. Robust object watermarking: Application
to code. In Proceedings of Info Hiding ’99, 1999.

[43] Christian J. van den Branden Lambrecht and Joyce E. Farrell. Perceptual quality metric for digitally
coded color images. In Proceedings of EUSIPCO ’96, pages 1175–1178, 1996.

[44] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error
visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.

[45] Zhou Wang, A.C. Bovik, and Ligang Lu. Why is image quality assessment so difficult? In IEEE
International Conference on Acoustics, Speech, and Signal Processing, 2002, pages 3313–3316,
2002.

[46] A. B. Watson. DCT quantization matrices visually optimized for individual images. In Human Vision, Visual Processing and Digital Display IV, Proc. SPIE, pages 202–216, 1993.

[47] Andrew B. Watson. The cortex transform: rapid computation of simulated neural images. Com-
puter Vision, Graphics and Image Processing, 39(3):311–327, 1987.

[48] X. Zhang and B. Wandell. A spatial extension of CIELAB for digital color image reproduction. In Proc. Soc. Inform. Display 96 Digest, pages 731–734, 1996.

Appendix A

Test Images

The following images were used within the benchmarking experiments, and are taken from the
database maintained by Fabien A. P. Petitcolas (an author of the Stirmark [23] benchmarking system).

Localised corrosion on an electropolished Al-Zn-Mg-Cu alloy. Copyright photo courtesy of Gerald Deshais, Department of Materials Science & Metallurgy, University of Cambridge. Original size: 572 x 392 pixels.

Arctic Hare. Copyright photo courtesy of Robert E. Barber, Barber Nature Photography. Original size: 594 x 400 pixels.

Baboon. Courtesy of the Signal and Image Processing Institute at the University of Southern California. Original size: 512 x 512 pixels.

Bandon beach. Copyright photo courtesy of Robert E. Barber, Barber Nature Photography. Original size: 610 x 403 pixels.

Black Bear. Copyright photo courtesy of Robert E. Barber, Barber Nature Photography. Original size: 394 x 600 pixels.

Brandy rose. Copyright photo courtesy of Toni Lankerd, 18347 Woodland Ridge Dr. Apt #7, Spring Lake, MI 49456, U.S.A. Original size: 418 x 600 pixels.

F15. Copyright photo courtesy of Toni Lankerd, 18347 Woodland Ridge Dr. Apt #7, Spring Lake, MI 49456, U.S.A. Original size: 732 x 500 pixels.

F16. Courtesy of the Signal and Image Processing Institute at the University of Southern California. Original size: 512 x 512 pixels.

Fishing boat. Courtesy of the Signal and Image Processing Institute at the University of Southern California. Original size: 512 x 512 pixels.

Fourviere Cathedral, north wall. F. A. P. Petitcolas. Original size: 512 x 619 pixels.

Kid. Copyright photo courtesy of Karel de Gendre. Original size: 487 x 703 pixels.

Lena. Courtesy of the Signal and Image Processing Institute at the University of Southern California. Original size: 512 x 512 pixels.

Loch Ness. Copyright photo courtesy of Patrick Loo, University of Cambridge. Original size: 841 x 559 pixels.

Intergranular Stress Corrosion Cracking of an Al-Zn-Mg-Cu alloy. Gerald Deshais, Department of Materials Science & Metallurgy, University of Cambridge. Original size: 600 x 496 pixels.

New-York. Copyright photo courtesy of Patrick Loo, University of Cambridge. Original size: 842 x 571 pixels.

Opera House of Lyon. F. A. P. Petitcolas. Original size: 695 x 586 pixels.

Paper machine. Copyright photo courtesy of Karel de Gendre. Original size: 800 x 529 pixels.

Pentagon. Courtesy of the Signal and Image Processing Institute at the University of Southern California. Original size: 503 x 503 pixels.

Peppers. Courtesy of the Signal and Image Processing Institute at the University of Southern California. Original size: 512 x 512 pixels.

Pills. Copyright photo courtesy of Karel de Gendre. Original size: 800 x 519 pixels.

Pueblo Bonito. Copyright photo courtesy of Robert E. Barber, Barber Nature Photography. Original size: 403 x 610 pixels.

Skyline Arch. Copyright photo courtesy of Robert E. Barber, Barber Nature Photography. Original size: 400 x 594 pixels.

Fontaine des Terreaux. Copyright photo courtesy of Eric Laboure. Original size: 768 x 512 pixels.

USC texture mosaic #1. Courtesy of the Signal and Image Processing Institute at the University of Southern California. Original size: 512 x 512 pixels.

Pocket Watch on a Gold Chain. Copyright image courtesy of Kevin Odhner. Original size: 420 x 315 pixels.

Always running, never the same.... Copyright image courtesy of Jaime Vives Piqueres. Original size: 650 x 488 pixels.

Waterfall. Copyright image courtesy of Sascha Ledinsky. Original size: 800 x 600 pixels.

Wildflowers. Copyright photo courtesy of Robert E. Barber, Barber Nature Photography. Original size: 594 x 400 pixels.

