information theory
Accepted habilitation thesis
for the attainment of the venia legendi
in the field of Theoretical Physics
Braunschweig
21 May 2003
Foreword
1 Introduction 9
1.1 What is quantum information? . . . . . . . . . . . . . . . . . . . . . 9
1.2 Tasks of quantum information . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Experimental realizations . . . . . . . . . . . . . . . . . . . . . . . . 13
I Fundamentals 17
2 Basic concepts 18
2.1 Systems, States and Effects . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.1 Operator algebras . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.2 Quantum mechanics . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.3 Classical probability . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.4 Observables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Composite systems and entangled states . . . . . . . . . . . . . . . . 22
2.2.1 Tensor products . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.2 Compound and hybrid systems . . . . . . . . . . . . . . . . . 23
2.2.3 Correlations and entanglement . . . . . . . . . . . . . . . . . 24
2.2.4 Bell inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.1 Completely positive maps . . . . . . . . . . . . . . . . . . . . 26
2.3.2 The Stinespring theorem . . . . . . . . . . . . . . . . . . . . . 27
2.3.3 The duality lemma . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Separability criteria and positive maps . . . . . . . . . . . . . . . . . 29
2.4.1 Positivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4.2 The partial transpose . . . . . . . . . . . . . . . . . . . . . . 30
2.4.3 The reduction criterion . . . . . . . . . . . . . . . . . . . . . 31
3 Basic examples 32
3.1 Entanglement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.1.1 Maximally entangled states . . . . . . . . . . . . . . . . . . . 32
3.1.2 Werner states . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.3 Isotropic states . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.1.4 OO-invariant states . . . . . . . . . . . . . . . . . . . . . . . 35
3.1.5 PPT states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.1.6 Multipartite states . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2 Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.1 Quantum channels . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.2 Channels under symmetry . . . . . . . . . . . . . . . . . . . . 40
3.2.3 Classical channels . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.4 Observables and preparations . . . . . . . . . . . . . . . . . . 42
3.2.5 Instruments and parameter dependent operations . . . . . . . 43
3.2.6 LOCC and separable channels . . . . . . . . . . . . . . . . . 45
3.3 Quantum mechanics in phase space . . . . . . . . . . . . . . . . . . . 46
3.3.1 Weyl operators and the CCR . . . . . . . . . . . . . . . . . . 46
3.3.2 Gaussian states . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.3 Entangled Gaussians . . . . . . . . . . . . . . . . . . . . . . . 48
3.3.4 Gaussian channels . . . . . . . . . . . . . . . . . . . . . . . . 50
4 Basic tasks 52
4.1 Teleportation and dense coding . . . . . . . . . . . . . . . . . . . . . 52
4.1.1 Impossible machines revisited: Classical teleportation . . . . 52
4.1.2 Entanglement enhanced teleportation . . . . . . . . . . . . . 52
4.1.3 Dense coding . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2 Estimating and copying . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2.1 Quantum state estimation . . . . . . . . . . . . . . . . . . . . 55
4.2.2 Approximate cloning . . . . . . . . . . . . . . . . . . . . . . . 56
4.3 Distillation of entanglement . . . . . . . . . . . . . . . . . . . . . . . 57
4.3.1 Distillation of pairs of qubits . . . . . . . . . . . . . . . . . . 58
4.3.2 Distillation of isotropic states . . . . . . . . . . . . . . . . . . 59
4.3.3 Bound entangled states . . . . . . . . . . . . . . . . . . . . . 60
4.4 Quantum error correction . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4.1 The theory of Knill and Laflamme . . . . . . . . . . . . . . . 61
4.4.2 Graph codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.5 Quantum computing . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.5.1 The network model of classical computing . . . . . . . . . . . 66
4.5.2 Computational complexity . . . . . . . . . . . . . . . . . . . . 67
4.5.3 Reversible computing . . . . . . . . . . . . . . . . . . . . . . 68
4.5.4 The network model of a quantum computer . . . . . . . . . . 69
4.5.5 Simon’s problem . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.6 Quantum cryptography . . . . . . . . . . . . . . . . . . . . . . . . . 72
5 Entanglement measures 75
5.1 General properties and definitions . . . . . . . . . . . . . . . . . . . 75
5.1.1 Axiomatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.1.2 Pure states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1.3 Entanglement measures for mixed states . . . . . . . . . . . . 78
5.2 Two qubits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2.1 Pure states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.2.2 EOF for Bell diagonal states . . . . . . . . . . . . . . . . . . 81
5.2.3 Wootters formula . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.2.4 Relative entropy for Bell diagonal states . . . . . . . . . . . . 83
5.3 Entanglement measures under symmetry . . . . . . . . . . . . . . . . 83
5.3.1 Entanglement of Formation . . . . . . . . . . . . . . . . . . . 84
5.3.2 Werner states . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.3.3 Isotropic states . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.3.4 OO-invariant states . . . . . . . . . . . . . . . . . . . . . . . 86
5.3.5 Relative Entropy of Entanglement . . . . . . . . . . . . . . . 87
6 Channel capacity 90
6.1 Definition and elementary properties . . . . . . . . . . . . . . . . . . 90
6.1.1 The definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.1.2 Elementary properties . . . . . . . . . . . . . . . . . . . . . . 92
6.1.3 Relations to entanglement measures . . . . . . . . . . . . . . 95
6.2 Coding theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.2.1 Shannon’s theorem . . . . . . . . . . . . . . . . . . . . . . . . 95
6.2.2 The classical capacity of a quantum channel . . . . . . . . . . 96
6.2.3 Entanglement assisted capacity . . . . . . . . . . . . . . . . . 97
6.2.4 The quantum capacity . . . . . . . . . . . . . . . . . . . . . . 98
6.2.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
II Advanced topics 103
7 Continuity of the quantum capacity 104
7.1 Discrete to continuous error model . . . . . . . . . . . . . . . . . . . 104
7.2 Coding by random graphs . . . . . . . . . . . . . . . . . . . . . . . . 105
7.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.3.1 Correcting small errors . . . . . . . . . . . . . . . . . . . . . . 107
7.3.2 Estimating capacity from finite coding solutions . . . . . . . 108
7.3.3 Error exponents . . . . . . . . . . . . . . . . . . . . . . . . . 110
7.3.4 Capacity with finite error allowed . . . . . . . . . . . . . . . . 111
11 Purification 157
11.1 Statement of the problem . . . . . . . . . . . . . . . . . . . . . . . . 157
11.1.1 Figures of Merit . . . . . . . . . . . . . . . . . . . . . . . . . 157
11.1.2 The optimal purifier . . . . . . . . . . . . . . . . . . . . . . . 158
11.2 Calculating fidelities . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
11.2.1 Decomposition of states . . . . . . . . . . . . . . . . . . . . . 159
11.2.2 The one qubit fidelity . . . . . . . . . . . . . . . . . . . . . . 160
11.2.3 The all qubit fidelity . . . . . . . . . . . . . . . . . . . . . . . 162
11.3 Solution of the optimization problems . . . . . . . . . . . . . . . . . 163
11.4 Asymptotic behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
11.4.1 The one particle test . . . . . . . . . . . . . . . . . . . . . . . 165
11.4.2 The many particle test . . . . . . . . . . . . . . . . . . . . . . 168
of data; and even if there are losses they are well understood and it is known how
to deal with them. However, quantum information theory breaks with this point of
view. It studies, loosely speaking, that kind of information (“quantum information”)
which is transmitted by micro particles from a preparation device (sender) to a
measuring apparatus (receiver) in a quantum mechanical experiment – in other
words the distinction between carriers of classical and quantum information becomes
essential. This approach is justified by the observation that a lossless conversion of
quantum information into classical information is in the above sense not possible.
Therefore, quantum information is a new kind of information.
In order to explain why there is no way from quantum to classical information
and back, let us discuss what such a conversion would look like. To convert quantum
to classical information we need a device which takes quantum systems as input
and produces classical information as output – this is nothing other than a measuring
apparatus. The converse translation from classical to quantum information can be
rephrased similarly as “parameter dependent preparation”, i.e. the classical input to
such a device is used to control the state (and possibly the type of system) in which
the micro particles should be prepared. A combination of these two elements can be
done in two ways. Let us first consider a device which goes from classical to quantum
to classical information. This is a possible task and in fact technically realized
already. A typical example is the transmission of classical information via an optical
fiber. The information transmitted through the fiber is carried by micro particles
(photons) and is therefore quantum information (in the sense of our preliminary
definition). To send classical information we first have to prepare photons in a
certain state, send them through the channel, and measure an appropriate observable
at the output side. This is exactly the combination of a classical → quantum with
a quantum → classical device just described.
The crucial point is now that the converse composition – performing the
measurement M first and the preparation P afterwards (cf. Figure 1.1) – is more
problematic. Such a process is called classical teleportation if the particles produced by
P are “indistinguishable” from the input systems. We will show the impossibility
of such a device via a hierarchy of other “impossible machines” which traces the
problem back to the fundamental structure of quantum mechanics. This finally will
prove our statement that quantum information is a new kind of information.¹
Figure 1.1: Schematic representation of classical teleportation. Here and in the fol-
lowing diagrams a curly arrow stands for quantum systems and a straight one for
the flow of classical information.
¹ This concerns in particular the construction of Bell’s telephone from a joint measurement,
which we have omitted here.
Figure 1.2: A teleportation process should not affect the results of a statistical
experiment with quantum systems. A more precise explanation of the diagram is
given in the text.
in between; cf. Figure 1.2. In both cases we should get the same distribution of
measuring results for a large number of repetitions of the corresponding experiment.
This requirement should hold for any preparation P′ and any measurement M′,
but for fixed M and P. The latter means that we are not allowed to use a priori
knowledge about P′ or M′ to adapt the teleportation process (otherwise, in the most
extreme case, we could always choose P′ for P and the whole discussion would become
meaningless).
The second impossible machine we have to consider is a quantum copying ma-
chine. This is a device C which takes one quantum system p as input and produces
two systems p1 , p2 of the same type as output. The limiting condition on C is that
p1 and p2 are indistinguishable from the input, where “indistinguishable” has to be
understood in the same way as above: Any statistical experiment performed with
one of the output particles (i.e. always with p1 or always with p2 ) yields the same
result as applied directly to the input p. To get such a device from teleportation
is easy: We just have to perform an M measurement on p, make two copies of the
classical data obtained, and run the preparation P on each of them; cf. Figure 1.3.
Hence, if teleportation is possible, copying is possible as well.
According to the “no-cloning theorem” of Wootters and Zurek [239], however, a
quantum copying machine does not exist, and this basically concludes our proof.
Nevertheless, we will give an easy argument for this theorem in terms of a third
impossible machine – a joint measuring device MAB for two arbitrary observables A and B.

Figure 1.4: Constructing a joint measurement for the observables A and B from a
quantum copying machine.
This is a measuring apparatus which produces each time it is invoked a pair (a, b)
of classical outputs, where a is a possible output of A and b a possible output of
B. The crucial requirement for MAB again is of statistical nature: The statistics of
the a outcomes is the same as for device A, and similarly for B. It is known from
elementary quantum mechanics that many quantum observables are not jointly
measurable in this way. The most famous examples are position and momentum or
different components of angular momentum. Nevertheless a device MAB could be
constructed for arbitrary A and B from a quantum copy machine C. We simply have
to operate with C on the input system p producing two outputs p1 and p2 and to
perform an A measurement on p1 and a B measurement on p2 ; cf. Figure 1.4. Since
the outputs p1, p2 are, by assumption, indistinguishable from the input p, the overall
device constructed this way would give a joint measurement for A and B. Hence a
quantum copying machine cannot exist, as stated by the no-cloning theorem. This
in turn implies that classical teleportation is impossible, and therefore we cannot
transform quantum information losslessly into classical information and back. This
concludes our chain of arguments.
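The chain of arguments rests on the fact that a unitary cloner would have to preserve inner products. This can be made concrete in a few lines; the following numerical sketch (our illustration, not part of the original text, with arbitrarily chosen states) checks that cloning would force every overlap x = ⟨ψ, φ⟩ to satisfy x = x², which fails for states that are neither identical nor orthogonal.

```python
import numpy as np

# Sketch of the inner-product argument behind the no-cloning theorem.
# A hypothetical cloner U would act as U(psi ⊗ e) = psi ⊗ psi for every
# state psi, with a fixed "blank" ancilla state e.  Unitaries preserve
# inner products, so for two states psi, phi with overlap x = <psi|phi>:
#   inputs:  <psi ⊗ e, phi ⊗ e> = x * <e|e> = x
#   outputs: <psi ⊗ psi, phi ⊗ phi> = x * x
# Hence x = x**2, which forces x = 0 or x = 1.

psi = np.array([1.0, 0.0])                  # |0>
phi = np.array([1.0, 1.0]) / np.sqrt(2.0)   # |+>: neither equal nor orthogonal

x = np.vdot(psi, phi).real                  # overlap, here 1/sqrt(2)
print(abs(x - x**2) > 1e-9)                 # True: no unitary cloner exists
```

Since |0⟩ and |+⟩ have overlap 1/√2, the constraint x = x² is violated, mirroring the no-cloning theorem.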
1.2 Tasks of quantum information
So we have seen that quantum information is something new, but what can we do
with it? There are three answers to this question which we want to present here.
First of all let us remark that in fact all information in a modern data processing
environment is carried by micro particles (e.g. electrons or photons). Hence quantum
information comes automatically into play. Currently it is safe to ignore this and
to use classical information theory to describe all relevant processes. If the size of
the structures on a typical circuit decreases below a certain limit, however, this is
no longer true and quantum information will become relevant.
This leads us to the second answer. Although it is far too early to say which
concrete technologies will emerge from quantum information in the future, several
interesting proposals show that devices based on quantum information can solve
certain practical tasks much better than classical ones. The most well known and
exciting one is, without a doubt, quantum computing. The basic idea is, roughly
speaking, that a quantum computer can operate not only on one number per reg-
ister but on superpositions of numbers. This possibility leads to an “exponential
speedup” for some computations, which makes problems feasible that are considered
intractable for any classical algorithm. This is most impressively demonstrated
by Shor’s factoring algorithm [192, 193]. A second example which is quite close
to a concrete practical realization (i.e. outside the laboratory; see next Section) is
quantum cryptography. The fact that it is impossible to perform a quantum me-
chanical measurement without disturbing the state of the measured system is used
here for the secure transmission of a cryptographic key (i.e. each eavesdropping
attempt can be detected with certainty). Together with a subsequent application
of a classical encryption method known as the “one-time pad”, this leads to a
cryptographic scheme with provable security – in contrast to currently used public key
systems whose security relies on possibly doubtful assumptions about (pseudo) ran-
dom number generators and prime numbers. We will come back to both subjects –
quantum computing and quantum cryptography in Sections 4.5 and 4.6.
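The one-time pad mentioned above is simple enough to sketch. In this toy example (ours, not the author's; `secrets.token_bytes` merely stands in for a key that quantum key distribution would supply), encryption and decryption are the same XOR operation; the scheme is provably secure when the key is random, as long as the message, and never reused.

```python
import secrets

def one_time_pad(message: bytes, key: bytes) -> bytes:
    """XOR message with a key of equal length; applying it twice decrypts."""
    assert len(key) == len(message)
    return bytes(m ^ k for m, k in zip(message, key))

message = b"quantum"
key = secrets.token_bytes(len(message))       # stand-in for a QKD-shared secret
cipher = one_time_pad(message, key)
assert one_time_pad(cipher, key) == message   # XOR is its own inverse
```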
The third answer to the above question is of more fundamental nature. The dis-
cussion of questions from information theory in the context of quantum mechanics
leads to a deeper and in many cases more quantitative understanding of quantum
theory. Maybe the most relevant example for this statement is the study of
entanglement, i.e. non-classical correlations between quantum systems, which leads to
violations of Bell inequalities². Entanglement is a fundamental aspect of quantum
mechanics and demonstrates the differences between quantum and classical physics
in the most drastic way – this can be seen from Bell-type experiments, like the
one of Aspect et al. [11], and the discussion they triggered. Nevertheless, for a long
time it was considered only an exotic feature of the foundations of quantum mechanics
which is not so relevant from a practical point of view. Since quantum information
attained broader interest, however, this has changed completely. It has turned out
that entanglement is an essential resource whenever classical information process-
ing is outperformed by quantum devices. One of the most remarkable examples is
the experimental realization of “entanglement enhanced” teleportation [33, 31]. We
have argued in Section 1.1 that classical teleportation, i.e. transmission of quantum
information through a classical information channel, is impossible. If sender and
receiver share, however, an entangled pair of particles (which can be used as an
additional resource) the impossible task becomes, most surprisingly, possible [19]!
(We will discuss this fact in detail in Section 4.1.) The study of entanglement and
in particular the question how it can be quantified is therefore a central topic within
quantum information theory (cf. Chapter 5). Further examples for fields where
quantum information has led to a deeper and in particular more quantitative in-
sight include “capacities” of quantum information channels and “quantum cloning”.
A detailed discussion of these topics will be given in Chapters 6 and 8. Finally let
us remark that classical information theory benefits in a similar way from the
synthesis with quantum mechanics. Besides the just mentioned channel capacities, this
concerns for example the theory of computational complexity which analyzes the
scaling behavior of time and space consumed by an algorithm in dependence of the
size of the input data. Quantum information challenges here in particular the
fundamental Church–Turing hypothesis [54, 212], which claims that each computation
can be simulated “efficiently” on a Turing machine; we come back to this topic in
Section 4.5.
One of the most advanced approaches to quantum computing is the ion trap
technique (see Section 4.3 and 5.3 in [32] and Section 7.6 of [172] for an overview and
further references). A “quantum register” is realized here by a string of ions kept by
electromagnetic fields in high vacuum inside a Paul trap, and two long-lived states
of each ion are chosen to represent “0” and “1”. A single ion can be manipulated
by laser beams and this allows the implementation of all “one-qubit gates”. To get
two-qubit gates as well (for a quantum computer we need at least one two-qubit
gate together with all one-qubit operations; cf. Section 4.5), the collective motional
state of the ions has to be used. A “program” on an ion trap quantum computer
now starts with a preparation of the register in an initial state – usually the ground
state of the ions. This is done by optical pumping and laser cooling (which is in
fact one of the most difficult parts of the whole procedure, in particular if many
ions are involved). Then the “network” of quantum gates is applied, in terms of a
(complicated) sequence of laser pulses. The readout, finally, is done by laser beams
which illuminate the ions one after the other. The beams are tuned to a fast transition
which affects only one of the qubit states, and the fluorescent light is detected. An
overview of recent experimental directions can be found in [86].
A second quite successful technique is NMR quantum computing (see Section
5.4 of [32] and Section 7.7 of [172] together with the references therein for details).
NMR stands for “nuclear magnetic resonance”, i.e. the study of transitions
between Zeeman levels of an atomic nucleus in a magnetic field. The qubits are in
this case different spin states of the nuclei in an appropriate molecule and quantum
gates are realized by high frequency oscillating magnetic fields in pulses of controlled
duration. In contrast to ion traps, however, we do not use one molecule but a whole
cup of liquid containing some 10^20 of them. This causes a number of problems,
concerning in particular the preparation of an initial state, fluctuations in the free
time evolution of the molecules and the readout. There are several ways to overcome
these difficulties and we refer the reader again to [32] and [172] for details. Concrete
implementations of NMR quantum computers can use up to seven qubits [213]. A
recent review can be found in [87].
The fundamental problem of the two methods for quantum computation discussed
so far is their lack of scalability. It is realistic to assume that NMR and ion-trap
quantum computers with up to tens of qubits will exist at some point in the future,
but not with the thousands of qubits which are necessary for “real world” applications.
There are, however, many alternative proposals available, and some of
them might be able to avoid this problem. The following is a small (not at all
exhaustive) list: atoms in optical lattices [37], semiconductor nanostructures such as
quantum dots (there are many works in this area; some recent ones are [209, 40, 28, 38])
and arrays of Josephson junctions [155].
A second group of experiments we want to mention here centers on
quantum communication and quantum cryptography (for a more detailed overview
we refer to [227] and [97]). Realizations of quantum cryptography are fairly far
advanced, and it is currently possible to span up to 50 km with optical fibers (e.g.
[126]). Potentially greater distances can be bridged by “free space cryptography”,
where the quantum information is transmitted through the air (e.g. [44]). With this
technology satellites can be used as some sort of “relays”, thus enabling quantum
key distribution over arbitrary distances. In the meantime there are quite a lot of
successful implementations. For a detailed discussion we refer the reader to the
review of Gisin et al. [97] and the references therein. Other experiments concern
the usage of entanglement in quantum communication. The creation and detection
of entangled photons is here a fundamental building block. Nowadays this is no
problem, and the most famous experiment in this context is the one of Aspect et
al. [11], where the maximal violation of Bell inequalities was demonstrated with
polarization correlated photons. Another spectacular experiment is the creation
Fundamentals
Chapter 2
Basic concepts
Now that we have a first, rough impression of the basic ideas and most relevant
subjects of quantum information theory, let us start with a more detailed
presentation. First we have to introduce the fundamental notions of the theory and
their mathematical description. Fortunately, much of the material we would have
to present here, like Hilbert spaces, tensor products and density matrices, is already
known from quantum mechanics, and we can focus our discussion on those concepts
which are less familiar, like POV measures, completely positive maps and entangled
states.
2.1 Systems, States and Effects
Like classical probability theory, quantum mechanics is a statistical theory. Hence its
predictions are of a probabilistic nature and can only be tested if the same experiment
is repeated very often and the relative frequencies of the outcomes are calculated.
In more operational terms this means: the experiment has to be repeated according
to the same procedure, as it could be set out in a detailed laboratory manual. If we
consider a somewhat idealized model of such a statistical experiment, we get in
fact two different types of procedures: first, preparation procedures, which prepare
a certain kind of physical system in a distinguished state, and second, registration
procedures, which measure a particular observable.
A mathematical description of such a setup basically consists of two sets S and
E and a map S × E ∋ (ρ, A) ↦ ρ(A) ∈ [0, 1]. The elements of S describe the states,
i.e. preparations, while the A ∈ E represent all yes/no measurements (effects) which
can be performed on the system. The probability (i.e. the relative frequency for a
large number of repetitions) to get the result “yes”, if we are measuring the effect
A on a system prepared in the state ρ, is given by ρ(A). This is a very general
scheme applicable not only to quantum mechanics but also to a very broad class
of statistical models, containing in particular classical probability. In order to make
use of it we have to specify of course the precise structure of the sets S and E and
the map ρ(A) for the types of systems we want to discuss.
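As a toy instance of this abstract scheme (our illustration; the die and the numbers are arbitrary choices), take a classical die: states ρ ∈ S are probability vectors on X = {1, . . . , 6}, effects A ∈ E are functions with values in [0, 1], and the pairing ρ(A) is the expectation value.

```python
# Toy instance of the (S, E, pairing) scheme for a classical die.
# States: probability distributions on X = {1, ..., 6}.
# Effects: functions A: X -> [0, 1]; the pairing is rho(A) = sum_x rho_x A_x.

def pairing(rho, effect):
    return sum(p * a for p, a in zip(rho, effect))

uniform = [1 / 6] * 6                     # the "fair die" state
even = [0, 1, 0, 1, 0, 1]                 # proposition: "outcome is even"
fuzzy = [0, 0.9, 0, 0.9, 0, 0.9]          # fuzzy version: a 90%-efficient detector

print(round(pairing(uniform, even), 2))   # 0.5
print(round(pairing(uniform, fuzzy), 2))  # 0.45
```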
2.1.1 Operator algebras
Throughout this paper we will encounter three different kinds of systems: quantum
and classical systems and hybrid systems which are half classical, half quantum (cf.
Subsection 2.2.2). In this subsection we will describe a general way to define states
and effects which is applicable to all three cases and which therefore provides a
handy way to discuss all three cases simultaneously (this will become most useful
in Section 2.2 and 2.3).
The scheme we are going to discuss is based on an algebra A of bounded operators
acting on a Hilbert space H. More precisely, A is a (closed) linear subspace
of B(H), the algebra of bounded operators on H, which contains the identity
(1I ∈ A) and is closed under products (A, B ∈ A ⇒ AB ∈ A) and adjoints (A ∈ A
⇒ A∗ ∈ A). For simplicity we will refer to each such A as an observable algebra.
The key observation is now that each type of system we will study in the following
can be completely characterized by its observable algebra A, i.e. once A is known
there is a systematic way to derive the sets S and E and the map (ρ, A) 7→ ρ(A)
from it. We frequently make use of this fact by referring to systems in terms of their
observable algebra A, or even by identifying them with their algebra and saying
that A is the system.
S(A) = {ρ ∈ A∗ | ρ ≥ 0, ρ(1I) = 1},   (2.1)

where A∗ denotes the dual space of A, i.e. the set of all linear functionals on A, and
ρ ≥ 0 means ρ(A) ≥ 0 for all A ≥ 0. Elements of S(A) describe the states of the system
in question, while effects are given by

E(A) = {A ∈ A | 0 ≤ A ≤ 1I}.   (2.2)
The probability to measure the effect A in the state ρ is ρ(A). More generally we can
look at ρ(A) for an arbitrary A as the expectation value of A in the state ρ. Hence
the idea behind Equation (2.1) is to define states in terms of their expectation value
functionals.
Both spaces are convex, i.e. ρ, σ ∈ S(A) and 0 ≤ λ ≤ 1 imply λρ + (1 − λ)σ ∈
S(A), and similarly for E(A). The extremal points of S(A) and E(A), respectively, i.e.
those elements which do not admit a proper convex decomposition (x = λy + (1 − λ)z
⇒ λ = 1 or λ = 0 or y = z = x), play a distinguished role: the extremal points
of S(A) are the pure states and those of E(A) are the propositions of the system in
question. The latter represent those effects which register a property with certainty
in contrast to non-extremal effects, which admit some “fuzziness”. As a simple
example of such a fuzzy effect, consider a detector which registers particles not with
certainty but only with a probability smaller than one.
Finally, let us note that the complete discussion of this section can easily be
generalized to infinite dimensional systems if we replace H = C^d by an infinite
dimensional Hilbert space (e.g. H = L²(R)). This would, however, require more
material about C*-algebras and measure theory than we want to use in this paper.
2.1.2 Quantum mechanics
For quantum mechanics we have
A = B(H), (2.3)
where we have chosen again H = C^d. The corresponding systems are called d-level
systems, or qubits if d = 2 holds. To avoid clumsy notation we frequently write S(H)
and E(H) instead of S(B(H)) and E(B(H)). From Equation (2.2) we immediately
see that an operator A ∈ B(H) is an effect iff it is positive and bounded from above
by 1I. An element P ∈ E(H) is a proposition iff P is a projection operator (P² = P).
States are described in quantum mechanics usually by density matrices, i.e.
positive and normalized trace class¹ operators. To make contact with the general
definition in Equation (2.1), note first that B(H) is a Hilbert space with the Hilbert-
Schmidt scalar product ⟨A, B⟩ = tr(A∗B). Hence each linear functional ρ ∈ B(H)∗
¹ On a finite dimensional Hilbert space this attribute is of course redundant, since each operator
is of trace class in this case. Nevertheless we will frequently use this terminology, due to greater
consistency with the infinite dimensional case.
can be expressed in terms of a (trace class) operator ρ̃ by² A ↦ ρ(A) = tr(ρ̃A). It is
obvious that each ρ̃ defines a unique functional ρ. If we start on the other hand with
ρ, we can recover the matrix elements of ρ̃ from ρ by ρ̃_kj = tr(ρ̃ |j⟩⟨k|) = ρ(|j⟩⟨k|),
where |j⟩⟨k| denotes the canonical basis of B(H) (i.e. (|j⟩⟨k|)_ab = δ_ja δ_kb). More
generally we get for ψ, φ ∈ H the relation ⟨φ, ρ̃ψ⟩ = ρ(|ψ⟩⟨φ|), where |ψ⟩⟨φ| now
denotes the rank one operator which maps η ∈ H to ⟨φ, η⟩ψ. In the following we
drop the tilde and use the same symbol for the operator and the functional whenever
confusion can be avoided. Due to the same abuse of language we will interpret
elements of B(H)∗ frequently as (trace class) operators instead of linear functionals
(and write tr(ρA) instead of ρ(A)). However, we do not identify B(H)∗ with B(H)
in general, because the two different notations help to keep track of the distinction
between spaces of states and spaces of observables. In addition we equip B(H)∗ with
the trace norm ‖ρ‖₁ = tr|ρ| instead of the operator norm.
Positivity of the functional ρ implies positivity of the operator ρ, due to
0 ≤ ρ(|ψ⟩⟨ψ|) = ⟨ψ, ρψ⟩, and the same holds for normalization: 1 = ρ(1I) = tr(ρ).
Hence we can identify the state space from Equation (2.1) with the set of density
matrices, as expected for quantum mechanics. Pure states of a quantum system
are the one dimensional projectors. As usual, we will frequently identify the density
matrix |ψ⟩⟨ψ| with the wave function ψ and, in abuse of language, call the latter a
state.
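The pairing ρ(A) = tr(ρA) is easy to check numerically. In the following sketch (our example, with an arbitrarily chosen state and effect), a density matrix paired with an effect 0 ≤ A ≤ 1I yields a probability in [0, 1], and the positivity and normalization conditions are verified explicitly.

```python
import numpy as np

# A qubit density matrix (positive, trace one) and an effect 0 <= A <= 1I.
rho = np.array([[0.75, 0.25],
                [0.25, 0.25]])            # a mixed state, tr(rho) = 1
A = 0.8 * np.diag([1.0, 0.0])             # "fuzzy" projector onto |0>: 80% efficient

prob = float(np.trace(rho @ A))           # rho(A) = tr(rho A)

# Sanity checks matching the general definitions:
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)   # positivity of the state
assert abs(np.trace(rho) - 1.0) < 1e-12            # normalization
assert 0.0 <= prob <= 1.0                          # a genuine probability

print(round(prob, 2))                     # 0.6
```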
To get a useful parameterization of the state space, consider again the Hilbert-
Schmidt scalar product ⟨ρ, σ⟩ = tr(ρ∗σ), but now on B(H)∗. The space of trace-free
matrices in B(H)∗ (alternatively, the functionals with ρ(1I) = 0) is the corresponding
orthocomplement 1I^⊥ of the unit operator. If we choose a basis σ_1, . . . , σ_{d²−1} with
⟨σ_j, σ_k⟩ = 2δ_jk in 1I^⊥, we can write each selfadjoint (trace class) operator ρ with
tr(ρ) = 1 as

    ρ = 1I/d + (1/2) Σ_{j=1}^{d²−1} x_j σ_j =: 1I/d + (1/2) x⃗ · σ⃗,  with x⃗ ∈ R^{d²−1}.  (2.4)
the observable algebra is much larger and Equation (2.1) leads to states which are not necessarily
given by trace class operators. Such “singular states” play an important role in theories which
admit an infinite number of degrees of freedom like quantum statistics and quantum field theory;
cf. [35]. This point will be essential in the discussion of infinitely entangled states; cf. Chapter 13.
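The parameterization (2.4) is easily checked numerically. The following numpy sketch (an illustration added here, not part of the original text) uses the Pauli matrices as the basis σ1, σ2, σ3 for d = 2; the chosen vector ~x is arbitrary.

```python
import numpy as np

# Basis of the trace-free part for d = 2: the Pauli matrices satisfy
# tr(sigma_j sigma_k) = 2 delta_jk, as required in (2.4).
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_x
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),    # sigma_z
]

d = 2
x = np.array([0.3, -0.2, 0.5])   # any vector with |x| <= 1 yields a state
rho = np.eye(d) / d + 0.5 * sum(xj * sj for xj, sj in zip(x, sigma))

assert np.isclose(np.trace(rho).real, 1.0)   # normalization tr(rho) = 1
assert np.allclose(rho, rho.conj().T)        # self-adjointness
# the coordinates are recovered via the Hilbert-Schmidt scalar product:
x_rec = np.array([np.trace(rho @ sj).real for sj in sigma])
assert np.allclose(x_rec, x)
```

For |~x| ≤ 1 the resulting ρ is in addition positive, i.e. a density matrix (the Bloch ball picture for qubits).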
set X of elementary events. Typical examples are: throwing a die X = {1, . . . , 6},
tossing a coin X = {“heads”, “tails”} or classical bits X = {0, 1}. To simplify
the notations we write (as in quantum mechanics) S(X) and E(X) for the spaces
of states and effects.
The observable algebra A of such a system is the space
A = C(X) = {f : X → C} (2.6)
of complex valued functions on X. To interpret this as an operator algebra acting
on a Hilbert space H (as indicated in Subsection 2.1.1) choose an arbitrary but
fixed orthonormal basis |xi, x ∈ X in H and identify the function f ∈ C(X) with
the operator f = Σx fx |xihx| ∈ B(H) (we use the same symbol for the function
and the operator, provided confusion can be avoided). Most frequently we have
X = {1, . . . , d} and we can choose H = Cd and the canonical basis for |xi. Hence
C(X) becomes the algebra of diagonal d × d matrices. Using Equation (2.2) we
immediately see that f ∈ C(X) is an effect iff 0 ≤ fx ≤ 1, ∀x ∈ X. Physically
we can interpret fx as the probability that the effect f registers the elementary
event x. This makes the distinction between propositions and “fuzzy” effects very
transparent: P ∈ E(X) is a proposition iff we have either Px = 1 or Px = 0 for all
x ∈ X. Hence the propositions P ∈ C(X) are in one to one correspondence with
the subsets ωP = {x ∈ X | Px = 1} ⊂ X which in turn describe the events of the
system. Hence P registers the event ωP with certainty, while a fuzzy effect f < P
does this only with a probability less than one.
Since C(X) is finite dimensional and admits the distinguished basis |xihx|, x ∈ X
it is naturally isomorphic to its dual C ∗ (X). More precisely: each linear functional
ρ ∈ C∗(X) defines and is uniquely defined by the function x 7→ ρx = ρ(|xihx|) and
we have ρ(f ) = Σx fx ρx . As in the quantum case we will identify the function ρ
with the linear functional and use the same symbol for both, although we keep the
notation C ∗ (X) to indicate that we are talking about states rather than observables.
Positivity of ρ ∈ C∗(X) is given by ρx ≥ 0 for all x, and normalization leads
to 1 = ρ(1I) = ρ(Σx |xihx|) = Σx ρx . Hence to be a state ρ ∈ C∗(X) must be a
probability distribution on X and ρx is the probability that the elementary event x
occurs during statistical experiments with systems in the state ρ. More generally
ρ(f ) = Σj ρj fj is the probability to measure the effect f on systems in the state ρ.
If P is in particular a proposition, ρ(P ) gives the probability for the event ωP . The
pure states of the system are the Dirac measures δx , x ∈ X; with δx (|yihy|) = δxy .
Hence each ρ ∈ S(X) can be decomposed in a unique way into a convex linear
combination of pure states.
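For concreteness, the classical picture can be spelled out in a few lines of numpy (an added illustration; the die example and the smearing factor 0.9 are arbitrary choices):

```python
import numpy as np

# Classical system C(X) for a die, X = {1,...,6}: states are probability
# vectors, effects are functions with 0 <= f_x <= 1, and rho(f) = sum_x rho_x f_x.
rho = np.full(6, 1 / 6)                    # uniform state (fair die)
P_even = np.array([0, 1, 0, 1, 0, 1.0])    # proposition "outcome is even"
f = 0.9 * P_even                           # a fuzzy effect f < P_even

assert np.isclose(rho.sum(), 1.0) and (rho >= 0).all()   # rho is a state
assert np.isclose(rho @ P_even, 0.5)       # probability of the event omega_P
assert rho @ f < rho @ P_even              # the fuzzy effect registers less often
# pure states are the Dirac measures delta_x:
delta_3 = np.eye(6)[2]                     # delta measure on the outcome "3"
assert np.isclose(delta_3 @ P_even, 0.0)   # outcome 3 is odd
```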
2.1.4 Observables
Up to now we have discussed only effects, i.e. yes/no experiments. In this subsection
we will have a first short look at more general observables. We will come back to
this topic in Section 3.2.4 after we have introduced channels. We can think of an
observable E taking its values in a finite set X as a map which associates to each
possible outcome x ∈ X the effect Ex ∈ E(A) (if A is the observable algebra of
the system in question) which is true if x is measured and false otherwise. If the
measurement is performed on systems in the state ρ we get for each x ∈ X the
probability px = ρ(Ex ) to measure x. Hence the family of the px should be a
probability distribution on X, and this implies that E should be a POV measure
on X.
Definition 2.1.1 Consider an observable algebra A ⊂ B(H) and a finite 3 set X.
A family E = (Ex )x∈X of effects in A (i.e. 0 ≤ Ex ≤ 1I) is called a positive
3 This is of course an artificial restriction and in many situations not justified (cf. in particular
the discussion of quantum state estimation in Section 4.2 and Chapter 8). However, it helps us to
avoid measure theoretical subtleties; cf. Holevo’s book [111] for a more general discussion.
operator valued measure (POV measure) on X if Σ_{x∈X} Ex = 1I holds. If all Ex
are projections, E is called a projection valued measure (PV measure).
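To make Definition 2.1.1 concrete, the following numpy sketch (an illustration added here; the smearing parameter η is chosen arbitrarily) builds a non-projective POV measure on a qubit and checks the normalization Σx Ex = 1I:

```python
import numpy as np

# A "fuzzy" qubit measurement: smeared versions of the sigma_z eigenprojections.
eta = 0.8                                  # smearing parameter (assumption)
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
E = {0: eta * P0 + (1 - eta) / 2 * np.eye(2),
     1: eta * P1 + (1 - eta) / 2 * np.eye(2)}

assert np.allclose(E[0] + E[1], np.eye(2))   # POV measure condition
rho = np.array([[0.7, 0.2], [0.2, 0.3]])     # some density matrix
p = {x: np.trace(rho @ Ex).real for x, Ex in E.items()}
assert np.isclose(sum(p.values()), 1.0)      # outcome probabilities sum to one
```

Neither E[0] nor E[1] is a projection (for η < 1), so this observable is not of the traditional PV type discussed next.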
From basic quantum mechanics we know that observables are described by self
adjoint operators on a Hilbert space H. But, how does this point of view fit into
the previous definition? The answer is given by the spectral theorem (Thm. VIII.6
[186]): Each selfadjoint operator A on a finite dimensional Hilbert space H has
the form A = Σ_{λ∈σ(A)} λPλ , where σ(A) denotes the spectrum of A, i.e. the set of
eigenvalues, and Pλ denotes the projection onto the corresponding eigenspace. Hence
there is a unique PV measure P = (Pλ )λ∈σ(A) associated to A which is called the
spectral measure of A. It is uniquely characterized by the property that the
expectation value Σλ λρ(Pλ ) of P in any state ρ coincides with ρ(A) = tr(ρA),
as is well known from quantum mechanics. Hence the traditional way to define
observables within quantum mechanics perfectly fits into the scheme just outlined,
however it only covers the projection valued case and therefore admits no fuzziness.
For this reason POV measures are sometimes called generalized observables.
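The passage from a selfadjoint operator to its spectral measure can be traced numerically; the sketch below (added for illustration, with a nondegenerate A so that np.linalg.eigh directly yields the eigenprojections) checks A = Σλ λPλ and the expectation value formula:

```python
import numpy as np

# Spectral measure of a selfadjoint operator A via np.linalg.eigh.
A = np.array([[1.0, 2.0], [2.0, -1.0]])
vals, vecs = np.linalg.eigh(A)
P = [np.outer(vecs[:, k], vecs[:, k].conj()) for k in range(2)]  # PV measure

assert np.allclose(sum(v * p for v, p in zip(vals, P)), A)   # A = sum_l l P_l
rho = np.array([[0.6, 0.1], [0.1, 0.4]])
expect = sum(v * np.trace(rho @ p).real for v, p in zip(vals, P))
assert np.isclose(expect, np.trace(rho @ A).real)            # = rho(A)
```

For degenerate eigenvalues the rank-one projectors belonging to equal eigenvalues would have to be summed to obtain the eigenprojections Pλ.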
Finally note that the eigenprojections Pλ of A are elements of an observable
algebra A iff A ∈ A. This shows two things: First of all we can consider selfadjoint
elements of any *-subalgebra A of B(H) as observables of A-systems, and this is
precisely the reason why we have called A an observable algebra. Secondly we see why
it is essential that A is really a subalgebra of B(H): if it is only a linear subspace
of B(H) the relation A ∈ A does not imply Pλ ∈ A.
2.2 Composite systems and entangled states
Composite systems occur in many places in quantum information theory. A typical
example is a register of a quantum computer, which can be regarded as a system
consisting of N qubits (if N is the length of the register). The crucial point is that
this opens the possibility for correlations and entanglement between subsystems.
In particular entanglement is of great importance, because it is a central resource
in many applications of quantum information theory like entanglement enhanced
teleportation or quantum computing – we already discussed this in Section 1.2 of
the introduction. To explain entanglement in greater detail and to introduce some
necessary formalism we have to complement the scheme developed in the last section
by a procedure which allows us to construct states and observables of the composite
system from its subsystems. In quantum mechanics this is done of course in terms
of tensor products, and we will review in the following some of the most relevant
material.
2.2.1 Tensor products
Consider two (finite dimensional) Hilbert spaces H and K. To each pair of vectors
ψ1 ∈ H, ψ2 ∈ K we can associate a bilinear form ψ1 ⊗ ψ2 called the tensor product
of ψ1 and ψ2 by ψ1 ⊗ ψ2 (φ1 , φ2 ) = hψ1 , φ1 ihψ2 , φ2 i. For two product vectors ψ1 ⊗ ψ2
and η1 ⊗ η2 their scalar product is defined by hψ1 ⊗ ψ2 , η1 ⊗ η2 i = hψ1 , η1 ihψ2 , η2 i
and it can be shown that this definition extends in a unique way to the span of all
ψ1 ⊗ ψ2 which therefore defines the tensor product H ⊗ K. If we have more than two
Hilbert spaces Hj , j = 1, . . . , N their tensor product H1 ⊗ · · · ⊗ HN can be defined
similarly.
The tensor product A1 ⊗ A2 of two bounded operators A1 ∈ B(H), A2 ∈ B(K)
is defined first for product vectors ψ1 ⊗ ψ2 ∈ H ⊗ K by A1 ⊗ A2 (ψ1 ⊗ ψ2 ) =
(A1 ψ1 ) ⊗ (A2 ψ2 ) and then extended by linearity. The space B(H ⊗ K) coincides
with the span of all A1 ⊗ A2 . If ρ ∈ B(H ⊗ K) is not of product form (and of
trace class for infinite dimensional H and K) there is nevertheless a way to define
“restrictions” to H respectively K called the partial trace of ρ. It is defined by the
equation
tr[trK (ρ)A] = tr(ρA ⊗ 1I) ∀A ∈ B(H) (2.7)
where the trace on the left hand side is over H and on the right hand side over
H ⊗ K.
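The defining equation (2.7) suggests an immediate numerical implementation of the partial trace (an added sketch; dimensions and the test operator are arbitrary):

```python
import numpy as np

# Partial trace over K in the index picture: an operator on H (x) K is
# reshaped to a four-index array and the K indices are traced out.
dH, dK = 2, 3

def partial_trace_K(rho):
    return np.trace(rho.reshape(dH, dK, dH, dK), axis1=1, axis2=3)

rho = np.random.rand(dH * dK, dH * dK)
rho = rho + rho.T                       # some selfadjoint test operator
A = np.random.rand(dH, dH)
# defining equation (2.7): tr[tr_K(rho) A] = tr[rho (A (x) 1)]
lhs = np.trace(partial_trace_K(rho) @ A)
rhs = np.trace(rho @ np.kron(A, np.eye(dK)))
assert np.isclose(lhs, rhs)
```

The reshape convention matches np.kron, i.e. the first factor of the Kronecker product is the H system.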
If two orthonormal bases φ1 , . . . , φn and ψ1 , . . . , ψm are given in H respectively
K we can consider the product basis φ1 ⊗ ψ1 , . . . , φn ⊗ ψm in H ⊗ K, and we can
expand each Ψ ∈ H ⊗ K as Ψ = Σjk Ψjk φj ⊗ ψk with Ψjk = hφj ⊗ ψk , Ψi. This
procedure works for an arbitrary number of tensor factors. However, if we have
exactly a twofold tensor product, there is a more economic way to expand Ψ, called
Schmidt decomposition in which only diagonal terms of the form φj ⊗ ψj appear.
Proposition 2.2.1 For each element Ψ of the twofold tensor product H ⊗ K there
are orthonormal systems φj , j = 1, . . . , n and ψj , j = 1, . . . , n (not necessarily
bases, i.e. n can be smaller than dim H and dim K) of H and K respectively such
that Ψ = Σj √λj φj ⊗ ψj holds. The φj and ψj are uniquely determined by Ψ. The
expansion is called Schmidt decomposition and the numbers √λj are the Schmidt
coefficients.
Proof. Consider the partial trace ρ1 = trK (|ΨihΨ|) of the one dimensional projector
|ΨihΨ| associated to Ψ. It can be decomposed in terms of its eigenvectors φn and we
get trK (|ΨihΨ|) = ρ1 = Σn λn |φn ihφn |. Now we can choose an orthonormal basis
ψ′k , k = 1, . . . , m in K and expand Ψ with respect to φj ⊗ ψ′k . Carrying out the k
summation we get a family of vectors ψ′′j = Σk hφj ⊗ ψ′k , Ψi ψ′k with the property
Ψ = Σj φj ⊗ ψ′′j . Now we can calculate the partial trace and get for any A ∈ B(H1 ):

Σj λj hφj , Aφj i = tr(ρ1 A) = hΨ, (A ⊗ 1I)Ψi = Σ_{j,k} hφj , Aφk ihψ′′j , ψ′′k i. (2.8)

Since A is arbitrary we can compare the left and right hand side of this equation
term by term and we get hψ′′j , ψ′′k i = δjk λj . Hence ψj = λj^{−1/2} ψ′′j is the desired
orthonormal system. 2
As an immediate application of this result we can show that each mixed state
ρ ∈ B ∗ (H) (of the quantum system B(H)) can be regarded as a pure state on a
larger Hilbert space H ⊗ H0 . We just have to consider the eigenvalue expansion
ρ = Σj λj |φj ihφj | of ρ and to choose an arbitrary orthonormal system ψj , j = 1, . . . , n
in H0 . Using Proposition 2.2.1 we get
Corollary 2.2.2 Each state ρ ∈ B ∗ (H) can be extended to a pure state Ψ on a
larger system with Hilbert space H ⊗ H0 such that trH0 |ΨihΨ| = ρ holds.
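In coordinates the Schmidt decomposition is nothing but the singular value decomposition of the coefficient matrix Ψjk. A small numpy sketch (our illustration, with a random Ψ) checks Proposition 2.2.1 and its relation to the partial trace:

```python
import numpy as np

# Schmidt decomposition via SVD: the singular values s_j of the coefficient
# matrix Psi_jk are the Schmidt coefficients sqrt(lambda_j).
n, m = 3, 4
Psi = np.random.randn(n, m) + 1j * np.random.randn(n, m)
Psi /= np.linalg.norm(Psi)            # normalize the vector in H (x) K

U, s, Vh = np.linalg.svd(Psi)         # columns of U, rows of Vh: phi_j, psi_j
assert np.isclose((s ** 2).sum(), 1.0)
# tr_K |Psi><Psi| = Psi Psi^* has eigenvalues lambda_j = s_j^2:
rho1 = Psi @ Psi.conj().T
assert np.allclose(np.sort(np.linalg.eigvalsh(rho1))[::-1][: len(s)], s ** 2)
```

Read in the other direction, this is exactly the purification of Corollary 2.2.2: every ρ1 with eigenvalues λj arises as the restriction of the pure state with Schmidt coefficients √λj.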
2.2.2 Compound and hybrid systems
To discuss the composition of two arbitrary (i.e. classical or quantum) systems it is
very convenient to use the scheme developed in Subsection 2.1.1 and to talk about
the two subsystems in terms of their observable algebras A ⊂ B(H) and B ⊂ B(K).
The observable algebra of the composite system is then simply given by the tensor
product A ⊗ B ⊂ B(H ⊗ K),
as expected. For two classical systems A = C(X) and B = C(Y ) recall that elements
of C(X) (respectively C(Y )) are complex valued functions on X (on Y ). Hence the
tensor product C(X) ⊗ C(Y ) consists of complex valued functions on X × Y , i.e.
C(X) ⊗ C(Y ) = C(X × Y ). In other words states and observables of the composite
system C(X) ⊗ C(Y ) are, in accordance with classical probability theory, given by
probability distributions and random variables on the Cartesian product X × Y .
If only one subsystem is classical and the other is quantum; e.g. a micro particle
interacting with a classical measuring device we have a hybrid system. The elements
of its observable algebra C(X) ⊗ B(H) can be regarded as operator valued functions
on X, i.e. X ∋ x 7→ Ax ∈ B(H), and A is an effect iff 0 ≤ Ax ≤ 1I holds for all
x ∈ X. The elements of the dual C∗(X) ⊗ B∗(H) are in a similar way B∗(H) valued
functions X ∋ x 7→ ρx ∈ B∗(H), and ρ is a state iff each ρx is a positive trace class
operator on H and Σx tr(ρx ) = 1. The probability to measure the effect A in the state
ρ is Σx ρx (Ax ).
2.2.3 Correlations and entanglement
Let us now consider two effects A ∈ A and B ∈ B then A ⊗ B is an effect of the
composite system A ⊗ B. It is interpreted as the joint measurement of A on the first
and B on the second subsystem, where the “yes” outcome means “both effects give
yes”. In particular A ⊗ 1I means to measure A on the first subsystem and to ignore
the second one completely. If ρ is a state of A ⊗ B we can define its restrictions
by ρA (A) = ρ(A ⊗ 1I) and ρB (A) = ρ(1I ⊗ A). If both systems are quantum the
restrictions of ρ are the partial traces, while in the classical case we have to sum
over the B, respectively A variables. For two states ρ1 ∈ S(A) and ρ2 ∈ S(B) there
is always a state ρ of A ⊗ B such that ρ1 = ρA and ρ2 = ρB holds: We just have to
choose the product state ρ1 ⊗ ρ2 . However in general we have ρ 6= ρA ⊗ ρB , which
means nothing else than that ρ also contains correlations between the two subsystems.
Definition 2.2.3 A state ρ of a bipartite system A ⊗ B is called correlated if there
are some A ∈ A, B ∈ B such that ρ(A ⊗ B) 6= ρA (A)ρB (B) holds.
We immediately see that ρ = ρ1 ⊗ ρ2 implies ρ(A ⊗ B) = ρ1 (A)ρ2 (B) =
ρA (A)ρB (B), hence ρ is not correlated. If on the other hand ρ(A⊗B) = ρA (A)ρB (B)
holds for all A, B we get ρ = ρA ⊗ ρB . Hence, the definition of correlations just given
perfectly fits into our intuitive considerations.
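Definition 2.2.3 can be illustrated numerically (an added sketch with qubit subsystems; the chosen mixture is the classically correlated state discussed after Proposition 2.2.4):

```python
import numpy as np

# A classically correlated two-qubit mixture versus the product of its
# restrictions.
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
rho = 0.5 * np.kron(P0, P0) + 0.5 * np.kron(P1, P1)   # correlated mixture

r = rho.reshape(2, 2, 2, 2)
rhoA = np.trace(r, axis1=1, axis2=3)    # restriction to the first system
rhoB = np.trace(r, axis1=0, axis2=2)    # restriction to the second system

A = B = P0                              # a pair of test effects
lhs = np.trace(rho @ np.kron(A, B))     # rho(A (x) B)
rhs = np.trace(rhoA @ A) * np.trace(rhoB @ B)
assert not np.isclose(lhs, rhs)         # 0.5 != 0.25, hence rho is correlated
```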
An important issue in quantum information theory is the comparison of correla-
tions between quantum systems on the one hand and classical systems on the other.
Hence let us have a closer look on the state space of a system consisting of at least
one classical subsystem.
Proposition 2.2.4 Each state ρ of a composite system A ⊗ B consisting of a
classical (A = C(X)) and an arbitrary system (B) has the form

ρ = Σ_{j∈X} λj ρ^A_j ⊗ ρ^B_j. (2.11)
If A and B are two quantum systems it is still possible for them to be correlated
in the way just described. We can simply prepare them with a classical random
generator which triggers two preparation devices to produce systems in the states
ρ^A_j , ρ^B_j with probability λj . The overall state produced by this setup is obviously
the ρ from Equation (2.11). However, the crucial point is that not all correlations of
the ρ from Equation (2.11). However, the crucial point is that not all correlations of
quantum systems are of this type! This is an immediate consequence of the definition
of pure states ρ = |ΨihΨ| ∈ S(H): Since there is no proper convex decomposition of
ρ, it can be written as in Proposition 2.2.4 iff Ψ is a product vector, i.e. Ψ = φ ⊗ ψ.
This observation motivates the following definition.
Definition 2.2.5 A state ρ of the composite system B(H1 ) ⊗ B(H2 ) is called sep-
arable or classically correlated if it can be written as
ρ = Σj λj ρ^{(1)}_j ⊗ ρ^{(2)}_j (2.12)

with states ρ^{(k)}_j of B(Hk ) and weights λj > 0. Otherwise ρ is called entangled. The
set of all separable states is denoted by D(H1 ⊗ H2 ) or just D if H1 and H2 are
understood.
2.2.4 Bell inequalities
We have just seen that it is quite easy for pure states to check whether they are
entangled or not. In the mixed case however this is a much bigger, and in general
unsolved, problem. In this subsection we will have a short look at Bell inequalities,
which are maybe the oldest criterion for entanglement (for a more detailed review see
[233]). Today more powerful methods, most of them based on positivity properties,
are available. We will postpone the corresponding discussion to the end of the
following section, after we have studied (completely) positive maps (cf. Section
2.4).
Bell inequalities are traditionally discussed in the framework of “local hidden
variable theories”. More precisely we will say that a state ρ of a bipartite system
B(H ⊗ K) admits a hidden variable model, if there is a probability space (X, µ) and
(measurable) response functions X 3 x 7→ FA (x, k), FB (x, l) ∈ R for all discrete PV
measures A = A1 , . . . , AN ∈ B(H) respectively B = B1 , . . . , BM ∈ B(K) such that
∫_X FA (x, k)FB (x, l) μ(dx) = tr(ρ Ak ⊗ Bl ) (2.13)

holds for all k, l and A, B. The value of the function FA (x, k) is interpreted as
the probability to get the value k during an A measurement with known “hidden
parameter” x. The set of states admitting a hidden variable model is a convex set
and as such it can be described by an (infinite) hierarchy of correlation inequalities.
Any one of these inequalities is usually called a (generalized) Bell inequality. The
most well known is the one given by Clauser, Horne, Shimony and Holt [57]: The
state ρ satisfies the CHSH inequality if

ρ(A ⊗ (B + B′) + A′ ⊗ (B − B′)) ≤ 2 (2.14)
quantum systems. The most prominent examples are “maximally entangled states”
(cf. Subsection 3.1.1) which violate the CHSH inequality (for appropriately chosen
A, A′, B, B′) with a maximal value of 2√2. This observation is the starting point
for many discussions concerning the interpretation of quantum mechanics, in
particular because the maximal violation of 2√2 was observed experimentally in 1982
by Aspect and coworkers [11]. We do not want to follow this path (see [233] and
the references therein instead). Interesting for us is the fact that Bell inequalities,
in particular the CHSH case in Equation (2.14), provide a necessary condition for
a state ρ to be separable. However there exist entangled states admitting a hidden
variable model [229]. Hence, Bell inequalities are not sufficient for separability.
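The maximal violation mentioned above can be reproduced in a few lines (an added sketch; the measurement directions in the x-z plane are the standard optimal choice):

```python
import numpy as np

# CHSH expression (2.14) for the maximally entangled two-qubit state.
sx = np.array([[0, 1], [1, 0.0]])
sz = np.diag([1.0, -1.0])
psi = np.array([1, 0, 0, 1.0]) / np.sqrt(2)     # (|00> + |11>)/sqrt(2)
rho = np.outer(psi, psi)

A, A_ = sz, sx
B  = (sz + sx) / np.sqrt(2)
B_ = (sz - sx) / np.sqrt(2)
chsh = np.trace(rho @ (np.kron(A, B + B_) + np.kron(A_, B - B_))).real
assert np.isclose(chsh, 2 * np.sqrt(2))   # maximal quantum value 2*sqrt(2) > 2
```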
2.3 Channels
Assume now that we have a number of quantum systems, e.g. a string of ions in
a trap. To “process” the quantum information they carry we have to perform in
general many steps of a quite different nature. Typical examples are: free time
evolution, controlled time evolution (e.g. the application of a “quantum gate” in a
quantum computer), preparations and measurements. The purpose of this section is
to provide a unified framework for the description of all these different operations.
The basic idea is to represent each processing step by a “channel”, which converts
input systems, described by an observable algebra A into output systems described
by a possibly different algebra B. Henceforth we will call A the input and B the
output algebra. If we consider e.g. the free time evolution, we need quantum systems
of the same type on the input and the output side, hence in this case we have
A = B = B(H) with an appropriately chosen Hilbert space H. If on the other hand
we want to describe a measurement we have to map quantum systems (the measured
system) to classical information (the measuring result). Therefore we need in this
example A = B(H) for the input and B = C(X) for the output algebra, where X is
the set of possible outcomes of the measurement (cf. Subsection 2.1.4).
Our aim is now to get a mathematical object which can be used to describe a
channel. To this end consider an effect A ∈ B of the output system. If we invoke first
a channel which transforms A systems into B systems, and measure A afterwards
on the output systems, we end up with a measurement of an effect T (A) on the
input systems. Hence we get a map T : E(B) → E(A) which completely describes the
channel 4 . Alternatively we can look at the states and interpret a channel as a map
T ∗ : S(A) → S(B) which transforms A systems in the state ρ ∈ S(A) into B systems
in the state T ∗ (ρ). To distinguish between both maps we can say that T describes
the channel in the Heisenberg picture and T ∗ in the Schrödinger picture. On the level
of the statistical interpretation both points of view should coincide of course, i.e. the
probabilities5 (T ∗ ρ)(A) and ρ(T A) to get the result “yes” during an A measurement
on B systems in the state T ∗ ρ, respectively a T A measurement on A systems in
the state ρ, should be the same. Since (T ∗ ρ)(A) is linear in A we see immediately
that T must be an affine map, i.e. T (λ1 A1 + λ2 A2 ) = λ1 T (A1 ) + λ2 T (A2 ) for each
convex linear combination λ1 A1 + λ2 A2 of effects in B, and this in turn implies that
T can be extended naturally to a linear map, which we will identify in the following
with the channel itself, i.e. we say that T is the channel.
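The duality between the two pictures, (T∗ρ)(A) = ρ(T A), is easily checked on a concrete example (an added sketch; the depolarizing channel with an arbitrarily chosen noise parameter serves as illustration):

```python
import numpy as np

# A depolarizing channel: in the Heisenberg picture T(A) = p A + (1-p) tr(A) 1/d,
# in the Schroedinger picture T*(rho) = p rho + (1-p) tr(rho) 1/d.
d, p = 2, 0.7

def T(A):         # Heisenberg picture (acts on effects/observables)
    return p * A + (1 - p) * np.trace(A) * np.eye(d) / d

def T_star(rho):  # Schroedinger picture (acts on states)
    return p * rho + (1 - p) * np.trace(rho) * np.eye(d) / d

rho = np.array([[0.8, 0.3], [0.3, 0.2]])    # a state
A = np.array([[0.5, 0.1], [0.1, 0.4]])      # an effect
# both pictures give the same measurement statistics:
assert np.isclose(np.trace(T_star(rho) @ A), np.trace(rho @ T(A)))
```

Note that T(1I) = 1I and T∗ preserves the trace, as required for a channel.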
2.3.1 Completely positive maps
Let us change now slightly our point of view and start with a linear operator T :
A → B. To be a channel, T must map effects to effects, i.e. T has to be positive:
4 Note that the direction of the mapping arrow is reversed compared to the natural ordering of
processing.
5 To keep notations more readable we will follow frequently the usual convention to drop the
parenthesis around arguments of linear operators. Hence we will write T A and T ∗ ρ instead of
T (A) and T ∗ (ρ). Similarly we will simply write T S instead of T ◦ S for compositions.
T (A) ≥ 0 ∀A ≥ 0 and bounded from above by 1I, i.e. T (1I) ≤ 1I. In addition it is
natural to require that two channels in parallel are again a channel. More precisely, if
two channels T : A1 → B1 and S : A2 → B2 are given we can consider the map T ⊗S
which associates to each A⊗B ∈ A1 ⊗A2 the tensor product T (A)⊗S(B) ∈ B1 ⊗B2 .
It is natural to assume that T ⊗ S is a channel which converts composite systems
of type A1 ⊗ A2 into B1 ⊗ B2 systems. Hence T ⊗ S should be positive as well [178].
algebras. It needs however some material from representation theory of C*-algebras which we want
to avoid here. See e.g. [178, 115].
ρ = (Id ⊗T ∗ ) σ, (2.18)
where Id denotes the identity map on B∗(H). The pure state σ can be chosen such
that trH (σ) has no zero eigenvalue. In this case T and σ are uniquely determined
(up to unitary equivalence) by Equation (2.18); i.e. if σ̃, T̃ with ρ = (Id ⊗T̃∗)σ̃ are
given, we have σ̃ = (1I ⊗ U )σ(1I ⊗ U∗) and T̃( · ) = U T ( · )U∗ with an appropriate
unitary operator U .
Proof. The state σ is obviously the purification of trH1 (ρ). Hence if λj and
ψj are eigenvalues and eigenvectors of trH1 (ρ) we can set σ = |ΨihΨ| with
Ψ = Σj √λj ψj ⊗ φj , where φj is an (arbitrary) orthonormal basis in K. It is clear
that σ is uniquely determined up to a unitary. Hence we only have to show that a
unique T exists if Ψ is given. To satisfy Equation (2.18) we must have

ρ(|ψj ⊗ ηk ihψl ⊗ ηp |) = hΨ, (Id ⊗T )(|ψj ⊗ ηk ihψl ⊗ ηp |)Ψi (2.19)
= hΨ, (|ψj ihψl | ⊗ T (|ηk ihηp |))Ψi (2.20)
= √(λj λl) hφj , T (|ηk ihηp |)φl i, (2.21)
to replace α by α − γ tr. Hence the result follows from the fact that each linear
functional on B ∗ (H ⊗ K) has the form α(σ) = tr(Aσ) with A ∈ B(H ⊗ K). 2
is not. The latter can be easily checked with the maximally entangled state (cf.
Subsection 3.1.1)

Ψ = (1/√d) Σj |ji ⊗ |ji. (2.25)
for any separable state ρ ∈ B∗(H ⊗ K). These inequalities are another non-trivial
separability criterion, which is called the reduction criterion [117, 52]. It is closely
related to the ppt criterion due to the following proposition (see [117] for a proof).
Proposition 2.4.5 Each ppt-state ρ ∈ S(H ⊗ K) satisfies the reduction criterion.
If dim H = 2 and dim K = 2, 3 both criteria are equivalent.
Hence we see with Theorem 2.4.3 that a state ρ in 2 × 2 or 2 × 3 dimensions is
separable iff it satisfies the reduction criterion.
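Both criteria can be tested mechanically on two-qubit examples (an added numpy sketch; the Bell state and the product state |00ih00| serve as entangled and separable test cases):

```python
import numpy as np

# Partial transpose and reduction criterion for two qubits.
def ptranspose(rho):                     # transpose on the second factor
    return rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)

psi = np.array([1, 0, 0, 1.0]) / np.sqrt(2)
bell = np.outer(psi, psi)                # maximally entangled state
prod = np.diag([1.0, 0, 0, 0])           # |00><00|, separable

assert np.linalg.eigvalsh(ptranspose(bell)).min() < 0     # not ppt -> entangled
assert np.linalg.eigvalsh(ptranspose(prod)).min() >= -1e-12

# reduction criterion: tr_2(rho) (x) 1 - rho >= 0 for separable rho
tr2 = lambda r: np.trace(r.reshape(2, 2, 2, 2), axis1=1, axis2=3)
red = lambda r: np.kron(tr2(r), np.eye(2)) - r
assert np.linalg.eigvalsh(red(bell)).min() < 0            # violated
assert np.linalg.eigvalsh(red(prod)).min() >= -1e-12      # satisfied
```

In 2 × 2 dimensions a negative eigenvalue of the partial transpose is, by Theorem 2.4.3, equivalent to entanglement.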
Chapter 3
Basic examples
After the somewhat abstract discussion in the last chapter we will become more
concrete now. In the following we will present a number of examples which help
on the one hand to understand the structures just introduced, and which are of
fundamental importance within quantum information on the other.
3.1 Entanglement
Although our definition of entanglement (Definition 2.2.5) is applicable in arbitrary
dimensions, detailed knowledge about entangled states is available only for low
dimensional systems or for states with very special properties. In this section we
will discuss some of the most basic examples.
3.1.1 Maximally entangled states
Let us start with a look at pure states of a composite system A ⊗ B and their
possible correlations. If one subsystem is classical, i.e. A = C({1, . . . , d}), the state
space is given according to Subsection 2.2.2 by S(B)^d and ρ ∈ S(B)^d is pure iff
ρ = (δj1 τ, . . . , δjd τ ) with j = 1, . . . , d and a pure state τ of the B system. Hence the
restrictions of ρ to A respectively B are the Dirac measure δj ∈ S(X) and τ ∈ S(B),
in other words both restrictions are pure. This is completely different if A and B
are quantum, i.e. A ⊗ B = B(H ⊗ K): Consider ρ = |ΨihΨ| with Ψ ∈ H ⊗ K and
Schmidt decomposition (Proposition 2.2.1) Ψ = Σj λj^{1/2} φj ⊗ ψj . Calculating the A
restriction, i.e. the partial trace over K, we get

tr[trK (ρ)A] = tr[|ΨihΨ| A ⊗ 1I] = Σjk λj^{1/2} λk^{1/2} hφj , Aφk i δjk , (3.1)
hence trK (ρ) = Σj λj |φj ihφj | is mixed iff Ψ is entangled. The most extreme case
arises if H = K = C^d and trK (ρ) is maximally mixed, i.e. trK (ρ) = 1I/d. We get for Ψ

Ψ = (1/√d) Σ_{j=1}^d φj ⊗ ψj . (3.2)
Let us come back to the general case now and consider an arbitrary ρ ∈ S(H⊗H).
Using maximally entangled states, we can introduce another separability criterion
in terms of the maximally entangled fraction (cf. [24])
If ρ is separable the reduction criterion (2.26) implies hΨ, [tr1 (ρ) ⊗ 1I − ρ]Ψi ≥ 0 for
any maximally entangled state. Since the partial trace of |ΨihΨ| is d^{−1} 1I we hence
get F(ρ) ≤ 1/d. This condition is not very sharp, however. Using the ppt criterion
it can be shown that ρ = λ|Φ1 ihΦ1 | + (1 − λ)|00ih00| (with the Bell state Φ1 ) is
entangled for all 0 < λ ≤ 1 but a straightforward calculation shows that F(ρ) ≤ 1/2
holds for λ ≤ 1/2.
Finally, we have to mention here a very useful parameterization of the set of
pure states on H ⊗ H in terms of maximally entangled states: If Ψ is an arbitrary
but fixed maximally entangled state, each φ ∈ H ⊗ H admits (uniquely determined)
operators X1 , X2 such that
In our case (N = 2) there are only two permutations: the identity 1I and the flip
F (ψ ⊗ φ) = φ ⊗ ψ. Hence ρ = a1I + bF with appropriate coefficients a, b. Since ρ is
a density matrix, a and b are not independent. To get a transparent way to express
these constraints, it is reasonable to consider the eigenprojections P± of F rather
than 1I and F ; i.e. F P± ψ = ±P± ψ and P± = (1I ± F )/2. The P± are the projections
onto the subspaces H±^{⊗2} ⊂ H ⊗ H of symmetric respectively antisymmetric tensor
ρ = (λ/d+ ) P+ + ((1 − λ)/d− ) P− , λ ∈ [0, 1]. (3.8)
On the other hand it is obvious that each state of this form is U ⊗ U invariant,
hence a Werner state.
If ρ is given, it is very easy to calculate the parameter λ from the expectation
value of ρ and the flip tr(ρF ) = 2λ − 1 ∈ [−1, 1]. Therefore we can write for an
arbitrary state σ ∈ S(H ⊗ H)
PUU (σ) = ((tr(σF ) + 1)/(2d+ )) P+ + ((1 − tr(σF ))/(2d− )) P− , (3.9)
and this defines a projection from the full state space to the set of Werner states
which is called the twirl operation. In many cases it is quite useful that it can be
written alternatively as a group average of the form
PUU (σ) = ∫_{U(d)} (U ⊗ U )σ(U∗ ⊗ U∗) dU, (3.10)
where dU denotes the normalized, left invariant Haar measure on U(d). To check
this identity note first that its right hand side is indeed U ⊗ U invariant, due to the
invariance of the volume element dU . Hence we have to check only that the trace
of F times the integral coincides with tr(F σ):
" Z # Z
tr F (U ⊗ U )σ(U ∗ ⊗ U ∗ )dU = tr [F (U ⊗ U )σ(U ∗ ⊗ U ∗ )] dU (3.11)
U(d) U(d)
Z
= tr(F σ) dU = tr(F σ), (3.12)
U(d)
where we have used the fact that F commutes with U ⊗ U and the normalization of
dU . We can apply PUU obviously to arbitrary operators A ∈ B(H ⊗ H) and, as an
integral over unitarily implemented operations, we get a channel. Substituting U →
U ∗ in (3.10) and cycling the trace tr(APUU (σ)) we find tr(PUU (A)ρ) = tr(APUU (ρ)),
hence PUU has the same form in the Heisenberg and the Schrödinger picture (i.e.
P∗UU = PUU ).
If σ ∈ S(H ⊗ H) is a separable state the integrand of PUU (σ) in Equation (3.10)
consists entirely of separable states, hence PUU (σ) is separable. Since each Werner
state ρ is the twirl of itself, we see that ρ is separable iff it is the twirl PUU (σ) of
a separable state σ ∈ S(H ⊗ H). To determine the set of separable Werner states
we therefore have to calculate only the set of all tr(F σ) ∈ [−1, 1] with separable
σ. Since each such σ admits a convex decomposition into pure product states it is
sufficient to look at
hψ ⊗ φ, F ψ ⊗ φi = |hψ, φi|2 (3.13)
which ranges from 0 to 1. Hence ρ from Equation (3.8) is separable iff 1/2 ≤ λ ≤ 1
and entangled otherwise (due to λ = (tr(F ρ) + 1)/2). If H = C² holds, each Werner
state is Bell diagonal and we recover the result from Subsection 3.1.1 (separable iff
the highest eigenvalue is less than or equal to 1/2).
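The separability boundary at λ = 1/2 can be confirmed numerically (an added sketch for d = 2, where ppt is equivalent to separability):

```python
import numpy as np

# Werner states (3.8) for d = 2: flip F, d+- = d(d +- 1)/2.
d = 2
F = np.eye(d * d).reshape(d, d, d, d).transpose(0, 1, 3, 2).reshape(d * d, d * d)
Pp, Pm = (np.eye(d * d) + F) / 2, (np.eye(d * d) - F) / 2
dp, dm = d * (d + 1) // 2, d * (d - 1) // 2

def werner(lam):
    return lam / dp * Pp + (1 - lam) / dm * Pm

for lam in (0.2, 0.8):
    rho = werner(lam)
    assert np.isclose(np.trace(rho @ F), 2 * lam - 1)   # tr(rho F) = 2l - 1
    # ppt (here: separability) holds exactly for lam >= 1/2:
    pt = rho.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)
    assert (np.linalg.eigvalsh(pt).min() >= -1e-12) == (lam >= 0.5)
```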
3.1.3 Isotropic states
To derive a second class of states consider the partial transpose (Id ⊗Θ)ρ (with
respect to a distinguished basis |ji ∈ H, j = 1, . . . , d) of a Werner state ρ. Since ρ is,
by definition, U ⊗ U invariant, it is easy to see that (Id ⊗Θ)ρ is U ⊗ Ū invariant, where
Ū denotes componentwise complex conjugation in the basis |ji (we just have to use
that U∗ = Ū^T holds). Each state τ with this kind of symmetry is called an isotropic
state [183], and our previous discussion shows that τ is a linear combination of 1I
and the partial transpose of the flip, which is the rank one operator
F̃ = (Id ⊗Θ)F = |ΨihΨ| = Σ_{jk=1}^d |jjihkk|, (3.14)
where Ψ = Σj |jji is, up to normalization, a maximally entangled state. Hence each
isotropic τ can be written as

τ = (1/d) (λ 1I/d + (1 − λ)F̃ ), λ ∈ [0, d²/(d² − 1)], (3.15)
where the bounds on λ follow from normalization and positivity. As above we can
determine the parameter λ from the expectation value

tr(F̃ τ ) = ((1 − d²)/d) λ + d, (3.16)

which ranges from 0 to d, and this again leads to a twirl operation: For an arbitrary
state σ ∈ S(H ⊗ H) we can define
PUŪ (σ) = (1/(d(1 − d²))) ([tr(F̃ σ) − d] 1I + [1 − d tr(F̃ σ)] F̃ ), (3.17)
and as for Werner states PUŪ can be rewritten in terms of a group average
PUŪ (σ) = ∫_{U(d)} (U ⊗ Ū )σ(U∗ ⊗ Ū∗) dU. (3.18)
Now we can proceed in the same way as above: PUŪ is a channel with P∗UŪ = PUŪ ,
its fixed points PUŪ (τ ) = τ are exactly the isotropic states, and the image of the set
of separable states under PUŪ coincides with the set of separable isotropic states.
To determine the latter we have to consider the expectation values (cf. Equation
(3.13))

hψ ⊗ φ, F̃ (ψ ⊗ φ)i = |Σ_{j=1}^d ψj φj |² = |hψ, φ̄i|² ∈ [0, 1]. (3.19)
This implies that τ is separable iff
d(d − 1)/(d² − 1) ≤ λ ≤ d²/(d² − 1) (3.20)
holds, and entangled otherwise. For λ = 0 we recover the maximally entangled state.
For d = 2 we again recover the special case of Bell diagonal states encountered
already in the last subsection.
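The parameterization (3.15) and the expectation value (3.16) can be checked directly (an added sketch for d = 2):

```python
import numpy as np

# Isotropic states (3.15): tau = (1/d)(lam 1/d + (1-lam) Ft) with Ft = |Psi><Psi|.
d = 2
Psi = np.zeros(d * d)
Psi[::d + 1] = 1.0                       # unnormalized Psi = sum_j |jj>
Ft = np.outer(Psi, Psi)                  # partial transpose of the flip

def iso(lam):
    return (lam * np.eye(d * d) / d + (1 - lam) * Ft) / d

for lam in (0.0, 1.0, d ** 2 / (d ** 2 - 1)):
    tau = iso(lam)
    assert np.isclose(np.trace(tau), 1.0)                          # normalized
    assert np.isclose(np.trace(Ft @ tau), (1 - d ** 2) / d * lam + d)  # (3.16)
# lam = 0 (the maximally entangled state) lies outside the window (3.20):
assert not (d * (d - 1) / (d ** 2 - 1) <= 0.0 <= d ** 2 / (d ** 2 - 1))
```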
3.1.4 OO-invariant states
Let us combine now Werner states with isotropic states, i.e. we look for density
matrices ρ which can be written as ρ = a1I + bF + cF̃ , or, if we introduce the three
mutually orthogonal projection operators

p0 = (1/d)F̃ , p1 = (1/2)(1I − F ), p2 = (1/2)(1I + F ) − (1/d)F̃ (3.21)
as a convex linear combination of tr(pj )−1 pj , j = 0, 1, 2:
ρ = (1 − λ1 − λ2 ) p0 + λ1 p1 /tr(p1 ) + λ2 p2 /tr(p2 ), λ1 , λ2 ≥ 0, λ1 + λ2 ≤ 1 (3.22)
Figure 3.1: State space of OO-invariant states (upper triangle) and its partial trans-
pose (lower triangle) for d = 3. The special cases of isotropic and Werner states are
drawn as thin lines.
which we can express alternatively in terms of the expectation values tr(F ρ), tr(F̃ ρ)
by

POO (ρ) = (tr(F̃ ρ)/d) p0 + ((1 − tr(F ρ))/2) p1 /tr(p1 )
+ ((1 + tr(F ρ))/2 − tr(F̃ ρ)/d) p2 /tr(p2 ). (3.24)
−1 ≤ tr(F ρ) ≤ 1, 0 ≤ tr(F̃ ρ) ≤ d, tr(F ρ) ≥ (2 tr(F̃ ρ)/d) − 1. (3.25)
d
For d = 3 this is the upper triangle in Figure 3.1.
The values in the lower (dotted) triangle belong to partial transpositions of OO-invariant states. The intersection of both, i.e. the gray shaded square Q = [0, 1] × [0, 1], therefore represents the set of OO-invariant ppt states, and at the same time the set of separable states, since each OO-invariant ppt state is separable. To see the latter note that separable OO-invariant states form a convex subset of Q. Hence, we only have to show that the corners of Q are separable. To do this note that 1. P_OO(ρ) is separable whenever ρ is and 2. that tr(F P_OO(ρ)) = tr(Fρ) and tr(F̃ P_OO(ρ)) = tr(F̃ρ) holds (cf. Equation (3.12)). We can consider pure product states |φ ⊗ ψ⟩⟨φ ⊗ ψ| for ρ and get (|⟨φ, ψ⟩|², |⟨φ, ψ̄⟩|²) for the tuple (tr(Fρ), tr(F̃ρ)). Now the point (1, 1) in Q is obtained if ψ = φ is real, the point (0, 0) is obtained for real and orthogonal φ, ψ, and the point (1, 0) belongs to the case ψ = φ with ⟨φ, φ̄⟩ = 0. Symmetrically we get (0, 1) with the same φ and ψ = φ̄.
3.1.5 PPT states
We have seen in Theorem 2.4.3 that separable states and ppt states coincide in 2 × 2 and 2 × 3 dimensions. Another class of examples with this property are the OO-invariant states just studied. Nevertheless, separability and a positive partial transpose are not equivalent. An easy way to produce states which are entangled and ppt is given in terms of unextendible product bases [22]. An orthonormal family φ_j ∈ H₁ ⊗ H₂, j = 1, . . . , N < d₁d₂ (with d_k = dim H_k) is called an unextendible product basis¹ (UPB) iff 1. all φ_j are product vectors and 2. there is no product vector orthogonal to all φ_j. Let us denote the projector onto the span of all φ_j by E, its orthocomplement by E⊥, i.e. E⊥ = 1I − E, and define the state ρ = (d₁d₂ − N)⁻¹E⊥.
It is entangled because there is by construction no product vector in the support of
ρ, and it is ppt. The latter can be seen as follows: The projector E is a sum of the
one dimensional projectors |φj ihφj |, j = 1, . . . , N . Since all φj are product vectors
the partial transposes of the |φj ihφj | are of the form |φej ihφej |, with another UPB
φej , j = 1, . . . , N and the partial transpose (1I ⊗ Θ)E of E is the sum of the |φej ihφej |.
Hence (1I ⊗ Θ)E ⊥ = 1I − (1I ⊗ Θ)E is a projector and therefore positive.
To construct entangled ppt states we have to find UPBs. The following two examples are taken from [22]. Consider first the five vectors
$$\Psi_j = \phi_j \otimes \phi_{2j \bmod 5}, \qquad j = 0, \ldots, 4, \qquad (3.27)$$
which form a UPB in the Hilbert space H ⊗ H, dim H = 3 (cf. [22]). A second example,
again in a 3 × 3 dimensional Hilbert space, are the following five vectors (called “Tiles” in [22]):
$$\frac{1}{\sqrt 2}\,|0\rangle \otimes \bigl(|0\rangle - |1\rangle\bigr), \quad \frac{1}{\sqrt 2}\,|2\rangle \otimes \bigl(|1\rangle - |2\rangle\bigr), \quad \frac{1}{\sqrt 2}\bigl(|0\rangle - |1\rangle\bigr) \otimes |2\rangle,$$
$$\frac{1}{\sqrt 2}\bigl(|1\rangle - |2\rangle\bigr) \otimes |0\rangle, \quad \frac{1}{3}\bigl(|0\rangle + |1\rangle + |2\rangle\bigr) \otimes \bigl(|0\rangle + |1\rangle + |2\rangle\bigr), \qquad (3.28)$$
where |ki, k = 0, 1, 2 denotes the standard basis in H = C3 .
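The construction above can be checked numerically for the “Tiles” basis (3.28): the five vectors are orthonormal, and the state ρ = (d₁d₂ − N)⁻¹E⊥ = E⊥/4 has a positive partial transpose. A minimal sketch:

```python
import numpy as np

k0, k1, k2 = np.eye(3)
s = 1 / np.sqrt(2)

tiles = [
    s * np.kron(k0, k0 - k1),
    s * np.kron(k2, k1 - k2),
    s * np.kron(k0 - k1, k2),
    s * np.kron(k1 - k2, k0),
    np.kron(k0 + k1 + k2, k0 + k1 + k2) / 3,
]

E = sum(np.outer(v, v) for v in tiles)     # projector onto the span of the UPB
rho = (np.eye(9) - E) / (9 - 5)            # state on the orthocomplement

# partial transpose on the second tensor factor
pt = rho.reshape(3, 3, 3, 3).transpose(0, 3, 2, 1).reshape(9, 9)
print(np.linalg.eigvalsh(pt).min())        # >= 0: rho is ppt
```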
3.1.6 Multipartite states
In many applications of quantum information rather big systems, consisting of a
large number of subsystems, occur (e.g. a quantum register of a quantum computer)
and it is necessary to study the corresponding correlation and entanglement prop-
erties. Since this is a fairly difficult task, there is not much known about – much less
1 This name is somewhat misleading because the φ_j are not a basis of H₁ ⊗ H₂.
with N orthonormal bases φ_1^{(k)}, . . . , φ_d^{(k)} of H^{(k)}, k = 1, . . . , N. To get examples for
such states in the tri-partite case, note first that any partial trace of |ΨihΨ| with Ψ
from Equation (3.29) has separable eigenvectors. Hence, each purification (Corollary
2.2.2) of an entangled, two-partite, mixed state with inseparable eigenvectors (e.g.
a Bell diagonal state) does not admit a Schmidt decomposition. This implies on
the one hand that there are interesting new properties to be discovered, but on
the other we see that many techniques developed for bipartite pure states can be
generalized in a straightforward way only for states which are Schmidt decomposable
in the sense of Equation (3.29). The most well known representative of this class
for a tripartite qubit system is the GHZ state [101]
$$\Psi = \frac{1}{\sqrt 2}\bigl(|000\rangle + |111\rangle\bigr), \qquad (3.30)$$
which has the special property that contradictions between local hidden variable
theories and quantum mechanics occur even for non-statistical predictions (as op-
posed to maximally entangled states of bipartite systems; [101, 163, 162]).
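The observation above that partial traces of Schmidt decomposable states have separable eigenvectors can be illustrated with the GHZ state (3.30): tracing out the third qubit leaves an explicitly separable mixture. A numerical sketch:

```python
import numpy as np

ghz = np.zeros(8)
ghz[0] = ghz[7] = 1 / np.sqrt(2)           # (|000> + |111>)/sqrt(2)
rho = np.outer(ghz, ghz)

# partial trace over the third qubit: rho_AB[ij,kl] = sum_m rho[ijm,klm]
rho6 = rho.reshape(2, 2, 2, 2, 2, 2)
rho_ab = np.einsum('ijmklm->ijkl', rho6).reshape(4, 4)

print(np.round(rho_ab, 3))
# rho_AB = (|00><00| + |11><11|)/2: a separable mixture of product vectors
```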
A second new aspect arising in the discussion of multiparty entanglement is the
fact that several different notions of separability occur. A state ρ of an N -partite
system B(H1 ) ⊗ · · · ⊗ B(HN ) is called N -separable if
$$\rho = \sum_J \lambda_J\, \rho_{j_1} \otimes \cdots \otimes \rho_{j_N} \qquad (3.31)$$
with states ρ_{j_k} ∈ B*(H_k) and multi-indices J = (j₁, . . . , j_N). Alternatively, however, we can decompose B(H₁) ⊗ · · · ⊗ B(H_N) into two subsystems (or even into M subsystems if M < N) and call ρ biseparable if it is separable with respect to this
decomposition. It is obvious that N -separability implies biseparability with respect
to all possible decompositions. The converse is – not very surprisingly – not true.
One way to construct a corresponding counterexample is to use an unextendible product basis (cf. Subsection 3.1.5). In [22] it is shown that the tripartite qubit state
complementary to the UPB
$$|0,1,+\rangle, \quad |1,+,0\rangle, \quad |+,0,1\rangle, \quad |-,-,-\rangle \qquad \text{with } |\pm\rangle = \frac{1}{\sqrt 2}\bigl(|0\rangle \pm |1\rangle\bigr) \qquad (3.32)$$
structure of the set of all U ⊗N invariant states can be derived from representation
theory of the symmetric group (which can be tedious for large N !). For N = 3
this program is carried out in [81] and it turns out that the corresponding set of
invariant states is a five dimensional (real) manifold. We skip the details here and
refer to [81] instead.
3.2 Channels
In Section 2.3 we have introduced channels as very general objects transforming
arbitrary types of information (i.e. classical, quantum and mixtures of them) into
one another. In the following we will consider some of the most important special
cases.
3.2.1 Quantum channels
Many tasks of quantum information theory require the transmission of quantum
information over long distances, using devices like optical fibers or storing quantum
information in some sort of memory. Both situations can be described by a channel
or quantum operation T : B(H) → B(H), where T ∗ (ρ) is the quantum information
which will be received when ρ was sent, or alternatively: which will be read off
the quantum memory when ρ was written. Ideally we would prefer those channels
which do not affect the information at all, i.e. T = 1I, or, as the next best choice,
a T whose action can be undone by a physical device, i.e. T should be invertible and T⁻¹ should again be a channel. The Stinespring Theorem (Theorem 2.3.2) immediately
shows that this implies T ∗ ρ = U ρU ∗ with a unitary U ; in other words the systems
carrying the information do not interact with the environment. We will call such
a kind of channel an ideal channel. In real situations however interaction with the
environment, i.e. additional, unobservable degrees of freedom, can not be avoided.
The general structure of such a noisy channel is given by
$$T^*(\rho) = \operatorname{tr}_{\mathcal K}\bigl[U(\rho \otimes \rho_0)U^*\bigr] \qquad (3.33)$$
Note that there are in general many ways to express a channel this way, e.g. if T is an ideal channel ρ ↦ T*ρ = UρU* we can rewrite it with an arbitrary unitary U₀ : K → K by T*ρ = tr₂[(U ⊗ U₀)(ρ ⊗ ρ₀)(U* ⊗ U₀*)]. This is the weakness of the ancilla
form compared to the Stinespring representation of Theorem 2.3.2. Nevertheless
Corollary 3.2.1 shows that each channel which is not an ideal channel is noisy in
the described way.
The most prominent example for a noisy channel is the depolarizing channel for
d-level systems (i.e. H = Cd )
$$S(\mathcal H) \ni \rho \mapsto \vartheta\rho + (1 - \vartheta)\frac{\mathbb 1}{d} \in S(\mathcal H), \qquad 0 \le \vartheta \le 1, \qquad (3.37)$$
or in the Heisenberg picture
$$B(\mathcal H) \ni A \mapsto \vartheta A + (1 - \vartheta)\frac{\operatorname{tr}(A)}{d}\,\mathbb 1 \in B(\mathcal H). \qquad (3.38)$$
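The duality between the two pictures (3.37) and (3.38), tr(T*(ρ)A) = tr(ρT(A)), can be verified directly; a minimal sketch for a random state and observable:

```python
import numpy as np

def depolarize_schr(rho, theta):
    """Schroedinger picture (3.37), extended linearly via tr(rho)."""
    d = rho.shape[0]
    return theta * rho + (1 - theta) * np.trace(rho) * np.eye(d) / d

def depolarize_heis(A, theta):
    """Heisenberg picture (3.38)."""
    d = A.shape[0]
    return theta * A + (1 - theta) * np.trace(A) * np.eye(d) / d

rng = np.random.default_rng(1)
d, theta = 3, 0.7
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = M @ M.conj().T
rho /= np.trace(rho)                      # random density matrix
A = rng.normal(size=(d, d))               # arbitrary observable

lhs = np.trace(depolarize_schr(rho, theta) @ A)
rhs = np.trace(rho @ depolarize_heis(A, theta))
print(np.isclose(lhs, rhs))               # tr(T*(rho) A) = tr(rho T(A))
```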
A Stinespring dilation of T (not the minimal one – this can be checked by counting
dimensions) is given by K = H ⊗ H ⊕ C and V : H → H ⊗ K = H^{⊗3} ⊕ H with
$$|j\rangle \mapsto V|j\rangle = \left[\sqrt{\frac{1-\vartheta}{d}}\;\sum_{k=1}^d |k\rangle \otimes |k\rangle \otimes |j\rangle\right] \oplus \Bigl[\sqrt{\vartheta}\,|j\rangle\Bigr], \qquad (3.39)$$
$$(\operatorname{Id} \otimes X_1)|\psi\rangle\langle\psi| = \mathbb 1, \qquad (\operatorname{Id} \otimes X_2)|\psi\rangle\langle\psi| = F, \qquad (\operatorname{Id} \otimes X_3)|\psi\rangle\langle\psi| = \tilde F. \qquad (3.43)$$
Using Equation (3.21) we can determine therefore the channels which belong to the
three extremal OO-invariant states (the corners of the upper triangle in Figure 3.1):
$$T_0(A) = A, \qquad T_1(A) = \frac{\operatorname{tr}(A)\mathbb 1 - A^T}{d-1}, \qquad (3.44)$$
$$T_2(A) = \frac{2}{d(d+1)-2}\left[\frac{d}{2}\bigl(\operatorname{tr}(A)\mathbb 1 + A^T\bigr) - A\right], \qquad (3.45)$$
$$T(A) = \frac{\vartheta}{d+1}\bigl[\operatorname{tr}(A)\mathbb 1 + A^T\bigr] + \frac{1-\vartheta}{d-1}\bigl[\operatorname{tr}(A)\mathbb 1 - A^T\bigr], \qquad \vartheta \in [0,1]; \qquad (3.46)$$
cf. Equation (3.8).
Let us come back now to the general case. We will state here the covariant
version of the Stinespring theorem (see [136] for a proof). The basic idea is that all
covariant channels are parameterized by representations on the dilation space.
Theorem 3.2.2 Let G be a group with finite dimensional unitary representations
πj : G → U(Hj ) and T : B(H1 ) → B(H2 ) a π1 , π2 - covariant channel.
holds. Hence the family (T_{xy})_{x∈X} is, for each y, a probability distribution on X, and T_{xy} is therefore the probability to get the information x ∈ X at the output side of the channel if y ∈ Y was sent. Each classical channel is uniquely determined by its matrix of transition probabilities. For X = Y we see that the information is transmitted without error iff T_{xy} = δ_{xy}, i.e. T is an ideal channel if T = Id holds and noisy otherwise.
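A classical channel is thus nothing but a column-stochastic matrix acting on probability vectors; a minimal sketch (the numerical values are an arbitrary choice):

```python
import numpy as np

# transition matrix T[x, y] = probability to receive x when y was sent
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])
assert np.allclose(T.sum(axis=0), 1.0)    # each column is a probability distribution

p_in = np.array([0.5, 0.5])               # distribution of the input y
p_out = T @ p_in                          # distribution of the output x
print(p_out)                              # [0.55 0.45]
```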
3.2.4 Observables and preparations
Let us consider now a channel which transforms quantum information B(H) into
classical information C(X). Since positivity and complete positivity are again equiv-
alent, we just have to look at a positive and unital map E : C(X) → B(H). With
the canonical basis |xihx|, x ∈ X of C(X)P we get a family Ex = E(|xihx|), x ∈ X
of positive operators Ex ∈ B(H) with x∈X Ex = 1I. Hence the Ex form a POV
measure, i.e. an observable. If on the other hand a POV measure Ex ∈ B(H), x ∈ X
is given we can define a quantum to classical channel E : C(X) → B(H) by
X
E(f ) = f (x)Ex . (3.48)
x∈X
This shows that the observable Ex , x ∈ X and the channel E can be identified and
we say E is the observable.
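As a small illustration of Equation (3.48), the following sketch builds a two-outcome POVM on a qubit (an unsharp z-measurement; the value η = 0.8 and the classical function f are arbitrary choices) and computes the outcome distribution tr(E_x ρ):

```python
import numpy as np

# a two-outcome POVM on a qubit: an unsharp z-measurement
eta = 0.8
E0 = np.array([[(1 + eta) / 2, 0], [0, (1 - eta) / 2]])
E1 = np.eye(2) - E0
assert np.allclose(E0 + E1, np.eye(2))    # normalization: sum_x E_x = 1

def E(f, povm):
    """The channel C(X) -> B(H), f |-> sum_x f(x) E_x of Equation (3.48)."""
    return sum(fx * Ex for fx, Ex in zip(f, povm))

rho = np.array([[0.75, 0.0], [0.0, 0.25]])
probs = [np.trace(rho @ Ex).real for Ex in (E0, E1)]  # outcome distribution

f = [1.0, -1.0]                           # a classical observable on X = {0, 1}
mean = np.trace(rho @ E(f, (E0, E1))).real
print(probs, mean)
```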
Keeping this interpretation in mind it is possible to have a short look at con-
tinuous observables without the need of abstract measure theory: We only have to
define the classical algebra C(X) for a set X which is not finite or discrete. To this end assume that X is a locally compact space (e.g. an open or closed subset of R^d). We choose for C(X) the space of continuous, complex valued functions vanishing at infinity, i.e. for each ε > 0 we have |f(x)| < ε provided x lies outside an appropriate compact set. C(X) can be equipped with the sup-norm and becomes an Abelian C*-algebra
(cf. [35]). To interpret it as an operator algebra as assumed in Subsection 2.1.1
we have to identify f ∈ C(X) with the corresponding multiplication operator on
L2 (X, µ), where µ is an appropriate measure on X (e.g. the Lebesgue measure for
X ⊂ Rd ). An observable taking arbitrary values in X can now be defined as a
positive map E : C(X) → B(H). The probability to get a result in the open subset
ω ⊂ X during an E measurement on systems in the state ρ is
where supp denotes the support of f. Applying a little bit of measure theory (basically the Riesz-Markov theorem [186, Thm. IV.18] together with dominated convergence [186, Thm. I.16] and linearity of the trace) it is easy to see that we can express K_ρ(ω) for each ρ by a positive operator E(ω) such that
$$K_\rho(\omega) = \operatorname{tr}\bigl(E(\omega)\rho\bigr) \qquad (3.50)$$
holds. The family of operators E(ω) we get in this way has typical properties of a
measure (e.g. some sort of σ-additivity). Hence we have encountered the continuous
version of a POV measure. We do not want to discuss the technical details here
(cf. [115, Sect. 2.1] instead). For later use we will only remark here that we can
reconstruct the channel f ↦ E(f) from the measure ω ↦ E(ω) in terms of the integrals
$$E(f) = \int_X f(x)\,E(dx). \qquad (3.51)$$
hence we can identify T with the family Tx , x ∈ X. Finally we can consider the
second marginal of T:
$$B(\mathcal H) \ni A \mapsto T(A \otimes \mathbb 1) = \sum_{x \in X} T_x(A) \in B(\mathcal K). \qquad (3.56)$$
Hence we get the final state tr(E_x ρ)^{-1} E_x ρE_x if we measure the value x ∈ X on systems initially in the state ρ – this is well known from quantum mechanics.
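For a projective measurement this post-measurement rule is easy to trace through numerically; a sketch for a z-measurement on the state |+⟩⟨+|:

```python
import numpy as np

# a projective (Lueders-type) instrument on a qubit
P0 = np.array([[1.0, 0.0], [0.0, 0.0]])
P1 = np.eye(2) - P0
rho = np.array([[0.5, 0.5], [0.5, 0.5]])   # the pure state |+><+|

for x, P in enumerate((P0, P1)):
    p = np.trace(P @ rho).real             # probability of outcome x
    post = P @ rho @ P / p                 # conditional state E_x rho E_x / tr(E_x rho)
    print(x, p, np.round(post, 3))
```

Both outcomes occur with probability 1/2 and leave the system in |0⟩⟨0| respectively |1⟩⟨1|.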
Let us change now the role of B(H) ⊗ C(X) and B(K); in other words consider
a channel T : B(K) → B(H) ⊗ C(X) with hybrid input and quantum output. It
describes a device which changes the state of a system depending on additional
classical information. As for an instrument, T decomposes into a family of (unital!) channels T_x : B(K) → B(H) such that we get T*(ρ ⊗ p) = Σ_x p_x T_x*(ρ) in the Schrödinger picture. Physically T describes a parameter dependent operation: depending on the classical information x ∈ X the quantum information ρ ∈ B*(K) is transformed by the operation T_x (cf. Figure 3.4).
Finally we can consider a channel T : B(H) ⊗ C(X) → B(K) ⊗ C(Y ) with hybrid
input and output to get a parameter dependent instrument (cf. figure 3.5): Similarly
to the discussion in the last paragraph we can define a family of instruments T y :
Figure 3.6: One way LOCC operation; cf Figure 3.7 for an explanation.
B(H) ⊗ C(X) → B(K), y ∈ Y, by the equation T*(ρ ⊗ p) = Σ_y p_y T_y*(ρ). Physically
T describes the following device: It receives the classical information y ∈ Y and a
quantum system in the state ρ ∈ B ∗ (K) as input. Depending on y a measurement
with the instrument T_y is performed, which in turn produces the measuring value x ∈ X and leaves the quantum system in the state (up to normalization) T*_{y,x}(ρ); with T_{y,x} given as in Equation (3.53) by T_{y,x}(A) = T_y(A ⊗ |x⟩⟨x|).
3.2.6 LOCC and separable channels
Let us now consider channels acting on finite dimensional bipartite systems: T : B(H₁ ⊗ H₂) → B(K₁ ⊗ K₂). In this case we can ask the question whether a channel
preserves separability. Simple examples are local operations (LO), i.e. T = T A ⊗ T B
with two channels T A,B : B(Hj ) → B(Kj ). Physically we think of such a T in terms
of two physicists Alice and Bob, both performing operations on their own particle but without any transmission of information, neither classical nor quantum. The next, more difficult step are local operations with one way classical communication (one way LOCC). This means Alice operates on her system with an instrument, communicates the classical measuring result j ∈ X = {1, . . . , N} to Bob, and he selects an operation depending on these data. We can write such a channel as a composition T = (T^A ⊗ Id)(Id ⊗ T^B) of the instrument T^A : B(H₁) ⊗ C(X) → B(K₁) and the parameter dependent operation T^B : B(H₂) → C(X) ⊗ B(K₂) (cf. Figure 3.6):
$$B(\mathcal H_1 \otimes \mathcal H_2) \xrightarrow{\ \operatorname{Id} \otimes T^B\ } B(\mathcal H_1) \otimes C(X) \otimes B(\mathcal K_2) \xrightarrow{\ T^A \otimes \operatorname{Id}\ } B(\mathcal K_1 \otimes \mathcal K_2). \qquad (3.58)$$
Figure 3.7: LOCC operation. The upper and lower curly arrows represent Alice’s
respectively Bob’s quantum system, while the straight arrows in the middle stand
for the classical information Alice and Bob exchange. The boxes symbolize the
channels applied by Alice and Bob.
It is easy to see that a separable T maps separable states to separable states (up
to normalization) and that each LOCC channel is separable (cf. [21]). The converse
however is (somewhat surprisingly) not true: there are separable channels which are
not LOCC, see [21] for a concrete example.
3.3 Quantum mechanics in phase space
Up to now we have considered only finite dimensional systems and even in this
extremely idealized situation it is not easy to get nontrivial results. At first sight the discussion of continuous quantum systems therefore seems hopeless. If we
restrict our attention however to small classes of states and channels, with suffi-
ciently simple structure, many problems become tractable. Phase space quantum
mechanics, which will be reviewed in this Section (see Chapter 5 of [111] for details),
provides a very powerful tool in this context.
Before we start let us add some remarks to the discussion of Chapter 2 which we
have restricted to finite dimensional Hilbert spaces. Basically most of the material
considered there can be generalized in a straightforward way, as long as topological
issues like continuity and convergence arguments are treated carefully enough. There
are of course some caveats (cf. in particular Footnote 2 of Chapter 2), however they
do not lead to problems in the framework we are going to discuss and can therefore
be ignored.
3.3.1 Weyl operators and the CCR
The kinematical structure of a quantum system with d degrees of freedom is
usually described by a separable Hilbert space H and 2d selfadjoint operators
Q1 , . . . , Qd , P1 , . . . , Pd satisfying the canonical commutation relations [Qj , Qk ] = 0,
[Pj , Pk ] = 0, [Qj , Pk ] = iδjk 1I. The latter can be rewritten in a more compact form
as
R2j−1 = Qj , R2j = Pj , j = 1, . . . , d, [Rj , Rk ] = −iσjk . (3.60)
contrast to popular belief, not true: There are representations of the CCR which are unitarily
inequivalent to the Schrödinger representation; cf. [186] Section VIII.5 for particular examples.
Hence uniqueness can only be achieved on the level of Weyl operators – which is one major reason
to study them.
holds. By differentiation it is easy to check that ρ has indeed mean m and covariance
matrix α.
The most prominent examples for Gaussian states are the ground state ρ 0 of a
system of d harmonic oscillators (where the mean is 0 and α is given by the corre-
sponding classical Hamiltonian) and its phase space translates ρ m = W (m)ρW (−m)
(with mean m and the same α as ρ0 ), which are known from quantum optics as
coherent states. ρ0 and ρm are pure states and it can be shown that a Gaussian
state is pure iff (σ⁻¹α)² = −1I holds (see [111], Ch. 5). Examples for mixed Gaussians
are temperature states of harmonic oscillators. In one degree of freedom this is
$$\rho_N = \frac{1}{N+1}\sum_{n=0}^{\infty}\left(\frac{N}{N+1}\right)^n |n\rangle\langle n|, \qquad (3.67)$$
where |n⟩, n = 0, 1, . . . denotes the number basis and N is the mean photon number. The characteristic function of ρ_N is
$$\operatorname{tr}\bigl[W(x)\rho_N\bigr] = \exp\left[-\frac{1}{2}\left(N + \frac{1}{2}\right)|x|^2\right], \qquad (3.68)$$
holds.
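The number distribution of the thermal state (3.67) is geometric; a truncated numerical sketch confirms normalization and the mean photon number N:

```python
import numpy as np

N = 2.0                                    # mean photon number
cutoff = 400                               # Fock-space truncation
n = np.arange(cutoff)
p = (N / (N + 1)) ** n / (N + 1)           # eigenvalues of rho_N in the number basis

print(p.sum())                             # ~1: geometric distribution
print((n * p).sum())                       # ~N: mean photon number
```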
This theorem is somewhat similar to Theorem 2.4.1: It provides a useful criterion
as long as abstract considerations are concerned, but not for explicit calculations.
In contrast to finite dimensional systems, however, separability of Gaussian states
can be decided by an operational criterion in terms of nonlinear maps between
matrices [93]. To state it we have to introduce some terminology first. The key tool
is a sequence of (2n + 2m) × (2n + 2m) matrices α_N, N ∈ ℕ, written in block matrix notation as
$$\alpha_N = \begin{bmatrix} A_N & C_N \\ C_N^T & B_N \end{bmatrix}. \qquad (3.70)$$
Given α0 the other αN are recursively defined by:
if α_N − iσ ≥ 0, and α_{N+1} = 0 otherwise. Here we have set X_N = C_N(B_N − iσ_B)⁻¹C_N^T, and the inverse denotes the pseudo inverse⁵ if B_N − iσ_B is not invertible. Now we
can state the following theorem (see [93] for a proof):
Theorem 3.3.2 Consider a Gaussian state ρ of a bipartite system with correlation
matrix α0 and the sequence αN , N ∈ N just defined.
The interesting question is now whether the ppt criterion is (for a given number
of degrees of freedom) equivalent to separability or not. The following theorem which
was proved in [197] for 1 × 1 systems and in [234] in the 1 × d case gives a complete
answer.
Theorem 3.3.4 A Gaussian state of a quantum system with 1 × d degrees of free-
dom (i.e. dim XA = 2 and dim XB = 2d) is separable iff it is ppt; in other words iff
the condition of Proposition 3.3.3 holds.
For other kinds of systems the ppt criterion may fail which means that there
are entangled Gaussian states which are ppt. A systematic way to construct such
states can be found in [234]. Roughly speaking, it is based on the idea to go to the
boundary of the set of ppt covariance matrices, i.e. α has to satisfy Equation (3.65)
and (3.72) and it has to be a minimal matrix with this property. Using this method
explicit examples for ppt and entangled Gaussians are constructed for 2 × 2 degrees
of freedom (cf. [234] for details).
5 A⁻¹ is the pseudo inverse of a matrix A if AA⁻¹ = A⁻¹A is the projector onto the range of A.
for k > 1. If the environment is initially in a thermal state ρ_{N_e} (cf. Equation (3.67)) this leads to
$$T\bigl[W(x)\bigr] = \exp\left[-\frac{1}{2}\left(\frac{|k^2-1|}{2} + N_c\right)|x|^2\right] W(kx), \qquad (3.77)$$
N 0 = k 2 N + max{0, k 2 − 1} + Nc . (3.78)
If Nc = 0 this means that T amplifies (k > 1) or damps (k < 1) the mean pho-
ton number, while Nc > 0 leads to additional classical, Gaussian noise. We will
reconsider this channel in greater detail in Chapter 6.
Chapter 4
Basic tasks
of the channels
$$C(X) \ni f \mapsto E(f) = \sum_{x\in X} f(x)E_x \in B(\mathcal H) \qquad (4.2)$$
and
$$C^*(X) \ni p \mapsto D^*(p) = \sum_{x\in X} p_x \rho_x \in B^*(\mathcal H), \qquad (4.3)$$
i.e. ρ̃ = D*E*(ρ), and this equation makes sense even if X is not finite. The teleportation is successful if the output state ρ̃ can not be distinguished from the input state ρ by any statistical experiment, i.e. if D*E*(ρ) = ρ. Hence the impossibility of classical teleportation can be rephrased simply as ED ≠ Id for all observables E and all preparations D.
4.1.2 Entanglement enhanced teleportation
Let us now change our setup slightly. Assume that Alice wants to send a quantum
state ρ ∈ B ∗ (H) to Bob and that she shares an entangled state σ ∈ B ∗ (K ⊗ K)
and an ideal classical communication channel C(X) → C(X) with him. Alice can
perform a measurement E : C(X) → B(H ⊗ K) on the composite system B(H ⊗ K)
consisting of the particle to teleport (B(H)) and her part of the entangled system
(B(K)). Then she communicates the classical data x ∈ X to Bob and he operates
with the parameter dependent operation D : B(H) → B(K) ⊗ C(X) appropriately
on his particle (cf. Figure 4.1). Hence the overall procedure can be described by the
here tr12 denotes the partial trace over the first two tensor factors (= Alice’s qubits).
If Ω, the Φj and the Uj are related by the equation
Φj = (Uj ⊗ 1I)Ω (4.10)
To get an ideal channel we just have to choose mutually orthogonal pure states
ρx = |ψx ihψx |, x = 1, . . . , d on Alice’s side and the corresponding one-dimensional
projections Ey = |ψy ihψy |, y = 1, . . . , d on Bob’s. If d = 2 and H = C2 it is possible
to send one bit classical information via one qubit quantum information. The crucial
point is now that the amount of classical information can be increased (doubled in
the qubit case) if Alice shares an entangled state σ ∈ S(H ⊗ H) with Bob. To send
the classical information x ∈ X = {1, . . . , n} to Bob, Alice operates on her particle
with an operation Dx : B(H) → B(H), sends it through an (ideal) quantum channel
to Bob and he performs a measurement E1 , . . . , En ∈ B(H ⊗ H) on both particles.
The probability for Bob to measure y ∈ X if Alice has sent x ∈ X is given by
$$\operatorname{tr}\bigl[(D_x \otimes \operatorname{Id})^*(\sigma)E_y\bigr], \qquad (4.12)$$
$$C^*(X) \otimes B^*(\mathcal H) \otimes B^*(\mathcal H) \xrightarrow{\ D^* \otimes \operatorname{Id}\ } B^*(\mathcal H) \otimes B^*(\mathcal H) \xrightarrow{\ E^*\ } C^*(X), \qquad (4.13)$$
i.e. T ∗ (p) = E ∗ ◦ (D∗ ⊗ Id)(p ⊗ σ). The advantage of this point of view is that it
works as well for infinite dimensional Hilbert spaces and continuous observables.
Finally let us again consider the case where H = C^d and X = {1, . . . , d²}. If we choose as in the last paragraph a maximally entangled vector Ω ∈ H ⊗ H, an orthonormal basis Φ_x ∈ H ⊗ H, x = 1, . . . , d², of maximally entangled vectors and an orthonormal family U_x ∈ B(H), x = 1, . . . , d², of unitary operators, we can construct a dense coding scheme as follows: E_x = |Φ_x⟩⟨Φ_x|, D_x(A) = U_x*AU_x and σ = |Ω⟩⟨Ω|. If Ω, the Φ_x and the U_x are related by Equation (4.10) it is easy to see that we really get a dense coding scheme [231]. If d = 2 holds, we choose again the Bell basis for the Φ_x, Ω = Φ₀, and the identity and the Pauli matrices for the U_x. We recover in this case the standard example of dense coding proposed in [27] and we see that we can transfer two bits via one qubit, as stated above.
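For d = 2 the scheme can be traced through explicitly: with the Bell basis and the Pauli matrices, Bob's outcome distribution reduces to |⟨Φ_y, (U_x ⊗ 1I)Ω⟩|² = δ_{xy}. A numerical sketch:

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]])
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]])
U = [I2, sx, sy, sz]

omega = np.array([1, 0, 0, 1]) / np.sqrt(2)      # maximally entangled Omega
phi = [np.kron(u, I2) @ omega for u in U]        # the Bell basis Phi_x = (U_x (x) 1) Omega

# Bob's probability to read y when Alice encoded x:
P = np.array([[abs(np.vdot(phi[y], np.kron(U[x], I2) @ omega)) ** 2
               for y in range(4)] for x in range(4)])
print(np.round(P, 10))                           # the identity matrix: 2 bits per qubit
```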
4.2 Estimating and copying
The impossibility of classical teleportation can be rephrased as follows: It is impos-
sible to get complete information about the state ρ of a quantum system by one
measurement on one system. However, if we have many systems, say N , all prepared
in the same state ρ it should be possible to get (with a clever measuring strategy)
as much information on ρ as possible, provided N is large enough. In this way we
can circumvent the impossibility of devices like classical teleportation or quantum
copying at least in an approximate way.
4.2.1 Quantum state estimation
To discuss this idea in a more detailed way consider a number N of d-level quantum
systems, all of them prepared in the same (unknown) state ρ ∈ B ∗ (H). Our aim
is to estimate the state ρ by measurements on the compound system ρ ⊗N . This
is described in terms of an observable (in the following called “estimator”) E N :
C(S) → B(H⊗N ) with values in the quantum state space S = S(H). Since S is not
finite, we have to apply here the machinery introduced in Section 3.2.4, i.e. C(S) is
the algebra of continuous functions, and the probability to get a measuring value
in an open subset ω ⊂ S is given by (cf. Section 3.2.4)
$$K_N(\omega) = \sup\bigl\{\operatorname{tr}\bigl(E_N(f)\rho^{\otimes N}\bigr) \,\big|\, f \in C(S),\ 0 \le f \le \mathbb 1,\ \operatorname{supp} f \subset \omega\bigr\}. \qquad (4.16)$$
For many practical purposes it is sufficient to consider only those estimators which
admits a finite set of possible outcomes. In this case everything reduces to the finite
dimensional setup introduced in Chapter 2 and EN becomes
$$E_N(f) = \sum_{\sigma \in X} f(\sigma)E_{N,\sigma}. \qquad (4.17)$$
However, to discuss structural problems, e.g. a quantitative analysis like the search
for an “optimal estimator” (cf. Chapter 10) a restriction to the special case from
Equation (4.17) is inappropriate.
The criterion for a good estimator EN is that for any one-particle density op-
erator ρ, the value measured on a state ρ⊗N is likely to be close to ρ, i.e. that the
probability KN (ω) is small if ω ⊂ S(H) is the complement of a small ball around
ρ. Of course, we will look at this problem for large N . So the task is to find a
whole “estimation scheme”, i.e. a sequence of observables EN , N = 1, 2, . . ., which
is “asymptotically exact”, i.e. error probabilities should vanish in the limit N → ∞.
Variants of this scheme arise if we have some a priori knowledge about the input state
ρ. E.g. if we know that ρ is an element of a distinguished subset Y of the state space
S(H) it is sufficient to control the error probabilities for each ρ ∈ Y . Hence we can
improve the estimation quality for each ρ ∈ Y at the expense of the usefulness of
the estimates for ρ 6∈ Y . The most relevant special case is estimation of pure states
(i.e. Y is the set of pure states). It is much better understood than the general
problem and it admits a rather simple optimal solution which is closely related to
the corresponding cloning problem; we will come back to this circle of questions
in a more quantitative way in Chapter 10. Another special case, called “quantum
hypothesis testing”, arises if Y is finite. The task is to distinguish between finitely
many states in terms of a measurement on N equally prepared systems; cf. [109] for
an overview and [173, 107, 166] and the references therein for more recent results.
The most direct way to get an asymptotically exact estimation scheme is to
perform a sequence of measurements on each of the N input systems separately. A
finite set of observables which leads to a successful estimation strategy is usually
called a “quorum” (cf. e.g. [150, 226]). E.g. for d = 2 we can perform alternating measurements of the three spin components. If ρ = (1/2)(1I + x⃗ · σ⃗) is the Bloch representation of ρ (cf. Subsection 2.1.2) we see that the probabilities for the outcome +1 in these measurements are given by (1/2)(1 + x_j). Hence we get an arbitrarily good estimate
if N is large enough (we leave the construction of the observable EN associated
to this scheme as an easy exercise to the reader). A similar procedure is possible
for arbitrary d if we consider the generalized Bloch representation for ρ (see again
Subsection 2.1.2). There are however more efficient strategies based on “entangled”
measurements (i.e. the EN (σ) can not be decomposed into pure tensor products)
on the whole input system ρ^{⊗N} (e.g. [218, 137]). Somewhat in between are “adaptive schemes” [89] consisting of separate measurements where the j-th measurement depends on the results of the previous j − 1. We will reconsider this circle of questions in a
more quantitative way in Chapter 10.
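The quorum idea for d = 2 can be sketched with exact expectation values (the statistical errors of a finite sample are suppressed here): measuring the three spin components recovers the Bloch vector and hence ρ. A minimal sketch:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

x = np.array([0.3, -0.4, 0.5])                   # Bloch vector, |x| <= 1
rho = (np.eye(2) + x[0]*sx + x[1]*sy + x[2]*sz) / 2

# the probability of outcome +1 for spin component j is (1 + x_j)/2,
# so the exact expectation values recover the Bloch vector
est = np.array([np.trace(rho @ s).real for s in (sx, sy, sz)])
rho_est = (np.eye(2) + est[0]*sx + est[1]*sy + est[2]*sz) / 2
print(np.allclose(rho_est, rho))                 # True
```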
4.2.2 Approximate cloning
By virtue of the no-cloning theorem [239], it is impossible to produce M perfect copies of a d-level quantum system if N < M input systems in the common (unknown) state ρ^{⊗N} are given. More precisely, there is no channel T_{MN} : B(H^{⊗M}) → B(H^{⊗N}) such that T*_{MN}(ρ^{⊗N}) = ρ^{⊗M} holds for all ρ ∈ S(H). Using state estimation, however, it is easy to find a device T_{MN} which produces at least approximate
copies which become exact in the limit N, M → ∞: If ρ⊗N is given, we measure the
observable EN and get the classical data σ ∈ S(H), which we use subsequently to
prepare M systems in the state σ ⊗M . In other words, TM N has the form
$$B^*(\mathcal H^{\otimes N}) \ni \rho \mapsto \int_S \sigma^{\otimes M}\, K_N(d\sigma) \in B^*(\mathcal H^{\otimes M}). \qquad (4.19)$$
We immediately see that the probability to get wrong copies coincides exactly with
the error probability of the estimator EN . This shows first that we get exact copies
in the limit N → ∞ and second that the quality of the copies does not depend on the
number M of output systems, i.e. the asymptotic rate limN,M →∞ M/N of output
systems per input system can be arbitrarily large. Note that the latter (independence
of cloning quality from the output number M ) is a special feature of the estimation
based cloning scheme just introduced. In Chapter 9 we will encounter cloning maps
which are not based on estimation and repreparation and which produce better
copies, as long as the required number M of outputs is finite.
Similar to the estimation problem we can improve the quality of the outcomes
if we can use a priori information about the state ρ to be cloned. The most relevant
example arises again if ρ is pure. A detailed discussion of this special case, including
the construction of the (unique) optimal pure state cloner, will be given in Chapter
9.
The fact that the cloning map from Equation (4.19) uses classical data at an in-
termediate step allows further generalizations. Instead of just preparing M systems
in the state σ detected by the estimator, we can apply first an arbitrary transfor-
mation F : S(H) → S(H) on the density matrix σ and prepare F (σ)⊗M instead of
σ^{⊗M}. In this way we get the channel (cf. Figure 4.3)
$$B^*(\mathcal H^{\otimes N}) \ni \rho \mapsto \int_S F(\sigma)^{\otimes M}\, K_N(d\sigma) \in B^*(\mathcal H^{\otimes M}), \qquad (4.21)$$
The probability to get a bad approximation of the state F (ρ)⊗M (if the input state
was ρ⊗N ) is again given by the error probability of the estimator and we get a
perfect realization of F at arbitrary rate as M, N → ∞.
There are in particular two interesting tasks which become possible this way:
The first is the “universal not gate” which associates to each pure state of a qubit the unique pure state orthogonal to it [46]. This is a special example of an antiunitarily implemented symmetry operation and therefore not completely positive. The second
example is the purification of states [55, 138]. Here it is assumed that the input
states were once pure but have passed later on a depolarizing channel |φihφ| 7→
ϑ|φihφ| + (1 − ϑ)1I/d. If ϑ > 0 this map is invertible but its inverse does not describe
an allowed quantum operation because it maps some density operators to operators
with negative eigenvalues. Hence the reversal of noise is not possible with a one shot
operation but can be done with high accuracy if enough input systems are available.
A detailed quantitative analysis is again postponed to Chapter 11.
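The failure of the one shot reversal is easy to exhibit numerically: the linear inverse of the depolarizing map exists for ϑ > 0, but applied to a generic density operator it produces negative eigenvalues. A sketch for a qubit with ϑ = 1/2:

```python
import numpy as np

d, theta = 2, 0.5
phi = np.array([1.0, 0.0])
rho = np.outer(phi, phi)                           # a pure input state

noisy = theta * rho + (1 - theta) * np.eye(d) / d  # depolarizing channel
recovered = (noisy - (1 - theta) * np.eye(d) / d) / theta
print(np.allclose(recovered, rho))                 # the linear inverse exists ...

bad = (rho - (1 - theta) * np.eye(d) / d) / theta  # ... but applied to a state
print(np.linalg.eigvalsh(bad))                     # that was never depolarized it
                                                   # produces a negative eigenvalue
```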
4.3 Distillation of entanglement
Let us now return to entanglement. We have seen in Section 4.1 that maximally
entangled states play a crucial role for processes like teleportation and dense coding.
In practice however entanglement is a rather fragile property: If Alice produces a pair
of particles in a maximally entangled state |ΩihΩ| ∈ S(HA ⊗ HB ) and distributes
one of them over a great distance to Bob, both end up with a mixed state ρ which
contains much less entanglement then the original and which can not be used any
longer for teleportation. The latter can be seen quite easily if we try to apply
the qubit teleportation scheme (Subsection 4.1.2) with a non-maximally entangled
isotropic state (Equation (3.15) with λ > 0) instead of Ω.
4. Basic tasks 58
F (σ)⊗M ∈ B ∗ (H⊗M )
ρ⊗N ∈ B ∗ (H⊗N )
Preparation
Estimation
F
classical data
σ∈X⊂S F (σ) ∈ S
Hence the question arises, whether it is possible to recover |ΩihΩ| from ρ, or,
following the reasoning from the last section, at least a small number of (almost)
maximally entangled states from a large number N of copies of ρ. However since
the distance between Alice and Bob is big (and quantum communication therefore
impossible) only LOCC operations (Section 3.2.6) are available for this task (Alice
and Bob can only operate on their respective particles, drop some of them and
communicate classically with one another). This excludes procedures like the pu-
rification scheme just sketched, because we would need “entangled” measurements
to get an asymptotically exact estimate for the state ρ. Hence we need a sequence
of LOCC channels
⊗N ⊗N
TN : B(CdN ⊗ CdN ) → B(HA ⊗ HB ) (4.23)
such that
kTN∗ (ρ⊗N ) − |ΩN ihΩN |k1 → 0, for N → ∞ (4.24)
dN dN
holds, with a sequence of maximally entangled vectors ΩN ∈ C ⊗ C . Note that
⊗N ⊗N ∼
we have to use here the natural isomorphism HA ⊗ HB = (HA ⊗ HB )⊗N , i.e. we
⊗N
have to reshuffle ρ such that the first N tensor factors belong to Alice (HA ) and
the last N to Bob (HB ). If confusion can be avoided we will use this isomorphism
in the following without a further note. We will call a sequence of LOCC channels,
TN satisfying (4.24) with a state ρ ∈ S(HA ⊗ HB ) a distillation scheme for ρ and
ρ is called distillable if it admits a distillation scheme. The asymptotic rate with
which maximally entangled states can be distilled with a given protocol is
in the state ρ, so that the total state is ρ⊗N . To obtain a smaller number of pairs
with a higher F they proceed as follows:
1. First they take two pairs (let us call them pair 1 and pair 2), i.e. ρ ⊗ ρ and
apply to each of them the twirl operation PUŪ associated to isotropic states
(cf. Equation (3.18)). This can be done by LOCC operations in the following
way: Alice selects at random (respecting the Haar measure on U(2)) a unitary
operator U applies it to her qubits and sends to Bob which transformation she
has chosen; then he applies Ū to his particles. They end up with two isotropic
states ρe ⊗ ρe with the same maximally entangled fraction as ρ.
2. Each party performs the unitary transformation
they discard their particles (this requires classical communication). Obviously the
state ρe is entangled (this can be easily checked), hence they can proceed as in the
previous Subsection.
The scheme just proposed can be used to show that each state ρ which violates
the reduction criterion (cf. Subsection 2.4.3) can be distilled [117]. The basic idea
is to project ρ with the twirl PUŪ (which is LOCC as we have seen above; cf.
Subsection 4.3.1) to an isotropic state PUŪ (ρ) and to apply the procedure from the
last paragraph afterwards. We only have to guarantee that PUŪ (ρ) is entangled. To
this end use a vector ψ ∈ H ⊗ H with hψ, (1I ⊗ tr1 (ρ) − ρ)ψi < 0 (which exists by
assumption since ρ violates the reduction criterion) and to apply the filter operation
given by ψ via Equation (4.27).
4.3.3 Bound entangled states
It is obvious that separable states are not distillable, because a LOCC operation
map separable states to separable states. However is each entangled state distillable?
The answer, maybe somewhat surprising, is no and an entangled state which is not
distillable is called bound entangled [119] (distillable states are sometimes called free
entangled, in analogy to thermodynamics). Examples of bound entangled states are
all ppt entangled states [119]: This is an easy consequence of the fact that each
separable channel (and therefore each LOCC channel as well) maps ppt states to
ppt states (this is easy to check), but a maximally entangled state is never ppt. It
is not yet known, whether bound entangled npt states exists, however, there are
at least some partial results: 1. It is sufficient to solve this question for Werner
states, i.e. if we can show that each npt Werner state is distillable it follows that
all npt states are distillable [117]. 2. Each npt Gaussian state is distillable [92]. 3.
For each N ∈ N there is an npt Werner state ρ which is not “N -copy distillable”,
i.e. hψ, ρ⊗N ψi ≥ 0 holds for each pure state ψ with exactly two Schmidt summands
[72, 78]. This gives some evidence for the existence of bound entangled npt states
because ρ is distillable iff it is N -copy distillability for some N [119, 72, 78].
Since bound entangled states can not be distilled, they can not be used for
teleportation. Nevertheless bound entanglement can produce a non-classical effect,
called “activation of bound entanglement” [125]. To explain the basic idea, assume
that Alice and Bob share one pair of particles in a distillable state ρf and many
particles in a bound entangled state ρb . Assume in addition that ρf can not be
used for teleportation, or, in other words if ρf is used for teleportation the particle
Bob receives is in a state σ 0 which differs from the state σ Alice has send. This
problem can not be solved by distillation, since Alice and Bob share only one pair
of particles in the state ρf . Nevertheless they can try to apply an appropriate filter
operation on ρ to get with a certain probability a new state which leads to a better
quality of the teleportation (or, if the filtering fails, to get nothing at all). It can
be shown however [120] that there are states ρf such that the error occuring in
this process (e.g. measured by the trace norm distance of σ and σ 0 ) is always above
a certain threshold. This is the point where the bound entangled states ρ b come
into play: If Alice and Bob operate with an appropriate protocol on ρf and many
copies of ρb the distance between σ and σ 0 can be made arbitrarily small (although
the probability to be successful goes to zero). Another example for an activation
of bound entanglement is related to distillability of npt states: If Alice and Bob
share a certain ppt-entangled state as additional resource each npt state ρ becomes
distillable (even if ρ is bound entangled) [80, 145]. For a more detailed survey of the
role of bound entanglement and further references see [123].
4.4 Quantum error correction
If we try to distribute quantum information over large distances or store it for a
long time in some sort of “quantum memory” we always have to deal with “de-
61 4.4. Quantum error correction
coherence effects”, i.e. unavoidable interactions with the environment. This results
in a significant information loss, which is particularly bad for the functioning of a
quantum computer. Similar problems arise as well in a classical computer, but the
methods used there to circumvent the problems can not be transferred to the quan-
tum regime. E.g. the most simple strategy to protect classical information against
noise is redundancy: instead of storing the information once we make three copies
and decide during readout by a majority vote which bit to take. It is easy to see that
this reduces the probability of an error from order ² to ²2 . Quantum mechanically
however such a procedure is forbidden by the no cloning theorem.
Nevertheless quantum error correction is possible although we have to do it in a
more subtle way than just copying; this was observed for the first time independently
in [48] and [201]. Let us consider first the general scheme and assume that T :
B(K) → B(K) is a noisy quantum channel. To send quantum systems of type B(H)
undisturbed through T we need an encoding channel E : B(K) → B(H) and a
decoding channel D : B(H) → B(K) such that ET D = Id holds, respectively
D∗ T ∗ E ∗ = Id in the Schrödinger picture; cf. Figure 4.4.
4.4.1 The theory of Knill and Laflamme
To get a more detailed description of the structure of the channels E and D we will
give in the following a short review of the theory of error correcting codes in the
sense of Knill and Laflamme [143]. To this end start from the error corrector’s dream,
namely the situation in which all the errors happen in another part of the system,
where we do not keep any of the precious quantum information. This will help us to
characterize the structure of the kind of errors which such a scheme may tolerate,
or ‘correct’. Of course, the dream is just a dream for the situation we are mainly
interested in: several parallel channels, each of which may be affected by errors.
But the splitting of the system into subsystems, mathematically the decomposition
of the Hilbert space of the total system into a tensor product is something we
may change by a suitable unitary transformation. This is then precisely the role
of the encoding and decoding operations. The Knill-Laflamme theory is precisely
the description of the situation where such a unitary, and hence a coding/decoding
scheme exists. Constructing such schemes, however, is another matter, to which we
will turn in the next subsection.
So consider a system split into H = Hg ⊗ Hb , where the indices g and b stand
for ‘good’ and ‘bad’. We prepare the system in a state ρ ⊗ |ΩihΩ|, where ρ is the
quantum state we want Pto protect. Now come the errors in the form of a completely
positive map T (A) = i Fi∗ AFi . Then according to the error corrector’s dream, we
T
Encoding
Decoding
Id
ρ ρ
Id
Id
Id
Figure 4.4: Five bit quantum code: Encoding one qubit into five and correcting one
error.
4. Basic tasks 62
would just have to discard the bad system, and get the same state ρ as before.
The hardest demands for realizing this come from pure states ρ = |φihφ|, because
the only way that the restriction to the good system can again be |φihφ| is that the
state after errors factorizes, i.e.
X
T ∗ (|φ ⊗ Ωihφ ⊗ Ω|) = |Fi (φ ⊗ Ω)ihFi (φ ⊗ Ω)| = |φihφ| ⊗ σ . (4.29)
i
F U (φ ⊗ Ω) = U (φ ⊗ Φ(F )) (4.33)
holds. This equation describes precisely the elements F ∈ Emax of the maximal error
space.
63 4.4. Quantum error correction
P
To check that we really have ET D = Id for any channel T (A) = i Fi∗ AFi with
Fi ∈ Emax , it suffices to consider pure input states |φihφ|, and the measurement of
an arbitrary observable X at the output:
£ ¤ X £ ¤
tr |φihφ|ET D(X) = tr U |φ ⊗ Ωihφ ⊗ Ω|U ∗ Fi U (X ⊗ 1I)U ∗ Fi (4.34)
i
X £ ¤
= tr |φ ⊗ Φ(Fi )ihφ ⊗ Φ(Fi )|X ⊗ 1I (4.35)
i
X
= hφ, Xφi kΦ(Fi )k2 = hφ, Xφi. (4.36)
i
P
In the last equation we have used that i kΦ(Fi )k2 = 1, since E, T , and D each
map 1I to 1I.
The encoding E defined in Equation (4.31) is of the form E ∗ (ρ) = V ρV ∗ with
the encoding isometry V : H1 → H2 given by
V φ = U (φ ⊗ Ω) . (4.37)
If we just know this isometry and the error space we can reconstruct the whole
structure, including the decomposition H2 = H1 ⊗Hb ⊕(1I−U U ∗ )H2 , and hence the
decoding operation D. A necessary condition for this, first established by Knill and
Laflamme [143], is that, for arbitrary φ1 , φ2 ∈ H1 and error operators F1 , F2 ∈ E:
hV φ1 , F1∗ F2 V φ2 i = hφ1 , φ2 iω(F1∗ F2 ) (4.38)
holds with some numbers ω(F1∗ F2 )
independent of φ1 , φ2 . Indeed, from (4.33) we
immediately get this equation with ω(F1∗ F2 ) = hΦ(F1 ), Φ(F2 )i. Conversely, if the
Knill-Laflamme condition (4.38) holds, the numbers ω(F1∗ F2 ) serve as a (possibly
degenerate) scalar product on E, which upon completion becomes the ‘bad space’
Hb , such that F ∈ E is identified with a Hilbert space vector Φ(F ). The operator
U : φ⊗Φ(F ) = F V φ is then an isometry, as used at the beginning of this section. To
conclude, the Knill-Laflamme condition is necessary and sufficient for the existence
of a decoding operation. Its main virtue is that we can use it without having to
construct the decoding explicitly.
The most relevant example of such a scheme arises if we generalize the classical
idea of sending multiple copies in a certain sense. This means we encode the quan-
tum information we want to transmit into n systems which can be send separately
through multiple copies of a noisy channel; cf. Figure 4.4. In that case the space
H2 is the n-fold tensor product of the system H on which the noisy channels under
consideration act.
Definition 4.4.1 We say that a coding isometry V : H1 → H⊗n corrects f errors,
if it satisfies the Knill-Laflamme condition (4.38) for the error space Ef spanned
linearly by all operators of the kind X1 ⊗ X2 ⊗ · · · ⊗ Xn , where at most f places we
have a tensor factor Xi 6= 1I.
When F1 and F2 are both supported on at most f sites, the product F1∗ F2 ,
which appears in the Knill-Laflamme condition involves 2f sites. Therefore we can
paraphrase the condition by saying that
hV φ1 , XV φ2 i = hφ1 , φ2 iω(X) (4.39)
for X ∈ E2f . From Kraus operators in Ef we can build arbitrary channels of the
kind T = T1 ⊗ T2 ⊗ · · · ⊗ Tn , where at most f of the tensor factors Ti are channels
different from id.
There are several ways to construct error correcting codes (see e.g. [98, 47, 10]).
Most appropriate for our purposes are “Graph codes” [190], because they are quite
easy to describe and admit a simple way to check the error correction condition.
This will be the subject of the next subsection.
4. Basic tasks 64
Because Γ is symmetric, every term in this sum appears twice, hence adding a
multiple of d to any jx or Γxy will change the exponent in (4.40) by a multiple of
2π, and thus will not change VΓ .
The error correcting properties of VΓ are summarized in the following result
[190]. It is just the Knill-Laflamme condition with a special expression for the form
ω, for error operators such that F1∗ F2 is localized on a set Z.
Theorem 4.4.2 Let Γ be a graph, i.e., a symmetric matrix with entries Γ xy ∈ Zd ,
for x, y ∈ (X ∪Y ). Consider a subset Z ⊂ Y , and suppose that the (Y \Z)×(X ∪Z)-
submatrix of Γ is non-singular, i.e.,
X
∀y∈Y \Z Γyx hx ≡ 0 implies ∀x∈X∪Z hx ≡ 0 (4.42)
x∈X∪Z
where congruences are mod d. Then, for every operator F ∈ B(H Y ) localized on
Z, we have
VΓ∗ F VΓ = d−n tr(F )1IX (4.43)
Proof. It will be helpful to use the notation for collections of variables, already
present in (4.41) more systematically: for any subset W ⊂ X ∪ Y we write jW for
the collection of variables jy with y ∈ W . The Kronecker-Delta δ(jW ) is defined to
0, and one otherwise. By jW · ΓW W 0 · kW 0 we mean the
be zero if for any y ∈ W jy 6= P
suitably restricted sum, i.e., x∈W,y∈W 0 jx Γxy ky . The important sets to which we
65 4.4. Quantum error correction
Since F is localized on Z, the matrix element contains a factor δjy ,ky for every
y ∈ Y \ Z = Y 0 , so we can write hjY |F |kY i = hjZ |F |kZ iδ(jY 0 − kY 0 ). Therefore we
can compute the sum (4.44) in stages:
X
hjX |VΓ∗ F VΓ |kX i = hjZ |F |kZ iS(jX 0 , kX 0 ) , (4.45)
jZ ,kZ
where S(jX 0 , kX 0 ) is the sum over the Y 0 -variables, which, of course, still depends
on the input variables jX , kX and the variables jZ , kZ at the error positions:
³ ´
X iπ
kX∪Y ·Γ·kX∪Y −jX∪Y ·Γ·jX∪Y
−n d
S(jX 0 , kX 0 ) = d δ(jY 0 − kY 0 )e (4.46)
jY 0 ,kY 0
The sums in the exponent can each be split into four parts according to the de-
composition X 0 vs. Y 0 . The terms involving ΓY 0 Y 0 cancel because kY 0 = jY 0 . The
terms involving ΓX 0 Y 0 and ΓY 0 X 0 are equal because Γ is symmetric, and together
give 2jY 0 · ΓY 0 X 0 · (kX 0 − jX 0 ). The ΓX 0 X 0 remain unchanged, but only give a phase
factor independent of the summation variables. Hence
¡ ¢X
iπ 2πi
S(jX 0 , kX 0 ) = d−n e d kX 0 ·Γ·kX 0 −jX 0 ·Γ·jX 0 e d jY 0 ·ΓY 0 X 0 ·(kX 0 −jX 0 )
jY 0
¡ ¢
iπ 0
kX 0 ·Γ·kX 0 −jX 0 ·Γ·jX 0
= d−n e d d|Y |
δ(ΓY 0 X 0 · (kX 0 − jX 0 ))
¡ ¢
0 iπ
kX 0 ·Γ·kX 0 −jX 0 ·Γ·jX 0
= d−n+|Y | e d δ(kX 0 − jX 0 )
0
−n+|Y |
= d δ(kX 0 − jX 0 ) . (4.47)
Here we used at the first equation that the sum is a product of geometric series
as they appear in discrete Fourier transforms.
P At the second equality the main
condition of the Proposition enters: if x∈X 0 Γyx · (kx − jx ) vanishes for all y ∈ Y 0
as required by the delta-function then (and only then) the vector kX 0 − jX 0 must
vanish. But then the two terms in the exponent of the phase factor also cancel.
Inserting this result into (4.45), and using that δ(hX 0 ) = δ(hX )δ(hZ ), we find
0 X
hjX |VΓ∗ F VΓ |kX i = δ(jX − kX ) d−n+|Y | hjZ |F |jZ i
jZ
X
−n
= δ(jX − kX ) d hjY |F |jY i
jY
Here the error operator is considered in the first line as an operator on HZ , and as
an operator on HY in the second line, by tensoring it with 1IY 0 . This cancels the
0
dimension factor d|Y | 2
All that is left to get an error correcting code is to ensure that the conditions
of this Theorem are satisfied sufficiently often. This is evident from combining the
above Theorem with Definition 4.4.1.
4. Basic tasks 66
Corollary 4.4.3 Let Γ be a graph as in the previous Proposition, and suppose that
the (Y \ Z) × (X ∪ Z)-submatrix of Γ is non-singular for all Z ⊂ Y with up to 2f
elements. Then the code associated to Γ corrects f errors.
Two particular examples (which are equivalent!) are given in Figure 4.5. In both
cases we have N = 1, M = 5 and K = 1 i.e. one input node, which can be chosen
arbitrarily, five output nodes and the corresponding codes correct one error.
4.5 Quantum computing
Quantum computing is without a doubt the most prominent and most far reaching
application of quantum information theory, since it promises on the one hand, “ex-
ponential speedup” for some problems which are “hard to solve” with a classical
computer, and gives completely new insights into classical computing and complex-
ity theory on the other. Unfortunately, an exhaustive discussion would require its
own review article. Hence we we are only able to give a short overview (see Part II
of [172] for a more complete presentation and for further references).
4.5.1 The network model of classical computing
Let us start with a brief (and very informal) introduction to classical computing (for
a more complete review and hints for further reading see Chapter 3 of [172]). What
we need first is a mathematical model for computation. There are in fact several
different choices and the Turing machine [212] is the most prominent one. More
appropriate for our purposes is, however, the so called network model, since it allows
an easier generalization to the quantum case. The basic idea is to interpret a classical
(deterministic) computation as the evaluation of a map f : BN → BM (where
B = {0, 1} denotes the field with two elements) which maps N input bits to M
output bits. If M = 1 holds f is called a boolean function and it is for many purposes
sufficient to consider this special case – each general f is in fact a Cartesian product
of boolean functions. Particular examples are the three elementary gates AND, OR
and NOT defined in Figure 4.6 and arbitrary algebraic expressions constructed
from them: e.g. the XOR gate (x, y) 7→ x + y mod 2 which can be written as
(x ∨ y) ∧ ¬(x ∧ y). It is now a standard result of boolean algebra that each boolean
function can be represented in this way and there are in general many possibilities
to do this. A special case is the disjunctive normal form of f ; cf [225]. To write
such an expression down in form of equations is, however, somewhat confusing. f
is therefore expressed most conveniently in graphical form as a circuit or network,
i.e. a graph C with nodes representing elementary gates and edges (“wires”) which
determine how the gates should be composed; cf. Figure 4.7 for an example. A
a a
c c a b
b b
a b c a b c
0 0 0 0 0 0 a b
1 0 0 1 0 1 0 1
0 1 0 0 1 1 1 0
1 1 1 1 1 1
c = ab c = a + b − ab b=1−a
AND, ∧ OR, ∨ NOT, ¬
Figure 4.6: Symbols and definition for the three elementary gates AND, OR and
NOT.
67 4.5. Quantum computing
x + y mod 2
is that each circuit CN allows only the computation of a boolean function fN : BN → B which
acts on input data of length N . Since we are interested in answers for arbitrary finite length inputs
a sequence CN , N ∈ N of circuits with appropriate uniformity properties is needed; cf. [177] for
details.
4. Basic tasks 68
depends on the set of elementary operations we choose, e.g. the set of elementary
gates in the network model. It is therefore useful to divide computational problems
into complexity classes whose definitions do not suffer under model dependent as-
pects. The most fundamental one is the class P which contains all problems which
can be computed in “polynomial time”, i.e. t is, as a function of L, bounded from
above by a polynomial. The model independence of this class is basically the con-
tent of the strong Church Turing hypotheses which states, roughly speaking, that
each model of computation can be simulated in polynomial time on a probabilistic
Turing machine.
Problems of class P are considered “easy”, everything else is “hard”. However
even if a (decision) problem is hard the situation is not hopeless. E.g. consider
the factoring problem fac described above. It is generally believed (although not
proved) that this problem is is not in class P. But if somebody gives us a divisor
p < l of m it is easy to check whether p is really a factor, and if the answer is
true we have computed fac(m, l). This example motivates the following definition:
A decision problem f is in class NP (“nondeterministic polynomial time”) if there
is a boolean function f 0 in class P such that f 0 (x, y) = 1 for some y implies f (x). In
our example fac0 is obviously defined by fac0 (m, l, p) = 1 ⇔ p < l and p is a devisor
of m. It is obvious that P is a subset of NP the other inclusion however is rather
nontrivial. The conjecture is that P 6= NP holds and great parts of complexity
theory are based on it. Its proof (or disproof) however represents one of the biggest
open questions of theoretical informatics.
To introduce a third complexity class we have to generalize our point of view
slightly. Instead of a function f : BN → BM we can look at a noisy classical T
which sends the input value x ∈ BN to a probability distribution Txy , y ∈ BM on
BM (i.e. Txy is the transition matrix of the classical channel T ; cf. Subsection 3.2.3).
Roughly speaking, we can interpret such a channel as a probabilistic computation
which can be realized as a circuit consisting of “probabilistic gates”. This means
there are several different ways to proceed at each step and we use a classical random
number generator to decide which of them we have to choose. If we run our device
several times on the same input data x we get different results y with probability
Txy . The crucial point is now that we can allow some of the outcomes to be wrong
as long as there is an easy way (i.e. a class P algorithm) to check the validity of
the results. Hence we define BPP (“bounded error probabilistic polynomial time”)
as the class of all decision problems which admit a polynomial time probabilistic
algorithm with error probability less than 1/2 − ² (for fixed ²). It is obvious that
P ⊂ BPP holds but the relation between BPP and NP is not known.
4.5.3 Reversible computing
In the last subsection we have discussed the time needed to perform a certain
computation. Other physical quantities which seem to be important are space and
energy. Space can be treated in a similar way as time and there are in fact space-
related complexity classes (e.g PSPACE which stands for “polynomial space”).
Energy, however, is different, because it turns surprisingly out that it is possible to
do any calculation without expending any energy! One source of energy consumption
in a usual computer is the intrinsic irreversibility of the basic operations. E.g. a basic
gate like AND maps two input bits to one output bit, which obviously implies that
the input can not be reconstructed from the output. In other words: one bit of
information is erased during the operation of the AND gate, hence a small amount
of energy is dissipated to the environment. A thermodynamic analysis, known as
Landauer’s principle, shows that this energy loss is at least kB T ln 2, where T is the
temperature of the environment [148].
If we want to avoid this kind of energy dissipation we are restricted to reversible
processes, i.e. it should be possible to reconstruct the input data from the output
69 4.5. Quantum computing
U1 H
U2 U1 H
U3 U2 U1 H
· ¸ · ¸
1 1 1 1 0
H=√ Uk = −k
2 1 −1 0 e2 π
Figure 4.9: Quantum circuit for the discrete Fourier transform on a 4-qubit register.
1. The first step is in most cases preprocessing of the input data on a classical
computer. E.g. the Shor algorithm for the factoring problem does not work if
the input number m is a pure prime power. However in this case there is an
efficient classical algorithm. Hence we have to check first whether m is of this
particular form and use this classical algorithm where appropriate.
2. In the next step e have to prepare the quantum register based on these pre-
processed data. This means in the most simple case to write classical data,
i.e. to prepare the state |xi ∈ H⊗N if the (classical) input is x ∈ BN . In many
cases however it might be more intelligent to use a superposition of several
|xi, e.g. the state
1 X
Ψ= √ |xi, (4.48)
2N x∈BN
which represents actually the superposition of all numbers the registers can
represent – this is indeed the crucial point of quantum computing and we
come back to it below.
3. Now we can apply the quantum circuit C to the input state ψ and after the
calculation we get the output state U ψ, where U is the unitary represented
by C.
4. To read out the data after the calculation we perform a von Neumann mea-
surement in the computational basis, i.e. we measure the observable given by
the one dimensional projectors |xihx|, x ∈ BN . Hence we get x ∈ BN with
probability PN = |hψ|xi|2 .
71 4.5. Quantum computing
So, why is quantum computing potentially useful? First of all, a quantum com-
puter can perform at least as good as a classical computer. This follows immediately
from our discussion of reversible computing in Subsection 4.5.3 and the fact that
any invertible function f : BN → BN defines a unitary by Uf : |xi 7→ |f (x)i (the
quantum CNOT gate in Figure 4.8 arises exactly in this way from the classical
CNOT). But, there is on the other hand strong evidence which indicates that a
quantum computer can solve problems in polynomial time which a classical com-
puter can not. The most striking example for this fact is the Shor algorithm, which
provides a way to solve the factoring problem (which is most probably not in class
P) in polynomial time. If we introduce the new complexity class BQP of decision
problems which can be solved with high probability and in polynomial time with a
quantum computer, we can express this conjecture as BPP 6= BQP.
The mechanism which gives a quantum computer its potential power is the
ability to operate not just on one value x ∈ BN , but on whole superpositions
of values, as already mentioned in step 2 above. E.g. consider a, not necessarily
invertible, map f : BN → BM and the unitary operator Uf
H⊗N ⊗ H⊗M 3 |xi ⊗ |0i 7→ Uf |xi ⊗ |0i = |xi ⊗ |f (x)i ∈ H⊗N ⊗ H⊗M . (4.49)
If we let act Uf on a register in the state Ψ ⊗ |0i from Equation (4.48) we get the
result
1 X
Uf (Ψ ⊗ |0i) = √ |xi ⊗ |f (x)i. (4.50)
2N x∈BN
Hence a quantum computer can evaluate the function f on all possible arguments
x ∈ BN at the same time! To benefit from this feature – usually called quantum
parallelism – is, however, not as easy as it looks like. If we perform a measurement
on Uf (Ψ ⊗ |0i) in the computational basis we get the value of f for exactly one
argument and the rest of the information originally contained in Uf (Ψ ⊗ |0i) is
destroyed. In other words it is not possible to read out all pairs (x, f (x)) from
Uf (Ψ ⊗ |0i) and to fill a (classical) lookup table with them. To take advantage
from quantum parallelism we have to use a clever algorithm within the quantum
computation step (step 3 above). In the next section we will consider a particular
example for this.
Before we come to this point, let us give some additional comments which link
this section to other parts of quantum information. The first point concerns entan-
glement. The state Uf (Ψ ⊗ |0i) is highly entangled (although Ψ is separable since
£ ¤⊗N
Ψ = 2−1/2 (|0i + |1i) ), and this fact is essential for the “exponential speedup” of
computations we could gain in a quantum computer. In other words, to outperform
a classical computer, entanglement is the most crucial resource – this will become
more transparent in the next section. The second remark concerns error correction.
Up to now we have implicitly assumed that all components of a quantum computer
work perfectly without any error. In reality however decoherence effects make it
impossible to realize unitarily implemented operations, and we have to deal with
noisy channels. Fortunately it is possible within quantum information to correct at
least a certain amount of errors, as we have seen in Section 4.4). Hence unlike an
4. Basic tasks 72
analog computer2 a quantum computer can be designed fault tolerant, i.e. it can
work with imperfectly manufactured components.
4.5.5 Simons problem
We will consider now a particular problem (known as Simons problem; cf. [196])
which shows explicitly how a quantum computer can speed up a problem which is
hard to solve with a classical computer. It does not fit however exactly into the gen-
eral scheme sketched in the last subsection, because a quantum “oracle” is involved,
i.e. a black box which performs an (a priori unknown) unitary transformation on
an input state given to it. The term “oracle” indicates here that we are not in-
terested in the time the black box needs to perform the calculation but only in
the number of times we have to access it. Hence this example does not prove the
conjecture BPP 6= BQP stated above. Other quantum algorithms which we have
not the room here to discuss include: the Deutsch [69] and Deutsch-Josza problem
[70], the Grover search algorithm [103, 102] and of course Shor’s factoring algorithm
[192, 193].
Hence let us assume that our black box calculates the unitary Uf from Equation
(4.49) with a map f : BN → BN which is two to one and has period a, i.e. f (x) =
f (y) iff y = x + a mod 2. The task is to find a. Classically, this problem is hard, i.e.
we have to query the oracle exponentially often. To see this note first that we have
to find a pair (x, y) with f (x) = f (y) and the probability to get it with two random
queries is 2−N (since there is for each x exactly one y 6= x with f (x) = f (y)). If we
use the box 2N/4 times, we get less than 2N/2 different pairs. Hence the probability
to get the correct solution is 2−N/2 , i.e. arbitrarily small even with exponentially
many queries.
Assume now that we let our box act on a quantum register H ⊗N ⊗ H⊗N in the
state Ψ ⊗ |0i with Ψ from Equation (4.48) to get Uf (Ψ ⊗ |0i) from (4.50). Now
we measure the second register. The outcome is one of 2N −1 possible values (say
f (x0 )), each of which occurs equiprobable. Hence, after the measurement the first
register is the state 2−1/2 (|xi + |x + ai). Now we let a Hadamard gate H (cf. Figure
4.9) act on each qubit of the first register and the result is (this follows with a short
calculation)
1 ¡ ¢ 1 X
√ H ⊗N |xi + |x + ai = √ (−1)x·y |yi (4.51)
2 2 N −1
a·y=0
where the dot denotes the (B-valued) scalar product in the vector space B^N. Now we perform a measurement on the first register (in the computational basis) and get a y ∈ B^N with the property y · a = 0; each such y appears as an outcome with probability 2^(1−N). If we repeat this procedure until we have collected N − 1 linearly independent values yj, we can determine a as the unique nonzero solution of the system of equations y1 · a = 0, ..., y_{N−1} · a = 0. Therefore the success probability can be made arbitrarily close to one while the number of times we have to access the box is linear in N.
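The classical post-processing just described, solving the system yj · a = 0 over the field B, can be sketched as follows (the function name and interface are our own illustration, not part of the text):

```python
import numpy as np

def solve_hidden_period(ys, N):
    """Given measured vectors y with y.a = 0 (mod 2), recover the
    nonzero period a by Gaussian elimination over GF(2)."""
    A = np.array(ys, dtype=np.uint8) % 2
    rank, pivots = 0, []
    for col in range(N):
        rows = [r for r in range(rank, len(A)) if A[r, col] == 1]
        if not rows:
            continue
        A[[rank, rows[0]]] = A[[rows[0], rank]]      # move a pivot row up
        for r in range(len(A)):
            if r != rank and A[r, col] == 1:
                A[r] = (A[r] + A[rank]) % 2          # eliminate the column
        pivots.append(col)
        rank += 1
    # any free (non-pivot) column yields a nonzero solution a
    free = [c for c in range(N) if c not in pivots][0]
    a = np.zeros(N, dtype=np.uint8)
    a[free] = 1
    for i, col in enumerate(pivots):
        a[col] = A[i, free]                          # back-substitute mod 2
    return a
```

With N − 1 linearly independent vectors yj the reduced system has a one-dimensional null space, so the returned a is the period.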
4.6 Quantum cryptography
Finally we want to take a brief look at quantum cryptography – another more practical application of quantum information, which has the potential to emerge into technology in the not so distant future (see e.g. [130, 126, 44] for some experimental realizations and [97] for a more detailed overview). Hence let us assume that Alice has a message x ∈ B^N which she wants to send secretly to Bob over a public communication channel. One way to do this is the so called “one-time pad”: Alice generates randomly a second bit-string y ∈ B^N of the same length as x and sends x + y (addition mod 2 in each digit) to Bob, who can recover x by adding y again. This scheme is secure as long as the key y is known only to Alice and Bob and used only once; the remaining problem, the secure distribution of y, is exactly the task quantum cryptography solves:
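As an illustration, the one-time pad itself is only a few lines of code (a sketch; the helper names are ours):

```python
import secrets

def one_time_pad(x: bytes) -> tuple[bytes, bytes]:
    """Encrypt x with a fresh random key y of the same length;
    ciphertext x + y and key y are returned separately."""
    y = secrets.token_bytes(len(x))            # random key y
    c = bytes(a ^ b for a, b in zip(x, y))     # x + y, bitwise mod 2
    return c, y

def otp_decrypt(c: bytes, y: bytes) -> bytes:
    # adding y again recovers x, since (x + y) + y = x mod 2
    return bytes(a ^ b for a, b in zip(c, y))
```

The cipher is information-theoretically secure precisely because y is random, as long as x itself, and used only once.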
1. Assume that Alice wants to transmit bits from the (randomly generated) key y ∈ B^N through an ideal quantum channel to Bob. Before they start they settle upon two orthonormal bases e0, e1 ∈ H, respectively f0, f1 ∈ H, which are mutually nonorthogonal, i.e. |⟨ej, fk⟩| ≥ ε > 0 with ε large enough for each j, k = 0, 1. If photons are used as information carriers a typical choice is linearly polarized photons with polarization directions rotated by 45° against each other.
2. To send one bit j ∈ B Alice now selects at random one of the two bases, say e0, e1, and then she sends a qubit in the state |ej⟩⟨ej| through the channel. Note that neither Bob nor a potential eavesdropper knows which basis she has chosen.
3. When Bob receives the qubit he selects, as Alice did before, at random a basis and performs the corresponding von Neumann measurement to get one classical bit k ∈ B, which he records together with the measurement basis.
4. After the transmission Alice and Bob compare, via the classical channel, the bases they have used and discard all bits where the two bases differ. The rate of successfully transmitted bits per bit sent is obviously 1/2. Hence Alice has to send approximately twice as many bits as they need.
To see why this procedure is secure, assume now that the eavesdropper Eve can listen to and modify the information sent through the quantum channel and that she can listen on the classical channel but cannot modify it (we come back to this restriction in a minute). Hence Eve can intercept the qubits sent by Alice and make two copies of each. One she forwards to Bob and the other she keeps for later analysis. Due to the no-cloning theorem, however, she has produced errors in both copies, and the quality of her own copy decreases if she tries to make the error in Bob’s as small as possible. Even if Eve knows about the two bases e0, e1 and f0, f1 she does not know which one Alice uses to send a particular qubit³. Hence Eve has to decide randomly which basis to choose (as Bob does). If e0, e1 and f0, f1 are chosen optimally, i.e. |⟨ej, fk⟩|² = 1/2, it is easy to see that the error rate Eve necessarily produces by randomly measuring in one of the bases is 1/4 for large N. To detect this error Alice and Bob simply have to sacrifice portions of the generated key and to compare randomly selected bits using their classical channel. If the error rate they detect is too big they can decide to drop the whole key and restart from the beginning.
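The 1/4 error rate produced by this intercept-resend attack can be verified by summing Born-rule probabilities over all basis and outcome combinations (an illustrative sketch, not part of the text):

```python
import numpy as np

# Two mutually unbiased polarization bases: horizontal/vertical and 45°-rotated.
Z = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
X = [np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]
BASES = [Z, X]

def prob(state, basis, outcome):
    """Born rule probability for a von Neumann measurement."""
    return float(abs(basis[outcome] @ state) ** 2)

def intercept_resend_error():
    """Exact error rate on the sifted key (Alice's and Bob's bases agree)
    when Eve measures every qubit in a random basis and resends."""
    err = 0.0
    for a_basis in (0, 1):             # Alice's basis; after sifting Bob
        for bit in (0, 1):             # measures in the same one
            psi = BASES[a_basis][bit]
            for e_basis in (0, 1):     # Eve's random choice, prob 1/2
                for e_out in (0, 1):
                    p_eve = prob(psi, BASES[e_basis], e_out)
                    p_wrong = prob(BASES[e_basis][e_out], BASES[a_basis], 1 - bit)
                    err += 0.25 * 0.5 * p_eve * p_wrong   # uniform over the
    return err                                            # four (basis, bit) cases
```

Whenever Eve happens to pick the wrong basis (probability 1/2) she disturbs the qubit, and Bob then reads a wrong bit with probability 1/2, giving the total of 1/4.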
Finally, let us discuss a situation where Eve is able to intercept both the quantum and the classical channel. This would imply that she can play Bob’s part for Alice and Alice’s for Bob. As a result she shares one key with Alice and one with Bob. Hence she can decode all secret data Alice sends to Bob, read it, and finally encode it again to forward it to Bob. To secure against such a “woman in the middle attack”, Alice and Bob can use classical authentication protocols which ensure that the correct person is at the other end of the line. This implies that they need a small amount of initial secret material, which can however be renewed from the new key they have generated through quantum communication.
3 If Alice and Bob use only one basis to send the data and Eve knows about it, she can of course produce ideal copies of the qubits. This is actually the reason why two nonorthogonal bases are necessary.
Chapter 5
Entanglement measures
In the last chapter we have seen that entanglement is an essential resource for
many tasks of quantum information theory, like teleportation or quantum compu-
tation. This means that entangled states are needed for the functioning of many
processes and that they are consumed during operation. It is therefore necessary
to have measures which tell us whether the entanglement contained in a number
of quantum systems is sufficient to perform a certain task. What makes this sub-
ject difficult, is the fact that we can not restrict the discussion to systems in a
maximally or at least highly entangled pure state. Due to unavoidable decoherence
effects, realistic applications have to deal with imperfect systems in mixed states, and it is exactly in this situation that the question of the amount of available entanglement is interesting.
5.1 General properties and definitions
The difficulties arising if we try to quantify entanglement can be divided, roughly
speaking, into two parts: Firstly we have to find a reasonable quantity which de-
scribes exactly those properties which we are interested in and secondly we have to
calculate it for a given state. In this section we will discuss the first problem and
consider several different possibilities to define entanglement measures.
5.1.1 Axiomatics
First of all, we will collect some general properties which a reasonable entanglement
measure should have (cf. also [24, 216, 215, 217, 121]). To quantify entanglement means nothing else but to associate a positive real number to each state of a (finite dimensional) two-partite system.
Axiom E0 An entanglement measure is a function E which assigns to each state
ρ of a finite dimensional bipartite system a positive real number E(ρ) ∈ R + .
Note that we have glossed over some mathematical subtleties here, because E is not just defined on the state space of B(H ⊗ K) for particularly chosen Hilbert spaces H and K – E is defined on any such state space for arbitrary finite dimensional H and K. This is expressed mathematically most conveniently by a family of functions which behaves naturally under restrictions (i.e. the restriction to a subspace H0 ⊗ K0 coincides with the function belonging to H0 ⊗ K0). However we will see soon that we can safely ignore this problem.
The next point concerns the range of E. If ρ is unentangled E(ρ) should be
zero of course and it should be maximal on maximally entangled states. But what
happens if we allow the dimensions of H and K to grow? To get an answer consider first a pair of qubits in a maximally entangled state ρ. It should contain exactly one bit of entanglement, i.e. E(ρ) = 1, and N pairs in the state ρ^⊗N should contain N bits. If we interpret ρ^⊗N as a maximally entangled state of an H ⊗ H system with H = (C²)^⊗N we get E(ρ^⊗N) = log2(dim H) = N, where we have to reshuffle in ρ^⊗N the tensor factors such that (C² ⊗ C²)^⊗N becomes (C²)^⊗N ⊗ (C²)^⊗N (i.e. “all Alice particles to the left and all Bob particles to the right”; cf. Section 4.3). This
observation motivates the following.
Axiom E1 (Normalization) E vanishes on separable states and takes its maximum on maximally entangled states. More precisely, this means that E(σ) ≤ E(ρ) = log2(d) for ρ, σ ∈ S(H ⊗ H) with dim H = d and ρ maximally entangled.
One thing an entanglement measure should tell us, is how much quantum infor-
mation can be maximally teleported with a certain amount of entanglement, where
this maximum is taken over all possible teleportation schemes and distillation pro-
tocols, hence it can not be increased further by additional LOCC operations on the
entangled systems in question. This consideration motivates the following Axiom.
Axiom E2 (LOCC monotonicity) E can not increase under LOCC operation,
i.e. E[T (ρ)] ≤ E(ρ) for all states ρ and all LOCC channels T .
A special case of LOCC operations are of course local unitary operations U ⊗ V. Axiom E2 implies E(U ⊗ V ρ U* ⊗ V*) ≤ E(ρ) and, on the other hand, E(U* ⊗ V* ρ̃ U ⊗ V) ≤ E(ρ̃); hence with ρ̃ = U ⊗ V ρ U* ⊗ V* we get E(ρ) ≤ E(U ⊗ V ρ U* ⊗ V*) and therefore E(ρ) = E(U ⊗ V ρ U* ⊗ V*). We fix this property as a weakened version of Axiom E2:

Axiom E3 (Local unitary invariance) For each state ρ and all local unitaries U, V we have E(U ⊗ V ρ U* ⊗ V*) = E(ρ).
The last point we have to consider here are additivity properties: Since we are looking at entanglement as a resource, it is natural to assume that we can do twice as much with two pairs in the state ρ as with one, or more precisely E(ρ ⊗ ρ) = 2E(ρ) (in ρ ⊗ ρ we have to reshuffle tensor factors again; see above).
Axiom E5 (Additivity) For any pair of two-partite states ρ, σ ∈ S(H ⊗ K) we
have E(σ ⊗ ρ) = E(σ) + E(ρ).
Unfortunately this rather natural looking axiom seems to be too strong (it excludes reasonable candidates). It should however always be true that entanglement cannot increase if we put two pairs together.
Axiom E5a (Subadditivity) For any pair of states ρ, σ we have E(ρ ⊗ σ) ≤
E(ρ) + E(σ).
There are further modifications of additivity available in the literature. Most
frequently used is the following, which restricts Axiom E5 to the case ρ = σ:
Axiom E5b (Weak additivity) For any state ρ of a bipartite system we have N^(−1) E(ρ^⊗N) = E(ρ).

Finally, the weakest version of additivity only deals with the behavior of E for large tensor products, i.e. ρ^⊗N for N → ∞.

Axiom E5c (Existence of a regularization) For each state ρ the limit

  E^∞(ρ) = lim_{N→∞} E(ρ^⊗N)/N   (5.2)

exists.
5.1.2 Pure states
Let us consider now a pure state ρ = |ψ⟩⟨ψ| ∈ S(H ⊗ K). If it is entangled its partial trace σ = trH |ψ⟩⟨ψ| is mixed (trK |ψ⟩⟨ψ| has the same spectrum), and for a maximally entangled state it is maximally mixed. This suggests to use the von Neumann entropy¹ of the partial trace, which measures how much a state is mixed, as an entanglement measure for pure states, i.e. we define [17, 24]

  EvN(ρ) = −tr[ trH ρ ln(trH ρ) ].   (5.3)
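Definition (5.3) can be evaluated directly from the coefficient matrix of ψ; the sketch below (helper names are ours) uses log2 instead of ln, so that the normalization of Axiom E1 – exactly one bit per maximally entangled qubit pair – holds:

```python
import numpy as np

def reduced_entropy(psi, dA, dB):
    """E_vN of a pure state psi in C^dA (x) C^dB: the von Neumann
    entropy (in bits) of the partial trace tr_B |psi><psi|."""
    M = psi.reshape(dA, dB)                      # coefficient matrix psi_{jk}
    lam = np.linalg.eigvalsh(M @ M.conj().T)     # spectrum of the partial trace
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log2(lam)).sum())

bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # maximally entangled pair
```

The eigenvalues of M M† are exactly the squared Schmidt coefficients of ψ, which connects this computation to the majorization criterion below.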
It is easy to deduce from the properties of the von Neumann entropy that EvN satisfies Axioms E0, E1, E3 and E5b. Somewhat more difficult is only Axiom E2, which follows however from a nice theorem of Nielsen [169] which relates LOCC operations (on pure states) to the theory of majorization. To state it here we need first some terminology. Consider two probability distributions λ = (λ1, ..., λM) and µ = (µ1, ..., µN), both given in decreasing order (i.e. λ1 ≥ ... ≥ λM and µ1 ≥ ... ≥ µN). We say that λ is majorized by µ, in symbols λ ≺ µ, if

  Σ_{j=1}^k λj ≤ Σ_{j=1}^k µj  for all k = 1, ..., min{M, N}   (5.4)
holds. Now we have the following result (see [169] for a proof).
Theorem 5.1.1 A pure state ψ = Σ_j λj^(1/2) ej ⊗ e′j ∈ H ⊗ K can be transformed into another pure state φ = Σ_j µj^(1/2) fj ⊗ f′j ∈ H ⊗ K via a LOCC operation iff the Schmidt coefficients of ψ are majorized by those of φ, i.e. λ ≺ µ.
The von Neumann entropy of the restriction trH |ψ⟩⟨ψ| can be immediately calculated from the Schmidt coefficients λ of ψ by EvN(|ψ⟩⟨ψ|) = −Σ_j λj ln(λj). Axiom E2 follows therefore from the fact that the entropy S(λ) = −Σ_j λj ln(λj) of a probability distribution λ is a Schur concave function, i.e. λ ≺ µ implies S(λ) ≥ S(µ); see [171].
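The majorization criterion (5.4) and the Schur concavity argument are easy to check numerically; for instance λ = (1/2, 1/2) ≺ µ = (3/4, 1/4), so by Theorem 5.1.1 a Bell pair can be converted by LOCC into the less entangled state with Schmidt coefficients µ, and the entropy indeed decreases (a sketch with our own helper names):

```python
import numpy as np

def majorized(lam, mu):
    """Check lam ≺ mu for two probability vectors, Eq. (5.4)."""
    lam = np.sort(np.asarray(lam, float))[::-1]
    mu = np.sort(np.asarray(mu, float))[::-1]
    k = min(len(lam), len(mu))
    return bool(np.all(np.cumsum(lam[:k]) <= np.cumsum(mu[:k]) + 1e-12))

def entropy(p):
    """S(p) = -sum p ln p, the Schur concave function used in the text."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())
```

Since S is Schur concave, `majorized(lam, mu)` implies `entropy(lam) >= entropy(mu)`, which is exactly the monotonicity of EvN under LOCC.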
Hence we have seen so far that EvN is one possible candidate for an entanglement
measure on pure states. In the following we will see that it is in fact the only
candidate which is physically reasonable. There are basically two reasons for this.
The first one deals with distillation of entanglement. It was shown by Bennett et al. [17] that each state ψ ∈ H ⊗ K of a bipartite system can be prepared out of (a possibly large number of) systems in an arbitrary entangled state φ by LOCC operations. To be more precise, we can find a sequence of LOCC operations

  TN : B[(H ⊗ K)^⊗M(N)] → B[(H ⊗ K)^⊗N]   (5.5)

such that

  lim_{N→∞} ||TN*(|φ⟩⟨φ|^⊗N) − |ψ⟩⟨ψ|^⊗M(N)||₁ = 0   (5.6)
1 We assume here and in the following that the reader is sufficiently familiar with entropies.
holds with a nonvanishing rate r = lim_{N→∞} M(N)/N. This is done either by distillation (r < 1 if ψ is more entangled than φ) or by “diluting” entanglement, i.e. creating many less entangled states from few highly entangled ones (r > 1).
All this can be performed in a reversible way: We can start with some maximally entangled qubits, dilute them to get many less entangled states, which can be distilled afterwards to get the original states back (again only in an asymptotic sense). The crucial point is that the asymptotic rate r of these processes is given in terms of EvN by r = EvN(|φ⟩⟨φ|)/EvN(|ψ⟩⟨ψ|). Hence we can say, roughly speaking, that EvN(|ψ⟩⟨ψ|) describes exactly the number of maximally entangled qubits contained in |ψ⟩⟨ψ|.
A second somewhat more formal reason is that EvN is the only entanglement
measure on the set of pure states which satisfies the axioms formulated above. In
other words the following “uniqueness theorem for entanglement measures” holds
[182, 217, 74]
Theorem 5.1.2 The reduced von Neumann entropy EvN is the only entanglement
measure on pure states which satisfies Axioms E0 – E5.
5.1.3 Entanglement measures for mixed states
To find reasonable entanglement measures for mixed states is much more difficult.
There are in fact many possibilities (e.g. the maximally entangled fraction intro-
duced in Subsection 3.1.1 can be regarded as a simple measure) and we want to
present therefore only four of the most reasonable candidates. Among those measures which we do not discuss here are negativity quantities ([220] and the references therein), the “best separable approximation” [151], the base norm associated with the set of separable states [219, 188] and ppt-distillation rates [185].
The first measure we want to present is oriented along the discussion of pure
states: We define, roughly speaking, the asymptotic rate with which maximally
entangled qubits can be distilled at most out of a state ρ ∈ S(H ⊗ K) as the
Entanglement of Distillation ED (ρ) of ρ; cf [20]. To be more precise consider all
possible distillation protocols for ρ (cf. Section 4.3), i.e. all sequences of LOCC
channels
TN : B(CdN ⊗ CdN ) → B(H⊗N ⊗ K⊗N ) (5.7)
such that
lim kTN∗ (ρ⊗N ) − |ΩN ihΩN | k1 = 0 (5.8)
N →∞
holds with a sequence of maximally entangled states ΩN ∈ C^dN ⊗ C^dN. Now we can define

  ED(ρ) = sup_{(TN)} limsup_{N→∞} log2(dN)/N.   (5.9)

The Entanglement of Formation EF [24] approaches the problem from the opposite direction: it is the convex extension of the pure state measure EvN to mixed states,

  EF(ρ) = inf { Σ_j λj EvN(|ψj⟩⟨ψj|) | ρ = Σ_j λj |ψj⟩⟨ψj| },

where the infimum is taken over all decompositions of ρ into a convex sum of pure states. EF satisfies E0–E4 and E5a (cf. [24] for E2 and [170] for E4; the rest follows directly from the definition). Whether EF is (weakly) additive (Axiom E5b) is not known. Furthermore it is conjectured that EF coincides with the Entanglement of Cost EC, the asymptotic rate of maximally entangled qubits needed to prepare ρ by LOCC. Proven, however, is only the identity EF^∞ = EC, where the existence of the regularization EF^∞ of EF follows directly from subadditivity.
Another idea to quantify entanglement is to measure the “distance” of the (entangled) ρ from the set of separable states D. It has turned out [216] that among all possible distance functions the relative entropy is physically most reasonable. Hence we define the relative entropy of entanglement as

  ER(ρ) = inf_{σ∈D} S(ρ|σ),  S(ρ|σ) = tr[ρ log2 ρ − ρ log2 σ],   (5.14)

where the infimum is taken over all separable states. It can be shown that ER satisfies, like EF, the Axioms E0–E4 and E5a, where E1 and E2 are shown in [216] and E4 in [73]; the rest follows directly from the definition. It is shown in [221] that ER does not satisfy E5b; cf. also Subsection 5.3. Hence the regularization ER^∞ of ER differs from ER.
Finally let us give some comments on the relation between the measures just introduced. On pure states all measures just discussed coincide with the reduced von Neumann entropy – this follows from Theorem 5.1.2 and the properties stated in the last Subsection. For mixed states the situation is more difficult. It can be shown however that ED ≤ EC holds and that all “reasonable” entanglement measures lie in between [121].
Theorem 5.1.3 For each entanglement measure E satisfying E0, E1, E2 and E5b
and each state ρ ∈ S(H ⊗ K) we have ED (ρ) ≤ E(ρ) ≤ EC (ρ).
Unfortunately no measure we have discussed in the last Subsection satisfies all the assumptions of the theorem. It is possible however to get a similar statement for the regularization E^∞ with weaker assumptions on E itself (in particular without assuming additivity); cf. [74].
5.2 Two qubits
Even more difficult than finding reasonable entanglement measures are explicit cal-
culations. All measures we have discussed above involve optimization processes over
spaces which grow exponentially with the dimension of the Hilbert space. A direct
numerical calculation for a general state ρ is therefore hopeless. There are however
some attempts to get bounds on entanglement measures or explicit calculations for special classes of states. We will restrict this discussion to some relevant special cases: on the one hand we concentrate on EF and ER, and on the other we look at two special classes of states where explicit calculations are possible: two qubit systems in this section and states with symmetry properties in the next one.
5.2.1 Pure states
Assume for the rest of this section that H = C² holds and consider first a pure state ψ ∈ H ⊗ H. To calculate EvN(ψ) is of course not difficult and it is straightforward to see that (cf. for all material of this and the following subsection [24])

  EvN(ψ) = H[ (1 + √(1 − C(ψ)²)) / 2 ]   (5.15)

holds, with the binary entropy

  H(x) = −x log2(x) − (1 − x) log2(1 − x)   (5.16)

and the concurrence C(ψ) of ψ, which is defined by

  C(ψ) = | Σ_{j=0}^3 αj² |  with  ψ = Σ_{j=0}^3 αj Φj,   (5.17)

where Φj, j = 0, ..., 3 denotes the Bell basis (3.3). Since C becomes rather important in the following, let us reexpress it as C(ψ) = |⟨ψ, Ξψ⟩|, where ψ ↦ Ξψ denotes complex conjugation in the Bell basis. Hence Ξ is an antiunitary operator and it can be written as the tensor product Ξ = ξ ⊗ ξ of the map H ∋ φ ↦ σ2 φ̄, where φ̄ denotes complex conjugation in the canonical basis and σ2 is the second Pauli matrix. Hence local unitaries (i.e. those of the form U1 ⊗ U2) commute with Ξ, and it can be shown that this is not only a necessary but also a sufficient condition for a unitary to be local [222].
We see from Equations (5.15) and (5.17) that C(ψ) ranges from 0 to 1 and that EvN(ψ) is a monotone function of C(ψ). The latter can therefore be considered as an entanglement quantity in its own right. For a Bell state we get in particular C(Φj) = 1, while a separable state φ1 ⊗ φ2 leads to C(φ1 ⊗ φ2) = 0; this can be seen easily with the factorization Ξ = ξ ⊗ ξ.

Assume now that one of the αj, say α0, satisfies |α0|² > 1/2. This implies that C(ψ) cannot be zero, since

  | Σ_{j=1}^3 αj² | ≤ Σ_{j=1}^3 |αj|² = 1 − |α0|²   (5.18)

must hold. Hence C(ψ) is at least 2|α0|² − 1 and this implies for EvN and arbitrary ψ

  EvN(ψ) ≥ h( |⟨Φ0, ψ⟩|² )  with  h(x) = H[1/2 + √(x(1−x))] for x ≥ 1/2 and h(x) = 0 for x < 1/2,   (5.19)
and since Φ0 can be replaced here by an arbitrary maximally entangled vector, we can take the supremum over all of them and get

  EvN(ψ) ≥ h[ F(|ψ⟩⟨ψ|) ],   (5.20)

where F(|ψ⟩⟨ψ|) is the maximally entangled fraction of |ψ⟩⟨ψ| which we have introduced in Subsection 3.1.1.
To see that even equality holds in Equation (5.20), note first that it is sufficient to consider the case ψ = a|00⟩ + b|11⟩ with a, b ≥ 0, a² + b² = 1, since each pure state ψ can be brought into this form by a local unitary transformation (this follows again from the Schmidt decomposition), which on the other hand does not change EvN. The maximally entangled state which maximizes |⟨ψ, Φ⟩|² is in this case Φ0 and we get F(|ψ⟩⟨ψ|) = (a + b)²/2 = 1/2 + ab. A straightforward calculation now shows that h[F(|ψ⟩⟨ψ|)] = h(1/2 + ab) = EvN(ψ) holds as stated.
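For two qubits the relations (5.15)–(5.17) are easy to verify numerically, using the spin-flip form C(ψ) = |⟨ψ, Ξψ⟩| with Ξψ = (σ2 ⊗ σ2)ψ̄ in the computational basis (a sketch; helper names are ours):

```python
import numpy as np

sy = np.array([[0, -1j], [1j, 0]])
XI = np.kron(sy, sy)          # together with conjugation this is the map Xi

def concurrence_pure(psi):
    # C(psi) = |<psi, Xi psi>| with Xi psi = (sy (x) sy) conj(psi)
    return float(abs(psi.conj() @ (XI @ psi.conj())))

def H2(x):
    """Binary entropy (5.16)."""
    if x <= 0 or x >= 1:
        return 0.0
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

def EvN_pure(psi):
    """Reduced von Neumann entropy of a two-qubit pure state."""
    M = psi.reshape(2, 2)
    lam = np.linalg.eigvalsh(M @ M.conj().T)   # spectrum (lam, 1 - lam)
    return H2(float(lam[0]))
```

For ψ = a|00⟩ + b|11⟩ one finds C(ψ) = 2ab, and EvN(ψ) agrees with H applied to (1 + √(1 − C²))/2 as claimed in (5.15).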
5.2.2 EOF for Bell diagonal states
It is easy to extend the inequality (5.20) to mixed states if we use the convexity of EF and the fact that EF coincides with EvN on pure states. Hence (5.20) becomes

  EF(ρ) ≥ h[ F(ρ) ].   (5.21)

Consider now a Bell diagonal state ρ with highest eigenvalue λ1 > 1/2. There is a decomposition ρ = Σ_j µj |Ψj⟩⟨Ψj| into pure states whose reduced von Neumann entropy each equals h(λ1) [24]; hence EF(ρ) ≤ Σ_j µj EvN(|Ψj⟩⟨Ψj|) = h(λ1). Since the maximally entangled fraction of ρ is obviously λ1, we see with (5.21) that equality holds, i.e. EF(ρ) = h(λ1).
Assume now that the highest eigenvalue is less than 1/2. Then we can find phase factors exp(iφj) such that Σ_{j=0}^3 exp(iφj) λj = 0 holds and ρ can be expressed as a convex linear combination of the states

  e^(iφ0/2) √λ0 Φ0 + Σ_{j=1}^3 (± e^(iφj/2) √λj) Φj.   (5.23)

By (5.17) each of these states has concurrence |Σ_j e^(iφj) λj| = 0, hence EF(ρ) = 0 for all Bell diagonal states with highest eigenvalue at most 1/2.
Figure 5.1: Entanglement of Formation and Relative Entropy of Entanglement for Bell diagonal states, plotted as a function of the highest eigenvalue λ of ρ.
with

  R = [ √ρ ΞρΞ √ρ ]^(1/2).   (5.26)

Here we have set ρ = |ψ⟩⟨ψ|. The definition of the hermitian matrix R however makes sense for arbitrary ρ as well. If we write λj, j = 1, ..., 4 for the eigenvalues of R, with λ1 without loss of generality the biggest one, we can define the concurrence of an arbitrary two qubit state ρ as [238]

  C(ρ) = max(0, 2λ1 − tr(R)) = max(0, λ1 − λ2 − λ3 − λ4).   (5.27)
It is easy to see that C(|ψ⟩⟨ψ|) coincides with C(ψ) from (5.17). The crucial point is now that Equation (5.15) holds for EF(ρ) if we insert C(ρ) instead of C(ψ):

Theorem 5.2.2 (Wootters Formula) The Entanglement of Formation of a two qubit system in a state ρ is given by

  EF(ρ) = H[ (1 + √(1 − C(ρ)²)) / 2 ],   (5.28)

where the concurrence of ρ is given in Equation (5.27) and H denotes the binary entropy from (5.16).
Since the proof is much more involved than the simple case discussed in Subsection 5.2.2 we omit it and refer to [238] instead. Note however that Equation (5.28) really coincides with the special cases we have derived for pure and Bell diagonal states. Finally let us add the remark that there is no analogue of Wootters’ formula for higher dimensional Hilbert spaces. It can be shown [222] that the essential properties of the Bell basis Φj, j = 0, ..., 3 which would be necessary for such a generalization are available only in 2 × 2 dimensions.
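Wootters’ formula lends itself to direct computation: the eigenvalues of R coincide with the square roots of the (real, nonnegative) eigenvalues of ρρ̃ with ρ̃ = ΞρΞ, which avoids taking operator square roots (an illustrative sketch, helper names are ours):

```python
import numpy as np

sy = np.array([[0, -1j], [1j, 0]])
XI = np.kron(sy, sy).real     # spin flip: rho~ = XI conj(rho) XI

def concurrence(rho):
    """Concurrence (5.27) of a two-qubit density matrix."""
    rho_t = XI @ rho.conj() @ XI
    lam = np.sort(np.sqrt(np.abs(np.linalg.eigvals(rho @ rho_t).real)))[::-1]
    return float(max(0.0, lam[0] - lam[1] - lam[2] - lam[3]))

def eof(rho):
    """Entanglement of Formation via Wootters' formula (5.28)."""
    c = min(concurrence(rho), 1.0)    # clip rounding noise
    x = (1 + np.sqrt(1 - c ** 2)) / 2
    if x >= 1.0:
        return 0.0
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))
```

For Bell diagonal states this reproduces C(ρ) = max(0, 2λ1 − 1) and hence the result of Subsection 5.2.2.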
5.2.4 Relative entropy for Bell diagonal states
Calculating the Relative Entropy of Entanglement ER for two qubit systems is more difficult. However, there is at least an easy formula for Bell diagonal states, which we will give in the following [216].
Proposition 5.2.3 The Relative Entropy of Entanglement of a Bell diagonal state ρ with highest eigenvalue λ is given by (cf. Figure 5.1)

  ER(ρ) = 1 − H(λ) for λ > 1/2, and ER(ρ) = 0 for λ ≤ 1/2.   (5.29)
Proof. For a Bell diagonal state ρ = Σ_{j=0}^3 λj |Φj⟩⟨Φj| we have to calculate

  ER(ρ) = inf_{σ∈D} tr[ρ log2 ρ − ρ log2 σ]   (5.30)
        = tr(ρ log2 ρ) + inf_{σ∈D} ( −Σ_{j=0}^3 λj ⟨Φj, log2(σ)Φj⟩ ).   (5.31)

Since log2 is a concave function we have −log2⟨Φj, σΦj⟩ ≤ ⟨Φj, −log2(σ)Φj⟩ and therefore

  ER(ρ) ≥ tr(ρ log2 ρ) + inf_{σ∈D} ( −Σ_{j=0}^3 λj log2⟨Φj, σΦj⟩ ).   (5.32)
Hence only the diagonal elements of σ in the Bell basis enter the minimization on the right hand side of this inequality, and this implies that we can restrict the infimum to the set of separable Bell diagonal states. Since a Bell diagonal state is separable iff all its eigenvalues do not exceed 1/2 (Proposition 5.2.1) we get

  ER(ρ) ≥ tr(ρ log2 ρ) + inf_{pj ∈ [0,1/2]} ( −Σ_{j=0}^3 λj log2 pj ),  with  Σ_{j=0}^3 pj = 1.   (5.33)
This is an optimization problem (with constraints) over only four real parameters and easy to solve. If the highest eigenvalue of ρ is greater than 1/2 we get p1 = 1/2 and pj = λj/(2 − 2λ) for j ≠ 1, where we have chosen without loss of generality λ = λ1. This gives a lower bound on ER(ρ) which is achieved if we insert the corresponding σ in Equation (5.31). Hence we have proven the statement for λ > 1/2, which completes the proof, since we have already seen that λ ≤ 1/2 implies that ρ is separable (Proposition 5.2.1). □
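The computation in the proof can be checked numerically: the relative entropy of a Bell diagonal ρ with respect to the optimal separable σ (p1 = 1/2, pj = λj/(2 − 2λ) for j ≠ 1) reduces to a classical relative entropy, since both states are diagonal in the Bell basis, and reproduces 1 − H(λ) (a sketch with our own helper names):

```python
import numpy as np

def H2(x):
    """Binary entropy (5.16)."""
    if x <= 0 or x >= 1:
        return 0.0
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

def er_bell_diagonal(lams):
    """Eq. (5.29): E_R of a Bell diagonal state with spectrum lams."""
    lam = max(lams)
    return 0.0 if lam <= 0.5 else 1.0 - H2(lam)

def kl2(p, q):
    """Classical relative entropy (base 2), applicable because rho and
    the optimal sigma are diagonal in the same (Bell) basis."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float((p[m] * np.log2(p[m] / q[m])).sum())
```

For instance for the spectrum (0.7, 0.15, 0.1, 0.05) the optimal σ has Bell-basis diagonal (0.5, 0.25, 1/6, 1/12), and the two expressions agree.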
The equality in the last equation is of course a non-trivial statement which has to be proved. We skip this point, however, and refer the reader to [221]. The advantage of this scheme rests on the fact that spaces of G-invariant states are in general very low dimensional (if G is not too small). Hence the optimization problem contained in step 3 has a much bigger chance to be tractable than the one we have to solve for the original definition of EF. There is of course no guarantee that any of these three steps can be carried out in a concrete situation. For the three examples mentioned above, however, results are available, which we will present in the following.
5.3.2 Werner states
Let us start with Werner states [221]. In this case ρ is uniquely determined by its flip expectation value tr(ρF) (cf. Subsection 3.1.2). To determine Φ ∈ H ⊗ H such that PUU|Φ⟩⟨Φ| = ρ holds, we therefore have to solve the equation

  ⟨Φ, FΦ⟩ = Σ_{jk} Φ̄jk Φkj = tr(Fρ),   (5.36)

where Φjk denote the components of Φ in the canonical basis. On the other hand the reduced density matrix tr1|Φ⟩⟨Φ| has the matrix elements Σ_l Φjl Φ̄kl.
By exploiting U ⊗ U invariance we can assume without loss of generality that this reduced density matrix is diagonal. Hence to get the function ε_UU we have to minimize

  EvN(|Φ⟩⟨Φ|) = Σ_j S( Σ_k |Φjk|² )   (5.37)

under the constraint (5.36), where S(x) = −x log2(x). We skip these calculations here (see [221] instead) and state only the results. For tr(Fρ) ≥ 0 we get ε_UU(ρ) = 0 (as expected, since ρ is separable in this case) and with H from (5.16)

  ε_UU(ρ) = H[ (1 − √(1 − tr(Fρ)²)) / 2 ]   (5.38)
85 5.3. Entanglement measures under symmetry
Figure 5.2: Entanglement of Formation for Werner states, plotted as a function of the flip expectation.
for tr(Fρ) < 0. The minima are attained for Φ where all Φjk except one diagonal element are zero in the case tr(Fρ) ≥ 0, and for Φ with only two (non-diagonal) coefficients Φjk, Φkj, j ≠ k nonzero if tr(ρF) < 0. The function ε_UU is convex and coincides therefore with its convex hull, such that we get

Proposition 5.3.1 For any Werner state ρ the Entanglement of Formation is given by (cf. Figure 5.2)

  EF(ρ) = H[ (1 − √(1 − tr(Fρ)²)) / 2 ] for tr(Fρ) < 0, and EF(ρ) = 0 for tr(Fρ) ≥ 0.   (5.39)
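Formula (5.39) is easily evaluated; as a consistency check, the totally antisymmetric Werner state with tr(Fρ) = −1 carries exactly one bit of entanglement (a sketch; the helper name is ours):

```python
import numpy as np

def H2(x):
    """Binary entropy (5.16)."""
    if x <= 0 or x >= 1:
        return 0.0
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

def ef_werner(f):
    """Eq. (5.39): Entanglement of Formation of a Werner state with
    flip expectation f = tr(F rho) in [-1, 1]."""
    if f >= 0:
        return 0.0
    return H2((1 - np.sqrt(1 - f * f)) / 2)
```

The function vanishes continuously at the separability boundary tr(Fρ) = 0 and increases monotonically towards 1 as tr(Fρ) → −1.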
with V = U1ᵀU2 and after inserting the definition of F̃. Following our general scheme, we have to minimize EvN(|Φ⟩⟨Φ|) under the constraint given in Equation
Figure 5.3: The function ε_UŪ for isotropic states (d = 2, 3, 4) plotted as a function of the flip expectation tr(ρF̃). For d > 2 it is not convex near the right endpoint.
(5.42). This is explicitly done in [210]. We will only state the result here, which leads to the function

  ε_UŪ(ρ) = H(γ) + (1 − γ) log2(d − 1) for tr(ρF̃) ≥ 1, and ε_UŪ(ρ) = 0 for tr(ρF̃) < 1,   (5.43)

with

  γ = (1/d²) [ √(tr(ρF̃)) + √((d − 1)(d − tr(ρF̃))) ]².   (5.44)
For d ≥ 3 this function is not convex (cf. Figure 5.3), hence we get

Proposition 5.3.2 For any isotropic state ρ the Entanglement of Formation is given as the convex hull

  EF(ρ) = inf { Σ_j λj ε_UŪ(σj) | ρ = Σ_j λj σj, PUŪ σj = σj }.   (5.45)
Figure 5.5: Relative Entropy of Entanglement for Werner states, plotted as a function of the flip expectation.
The sets of Werner and isotropic states are just intervals and the corresponding
separable states form subintervals over which we have to perform the optimization.
Due to the convexity of the relative entropy in both arguments, however, it is
clear that the minimum is attained exactly at the boundary between entangled and
separable states. For Werner states this is the state σ0 with tr(F σ0 ) = 0, i.e. it
gives equal weight to both minimal projections. To get ER (ρ) for a Werner state ρ
we have to calculate therefore only the relative entropy with respect to this state.
Since all Werner states can be simultaneously diagonalized this is easily done and we get:

  ER(ρ) = 1 − H( (1 + tr(Fρ))/2 ).   (5.46)
Similarly, the boundary point σ1 for isotropic states is given by tr(F̃σ1) = 1, which leads to

  ER(ρ) = log2 d − (1 − tr(F̃ρ)/d) log2(d − 1) − S( tr(F̃ρ)/d, 1 − tr(F̃ρ)/d )   (5.47)

for each entangled isotropic state ρ, and 0 if ρ is separable. (S(p1, p2) denotes here the entropy of the probability vector (p1, p2).)
[Figure: Relative Entropy of Entanglement for isotropic states for d = 2, 3, 4, plotted as a function of tr(ρF̃).]
Let us now consider OO-invariant states. As for EOF we divide the state space
into the separable square and the three triangles A, B, C; cf. Figure 5.4. The state
at the coordinates (1, d) is a maximally entangled state and all separable states on
the line connecting (0, 1) with (1, 1) minimize the relative entropy for this state.
Hence consider a particular state σ on this line. The convexity property of the
relative entropy immediately shows that σ is a minimizer for all states on the line
connecting σ with the state at (1, d). In this way it is easy to calculate ER(ρ) for
all ρ in A. In a similar way we can treat the triangle B: We just have to draw a line
from ρ to the state at (−1, 0) and find the minimizer for ρ at the intersection with
the separable border between (0, 0) and (0, 1). For all states in the triangle C the
relative entropy is minimized by the separable state at (0, 1).
An application of the scheme just reviewed is a proof that ER is not additive, i.e. it does not satisfy Axiom E5b. To see this consider the state ρ = tr(P−)⁻¹ P−, where P− denotes the projector onto the antisymmetric subspace. It is a Werner state with flip expectation −1 (i.e. it corresponds to the point (−1, 0) in Figure 5.4). According to our discussion above S(ρ| · ) is minimized in this case by the separable state σ0, and we get ER(ρ) = 1 independently of the dimension d. The tensor product ρ^⊗2 can be regarded as a state in S(H^⊗2 ⊗ H^⊗2) with U ⊗ U ⊗ V ⊗ V symmetry, where U, V are unitaries on H. Note that the corresponding state space of UUVV invariant states can be parameterized by the expectations of the three operators F ⊗ 1I, 1I ⊗ F and F ⊗ F (cf. [221]), and we can apply the machinery just described to get the minimizer σ̃ of S(ρ^⊗2| · ). If d > 2 holds it turns out that

  σ̃ = (d + 1)/(2d tr(P+)²) P+ ⊗ P+ + (d − 1)/(2d tr(P−)²) P− ⊗ P−   (5.48)
holds (where P± denote the projections onto the symmetric and antisymmetric subspaces of H ⊗ H) and not σ̃ = σ0 ⊗ σ0 as one would expect. As a consequence we get the inequality

  ER(ρ^⊗2) = 2 − log2( (2d − 1)/d ) < 2 = S(ρ^⊗2|σ0^⊗2) = 2 ER(ρ).   (5.49)
The cb-norm improves the sometimes annoying property of the usual operator norm that quantities like ||T ⊗ Id_B(C^d)|| may increase with the dimension d. On infinite dimensional observable algebras ||T||_cb can be infinite although each term in the supremum is finite. A particular example for a map with such behavior is the transposition on an infinite dimensional Hilbert space. A map with finite cb-norm is therefore called completely bounded. In a finite dimensional setup each linear map is completely bounded. For the transposition Θ on C^d we have in particular ||Θ||_cb = d. The cb-norm has some nice features which we will use frequently; this includes its multiplicativity ||T1 ⊗ T2||_cb = ||T1||_cb ||T2||_cb and the fact that ||T||_cb = 1 holds for each (unital) channel. Another useful relation is ||T||_cb = ||T ⊗ Id_B(H)||, which holds if T is a map B(H) → B(H). For more properties of the cb-norm we refer to [178].
91 6.1. Definition and elementary properties
where the infimum is taken over all encoding and decoding channels E : A2 → B2
respectively D : B1 → A1 . The map S plays the role of a reference channel and
∆(T, S) is the minimal error we have to take into account if we want to simulate S
by T and appropriate encodings and decodings. If we try in particular to transmit
B systems through T we have to choose B1 = B2 = B and S = IdB . In this case we
write
∆(T, B) = ∆(T, Id_B) = inf_{E,D} ‖E T D − Id_B‖cb .   (6.3)
In Section 4.4, we have seen that we can reduce the error if we take M copies of the
channel instead of just one. More generally we are interested in the transmission
of “codewords of length” N , i.e. B^⊗N systems, using M copies of the channel T. Encodings and decodings are in this case channels of the form E : A₂^⊗M → B^⊗N respectively D : B^⊗N → A₁^⊗M. If we increase the number M of channels the error
∆(T ⊗M , B ⊗N ) decreases provided the rate with which N grows as a function of
M is not too large. A more precise formulation of this idea leads to the following
definition.
Definition 6.1.1 A number c ≥ 0 is called achievable rate for a channel T with
respect to a reference channel S, if for any pair of sequences Mj , Nj , j ∈ N with
Mj → ∞ and lim sup_{j→∞} Nj /Mj < c we have

lim_{j→∞} ∆(T^⊗Mj , S^⊗Nj ) = 0.   (6.4)
The supremum of all achievable rates is called the capacity of T with respect to S
and denoted by C(T, S). If S is the ideal channel on an observable algebra B we
write C(T, B) instead of C(T, IdB ). Similarly we write C(A, S) if T is an ideal A
channel.
Note that by definition c = 0 is an achievable rate hence C(T, S) ≥ 0. If on
the other hand each c > 0 is achievable we write C(T, B) = ∞. At a first look
it seems cumbersome to check all pairs of sequences with given upper ratio when
testing c. Due to some monotonicity properties of ∆, however, it can be shown that
it is sufficient to check only one sequence provided the Mj satisfy the additional
condition Mj /(Mj+1 ) → 1. This is the subject of the following lemma.
Lemma 6.1.2 Let (Mα )α∈N be a strictly increasing sequence of integers such that
limα Mα+1 /Mα = 1. Suppose Nα are integers such that limα ∆(T^⊗Mα , S^⊗Nα ) = 0. Then any

c < lim inf_α Nα /Mα   (6.5)

is an achievable rate. Moreover, if the errors decrease exponentially, in the sense that ∆(T^⊗Mα , S^⊗Nα ) ≤ µ e^{−λMα} (µ, λ ≥ 0), then they decrease exponentially for M → ∞ with rate

lim inf_{M→∞} −(1/M ) ln ∆(T^⊗M , S^⊗⌊cM⌋ ) ≥ λ,   (6.6)
for sequences Mj , Nj , j ∈ N with c > limj→∞ Nj /Mj > log2 f / log2 d. This implies
This implies that there is a j₀ ∈ ℕ such that dim M_d^⊗Nj > dim M_f^⊗Mj holds for all j > j₀ . Therefore each decoding map D : M_d^⊗Nj → M_f^⊗Mj must have a nontrivial kernel. Let A ∈ M_d^⊗Nj with D(A) = 0 and ‖A‖ = 1. Then we have for any k ∈ ℕ and B ∈ M_k with ‖B‖ = 1:

‖ED − Id‖cb ≥ ‖(ED − Id) ⊗ Id‖ ≥ ‖(ED − Id)(A) ⊗ Id(B)‖ = 1.   (6.12)

Hence ∆(M_f^⊗Mj , M_d^⊗Nj ) ≥ 1 for all j > j₀ , in contradiction to (6.11), which implies C(M_f , M_d ) = log₂ f / log₂ d. Similar reasoning holds for C(C_f , C_d ) and C(M_f , C_d ), and the proof is complete¹. 2
In the previous proposition we have excluded the case C(Cf , Md ), i.e. the quan-
tum capacity of an ideal classical channel. From the “no-teleportation theorem”
we expect that this quantity is zero. For a proof of this statement it is useful to
introduce first a simple upper bound on C(T, Md ) (cf. [116])
¹ For the classical capacity of a quantum channel C(M_f , C_d ), it is, however, more difficult to derive an analog of the error estimate (6.12). We skip this part nevertheless and leave the corresponding details to the reader.
where we have used for the last equation the fact that Dj and ΘEj Θ are channels
and that the cb-norm is multiplicative. Taking logarithms on both sides we get
Nj /Mj + log_d (1 − ε)/Mj ≤ log_d ‖ΘT‖cb ,   (6.17)
where we have used for the last inequality the fact that the cb-norm of a channel is
one. If c1 is an achievable rate of T1 with respect to T2 such that limj→∞ Nj /Mj < c1
and c2 is an achievable rate of T2 with respect to T3 such that limj→∞ Mj /Kj < c2
(i.e. the sequences of quotients converge) we see that
lim inf_{j→∞} Nj /Kj = lim inf_{j→∞} (Nj /Mj )(Mj /Kj ) ≤ lim_{j→∞} Nj /Mj · lim_{k→∞} Mk /Kk .   (6.22)
6. Channel capacity 94
Hence each c < c1 c2 is achievable. Since C(T1 , T3 ) is the supremum over all achiev-
able rates we get (6.18). 2
closely related to our version. It is not yet clear whether equality holds. There might be subtle
differences [147].
95 6.2. Coding theorems
When all channels are ideal, or when all systems involved are classical, even equality holds, i.e. channel capacities are additive in this case. If quantum channels are considered, however, it is one of the big open problems of the field to decide under which conditions additivity holds.
6.1.3 Relations to entanglement measures
The duality lemma proved in Subsection 2.3.3 provides an interesting way to de-
rive bounds on channel capacities and capacity like quantities from entanglement
measures (and vice versa) [24, 122]: To derive a state of a bipartite system from a
channel T we can take a maximally entangled state Ψ ∈ H ⊗ H, send one particle
through T and get a less entangled pair in the state ρT = (Id ⊗T ∗ )|ΨihΨ|. If on the
other hand an entangled state ρ ∈ S(H ⊗ H) is given, we can use it as a resource
for teleportation and get a channel Tρ . The two maps ρ 7→ Tρ and T 7→ ρT are,
however, not inverse to one another. This can be seen easily from the duality lemma
(Theorem 2.3.5): For each state ρ ∈ S(H ⊗ H) there is a channel T and a pure state
Φ ∈ H ⊗ H such that ρ = (Id ⊗T ∗ )|ΦihΦ| holds; but Φ is in general not maximally
entangled (and uniquely determined by ρ). Nevertheless, there are special cases in
which the state derived from Tρ coincides with ρ: A particular class of examples is
given by teleportation channels derived from a Bell-diagonal state.
On ρT we can evaluate an entanglement measure E(ρT ) and get in this way a
quantity which is related to the capacity of T . A particularly interesting candidate
for E is the “one-way LOCC” distillation rate ED,→ . It is defined in the same way
as the entanglement of distillation ED , except that only one-way LOCC operations
are allowed in Equation (5.8). According to [24] ED,→ is related to Cq by the
inequalities ED,→ (ρ) ≥ Cq (Tρ ) and ED,→ (ρT ) ≤ Cq (T ). Hence if ρTρ = ρ we can
calculate ED,→ (ρ) in terms of Cq (Tρ ) and vice versa.
A second interesting example is the transposition bound Cθ (T ) introduced in
the last subsection. It is related to the logarithmic negativity [220]

Eθ (ρ) = log₂ ‖(Id ⊗ Θ)ρ‖₁ ,

which measures the degree to which the partial transpose of ρ fails to be positive.
Eθ can be regarded as an entanglement measure although it has some drawbacks: it is not an LOCC monotone (Axiom E2), it is not convex (Axiom E3) and, most severely, it does not coincide with the reduced von Neumann entropy on pure states, which we
have considered as “the” entanglement measure for pure states. On the other hand
it is easy to calculate and it gives bounds on distillation rates and teleportation
capacities [220]. In addition Eθ can be used together with the relation between
depolarizing channels and isotropic states to derive Equation (6.50) in a very simple
way.
6.2 Coding theorems
To determine channel capacities directly in terms of Definition 6.1.1 is fairly difficult, because optimization problems in spaces whose dimensions grow exponentially are involved. This renders in particular each direct numerical approach practically
impossible. It is therefore an important task of (quantum) information theory to
express channel capacities in terms of quantities which are easier to compute. In
this section we will review the most important of these “coding theorems”.
6.2.1 Shannon’s theorem
Let us consider first a classical to classical channel T : C(Y ) → C(X). This is
basically the situation of classical information theory and we will only have a short
look here – mainly to show how this (well known) situation fits into the general
scheme described in the last section3 .
First of all we have to calculate the error quantity ∆(T, C2 ) defined in Equation
(6.2). As stated in Subsection 3.2.3, T is completely determined by its transition probabilities Txy , (x, y) ∈ X × Y , describing the probability to receive y ∈ Y when x ∈ X was sent. Since the cb-norm for a classical algebra coincides with the ordinary
norm we get (we have set X = Y for this calculation):
‖ Id − T ‖cb = ‖ Id − T ‖ = sup_{x,f} | Σ_y (δxy − Txy ) fy |   (6.32)
             = 2 sup_x (1 − Txx ),   (6.33)
where the supremum in the first equation is taken over all f ∈ C(X) with kf k =
supy |fy | ≤ 1. We see that the quantity in Equation (6.33) is exactly twice the
maximal error probability, i.e. the maximal probability of sending x and getting
anything different. Inserting this quantity for ∆ in Definition 6.1.1 applied to a
classical channel T and the “bit-algebra” B = C2 , we get exactly Shannon’s classical
definition of the capacity of a discrete memoryless channel [191].
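As a small numerical cross-check (an added illustration, not part of the original text), the two expressions (6.32) and (6.33) can be compared directly; for a binary symmetric channel with flip probability 0.1 both give twice the maximal error probability:

```python
import numpy as np
from itertools import product

def error_norm(T):
    """Eq. (6.33): twice the maximal error probability, 2 max_x (1 - T_xx)."""
    return 2.0 * max(1.0 - T[x, x] for x in range(T.shape[0]))

def error_norm_direct(T):
    """Eq. (6.32): sup over x and |f_y| <= 1; the sup is attained at f in {-1,+1}^n."""
    n = T.shape[0]
    best = 0.0
    for f in product((-1.0, 1.0), repeat=n):
        for x in range(n):
            s = sum(((1.0 if x == y else 0.0) - T[x, y]) * f[y] for y in range(n))
            best = max(best, abs(s))
    return best

bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])   # binary symmetric channel, flip probability 0.1
print(error_norm(bsc), error_norm_direct(bsc))   # both ≈ 0.2
```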
Hence we can apply Shannon’s noisy channel coding theorem to calculate C c (T )
for a classical channel. To state it we have to introduce first some terminology.
Consider therefore a state p ∈ C*(X) of the classical input algebra C(X) and its image q = T*(p) ∈ C*(Y) under the channel. p and q are probability distributions on X respectively Y , and px can be interpreted as the probability that the “letter” x ∈ X was sent. Similarly qy = Σ_x Txy px is the probability that y ∈ Y was received and Pxy = Txy px is the probability that x ∈ X was sent and y ∈ Y was received.
The family of all Pxy can be interpreted as a probability distribution P on X × Y
and the Txy can be regarded as conditional probability of P under the condition x.
Now we can introduce the mutual information
I(p, T ) = S(p) + S(q) − S(P ) = Σ_{(x,y)∈X×Y} Pxy log₂ ( Pxy /(px qy ) ),   (6.34)
where S(p), S(q) and S(P ) denote the entropies of p, q and P . The mutual infor-
mation describes, roughly speaking, the information that p and q contain about
each other. E.g. if p and q are completely uncorrelated (i.e. Pxy = px qy ) we get
I(p, T ) = 0. If T is on the other hand an ideal bit-channel and p equally distributed
we have I(p, T ) = 1. Now we can state Shannon’s Theorem which expresses the
classical capacity of T in terms of mutual informations [191]:
Theorem 6.2.1 (Shannon) The classical capacity Cc (T ) of a classical communication channel T : C(Y ) → C(X) is given by

Cc (T ) = sup_p I(p, T ),

where p runs over all probability distributions on X.
where the supremum is taken over all probability distributions pj and collections of
density operators ρj .
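To make Theorem 6.2.1 concrete, here is a small sketch (an added illustration; the Blahut-Arimoto iteration is a standard method for the maximization, not taken from this text). T[x, y] denotes the probability to receive y when x was sent; for the binary symmetric channel with flip probability 0.1 the result reproduces the textbook value 1 − H₂(0.1):

```python
import numpy as np

def mutual_information(p, T):
    """I(p, T) of Eq. (6.34); T[x, y] = probability to receive y when x was sent."""
    q = p @ T                       # output distribution q_y = sum_x p_x T_xy
    I = 0.0
    for x in range(T.shape[0]):
        for y in range(T.shape[1]):
            P = p[x] * T[x, y]      # joint distribution P_xy
            if P > 0:
                I += P * np.log2(P / (p[x] * q[y]))
    return I

def blahut_arimoto(T, iterations=200):
    """Approximate C_c(T) = sup_p I(p, T) by the standard Blahut-Arimoto iteration."""
    n = T.shape[0]
    p = np.full(n, 1.0 / n)
    for _ in range(iterations):
        q = p @ T
        D = np.array([sum(T[x, y] * np.log2(T[x, y] / q[y])
                          for y in range(T.shape[1]) if T[x, y] > 0)
                      for x in range(n)])
        p = p * 2.0 ** D            # reweight inputs by their information content
        p /= p.sum()
    return mutual_information(p, T)

def H2(e):
    return -e * np.log2(e) - (1 - e) * np.log2(1 - e)

bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
print(blahut_arimoto(bsc), 1 - H2(0.1))   # both ≈ 0.531
```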
6.2.3 Entanglement assisted capacity
Another classical capacity of a quantum channel arises, if we use dense coding
schemes instead of simple encodings and decodings to transmit the data through
the channel T . In other words we can define the entanglement enhanced classical
capacity Ce (T ) in the same way as Cc (T ) but by replacing the encoding and decod-
ing channels in Definition 6.1.1 and Equation (6.2) by dense coding protocols. Note
that this implies that the sender Alice and the receiver Bob share an (arbitrary)
amount of (maximally) entangled states prior to the transmission.
For this quantity a coding theorem was recently proven by Bennett and others
[26] which we want to state in the following. To this end assume that we are trans-
mitting systems in the state ρ ∈ B ∗ (H) through the channel and that ρ has the
purification Ψ ∈ H ⊗ H, i.e. ρ = tr1 |ΨihΨ| = tr2 |ΨihΨ|. Then we can define the
entropy exchange

S(ρ, T ) = S[ (T ⊗ Id)|ΨihΨ| ].   (6.38)

The density operator (T ⊗ Id)|ΨihΨ| has the output state T ∗ (ρ) and the input
state ρ as its partial traces. It can be regarded therefore as the quantum analog of
the input/output probability distribution Pxy defined in Subsection 6.2.1. Another
way to look at S(ρ, T ) is in terms of an ancilla representation of T : If T ∗ (ρ) =
trK (U ρ ⊗ ρK U ∗ ) with a unitary U on H ⊗ K and a pure environment state ρK it
can be shown [13] that S(ρ, T ) = S [TK∗ ρ] where TK is the channel describing the
information transfer into the environment, i.e. TK∗ (ρ) = trH (U ρ ⊗ ρK U ∗ ), in other
words S(ρ, T ) is the final entropy of the environment. Now we can define
I(ρ, T ) = S(ρ) + S(T ∗ ρ) − S(ρ, T ) (6.39)
which is the quantum analog of the mutual information given in Equation (6.34).
It has a number of nice properties, in particular positivity, concavity with respect
to the input state and additivity [3] and its maximum with respect to ρ coincides
actually with Ce (T ) [26].
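The quantities (6.38) and (6.39) are easy to evaluate numerically. The following sketch (an added illustration; the specification of the channel by Kraus operators and all function names are ours) computes I(ρ, T ) from a purification; for the ideal channel the entropy exchange vanishes and I(ρ, T ) = 2S(ρ), in accordance with the factor of two gained by entanglement assistance:

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    return float(-sum(v * np.log2(v) for v in ev if v > 1e-12))

def purification(rho):
    """|Psi> in H (x) H whose partial trace over the second factor is rho."""
    ev, U = np.linalg.eigh(rho)
    d = rho.shape[0]
    psi = np.zeros(d * d, dtype=complex)
    for k in range(d):
        if ev[k] > 0:
            psi += np.sqrt(ev[k]) * np.kron(U[:, k], np.eye(d)[:, k])
    return psi

def quantum_mutual_info(kraus, rho):
    """I(rho, T) = S(rho) + S(T* rho) - S(rho, T), Eqs. (6.38)/(6.39)."""
    d = rho.shape[0]
    psi = purification(rho)
    big = sum(np.kron(K, np.eye(d)) @ np.outer(psi, psi.conj())
              @ np.kron(K, np.eye(d)).conj().T for K in kraus)
    S_exchange = entropy(big)                       # entropy exchange S(rho, T)
    out = sum(K @ rho @ K.conj().T for K in kraus)  # output state T* rho
    return entropy(rho) + entropy(out) - S_exchange

rho = np.diag([0.7, 0.3]).astype(complex)
ident = [np.eye(2, dtype=complex)]
print(quantum_mutual_info(ident, rho))   # ideal channel: 2 S(rho) ≈ 1.76
```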
Theorem 6.2.3 The entanglement assisted capacity Ce (T ) of a quantum channel
T : B(H) → B(H) is given by

Ce (T ) = sup_ρ I(ρ, T ).
Here S(T ∗ ρ) is the entropy of the output state and S(ρ, T ) is the entropy exchange
defined in Equation (6.38). It is argued [13] that J(ρ, T ) plays a role in quantum
information theory which is analogous to that of the (classical) mutual information
(6.34) in classical information theory. J(ρ, T ) has some nasty properties, however: it
can be negative [51] and it is known to be not additive [71]. To relate it to Cq (T ) it
is therefore not sufficient to consider a one-shot capacity as in Shannon’s Theorem
(Thm 6.2.1). Instead we have to define
Cs (T ) = sup_N (1/N ) Cs,1 (T^⊗N )  with  Cs,1 (T ) = sup_ρ J(ρ, T ).   (6.42)
holds for any channel. Cθ is in many cases a weaker bound than Cs ; however, it is much easier to calculate and it is particularly useful if we want to identify cases where the quantum capacity is zero (e.g. the quantum capacity of a classical channel discussed in Corollary 6.1.5).
Finally we want to mention that lower bounds can be derived in terms of rates which can be achieved with particular coding schemes; cf. e.g. [24, 99, 71, 158]. A detailed discussion of this approach is given in the next chapter.
6.2.5 Examples
Although the expressions provided in the coding theorems above are much easier to calculate than the original definitions, they still involve some optimization problems
over possibly large parameter spaces. Nevertheless there are special cases which
allow explicit calculations. As a first example we will consider the “quantum erasure
channel” which transmits with probability 1−ϑ the d-dimensional input state intact
This example is very unusual, because all capacities discussed up to now can be
calculated explicitly: We get Cc,1 (T ) = Cc (T ) = (1 − ϑ) log2 (d) for the clas-
sical, Ce (T ) = 2Cc (T ) for the entanglement enhanced classical capacity and
Cq (T ) = max(0, (1 − 2ϑ) log2 (d)) for the quantum capacity [23, 25]. Hence the
gain by entanglement assistance is exactly a factor two; cf. Figure 6.1.
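The closed formulas for the erasure channel are simple enough to tabulate directly (a small added illustration of the values plotted in Figure 6.1):

```python
import numpy as np

def erasure_capacities(theta, d):
    """Capacities of the d-dimensional quantum erasure channel with erasure
    probability theta, as quoted from [23, 25]."""
    Cc = (1 - theta) * np.log2(d)                 # classical capacity
    Ce = 2 * (1 - theta) * np.log2(d)             # entanglement assisted
    Cq = max(0.0, (1 - 2 * theta) * np.log2(d))   # quantum capacity
    return Cc, Ce, Cq

for theta in (0.0, 0.25, 0.5, 0.75, 1.0):
    Cc, Ce, Cq = erasure_capacities(theta, d=2)
    print(theta, Cc, Ce, Cq)   # Ce = 2 Cc throughout; Cq vanishes for theta >= 1/2
```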
Figure 6.1: Capacities of the quantum erasure channel (classical capacity Cc (T ), entanglement enhanced classical capacity Ce (T ), quantum capacity Cq (T )) plotted as a function of the error probability ϑ.
the first term under the sup in Equation (6.37) becomes maximal and the second becomes minimal: Σ_j pj T ∗ ρj is maximally mixed in this case and its entropy is therefore maximal. The entropies of the T ∗ ρj are on the other hand minimal if the
ρj are pure. In Figure 6.2 we have plotted both capacities as a function of the noise
parameter ϑ and in Figure 6.3 we have plotted the quotient Ce (T )/Cc (T ) which
gives an upper bound on the gain we get from entanglement assistance. Note in this
context that due to a result of King [142] Cc (T ) = Cc,1 (T ) holds for the depolarizing
channel.
Figure 6.2: One-shot classical capacity Cc,1 (T ) = Cc (T ) and entanglement enhanced classical capacity Ce (T ) of the depolarizing qubit channel, plotted as a function of the noise parameter θ.

Figure 6.3: Gain of using entanglement assisted versus unassisted classical capacity for a depolarizing qubit channel.
[Figure: One-shot coherent information Cs,1 (T ), transposition bound Cθ (T ) and Hamming bound for the depolarizing qubit channel, plotted as a function of ϑ.]
For the quantum capacity of the depolarizing channel precise calculations are
not available. Hence let us consider first the coherent information. J(T, ρ) inherits
from T its unitary covariance, i.e. we have J(U ρU ∗ , T ) = J(ρ, T ). In contrast to
the mutual information, however, it does not have nice concavity properties, which
makes the optimization over all input states more difficult to solve. Nevertheless,
the calculation of J(ρ, T ) is straightforward and we get in the qubit case (if ϑ is the
noise parameter of T and λ is the highest eigenvalue of ρ):
J(ρ, T ) = S(λ(1 − ϑ) + ϑ/2) + S((1 − λ)(1 − ϑ) + ϑ/2) − S((1 − ϑ/2 + A)/2)
           − S((1 − ϑ/2 − A)/2) − S(λϑ/2) − S((1 − λ)ϑ/2)   (6.48)

where S(x) = −x log₂ (x) denotes again the entropy function and

A = √[ (2λ − 1)² (1 − ϑ/2)² + 4λ(1 − λ)(1 − ϑ)² ].   (6.49)
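Equation (6.48) can be checked against a direct numerical evaluation of S(T ∗ρ) − S(ρ, T ). The following sketch is added as a consistency check; the Kraus representation of the depolarizing channel, the function names, and our reading of the formula (including both output-entropy terms) are ours:

```python
import numpy as np

def S(x):
    """Entropy function S(x) = -x log2 x, with S(0) = 0."""
    return 0.0 if x <= 0 else -x * np.log2(x)

def vn_entropy(m):
    return float(sum(S(v) for v in np.linalg.eigvalsh(m).real if v > 1e-12))

def J_formula(lam, th):
    """Coherent information of the depolarizing qubit channel, Eqs. (6.48)/(6.49);
    lam = largest eigenvalue of the input state, th = noise parameter."""
    A = np.sqrt((2 * lam - 1) ** 2 * (1 - th / 2) ** 2
                + 4 * lam * (1 - lam) * (1 - th) ** 2)
    out = S(lam * (1 - th) + th / 2) + S((1 - lam) * (1 - th) + th / 2)
    exch = (S((1 - th / 2 + A) / 2) + S((1 - th / 2 - A) / 2)
            + S(lam * th / 2) + S((1 - lam) * th / 2))
    return out - exch

def J_direct(lam, th):
    """S(T* rho) - S(rho, T) from Kraus operators and a purification of rho."""
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    kraus = [np.sqrt(1 - 3 * th / 4) * np.eye(2, dtype=complex),
             np.sqrt(th / 4) * X, np.sqrt(th / 4) * Y, np.sqrt(th / 4) * Z]
    rho = np.diag([lam, 1 - lam]).astype(complex)
    e0, e1 = np.eye(2)
    psi = np.sqrt(lam) * np.kron(e0, e0) + np.sqrt(1 - lam) * np.kron(e1, e1)
    big = sum(np.kron(K, np.eye(2)) @ np.outer(psi, psi.conj())
              @ np.kron(K, np.eye(2)).conj().T for K in kraus)
    out = sum(K @ rho @ K.conj().T for K in kraus)
    return vn_entropy(out) - vn_entropy(big)

print(J_formula(0.7, 0.3), J_direct(0.7, 0.3))   # the two values agree
```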
Advanced topics
Chapter 7
Continuity of the quantum capacity
In Section 6.2 we have stated that a coding theorem for the quantum capacity is not yet available. Nevertheless, there are several subproblems which can be treated independently and which admit simpler solutions. One of them concerns the question we are going to answer in the following: Is it possible to correct small errors with small coding effort? Or more precisely: If a channel T is close to the ideal channel Id, is Cq (T ) close to Cq (Id)? The arguments in this chapter are based on [139]. Closely related discussions given by other authors are [105, 158].
7.1 Discrete to continuous error model
In Section 4.4 we have described how errors can be corrected, which occur only on a
small number k of n > k parallel channels. Hence the corresponding schemes correct rare errors, rather than small errors which occur on each copy of the parallel channels T^⊗n .
Nevertheless, the discrete theory can be applied to the situation we are studying in
this chapter. This is the content of the following Proposition. It is the appropriate
formulation of “reducing the order of errors from ε to εf +1 ”.
Proposition 7.1.1 Let T : B(H) → B(H) be a channel, and let E, D be encoding
and decoding channels for coding m systems into n systems. Suppose that this coding
scheme corrects f errors (Definition 4.4.1), and that

‖T − id ‖cb ≤ (f + 1)/(n − f − 1).   (7.1)

Then

‖E T^⊗n D − id ‖cb ≤ ‖T − id ‖cb^{f+1} 2^{nH₂((f+1)/n)} ,   (7.2)

where H₂ (r) = −r log₂ r − (1 − r) log₂ (1 − r) denotes the Shannon entropy of the probability distribution (r, 1 − r).
Proof. Into ET ⊗n D, we insert the decomposition T = id +(T − id) and expand the
product. This gives 2n terms, containing tensor products with some number, say k,
of tensor factors (T − id) and tensor factors id on the remaining (n − k) sites. Now
when k ≤ f , the error correction property makes the term zero. Terms with k > f
we estimate by kT − id kkcb . Collecting terms we get
‖E T^⊗n D − id ‖cb ≤ Σ_{k=f+1}^{n} C(n, k) ‖T − id ‖cb^k .   (7.3)

The rest then follows from the next Lemma (with r = (f + 1)/n). It treats the exponential growth in n for truncated binomial sums.
The rest then follows from the next Lemma (with r = (f + 1)/n). It treats the
exponential growth in n for truncated binomial sums.
Lemma 7.1.2 Let 0 ≤ r ≤ 1 and a > 0 such that a ≤ r/(1 − r). Then, for all integers n:

(1/n) log₂ ( Σ_{k≥rn} C(n, k) a^k ) ≤ log₂ (a^r ) + H₂ (r).   (7.4)
Proof. For λ > 0 we can estimate the step function by an exponential, and get
Σ_{k≥rn} C(n, k) a^k ≤ Σ_{k=0}^{n} C(n, k) a^k e^{λ(k−rn)} = e^{−λrn} (1 + a e^λ )^n = M (λ)^n   (7.5)
105 7.2. Coding by random graphs
with M (λ) = e^{−λr} (1 + a e^λ ). The minimum over all real λ is attained at a e^{λmin} = r/(1 − r). We get λmin ≥ 0 precisely when the conditions of the Lemma are satisfied, in which case the bound is computed by evaluating M (λmin ). 2
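Lemma 7.1.2 is easy to test numerically; the following added sketch compares the truncated binomial sum with the bound (a^r 2^{H₂(r)})^n for an admissible parameter choice:

```python
import math

def H2(r):
    """Binary entropy of the distribution (r, 1 - r)."""
    return 0.0 if r in (0.0, 1.0) else -r * math.log2(r) - (1 - r) * math.log2(1 - r)

def truncated_sum(n, r, a):
    """Left-hand side of (7.4): sum over k >= rn of C(n, k) a^k."""
    return sum(math.comb(n, k) * a ** k for k in range(math.ceil(r * n), n + 1))

n, r, a = 100, 0.3, 0.2            # a <= r/(1-r) ≈ 0.43, so the Lemma applies
lhs = truncated_sum(n, r, a)
rhs = (a ** r * 2 ** H2(r)) ** n   # the bound (a^r 2^{H2(r)})^n
print(lhs <= rhs)                  # True
```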
This goes to zero, and even exponentially to zero, as soon as the expression in
parentheses is < 1. This will be the case whenever kT − id kcb is small enough, or,
more precisely,
kT − id kcb ≤ 2−H2 (ε)/ε . (7.7)
Note in addition that we have for all n ∈ ℕ

( (ε − 1/n) / (1 − ε + 1/n) ) 2^{−H₂(ε)/ε} < 1 .   (7.8)
Figure 7.1: The two bounds from Equation (7.9), 2^{−H₂(ε)/ε} and ε/e, plotted as a function of ε.
get P < 1, so that there must be at least one matrix Γ correcting f errors. The
crucial point is that this observation does not depend on n, but only on the rate-like
parameters m/n and f /n. Let us make this behavior a Definition:
Definition 7.2.3 Let d be an integer. Then we say a pair (µ, ²) consisting of a
coding rate µ and an error rate ² is achievable, if for every n we can find an
encoding E of dµne d-level systems into n d-level systems correcting b²nc errors.
Then we can paraphrase the last proposition as saying that all pairs (µ, ε) with µ + 4ε + log₂(d)^{−1} H₂ (2ε) < 1 are achievable. This is all the input we need for the next section, although a better
coding scheme, giving larger µ or larger ε, would also improve the rate estimates proved there. Such improvements are indeed possible. E.g. for the qubit case (d = 2) it is shown in [47] that there is always a code which saturates the quantum Gilbert-Varshamov bound (1 − µ − 2ε log₂ (3)) > H₂ (2ε), which is slightly better than our result.
But there are also known limitations, particularly the so-called Hamming bound.
This is a simple dimension counting argument, based on the error corrector's dream: Assuming that the scalar product (F, G) 7→ ω(F ∗ G) on the error space E is nondegenerate, the dimension of the “bad space” is the same as the dimension of the error space. Hence with the notations of Section 4.4 we expect dim H₀ · dim E ≤ dim H₂ . We now take m input systems and n output systems of dimension d each, so that dim H₁ = d^m and dim H₂ = d^n . For the space of errors happening at at most f places we introduce a basis as follows: at each site we choose a basis of B(H) consisting of d² − 1 operators plus the identity. Then a basis of E is given by all tensor products with basis elements ≠ 1I placed at j ≤ f sites. Hence dim E = Σ_{j≤f} C(n, j)(d² − 1)^j . For large n we estimate this as in Lemma 7.1.2 as (1/n) log₂ dim E ≈ (f /n) log₂ (d² − 1) + H₂ (f /n). Hence the Hamming bound becomes

(m/n) log₂ d + H₂ (f /n) + (f /n) log₂ (d² − 1) ≤ log₂ d ,   (7.12)

which (with d² ≫ 1) is just (7.11) with a factor 1/2 on all errors.
If we drop the nondegeneracy condition made above it is possible to find codes
which break the Hamming bound [71]. In this case, however, we can consider the
weaker singleton bound, which has to be respected by those degenerate codes as well.
It reads

1 − m/n ≥ 4 f /n .   (7.13)
We omit its proof here (see instead [172], Sect. 12.4). Both bounds are plotted together with the rate achieved by random graph coding in Figure 7.2 (for d = 2).
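The three curves of Figure 7.2 can be reproduced from the formulas above; a short added sketch for d = 2 (function names ours):

```python
import math

def H2(x):
    """Binary entropy of the distribution (x, 1 - x)."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def hamming(eps, d=2):
    """Upper bound (7.12) on the rate mu = m/n, with eps = f/n."""
    return 1 - (H2(eps) + eps * math.log2(d * d - 1)) / math.log2(d)

def singleton(eps):
    """Upper bound (7.13): mu <= 1 - 4 eps."""
    return 1 - 4 * eps

def random_graph(eps, d=2):
    """Rate achieved by random graph coding, cf. Theorem 7.3.1."""
    return 1 - 4 * eps - H2(2 * eps) / math.log2(d)

for eps in (0.01, 0.03, 0.05):
    print(eps, random_graph(eps), hamming(eps), singleton(eps))
    # the achieved rate lies below both upper bounds
```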
7.3 Results
We are now ready to apply the results about error correction just derived to the
calculation of achievable rates and therefore to lower bounds on the quantum ca-
pacity.
7.3.1 Correcting small errors
We first look at the problem which motivated our study, namely estimating the
capacity of a channel T ≈ Id.
Theorem 7.3.1 Let d be a prime, and let T be a channel on d-level systems. Sup-
pose that for some 0 < ε < 1/2,
‖T − Id ‖cb ≤ ε/e.   (7.14)

Figure 7.2: Singleton bound and Hamming bound together with the rate achieved by random graph coding (for d = 2). The allowed regions are below the respective curves.
Then
Cq (T ) ≥ (1 − 4ε) log₂ (d) − H₂ (2ε) = g(ε).   (7.15)
Proof. For every n set f = ⌊εn⌋ and m = ⌊µn⌋ − 1, where µ is, up to a log₂ (d) factor, the right hand side of (7.15), i.e. µ = 1 − 4ε − log₂ (d)^{−1} H₂ (2ε). This ensures that the right hand side of (7.10) is strictly negative, so there must be a code for d-level systems, with m inputs and n outputs, correcting f errors. To this code we apply Proposition 7.1.1, and insert the bound on ‖ id −T ‖ into Equation (7.6). Thus ∆(T^⊗n , M_d^⊗(⌊µn⌋−1) ) → 0, even exponentially. This means that any number < µ log₂ (d) is an achievable rate. In other words, µ log₂ (d) is a lower bound on the capacity. 2
If ε > 0 is small enough the quantity on the right hand side of Equation (7.15) is strictly positive (cf. the dotted graph in Figure 7.2). Hence each channel which is sufficiently close to the identity allows (asymptotically) perfect error correction. Beyond that we see immediately that Cq (T ) is continuous (in the cb-norm) at T = Id: Since Cq (T ) is smaller than log₂ (d) and g(ε) is continuous in ε with g(0) = log₂ (d), we find for each δ > 0 an ε > 0 such that log₂ (d) − Cq (T ) < δ for all T with ‖T − Id ‖cb < ε/e. In other words, if T is arbitrarily close to the identity its capacity is arbitrarily close to log₂ (d). In Corollary 7.3.3 below we will show the significantly stronger statement that Cq is a lower semicontinuous function on the set of all channels.
7.3.2 Estimating capacity from finite coding solutions
A crucial consequence of the ability to correct small errors is that we do not actually
have to compute the limit defining the capacity: if we have a pretty good coding
scheme for a given channel, i.e., one that gives us ET ⊗k D ≈ idd , then we know the
errors can actually be brought to zero, and the capacity is close to the nominal rate
of this scheme, namely log2 (d)/k.
109 7.3. Results
Theorem 7.3.2 Let T be a channel, not necessarily between systems of the same
dimension. Let k, p ∈ N with p a prime number, and suppose there are channels E
and D encoding and decoding a p-level system through k parallel uses of T , with
error ∆ = ‖ id_p − E T^⊗k D‖cb < 1/(2e). Then

Cq (T ) ≥ (1 − 4e∆) log₂ (p)/k − H₂ (2e∆)/k .   (7.16)
Moreover, Cq (T ) is the least upper bound on all expressions of this form.
Proof. We apply Theorem 7.3.1 to the channel T̃ = E T^⊗k D. With the random coding method we thus find a family of coding and decoding channels Ẽ and D̃ from m′ into n′ systems, of p levels each, such that

‖ id − Ẽ (E T^⊗k D)^⊗n′ D̃‖cb → 0.   (7.17)
This can be reinterpreted as an encoding of p^{m′}-dimensional systems through kn′ uses of the channel T (rather than T̃ ), which corresponds to a rate (kn′)^{−1} log₂ (p^{m′}) = (log₂ p/k)(m′/n′). We now argue exactly as in the proof of the previous proposition, with ε = e∆, so that (7.18) holds by equation (7.9). By random graph coding we can achieve the coding ratio µ ≈ (m′/n′) = 1 − 4ε − log₂ (p)^{−1} H₂ (2ε), and have the errors ∆(T̃^⊗n′ , M_p^⊗m′ ) go to zero exponentially.
exponentially. Since
¡ ¢ 0
0
∆(T ⊗kn , Mm
0
e ET ⊗k D ⊗n Dk
e⊗n0 , Mm0 ) ≤ k id −E e cb ,
p ) ≤ ∆(T p (7.19)
we can apply Lemma 6.1.2 to the channel T (where the sequence Mα is given by Mα = kα) and find that the rate µ(log₂ p/k) is achievable. This yields the estimate
claimed in Equation (7.16).
To prove the second statement consider the function x → p(x) which associates
to each real number x ≥ 2 the biggest prime p(x) with p(x) ≤ x. From known
bounds on the length of gaps between two consecutive primes [127]1 it follows that
limx→∞ x/p(x) = 1 holds; hence we get 2^{kc}/p(2^{kc}) ≤ 1 + δ′ for an arbitrary δ′ > 0, provided k is large enough, but this implies

c − log₂ [ p(2^{kc}) ] / k < log₂ (1 + δ′ ) / k .   (7.20)
Since we can choose an achievable rate c arbitrarily close to the capacity C q (T ) this
shows that there is for each δ > 0 a prime p and a positive integer k such that
|Cq (T ) − log2 (p)/k| ≤ δ. In addition we can find a coding scheme E, D for T ⊗k
such that Equation (7.18) holds, i.e. the right hand side of (7.16) can be arbitrarily
close to log2 (p)/k, and this completes the proof. 2
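The prime approximation step used in this proof is easily illustrated (an added sketch; the target rate c = 1.37 is an arbitrary example value):

```python
from math import log2

def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def largest_prime_below(x):
    """p(x): the biggest prime p with p <= x."""
    p = int(x)
    while not is_prime(p):
        p -= 1
    return p

c = 1.37                      # an arbitrary example value for the target rate
for k in (4, 8, 16):
    p = largest_prime_below(2 ** (k * c))
    print(k, p, log2(p) / k)  # log2(p)/k approaches c from below as k grows
```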
∆(T^⊗n , M₂^⊗⌊cn⌋ ) ≤ e^{−nλ(c)} ,   (7.21)
Proof. We start as in Theorem 7.3.2 with the channel T̃ = E T^⊗k D and the quantity ∆ = ‖ id_p − E T^⊗k D‖cb . However, instead of assuming that ∆ = ε/e holds, the full range e∆ ≤ ε ≤ 1/2 is allowed for the error rate ε. Using the same arguments as in the proof of Theorem 7.3.2 we get an achievable rate

c(k, p, ε) = (log₂ (p)/k) ( 1 − 4ε − H₂ (2ε)/ log₂ (p) )   (7.22)
∆(T^⊗kn′ , M_p^⊗m′ ) ≤ ‖ id − Ẽ (E T^⊗k D)^⊗n′ D̃‖cb ≤ ( 2^{H₂(ε)} ∆^ε )^{n′} ;   (7.23)
λ(c) = lim inf_{n→∞} −(1/n) ln ∆(T^⊗n , M₂^⊗⌊nc⌋ ) ≥ lim_{n′→∞} −(1/(kn′)) n′ ln( 2^{H₂(ε)} ∆^ε )   (7.24)
     ≥ −(ε/k) ( ln(∆) + (H₂ (ε)/ε) ln 2 ) = −εΛ(∆, ε)/k ,   (7.25)
where we have inserted inequality (7.23). Now we can apply Lemma 6.1.2 (with the sequence Mα = kα), which shows that λ(c) is positive if the right hand side of (7.25) is.
What remains to show is that λ(c) > 0 holds for each c < Cq (T ). To this end we have to choose k, p, ∆ and ε such that c(k, p, ε) = c and Λ(∆, ε) < 0. Hence consider δ > 0 such that c + δ < Cq (T ) is an achievable rate. As in the proof of Theorem 7.3.2 we can choose log₂ (p)/k such that log₂ (p)/k > c + δ holds while ∆ is arbitrarily small. Hence there is an ε₀ > 0 such that c(k, p, ε) = c implies ε > ε₀ . The statement therefore follows from the fact that there is a ∆₀ > 0 with Λ(∆, ε) < 0 for all 0 < ∆ < ∆₀ and ε > ε₀ . 2
Figure 7.3: Lower bounds on the error exponent λ(c) plotted for n = 1, p = 2 and different values of ∆ (∆ = 10⁻³, 10⁻⁴, 10⁻⁵, 10⁻⁶).
‖ id_p − E T^⊗n D‖cb = (ε₀ + ε)/e   (7.26)

and |Cq^(ε) (T ) − log₂ (p)/n| < δ holds. If ε + ε₀ is small enough, however, we find as in Theorem 7.3.2 a random graph coding scheme such that

Cq (T ) ≥ (log₂ (p)/n) ( 1 − 4(ε + ε₀ ) ) − (1/n) H₂ ( 2(ε + ε₀ ) ) = g(ε + ε₀ ).   (7.27)

Hence the statement follows from continuity of g and the fact that g(0) = log₂ (p)/n holds. 2
For a classical channel Φ even more is known about the similarly defined quantity Cc^(ε) (T ): If ε > 0 is small enough we cannot achieve bigger rates by allowing small errors, i.e. Cc (T ) = Cc^(ε) (T ). This is called the “strong converse of Shannon's noisy channel coding theorem” [191]. To check whether a similar statement holds in the quantum case is one of the big open problems of the theory.
Chapter 8
Multiple inputs
The topic of this and the following three chapters is a quantitative discussion
of the circle of questions we have already visited in Section 4.2, i.e. quantum state
estimation, quantum copying and other devices which act on a large number of equally prepared inputs. This means that we are following the spirit of Chapters 5 and 6 and ask questions like: How can we measure the error which an approximate cloning machine produces in its outputs? What is the lower bound on this error? Is there a device which achieves this bound, and what does it look like? One fundamental difference to similar questions arising within entanglement distillation and calculations of channel capacities is the fact that we are able to give complete answers under quite general conditions. The reason is that the tasks we are going to discuss admit large symmetry groups, which can be used to reduce the number of parameters and thus make the corresponding optimization problems more tractable.
Since the material we want to present is quite comprehensive, we have broken it
up into four chapters: The topic of the present one (which is a significantly extended
version of [133]) is an overview and the discussion of some general properties, while
the following three treat special cases, namely: optimal pure state cloning (Chapter
9), quantum state estimation (Chapter 10) and optimal purification (Chapter 11).
8.1 Overview and general structure
To start with, let us have a short look at the general structure of this particular type of problem. In all cases we are searching for channels
T : A → B(H⊗N ) (8.1)
For state estimation, T should be an observable with values in the quantum state
space S = S(H). Hence we have to choose A = C(S) and the set of all such
estimators is denoted by
This notation is justified by the fact that state estimation is in a certain sense
the limiting case of cloning for infinitely many output systems. We will make this
statement more precise in Chapter 10.
In both cases the task is to optimize a “figure of merit” ∆(T ) which measures,
roughly speaking, the largest deviation of T ∗ (ρ⊗N ) from the target functional β(ρ) ∈
S(A) we want to approximate. In most cases ∆(T ) has the form
£ ¤
∆(T ) = sup δ T ∗ (ρ⊗N ), β(ρ) , (8.4)
ρ∈X
113 8.1. Overview and general structure
where δ is a distance measure on the state space S(A) of the algebra A and X ⊂
S(H). If nothing is known about the input state ρ we have to choose X = S(H). If,
in contrast to that, X is strictly smaller than S(H) it describes a priori knowledge
about ρ. The most relevant special cases arise when X is the set of pure states or if
X is finite. The latter corresponds to a cryptographic setup, where Alice and Bob
use finitely many signal states ρ1 , . . . , ρn to send classical information through a
quantum channel S and Eve tries to eavesdrop on the conversation by copying the
quantum information transmitted through S. Both situations require quite different
methods and we will concentrate our discussion on the pure state case (for recent
results concerning quantum hypothesis testing, i.e. estimation of states from a finite
set, cf. e.g. [173, 107, 166] and the references therein).
A different kind of a priori knowledge are a priori measures, i.e. instead of
knowing that all possible input states lie in a special set X we know for each
measurable set X ⊂ S(H) the probability µ(X) for ρ ∈ X. Such a situation typically
arises when we are trying to estimate (or copy) states of systems which originate
from a source with known characteristics. In this case we can use mean errors
Z
£ ¤
¯
∆(T ) = δ T ∗ (ρ⊗N ), β(ρ) µ(dρ). (8.5)
S(H)
as a figure of merit. Sometimes they are easier to compute than maximal errors as
in Equation (8.4). Often however ∆ leads to stronger results than ∆ ¯ and we will
concentrate our discussion therefore on maximal rather than mean errors.
Now assume that a particular estimation or cloning problem is given, which is
described by a set of channels and an appropriate figure of merit ∆. Then there is
a number of characteristic questions we are interested in the following. The first is:
• Is there an optimal device Tb which minimizes the error, i.e. ∆(Tb) = inf T ∆(T ),
and how does it look like?
Since the dimension of the space T (N, M ) grows exponentially with N and M it
seems at a first look to be hopeless to search for a closed form solution for arbitrary
N and M . We will see however that some problems admit quite simple (symmetry
based) arguments which restrict the size of the spaces, in which we have to search
for the minimizers, quite significantly. In this way we will be able to give a complete
answer for pure state cloning (Chapter 9) and estimation (Section 10.1) and for
purification (Chapter 11).
In other cases the situation is more difficult and a closed form solution can
not be achieved; for us this concerns primarily mixed state estimation and related
tasks. In this situation we will concentrate on the asymptotic behavior in the limit
of infinitely many input systems (N → ∞). Here we have to distinguish between
estimation (M = ∞) and cloning-like tasks (M arbitrary), because in the latter
case we have two parameters which can go to infinity. Let us consider first state
estimation. Here our main interests are the following
Note that the EN we are looking for in this context are not necessarily optimal
for N < ∞ but if ν is finite the error ∆(EN ) vanishes exponentially fast and the
difference (measured by ∆) between EN and an optimal scheme becomes already
negligible for a quite small number of input systems. The search for a sequence
8. Multiple inputs 114
where Vσ , Vτ are the unitaries associated to the permutations σ and τ ; cf. Equation
(3.7). The transformation T 7→ (αU T ) can be interpreted (passively) as a basis
change in the one-particle Hilbert space H, while ασ and ατ refer to permutations
of input respectively output systems. Now we have the following lemma:
Lemma 8.2.1 Consider the space T (N, M ) for M < ∞ and a convex, lower semi-
continuous functional ∆ : T (N, M ) → R+ which is invariant under the action of
115 8.2. Symmetric cloner and estimators
G = U(d) × SN × SM defined in Equation (8.6), i.e. ∆(αg T ) = ∆(T ) holds for all
g ∈ G. Then there is at least one Tb ∈ T (N, M ) with
i.e. there exist minimizers which are invariant under the group action α g .
Proof. The existence of minimizers is a simple consequence of compactness of
T (N, M ) and semicontinuity of ∆. Hence there is an S ∈ T (N, M ) with ∆(S) ≤
∆(T ) ∀T ∈ T (N, M ). Due to the invariance of ∆ we get
Now we can average over G (this integral is well defined because αg S ∈ T (N, M )
and T (N, M ) is finite dimensional)
Z
Tb = αg Sdg, (8.9)
G
where dg denotes the normalized Haar measure on G (which exists, due to com-
pactness of G). Obviously Tb is G-invariant: Since the action αg is affine we get
Z Z Z
αh Tb = αh αg Sdg = αhg Sdg = αg0 Sdg 0 = Tb with g 0 = hg. (8.10)
G G G
Exploiting convexity of ∆ and Equation (8.8) we get in addition (for all T ∈
T (N, M ))
·Z ¸ Z Z
b
∆(T ) = ∆ αg Sdg ≤ ∆(αg S)dg = ∆(S)dg = ∆(S), (8.11)
G G G
Hence as long as we are only interested to find some (rather than all) optimal
devices we can restrict our attention to those channels which are invariant under
the operation αU,σ,τ of U(d) × SN × SM introduced above. It is therefore useful to
define
Definition 8.2.2 A completely positive, (not necessarily unital) map T :
B(H⊗N ) → B(H⊗M ) which is invariant under the action αU,σ,τ of U(d) × SN × SM
defined in Equation (8.6) is called a fully symmetric cloning map. The space of all
fully symmetric elements of T (N, M ) is denoted by Tfs (N, M ).
To adopt the previous discussion to state estimation, let us consider now
the space T (N, ∞) of estimators (Equation (8.3)). As the set T (N, M ) defined
above, T (N, ∞) is convex, however it is infinite dimensional and compactness is
therefore topology dependent. An appropriate topology for our purposes is the
weak topology on T (N, ∞), i.e. the coarsest topology such that all functions
T (N, ∞) 3 E 7→ hψ, E(f )φi with f ∈ C(S), ψ, φ ∈ H⊗N are continuous. It is
then an easy consequence of the Banach-Alaoglu Theorem [186, Theorem IV.21]
that T (N, ∞) is compact in this topology.
In analogy to Equation (8.6) we can define a weakly continuous action of the
group SN × U(d) on T (N, ∞): For each (U, τ ) 3 U(d) × SM we define αU,τ E =
(αU ατ )(E) with
8.2.1 if we take into account that integrals of T (N, ∞) valued maps should be
considered as weak integrals, i.e. the average Ē of αg E over the group G = SN × U(d)
is defined as
Z
hψ, Ē(f )φi = hψ, (αg E)(f )φiµ(dg) ∀f ∈ C(S) ∀ψ, φ ∈ H⊗N . (8.13)
G
Hence we have:
Lemma 8.2.3 Consider a convex, lower semicontinuous (with respect to the weak
topology) functional ∆ : T (N, ∞) → R+ which is invariant under the action of
G = U(d) × SN defined in Equation (8.12), i.e. ∆(αg E) = ∆(E) holds for all
b ∈ T (N, ∞) with
g ∈ G. Then there is at least one estimator E
b ≤ ∆(E) ∀E ∈ T (N, ∞) and αg E
∆(E) b = E,
b ∀g ∈ G. (8.14)
As in the case M < ∞ discussed above, this lemma shows that we can restrict
the search for minimizers to those observables which are invariant under the action
αU,τ of U(d) × SN (as long as the figure of merit under consideration has the correct
symmetry). In analogy to the M < ∞ case we define therefore
Definition 8.2.4 A (completely) positive, unital map E : C(S) → B(H ⊗N ) which
is invariant under the action αU,τ of U(d)×SN defined in Equation (8.12) is called a
fully symmetric estimator. The set of all fully symmetric E is denotes by Tfs (N, ∞).
To make use of these results it is necessary to get a better understanding of the
structure of the sets Tfs (N, M ) for N ∈ N and M ∈ N ∪ {∞}. This is the subject of
the rest of this chapter.
8.2.2 Decomposition of tensor products
The first step is an analysis of the representations U 7→ U ⊗N and σ 7→ Vσ of U(d)
respectively SN on the tensor product Hilbert space H⊗N , which play a crucial role
in the definition of fully symmetric channels and estimators. The results we are
going to review here are well known and go back to Weyl [237, Ch. 4]. To state
them we have to introduce some notations from group theory: A Young frame is an
arrangement of a finite number of boxes into rows of decreasing length. We represent
it by a sequence of integers m1 ≥ m2 ≥ · · · ≥ md ≥ 0 where mk denotes the number
of boxes in the k th row. Hence
d
X
Yd (N ) = {m = (m1 , . . . , md ) ∈ Nd0 | m1 ≥ m2 ≥ . . . ≥ md , mk = N } (8.15)
k=1
denotes the set of all frames with d rows and N boxes. Each Young frame m ∈ Y d (N )
determines uniquely (up to unitary equivalence) irreducible representations of S N
and U(d) which we denote by Πm and πm . In the U(d)-case m gives the highest
weight of πm in the basis Ejj = |jihj|, j = 1, .., d of the Cartan subalgebra of the
Lie algebra of U(d) (cf. Subsection 8.3.2 for notations and a further discussion).
Πm as well as πm can be constructed explicitly from m, but we do not need this
information.
Theorem 8.2.5 Consider the d-dimensional Hilbert space H = Cd , its N -fold ten-
sor product H⊗N and the representations U(d) 3 U 7→ U ⊗N and SN 3 σ 7→ Vσ on
H⊗N . There is a unique decomposition of H⊗N into a direct sum such that
M M
H⊗N ∼ = Hm ⊗ Km , U ⊗N ∼= πm (U ) ⊗ 1I,
m∈Yd (N ) m∈Yd (N )
M
Vσ ∼
= 1I ⊗ Πm (σ) (8.16)
m∈Yd (N )
holds, where ∼
= means “naturally isomorphic”.
117 8.2. Symmetric cloner and estimators
For a proof see [195, Sect. IX.11]. This theorem is intimately related to Theorem
3.1.1, where commutativity properties between the U ⊗N and Vσ are discussed. This
can be seen if we introduce the algebras
M M
AN = B(Hm ) ⊗ 1I, BN = 1I ⊗ B(Km ). (8.17)
m∈Yd (N ) m∈Yd (N )
It is easy to check that AN and BN are commutants of each other, i.e. we have
0
AN = B N and BN = A0N where the prime denotes the commutant of the corre-
sponding set of operators, i.e. A0N = {B ∈ B(H⊗N ) | [A, B] = 0, ∀A ∈ A} and
0
similarly for BN . On the other hand we have U ⊗N ∈ AN and Vσ ∈ BN for all
U ∈ U(d) and σ ∈ SN . Hence, irreducibility of the representations πm and Πm im-
plies immediately that each element of AN (of BN ) is a finite linear combination of
operators from {U ⊗N | U ∈ U(d)} (respectively of permutation unitaries Vσ ). This
shows in particular that each operator which commutes with all U ⊗N is a linear
combination of Vσ ’s as stated in Theorem 3.1.1. Note however that Theorem 3.1.1
is not a corollary of Theorem 8.2.5, because the former is used in an essential way
in the proof of the latter.
Now let us consider the general linear group GL(d, C). Each representation πm
of U(d) admits a (unique) analytic continuation and leads therefore to a represen-
tation of GL(d, C). We will denote it by πm as well and it is in fact the GL(d, C)
representation with highest weight m. Therefore the following Corollary is an easy
consequence of Theorem 8.2.5
Corollary 8.2.6 Consider an operator X ∈ GL(d, C) and a Young frame m ∈
Yd (N ) for some N ∈ N then we have
⊗N 1 X
H+ = {SN ψ | ψ ∈ H⊗N }, SN ψ = Vσ ψ. (8.20)
N!
σ∈SN
⊗N
By definition all permutation unitaries Vσ act as identities on H+ and it is the
⊗N ⊗N
biggest subspace of H with this property. In addition it is easy to see that H+
⊗N
is left invariant by the U , i.e. it carries a subrepresentation of U(d). Since the
trivial representation of SN is labeled by the Young frame with one row and N
⊗N
boxes we get with Theorem 8.2.5 H+ = HN 1 ⊗ KN 1 , where we have used the
notation
⊗N
Proposition 8.2.7 Consider the N -fold symmetric tensor product H + , the cor-
⊗N
responding projection SN : H⊗N → H+ and the U(d) representation π+N
(U ) =
⊗N
SN U SN . Then we have, using the notations from Theorem 8.2.5 and Equation
(8.21):
⊗N N
H+ = H N 1 , π+ = πN 1 , SN = P N 1 . (8.22)
8.2.3 Fully symmetric cloning maps
Let us consider now fully symmetric cloning maps. Our aim in this Subsection is to
determine the extremal elements of the convex set Tfs (M, N ) and the central tool
for this task is Theorem 8.2.5, which we have to apply to the input and output
Hilbert space H⊗N and H⊗M . Since the procedure is quite complex we have broken
it up into several steps.
Proposition 8.2.8 Each fully symmetric channel T : B(H ⊗M ) → B(H⊗N ) can be
decomposed into a direct sum
M
T (A) = Tm (A) ⊗ 1IKm , (8.23)
m∈Yd (N )
The set of all such Tm (which we will call again fully symmetric) is denoted by
Tfs (Hm , M ).
Proof. According to Definition 8.2.2 we have [T (A), Vσ ] = 0 for all A ∈ B(H⊗M )
and all σ ∈ SN . By Theorem 8.2.5 this implies that T (A) ∈ AN holds, where AN
denotes the algebra from Equation (8.17). Hence, T is of the given form. 2
The next step applies Theorem 8.2.5 to the output Hilbert space H⊗M . This
leads to a further decomposition of the spaces Tfs (Hm , M ).
Theorem 8.2.9 Consider N, M ∈ N and m ∈ Yd (N ). Each channel T :
B(H⊗M ) → B(Hm ) satisfying the covariance condition from Equation (8.24) admits
a unique convex decomposition
X cn £ ¤
T (A) = Tn trKn (Pn APn ) (8.25)
dim Kn
n∈X
P
with cn > 0, n cn = 1 and
The set of all channels Tn with this property is denoted by Tfs (Hm , Hn ).
Proof. To prove uniqueness it is sufficient to note that each summand in Equation
(8.25) equals T (Pn APn ) and is therefore uniquely
P determined by T . To show that
the corresponding decomposition T (A) = n T (Pn APn ) of T has the given form
consider first the dual T ∗ of T . By assumption we have [Vτ , T ∗ (ρ)] = 0 for all
ρ ∈ B ∗ (Hm ) and all τ ∈ SM . Due to Theorem 8.2.5 this implies T ∗ (ρ) ∈ A∗M ,
where A∗M denotes the dual of the algebra from Equation (8.17). T ∗ is therefore a
L
direct sum T ∗ (ρ) = n Ten∗ (ρ) ⊗ 1I over n ∈ Yd (M ), where Ten∗ is the dual of a cp-
map Ten : B(Hn ) → B(Hm ) which satisfies the covariance condition from Equation
(8.27).
119 8.2. Symmetric cloner and estimators
This is exactly the form from Equation (8.25), except that the Ten are not unital.
Hence consider Ten (1I). Due to covariance of Ten we have
¡ ¢
πm (U )Ten (1I)πm (U )∗ = Ten πn (U )Pn πn (U )∗ = Ten (1I) (8.32)
Now the final step is to analyze the spaces Tfs (Hm , Hn ) of πm , πn covariant
channels.
Proposition 8.2.10 The set Tfs (Hm , Hn ) is convex and its extremal elements are
of the form
T (A) = V ∗ (A ⊗ 1IL )V (8.34)
with an isometry V : Hm → Hn ⊗ L into the tensor product of Hm and an auxiliary
Hilbert space L such that L carries an irreducible representation π of U(d) and V
intertwines πm with πn ⊗ π, i.e. V πm = πn ⊗ πV holds.
Proof. Assume first that T from Equation (8.34) admits a convex decomposition
T = λT1 + (1 − λ)T2 with T1 , T2 ∈ Tfs (Hm , Hn ) and 0 < λ < 1. By Theorem 3.2.2
this implies that there are two operators F1 , F2 with Tj (A) = V ∗ A ⊗ Fj V , j = 1, 2
and [π(U ), Fj ] = 0. Irreducibility of π implies together with normalization (the Tj
are unital by assumption) that F1 = F2 = 1I holds. Hence T1 = T2 = T which
implies that T is extremal.
To show that each channel T ∈ Tfs (Hm , Hn ) can be decomposed into elements
of the given form, note that Theorem 3.2.2 implies that a Stinespring representation
T (A) = V ∗ (A ⊗ 1IL )V of T exists such that L carries a representation π of U(d)
and V : Hm → Hn ⊗ L is an isometry which intertwines πm with πn ⊗ π. If π
is irreducible the Theorem is proved (T is extremal in this case); if not we can
decompose it into a direct sum
M M
L= Lj , π = πj (8.35)
j∈J j∈J
8. Multiple inputs 120
where J is a finite index set and the πj are irreducible representations on Lj . If the
projection from L onto Lj is denoted by Pj we can define operators Vj = (1I ⊗ Pj )V
which intertwine πm and πn ⊗πj . Hence Tej (A) = Vj∗ (A⊗1I)Vj is a cp-map satisfying
the proposition, except that it is not unital. Irreducibility of πm and covariance of Tej
P
imply however that Tej (1I) = cj 1I holds with positive constants cj . Due to j Pj = 1I,
we get a convex decomposition T (A) = Σj cj Tj (A) of T with summands Tj = c−1 e
j Tj
of the stated form. 2
Combining Theorem 8.2.9 and Proposition 8.2.10 we get the extremal elements
of the set Tfs (Hm , M ) as
1 £¡ ¢ ¤
T (A) = V ∗ trKn (Pn APn ) ⊗ 1I V (8.36)
dim Kn
with n ∈ Yd (M ) and an isometry V which satisfies the condition from Proposition
8.2.10. Using in addition Proposition 8.2.8 we see that each extremal element of the
set Tfs (N, M ) is a direct sum over the set Yd (N ) of channels of the form (8.36).
To get a result which is even more explicit, we have to determine for each n and
m all admissible intertwining isometries V . For arbitrary but finite d this can be
done at least in an algorithmic way and in the special case d = 2 we just have to
calculate Clebsch-Gordon coefficients. This shows that the general structure of a
fully symmetric cloning map is completely determined by group theoretical data.
8.2.4 Fully Symmetric estimators
Our next task is to determine the structure of the set Tfs (N, M ) in the special case
M = ∞. Hence consider an E ∈ Tfs (N, M ). As for finite M we have [E(f ), Vσ ] = 0
for allLσ ∈ SN and all f ∈ C(S). This implies that E decomposes into a direct sum
E = m∈Yd (N ) Em where the Em are observables
and fU (ρ) = f (U ρU ∗ ). We write Tfs (m, ∞) for the space of all observables satisfying
Equation (8.37) and call them again fully symmetric. To analyze the structure of
the Em let us state first the following result [66, 111]:
Theorem 8.2.11 Consider a compact, unimodular group G which acts transitively
on a topological space X by G × X 3 (g, x) 7→ αg (x), and a representation π of
G on a Hilbert space H. Each covariant POV measure E : C(X) → B(H) (i.e.
E(f ◦ αg ) = π(g)E(f )π(g)∗ holds for all g ∈ G and all f ∈ C(X)) has the form
Z
E(f ) = f (αg x0 )π(g)Q0 π(g)∗ µ(dg) (8.38)
G
Σ coincides with the orbit space S/ U(d) and p with the canonical projection. If
e1 , . . . , ed ∈ H denotes an orthonormal basis we can introduce in addition the map
If the map Eem,j is nonzero, covariance of E implies that there is a positive constant
λj with Eem,j (1I) = λj 1I hence we can define Em,j = λ−1 E em,j and get POV-measures
P j
Em,j : C(Sj ) → B(Hm ) with Em (f ) = j λj Em,j (f ¹ Sj ), where f ¹ Sj denotes
the restriction of f ∈ C(S) to Sj . It is therefore sufficient to show Equation (8.42)
for all Em,j .
Hence consider a positive function h ∈ C(Σj ) and the map
£ ¤
eh (g) = Em,j (g ⊗ h) ◦ Φ−1 ∈ B(Hm )
C(Xj ) 3 g 7→ E (8.44)
j
Eeh (1I) = νh 1I holds with a constant νh > 0. In other words E eh is (up to nor-
malization) a covariant POV measure and Theorem 8.2.11 applies. Hence there is
a unique positive operator Qm,j (h) ∈ B(Hm ) such that
Z
£ ¤
Eeh (g) = Em,j (g ⊗ h) ◦ Φ−1 =
j g(U v0j U ∗ )πm (U )Qm,j (h)πm (U )∗ dU (8.45)
U(d)
8. Multiple inputs 122
holds. It is easy to see that the map C(Σj ) 3 h 7→ Qm,j (h) ∈ B(Hm ) is linear and
positive. Normalization of Em,j implies in addition that Qm,j (1I) = 1I. Hence Qm,j
is a POV-measure satisfying Equation (8.45) for each function (g ⊗h)◦Φ−1 j ∈ C(Sj ).
Linearity and continuity (which is a consequence of positivity) implies therefore
Z
Em,j (f ) = πm (U )Qm,j (fU )πm (U )∗ dU (8.46)
U(d)
P
for all f ∈ C(Sj ). Hence Equation (8.42) follows with Qm = j λj Qm,j . 2
exactly one weight m, called the highest weight, such that ∂π(Ejk )x = 0 for all x in
the weight subspace of m and for all j, k = 1, . . . , d with j < k. The representation
π is (up to unitary equivalence) uniquely determined by its highest weight. On the
other hand the weight m is uniquely determined by its values m(Ejj ) = mj on the
basis Ejj of tC (d). We will express this fact in the following as “m = (m1 , . . . , md )
is the highest weight of the representation π”. For each analytic representation of
GL(d, C) the mj are integers satisfying the inequalities m1 ≥ m2 ≥ · · · ≥ md
and the converse is also true: each family of integers with this property defines the
highest weight of an analytic, irreducible representation of GL(d, C).
In a similar way we can define weights and highest weights for representations
of the group SL(d, C) as linear forms on the Cartan subalgebra stC (d). As in the
GL(d, C)-case an irreducible representation π of SL(d, C) is characterized uniquely
by its highest weight m. However we can not evaluate m on the basis Ejj since these
matrices are not trace free. One possibility is to consider an arbitrary extension of
m to the algebra tC (d) = stC (d) ⊕ C1I. Obviously this extension is not unique.
Therefore the values m(Ejj ) = mj are unique only up to an additive constant. To
circumvent this problem we will use usually the normalization condition m d = 0.
In this case the integer mj corresponds to the number of boxes in the j th row of
the Young tableau usually used to characterize the irreducible representation π.
Another possibility to describe the weight m is to use the basis Hj of stC (d). We
get a sequence of integers lj = m(Hj ), j = 1, . . . , d − 1. They are related to the
mj by lj = mj − mj+1 . Each sequence l1 , . . . , ld−1 defines the highest weight of an
irreducible representation of SL(d, C) iff the lj are positive integers.
Finally consider the representation π̄ conjugate to π, i.e. π(u) = π(u). If π
is irreducible the same is true for π̄. Hence π̄ admits a highest weight which is
given by (−md , −md−1 , . . . , −m1 ). If π is a SU(d) representation we can apply the
normalization md = 0. Doing this as well for the conjugate representation we get
(m1 , m1 − md−1 , . . . , m1 − m2 , 0). In terms of Young tableaus this corresponds to
the usual rule to construct the tableau of the conjugate representation: Complete
the Young tableau of π to form a d × m1 rectangle. The complementary tableau
rotated by 180◦ is the Young tableau of π̄.
Let us discuss now the Casimir elements of SL(d, C). Since SL(d, C) is a subgroup
of GL(d, C) its enveloping algebra S is a subalgebra of G. However the corresponding
Lie algebras differ only by the center of gl(d, C). Hence the center Z(S) of S is a
subalgebra of Z(G). Since sl(d, C) is simple there is no first order Casimir element
and there is only one second order Casimir element C e 2 which is therefore a linear
combination C e 2 = C2 + αC2 of C2 and C2 . Obviously the factor α is uniquely
1 1
determined by the condition that the expression
2
d
X X d
X
e2 (π) = C1 (π) + αC12 (π) =
C m2j + (mj − mk ) + α mj (8.48)
j=1 j<k j=1
with ∂π(C e 2) = C
e2 (π)1I is invariant under the renormalization (m1 , . . . , md ) 7→ (m1 +
µ, . . . , md + µ) with an arbitrary constant µ. Straightforward calculations show that
α = − d1 . Hence we get C e2 = C2 − 1 C 2 and
d 1
Xd Xd X
1
e2 (π) = (d − 1)
C m2j − mj mk + d (mj − mk ) . (8.49)
d j=1 j6=k j<k
After the general discussion let us consider now several special problems. The
first is optimal cloning of pure states. In other words we are searching for a device
T ∈ T (N, M ) which acts on N d-level systems, each of them in the same (unknown)
pure state ρ and which yields at its output side an M -particle system in a state
T ∗ (ρ⊗N ) which approximates the product state ρ⊗M “as good as possible”. This is
obviously easy if N ≥ M holds, because we only have to drop some particles. Hence
we will assume throughout this chapter M > N , as long as nothing else is explicitly
stated.
The presentation in this chapter is based on [230, 136] and it concerns universal
and symmetric cloners, i.e. we are looking at problems which admit symmetry prop-
erties as discussed in Section 8.2. Other related work in this direction, in most cases
restricted to the qubit case can be found in [96, 41, 42, 45]. Other approaches to
quantum cloning, which are not subject of this work, include “asymmetric cloning”,
which arises if we trade the quality of one particular output system against the rest
(see [49]) and cloning of Gaussian states [50].
9.1 Figures of merit
To get a figure of merit ∆ which measures the quality of the clones, we can follow
the general formula (8.4). Hence we have to choose the set of pure states for X
and β(ρ) = ρ⊗M for the target functional. The remaining freedom is the distance
measure δ and there are in fact two physically different choices: We can either
check the quality of each clone separately or we can test in addition the correlations
between output systems. With the notation
where the supremum is taken over all pure states ρ and j = 1, . . . , N and F1C
denotes the “one-particle fidelity”
£ ¤
F1C (T ) = inf tr T (ρ(j) )ρ⊗N . (9.3)
ρ pure,j
∗ ⊗N
∆C1 measures the worst one particle error of the output state T (σ ). If we are
interested in correlations too, we have to choose
£ ¡ ⊗M ⊗N
¢¤
∆Call (T ) = sup 1 − tr T (ρ )ρ C
= 1 − Fall (T ) (9.4)
ρ,pure
∆Call measures again a “worst case” error, but now of the full output with respect
to M uncorrelated copies of the input ρ. Note that we can replace the fidelity
quantities in Equation (9.4) and (9.2) by other distance measures like trace-norm
9. Optimal Cloning 126
distances1 or relative entropies without changing the results we are going to present
significantly (although some proofs might become more difficult). This is however a
special feature of the pure state case. For mixed state cloning the correct choice of
the figure of merit has to be done much more carefully; cf the discussion in Section
9.6.
Another simplification which arises from the restriction to pure input states
concerns the dependency of ∆C C
1 and ∆all on the channel T . Since ρ = |ψihψ| with
ψ ∈ H holds we have
for all A ∈ B(H⊗M ). Therefore tr[T (A)ρ⊗N ] depends only on the part of T (A)
⊗N
which is supported by the symmetric tensor product H+ ⊂ H⊗N , i.e. we have
C C
∆] (T ) = ∆] (T+ ), ] = 1, all with T+ (A) = SN T (A)SN , where SN denotes as
⊗N
in Subsection 8.2.2 the projection onto H+ . This implies that it is sufficient to
consider channels
⊗N
T : B(H⊗M ) → B(H+ ). (9.7)
⊗N
Since H+ ⊂ H⊗N we can look at such a T as a cp-map which takes its values in
⊗N ⊗N
B(H ), but in this sense it is not unital (since T is unital as map into B(H+ )
i.e. we have T (1I) = SN ). In other words the channel T from Equation (9.7) is not
in T (N, M ) and does not fit into the general discussion from Section 8.1. This is,
however, an artificial problem, because we can replace T by
⊥ ⊥
tr(SN ASN ) ⊥
Te(A) = T (A) + ⊥
SN , (9.8)
dim SN
⊥
where SN denotes the orthocomplement of SN . This new map is obviously com-
pletely positive and unital (i.e. in T (N, M )) but the additional term does not change
C e
the value of ∆C C
] (i.e. ∆] (T ) = ∆] (T )). Whenever it is necessary for formal reasons
to consider T from Equation (9.7) as an element of T (N, M ) such an extension is
understood.
9.2 The optimal cloner
According to the discussion from the last paragraph, the optimal cloning map has
⊗N
to take density operators on H+ to operators on H⊗M . An easy way to achieve
such a transformation is to tensor the given operator ρ with the identity operators
belonging to tensor factors (N + 1) through M , i.e., to take ρ 7→ ρ ⊗ 1I⊗(M −N ) . This
breaks the symmetry between the clones, making N perfect copies and (N − M )
states, which are worst possible “copies”. Moreover, it does not map to states on the
⊗M
Bose sector H+ , which would certainly be desirable, as the target states ρ⊗M are
supported by that subspace. An easy way to remedy both defects is to compress the
operator to the symmetric subspace with the projection SM . With the appropriate
normalization this is our definition of the cloning map, later shown to be optimal:
d[N ]
Tb∗ (ρ) = SM (ρ ⊗ 1I⊗(M −N ) )SM (9.9)
d[M ]
⊗N
where d[N ] (respectively d[M ]) denotes the dimension of H+ , i.e.
µ ¶ µ ¶
−d d+N −1
d[N ] = (−1)N = , (9.10)
N N
1 Trace norm distances would lead in the present case to exactly the same results, including the
values of the minimal errors from Proposition 9.2.2. This is easy to check. Other proofs, however,
would become more difficult, which is the reason why we have chosen fidelity based quantities.
127 9.2. The optimal cloner
⊗N
which can be checked easily using the occupation number basis of H+ . The channel
Tb given in Equation (9.9) produces M clones from N input systems. Sometimes it
is useful to have a symbol for Tb which indicates these numbers (i.e. if N and M are
not understood from the context) in this case we write TbN →M instead of Tb. The
following two propositions summarizes the most elementary properties of Tb.
Proof. Full symmetry and complete positivity of Tb are obvious. Hence it remains to
show that Tb is unital. With U(d) covariance of T (which is part of full symmetry)
we get
where π+ N
(U ) = SN U ⊗N SN . Irreducibility of π+ (cf. Proposition 8.2.7) implies
therefore Tb(1I) = λ1I. To determine the value of λ ∈ R+ , let us consider the density
matrix τN = d[N ]−1 SN and
£ ¤ £ ¤
λ = tr Tb(1I)τN = tr Tb∗ (τN ) (9.12)
· ¸
1
= tr SM (SN ⊗ 1I⊗(M −N ) )SM (9.13)
d[M ]
· ¸
SM
= tr = 1. (9.14)
d[M ]
C b C b
Proof. Consider ∆C
1 first. By definition we have ∆1 (T ) = 1 − F1 (T ) and
N
£ ¤ 1 X £ (j) b∗ ⊗M ¤
F1C (Tb) = inf tr T (σ (j) )σ ⊗N = tr ρ T (ρ ) (9.17)
σ pure,j N j=1
M
d[N ] X £ (j) ¤
F1C (Tb) = tr ρ SM (ρ⊗N ⊗ 1I⊗(M −N ) )SM . (9.18)
M d[M ] j=1
9. Optimal Cloning 128
2
P
Since SM is a projector (SM = SM ) and due to [ j σ (j) , SM ] = 0 this equation
leads to
M
d[N ] X £ (j) ⊗N ¤
F1C (Tb) = tr ρ (ρ ⊗ 1I⊗(M −N ) )SM (9.19)
M d[M ] j=1
d[N ] ³ £ ¤
= N tr SM (ρ⊗N ⊗ 1I⊗(M −N ) )SM (9.20)
M d[M ]
£ ¤´
+ (M − N ) tr SM (ρ⊗(N +1) ⊗ 1I⊗(M −N −1) )SM (9.21)
N £ b∗ ¤ (M − N ) d[N ] £ ¤
= tr TN →M (ρ⊗N ) + tr TbN∗ +1→M (ρ⊗(N +1) ) (9.22)
M M d[N + 1]
where ρ = |ψihψ| and the infimum is taken over all normalized ψ ∈ H. Since
∗
SM = SM and SM ψ ⊗M = ψ ⊗M we get
d[N ] ⊗N ⊗N ⊗N d[N ]
F1C (Tb) = inf hψ , ρ ψ ihψ ⊗(M −N ) , ψ ⊗(M −N ) i = . (9.25)
ψ d[M ] d[M ]
Together with ∆C C
all = 1 − Fall Equation (9.16) follows. 2
The significance of Tb lies in the fact that it is the only cloning map which
minimizes ∆C C
1 and ∆all . The central result of this section is therefore the following.
⊗N
Theorem 9.2.3 For any cloning map T : B(H⊗M ) → B(H+ ) (with M > N ,
d ⊗N
H = C and H+ denotes the N -fold symmetric tensor product) we have
¯ ¯
d − 1 ¯¯ N M + d ¯¯
∆C (T ) ≤ 1 − (9.26)
1
d ¯ N +d M ¯
¯ ¯
¯ d[N ] ¯¯
∆C (T ) ≤ ¯ 1 − . (9.27)
all ¯ d[M ] ¯
⊗N ⊗N
with covariant, unital channels Tm : B(Hm ) → B(H+ ) and η ∈ S(H+ ).
Proposition 9.3.1 The channel Tb minimizes the all particle error ∆C C
all = 1 − Fall .
C
Proof. To calculate the all particle fidelity
£ ∗ ⊗N Fall (T ) of¤ a fully symmetric T note that
U(d) covariance of T implies that tr T (ρ )ρ⊗M does not depend on the pure
state ρ ∈ S(H). Hence
£ ¤ £ ¤
C
Fall (T ) = inf tr T ∗ (σ ⊗N )σ ⊗M = tr T ∗ (ρ⊗N )ρ⊗M (9.29)
σpure
⊗M
Since ρ⊗M is supported by H+ we have due to PM 1 = SM the equalities
⊗M ⊗M
PM 1 ρ PM 1 = ρ and Pm ρ⊗M Pm = 0 for m 6= M 1. Hence only one term
remains in the sum (9.29) and we get
C
£ ∗ ⊗N ⊗M
¤
Fall (T ) = cM 1 tr TM 1 (ρ )ρ . (9.31)
If T is optimal we must have cM 1 = 1 and therefore
C
£ ∗ ⊗N ⊗M
¤
T = TM 1 and Fall (T ) = TM 1 (ρ )ρ (9.32)
To get an upper bound on Fall C
we use positivity of the operator SN − ρ⊗N
and U(d) covariance of TM 1 . The latter implies that T ∗ (SN /d[N ]) is a density
M M
matrix which commutes with π+ (U ). Irreducibility of π+ implies together with
∗ ∗
the fact that TM 1 is trace preserving that TM 1 (SN ) = d[N ]/d[M ]SM . Hence we get
according to T = TM 1
£ ¤
0 ≤ tr T ∗ (SN − ρ⊗N )ρ⊗M (9.33)
d[N ] £ ¤ £ ¤
= tr SM ρ⊗M − tr T ∗ (ρ⊗N )ρ⊗M (9.34)
d[M ]
d[N ] £ ¤
= − tr T ∗ (ρ⊗N )ρ⊗M . (9.35)
d[M ]
9. Optimal Cloning 130
C
Together with Equation (9.32) we get Fall (T ) ≤ d[N ]/d[M ]. However we already
know from Proposition 9.2.2 that Fall (T ) = d[N ]/d[M ], hence Tb is optimal.
C b
2
⊗N
Proposition 9.3.2 There is only one channel T : B(H ⊗M ) → B(H+ ) which
C C
minimizes ∆all (respectively maximizes Fall ).
⊗N
Proof. To prove uniqueness consider now a general channel T : B(H ⊗M ) → B(H+ )
C C
which minimizes ∆all (respectively maximizes Fall ). By averaging over the groups
U(d) and SM we get
Z
∗ 1 X £ N ¤
T (η) = Vτ U ⊗M ∗ T ∗ π+ N
(U )ηπ+ (U )∗ U ⊗M Vτ∗ dU (9.36)
M! U(d)
τ ∈SM
which is a fully symmetric channel. Due to convexity and invariance of ∆ C all we get
∆Call (T ) ≤ ∆ C
all (T ) and since T is by assumption already optimal: ∆ C
all (T ) = ∆Call (T ).
Hence T is optimal as well and at the same time fully symmetric; cf. the proof of
∗
Lemma 8.2.1. This implies according to Equation (9.32) that T (η) is supported by
⊗M ∗ ∗ ⊗N
H+ , i.e. T (η) = SM T (η)SM for all η ∈ S(H+ ). Hence we get for an arbitrary
⊗M
vector ψ ∈ H with SM ψ = 0
∗
0 = hψ, T (η)ψi (9.37)
Z
1 X ⊗M ∗ £ N ¤ ®
= U Vτ ψ, T ∗ π+ N
(U )ηπ+ (U )∗ U ⊗M Vτ∗ ψ dU. (9.38)
M! U(d)
τ ∈SM
This is a sum of integrals over positive quantities and it vanishes. Hence the
integrand has to be zero for all U and all τ , which implies in particular that
∗ ⊗M
hψ, T ∗ (η)ψi = 0. In other words T ∗ (η) is as T (η) supported only by H+ .
Since $T$ is optimal we have $F^C_{\mathrm{all}}(T) = d[N]/d[M]$. Together with Equations (9.29) and (9.35) this implies
$$\operatorname{tr}\bigl[\bar T^*(S_N - \rho^{\otimes N})\rho^{\otimes M}\bigr] = 0. \tag{9.39}$$
As in Equation (9.37) we can argue that (9.39) involves an integral over positive quantities which vanishes. Hence Equation (9.39) holds for $T$ as well:
$$\operatorname{tr}\bigl[T^*(S_N - \rho^{\otimes N})\rho^{\otimes M}\bigr] = 0. \tag{9.40}$$
Together with optimality this leads to $\operatorname{tr}\bigl[T^*(S_N)\rho^{\otimes M}\bigr] = d[N]/d[M]$ for all pure states $\rho$. Since the symmetric subspace $\mathcal H_+^{\otimes M}$ is generated by tensor products $\psi^{\otimes M}$, and due to the observation that $T^*(\eta)$ is supported by $\mathcal H_+^{\otimes M}$, we conclude that $T^*(S_N) = (d[N]/d[M])\,S_M$ holds.
To further exploit the optimality condition, consider the Stinespring dilation of $T^*$ in the form
$$T^*(\eta) = \frac{d[N]}{d[M]}\, V^*(\eta\otimes\mathbb 1_{\mathcal K})V \tag{9.41}$$
where $V : \mathcal H_+^{\otimes M}\to\mathcal H_+^{\otimes N}\otimes\mathcal K$ for some auxiliary Hilbert space $\mathcal K$, and $\eta$ is an arbitrary density matrix on $\mathcal H_+^{\otimes N}$. We have included the factor $d[N]/d[M]$ in this definition, so that for an optimal cloner $V^*V = \mathbb 1$. The optimality condition (9.40) written in terms of $V$ becomes
$$0 = \bigl\langle\psi^{\otimes M},\, V^*\bigl((S_N-\rho^{\otimes N})\otimes\mathbb 1_{\mathcal K}\bigr)V\,\psi^{\otimes M}\bigr\rangle = \bigl\|\bigl((S_N-\rho^{\otimes N})\otimes\mathbb 1_{\mathcal K}\bigr)V\,\psi^{\otimes M}\bigr\|^2 \tag{9.42}$$
where $\rho$ is the one-dimensional projection onto $\psi\in\mathcal H$. Equivalently, $\bigl((S_N-\rho^{\otimes N})\otimes\mathbb 1_{\mathcal K}\bigr)V\psi^{\otimes M} = 0$, which is to say that $V\psi^{\otimes M}$ must lie in the subspace $\psi^{\otimes N}\otimes\mathcal K$ for every $\psi$.
hence we get
where $\alpha_{U,\tau}$ is the action defined in Equation (8.6). Furthermore we know from Propositions 9.2.2 and 9.3.1 that $\operatorname{tr}\bigl(\rho^{\otimes M}\,T^*(\rho^{\otimes N})\bigr) \le \frac{d[N]}{d[M]}$ is true for all pure states $\rho\in\mathcal S(\mathcal H)$, and from Proposition 9.3.2 that equality holds iff $T = \hat T$. Consequently we have
$$\frac{1}{M!}\sum_{\tau\in S_M}\int_{U(d)}\Bigl(\frac{d[N]}{d[M]} - \operatorname{tr}\bigl[\rho^{\otimes M}\,(\alpha_{U,\tau}T)^*(\rho^{\otimes N})\bigr]\Bigr)\,dU = \frac{d[N]}{d[M]} - \operatorname{tr}\bigl(\rho^{\otimes M}\,\bar T^*(\rho^{\otimes N})\bigr) = \frac{d[N]}{d[M]} - \operatorname{tr}\bigl(\rho^{\otimes M}\,\hat T^*(\rho^{\otimes N})\bigr) = 0. \tag{9.48}$$
Since the integral on the left hand side of this equation is taken over positive quantities, the integrand has to vanish for all values of $U\in U(d)$ and $\tau\in S_M$. This implies $\operatorname{tr}\bigl[\rho^{\otimes M}\,T^*(\rho^{\otimes N})\bigr] = \frac{d[N]}{d[M]}$ for all pure states $\rho\in\mathcal S(\mathcal H)$. However, according to Proposition 9.3.2, this is only possible if $T = \hat T$. □
For later use (Chapter 11) let us temporarily drop our assumption that $M > N$ holds. In the case $M \le N$ the optimal "cloner" is easy to achieve: we only have to throw away $N - M$ particles, i.e. we can define
$$\hat T_{N\to M}^{\,*}(\rho) = \operatorname{tr}_{N-M}\,\rho \tag{9.49}$$
where $\operatorname{tr}_{N-M}$ denotes the partial trace over $N-M$ tensor factors. As in the $N < M$ case, $\hat T_{N\to M}$ is uniquely determined by optimality. The proof can be done (almost) as in Proposition 9.3.2.
Proposition 9.3.4 Assume that $N \ge M$ holds. Then $\hat T_{N\to M}$ from Equation (9.49) is the only channel $T : \mathcal B(\mathcal H^{\otimes M})\to\mathcal B(\mathcal H_+^{\otimes N})$ with $\Delta^C_{\mathrm{all}}(T) = 0$.
Proof. Assume that $T$ is optimal, i.e. $\Delta^C_{\mathrm{all}}(T) = 0$. Then we can show exactly as in the proof of Proposition 9.3.2 that $T^*(\eta)$ is supported by $\mathcal H_+^{\otimes M}$ for all $\eta\in\mathcal B^*(\mathcal H_+^{\otimes N})$. Hence we get
$$1 = \operatorname{tr}\bigl(T^*(\eta)\bigr) = \operatorname{tr}\bigl(T^*(\eta)S_M\bigr) = \operatorname{tr}\bigl(\eta\,T(S_M)\bigr) \tag{9.50}$$
for each pure state $\eta\in\mathcal B^*(\mathcal H_+^{\otimes N})$. This implies $T(S_M) = S_N$. Optimality of $T$ implies in addition $0 = \Delta^C_{\mathrm{all}}(T) = \Delta^C_{\mathrm{all}}(\bar T)$, where $\bar T$ denotes the averaged channel from Equation (9.36). Since $\bar T$ is fully symmetric we have
$$0 = \sup_{\sigma\ \mathrm{pure}}\bigl[1 - \operatorname{tr}\bigl(\bar T(\sigma^{\otimes M})\sigma^{\otimes N}\bigr)\bigr] = 1 - \operatorname{tr}\bigl(\bar T(\rho^{\otimes M})\rho^{\otimes N}\bigr), \tag{9.51}$$
with an arbitrary pure state $\rho$. The right hand side of this equation should be regarded, as in (9.37), as an integral over positive quantities which vanishes. Hence we get $\operatorname{tr}\bigl(T(\rho^{\otimes M})\rho^{\otimes N}\bigr) = 1$. Together with (9.50) this implies
$$0 = \operatorname{tr}\bigl[T(S_M - \rho^{\otimes M})\rho^{\otimes N}\bigr] \tag{9.52}$$
where $K(m)\in\mathbb N$ and the $T_{mj}$ are $U(d)$-covariant channels with
$$T_{mj}(A) = V_{mj}^*\,(A\otimes\mathbb 1)\,V_{mj}, \qquad V_{mj}\,\pi_+^N = (\pi_m\otimes\tilde\pi_j)\,V_{mj}. \tag{9.54}$$
To calculate $F_1^C$ note that, in analogy to Equation (9.29), the quantity $\operatorname{tr}\bigl[T(\sigma^{(k)})\sigma^{\otimes N}\bigr]$ does not depend on the pure state $\sigma$ or on the index $k = 1,\dots,M$. Hence we get
$$F_1^C(T) = \sum_{k=1}^M \frac{1}{M}\operatorname{tr}\bigl[T(\rho^{(k)})\rho^{\otimes N}\bigr] \tag{9.55}$$
for an arbitrary pure state $\rho$. Now consider the Lie algebra $\mathrm{sl}(d,\mathbb C)$ of $\mathrm{SL}(d,\mathbb C)$, i.e. the space of trace-free $d\times d$ matrices equipped with the commutator as the Lie bracket. The map $\mathrm{sl}(d,\mathbb C)\ni X\mapsto\sum_k X^{(k)}$ is the representation of $\mathrm{sl}(d,\mathbb C)$ corresponding to $U\mapsto U^{\otimes M}$. Hence we get
$$P_m\,\sum_{k=1}^M X^{(k)}\,P_m = \partial\pi_m(X)\otimes\mathbb 1_{\mathcal K_m}, \tag{9.56}$$
To further exploit this equation we need the following lemma, which helps to calculate $T_{mj}(\partial\pi_m(X))$.
Lemma 9.4.1 Let $\pi : U(d)\to\mathcal B(\mathcal H_\pi)$ be a unitary representation, and let $T : \mathcal B(\mathcal H_\pi)\to\mathcal B(\mathcal H_+^{\otimes N})$ be a completely positive, unital and $U(d)$-covariant map, i.e. $T\bigl(\pi(u)A\pi(u)^*\bigr) = \pi_+^N(u)\,T(A)\,\pi_+^N(u)^*$. Then there is a number $\omega(T)$ such that
$$T[\partial\pi(X)] = \omega(T)\sum_{k=1}^N X^{(k)}, \tag{9.58}$$
Consider a linear map $L$ satisfying the covariance condition
$$L(UXU^*) = \pi_+^N(U)\,L(X)\,\pi_+^N(U)^*. \tag{9.59}$$
Now note that we can identify $\mathcal B(\mathcal H_+^{\otimes N})$ with the tensor product $\mathcal H_+^{\otimes N}\otimes\mathcal H_+^{\otimes N}$. Hence the map which associates to each $U\in\mathrm{SU}(d)$ the operator $\mathcal B(\mathcal H_+^{\otimes N})\ni X\mapsto \pi_+^N(U)X\pi_+^N(U)^*\in\mathcal B(\mathcal H_+^{\otimes N})$ can be reinterpreted as a unitary representation of $\mathrm{SU}(d)$ on the representation space $\mathcal H_+^{\otimes N}\otimes\mathcal H_+^{\otimes N}$. In fact it is (unitarily equivalent to) the tensor product $\pi_+^N\otimes\pi_+^N$. Since $\mathrm{SU}(d)\ni U\mapsto U(\,\cdot\,)U^{-1}\in\mathcal B(\mathrm{su}(d))$ is the adjoint representation of $\mathrm{SU}(d)$, this implies that each linear map $L$ satisfying (9.59) intertwines $\pi_+^N\otimes\pi_+^N$ and the adjoint representation $\mathrm{Ad}$. Note in addition that the representation
$$\mathrm{sl}(d,\mathbb C)\ni X\mapsto \partial\pi_+^N(X) = \sum_{j=1}^N X^{(j)}\in\mathcal B(\mathcal H_+^{\otimes N}) \tag{9.60}$$
of the Lie algebra $\mathrm{sl}(d,\mathbb C)$ satisfies Equation (9.59) as well. Hence we have to show that all such intertwiners are proportional, or in other words that $\mathrm{Ad}$ is contained in $\pi_+^N\otimes\pi_+^N$ exactly once. This, however, is a straightforward application of standard results from group theory. We omit the details here; see [241, § 79, Ex. 4] instead. □
Hence, to find the minimizer we have to maximize ω(Tmj ), and this is in fact
the hard part of the proof. Therefore we will explain the idea first in the d = 2 case.
9.4.2 The qubit case
For $d = 2$ the representations of $\mathrm{SU}(2)$ are conventionally labeled by their "total angular momentum" $\alpha = 0, 1/2, 1, \dots$, which is related to the highest weight $m = (m_1, m_2)$ by $\alpha = (m_1 - m_2)/2$. The irreducible representation $\pi_\alpha$ has dimension $2\alpha + 1$, and is isomorphic to $\pi_+^N$ with $N = 2\alpha$ in the notation used above. For $\alpha = 1$ we get the 3-dimensional representation isomorphic to the rotation group, which is responsible for the importance of this group in physics. In a suitable basis $X_1, X_2, X_3$ of the Lie algebra $\mathrm{su}(2)$ we get the commutation relations $[X_1, X_2] = X_3$, and cyclic permutations of the indices thereof. In the $\alpha = 1$ representation $\partial\pi_1(X_k)$ generates the rotations around the $k$-axis in 3-space. The Casimir operator (cf. Subsection 8.3.3) of $\mathrm{SU}(2)$ is the square of this vector operator, i.e. $\tilde C_2 = \sum_{k=1}^3 X_k^2$. In the representation $\pi_\alpha$ it is the scalar $\alpha(\alpha+1)$, i.e., if we extend the representation $\partial\pi$ of the Lie algebra to the universal enveloping algebra (which also contains polynomials in the generators), we get $\partial\pi_\alpha(\tilde C_2) = \alpha(\alpha+1)\mathbb 1$. We can use this to determine $\omega(T_{mj})$ from Proposition 9.4.2 for arbitrary irreducible representations. This computation can be seen as an elementary computation of a so-called 6j-symbol, but we will not need to invoke any of the 6j-machinery.
Lemma 9.4.3 Consider three irreducible $\mathrm{SU}(2)$ representations $\pi_\alpha, \pi_\beta, \pi_\gamma$ with $\alpha,\beta,\gamma\in\{0,1/2,\dots\}$, an intertwining isometry $V\pi_\gamma = (\pi_\alpha\otimes\pi_\beta)V$ and the corresponding channel $T(A) = V^*(A\otimes\mathbb 1)V$. Then we have
$$\omega(T) = \frac12 + \frac{\alpha(\alpha+1)-\beta(\beta+1)}{2\gamma(\gamma+1)}. \tag{9.62}$$
The tensor product in the second summand can be re-expressed in terms of Casimir operators as
$$\sum_k \partial\pi_\alpha(X_k)\otimes\partial\pi_\beta(X_k) = \frac12\sum_k\bigl(\partial\pi_\alpha(X_k)\otimes\mathbb 1 + \mathbb 1\otimes\partial\pi_\beta(X_k)\bigr)^2 - \frac12\,\partial\pi_\alpha(\tilde C_2)\otimes\mathbb 1_\beta - \frac12\,\mathbb 1_\alpha\otimes\partial\pi_\beta(\tilde C_2). \tag{9.65}$$
Inserting this into the previous equation, using the intertwining property once again, and inserting the appropriate scalars for $\partial\pi(\tilde C_2)\equiv\tilde C_2(\pi)\mathbb 1$, we find that
$$\omega(T)\cdot\tilde C_2(\pi_\gamma) = \tilde C_2(\pi_\alpha) + \frac12\bigl(\tilde C_2(\pi_\gamma) - \tilde C_2(\pi_\alpha) - \tilde C_2(\pi_\beta)\bigr), \tag{9.66}$$
and hence
$$\omega(T) = \frac12 + \frac{\tilde C_2(\pi_\alpha) - \tilde C_2(\pi_\beta)}{2\,\tilde C_2(\pi_\gamma)}. \tag{9.67}$$
Inserting the value for $\tilde C_2$ we find
$$\omega = \frac12 + \frac{\alpha(\alpha+1) - \beta(\beta+1)}{2\gamma(\gamma+1)}. \tag{9.68}$$
Note that we have only used the fact that the Casimir operator $\tilde C_2$ is some fixed quadratic expression in the generators. This is also true for $\mathrm{SU}(d)$. Hence equation (9.67) also holds in the general case; this observation leads directly to Lemma 9.4.4. In particular, we have shown that for the purpose of optimizing $\omega(T_{mj})$ for any finite $d$ only the isomorphism types of $\pi_\alpha$ and $\pi_\beta$ are relevant, but not the particular intertwiner $V$.
To calculate $\omega(T_{mj})$ we have to set $\gamma = N/2$, and $\alpha$ is constrained by the condition that $\pi_\alpha$ must be a subrepresentation of $U\mapsto U^{\otimes M}$, which is equivalent to $\alpha\le M/2$. Finally we have $\tilde\pi_j = \pi_\beta$ for some $\beta = 0, 1/2, 1,\dots$, which is constrained by the condition that there must be a non-zero intertwiner between $\pi_\gamma$ and $\pi_\alpha\otimes\pi_\beta$. It is well known that this condition is equivalent to the inequality $|\alpha-\beta|\le\gamma\le\alpha+\beta$. This is the same as the "triangle inequality": the sum of any two of $\alpha, \beta, \gamma$ is not smaller than the third. The region of admissible pairs $(\alpha,\beta)$ is represented in Fig. 1.
Since $x\mapsto x(x+1)$ is increasing for $x\ge 0$, we maximize $\omega$ with respect to $\beta$ in equation (9.62) if we choose $\beta$ as small as possible, i.e. $\beta = |\alpha-\gamma|$. Then the numerator in equation (9.62) becomes
[Fig. 1: the region of admissible pairs $(\alpha,\beta)$, cut out by the triangle condition $|\alpha-\gamma|\le\beta\le\alpha+\gamma$ and the constraint $\alpha\le M/2$, with axis marks $N/2$, $M/2$ on the $\alpha$-axis and $(M-N)/2$, $N/2$ on the $\beta$-axis; the maximum $\omega_{\max}$ is attained at the corner $\alpha = M/2$, $\beta = (M-N)/2$.]
Then we have
$$\omega(T_{mn}) = \frac12 + \frac{\tilde C_2(\pi_m) - \tilde C_2(\pi_n)}{2\,\tilde C_2(\pi_+^N)}, \tag{9.72}$$
where $\tilde C_2$ denotes the second order Casimir operator of $\mathrm{SU}(d)$; cf. Subsection 8.3.3.
which follows from Lemma 9.4.1. Note that equation (9.73) is valid only for $X\in\mathrm{su}(d)$ (and not for $X\in\mathrm u(d)$ in general). Hence we have to consider the second order Casimir operator $\tilde C_2$ of $\mathrm{SU}(d)$, which is given, according to Subsection 8.3.3, by an expression of the form $\tilde C_2 = \sum_{jk} g^{jk}X_jX_k$. This is all we needed in the derivation of equation (9.67) in Lemma 9.4.3. Hence the statement follows. □
All channels $T_{mj}$ from Equation (9.54) are of the form (9.71) with some highest weight $n$. Hence the previous lemma, together with Proposition 9.4.2 and the fact that $\tilde C_2(\pi_+^N)$ is a positive constant, shows that we only have to maximize the function
$$W\ni(m,n)\mapsto F(m,n) = \tilde C_2(\pi_m) - \tilde C_2(\pi_n)\in\mathbb Z \tag{9.74}$$
on its domain
$$W = \{(m,n)\in\mathbb Z^d\times\mathbb Z^d \mid m\in Y_d(M)\ \text{and}\ \pi_+^N\subset\pi_m\otimes\pi_n\}, \tag{9.75}$$
where $\pi_+^N\subset\pi_m\otimes\pi_n$ stands for: "$\pi_+^N$ is a subrepresentation of $\pi_m\otimes\pi_n$"; the latter is a necessary and sufficient condition for the existence of an intertwining isometry $V$ between $\pi_+^N$ and $\pi_m\otimes\pi_n$. The first step is the following Lemma.
$$F(m,n) = F_1(m,n) - \frac{2MN - N^2}{d}, \tag{9.76}$$
with
Proof. The first step is to re-express $F(m,n)$ in terms of the $U(d)$ Casimir operators $C_2$ and $C_1^2$. Note in this context that, although equation (9.73) is, as already stated, valid only for $X\in\mathrm{su}(d)$, the representations $\pi_m$ and $\pi_n$ are still $U(d)$ representations. Hence we can apply the equation $\tilde C_2 = C_2 - \frac1d C_1^2$ given in Section 8.3.3:
$$F(m,n) = C_2(\pi_m) - C_2(\pi_n) - \frac1d\bigl(C_1^2(\pi_m) - C_1^2(\pi_n)\bigr). \tag{9.78}$$
This rewriting is helpful, because the invariants $C_1$ turn out to be independent of the variational parameters: Since $\pi_m$ is a subrepresentation of $U\mapsto U^{\otimes M} = \pi_1^{\otimes M}(U)$ and $\partial\pi_1^{\otimes M}(\mathbb 1) = M\mathbb 1$, we also have $C_1(\pi_m) = M$. On the other hand, the existence of an intertwining isometry $V$ with $V\pi_+^N = (\pi_m\otimes\pi_n)V$ implies
$$C_1(\pi_+^N)\,V = V\,\partial\pi_+^N(C_1) = \bigl(\partial\pi_m(C_1)\otimes\mathbb 1_n + \mathbb 1_m\otimes\partial\pi_n(C_1)\bigr)V = \bigl(C_1(\pi_m)\mathbb 1 + C_1(\pi_n)\mathbb 1\bigr)V \tag{9.79}$$
and therefore $C_1(\pi_+^N) = C_1(\pi_m) + C_1(\pi_n)$. Since $C_1(\pi_+^N) = N$ and $C_1(\pi_m) = M$ we get $C_1(\pi_n) = N - M$. Inserting this into equation (9.78) the statement follows. □
We see that only $F_1$ depends on the variational parameter and has to be maximized over $W$. To do this we have to express the constraints defining the domain $W$ more explicitly.
Lemma 9.4.6 If we introduce for each $n\in\mathbb Z^d$ the notation $\tilde n = (\tilde n_1,\dots,\tilde n_d) = (-n_d,\dots,-n_1)$, we can express the set $W$ as
$$W = \{(m,n) \mid \tilde n = m - \mu,\ \text{and}\ (m,\mu)\in W_1\} \tag{9.80}$$
with
$$W_1 = \Bigl\{(m,\mu)\in Y_d(M)\times\mathbb Z^d \,\Big|\, \sum_{k=1}^d\mu_k = N\ \text{and}\ 0\le\mu_k\le m_k - m_{k+1}\ \forall k = 1,\dots,d-1\Bigr\}. \tag{9.81}$$
The function $F_1$ is then given by
$$F_1(m,n) = F_1(m,\tilde n) = \sum_{k=1}^d\mu_k(2m_k - 2k - \mu_k) + (d+1)\sum_{k=1}^d\mu_k = F_2(m,\mu) + (d+1)N \tag{9.82}$$
with
$$W_1\ni(m,\mu)\mapsto F_2(m,\mu) = \sum_{k=1}^d\mu_k(2m_k - 2k - \mu_k)\in\mathbb Z. \tag{9.83}$$
Together with Equation (9.86) and the definition of $\tilde n$ we get
$$\pi_+^N\subset\pi_m\otimes\pi_n \iff \tilde n_k = m_k - \mu_k\ \text{with}\ 0\le\mu_k\le m_k - m_{k+1}\ \forall k=1,\dots,d-1\ \text{and}\ \sum_{k=1}^d\mu_k = N, \tag{9.88}$$
and this implies the statement about $W$. To express the function $F_1$ in terms of the new variables note that $C_2(\pi_n) = C_2(\pi_{\tilde n})$, hence
$$F_1(m,n) = F_1(m,\tilde n) = F_1(m, m-\mu). \tag{9.89}$$
Together with equation (9.77) this implies (9.76). □
This defines a variational problem in its own right. Any step increasing $m_1$ at the expense of some other $m_k$ increases $F_2$. This process terminates either when $m_1 = M$ and all other $m_k = 0$; this is surely the case for $M < N$, because then $\mu_d = N - m_1 + m_d \ge N - M > 0$. This is already the final result claimed in the Lemma. On the other hand, the process may terminate because $\mu_d$ reaches $0$ or would become negative. In the former case we get $\mu_d = 0$, and hence Case C or Case D. The latter case (termination at $\mu_d = 1$) may occur because the transformation $m_1\mapsto m_1+1$, $m_d\mapsto m_d-1$ changes $\mu_d = N - m_1 + m_d$ by $-2$. There are two basic situations in which changing both $m_1$ and $m_d$ is the only option for maximizing $F_3$, namely $d = 2$ and $m_1 = m_2 = \dots = m_d$. The first case is treated below as Case E. In the latter case we have $1 = N - m_1 + m_d = N$. Then the overall variational problem in the Lemma is trivial, because only one term remains, and one only has to maximize the quantity $2m_k - 2k - 1$, with trivial maximum at $k = 1$, $m_1 = M$.
Case C: $\mu_d = 0$, $m_d > 0$. For $\mu_d = 0$ the number $m_d$ does not enter the function $F_2$. Therefore the move $m_d\mapsto 0$, $m_1\mapsto m_1 + m_d$ increases $F_2$ by $\mu_1 m_d\ge 0$. Note that this is always compatible with the constraints, and we end up in Case D.
Case D: $\mu_d = 0$, $m_d = 0$, $d > 2$. Set $d\mapsto d-1$. Note that we could now use the extra constraint $\mu_{d'}\le m_{d'}$, where $d' = d-1$. We will not use it, so in principle we might get a larger maximum. However, since we do find a maximizer satisfying all constraints, we still get a valid maximum.
Case E: $d = 2$, $\mu_1 = m_1 - m_2$, $\mu_2 = 1$. In this case $m = (m_1, m_2)$ is completely fixed by the constraints. We have $m_1 + m_2 = M$ and $\mu_1 + \mu_2 = m_1 - m_2 + 1 = N$, hence $m_1 - m_2 = N - 1$. This implies $2m_1 = M + N - 1$, $2m_2 = M - N + 1$, and since $m_2\ge 0$ we get $M\ge N-1$. If $M = N-1$ holds we get $m_1 = N-1 = M$, $m_2 = 0$ and consequently $\mu_1 = N-1$. Together with $\mu_2 = 1 = N - M$ these are exactly the parameters where $F_2$ should take its maximum according to the Lemma. Hence assume $M\ge N$. In this case $\mu_2 = 1$ implies that $F_2$ becomes $NM - 3N - 2$, which is, due to $M\ge N$, strictly smaller than $F_2(M,0;N,0) = 2MN - N^2 - 2N$.
Uniqueness: In all cases just discussed the manipulations described lead to a strict increase of $F_2(m,\mu)$ as long as $(m,\mu)\ne(m_{\max},\mu_{\max})$ holds. The only exception is Case C with $\mu_1 = 0$. In this situation there is a $1 < k < d$ with $\mu_k > 0$. Hence we can apply the maps $d\mapsto d-1$ (Case D) and $m_d\mapsto 0$, $m_1\mapsto m_1+m_d$ (Case C) until we get $\mu_d\ne 0$ (i.e. $d$ reaches $k$). Since $\mu_1 = 0$ the corresponding $(m,\mu)$ is not equal to $(m_{\max},\mu_{\max})$. Therefore we can apply one of the manipulations described in Case A, Case B or Case E, which leads to a strict increase of $F_2(m,\mu)$. This shows that $F_2(m,\mu) < F_2(m_{\max},\mu_{\max})$ as long as $(m,\mu)\ne(m_{\max},\mu_{\max})$ holds. Consequently the maximum is unique. □
$$\omega_{\max} = \omega(\hat T) = \frac{M+d}{N+d}$$
and with Proposition 9.4.2 we get $\Delta_1^C(T)\ge\Delta_1^C(\hat T)$ for all $T$. □
Proof. One part of the uniqueness proof is already given above: there is only one optimal fully symmetric cloning map, namely $\hat T$. This follows easily from the uniqueness of the maximum found in Lemma 9.4.7 and from the fact that the representation $\pi_+^N$ is contained exactly once in the tensor product $\pi_M^+\otimes\pi_{M-N}^+$ (cf. Equation (9.87) and the decomposition of a fully symmetric $T$ from Equation (9.53)).
Suppose now that $T$ is a non-covariant cloning map which also attains the best value: $\Delta_1^C(T) = \Delta_1^C(\hat T)$. Then we may consider the average $\bar T$ of $T$ over the group $S_M\times U(d)$ (cf. Equation (9.36)), which is also optimal and, in addition, fully symmetric. Therefore $\bar T = \hat T$. The uniqueness part of the proof thus follows immediately from Proposition 9.3.3. □
$$\lim_{N\to\infty}\Delta^C_{\#}(\hat T_{N\to\lfloor rN\rfloor}) = 0 \tag{9.93}$$
holds, where $\lfloor x\rfloor$ denotes the biggest integer smaller than $x$. Note that this question is related to entanglement distillation and channel capacities, where asymptotic rates are used as well. In the present case the complete answer to our question is given by the following Theorem.
Theorem 9.5.1 For each asymptotic rate $r\in[1,\infty]$ we have
$$\lim_{N\to\infty}\Delta_1^C(\hat T_{N\to\lfloor rN\rfloor}) = 0 \tag{9.94}$$
$$\lim_{N\to\infty}\Delta^C_{\mathrm{all}}(\hat T_{N\to\lfloor rN\rfloor}) = 1 - \frac{1}{r^{d-1}}. \tag{9.95}$$
Proof. Consider first the one-particle error. According to Equation (9.15) we have
$$\Delta_1^C(\hat T_{N\to\lfloor rN\rfloor}) \le \frac{d-1}{d}\,\biggl|\,1 - \frac{N}{N+d}\,\frac{(r-1)N+d}{(r-1)N}\,\biggr| \tag{9.96}$$
$$= \frac{d-1}{d}\,\biggl|\,1 - \frac{1}{r-1}\,\frac{(r-1)+d/N}{1+d/N}\,\biggr|. \tag{9.97}$$
Hence $\lim_{N\to\infty}\Delta_1^C(\hat T_{N\to\lfloor rN\rfloor}) = 0$ as stated. The all particle error $\Delta^C_{\mathrm{all}}$ is given, according to Equations (9.16) and (9.10), by
$$\Delta^C_{\mathrm{all}}(\hat T_{N\to\lfloor rN\rfloor}) = 1 - \frac{(N+1)(N+2)\cdots(N+d-1)}{(\lfloor rN\rfloor+1)(\lfloor rN\rfloor+2)\cdots(\lfloor rN\rfloor+d-1)} \tag{9.98}$$
hence we get
$$\lim_{N\to\infty}\Delta^C_{\mathrm{all}}(\hat T_{N\to\lfloor rN\rfloor}) = 1 - \lim_{N\to\infty}\frac{(N+1)(N+2)\cdots(N+d-1)}{(rN+1)(rN+2)\cdots(rN+d-1)} = 1 - \frac{1}{r^{d-1}}. \tag{9.99}$$
This result complements Theorem 9.2.3, where we have seen that the one- and all-particle errors admit the same (unique) optimal cloner. If we consider the asymptotic behavior we see that both figures of merit behave very differently: We can produce optimal copies at infinite rate if we measure only the quality of individual clones. If we take correlations into account as well, the rate is, however, zero.
9.6 Cloning of mixed states
Up to now we have excluded a discussion of mixed state cloning and related tasks.
The reason is that the search for a reasonable figure of merit is much more difficult
in this case and not even clarified for classical systems. At first, the latter statement
sounds strange, because it is indeed possible to copy classical information without
any error. However, cloning mixed states of a classical system does not mean to
copy a particular code word (e.g. from the hard drive in the memory of a computer)
but to enlarge a sample of N iid random variables to a size M > N .
To explain this in greater detail let us consider a finite alphabet X, the corre-
sponding classical observable algebra C(X) and a channel T : C(X M ) → C(X N ).
This T can be interpreted as a device which maps codewords of length N to code-
words of length M and it is uniquely characterized by the matrix T~xy~ , ~x ∈ X N ,
~y ∈ X M of transition probabilities; i.e. T~x~y denotes the probability that the code-
word ~x = (x1 , . . . , xN ) is mapped to ~y = (y1 , . . . , yM ). If S denotes in addition
a source which produces letters from X independently and identically distributed
according to the (unknown) probability distribution p ∈ S(X) (recall the notation
and terminology from Subsection 2.1.3), we can describe classical cloning as follows:
Draw a sample ~x ∈ X N from S and generate with probability T~xy~ a bigger sequence
~y = (y1 , . . . , yM ) ∈ X M which reflects the statistics of S as good as possible. This
means the output distribution T ∗ (p⊗N ) with
X
(T ∗ ρ⊗N )(|~y ih~y |) = T~xy~ px1 . . . pxN (9.102)
x∈X N
~
for mixed state cloners is most probably even more difficult, and good proposals are up to now not available.
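The classical setting of Equation (9.102) is easy to simulate. The following sketch uses a binary alphabet and, as a purely hypothetical example of a channel $T_{\vec x\vec y}$ (not a construction from the text), resamples $M$ letters from the empirical distribution of the $N$ input letters; the output then reproduces the single-letter statistics of the source on average:

```python
import random

random.seed(0)

# alphabet X = {0, 1}; a classical "cloner" maps words of length N to
# words of length M by resampling from the input's empirical distribution
def clone_classical(word, M):
    f1 = sum(word) / len(word)   # empirical frequency of letter 1
    return [1 if random.random() < f1 else 0 for _ in range(M)]

# Monte Carlo estimate of the output statistics (9.102) for an iid source
p, N, M, runs = 0.7, 5, 10, 20000
total = 0.0
for _ in range(runs):
    x = [1 if random.random() < p else 0 for _ in range(N)]
    y = clone_classical(x, M)
    total += sum(y) / M
mean_freq = total / runs   # should be close to the source parameter p
```

The mean single-letter frequency of the output matches $p$, but (as the text emphasizes) this alone says nothing about how well the joint statistics of the enlarged sample mimic $M$ genuinely iid draws.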
Chapter 10
State estimation
Our next topic is quantum estimation theory, i.e. we are looking at measurements on $N$ $d$-level quantum systems which are all prepared in the same state $\rho$. There is quite a lot of literature about this topic and we are not able to give a complete discussion here (cf. Helstrom's book [109] for an overview and [107, 166, 173, 2, 89, 94, 43, 42, 68] and the references therein for a small number of recent publications). Instead we will follow the symmetry based approach already used in the last two chapters. Parts of the presentation (Theorem 10.2.4) are based on [137]. Other results (Theorems 10.1.2 and 10.2.6) are not yet published.
10.1 Estimating pure states
Consider first the case where we know that the N input systems are all prepared
in the same pure but otherwise unknown state. As for optimal cloning this assump-
tion leads to great simplifications and therefore to a quite complete solution of the
corresponding optimization problem.
10.1.1 Relations to optimal cloning
As already discussed in Section 4.2 cloning and estimation are closely related: If E :
C(S) → B(H⊗N ) is an estimator we can construct a cloning map TE : B(H⊗M ) →
B(H⊗N ) by (cf. Equation (4.19))
Z
TE∗ (ρ⊗N ) = σ ⊗M tr[E(dσ)ρ⊗N ]. (10.1)
S
where ψ, φ ∈ H⊗M and fψφ ∈ C(S) is the function given by fψφ (σ) = hψ, σ ⊗M φi.
If we insert $T_E$ into the figure of merit $\Delta_1^C$ we get, according to Equation (9.2),
$$\Delta_1^C(T_E) = 1 - \inf_{\rho\ \mathrm{pure},\,j}\operatorname{tr}\bigl[\rho^{(j)}\,T_E^*(\rho^{\otimes N})\bigr] \tag{10.3}$$
$$= 1 - \inf_{\rho\ \mathrm{pure}}\int_S\operatorname{tr}(\rho\sigma)\operatorname{tr}\bigl[E(d\sigma)\rho^{\otimes N}\bigr] \tag{10.4}$$
$$= 1 - \inf_{\rho\ \mathrm{pure}}\operatorname{tr}\bigl[\rho\langle E\rangle_\rho\bigr] \tag{10.5}$$
where $\langle E\rangle_\rho$ denotes the expectation value of $E$ in the state $\rho^{\otimes N}$, i.e.
$$\langle\psi,\langle E\rangle_\rho\phi\rangle = \int_S\langle\psi,\sigma\phi\rangle\operatorname{tr}\bigl[E(d\sigma)\rho^{\otimes N}\bigr] = \operatorname{tr}\bigl[E(f_{\psi\phi})\rho^{\otimes N}\bigr], \qquad\forall\phi,\psi\in\mathcal H \tag{10.6}$$
with $f_{\psi\phi}\in C(S)$, $f_{\psi\phi}(\rho) = \langle\psi,\rho\phi\rangle$. Hence a possible figure of merit for the estimation of pure states is the biggest deviation of the expectation value from the "true" density matrix $\rho$, i.e.
$$\Delta_p^E(E) = \sup_{\rho\ \mathrm{pure}}\bigl(1 - \operatorname{tr}\bigl[\rho\langle E\rangle_\rho\bigr]\bigr). \tag{10.7}$$
and therefore $\langle\alpha_U E\rangle_\rho = U\langle E\rangle_{U^*\rho U}U^*$ holds. In the same way we can show that $\langle\alpha_\tau E\rangle_\rho = \langle E\rangle_\rho$ holds for each permutation $\tau\in S_N$. Inserting this in (10.7) we get
$$\Delta_p^E(\alpha_{U,\tau}E) = \sup_{\rho\ \mathrm{pure}}\bigl(1 - \operatorname{tr}\bigl[\rho\langle\alpha_{U,\tau}E\rangle_\rho\bigr]\bigr) \tag{10.12}$$
$$= \sup_{\rho\ \mathrm{pure}}\bigl(1 - \operatorname{tr}\bigl[\rho\,U\langle E\rangle_{U^*\rho U}U^*\bigr]\bigr) \tag{10.13}$$
$$= \sup_{\rho\ \mathrm{pure}}\bigl(1 - \operatorname{tr}\bigl[(U^*\rho U)\langle E\rangle_{U^*\rho U}\bigr]\bigr) = \Delta_p^E(E). \tag{10.14}$$
We can therefore invoke Lemma 8.2.3 to see that we can search for estimators which minimize $\Delta_p^E$ among the covariant ones, and the latter are completely characterized by Theorem 8.2.12. The general structure is still quite complicated. However, we know that the input states are pure, and this leads to several possible simplifications. First of all, $\Delta_p^E(E)$ detects only the part of $E$ which is supported by the symmetric subspace $\mathcal H_+^{\otimes N}$. Hence we can restrict the discussion in this subsection to observables of the form
$$E : C(S)\to\mathcal B(\mathcal H_+^{\otimes N}). \tag{10.15}$$
This is the same type of assumption we have made already in the last chapter. In addition it is reasonable to search for an optimal pure state estimator among those observables which are concentrated on the set of pure states. But the latter is transitive under the action of $U(d)$. Covariance therefore implies, according to Theorem 8.2.11, that we have to look for observables of the form
$$E(f) = \int_{U(d)} f(U\sigma_0U^*)\,U^{\otimes N}P_0\,U^{\otimes N*}\,dU \tag{10.16}$$
where $\sigma_0$ is a fixed but arbitrary pure state and $P_0$ is a positive operator on $\mathcal H_+^{\otimes N}$. The most obvious choice for $P_0$ is just $\sigma_0^{\otimes N}$. Hence we define
$$\hat E(f) = d[N]\int_{U(d)} f(U\sigma_0U^*)\,(U\sigma_0U^*)^{\otimes N}\,dU. \tag{10.17}$$
Sometimes it is useful to keep track of the number of input systems on which the estimator operates. In this case we write $\hat E_N$ instead of $\hat E$.
The map $\hat E$ is obviously positive. To see that it is unital as well (and therefore an observable), note that
$$\int_{U(d)}(U\sigma U^*)^{\otimes N}\,dU = \int_{U(d)} U^{\otimes N}\sigma^{\otimes N}U^{\otimes N*}\,dU \tag{10.18}$$
is an operator which is supported by the symmetric subspace $\mathcal H_+^{\otimes N}$ and commutes with all unitaries $U^{\otimes N}$. Hence, by irreducibility of $\pi_+^N$, it coincides with $\lambda S_N$, where $\lambda$ is a positive constant which can be determined by
$$\lambda\,d[N] = \lambda\operatorname{tr}(S_N) = \operatorname{tr}\Bigl[\int_{U(d)}(U\sigma U^*)^{\otimes N}\,dU\Bigr] = \int_{U(d)}\operatorname{tr}\bigl[(U\sigma U^*)^{\otimes N}\bigr]\,dU = 1. \tag{10.19}$$
$$\Delta_p^E(\hat E) = \frac{d-1}{N+d} \tag{10.20}$$
and is therefore optimal.
Proof. Due to covariance the quantity $\operatorname{tr}\bigl[\rho\langle\hat E\rangle_\rho\bigr]$ does not depend on the pure state $\rho$. Hence we have
$$\Delta_p^E(\hat E) = 1 - \operatorname{tr}\bigl[\rho\langle\hat E\rangle_\rho\bigr] = 1 - \langle\psi,\langle\hat E\rangle_\rho\psi\rangle \tag{10.21}$$
for an arbitrary but fixed $\rho = |\psi\rangle\langle\psi|$. Inserting Equation (10.17) into (10.6) we get
$$\operatorname{tr}\bigl[\rho\langle\hat E\rangle_\rho\bigr] = d[N]\int_{U(d)}\operatorname{tr}(\rho\,U\sigma U^*)\operatorname{tr}\bigl(\rho^{\otimes N}(U\sigma U^*)^{\otimes N}\bigr)\,dU \tag{10.22}$$
$$= d[N]\operatorname{tr}\Bigl[\rho^{\otimes(N+1)}\int_{U(d)}(U\sigma U^*)^{\otimes(N+1)}\,dU\Bigr] = \frac{d[N]}{d[N+1]} \tag{10.23}$$
where we have used for the last equality the same reasoning as in Equation (10.19). Hence Equation (10.20) follows from the definition of $d[N]$ in (9.10), and optimality is an immediate consequence of the bound (10.8) derived from optimal cloning. □
effect $\hat E(\omega)$ such that $\operatorname{tr}\bigl[\rho^{\otimes N}\hat E(\omega)\bigr]$ is the probability to get an estimate in $\omega$ if the $N$ input systems were in the joint state $\rho^{\otimes N}$. Since $\hat E$ is concentrated on the set $P\subset S$ of pure states, only subsets $\omega$ of $P$ are interesting here. This leads to the following theorem.
Theorem 10.1.2 Consider the estimator $\hat E$ defined in Equation (10.17). The sequence of probability measures on the set $P$ of pure states
$$K_N(\omega) = \operatorname{tr}\bigl[\hat E_N(\omega)\rho^{\otimes N}\bigr] \tag{10.24}$$
satisfies the large deviation principle with rate function $I(\sigma) = -\ln\operatorname{tr}(\rho\sigma)$.
Proof. We use Theorem 10.3.5 and show that the probability measures $K_N$ satisfy the Laplace principle (cf. Definition 10.3.4). Hence consider a continuous, bounded function $f : P\to\mathbb R$ and
$$\lim_{N\to\infty}\frac1N\ln\int_{U(d)} e^{-Nf(U\sigma_0U^*)}\operatorname{tr}(\rho\,U\sigma_0U^*)^N\,dU \tag{10.25}$$
$$= \lim_{N\to\infty}\frac1N\ln\int_{U(d)} e^{-N[f(U\sigma_0U^*) - \ln\operatorname{tr}(\rho\,U\sigma_0U^*)]}\,dU \tag{10.26}$$
$$= -\inf_{U\in U(d)}\bigl[f(U\sigma_0U^*) - \ln\operatorname{tr}(\rho\,U\sigma_0U^*)\bigr] \tag{10.27}$$
$$= -\inf_{\sigma\in P}\bigl[f(\sigma) - \ln\operatorname{tr}(\rho\sigma)\bigr]. \tag{10.28}$$
To derive the second equation we have used Varadhan's Theorem (Theorem 10.3.2) and the fact that a constant sequence of measures satisfies the large deviation principle with zero rate function. Theorem 10.3.5 now implies that the $K_N$ satisfy the large deviation principle as stated. □
Since the rate function $I(\sigma) = -\ln\operatorname{tr}(\rho\sigma)$ is positive and vanishes only for $\sigma = \rho$, we see that the probability measures $K_N(\omega) = \operatorname{tr}\bigl[\rho^{\otimes N}\hat E_N(\omega)\bigr]$ converge weakly to a point measure concentrated at $\sigma = \rho$. This shows that the estimation scheme given by the sequence of optimal estimators $\hat E_N$ is asymptotically exact (cf. the corresponding discussion in Section 4.2).
10.2 Estimating mixed states
If no a priori information about the state ρ of the input systems is available, we
can try to generalize the figure of merit ∆Ep by replacing the supremum over all
pure states with the supremum over all density matrices. In addition we have to
use a different distance measure which is more appropriate for mixed states. A
good choice is the trace-norm distance (for a discussion of fidelities of mixed states
consider the corresponding Section in [172]) and we get
∆E
m (E) = sup kρ − hEiρ k1 . (10.29)
ρ∈S
while $E$ can be arbitrary transversal to them. Since the set of all orbits of (10.30) coincides with the set
$$\Sigma = \Bigl\{x\in[0,1]^d \,\Big|\, x_1\ge x_2\ge\dots\ge x_d\ge 0,\ \sum_{j=1}^d x_j = 1\Bigr\} \tag{10.31}$$
of ordered spectra (cf. Equation (8.39)), this observation indicates that the hard part of the estimation problem is estimating the spectrum of a density matrix, while the rest can be covered by methods we already know from pure state estimation.
10.2.1 Estimating the spectrum
To follow this idea let us introduce a spectral estimator as an observable
on N quantum systems with values in the set of ordered spectra. If we denote the
natural projection from S to Σ by p : S → Σ (i.e. p(ρ) coincides with the ordered
spectrum of ρ) we can construct a spectral estimator from a full estimator E by
F (f ) = E(f ◦ p), where f ∈ C(Σ). If E is fully symmetric the corresponding F is
invariant under U(d) transformations and permutations, i.e. it satisfies
Following Definition 8.2.4 we will denote each spectral estimator with this invari-
ance property fully symmetric. Theorem 8.2.5 implies immediately the following
proposition.
Proposition 10.2.1 Consider a fully symmetric spectral estimator, i.e. an observable $F : C(\Sigma)\to\mathcal B(\mathcal H^{\otimes N})$ satisfying Equation (10.33). There is a sequence $\mu_m$, $m\in Y_d(N)$, of probability measures on $\Sigma$ such that
$$F(f) = \sum_{m\in Y_d(N)} P_m\int_\Sigma f(x)\,\mu_m(dx) \tag{10.34}$$
holds.
We see that the structure of spectral estimators becomes much easier if we restrict our attention to the projection valued case. To indicate that we can do this without losing estimation quality, let us have a short look at an optimization problem which is similar to the one in the last section. To this end consider the expectation value of a (general) spectral estimator $F$,
$$\langle F\rangle_\rho = \int_\Sigma x\operatorname{tr}\bigl[\rho^{\otimes N}F(dx)\bigr], \tag{10.37}$$
$$\Delta_s^E(F) = \sup_{\rho\in S}\,\|\langle F\rangle_\rho - p(\rho)\|^2, \tag{10.38}$$
where $p(\rho)\in\Sigma$ denotes again the ordered spectrum of $\rho$ and $\|\cdot\|$ is the usual norm¹ of $\mathbb R^d$. In contrast to Theorem 10.1.1 we are not able to state a minimizer of this quantity explicitly. However, we can show at least that it is sufficient to search among projection valued observables.
Proposition 10.2.3 The figure of merit $\Delta_s^E$ is minimized by a projection valued estimator.
Proof. Using similar reasoning as in the pure state case (cf. Lemma 8.2.3) it is easy to see that $\Delta_s^E$ is minimized by a fully symmetric estimator $\tilde F$, i.e. we have $\Delta_s^E(\tilde F) = \inf_F\Delta_s^E(F)$. Inserting Equation (10.34) into (10.37) we get
$$\langle\tilde F\rangle_\rho = \sum_{m\in Y_d(N)}\operatorname{tr}\bigl[P_m\rho^{\otimes N}P_m\bigr]\int_\Sigma x\,\mu_m(dx) \tag{10.39}$$
and therefore
$$\Delta_s^E(\tilde F) = \sup_{\substack{\rho\in S \\ \det\rho>0}}\Bigl\|\sum_{m\in Y_d(N)}\chi_m(\rho)\dim\mathcal K_m\,x(m) - p(\rho)\Bigr\|^2, \tag{10.42}$$
where $x(m)$ are the first moments of the probability measures $\mu_m$, i.e.
$$x(m) = \int_\Sigma x\,\mu_m(dx). \tag{10.43}$$
The map $m\mapsto x(m)$ from Equation (10.43) defines according to Corollary 10.2.2 a projection valued spectral estimator $F$ which satisfies $\Delta_s^E(F) = \Delta_s^E(\tilde F)$. □
¹ ...the most simple quantity, because the term under the sup becomes a polynomial (although its coefficients depend in a difficult way on $m$ and $N$).
It turns out, somewhat surprisingly, that these $\hat F_N$ form an asymptotically exact estimation scheme, i.e. the probability measures $\operatorname{tr}[\hat F_N(\omega)\rho^{\otimes N}]$ converge weakly to the point measure at the spectrum $p(\rho)$ of $\rho$. Explicitly, for each continuous function $f$ on $\Sigma$ we have
$$\lim_{N\to\infty}\int_\Sigma f(x)\operatorname{tr}\bigl[\hat F_N(dx)\rho^{\otimes N}\bigr] = \lim_{N\to\infty}\sum_{m\in Y_d(N)} f\Bigl(\frac mN\Bigr)\operatorname{tr}\bigl(\rho^{\otimes N}P_m\bigr) = f\bigl(p(\rho)\bigr). \tag{10.46}$$
We illustrate this in Figure 10.1, for $d = 3$ and $\rho$ a density operator with spectrum $r = (0.6, 0.3, 0.1)$. Then $\Sigma$ is a triangle with corners $A = (1,0,0)$, $B = (1/2,1/2,0)$, and $C = (1/3,1/3,1/3)$, and we plot the probabilities $\operatorname{tr}(\rho^{\otimes N}P_m)$ over $m/N\in\Sigma$. This behavior was already observed by Alicki et al. [5] in the framework of statistical mechanics. We will now prove the following stronger result.
Theorem 10.2.4 The sequence of probability measures
$$K_N(\omega) = \operatorname{tr}\bigl[\hat F_N(\omega)\rho^{\otimes N}\bigr] \tag{10.47}$$
satisfies the large deviation principle on $\Sigma$ with rate function
$$\Sigma\ni x\mapsto I(x) = \sum_j x_j\bigl(\ln x_j - \ln p_j(\rho)\bigr)\in[0,\infty]. \tag{10.48}$$
To simplify the calculations note first that it is, due to $U(d)$ invariance of $\hat F$, sufficient to consider diagonal density matrices, with eigenvalues given in decreasing order. A further simplification arises if we set $\rho = e^h$ where $h = \operatorname{diag}(h_1,\dots,h_d)$ with $h_1\ge h_2\ge\dots\ge h_d$. Note that we exclude by this choice singular density matrices, i.e. those with a zero eigenvalue. However, we can recover the latter as a limiting case if some of the $h_j$ go to $-\infty$. Hence, to restrict the analysis to $\rho = e^h$ is no loss of generality.
Now we can define
$$c_N(y,h) = \frac1N\ln\int_{\mathbb R^d} e^{N\langle x,y\rangle}\operatorname{tr}\bigl[\hat F_N(dx)\,(e^h)^{\otimes N}\bigr] \tag{10.50}$$
$$= \frac1N\ln\sum_{m\in Y_d(N)} e^{\langle m,y\rangle}\chi_m(e^h)\dim\mathcal K_m \tag{10.51}$$
Figure 10.1: Probability distribution $\operatorname{tr}(\rho^{\otimes N}P_m)$ for $d = 3$, $N = 20, 100, 500$ and $r = (0.6, 0.3, 0.1)$. The set $\Sigma$ is the triangle with corners $A = (1,0,0)$, $B = (1/2,1/2,0)$, $C = (1/3,1/3,1/3)$.
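The concentration shown in Figure 10.1 can be reproduced with elementary means for $d = 2$, where the characters $\chi_m$ are Schur polynomials and the multiplicities $\dim\mathcal K_m$ have a simple binomial form (standard Schur–Weyl duality facts, not displayed in the text); a sketch in our own notation, for a qubit with spectrum $(0.8, 0.2)$:

```python
from math import comb

def schur2(m1, m2, p, q):
    # character chi_m(rho) for d = 2 and diagonal rho = diag(p, q):
    # Schur polynomial s_(m1,m2)(p,q) = (pq)^m2 * sum_i p^i q^(m1-m2-i)
    return (p * q) ** m2 * sum(p ** i * q ** (m1 - m2 - i)
                               for i in range(m1 - m2 + 1))

def mult2(N, m2):
    # dim K_m: multiplicity of the irrep with highest weight (N - m2, m2)
    return comb(N, m2) - (comb(N, m2 - 1) if m2 else 0)

N, p = 100, 0.8
probs = {(N - m2, m2): schur2(N - m2, m2, p, 1 - p) * mult2(N, m2)
         for m2 in range(N // 2 + 1)}
total = sum(probs.values())        # Schur-Weyl decomposition: equals (p+q)^N = 1
peak = max(probs, key=probs.get)   # the peak sits near m/N = (0.8, 0.2)
```

The distribution sums to one and peaks at $m/N$ close to the true spectrum, exactly the behavior plotted in Figure 10.1 for $d = 3$.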
where $\chi_m$ denotes again the character of $\pi_m$. It is easy to see that $c_N(y,h)$ exists for each $y$ and $h$. Hence it remains to show that the limit $c(y,h) = \lim_{N\to\infty}c_N(y,h)$ exists and that the function $y\mapsto c(y,h)$ is differentiable.
To get a more explicit expression for $\chi_m(\rho)$, note that $h$ is an element of the Cartan subalgebra $\mathfrak t_{\mathbb C}$ of $\mathrm{gl}(d,\mathbb C)$. Hence we can calculate $\lambda\cdot h$ for each weight $\lambda\in\mathfrak t_{\mathbb C}^*$ of $\pi_m$ (cf. Subsection 8.3.2). If we denote in addition the multiplicity of $\lambda$ (i.e. the dimension of the weight subspace of $\lambda$) by $\operatorname{mult}(\lambda)$ we get
$$\chi_m(\rho) = \sum_\lambda\operatorname{mult}(\lambda)\,e^{\lambda\cdot h} \tag{10.52}$$
where the sum is taken over the set of all weights $\lambda$ of $\pi_m$. If the matrix elements $h_k$ of $h$ are given (as assumed) in decreasing order, $\exp(m\cdot h)$ is the biggest exponential (this is equivalent to the statement that $m$ is the highest weight) and we have
$$e^{m\cdot h}\le\chi_m(\rho)\le\dim(\mathcal H_m)\,e^{m\cdot h}. \tag{10.53}$$
Using the Weyl dimension formula it can be shown [76] that dim(H_m) is bounded
from above by a polynomial in N, i.e.

    dim(H_m) ≤ (a_1 + a_2 N)^{a_3}   (10.54)

with positive constants a_1, a_2, a_3. Inserting this in Equation (10.51) we get

    c̃_N(y + h) ≤ c_N(y, h) ≤ c̃_N(y + h) + a_3 ln(a_1 + a_2 N)/N   (10.55)

where

    c̃_N(y) = (1/N) ln Σ_{m∈Y_d(N)} e^{m·y} dim K_m,   (10.56)

and we have identified here the diagonal matrix h = diag(h_1, ..., h_d) with the d-
tuple h = (h_1, ..., h_d) ∈ R^d. Equation (10.55) implies that

    c(y, h) = lim_{N→∞} c_N(y, h) = lim_{N→∞} c̃_N(y + h).   (10.57)
2 If y_1 ≥ ··· ≥ y_d holds, there is a direct way to calculate c̃(y), because we can invoke Equation
(10.57) to show that c̃(y) = c(0, y). Using the definition of c_N(y, h) in Equation (10.50) we easily
get c̃(y) = ln Σ_{j=1}^d exp(y_j); cf. [137]. For a general y ∈ R^d this argument does not work, however.
(10.70)
where the factors dim(H_m) are needed for normalization (this is straightforward to
check). As for the spectral estimator F̂, and in contrast to the pure state case, it is not
clear whether Ê is optimal for finite N, i.e. whether it minimizes an appropriately
chosen figure of merit. Nevertheless, we can extend the large deviation result from
Theorem 10.2.4 to get the following:
Theorem 10.2.6 Consider the estimator Ê_N from Equation (10.70) and a density
matrix ρ. The sequence

    K_N(ω) = tr[Ê_N(ω) ρ^⊗N]   (10.71)

satisfies the large deviation principle with a rate function I : S → [0, ∞] which is
given by

    I(U ρ_x U*) = Σ_{j=1}^d x_j ( ln(x_j) − ln[ pm_j(U*ρU) / pm_{j−1}(U*ρU) ] )   (10.72)

where x ∈ Σ, ρ_x is the density matrix from Equation (10.67), U ∈ U(d), pm_j(σ)
denotes the j-th leading principal minor of the matrix σ for j = 1, ..., d, and pm_0(σ) = 1.
Proof. We will show that the measures K_N satisfy the Laplace principle (Definition
10.3.4), which implies according to Theorem 10.3.5 the large deviation principle.
Hence we have to consider

    ∫_S e^{−N f(ρ)} K_N(dρ) = Σ_{m∈Y_d(N)} dim H_m dim K_m ∫_{U(d)} e^{−N f(U ρ_{m/N} U*)} ⟨φ_m, π_m(U*ρU) φ_m⟩ dU,   (10.73)

where we have set m_{d+1} = 0. Note that the right hand side of this equation makes
sense even if the exponents are not integer valued. We can therefore rewrite Equation
(10.73) with the probability measure

    ∫_Σ h(x) L_N(dx) = (1/d^N) Σ_{m∈Y_d(N)} h(m/N) dim(H_m) dim(K_m)   (10.75)
to get

    ∫_S e^{−N f(ρ)} K_N(dρ) =   (10.76)

    = d^N ∫_Σ ∫_{U(d)} e^{−N f(U ρ_x U*)} Π_{k=1}^d pm_k(U*ρU)^{N(x_k − x_{k+1})} dU L_N(dx)   (10.77)

    = ∫_Σ ∫_{U(d)} exp( −N [ f(U ρ_x U*) − ln(d) − I_1(U, x) ] ) dU L_N(dx)   (10.78)

with

    I_1(U, x) = Σ_{k=1}^d (x_k − x_{k+1}) ln[ pm_k(U*ρU) ]   (10.79)
where we have set x_{d+1} = 0. Now we can apply Lemma 10.2.5 and Equation (10.54)
to see that the L_N satisfy the large deviation principle on Σ with rate function³

    I_0(x) = ln(d) + Σ_{j=1}^d x_j ln(x_j).   (10.80)
The product measures dU L_N(dx) therefore satisfy the large deviation principle as
well, with the same rate function, but on U(d) × Σ. Varadhan's Theorem 10.3.2
therefore implies

    lim_{N→∞} (1/N) ln ∫_S e^{−N f(σ)} K_N(dσ) = − inf_{x,U} ( f(U ρ_x U*) − ln(d) − I_1(U, x) + I_0(x) )   (10.81)

    = − inf_{x,U} ( f(U ρ_x U*) + Σ_{j=1}^d x_j ln(x_j) − I_1(U, x) ).   (10.82)

Hence the K_N satisfy the Laplace principle, provided there is a well defined function
I : S → [0, ∞] with I(U ρ_x U*) = I_0(x) − ln(d) − I_1(U, x).
Lemma 10.2.7 There is a (unique) continuous function I on S such that

    I(U ρ_x U*) = Σ_{j=1}^d x_j ln(x_j) − I_1(U, x)   (10.83)

³ The measures K̃_N from Lemma 10.2.5 are slightly different. We therefore have to use Theorem
10.3.3 and the same reasoning as in the last paragraph of the proof of Theorem 10.2.4.
with j_0 = 0 and pm_0(σ) = 1. On the other hand, [U, ρ_x] = 0 implies that U is block
diagonal, U = diag(U_1, ..., U_k), with U_α ∈ U(d_α), d_α = j_{α+1} − j_α. Hence we have
pm_{j_α}(U*ρU) = pm_{j_α}(ρ) for all such U and all α. Together with Equation (10.84)
this shows that I is well defined.
To prove positivity of I consider a fixed x ∈ Σ. To get a lower bound on

    − Σ_{j=1}^d x_j ln[ pm_j(U*ρU) / pm_{j−1}(U*ρU) ]   (10.85)
we have to choose U such that the −ln terms are given in increasing order, i.e. in the
reverse ordering of the x_j. This implies in particular that −ln[pm_1(U*ρU)] should
be as small as possible, in other words pm_1(U*ρU) should be as big as possible. This
is achieved if pm_1(U*ρU) coincides with the biggest eigenvalue λ_1 of ρ. In this case
the basis vector e_1 has to be the eigenvector of U*ρU which corresponds to λ_1. This
shows that the biggest possible value of pm_2(U*ρU) is λ_1 λ_2, where λ_2 is the second
biggest eigenvalue of ρ. Again, this implies that e_2 is the corresponding eigenvector
of U*ρU. Proceeding in this way we see that the quantity in Equation (10.85)
is minimized if pm_j(U*ρU) = λ_1 λ_2 ··· λ_j, where λ_j, j = 1, ..., d are the eigenvalues
of ρ in decreasing order. Hence we get

    I(x, U) ≥ Σ_{j=1}^d x_j ( ln(x_j) − ln(λ_j) ).   (10.86)
Now we can invoke Theorem 10.3.5, which together with Equation (10.82)
and the preceding lemma implies the Theorem. □
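For a diagonal ρ with decreasing eigenvalues and U = 1I, the leading principal minors are just the partial products λ_1···λ_j, so the rate function of Theorem 10.2.6 reduces to the relative entropy of Theorem 10.2.4. A short numpy sketch (function names ours) makes this reduction explicit:

```python
import numpy as np

def leading_minors(sigma):
    """pm_j(sigma): determinants of the upper-left j x j blocks, with pm_0 = 1."""
    return [1.0] + [np.linalg.det(sigma[:j, :j]) for j in range(1, sigma.shape[0] + 1)]

def rate_full(x, sigma):
    """Rate function of Theorem 10.2.6 at U rho_x U*, evaluated with U = identity."""
    pm = leading_minors(sigma)
    return sum(xj * (np.log(xj) - np.log(pm[j] / pm[j - 1]))
               for j, xj in enumerate(x, start=1) if xj > 0)

lam = np.array([0.6, 0.3, 0.1])
rho = np.diag(lam)
x = np.array([0.5, 0.3, 0.2])
# the minors telescope, so the rate equals sum_j x_j (ln x_j - ln lambda_j)
print(abs(rate_full(x, rho) - np.sum(x * (np.log(x) - np.log(lam)))))  # ~0
print(rate_full(lam, rho))  # ~0: the estimator concentrates at the true state
```

The minimization over U in the proof above shows that this diagonal case is in fact the minimizing one.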
Throughout this work we are using two different methods to prove that a given
sequence K_N, N ∈ N satisfies the large deviation principle. One possibility, which
leads to the Gärtner-Ellis Theorem [85, Thm. II.6.1], is to look at the corresponding
sequence of Laplace transforms.

Theorem 10.3.3 Consider a finite dimensional vector space E with dual E* and
a sequence K_N, N ∈ N of probability measures on the Borel subsets of E. Define
c_N : E* → R by

    c_N(y) = (1/N) ln ∫_E e^{N y(x)} K_N(dx),   (10.90)
and assume that
    R*σ = λσ + (1 − λ) 1I/d.   (11.1)
Now we can follow the general structure given in Equation (8.4): We are searching
for channels which act on N systems in the state ρ = R*(σ), where σ ∈ S(H) is
pure. The target functional we want to approximate is (R*)^{−1}(ρ) = σ. If we follow
the two general options already encountered in Chapter 9 (all-particle error and
one-particle error) we get

    Δ_1^R(T) = sup_{σ pure, j} ( 1 − tr[ T(σ^(j)) (R*σ)^⊗N ] ) = 1 − F_1^R(T)   (11.2)

where the supremum is taken over all pure states σ and j = 1, ..., N, and F_1^R
denotes the "one-particle fidelity"

    F_1^R(T) = inf_{σ pure, j} tr[ T(σ^(j)) (R*σ)^⊗N ].   (11.3)

Here σ^(j) = 1I ⊗ ··· ⊗ σ ⊗ ··· ⊗ 1I denotes the tensor product with (M − 1) factors "1I"
and one factor σ at the j-th position (cf. Section 9.1). Δ_1^R measures the worst one-
particle error of the output state T*([R*σ]^⊗N). If we are interested in correlations
too, we have to choose

    Δ_all^R(T) = sup_{σ pure} ( 1 − tr[ T(σ^⊗M) (R*σ)^⊗N ] ) = 1 − F_all^R(T)   (11.4)
The second special feature of the qubit case concerns the rather simple structure
of the π_s. For each s the Hilbert space H_s is naturally isomorphic to the symmetric
tensor product H_+^⊗2s and π_s is unitarily equivalent to π_+^2s (the restriction of
U ↦ U^⊗2s to H_+^⊗2s). The decomposition of a fully symmetric T from 8.2.8 therefore
becomes T(A) = ⊕_s T_s(A) ⊗ 1I, with fully symmetric channels T_s : B(H^⊗M) →
B(H_+^⊗2s). Hence the T_s are exactly of the special form we have studied already in
Chapter 9 within optimal cloning. Hence let us define

    Q̂ : B(H^⊗M) → B(H^⊗N),   Q̂(A) = ⊕_{s∈I[N]} T̂_{2s→M}(A) ⊗ 1I   (11.8)

with

    T̂*_{2s→M}(θ) = ((2s+1)/(M+1)) S_M (θ ⊗ 1I^⊗(M−2s)) S_M   for 2s < M   (11.9)

and

    T̂*_{2s→M}(θ) = tr_{2s−M} θ   for 2s ≥ M.   (11.10)
The action of Q̂ on a system in the state ρ^⊗N can be interpreted as follows: First
apply an instrument to the system which produces with probability

    w_N(s) = tr[ P_s ρ^⊗N P_s ]   (11.11)
    λ = tanh(β).   (11.14)

    ρ(β)^⊗N = exp(2βL_3) / (2 cosh(β))^N   (11.15)

where

    B(H^⊗N) ∋ L_3 = (1/2) ( σ_3 ⊗ 1I^⊗(N−1) + ··· + 1I^⊗(N−1) ⊗ σ_3 )   (11.16)

denotes the 3-component of angular momentum in the representation U ↦ U^⊗N.
Similarly we get

    ρ_s(β) = π_s(ρ(β)) / χ_s(ρ(β)) = [ sinh(β) / sinh((2s+1)β) ] exp(2βL_3^(s))   (11.17)

where L_3^(s) denotes again the 3-component of angular momentum but now in the
representation π_s. For w_N(s) introduced in (11.11) we can write

    w_N(s) = tr( P_s ρ(β)^⊗N P_s ) = [ sinh((2s+1)β) / ( sinh(β)(2 cosh(β))^N ) ] dim K_{N,s}.   (11.18)
The quantities w_N(s) are closely related to the spectral estimator F̂ introduced
in Equation (10.45). We only have to identify the set Σ with the interval [0, 1/2]
according to the map [0, 1/2] ∋ λ ↦ (1/2 + λ, 1/2 − λ) ∈ Σ. Then we get

    tr[ F̂(f) ρ(β)^⊗N ] = Σ_{s∈I[N]} f(s/N) w_N(s).   (11.20)
This observation will be very useful in Section 11.4. For now, note that the w_N(s)
define a probability measure. Hence Σ_s w_N(s) = 1 and 0 ≤ w_N(s) ≤ 1. Together
with the fact that the multiplicities dim K_{N,s} are independent of β we can extract
from Equation (11.18) a generating functional for dim K_{N,s}:

    2 sinh(β)(2 cosh(β))^N = Σ_{s∈I[N]} 2 sinh((2s+1)β) dim K_{N,s}   (11.21)

    (e^β − e^{−β})(e^β + e^{−β})^N = Σ_{s∈I[N]} ( e^{(2s+1)β} − e^{−(2s+1)β} ) dim K_{N,s},   (11.22)

obtaining

    dim K_{N,s} = [ (2s+1)/(N/2 + s + 1) ] C(N, N/2 − s),   (11.23)

where C(n, k) denotes a binomial coefficient,
provided N/2 − s is an integer, and zero otherwise. The same result can be derived using
representation theory of the symmetric group; see [195], where the more general case
dim H = d ∈ N is studied.
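Both the multiplicity formula (11.23) and the probabilistic normalization of the w_N(s) can be checked in a few lines of Python (function names ours; s2 stands for 2s throughout):

```python
from math import comb, cosh, sinh

def dim_K(N, s2):
    """Multiplicity dim K_{N,s} from Equation (11.23), with s2 = 2s.
    Nonzero only if N/2 - s = (N - s2)/2 is a nonnegative integer."""
    if s2 < 0 or s2 > N or (N - s2) % 2:
        return 0
    return (s2 + 1) * comb(N, (N - s2) // 2) // ((N + s2) // 2 + 1)

def w(N, s2, beta):
    """Weight w_N(s) from Equation (11.18)."""
    return sinh((s2 + 1) * beta) / (sinh(beta) * (2 * cosh(beta)) ** N) * dim_K(N, s2)

N = 20
# Schur-Weyl completeness: sum_s dim(H_s) dim(K_{N,s}) = 2^N, with dim H_s = 2s + 1
print(sum((s2 + 1) * dim_K(N, s2) for s2 in range(0, N + 1, 2)) == 2 ** N)  # True
# the w_N(s) form a probability distribution, cf. Equation (11.21)
print(sum(w(N, s2, 0.7) for s2 in range(0, N + 1, 2)))  # 1.0 up to rounding
```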
11.2.2 The one qubit fidelity

Our next task is to calculate F_1^R(Q̂). To this end note that due to covariance of the
depolarizing channel R the expression under the infima defining F_1^R(T) in Equation
(11.3) depends, for any fully symmetric purifier T, neither on σ nor on j. I.e. we get
with R*σ = ρ(β):

    F_1^R(T) = tr[ σ^(1) T*( ρ(β)^⊗N ) ]   (11.24)

with σ = |ψ⟩⟨ψ|. Further simplification arises if we introduce the parameter γ(θ),
which is defined for each density matrix θ on H^⊗M by

    γ(θ) = (1/M) tr(2L_3 θ).   (11.25)
To derive the relation of γ to F_1^R note that full symmetry of T implies, equivalently
to (11.24),

    F_1^R(T) = (1/M) Σ_{j=1}^M tr[ σ^(j) T*( ρ(β)^⊗N ) ].   (11.26)

Since σ = (1I + σ_3)/2 holds with the Pauli matrix σ_3, we get together with the
definition of L_3 in Equation (11.16)

    F_1^R(T) = (1/2) ( 1 + γ[ T*(ρ(β)^⊗N) ] ).   (11.27)

In other words it is sufficient to calculate γ[ T*(ρ(β)^⊗N) ] (which is simpler because
SU(2) representation theory is more directly applicable) instead of F_1^R(T).
Another advantage of γ is its close relation to the parameter λ = tanh(β) defining
the operation R* in Equation (11.1). In fact we have

    γ( ρ(β)^⊗N ) = (1/N) tr( 2L_3 ρ(β)^⊗N ) = (1/N) N tr( σ_3 ρ(β) ) = tanh(β) = λ.   (11.28)
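Equation (11.28), and the definition (11.25) it rests on, can be verified directly with numpy (a sketch; the variable names are ours):

```python
import numpy as np

beta = 0.35
sigma3 = np.diag([1.0, -1.0])
# single-qubit state rho(beta) = exp(beta * sigma3) / (2 cosh beta)
rho = np.diag(np.exp([beta, -beta])) / (2 * np.cosh(beta))
print(np.trace(sigma3 @ rho), np.tanh(beta))     # both equal lambda = tanh(beta)

# gamma for the two-fold tensor power, using the definition (11.25) with M = 2
L3 = 0.5 * (np.kron(sigma3, np.eye(2)) + np.kron(np.eye(2), sigma3))
gamma = np.trace(2 * L3 @ np.kron(rho, rho)) / 2
print(gamma)                                     # again tanh(beta), as in (11.28)
```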
¡ ¢
In other words the one particle restrictions of the output state T ρ(β)⊗N are given
by
£ ¤ £ ¤ 1I
γ T (ρ(β)⊗N ) σ + 1 − γ[T (ρ(β)⊗N )] . (11.29)
2
£ ¤
This implies that γ T (ρ(β)⊗N ) > λ should hold if T is really a purifier. Now we
can prove the following proposition:
Proposition 11.2.1 The one-qubit fidelity F_1^R(Q̂) of the optimal purifier is given
by

    F_1^R(Q̂) = Σ_{s∈I[N]} w_N(s) f_1(M, β, s)   (11.30)

with

    2 f_1(M, β, s) − 1 =
        (2s+1)/(2s) coth((2s+1)β) − (1/(2s)) coth(β)                   for 2s > M,
        [(M+2)/(2s+2)] (1/M) [ (2s+1) coth((2s+1)β) − coth(β) ]        for 2s ≤ M.   (11.31)
Proof. According to Equations (11.8) and (11.27) we have

    F_1^R(Q̂) = (1/2) ( 1 + Σ_{s∈I[N]} w_N(s) γ[ T̂*_{2s→M}(ρ_s(β)) ] )   (11.32)

             =: Σ_{s∈I[N]} w_N(s) f_1(M, β, s),   (11.33)

where the supremum is taken over all fully symmetric channels T : B(H^⊗M) →
B(H_+^⊗2s).
Proof. Validity of T(L_3) = ω(T) L_3^(s) follows from Lemma 9.4.1. If 2s < M, Equation
(11.35) is a consequence of Theorem 9.2.3. For 2s ≥ M note first that the one-qubit
error of T̂_{2s→M} vanishes, i.e. Δ_1^C(T̂_{2s→M}) = 0; cf. Equation (9.4). On the other
hand we know from Proposition 9.4.2 that Δ_1^C(T̂_{2s→M}) is related to ω(T̂_{2s→M}) by

    Δ_1^C(T̂_{2s→M}) = (1/2) ( 1 − (2s/M) ω(T̂_{2s→M}) ).   (11.36)
Now we have

    2 f_1(M, β, s) − 1 = γ[ T̂*_{2s→M}(ρ_s(β)) ] = (1/M) tr[ 2 T̂_{2s→M}(L_3) ρ_s(β) ]   (11.37)

    = (ω(T̂_{2s→M})/M) tr[ 2 L_3^(s) ρ_s(β) ] = ( ω(T̂_{2s→M}) 2s / M ) γ[ ρ_s(β) ]   (11.38)

and

    γ( ρ_s(β) ) = (1/(2s)) tr( 2 L_3^(s) ρ_s(β) ) = (1/(2s)) tr( 2L_3^(s) exp(2βL_3^(s)) ) / tr( exp(2βL_3^(s)) )   (11.39)

    = (1/(2s)) d/dβ ln tr( exp(2βL_3^(s)) )   (11.40)

    = (1/(2s)) d/dβ [ ln sinh((2s+1)β) − ln sinh(β) ]   (11.41)

    = (2s+1)/(2s) coth((2s+1)β) − (1/(2s)) coth(β).   (11.42)

Inserting the values of ω(T̂_{2s→M}) from Equation (11.35) and γ[ρ_s(β)] from Equation
(11.42) into Equation (11.38) we get Equation (11.31). □
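The two branches of Equation (11.31) fit together continuously at the branch point 2s = M, as they must for the fidelity of Q̂ to be well defined; a quick numerical check (helper names ours, with s2 = 2s):

```python
from math import cosh, sinh

def coth(x):
    return cosh(x) / sinh(x)

def f1(M, beta, s2):
    """f_1(M, beta, s) from Equation (11.31), with s2 = 2s."""
    if s2 > M:
        g = (s2 + 1) / s2 * coth((s2 + 1) * beta) - coth(beta) / s2
    else:
        g = (M + 2) / (s2 + 2) * ((s2 + 1) * coth((s2 + 1) * beta) - coth(beta)) / M
    return (1 + g) / 2

# evaluate the 2s <= M branch at the branch point and compare with the 2s > M formula
M, beta = 6, 0.8
left = f1(M, beta, M)
right = (1 + ((M + 1) / M * coth((M + 1) * beta) - coth(beta) / M)) / 2
print(abs(left - right))   # ~0: the two branches agree at 2s = M
```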
with σ = |ψ⟩⟨ψ|. Using this relation we can prove the following proposition:

Proposition 11.2.3 The all-qubit fidelity F_all^R(Q̂) of the optimal purifier is given
by

    F_all^R(Q̂) = Σ_{s∈I[N]} w_N(s) f_all(M, β, s)   (11.44)

Proof. Using the decomposition of Q̂ given in Equation (11.8) we get for the optimal
purifier, similarly to the last subsection,

    F_all^R(Q̂) = Σ_{s∈I[N]} w_N(s) tr[ σ^⊗M T̂*_{2s→M}( ρ_s(β) ) ].   (11.46)
Calculating the quantities f_all(M, β, s) is now more difficult, since knowledge of
T̂_{2s→M}(L_3) = ω(T̂_{2s→M}) L_3^(s) is not sufficient in this case. Hence we have to use
the explicit form of T̂_{2s→M} in Equation (11.9).
For 2s < M this leads to

    f_all(M, β, s) = ((2s+1)/(M+1)) ⟨ψ^⊗M, S_M ( ρ_s(β) ⊗ 1I^⊗(M−2s) ) S_M ψ^⊗M⟩   (11.48)

    = ((2s+1)/(M+1)) ⟨ψ^⊗M, ( ρ_s(β) ⊗ 1I^⊗(M−2s) ) ψ^⊗M⟩   (11.49)

    = ((2s+1)/(M+1)) ⟨ψ^⊗2s, ρ_s(β) ψ^⊗2s⟩   (11.50)

    = ((2s+1)/(M+1)) (1 − e^{−2β}) / (1 − e^{−(4s+2)β}).   (11.51)
We can now expand the "1I" in Equation (11.53) in the product basis, and apply (11.54),
to find

    S_{2s} [ (|ψ^⊗M⟩⟨ψ^⊗M|) ⊗ 1I^⊗(2s−M) ] S_{2s} = Σ_K [ C(2s−M, K−M) / C(2s, K) ] |K⟩⟨K|.   (11.55)
we get

    f_all(M, β, s) = [ (1 − e^{−2β}) / (1 − e^{−(4s+2)β}) ] C(2s, M)^{−1} Σ_K C(K, M) e^{2β(K−2s)}.   (11.58)

Now the statement follows from Equations (11.46), (11.51) and (11.58). □
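To guard against slips in the combinatorics, the closed form (11.58) can be compared, for 2s ≥ M, with a direct evaluation using the weights from Equation (11.55) and the eigenvalues of ρ_s(β) in the occupation number basis (a sketch; function names ours, s2 = 2s):

```python
from math import comb, exp, sinh

def f_all_direct(M, beta, s2):
    """f_all via the weights <K| S[(sigma^{oM}) o 1] S |K> of Equation (11.55)
    and the eigenvalues e^{2 beta (K - s)} sinh(beta)/sinh((2s+1) beta) of rho_s."""
    norm = sinh(beta) / sinh((s2 + 1) * beta)
    return sum(comb(s2 - M, K - M) / comb(s2, K)
               * exp(beta * (2 * K - s2)) * norm for K in range(M, s2 + 1))

def f_all_closed(M, beta, s2):
    """Closed form of Equation (11.58)."""
    pref = (1 - exp(-2 * beta)) / (1 - exp(-(2 * s2 + 2) * beta))
    return pref / comb(s2, M) * sum(comb(K, M) * exp(2 * beta * (K - s2))
                                    for K in range(M, s2 + 1))

print(abs(f_all_direct(3, 0.6, 10) - f_all_closed(3, 0.6, 10)))   # ~0
```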
and

    F_all^R(T) = Σ_{s∈I[N]} w_N(s) tr[ σ^⊗M T_s*( ρ_s(β) ) ].   (11.61)
The last two equations show that we have to optimize each component T_s of the
purifier T independently. In the one-qubit case this is very easy, because we can use
Lemma 11.2.2 to get T_s(L_3) = ω(T_s) L_3^(s) and γ[ T_s*(ρ_s(β)) ] = (ω(T_s)/M) tr( 2 L_3^(s) ρ_s(β) ).
Hence maximizing γ[ T_s*(ρ_s(β)) ] is equivalent to maximizing ω(T_s). But we have
according to Lemma 11.2.2

    max_{T_s} ω(T_s) = ω(T̂_{2s→M}) =
        M/(2s)            for 2s ≥ M,
        (M+2)/(2(s+1))    for 2s < M,   (11.62)
which we have to maximize, depends only on this part of the operation. Full sym-
metry implies in addition that T_s*(ρ_s(β)) is diagonal in the occupation number basis
(see Equation (11.54)), because T_s*(ρ_s(β)) commutes with each π_{s'}(U) (s' = M/2,
U ∈ U(2)) if π_s(U) commutes with ρ_s(β).
If M > 2s this means we have T_s*(ρ_s(β)) = κ_* σ^⊗M + r_*, where r_* is a positive
operator with σ^⊗M r_* = r_* σ^⊗M = 0. Inserting this into (11.63) we see that f_s(T_s) =
κ_*. Hence we have to maximize κ_*. The first step is an upper bound, which we get
from the fact that tr( σ^⊗2s ρ_s(β) ) 1I − ρ_s(β) is a positive operator. Since T_s*(1I) =
((2s+1)/(M+1)) 1I (another consequence of full symmetry) we have

    0 ≤ T_s*( tr( σ^⊗2s ρ_s(β) ) 1I − ρ_s(β) ) = ((2s+1)/(M+1)) tr( σ^⊗2s ρ_s(β) ) 1I − κ_* σ^⊗M − r_*.   (11.64)

Multiplying this equation with σ^⊗M and taking the trace we get

    κ_* ≤ ((2s+1)/(M+1)) tr( σ^⊗2s ρ_s(β) ).   (11.65)
However, calculating f_s(T̂_{2s→M}) we see that this upper bound is achieved; in other
words T̂_{2s→M} maximizes f_s.
If M ≤ 2s holds we have to use slightly different arguments, because the estimate
(11.65) is too weak in this case. However, we can consider in Equation (11.63) the
dual T_s instead of T_s* and then use similar arguments. In fact, for each covariant
T_s the quantity T_s(σ^⊗M) is, for the same reasons as T_s*(ρ_s(β)), diagonal in the
occupation number basis, and we get T_s(σ^⊗M) = κ σ^⊗2s + r, where r is again a positive
operator with r = Σ_{n=0}^{2s−1} r_n |n⟩⟨n| (|n⟩ denotes again the occupation number basis)
and κ is a positive constant. Since T_s is unital we get from 1I − σ^⊗M ≥ 0 the estimate
0 ≤ κ ≤ 1 in the same way as Equation (11.65). Calculating T̂_{2s→M}(σ^⊗M) shows
again that the upper bound κ = 1 is indeed achieved; however, it is now not clear
whether maximizing κ is equivalent to maximizing f_s(T_s).
Hence let us show first that κ = 1 is necessary for f_s(T_s) to be maximal. This
follows basically from the fact that T_s is, up to a multiplicative constant, trace
preserving. In fact we have

    tr( T_s(σ^⊗M) ) = tr( T_s(σ^⊗M) 1I ) = tr( σ^⊗M T_s*(1I) ) = (2s+1)/(M+1).   (11.66)

This means especially that κ + tr(r) = (2s+1)/(M+1) holds, i.e. decreasing κ by
0 < ε < 1 is equivalent to increasing tr(r) by the same ε. Taking into account that
ρ_s(β) = Σ_{n=0}^{2s} h_n |n⟩⟨n| holds with h_n = exp(2β(n−s)), we see that reducing κ by
ε reduces f_s(T_s) at least by

    ε ( tr( σ^⊗2s ρ_s(β) ) − tr( |2s−1⟩⟨2s−1| ρ_s(β) ) ) = ε ( e^{2βs} − e^{2β(s−1)} ) > 0.   (11.67)

Therefore κ = 1 is necessary.
The last question we have to answer is how the rest term r has to be chosen
for f_s(T_s) to be maximal. To this end let us consider the cloning fidelity of T_s,
i.e. F_all^C(T_s). It is, in contrast to f_s(T_s), maximized iff κ = 1. However, the operation
which maximizes F_all^C(T_s) is according to Proposition 9.3.4 unique. This implies that
κ = 1 fixes T_s completely. Together with the facts that κ = 1 is necessary for f_s(T_s)
to be maximal and κ = 1 is realized for T̂_{2s→M}, we conclude that max f_s(T_s) =
f_s(T̂_{2s→M}) holds, which proves the assertion. □
f_1(M, β, s) is relevant in this situation and we get, together with Equation (11.30),
the expression

    F_1^max(N, ∞) = Σ_{s∈I[N]} w_N(s) (1/2) [ 1 + (1/(2s+2)) ( (2s+1) coth((2s+1)β) − coth(β) ) ],   (11.69)

which obviously takes its values between 0 and 1. To take the limit N → ∞ we can
write

    lim_{N→∞} F_1^max(N, ∞) = lim_{N→∞} Σ_{s∈I[N]} w_N(s) f_{N,∞}(2s/N)   (11.70)

with

    f_{N,∞}(x) = (1/2) [ 1 + (1/(Nx+2)) ( (Nx+1) coth((Nx+1)β) − coth(β) ) ].   (11.71)

The functions f_{N,∞} are continuous, bounded and converge on each interval (ε, 1)
with 0 < ε < 1 uniformly to f_{∞,∞} ≡ 1. Hence the assumptions of Lemma 11.4.1 are
fulfilled and we get

    lim_{N→∞} F_1^max(N, ∞) = f_{∞,∞}(λ) = 1.   (11.72)
Hence we can produce arbitrarily good purified qubits at infinite rate if we have
enough input systems. In other words we have proved the following proposition:
Proposition 11.4.2 For each asymptotic rate r > 0 the optimal one-qubit fidelity
from Equation (11.59) satisfies
Let us now consider F_1^max(N, M) for M < ∞. Since F_1^max(N, M) > F_1^max(N, ∞)
we obviously have lim_{N→∞} F_1^max(N, M) = 1 for all M. Hence there is no difference
between finitely and infinitely many output systems, as long as we are looking only at the
limit lim_{N→∞} F_1^max(N, M). Our next task is therefore to analyze how fast the quantities
F_1^max(N, M) approach 1 as N → ∞. To this end we compare three different quan-
tities: F_1^max(N, ∞), F_1^max(N, 1) and f_1(1, β, N/2). The latter is the maximal fidelity
we can expect for N input systems. It corresponds to a device which produces an
output only with probability w_N(N/2) and declares failure otherwise (from Lemma
11.4.1 we see that this probability goes to 0 as N → ∞). In slight abuse of notation
we write F_1^max(N, 0) = f_1(1, β, N/2), expressing that this is the case with no de-
mands on output numbers at all. The results are given in the following proposition
and plotted in Figure 11.1.
Proposition 11.4.3 The leading asymptotic behavior (as N → ∞) of F_1^max(N, M)
for the cases M = 0, 1, ∞ is of the form

    F_1^max(N, M) = 1 − c_M/(2N) + o(1/N)   (11.74)

where, as usual, o(1/N) stands for terms going to zero faster than 1/N, and with

    c_0 = (1 − λ)/λ   (11.75)
    c_1 = (1 − λ)/λ²   (11.76)
    c_∞ = (λ + 1)/λ²   (11.77)
Figure 11.1: The one-qubit fidelity F_1(M, N) for M = 0, M = 1 and M = ∞, plotted as a function of N.
with f̃_{N,∞} = N(1 − f_{N,∞}). The existence of this limit is equivalent to the asymp-
totic formula (11.74). Lemma 11.4.1 leads to c_∞/2 = f̃_{∞,∞}(λ) with f̃_{∞,∞} =
lim_{N→∞} f̃_{N,∞} uniformly on (ε, 1). To calculate f̃_{∞,∞} note that

    f̃_{N,∞}(x) = N/(2(Nx+2)) + N coth(β)/(2(Nx+2)) + Rest   (11.79)

holds, where "Rest" is a term which vanishes exponentially fast as N → ∞. Hence
with coth(β) = 1/λ we get

    c_∞ = 2 f̃_{∞,∞}(λ) = (1 + λ)/λ².   (11.80)
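The value of c_∞ can be confirmed numerically from the definition f̃_{N,∞} = N(1 − f_{N,∞}) alone (a sketch; the overflow guard in `coth` is ours, exploiting that coth(x) → 1 extremely fast):

```python
from math import cosh, sinh, tanh

def coth(x):
    # coth(x) is 1 to machine precision for large x; avoids overflow of cosh
    return 1.0 if x > 30 else cosh(x) / sinh(x)

def f_tilde(N, x, beta):
    """N (1 - f_{N,infty}(x)) with f_{N,infty} from Equation (11.71)."""
    f = 0.5 * (1 + ((N * x + 1) * coth((N * x + 1) * beta) - coth(beta)) / (N * x + 2))
    return N * (1 - f)

beta = 1.0
lam = tanh(beta)
c_inf = (1 + lam) / lam ** 2                   # Equation (11.77)
print(f_tilde(10 ** 6, lam, beta), c_inf / 2)  # the two values approach each other
```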
The asymptotic behavior of F_1^max(N, 1) can be analyzed in the same way. The
only difference is that we now have to consider the 1 = M ≤ 2s branch of Equation
(11.31). In analogy to Equation (11.78) we have to look at

    lim_{N→∞} N( 1 − F_1^max(N, 1) ) = lim_{N→∞} Σ_{s∈I[N]} w_N(s) f̃_{N,1}(2s/N) = c_1/2   (11.81)
where coefficients with M + R > K are defined to be zero. We can write the non-zero
coefficients as

    c(K, M, R) = C(K, M)^{−1} C(K−R, M) = (K−M)!(K−R)! / ( K!(K−R−M)! )   (11.91)

    = [(K−M)/K] [(K−M−1)/(K−1)] ··· [(K−M−R+1)/(K−R+1)]   (11.92)

    = Π_{S=0}^{R−1} ( 1 − M/(K−S) ).   (11.93)
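The equality of the binomial form (11.91) and the telescoped product (11.93) is easy to confirm (function names ours):

```python
from math import comb, prod

def c_binom(K, M, R):
    """c(K, M, R) as in Equation (11.91); zero when M + R > K."""
    return 0.0 if M + R > K else comb(K - R, M) / comb(K, M)

def c_prod(K, M, R):
    """Telescoped product form of Equation (11.93)."""
    return prod(1 - M / (K - S) for S in range(R))

print(c_binom(20, 4, 7), c_prod(20, 4, 7))   # identical up to rounding
```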
To calculate now Φ(µ) recall that the weights w_N(s) approach a point measure
in 2s/N =: x concentrated at λ = tr(ρ(β)σ_3). This means that in Equation (11.44)
only the term with 2s = λN survives the limit. Hence if µ ≥ λ we get M ≥ λN = 2s.
Using Equation (11.45) and Lemma 11.4.1 we get in this case

    Φ(µ) = (λ/µ) (1 − e^{−2β}).   (11.94)

For µ < λ a similar calculation based on Equation (11.58) leads to

    Φ(µ) = (1 − e^{−2β}) / ( 1 − (1 − µ/λ) e^{−2β} ).   (11.95)
Figure 11.2: Asymptotic all-qubit fidelity Φ(µ) plotted as a function of the rate µ, for θ = 0.25, 0.50, 0.75, 1.00.
Chapter 12
Quantum game theory
games – the two-person case is, however, sufficient for our purposes.
2 To be more precise we have to decompose the set of all nodes into equivalence classes ("in-
formation sets") such that all nodes in a class belong to the same player and the same moves are
available at each node. Each equivalence class represents the same position, although each node in
a class is given by a different combination of moves. A strategy is then a map from equivalence
strategy is given for each player we get a path in the tree which connects the top
node with one of the end nodes, where the payoffs are given. Hence we can construct
the normal form of a game from its extensive form, but the converse is not true. The
normal form of a game should therefore be regarded as a summary representation
– which contains, however, for many purposes enough information.
The aim of each player is of course to maximize her (or his) payoff u_j(s_A, s_B) by
a judicious choice of her own strategy. In general, however, this has to be done without
knowledge of the choice of the opponent. The most important concept to solve
this problem is the Nash equilibrium. A pair of strategies (ŝ_A, ŝ_B) ∈ X_A × X_B is
called a pure Nash equilibrium if

    u_A(s_A, ŝ_B) ≤ u_A(ŝ_A, ŝ_B)   and   u_B(ŝ_A, s_B) ≤ u_B(ŝ_A, ŝ_B)   (12.1)
holds for all s_A ∈ X_A and s_B ∈ X_B. Now we can define a (mixed) Nash equilibrium
to be a pair p̂_A ∈ S(X_A), p̂_B ∈ S(X_B) such that

    u_A(p_A, p̂_B) ≤ u_A(p̂_A, p̂_B)   and   u_B(p̂_A, p_B) ≤ u_B(p̂_A, p̂_B)

holds for all p_A ∈ S(X_A), p_B ∈ S(X_B), with the payoffs extended to probability
distributions in the obvious (bilinear) way. Due to a well known theorem of Nash [167],
each finite normal form game has a mixed Nash equilibrium, which is, however, in
general not unique.
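For a concrete instance of these definitions, consider matching pennies (a standard textbook example, not taken from the text): it has no pure Nash equilibrium, but uniform randomization by both players satisfies the mixed equilibrium condition. A small Python check over a grid of deviations:

```python
from itertools import product

# matching pennies: A wins (+1) if the two choices agree, B wins otherwise
uA = {(a, b): (1 if a == b else -1) for a, b in product((0, 1), repeat=2)}
uB = {k: -v for k, v in uA.items()}

def payoff(u, pA, pB):
    """Bilinear extension of the payoff to mixed strategies."""
    return sum(pA[a] * pB[b] * u[(a, b)] for a in (0, 1) for b in (0, 1))

hatA = hatB = (0.5, 0.5)                       # candidate mixed equilibrium
grid = [(t / 10, 1 - t / 10) for t in range(11)]
ok = (all(payoff(uA, pA, hatB) <= payoff(uA, hatA, hatB) + 1e-12 for pA in grid)
      and all(payoff(uB, hatA, pB) <= payoff(uB, hatA, hatB) + 1e-12 for pB in grid))
print(ok)   # True: no unilateral deviation helps either player
```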
12.1.2 Quantum games
Let us turn now to quantum games. Roughly speaking a quantum game is nothing
else but a usual game (in the sense described above) which is played with quantum
systems, i.e. the strategies can be represented by quantum operations. There are
several proposals which try to make this informal idea more precise. Most of them
are based on the normal form description of a game [84, 156, 149]. It is however
quite difficult to provide a definition which describes all relevant physical ideas
without excluding interesting examples (e.g. the version of the Monty Hall game
described in Section 12.2 is not covered by most of the proposed definitions). We
will follow therefore a different, turn based, approach which can be loosely regarded
as a generalization of the extensive form. Maybe it is still not general enough, but it
covers many relevant examples; in particular those which arise from quantum cryp-
tography (cf. Section 12.3). Hence a quantum game is in the following an interactive
process between two players (Alice and Bob) which obeys the following structure:
classes to moves rather than nodes to moves.
If the game is repeated many times and Alice uses strategy s_A^(1) with probability λ
and s_A^(2) with probability 1 − λ, Equation (12.4) becomes

    λ υ(s_A^(1), s_B) + (1−λ) υ(s_A^(2), s_B) = υ( λ s_A^(1) + (1−λ) s_A^(2), s_B ),   (12.6)

where

    λ s_A^(1) + (1−λ) s_A^(2) = ( λT_1^(1) + (1−λ)T_1^(2), ..., λT_{N−1}^(1) + (1−λ)T_{N−1}^(2) ).   (12.7)

Hence it is natural to assume that Σ_A and Σ_B are convex sets whose extremal
elements are the pure strategies.
3 Alternatively we can use a special condition on the classical system (“checkmate”) which
signals the end of the game. This is however more difficult to handle and we do not need this
generalization.
first appeared. The player is female, like Marilyn vos Savant, who was the first to fight in the public
debate for the recognition of the correct solution, and had to take some sexist abuse for that.
2. The candidate is asked to choose one of the three doors, which is, however,
not opened at this stage.
3. The show master opens another door, and shows that there is no prize behind
it. (He can do this, because he knows where the prize is).
4. The candidate can now open one of the remaining doors to either collect her
prize or lose.
Of course, the question is: should the candidate stick to her original choice or
“change her mind” and pick the other remaining door? As a quick test usually
shows, most people will stick to their first choice. After all, before the show mas-
ter opened a door the two doors were equivalent, and they were not touched (nor
was the prize moved). So they should still be equivalent. This argument seems so
obvious that trained mathematicians and physicists fall for it almost as easily as
anybody else.
However, the correct solution by which the candidates can, in fact, double their
chance of winning, is to always choose the other door. The quickest way to convince
people of this is to compare the game with another one, in which the show master
offers the choice of either staying with your choice or opening both other doors.
Anybody would prefer that, especially if the show master courteously offers to open
one of the doors for you. But this is precisely what happens in the original game
when you always change to the other door.
To catch up with the general discussion from the last section let us discuss the
normal form of this game. The pure strategies of Q are described by the numbers
of the doors where the prize is hidden⁵: X_Q = {1, 2, 3}. The player P can as well
choose one of the three doors in round 2 and has to decide whether she switches
(1) or not (0). Hence X_P = {1, 2, 3} × {0, 1}. The game is a zero-sum game, i.e.
u_Q = −u_P, and u_P has only two possible outcomes: +1 if P wins and −1 if she loses.
If j ∈ X_Q and (k, l) ∈ X_P we can write u_P simply as u_P(j; k, l) = (−1)^l (2δ_{kj} − 1).
If the game is repeated very often there are unique optimal strategies for both
players. Assume to this end that P has watched each issue of the show and has
calculated the probabilities p_j with which the prize is hidden behind door j.
Then her best option is to choose in the second round the door with the lowest p_j,
and her chance to win becomes 1 − min_j p_j if she switches at the end to the second
unopened door. This is even greater than 2/3 if Q does not use all three doors with
equal probability. Hence the best option for Q is to choose the uniform distribution.
The pair of strategies "uniform distribution" and "switch to the second door" is
therefore a Nash equilibrium.
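The 1/3 versus 2/3 claim is easy to confirm by simulation (a sketch; here Q hides the prize uniformly and the show master deterministically opens the first admissible door):

```python
import random

def play(switch, rng):
    prize, first = rng.randrange(3), rng.randrange(3)
    # show master opens a door that is neither the first choice nor the prize
    opened = next(d for d in range(3) if d != first and d != prize)
    final = first if not switch else next(d for d in range(3)
                                          if d != first and d != opened)
    return final == prize

rng = random.Random(0)
n = 100_000
stay = sum(play(False, rng) for _ in range(n)) / n
swap = sum(play(True, rng) for _ in range(n)) / n
print(stay, swap)   # close to 1/3 and 2/3: switching doubles the winning chance
```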
12.2.2 The quantum game
We will “quantize” only the key parts of the problem. That is, the prize and the
players, as well as their publicly announced choices, will remain classical. The quan-
tum version can even be played in a game show on classical TV.
The main quantum variable will be the position of the prize. It lies in a 3-
dimensional complex Hilbert space H, called the game space. We assume that an
orthonormal basis is fixed for this space so that vectors can be identified by their
components, but apart from this the basis has no significance for the game. A second
important variable in the game is what we will call the show master’s notepad,
described by an observable algebra N . This might be classical information describing
how the game space was prepared, or it might be a quantum system, entangled with
the prize. In the latter case, the show master is able to do a quantum measurement
5 If P selects the correct door in the second step, Q has to choose in step 3 between two doors.
However, taking this choice into account as an additional strategic option of Q makes things
more difficult without leading to new insights.
on his notepad, providing him with classical information about the prize, without
moving the prize, in the sense that the player’s information about the prize is
not changed by the mere fact that the show master “consults his notepad”. A
measurement on an auxiliary quantum system, even if entangled with a system of
interest, does not alter the reduced state of the system of interest. After the show
master has consulted his notepad, we are in the same situation as if the notepad
had been a classical system all along. As in the classical game, the situation for
the player might change when the show master, by opening a door, reveals to some
extent what he saw in his notepad. Opening a door corresponds to a measurement
along a one dimensional projection on H. Finally we need a classical system which
is used by both players to exchange classical information. We will call it the mail
box and describe it by a classical observable algebra C(X), where X can be taken
as the space of one-dimensional projections in H (i.e. X is the projective space
P²(C)). The overall algebra A which has to be used to describe the game according
to Subsection 12.1.2 is therefore A = B(H) ⊗ N ⊗ C(X).
The game proceeds in the following stages, closely analogous to the classical
game:
1. Before the show Q prepares the game space quantum mechanically and stores
some information about this preparation in his notepad N . The initial state
of the mail box can be arbitrary.
2. The candidate P chooses some one dimensional projection p on H and stores
this as classical information in the mailbox. The game space and the show
master's notepad (which P cannot access) remain untouched.
3. The show master opens a door, i.e., he chooses a one dimensional projection
q, and makes a Lüders/von Neumann measurement with projections q and
(1I − q). In order to do this, he is allowed first to consult his notebook. If it
is a quantum system, this means that he carries out a measurement on the
notebook. The joint state of prize and notebook then change, but the traced
out or reduced state of the prize does not change, as far as the player is
concerned. Two rules constrain the show master’s choice of q: he must choose
“another door” in the sense that q ⊥ p; and he must be certain not to reveal
the prize. The purpose of his notepad is to enable him to do this. After these
steps, the game space is effectively collapsed to the two-dimensional space
(1I − q)H and information about the opened door is stored in the mailbox.
4. The player P reads the mailbox, chooses a one dimensional projection p 0 on
(1I − q)H, and performs the corresponding measurement on the game space.
If it gives “yes” she collects the prize.
Note that we recover the classical game if Q and P are restricted to choosing pro-
jections along the three coordinate axes. This shows that the proposed scheme is
really a quantization as described in Subsection 12.1.2. As in the classical case, the
question is: how should the player choose the projection p′ in order to maximize her chance of winning? Perhaps it is best to try out a few options in a simulation, for
which a Java applet is available [64]. For the input to the applet, as well as for some
of the discussion below it is easier to use unit vectors rather than one-dimensional
projections. As standard notation we will use p = |ΦihΦ| for the door chosen by the player, q = |χihχ| for the door opened by Q, and r = |ΨihΨ| for the initial position of the prize, if the latter is defined.
From the classical case it seems likely that choosing p′ = p is a bad idea. So let us say that the classical strategy in this game consists of always switching to the orthogonal complement of the previous choice, i.e., to take p′ = 1I − q − p. Note that this is always a projection because, by rule 3, p and q are orthogonal one-dimensional projections. We will analyze this strategy in Sec. 12.2.3; the analysis turns out to be possible without any specification of how the show master guarantees not to stumble on the prize in step 3.
For the show master there are two main ways to satisfy the rules. The first is that he randomly chooses a vector in H, and prepares the
game space in the corresponding pure state. He can then just take a note of his choice
on a classical pad, so that in stage 3 he can compute a vector orthogonal to both the
direction of the preparation and the direction chosen by the player. Q’s strategies in
this case are discussed in Subsection 12.2.3. The second and more interesting way is
to use a quantum notepad, i.e., another system with three dimensional Hilbert space
K, and to prepare a “maximally entangled state” on H ⊗ K. Then until stage 3 the
position of the prize is completely undetermined in the strong sense only possible
in quantum mechanics, but the show master can find a safe door to open on H by
making a suitable measurement on K. Q’s strategies in this case are discussed in
Subsection 12.2.5.
12.2.3 The classical strategy
To explain why the classical strategy works almost as in the classical version of the
problem, we look more closely at the end of round 3, i.e. Q has opened one door
by measuring along q and the information which q he has chosen is stored in the
mailbox system. Q’s notepad is completely irrelevant from this stage on because it
is now P's turn and she cannot access it. Hence we have to look at a state ω on
the hybrid system B(H) ⊗ C(X). Note that ω depends on p but we suppress this
dependency in the notation. For a finite set X we have seen in Section 2.2.2 that ω is given by a probability distribution w(q) on X and a family ρq ∈ S(H), q ∈ X, of density operators, such that expectation values become

ω(p′ ⊗ f) = Σ_{q∈X} w(q) f(q) tr[ρq p′],   p′ ⊗ f ∈ B(H) ⊗ C(X).   (12.8)
The ρq are called conditional density operators and they represent, loosely speaking,
the density matrix which P has to use for the game space after Q has announced
his intention of opening door q. This is usually not the same conditional density
operator as the one used by Q: Since Q has more classical information about the
system, he may condition on that, leading to finer predictions. In contrast, ρq is
conditioned only on the publicly available information.
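Equation (12.8) is easy to check numerically. The following sketch is a toy example with an invented two-element classical register and invented qubit conditional states (not data from the text); it evaluates the expectation of p′ ⊗ f on a hybrid state:

```python
import numpy as np

# Toy hybrid state: classical register X = {0, 1} with weights w(q), and a
# conditional density operator rho_q on a qubit for each q (invented data).
w = np.array([0.5, 0.5])
rho = [np.array([[1, 0], [0, 0]], dtype=complex),            # rho_0 = |0><0|
       np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)]    # rho_1 = |+><+|

def expectation(p_prime, f):
    """omega(p' (x) f) = sum_q w(q) f(q) tr(rho_q p'), cf. Eq. (12.8)."""
    return sum(w[q] * f[q] * np.trace(rho[q] @ p_prime).real
               for q in range(len(w)))

p_prime = np.array([[1, 0], [0, 0]], dtype=complex)   # effect |0><0| on the qubit
f = np.array([1.0, 1.0])                              # f = 1: ignore the register
print(expectation(p_prime, f))                        # 0.5*1 + 0.5*0.5 = 0.75
```

Setting f = 1 recovers the expectation in the mean density operator, while f concentrated on one q picks out the corresponding conditional state.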
In our case X is not finite but the set of all one-dimensional projections in H.
Therefore Equation (12.8) is not applicable. Fortunately it can be generalized if we
replace the sum with an integral and the probability distribution with a probability
measure [175]:

ω(p′ ⊗ f) = ∫ w(dq) tr(ρq p′) f(q).   (12.9)

A special role is played by the mean conditional density operator

ρ = ∫ w(dq) ρq.   (12.10)

It will not depend on p, and it will be the same as the reduced density operator for the game space before the show master consults his notepad (he is not allowed to touch the prize), and even before the player chooses p (which cannot affect the prize).
From the rules alone we know two things about the conditional density operators: firstly, that tr(ρq q) = 0: the show master must not hit the prize. Secondly, q and p must commute, so it does not matter which of the two we measure first. Thus a measurement of p responds with probability ∫ w(dq) tr(ρq p) = tr(ρ p). Combining these two we get the overall probability wc for winning with the classical strategy as

wc = ∫ w(dq) tr(ρq (1I − p − q)) = 1 − tr(ρ p).   (12.11)
If we assume that ρ is known to P, from watching the show sufficiently often, the
best strategy for P is to choose initially the p with the smallest expectation with
respect to ρ, just as in the classical game with uneven prize distribution it is best
to choose initially the door least likely to contain the prize. If Q on the other hand
wants to minimize P’s gain, he will choose ρ to be uniform, which in the quantum
case means ρ = (1/3) 1I, and hence wc = 2/3.
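This equilibrium value can be checked with a quick Monte Carlo run of the underlying classical game (a sketch; the door labels and the deterministic tie-break for the host are our own choices and do not affect the winning probability):

```python
import numpy as np

rng = np.random.default_rng(0)

def round_won():
    prize = rng.integers(3)                  # Q hides the prize uniformly
    pick = 0                                 # P's initial door p
    # Q opens an empty door q with q != pick and q != prize
    q = next(d for d in (1, 2) if d != prize)
    switched = 3 - pick - q                  # classical strategy p' = 1I - p - q
    return switched == prize

n = 100_000
print(sum(round_won() for _ in range(n)) / n)   # ~ 2/3 = w_c
```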
12.2.4 Strategies against classical notepads
In this section we consider the case that the show master records the prepared
direction of the prize on a classical notepad. We will denote the one dimensional
projection of this preparation by r. Then when he has to open a door q, he needs
to choose q ⊥ r and q ⊥ p. This is always possible in a three dimensional space.
But unless p = r, he has no choice: q is uniquely determined. This is the same as
in the classical case, only that the condition "p = r", i.e., that the player chooses exactly the prize vector, typically has probability zero. Hence Q's strategic options
are not in the choice of q, but rather in the way he randomizes the prize positions
r, i.e., in the choice of a probability measure v on the set of pure states. In order to safeguard against the classical strategy he will make certain that the mean density operator ρ = ∫ v(dr) r is unpolarized (= (1/3) 1I). It seems that this is about all he has
to do, and that the best the player can do is to use the classical strategy, and win
2/3 of the time. However, this turns out to be completely wrong.
Preparing along the axes. — Suppose the show master decides that since the player
can win as in the classical case, he might as well play classically himself, and save the cost of an expensive random generator. Thus he fixes a basis and chooses each one of the basis vectors with probability 1/3. Then ρ = (1/3) 1I, and there seems to be
no giveaway. In fact, the two can now play the classical version, with P choosing
likewise a projection along a basis vector.
But suppose she does not, and chooses instead the projection along the vector Φ = (1, 1, 1)/√3. Then if the prize happens to be prepared in the direction
Ψ = (1, 0, 0), the show master has no choice but to choose for q the unique projec-
tion orthogonal to these two, which is along χ = (0, 1, −1). So when Q announces
his choice, P only has to look which component of the vector is zero, to find the
prize with certainty! In other words, a quantum strategy of P can always beat an opponent who is restricted to classical strategies. This is exactly the behavior we mentioned at the end of Section 12.1.
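This can be verified in a few lines (a sketch; for real directions the unique door orthogonal to both the prize vector and Φ is given by the cross product):

```python
import numpy as np

phi = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)    # P's choice
for k in range(3):                               # prize along a basis vector
    psi = np.zeros(3)
    psi[k] = 1.0
    chi = np.cross(psi, phi)                     # Q's forced door, orthogonal to both
    chi /= np.linalg.norm(chi)
    # the (essentially) vanishing component of chi marks the prize door
    assert int(np.argmin(np.abs(chi))) == k
print("P finds the prize with certainty")
```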
At first sight the success of P's strategy seems to be an artifact of the rather minimalistic choice of probability distribution. But suppose that Q has settled for
any arbitrary finite collection of vectors Ψα and their probabilities. Then P can
choose a vector Φ which lies in none of the two dimensional subspaces spanned by
two of the Ψα . This is possible, even with a random choice of Φ, because the union
of these two dimensional subspaces has measure zero. Then, when Q announces the
projection q, P will be able to reconstruct the prize vector with certainty: at most
one of the Ψα can be orthogonal to q. Because if there were two, they would span a
two dimensional subspace, and together with Φ they would span a three dimensional
subspace orthogonal to q, which is a contradiction.
Of course, any choice of vectors announced with floating point precision is a
choice from a finite set. Hence the last argument would seem to allow P to win with
certainty in every realistic situation. However, this only works if she is permitted
to ask for q at any desired precision. So by the same token (fixed length of floating
point mantissa) this advantage is again destroyed.
This shows, however, where the miracle strategies come from: by announcing
q, the show master has not just given the player log₂ 3 bits of information, but an
infinite amount, coded in the digits of the components of q (or the vector χ).
Preparing real vectors. — The discreteness of the probability distribution is not
the key point in the previous example. In fact there is another way to economize
on random generators, which proves to be just as disastrous for Q. The vectors in
H are specified by three complex numbers. So what about choosing them real for
simplicity? An overall phase does not matter anyhow, so this restriction does not
seem to be very dramatic.
Here the winning strategy for P is to take Φ = (1, i, 0)/√2, or another vector
whose real and imaginary parts are linearly independent. Then the vector χ ⊥ Φ
announced by Q will have a similar property, and also must be orthogonal to the
real prize vector. But then we can simply compute the prize vector as the cross product of the real and imaginary parts of χ.
For the vector Φ specified above we find that if the prize is at Ψ = (Ψ1 , Ψ2 , Ψ3 ),
with Ψk ∈ IR, the unique vector χ orthogonal to Φ and Ψ is connected to Ψ via the
transformations
χ ∝ (Ψ3 , −iΨ3 , −Ψ1 + iΨ2 ) (12.12)
Ψ ∝ (− Re χ3 , Im χ3 , χ1 ) , (12.13)
where “∝” means “equal up to a factor”, and it is understood that an overall phase
for χ is chosen to make χ1 real. This is also the convention used in the simulation
[64], so Eq. (12.13) can be tried out as a universal cheat against show masters using
only real vectors.
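The reconstruction can be spelled out for an arbitrary real prize vector (a sketch; the shortcut χ = conj(Φ × Ψ) for the unique direction orthogonal, in the Hermitian sense, to Φ and Ψ is our own):

```python
import numpy as np

rng = np.random.default_rng(1)

phi = np.array([1.0, 1.0j, 0.0]) / np.sqrt(2)   # Re and Im linearly independent

psi = rng.normal(size=3)                        # Q's real prize vector
psi /= np.linalg.norm(psi)

# Q's forced door: the unique direction with <phi|chi> = <psi|chi> = 0;
# for the Hermitian inner product this is chi = conj(phi x psi)
chi = np.conj(np.cross(phi, psi))
chi /= np.linalg.norm(chi)

# P recovers the prize as the cross product of Re(chi) and Im(chi)
guess = np.cross(chi.real, chi.imag)
guess /= np.linalg.norm(guess)
print(abs(np.dot(guess, psi)))                  # 1.0 up to rounding: prize found
```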
Uniform distribution. — The previous two examples have one thing in common: the
probability distribution of vectors employed by the show master is concentrated on
a rather small set of pure states on H. Clearly, if the distribution is more spread out,
it is no longer possible for P to get the prize every time. Hence it is a good idea for Q
to choose a distribution which is as uniform as possible. There is a natural definition
of “uniform” distribution in this context, namely the unique probability distribution
on the unit vectors, which is invariant under arbitrary unitary transformations. Is
this a good strategy for Q?
Let us consider the conditional density operator ρq , which depends on the two
orthogonal projections p, q. It implicitly contains an average over all prize vectors
leading to the same q, given p. Therefore, ρq must be invariant under all unitary
rotations of H fixing these two vectors, which means that it must be diagonal in the
same basis as p, q, (1I − p − q). Moreover, the eigenvalues cannot depend on p and
q, since every pair of orthogonal one dimensional projections can be transformed
into any other by a unitary rotation. Since we know the average eigenvalue in the
p-direction to be 1/3, we find
ρq = (1/3) p + (2/3)(1I − p − q).   (12.14)
Hence the classical strategy for P is clearly optimal. In other words, the pair of
strategies: "uniform distribution for Q and classical strategy for P" is a Nash equilibrium of the game. We do not know, however, whether this equilibrium is unique; in other words: if Q does not play precisely the uniform distribution, can P always improve on the classical strategy? We suspect that the answer is yes, but finding a proof of this conjecture has turned out to be a hard problem which is still open.
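The equilibrium value can also be checked by simulation (a sketch; Haar-random unit vectors are sampled from complex Gaussians, and the forced door and the switch are computed via conjugated cross products, an identification that is our own):

```python
import numpy as np

rng = np.random.default_rng(2)

def haar_vector(d=3):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def orth(a, b):
    """The unique direction Hermitian-orthogonal to a and b in C^3."""
    c = np.conj(np.cross(a, b))
    return c / np.linalg.norm(c)

phi = np.array([1.0, 0.0, 0.0], dtype=complex)  # P's first choice p
n, wins = 20_000, 0.0
for _ in range(n):
    psi = haar_vector()                 # uniformly distributed prize vector
    chi = orth(phi, psi)                # Q's forced door q
    phi2 = orth(phi, chi)               # classical strategy p' = 1I - p - q
    wins += abs(np.vdot(phi2, psi)) ** 2    # winning probability of this round
print(wins / n)                         # ~ 2/3, in accordance with Eq. (12.14)
```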
12.2.5 Strategies for quantum notepads
Assume now that the notepad system N is quantum rather than classical, i.e. N =
B(K) with K = C3 . Initially the bipartite system B(K ⊗ H) consisting of notepad
and game space is prepared in a maximally entangled state

Ω = (1/√3) Σ_{k=1}^{3} |kki.   (12.15)
We can state this in a stronger way, by introducing tougher rules for Q: in this variant P not only picks the direction p, but also two more projections p′ and p″ such that p + p′ + p″ = 1I. Then Q is not only required to open a door q ⊥ p; we require that either q = p′ or q = p″. It is obvious how Q can play this game with an entangled notepad: he just uses the transposes of p, p′, p″ as his observable. Then everything is as in the classical version, and the equilibrium is again at 2/3.
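Why transposes? On the maximally entangled vector Ω of Eq. (12.15) a projection r^T applied to the notepad acts exactly like r applied to the game space, so Q's measurement collapses the prize to one of P's three doors. The following sketch checks the identity (r^T ⊗ 1I)Ω = (1I ⊗ r)Ω numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3

# Omega = (1/sqrt(3)) sum_k |k>_K (x) |k>_H, cf. Eq. (12.15)
omega = np.eye(d).reshape(d * d) / np.sqrt(d)

# a random one-dimensional projection r = |v><v|
v = rng.normal(size=d) + 1j * rng.normal(size=d)
v /= np.linalg.norm(v)
r = np.outer(v, v.conj())

lhs = np.kron(r.T, np.eye(d)) @ omega   # measure r^T on the notepad K
rhs = np.kron(np.eye(d), r) @ omega     # measure r on the game space H
print(np.allclose(lhs, rhs))            # True
```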
12.2.6 Alternative versions and quantizations of the game
Quantizing something is seldom a problem with a unique solution and quantum
game theory is no exception in this respect. In the following we give a brief overview of some games which are closely related to our version.
Variants arising already in the classical case. — Some variants of the problem can
also be considered in the classical case, and they tend to trivialize the problem, so
that P’s final choice becomes equivalent to “Q has prepared a coin, and P guesses
heads or tails”. Here are some possibilities, formulated in a way applying both to
the classical and the quantum version.
• Q is allowed to touch the prize after P made her first choice. Clearly, in this
case Q can reshuffle the system, and equalize the odds between the remaining
doors. So no matter what P chooses, there will be a 50% chance for getting
the prize.
• Q is allowed to open the door first chosen by P. Then there is no way P’s first
choice enters the rules, and we may analyze the game with stage 2 omitted,
which is entirely trivial.
• Q may open the door with the prize, in which case the game starts again.
Since Q knows where the prize is, this is the same as allowing him to abort
the round, whenever he does not like what has happened so far, e.g., if he does
not like the relative position of prize and P’s choice. In the classical version
he could thus cancel 50% of the cases, where P’s choice is not the prize, thus
equalizing the chances for P’s two pure strategies. Similar possibilities apply
in the quantum case.
• Q may open the door with the prize, in which case P gets the prize. In the
classical version, revealing the prize is then the worst possible pure strategy,
so mixing in a bit of it would seem to make things always worse for Q. Then
although increasing Q’s options in principle can only improve things for Q,
one would advise him not to use the additional options. This is assuming,
though, that in the remaining cases Q sticks to his old strategy. However,
even classically, the relaxed rule gives him some new options: He can simply
ignore the notepad, and open any door other than p. Then the game becomes
effectively “P and Q open a door each, and P gets all prizes”. Assuming
uniform initial distribution of prizes this gives the same 2/3 winning chance
as in the original game.
The corresponding quantum strategy works in the same way. Assuming, for
simplicity, a uniform mean density operator ρ = (1/3) 1I, Q's strategy of ignoring
his prior information will give the classical 2/3 winning chance for P. But this
is a considerable improvement for Q in cases where a non-uniform probability
distribution of pure states previously gave Q a 100% chance of winning. So in
the quantum case, doing two seemingly stupid things together amounts to a
good strategy for Q: firstly, sometimes revealing the prize for P, and secondly
ignoring all prior information.
Note that this strategy is optimal for Q, because the classical strategy still
guarantees the 2/3 winning chance for P. This can be seen with the same
arguments as in Subsection 12.2.3. The only difference is that tr(ρq q) can be
nonzero, since Q may open the door with the prize. However in this case P
wins and we get instead of Equation (12.11)

wc = ∫ w(dq) ( tr(ρq (1I − p − q)) + tr(ρq q) )   (12.17)
   = 1 − tr(ρp) = 2/3.   (12.18)

ρ ↦ qρq + q′ρq′ + q″ρq″.   (12.19)
Two published versions. — Finally let us have a short look at two variants of the game which were proposed independently by other authors [152, 91].
where Q hides the prize, which P chooses in the second step and which Q
opens afterwards and the gameplay is described by the unitary operator
Figure 12.1: Schematic picture of a quantum coin-tossing protocol. The curly arrows stand for the flow of quantum or classical information or both.
and B denotes the field with two elements – in other words Alice’s notepad consists
in this case of n qubits and m classical bits.
If Alice wants to send data (classical or quantum) to Bob, she has to store them
in the mailbox system, where Bob can read them off in the next round. Hence each
processing step of the protocol (except the first and the last one) can be described
as follows: Alice (or Bob) uses her own private data and the information provided
by Bob (via the mailbox) to perform some calculations. Afterwards she writes the
results in part to her notepad and in part to the mailbox. An operation of this
kind can be described by a completely positive map TA : A ⊗ M → A ⊗ M, or (if
executed by Bob) by TB : M ⊗ B → M ⊗ B.
Based on these structures we can describe a coin tossing protocol as a special
case of the general scheme for a quantum game introduced in Subsection 12.1.2:
At the beginning Alice and Bob prepare their private systems in some initial state.
Alice uses in addition the mailbox system to share some information about her
preparation with Bob, i.e. Alice prepares the system A⊗M in a (possibly entangled,
or at least correlated) state ρA,0 , while Bob prepares his notepad in the state ρB,0 .
Hence the state of the composite system becomes ρ0 = ρA,0 ⊗ ρB,0 . Now Alice and
Bob start to operate alternately6 on the system, as described in the last paragraph,
i.e. Alice in terms of operations TA : A ⊗ M → A ⊗ M and Bob with TB : M ⊗ B →
M ⊗ B. After N rounds 7 the system therefore ends in the state (cf. Figure 12.1)

ρN = (T*A,N ⊗ IdB )(IdA ⊗ T*B,N−1 ) · · · (T*A,2 ⊗ IdB )(IdA ⊗ T*B,1 ) ρ0 ,   (12.21)
where IdA , IdB are the identity maps on A and B. Note that we have assumed here
without loss of generality that Alice performs the first (i.e. providing the initial
preparation of the mailbox) and the last step (applying the operation TA,N ). It
is obvious how we have to change the following discussion if Bob starts the game
or if N is odd. To determine the result Alice and Bob perform measurements on
their notepads. The corresponding observables EA = (EA,0 , EA,1 , EA,∅ ) and EB =
(EB,0 , EB,1 , EB,∅ ) can have the three possible outcomes X = {0, 1, ∅}, which we
6 This means we are considering only turn based protocols. If special relativity, and therefore
finite propagation speed for information, is taken into account it can be reasonable to consider
simultaneous exchange of information; cf. e.g. [132] for details.
7 Basically N is the maximal number of rounds: After K < N steps Alice (Bob) can apply
consists of all parts of the protocol Alice and Bob, respectively, can influence. Hence the sA represent Alice's and the sB represent Bob's strategies. As in Subsection 12.1.2 the sets of all strategies of Alice and Bob are denoted by ΣA and ΣB . Note
that ΣA depends only on the algebras A and M while ΣB depends on B and M.
Occasionally it is useful to emphasize this dependency (the number of rounds is kept
fixed in this paper). In this case we write ΣA (A, M) and ΣB (B, M) instead of ΣA
and ΣB . The probability that Alice gets the result a ∈ X if she applies the strategy
sA ∈ ΣA and Bob gets b ∈ X with strategy sB ∈ ΣB is (cf. Equation (12.4))
υ(sA , sB ; a, b) = tr[(EA,a ⊗ 1I ⊗ EB,b ) ρN ].   (12.23)
υ(s′A , sB ; x, x) ≤ 1/2 + ε   (12.24)

υ(sA , s′B ; x, x) ≤ 1/2 + ε   (12.25)
The two security conditions in this definition imply that neither Alice nor Bob can increase the probability of the outcome 0 or 1 beyond the bound 1/2 + ε. However it is more natural to think of coin tossing as a game with payoff defined
according to the following table
              Alice   Bob
a = b = 0       1      0
a = b = 1       0      1
other           0      0        (12.26)
This implies that Alice tries to increase only the probability for the outcome 0 and
not for 1 while Bob tries to do the contrary, i.e. increase the probability for 1. This
motivates the following definition.
υ(s′A , sB ; 0, 0) ≤ 1/2 + ε,   (12.27)

υ(sA , s′B ; 1, 1) ≤ 1/2 + ε.   (12.28)
Here R stands again for any finite dimensional (but arbitrarily large) observable
algebra.
Good coin tossing protocols are of course those with a small bias. Hence the
central question is: what is the smallest bias we have to accept, and what do the corresponding optimal strategies look like? To get an answer, however,
is quite difficult. Up to now there are only partial results available (cf. Section 12.3.5
for a summary).
Other, related questions arise if we exploit the game theoretic nature of
the problem. In this context it is reasonable to look at a whole class of quantum
games, which arises from the scheme developed up to now. We only have to fix the
algebras8 A, B and M and to specify a payoff matrix as in Equation (12.26). The
latter, however, has to be done carefully. If we consider instead of (12.26) the payoff
              Alice   Bob
a = b = 0       1     −1
a = b = 1      −1      1
other           0      0        (12.29)
we get a zero-sum game, which seems at first glance very reasonable. Unfortunately it admits a very simple (and boring) optimal strategy: Bob always produces the outcome 1 on his side while Alice always claims that she has measured 0. Hence they never agree and nobody has to pay. The game from Equation (12.26) does not suffer from this problem, because a draw is as bad for Alice as the case a = b = 1 where Bob wins.
12.3.2 Classical coin tossing
Let us now add some short remarks on classical coin tossing, which is included in the general scheme just developed as a special case: we only have to choose classical algebras for A, B and M, i.e. A = C(XA ), B = C(XB ) and M = C(XM ). The completely positive maps TA and TB describing the operations performed by Alice and Bob are in this case given by matrices of transition probabilities (see Sect. 3.2.3). This implies in particular that the strategies in ΣA , ΣB are in general mixed strategies. This is natural: there is of course no classical coin tossing protocol consisting of pure strategies, because it would always lead to the same result (either always 0 or always 1). However, we can decompose each mixed strategy in a unique way into a convex combination of pure strategies, and this can be used to show that there is no classical coin tossing protocol which admits the kind of security contained in Definitions 12.3.1 and 12.3.2.
8 In contrast to the security definitions given above this means that we assume limited resources (notepads) for Alice and Bob. This simplifies the analysis of the problem and should not be a big restriction (from the practical point of view) if the notepads are fixed but very large.
Proposition 12.3.3 There is no (weak) classical coin tossing protocol with bias ε < 1/2.
Proof. Assume a classical coin tossing protocol (sA , sB ) is given. Since its outcome
is by definition probabilistic, sA or sB (or both) are mixed strategies which can be
decomposed (in a unique way) into pure strategies. Let us denote the sets of pure
strategies appearing in this decomposition by Σ0A , Σ0B . Since the protocol (sA , sB )
is correct, each pair (sA , sB ) ∈ Σ0A × Σ0B leads to a valid outcome, i.e. either 0 or 1
on both sides. Hence there are two possibilities to construct a zero-sum game, either
Alice wins if the outcome is 0 and Bob if it is 1 or the other way round. In both cases
we get a zero-sum two-person game with perfect information, no chance moves 9 and
only two outcomes. In those games one player has a winning strategy (cf. Sect. 15.6,
15.7 of [224]), i.e. if she (or he) follows that strategy she wins with certainty, no
matter which strategy the opponent uses. This includes in particular the case where
the other player is honest and follows the protocol. If we apply this argument to both variants of the game, we see that either one player could force both possible outcomes or one bit could be forced by both players. Both cases fit the definition of (weak) coin tossing only if the bias is 1/2. This proves the proposition. 2
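The determinacy fact used in the proof — in a finite two-person zero-sum game with perfect information, no chance moves and only two outcomes, one player has a winning strategy — is just backward induction. A minimal sketch with an invented toy game tree:

```python
def winner(node, to_move):
    """Backward induction (Zermelo): leaves are 'A' or 'B' (the winner);
    an inner node is the list of positions the player to move can reach."""
    if node in ("A", "B"):
        return node
    other = "B" if to_move == "A" else "A"
    results = [winner(child, other) for child in node]
    # the player to move wins iff some move leads to a position she wins
    return to_move if to_move in results else other

# A moves first: either lose immediately, or hand B a position in which
# every move of B loses for B.
tree = ["B", ["A", "A"]]
print(winner(tree, "A"))   # 'A'
```

Applying `winner` to both orientations of the coin-tossing game tree is exactly the case distinction made in the proof.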
Note that the proof is not applicable in the quantum case (in fact there are
coin tossing protocols with bias less than 1/2 as we will see in Section 12.3.4). One
reason is that in the quantum case one does not have perfect information. E.g. if
Alice sends a qubit to Bob, he does not know what qubit he got. He could perform
a measurement, but if he measures in a wrong basis, he will inevitably change the
qubit.
Another way to circumvent the negative result of the previous proposition is
to weaken the assumption that both players can perform any operation on their
data. A possible practical restriction which comes to mind immediately is limited
computational power, i.e. we can assume that no player is able to solve intractable
problems like factorization of large integers in an acceptable time. Within the defi-
nition given above this means that Alice and Bob do not have access to all strategies
in ΣA and ΣB but only to certain subsets. Of course, such additional restrictions
can be imposed as well in the quantum case. To set them apart from the much stronger security requirements of Definitions 12.3.1 and 12.3.2, a protocol is sometimes called unconditionally secure if no additional assumptions about the accessible cheating
strategies are necessary (loosely speaking: the “laws of quantum mechanics” are the
only restriction).
12.3.3 The unitary normal form
A special class of quantum coin tossing arises if: 1. all algebras are quantum, i.e.
A = B(HA ), B = B(HB ) and M = B(HM ) with Hilbert spaces HA , HB and HM ;
2. the initial preparation is pure: ρA = |ψA ihψA | and ρB = |ψB ihψB | with ψA ∈
HA ⊗ HM and ψB ∈ HB ; 3. the operations TA,j , TB,j are unitarily implemented:
TA,j (ρ) = UA,j ρ U*A,j with a unitary operator UA,j on HA ⊗ HM , and something similar holds for Bob; and 4. the observables EA , EB are projection valued. It is
easy to see that the corresponding strategies (sA , sB ) ∈ ΣA × ΣB do not admit a
proper convex decomposition into other strategies. Hence they represent the pure
strategies. In contrast to the classical case it is possible to construct correct coin
tossing protocols with pure strategies. The following proposition was stated for the
first time (in a less explicit way) in [160] and shows that we can always replace a mixed strategy by a pure one without losing security.
Proposition 12.3.4 For each strategy sA ∈ ΣA (A, M) with A ⊂ B(HA ) there is a Hilbert space KA and a pure strategy σ̃A ∈ ΣA (Ã, M) with Ã = B(HA ⊗ KA ) such that

υ(sA , sB ; x, y) = υ(σ̃A , sB ; x, y)   (12.30)

holds for all sB ∈ ΣB (B, M) (with arbitrary Bob algebra B) and all x, y ∈ {0, 1, ∅}.

9 That means there are no outside probability experiments like dice throws.
A similar statement holds for Bob’s strategies.
Proof. Note first that all observable algebras A, B and M are linear subspaces
of pure quantum algebras, i.e. A ⊂ B(HA ), B ⊂ B(HB ) and M ⊂ B(HM ). In
addition it can be shown that Alice’s operations TA : A ⊗ M → B(HA ) ⊗ B(HM )
can be extended to a channel TeA : B(HA ) ⊗ B(HM ) → B(HA ) ⊗ B(HM ), i.e. a
quantum operation [178]; something similar holds for Bob’s operations. Hence we
can restrict the proof to the case where all three observable algebras are quantum.
Now the statement basically follows from the fact that we can find for each item in the sequence sA = (ρA ; TA,2 , . . . , TA,N ; EA ) a "dilation". For the operations TA,j this is just the ancilla representation given in Corollary 3.2.1, i.e.

TA,j (ρ) = tr2 ( Vj (ρ ⊗ |φj ihφj |) Vj∗ )   (12.31)
with a Hilbert space Lj , a unitary Vj on HA ⊗ Lj and a pure state φj ∈ Lj (and
tr2 denotes the partial trace over Lj ). Similarly, there is a Hilbert space L0 and a
pure state φ0 ∈ HA ⊗ L0 such that
ρA = tr2 (|φ0 ihφ0 |) (12.32)
holds (i.e. φ0 is the purification of ρA ; cf. Sect. 2.2), and finally we have a Hilbert
space LN +2 , a pure state φN +2 and a projection valued measure F0 , F1 , F∅ ∈ B(HA ⊗
LN +2 ) with

tr(EA,x ρ) = tr( Fx (ρ ⊗ |φN +2 ihφN +2 |) ),   (12.33)
this is another consequence of Stinespring's theorem. Now we can define the pure strategy σ̃A as follows:
KA = L0 ⊗ L2 ⊗ . . . ⊗ LN ⊗ LN +2 (12.34)
ψA = φ0 ⊗ φ2 ⊗ · · · ⊗ φN ⊗ φN +2 (12.35)
UA,j = 1I0 ⊗ 1I2 ⊗ · · · ⊗ Vj ⊗ · · · ⊗ 1IN ⊗ 1IN +2 (12.36)
ẼA,x = 1I0 ⊗ · · · ⊗ 1IN ⊗ Fx ,   (12.37)
where 1Ik denotes the unit operator on Lk , and in Equation (12.36) we have implicitly used the canonical isomorphism between HA ⊗ KA and L0 ⊗ · · · ⊗ HA ⊗ Lj ⊗ . . . ⊗ LN +2 . It is now easy to show that σ̃A satisfies Equation (12.30). 2
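The purification step (12.32) used in this proof can be made concrete (a sketch for a qubit, using the eigendecomposition of ρA; the random test state is invented):

```python
import numpy as np

rng = np.random.default_rng(4)

# a random qubit density operator rho_A
m = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho_A = m @ m.conj().T
rho_A /= np.trace(rho_A).real

# purification phi_0 = sum_k sqrt(lambda_k) |e_k> (x) |k> on H_A (x) L_0
lam, e = np.linalg.eigh(rho_A)
phi0 = sum(np.sqrt(lam[k]) * np.kron(e[:, k], np.eye(2)[k]) for k in range(2))

# partial trace over L_0 recovers rho_A, cf. Eq. (12.32)
full = np.outer(phi0, phi0.conj()).reshape(2, 2, 2, 2)
reduced = np.trace(full, axis1=1, axis2=3)
print(np.allclose(reduced, rho_A))   # True
```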
1. Preparation step: Alice throws a coin; the result is bA ∈ {0, 1}, with probability 1/2 each. She stores the result and prepares the system B(HA ) ⊗ B(HM ) in the state |ψbA ihψbA |, where |ψ0 i = (1/√2)(|0, 0i + |1, 2i) and |ψ1 i = (1/√2)(|1, 1i + |0, 2i) are orthogonal to each other. Bob throws a similar coin, and stores the result bB . The initial preparation of his quantum part is arbitrary.
2. Bob reads the mailbox (i.e. swaps it with the second part of his Hilbert space)
and sends bB to Alice.
3. Alice receives bB and puts her remaining quantum system into the mailbox.
4. Bob reads the mailbox and puts the system into the first slot of his quantum register.
Possible cheating strategies. — Now we will give possible cheating strategies for each
party which lead to the maximal probability of achieving the preferred outcome.
For simplicity we just look at the case where Alice prefers the outcome to be 0,
whereas Bob prefers it to be 1, cheating strategies for the other cases are easily
derivable. A cheating strategy for Bob is to try to distinguish in step 2 whether
Alice has prepared |ψ0 i or |ψ1 i. For this purpose he performs the measurement
(|0ih0|, |1ih1|, |2ih2|). If the result cB ≠ 2 (the probability for this in either case is 1/2) he can identify bA = cB and set bB = cB ⊕ 1 to achieve the overall result 1. If cB = 2 holds, he has not learned anything about bA . In that case he just continues with the protocol and hopes for the desired result, which appears with probability 1/2.10 So the total probability for Bob to achieve the result 1 is 1/2 + (1/2) · (1/2) = 3/4.
A cheating strategy for Alice is to set in the initial step bA = 0 and to prepare the system B(HA ) ⊗ B(HM ) in the state |ψ̃0 i = (1/√6)(|0, 0i + |0, 1i + 2|1, 2i). Then she continues until step 3. If bB = 0 she just continues with the protocol. Then the probability that in the last step Bob measures b′B = 0 equals tr(|ψ̃0 ihψ̃0 | · |ψ0 ihψ0 |) = |hψ0 |ψ̃0 i|² = 3/4. If bB = 1 she first applies a unitary operator, which swaps |0i and |1i, on her system before she sends it to Bob. The state on Bob's side is then |ψ̃1 ihψ̃1 | with |ψ̃1 i = (1/√6)(|1, 0i + |1, 1i + 2|0, 2i). The probability that Bob measures b′B = 1 equals tr(|ψ̃1 ihψ̃1 | · |ψ1 ihψ1 |) = |hψ1 |ψ̃1 i|² = 3/4. So the total probability for Alice to get the outcome 0 is (1/2) · (3/4) + (1/2) · (3/4) = 3/4.
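The two overlaps behind these numbers are easily verified (a sketch; the encoding |i, j> of Alice's bit together with the mailbox qutrit follows the protocol above):

```python
import numpy as np

def ket(i, j):
    """|i, j> in C^2 (x) C^3: Alice's bit (x) mailbox qutrit."""
    v = np.zeros(6)
    v[i * 3 + j] = 1.0
    return v

psi0 = (ket(0, 0) + ket(1, 2)) / np.sqrt(2)
psi0_t = (ket(0, 0) + ket(0, 1) + 2 * ket(1, 2)) / np.sqrt(6)

# Alice's cheat: |<psi0|psi0_tilde>|^2 = 3/4 (up to rounding)
print(abs(np.vdot(psi0, psi0_t)) ** 2)

# Bob's cheat: he learns b_A with probability 1/2 and guesses otherwise
print(0.5 * 1.0 + 0.5 * 0.5)   # 0.75
```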
12.3.5 Bounds on security
The previous example shows that quantum coin tossing admits, in contrast to the classical case, a nontrivial bias. However, how secure is quantum coin tossing really? Can we reach the optimal case (ε = 0)? The answer actually is "no". This was first proven by Mayers, Salvail and Chiba-Kohno [161]. Later on Ambainis restated the
10 After that measurement he is no longer able to figure out which outcome occurs on Alice’s
side, so he just sets his outcome to 1. A similar situation occurs in the cheating strategy for Alice,
but she is in neither case able to predict the outcome on Bob’s side with certainty.
arguments in a more explicit form [6]11 . It is still an open question whether there exist quantum coin tossing protocols with bias arbitrarily close to zero. Ambainis also shows that a coin tossing protocol with a bias of at most ε must use at least Ω(log log(1/ε)) rounds of communication. Although in that paper he gives only the proof for strong coin tossing, it holds in the weak case as well. It follows that a protocol cannot be made arbitrarily secure (i.e. have a sequence of protocols with ε → 0) by just increasing the amount of information exchanged in each step. The number of rounds has to go to infinity (although very slowly).
The strong coin tossing protocol given in Section 12.3.4 has a bias of $\epsilon = 0.25$. Another one with the same bias is given by Ambainis [6]. No strong protocol with provably smaller bias is known yet. The best known weak protocol is given by Spekkens and Rudolph [200] and has a bias of $\epsilon = \frac{1}{\sqrt{2}} - \frac{1}{2} = 0.207\ldots$. Although this is still far from arbitrary security, it shows another distinction between classical and quantum information, since in a classical world no protocol with bias smaller than 0.5 is possible.
Another interesting topic in quantum coin tossing is the question of cheat sensitivity, that is, how much each player can increase the probability of one outcome without risking being caught cheating. For more about this cf. e.g. [200] or [106].
11 The first attempt at a proof was given by Lo and Chau [154]. However, its validity is restricted to the case where 'cheating' always influences the probabilities of both valid outcomes. More precisely, they demand that the probabilities for the outcomes 0 and 1 are equal for any cheating strategy. This restriction is too strong: even if Alice and Bob sit together and throw a real coin, one of them can always say he (or she) does not accept the result (and, for example, refuses to pay his loss), and so put the probability for one outcome to zero, while the probability for the other one and the outcome 'invalid' are 1/2 each.
Chapter 13

Infinitely entangled states
Many of the concepts of entanglement theory were originally developed for quan-
tum systems described in finite dimensional Hilbert spaces. This restriction is often
justified, since we are usually only trying to coherently manipulate a small part of
the system. On the other hand, a full description of almost any system, beginning
with a single elementary particle, requires an infinite dimensional Hilbert space.
Hence if one wants to discuss decoherence mechanisms arising from the coupling of
the “qubit part” of the system with the remaining degrees of freedom, it is neces-
sary to widen the context of entanglement theory to infinite dimensions. This is not
difficult, since many of the basic notions, e.g. the definitions of entanglement mea-
sures, like the reduced von Neumann entropy or entanglement of formation, carry
over almost unchanged, merely with finite sums replaced by infinite series. More
serious are some technical problems arising from the fact that such entanglement
measures can now become infinite, and are no longer continuous functions of the
state. Luckily, as shown in recent work of Eisert et al. [83], these problems can
be tamed to a high degree, if one imposes some natural energy constraints on the
systems.
In this chapter we look at some not-so-tame states, which should be considered
as idealized descriptions of situations in which a great deal of entanglement is available.
For example, in the study of “entanglement assisted capacity” (Subsection 6.2.3)
one assumes that the communicating partners have an unlimited supply of shared
maximally entangled singlets. In quantum information problems involving canonical
variables it is easily seen that perfect operations can only be expected in the limit
of an “infinitely squeezed” two mode gaussian state as entanglement resource (see
also Section 13.5). But infinite entanglement is not only a desirable resource, it is
also a natural property of some physical systems, such as the vacuum in quantum
field theory (see [204, 205] and Section 13.4 below). Our aim is to show that one can
analyze these situations by writing down bona fide states on suitably constructed
systems.
This chapter is mainly based on [135]. Related publications are [83] where en-
tangled density matrices in infinite dimensional Hilbert spaces are studied and
[58, 59, 60] concerning EPR states (cf. Section 13.5).
non-separable ones in the topological sense) is not really interesting with regard to entanglement theory, since any density operator has separable support, i.e., it vanishes on all but countably many dimensions.
Here we denote by $X^{T_2}$ the partial transposition with respect to the second tensor factor of an operator $X$ on the finite dimensional space $\mathbb{C}^d \otimes \mathbb{C}^d$, and use that this operation is unitary with respect to the Hilbert-Schmidt scalar product $\langle X, Y\rangle_{HS} = \mathrm{tr}(X^*Y)$. By assumption, $E_d(\sigma)^{T_2} \geq 0$, and since partial transposition preserves the trace, $E_d(\sigma)^{T_2}$ is even a density operator. Hence the expectation value of $p_d^{T_2}$ in this state is bounded by the norm of this operator. But it is easily verified that $p_d^{T_2}$ is just $(1/d)$ times the unitary operator exchanging the two tensor factors. Hence its norm is $(1/d)$. Taking the limit of this estimate along a subnet of $A_d$ converging to $A_\infty$, we find $\mathrm{tr}(\sigma A_\infty) = 0$. □
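The key identity in this proof, that the partial transpose of the maximally entangled projector $p_d$ is $(1/d)$ times the flip operator, can be checked numerically; a minimal sketch:

```python
import numpy as np

d = 3
# Maximally entangled projector p_d = |Omega><Omega|, Omega = sum_k |kk>/sqrt(d)
omega = np.zeros(d * d)
for k in range(d):
    omega[k * d + k] = 1.0 / np.sqrt(d)
p = np.outer(omega, omega)

# Partial transpose on the second tensor factor: swap the two "column" indices
pT2 = p.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)

# Flip (swap) operator F|k, l> = |l, k>
F = np.zeros((d * d, d * d))
for k in range(d):
    for l in range(d):
        F[l * d + k, k * d + l] = 1.0

assert np.allclose(pT2, F / d)                       # p^T2 = (1/d) * flip
assert np.isclose(np.linalg.norm(pT2, 2), 1.0 / d)   # operator norm is 1/d
```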
The problem lies in this infinite product, which clearly need not converge for an arbitrary choice of vectors $\phi_j, \psi_j$. A well-known way out of this dilemma, known as von Neumann's incomplete tensor product [223], is to restrict the possible sequences of vectors $\phi_1, \phi_2, \ldots$ in the basic product vectors: for each tensor factor, one picks a reference unit vector $\chi_j$, and only sequences are allowed for which $\phi_j = \chi_j$ holds for all but a finite number of indices. Evidently, if this property holds for both the $\phi_j$ and the $\psi_j$, the product in (13.6) contains only a finite number of factors $\neq 1$, and converges. By taking norm limits of such vectors we see that also product vectors for which $\sum_{j=1}^{\infty} \|\phi_j - \chi_j\| < \infty$ are included in the infinite product Hilbert space. However, the choice of reference vectors $\chi_j$ necessarily breaks the full unitary symmetry of the factors, as far as asymptotic properties for $j \to \infty$ are concerned. For the case at hand, i.e., qubit systems, let us choose, for definiteness, the "spin up" vector as $\chi_j$ for every $j$, and denote the resulting space by $\mathcal{H}_\infty$.
An important observation about this construction is that all observables of finite tensor product subsystems act as operators on this infinite tensor product space. In fact, any operator $\bigotimes_{j=1}^{\infty} A_j$ makes sense on the incomplete tensor product, as long as $A_j = \mathbb{1}$ for all but finitely many indices. The algebra of such operators is known as the algebra of local observables. It has the structure of a *-algebra, and its closure in operator norm is called the quasi-local algebra [35].
Let us take the space H∞ as Alice’s and Bob’s Hilbert space. Then each of them
holds infinitely many qubits, and we can discuss the entanglement contained in a
density operator on H∞ ⊗ H∞ . Clearly, there is no general upper bound to this
entanglement, since we can take a maximally entangled state on the first M < ∞
factors, complemented by infinitely many spin-up product states on the remaining
qubit pairs. But for any fixed density operator the entanglement is limited: for
measurements on qubit pairs with sufficiently large j we always get nearly the same
expectations as for two uncorrelated spin-up qubits (or whatever the reference states
χj dictate). This is just another instance of Theorem 13.2.1: there is no density
operator describing infinitely many singlets.
13.3.2 Singular states
However, can we not take the limit of states with growing entanglement? To be
specific, let ΦM denote the vector which is a product of singlet states for the first
M qubit pairs, and a spin-up product for the remaining ones. These vectors do
not converge in H∞ ⊗ H∞ , but that need not concern us, if we are only interested
in expectation values: for all local observables $A$ (observables depending on only finitely many qubit pairs) the limit $\lim_{M\to\infty}\langle\Phi_M, A\,\Phi_M\rangle$
exists. Thereby we get an expectation value functional for all quasi-local observables, and by the Hahn-Banach Theorem (see e.g. [186, Theorem III.6]) we can extend this expectation value functional to all bounded operators on $\mathcal{H}_\infty \otimes \mathcal{H}_\infty$. The extended functional $\omega$ has all the properties required by the statistical interpretation of quantum mechanics: linearity in $A$, $\omega(A) \geq 0$ for positive $A$, and $\omega(\mathbb{1}) = 1$. In other words, it is a state on the algebra $\mathcal{B}(\mathcal{H})$, as we have introduced them in Subsection 2.1.1. By construction, $\omega$ describes maximal entanglement for any finite collection of qubit pairs, so it is truly a state of infinitely many singlets.
How does this match with Theorem 13.2.1? The crucial point is that the theorem only speaks of states given by the trace with a density operator, i.e., of functionals of the form $\omega_\rho(A) = \mathrm{tr}(\rho A)$. Such states are called "normal", and for a finite dimensional algebra every state is normal (cf. Subsection 2.1.2). But in the infinite dimensional case this equivalence between the two different descriptions of quantum states breaks down. In other words, there is no density operator for $\omega$: it is a singular state on the algebra of bounded operators.
Singular states are not that unusual in quantum mechanics, although they can only be "constructed" by an invocation of the Axiom of Choice, usually through the Hahn-Banach Theorem^2. For example, we can think of a non-relativistic particle localized at a sharp point, as witnessed by the expectations of all continuous functions of position. Extending from this algebra to all bounded operators, we get a singular state with sharp position^3, but "infinite momentum", i.e., the probability assigned to finding the momentum in any given finite interval is zero [235]. This shows that the probability measure on momentum space induced by such a state is only finitely additive, not $\sigma$-additive. This is typical for singular states.
More practical situations involving singular states arise in all systems with in-
finitely many degrees of freedom, as in quantum field theory and in statistical me-
chanics in the thermodynamic limit. For example, the equilibrium state of a free
Bose gas in infinite space at finite density and temperature is singular with respect
to Fock space because the probability for finding only a finite number of particles in
such a state is zero. In all these cases, one is primarily interested in the expectations
of certain meaningful observables (e.g., local observables), and the wilder aspects
of singular states are connected only to the extension of the state to all bounded
operators. Therefore it is a good strategy to focus on the state as an expectation
functional only on the “good” observables.
13.3.3 Local observable algebras
If we want to represent a situation with infinitely many singlets, an obvious approach
is to take again von Neumann’s incomplete tensor product, but this time the infinite
tensor product of pairs rather than single qubits, with the singlet vector chosen as
the reference vector χj for every pair. We denote this space by H∞∞ , and by Ω ∈
H∞∞ the infinite tensor product of singlet vectors. Clearly, this is a normal state
(with density operator |ΩihΩ|), and we seem to have gotten around Theorem 13.2.1
after all.
However, the problem is now to identify the Hilbert spaces of Alice and Bob as
tensor factors of H∞∞ . To be sure, the observables measurable by Alice and Bob,
respectively, are easily identified. For example, the σx -Pauli matrix for Alice’s 137th
2 Other constructions based on the Axiom of Choice are the application of invariant means, e.g.,
when averaging expectation values over all translations, or algebraic constructions using maximal
ideals. For an application in von Neumann style measurement theory of continuous spectra, see
[176].
3 This is not related to improper eigenkets of position, which do not yield normalized states.
Then the Bicommutant Theorem [208] states that $\mathcal{M}'' = (\mathcal{M}')'$ is the smallest von Neumann algebra containing $\mathcal{M}$. In particular, when $\mathcal{M}$ is already an algebra, $\mathcal{M}''$ is the weak closure of $\mathcal{M}$. Von Neumann algebras are characterized by the property $\mathcal{M}'' = \mathcal{M}$. A von Neumann algebra $\mathcal{M}$ with the property that its only elements commuting with all others are the multiples of the identity (i.e., $\mathcal{M}' \cap \mathcal{M}'' = \mathbb{C}\mathbb{1}$) is called a factor.
It might seem that the two ways out of the No-Go Theorem indicated at the end of the previous section are opposite to each other, but in fact they are closely related. For if $\omega$ is a state on a C*-algebra $\mathcal{C} \supset \mathcal{A} \cup \mathcal{B}$, we can associate with it a Hilbert space $\mathcal{H}_\omega$, a representation $\pi_\omega: \mathcal{C} \to \mathcal{B}(\mathcal{H}_\omega)$, and a unit vector $\Omega \in \mathcal{H}_\omega$, such that $\omega(C) = \langle\Omega, \pi_\omega(C)\Omega\rangle$, and such that the vectors $\pi_\omega(C)\Omega$ are dense in $\mathcal{H}_\omega$. This is called the Gelfand-Naimark-Segal (GNS) construction [35]. Clearly, the given state $\omega$ is given by a density operator (namely $|\Omega\rangle\langle\Omega|$) in this new representation, and the algebra can naturally be extended to the weak closure $\pi_\omega(\mathcal{C})''$. The commutativity of two subalgebras is preserved by the weak closure, so the normal state $|\Omega\rangle\langle\Omega|$ and the two commuting von Neumann subalgebras $\pi_\omega(\mathcal{A})''$ and $\pi_\omega(\mathcal{B})''$ again form a bipartite system, which describes essentially the same situation. The only difference is that some additional idealized observables arise from the weak closure operations, and that some observables in $\mathcal{C}$ (those with $C \geq 0$ but $\omega(C) = 0$) are represented by zero in $\pi_\omega$.
We remark that von Neumann’s incomplete infinite tensor product of Hilbert
spaces can be seen as aNspecial case of the GNS-construction: The infinite tensor
product of C*-algebras i Ai is well-defined (see [35, Sec 2.6] for precise conditions),
N
essentially by taking the norm completion of the algebra of local observables i Ai ,
with all but finitely many factors Ai ∈ Ai equal to 1Ii . On this algebra the infinite
tensor product of states is well-defined, Nand we get the incomplete tensor product
as the GNS-Hilbert space of the algebra i B(Hi ) with respect to the pure product
state defined by the reference vectors χi .
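In finite dimensions the GNS construction can be written out concretely: for $\omega(A) = \mathrm{tr}(\rho A)$ on the matrix algebra $M_d$ with $\rho$ faithful, the GNS space is $M_d$ itself with inner product $\langle A, B\rangle = \omega(A^*B)$, the representation acts by left multiplication, and the cyclic vector is $\mathbb{1}$. A minimal sketch (the state $\rho$ and the dimension are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2

# A faithful state omega(A) = tr(rho A) on the matrix algebra M_d
rho = np.diag([0.7, 0.3])

# GNS inner product on M_d, viewed as a d^2-dimensional Hilbert space
def gns_inner(A, B):
    return np.trace(rho @ A.conj().T @ B)

# The representation acts by left multiplication, pi(C)A = CA,
# and the cyclic vector is the identity, Omega = 1.
Omega = np.eye(d)

C = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

# omega(C) = <Omega, pi(C) Omega>, and the inner product is positive
assert np.isclose(gns_inner(Omega, C @ Omega), np.trace(rho @ C))
assert gns_inner(C, C).real >= 0
```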
13.4 Von Neumann algebras with maximal entanglement
13.4.1 Characterization and basic properties
Let us analyze the example given in the last section: the bipartite state obtained from the incomplete tensor product of singlets in $\mathcal{H}_{\infty\infty}$. We take as Alice's observable algebra $\mathcal{A}$ the von Neumann algebra generated by all local Alice operators (and analogously $\mathcal{B}$ for Bob). The bipartite state on these algebras, given by the reference vector $\bigotimes_i \chi_i$, then has the following properties:
These properties, except perhaps ME 2 (see [7]), are immediately clear from the construction and the properties of the respective local observables. They are also true for finite dimensional maximally entangled states on $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$, $\mathcal{A} = \mathcal{B}(\mathcal{H}_A) \otimes \mathbb{1}$, and $\mathcal{B} = \mathbb{1} \otimes \mathcal{B}(\mathcal{H}_B)$. This justifies calling this particular bipartite system maximally entangled as well.
There are many free parameters in this construction. For example, we could take
arbitrary dimensions di < ∞ for the ith pair. However, all these possibilities lead
to the same maximally entangled system:
Theorem 13.4.1 All bipartite states on infinite dimensional systems satisfying
conditions ME 1 - ME 5 above are unitarily isomorphic.
Proof. (Sketch). We first remark that $\mathcal{A}$ has to be a factor, i.e., $\mathcal{A} \cap \mathcal{A}' = \mathbb{C}\mathbb{1}$. Indeed, using ME 1 and ME 2, we get $\mathcal{A} \cap \mathcal{A}' = \mathcal{B}' \cap \mathcal{A}' = (\mathcal{B} \cup \mathcal{A})' = \mathcal{B}(\mathcal{H})' = \mathbb{C}\mathbb{1}$.
Now consider the support projection $S \in \mathcal{A}$ of the restriction of the state to $\mathcal{A}$. Thus $\mathbb{1} - S$ is the largest projection in $\mathcal{A}$ with vanishing expectation. Suppose that this projection does not lie in the center of $\mathcal{A}$, i.e., there is an $A \in \mathcal{A}$ such that $AS \neq SA$. Let $X = (\mathbb{1} - S)AS$, which must then be nonzero, as $AS - SA = ((\mathbb{1} - S) + S)(AS - SA) = X - SA(\mathbb{1} - S)$. Then, using the trace property, we get $\omega(X^*X) = \omega(XX^*) \leq \|A\|^2\,\omega(\mathbb{1} - S) = 0$, which implies that the support projection of $X^*X$ has vanishing expectation. But since $X^*X \leq \|A\|^2 S$, this contradicts the maximality of $(\mathbb{1} - S)$. It follows that $S$ lies in the center of $\mathcal{A}$, and that $S = \mathbb{1}$, because $\mathcal{A}$ is a factor. To summarize this argument: $\omega$ must be faithful, in the sense that $A \in \mathcal{A}$, $A \geq 0$, and $\omega(A) = 0$ imply $A = 0$.
Now consider the subspace spanned by all vectors of the form $A\Omega$, with $A \in \mathcal{A}$. This subspace is invariant under $\mathcal{A}$, so its orthogonal projection $P$ is in $\mathcal{A}' = \mathcal{B}$. But since $(\mathbb{1} - P)$ obviously has vanishing expectation, the previous arguments, applied to $\mathcal{B}$, imply that $P = \mathbb{1}$. This is to say that $\mathcal{A}\Omega$ is dense in $\mathcal{H}$ or, in the jargon of operator algebras, that $\Omega$ is cyclic for $\mathcal{A}$. Thus $\mathcal{H}$ is unitarily equivalent to the GNS Hilbert space of $\omega$ restricted to $\mathcal{A}$, and the form of $\mathcal{B} = \mathcal{A}'$ is completely determined by this statement. Now a factor admits at most one trace state, so $\omega$ is uniquely determined by the isomorphism type of $\mathcal{A}$ as a von Neumann algebra, and it remains to show that $\mathcal{A}$ is uniquely determined by the above conditions. $\mathcal{A}$ is a factor admitting a faithful normal trace state, so it is a "type II$_1$ factor" in von Neumann's classification. It is also hyperfinite, so we can invoke a deep result of Alain Connes [61] stating that such a factor is uniquely determined up to isomorphism. □
For the rest of this section we will study further properties of this unique maximally entangled state. The items ME 6 and ME 7 below are clear from the above proof. ME 8 follows by splitting the infinite tensor product either into a finite product and an infinite tail, or into the factors with even and odd labels, respectively. ME 9 - ME 11 are treated in separate subsections as indicated.
(iii) There is a set $T_k$ of test operators formed from $\mathcal{A}$ and $\mathcal{B}$ such that (13.12) holds for all density operators $\rho$.

$$\mathcal{H} = \mathcal{H}_{\infty\infty} \otimes \tilde{\mathcal{H}}, \qquad \mathcal{A} = \mathcal{A}_1 \otimes \tilde{\mathcal{A}}, \qquad \mathcal{B} = \mathcal{B}_1 \otimes \tilde{\mathcal{B}},$$

where $\mathcal{A}_1, \mathcal{B}_1 \subset \mathcal{B}(\mathcal{H}_{\infty\infty})$ are the algebras of Theorem 13.4.1, and $\tilde{\mathcal{A}}, \tilde{\mathcal{B}} \subset \mathcal{B}(\tilde{\mathcal{H}})$ are other von Neumann algebras.
In other words, the maximal violation of Bell's inequalities for all normal states implies that the bipartite system is precisely the maximally entangled one, plus some additional degrees of freedom $(\tilde{\mathcal{A}}, \tilde{\mathcal{B}})$, which do not contribute to the violation of Bell inequalities.
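As a finite dimensional reference point: the maximal violation in question is the Tsirelson bound $2\sqrt{2}$ for the CHSH expression, attained already by a single maximally entangled qubit pair. A numerical sketch with the standard CHSH measurement settings:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Maximally entangled vector (|00> + |11>)/sqrt(2)
phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# Standard optimal settings: Alice measures Z, X; Bob the rotated pair
A0, A1 = Z, X
B0, B1 = (Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)

def corr(A, B):
    """Correlation <phi| A (x) B |phi>."""
    return (phi.conj() @ np.kron(A, B) @ phi).real

S = corr(A0, B0) + corr(A0, B1) + corr(A1, B0) - corr(A1, B1)
assert np.isclose(S, 2 * np.sqrt(2))   # Tsirelson bound, maximal violation
```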
13.4.3 Schmidt decomposition and modular theory
The Schmidt decomposition (Proposition 2.2.1) is a key technique for analyzing bipartite pure states in the standard framework. It represents an arbitrary vector $\Omega \in \mathcal{H}_A \otimes \mathcal{H}_B$ as
$$\Omega = \sum_\alpha c_\alpha\, e_\alpha \otimes f_\alpha, \qquad (13.13)$$
where the $c_\alpha > 0$ are positive constants, and $\{e_\alpha\} \subset \mathcal{H}_A$ and $\{f_\alpha\} \subset \mathcal{H}_B$ are orthonormal systems.
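In the finite dimensional case the Schmidt data are exactly the singular value decomposition of the coefficient matrix of $\Omega$; a small numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
dA, dB = 3, 4

# A bipartite unit vector Omega in C^dA (x) C^dB, stored as its
# coefficient matrix C[i, j] = component along e_i (x) f_j
C = rng.standard_normal((dA, dB)) + 1j * rng.standard_normal((dA, dB))
C /= np.linalg.norm(C)

# The SVD C = U diag(c) Vh delivers the Schmidt form (13.13):
# Omega = sum_a c_a U[:, a] (x) Vh[a, :]
U, c, Vh = np.linalg.svd(C, full_matrices=False)

assert np.all(c >= 0) and np.isclose(np.sum(c**2), 1.0)
# Reconstruct the coefficient matrix from the Schmidt data
C2 = sum(c[a] * np.outer(U[:, a], Vh[a, :]) for a in range(len(c)))
assert np.allclose(C, C2)
```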
Its analog in the context of von Neumann algebras is a highly developed theory
with many applications in quantum field theory and statistical mechanics, known
as the modular theory of Tomita and Takesaki [207]. We recommend Chapter 2.5
in [35] for an excellent exposition, and only outline some ideas and indicate the
connection to the Schmidt decomposition.
Throughout this subsection, we will assume that A, B ⊂ B(H) are von Neumann
algebras, and Ω ∈ H is a unit vector, such that the properties ME 2, ME 3, and
ME 7 of Section 13.4.1 hold. As in the case of the usual Schmidt decomposition
the essential information is already contained in the restriction of the given state
to the subalgebra $\mathcal{A}$, i.e., by the linear functional $\omega(A) = \langle\Omega, A\Omega\rangle$. Indeed, the Hilbert space and the cyclic vector $\Omega$ (cf. ME 7) satisfy precisely the conditions for the GNS representation, which is unique up to unitary equivalence. Moreover, condition ME 2 fixes $\mathcal{B}$ as the commutant algebra.
However, since $\mathcal{A}$ often does not admit a trace, we cannot represent $\omega$ by a density operator, and therefore we cannot use the spectrum of the density operator to characterize $\omega$. Surprisingly, it is equilibrium statistical mechanics which provides the notion to generalize. In the finite dimensional context, we can consider every density operator as a canonical equilibrium state, and determine from it the Hamiltonian of the system. This in turn defines a time evolution. Note that the Hamiltonian is only defined up to a constant, so we cannot expect to reconstruct the eigenvalues of $H$, but only the spectrum of the Liouville operator $\sigma \mapsto i[\sigma, H]$, which generates the dynamics on density operators, and has eigenvalues $i(E_n - E_m)$, where the $E_n$ are the eigenvalues of $H$. The connection between the time evolutions
and equilibrium states makes sense also for von Neumann algebras, and can be seen
as the physical interpretation of modular theory [35].
We begin the outline of this theory with the anti-linear operator $S$ on $\mathcal{H}$ defined by
$$S(A\Omega) = A^*\Omega, \qquad A \in \mathcal{A}. \qquad (13.14)$$
It turns out to be closable, and we denote its closure by the same letter. As a closed operator $S$ admits a polar decomposition
$$S = J\Delta^{1/2}, \qquad (13.15)$$
which defines the anti-unitary modular conjugation $J$ and the positive modular operator $\Delta$.
Let us calculate $\Delta$ in the standard situation, where $\mathcal{H} = \mathcal{K} \otimes \mathcal{K}$, $\mathcal{A} = \mathcal{B}(\mathcal{K}) \otimes \mathbb{1}$ respectively $\mathcal{B} = \mathbb{1} \otimes \mathcal{B}(\mathcal{K})$, and $\Omega$ is in the Schmidt form (13.13). Due to assumption ME 7 (cyclicity), the orthonormal systems $e_\alpha$ and $f_\alpha$ even have to be complete (i.e., bases). Now consider (13.14) with $A = |e_\beta\rangle\langle e_\gamma| \otimes \mathbb{1}$, which becomes
$$S(c_\gamma\, e_\beta \otimes f_\gamma) = c_\beta\, e_\gamma \otimes f_\beta, \qquad (13.16)$$
from which we readily get
$$\Delta^{1/2} = \rho^{1/2} \otimes \rho^{-1/2}, \qquad J = F(\Theta \otimes \Theta), \qquad (13.17)$$
where $\rho = \sum_\alpha c_\alpha^2 |e_\alpha\rangle\langle e_\alpha|$ is the reduced density operator, $F\,\phi_1 \otimes \phi_2 = \phi_2 \otimes \phi_1$ is the flip operator, and $\Theta$ denotes complex conjugation in the $e_\alpha$ basis. The time evolution with Hamiltonian $H = -\log\rho + c\mathbb{1}$, for which $\omega$ is now the equilibrium state with unit temperature, is then given by $E_t(A) \otimes \mathbb{1} = \Delta^{it}(A \otimes \mathbb{1})\Delta^{-it}$.
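In this standard situation the relations (13.14)-(13.17) can be verified numerically for a pair of qubits; a sketch (the Schmidt coefficients are an arbitrary choice, and the Schmidt bases are taken to be the standard basis):

```python
import numpy as np

d = 2
c = np.sqrt(np.array([0.7, 0.3]))         # Schmidt coefficients

# Omega = sum_a c_a |a> (x) |a>, in Schmidt form
Omega = np.zeros(d * d)
for a in range(d):
    Omega[a * d + a] = c[a]

# Modular data in the standard situation (13.17):
# Delta^{1/2} = rho^{1/2} (x) rho^{-1/2},  J = flip after entrywise conjugation
sqrt_Delta = np.kron(np.diag(c), np.diag(1.0 / c))
FLIP = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        FLIP[j * d + i, i * d + j] = 1.0

def S(v):
    """S = J Delta^{1/2}, acting anti-linearly on a vector v."""
    return FLIP @ np.conj(sqrt_Delta @ v)

rng = np.random.default_rng(2)
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

# Defining relation (13.14): S (A (x) 1) Omega = (A* (x) 1) Omega
lhs = S(np.kron(A, np.eye(d)) @ Omega)
rhs = np.kron(A.conj().T, np.eye(d)) @ Omega
assert np.allclose(lhs, rhs)
```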
In the case of general von Neumann algebras, the spectrum of $\Delta$ need no longer be discrete, and $\Delta$ can be a general positive, but unbounded, selfadjoint operator. It turns out that $\Delta^{it}$ still defines a time evolution on the algebra $\mathcal{A}$, the so-called modular evolution. The equilibrium condition cannot be written directly in the Gibbs form $\rho \propto \exp(-H)$, since there is no density matrix any more, but has to be replaced by the so-called KMS condition, a boundary condition for the analytic continuation of correlation functions [35, 104], which links the modular evolution to the state.
In the standard situation, the eigenvalue 1 of $\Delta$ plays a special role, because it points to degeneracies in the Schmidt spectrum. In the extreme case of a maximally entangled state all $c_\alpha$ are equal, and $\Delta = \mathbb{1}$ or, equivalently, $S$ is anti-unitary. This characterization of maximal entanglement carries over to the von Neumann algebra case: $S$ is anti-unitary if and only if for all $A_1, A_2 \in \mathcal{A}$
$$\langle\Omega, A_1 A_2\Omega\rangle = \langle A_1^*\Omega, A_2\Omega\rangle = \langle SA_1\Omega, SA_2^*\Omega\rangle = \langle A_2^*\Omega, A_1\Omega\rangle = \langle\Omega, A_2 A_1\Omega\rangle.$$
This is precisely the trace property ME 4.
13.4.4 Characterization by the EPR-doubles property
In the original EPR-argument it is crucial that certain observables of Alice and Bob
are perfectly correlated, so that Alice can find the values of observables on Bob’s side
with certainty, without Bob having to carry out this measurement. An approach to
studying such correlations was proposed recently by Arens and Varadarajan [8]. The
basic idea, stripped of some measure theoretic overhead, and extended to the more
general bipartite systems considered here [236], rests on the following definition.
Let A, B be commuting observable algebras and ω a state on an algebra containing
both A and B. Then we say that an element B ∈ B is an EPR-double of A ∈ A, or
that A and B are doubles (of each other) if
$$\omega\bigl((A^* - B^*)(A - B)\bigr) = \omega\bigl((A - B)(A^* - B^*)\bigr) = 0. \qquad (13.18)$$
Of course, when $A$ and $B$ are hermitian, the two expressions coincide, and in this case there is a simple interpretation of equation (13.18). Since $A$ and $B$ commute, we can consider their joint distribution (measuring the joint spectral resolution of $A$ and $B$). Then $(A - B)^2$ is a positive quantity, which has vanishing expectation if and only if the joint distribution is concentrated on the diagonal, i.e., if the measured values coincide with probability one.
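For a pure bipartite state in Schmidt form this can be made concrete: an operator $A$ commuting with the reduced density operator has, as its EPR double on the other side, its transpose in the Schmidt basis. A numerical sketch (the Schmidt spectrum and the operator $A$ are arbitrary choices; $A$ is built to commute with $\rho$):

```python
import numpy as np

rng = np.random.default_rng(3)
c = np.sqrt(np.array([0.4, 0.4, 0.2]))    # degenerate Schmidt spectrum
d = len(c)

# Omega = sum_a c_a |a> (x) |a>, reduced density operator rho = diag(c^2)
Omega = np.zeros(d * d)
for a in range(d):
    Omega[a * d + a] = c[a]

def omega(X):
    """The state omega(X) = <Omega, X Omega>."""
    return Omega.conj() @ X @ Omega

# A commutes with rho: arbitrary on the degenerate block, scalar on the rest
blk = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
A = np.zeros((d, d), dtype=complex)
A[:2, :2], A[2, 2] = blk, 1.7

AA = np.kron(A, np.eye(d))                # Alice's operator A (x) 1
BB = np.kron(np.eye(d), A.T)              # candidate double 1 (x) A^T
D = AA - BB

# Condition (13.18): both expectations vanish
assert np.isclose(omega(D.conj().T @ D), 0)
assert np.isclose(omega(D @ D.conj().T), 0)
```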
Basic properties are summarized in the following Lemma.
Lemma 13.4.3 Let ω be a state on a C*-algebra containing commuting subalgebras
A and B. Then
(i) A and B are doubles iff for all C in the ambient observable algebra we have
ω(AC) = ω(BC) and ω(CA) = ω(CB).
(iii) When $A$ and $B$ are normal ($AA^* = A^*A$) and doubles of each other, then so are $f(A)$ and $f(B)$, where $f$ is any continuous complex valued function on the spectrum of $A$ and $B$, evaluated in the functional calculus.

(iv) When $\mathcal{A}$ and $\mathcal{B}$ are von Neumann algebras, $\omega$ is a normal state, and observables $A_n$ with doubles $B_n$ converge in weak*-topology to $A$, then every cluster point of the sequence $B_n$ is a double of $A$.
In the situation we have assumed for modular theory, we can give a detailed
characterization of the elements admitting a double:
Proposition 13.4.4 Suppose A and B = A0 are von Neumann algebras on a
Hilbert space H, and the state ω is given by a vector Ω ∈ H, which is cyclic for
both A and B. Then for every A ∈ A the following conditions are equivalent:
(ii) $A$ is in the centralizer of the restricted state, i.e., $\omega(AA_1) = \omega(A_1 A)$ for all $A_1 \in \mathcal{A}$.

(iii) $A$ is invariant under the modular evolution: $\Delta^{it}A\Delta^{-it} = A$ for all $t \in \mathbb{R}$.
Two special cases are of interest. On the one hand, in the standard case of a pure bipartite state we get a complete characterization of the observables which possess a double: they are exactly the ones commuting with the reduced density operator [8]. On the other hand, we can ask under what circumstances all $A \in \mathcal{A}$ admit a double. Clearly, this is the case when the centralizer in (ii) of the Proposition is all of $\mathcal{A}$, i.e., if and only if the restricted state is a trace. Again this characterizes the maximally entangled states on finite dimensional algebras, and the unique infinite dimensional one for hyperfinite von Neumann algebras.
13.5 The original EPR state
In their famous 1935 paper [82] Einstein, Podolsky and Rosen studied two quantum
particles with perfectly correlated momenta and perfectly anticorrelated positions.
It is immediately clear that such a state does not exist in the standard framework
of Hilbert space theory: the difference of the positions is a self-adjoint operator with
purely absolutely continuous spectrum, so whatever density matrix we choose, the
probability distribution of this quantity will have a probability density with respect
to Lebesgue measure, and cannot be concentrated on a single point. Consequently,
the wave function written in [82] is a pretty wild object. Essentially it is $\Psi(x_1, x_2) = c\,\delta(x_1 - x_2 + a)$, with $\delta$ the Dirac delta function, and $c$ a "normalization factor" which must vanish, because the normalization integral for the delta function is undefined, but infinite if anything.
How could such a profound physical argument be based on such an ill-defined
object? The answer is probably that the authors were completely aware that they
were really talking about a limiting situation of more and more sharply peaked
wave functions. We could model them by a sequence of more and more highly
squeezed two mode Gaussian states (cf. Subsection 13.5.5), or some other sequence
representation of the delta function. The key point is that the main argument does
not depend on the particular approximating sequence. But then we should also be
able to discuss the limiting situation directly in a rigorous way, and extract precisely
what is common to all approximations of the EPR state.
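Such an approximating sequence is easy to write down for the relative coordinate alone; a numpy sketch (grid and widths are arbitrary choices) showing the distribution of $x_1 - x_2$ concentrating on the point singled out by the delta function as the width shrinks:

```python
import numpy as np

a = 1.0                          # the fixed distance between the particles
x = np.linspace(-10, 10, 4001)   # grid for the relative coordinate x1 - x2
dx = x[1] - x[0]

for sigma in [1.0, 0.3, 0.1]:
    # Gaussian approximation of delta(x1 - x2 + a) in the relative coordinate
    psi = np.exp(-(x + a) ** 2 / (4 * sigma**2))
    p = np.abs(psi) ** 2
    p /= p.sum() * dx            # normalized probability density
    mean = (p * x).sum() * dx
    var = (p * (x - mean) ** 2).sum() * dx
    print(f"sigma={sigma:4.1f}  <x1-x2>={mean:+.3f}  Var={var:.4f}")
# As sigma -> 0 the position difference sharpens; the conjugate variable
# p1 + p2 spreads correspondingly (not computed here).
```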
13.5.1 Definition
In this section we consider a family of singular states which describes quite well what Einstein, Podolsky and Rosen may have had in mind. Throughout we assume we are in the usual Hilbert space $\mathcal{H} = L^2(\mathbb{R}^2)$ for describing two canonical degrees of freedom, with position and momentum operators $Q_1, Q_2, P_1, P_2$. The basic observation is that the operators $P_1 + P_2$ and $Q_1 - Q_2$ commute as a consequence of the Heisenberg commutation relations. Therefore we can evaluate in the functional calculus (i.e., using a joint spectral resolution) any function of the form $g(P_1 + P_2, Q_1 - Q_2)$, where $g: \mathbb{R}^2 \to \mathbb{C}$ is an arbitrary bounded continuous function. We define an EPR state as any state $\omega$ such that
$$\omega\bigl(g(P_1 + P_2, Q_1 - Q_2)\bigr) = g(0, a), \qquad (13.19)$$
where a is the fixed distance between the particles. Several comments are in order.
First of all, if we take any sequence of vectors to “approximate” the EPR wave func-
tion (and adjust normalization on the way), weak*-cluster points of the correspond-
ing sequence of pure states exist by compactness of the state space, and all these
will be EPR states in the sense of our definition. Secondly, condition (13.19) does
not fix ω uniquely. Indeed, different approximating sequences may lead to different
ω. Even for a fixed approximating sequence it is rarely the case that the expectation
values of all bounded operators converge, so the sequence will have many different
cluster points. Thirdly, the existence of EPR states can also be seen more directly:
the algebra of bounded continuous functions on R2 is faithfully represented in B(H)
(i.e., g(P1 + P2 , Q1 − Q2 ) = 0 only when g is the zero function). On that algebra
the point evaluation at $(0, a)$ is a well defined state, so any Hahn-Banach extension of this state to all of $\mathcal{B}(\mathcal{H})$ will be an EPR state^4.
In our further analysis we will only look at properties which are common to all EPR states, and which are hence independent of any choice of approximating sequence. The basic technique for extracting such properties from (13.19) is to use the positivity of $\omega$ in the form of the Schwarz inequality $|\omega(A^*B)|^2 \leq \omega(A^*A)\,\omega(B^*B)$. For example, we get
$$\omega(X\hat g) = \omega(\hat g X) = g(0, a)\,\omega(X), \qquad (13.20)$$
for $(\vec\xi, \vec\eta) \in S$. \qquad (13.22)
In particular, the state is invariant under all phase space translations by vectors in
S.
This is already sufficient to conclude that the state is purely singular, i.e., that $\omega(K) = 0$ for every compact operator $K$, and in particular for all finite dimensional projections. An even stronger statement is that the restrictions to Alice's and Bob's subsystems are purely singular.
Lemma 13.5.1 For any EPR state $\omega$ and any compact operator $K$, $\omega(K \otimes \mathbb{1}) = 0$.
Proof. Indeed, the restricted state is invariant under all phase space translations, since we can extend $W(\xi, \eta)$ to a Weyl operator of the total system, i.e., $W'(\xi, \eta) = W(\xi, \xi, \eta, -\eta) \cong W(\xi, \eta) \otimes W(\xi, -\eta)$, with $(\xi, \xi, \eta, -\eta) \in S$, and
$$\omega\bigl((W(\xi, \eta)\,A\,W(\xi, \eta)^*) \otimes \mathbb{1}\bigr) = \omega\bigl(W'(\xi, \eta)(A \otimes \mathbb{1})W'(\xi, \eta)^*\bigr). \qquad (13.23)$$
4 The reason for defining EPR states with respect to continuous functions of $P_1 + P_2$ and $Q_1 - Q_2$, rather than, say, measurable functions, is that we need faithfulness. The functional calculus is well defined also for measurable functions, but some functions will evaluate to zero. In particular, for the function $g(p, x) = 1$ for $x = a$ and $p = 0$, but $g(p, x) = 0$ for all other points, we get $g(P_1 + P_2, Q_1 - Q_2) = 0$, because the joint spectrum of these operators is purely absolutely continuous. Hence condition (13.19), extended to measurable functions, would require the expectation of the zero operator to be 1.
Now consider a unit vector $\chi$ with bounded support in position space, and let $K = |\chi\rangle\langle\chi|$ be the corresponding one-dimensional projection. Then sufficiently widely spaced translates $W(n\xi_0, 0)\chi$ are orthogonal, and hence, for all $N$, the operator $K_N = \sum_{n=1}^{N} W(n\xi_0, 0)\,K\,W(n\xi_0, 0)^*$ is bounded by $\mathbb{1}$. Hence $N\omega(K) = \omega(K_N) \leq \omega(\mathbb{1}) = 1$, and $\omega(K) = 0$. Since vectors of compact support are norm dense in Hilbert space, the conclusion holds for arbitrary $K$. □
For other Weyl operators we get the expectations from the Weyl commutation
relations
trivial. Combining the Weyl relations (13.24) with the invariance (13.22) gives
$$\omega(W(\vec\xi, \vec\eta)) = \omega(W(\vec\xi\,', \vec\eta\,')\,W(\vec\xi, \vec\eta)) = e^{i\sigma}\,\omega(W(\vec\xi, \vec\eta)\,W(\vec\xi\,', \vec\eta\,')) = e^{i\sigma}\,\omega(W(\vec\xi, \vec\eta)),$$
which implies that the expectation values vanish outside $S$:
$$\omega\bigl(W(\vec\xi, \vec\eta)\bigr) = 0 \qquad \text{for } (\vec\xi, \vec\eta) \notin S. \qquad (13.25)$$
Fix ε > 0. Since f is uniformly continuous, there is some δ > 0 such that |f(x) − f(y)| ≤ ε whenever |x − y| ≤ δ. Now pick a continuous function h : R → [0, 1] such that h(0) = 1 and h(t) = 0 for |t| > δ. We consider the operator

M = F(ξP1 + ηQ1, −ξP2 + ηQ2) ,

where F(x, y) = (f(x) − f(y)) h(x − y), and this function is evaluated in the functional calculus of the commuting selfadjoint operators (ξP1 + ηQ1) and (−ξP2 + ηQ2). But
the real valued function F satisfies |F (x, y)| ≤ ε for all (x, y): when |x − y| > δ the
h-factor vanishes, and on the strip |x − y| ≤ δ we have |f (x) − f (y)| ≤ ε. Therefore
‖M‖ ≤ ε. Let X be an arbitrary operator. Then

|ω( [f(ξP1 + ηQ1) − f(−ξP2 + ηQ2)] X )| = |ω(M X)| ≤ ‖M‖ ‖X‖ ≤ ε ‖X‖ .
Here we have added a factor h(ξ(P1 + P2) + η(Q1 − Q2)) at the second equality sign, which we may because of (13.20), and because h is a function of the appropriate operators which equals 1 at the origin. Since this estimate holds for any ε, we conclude that the first relation in Lemma 13.4.3 holds. The argument for the second relation is completely analogous. □
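The estimate ‖M‖ ≤ ε rests only on the sup-norm bound for F. As a numerical sanity check, one can tabulate F on a grid for an illustrative choice of f and h — here f = sin (Lipschitz constant 1, so δ = ε works) and a tent function for h:

```python
import numpy as np

eps = 0.1
delta = eps                      # for f = sin: |f(x) - f(y)| <= |x - y|, so delta = eps works
f = np.sin

def h(t):
    # continuous "tent" function: h(0) = 1, h(t) = 0 for |t| > delta
    return np.clip(1 - np.abs(t) / delta, 0.0, 1.0)

x = np.linspace(-20, 20, 801)
X, Y = np.meshgrid(x, x)
F = (f(X) - f(Y)) * h(X - Y)     # F(x, y) = (f(x) - f(y)) h(x - y)

# True: the h-factor vanishes for |x - y| > delta, and on the strip
# |x - y| <= delta we have |sin(x) - sin(y)| <= eps
print(np.abs(F).max() <= eps)
```

Since M is F evaluated in the joint functional calculus of two commuting selfadjoint operators, the operator norm of M is bounded by this sup-norm.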
where ζ = exp(2πi/d) is the d-th root of unity. These form a basis of the vector space B(C^d ⊗ C^d), which shows that this algebra is generated by the four unitaries u1 = w(1, 0, 0, 0), v1 = w(0, 1, 0, 0), u2 = w(0, 0, 1, 0) and v2 = w(0, 0, 0, 1). They are defined algebraically by the relations vk uk = ζ uk vk, k = 1, 2, and u1^d = u2^d = v1^d = v2^d = 1I. The one-dimensional projection onto the standard maximally entangled vector Ω = d^{−1/2} Σ_k |kk⟩ can be expressed in the basis (13.26) as

|Ω⟩⟨Ω| = (1/d²) Σ_{n,m} w(n, m, −n, m) = (1/d²) Σ_{n,m} (u1 u2^{−1})^n (v1 v2)^m .   (13.27)
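Identity (13.27) can be checked numerically in small dimension. In the shift-and-clock convention u|k⟩ = |k+1 mod d⟩, v|k⟩ = ζ^k |k⟩ (so that vu = ζuv), the commuting unitaries u1u2^{−1} and v1v2 jointly fix the vector Σ_k |k, −k⟩, which differs from Ω only by a local relabeling of Bob's basis; averaging over the resulting group of d² unitaries yields the projection onto that vector. The phase conventions below are illustrative and may differ from (13.26) by such a relabeling:

```python
import numpy as np

d = 3
zeta = np.exp(2j * np.pi / d)
u = np.roll(np.eye(d), 1, axis=0)            # shift: u|k> = |k+1 mod d>
v = np.diag(zeta ** np.arange(d))            # clock: v|k> = zeta^k |k>
assert np.allclose(v @ u, zeta * u @ v)      # v u = zeta u v

mp = np.linalg.matrix_power
ui = u.conj().T                              # u^{-1}

# Average of (u1 u2^{-1})^n (v1 v2)^m over n, m = 0, ..., d-1
P = sum(np.kron(mp(u, n) @ mp(v, m), mp(ui, n) @ mp(v, m))
        for n in range(d) for m in range(d)) / d**2

# Projection onto the maximally entangled vector d^{-1/2} sum_k |k, -k mod d>
Omega = np.zeros(d * d, dtype=complex)
for k in range(d):
    Omega[k * d + (-k) % d] = 1 / np.sqrt(d)
print(np.allclose(P, np.outer(Omega, Omega.conj())))  # True
```

The average of a finite abelian group of unitaries is the projection onto their common fixed vectors, which is exactly the one-dimensional span of the maximally entangled vector.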
above relations and hence generate two copies of the d × d matrices. It is easy to satisfy the commutation relations Vk Uk = ζ Uk Vk by taking appropriate Weyl operators, say

Ũ1 = e^{iQ1},  Ũ2 = e^{i(Q2 − a)},  and  Ṽk = e^{iξPk}   (13.28)

with ξ = 2π/d. The tilde indicates that these are not yet quite the operators we are looking for, because they do not satisfy the periodicity relations: Ũ1^d = exp(idQ1) ≠ 1I, and similarly for Ũ2 and Ṽk. We will denote by Ã the C*-algebra generated by the operators Ũ1, Ṽ1 of (13.28). The algebra B̃ is constructed analogously. Then by virtue of the commutation relations Ũ1^d and Ṽ1^d commute with all other elements of Ã, i.e., they belong to the center CA ⊂ Ã, which represents the classical variables of the system. In the same manner, Ũ2^d and Ṽ2^d generate the center CB of Bob's algebra B̃.7
Proof of this equation. We have shown in Section 13.5.3 that Ũ1 and Ũ2 are EPR doubles. This property transfers to arbitrary continuous functions of Ũ1 and Ũ2 by Lemma 13.4.3 and uniform approximation of continuous functions by polynomials. However, because the state ω is not normal, it does not transfer automatically to the measurable functional calculus, and hence not automatically to Û1 and Û2. We claim that it is true nonetheless.
Denote by rd(z) = z^{1/d} the d-th root function with the branch cut as described, and let fε be a continuous function from the unit circle to the unit interval [0, 1] such that fε(z) = 1 except for z in an ε-neighborhood of z = −1 in arclength, and such that fε(−1) = 0. Then the function z ↦ fε(z) rd(z) is continuous. Hence, since Ũ1^d and Ũ2^d are doubles, so are fε(Ũ1^d) and fε(Ũ1^d)Û1 = (fε · rd)(Ũ1^d), and their counterparts. Note that both of these commute with all other operators involved. Hence (using the notation |X|² = X∗X or |X|² = XX∗, which coincide in this case)

ω( fε(Ũ1^d)² |Û1 − Û2|² ) = ω( |fε(Ũ1^d)Û1 − fε(Ũ2^d)Û2|² ) = 0 ,   (13.32)

where the first equality holds by expanding the modulus square and applying the double property of fε(Ũ1^d) where appropriate. On the other hand, we have

ω( (1I − fε(Ũ1^d)²) |Û1 − Û2|² ) ≤ 4 ω( 1I − fε(Ũ1^d)² ) ≤ (4/π) ε ,   (13.33)

because ‖Û1 − Û2‖ ≤ 2, and 0 ≤ fε(Ũ1^d) ≤ 1I. For the estimate we used that fε(z)² = 1 for all z on the unit circle except a section of relative size 2ε/(2π), and that the probability distribution for the spectrum of Ũ1^d is uniform, because the expectations of all powers (Ũ1^d)^n = exp(indQ1) vanish.

Adding (13.32) and (13.33) we find that ω(|Û1 − Û2|²) ≤ 4ε/π for every ε, and hence that Û1 and Û2 are EPR doubles as claimed. The proof that V̂1 and V̂2∗ are likewise doubles (just as Ṽ1 and Ṽ2∗) is entirely analogous. Hence U1 and U2, as well as V1 and V2, are also doubles. Applying this property in the fidelity expression (13.31) we find that every term has expectation one, so that with the prefactor d^{−2} the d² terms add up to one as claimed. □
13.5.5 EPR states based on two mode Gaussians
In this section we deviate from our announced intention to study only those properties of EPR states which follow from the definition alone, and are hence common to all EPR states. The reason is that there is one particular family which has a lot of additional symmetry, and hence more operators admitting doubles, than general EPR states. Moreover, it is very well known. In fact, most people working in quantum optics probably have a very concrete picture of the EPR state, or rather of an approximation to this state: since Gaussian states play a prominent role in the description of lasers, it is natural to consider a Gaussian wave function of the form
Ψλ(x1, x2) = (1/√π) exp( − (1+λ)/(4(1−λ)) (x1 − x2)² − (1−λ)/(4(1+λ)) (x1 + x2)² ) .   (13.34)

By Mehler's formula this has the eigenbasis expansion

Ψλ = √(1 − λ²) Σ_{n=0}^∞ λⁿ eₙ ⊗ eₙ ,   (13.35)
where en denotes the eigenbasis of the harmonic oscillators Hi = (Pi2 + Q2i )/2
(i = 1, 2). This state is also known as the NOPA state, and the parameter λ ∈ [0, 1)
is related to the so-called squeezing parameter r by λ = tanh(r). Values around
r = 5 are considered a good experimental achievement [144]. Of course, we are
interested in the limit r → ∞, or λ → 1.
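Since the Schmidt coefficients of (13.35) are √(1−λ²) λⁿ, quantities of the reduced states are easy to tabulate numerically. A small sketch (the truncation point is an illustrative choice) relating the squeezing parameter r to λ = tanh(r) and to the entanglement entropy, which diverges as r → ∞:

```python
import numpy as np

def schmidt_probs(lam, nmax=200000):
    """Schmidt probabilities (1 - lam^2) * lam^(2n) of the state (13.35),
    truncated at an illustrative cutoff nmax."""
    n = np.arange(nmax)
    return (1 - lam**2) * lam**(2 * n)

for r in [0.5, 1.0, 2.0, 5.0]:
    lam = np.tanh(r)                      # lambda = tanh(r)
    p = schmidt_probs(lam)
    p = p[p > 0]                          # drop terms that underflowed to zero
    S = -np.sum(p * np.log(p))            # entanglement entropy (in nats)
    print(f"r={r}: lambda={lam:.5f}, norm={p.sum():.6f}, S={S:.3f}")
```

The geometric distribution of the Schmidt probabilities is the reason the restricted states look thermal, with temperature increasing with r.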
The λ-dependence of the wave function can also be written as
where the hyperbolic angle η is r/2. It is easy to see that for any wave function Ψ0 the probability distributions of both Q1 − Q2 and P1 + P2 scale to point measures at zero. Hence any cluster point of the associated sequence of states ωλ(X) = ⟨Ψλ, XΨλ⟩ is an EPR state in the sense of our definition (with shift parameter a = 0). Note, however, that the family itself does not converge to any state: it is easy to construct observables X for which the expectation ωλ(X) keeps oscillating between 0 and 1 as λ → 1. Here, as in the general case, a single state can only be obtained by passing to a suitable subsequence (or by taking the limit along an ultrafilter).
The virtue of the particular family (13.35) is that it has especially high symmetry: it is immediately clear that

( f(H1) − f(H2) ) Ψλ = 0   (13.37)
for all λ, and for all bounded functions f : N → C of the oscillator Hamiltonians
H1 , H2 . This implies that f (H1 ) and f (H2 ) are doubles with respect to the state ωλ
for each λ. Clearly, this property remains valid in the limit along any subsequence,
so all EPR-states obtained as cluster points of the sequence ωλ also have f (H1 )
in their algebra of doubles. Consequently, the unitaries Uk (t) = exp(itHk ) are also
doubles of each other, and the limiting states are invariant under the time evolution
U12 (t) = U1 (t) ⊗ U2 (−t). This is certainly suggestive, because oscillator time evo-
lutions have an interpretation as linear symplectic transformations on phase space:
Qk 7→ Qk cos t ± Pk sin t and Pk 7→ ∓Qk sin t + Pk cos t, where the upper sign holds
for k = 1 and the lower for k = 2. The subspace S from Section 13.5.3 is invariant
under such rotations, and one readily verifies that the time evolution U12 (t) takes
EPR states into EPR states. This certainly implies that by averaging we can gener-
ate EPR states invariant under this evolution, and we have clearly just constructed
a family with this invariance.
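That these rotations leave the relevant combinations invariant is elementary linear algebra: in the basis (Q1, P1, Q2, P2) the evolution acts with opposite senses of rotation on the two modes, and the span of Q1 − Q2 and P1 + P2 is preserved. A small sketch (the time t is an arbitrary illustrative value):

```python
import numpy as np

def heisenberg_map(t):
    """Matrix of the evolution on coefficient vectors over (Q1, P1, Q2, P2):
    Q1 -> Q1 cos t + P1 sin t, P1 -> -Q1 sin t + P1 cos t, and the
    opposite sense of rotation for mode 2."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0, 0.0],
                     [s,  c, 0.0, 0.0],
                     [0.0, 0.0, c,  s],
                     [0.0, 0.0, -s, c]])

# coefficient vectors of Q1 - Q2 and P1 + P2
q_minus = np.array([1.0, 0.0, -1.0, 0.0])
p_plus  = np.array([0.0, 1.0, 0.0, 1.0])
span = np.column_stack([q_minus, p_plus])

ok = []
for x in (q_minus, p_plus):
    y = heisenberg_map(0.7) @ x
    # residual after projecting y back onto span{Q1 - Q2, P1 + P2}
    resid = y - span @ np.linalg.lstsq(span, y, rcond=None)[0]
    ok.append(np.allclose(resid, 0.0, atol=1e-12))
print(all(ok))  # True: the span is invariant under the evolution
```

Indeed, Q1 − Q2 ↦ (Q1 − Q2) cos t + (P1 + P2) sin t and P1 + P2 ↦ −(Q1 − Q2) sin t + (P1 + P2) cos t, so the pair simply rotates within its own two-dimensional span.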
As λ → 1, the Schmidt spectrum in (13.35) becomes "flatter", which suggests that exchanging some labels n should also define a unitary with double. Let p : N → N denote an injective (i.e., one-to-one, but not necessarily onto) map. Then we define an isometry Vp by

Vp eₙ = e_{p(n)}   (13.38)

with adjoint

Vp∗ eₙ = e_{p⁻¹(n)} if n ∈ p(N), and Vp∗ eₙ = 0 if n ∉ p(N).   (13.39)
Let us assume that p has finite distance, i.e., there is a constant ℓ such that |p(n) − n| ≤ ℓ for all n ∈ N. We claim that in this case Vp ⊗ 1I and 1I ⊗ Vp∗ are doubles in all EPR states constructed from the sequence (13.35). We show this by verifying that the condition holds approximately already for finite λ. Consider the vector
Δλ = ( Vp ⊗ 1I − 1I ⊗ Vp∗ ) Ψλ = √(1 − λ²) Σ_{n=0}^∞ (λⁿ − λ^{p(n)}) e_{p(n)} ⊗ eₙ ,   (13.40)

where in the second summand we changed the summation index from n to p(n), automatically omitting all terms annihilated by Vp∗ according to (13.39). Since this is a sum of orthogonal vectors, we can readily estimate the norm by writing (λⁿ − λ^{p(n)}) = λⁿ(1 − λ^{p(n)−n}):

‖Δλ‖² = (1 − λ²) Σ_{n=0}^∞ λ^{2n} (1 − λ^{p(n)−n})² ≤ (λ^{−ℓ} − 1)² ,

which tends to zero as λ → 1.
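For a concrete finite-distance map — say the pairwise swap p(2k) = 2k+1, p(2k+1) = 2k, with ℓ = 1 — the norm of Δλ can be evaluated directly and is seen to vanish as λ → 1 (a numerical sketch; the truncation point is an illustrative choice):

```python
import numpy as np

def delta_norm_sq(lam, nmax=200000):
    """||Delta_lambda||^2 = (1 - lam^2) * sum_n (lam^n - lam^p(n))^2 for the
    pairwise swap p(2k) = 2k+1, p(2k+1) = 2k (finite distance l = 1)."""
    n = np.arange(nmax)
    p = n + 1 - 2 * (n % 2)               # swap adjacent indices
    return (1 - lam**2) * np.sum((lam**n - lam**p)**2)

for lam in [0.9, 0.99, 0.999]:
    print(lam, delta_norm_sq(lam))        # decreases to 0 as lam -> 1
```

For this swap the sum can also be done in closed form, ‖Δλ‖² = 2(1 − λ)²/(1 + λ²), consistent with the general bound above.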
Ψλ^{(d)} ∝ Σ_{r=1}^{d} λ^r e_r^{(d)} ⊗ e_r^{(d)} .   (13.45)
Note that the infinite dimensional factor on the right hand side of (13.44) is again a state of the form (13.35), however a less entangled one, with parameter λ′ = λ^d < λ. The second factor, i.e., (13.45), becomes maximally entangled in the limit λ → 1. Therefore the unitary (Ud ⊗ Ud) splits both Alice's and Bob's subsystem, so that the total system is split exactly into a less entangled version of itself and a pure, nearly maximally entangled d-dimensional pair. The local operation extracting entanglement from this state is to discard the infinite dimensional parts. Seen in one of the limit states of the family ωλ, this pair is maximally entangled, so equation (13.2) is satisfied with ε = 0. Moreover, since the remaining system is of exactly the same type, the process can be repeated arbitrarily often.
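The splitting underlying this extraction step is just the factorization n = qd + r of the summation index in (13.35): the Schmidt coefficients {λⁿ} are exactly the products of a NOPA spectrum with parameter λ^d and the d finite coefficients λ^r. A quick numerical confirmation (truncated at an illustrative finite index):

```python
import numpy as np

# Schmidt coefficients lam^n of (13.35), n < d*qmax, versus the factorized
# family (lam^d)^q * lam^r with n = q*d + r.
lam, d, qmax = 0.8, 3, 50
full = sorted(lam**n for n in range(d * qmax))
split = sorted((lam**d)**q * lam**r for q in range(qmax) for r in range(d))
print(np.allclose(full, split))  # True: the spectrum factorizes exactly
```

This is why the leftover system after extracting the d-dimensional pair is again of NOPA form, now with parameter λ^d.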
13.5.6 Counterintuitive properties of the restricted states
Basically, subsection 13.5.3 shows that the EPR states constructed here do satisfy
the requirements of the EPR argument. However, Einstein, Podolsky and Rosen
do not consider the measurement of suitable periodic functions of Qk or Pk, but measurements of these quantities themselves [82]. What do EPR states have to say about these?
Unfortunately, the “values of momentum” found by Alice or Bob are not quite
what we usually mean by “values”: they are infinite with probability 1. To see this,
8 For any N > ℓ, consider the set {1, . . . , N}. This has to contain at least the images of {1, . . . , N − ℓ}, hence it can contain at most ℓ elements not in p(N).
9 This is probably what the authors of [39] are trying to say.
recall the remark after eq. (13.22) that EPR states are invariant with respect to phase space translations W(ξ⃗, η⃗) with (ξ⃗, η⃗) ∈ S. Hence

ω( W(ξ1, 0, η1, 0) (A ⊗ 1I) W(ξ1, 0, η1, 0)∗ ) = ω( W(ξ1, ξ1, η1, −η1) (A ⊗ 1I) W(ξ1, ξ1, η1, −η1)∗ ) = ω(A ⊗ 1I) .   (13.46)
That is, the reduced state is invariant under all phase space translations. Now suppose that for some continuous function f with compact support we have ω(f(Q1)) = ε ≠ 0. Then we could add many (say N) sufficiently widely spaced translates of f to get an operator F = Σ_{i=1}^N f(Q1 + x_i 1I) with ‖F‖ ≤ ‖f‖ and |Nε| = |ω(F)| ≤ ‖f‖, which implies ε = 0. Hence for every function with compact support we must have ω(f(Q1)) = 0. Note that this is possible only for singular states, since we can easily construct a sequence of compactly supported functions increasing to the identity, whose ω-expectations are all zero and hence fail to converge to 1.
In spite of being infinite, the "measured values" of Alice and Bob are perfectly correlated, which means that we have to distinguish different kinds of infinity. Such "kinds of infinity" are the subject of the topological theory of compactifications [53, 235]. The basic idea is very simple: consider some C*-algebra of bounded functions on the real line. Then the evaluations of the functions at a point, i.e., the functionals x ↦ f(x), are pure states on such an algebra, but compactness of the state space together with the Kreĭn–Milman Theorem [4] dictates that there are
many more pure states. These additional pure states are interpreted as the points
at infinity associated with the given observable algebra. The set of all pure states is
called the Gel'fand spectrum of the commutative C*-algebra [35, Sec. 2.3.5], and the algebra is known to be isomorphic to the algebra of continuous functions on this compact space. For the algebra of all bounded functions the additional pure states are called free ultrafilters; for the algebra of all continuous bounded functions we get the points of the Stone–Čech compactification; and for the algebra of uniformly continuous functions we get a still coarser notion of points at infinity. According to
Section 13.5.3 these are the measured values, which will be perfectly correlated be-
tween Alice’s and Bob’s positions or momenta. It is not possible to exhibit any such
value, because proving their mere existence already requires an argument based on
the Axiom of Choice.
So do we have to be content with the statement that the measured values lie “out
there on the infinite ranges, where the free ultrafilters roam?” Section 13.5.4 shows
that for many concrete problems, involving not too large observable algebras, we
can use the perfect correlation property quite well. A smaller algebra of observables
means that many points of Gel’fand spectrum become identified, and some of these
coarser points may have a direct physical interpretation. So the moral is not so much
that compactification points at infinity are wild, pathological objects, but that they
describe the way a sequence can go to infinity in the finest possible detail, which is just much finer than we usually want to know. The EPR correlation property holds
even for such wild “measured values”.
Bibliography
[3] C. Adami and N. J. Cerf. Von Neumann capacity of noisy quantum channels.
Phys. Rev. A 56, no. 5, 3470–3483 (1997).
[4] E.M. Alfsen. Compact convex sets and boundary integrals, volume 57 of Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer, New York, Heidelberg, Berlin (1971).
[6] A. Ambainis. A new protocol and lower bounds for quantum coin flipping.
In Proceedings of the 33rd Annual Symposium on Theory of Computing 2001,
pages 134–142. Association for Computing Machinery, New York (2001). see
also the more recent version in quant-ph/0204022.
[7] H. Araki and E.J. Woods. A classification of factors. Publ. R.I.M.S, Kyoto
Univ. 4, 51–130 (1968).
[8] R. Arens and V.S. Varadarajan. On the concept of EPR states and their
structure. Jour. Math. Phys. 41, 638–651 (2000).
[18] C. H. Bennett and G. Brassard. Quantum key distribution and coin tossing.
In Proc. of IEEE Int. Conf. on Computers, Systems, and Signal Processing
(Bangalore, India, 1984), pages 175–179. IEEE, New York (1984).
[29] M. Blum. Coin flipping by telephone. A protocol for solving impossible prob-
lems. SIGACT News 15, 23–27 (1981).
[61] A. Connes. Sur la classification des facteurs de type II. C.R. Acad. Sci. Paris Ser. A-B 281, A13–A15 (1975).
[62] J. F. Cornwell. Group theory in physics. II. Academic Press, London et. al.
(1984).
[65] G.M. D’Ariano, R.D. Gill, M. Keyl, B. Kuemmerer, H. Maassen and R.F.
Werner. The quantum Monty Hall problem. Quantum Inf. Comput. 2, 355–
366 (2002).
[68] R. Derka, V. Bužek and A.K. Ekert. Universal algorithm for optimal es-
timation of quantum states from finite ensembles via realizable generalized
measurements. Phys. Rev. Lett. 80, no. 8, 1571–1575 (1998).
[69] D. Deutsch. Quantum theory, the Church-Turing principle and the universal
quantum computer. Proc. R. Soc. Lond. A 400, 97–117 (1985).
[72] D.P. DiVincenzo, P.W. Shor, J.A. Smolin, B.M. Terhal and A.V. Thapliyal.
Evidence for bound entangled states with negative partial transpose. Phys.
Rev. A 61, no. 6, 062312 (2000).
[76] N. G. Duffield. A large deviation principle for the reduction of product repre-
sentations. Proc. Amer. Math. Soc. 109, 503–515 (1990).
[77] P. Dupuis and R. S. Ellis. A weak convergence approach to the theory of large deviations. Wiley, New York et al. (1997).
[78] W. Dür, J.I. Cirac, M. Lewenstein and D. Bruss. Distillability and partial
transposition in bipartite systems. Phys. Rev. A 61, no. 6, 062313 (2000).
[86] D. J. Wineland et al. Quantum information processing with trapped ions. quant-ph/0212079 (2002).
[91] A. P. Flitney and D. Abbott. Quantum version of the Monty Hall problem.
Phys. Rev. A 65, 062318 (2002).
[92] G. Giedke, L.-M. Duan, J. I. Cirac and P. Zoller. Distillability criterion for all bipartite Gaussian states. Quant. Inf. Comp. 1, no. 3 (2001).
[94] R. D. Gill and S. Massar. State estimation for large ensembles. Phys. Rev.
A61, 2312–2327 (2000).
[95] N. Gisin. Hidden quantum nonlocality revealed by local filters. Phys. Lett. A
210, no. 3, 151–156 (1996).
[99] D. Gottesman. Stabilizer codes and quantum error correction. Ph.D. thesis,
California Institute of Technology (1997). quant-ph/9705052.
[100] M. Grassl, T. Beth and T. Pellizzari. Codes for the quantum erasure channel.
Phys. Rev. A 56, no. 1, 33–38 (1997).
[106] L. Hardy and A. Kent. Cheat sensitive quantum bit commitment. quant-
ph/9911043 (1999).
[124] P. Horodecki, J.I. Cirac and M. Lewenstein. Bound entanglement for contin-
uous variables is a rare phenomenon. quant-ph/0103076 (2001).
[150] U. Leonhardt. Measuring the quantum state of light. Cambridge Univ. Press,
Cambridge (1997).
[152] C.-F. Li, Y.-S. Zhang, Y.-F. Huang and G.-C. Guo. Quantum strategies of
quantum measurement. Phys. Lett. A 280, 257–260 (2000).
[153] S. Lloyd. Capacity of the noisy quantum channel. Phys. Rev. A 55, no. 3,
1613–1622 (1997).
[154] H.-K. Lo and H. F. Chau. Why quantum bit commitment and ideal quantum
coin tossing are impossible. Physica D 120, 177–187 (1998).
[158] R. Matsumoto and T. Uyematsu. Lower bound for the quantum capacity of a
discrete memoryless quantum channel. quant-ph/0105151 (2001).
[162] N. D. Mermin. Quantum mysteries revisited. Am. J. Phys. 58, no. 8, 731–734
(1990).
[163] N. D. Mermin. What’s wrong with these elements of reality? Phys. Today 43,
no. 6, 9–11 (1990).
[164] D. A. Meyer. Quantum strategies. Phys. Rev. Lett. 82, 1052–1055 (1999).
[167] J. Nash. Non-cooperative games. Ann. of Math., II. Ser 54, 286–295 (1951).
[170] M. A. Nielsen. Continuity bounds for entanglement. Phys. Rev. A 61, no. 6,
064301 (2000).
[173] T. Ogawa and H. Nagaoka. Strong converse and Stein's lemma in quantum hypothesis testing. IEEE Trans. Inf. Theory IT-46, 2428–2433 (2000).
[174] M. Ohya and D. Petz. Quantum entropy and its use. Springer, Berlin (1993).
[178] V. I. Paulsen. Completely bounded maps and dilations. Longman Scientific &
Technical (1986).
[179] A. Peres. Higher order Schmidt decompositions. Phys. Lett. A 202, no. 1, 16–17 (1995).
[180] A. Peres. Separability criterion for density matrices. Phys. Rev. Lett. 77,
no. 8, 1413–1415 (1996).
[196] D. Simon. On the power of quantum computation. In Proc. 35th annual sym-
posium on foundations of computer science, pages 124–134. IEEE Computer
Society Press, Los Alamitos (1994).
[198] S. Singh. The code book: The Science of Secrecy from Ancient Egypt to Quan-
tum Cryptography. Fourth Estate, London (1999).
[199] R. W. Spekkens and T. Rudolph. Degrees of concealment and bindingness in
quantum bit commitment protocols. Phys. Rev. A 65, 012310 (2002).
[203] E. Størmer. Positive linear maps of operator algebras. Acta Math. 110, 233–278 (1963).
[205] S.J. Summers and R.F.Werner. Maximal violation of Bell’s inequalities for
algebras of observables in tangent spacetime regions. Ann. Inst. H. Poincaré
A 49, 215–243 (1988).
[206] S.J. Summers and R.F.Werner. On Bell’s inequalities and algebraic invariants.
Lett. Math. Phys. 33, 321–334 (1995).
[207] M. Takesaki. Tomita’s theory of modular Hilbert algebras and its application,
volume 128 of Lect. Notes. Math. Springer, Berlin, Heidelberg, New York
(1970).
[217] G. Vidal. Entanglement monotones. J. Mod. Opt. 47, no. 2-3, 355–376 (2000).
[222] K. G. H. Vollbrecht and R. F. Werner. Why two qubits are special. J. Math.
Phys. 41, no. 10, 6772–6782 (2000).
[223] J. von Neumann. On infinite direct products. Compos. Math. 6, 1–77 (1938).
cf. also Collected Works III, No. 6.
[224] J. von Neumann and O. Morgenstern. Theory of games and economic behav-
ior. Princeton Univ. Press, Princeton (1944).
[234] R. F. Werner and M. M. Wolf. Bound entangled Gaussian states. Phys. Rev. Lett. 86, no. 16, 3658–3661 (2001).
[235] R.F. Werner. Physical uniformities on the state space of non-relativistic quan-
tum mechanics. Found. Phys. 13, 859–881 (1983).
[236] R.F. Werner. EPR states for von Neumann algebras. quant-ph/9910077
(1999).