information theory
Accepted habilitation thesis
for the attainment of the venia legendi
in the field of Theoretical Physics
Braunschweig
21 May 2003
Foreword
1 Introduction 9
1.1 What is quantum information? . . . . . . . . . . . . . . . . . . . . . 9
1.2 Tasks of quantum information . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Experimental realizations . . . . . . . . . . . . . . . . . . . . . . . . 13
I Fundamentals 17
2 Basic concepts 18
2.1 Systems, States and Effects . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.1 Operator algebras . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.2 Quantum mechanics . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.3 Classical probability . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.4 Observables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Composite systems and entangled states . . . . . . . . . . . . . . . . 22
2.2.1 Tensor products . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.2 Compound and hybrid systems . . . . . . . . . . . . . . . . . 23
2.2.3 Correlations and entanglement . . . . . . . . . . . . . . . . . 24
2.2.4 Bell inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.1 Completely positive maps . . . . . . . . . . . . . . . . . . . . 26
2.3.2 The Stinespring theorem . . . . . . . . . . . . . . . . . . . . . 27
2.3.3 The duality lemma . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Separability criteria and positive maps . . . . . . . . . . . . . . . . . 29
2.4.1 Positivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4.2 The partial transpose . . . . . . . . . . . . . . . . . . . . . . 30
2.4.3 The reduction criterion . . . . . . . . . . . . . . . . . . . . . 31
3 Basic examples 32
3.1 Entanglement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.1.1 Maximally entangled states . . . . . . . . . . . . . . . . . . . 32
3.1.2 Werner states . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.3 Isotropic states . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.1.4 OO-invariant states . . . . . . . . . . . . . . . . . . . . . . . 35
3.1.5 PPT states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.1.6 Multipartite states . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2 Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.1 Quantum channels . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.2 Channels under symmetry . . . . . . . . . . . . . . . . . . . . 40
3.2.3 Classical channels . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.4 Observables and preparations . . . . . . . . . . . . . . . . . . 42
3.2.5 Instruments and parameter dependent operations . . . . . . . 43
3.2.6 LOCC and separable channels . . . . . . . . . . . . . . . . . 45
3.3 Quantum mechanics in phase space . . . . . . . . . . . . . . . . . . . 46
3.3.1 Weyl operators and the CCR . . . . . . . . . . . . . . . . . . 46
3.3.2 Gaussian states . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.3 Entangled Gaussians . . . . . . . . . . . . . . . . . . . . . . . 48
3.3.4 Gaussian channels . . . . . . . . . . . . . . . . . . . . . . . . 50
4 Basic tasks 52
4.1 Teleportation and dense coding . . . . . . . . . . . . . . . . . . . . . 52
4.1.1 Impossible machines revisited: Classical teleportation . . . . 52
4.1.2 Entanglement enhanced teleportation . . . . . . . . . . . . . 52
4.1.3 Dense coding . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2 Estimating and copying . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2.1 Quantum state estimation . . . . . . . . . . . . . . . . . . . . 55
4.2.2 Approximate cloning . . . . . . . . . . . . . . . . . . . . . . . 56
4.3 Distillation of entanglement . . . . . . . . . . . . . . . . . . . . . . . 57
4.3.1 Distillation of pairs of qubits . . . . . . . . . . . . . . . . . . 58
4.3.2 Distillation of isotropic states . . . . . . . . . . . . . . . . . . 59
4.3.3 Bound entangled states . . . . . . . . . . . . . . . . . . . . . 60
4.4 Quantum error correction . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4.1 The theory of Knill and Laflamme . . . . . . . . . . . . . . . 61
4.4.2 Graph codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.5 Quantum computing . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.5.1 The network model of classical computing . . . . . . . . . . . 66
4.5.2 Computational complexity . . . . . . . . . . . . . . . . . . . . 67
4.5.3 Reversible computing . . . . . . . . . . . . . . . . . . . . . . 68
4.5.4 The network model of a quantum computer . . . . . . . . . . 69
4.5.5 Simon’s problem . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.6 Quantum cryptography . . . . . . . . . . . . . . . . . . . . . . . . . 72
5 Entanglement measures 75
5.1 General properties and definitions . . . . . . . . . . . . . . . . . . . 75
5.1.1 Axiomatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.1.2 Pure states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1.3 Entanglement measures for mixed states . . . . . . . . . . . . 78
5.2 Two qubits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2.1 Pure states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.2.2 EOF for Bell diagonal states . . . . . . . . . . . . . . . . . . 81
5.2.3 Wootters formula . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.2.4 Relative entropy for Bell diagonal states . . . . . . . . . . . . 83
5.3 Entanglement measures under symmetry . . . . . . . . . . . . . . . . 83
5.3.1 Entanglement of Formation . . . . . . . . . . . . . . . . . . . 84
5.3.2 Werner states . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.3.3 Isotropic states . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.3.4 OO-invariant states . . . . . . . . . . . . . . . . . . . . . . . 86
5.3.5 Relative Entropy of Entanglement . . . . . . . . . . . . . . . 87
6 Channel capacity 90
6.1 Definition and elementary properties . . . . . . . . . . . . . . . . . . 90
6.1.1 The definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.1.2 Elementary properties . . . . . . . . . . . . . . . . . . . . . . 92
6.1.3 Relations to entanglement measures . . . . . . . . . . . . . . 95
6.2 Coding theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.2.1 Shannon’s theorem . . . . . . . . . . . . . . . . . . . . . . . . 95
6.2.2 The classical capacity of a quantum channel . . . . . . . . . . 96
6.2.3 Entanglement assisted capacity . . . . . . . . . . . . . . . . . 97
6.2.4 The quantum capacity . . . . . . . . . . . . . . . . . . . . . . 98
6.2.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
II Advanced topics 103
7 Continuity of the quantum capacity 104
7.1 Discrete to continuous error model . . . . . . . . . . . . . . . . . . . 104
7.2 Coding by random graphs . . . . . . . . . . . . . . . . . . . . . . . . 105
7.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.3.1 Correcting small errors . . . . . . . . . . . . . . . . . . . . . . 107
7.3.2 Estimating capacity from finite coding solutions . . . . . . . 108
7.3.3 Error exponents . . . . . . . . . . . . . . . . . . . . . . . . . 110
7.3.4 Capacity with finite error allowed . . . . . . . . . . . . . . . . 111
11 Purification 157
11.1 Statement of the problem . . . . . . . . . . . . . . . . . . . . . . . . 157
11.1.1 Figures of Merit . . . . . . . . . . . . . . . . . . . . . . . . . 157
11.1.2 The optimal purifier . . . . . . . . . . . . . . . . . . . . . . . 158
11.2 Calculating fidelities . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
11.2.1 Decomposition of states . . . . . . . . . . . . . . . . . . . . . 159
11.2.2 The one qubit fidelity . . . . . . . . . . . . . . . . . . . . . . 160
11.2.3 The all qubit fidelity . . . . . . . . . . . . . . . . . . . . . . . 162
11.3 Solution of the optimization problems . . . . . . . . . . . . . . . . . 163
11.4 Asymptotic behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
11.4.1 The one particle test . . . . . . . . . . . . . . . . . . . . . . . 165
11.4.2 The many particle test . . . . . . . . . . . . . . . . . . . . . . 168
of data; and even if there are losses they are well understood and it is known how
to deal with them. However, quantum information theory breaks with this point of
view. It studies, loosely speaking, that kind of information (“quantum information”)
which is transmitted by micro particles from a preparation device (sender) to a
measuring apparatus (receiver) in a quantum mechanical experiment – in other
words the distinction between carriers of classical and quantum information becomes
essential. This approach is justified by the observation that a lossless conversion of
quantum information into classical information is in the above sense not possible.
Therefore, quantum information is a new kind of information.
In order to explain why there is no way from quantum to classical information
and back, let us discuss what such a conversion would look like. To convert quantum
to classical information we need a device which takes quantum systems as input
and produces classical information as output – this is nothing other than a measuring
apparatus. The converse translation from classical to quantum information can be
rephrased similarly as “parameter dependent preparation”, i.e. the classical input to
such a device is used to control the state (and possibly the type of system) in which
the micro particles should be prepared. A combination of these two elements can be
done in two ways. Let us first consider a device which goes from classical to quantum
to classical information. This is a possible task and in fact technically realized
already. A typical example is the transmission of classical information via an optical
fiber. The information transmitted through the fiber is carried by micro particles
(photons) and is therefore quantum information (in the sense of our preliminary
definition). To send classical information we first have to prepare photons in a
certain state, send them through the channel, and measure an appropriate observable
at the output side. This is exactly the combination of a classical → quantum with
a quantum → classical device just described.
The crucial point is now that the converse composition – performing the
measurement M first and the preparation P afterwards (cf. Figure 1.1) – is more
problematic. Such a process is called classical teleportation if the particles produced by
P are “indistinguishable” from the input systems. We will show the impossibility
of such a device via a hierarchy of other “impossible machines” which traces the
problem back to the fundamental structure of quantum mechanics. This finally will
prove our statement that quantum information is a new kind of information.¹
Figure 1.1: Schematic representation of classical teleportation. Here and in the fol-
lowing diagrams a curly arrow stands for quantum systems and a straight one for
the flow of classical information.
¹ This concerns in particular the construction of Bell’s telephone from a joint measurement,
which we have omitted here.
Figure 1.2: A teleportation process should not affect the results of a statistical
experiment with quantum systems. A more precise explanation of the diagram is
given in the text.
in between; cf. Figure 1.2. In both cases we should get the same distribution of
measuring results for a large number of repetitions of the corresponding experiment.
This requirement should hold for any preparation P′ and any measurement M′,
but for fixed M and P. The latter means that we are not allowed to use a priori
knowledge about P′ or M′ to adapt the teleportation process (otherwise, in the most
extreme case, we could always choose P′ for P and the whole discussion would become
meaningless).
The second impossible machine we have to consider is a quantum copying ma-
chine. This is a device C which takes one quantum system p as input and produces
two systems p1 , p2 of the same type as output. The limiting condition on C is that
p1 and p2 are indistinguishable from the input, where “indistinguishable” has to be
understood in the same way as above: Any statistical experiment performed with
one of the output particles (i.e. always with p1 or always with p2 ) yields the same
result as applied directly to the input p. To get such a device from teleportation
is easy: We just have to perform an M measurement on p, make two copies of the
classical data obtained, and run the preparation P on each of them; cf. Figure 1.3.
Hence, if teleportation is possible, copying is possible as well.
According to the “no-cloning theorem” of Wootters and Zurek [239], however, a
quantum copying machine does not exist, and this basically concludes our proof.
Nevertheless, we will give an easy argument for this theorem in terms of a third
impossible machine – a joint measuring device MAB for two arbitrary observables A and B.

Figure 1.4: Constructing a joint measurement for the observables A and B from a
quantum copying machine.
This is a measuring apparatus which produces each time it is invoked a pair (a, b)
of classical outputs, where a is a possible output of A and b a possible output of
B. The crucial requirement for MAB again is of statistical nature: The statistics of
the a outcomes is the same as for device A, and similarly for B. It is known from
elementary quantum mechanics that many quantum observables are not jointly
measurable in this way. The most famous examples are position and momentum or
different components of angular momentum. Nevertheless a device MAB could be
constructed for arbitrary A and B from a quantum copy machine C. We simply have
to operate with C on the input system p producing two outputs p1 and p2 and to
perform an A measurement on p1 and a B measurement on p2 ; cf. Figure 1.4. Since
the outputs p1, p2 are, by assumption, indistinguishable from the input p, the overall
device constructed this way would give a joint measurement for A and B. Hence a
quantum copying machine cannot exist, as stated by the no-cloning theorem. This
in turn implies that classical teleportation is impossible, and therefore we cannot
transform quantum information losslessly into classical information and back. This
concludes our chain of arguments.
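The chain of arguments rests on the fact that a unitary cloner would have to preserve inner products. This can be made concrete in a few lines; the following numerical sketch (our illustration, not part of the original text, with arbitrarily chosen states) checks that cloning would force every overlap x = ⟨ψ, φ⟩ to satisfy x = x², which fails for states that are neither identical nor orthogonal.

```python
import numpy as np

# Sketch of the inner-product argument behind the no-cloning theorem.
# A hypothetical cloner U would act as U(psi ⊗ e) = psi ⊗ psi for every
# state psi, with a fixed "blank" ancilla state e.  Unitaries preserve
# inner products, so for two states psi, phi with overlap x = <psi|phi>:
#   inputs:  <psi ⊗ e, phi ⊗ e> = x * <e|e> = x
#   outputs: <psi ⊗ psi, phi ⊗ phi> = x * x
# Hence x = x**2, which forces x = 0 or x = 1.

psi = np.array([1.0, 0.0])                  # |0>
phi = np.array([1.0, 1.0]) / np.sqrt(2.0)   # |+>: neither equal nor orthogonal

x = np.vdot(psi, phi).real                  # overlap, here 1/sqrt(2)
print(abs(x - x**2) > 1e-9)                 # True: no unitary cloner exists
```

Since |0⟩ and |+⟩ have overlap 1/√2, the constraint x = x² is violated, mirroring the no-cloning theorem.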
1.2 Tasks of quantum information
So we have seen that quantum information is something new, but what can we do
with it? There are three answers to this question which we want to present here.
First of all let us remark that in fact all information in a modern data processing
environment is carried by micro particles (e.g. electrons or photons). Hence quantum
information comes automatically into play. Currently it is safe to ignore this and
to use classical information theory to describe all relevant processes. If the size of
the structures on a typical circuit decreases below a certain limit, however, this is
no longer true and quantum information will become relevant.
This leads us to the second answer. Although it is far too early to say which
concrete technologies will emerge from quantum information in the future, several
interesting proposals show that devices based on quantum information can solve
certain practical tasks much better than classical ones. The most well known and
exciting one is, without a doubt, quantum computing. The basic idea is, roughly
speaking, that a quantum computer can operate not only on one number per reg-
ister but on superpositions of numbers. This possibility leads to an “exponential
speedup” for some computations, which makes problems feasible that are considered
intractable for any classical algorithm. This is most impressively demonstrated
by Shor’s factoring algorithm [192, 193]. A second example which is quite close
to a concrete practical realization (i.e. outside the laboratory; see next Section) is
quantum cryptography. The fact that it is impossible to perform a quantum me-
chanical measurement without disturbing the state of the measured system is used
here for the secure transmission of a cryptographic key (i.e. each eavesdropping
attempt can be detected with certainty). Together with a subsequent application
of a classical encryption method known as the “one-time pad”, this leads to a
cryptographic scheme with provable security – in contrast to currently used public key
systems whose security relies on possibly doubtful assumptions about (pseudo) ran-
dom number generators and prime numbers. We will come back to both subjects –
quantum computing and quantum cryptography in Sections 4.5 and 4.6.
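The one-time pad mentioned above is simple enough to sketch. In this toy example (ours, not the author's; `secrets.token_bytes` merely stands in for a key that quantum key distribution would supply), encryption and decryption are the same XOR operation; the scheme is provably secure when the key is random, as long as the message, and never reused.

```python
import secrets

def one_time_pad(message: bytes, key: bytes) -> bytes:
    """XOR message with a key of equal length; applying it twice decrypts."""
    assert len(key) == len(message)
    return bytes(m ^ k for m, k in zip(message, key))

message = b"quantum"
key = secrets.token_bytes(len(message))       # stand-in for a QKD-shared secret
cipher = one_time_pad(message, key)
assert one_time_pad(cipher, key) == message   # XOR is its own inverse
```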
The third answer to the above question is of more fundamental nature. The dis-
cussion of questions from information theory in the context of quantum mechanics
leads to a deeper and in many cases more quantitative understanding of quantum
theory. Maybe the most relevant example for this statement is the study of
entanglement, i.e. non-classical correlations between quantum systems, which leads to
violations of Bell inequalities². Entanglement is a fundamental aspect of quantum
mechanics and demonstrates the differences between quantum and classical physics
in the most drastic way – this can be seen from Bell-type experiments, like the
one of Aspect et al. [11], and the discussion they triggered. Nevertheless, for a long
time it was considered only an exotic feature of the foundations of quantum mechanics
which is not so relevant from a practical point of view. Since quantum information
attained broader interest, however, this has changed completely. It has turned out
that entanglement is an essential resource whenever classical information process-
ing is outperformed by quantum devices. One of the most remarkable examples is
the experimental realization of “entanglement enhanced” teleportation [33, 31]. We
have argued in Section 1.1 that classical teleportation, i.e. transmission of quantum
information through a classical information channel, is impossible. If sender and
receiver share, however, an entangled pair of particles (which can be used as an
additional resource) the impossible task becomes, most surprisingly, possible [19]!
(We will discuss this fact in detail in Section 4.1.) The study of entanglement and
in particular the question how it can be quantified is therefore a central topic within
quantum information theory (cf. Chapter 5). Further examples for fields where
quantum information has led to a deeper and in particular more quantitative in-
sight include “capacities” of quantum information channels and “quantum cloning”.
A detailed discussion of these topics will be given in Chapters 6 and 8. Finally let
us remark that classical information theory benefits in a similar way from the
synthesis with quantum mechanics. Besides the just mentioned channel capacities, this
concerns for example the theory of computational complexity which analyzes the
scaling behavior of time and space consumed by an algorithm in dependence of the
size of the input data. Quantum information challenges here in particular the
fundamental Church–Turing hypothesis [54, 212], which claims that each computation
can be simulated “efficiently” on a Turing machine; we come back to this topic in
Section 4.5.
One of the most advanced approaches to quantum computing is the ion trap
technique (see Section 4.3 and 5.3 in [32] and Section 7.6 of [172] for an overview and
further references). A “quantum register” is realized here by a string of ions kept by
electromagnetic fields in high vacuum inside a Paul trap, and two long-lived states
of each ion are chosen to represent “0” and “1”. A single ion can be manipulated
by laser beams and this allows the implementation of all “one-qubit gates”. To get
two-qubit gates as well (for a quantum computer we need at least one two-qubit
gate together with all one-qubit operations; cf. Section 4.5), the collective motional
state of the ions has to be used. A “program” on an ion trap quantum computer
now starts with a preparation of the register in an initial state – usually the ground
state of the ions. This is done by optical pumping and laser cooling (which is in
fact one of the most difficult parts of the whole procedure, in particular if many
ions are involved). Then the “network” of quantum gates is applied, in terms of a
(complicated) sequence of laser pulses. The readout, finally, is done by laser beams
which illuminate the ions one after the other. The beams are tuned to a fast transition
which affects only one of the qubit states, and the fluorescent light is detected. An
overview of recent experimental directions can be found in [86].
A second quite successful technique is NMR quantum computing (see Section
5.4 of [32] and Section 7.7 of [172] together with the references therein for details).
NMR stands for “nuclear magnetic resonance”, i.e. the study of transitions
between Zeeman levels of an atomic nucleus in a magnetic field. The qubits are in
this case different spin states of the nuclei in an appropriate molecule and quantum
gates are realized by high frequency oscillating magnetic fields in pulses of controlled
duration. In contrast to ion traps, however, we do not use one molecule but a whole
cup of liquid containing some 10^20 of them. This causes a number of problems,
concerning in particular the preparation of an initial state, fluctuations in the free
time evolution of the molecules and the readout. There are several ways to overcome
these difficulties and we refer the reader again to [32] and [172] for details. Concrete
implementations of NMR quantum computers can use up to seven qubits [213]. A
recent review can be found in [87].
The fundamental problem of the two methods for quantum computation discussed
so far is their lack of scalability. It is realistic to assume that NMR and ion-trap
quantum computers with up to tens of qubits will exist at some point in the future,
but not with the thousands of qubits which are necessary for “real world” applications.
There are, however, many alternative proposals available, and some of
them might be able to avoid this problem. The following is a small (not at all
exhaustive) list: atoms in optical lattices [37], semiconductor nanostructures such as
quantum dots (there are many works in this area; some recent ones are [209, 40, 28, 38])
and arrays of Josephson junctions [155].
A second group of experiments we want to mention here centers on
quantum communication and quantum cryptography (for a more detailed overview
we refer to [227] and [97]). Realizations of quantum cryptography are fairly far
advanced, and it is currently possible to span up to 50 km with optical fibers (e.g.
[126]). Potentially greater distances can be bridged by “free space cryptography”,
where the quantum information is transmitted through the air (e.g. [44]). With this
technology satellites can be used as some sort of “relays”, thus enabling quantum
key distribution over arbitrary distances. In the meantime there are quite a lot of
successful implementations. For a detailed discussion we refer the reader to the
review of Gisin et al. [97] and the references therein. Other experiments concern
the usage of entanglement in quantum communication. The creation and detection
of entangled photons is here a fundamental building block. Nowadays this is no
problem, and the most famous experiment in this context is the one of Aspect et
al. [11], where the maximal violation of Bell inequalities was demonstrated with
polarization correlated photons. Another spectacular experiment is the creation
Fundamentals
Chapter 2
Basic concepts
Now that we have a first, rough impression of the basic ideas and most relevant
subjects of quantum information theory, let us start with a more detailed
presentation. First we have to introduce the fundamental notions of the theory and
their mathematical description. Fortunately, much of the material we would have
to present here, like Hilbert spaces, tensor products and density matrices, is already
known from quantum mechanics, and we can focus our discussion on those concepts
which are less familiar, like POV measures, completely positive maps and entangled
states.
2.1 Systems, States and Effects
Like classical probability theory, quantum mechanics is a statistical theory. Hence its
predictions are of a probabilistic nature and can only be tested if the same experiment
is repeated very often and the relative frequencies of the outcomes are calculated.
In more operational terms this means: the experiment has to be repeated according
to the same procedure, as it could be set out in a detailed laboratory manual. If we
consider a somewhat idealized model of such a statistical experiment, we get in
fact two different types of procedures: first, preparation procedures, which prepare
a certain kind of physical system in a distinguished state, and second, registration
procedures, which measure a particular observable.
A mathematical description of such a setup basically consists of two sets S and
E and a map S × E ∋ (ρ, A) ↦ ρ(A) ∈ [0, 1]. The elements of S describe the states,
i.e. preparations, while the A ∈ E represent all yes/no measurements (effects) which
can be performed on the system. The probability (i.e. the relative frequency for a
large number of repetitions) to get the result “yes”, if we are measuring the effect
A on a system prepared in the state ρ, is given by ρ(A). This is a very general
scheme applicable not only to quantum mechanics but also to a very broad class
of statistical models, containing in particular classical probability. In order to make
use of it we have to specify of course the precise structure of the sets S and E and
the map ρ(A) for the types of systems we want to discuss.
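As a toy instance of this abstract scheme (our illustration; the die and the numbers are arbitrary choices), take a classical die: states ρ ∈ S are probability vectors on X = {1, . . . , 6}, effects A ∈ E are functions with values in [0, 1], and the pairing ρ(A) is the expectation value.

```python
# Toy instance of the (S, E, pairing) scheme for a classical die.
# States: probability distributions on X = {1, ..., 6}.
# Effects: functions A: X -> [0, 1]; the pairing is rho(A) = sum_x rho_x A_x.

def pairing(rho, effect):
    return sum(p * a for p, a in zip(rho, effect))

uniform = [1 / 6] * 6                     # the "fair die" state
even = [0, 1, 0, 1, 0, 1]                 # proposition: "outcome is even"
fuzzy = [0, 0.9, 0, 0.9, 0, 0.9]          # fuzzy version: a 90%-efficient detector

print(round(pairing(uniform, even), 2))   # 0.5
print(round(pairing(uniform, fuzzy), 2))  # 0.45
```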
2.1.1 Operator algebras
Throughout this paper we will encounter three different kinds of systems: quantum
and classical systems and hybrid systems which are half classical, half quantum (cf.
Subsection 2.2.2). In this subsection we will describe a general way to define states
and effects which is applicable to all three cases and which therefore provides a
handy way to discuss all three cases simultaneously (this will become most useful
in Section 2.2 and 2.3).
The scheme we are going to discuss is based on an algebra A of bounded operators
acting on a Hilbert space H. More precisely, A is a (closed) linear subspace
of B(H), the algebra of bounded operators on H, which contains the identity
(1I ∈ A) and is closed under products (A, B ∈ A ⇒ AB ∈ A) and adjoints (A ∈ A
⇒ A∗ ∈ A). For simplicity we will refer to each such A as an observable algebra.
The key observation is now that each type of system we will study in the following
can be completely characterized by its observable algebra A, i.e. once A is known
there is a systematic way to derive the sets S and E and the map (ρ, A) 7→ ρ(A)
from it. We frequently make use of this fact by referring to systems in terms of their
observable algebra A, or even by identifying them with their algebra and saying
that A is the system.
S(A) = {ρ ∈ A∗ | ρ ≥ 0, ρ(1I) = 1},   (2.1)

where A∗ denotes the dual space of A, i.e. the set of all linear functionals on A, and
ρ ≥ 0 means ρ(A) ≥ 0 for all A ≥ 0. Elements of S(A) describe the states of the system
in question, while effects are given by

E(A) = {A ∈ A | 0 ≤ A ≤ 1I}.   (2.2)
The probability to measure the effect A in the state ρ is ρ(A). More generally we can
look at ρ(A) for an arbitrary A as the expectation value of A in the state ρ. Hence
the idea behind Equation (2.1) is to define states in terms of their expectation value
functionals.
Both spaces are convex, i.e. ρ, σ ∈ S(A) and 0 ≤ λ ≤ 1 imply λρ + (1 − λ)σ ∈
S(A), and similarly for E(A). The extremal points of S(A) and E(A), respectively, i.e.
those elements which do not admit a proper convex decomposition (x = λy + (1 − λ)z
⇒ λ = 1 or λ = 0 or y = z = x), play a distinguished role: the extremal points
of S(A) are the pure states and those of E(A) are the propositions of the system in
question. The latter represent those effects which register a property with certainty
in contrast to non-extremal effects, which admit some “fuzziness”. As a simple
example of such a fuzzy effect, consider a detector which registers particles not with
certainty but only with a probability smaller than one.
Finally, let us note that the complete discussion of this section can easily be
generalized to infinite dimensional systems if we replace H = C^d by an infinite
dimensional Hilbert space (e.g. H = L²(R)). This would, however, require more
material about C*-algebras and measure theory than we want to use in this paper.
2.1.2 Quantum mechanics
For quantum mechanics we have
A = B(H), (2.3)
where we have chosen again H = C^d. The corresponding systems are called d-level
systems, or qubits if d = 2 holds. To avoid clumsy notation we frequently write S(H)
and E(H) instead of S(B(H)) and E(B(H)). From Equation (2.2) we immediately
see that an operator A ∈ B(H) is an effect iff it is positive and bounded from above
by 1I. An element P ∈ E(H) is a proposition iff P is a projection operator (P² = P).
States are described in quantum mechanics usually by density matrices, i.e.
positive and normalized trace class¹ operators. To make contact with the general
definition in Equation (2.1), note first that B(H) is a Hilbert space with the Hilbert-
Schmidt scalar product ⟨A, B⟩ = tr(A∗B). Hence each linear functional ρ ∈ B(H)∗
¹ On a finite dimensional Hilbert space this attribute is of course redundant, since each operator
is of trace class in this case. Nevertheless we will frequently use this terminology, due to greater
consistency with the infinite dimensional case.
can be expressed in terms of a (trace class) operator ρ̃ by² A ↦ ρ(A) = tr(ρ̃A). It is
obvious that each ρ̃ defines a unique functional ρ. If we start on the other hand with
ρ, we can recover the matrix elements of ρ̃ from ρ by ρ̃_kj = tr(ρ̃ |j⟩⟨k|) = ρ(|j⟩⟨k|),
where |j⟩⟨k| denotes the canonical basis of B(H) (i.e. (|j⟩⟨k|)_ab = δ_ja δ_kb). More
generally we get for ψ, φ ∈ H the relation ⟨φ, ρ̃ψ⟩ = ρ(|ψ⟩⟨φ|), where |ψ⟩⟨φ| now
denotes the rank one operator which maps η ∈ H to ⟨φ, η⟩ψ. In the following we
drop the tilde and use the same symbol for the operator and the functional whenever
confusion can be avoided. Due to the same abuse of language we will interpret
elements of B(H)∗ frequently as (trace class) operators instead of linear functionals
(and write tr(ρA) instead of ρ(A)). However, we do not identify B(H)∗ with B(H)
in general, because the two different notations help to keep track of the distinction
between spaces of states and spaces of observables. In addition we equip B(H)∗ with
the trace norm ‖ρ‖₁ = tr|ρ| instead of the operator norm.
Positivity of the functional ρ implies positivity of the operator ρ, due to
0 ≤ ρ(|ψ⟩⟨ψ|) = ⟨ψ, ρψ⟩, and the same holds for normalization: 1 = ρ(1I) = tr(ρ).
Hence we can identify the state space from Equation (2.1) with the set of density
matrices, as expected for quantum mechanics. Pure states of a quantum system
are the one dimensional projectors. As usual, we will frequently identify the density
matrix |ψ⟩⟨ψ| with the wave function ψ and, in abuse of language, call the latter a
state.
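The pairing ρ(A) = tr(ρA) is easy to check numerically. In the following sketch (our example, with an arbitrarily chosen state and effect), a density matrix paired with an effect 0 ≤ A ≤ 1I yields a probability in [0, 1], and the positivity and normalization conditions are verified explicitly.

```python
import numpy as np

# A qubit density matrix (positive, trace one) and an effect 0 <= A <= 1I.
rho = np.array([[0.75, 0.25],
                [0.25, 0.25]])            # a mixed state, tr(rho) = 1
A = 0.8 * np.diag([1.0, 0.0])             # "fuzzy" projector onto |0>: 80% efficient

prob = float(np.trace(rho @ A))           # rho(A) = tr(rho A)

# Sanity checks matching the general definitions:
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)   # positivity of the state
assert abs(np.trace(rho) - 1.0) < 1e-12            # normalization
assert 0.0 <= prob <= 1.0                          # a genuine probability

print(round(prob, 2))                     # 0.6
```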
To get a useful parameterization of the state space, consider again the Hilbert-
Schmidt scalar product ⟨ρ, σ⟩ = tr(ρ∗σ), but now on B(H)∗. The space of trace-free
matrices in B(H)∗ (alternatively, the functionals with ρ(1I) = 0) is the corresponding
orthocomplement 1I^⊥ of the unit operator. If we choose a basis σ_1, . . . , σ_{d²−1} with
⟨σ_j, σ_k⟩ = 2δ_jk in 1I^⊥, we can write each selfadjoint (trace class) operator ρ with
tr(ρ) = 1 as

    ρ = 1I/d + (1/2) Σ_{j=1}^{d²−1} x_j σ_j =: 1I/d + (1/2) x⃗ · σ⃗,  with x⃗ ∈ R^{d²−1}.  (2.4)
the observable algebra is much larger and Equation (2.1) leads to states which are not necessarily
given by trace class operators. Such “singular states” play an important role in theories which
admit an infinite number of degrees of freedom like quantum statistics and quantum field theory;
cf. [35]. This point will be essential in the discussion of infinitely entangled states; cf. Chapter 13.
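The parameterization (2.4) is easily checked numerically. The following numpy sketch (an illustration added here, not part of the original text) uses the Pauli matrices as the basis σ1, σ2, σ3 for d = 2; the chosen vector ~x is arbitrary.

```python
import numpy as np

# Basis of the trace-free part for d = 2: the Pauli matrices satisfy
# tr(sigma_j sigma_k) = 2 delta_jk, as required in (2.4).
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_x
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),    # sigma_z
]

d = 2
x = np.array([0.3, -0.2, 0.5])   # any vector with |x| <= 1 yields a state
rho = np.eye(d) / d + 0.5 * sum(xj * sj for xj, sj in zip(x, sigma))

assert np.isclose(np.trace(rho).real, 1.0)   # normalization tr(rho) = 1
assert np.allclose(rho, rho.conj().T)        # self-adjointness
# the coordinates are recovered via the Hilbert-Schmidt scalar product:
x_rec = np.array([np.trace(rho @ sj).real for sj in sigma])
assert np.allclose(x_rec, x)
```

For |~x| ≤ 1 the resulting ρ is in addition positive, i.e. a density matrix (the Bloch ball picture for qubits).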
set X of elementary events. Typical examples are: throwing a die X = {1, . . . , 6},
tossing a coin X = {“heads”, “tails”} or classical bits X = {0, 1}. To simplify
the notations we write (as in quantum mechanics) S(X) and E(X) for the spaces
of states and effects.
The observable algebra A of such a system is the space
A = C(X) = {f : X → C} (2.6)
of complex valued functions on X. To interpret this as an operator algebra acting
on a Hilbert space H (as indicated in Subsection 2.1.1) choose an arbitrary but
fixed orthonormal basis |xi, x ∈ X in H and identify the function f ∈ C(X) with
the operator f = Σx fx |xihx| ∈ B(H) (we use the same symbol for the function
and the operator, provided confusion can be avoided). Most frequently we have
X = {1, . . . , d} and we can choose H = Cd and the canonical basis for |xi. Hence
C(X) becomes the algebra of diagonal d × d matrices. Using Equation (2.2) we
immediately see that f ∈ C(X) is an effect iff 0 ≤ fx ≤ 1, ∀x ∈ X. Physically
we can interpret fx as the probability that the effect f registers the elementary
event x. This makes the distinction between propositions and “fuzzy” effects very
transparent: P ∈ E(X) is a proposition iff we have either Px = 1 or Px = 0 for all
x ∈ X. Hence the propositions P ∈ C(X) are in one to one correspondence with
the subsets ωP = {x ∈ X | Px = 1} ⊂ X which in turn describe the events of the
system. Hence P registers the event ωP with certainty, while a fuzzy effect f < P
does this only with a probability less than one.
Since C(X) is finite dimensional and admits the distinguished basis |xihx|, x ∈ X
it is naturally isomorphic to its dual C ∗ (X). More precisely: each linear functional
ρ ∈ C∗(X) defines and is uniquely defined by the function x 7→ ρx = ρ(|xihx|) and
we have ρ(f ) = Σx fx ρx . As in the quantum case we will identify the function ρ
with the linear functional and use the same symbol for both, although we keep the
notation C ∗ (X) to indicate that we are talking about states rather than observables.
Positivity of ρ ∈ C∗(X) is given by ρx ≥ 0 for all x, and normalization leads
to 1 = ρ(1I) = ρ(Σx |xihx|) = Σx ρx . Hence to be a state ρ ∈ C∗(X) must be a
probability distribution on X and ρx is the probability that the elementary event x
occurs during statistical experiments with systems in the state ρ. More generally
ρ(f ) = Σj ρj fj is the probability to measure the effect f on systems in the state ρ.
If P is in particular a proposition, ρ(P ) gives the probability for the event ωP . The
pure states of the system are the Dirac measures δx , x ∈ X; with δx (|yihy|) = δxy .
Hence each ρ ∈ S(X) can be decomposed in a unique way into a convex linear
combination of pure states.
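For concreteness, the classical picture can be spelled out in a few lines of numpy (an added illustration; the die example and the smearing factor 0.9 are arbitrary choices):

```python
import numpy as np

# Classical system C(X) for a die, X = {1,...,6}: states are probability
# vectors, effects are functions with 0 <= f_x <= 1, and rho(f) = sum_x rho_x f_x.
rho = np.full(6, 1 / 6)                    # uniform state (fair die)
P_even = np.array([0, 1, 0, 1, 0, 1.0])    # proposition "outcome is even"
f = 0.9 * P_even                           # a fuzzy effect f < P_even

assert np.isclose(rho.sum(), 1.0) and (rho >= 0).all()   # rho is a state
assert np.isclose(rho @ P_even, 0.5)       # probability of the event omega_P
assert rho @ f < rho @ P_even              # the fuzzy effect registers less often
# pure states are the Dirac measures delta_x:
delta_3 = np.eye(6)[2]                     # delta measure on the outcome "3"
assert np.isclose(delta_3 @ P_even, 0.0)   # outcome 3 is odd
```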
2.1.4 Observables
Up to now we have discussed only effects, i.e. yes/no experiments. In this subsection
we will have a first short look at more general observables. We will come back to
this topic in Section 3.2.4 after we have introduced channels. We can think of an
observable E taking its values in a finite set X as a map which associates to each
possible outcome x ∈ X the effect Ex ∈ E(A) (if A is the observable algebra of
the system in question) which is true if x is measured and false otherwise. If the
measurement is performed on systems in the state ρ we get for each x ∈ X the
probability px = ρ(Ex ) to measure x. Hence the family of the px should be a
probability distribution on X, and this implies that E should be a POV measure
on X.
Definition 2.1.1 Consider an observable algebra A ⊂ B(H) and a finite 3 set X.
A family E = (Ex )x∈X of effects in A (i.e. 0 ≤ Ex ≤ 1I) is called a positive
3 This is of course an artificial restriction and in many situations not justified (cf. in particular
the discussion of quantum state estimation in Section 4.2 and Chapter 8). However, it helps us to
avoid measure theoretical subtleties; cf. Holevo’s book [111] for a more general discussion.
operator valued measure (POV measure) on X if Σ_{x∈X} Ex = 1I holds. If all Ex
are projections, E is called a projection valued measure (PV measure).
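To make Definition 2.1.1 concrete, the following numpy sketch (an illustration added here; the smearing parameter η is chosen arbitrarily) builds a non-projective POV measure on a qubit and checks the normalization Σx Ex = 1I:

```python
import numpy as np

# A "fuzzy" qubit measurement: smeared versions of the sigma_z eigenprojections.
eta = 0.8                                  # smearing parameter (assumption)
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
E = {0: eta * P0 + (1 - eta) / 2 * np.eye(2),
     1: eta * P1 + (1 - eta) / 2 * np.eye(2)}

assert np.allclose(E[0] + E[1], np.eye(2))   # POV measure condition
rho = np.array([[0.7, 0.2], [0.2, 0.3]])     # some density matrix
p = {x: np.trace(rho @ Ex).real for x, Ex in E.items()}
assert np.isclose(sum(p.values()), 1.0)      # outcome probabilities sum to one
```

Neither E[0] nor E[1] is a projection (for η < 1), so this observable is not of the traditional PV type discussed next.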
From basic quantum mechanics we know that observables are described by self
adjoint operators on a Hilbert space H. But, how does this point of view fit into
the previous definition? The answer is given by the spectral theorem (Thm. VIII.6
[186]): Each selfadjoint operator A on a finite dimensional Hilbert space H has
the form A = Σ_{λ∈σ(A)} λPλ , where σ(A) denotes the spectrum of A, i.e. the set of
eigenvalues, and Pλ denotes the projection onto the corresponding eigenspace. Hence
there is a unique PV measure P = (Pλ )λ∈σ(A) associated to A which is called the
spectral measure of A. It is uniquely characterized by the property that the
expectation value Σλ λρ(Pλ ) of P in any state ρ coincides with ρ(A) = tr(ρA),
as is well known from quantum mechanics. Hence the traditional way to define
observables within quantum mechanics perfectly fits into the scheme just outlined,
however it only covers the projection valued case and therefore admits no fuzziness.
For this reason POV measures are sometimes called generalized observables.
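The passage from a selfadjoint operator to its spectral measure can be traced numerically; the sketch below (added for illustration, with a nondegenerate A so that np.linalg.eigh directly yields the eigenprojections) checks A = Σλ λPλ and the expectation value formula:

```python
import numpy as np

# Spectral measure of a selfadjoint operator A via np.linalg.eigh.
A = np.array([[1.0, 2.0], [2.0, -1.0]])
vals, vecs = np.linalg.eigh(A)
P = [np.outer(vecs[:, k], vecs[:, k].conj()) for k in range(2)]  # PV measure

assert np.allclose(sum(v * p for v, p in zip(vals, P)), A)   # A = sum_l l P_l
rho = np.array([[0.6, 0.1], [0.1, 0.4]])
expect = sum(v * np.trace(rho @ p).real for v, p in zip(vals, P))
assert np.isclose(expect, np.trace(rho @ A).real)            # = rho(A)
```

For degenerate eigenvalues the rank-one projectors belonging to equal eigenvalues would have to be summed to obtain the eigenprojections Pλ.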
Finally note that the eigenprojections Pλ of A are elements of an observable
algebra A iff A ∈ A. This shows two things: First of all we can consider selfadjoint
elements of any *-subalgebra A of B(H) as observables of A-systems, and this is
precisely the reason why we have called A an observable algebra. Secondly we see why
it is essential that A is really a subalgebra of B(H): if it is only a linear subspace
of B(H) the relation A ∈ A does not imply Pλ ∈ A.
2.2 Composite systems and entangled states
Composite systems occur in many places in quantum information theory. A typical
example is a register of a quantum computer, which can be regarded as a system
consisting of N qubits (if N is the length of the register). The crucial point is that
this opens the possibility for correlations and entanglement between subsystems.
In particular entanglement is of great importance, because it is a central resource
in many applications of quantum information theory like entanglement enhanced
teleportation or quantum computing – we already discussed this in Section 1.2 of
the introduction. To explain entanglement in greater detail and to introduce some
necessary formalism we have to complement the scheme developed in the last section
by a procedure which allows us to construct states and observables of the composite
system from its subsystems. In quantum mechanics this is done of course in terms
of tensor products, and we will review in the following some of the most relevant
material.
2.2.1 Tensor products
Consider two (finite dimensional) Hilbert spaces H and K. To each pair of vectors
ψ1 ∈ H, ψ2 ∈ K we can associate a bilinear form ψ1 ⊗ ψ2 called the tensor product
of ψ1 and ψ2 by ψ1 ⊗ ψ2 (φ1 , φ2 ) = hψ1 , φ1 ihψ2 , φ2 i. For two product vectors ψ1 ⊗ ψ2
and η1 ⊗ η2 their scalar product is defined by hψ1 ⊗ ψ2 , η1 ⊗ η2 i = hψ1 , η1 ihψ2 , η2 i
and it can be shown that this definition extends in a unique way to the span of all
ψ1 ⊗ ψ2 which therefore defines the tensor product H ⊗ K. If we have more than two
Hilbert spaces Hj , j = 1, . . . , N their tensor product H1 ⊗ · · · ⊗ HN can be defined
similarly.
The tensor product A1 ⊗ A2 of two bounded operators A1 ∈ B(H), A2 ∈ B(K)
is defined first for product vectors ψ1 ⊗ ψ2 ∈ H ⊗ K by A1 ⊗ A2 (ψ1 ⊗ ψ2 ) =
(A1 ψ1 ) ⊗ (A2 ψ2 ) and then extended by linearity. The space B(H ⊗ K) coincides
with the span of all A1 ⊗ A2 . If ρ ∈ B(H ⊗ K) is not of product form (and of
trace class for infinite dimensional H and K) there is nevertheless a way to define
“restrictions” to H respectively K called the partial trace of ρ. It is defined by the
equation
tr[trK (ρ)A] = tr(ρA ⊗ 1I) ∀A ∈ B(H) (2.7)
where the trace on the left hand side is over H and on the right hand side over
H ⊗ K.
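The defining equation (2.7) suggests an immediate numerical implementation of the partial trace (an added sketch; dimensions and the test operator are arbitrary):

```python
import numpy as np

# Partial trace over K in the index picture: an operator on H (x) K is
# reshaped to a four-index array and the K indices are traced out.
dH, dK = 2, 3

def partial_trace_K(rho):
    return np.trace(rho.reshape(dH, dK, dH, dK), axis1=1, axis2=3)

rho = np.random.rand(dH * dK, dH * dK)
rho = rho + rho.T                       # some selfadjoint test operator
A = np.random.rand(dH, dH)
# defining equation (2.7): tr[tr_K(rho) A] = tr[rho (A (x) 1)]
lhs = np.trace(partial_trace_K(rho) @ A)
rhs = np.trace(rho @ np.kron(A, np.eye(dK)))
assert np.isclose(lhs, rhs)
```

The reshape convention matches np.kron, i.e. the first factor of the Kronecker product is the H system.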
If two orthonormal bases φ1 , . . . , φn and ψ1 , . . . , ψm are given in H respectively
K we can consider the product basis φ1 ⊗ ψ1 , . . . , φn ⊗ ψm in H ⊗ K, and we can
expand each Ψ ∈ H ⊗ K as Ψ = Σjk Ψjk φj ⊗ ψk with Ψjk = hφj ⊗ ψk , Ψi. This
procedure works for an arbitrary number of tensor factors. However, if we have
exactly a twofold tensor product, there is a more economic way to expand Ψ, called
Schmidt decomposition in which only diagonal terms of the form φj ⊗ ψj appear.
Proposition 2.2.1 For each element Ψ of the twofold tensor product H ⊗ K there
are orthonormal systems φj , j = 1, . . . , n and ψj , j = 1, . . . , n (not necessarily
bases, i.e. n can be smaller than dim H and dim K) of H and K respectively such
that Ψ = Σj √λj φj ⊗ ψj holds. The φj and ψj are uniquely determined by Ψ. The
expansion is called Schmidt decomposition and the numbers √λj are the Schmidt
coefficients.
Proof. Consider the partial trace ρ1 = trK (|ΨihΨ|) of the one dimensional projector
|ΨihΨ| associated to Ψ. It can be decomposed in terms of its eigenvectors φn and we
get trK (|ΨihΨ|) = ρ1 = Σn λn |φn ihφn |. Now we can choose an orthonormal basis
ψ′k , k = 1, . . . , m in K and expand Ψ with respect to φj ⊗ ψ′k . Carrying out the k
summation we get a family of vectors ψ′′j = Σk hφj ⊗ ψ′k , Ψi ψ′k with the property
Ψ = Σj φj ⊗ ψ′′j . Now we can calculate the partial trace and get for any A ∈ B(H1 ):

Σj λj hφj , Aφj i = tr(ρ1 A) = hΨ, (A ⊗ 1I)Ψi = Σ_{j,k} hφj , Aφk ihψ′′j , ψ′′k i. (2.8)

Since A is arbitrary we can compare the left and right hand side of this equation
term by term and we get hψ′′j , ψ′′k i = δjk λj . Hence ψj = λj^{−1/2} ψ′′j is the desired
orthonormal system. 2
As an immediate application of this result we can show that each mixed state
ρ ∈ B ∗ (H) (of the quantum system B(H)) can be regarded as a pure state on a
larger Hilbert space H ⊗ H0 . We just have to consider the eigenvalue expansion
ρ = Σj λj |φj ihφj | of ρ and to choose an arbitrary orthonormal system ψj , j = 1, . . . , n
in H0 . Using Proposition 2.2.1 we get
Corollary 2.2.2 Each state ρ ∈ B ∗ (H) can be extended to a pure state Ψ on a
larger system with Hilbert space H ⊗ H0 such that trH0 |ΨihΨ| = ρ holds.
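In coordinates the Schmidt decomposition is nothing but the singular value decomposition of the coefficient matrix Ψjk. A small numpy sketch (our illustration, with a random Ψ) checks Proposition 2.2.1 and its relation to the partial trace:

```python
import numpy as np

# Schmidt decomposition via SVD: the singular values s_j of the coefficient
# matrix Psi_jk are the Schmidt coefficients sqrt(lambda_j).
n, m = 3, 4
Psi = np.random.randn(n, m) + 1j * np.random.randn(n, m)
Psi /= np.linalg.norm(Psi)            # normalize the vector in H (x) K

U, s, Vh = np.linalg.svd(Psi)         # columns of U, rows of Vh: phi_j, psi_j
assert np.isclose((s ** 2).sum(), 1.0)
# tr_K |Psi><Psi| = Psi Psi^* has eigenvalues lambda_j = s_j^2:
rho1 = Psi @ Psi.conj().T
assert np.allclose(np.sort(np.linalg.eigvalsh(rho1))[::-1][: len(s)], s ** 2)
```

Read in the other direction, this is exactly the purification of Corollary 2.2.2: every ρ1 with eigenvalues λj arises as the restriction of the pure state with Schmidt coefficients √λj.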
2.2.2 Compound and hybrid systems
To discuss the composition of two arbitrary (i.e. classical or quantum) systems it is
very convenient to use the scheme developed in Subsection 2.1.1 and to talk about
the two subsystems in terms of their observable algebras A ⊂ B(H) and B ⊂ B(K).
The observable algebra of the composite system is then simply given by the tensor
product A ⊗ B ⊂ B(H ⊗ K),
as expected. For two classical systems A = C(X) and B = C(Y ) recall that elements
of C(X) (respectively C(Y )) are complex valued functions on X (on Y ). Hence the
tensor product C(X) ⊗ C(Y ) consists of complex valued functions on X × Y , i.e.
C(X) ⊗ C(Y ) = C(X × Y ). In other words states and observables of the composite
system C(X) ⊗ C(Y ) are, in accordance with classical probability theory, given by
probability distributions and random variables on the Cartesian product X × Y .
If only one subsystem is classical and the other is quantum; e.g. a micro particle
interacting with a classical measuring device we have a hybrid system. The elements
of its observable algebra C(X) ⊗ B(H) can be regarded as operator valued functions
on X, i.e. X ∋ x 7→ Ax ∈ B(H), and A is an effect iff 0 ≤ Ax ≤ 1I holds for all
x ∈ X. The elements of the dual C∗(X) ⊗ B∗(H) are in a similar way B∗(H) valued
functions X ∋ x 7→ ρx ∈ B∗(H), and ρ is a state iff each ρx is a positive trace class
operator on H and Σx tr(ρx ) = 1. The probability to measure the effect A in the state
ρ is Σx ρx (Ax ).
2.2.3 Correlations and entanglement
Let us now consider two effects A ∈ A and B ∈ B then A ⊗ B is an effect of the
composite system A ⊗ B. It is interpreted as the joint measurement of A on the first
and B on the second subsystem, where the “yes” outcome means “both effects give
yes”. In particular A ⊗ 1I means to measure A on the first subsystem and to ignore
the second one completely. If ρ is a state of A ⊗ B we can define its restrictions
by ρA (A) = ρ(A ⊗ 1I) and ρB (A) = ρ(1I ⊗ A). If both systems are quantum the
restrictions of ρ are the partial traces, while in the classical case we have to sum
over the B, respectively A variables. For two states ρ1 ∈ S(A) and ρ2 ∈ S(B) there
is always a state ρ of A ⊗ B such that ρ1 = ρA and ρ2 = ρB holds: We just have to
choose the product state ρ1 ⊗ ρ2 . However in general we have ρ 6= ρA ⊗ ρB , which
means nothing else than that ρ also contains correlations between the two subsystems.
Definition 2.2.3 A state ρ of a bipartite system A ⊗ B is called correlated if there
are some A ∈ A, B ∈ B such that ρ(A ⊗ B) 6= ρA (A)ρB (B) holds.
We immediately see that ρ = ρ1 ⊗ ρ2 implies ρ(A ⊗ B) = ρ1 (A)ρ2 (B) =
ρA (A)ρB (B), hence ρ is not correlated. If on the other hand ρ(A⊗B) = ρA (A)ρB (B)
holds for all A, B we get ρ = ρA ⊗ ρB . Hence, the definition of correlations just given
perfectly fits into our intuitive considerations.
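Definition 2.2.3 can be illustrated numerically (an added sketch with qubit subsystems; the chosen mixture is the classically correlated state discussed after Proposition 2.2.4):

```python
import numpy as np

# A classically correlated two-qubit mixture versus the product of its
# restrictions.
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
rho = 0.5 * np.kron(P0, P0) + 0.5 * np.kron(P1, P1)   # correlated mixture

r = rho.reshape(2, 2, 2, 2)
rhoA = np.trace(r, axis1=1, axis2=3)    # restriction to the first system
rhoB = np.trace(r, axis1=0, axis2=2)    # restriction to the second system

A = B = P0                              # a pair of test effects
lhs = np.trace(rho @ np.kron(A, B))     # rho(A (x) B)
rhs = np.trace(rhoA @ A) * np.trace(rhoB @ B)
assert not np.isclose(lhs, rhs)         # 0.5 != 0.25, hence rho is correlated
```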
An important issue in quantum information theory is the comparison of correla-
tions between quantum systems on the one hand and classical systems on the other.
Hence let us have a closer look on the state space of a system consisting of at least
one classical subsystem.
Proposition 2.2.4 Each state ρ of a composite system A ⊗ B consisting of a
classical (A = C(X)) and an arbitrary system (B) has the form

ρ = Σ_{j∈X} λj ρ^A_j ⊗ ρ^B_j. (2.11)
If A and B are two quantum systems it is still possible for them to be correlated
in the way just described. We can simply prepare them with a classical random
generator which triggers two preparation devices to produce systems in the states
ρ^A_j , ρ^B_j with probability λj . The overall state produced by this setup is obviously
the ρ from Equation (2.11). However, the crucial point is that not all correlations of
the ρ from Equation (2.11). However, the crucial point is that not all correlations of
quantum systems are of this type! This is an immediate consequence of the definition
of pure states ρ = |ΨihΨ| ∈ S(H): Since there is no proper convex decomposition of
ρ, it can be written as in Proposition 2.2.4 iff Ψ is a product vector, i.e. Ψ = φ ⊗ ψ.
This observation motivates the following definition.
Definition 2.2.5 A state ρ of the composite system B(H1 ) ⊗ B(H2 ) is called sep-
arable or classically correlated if it can be written as
ρ = Σj λj ρ^{(1)}_j ⊗ ρ^{(2)}_j (2.12)

with states ρ^{(k)}_j of B(Hk ) and weights λj > 0. Otherwise ρ is called entangled. The
set of all separable states is denoted by D(H1 ⊗ H2 ) or just D if H1 and H2 are
understood.
2.2.4 Bell inequalities
We have just seen that it is quite easy for pure states to check whether they are
entangled or not. In the mixed case however this is a much bigger, and in general
unsolved, problem. In this subsection we will have a short look at Bell inequalities,
which are maybe the oldest criterion for entanglement (for a more detailed review see
[233]). Today more powerful methods, most of them based on positivity properties,
are available. We will postpone the corresponding discussion to the end of the
following section, after we have studied (completely) positive maps (cf. Section
2.4).
Bell inequalities are traditionally discussed in the framework of “local hidden
variable theories”. More precisely we will say that a state ρ of a bipartite system
B(H ⊗ K) admits a hidden variable model, if there is a probability space (X, µ) and
(measurable) response functions X 3 x 7→ FA (x, k), FB (x, l) ∈ R for all discrete PV
measures A = A1 , . . . , AN ∈ B(H) respectively B = B1 , . . . , BM ∈ B(K) such that
∫_X FA (x, k)FB (x, l) μ(dx) = tr(ρ Ak ⊗ Bl ) (2.13)

holds for all k, l and A, B. The value of the function FA (x, k) is interpreted as
the probability to get the value k during an A measurement with known “hidden
parameter” x. The set of states admitting a hidden variable model is a convex set
and as such it can be described by an (infinite) hierarchy of correlation inequalities.
Any one of these inequalities is usually called a (generalized) Bell inequality. The
most well known is the one given by Clauser, Horne, Shimony and Holt [57]: The
state ρ satisfies the CHSH inequality if

ρ(A ⊗ (B + B′) + A′ ⊗ (B − B′)) ≤ 2 (2.14)
quantum systems. The most prominent examples are “maximally entangled states”
(cf. Subsection 3.1.1) which violate the CHSH inequality (for appropriately chosen
A, A′, B, B′) with a maximal value of 2√2. This observation is the starting point
for many discussions concerning the interpretation of quantum mechanics, in
particular because the maximal violation of 2√2 was observed experimentally in 1982
by Aspect and coworkers [11]. We do not want to follow this path (see [233] and
the references therein instead). Interesting for us is the fact that Bell inequalities,
in particular the CHSH case in Equation (2.14), provide a necessary condition for
a state ρ to be separable. However there exist entangled states admitting a hidden
variable model [229]. Hence, Bell inequalities are not sufficient for separability.
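The maximal violation mentioned above can be reproduced in a few lines (an added sketch; the measurement directions in the x-z plane are the standard optimal choice):

```python
import numpy as np

# CHSH expression (2.14) for the maximally entangled two-qubit state.
sx = np.array([[0, 1], [1, 0.0]])
sz = np.diag([1.0, -1.0])
psi = np.array([1, 0, 0, 1.0]) / np.sqrt(2)     # (|00> + |11>)/sqrt(2)
rho = np.outer(psi, psi)

A, A_ = sz, sx
B  = (sz + sx) / np.sqrt(2)
B_ = (sz - sx) / np.sqrt(2)
chsh = np.trace(rho @ (np.kron(A, B + B_) + np.kron(A_, B - B_))).real
assert np.isclose(chsh, 2 * np.sqrt(2))   # maximal quantum value 2*sqrt(2) > 2
```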
2.3 Channels
Assume now that we have a number of quantum systems, e.g. a string of ions in
a trap. To “process” the quantum information they carry we have to perform in
general many steps of a quite different nature. Typical examples are: free time
evolution, controlled time evolution (e.g. the application of a “quantum gate” in a
quantum computer), preparations and measurements. The purpose of this section is
to provide a unified framework for the description of all these different operations.
The basic idea is to represent each processing step by a “channel”, which converts
input systems, described by an observable algebra A into output systems described
by a possibly different algebra B. Henceforth we will call A the input and B the
output algebra. If we consider e.g. the free time evolution, we need quantum systems
of the same type on the input and the output side, hence in this case we have
A = B = B(H) with an appropriately chosen Hilbert space H. If on the other hand
we want to describe a measurement we have to map quantum systems (the measured
system) to classical information (the measuring result). Therefore we need in this
example A = B(H) for the input and B = C(X) for the output algebra, where X is
the set of possible outcomes of the measurement (cf. Subsection 2.1.4).
Our aim is now to get a mathematical object which can be used to describe a
channel. To this end consider an effect A ∈ B of the output system. If we invoke first
a channel which transforms A systems into B systems, and measure A afterwards
on the output systems, we end up with a measurement of an effect T (A) on the
input systems. Hence we get a map T : E(B) → E(A) which completely describes the
channel 4 . Alternatively we can look at the states and interpret a channel as a map
T ∗ : S(A) → S(B) which transforms A systems in the state ρ ∈ S(A) into B systems
in the state T ∗ (ρ). To distinguish between both maps we can say that T describes
the channel in the Heisenberg picture and T ∗ in the Schrödinger picture. On the level
of the statistical interpretation both points of view should coincide of course, i.e. the
probabilities5 (T ∗ ρ)(A) and ρ(T A) to get the result “yes” during an A measurement
on B systems in the state T ∗ ρ, respectively a T A measurement on A systems in
the state ρ, should be the same. Since (T ∗ ρ)(A) is linear in A we see immediately
that T must be an affine map, i.e. T (λ1 A1 + λ2 A2 ) = λ1 T (A1 ) + λ2 T (A2 ) for each
convex linear combination λ1 A1 + λ2 A2 of effects in B, and this in turn implies that
T can be extended naturally to a linear map, which we will identify in the following
with the channel itself, i.e. we say that T is the channel.
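The duality between the two pictures, (T∗ρ)(A) = ρ(T A), is easily checked on a concrete example (an added sketch; the depolarizing channel with an arbitrarily chosen noise parameter serves as illustration):

```python
import numpy as np

# A depolarizing channel: in the Heisenberg picture T(A) = p A + (1-p) tr(A) 1/d,
# in the Schroedinger picture T*(rho) = p rho + (1-p) tr(rho) 1/d.
d, p = 2, 0.7

def T(A):         # Heisenberg picture (acts on effects/observables)
    return p * A + (1 - p) * np.trace(A) * np.eye(d) / d

def T_star(rho):  # Schroedinger picture (acts on states)
    return p * rho + (1 - p) * np.trace(rho) * np.eye(d) / d

rho = np.array([[0.8, 0.3], [0.3, 0.2]])    # a state
A = np.array([[0.5, 0.1], [0.1, 0.4]])      # an effect
# both pictures give the same measurement statistics:
assert np.isclose(np.trace(T_star(rho) @ A), np.trace(rho @ T(A)))
```

Note that T(1I) = 1I and T∗ preserves the trace, as required for a channel.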
2.3.1 Completely positive maps
Let us change now slightly our point of view and start with a linear operator T :
A → B. To be a channel, T must map effects to effects, i.e. T has to be positive:
4 Note that the direction of the mapping arrow is reversed compared to the natural ordering of
processing.
5 To keep notations more readable we will follow frequently the usual convention to drop the
parenthesis around arguments of linear operators. Hence we will write T A and T ∗ ρ instead of
T (A) and T ∗ (ρ). Similarly we will simply write T S instead of T ◦ S for compositions.
T (A) ≥ 0 ∀A ≥ 0 and bounded from above by 1I, i.e. T (1I) ≤ 1I. In addition it is
natural to require that two channels in parallel are again a channel. More precisely, if
two channels T : A1 → B1 and S : A2 → B2 are given we can consider the map T ⊗S
which associates to each A⊗B ∈ A1 ⊗A2 the tensor product T (A)⊗S(B) ∈ B1 ⊗B2 .
It is natural to assume that T ⊗ S is a channel which converts composite systems
of type A1 ⊗ A2 into B1 ⊗ B2 systems. Hence T ⊗ S should be positive as well [178].
algebras. It needs however some material from representation theory of C*-algebras which we want
to avoid here. See e.g. [178, 115].
ρ = (Id ⊗T ∗ ) σ, (2.18)
where Id denotes the identity map on B∗(H). The pure state σ can be chosen such
that trH (σ) has no zero eigenvalue. In this case T and σ are uniquely determined
(up to unitary equivalence) by Equation (2.18); i.e. if σ̃, T̃ with ρ = (Id ⊗T̃∗)σ̃ are
given, we have σ̃ = (1I ⊗ U )σ(1I ⊗ U∗) and T̃( · ) = U T ( · )U∗ with an appropriate
unitary operator U .
Proof. The state σ is obviously the purification of trH1 (ρ). Hence if λj and
ψj are eigenvalues and eigenvectors of trH1 (ρ) we can set σ = |ΨihΨ| with
Ψ = Σj √λj ψj ⊗ φj , where φj is an (arbitrary) orthonormal basis in K. It is clear
that σ is uniquely determined up to a unitary. Hence we only have to show that a
unique T exists if Ψ is given. To satisfy Equation (2.18) we must have

ρ(|ψj ⊗ ηk ihψl ⊗ ηp |) = hΨ, (Id ⊗T )(|ψj ⊗ ηk ihψl ⊗ ηp |)Ψi (2.19)
= hΨ, (|ψj ihψl | ⊗ T (|ηk ihηp |))Ψi (2.20)
= √(λj λl) hφj , T (|ηk ihηp |)φl i, (2.21)
to replace α by α − γ tr. Hence the result follows from the fact that each linear
functional on B ∗ (H ⊗ K) has the form α(σ) = tr(Aσ) with A ∈ B(H ⊗ K). 2
is not. The latter can be easily checked with the maximally entangled state (cf.
Subsection 3.1.1)

Ψ = (1/√d) Σj |ji ⊗ |ji. (2.25)
for any separable state ρ ∈ B∗(H ⊗ K). These inequalities are another non-trivial
separability criterion, which is called the reduction criterion [117, 52]. It is closely
related to the ppt criterion due to the following proposition (see [117] for a proof).
Proposition 2.4.5 Each ppt-state ρ ∈ S(H ⊗ K) satisfies the reduction criterion.
If dim H = 2 and dim K = 2, 3 both criteria are equivalent.
Hence we see with Theorem 2.4.3 that a state ρ in 2 × 2 or 2 × 3 dimensions is
separable iff it satisfies the reduction criterion.
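Both criteria can be tested mechanically on two-qubit examples (an added numpy sketch; the Bell state and the product state |00ih00| serve as entangled and separable test cases):

```python
import numpy as np

# Partial transpose and reduction criterion for two qubits.
def ptranspose(rho):                     # transpose on the second factor
    return rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)

psi = np.array([1, 0, 0, 1.0]) / np.sqrt(2)
bell = np.outer(psi, psi)                # maximally entangled state
prod = np.diag([1.0, 0, 0, 0])           # |00><00|, separable

assert np.linalg.eigvalsh(ptranspose(bell)).min() < 0     # not ppt -> entangled
assert np.linalg.eigvalsh(ptranspose(prod)).min() >= -1e-12

# reduction criterion: tr_2(rho) (x) 1 - rho >= 0 for separable rho
tr2 = lambda r: np.trace(r.reshape(2, 2, 2, 2), axis1=1, axis2=3)
red = lambda r: np.kron(tr2(r), np.eye(2)) - r
assert np.linalg.eigvalsh(red(bell)).min() < 0            # violated
assert np.linalg.eigvalsh(red(prod)).min() >= -1e-12      # satisfied
```

In 2 × 2 dimensions a negative eigenvalue of the partial transpose is, by Theorem 2.4.3, equivalent to entanglement.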
Chapter 3
Basic examples
After the somewhat abstract discussion in the last chapter we will become more
concrete now. In the following we will present a number of examples which help
on the one hand to understand the structures just introduced, and which are of
fundamental importance within quantum information on the other.
3.1 Entanglement
Although our definition of entanglement (Definition 2.2.5) is applicable in arbitrary
dimensions, detailed knowledge about entangled states is available only for low
dimensional systems or for states with very special properties. In this section we
will discuss some of the most basic examples.
3.1.1 Maximally entangled states
Let us start with a look at pure states of a composite system A ⊗ B and their
possible correlations. If one subsystem is classical, i.e. A = C({1, . . . , d}), the state
space is given according to Subsection 2.2.2 by S(B)^d and ρ ∈ S(B)^d is pure iff
ρ = (δj1 τ, . . . , δjd τ ) with j = 1, . . . , d and a pure state τ of the B system. Hence the
restrictions of ρ to A respectively B are the Dirac measure δj ∈ S(X) and τ ∈ S(B),
in other words both restrictions are pure. This is completely different if A and B
are quantum, i.e. A ⊗ B = B(H ⊗ K): Consider ρ = |ΨihΨ| with Ψ ∈ H ⊗ K and
Schmidt decomposition (Proposition 2.2.1) Ψ = Σj λj^{1/2} φj ⊗ ψj . Calculating the A
restriction, i.e. the partial trace over K, we get

tr[trK (ρ)A] = tr[|ΨihΨ| A ⊗ 1I] = Σjk λj^{1/2} λk^{1/2} hφj , Aφk i δjk , (3.1)
hence trK (ρ) = Σj λj |φj ihφj | is mixed iff Ψ is entangled. The most extreme case
arises if H = K = C^d and trK (ρ) is maximally mixed, i.e. trK (ρ) = 1I/d. We get for Ψ

Ψ = (1/√d) Σ_{j=1}^d φj ⊗ ψj . (3.2)
Let us come back to the general case now and consider an arbitrary ρ ∈ S(H⊗H).
Using maximally entangled states, we can introduce another separability criterion
in terms of the maximally entangled fraction (cf. [24])
If ρ is separable the reduction criterion (2.26) implies hΨ, [tr1 (ρ) ⊗ 1I − ρ]Ψi ≥ 0 for
any maximally entangled state. Since the partial trace of |ΨihΨ| is d^{−1} 1I we hence
get F(ρ) ≤ 1/d. This condition is not very sharp, however. Using the ppt criterion
it can be shown that ρ = λ|Φ1 ihΦ1 | + (1 − λ)|00ih00| (with the Bell state Φ1 ) is
entangled for all 0 < λ ≤ 1 but a straightforward calculation shows that F(ρ) ≤ 1/2
holds for λ ≤ 1/2.
Finally, we have to mention here a very useful parameterization of the set of
pure states on H ⊗ H in terms of maximally entangled states: If Ψ is an arbitrary
but fixed maximally entangled state, each φ ∈ H ⊗ H admits (uniquely determined)
operators X1 , X2 such that
In our case (N = 2) there are only two permutations: the identity 1I and the flip
F (ψ ⊗ φ) = φ ⊗ ψ. Hence ρ = a1I + bF with appropriate coefficients a, b. Since ρ is
a density matrix, a and b are not independent. To get a transparent way to express
these constraints, it is reasonable to consider the eigenprojections P± of F rather
than 1I and F ; i.e. F P± ψ = ±P± ψ and P± = (1I ± F )/2. The P± are the projections
onto the subspaces H±^{⊗2} ⊂ H ⊗ H of symmetric respectively antisymmetric tensor
ρ = (λ/d+ ) P+ + ((1 − λ)/d− ) P− , λ ∈ [0, 1]. (3.8)
On the other hand it is obvious that each state of this form is U ⊗ U invariant,
hence a Werner state.
If ρ is given, it is very easy to calculate the parameter λ from the expectation
value of ρ and the flip tr(ρF ) = 2λ − 1 ∈ [−1, 1]. Therefore we can write for an
arbitrary state σ ∈ S(H ⊗ H)
PUU (σ) = ((tr(σF ) + 1)/(2d+ )) P+ + ((1 − tr(σF ))/(2d− )) P− , (3.9)
and this defines a projection from the full state space to the set of Werner states
which is called the twirl operation. In many cases it is quite useful that it can be
written alternatively as a group average of the form
PUU (σ) = ∫_{U(d)} (U ⊗ U )σ(U∗ ⊗ U∗) dU, (3.10)
where dU denotes the normalized, left invariant Haar measure on U(d). To check
this identity note first that its right hand side is indeed U ⊗ U invariant, due to the
invariance of the volume element dU . Hence we have to check only that the trace
of F times the integral coincides with tr(F σ):
" Z # Z
tr F (U ⊗ U )σ(U ∗ ⊗ U ∗ )dU = tr [F (U ⊗ U )σ(U ∗ ⊗ U ∗ )] dU (3.11)
U(d) U(d)
Z
= tr(F σ) dU = tr(F σ), (3.12)
U(d)
where we have used the fact that F commutes with U ⊗ U and the normalization of
dU . We can apply PUU obviously to arbitrary operators A ∈ B(H ⊗ H) and, as an
integral over unitarily implemented operations, we get a channel. Substituting U →
U ∗ in (3.10) and cycling the trace tr(APUU (σ)) we find tr(PUU (A)ρ) = tr(APUU (ρ)),
hence PUU has the same form in the Heisenberg and the Schrödinger picture (i.e.
P∗UU = PUU ).
If σ ∈ S(H ⊗ H) is a separable state the integrand of PUU (σ) in Equation (3.10)
consists entirely of separable states, hence PUU (σ) is separable. Since each Werner
state ρ is the twirl of itself, we see that ρ is separable iff it is the twirl PUU (σ) of
a separable state σ ∈ S(H ⊗ H). To determine the set of separable Werner states
we therefore have to calculate only the set of all tr(F σ) ∈ [−1, 1] with separable
σ. Since each such σ admits a convex decomposition into pure product states it is
sufficient to look at
hψ ⊗ φ, F ψ ⊗ φi = |hψ, φi|2 (3.13)
which ranges from 0 to 1. Hence ρ from Equation (3.8) is separable iff 1/2 ≤ λ ≤ 1
and entangled otherwise (due to λ = (tr(F ρ) + 1)/2). If H = C² holds, each Werner
state is Bell diagonal and we recover the result from Subsection 3.1.1 (separable iff
the highest eigenvalue is less than or equal to 1/2).
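The separability boundary at λ = 1/2 can be confirmed numerically (an added sketch for d = 2, where ppt is equivalent to separability):

```python
import numpy as np

# Werner states (3.8) for d = 2: flip F, d+- = d(d +- 1)/2.
d = 2
F = np.eye(d * d).reshape(d, d, d, d).transpose(0, 1, 3, 2).reshape(d * d, d * d)
Pp, Pm = (np.eye(d * d) + F) / 2, (np.eye(d * d) - F) / 2
dp, dm = d * (d + 1) // 2, d * (d - 1) // 2

def werner(lam):
    return lam / dp * Pp + (1 - lam) / dm * Pm

for lam in (0.2, 0.8):
    rho = werner(lam)
    assert np.isclose(np.trace(rho @ F), 2 * lam - 1)   # tr(rho F) = 2l - 1
    # ppt (here: separability) holds exactly for lam >= 1/2:
    pt = rho.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)
    assert (np.linalg.eigvalsh(pt).min() >= -1e-12) == (lam >= 0.5)
```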
3.1.3 Isotropic states
To derive a second class of states consider the partial transpose (Id ⊗Θ)ρ (with
respect to a distinguished basis |ji ∈ H, j = 1, . . . , d) of a Werner state ρ. Since ρ is,
by definition, U ⊗ U invariant, it is easy to see that (Id ⊗Θ)ρ is U ⊗ Ū invariant, where
Ū denotes componentwise complex conjugation in the basis |ji (we just have to use
that U∗ = Ū^T holds). Each state τ with this kind of symmetry is called an isotropic
state [183], and our previous discussion shows that τ is a linear combination of 1I
and the partial transpose of the flip, which is the rank one operator
F̃ = (Id ⊗Θ)F = |ΨihΨ| = Σ_{jk=1}^d |jjihkk|, (3.14)
where Ψ = Σj |jji is, up to normalization, a maximally entangled state. Hence each
isotropic τ can be written as

τ = (1/d) (λ 1I/d + (1 − λ)F̃ ), λ ∈ [0, d²/(d² − 1)], (3.15)
where the bounds on λ follow from normalization and positivity. As above we can
determine the parameter λ from the expectation value

tr(F̃ τ ) = ((1 − d²)/d) λ + d, (3.16)

which ranges from 0 to d, and this again leads to a twirl operation: For an arbitrary
state σ ∈ S(H ⊗ H) we can define
PUŪ (σ) = (1/(d(1 − d²))) ([tr(F̃ σ) − d] 1I + [1 − d tr(F̃ σ)] F̃ ), (3.17)
and as for Werner states PUŪ can be rewritten in terms of a group average
PUŪ (σ) = ∫_{U(d)} (U ⊗ Ū )σ(U∗ ⊗ Ū∗) dU. (3.18)
Now we can proceed in the same way as above: PUŪ is a channel with P∗UŪ = PUŪ ,
its fixed points PUŪ (τ ) = τ are exactly the isotropic states, and the image of the set
of separable states under PUŪ coincides with the set of separable isotropic states.
To determine the latter we have to consider the expectation values (cf. Equation
(3.13))

hψ ⊗ φ, F̃ (ψ ⊗ φ)i = |Σ_{j=1}^d ψj φj |² = |hψ, φ̄i|² ∈ [0, 1]. (3.19)
This implies that τ is separable iff
d(d − 1)/(d² − 1) ≤ λ ≤ d²/(d² − 1) (3.20)
holds, and entangled otherwise. For λ = 0 we recover the maximally entangled state.
For d = 2 we again recover the special case of Bell diagonal states encountered
already in the last subsection.
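The parameterization (3.15) and the expectation value (3.16) can be checked directly (an added sketch for d = 2):

```python
import numpy as np

# Isotropic states (3.15): tau = (1/d)(lam 1/d + (1-lam) Ft) with Ft = |Psi><Psi|.
d = 2
Psi = np.zeros(d * d)
Psi[::d + 1] = 1.0                       # unnormalized Psi = sum_j |jj>
Ft = np.outer(Psi, Psi)                  # partial transpose of the flip

def iso(lam):
    return (lam * np.eye(d * d) / d + (1 - lam) * Ft) / d

for lam in (0.0, 1.0, d ** 2 / (d ** 2 - 1)):
    tau = iso(lam)
    assert np.isclose(np.trace(tau), 1.0)                          # normalized
    assert np.isclose(np.trace(Ft @ tau), (1 - d ** 2) / d * lam + d)  # (3.16)
# lam = 0 (the maximally entangled state) lies outside the window (3.20):
assert not (d * (d - 1) / (d ** 2 - 1) <= 0.0 <= d ** 2 / (d ** 2 - 1))
```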
3.1.4 OO-invariant states
Let us combine now Werner states with isotropic states, i.e. we look for density
matrices ρ which can be written as ρ = a1I + bF + cF̃ , or, if we introduce the three
mutually orthogonal projection operators

p0 = (1/d)F̃ , p1 = (1/2)(1I − F ), p2 = (1/2)(1I + F ) − (1/d)F̃ (3.21)
as a convex linear combination of tr(pj )−1 pj , j = 0, 1, 2:
ρ = (1 − λ1 − λ2 ) p0 + λ1 p1 /tr(p1 ) + λ2 p2 /tr(p2 ), λ1 , λ2 ≥ 0, λ1 + λ2 ≤ 1 (3.22)
Figure 3.1: State space of OO-invariant states (upper triangle) and its partial trans-
pose (lower triangle) for d = 3. The special cases of isotropic and Werner states are
drawn as thin lines.
which we can express alternatively in terms of the expectation values tr(F ρ), tr(F̃ ρ)
by

POO (ρ) = (tr(F̃ ρ)/d) p0 + ((1 − tr(F ρ))/2) p1 /tr(p1 )
+ ((1 + tr(F ρ))/2 − tr(F̃ ρ)/d) p2 /tr(p2 ). (3.24)
−1 ≤ tr(F ρ) ≤ 1, 0 ≤ tr(F̃ ρ) ≤ d, tr(F ρ) ≥ (2 tr(F̃ ρ)/d) − 1. (3.25)
d
For d = 3 this is the upper triangle in Figure 3.1.
The values in the lower (dotted) triangle belong to partial transpositions of OO-invariant states. The intersection of both, i.e. the gray shaded square Q = [0, 1] × [0, 1], therefore represents the set of OO-invariant ppt states, and at the same time the set of separable states, since each OO-invariant ppt state is separable. To see the latter note that separable OO-invariant states form a convex subset of Q. Hence, we only have to show that the corners of Q are separable. To do this note that 1. P_OO(ρ) is separable whenever ρ is and 2. that tr(F P_OO(ρ)) = tr(Fρ) and tr(F̃ P_OO(ρ)) = tr(F̃ρ) holds (cf. Equation (3.12)). We can consider pure product states |φ ⊗ ψ⟩⟨φ ⊗ ψ| for ρ and get (|⟨φ, ψ⟩|², |⟨φ, ψ̄⟩|²) for the tuple (tr(Fρ), tr(F̃ρ)). Now the point (1, 1) in Q is obtained if ψ = φ is real, the point (0, 0) is obtained for real and orthogonal φ, ψ, and the point (1, 0) belongs to the case ψ = φ with ⟨φ, φ̄⟩ = 0. Symmetrically we get (0, 1) with the same φ and ψ = φ̄.
3.1.5 PPT states
We have seen in Theorem 2.4.3 that separable states and ppt states coincide in 2 × 2 and 2 × 3 dimensions. Another class of examples with this property are the OO-invariant states just studied. Nevertheless, separability and a positive partial transpose are not equivalent. An easy way to produce states which are entangled and ppt is given in terms of unextendible product bases [22]. An orthonormal family φ_j ∈ H₁ ⊗ H₂, j = 1, . . . , N < d₁d₂ (with d_k = dim H_k) is called an unextendible product basis¹ (UPB) iff 1. all φ_j are product vectors and 2. there is no product vector orthogonal to all φ_j. Let us denote the projector onto the span of all φ_j by E, its orthocomplement by E⊥, i.e. E⊥ = 1I − E, and define the state ρ = (d₁d₂ − N)⁻¹E⊥.
It is entangled because there is by construction no product vector in the support of
ρ, and it is ppt. The latter can be seen as follows: The projector E is a sum of the
one dimensional projectors |φj ihφj |, j = 1, . . . , N . Since all φj are product vectors
the partial transposes of the |φj ihφj | are of the form |φej ihφej |, with another UPB
φej , j = 1, . . . , N and the partial transpose (1I ⊗ Θ)E of E is the sum of the |φej ihφej |.
Hence (1I ⊗ Θ)E ⊥ = 1I − (1I ⊗ Θ)E is a projector and therefore positive.
To construct entangled ppt states we have to find UPBs. The following two examples are taken from [22]. Consider first the five vectors
$$\Psi_j = \phi_j \otimes \phi_{2j \bmod 5}, \qquad j = 0, \ldots, 4, \qquad (3.27)$$
which form a UPB in the Hilbert space H ⊗ H, dim H = 3 (cf. [22]). A second example,
again in a 3 × 3 dimensional Hilbert space, are the following five vectors (called “Tiles” in [22]):
$$\frac{1}{\sqrt 2}\,|0\rangle \otimes \bigl(|0\rangle - |1\rangle\bigr), \quad \frac{1}{\sqrt 2}\,|2\rangle \otimes \bigl(|1\rangle - |2\rangle\bigr), \quad \frac{1}{\sqrt 2}\bigl(|0\rangle - |1\rangle\bigr) \otimes |2\rangle,$$
$$\frac{1}{\sqrt 2}\bigl(|1\rangle - |2\rangle\bigr) \otimes |0\rangle, \quad \frac{1}{3}\bigl(|0\rangle + |1\rangle + |2\rangle\bigr) \otimes \bigl(|0\rangle + |1\rangle + |2\rangle\bigr), \qquad (3.28)$$
where |ki, k = 0, 1, 2 denotes the standard basis in H = C3 .
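The construction above can be checked numerically for the “Tiles” basis (3.28): the five vectors are orthonormal, and the state ρ = (d₁d₂ − N)⁻¹E⊥ = E⊥/4 has a positive partial transpose. A minimal sketch:

```python
import numpy as np

k0, k1, k2 = np.eye(3)
s = 1 / np.sqrt(2)

tiles = [
    s * np.kron(k0, k0 - k1),
    s * np.kron(k2, k1 - k2),
    s * np.kron(k0 - k1, k2),
    s * np.kron(k1 - k2, k0),
    np.kron(k0 + k1 + k2, k0 + k1 + k2) / 3,
]

E = sum(np.outer(v, v) for v in tiles)     # projector onto the span of the UPB
rho = (np.eye(9) - E) / (9 - 5)            # state on the orthocomplement

# partial transpose on the second tensor factor
pt = rho.reshape(3, 3, 3, 3).transpose(0, 3, 2, 1).reshape(9, 9)
print(np.linalg.eigvalsh(pt).min())        # >= 0: rho is ppt
```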
3.1.6 Multipartite states
In many applications of quantum information rather big systems, consisting of a
large number of subsystems, occur (e.g. a quantum register of a quantum computer)
and it is necessary to study the corresponding correlation and entanglement prop-
erties. Since this is a fairly difficult task, there is not much known about – much less
1 This name is somewhat misleading because the φ_j are not a basis of H₁ ⊗ H₂.
with N orthonormal bases φ_1^{(k)}, . . . , φ_d^{(k)} of H^{(k)}, k = 1, . . . , N. To get examples for
such states in the tri-partite case, note first that any partial trace of |ΨihΨ| with Ψ
from Equation (3.29) has separable eigenvectors. Hence, each purification (Corollary
2.2.2) of an entangled, two-partite, mixed state with inseparable eigenvectors (e.g.
a Bell diagonal state) does not admit a Schmidt decomposition. This implies on
the one hand that there are interesting new properties to be discovered, but on
the other we see that many techniques developed for bipartite pure states can be
generalized in a straightforward way only for states which are Schmidt decomposable
in the sense of Equation (3.29). The most well known representative of this class
for a tripartite qubit system is the GHZ state [101]
$$\Psi = \frac{1}{\sqrt 2}\bigl(|000\rangle + |111\rangle\bigr), \qquad (3.30)$$
which has the special property that contradictions between local hidden variable
theories and quantum mechanics occur even for non-statistical predictions (as op-
posed to maximally entangled states of bipartite systems; [101, 163, 162]).
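The observation above that partial traces of Schmidt decomposable states have separable eigenvectors can be illustrated with the GHZ state (3.30): tracing out the third qubit leaves an explicitly separable mixture. A numerical sketch:

```python
import numpy as np

ghz = np.zeros(8)
ghz[0] = ghz[7] = 1 / np.sqrt(2)           # (|000> + |111>)/sqrt(2)
rho = np.outer(ghz, ghz)

# partial trace over the third qubit: rho_AB[ij,kl] = sum_m rho[ijm,klm]
rho6 = rho.reshape(2, 2, 2, 2, 2, 2)
rho_ab = np.einsum('ijmklm->ijkl', rho6).reshape(4, 4)

print(np.round(rho_ab, 3))
# rho_AB = (|00><00| + |11><11|)/2: a separable mixture of product vectors
```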
A second new aspect arising in the discussion of multiparty entanglement is the
fact that several different notions of separability occur. A state ρ of an N -partite
system B(H1 ) ⊗ · · · ⊗ B(HN ) is called N -separable if
$$\rho = \sum_J \lambda_J\, \rho_{j_1} \otimes \cdots \otimes \rho_{j_N} \qquad (3.31)$$
with states ρ_{j_k} ∈ B*(H_k) and multi-indices J = (j₁, . . . , j_N). Alternatively, however, we can decompose B(H₁) ⊗ · · · ⊗ B(H_N) into two subsystems (or even into M subsystems if M < N) and call ρ biseparable if it is separable with respect to this
decomposition. It is obvious that N -separability implies biseparability with respect
to all possible decompositions. The converse is – not very surprisingly – not true.
One way to construct a corresponding counterexample is to use an unextendible product basis (cf. Subsection 3.1.5). In [22] it is shown that the tripartite qubit state
complementary to the UPB
$$|0,1,+\rangle, \quad |1,+,0\rangle, \quad |+,0,1\rangle, \quad |-,-,-\rangle \qquad \text{with } |\pm\rangle = \frac{1}{\sqrt 2}\bigl(|0\rangle \pm |1\rangle\bigr) \qquad (3.32)$$
structure of the set of all U ⊗N invariant states can be derived from representation
theory of the symmetric group (which can be tedious for large N !). For N = 3
this program is carried out in [81] and it turns out that the corresponding set of
invariant states is a five dimensional (real) manifold. We skip the details here and
refer to [81] instead.
3.2 Channels
In Section 2.3 we have introduced channels as very general objects transforming
arbitrary types of information (i.e. classical, quantum and mixtures of them) into
one another. In the following we will consider some of the most important special
cases.
3.2.1 Quantum channels
Many tasks of quantum information theory require the transmission of quantum
information over long distances, using devices like optical fibers or storing quantum
information in some sort of memory. Both situations can be described by a channel
or quantum operation T : B(H) → B(H), where T ∗ (ρ) is the quantum information
which will be received when ρ was sent, or alternatively: which will be read off
the quantum memory when ρ was written. Ideally we would prefer those channels
which do not affect the information at all, i.e. T = 1I, or, as the next best choice,
a T whose action can be undone by a physical device, i.e. T should be invertible and T⁻¹ should again be a channel. The Stinespring Theorem (Theorem 2.3.2) immediately
shows that this implies T ∗ ρ = U ρU ∗ with a unitary U ; in other words the systems
carrying the information do not interact with the environment. We will call such
a kind of channel an ideal channel. In real situations however interaction with the
environment, i.e. additional, unobservable degrees of freedom, can not be avoided.
The general structure of such a noisy channel is given by
$$T^*(\rho) = \operatorname{tr}_{\mathcal K}\bigl[U(\rho \otimes \rho_0)U^*\bigr] \qquad (3.33)$$
Note that there are in general many ways to express a channel this way, e.g. if T is an ideal channel ρ ↦ T*ρ = UρU* we can rewrite it with an arbitrary unitary U₀ : K → K by T*ρ = tr₂[(U ⊗ U₀)(ρ ⊗ ρ₀)(U* ⊗ U₀*)]. This is the weakness of the ancilla
form compared to the Stinespring representation of Theorem 2.3.2. Nevertheless
Corollary 3.2.1 shows that each channel which is not an ideal channel is noisy in
the described way.
The most prominent example for a noisy channel is the depolarizing channel for
d-level systems (i.e. H = Cd )
$$S(\mathcal H) \ni \rho \mapsto \vartheta\rho + (1 - \vartheta)\frac{\mathbb 1}{d} \in S(\mathcal H), \qquad 0 \le \vartheta \le 1, \qquad (3.37)$$
or in the Heisenberg picture
$$B(\mathcal H) \ni A \mapsto \vartheta A + (1 - \vartheta)\frac{\operatorname{tr}(A)}{d}\,\mathbb 1 \in B(\mathcal H). \qquad (3.38)$$
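The duality between the two pictures (3.37) and (3.38), tr(T*(ρ)A) = tr(ρT(A)), can be verified directly; a minimal sketch for a random state and observable:

```python
import numpy as np

def depolarize_schr(rho, theta):
    """Schroedinger picture (3.37), extended linearly via tr(rho)."""
    d = rho.shape[0]
    return theta * rho + (1 - theta) * np.trace(rho) * np.eye(d) / d

def depolarize_heis(A, theta):
    """Heisenberg picture (3.38)."""
    d = A.shape[0]
    return theta * A + (1 - theta) * np.trace(A) * np.eye(d) / d

rng = np.random.default_rng(1)
d, theta = 3, 0.7
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = M @ M.conj().T
rho /= np.trace(rho)                      # random density matrix
A = rng.normal(size=(d, d))               # arbitrary observable

lhs = np.trace(depolarize_schr(rho, theta) @ A)
rhs = np.trace(rho @ depolarize_heis(A, theta))
print(np.isclose(lhs, rhs))               # tr(T*(rho) A) = tr(rho T(A))
```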
A Stinespring dilation of T (not the minimal one – this can be checked by counting
dimensions) is given by K = H ⊗ H ⊕ C and V : H → H ⊗ K = H^{⊗3} ⊕ H with
$$|j\rangle \mapsto V|j\rangle = \left[\sqrt{\frac{1-\vartheta}{d}}\;\sum_{k=1}^d |k\rangle \otimes |k\rangle \otimes |j\rangle\right] \oplus \Bigl[\sqrt{\vartheta}\,|j\rangle\Bigr], \qquad (3.39)$$
$$(\operatorname{Id} \otimes X_1)|\psi\rangle\langle\psi| = \mathbb 1, \qquad (\operatorname{Id} \otimes X_2)|\psi\rangle\langle\psi| = F, \qquad (\operatorname{Id} \otimes X_3)|\psi\rangle\langle\psi| = \tilde F. \qquad (3.43)$$
Using Equation (3.21) we can determine therefore the channels which belong to the
three extremal OO-invariant states (the corners of the upper triangle in Figure 3.1):
$$T_0(A) = A, \qquad T_1(A) = \frac{\operatorname{tr}(A)\mathbb 1 - A^T}{d-1}, \qquad (3.44)$$
$$T_2(A) = \frac{2}{d(d+1)-2}\left[\frac{d}{2}\bigl(\operatorname{tr}(A)\mathbb 1 + A^T\bigr) - A\right], \qquad (3.45)$$
$$T(A) = \frac{\vartheta}{d+1}\bigl[\operatorname{tr}(A)\mathbb 1 + A^T\bigr] + \frac{1-\vartheta}{d-1}\bigl[\operatorname{tr}(A)\mathbb 1 - A^T\bigr], \qquad \vartheta \in [0,1]; \qquad (3.46)$$
cf. Equation (3.8).
Let us come back now to the general case. We will state here the covariant
version of the Stinespring theorem (see [136] for a proof). The basic idea is that all
covariant channels are parameterized by representations on the dilation space.
Theorem 3.2.2 Let G be a group with finite dimensional unitary representations
πj : G → U(Hj ) and T : B(H1 ) → B(H2 ) a π1 , π2 - covariant channel.
holds. Hence the family (T_{xy})_{x∈X} is, for each y, a probability distribution on X, and T_{xy} is therefore the probability to get the information x ∈ X at the output side of the channel if y ∈ Y was sent. Each classical channel is uniquely determined by its matrix of transition probabilities. For X = Y we see that the information is transmitted without error iff T_{xy} = δ_{xy}, i.e. T is an ideal channel if T = Id holds and noisy otherwise.
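A classical channel is thus nothing but a column-stochastic matrix acting on probability vectors; a minimal sketch (the numerical values are an arbitrary choice):

```python
import numpy as np

# transition matrix T[x, y] = probability to receive x when y was sent
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])
assert np.allclose(T.sum(axis=0), 1.0)    # each column is a probability distribution

p_in = np.array([0.5, 0.5])               # distribution of the input y
p_out = T @ p_in                          # distribution of the output x
print(p_out)                              # [0.55 0.45]
```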
3.2.4 Observables and preparations
Let us consider now a channel which transforms quantum information B(H) into
classical information C(X). Since positivity and complete positivity are again equiv-
alent, we just have to look at a positive and unital map E : C(X) → B(H). With
the canonical basis |xihx|, x ∈ X of C(X)P we get a family Ex = E(|xihx|), x ∈ X
of positive operators Ex ∈ B(H) with x∈X Ex = 1I. Hence the Ex form a POV
measure, i.e. an observable. If on the other hand a POV measure Ex ∈ B(H), x ∈ X
is given we can define a quantum to classical channel E : C(X) → B(H) by
X
E(f ) = f (x)Ex . (3.48)
x∈X
This shows that the observable Ex , x ∈ X and the channel E can be identified and
we say E is the observable.
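As a small illustration of Equation (3.48), the following sketch builds a two-outcome POVM on a qubit (an unsharp z-measurement; the value η = 0.8 and the classical function f are arbitrary choices) and computes the outcome distribution tr(E_x ρ):

```python
import numpy as np

# a two-outcome POVM on a qubit: an unsharp z-measurement
eta = 0.8
E0 = np.array([[(1 + eta) / 2, 0], [0, (1 - eta) / 2]])
E1 = np.eye(2) - E0
assert np.allclose(E0 + E1, np.eye(2))    # normalization: sum_x E_x = 1

def E(f, povm):
    """The channel C(X) -> B(H), f |-> sum_x f(x) E_x of Equation (3.48)."""
    return sum(fx * Ex for fx, Ex in zip(f, povm))

rho = np.array([[0.75, 0.0], [0.0, 0.25]])
probs = [np.trace(rho @ Ex).real for Ex in (E0, E1)]  # outcome distribution

f = [1.0, -1.0]                           # a classical observable on X = {0, 1}
mean = np.trace(rho @ E(f, (E0, E1))).real
print(probs, mean)
```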
Keeping this interpretation in mind it is possible to have a short look at con-
tinuous observables without the need of abstract measure theory: We only have to
define the classical algebra C(X) for a set X which is not finite or discrete. To this end assume that X is a locally compact space (e.g. an open or closed subset of R^d). We choose for C(X) the space of continuous, complex valued functions vanishing at infinity, i.e. for each ε > 0 we have |f(x)| < ε provided x lies outside an appropriate compact set. C(X) can be equipped with the sup-norm and becomes an Abelian C*-algebra
(cf. [35]). To interpret it as an operator algebra as assumed in Subsection 2.1.1
we have to identify f ∈ C(X) with the corresponding multiplication operator on
L2 (X, µ), where µ is an appropriate measure on X (e.g. the Lebesgue measure for
X ⊂ Rd ). An observable taking arbitrary values in X can now be defined as a
positive map E : C(X) → B(H). The probability to get a result in the open subset
ω ⊂ X during an E measurement on systems in the state ρ is
where supp denotes the support of f. Applying a little bit of measure theory (basically the Riesz-Markov theorem [186, Thm. IV.18] together with dominated convergence [186, Thm. I.16] and linearity of the trace) it is easy to see that we can express K_ρ(ω) for each ρ by a positive operator E(ω) such that
$$K_\rho(\omega) = \operatorname{tr}\bigl(E(\omega)\rho\bigr) \qquad (3.50)$$
holds. The family of operators E(ω) we get in this way has typical properties of a
measure (e.g. some sort of σ-additivity). Hence we have encountered the continuous
version of a POV measure. We do not want to discuss the technical details here
(cf. [115, Sect. 2.1] instead). For later use we will only remark here that we can
reconstruct the channel f ↦ E(f) from the measure ω ↦ E(ω) in terms of the integrals
$$E(f) = \int_X f(x)\,E(dx). \qquad (3.51)$$
hence we can identify T with the family Tx , x ∈ X. Finally we can consider the
second marginal of T:
$$B(\mathcal H) \ni A \mapsto T(A \otimes \mathbb 1) = \sum_{x \in X} T_x(A) \in B(\mathcal K). \qquad (3.56)$$
Hence we get the final state tr(E_x ρ)^{-1} E_x ρE_x if we measure the value x ∈ X on systems initially in the state ρ – this is well known from quantum mechanics.
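For a projective measurement this post-measurement rule is easy to trace through numerically; a sketch for a z-measurement on the state |+⟩⟨+|:

```python
import numpy as np

# a projective (Lueders-type) instrument on a qubit
P0 = np.array([[1.0, 0.0], [0.0, 0.0]])
P1 = np.eye(2) - P0
rho = np.array([[0.5, 0.5], [0.5, 0.5]])   # the pure state |+><+|

for x, P in enumerate((P0, P1)):
    p = np.trace(P @ rho).real             # probability of outcome x
    post = P @ rho @ P / p                 # conditional state E_x rho E_x / tr(E_x rho)
    print(x, p, np.round(post, 3))
```

Both outcomes occur with probability 1/2 and leave the system in |0⟩⟨0| respectively |1⟩⟨1|.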
Let us change now the role of B(H) ⊗ C(X) and B(K); in other words consider
a channel T : B(K) → B(H) ⊗ C(X) with hybrid input and quantum output. It
describes a device which changes the state of a system depending on additional
classical information. As for an instrument, T decomposes into a family of (unital!) channels T_x : B(K) → B(H) such that we get T*(ρ ⊗ p) = Σ_x p_x T_x*(ρ) in the Schrödinger picture. Physically T describes a parameter dependent operation: depending on the classical information x ∈ X the quantum information ρ ∈ B*(K) is transformed by the operation T_x (cf. Figure 3.4).
Finally we can consider a channel T : B(H) ⊗ C(X) → B(K) ⊗ C(Y ) with hybrid
input and output to get a parameter dependent instrument (cf. figure 3.5): Similarly
to the discussion in the last paragraph we can define a family of instruments T y :
Figure 3.6: One way LOCC operation; cf Figure 3.7 for an explanation.
B(H) ⊗ C(X) → B(K), y ∈ Y, by the equation T*(ρ ⊗ p) = Σ_y p_y T_y*(ρ). Physically
T describes the following device: It receives the classical information y ∈ Y and a
quantum system in the state ρ ∈ B ∗ (K) as input. Depending on y a measurement
with the instrument T_y is performed, which in turn produces the measuring value x ∈ X and leaves the quantum system in the state (up to normalization) T*_{y,x}(ρ); with T_{y,x} given as in Equation (3.53) by T_{y,x}(A) = T_y(A ⊗ |x⟩⟨x|).
3.2.6 LOCC and separable channels
Let us now consider channels acting on finite dimensional bipartite systems: T : B(H₁ ⊗ H₂) → B(K₁ ⊗ K₂). In this case we can ask the question whether a channel
preserves separability. Simple examples are local operations (LO), i.e. T = T A ⊗ T B
with two channels T A,B : B(Hj ) → B(Kj ). Physically we think of such a T in terms
of two physicists Alice and Bob, both performing operations on their own particle but without any transmission of information, neither classical nor quantum. The next, more difficult step are local operations with one way classical communication (one way LOCC). This means Alice operates on her system with an instrument, communicates the classical measuring result j ∈ X = {1, . . . , N} to Bob, and he selects an operation depending on these data. We can write such a channel as a composition T = (T^A ⊗ Id)(Id ⊗ T^B) of the instrument T^A : B(H₁) ⊗ C(X) → B(K₁) and the parameter dependent operation T^B : B(H₂) → C(X) ⊗ B(K₂) (cf. Figure 3.6):
$$B(\mathcal H_1 \otimes \mathcal H_2) \xrightarrow{\ \operatorname{Id} \otimes T^B\ } B(\mathcal H_1) \otimes C(X) \otimes B(\mathcal K_2) \xrightarrow{\ T^A \otimes \operatorname{Id}\ } B(\mathcal K_1 \otimes \mathcal K_2). \qquad (3.58)$$
Figure 3.7: LOCC operation. The upper and lower curly arrows represent Alice’s
respectively Bob’s quantum system, while the straight arrows in the middle stand
for the classical information Alice and Bob exchange. The boxes symbolize the
channels applied by Alice and Bob.
It is easy to see that a separable T maps separable states to separable states (up
to normalization) and that each LOCC channel is separable (cf. [21]). The converse
however is (somewhat surprisingly) not true: there are separable channels which are
not LOCC, see [21] for a concrete example.
3.3 Quantum mechanics in phase space
Up to now we have considered only finite dimensional systems and even in this
extremely idealized situation it is not easy to get nontrivial results. At first sight the discussion of continuous quantum systems therefore seems hopeless. If we
restrict our attention however to small classes of states and channels, with suffi-
ciently simple structure, many problems become tractable. Phase space quantum
mechanics, which will be reviewed in this Section (see Chapter 5 of [111] for details),
provides a very powerful tool in this context.
Before we start let us add some remarks to the discussion of Chapter 2 which we
have restricted to finite dimensional Hilbert spaces. Basically most of the material
considered there can be generalized in a straightforward way, as long as topological
issues like continuity and convergence arguments are treated carefully enough. There
are of course some caveats (cf. in particular Footnote 2 of Chapter 2), however they
do not lead to problems in the framework we are going to discuss and can therefore
be ignored.
3.3.1 Weyl operators and the CCR
The kinematical structure of a quantum system with d degrees of freedom is
usually described by a separable Hilbert space H and 2d selfadjoint operators
Q1 , . . . , Qd , P1 , . . . , Pd satisfying the canonical commutation relations [Qj , Qk ] = 0,
[Pj , Pk ] = 0, [Qj , Pk ] = iδjk 1I. The latter can be rewritten in a more compact form
as
R2j−1 = Qj , R2j = Pj , j = 1, . . . , d, [Rj , Rk ] = −iσjk . (3.60)
contrast to popular belief, not true: There are representations of the CCR which are unitarily
inequivalent to the Schrödinger representation; cf. [186] Section VIII.5 for particular examples.
Hence uniqueness can only be achieved on the level of Weyl operators – which is one major reason
to study them.
holds. By differentiation it is easy to check that ρ has indeed mean m and covariance
matrix α.
The most prominent examples for Gaussian states are the ground state ρ 0 of a
system of d harmonic oscillators (where the mean is 0 and α is given by the corre-
sponding classical Hamiltonian) and its phase space translates ρ m = W (m)ρW (−m)
(with mean m and the same α as ρ0 ), which are known from quantum optics as
coherent states. ρ0 and ρm are pure states and it can be shown that a Gaussian
state is pure iff (σ⁻¹α)² = −1I holds (see [111], Ch. 5). Examples for mixed Gaussians
are temperature states of harmonic oscillators. In one degree of freedom this is
$$\rho_N = \frac{1}{N+1}\sum_{n=0}^{\infty}\left(\frac{N}{N+1}\right)^n |n\rangle\langle n|, \qquad (3.67)$$
where |n⟩, n = 0, 1, . . . denotes the number basis and N is the mean photon number. The characteristic function of ρ_N is
$$\operatorname{tr}\bigl[W(x)\rho_N\bigr] = \exp\left[-\frac{1}{2}\left(N + \frac{1}{2}\right)|x|^2\right], \qquad (3.68)$$
holds.
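The number distribution of the thermal state (3.67) is geometric; a truncated numerical sketch confirms normalization and the mean photon number N:

```python
import numpy as np

N = 2.0                                    # mean photon number
cutoff = 400                               # Fock-space truncation
n = np.arange(cutoff)
p = (N / (N + 1)) ** n / (N + 1)           # eigenvalues of rho_N in the number basis

print(p.sum())                             # ~1: geometric distribution
print((n * p).sum())                       # ~N: mean photon number
```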
This theorem is somewhat similar to Theorem 2.4.1: It provides a useful criterion
as long as abstract considerations are concerned, but not for explicit calculations.
In contrast to finite dimensional systems, however, separability of Gaussian states
can be decided by an operational criterion in terms of nonlinear maps between
matrices [93]. To state it we have to introduce some terminology first. The key tool
is a sequence of (2n + 2m) × (2n + 2m) matrices α_N, N ∈ ℕ, written in block matrix notation as
$$\alpha_N = \begin{bmatrix} A_N & C_N \\ C_N^T & B_N \end{bmatrix}. \qquad (3.70)$$
Given α0 the other αN are recursively defined by:
if α_N − iσ ≥ 0, and α_{N+1} = 0 otherwise. Here we have set X_N = C_N(B_N − iσ_B)⁻¹C_N^T, and the inverse denotes the pseudo inverse⁵ if B_N − iσ_B is not invertible. Now we
can state the following theorem (see [93] for a proof):
Theorem 3.3.2 Consider a Gaussian state ρ of a bipartite system with correlation
matrix α0 and the sequence αN , N ∈ N just defined.
The interesting question is now whether the ppt criterion is (for a given number
of degrees of freedom) equivalent to separability or not. The following theorem which
was proved in [197] for 1 × 1 systems and in [234] in the 1 × d case gives a complete
answer.
Theorem 3.3.4 A Gaussian state of a quantum system with 1 × d degrees of free-
dom (i.e. dim XA = 2 and dim XB = 2d) is separable iff it is ppt; in other words iff
the condition of Proposition 3.3.3 holds.
For other kinds of systems the ppt criterion may fail which means that there
are entangled Gaussian states which are ppt. A systematic way to construct such
states can be found in [234]. Roughly speaking, it is based on the idea to go to the
boundary of the set of ppt covariance matrices, i.e. α has to satisfy Equation (3.65)
and (3.72) and it has to be a minimal matrix with this property. Using this method
explicit examples for ppt and entangled Gaussians are constructed for 2 × 2 degrees
of freedom (cf. [234] for details).
5 A⁻¹ is the pseudo inverse of a matrix A if AA⁻¹ = A⁻¹A is the projector onto the range of A.
for k > 1. If the environment is initially in a thermal state ρ_{N_e} (cf. Equation (3.67)) this leads to
$$T\bigl[W(x)\bigr] = \exp\left[-\frac{1}{2}\left(\frac{|k^2-1|}{2} + N_c\right)|x|^2\right] W(kx), \qquad (3.77)$$
N 0 = k 2 N + max{0, k 2 − 1} + Nc . (3.78)
If Nc = 0 this means that T amplifies (k > 1) or damps (k < 1) the mean pho-
ton number, while Nc > 0 leads to additional classical, Gaussian noise. We will
reconsider this channel in greater detail in Chapter 6.
Chapter 4
Basic tasks
of the channels
$$C(X) \ni f \mapsto E(f) = \sum_{x\in X} f(x)E_x \in B(\mathcal H) \qquad (4.2)$$
and
$$C^*(X) \ni p \mapsto D^*(p) = \sum_{x\in X} p_x \rho_x \in B^*(\mathcal H), \qquad (4.3)$$
i.e. ρ̃ = D*E*(ρ), and this equation makes sense even if X is not finite. The teleportation is successful if the output state ρ̃ can not be distinguished from the input state ρ by any statistical experiment, i.e. if D*E*(ρ) = ρ. Hence the impossibility of classical teleportation can be rephrased simply as ED ≠ Id for all observables E and all preparations D.
4.1.2 Entanglement enhanced teleportation
Let us now change our setup slightly. Assume that Alice wants to send a quantum
state ρ ∈ B ∗ (H) to Bob and that she shares an entangled state σ ∈ B ∗ (K ⊗ K)
and an ideal classical communication channel C(X) → C(X) with him. Alice can
perform a measurement E : C(X) → B(H ⊗ K) on the composite system B(H ⊗ K)
consisting of the particle to teleport (B(H)) and her part of the entangled system
(B(K)). Then she communicates the classical data x ∈ X to Bob and he operates
with the parameter dependent operation D : B(H) → B(K) ⊗ C(X) appropriately
on his particle (cf. Figure 4.1). Hence the overall procedure can be described by the
here tr12 denotes the partial trace over the first two tensor factors (= Alice’s qubits).
If Ω, the Φj and the Uj are related by the equation
Φj = (Uj ⊗ 1I)Ω (4.10)
To get an ideal channel we just have to choose mutually orthogonal pure states
ρx = |ψx ihψx |, x = 1, . . . , d on Alice’s side and the corresponding one-dimensional
projections Ey = |ψy ihψy |, y = 1, . . . , d on Bob’s. If d = 2 and H = C2 it is possible
to send one bit classical information via one qubit quantum information. The crucial
point is now that the amount of classical information can be increased (doubled in
the qubit case) if Alice shares an entangled state σ ∈ S(H ⊗ H) with Bob. To send
the classical information x ∈ X = {1, . . . , n} to Bob, Alice operates on her particle
with an operation Dx : B(H) → B(H), sends it through an (ideal) quantum channel
to Bob and he performs a measurement E1 , . . . , En ∈ B(H ⊗ H) on both particles.
The probability for Bob to measure y ∈ X if Alice has sent x ∈ X is given by
$$\operatorname{tr}\bigl[(D_x \otimes \operatorname{Id})^*(\sigma)E_y\bigr], \qquad (4.12)$$
$$C^*(X) \otimes B^*(\mathcal H) \otimes B^*(\mathcal H) \xrightarrow{\ D^* \otimes \operatorname{Id}\ } B^*(\mathcal H) \otimes B^*(\mathcal H) \xrightarrow{\ E^*\ } C^*(X), \qquad (4.13)$$
i.e. T ∗ (p) = E ∗ ◦ (D∗ ⊗ Id)(p ⊗ σ). The advantage of this point of view is that it
works as well for infinite dimensional Hilbert spaces and continuous observables.
Finally let us again consider the case where H = C^d and X = {1, . . . , d²}. If we choose as in the last paragraph a maximally entangled vector Ω ∈ H ⊗ H, an orthonormal basis Φ_x ∈ H ⊗ H, x = 1, . . . , d², of maximally entangled vectors and an orthonormal family U_x ∈ B(H), x = 1, . . . , d², of unitary operators, we can construct a dense coding scheme as follows: E_x = |Φ_x⟩⟨Φ_x|, D_x(A) = U_x*AU_x and σ = |Ω⟩⟨Ω|. If Ω, the Φ_x and the U_x are related by Equation (4.10) it is easy to see that we really get a dense coding scheme [231]. If d = 2 holds, we choose again the Bell basis for the Φ_x, Ω = Φ₀, and the identity and the Pauli matrices for the U_x. We recover in this case the standard example of dense coding proposed in [27] and we see that we can transfer two bits via one qubit, as stated above.
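For d = 2 the scheme can be traced through explicitly: with the Bell basis and the Pauli matrices, Bob's outcome distribution reduces to |⟨Φ_y, (U_x ⊗ 1I)Ω⟩|² = δ_{xy}. A numerical sketch:

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]])
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]])
U = [I2, sx, sy, sz]

omega = np.array([1, 0, 0, 1]) / np.sqrt(2)      # maximally entangled Omega
phi = [np.kron(u, I2) @ omega for u in U]        # the Bell basis Phi_x = (U_x (x) 1) Omega

# Bob's probability to read y when Alice encoded x:
P = np.array([[abs(np.vdot(phi[y], np.kron(U[x], I2) @ omega)) ** 2
               for y in range(4)] for x in range(4)])
print(np.round(P, 10))                           # the identity matrix: 2 bits per qubit
```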
4.2 Estimating and copying
The impossibility of classical teleportation can be rephrased as follows: It is impos-
sible to get complete information about the state ρ of a quantum system by one
measurement on one system. However, if we have many systems, say N , all prepared
in the same state ρ it should be possible to get (with a clever measuring strategy)
as much information on ρ as possible, provided N is large enough. In this way we
can circumvent the impossibility of devices like classical teleportation or quantum
copying at least in an approximate way.
4.2.1 Quantum state estimation
To discuss this idea in a more detailed way consider a number N of d-level quantum
systems, all of them prepared in the same (unknown) state ρ ∈ B ∗ (H). Our aim
is to estimate the state ρ by measurements on the compound system ρ ⊗N . This
is described in terms of an observable (in the following called “estimator”) E N :
C(S) → B(H⊗N ) with values in the quantum state space S = S(H). Since S is not
finite, we have to apply here the machinery introduced in Section 3.2.4, i.e. C(S) is
the algebra of continuous functions, and the probability to get a measuring value
in an open subset ω ⊂ S is given by (cf. Section 3.2.4)
$$K_N(\omega) = \sup\bigl\{\operatorname{tr}\bigl(E_N(f)\rho^{\otimes N}\bigr) \,\big|\, f \in C(S),\ 0 \le f \le \mathbb 1,\ \operatorname{supp} f \subset \omega\bigr\}. \qquad (4.16)$$
For many practical purposes it is sufficient to consider only those estimators which
admits a finite set of possible outcomes. In this case everything reduces to the finite
dimensional setup introduced in Chapter 2 and EN becomes
$$E_N(f) = \sum_{\sigma \in X} f(\sigma)E_{N,\sigma}. \qquad (4.17)$$
However, to discuss structural problems, e.g. a quantitative analysis like the search
for an “optimal estimator” (cf. Chapter 10) a restriction to the special case from
Equation (4.17) is inappropriate.
The criterion for a good estimator EN is that for any one-particle density op-
erator ρ, the value measured on a state ρ⊗N is likely to be close to ρ, i.e. that the
probability KN (ω) is small if ω ⊂ S(H) is the complement of a small ball around
ρ. Of course, we will look at this problem for large N . So the task is to find a
whole “estimation scheme”, i.e. a sequence of observables EN , N = 1, 2, . . ., which
is “asymptotically exact”, i.e. error probabilities should vanish in the limit N → ∞.
Variants of this scheme arise if we have some a priori knowledge about the input state
ρ. E.g. if we know that ρ is an element of a distinguished subset Y of the state space
S(H) it is sufficient to control the error probabilities for each ρ ∈ Y . Hence we can
improve the estimation quality for each ρ ∈ Y at the expense of the usefulness of
the estimates for ρ 6∈ Y . The most relevant special case is estimation of pure states
(i.e. Y is the set of pure states). It is much better understood than the general
problem and it admits a rather simple optimal solution which is closely related to
the corresponding cloning problem; we will come back to this circle of questions
in a more quantitative way in Chapter 10. Another special case, called “quantum
hypothesis testing”, arises if Y is finite. The task is to distinguish between finitely
many states in terms of a measurement on N equally prepared systems; cf. [109] for
an overview and [173, 107, 166] and the references therein for more recent results.
The most direct way to get an asymptotically exact estimation scheme is to
perform a sequence of measurements on each of the N input systems separately. A
finite set of observables which leads to a successful estimation strategy is usually
called a “quorum” (cf. e.g. [150, 226]). E.g. for d = 2 we can perform alternating measurements of the three spin components. If ρ = (1/2)(1I + x⃗ · σ⃗) is the Bloch representation of ρ (cf. Subsection 2.1.2) we see that the probabilities for the outcome +1 in these measurements are given by (1/2)(1 + x_j). Hence we get an arbitrarily good estimate
if N is large enough (we leave the construction of the observable EN associated
to this scheme as an easy exercise to the reader). A similar procedure is possible
for arbitrary d if we consider the generalized Bloch representation for ρ (see again
Subsection 2.1.2). There are however more efficient strategies based on “entangled”
measurements (i.e. the EN (σ) can not be decomposed into pure tensor products)
on the whole input system ρ^{⊗N} (e.g. [218, 137]). Somewhat in between are “adaptive schemes” [89] consisting of separate measurements where the j-th measurement depends on the results of the previous j − 1. We will reconsider this circle of questions in a
more quantitative way in Chapter 10.
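The quorum idea for d = 2 can be sketched with exact expectation values (the statistical errors of a finite sample are suppressed here): measuring the three spin components recovers the Bloch vector and hence ρ. A minimal sketch:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

x = np.array([0.3, -0.4, 0.5])                   # Bloch vector, |x| <= 1
rho = (np.eye(2) + x[0]*sx + x[1]*sy + x[2]*sz) / 2

# the probability of outcome +1 for spin component j is (1 + x_j)/2,
# so the exact expectation values recover the Bloch vector
est = np.array([np.trace(rho @ s).real for s in (sx, sy, sz)])
rho_est = (np.eye(2) + est[0]*sx + est[1]*sy + est[2]*sz) / 2
print(np.allclose(rho_est, rho))                 # True
```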
4.2.2 Approximate cloning
By virtue of the no-cloning theorem [239], it is impossible to produce M perfect copies of a d-level quantum system if N < M input systems in the common (unknown) state ρ^{⊗N} are given. More precisely, there is no channel T_{MN} : B(H^{⊗M}) → B(H^{⊗N}) such that T*_{MN}(ρ^{⊗N}) = ρ^{⊗M} holds for all ρ ∈ S(H). Using state estimation, however, it is easy to find a device T_{MN} which produces at least approximate
copies which become exact in the limit N, M → ∞: If ρ⊗N is given, we measure the
observable EN and get the classical data σ ∈ S(H), which we use subsequently to
prepare M systems in the state σ ⊗M . In other words, TM N has the form
$$B^*(\mathcal H^{\otimes N}) \ni \rho \mapsto \int_S \sigma^{\otimes M}\, K_N(d\sigma) \in B^*(\mathcal H^{\otimes M}). \qquad (4.19)$$
We immediately see that the probability to get wrong copies coincides exactly with
the error probability of the estimator EN . This shows first that we get exact copies
in the limit N → ∞ and second that the quality of the copies does not depend on the
number M of output systems, i.e. the asymptotic rate limN,M →∞ M/N of output
systems per input system can be arbitrarily large. Note that the latter (independence
of cloning quality from the output number M ) is a special feature of the estimation
based cloning scheme just introduced. In Chapter 9 we will encounter cloning maps
which are not based on estimation and repreparation and which produce better
copies, as long as the required number M of outputs is finite.
Similar to the estimation problem we can improve the quality of the outcomes
if we can use a priori information about the state ρ to be cloned. The most relevant
example arises again if ρ is pure. A detailed discussion of this special case, including
the construction of the (unique) optimal pure state cloner, will be given in Chapter
9.
The fact that the cloning map from Equation (4.19) uses classical data at an in-
termediate step allows further generalizations. Instead of just preparing M systems
in the state σ detected by the estimator, we can apply first an arbitrary transfor-
mation F : S(H) → S(H) on the density matrix σ and prepare F (σ)⊗M instead of
σ^{⊗M}. In this way we get the channel (cf. Figure 4.3)
$$B^*(\mathcal H^{\otimes N}) \ni \rho \mapsto \int_S F(\sigma)^{\otimes M}\, K_N(d\sigma) \in B^*(\mathcal H^{\otimes M}), \qquad (4.21)$$
The probability to get a bad approximation of the state F (ρ)⊗M (if the input state
was ρ⊗N ) is again given by the error probability of the estimator and we get a
perfect realization of F at arbitrary rate as M, N → ∞.
There are in particular two interesting tasks which become possible this way:
The first is the “universal not gate” which associates to each pure state of a qubit the unique pure state orthogonal to it [46]. This is a special example of an antiunitarily implemented symmetry operation and therefore not completely positive. The second
example is the purification of states [55, 138]. Here it is assumed that the input
states were once pure but have passed later on a depolarizing channel |φihφ| 7→
ϑ|φihφ| + (1 − ϑ)1I/d. If ϑ > 0 this map is invertible but its inverse does not describe
an allowed quantum operation because it maps some density operators to operators
with negative eigenvalues. Hence the reversal of noise is not possible with a one shot
operation but can be done with high accuracy if enough input systems are available.
A detailed quantitative analysis is again postponed to Chapter 11.
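The failure of the one shot reversal is easy to exhibit numerically: the linear inverse of the depolarizing map exists for ϑ > 0, but applied to a generic density operator it produces negative eigenvalues. A sketch for a qubit with ϑ = 1/2:

```python
import numpy as np

d, theta = 2, 0.5
phi = np.array([1.0, 0.0])
rho = np.outer(phi, phi)                           # a pure input state

noisy = theta * rho + (1 - theta) * np.eye(d) / d  # depolarizing channel
recovered = (noisy - (1 - theta) * np.eye(d) / d) / theta
print(np.allclose(recovered, rho))                 # the linear inverse exists ...

bad = (rho - (1 - theta) * np.eye(d) / d) / theta  # ... but applied to a state
print(np.linalg.eigvalsh(bad))                     # that was never depolarized it
                                                   # produces a negative eigenvalue
```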
4.3 Distillation of entanglement
Let us now return to entanglement. We have seen in Section 4.1 that maximally
entangled states play a crucial role for processes like teleportation and dense coding.
In practice however entanglement is a rather fragile property: If Alice produces a pair
of particles in a maximally entangled state |ΩihΩ| ∈ S(HA ⊗ HB ) and distributes
one of them over a great distance to Bob, both end up with a mixed state ρ which
contains much less entanglement then the original and which can not be used any
longer for teleportation. The latter can be seen quite easily if we try to apply
the qubit teleportation scheme (Subsection 4.1.2) with a non-maximally entangled
isotropic state (Equation (3.15) with λ > 0) instead of Ω.
4. Basic tasks 58
F (σ)⊗M ∈ B ∗ (H⊗M )
ρ⊗N ∈ B ∗ (H⊗N )
Preparation
Estimation
F
classical data
σ∈X⊂S F (σ) ∈ S
Hence the question arises, whether it is possible to recover |ΩihΩ| from ρ, or,
following the reasoning from the last section, at least a small number of (almost)
maximally entangled states from a large number N of copies of ρ. However since
the distance between Alice and Bob is big (and quantum communication therefore
impossible) only LOCC operations (Section 3.2.6) are available for this task (Alice
and Bob can only operate on their respective particles, drop some of them and
communicate classically with one another). This excludes procedures like the pu-
rification scheme just sketched, because we would need “entangled” measurements
to get an asymptotically exact estimate for the state ρ. Hence we need a sequence
of LOCC channels
⊗N ⊗N
TN : B(CdN ⊗ CdN ) → B(HA ⊗ HB ) (4.23)
such that
kTN∗ (ρ⊗N ) − |ΩN ihΩN |k1 → 0, for N → ∞ (4.24)
dN dN
holds, with a sequence of maximally entangled vectors ΩN ∈ C ⊗ C . Note that
⊗N ⊗N ∼
we have to use here the natural isomorphism HA ⊗ HB = (HA ⊗ HB )⊗N , i.e. we
⊗N
have to reshuffle ρ such that the first N tensor factors belong to Alice (HA ) and
the last N to Bob (HB ). If confusion can be avoided we will use this isomorphism
in the following without a further note. We will call a sequence of LOCC channels,
TN satisfying (4.24) with a state ρ ∈ S(HA ⊗ HB ) a distillation scheme for ρ and
ρ is called distillable if it admits a distillation scheme. The asymptotic rate with
which maximally entangled states can be distilled with a given protocol is
in the state ρ, so that the total state is ρ⊗N . To obtain a smaller number of pairs
with a higher F they proceed as follows:
1. First they take two pairs (let us call them pair 1 and pair 2), i.e. ρ ⊗ ρ and
apply to each of them the twirl operation PUŪ associated to isotropic states
(cf. Equation (3.18)). This can be done by LOCC operations in the following
way: Alice selects at random (respecting the Haar measure on U(2)) a unitary
operator U applies it to her qubits and sends to Bob which transformation she
has chosen; then he applies Ū to his particles. They end up with two isotropic
states ρe ⊗ ρe with the same maximally entangled fraction as ρ.
2. Each party performs the unitary transformation
they discard their particles (this requires classical communication). Obviously the
state ρe is entangled (this can be easily checked), hence they can proceed as in the
previous Subsection.
The scheme just proposed can be used to show that each state ρ which violates
the reduction criterion (cf. Subsection 2.4.3) can be distilled [117]. The basic idea
is to project ρ with the twirl PUŪ (which is LOCC as we have seen above; cf.
Subsection 4.3.1) to an isotropic state PUŪ (ρ) and to apply the procedure from the
last paragraph afterwards. We only have to guarantee that PUŪ (ρ) is entangled. To
this end use a vector ψ ∈ H ⊗ H with hψ, (1I ⊗ tr1 (ρ) − ρ)ψi < 0 (which exists by
assumption since ρ violates the reduction criterion) and to apply the filter operation
given by ψ via Equation (4.27).
4.3.3 Bound entangled states
It is obvious that separable states are not distillable, because a LOCC operation
map separable states to separable states. However is each entangled state distillable?
The answer, maybe somewhat surprising, is no and an entangled state which is not
distillable is called bound entangled [119] (distillable states are sometimes called free
entangled, in analogy to thermodynamics). Examples of bound entangled states are
all ppt entangled states [119]: This is an easy consequence of the fact that each
separable channel (and therefore each LOCC channel as well) maps ppt states to
ppt states (this is easy to check), but a maximally entangled state is never ppt. It
is not yet known, whether bound entangled npt states exists, however, there are
at least some partial results: 1. It is sufficient to solve this question for Werner
states, i.e. if we can show that each npt Werner state is distillable it follows that
all npt states are distillable [117]. 2. Each npt Gaussian state is distillable [92]. 3.
For each N ∈ N there is an npt Werner state ρ which is not “N -copy distillable”,
i.e. hψ, ρ⊗N ψi ≥ 0 holds for each pure state ψ with exactly two Schmidt summands
[72, 78]. This gives some evidence for the existence of bound entangled npt states
because ρ is distillable iff it is N -copy distillability for some N [119, 72, 78].
Since bound entangled states can not be distilled, they can not be used for
teleportation. Nevertheless bound entanglement can produce a non-classical effect,
called “activation of bound entanglement” [125]. To explain the basic idea, assume
that Alice and Bob share one pair of particles in a distillable state ρf and many
particles in a bound entangled state ρb . Assume in addition that ρf can not be
used for teleportation, or, in other words if ρf is used for teleportation the particle
Bob receives is in a state σ 0 which differs from the state σ Alice has send. This
problem can not be solved by distillation, since Alice and Bob share only one pair
of particles in the state ρf . Nevertheless they can try to apply an appropriate filter
operation on ρ to get with a certain probability a new state which leads to a better
quality of the teleportation (or, if the filtering fails, to get nothing at all). It can
be shown however [120] that there are states ρf such that the error occuring in
this process (e.g. measured by the trace norm distance of σ and σ 0 ) is always above
a certain threshold. This is the point where the bound entangled states ρ b come
into play: If Alice and Bob operate with an appropriate protocol on ρf and many
copies of ρb the distance between σ and σ 0 can be made arbitrarily small (although
the probability to be successful goes to zero). Another example for an activation
of bound entanglement is related to distillability of npt states: If Alice and Bob
share a certain ppt-entangled state as additional resource each npt state ρ becomes
distillable (even if ρ is bound entangled) [80, 145]. For a more detailed survey of the
role of bound entanglement and further references see [123].
4.4 Quantum error correction
If we try to distribute quantum information over large distances or store it for a
long time in some sort of “quantum memory” we always have to deal with “de-
61 4.4. Quantum error correction
coherence effects”, i.e. unavoidable interactions with the environment. This results
in a significant information loss, which is particularly bad for the functioning of a
quantum computer. Similar problems arise as well in a classical computer, but the
methods used there to circumvent the problems can not be transferred to the quan-
tum regime. E.g. the most simple strategy to protect classical information against
noise is redundancy: instead of storing the information once we make three copies
and decide during readout by a majority vote which bit to take. It is easy to see that
this reduces the probability of an error from order ² to ²2 . Quantum mechanically
however such a procedure is forbidden by the no cloning theorem.
Nevertheless quantum error correction is possible although we have to do it in a
more subtle way than just copying; this was observed for the first time independently
in [48] and [201]. Let us consider first the general scheme and assume that T :
B(K) → B(K) is a noisy quantum channel. To send quantum systems of type B(H)
undisturbed through T we need an encoding channel E : B(K) → B(H) and a
decoding channel D : B(H) → B(K) such that ET D = Id holds, respectively
D∗ T ∗ E ∗ = Id in the Schrödinger picture; cf. Figure 4.4.
4.4.1 The theory of Knill and Laflamme
To get a more detailed description of the structure of the channels E and D we will
give in the following a short review of the theory of error correcting codes in the
sense of Knill and Laflamme [143]. To this end start from the error corrector’s dream,
namely the situation in which all the errors happen in another part of the system,
where we do not keep any of the precious quantum information. This will help us to
characterize the structure of the kind of errors which such a scheme may tolerate,
or ‘correct’. Of course, the dream is just a dream for the situation we are mainly
interested in: several parallel channels, each of which may be affected by errors.
But the splitting of the system into subsystems, mathematically the decomposition
of the Hilbert space of the total system into a tensor product is something we
may change by a suitable unitary transformation. This is then precisely the role
of the encoding and decoding operations. The Knill-Laflamme theory is precisely
the description of the situation where such a unitary, and hence a coding/decoding
scheme exists. Constructing such schemes, however, is another matter, to which we
will turn in the next subsection.
So consider a system split into H = Hg ⊗ Hb , where the indices g and b stand
for ‘good’ and ‘bad’. We prepare the system in a state ρ ⊗ |ΩihΩ|, where ρ is the
quantum state we want Pto protect. Now come the errors in the form of a completely
positive map T (A) = i Fi∗ AFi . Then according to the error corrector’s dream, we
T
Encoding
Decoding
Id
ρ ρ
Id
Id
Id
Figure 4.4: Five bit quantum code: Encoding one qubit into five and correcting one
error.
4. Basic tasks 62
would just have to discard the bad system, and get the same state ρ as before.
The hardest demands for realizing this come from pure states ρ = |φihφ|, because
the only way that the restriction to the good system can again be |φihφ| is that the
state after errors factorizes, i.e.
X
T ∗ (|φ ⊗ Ωihφ ⊗ Ω|) = |Fi (φ ⊗ Ω)ihFi (φ ⊗ Ω)| = |φihφ| ⊗ σ . (4.29)
i
F U (φ ⊗ Ω) = U (φ ⊗ Φ(F )) (4.33)
holds. This equation describes precisely the elements F ∈ Emax of the maximal error
space.
63 4.4. Quantum error correction
P
To check that we really have ET D = Id for any channel T (A) = i Fi∗ AFi with
Fi ∈ Emax , it suffices to consider pure input states |φihφ|, and the measurement of
an arbitrary observable X at the output:
£ ¤ X £ ¤
tr |φihφ|ET D(X) = tr U |φ ⊗ Ωihφ ⊗ Ω|U ∗ Fi U (X ⊗ 1I)U ∗ Fi (4.34)
i
X £ ¤
= tr |φ ⊗ Φ(Fi )ihφ ⊗ Φ(Fi )|X ⊗ 1I (4.35)
i
X
= hφ, Xφi kΦ(Fi )k2 = hφ, Xφi. (4.36)
i
P
In the last equation we have used that i kΦ(Fi )k2 = 1, since E, T , and D each
map 1I to 1I.
The encoding E defined in Equation (4.31) is of the form E ∗ (ρ) = V ρV ∗ with
the encoding isometry V : H1 → H2 given by
V φ = U (φ ⊗ Ω) . (4.37)
If we just know this isometry and the error space we can reconstruct the whole
structure, including the decomposition H2 = H1 ⊗Hb ⊕(1I−U U ∗ )H2 , and hence the
decoding operation D. A necessary condition for this, first established by Knill and
Laflamme [143], is that, for arbitrary φ1 , φ2 ∈ H1 and error operators F1 , F2 ∈ E:
hV φ1 , F1∗ F2 V φ2 i = hφ1 , φ2 iω(F1∗ F2 ) (4.38)
holds with some numbers ω(F1∗ F2 )
independent of φ1 , φ2 . Indeed, from (4.33) we
immediately get this equation with ω(F1∗ F2 ) = hΦ(F1 ), Φ(F2 )i. Conversely, if the
Knill-Laflamme condition (4.38) holds, the numbers ω(F1∗ F2 ) serve as a (possibly
degenerate) scalar product on E, which upon completion becomes the ‘bad space’
Hb , such that F ∈ E is identified with a Hilbert space vector Φ(F ). The operator
U : φ⊗Φ(F ) = F V φ is then an isometry, as used at the beginning of this section. To
conclude, the Knill-Laflamme condition is necessary and sufficient for the existence
of a decoding operation. Its main virtue is that we can use it without having to
construct the decoding explicitly.
The most relevant example of such a scheme arises if we generalize the classical
idea of sending multiple copies in a certain sense. This means we encode the quan-
tum information we want to transmit into n systems which can be send separately
through multiple copies of a noisy channel; cf. Figure 4.4. In that case the space
H2 is the n-fold tensor product of the system H on which the noisy channels under
consideration act.
Definition 4.4.1 We say that a coding isometry V : H1 → H⊗n corrects f errors,
if it satisfies the Knill-Laflamme condition (4.38) for the error space Ef spanned
linearly by all operators of the kind X1 ⊗ X2 ⊗ · · · ⊗ Xn , where at most f places we
have a tensor factor Xi 6= 1I.
When F1 and F2 are both supported on at most f sites, the product F1∗ F2 ,
which appears in the Knill-Laflamme condition involves 2f sites. Therefore we can
paraphrase the condition by saying that
hV φ1 , XV φ2 i = hφ1 , φ2 iω(X) (4.39)
for X ∈ E2f . From Kraus operators in Ef we can build arbitrary channels of the
kind T = T1 ⊗ T2 ⊗ · · · ⊗ Tn , where at most f of the tensor factors Ti are channels
different from id.
There are several ways to construct error correcting codes (see e.g. [98, 47, 10]).
Most appropriate for our purposes are “Graph codes” [190], because they are quite
easy to describe and admit a simple way to check the error correction condition.
This will be the subject of the next subsection.
4. Basic tasks 64
Because Γ is symmetric, every term in this sum appears twice, hence adding a
multiple of d to any jx or Γxy will change the exponent in (4.40) by a multiple of
2π, and thus will not change VΓ .
The error correcting properties of VΓ are summarized in the following result
[190]. It is just the Knill-Laflamme condition with a special expression for the form
ω, for error operators such that F1∗ F2 is localized on a set Z.
Theorem 4.4.2 Let Γ be a graph, i.e., a symmetric matrix with entries Γ xy ∈ Zd ,
for x, y ∈ (X ∪Y ). Consider a subset Z ⊂ Y , and suppose that the (Y \Z)×(X ∪Z)-
submatrix of Γ is non-singular, i.e.,
X
∀y∈Y \Z Γyx hx ≡ 0 implies ∀x∈X∪Z hx ≡ 0 (4.42)
x∈X∪Z
where congruences are mod d. Then, for every operator F ∈ B(H Y ) localized on
Z, we have
VΓ∗ F VΓ = d−n tr(F )1IX (4.43)
Proof. It will be helpful to use the notation for collections of variables, already
present in (4.41) more systematically: for any subset W ⊂ X ∪ Y we write jW for
the collection of variables jy with y ∈ W . The Kronecker-Delta δ(jW ) is defined to
0, and one otherwise. By jW · ΓW W 0 · kW 0 we mean the
be zero if for any y ∈ W jy 6= P
suitably restricted sum, i.e., x∈W,y∈W 0 jx Γxy ky . The important sets to which we
65 4.4. Quantum error correction
Since F is localized on Z, the matrix element contains a factor δjy ,ky for every
y ∈ Y \ Z = Y 0 , so we can write hjY |F |kY i = hjZ |F |kZ iδ(jY 0 − kY 0 ). Therefore we
can compute the sum (4.44) in stages:
X
hjX |VΓ∗ F VΓ |kX i = hjZ |F |kZ iS(jX 0 , kX 0 ) , (4.45)
jZ ,kZ
where S(jX 0 , kX 0 ) is the sum over the Y 0 -variables, which, of course, still depends
on the input variables jX , kX and the variables jZ , kZ at the error positions:
³ ´
X iπ
kX∪Y ·Γ·kX∪Y −jX∪Y ·Γ·jX∪Y
−n d
S(jX 0 , kX 0 ) = d δ(jY 0 − kY 0 )e (4.46)
jY 0 ,kY 0
The sums in the exponent can each be split into four parts according to the de-
composition X 0 vs. Y 0 . The terms involving ΓY 0 Y 0 cancel because kY 0 = jY 0 . The
terms involving ΓX 0 Y 0 and ΓY 0 X 0 are equal because Γ is symmetric, and together
give 2jY 0 · ΓY 0 X 0 · (kX 0 − jX 0 ). The ΓX 0 X 0 remain unchanged, but only give a phase
factor independent of the summation variables. Hence
¡ ¢X
iπ 2πi
S(jX 0 , kX 0 ) = d−n e d kX 0 ·Γ·kX 0 −jX 0 ·Γ·jX 0 e d jY 0 ·ΓY 0 X 0 ·(kX 0 −jX 0 )
jY 0
¡ ¢
iπ 0
kX 0 ·Γ·kX 0 −jX 0 ·Γ·jX 0
= d−n e d d|Y |
δ(ΓY 0 X 0 · (kX 0 − jX 0 ))
¡ ¢
0 iπ
kX 0 ·Γ·kX 0 −jX 0 ·Γ·jX 0
= d−n+|Y | e d δ(kX 0 − jX 0 )
0
−n+|Y |
= d δ(kX 0 − jX 0 ) . (4.47)
Here we used at the first equation that the sum is a product of geometric series
as they appear in discrete Fourier transforms.
P At the second equality the main
condition of the Proposition enters: if x∈X 0 Γyx · (kx − jx ) vanishes for all y ∈ Y 0
as required by the delta-function then (and only then) the vector kX 0 − jX 0 must
vanish. But then the two terms in the exponent of the phase factor also cancel.
Inserting this result into (4.45), and using that δ(hX 0 ) = δ(hX )δ(hZ ), we find
0 X
hjX |VΓ∗ F VΓ |kX i = δ(jX − kX ) d−n+|Y | hjZ |F |jZ i
jZ
X
−n
= δ(jX − kX ) d hjY |F |jY i
jY
Here the error operator is considered in the first line as an operator on HZ , and as
an operator on HY in the second line, by tensoring it with 1IY 0 . This cancels the
0
dimension factor d|Y | 2
All that is left to get an error correcting code is to ensure that the conditions
of this Theorem are satisfied sufficiently often. This is evident from combining the
above Theorem with Definition 4.4.1.
4. Basic tasks 66
Corollary 4.4.3 Let Γ be a graph as in the previous Proposition, and suppose that
the (Y \ Z) × (X ∪ Z)-submatrix of Γ is non-singular for all Z ⊂ Y with up to 2f
elements. Then the code associated to Γ corrects f errors.
Two particular examples (which are equivalent!) are given in Figure 4.5. In both
cases we have N = 1, M = 5 and K = 1 i.e. one input node, which can be chosen
arbitrarily, five output nodes and the corresponding codes correct one error.
4.5 Quantum computing
Quantum computing is without a doubt the most prominent and most far reaching
application of quantum information theory, since it promises on the one hand, “ex-
ponential speedup” for some problems which are “hard to solve” with a classical
computer, and gives completely new insights into classical computing and complex-
ity theory on the other. Unfortunately, an exhaustive discussion would require its
own review article. Hence we we are only able to give a short overview (see Part II
of [172] for a more complete presentation and for further references).
4.5.1 The network model of classical computing
Let us start with a brief (and very informal) introduction to classical computing (for
a more complete review and hints for further reading see Chapter 3 of [172]). What
we need first is a mathematical model for computation. There are in fact several
different choices and the Turing machine [212] is the most prominent one. More
appropriate for our purposes is, however, the so called network model, since it allows
an easier generalization to the quantum case. The basic idea is to interpret a classical
(deterministic) computation as the evaluation of a map f : BN → BM (where
B = {0, 1} denotes the field with two elements) which maps N input bits to M
output bits. If M = 1 holds f is called a boolean function and it is for many purposes
sufficient to consider this special case – each general f is in fact a Cartesian product
of boolean functions. Particular examples are the three elementary gates AND, OR
and NOT defined in Figure 4.6 and arbitrary algebraic expressions constructed
from them: e.g. the XOR gate (x, y) 7→ x + y mod 2 which can be written as
(x ∨ y) ∧ ¬(x ∧ y). It is now a standard result of boolean algebra that each boolean
function can be represented in this way and there are in general many possibilities
to do this. A special case is the disjunctive normal form of f ; cf [225]. To write
such an expression down in form of equations is, however, somewhat confusing. f
is therefore expressed most conveniently in graphical form as a circuit or network,
i.e. a graph C with nodes representing elementary gates and edges (“wires”) which
determine how the gates should be composed; cf. Figure 4.7 for an example. A
a a
c c a b
b b
a b c a b c
0 0 0 0 0 0 a b
1 0 0 1 0 1 0 1
0 1 0 0 1 1 1 0
1 1 1 1 1 1
c = ab c = a + b − ab b=1−a
AND, ∧ OR, ∨ NOT, ¬
Figure 4.6: Symbols and definition for the three elementary gates AND, OR and
NOT.
67 4.5. Quantum computing
x + y mod 2
is that each circuit CN allows only the computation of a boolean function fN : BN → B which
acts on input data of length N . Since we are interested in answers for arbitrary finite length inputs
a sequence CN , N ∈ N of circuits with appropriate uniformity properties is needed; cf. [177] for
details.
4. Basic tasks 68
depends on the set of elementary operations we choose, e.g. the set of elementary
gates in the network model. It is therefore useful to divide computational problems
into complexity classes whose definitions do not suffer under model dependent as-
pects. The most fundamental one is the class P which contains all problems which
can be computed in “polynomial time”, i.e. t is, as a function of L, bounded from
above by a polynomial. The model independence of this class is basically the con-
tent of the strong Church Turing hypotheses which states, roughly speaking, that
each model of computation can be simulated in polynomial time on a probabilistic
Turing machine.
Problems of class P are considered “easy”, everything else is “hard”. However
even if a (decision) problem is hard the situation is not hopeless. E.g. consider
the factoring problem fac described above. It is generally believed (although not
proved) that this problem is is not in class P. But if somebody gives us a divisor
p < l of m it is easy to check whether p is really a factor, and if the answer is
true we have computed fac(m, l). This example motivates the following definition:
A decision problem f is in class NP (“nondeterministic polynomial time”) if there
is a boolean function f 0 in class P such that f 0 (x, y) = 1 for some y implies f (x). In
our example fac0 is obviously defined by fac0 (m, l, p) = 1 ⇔ p < l and p is a devisor
of m. It is obvious that P is a subset of NP the other inclusion however is rather
nontrivial. The conjecture is that P 6= NP holds and great parts of complexity
theory are based on it. Its proof (or disproof) however represents one of the biggest
open questions of theoretical informatics.
To introduce a third complexity class we have to generalize our point of view
slightly. Instead of a function f : BN → BM we can look at a noisy classical T
which sends the input value x ∈ BN to a probability distribution Txy , y ∈ BM on
BM (i.e. Txy is the transition matrix of the classical channel T ; cf. Subsection 3.2.3).
Roughly speaking, we can interpret such a channel as a probabilistic computation
which can be realized as a circuit consisting of “probabilistic gates”. This means
there are several different ways to proceed at each step and we use a classical random
number generator to decide which of them we have to choose. If we run our device
several times on the same input data x we get different results y with probability
Txy . The crucial point is now that we can allow some of the outcomes to be wrong
as long as there is an easy way (i.e. a class P algorithm) to check the validity of
the results. Hence we define BPP (“bounded error probabilistic polynomial time”)
as the class of all decision problems which admit a polynomial time probabilistic
algorithm with error probability less than 1/2 − ² (for fixed ²). It is obvious that
P ⊂ BPP holds but the relation between BPP and NP is not known.
4.5.3 Reversible computing
In the last subsection we have discussed the time needed to perform a certain
computation. Other physical quantities which seem to be important are space and
energy. Space can be treated in a similar way as time and there are in fact space-
related complexity classes (e.g PSPACE which stands for “polynomial space”).
Energy, however, is different, because it turns surprisingly out that it is possible to
do any calculation without expending any energy! One source of energy consumption
in a usual computer is the intrinsic irreversibility of the basic operations. E.g. a basic
gate like AND maps two input bits to one output bit, which obviously implies that
the input can not be reconstructed from the output. In other words: one bit of
information is erased during the operation of the AND gate, hence a small amount
of energy is dissipated to the environment. A thermodynamic analysis, known as
Landauer’s principle, shows that this energy loss is at least kB T ln 2, where T is the
temperature of the environment [148].
If we want to avoid this kind of energy dissipation we are restricted to reversible
processes, i.e. it should be possible to reconstruct the input data from the output
69 4.5. Quantum computing
U1 H
U2 U1 H
U3 U2 U1 H
· ¸ · ¸
1 1 1 1 0
H=√ Uk = −k
2 1 −1 0 e2 π
Figure 4.9: Quantum circuit for the discrete Fourier transform on a 4-qubit register.
1. The first step is in most cases preprocessing of the input data on a classical
computer. E.g. the Shor algorithm for the factoring problem does not work if
the input number m is a pure prime power. However in this case there is an
efficient classical algorithm. Hence we have to check first whether m is of this
particular form and use this classical algorithm where appropriate.
2. In the next step e have to prepare the quantum register based on these pre-
processed data. This means in the most simple case to write classical data,
i.e. to prepare the state |xi ∈ H⊗N if the (classical) input is x ∈ BN . In many
cases however it might be more intelligent to use a superposition of several
|xi, e.g. the state
1 X
Ψ= √ |xi, (4.48)
2N x∈BN
which represents actually the superposition of all numbers the registers can
represent – this is indeed the crucial point of quantum computing and we
come back to it below.
3. Now we can apply the quantum circuit C to the input state ψ and after the
calculation we get the output state U ψ, where U is the unitary represented
by C.
4. To read out the data after the calculation we perform a von Neumann mea-
surement in the computational basis, i.e. we measure the observable given by
the one dimensional projectors |xihx|, x ∈ BN . Hence we get x ∈ BN with
probability PN = |hψ|xi|2 .
71 4.5. Quantum computing
So, why is quantum computing potentially useful? First of all, a quantum com-
puter can perform at least as good as a classical computer. This follows immediately
from our discussion of reversible computing in Subsection 4.5.3 and the fact that
any invertible function f : BN → BN defines a unitary by Uf : |xi 7→ |f (x)i (the
quantum CNOT gate in Figure 4.8 arises exactly in this way from the classical
CNOT). But, there is on the other hand strong evidence which indicates that a
quantum computer can solve problems in polynomial time which a classical com-
puter can not. The most striking example for this fact is the Shor algorithm, which
provides a way to solve the factoring problem (which is most probably not in class
P) in polynomial time. If we introduce the new complexity class BQP of decision
problems which can be solved with high probability and in polynomial time with a
quantum computer, we can express this conjecture as BPP 6= BQP.
The mechanism which gives a quantum computer its potential power is the
ability to operate not just on one value x ∈ BN , but on whole superpositions
of values, as already mentioned in step 2 above. E.g. consider a, not necessarily
invertible, map f : BN → BM and the unitary operator Uf
H⊗N ⊗ H⊗M 3 |xi ⊗ |0i 7→ Uf |xi ⊗ |0i = |xi ⊗ |f (x)i ∈ H⊗N ⊗ H⊗M . (4.49)
If we let act Uf on a register in the state Ψ ⊗ |0i from Equation (4.48) we get the
result
1 X
Uf (Ψ ⊗ |0i) = √ |xi ⊗ |f (x)i. (4.50)
2N x∈BN
Hence a quantum computer can evaluate the function f on all possible arguments
x ∈ BN at the same time! To benefit from this feature – usually called quantum
parallelism – is, however, not as easy as it looks like. If we perform a measurement
on Uf (Ψ ⊗ |0i) in the computational basis we get the value of f for exactly one
argument and the rest of the information originally contained in Uf (Ψ ⊗ |0i) is
destroyed. In other words it is not possible to read out all pairs (x, f (x)) from
Uf (Ψ ⊗ |0i) and to fill a (classical) lookup table with them. To take advantage
from quantum parallelism we have to use a clever algorithm within the quantum
computation step (step 3 above). In the next section we will consider a particular
example for this.
Before we come to this point, let us give some additional comments which link
this section to other parts of quantum information. The first point concerns entan-
glement. The state Uf (Ψ ⊗ |0i) is highly entangled (although Ψ is separable since
£ ¤⊗N
Ψ = 2−1/2 (|0i + |1i) ), and this fact is essential for the “exponential speedup” of
computations we could gain in a quantum computer. In other words, to outperform
a classical computer, entanglement is the most crucial resource – this will become
more transparent in the next section. The second remark concerns error correction.
Up to now we have implicitly assumed that all components of a quantum computer
work perfectly without any error. In reality however decoherence effects make it
impossible to realize unitarily implemented operations, and we have to deal with
noisy channels. Fortunately it is possible within quantum information to correct at
least a certain amount of errors, as we have seen in Section 4.4). Hence unlike an
4. Basic tasks 72
analog computer2 a quantum computer can be designed fault tolerant, i.e. it can
work with imperfectly manufactured components.
4.5.5 Simons problem
We will consider now a particular problem (known as Simons problem; cf. [196])
which shows explicitly how a quantum computer can speed up a problem which is
hard to solve with a classical computer. It does not fit however exactly into the gen-
eral scheme sketched in the last subsection, because a quantum “oracle” is involved,
i.e. a black box which performs an (a priori unknown) unitary transformation on
an input state given to it. The term “oracle” indicates here that we are not in-
terested in the time the black box needs to perform the calculation but only in
the number of times we have to access it. Hence this example does not prove the
conjecture BPP 6= BQP stated above. Other quantum algorithms which we have
not the room here to discuss include: the Deutsch [69] and Deutsch-Josza problem
[70], the Grover search algorithm [103, 102] and of course Shor’s factoring algorithm
[192, 193].
Hence let us assume that our black box calculates the unitary Uf from Equation
(4.49) with a map f : BN → BN which is two to one and has period a, i.e. f (x) =
f (y) iff y = x + a mod 2. The task is to find a. Classically, this problem is hard, i.e.
we have to query the oracle exponentially often. To see this note first that we have
to find a pair (x, y) with f (x) = f (y) and the probability to get it with two random
queries is 2−N (since there is for each x exactly one y 6= x with f (x) = f (y)). If we
use the box 2N/4 times, we get less than 2N/2 different pairs. Hence the probability
to get the correct solution is 2−N/2 , i.e. arbitrarily small even with exponentially
many queries.
Assume now that we let our box act on a quantum register H ⊗N ⊗ H⊗N in the
state Ψ ⊗ |0i with Ψ from Equation (4.48) to get Uf (Ψ ⊗ |0i) from (4.50). Now
we measure the second register. The outcome is one of 2N −1 possible values (say
f (x0 )), each of which occurs equiprobable. Hence, after the measurement the first
register is the state 2−1/2 (|xi + |x + ai). Now we let a Hadamard gate H (cf. Figure
4.9) act on each qubit of the first register and the result is (this follows with a short
calculation)
1 ¡ ¢ 1 X
√ H ⊗N |xi + |x + ai = √ (−1)x·y |yi (4.51)
2 2 N −1
a·y=0
where the dot denotes the (B-valued) scalar product in the vector space B^N. Now we perform a measurement on the first register (in the computational basis) and get a y ∈ B^N with the property y · a = 0; each such y appears as an outcome with probability 2^(1−N). If we repeat this procedure until we have collected N − 1 linearly independent values yj, we can determine a as the unique nonzero solution of the system of equations y1 · a = 0, ..., y_{N−1} · a = 0. Therefore the success probability can be made arbitrarily close to one while the number of times we have to access the box is linear in N.
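The classical post-processing just described, solving the system yj · a = 0 over the field B, can be sketched as follows (the function name and interface are our own illustration, not part of the text):

```python
import numpy as np

def solve_hidden_period(ys, N):
    """Given measured vectors y with y.a = 0 (mod 2), recover the
    nonzero period a by Gaussian elimination over GF(2)."""
    A = np.array(ys, dtype=np.uint8) % 2
    rank, pivots = 0, []
    for col in range(N):
        rows = [r for r in range(rank, len(A)) if A[r, col] == 1]
        if not rows:
            continue
        A[[rank, rows[0]]] = A[[rows[0], rank]]      # move a pivot row up
        for r in range(len(A)):
            if r != rank and A[r, col] == 1:
                A[r] = (A[r] + A[rank]) % 2          # eliminate the column
        pivots.append(col)
        rank += 1
    # any free (non-pivot) column yields a nonzero solution a
    free = [c for c in range(N) if c not in pivots][0]
    a = np.zeros(N, dtype=np.uint8)
    a[free] = 1
    for i, col in enumerate(pivots):
        a[col] = A[i, free]                          # back-substitute mod 2
    return a
```

With N − 1 linearly independent vectors yj the reduced system has a one-dimensional null space, so the returned a is the period.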
4.6 Quantum cryptography
Finally we want to take a brief look at quantum cryptography – another more practical application of quantum information, which has the potential to emerge into technology in the not so distant future (see e.g. [130, 126, 44] for some experimental realizations and [97] for a more detailed overview). Hence let us assume that Alice has a message x ∈ B^N which she wants to send secretly to Bob over a public communication channel. One way to do this is the so called “one-time pad”: Alice generates randomly a second bit-string y ∈ B^N of the same length as x and sends x + y (addition mod 2 in each digit) to Bob, who can recover x by adding y again. This scheme is secure as long as the key y is known only to Alice and Bob and used only once; the remaining problem, the secure distribution of y, is exactly the task quantum cryptography solves:
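As an illustration, the one-time pad itself is only a few lines of code (a sketch; the helper names are ours):

```python
import secrets

def one_time_pad(x: bytes) -> tuple[bytes, bytes]:
    """Encrypt x with a fresh random key y of the same length;
    ciphertext x + y and key y are returned separately."""
    y = secrets.token_bytes(len(x))            # random key y
    c = bytes(a ^ b for a, b in zip(x, y))     # x + y, bitwise mod 2
    return c, y

def otp_decrypt(c: bytes, y: bytes) -> bytes:
    # adding y again recovers x, since (x + y) + y = x mod 2
    return bytes(a ^ b for a, b in zip(c, y))
```

The cipher is information-theoretically secure precisely because y is random, as long as x itself, and used only once.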
1. Assume that Alice wants to transmit bits from the (randomly generated) key y ∈ B^N through an ideal quantum channel to Bob. Before they start they settle upon two orthonormal bases e0, e1 ∈ H, respectively f0, f1 ∈ H, which are mutually nonorthogonal, i.e. |⟨ej, fk⟩| ≥ ε > 0 with ε large enough for each j, k = 0, 1. If photons are used as information carriers a typical choice is linearly polarized photons with polarization directions rotated by 45° against each other.
2. To send one bit j ∈ B Alice now selects at random one of the two bases, say e0, e1, and then she sends a qubit in the state |ej⟩⟨ej| through the channel. Note that neither Bob nor a potential eavesdropper knows which basis she has chosen.
3. When Bob receives the qubit he selects, as Alice did before, at random a basis and performs the corresponding von Neumann measurement to get one classical bit k ∈ B, which he records together with the measurement basis.
4. After the transmission Alice and Bob compare, via the classical channel, the bases they have used and discard all bits where the two bases differ. The rate of successfully transmitted bits per bit sent is obviously 1/2. Hence Alice has to send approximately twice as many bits as they need.
To see why this procedure is secure, assume now that the eavesdropper Eve can listen to and modify the information sent through the quantum channel and that she can listen on the classical channel but cannot modify it (we come back to this restriction in a minute). Hence Eve can intercept the qubits sent by Alice and make two copies of each. One she forwards to Bob and the other she keeps for later analysis. Due to the no-cloning theorem, however, she has produced errors in both copies, and the quality of her own copy decreases if she tries to make the error in Bob’s as small as possible. Even if Eve knows about the two bases e0, e1 and f0, f1 she does not know which one Alice uses to send a particular qubit³. Hence Eve has to decide randomly which basis to choose (as Bob does). If e0, e1 and f0, f1 are chosen optimally, i.e. |⟨ej, fk⟩|² = 1/2, it is easy to see that the error rate Eve necessarily produces by randomly measuring in one of the bases is 1/4 for large N. To detect this error Alice and Bob simply have to sacrifice portions of the generated key and to compare randomly selected bits using their classical channel. If the error rate they detect is too big they can decide to drop the whole key and restart from the beginning.
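The 1/4 error rate produced by this intercept-resend attack can be verified by summing Born-rule probabilities over all basis and outcome combinations (an illustrative sketch, not part of the text):

```python
import numpy as np

# Two mutually unbiased polarization bases: horizontal/vertical and 45°-rotated.
Z = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
X = [np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]
BASES = [Z, X]

def prob(state, basis, outcome):
    """Born rule probability for a von Neumann measurement."""
    return float(abs(basis[outcome] @ state) ** 2)

def intercept_resend_error():
    """Exact error rate on the sifted key (Alice's and Bob's bases agree)
    when Eve measures every qubit in a random basis and resends."""
    err = 0.0
    for a_basis in (0, 1):             # Alice's basis; after sifting Bob
        for bit in (0, 1):             # measures in the same one
            psi = BASES[a_basis][bit]
            for e_basis in (0, 1):     # Eve's random choice, prob 1/2
                for e_out in (0, 1):
                    p_eve = prob(psi, BASES[e_basis], e_out)
                    p_wrong = prob(BASES[e_basis][e_out], BASES[a_basis], 1 - bit)
                    err += 0.25 * 0.5 * p_eve * p_wrong   # uniform over the
    return err                                            # four (basis, bit) cases
```

Whenever Eve happens to pick the wrong basis (probability 1/2) she disturbs the qubit, and Bob then reads a wrong bit with probability 1/2, giving the total of 1/4.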
Finally, let us discuss a situation where Eve is able to intercept both the quantum and the classical channel. This would imply that she can play Bob’s part for Alice and Alice’s for Bob. As a result she shares one key with Alice and one with Bob. Hence she can decode all secret data Alice sends to Bob, read it, and finally encode it again to forward it to Bob. To secure against such a “woman in the middle attack”, Alice and Bob can use classical authentication protocols which ensure that the correct person is at the other end of the line. This implies that they need a small amount of initial secret material, which can however be renewed from the new key they have generated through quantum communication.
3 If Alice and Bob use only one basis to send the data and Eve knows about it, she can of course produce ideal copies of the qubits. This is actually the reason why two nonorthogonal bases are necessary.
Chapter 5
Entanglement measures
In the last chapter we have seen that entanglement is an essential resource for
many tasks of quantum information theory, like teleportation or quantum compu-
tation. This means that entangled states are needed for the functioning of many
processes and that they are consumed during operation. It is therefore necessary
to have measures which tell us whether the entanglement contained in a number
of quantum systems is sufficient to perform a certain task. What makes this sub-
ject difficult, is the fact that we can not restrict the discussion to systems in a
maximally or at least highly entangled pure state. Due to unavoidable decoherence
effects, realistic applications have to deal with imperfect systems in mixed states, and it is exactly in this situation that the question of the amount of available entanglement is interesting.
5.1 General properties and definitions
The difficulties arising if we try to quantify entanglement can be divided, roughly
speaking, into two parts: Firstly we have to find a reasonable quantity which de-
scribes exactly those properties which we are interested in and secondly we have to
calculate it for a given state. In this section we will discuss the first problem and
consider several different possibilities to define entanglement measures.
5.1.1 Axiomatics
First of all, we will collect some general properties which a reasonable entanglement
measure should have (cf. also [24, 216, 215, 217, 121]). To quantify entanglement means nothing else but to associate a positive real number to each state of a (finite dimensional) two-partite system.
Axiom E0 An entanglement measure is a function E which assigns to each state
ρ of a finite dimensional bipartite system a positive real number E(ρ) ∈ R + .
Note that we have glossed over some mathematical subtleties here, because E is not just defined on the state space of B(H ⊗ K) for particularly chosen Hilbert spaces H and K – E is defined on any such state space for arbitrary finite dimensional H and K. This is expressed mathematically most conveniently by a family of functions which behaves naturally under restrictions (i.e. the restriction to a subspace H0 ⊗ K0 coincides with the function belonging to H0 ⊗ K0). However we will see soon that we can safely ignore this problem.
The next point concerns the range of E. If ρ is unentangled E(ρ) should be
zero of course and it should be maximal on maximally entangled states. But what
happens if we allow the dimensions of H and K to grow? To get an answer consider first a pair of qubits in a maximally entangled state ρ. It should contain exactly one bit of entanglement, i.e. E(ρ) = 1, and N pairs in the state ρ^⊗N should contain N bits. If we interpret ρ^⊗N as a maximally entangled state of an H ⊗ H system with H = (C²)^⊗N we get E(ρ^⊗N) = log2(dim H) = N, where we have to reshuffle in ρ^⊗N the tensor factors such that (C² ⊗ C²)^⊗N becomes (C²)^⊗N ⊗ (C²)^⊗N (i.e. “all Alice particles to the left and all Bob particles to the right”; cf. Section 4.3). This
observation motivates the following.
Axiom E1 (Normalization) E vanishes on separable states and takes its maximum on maximally entangled states. More precisely, this means that E(σ) ≤ E(ρ) = log2(d) for ρ, σ ∈ S(H ⊗ H) with dim H = d and ρ maximally entangled.
One thing an entanglement measure should tell us, is how much quantum infor-
mation can be maximally teleported with a certain amount of entanglement, where
this maximum is taken over all possible teleportation schemes and distillation pro-
tocols, hence it can not be increased further by additional LOCC operations on the
entangled systems in question. This consideration motivates the following Axiom.
Axiom E2 (LOCC monotonicity) E can not increase under LOCC operation,
i.e. E[T (ρ)] ≤ E(ρ) for all states ρ and all LOCC channels T .
A special case of LOCC operations are of course local unitary operations U ⊗ V. Axiom E2 implies E(U ⊗ V ρ U* ⊗ V*) ≤ E(ρ) and, on the other hand, E(U* ⊗ V* ρ̃ U ⊗ V) ≤ E(ρ̃); hence with ρ̃ = U ⊗ V ρ U* ⊗ V* we get E(ρ) ≤ E(U ⊗ V ρ U* ⊗ V*) and therefore E(ρ) = E(U ⊗ V ρ U* ⊗ V*). We fix this property as a weakened version of Axiom E2:

Axiom E3 (Local unitary invariance) For each state ρ and all local unitaries U, V we have E(U ⊗ V ρ U* ⊗ V*) = E(ρ).
The last point we have to consider here are additivity properties: Since we are looking at entanglement as a resource, it is natural to assume that we can do twice as much with two pairs in the state ρ as with one, or more precisely E(ρ ⊗ ρ) = 2E(ρ) (in ρ ⊗ ρ we have to reshuffle tensor factors again; see above).
Axiom E5 (Additivity) For any pair of two-partite states ρ, σ ∈ S(H ⊗ K) we
have E(σ ⊗ ρ) = E(σ) + E(ρ).
Unfortunately this rather natural looking axiom seems to be too strong (it excludes reasonable candidates). It should however always be true that entanglement cannot increase if we put two pairs together.
Axiom E5a (Subadditivity) For any pair of states ρ, σ we have E(ρ ⊗ σ) ≤
E(ρ) + E(σ).
There are further modifications of additivity available in the literature. Most
frequently used is the following, which restricts Axiom E5 to the case ρ = σ:
Axiom E5b (Weak additivity) For any state ρ of a bipartite system we have N^(−1) E(ρ^⊗N) = E(ρ).

Finally, the weakest version of additivity only deals with the behavior of E for large tensor products, i.e. ρ^⊗N for N → ∞.

Axiom E5c (Existence of a regularization) For each state ρ the limit

  E^∞(ρ) = lim_{N→∞} E(ρ^⊗N)/N   (5.2)

exists.
5.1.2 Pure states
Let us consider now a pure state ρ = |ψ⟩⟨ψ| ∈ S(H ⊗ K). If it is entangled its partial trace σ = trH |ψ⟩⟨ψ| is mixed (trK |ψ⟩⟨ψ| has the same spectrum), and for a maximally entangled state it is maximally mixed. This suggests to use the von Neumann entropy¹ of the partial trace, which measures how much a state is mixed, as an entanglement measure for pure states, i.e. we define [17, 24]

  EvN(ρ) = −tr[ trH ρ ln(trH ρ) ].   (5.3)
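Definition (5.3) can be evaluated directly from the coefficient matrix of ψ; the sketch below (helper names are ours) uses log2 instead of ln, so that the normalization of Axiom E1 – exactly one bit per maximally entangled qubit pair – holds:

```python
import numpy as np

def reduced_entropy(psi, dA, dB):
    """E_vN of a pure state psi in C^dA (x) C^dB: the von Neumann
    entropy (in bits) of the partial trace tr_B |psi><psi|."""
    M = psi.reshape(dA, dB)                      # coefficient matrix psi_{jk}
    lam = np.linalg.eigvalsh(M @ M.conj().T)     # spectrum of the partial trace
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log2(lam)).sum())

bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # maximally entangled pair
```

The eigenvalues of M M† are exactly the squared Schmidt coefficients of ψ, which connects this computation to the majorization criterion below.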
It is easy to deduce from the properties of the von Neumann entropy that EvN satisfies Axioms E0, E1, E3 and E5b. Somewhat more difficult is only Axiom E2, which follows however from a nice theorem of Nielsen [169] which relates LOCC operations (on pure states) to the theory of majorization. To state it here we need first some terminology. Consider two probability distributions λ = (λ1, ..., λM) and µ = (µ1, ..., µN), both given in decreasing order (i.e. λ1 ≥ ... ≥ λM and µ1 ≥ ... ≥ µN). We say that λ is majorized by µ, in symbols λ ≺ µ, if

  Σ_{j=1}^k λj ≤ Σ_{j=1}^k µj  for all k = 1, ..., min{M, N}   (5.4)
holds. Now we have the following result (see [169] for a proof).
Theorem 5.1.1 A pure state ψ = Σ_j λj^(1/2) ej ⊗ e′j ∈ H ⊗ K can be transformed into another pure state φ = Σ_j µj^(1/2) fj ⊗ f′j ∈ H ⊗ K via a LOCC operation iff the Schmidt coefficients of ψ are majorized by those of φ, i.e. λ ≺ µ.
The von Neumann entropy of the restriction trH |ψ⟩⟨ψ| can be immediately calculated from the Schmidt coefficients λ of ψ by EvN(|ψ⟩⟨ψ|) = −Σ_j λj ln(λj). Axiom E2 follows therefore from the fact that the entropy S(λ) = −Σ_j λj ln(λj) of a probability distribution λ is a Schur concave function, i.e. λ ≺ µ implies S(λ) ≥ S(µ); see [171].
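The majorization criterion (5.4) and the Schur concavity argument are easy to check numerically; for instance λ = (1/2, 1/2) ≺ µ = (3/4, 1/4), so by Theorem 5.1.1 a Bell pair can be converted by LOCC into the less entangled state with Schmidt coefficients µ, and the entropy indeed decreases (a sketch with our own helper names):

```python
import numpy as np

def majorized(lam, mu):
    """Check lam ≺ mu for two probability vectors, Eq. (5.4)."""
    lam = np.sort(np.asarray(lam, float))[::-1]
    mu = np.sort(np.asarray(mu, float))[::-1]
    k = min(len(lam), len(mu))
    return bool(np.all(np.cumsum(lam[:k]) <= np.cumsum(mu[:k]) + 1e-12))

def entropy(p):
    """S(p) = -sum p ln p, the Schur concave function used in the text."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())
```

Since S is Schur concave, `majorized(lam, mu)` implies `entropy(lam) >= entropy(mu)`, which is exactly the monotonicity of EvN under LOCC.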
Hence we have seen so far that EvN is one possible candidate for an entanglement
measure on pure states. In the following we will see that it is in fact the only
candidate which is physically reasonable. There are basically two reasons for this.
The first one deals with distillation of entanglement. It was shown by Bennett et al. [17] that each state ψ ∈ H ⊗ K of a bipartite system can be prepared out of (a possibly large number of) systems in an arbitrary entangled state φ by LOCC operations. To be more precise, we can find a sequence of LOCC operations

  TN : B[(H ⊗ K)^⊗M(N)] → B[(H ⊗ K)^⊗N]   (5.5)

such that

  lim_{N→∞} ||TN*(|φ⟩⟨φ|^⊗N) − |ψ⟩⟨ψ|^⊗M(N)||₁ = 0   (5.6)
1 We assume here and in the following that the reader is sufficiently familiar with entropies.
holds with a nonvanishing rate r = lim_{N→∞} M(N)/N. This is done either by distillation (r < 1 if ψ is more entangled than φ) or by “diluting” entanglement, i.e. creating many less entangled states from few highly entangled ones (r > 1).
All this can be performed in a reversible way: We can start with some maximally entangled qubits, dilute them to get many less entangled states, which can be distilled afterwards to get the original states back (again only in an asymptotic sense). The crucial point is that the asymptotic rate r of these processes is given in terms of EvN by r = EvN(|φ⟩⟨φ|)/EvN(|ψ⟩⟨ψ|). Hence we can say, roughly speaking, that EvN(|ψ⟩⟨ψ|) describes exactly the number of maximally entangled qubits contained in |ψ⟩⟨ψ|.
A second somewhat more formal reason is that EvN is the only entanglement
measure on the set of pure states which satisfies the axioms formulated above. In
other words the following “uniqueness theorem for entanglement measures” holds
[182, 217, 74]
Theorem 5.1.2 The reduced von Neumann entropy EvN is the only entanglement
measure on pure states which satisfies Axioms E0 – E5.
5.1.3 Entanglement measures for mixed states
To find reasonable entanglement measures for mixed states is much more difficult.
There are in fact many possibilities (e.g. the maximally entangled fraction intro-
duced in Subsection 3.1.1 can be regarded as a simple measure) and we want to
present therefore only four of the most reasonable candidates. Among those measures which we do not discuss here are negativity quantities ([220] and the references therein), the “best separable approximation” [151], the base norm associated with the set of separable states [219, 188] and ppt-distillation rates [185].
The first measure we want to present is oriented along the discussion of pure
states: We define, roughly speaking, the asymptotic rate with which maximally
entangled qubits can be distilled at most out of a state ρ ∈ S(H ⊗ K) as the
Entanglement of Distillation ED (ρ) of ρ; cf [20]. To be more precise consider all
possible distillation protocols for ρ (cf. Section 4.3), i.e. all sequences of LOCC
channels
TN : B(CdN ⊗ CdN ) → B(H⊗N ⊗ K⊗N ) (5.7)
such that
lim kTN∗ (ρ⊗N ) − |ΩN ihΩN | k1 = 0 (5.8)
N →∞
holds with a sequence of maximally entangled states ΩN ∈ C^dN ⊗ C^dN. Now we can define

  ED(ρ) = sup_{(TN)} limsup_{N→∞} log2(dN)/N.   (5.9)

The Entanglement of Formation EF [24] approaches the problem from the opposite direction: it is the convex extension of the pure state measure EvN to mixed states,

  EF(ρ) = inf { Σ_j λj EvN(|ψj⟩⟨ψj|) | ρ = Σ_j λj |ψj⟩⟨ψj| },

where the infimum is taken over all decompositions of ρ into a convex sum of pure states. EF satisfies E0–E4 and E5a (cf. [24] for E2 and [170] for E4; the rest follows directly from the definition). Whether EF is (weakly) additive (Axiom E5b) is not known. Furthermore it is conjectured that EF coincides with the Entanglement of Cost EC, the asymptotic rate of maximally entangled qubits needed to prepare ρ by LOCC. Proven, however, is only the identity EF^∞ = EC, where the existence of the regularization EF^∞ of EF follows directly from subadditivity.
Another idea to quantify entanglement is to measure the “distance” of the (entangled) ρ from the set of separable states D. It has turned out [216] that among all possible distance functions the relative entropy is physically most reasonable. Hence we define the relative entropy of entanglement as

  ER(ρ) = inf_{σ∈D} S(ρ|σ),  S(ρ|σ) = tr[ρ log2 ρ − ρ log2 σ],   (5.14)

where the infimum is taken over all separable states. It can be shown that ER satisfies, like EF, the Axioms E0–E4 and E5a, where E1 and E2 are shown in [216] and E4 in [73]; the rest follows directly from the definition. It is shown in [221] that ER does not satisfy E5b; cf. also Subsection 5.3. Hence the regularization ER^∞ of ER differs from ER.
Finally let us give some comments on the relation between the measures just introduced. On pure states all measures just discussed coincide with the reduced von Neumann entropy – this follows from Theorem 5.1.2 and the properties stated in the last Subsection. For mixed states the situation is more difficult. It can be shown however that ED ≤ EC holds and that all “reasonable” entanglement measures lie in between [121].
Theorem 5.1.3 For each entanglement measure E satisfying E0, E1, E2 and E5b
and each state ρ ∈ S(H ⊗ K) we have ED (ρ) ≤ E(ρ) ≤ EC (ρ).
Unfortunately no measure we have discussed in the last Subsection satisfies all the assumptions of the theorem. It is possible however to get a similar statement for the regularization E^∞ with weaker assumptions on E itself (in particular without assuming additivity); cf. [74].
5.2 Two qubits
Even more difficult than finding reasonable entanglement measures are explicit cal-
culations. All measures we have discussed above involve optimization processes over
spaces which grow exponentially with the dimension of the Hilbert space. A direct
numerical calculation for a general state ρ is therefore hopeless. There are however
some attempts to get bounds on entanglement measures or explicit calculations for special classes of states. We will restrict this discussion to some relevant special cases: on the one hand we concentrate on EF and ER, and on the other we look at two special classes of states where explicit calculations are possible: two qubit systems in this section and states with symmetry properties in the next one.
5.2.1 Pure states
Assume for the rest of this section that H = C² holds and consider first a pure state ψ ∈ H ⊗ H. To calculate EvN(ψ) is of course not difficult and it is straightforward to see that (cf. for all material of this and the following subsection [24])

  EvN(ψ) = H[ (1 + √(1 − C(ψ)²)) / 2 ]   (5.15)

holds, with the binary entropy

  H(x) = −x log2(x) − (1 − x) log2(1 − x)   (5.16)

and the concurrence C(ψ) of ψ, which is defined by

  C(ψ) = | Σ_{j=0}^3 αj² |  with  ψ = Σ_{j=0}^3 αj Φj,   (5.17)

where Φj, j = 0, ..., 3 denotes the Bell basis (3.3). Since C becomes rather important in the following, let us reexpress it as C(ψ) = |⟨ψ, Ξψ⟩|, where ψ ↦ Ξψ denotes complex conjugation in the Bell basis. Hence Ξ is an antiunitary operator and it can be written as the tensor product Ξ = ξ ⊗ ξ of the map H ∋ φ ↦ σ2 φ̄, where φ̄ denotes complex conjugation in the canonical basis and σ2 is the second Pauli matrix. Hence local unitaries (i.e. those of the form U1 ⊗ U2) commute with Ξ, and it can be shown that this is not only a necessary but also a sufficient condition for a unitary to be local [222].
We see from Equations (5.15) and (5.17) that C(ψ) ranges from 0 to 1 and that EvN(ψ) is a monotone function of C(ψ). The latter can therefore be considered as an entanglement quantity in its own right. For a Bell state we get in particular C(Φj) = 1, while a separable state φ1 ⊗ φ2 leads to C(φ1 ⊗ φ2) = 0; this can be seen easily with the factorization Ξ = ξ ⊗ ξ.

Assume now that one of the αj, say α0, satisfies |α0|² > 1/2. This implies that C(ψ) cannot be zero, since

  | Σ_{j=1}^3 αj² | ≤ Σ_{j=1}^3 |αj|² = 1 − |α0|²   (5.18)

must hold. Hence C(ψ) is at least 2|α0|² − 1 and this implies for EvN and arbitrary ψ

  EvN(ψ) ≥ h( |⟨Φ0, ψ⟩|² )  with  h(x) = H[1/2 + √(x(1−x))] for x ≥ 1/2 and h(x) = 0 for x < 1/2,   (5.19)
and since Φ0 can be replaced here by an arbitrary maximally entangled vector, we can take the supremum over all of them and get

  EvN(ψ) ≥ h[ F(|ψ⟩⟨ψ|) ],   (5.20)

where F(|ψ⟩⟨ψ|) is the maximally entangled fraction of |ψ⟩⟨ψ| which we have introduced in Subsection 3.1.1.
To see that even equality holds in Equation (5.20), note first that it is sufficient to consider the case ψ = a|00⟩ + b|11⟩ with a, b ≥ 0, a² + b² = 1, since each pure state ψ can be brought into this form by a local unitary transformation (this follows again from the Schmidt decomposition), which on the other hand does not change EvN. The maximally entangled state which maximizes |⟨ψ, Φ⟩|² is in this case Φ0 and we get F(|ψ⟩⟨ψ|) = (a + b)²/2 = 1/2 + ab. A straightforward calculation now shows that h[F(|ψ⟩⟨ψ|)] = h(1/2 + ab) = EvN(ψ) holds as stated.
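For two qubits the relations (5.15)–(5.17) are easy to verify numerically, using the spin-flip form C(ψ) = |⟨ψ, Ξψ⟩| with Ξψ = (σ2 ⊗ σ2)ψ̄ in the computational basis (a sketch; helper names are ours):

```python
import numpy as np

sy = np.array([[0, -1j], [1j, 0]])
XI = np.kron(sy, sy)          # together with conjugation this is the map Xi

def concurrence_pure(psi):
    # C(psi) = |<psi, Xi psi>| with Xi psi = (sy (x) sy) conj(psi)
    return float(abs(psi.conj() @ (XI @ psi.conj())))

def H2(x):
    """Binary entropy (5.16)."""
    if x <= 0 or x >= 1:
        return 0.0
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

def EvN_pure(psi):
    """Reduced von Neumann entropy of a two-qubit pure state."""
    M = psi.reshape(2, 2)
    lam = np.linalg.eigvalsh(M @ M.conj().T)   # spectrum (lam, 1 - lam)
    return H2(float(lam[0]))
```

For ψ = a|00⟩ + b|11⟩ one finds C(ψ) = 2ab, and EvN(ψ) agrees with H applied to (1 + √(1 − C²))/2 as claimed in (5.15).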
5.2.2 EOF for Bell diagonal states
It is easy to extend the inequality (5.20) to mixed states if we use the convexity of EF and the fact that EF coincides with EvN on pure states. Hence (5.20) becomes

  EF(ρ) ≥ h[ F(ρ) ].   (5.21)

Consider now a Bell diagonal state ρ with highest eigenvalue λ1 > 1/2. There is a decomposition ρ = Σ_j µj |Ψj⟩⟨Ψj| into pure states whose reduced von Neumann entropy each equals h(λ1) [24]; hence EF(ρ) ≤ Σ_j µj EvN(|Ψj⟩⟨Ψj|) = h(λ1). Since the maximally entangled fraction of ρ is obviously λ1, we see with (5.21) that equality holds, i.e. EF(ρ) = h(λ1).
Assume now that the highest eigenvalue is less than 1/2. Then we can find phase factors exp(iφj) such that Σ_{j=0}^3 exp(iφj) λj = 0 holds and ρ can be expressed as a convex linear combination of the states

  e^(iφ0/2) √λ0 Φ0 + Σ_{j=1}^3 (± e^(iφj/2) √λj) Φj.   (5.23)

By (5.17) each of these states has concurrence |Σ_j e^(iφj) λj| = 0, hence EF(ρ) = 0 for all Bell diagonal states with highest eigenvalue at most 1/2.
Figure 5.1: Entanglement of Formation and Relative Entropy of Entanglement for Bell diagonal states, plotted as a function of the highest eigenvalue λ of ρ.
with

  R = [ √ρ ΞρΞ √ρ ]^(1/2).   (5.26)

Here we have set ρ = |ψ⟩⟨ψ|. The definition of the hermitian matrix R however makes sense for arbitrary ρ as well. If we write λj, j = 1, ..., 4 for the eigenvalues of R, with λ1 without loss of generality the biggest one, we can define the concurrence of an arbitrary two qubit state ρ as [238]

  C(ρ) = max(0, 2λ1 − tr(R)) = max(0, λ1 − λ2 − λ3 − λ4).   (5.27)
It is easy to see that C(|ψ⟩⟨ψ|) coincides with C(ψ) from (5.17). The crucial point is now that Equation (5.15) holds for EF(ρ) if we insert C(ρ) instead of C(ψ):

Theorem 5.2.2 (Wootters Formula) The Entanglement of Formation of a two qubit system in a state ρ is given by

  EF(ρ) = H[ (1 + √(1 − C(ρ)²)) / 2 ],   (5.28)

where the concurrence of ρ is given in Equation (5.27) and H denotes the binary entropy from (5.16).
Since the proof is much more involved than the simple case discussed in Subsection 5.2.2 we omit it and refer to [238] instead. Note however that Equation (5.28) really coincides with the special cases we have derived for pure and Bell diagonal states. Finally let us add the remark that there is no analogue of Wootters’ formula for higher dimensional Hilbert spaces. It can be shown [222] that the essential properties of the Bell basis Φj, j = 0, ..., 3 which would be necessary for such a generalization are available only in 2 × 2 dimensions.
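Wootters’ formula lends itself to direct computation: the eigenvalues of R coincide with the square roots of the (real, nonnegative) eigenvalues of ρρ̃ with ρ̃ = ΞρΞ, which avoids taking operator square roots (an illustrative sketch, helper names are ours):

```python
import numpy as np

sy = np.array([[0, -1j], [1j, 0]])
XI = np.kron(sy, sy).real     # spin flip: rho~ = XI conj(rho) XI

def concurrence(rho):
    """Concurrence (5.27) of a two-qubit density matrix."""
    rho_t = XI @ rho.conj() @ XI
    lam = np.sort(np.sqrt(np.abs(np.linalg.eigvals(rho @ rho_t).real)))[::-1]
    return float(max(0.0, lam[0] - lam[1] - lam[2] - lam[3]))

def eof(rho):
    """Entanglement of Formation via Wootters' formula (5.28)."""
    c = min(concurrence(rho), 1.0)    # clip rounding noise
    x = (1 + np.sqrt(1 - c ** 2)) / 2
    if x >= 1.0:
        return 0.0
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))
```

For Bell diagonal states this reproduces C(ρ) = max(0, 2λ1 − 1) and hence the result of Subsection 5.2.2.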
5.2.4 Relative entropy for Bell diagonal states
Calculating the Relative Entropy of Entanglement ER for two qubit systems is more difficult. However, there is at least an easy formula for Bell diagonal states, which we will give in the following [216].
Proposition 5.2.3 The Relative Entropy of Entanglement of a Bell diagonal state ρ with highest eigenvalue λ is given by (cf. Figure 5.1)

  ER(ρ) = 1 − H(λ) for λ > 1/2, and ER(ρ) = 0 for λ ≤ 1/2.   (5.29)
Proof. For a Bell diagonal state ρ = Σ_{j=0}^3 λj |Φj⟩⟨Φj| we have to calculate

  ER(ρ) = inf_{σ∈D} tr[ρ log2 ρ − ρ log2 σ]   (5.30)
        = tr(ρ log2 ρ) + inf_{σ∈D} ( −Σ_{j=0}^3 λj ⟨Φj, log2(σ)Φj⟩ ).   (5.31)

Since log2 is a concave function we have −log2⟨Φj, σΦj⟩ ≤ ⟨Φj, −log2(σ)Φj⟩ and therefore

  ER(ρ) ≥ tr(ρ log2 ρ) + inf_{σ∈D} ( −Σ_{j=0}^3 λj log2⟨Φj, σΦj⟩ ).   (5.32)
Hence only the diagonal elements of σ in the Bell basis enter the minimization on the right hand side of this inequality, and this implies that we can restrict the infimum to the set of separable Bell diagonal states. Since a Bell diagonal state is separable iff all its eigenvalues do not exceed 1/2 (Proposition 5.2.1) we get

  ER(ρ) ≥ tr(ρ log2 ρ) + inf_{pj ∈ [0,1/2]} ( −Σ_{j=0}^3 λj log2 pj ),  with  Σ_{j=0}^3 pj = 1.   (5.33)
This is an optimization problem (with constraints) over only four real parameters and easy to solve. If the highest eigenvalue of ρ is greater than 1/2 we get p1 = 1/2 and pj = λj/(2 − 2λ) for j ≠ 1, where we have chosen without loss of generality λ = λ1. This gives a lower bound on ER(ρ) which is achieved if we insert the corresponding σ in Equation (5.31). Hence we have proven the statement for λ > 1/2, which completes the proof, since we have already seen that λ ≤ 1/2 implies that ρ is separable (Proposition 5.2.1). □
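The computation in the proof can be checked numerically: the relative entropy of a Bell diagonal ρ with respect to the optimal separable σ (p1 = 1/2, pj = λj/(2 − 2λ) for j ≠ 1) reduces to a classical relative entropy, since both states are diagonal in the Bell basis, and reproduces 1 − H(λ) (a sketch with our own helper names):

```python
import numpy as np

def H2(x):
    """Binary entropy (5.16)."""
    if x <= 0 or x >= 1:
        return 0.0
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

def er_bell_diagonal(lams):
    """Eq. (5.29): E_R of a Bell diagonal state with spectrum lams."""
    lam = max(lams)
    return 0.0 if lam <= 0.5 else 1.0 - H2(lam)

def kl2(p, q):
    """Classical relative entropy (base 2), applicable because rho and
    the optimal sigma are diagonal in the same (Bell) basis."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float((p[m] * np.log2(p[m] / q[m])).sum())
```

For instance for the spectrum (0.7, 0.15, 0.1, 0.05) the optimal σ has Bell-basis diagonal (0.5, 0.25, 1/6, 1/12), and the two expressions agree.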
The equality in the last equation is of course a non-trivial statement which has to be proved. We skip this point, however, and refer the reader to [221]. The advantage of this scheme rests on the fact that spaces of G-invariant states are in general very low dimensional (if G is not too small). Hence the optimization problem contained in step 3 has a much bigger chance to be tractable than the one we have to solve for the original definition of EF. There is of course no guarantee that any of these three steps can be carried out in a concrete situation. For the three examples mentioned above, however, results are available, which we will present in the following.
5.3.2 Werner states
Let us start with Werner states [221]. In this case ρ is uniquely determined by its flip expectation value tr(ρF) (cf. Subsection 3.1.2). To determine Φ ∈ H ⊗ H such that PUU|Φ⟩⟨Φ| = ρ holds, we therefore have to solve the equation

  ⟨Φ, FΦ⟩ = Σ_{jk} Φ̄jk Φkj = tr(Fρ),   (5.36)

where Φjk denote the components of Φ in the canonical basis. On the other hand the reduced density matrix tr1|Φ⟩⟨Φ| has the matrix elements Σ_l Φjl Φ̄kl.
By exploiting U ⊗ U invariance we can assume without loss of generality that this reduced density matrix is diagonal. Hence to get the function ε_UU we have to minimize

  EvN(|Φ⟩⟨Φ|) = Σ_j S( Σ_k |Φjk|² )   (5.37)

under the constraint (5.36), where S(x) = −x log2(x). We skip these calculations here (see [221] instead) and state only the results. For tr(Fρ) ≥ 0 we get ε_UU(ρ) = 0 (as expected, since ρ is separable in this case) and with H from (5.16)

  ε_UU(ρ) = H[ (1 − √(1 − tr(Fρ)²)) / 2 ]   (5.38)
85 5.3. Entanglement measures under symmetry
Figure 5.2: Entanglement of Formation for Werner states, plotted as a function of the flip expectation.
for tr(Fρ) < 0. The minima are attained for Φ where all Φjk except one diagonal element are zero in the case tr(Fρ) ≥ 0, and for Φ with only two (non-diagonal) coefficients Φjk, Φkj, j ≠ k nonzero if tr(ρF) < 0. The function ε_UU is convex and coincides therefore with its convex hull, such that we get

Proposition 5.3.1 For any Werner state ρ the Entanglement of Formation is given by (cf. Figure 5.2)

  EF(ρ) = H[ (1 − √(1 − tr(Fρ)²)) / 2 ] for tr(Fρ) < 0, and EF(ρ) = 0 for tr(Fρ) ≥ 0.   (5.39)
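Formula (5.39) is easily evaluated; as a consistency check, the totally antisymmetric Werner state with tr(Fρ) = −1 carries exactly one bit of entanglement (a sketch; the helper name is ours):

```python
import numpy as np

def H2(x):
    """Binary entropy (5.16)."""
    if x <= 0 or x >= 1:
        return 0.0
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

def ef_werner(f):
    """Eq. (5.39): Entanglement of Formation of a Werner state with
    flip expectation f = tr(F rho) in [-1, 1]."""
    if f >= 0:
        return 0.0
    return H2((1 - np.sqrt(1 - f * f)) / 2)
```

The function vanishes continuously at the separability boundary tr(Fρ) = 0 and increases monotonically towards 1 as tr(Fρ) → −1.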
with V = U1ᵀU2 and after inserting the definition of F̃. Following our general scheme, we have to minimize EvN(|Φ⟩⟨Φ|) under the constraint given in Equation
Figure 5.3: The function ε_UŪ for isotropic states (d = 2, 3, 4) plotted as a function of the flip expectation tr(ρF̃). For d > 2 it is not convex near the right endpoint.
(5.42). This is explicitly done in [210]. We will only state the result here, which leads to the function

  ε_UŪ(ρ) = H(γ) + (1 − γ) log2(d − 1) for tr(ρF̃) ≥ 1, and ε_UŪ(ρ) = 0 for tr(ρF̃) < 1,   (5.43)

with

  γ = (1/d²) [ √(tr(ρF̃)) + √((d − 1)(d − tr(ρF̃))) ]².   (5.44)
For d ≥ 3 this function is not convex (cf. Figure 5.3), hence we get

Proposition 5.3.2 For any isotropic state ρ the Entanglement of Formation is given as the convex hull

  EF(ρ) = inf { Σ_j λj ε_UŪ(σj) | ρ = Σ_j λj σj, PUŪ σj = σj }.   (5.45)
Figure 5.5: Relative Entropy of Entanglement for Werner states, plotted as a function of the flip expectation.
The sets of Werner and isotropic states are just intervals and the corresponding
separable states form subintervals over which we have to perform the optimization.
Due to the convexity of the relative entropy in both arguments, however, it is
clear that the minimum is attained exactly at the boundary between entangled and
separable states. For Werner states this is the state σ0 with tr(F σ0 ) = 0, i.e. it
gives equal weight to both minimal projections. To get ER (ρ) for a Werner state ρ
we have to calculate therefore only the relative entropy with respect to this state.
Since all Werner states can be simultaneously diagonalized this is easily done and we get:

  ER(ρ) = 1 − H( (1 + tr(Fρ))/2 ).   (5.46)
Similarly, the boundary point σ1 for isotropic states is given by tr(F̃σ1) = 1, which leads to

  ER(ρ) = log2 d − (1 − tr(F̃ρ)/d) log2(d − 1) − S( tr(F̃ρ)/d, 1 − tr(F̃ρ)/d )   (5.47)

for each entangled isotropic state ρ, and 0 if ρ is separable. (S(p1, p2) denotes here the entropy of the probability vector (p1, p2).)
[Figure: Relative Entropy of Entanglement for isotropic states for d = 2, 3, 4, plotted as a function of tr(ρF̃).]
Let us now consider OO-invariant states. As for EOF we divide the state space
into the separable square and the three triangles A, B, C; cf. Figure 5.4. The state
at the coordinates (1, d) is a maximally entangled state and all separable states on
the line connecting (0, 1) with (1, 1) minimize the relative entropy for this state.
Hence consider a particular state σ on this line. The convexity property of the
relative entropy immediately shows that σ is a minimizer for all states on the line
connecting σ with the state at (1, d). In this way it is easy to calculate ER(ρ) for
all ρ in A. In a similar way we can treat the triangle B: We just have to draw a line
from ρ to the state at (−1, 0) and find the minimizer for ρ at the intersection with
the separable border between (0, 0) and (0, 1). For all states in the triangle C the
relative entropy is minimized by the separable state at (0, 1).
An application of the scheme just reviewed is a proof that ER is not additive, i.e. it does not satisfy Axiom E5b. To see this consider the state ρ = tr(P−)⁻¹ P−, where P− denotes the projector onto the antisymmetric subspace. It is a Werner state with flip expectation −1 (i.e. it corresponds to the point (−1, 0) in Figure 5.4). According to our discussion above S(ρ| · ) is minimized in this case by the separable state σ0, and we get ER(ρ) = 1 independently of the dimension d. The tensor product ρ^⊗2 can be regarded as a state in S(H^⊗2 ⊗ H^⊗2) with U ⊗ U ⊗ V ⊗ V symmetry, where U, V are unitaries on H. Note that the corresponding state space of UUVV invariant states can be parameterized by the expectations of the three operators F ⊗ 1I, 1I ⊗ F and F ⊗ F (cf. [221]), and we can apply the machinery just described to get the minimizer σ̃ of S(ρ^⊗2| · ). If d > 2 holds it turns out that

  σ̃ = (d + 1)/(2d tr(P+)²) P+ ⊗ P+ + (d − 1)/(2d tr(P−)²) P− ⊗ P−   (5.48)
holds (where P± denote the projections onto the symmetric and antisymmetric subspaces of H ⊗ H) and not σ̃ = σ0 ⊗ σ0 as one would expect. As a consequence we get the inequality

  ER(ρ^⊗2) = 2 − log2( (2d − 1)/d ) < 2 = S(ρ^⊗2|σ0^⊗2) = 2 ER(ρ).   (5.49)
The cb-norm improves the sometimes annoying property of the usual operator norm that quantities like ||T ⊗ Id_B(C^d)|| may increase with the dimension d. On infinite dimensional observable algebras ||T||_cb can be infinite although each term in the supremum is finite. A particular example for a map with such behavior is the transposition on an infinite dimensional Hilbert space. A map with finite cb-norm is therefore called completely bounded. In a finite dimensional setup each linear map is completely bounded. For the transposition Θ on C^d we have in particular ||Θ||_cb = d. The cb-norm has some nice features which we will use frequently; this includes its multiplicativity ||T1 ⊗ T2||_cb = ||T1||_cb ||T2||_cb and the fact that ||T||_cb = 1 holds for each (unital) channel. Another useful relation is ||T||_cb = ||T ⊗ Id_B(H)||, which holds if T is a map B(H) → B(H). For more properties of the cb-norm we refer to [178].
91 6.1. Definition and elementary properties
where the infimum is taken over all encoding and decoding channels E : A2 → B2
respectively D : B1 → A1 . The map S plays the role of a reference channel and
∆(T, S) is the minimal error we have to take into account if we want to simulate S
by T and appropriate encodings and decodings. If we try in particular to transmit
B systems through T we have to choose B1 = B2 = B and S = IdB . In this case we
write
∆(T, B) = ∆(T, Id_B) = inf_{E,D} ‖E T D − Id_B‖cb .   (6.3)
In Section 4.4, we have seen that we can reduce the error if we take M copies of the
channel instead of just one. More generally we are interested in the transmission
of “codewords of length” N , i.e. B^⊗N systems, using M copies of the channel T. Encodings and decodings are in this case channels of the form E : A₂^⊗M → B^⊗N respectively D : B^⊗N → A₁^⊗M. If we increase the number M of channels the error
∆(T ⊗M , B ⊗N ) decreases provided the rate with which N grows as a function of
M is not too large. A more precise formulation of this idea leads to the following
definition.
Definition 6.1.1 A number c ≥ 0 is called achievable rate for a channel T with
respect to a reference channel S, if for any pair of sequences Mj , Nj , j ∈ N with
Mj → ∞ and lim sup_{j→∞} Nj /Mj < c we have

lim_{j→∞} ∆(T^⊗Mj , S^⊗Nj ) = 0.   (6.4)
The supremum of all achievable rates is called the capacity of T with respect to S
and denoted by C(T, S). If S is the ideal channel on an observable algebra B we
write C(T, B) instead of C(T, IdB ). Similarly we write C(A, S) if T is an ideal A
channel.
Note that by definition c = 0 is an achievable rate hence C(T, S) ≥ 0. If on
the other hand each c > 0 is achievable we write C(T, B) = ∞. At a first look
it seems cumbersome to check all pairs of sequences with given upper ratio when
testing c. Due to some monotonicity properties of ∆, however, it can be shown that
it is sufficient to check only one sequence provided the Mj satisfy the additional
condition Mj /(Mj+1 ) → 1. This is the subject of the following lemma.
Lemma 6.1.2 Let (Mα )α∈N be a strictly increasing sequence of integers such that
limα Mα+1 /Mα = 1. Suppose Nα are integers such that limα ∆(T^⊗Mα , S^⊗Nα ) = 0. Then any

c < lim inf_α Nα /Mα   (6.5)

is an achievable rate. Moreover, if the errors decrease exponentially, in the sense that ∆(T^⊗Mα , S^⊗Nα ) ≤ µ e^{−λMα} (µ, λ ≥ 0), then they decrease exponentially for M → ∞ with rate

lim inf_{M→∞} −(1/M ) ln ∆(T^⊗M , S^⊗⌊cM⌋ ) ≥ λ,   (6.6)
for sequences Mj , Nj , j ∈ N with c > limj→∞ Nj /Mj > log2 f / log2 d. This implies
This implies that there is a j₀ ∈ ℕ such that dim M_d^⊗Nj > dim M_f^⊗Mj holds for all j > j₀ . Therefore each decoding map D : M_d^⊗Nj → M_f^⊗Mj must have a nontrivial kernel. Let A ∈ M_d^⊗Nj with D(A) = 0 and ‖A‖ = 1. Then we have for any k ∈ ℕ and B ∈ M_k with ‖B‖ = 1:

‖ED − Id‖cb ≥ ‖(ED − Id) ⊗ Id‖ ≥ ‖(ED − Id)(A) ⊗ Id(B)‖ = 1.   (6.12)

Hence ∆(M_f^⊗Mj , M_d^⊗Nj ) ≥ 1 for all j > j₀ , in contradiction to (6.11), which implies C(M_f , M_d ) = log₂ f / log₂ d. Similar reasoning holds for C(C_f , C_d ) and C(M_f , C_d ), and the proof is complete¹. 2
In the previous proposition we have excluded the case C(Cf , Md ), i.e. the quan-
tum capacity of an ideal classical channel. From the “no-teleportation theorem”
we expect that this quantity is zero. For a proof of this statement it is useful to
introduce first a simple upper bound on C(T, Md ) (cf. [116])
¹ For the classical capacity of a quantum channel C(M_f , C_d ), it is, however, more difficult to derive an analog of the error estimate (6.12). We skip this part nevertheless and leave the corresponding details to the reader.
where we have used for the last equation the fact that Dj and ΘEj Θ are channels
and that the cb-norm is multiplicative. Taking logarithms on both sides we get
Nj /Mj + log_d (1 − ε)/Mj ≤ log_d ‖ΘT‖cb ,   (6.17)
where we have used for the last inequality the fact that the cb-norm of a channel is
one. If c1 is an achievable rate of T1 with respect to T2 such that limj→∞ Nj /Mj < c1
and c2 is an achievable rate of T2 with respect to T3 such that limj→∞ Mj /Kj < c2
(i.e. the sequences of quotients converge) we see that
lim inf_{j→∞} Nj /Kj = lim inf_{j→∞} (Nj /Mj )(Mj /Kj ) ≤ lim_{j→∞} Nj /Mj · lim_{k→∞} Mk /Kk .   (6.22)
6. Channel capacity 94
Hence each c < c1 c2 is achievable. Since C(T1 , T3 ) is the supremum over all achiev-
able rates we get (6.18). 2
closely related to our version. It is not yet clear whether equality holds. There might be subtle
differences [147].
95 6.2. Coding theorems
When all channels are ideal, or when all systems involved are classical, even equality holds, i.e. channel capacities are additive in this case. If quantum channels are considered, however, it is one of the big open problems of the field to decide under which conditions additivity holds.
6.1.3 Relations to entanglement measures
The duality lemma proved in Subsection 2.3.3 provides an interesting way to de-
rive bounds on channel capacities and capacity like quantities from entanglement
measures (and vice versa) [24, 122]: To derive a state of a bipartite system from a
channel T we can take a maximally entangled state Ψ ∈ H ⊗ H, send one particle
through T and get a less entangled pair in the state ρT = (Id ⊗T ∗ )|ΨihΨ|. If on the
other hand an entangled state ρ ∈ S(H ⊗ H) is given, we can use it as a resource
for teleportation and get a channel Tρ . The two maps ρ 7→ Tρ and T 7→ ρT are,
however, not inverse to one another. This can be seen easily from the duality lemma
(Theorem 2.3.5): For each state ρ ∈ S(H ⊗ H) there is a channel T and a pure state
Φ ∈ H ⊗ H such that ρ = (Id ⊗T ∗ )|ΦihΦ| holds; but Φ is in general not maximally
entangled (and uniquely determined by ρ). Nevertheless, there are special cases in
which the state derived from Tρ coincides with ρ: A particular class of examples is
given by teleportation channels derived from a Bell-diagonal state.
On ρT we can evaluate an entanglement measure E(ρT ) and get in this way a
quantity which is related to the capacity of T . A particularly interesting candidate
for E is the “one-way LOCC” distillation rate ED,→ . It is defined in the same way
as the entanglement of distillation ED , except that only one-way LOCC operations
are allowed in Equation (5.8). According to [24] ED,→ is related to Cq by the
inequalities ED,→ (ρ) ≥ Cq (Tρ ) and ED,→ (ρT ) ≤ Cq (T ). Hence if ρTρ = ρ we can
calculate ED,→ (ρ) in terms of Cq (Tρ ) and vice versa.
A second interesting example is the transposition bound Cθ (T ) introduced in
the last subsection. It is related to the logarithmic negativity [220]

Eθ (ρ) = log₂ ‖(Id ⊗ Θ)ρ‖₁ ,

which measures the degree to which the partial transpose of ρ fails to be positive.
Eθ can be regarded as an entanglement measure although it has some drawbacks: it is not an LOCC monotone (Axiom E2), it is not convex (Axiom E3) and, most severely, it does not coincide with the reduced von Neumann entropy on pure states, which we
have considered as “the” entanglement measure for pure states. On the other hand
it is easy to calculate and it gives bounds on distillation rates and teleportation
capacities [220]. In addition Eθ can be used together with the relation between
depolarizing channels and isotropic states to derive Equation (6.50) in a very simple
way.
6.2 Coding theorems
To determine channel capacities directly in terms of Definition 6.1.1 is fairly difficult, because optimization problems in spaces whose dimensions grow exponentially are involved. This renders in particular each direct numerical approach practically
impossible. It is therefore an important task of (quantum) information theory to
express channel capacities in terms of quantities which are easier to compute. In
this section we will review the most important of these “coding theorems”.
6.2.1 Shannon’s theorem
Let us consider first a classical to classical channel T : C(Y ) → C(X). This is
basically the situation of classical information theory and we will only have a short
look here – mainly to show how this (well known) situation fits into the general
scheme described in the last section3 .
First of all we have to calculate the error quantity ∆(T, C2 ) defined in Equation
(6.2). As stated in Subsection 3.2.3, T is completely determined by its transition probabilities Txy , (x, y) ∈ X × Y , describing the probability to receive y ∈ Y when x ∈ X was sent. Since the cb-norm for a classical algebra coincides with the ordinary
norm we get (we have set X = Y for this calculation):
‖ Id − T ‖cb = ‖ Id − T ‖ = sup_{x,f} | Σ_y (δxy − Txy ) fy |   (6.32)
             = 2 sup_x (1 − Txx ),   (6.33)
where the supremum in the first equation is taken over all f ∈ C(X) with kf k =
supy |fy | ≤ 1. We see that the quantity in Equation (6.33) is exactly twice the
maximal error probability, i.e. the maximal probability of sending x and getting
anything different. Inserting this quantity for ∆ in Definition 6.1.1 applied to a
classical channel T and the “bit-algebra” B = C2 , we get exactly Shannon’s classical
definition of the capacity of a discrete memoryless channel [191].
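As a small numerical cross-check (an added illustration, not part of the original text), the two expressions (6.32) and (6.33) can be compared directly; for a binary symmetric channel with flip probability 0.1 both give twice the maximal error probability:

```python
import numpy as np
from itertools import product

def error_norm(T):
    """Eq. (6.33): twice the maximal error probability, 2 max_x (1 - T_xx)."""
    return 2.0 * max(1.0 - T[x, x] for x in range(T.shape[0]))

def error_norm_direct(T):
    """Eq. (6.32): sup over x and |f_y| <= 1; the sup is attained at f in {-1,+1}^n."""
    n = T.shape[0]
    best = 0.0
    for f in product((-1.0, 1.0), repeat=n):
        for x in range(n):
            s = sum(((1.0 if x == y else 0.0) - T[x, y]) * f[y] for y in range(n))
            best = max(best, abs(s))
    return best

bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])   # binary symmetric channel, flip probability 0.1
print(error_norm(bsc), error_norm_direct(bsc))   # both ≈ 0.2
```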
Hence we can apply Shannon’s noisy channel coding theorem to calculate C c (T )
for a classical channel. To state it we have to introduce first some terminology.
Consider therefore a state p ∈ C*(X) of the classical input algebra C(X) and its image q = T*(p) ∈ C*(Y) under the channel. p and q are probability distributions on X respectively Y , and px can be interpreted as the probability that the “letter” x ∈ X was sent. Similarly qy = Σ_x Txy px is the probability that y ∈ Y was received and Pxy = Txy px is the probability that x ∈ X was sent and y ∈ Y was received.
The family of all Pxy can be interpreted as a probability distribution P on X × Y
and the Txy can be regarded as conditional probability of P under the condition x.
Now we can introduce the mutual information
I(p, T ) = S(p) + S(q) − S(P ) = Σ_{(x,y)∈X×Y} Pxy log₂ ( Pxy /(px qy ) ),   (6.34)
where S(p), S(q) and S(P ) denote the entropies of p, q and P . The mutual infor-
mation describes, roughly speaking, the information that p and q contain about
each other. E.g. if p and q are completely uncorrelated (i.e. Pxy = px qy ) we get
I(p, T ) = 0. If T is on the other hand an ideal bit-channel and p equally distributed
we have I(p, T ) = 1. Now we can state Shannon’s Theorem which expresses the
classical capacity of T in terms of mutual informations [191]:
Theorem 6.2.1 (Shannon) The classical capacity Cc (T ) of a classical communication channel T : C(Y ) → C(X) is given by

Cc (T ) = sup_p I(p, T ),

where p runs over all probability distributions on X.
where the supremum is taken over all probability distributions pj and collections of
density operators ρj .
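To make Theorem 6.2.1 concrete, here is a small sketch (an added illustration; the Blahut-Arimoto iteration is a standard method for the maximization, not taken from this text). T[x, y] denotes the probability to receive y when x was sent; for the binary symmetric channel with flip probability 0.1 the result reproduces the textbook value 1 − H₂(0.1):

```python
import numpy as np

def mutual_information(p, T):
    """I(p, T) of Eq. (6.34); T[x, y] = probability to receive y when x was sent."""
    q = p @ T                       # output distribution q_y = sum_x p_x T_xy
    I = 0.0
    for x in range(T.shape[0]):
        for y in range(T.shape[1]):
            P = p[x] * T[x, y]      # joint distribution P_xy
            if P > 0:
                I += P * np.log2(P / (p[x] * q[y]))
    return I

def blahut_arimoto(T, iterations=200):
    """Approximate C_c(T) = sup_p I(p, T) by the standard Blahut-Arimoto iteration."""
    n = T.shape[0]
    p = np.full(n, 1.0 / n)
    for _ in range(iterations):
        q = p @ T
        D = np.array([sum(T[x, y] * np.log2(T[x, y] / q[y])
                          for y in range(T.shape[1]) if T[x, y] > 0)
                      for x in range(n)])
        p = p * 2.0 ** D            # reweight inputs by their information content
        p /= p.sum()
    return mutual_information(p, T)

def H2(e):
    return -e * np.log2(e) - (1 - e) * np.log2(1 - e)

bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
print(blahut_arimoto(bsc), 1 - H2(0.1))   # both ≈ 0.531
```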
6.2.3 Entanglement assisted capacity
Another classical capacity of a quantum channel arises, if we use dense coding
schemes instead of simple encodings and decodings to transmit the data through
the channel T . In other words we can define the entanglement enhanced classical
capacity Ce (T ) in the same way as Cc (T ) but by replacing the encoding and decod-
ing channels in Definition 6.1.1 and Equation (6.2) by dense coding protocols. Note
that this implies that the sender Alice and the receiver Bob share an (arbitrary)
amount of (maximally) entangled states prior to the transmission.
For this quantity a coding theorem was recently proven by Bennett and others
[26] which we want to state in the following. To this end assume that we are trans-
mitting systems in the state ρ ∈ B ∗ (H) through the channel and that ρ has the
purification Ψ ∈ H ⊗ H, i.e. ρ = tr1 |ΨihΨ| = tr2 |ΨihΨ|. Then we can define the
entropy exchange

S(ρ, T ) = S[ (T ⊗ Id)|ΨihΨ| ].   (6.38)

The density operator (T ⊗ Id)|ΨihΨ| has the output state T ∗ (ρ) and the input
state ρ as its partial traces. It can be regarded therefore as the quantum analog of
the input/output probability distribution Pxy defined in Subsection 6.2.1. Another
way to look at S(ρ, T ) is in terms of an ancilla representation of T : If T ∗ (ρ) =
trK (U ρ ⊗ ρK U ∗ ) with a unitary U on H ⊗ K and a pure environment state ρK it
can be shown [13] that S(ρ, T ) = S [TK∗ ρ] where TK is the channel describing the
information transfer into the environment, i.e. TK∗ (ρ) = trH (U ρ ⊗ ρK U ∗ ), in other
words S(ρ, T ) is the final entropy of the environment. Now we can define
I(ρ, T ) = S(ρ) + S(T ∗ ρ) − S(ρ, T ) (6.39)
which is the quantum analog of the mutual information given in Equation (6.34).
It has a number of nice properties, in particular positivity, concavity with respect
to the input state and additivity [3] and its maximum with respect to ρ coincides
actually with Ce (T ) [26].
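The quantities (6.38) and (6.39) are easy to evaluate numerically. The following sketch (an added illustration; the specification of the channel by Kraus operators and all function names are ours) computes I(ρ, T ) from a purification; for the ideal channel the entropy exchange vanishes and I(ρ, T ) = 2S(ρ), in accordance with the factor of two gained by entanglement assistance:

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    return float(-sum(v * np.log2(v) for v in ev if v > 1e-12))

def purification(rho):
    """|Psi> in H (x) H whose partial trace over the second factor is rho."""
    ev, U = np.linalg.eigh(rho)
    d = rho.shape[0]
    psi = np.zeros(d * d, dtype=complex)
    for k in range(d):
        if ev[k] > 0:
            psi += np.sqrt(ev[k]) * np.kron(U[:, k], np.eye(d)[:, k])
    return psi

def quantum_mutual_info(kraus, rho):
    """I(rho, T) = S(rho) + S(T* rho) - S(rho, T), Eqs. (6.38)/(6.39)."""
    d = rho.shape[0]
    psi = purification(rho)
    big = sum(np.kron(K, np.eye(d)) @ np.outer(psi, psi.conj())
              @ np.kron(K, np.eye(d)).conj().T for K in kraus)
    S_exchange = entropy(big)                       # entropy exchange S(rho, T)
    out = sum(K @ rho @ K.conj().T for K in kraus)  # output state T* rho
    return entropy(rho) + entropy(out) - S_exchange

rho = np.diag([0.7, 0.3]).astype(complex)
ident = [np.eye(2, dtype=complex)]
print(quantum_mutual_info(ident, rho))   # ideal channel: 2 S(rho) ≈ 1.76
```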
Theorem 6.2.3 The entanglement assisted capacity Ce (T ) of a quantum channel
T : B(H) → B(H) is given by

Ce (T ) = sup_ρ I(ρ, T ).
Here S(T ∗ ρ) is the entropy of the output state and S(ρ, T ) is the entropy exchange
defined in Equation (6.38). It is argued [13] that J(ρ, T ) plays a role in quantum
information theory which is analogous to that of the (classical) mutual information
(6.34) in classical information theory. J(ρ, T ) has some nasty properties, however: it
can be negative [51] and it is known to be not additive [71]. To relate it to Cq (T ) it
is therefore not sufficient to consider a one-shot capacity as in Shannon’s Theorem
(Thm 6.2.1). Instead we have to define
Cs (T ) = sup_N (1/N ) Cs,1 (T^⊗N )  with  Cs,1 (T ) = sup_ρ J(ρ, T ).   (6.42)
holds for any channel. Cθ is in many cases a weaker bound than Cs ; however, it is much easier to calculate and it is particularly useful if we want to identify cases where the quantum capacity is zero (e.g. the quantum capacity of a classical channel discussed in Corollary 6.1.5).
Finally we want to mention that lower bounds can be derived in terms of rates which can be achieved with particular coding schemes; cf. e.g. [24, 99, 71, 158]. A detailed discussion of this approach is given in the next chapter.
6.2.5 Examples
Although the expressions provided in the coding theorems above are much easier to calculate than the original definitions, they still involve some optimization problems
over possibly large parameter spaces. Nevertheless there are special cases which
allow explicit calculations. As a first example we will consider the “quantum erasure
channel” which transmits with probability 1−ϑ the d-dimensional input state intact
This example is very unusual, because all capacities discussed up to now can be
calculated explicitly: We get Cc,1 (T ) = Cc (T ) = (1 − ϑ) log2 (d) for the clas-
sical, Ce (T ) = 2Cc (T ) for the entanglement enhanced classical capacity and
Cq (T ) = max(0, (1 − 2ϑ) log2 (d)) for the quantum capacity [23, 25]. Hence the
gain by entanglement assistance is exactly a factor two; cf. Figure 6.1.
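The closed formulas for the erasure channel are simple enough to tabulate directly (a small added illustration of the values plotted in Figure 6.1):

```python
import numpy as np

def erasure_capacities(theta, d):
    """Capacities of the d-dimensional quantum erasure channel with erasure
    probability theta, as quoted from [23, 25]."""
    Cc = (1 - theta) * np.log2(d)                 # classical capacity
    Ce = 2 * (1 - theta) * np.log2(d)             # entanglement assisted
    Cq = max(0.0, (1 - 2 * theta) * np.log2(d))   # quantum capacity
    return Cc, Ce, Cq

for theta in (0.0, 0.25, 0.5, 0.75, 1.0):
    Cc, Ce, Cq = erasure_capacities(theta, d=2)
    print(theta, Cc, Ce, Cq)   # Ce = 2 Cc throughout; Cq vanishes for theta >= 1/2
```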
Figure 6.1: Capacities of the quantum erasure channel (classical capacity Cc (T ), entanglement enhanced classical capacity Ce (T ), quantum capacity Cq (T )) plotted as a function of the error probability ϑ.
the first term under the sup in Equation (6.37) becomes maximal and the second becomes minimal: Σ_j pj T ∗ ρj is maximally mixed in this case and its entropy is therefore maximal. The entropies of the T ∗ ρj are on the other hand minimal if the
ρj are pure. In Figure 6.2 we have plotted both capacities as a function of the noise
parameter ϑ and in Figure 6.3 we have plotted the quotient Ce (T )/Cc (T ) which
gives an upper bound on the gain we get from entanglement assistance. Note in this
context that due to a result of King [142] Cc (T ) = Cc,1 (T ) holds for the depolarizing
channel.
Figure 6.2: One-shot classical capacity Cc,1 (T ) = Cc (T ) and entanglement enhanced classical capacity Ce (T ) of the depolarizing qubit channel, plotted as a function of the noise parameter θ.

Figure 6.3: Gain of using entanglement assisted versus unassisted classical capacity for a depolarizing qubit channel.
[Figure: One-shot coherent information Cs,1 (T ), transposition bound Cθ (T ) and Hamming bound for the depolarizing qubit channel, plotted as a function of ϑ.]
For the quantum capacity of the depolarizing channel precise calculations are
not available. Hence let us consider first the coherent information. J(T, ρ) inherits
from T its unitary covariance, i.e. we have J(U ρU ∗ , T ) = J(ρ, T ). In contrast to
the mutual information, however, it does not have nice concavity properties, which
makes the optimization over all input states more difficult to solve. Nevertheless,
the calculation of J(ρ, T ) is straightforward and we get in the qubit case (if ϑ is the
noise parameter of T and λ is the highest eigenvalue of ρ):
J(ρ, T ) = S(λ(1 − ϑ) + ϑ/2) + S((1 − λ)(1 − ϑ) + ϑ/2) − S((1 − ϑ/2 + A)/2)
           − S((1 − ϑ/2 − A)/2) − S(λϑ/2) − S((1 − λ)ϑ/2)   (6.48)

where S(x) = −x log₂ (x) denotes again the entropy function and

A = √[ (2λ − 1)² (1 − ϑ/2)² + 4λ(1 − λ)(1 − ϑ)² ].   (6.49)
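Equation (6.48) can be checked against a direct numerical evaluation of S(T ∗ρ) − S(ρ, T ). The following sketch is added as a consistency check; the Kraus representation of the depolarizing channel, the function names, and our reading of the formula (including both output-entropy terms) are ours:

```python
import numpy as np

def S(x):
    """Entropy function S(x) = -x log2 x, with S(0) = 0."""
    return 0.0 if x <= 0 else -x * np.log2(x)

def vn_entropy(m):
    return float(sum(S(v) for v in np.linalg.eigvalsh(m).real if v > 1e-12))

def J_formula(lam, th):
    """Coherent information of the depolarizing qubit channel, Eqs. (6.48)/(6.49);
    lam = largest eigenvalue of the input state, th = noise parameter."""
    A = np.sqrt((2 * lam - 1) ** 2 * (1 - th / 2) ** 2
                + 4 * lam * (1 - lam) * (1 - th) ** 2)
    out = S(lam * (1 - th) + th / 2) + S((1 - lam) * (1 - th) + th / 2)
    exch = (S((1 - th / 2 + A) / 2) + S((1 - th / 2 - A) / 2)
            + S(lam * th / 2) + S((1 - lam) * th / 2))
    return out - exch

def J_direct(lam, th):
    """S(T* rho) - S(rho, T) from Kraus operators and a purification of rho."""
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    kraus = [np.sqrt(1 - 3 * th / 4) * np.eye(2, dtype=complex),
             np.sqrt(th / 4) * X, np.sqrt(th / 4) * Y, np.sqrt(th / 4) * Z]
    rho = np.diag([lam, 1 - lam]).astype(complex)
    e0, e1 = np.eye(2)
    psi = np.sqrt(lam) * np.kron(e0, e0) + np.sqrt(1 - lam) * np.kron(e1, e1)
    big = sum(np.kron(K, np.eye(2)) @ np.outer(psi, psi.conj())
              @ np.kron(K, np.eye(2)).conj().T for K in kraus)
    out = sum(K @ rho @ K.conj().T for K in kraus)
    return vn_entropy(out) - vn_entropy(big)

print(J_formula(0.7, 0.3), J_direct(0.7, 0.3))   # the two values agree
```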
Advanced topics
Chapter 7
Continuity of the quantum capacity
In Section 6.2 we have stated that a coding theorem for the quantum capacity is not yet available. Nevertheless, there are several subproblems which can be treated independently and which admit simpler solutions. One of them concerns the question we are going to answer in the following: Is it possible to correct small errors with small coding effort? Or more precisely: If a channel T is close to the ideal channel Id, is Cq (T ) close to Cq (Id)? The arguments in this chapter are based on [139]. Closely related discussions given by other authors are [105, 158].
7.1 Discrete to continuous error model
In Section 4.4 we have described how errors can be corrected, which occur only on a
small number k of n > k parallel channels. Hence the corresponding schemes correct rare errors, rather than small errors which occur on each copy of the parallel channels T^⊗n .
Nevertheless, the discrete theory can be applied to the situation we are studying in
this chapter. This is the content of the following Proposition. It is the appropriate
formulation of “reducing the order of errors from ε to εf +1 ”.
Proposition 7.1.1 Let T : B(H) → B(H) be a channel, and let E, D be encoding
and decoding channels for coding m systems into n systems. Suppose that this coding
scheme corrects f errors (Definition 4.4.1), and that

‖T − id ‖cb ≤ (f + 1)/(n − f − 1).   (7.1)

Then

‖E T^⊗n D − id ‖cb ≤ ‖T − id ‖cb^{f+1} 2^{nH₂((f+1)/n)} ,   (7.2)

where H₂ (r) = −r log₂ r − (1 − r) log₂ (1 − r) denotes the Shannon entropy of the probability distribution (r, 1 − r).
Proof. Into ET ⊗n D, we insert the decomposition T = id +(T − id) and expand the
product. This gives 2n terms, containing tensor products with some number, say k,
of tensor factors (T − id) and tensor factors id on the remaining (n − k) sites. Now
when k ≤ f , the error correction property makes the term zero. Terms with k > f
we estimate by kT − id kkcb . Collecting terms we get
‖E T^⊗n D − id ‖cb ≤ Σ_{k=f+1}^{n} C(n, k) ‖T − id ‖cb^k .   (7.3)

The rest then follows from the next Lemma (with r = (f + 1)/n). It treats the exponential growth in n for truncated binomial sums.
The rest then follows from the next Lemma (with r = (f + 1)/n). It treats the
exponential growth in n for truncated binomial sums.
Lemma 7.1.2 Let 0 ≤ r ≤ 1 and a > 0 such that a ≤ r/(1 − r). Then, for all integers n:

(1/n) log₂ ( Σ_{k≥rn} C(n, k) a^k ) ≤ log₂ (a^r ) + H₂ (r).   (7.4)
Proof. For λ > 0 we can estimate the step function by an exponential, and get
Σ_{k≥rn} C(n, k) a^k ≤ Σ_{k=0}^{n} C(n, k) a^k e^{λ(k−rn)} = e^{−λrn} (1 + a e^λ )^n = M (λ)^n   (7.5)
105 7.2. Coding by random graphs
with M (λ) = e^{−λr} (1 + a e^λ ). The minimum over all real λ is attained at a e^{λmin} = r/(1 − r). We get λmin ≥ 0 precisely when the conditions of the Lemma are satisfied, in which case the bound is computed by evaluating M (λmin ). 2
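Lemma 7.1.2 is easy to test numerically; the following added sketch compares the truncated binomial sum with the bound (a^r 2^{H₂(r)})^n for an admissible parameter choice:

```python
import math

def H2(r):
    """Binary entropy of the distribution (r, 1 - r)."""
    return 0.0 if r in (0.0, 1.0) else -r * math.log2(r) - (1 - r) * math.log2(1 - r)

def truncated_sum(n, r, a):
    """Left-hand side of (7.4): sum over k >= rn of C(n, k) a^k."""
    return sum(math.comb(n, k) * a ** k for k in range(math.ceil(r * n), n + 1))

n, r, a = 100, 0.3, 0.2            # a <= r/(1-r) ≈ 0.43, so the Lemma applies
lhs = truncated_sum(n, r, a)
rhs = (a ** r * 2 ** H2(r)) ** n   # the bound (a^r 2^{H2(r)})^n
print(lhs <= rhs)                  # True
```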
This goes to zero, and even exponentially to zero, as soon as the expression in
parentheses is < 1. This will be the case whenever kT − id kcb is small enough, or,
more precisely,
kT − id kcb ≤ 2−H2 (ε)/ε . (7.7)
Note in addition that we have for all n ∈ ℕ

( (ε − 1/n) / (1 − ε + 1/n) ) 2^{−H₂(ε)/ε} < 1 .   (7.8)
Figure 7.1: The two bounds from Equation (7.9), 2^{−H₂(ε)/ε} and ε/e, plotted as a function of ε.
get P < 1, so that there must be at least one matrix Γ correcting f errors. The
crucial point is that this observation does not depend on n, but only on the rate-like
parameters m/n and f /n. Let us make this behavior a Definition:
Definition 7.2.3 Let d be an integer. Then we say a pair (µ, ²) consisting of a
coding rate µ and an error rate ² is achievable, if for every n we can find an
encoding E of dµne d-level systems into n d-level systems correcting b²nc errors.
Then we can paraphrase the last proposition as saying that all pairs (µ, ε) with µ + 4ε + log₂(d)^{−1} H₂ (2ε) < 1 are achievable. This is all the input we need for the next section, although a better
coding scheme, giving larger µ or larger ε, would also improve the rate estimates proved there. Such improvements are indeed possible. E.g. for the qubit case (d = 2) it is shown in [47] that there is always a code which saturates the quantum Gilbert-Varshamov bound (1 − µ − 2ε log₂ (3)) > H₂ (2ε), which is slightly better than our result.
But there are also known limitations, particularly the so-called Hamming bound.
This is a simple dimension counting argument, based on the error corrector's dream: Assuming that the scalar product (F, G) 7→ ω(F ∗ G) on the error space E is nondegenerate, the dimension of the “bad space” is the same as the dimension of the error space. Hence with the notations of Section 4.4 we expect dim H₀ · dim E ≤ dim H₂ . We now take m input systems and n output systems of dimension d each, so that dim H₁ = d^m and dim H₂ = d^n . For the space of errors happening at at most f places we introduce a basis as follows: at each site we choose a basis of B(H) consisting of d² − 1 operators plus the identity. Then a basis of E is given by all tensor products with basis elements ≠ 1I placed at j ≤ f sites. Hence dim E = Σ_{j≤f} C(n, j)(d² − 1)^j . For large n we estimate this as in Lemma 7.1.2 as (1/n) log₂ dim E ≈ (f /n) log₂ (d² − 1) + H₂ (f /n). Hence the Hamming bound becomes

(m/n) log₂ d + H₂ (f /n) + (f /n) log₂ (d² − 1) ≤ log₂ d ,   (7.12)

which (with d² ≫ 1) is just (7.11) with a factor 1/2 on all errors.
If we drop the nondegeneracy condition made above it is possible to find codes
which break the Hamming bound [71]. In this case, however, we can consider the
weaker singleton bound, which has to be respected by those degenerate codes as well.
It reads

1 − m/n ≥ 4 f /n .   (7.13)
We omit its proof here (see instead [172], Sect. 12.4). Both bounds are plotted together with the rate achieved by random graph coding in Figure 7.2 (for d = 2).
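The three curves of Figure 7.2 can be reproduced from the formulas above; a short added sketch for d = 2 (function names ours):

```python
import math

def H2(x):
    """Binary entropy of the distribution (x, 1 - x)."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def hamming(eps, d=2):
    """Upper bound (7.12) on the rate mu = m/n, with eps = f/n."""
    return 1 - (H2(eps) + eps * math.log2(d * d - 1)) / math.log2(d)

def singleton(eps):
    """Upper bound (7.13): mu <= 1 - 4 eps."""
    return 1 - 4 * eps

def random_graph(eps, d=2):
    """Rate achieved by random graph coding, cf. Theorem 7.3.1."""
    return 1 - 4 * eps - H2(2 * eps) / math.log2(d)

for eps in (0.01, 0.03, 0.05):
    print(eps, random_graph(eps), hamming(eps), singleton(eps))
    # the achieved rate lies below both upper bounds
```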
7.3 Results
We are now ready to apply the results about error correction just derived to the
calculation of achievable rates and therefore to lower bounds on the quantum ca-
pacity.
7.3.1 Correcting small errors
We first look at the problem which motivated our study, namely estimating the
capacity of a channel T ≈ Id.
Theorem 7.3.1 Let d be a prime, and let T be a channel on d-level systems. Sup-
pose that for some 0 < ε < 1/2,
‖T − Id ‖cb ≤ ε/e.   (7.14)

Figure 7.2: Singleton bound and Hamming bound together with the rate achieved by random graph coding (for d = 2). The allowed regions are below the respective curves.
Then
Cq (T ) ≥ (1 − 4ε) log₂ (d) − H₂ (2ε) = g(ε).   (7.15)
Proof. For every n set f = ⌊εn⌋ and m = ⌊µn⌋ − 1, where µ is, up to a log₂ (d) factor, the right hand side of (7.15), i.e. µ = 1 − 4ε − log₂ (d)^{−1} H₂ (2ε). This ensures that the right hand side of (7.10) is strictly negative, so there must be a code for d-level systems, with m inputs and n outputs, correcting f errors. To this code we apply Proposition 7.1.1, and insert the bound on ‖ id −T ‖ into Equation (7.6). Thus ∆(T^⊗n , M_d^⊗(⌊µn⌋−1) ) → 0, even exponentially. This means that any number < µ log₂ (d) is an achievable rate. In other words, µ log₂ (d) is a lower bound on the capacity. 2
If ε > 0 is small enough the quantity on the right hand side of Equation (7.15) is strictly positive (cf. the dotted graph in Figure 7.2). Hence each channel which is sufficiently close to the identity allows (asymptotically) perfect error correction. Beyond that we see immediately that Cq (T ) is continuous (in the cb-norm) at T = Id: Since Cq (T ) is smaller than log₂ (d) and g(ε) is continuous in ε with g(0) = log₂ (d), we find for each δ > 0 an ε > 0 such that log₂ (d) − Cq (T ) < δ for all T with ‖T − Id ‖cb < ε/e. In other words, if T is arbitrarily close to the identity its capacity is arbitrarily close to log₂ (d). In Corollary 7.3.3 below we will show the significantly stronger statement that Cq is a lower semicontinuous function on the set of all channels.
7.3.2 Estimating capacity from finite coding solutions
A crucial consequence of the ability to correct small errors is that we do not actually
have to compute the limit defining the capacity: if we have a pretty good coding
scheme for a given channel, i.e., one that gives us ET ⊗k D ≈ idd , then we know the
errors can actually be brought to zero, and the capacity is close to the nominal rate
of this scheme, namely log2 (d)/k.
109 7.3. Results
Theorem 7.3.2 Let T be a channel, not necessarily between systems of the same
dimension. Let k, p ∈ N with p a prime number, and suppose there are channels E
and D encoding and decoding a p-level system through k parallel uses of T , with
error ∆ = ‖ id_p − E T^⊗k D‖cb < 1/(2e). Then

Cq (T ) ≥ (1 − 4e∆) log₂ (p)/k − H₂ (2e∆)/k .   (7.16)
Moreover, Cq (T ) is the least upper bound on all expressions of this form.
Proof. We apply Theorem 7.3.1 to the channel T̃ = E T^⊗k D. With the random coding method we thus find a family of coding and decoding channels Ẽ and D̃ from m′ into n′ systems, of p levels each, such that

‖ id − Ẽ (E T^⊗k D)^⊗n′ D̃‖cb → 0.   (7.17)
This can be reinterpreted as an encoding of p^{m′}-dimensional systems through kn′ uses of the channel T (rather than T̃ ), which corresponds to a rate (kn′)^{−1} log₂ (p^{m′}) = (log₂ p/k)(m′/n′). We now argue exactly as in the proof of the previous proposition, with ε = e∆, so that (7.18) holds by equation (7.9). By random graph coding we can achieve the coding ratio µ ≈ (m′/n′) = 1 − 4ε − log₂ (p)^{−1} H₂ (2ε), and have the errors ∆(T̃^⊗n′ , M_p^⊗m′ ) go to zero exponentially.
exponentially. Since
¡ ¢ 0
0
∆(T ⊗kn , Mm
0
e ET ⊗k D ⊗n Dk
e⊗n0 , Mm0 ) ≤ k id −E e cb ,
p ) ≤ ∆(T p (7.19)
we can apply Lemma 6.1.2 to the channel T (where the sequence Mα is given by Mα = kα) and find that the rate µ(log₂ p/k) is achievable. This yields the estimate
claimed in Equation (7.16).
To prove the second statement consider the function x → p(x) which associates
to each real number x ≥ 2 the biggest prime p(x) with p(x) ≤ x. From known
bounds on the length of gaps between two consecutive primes [127]1 it follows that
limx→∞ x/p(x) = 1 holds; hence we get 2^{kc}/p(2^{kc}) ≤ 1 + δ′ for an arbitrary δ′ > 0, provided k is large enough, but this implies

c − log₂ [ p(2^{kc}) ] / k < log₂ (1 + δ′ ) / k .   (7.20)
Since we can choose an achievable rate c arbitrarily close to the capacity C q (T ) this
shows that there is for each δ > 0 a prime p and a positive integer k such that
|Cq (T ) − log2 (p)/k| ≤ δ. In addition we can find a coding scheme E, D for T ⊗k
such that Equation (7.18) holds, i.e. the right hand side of (7.16) can be arbitrarily
close to log2 (p)/k, and this completes the proof. 2
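The prime approximation step used in this proof is easily illustrated (an added sketch; the target rate c = 1.37 is an arbitrary example value):

```python
from math import log2

def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def largest_prime_below(x):
    """p(x): the biggest prime p with p <= x."""
    p = int(x)
    while not is_prime(p):
        p -= 1
    return p

c = 1.37                      # an arbitrary example value for the target rate
for k in (4, 8, 16):
    p = largest_prime_below(2 ** (k * c))
    print(k, p, log2(p) / k)  # log2(p)/k approaches c from below as k grows
```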
∆(T^⊗n , M₂^⊗⌊cn⌋ ) ≤ e^{−nλ(c)} ,   (7.21)
Proof. We start as in Theorem 7.3.2 with the channel T̃ = E T^⊗k D and the quantity ∆ = ‖ id_p − E T^⊗k D‖cb . However, instead of assuming that ∆ = ε/e holds, the full range e∆ ≤ ε ≤ 1/2 is allowed for the error rate ε. Using the same arguments as in the proof of Theorem 7.3.2 we get an achievable rate

c(k, p, ε) = (log₂ (p)/k) ( 1 − 4ε − H₂ (2ε)/ log₂ (p) )   (7.22)
∆(T^⊗kn′ , M_p^⊗m′ ) ≤ ‖ id − Ẽ (E T^⊗k D)^⊗n′ D̃‖cb ≤ ( 2^{H₂(ε)} ∆^ε )^{n′} ;   (7.23)
λ(c) = lim inf_{n→∞} −(1/n) ln ∆(T^⊗n , M₂^⊗⌊nc⌋ ) ≥ lim_{n′→∞} −(1/(kn′)) n′ ln( 2^{H₂(ε)} ∆^ε )   (7.24)
     ≥ −(ε/k) ( ln(∆) + (H₂ (ε)/ε) ln 2 ) = −εΛ(∆, ε)/k ,   (7.25)
where we have inserted inequality (7.23). Now we can apply Lemma 6.1.2 (with the sequence Mα = kα), which shows that λ(c) is positive if the right hand side of (7.25) is.
What remains to show is that λ(c) > 0 holds for each c < Cq (T ). To this end we have to choose k, p, ∆ and ε such that c(k, p, ε) = c and Λ(∆, ε) < 0. Hence consider δ > 0 such that c + δ < Cq (T ) is an achievable rate. As in the proof of Theorem 7.3.2 we can choose log₂ (p)/k such that log₂ (p)/k > c + δ holds while ∆ is arbitrarily small. Hence there is an ε₀ > 0 such that c(k, p, ε) = c implies ε > ε₀ . The statement therefore follows from the fact that there is a ∆₀ > 0 with Λ(∆, ε) < 0 for all 0 < ∆ < ∆₀ and ε > ε₀ . 2
Figure 7.3: Lower bounds on the error exponent λ(c) plotted for n = 1, p = 2 and different values of ∆ (∆ = 10⁻³, 10⁻⁴, 10⁻⁵, 10⁻⁶).
‖ id_p − E T^⊗n D‖cb = (ε₀ + ε)/e   (7.26)

and |Cq^(ε) (T ) − log₂ (p)/n| < δ holds. If ε + ε₀ is small enough, however, we find as in Theorem 7.3.2 a random graph coding scheme such that

Cq (T ) ≥ (log₂ (p)/n) ( 1 − 4(ε + ε₀ ) ) − (1/n) H₂ ( 2(ε + ε₀ ) ) = g(ε + ε₀ ).   (7.27)

Hence the statement follows from continuity of g and the fact that g(0) = log₂ (p)/n holds. 2
For a classical channel Φ even more is known about the similarly defined quantity Cc^(ε) (T ): If ε > 0 is small enough we cannot achieve bigger rates by allowing small errors, i.e. Cc (T ) = Cc^(ε) (T ). This is called the “strong converse of Shannon's noisy channel coding theorem” [191]. To check whether a similar statement holds in the quantum case is one of the big open problems of the theory.
Chapter 8
Multiple inputs
The topic of this and the following three chapters is a quantitative discussion
of the circle of questions we have already visited in Section 4.2, i.e. quantum state
estimation, quantum copying and other devices which act on a large number of equally prepared inputs. This means that we are following the spirit of Chapters 5 and 6 and ask questions like: How can we measure the error which an approximate cloning machine produces in its outputs? What is the lower bound on this error? Is there a device which achieves this bound, and what does it look like? One fundamental difference to similar questions arising within entanglement distillation and calculations of channel capacities is the fact that we are able to give complete answers under quite general conditions. The reason is that the tasks we are going to discuss admit large symmetry groups, which can be used to reduce the number of parameters and thus make the corresponding optimization problems more tractable.
Since the material we want to present is quite comprehensive, we have broken it
up into four chapters: The topic of the present one (which is a significantly extended
version of [133]) is an overview and the discussion of some general properties, while
the following three treat special cases, namely: optimal pure state cloning (Chapter
9), quantum state estimation (Chapter 10) and optimal purification (Chapter 11).
8.1 Overview and general structure
To start with, let us have a short look at the general structure of this particular type of problem. In all cases we are searching for channels
T : A → B(H⊗N ) (8.1)
For state estimation, T should be an observable with values in the quantum state
space S = S(H). Hence we have to choose A = C(S) and the set of all such
estimators is denoted by
This notation is justified by the fact that state estimation is in a certain sense
the limiting case of cloning for infinitely many output systems. We will make this
statement more precise in Chapter 10.
In both cases the task is to optimize a “figure of merit” ∆(T ) which measures,
roughly speaking, the largest deviation of T ∗ (ρ⊗N ) from the target functional β(ρ) ∈
S(A) we want to approximate. In most cases ∆(T ) has the form
£ ¤
∆(T ) = sup δ T ∗ (ρ⊗N ), β(ρ) , (8.4)
ρ∈X
113 8.1. Overview and general structure
where δ is a distance measure on the state space S(A) of the algebra A and X ⊂
S(H). If nothing is known about the input state ρ we have to choose X = S(H). If,
in contrast to that, X is strictly smaller than S(H) it describes a priori knowledge
about ρ. The most relevant special cases arise when X is the set of pure states or if
X is finite. The latter corresponds to a cryptographic setup, where Alice and Bob
use finitely many signal states ρ1 , . . . , ρn to send classical information through a
quantum channel S and Eve tries to eavesdrop on the conversation by copying the
quantum information transmitted through S. Both situations require quite different
methods and we will concentrate our discussion on the pure state case (for recent
results concerning quantum hypothesis testing, i.e. estimation of states from a finite
set, cf. e.g. [173, 107, 166] and the references therein).
A different kind of a priori knowledge are a priori measures, i.e. instead of
knowing that all possible input states lie in a special set X we know for each
measurable set X ⊂ S(H) the probability µ(X) for ρ ∈ X. Such a situation typically
arises when we are trying to estimate (or copy) states of systems which originate
from a source with known characteristics. In this case we can use mean errors
Z
£ ¤
¯
∆(T ) = δ T ∗ (ρ⊗N ), β(ρ) µ(dρ). (8.5)
S(H)
as a figure of merit. Sometimes they are easier to compute than maximal errors as
in Equation (8.4). Often however ∆ leads to stronger results than ∆ ¯ and we will
concentrate our discussion therefore on maximal rather than mean errors.
Now assume that a particular estimation or cloning problem is given, which is
described by a set of channels and an appropriate figure of merit ∆. Then there is
a number of characteristic questions we are interested in the following. The first is:
• Is there an optimal device Tb which minimizes the error, i.e. ∆(Tb) = inf T ∆(T ),
and how does it look like?
Since the dimension of the space T (N, M ) grows exponentially with N and M it
seems at a first look to be hopeless to search for a closed form solution for arbitrary
N and M . We will see however that some problems admit quite simple (symmetry
based) arguments which restrict the size of the spaces, in which we have to search
for the minimizers, quite significantly. In this way we will be able to give a complete
answer for pure state cloning (Chapter 9) and estimation (Section 10.1) and for
purification (Chapter 11).
In other cases the situation is more difficult and a closed form solution can
not be achieved; for us this concerns primarily mixed state estimation and related
tasks. In this situation we will concentrate on the asymptotic behavior in the limit
of infinitely many input systems (N → ∞). Here we have to distinguish between
estimation (M = ∞) and cloning-like tasks (M arbitrary), because in the latter
case we have two parameters which can go to infinity. Let us consider first state
estimation. Here our main interests are the following
Note that the EN we are looking for in this context are not necessarily optimal
for N < ∞ but if ν is finite the error ∆(EN ) vanishes exponentially fast and the
difference (measured by ∆) between EN and an optimal scheme becomes already
negligible for a quite small number of input systems. The search for a sequence
8. Multiple inputs 114
where Vσ , Vτ are the unitaries associated to the permutations σ and τ ; cf. Equation
(3.7). The transformation T 7→ (αU T ) can be interpreted (passively) as a basis
change in the one-particle Hilbert space H, while ασ and ατ refer to permutations
of input respectively output systems. Now we have the following lemma:
Lemma 8.2.1 Consider the space T (N, M ) for M < ∞ and a convex, lower semi-
continuous functional ∆ : T (N, M ) → R+ which is invariant under the action of
115 8.2. Symmetric cloner and estimators
G = U(d) × SN × SM defined in Equation (8.6), i.e. ∆(αg T ) = ∆(T ) holds for all
g ∈ G. Then there is at least one Tb ∈ T (N, M ) with
i.e. there exist minimizers which are invariant under the group action α g .
Proof. The existence of minimizers is a simple consequence of compactness of
T (N, M ) and semicontinuity of ∆. Hence there is an S ∈ T (N, M ) with ∆(S) ≤
∆(T ) ∀T ∈ T (N, M ). Due to the invariance of ∆ we get
Now we can average over G (this integral is well defined because αg S ∈ T (N, M )
and T (N, M ) is finite dimensional)
Z
Tb = αg Sdg, (8.9)
G
where dg denotes the normalized Haar measure on G (which exists, due to com-
pactness of G). Obviously Tb is G-invariant: Since the action αg is affine we get
Z Z Z
αh Tb = αh αg Sdg = αhg Sdg = αg0 Sdg 0 = Tb with g 0 = hg. (8.10)
G G G
Exploiting convexity of ∆ and Equation (8.8) we get in addition (for all T ∈
T (N, M ))
·Z ¸ Z Z
b
∆(T ) = ∆ αg Sdg ≤ ∆(αg S)dg = ∆(S)dg = ∆(S), (8.11)
G G G
Hence as long as we are only interested to find some (rather than all) optimal
devices we can restrict our attention to those channels which are invariant under
the operation αU,σ,τ of U(d) × SN × SM introduced above. It is therefore useful to
define
Definition 8.2.2 A completely positive, (not necessarily unital) map T :
B(H⊗N ) → B(H⊗M ) which is invariant under the action αU,σ,τ of U(d) × SN × SM
defined in Equation (8.6) is called a fully symmetric cloning map. The space of all
fully symmetric elements of T (N, M ) is denoted by Tfs (N, M ).
To adopt the previous discussion to state estimation, let us consider now
the space T (N, ∞) of estimators (Equation (8.3)). As the set T (N, M ) defined
above, T (N, ∞) is convex, however it is infinite dimensional and compactness is
therefore topology dependent. An appropriate topology for our purposes is the
weak topology on T (N, ∞), i.e. the coarsest topology such that all functions
T (N, ∞) 3 E 7→ hψ, E(f )φi with f ∈ C(S), ψ, φ ∈ H⊗N are continuous. It is
then an easy consequence of the Banach-Alaoglu Theorem [186, Theorem IV.21]
that T (N, ∞) is compact in this topology.
In analogy to Equation (8.6) we can define a weakly continuous action of the
group SN × U(d) on T (N, ∞): For each (U, τ ) 3 U(d) × SM we define αU,τ E =
(αU ατ )(E) with
8.2.1 if we take into account that integrals of T (N, ∞) valued maps should be
considered as weak integrals, i.e. the average Ē of αg E over the group G = SN × U(d)
is defined as
Z
hψ, Ē(f )φi = hψ, (αg E)(f )φiµ(dg) ∀f ∈ C(S) ∀ψ, φ ∈ H⊗N . (8.13)
G
Hence we have:
Lemma 8.2.3 Consider a convex, lower semicontinuous (with respect to the weak
topology) functional ∆ : T (N, ∞) → R+ which is invariant under the action of
G = U(d) × SN defined in Equation (8.12), i.e. ∆(αg E) = ∆(E) holds for all
b ∈ T (N, ∞) with
g ∈ G. Then there is at least one estimator E
b ≤ ∆(E) ∀E ∈ T (N, ∞) and αg E
∆(E) b = E,
b ∀g ∈ G. (8.14)
As in the case M < ∞ discussed above, this lemma shows that we can restrict
the search for minimizers to those observables which are invariant under the action
αU,τ of U(d) × SN (as long as the figure of merit under consideration has the correct
symmetry). In analogy to the M < ∞ case we define therefore
Definition 8.2.4 A (completely) positive, unital map E : C(S) → B(H ⊗N ) which
is invariant under the action αU,τ of U(d)×SN defined in Equation (8.12) is called a
fully symmetric estimator. The set of all fully symmetric E is denotes by Tfs (N, ∞).
To make use of these results it is necessary to get a better understanding of the
structure of the sets Tfs (N, M ) for N ∈ N and M ∈ N ∪ {∞}. This is the subject of
the rest of this chapter.
8.2.2 Decomposition of tensor products
The first step is an analysis of the representations U 7→ U ⊗N and σ 7→ Vσ of U(d)
respectively SN on the tensor product Hilbert space H⊗N , which play a crucial role
in the definition of fully symmetric channels and estimators. The results we are
going to review here are well known and go back to Weyl [237, Ch. 4]. To state
them we have to introduce some notations from group theory: A Young frame is an
arrangement of a finite number of boxes into rows of decreasing length. We represent
it by a sequence of integers m1 ≥ m2 ≥ · · · ≥ md ≥ 0 where mk denotes the number
of boxes in the k th row. Hence
d
X
Yd (N ) = {m = (m1 , . . . , md ) ∈ Nd0 | m1 ≥ m2 ≥ . . . ≥ md , mk = N } (8.15)
k=1
denotes the set of all frames with d rows and N boxes. Each Young frame m ∈ Y d (N )
determines uniquely (up to unitary equivalence) irreducible representations of S N
and U(d) which we denote by Πm and πm . In the U(d)-case m gives the highest
weight of πm in the basis Ejj = |jihj|, j = 1, .., d of the Cartan subalgebra of the
Lie algebra of U(d) (cf. Subsection 8.3.2 for notations and a further discussion).
Πm as well as πm can be constructed explicitly from m, but we do not need this
information.
Theorem 8.2.5 Consider the d-dimensional Hilbert space H = Cd , its N -fold ten-
sor product H⊗N and the representations U(d) 3 U 7→ U ⊗N and SN 3 σ 7→ Vσ on
H⊗N . There is a unique decomposition of H⊗N into a direct sum such that
M M
H⊗N ∼ = Hm ⊗ Km , U ⊗N ∼= πm (U ) ⊗ 1I,
m∈Yd (N ) m∈Yd (N )
M
Vσ ∼
= 1I ⊗ Πm (σ) (8.16)
m∈Yd (N )
holds, where ∼
= means “naturally isomorphic”.
117 8.2. Symmetric cloner and estimators
For a proof see [195, Sect. IX.11]. This theorem is intimately related to Theorem
3.1.1, where commutativity properties between the U ⊗N and Vσ are discussed. This
can be seen if we introduce the algebras
M M
AN = B(Hm ) ⊗ 1I, BN = 1I ⊗ B(Km ). (8.17)
m∈Yd (N ) m∈Yd (N )
It is easy to check that AN and BN are commutants of each other, i.e. we have
0
AN = B N and BN = A0N where the prime denotes the commutant of the corre-
sponding set of operators, i.e. A0N = {B ∈ B(H⊗N ) | [A, B] = 0, ∀A ∈ A} and
0
similarly for BN . On the other hand we have U ⊗N ∈ AN and Vσ ∈ BN for all
U ∈ U(d) and σ ∈ SN . Hence, irreducibility of the representations πm and Πm im-
plies immediately that each element of AN (of BN ) is a finite linear combination of
operators from {U ⊗N | U ∈ U(d)} (respectively of permutation unitaries Vσ ). This
shows in particular that each operator which commutes with all U ⊗N is a linear
combination of Vσ ’s as stated in Theorem 3.1.1. Note however that Theorem 3.1.1
is not a corollary of Theorem 8.2.5, because the former is used in an essential way
in the proof of the latter.
Now let us consider the general linear group GL(d, C). Each representation πm
of U(d) admits a (unique) analytic continuation and leads therefore to a represen-
tation of GL(d, C). We will denote it by πm as well and it is in fact the GL(d, C)
representation with highest weight m. Therefore the following Corollary is an easy
consequence of Theorem 8.2.5
Corollary 8.2.6 Consider an operator X ∈ GL(d, C) and a Young frame m ∈
Yd (N ) for some N ∈ N then we have
⊗N 1 X
H+ = {SN ψ | ψ ∈ H⊗N }, SN ψ = Vσ ψ. (8.20)
N!
σ∈SN
⊗N
By definition all permutation unitaries Vσ act as identities on H+ and it is the
⊗N ⊗N
biggest subspace of H with this property. In addition it is easy to see that H+
⊗N
is left invariant by the U , i.e. it carries a subrepresentation of U(d). Since the
trivial representation of SN is labeled by the Young frame with one row and N
⊗N
boxes we get with Theorem 8.2.5 H+ = HN 1 ⊗ KN 1 , where we have used the
notation
⊗N
Proposition 8.2.7 Consider the N -fold symmetric tensor product H + , the cor-
⊗N
responding projection SN : H⊗N → H+ and the U(d) representation π+N
(U ) =
⊗N
SN U SN . Then we have, using the notations from Theorem 8.2.5 and Equation
(8.21):
⊗N N
H+ = H N 1 , π+ = πN 1 , SN = P N 1 . (8.22)
8.2.3 Fully symmetric cloning maps
Let us consider now fully symmetric cloning maps. Our aim in this Subsection is to
determine the extremal elements of the convex set Tfs (M, N ) and the central tool
for this task is Theorem 8.2.5, which we have to apply to the input and output
Hilbert space H⊗N and H⊗M . Since the procedure is quite complex we have broken
it up into several steps.
Proposition 8.2.8 Each fully symmetric channel T : B(H ⊗M ) → B(H⊗N ) can be
decomposed into a direct sum
M
T (A) = Tm (A) ⊗ 1IKm , (8.23)
m∈Yd (N )
The set of all such Tm (which we will call again fully symmetric) is denoted by
Tfs (Hm , M ).
Proof. According to Definition 8.2.2 we have [T (A), Vσ ] = 0 for all A ∈ B(H⊗M )
and all σ ∈ SN . By Theorem 8.2.5 this implies that T (A) ∈ AN holds, where AN
denotes the algebra from Equation (8.17). Hence, T is of the given form. 2
The next step applies Theorem 8.2.5 to the output Hilbert space H⊗M . This
leads to a further decomposition of the spaces Tfs (Hm , M ).
Theorem 8.2.9 Consider N, M ∈ N and m ∈ Yd (N ). Each channel T :
B(H⊗M ) → B(Hm ) satisfying the covariance condition from Equation (8.24) admits
a unique convex decomposition
X cn £ ¤
T (A) = Tn trKn (Pn APn ) (8.25)
dim Kn
n∈X
P
with cn > 0, n cn = 1 and
The set of all channels Tn with this property is denoted by Tfs (Hm , Hn ).
Proof. To prove uniqueness it is sufficient to note that each summand in Equation
(8.25) equals T (Pn APn ) and is therefore uniquely
P determined by T . To show that
the corresponding decomposition T (A) = n T (Pn APn ) of T has the given form
consider first the dual T ∗ of T . By assumption we have [Vτ , T ∗ (ρ)] = 0 for all
ρ ∈ B ∗ (Hm ) and all τ ∈ SM . Due to Theorem 8.2.5 this implies T ∗ (ρ) ∈ A∗M ,
where A∗M denotes the dual of the algebra from Equation (8.17). T ∗ is therefore a
L
direct sum T ∗ (ρ) = n Ten∗ (ρ) ⊗ 1I over n ∈ Yd (M ), where Ten∗ is the dual of a cp-
map Ten : B(Hn ) → B(Hm ) which satisfies the covariance condition from Equation
(8.27).
119 8.2. Symmetric cloner and estimators
This is exactly the form from Equation (8.25), except that the Ten are not unital.
Hence consider Ten (1I). Due to covariance of Ten we have
¡ ¢
πm (U )Ten (1I)πm (U )∗ = Ten πn (U )Pn πn (U )∗ = Ten (1I) (8.32)
Now the final step is to analyze the spaces Tfs (Hm , Hn ) of πm , πn covariant
channels.
Proposition 8.2.10 The set Tfs (Hm , Hn ) is convex and its extremal elements are
of the form
T (A) = V ∗ (A ⊗ 1IL )V (8.34)
with an isometry V : Hm → Hn ⊗ L into the tensor product of Hm and an auxiliary
Hilbert space L such that L carries an irreducible representation π of U(d) and V
intertwines πm with πn ⊗ π, i.e. V πm = πn ⊗ πV holds.
Proof. Assume first that T from Equation (8.34) admits a convex decomposition
T = λT1 + (1 − λ)T2 with T1 , T2 ∈ Tfs (Hm , Hn ) and 0 < λ < 1. By Theorem 3.2.2
this implies that there are two operators F1 , F2 with Tj (A) = V ∗ A ⊗ Fj V , j = 1, 2
and [π(U ), Fj ] = 0. Irreducibility of π implies together with normalization (the Tj
are unital by assumption) that F1 = F2 = 1I holds. Hence T1 = T2 = T which
implies that T is extremal.
To show that each channel T ∈ Tfs (Hm , Hn ) can be decomposed into elements
of the given form, note that Theorem 3.2.2 implies that a Stinespring representation
T (A) = V ∗ (A ⊗ 1IL )V of T exists such that L carries a representation π of U(d)
and V : Hm → Hn ⊗ L is an isometry which intertwines πm with πn ⊗ π. If π
is irreducible the Theorem is proved (T is extremal in this case); if not we can
decompose it into a direct sum
M M
L= Lj , π = πj (8.35)
j∈J j∈J
8. Multiple inputs 120
where J is a finite index set and the πj are irreducible representations on Lj . If the
projection from L onto Lj is denoted by Pj we can define operators Vj = (1I ⊗ Pj )V
which intertwine πm and πn ⊗πj . Hence Tej (A) = Vj∗ (A⊗1I)Vj is a cp-map satisfying
the proposition, except that it is not unital. Irreducibility of πm and covariance of Tej
P
imply however that Tej (1I) = cj 1I holds with positive constants cj . Due to j Pj = 1I,
we get a convex decomposition T (A) = Σj cj Tj (A) of T with summands Tj = c−1 e
j Tj
of the stated form. 2
Combining Theorem 8.2.9 and Proposition 8.2.10 we get the extremal elements
of the set Tfs (Hm , M ) as
1 £¡ ¢ ¤
T (A) = V ∗ trKn (Pn APn ) ⊗ 1I V (8.36)
dim Kn
with n ∈ Yd (M ) and an isometry V which satisfies the condition from Proposition
8.2.10. Using in addition Proposition 8.2.8 we see that each extremal element of the
set Tfs (N, M ) is a direct sum over the set Yd (N ) of channels of the form (8.36).
To get a result which is even more explicit, we have to determine for each n and
m all admissible intertwining isometries V . For arbitrary but finite d this can be
done at least in an algorithmic way and in the special case d = 2 we just have to
calculate Clebsch-Gordon coefficients. This shows that the general structure of a
fully symmetric cloning map is completely determined by group theoretical data.
8.2.4 Fully Symmetric estimators
Our next task is to determine the structure of the set Tfs (N, M ) in the special case
M = ∞. Hence consider an E ∈ Tfs (N, M ). As for finite M we have [E(f ), Vσ ] = 0
for allLσ ∈ SN and all f ∈ C(S). This implies that E decomposes into a direct sum
E = m∈Yd (N ) Em where the Em are observables
and fU (ρ) = f (U ρU ∗ ). We write Tfs (m, ∞) for the space of all observables satisfying
Equation (8.37) and call them again fully symmetric. To analyze the structure of
the Em let us state first the following result [66, 111]:
Theorem 8.2.11 Consider a compact, unimodular group G which acts transitively
on a topological space X by G × X 3 (g, x) 7→ αg (x), and a representation π of
G on a Hilbert space H. Each covariant POV measure E : C(X) → B(H) (i.e.
E(f ◦ αg ) = π(g)E(f )π(g)∗ holds for all g ∈ G and all f ∈ C(X)) has the form
Z
E(f ) = f (αg x0 )π(g)Q0 π(g)∗ µ(dg) (8.38)
G
Σ coincides with the orbit space S/ U(d) and p with the canonical projection. If
e1 , . . . , ed ∈ H denotes an orthonormal basis we can introduce in addition the map
If the map Eem,j is nonzero, covariance of E implies that there is a positive constant
λj with Eem,j (1I) = λj 1I hence we can define Em,j = λ−1 E em,j and get POV-measures
P j
Em,j : C(Sj ) → B(Hm ) with Em (f ) = j λj Em,j (f ¹ Sj ), where f ¹ Sj denotes
the restriction of f ∈ C(S) to Sj . It is therefore sufficient to show Equation (8.42)
for all Em,j .
Hence consider a positive function h ∈ C(Σj ) and the map
£ ¤
eh (g) = Em,j (g ⊗ h) ◦ Φ−1 ∈ B(Hm )
C(Xj ) 3 g 7→ E (8.44)
j
Eeh (1I) = νh 1I holds with a constant νh > 0. In other words E eh is (up to nor-
malization) a covariant POV measure and Theorem 8.2.11 applies. Hence there is
a unique positive operator Qm,j (h) ∈ B(Hm ) such that
Z
£ ¤
Eeh (g) = Em,j (g ⊗ h) ◦ Φ−1 =
j g(U v0j U ∗ )πm (U )Qm,j (h)πm (U )∗ dU (8.45)
U(d)
8. Multiple inputs 122
holds. It is easy to see that the map C(Σj ) 3 h 7→ Qm,j (h) ∈ B(Hm ) is linear and
positive. Normalization of Em,j implies in addition that Qm,j (1I) = 1I. Hence Qm,j
is a POV-measure satisfying Equation (8.45) for each function (g ⊗h)◦Φ−1 j ∈ C(Sj ).
Linearity and continuity (which is a consequence of positivity) implies therefore
Z
Em,j (f ) = πm (U )Qm,j (fU )πm (U )∗ dU (8.46)
U(d)
P
for all f ∈ C(Sj ). Hence Equation (8.42) follows with Qm = j λj Qm,j . 2
exactly one weight m, called the highest weight, such that ∂π(Ejk )x = 0 for all x in
the weight subspace of m and for all j, k = 1, . . . , d with j < k. The representation
π is (up to unitary equivalence) uniquely determined by its highest weight. On the
other hand the weight m is uniquely determined by its values m(Ejj ) = mj on the
basis Ejj of tC (d). We will express this fact in the following as “m = (m1 , . . . , md )
is the highest weight of the representation π”. For each analytic representation of
GL(d, C) the mj are integers satisfying the inequalities m1 ≥ m2 ≥ · · · ≥ md
and the converse is also true: each family of integers with this property defines the
highest weight of an analytic, irreducible representation of GL(d, C).
In a similar way we can define weights and highest weights for representations
of the group SL(d, C) as linear forms on the Cartan subalgebra stC (d). As in the
GL(d, C)-case an irreducible representation π of SL(d, C) is characterized uniquely
by its highest weight m. However we can not evaluate m on the basis Ejj since these
matrices are not trace free. One possibility is to consider an arbitrary extension of
m to the algebra tC (d) = stC (d) ⊕ C1I. Obviously this extension is not unique.
Therefore the values m(Ejj ) = mj are unique only up to an additive constant. To
circumvent this problem we will use usually the normalization condition m d = 0.
In this case the integer mj corresponds to the number of boxes in the j th row of
the Young tableau usually used to characterize the irreducible representation π.
Another possibility to describe the weight m is to use the basis Hj of stC (d). We
get a sequence of integers lj = m(Hj ), j = 1, . . . , d − 1. They are related to the
mj by lj = mj − mj+1 . Each sequence l1 , . . . , ld−1 defines the highest weight of an
irreducible representation of SL(d, C) iff the lj are positive integers.
Finally consider the representation π̄ conjugate to π, i.e. π(u) = π(u). If π
is irreducible the same is true for π̄. Hence π̄ admits a highest weight which is
given by (−md , −md−1 , . . . , −m1 ). If π is a SU(d) representation we can apply the
normalization md = 0. Doing this as well for the conjugate representation we get
(m1 , m1 − md−1 , . . . , m1 − m2 , 0). In terms of Young tableaus this corresponds to
the usual rule to construct the tableau of the conjugate representation: Complete
the Young tableau of π to form a d × m1 rectangle. The complementary tableau
rotated by 180◦ is the Young tableau of π̄.
Let us discuss now the Casimir elements of SL(d, C). Since SL(d, C) is a subgroup
of GL(d, C) its enveloping algebra S is a subalgebra of G. However the corresponding
Lie algebras differ only by the center of gl(d, C). Hence the center Z(S) of S is a
subalgebra of Z(G). Since sl(d, C) is simple there is no first order Casimir element
and there is only one second order Casimir element C e 2 which is therefore a linear
combination C e 2 = C2 + αC2 of C2 and C2 . Obviously the factor α is uniquely
1 1
determined by the condition that the expression
2
d
X X d
X
e2 (π) = C1 (π) + αC12 (π) =
C m2j + (mj − mk ) + α mj (8.48)
j=1 j<k j=1
with ∂π(C e 2) = C
e2 (π)1I is invariant under the renormalization (m1 , . . . , md ) 7→ (m1 +
µ, . . . , md + µ) with an arbitrary constant µ. Straightforward calculations show that
α = − d1 . Hence we get C e2 = C2 − 1 C 2 and
d 1
Xd Xd X
1
e2 (π) = (d − 1)
C m2j − mj mk + d (mj − mk ) . (8.49)
d j=1 j6=k j<k
After the general discussion let us consider now several special problems. The
first is optimal cloning of pure states. In other words we are searching for a device
T ∈ T (N, M ) which acts on N d-level systems, each of them in the same (unknown)
pure state ρ and which yields at its output side an M -particle system in a state
T ∗ (ρ⊗N ) which approximates the product state ρ⊗M “as good as possible”. This is
obviously easy if N ≥ M holds, because we only have to drop some particles. Hence
we will assume throughout this chapter M > N , as long as nothing else is explicitly
stated.
The presentation in this chapter is based on [230, 136] and it concerns universal
and symmetric cloners, i.e. we are looking at problems which admit symmetry prop-
erties as discussed in Section 8.2. Other related work in this direction, in most cases
restricted to the qubit case can be found in [96, 41, 42, 45]. Other approaches to
quantum cloning, which are not subject of this work, include “asymmetric cloning”,
which arises if we trade the quality of one particular output system against the rest
(see [49]) and cloning of Gaussian states [50].
9.1 Figures of merit
To get a figure of merit ∆ which measures the quality of the clones, we can follow
the general formula (8.4). Hence we have to choose the set of pure states for X
and β(ρ) = ρ⊗M for the target functional. The remaining freedom is the distance
measure δ and there are in fact two physically different choices: We can either
check the quality of each clone separately or we can test in addition the correlations
between output systems. With the notation
where the supremum is taken over all pure states ρ and j = 1, . . . , N and F1C
denotes the “one-particle fidelity”
£ ¤
F1C (T ) = inf tr T (ρ(j) )ρ⊗N . (9.3)
ρ pure,j
∗ ⊗N
∆C1 measures the worst one particle error of the output state T (σ ). If we are
interested in correlations too, we have to choose
£ ¡ ⊗M ⊗N
¢¤
∆Call (T ) = sup 1 − tr T (ρ )ρ C
= 1 − Fall (T ) (9.4)
ρ,pure
∆Call measures again a “worst case” error, but now of the full output with respect
to M uncorrelated copies of the input ρ. Note that we can replace the fidelity
quantities in Equation (9.4) and (9.2) by other distance measures like trace-norm
9. Optimal Cloning 126
distances1 or relative entropies without changing the results we are going to present
significantly (although some proofs might become more difficult). This is however a
special feature of the pure state case. For mixed state cloning the correct choice of
the figure of merit has to be done much more carefully; cf the discussion in Section
9.6.
Another simplification which arises from the restriction to pure input states
concerns the dependency of ∆C C
1 and ∆all on the channel T . Since ρ = |ψihψ| with
ψ ∈ H holds we have
for all A ∈ B(H⊗M ). Therefore tr[T (A)ρ⊗N ] depends only on the part of T (A)
⊗N
which is supported by the symmetric tensor product H+ ⊂ H⊗N , i.e. we have
C C
∆] (T ) = ∆] (T+ ), ] = 1, all with T+ (A) = SN T (A)SN , where SN denotes as
⊗N
in Subsection 8.2.2 the projection onto H+ . This implies that it is sufficient to
consider channels
⊗N
T : B(H⊗M ) → B(H+ ). (9.7)
⊗N
Since H+ ⊂ H⊗N we can look at such a T as a cp-map which takes its values in
⊗N ⊗N
B(H ), but in this sense it is not unital (since T is unital as map into B(H+ )
i.e. we have T (1I) = SN ). In other words the channel T from Equation (9.7) is not
in T (N, M ) and does not fit into the general discussion from Section 8.1. This is,
however, an artificial problem, because we can replace T by
⊥ ⊥
tr(SN ASN ) ⊥
Te(A) = T (A) + ⊥
SN , (9.8)
dim SN
⊥
where SN denotes the orthocomplement of SN . This new map is obviously com-
pletely positive and unital (i.e. in T (N, M )) but the additional term does not change
C e
the value of ∆C C
] (i.e. ∆] (T ) = ∆] (T )). Whenever it is necessary for formal reasons
to consider T from Equation (9.7) as an element of T (N, M ) such an extension is
understood.
9.2 The optimal cloner
According to the discussion from the last paragraph, the optimal cloning map has
⊗N
to take density operators on H+ to operators on H⊗M . An easy way to achieve
such a transformation is to tensor the given operator ρ with the identity operators
belonging to tensor factors (N + 1) through M , i.e., to take ρ 7→ ρ ⊗ 1I⊗(M −N ) . This
breaks the symmetry between the clones, making N perfect copies and (N − M )
states, which are worst possible “copies”. Moreover, it does not map to states on the
⊗M
Bose sector H+ , which would certainly be desirable, as the target states ρ⊗M are
supported by that subspace. An easy way to remedy both defects is to compress the
operator to the symmetric subspace with the projection SM . With the appropriate
normalization this is our definition of the cloning map, later shown to be optimal:
d[N ]
Tb∗ (ρ) = SM (ρ ⊗ 1I⊗(M −N ) )SM (9.9)
d[M ]
⊗N
where d[N ] (respectively d[M ]) denotes the dimension of H+ , i.e.
µ ¶ µ ¶
−d d+N −1
d[N ] = (−1)N = , (9.10)
N N
1 Trace norm distances would lead in the present case to exactly the same results, including the
values of the minimal errors from Proposition 9.2.2. This is easy to check. Other proofs, however,
would become more difficult, which is the reason why we have chosen fidelity based quantities.
127 9.2. The optimal cloner
⊗N
which can be checked easily using the occupation number basis of H+ . The channel
Tb given in Equation (9.9) produces M clones from N input systems. Sometimes it
is useful to have a symbol for Tb which indicates these numbers (i.e. if N and M are
not understood from the context) in this case we write TbN →M instead of Tb. The
following two propositions summarizes the most elementary properties of Tb.
Proof. Full symmetry and complete positivity of Tb are obvious. Hence it remains to
show that Tb is unital. With U(d) covariance of T (which is part of full symmetry)
we get
where π+ N
(U ) = SN U ⊗N SN . Irreducibility of π+ (cf. Proposition 8.2.7) implies
therefore Tb(1I) = λ1I. To determine the value of λ ∈ R+ , let us consider the density
matrix τN = d[N ]−1 SN and
£ ¤ £ ¤
λ = tr Tb(1I)τN = tr Tb∗ (τN ) (9.12)
· ¸
1
= tr SM (SN ⊗ 1I⊗(M −N ) )SM (9.13)
d[M ]
· ¸
SM
= tr = 1. (9.14)
d[M ]
C b C b
Proof. Consider ∆C
1 first. By definition we have ∆1 (T ) = 1 − F1 (T ) and
N
£ ¤ 1 X £ (j) b∗ ⊗M ¤
F1C (Tb) = inf tr T (σ (j) )σ ⊗N = tr ρ T (ρ ) (9.17)
σ pure,j N j=1
M
d[N ] X £ (j) ¤
F1C (Tb) = tr ρ SM (ρ⊗N ⊗ 1I⊗(M −N ) )SM . (9.18)
M d[M ] j=1
9. Optimal Cloning 128
2
P
Since SM is a projector (SM = SM ) and due to [ j σ (j) , SM ] = 0 this equation
leads to
M
d[N ] X £ (j) ⊗N ¤
F1C (Tb) = tr ρ (ρ ⊗ 1I⊗(M −N ) )SM (9.19)
M d[M ] j=1
d[N ] ³ £ ¤
= N tr SM (ρ⊗N ⊗ 1I⊗(M −N ) )SM (9.20)
M d[M ]
£ ¤´
+ (M − N ) tr SM (ρ⊗(N +1) ⊗ 1I⊗(M −N −1) )SM (9.21)
N £ b∗ ¤ (M − N ) d[N ] £ ¤
= tr TN →M (ρ⊗N ) + tr TbN∗ +1→M (ρ⊗(N +1) ) (9.22)
M M d[N + 1]
where ρ = |ψihψ| and the infimum is taken over all normalized ψ ∈ H. Since
∗
SM = SM and SM ψ ⊗M = ψ ⊗M we get
d[N ] ⊗N ⊗N ⊗N d[N ]
F1C (Tb) = inf hψ , ρ ψ ihψ ⊗(M −N ) , ψ ⊗(M −N ) i = . (9.25)
ψ d[M ] d[M ]
Together with ∆C C
all = 1 − Fall Equation (9.16) follows. 2
The significance of Tb lies in the fact that it is the only cloning map which
minimizes ∆C C
1 and ∆all . The central result of this section is therefore the following.
⊗N
Theorem 9.2.3 For any cloning map T : B(H⊗M ) → B(H+ ) (with M > N ,
d ⊗N
H = C and H+ denotes the N -fold symmetric tensor product) we have
¯ ¯
d − 1 ¯¯ N M + d ¯¯
∆C (T ) ≤ 1 − (9.26)
1
d ¯ N +d M ¯
¯ ¯
¯ d[N ] ¯¯
∆C (T ) ≤ ¯ 1 − . (9.27)
all ¯ d[M ] ¯
⊗N ⊗N
with covariant, unital channels Tm : B(Hm ) → B(H+ ) and η ∈ S(H+ ).
Proposition 9.3.1 The channel Tb minimizes the all particle error ∆C C
all = 1 − Fall .
C
Proof. To calculate the all particle fidelity
£ ∗ ⊗N Fall (T ) of¤ a fully symmetric T note that
U(d) covariance of T implies that tr T (ρ )ρ⊗M does not depend on the pure
state ρ ∈ S(H). Hence
£ ¤ £ ¤
C
Fall (T ) = inf tr T ∗ (σ ⊗N )σ ⊗M = tr T ∗ (ρ⊗N )ρ⊗M (9.29)
σpure
⊗M
Since ρ⊗M is supported by H+ we have due to PM 1 = SM the equalities
⊗M ⊗M
PM 1 ρ PM 1 = ρ and Pm ρ⊗M Pm = 0 for m 6= M 1. Hence only one term
remains in the sum (9.29) and we get
C
£ ∗ ⊗N ⊗M
¤
Fall (T ) = cM 1 tr TM 1 (ρ )ρ . (9.31)
If T is optimal we must have cM 1 = 1 and therefore
C
£ ∗ ⊗N ⊗M
¤
T = TM 1 and Fall (T ) = TM 1 (ρ )ρ (9.32)
To get an upper bound on Fall C
we use positivity of the operator SN − ρ⊗N
and U(d) covariance of TM 1 . The latter implies that T ∗ (SN /d[N ]) is a density
M M
matrix which commutes with π+ (U ). Irreducibility of π+ implies together with
∗ ∗
the fact that TM 1 is trace preserving that TM 1 (SN ) = d[N ]/d[M ]SM . Hence we get
according to T = TM 1
£ ¤
0 ≤ tr T ∗ (SN − ρ⊗N )ρ⊗M (9.33)
d[N ] £ ¤ £ ¤
= tr SM ρ⊗M − tr T ∗ (ρ⊗N )ρ⊗M (9.34)
d[M ]
d[N ] £ ¤
= − tr T ∗ (ρ⊗N )ρ⊗M . (9.35)
d[M ]
9. Optimal Cloning 130
C
Together with Equation (9.32) we get Fall (T ) ≤ d[N ]/d[M ]. However we already
know from Proposition 9.2.2 that Fall (T ) = d[N ]/d[M ], hence Tb is optimal.
C b
2
⊗N
Proposition 9.3.2 There is only one channel T : B(H ⊗M ) → B(H+ ) which
C C
minimizes ∆all (respectively maximizes Fall ).
⊗N
Proof. To prove uniqueness consider now a general channel T : B(H ⊗M ) → B(H+ )
C C
which minimizes ∆all (respectively maximizes Fall ). By averaging over the groups
U(d) and SM we get
Z
∗ 1 X £ N ¤
T (η) = Vτ U ⊗M ∗ T ∗ π+ N
(U )ηπ+ (U )∗ U ⊗M Vτ∗ dU (9.36)
M! U(d)
τ ∈SM
which is a fully symmetric channel. Due to convexity and invariance of ∆ C all we get
∆Call (T ) ≤ ∆ C
all (T ) and since T is by assumption already optimal: ∆ C
all (T ) = ∆Call (T ).
Hence T is optimal as well and at the same time fully symmetric; cf. the proof of
∗
Lemma 8.2.1. This implies according to Equation (9.32) that T (η) is supported by
⊗M ∗ ∗ ⊗N
H+ , i.e. T (η) = SM T (η)SM for all η ∈ S(H+ ). Hence we get for an arbitrary
⊗M
vector ψ ∈ H with SM ψ = 0
∗
0 = hψ, T (η)ψi (9.37)
Z
1 X ⊗M ∗ £ N ¤ ®
= U Vτ ψ, T ∗ π+ N
(U )ηπ+ (U )∗ U ⊗M Vτ∗ ψ dU. (9.38)
M! U(d)
τ ∈SM
This is a sum of integrals over positive quantities and it vanishes. Hence the
integrand has to be zero for all U and all τ , which implies in particular that
∗ ⊗M
hψ, T ∗ (η)ψi = 0. In other words T ∗ (η) is as T (η) supported only by H+ .
Since $T$ is optimal we have $F^C_{\mathrm{all}}(T) = d[N]/d[M]$. Together with Equations (9.29) and (9.35) this implies
$$\operatorname{tr}\bigl[\bar T^*(S_N - \rho^{\otimes N})\rho^{\otimes M}\bigr] = 0. \tag{9.39}$$
As in Equation (9.37) we can argue that (9.39) involves an integral over positive quantities which vanishes. Hence Equation (9.39) holds for $T$ as well:
$$\operatorname{tr}\bigl[T^*(S_N - \rho^{\otimes N})\rho^{\otimes M}\bigr] = 0. \tag{9.40}$$
Together with optimality this leads to $\operatorname{tr}\bigl[T^*(S_N)\rho^{\otimes M}\bigr] = d[N]/d[M]$ for all pure states $\rho$. Since the symmetric subspace $\mathcal H_+^{\otimes M}$ is generated by tensor products $\psi^{\otimes M}$, and due to the observation that $T^*(\eta)$ is supported by $\mathcal H_+^{\otimes M}$, we conclude that $T^*(S_N) = (d[N]/d[M])\,S_M$ holds.
To further exploit the optimality condition, consider the Stinespring dilation of $T^*$ in the form
$$T^*(\eta) = \frac{d[N]}{d[M]}\, V^*(\eta\otimes\mathbb 1_{\mathcal K})V \tag{9.41}$$
where $V : \mathcal H_+^{\otimes M}\to\mathcal H_+^{\otimes N}\otimes\mathcal K$ for some auxiliary Hilbert space $\mathcal K$, and $\eta$ is an arbitrary density matrix on $\mathcal H_+^{\otimes N}$. We have included the factor $d[N]/d[M]$ in this definition, so that for an optimal cloner $V^*V = \mathbb 1$. The optimality condition (9.40) written in terms of $V$ becomes
$$0 = \bigl\langle\psi^{\otimes M},\, V^*\bigl((S_N-\rho^{\otimes N})\otimes\mathbb 1_{\mathcal K}\bigr)V\,\psi^{\otimes M}\bigr\rangle = \bigl\|\bigl((S_N-\rho^{\otimes N})\otimes\mathbb 1_{\mathcal K}\bigr)V\,\psi^{\otimes M}\bigr\|^2 \tag{9.42}$$
where $\rho$ is the one-dimensional projection onto $\psi\in\mathcal H$. Equivalently, $\bigl((S_N-\rho^{\otimes N})\otimes\mathbb 1_{\mathcal K}\bigr)V\psi^{\otimes M} = 0$, which is to say that $V\psi^{\otimes M}$ must lie in the subspace $\psi^{\otimes N}\otimes\mathcal K$ for every $\psi$.
hence we get
where $\alpha_{U,\tau}$ is the action defined in Equation (8.6). Furthermore we know from Propositions 9.2.2 and 9.3.1 that $\operatorname{tr}\bigl(\rho^{\otimes M}\,T^*(\rho^{\otimes N})\bigr) \le \frac{d[N]}{d[M]}$ is true for all pure states $\rho\in\mathcal S(\mathcal H)$, and from Proposition 9.3.2 that equality holds iff $T = \hat T$. Consequently we have
$$\frac{1}{M!}\sum_{\tau\in S_M}\int_{U(d)}\Bigl(\frac{d[N]}{d[M]} - \operatorname{tr}\bigl[\rho^{\otimes M}\,(\alpha_{U,\tau}T)^*(\rho^{\otimes N})\bigr]\Bigr)\,dU = \frac{d[N]}{d[M]} - \operatorname{tr}\bigl(\rho^{\otimes M}\,\bar T^*(\rho^{\otimes N})\bigr) = \frac{d[N]}{d[M]} - \operatorname{tr}\bigl(\rho^{\otimes M}\,\hat T^*(\rho^{\otimes N})\bigr) = 0. \tag{9.48}$$
Since the integral on the left hand side of this equation is taken over positive quantities, the integrand has to vanish for all values of $U\in U(d)$ and $\tau\in S_M$. This implies $\operatorname{tr}\bigl[\rho^{\otimes M}\,T^*(\rho^{\otimes N})\bigr] = \frac{d[N]}{d[M]}$ for all pure states $\rho\in\mathcal S(\mathcal H)$. However, according to Proposition 9.3.2, this is only possible if $T = \hat T$. □
For later use (Chapter 11) let us temporarily drop our assumption that $M > N$ holds. In the case $M \le N$ the optimal "cloner" is easy to achieve: we only have to throw away $N - M$ particles, i.e. we can define
$$\hat T_{N\to M}^{\,*}(\rho) = \operatorname{tr}_{N-M}\,\rho \tag{9.49}$$
where $\operatorname{tr}_{N-M}$ denotes the partial trace over $N-M$ tensor factors. As in the $N < M$ case, $\hat T_{N\to M}$ is uniquely determined by optimality. The proof can be done (almost) as in Proposition 9.3.2.
Proposition 9.3.4 Assume that $N \ge M$ holds. Then $\hat T_{N\to M}$ from Equation (9.49) is the only channel $T : \mathcal B(\mathcal H^{\otimes M})\to\mathcal B(\mathcal H_+^{\otimes N})$ with $\Delta^C_{\mathrm{all}}(T) = 0$.
Proof. Assume that $T$ is optimal, i.e. $\Delta^C_{\mathrm{all}}(T) = 0$. Then we can show exactly as in the proof of Proposition 9.3.2 that $T^*(\eta)$ is supported by $\mathcal H_+^{\otimes M}$ for all $\eta\in\mathcal B^*(\mathcal H_+^{\otimes N})$. Hence we get
$$1 = \operatorname{tr}\bigl(T^*(\eta)\bigr) = \operatorname{tr}\bigl(T^*(\eta)S_M\bigr) = \operatorname{tr}\bigl(\eta\,T(S_M)\bigr) \tag{9.50}$$
for each pure state $\eta\in\mathcal B^*(\mathcal H_+^{\otimes N})$. This implies $T(S_M) = S_N$. Optimality of $T$ implies in addition $0 = \Delta^C_{\mathrm{all}}(T) = \Delta^C_{\mathrm{all}}(\bar T)$, where $\bar T$ denotes the averaged channel from Equation (9.36). Since $\bar T$ is fully symmetric we have
$$0 = \sup_{\sigma\ \mathrm{pure}}\bigl[1 - \operatorname{tr}\bigl(\bar T(\sigma^{\otimes M})\sigma^{\otimes N}\bigr)\bigr] = 1 - \operatorname{tr}\bigl(\bar T(\rho^{\otimes M})\rho^{\otimes N}\bigr), \tag{9.51}$$
with an arbitrary pure state $\rho$. The right hand side of this equation should be regarded, as in (9.37), as an integral over positive quantities which vanishes. Hence we get $\operatorname{tr}\bigl(T(\rho^{\otimes M})\rho^{\otimes N}\bigr) = 1$. Together with (9.50) this implies
$$0 = \operatorname{tr}\bigl[T(S_M - \rho^{\otimes M})\rho^{\otimes N}\bigr] \tag{9.52}$$
where $K(m)\in\mathbb N$ and the $T_{mj}$ are $U(d)$-covariant channels with
$$T_{mj}(A) = V_{mj}^*\,(A\otimes\mathbb 1)\,V_{mj}, \qquad V_{mj}\,\pi_+^N = (\pi_m\otimes\tilde\pi_j)\,V_{mj}. \tag{9.54}$$
To calculate $F_1^C$ note that, in analogy to Equation (9.29), the quantity $\operatorname{tr}\bigl[T(\sigma^{(k)})\sigma^{\otimes N}\bigr]$ does not depend on the pure state $\sigma$ or on the index $k = 1,\dots,M$. Hence we get
$$F_1^C(T) = \sum_{k=1}^M \frac{1}{M}\operatorname{tr}\bigl[T(\rho^{(k)})\rho^{\otimes N}\bigr] \tag{9.55}$$
for an arbitrary pure state $\rho$. Now consider the Lie algebra $\mathrm{sl}(d,\mathbb C)$ of $\mathrm{SL}(d,\mathbb C)$, i.e. the space of trace-free $d\times d$ matrices equipped with the commutator as the Lie bracket. The map $\mathrm{sl}(d,\mathbb C)\ni X\mapsto\sum_k X^{(k)}$ is the representation of $\mathrm{sl}(d,\mathbb C)$ corresponding to $U\mapsto U^{\otimes M}$. Hence we get
$$P_m\,\sum_{k=1}^M X^{(k)}\,P_m = \partial\pi_m(X)\otimes\mathbb 1_{\mathcal K_m}, \tag{9.56}$$
To further exploit this equation we need the following lemma, which helps to calculate $T_{mj}(\partial\pi_m(X))$.
Lemma 9.4.1 Let $\pi : U(d)\to\mathcal B(\mathcal H_\pi)$ be a unitary representation, and let $T : \mathcal B(\mathcal H_\pi)\to\mathcal B(\mathcal H_+^{\otimes N})$ be a completely positive, unital and $U(d)$-covariant map, i.e. $T\bigl(\pi(u)A\pi(u)^*\bigr) = \pi_+^N(u)\,T(A)\,\pi_+^N(u)^*$. Then there is a number $\omega(T)$ such that
$$T[\partial\pi(X)] = \omega(T)\sum_{k=1}^N X^{(k)}, \tag{9.58}$$
Consider a linear map $L$ satisfying the covariance condition
$$L(UXU^*) = \pi_+^N(U)\,L(X)\,\pi_+^N(U)^*. \tag{9.59}$$
Now note that we can identify $\mathcal B(\mathcal H_+^{\otimes N})$ with the tensor product $\mathcal H_+^{\otimes N}\otimes\mathcal H_+^{\otimes N}$. Hence the map which associates to each $U\in\mathrm{SU}(d)$ the operator $\mathcal B(\mathcal H_+^{\otimes N})\ni X\mapsto \pi_+^N(U)X\pi_+^N(U)^*\in\mathcal B(\mathcal H_+^{\otimes N})$ can be reinterpreted as a unitary representation of $\mathrm{SU}(d)$ on the representation space $\mathcal H_+^{\otimes N}\otimes\mathcal H_+^{\otimes N}$. In fact it is (unitarily equivalent to) the tensor product $\pi_+^N\otimes\pi_+^N$. Since $\mathrm{SU}(d)\ni U\mapsto U(\,\cdot\,)U^{-1}\in\mathcal B(\mathrm{su}(d))$ is the adjoint representation of $\mathrm{SU}(d)$, this implies that each linear map $L$ satisfying (9.59) intertwines $\pi_+^N\otimes\pi_+^N$ and the adjoint representation $\mathrm{Ad}$. Note in addition that the representation
$$\mathrm{sl}(d,\mathbb C)\ni X\mapsto \partial\pi_+^N(X) = \sum_{j=1}^N X^{(j)}\in\mathcal B(\mathcal H_+^{\otimes N}) \tag{9.60}$$
of the Lie algebra $\mathrm{sl}(d,\mathbb C)$ satisfies Equation (9.59) as well. Hence we have to show that all such intertwiners are proportional, or in other words that $\mathrm{Ad}$ is contained in $\pi_+^N\otimes\pi_+^N$ exactly once. This, however, is a straightforward application of standard results from group theory. We omit the details here; see [241, § 79, Ex. 4] instead. □
Hence, to find the minimizer we have to maximize ω(Tmj ), and this is in fact
the hard part of the proof. Therefore we will explain the idea first in the d = 2 case.
9.4.2 The qubit case
For $d = 2$ the representations of $\mathrm{SU}(2)$ are conventionally labeled by their "total angular momentum" $\alpha = 0, 1/2, 1, \dots$, which is related to the highest weight $m = (m_1, m_2)$ by $\alpha = (m_1 - m_2)/2$. The irreducible representation $\pi_\alpha$ has dimension $2\alpha + 1$, and is isomorphic to $\pi_+^N$ with $N = 2\alpha$ in the notation used above. For $\alpha = 1$ we get the 3-dimensional representation isomorphic to the rotation group, which is responsible for the importance of this group in physics. In a suitable basis $X_1, X_2, X_3$ of the Lie algebra $\mathrm{su}(2)$ we get the commutation relations $[X_1, X_2] = X_3$, and cyclic permutations of the indices thereof. In the $\alpha = 1$ representation $\partial\pi_1(X_k)$ generates the rotations around the $k$-axis in 3-space. The Casimir operator (cf. Subsection 8.3.3) of $\mathrm{SU}(2)$ is the square of this vector operator, i.e. $\tilde C_2 = \sum_{k=1}^3 X_k^2$. In the representation $\pi_\alpha$ it is the scalar $\alpha(\alpha+1)$, i.e., if we extend the representation $\partial\pi$ of the Lie algebra to the universal enveloping algebra (which also contains polynomials in the generators), we get $\partial\pi_\alpha(\tilde C_2) = \alpha(\alpha+1)\mathbb 1$. We can use this to determine $\omega(T_{mj})$ from Proposition 9.4.2 for arbitrary irreducible representations. This computation can be seen as an elementary computation of a so-called 6j-symbol, but we will not need to invoke any of the 6j-machinery.
Lemma 9.4.3 Consider three irreducible $\mathrm{SU}(2)$ representations $\pi_\alpha, \pi_\beta, \pi_\gamma$ with $\alpha,\beta,\gamma\in\{0,1/2,\dots\}$, an intertwining isometry $V\pi_\gamma = (\pi_\alpha\otimes\pi_\beta)V$ and the corresponding channel $T(A) = V^*(A\otimes\mathbb 1)V$. Then we have
$$\omega(T) = \frac12 + \frac{\alpha(\alpha+1)-\beta(\beta+1)}{2\gamma(\gamma+1)}. \tag{9.62}$$
The tensor product in the second summand can be re-expressed in terms of Casimir operators as
$$\sum_k \partial\pi_\alpha(X_k)\otimes\partial\pi_\beta(X_k) = \frac12\sum_k\bigl(\partial\pi_\alpha(X_k)\otimes\mathbb 1 + \mathbb 1\otimes\partial\pi_\beta(X_k)\bigr)^2 - \frac12\,\partial\pi_\alpha(\tilde C_2)\otimes\mathbb 1_\beta - \frac12\,\mathbb 1_\alpha\otimes\partial\pi_\beta(\tilde C_2). \tag{9.65}$$
Inserting this into the previous equation, using the intertwining property once again, and inserting the appropriate scalars for $\partial\pi(\tilde C_2)\equiv\tilde C_2(\pi)\mathbb 1$, we find that
$$\omega(T)\cdot\tilde C_2(\pi_\gamma) = \tilde C_2(\pi_\alpha) + \frac12\bigl(\tilde C_2(\pi_\gamma) - \tilde C_2(\pi_\alpha) - \tilde C_2(\pi_\beta)\bigr), \tag{9.66}$$
and hence
$$\omega(T) = \frac12 + \frac{\tilde C_2(\pi_\alpha) - \tilde C_2(\pi_\beta)}{2\,\tilde C_2(\pi_\gamma)}. \tag{9.67}$$
Inserting the value for $\tilde C_2$ we find
$$\omega = \frac12 + \frac{\alpha(\alpha+1) - \beta(\beta+1)}{2\gamma(\gamma+1)}. \tag{9.68}$$
Note that we have only used the fact that the Casimir operator $\tilde C_2$ is some fixed quadratic expression in the generators. This is also true for $\mathrm{SU}(d)$. Hence equation (9.67) also holds in the general case; this observation leads directly to Lemma 9.4.4. In particular, we have shown that for the purpose of optimizing $\omega(T_{mj})$ for any finite $d$ only the isomorphism types of $\pi_\alpha$ and $\pi_\beta$ are relevant, but not the particular intertwiner $V$.
To calculate $\omega(T_{mj})$ we have to set $\gamma = N/2$, and $\alpha$ is constrained by the condition that $\pi_\alpha$ must be a subrepresentation of $U\mapsto U^{\otimes M}$, which is equivalent to $\alpha\le M/2$. Finally we have $\tilde\pi_j = \pi_\beta$ for some $\beta = 0, 1/2, 1,\dots$, which is constrained by the condition that there must be a non-zero intertwiner between $\pi_\gamma$ and $\pi_\alpha\otimes\pi_\beta$. It is well known that this condition is equivalent to the inequality $|\alpha-\beta|\le\gamma\le\alpha+\beta$. This is the same as the "triangle inequality": the sum of any two of $\alpha, \beta, \gamma$ is not smaller than the third. The region of admissible pairs $(\alpha,\beta)$ is represented in Fig. 1.
Since $x\mapsto x(x+1)$ is increasing for $x\ge 0$, we maximize $\omega$ with respect to $\beta$ in equation (9.62) if we choose $\beta$ as small as possible, i.e. $\beta = |\alpha-\gamma|$. Then the numerator in equation (9.62) becomes
[Fig. 1: the region of admissible pairs $(\alpha,\beta)$, cut out by the triangle condition $|\alpha-\gamma|\le\beta\le\alpha+\gamma$ and the constraint $\alpha\le M/2$, with axis marks $N/2$, $M/2$ on the $\alpha$-axis and $(M-N)/2$, $N/2$ on the $\beta$-axis; the maximum $\omega_{\max}$ is attained at the corner $\alpha = M/2$, $\beta = (M-N)/2$.]
Then we have
$$\omega(T_{mn}) = \frac12 + \frac{\tilde C_2(\pi_m) - \tilde C_2(\pi_n)}{2\,\tilde C_2(\pi_+^N)}, \tag{9.72}$$
where $\tilde C_2$ denotes the second order Casimir operator of $\mathrm{SU}(d)$; cf. Subsection 8.3.3.
which follows from Lemma 9.4.1. Note that equation (9.73) is valid only for $X\in\mathrm{su}(d)$ (and not for $X\in\mathrm u(d)$ in general). Hence we have to consider the second order Casimir operator $\tilde C_2$ of $\mathrm{SU}(d)$, which is given, according to Subsection 8.3.3, by an expression of the form $\tilde C_2 = \sum_{jk} g^{jk}X_jX_k$. This is all we needed in the derivation of equation (9.67) in Lemma 9.4.3. Hence the statement follows. □
All channels $T_{mj}$ from Equation (9.54) are of the form (9.71) with some highest weight $n$. Hence the previous lemma, together with Proposition 9.4.2 and the fact that $\tilde C_2(\pi_+^N)$ is a positive constant, shows that we only have to maximize the function
$$W\ni(m,n)\mapsto F(m,n) = \tilde C_2(\pi_m) - \tilde C_2(\pi_n)\in\mathbb Z \tag{9.74}$$
on its domain
$$W = \{(m,n)\in\mathbb Z^d\times\mathbb Z^d \mid m\in Y_d(M)\ \text{and}\ \pi_+^N\subset\pi_m\otimes\pi_n\}, \tag{9.75}$$
where $\pi_+^N\subset\pi_m\otimes\pi_n$ stands for: "$\pi_+^N$ is a subrepresentation of $\pi_m\otimes\pi_n$"; the latter is a necessary and sufficient condition for the existence of an intertwining isometry $V$ between $\pi_+^N$ and $\pi_m\otimes\pi_n$. The first step is the following Lemma.
$$F(m,n) = F_1(m,n) - \frac{2MN - N^2}{d}, \tag{9.76}$$
with
Proof. The first step is to re-express $F(m,n)$ in terms of the $U(d)$ Casimir operators $C_2$ and $C_1^2$. Note in this context that, although equation (9.73) is, as already stated, valid only for $X\in\mathrm{su}(d)$, the representations $\pi_m$ and $\pi_n$ are still $U(d)$ representations. Hence we can apply the equation $\tilde C_2 = C_2 - \frac1d C_1^2$ given in Section 8.3.3:
$$F(m,n) = C_2(\pi_m) - C_2(\pi_n) - \frac1d\bigl(C_1^2(\pi_m) - C_1^2(\pi_n)\bigr). \tag{9.78}$$
This rewriting is helpful, because the invariants $C_1$ turn out to be independent of the variational parameters: Since $\pi_m$ is a subrepresentation of $U\mapsto U^{\otimes M} = \pi_1^{\otimes M}(U)$ and $\partial\pi_1^{\otimes M}(\mathbb 1) = M\mathbb 1$, we also have $C_1(\pi_m) = M$. On the other hand, the existence of an intertwining isometry $V$ with $V\pi_+^N = (\pi_m\otimes\pi_n)V$ implies
$$C_1(\pi_+^N)\,V = V\,\partial\pi_+^N(C_1) = \bigl(\partial\pi_m(C_1)\otimes\mathbb 1_n + \mathbb 1_m\otimes\partial\pi_n(C_1)\bigr)V = \bigl(C_1(\pi_m)\mathbb 1 + C_1(\pi_n)\mathbb 1\bigr)V \tag{9.79}$$
and therefore $C_1(\pi_+^N) = C_1(\pi_m) + C_1(\pi_n)$. Since $C_1(\pi_+^N) = N$ and $C_1(\pi_m) = M$ we get $C_1(\pi_n) = N - M$. Inserting this into equation (9.78) the statement follows. □
We see that only $F_1$ depends on the variational parameter and has to be maximized over $W$. To do this we have to express the constraints defining the domain $W$ more explicitly.
Lemma 9.4.6 If we introduce for each $n\in\mathbb Z^d$ the notation $\tilde n = (\tilde n_1,\dots,\tilde n_d) = (-n_d,\dots,-n_1)$, we can express the set $W$ as
$$W = \{(m,n) \mid \tilde n = m - \mu,\ \text{and}\ (m,\mu)\in W_1\} \tag{9.80}$$
with
$$W_1 = \Bigl\{(m,\mu)\in Y_d(M)\times\mathbb Z^d \,\Big|\, \sum_{k=1}^d\mu_k = N\ \text{and}\ 0\le\mu_k\le m_k - m_{k+1}\ \forall k = 1,\dots,d-1\Bigr\}. \tag{9.81}$$
The function $F_1$ is then given by
$$F_1(m,n) = F_1(m,\tilde n) = \sum_{k=1}^d\mu_k(2m_k - 2k - \mu_k) + (d+1)\sum_{k=1}^d\mu_k = F_2(m,\mu) + (d+1)N \tag{9.82}$$
with
$$W_1\ni(m,\mu)\mapsto F_2(m,\mu) = \sum_{k=1}^d\mu_k(2m_k - 2k - \mu_k)\in\mathbb Z. \tag{9.83}$$
Together with Equation (9.86) and the definition of $\tilde n$ we get
$$\pi_+^N\subset\pi_m\otimes\pi_n \iff \tilde n_k = m_k - \mu_k\ \text{with}\ 0\le\mu_k\le m_k - m_{k+1}\ \forall k=1,\dots,d-1\ \text{and}\ \sum_{k=1}^d\mu_k = N, \tag{9.88}$$
and this implies the statement about $W$. To express the function $F_1$ in terms of the new variables note that $C_2(\pi_n) = C_2(\pi_{\tilde n})$, hence
$$F_1(m,n) = F_1(m,\tilde n) = F_1(m, m-\mu). \tag{9.89}$$
Together with equation (9.77) this implies (9.76). □
This defines a variational problem in its own right. Any step increasing $m_1$ at the expense of some other $m_k$ increases $F_2$. This process terminates either when $m_1 = M$ and all other $m_k = 0$; this is surely the case for $M < N$, because then $\mu_d = N - m_1 + m_d \ge N - M > 0$. This is already the final result claimed in the Lemma. On the other hand, the process may terminate because $\mu_d$ reaches $0$ or would become negative. In the former case we get $\mu_d = 0$, and hence Case C or Case D. The latter case (termination at $\mu_d = 1$) may occur because the transformation $m_1\mapsto m_1+1$, $m_d\mapsto m_d-1$ changes $\mu_d = N - m_1 + m_d$ by $-2$. There are two basic situations in which changing both $m_1$ and $m_d$ is the only option for maximizing $F_3$, namely $d = 2$ and $m_1 = m_2 = \dots = m_d$. The first case is treated below as Case E. In the latter case we have $1 = N - m_1 + m_d = N$. Then the overall variational problem in the Lemma is trivial, because only one term remains, and one only has to maximize the quantity $2m_k - 2k - 1$, with trivial maximum at $k = 1$, $m_1 = M$.
Case C: $\mu_d = 0$, $m_d > 0$. For $\mu_d = 0$ the number $m_d$ does not enter the function $F_2$. Therefore the move $m_d\mapsto 0$, $m_1\mapsto m_1 + m_d$ increases $F_2$ by $\mu_1 m_d\ge 0$. Note that this is always compatible with the constraints, and we end up in Case D.
Case D: $\mu_d = 0$, $m_d = 0$, $d > 2$. Set $d\mapsto d-1$. Note that we could now use the extra constraint $\mu_{d'}\le m_{d'}$, where $d' = d-1$. We will not use it, so in principle we might get a larger maximum. However, since we do find a maximizer satisfying all constraints, we still get a valid maximum.
Case E: $d = 2$, $\mu_1 = m_1 - m_2$, $\mu_2 = 1$. In this case $m = (m_1, m_2)$ is completely fixed by the constraints. We have $m_1 + m_2 = M$ and $\mu_1 + \mu_2 = m_1 - m_2 + 1 = N$, hence $m_1 - m_2 = N - 1$. This implies $2m_1 = M + N - 1$, $2m_2 = M - N + 1$, and since $m_2\ge 0$ we get $M\ge N-1$. If $M = N-1$ holds we get $m_1 = N-1 = M$, $m_2 = 0$ and consequently $\mu_1 = N-1$. Together with $\mu_2 = 1 = N - M$ these are exactly the parameters where $F_2$ should take its maximum according to the Lemma. Hence assume $M\ge N$. In this case $\mu_2 = 1$ implies that $F_2$ becomes $NM - 3N - 2$, which is, due to $M\ge N$, strictly smaller than $F_2(M,0;N,0) = 2MN - N^2 - 2N$.
Uniqueness: In all cases just discussed the manipulations described lead to a strict increase of $F_2(m,\mu)$ as long as $(m,\mu)\ne(m_{\max},\mu_{\max})$ holds. The only exception is Case C with $\mu_1 = 0$. In this situation there is a $1 < k < d$ with $\mu_k > 0$. Hence we can apply the maps $d\mapsto d-1$ (Case D) and $m_d\mapsto 0$, $m_1\mapsto m_1+m_d$ (Case C) until we get $\mu_d\ne 0$ (i.e. $d$ reaches $k$). Since $\mu_1 = 0$ the corresponding $(m,\mu)$ is not equal to $(m_{\max},\mu_{\max})$. Therefore we can apply one of the manipulations described in Case A, Case B or Case E, which leads to a strict increase of $F_2(m,\mu)$. This shows that $F_2(m,\mu) < F_2(m_{\max},\mu_{\max})$ as long as $(m,\mu)\ne(m_{\max},\mu_{\max})$ holds. Consequently the maximum is unique. □
$$\omega_{\max} = \omega(\hat T) = \frac{M+d}{N+d}$$
and with Proposition 9.4.2 we get $\Delta_1^C(T)\ge\Delta_1^C(\hat T)$ for all $T$. □
Proof. One part of the uniqueness proof is already given above: there is only one optimal fully symmetric cloning map, namely $\hat T$. This follows easily from the uniqueness of the maximum found in Lemma 9.4.7 and from the fact that the representation $\pi_+^N$ is contained exactly once in the tensor product $\pi_M^+\otimes\pi_{M-N}^+$ (cf. Equation (9.87) and the decomposition of a fully symmetric $T$ from Equation (9.53)).
Suppose now that $T$ is a non-covariant cloning map which also attains the best value: $\Delta_1^C(T) = \Delta_1^C(\hat T)$. Then we may consider the average $\bar T$ of $T$ over the group $S_M\times U(d)$ (cf. Equation (9.36)), which is also optimal and, in addition, fully symmetric. Therefore $\bar T = \hat T$. The uniqueness part of the proof thus follows immediately from Proposition 9.3.3. □
$$\lim_{N\to\infty}\Delta^C_{\#}(\hat T_{N\to\lfloor rN\rfloor}) = 0 \tag{9.93}$$
holds, where $\lfloor x\rfloor$ denotes the biggest integer smaller than $x$. Note that this question is related to entanglement distillation and channel capacities, where asymptotic rates are used as well. In the present case the complete answer to our question is given by the following Theorem.
Theorem 9.5.1 For each asymptotic rate $r\in[1,\infty]$ we have
$$\lim_{N\to\infty}\Delta_1^C(\hat T_{N\to\lfloor rN\rfloor}) = 0 \tag{9.94}$$
$$\lim_{N\to\infty}\Delta^C_{\mathrm{all}}(\hat T_{N\to\lfloor rN\rfloor}) = 1 - \frac{1}{r^{d-1}}. \tag{9.95}$$
Proof. Consider first the one-particle error. According to Equation (9.15) we have
$$\Delta_1^C(\hat T_{N\to\lfloor rN\rfloor}) \le \frac{d-1}{d}\,\biggl|\,1 - \frac{N}{N+d}\,\frac{(r-1)N+d}{(r-1)N}\,\biggr| \tag{9.96}$$
$$= \frac{d-1}{d}\,\biggl|\,1 - \frac{1}{r-1}\,\frac{(r-1)+d/N}{1+d/N}\,\biggr|. \tag{9.97}$$
Hence $\lim_{N\to\infty}\Delta_1^C(\hat T_{N\to\lfloor rN\rfloor}) = 0$ as stated. The all particle error $\Delta^C_{\mathrm{all}}$ is given, according to Equations (9.16) and (9.10), by
$$\Delta^C_{\mathrm{all}}(\hat T_{N\to\lfloor rN\rfloor}) = 1 - \frac{(N+1)(N+2)\cdots(N+d-1)}{(\lfloor rN\rfloor+1)(\lfloor rN\rfloor+2)\cdots(\lfloor rN\rfloor+d-1)} \tag{9.98}$$
hence we get
$$\lim_{N\to\infty}\Delta^C_{\mathrm{all}}(\hat T_{N\to\lfloor rN\rfloor}) = 1 - \lim_{N\to\infty}\frac{(N+1)(N+2)\cdots(N+d-1)}{(rN+1)(rN+2)\cdots(rN+d-1)} = 1 - \frac{1}{r^{d-1}}. \tag{9.99}$$
This result complements Theorem 9.2.3, where we have seen that the one- and all-particle errors admit the same (unique) optimal cloner. If we consider the asymptotic behavior we see that both figures of merit behave very differently: We can produce optimal copies at infinite rate if we measure only the quality of individual clones. If we take correlations into account as well, the rate is, however, zero.
9.6 Cloning of mixed states
Up to now we have excluded a discussion of mixed state cloning and related tasks.
The reason is that the search for a reasonable figure of merit is much more difficult
in this case and not even clarified for classical systems. At first, the latter statement
sounds strange, because it is indeed possible to copy classical information without
any error. However, cloning mixed states of a classical system does not mean to
copy a particular code word (e.g. from the hard drive in the memory of a computer)
but to enlarge a sample of N iid random variables to a size M > N .
To explain this in greater detail let us consider a finite alphabet X, the corre-
sponding classical observable algebra C(X) and a channel T : C(X M ) → C(X N ).
This T can be interpreted as a device which maps codewords of length N to code-
words of length M and it is uniquely characterized by the matrix T~xy~ , ~x ∈ X N ,
~y ∈ X M of transition probabilities; i.e. T~x~y denotes the probability that the code-
word ~x = (x1 , . . . , xN ) is mapped to ~y = (y1 , . . . , yM ). If S denotes in addition
a source which produces letters from X independently and identically distributed
according to the (unknown) probability distribution p ∈ S(X) (recall the notation
and terminology from Subsection 2.1.3), we can describe classical cloning as follows:
Draw a sample ~x ∈ X N from S and generate with probability T~xy~ a bigger sequence
~y = (y1 , . . . , yM ) ∈ X M which reflects the statistics of S as good as possible. This
means the output distribution T ∗ (p⊗N ) with
X
(T ∗ ρ⊗N )(|~y ih~y |) = T~xy~ px1 . . . pxN (9.102)
x∈X N
~
for mixed state cloners is most probably even more difficult, and good proposals are up to now not available.
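The classical setting of Equation (9.102) is easy to simulate. The following sketch uses a binary alphabet and, as a purely hypothetical example of a channel $T_{\vec x\vec y}$ (not a construction from the text), resamples $M$ letters from the empirical distribution of the $N$ input letters; the output then reproduces the single-letter statistics of the source on average:

```python
import random

random.seed(0)

# alphabet X = {0, 1}; a classical "cloner" maps words of length N to
# words of length M by resampling from the input's empirical distribution
def clone_classical(word, M):
    f1 = sum(word) / len(word)   # empirical frequency of letter 1
    return [1 if random.random() < f1 else 0 for _ in range(M)]

# Monte Carlo estimate of the output statistics (9.102) for an iid source
p, N, M, runs = 0.7, 5, 10, 20000
total = 0.0
for _ in range(runs):
    x = [1 if random.random() < p else 0 for _ in range(N)]
    y = clone_classical(x, M)
    total += sum(y) / M
mean_freq = total / runs   # should be close to the source parameter p
```

The mean single-letter frequency of the output matches $p$, but (as the text emphasizes) this alone says nothing about how well the joint statistics of the enlarged sample mimic $M$ genuinely iid draws.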
Chapter 10
State estimation
Our next topic is quantum estimation theory, i.e. we are looking at measurements on $N$ $d$-level quantum systems which are all prepared in the same state $\rho$. There is quite a lot of literature about this topic and we are not able to give a complete discussion here (cf. Helstrom's book [109] for an overview and [107, 166, 173, 2, 89, 94, 43, 42, 68] and the references therein for a small number of recent publications). Instead we will follow the symmetry based approach already used in the last two chapters. Parts of the presentation (Theorem 10.2.4) are based on [137]. Other results (Theorems 10.1.2 and 10.2.6) are not yet published.
10.1 Estimating pure states
Consider first the case where we know that the N input systems are all prepared
in the same pure but otherwise unknown state. As for optimal cloning this assump-
tion leads to great simplifications and therefore to a quite complete solution of the
corresponding optimization problem.
10.1.1 Relations to optimal cloning
As already discussed in Section 4.2 cloning and estimation are closely related: If E :
C(S) → B(H⊗N ) is an estimator we can construct a cloning map TE : B(H⊗M ) →
B(H⊗N ) by (cf. Equation (4.19))
Z
TE∗ (ρ⊗N ) = σ ⊗M tr[E(dσ)ρ⊗N ]. (10.1)
S
where ψ, φ ∈ H⊗M and fψφ ∈ C(S) is the function given by fψφ (σ) = hψ, σ ⊗M φi.
If we insert $T_E$ into the figure of merit $\Delta_1^C$ we get, according to Equation (9.2),
$$\Delta_1^C(T_E) = 1 - \inf_{\rho\ \mathrm{pure},\,j}\operatorname{tr}\bigl[\rho^{(j)}\,T_E^*(\rho^{\otimes N})\bigr] \tag{10.3}$$
$$= 1 - \inf_{\rho\ \mathrm{pure}}\int_S\operatorname{tr}(\rho\sigma)\operatorname{tr}\bigl[E(d\sigma)\rho^{\otimes N}\bigr] \tag{10.4}$$
$$= 1 - \inf_{\rho\ \mathrm{pure}}\operatorname{tr}\bigl[\rho\langle E\rangle_\rho\bigr] \tag{10.5}$$
where $\langle E\rangle_\rho$ denotes the expectation value of $E$ in the state $\rho^{\otimes N}$, i.e.
$$\langle\psi,\langle E\rangle_\rho\phi\rangle = \int_S\langle\psi,\sigma\phi\rangle\operatorname{tr}\bigl[E(d\sigma)\rho^{\otimes N}\bigr] = \operatorname{tr}\bigl[E(f_{\psi\phi})\rho^{\otimes N}\bigr], \qquad\forall\phi,\psi\in\mathcal H \tag{10.6}$$
with $f_{\psi\phi}\in C(S)$, $f_{\psi\phi}(\rho) = \langle\psi,\rho\phi\rangle$. Hence a possible figure of merit for the estimation of pure states is the biggest deviation of the expectation value from the "true" density matrix $\rho$, i.e.
$$\Delta_p^E(E) = \sup_{\rho\ \mathrm{pure}}\bigl(1 - \operatorname{tr}\bigl[\rho\langle E\rangle_\rho\bigr]\bigr). \tag{10.7}$$
and therefore $\langle\alpha_U E\rangle_\rho = U\langle E\rangle_{U^*\rho U}U^*$ holds. In the same way we can show that $\langle\alpha_\tau E\rangle_\rho = \langle E\rangle_\rho$ holds for each permutation $\tau\in S_N$. Inserting this in (10.7) we get
$$\Delta_p^E(\alpha_{U,\tau}E) = \sup_{\rho\ \mathrm{pure}}\bigl(1 - \operatorname{tr}\bigl[\rho\langle\alpha_{U,\tau}E\rangle_\rho\bigr]\bigr) \tag{10.12}$$
$$= \sup_{\rho\ \mathrm{pure}}\bigl(1 - \operatorname{tr}\bigl[\rho\,U\langle E\rangle_{U^*\rho U}U^*\bigr]\bigr) \tag{10.13}$$
$$= \sup_{\rho\ \mathrm{pure}}\bigl(1 - \operatorname{tr}\bigl[(U^*\rho U)\langle E\rangle_{U^*\rho U}\bigr]\bigr) = \Delta_p^E(E). \tag{10.14}$$
We can therefore invoke Lemma 8.2.3 to see that we can search for estimators which minimize $\Delta_p^E$ among the covariant ones, and the latter are completely characterized by Theorem 8.2.12. The general structure is still quite complicated. However, we know that the input states are pure, and this leads to several possible simplifications. First of all, $\Delta_p^E(E)$ detects only the part of $E$ which is supported by the symmetric subspace $\mathcal H_+^{\otimes N}$. Hence we can restrict the discussion in this subsection to observables of the form
$$E : C(S)\to\mathcal B(\mathcal H_+^{\otimes N}). \tag{10.15}$$
This is the same type of assumption we have made already in the last chapter. In addition it is reasonable to search for an optimal pure state estimator among those observables which are concentrated on the set of pure states. But the latter is transitive under the action of $U(d)$. Covariance therefore implies, according to Theorem 8.2.11, that we have to look for observables of the form
$$E(f) = \int_{U(d)} f(U\sigma_0U^*)\,U^{\otimes N}P_0\,U^{\otimes N*}\,dU \tag{10.16}$$
where $\sigma_0$ is a fixed but arbitrary pure state and $P_0$ is a positive operator on $\mathcal H_+^{\otimes N}$. The most obvious choice for $P_0$ is just $\sigma_0^{\otimes N}$. Hence we define
$$\hat E(f) = d[N]\int_{U(d)} f(U\sigma_0U^*)\,(U\sigma_0U^*)^{\otimes N}\,dU. \tag{10.17}$$
Sometimes it is useful to keep track of the number of input systems on which the estimator operates. In this case we write $\hat E_N$ instead of $\hat E$.
The map $\hat E$ is obviously positive. To see that it is unital as well (and therefore an observable), note that
$$\int_{U(d)}(U\sigma U^*)^{\otimes N}\,dU = \int_{U(d)} U^{\otimes N}\sigma^{\otimes N}U^{\otimes N*}\,dU \tag{10.18}$$
is an operator which is supported by the symmetric subspace $\mathcal H_+^{\otimes N}$ and commutes with all unitaries $U^{\otimes N}$. Hence, by irreducibility of $\pi_+^N$, it coincides with $\lambda S_N$, where $\lambda$ is a positive constant which can be determined by
$$\lambda\,d[N] = \lambda\operatorname{tr}(S_N) = \operatorname{tr}\Bigl[\int_{U(d)}(U\sigma U^*)^{\otimes N}\,dU\Bigr] = \int_{U(d)}\operatorname{tr}\bigl[(U\sigma U^*)^{\otimes N}\bigr]\,dU = 1. \tag{10.19}$$
$$\Delta_p^E(\hat E) = \frac{d-1}{N+d} \tag{10.20}$$
and is therefore optimal.
Proof. Due to covariance the quantity $\operatorname{tr}\bigl[\rho\langle\hat E\rangle_\rho\bigr]$ does not depend on the pure state $\rho$. Hence we have
$$\Delta_p^E(\hat E) = 1 - \operatorname{tr}\bigl[\rho\langle\hat E\rangle_\rho\bigr] = 1 - \langle\psi,\langle\hat E\rangle_\rho\psi\rangle \tag{10.21}$$
for an arbitrary but fixed $\rho = |\psi\rangle\langle\psi|$. Inserting Equation (10.17) into (10.6) we get
$$\operatorname{tr}\bigl[\rho\langle\hat E\rangle_\rho\bigr] = d[N]\int_{U(d)}\operatorname{tr}(\rho\,U\sigma U^*)\operatorname{tr}\bigl(\rho^{\otimes N}(U\sigma U^*)^{\otimes N}\bigr)\,dU \tag{10.22}$$
$$= d[N]\operatorname{tr}\Bigl[\rho^{\otimes(N+1)}\int_{U(d)}(U\sigma U^*)^{\otimes(N+1)}\,dU\Bigr] = \frac{d[N]}{d[N+1]} \tag{10.23}$$
where we have used for the last equality the same reasoning as in Equation (10.19). Hence Equation (10.20) follows from the definition of $d[N]$ in (9.10), and optimality is an immediate consequence of the bound (10.8) derived from optimal cloning. □
effect $\hat E(\omega)$ such that $\operatorname{tr}\bigl[\rho^{\otimes N}\hat E(\omega)\bigr]$ is the probability to get an estimate in $\omega$ if the $N$ input systems were in the joint state $\rho^{\otimes N}$. Since $\hat E$ is concentrated on the set $P\subset S$ of pure states, only subsets $\omega$ of $P$ are interesting here. This leads to the following theorem.
Theorem 10.1.2 Consider the estimator $\hat E$ defined in Equation (10.17). The sequence of probability measures on the set $P$ of pure states
$$K_N(\omega) = \operatorname{tr}\bigl[\hat E_N(\omega)\rho^{\otimes N}\bigr] \tag{10.24}$$
satisfies the large deviation principle with rate function $I(\sigma) = -\ln\operatorname{tr}(\rho\sigma)$.
Proof. We use Theorem 10.3.5 and show that the probability measures $K_N$ satisfy the Laplace principle (cf. Definition 10.3.4). Hence consider a continuous, bounded function $f : P\to\mathbb R$ and
$$\lim_{N\to\infty}\frac1N\ln\int_{U(d)} e^{-Nf(U\sigma_0U^*)}\operatorname{tr}(\rho\,U\sigma_0U^*)^N\,dU \tag{10.25}$$
$$= \lim_{N\to\infty}\frac1N\ln\int_{U(d)} e^{-N[f(U\sigma_0U^*) - \ln\operatorname{tr}(\rho\,U\sigma_0U^*)]}\,dU \tag{10.26}$$
$$= -\inf_{U\in U(d)}\bigl[f(U\sigma_0U^*) - \ln\operatorname{tr}(\rho\,U\sigma_0U^*)\bigr] \tag{10.27}$$
$$= -\inf_{\sigma\in P}\bigl[f(\sigma) - \ln\operatorname{tr}(\rho\sigma)\bigr]. \tag{10.28}$$
To derive the second equation we have used Varadhan's Theorem (Theorem 10.3.2) and the fact that a constant sequence of measures satisfies the large deviation principle with zero rate function. Theorem 10.3.5 now implies that the $K_N$ satisfy the large deviation principle as stated. □
Since the rate function $I(\sigma) = -\ln\operatorname{tr}(\rho\sigma)$ is positive and vanishes only for $\sigma = \rho$, we see that the probability measures $K_N(\omega) = \operatorname{tr}\bigl[\rho^{\otimes N}\hat E_N(\omega)\bigr]$ converge weakly to a point measure concentrated at $\sigma = \rho$. This shows that the estimation scheme given by the sequence of optimal estimators $\hat E_N$ is asymptotically exact (cf. the corresponding discussion in Section 4.2).
10.2 Estimating mixed states
If no a priori information about the state ρ of the input systems is available, we
can try to generalize the figure of merit ∆Ep by replacing the supremum over all
pure states with the supremum over all density matrices. In addition we have to
use a different distance measure which is more appropriate for mixed states. A
good choice is the trace-norm distance (for a discussion of fidelities of mixed states
consider the corresponding Section in [172]) and we get
∆E
m (E) = sup kρ − hEiρ k1 . (10.29)
ρ∈S
while $E$ can be arbitrary transversal to them. Since the set of all orbits of (10.30) coincides with the set
$$\Sigma = \Bigl\{x\in[0,1]^d \,\Big|\, x_1\ge x_2\ge\dots\ge x_d\ge 0,\ \sum_{j=1}^d x_j = 1\Bigr\} \tag{10.31}$$
of ordered spectra (cf. Equation (8.39)), this observation indicates that the hard part of the estimation problem is estimating the spectrum of a density matrix, while the rest can be covered by methods we already know from pure state estimation.
10.2.1 Estimating the spectrum
To follow this idea let us introduce a spectral estimator as an observable
on N quantum systems with values in the set of ordered spectra. If we denote the
natural projection from S to Σ by p : S → Σ (i.e. p(ρ) coincides with the ordered
spectrum of ρ) we can construct a spectral estimator from a full estimator E by
F (f ) = E(f ◦ p), where f ∈ C(Σ). If E is fully symmetric the corresponding F is
invariant under U(d) transformations and permutations, i.e. it satisfies
Following Definition 8.2.4 we will denote each spectral estimator with this invari-
ance property fully symmetric. Theorem 8.2.5 implies immediately the following
proposition.
Proposition 10.2.1 Consider a fully symmetric spectral estimator, i.e. an observable $F : C(\Sigma)\to\mathcal B(\mathcal H^{\otimes N})$ satisfying Equation (10.33). There is a sequence $\mu_m$, $m\in Y_d(N)$, of probability measures on $\Sigma$ such that
$$F(f) = \sum_{m\in Y_d(N)} P_m\int_\Sigma f(x)\,\mu_m(dx) \tag{10.34}$$
holds.
We see that the structure of spectral estimators becomes much easier if we restrict our attention to the projection valued case. To indicate that we can do this without losing estimation quality, let us have a short look at an optimization problem which is similar to the one in the last section. To this end consider the expectation value of a (general) spectral estimator $F$,
$$\langle F\rangle_\rho = \int_\Sigma x\operatorname{tr}\bigl[\rho^{\otimes N}F(dx)\bigr], \tag{10.37}$$
$$\Delta_s^E(F) = \sup_{\rho\in S}\,\|\langle F\rangle_\rho - p(\rho)\|^2, \tag{10.38}$$
where $p(\rho)\in\Sigma$ denotes again the ordered spectrum of $\rho$ and $\|\cdot\|$ is the usual norm¹ of $\mathbb R^d$. In contrast to Theorem 10.1.1 we are not able to state a minimizer of this quantity explicitly. However, we can show at least that it is sufficient to search among projection valued observables.
Proposition 10.2.3 The figure of merit $\Delta_s^E$ is minimized by a projection valued estimator.
Proof. Using similar reasoning as in the pure state case (cf. Lemma 8.2.3) it is easy to see that $\Delta_s^E$ is minimized by a fully symmetric estimator $\tilde F$, i.e. we have $\Delta_s^E(\tilde F) = \inf_F\Delta_s^E(F)$. Inserting Equation (10.34) into (10.37) we get
$$\langle\tilde F\rangle_\rho = \sum_{m\in Y_d(N)}\operatorname{tr}\bigl[P_m\rho^{\otimes N}P_m\bigr]\int_\Sigma x\,\mu_m(dx) \tag{10.39}$$
and therefore
$$\Delta_s^E(\tilde F) = \sup_{\substack{\rho\in S \\ \det\rho>0}}\Bigl\|\sum_{m\in Y_d(N)}\chi_m(\rho)\dim\mathcal K_m\,x(m) - p(\rho)\Bigr\|^2, \tag{10.42}$$
where $x(m)$ are the first moments of the probability measures $\mu_m$, i.e.
$$x(m) = \int_\Sigma x\,\mu_m(dx). \tag{10.43}$$
The map $m\mapsto x(m)$ from Equation (10.43) defines according to Corollary 10.2.2 a projection valued spectral estimator $F$ which satisfies $\Delta_s^E(F) = \Delta_s^E(\tilde F)$. □
¹ ...the most simple quantity, because the term under the sup becomes a polynomial (although its coefficients depend in a difficult way on $m$ and $N$).
It turns out, somewhat surprisingly, that these $\hat F_N$ form an asymptotically exact estimation scheme, i.e. the probability measures $\operatorname{tr}[\hat F_N(\omega)\rho^{\otimes N}]$ converge weakly to the point measure at the spectrum $p(\rho)$ of $\rho$. Explicitly, for each continuous function $f$ on $\Sigma$ we have
$$\lim_{N\to\infty}\int_\Sigma f(x)\operatorname{tr}\bigl[\hat F_N(dx)\rho^{\otimes N}\bigr] = \lim_{N\to\infty}\sum_{m\in Y_d(N)} f\Bigl(\frac mN\Bigr)\operatorname{tr}\bigl(\rho^{\otimes N}P_m\bigr) = f\bigl(p(\rho)\bigr). \tag{10.46}$$
We illustrate this in Figure 10.1, for $d = 3$ and $\rho$ a density operator with spectrum $r = (0.6, 0.3, 0.1)$. Then $\Sigma$ is a triangle with corners $A = (1,0,0)$, $B = (1/2,1/2,0)$, and $C = (1/3,1/3,1/3)$, and we plot the probabilities $\operatorname{tr}(\rho^{\otimes N}P_m)$ over $m/N\in\Sigma$. This behavior was already observed by Alicki et al. [5] in the framework of statistical mechanics. We will now prove the following stronger result.
Theorem 10.2.4 The sequence of probability measures
$$K_N(\omega) = \operatorname{tr}\bigl[\hat F_N(\omega)\rho^{\otimes N}\bigr] \tag{10.47}$$
satisfies the large deviation principle on $\Sigma$ with rate function
$$\Sigma\ni x\mapsto I(x) = \sum_j x_j\bigl(\ln x_j - \ln p_j(\rho)\bigr)\in[0,\infty]. \tag{10.48}$$
To simplify the calculations note first that it is, due to $U(d)$ invariance of $\hat F$, sufficient to consider diagonal density matrices, with eigenvalues given in decreasing order. A further simplification arises if we set $\rho = e^h$ where $h = \operatorname{diag}(h_1,\dots,h_d)$ with $h_1\ge h_2\ge\dots\ge h_d$. Note that we exclude by this choice singular density matrices, i.e. those with a zero eigenvalue. However, we can recover the latter as a limiting case if some of the $h_j$ go to $-\infty$. Hence, to restrict the analysis to $\rho = e^h$ is no loss of generality.
Now we can define
$$c_N(y,h) = \frac1N\ln\int_{\mathbb R^d} e^{N\langle x,y\rangle}\operatorname{tr}\bigl[\hat F_N(dx)\,(e^h)^{\otimes N}\bigr] \tag{10.50}$$
$$= \frac1N\ln\sum_{m\in Y_d(N)} e^{\langle m,y\rangle}\chi_m(e^h)\dim\mathcal K_m \tag{10.51}$$
Figure 10.1: Probability distribution $\operatorname{tr}(\rho^{\otimes N}P_m)$ for $d = 3$, $N = 20, 100, 500$ and $r = (0.6, 0.3, 0.1)$. The set $\Sigma$ is the triangle with corners $A = (1,0,0)$, $B = (1/2,1/2,0)$, $C = (1/3,1/3,1/3)$.
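The concentration shown in Figure 10.1 can be reproduced with elementary means for $d = 2$, where the characters $\chi_m$ are Schur polynomials and the multiplicities $\dim\mathcal K_m$ have a simple binomial form (standard Schur–Weyl duality facts, not displayed in the text); a sketch in our own notation, for a qubit with spectrum $(0.8, 0.2)$:

```python
from math import comb

def schur2(m1, m2, p, q):
    # character chi_m(rho) for d = 2 and diagonal rho = diag(p, q):
    # Schur polynomial s_(m1,m2)(p,q) = (pq)^m2 * sum_i p^i q^(m1-m2-i)
    return (p * q) ** m2 * sum(p ** i * q ** (m1 - m2 - i)
                               for i in range(m1 - m2 + 1))

def mult2(N, m2):
    # dim K_m: multiplicity of the irrep with highest weight (N - m2, m2)
    return comb(N, m2) - (comb(N, m2 - 1) if m2 else 0)

N, p = 100, 0.8
probs = {(N - m2, m2): schur2(N - m2, m2, p, 1 - p) * mult2(N, m2)
         for m2 in range(N // 2 + 1)}
total = sum(probs.values())        # Schur-Weyl decomposition: equals (p+q)^N = 1
peak = max(probs, key=probs.get)   # the peak sits near m/N = (0.8, 0.2)
```

The distribution sums to one and peaks at $m/N$ close to the true spectrum, exactly the behavior plotted in Figure 10.1 for $d = 3$.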
where $\chi_m$ denotes again the character of $\pi_m$. It is easy to see that $c_N(y,h)$ exists for each $y$ and $h$. Hence it remains to show that the limit $c(y,h) = \lim_{N\to\infty}c_N(y,h)$ exists and that the function $y\mapsto c(y,h)$ is differentiable.
To get a more explicit expression for $\chi_m(\rho)$, note that $h$ is an element of the Cartan subalgebra $\mathfrak t_{\mathbb C}$ of $\mathrm{gl}(d,\mathbb C)$. Hence we can calculate $\lambda\cdot h$ for each weight $\lambda\in\mathfrak t_{\mathbb C}^*$ of $\pi_m$ (cf. Subsection 8.3.2). If we denote in addition the multiplicity of $\lambda$ (i.e. the dimension of the weight subspace of $\lambda$) by $\operatorname{mult}(\lambda)$ we get
$$\chi_m(\rho) = \sum_\lambda\operatorname{mult}(\lambda)\,e^{\lambda\cdot h} \tag{10.52}$$
where the sum is taken over the set of all weights $\lambda$ of $\pi_m$. If the matrix elements $h_k$ of $h$ are given (as assumed) in decreasing order, $\exp(m\cdot h)$ is the biggest exponential (this is equivalent to the statement that $m$ is the highest weight) and we have
$$e^{m\cdot h}\le\chi_m(\rho)\le\dim(\mathcal H_m)\,e^{m\cdot h}. \tag{10.53}$$
Using the Weyl dimension formula it can be shown [76] that dim(H_m) is bounded
from above by a polynomial in N, i.e.

    dim(H_m) ≤ (a_1 + a_2 N)^{a_3}   (10.54)

with positive constants a_1, a_2, a_3. Inserting this in Equation (10.51) we get

    c̃_N(y + h) ≤ c_N(y, h) ≤ c̃_N(y + h) + a_3 ln(a_1 + a_2 N)/N   (10.55)

where

    c̃_N(y) = (1/N) ln Σ_{m∈Y_d(N)} e^{m·y} dim K_m,   (10.56)

and we have identified here the diagonal matrix h = diag(h_1, ..., h_d) with the d-
tuple h = (h_1, ..., h_d) ∈ R^d. Equation (10.55) implies that

    c(y, h) = lim_{N→∞} c_N(y, h) = lim_{N→∞} c̃_N(y + h).   (10.57)
2 If y_1 ≥ ··· ≥ y_d holds, there is a direct way to calculate c̃(y), because we can invoke Equation
(10.57) to show that c̃(y) = c(0, y). Using the definition of c_N(y, h) in Equation (10.50) we easily
get c̃(y) = ln Σ_{j=1}^d exp(y_j); cf. [137]. For a general y ∈ R^d this argument does not work, however.
(10.70)
where the factors dim(H_m) are needed for normalization (this is straightforward to
check). As for the spectral estimator F̂, and in contrast to the pure state case, it is not
clear whether Ê is optimal for finite N, i.e. whether it minimizes an appropriately
chosen figure of merit. Nevertheless, we can extend the large deviation result from
Theorem 10.2.4 to get the following:
Theorem 10.2.6 Consider the estimator Ê_N from Equation (10.70) and a density
matrix ρ. The sequence

    K_N(ω) = tr[Ê_N(ω) ρ^⊗N]   (10.71)

satisfies the large deviation principle with a rate function I : S → [0, ∞] which is
given by

    I(U ρ_x U*) = Σ_{j=1}^d x_j ( ln(x_j) − ln[ pm_j(U*ρU) / pm_{j−1}(U*ρU) ] )   (10.72)

where x ∈ Σ, ρ_x is the density matrix from Equation (10.67), U ∈ U(d), pm_j(σ)
denotes the j-th leading principal minor of the matrix σ for j = 1, ..., d, and pm_0(σ) = 1.
Proof. We will show that the measures K_N satisfy the Laplace principle (Definition
10.3.4), which implies according to Theorem 10.3.5 the large deviation principle.
Hence we have to consider

    ∫_S e^{−N f(ρ)} K_N(dρ) = Σ_{m∈Y_d(N)} dim H_m dim K_m ∫_{U(d)} e^{−N f(U ρ_{m/N} U*)} ⟨φ_m, π_m(U*ρU) φ_m⟩ dU,   (10.73)

where we have set m_{d+1} = 0. Note that the right hand side of this equation makes
sense even if the exponents are not integer valued. We can therefore rewrite Equation
(10.73) with the probability measure

    ∫_Σ h(x) L_N(dx) = (1/d^N) Σ_{m∈Y_d(N)} h(m/N) dim(H_m) dim(K_m)   (10.75)
to get

    ∫_S e^{−N f(ρ)} K_N(dρ) =   (10.76)

    = d^N ∫_Σ ∫_{U(d)} e^{−N f(U ρ_x U*)} Π_{k=1}^d pm_k(U*ρU)^{N(x_k − x_{k+1})} dU L_N(dx)   (10.77)

    = ∫_Σ ∫_{U(d)} exp( −N [ f(U ρ_x U*) − ln(d) − I_1(U, x) ] ) dU L_N(dx)   (10.78)

with

    I_1(U, x) = Σ_{k=1}^d (x_k − x_{k+1}) ln[ pm_k(U*ρU) ]   (10.79)
where we have set x_{d+1} = 0. Now we can apply Lemma 10.2.5 and Equation (10.54)
to see that the L_N satisfy the large deviation principle on Σ with rate function³

    I_0(x) = ln(d) + Σ_{j=1}^d x_j ln(x_j).   (10.80)
The product measures dU L_N(dx) therefore satisfy the large deviation principle as
well, with the same rate function, but on U(d) × Σ. Varadhan's Theorem 10.3.2
therefore implies

    lim_{N→∞} (1/N) ln ∫_S e^{−N f(σ)} K_N(dσ) = − inf_{x,U} ( f(U ρ_x U*) − ln(d) − I_1(U, x) + I_0(x) )   (10.81)

    = − inf_{x,U} ( f(U ρ_x U*) + Σ_{j=1}^d x_j ln(x_j) − I_1(U, x) ).   (10.82)

Hence the K_N satisfy the Laplace principle, provided there is a well defined function
I : S → [0, ∞] with I(U ρ_x U*) = I_0(x) − ln(d) − I_1(U, x).
Lemma 10.2.7 There is a (unique) continuous function I on S such that

    I(U ρ_x U*) = Σ_{j=1}^d x_j ln(x_j) − I_1(U, x)   (10.83)

³ The measures K̃_N from Lemma 10.2.5 are slightly different. We therefore have to use Theorem
10.3.3 and the same reasoning as in the last paragraph of the proof of Theorem 10.2.4.
with j_0 = 0 and pm_0(σ) = 1. On the other hand, [U, ρ_x] = 0 implies that U is block
diagonal, U = diag(U_1, ..., U_k), with U_α ∈ U(d_α), d_α = j_{α+1} − j_α. Hence we have
pm_{j_α}(U*ρU) = pm_{j_α}(ρ) for all such U and all α. Together with Equation (10.84)
this shows that I is well defined.
To prove positivity of I consider a fixed x ∈ Σ. To get a lower bound on

    − Σ_{j=1}^d x_j ln[ pm_j(U*ρU) / pm_{j−1}(U*ρU) ]   (10.85)
we have to choose U such that the −ln terms are given in increasing order, i.e. in the
reverse ordering of the x_j. This implies in particular that −ln[pm_1(U*ρU)] should
be as small as possible, in other words pm_1(U*ρU) should be as big as possible. This
is achieved if pm_1(U*ρU) coincides with the biggest eigenvalue λ_1 of ρ. In this case
the basis vector e_1 has to be the eigenvector of U*ρU which corresponds to λ_1. This
shows that the biggest possible value of pm_2(U*ρU) is λ_1 λ_2, where λ_2 is the second
biggest eigenvalue of ρ. Again, this implies that e_2 is the corresponding eigenvector
of U*ρU. Proceeding in this way we see that the quantity in Equation (10.85)
is minimized if pm_j(U*ρU) = λ_1 λ_2 ··· λ_j, where λ_j, j = 1, ..., d are the eigenvalues
of ρ in decreasing order. Hence we get

    I(x, U) ≥ Σ_{j=1}^d x_j ( ln(x_j) − ln(λ_j) ).   (10.86)
Now we can invoke Theorem 10.3.5, which together with Equation (10.82)
and the preceding lemma implies the Theorem. □
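For a diagonal ρ with decreasing eigenvalues and U = 1I, the leading principal minors are just the partial products λ_1···λ_j, so the rate function of Theorem 10.2.6 reduces to the relative entropy of Theorem 10.2.4. A short numpy sketch (function names ours) makes this reduction explicit:

```python
import numpy as np

def leading_minors(sigma):
    """pm_j(sigma): determinants of the upper-left j x j blocks, with pm_0 = 1."""
    return [1.0] + [np.linalg.det(sigma[:j, :j]) for j in range(1, sigma.shape[0] + 1)]

def rate_full(x, sigma):
    """Rate function of Theorem 10.2.6 at U rho_x U*, evaluated with U = identity."""
    pm = leading_minors(sigma)
    return sum(xj * (np.log(xj) - np.log(pm[j] / pm[j - 1]))
               for j, xj in enumerate(x, start=1) if xj > 0)

lam = np.array([0.6, 0.3, 0.1])
rho = np.diag(lam)
x = np.array([0.5, 0.3, 0.2])
# the minors telescope, so the rate equals sum_j x_j (ln x_j - ln lambda_j)
print(abs(rate_full(x, rho) - np.sum(x * (np.log(x) - np.log(lam)))))  # ~0
print(rate_full(lam, rho))  # ~0: the estimator concentrates at the true state
```

The minimization over U in the proof above shows that this diagonal case is in fact the minimizing one.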
Throughout this work we are using two different methods to prove that a given
sequence K_N, N ∈ N satisfies the large deviation principle. One possibility, which
leads to the Gärtner-Ellis Theorem [85, Thm. II.6.1], is to look at the corresponding
sequence of Laplace transforms.

Theorem 10.3.3 Consider a finite dimensional vector space E with dual E* and
a sequence K_N, N ∈ N of probability measures on the Borel subsets of E. Define
c_N : E* → R by

    c_N(y) = (1/N) ln ∫_E e^{N y(x)} K_N(dx),   (10.90)
and assume that
    R*σ = λσ + (1 − λ) 1I/d.   (11.1)
Now we can follow the general structure given in Equation (8.4): We are searching
for channels which act on N systems in the state ρ = R*(σ), where σ ∈ S(H) is
pure. The target functional we want to approximate is (R*)^{−1}(ρ) = σ. If we follow
the two general options already encountered in Chapter 9 (all-particle error and
one-particle error) we get

    Δ_1^R(T) = sup_{σ pure, j} ( 1 − tr[ T(σ^(j)) (R*σ)^⊗N ] ) = 1 − F_1^R(T)   (11.2)

where the supremum is taken over all pure states σ and j = 1, ..., N, and F_1^R
denotes the "one-particle fidelity"

    F_1^R(T) = inf_{σ pure, j} tr[ T(σ^(j)) (R*σ)^⊗N ].   (11.3)

Here σ^(j) = 1I ⊗ ··· ⊗ σ ⊗ ··· ⊗ 1I denotes the tensor product with (M − 1) factors "1I"
and one factor σ at the j-th position (cf. Section 9.1). Δ_1^R measures the worst one-
particle error of the output state T*([R*σ]^⊗N). If we are interested in correlations
too, we have to choose

    Δ_all^R(T) = sup_{σ pure} ( 1 − tr[ T(σ^⊗M) (R*σ)^⊗N ] ) = 1 − F_all^R(T)   (11.4)
The second special feature of the qubit case concerns the rather simple structure
of the π_s. For each s the Hilbert space H_s is naturally isomorphic to the symmetric
tensor product H_+^⊗2s and π_s is unitarily equivalent to π_+^2s (the restriction of
U ↦ U^⊗2s to H_+^⊗2s). The decomposition of a fully symmetric T from 8.2.8 therefore
becomes T(A) = ⊕_s T_s(A) ⊗ 1I, with fully symmetric channels T_s : B(H^⊗M) →
B(H_+^⊗2s). Hence the T_s are exactly of the special form we have studied already in
Chapter 9 within optimal cloning. Hence let us define

    Q̂ : B(H^⊗M) → B(H^⊗N),   Q̂(A) = ⊕_{s∈I[N]} T̂_{2s→M}(A) ⊗ 1I   (11.8)

with

    T̂*_{2s→M}(θ) = ((2s+1)/(M+1)) S_M (θ ⊗ 1I^⊗(M−2s)) S_M   for 2s < M   (11.9)

and

    T̂*_{2s→M}(θ) = tr_{2s−M} θ   for 2s ≥ M.   (11.10)
The action of Q̂ on a system in the state ρ^⊗N can be interpreted as follows: First
apply an instrument to the system which produces with probability

    w_N(s) = tr[ P_s ρ^⊗N P_s ]   (11.11)
    λ = tanh(β).   (11.14)

    ρ(β)^⊗N = exp(2βL_3) / (2 cosh(β))^N   (11.15)

where

    B(H^⊗N) ∋ L_3 = (1/2) ( σ_3 ⊗ 1I^⊗(N−1) + ··· + 1I^⊗(N−1) ⊗ σ_3 )   (11.16)

denotes the 3-component of angular momentum in the representation U ↦ U^⊗N.
Similarly we get

    ρ_s(β) = π_s(ρ(β)) / χ_s(ρ(β)) = [ sinh(β) / sinh((2s+1)β) ] exp(2βL_3^(s))   (11.17)

where L_3^(s) denotes again the 3-component of angular momentum but now in the
representation π_s. For w_N(s) introduced in (11.11) we can write

    w_N(s) = tr( P_s ρ(β)^⊗N P_s ) = [ sinh((2s+1)β) / ( sinh(β)(2 cosh(β))^N ) ] dim K_{N,s}.   (11.18)
The quantities w_N(s) are closely related to the spectral estimator F̂ introduced
in Equation (10.45). We only have to identify the set Σ with the interval [0, 1/2]
according to the map [0, 1/2] ∋ λ ↦ (1/2 + λ, 1/2 − λ) ∈ Σ. Then we get

    tr[ F̂(f) ρ(β)^⊗N ] = Σ_{s∈I[N]} f(s/N) w_N(s).   (11.20)
This observation will be very useful in Section 11.4. For now, note that the w_N(s)
define a probability measure. Hence Σ_s w_N(s) = 1 and 0 ≤ w_N(s) ≤ 1. Together
with the fact that the multiplicities dim K_{N,s} are independent of β we can extract
from Equation (11.18) a generating functional for dim K_{N,s}:

    2 sinh(β)(2 cosh(β))^N = Σ_{s∈I[N]} 2 sinh((2s+1)β) dim K_{N,s}   (11.21)

    (e^β − e^{−β})(e^β + e^{−β})^N = Σ_{s∈I[N]} ( e^{(2s+1)β} − e^{−(2s+1)β} ) dim K_{N,s},   (11.22)

obtaining

    dim K_{N,s} = [ (2s+1)/(N/2 + s + 1) ] C(N, N/2 − s),   (11.23)

where C(n, k) denotes a binomial coefficient,
provided N/2 − s is an integer, and zero otherwise. The same result can be derived using
representation theory of the symmetric group; see [195], where the more general case
dim H = d ∈ N is studied.
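Both the multiplicity formula (11.23) and the probabilistic normalization of the w_N(s) can be checked in a few lines of Python (function names ours; s2 stands for 2s throughout):

```python
from math import comb, cosh, sinh

def dim_K(N, s2):
    """Multiplicity dim K_{N,s} from Equation (11.23), with s2 = 2s.
    Nonzero only if N/2 - s = (N - s2)/2 is a nonnegative integer."""
    if s2 < 0 or s2 > N or (N - s2) % 2:
        return 0
    return (s2 + 1) * comb(N, (N - s2) // 2) // ((N + s2) // 2 + 1)

def w(N, s2, beta):
    """Weight w_N(s) from Equation (11.18)."""
    return sinh((s2 + 1) * beta) / (sinh(beta) * (2 * cosh(beta)) ** N) * dim_K(N, s2)

N = 20
# Schur-Weyl completeness: sum_s dim(H_s) dim(K_{N,s}) = 2^N, with dim H_s = 2s + 1
print(sum((s2 + 1) * dim_K(N, s2) for s2 in range(0, N + 1, 2)) == 2 ** N)  # True
# the w_N(s) form a probability distribution, cf. Equation (11.21)
print(sum(w(N, s2, 0.7) for s2 in range(0, N + 1, 2)))  # 1.0 up to rounding
```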
11.2.2 The one qubit fidelity

Our next task is to calculate F_1^R(Q̂). To this end note that due to covariance of the
depolarizing channel R the expression under the infima defining F_1^R(T) in Equation
(11.3) depends, for any fully symmetric purifier T, neither on σ nor on j. I.e. we get
with R*σ = ρ(β):

    F_1^R(T) = tr[ σ^(1) T*( ρ(β)^⊗N ) ]   (11.24)

with σ = |ψ⟩⟨ψ|. Further simplification arises if we introduce the parameter γ(θ),
which is defined for each density matrix θ on H^⊗M by

    γ(θ) = (1/M) tr(2L_3 θ).   (11.25)
To derive the relation of γ to F_1^R note that full symmetry of T implies, equivalently
to (11.24),

    F_1^R(T) = (1/M) Σ_{j=1}^M tr[ σ^(j) T*( ρ(β)^⊗N ) ].   (11.26)

Since σ = (1I + σ_3)/2 holds with the Pauli matrix σ_3, we get together with the
definition of L_3 in Equation (11.16)

    F_1^R(T) = (1/2) ( 1 + γ[ T*(ρ(β)^⊗N) ] ).   (11.27)

In other words it is sufficient to calculate γ[ T*(ρ(β)^⊗N) ] (which is simpler because
SU(2) representation theory is more directly applicable) instead of F_1^R(T).
Another advantage of γ is its close relation to the parameter λ = tanh(β) defining
the operation R* in Equation (11.1). In fact we have

    γ( ρ(β)^⊗N ) = (1/N) tr( 2L_3 ρ(β)^⊗N ) = (1/N) N tr( σ_3 ρ(β) ) = tanh(β) = λ.   (11.28)
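Equation (11.28), and the definition (11.25) it rests on, can be verified directly with numpy (a sketch; the variable names are ours):

```python
import numpy as np

beta = 0.35
sigma3 = np.diag([1.0, -1.0])
# single-qubit state rho(beta) = exp(beta * sigma3) / (2 cosh beta)
rho = np.diag(np.exp([beta, -beta])) / (2 * np.cosh(beta))
print(np.trace(sigma3 @ rho), np.tanh(beta))     # both equal lambda = tanh(beta)

# gamma for the two-fold tensor power, using the definition (11.25) with M = 2
L3 = 0.5 * (np.kron(sigma3, np.eye(2)) + np.kron(np.eye(2), sigma3))
gamma = np.trace(2 * L3 @ np.kron(rho, rho)) / 2
print(gamma)                                     # again tanh(beta), as in (11.28)
```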
¡ ¢
In other words the one particle restrictions of the output state T ρ(β)⊗N are given
by
£ ¤ £ ¤ 1I
γ T (ρ(β)⊗N ) σ + 1 − γ[T (ρ(β)⊗N )] . (11.29)
2
£ ¤
This implies that γ T (ρ(β)⊗N ) > λ should hold if T is really a purifier. Now we
can prove the following proposition:
Proposition 11.2.1 The one-qubit fidelity F_1^R(Q̂) of the optimal purifier is given
by

    F_1^R(Q̂) = Σ_{s∈I[N]} w_N(s) f_1(M, β, s)   (11.30)

with

    2 f_1(M, β, s) − 1 =
        (2s+1)/(2s) coth((2s+1)β) − (1/(2s)) coth(β)                   for 2s > M,
        [(M+2)/(2s+2)] (1/M) [ (2s+1) coth((2s+1)β) − coth(β) ]        for 2s ≤ M.   (11.31)
Proof. According to Equations (11.8) and (11.27) we have

    F_1^R(Q̂) = (1/2) ( 1 + Σ_{s∈I[N]} w_N(s) γ[ T̂*_{2s→M}(ρ_s(β)) ] )   (11.32)

             =: Σ_{s∈I[N]} w_N(s) f_1(M, β, s),   (11.33)

where the supremum is taken over all fully symmetric channels T : B(H^⊗M) →
B(H_+^⊗2s).
Proof. Validity of T(L_3) = ω(T) L_3^(s) follows from Lemma 9.4.1. If 2s < M, Equation
(11.35) is a consequence of Theorem 9.2.3. For 2s ≥ M note first that the one-qubit
error of T̂_{2s→M} vanishes, i.e. Δ_1^C(T̂_{2s→M}) = 0; cf. Equation (9.4). On the other
hand we know from Proposition 9.4.2 that Δ_1^C(T̂_{2s→M}) is related to ω(T̂_{2s→M}) by

    Δ_1^C(T̂_{2s→M}) = (1/2) ( 1 − (2s/M) ω(T̂_{2s→M}) ).   (11.36)
Now we have

    2 f_1(M, β, s) − 1 = γ[ T̂*_{2s→M}(ρ_s(β)) ] = (1/M) tr[ 2 T̂_{2s→M}(L_3) ρ_s(β) ]   (11.37)

    = (ω(T̂_{2s→M})/M) tr[ 2 L_3^(s) ρ_s(β) ] = ( ω(T̂_{2s→M}) 2s / M ) γ[ ρ_s(β) ]   (11.38)

and

    γ( ρ_s(β) ) = (1/(2s)) tr( 2 L_3^(s) ρ_s(β) ) = (1/(2s)) tr( 2L_3^(s) exp(2βL_3^(s)) ) / tr( exp(2βL_3^(s)) )   (11.39)

    = (1/(2s)) d/dβ ln tr( exp(2βL_3^(s)) )   (11.40)

    = (1/(2s)) d/dβ [ ln sinh((2s+1)β) − ln sinh(β) ]   (11.41)

    = (2s+1)/(2s) coth((2s+1)β) − (1/(2s)) coth(β).   (11.42)

Inserting the values of ω(T̂_{2s→M}) from Equation (11.35) and γ[ρ_s(β)] from Equation
(11.42) into Equation (11.38) we get Equation (11.31). □
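The two branches of Equation (11.31) fit together continuously at the branch point 2s = M, as they must for the fidelity of Q̂ to be well defined; a quick numerical check (helper names ours, with s2 = 2s):

```python
from math import cosh, sinh

def coth(x):
    return cosh(x) / sinh(x)

def f1(M, beta, s2):
    """f_1(M, beta, s) from Equation (11.31), with s2 = 2s."""
    if s2 > M:
        g = (s2 + 1) / s2 * coth((s2 + 1) * beta) - coth(beta) / s2
    else:
        g = (M + 2) / (s2 + 2) * ((s2 + 1) * coth((s2 + 1) * beta) - coth(beta)) / M
    return (1 + g) / 2

# evaluate the 2s <= M branch at the branch point and compare with the 2s > M formula
M, beta = 6, 0.8
left = f1(M, beta, M)
right = (1 + ((M + 1) / M * coth((M + 1) * beta) - coth(beta) / M)) / 2
print(abs(left - right))   # ~0: the two branches agree at 2s = M
```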
with σ = |ψ⟩⟨ψ|. Using this relation we can prove the following proposition:

Proposition 11.2.3 The all-qubit fidelity F_all^R(Q̂) of the optimal purifier is given
by

    F_all^R(Q̂) = Σ_{s∈I[N]} w_N(s) f_all(M, β, s)   (11.44)

Proof. Using the decomposition of Q̂ given in Equation (11.8) we get for the optimal
purifier, similarly to the last subsection,

    F_all^R(Q̂) = Σ_{s∈I[N]} w_N(s) tr[ σ^⊗M T̂*_{2s→M}( ρ_s(β) ) ].   (11.46)
Calculating the quantities f_all(M, β, s) is now more difficult, since knowledge of
T̂_{2s→M}(L_3) = ω(T̂_{2s→M}) L_3^(s) is not sufficient in this case. Hence we have to use
the explicit form of T̂_{2s→M} in Equation (11.9).
For 2s < M this leads to

    f_all(M, β, s) = ((2s+1)/(M+1)) ⟨ψ^⊗M, S_M ( ρ_s(β) ⊗ 1I^⊗(M−2s) ) S_M ψ^⊗M⟩   (11.48)

    = ((2s+1)/(M+1)) ⟨ψ^⊗M, ( ρ_s(β) ⊗ 1I^⊗(M−2s) ) ψ^⊗M⟩   (11.49)

    = ((2s+1)/(M+1)) ⟨ψ^⊗2s, ρ_s(β) ψ^⊗2s⟩   (11.50)

    = ((2s+1)/(M+1)) (1 − e^{−2β}) / (1 − e^{−(4s+2)β}).   (11.51)
We can now expand the "1I" in Equation (11.53) in the product basis, and apply (11.54),
to find

    S_{2s} [ (|ψ^⊗M⟩⟨ψ^⊗M|) ⊗ 1I^⊗(2s−M) ] S_{2s} = Σ_K [ C(2s−M, K−M) / C(2s, K) ] |K⟩⟨K|.   (11.55)
we get

    f_all(M, β, s) = [ (1 − e^{−2β}) / (1 − e^{−(4s+2)β}) ] C(2s, M)^{−1} Σ_K C(K, M) e^{2β(K−2s)}.   (11.58)

Now the statement follows from Equations (11.46), (11.51) and (11.58). □
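To guard against slips in the combinatorics, the closed form (11.58) can be compared, for 2s ≥ M, with a direct evaluation using the weights from Equation (11.55) and the eigenvalues of ρ_s(β) in the occupation number basis (a sketch; function names ours, s2 = 2s):

```python
from math import comb, exp, sinh

def f_all_direct(M, beta, s2):
    """f_all via the weights <K| S[(sigma^{oM}) o 1] S |K> of Equation (11.55)
    and the eigenvalues e^{2 beta (K - s)} sinh(beta)/sinh((2s+1) beta) of rho_s."""
    norm = sinh(beta) / sinh((s2 + 1) * beta)
    return sum(comb(s2 - M, K - M) / comb(s2, K)
               * exp(beta * (2 * K - s2)) * norm for K in range(M, s2 + 1))

def f_all_closed(M, beta, s2):
    """Closed form of Equation (11.58)."""
    pref = (1 - exp(-2 * beta)) / (1 - exp(-(2 * s2 + 2) * beta))
    return pref / comb(s2, M) * sum(comb(K, M) * exp(2 * beta * (K - s2))
                                    for K in range(M, s2 + 1))

print(abs(f_all_direct(3, 0.6, 10) - f_all_closed(3, 0.6, 10)))   # ~0
```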
and

    F_all^R(T) = Σ_{s∈I[N]} w_N(s) tr[ σ^⊗M T_s*( ρ_s(β) ) ].   (11.61)
The last two equations show that we have to optimize each component T_s of the
purifier T independently. In the one-qubit case this is very easy, because we can use
Lemma 11.2.2 to get T_s(L_3) = ω(T_s) L_3^(s) and γ[ T_s*(ρ_s(β)) ] = (ω(T_s)/M) tr( 2 L_3^(s) ρ_s(β) ).
Hence maximizing γ[ T_s*(ρ_s(β)) ] is equivalent to maximizing ω(T_s). But we have
according to Lemma 11.2.2

    max_{T_s} ω(T_s) = ω(T̂_{2s→M}) =
        M/(2s)            for 2s ≥ M,
        (M+2)/(2(s+1))    for 2s < M,   (11.62)
which we have to maximize, depends only on this part of the operation. Full sym-
metry implies in addition that T_s*(ρ_s(β)) is diagonal in the occupation number basis
(see Equation (11.54)), because T_s*(ρ_s(β)) commutes with each π_{s'}(U) (s' = M/2,
U ∈ U(2)) if π_s(U) commutes with ρ_s(β).
If M > 2s this means we have T_s*(ρ_s(β)) = κ_* σ^⊗M + r_*, where r_* is a positive
operator with σ^⊗M r_* = r_* σ^⊗M = 0. Inserting this into (11.63) we see that f_s(T_s) =
κ_*. Hence we have to maximize κ_*. The first step is an upper bound, which we get
from the fact that tr( σ^⊗2s ρ_s(β) ) 1I − ρ_s(β) is a positive operator. Since T_s*(1I) =
((2s+1)/(M+1)) 1I (another consequence of full symmetry) we have

    0 ≤ T_s*( tr( σ^⊗2s ρ_s(β) ) 1I − ρ_s(β) ) = ((2s+1)/(M+1)) tr( σ^⊗2s ρ_s(β) ) 1I − κ_* σ^⊗M − r_*.   (11.64)

Multiplying this equation with σ^⊗M and taking the trace we get

    κ_* ≤ ((2s+1)/(M+1)) tr( σ^⊗2s ρ_s(β) ).   (11.65)
However, calculating f_s(T̂_{2s→M}) we see that this upper bound is achieved; in other
words T̂_{2s→M} maximizes f_s.
If M ≤ 2s holds we have to use slightly different arguments, because the estimate
(11.65) is too weak in this case. However, we can consider in Equation (11.63) the
dual T_s instead of T_s* and then use similar arguments. In fact, for each covariant
T_s the quantity T_s(σ^⊗M) is, for the same reasons as T_s*(ρ_s(β)), diagonal in the
occupation number basis, and we get T_s(σ^⊗M) = κ σ^⊗2s + r, where r is again a positive
operator with r = Σ_{n=0}^{2s−1} r_n |n⟩⟨n| (|n⟩ denotes again the occupation number basis)
and κ is a positive constant. Since T_s is unital we get from 1I − σ^⊗M ≥ 0 the estimate
0 ≤ κ ≤ 1 in the same way as Equation (11.65). Calculating T̂_{2s→M}(σ^⊗M) shows
again that the upper bound κ = 1 is indeed achieved; however, it is now not clear
whether maximizing κ is equivalent to maximizing f_s(T_s).
Hence let us show first that κ = 1 is necessary for f_s(T_s) to be maximal. This
follows basically from the fact that T_s is, up to a multiplicative constant, trace
preserving. In fact we have

    tr( T_s(σ^⊗M) ) = tr( T_s(σ^⊗M) 1I ) = tr( σ^⊗M T_s*(1I) ) = (2s+1)/(M+1).   (11.66)

This means especially that κ + tr(r) = (2s+1)/(M+1) holds, i.e. decreasing κ by
0 < ε < 1 is equivalent to increasing tr(r) by the same ε. Taking into account that
ρ_s(β) = Σ_{n=0}^{2s} h_n |n⟩⟨n| holds with h_n = exp(2β(n−s)), we see that reducing κ by
ε reduces f_s(T_s) at least by

    ε ( tr( σ^⊗2s ρ_s(β) ) − tr( |2s−1⟩⟨2s−1| ρ_s(β) ) ) = ε ( e^{2βs} − e^{2β(s−1)} ) > 0.   (11.67)

Therefore κ = 1 is necessary.
The last question we have to answer is how the rest term r has to be chosen
for f_s(T_s) to be maximal. To this end let us consider the cloning fidelity of T_s,
i.e. F_all^C(T_s). It is, in contrast to f_s(T_s), maximized iff κ = 1. However, the operation
which maximizes F_all^C(T_s) is according to Proposition 9.3.4 unique. This implies that
κ = 1 fixes T_s completely. Together with the facts that κ = 1 is necessary for f_s(T_s)
to be maximal and κ = 1 is realized for T̂_{2s→M}, we conclude that max f_s(T_s) =
f_s(T̂_{2s→M}) holds, which proves the assertion. □
f_1(M, β, s) is relevant in this situation and we get, together with Equation (11.30),
the expression

    F_1^max(N, ∞) = Σ_{s∈I[N]} w_N(s) (1/2) [ 1 + (1/(2s+2)) ( (2s+1) coth((2s+1)β) − coth(β) ) ],   (11.69)

which obviously takes its values between 0 and 1. To take the limit N → ∞ we can
write

    lim_{N→∞} F_1^max(N, ∞) = lim_{N→∞} Σ_{s∈I[N]} w_N(s) f_{N,∞}(2s/N)   (11.70)

with

    f_{N,∞}(x) = (1/2) [ 1 + (1/(Nx+2)) ( (Nx+1) coth((Nx+1)β) − coth(β) ) ].   (11.71)

The functions f_{N,∞} are continuous, bounded and converge on each interval (ε, 1)
with 0 < ε < 1 uniformly to f_{∞,∞} ≡ 1. Hence the assumptions of Lemma 11.4.1 are
fulfilled and we get

    lim_{N→∞} F_1^max(N, ∞) = f_{∞,∞}(λ) = 1.   (11.72)
Hence we can produce arbitrarily good purified qubits at infinite rate if we have
enough input systems. In other words we have proved the following proposition:
Proposition 11.4.2 For each asymptotic rate r > 0 the optimal one-qubit fidelity
from Equation (11.59) satisfies
Let us now consider F_1^max(N, M) for M < ∞. Since F_1^max(N, M) > F_1^max(N, ∞)
we obviously have lim_{N→∞} F_1^max(N, M) = 1 for all M. Hence there is no difference
between finitely and infinitely many output systems, as long as we are looking only at the
limit lim_{N→∞} F_1^max(N, M). Our next task is therefore to analyze how fast the quantities
F_1^max(N, M) approach 1 as N → ∞. To this end we compare three different quan-
tities: F_1^max(N, ∞), F_1^max(N, 1) and f_1(1, β, N/2). The latter is the maximal fidelity
we can expect for N input systems. It corresponds to a device which produces an
output only with probability w_N(N/2) and declares failure otherwise (from Lemma
11.4.1 we see that this probability goes to 0 as N → ∞). In slight abuse of notation
we write F_1^max(N, 0) = f_1(1, β, N/2), expressing that this is the case with no de-
mands on output numbers at all. The results are given in the following proposition
and plotted in Figure 11.1.
Proposition 11.4.3 The leading asymptotic behavior (as N → ∞) of F_1^max(N, M)
for the cases M = 0, 1, ∞ is of the form

    F_1^max(N, M) = 1 − c_M/(2N) + o(1/N)   (11.74)

where, as usual, o(1/N) stands for terms going to zero faster than 1/N, and with

    c_0 = (1 − λ)/λ   (11.75)
    c_1 = (1 − λ)/λ²   (11.76)
    c_∞ = (λ + 1)/λ²   (11.77)
Figure 11.1: The one-qubit fidelity F_1(M, N) for M = 0, M = 1 and M = ∞, plotted as a function of N.
with f̃_{N,∞} = N(1 − f_{N,∞}). The existence of this limit is equivalent to the asymp-
totic formula (11.74). Lemma 11.4.1 leads to c_∞/2 = f̃_{∞,∞}(λ) with f̃_{∞,∞} =
lim_{N→∞} f̃_{N,∞} uniformly on (ε, 1). To calculate f̃_{∞,∞} note that

    f̃_{N,∞}(x) = N/(2(Nx+2)) + N coth(β)/(2(Nx+2)) + Rest   (11.79)

holds, where "Rest" is a term which vanishes exponentially fast as N → ∞. Hence
with coth(β) = 1/λ we get

    c_∞ = 2 f̃_{∞,∞}(λ) = (1 + λ)/λ².   (11.80)
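The value of c_∞ can be confirmed numerically from the definition f̃_{N,∞} = N(1 − f_{N,∞}) alone (a sketch; the overflow guard in `coth` is ours, exploiting that coth(x) → 1 extremely fast):

```python
from math import cosh, sinh, tanh

def coth(x):
    # coth(x) is 1 to machine precision for large x; avoids overflow of cosh
    return 1.0 if x > 30 else cosh(x) / sinh(x)

def f_tilde(N, x, beta):
    """N (1 - f_{N,infty}(x)) with f_{N,infty} from Equation (11.71)."""
    f = 0.5 * (1 + ((N * x + 1) * coth((N * x + 1) * beta) - coth(beta)) / (N * x + 2))
    return N * (1 - f)

beta = 1.0
lam = tanh(beta)
c_inf = (1 + lam) / lam ** 2                   # Equation (11.77)
print(f_tilde(10 ** 6, lam, beta), c_inf / 2)  # the two values approach each other
```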
The asymptotic behavior of F_1^max(N, 1) can be analyzed in the same way. The
only difference is that we now have to consider the 1 = M ≤ 2s branch of Equation
(11.31). In analogy to Equation (11.78) we have to look at

    lim_{N→∞} N( 1 − F_1^max(N, 1) ) = lim_{N→∞} Σ_{s∈I[N]} w_N(s) f̃_{N,1}(2s/N) = c_1/2   (11.81)
where coefficients with M + R > K are defined to be zero. We can write the non-zero
coefficients as

    c(K, M, R) = C(K, M)^{−1} C(K−R, M) = (K−M)!(K−R)! / ( K!(K−R−M)! )   (11.91)

    = [(K−M)/K] [(K−M−1)/(K−1)] ··· [(K−M−R+1)/(K−R+1)]   (11.92)

    = Π_{S=0}^{R−1} ( 1 − M/(K−S) ).   (11.93)
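The equality of the binomial form (11.91) and the telescoped product (11.93) is easy to confirm (function names ours):

```python
from math import comb, prod

def c_binom(K, M, R):
    """c(K, M, R) as in Equation (11.91); zero when M + R > K."""
    return 0.0 if M + R > K else comb(K - R, M) / comb(K, M)

def c_prod(K, M, R):
    """Telescoped product form of Equation (11.93)."""
    return prod(1 - M / (K - S) for S in range(R))

print(c_binom(20, 4, 7), c_prod(20, 4, 7))   # identical up to rounding
```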
To calculate now Φ(µ) recall that the weights w_N(s) approach a point measure
in 2s/N =: x concentrated at λ = tr(ρ(β)σ_3). This means that in Equation (11.44)
only the term with 2s = λN survives the limit. Hence if µ ≥ λ we get M ≥ λN = 2s.
Using Equation (11.45) and Lemma 11.4.1 we get in this case

    Φ(µ) = (λ/µ) (1 − e^{−2β}).   (11.94)

For µ < λ a similar calculation based on Equation (11.58) leads to

    Φ(µ) = (1 − e^{−2β}) / ( 1 − (1 − µ/λ) e^{−2β} ).   (11.95)
Figure 11.2: Asymptotic all-qubit fidelity Φ(µ) plotted as a function of the rate µ, for θ = 0.25, 0.50, 0.75, 1.00.
Chapter 12
Quantum game theory
games – the two-person case is, however, sufficient for our purposes.
2 To be more precise we have to decompose the set of all nodes into equivalence classes ("in-
formation sets") such that all nodes in a class belong to the same player and the same moves are
available at each node. Each equivalence class represents the same position, although each node in
a class is given by a different combination of moves. A strategy is then a map from equivalence
strategy is given for each player we get a path in the tree which connects the top
node with one of the end nodes, where the payoffs are given. Hence we can construct
the normal form of a game from its extensive form, but the converse is not true. The
normal form of a game should therefore be regarded as a summary representation
– which contains, however, for many purposes enough information.
The aim of each player is of course to maximize her (or his) payoff u_j(s_A, s_B) by
a judicious choice of her own strategy. In general, however, this has to be done without
knowledge of the choice of the opponent. The most important concept to solve
this problem is the Nash equilibrium. A pair of strategies (ŝ_A, ŝ_B) ∈ X_A × X_B is
called a pure Nash equilibrium if

    u_A(s_A, ŝ_B) ≤ u_A(ŝ_A, ŝ_B)   and   u_B(ŝ_A, s_B) ≤ u_B(ŝ_A, ŝ_B)   (12.1)
holds for all s_A ∈ X_A and s_B ∈ X_B. Now we can define a (mixed) Nash equilibrium
to be a pair p̂_A ∈ S(X_A), p̂_B ∈ S(X_B) such that

    u_A(p_A, p̂_B) ≤ u_A(p̂_A, p̂_B)   and   u_B(p̂_A, p_B) ≤ u_B(p̂_A, p̂_B)

holds for all p_A ∈ S(X_A), p_B ∈ S(X_B), with the payoffs extended to probability
distributions in the obvious (bilinear) way. Due to a well known theorem of Nash [167],
each finite normal form game has a mixed Nash equilibrium, which is, however, in
general not unique.
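For a concrete instance of these definitions, consider matching pennies (a standard textbook example, not taken from the text): it has no pure Nash equilibrium, but uniform randomization by both players satisfies the mixed equilibrium condition. A small Python check over a grid of deviations:

```python
from itertools import product

# matching pennies: A wins (+1) if the two choices agree, B wins otherwise
uA = {(a, b): (1 if a == b else -1) for a, b in product((0, 1), repeat=2)}
uB = {k: -v for k, v in uA.items()}

def payoff(u, pA, pB):
    """Bilinear extension of the payoff to mixed strategies."""
    return sum(pA[a] * pB[b] * u[(a, b)] for a in (0, 1) for b in (0, 1))

hatA = hatB = (0.5, 0.5)                       # candidate mixed equilibrium
grid = [(t / 10, 1 - t / 10) for t in range(11)]
ok = (all(payoff(uA, pA, hatB) <= payoff(uA, hatA, hatB) + 1e-12 for pA in grid)
      and all(payoff(uB, hatA, pB) <= payoff(uB, hatA, hatB) + 1e-12 for pB in grid))
print(ok)   # True: no unilateral deviation helps either player
```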
12.1.2 Quantum games
Let us turn now to quantum games. Roughly speaking a quantum game is nothing
else but a usual game (in the sense described above) which is played with quantum
systems, i.e. the strategies can be represented by quantum operations. There are
several proposals which try to make this informal idea more precise. Most of them
are based on the normal form description of a game [84, 156, 149]. It is however
quite difficult to provide a definition which describes all relevant physical ideas
without excluding interesting examples (e.g. the version of the Monty Hall game
described in Section 12.2 is not covered by most of the proposed definitions). We
will follow therefore a different, turn based, approach which can be loosely regarded
as a generalization of the extensive form. Maybe it is still not general enough, but it
covers many relevant examples; in particular those which arise from quantum cryp-
tography (cf. Section 12.3). Hence a quantum game is in the following an interactive
process between two players (Alice and Bob) which obeys the following structure:
classes to moves rather than nodes to moves.
If the game is repeated many times and Alice uses strategy s_A^(1) with probability λ
and s_A^(2) with probability 1 − λ, Equation (12.4) becomes

    λ υ(s_A^(1), s_B) + (1−λ) υ(s_A^(2), s_B) = υ( λ s_A^(1) + (1−λ) s_A^(2), s_B ),   (12.6)

where

    λ s_A^(1) + (1−λ) s_A^(2) = ( λT_1^(1) + (1−λ)T_1^(2), ..., λT_{N−1}^(1) + (1−λ)T_{N−1}^(2) ).   (12.7)

Hence it is natural to assume that Σ_A and Σ_B are convex sets whose extremal
elements are the pure strategies.
3 Alternatively we can use a special condition on the classical system (“checkmate”) which
signals the end of the game. This is however more difficult to handle and we do not need this
generalization.
first appeared. The player is female, like Marilyn vos Savant, who was the first to fight in the public
debate for the recognition of the correct solution, and had to take some sexist abuse for that.
2. The candidate is asked to choose one of the three doors, which is, however,
not opened at this stage.
3. The show master opens another door, and shows that there is no prize behind
it. (He can do this, because he knows where the prize is).
4. The candidate can now open one of the remaining doors to either collect her
prize or lose.
Of course, the question is: should the candidate stick to her original choice or
“change her mind” and pick the other remaining door? As a quick test usually
shows, most people will stick to their first choice. After all, before the show mas-
ter opened a door the two doors were equivalent, and they were not touched (nor
was the prize moved). So they should still be equivalent. This argument seems so
obvious that trained mathematicians and physicists fall for it almost as easily as
anybody else.
However, the correct solution by which the candidates can, in fact, double their
chance of winning, is to always choose the other door. The quickest way to convince
people of this is to compare the game with another one, in which the show master
offers the choice of either staying with your choice or opening both other doors.
Anybody would prefer that, especially if the show master courteously offers to open
one of the doors for you. But this is precisely what happens in the original game
when you always change to the other door.
To catch up with the general discussion from the last section let us discuss the
normal form of this game. The pure strategies of Q are described by the numbers
of the doors where the prize is hidden⁵: X_Q = {1, 2, 3}. The player P can as well
choose one of the three doors in round 2 and has to decide whether she switches
(1) or not (0). Hence X_P = {1, 2, 3} × {0, 1}. The game is a zero-sum game, i.e.
u_Q = −u_P, and u_P has only two possible outcomes: +1 if P wins and −1 if she loses.
If j ∈ X_Q and (k, l) ∈ X_P we can write u_P simply as u_P(j; k, l) = (−1)^l (2δ_{kj} − 1).
If the game is repeated very often there are unique optimal strategies for both
players. Assume to this end that P has watched each issue of the show and has
calculated the probabilities p_j with which the prize is hidden behind door j.
Then her best option is to choose in the second round the door with the lowest p_j,
and her chance to win becomes 1 − min_j p_j if she switches at the end to the second
unopened door. This is even greater than 2/3 if Q does not use all three doors with
equal probability. Hence the best option for Q is to choose the uniform distribution.
The pair of strategies "uniform distribution" and "switch to the second door" is
therefore a Nash equilibrium.
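The 1/3 versus 2/3 claim is easy to confirm by simulation (a sketch; here Q hides the prize uniformly and the show master deterministically opens the first admissible door):

```python
import random

def play(switch, rng):
    prize, first = rng.randrange(3), rng.randrange(3)
    # show master opens a door that is neither the first choice nor the prize
    opened = next(d for d in range(3) if d != first and d != prize)
    final = first if not switch else next(d for d in range(3)
                                          if d != first and d != opened)
    return final == prize

rng = random.Random(0)
n = 100_000
stay = sum(play(False, rng) for _ in range(n)) / n
swap = sum(play(True, rng) for _ in range(n)) / n
print(stay, swap)   # close to 1/3 and 2/3: switching doubles the winning chance
```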
12.2.2 The quantum game
We will “quantize” only the key parts of the problem. That is, the prize and the
players, as well as their publicly announced choices, will remain classical. The quan-
tum version can even be played in a game show on classical TV.
The main quantum variable will be the position of the prize. It lies in a 3-
dimensional complex Hilbert space H, called the game space. We assume that an
orthonormal basis is fixed for this space so that vectors can be identified by their
components, but apart from this the basis has no significance for the game. A second
important variable in the game is what we will call the show master’s notepad,
described by an observable algebra N . This might be classical information describing
how the game space was prepared, or it might be a quantum system, entangled with
the prize. In the latter case, the show master is able to do a quantum measurement
5 If P selects the correct door in the second step, Q has to choose in step 3 between two doors.
However, taking this choice into account as an additional strategic option of Q makes things
more difficult without leading to new insights.
on his notepad, providing him with classical information about the prize, without
moving the prize, in the sense that the player’s information about the prize is
not changed by the mere fact that the show master “consults his notepad”. A
measurement on an auxiliary quantum system, even if entangled with a system of
interest, does not alter the reduced state of the system of interest. After the show
master has consulted his notepad, we are in the same situation as if the notepad
had been a classical system all along. As in the classical game, the situation for
the player might change when the show master, by opening a door, reveals to some
extent what he saw in his notepad. Opening a door corresponds to a measurement
along a one dimensional projection on H. Finally we need a classical system which
is used by both players to exchange classical information. We will call it the mail
box and describe it by a classical observable algebra C(X), where X can be taken
as the space of one-dimensional projections in H (i.e. X is the projective space
P²(C)). The overall algebra A which has to be used to describe the game according
to Subsection 12.1.2 is therefore A = B(H) ⊗ N ⊗ C(X).
The game proceeds in the following stages, closely analogous to the classical
game:
1. Before the show Q prepares the game space quantum mechanically and stores
some information about this preparation in his notepad N . The initial state
of the mail box can be arbitrary.
2. The candidate P chooses some one dimensional projection p on H and stores
this as classical information in the mailbox. The game space and the show
master's notepad (which P cannot access) remain untouched.
3. The show master opens a door, i.e., he chooses a one dimensional projection
q, and makes a Lüders/von Neumann measurement with projections q and
(1I − q). In order to do this, he is allowed first to consult his notebook. If it
is a quantum system, this means that he carries out a measurement on the
notebook. The joint state of prize and notebook then change, but the traced
out or reduced state of the prize does not change, as far as the player is
concerned. Two rules constrain the show master’s choice of q: he must choose
“another door” in the sense that q ⊥ p; and he must be certain not to reveal
the prize. The purpose of his notepad is to enable him to do this. After these
steps, the game space is effectively collapsed to the two-dimensional space
(1I − q)H and information about the opened door is stored in the mailbox.
4. The player P reads the mailbox, chooses a one dimensional projection p 0 on
(1I − q)H, and performs the corresponding measurement on the game space.
If it gives “yes” she collects the prize.
Note that we recover the classical game if Q and P are restricted to choosing pro-
jections along the three coordinate axes. This shows that the proposed scheme is
really a quantization as described in Subsection 12.1.2. As in the classical case, the
question is: how should the player choose the projection p′ in order to maximize her chance of winning? Perhaps it is best to try out a few options in a simulation, for
which a Java applet is available [64]. For the input to the applet, as well as for some
of the discussion below it is easier to use unit vectors rather than one-dimensional
projections. As standard notation we will use p = |ΦihΦ| for the door chosen by the player, q = |χihχ| for the door opened by Q, and r = |ΨihΨ| for the initial position of the prize, if the latter is defined.
From the classical case it seems likely that choosing p′ = p is a bad idea. So let us say that the classical strategy in this game consists of always switching to the orthogonal complement of the previous choice, i.e., to take p′ = 1I − q − p. Note that this is always a projection because, by rule 3, p and q are orthogonal one-dimensional projections. We will analyze this strategy in Sec. 12.2.3; the analysis turns out to be possible without any specification of how the show master guarantees not to stumble on the prize in step 3.
For the show master there are two main ways to satisfy the rules. The first is that he randomly chooses a vector in H, and prepares the
game space in the corresponding pure state. He can then just take a note of his choice
on a classical pad, so that in stage 3 he can compute a vector orthogonal to both the
direction of the preparation and the direction chosen by the player. Q’s strategies in
this case are discussed in Subsection 12.2.3. The second and more interesting way is
to use a quantum notepad, i.e., another system with three dimensional Hilbert space
K, and to prepare a “maximally entangled state” on H ⊗ K. Then until stage 3 the
position of the prize is completely undetermined in the strong sense only possible
in quantum mechanics, but the show master can find a safe door to open on H by
making a suitable measurement on K. Q’s strategies in this case are discussed in
Subsection 12.2.5.
12.2.3 The classical strategy
To explain why the classical strategy works almost as in the classical version of the
problem, we look more closely at the end of round 3, i.e. Q has opened one door
by measuring along q and the information which q he has chosen is stored in the
mailbox system. Q’s notepad is completely irrelevant from this stage on because it
is now P's turn and she cannot access it. Hence we have to look at a state ω on
the hybrid system B(H) ⊗ C(X). Note that ω depends on p but we suppress this
dependency in the notation. For a finite set X we have seen in Section 2.2.2 that ω is given by a probability distribution w(q) on X and a family ρq ∈ S(H), q ∈ X, of density operators, such that expectation values become

ω(p′ ⊗ f) = Σ_{q∈X} w(q) f(q) tr[ρq p′],   p′ ⊗ f ∈ B(H) ⊗ C(X).   (12.8)
The ρq are called conditional density operators and they represent, loosely speaking,
the density matrix which P has to use for the game space after Q has announced
his intention of opening door q. This is usually not the same conditional density
operator as the one used by Q: Since Q has more classical information about the
system, he may condition on that, leading to finer predictions. In contrast, ρq is
conditioned only on the publicly available information.
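Equation (12.8) is easy to check numerically. The following sketch is a toy example with an invented two-element classical register and invented qubit conditional states (not data from the text); it evaluates the expectation of p′ ⊗ f on a hybrid state:

```python
import numpy as np

# Toy hybrid state: classical register X = {0, 1} with weights w(q), and a
# conditional density operator rho_q on a qubit for each q (invented data).
w = np.array([0.5, 0.5])
rho = [np.array([[1, 0], [0, 0]], dtype=complex),            # rho_0 = |0><0|
       np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)]    # rho_1 = |+><+|

def expectation(p_prime, f):
    """omega(p' (x) f) = sum_q w(q) f(q) tr(rho_q p'), cf. Eq. (12.8)."""
    return sum(w[q] * f[q] * np.trace(rho[q] @ p_prime).real
               for q in range(len(w)))

p_prime = np.array([[1, 0], [0, 0]], dtype=complex)   # effect |0><0| on the qubit
f = np.array([1.0, 1.0])                              # f = 1: ignore the register
print(expectation(p_prime, f))                        # 0.5*1 + 0.5*0.5 = 0.75
```

Setting f = 1 recovers the expectation in the mean density operator, while f concentrated on one q picks out the corresponding conditional state.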
In our case X is not finite but the set of all one-dimensional projections in H.
Therefore Equation (12.8) is not applicable. Fortunately it can be generalized if we
replace the sum with an integral and the probability distribution with a probability
measure [175]:

ω(p′ ⊗ f) = ∫ w(dq) tr(ρq p′) f(q).   (12.9)

A special role is played by the mean conditional density operator

ρ = ∫ w(dq) ρq.   (12.10)

It will not depend on p, and it will be the same as the reduced density operator for the game space before the show master consults his notepad (he is not allowed to touch the prize), and even before the player chooses p (which cannot affect the prize).
From the rules alone we know two things about the conditional density operators: firstly, that tr(ρq q) = 0: the show master must not hit the prize. Secondly, q and p must commute, so it does not matter which of the two we measure first. Thus a measurement of p responds with probability ∫ w(dq) tr(ρq p) = tr(ρ p). Combining these two we get the overall probability wc for winning with the classical strategy as

wc = ∫ w(dq) tr(ρq (1I − p − q)) = 1 − tr(ρ p).   (12.11)
If we assume that ρ is known to P, from watching the show sufficiently often, the
best strategy for P is to choose initially the p with the smallest expectation with
respect to ρ, just as in the classical game with uneven prize distribution it is best
to choose initially the door least likely to contain the prize. If Q on the other hand
wants to minimize P’s gain, he will choose ρ to be uniform, which in the quantum
case means ρ = (1/3) 1I, and hence wc = 2/3.
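This equilibrium value can be checked with a quick Monte Carlo run of the underlying classical game (a sketch; the door labels and the deterministic tie-break for the host are our own choices and do not affect the winning probability):

```python
import numpy as np

rng = np.random.default_rng(0)

def round_won():
    prize = rng.integers(3)                  # Q hides the prize uniformly
    pick = 0                                 # P's initial door p
    # Q opens an empty door q with q != pick and q != prize
    q = next(d for d in (1, 2) if d != prize)
    switched = 3 - pick - q                  # classical strategy p' = 1I - p - q
    return switched == prize

n = 100_000
print(sum(round_won() for _ in range(n)) / n)   # ~ 2/3 = w_c
```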
12.2.4 Strategies against classical notepads
In this section we consider the case that the show master records the prepared
direction of the prize on a classical notepad. We will denote the one dimensional
projection of this preparation by r. Then when he has to open a door q, he needs
to choose q ⊥ r and q ⊥ p. This is always possible in a three dimensional space.
But unless p = r, he has no choice: q is uniquely determined. This is the same as
in the classical case, only that the condition "p = r", i.e., that the player chooses exactly the prize vector, typically has probability zero. Hence Q's strategic options
are not in the choice of q, but rather in the way he randomizes the prize positions
r, i.e., in the choice of a probability measure v on the set of pure states. In order to safeguard against the classical strategy he will make certain that the mean density operator ρ = ∫ v(dr) r is unpolarized (= (1/3) 1I). It seems that this is about all he has
to do, and that the best the player can do is to use the classical strategy, and win
2/3 of the time. However, this turns out to be completely wrong.
Preparing along the axes. — Suppose the show master decides that since the player
can win as in the classical case, he might as well play classically himself, and save the cost of an expensive random generator. Thus he fixes a basis and chooses each one of the basis vectors with probability 1/3. Then ρ = (1/3) 1I, and there seems to be
no giveaway. In fact, the two can now play the classical version, with P choosing
likewise a projection along a basis vector.
But suppose she does not, and chooses instead the projection along the vector Φ = (1, 1, 1)/√3. Then if the prize happens to be prepared in the direction
Ψ = (1, 0, 0), the show master has no choice but to choose for q the unique projec-
tion orthogonal to these two, which is along χ = (0, 1, −1). So when Q announces
his choice, P only has to look which component of the vector is zero, to find the
prize with certainty! In other words, a quantum strategy of P can always beat an opponent who is restricted to classical strategies. This is exactly the behavior we mentioned at the end of Section 12.1.
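This can be verified in a few lines (a sketch; for real directions the unique door orthogonal to both the prize vector and Φ is given by the cross product):

```python
import numpy as np

phi = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)    # P's choice
for k in range(3):                               # prize along a basis vector
    psi = np.zeros(3)
    psi[k] = 1.0
    chi = np.cross(psi, phi)                     # Q's forced door, orthogonal to both
    chi /= np.linalg.norm(chi)
    # the (essentially) vanishing component of chi marks the prize door
    assert int(np.argmin(np.abs(chi))) == k
print("P finds the prize with certainty")
```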
At first sight the success of P's strategy seems to be an artifact of the rather minimalistic choice of probability distribution. But suppose that Q has settled for
any arbitrary finite collection of vectors Ψα and their probabilities. Then P can
choose a vector Φ which lies in none of the two dimensional subspaces spanned by
two of the Ψα . This is possible, even with a random choice of Φ, because the union
of these two dimensional subspaces has measure zero. Then, when Q announces the
projection q, P will be able to reconstruct the prize vector with certainty: at most
one of the Ψα can be orthogonal to q. Because if there were two, they would span a
two dimensional subspace, and together with Φ they would span a three dimensional
subspace orthogonal to q, which is a contradiction.
Of course, any choice of vectors announced with floating point precision is a
choice from a finite set. Hence the last argument would seem to allow P to win with
certainty in every realistic situation. However, this only works if she is permitted
to ask for q at any desired precision. So by the same token (fixed length of floating
point mantissa) this advantage is again destroyed.
This shows, however, where the miracle strategies come from: by announcing
q, the show master has not just given the player log₂ 3 bits of information, but an
infinite amount, coded in the digits of the components of q (or the vector χ).
Preparing real vectors. — The discreteness of the probability distribution is not
the key point in the previous example. In fact there is another way to economize
on random generators, which proves to be just as disastrous for Q. The vectors in
H are specified by three complex numbers. So what about choosing them real for
simplicity? An overall phase does not matter anyhow, so this restriction does not
seem to be very dramatic.
Here the winning strategy for P is to take Φ = (1, i, 0)/√2, or another vector
whose real and imaginary parts are linearly independent. Then the vector χ ⊥ Φ
announced by Q will have a similar property, and also must be orthogonal to the
real prize vector. But then we can simply compute the prize vector as the cross product of the real and imaginary parts of χ.
For the vector Φ specified above we find that if the prize is at Ψ = (Ψ1 , Ψ2 , Ψ3 ),
with Ψk ∈ IR, the unique vector χ orthogonal to Φ and Ψ is connected to Ψ via the
transformations
χ ∝ (Ψ3 , −iΨ3 , −Ψ1 + iΨ2 ) (12.12)
Ψ ∝ (− Re χ3 , Im χ3 , χ1 ) , (12.13)
where “∝” means “equal up to a factor”, and it is understood that an overall phase
for χ is chosen to make χ1 real. This is also the convention used in the simulation
[64], so Eq. (12.13) can be tried out as a universal cheat against show masters using
only real vectors.
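The reconstruction can be spelled out for an arbitrary real prize vector (a sketch; the shortcut χ = conj(Φ × Ψ) for the unique direction orthogonal, in the Hermitian sense, to Φ and Ψ is our own):

```python
import numpy as np

rng = np.random.default_rng(1)

phi = np.array([1.0, 1.0j, 0.0]) / np.sqrt(2)   # Re and Im linearly independent

psi = rng.normal(size=3)                        # Q's real prize vector
psi /= np.linalg.norm(psi)

# Q's forced door: the unique direction with <phi|chi> = <psi|chi> = 0;
# for the Hermitian inner product this is chi = conj(phi x psi)
chi = np.conj(np.cross(phi, psi))
chi /= np.linalg.norm(chi)

# P recovers the prize as the cross product of Re(chi) and Im(chi)
guess = np.cross(chi.real, chi.imag)
guess /= np.linalg.norm(guess)
print(abs(np.dot(guess, psi)))                  # 1.0 up to rounding: prize found
```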
Uniform distribution. — The previous two examples have one thing in common: the
probability distribution of vectors employed by the show master is concentrated on
a rather small set of pure states on H. Clearly, if the distribution is more spread out,
it is no longer possible for P to get the prize every time. Hence it is a good idea for Q
to choose a distribution which is as uniform as possible. There is a natural definition
of “uniform” distribution in this context, namely the unique probability distribution
on the unit vectors, which is invariant under arbitrary unitary transformations. Is
this a good strategy for Q?
Let us consider the conditional density operator ρq , which depends on the two
orthogonal projections p, q. It implicitly contains an average over all prize vectors
leading to the same q, given p. Therefore, ρq must be invariant under all unitary
rotations of H fixing these two vectors, which means that it must be diagonal in the
same basis as p, q, (1I − p − q). Moreover, the eigenvalues cannot depend on p and
q, since every pair of orthogonal one dimensional projections can be transformed
into any other by a unitary rotation. Since we know the average eigenvalue in the
p-direction to be 1/3, we find
ρq = (1/3) p + (2/3)(1I − p − q).   (12.14)
Hence the classical strategy for P is clearly optimal. In other words, the pair of
strategies: "uniform distribution for Q and classical strategy for P" is a Nash equilibrium of the game. We do not know, however, whether this equilibrium is unique; in other words: if Q does not play precisely the uniform distribution, can P always improve on the classical strategy? We suspect that the answer is yes, but finding a proof of this conjecture has turned out to be a hard problem which is still open.
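The equilibrium value can also be checked by simulation (a sketch; Haar-random unit vectors are sampled from complex Gaussians, and the forced door and the switch are computed via conjugated cross products, an identification that is our own):

```python
import numpy as np

rng = np.random.default_rng(2)

def haar_vector(d=3):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def orth(a, b):
    """The unique direction Hermitian-orthogonal to a and b in C^3."""
    c = np.conj(np.cross(a, b))
    return c / np.linalg.norm(c)

phi = np.array([1.0, 0.0, 0.0], dtype=complex)  # P's first choice p
n, wins = 20_000, 0.0
for _ in range(n):
    psi = haar_vector()                 # uniformly distributed prize vector
    chi = orth(phi, psi)                # Q's forced door q
    phi2 = orth(phi, chi)               # classical strategy p' = 1I - p - q
    wins += abs(np.vdot(phi2, psi)) ** 2    # winning probability of this round
print(wins / n)                         # ~ 2/3, in accordance with Eq. (12.14)
```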
12.2.5 Strategies for quantum notepads
Assume now that the notepad system N is quantum rather than classical, i.e. N =
B(K) with K = C3 . Initially the bipartite system B(K ⊗ H) consisting of notepad
and game space is prepared in a maximally entangled state

Ω = (1/√3) Σ_{k=1}^{3} |kki.   (12.15)
We can state this in a stronger way, by introducing tougher rules for Q: in this variant P not only picks the direction p, but also two more projections p′ and p″ such that p + p′ + p″ = 1I. Then Q is not only required to open a door q ⊥ p; we require that either q = p′ or q = p″. It is obvious how Q can play this game with an entangled notepad: he just uses the transposes of p, p′, p″ as his observable. Then everything is as in the classical version, and the equilibrium is again at 2/3.
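Why transposes? On the maximally entangled vector Ω of Eq. (12.15) a projection r^T applied to the notepad acts exactly like r applied to the game space, so Q's measurement collapses the prize to one of P's three doors. The following sketch checks the identity (r^T ⊗ 1I)Ω = (1I ⊗ r)Ω numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3

# Omega = (1/sqrt(3)) sum_k |k>_K (x) |k>_H, cf. Eq. (12.15)
omega = np.eye(d).reshape(d * d) / np.sqrt(d)

# a random one-dimensional projection r = |v><v|
v = rng.normal(size=d) + 1j * rng.normal(size=d)
v /= np.linalg.norm(v)
r = np.outer(v, v.conj())

lhs = np.kron(r.T, np.eye(d)) @ omega   # measure r^T on the notepad K
rhs = np.kron(np.eye(d), r) @ omega     # measure r on the game space H
print(np.allclose(lhs, rhs))            # True
```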
12.2.6 Alternative versions and quantizations of the game
Quantizing something is seldom a problem with a unique solution and quantum
game theory is no exception in this respect. In the following we give a brief overview of some games which are closely related to our version.
Variants arising already in the classical case. — Some variants of the problem can
also be considered in the classical case, and they tend to trivialize the problem, so
that P’s final choice becomes equivalent to “Q has prepared a coin, and P guesses
heads or tails”. Here are some possibilities, formulated in a way applying both to
the classical and the quantum version.
• Q is allowed to touch the prize after P made her first choice. Clearly, in this
case Q can reshuffle the system, and equalize the odds between the remaining
doors. So no matter what P chooses, there will be a 50% chance for getting
the prize.
• Q is allowed to open the door first chosen by P. Then there is no way P’s first
choice enters the rules, and we may analyze the game with stage 2 omitted,
which is entirely trivial.
• Q may open the door with the prize, in which case the game starts again.
Since Q knows where the prize is, this is the same as allowing him to abort
the round, whenever he does not like what has happened so far, e.g., if he does
not like the relative position of prize and P’s choice. In the classical version
he could thus cancel 50% of the cases, where P’s choice is not the prize, thus
equalizing the chances for P’s two pure strategies. Similar possibilities apply
in the quantum case.
• Q may open the door with the prize, in which case P gets the prize. In the
classical version, revealing the prize is then the worst possible pure strategy,
so mixing in a bit of it would seem to make things always worse for Q. Then
although increasing Q’s options in principle can only improve things for Q,
one would advise him not to use the additional options. This is assuming,
though, that in the remaining cases Q sticks to his old strategy. However,
even classically, the relaxed rule gives him some new options: He can simply
ignore the notepad, and open any door other than p. Then the game becomes
effectively “P and Q open a door each, and P gets all prizes”. Assuming
uniform initial distribution of prizes this gives the same 2/3 winning chance
as in the original game.
The corresponding quantum strategy works in the same way. Assuming, for
simplicity, a uniform mean density operator ρ = (1/3) 1I, Q's strategy of ignoring
his prior information will give the classical 2/3 winning chance for P. But this
is a considerable improvement for Q in cases where a non-uniform probability
distribution of pure states previously gave Q a 100% chance of winning. So in
the quantum case, doing two seemingly stupid things together amounts to a
good strategy for Q: firstly, sometimes revealing the prize for P, and secondly
ignoring all prior information.
Note that this strategy is optimal for Q, because the classical strategy still
guarantees the 2/3 winning chance for P. This can be seen with the same
arguments as in Subsection 12.2.3. The only difference is that tr(ρq q) can be
nonzero, since Q may open the door with the prize. However in this case P
wins and we get instead of Equation (12.11)

wc = ∫ w(dq) ( tr(ρq (1I − p − q)) + tr(ρq q) )   (12.17)
   = 1 − tr(ρp) = 2/3.   (12.18)

ρ ↦ qρq + q′ρq′ + q″ρq″.   (12.19)
Two published versions. — Finally let us have a short look at two variants of the game which were proposed independently by other authors [152, 91].
where Q hides the prize, which P chooses in the second step and which Q
opens afterwards and the gameplay is described by the unitary operator
Figure 12.1: Schematic picture of a quantum coin-tossing protocol. The curly arrows stand for the flow of quantum or classical information or both.
and B denotes the field with two elements – in other words Alice’s notepad consists
in this case of n qubits and m classical bits.
If Alice wants to send data (classical or quantum) to Bob, she has to store them
in the mailbox system, where Bob can read them off in the next round. Hence each
processing step of the protocol (except the first and the last one) can be described
as follows: Alice (or Bob) uses her own private data and the information provided
by Bob (via the mailbox) to perform some calculations. Afterwards she writes the
results in part to her notepad and in part to the mailbox. An operation of this
kind can be described by a completely positive map TA : A ⊗ M → A ⊗ M, or (if
executed by Bob) by TB : M ⊗ B → M ⊗ B.
Based on these structures we can describe a coin tossing protocol as a special
case of the general scheme for a quantum game introduced in Subsection 12.1.2:
At the beginning Alice and Bob prepare their private systems in some initial state.
Alice uses in addition the mailbox system to share some information about her
preparation with Bob, i.e. Alice prepares the system A⊗M in a (possibly entangled,
or at least correlated) state ρA,0 , while Bob prepares his notepad in the state ρB,0 .
Hence the state of the composite system becomes ρ0 = ρA,0 ⊗ ρB,0 . Now Alice and
Bob start to operate alternately6 on the system, as described in the last paragraph,
i.e. Alice in terms of operations TA : A ⊗ M → A ⊗ M and Bob with TB : M ⊗ B →
M ⊗ B. After N rounds 7 the system therefore ends in the state (cf. Figure 12.1)

ρN = (T*A,N ⊗ IdB )(IdA ⊗ T*B,N−1 ) · · · (T*A,2 ⊗ IdB )(IdA ⊗ T*B,1 ) ρ0 ,   (12.21)
where IdA , IdB are the identity maps on A and B. Note that we have assumed here
without loss of generality that Alice performs the first (i.e. providing the initial
preparation of the mailbox) and the last step (applying the operation TA,N ). It
is obvious how we have to change the following discussion if Bob starts the game
or if N is odd. To determine the result Alice and Bob perform measurements on
their notepads. The corresponding observables EA = (EA,0 , EA,1 , EA,∅ ) and EB =
(EB,0 , EB,1 , EB,∅ ) can have the three possible outcomes X = {0, 1, ∅}, which we
6 This means we are considering only turn based protocols. If special relativity, and therefore
finite propagation speed for information, is taken into account it can be reasonable to consider
simultaneous exchange of information; cf. e.g. [132] for details.
7 Basically N is the maximal number of rounds: After K < N steps Alice (Bob) can apply
consists of all parts of the protocol Alice and Bob, respectively, can influence. Hence the sA represent Alice's and the sB represent Bob's strategies. As in Subsection 12.1.2 the sets of all strategies of Alice and Bob are denoted by ΣA and ΣB . Note
that ΣA depends only on the algebras A and M while ΣB depends on B and M.
Occasionally it is useful to emphasize this dependency (the number of rounds is kept
fixed in this paper). In this case we write ΣA (A, M) and ΣB (B, M) instead of ΣA
and ΣB . The probability that Alice gets the result a ∈ X if she applies the strategy
sA ∈ ΣA and Bob gets b ∈ X with strategy sB ∈ ΣB is (cf. Equation (12.4))
υ(sA , sB ; a, b) = tr[(EA,a ⊗ 1I ⊗ EB,b ) ρN ].   (12.23)
υ(s′A , sB ; x, x) ≤ 1/2 + ε   (12.24)

υ(sA , s′B ; x, x) ≤ 1/2 + ε   (12.25)
The two security conditions in this definition imply that neither Alice nor Bob can increase the probability of the outcome 0 or 1 beyond the bound 1/2 + ε. However it is more natural to think of coin tossing as a game with payoff defined
according to the following table
              Alice   Bob
a = b = 0       1      0
a = b = 1       0      1
other           0      0        (12.26)
This implies that Alice tries to increase only the probability for the outcome 0 and
not for 1 while Bob tries to do the contrary, i.e. increase the probability for 1. This
motivates the following definition.
υ(s′A , sB ; 0, 0) ≤ 1/2 + ε,   (12.27)

υ(sA , s′B ; 1, 1) ≤ 1/2 + ε.   (12.28)
Here R stands again for any finite dimensional (but arbitrarily large) observable
algebra.
Good coin tossing protocols are of course those with a small bias. Hence the
central question is: what is the smallest bias we have to accept, and what do the corresponding optimal strategies look like? To get an answer, however,
is quite difficult. Up to now there are only partial results available (cf. Section 12.3.5
for a summary).
Other, related questions arise if we exploit the game theoretic nature of
the problem. In this context it is reasonable to look at a whole class of quantum
games, which arises from the scheme developed up to now. We only have to fix the
algebras8 A, B and M and to specify a payoff matrix as in Equation (12.26). The
latter, however, has to be done carefully. If we consider instead of (12.26) the payoff
              Alice   Bob
a = b = 0       1     −1
a = b = 1      −1      1
other           0      0        (12.29)
we get a zero-sum game, which seems at first glance very reasonable. Unfortunately it admits a very simple (and boring) optimal strategy: Bob always produces the outcome 1 on his side while Alice always claims that she has measured 0. Hence they never agree and nobody has to pay. The game from Equation (12.26) does not suffer from this problem, because a draw is as bad for Alice as the case a = b = 1 where Bob wins.
12.3.2 Classical coin tossing
Let us now add some short remarks on classical coin tossing, which is included in the general scheme just developed as a special case: we only have to choose classical algebras for A, B and M, i.e. A = C(XA ), B = C(XB ) and M = C(XM ). The completely positive maps TA and TB describing the operations performed by Alice and Bob are in this case given by matrices of transition probabilities (see Sect. 3.2.3). This implies in particular that the strategies in ΣA , ΣB are in general mixed strategies. This is natural: there is of course no classical coin tossing protocol consisting of pure strategies, because it would always lead to the same result (either always 0 or always 1). However, we can decompose each mixed strategy in a unique way into a convex combination of pure strategies, and this can be used to show that there is no classical coin tossing protocol which admits the kind of security contained in Definitions 12.3.1 and 12.3.2.
8 In contrast to the security definitions given above this means that we assume limited resources (notepads) for Alice and Bob. This simplifies the analysis of the problem and should not be a big restriction (from the practical point of view) if the notepads are fixed but very large.
Proposition 12.3.3 There is no (weak) classical coin tossing protocol with bias ε < 1/2.
Proof. Assume a classical coin tossing protocol (sA , sB ) is given. Since its outcome
is by definition probabilistic, sA or sB (or both) are mixed strategies which can be
decomposed (in a unique way) into pure strategies. Let us denote the sets of pure
strategies appearing in this decomposition by Σ0A , Σ0B . Since the protocol (sA , sB )
is correct, each pair (sA , sB ) ∈ Σ0A × Σ0B leads to a valid outcome, i.e. either 0 or 1
on both sides. Hence there are two possibilities to construct a zero-sum game, either
Alice wins if the outcome is 0 and Bob if it is 1 or the other way round. In both cases
we get a zero-sum two-person game with perfect information, no chance moves 9 and
only two outcomes. In those games one player has a winning strategy (cf. Sect. 15.6,
15.7 of [224]), i.e. if she (or he) follows that strategy she wins with certainty, no
matter which strategy the opponent uses. This includes in particular the case where
the other player is honest and follows the protocol. If we apply this argument to both variants of the game, we see that either one player could force both possible outcomes or one bit could be forced by both players. Both cases fit the definition of (weak) coin tossing only if the bias is 1/2. This proves the proposition. 2
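The determinacy fact used in the proof — in a finite two-person zero-sum game with perfect information, no chance moves and only two outcomes, one player has a winning strategy — is just backward induction. A minimal sketch with an invented toy game tree:

```python
def winner(node, to_move):
    """Backward induction (Zermelo): leaves are 'A' or 'B' (the winner);
    an inner node is the list of positions the player to move can reach."""
    if node in ("A", "B"):
        return node
    other = "B" if to_move == "A" else "A"
    results = [winner(child, other) for child in node]
    # the player to move wins iff some move leads to a position she wins
    return to_move if to_move in results else other

# A moves first: either lose immediately, or hand B a position in which
# every move of B loses for B.
tree = ["B", ["A", "A"]]
print(winner(tree, "A"))   # 'A'
```

Applying `winner` to both orientations of the coin-tossing game tree is exactly the case distinction made in the proof.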
Note that the proof is not applicable in the quantum case (in fact there are
coin tossing protocols with bias less than 1/2 as we will see in Section 12.3.4). One
reason is that in the quantum case one does not have perfect information. E.g. if
Alice sends a qubit to Bob, he does not know what qubit he got. He could perform
a measurement, but if he measures in a wrong basis, he will inevitably change the
qubit.
Another way to circumvent the negative result of the previous proposition is
to weaken the assumption that both players can perform any operation on their
data. A possible practical restriction which comes to mind immediately is limited
computational power, i.e. we can assume that no player is able to solve intractable
problems like factorization of large integers in an acceptable time. Within the defi-
nition given above this means that Alice and Bob do not have access to all strategies
in ΣA and ΣB but only to certain subsets. Of course, such additional restrictions
can be imposed as well in the quantum case. To set them apart from the much stronger security requirements of Definitions 12.3.1 and 12.3.2, a protocol is sometimes called unconditionally secure if no additional assumptions about the accessible cheating
strategies are necessary (loosely speaking: the “laws of quantum mechanics” are the
only restriction).
12.3.3 The unitary normal form
A special class of quantum coin tossing arises if: 1. all algebras are quantum, i.e.
A = B(HA ), B = B(HB ) and M = B(HM ) with Hilbert spaces HA , HB and HM ;
2. the initial preparation is pure: ρA = |ψA ihψA | and ρB = |ψB ihψB | with ψA ∈
HA ⊗ HM and ψB ∈ HB ; 3. the operations TA,j , TB,j are unitarily implemented:
TA,j (ρ) = UA,j ρ U*A,j with a unitary operator UA,j on HA ⊗ HM , and something similar holds for Bob; and 4. the observables EA , EB are projection valued. It is
easy to see that the corresponding strategies (sA , sB ) ∈ ΣA × ΣB do not admit a
proper convex decomposition into other strategies. Hence they represent the pure
strategies. In contrast to the classical case it is possible to construct correct coin
tossing protocols with pure strategies. The following proposition was stated for the
first time (in a less explicit way) in [160] and shows that we can always replace a mixed strategy by a pure one without losing security.
Proposition 12.3.4 For each strategy sA ∈ ΣA (A, M) with A ⊂ B(HA ) there is a Hilbert space KA and a pure strategy σ̃A ∈ ΣA (Ã, M) with Ã = B(HA ⊗ KA ) such that

υ(sA , sB ; x, y) = υ(σ̃A , sB ; x, y)   (12.30)

holds for all sB ∈ ΣB (B, M) (with arbitrary Bob algebra B) and all x, y ∈ {0, 1, ∅}.

9 That means there are no outside probability experiments like dice throws.
A similar statement holds for Bob’s strategies.
Proof. Note first that all observable algebras A, B and M are linear subspaces
of pure quantum algebras, i.e. A ⊂ B(HA ), B ⊂ B(HB ) and M ⊂ B(HM ). In
addition it can be shown that Alice’s operations TA : A ⊗ M → B(HA ) ⊗ B(HM )
can be extended to a channel TeA : B(HA ) ⊗ B(HM ) → B(HA ) ⊗ B(HM ), i.e. a
quantum operation [178]; something similar holds for Bob’s operations. Hence we
can restrict the proof to the case where all three observable algebras are quantum.
Now the statement basically follows from the fact that we can find for each item in the sequence sA = (ρA ; TA,2 , . . . , TA,N ; EA ) a "dilation". For the operations TA,j this is just the ancilla representation given in Corollary 3.2.1, i.e.

TA,j (ρ) = tr2 ( Vj (ρ ⊗ |φj ihφj |) Vj∗ )   (12.31)
with a Hilbert space Lj , a unitary Vj on HA ⊗ Lj and a pure state φj ∈ Lj (and
tr2 denotes the partial trace over Lj ). Similarly, there is a Hilbert space L0 and a
pure state φ0 ∈ HA ⊗ L0 such that
ρA = tr2 (|φ0 ihφ0 |) (12.32)
holds (i.e. φ0 is the purification of ρA ; cf. Sect. 2.2), and finally we have a Hilbert
space LN +2 , a pure state φN +2 and a projection valued measure F0 , F1 , F∅ ∈ B(HA ⊗
LN +2 ) with

tr(EA,x ρ) = tr( Fx (ρ ⊗ |φN +2 ihφN +2 |) ),   (12.33)
this is another consequence of Stinespring's theorem. Now we can define the pure strategy σ̃A as follows:
KA = L0 ⊗ L2 ⊗ . . . ⊗ LN ⊗ LN +2 (12.34)
ψA = φ0 ⊗ φ2 ⊗ · · · ⊗ φN ⊗ φN +2 (12.35)
UA,j = 1I0 ⊗ 1I2 ⊗ · · · ⊗ Vj ⊗ · · · ⊗ 1IN ⊗ 1IN +2 (12.36)
ẼA,x = 1I0 ⊗ · · · ⊗ 1IN ⊗ Fx ,   (12.37)
where 1Ik denotes the unit operator on Lk , and in Equation (12.36) we have implicitly used the canonical isomorphism between HA ⊗ KA and L0 ⊗ · · · ⊗ HA ⊗ Lj ⊗ . . . ⊗ LN +2 . It is now easy to show that σ̃A satisfies Equation (12.30). 2
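The purification step (12.32) used in this proof can be made concrete (a sketch for a qubit, using the eigendecomposition of ρA; the random test state is invented):

```python
import numpy as np

rng = np.random.default_rng(4)

# a random qubit density operator rho_A
m = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho_A = m @ m.conj().T
rho_A /= np.trace(rho_A).real

# purification phi_0 = sum_k sqrt(lambda_k) |e_k> (x) |k> on H_A (x) L_0
lam, e = np.linalg.eigh(rho_A)
phi0 = sum(np.sqrt(lam[k]) * np.kron(e[:, k], np.eye(2)[k]) for k in range(2))

# partial trace over L_0 recovers rho_A, cf. Eq. (12.32)
full = np.outer(phi0, phi0.conj()).reshape(2, 2, 2, 2)
reduced = np.trace(full, axis1=1, axis2=3)
print(np.allclose(reduced, rho_A))   # True
```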
1. Preparation step: Alice throws a coin; the result is bA ∈ {0, 1}, with probability 1/2 each. She stores the result and prepares the system B(HA ) ⊗ B(HM ) in the state |ψbA ihψbA |, where |ψ0 i = (1/√2)(|0, 0i + |1, 2i) and |ψ1 i = (1/√2)(|1, 1i + |0, 2i) are orthogonal to each other. Bob throws a similar coin, and stores the result bB . The initial preparation of his quantum part is arbitrary.
2. Bob reads the mailbox (i.e. swaps it with the second part of his Hilbert space)
and sends bB to Alice.
3. Alice receives bB and puts her remaining quantum system into the mailbox.
4. Bob reads the mailbox and puts the system into the first slot of his quantum register.
Possible cheating strategies. — Now we will give possible cheating strategies for each
party which lead to the maximal probability of achieving the preferred outcome.
For simplicity we just look at the case where Alice prefers the outcome to be 0,
whereas Bob prefers it to be 1, cheating strategies for the other cases are easily
derivable. A cheating strategy for Bob is to try to distinguish in step 2 whether
Alice has prepared |ψ0 i or |ψ1 i. For this purpose he performs the measurement
(|0ih0|, |1ih1|, |2ih2|). If the result cB ≠ 2 (the probability for this in either case is 1/2) he can identify bA = cB and set bB = cB ⊕ 1 to achieve the overall result 1. If cB = 2 holds, he has not learned anything about bA . In that case he just continues with the protocol and hopes for the desired result, which appears with probability 1/2.10 So the total probability for Bob to achieve the result 1 is 1/2 + (1/2) · (1/2) = 3/4.
A cheating strategy for Alice is to set in the initial step bA = 0 and to prepare the system B(HA ) ⊗ B(HM ) in the state |ψ̃0 i = (1/√6)(|0, 0i + |0, 1i + 2|1, 2i). Then she continues until step 3. If bB = 0 she just continues with the protocol. Then the probability that in the last step Bob measures b′B = 0 equals tr(|ψ̃0 ihψ̃0 | · |ψ0 ihψ0 |) = |hψ0 |ψ̃0 i|² = 3/4. If bB = 1 she first applies a unitary operator, which swaps |0i and |1i, on her system before she sends it to Bob. The state on Bob's side is then |ψ̃1 ihψ̃1 | with |ψ̃1 i = (1/√6)(|1, 0i + |1, 1i + 2|0, 2i). The probability that Bob measures b′B = 1 equals tr(|ψ̃1 ihψ̃1 | · |ψ1 ihψ1 |) = |hψ1 |ψ̃1 i|² = 3/4. So the total probability for Alice to get the outcome 0 is (1/2) · (3/4) + (1/2) · (3/4) = 3/4.
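The two overlaps behind these numbers are easily verified (a sketch; the encoding |i, j> of Alice's bit together with the mailbox qutrit follows the protocol above):

```python
import numpy as np

def ket(i, j):
    """|i, j> in C^2 (x) C^3: Alice's bit (x) mailbox qutrit."""
    v = np.zeros(6)
    v[i * 3 + j] = 1.0
    return v

psi0 = (ket(0, 0) + ket(1, 2)) / np.sqrt(2)
psi0_t = (ket(0, 0) + ket(0, 1) + 2 * ket(1, 2)) / np.sqrt(6)

# Alice's cheat: |<psi0|psi0_tilde>|^2 = 3/4 (up to rounding)
print(abs(np.vdot(psi0, psi0_t)) ** 2)

# Bob's cheat: he learns b_A with probability 1/2 and guesses otherwise
print(0.5 * 1.0 + 0.5 * 0.5)   # 0.75
```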
12.3.5 Bounds on security
The previous example shows that quantum coin tossing admits, in contrast to the classical case, a nontrivial bias. However, how secure is quantum coin tossing really? Can we reach the optimal case (ε = 0)? The answer actually is "no". This was first proven by Mayers, Salvail and Chiba-Kohno [161]. Later on Ambainis restated the
10 After that measurement he is no longer able to figure out which outcome occurs on Alice’s
side, so he just sets his outcome to 1. A similar situation occurs in the cheating strategy for Alice,
but she is in neither case able to predict the outcome on Bob’s side with certainty.
arguments in a more explicit form [6]11 . It is still an open question whether there exist quantum coin tossing protocols with bias arbitrarily close to zero. Ambainis also shows that a coin tossing protocol with a bias of at most ε must use at least Ω(log log(1/ε)) rounds of communication. Although in that paper he gives only the proof for strong coin tossing, it holds in the weak case as well. It follows that a protocol cannot be made arbitrarily secure (i.e. have a sequence of protocols with ε → 0) by just increasing the amount of information exchanged in each step. The number of rounds has to go to infinity (although very slowly).
The strong coin tossing protocol given in Section 12.3.4 has a bias of $\epsilon = 0.25$. Another one with the same bias is given by Ambainis [6]. No strong protocol with provably smaller bias is known yet. The best known weak protocol is given by Spekkens and Rudolph [200] and has a bias of $\epsilon = \frac{1}{\sqrt{2}} - \frac{1}{2} = 0.207\ldots$. Although this is still far from arbitrary security, it shows another distinction between classical and quantum information, since in a classical world no protocol with bias smaller than 0.5 is possible.
Another interesting topic in quantum coin tossing is the question of cheat sensitivity, that is, how much each player can increase the probability of one outcome without risking being caught cheating. For more about this cf. e.g. [200] or [106].
11 The first attempt at a proof was given by Lo and Chau [154]. However, its validity is restricted to the case where 'cheating' always influences the probabilities of both valid outcomes. More precisely, they demand that the probabilities for the outcomes 0 and 1 are equal for any cheating strategy. This restriction is too strong: even if Alice and Bob sit together and throw a real coin, one of them can always say he (or she) does not accept the result (and, for example, refuses to pay his loss), and so put the probability for one outcome to zero, while the probability for the other one and the outcome 'invalid' are 1/2 each.
Chapter 13

Infinitely entangled states
Many of the concepts of entanglement theory were originally developed for quan-
tum systems described in finite dimensional Hilbert spaces. This restriction is often
justified, since we are usually only trying to coherently manipulate a small part of
the system. On the other hand, a full description of almost any system, beginning
with a single elementary particle, requires an infinite dimensional Hilbert space.
Hence if one wants to discuss decoherence mechanisms arising from the coupling of
the “qubit part” of the system with the remaining degrees of freedom, it is neces-
sary to widen the context of entanglement theory to infinite dimensions. This is not
difficult, since many of the basic notions, e.g. the definitions of entanglement mea-
sures, like the reduced von Neumann entropy or entanglement of formation, carry
over almost unchanged, merely with finite sums replaced by infinite series. More
serious are some technical problems arising from the fact that such entanglement
measures can now become infinite, and are no longer continuous functions of the
state. Luckily, as shown in recent work of Eisert et al. [83], these problems can
be tamed to a high degree, if one imposes some natural energy constraints on the
systems.
In this chapter we look at some not-so-tame states, which should be considered
as idealized descriptions of situations in which a great deal of entanglement is available.
For example, in the study of “entanglement assisted capacity” (Subsection 6.2.3)
one assumes that the communicating partners have an unlimited supply of shared
maximally entangled singlets. In quantum information problems involving canonical
variables it is easily seen that perfect operations can only be expected in the limit
of an “infinitely squeezed” two mode gaussian state as entanglement resource (see
also Section 13.5). But infinite entanglement is not only a desirable resource, it is
also a natural property of some physical systems, such as the vacuum in quantum
field theory (see [204, 205] and Section 13.4 below). Our aim is to show that one can
analyze these situations by writing down bona fide states on suitably constructed
systems.
This chapter is mainly based on [135]. Related publications are [83] where en-
tangled density matrices in infinite dimensional Hilbert spaces are studied and
[58, 59, 60] concerning EPR states (cf. Section 13.5).
non-separable ones in the topological sense) is not really interesting with regard to entanglement theory, since any density operator has separable support, i.e., it vanishes on all but countably many dimensions.
Here we denote by $X^{T_2}$ the partial transposition with respect to the second tensor factor of an operator $X$ on the finite dimensional space $\mathbb{C}^d \otimes \mathbb{C}^d$, and use that this operation is unitary with respect to the Hilbert-Schmidt scalar product $\langle X, Y\rangle_{HS} = \mathrm{tr}(X^*Y)$. By assumption, $E_d(\sigma)^{T_2} \geq 0$, and since partial transposition preserves the trace, $E_d(\sigma)^{T_2}$ is even a density operator. Hence the expectation value of $p_d^{T_2}$ in this state is bounded by the norm of this operator. But it is easily verified that $p_d^{T_2}$ is just $(1/d)$ times the unitary operator exchanging the two tensor factors. Hence its norm is $(1/d)$. Taking the limit of this estimate along a subnet of $A_d$ converging to $A_\infty$, we find $\mathrm{tr}(\sigma A_\infty) = 0$. □
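The key identity in this proof, that the partial transpose of the maximally entangled projector $p_d$ is $(1/d)$ times the flip operator, can be checked numerically; a minimal sketch:

```python
import numpy as np

d = 3
# Maximally entangled projector p_d = |Omega><Omega|, Omega = sum_k |kk>/sqrt(d)
omega = np.zeros(d * d)
for k in range(d):
    omega[k * d + k] = 1.0 / np.sqrt(d)
p = np.outer(omega, omega)

# Partial transpose on the second tensor factor: swap the two "column" indices
pT2 = p.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)

# Flip (swap) operator F|k, l> = |l, k>
F = np.zeros((d * d, d * d))
for k in range(d):
    for l in range(d):
        F[l * d + k, k * d + l] = 1.0

assert np.allclose(pT2, F / d)                       # p^T2 = (1/d) * flip
assert np.isclose(np.linalg.norm(pT2, 2), 1.0 / d)   # operator norm is 1/d
```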
The problem lies in this infinite product, which clearly need not converge for an arbitrary choice of vectors $\phi_j, \psi_j$. A well-known way out of this dilemma, known as von Neumann's incomplete tensor product [223], is to restrict the possible sequences of vectors $\phi_1, \phi_2, \ldots$ in the basic product vectors: for each tensor factor, one picks a reference unit vector $\chi_j$, and only sequences are allowed for which $\phi_j = \chi_j$ holds for all but a finite number of indices. Evidently, if this property holds for both the $\phi_j$ and the $\psi_j$, the product in (13.6) contains only a finite number of factors $\neq 1$, and converges. By taking norm limits of such vectors we see that also product vectors for which $\sum_{j=1}^{\infty} \|\phi_j - \chi_j\| < \infty$ are included in the infinite product Hilbert space. However, the choice of reference vectors $\chi_j$ necessarily breaks the full unitary symmetry of the factors, as far as asymptotic properties for $j \to \infty$ are concerned. For the case at hand, i.e., qubit systems, let us choose, for definiteness, the "spin up" vector as $\chi_j$ for every $j$, and denote the resulting space by $\mathcal{H}_\infty$.
An important observation about this construction is that all observables of finite tensor product subsystems act as operators on this infinite tensor product space. In fact, any operator $\bigotimes_{j=1}^{\infty} A_j$ makes sense on the incomplete tensor product, as long as $A_j = \mathbb{1}$ for all but finitely many indices. The algebra of such operators is known as the algebra of local observables. It has the structure of a *-algebra, and its closure in operator norm is called the quasi-local algebra [35].
Let us take the space H∞ as Alice’s and Bob’s Hilbert space. Then each of them
holds infinitely many qubits, and we can discuss the entanglement contained in a
density operator on H∞ ⊗ H∞ . Clearly, there is no general upper bound to this
entanglement, since we can take a maximally entangled state on the first M < ∞
factors, complemented by infinitely many spin-up product states on the remaining
qubit pairs. But for any fixed density operator the entanglement is limited: for
measurements on qubit pairs with sufficiently large j we always get nearly the same
expectations as for two uncorrelated spin-up qubits (or whatever the reference states
χj dictate). This is just another instance of Theorem 13.2.1: there is no density
operator describing infinitely many singlets.
13.3.2 Singular states
However, can we not take the limit of states with growing entanglement? To be
specific, let ΦM denote the vector which is a product of singlet states for the first
M qubit pairs, and a spin-up product for the remaining ones. These vectors do
not converge in H∞ ⊗ H∞ , but that need not concern us, if we are only interested
in expectation values: for all local observables $A$ (observables depending on only finitely many qubit pairs) the limit $\lim_{M\to\infty}\langle\Phi_M, A\,\Phi_M\rangle$
exists. Thereby we get an expectation value functional for all quasi-local observables, and by the Hahn-Banach Theorem (see e.g. [186, Theorem III.6]) we can extend this expectation value functional to all bounded operators on $\mathcal{H}_\infty \otimes \mathcal{H}_\infty$. The extended functional $\omega$ has all the properties required by the statistical interpretation of quantum mechanics: linearity in $A$, $\omega(A) \geq 0$ for positive $A$, and $\omega(\mathbb{1}) = 1$. In other words, it is a state on the algebra $\mathcal{B}(\mathcal{H})$, as we have introduced them in Subsection 2.1.1. By construction, $\omega$ describes maximal entanglement for any finite collection of qubit pairs, so it is truly a state of infinitely many singlets.
How does this match with Theorem 13.2.1? The crucial point is that the theorem only speaks of states given by the trace with a density operator, i.e., of functionals of the form $\omega_\rho(A) = \mathrm{tr}(\rho A)$. Such states are called "normal", and for a finite dimensional algebra every state is normal (cf. Subsection 2.1.2). But in the infinite dimensional case this equivalence between the two different descriptions of quantum states breaks down. In other words, there is no density operator for $\omega$: it is a singular state on the algebra of bounded operators.
Singular states are not that unusual in quantum mechanics, although they can only be "constructed" by an invocation of the Axiom of Choice, usually through the Hahn-Banach Theorem^2. For example, we can think of a non-relativistic particle localized at a sharp point, as witnessed by the expectations of all continuous functions of position. Extending from this algebra to all bounded operators, we get a singular state with sharp position^3, but "infinite momentum", i.e., the probability assigned to finding the momentum in any given finite interval is zero [235]. This shows that the probability measure on momentum space induced by such a state is only finitely additive, not $\sigma$-additive. This is typical for singular states.
More practical situations involving singular states arise in all systems with in-
finitely many degrees of freedom, as in quantum field theory and in statistical me-
chanics in the thermodynamic limit. For example, the equilibrium state of a free
Bose gas in infinite space at finite density and temperature is singular with respect
to Fock space because the probability for finding only a finite number of particles in
such a state is zero. In all these cases, one is primarily interested in the expectations
of certain meaningful observables (e.g., local observables), and the wilder aspects
of singular states are connected only to the extension of the state to all bounded
operators. Therefore it is a good strategy to focus on the state as an expectation
functional only on the “good” observables.
13.3.3 Local observable algebras
If we want to represent a situation with infinitely many singlets, an obvious approach
is to take again von Neumann’s incomplete tensor product, but this time the infinite
tensor product of pairs rather than single qubits, with the singlet vector chosen as
the reference vector χj for every pair. We denote this space by H∞∞ , and by Ω ∈
H∞∞ the infinite tensor product of singlet vectors. Clearly, this is a normal state
(with density operator |ΩihΩ|), and we seem to have gotten around Theorem 13.2.1
after all.
However, the problem is now to identify the Hilbert spaces of Alice and Bob as
tensor factors of H∞∞ . To be sure, the observables measurable by Alice and Bob,
respectively, are easily identified. For example, the σx -Pauli matrix for Alice’s 137th
2 Other constructions based on the Axiom of Choice are the application of invariant means, e.g.,
when averaging expectation values over all translations, or algebraic constructions using maximal
ideals. For an application in von Neumann style measurement theory of continuous spectra, see
[176].
3 This is not related to improper eigenkets of position, which do not yield normalized states.
Then the Bicommutant Theorem [208] states that $\mathcal{M}'' = (\mathcal{M}')'$ is the smallest von Neumann algebra containing $\mathcal{M}$. In particular, when $\mathcal{M}$ is already an algebra, $\mathcal{M}''$ is the weak closure of $\mathcal{M}$. Von Neumann algebras are characterized by the property $\mathcal{M}'' = \mathcal{M}$. A von Neumann algebra $\mathcal{M}$ with the property that its only elements commuting with all others are the multiples of the identity (i.e., $\mathcal{M}' \cap \mathcal{M}'' = \mathbb{C}\mathbb{1}$) is called a factor.
It might seem that the two ways out of the No-Go Theorem indicated at the end of the previous section are opposite to each other, but in fact they are closely related. For if $\omega$ is a state on a C*-algebra $\mathcal{C} \supset \mathcal{A} \cup \mathcal{B}$, we can associate with it a Hilbert space $\mathcal{H}_\omega$, a representation $\pi_\omega: \mathcal{C} \to \mathcal{B}(\mathcal{H}_\omega)$, and a unit vector $\Omega \in \mathcal{H}_\omega$, such that $\omega(C) = \langle\Omega, \pi_\omega(C)\Omega\rangle$, and such that the vectors $\pi_\omega(C)\Omega$ are dense in $\mathcal{H}_\omega$. This is called the Gelfand-Naimark-Segal (GNS) construction [35]. Clearly, the given state $\omega$ is given by a density operator (namely $|\Omega\rangle\langle\Omega|$) in this new representation, and the algebra can naturally be extended to the weak closure $\pi_\omega(\mathcal{C})''$. The commutativity of two subalgebras is preserved by the weak closure, so the normal state $|\Omega\rangle\langle\Omega|$ and the two commuting von Neumann subalgebras $\pi_\omega(\mathcal{A})''$ and $\pi_\omega(\mathcal{B})''$ again form a bipartite system, which describes essentially the same situation. The only difference is that some additional idealized observables arise from the weak closure operations, and that some observables in $\mathcal{C}$ (those with $C \geq 0$ but $\omega(C) = 0$) are represented by zero in $\pi_\omega$.
We remark that von Neumann’s incomplete infinite tensor product of Hilbert
spaces can be seen as aNspecial case of the GNS-construction: The infinite tensor
product of C*-algebras i Ai is well-defined (see [35, Sec 2.6] for precise conditions),
N
essentially by taking the norm completion of the algebra of local observables i Ai ,
with all but finitely many factors Ai ∈ Ai equal to 1Ii . On this algebra the infinite
tensor product of states is well-defined, Nand we get the incomplete tensor product
as the GNS-Hilbert space of the algebra i B(Hi ) with respect to the pure product
state defined by the reference vectors χi .
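In finite dimensions the GNS construction can be written out concretely: for $\omega(A) = \mathrm{tr}(\rho A)$ on the matrix algebra $M_d$ with $\rho$ faithful, the GNS space is $M_d$ itself with inner product $\langle A, B\rangle = \omega(A^*B)$, the representation acts by left multiplication, and the cyclic vector is $\mathbb{1}$. A minimal sketch (the state $\rho$ and the dimension are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2

# A faithful state omega(A) = tr(rho A) on the matrix algebra M_d
rho = np.diag([0.7, 0.3])

# GNS inner product on M_d, viewed as a d^2-dimensional Hilbert space
def gns_inner(A, B):
    return np.trace(rho @ A.conj().T @ B)

# The representation acts by left multiplication, pi(C)A = CA,
# and the cyclic vector is the identity, Omega = 1.
Omega = np.eye(d)

C = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

# omega(C) = <Omega, pi(C) Omega>, and the inner product is positive
assert np.isclose(gns_inner(Omega, C @ Omega), np.trace(rho @ C))
assert gns_inner(C, C).real >= 0
```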
13.4 Von Neumann algebras with maximal entanglement
13.4.1 Characterization and basic properties
Let us analyze the example given in the last section: the bipartite state obtained from the incomplete tensor product of singlets in $\mathcal{H}_{\infty\infty}$. We take as Alice's observable algebra $\mathcal{A}$ the von Neumann algebra generated by all local Alice operators (and analogously $\mathcal{B}$ for Bob). The bipartite state on these algebras, given by the reference vector $\bigotimes_i \chi_i$, then has the following properties:
These properties, except perhaps ME 2 (see [7]), are immediately clear from the construction and the properties of the respective local observables. They are also true for finite dimensional maximally entangled states on $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$, $\mathcal{A} = \mathcal{B}(\mathcal{H}_A) \otimes \mathbb{1}$, and $\mathcal{B} = \mathbb{1} \otimes \mathcal{B}(\mathcal{H}_B)$. This justifies calling this particular bipartite system maximally entangled as well.
There are many free parameters in this construction. For example, we could take
arbitrary dimensions di < ∞ for the ith pair. However, all these possibilities lead
to the same maximally entangled system:
Theorem 13.4.1 All bipartite states on infinite dimensional systems satisfying
conditions ME 1 - ME 5 above are unitarily isomorphic.
Proof. (Sketch). We first remark that $\mathcal{A}$ has to be a factor, i.e., $\mathcal{A} \cap \mathcal{A}' = \mathbb{C}\mathbb{1}$. Indeed, using ME 1 and ME 2, we get $\mathcal{A} \cap \mathcal{A}' = \mathcal{B}' \cap \mathcal{A}' = (\mathcal{B} \cup \mathcal{A})' = \mathcal{B}(\mathcal{H})' = \mathbb{C}\mathbb{1}$.
Now consider the support projection $S \in \mathcal{A}$ of the restriction of the state to $\mathcal{A}$. Thus $\mathbb{1} - S$ is the largest projection in $\mathcal{A}$ with vanishing expectation. Suppose that this projection does not lie in the center of $\mathcal{A}$, i.e., there is an $A \in \mathcal{A}$ such that $AS \neq SA$. Let $X = (\mathbb{1} - S)AS$, which must then be nonzero, as $AS - SA = ((\mathbb{1} - S) + S)(AS - SA) = X - SA(\mathbb{1} - S)$. Then, using the trace property, we get $\omega(X^*X) = \omega(XX^*) \leq \|A\|^2\,\omega(\mathbb{1} - S) = 0$, which implies that the support projection of $X^*X$ has vanishing expectation. But since $X^*X \leq \|A\|^2 S$, this contradicts the maximality of $(\mathbb{1} - S)$. It follows that $S$ lies in the center of $\mathcal{A}$, and that $S = \mathbb{1}$, because $\mathcal{A}$ is a factor. To summarize this argument: $\omega$ must be faithful, in the sense that $A \in \mathcal{A}$, $A \geq 0$, and $\omega(A) = 0$ imply $A = 0$.
Now consider the subspace spanned by all vectors of the form $A\Omega$, with $A \in \mathcal{A}$. This subspace is invariant under $\mathcal{A}$, so its orthogonal projection $P$ is in $\mathcal{A}' = \mathcal{B}$. But since $(\mathbb{1} - P)$ obviously has vanishing expectation, the previous arguments, applied to $\mathcal{B}$, imply that $P = \mathbb{1}$. This is to say that $\mathcal{A}\Omega$ is dense in $\mathcal{H}$ or, in the jargon of operator algebras, that $\Omega$ is cyclic for $\mathcal{A}$. Thus $\mathcal{H}$ is unitarily equivalent to the GNS Hilbert space of $\omega$ restricted to $\mathcal{A}$, and the form of $\mathcal{B} = \mathcal{A}'$ is completely determined by this statement. Now a factor admits at most one trace state, so $\omega$ is uniquely determined by the isomorphism type of $\mathcal{A}$ as a von Neumann algebra, and it remains to show that $\mathcal{A}$ is uniquely determined by the above conditions. $\mathcal{A}$ is a factor admitting a faithful normal trace state, so it is a "type II$_1$ factor" in von Neumann's classification. It is also hyperfinite, so we can invoke a deep result of Alain Connes [61] stating that such a factor is uniquely determined up to isomorphism. □
For the rest of this section we will study further properties of this unique maximally entangled state. The items ME 6 and ME 7 below are clear from the above proof. ME 8 follows by splitting the infinite tensor product either into a finite product and an infinite tail, or into the factors with even and odd labels, respectively. ME 9 - ME 11 are treated in separate subsections as indicated.
(iii) There is a set $T_k$ of test operators formed from $\mathcal{A}$ and $\mathcal{B}$ such that (13.12) holds for all density operators $\rho$.

$$\mathcal{H} = \mathcal{H}_{\infty\infty} \otimes \tilde{\mathcal{H}}, \qquad \mathcal{A} = \mathcal{A}_1 \otimes \tilde{\mathcal{A}}, \qquad \mathcal{B} = \mathcal{B}_1 \otimes \tilde{\mathcal{B}},$$

where $\mathcal{A}_1, \mathcal{B}_1 \subset \mathcal{B}(\mathcal{H}_{\infty\infty})$ are the algebras of Theorem 13.4.1, and $\tilde{\mathcal{A}}, \tilde{\mathcal{B}} \subset \mathcal{B}(\tilde{\mathcal{H}})$ are other von Neumann algebras.
In other words, the maximal violation of Bell's inequalities for all normal states implies that the bipartite system is precisely the maximally entangled one, plus some additional degrees of freedom $(\tilde{\mathcal{A}}, \tilde{\mathcal{B}})$, which do not contribute to the violation of Bell inequalities.
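As a finite dimensional reference point: the maximal violation in question is the Tsirelson bound $2\sqrt{2}$ for the CHSH expression, attained already by a single maximally entangled qubit pair. A numerical sketch with the standard CHSH measurement settings:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Maximally entangled vector (|00> + |11>)/sqrt(2)
phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# Standard optimal settings: Alice measures Z, X; Bob the rotated pair
A0, A1 = Z, X
B0, B1 = (Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)

def corr(A, B):
    """Correlation <phi| A (x) B |phi>."""
    return (phi.conj() @ np.kron(A, B) @ phi).real

S = corr(A0, B0) + corr(A0, B1) + corr(A1, B0) - corr(A1, B1)
assert np.isclose(S, 2 * np.sqrt(2))   # Tsirelson bound, maximal violation
```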
13.4.3 Schmidt decomposition and modular theory
The Schmidt decomposition (Proposition 2.2.1) is a key technique for analyzing bipartite pure states in the standard framework. It represents an arbitrary vector $\Omega \in \mathcal{H}_A \otimes \mathcal{H}_B$ as
$$\Omega = \sum_\alpha c_\alpha\, e_\alpha \otimes f_\alpha, \qquad (13.13)$$
where the $c_\alpha > 0$ are positive constants, and $\{e_\alpha\} \subset \mathcal{H}_A$ and $\{f_\alpha\} \subset \mathcal{H}_B$ are orthonormal systems.
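In the finite dimensional case the Schmidt data are exactly the singular value decomposition of the coefficient matrix of $\Omega$; a small numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
dA, dB = 3, 4

# A bipartite unit vector Omega in C^dA (x) C^dB, stored as its
# coefficient matrix C[i, j] = component along e_i (x) f_j
C = rng.standard_normal((dA, dB)) + 1j * rng.standard_normal((dA, dB))
C /= np.linalg.norm(C)

# The SVD C = U diag(c) Vh delivers the Schmidt form (13.13):
# Omega = sum_a c_a U[:, a] (x) Vh[a, :]
U, c, Vh = np.linalg.svd(C, full_matrices=False)

assert np.all(c >= 0) and np.isclose(np.sum(c**2), 1.0)
# Reconstruct the coefficient matrix from the Schmidt data
C2 = sum(c[a] * np.outer(U[:, a], Vh[a, :]) for a in range(len(c)))
assert np.allclose(C, C2)
```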
Its analog in the context of von Neumann algebras is a highly developed theory
with many applications in quantum field theory and statistical mechanics, known
as the modular theory of Tomita and Takesaki [207]. We recommend Chapter 2.5
in [35] for an excellent exposition, and only outline some ideas and indicate the
connection to the Schmidt decomposition.
Throughout this subsection, we will assume that A, B ⊂ B(H) are von Neumann
algebras, and Ω ∈ H is a unit vector, such that the properties ME 2, ME 3, and
ME 7 of Section 13.4.1 hold. As in the case of the usual Schmidt decomposition
the essential information is already contained in the restriction of the given state
to the subalgebra $\mathcal{A}$, i.e., by the linear functional $\omega(A) = \langle\Omega, A\Omega\rangle$. Indeed, the Hilbert space and the cyclic vector $\Omega$ (cf. ME 7) satisfy precisely the conditions for the GNS representation, which is unique up to unitary equivalence. Moreover, condition ME 2 fixes $\mathcal{B}$ as the commutant algebra.
However, since $\mathcal{A}$ often does not admit a trace, we cannot represent $\omega$ by a density operator, and therefore we cannot use the spectrum of the density operator to characterize $\omega$. Surprisingly, it is equilibrium statistical mechanics which provides the notion to generalize. In the finite dimensional context, we can consider every density operator as a canonical equilibrium state, and determine from it the Hamiltonian of the system. This in turn defines a time evolution. Note that the Hamiltonian is only defined up to a constant, so we cannot expect to reconstruct the eigenvalues of $H$, but only the spectrum of the Liouville operator $\sigma \mapsto i[\sigma, H]$, which generates the dynamics on density operators, and has eigenvalues $i(E_n - E_m)$, where the $E_n$ are the eigenvalues of $H$. The connection between the time evolutions
and equilibrium states makes sense also for von Neumann algebras, and can be seen
as the physical interpretation of modular theory [35].
We begin the outline of this theory with the anti-linear operator $S$ on $\mathcal{H}$ defined by
$$S(A\Omega) = A^*\Omega, \qquad A \in \mathcal{A}. \qquad (13.14)$$
It turns out to be closable, and we denote its closure by the same letter. As a closed operator $S$ admits a polar decomposition
$$S = J\Delta^{1/2}, \qquad (13.15)$$
which defines the anti-unitary modular conjugation $J$ and the positive modular operator $\Delta$.
Let us calculate $\Delta$ in the standard situation, where $\mathcal{H} = \mathcal{K} \otimes \mathcal{K}$, $\mathcal{A} = \mathcal{B}(\mathcal{K}) \otimes \mathbb{1}$ respectively $\mathcal{B} = \mathbb{1} \otimes \mathcal{B}(\mathcal{K})$, and $\Omega$ is in the Schmidt form (13.13). Due to assumption ME 7 (cyclicity), the orthonormal systems $e_\alpha$ and $f_\alpha$ even have to be complete (i.e., bases). Now consider (13.14) with $A = |e_\beta\rangle\langle e_\gamma| \otimes \mathbb{1}$, which becomes
$$S(c_\gamma\, e_\beta \otimes f_\gamma) = c_\beta\, e_\gamma \otimes f_\beta, \qquad (13.16)$$
from which we readily get
$$\Delta^{1/2} = \rho^{1/2} \otimes \rho^{-1/2}, \qquad J = F(\Theta \otimes \Theta), \qquad (13.17)$$
where $\rho = \sum_\alpha c_\alpha^2 |e_\alpha\rangle\langle e_\alpha|$ is the reduced density operator, $F\,\phi_1 \otimes \phi_2 = \phi_2 \otimes \phi_1$ is the flip operator, and $\Theta$ denotes complex conjugation in the $e_\alpha$ basis. The time evolution with Hamiltonian $H = -\log\rho + c\mathbb{1}$, for which $\omega$ is now the equilibrium state with unit temperature, is then given by $E_t(A) \otimes \mathbb{1} = \Delta^{it}(A \otimes \mathbb{1})\Delta^{-it}$.
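In this standard situation the relations (13.14)-(13.17) can be verified numerically for a pair of qubits; a sketch (the Schmidt coefficients are an arbitrary choice, and the Schmidt bases are taken to be the standard basis):

```python
import numpy as np

d = 2
c = np.sqrt(np.array([0.7, 0.3]))         # Schmidt coefficients

# Omega = sum_a c_a |a> (x) |a>, in Schmidt form
Omega = np.zeros(d * d)
for a in range(d):
    Omega[a * d + a] = c[a]

# Modular data in the standard situation (13.17):
# Delta^{1/2} = rho^{1/2} (x) rho^{-1/2},  J = flip after entrywise conjugation
sqrt_Delta = np.kron(np.diag(c), np.diag(1.0 / c))
FLIP = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        FLIP[j * d + i, i * d + j] = 1.0

def S(v):
    """S = J Delta^{1/2}, acting anti-linearly on a vector v."""
    return FLIP @ np.conj(sqrt_Delta @ v)

rng = np.random.default_rng(2)
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

# Defining relation (13.14): S (A (x) 1) Omega = (A* (x) 1) Omega
lhs = S(np.kron(A, np.eye(d)) @ Omega)
rhs = np.kron(A.conj().T, np.eye(d)) @ Omega
assert np.allclose(lhs, rhs)
```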
In the case of general von Neumann algebras, the spectrum of $\Delta$ need no longer be discrete, and $\Delta$ can be a general positive, but unbounded, selfadjoint operator. It turns out that $\Delta^{it}$ still defines a time evolution on the algebra $\mathcal{A}$, the so-called modular evolution. The equilibrium condition cannot be written directly in the Gibbs form $\rho \propto \exp(-H)$, since there is no density matrix any more, but has to be replaced by the so-called KMS condition, a boundary condition for the analytic continuation of correlation functions [35, 104], which links the modular evolution to the state.
In the standard situation, the eigenvalue 1 of $\Delta$ plays a special role, because it points to degeneracies in the Schmidt spectrum. In the extreme case of a maximally entangled state all $c_\alpha$ are equal, and $\Delta = \mathbb{1}$ or, equivalently, $S$ is anti-unitary. This characterization of maximal entanglement carries over to the von Neumann algebra case: $S$ is anti-unitary if and only if for all $A_1, A_2 \in \mathcal{A}$
$$\langle\Omega, A_1 A_2\Omega\rangle = \langle A_1^*\Omega, A_2\Omega\rangle = \langle SA_1\Omega, SA_2^*\Omega\rangle = \langle A_2^*\Omega, A_1\Omega\rangle = \langle\Omega, A_2 A_1\Omega\rangle.$$
This is precisely the trace property ME 4.
13.4.4 Characterization by the EPR-doubles property
In the original EPR-argument it is crucial that certain observables of Alice and Bob
are perfectly correlated, so that Alice can find the values of observables on Bob’s side
with certainty, without Bob having to carry out this measurement. An approach to
studying such correlations was proposed recently by Arens and Varadarajan [8]. The
basic idea, stripped of some measure theoretic overhead, and extended to the more
general bipartite systems considered here [236], rests on the following definition.
Let A, B be commuting observable algebras and ω a state on an algebra containing
both A and B. Then we say that an element B ∈ B is an EPR-double of A ∈ A, or
that A and B are doubles (of each other) if
$$\omega\bigl((A^* - B^*)(A - B)\bigr) = \omega\bigl((A - B)(A^* - B^*)\bigr) = 0. \qquad (13.18)$$
Of course, when $A$ and $B$ are hermitian, the two expressions coincide, and in this case there is a simple interpretation of equation (13.18). Since $A$ and $B$ commute, we can consider their joint distribution (measuring the joint spectral resolution of $A$ and $B$). Then $(A - B)^2$ is a positive quantity, which has vanishing expectation if and only if the joint distribution is concentrated on the diagonal, i.e., if the measured values coincide with probability one.
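For a pure bipartite state in Schmidt form this can be made concrete: an operator $A$ commuting with the reduced density operator has, as its EPR double on the other side, its transpose in the Schmidt basis. A numerical sketch (the Schmidt spectrum and the operator $A$ are arbitrary choices; $A$ is built to commute with $\rho$):

```python
import numpy as np

rng = np.random.default_rng(3)
c = np.sqrt(np.array([0.4, 0.4, 0.2]))    # degenerate Schmidt spectrum
d = len(c)

# Omega = sum_a c_a |a> (x) |a>, reduced density operator rho = diag(c^2)
Omega = np.zeros(d * d)
for a in range(d):
    Omega[a * d + a] = c[a]

def omega(X):
    """The state omega(X) = <Omega, X Omega>."""
    return Omega.conj() @ X @ Omega

# A commutes with rho: arbitrary on the degenerate block, scalar on the rest
blk = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
A = np.zeros((d, d), dtype=complex)
A[:2, :2], A[2, 2] = blk, 1.7

AA = np.kron(A, np.eye(d))                # Alice's operator A (x) 1
BB = np.kron(np.eye(d), A.T)              # candidate double 1 (x) A^T
D = AA - BB

# Condition (13.18): both expectations vanish
assert np.isclose(omega(D.conj().T @ D), 0)
assert np.isclose(omega(D @ D.conj().T), 0)
```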
Basic properties are summarized in the following Lemma.
Lemma 13.4.3 Let ω be a state on a C*-algebra containing commuting subalgebras
A and B. Then
(i) A and B are doubles iff for all C in the ambient observable algebra we have
ω(AC) = ω(BC) and ω(CA) = ω(CB).
(iii) When $A$ and $B$ are normal ($AA^* = A^*A$) and doubles of each other, then so are $f(A)$ and $f(B)$, where $f$ is any continuous complex valued function on the spectrum of $A$ and $B$, evaluated in the functional calculus.

(iv) When $\mathcal{A}$ and $\mathcal{B}$ are von Neumann algebras, $\omega$ is a normal state, and observables $A_n$ with doubles $B_n$ converge in weak*-topology to $A$, then every cluster point of the sequence $B_n$ is a double of $A$.
In the situation we have assumed for modular theory, we can give a detailed
characterization of the elements admitting a double:
Proposition 13.4.4 Suppose A and B = A0 are von Neumann algebras on a
Hilbert space H, and the state ω is given by a vector Ω ∈ H, which is cyclic for
both A and B. Then for every A ∈ A the following conditions are equivalent:
(ii) $A$ is in the centralizer of the restricted state, i.e., $\omega(AA_1) = \omega(A_1 A)$ for all $A_1 \in \mathcal{A}$.

(iii) $A$ is invariant under the modular evolution: $\Delta^{it}A\Delta^{-it} = A$ for all $t \in \mathbb{R}$.
Two special cases are of interest. On the one hand, in the standard case of a pure bipartite state we get a complete characterization of the observables which possess a double: they are exactly the ones commuting with the reduced density operator [8]. On the other hand, we can ask under what circumstances all $A \in \mathcal{A}$ admit a double. Clearly, this is the case when the centralizer in (ii) of the Proposition is all of $\mathcal{A}$, i.e., if and only if the restricted state is a trace. Again this characterizes the maximally entangled states on finite dimensional algebras, and the unique infinite dimensional one for hyperfinite von Neumann algebras.
13.5 The original EPR state
In their famous 1935 paper [82] Einstein, Podolsky and Rosen studied two quantum
particles with perfectly correlated momenta and perfectly anticorrelated positions.
It is immediately clear that such a state does not exist in the standard framework
of Hilbert space theory: the difference of the positions is a self-adjoint operator with
purely absolutely continuous spectrum, so whatever density matrix we choose, the
probability distribution of this quantity will have a probability density with respect
to Lebesgue measure, and cannot be concentrated on a single point. Consequently,
the wave function written in [82] is a pretty wild object. Essentially it is $\Psi(x_1, x_2) = c\,\delta(x_1 - x_2 + a)$, with $\delta$ the Dirac delta function, and $c$ a "normalization factor" which must vanish, because the normalization integral for the delta function is undefined, but infinite if anything.
How could such a profound physical argument be based on such an ill-defined
object? The answer is probably that the authors were completely aware that they
were really talking about a limiting situation of more and more sharply peaked
wave functions. We could model them by a sequence of more and more highly
squeezed two mode Gaussian states (cf. Subsection 13.5.5), or some other sequence
representation of the delta function. The key point is that the main argument does
not depend on the particular approximating sequence. But then we should also be
able to discuss the limiting situation directly in a rigorous way, and extract precisely
what is common to all approximations of the EPR state.
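Such an approximating sequence is easy to write down for the relative coordinate alone; a numpy sketch (grid and widths are arbitrary choices) showing the distribution of $x_1 - x_2$ concentrating on the point singled out by the delta function as the width shrinks:

```python
import numpy as np

a = 1.0                          # the fixed distance between the particles
x = np.linspace(-10, 10, 4001)   # grid for the relative coordinate x1 - x2
dx = x[1] - x[0]

for sigma in [1.0, 0.3, 0.1]:
    # Gaussian approximation of delta(x1 - x2 + a) in the relative coordinate
    psi = np.exp(-(x + a) ** 2 / (4 * sigma**2))
    p = np.abs(psi) ** 2
    p /= p.sum() * dx            # normalized probability density
    mean = (p * x).sum() * dx
    var = (p * (x - mean) ** 2).sum() * dx
    print(f"sigma={sigma:4.1f}  <x1-x2>={mean:+.3f}  Var={var:.4f}")
# As sigma -> 0 the position difference sharpens; the conjugate variable
# p1 + p2 spreads correspondingly (not computed here).
```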
13.5.1 Definition
In this section we consider a family of singular states which describes quite well what Einstein, Podolsky and Rosen may have had in mind. Throughout we assume we are in the usual Hilbert space $\mathcal{H} = L^2(\mathbb{R}^2)$ for describing two canonical degrees of freedom, with position and momentum operators $Q_1, Q_2, P_1, P_2$. The basic observation is that the operators $P_1 + P_2$ and $Q_1 - Q_2$ commute as a consequence of the Heisenberg commutation relations. Therefore we can evaluate in the functional calculus (i.e., using a joint spectral resolution) any function of the form $g(P_1 + P_2, Q_1 - Q_2)$, where $g: \mathbb{R}^2 \to \mathbb{C}$ is an arbitrary bounded continuous function. We define an EPR state as any state $\omega$ such that
$$\omega\bigl(g(P_1 + P_2, Q_1 - Q_2)\bigr) = g(0, a), \qquad (13.19)$$
where a is the fixed distance between the particles. Several comments are in order.
First of all, if we take any sequence of vectors to “approximate” the EPR wave func-
tion (and adjust normalization on the way), weak*-cluster points of the correspond-
ing sequence of pure states exist by compactness of the state space, and all these
will be EPR states in the sense of our definition. Secondly, condition (13.19) does
not fix ω uniquely. Indeed, different approximating sequences may lead to different
ω. Even for a fixed approximating sequence it is rarely the case that the expectation
values of all bounded operators converge, so the sequence will have many different
cluster points. Thirdly, the existence of EPR states can also be seen more directly:
the algebra of bounded continuous functions on R2 is faithfully represented in B(H)
(i.e., g(P1 + P2 , Q1 − Q2 ) = 0 only when g is the zero function). On that algebra
the point evaluation at $(0, a)$ is a well defined state, so any Hahn-Banach extension of this state to all of $\mathcal{B}(\mathcal{H})$ will be an EPR state^4.
In our further analysis we will only look at properties which are common to all EPR states, and which are hence independent of any choice of approximating sequence. The basic technique for extracting such properties from (13.19) is to use the positivity of $\omega$ in the form of the Schwarz inequality $|\omega(A^*B)|^2 \leq \omega(A^*A)\,\omega(B^*B)$. For example, we get
$$\omega(X\hat g) = \omega(\hat g X) = g(0, a)\,\omega(X), \qquad (13.20)$$
for $(\vec\xi, \vec\eta) \in S$. \qquad (13.22)
In particular, the state is invariant under all phase space translations by vectors in
S.
This is already sufficient to conclude that the state is purely singular, i.e., that $\omega(K) = 0$ for every compact operator $K$, and in particular for all finite dimensional projections. An even stronger statement is that the restrictions to Alice's and Bob's subsystems are purely singular.
Lemma 13.5.1 For any EPR state $\omega$ and any compact operator $K$, $\omega(K \otimes \mathbb{1}) = 0$.
Proof. Indeed, the restricted state is invariant under all phase space translations, since we can extend $W(\xi, \eta)$ to a Weyl operator of the total system, i.e., $W'(\xi, \eta) = W(\xi, \xi, \eta, -\eta) \cong W(\xi, \eta) \otimes W(\xi, -\eta)$, with $(\xi, \xi, \eta, -\eta) \in S$, and
$$\omega\bigl((W(\xi, \eta)\,A\,W(\xi, \eta)^*) \otimes \mathbb{1}\bigr) = \omega\bigl(W'(\xi, \eta)(A \otimes \mathbb{1})W'(\xi, \eta)^*\bigr). \qquad (13.23)$$
4 The reason for defining EPR states with respect to continuous functions of $P_1 + P_2$ and $Q_1 - Q_2$, rather than, say, measurable functions, is that we need faithfulness. The functional calculus is well defined also for measurable functions, but some functions will evaluate to zero. In particular, for the function $g(p, x) = 1$ for $x = a$ and $p = 0$, but $g(p, x) = 0$ for all other points, we get $g(P_1 + P_2, Q_1 - Q_2) = 0$, because the joint spectrum of these operators is purely absolutely continuous. Hence condition (13.19), extended to measurable functions, would require the expectation of the zero operator to be 1.
Now consider a unit vector $\chi$ with bounded support in position space, and let $K = |\chi\rangle\langle\chi|$ be the corresponding one-dimensional projection. Then sufficiently widely spaced translates $W(n\xi_0, 0)\chi$ are orthogonal, and hence, for all $N$, the operator $K_N = \sum_{n=1}^{N} W(n\xi_0, 0)\,K\,W(n\xi_0, 0)^*$ is bounded by $\mathbb{1}$. Hence $N\omega(K) = \omega(K_N) \leq \omega(\mathbb{1}) = 1$, and $\omega(K) = 0$. Since vectors of compact support are norm dense in Hilbert space, the conclusion holds for arbitrary $K$. □
For other Weyl operators we get the expectations from the Weyl commutation
relations
trivial. Combining the Weyl relations (13.24) with the invariance (13.22) gives
$$\omega(W(\vec\xi, \vec\eta)) = \omega(W(\vec\xi\,', \vec\eta\,')\,W(\vec\xi, \vec\eta)) = e^{i\sigma}\,\omega(W(\vec\xi, \vec\eta)\,W(\vec\xi\,', \vec\eta\,')) = e^{i\sigma}\,\omega(W(\vec\xi, \vec\eta)),$$
which implies that the expectation values vanish outside $S$:
$$\omega\bigl(W(\vec\xi, \vec\eta)\bigr) = 0 \qquad \text{for } (\vec\xi, \vec\eta) \notin S. \qquad (13.25)$$
Fix ε > 0. Since f is uniformly continuous, there is some δ > 0 such that |f(x) − f(y)| ≤ ε whenever |x − y| ≤ δ. Now pick a continuous function h : R → [0, 1] such that h(0) = 1 and h(t) = 0 for |t| > δ. We consider the operator

M = F(ξP1 + ηQ1, −ξP2 + ηQ2) ,

where F(x, y) = (f(x) − f(y)) h(x − y), and this function is evaluated in the functional calculus of the commuting selfadjoint operators (ξP1 + ηQ1) and (−ξP2 + ηQ2). But
the real valued function F satisfies |F (x, y)| ≤ ε for all (x, y): when |x − y| > δ the
h-factor vanishes, and on the strip |x − y| ≤ δ we have |f (x) − f (y)| ≤ ε. Therefore
‖M‖ ≤ ε. Let X be an arbitrary operator. Then

|ω( [f(ξP1 + ηQ1) − f(−ξP2 + ηQ2)] X )| = |ω(M X)| ≤ ‖M‖ ‖X‖ ≤ ε ‖X‖ .
Here we have added a factor h(ξ(P1 + P2) + η(Q1 − Q2)) at the second equality sign, which we may because of (13.20), and because h is a function of the appropriate operators which equals 1 at the origin. Since this estimate holds for any ε, we conclude that the first relation in Lemma 13.4.3 holds. The argument for the second relation is completely analogous. □
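The estimate ‖M‖ ≤ ε rests only on the sup-norm bound for F. As a numerical sanity check, one can tabulate F on a grid for an illustrative choice of f and h — here f = sin (Lipschitz constant 1, so δ = ε works) and a tent function for h:

```python
import numpy as np

eps = 0.1
delta = eps                      # for f = sin: |f(x) - f(y)| <= |x - y|, so delta = eps works
f = np.sin

def h(t):
    # continuous "tent" function: h(0) = 1, h(t) = 0 for |t| > delta
    return np.clip(1 - np.abs(t) / delta, 0.0, 1.0)

x = np.linspace(-20, 20, 801)
X, Y = np.meshgrid(x, x)
F = (f(X) - f(Y)) * h(X - Y)     # F(x, y) = (f(x) - f(y)) h(x - y)

# True: the h-factor vanishes for |x - y| > delta, and on the strip
# |x - y| <= delta we have |sin(x) - sin(y)| <= eps
print(np.abs(F).max() <= eps)
```

Since M is F evaluated in the joint functional calculus of two commuting selfadjoint operators, the operator norm of M is bounded by this sup-norm.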
where ζ = exp(2πi/d) is the d-th root of unity. These form a basis of the vector space B(C^d ⊗ C^d), which shows that this algebra is generated by the four unitaries u1 = w(1, 0, 0, 0), v1 = w(0, 1, 0, 0), u2 = w(0, 0, 1, 0) and v2 = w(0, 0, 0, 1). They are defined algebraically by the relations vk uk = ζ uk vk, k = 1, 2, and u1^d = u2^d = v1^d = v2^d = 1I. The one-dimensional projection onto the standard maximally entangled vector Ω = d^{−1/2} Σ_k |kk⟩ can be expressed in the basis (13.26) as

|Ω⟩⟨Ω| = (1/d²) Σ_{n,m} w(n, m, −n, m) = (1/d²) Σ_{n,m} (u1 u2^{−1})^n (v1 v2)^m .   (13.27)
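Identity (13.27) can be checked numerically in small dimension. In the shift-and-clock convention u|k⟩ = |k+1 mod d⟩, v|k⟩ = ζ^k |k⟩ (so that vu = ζuv), the commuting unitaries u1u2^{−1} and v1v2 jointly fix the vector Σ_k |k, −k⟩, which differs from Ω only by a local relabeling of Bob's basis; averaging over the resulting group of d² unitaries yields the projection onto that vector. The phase conventions below are illustrative and may differ from (13.26) by such a relabeling:

```python
import numpy as np

d = 3
zeta = np.exp(2j * np.pi / d)
u = np.roll(np.eye(d), 1, axis=0)            # shift: u|k> = |k+1 mod d>
v = np.diag(zeta ** np.arange(d))            # clock: v|k> = zeta^k |k>
assert np.allclose(v @ u, zeta * u @ v)      # v u = zeta u v

mp = np.linalg.matrix_power
ui = u.conj().T                              # u^{-1}

# Average of (u1 u2^{-1})^n (v1 v2)^m over n, m = 0, ..., d-1
P = sum(np.kron(mp(u, n) @ mp(v, m), mp(ui, n) @ mp(v, m))
        for n in range(d) for m in range(d)) / d**2

# Projection onto the maximally entangled vector d^{-1/2} sum_k |k, -k mod d>
Omega = np.zeros(d * d, dtype=complex)
for k in range(d):
    Omega[k * d + (-k) % d] = 1 / np.sqrt(d)
print(np.allclose(P, np.outer(Omega, Omega.conj())))  # True
```

The average of a finite abelian group of unitaries is the projection onto their common fixed vectors, which is exactly the one-dimensional span of the maximally entangled vector.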
above relations and hence generate two copies of the d × d matrices. It is easy to satisfy the commutation relations Vk Uk = ζ Uk Vk by taking appropriate Weyl operators, say

Ũ1 = e^{iQ1},  Ũ2 = e^{i(Q2 − a)},  and  Ṽk = e^{iξPk}   (13.28)

with ξ = 2π/d. The tilde indicates that these are not yet quite the operators we are looking for, because they do not satisfy the periodicity relations: Ũ1^d = exp(idQ1) ≠ 1I, and similarly for Ũ2 and Ṽk. We will denote by Ã the C*-algebra generated by the operators Ũ1, Ṽ1 of (13.28). The algebra B̃ is constructed analogously. Then by virtue of the commutation relations Ũ1^d and Ṽ1^d commute with all other elements of Ã, i.e., they belong to the center CA ⊂ Ã, which represents the classical variables of the system. In the same manner, Ũ2^d and Ṽ2^d generate the center CB of Bob's algebra B̃.7
Proof of this equation. We have shown in Section 13.5.3 that Ũ1 and Ũ2 are EPR doubles. This property transfers to arbitrary continuous functions of Ũ1 and Ũ2 by Lemma 13.4.3 and uniform approximation of continuous functions by polynomials. However, because the state ω is not normal, it does not transfer automatically to the measurable functional calculus, and hence not automatically to Û1 and Û2. We claim that it is true nonetheless.
Denote by rd(z) = z^{1/d} the d-th root function with the branch cut as described, and let fε be a continuous function from the unit circle to the unit interval [0, 1] such that fε(z) = 1 except for z in an ε-neighborhood of z = −1 in arclength, and such that fε(−1) = 0. Then the function z ↦ fε(z) rd(z) is continuous. Hence, since Ũ1^d and Ũ2^d are doubles, so are fε(Ũ1^d) and fε(Ũ1^d)Û1 = (fε · rd)(Ũ1^d), and their counterparts. Note that both of these commute with all other operators involved. Hence (using the notation |X|² = X∗X or |X|² = XX∗, which coincide in this case)

ω( fε(Ũ1^d)² |Û1 − Û2|² ) = ω( |fε(Ũ1^d)Û1 − fε(Ũ2^d)Û2|² ) = 0 ,   (13.32)

where the first equality holds by expanding the modulus square and applying the double property of fε(Ũ1^d) where appropriate. On the other hand, we have

ω( (1I − fε(Ũ1^d)²) |Û1 − Û2|² ) ≤ 4 ω( 1I − fε(Ũ1^d)² ) ≤ (4/π) ε ,   (13.33)

because ‖Û1 − Û2‖ ≤ 2, and 0 ≤ fε(Ũ1^d) ≤ 1I. For the estimate we used that fε(z)² = 1 for all z on the unit circle except a section of relative size 2ε/(2π), and that the probability distribution for the spectrum of Ũ1^d is uniform, because the expectations of all powers (Ũ1^d)^n = exp(indQ1) vanish.

Adding (13.32) and (13.33) we find that ω(|Û1 − Û2|²) ≤ 4ε/π for every ε, and hence that Û1 and Û2 are EPR doubles as claimed. The proof that V̂1 and V̂2∗ are likewise doubles (just as Ṽ1 and Ṽ2∗) is entirely analogous. Hence U1 and U2, as well as V1 and V2, are also doubles. Applying this property in the fidelity expression (13.31) we find that every term has expectation one, so that with the prefactor d^{−2} the d² terms add up to one as claimed. □
13.5.5 EPR states based on two mode Gaussians
In this section we deviate from our announced intention to study only those properties of EPR states which follow from the definition alone, and are hence common to all EPR states. The reason is that there is one particular family which has a lot of additional symmetry, and hence more operators admitting doubles, than general EPR states. Moreover, it is very well known. In fact, most people working in quantum optics probably have a very concrete picture of the EPR state, or rather of an approximation to this state: since Gaussian states play a prominent role in the description of lasers, it is natural to consider a Gaussian wave function of the form
Ψλ(x1, x2) = (1/√π) exp( − (1+λ)/(4(1−λ)) (x1 − x2)² − (1−λ)/(4(1+λ)) (x1 + x2)² ) .   (13.34)

By Mehler's formula this has the eigenbasis expansion

Ψλ = √(1 − λ²) Σ_{n=0}^∞ λⁿ eₙ ⊗ eₙ ,   (13.35)
where en denotes the eigenbasis of the harmonic oscillators Hi = (Pi2 + Q2i )/2
(i = 1, 2). This state is also known as the NOPA state, and the parameter λ ∈ [0, 1)
is related to the so-called squeezing parameter r by λ = tanh(r). Values around
r = 5 are considered a good experimental achievement [144]. Of course, we are
interested in the limit r → ∞, or λ → 1.
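Since the Schmidt coefficients of (13.35) are √(1−λ²) λⁿ, quantities of the reduced states are easy to tabulate numerically. A small sketch (the truncation point is an illustrative choice) relating the squeezing parameter r to λ = tanh(r) and to the entanglement entropy, which diverges as r → ∞:

```python
import numpy as np

def schmidt_probs(lam, nmax=200000):
    """Schmidt probabilities (1 - lam^2) * lam^(2n) of the state (13.35),
    truncated at an illustrative cutoff nmax."""
    n = np.arange(nmax)
    return (1 - lam**2) * lam**(2 * n)

for r in [0.5, 1.0, 2.0, 5.0]:
    lam = np.tanh(r)                      # lambda = tanh(r)
    p = schmidt_probs(lam)
    p = p[p > 0]                          # drop terms that underflowed to zero
    S = -np.sum(p * np.log(p))            # entanglement entropy (in nats)
    print(f"r={r}: lambda={lam:.5f}, norm={p.sum():.6f}, S={S:.3f}")
```

The geometric distribution of the Schmidt probabilities is the reason the restricted states look thermal, with temperature increasing with r.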
The λ-dependence of the wave function can also be written as
where the hyperbolic angle η is r/2. It is easy to see that for any wave function Ψ0 the probability distributions of both Q1 − Q2 and P1 + P2 scale to point measures at zero. Hence any cluster point of the associated sequence of states ωλ(X) = ⟨Ψλ, XΨλ⟩ is an EPR state in the sense of our definition (with shift parameter a = 0). Note, however, that the family itself does not converge to any state: it is easy to construct observables X for which the expectation ωλ(X) keeps oscillating between 0 and 1 as λ → 1. Here, as in the general case, a single state can only be obtained by passing to a suitable subsequence (or by taking the limit along an ultrafilter).
The virtue of the particular family (13.35) is that it has especially high symmetry: it is immediately clear that

( f(H1) − f(H2) ) Ψλ = 0   (13.37)
for all λ, and for all bounded functions f : N → C of the oscillator Hamiltonians
H1 , H2 . This implies that f (H1 ) and f (H2 ) are doubles with respect to the state ωλ
for each λ. Clearly, this property remains valid in the limit along any subsequence,
so all EPR-states obtained as cluster points of the sequence ωλ also have f (H1 )
in their algebra of doubles. Consequently, the unitaries Uk (t) = exp(itHk ) are also
doubles of each other, and the limiting states are invariant under the time evolution
U12 (t) = U1 (t) ⊗ U2 (−t). This is certainly suggestive, because oscillator time evo-
lutions have an interpretation as linear symplectic transformations on phase space:
Qk 7→ Qk cos t ± Pk sin t and Pk 7→ ∓Qk sin t + Pk cos t, where the upper sign holds
for k = 1 and the lower for k = 2. The subspace S from Section 13.5.3 is invariant
under such rotations, and one readily verifies that the time evolution U12 (t) takes
EPR states into EPR states. This certainly implies that by averaging we can gener-
ate EPR states invariant under this evolution, and we have clearly just constructed
a family with this invariance.
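That these rotations leave the relevant combinations invariant is elementary linear algebra: in the basis (Q1, P1, Q2, P2) the evolution acts with opposite senses of rotation on the two modes, and the span of Q1 − Q2 and P1 + P2 is preserved. A small sketch (the time t is an arbitrary illustrative value):

```python
import numpy as np

def heisenberg_map(t):
    """Matrix of the evolution on coefficient vectors over (Q1, P1, Q2, P2):
    Q1 -> Q1 cos t + P1 sin t, P1 -> -Q1 sin t + P1 cos t, and the
    opposite sense of rotation for mode 2."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0, 0.0],
                     [s,  c, 0.0, 0.0],
                     [0.0, 0.0, c,  s],
                     [0.0, 0.0, -s, c]])

# coefficient vectors of Q1 - Q2 and P1 + P2
q_minus = np.array([1.0, 0.0, -1.0, 0.0])
p_plus  = np.array([0.0, 1.0, 0.0, 1.0])
span = np.column_stack([q_minus, p_plus])

ok = []
for x in (q_minus, p_plus):
    y = heisenberg_map(0.7) @ x
    # residual after projecting y back onto span{Q1 - Q2, P1 + P2}
    resid = y - span @ np.linalg.lstsq(span, y, rcond=None)[0]
    ok.append(np.allclose(resid, 0.0, atol=1e-12))
print(all(ok))  # True: the span is invariant under the evolution
```

Indeed, Q1 − Q2 ↦ (Q1 − Q2) cos t + (P1 + P2) sin t and P1 + P2 ↦ −(Q1 − Q2) sin t + (P1 + P2) cos t, so the pair simply rotates within its own two-dimensional span.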
As λ → 1, the Schmidt spectrum in (13.35) becomes "flatter", which suggests that exchanging some labels n should also define a unitary with double. Let p : N → N denote an injective (i.e., one-to-one, but not necessarily onto) map. Then we define an isometry Vp by

Vp eₙ = e_{p(n)}   (13.38)

with adjoint

Vp∗ eₙ = e_{p⁻¹(n)} if n ∈ p(N), and Vp∗ eₙ = 0 if n ∉ p(N).   (13.39)
Let us assume that p has finite distance, i.e., there is a constant ℓ such that |p(n) − n| ≤ ℓ for all n ∈ N. We claim that in this case Vp ⊗ 1I and 1I ⊗ Vp∗ are doubles in all EPR states constructed from the sequence (13.35). We show this by verifying that the condition holds approximately already for finite λ. Consider the vector
Δλ = ( Vp ⊗ 1I − 1I ⊗ Vp∗ ) Ψλ = √(1 − λ²) Σ_{n=0}^∞ (λⁿ − λ^{p(n)}) e_{p(n)} ⊗ eₙ ,   (13.40)

where in the second summand we changed the summation index from n to p(n), automatically omitting all terms annihilated by Vp∗ according to (13.39). Since this is a sum of orthogonal vectors, we can readily estimate the norm by writing (λⁿ − λ^{p(n)}) = λⁿ(1 − λ^{p(n)−n}):

‖Δλ‖² = (1 − λ²) Σ_{n=0}^∞ λ^{2n} (1 − λ^{p(n)−n})² ≤ (λ^{−ℓ} − 1)² ,

which tends to zero as λ → 1.
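For a concrete finite-distance map — say the pairwise swap p(2k) = 2k+1, p(2k+1) = 2k, with ℓ = 1 — the norm of Δλ can be evaluated directly and is seen to vanish as λ → 1 (a numerical sketch; the truncation point is an illustrative choice):

```python
import numpy as np

def delta_norm_sq(lam, nmax=200000):
    """||Delta_lambda||^2 = (1 - lam^2) * sum_n (lam^n - lam^p(n))^2 for the
    pairwise swap p(2k) = 2k+1, p(2k+1) = 2k (finite distance l = 1)."""
    n = np.arange(nmax)
    p = n + 1 - 2 * (n % 2)               # swap adjacent indices
    return (1 - lam**2) * np.sum((lam**n - lam**p)**2)

for lam in [0.9, 0.99, 0.999]:
    print(lam, delta_norm_sq(lam))        # decreases to 0 as lam -> 1
```

For this swap the sum can also be done in closed form, ‖Δλ‖² = 2(1 − λ)²/(1 + λ²), consistent with the general bound above.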
Ψλ^{(d)} ∝ Σ_{r=1}^{d} λ^r e_r^{(d)} ⊗ e_r^{(d)} .   (13.45)
Note that the infinite dimensional factor on the right hand side of (13.44) is again a state of the form (13.35), however a less entangled one, with parameter λ′ = λ^d < λ. The second factor, i.e., (13.45), becomes maximally entangled in the limit λ → 1. Therefore the unitary (Ud ⊗ Ud) splits both Alice's and Bob's subsystem, so that the total system is split exactly into a less entangled version of itself and a pure, nearly maximally entangled d-dimensional pair. The local operation extracting entanglement from this state is to discard the infinite dimensional parts. Seen in one of the limit states of the family ωλ, this pair is maximally entangled, so equation (13.2) is satisfied with ε = 0. Moreover, since the remaining system is of exactly the same type, the process can be repeated arbitrarily often.
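The splitting underlying this extraction step is just the factorization n = qd + r of the summation index in (13.35): the Schmidt coefficients {λⁿ} are exactly the products of a NOPA spectrum with parameter λ^d and the d finite coefficients λ^r. A quick numerical confirmation (truncated at an illustrative finite index):

```python
import numpy as np

# Schmidt coefficients lam^n of (13.35), n < d*qmax, versus the factorized
# family (lam^d)^q * lam^r with n = q*d + r.
lam, d, qmax = 0.8, 3, 50
full = sorted(lam**n for n in range(d * qmax))
split = sorted((lam**d)**q * lam**r for q in range(qmax) for r in range(d))
print(np.allclose(full, split))  # True: the spectrum factorizes exactly
```

This is why the leftover system after extracting the d-dimensional pair is again of NOPA form, now with parameter λ^d.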
13.5.6 Counterintuitive properties of the restricted states
Basically, subsection 13.5.3 shows that the EPR states constructed here do satisfy
the requirements of the EPR argument. However, Einstein, Podolsky and Rosen
do not consider the measurement of suitable periodic functions of Qk or Pk, but measurements of these quantities themselves [82]. What do EPR states have to say about these?
Unfortunately, the “values of momentum” found by Alice or Bob are not quite
what we usually mean by “values”: they are infinite with probability 1. To see this,
8 For any N > ℓ, consider the set {1, . . . , N}. This has to contain at least the images of {1, . . . , N − ℓ}, hence it can contain at most ℓ elements not in p(N).
9 This is probably what the authors of [39] are trying to say.
recall the remark after eq. (13.22) that EPR states are invariant with respect to phase space translations W(ξ⃗, η⃗) with (ξ⃗, η⃗) ∈ S. Hence

ω( W(ξ1, 0, η1, 0) (A ⊗ 1I) W(ξ1, 0, η1, 0)∗ ) = ω( W(ξ1, ξ1, η1, −η1) (A ⊗ 1I) W(ξ1, ξ1, η1, −η1)∗ ) = ω(A ⊗ 1I) .   (13.46)
That is, the reduced state is invariant under all phase space translations. Now suppose that for some continuous function f with compact support we have ω(f(Q1)) = ε ≠ 0. Then we could add many (say N) sufficiently widely spaced translates of f to get an operator F = Σ_{i=1}^N f(Q1 + x_i 1I) with ‖F‖ ≤ ‖f‖ and |Nε| = |ω(F)| ≤ ‖f‖, which implies ε = 0. Hence for every function with compact support we must have ω(f(Q1)) = 0. Note that this is possible only for singular states, since we can easily construct a sequence of compactly supported functions increasing to the identity, whose ω-expectations are all zero and hence fail to converge to 1.
In spite of being infinite, the "measured values" of Alice and Bob are perfectly correlated, which means that we have to distinguish different kinds of infinity. Such "kinds of infinity" are the subject of the topological theory of compactifications [53, 235]. The basic idea is very simple: consider some C*-algebra of bounded functions on the real line. Then the evaluations of the functions at a point, i.e., the functionals x ↦ f(x), are pure states on such an algebra, but compactness of the state space together with the Kreĭn–Milman Theorem [4] dictates that there are
many more pure states. These additional pure states are interpreted as the points
at infinity associated with the given observable algebra. The set of all pure states is
called the Gel'fand spectrum of the commutative C*-algebra [35, Sec. 2.3.5], and the algebra is known to be isomorphic to the algebra of continuous functions on this compact space. For the algebra of all bounded functions the additional pure states are called free ultrafilters; for the algebra of all continuous bounded functions we get the points of the Stone–Čech compactification; and for the algebra of uniformly continuous functions we get a still coarser notion of points at infinity. According to
Section 13.5.3 these are the measured values, which will be perfectly correlated be-
tween Alice’s and Bob’s positions or momenta. It is not possible to exhibit any such
value, because proving their mere existence already requires an argument based on
the Axiom of Choice.
So do we have to be content with the statement that the measured values lie “out
there on the infinite ranges, where the free ultrafilters roam?” Section 13.5.4 shows
that for many concrete problems, involving not too large observable algebras, we
can use the perfect correlation property quite well. A smaller algebra of observables
means that many points of Gel’fand spectrum become identified, and some of these
coarser points may have a direct physical interpretation. So the moral is not so much
that compactification points at infinity are wild, pathological objects, but that they
describe the way a sequence can go to infinity in the finest possible detail, which is just much finer than we usually want to know. The EPR correlation property holds
even for such wild “measured values”.
Bibliography
[3] C. Adami and N. J. Cerf. Von Neumann capacity of noisy quantum channels.
Phys. Rev. A 56, no. 5, 3470–3483 (1997).
[4] E.M. Alfsen. Compact convex sets and boundary integrals, volume 57 of Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer, New York, Heidelberg, Berlin (1971).
[6] A. Ambainis. A new protocol and lower bounds for quantum coin flipping.
In Proceedings of the 33rd Annual Symposium on Theory of Computing 2001,
pages 134–142. Association for Computing Machinery, New York (2001). see
also the more recent version in quant-ph/0204022.
[7] H. Araki and E.J. Woods. A classification of factors. Publ. R.I.M.S, Kyoto
Univ. 4, 51–130 (1968).
[8] R. Arens and V.S. Varadarajan. On the concept of EPR states and their
structure. Jour. Math. Phys. 41, 638–651 (2000).
[18] C. H. Bennett and G. Brassard. Quantum key distribution and coin tossing.
In Proc. of IEEE Int. Conf. on Computers, Systems, and Signal Processing
(Bangalore, India, 1984), pages 175–179. IEEE, New York (1984).
[29] M. Blum. Coin flipping by telephone. A protocol for solving impossible prob-
lems. SIGACT News 15, 23–27 (1981).
[61] A. Connes. Sur la classification des facteurs de type II. C.R. Acad. Sci. Paris Ser. A-B 281, A13–A15 (1975).
[62] J. F. Cornwell. Group theory in physics. II. Academic Press, London et. al.
(1984).
[65] G.M. D’Ariano, R.D. Gill, M. Keyl, B. Kuemmerer, H. Maassen and R.F.
Werner. The quantum Monty Hall problem. Quantum Inf. Comput. 2, 355–
366 (2002).
[68] R. Derka, V. Bužek and A.K. Ekert. Universal algorithm for optimal es-
timation of quantum states from finite ensembles via realizable generalized
measurements. Phys. Rev. Lett. 80, no. 8, 1571–1575 (1998).
[69] D. Deutsch. Quantum theory, the Church-Turing principle and the universal
quantum computer. Proc. R. Soc. Lond. A 400, 97–117 (1985).
[72] D.P. DiVincenzo, P.W. Shor, J.A. Smolin, B.M. Terhal and A.V. Thapliyal.
Evidence for bound entangled states with negative partial transpose. Phys.
Rev. A 61, no. 6, 062312 (2000).
[76] N. G. Duffield. A large deviation principle for the reduction of product repre-
sentations. Proc. Amer. Math. Soc. 109, 503–515 (1990).
[77] P. Dupuis and R. S. Ellis. A weak convergence approach to the theory of large deviations. Wiley, New York et al. (1997).
[78] W. Dür, J.I. Cirac, M. Lewenstein and D. Bruss. Distillability and partial
transposition in bipartite systems. Phys. Rev. A 61, no. 6, 062313 (2000).
[86] D. J. Wineland et al. Quantum information processing with trapped ions. quant-ph/0212079 (2002).
[91] A. P. Flitney and D. Abbott. Quantum version of the Monty Hall problem.
Phys. Rev. A 65, 062318 (2002).
[92] G. Giedke, L.-M. Duan, J. I. Cirac and P. Zoller. Distillability criterion for all bipartite Gaussian states. Quant. Inf. Comp. 1, no. 3 (2001).
[94] R. D. Gill and S. Massar. State estimation for large ensembles. Phys. Rev.
A61, 2312–2327 (2000).
[95] N. Gisin. Hidden quantum nonlocality revealed by local filters. Phys. Lett. A
210, no. 3, 151–156 (1996).
[99] D. Gottesman. Stabilizer codes and quantum error correction. Ph.D. thesis,
California Institute of Technology (1997). quant-ph/9705052.
[100] M. Grassl, T. Beth and T. Pellizzari. Codes for the quantum erasure channel.
Phys. Rev. A 56, no. 1, 33–38 (1997).
[106] L. Hardy and A. Kent. Cheat sensitive quantum bit commitment. quant-
ph/9911043 (1999).
[124] P. Horodecki, J.I. Cirac and M. Lewenstein. Bound entanglement for contin-
uous variables is a rare phenomenon. quant-ph/0103076 (2001).
[150] U. Leonhardt. Measuring the quantum state of light. Cambridge Univ. Press,
Cambridge (1997).
[152] C.-F. Li, Y.-S. Zhang, Y.-F. Huang and G.-C. Guo. Quantum strategies of
quantum measurement. Phys. Lett. A 280, 257–260 (2000).
[153] S. Lloyd. Capacity of the noisy quantum channel. Phys. Rev. A 55, no. 3,
1613–1622 (1997).
[154] H.-K. Lo and H. F. Chau. Why quantum bit commitment and ideal quantum
coin tossing are impossible. Physica D 120, 177–187 (1998).
[158] R. Matsumoto and T. Uyematsu. Lower bound for the quantum capacity of a
discrete memoryless quantum channel. quant-ph/0105151 (2001).
[162] N. D. Mermin. Quantum mysteries revisited. Am. J. Phys. 58, no. 8, 731–734
(1990).
[163] N. D. Mermin. What’s wrong with these elements of reality? Phys. Today 43,
no. 6, 9–11 (1990).
[164] D. A. Meyer. Quantum strategies. Phys. Rev. Lett. 82, 1052–1055 (1999).
[167] J. Nash. Non-cooperative games. Ann. of Math., II. Ser 54, 286–295 (1951).
[170] M. A. Nielsen. Continuity bounds for entanglement. Phys. Rev. A 61, no. 6,
064301 (2000).
[173] T. Ogawa and H. Nagaoka. Strong converse and Stein's lemma in quantum hypothesis testing. IEEE Trans. Inf. Theory IT-46, 2428–2433 (2000).
[174] M. Ohya and D. Petz. Quantum entropy and its use. Springer, Berlin (1993).
[178] V. I. Paulsen. Completely bounded maps and dilations. Longman Scientific &
Technical (1986).
[179] A. Peres. Higher order Schmidt decompositions. Phys. Lett. A 202, no. 1, 16–17 (1995).
[180] A. Peres. Separability criterion for density matrices. Phys. Rev. Lett. 77,
no. 8, 1413–1415 (1996).
[196] D. Simon. On the power of quantum computation. In Proc. 35th annual sym-
posium on foundations of computer science, pages 124–134. IEEE Computer
Society Press, Los Alamitos (1994).
[198] S. Singh. The code book: The Science of Secrecy from Ancient Egypt to Quan-
tum Cryptography. Fourth Estate, London (1999).
[199] R. W. Spekkens and T. Rudolph. Degrees of concealment and bindingness in
quantum bit commitment protocols. Phys. Rev. A 65, 012310 (2002).
[203] E. Størmer. Positive linear maps of operator algebras. Acta Math. 110, 233–278 (1963).
[205] S.J. Summers and R.F.Werner. Maximal violation of Bell’s inequalities for
algebras of observables in tangent spacetime regions. Ann. Inst. H. Poincaré
A 49, 215–243 (1988).
[206] S.J. Summers and R.F.Werner. On Bell’s inequalities and algebraic invariants.
Lett. Math. Phys. 33, 321–334 (1995).
[207] M. Takesaki. Tomita’s theory of modular Hilbert algebras and its application,
volume 128 of Lect. Notes. Math. Springer, Berlin, Heidelberg, New York
(1970).
[217] G. Vidal. Entanglement monotones. J. Mod. Opt. 47, no. 2-3, 355–376 (2000).
[222] K. G. H. Vollbrecht and R. F. Werner. Why two qubits are special. J. Math.
Phys. 41, no. 10, 6772–6782 (2000).
[223] J. von Neumann. On infinite direct products. Compos. Math. 6, 1–77 (1938).
cf. also Collected Works III, No. 6.
[224] J. von Neumann and O. Morgenstern. Theory of games and economic behav-
ior. Princeton Univ. Press, Princeton (1944).
[234] R. F. Werner and M. M. Wolf. Bound entangled Gaussian states. Phys. Rev. Lett. 86, no. 16, 3658–3661 (2001).
[235] R.F. Werner. Physical uniformities on the state space of non-relativistic quan-
tum mechanics. Found. Phys. 13, 859–881 (1983).
[236] R.F. Werner. EPR states for von Neumann algebras. quant-ph/9910077
(1999).