
Hypothesis testing for an entangled state produced by spontaneous parametric down conversion

Masahito Hayashi, Bao-Sen Shi, Akihisa Tomita, Keiji Matsumoto, Yoshiyuki Tsuda, and Yun-Kun Jiang

arXiv:quant-ph/0603254 v1 28 Mar 2006

ERATO Quantum Computation and Information Project,


Japan Science and Technology Agency (JST), Tokyo 113-0033, Japan

National Institute of Informatics, Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan

COE, Chuo University, Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan

Fundamental Research Laboratories, NEC, Tsukuba 305-8501, Japan


Generation and characterization of entanglement are crucial tasks in quantum information processing. A hypothesis testing scheme for entanglement has been formulated. Three designs are proposed to test the entangled photon states created by spontaneous parametric down conversion. The time allocations between the measurement bases are designed to account for the anisotropic deviation of the generated photon states from the maximally entangled state. The designs are evaluated in terms of the asymptotic variance and the p-value. It is shown that the optimal time allocation between the coincidence and anti-coincidence measurement bases improves the entanglement test. The test can be further improved by optimizing the time allocation among the anti-coincidence bases. Analysis of the data obtained in the experiment verifies the advantage of the entanglement test designed by the optimal time allocation.
PACS numbers: 03.65.Wj,42.50.-p,03.65.Ud

I. INTRODUCTION

The concept of entanglement has been thought to be the heart of quantum mechanics. The seminal experiment by Aspect et al. [1] proved the spooky nonlocal action of quantum mechanics by observing a violation of the Bell inequality [2] with entangled photon pairs. Recently, entanglement has also been recognized as an important resource for information processing. It has been revealed that entanglement plays an essential role, explicitly or implicitly, in quantum information processing, which provides unconditional security in cryptographic communication and an exponential speed-up in some computational tasks [3]. For example, entangled states are indispensable in quantum teleportation [4], a key protocol in a quantum repeater [5]. Even in the BB84 quantum cryptographic protocol [6], a hidden entanglement between the legitimate parties guarantees the security [7]. Practical realization of entangled states is therefore one of the most important issues in quantum information technology.

Practical implementation raises the problem of how to make sure that we have highly entangled states, which are required to achieve the quantum information protocols. Unavoidable imperfections will reduce the entanglement in the generation process. Moreover, decoherence and dissipation due to the coupling with the environment degrade the entanglement during the processing. Therefore, it is an important issue to characterize the entanglement of the generated (or stored) states to guarantee successful quantum information processing. Quantum state estimation and quantum state tomography are known as methods for identifying an unknown state [8, 9, 10]. Quantum state tomography [11] has recently been applied to obtain full information of the 4 × 4 two-particle density matrix from the coincidence counts of 16 basis combinations.


For practical applications, however, characterization is not the goal of an experiment, but only a part of the preparation. It is thus favorable to reduce the time for characterization and the number of consumed particles as much as possible. In most applications, we do not need to know the full information on the states; we only need to know, by a test, whether the states are sufficiently entangled or not. The test should be simpler than the full characterization. Barbieri et al. [12] introduced an entanglement witness to test the entanglement of polarization-entangled photon pairs. We will treat the optimization problem of the tests in the framework of hypothesis testing. It enables us to handle the fluctuation in the data properly with mathematical statistics [13].

In the following, we consider experimental designs to test the polarization-entangled states of two-photon pairs generated by spontaneous parametric down conversion (SPDC), though the concept of the designs is applicable to other two-particle entangled states. Two-photon states can be characterized by the correlation of photon detection events in several measurement bases. For example, if the state is close to |Φ⁺⟩ = (1/√2)(|HH⟩ + |VV⟩), the coincidence counts on the bases {|HH⟩, |VV⟩, |DD⟩, |XX⟩, |RL⟩, |LR⟩} yield the maximum values, whereas the coincidence counts on the bases {|HV⟩, |VH⟩, |DX⟩, |XD⟩, |RR⟩, |LL⟩} take the minimum values, where H, V, X, D, R, and L stand for horizontal, vertical, 45° linear, 135° linear, clockwise circular, and anti-clockwise circular polarizations, respectively. We will refer to the former set of bases as the coincidence bases, and to the latter as the anti-coincidence bases. The ratio of the minimum counts to the maximum counts measures the degree of entanglement. For example, the visibility of two-photon interference,

which has been widely used to characterize entangled states since Aspect's experiment [1], measures the entanglement by the ratios obtained at two fixed bases for one particle. We will show that the visibility is not sufficient and needs to be reformulated from the viewpoint of statistics. We then improve the test by optimizing the allocation of the measurement time to each measurement basis, considering that the counts on the anti-coincidence bases are much smaller than those on the coincidence bases. The test can be further improved if we utilize knowledge of the tendency of the entanglement degradation. In general, the error from the maximally entangled state can be anisotropic, reflecting the generation process of the states. We can improve the sensitivity to the entanglement degradation by focusing the measurement on the expected error directions.

SPDC is now widely used to generate entangled photon pairs. Several experimental settings have been demonstrated to provide highly entangled states. In particular, Kwiat et al. [15] have obtained a high flux of polarization-entangled photon pairs from a stack of two type-I phase-matched nonlinear crystals. One nonlinear crystal generates a photon pair polarized in the horizontal direction (|HH⟩), and the other generates pairs polarized in the vertical direction (|VV⟩). If the two pairs are indistinguishable, the generated photons are entangled. Otherwise, the state will be a mixture of HH pairs and VV pairs. Quantum state tomography has shown that only the HHHH, VVVV, VVHH, and HHVV elements are dominant [16], which implies that the density matrix can be approximately given by a classical mixture of |Φ⁺⟩⟨Φ⁺| and |Φ⁻⟩⟨Φ⁻|. We can improve the entanglement test based on this property of the photon pairs, as described in the following sections.
In this article, we reformulate hypothesis testing so as to be applicable to the SPDC experiments, and demonstrate the effectiveness of the optimized time allocation in the entanglement test. This article is organized as follows. Section 2 defines the hypothesis testing scheme for the entanglement of the two-photon states generated by SPDC. Section 3 gives the mathematical formulation of statistical hypothesis testing. Sections 4-10 describe the mathematical aspects, and Section 11 examines the experimental aspects of the hypothesis testing of entanglement; there, the designs of the time allocation are evaluated with the experimental data. Hence, readers interested only in the experiments and data analysis can skip Sections 4-10 and proceed to Section 11, while readers concerned with the mathematical discussion should read Sections 4-10 before Section 11.

Sections 4-10, which discuss the more theoretical issues, are organized as follows. Sections 4 and 5 give the fundamental properties of hypothesis testing: Section 4 introduces the likelihood ratio test, and Section 5 gives the asymptotic theory of hypothesis testing. Sections 6-9 are devoted to the designs of the time allocation between the coincidence and anti-coincidence bases: Section 6 defines the modified visibility method, Section 7 optimizes the time allocation when the total photon flux λ is unknown, Section 8 gives the results with known λ, and Section 9 compares the designs in terms of the asymptotic variance. Section 10 gives a further improvement by optimizing the time allocation among the anti-coincidence bases. The Appendices give the details of the proofs used in the optimization.

II. HYPOTHESIS TESTING SCHEME FOR ENTANGLEMENT

This section introduces the hypothesis test for entanglement. We consider the two-photon states generated by SPDC, which are described by a density matrix σ. We assume each two-photon generation process to be independent and identical. The target state is the maximally entangled state |Φ⁺⟩. Here we measure the entanglement by the fidelity between the generated state and the target state:

θ = ⟨Φ⁺|σ|Φ⁺⟩.   (1)

The purpose of the test is to guarantee that the state is sufficiently close to the maximally entangled state with a certain significance. That is, we are required to disprove, with a small error probability, that the fidelity θ is less than a threshold θ₀. In mathematical statistics, this situation is formulated as hypothesis testing; we introduce the null hypothesis H₀ that the entanglement is not sufficient and the alternative H₁ that it is:

H₀: θ ≤ θ₀ versus H₁: θ > θ₀,   (2)

with a threshold θ₀.
The outcome of the coincidence count measurement in the SPDC experiment can be assumed to be a random variable independently distributed according to a Poisson distribution. From now on, a symbol labeled by a pair (x, y) refers to a random variable or parameter related to the measurement basis |x_A, y_B⟩. The number of detection events n_xy on the basis |x_A, y_B⟩ is a random variable obeying the Poisson distribution Poi((λμ_xy + δ)t_xy) of mean (λμ_xy + δ)t_xy, where

λ is a known constant related to the photon detection rate, determined from the average number of photon pairs generated in unit time and the detection efficiency,

μ_xy = ⟨x_A, y_B|σ|x_A, y_B⟩ is an unknown constant,

t_xy is a known constant, the measurement time for the basis,

δ is a constant describing the average dark counts.

The probability function of n_xy is

exp(−(λμ_xy + δ)t_xy) ((λμ_xy + δ)t_xy)^{n_xy} / n_xy!.

Because the detections at different times are mutually independent, n_xy is independent of n_x′y′ ((x, y) ≠ (x′, y′)). In this paper, we discuss quantum hypothesis testing under the above assumption, while Usami et al. [17] discussed state estimation under this assumption.

The visibility of the two-photon interference is an indicator of entanglement commonly used in experiments. When the dark-count parameter δ can be regarded as 0, the visibility is obtained as follows: first, A's measurement basis |x_A⟩ is fixed, then the measurement |x_A, y_B⟩ is performed by rotating B's measurement basis |y_B⟩ to obtain the maximum and minimum numbers of coincidence counts, n_max and n_min. We need to make the measurement with at least two bases of A in order to exclude the possibility of a classical correlation. We may choose the two bases |H⟩ and |D⟩ as |x_A⟩, for example. Finally, the visibility is given by the ratio between n_max − n_min and n_max + n_min for the respective measurement basis |x_A⟩. However, our decision will contain a bias if we choose only two bases as A's measurement basis |x_A⟩. Hence, we cannot estimate the fidelity between the target maximally entangled state and the given state in a statistically proper way from the visibility.
Since the equation

|HH⟩⟨HH| + |VV⟩⟨VV| + |DD⟩⟨DD| + |XX⟩⟨XX| + |RL⟩⟨RL| + |LR⟩⟨LR| = 2|Φ⁺⟩⟨Φ⁺| + I   (3)

holds, we can estimate the fidelity θ by measuring the sum of the coincidence counts on the bases |HH⟩, |VV⟩, |DD⟩, |XX⟩, |RL⟩, and |LR⟩, when λ and δ are known [12, 13]. This is because the sum n₁ := n_HH + n_VV + n_DD + n_XX + n_RL + n_LR obeys the Poisson distribution with expectation value ((1+2θ)λ/6 + δ)t₁, where the measurement time for each basis is t₁/6.

The parameter λ is usually unknown, however. We need to perform another measurement on different bases to obtain additional information. Since

|HV⟩⟨HV| + |VH⟩⟨VH| + |XD⟩⟨XD| + |DX⟩⟨DX| + |RR⟩⟨RR| + |LL⟩⟨LL| = 2I − 2|Φ⁺⟩⟨Φ⁺|   (4)

also holds, we can estimate the fidelity by measuring the sum of the coincidence counts on the bases |HV⟩, |VH⟩, |DX⟩, |XD⟩, |RR⟩, and |LL⟩. The sum n₂ := n_HV + n_VH + n_DX + n_XD + n_RR + n_LL obeys the Poisson distribution Poi(((2−2θ)λ/6 + δ)t₂), where the measurement time for each basis is t₂/6. Combining the two measurements, we can estimate the fidelity without knowledge of λ.
We can also consider a different type of measurement for λ. If we prepare our device to detect all photons, the detected number n₃ obeys the distribution Poi((λ + δ)t₃) with measurement time t₃. We will refer to it as the total flux measurement. In the following, we consider the best time allocation for the estimation of, and the test on, the fidelity, by applying methods of mathematical statistics. We will assume that λ is either known or estimated from the detected number n₃.
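As a concrete numerical illustration of this estimation scheme, the following sketch inverts the two expectation values from Eqs. (3) and (4) to recover θ and λ when δ is known. This is our own minimal example in standard-library Python, not code from the paper; the chunked Knuth sampler and all function names are ours, and the parameter values are arbitrary.

```python
import math
import random

def sample_poisson(mean, rng):
    # Chunked Knuth sampler (hypothetical helper): a sum of independent
    # Poisson variables is Poisson, so large means are split into chunks
    # to avoid underflow of exp(-mean).
    total = 0
    while mean > 0:
        m = min(mean, 500.0)
        mean -= m
        L, k, p = math.exp(-m), 0, 1.0
        while p > L:
            k += 1
            p *= rng.random()
        total += k - 1
    return total

# Expectation values from Eqs. (3) and (4):
#   E[n1] = ((2θ+1)λ/6 + δ) t1,   E[n2] = ((2−2θ)λ/6 + δ) t2.
def estimate_fidelity(n1, n2, t1, t2, delta):
    a = n1 / t1 - delta                       # estimates (2θ+1)λ/6
    b = n2 / t2 - delta                       # estimates (2−2θ)λ/6
    lam = 2.0 * (a + b)                       # (2θ+1)+(2−2θ) = 3, so a+b = λ/2
    theta = (2.0 * a - b) / (2.0 * (a + b))   # 2a − b = θλ
    return theta, lam

rng = random.Random(7)
theta_true, lam_true, delta, t1, t2 = 0.95, 1.0e4, 5.0, 10.0, 10.0
n1 = sample_poisson(((2*theta_true + 1)*lam_true/6 + delta)*t1, rng)
n2 = sample_poisson(((2 - 2*theta_true)*lam_true/6 + delta)*t2, rng)
theta_hat, lam_hat = estimate_fidelity(n1, n2, t1, t2, delta)
```

With these (illustrative) count rates, the estimate θ̂ lands within a few parts in a thousand of the true fidelity, which matches the order of the asymptotic standard deviation for these means.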
III. HYPOTHESIS TESTING FOR PROBABILITY DISTRIBUTIONS

A. Formulation

In this section, we review the fundamentals of hypothesis testing for probability distributions [14]. Suppose that a random variable X is distributed according to a probability measure P_θ identified by the unknown parameter θ. We also assume that the unknown parameter θ belongs to one of two mutually disjoint sets Θ₀ and Θ₁. When we want to guarantee that the true parameter belongs to the set Θ₁ with a certain significance, we choose the null hypothesis H₀ and the alternative hypothesis H₁ as

H₀: θ ∈ Θ₀ versus H₁: θ ∈ Θ₁.   (5)

Then, our decision method is described by a test, i.e., a function φ(x) taking values in {0, 1}; H₀ is rejected if 1 is observed, and H₀ is not rejected if 0 is observed. That is, we make our decision only when 1 is observed, and do not otherwise. This is because the purpose is to accept H₁ by rejecting H₀ while guaranteeing the quality of the decision, and is not to reject H₁ nor to accept H₀.

From a theoretical viewpoint, we often consider randomized tests, in which the decision is made probabilistically for the given data. Such a test is given by a function φ mapping to the interval [0, 1]. When we observe the data x, H₀ is rejected with probability φ(x). In the following, we treat randomized tests as well as deterministic tests.
In statistical hypothesis testing, we minimize the error probabilities of the test φ. There are two types of errors. The type one error is the case where H₀ is rejected although it is true. The type two error is the converse case: H₀ is accepted although it is false. Hence, the type one error probability is given by P_θ(φ) (θ ∈ Θ₀), and the type two error probability is given by 1 − P_θ(φ) (θ ∈ Θ₁), where

P_θ(φ) := ∫ φ(x) dP_θ(x).

It is in general impossible to minimize both P_θ(φ) and 1 − P_θ(φ) simultaneously because of a trade-off between them. Since we make our decision, with its quality guaranteed, only when 1 is observed, it is definitively required that the type one error probability P_θ(φ) be less than a certain constant α. For this reason, we minimize the type two error probability 1 − P_θ(φ) under the condition P_θ(φ) ≤ α (θ ∈ Θ₀). The constant α in the condition is called the risk probability, which guarantees the quality of our decision. If the risk probability is large, our decision has less reliability. Under this constraint on the risk probability, we maximize the probability of rejecting the hypothesis H₀ when the true parameter is θ ∈ Θ₁. This probability P_θ(φ) is called the power of φ. Hence, a test φ of risk probability α is said to be most powerful (MP) at θ ∈ Θ₁ if P_θ(φ) ≥ P_θ(ψ) holds for any test ψ of risk probability α. A test is said to be uniformly most powerful (UMP) if it is MP at every θ ∈ Θ₁.
B. p-values

In hypothesis testing, we usually fix our test before applying it to the data. However, we sometimes focus on the minimum risk probability among the tests in a class T that reject the hypothesis H₀ for the given data. This value is called the p-value; it depends on the observed data x as well as on the subset Θ₀ to be rejected. In fact, in order to define the p-value, we have to fix a class T of tests. Then, for x and Θ₀, the p-value is defined as

min_{φ ∈ T : φ(x)=1}  max_{θ ∈ Θ₀} P_θ(φ).   (6)

In this case the p-value is max_{θ∈Θ₀} P_θ(φ_x), with φ_x given in (7) below. However, the function φ_x is unnatural as a test. Hence, we should fix a suitable class of tests to define the p-value.
IV. LIKELIHOOD TEST

A. Definition

The likelihood ratio test is a standard test, and is UMP in typical cases [14]. When both Θ₀ and Θ₁ consist of single elements, Θ₀ = {θ₀} and Θ₁ = {θ₁}, the likelihood ratio test φ_r is defined as

φ_r(x) := 0 if P_{θ₀}(x)/P_{θ₁}(x) ≥ r,
          1 if P_{θ₀}(x)/P_{θ₁}(x) < r,

where r is a constant, and the ratio P_{θ₀}(x)/P_{θ₁}(x) is called the likelihood ratio. From the definition, any test φ satisfies

(rP_{θ₁} − P_{θ₀})(φ_r) ≥ (rP_{θ₁} − P_{θ₀})(φ),   (8)

where (rP_{θ₁} − P_{θ₀})(φ) := ∫ φ(x) (r dP_{θ₁}(x) − dP_{θ₀}(x)).

When a likelihood ratio test φ_r satisfies α = P_{θ₀}(φ_r), it is MP of level α. Indeed, for any test φ with P_{θ₀}(φ) ≤ α, Eq. (8) gives

rP_{θ₁}(φ_r) − α = rP_{θ₁}(φ_r) − P_{θ₀}(φ_r) ≥ rP_{θ₁}(φ) − P_{θ₀}(φ) ≥ rP_{θ₁}(φ) − α.

Hence, 1 − P_{θ₁}(φ) ≥ 1 − P_{θ₁}(φ_r), i.e., the power of φ_r is largest. This is known as the Neyman-Pearson fundamental lemma [19].
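The extremal property (8) can be checked numerically. The sketch below, our own illustration with hypothetical parameter values, takes two simple Poisson hypotheses on a truncated support and verifies that the likelihood ratio test maximizes the functional (rP_{θ₁} − P_{θ₀})(φ) over randomized tests.

```python
import math
import random

def poi_pmf(mu, k):
    # Poisson point mass, evaluated in log space for numerical stability
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

# Two simple hypotheses (hypothetical means): P_θ0 = Poi(10), P_θ1 = Poi(6).
mu0, mu1, r = 10.0, 6.0, 1.0
support = range(200)              # carries essentially all the probability
p0 = [poi_pmf(mu0, k) for k in support]
p1 = [poi_pmf(mu1, k) for k in support]

# Likelihood ratio test φ_r: reject (φ = 1) where p0(x)/p1(x) < r.
phi_r = [1.0 if q0 / q1 < r else 0.0 for q0, q1 in zip(p0, p1)]

def functional(phi):
    # (r P_θ1 − P_θ0)(φ) = Σ_x φ(x) (r p1(x) − p0(x))
    return sum(f * (r * q1 - q0) for f, q1, q0 in zip(phi, p1, p0))

# Inequality (8): φ_r beats every randomized test φ with values in [0, 1].
rng = random.Random(0)
for _ in range(300):
    phi = [rng.random() for _ in support]
    assert functional(phi_r) >= functional(phi) - 1e-12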
The likelihood ratio test is generalized to the case where Θ₀ or Θ₁ has at least two elements:

φ_r(x) := 0 if sup_{θ∈Θ₀} p_θ(x) / sup_{θ∈Θ₁} p_θ(x) ≥ r,
          1 if sup_{θ∈Θ₀} p_θ(x) / sup_{θ∈Θ₁} p_θ(x) < r.   (9)

B. Monotone Likelihood Ratio Test

In cases where the hypothesis is one-sided, that is, the parameter space Θ is an interval of ℝ and the hypothesis is given as

H₀: θ ≥ θ₀ versus H₁: θ < θ₀,   (10)

Since the p-value expresses the risk of rejecting the hypothesis H₀, this concept is useful for comparison among several experiments. Note that if we are allowed to choose any function φ as a test, the above minimum is attained by the function φ_x:

φ_x(y) := 0 if y ≠ x,
          1 if y = x.   (7)


we often use so-called interval tests, both for their optimality under certain conditions and for their naturalness. When the likelihood ratio P_{θ′}(x)/P_θ(x) is monotone increasing in x for any θ′ > θ, the likelihood ratio is called monotone. In this case, the likelihood ratio test φ_r between P_{θ₀} and P_{θ₁} is UMP of level α := P_{θ₀}(φ_r), where θ₁ is an arbitrary element satisfying θ₁ < θ₀.

Indeed, many important examples satisfy this condition; hence, it is convenient to give the proof here. From the monotonicity, the likelihood ratio test φ_r has the form

φ_r(x) = 1 if x < x₀,
         0 if x ≥ x₀,   (11)

with a threshold value x₀. Since the monotonicity implies P_{θ₀}(φ_r) ≥ P_θ(φ_r) for any θ ≥ θ₀, it follows from the Neyman-Pearson lemma that the test φ_r is MP of level α. From (11), the test φ_r is also a likelihood ratio test between P_{θ₀} and P_{θ′}, where θ′ is another element satisfying θ′ < θ₀. Hence, the test φ_r is also MP at θ′, and therefore UMP of level α.

From the above discussion, it is suitable to define the p-value based on the class of likelihood ratio tests. In this case, when we observe x₀, the p-value is equal to

∫_{−∞}^{x₀} P_{θ₀}(dx).   (12)

C. One-Parameter Exponential Family

In mathematical statistics, exponential families are known as a class of typical statistical models [18]. A family of probability distributions {P_θ | θ ∈ Θ} is called an exponential family when there exists a random variable x such that

P_θ(x) := P_0(x) exp(θx + g(θ)),   (13)

where g(θ) := −log ∫ exp(θx) P_0(dx).

It is known that this class of families includes, for example, the Poisson distributions, normal distributions, binomial distributions, etc. In this case, the likelihood ratio

exp(θ₀x + g(θ₀)) / exp(θ₁x + g(θ₁)) = exp((θ₀ − θ₁)x + g(θ₀) − g(θ₁))

is monotone in x for θ₀ > θ₁. Hence, the likelihood ratio test is UMP for the hypothesis (10). Note that this argument is valid even for a different parameterization, provided the family admits a parameter satisfying (13).
For example, in the case of the normal distribution

P_θ(x) = (1/√(2πV)) e^{−(x−θ)²/(2V)} = (1/√(2πV)) e^{−x²/(2V)} e^{θx/V − θ²/(2V)},

the UMP test φ_{UMP,α} of level α is given as

φ_{UMP,α}(x) := 1 if x < θ₀ − ε_α √V,
                0 if x ≥ θ₀ − ε_α √V,   (14)

where ε_α is defined by

α = ∫_{−∞}^{−ε_α} (1/√(2π)) e^{−x²/2} dx.   (15)

The n-trial binomial distributions P_p^n(k) = C(n,k)(1−p)^{n−k} p^k, with C(n,k) the binomial coefficient, are also an exponential family, because another parameter θ := log(p/(1−p)) satisfies P_p^n(k) = C(n,k) 2^{−n} e^{θk + n log(2/(1+e^θ))}. Hence, in the case of the n-trial binomial distribution, the UMP test φ^n_{UMP,α} of level α is given as

φ^n_{UMP,α}(k) := 1 if k < k₀,
                  γ if k = k₀,
                  0 if k > k₀,   (16)

where k₀ is the maximum value k satisfying Σ_{k′=0}^{k−1} C(n,k′)(1−θ₀)^{n−k′} θ₀^{k′} ≤ α, and γ is defined by

α = γ C(n,k₀)(1−θ₀)^{n−k₀} θ₀^{k₀} + Σ_{k=0}^{k₀−1} C(n,k)(1−θ₀)^{n−k} θ₀^{k}.   (17)

The Poisson distributions are also an exponential family, because another parameter θ := log μ satisfies Poi(μ)(n) = (1/n!) e^{θn} e^{−e^θ}. In this case, the UMP test φ_{UMP,α} of level α is similarly characterized. When n is sufficiently large, the distribution P_θ^n(k) can be approximated by the normal distribution with variance nθ(1−θ). That is, the UMP test φ^n_{UMP,α} of level α is approximately given as

φ^n_{UMP,α}(k) := 1 if (k − nθ₀)/√(nθ₀(1−θ₀)) < −ε_α,
                  0 if (k − nθ₀)/√(nθ₀(1−θ₀)) ≥ −ε_α.   (18)

Next, we consider testing the following hypothesis in the case of the binomial Poisson distribution Poi(λ₁, λ₂):

H₀: λ₁/(λ₁+λ₂) ≥ θ₀ versus H₁: λ₁/(λ₁+λ₂) < θ₀.   (20)

In this case, the test φ(k₁, k₂) := φ^{k₁+k₂}_{UMP,α}(k₁) is a test of level α. This is because the conditional distribution Poi(λ₁, λ₂)(k, n−k) / Σ_{k′=0}^{n} Poi(λ₁, λ₂)(k′, n−k′) is equal to the binomial distribution P^n_{λ₁/(λ₁+λ₂)}(k). Therefore, when we observe k₁, k₂, the p-value of this class of tests is equal to Σ_{k=0}^{k₁} C(k₁+k₂, k) θ₀^{k} (1−θ₀)^{k₁+k₂−k}.
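The conditional p-value above is easy to evaluate exactly and to compare with its large-sample normal approximation (Eq. (37) below). The following is our own standard-library sketch; the function names and the example counts are ours, chosen only for illustration.

```python
import math

def pvalue_conditional(k1, k2, theta0):
    # Exact p-value of the conditional test for H0: λ1/(λ1+λ2) ≥ θ0,
    # based on the binomial distribution of k1 given n = k1 + k2:
    #   Σ_{k=0}^{k1} C(n, k) θ0^k (1 − θ0)^(n−k)
    n = k1 + k2
    return sum(math.comb(n, k) * theta0**k * (1 - theta0)**(n - k)
               for k in range(k1 + 1))

def pvalue_normal(k1, k2, theta0):
    # Large-sample normal approximation of the same p-value, cf. Eq. (37)
    n = k1 + k2
    z = (k1 - n * theta0) / math.sqrt(n * theta0 * (1 - theta0))
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF Φ(z)

p_exact = pvalue_conditional(40, 80, 0.5)    # strongly favors H1
p_approx = pvalue_normal(40, 80, 0.5)
```

For these counts both values are of order 10⁻⁴ and agree to within the accuracy expected of the normal approximation; note that the approximation needs no knowledge of λ₁ + λ₂, only of the observed totals.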

Similarly, in the case of the Poisson distribution Poi(μt), the UMP test φ_{UMP,α} of level α is approximately given as

φ_{UMP,α}(k) := 1 if k < μ₀t − ε_α √(μ₀t),
                0 if k ≥ μ₀t − ε_α √(μ₀t).   (19)

D. Multi-parameter case

In the one-parameter case, UMP tests can often be characterized by likelihood ratio tests. However, in the multi-parameter case, this type of characterization is generally impossible, and the UMP test does not always exist. In this case, we have to choose our test among non-UMP tests. One idea is to choose a likelihood ratio test, because likelihood ratio tests always exist and can be expected to perform well. Generally, it is not easy to give an explicit form of the likelihood ratio test; when the family is a multi-parameter exponential family, however, the likelihood ratio test has a simpler form. A family of probability distributions {P_θ | θ = (θ¹, ..., θ^m) ∈ ℝ^m} is called an m-parameter exponential family when there exist m random variables x₁, ..., x_m such that

P_θ(x) := P_0(x) exp(θ·x + g(θ)),

where g(θ) := −log ∫ exp(θ·x) P_0(dx). In this case, the likelihood ratio test φ_r has the form

φ_r(x) = 0 if inf_{θ₁∈Θ₁} D(P_{θ(x)} ‖ P_{θ₁}) − inf_{θ₀∈Θ₀} D(P_{θ(x)} ‖ P_{θ₀}) ≥ −log r,
         1 if inf_{θ₁∈Θ₁} D(P_{θ(x)} ‖ P_{θ₁}) − inf_{θ₀∈Θ₀} D(P_{θ(x)} ‖ P_{θ₀}) < −log r,   (21)

where the divergence D(P_θ ‖ P_{θ′}) is defined as

D(P_θ ‖ P_{θ′}) := ∫ log( P_θ(x′)/P_{θ′}(x′) ) P_θ(dx′) = (θ − θ′)·∫ x P_θ(dx) + g(θ) − g(θ′),

and θ(x) is defined by [18]

∫ x′ P_{θ(x)}(dx′) = x.   (22)

This is because the logarithm of the likelihood function is calculated as

log [ sup_{θ₀∈Θ₀} P_{θ₀}(x) / sup_{θ₁∈Θ₁} P_{θ₁}(x) ]
= sup_{θ₀∈Θ₀} inf_{θ₁∈Θ₁} log [ P_{θ₀}(x)/P_{θ₁}(x) ]
= sup_{θ₀∈Θ₀} inf_{θ₁∈Θ₁} (θ₀ − θ₁)·x + g(θ₀) − g(θ₁)
= sup_{θ₀∈Θ₀} inf_{θ₁∈Θ₁} (θ₀ − θ₁)·∫ x′ P_{θ(x)}(dx′) + g(θ₀) − g(θ₁)
= sup_{θ₀∈Θ₀} inf_{θ₁∈Θ₁} D(P_{θ(x)} ‖ P_{θ₁}) − D(P_{θ(x)} ‖ P_{θ₀})
= inf_{θ₁∈Θ₁} D(P_{θ(x)} ‖ P_{θ₁}) − inf_{θ₀∈Θ₀} D(P_{θ(x)} ‖ P_{θ₀}).

In addition, θ(x) coincides with the maximum likelihood estimate (MLE) when x is observed.


In the following, we treat two hypotheses given as
H0 : w c0 versus H1 : w < c0 .

(23)

In the case of the multivariate normal distribution family

P_θ(x) = (2π)^{−m/2} (det V)^{−1/2} e^{−(1/2)(x−θ)ᵀV^{−1}(x−θ)} = (2π)^{−m/2} (det V)^{−1/2} e^{−(1/2)xᵀV^{−1}x + θᵀV^{−1}x − (1/2)θᵀV^{−1}θ},

we have θ(x) = x and D(P_θ ‖ P_{θ′}) = (1/2)(θ − θ′)ᵀV^{−1}(θ − θ′). Since

min_{θ′: w·θ′ = c} (1/2)(θ − θ′)ᵀV^{−1}(θ − θ′) = (w·θ − c)² / (2 wᵀV w),

the likelihood function is calculated as

inf_{θ₁∈Θ₁} D(P_x ‖ P_{θ₁}) − inf_{θ₀∈Θ₀} D(P_x ‖ P_{θ₀}) = inf_{c<c₀} (w·x − c)²/(2 wᵀV w) − inf_{c≥c₀} (w·x − c)²/(2 wᵀV w).

That is, the likelihood function has the same form as the likelihood function of a one-parameter normal distribution family with variance wᵀV w.
The multinomial Poisson distributions

Poi(μ₁, ..., μ_m)(k₁, ..., k_m) := e^{−Σ_{i=1}^m μᵢ} μ₁^{k₁} ··· μ_m^{k_m} / (k₁! ··· k_m!)

are also an exponential family. The divergence is calculated as

D(Poi(μ₁,...,μ_m) ‖ Poi(μ′₁,...,μ′_m)) = Σ_{i=1}^m (μ′ᵢ − μᵢ) + Σ_{i=1}^m μᵢ log(μᵢ/μ′ᵢ).

Hence, using this formula and (21), we can calculate the likelihood ratio test. Now, we calculate the p-value concerning the class of likelihood ratio tests when we observe the data k₁, ..., k_m. When Σᵢ wᵢkᵢ < c₀, this value is equal to

max_{w·μ=c₀} Poi(μ₁,...,μ_m)({ (k′₁,...,k′_m) : R(k′) ≥ R(k), Σᵢ wᵢk′ᵢ < c₀ }),   (24)

where R = R(k) := min_{w·μ=c₀} [ Σᵢ (μᵢ − kᵢ) + Σᵢ kᵢ log(kᵢ/μᵢ) ] is the likelihood ratio statistic of (21). As shown in Appendix D, this value is upper bounded by

exp(−R).   (25)

The minimizer μ̃ is characterized by the conditions

(c₀/wᵢ) − μ̃ᵢ + μ̃ᵢ log(μ̃ᵢ wᵢ/c₀) = R  if R ≤ R₀,   (28)

(c₀/w_M) − μ̃ᵢ + μ̃ᵢ log(μ̃ᵢ w_M/c₀) = R  if R > R₀,   (29)

where w_M := maxᵢ wᵢ and R₀ := c₀(w_M − wᵢ)/(wᵢ w_M) + (c₀/w_M) log(wᵢ/w_M).   (26)

V. ASYMPTOTIC THEORY

A. Fisher information

Assume that the data x₁, ..., x_n obey the independent and identical distribution of the same distribution family p_θ, and that n is sufficiently large. When the true parameter is close to θ₀, it is known that the meaningful information for θ is essentially given by the random variable (1/n) Σ_{i=1}^n l_{θ₀}(xᵢ), where the logarithmic derivative l_θ(x) is defined by

l_θ(x) := d log p_θ(x) / dθ.   (30)

In this case, the random variable (1/n) Σ_{i=1}^n l_{θ₀}(xᵢ) can be approximated by the normal distribution with expectation value 0 and variance J_{θ₀}/n, where the Fisher information J_θ is defined as J_θ := ∫ (l_θ(x))² P_θ(dx). Hence, the testing problem can be approximated by the testing of this normal distribution family [14, 18]. That is, the quality of testing is approximately evaluated by the Fisher information J_{θ₀} at the threshold θ₀.
In the case of the Poisson distribution family Poi(μt), the parameter μ can be estimated by X/t. The asymptotic case corresponds to the case of large t. In this case, the Fisher information is t/μ. When X obeys the unknown Poisson distribution Poi(μt), the estimation error of X/t is close to the normal distribution with variance μ/t, i.e., √t (X/t − μ) approaches a random variable obeying the normal distribution with variance μ. That is, the Fisher information corresponds to the inverse of the variance of the estimator.
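This correspondence between Fisher information and estimator variance can be checked by simulation. The sketch below is our own illustration (standard-library Python, hypothetical rate and time); it draws many Poisson samples and compares the empirical variance of X/t with 1/J = μ/t.

```python
import math
import random

def sample_poisson(mean, rng):
    # Chunked Knuth sampler (our helper): split large means into chunks
    # so that exp(-mean) does not underflow; sums of Poissons are Poisson.
    total = 0
    while mean > 0:
        m = min(mean, 500.0)
        mean -= m
        L, k, p = math.exp(-m), 0, 1.0
        while p > L:
            k += 1
            p *= rng.random()
        total += k - 1
    return total

rng = random.Random(42)
mu, t = 2.0, 400.0              # hypothetical rate μ and measurement time t
J = t / mu                      # Fisher information of Poi(μt) w.r.t. μ
samples = [sample_poisson(mu * t, rng) / t for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# empirical variance of the estimator X/t should approach μ/t = 1/J
```

With these values the empirical variance agrees with 1/J = 0.005 to within a few percent, which is the expected statistical fluctuation for 2000 samples.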
This approximation can be extended to the multi-parameter case {p_θ | θ ∈ ℝ^m}. Similarly, it is known that the testing problem can be approximated by the testing of the normal distribution family with covariance matrix (nJ_θ)^{−1}, where the Fisher information matrix J_{θ;i,j} is given by

J_{θ;i,j} := ∫ l_{θ;i}(x) l_{θ;j}(x) P_θ(dx),   (31)

l_{θ;i}(x) := ∂ log p_θ(x) / ∂θ_i.   (32)

When the hypotheses are given by (10), the testing problem can be approximated by the testing of the normal distribution family with variance wᵀJ_{θ₀}^{−1}w/n. Indeed, the same fact holds for the multinomial Poisson distribution family Poi(tμ₁, ..., tμ_m). When Xⱼ is the j-th random variable, the random variable Σⱼ (αⱼ/√t)(Xⱼ − tμⱼ) converges in distribution to a random variable obeying the normal distribution with variance Σⱼ αⱼ²μⱼ:

Σⱼ (αⱼ/√t)(Xⱼ − tμⱼ)  →ᵈ  N(0, Σⱼ αⱼ²μⱼ).   (33)

This convergence is compact uniform concerning the parameters μ₁, ..., μ_m. In this case, the Fisher information matrix J_μ is the diagonal matrix with diagonal elements (t/μ₁, ..., t/μ_m). When our distribution family is given as a subfamily Poi(tμ₁(θ), ..., tμ_m(θ)), the Fisher information matrix is Aᵀ J_{μ(θ)} A, where A_{i,j} := ∂μᵢ/∂θⱼ. Hence, when the hypotheses are given by (23), the testing problem can be approximated by the testing of the normal distribution family with variance

wᵀ (Aᵀ J_{μ(θ)} A)^{−1} w.   (34)

In the following, we call this value the Fisher information. Based on this value, the quality can be compared when we have several testing schemes.
B. p-values

In the following, we treat the p-values. First, we focus on the testing hypothesis (10) in the one-parameter normal distribution family with variance v. In this case, the p-value of the likelihood ratio tests is a function of the data x, the variance v, and the threshold θ₀, which is equal to Φ((x − θ₀)/√v), where

Φ(x) := ∫_{−∞}^{x} (1/√(2π)) e^{−y²/2} dy.

In the n-trial binomial distribution case, the distribution can be approximated by the normal distribution if n is sufficiently large. In this case, the p-value of the likelihood ratio tests is a function of the data k, the number of trials n, and the threshold θ₀, which is equal to

Φ( (k − nθ₀) / √(nθ₀(1−θ₀)) ).   (35)

Next, we focus on the Poisson distribution case Poi(μt). The p-value of the likelihood ratio tests is a function of the data n, the time t, and the threshold μ₀, which is almost equal to

Φ( (n − μ₀t) / √(μ₀t) ),   (36)

when the time t is sufficiently large.

Using (35), we consider the hypothesis (20) in the binomial Poisson distribution Poi(λ₁t, λ₂t). The p-value of the likelihood ratio tests is a function of the data n₁, n₂, and the threshold θ₀, which is almost equal to

Φ( (n₁ − (n₁+n₂)θ₀) / √((n₁+n₂)θ₀(1−θ₀)) ),   (37)


Next, we consider the hypothesis (23) in the multinomial Poisson distribution Poi(1 , , m ). In this case,
by using
i defined in (28) and (29), the upper bound of
the p-value is approximated to

Pm
Pm
1 j=1 jj
1 j=1 jj
= max q
,
max qP
Pm i
m i
w =c0
w =c0
i=1
2i

i=1
2i

(38)

because this convergence (33) is compact uniform concerning the parameter 1 , . . . , m . Letting xi = wci0i
and yi = wci 02 , we have
i

Pm
1 j=1 jj
1x
max qP
= max ,
m i
w =c0
y
(x,y)Co
i=1

(39)

2i

where Co is the convex hull of (x1 , y1 ), . . . , (xm , ym ).


That is, p-value is given by
( max

(x,y)Co

1x
).
y

(40)

VI. MODIFICATION OF VISIBILITY

In the two-photon interference, the coincidence counts on the bases |HH⟩, |VV⟩, |DD⟩, |XX⟩, |RL⟩, and |LR⟩ yield the maximum values (coincidence), whereas the coincidence counts on the bases |HV⟩, |VH⟩, |DX⟩, |XD⟩, |RR⟩, and |LL⟩ yield the minimum values (anti-coincidence). We can test the fidelity between the maximally entangled state |Φ⁺⟩⟨Φ⁺| and the given state σ using the total coincidence count k₁ and the total anti-coincidence count k₂ obtained by measuring on all the bases with time t/12 each. The total coincidence count k₁ obeys Poi((2θ+1)λt/12), the total anti-coincidence count k₂ obeys the distribution Poi((2−2θ)λt/12), and the Fisher information matrix concerning the parameters θ and λ is

( λt/(3(2θ+1)) + λt/(3(2−2θ))        0
  0                                   (2θ+1)t/(12λ) + (2−2θ)t/(12λ) ),   (41)

where the first element corresponds to the parameter θ and the second to the parameter λ. Then, we can apply the testing method given at the end of subsection IV C. On the basis of the discussion in subsection V A, the asymptotic variance (34) is calculated to be

1 / ( λt/(3(2θ+1)) + λt/(3(2−2θ)) ) = (2θ+1)(2−2θ) / (λt).   (42)

The above method uses the ratio

k₂ / (k₁ + k₂),

whose expectation is μ₂/(μ₁+μ₂) = (2−2θ)/3, where μ₁ = (2θ+1)λt/12 and μ₂ = (2−2θ)λt/12 are the expectation values of the total coincidence counts and the total anti-coincidence counts, respectively. Considering the definition of the visibility, (n_max − n_min)/(n_max + n_min) = 1 − 2n_min/(n_max + n_min), we can regard the above estimation of the fidelity as a modification of the visibility in a statistically well-defined manner. We will refer to it as the modified visibility method. In the following, we will propose several designs of experiment that improve on the modified visibility method.
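The modified visibility method reduces to a one-sided binomial test on the anti-coincidence fraction, so a p-value of the form (37) applies directly. The following is our own sketch of that computation (standard-library Python; the function names and example counts are illustrative, not from the paper).

```python
import math

def Phi(z):
    # standard normal cumulative distribution function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def modified_visibility_test(k1, k2, theta0):
    # Test H0: θ ≤ θ0 via the anti-coincidence fraction.
    # Under the model, k2/(k1+k2) estimates q = (2−2θ)/3, so
    # H0: θ ≤ θ0 is equivalent to q ≥ q0 := (2−2θ0)/3.
    n = k1 + k2
    q0 = (2 - 2 * theta0) / 3
    theta_hat = 1 - 1.5 * k2 / n                    # point estimate of θ
    z = (k2 - n * q0) / math.sqrt(n * q0 * (1 - q0))
    return theta_hat, Phi(z)                        # small p-value rejects H0

theta_hat, p = modified_visibility_test(9300, 240, 0.90)
```

For these illustrative counts the estimated fidelity is about 0.962 and the p-value is far below any usual risk probability, so the hypothesis θ ≤ 0.90 would be rejected.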

VII. DESIGN I (λ: UNKNOWN, ONE STAGE)

In this section, we consider the problem of testing the fidelity between the maximally entangled
state |(+) ih(+) | and the given state using data
(k1 , k2 , k3 ) subject to the multinomial Poisson distribu22
tion Poi( 2+1
6 t1 , 6 t2 , t3 ) with the assumption that
the parameter is unknown. In this problem, it is natural to assume that we can select the time allocation with
the constraint for the total time t1 + t2 + t3 = t.
The performance of the time allocation (t1, t2, t3) is evaluated by the variance (34). The Fisher information matrix is

  ( 2λt1/(3(2F+1)) + 2λt2/(3(2−2F))       (t1−t2)/3
         (t1−t2)/3       ( (2F+1)t1/6 + (2−2F)t2/6 + t3 )/λ ),   (43)

where the first element corresponds to the parameter F and the second one to the parameter λ. Then, the asymptotic variance (34) is calculated as

  [ (2F+1)t1/6 + (2−2F)t2/6 + t3 ]
  / { [ (2F+1)t1/6 + (2−2F)t2/6 + t3 ][ 2λt1/(3(2F+1)) + 2λt2/(3(2−2F)) ] − λ((t1−t2)/3)² }.   (44)

We optimize the time allocation by minimizing the variance (44). We perform the minimization by maximizing the inverse:

  λ[ 2t1/(3(2F+1)) + 2t2/(3(2−2F)) − ((t1−t2)/3)² / ( (2F+1)t1/6 + (2−2F)t2/6 + t3 ) ].

Applying Lemmas 1 and 2 shown in Appendix A to the case of a = 2/(3(2F+1)), b = 2/(3(2−2F)), c = (2F+1)/6, d = (2−2F)/6, we obtain

(i)   max_{t1+t3=t} λ[ 2t1/(3(2F+1)) − (t1/3)²/( (2F+1)t1/6 + t3 ) ]
      = 2λt / [ 3(2F+1)(1 + √((2F+1)/6))² ],   (45)

(ii)  max_{t2+t3=t} λ[ 2t2/(3(2−2F)) − (t2/3)²/( (2−2F)t2/6 + t3 ) ]
      = 2λt / [ 3(2−2F)(1 + √((2−2F)/6))² ],   (46)

and

(iii) max_{t1+t2=t} λ[ 2t1/(3(2F+1)) + 2t2/(3(2−2F)) − ((t1−t2)/3)²/( (2F+1)t1/6 + (2−2F)t2/6 ) ]
      = λt ( (1/3)√((2−2F)/(2F+1)) + (1/3)√((2F+1)/(2−2F)) )² / ( √((2F+1)/6) + √((2−2F)/6) )²
      = 6λt / [ (2F+1)(2−2F)(√(2F+1) + √(2−2F))² ],   (47)

using the results of (i) the coincidence and total flux measurements, (ii) the anti-coincidence and total flux measurements, and (iii) the coincidence and anti-coincidence measurements, respectively. The ratio of (47) to (45) is equal to

  3(√6 + √(2F+1))² / [ 2(2−2F)(√(2F+1) + √(2−2F))² ] > 1,   (48)

as shown in Appendix B. That is, the measurement using the coincidence and the anti-coincidence provides a better test than that using the coincidence and the total flux.
Hence, we compare (ii) with (iii), and obtain

  max_{t1+t2+t3=t} λ[ 2t1/(3(2F+1)) + 2t2/(3(2−2F)) − ((t1−t2)/3)²/( (2F+1)t1/6 + (2−2F)t2/6 + t3 ) ]
  = { 4λt / [ (2−2F)(√6 + √(2−2F))² ]                     if F1 < F ≤ 1,
      6λt / [ (2F+1)(2−2F)(√(2F+1) + √(2−2F))² ]          if 0 ≤ F ≤ F1,   (49)

where F1 < 1 is defined by

  2(2F1+1)(√(2F1+1) + √(2−2F1))² / [ 3(√6 + √(2−2F1))² ] = 1.   (50)

The optimal asymptotic variance is (2F+1)(2−2F)(√(2F+1)+√(2−2F))²/(6λt) when the threshold F0 is less than F1. This asymptotic variance is much better than that obtained by the modified visibility method. The approximated value of F1 is 0.899519. The equation (49) is derived in Appendix C.
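The threshold F1 in (50) has no closed form, but it is easy to obtain numerically; the sketch below (our own helper function, not from the paper) locates the root of Eq. (50) by bisection:

```python
import math

def eq50_lhs(F):
    """Left-hand side of Eq. (50); equals 1 at F = F1."""
    num = 2 * (2 * F + 1) * (math.sqrt(2 * F + 1) + math.sqrt(2 - 2 * F)) ** 2
    den = 3 * (math.sqrt(6) + math.sqrt(2 - 2 * F)) ** 2
    return num / den

# eq50_lhs is increasing on [0.5, 0.95] and crosses 1 exactly once there
lo, hi = 0.5, 0.95
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if eq50_lhs(mid) < 1.0:
        lo = mid
    else:
        hi = mid
F1 = 0.5 * (lo + hi)
print(F1)  # approximately 0.8995
```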
Fig. 1 shows the ratio of the optimal Fisher information obtained from the result of the anti-coincidence and total flux measurements to that obtained from the result of the coincidence and anti-coincidence measurements. When F1 ≤ F ≤ 1, the maximum Fisher information is attained by t1 = 0, t2 = √6 t/(√6 + √(2(1−F))), t3 = √(2(1−F)) t/(√6 + √(2(1−F))). Otherwise, the maximum is attained by t1 = √(2−2F) t/(√(2F+1) + √(2−2F)), t2 = √(2F+1) t/(√(2F+1) + √(2−2F)), t3 = 0. The optimal time allocation shown in Fig. 1 implies that we should measure the counts on the anti-coincidence bases preferentially over the other bases. The ratio of the optimal asymptotic variance to that of the modified visibility method is given by

  (√(2−2F) + √(1+2F))² / 6 < 1.   (51)

FIG. 1: The ratio of the optimal Fisher information (solid line) and the optimal time allocation as a function of the fidelity F. The measurement time is divided into three periods: coincidence t1 (plus signs), anti-coincidence t2 (circles), and total flux t3 (squares), which are normalized as t1+t2+t3 = 1 in the plot.

In the following, we give the optimal test of level α in the hypothesis testing (5). Assume that the threshold F0 is less than F1. In this case, we can apply the testing of the hypothesis (20). First, we measure the two-photon coincidence count on the coincidence bases for a period of t1 = t√(2−2F0)/(√(2F0+1) + √(2−2F0)) to obtain the total count n1. Then, we measure the count on the anti-coincidence bases for a period of t2 = t√(2F0+1)/(√(2F0+1) + √(2−2F0)) to obtain the total count n2. Note that the optimal time allocation depends on the threshold of our hypothesis. Finally, we apply the UMP test of the hypothesis

  H0 : p ≥ p0   versus   H1 : p < p0,   p0 = (2F0+1)t1 / ( (2F0+1)t1 + (2−2F0)t2 ),

with the binomial distribution family P_p^{n1+n2} to the data n1. We can apply a similar testing for F0 > F1. It is sufficient to replace the time allocation by t1 = 0, t2 = t√6/(√6 + √(2(1−F0))), t3 = t√(2(1−F0))/(√6 + √(2(1−F0))).

If the dark count parameter ν is known but is not negligible, the Fisher information matrix is given by

  ( 2λ²t1/(3((2F+1)λ+6ν)) + 2λ²t2/(3((2−2F)λ+6ν))
        λ(2F+1)t1/(3((2F+1)λ+6ν)) − λ(2−2F)t2/(3((2−2F)λ+6ν))
    λ(2F+1)t1/(3((2F+1)λ+6ν)) − λ(2−2F)t2/(3((2−2F)λ+6ν))
        (2F+1)²t1/(6((2F+1)λ+6ν)) + (2−2F)²t2/(6((2−2F)λ+6ν)) + t3/λ ).   (52)

Hence, from (34), the inverse of the minimum variance is equal to

  f(t1, t2, t3) := 2λ²t1/(3((2F+1)λ+6ν)) + 2λ²t2/(3((2−2F)λ+6ν))
    − λ³( (2F+1)t1/(3((2F+1)λ+6ν)) − (2−2F)t2/(3((2−2F)λ+6ν)) )²
      / ( (2F+1)²λt1/(6((2F+1)λ+6ν)) + (2−2F)²λt2/(6((2−2F)λ+6ν)) + t3 ).

Then, we apply Lemmas 1 and 2 in Appendix A to f(t1, t2, t3) with a = 2λ²/(3((2F+1)λ+6ν)), b = 2λ²/(3((2−2F)λ+6ν)), c = (2F+1)²λ/(6((2F+1)λ+6ν)), d = (2−2F)²λ/(6((2−2F)λ+6ν)), and obtain the optimized values:

(i) coincidence and total flux:

  max_{t1+t3=t} f(t1, 0, t3) = 4λ²t / ( (2F+1)√λ + √(6((2F+1)λ+6ν)) )²,   (53)

(ii) anti-coincidence and total flux:

  max_{t2+t3=t} f(0, t2, t3) = 4λ²t / ( (2−2F)√λ + √(6((2−2F)λ+6ν)) )²,   (54)

and (iii) coincidence and anti-coincidence:

  max_{t1+t2=t} f(t1, t2, 0)
  = 6λ²t / ( (2F+1)√((2−2F)λ+6ν) + (2−2F)√((2F+1)λ+6ν) )².   (55)

The ratio of (55) to (53) is

  3( (2F+1)√λ + √(6((2F+1)λ+6ν)) )²
  / [ 2( (2F+1)√((2−2F)λ+6ν) + (2−2F)√((2F+1)λ+6ν) )² ] > 1,   (56)

where the final inequality is derived in Appendix B. Therefore, the measurement using the coincidence and the anti-coincidence provides a better test than that using the coincidence and the total flux, as in the case of ν = 0. Define δ1 and F* for δ = 6ν/λ < 1 by

  √(δ1 + 3) − √(δ1) = √(3/2),
  √(2F* + 1 + δ) − √(2 − 2F* + δ) = √(3/2).

The parameter δ1 is calculated to be 0.375. As shown in Appendix C, the measurement using the coincidence and the anti-coincidence provides a better test than that using the anti-coincidence and the total flux, if the fidelity F is smaller than the threshold F*:


  max_{t1+t2+t3=t} f(t1, t2, t3)
  = { 4λ²t / ( (2−2F)√λ + √(6((2−2F)λ+6ν)) )²                                  if F > F*,
      6λ²t / ( (2F+1)√((2−2F)λ+6ν) + (2−2F)√((2F+1)λ+6ν) )²                    otherwise.   (57)

The optimal time allocation is given by t1 = 0, t2 = t√(6((2−2F)λ+6ν)) / ( √(6((2−2F)λ+6ν)) + (2−2F)√λ ), and t3 = t(2−2F)√λ / ( √(6((2−2F)λ+6ν)) + (2−2F)√λ ) for F > F*, and by

  t1 = t(2−2F)√((2F+1)λ+6ν) / ( (2F+1)√((2−2F)λ+6ν) + (2−2F)√((2F+1)λ+6ν) ),
  t2 = t(2F+1)√((2−2F)λ+6ν) / ( (2F+1)√((2−2F)λ+6ν) + (2−2F)√((2F+1)λ+6ν) ),
  t3 = 0

for F ≤ F*. The threshold F* for the optimal time allocation increases with the normalized dark count δ, as illustrated in Fig. 2.
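The two defining equations above can be checked numerically (a sketch with our own variable names). The first equation gives δ1 = 3/8 in closed form, and the second can be solved for F* by bisection; at δ = 0 the threshold F* coincides with F1 ≈ 0.8995, and at δ = δ1 it reaches 1:

```python
import math

# first defining equation: sqrt(delta1 + 3) - sqrt(delta1) = sqrt(3/2);
# squaring twice gives sqrt(6*delta1) = 3/2, i.e. delta1 = 3/8
delta1 = 3.0 / 8.0
print(abs(math.sqrt(delta1 + 3) - math.sqrt(delta1) - math.sqrt(1.5)) < 1e-12)  # True

def f_star(delta):
    """Solve sqrt(2F+1+delta) - sqrt(2-2F+delta) = sqrt(3/2) for F by bisection."""
    def g(F):
        return math.sqrt(2 * F + 1 + delta) - math.sqrt(2 - 2 * F + delta)
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if g(mid) < math.sqrt(1.5):   # g is increasing in F
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f_star(0.0))  # approximately 0.8995, the dark-count-free threshold F1
```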

FIG. 2: The threshold F* for the optimal time allocation as a function of the normalized dark count δ.

VIII. DESIGN II (λ: KNOWN, ONE STAGE)

In this section, we consider the case where λ is known. Then, the Fisher information is

  λ²( 2t1/(3((2F+1)λ+6ν)) + 2t2/(3((2−2F)λ+6ν)) ).   (58)

The maximum value is calculated as

  max_{t1+t2+t3=t; t3=0} (58) = { 2λ²t/(3((2F+1)λ+6ν))   if F < 1/4,
                                  2λ²t/(3((2−2F)λ+6ν))   if F ≥ 1/4.   (59)

The above optimization shows that when F ≥ 1/4, the anti-coincidence count (t1 = 0, t2 = t) is better than the coincidence count (t1 = t, t2 = 0). In fact, Barbieri et al. [12] measured the sum of the counts on the anti-coincidence bases |HV⟩, |VH⟩, |DX⟩, |XD⟩, |RR⟩, |LL⟩ to realize the entanglement witness in their experiment. In this case, the variance is 3((2−2F)λ+6ν)/(2λ²t). When we observe the sum n2 of the counting numbers, the estimated value of F is given by 1 − (3/λ)(n2/t − ν), which is the solution of ((2−2F)λ/6 + ν)t = n2. The UMP test is given from the UMP test of the Poisson distribution.

IX. COMPARISON OF THE ASYMPTOTIC VARIANCES

We compare the asymptotic variances of the following designs for the time allocation, when the dark count parameter ν is zero.

(i) Modified visibility: The asymptotic variance is (2F+1)(2−2F)/(λt).

(iia) Design I (λ unknown), optimal time allocation between the anti-coincidence count and the coincidence count: The asymptotic variance is (2F+1)(2−2F)(√(2F+1)+√(2−2F))²/(6λt).

(iib) Design I (λ unknown), optimal time allocation between the anti-coincidence count and the total flux count: The asymptotic variance is (2−2F)(√6+√(2−2F))²/(4λt).

(iiia) Design II (λ known), estimation from the anti-coincidence count: The asymptotic variance is 3(2−2F)/(2λt).

(iiib) Design II (λ known), estimation from the coincidence count: The asymptotic variance is 3(2F+1)/(2λt).

Fig. 3 shows the comparison, where the asymptotic variances in (iia)-(iiib) are normalized by the one in (i). The anti-coincidence measurement provides the best estimation for high (F > 0.25) fidelity. When λ is unknown, the measurement with the anti-coincidence count and the coincidence count is better than that with the anti-coincidence count and the total flux count for F < 0.899519. For higher fidelity, the anti-coincidence count and the total flux count turns out to be better, but the difference is small.

FIG. 3: Comparison of the designs for the time allocation. The asymptotic variances normalized by the value of the modified visibility method are shown as a function of the fidelity, where dots: (iia), solid: (iib), thick: (iiia), and dash: (iiib).
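The normalized variances listed above are easy to tabulate (a sketch; the function names are ours). As a consistency check, the curves (iia) and (iib) cross exactly at F1 ≈ 0.899519, and (iiia) lies below (iia) for high fidelity:

```python
import math

def ratio_iia(F):
    """(iia) normalized by (i): (sqrt(2F+1)+sqrt(2-2F))^2 / 6."""
    return (math.sqrt(2 * F + 1) + math.sqrt(2 - 2 * F)) ** 2 / 6

def ratio_iib(F):
    """(iib) normalized by (i): (sqrt(6)+sqrt(2-2F))^2 / (4(2F+1))."""
    return (math.sqrt(6) + math.sqrt(2 - 2 * F)) ** 2 / (4 * (2 * F + 1))

def ratio_iiia(F):
    """(iiia) normalized by (i): 3 / (2(2F+1))."""
    return 3 / (2 * (2 * F + 1))

def ratio_iiib(F):
    """(iiib) normalized by (i): 3 / (2(2-2F))."""
    return 3 / (2 * (2 - 2 * F))

F1 = 0.899519
print(abs(ratio_iia(F1) - ratio_iib(F1)))  # nearly zero: the curves cross at F1
```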

X. DESIGN III (λ: KNOWN, TWO STAGE)

A. Optimal Allocation

The comparison in the previous section shows that the measurement on the anti-coincidence bases yields a better variance than the measurement on the coincidence bases, when the fidelity is close to 1 and the parameters λ and ν are known. We will explore a further improvement in the measurement on the anti-coincidence bases. In the previous sections, we allocated an equal time to the measurement on each of the anti-coincidence bases. Here we minimize the variance by optimizing the time allocation tHV, tVH, tDX, tXD, tRR, and tLL between the anti-coincidence bases |HV⟩, |VH⟩, |DX⟩, |XD⟩, |RR⟩, and |LL⟩, respectively. The number nxy of the coincidence counts obeys the Poisson distribution Poi((λxy + ν)txy) with the unknown parameter λxy. Then, the Fisher information matrix is the diagonal matrix with the diagonal elements

  λ²tHV/(λHV+ν), λ²tVH/(λVH+ν), λ²tDX/(λDX+ν), λ²tXD/(λXD+ν), λ²tRR/(λRR+ν), λ²tLL/(λLL+ν).

Since we are interested in the parameter F = 1 − (1/(2λ))(λHV + λVH + λDX + λXD + λRR + λLL), the variance is given by

  (1/4)[ (λHV+ν)/(λ²tHV) + (λVH+ν)/(λ²tVH) + (λDX+ν)/(λ²tDX)
       + (λXD+ν)/(λ²tXD) + (λRR+ν)/(λ²tRR) + (λLL+ν)/(λ²tLL) ],   (60)

as mentioned in section V A. Under the restriction on the measurement time tHV + tVH + tDX + tXD + tRR + tLL = t, the minimum value of (60) is

  ( √(λHV+ν) + √(λVH+ν) + √(λDX+ν) + √(λXD+ν) + √(λRR+ν) + √(λLL+ν) )² / (4λ²t),   (61)

which is attained by the optimal time allocation

  txy = √(λxy+ν) t / ( √(λHV+ν) + √(λVH+ν) + √(λDX+ν) + √(λXD+ν) + √(λRR+ν) + √(λLL+ν) ),   (62)

called the Neyman allocation. The variance with the equal allocation is

  3((2−2F)λ+6ν)/(2λ²t) = 3((λHV+λVH+λDX+λXD+λRR+λLL) + 6ν)/(2λ²t).   (63)

The inequality (61) ≤ (63) can be derived from Schwarz's inequality for the vectors (1, ..., 1) and (√(λHV+ν), ..., √(λLL+ν)). The equality holds if and only if λHV = λVH = λDX = λXD = λRR = λLL. Therefore, the Neyman allocation has an advantage over the equal allocation when there is a bias in the parameters λHV, λVH, λDX, λXD, λRR, λLL. In other words, the Neyman allocation is effective when the expectation values of the coincidence counts on some bases are larger than those on the other bases.
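A short sketch of the Neyman allocation (62) and the variance comparison (61) vs (63) (our own function names; the rates below are illustrative, not measured values):

```python
import math

def neyman_allocation(rates, nu, t):
    """Time allocation t_xy proportional to sqrt(lambda_xy + nu), Eq. (62)."""
    weights = [math.sqrt(r + nu) for r in rates]
    total = sum(weights)
    return [w * t / total for w in weights]

def variance(rates, nu, times, lam):
    """Eq. (60): (1/4) * sum of (lambda_xy + nu) / (lam^2 * t_xy)."""
    return sum((r + nu) / (lam ** 2 * txy) for r, txy in zip(rates, times)) / 4

# anisotropic anti-coincidence rates (counts/s), illustrative values
rates, nu, lam, t = [6.0, 3.0, 13.0, 20.0, 11.0, 23.0], 0.0, 290.0, 240.0
t_opt = neyman_allocation(rates, nu, t)
t_eq = [t / 6] * 6
print(variance(rates, nu, t_opt, lam) <= variance(rates, nu, t_eq, lam))  # True
```

For equal rates the two allocations coincide, reflecting the equality condition of Schwarz's inequality.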
B. Two-stage Method

The optimal time allocation derived above is not applicable in the experiment, because it depends on the unknown parameters λHV, λVH, λDX, λXD, λRR, and λLL. In order to resolve this problem, we introduce a two-stage method, where the total measurement time t is divided into tf for the first stage and ts for the second stage under the condition t = tf + ts. In the first stage, we measure the coincidence counts on each basis for tf/6 and estimate the expectation values for the Neyman allocation of the measurement time ts. In the second stage, we measure the coincidence counts on a basis |xA yB⟩ according to the estimated Neyman allocation. The two-stage method is formulated as follows.

(i) The measurement time for each basis in the first stage is given by tf/6.

(ii) In the second stage, we measure the coincidence counts on a basis |xA yB⟩ for the time txy defined as

  txy = √(mxy) (t − tf) / Σ_{(x,y)∈B} √(mxy),

where mxy is the observed count in the first stage.

(iii) Define λ̂xy and F̂ as

  λ̂xy = nxy/txy − ν,    F̂ = 1 − (1/(2λ)) Σ_{(x,y)∈B} λ̂xy,

where nxy is the number of the counts on |xA yB⟩ for the time txy. We test the hypothesis (5) by

  T = { 0  if F̂ ≤ c0,
        1  if F̂ > c0,

where c0 is a constant which makes the level α.
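The two-stage allocation rule (ii) can be sketched as follows; with the first-stage counts reported later in Section XI (6, 3, 13, 20, 11, and 23 in one second each) and 234 remaining seconds, it reproduces the allocation quoted there:

```python
import math

def two_stage_allocation(m_counts, t_remaining):
    """Second-stage times proportional to sqrt(m_xy), as in step (ii)."""
    weights = [math.sqrt(m) for m in m_counts]
    total = sum(weights)
    return [w * t_remaining / total for w in weights]

# first-stage counts on |HV>, |VH>, |DX>, |XD>, |RR>, |LL>
alloc = two_stage_allocation([6, 3, 13, 20, 11, 23], 234.0)
print([round(x, 2) for x in alloc])  # [28.14, 19.9, 41.42, 51.37, 38.1, 55.09]
```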
XI. ANALYSIS OF EXPERIMENTAL DATA

The experimental set-up for the hypothesis testing is shown in Fig. 4. The nonlinear crystals (BBO), the optical axes of which were set orthogonal to one another, were pumped by a pulsed UV light polarized in the 45° direction to the optical axes of the crystals. One nonlinear crystal generates two photons polarized in the horizontal direction (|HH⟩) from the vertical component of the pump light, and the other generates photons polarized in the vertical direction (|VV⟩) from the horizontal component of the pump. The second harmonic of a mode-locked Ti:sapphire laser light of about 100 fs duration and 150 mW average power was used to pump the nonlinear crystals. The wavelength of the SPDC photons was thus 800 nm. The group velocity dispersion and birefringence in the crystals may shift the space-time positions of the generated photons and make the two processes distinguishable [16]. Fortunately, this timing information can be erased by compensation; the horizontal component of the pump pulse should arrive at the nonlinear crystals earlier than the vertical component. The compensation can be done by putting a set of birefringent plates (quartz) and a variable wave plate before the crystals. We could control the two-photon state from highly entangled states to separable states by shifting the compensation from the optimal setting.

FIG. 4: Schematic of the entangled photon pair generation by spontaneous parametric down conversion. A cascade of the nonlinear crystals (NLC) generates the photon pairs. Group velocity dispersion and birefringence in the NLCs are precompensated with quartz plates and a Bereck compensator. Two-photon states are analyzed with half wave plates (HWP), quarter wave plates (QWP), and polarization beam splitters (PBS). Interference filters (IF) are placed before the single photon counting modules (SPCM).

The coincidence count on the basis |xA yB⟩ was measured by adjusting the half wave plates (HWPs) and the quarter wave plates (QWPs) in Fig. 4. We accumulated the coincidence counts for one second, and recorded the counts every one second. Therefore, the time allocation of the measurement time on a basis must be an integral multiple of one second. Figure 5 shows the histograms of the coincidence counts in one second on the bases

  B = {|VH⟩, |HV⟩, |XD⟩, |DX⟩, |RR⟩, |LL⟩},

when the visibility of the two-photon states was estimated to be 0.92. The measurement time was 40 seconds on each basis. The distribution of the coincidence events obeys the Poisson distribution. Only small numbers of coincidence were observed on the |HV⟩ and |VH⟩ bases. Those observations agree with the prediction; therefore, we expect that the hypothesis testing in the previous section can be applied.

FIG. 5: Distribution of the coincidence counts obtained in one second on the bases |VH⟩, |HV⟩, |XD⟩, |DX⟩, |RR⟩, and |LL⟩. Bars present the histograms of the measured numbers, and lines show the Poisson distributions with the mean values estimated from the experiment. The measurement time was 40 seconds for each basis.
In the following, we compare four testing methods on the experimental data with the fixed total time t. The testing methods employ different time allocations {tHH, tVV, tDD, tXX, tRL, tLR, tHV, tVH, tDX, tXD, tRR, tLL} between the measurement bases:

(i) Modified visibility method: λ is unknown. The coincidence and the anti-coincidence are measured with the equal time allocation;

  tHH = tVV = tDD = tXX = tRL = tLR = tHV = tVH = tDX = tXD = tRR = tLL = t/12.   (64)

(ii) Design I: λ is unknown. The coincidence and anti-coincidence counts are measured with the optimal time allocation at the target threshold F0;

  tHH = tVV = tDD = tXX = tRL = tLR = t1/6,
  tHV = tVH = tDX = tXD = tRR = tLL = t2/6,   (65)
where

  t1 = t√(2−2F0) / ( √(2F0+1) + √(2−2F0) ),
  t2 = t√(2F0+1) / ( √(2F0+1) + √(2−2F0) ).   (66)

(iii) Design II: λ is known. Only the anti-coincidence counts are measured, with the equal time allocation:

  tHV = tVH = tDX = tXD = tRR = tLL = t/6.   (67)

(iv) Design III: λ is known. Only the anti-coincidence counts are measured. The time allocation is given by the two-stage method:

  tHV = tVH = tDX = tXD = tRR = tLL = tf/6   (68)

in the first stage, and

  txy = √(mxy)(t − tf) / Σ_{(x,y)∈B} √(mxy)   (69)

in the second stage. The observed count mxy in the first stage determines the time allocation in the second stage.
We have compared the p-values at the fixed threshold F0 = 7/8 = 0.875 with the total measurement time t = 240 seconds. As shown in section III B, the p-value measures the minimum risk probability to reject the hypothesis H0, i.e., the probability to make an erroneous decision to accept insufficiently entangled states with the fidelity less than the threshold. The results of the experiment and the analysis of the obtained data are described in the following.

In the method (i), we measured the coincidence on each basis for 20 seconds. Using the total coincidence count n1 and the total anti-coincidence count n2, and applying (37), we calculated the p-value approximately by

  Φ( (n2(2F0+1) − n1(2−2F0)) / √( (n1+n2)(2F0+1)(2−2F0) ) ).

We obtained n1 = 9686 and n2 = 868 in the experiment, which yielded the p-value 0.343.

In the method (ii), the optimal time allocation was calculated with (66) to be t1 = 55.6 seconds and t2 = 184.4 seconds. However, since the time allocation should be an integral multiple of a second in our experiment, we used the time allocation t1 = 54 and t2 = 186. That is, we measured the coincidence count on each coincidence basis for 9 seconds and on each anti-coincidence basis for 31 seconds. Using the total coincidence count n1 and the total anti-coincidence count n2, and applying (37), we calculated the p-value approximately by

  Φ( (n2(t1/6)(2F0+1) − n1(t2/6)(2−2F0)) / √( (n1+n2)(t1/6)(2F0+1)(t2/6)(2−2F0) ) ).

We obtained n1 = 7239 and n2 = 2188 in the experiment, which yielded the p-value 0.0736.

In the method (iii), we measured the coincidence count on each anti-coincidence basis for 40 seconds. Using the total anti-coincidence count n, and applying (37), we calculated the p-value approximately by

  Φ( (n − (t/6)(2−2F0)λ) / √( (t/6)(2−2F0)λ ) ).

We used λ = 290 estimated from another experiment. We obtained n = 2808 in the experiment, which yielded the p-value 0.0438.

In the method (iv), the calculation is rather complicated. Similarly to (iii), λ was estimated to be 290 from another experiment. In the first stage, we measured the coincidence count on each anti-coincidence basis for tf/6 = 1 second. We obtained the counts 6, 3, 13, 20, 11, and 23 on the bases |HV⟩, |VH⟩, |DX⟩, |XD⟩, |RR⟩, and |LL⟩, respectively. We made the time allocation of the remaining 234 seconds for the second stage according to (69), and obtained tHV = 28.14, tVH = 19.90, tDX = 41.42, tXD = 51.37, tRR = 38.10, and tLL = 55.09. Since the time allocation should be an integral multiple of a second in our experiment, we used the time allocation {tHV, tVH, tDX, tXD, tRR, tLL} = {28, 20, 42, 51, 38, 55}. We obtained the anti-coincidence counts nHV = 99, nVH = 66, nDX = 703, nXD = 863, nRR = 531, and nLL = 853. Applying the counts and the time allocation to the formula (40), we obtained the p-value 0.0308.
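The normal-approximation p-values quoted above can be reproduced directly from the reported counts (a sketch; Φ is the standard normal distribution function):

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

F0 = 0.875
a, b = 2 * F0 + 1, 2 - 2 * F0   # 2.75 and 0.25

# method (i): n1 = 9686, n2 = 868, equal time allocation
n1, n2 = 9686, 868
p_i = Phi((n2 * a - n1 * b) / math.sqrt((n1 + n2) * a * b))

# method (ii): n1 = 7239, n2 = 2188, t1 = 54 s, t2 = 186 s
n1, n2, t1, t2 = 7239, 2188, 54.0, 186.0
p_ii = Phi((n2 * (t1 / 6) * a - n1 * (t2 / 6) * b)
           / math.sqrt((n1 + n2) * (t1 / 6) * a * (t2 / 6) * b))

# method (iii): n = 2808, lambda = 290, t = 240 s
n, lam, t = 2808, 290.0, 240.0
mu0 = (t / 6) * b * lam
p_iii = Phi((n - mu0) / math.sqrt(mu0))

print(round(p_i, 3), round(p_ii, 4), round(p_iii, 4))  # 0.343 0.0736 0.0438
```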
The p-values obtained in the four methods are summarized in the table below. We also calculated the p-values at different values of the threshold F0. The results are shown in Figs. 6 and 7. As clearly seen, the optimal time allocation between the coincidence bases measurement and the anti-coincidence bases measurement reduces the risk of a wrong decision on the fidelity (the p-value) in analyzing the experimental data. The counts on the anti-coincidence bases are much more sensitive to the degradation of the entanglement. This matches our intuition that the deviation from zero provides a more efficient measure than that from the maximum does. The comparison between (iii) and (iv) shows that the risk can be reduced further by the time allocation between the anti-coincidence bases, as shown in Fig. 7. The optimal (Neyman) allocation implies that the measurement time should be allocated preferably to the bases that yield more coincidence counts. Under the present experimental conditions, the optimal allocation reduces the risk probability to about 75%. The improvement should increase with the fidelity. However, the experiment showed almost no gain when the visibility was larger than 0.95. At such high visibility, the errors from the maximally entangled state are covered by the dark counts, which are independent of the setting of the measurement apparatus.

  Method:            (i)    (ii)    (iii)   (iv)
  p-value at 0.875   0.343  0.0736  0.0438  0.0308

FIG. 6: Calculated p-value as a function of the threshold. Dash-dot: (i) the modified visibility, dash: (ii) design I, dot: (iii) design II, solid: (iv) design III.

FIG. 7: Calculated p-value as a function of the threshold (magnified). Dash: (ii) design I, dots: (iii) design II, solid: (iv) design III.

XII. CONCLUSION

We have formulated the hypothesis testing scheme to test the entanglement of the two-photon state generated by SPDC. Our statistical method can properly handle the fluctuations in the experimental data. It has been shown that the optimal time allocation improves the test: the measurement time should be allocated preferably to the anti-coincidence bases. This design is particularly useful for the experimental test, because the optimal time allocation depends only on the threshold of the test. We don't need any further information about the probability distribution of the tested state. The test can be further improved by optimizing the time allocation between the anti-coincidence bases, when the error from the maximally entangled state is anisotropic. However, this time allocation requires the expectation values of the coincidence counts, so that we need to apply the two-stage method.

APPENDIX A: OPTIMIZATION OF FISHER INFORMATION

In this section, we maximize the quantities appearing in the Fisher information.

Lemma 1 The equation

  max_{t1,t3≥0, t1+t3=t} [ at1 − act1²/(ct1+t3) ] = at/(√c+1)²   (A1)

holds, and the maximum value is attained when t1 = t/(√c+1), t3 = √c t/(√c+1).

Proof: Letting x := ct1 + t3, we have t1 = (x−t)/(c−1). Then,

  at1 − act1²/(ct1+t3) = −[ a/(c−1)² ]( ct²/x + x − (c+1)t ).

Hence, the maximum is attained at x = √c t, i.e., t1 = t/(√c+1) and t3 = √c t/(√c+1). Thus,

  max_{t1,t3≥0, t1+t3=t} [ at1 − act1²/(ct1+t3) ] = −[ a/(c−1)² ]( 2√c t − (c+1)t ) = at/(√c+1)².

Lemma 2 The equation

  max_{t1,t2≥0, t1+t2=t} [ at1 + bt2 − (√(ac) t1 − √(bd) t2)²/(ct1+dt2) ]
  = t(√(ad)+√(bc))² / (√c+√d)²   (A2)

holds, and this maximum value is attained when t1 = √d t/(√c+√d), t2 = √c t/(√c+√d).

Proof: Letting x := ct1 + dt2, we have t1 = (x−dt)/(c−d) and t2 = (ct−x)/(c−d). Then,

  at1 + bt2 − (√(ac)t1 − √(bd)t2)²/(ct1+dt2)
  = ( (√(ad)+√(bc))/(c−d) )² ( (c+d)t − x − cdt²/x ).

Hence, the maximum is attained at x = √(cd) t, i.e., t1 = √d t/(√c+√d) and t2 = √c t/(√c+√d). Thus,

  max_{t1,t2≥0, t1+t2=t} [ at1 + bt2 − (√(ac)t1 − √(bd)t2)²/(ct1+dt2) ]
  = ( (√(ad)+√(bc))/(c−d) )² ( (c+d)t − 2√(cd) t ) = t(√(ad)+√(bc))²/(√c+√d)².
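Lemmas 1 and 2 are easy to sanity-check numerically (a sketch with arbitrary positive a, b, c, d of our choosing); a fine grid search over the constraint set should not beat the closed-form maxima:

```python
import math

a, b, c, d, t = 0.7, 1.3, 4.0, 0.5, 1.0

def obj1(t1):
    """Objective of Lemma 1 with t3 = t - t1."""
    t3 = t - t1
    return a * t1 - a * c * t1 ** 2 / (c * t1 + t3)

def obj2(t1):
    """Objective of Lemma 2 with t2 = t - t1."""
    t2 = t - t1
    return (a * t1 + b * t2
            - (math.sqrt(a * c) * t1 - math.sqrt(b * d) * t2) ** 2
              / (c * t1 + d * t2))

max1 = a * t / (math.sqrt(c) + 1) ** 2                      # Eq. (A1)
max2 = (t * (math.sqrt(a * d) + math.sqrt(b * c)) ** 2
        / (math.sqrt(c) + math.sqrt(d)) ** 2)               # Eq. (A2)

grid1 = max(obj1(i / 10000 * t) for i in range(10001))
grid2 = max(obj2(i / 10000 * t) for i in range(10001))
print(max1 >= grid1 - 1e-9, max2 >= grid2 - 1e-9)  # True True
```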
Further, the three-parameter case can be maximized as follows.

Lemma 3 The maximum value

  max_{t1,t2,t3≥0, t1+t2+t3=t} [ at1 + bt2 − (√(ac)t1 − √(bd)t2)²/(ct1+dt2+t3) ]

is equal to the maximum among the three values

  max_{t1,t3≥0, t1+t3=t} [ at1 − act1²/(ct1+t3) ],
  max_{t2,t3≥0, t2+t3=t} [ bt2 − bdt2²/(dt2+t3) ],
  max_{t1,t2≥0, t1+t2=t} [ at1 + bt2 − (√(ac)t1 − √(bd)t2)²/(ct1+dt2) ].

Proof: Define the two parameters x := ct1 + dt2 + t3 and y := √(ac)t1 − √(bd)t2. Then, the range of (x, y) forms a convex set. Since

  t1 = [ √(bd)(x−t) + (d−1)y ] / [ √(bd)(c−1) + √(ac)(d−1) ],
  t2 = [ √(ac)(x−t) − (c−1)y ] / [ √(bd)(c−1) + √(ac)(d−1) ],

we have

  at1 + bt2 − y²/x
  = A(x−t) + By − y²/x
  = −(1/x)( y − (B/2)x )² + ( B²/4 + A )x − At,

where A := (a√(bd)+b√(ac))/(√(bd)(c−1)+√(ac)(d−1)) and B := (a(d−1)−b(c−1))/(√(bd)(c−1)+√(ac)(d−1)). Applying Lemma 4, we obtain this lemma.

Lemma 4 Define the function f(x, y) := −(1/x)(y − x)² + x on a closed convex set C with x > 0. The maximum value is attained at the boundary bd C.

Proof: The condition can be classified into two cases: i) bd C ∩ {y = x} = ∅, ii) bd C ∩ {y = x} ≠ ∅. In case i), when x is fixed, max_{y:(x,y)∈C} f(x,y) = max_{y:(x,y)∈bd C} f(x,y). Then, we obtain max_{(x,y)∈C} f(x,y) = max_{(x,y)∈bd C} f(x,y). In case ii), when (x,x) ∈ C, max_{y:(x,y)∈C} f(x,y) = f(x,x) = x. Hence, max_{x:(x,x)∈C} max_{y:(x,y)∈C} f(x,y) = max_{x:(x,x)∈C} x. This maximum is attained at x = max{x : (x,x) ∈ C} or x = min{x : (x,x) ∈ C}. These points belong to the boundary bd C. Further, max_{x:(x,x)∉C} max_{y:(x,y)∈C} f(x,y) = max_{x:(x,x)∉C} max_{y:(x,y)∈bd C} f(x,y). Thus, the proof is completed.

APPENDIX B: PROOF OF THE INEQUALITIES (48) AND (56)

It is sufficient to show

  √(3/2)( (2F+1)√λ + √(6((2F+1)λ+6ν)) )
   − ( (2F+1)√((2−2F)λ+6ν) + (2−2F)√((2F+1)λ+6ν) ) > 0;   (B1)

the inequality (48) is the special case ν = 0. By putting η := 6ν/λ, the LHS is evaluated as

  LHS of (B1)
  = √λ ( √(3/2)(2F+1) + 3√((2F+1)+η) − (2F+1)√((2−2F)+η) − (2−2F)√((2F+1)+η) )
  = √λ ( √(3/2)(2F+1) + (2F+1)√((2F+1)+η) − (2F+1)√((2−2F)+η) )
  = √λ (2F+1) ( √(3/2) + √((2F+1)+η) − √((2−2F)+η) ).

Since 0 ≤ F ≤ 1, we have

  √(3/2) + √((2F+1)+η) − √((2−2F)+η) ≥ √(3/2) + √(1+η) − √(2+η).

Further, the function η ↦ √(1+η) − √(2+η) (η ∈ [0,∞)) has the minimum 1 − √2 > −1 > −√(3/2) at η = 0. Hence, LHS of (B1) > 0.
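A quick numerical scan also confirms (B1), and hence (48) and (56) (a sketch; the grid values are arbitrary):

```python
import math

def lhs_B1(F, lam, nu):
    """Left-hand side of inequality (B1)."""
    return (math.sqrt(1.5) * ((2 * F + 1) * math.sqrt(lam)
                              + math.sqrt(6 * ((2 * F + 1) * lam + 6 * nu)))
            - ((2 * F + 1) * math.sqrt((2 - 2 * F) * lam + 6 * nu)
               + (2 - 2 * F) * math.sqrt((2 * F + 1) * lam + 6 * nu)))

ok = all(lhs_B1(i / 100, 1.0, nu) > 0
         for i in range(101)
         for nu in (0.0, 0.05, 0.2, 1.0))
print(ok)  # True
```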

APPENDIX C: PROOF OF THE EQUATIONS (49) AND (57)

It is sufficient to show that

  √(3/2)( (2−2F)√λ + √(6((2−2F)λ+6ν)) )
   − ( (2F+1)√((2−2F)λ+6ν) + (2−2F)√((2F+1)λ+6ν) ) > 0   (C1)

if and only if F < 1 and F < F*. By putting η := 6ν/λ, the LHS of (C1) is evaluated as

  LHS of (C1)
  = √λ ( √(3/2)(2−2F) + 3√((2−2F)+η) − (2F+1)√((2−2F)+η) − (2−2F)√((2F+1)+η) )
  = √λ ( √(3/2)(2−2F) + (2−2F)√((2−2F)+η) − (2−2F)√((2F+1)+η) )
  = √λ (2−2F) ( √(3/2) + √((2−2F)+η) − √((2F+1)+η) ).

Since 0 ≤ F ≤ 1 and η ≥ 0, the factor 2−2F is positive if and only if F < 1, and the last factor is positive if and only if

  √((2F+1)+η) − √((2−2F)+η) < √(3/2),

i.e., if and only if F < F*, because the left-hand side is monotonically increasing in F and equals √(3/2) at F = F*. Hence, the LHS of (C1) is positive if and only if F < 1 and F < F*.

APPENDIX D: PROOF OF (26) AND (27)

First, we prove (26). Define āi by

  min_{θ: Σj wjθj = c0} D( Poi(0,...,0,āi,0,...,0) ‖ Poi(θ1,...,θm) ) = R.

In fact, since D(Poi(μ1,...,μm)‖Poi(θ1,...,θm)) = Σj ( θj − μj + μj log(μj/θj) ), we have, for the vector with the count a at the i-th position,

  min_{θj≥0: Σj wjθj = c0} D( Poi(0,...,0,a,0,...,0) ‖ Poi(θ1,...,θm) )
  = min_{θj≥0: Σj wjθj = c0} ( Σj θj − a + a log(a/θi) )
  = { c0/wi − a + a log(a wi/c0)              if a ≥ c0(wM − wi)/(wM wi),
      c0/wM − a log( wM/(wM − wi) )           if a < c0(wM − wi)/(wM wi),

where wM := maxj wj. This value is monotonically decreasing with respect to a. When a = c0(wM − wi)/(wM wi), this value equals c0/wM − [c0(wM − wi)/(wM wi)] log( wM/(wM − wi) ). Hence, we obtain (28) and (29).

Next, we prove (27). It is sufficient to show that

  Σi pi min_{θ: Σj wjθj = c0} D( Poi(0,...,0,ai,0,...,0) ‖ Poi(θ1,...,θm) )
  ≥ min_{θ: Σj wjθj = c0} D( Poi(p1 a1,...,pm am) ‖ Poi(θ1,...,θm) ).   (D1)

We choose θ1,i, ..., θm,i such that

  min_{θ: Σj wjθj = c0} D( Poi(0,...,0,ai,0,...,0) ‖ Poi(θ1,...,θm) )
  = D( Poi(0,...,0,ai,0,...,0) ‖ Poi(θ1,i,...,θm,i) ).

Then,

  Σi pi min_{θ: Σj wjθj = c0} D( Poi(0,...,0,ai,0,...,0) ‖ Poi(θ1,...,θm) )
  = Σi pi D( Poi(0,...,0,ai,0,...,0) ‖ Poi(θ1,i,...,θm,i) )
  ≥ D( Poi(p1 a1,...,pm am) ‖ Poi(Σi pi θ1,i,...,Σi pi θm,i) )
  ≥ min_{θ: Σj wjθj = c0} D( Poi(p1 a1,...,pm am) ‖ Poi(θ1,...,θm) ),

where the first inequality follows from Lemma 5 and the second from the fact that the mixture Σi pi θj,i also satisfies the constraint Σj wj Σi pi θj,i = c0.

Lemma 5 Any real number 0 ≤ p ≤ 1 and any four sequences of positive numbers (αi), (βi), (γi), and (δi) satisfy

  p( Σi (γi − αi) + Σi αi log(αi/γi) ) + (1−p)( Σi (δi − βi) + Σi βi log(βi/δi) )
  ≥ Σi ( (pγi + (1−p)δi) − (pαi + (1−p)βi) )
    + Σi (pαi + (1−p)βi) log( (pαi + (1−p)βi)/(pγi + (1−p)δi) ),

i.e., p D(Poi(α)‖Poi(γ)) + (1−p) D(Poi(β)‖Poi(δ)) ≥ D(Poi(pα+(1−p)β)‖Poi(pγ+(1−p)δ)).

Proof: It is sufficient to show

  p Σi αi log(αi/γi) + (1−p) Σi βi log(βi/δi)
  ≥ Σi (pαi + (1−p)βi) log( (pαi + (1−p)βi)/(pγi + (1−p)δi) ),

because the remaining terms on both sides coincide. The convexity of −log implies that

  log( (pγi + (1−p)δi)/(pαi + (1−p)βi) )
  = log( [pαi/(pαi+(1−p)βi)](γi/αi) + [(1−p)βi/(pαi+(1−p)βi)](δi/βi) )
  ≥ [pαi/(pαi+(1−p)βi)] log(γi/αi) + [(1−p)βi/(pαi+(1−p)βi)] log(δi/βi).

Multiplying both sides by (pαi + (1−p)βi) and summing over i, we obtain the desired inequality.
[1] A. Aspect, P. Grangier, and G. Roger, Phys. Rev. Lett. 49, 91 (1982).
[2] J. S. Bell, Speakable and Unspeakable in Quantum Mechanics: Collected Papers on Quantum Philosophy (Cambridge University Press, Cambridge, 1993).
[3] P. W. Shor, SIAM J. Comput. 26, 1484 (1997).
[4] C. H. Bennett, G. Brassard, C. Crépeau, R. Jozsa, A. Peres, and W. K. Wootters, Phys. Rev. Lett. 70, 1895 (1993).
[5] H.-J. Briegel, W. Dür, J. I. Cirac, and P. Zoller, Phys. Rev. Lett. 81, 5932 (1998).
[6] C. H. Bennett and G. Brassard, in Proc. Int. Conf. Comput. Syst. Signal Process., Bangalore, 1984, pp. 175-179.
[7] P. W. Shor and J. Preskill, Phys. Rev. Lett. 85, 441 (2000).
[8] C. W. Helstrom, Quantum Detection and Estimation Theory (Academic Press, 1976).
[9] A. S. Holevo, Probabilistic and Statistical Aspects of Quantum Theory (North-Holland, 1982).
[10] M. Hayashi, Asymptotic Theory of Quantum Statistical Inference: Selected Papers (World Scientific, 2005).
[11] A. G. White, D. F. V. James, P. H. Eberhard, and P. G. Kwiat, Phys. Rev. Lett. 83, 3103 (1999).
[12] M. Barbieri, F. De Martini, G. Di Nepi, P. Mataloni, G. M. D'Ariano, and C. Macchiavello, Phys. Rev. Lett. 91, 227901 (2003).
[13] Y. Tsuda, K. Matsumoto, and M. Hayashi, Hypothesis testing for a maximally entangled state, quant-ph/0504203.
[14] E. L. Lehmann, Testing Statistical Hypotheses, 2nd ed. (Wiley, 1986).
[15] P. G. Kwiat, E. Waks, A. G. White, I. Appelbaum, and P. H. Eberhard, Phys. Rev. A 60, R773 (1999).
[16] Y. Nambu, K. Usami, Y. Tsuda, K. Matsumoto, and K. Nakamura, Phys. Rev. A 66, 033816 (2002).
[17] K. Usami, Y. Nambu, Y. Tsuda, K. Matsumoto, and K. Nakamura, Accuracy of quantum-state estimation utilizing Akaike's information criterion, Phys. Rev. A 68, 022314 (2003).
[18] S. Amari and H. Nagaoka, Methods of Information Geometry (AMS & Oxford University Press, 2000).
