
Author(s): B. K. Ghosh

Source: Journal of the American Statistical Association, Vol. 74, No. 368 (Dec., 1979), pp.

894-900

Published by: Taylor & Francis, Ltd. on behalf of the American Statistical Association

Stable URL: http://www.jstor.org/stable/2286420

Accessed: 19-05-2017 20:24 UTC


This content downloaded from 148.228.127.19 on Fri, 19 May 2017 20:24:22 UTC

All use subject to http://about.jstor.org/terms

A Comparison of Some Approximate Confidence Intervals for the Binomial Parameter

B. K. GHOSH*

Two approximate confidence intervals for the binomial parameter are frequently recommended for large samples. We show that one of these, which is in fact less popular in the literature, enjoys certain advantages over the other one. The criteria used for comparison are the confidence coefficients, the lengths, and the Neyman shortness of the intervals.

KEY WORDS: Confidence intervals; Normal approximation; Neyman shortness; Asymptotic relative efficiency; Arcsine transformation.

u'_n(X) = [2p̂ + θ² − θ(θ² + 4p̂q̂)^(1/2)][2(1 + θ²)]^(-1) ,
u''_n(X) = [2p̂ + θ² + θ(θ² + 4p̂q̂)^(1/2)][2(1 + θ²)]^(-1) ,  (1.3)

v'_n(X) = p̂ − θ(p̂q̂)^(1/2) ,
v''_n(X) = p̂ + θ(p̂q̂)^(1/2) ,  (1.4)

and

p̂ = X/n ,  q̂ = 1 − p̂ ,  θ = z_α/n^(1/2) .  (1.5)
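In modern terminology, (1.3) is the score (Wilson) interval and (1.4) the Wald interval. Both can be computed directly from (1.3)-(1.5); the following is a minimal Python sketch (the function names are ours, and Python 3.8's statistics.NormalDist supplies z_α):

```python
from math import sqrt
from statistics import NormalDist

def z_alpha(alpha):
    # Phi(z_alpha) = (1 + alpha)/2, as defined below (1.1)
    return NormalDist().inv_cdf((1.0 + alpha) / 2.0)

def interval_u(x, n, alpha):
    # I_u of (1.3), with p_hat, q_hat, theta as in (1.5)
    p = x / n
    q = 1.0 - p
    th = z_alpha(alpha) / sqrt(n)
    half = th * sqrt(th * th + 4.0 * p * q)
    denom = 2.0 * (1.0 + th * th)
    return ((2.0 * p + th * th - half) / denom,
            (2.0 * p + th * th + half) / denom)

def interval_v(x, n, alpha):
    # I_v of (1.4): p_hat -/+ theta * sqrt(p_hat * q_hat)
    p = x / n
    th = z_alpha(alpha) / sqrt(n)
    h = th * sqrt(p * (1.0 - p))
    return (p - h, p + h)
```

For n = 100, α = .99, and X = 3, interval_v returns a negative lower limit while interval_u does not, which is the interpretational difficulty noted at the end of the Introduction.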

1. INTRODUCTION

Let X be the number of successes in n independent Bernoulli trials and p the unknown probability of success on a single trial. Consider the problem of constructing a confidence interval I for p such that the coverage probability P_p(p ∈ I) ≥ α for all values of p ∈ (0, 1), where 0 < α < 1 is specified. We shall call α the confidence level and α(p, n) = P_p(p ∈ I) the confidence coefficient of I.

Many advanced texts usually recommend I_u, the basis being of course that n^(1/2)(p̂ − p)[p(1 − p)]^(-1/2) is asymptotically distributed as N(0, 1). Some of these texts then point out that I_v can be regarded as an approximation to I_u if one neglects θ², a term of order n^(-1). Almost all elementary texts describe only I_v, often without giving a proper justification. One can, of course, develop I_v directly by noting that, as a consequence of Slutsky's theorem, n^(1/2)(p̂ − p)(p̂q̂)^(-1/2) is also asymptotically distributed as N(0, 1).

When n is small, it is customary to use the Clopper-Pearson interval or some variation of it, which guarantees that α(p, n) ≥ α. If X = x is observed, then the Clopper-Pearson (1934) interval is defined by I = [p', p''], where p' and p'' are, respectively, the solutions of P_p(X ≤ x − 1) = α + γ and P_p(X ≤ x) = γ for some choice of γ ∈ (0, 1 − α). Several modifications of this interval have been proposed by various authors (Eudey 1949; Crow 1956; Clunies-Ross 1958) with a view toward minimizing the expected length of I or its coverage probabilities of false values of p. These intervals require extensive tables of p' and p'', and this aspect may be regarded by the practical statistician as an obstacle as the sample size increases (Anderson and Burstein 1967, 1968 provided some approximations for p' and p'').

When n is large, two confidence intervals for p have gained universal acceptance in the literature, and they are known to guarantee that α(p, n) → α as n → ∞. Define

φ(z) = (2π)^(-1/2) exp(−z²/2) ,  Φ(z) = ∫_{−∞}^{z} φ(t) dt ,  (1.1)

and z_α by Φ(z_α) = (1 + α)/2. Then these two approximate intervals are

I_u = [u'_n(X), u''_n(X)] ,  I_v = [v'_n(X), v''_n(X)] ,  (1.2)

with the limits defined in (1.3) and (1.4).

The purpose of this article is to give a comparative study of I_u and I_v from several viewpoints. For definiteness, we shall call n small if n < 30, moderate if 30 ≤ n < 100, and large if n ≥ 100. Section 2 describes the adequacy of the two intervals in fulfilling the basic requirement that α(p, n) ≥ α. We show that I_u is quite good in this regard even for small n, while I_v can be quite poor even for large n. Section 3 investigates the lengths of the two intervals. In particular, we show that, in all cases in which |α(p, n) − α| is reasonably small under I_u and I_v, the length of I_u is smaller than that of I_v with a large probability. Section 4 investigates the Neyman shortness (i.e., the probability of covering false values of p) of the two intervals. It is shown that, if |α(p, n) − α| is small, I_u and I_v are not appreciably different in this regard. Finally, Section 5 briefly describes the status of a third confidence interval and an allied problem.

The results of this article lead to the overall conclusion that I_u is preferable to I_v, whatever the sample size and the true value of p may be. As Robbins (1977) noted in a slightly different context,¹ this feature should be of some importance in practical statistics.

¹ His question concerns the relative merits of two well-known approximate tests for the equality of two binomial parameters, which has since been partially answered by Eberhardt and Fligner (1977).

* B. K. Ghosh is Professor, Department of Mathematics, Lehigh University, Bethlehem, PA 18015. He is currently visiting the Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061. The author wishes to thank the editor and a referee for useful comments.

© Journal of the American Statistical Association, December 1979, Volume 74, Number 368, Theory and Methods Section

894
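The Clopper-Pearson construction described in the Introduction no longer requires the extensive tables mentioned there; a short bisection suffices. The sketch below handles only the equal-tailed choice γ = (1 − α)/2 (an assumption on our part; the paper leaves γ free), and the function names are ours:

```python
from math import comb

def binom_cdf(k, n, p):
    # P_p(X <= k) for a binomial (n, p) variable, exact stdlib arithmetic
    return sum(comb(n, x) * p**x * (1.0 - p)**(n - x) for x in range(k + 1))

def clopper_pearson(x, n, alpha):
    # Clopper-Pearson (1934) interval [p', p''] with gamma = (1 - alpha)/2,
    # solved by bisection instead of tables
    g = (1.0 - alpha) / 2.0

    def solve(f):
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if f(mid):          # cdf still above its target: root lies right
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # p' : P_p(X <= x - 1) = alpha + gamma   (p' = 0 when x = 0)
    p_lo = 0.0 if x == 0 else solve(lambda p: binom_cdf(x - 1, n, p) > alpha + g)
    # p'': P_p(X <= x) = gamma               (p'' = 1 when x = n)
    p_hi = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p) > g)
    return p_lo, p_hi
```

For X = 0 the upper limit reduces to the closed form 1 − γ^(1/n), which gives a quick check on the bisection.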


Ghosh: Confidence Intervals 895

We may add here that I_v also suffers from an interpretational difficulty when one finds v'_n(X) < 0 or v''_n(X) > 1 in practice (e.g., if n = 100 and α = .99, then (1.4) shows that v'_n(X) < 0 for 1 ≤ X ≤ 6 and v''_n(X) > 1 for 94 ≤ X ≤ 99).

One final remark needs to be made here. The literature on the study of the asymptotic normality of n^(1/2)(p̂ − p)[p(1 − p)]^(-1/2) is extensive (Johnson and Kotz 1969, p. 61). The main focus of these studies is on refinements of bounds for |α(p, n) − α| under I_u rather than the kind of details presented in Section 2.

2. THE CONFIDENCE COEFFICIENTS

We consider first the large-sample behavior of the confidence coefficients of I_u and I_v. Using (1.3) and (1.4), we note that

α_u(p, n) = P_p(p ∈ I_u) = P_p(U'_n(p) ≤ X ≤ U''_n(p)) ,  (2.1)

α_v(p, n) = P_p(p ∈ I_v) = P_p(V'_n(p) ≤ X ≤ V''_n(p)) ,  (2.2)

where

U'_n(p) = np − z_α(npq)^(1/2) ,
U''_n(p) = np + z_α(npq)^(1/2) ,  (2.3)

V'_n(p) = [2np + z_α² − z_α(z_α² + 4npq)^(1/2)][2(1 + z_α²/n)]^(-1) ,
V''_n(p) = [2np + z_α² + z_α(z_α² + 4npq)^(1/2)][2(1 + z_α²/n)]^(-1) ,  (2.4)

and q = 1 − p. The inequalities U'_n(p) ≤ X ≤ U''_n(p) in (2.1) follow by inverting the inequalities u'_n(X) ≤ p ≤ u''_n(X) under (1.3). Then the following theorem gives some idea about the closeness of the confidence coefficient of I_u or I_v to the level α for sufficiently large n.

Theorem 1: For any p ∈ (0, 1) and α ∈ (0, 1), the confidence coefficients of I_u and I_v satisfy

α_u(p, n) = α + (npq)^(-1/2) φ(z_α) − (144npq)^(-1) z_α φ(z_α)[24(2 − pq) − 4(7 − 22pq)z_α² + 4(1 − 4pq)z_α⁴] + O(n^(-3/2)) ,  (2.5)

α_u(p, n) − α_v(p, n) = (12npq)^(-1) z_α³ φ(z_α)[3 + (q − p)² z_α²] + O(n^(-3/2)) ,  (2.6)

and both expansions hold uniformly for z_α in any bounded interval.

Proof: The expansion (2.5) follows by noting the asymptotic forms in (2.3) and putting a₁ = −z_α = −a₂, b₁ = 0 = b₂, and c₁ = 0 = c₂ in Lemma 1 of the Appendix. To establish (2.6), write (2.4) as

V'_n(p) = np − z_α(npq)^(1/2) + (1/2)(q − p)z_α² − (1/8)(1 − 8pq)z_α³(npq)^(-1/2) + O(n^(-1))

and V''_n(p) in a similar asymptotic form. Put a₁ = −z_α = −a₂, b₁ = (1/2)(q − p)z_α² = b₂, and c₁ = −(1/8)(1 − 8pq)z_α³ = −c₂ in Lemma 1 of the Appendix to get (2.2) as

α_v(p, n) = α + (npq)^(-1/2) φ(z_α) − (144npq)^(-1) z_α φ(z_α)[24(2 − pq) + 8(1 + 11pq)z_α² + 16(1 − 4pq)z_α⁴] + O(n^(-3/2)) .

The last expression and (2.5) lead to (2.6).

It is clear from (2.5) that the speed of convergence of α_u(p, n) to the level α may be slow, especially when p is very small or large. A consoling factor is that, for any p and α, n can be chosen sufficiently large such that α_u(p, n) actually exceeds α up to order n^(-1). Similar conclusions hold for α_v(p, n), except that (2.6) shows that I_u should be more reliable than I_v in satisfying the requirement α(p, n) ≥ α for all p.

Consider now situations in which the sample size is small or moderate. Table 1 gives some values of the confidence coefficients for several combinations of α, p, and n. The values were computed by using (2.1), (2.2), and tables of binomial probabilities provided by Aiken (1955). It can be easily verified from (2.1) and (2.2) that, for all 0 < p < 1 and n ≥ 1, α_u(p, n) = α_u(1 − p, n) and α_v(p, n) = α_v(1 − p, n).

1. Confidence Coefficients of I_u and I_v

                 n = 15                    n = 20
            α=.95       α=.99        α=.95       α=.99
p          α_u   α_v   α_u   α_v    α_u   α_v   α_u   α_v
.01, .99  .860  .140  .990  .140   .983  .182  .983  .182
.05, .95  .964  .536  .964  .537   .925  .639  .984  .641
.10, .90  .944  .792  .987  .794   .957  .876  .989  .878
.20, .80  .982  .815  .982  .961   .956  .921  .990  .928
.30, .70  .915  .949  .996  .961   .975  .947  .994  .959
.40, .60  .939  .939  .985  .964   .963  .928  .990  .978
.50       .965  .882  .993  .965   .959  .959  .988  .959

                 n = 30                    n = 50
            α=.95       α=.99        α=.95       α=.99
p          α_u   α_v   α_u   α_v    α_u   α_v   α_u   α_v
.01, .99  .964  .260  .964  .260   .911  .395  .986  .395
.05, .95  .939  .782  .984  .785   .962  .920  .988  .923
.10, .90  .974  .809  .992  .957   .970  .879  .991  .965
.20, .80  .964  .946  .989  .953   .951  .938  .992  .979
.30, .70  .930  .953  .992  .968   .957  .935  .992  .979
.40, .60  .962  .935  .986  .975   .941  .941  .987  .979
.50       .957  .957  .995  .984   .935  .935  .993  .985

                 n = 100                   n = 200
            α=.95       α=.99        α=.95       α=.99
p          α_u   α_v   α_u   α_v    α_u   α_v   α_u   α_v
.01, .99  .921  .633  .982  .634   .948  .862  .984  .866
.05, .95  .966  .877  .989  .962   .967  .926  .986  .973
.10, .90  .936  .932  .988  .975   .956  .927  .987  .982
.20, .80  .941  .933  .992  .986   .958  .941  .990  .987
.30, .70  .937  .950  .988  .987   .947  .944  .989  .989
.40, .60  .948  .948  .990  .986   .949  .949  .989  .988
.50       .943  .943  .988  .988   .944  .944  .991  .987

The pattern in Table 1 shows that the conclusions of Theorem 1 are generally corroborated by small and moderate sample sizes.
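The confidence coefficients in (2.1) and (2.2) can be evaluated exactly by summing binomial probabilities over the integers between the inverted limits (2.3) and (2.4). The following Python sketch (names ours) reproduces entries of Table 1, e.g. α_u(.5, 30) = α_v(.5, 30) ≈ .957 at α = .95:

```python
from math import comb, sqrt
from statistics import NormalDist

def coverage_u(p, n, alpha):
    # alpha_u(p, n) of (2.1): P_p(U'_n(p) <= X <= U''_n(p)), limits from (2.3)
    z = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
    q = 1.0 - p
    lo = n * p - z * sqrt(n * p * q)
    hi = n * p + z * sqrt(n * p * q)
    return sum(comb(n, x) * p**x * q**(n - x)
               for x in range(n + 1) if lo <= x <= hi)

def coverage_v(p, n, alpha):
    # alpha_v(p, n) of (2.2): P_p(V'_n(p) <= X <= V''_n(p)), limits from (2.4)
    z = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
    q = 1.0 - p
    root = z * sqrt(z * z + 4.0 * n * p * q)
    denom = 2.0 * (1.0 + z * z / n)
    lo = (2.0 * n * p + z * z - root) / denom
    hi = (2.0 * n * p + z * z + root) / denom
    return sum(comb(n, x) * p**x * q**(n - x)
               for x in range(n + 1) if lo <= x <= hi)
```

At p = .1 and n = 200 the same functions reproduce the deficiency of I_v visible in Table 1: its coverage stays well below the nominal .95 while that of I_u does not.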


896 Journal of the American Statistical Association, December 1979

Thus, I_u satisfies the requirement α_u(p, n) ≥ α reasonably well for all .01 ≤ p ≤ .99, even when n is as small as 20 (e.g., .911 ≤ α_u ≤ .983 when α = .95, and .964 ≤ α_u ≤ .995 when α = .99). On the other hand, I_v is unsatisfactory in this regard for most values of p < .2 or > .8, even when n is as large as 200 (e.g., α_v = .927 when α = .95 and p = .1, and α_v = .866 when α = .99 and p = .01). This means, of course, that the effect of the terms in O(n^(-1)) is more pronounced on α_v(p, n) than on α_u(p, n).

It is instructive to reinterpret the results of Table 1 from the viewpoint of the likelihood principle. Observing that a confidence interval I for p is a function of X ∈ {0, 1, ..., n}, define the set

J(p) = {x | p ∈ I} .  (2.7)

Any reasonable I should be such that, for any given value of p, J(p) ought to consist of a small number of values of X whose total probability is as close as possible to α. Table 2 shows the sets (intervals) J_u(p) and J_v(p) corresponding to I_u and I_v when n = 30. It can be seen that I_v is quite deficient in the sense just explained, and this is true for other choices of n as well. In fact, J_v(p) often excludes values of X with the largest probabilities. Table 2 displays values of P_p(X = 0) in order to point out that, in the extreme case when X = 0, I_v = [0, 0] concludes that p = 0 even though P_p(X = 0) may be quite high for p > 0. None of these deficiencies seems to be present in I_u. In fact, Table 2 shows that J_u(p) is almost identical to J*(p), which consists of the least number of values of X whose total probability α* is as close as possible to α. The arguments here give an additional justification for preferring I_u to I_v.

2. Confidence Coefficients and the Sets J(p) = {x | p ∈ I} of I_u and I_v When n = 30

p    α    J_u(p)   α_u    J_v(p)   α_v    J*(p)   α*     P_p(X = 0)
.01  .95  0-1      .964   (1-3)    .260   0-1     .964   .7397
.01  .99  0-1      .964   (1-5)    .260   0-2     .997
.05  .95  0-3      .939   (1-5)    .782   0-3     .939   .2146
.05  .99  0-4      .984   (1-7)    .785   0-4     .984
.10  .95  0-6      .974   (2-7)    .809   1-6     .932   .0424
.10  .99  0-7      .992   (1-9)    .957   0-7     .992
.20  .95  2-10     .964   (3-11)   .946   2-10    .964   .0012
.20  .99  1-11     .989   (3-12)   .953   1-11    .989
.30  .95  5-13     .930   5-14     .953   5-14    .953   .0000
.30  .99  3-15     .992   (5-16)   .968   3-15    .992
.40  .95  7-17     .962   8-17     .935   7-17    .962   .0000
.40  .99  6-18     .986   7-18     .975   6-18    .986
.50  .95  10-20    .957   10-20    .957   10-20   .957   .0000
.50  .99  8-22     .995   9-21     .984   8-22    .995

NOTE: J*(p) is the set containing the least number of values of X whose total probability α* is closest to α. The sets in parentheses do not contain the values of X with the largest probabilities.

If one has a priori evidence that the true value of p is moderate (between .2 and .8, say), both I_u and I_v are satisfactory for all n ≥ 20. As is shown in the next section, however, in such cases I_u will generally have a smaller length than I_v.

If one has a priori evidence that p is small (.01, say) or large, one might argue that a confidence interval for p based on the Poisson approximation to the binomial distribution should be a better choice than I_u. While this may well be true, the resulting interval would require extensive tables of the Poisson probabilities, a feature that makes it less desirable than some Clopper-Pearson type of interval. Moreover, the confidence coefficient of the interval may unduly exceed α (Anderson and Samuels 1967), perhaps at the cost of increasing its length or Neyman shortness. The simplicity and relative accuracy of I_u should obviate such an alternative choice for large samples. It may be emphasized that both I_u and I_v will be unsatisfactory when p is extremely small (.001, say) or large.

3. THE LENGTHS

A desirable feature of any confidence interval is that its length be small, although this need not be the sole criterion for choosing an interval. It follows from (1.3) and (1.4) that the lengths of I_u and I_v are

L_u = u''_n(X) − u'_n(X) = θ(1 + θ²)^(-1)(θ² + 4p̂q̂)^(1/2) ,  (3.1)

L_v = v''_n(X) − v'_n(X) = 2θ(p̂q̂)^(1/2) .  (3.2)

It is clear that, for arbitrary values of n and α, L_u may turn out to be smaller or larger than L_v, depending on the outcome X. As n → ∞, θ → 0 for any fixed α, and it is well known that p̂ → p almost surely. Because L_u and L_v are bounded continuous functions of p̂, it follows that L_u → 0 and L_v → 0 almost surely for all 0 < p < 1, and in particular E_p(L_u) → 0 and E_p(L_v) → 0.

The following theorem throws some light on the relative status of L_u and L_v for arbitrary sample sizes.

Theorem 2: For any p ∈ (0, 1), α ∈ (0, 1), and n ≥ 1, the lengths of I_u and I_v satisfy

P_p(L_u < L_v) = P_p(1/2 − (1/2)[(1 + θ²)/(2 + θ²)]^(1/2) < p̂ < 1/2 + (1/2)[(1 + θ²)/(2 + θ²)]^(1/2))  (3.3)
  ≥ P_p(.1464 < p̂ < .8536) .  (3.4)

Proof: It follows from (3.1) and (3.2) that L_u < L_v if and only if p̂(1 − p̂) > (1/4)(2 + θ²)^(-1). The last inequality can be rewritten as the inequalities in parentheses on the right side of (3.3). As α increases from 0 to 1, (1 + θ²)/(2 + θ²) increases from 1/2 to 1 and, therefore, the right side of (3.3) is greater than P_p(1/2 − 2^(-3/2) < p̂ < 1/2 + 2^(-3/2)) = P_p(.1464 < p̂ < .8536) for all α ∈ (0, 1).

3. Lower Bound of P_p(L_u < L_v) From (3.4)

                          n
p          20      30      50      100     200
.20, .80  .7939   .7448   .8096   .9196   .9717
.25, .75  .9087   .9021   .9547   .9946   .9998
.30, .70  .9645   .9699   .9927   .9998  1.0000
.40, .60  .9964   .9985   .9999  1.0000  1.0000
.50       .9996   .9999  1.0000  1.0000  1.0000
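The lengths (3.1) and (3.2) and the α-free lower bound (3.4) are straightforward to evaluate numerically; a minimal sketch in the same style as before (names ours):

```python
from math import comb, sqrt
from statistics import NormalDist

def lengths(x, n, alpha):
    # L_u of (3.1) and L_v of (3.2)
    p = x / n
    q = 1.0 - p
    th = NormalDist().inv_cdf((1.0 + alpha) / 2.0) / sqrt(n)
    l_u = th / (1.0 + th * th) * sqrt(th * th + 4.0 * p * q)
    l_v = 2.0 * th * sqrt(p * q)
    return l_u, l_v

def lower_bound_34(p, n):
    # the bound (3.4): P_p(.1464 < X/n < .8536); it does not involve alpha
    return sum(comb(n, x) * p**x * (1.0 - p)**(n - x)
               for x in range(n + 1) if 0.1464 < x / n < 0.8536)
```

lower_bound_34 reproduces Table 3, e.g. .9996 at p = .5 and .7939 at p = .2 when n = 20, while lengths shows L_u < L_v for p̂ near 1/2 and the reverse for p̂ near 0 or 1, as Theorem 2 indicates.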


Ghosh: Confidence Intervals 897

Table 3 gives some values of the lower bound (3.4) for P_p(L_u < L_v), which were computed by using tables of binomial probabilities provided by Aiken (1955). Table 3 is restricted to .2 ≤ p ≤ .8 merely because outside this range |α_v(p, n) − α| will be unacceptably large, as shown in the earlier section. Table 3 clearly indicates a favorable status of I_u compared with I_v.

Theorem 2 and the asymptotic normality of n^(1/2)(p̂ − p)(pq)^(-1/2) imply that, as n → ∞, P_p(L_u < L_v) → 1 if .1464 < p < .8536. In fact, Table 3 shows the speed of this convergence as n varies between 20 and 200. Similarly, P_p(L_v < L_u) → 1 if p < .1464 or > .8536, which indicates an advantage of I_v over I_u for quite small or large values of p. But in practice, this advantage will materialize only for extremely large sample sizes, because only then would α_v(p, n) be comparably close to α as α_u(p, n) is to α.

The conclusions of Theorem 2 are also borne out by the moments of L_u and L_v. For instance, it can be shown without any difficulty that

E_p(L_u − L_v) = (1/4)θ³(1 − 8pq)(pq)^(-1/2) + O(n^(-2)) ,

which implies that, to order n^(-3/2), E_p(L_u) < E_p(L_v) if .1464 < p < .8536.

4. THE NEYMAN SHORTNESS

Any reasonable confidence interval should cover false values of the parameter with a small probability. This aspect is known as the Neyman shortness of the interval. Let p' ≠ p, 0 < p' < 1, denote a false value of the binomial parameter. Then the Neyman shortnesses of I_u and I_v are, respectively,

P_p(p' ∈ I_u) = P_p(U'_n(p') ≤ X ≤ U''_n(p')) = β_u , say ,  (4.1)

P_p(p' ∈ I_v) = P_p(V'_n(p') ≤ X ≤ V''_n(p')) = β_v , say ,  (4.2)

where the U'_n, U''_n and V'_n, V''_n are defined in (2.3) and (2.4). Note that β_u and β_v are functions of p, p', α, and n.

Consider first the large-sample behavior of (4.1) and (4.2). As n → ∞, it follows from (2.3) and (2.4) and the asymptotic normality of (X − np)(npq)^(-1/2) that β_u → 0 and β_v → 0 for all p' ≠ p and α. This implies that both I_u and I_v are consistent and asymptotically unbiased intervals. In order to determine the asymptotic relative efficiency (ARE) of I_u against I_v, it is necessary to look at the rates with which β_u and β_v tend to zero. Let n_u and n_v denote the sample sizes of I_u and I_v.

Suppose p'(n_u) is a sequence of false values of the parameter. Since 0 < [p'(n_u)q'(n_u)]^(1/2) ≤ 1/2 and (X − np)(npq)^(-1/2) is asymptotically distributed as N(0, 1), it follows from (2.3) and (4.1) that lim β_u exists and lies in (0, α) if and only if p'(n_u) = p + a_u n_u^(-1/2) + o(n_u^(-1/2)) for some constant a_u ≠ 0. The limiting Neyman shortness of I_u is, in fact,

lim_{n_u→∞} β_u = Φ(a_u[pq]^(-1/2) + z_α) − Φ(a_u[pq]^(-1/2) − z_α) .  (4.3)

In exactly the same way, we find from (2.4) and (4.2) that lim β_v exists and lies in (0, α) if and only if p'(n_v) = p + a_v n_v^(-1/2) + o(n_v^(-1/2)) for some constant a_v ≠ 0, and the limiting Neyman shortness of I_v is given by (4.3), where a_u is replaced by a_v. But p'(n_u) = p'(n_v) implies n_v/n_u = (a_v/a_u)² asymptotically, and lim β_u = lim β_v implies a_u = ±a_v. Hence, the limiting ratio of the sample sizes required by I_v and I_u, both of which have confidence coefficient α, to achieve the same Neyman shortness (between 0 and α) for the same false value (arbitrarily close to p) is unity. This is the so-called Pitman-Noether ARE of I_u against I_v (Noether 1955).

Next, suppose that the false value p' is a fixed number. It follows from (2.3) and (4.1), and taking a₁ = −z_α = −a₂ and b₁(n_u) = 0 = b₂(n_u) in Lemma 2 of the Appendix, that

lim_{n_u→∞} n_u^(-1) log β_u = p' log (p/p') + q' log (q/q') .  (4.4)

It is easily verified that (4.4) is negative for all p' ≠ p. Similarly, using (2.4) and (4.2), and taking a₁ = −z_α = −a₂ and b₁(n_v) and b₂(n_v) equal to (1/2)(q' − p')z_α² + O(n_v^(-1/2)) in Lemma 2, we find that lim n_v^(-1) log β_v is also given by the right side of (4.4).

4. Neyman Shortnesses β_u and β_v of I_u and I_v When n = 50 and α = .95

                              True p
            .2              .3              .4              .5
False p'  β_u     β_v     β_u     β_v     β_u     β_v     β_u     β_v
.2       .9507*  .9375*  .5690   .6832   .0955   .1561   .0033   .0077
.3       .6926   .5562   .9567*  .9347*  .6699   .6694   .1611   .1611
.4       .1106   .1106   .6718   .6718   .9406*  .9406*  .6636   .6636
.5       .0025   .0025   .1406   .1406   .6639   .6639   .9351*  .9351*
.6       .0000   .0000   .0056   .0056   .1562   .1562   .6636   .6636
.7       .0000   .0000   .0000   .0000   .0076   .0076   .1611   .1611
.8       .0000   .0000   .0000   .0000   .0000   .0001   .0033   .0077

NOTE: Entries marked with an asterisk correspond to p' = p and are the confidence coefficients of I_u and I_v.
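β_u of (4.1) can be computed exactly in the same way as the confidence coefficients, by summing binomial probabilities under the true p over the integers between the limits (2.3) evaluated at the false p'. A sketch (names ours):

```python
from math import comb, sqrt
from statistics import NormalDist

def beta_u(p_true, p_false, n, alpha):
    # beta_u of (4.1): P_p(U'_n(p') <= X <= U''_n(p')), limits from (2.3)
    z = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
    q_false = 1.0 - p_false
    half = z * sqrt(n * p_false * q_false)
    lo, hi = n * p_false - half, n * p_false + half
    q = 1.0 - p_true
    return sum(comb(n, x) * p_true**x * q**(n - x)
               for x in range(n + 1) if lo <= x <= hi)
```

With n = 50 and α = .95 this reproduces the β_u column of Table 4; at p' = p it returns the confidence coefficient itself, as the asterisked diagonal entries show.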


898 Journal of the American Statistical Association, December 1979

Consequently, the limiting ratio of the sample sizes required by I_v and I_u, both of which have confidence coefficient α, to achieve an arbitrarily small Neyman shortness (i.e., β_u, β_v ≤ ε) at any specified p' is unity. This is the so-called Hodges-Lehmann (1956) ARE of I_u against I_v. It should be noted that the ARE here need not imply that lim (β_u/β_v) is one if both I_u and I_v are based on the same n. Note also that Lemma 2 is actually stronger than what is needed (e.g., P_p(X = k)/P_p(X ≥ k) ≥ T₁ and Stirling's formula) to establish (4.4); Lemma 2 shows quite precisely the rate with which β_u and β_v tend to zero.

Thus, we have proved the following result concerning the Neyman shortness of I_u and I_v in large samples.

Theorem 3: The Pitman-Noether ARE of I_u against I_v is unity for all 0 < p < 1 and 0 < α < 1, and the Hodges-Lehmann ARE of I_u against I_v is unity for all 0 < p ≠ p' < 1 and 0 < α < 1.

Numerical calculations of β_u and β_v for various combinations of p, p', and α tend to confirm the assertions of Theorem 3 in small or moderate sample sizes. Table 4 shows some typical values of β_u and β_v when n = 50 and p is restricted to a range that assures that both |α_u(p, n) − α| and |α_v(p, n) − α| are reasonably small. Bearing in mind that a comparison of Neyman shortness is meaningful only when the two intervals have the same confidence coefficient, it seems that for moderate sample sizes β_v may be slightly smaller than β_u when the false values lie near zero or one.

5. SOME COMMENTS

It is informative to note that I_u can be viewed as a modification of I_v along Bayes-Laplacian lines. Observe that I_v is a confidence interval centered at the minimum variance unbiased estimate p̂ of p and based on the usual estimate p̂q̂ of the variance pq. The interval I_u is first shifting p̂ to

p* = (np̂ + (1/2)z_α²)/(n + z_α²) = p̂ + θ²(1 + θ²)^(-1)((1/2) − p̂) ,  (5.1)

which essentially adds (1/2)z_α² (instead of the usual 1) to both successes and failures to estimate p. Since |p* − 1/2| ≤ |p̂ − 1/2|, the net effect of this shift is to shrink p̂ toward 1/2. When p̂ is close to 0 or 1, one may indeed prefer a different estimate for p, and I_u provides an objective guideline for this purpose. Next, with p* chosen as the center of the confidence interval, I_u is also shrinking the variance estimate p̂q̂ to

p*q* − (1/4)θ²(1 + θ²)^(-1) = (p̂q̂ + (1/4)θ²)(1 + θ²)^(-2) ,  (5.2)

where q* = 1 − p*. The asymptotic normality of (p* − p)n^(1/2)[p*q* − (1/4)θ²(1 + θ²)^(-1)]^(-1/2) then leads to I_u. The expressions in (5.1) and (5.2) should conceptually simplify the formulas in (1.3).

It is of some interest to study a third asymptotic interval for p, which is seldom cited in the literature. Using the well-known fact that 2n^(1/2)[arcsin(p̂^(1/2)) − arcsin(p^(1/2))] → N(0, 1) as n → ∞, one gets this confidence interval as I_w = [w'_n(X), w''_n(X)], where

w'_n(X) = sin²[arcsin(p̂^(1/2)) − (1/2)θ] ,
w''_n(X) = sin²[arcsin(p̂^(1/2)) + (1/2)θ] ,  (5.3)

and θ is defined in (1.5). Note that the confidence limits in (5.3) lie, like those of I_u, between 0 and 1. We summarize some comparative features of I_w in relation to I_u and I_v.

It can be shown by the technique of Theorem 1 that

α_w(p, n) − α_v(p, n) = (48npq)^(-1) z_α³ φ(z_α)[32pq + 5(q − p)² z_α²] + O(n^(-3/2)) .  (5.4)

Relation (5.4) implies that, for large n, I_w should fulfill the requirement α_w(p, n) ≥ α more satisfactorily than I_v fulfills α_v(p, n) ≥ α. Numerical calculations similar to those in Table 1 confirm this conclusion for small and moderate sample sizes. Next, since sin θ < θ for all θ > 0, it follows from (1.4) and (5.3) that L_w < L_v for all 0 < X < n and n ≥ 1 (L_v = 0 = L_w when X = 0 or n). Moreover, it can be shown by the technique of Theorem 3 that the Pitman-Noether or Hodges-Lehmann ARE of I_w against I_v is unity. Calculations similar to those in Table 4 show that, for small and moderate sample sizes, β_w lies between β_u and β_v, but closer to β_u. These observations suggest that I_w should generally be a better choice than I_v.

The distinction between I_w and I_u is not so clear-cut. Relations (2.5), (2.6), and (5.4) show that, to order n^(-1), α_u(p, n) − α_w(p, n) may be positive or negative. Numerical calculations for small or moderate values of n reveal the same feature, and Table 5 reproduces some of these calculations. Since a major purpose of the arcsine transformation is to normalize the binomial variable (Johnson and Kotz 1969), one may be tempted to surmise that, as n increases, |α_w(p, n) − α| will converge to zero more rapidly than does |α_u(p, n) − α|. Table 5, however, shows that this is clearly not the case.

5. Comparison of α_u − α and α_w − α

α = .95
              n = 15    20      30      50      100     200
.01  α_u − α  −.090    .033    .014   −.039   −.029   −.002
     α_w − α   .040    .033    .047    .036    .032   −.100
.20  α_u − α   .032    .006    .014    .001   −.009    .008
     α_w − α  −.003   −.051   −.020   −.012    .005   −.002
.50  α_u − α   .015    .009    .007   −.015   −.007   −.006
     α_w − α   .015    .009    .007   −.015   −.007   −.006

α = .99
              n = 15    20      30      50      100     200
.01  α_u − α   .000   −.007   −.026   −.004   −.008   −.006
     α_w − α   .010    .009    .010    .008    .009   −.125
.20  α_u − α  −.008    .000   −.001    .002    .002    .000
     α_w − α  −.029   −.004   −.004    .002    .001   −.001
.50  α_u − α   .003   −.002    .005    .003    .002    .001
     α_w − α   .003   −.002   −.006   −.005   −.002    .001
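The arcsine interval (5.3) in the same style (a sketch, names ours; no clamping is needed, since sin² keeps both limits in [0, 1], and the length degenerates to 0 at X = 0 or n, as remarked above):

```python
from math import asin, sin, sqrt
from statistics import NormalDist

def interval_w(x, n, alpha):
    # I_w of (5.3): the arcsine-transformation interval
    th = NormalDist().inv_cdf((1.0 + alpha) / 2.0) / sqrt(n)
    a = asin(sqrt(x / n))
    lo = sin(a - th / 2.0) ** 2
    hi = sin(a + th / 2.0) ** 2
    return lo, hi
```

For interior outcomes its length is slightly smaller than that of I_v, consistent with the inequality L_w < L_v derived above from sin θ < θ.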


Ghosh: Confidence Intervals 899

One can show by the method of Theorem 2 that, for sufficiently large n, P_p(L_u < L_w) is close to 1 if .1838 < p < .8162 and P_p(L_w < L_u) is close to 1 if p < .1838 or > .8162. Finally, the Pitman-Noether or Hodges-Lehmann ARE of I_w against I_u is unity.

A related problem in the theory of confidence intervals for p is to find an interval I whose confidence coefficient is at least α and length is at most 2d for all values of X, where 0 < d < 1/2 is specified. An exact solution to this problem does not exist for arbitrary d. An asymptotic (as d → 0) solution is provided by I_u or I_v or I_w by choosing the sample size suitably, as clarified in the following paragraph.

It is easily shown from (3.1) that L_u ≤ z_α(n + z_α²)^(-1/2) for X = 0, 1, ..., n. Hence, if we choose n = n_u, where n_u is the smallest integer greater than or equal to z_α²((1/4)d^(-2) − 1), then L_u ≤ 2d for all X. Moreover, if d is small, then n_u is large and, as shown in Section 2, |α_u(p, n_u) − α| is small. Similar arguments based on (3.2) and (5.3) show that the appropriate sample sizes for I_v and I_w are n_v and n_w, which are the smallest integers ≥ z_α²/(4d²) and [z_α/arcsin(2d)]², respectively. One can now verify that n_u ≤ n_w ≤ n_v for all α and d, which implies that the interval I_u should be the best choice among the three. The actual difference between the three sample sizes is hardly noticeable for conventional values of α and d (e.g., n_u = 1,837, n_w = 1,841, and n_v = 1,844 when α = .99 and d = .03). Nevertheless, one would still prefer I_u or I_w to I_v on the basis of their confidence coefficients and lengths.

A somewhat similar problem in the theory of point estimation for p should not be confused with the problem described earlier. If one wants a point estimate τ(X) of p such that P_p(|τ(X) − p| ≤ d) ≥ α, then τ(X) = p̂ based on sample size n_v is the well-known solution (Feller 1957, p. 176). As a confidence interval for p with maximum length 2d, however, I₀ = [p̂ − d, p̂ + d] is worse than I_u or I_v or I_w in the sense that I₀ contains the other three intervals for sufficiently small d.

APPENDIX

Lemma 1: Let X be a binomial (n, p) variable, and a₁ ≤ a₂, b₁, b₂, c₁, c₂ be constants that may depend on p (b₁ < b₂ if a₁ = a₂, and c₁ < c₂ if a₁ = a₂ and b₁ = b₂). Then the asymptotic expansion

P_p(np + a₁[npq]^(1/2) + b₁ + c₁[npq]^(-1/2) ≤ X ≤ np + a₂[npq]^(1/2) + b₂ + c₂[npq]^(-1/2))
  = Φ(a₂) − Φ(a₁) + (A/6)(npq)^(-1/2) + (B/72)(npq)^(-1) + O(n^(-3/2))

holds for large n and uniformly for a₁ ≤ a₂ in any bounded interval, where

A = [3(2b₂ + 1) − (q − p)(a₂² − 1)]φ(a₂) − [3(2b₁ − 1) − (q − p)(a₁² − 1)]φ(a₁) ,

B = 72[c₂φ(a₂) − c₁φ(a₁)]
    − [9(2b₂ + 1)² − 6(q − p)(a₂² − 3)(2b₂ + 1) + 3(1 − 2pq) − (7 − 22pq)a₂² + (1 − 4pq)a₂⁴]a₂φ(a₂)
    + [9(2b₁ − 1)² − 6(q − p)(a₁² − 3)(2b₁ − 1) + 3(1 − 2pq) − (7 − 22pq)a₁² + (1 − 4pq)a₁⁴]a₁φ(a₁) ,

and φ(z) and Φ(z) are defined in (1.1).

Proof: Taking the expansion to terms of order (npq)^(-1) in Theorem 6 of Kalinin (1967), one gets, after some algebra,

P_p(k₁ ≤ X ≤ k₂) = Φ(y₂) − Φ(y₁) + Q₁(npq)^(-1/2) − Q₂(npq)^(-1) + O(n^(-3/2)) ,

where

y₁ = (k₁ − 1/2 − np)(npq)^(-1/2) ,  y₂ = (k₂ + 1/2 − np)(npq)^(-1/2) ,

Q₁ = (1/6)(q − p)[(1 − y₂²)φ(y₂) − (1 − y₁²)φ(y₁)] ,

Q₂ = (1/24)(1 − 2pq)[y₂φ(y₂) − y₁φ(y₁)] − (1/72)(7 − 22pq)[y₂³φ(y₂) − y₁³φ(y₁)] + (1/72)(1 − 4pq)[y₂⁵φ(y₂) − y₁⁵φ(y₁)] .

Kalinin proved that this expansion holds uniformly for y₁ ≤ y₂ in any bounded interval. Putting k_i = np + a_i(npq)^(1/2) + b_i + c_i(npq)^(-1/2) for i = 1, 2 and expanding Φ(y_i) and φ(y_i) around a_i, one obtains the expansion in the lemma.

Lemma 2: Let X be a binomial (n, p) variable, a₁ ≤ a₂ be constants, and b₁(n) and b₂(n) be bounded (b₁(n) < b₂(n) if a₁ = a₂). Then, for large n and for any p' ≠ p, 0 < p' < 1,

P_p(np' + a₁[np'q']^(1/2) + b₁(n) ≤ X ≤ np' + a₂[np'q']^(1/2) + b₂(n)) = K[R₁{1 + o(1)} − (pq'/p'q)R₂{1 + o(1)}] ,

where

K = (2πnp'q')^(-1/2)(p'q)(p' − p)^(-1)[(p/p')^(p')(q/q')^(q')]^n ,

R_i = (pq'/p'q)^(a_i(np'q')^(1/2) + b_i(n)) exp(−(1/2)a_i²) ,  i = 1, 2 .

Proof: Consider first the case when 0 < p < p' < 1. Bahadur (1960) has shown that, for np < k < n,

T₁ ≤ P_p(X = k)/P_p(X ≥ k) ≤ T₁ + T₂[(k + 2)(k − np + q) + (n − k)p]^(-1) ,

where

T₁ = (k − np + q)/[(k + 1)q] ,  T₂ = (n + 1)(n − k)p²/[(k + 1)q] .

Putting k = np' + a₁(np'q')^(1/2) + b₁(n) and k = np' + a₂(np'q')^(1/2) + 1 + b₂(n) successively for sufficiently large n, and applying Stirling's formula to the three factorials in P_p(X = k), one obtains the expansion of the lemma. Next, if 0 < p' < p < 1, one arrives at the same result by applying the technique to the binomial (n, q) variable Y = n − X.


900 Journal of the American Statistical Association, December 1979

REFERENCES

Aiken, H. (1955), Tables of the Cumulative Binomial Probability Distribution, Cambridge, Mass.: Harvard University Press.

Anderson, T.W., and Burstein, H. (1967), "Approximating the Upper Binomial Confidence Limit," Journal of the American Statistical Association, 62, 857-861.

Anderson, T.W., and Burstein, H. (1968), "Approximating the Lower Binomial Confidence Limit," Journal of the American Statistical Association, 63, 1413-1415.

Anderson, T.W., and Samuels, S.M. (1967), "Some Inequalities Among Binomial and Poisson Distributions," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, 1-12.

Bahadur, R.R. (1960), "Some Approximations to the Binomial Distribution Function," Annals of Mathematical Statistics, 31, 43-54.

Clopper, C.J., and Pearson, E.S. (1934), "The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial," Biometrika, 26, 404-413.

Clunies-Ross, C.W. (1958), "Interval Estimation for the Parameter of Binomial Distribution," Biometrika, 45, 275-279.

Crow, E.L. (1956), "Confidence Intervals for a Proportion," Biometrika, 43, 423-435.

Eberhardt, K.R., and Fligner, M.A. (1977), "A Comparison of Two Tests for Equality of Two Proportions," The American Statistician, 31, 151-155.

Eudey, M.W. (1949), "On the Treatment of Discontinuous Random Variables," Technical Report No. 13, Statistical Laboratory, University of California, Berkeley.

Feller, W. (1957), An Introduction to Probability Theory and Its Applications (Vol. I), New York: John Wiley & Sons.

Hodges, J.L., and Lehmann, E.L. (1956), "The Efficiency of Some Nonparametric Competitors of the t-Test," Annals of Mathematical Statistics, 27, 324-335.

Johnson, N.L., and Kotz, S. (1969), Distributions in Statistics: Discrete Distributions, Boston: Houghton Mifflin.

Kalinin, V.M. (1967), "Convergent and Asymptotic Expansions for Probability Distributions," Theory of Probability and Its Applications, 12, 22-35.

Noether, G.E. (1955), "On a Theorem of Pitman," Annals of Mathematical Statistics, 26, 64-68.

Robbins, H. (1977), "A Fundamental Question of Practical Statistics," The American Statistician, 31, 97.

