Samworth's Treatment of Stein's Paradox


Perhaps the most surprising result in Statistics arises in a remarkably simple estimation problem. Let X1, …, Xp be independent random variables, with Xi ∼ N(θi, 1) for i = 1, …, p. Writing X = (X1, …, Xp)T, suppose we want to find a good estimator θ̂ = θ̂(X) of θ = (θ1, …, θp)T.

To define more precisely what is meant by a good estimator, we use the language of statistical decision theory. We introduce a loss function L(θ̂, θ), which measures the loss incurred when the true value of our unknown parameter is θ and we estimate it by θ̂. We will be particularly interested in the squared error loss function L(θ̂, θ) = ‖θ̂ − θ‖², where ‖·‖ denotes the Euclidean norm, but other choices, such as the absolute error loss L(θ̂, θ) = ∑_{i=1}^p |θ̂i − θi|, are of course perfectly possible.

Now L(θ̂, θ) is a random quantity, which is not ideal for comparing the overall performance of two different estimators (as opposed to the losses they each incur on a particular data set). We therefore introduce the risk function

    R(θ̂, θ) = E_θ L(θ̂(X), θ).

If θ̂ and θ̃ are both estimators of θ, we say θ̂ strictly dominates θ̃ if R(θ̂, θ) ≤ R(θ̃, θ) for all θ, with strict inequality for some value of θ. In this case, we say θ̃ is inadmissible. If θ̂ is not strictly dominated by any estimator of θ, it is said to be admissible. Notice that admissible estimators are not necessarily sensible: for instance, in our problem above with p = 1 and the squared error loss function, the estimator θ̂ = 37 (which ignores the data!) is admissible. On the other hand, decision theory dictates that inadmissible estimators can be discarded, and that we should restrict our choice of estimator to the set of admissible ones.

This discussion may seem like overkill in this simple problem, because there is a very obvious estimator of θ: since all the components of X are independent, and E(Xi) = θi (in other words, Xi is an unbiased estimator of θi), why not just use θ̂0(X) = X? Indeed, this estimator appears to have several desirable properties (for example, it is the maximum likelihood estimator and the uniform minimum variance unbiased estimator), and by the early 1950s, three proofs had emerged to show that θ̂0 is admissible for squared error loss when p = 1. Nevertheless, Stein (1956) stunned the statistical world when he proved that, although θ̂0 is admissible for squared error loss when p = 2, it is inadmissible when p ≥ 3. In fact, James and Stein (1961) showed that the estimator

    θ̂JS = (1 − (p − 2)/‖X‖²) X

strictly dominates θ̂0. The proof of this remarkable fact is relatively straightforward, and is given in the Appendix.
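The dominance claim is easy to see empirically. The following is a minimal Monte Carlo sketch (our illustration, not code from the article), assuming NumPy is available: it estimates the risk of the usual estimator θ̂0 and of θ̂JS at θ = 0, where the improvement is largest.

```python
import numpy as np

def james_stein(x):
    """James-Stein estimator (1 - (p - 2)/||x||^2) x, defined for p >= 3."""
    p = x.size
    return (1.0 - (p - 2) / np.sum(x ** 2)) * x

def monte_carlo_risk(estimator, theta, n_sims=50_000, seed=0):
    """Approximate R(estimator, theta) = E ||estimator(X) - theta||^2 by
    averaging the squared error loss over simulated draws X ~ N(theta, I_p)."""
    rng = np.random.default_rng(seed)
    x = theta + rng.standard_normal((n_sims, theta.size))
    losses = [np.sum((estimator(row) - theta) ** 2) for row in x]
    return float(np.mean(losses))

theta = np.zeros(5)                                # p = 5; true mean at the origin
risk_usual = monte_carlo_risk(lambda x: x, theta)  # close to p = 5
risk_js = monte_carlo_risk(james_stein, theta)     # close to 2
```

At θ = 0 the exact risks are p and 2 respectively (see the Appendix), so even this crude simulation shows the James–Stein estimator winning decisively.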

One of the things that is so surprising about this result is that even though all of the components of X are independent, the ith component of θ̂JS depends on all of the components of X. To give an unusual example to emphasise the point, suppose that we were interested in estimating the proportion of the US electorate who will vote for Barack Obama, the proportion of babies born in China that are girls and the proportion of Britons with light-coloured eyes. Then our James–Stein estimate of the proportion of Democratic voters depends on our hospital and eye colour data!

The reader might reasonably complain that in the above examples, the data would be binomially rather than normally distributed. However, one can easily transform binomially distributed data so that it is well approximated by a normal distribution with unit variance (see the baseball example below), and then consider the estimation problem on the transformed scale, before applying the inverse transform.

Geometrically, the James–Stein estimator shrinks each component of X towards the origin, and it is therefore not particularly surprising that the biggest improvement in risk over θ̂0 comes when ‖θ‖ is close to zero; see Figure 1 for plots of the risk functions of θ̂0 and θ̂JS when p = 5. A simple calculation shows that R(θ̂JS, 0) = 2 for all p ≥ 2, so the improvement in risk can be substantial when p is moderate or large. In terms of choosing a point to shrink towards, though, there is nothing special about the origin, and we could equally well shrink towards any pre-chosen θ0 ∈ Rp using the estimator

    θ̂JS,θ0 = θ0 + (1 − (p − 2)/‖X − θ0‖²)(X − θ0).

In this case, we have R(θ̂JS,θ0, θ) = R(θ̂JS, θ − θ0), so θ̂JS,θ0 still strictly dominates θ̂0 when p ≥ 3.

Note that the shrinkage factor in θ̂JS,θ0 becomes negative when ‖X − θ0‖² < p − 2, and indeed it can be proved that θ̂JS,θ0 is strictly dominated by the positive-part James–Stein estimator

    θ̂+JS,θ0 = θ0 + (1 − (p − 2)/‖X − θ0‖²)+ (X − θ0),

where x+ = max(x, 0). The risk of the positive-part James–Stein estimator θ̂+JS = θ̂+JS,0 is also included in Figure 1 for comparison. Remarkably, even the positive-part James–Stein estimator is inadmissible, though it cannot be improved by much, and it took until Shao and Strawderman (1994) to find a (still inadmissible!) estimator to strictly dominate it.

Generalisations and Related Problems

It is natural to ask how crucial the normality and squared error loss assumptions are to the Stein phenomenon. As a consequence of many papers written since Stein's original masterpiece, it is now known that the normality assumption is not critical at all; similar (but more complicated) results can be proved for very wide classes of distributions. The original result can also be generalised to different loss functions, but there is an important caveat here: the Stein phenomenon only holds when we are interested in simultaneous estimation of all components of θ. If our loss function were L(θ̂, θ) = (θ̂1 − θ1)², for example, then we could not improve on θ̂0. This explains why it wouldn't make much sense to use the James–Stein estimator in our bizarre example above; it is inconceivable that we would be simultaneously interested in three such different quantities to the extent that we would want to incorporate all three estimation errors into our loss function.

Although Stein's result is very clean to state and prove, it may seem somewhat removed from practical statistical problems. Nevertheless, the idea at the heart of Stein's proposal, namely that of employing shrinkage to reduce variance (at the expense of introducing bias), turns out to be a very powerful one that has had a huge impact on statistical methodology. In particular, many modern statistical models may involve thousands or even millions of parameters (e.g. in microarray experiments in genetics, or fMRI studies in neuroimaging); in such circumstances, we would almost certainly want estimators to set some of the parameters to zero, not only to improve performance but also to ensure the interpretability of the fitted model.
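The positive-part estimator with an arbitrary shrink point is a one-liner in practice. Here is a minimal NumPy sketch (our translation of the formula above, not code from the article):

```python
import numpy as np

def js_positive_part(x, theta0):
    """Positive-part James-Stein estimator shrinking x towards theta0:
    theta0 + (1 - (p - 2)/||x - theta0||^2)_+ (x - theta0), with (a)_+ = max(a, 0)."""
    x = np.asarray(x, dtype=float)
    theta0 = np.asarray(theta0, dtype=float)
    p = x.size
    resid = x - theta0
    factor = max(0.0, 1.0 - (p - 2) / np.sum(resid ** 2))  # clipped shrinkage factor
    return theta0 + factor * resid
```

When ‖x − θ0‖² < p − 2 the unclipped factor would be negative, so the clipped estimate collapses exactly to θ0; this is precisely the overshoot past the shrink point that the positive part removes.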

[Figure 1 (plot of risk against ‖θ‖ for 0 ≤ ‖θ‖ ≤ 6): Risks with respect to squared error loss of the usual estimator θ̂0, the James–Stein estimator θ̂JS and the positive-part James–Stein estimator θ̂+JS when p = 5.]

Player       ni    Zi     πi
Baines       415   0.284  0.289
Barfield     476   0.246  0.256
Bell         583   0.254  0.265
Biggio       555   0.276  0.287
Bonilla      625   0.280  0.279
Brett        544   0.329  0.305
Brooks Jr.   568   0.266  0.269
Browne       513   0.267  0.271

Table 1: Number of times at bat ni, batting average Zi in 1990, and career batting average πi, of p = 9 baseball players.

Another important problem that is closely related to estimation is that of constructing a confidence set for θ, the aim being to give an idea of the uncertainty in our estimate of θ. Given α ∈ (0, 1), an exact (1 − α)-level confidence set is a subset C = C(X) of Rp such that, whatever the true value of θ, the confidence set contains it with probability exactly 1 − α. The usual, exact (1 − α)-level confidence set for θ in our original normal distribution set-up is a sphere centred at X. More precisely, it is

    {θ ∈ Rp : ‖X − θ‖² ≤ χ²p(α)},

where χ²p(α) denotes the upper α-point of the χ²p distribution (in other words, if Z ∼ χ²p, then P(Z > χ²p(α)) = α). But in the light of what we have seen in the estimation problem, it is natural to consider confidence sets that are spheres centred at θ̂+JS (or θ̂+JS,θ0, for some θ0 ∈ Rp). Since the distribution of ‖θ̂+JS,θ0 − θ‖² depends on the unknown θ, exact coverage is no longer available, but it may be possible to construct much smaller confidence sets – using bootstrap methods to obtain the radius, for example – which still have at least (1 − α)-level coverage (e.g. Samworth, 2005).

A baseball data example

The following example is adapted from Samworth (2005). The data in Table 1 give the baseball batting averages (number of hits divided by number of times at bat) of p = 9 baseball players, all of whom were active in 1990. The source was www.baseball-reference.com. For i = 1, …, p, let ni and Zi respectively denote the number of times at bat and batting average of the ith player during the 1990 season. Further, let πi denote the player's true batting average, taken to be his career batting average. (Each player had at least 3000 at bats in his career.) We consider the model where Z1, …, Zp are independent, with Zi ∼ ni⁻¹ Bin(ni, πi).

We make the transformation

    Xi = √ni sin⁻¹(2Zi − 1)

and let θi = √ni sin⁻¹(2πi − 1), which means that Xi is approximately distributed as N(θi, 1). A heuristic argument (which can be made rigorous) to justify this is that, by a Taylor expansion applied to the function g(x) = sin⁻¹(2x − 1), we have

    Xi − θi ≈ √ni g′(πi)(Zi − πi) = √ni (Zi − πi)/√(πi(1 − πi)),

and this latter expression has an approximate N(0, 1) distribution when ni is large, by the central limit theorem. In fact, since mini ni ≥ 400, an exact calculation gives that the variance of each Xi is between 1 and 1.005 for πi ∈ [0.2, 0.8]. For our prior guess θ0 = (θ0,1, …, θ0,p)T, we take θ0,i = √n̄ sin⁻¹(2π0 − 1), with π0 = 0.275 and n̄ = p⁻¹ ∑_{i=1}^p ni. We find that ‖X − θ‖² = 2.56, somewhat below its expected value of around 9, though since the variance of a χ²9 random variable is 18, this observation is only around 1.5 standard deviations away from its mean. On the other hand, ‖θ̂+JS,θ0 − θ‖² = 1.50, so Stein estimation does provide an improvement in this case.
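The transform-shrink-invert pipeline of this example can be reproduced numerically. The sketch below is our illustration, not the article's code: it uses only the eight rows of Table 1 that are legible in this copy (so the numbers will not match the p = 9 computation exactly), and it takes the shrink point to be θ0,i = √n̄ sin⁻¹(2π0 − 1) with π0 = 0.275, which is our reading of the garbled formula in this copy.

```python
import numpy as np

# The eight legible rows of Table 1: (times at bat n_i, 1990 average Z_i).
data = {
    "Baines":     (415, 0.284),
    "Barfield":   (476, 0.246),
    "Bell":       (583, 0.254),
    "Biggio":     (555, 0.276),
    "Bonilla":    (625, 0.280),
    "Brett":      (544, 0.329),
    "Brooks Jr.": (568, 0.266),
    "Browne":     (513, 0.267),
}
n = np.array([v[0] for v in data.values()], dtype=float)
z = np.array([v[1] for v in data.values()])
p = len(data)

# Variance-stabilising transformation: X_i = sqrt(n_i) arcsin(2 Z_i - 1)
# is approximately N(theta_i, 1) with theta_i = sqrt(n_i) arcsin(2 pi_i - 1).
x = np.sqrt(n) * np.arcsin(2.0 * z - 1.0)

# Shrink towards a common prior guess pi0 = 0.275 on the transformed scale.
pi0 = 0.275
theta0 = np.sqrt(n.mean()) * np.arcsin(2.0 * pi0 - 1.0)

resid = x - theta0
factor = max(0.0, 1.0 - (p - 2) / np.sum(resid ** 2))  # positive-part shrinkage
theta_hat = theta0 + factor * resid

# Map the shrunken estimates back to the probability scale.
pi_hat = 0.5 * (np.sin(theta_hat / np.sqrt(n)) + 1.0)
```

The qualitative behaviour matches the shrinkage story: the most extreme 1990 average (Brett's 0.329) is pulled substantially back towards the prior guess, in the direction of his career average.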


Letting π = (π1, …, πp) and recalling that θ is a function of π, the usual 95% confidence set for π is

    {π : ‖X − θ(π)‖² ≤ χ²9(0.05)}.

On the other hand, the 95% confidence set for π constructed using the bootstrap approach is the set of π for which θ(π) lies in a sphere centred at θ̂+JS,θ0, with radius obtained by the bootstrap. Numerical integration gives that the volume ratio of the bootstrap confidence set to the usual confidence set in this case is 0.26, so the benefits of having centred the confidence set more appropriately are quite substantial.

References

1. W. James and C. Stein, Estimation with quadratic loss, Proc. Fourth Berkeley Symposium, 1, 361–380, Univ. California Press (1961).
2. R. Samworth, Small confidence sets for the mean of a spherically symmetric distribution, J. Roy. Statist. Soc., Ser. B, 67, 343–361 (2005).
3. C. Stein, Inadmissibility of the usual estimator of the mean of a multivariate normal distribution, Proc. Third Berkeley Symposium, 1, 197–206, Univ. California Press (1956).
4. P. Y.-S. Shao and W. E. Strawderman, Improving on the James–Stein positive-part estimator, Ann. Statist., 22, 1517–1538 (1994).
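The bootstrap calibration of the radius can be sketched as follows. This is a naive parametric bootstrap for illustration only, with an arbitrary shrink point theta0 as input; it is not Samworth's (2005) construction, which is engineered to guarantee at least (1 − α)-level coverage.

```python
import numpy as np

def js_plus(x, theta0):
    """Positive-part James-Stein estimator shrinking x towards theta0."""
    resid = x - theta0
    factor = max(0.0, 1.0 - (x.size - 2) / np.sum(resid ** 2))
    return theta0 + factor * resid

def bootstrap_radius(x, theta0, alpha=0.05, n_boot=5000, seed=0):
    """Naive parametric bootstrap: treat js_plus(x, theta0) as the truth,
    resample X* ~ N(theta_hat, I_p), and take the (1 - alpha)-quantile of
    ||js_plus(X*, theta0) - theta_hat|| as the radius of the confidence sphere."""
    rng = np.random.default_rng(seed)
    theta_hat = js_plus(x, theta0)
    xs = theta_hat + rng.standard_normal((n_boot, x.size))
    dists = [np.linalg.norm(js_plus(row, theta0) - theta_hat) for row in xs]
    return float(np.quantile(dists, 1.0 - alpha))
```

When x happens to lie close to theta0, the resampled Stein estimates cluster tightly around theta_hat, so the sphere can be far smaller than the usual one of radius √χ²p(α); the price is that this naive version carries no coverage guarantee.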

Appendix

First note that since ‖X − θ‖² ∼ χ²p, we have R(θ̂0, θ) = p for all θ ∈ Rp. To compute the risk of the James–Stein estimator, note that we can write

    R(θ̂JS, θ) = E‖X − θ − ((p − 2)/‖X‖²) X‖²
               = E‖X − θ‖² − 2(p − 2) ∑_{i=1}^p E[(Xi − θi) Xi/‖X‖²] + (p − 2)² E[1/‖X‖²].

Consider the expectation inside the sum when i = 1. We can simplify this expectation by writing it out as a p-fold integral, and computing the inner integral by parts: since (x1 − θ1) e^{−(x1 − θ1)²/2} = −(d/dx1) e^{−(x1 − θ1)²/2},

    E[(X1 − θ1) X1/‖X‖²] = E[∂/∂X1 (X1/‖X‖²)] = E[1/‖X‖² − 2X1²/‖X‖⁴],

since the integrated term vanishes. Repeating virtually the same calculation for components i = 2, …, p, we obtain

    ∑_{i=1}^p E[(Xi − θi) Xi/‖X‖²] = (p − 2) E[1/‖X‖²],

and hence

    R(θ̂JS, θ) = p − (p − 2)² E[1/‖X‖²] < p = R(θ̂0, θ) for all θ ∈ Rp,

since E[1/‖X‖²] is finite and positive when p ≥ 3.
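The risk identity R(θ̂JS, θ) = p − (p − 2)² E[1/‖X‖²], which the integration-by-parts argument yields, can be checked by simulation. This sketch (ours, not the article's) evaluates both sides on the same simulated draws:

```python
import numpy as np

def js_risk_mc(theta, n_sims=200_000, seed=1):
    """Monte Carlo risk of the James-Stein estimator alongside the formula
    p - (p - 2)^2 E[1/||X||^2], both evaluated on the same draws."""
    rng = np.random.default_rng(seed)
    p = theta.size
    x = theta + rng.standard_normal((n_sims, p))     # X ~ N(theta, I_p)
    norm_sq = np.sum(x ** 2, axis=1)
    est = (1.0 - (p - 2) / norm_sq)[:, None] * x     # James-Stein estimates
    mc = np.mean(np.sum((est - theta) ** 2, axis=1))
    exact = p - (p - 2) ** 2 * np.mean(1.0 / norm_sq)
    return float(mc), float(exact)
```

At θ = 0, ‖X‖² ∼ χ²p and E[1/‖X‖²] = 1/(p − 2), so both numbers should be near p − (p − 2) = 2, in agreement with the claim in the main text.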

