The Philosophy of Statistics

Author(s): Dennis V. Lindley
Source: Journal of the Royal Statistical Society. Series D (The Statistician), Vol. 49, No. 3 (2000), pp. 293-337
Published by: Blackwell Publishing for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2681060
Accessed: 13/02/2011 20:00
The Statistician (2000) 49, Part 3, pp. 293-337
The philosophy of statistics

Dennis V. Lindley
Minehead, UK
[Received June 1999]
Summary. This paper puts forward an overall view of statistics. It is argued that statistics is the study of uncertainty. The many demonstrations that uncertainties can only combine according to the rules of the probability calculus are summarized. The conclusion is that statistical inference is firmly based on probability alone. Progress is therefore dependent on the construction of a probability model; methods for doing this are considered. It is argued that the probabilities are personal. The roles of likelihood and exchangeability are explained. Inference is only of value if it can be used, so the extension to decision analysis, incorporating utility, is related to risk and to the use of statistics in science and law. The paper has been written in the hope that it will be intelligible to all who are interested in statistics.
Keywords: Conglomerability; Data analysis; Decision analysis; Exchangeability; Law; Likelihood; Models; Personal probability; Risk; Scientific method; Utility
1. Introduction
Instead of discussing a specific problem, this paper provides an overview within which most statistical issues can be considered. 'Philosophy' in the title is used in the sense of 'The study of the general principles of some particular branch of knowledge, experience or activity' (Onions, 1956). The word has recently acquired a reputation for being concerned solely with abstract issues, divorced from reality. My intention here is to avoid excessive abstraction and to deal with practical matters concerned with our subject. If the practitioner who reads this paper does not feel that the study has benefited them, my writing will have failed in one of its endeavours. The paper tries to develop a way of looking at statistics that will help us, as statisticians, to better develop a sound approach to any problem which we might encounter. Technical matters have largely been avoided, not because they are not important, but in the interests of focusing on a clear understanding of how a statistical situation can be studied. At some places, matters of detail have been omitted to highlight the key idea. For example, probability densities have been used without an explicit mention of the dominating measure to which they refer.
The paper begins by recognizing that statistical issues concern uncertainty, going on to argue that uncertainty can only be measured by probability. This conclusion enables a systematic account of inference, based on the probability calculus, to be developed, which is shown to be different from some conventional accounts. The likelihood principle follows from the basic role played by probability. The role of data analysis in the construction of probability models and the nature of models are next discussed. The development leads to a method of making decisions and the nature of risk is considered. Scientific method and its application to some legal issues are explained within the probabilistic framework. The conclusion is that we have here a satisfactory general set of statistical procedures whose implementation should improve statistical practice.
Address for correspondence: Dennis V. Lindley, "Woodstock", Quay Lane, Minehead, Somerset, TA24 5QU, UK.
E-mail: thombayes@aol.com
© 2000 Royal Statistical Society 0039-0526/00/49293
The philosophy here presented places more emphasis on model construction than on formal inference. In this it agrees with much recent opinion. A reason for this change of emphasis is that formal inference is a systematic procedure within the calculus of probability. Model construction, by contrast, cannot be so systematic.
The paper arose out of my experiences at the Sixth Valencia Conference on Bayesian Statistics, held in June 1998 (Bernardo et al., 1999). Although I was impressed by the overall quality of the papers and the substantial advances made, many participants did not seem to me fully to appreciate the Bayesian philosophy. This paper is an attempt to describe my version of that philosophy. It is a reflection of 50 years' statistical experience and a personal change from a frequentist, through objective Bayes, to the subjective attitude presented here. No attempt has been made to analyse in detail alternative philosophies, only to indicate where their conclusions differ from those developed here and to contrast the resulting practical methods.
2. Statistics
To discuss the philosophy of statistics, it is necessary to be reasonably clear what it is the philosophy of, not in the sense of a precise definition, so that this is 'in', that is 'out', but merely to be able to perceive its outlines. The suggestion here is that statistics is the study of uncertainty (Savage, 1977): that statisticians are experts in handling uncertainty. They have developed tools, like standard errors and significance levels, that measure the uncertainties that we might reasonably feel. A check of how well this description of our subject agrees with what we actually do can be performed by looking at the four series of journals published by the Royal Statistical Society in 1997. These embrace issues as diverse as social accounting and stable laws.
Apart from a few exceptions, like the algorithms section (which has subsequently been abandoned) and a paper on education, all the papers deal either directly with uncertainty or with features, like stable laws, which arise in problems that exhibit uncertainty. Support for this view of our subject is provided by the fact that statistics plays a greater role in topics that have variability, giving rise to uncertainty, as an essential ingredient, than in more precise subjects. Agriculture, for example, enjoys a close association with statistics, whereas physics does not. Notice that it is only the manipulation of uncertainty that interests us. We are not concerned with the matter that is uncertain. Thus we do not study the mechanism of rain; only whether it will rain. This places statistics in a curious situation in that we are, as practitioners, dependent on others. The forecast of rain will be dependent on both meteorologist and statistician. Only as theoreticians can we exist alone. Even there we suffer if we remain too divorced from the science. The term 'client' will be used in reference to the person, e.g. scientist or lawyer, who encounters uncertainty in their field of study.
The philosophical position adopted here is that statistics is essentially the study of uncertainty and that the statistician's role is to assist workers in other fields, the clients, who encounter uncertainty in their work. In practice, there is a restriction in that statistics is ordinarily associated with data; and it is the link between the uncertainty, or variability, in the data and that in the topic itself that has occupied statisticians. Some writers even restrict the data to be frequency data, capable of near-identical repetition. Uncertainty, away from data, has rarely been of statistical interest. Statisticians do not have a monopoly of studies of uncertainty. Probabilists discuss how randomness in one part of a system affects other parts. Thus the model for a stochastic process provides predictions about the data that the process will provide. The passage from process to data is clear; it is when we attempt a reversal and go from data to process that difficulties appear. This paper is mainly devoted to this last phase, commonly called inference, and the action that it might generate.
Notice that uncertainty is everywhere, not just in science or even in data. It provides a motivation for some aspects of theology (Bartholomew, 1988). Therefore, the recognition of statistics as uncertainty would imply an extensive role for statisticians. If a philosophical position can be developed that embraces all uncertainty, it will provide an important advance in our understanding of the world. At the moment it would be presumptuous to claim so much.
3. Uncertainty
Acceptance that statistics is the study of uncertainty implies that it is necessary to investigate the phenomenon. A scientific approach would mean the measurement of uncertainty; for, to follow Kelvin, it is only by associating numbers with any scientific concept that the concept can be properly understood. The reason for measurement is not just to make more precise the notion that we are more uncertain about the stock-market than about the sun rising tomorrow, but to be able to combine uncertainties. Only exceptionally is there one element of uncertainty in a problem; more realistically there are several. In the collection of data, there is uncertainty in the sampling unit, and then in the number reported in the sampling. In an archetypal statistical problem, there is uncertainty in both data and parameter. The central problem is therefore the combination of uncertainties. Now the rules for the combination of numbers are especially simple. Furthermore, numbers combine in two ways, addition and multiplication, so leading to a richness of ideas. We want to measure uncertainties in order to combine them. A politician said that he preferred adverbs to numbers. Unfortunately it is difficult to combine adverbs.
How is this measurement to be achieved? All measurement is based on a comparison with a standard. For length we refer to the orange-red line of the krypton-86 isotope. The key role of comparisons means that there are no absolutes in the world of measurement. This is a point which we shall return to in Section 11. It is therefore necessary to find a standard for uncertainty. Several have been suggested but the simplest is historically the first, namely games of chance. These provided the first uncertainties to be studied systematically. Let us therefore use as our standard a simple game.
Consider before you an urn containing a known number N of balls that are as nearly identical as modern engineering can make them. Suppose that one ball is drawn at random from the urn. For this to make sense, it is needful to define randomness. Imagine that the balls are numbered consecutively from 1 to N and suppose that, at no cost to you, you were offered a prize if ball 57 were drawn. Suppose, alternatively, that you were offered the same prize if ball 12 were drawn. If you are indifferent between the two propositions and, in extension, between any two numbers between 1 and N, then, for you, the ball is drawn at random. Notice that the definition of randomness is subjective; it depends on you. What is random for one person may not be random for another. We shall return to this aspect in Section 8.
Having said what is meant by the drawing of a ball at random, forget the numbers and suppose that R of the balls are red and the remainder white, the colouring not affecting your opinion of randomness. Consider the uncertain event that the ball, withdrawn at random, is red. The suggestion is that this provides a standard for uncertainty and that the measure is R/N, the proportion of red balls in the urn. There is nothing profound here, being just a variant of the assumption on which games of chance are based. Now pass to any event, or proposition, which can either happen or not, be true or false. It is proposed to measure your uncertainty associated with the event happening by comparison with the standard. If you think that the event is just as uncertain as the random drawing of a red ball from an urn containing N balls, of which R are red, then the event has uncertainty R/N for you. R and N are for you to choose. For given N, it is easy to see that there cannot be more than one such R. There is now a measure of uncertainty for any
event or proposition.
Before proceeding, let us consider the measurement process carefully. A serious assumption has been made that the concept of uncertainty can be isolated from other features. In discussing randomness, it was useful to compare a prize given under different circumstances, ball 57 or ball 12. Rewards will not necessarily work in the comparison of an event with a standard. For example, suppose that the event whose uncertainty is being assessed is the explosion of a nuclear weapon, within 50 miles of you, next year. Then a prize of £10000, say, will be valued differently when a red ball appears from when it will when fleeing from the radiation. The measurement process just described assumes that you can isolate the uncertainty of the nuclear explosion from its unpleasant consequences. For the moment we shall make the assumption, returning to it in Section 17 to show that, in a sense, it is not important. Ramsey (1926), whose work will be discussed in Section 4, introduced the concept of an 'ethically neutral event' for which the comparison with the urn presents fewer difficulties. Nuclear bombs are not ethically neutral.
In contrast, notice an assumption that has not been made. For any event, including the nuclear bomb, it has not been assumed that you can do the measurement to determine R (and N, but that only reflects the precision in your assessment of the uncertainty). Rather, we assume that you would wish to do it, were you to know how. All that is assumed of any measurement process is that it is reasonable, not that it can easily be done. Because you do not know how to measure the distance to our moon, it does not follow that you do not believe in the existence of a distance to it. Scientists have spent much effort on the accurate determination of length because they were convinced that the concept of distance made sense in terms of krypton light. Similarly, it seems reasonable to attempt the measurement of uncertainty.
4. Uncertainty and probability
It has been noted that a prime reason for the measurement of uncertainties is to be able to combine them, so let us see how the method suggested accomplishes this end. Suppose that, of the N balls in the urn, R are red, B are blue and the remainder white. Then the uncertainty that is associated with the withdrawal of a coloured ball is (R + B)/N = R/N + B/N, the sum of the uncertainties associated with red, and with blue, balls. The same result will obtain for any two exclusive events whose uncertainties are respectively R/N and B/N and we have an addition rule for your uncertainties of exclusive events.
Next, suppose again that R balls are red and the remaining N - R white; at the same time, S are spotted and the remaining N - S plain. The urn then contains four types of ball, of which one type is both spotted and red, of which the number is say T. Then the uncertainty associated with the withdrawal of a spotted red ball is T/N, which is equal to R/N × T/R, the product of the uncertainty of a red ball and that of spotted balls among the red. Again the same result will apply for any two events being compared with coloured and with spotted balls and we have a product rule for uncertainties.
The addition and product rules just obtained, together with the convexity rule that the measurement R/N always lies in the (convex) unit interval, are the defining rules of probability, at least for a finite number of events (see Section 5). The conclusion is therefore that the measurements of uncertainty can be described by the calculus of probability. In the reverse direction, the rules of probability reduce simply to the rules governing proportions. Incidentally, this helps to explain why frequentist arguments are often so useful: the combination of uncertainties can be studied by proportions, or frequencies, in a group, here of balls. The mathematical basis of probability is very simple and it is perhaps surprising that it yields complicated and useful results.
The conclusions are that statisticians are concerned with uncertainty and that uncertainty can
be measured by probability. The sketchy demonstration here may not be thought adequate for such an important conclusion, so let us look at other approaches.
Historically, uncertainty has been associated with games of chance and gambling. Hence one way of measuring uncertainty is through the gambles that depend on it. The willingness to stake s on an event to win w if the event occurs is, in effect, a measure of the uncertainty of the event expressed through the odds (against) of w/s to 1. The combination of events is now replaced by collections of gambles. To fix ideas, contemplate the situation in horse-racing. With two horses running in the same race, bets on them separately may be considered as well as a bet on either of them winning. With horses in different races, the event of both winning may be used. Using combinations like these, the concept of what we term in Britain a Dutch book can be employed. A series of bets at specified odds is said to constitute a Dutch book if it is possible to place a set of stakes in such a way that one is sure to come out overall with an increase in assets. A bookmaker never states odds for which a Dutch book may be made. It is easy to show (de Finetti, 1974, 1975) that avoidance of a Dutch book is equivalent to the probabilities, derived from the odds, obeying the convexity, addition and multiplication rules of probability already mentioned. For odds o, the corresponding probability is $p = (1 + o)^{-1}$. In summary, the use of odds, combined with the impossibility of a Dutch book, leads back to probability as before.
de Finetti introduced another approach. Suppose that you are required to state your numerical uncertainty for an event, and you are told that you will be scored by an amount $S(x, E)$ if you state x, where E = 1 or E = 0 if the event is subsequently shown to be true or false respectively. For different events, the scores are to be added. A possible penalty score used by him is $S(x, E) = (x - E)^2$, but, aside from a few scores that lead to ridiculous conclusions, any function will do. Then if you use x as some function of probability, i.e. as a function of something that obeys those three rules of probability, you will be sure, in the sense of obtaining a smaller penalty score, to do better than someone who acts in another way. Which function depends on S. This approach is attractive because it is operational and has been used to train weather forecasters. Generally it provides an empirical check on the quality of your probability assessments and hence a test of your abilities as a statistician (see Section 9). The important point here is that again it leads to probability.
We next study a third group of methods that again compels the use of probability to describe uncertainty. The Royal Statistical Society, together with many other statistical groups, was originally set up to gather and publish data. This agrees with our argument about uncertainty because part of the purpose behind the data gathering was a reduction in uncertainty. It remains an essential part of statistical activity today and most Governments have statistical offices whose function is the acquisition and presentation of statistics. It did not take long before statisticians wondered how the data might best be used and modern statistical inference was born. Then statisticians watched as the inferences were used as the basis of decisions, or actions, and began, with others, to concern themselves with decision-making. It is now explained how this leads again to probability.
The first person to make the step to decision-making seems to have been Ramsey (1926). He asked the simple question 'how should we make decisions in the face of uncertainty?'. He made some apparently reasonable assumptions and from them deduced theorems. The theorem that concerns us here is that which says that the uncertainty measures should obey the rules of the probability calculus. So we are back to probability again. Ramsey's work was unappreciated until Savage (1954) asked the same question and from different, but related, assumptions came up with the same result, namely that it has to be probability. Savage also established a link with de Finetti's ideas. Since then, many others have explored the field with similar results. My personal favourite among presentations that have full rigour is that of DeGroot (1970), chapter 6. An excellent recent presentation is Bernardo and Smith (1994).
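The Dutch book argument described earlier in this section can be sketched in a few lines of Python. The sketch is illustrative only and rests on assumptions not in the text: the outcomes are taken to be exclusive and exhaustive (for example, the horses in a single race), each is quoted at odds (against) of o to 1, and only a book that can be beaten by backing every outcome is looked for. The function names and the example odds are hypothetical; the conversion $p = (1 + o)^{-1}$ is the one given above.

```python
from fractions import Fraction


def implied_probability(odds_against):
    """Convert odds (against) of o to 1 into the probability p = 1 / (1 + o)."""
    return 1 / (1 + Fraction(odds_against))


def dutch_book(odds_by_outcome, payout=Fraction(1)):
    """Look for a sure gain from backing every one of a set of exclusive,
    exhaustive outcomes at the quoted odds.

    A stake s at odds o returns s * (1 + o) = s / p if the outcome occurs,
    so staking payout * p on each outcome returns `payout` whatever happens.
    Returns (stakes, guaranteed_profit), or None when the implied
    probabilities sum to at least 1 and no such sure gain exists.
    """
    probs = {name: implied_probability(o) for name, o in odds_by_outcome.items()}
    total = sum(probs.values())
    if total >= 1:
        return None
    stakes = {name: payout * p for name, p in probs.items()}
    return stakes, payout - payout * total


# Hypothetical odds: implied probabilities 1/3 + 1/4 + 1/6 = 3/4 < 1.
incoherent = {"A": 2, "B": 3, "C": 5}
stakes, profit = dutch_book(incoherent)
print(stakes, profit)  # the profit is 1/4 per unit payout

# Hypothetical odds obeying the addition rule: 1/2 + 1/4 + 1/4 = 1.
coherent = {"A": 1, "B": 3, "C": 3}
print(dutch_book(coherent))  # None
```

Exact fractions are used so that the comparison with 1 is not disturbed by rounding. Odds whose implied probabilities sum to more than 1 favour the bookmaker and are reported here as offering no sure gain to the bettor.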
5. Probability
The conclusion is that measurements of uncertainty must obey the rules of the probability calculus. Other rules, like those of fuzzy logic or possibility theory, dependent on maxima and minima, rather than sums and products, are out. So are some rules used by statisticians; see Section 6. All these derivations, whether based on balls in urns, gambles, scoring rules or decision-making, are based on assumptions. Since these assumptions imply such important results, it is proper that they are examined with great care. Unfortunately, the great majority of statisticians do not do this. Some deny the central result about probability, without exploring the reasons for it. It is not the purpose of this paper to provide a rigorous account but let us look at one assumption because of its later implications. As with all the assumptions, it is intended to be self-evident and that you would feel foolish if you were to be caught violating it. It is based on a primitive notion of one event being more likely than another. ('Likely' is not used in its technical sense, but as part of normal English usage.) We write 'A is more likely than B' as $A > B$. The assumption is that if $A_1$ and $A_2$ are exclusive, and the same is true of $B_1$ and $B_2$, then $A_i > B_i$ $(i = 1, 2)$ imply $A_1 \cup A_2 > B_1 \cup B_2$. ($A_1 \cup A_2$ means the event that is true whenever either $A_1$ or $A_2$ is true.) We might feel unhappy if we thought that the next person to pass through a door was more likely to be a non-white female, $A_1$, than a non-white male, $B_1$, that a white female, $A_2$, is more likely than a white male, $B_2$, yet a female, $A_1 \cup A_2$, was less likely than a male, $B_1 \cup B_2$. The developments outlined above start from assumptions, or axioms, of this character. The important point is that they all lead to probability being the only satisfactory expression of uncertainty.
The last sentence is not strictly true. Some writers have considered the axioms carefully and produced objections. A fine critique is Walley (1991), who went on to construct a system that uses a pair of numbers, called upper and lower probabilities, in place of the single probability. The result is a more complicated system. My position is that the complication seems unnecessary. I have yet to meet a situation in which the probability approach appears to be inadequate and where the inadequacy can be fixed by employing upper and lower values. The pair is supposed to deal with the precision of probability assertions; yet probability alone contains a measure of its own precision. I believe in simplicity; provided that it works, the simpler is to be preferred over the complicated, essentially Occam's razor.
With the conclusion that uncertainty is only satisfactorily described by probability, it is convenient to state formally the three rules, or axioms, of the probability calculus. Probability depends on two elements: the uncertain event and the conditions under which you are considering it. In the extraction of balls from an urn, your probability for red depends on the condition that the ball is drawn at random. We write $p(A \mid B)$ for your probability of A when you know, or are assuming, B to be true, and we speak of your probability of A, given B. The rules are as follows.
(a) Rule 1 (convexity): for all A and B, $0 \le p(A \mid B) \le 1$ and $p(A \mid A) = 1$.
(b) Rule 2 (addition): if A and B are exclusive, given C, $p(A \cup B \mid C) = p(A \mid C) + p(B \mid C)$.
(c) Rule 3 (multiplication): for all A, B and C, $p(AB \mid C) = p(A \mid BC)\, p(B \mid C)$.
(Here AB means the event that occurs if, and only if, both A and B occur.) The convexity rule is sometimes strengthened to include $p(A \mid B) = 1$ only if A is a logical consequence of B. This strengthening is called Cromwell's rule.
There is one point about the addition rule that appears to be merely a mathematical nicety but in fact has important practical consequences to be exhibited in Section 8. With the three rules as
stated above, it is easy to extend the addition rule for two, to any finite number of events. None of the many approaches already discussed lead to the rule's holding for an infinity of events. It is usual to suppose that it does so hold because of the undesirable results that follow without it. This can be made explicit either by simply restating the addition rule, or by adding a fourth rule which, together with the three above, leads to addition for an infinity of exclusive events. My personal preference is for the latter and to add the following.
(d) Rule 4 (conglomerability): if $\{B_n\}$ is a partition, possibly infinite, of C and $p(A \mid B_n C) = k$, the same value for all n, then $p(A \mid C) = k$.
(It is easy to verify that rule 4 follows from rules 1-3 when the partition is finite. The definition is due to de Finetti.) Conglomerability is in the spirit of a class of rules known as 'sure things'. Roughly, if whatever happens (whatever $B_n$) your belief is k, then your belief is k unconditionally. The assumption described earlier in this section is in the same spirit. Some statisticians appear to be conglomerable only when it suits them: hence the practical connection to be studied in Section 8. Note that the rules of probability are here not stated as axioms in the manner found in texts on probability. They are deductions, apart from rule 4, from other, more basic, assumptions.
6. Significance and confidence
The reaction of many statisticians to the assertion that they should use probability will be to say that they do it already, and that the developments here described do nothing more than give a little cachet to what is already being done. The journals are full of probabilities: normal and binomial distributions abound; the exponential family is everywhere. It might even be claimed that no other measure of uncertainty is used: few, if any, statisticians embrace fuzzy logic. Yet this is not true; statisticians do use measures of uncertainty that do not combine according to the rules of the probability calculus.
Consider a hypothesis H, that a medical treatment is ineffectual, or that a specific social factor does not influence crime levels. The physician, or sociologist, is uncertain about H, and data are collected in the hope of removing, or at least reducing, the uncertainty. A statistician called in to advise on the uncertainty aspect may recommend that the client uses, as a measure of uncertainty, a tail area, significance level, with H as the null hypothesis. That is, assuming that H is true, the probability of the observed, or more extreme, data is calculated. This is a measure of the credence that can be put on H; the smaller the probability, the smaller is the credence.
This usage flies in the face of the arguments above which assert that uncertainty about H needs to be measured by a probability for H. A significance level is not such a probability. The distinction can be expressed starkly:
significance level: the probability of some aspect of the data, given H is true;
probability: your probability of H, given the data.
The prosecutor's fallacy is well known in legal circles. It consists in confusing $p(A \mid B)$ with $p(B \mid A)$, two values which are only rarely the same. The distinction between significance levels and probability is almost the prosecutor's fallacy: 'almost' because although B, in the prosecutor form, may be equated with H, the data are treated differently. Probability uses A as data. Adherents of significance levels soon recognized that they could not use just the data but had to include 'more extreme' data in the form of the tail of a distribution. As Jeffreys (1961) put it: the level includes data which might have happened but did not.
From hypothesis testing, let us pass to point estimation. A parameter $\theta$ might be the effect of
the medical treatment, or the influence of the social factor on crime. Again $\theta$ is uncertain, data might be collected and a statistician consulted (hopefully not in that order). The statistician will typically recommend a confidence interval. The development above based on measured uncertainty will use a probability density for $\theta$, and perhaps an interval of that density. Again we have a contrast similar to the prosecutor's fallacy:
confidence: probability that the interval includes $\theta$;
probability: probability that $\theta$ is included in the interval.
The former is a probability statement about the interval, given $\theta$; the latter about $\theta$, given the data. Practitioners frequently confuse the two. More important than the confusion is the fact that neither significance levels nor statements of confidence combine according to the rules of the probability calculus.
Does the confusion matter? At a theoretical level, it certainly does, because the use of any measure that does not combine according to the rules of the probability calculus will ultimately violate some of the basic assumptions that were intended to be self-evident and to cause embarrassment if violated. At a practical level, it is not so clear and it is necessary to spend a while explaining the practical implications. Statisticians tend to study problems in isolation, with the result that combinations of statements are not needed, and it is in the combinations that difficulties can arise, as was seen in the colour-sex example in Section 5. For example, it is rarely possible to make a Dutch book against statements of significance levels. Some common estimators are known to be inadmissible. The clearest example of an important violation occurs with the relationship between a significance level and the sample size n on which it is based. The interpretation of 'significant at 5%' depends on n, whereas a probability of 5% always means the same. Statisticians have paid inadequate attention to the relationships between statements that they make and the sample sizes on which they are based.
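The dependence of 'significant at 5%' on n can be made concrete with a small numerical sketch. The set-up is an assumption introduced purely for illustration and is not taken from the paper: a point null hypothesis H that $\theta = 0$, a sample mean that is normal with known variance $\sigma^2 / n$, a normal $N(0, \tau^2)$ distribution for $\theta$ when H is false, and probability 1/2 for H before the data. The observed mean is held exactly on the two-sided 5% boundary (z = 1.96) while n varies. The function and parameter names are hypothetical.

```python
import math


def prob_null_given_data(z, n, prior_null=0.5, tau=1.0, sigma=1.0):
    """Probability of H (theta = 0) given a sample mean lying z standard
    errors from 0.

    Assumed model (illustrative only): mean ~ N(theta, sigma^2 / n), and
    theta ~ N(0, tau^2) when H is false.
    """
    r = n * tau**2 / sigma**2
    # Density of the observed mean under H divided by its density when H is false.
    bayes_factor = math.sqrt(1 + r) * math.exp(-0.5 * z**2 * r / (1 + r))
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


z_5_percent = 1.96  # two-sided 5% significance for a normal mean
for n in (10, 100, 1000, 10000):
    print(n, round(prob_null_given_data(z_5_percent, n), 3))
```

Under these assumptions the same 5% significance level corresponds to a steadily increasing probability for H as n grows, which is one way of seeing that the level, unlike a probability, does not always mean the same.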
There are theoretical reasons (Berger and Delampady, 1987) for thinking that it is too easy to obtain 5% significance. If so, many experiments raise false hopes of a beneficial effect that does not truly exist.
Individual statistical statements, made in isolation, may not be objectionable; the trouble lies in their combinations. For example, confidence intervals for a single parameter are usually acceptable but, with many parameters, they are not. Even the ubiquitous sample mean for a normal distribution is unsound in high dimensions. In an experiment with several treatments, individual tests are fine but multiple comparisons present problems. Scientific truth is established by combining the results of many experiments: yet meta-analysis is a difficult area for statistics. How do you combine several data sets concerning the same hypothesis, each with its own significance level? The conclusions from two Student t-tests on means $\mu_1$ and $\mu_2$ do not cohere with the bivariate T-test for $(\mu_1, \mu_2)$ (Healy, 1969). In contrast, the view adopted here easily takes the margins of the latter to provide the former.
The conclusion that probability is the only measure of uncertainty is therefore not just a pat on the back but strikes at many of the basic statistical activities. Savage developed his ideas in an attempt to justify statistical practice. He surprised himself by destroying some aspects of it. Let us therefore pass from disagreements to the constructive ideas that flow from the appreciation of the basic role of probability in statistics.
7. Inference
The formulation that has served statistics well throughout this century is based on the data having, for each value of a parameter, a probability distribution. This accords with the idea that the uncertainty in the data needs to be described probabilistically. It is desired to learn something about the parameter from the data. Generally not every aspect of the parameter is of interest, so write it as $(\theta, \alpha)$ where we wish to learn about $\theta$ with $\alpha$ as a nuisance, to use the technical term. Denoting the data by $x$, the formulation introduces $p(x \mid \theta, \alpha)$, the probability of $x$ given $\theta$ and $\alpha$. A simple example would have a normal distribution of mean $\theta$ and variance $\alpha$, but the formulation embraces many complicated cases.

This handles the uncertainty in the data to everyone's satisfaction. The parameter is also uncertain. Indeed, it is that uncertainty that is the statistician's main concern. The recipe says that it also should be described by a probability $p(\theta, \alpha)$. In so doing, we depart from the conventional attitude. It is often said that the parameters are assumed to be random quantities. This is not so. It is the axioms that are assumed, from which the randomness property is deduced. With both probabilities available, the probability calculus can be invoked to evaluate the revised uncertainty in the light of the data:
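$$p(\theta, \alpha \mid x) \propto p(x \mid \theta, \alpha)\, p(\theta, \alpha), \tag{1}$$

the constant of proportionality dependent only on $x$, not the parameters. Since $\alpha$ is not of interest, it can be eliminated, again by the probability calculus, to give

$$p(\theta \mid x) = \int p(\theta, \alpha \mid x)\, d\alpha. \tag{2}$$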
Equation (1) is the product rule; equation (2) the addition rule. Together they solve the problem of inference, or, better, they provide a framework for its solution. Equation (1) is Bayes's theorem and, by a historical accident, it has given its name to the whole approach, which is termed Bayesian. This perhaps unfortunate terminology is accompanied by some other which is even worse. $p(\theta)$ is often called the prior distribution, $p(\theta \mid x)$ the posterior. These are unfortunate because prior and posterior are relative terms, referring to the data. Today's posterior is tomorrow's prior. The terms are so engrained that their complete avoidance is almost impossible.

Let us summarize the position reached.

(a) Statistics is the study of uncertainty.
(b) Uncertainty should be measured by probability.
(c) Data uncertainty is so measured, conditional on the parameters.
(d) Parameter uncertainty is similarly measured by probability.
(e) Inference is performed within the probability calculus, mainly by equations (1) and (2).
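Equations (1) and (2) can be sketched numerically for the simple normal example with mean $\theta$ and variance $\alpha$. This is only an illustration under stated assumptions: the parameters are restricted to a finite grid, $p(\theta, \alpha)$ is taken to be constant over that grid, and the data values and function names are hypothetical.

```python
import math


def normal_loglik(data, theta, alpha):
    """log p(x | theta, alpha) for independent N(theta, alpha) data (alpha = variance)."""
    n = len(data)
    ss = sum((x - theta) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * alpha) - ss / (2 * alpha)


def grid_posterior(data, thetas, alphas, prior):
    """Equation (1) on a grid: p(theta, alpha | x) is proportional to likelihood times prior."""
    logpost = {
        (t, a): normal_loglik(data, t, a) + math.log(prior(t, a))
        for t in thetas
        for a in alphas
    }
    top = max(logpost.values())  # subtract the maximum to avoid underflow
    unnorm = {key: math.exp(value - top) for key, value in logpost.items()}
    total = sum(unnorm.values())  # the constant of proportionality, depending only on x
    return {key: value / total for key, value in unnorm.items()}


def marginal_theta(joint, thetas, alphas):
    """Equation (2) on a grid: sum the joint posterior over the nuisance parameter alpha."""
    return {t: sum(joint[(t, a)] for a in alphas) for t in thetas}


data = [4.1, 5.3, 4.8, 5.9, 5.2]             # hypothetical observations
thetas = [0.1 * i for i in range(0, 101)]    # grid for the parameter of interest
alphas = [0.1 * j for j in range(1, 51)]     # grid for the nuisance variance
joint = grid_posterior(data, thetas, alphas, prior=lambda t, a: 1.0)
marg = marginal_theta(joint, thetas, alphas)
mode = max(marg, key=marg.get)
print(round(mode, 1), round(sum(marg.values()), 6))
```

The sum over the grid stands in for the integral in equation (2); a finer grid or a different $p(\theta, \alpha)$ changes the numbers but not the two steps.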
Points (a) and (b) have been discussed and (c) is generally accepted. Point (e) follows from (a)-(d). The main protest against the Bayesian position has been to point (d). It is therefore considered next.

8. Subjectivity
At the basic level from which we started, it is clear that one person's uncertainties are ordinarily different from another's. There may be agreement over well-conducted games of chance, so that they may be used as a standard, but on many issues, even in science, there can be disagreement. As this is being written, scientists disagree on some issues concerning genetically modified food. It might therefore be sensible to reflect the subjectivity in the notation. The preferred way to do this is to include the concept of a person's knowledge at the time that the probability judgment is made. Denoting that by $K$, a better notation than $p(\theta)$ is $p(\theta \mid K)$, the probability of $\theta$ given $K$, changing to $p(\theta \mid x, K)$ on acquiring the data. This notation is valuable because it emphasizes the fact that probability is always conditional (see Section 5). It depends on two arguments: the element whose uncertainty is being described and the knowledge on which that uncertainty is based. The omission of the conditioning argument often leads to confusion. The distinction between prior and posterior is better described by emphasizing the different conditions under which the probability of the parameter is being assessed.

It has been suggested that two people with the same knowledge should have the same uncertainties, and therefore the same probabilities. It is called the necessary view. There are two difficulties with this attitude. First, it is difficult to say what is meant by two people having the same knowledge, and also hard to realize in practice. The second point is that, if the probability does necessarily follow, it should be possible to evaluate it without reference to a person. So far no-one has managed the evaluation in an entirely satisfactory way. One way is through the concept of ignorance. If a state of knowledge $I$ was identified, that described the lack of knowledge about $\theta$, and that $p(\theta \mid I)$ was defined, then $p(\theta \mid K)$, for any $K$, could be calculated by Bayes's theorem (1) on updating $I$ to $K$. Unfortunately attempts to do this ordinarily lead to a conflict with conglomerability. For example, suppose that $\theta$ is known only to assume positive integer values and you are otherwise ignorant about $\theta$. Then the usual concept of ignorance means all values are equally probable: $p(\theta = i \mid I) = c$ for all $i$. The addition rule for $n$ exclusive events $\{\theta = i\}$, with $nc > 1$, means $c = 0$ since no probability can exceed 1 by convexity. Now partition the positive integers into sets of three, each containing two odd, and one even, value: $A_n = (4n - 3, 4n - 1, 2n)$ will do. If $E$ is the event that $\theta$ is even, $p(E \mid A_n) = 1/3$ and by conglomerability $p(E) = 1/3$. Another partition, each set with two even and one odd value, say $B_n = (4n - 2, 4n, 2n - 1)$, has $p(E \mid B_n) = 2/3$ and hence $p(E) = 2/3$ in contradiction with the previous result.
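The two partitions can be checked mechanically. The sketch below, whose function names are hypothetical, confirms for a finite range that each family of blocks contains every positive integer exactly once, and computes the conditional probability of an even value within a block when its three members are treated as equally probable.

```python
from fractions import Fraction


def block_a(n):
    """A_n = (4n - 3, 4n - 1, 2n): two odd values and one even value."""
    return (4 * n - 3, 4 * n - 1, 2 * n)


def block_b(n):
    """B_n = (4n - 2, 4n, 2n - 1): two even values and one odd value."""
    return (4 * n - 2, 4 * n, 2 * n - 1)


def prob_even_given_block(block):
    """P(E | block) when the three members of the block are equally probable."""
    evens = sum(1 for k in block if k % 2 == 0)
    return Fraction(evens, len(block))


def covers_each_integer_once(block_fn, limit):
    """Check that every integer 1..limit lies in exactly one block."""
    counts = {k: 0 for k in range(1, limit + 1)}
    # Blocks with n > limit contain no value <= limit, so n <= limit suffices.
    for n in range(1, limit + 1):
        for k in block_fn(n):
            if k <= limit:
                counts[k] += 1
    return all(count == 1 for count in counts.values())


print(covers_each_integer_once(block_a, 1000), prob_even_given_block(block_a(1)))
print(covers_each_integer_once(block_b, 1000), prob_even_given_block(block_b(1)))
```

Both families partition the same integers, yet the conditional probability of $E$ is 1/3 in every block of the first and 2/3 in every block of the second, which is the conflict with conglomerability described above.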
By the suitable selection of a partition, $p(E)$ can assume any value in the unit interval. Such a distribution is said to be improper.

Unfortunately most attempts to produce $p(\theta \mid I)$ by a necessary argument lead to impropriety which, in addition to violating conglomerability, leads to other types of unsatisfactory behaviour. See, for example, Dawid et al. (1973). The necessary view was first examined in detail by Jeffreys (1961). Bernardo (1999) and others have made real progress but the issue is still unresolved. Here the view will be taken that probability is an expression by a person with specified knowledge about an uncertain quantity. The person will be referred to as 'you'. $p(A \mid B)$ is your belief about $A$ when you know $B$. In this view, it is incorrect to refer to the probability; only to yours. It is as important to state the conditions as it is the uncertain event.

Returning to the example of $\theta$ taking positive integer values, as soon as you understand the meaning of the object to which the Greek letter refers, you are, because of that understanding, no longer ignorant. Then $p(\theta = i) = a_i$ with $\sum_i a_i = 1$. Some statistical research is vitiated by a confusion between the Greek letter and the reality that it represents. I challenge anyone to produce a real quantity about which they are truly ignorant. A further consideration is that a computer could not produce one example of $\theta$. Since $p(\theta \le n \mid I) = nc = 0$, $p(\theta > n) = 1$, so $\theta$ must surely be larger than any value that you care to name, or the computer can handle.

A further matter requires attention. It is common to use the same concept and notation when part of the conditioning event is unknown but assumed to be true. For example, all statisticians use $p(x \mid \theta, \alpha)$ as above, or, more accurately, $p(x \mid \theta, \alpha, K)$. Here they do not know the value of the parameter. What is being expressed is uncertainty about $x$, if the parameters were to have values $\theta$ and $\alpha$ (and they had knowledge $K$). It is not necessary to distinguish between supposition and fact in this context and the notation $p(A \mid B)$ is adequate.
The philosophical position is that your personal uncertainty is expressed through your probability of an uncertain quantity, given your state of knowledge, real or assumed. This is termed the subjective, or personal, attitude to probability.
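The passage from $p(\theta \mid K)$ to $p(\theta \mid x, K)$ can be sketched for two people whose knowledge differs. As an assumption made purely for illustration, $\theta$ is taken to be the chance of success in repeated trials and each person's $p(\theta \mid K)$ is a Beta density. The parameter values, data and function names are hypothetical. Both people see the same data $x$.

```python
def beta_update(a, b, successes, failures):
    """Beta(a, b) prior plus binomial data gives a Beta(a + s, b + f) posterior."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) density."""
    return a / (a + b)


def gap_between_people(prior_1, prior_2, successes, failures):
    """Absolute difference between the two posterior means after the same data."""
    post_1 = beta_update(*prior_1, successes, failures)
    post_2 = beta_update(*prior_2, successes, failures)
    return abs(beta_mean(*post_1) - beta_mean(*post_2))


sceptic = (2, 8)     # hypothetical p(theta | K) for the first person
enthusiast = (8, 2)  # hypothetical p(theta | K) for the second person
for n in (0, 10, 100, 1000):
    # Same data for both: half of the n trials are successes.
    print(n, round(gap_between_people(sceptic, enthusiast, n // 2, n - n // 2), 4))
```

In this conjugate setting the gap between the two posterior means equals $6/(10 + n)$, so the two sets of probabilities remain personal while differing less and less as the shared data accumulate.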
Many people, especially in scientific matters, think that their statements are objective, expressed through the probability, and are alarmed by the intrusion of subjectivity. Their alarm can be alleviated by considering reality and how that reality is reflected in the probability calculus. We discuss in the context of science but the approach applies generally. Law provides another example. Suppose that $\theta$ is the scientific quantity of interest. (In criminal law, $\theta = 1$ or $\theta = 0$ according to whether the defendant did, or did not, commit the crime.) Initially the scientist will know little about $\theta$, because the relevant knowledge base $K$ is small, and two scientists will have different opinions, expressed through probabilities $p(\theta \mid K)$. Experiments will be conducted, data $x$ obtained and their probabilities updated to $p(\theta \mid x, K)$ in the way already described. It can be demonstrated (see Edwards et al. (1963)) that under circumstances that typically obtain, as the amount of data increases, the disparate views will converge, typically to where $\theta$ is known, or at least determined with considerable precision. This is what is observed in practice: whereas initially scientists vary in their views and discuss, sometimes vigorously, among themselves, they eventually come to agreement. As someone said, the apparent objectivity is really a consensus. There is therefore good agreement here between scientific practice and the Bayesian paradigm. There are cases where almost all agree on a probability. These will be discussed when we consider exchangeability in Section 14.

It is now possible to see why there has been a reluctance to accept point (d), the use of a probability distribution to describe parameter uncertainty. It is because the essential subjectivity has not been recognized. With little data, $p(\theta, \alpha)$ varies among subjects: as the data increase, consensus is reached. Notice that $p(x \mid \theta, \alpha)$ is also subjective. This is openly recognized when two statisticians employ different models in their analysis of the same data set. We shall return to these points when the role of models is treated in Sections 9 and 11.

9. Models
The topic of models has been carefully discussed by Draper (1995) from the Bayesian viewpoint and that paper should be consulted for a more detailed account than that provided here. The philosophical position developed here is that uncertainty should be described solely in terms of your probability. The implementation of this idea requires the construction of probability distributions for all the uncertain elements in the reality being studied. The complete probability specification will be called a (probability) model, though the terminology differs from that ordinarily used in statistics, in a way to be described later. It also differs from model as used in science. The statistician's task is to construct a model for the uncertain world under study. Having done this, the probability calculus enables the specific aspects of interest to have their uncertainties computed on the knowledge that is available. There are therefore two aspects to our study: the construction of the model and the analysis of that model. The latter is essentially automatic; in principle it can be done on a machine. The former requires close contact with reality. To paraphrase and exaggerate de Finetti, think when constructing the model; with it, do not think but leave it to the computer. We repeat the point already made that, in doing this, the subject, whose probabilities are being sought, the 'you' in the language adopted here, is not the statistician, but the client, often a scientist who has asked for statistical advice. The statistician's task is to articulate the scientist's uncertainties in the language of probability, and then to compute with the numbers found. A model is merely your reflection of reality and, like probability, it describes neither you nor the world, but only a relationship between you and that world. It is unsound to refer to the true model. One time that this usage can be excused is when most people are agreed on the model. Thus the model of the heights of fathers and sons as bivariate normal might reasonably be described as true.
What uncertainties are there in a typical scenario? The fundamental problem of inference and of induction is to use past data to predict future data. Extensive observations on the motions of heavenly bodies enable their future positions to be calculated. Clinical studies on a drug allow a doctor to give a prognosis for a patient for whom the drug is prescribed. Sometimes the uncertain data are in the past, not the future. A historian will use what evidence he has to assess what might have happened where records are missing. A court of criminal law enquires about what had happened on the basis of later evidence. We shall, however, use the temporal image, with past data $x$ being used to infer future data $y$ (as $x$ comes before $y$ in the alphabet). In this view, the task is to assess $p(y \mid x, K)$. In the interests of clarity, the background knowledge, fixed throughout the treatment, will be omitted from the notation and we write $p(y \mid x)$.

One possibility is to try to assess $p(y \mid x)$ directly. This is usually difficult, though it may be thought of as the basis of the apprenticeship system. Here an apprentice would sit at the master's feet and absorb the data $x$. With years of such experience, the apprentice could infer what would be likely to happen when he worked on his own. Successive observation on the use of ash in the construction of a wheel would enable him to employ ash for his own wheel. There is, however, a better way to proceed and that is to study the connections between $x$ and $y$, and the mechanisms that operate. Newton's laws enable the tides to be calculated. Materials science assists in the design and construction of a wheel. Most modern inference can be expressed through a parameter $\theta$ that reflects the connection between the two sets of data. Extending the conversation, strictly within the probability calculus, to include $\theta$,

$$p(y \mid x) = \int p(y \mid \theta, x)\, p(\theta \mid x)\, d\theta.$$

It is usual to suppose that, once $\theta$ is known, the past data are irrelevant. In probability language, given $\theta$, $x$ and $y$ are independent. Combining this fact with the determination of $p(\theta \mid x)$ by Bayes's theorem, we have

$$p(y \mid x) = \frac{\int p(y \mid \theta)\, p(x \mid \theta)\, p(\theta)\, d\theta}{\int p(x \mid \theta)\, p(\theta)\, d\theta}.$$
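The ratio of integrals for $p(y \mid x)$ can be sketched numerically. The setting is an assumption chosen for brevity and is not taken from the text: $x$ records successes and failures in past trials, $\theta$ is the chance of success, $y$ is the outcome of the next trial and $p(\theta)$ is uniform unless another density is supplied. The function name and the data are hypothetical, and the integrals are approximated by the midpoint rule.

```python
def predictive_next_success(successes, failures, prior=lambda theta: 1.0, grid_size=2000):
    """p(y = 1 | x) as a ratio of two integrals over theta, by the midpoint rule."""
    thetas = [(i + 0.5) / grid_size for i in range(grid_size)]
    # p(x | theta) p(theta) on the grid: the integrand of the denominator.
    weights = [t**successes * (1 - t) ** failures * prior(t) for t in thetas]
    denominator = sum(weights)
    # p(y = 1 | theta) = theta multiplies the same integrand in the numerator.
    numerator = sum(t * w for t, w in zip(thetas, weights))
    return numerator / denominator  # the grid spacing cancels in the ratio


approx = predictive_next_success(successes=7, failures=3)
exact = (7 + 1) / (10 + 2)  # closed form for a uniform p(theta)
print(round(approx, 6), round(exact, 6))
```

With a uniform $p(\theta)$ the closed form is $(s + 1)/(n + 2)$ for $s$ successes in $n$ trials, which gives a check on the numerical ratio; other choices of `prior` have no such simple check but use the same two integrals.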
Now the task is to assess the uncertainty $p(\theta)$ about the parameter and the two data uncertainties, $p(x \mid \theta)$ and $p(y \mid \theta)$, given the parameter. Often the inference can stop at $p(\theta \mid x)$, leaving others to insert $p(y \mid \theta)$. This might happen with the drug example above, where the doctor would need to know the distribution of efficacy $\theta$ and then assess $p(y \mid \theta)$ for the individual patient.

Although much inference is rightly expressed in terms of the evaluation of $p(\theta \mid x)$, there is an important advantage in contemplating $p(y \mid x)$. The advantage accrues from the fact that $y$ will eventually be observed; the doctor will see what happens to the patient. The parameter is not usually observed. The uncertainty of $\theta$ often remains; that of $y$ disappears. This feature enables the effectiveness of the inference to be displayed by using a scoring rule in an extended version of that described in Section 4. If the inference is $p(y \mid x)$ and $y$ is subsequently observed to be $y_0$, a score function $S\{y_0, p(\cdot \mid x)\}$ describes how good the inference was, so that the client and the statistician have their competences assessed. The method has been used in meteorology, e.g. in forecasting tomorrow's rainfall. Such methods are not readily available for $p(\theta \mid x)$. One of the criticisms that has been levelled against significance levels is that little study has been made of how many hypotheses, rejected at 5%, have subsequently been shown to be true. There is no reason to think that it is 5%. Theory suggests that it should be much higher and that significance is too easily attained. A weather forecaster who predicted rain on only 5% of days, when it subsequently rained on 20%, would not be highly esteemed. The best way to assess the quality of inferences is to check $p(\theta \mid x)$ through the data probabilities $p(y \mid x)$ that they generate.

As previously mentioned, it is usually necessary to introduce nuisance parameters $\alpha$, in addition to $\theta$, to describe adequately the connection between $x$ and $y$, and to establish the independence between them, given $(\theta, \alpha)$. In the drug example, $\alpha$ might involve features of the individual patient. Nuisance parameters impose formidable problems for some forms of inference, like those based solely on likelihood, but, in principle, are easily handled within the probability framework by passing from the joint distribution of $(\theta, \alpha)$ to the marginal for $\theta$: equation (2).

The introduction of parameters reduces the construction of a model to providing $p(x \mid \theta, \alpha)$ and $p(\theta, \alpha)$. $p(y \mid \theta, \alpha)$ also arises but its assessment is similar to that for $x$ and need not be separately discussed. Here we see the distinction between our use of 'model' and that commonly adopted, where only the data distribution, given the parameters, is included. Our definition includes the distribution of the parameters, since they form an important part of the uncertainty that is present. Most of the current literature on models therefore concerns the data and is discussed in the next section. For the moment, we just repeat the point made earlier that even $p(x \mid \theta, \alpha)$ is subjective. A common reason for wrongly thinking that it is objective lies in the fact that there is often more public information on the data than on the parameters, and we saw in Section 8 that, with increased information, people tend to approach agreement.

Why go through the ritual of determining $p(x \mid \theta, \alpha)$ and $p(\theta, \alpha)$, and then calculating $p(\theta \mid x)$? If $p(\theta, \alpha)$ can be assessed, why not assess $p(\theta \mid x)$ directly and avoid some complications? To use terminology that I do not like: if your prior can be assessed directly, why not your posterior? Part of the answer lies in the information that is typically available about the data density, but the desire for coherence is the major reason. A set of uncertainty statements is said to be coherent if they satisfy the rules of the probability calculus. Thus, the pair of statements $p(A \mid B) = 0.7$ and $p(A \mid \sim B) = 0.4$ do not cohere with the pair $p(B \mid A) = 0.5$ and $p(B \mid \sim A) = 0.3$. (Here $\sim B$ denotes the complement of $B$.) Think of $A$ as a statement about data $x$ and $B$ as a statement about parameter $\theta$. The first pair refers to uncertainties in the data and coheres with the first parameter statement, $p(B \mid A) = 0.5$, for data $A$. (Take $p(B) = 0.4/1.1 = 0.36$.) But all three do not cohere with the second parameter statement for data $\sim A$, that $p(B \mid \sim A) = 0.3$. With $p(B) = 0.36$, the coherent value is 0.22. The standard procedure ensures that you are prepared for any values of the data, $A$ or $\sim A$, and the final inferences about $B$ will collectively make sense.

10. Data analysis
Much statistical work is not concerned with a mathematical system, whether frequentist or Bayesian, but operates at a less sophisticated level. When faced with a new set of data, a statistician will 'play around' with them, an activity called (exploratory) data analysis. Elementary calculations will be made; simple graphs will be plotted. Several valuable ideas have been developed for 'playing', such as histograms and box plots. We argue that this is an essential, important and worthwhile activity that fits sensibly into the philosophy. The view adopted here is that data analysis assists in the formulation of a model and is an activity that precedes the formal probability calculations that are needed for inference. The argument developed so far in this paper has demonstrated the need for probability. Data analysis puts flesh onto this mathematical skeleton. The only novelty that we add to conventional data analysis is the recognition that its final conclusions should be in terms of probability and should embrace parameters as well as data. In the language of the last section, the conclusions of data analysis should cohere.

The fundamental concept behind the measurement of uncertainty was the comparison with a standard. Such comparisons are often difficult and there is a need to find some replacement. We do not measure length by using krypton light, the standard, but employ other methods. Data analysis and the concept of coherence is such a replacement. Suppose that you need to assess a single probability; then all you have to guide you is the necessity that the value lies between 0 and 1. In contrast, suppose that the need is to assess several probabilities of related events or quantities, when the whole of the rich calculus of probabilities is available to help you in your assessments. In the example that concluded Section 9, you might have reached the four values given there, but considerations of coherence would force you to alter at least one of them. Coherence acts like geometry in the measurement of distance; it forces several measurements to obey the system.

We have seen how this happens in replacing $p(y \mid x)$ by $p(x \mid \theta, \alpha)$ and $p(\theta, \alpha)$. Let us consider this and its relationship with data analysis, considering first the data density $p(x \mid \theta, \alpha)$. A familiar and useful tool here is the histogram and modern variants like stem-and-leaf plots. These help to determine whether a normal density might be appropriate, or whether some richer family is required. If the data consist of two, or more, quantities $x = (w, z)$, then a plot of $z$ against $w$ will help to assess the regression of $z$ on $w$ and hence $p(z \mid w, \theta, \alpha)$. These devices involve the concept of repeated observations, e.g. to construct the histogram. We shall return to this point in discussion of the concept of exchangeability in Section 14.

There are issues here that have not always been recognized. You are making an uncertainty statement, $p(x \mid \theta, \alpha)$, for a quantity $x$, which, with the data available, is for you certain. Moreover you are doing it with real thought, in the form of data analysis about $x$. It is strange only to use uncertainty (probability) for the only certain quantity present. Furthermore, suppose that $t(x)$ describes the aspects of the data that you have considered, the histogram or the regression.
Then the result of the data analysis is really $p\{x \mid \theta, \alpha, t(x)\}$; you are conditioning on $t(x)$. For example, you might say that $x$ is normal with mean $\theta$ and variance $\alpha$, but only after seeing $t(x)$, or equivalently doing the data analysis. This may lead to spurious precision in the subsequent calculations. One way to proceed would be to construct the model without looking at the data. Indeed, this is necessary when designing the experiment (Section 16). The construction could only come in close consultation with the client and would involve larger models than are currently used. Perhaps data analysis can be regarded as approximate inference, clearing out the grosser aspects of the larger model that are not needed in the operational, smaller model.

Another point is that $p(x \mid \theta, \alpha)$, say in the form of a histogram, is only exhibited for one value of $(\theta, \alpha)$, namely the uncertain value that holds there. The data contain little evidence that, even if $x \sim N(\theta_0, \alpha_0)$, it is $N(\theta, \alpha)$ in situations unobserved. There is a case therefore for making models as big as your computing power will accommodate, to allow for non-normality and general parameter values. The size of a model is discussed in Section 11. Notice that the difficulties raised in the last two paragraphs are as relevant to the frequentist as they are to the Bayesian.

The assessment problem is different when it comes to the parameter density because there is often no repetition and the familiar tools of data analysis are no longer available. Furthermore, in handling the data density, several standard models are readily available, e.g. the exponential family and methods built around GLIM. These models have primarily been designed for ease of analysis through the possession of special properties like sufficient statistics of fixed low dimensionality, though they have the difficulty that outliers are not easily accommodated. These constraints have been imposed partly through limitations of computer capacity but more importantly because, within the frequency approach, there are no general principles and a new model may require the introduction of new ideas. Modern computational techniques lessen the first difficulty and Bayesian methods, with their ubiquitous use of the probability calculus, remove the second entirely; the object is always to calculate $p(\theta \mid x)$. We shall return to this point in Section 15.

Few standard models are available for the parameter density, essentially limited to the densities that are conjugate to the member of the exponential family chosen for the data density. The frequentist chant is 'where did you get that prior?'. It is not a silly gibe; there are serious difficulties but they are partly caused by a failure to link theory and practice. I have often seen the stupid question posed 'what is an appropriate prior for the variance $\sigma^2$ of a normal (data) density?'. It is stupid because $\sigma$ is just a Greek letter. To find the parameter density, it is essential to go beyond the alphabet and to investigate the reality behind $\sigma^2$. What is it the variance of? What range of values does the client think it has? Recall that the statistician's task is to express the uncertainty of you, the client, in probability terms. A sensible form of question might be 'what is your opinion about the variability of systolic blood pressure in healthy, middle-aged males in England?'. But, even with careful regard for practice, it would be stupid to deny the existence of very real, and largely unexplored, problems here. This is especially true when, as in most cases, the parameter space has high dimensionality. We are lacking in methods of appreciating multivariate densities. (This is true of data as well as parameters.) Physicists did not deny Newton's laws because several of the ideas that he introduced were difficult to measure. No, they said that the laws made sense, they work where we can measure, so let us develop better methods of measurement. Similar considerations apply to probability. A neglected area of statistical research is the expression of multivariate opinion in terms of probability, where independence is invoked too often, on grounds of simplicity, ignoring reality. It is not often recognized that the notion of independence, since it involves probability, is also conditional. The mantra that $(x_1, x_2, \ldots, x_n)$ forming a random sample are independent is ridiculous when they are used to infer $x_{n+1}$. They are independent, given $\theta$.
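The way in which coherence constrains several related assessments can be checked with exact arithmetic on the four values from the example that concluded Section 9. The sketch below takes the statements $p(A \mid B) = 0.7$, $p(A \mid \sim B) = 0.4$ and $p(B \mid A) = 0.5$, solves for the $p(B)$ that makes them cohere, and then computes the value of $p(B \mid \sim A)$ that Bayes's theorem forces. The function names are hypothetical.

```python
from fractions import Fraction


def implied_prior(p_a_given_b, p_a_given_not_b, p_b_given_a):
    """Solve Bayes's theorem for p(B), given p(A|B), p(A|~B) and p(B|A)."""
    numerator = p_b_given_a * p_a_given_not_b
    denominator = p_a_given_b * (1 - p_b_given_a) + p_b_given_a * p_a_given_not_b
    return numerator / denominator


def coherent_p_b_given_not_a(p_a_given_b, p_a_given_not_b, p_b):
    """p(B|~A) by Bayes's theorem, using p(~A|B) = 1 - p(A|B)."""
    numerator = (1 - p_a_given_b) * p_b
    denominator = numerator + (1 - p_a_given_not_b) * (1 - p_b)
    return numerator / denominator


p_b = implied_prior(Fraction(7, 10), Fraction(4, 10), Fraction(1, 2))
value = coherent_p_b_given_not_a(Fraction(7, 10), Fraction(4, 10), p_b)
print(p_b, float(p_b))      # 4/11, about 0.36
print(value, float(value))  # 2/9, about 0.22
```

The stated value 0.3 for $p(B \mid \sim A)$ therefore cannot stand alongside the other three: at least one of the four assessments has to be altered.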
It is sometimes argued that data analysis can make no contribution to the assessment of a distribution for the parameter because it involves looking at the data, whereas what is needed is a distribution prior to the data. This is countered by the observation that we all use data to suggest something and then consider what our attitude to it was without the data. You see a sequence of 0s and 1s and notice few, but long, runs. Could the sequence be Markov instead of exchangeable as you had anticipated? You think about reasons for the dependence and, having decided that a Markov chain is possible, think about its value. Had you seen 1s only when the order was prime, you would fail to find reasons and accept the extraordinary thing that has happened.

11. Models again
A model is a probabilistic description of a client's situation, whose assessment is helped by data analysis and exploration of the client's present understanding. Several problems remain, of which one is the size of the model. Should you include extra quantities, besides $x$, as covariates? Should the parameters increase in number to offer greater flexibility, replacing a normal distribution by a Student's $t$, say? Savage once gave the wise advice that a model should be as big as an elephant. Indeed, the ideal Bayesian has one model embracing everything: what has been termed a world view. Such a model is impractical and you must be content with a small world embracing your immediate interests. But how small should it be? Really small worlds have the advantage of simplicity and the possibility of obtaining many results, but they have the disadvantage that they may not capture your understanding of reality so that $p(y \mid x)$ based on them may have a high penalty score. Compromise is called for, but always choose the largest model that your computational powers will tolerate. One successful strategy is to use a large model and to determine, through robustness studies, what aspects of the model seriously affect your final conclusion. Those that do not can be ignored and some reduction in size achieved.

It is valuable to think about the relationships between the small world selected and the larger worlds that contain it. In England it is current practice to publish league tables of schools' performances that use only examination results. Many contend that this is a ridiculously small world and that other quantities, like the performance of pupils at admission, should be included. It is sometimes said that, in our approach, a smaller world cannot fit into a larger one and that, if the former is found to be inadequate, it is necessary to start afresh. This is not so; the apprehension arises through a failure to appreciate the conditional nature of probability. Here is an example. Suppose that your model is that $x \sim N(\theta, \alpha)$. Then, in full, you are describing $p(x \mid \theta, \alpha, N, K)$, where $N$ denotes normality. In words, knowing $K$ and supposing normality, $x$ has mean $\theta$ and variance $\alpha$. If the presence of outliers suggests an extension to Student's $t$, so that $x \sim t(\theta, \alpha, \nu)$ with index $\nu$, then the two models cohere, the former having the restriction, or condition, that $\nu = \infty$. In contemplating $t$, you will already have considered large values of $\nu$. Typically, in passing from a small to a large model, the former will correspond to the latter under conditions and the smaller is embedded in the larger. Occasionally, this appears not to be so. For example, one model may say that $x$ is normal; the other that $\log(x)$ is. One way out of this difficulty is to introduce a series of transformations with $x$ and $\log(x)$ as two members, as suggested by Box and Cox (1964). If this is not possible and you are genuinely uncertain whether model $M_1$ or model $M_2$ obtains, then describe your uncertainty by probability, producing a model that has $M_1$ with probability $\gamma$ and $M_2$ with probability $1 - \gamma$. Part of the inferential problem will be the passage from $\gamma$ to $p(M_1 \mid x)$. This is a problem that has been discussed (O'Hagan, 1995), and where impropriety is best avoided and conglomerability assumed.

Large models have been criticized because they can sometimes appear to produce unsatisfactory results in comparison with smaller models. For example, in considering the regression of one quantity on many others, you are urged not to include too many regressor variables, because to do so leads to overfitting. This undesirable feature comes about through the use of frequentist methods. A theorem within the Bayesian paradigm shows that the phenomenon cannot arise with a coherent analysis, essentially because maximization over a subset cannot exceed that over the full set. The issue is connected with conglomerability (Section 5) because the method of fitting that is ordinarily used, least squares, is equivalent to a Bayesian argument using an improper prior, namely a uniform distribution over the space of the regression parameters. This does not cause offence when the dimension of the space is low, but causes increasing difficulties as it grows (Stein, 1956), and hence the overfitting.

Statisticians have, over the years, developed a collection of standard models, some of which are so routine that computer packages exist for their implementation. Although these, when modified from their frequentist form to provide a coherent analysis, are indubitably valuable, they should never replace your careful construction of a model from the practical realities. We repeat the important advice to think in constructing the model: once that has been done, leave everything to the probability calculus. An illustration is provided by the inconvenient phenomenon of non-response in sample surveys. Here it is important to think about the mechanisms that gave rise to the lack of response, and to model them. Some models in the literature do not flow from any real understanding of why the data are incomplete, and they are therefore suspect. The client's reality must be modelled in probability terms.

The suggestion has often been made that it is possible to test the adequacy of a model, without the specification of alternatives, and methods for doing this have been developed (Box, 1980). We argue that the rejection of a model is not a reality except in comparison with an alternative that appears better. The reason lies in the nature of probability which is essentially a comparative measure. The Bayesian world is a comparative world in which there are no absolutes. The point will emerge again in decision analysis in Section 15, where you decide to do something, not because it is good, but because it is better than anything else that you can think of. People who refuse to vote in an election on the grounds that no candidate meets their requirements miss the point that, with the limited availability of candidates, you should choose the one whom you think is best, even if awful.
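The passage from $\gamma$ to $p(M_1 \mid x)$ discussed earlier in this section can be sketched for the case in which one model says that $x$ is normal and the other that $\log(x)$ is. To keep the sketch short it is assumed, unlike a full analysis, that the parameters of each model are fixed at hypothetical values, so that $p(x \mid M_i)$ is a plain likelihood rather than an integral over parameters. The data and all names are also hypothetical.

```python
import math


def normal_logpdf(x, mean, sd):
    """log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd**2) - (x - mean) ** 2 / (2 * sd**2)


def lognormal_logpdf(x, mean_log, sd_log):
    """log density of x when log(x) is normal; -log(x) is the Jacobian term."""
    return normal_logpdf(math.log(x), mean_log, sd_log) - math.log(x)


def posterior_prob_m1(data, gamma, logpdf_m1, logpdf_m2):
    """p(M1 | x) by Bayes's theorem, with p(M1) = gamma and p(M2) = 1 - gamma."""
    log_w1 = math.log(gamma) + sum(logpdf_m1(x) for x in data)
    log_w2 = math.log(1 - gamma) + sum(logpdf_m2(x) for x in data)
    top = max(log_w1, log_w2)  # subtract the maximum to avoid underflow
    w1, w2 = math.exp(log_w1 - top), math.exp(log_w2 - top)
    return w1 / (w1 + w2)


def m1(x):
    return normal_logpdf(x, mean=2.5, sd=2.0)         # M1: x is normal


def m2(x):
    return lognormal_logpdf(x, mean_log=0.4, sd_log=1.0)  # M2: log(x) is normal


data = [0.5, 0.8, 1.0, 1.5, 9.0]  # hypothetical positive, right-skewed observations
print(round(posterior_prob_m1(data, gamma=0.5, logpdf_m1=m1, logpdf_m2=m2), 4))
```

The result is a comparison only: it says which of the two models you contemplated is better supported, not whether either is adequate in any absolute sense.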
12. Optimality
The position has been reached that the practical uncertainties should be described by probabilities, incorporated into your model and then manipulated according to the rules of the probability calculus. We now consider the implications that the manipulations within that calculus have on statistical methods, especially in contrast with frequentist procedures, thereby extending the discussion of significance tests and confidence intervals in Section 6. It is sometimes said, by those who use Bayes estimates or tests, that all the Bayesian approach does is to add a prior to the frequentist paradigm. A prior is introduced merely as a device for constructing a procedure, which is then investigated within the frequentist framework, ignoring the ladder of the prior by which the procedure was discovered. This is untrue: the adoption of the full Bayesian paradigm entails a drastic change in the way that you think about statistical methods.

A large amount of effort has been put into the derivation of optimum tests and estimates. This is evident on the theoretical side, where the splendid scholarly books of Lehmann (1983, 1986) are largely devoted to methods of finding good estimates and tests respectively. Again, more informally, in data analysis, reasons are advanced for using one procedure rather than another, as when trimmed means are rightly said to be better than raw means in the presence of outliers. Let us therefore look at inference, in the sense of saying something about a parameter $\theta$, given data $x$, in the presence of nuisance parameters $\alpha$. The frequentist may seek the best point estimate, confidence interval or significance test for $\theta$.

A remarkable, and largely unrecognized, fact is that, within the Bayesian paradigm, all the optimality problems vanish; a whole industry disappears. How can this be? Consider the recipe. It is to calculate $p(\theta \mid x, K)$, the density for the parameter of interest given the data and background knowledge. This density is a complete description of your current understanding of $\theta$. There is nothing more to be said.
It is an estimate: your only estimate. Integrated over a set $H$, it provides your entire understanding of whether $H$ is true. There is nothing better than $p(\theta \mid x, K)$. It is unique; the only candidate. Consider the case of the trimmed means just mentioned. If the model incorporates simple normality, the density for $\theta$ is approximately normal about $\bar{x}$, the sample mean. However, suppose that normality is replaced by Student's $t$ (with two nuisance parameters, spread and degrees of freedom); then the density for $\theta$ will be centred, not on $\bar{x}$, but on what is essentially a trimmed mean. In other words, the estimate arises inevitably and not because of optimality considerations.

The Bayesian's unique estimate, the posterior distribution, depends on the prior, so there is some similarity between the Bayesian and the frequentist who uses a prior to construct their optimum estimates. (The class of good frequentist procedures is the Bayes class.) The real difference is that the frequentist will use different criteria, like the error rate, rather than coherence, to judge the quality of the resulting procedure. This is discussed further in Section 16.

13. The likelihood principle

We have seen that parametric inference is made by calculating

\[
p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta)\, p(\theta)\, d\theta}. \tag{3}
\]
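Equation (3) can be evaluated numerically on a grid of values of $\theta$. The sketch below does so for two sampling models whose densities, for the observed data, differ only by a factor that does not involve $\theta$, and checks that the resulting posteriors coincide. The specific setting (r successes in n Bernoulli trials, with either n or r fixed in advance), the uniform prior and the grid size are illustrative assumptions, not taken from the text.

```python
from math import comb

def grid_posterior(likelihood, prior, grid):
    """Equation (3) on a grid: likelihood times prior, normalized to sum to 1."""
    weights = [likelihood(t) * prior(t) for t in grid]
    total = sum(weights)
    return [w / total for w in weights]

r, n = 3, 12  # hypothetical observed data: 3 successes in 12 trials

def binomial(theta):
    """p(x | theta) when n was fixed in advance and r was observed."""
    return comb(n, r) * theta**r * (1 - theta)**(n - r)

def negative_binomial(theta):
    """p(x | theta) when r was fixed in advance and sampling stopped at trial n."""
    return comb(n - 1, r - 1) * theta**r * (1 - theta)**(n - r)

def uniform_prior(theta):
    return 1.0

grid = [(i + 0.5) / 1000 for i in range(1000)]  # midpoints of 1000 cells on (0, 1)
post_a = grid_posterior(binomial, uniform_prior, grid)
post_b = grid_posterior(negative_binomial, uniform_prior, grid)

max_diff = max(abs(a - b) for a, b in zip(post_a, post_b))
post_mean = sum(t * p for t, p in zip(grid, post_a))
print(max_diff, post_mean)
```

The constant factors `comb(n, r)` and `comb(n - 1, r - 1)` cancel between numerator and denominator of equation (3), so the two posteriors agree up to rounding error. The stopping rule affects the sampling density of the data but not the inference about $\theta$ once the data are in hand.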
Consider $p(x \mid \theta)$ as a function of two quantities, $x$ and $\theta$. As a function of $x$, for any fixed $\theta$, $p(\cdot \mid \theta)$ is a probability density, namely it is positive and integrates, over $x$, to 1. As a function of $\theta$, for any fixed $x$, $p(x \mid \cdot)$ is positive but does not usually integrate to 1. It is called the likelihood of $\theta$ for the fixed $x$. It is immediate from equation (3) that the only contribution that the data make to inference is through the likelihood function for the observed $x$. This is the likelihood principle that
values of $x$, other than that observed, play no role in inference. A valuable reference is Berger and Wolpert (1988).

This fact has important recognized consequences in inference. Whenever an integration takes place over values of $x$, the principle is violated and the resulting procedure may cease to be coherent. Unbiased estimates and tail area significance tests are among the casualties. The likelihood function therefore plays a more important role in Bayesian statistics than it does in the frequentist form, yet likelihood alone is not adequate for inference but needs to be tempered by the parameter distribution. Uncertainty must be described by probability, not likelihood.

Before enlarging on this remark, it is important to be clear what is meant by likelihood. If a model with data $x$ has been developed with parameters $(\theta, \alpha)$, then $p(x \mid \theta, \alpha)$ as a function of $(\theta, \alpha)$, for the fixed observed value of $x$, is undoubtedly the likelihood function. However, inference in equation (3) does not involve the entire likelihood function, but only its integral
\[
p(x \mid \theta) = \int p(x \mid \theta, \alpha)\, p(\alpha \mid \theta)\, d\alpha. \tag{4}
\]
We refer to this as the likelihood of $\theta$ but the terminology is not always accepted. The reason is clear: its construction involves one aspect, $p(\alpha \mid \theta)$, of the parameter density, $p(\theta, \alpha)$, which latter is not admitted to the frequentist or likelihood schools. In neither school is there general agreement about what constitutes the likelihood function for a parameter $\theta$ of interest in the presence of a nuisance parameter $\alpha$. There are at least a dozen candidates in the literature. For example, in addition to the integrated form in equation (4), there is $p(x \mid \theta, \hat{\alpha})$, where $\hat{\alpha}$ is the value that makes $p(x \mid \theta, \alpha)$ over $\alpha$ a maximum. The plethora of candidates reflects the impossibility of any satisfactory definition that avoids the intrusion of probabilities for the parameters.

The reason for likelihood being, on its own, inadequate is that, unlike probability, it is not additive. If $A$ and $B$ are two exclusive sets, then $p(A \cup B) = p(A) + p(B)$, omitting the conditions, whereas it is not true that $l(A \cup B) = l(A) + l(B)$ for a likelihood function $l(\cdot)$. Since the properties used as axioms in the development of inference, e.g. in the work of Savage, lead to additivity, any violation may lead to some violation of the axioms. This happens with likelihood. In Section 5 we had an example involving colour and sex, which was expressed in terms of the informal concept of one event being more likely than another. In fact, the example holds when 'likely' is used in the technical sense as defined here. Likelihood is an essential ingredient in the inference recipe but it cannot be the only one.

Notice that the likelihood principle only applies to inference, i.e. to calculations once the data have been observed. Before then, e.g. in some aspects of model choice, in the design of experiments or in decision analysis generally, a consideration of several possible data values is essential (see Section 16).

14. Frequentist concepts
Ever since the 1920s, statistics has been dominated by the frequentist approach and has, by any sensible criterion, been successful; yet we have seen that it clashes with the coherent view in apparently serious ways. How can this be? Our explanation is that there is a property, shared by both views, that links them more closely than the material so far presented here might suggest. The link is the concept of exchangeability.

A sequence $(x_1, x_2, \ldots, x_n)$ of uncertain quantities is, for you, exchangeable under conditions $K$ if your joint probability distribution, given $K$, is invariant under a permutation of the suffixes. For example, $p(x_1 = 3, x_2 = 5 \mid K) = p(x_2 = 3, x_1 = 5 \mid K)$ on permuting 1 and 2. An infinite sequence is exchangeable if every finite subsequence is so judged. The roles of 'you' and $K$ have been mentioned to emphasize that exchangeability is a
subjective judgment and that you may change your opinion if the conditions change.

If you judge a sequence to be (infinitely) exchangeable, then your probability structure for the sequence is equivalent to introducing a parameter, $\psi$ say, such that, given $\psi$, the members of the sequence are independent and identically distributed (IID). As the parameter is uncertain, you will have a probability distribution for it. This result is due to de Finetti (1974, 1975). Ordinarily $\psi$ will consist of elements $(\theta, \alpha)$ of which $\theta$ is of interest and $\alpha$ is nuisance. Consequently exchangeability imposes the structure used above but with the addition that the data $x$ have the particular form of IID components. Furthermore, $\psi$ is related to frequency properties of the sequence. Thus, in the simple case where $x_i$ is either 0 or 1, the Bernoulli sequence, $\psi$ is the limiting proportion of them that are 1. Consequently, a Bayesian who makes the exchangeability judgment is effectively making the same judgment about data as a frequentist, but with the addition of a probability specification for the parameter.

The concept of IID observations has dominated statistics in this century. Even when obviously inappropriate, as in the study of time series, the modelling uses IID as a basis. For example, $x_i - \theta x_{i-1}$ may be supposed IID for some $\theta$, leading to a linear, first-order autoregressive, process. Within the IID assumption, frequency ideas are apposite, some even within the Bayesian canon, so there has developed a belief that uncertainty and probability are therefore based on frequency. Some statistics texts only deal with IID data and therefore restrict the range of statistical activities. Their examples will come from experimental science, where repetition is basic, and not from law, where it is not. Frequency, however, is not adequate because there is ordinarily no repetition of parameters; they have unique unknown values. Consequently the confusion between frequency and probability has denied the frequentist the opportunity of using probability for parameter uncertainty, with the result that it has been necessary for them to develop incoherent concepts like confidence intervals.

The use of frequency concepts outside exchangeability leads to another difficulty. Frequentists often support their arguments by saying that they are justified 'in the long run', to which the coherent response is 'what long run?'. For example, a confidence interval (see Section 6) will cover the true value a proportion $1 - \alpha$ of times in the long run. To make sense of this it is necessary to embed the particular case of data $x$ into a sequence of similar data sets: which sequence?; what is similar? The classic example is a data set consisting of $r$ successes in $n$ trials, judged to be Bernoulli. In the sequence do we fix $n$, or fix $r$ or some other feature of the observed data? It matters. Bayesians provide an answer for the single situation, whereas frequentists often need to embed the situation into a sequence of situations.

The restriction of probability to frequency can lead to misrepresentations. Here is an example, concerning the determination of physical constants, such as the gravitational constant $G$. It is common and reasonable to suppose that the measurements made at one place and time are exchangeable and unbiased, each having $G$ as the expectation. It is reasonable to use their mean as the current estimate of $G$. Some rejection of outliers may be needed before this is done. The uncertainty attached to this estimate is found by taking $s^2$, equal to the average of the squared deviations from the mean, and quoting a standard error of $s/\sqrt{n}$, where $n$ is the number of measurements. This leads to confidence limits for $G$. Experience shows that the more recent estimates usually lie outside the confidence limits of earlier estimates. In other words, the limits were too narrow. A scoring rule for estimators of $G$ would produce a large penalty score. The reason is that the measurements are actually biased. Since the amount of the bias is not amenable to frequency ideas, it is ignored. The Bayesian approach would have a distribution for the bias and would use as a prior for $G$ the posterior from the last estimate, possibly adjusted for any modifications in the measurement process. Often standard errors are too small because only the exchangeable component of uncertainty is considered. Similar mistakes can arise with the
predictions of future numbers of cases of acquired immune deficiency syndrome. They can ignore changes in personal behaviour or Government policy, changes that are not amenable to frequentist analysis.

15. Decision analysis
It has been noted how statistics began with the collection and presentation of data, and then extended to include the treatment of the data and the process which we now call inference. There is a further stage beyond that, namely the use of data, and the inferences drawn from them, to reach a decision and to initiate action. In my view, statisticians have a real contribution to make to decision analysis and should extend their data collection and inference to include action. The methods of Ramsey and Savage have demonstrated how the foundations can be presented through decision analysis.

The extension to include action can be better understood if we ask what is the purpose of an inference that consists in calculating $p(y \mid x)$ for future data $y$, conditional on past data $x$. An example cited was a doctor who had data on a drug and wished to infer what might happen to a patient given the drug. The example involves a decision, namely which drug to prescribe, another drug possibly leading to a different inference for $y$. We argue, following Ramsey, that an inference is only of value if it is capable of being used to initiate action. Partial knowledge that cannot be used is of little value. Even in its parametric form, $p(\theta \mid x)$ will only be worthwhile if it can be incorporated into actions that involve the uncertain $\theta$. Marx was right: the point is not just to understand the world (inference) but also to change it (action). Let us see how this can be done in the Bayesian view.

The structure used by Savage and others is to formulate a list of possible decisions $d$ that might be taken. The uncertainty is captured in a quantity (or parameter) $\theta$. The pair $(d, \theta)$ is termed a consequence, describing what will happen if you take decision $d$ when the parameter has value $\theta$. We have seen how the uncertainty in $\theta$ needs to be described by a probability distribution $p(\theta)$. This will be conditional on your state of knowledge, which is omitted from the notation. It may also depend on the decision, as in the case where the decisions are to invest or not in advertising, and $\theta$ is next year's sales. We therefore write $p(\theta \mid d)$. The foundational argument goes on to show that the merits of the consequence $(d, \theta)$ can be described by a real number $u(d, \theta)$, termed the utility of the consequence. One consequence is preferred to another if it has the higher utility. If these utilities are constructed in a sensible way, the best decision is that which maximizes your expected utility
\[
\int u(d, \theta)\, p(\theta \mid d)\, d\theta.
\]
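The maximization of expected utility can be sketched for the advertising example with a discrete $\theta$, so that the integral becomes a sum. Every number below (the two sales levels, the probabilities $p(\theta \mid d)$ and the utilities on a 0 to 1 scale) is a hypothetical value chosen for illustration; none comes from the text.

```python
# Hypothetical p(theta | d): advertising is assumed to make high sales more probable.
P_THETA_GIVEN_D = {
    "advertise":     {"low_sales": 0.2, "high_sales": 0.8},
    "not_advertise": {"low_sales": 0.6, "high_sales": 0.4},
}

# Hypothetical utilities u(d, theta), scaled so that worst = 0 and best = 1.
UTILITY = {
    ("advertise", "low_sales"): 0.0,       # worst: cost incurred, sales still low
    ("advertise", "high_sales"): 0.8,
    ("not_advertise", "low_sales"): 0.3,
    ("not_advertise", "high_sales"): 1.0,  # best: high sales at no cost
}

def expected_utility(d):
    """Discrete analogue of the integral of u(d, theta) p(theta | d) over theta."""
    return sum(UTILITY[(d, theta)] * p for theta, p in P_THETA_GIVEN_D[d].items())

def best_decision():
    """The decision with the largest expected utility."""
    return max(P_THETA_GIVEN_D, key=expected_utility)

for d in P_THETA_GIVEN_D:
    print(d, expected_utility(d))
print("best:", best_decision())
```

Under these assumed values advertising has expected utility 0.64 against 0.58 for not advertising, so it is the decision selected. Note that the probability of $\theta$ is looked up separately for each decision, reflecting the dependence written as $p(\theta \mid d)$.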
The addition of a utility function for consequences, combined with the probability description of uncertainty, leads to a solution to the decision problem. Utility has to be described with care. It is not merely a measure of worth, but a measure of worth on a probability scale. If the best consequence has utility 1 and worst utility 0, then consequence $(d, \theta)$ has utility $u(d, \theta)$ if you are indifferent (notice the subjective element) between

(a) the consequence for sure and
(b) a chance $u(d, \theta)$ of the best (and $1 - u(d, \theta)$ of the worst).

It is this probability construction that enables the expectation to emerge as the only relevant criterion for the choice of decision.

Utility embraces all aspects of the consequence. For example, if one outcome of a gamble is a win of £100, its utility includes not only an increase in monetary assets but also the thrill of the gamble. Some analyses, based solely on money, are defective because of their limited view of utility.
Notice that, just as $p(\theta)$ is not the statistician's uncertainty, but rather the client's, so the utility is that of the decision maker. The statistician's role is to articulate the client's preferences in the form of a utility function, just as it is to express their uncertainty through probability. Notice also that the analysis supposes that there is only one decision maker, the 'you' of our text, though 'you' may be several individuals forming a group, making a collective decision. None of the arguments given here apply to the case of two, or more, decision makers who do not have a common purpose, or may even be in conflict. This is an important limitation on maximized expected utility.

One topic that statisticians have often considered their own, at least since the brilliant work of Fisher (1935), is the design of experiments. This is a decision problem and fits neatly into the principles just enunciated. Let $e$ be a member of a class of possible experiments from which one must be selected. Let $x$ denote data that might arise from such an experiment. The experimentation presumably has some purpose, expressed by the selection of a (terminal) decision $d$. As usual, denote the uncertain element by $\theta$. (A similar analysis applies when the inference is for future data $y$.) The final consequence of experimentation and action is $(e, x, d, \theta)$ to which you attach a utility
$u(e, x, d, \theta)$. The expected utility is

\[
\int u(e, x, d, \theta)\, p(\theta \mid e, x, d)\, d\theta, \tag{5}
\]
the uncertainty being conditional on all the other ingredients. The optimum decision is that $d$ which maximizes expression (5). Denote the maximum value so obtained by $\bar{u}(e, x)$. The expectation of this is
\[
\int \bar{u}(e, x)\, p(x \mid e)\, dx, \tag{6}
\]
since $x$ is the only uncertain element at this stage, the uncertainty being clearly dependent on the experiment $e$. A final maximization of expression (6) provides the optimum experimental design.

Notice the simplicity of the principles that are involved here, even though the technical manipulations may be formidable. There is a temporal sequence that alternates between taking an expectation over the quantities that are uncertain and maximizing over the decisions that are available. Each uncertainty must be evaluated conditionally on all that is known then. The utility is attached to the final outcome, other (expected) utilities, like $\bar{u}(e, x)$, being derived therefrom. This provides a formal framework for the design of experiments.

It would appear to be a sensible criticism of the method just outlined that many experiments are not conducted with a terminal decision in mind but merely to gather information about $\theta$. This aspect can be accommodated by extending the interpretation of a decision. Information about $\theta$ depends on your uncertainty about $\theta$ expressed, as always, by probability. So let the decision $d$ be to select the relevant density, here $p(\theta \mid e, x)$. A utility function can then be constructed. Often it is reasonable to suppose $u$ additive in the sense that
\[
u(e, x, d, \theta) = u(e, x) + u(d, \theta), \tag{7}
\]
the first term involving the experimental cost and the second the terminal consequences. Notice the connection between $u(d, \theta)$ and the scoring rules suggested in Section 9. Here $u(d, \theta)$ may be thought of as a reward score attached to decision $d$ to announce $p(\theta \mid e, x)$ when the parameter has value $\theta$. The usual measure of the information provided by $p(\theta)$ is Shannon's,
\[
\int p(\theta) \log\{p(\theta)\}\, d\theta.
\]
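The alternation between maximizing over decisions and taking expectations over data, as in expressions (5) and (6) with the additive utility (7), can be sketched for a small design problem: choosing the number n of Bernoulli trials before a terminal decision. The two-point prior, the 0-1 terminal utility, the cost per observation and the range of n are all hypothetical choices made for illustration; a full analysis would use your own probabilities and utilities.

```python
from math import comb

THETAS = {0.3: 0.5, 0.7: 0.5}          # hypothetical two-point prior p(theta)
DECISIONS = ("say_low", "say_high")    # terminal decisions d
COST_PER_OBS = 0.01                    # hypothetical experimental cost, on the utility scale

def terminal_utility(d, theta):
    """u(d, theta): 1 if the announcement matches theta, 0 otherwise."""
    correct = (d == "say_high") == (theta > 0.5)
    return 1.0 if correct else 0.0

def gross_value(n):
    """Terminal part of expression (6) for the experiment 'observe n trials'."""
    total = 0.0
    for x in range(n + 1):
        # joint[t] = p(theta = t) * p(x | theta = t, n)
        joint = {t: p * comb(n, x) * t**x * (1 - t)**(n - x) for t, p in THETAS.items()}
        # max over d of sum_theta u(d, theta) * joint[theta]
        # equals (max over d of expression (5)) multiplied by p(x | e).
        total += max(
            sum(terminal_utility(d, t) * j for t, j in joint.items())
            for d in DECISIONS
        )
    return total

def net_value(n):
    """Additive utility (7): the experimental cost term plus the terminal term."""
    return gross_value(n) - COST_PER_OBS * n

best_n = max(range(0, 31), key=net_value)
print(best_n, net_value(best_n))
```

With no data the best terminal decision is right half the time, so `gross_value(0)` is 0.5; each further observation cannot lower the gross value but adds to the cost, and the optimum design balances the two. The inner sum over x is the step at which data values other than those eventually observed enter the analysis.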
The language of decision analysis has been used by Neyman and others in connection with
hypothesis testing, where they speak of the decisions to accept, and to reject, the hypothesis $H$. There are cases where acceptance and rejection can legitimately be thought of as action, as with the rejection of a batch of items. Equally there are other cases where we could calculate $p(H \mid x, K)$ as an inference about $H$ on data $x$ and knowledge $K$. The latter form may, as in the last paragraph, be thought of as a decision. Both forms are valid and useful for different purposes. Our philosophy accommodates both views and it is for you to consider how to model the reality before you. An important feature of the Bayesian paradigm is its ability to encompass a wide variety of situations using a few basic principles.

Some writers, in discussing hypothesis testing, have argued that there are many different cases. For example, some may really involve action; some are purely inferential. Other cases have been described, ending up, as with likelihood, in a plethora of situations and great complexity. The Bayesian view is that these are all covered by the general principles and that the differences perceived are differences in the probability and utility structures. Some folk love complexity for it hides inadequacies and even errors.

16. Likelihood principle (again)
In Section 13 it was seen how the likelihood principle is basic for inference, yet denied by many frequentist notions. The principle ceases to apply when experimental design is part of the decision analysis, essentially because of the integration over $x$ involved in expression (6). At the initial stage, where you are considering which experiment to perform, the data, conditional on any experiment selected, is uncertain for you. This uncertainty is expressed through $p(x \mid e)$ and is eliminated by the operation of expectation in expression (6). In conducting an inference, or in making a terminal decision, you know the value of $x$, for the data are available. Consequently it is unnecessary to consider other data values and the likelihood is all that is needed. When it is a question of experimental design, the data are surely not available and all possibilities must be contemplated. This contrast between pre- and post-data emphasizes the importance of the conditions when you face uncertainty. Probability is a function of two arguments, not one.

Just how the consideration of the experiment can involve one form of integration over $x$ used by frequentists, namely error rates, can be seen as follows. Denote by $d^*(e, x)$ that decision which maximizes the expected utility (5). The expectation over $x$, expression (6), can then be written

\[
\int\!\!\int u\{e, x, d^*(e, x), \theta\}\, p\{\theta \mid e, x, d^*(e, x)\}\, d\theta\, p(x \mid e)\, dx.
\]

Now

\[
p\{\theta \mid e, x, d^*(e, x)\} = p(\theta \mid e, x)
\]

since the addition of $d^*$, a function of $e$ and $x$, adds no further condition. The latter probability is $p(x \mid e, \theta)\, p(\theta \mid e) / p(x \mid e)$. Inserting this value into the expectation and reversing the orders of integration, we have

\[
\int\!\!\int u\{e, x, d^*(e, x), \theta\}\, p(x \mid e, \theta)\, dx\, p(\theta \mid e)\, d\theta
\]

where the inner integral exposes the frequentist integration over $x$. For a fixed experiment, and with a utility that does not directly involve $x$, the relevant integral is

\[
\int u\{d^*(e, x), \theta\}\, p(x \mid e, \theta)\, dx.
\]
With two decisions and 0-1 utility, we immediately have $\int p(x \mid e, \theta)\, dx$ over a subset of sample space and the familiar errors of the two kinds.

The occurrence of error rates leads to some confusion because they are often treated as the quantities to be controlled, and therefore occupy a primary position in decision analysis, whereas our primary consideration lies in the utility structure. Once the utility structure has been imposed, the errors will look after themselves. However, a consideration of different errors may lead to undesirable changes in the utility structure. The Bayesian view is that the utilities, not the errors, are the invariants of the analysis. For example, to design an experiment to achieve prescribed error rates may be incoherent. The prescription should instead specify utilities.

17. Risk
Risk is a term which we have not used. It has been defined (Duckworth, 1998) as 'the potential for exposure to uncertain events, the occurrence of which would have undesirable consequences'. The definition recognizes the two elements in what we have called decision analysis, the uncertainty and the utility, though Duckworth, in common with most statisticians, emphasized the loss, or undesirability, rather than the gain, the utility. The change is linguistic. Risk is therefore dependent on two arguments and our foundational presentation in Section 3 is dependent on the separation of them in uncertainty and worth. Yet it is common, as Duckworth does, to quote a measure of risk as a single number, so denying the separation. Thus the risk associated with a 1000-mile flight is 1.7 in suitable units. This is defensible for the following reason.

The optimum decision maximizes expected utility which, for data $x$, is proportional to $\int u(d, \theta)\, p(\theta)\, p(x \mid \theta)\, d\theta$ and may be written as a weighted likelihood $\int w(d, \theta)\, p(x \mid \theta)\, d\theta$, where $w(d, \theta) = u(d, \theta)\, p(\theta)$. The analysis, given data, and hence for a given likelihood, does not depend separately on the utility and probability, the two corner-stones of the philosophy, but only on their product. To put it in another way, if you were to watch a coherent person acting (as distinct from expressing his thoughts) you would not, on the basis of the observed actions, be able to separate the two elements; only the weight function might be determined.

Nevertheless there are several reasons for separating utility from probability. The most important is the need for inference, i.e. for a sound appreciation of the world without reference to action. The philosophy says that this is had through your probability structure for the world. In inference, manipulations take place entirely within the probability calculus, which therefore becomes separated from utility. There are people who argue that inference, in the form of pure science, is unsatisfactory when isolated from its applications in technology. What is undoubtedly important is that inference should be in a suitable form for decision-making, not an activity that is isolated from application. We have seen how Bayesian inference is perfectly adapted for this purpose. It will be seen in Section 19 how some aspects of the law separate inference from decision.

Another reason for the separation lies in the desirability of communication between people, between different 'yous'. Take the example of the 1000-mile flight cited above. Part of the calculation rests on the observed accident rate for aircraft. Another part rests on the consequences of the flight. You may react differently to these two elements. For example, it is known that for
elderly people there is an increased risk of circulatory problems due to sitting for hours in cramped seats, and therefore you may evaluate your accident rate differently from that suggested purely by the accident rate for aircraft. In contrast, a healthy, middle-aged executive, travelling in more comfort in first class, may accept the accident statistics but have a different utility because of the importance of the meeting to which he is bound. These considerations suggest that the accident rate and consequences of an accident be kept separate because you may be able to use one element but not the other, whereas the weight function alone would be more difficult to use.

18. Science

Karl Pearson said 'The unity of all science consists alone in its method, not in its material' (Pearson, 1892). It is not true to say that physics is science whereas literature is not. There are times when a physicist makes a leap of the imagination like an artist. Analyses of word counts can help to identify the author of an anonymous piece of literature. Scientific method is certainly much more important in physics than in literature, but it has the potentiality to be used in any discipline.

Of what then does the method consist? There is an enormous literature devoted to answering this question and it is presumptuous of me to claim to have the answer. But I do believe that statisticians, in their deep study of the collection and analyses of data, have, perhaps unwittingly, uncovered the answer and it lies in the philosophy presented here. Experimentation, with its production of data, is an essential ingredient of scientific method, so the connection between statistics and science is not surprising. In this view, the scientific method consists in expressing your view of your uncertain world in terms of probability, performing experiments to obtain data, and using that data to update your probability and hence your view of the world. Although the emphasis in this updating is ordinarily put on Bayes, effectively the product rule, the elimination of the ubiquitous nuisance parameters by the addition rule is also important. As we have seen, the design of the experiment is also amenable to statistical treatment. Scientific method consists of a sequence alternating between reasoning and experimentation. As explained in Section 8, each scientist is a 'you' with their own beliefs which are brought into harmony through the accumulation of data. It is this consensus that is objective science.

Objections have been made to this simple view on the grounds that scientists do not act in the way described in the last paragraph. They even do tail area significance tests. The response to the objection is that our philosophy is normative, not descriptive. It is not the intention to describe how scientists behave but how they would wish to behave if only they knew how. The probability calculus provides the 'how'. An impediment affecting 'how' is the lack of good methods of assessing probabilities when no exchangeability assumption is available to guide you. This is ordinarily described as determining your prior, but in reality it is wider than that. Some attacks on science are truly attacks on how scientists behave, that is, on the descriptive aspect. Often they are valid. Such attacks would become less cogent if they dealt with the normative aspect. Scientists are human. Real scientists are affected by extraneous conditions. One would hope that a scientist working for a multinational company and another employed by an environmental agency differ only in their probabilities and would update accordingly. One suspects that other issues intervene. It is my hope that a Bayesian approach would help to expose any biases or fallacies in either of the protagonists' arguments.

19. Criminal law
There are two reasons for including this section on criminal law: first because of my own interest in forensic science; second because of the conviction that this interest has engendered that some
important aspects of the law are amenable to the scientific method as described in the last section. These aspects concern the trial process, where there is uncertainty about the defendant's guilt, uncertainty that is subsequently tempered by data, in the form of evidence, to reach, hopefully, a consensus about the guilt. Clearly this fits into the paradigm developed here. Lawyers do not have a monopoly on the discovery of the truth; scientists have been doing it successfully for centuries. There are aspects of the law, like the writing of a law, to which the scientific method has little to contribute. However, the courts are not just concerned with guilt; they need to pass sentence. The law has separated these two functions, just as we have. They can be recognized as inference and decision-making respectively.

The defendant in a court of law is either truly guilty $G$ or not guilty $\sim G$. The guilt is uncertain and so should be described by a probability $p(G)$. (The background knowledge is omitted from the notation.) Data, in the form of evidence $E$, are produced and the probability updated. Since there are only two possibilities, $G$ or $\sim G$, it is convenient to work in terms of odds (on), $o(G) = p(G)/p(\sim G)$, when Bayes's theorem reads

\[
o(G \mid E) = \frac{p(E \mid G)}{p(E \mid \sim G)}\, o(G),
\]
involving multiplication of the original odds by the likelihood ratio. Evidence often involves nuisance parameters but, in principle, these can be eliminated in the usual way by the addition rule. They will often enter into $p(E \mid \sim G)$ because there might be several ways in which the crime could have been committed, other than by the defendant. As the trial proceeds, further evidence is introduced and successive multiplications by likelihood ratios determine the final odds. A difficulty here is that successive pieces of evidence may not be independent, either given $G$ or given $\sim G$. So far this method has mainly been used successfully for scientific evidence, like bloodstains and DNA (Aitken and Stoney, 1991). Its applicability in general depends on satisfactory methods of probability assessment. It has the potential advantage of helping the court to combine disparate types of evidence for, as remarked in Section 3, the principal merit of measurement lies in its ability to meld several uncertainties into one.

The law agrees with the philosophy in separating inference from decision. It even allows different evidence to be admitted into the two processes. For example, previous convictions may be used in sentencing (decision) but not always in assessing guilt (inference). Expected utility analysis includes a theorem to the effect that cost-free information is always expected to increase the utility. This suggests that the only reason for not admitting evidence should be on grounds of cost (Eggleston, 1983).

The part of the trial process that, at present, results in the judgment guilty or not guilty, should, in our view, be replaced by the calculation of odds $o(G \mid E)$, where $E$ is now the totality of all admitted evidence. On this view, the jury should not make a firm statement of guilt, or not, but state their final odds, or probability, of guilt. At least this provides a more flexible and informative communication. More importantly, it provides the judge with the information that he needs for sentencing. If $d$ is a possible decision, about gaol or a fine, then the expected utility of $d$ is
$$u(d, G)\,p(G \mid E) + u(d, \neg G)\,p(\neg G \mid E).$$
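The two calculations described here, updating the odds of guilt by successive likelihood ratios and then weighing each possible sentence by its expected utility, can be sketched in a few lines. This is only an illustration: the function names, the prior odds, the likelihood ratios and the utility values are all hypothetical and are not taken from the paper, and the simple product of likelihood ratios assumes that the pieces of evidence are independent given G and given not-G, which the text warns may fail.

```python
from math import prod


def posterior_odds(prior_odds, likelihood_ratios):
    """Multiply the prior odds on G by each ratio p(E_i | G) / p(E_i | not G).

    Assumes the pieces of evidence are conditionally independent,
    both given G and given not-G.
    """
    return prior_odds * prod(likelihood_ratios)


def odds_to_probability(odds):
    """Convert odds o(G | E) into the probability p(G | E)."""
    return odds / (1.0 + odds)


def expected_utility(u_guilty, u_not_guilty, p_guilty):
    """u(d, G) p(G | E) + u(d, not G) p(not G | E) for one decision d."""
    return u_guilty * p_guilty + u_not_guilty * (1.0 - p_guilty)


def best_decision(utilities, p_guilty):
    """Return the decision with the largest expected utility.

    `utilities` maps each decision d to the pair (u(d, G), u(d, not G)).
    """
    return max(utilities, key=lambda d: expected_utility(*utilities[d], p_guilty))


# Hypothetical inputs, chosen only to exercise the functions.
odds = posterior_odds(prior_odds=0.01, likelihood_ratios=[50, 4])
p_guilty = odds_to_probability(odds)
utilities = {
    "gaol": (1.0, -10.0),
    "fine": (0.5, -1.0),
    "release": (-2.0, 0.0),
}
choice = best_decision(utilities, p_guilty)
```

With these made-up numbers the heavy penalty attached to false imprisonment is what keeps the more severe sentence from being chosen; different utilities or odds would give a different optimum.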
The optimum sentence is that d which maximizes this expectation. The utilities here will reflect society's evaluation of the merits of different sentences for the guilty person, and the seriousness of false imprisonment. We are a long way from the implementation of these ideas but even now they can guide us into sensible procedures and avoid incoherent ones.
20. Conclusions
The philosophy of statistics presented here has three fundamental tenets: first, that uncertainty should be described by probabilities; second, that consequences should have their merits described by utilities; third, that the optimum decision combines the probabilities and utilities by calculating expected utility and then maximizing that. If these are accepted, then the first task of a statistician is to develop a (probability) model to embrace the client's interests and uncertainties. It will include the data and any parameters that are judged necessary. Once accomplished, the mechanics of the calculus take over and the required inference is made. If decisions are involved, the model needs to be extended to include utilities, followed by another mechanical operation of maximizing expected utility. One attractive feature is that the whole procedure is well defined and there is little need for ad hoc assumptions.
There is, however, a considerable need for approximation. To carry out this scheme for the large world is impossible. It is essential to use a small world, which introduces simplification but often causes distortion. Even the mechanics of calculation need numerical approximations. Both these issues have been considered in the literature, whether frequentist or Bayesian, and substantial progress has been made. Where a real difficulty arises is in the construction of the model. Many valuable techniques have been introduced but, because of the frequentist emphasis in past work, there is a real gap in our appreciation of how to assess probabilities, of how to express our uncertainties in the requisite form. My view is that the most important statistical research topic as we enter the new millennium is the development of sensible methods of probability assessment. This will require co-operation with numerate experimental psychologists and much experimental work. A colleague put it neatly, though with some exaggeration: 'There are no problems left in statistics except the assessment of probability'. It is curious that the typical expert in probability knows nothing about, and has no interest in, assessment.
The adoption of the position outlined in this paper would result in a widening of the statistician's remit to include decision-making, as well as data collection, model construction and inference. Yet it also involves a restriction in their activity that has not been adequately recognized. Statisticians are not masters in their own house. Their task is to help the client to handle the uncertainty that they encounter. The 'you' of the analysis is the client, not the statistician. Our journals, and perhaps our practice, have been too divorced from the client's requirements. In this I have been as guilty as any. But at least the theoretician has developed methods. Your task is to put them to good use.
References
Aitken, C. G. G. and Stoney, D. A. (1991) The Use of Statistics in Forensic Science. Chichester: Horwood.
Bartholomew, D. J. (1988) Probability, statistics and theology (with discussion). J. R. Statist. Soc. A, 151, 137-178.
Berger, J. O. and Delampady, M. (1987) Testing precise hypotheses (with discussion). Statist. Sci., 2, 317-352.
Berger, J. O. and Wolpert, R. L. (1988) The Likelihood Principle. Hayward: Institute of Mathematical Statistics.
Bernardo, J. M. (1999) Nested hypothesis testing: the Bayesian reference criterion. In Bayesian Statistics 6 (eds J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith). Oxford: Clarendon.
Bernardo, J. M., Berger, J. O., Dawid, A. P. and Smith, A. F. M. (eds) (1999) Bayesian Statistics 6. Oxford: Clarendon.
Bernardo, J. M. and Smith, A. F. M. (1994) Bayesian Theory. Chichester: Wiley.
Box, G. E. P. (1980) Sampling and Bayes' inference in scientific modelling (with discussion). J. R. Statist. Soc. A, 143, 383-430.
Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations (with discussion). J. R. Statist. Soc. B, 26, 211-252.
Dawid, A. P., Stone, M. and Zidek, J. V. (1973) Marginalization paradoxes in Bayesian and structural inference (with discussion). J. R. Statist. Soc. B, 35, 189-233.
DeGroot, M. H. (1970) Optimal Statistical Decisions. New York: McGraw-Hill.
Draper, D. (1995) Assessment and propagation of model uncertainty (with discussion). J. R. Statist. Soc. B, 57, 45-98.
Duckworth, F. (1998) The quantification of risk. RSS News, 26, no. 2, 10-12.
Edwards, W. L., Lindman, H. and Savage, L. J. (1963) Bayesian statistical inference for psychological research. Psychol. Rev., 70, 193-242.
Eggleston, R. (1983) Evidence, Proof and Probability. London: Weidenfeld and Nicolson.
de Finetti, B. (1974) Theory of Probability, vol. 1. Chichester: Wiley.
de Finetti, B. (1975) Theory of Probability, vol. 2. Chichester: Wiley.
Fisher, R. A. (1935) The Design of Experiments. Edinburgh: Oliver and Boyd.
Healy, M. J. R. (1969) Rao's paradox concerning multivariate tests of significance. Biometrics, 25, 411-413.
Jeffreys, H. (1961) Theory of Probability. Oxford: Clarendon.
Lehmann, E. L. (1983) Theory of Point Estimation. New York: Wiley.
Lehmann, E. L. (1986) Testing Statistical Hypotheses. New York: Wiley.
O'Hagan, A. (1995) Fractional Bayes factors for model comparison (with discussion). J. R. Statist. Soc. B, 57, 99-138.
Onions, C. T. (ed.) (1956) The Shorter Oxford English Dictionary. Oxford: Clarendon.
Pearson, K. (1892) The Grammar of Science. London: Black.
Ramsey, F. P. (1926) Truth and probability. In The Foundations of Mathematics and Other Logical Essays (ed. R. B. Braithwaite), pp. 156-198. London: Kegan Paul.
Savage, L. J. (1954) The Foundations of Statistics. New York: Wiley.
Savage, L. J. (1977) The shifting foundations of statistics. In Logic, Laws and Life: Some Philosophical Complications (ed. R. G. Colodny), pp. 3-18. Pittsburgh: Pittsburgh University Press.
Stein, C. (1956) Inadmissibility of the usual estimation of the mean of a multivariate normal distribution. In Proc. 3rd Berkeley Symp. Mathematical Statistics and Probability (eds J. Neyman and E. L. Scott), vol. 1, pp. 197-206. Berkeley: University of California Press.
Walley, P. (1991) Statistical Reasoning with Imprecise Probabilities. London: Chapman and Hall.
Comments on the paper by Lindley

Peter Armitage (Wallingford)
Dennis Lindley has written so frequently, and so persuasively, about the principles of Bayesian statistics, that we scarcely expect to find new insights in yet another such paper. The present paper shows how wrong such a prior judgment would be. Lindley's concern is with the very nature of statistics, and his argument unfolds clearly, seamlessly and relentlessly. Those of us who cannot accompany him to the end of his journey must consider very carefully where we need to dismount; otherwise we shall find ourselves unwittingly at the bus terminus, without a return ticket.
I wrote 'those of us' because there must be many who, like me, sympathize with much of the Bayesian approach but are unwilling to discard a frequentist tradition which appears to have served them well. It is worth trying to enquire why this should be so. One possibility, of course, is that our reluctance is, at least in part, a manifestation of inertia, demonstrating a lack of courage or understanding. I must leave that for others to judge.
I think, though, that there are sounder reasons for withholding full support for the Bayesian position. Lindley and I came to statistics during the 1940s, at a time when the subject was dominated by the Fisherian revolution. During the 19th century inverse probability had co-existed uneasily with frequentist methods by the use of flat priors, standard errors and normal approximations, results being interpretable by either mode of reasoning, albeit with occasional lack of clarity. Fisher had, it seemed, cleared the air by disposing of the need for inverse probability. Philosophical disputes, such as those with Neyman and E. S. Pearson, took place within the frequentist school, although Jeffreys and a few other pioneers maintained and developed the Laplace-Bayes framework. To many of us entering the field at that time it would have seemed bizarre to overturn such a powerful body of ideas. It is greatly to the credit of Lindley, and of a few of his contemporaries like Good and Savage, that they recognized the possibility that, as they might have put it, the Emperor had no clothes.
The great merit of the Fisherian revolution, apart from the sheer richness of the applicable methods, was the ability to summarize, and to draw conclusions from, experimental and observational data without reference to prior beliefs. An experimental scientist needs to report his or her findings, and to state a range of possible hypotheses with which these findings are consistent. The scientist will undoubtedly have prejudices and hunches, but the reporting of these should not be a primary aim of the investigation. Consider, for instance, one of the major achievements of medical statistics in the last half-century, the first study of Doll and Hill (1950) on smoking and lung cancer. They certainly had prior hunches, e.g. that air pollution was more likely than smoking to cause lung cancer, but it would have served no purpose to quantify these beliefs and to enter them into the calculations through Bayes's
theorem. There were indeed important uncertainties, about possible biases in the choice of controls and about the possible existence of confounding factors. But the way to deal with them was to consider each in turn, by scrupulous argument rather than by assigning probabilities to different models. That is an arbitrary example, but one that could be replicated by studies in a wide variety of applied fields.
This is not to deny the importance of prior beliefs in the weighing up of evidence, or especially in the planning of future studies, but rather to cast doubt on the need to quantify these beliefs with any degree of precision. As Curnow (1999) remarked, in connection with studies on passive smoking, 'Bayesian concepts must be fundamental to our way of thinking. However, quantifying and combining in any formal way the evidence on mechanisms with that from the epidemiological study are, in my view, impossible.'
To return, then, to Lindley's omnibus, I find that I should have dismounted by stage (b) in the list of stages (a)-(e) in Section 7. I believe that there are many instances of uncertainty which are best approached by discussion and further investigation and which do not lend themselves to measurement by probability. In that way I am absolved from the later need to dismount at stage (d). On further thought, though, I should perhaps have dismounted at (a), where, we are told, 'Statistics is the study of uncertainty'. This seems to claim too much for statistics, as indeed the author recognizes at the end of Section 2. I am surprised that he abandons the more traditional identification of statistics with the study of groups of numerical observations. Uncertainty still comes into the picture, by way of unexplained or 'random' variation, but this more modest view of our subject puts the emphasis on frequentist variation rather than the more ambitious Bayesian world view.
Frequentists are accustomed to receiving generous amounts of criticism from Bayesians about their incoherent practices. (Incidentally, I am not at all sure that a little incoherency is not a good thing. As Durbin (1987) implied, to look at a problem from irreconcilable points of view may generate useful insight.) Significance tests come in for especially hard knocks. Thus (Section 6), 'The interpretation of "significant at 5%" depends on n'. Well, it depends what you mean by 'interpretation'. In a rather obvious sense, the definition is independent of n. The possible results are arranged in some meaningful order and ranked by their probability on the null hypothesis. Irrespective of the sample size, a result that is significant at 5% comes beyond the appropriate percentile of the null distribution. What the objection means is that, on a particular formulation of prior probabilities for the non-null hypothesis, the Bayes factor comparing the two hypotheses will vary with n (Lindley, 1957). However, there is no reason why the non-null priors should be the same for different n. Large sample sizes are appropriate for the detection of small differences, and we might expect the non-null prior to be more concentrated towards the null for large rather than small n. With an adjustment of this sort the phenomenon disappears (Cox and Hinkley (1974), section 10.5).
Lindley sometimes seems to underestimate the ability of frequentist methods to cope with complexities. For instance (Section 6), 'meta-analysis is a difficult area for statistics'. As he says, this is merely the task of combining the results of many experiments. It has come rather late to some disciplines, such as clinical trials, because previously it was unusual to find many replicated studies that were sufficiently similar to be combined sensibly. But a frequentist analysis, combining estimates and not significance levels, is straightforward. In an earlier generation it was a standard exercise in agricultural trials (Yates and Cochran, 1938) and in bioassay (Finney (1978), chapter 14).
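As a rough illustration of what 'combining estimates and not significance levels' can look like in the simplest case, the sketch below pools study estimates by inverse-variance weighting. The choice of a fixed-effect, inverse-variance rule is an assumption made here for illustration and is not attributed to the discussant or to the works cited; the function name and the numbers are hypothetical.

```python
from math import sqrt


def pool_fixed_effect(estimates, standard_errors):
    """Pool study estimates with weights proportional to 1 / SE^2.

    Assumes every study estimates the same underlying effect
    (a fixed-effect model) with independent errors.
    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * x for w, x in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Two hypothetical studies of equal precision.
pooled, pooled_se = pool_fixed_effect([1.0, 2.0], [1.0, 1.0])
```

Two equally precise studies are simply averaged, and the pooled standard error is smaller than either study's own; studies that are not sufficiently similar would call for a different model.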
Again, he asserts (Section 14) that frequentist analysis is 'unable to cope' with the effects of changes in personal behaviour or Government policy on predictions of future numbers of cases of acquired immune deficiency syndrome. Yet the effects of changes in sexual practices and of improvements in therapy can be estimated by experiment or observation and built into the models that are used for such projections. It seems better to approach these problems by specific enquiries about each effect than to impose distributions of bias determined by subjective judgments.
Lindley's forceful presentation has led me to respond more robustly than I had expected. I respect the intellectual rigour of modern Bayesianism, and I acknowledge the influence that it has had in reminding statisticians that ancillary information is important, whether or not it is included in a formal analysis of the current data. In particular for decisions and verdicts a Bayesian approach seems essential, although here again a fully quantitative analysis may be unnecessary. However, I cannot agree that for the reporting of typical scientific studies a Bayesian analysis is mandatory. I have no objections to those who wish to follow that route, provided that they can make their conclusions comprehensible to their readers, but I wish to reserve the right to present my own conclusions in a different way. Unfortunately, such eclecticism is unlikely to find much support among the Bayesian community.
M. J. R. Healy (Harpenden)
In this paper Dennis Lindley sums up the experience of 50 years' advocacy of the subjective approach to statistical reasoning. It has been a long haul; he must at times have felt sympathy for Bishop Berkeley, of whose arguments it was said that they admitted no refutation but carried no conviction. Today though, thanks very largely to work by him and his students, Bayesian methodology is widely studied and the Royal Statistical Society's journals routinely carry papers in which Bayesian techniques are employed.
Yet it remains true that the impact of this methodology on the vast amount of statistical analysis that is published in the scientific literature is essentially negligible. Almost every paper that I see as statistical adviser to a prominent medical journal contains the sentence 'Values of p less than 0.05 were regarded as significant' or its equivalent, and non-statistical referees regularly criticize submitted papers for an absence of power calculations or of adjustments of significance levels to allow for multiple comparisons. If the Fisher-Neyman-Pearson paradigm (Healy, 1999) is demonstrably unsatisfactory, as Professor Lindley claims to show, then large quantities of research data are being wrongly interpreted and a highly unsatisfactory situation exists. It seems to me that there may be more than one reason for this.
The first, and probably the least important, consists of weaknesses in the theoretical underpinning of Bayesian methods. One of these relates to the representation of ignorance, an area in which much work has been done. (It may be that we are never truly ignorant, but there are merits in enquiring what we should believe if we had been so.) Walley (1996) is particularly relevant here, and I must confess that I found the contributions to the discussion by Professor Lindley and some of his colleagues unconvincing.
Another issue stems from the fact that the bulk of statistical work is actually done by non-statisticians. This is as it should be; one of the responsibilities of the statistical profession (one that is not always lived up to) is that of making its insights available to research workers in all fields. Medical students today are exposed to statistical teaching in their preclinical years and later to text-books (not all written by statisticians) which lay down the standard approach of t, χ², r, Wilcoxon and the rest. If they wish to publish the results of research, they are liable to be encouraged, not to say compelled, to quote p-values and confidence limits; the paradigm that I have referred to is in full possession of the field. As statisticians we may come to follow Professor Lindley and to agree among ourselves that new and incompatible techniques must replace it, but how are we to explain this to our clients? Are we to apologize and suggest that they must unlearn all that we have been teaching them for many decades, that they must abandon their favourite computer packages? And, if we do, will they listen to us?
But the most severe problem, I suggest, is essentially a matter of psychology. Mankind in general longs for certainty, and the rise of natural science has been seen as a way of obtaining certain knowledge. We statisticians have pointed out that complete certainty is unobtainable, but we have maintained that the degree of uncertainty is quantifiable and objective (Schwartz, 1994); we can be certain about how uncertain we should be. If we now insist on the personal subjective nature of uncertainty, how do we wish scientists to behave when they present their results? Are the conclusions to be preceded by the rubric 'In our opinion', with the implied parenthesis '(but you don't have to agree with us)'? We cannot even fall back on the objectivity of the data, since (as Professor Lindley has pointed out to me) the likelihood function itself depends on a model which is itself subjectively chosen. It may be that the inclusion of such a rubric would show a certain degree of realism; we can all remember papers to which our reaction was 'I don't believe a word of it'. But it must be admitted that it will not be welcomed by the scientific community as a whole, let alone by the general public.
Dennis Lindley's paper, like so many of his previous contributions, raises innumerable topics that are worthy of deep thought and discussion. There is no escaping the fact that statistics, unlike most disciplines, demands philosophical investigation. As practitioners we owe him a debt of gratitude for persuading us, unwilling as we may be, that such investigations must be pursued and for laying down one path that needs to be followed.
D. R. Cox (Nuffield College, Oxford)
It is 50 years since Dennis Lindley and I became colleagues at the Statistical Laboratory, Cambridge. Since then I have read most if not all of his work on the foundations of statistics always with admiration for its intellectual and verbal clarity and vigour. The present paper is no exception. It sets out with persuasiveness and aplomb the personalistic approach to uncertainty and individual decision-making. The ideas described are an important part of modern statistical thinking. A key issue, though, is whether they are the all-embracing basis of at least the more formal part of statistics or are to be taken as some part of an eclectic approach as I, and surely many others, have patiently argued; see, for example, Cox (1978, 1995, 1997) and Cox and Hinkley (1974).
On one point I believe that we are in total agreement. Bayesian in this context does not mean merely relying on a formal application of Bayes's theorem to produce inferences. Many of the applications involving flat priors or hyperpriors in a small number of dimensions can be regarded essentially as a combination of empirical Bayes methods and a technical device to produce sensible approximate confidence intervals implemented, for example, by Markov chain Monte Carlo methods. Provided that the relatively flat priors are not in a large number of dimensions, such investigations seem philosophically fairly neutral. Dennis Lindley's view is much more radical. It is the predominance of the constructive use of personalistic probability to synthesize various kinds of information, including that directly provided by data, into a comprehensive assessment of total uncertainty, preferably leading to a decision analysis. Flat priors have no role except occasionally as an approximation. The terminology 'Bayesian' is unfortunate, but I suppose that we are stuck with it.
Why is this unsatisfactory as the primary basis for our subject? In trying to discuss this in a brief contribution one is in the difficulty not merely of sounding more belligerent than is the intention but even more seriously of not having the last word! Also the paper is rich in specific detail on which comment is really desirable.
A major attraction of the personalistic view is that it aims to address uncertainty that is not directly based on statistical data, in the narrow sense of that term. Clearly much uncertainty is of this broader kind. Yet when we come to specific issues I believe that a snag in the theory emerges. To take an example that concerns me at the moment: what is the evidence that the signals from mobile telephones or transmission base stations are a major health hazard? Because such telephones are relatively new and the latency period for the development of, say, brain tumours is long the direct epidemiological evidence is slender; we rely largely on the interpretation of animal and cellular studies and to some extent on theoretical calculations about the energy levels that are needed to induce certain changes. What is the probability that conclusions drawn from such indirect studies have relevance for human health? Now I can elicit what my personal probability actually is at the moment, at least approximately. But that is not the issue. I want to know what my personal probability ought to be, partly because I want to behave sensibly and much more importantly because I am involved in the writing of a report which wants to be generally convincing. I come to the conclusion that my personal probability is of little interest to me and is of no interest whatever to anyone else unless it is based on serious and so far as feasible explicit information. For example, how often have very broadly comparable laboratory studies been misleading as regards human health? How distant are the laboratory studies from a direct process affecting health? The issue is not to elicit how much weight I actually put on such considerations but how much I ought to put. Now of course in the personalistic approach having (good) information is better than having none but the point is that in my view the personalistic probability is virtually worthless for reasoned discussion unless it is based on information, often directly or indirectly of a broadly frequentist kind. The personalistic approach as usually presented is in danger of putting the cart before the horse. I hope that Dennis Lindley will comment on this. Is the issue in effect a nuance of interpretation or, as I tend to think, a point of principle?
Another way of saying this is that we can put broadly three requirements on a theory of the more formal parts of statistics: that it embraces as much as possible in a single approach, that it leads to internally consistent (coherent) consequences and that it meshes well with the real world (calibration). Now the personalistic approach scores extremely well on the first two points. My difficulty is that I put very large, indeed almost total, weight on the third. If there were to be a choice between working self-consistently and being in accord with the real world, and of course we would like to do both, then I prefer the latter. The frequency-based approach attempts, often rather crudely, to put that first.
Take a simple situation in Mendelian genetics in which some probabilities of 1/2 and 1/4 arise. Are they approximate representations of some biological phenomena that were going on long before anyone investigated them or are they to be interpreted as essentially the convergence of your personalistic probability in the face of a large amount of information? The second view is interesting but, to my mind, the former is the preferred interpretation and the essential reason why the probabilities are important. This is under a philosophical position that is close to naive realism; there is a real world out there which it is our task to investigate and which shows certain regularities captured, in this case, by biological constants.
This leads to the conclusion that the elicitation of priors is generally useful mainly in situations where there is a large amount of information, possibly of a relatively informal kind, which it is required to use and which it is not practicable to analyse in detail. An informal summary by experts into a prior distribution may then be a fruitful approach. It carries the danger, however, that the experts may be
wrong and treating their opinion as equivalent to explicit empirical data has hazards. In any case settling issues by appeal to supposed authority, while sometimes unavoidable, is in principle bad. It is, of course, also possible that data are wrong, i.e. seriously defective, but this is open to direct investigation.
I understand Dennis Lindley's irritation at the cry 'where did the prior come from?'. I hope that it is clear that my objection is rather different: why should I be interested in someone else's prior and why should anyone else be interested in mine? There is a parallel question: where did the model for the data-generating process come from? This is no trivial matter, especially in subjects like economics. Here any sort of repetition is very hypothetical and, although some economists assert a solid theoretical knowledge base, this seems decidedly shaky. The reason for being interested in models is, however, clear. They are an imperfect but hopefully reasoned attempt to capture the essence of some aspect of the real physical, biological or social world and are in principle empirically at least partly testable. If we have a reasonably fruitful representation then in principle everyone is or should be interested in it.
The need for personal judgment, perhaps supremely in scientific research, is not in dispute. The formalization of this may be instructive in some situations. A central issue concerns the role of statistical methods (not the role of statisticians which is a different matter). I see that role as primarily the provision of a basis for mathematical representation of physical random phenomena and for public discourse about uncertainty.
The Bayesian formalism is an elegant representation of the evolution of uncertainty as increasingly more information arises. In its simplest form it is concerned with combining two sources of information. But one of the general principles in the combination of evidence is not to merge inconsistent 'data'. Consistency has usually to be interpreted in a probabilistic sense. Therefore we should face the possibility that the data and the other assessment (the prior) are inconsistent. (I am, of course, aware of the argument that no possibility should be excluded a priori but I cannot see that as a satisfactory way out.) Of course it may be that the data are flawed or being misinterpreted. But at least in principle something like a significance test seems unavoidable. I know that in principle we can reserve a small portion of prior probability to put on unexpected possibilities but surely we need also to represent what these are and this may be totally unknown. A complex set of data may show entirely unanticipated but important features. This is connected with the matter of temporal coherency on which comments would be welcome.
Are then p-values needed? It is interesting that for 50 years statisticians writing from a broadly frequentist perspective have criticized the overuse of significance tests (Yates, 1950). Indeed in some fields, notably epidemiology, conclusions are now primarily presented via approximate confidence limits which could be regarded as an approximate specification of a likelihood function, if that point of view were preferred. But in principle it seems essential to have a way of saying 'the data under immediate analysis are inconsistent with the representation suggested'. Now I agree with Dennis Lindley that it is necessary to have some idea of an alternative but not that it is necessary to formulate it probabilistically: desirable maybe, but necessary no. For example we may test for linearity without an explicit idea of the form of non-linearity that is appropriate. If the need arises we may then have to formulate new specific models but not otherwise.
The construction of overviews (so-called meta-analyses) via treating p-values as uncertainty measures is clearly a poor procedure if estimated effects and measures of precision are available on a comparable scale. But so also would be overviews in which measures of the degree of belief in some hypothesis were the only evidence available.
It seems to be a fundamental assumption of the personalistic theory that all probabilities are comparable. Moreover, so far as I understand it, we are not allowed to attach measures of precision to probabilities. They are as they are. A probability of 1/2 elicited from unspecified and flimsy information is the same as a probability based on a massive high quality database. Those based on very little information are unstable under perturbations of the information set but that is all. This relates to the previous point and to the usefulness of such measures for communication.
Forecasting of acquired immune deficiency syndrome (AIDS) is mentioned as an exemplar of model uncertainty. Now the initial report on AIDS in the UK discussed sources of uncertainty and stated explicitly that model uncertainty was the major source. Indeed the message was rammed home by a front cover which showed several quite different forecasts as curves against time. Would it have helped to put probabilities on the different models and to have produced an overall assessment? I suppose that it is a matter of judgment but it seems to me that this would have been a confusing and misleading thing to do and would have hidden rather than clarified the issues involved. To put the point gently, the idea that only Bayesians are concerned about model uncertainty is wrong.
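The earlier remark that probabilities resting on very little information are unstable under perturbations of the information set can be made concrete with a small sketch. It assumes, purely for illustration, that the uncertain quantity is a chance with a Beta prior updated by a few binary observations; this model, the function names and the numbers are hypothetical and do not come from the discussion.

```python
def beta_mean(a, b):
    """Probability assigned to the next success under a Beta(a, b) belief."""
    return a / (a + b)


def update(a, b, successes, failures):
    """Conjugate Beta update after observing binary outcomes."""
    return a + successes, b + failures


# Two beliefs that both give a probability of 1/2 before any new data.
flimsy = (1, 1)        # stands in for very little prior information
solid = (1000, 1000)   # stands in for a massive database

# The same small perturbation of the information set: three successes.
flimsy_after = beta_mean(*update(*flimsy, successes=3, failures=0))
solid_after = beta_mean(*update(*solid, successes=3, failures=0))
```

Both beliefs start at 1/2, but the first moves a long way after three observations while the second barely changes, which is the instability described above.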
Dennis Lindley puts decision-making as a primary objective. Now I agree that such questions as why is this issue being studied and what are the consequences of such and such conclusions must always be considered whatever the field of study. At the same time I have rarely found quantitative decision analysis useful although I certainly accept this as a limitation of personal experience and imagination. For example, in the AIDS predictions mentioned above, the summary of the forecasts into a recommended planning basis was based on an informal decision analysis based on the qualitative idea that it was better to overpredict, leading to an overprovision of resources, than to underpredict, leading to a shortfall. It would have been difficult to put this quantitatively other than as a very artificial exercise via a series of sensitivity analyses. In most of the applications that I see, the role of statistical analysis is, in any case, to provide a base for informed public discussion.
Over the design of experiments, I do not see that as primarily the preserve of statisticians and it is important that most experiments are done to add to the pool of public knowledge of a subject and therefore should, for their interpretation at least, not be too strongly tied to the priors of the investigator. With a different interpretation of the word public this applies to industrial experiments also.
I do not understand the comment that theory makes a prediction about the proportion of hypotheses rejected at the 5% level in a significance test that are in fact false. How can theory possibly show anything of the sort? They may all be false or all (approximately) true depending on what we chose to investigate. I agree that it is the case that many assessments of uncertainty underestimate the error involved but this is for a variety of empirical reasons, the use of models ignoring certain components of variance or biases (which statisticians surely do not in general ignore), real instabilities in the effects under investigation and so on.
My attitude may be partly a reflection of a lack of mastery of current computational procedures but I am deeply sceptical of the advice of Savage to take models as complex as we can handle. This seems a recipe for overelaboration and for the abandonment of an important feature of good statistical analyses, namely transparency, the ability to see the pathways between the data and the conclusions.
I agree that prediction is underemphasized in many treatments of statistics and that the test of a representation of data is its ability to predict new observations or aspects of the original data not used in analysis. But this does not mean that prediction is necessarily the right final objective. We are not interested in estimating the velocity of light to predict the next measurement on it.
In conclusion, and not directly a comment on the paper, I want to object to the practice of labelling people as Bayesian or frequentist (or any other 'ist'). I want to be both and can see no reason for not being, although if pushed, as I have made clear, I regard the frequentist view as primary, for most if not virtually all the applications with which I happen to have been involved. I hope that by combining this view with a high regard for the present paper I am not committing an ultimate sin: incoherency.
J. Nelder (Imperial College of Science, Technology and Medicine, London)
Recently (Nelder, 1999) I have argued that statistics should be called statistical science, and that probability theory should be called statistical mathematics (not mathematical statistics). I think that Professor Lindley's paper should be called the philosophy of statistical mathematics, and within it there is little that I disagree with. However, my interest is in the philosophy of statistical science, which I regard as different. Statistical science is not just about the study of uncertainty, but rather deals with inferences about scientific theories from uncertain data.
An important quality about theories is that they are essentially open ended; at any time someone may come along and produce a new theory outside the current set. This contrasts with probability, where to calculate a specific probability it is necessary to have a bounded universe of possibilities over which the probabilities are defined. When there is intrinsic open-endedness it is not enough to have a residual class of all the theories that I have not thought of yet. The best that we can do is to express relative likelihoods of different parameter values, without any implication that one of them is true. Although Lindley stresses that probabilities are conditional I do not think that this copes with the open-endedness problem.
I follow Fisher in distinguishing between inferences about specific events, such as that it will rain here tomorrow, and inferences about theories. For inferences about events, Lindley's analysis is persuasive; if I were a businessman trying to reach a decision on whether to invest a million pounds in a project, I would act very much as he suggests. In analysing data relative to one or more scientific theories, I would wish to present what is objective and not to mix this with subjective probabilities which are derived from my priors. If the experimenter with whom I am working wishes to combine likelihoods with his own set of weights based on his (doubtless more extensive) knowledge, then he is at liberty to do so; it is not my job to do it for him. However, if he wishes to communicate the results to other scientists, it would be
better, in my view, to stay with the objective part. (This paragraph is heavily dependent on ideas of George Barnard.)
General ideas like exchangeability and coherence are fine in themselves, but problems arise when we try to apply them to data from the real world. In particular when combining information from several data sets we can assume exchangeability, but the data themselves may strongly suggest that this assumption is not true. Similarly we can be coherent and wrong, because the world is not as assumed by Lindley. I find the procedures of scientific inference to be more complex than those defined in the paper. These latter fall into the class of 'wouldn't it be nice if', i.e. would it not be nice if the philosophy of statistical mathematics sufficed for scientific inference. I do not think that it does.
A. P. Dawid (University College London)
It is a real pleasure to comment on this paper. Dennis Lindley has been one of the most significant influences on my professional life, and his words are always worth reading carefully and taking to heart. It is in no way a criticism to say that I recognize, in the current work, ideas that Dennis has been promoting throughout the 30 years and more that I have known him: these things are still worth saying, perhaps now more than ever. For those who wish to read more of his penetrating and thought-provoking analyses, I particularly recommend Lindley (1971) and Lindley (1978), which contain some fascinating and educational examples of the differences between the frequentist and the Bayesian approaches to problems and clearly point up the logical difficulties that can arise when we do not conform to the principles of Bayesian coherence.
A case for Sherlock Bayes?
A recent and very important real example of this has arisen in the area of forensic identification, a problem area to which Lindley made some important early contributions (Lindley, 1977).
We are asked to compare two cases. The following details are common to each case. A murder has been committed, and a DNA profile, which can be assumed to be that of the murderer, has been obtained from blood at the scene of the crime. A suspect has been apprehended, and a DNA profile obtained from his blood. The two profiles match perfectly. The probability of this event, if the suspect is innocent, is some small number P; a realistic value might be P = 10^-6. (It is assumed that a match is certain if he is guilty. Then smaller values of the 'match probability' P may reasonably be taken as expressing stronger evidence against the suspect.) There is no other directly relevant evidence.
The difference between the two cases is that in case 1 the suspect was picked up at random, for completely unrelated reasons, and, on being tested, was found to match the DNA from the scene of the crime, whereas in case 2 a search was made through a computer database containing the DNA profiles of a large number N (perhaps N = 10000) of individuals, and the suspect (and no-one else in the database) was found to match.
The question to be addressed is 'In which of these two cases is the evidence against the suspect stronger?'. (Note that this question relates only to the strength of the evidence; we are not concerned with the possibility that the prior probabilities in the two cases might reasonably be different.)
The defence counsel argues, with mathematical correctness, that, because of the 'multiple testing' that has taken place in case 2, the probability of finding a (single) match in the database, if the true murderer is not included in it, is around NP (the probability of finding more than one match is entirely negligible). Since this match probability NP for case 2 is very substantially larger than the match probability P for case 1, that means that the evidence against the suspect in case 2 is very much weaker.
The prosecution counsel points out that, in case 2, one consequence of the search was to eliminate the other N - 1 individuals in the database as possible alternative suspects, thus increasing the strength of the evidence against the suspect, albeit by a typically negligible amount above that for case 1.
Readers may like to assess whether they are intuitive Bayesians or intuitive frequentists by deciding which of these two arguments they prefer. Although both arguments are based on probability, only one of them is in accordance with the coherent use of probability to measure and manipulate uncertainty. Instead of identifying which this is, I shall just give a hint: consider the extreme case that the database contains records on everyone in the population.
For further reading on this problem, see Stockmarr (1999) and Donnelly and Friedman (1999); for more general application of coherent Bayesian reasoning to forensic identification problems, see Balding and Donnelly (1995) and Dawid and Mortera (1996, 1998).
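
As an illustrative aside that is not part of the discussant's text, the hint about the extreme case can be explored numerically. The sketch below rests on assumptions introduced here: a closed population of T possible culprits containing exactly one murderer, a uniform prior over them, independent profile matches, and a certain match for the guilty person. The value of T and the function names are hypothetical.

```python
def posterior_case1(P, T):
    """Posterior probability of guilt when only the suspect has been tested."""
    # Likelihood of the observed match: 1 if the suspect is guilty, P under
    # each of the T - 1 alternative culprits. The uniform prior 1/T cancels.
    return 1.0 / (1.0 + (T - 1) * P)


def posterior_case2(P, T, N):
    """Posterior after a database search: suspect matches, N - 1 others do not."""
    # The N - 1 non-matching database members are ruled out, because a guilty
    # person is assumed to match with certainty. That leaves T - N alternatives
    # outside the database, each giving this evidence with relative weight P.
    return 1.0 / (1.0 + (T - N) * P)


P = 1e-6        # match probability quoted in the text
N = 10_000      # database size quoted in the text
T = 1_000_000   # hypothetical population size (an assumption)

case1 = posterior_case1(P, T)
case2 = posterior_case2(P, T, N)
extreme = posterior_case2(P, T, T)  # database covers the whole population

print(f"case 1 (random pick-up):      {case1:.6f}")
print(f"case 2 (database search):     {case2:.6f}")
print(f"database = whole population:  {extreme:.6f}")
```

Under these assumptions the search can only remove alternatives from the pool, and once the database contains everyone no alternative remains.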
Thomas Bayes in the 21st century
I share Lindley's view that, much as the tremendous recent expansion of interest in Bayesian statistics is to be welcomed and admired, its emphasis on computational aspects can sometimes stand in the way of a fuller understanding and appreciation of the Bayesian approach. It was the deep logical and philosophical conundra that beset the making of inductive inferences from data that attracted me into statistics in the first place and have exercised me ever since. But I have always been disappointed that so few other statisticians seem to share my view of statistics as 'applied philosophy of science', and even that small number seems to be dwindling fast. On the positive side, there are increasing numbers of researchers in artificial intelligence and machine learning who are taking foundational issues extremely seriously and are conducting some very original and important work. It is ironic that, as statisticians devote more of their effort to computing, so computer scientists are applying themselves to statistical logic.
When I was starting out, Bayesian computation of any complexity was essentially impossible. We could handle a few simple normal, binomial and Poisson models, and that was it. Whatever its philosophical credentials, a common and valid criticism of Bayesianism in those days was its sheer impracticability. Indeed, when I was engaged in organizing the first meeting on 'Practical Bayesian statistics' (sponsored by what was then still the Institute of Statisticians) in Cambridge in 1982, it was still possible for an eminent statistician to write to the Institute's newsletter suggesting that this was 'a contradiction in terms': an extreme and biased judgment, perhaps, but with a grain of truth. So, as we could not compute, we had to devote ourselves instead to foundational issues.
How things have changed!
With the availability of fast computers and sophisticated computational techniques such as Markov chain Monte Carlo sampling, Bayesians can now construct and analyse realistic models of a degree of complexity which leaves most classical statisticians far behind. This power and versatility is itself a very strong argument for doing statistics the Bayesian way, far stronger, perhaps, than deep consideration of the logic of inference. But it would be sad if this practical success were at the expense of a clear understanding of what we are doing, and why we are doing it.
What is the principal distinction between Bayesian and classical statistics? It is that Bayesian statistics is fundamentally boring. There is so little to do: just specify the model and the prior, and turn the Bayesian handle. There is no room for clever tricks or an alphabetic cornucopia of definitions and optimality criteria. I have heard people who should know better use this 'dullness' as an argument against Bayesianism. One might as well complain that Newton's dynamics, being based on three simple laws of motion and one of gravitation, is a poor substitute for the richness of Ptolemy's epicyclic system.
Nevertheless, the Ptolemaic temptation is difficult to resist and is apparent in much neo-Bayesian work, which struggles hard to escape from the restrictive confines of the fully coherent subjectivist Bayesian paradigm, dreaming up instead its own new and clever tricks. I regard this as a seriously wrong direction. All my experience teaches me that it is invariably more fruitful, and leads to deeper insights and better data analyses, to explore the consequences of being a 'thoroughly boring Bayesian'. Without a clear appreciation of what being coherent entails, and the guidance that a strict Bayesian framework supplies, it is all too easy to fall into erroneous and misguided ways of formulating problems and analysing data.
J. F. C. Kingman (University of Bristol)
This paper is of great importance. If 'philosophy' is read as 'general principles', the author is laying
down the general principle that the output from any statistical analysis should consist of a number of probability statements. These are subjective in the sense that they depend on assumptions made by the analyst and stated in the report, and another analyst with different prejudices will produce different conclusions.
I use the word 'analyst' rather than 'statistician' because the argument, if it is valid at all, may apply not just to statistical method but to any reported research in which uncertainty plays a part. Thus Professor Lindley is calling for a revolution in the way that research in general is carried out and reported, and is doing so on the basis of very simple arguments of coherence. If we do not follow his advice, he can make money systematically from us by asking us to bet on our conclusions.
I first encountered the clarity and deceptive simplicity of Professor Lindley's exposition as a Cambridge freshman listening enthralled to his introductory course on statistics. Much of what he taught us then he would now recant, but the way in which the complexities of an uncertain world were fitted into an elegant and convincing theory was deeply impressive. Perhaps mathematicians select themselves
by this desire to reduce chaos to order and only learn by experience that the real world takes its revenge.
The most common reason for scepticism about the Bayesian approach is the apparently arbitrary nature of the prior distribution p(θ), but I worry even more about the 'model' p(x|θ), which so many statisticians, Bayesian or otherwise, seem to take for granted. Just what evidence do we need to convince us that a particular model, with a particular meaning for the parameter θ, is or is not appropriate to a particular problem?
Special aspects of this question have of course been studied in theoretical terms, but in practice many statisticians make a conventional choice, often based on mathematical or computational convenience. This habit seems to me to be based on a feeling that, although statistical inference is difficult and controversial, the probability calculus at least is a firm foundation that need not be questioned. Mathematicians since Kolmogorov have connived by presenting the mathematics of probability as following irresistibly from the general theory of measure and integration, but the internal consistency of the mathematics is no guarantee that it applies to any real situation. Philosophers warn of the dangers of attaching firm meaning to any probability statement about the world, and the fact that such statements are undeniably useful to (for instance) the designers of telephone systems should not lead us to an uncritical reliance on what is in the end only a collection of mathematical tautologies.
One example must suffice. We teach our students that two events are (statistically) independent when the probability that they both occur is the product of their probabilities. We then forget the adverb, and assume that, if we cannot see any causal link between two events, the multiplication law must apply. At the level of constructing a plausible model, this is a reasonable procedure, so long as the model is then tested.
But how do we test the sort of assertion that is made about the safety of nuclear power-stations, that the probability of disaster is 10^-N, where N is some very large number? The assertion is based on many applications of the 'multiplication law', ignoring the fact that the justification of the law is inherently circular.
So probability statements are dangerous currency even before we try to infer them from data. We must distrust the prophets who can sum up all the dirty complexities in a few simple formulae. But such scepticism does not absolve statisticians from asking what the general principles of their subject are. If we do not accept Professor Lindley's prescription, what alternative do we have?
David J. Bartholomew (Stoke Ash)
It is a pleasure to comment on this lucid and authoritative exposition of subjective Bayesianism. There is much in the paper with which I whole-heartedly agree, but I shall focus on the points of difference. I agree that uncertainty and variability are at the heart of statistics. Unlike the author, however, I regard variability as the more fundamental. Data analysis then comes first and does not have to be justified later as a tool for model selection. Uncertainty arises naturally, but secondarily, when we need to think about p(y|x) or p(θ|x).
My main point concerns modelling. Debates on inference have often treated the model as given and so focused on the prior distribution. How the model is chosen is much more important for the philosophical foundations of statistics. Lindley argues for the largest possible model. What happens when we push this to the limit and try to imagine a truly global model for the whole cosmos? In the beginning the model's x would have to include literally everything that could be observed. At that stage K, the background knowledge, is an empty set. How could we then assign a prior without any background knowledge? If this is impossible, how does the journey to knowledge ever start?
But if we allow that, somehow or other, it did start, what matters now is how we proceed; just how large does the 'world' of the model have to be? How do we cope with the fact that different models may have the same observational consequences? Why should two scientists ever agree, no matter how much data they have in common, if they are operating with different, but equally well-supported, models? In any case, any realistic world model is underdetermined and so certainty is beyond our reach.
Experimental science attempts to solve the 'small' world problem by controlling all extraneous variables. R. A. Fisher took that idea further by using randomization to ensure that the extraneous effects were controlled on average. It is a weakness of the subjective Bayesian philosophy that it has no place for randomization and thus no way of making valid unconditional inferences in a small world. Instead it is left to flounder on the slippery slope of what looks suspiciously like an infinite regress.
Finally, it is the personal focus of the philosophy which makes me most uneasy. The pursuit of knowledge (inference and decision-making) is a collective as well as an individual activity. It is not, essentially, about what it is rational for you or me to believe and do, but about what claim there is on us all, collectively, to believe something or to act in a particular way. Lindley recognizes this distinction in
decision-making but it is equally relevant in inference. Inference is conditional on the model and without agreement on where the journey starts there is no guarantee that we shall all arrive at the same destination. Attractive though it is, the author's world of discourse seems too small and self-contained to be the last word.
A. O'Hagan (University of Sheffield)
I congratulate Dennis Lindley for his elegantly written paper, that so lucidly covers an enormous range of fundamental topics. I particularly liked the section on 'data analysis' in Section 10. The idea, that we should condition on whatever summaries t(x) of the data have been used in building or checking the model, is a real insight. It clearly covers the case where we use part of the data as a 'training sample', reserving the remaining data for confirmatory analysis, and so links to the use of partial Bayes factors (O'Hagan, 1995). It will, of course, be more difficult to apply following more loosely structured 'data analysis'. I also applaud the emphasis, in the final section, on the need for research into methods of assessing (or eliciting) probability distributions.
Lindley misses an opportunity, however, to show how the Bayesian approach clarifies the concept of a nuisance parameter. He says, at several points, that it may be 'necessary' to introduce nuisance parameters α in addition to the parameters θ of interest. In what sense is this 'necessary'?
Lindley introduces, in Section 9, the example of a doctor needing to give a prognosis y for a patient based on observations x from previous patients. He says that this could be done simply by assessing the predictive distribution p(y|x), but that this is 'usually difficult'. He then asserts that 'a better way to proceed ... is to study the connections between x and y, and the mechanisms that operate'. This argument only works if we recognize the limitations of practical probability assessment. To assess p(y|x) directly is not just 'difficult' but likely to be very inaccurate, whereas constructing it indirectly via other assessments (a model) and the laws of probability is both more accurate and more defensible to others. In O'Hagan (1988) I develop this idea of 'elaboration' as a fundamental tool of probability measurement.
Taking this view, nuisance parameters are desirable (if not absolutely 'necessary') to achieve sufficiently accurate and defensible assessments; we can be confident of assessing p(y|x, α, θ) but not of assessing p(y|x, θ).
Finally, I recognize that in such a concise survey as this it is necessary to make some judicious simplifications, but there are a few places where Lindley risks damaging his argument by being simplistic rather than simplified.
(a) The example in Section 8 of non-conglomerability when ignorance is represented by an improper uniform distribution is not a good one, because the conditional probabilities are not defined. The conditioning events have zero probability.
(b) At the end of Section 8, it is not true that consensus will be reached if, as stated in the next sentence, we admit subjectivity about the likelihood.
(c) At the end of Section 11, the possibility of tactical voting is overlooked. One's utility is not simply a matter of the number of votes cast for the 'best' candidate.
David J. Hand (Imperial College of Science, Technology and Medicine, London)
I would like to congratulate Professor Lindley on a masterfully clear exposition of the Bayesian perspective. The arguments that he presents for adopting this approach to inference are difficult to refute. Nevertheless, I must take issue with the paper, and my disagreement begins with the title. What is described in the paper is a philosophy of statistical inference, not a philosophy of statistics. As such, it ignores much which should properly be regarded as within the orbit of statistics.
Lindley suggests, in Section 2, that statistics is the study of uncertainty. This is certainly one of the most important aspects of statistics, perhaps the largest part, but it does not define it. At the very least, it leaves out description, summarization and simplification when the data are not uncertain and the aim is not inference, as arises, for example, when the complete population is available for analysis. Would Professor Lindley claim that data analytic tools such as multidimensional scaling and biplots are not statistical tools? Would he claim that the clustering of chemical molecules, when data are available for the entire population of a given family of molecules, is not statistics? Would he claim that clustering microarray gene expression data is not statistics?
This would not be a serious issue if it were merely a matter of terminology. But it is not. It goes
further than this and has implications for how the discipline of statistics as a whole is perceived. In my view, the narrow view of statistics which it implies has contributed to the fact that other data analytic disciplines have grown up and adopted subject-matter, kudos and resources which are more appropriately regarded as belonging to statistics. Again, this would not matter if it were only a question of hurt pride. But, again, there is more involved. In particular, it means that the elegant tools for handling uncertainty which have been developed by statisticians have not always been adopted by others concerned with data analysis, and have therefore not been applied to problems which could benefit from them. For example, database technologists have not always appreciated the need for inferential methods (a case in point being the analysis of supermarket transaction data, where the discovery of a relationship in the data has been taken at face value, with no explicitly articulated notion of an underlying population from which the data were drawn). A second example is the case of fuzzy logic. The underlying logic here has not been uniquely defined, and the area certainly lacks the elegant rigour of the Bayesian inferential strategy described by Professor Lindley. But the methods now attract a huge following. This following tends to come from the computational disciplines where, because of the narrow view of statistics described above, statistical inferential methods have not been adopted as fundamental. A third example is computational learning theory, which began by assuming that the classes in supervised classification problems were, in principle at least, perfectly separable and has only recently begun to struggle with the more realistic non-separable case which statisticians take for granted.
Sometimes there is a convergence: artificial neural nets are a most important recent example, and recursive partitioning tree methods, developed in parallel by the statistical and machine learning communities, are another. When this happens significant synergy can result from the integration of the different perspectives. It is a pity, and detrimental to the rate of scientific progress, that a period of separation has to exist at all.
One issue on which I would welcome Professor Lindley's comments is the issue of what I call 'problem uncertainty'. The inferential strategy outlined in the paper captures model uncertainty and sampling uncertainty, but real problems often have an extra layer of uncertainty, in that the question that the researcher is trying to answer is not precisely defined. An obvious illustration lies in the need to operationalize measurements: in physics we may have a good idea that our measuring instruments match our conceptual definition of a variable, but in many other domains things are not so clear cut. Our model may predict a good outcome if we measure a response in one way, but what if there is a disagreement about the best way to measure the response? An extreme example would be quality-of-life measurement. Similarly, in a clinical trial, the response to a treatment may be measured in different ways. And in classification problems, for example, it is not always clear how to weight the relative costs of the different kinds of misclassification. How should we take into account this kind of uncertainty?
On a minor point, if I disagree with Professor Lindley about the scope of statistics, I perhaps disagree with him even more about the scope of literature. Analyses of word counts may 'help to identify the author of an anonymous piece of literature' (Section 18), but they do not say anything about literature per se.
I would like to end on a note of agreement. Lindley remarks, in his final paragraph, that 'Our journals ... have been too divorced from the client's requirements'. This seems too painfully to be the case. The focus seems to be increasingly on narrow technical advance into increasingly specialized areas, with greater merit being awarded to work which is more abstract and more divorced from the realities of data. Statistics has enough of an image problem to overcome, without our gratuitously aggravating it. I am painfully reminded of Ronald Reagan's remark, that 'Economists are people who see something work in practice and wonder if it would work in theory'. I would hate statisticians to be tarred with the same brush.
George Barnard (Colchester)
Space does not permit listing the many points of disagreement, and some points of agreement, between me and my friend Professor Lindley. My central objection to probability as the sole measure of uncertainty is the rule Pr(H) + Pr(not H) = 1. If H is a statistical hypothesis that is relevant to a given data set E it must specify the probability Pr(E|H) of E. But the mere assertion that H is false leaves Pr(E|not H) wholly unspecified. It is only when given a particular model, i.e. a specified collection M of hypotheses, that we are entitled to equate 'not H' with 'some other hypothesis in M'. Our model may be wrong, and the primary function of traditional p-values is to point to this possibility. If M is wrong, in repeated experimentation p will shrink to 0.
Given that M is accepted, our statistical problem becomes that of weighting the evidence for any one
H in M against that for any other H' in M. This is done by calculating the likelihood ratio
W = L(H versus H'|E) = Pr(E|H)/Pr(E|H').
In any long series of judgments between pairs H and H', if we choose H rather than H' when L exceeds w and choose H' rather than H when L is less than 1/w, leaving our choice undetermined when L falls between 1/w and w, correct choices will outnumber incorrect choices by at least w:1. For important choices we might fix w = 100, insisting on more data when W falls between 1/100 and 100. For less important choices we might be willing to take w = 20. The more important our choice, the more data we may need to collect.
Likelihoods cannot always be added. But if with data E giving L(α, β|E) we are interested in α but not in β then, provided that the data themselves are reasonably informative about β, adding L(α, β) over β values is permissible as a reasonable approximation. Nowadays desk-top computers allow us to overview L(α, β) and it is easy to see whether, for the data to hand, such an approximation is permissible.
Fisher never had a desk-top computer. But in every edition of Statistical Methods for Research Workers he said that likelihood was the measure of credibility for inferences. To keep to statistical methods that were actually usable in his day he had to overstress p-values. I am sure that Fisher would change his mind today, and I hope that Professor Lindley may be persuaded to do the same.
Brad Efron (Stanford University)
The likelihood principle seems to be one of those ideas that is rigorously verifiable and yet wrong. My difficulty is that the principle rules out many of our most useful data analytic tools without providing workable substitutes. Here is a bootstrap story, not entirely apocryphal, to illustrate the point.
A medical researcher investigating a new type of abdominal surgery collected the following data on the post-operative hospital stay, in days, for 23 patients:
1 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 7 7 8 9 10 16 29.
Fig. 1. 2000 bootstrap replications of the 10% trimmed mean for the hospital stay data (dotted histogram: Bayesian bootstrap; horizontal axis: bootstrap 10% trimmed mean)
Following a referee's advice she had summarized the data with its 10% trimmed mean, 5.35, but wanted some formula for the estimate's accuracy.
To help to answer her question I drew 2000 independent bootstrap samples, each comprising 23 draws with replacement from the data above, and for each sample computed the 10% trimmed mean. The histogram of the 2000 bootstrap trimmed means is shown in Fig. 1. From it I calculated a bootstrap standard error of 0.87 and a nonparametric approximate 90% bootstrap confidence interval [4.42, 7.25]. It was interesting to notice that the interval extended twice as far above as below the point estimate, reflecting the long right-hand tail of the bootstrap histogram.
This is exactly the kind of calculation that is ruled out by the likelihood principle; it relies on hypothetical data sets different from the data that are actually observed and does so in a particularly flagrant way. Some of the bootstrap samples put more than 10% sample weight on the two largest observations, 16 and 29, giving them influence on the 10% trimmed mean that they do not exert in the original data set. As a matter of fact this effect accounts for most of the long right-hand tail in the histogram.
It is not as though Bayesian theory does not have anything to say about this example. We might put an uninformative Dirichlet prior on the class of probability distributions entirely supported on the 23 observed values, as suggested in Rubin (1981) and in chapter 10 of Efron (1982), and calculate the posterior distribution of the population 10% trimmed mean. Interestingly, this posterior distribution agrees closely with the bootstrap histogram. I have indicated it by the dotted histogram in Fig. 1.
But of course this is not a genuine Bayesian analysis; it is empirical Bayesian, using the data to guide the formation of the 'prior' distribution. A well-known quote of Professor Lindley says that nothing is less Bayesian than empirical Bayes analysis. Does he still feel this way?
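
The bootstrap calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the discussant's own computation: the trimming convention (dropping floor(0.1 n) observations from each end), the simple percentile interval, the seed and the function names are all choices made here, so the output need not reproduce the quoted 5.35, 0.87 or [4.42, 7.25].

```python
import random
import statistics

# Post-operative hospital stay in days for the 23 patients, as listed above.
STAY = [1, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 8, 9, 10, 16, 29]


def trimmed_mean(values, prop=0.10):
    """Mean after dropping floor(prop * n) values from each end (assumed convention)."""
    ordered = sorted(values)
    k = int(prop * len(ordered))
    return statistics.fmean(ordered[k:len(ordered) - k])


def bootstrap_trimmed_means(data, n_boot=2000, seed=0):
    """Trimmed means of n_boot resamples drawn with replacement from data."""
    rng = random.Random(seed)  # fixed seed only so that the sketch is repeatable
    n = len(data)
    return [trimmed_mean(rng.choices(data, k=n)) for _ in range(n_boot)]


reps = bootstrap_trimmed_means(STAY)
std_error = statistics.stdev(reps)

# Simple percentile interval: 5th and 95th percentiles of the replications.
ordered_reps = sorted(reps)
lower = ordered_reps[len(ordered_reps) * 5 // 100]
upper = ordered_reps[len(ordered_reps) * 95 // 100 - 1]

print(f"trimmed mean of the data = {trimmed_mean(STAY):.2f}")
print(f"bootstrap standard error = {std_error:.2f}")
print(f"90% percentile interval  = [{lower:.2f}, {upper:.2f}]")
```

Each resample is a hypothetical data set that was never observed, which is exactly the feature that the likelihood principle objects to.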
My feeling is that good frequentist procedures are often carrying out something like an 'objective' Bayesian analysis, as suggested in Efron (1993), and that maybe this hints at a useful connection between the realities of data analysis and the philosophic cogency of the Bayesian argument.
D. A. Sprott (Centro de Investigación en Matemáticas, Guanajuato)
Some of my doubts about the application of the ideas in this paper to scientific inference are briefly listed below.
(a) This paper relegates statistical and scientific inference to a branch (probability) of pure mathematics, where inferences are deductive statements of implication: if H1 then H2. This can say nothing about whether there is reproducible objective empirical evidence for H1 or H2, as is required by a scientific inference. Scientific inference transcends pure mathematics.
(b) In particular, Bayes's theorem (1) requires that all possibilities H1, H2, ..., Hk be specified in advance, along with their prior probabilities. Any new, hitherto unthought of hypothesis or concept H will necessarily have zero prior probability. From Bayes's theorem, H will then always have zero posterior probability no matter how strong the empirical evidence in favour of H.
(c) This demonstrates the necessity of presenting the likelihood function to summarize the objective experimental evidence. But what if the likelihood function flatly contradicts the prior distribution, leading to a posterior distribution that flatly contradicts both the prior distribution and the likelihood function? Surely contradictory, conflicting, items cannot merely be routinely combined.
(d) Likelihoods are point functions whereas probabilities are set functions. Likelihood therefore measures the relative plausibility of two specific values, θ':θ'', of a continuous parameter θ. The probability, however, of each specific value, θ' and θ'', is 0. Probability measures the uncertainty of intervals.
The practical value of using likelihood supplemented by probability (if possible) to measure uncertainty is illustrated by Diaz-Frances and Sprott (2000).
(e) I do not see how the injection of subjective beliefs into experimental evidence (Section 8) can be justified. Beliefs are necessary in designing experiments. To inject them into the analysis of the objective data could lead to proof by assumption or belief, or to the combination of contradictory items as in (c) above. The beliefs may just be plainly wrong and should be rejected, not combined, however 'incoherent' this would be. In any case the likelihood should be presented separately as a summary of the empirical evidence, uncontaminated by beliefs. As put by Bernard (1957), page 23,
'I consider it, therefore, an absolute principle that experiments must always be devised in view of a preconceived idea .... As for noting the results of an experiment, I posit it similarly as a principle that we must here, as always, observe without a preconceived idea.'
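The arithmetic behind point (b) can be made concrete with a small sketch. It assumes a fixed, finite list of hypotheses and repeated independent observations with the same likelihoods; every number and function name below is hypothetical and chosen only for illustration.

```python
def posterior(prior, likelihood):
    """Bayes's theorem over a fixed, finite list of hypotheses."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]


def update_repeatedly(prior, likelihood, times):
    """Apply the same likelihood 'times' times, as for independent repeated evidence."""
    current = list(prior)
    for _ in range(times):
        current = posterior(current, likelihood)
    return current


# Hypothetical likelihoods: each observation strongly favours the third hypothesis.
likelihood = [0.001, 0.002, 0.9]

# A hypothesis given prior probability zero stays at zero whatever the evidence.
post_zero = update_repeatedly([0.7, 0.3, 0.0], likelihood, times=3)

# A tiny but positive prior probability is soon overwhelmed by the same evidence.
post_small = update_repeatedly([0.7, 0.299999, 0.000001], likelihood, times=3)

print("zero prior:", post_zero)
print("small positive prior:", post_small)
```

The contrast between the two runs is the whole point: the update multiplies the prior by the likelihood, so a zero stays a zero, while any positive prior, however small, can be revised upwards.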
Ian W. Evett (Forensic Science Service, London)
I am grateful for this opportunity to comment on the paper by Professor Lindley. He and I have known each other for over 25 years and it would be difficult for me to exaggerate the effect that he has had on my thinking. His view, like that of the majority of statisticians, is that of the mathematician. I am, first and foremost, a scientist; indeed, a forensic scientist. My perspective is quite different and might be considered iconoclastic, renegade even, to readers of this journal.
It is appropriate that Dennis should mention that there appears not to be a strong association between statistics and physics. I took my first degree in physics and my introduction to statistics came in my first year. It was a two-hour lecture on 'errors of observation and their treatment'; the lecturer was so proud of it that he recorded himself for future use. I understood none of it. My second year included a course on statistics from a real live academic statistician. I am not exaggerating when I say that I found it completely mystifying. Now clearly, I had to do something in my practical experiments to indicate the extent of the uncertainties in any estimates or other inferences that I drew from my observations. But that was not really a problem, because it soon became clear that my supervisors, to say nothing of my fellow students, understood no more of the finer points of statistics than I did! A good amount of fudging to round off one's experimental reports was quite enough to satisfy the most scrupulous demonstrator.
Later, by that time a practising forensic scientist, I was sufficiently fortunate to study statistics full time in a post-graduate course at Cardiff. Since then, I have spent most of my time working with forensic scientists of all disciplines on matters of inference and statistics.
A large proportion of my time is devoted to training with the emphasis on new entrants to the Forensic Science Service. And what do I find? Most graduate scientists have learned little of statistics other than a dislike of the subject. Even more importantly they have no understanding of probability. A forensic scientist spends his or her time dealing with uncertainty. In courts, terms such as 'probable', 'unlikely' and 'random' are everyday currency. I have thought for a long time that if a forensic scientist is indeed a genuine scientist then he or she should understand probability.
Yet it is my impression that many statisticians are rather uncomfortable with the notion of probability. Certainly, there is plenty of talk about long run frequencies but what about the probabilities of real world problems? Is it not a fundamental weakness that most texts and teaching schemes present conditional probabilities as something special? Even such a delightful text as Chance Rules (Everitt, 1999) talks quite happily about 'probability' before introducing 'conditional probability', almost as a new concept, in chapter 7. All probabilities are conditional. There is no such thing as an 'unconditional probability'. A probability is without meaning unless we specify the information and assumptions that it is based on. Whereas these assertions are to me obvious, I sense that many statisticians would dispute them.
Much progress has been made towards establishing the principles of forensic science through the Bayesian paradigm. There is little disagreement among Bayesians and frequentists alike that Bayes's theorem provides a logical model for the processing of evidence in courts of law. I have heard classical statisticians who have ventured into the field say things like 'I have nothing at all against Bayesian methods; indeed, I use them myself when they are appropriate'. But this is the cry of the toolkit statistician, a statistician who lacks a philosophy. In the world of science here lies the distinction between the scientist and the technician.
The new technology of DNA profiling has caught the headlines and brought many statisticians into the field. Much of this has been highly beneficial to the pursuit but there have been several eccentricities, arising from the classical view, that have confused rather than illuminated. An example of misguided statistical thinking is the notion of significance testing for Hardy-Weinberg equilibrium (HWE). We all know that the conditions for HWE cannot exist in the real world but what do we do when a new locus is implemented for DNA profiling? We set out to test the null hypothesis that HWE is true, even though we know that it is patently false! Why do we play these silly games? They do not help science and they do not help the advancement of statistics as a scientific discipline.
My view of statistics, as a scientist, is that there is the Bayesian paradigm and there is everything else: a hotchpotch of significance testing and confidence intervals that is at best peripheral to the scientific method. Yet this is what is taught to science undergraduates and most, like me, are mystified by it. In his concluding paragraph, Dennis says that statisticians 'have been too divorced from the client's requirements'. Here is my request as a client: in the future, I require that all new science graduates should understand probability. And I am not talking about coin tossing and long run frequencies: I am talking about probability as the foundation of logical scientific inference.
Author's response
As explained at the end of Section 1, this paper began as a reprimand to my fellow Bayesians for not being sufficiently Bayesian, but it ended up by being a statement of my understanding of the Bayesian view. To many, this view acts like a red rag to a bull and I am most appreciative of the fact that the discussants have not been bullish but have brought forward reasoned and sensible arguments that carry weight and deserve respect. Limitations of space prevent every point from being discussed and omission does not imply lack of interest or dismissal as unimportant.
Many of the discussants are reluctant to abandon frequentist ideas and I agree with Armitage that this is not just inertia, though the difficulty all of us experience in admitting we were wrong must play a part. There are at least two solid reasons for this: frequency techniques have enjoyed many successes and, through the concept of exchangeability, share many ideas with the Bayesian paradigm. Against this there is the consideration that almost every frequentist technique has been shown to be flawed, the flaws arising because of the lack of a coherent underpinning that can only come through probability, not as frequency, but as belief. A second consideration is that, unlike the frequency paradigm with its extensive collection of specialized methods, the coherent view provides a constructive method of formulating and solving any and every uncertainty problem of yours. 30 years ago I thought that the statistical community would appreciate the flaws, but I was wrong. My hope now is that the constructive flowering of the coherent approach will convince, if not statisticians, scientists, who increasingly show awareness of the power of methods based on probability, e.g. in animal breeding (Gianola, 2000).
The paper, stripped to its bare essentials, amounts to saying that probability is the key to all situations involving uncertainty. What it does not tell us is how the probability is to be assessed. I have been surprised that, although the rules are so simple, their implementation can often be so difficult. Dawid provides a striking example that caused me much anguish when it was first encountered in the 1970s. Kingman's example of independence is sound and pertinent. I find it illuminating, when seeing yet another introductory text on statistics, to see whether the definition of independence is correct; it often is not. Some elementary texts do not even mention the notion explicitly and conditional probability is rarely mentioned. No wonder many of these texts are so poor.
In his perceptive comments, Cox may not have appreciated my view of the relationship between a statistician and their client. It is not the statistician who has the probabilities but the client; the statistician's task is to articulate the client's uncertainties and utilities in terms of the probability calculus, these being 'based on serious and, so far as feasible, explicit information' that the client has. This information may be based directly on data but often it uses deep understanding of physical and other mechanisms that are unfamiliar to the statistician. The idea of a statistician starting from a position almost of ignorance about mobile telephones and updating by Bayes's theorem, using what the expert says, is not how I perceive the process. The client has informed views and it is these that need to be quantified and, if necessary, modified to be made coherent. In the mobile telephones example there are presumably many clients including the manufacturers and environmentalists with opposing concerns. Although ideas exhibited in the paper do not directly apply to groups in opposition, and I do not know of any method that does in generality, they can assist, especially in exhibiting strange utility functions. It is interesting that, having expressed doubts about probabilities in the mobile telephone study, Cox concludes that
'the elicitation of priors is generally useful mainly in situations where there is a large amount of information, possibly of a relatively informal kind, which it is required to use and which it is not practicable to analyse in detail'.
Is not this a fair description of the study? Incidentally, the statistician's fondness for frequency data should not blind them to information to be had from a scientific appreciation of the underlying physical mechanism.
I agree with Cox that personalistic probability should be based on information; that is why it is always conditional on that information. But I do not see how he can claim that 'confidence limits ... could be regarded as an approximate specification of a likelihood function'. Observing r successes in n trials, a likelihood function can be found but not a confidence limit because the sample space is undefined. Cox suggests that we can test a hypothesis without an alternative in mind. Yes, but whether the test will be any good depends on the alternative.
My emphasis, supported by Evett, on the conditional nature of probability, that it depends on two arguments, not one, has not been fully appreciated. For example, if hypotheses, H₁, H₂, ..., Hₙ, are contemplated, with H their union, then all probabilities will be conditional on H, so the addition of Hₙ₊₁ will only necessitate a change in the conditions to the union of H and Hₙ₊₁. This meets Barnard's objection about not-H, Nelder's point about likelihood and Sprott's point (b). Incidentally, it is interesting, though not unexpected that, apart from Barnard, no one attempts to demolish the arguments of Sections 1-6 leading to what has been called 'the inevitability of probability' and it is only when they mount the bus, in Armitage's happy analogy, that doubts enter. The proof of the pudding may be in the eating but the recipe counts also.
The identification of statistics with uncertainty has worried many, even though I explained that it was 'not in the sense of a precise definition' so Hand is allowed his exemptions, though even there it is well to recognize that, even with a complete population, a summary introduces uncertainty and the quality of a summary is judged by how little uncertainty it leaves behind. To answer Armitage, the reasons for choosing uncertainty as primary, rather than variability in the data, are first that it is the uncertainty of the parameter (or the defendant's guilt) that is primary and the data are aids to assessing it, and second that data need something more before they will be of value. In amplification of the second point, it is possible to have two 2 × 2 × 2 tables, each with a control factor, another of effect and the last a confounding factor, both with the same numbers, in which the conclusions about the effect of the control are completely opposed (Lindley and Novick, 1981).
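One reading of the remark about 2 × 2 × 2 tables can be sketched numerically. The counts below are illustrative assumptions, not figures quoted in the response, and the labels 'A' and 'B' for the levels of the third factor are hypothetical. The sketch shows only that the pooled table and the tables split by the third factor can point in opposite directions; which comparison is the relevant one depends on context that the numbers do not contain.

```python
# counts[(treated, stratum)] = (recovered, total); illustrative numbers only.
counts = {
    (True, "A"): (18, 30),
    (True, "B"): (2, 10),
    (False, "A"): (7, 10),
    (False, "B"): (9, 30),
}


def rate(treated, stratum=None):
    """Recovery rate for a treatment group, pooled over strata or within one stratum."""
    cells = [
        value
        for (t, s), value in counts.items()
        if t == treated and (stratum is None or s == stratum)
    ]
    recovered = sum(r for r, _ in cells)
    total = sum(n for _, n in cells)
    return recovered / total


# Pooled over the third factor the treated group does better ...
print("pooled:", rate(True), "vs", rate(False))
# ... yet within each level of the third factor it does worse.
for stratum in ("A", "B"):
    print("stratum", stratum + ":", rate(True, stratum), "vs", rate(False, stratum))
```

Whether the pooled or the split comparison should guide a conclusion is a judgment about what the third factor represents, which is the sense in which data need something more before they are of value.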
Statistical packages that analyse data without context are unsound.
The suggestion of an eclectic approach to statistics, incorporating the best of various approaches, has been made. I would, with Evett, call it unprincipled. Why do adherents of the likelihood approach, part of this eclecticism, continue with their upwards of 12 varieties of likelihood, all designed in an attempt to overcome the failure of likelihood to be additive, a requirement easily seen to be essential to any measure of uncertainty? There is only one principle: probability. Why use a pejorative term like sin to describe incoherence? These eclectic people do not like principles, as is evident by their failure to consider them, instead concentrating on their perceptions of what happens when they are applied, often falsely. Efron worries about the likelihood principle, which is not surprising when the bootstrap has no likelihood about which to have a principle. The Bayesian view embraces the whole world, which is overambitious, and has to be reduced to small worlds, whereas the frequentist view restricts attention to a population. The bootstrap goes to the extreme and operates within the sample, eschewing reference to outside aspects and using ad hoc methods, like trimmed means, discussed in Section 12 within a coherent framework. Readers who are not already familiar with it might like to read the balanced discussion of the bootstrap in Young (1994) and, in particular, Schervish's remark that 'we should think about the problem'.
O'Hagan may be wrong when he says, in his point (b), that a consensus will not be reached if the likelihood is subjective. With two hypotheses and exchangeable data, your log-odds change, on receipt of data, by the addition of the sum of your log-likelihood ratios. Provided that the expectation of the ratios is positive under one hypothesis and negative under the other, the correct conclusion will be reached and hence consensus. Bartholomew worries about this consensus in a wider context. I do not know how we get started; perhaps it is all wired in as Chomsky suggests grammar is. Interesting as this point is, it does not matter in practice because, when we are faced with quantities in a problem, they make some sense to us and therefore we have some knowledge of them. I may be unduly optimistic but I feel that if two people are each separately coherent, a big assumption, then a coherent appreciation of theory and experience will ultimately lead to agreement. It happens in science, though not in politics or religion, but are they coherent?
A small correction to Bartholomew: Bayes does recognize what I have called a haphazard design (Lindley, 1982), and a convenient way to produce one is by randomization (after checking that the random Latin square is not Knut-Vik).
Hand raises the important question for our Society of why 'the elegant tools for handling uncertainty which have been developed by statisticians have not always been adopted'. One reason may be that some statisticians, myself included, have not immersed themselves sufficiently in the circumstances surrounding the data. Efron provides an example when, apart from telling us that the numbers refer to hospital stays after surgery, he just treats the 23 numbers as elements in the calculation: they might equally have referred to geophysics for all the bootstrap cares. We need to show more respect than we do for our clients. When I suggested this at a meeting recently, some members of the audience laughed and mentioned cranks who believed in alternative medicine as people who do not deserve respect. Ought we not to try to help all who come to us to express their ideas in terms of probability, to help them to become coherent and to respond more sensibly to data? Another reason for suspicion of statistics is that some of our methods sound, and are, absurd. How many practitioners understand a confidence interval as coverage of a fixed value, rather than as a statement about a parameter?
Cox defends the statistical analysis of acquired immune deficiency syndrome (AIDS) progression. My point here, perhaps not clearly expressed, for which I apologize, is that a frequentist approach can only lead to standard errors for estimates that use the frequentist variation that is present in the data. It cannot incorporate into its prediction other types of uncertainty. For example, it may happen that, impressed by media emphasis on AIDS, the public may act more cautiously in their sexual activities and, as a result, the incidence would decrease. This sort of judgment is outside the frequency canon and, although competent, dedicated frequentists will search for ways around the limitation, the Bayesian approach naturally incorporates both forms of uncertainty. In this connection, what is the objection to attaching probabilities to frequentist models to provide an overall Bayesian model? Hoeting et al. (1999) provides a good account. I find Cox's notion of prediction too narrow. Scientists want to estimate the velocity of light, not to predict a future measurement of velocity but to predict realities that depend on it, e.g. the time taken for the light from a star to reach us. In p(y|x), y is not necessarily a repetition of x, but merely related to x; the relationship often being made explicit, as O'Hagan says, through a parameter. Of course, frequentists do not like prediction because it is so difficult within their paradigm, involving contortions akin to those with confidence intervals.
Several discussants raise the important issue of inconsistency, e.g. between a prior, to use the unfortunate term, and the likelihood. In this case, data have arisen which were unexpected. There are several possibilities. One is that an inspection of the data or discussion with a colleague reveals a possibility that you had not contemplated, in which case you may add the quantity (as Hₙ₊₁ above) and continue. Another is that the truly astonishing has occurred, just as it ought to occasionally, and you continue. A third possibility is that you selected probabilities that were insufficiently dispersed. There is some evidence that the psychological process involved in probability assessment can lead to overconfidence in your knowledge. Sometimes it can easily be corrected by using a long-tailed distribution, such as t with low degrees of freedom, in place of a normal distribution, when the combination of prior and inconsistent likelihood leads to a reasonable compromise but with enhanced dispersion (Lindley, 1983). To repeat the point made in the penultimate paragraph of my paper, we are woefully ignorant about the assessment of probabilities and a concerted research effort in this field is important. Healy is right to draw attention to the psychology of the problem.
Cox raises the nature of the probabilities arising in Mendelian genetics. I would like to reserve the word 'probability' to refer to beliefs. Genetics, and similarly quantum mechanics, use 'chances' which are, as Cox would prefer, related to biological phenomena and arise from exchangeability judgments discussed in Section 14, chance playing the role of the parameter there. The older terminology was direct and inverse probabilities. The distinction becomes useful when you wish to consider the simple concept of your probability of a chance, whereas probability of a probability is, in the philosophy, unsound.
I do not understand Cox's remark about calibration. When others raise the issue they ordinarily refer to the long-run frequency, whereas Bayesians live around the present, not long runs, and continually adjust by our beloved theorem, responding in a way that is distinct from the frequentist. This is illustrated by their use of present utility functions rather than error frequencies. His claim that frequentists mesh better than Bayesians with the real world seems wrong to me.
O'Hagan is right in claiming that my discussion of nuisance parameters is deficient and I agree with him that the inclusion of extra quantities, that are not of immediate interest, is fundamental to the assessment of probabilities. It is a case of the larger model being simpler and hence more communicative. However, I do not agree with his comment on the conglomerability example for it is the conditional probabilities that are the tangible concepts. A uniform distribution on the integers only makes sense when it means uniform in any finite set.
Whenever there is a discussion about the Bayesian view, someone is sure to bring out the remark about being 'coherent but wrong' and Nelder does not disappoint. You are never wrong on the evidence that you have, when expressing your beliefs coherently. To appreciate this, try to give a definition of 'wrong'. Of course additional evidence may demonstrate that you were wrong but Bayesians can deal with that, either by changing the conditions, as when you learn that an event on which you have conditioned is false, or by updating by the theorem. Wrong you may often be with hindsight but even frequentists, or likelihood enthusiasts, have that property also.
I do agree with Dawid that 'Bayesian statistics is fundamentally boring'. A copy of this paper was sent to the person who has the fullest understanding of the subjectivist view of anyone I know (Lad, 1996), and his principal comment was that it was boring. My initial reaction was of disappointment, even fury, but further contemplation showed me that he is right for the reasons that Dawid gives. My only qualification would be that the theory may be boring but the applications are exciting.
Sprott, in his point (e), argues that you should summarize empirical evidence without reference to preconceived ideas and says that this should be done through likelihood statements. Against this I would argue that no-one has succeeded in describing a sensible way of doing this. I dispute Armitage's claim that the 'Fisherian revolution' accomplished this because, although his methods were superb, his justifications were mostly fallacious. Likelihood will not work because of difficulties with nuisance parameters and because of absurdities like that described in Section 13.
An interesting feature of the comments is an omission; there is little reference to the subjectivity advocated in the paper, which surprises me because science is usually described as objective. Indeed Cox, in concluding his advocacy of the eclectic approach, gives a personalistic reason for supporting his view.
'I regard the frequentist view as primary, for most if not virtually all the applications with which I happen to have been involved.'
My advocacy of the subjective position is based on reason, subsequently supported by experiences of myself and others.
I conclude on a personal note. When, half a century ago, I began to do serious research in statistics, my object was to put statistics, then almost entirely Fisherian, onto a logical, mathematical basis to unite the many disparate techniques that genius has produced. When this had been done by Savage, in the form that we today call Bayesian, I felt that practice and theory had been united. Kingman's sentence is so apt to what followed.
'Perhaps mathematicians select themselves by this desire to reduce chaos to order and only learn by experience that the real world takes its revenge.'
The revenge came later with the advocacy of the likelihood principle by Barnard, and later Birnbaum, so that doubts began to enter, and later still, as the plethora of counter-examples appeared, I realized that Bayes destroyed frequency ideas. Even then I clung to the improper priors and the attempt to be objective, only to have them damaged by the marginalization paradoxes. More recently the subjectivist view has been seen as the best that is currently available and de Finetti appreciated as the great genius of probability. It is therefore easy for me to understand how others find it hard to adopt a personalistic attitude and am therefore grateful to the discussants for the reasoned arguments that they have used, some of which I might have myself used in the past.
References in the comments

Balding, D. J. and Donnelly, P. (1995) Inference in forensic identification (with discussion). J. R. Statist. Soc. A, 158, 21-53.
Bernard, C. (1957) An Introduction to the Study of Experimental Medicine (Engl. transl.). New York: Dover Publications.
Cox, D. R. (1978) Foundations of statistical inference: the case for eclecticism (with discussion). Aust. J. Statist., 20, 43-59.
Cox, D. R. (1995) The relation between theory and application in statistics (with discussion). Test, 4, 207-261.
Cox, D. R. (1997) The nature of statistical inference. Nieuw Arch. Wisk., 15, 233-242.
Cox, D. R. and Hinkley, D. V. (1974) Theoretical Statistics. London: Chapman and Hall.
Curnow, R. N. (1999) Unfathomable nature and Government policy. Statistician, 48, 463-476.
Dawid, A. P. and Mortera, J. (1996) Coherent analysis of forensic identification evidence. J. R. Statist. Soc. B, 58, 425-443.
Dawid, A. P. and Mortera, J. (1998) Forensic identification with imperfect evidence. Biometrika, 85, 835-849.
Diaz-Frances, E. and Sprott, D. A. (2000) The use of the likelihood function in the analysis of environmental data. Environmetrics, 11, 75-98.
Doll, R. and Hill, A. B. (1950) Smoking and carcinoma of the lung: preliminary report. Br. Med. J., ii, 739-748.
Donnelly, P. and Friedman, R. D. (1999) DNA database searches and the legal consumption of scientific evidence. Mich. Law Rev., 97, 931-984.
Durbin, J. (1987) Statistics and statistical science. J. R. Statist. Soc. A, 150, 177-191.
Efron, B. (1982) The Jackknife, the Bootstrap, and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.
Efron, B. (1993) Bayes and likelihood calculations from confidence intervals. Biometrika, 80, 3-26.
Everitt, B. S. (1999) Chance Rules: an Informal Guide to Probability, Risk and Statistics. New York: Springer.
Finney, D. J. (1978) Statistical Method in Biological Assay, 3rd edn. London: Griffin.
Gianola, D. (2000) Statistics in animal breeding. J. Am. Statist. Ass., 95, 296-299.
Healy, M. J. R. (1999) Paradigmes et pragmatisme. Rev. Epidem. Sant. Publ., 47, 185-189.
Hoeting, J. A., Madigan, D., Raftery, A. E. and Volinsky, C. T. (1999) Bayesian model averaging: a tutorial (with discussion). Statist. Sci., 14, 382-417.
Lad, F. (1996) Operational Subjective Statistical Methods. New York: Wiley.
Lindley, D. V. (1957) A statistical paradox. Biometrika, 44, 187-192.
Lindley, D. V. (1971) Bayesian Statistics: a Review. Philadelphia: Society for Industrial and Applied Mathematics.
Lindley, D. V. (1977) A problem in forensic science. Biometrika, 64, 207-213.
Lindley, D. V. (1978) The Bayesian approach (with discussion). Scand. J. Statist., 5, 1-26.
Lindley, D. V. (1982) The use of randomization in inference. Philos. Sci. Ass., 2, 431-436.
Lindley, D. V. (1983) Reconciliation of probability distributions. Ops Res., 13, 866-880.
Lindley, D. V. and Novick, M. R. (1981) The role of exchangeability in inference. Ann. Statist., 9, 45-58.
Nelder, J. A. (1999) From statistics to statistical science. Statistician, 48, 257-267.
O'Hagan, A. (1988) Probability: Methods and Measurement. London: Chapman and Hall.
O'Hagan, A. (1995) Fractional Bayes factors for model comparison (with discussion). J. R. Statist. Soc. B, 57, 99-138.
Rubin, D. B. (1981) The Bayesian bootstrap. Ann. Statist., 9, 130-134.
Schwartz, D. (1994) Le Jeu de la Science et du Hasard. Paris: Flammarion.
Stockmarr, A. (1999) Likelihood ratios for evaluating DNA evidence when the suspect is found through a database search. Biometrics, 55, 671-677.
Walley, P. (1996) Inference from multinomial data: learning about a bag of marbles (with discussion). J. R. Statist. Soc. B, 58, 3-57.
Yates, F. (1950) The influence of "Statistical methods for research workers" on the development of the science of statistics. J. Am. Statist. Ass., 46, 19-34.
Yates, F. and Cochran, W. G. (1938) The analysis of groups of experiments. J. Agric. Sci., 28, 556-580.
Young, G. A. (1994) Bootstrap: more than a stab in the dark (with discussion)? Statist. Sci., 9, 382-415.
