
Psychological Methods, 1999, Vol. 4, No. 2, 212-213

Copyright 1999 by the American Psychological Association, Inc. 1082-989X/99/$3.00

One Cheer for Null Hypothesis Significance Testing


Howard Wainer
Educational Testing Service

Null hypothesis testing as a tool in research is defended. Six examples are offered of situations in which, if all the researcher could do was "reject $H_0$ at $\alpha = .05$," the scientific contribution would still be substantial. The examples are drawn from physics, cosmology, psychology, geophysics, career counseling, and theology.

A major consideration in the deliberations of the American Psychological Association (APA) Task Force on Statistical Inference is the extent to which traditional null hypothesis significance testing (NHT) should be discouraged or even disallowed in the descriptions of research published in APA journals. Members of the task force regularly receive substantial collections of comments and suggestions from APA members. A large number of these comments and suggestions are surprisingly adamant in their opposition to NHT. This lack of appreciation for the potential value of NHT in the appropriate circumstances suggested that it might be useful to offer at least one cheer for NHT, a procedure that, I believe, can be a powerful and useful weapon in our methodological armamentarium. To be perfectly honest, I am a little at a loss to understand fully the vehemence and vindictiveness that have recently greeted NHT. These criticisms seem to focus primarily on the misuse of NHT, and a focus on the technique rather than on those who misuse it seems to me misplaced. (Don't ask me to be consistent here, because I find the National Rifle Association's similar defense of handguns completely specious.)

All would agree that "6:00 p.m." is a pretty stupid answer to most questions. But, because it is precisely the correct answer to some questions, it would be shortsighted to ban forever the use of "6:00 p.m." Thus, it seems to me that the issue is to specify more clearly the kinds of questions for which NHT is suitable. To do this I do two things. First, I take a broader view of its use than simply within scientific psychology. Second, I go a little overboard and specify six different questions for which a reliable reject-not reject decision would generally be welcomed as a major breakthrough. I do not mean to imply that doing more (e.g., estimating the direction or size of the effect) might not have improved matters further. I mean only that when such further elaborations are not possible, a simple, trustworthy reject-not reject decision can still be worthwhile. How worthwhile? The canny reader will note that someone who could have done the appropriate studies to yield a reliable reject-not reject decision in some of the situations listed subsequently might have been rewarded with a Nobel Prize or even canonization. It is of some historical interest to note that the earliest example of NHT (Example 6) was aimed at the latter.
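Arbuthnot's study, the earliest NHT cited in Example 6, can be read as what is now called a sign test. If male and female births were equally likely, a long run of years in which males outnumbered females every year would be vanishingly improbable. The Python sketch below computes that exact one-sided binomial tail probability using only the standard library. The function name `sign_test_upper` is hypothetical. The figure of 82 consecutive years is the count commonly reported for Arbuthnot's London christening records; it is an assumption here and is not stated in this article.

```python
from math import comb


def sign_test_upper(n_trials: int, n_successes: int) -> float:
    """Exact one-sided sign test (hypothetical helper).

    Returns P(X >= n_successes) for X ~ Binomial(n_trials, 1/2): the
    probability, under H0 (both outcomes equally likely), of a result at
    least as extreme as the one observed.
    """
    if not 0 <= n_successes <= n_trials:
        raise ValueError("n_successes must lie between 0 and n_trials")
    # Count the equally likely sequences with at least n_successes "male" years.
    extreme = sum(comb(n_trials, k) for k in range(n_successes, n_trials + 1))
    # Divide by the 2**n_trials sequences that are equally likely under H0.
    return extreme / 2 ** n_trials


# Assumed figures: more male than female christenings in each of 82 years.
p_value = sign_test_upper(82, 82)
print(f"P(data | H0) = {p_value:.3e}")
print("Reject H0 at alpha = .05:", p_value < 0.05)
```

With every year going the same way, the tail probability is simply (1/2)^82, or roughly 2 × 10^-25. This is far below any conventional α.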

The time spent in writing this article was supported by the Educational Testing Service research allocation. I would like to thank my friends and colleagues Erich Lehmann, George Miller, Bob Mislevy, Don Rubin, and Spencer Swinton for helpful discussions about null hypothesis testing that clarified my own thinking. Correspondence concerning this article should be addressed to Howard Wainer, Educational Testing Service (T-15), Princeton, New Jersey 08541. Electronic mail may be sent to hwainer@ets.org.

Example 1: Physics

$H_0: c_i = c_j$ for all $i$ and $j$

Here $c_i$ is the speed of light in reference frame $i$. Note what would follow if, after credible effort while reference frames $i$ and $j$ are moving away from each other at great speeds, we were still unable to reject the null hypothesis: we would have gone a long way toward providing the basis for the theory of relativity. Einstein would have been pleased.
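In its simplest form, with a single pair of frames, Example 1 reduces to comparing two independent measurements. The Python sketch below is one hedged way to do that: a two-sided z test on the difference $c_i - c_j$. It assumes that each measurement comes with a known standard error and that measurement errors are approximately normal. The function name and the numbers in the usage lines are hypothetical illustrations, not data from the article.

```python
from math import sqrt
from statistics import NormalDist


def equal_speed_z_test(c_i, se_i, c_j, se_j, alpha=0.05):
    """Two-sided z test of H0: c_i = c_j (hypothetical helper).

    Assumes independent, approximately normal measurement errors with
    known standard errors se_i and se_j. Returns (z, p_value, reject).
    """
    # Standard error of the difference between two independent estimates.
    se_diff = sqrt(se_i ** 2 + se_j ** 2)
    z = (c_i - c_j) / se_diff
    # Probability of a |Z| at least this large under H0 (both tails).
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value, p_value < alpha


# Hypothetical measurements (km/s) in two frames receding from each other.
z, p, reject = equal_speed_z_test(299792.458, 0.004, 299792.455, 0.004)
print(f"z = {z:.2f}, p = {p:.3f}, reject H0: {reject}")
```

Failing to reject carries weight only under the article's condition of "credible effort." In this sketch, that means standard errors small enough that a physically meaningful difference would have produced a large $|z|$.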


Example 2: Cosmology (a One-Tailed Test)


$H_0: V_U \le 0$, $H_1: V_U > 0$

$V_U$ is the speed of expansion of the universe. Of course, solving the estimation problem $V_U = r$ for $r$ would be a bigger contribution still, but rejecting $H_0$ is still a pretty impressive piece of work.
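The one-tailed structure of Example 2 can be sketched as follows. The sketch assumes a set of independent estimates of $V_U$ and uses a large-sample normal approximation; for small samples, a t test would be the more careful choice. Because $H_0$ is composite ($V_U \le 0$), the p value is computed at the boundary $V_U = 0$, the point of the null least favorable to rejection. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_tailed_z_test(samples, alpha=0.05):
    """Upper-tailed test of H0: mu <= 0 versus H1: mu > 0 (hypothetical helper).

    Uses the large-sample normal approximation, with the sample standard
    deviation standing in for the unknown population value.
    Returns (z, p_value, reject).
    """
    n = len(samples)
    se = stdev(samples) / sqrt(n)  # estimated standard error of the mean
    z = mean(samples) / se  # statistic evaluated at the null boundary mu = 0
    p_value = NormalDist().cdf(-z)  # upper-tail probability P(Z >= z)
    return z, p_value, p_value < alpha


# Hypothetical, illustrative estimates of the expansion speed (arbitrary units).
z, p, reject = one_tailed_z_test([0.5, 0.1, 0.9, 0.3, 0.7, 0.2])
print(f"z = {z:.2f}, p = {p:.2g}, reject H0: {reject}")
```

Only the upper tail counts as evidence against $H_0$ here. A large negative mean would give a p value near 1, not a rejection.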

Example 3: Geophysics

$H_0: D_{T,NY}(t) = k$, $H_1: D_{T,NY}(t) \neq k$

$D_{T,NY}(t)$ is the distance between Tokyo and New York City at some time $t$. The null hypothesis is simply that this distance is constant over time; the alternative is that it changes. Rejecting $H_0$ provides powerful evidence of continental drift and thus supports the theory of plate tectonics. Lacking such evidence, Vine and Matthews, in their definitive 1963 article, needed to use more indirect magnetic evidence to support their claim that the Earth's surface moves on giant plates. I suspect their task would have been much easier if they could have simply said, "Reject $H_0$ at $p < .001$."

Example 4: Career Counseling

$H_0: E = T$, $H_1: E \neq T$

$E$ is one's employment status at the end of next year; $T$ is tenured. I suspect that a very large number of assistant professors would find "Reject $H_0$ at $p < .001$" an enormously informative result.

Example 5: Psychology

$H_0: \mu_I(t) = k$, $H_1: \mu_I(t) \neq k$

Here $\mu_I(t)$ is the mean human intelligence at time $t$. Rejecting the null hypothesis suggests that the intelligence of the human race is changing. Once again, it would be more valuable to estimate the direction and rate of change, but just being able to state that intelligence is changing would be an important contribution. Early in this century, eugenicists warned of the dangers of differential reproduction rates, and shadows of this warning have shown up even in more recent work (Herrnstein & Murray, 1994). Rejecting $H_0$ must lie at the start of any credible eugenic theory.

Example 6: Theology

$H_0: N_G = 0$, $H_1: N_G > 0$

$N_G$ is the "number of supreme beings." I believe that a valid study that could conclude $P(\text{data} \mid H_0) < .0001$ would be greeted with enormous approbation. It is of more than passing interest to note that the earliest study I know of to use NHT (Arbuthnot, 1710) was concerned with exactly this hypothesis. Arbuthnot rejected $H_0$.

In this note, I have attempted to illustrate circumstances in which science is advanced with only a binary result. Sometimes such an advance is primitive, and if we can do better, we ought to. Sometimes it is the best we can do, and in doing so we have made a real advance. Scientific investigations only rarely must end with a simple reject-not reject decision, although they often include such decisions as part of their beginnings. The real test of relativity used the assumption that the speed of light was a constant to derive other observational equations, but having evidence to support that assumption was an important start. Providing a test of fit for a complex model (i.e., likelihood ratio tests for nested models) may be the most powerful use of NHT. NHT, when used in its proper place, can provide us with valuable help. It is surely not the most powerful weapon we have, but neither is factor analysis, and no one is advocating a ban on that. So let me offer up one cheer for NHT, with the hope that it can continue to be used when it can be of help and that instructors will more clearly indicate those circumstances.

References

Arbuthnot, J. (1710). An argument for divine providence taken from the constant regularity in the births of both sexes. Philosophical Transactions of the Royal Society, London, 27.

Herrnstein, R. J., & Murray, C. (1994). The bell curve: Intelligence and class structure in American life. New York: Free Press.

Vine, F. J., & Matthews, D. H. (1963). Magnetic anomalies over oceanic ridges. Nature, 199, 947-949.

Received October 2, 1997
Revision received April 10, 1998
Accepted April 21, 1998
