Nineteenth-century English statesman Benjamin Disraeli reputedly once named three forms of dishonesty: "lies, damned lies, and statistics." It is certainly true that people can lie with the help of statistics. It happens all the time: Advertisers, politicians, and others with some claim to make either use numbers inappropriately or ignore certain critical ones. (When hearing that "four out of five doctors surveyed" recommended some product, have you ever wondered just how many doctors were surveyed and whether they were representative of all doctors?) People also use numbers to convey a false impression of certainty and objectivity when the true state of affairs is uncertainty or ignorance. But it is people, not statistics, that lie. When statistics are used correctly, they neither confuse nor mislead. On the contrary, they expose unwarranted conclusions, promote clarity and precision, and protect us from our own biases and blind spots.

If statistics are useful anywhere, it is in the study of human behavior. If human beings were all alike, and psychologists could specify all the influences on behavior, there would be no need for statistics. But any time we measure human behavior, we are going to wind up with different observations or scores for different individuals. Statistics can help us spot trends amid the diversity.

This appendix will introduce you to some basic statistical calculations used in psychology. Reading the appendix will not make you into a statistician, but it will acquaint you with some ways of organizing and assessing research data. If you suffer from a "number phobia," relax: You do not need to know much math to understand this material. However, you should have read Chapter 1, which discussed the rationale for using statistics and described various research methods. You may want to review the basic terms and concepts covered in that chapter.
Be sure that you can define hypothesis, sample, correlation, independent variable, dependent variable, random assignment, experimental group, control group, descriptive statistics, inferential statistics, and test of statistical significance. (Correlation coefficients, which are described in some detail in Chapter 2, will not be covered here.) To read the tables in this appendix, you will also need to know the following symbols:

N = the total number of observations or scores in a set
X = an observation or score
Σ = the Greek capital letter sigma, read as "the sum of"
√ = the square root of

(Note: Boldfaced terms in this appendix are defined in the glossary at the end of the book.)

Organizing Data

Before we can discuss statistics, we need some numbers. Imagine that you are a psychologist and that you are interested in that most pleasing of human qualities, a sense of humor. You suspect that a well-developed funny bone can protect people from the negative emotional effects of stress. You already know that in the months following a stressful event, people who score high on sense-of-humor tests tend to feel less tense and moody than more sober-sided individuals do. You realize, though, that this correlational evidence does not prove cause and effect. Perhaps people with a healthy sense of humor have other traits, such as flexibility or creativity, that act as the true stress buffers. To find out whether humor itself really softens the impact of stress, you do an experiment. First, you randomly assign subjects to two groups, an experimental group and a control group. To keep our calculations simple, let's assume there are only 15 people per group. Each person individually views a silent film that most North Americans find fairly stressful, one showing Australian aboriginal boys undergoing a puberty rite involving genital mutilation. Subjects in the experimental group are instructed to make up a humorous monologue while watching the film.
Those in the control group are told to make up a straightforward narrative. After the film, each person answers a mood questionnaire that measures current feelings of tension, depression, aggressiveness, and anxiety. A person's overall score on the questionnaire can range from 1 (no mood disturbance) to 7 (strong mood disturbance). This procedure provides you with 15 "mood disturbance" scores for each group. Have people who tried to be humorous reported less disturbance than those who did not?
Reading os the appendix will not make you into a statistician, but it will acquaint you with some ways of orga, nizing and assessing research data. If you suffer from a “number phobia,” relax: You do not nced to know much math to understand this material, However, you should have read Chapter 1, which discussed the rationale for using statistics and de- seribed various research methods. You may want to teview the basic terms and conccpis covered in that chapter. Be sure that you can define iiypothesis, sam le, correlation, independent variable, dependent vari- able, random assignment, experimental group, control sroup, descriptive statistics, inferential statistics andl test of statistical significance, (Correlation coetficients, which are described in some detail in Chapter 2, will not be covered here.) To read the tables in this appendix, you will also need to know the following symbols: the total number of observations or scores ina set X = an observation or score = the Greek capital lester sigma, read as “the sum of” V7 = the square root of (Note: Boldfaced terms in this appendix are defined In the glossary at the end of the book ) Organizing Data bers. Imagine that you are a psychologist and thet You are interested in that most pleasing of human Qualities, a sense of humor. You suspect that a well. developed funny bone can protect people from the ‘negative emotional effects of stress. You already know that in the months following a stressful event, People who score high on sense-of-humor tests tend to feel less tense and moody than moze sober. sided individuals do, You realize, though, that this correlational evidence does not prove cause and ef. fect, Pethaps people with a healthy sense of humor have other waits, such as flexibility or creativity, that act as the true stress buffers. 
TABLE A.1  Some Hypothetical Raw Data
These scores are for the hypothetical humor-and-stress study described in the text.
Experimental group: 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6
Control group: 6, 4, 7, 6, 6, 4, 6, 7, 7, 5, 5, 5, 7, 6, 6

Constructing a Frequency Distribution

Your first step might be to organize and condense the "raw data" (the obtained scores) by constructing a frequency distribution for each group. A frequency distribution shows how often each possible score actually occurred. To construct one, you first order all the possible scores from highest to lowest. (Our mood disturbance scores will be ordered from 7 to 1.) Then you tally how often each score was actually obtained. Table A.1 gives some hypothetical raw data for the two groups, and Table A.2 shows the two frequency distributions based on these data. From these distributions you can see that the two groups differed.
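The tallying step lends itself to a short computation. Here is a minimal Python sketch (the function name is my own, not part of the text) that builds both frequency distributions from the raw scores in Table A.1:

```python
from collections import Counter

experimental = [2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6]
control = [6, 4, 7, 6, 6, 4, 6, 7, 7, 5, 5, 5, 7, 6, 6]

def frequency_distribution(scores):
    """Tally how often each possible mood score (7 down to 1) occurred."""
    counts = Counter(scores)
    return {score: counts[score] for score in range(7, 0, -1)}

print(frequency_distribution(experimental))
# {7: 0, 6: 1, 5: 3, 4: 7, 3: 3, 2: 1, 1: 0}
print(frequency_distribution(control))
# {7: 4, 6: 6, 5: 3, 4: 2, 3: 0, 2: 0, 1: 0}
```

The two dictionaries printed here match the columns of Table A.2.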
In the experimental group, the extreme scores of 7 and 1 did not occur at all, and the most common score was the middle one, 4. In the control group, a score of 7 occurred four times, the most common score was 6, and no one obtained a score lower than 4.

TABLE A.2  Two Frequency Distributions
The scores are from Table A.1.

Mood disturbance score   Experimental group   Control group
7                        0                    4
6                        1                    6
5                        3                    3
4                        7                    2
3                        3                    0
2                        1                    0
1                        0                    0
                         N = 15               N = 15

Because our mood scores have only seven possible values, our frequency distributions are quite manageable. Suppose, though, that your questionnaire had yielded scores that could range from 1 to 50. A frequency distribution with 50 entries would be cumbersome and might not reveal trends in the data clearly. A solution would be to construct a grouped frequency distribution by grouping adjacent scores into equal-sized classes or intervals. Each interval could cover, say, five scores (1–5, 6–10, 11–15, and so forth). Then you could tally the frequencies within each interval. This procedure would reduce the number of entries in each distribution from 50 to only 10, making the overall results much easier to grasp. However, information would be lost. For example, there would be no way of knowing how many people had a score of 43 versus 44.

Graphing the Data

As everyone knows, a picture is worth a thousand words. The most common statistical picture is a graph, a drawing that depicts numerical relationships. Graphs appear at several points in this book, and are routinely used by psychologists to convey their findings to others. From graphs, we can get a general impression of what the data are like, note the relative frequencies of different scores, and see which score was most frequent. In a graph constructed from a frequency distribution, the possible score values are shown along a horizontal line (the x-axis of the graph) and frequencies along a vertical line (the y-axis), or vice versa.
To construct a histogram, or bar graph, from our mood scores, we draw rectangles (bars) above each score, indicating the number of times it occurred by the rectangle's height (Figure A.1).

FIGURE A.1  A Histogram
This graph depicts the distribution of mood disturbance scores shown on the left side of Table A.2.

A slightly different kind of "picture" is provided by a frequency polygon, or line graph. In a frequency polygon, the frequency of each score is indicated by a dot placed directly over the score on the horizontal axis, at the appropriate height on the vertical axis.
The dots for the various scores are then joined together by straight lines, as in Figure A.2. When necessary, an "extra" score, with a frequency of zero, can be added at each end of the horizontal axis, so that the polygon will rest on this axis instead of floating above it.

FIGURE A.2  A Frequency Polygon
This graph depicts the same data as Figure A.1.

A word of caution about graphs: They may either exaggerate or mask differences in the data, depending on which units are used on the vertical axis. The two graphs in Figure A.3, although they look quite different, actually depict the same data. Always read the units on the axes of a graph; otherwise, the shape of a histogram or frequency polygon may be misleading.

FIGURE A.3  Same Data, Different Impressions
These two graphs depict the same data, but have different units on the vertical axis.

Describing Data

Having organized your data, you are now ready to summarize and describe them. As you will recall from Chapter 1, procedures for doing so are known as descriptive statistics. In the following discussion, the word score will stand for any numerical observation.

Measuring Central Tendency

Your first step in describing your data might be to compute a measure of central tendency for each group. Measures of central tendency characterize an entire set of data in terms of a single representative number.

The Mean. The most popular measure of central tendency is the arithmetic mean, usually called simply the mean. It is often expressed by the symbol M. Most people are thinking of the mean when they say "average." We run across means all the time: in grade point averages, temperature averages, and batting averages.
The mean is valuable to the psychologist because it takes all the data into account and it can be used in further statistical analyses. To compute the mean, you simply add up a set of scores and divide the total by the number of scores in the set. Recall that in mathematical notation, Σ means "the sum of," X stands for the individual scores, and N represents the total number of scores in a set. Thus the formula for calculating the mean is:

M = ΣX / N

Table A.3 shows how to compute the mean for our experimental group. Test your ability to perform this calculation by computing the mean for the control group yourself. (You can find the answer, along with other control group statistics, on page 490.) Later, we will describe how a psychologist would compare the two means statistically to see if there is a significant difference between them.

The Median. Despite its usefulness, sometimes the mean can be misleading, as we noted in Chapter 1. Suppose you piled some children on a seesaw in such a way that it was perfectly balanced, and then a 200-pound adult came and sat on one end. The center of gravity would quickly shift toward the adult. In the same way, one extremely high score can dramatically raise the mean (and one extremely low score can dramatically lower it). In real life, this can be a serious problem. For example, in the calculation of a town's mean income, one millionaire would offset hundreds of poor people. The mean income would be a misleading indication of the town's actual wealth.

When extreme scores occur, a more representative measure of central tendency is the median, or midpoint in a set of scores or observations ordered from highest to lowest. In any set of scores, the same number of scores falls above the median as below it. The median is not affected by extreme scores. If you were calculating the median income of that same town, the one millionaire would offset only one poor person.
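The formula M = ΣX / N, and the seesaw point about extreme scores, can both be checked with a few lines of Python (the town incomes below are invented purely for illustration):

```python
def mean(scores):
    """M = (sum of the scores) / N."""
    return sum(scores) / len(scores)

experimental = [2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6]
print(mean(experimental))  # 4.0

# One extreme value drags the mean upward: a "town" of 99 modest
# earners plus a single millionaire.
incomes = [20_000] * 99 + [1_000_000]
print(mean(incomes))  # 29800.0 -- well above what the typical resident earns
```

The median of those incomes, by contrast, is still 20,000, which is why it is the more representative measure here.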
When the number of scores in the set is odd, calculating the median is a simple matter of counting in from the ends to the middle. However, if the number of scores is even, there will be two middle scores. The simplest solution is to find the mean of those two scores and use that number as the median. (When the data are from a grouped frequency distribution, a more complicated procedure is required, one beyond the scope of this appendix.) In our experimental group, the median score is 4 (see Table A.3 again). What is it for the control group?

The Mode. A third measure of central tendency is the mode, the score that occurs most often. In our experimental group, the modal score is 4. In our control group, it is 6. In some distributions, all scores occur with equal frequency, and there is no mode. In others, two or more scores "tie" for the distinction of being most frequent. Modes are used less often than other measures of central tendency. They do not tell us anything about the other scores in the distribution; they often are not very "central"; and they tend to fluctuate from one random sample of a population to another more than either the median or the mean.

Measuring Variability

A measure of central tendency may or may not be highly representative of other scores in a distribution. To understand our results, we also need a measure of variability that will tell us whether our scores are clustered closely around the mean or widely scattered.

TABLE A.3  Calculating a Mean and a Median
The scores are from the left side of Table A.2.

Mean (M):
M = ΣX / N = (2 + 3 + 3 + 3 + 4 + 4 + 4 + 4 + 4 + 4 + 4 + 5 + 5 + 5 + 6) / 15 = 60 / 15 = 4

Median:
Scores in order: 2, 3, 3, 3, 4, 4, 4, [4], 4, 4, 4, 5, 5, 5, 6
Median = 4

The Range. The simplest measure of variability is the range, which is found by subtracting the lowest score from the highest one. For our hypothetical set of mood disturbance scores, the range in the experimental group is 4 (6 − 2) and in the control group it is 3 (7 − 4).
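Python's standard statistics module implements these measures directly (statistics.median averages the two middle scores when N is even, just as described above). A quick check of the values for both groups, using the raw data of Table A.1:

```python
import statistics

experimental = [2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6]
control = [6, 4, 7, 6, 6, 4, 6, 7, 7, 5, 5, 5, 7, 6, 6]

for name, scores in [("experimental", experimental), ("control", control)]:
    median = statistics.median(scores)  # middle score (N = 15 is odd here)
    mode = statistics.mode(scores)      # most frequent score
    spread = max(scores) - min(scores)  # range: highest minus lowest
    print(f"{name}: median={median}, mode={mode}, range={spread}")
# experimental: median=4, mode=4, range=4
# control: median=6, mode=6, range=3
```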
Unfortunately, though, simplicity is not always a virtue. The range gives us some information about variability but ignores all scores other than the highest and lowest ones.

The Standard Deviation. A more sophisticated measure of variability is the standard deviation (SD). This statistic takes every score in the distribution into account. Loosely speaking, it gives us an idea of how much, on the average, scores in a distribution differ from the mean. If the scores were all the same, the standard deviation would be zero. The higher the standard deviation, the more variability there is among scores.

To compute the standard deviation, we must find out how much each individual score deviates from the mean. To do so we simply subtract the mean from each score. This gives us a set of deviation scores. Deviation scores for numbers above the mean will be positive, those for numbers below the mean will be negative, and the positive scores will exactly balance the negative ones. In other words, the sum of the deviation scores will be zero. That is a problem, since the next step in our calculation is to add. The solution is to square all the deviation scores (that is, to multiply each score by itself). This step gets rid of negative values. Then we can compute the average of the squared deviation scores by adding them up and dividing the sum by the number of scores (N). Finally, we take the square root of the result, which takes us from squared units of measurement back to the same units that were used originally (in this case, mood disturbance levels). The calculations just described are expressed by the following formula:

SD = √( Σ(X − M)² / N )

Table A.4 shows the calculations for computing the standard deviation for our experimental group.
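The verbal recipe above (deviations, squares, average, square root) translates line for line into Python. This sketch divides by N, as the formula does; dividing by N − 1 instead gives the estimate of the population standard deviation mentioned in the note to Table A.4:

```python
from math import sqrt
import statistics

def standard_deviation(scores):
    """Population SD: square root of the mean squared deviation from M."""
    m = sum(scores) / len(scores)
    deviations = [x - m for x in scores]     # these always sum to zero
    squared = [d * d for d in deviations]    # squaring removes the signs
    return sqrt(sum(squared) / len(scores))  # back to the original units

experimental = [2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6]
print(round(standard_deviation(experimental), 2))  # 0.97

# statistics.pstdev also divides by N; statistics.stdev divides by N - 1
# to estimate the SD of the population the sample was drawn from.
print(round(statistics.pstdev(experimental), 2))   # 0.97
print(round(statistics.stdev(experimental), 2))    # 1.0
```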
Try your hand at computing the standard deviation for the control group.

TABLE A.4  Calculating a Standard Deviation
The scores are from the left side of Table A.2.

Scores (X)   Deviation scores (X − M)   Squared deviation scores (X − M)²
6             2                         4
5             1                         1
5             1                         1
5             1                         1
4             0                         0
4             0                         0
4             0                         0
4             0                         0
4             0                         0
4             0                         0
4             0                         0
3            −1                         1
3            −1                         1
3            −1                         1
2            −2                         4
                                        Σ(X − M)² = 14

SD = √( Σ(X − M)² / N ) = √(14/15) = √.93 = .97

Note: When the scores are used to estimate the standard deviation of the population from which the sample was drawn, division is by N − 1 instead of N, for reasons that need not concern us here.

Remember, a large standard deviation signifies that scores are widely scattered, and that therefore the mean is not terribly typical of the entire population. A small standard deviation tells us that most scores are clustered near the mean, and that therefore the mean is representative. Suppose two classes took a psychology exam, and both classes had the same mean score, 75 out of a possible 100. From the means alone, you might conclude that the classes were similar in performance. But if Class A had a standard deviation of 3 and Class B had a standard deviation of 9, you would know that there was much more variability in performance in Class B. This information could be useful to an instructor in planning lectures and making assignments.

Transforming Scores

Sometimes researchers do not wish to work directly with raw scores. They may prefer numbers that are more manageable, such as when the raw scores are tiny fractions. Or they may want to work with scores that reveal where a person stands relative to others. In such cases, raw scores can be transformed to other kinds of scores.

Percentile Scores. One common transformation converts each raw score to a percentile score (also called a centile rank). A percentile score gives the percentage of people who scored at or below a given raw score. Suppose you learn that you have scored 37 on a psychology exam.
In the absence of any other information, you may not know whether to celebrate or cry. But if you are told that 37 is equivalent to a percentile score of 90, you know that you can be pretty proud of yourself; you have scored as well as, or higher than, 90 percent of those who have taken the test. On the other hand, if you are told that 37 is equivalent to a percentile score of 50, you have scored only at the median, only as well as, or higher than, half of the other students. The highest possible percentile rank is 99, or more precisely, 99.99, because you can never do better than 100 percent of a group when you are a member of the group. (Can you say what the lowest possible percentile score is? The answer is on page 490.) Standardized tests such as those described in previous chapters often come with tables that allow for the easy conversion of any raw score to the appropriate percentile score, based on data from a large number of people who have already taken the test.

Percentile scores are easy to understand and easy to calculate. However, they also have a drawback: They merely rank people and do not tell us how far apart people are in terms of raw scores. Suppose you scored in the 50th percentile on an exam, June scored in the 45th, Tricia scored in the 20th, and Sean scored in the 15th. The difference between you and June may seem identical to that between Tricia and Sean (five percentiles). But in terms of raw scores, you and June are probably more alike than Tricia and Sean, because exam scores usually cluster closely together around the midpoint of the distribution and are farther apart at the extremes.
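The centile rank just described is easy to compute from a set of raw scores. In this sketch the exam scores are invented so that a raw score of 37 lands at the 90th percentile, as in the example above:

```python
def percentile_score(raw, all_scores):
    """Percentage of test takers who scored at or below the given raw score."""
    at_or_below = sum(1 for s in all_scores if s <= raw)
    return 100 * at_or_below / len(all_scores)

# Ten hypothetical exam scores; 9 of the 10 are at or below 37.
exam = [12, 18, 21, 24, 25, 27, 29, 31, 37, 42]
print(percentile_score(37, exam))  # 90.0
print(percentile_score(24, exam))  # 40.0
```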
Because percentile scores do not preserve the spatial relationships in the original distribution of scores, they are inappropriate for computing many kinds of statistics. For example, they cannot be used to calculate means.

Z-scores. Another common transformation of raw scores is to z-scores, or standard scores. A z-score tells you how far a given raw score is above or below the mean, using the standard deviation as the unit of measurement.
To compute a z-score, you subtract the mean of the distribution from the raw score and divide by the standard deviation:

z = (X − M) / SD

Unlike percentile scores, z-scores preserve the relative spacing of the original raw scores. The mean itself always corresponds to a z-score of zero, since it cannot deviate from itself. All scores above the mean have positive z-scores and all scores below the mean have negative ones. When the raw scores form a certain pattern called a normal distribution (to be described shortly), a z-score tells you how high or low the corresponding raw score was, relative to the other scores. If your exam score of 37 is equivalent to a z-score of +1.0, you have scored 1 standard deviation above the mean. Assuming a roughly normal distribution, that's pretty good, because in a normal distribution only about 16 percent of all scores fall at or above 1 standard deviation above the mean. But if your 37 is equivalent to a z-score of −1.0, you have scored 1 standard deviation below the mean, a poor score.

Z-scores are sometimes used to compare people's performance on different tests or measures. Say that Elsa earns a score of 64 on her first psychology test and Manuel, who is taking psychology from a different instructor, earns a 62 on his first test. In Elsa's class, the mean score is 50 and the standard deviation is 7, so Elsa's z-score is (64 − 50)/7 = 2.0. In Manuel's class, the mean is also 50, but the standard deviation is 6. Therefore, his z-score is also 2.0 [(62 − 50)/6]. Compared to their respective classmates, Elsa and Manuel did equally well.
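The z-score formula and the Elsa/Manuel comparison take only a few lines:

```python
def z_score(raw, mean, sd):
    """How many SDs the raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd

print(z_score(64, 50, 7))  # Elsa: (64 - 50)/7 = 2.0
print(z_score(62, 50, 6))  # Manuel: (62 - 50)/6 = 2.0
print(z_score(37, 44, 7))  # a 37 that sits 1 SD below a hypothetical mean of 44: -1.0
```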
But be careful: This does not imply that they are equally able students. Perhaps Elsa's instructor has a reputation for giving easy tests and Manuel's for giving hard ones, so Manuel's instructor has attracted a more industrious group of students. In that case, Manuel faces stiffer competition than Elsa does, and even though he and Elsa have the same z-score, Manuel's performance may be more impressive.

You can see that comparing z-scores from different people or different tests must be done with caution. Standardized tests, such as IQ tests and various personality tests, use z-scores derived from a large sample of people assumed to be representative of the general population taking the tests. When two tests are standardized for similar populations, it is safe to compare z-scores on them. But z-scores derived from special samples, such as students in different psychology classes, may not be comparable.

Curves

In addition to knowing how spread out our scores are, we need to know the pattern of their distribution. At this point we come to a rather curious phenomenon. When researchers make a very large number of observations, many of the physical and psychological variables they study have a distribution that approximates a pattern called a normal distribution. (We say "approximates" because a perfect normal distribution is a theoretical construct and is not actually found in nature.) Plotted in a frequency polygon, a normal distribution has a symmetrical, bell-shaped form known as a normal curve (see Figure A.4).

A normal curve has several interesting and convenient properties. The right side is the exact mirror image of the left.
The mean, median, and mode all have the same value and are at the exact center of the curve, at the top of the "bell." Most observations or scores cluster around the center of the curve, with far fewer out at the ends, or "tails," of the curve.

FIGURE A.4  A Normal Curve
When standard deviations (or z-scores) are used along the horizontal axis of a normal curve, certain fixed percentages of scores fall between the mean and any given point. As you can see, most scores fall in the middle range (between +1 and −1 standard deviations from the mean).

Most important, as Figure A.4 shows, when standard deviations (or z-scores) are used on the horizontal axis of the curve, the percentage of scores falling between the mean and any given point on the horizontal axis is always the same. For example, 68.26 percent of the scores will fall between plus and minus 1 standard deviation from the mean; 95.44 percent of the scores will fall between plus and minus 2 standard deviations from the mean; and 99.74 percent of the scores will fall between plus and minus 3 standard deviations from the mean. These percentages hold for any normal curve, no matter what the size of the standard deviation. Tables are available showing the percentages of scores in a normal distribution that lie between the mean and various points (as expressed by z-scores).

The normal curve makes life easier for psychologists when they want to compare individuals on some trait or performance. For example, since IQ scores from a population form a roughly normal curve, the mean and standard deviation of a test are all the information you need in order to know how many people score above or below a particular score.
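These fixed percentages can be reproduced from the normal cumulative distribution function, which Python exposes as statistics.NormalDist. Computed to two decimals they come out as 68.27, 95.45, and 99.73; the traditional table values quoted above round the same quantities slightly differently:

```python
from statistics import NormalDist

unit = NormalDist()  # mean 0, standard deviation 1

def percent_within(k, dist=unit):
    """Percentage of scores within k standard deviations of the mean."""
    return 100 * (dist.cdf(dist.mean + k * dist.stdev)
                  - dist.cdf(dist.mean - k * dist.stdev))

for k in (1, 2, 3):
    print(k, round(percent_within(k), 2))  # 1: 68.27, 2: 95.45, 3: 99.73

# The same holds for any normal curve, e.g. IQ scores with mean 100
# and SD 15: the proportion between 85 and 115 (plus/minus 1 SD).
iq = NormalDist(mu=100, sigma=15)
print(round(100 * (iq.cdf(115) - iq.cdf(85)), 2))  # 68.27
```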
On a test with a mean of 100 and a standard deviation of 15, about 68.26 percent of the population scores between 85 and 115, that is, 1 standard deviation below and 1 standard deviation above the mean (see Chapter 7).

Not all types of observations, however, are distributed normally. Some curves are lopsided, or skewed, with scores clustering at one end or the other of the horizontal axis (see Figure A.5).

[FIGURE A.5, Skewed Curves: Curve (a) is skewed negatively, to the left. Curve (b) is skewed positively, to the right. The direction of a curve's skewness is determined by the position of the long tail, not by the position of the bulge. In a skewed curve, the mean, median, and mode fall at different points.]

When the "tail" of the curve is longer on the right than on the left, the curve is said to be positively, or right, skewed. When the opposite is true, the curve is said to be negatively, or left, skewed. In experiments, reaction times typically form a right-skewed distribution. For example, if people must press a button whenever they hear some signal, most will react quite quickly, but a few will take an unusually long time, causing the right "tail" of the curve to be stretched out.

Knowing the shape of a distribution can be extremely valuable. Paleontologist Stephen Jay Gould (1985) once told how such information helped him cope with the news that he had a rare and serious form of cancer. Being a researcher, he immediately headed for the library to learn all he could about his disease. The first thing he found was that it was incurable, with a median mortality of only eight months after discovery. Most people might have assumed that a "median mortality of eight months" means "I will probably be dead in eight months." But Gould realized that although half of all patients died within eight months, the other half survived longer than that.
Since his disease had been diagnosed in its early stages, he was getting top-notch medical treatment, and he had a strong will to live, Gould figured he could reasonably expect to be in the half of the distribution that survived beyond eight months. Even more cheering, the distribution of deaths from the disease was right-skewed: The cases to the left of the median of eight months could only extend to zero months, but those to the right could stretch out for years. Gould saw no reason why he should not expect to be in the tip of that right-hand tail.

For Stephen Jay Gould, statistics, properly interpreted, were "profoundly nurturant and life-giving." They offered him hope and inspired him to fight his disease. The initial diagnosis was made in July of 1982. Gould remained professionally active for 20 more years. When he died in 2002, it was from an unrelated type of cancer.

Answers to Questions in this appendix:
Control group statistics: Median = 6; Standard Deviation = √.96 ≈ .98
Lowest possible percentile score: 1 (or, more precisely, .01)
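Gould's reasoning rests on a general property of skewed distributions: the long tail pulls the mean away from the median. A small sketch with invented survival times (hypothetical numbers, not Gould's actual data) makes the point.

```python
import statistics

# Hypothetical survival times in months, shaped like the right-skewed
# distribution Gould describes: the median is 8 months, but the right
# tail stretches out for years while the left tail can only reach zero.
months = [1, 2, 4, 6, 8, 12, 24, 60, 120]

print(statistics.median(months))  # 8 — half of the cases fall at or below this
print(statistics.mean(months))    # the long right tail pulls the mean well above the median
```

This is why a median, unlike a mean, tells you nothing about how far out the right-hand tail extends.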
Drawing Inferences

Once data are organized and summarized, the next step is to ask whether they differ from what might have been expected purely by chance (see Chapter 1). A researcher needs to know whether it is safe to infer that the results from a particular sample of people are valid for the entire population from which the sample was drawn. Inferential statistics provide this information.
They are used in both experimental and correlational studies.

The Null Versus the Alternative Hypothesis

In an experiment, the scientist must assess the possibility that his or her experimental manipulations have no effect on the subjects' behavior. The statement expressing this possibility is called the null hypothesis. In our stress-and-humor study, the null hypothesis states that making up a funny commentary will not relieve stress any more than making up a straightforward narrative will. In other words, it predicts that the difference between the means of the two groups will not deviate significantly from zero. Any obtained difference will be due solely to chance fluctuations. In contrast, the alternative hypothesis (also called the experimental or research hypothesis) states that on the average the experimental group will have lower mood disturbance scores than the control group.

The null hypothesis and the alternative hypothesis cannot both be true. Our goal is to reject the null hypothesis. If our results turn out to be consistent with the null hypothesis, we will not be able to do so. If the data are inconsistent with the null hypothesis, we will be able to reject it with some degree of confidence. Unless we study the entire population, though, we will never be able to say that the alternative hypothesis has been proven. No matter how impressive our results are, there will always be some degree of uncertainty about the inferences we draw from them. Since we cannot prove the alternative hypothesis, we must be satisfied with showing that the null hypothesis is unreasonable.

Students are often surprised to learn that in traditional hypothesis testing it is the null hypothesis, not the alternative hypothesis, that is tested. After all, it is the alternative hypothesis that is actually of interest. But this procedure does make sense. The null hypothesis can be stated precisely and tested directly.
In the case of our fictitious study, the null hypothesis predicts that the difference between the two means will be zero. The alternative hypothesis does not permit a precise prediction because we don't know how much the two means might differ (if, in fact, they do differ). Therefore, it cannot be tested directly.

Testing Hypotheses

Many computations are available for testing the null hypothesis. The choice depends on the design of the study, the size of the sample, and other factors. We will not cover any specific tests here. Our purpose is simply to introduce you to the kind of reasoning that underlies hypothesis testing. With that in mind, let us return once again to our data. For each of our two groups we have calculated a mean and a standard deviation. Now we want to compare the two sets of data to see if they differ enough for us to reject the null hypothesis. We wish to be reasonably certain that our observed differences did not occur entirely by chance.

What does it mean to be "reasonably certain"? How different from zero must our result be to be taken seriously? Imagine, for a moment, that we had infinite resources and could somehow repeat our experiment, each time using a new pair of groups, until we had "run" the entire population through the study. It can be shown mathematically that if only chance were operating, our various experimental results would form a normal distribution. This theoretical distribution is called "the sampling distribution of the difference between means," but since that is quite a mouthful, we will simply call it the sampling distribution for short. If the null hypothesis were true, the mean of the sampling distribution would be zero. That is, on the average, we would find no difference between the two groups. Often, though, because of chance influences or random error, we would get a result that deviated to one degree or another from zero.
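This thought experiment is easy to mimic by simulation. The sketch below (with arbitrary, hypothetical population values) draws many pairs of 15-person groups from one and the same population, so that only chance is operating, and records the difference between group means each time.

```python
import random
import statistics

random.seed(1)

# Both "groups" come from the same hypothetical population (mean 10, SD 4),
# so the null hypothesis is true by construction: any difference between
# group means is due to chance alone.
def chance_difference(n=15):
    group1 = [random.gauss(10, 4) for _ in range(n)]
    group2 = [random.gauss(10, 4) for _ in range(n)]
    return statistics.mean(group1) - statistics.mean(group2)

differences = [chance_difference() for _ in range(10000)]

print(round(statistics.mean(differences), 2))      # near zero, as the text predicts
print(round(max(abs(d) for d in differences), 1))  # but chance occasionally strays far from zero
```

Plotted as a frequency polygon, these 10,000 differences would pile up around zero in a roughly normal shape: the sampling distribution described above.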
On rare occasions, the result would deviate a great deal from zero.

We cannot test the entire population, though. All we have are data from a single sample. We would like to know whether the difference between means that we actually obtained would be close to the mean of the theoretical sampling distribution (if we could test the entire population) or far away from it, out in one of the "tails" of the curve. Was our result highly likely to occur on the basis of chance alone, or highly unlikely?

Before we can answer that question, we must have some precise way to measure distance from the mean of the sampling distribution. We must know exactly how far from the mean our obtained result must be to be considered "far away." If only we knew the standard deviation of the sampling distribution, we could use it as our unit of measurement. We don't know it, but fortunately, we can use the standard deviation of our sample to estimate it. (We will not go into the reasons that this is so.)

Now we are in business. We can look at the mean difference between our two groups and figure out how far it is (in terms of standard deviations) from the mean of the sampling distribution. As mentioned earlier, one of the convenient things about a normal distribution is that a certain fixed percentage of all observations falls between the mean of the distribution and any given point above or below the mean. These percentages are available from tables. Therefore, if we know the "distance" of our obtained result from the mean of the theoretical sampling distribution, we automatically know how likely our result is to have occurred strictly by chance.

To give a specific example, if it turns out that our obtained result is 2 standard deviations above the mean of the theoretical sampling distribution, we know that the probability of its having occurred by chance is less than 2.3 percent.
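Tail probabilities like this come from the normal curve itself and need not be looked up in a table. A sketch using the standard library's error function:

```python
from math import erf, sqrt

def upper_tail(k):
    """Probability that a result falls more than k standard deviations
    above the mean of a normal distribution."""
    return 0.5 * (1 - erf(k / sqrt(2)))

print(f"{upper_tail(2):.2%}")  # about 2.28 percent, i.e., less than 2.3 percent
print(f"{upper_tail(3):.2%}")  # about 0.13 percent
```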
If our result is 3 standard deviations above the mean of the sampling distribution, the probability of its having occurred by chance is about .13 percent, roughly 1 in 700. In either case, we might well suspect that our result did not occur entirely by chance after all. We would call the result statistically significant. (Psychologists usually consider any highly unlikely result to be of interest, no matter which direction it takes. In other words, the result may be in either "tail" of the sampling distribution.)

To summarize: Statistical significance means that if only chance were operating, our result would be highly improbable, so we are fairly safe in concluding that more than chance was operating, namely, the influence of our independent variable. We can reject the null hypothesis, and open the champagne. As we noted in Chapter 1, psychologists usually accept a finding as statistically significant if the likelihood of its occurring by chance is 5 percent or less (see Figure A.6). This cutoff point gives the researcher a reasonable chance of confirming reliable results as well as reasonable protection against accepting unreliable ones.

[FIGURE A.6, Statistical Significance: This curve represents the theoretical sampling distribution discussed in the text, what we would expect by chance if we did our hypothetical stress-and-humor study many times, testing the entire population. The middle 95 percent of results do not allow us to reject the null hypothesis; the 2.5 percent in each tail do. If we used the conventional significance level of .05, we would regard our obtained result as significant only if the probability of getting a result that far from zero by chance (in either direction) totaled 5 percent or less. As shown, the result must fall far out in one of the tails of the sampling distribution. Otherwise, we cannot reject the null hypothesis.]

Some cautions are in order, though. As noted in Chapter 2, conventional tests of statistical significance have drawn serious criticisms in recent years. Statistically significant results are not always psychologically interesting or important. Further, statistical significance is related to the size of the sample. A large sample increases the likelihood of reliable results. But there is a trade-off: The larger the sample, the more probable it is that a small result having no practical importance will reach statistical significance. On the other hand, with the sample sizes typically used in psychological research, there is a good chance of falsely concluding that an experimental effect has not occurred when one actually has (Hunter, 1997). For these reasons, it is always useful to know how much of the total variability in scores was accounted for by the independent variable (the effect size). (The computations are not discussed here.) If only 3 percent of the variance was accounted for, then 97 percent was due either to chance factors or to systematic influences of which the researcher was unaware. Because human behavior is affected by so many factors, the amount of variability accounted for by a single psychological variable is often modest. But sometimes the effect size is considerable even when the results don't quite reach significance.

Oh, by the way, to find out what experimental psychologists have learned about humor, stress,
and health, see p. 442 in Chapter 13. The study of humor turns out to be pretty complicated. The results depend on how you define "sense of humor," how you do the study, and what aspects of humor you are investigating. Researchers are learning that humor probably doesn't help people live longer, but it is more emotionally beneficial than moping around. So, when gravity gets you down, try a little levity.

SUMMARY

• When used correctly, statistics expose unwarranted conclusions, promote precision, and help researchers spot trends amid diversity.

• Often, the first step in data analysis is to organize and condense data in a frequency distribution, a tally showing how often each possible score (or interval of scores) occurred. Such information can also be depicted in a histogram (bar graph) or a frequency polygon (line graph).

• Descriptive statistics summarize and describe the data. Central tendency is measured by the mean, median, or, less frequently, the mode. Since a measure of central tendency may or may not be highly representative of other scores in a distribution, it is also important to analyze variability. A large standard deviation means that scores are widely scattered about the mean; a small one means that most scores are clustered near the mean.

• Raw scores can be transformed into other kinds of scores. Percentile scores indicate the percentage of people who scored at or below a given raw score. Z-scores (standard scores) indicate how far a given raw score is above or below the mean of the distribution.

• Many variables have a distribution approximating a normal distribution, depicted as a normal curve. The normal curve has a convenient property: When standard deviations are used as the units on the horizontal axis, the percentage of scores falling between any two points on the horizontal axis is always the same. Not all types of observations are distributed normally, however. Some distributions are skewed to the left or right.

• Inferential statistics can be used to test the null hypothesis and to tell researchers whether a result differed significantly from what might have been expected purely by chance. Basically, hypothesis testing involves estimating where the obtained result would have fallen in a theoretical sampling distribution based on studies of the entire population in question. If the result would have been far out in one of the "tails" of the distribution, it is considered statistically significant. A statistically significant result may or may not be psychologically interesting or important, so many researchers also compute the effect size.

KEY TERMS

frequency distribution, histogram, frequency polygon, descriptive statistics, measure of central tendency, mean, median, mode, measure of variability, standard deviation, percentile (percentile rank), z-score (standard score), normal distribution, normal curve, right- and left-skewed distributions, inferential statistics, null hypothesis, alternative hypothesis, sampling distribution, statistically significant, effect size
