5

The Graphical Description of Data: SPSS v. 10-11


(reproduced from Chapter 5 of Statistics for Social and Health Research) Chapter 3 looked at ways of summarizing data by producing frequency tables. However, often the most striking way of summarizing information is with a graph. A graph (sometimes called a chart) provides a quick visual sense of the main features of the data. The particular graphs that can be constructed in any given context are determined largely by the level of measurement, as indicated in Table 5.1.
Table 5.1 Graphs by level of measurement
Level of measurement    Graph
Nominal and ordinal     Pie graph
                        Bar graph
Interval/ratio          Pie graph
                        Bar graph (discrete variables)
                        Histogram (continuous variables)
                        Polygon

Some general principles

In this chapter, to illustrate the procedures and principles involved in constructing graphs, we will use the hypothetical data for 20 people that we introduced in Chapter 2 and which are contained in the SPSS file Ch05.sav on the CD that comes with this book (Tables 5.2–5.4).
Table 5.2 Sex of respondents
Sex       Frequency
Male      12
Female    8
Total     20

Table 5.3 Health rating of respondents
Health rating    Frequency
Unhealthy        7
Healthy          5
Very healthy     8
Total            20

Table 5.4 Age of respondents
Age in years    Frequency
18              7
19              5
20              4
21              2
22              2
Total           20

The same general rules that apply to the construction of these tables also apply to the construction of graphs. Most importantly, a graph should be a self-contained bundle of information. A reader should not have to search through the text in order to understand a graph (if one does have to search the text this may be a sign that the graph is concealing information rather than illuminating it). In order for a graph to be a self-contained description of the data we need to: give the graph an appropriate title; clearly identify the categories or values of the variable; indicate, for interval/ratio data, the units of measurement; indicate the total number of cases; and indicate the source of the data.

We will now look at the construction of different types of graphs. Unlike previous chapters where we separated the SPSS procedures from the hand calculations, here we will only use SPSS to generate the relevant graphs. The superiority of computer graphics has made the hand-drawing of graphs redundant; unlike calculating the mean, where some knowledge of the procedure for doing things by hand can be helpful, we almost invariably go directly to a computer to generate a graph.

Pie graphs

A pie graph can be constructed for all levels of measurement. A pie graph presents the distribution of cases in the form of a circle. The relative size of each slice of the circle is equal to the proportion of cases within the category represented by the slice. To generate a pie graph in SPSS we select Graphs/Pie from the menu (Figure 5.1). The Pie Charts dialog box will appear, which provides some options for how the pie graph is to be defined. We are using the default setting, which is to group cases according to their values for the same variable. This is the appropriate choice for the data we are using so we click on the Define button. This brings up the dialog box shown in Figure 5.2.

Statistics for Research

Figure 5.1 The Pie Graph command

Figure 5.2 The Define Pie dialog box

In this dialog box we select the variable we want to analyze and what the slices of the pie will represent. The default setting is for the slices to represent the number of cases in each category, which is what we want here. The output shown in Figure 5.3 will be generated when we click on the OK button. The pie chart describes the data in the same way as the frequency table above, but in a more visually striking form. However, the basic SPSS procedure produces a bare minimum amount of information. We do not get a title, nor the number or proportion of cases in each slice. Some of this information could have been generated if we had clicked on the Titles button in the Define Pie dialog box. A more complete set of options for improving the presentation of graphs in SPSS is available if we simply double-click on the graph in the Viewer window. This brings up the graph in a new SPSS window called the SPSS Chart Editor (Figure 5.4).

Figure 5.3 SPSS Pie Graph output

Figure 5.4 The SPSS Chart Editor

From the menu bar at the top (or the tool bar below it) we have a range of options for improving the quality of our graph. Here we simply want to attach a title to the pie graph, and to provide appropriate labels to the slices. I leave it to you to explore the full range of options available under the Chart Editor. To attach a title and labels to the graph we use the Chart/Title, Chart/Options, and Chart/Footnotes commands.

Figure 5.5 The Chart command

The Title command allows us to type in various levels of titles. Here our needs are simple so we type in Sex of respondents as Title 1. The Options command allows us to change the information that is attached to each slice of the pie. We can see in Figure 5.3 that the default setting is for the text labels (in this example male and female) represented by each slice to be included. If we click on the boxes next to Values and Percents we will also generate the absolute and relative frequencies for each slice displayed, along with the text label. You may also wish to click on the Format button and set the number of decimal places to 0. Using the Footnote command we can type Source: Hypothetical survey for Footnote 1 and n=20 for Footnote 2 to indicate respectively the source of data and the total number of cases represented by the graph. We can also right-justify these footnotes under Footnote Justification. If we follow these commands Figure 5.6 will appear.

Figure 5.6 SPSS pie chart

We can see that the basic chart is still the same, but now we have all the relevant information as well. Female respondents represent 8 out of 20 students, so the slice representing the frequency of female respondents is 8/20ths of the circle (40%). Male respondents represent 12 out of 20 cases (60%). Now that we have the pie chart in front of us with all the necessary information what does it tell us about the distribution of the variable we are investigating? Pie graphs emphasize the relative importance of a particular category to the total. They are therefore mainly used to highlight distributions where cases are concentrated in only one or two categories. For example, Figure 5.7 illustrates the distribution of students by age, and clearly reflects the large proportion of total cases that are 18 years of age.

Figure 5.7 SPSS pie chart with relative frequencies
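The arithmetic behind the slices can be checked outside SPSS. The following is a minimal Python sketch, not part of the SPSS procedure: it takes the frequencies from Tables 5.2 and 5.4 and converts each into a percentage of cases and a slice angle. The function name pie_slices is our own and is only illustrative.

```python
# Frequencies from Tables 5.2 and 5.4 of this chapter.
sex_counts = {"Male": 12, "Female": 8}
age_counts = {18: 7, 19: 5, 20: 4, 21: 2, 22: 2}


def pie_slices(counts):
    """Return {category: (percent of cases, slice angle in degrees)}."""
    total = sum(counts.values())
    return {
        category: (100 * n / total, 360 * n / total)
        for category, n in counts.items()
    }


for label, counts in (("Sex", sex_counts), ("Age in years", age_counts)):
    print(f"{label} (n={sum(counts.values())})")
    for category, (percent, angle) in pie_slices(counts).items():
        print(f"  {category}: {percent:.0f}% of cases, {angle:.0f} degrees")
```

The slice for female respondents comes out at 40 percent of the circle and the slice for 18 year olds at 35 percent, matching the proportions discussed above.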

Pie graphs begin to look a bit clumsy when there are too many categories for the variable (about six or more categories, as a rule of thumb, is usually too many for a pie graph). If we look at the pie graph for the age distribution of students, we can see that if there were any more slices to the pie the chart would begin to look a little cluttered. A pie chart is actually a specific case of a more general type of graph. Any shape can be used to represent the total number of cases, and the area within it divided to show the relative number of cases in various categories. For example, the United Nations has used a champagne glass to illustrate the distribution of world income (Figure 5.8). Rather than using a simple circle, the metaphor of the champagne glass, a symbol of wealth, makes this graph a powerful illustrative device for showing the inequality of world income.

Figure 5.8 World distribution of income
Source: United Nations Development Program (1992) Human Development Report 1992, Oxford: Oxford University Press.

Bar graphs

Another type of graph that can be produced from the same set of data as that used to generate a pie chart is a bar graph. While these two types of graph can be generated from exactly the same set of data, they describe different aspects of the distribution of that data. We have noted that a pie graph emphasizes the relative contribution of the number of cases in each category to the total. On the other hand, bar graphs emphasize the frequency of cases in each category relative to each other. A bar graph has two sides or axes. Along one side of the graph, usually the horizontal base, are the values of the variable. This axis of the bar graph is called the abscissa. Along the left vertical side of the graph are the frequencies, expressed either as the raw count or as percentages of the total number of cases. This vertical axis is known as the ordinate. To generate a bar graph in SPSS we choose the Graphs/Bar command from the menu bar. This brings up the Bar Charts dialog box (Figure 5.9).

Figure 5.9 The Bar Charts dialog box

From this box we choose one of three types of bar charts, each of which we will explore separately: simple bar graphs, clustered bar graphs, and stacked bar graphs.

Simple bar graphs

The default setting in the SPSS Bar Charts dialog box (Figure 5.9) is for a simple bar chart to be generated. This is evident from the dark border around the Simple button. If we click on Define the Define Simple Bar dialog box appears, which is almost identical to the Define Pie dialog box we came across in the previous section. In this dialog box we select the variable we are investigating from the source variable list, which in this instance is Sex of respondents, and click on OK. This will generate a bar graph in the Viewer window. As with the pie graph we generated earlier we can tidy up this basic graph by going into the Chart Editor (double-click on the graph). I have generated a bar chart and edited it to add information about the source of data and the sample size, and to improve its appearance (Figure 5.10).

Figure 5.10 An SPSS bar chart
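To see what a simple bar graph encodes, the same frequencies can be drawn as rows of characters. This is only a rough Python sketch of the idea, not a substitute for the SPSS output: the function text_bar_chart is a hypothetical helper of our own, and the bars run horizontally rather than vertically. It carries a title, category labels, the source, and the number of cases, following the general rules given earlier.

```python
sex_counts = {"Male": 12, "Female": 8}  # Table 5.2


def text_bar_chart(counts, title, source):
    """Build a plain-text bar chart with one '#' per case."""
    total = sum(counts.values())
    width = max(len(str(category)) for category in counts)
    lines = [title]
    for category, n in counts.items():
        # The bar length is the raw frequency, so bars compare directly.
        lines.append(f"{str(category):<{width}} | {'#' * n} {n}")
    lines.append(f"Source: {source}; n={total}")
    return "\n".join(lines)


print(text_bar_chart(sex_counts, "Sex of respondents", "Hypothetical survey"))
```

Because each bar is drawn from the same baseline, the eye compares the two categories with each other rather than with the total.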

Again I emphasize that SPSS provides many options for improving the layout and appearance of a graph and the amount of information it contains. Often these choices (such as the wording and placement of titles) are matters of aesthetic judgment so you need not follow my preferences exactly. Play around with the options so that you produce something that you like, but always keep in mind the general rules of graph construction that we discussed above.

Stacked bar graphs

An alternative way of presenting information is to stack the categories on top of each other, rather than side by side. A stacked bar graph (sometimes called a component bar graph) layers in a single bar the number (or proportion) of cases in each category of a distribution. Each bar is divided into layers, with the area of each layer proportional to the frequency of the category it represents. It is very similar in this respect to a pie chart, but using a rectangle rather than a circle. One advantage of a stacked bar graph is that a number of distributions can be combined into a single chart. Such a graph is especially helpful in comparing cases broken down by the categories of another variable, such as distributions over time. For example, a health assessor may want to compare the current health rating of respondents with their ratings for the previous 2 years, the data for which are given in Table 5.5.

Table 5.5 Health ratings of respondents, 1997–1999
Health rating    1997    1998    1999
Very healthy     3       6       8
Healthy          9       6       5
Unhealthy        15      11      7
Total            27      23      20

To generate a stacked bar graph in SPSS we click on the Stacked button from the Bar Charts dialog box (Figure 5.9). I have entered the data from Table 5.5 in a separate SPSS file, with the year of the survey as a separate category axis variable. This generates the stacked bar graph in Figure 5.11.

Figure 5.11 SPSS stacked bar chart output

We can immediately see that, relatively speaking, these people regard themselves as getting healthier. Notice that we are using percentages in this graph to standardize the distributions, since the totals for each year vary. Stacked bar graphs have the same limitations as pie graphs, especially when there are too many values stacked on top of each other. Figure 5.11 contains only three values stacked on top of each other to form each bar, but when there are more than five categories, and some of the layers are very small, the extra detail hinders the comparison of distributions that we want to make, rather than illuminating it.
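The standardization that makes the three bars comparable is simple to reproduce. As a sketch in Python (the variable and function names are our own), each year's counts from Table 5.5 are divided by that year's total, so every bar sums to 100 percent even though the totals are 27, 23, and 20.

```python
# Counts from Table 5.5; each year has a different total.
ratings_by_year = {
    1997: {"Very healthy": 3, "Healthy": 9, "Unhealthy": 15},
    1998: {"Very healthy": 6, "Healthy": 6, "Unhealthy": 11},
    1999: {"Very healthy": 8, "Healthy": 5, "Unhealthy": 7},
}


def to_percentages(counts):
    """Convert raw counts to percentages of that distribution's total."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


percent_by_year = {year: to_percentages(c) for year, c in ratings_by_year.items()}

for year, percents in percent_by_year.items():
    layers = ", ".join(f"{cat} {p:.1f}%" for cat, p in percents.items())
    print(f"{year} (n={sum(ratings_by_year[year].values())}): {layers}")
```

The unhealthy layer shrinks from about 56 percent in 1997 to 35 percent in 1999, while the very healthy layer grows from about 11 percent to 40 percent, which is the pattern the stacked bars display.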

Clustered bar graphs

An alternative to a stacked bar chart is to place the bars for each category side by side along the horizontal. This is called a clustered bar graph.

A clustered bar graph displays, for each category of one variable, the distribution of cases across the categories of another variable. Figure 5.12 presents a clustered bar graph for health rating across years.

Figure 5.12 An SPSS clustered bar chart

Again we can see the relative decrease in the proportion of unhealthy respondents and the gradual increase in the proportion of very healthy respondents.

Histograms

Bar graphs constructed for discrete variables always have gaps between each of the bars: there is no gradation between male and female, for example. A person's age, on the other hand, is a continuous variable, in the sense that it progressively increases. Even though we have chosen discrete values (whole years) for age to organize the data, the variable itself actually increases in a continuous way (as we discussed in Chapter 1). As a result, the bars on a histogram, unlike a bar graph, are pushed together so they touch. The individual values (or class mid-points if we are working with class intervals) are displayed along the horizontal, and a rectangle is erected over each point. The width of each rectangle extends half the distance to the values on either side. The area of each rectangle is proportional to the frequency of the class in the overall distribution. Using the example of the age distribution of students we can construct the following histogram. We select Graphs/Histogram from the SPSS menu, which brings up the Histogram dialog box (Figure 5.13).

Figure 5.13 The Histogram dialog box

This dialog box is one of the more straightforward ones that appear in SPSS. We basically select the relevant variable from the source list and transfer it to the target list. (Note for the next chapter, though, the option to Display normal curve.) Selecting Age in years and clicking on OK will produce the histogram in Figure 5.14.

Figure 5.14 SPSS Histogram output

We can see that the area for the rectangle over 18 years of age, as a proportion of the total area under the histogram, is equal to the proportion of all cases having this age. The histogram indicates that we are working with a continuous variable whereby we have clustered cases to the nearest whole year, so that a score of 18 years of age gathers together cases that are actually between 17.5 and 18.5 years, a score of 19 years of age gathers together cases that are actually 18.5–19.5 years, and so on. SPSS has also generated some other descriptive statistics: the standard deviation, mean, and total number of cases, which conform to our calculations in Chapter 4.
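The limits of each rectangle and its share of the total area follow directly from the frequency table. The Python sketch below is illustrative only, using a function name of our own, and assumes the values are equally spaced one year apart as in Table 5.4. Each bar extends half a year either side of its value, so neighbouring bars touch.

```python
age_counts = {18: 7, 19: 5, 20: 4, 21: 2, 22: 2}  # Table 5.4


def histogram_bars(counts, width=1.0):
    """Return (lower limit, upper limit, frequency, share of total area) per bar."""
    total_area = sum(n * width for n in counts.values())
    bars = []
    for value in sorted(counts):
        n = counts[value]
        # Each bar extends half the distance to the neighbouring values.
        bars.append((value - width / 2, value + width / 2, n, n * width / total_area))
    return bars


for lower, upper, n, share in histogram_bars(age_counts):
    print(f"{lower:.1f} to {upper:.1f} years: frequency {n}, {share:.0%} of total area")
```

The first bar runs from 17.5 to 18.5 years and accounts for 35 percent of the total area, which is the proportion of respondents aged 18.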

Frequency polygons

When working with continuous interval/ratio data, we can also represent the distribution in the form of a frequency polygon. A frequency polygon is a continuous line formed by plotting the values or class mid-points in a distribution against the frequency for each value or class. If we place a dot on the top and center of each bar in a histogram, and connect the dots, we produce a frequency polygon (Figure 5.15). To illustrate this point I have not used SPSS to generate a polygon (which SPSS calls a line graph) for the age distribution of respondents, since we cannot superimpose a polygon onto the equivalent histogram in SPSS.

Figure 5.15 A frequency polygon

Notice that the polygon begins and ends on a frequency of zero. To explain why we close the polygon in this way we can think of this distribution as having seven values for age, rather than five, by including the ages 17 and 23. Since there are no respondents with either of these ages the dot at these values is on the abscissa, indicating a frequency of zero. There are certain common shapes to polygons that appear in research. For example, Figure 5.16 illustrates the bell-shaped (symmetric) curve, the J-shaped curve, and the U-shaped curve. The bell-shaped curve is one we will explore at much greater length in later chapters.
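The dots that make up the frequency polygon for age, including the two zero-frequency end points, can be listed with a few lines of Python. This is a sketch with a function name of our own choosing, and it assumes the values are evenly spaced one unit apart with no gaps between them.

```python
age_counts = {18: 7, 19: 5, 20: 4, 21: 2, 22: 2}  # Table 5.4


def polygon_points(counts, step=1):
    """Return (value, frequency) vertices, closed with zero-frequency end points."""
    values = sorted(counts)
    first, last = values[0] - step, values[-1] + step
    return [(first, 0)] + [(v, counts[v]) for v in values] + [(last, 0)]


points = polygon_points(age_counts)
print(points)
```

Joining these seven points in order with straight lines gives the polygon, which starts on the abscissa at age 17 and returns to it at age 23.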

Figure 5.16 Three polygon shapes: (a) a bell-shaped curve; (b) a J-shaped curve; and (c) a U-shaped curve

For polygons that have a bell shape such as that in Figure 5.16(a) we usually describe two aspects of it. The first is the skewness of the distribution, an issue we raised in Chapter 4. If the curve has a long tail to the left, it is negatively skewed, whereas if it has a long tail to the right it is said to be positively skewed. The second important aspect of a bell-shaped curve is kurtosis. Kurtosis refers to the degree of peakedness or flatness of the curve. We can see in Figure 5.17(a) a curve with most cases clustered close together. Such a curve is referred to as leptokurtic, while curve (b), which has a wide distribution of scores that fatten the tails, is referred to as platykurtic. Curve (c) lies somewhere in between, and is called mesokurtic.

Figure 5.17 A (a) leptokurtic, (b) platykurtic, and (c) mesokurtic curve

There are precise numerical techniques for measuring the skewness and kurtosis of a distribution, but these are beyond the scope of this book. Wherever necessary we will simply rely on a visual inspection of a polygon to describe these aspects of the distribution. One aspect of frequency polygons (and histograms) that will be of utmost importance in later chapters needs to be pointed out, even though its relevance may not be immediately obvious. A polygon is constructed in such a way that the area under the curve between any two points on the horizontal, as a proportion of the total area, is equal to the proportion of cases in the distribution that have that range of values. Thus the shaded area in Figure 5.18, as a proportion of the total area under the curve, is equal to the proportion of cases that are aged 21 or above. This is 4/20, or 0.2 of all cases.

Figure 5.18

Another way of looking at this is to say that the probability of randomly selecting a respondent aged 21 years or more from this group is equal to the proportion of the total area under the curve that is shaded. This probability is 0.2, or a 1-in-5 chance of selecting a respondent in this age group.
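The proportion quoted here can be confirmed from the frequency counts that the area represents. A minimal Python sketch (the function name is our own) sums the cases at or above a chosen age and divides by the total number of cases.

```python
from fractions import Fraction

age_counts = {18: 7, 19: 5, 20: 4, 21: 2, 22: 2}  # Table 5.4


def proportion_at_least(counts, threshold):
    """Proportion of cases whose value is at or above the threshold."""
    total = sum(counts.values())
    selected = sum(n for value, n in counts.items() if value >= threshold)
    return Fraction(selected, total)


p = proportion_at_least(age_counts, 21)
print(f"Proportion aged 21 or above: {p} = {float(p)}")
```

The result is 4/20, which reduces to 1/5 or 0.2, the 1-in-5 chance described above.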

Common problems and misuses of graphs

Unfortunately graphs lend themselves to considerable misuse. Practically every day in a newspaper we can find one (if not all) of the following tricks, which can give a misleading impression of the data. For those wishing to look at this issue further the starting point is Darrell Huff's cheap, readable, and entertaining classic, How to Lie with Statistics (New York: W.W. Norton, various printings).

Relative size of axes

The same data can give different graphical pictures depending on the relative sizes of the two axes. By stretching either the abscissa or the ordinate the graph can be flattened or peaked depending on the impression that we might desire to convey. Consider for example the data represented in Figure 5.19. This provides two alternative ways of presenting the same data. The data are for a hypothetical example of the percentage of respondents who gave a certain answer to a survey item over a number of years. Graph (a) has the ordinate (the vertical) compressed relative to graph (b); similarly graph (b) has the abscissa (the horizontal) relatively more compressed. The effect is to make the data seem much smoother in graph (a): the rise and fall in responses is not so dramatic. In graph (b), on the other hand, there is a very sharp rise and then fall, making the change appear dramatic. Yet the two pictures describe the same data!

Figure 5.19 Two ways of presenting data

In order to avoid such distortions the convention in research is to construct graphs, wherever possible, such that the vertical axis is around two-thirds to three-quarters the length of the horizontal.
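As a quick check on this convention, the acceptable range for the vertical axis can be worked out from the length of the horizontal axis. The Python sketch below uses function names of our own, and the lengths (12, 8.5, and 4) are arbitrary illustrative values in any single unit, such as centimetres.

```python
def vertical_axis_range(horizontal_length):
    """Return the (shortest, longest) vertical axis under the convention."""
    return (horizontal_length * 2 / 3, horizontal_length * 3 / 4)


def follows_convention(horizontal_length, vertical_length):
    """True if the vertical axis is two-thirds to three-quarters of the horizontal."""
    shortest, longest = vertical_axis_range(horizontal_length)
    return shortest <= vertical_length <= longest


print(vertical_axis_range(12))       # acceptable vertical lengths for a 12-unit base
print(follows_convention(12, 8.5))   # within the range
print(follows_convention(12, 4))     # too flat
```

A graph 12 units wide would therefore have a vertical axis of between 8 and 9 units.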

Truncation of the ordinate

A similar effect to the one just described can be achieved by cutting out a section of the ordinate. Consider the data on the relative pay of women to men in Australia between 1950 and 1975, shown in Figure 5.20.

Figure 5.20 Women's income as a percentage of men's income, 1950–1975, presented in two different ways

Graph (a) is complete in the sense that the vertical axis goes from 0 through to 100 percent. The vertical axis in graph (b), though, has been truncated: a large section of the scale between 0 percent and 50 percent has been removed and replaced by a squiggly line to indicate the truncation. The effect is obvious: the improvement in women's relative pay after 1965 seems very dramatic, as opposed to the slight increase reflected in graph (a).
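The size of this distortion can be put into numbers. The Python sketch below uses hypothetical values, a rise from 60 to 75 percent, which are not read from Figure 5.20, and a function name of our own. It measures how much of the vertical axis the same change occupies on a full axis and on an axis that starts at 50 percent.

```python
def share_of_axis(low, high, axis_min, axis_max):
    """Fraction of the vertical axis height spanned by a change from low to high."""
    return (high - low) / (axis_max - axis_min)


# Hypothetical values, not read from Figure 5.20: a rise from 60% to 75%.
full = share_of_axis(60, 75, axis_min=0, axis_max=100)
truncated = share_of_axis(60, 75, axis_min=50, axis_max=100)

print(f"Full axis (0-100): change spans {full:.0%} of the axis height")
print(f"Truncated axis (50-100): change spans {truncated:.0%} of the axis height")
print(f"Apparent change is {truncated / full:.1f} times larger")
```

Under these assumed values, cutting away the bottom half of the scale doubles the visual size of the change without altering the data at all.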

Selection of the start and end points of the abscissa

This is especially relevant with data that describe a pattern of change over time. Varying the date at which to begin the data series and to end it can give different impressions. For example, we may begin in a year in which the results were unusually low, thereby giving the impression of increase over subsequent years, and vice versa if we choose to begin with a year in which results peaked. Consider, for example, the time series of data over a 15-year period shown in Figure 5.21.

Figure 5.21

If we wanted to emphasize steady growth we could begin the graph where the dashed line is displayed, which would completely transform the image presented by the graph.
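The effect of the starting point can be illustrated with a made-up series. The values in this Python sketch are hypothetical and are not the data behind Figure 5.21; they are simply chosen to have an early peak, a trough, and then steady growth. The function name is our own.

```python
# Hypothetical yearly results (not the data behind Figure 5.21):
# a peak in year 3, a trough in year 6, then steady growth.
series = {1: 50, 2: 58, 3: 66, 4: 55, 5: 44, 6: 40, 7: 43, 8: 46,
          9: 49, 10: 52, 11: 55, 12: 58, 13: 61, 14: 64, 15: 67}


def change_from(series, start_year):
    """Change between the chosen start year and the final year of the series."""
    end_year = max(series)
    return series[end_year] - series[start_year]


for start in (1, 3, 6):
    print(f"Starting at year {start}: change of {change_from(series, start):+d}")
```

With these assumed figures, a graph that starts at the year 3 peak shows almost no change, while one that starts at the year 6 trough shows uninterrupted growth, even though both are cut from the same series.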

Exercises

5.1 Which aspects of a distribution do pie graphs emphasize and which aspects do bar graphs emphasize?

5.2 Explain the difference between a bar graph and a histogram.

5.3 Survey your own statistics class in terms of the variables age, sex, and health rating. Use the graphing techniques outlined in this chapter to describe your results.

5.4 In Exercise 2.2 the following data were entered into SPSS (time, in minutes, taken for subjects in a fitness trial to complete a certain exercise task):
31 35 19 12 17 39 37 16 48 31
45 25 56 38 56 26 42 21 37 28
23 32 34 39 40 56 58 36 42 82
45 80 10 27 27 80 71 38 39 37
Using SPSS select an appropriate graphing technique to illustrate the distribution. Justify your choice of technique against the other available options.

5.5 Consider the following list of prices, in whole dollars, for 20 used cars:
8600 7980 13,630 11,670 9200 11,100 9400 10,000 8200 12,900
11,800 11,250 11,300 10,750 10,200 12,750 10,600 9200 12,240 12,990
From these data:
(a) Construct a frequency table using class intervals: 7000–8499, 8500–9999, 10000–11499, 11500–12999, 13000–14499.
(b) Construct a histogram using these class intervals.
(c) Construct a frequency polygon using these class intervals.

5.6 Construct a pie graph to describe the following data:
Migrants in local area, place of origin
Place            Number
Asia             900
Africa           1200
Europe           2100
South America    1500
Other            300
Total            6000
What feature of this distribution does your pie graph mainly illustrate?

5.7 From a recent newspaper or magazine find examples of the use of graphs. Do these examples follow the rules outlined in this chapter?

5.8 Use the Employee data file to answer the following problems with the aid of SPSS.
(a) I want to emphasize the high proportion of all cases that have clerical positions. Which graph should I generate and why? Generate this graph using SPSS, and add necessary titles and notes.
(b) Use a stacked bar graph to show the number of women and the number of men employed in each employment category. What does this indicate about the sexual division of labor in this company?
(c) Generate a histogram to display the distribution of scores for current salary. How would you describe this distribution in terms of skewness?
