
How to Use and Apply SPSS Version 15.0 for Windows

Benjamin Noblitt, Lead Tutor, Quantitative Skill Center

Notes

This mini guide/walkthrough covers SPSS 15.0 for Windows and was written in the summer of 2010. It does not address any changes or updates made after that date, and it might not fully apply to versions earlier than 15.0. That said, the concepts explained here should still hold as long as the functions of the program remain similar. This guide is a crash course intended to give the reader a VERY basic understanding of how to use SPSS. Its short length reflects how little depth it goes into, and there is a lot that it does not mention. If you want to perform a thorough analysis with more in-depth statistics, you can read the SPSS Survival Manual by Julie Pallant.

Table of Contents

Preparations
Getting Started
    Entering Data
    Output Window
Walkthrough
    Analysis
        Graphing
        Regression
        Correlation
    Testing
        One Sample T-Test
        Independent Sample T-Test
        Paired Sample T-Test
        One Way ANOVA
Hypothesis Testing Crash Course

Preparations

Know that SPSS is for analyzing data from a statistical researcher's point of view. The name SPSS stands for Statistical Package for the Social Sciences. This means that the program is best suited to comparing data samples or surveys using statistics. Furthermore, it is a very robust program that is much more difficult to use if you do not know statistics well; it is designed primarily for the researcher or statistician. Essentially, if you need to use this program, you will need to learn basic statistics and its application first.

Set up a survey or data collection mechanism that is numerical in nature. SPSS works with numbers, so the data needs to be quantifiable. (If it is qualitative, you will need to make it quantitative.)

Make sure you have a goal in your research. Have a question you want to answer, and collect data that you think will help answer that question. For example, do you think there is a correlation between stress level and red blood cell count? Questions like this can be answered numerically, making SPSS a good candidate for analyzing the data.

Getting Started

1. Open up SPSS (it may be named PASW Statistics 18).
2. A screen will pop up that has 5 different options to choose from, as follows:

Figure 1:

a. Run the Tutorial: This is a very informative walkthrough made by the authors of the program. The tutorial only shows you how to operate the program, whereas this walkthrough is intended to show you how to apply it.
b. Type in data: This will be the most commonly used option. If it is selected, you can type in the data manually.
c. Skip "Run an existing query" and "Create a new query using Database Wizard".
d. Open an existing data source: As the name implies, this option opens a file that already has data in it.

3. Select Type in data. You will be presented with a screen that looks a lot like an Excel page. At the top of the work-area (the area of white cells) you will see column headers that say var; these columns are the variables of your sample data. The numbered rows hold the data points for each variable, one row per data source.

Figure 2:

4. At the bottom of the page there are two tabs: Data View and Variable View. The Variable View tab brings you to the page where you define the variables. For example, if you have a range of age groups, this is where you would enter the age groups and their numerical assignments. The Data View tab is where you enter the raw data.

Figure 3:

5. Open the file SPSS/Tutorial/sample_files/accidents.sav. You can see how these tabs work by playing with them. If you look at the Variable View tab, you can see that there are 4 variables. If you click on the Data View tab, you can see these four variables at the top of the work-area. A confusing aspect of SPSS is that the rows in Variable View are the columns in Data View (see Figures 4 and 5).

Figure 4: (Variable View)

Figure 5: (Data View)

6. Look at the variables in Variable View. Each one has an abbreviated name (with no spaces) that shows up in the Data View tab in place of the default var. The Label column holds the full name of the variable, which can contain spaces. Under the Values column, you define what each numerical value represents. For example, in the accidents.sav file, a 1 represents a female and a 0 represents a male. The Measure column is where you set the type of data being used. For example, gender is a categorical variable, so it best fits the nominal [1] option. Age category is a ranked variable, so the ordinal [2] option fits best.

[1] Nominal: A type of data classification that categorizes data by name or category. Example: hair color coded as Brown = Br, Black = Bk, Red = Rd, and Blonde = Bl.
[2] Ordinal: A type of data classification that categorizes data by some order or rank. Example: agree = 3, somewhat agree = 2, does not agree = 1.

7. In the Data View tab, notice a small button that looks like a price tag with a red end (the Value Labels button; see Figure 6). If you click on this button, it changes the display from numbers to labels. You can see here that females are denoted with a 1 and males with a 0, as stated in Variable View. This makes entering data into the work-area easier. Instead of writing female for all of your data, you can just enter a 1 and the program knows to treat it as female.

Figure 6:
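The idea behind the Values column and the Value Labels button can be sketched outside SPSS. The short Python example below is only an illustration and is not SPSS syntax: the dictionary name and helper functions are made up for this sketch, and only the gender coding (1 = female, 0 = male) comes from accidents.sav as described above.

```python
# Hypothetical stand-in for the Values column in Variable View.
# Only the 0/1 gender coding is taken from the guide; the names are invented.
VALUE_LABELS = {
    "gender": {0: "Male", 1: "Female"},
}

def to_labels(variable, codes):
    """Show numeric codes as labels, like pressing the Value Labels button."""
    labels = VALUE_LABELS[variable]
    return [labels[code] for code in codes]

def to_codes(variable, names):
    """Turn labels back into the numeric codes that are actually stored."""
    reverse = {label: code for code, label in VALUE_LABELS[variable].items()}
    return [reverse[name] for name in names]

raw = [1, 0, 0, 1]  # made-up column of gender codes
print(to_labels("gender", raw))
print(to_codes("gender", to_labels("gender", raw)))
```

The stored data never changes; only the way it is displayed does, which is why switching the button on and off is safe.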

Entering Data

1. Open up the Variable View tab.
2. You must first assign a numerical value to every category of each variable. For example, assign a 1 to Under 21, a 2 to 21-25, and a 3 to 26-30. This is the ONLY way for SPSS to quantify your data. It is also advisable to set at least the Label, Values, and Measure of each variable.

Figure 7:

3. Open up the Data View tab.
4. After you have entered all your variables, you need to enter the data into the work-area. First, number your data sources. For example, if you have surveyed 6 people (each person is a data source), assign each person a number from 1 through 6 on paper; these numbers correspond to rows in SPSS. This lets SPSS keep the data from each data source together (i.e., which person gave which responses).
5. Each data source is then entered in its own row. This means that data source 1 (the first person surveyed) is entered in row 1, data source 2 is entered in row 2, and so on.
6. There are two ways to enter data in the Data View tab:
a. If the Value Labels button is pressed, you can select the response to each survey question from a drop-down menu in the cell that corresponds to the correct data source (row) and variable (column) (see Figure 8).
b. If the Value Labels button is not pressed, you must enter the data as a number, with no drop-down menu (see Figure 9).

Figure 8:

Figure 9:
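The relationship between the two tabs (rows in Variable View become columns in Data View, and each person occupies one row) can be pictured with a small Python sketch. Everything here is hypothetical: the variable names, the six made-up respondents, and the helper functions exist only for illustration. The age-group codes follow the example in step 2.

```python
# Variable View: one entry per variable. Names and codes are hypothetical,
# loosely following the age-group example in the text.
variables = [
    {"name": "agecat", "label": "Age category",
     "values": {1: "Under 21", 2: "21-25", 3: "26-30"}, "measure": "ordinal"},
    {"name": "gender", "label": "Gender",
     "values": {0: "Male", 1: "Female"}, "measure": "nominal"},
]

# Data View: one row per data source (person), one column per variable,
# in the same order as the entries in `variables`. Made-up responses.
data = [
    [1, 0],
    [2, 1],
    [2, 0],
    [3, 1],
    [1, 1],
    [3, 0],
]

def column(name):
    """Return every person's value for one variable (one Data View column)."""
    index = [v["name"] for v in variables].index(name)
    return [row[index] for row in data]

def labelled_row(row_number):
    """Return one person's responses as labels; rows are numbered from 1."""
    row = data[row_number - 1]
    return {v["label"]: v["values"][code] for v, code in zip(variables, row)}

print(column("agecat"))
print(labelled_row(2))
```

Keeping each person in a single row is what later allows an analysis to pair up one person's answers across variables.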

Output Window

Figure 10:

1. The Output window is separate from the main window and shows you the results of your analysis. For example, if you create a graph of some of your data, it will pop up in the Output window.
2. There are two sections of the Output window. The area on the left is a log of all the output you have generated in your study. The area on the right is the visual display of that output. For example, if you have made 5 different graphs, you can click on any specific chart in the left area, and it will appear (selected) in the right area. In Figure 10, the Log item is selected on the left, and the log data is highlighted in the right section.
3. Note: you can close the Output window WITHOUT closing SPSS; it is an auxiliary feature.

Walkthrough

Analysis

1. Graphing is the most basic type of visual analysis and is presented first in this guide.
a. Open SPSS/Tutorial/sample_files/accidents.sav. At the top of the screen, select Graphs>Chart Builder.
b. You can select the type of graph you want to create from the Gallery tab. Select Scatter/Dot and then Grouped Scatter by clicking and dragging the thumbnail up into the empty area above the tabs (see Figure 11).

Figure 11:

c. Next, you can see the variables that you have created in the upper left side of the screen. Click and drag Age Category onto the x-axis (the independent variable), and then click and drag Accidents onto the y-axis (the dependent variable). Next, click and drag Gender into the Set Color area. This lets the graph distinguish between the two genders when plotting the two sets of data (see Figure 12).

Figure 12:


d. Click OK and look at the graph produced in the Output window. Notice that both males and females are graphed on the same scale, which lets you compare different sets of data on the same graph (see Figure 13).

Figure 13:


e. Go back to the Chart Builder and experiment with the different graphing options to see what you can gather from visual analysis of the sample data. For example, try a simple bar graph, then look under the Basic Elements tab for the three-dimensional options and place Gender on the Z axis instead of in the Set Color area. Both are very common ways to visually compare larger sets of data. Done effectively, a graph can illustrate a large amount of data in a compact and elegant way, and it makes a useful reference in a report (always recommended). Exploring the graph types available in SPSS will help familiarize you with your options.

2. Regression is a good trend analysis tool. In simple terms, a regression lets you model data with a linear equation (a straight line).
a. Go to File>Open>Data, and then go to SPSS/Tutorial/sample_files/car_sales.sav. Notice that you now have two main windows open; opening a new file does not close your existing project.

b. From the top of the window go to Analyze>Regression>Linear. Note that this guide covers only linear regression. In the screen that pops up, you can see all the variables on the left-hand side (lots of them!).
c. Select Horsepower and then click on the triangle under the Dependent area. This places Horsepower as the dependent variable in the regression. Next, select Price and click on the triangle under the Independent area. Note that the triangle acts as an arrow showing whether a variable can be moved into or out of the area. See Figure 14 for reference.

Figure 14:

d. Click OK. A new set of tables should appear in the Output window. SPSS produces several tables; if you want to model the data with a linear equation, use the bottom table. The equation in this example takes the form y = mx + b, where y is the horsepower of the car and x is the price of the car in thousands.

Figure 15:


e. The slope is given by the unstandardized coefficient for Price in thousands, 3.323, and the intercept is given by the unstandardized coefficient for the constant, 94.670, so the final equation is y = 3.323x + 94.670. You can now use this equation to model the data mathematically and to extrapolate beyond the data set.
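To make the equation concrete, here is a minimal Python sketch, not part of SPSS. The first function plugs a price into the fitted line using the two coefficients reported above. The second shows the standard least-squares formulas that produce a slope and intercept; the function names and the tiny data set used to exercise it are made up for illustration.

```python
def predict_horsepower(price_thousands, slope=3.323, intercept=94.670):
    """Apply y = mx + b with the coefficients from the SPSS output above."""
    return slope * price_thousands + intercept

def fit_line(xs, ys):
    """Least-squares slope and intercept for paired data (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Estimated horsepower for a car priced at 20 (thousand).
print(round(predict_horsepower(20), 2))

# Made-up points that lie exactly on y = 2x + 1.
print(fit_line([1, 2, 3], [3, 5, 7]))
```

For a price of 20 (thousand), the equation gives 3.323 × 20 + 94.670 = 161.13 horsepower.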

3. Correlation is a good analysis tool because it provides a numerical value for how closely two variables are related to one another. (PLEASE note: correlation is NOT causation.)
a. Go to File>Open>Data, and then go to SPSS/Tutorial/sample_files/car_sales.sav.
b. At the top of either window (output or the main screen) go to Analyze>Correlate>Bivariate. Click on Price in Thousands and move it over to the Variables box by clicking on the triangle (arrow). Do the same with the Horsepower variable (see Figure 16).

Figure 16:

c. Note that the box next to Pearson is checked. This will produce a table of Pearson correlation values for the two variables (basic correlation). This tool is most handy when finding the correlation (not causation) between large numbers of variables. Notice that the correlation coefficient for the two variables (R = .840) is given in both the regression output (see previous section) and the correlation table (see Figure 17).

Figure 17:
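For readers who want to see what the Pearson value measures, the following Python sketch computes it from the usual textbook formula. It is only an illustration with made-up numbers; it does not use the car_sales.sav data and will not reproduce the .840 shown in the figure.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: a perfect positive and a perfect negative relationship.
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
```

Values near 1 or -1 indicate a strong linear relationship and values near 0 a weak one; none of them say anything about which variable causes the other.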

Testing

In order to determine the validity of sample data, you need to test it. Testing determines the likelihood of obtaining the sample results given a certain assumption (the assumption is called the Null Hypothesis, H0). If you are unfamiliar with hypothesis testing, please refer to the end of this walkthrough for a brief crash course.

1. One Sample T-Test is a way to determine whether your sample allows you to draw a conclusion, i.e., does it reject the null hypothesis (H0)? This is done by comparing the t-critical value to the t-value, using absolute values for both. If the t-critical value is larger than the t-value, there is insufficient evidence to reject H0; if the t-value is larger, H0 should be rejected in favor of the alternative hypothesis (H1).
a. Go to File>Open>Data, and then go to SPSS/Tutorial/sample_files/callwait.sav.
b. We will see whether the waiting time on hold is different from 9 minutes.
c. Go to Analyze>Compare Means>One-Sample T Test. A box should open up that looks like Figure 18.

Figure 18:


d. Select Minutes to Respond and move it to the Test Variable(s) box. Then type 9 in the Test Value box to represent our 9-minute benchmark. This means that SPSS will run a T-Test comparing the mean of the sample data to 9. Click OK.
e. Figure 19 shows the output for the T-Test. Note that the significance value is very small at .001. Since this is well below the usual alpha value of .05, you would reject your Null Hypothesis (mean waiting time = 9) in favor of your Alternative Hypothesis (mean waiting time ≠ 9).

Figure 19:
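The comparison of the t-value to the t-critical value described at the start of this test can be worked by hand. The Python sketch below is a rough illustration using only the standard library; the five waiting times are invented (they are not from callwait.sav), and the critical value 2.776 is the standard two-tailed table value for alpha = .05 with 4 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, test_value):
    """Return the t-value and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return (mean - test_value) / std_error, n - 1

def rejects_null(t_value, t_critical):
    """Reject H0 when |t| exceeds |t-critical|, as described in the text."""
    return abs(t_value) > abs(t_critical)

waits = [8, 10, 12, 9, 11]  # made-up waiting times in minutes
t_value, df = one_sample_t(waits, 9)

T_CRITICAL = 2.776  # two-tailed, alpha = .05, df = 4 (from a t table)
print(round(t_value, 3), df, rejects_null(t_value, T_CRITICAL))
```

With this invented sample the t-value is smaller than the critical value, so H0 would not be rejected. SPSS reports the significance value directly, which saves looking up the critical value in a table.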


2. Independent Samples T-Test is useful for comparing two sets of sample data. It applies only when the data comes from two distinct groups rather than from the same group (e.g., one sample from men and the other from women).
a. Go to File>Open>Data, and then go to SPSS/Tutorial/sample_files/callwait.sav.
b. We will see whether there is a significant difference between the Monday (coded as 2) and Friday (coded as 6) waiting times. The null hypothesis (H0) is that there is no difference, and the alternative hypothesis (H1) is that there is a difference.
c. Go to Analyze>Compare Means>Independent-Samples T Test. Place Minutes to Respond as the Test Variable, and use the day of the week as the Grouping Variable (see Figure 20).

Figure 20:

d. The Grouping Variable now needs to be defined. Click on Define Groups and type 2 for Group 1 and 6 for Group 2 (see Figure 21). This lets SPSS know that you are comparing only the Monday and Friday data. Since this is a two-sample test of differences, the choice of which set of data is Group 1 or Group 2 is arbitrary (it does not matter). Click Continue and then OK.

Figure 21:


e. SPSS should now bring you to the Output window with a table that looks like Figure 22. First, check whether equal variances can be assumed. To do this, look at Levene's Test for Equality of Variances and see if the significance is large (above .05 or so). If it is, use the upper row of data; if it is not, use the bottom row. In our case, the significance value is far too small to assume equal variances, so we use the bottom row.
f. Look at the bottom row and see that the t-value of the test is -7.519 (very large in absolute value!), and the significance of the test (Sig. 2-tailed) is .000, so there is sufficient evidence of a difference between the mean call waiting times on Monday and Friday. Another way to check this is to look at the 95% confidence interval of the difference; if 0 is within the interval, there is not sufficient evidence of a difference. In our interval, 0 is outside the range, which supports our previous conclusion (reject the null hypothesis).

Figure 22:
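The two rows of the SPSS table correspond to two slightly different calculations. As a rough sketch in Python (standard library only, with invented numbers rather than the callwait.sav data), the upper row pools the two sample variances, while the bottom row uses the unequal-variance (Welch) form with adjusted degrees of freedom. The function names are made up for this illustration.

```python
import math
import statistics

def pooled_t(a, b):
    """t-value and df when equal variances are assumed (upper row)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    std_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / std_error, na + nb - 2

def welch_t(a, b):
    """t-value and df when equal variances are not assumed (bottom row)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se_squared = va / na + vb / nb
    t_value = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_squared)
    df = se_squared ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t_value, df

group_1 = [1, 2, 3]  # made-up values for the first group
group_2 = [2, 4, 6]  # made-up values for the second group
print(pooled_t(group_1, group_2))
print(welch_t(group_1, group_2))
```

When both groups are the same size the two t-values agree and only the degrees of freedom differ, which is why the choice of row matters most for unbalanced groups with very different spreads.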

3. Paired Samples T-Test is useful when the data is either from the same group or is paired up. For example, if two police officers each write tickets every day for a week, the data is paired because each day yields one ticket count from each of the same two officers.
a. Go to File>Open>Data, and then go to SPSS/Tutorial/sample_files/dietstudy.sav.
b. We want to see if there is a difference in someone's weight before and after a treatment plan. Therefore, the null hypothesis (H0) is that there is no difference, and the alternative hypothesis (H1) is that there is a difference. To do this, we will look at a set of paired data in which each test subject has two measurements: an initial weight (wgt0) and a final weight (wgt4).
c. Go to Analyze>Compare Means>Paired-Samples T Test. Select both the initial weight and the final weight, move them over to the Paired Variables box, and click OK (see Figure 23). Note: you must select BOTH variables before you can move them to the Paired Variables box, because SPSS treats them as one pair.

Figure 23:

d. In Figure 24 it can be seen that the t-value for this test is 11.175, with a sig value of .000. This means that, if there were truly no difference between the before and after weights, a result this extreme would be extremely unlikely. Thus, we reject the null hypothesis in favor of the alternative (there is a difference between the before and after populations).

Figure 24:
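A paired test is really a one-sample test on the differences within each pair. The Python sketch below illustrates this with four invented before/after weights; they are not from dietstudy.sav, and the function name is made up for this example.

```python
import math
import statistics

def paired_t(before, after):
    """t-value and df for paired data, computed from per-subject differences."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1

before = [200, 190, 210, 180]  # made-up initial weights
after = [195, 188, 204, 177]   # made-up final weights, same subjects in order
t_value, df = paired_t(before, after)
print(round(t_value, 3), df)
```

The order of the two lists must match subject by subject, which is the same reason SPSS asks you to select both variables as one pair.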


4. One-Way ANOVA is most useful when more than two means are being compared. The comparison is made by looking at the sample variances; ANOVA stands for ANalysis Of VAriance.
a. Go to File>Open>Data, and then go to SPSS/Tutorial/sample_files/demo.sav.
b. We want to see if income levels differ by education background, so we are going to compare the mean income levels for the different education backgrounds. Our null hypothesis (H0): there is no difference among the mean income levels. Our alternative hypothesis (H1): there is a difference. We will test this with an analysis of variance (ANOVA).
c. Go to Analyze>Compare Means>One-Way ANOVA. Two terms need to be clarified: the Dependent and the Factor. The Factor is what you use to distinguish the groups from one another; our Factor is Level of Education. The Dependent is what you measure as a result of the change in Factor; our Dependent is Household Income in Thousands (see Figure 25). To make this easy to remember, the Dependent depends on which Factor group we are looking at (kind of).

Figure 25:

d. If the ANOVA test fails to reject the null hypothesis (H0), we can leave the test as it is. However, if the test rejects H0 in favor of the alternative hypothesis (H1), we need to conduct an additional Tukey test to determine which education groups in fact have different mean incomes. This is because the ANOVA test only tells us that there is a difference among the means; it does not tell us where the difference is. The Tukey test shows where the differences are, should we want to know. Click on the Post Hoc button and select the Tukey test as shown in Figure 26. Click Continue.


Figure 26:

e. To further assess the validity of the assumptions used in the ANOVA and Tukey tests, we must also check whether the group variances can be treated as equal. To do this, click on Options, and then select Descriptive, Homogeneity of variance test, Brown-Forsythe, Welch, and Means plot. See Figure 27. Click Continue and then OK.

Figure 27:

f. The output window will have a lot of data! Don't worry; once it is explained, it is not too hard to follow. The first table shows the basic descriptive statistics of the groups analyzed. It gives a good condensed overview of what the data looks like (see Figure 28).

Figure 28:

g. Before the ANOVA results can be used, we must see if the assumption of equal variances is appropriate. The Test of Homogeneity of Variances table checks exactly this using Levene's test (see Figure 29). Since the F value of the test is large (14.766), the corresponding significance value (.000) is well below .05, meaning that equal variances cannot be assumed. If the sig value were larger than .05, we could simply move on to the ANOVA test.

Figure 29:

h. Because the variances are not equal, we need to be careful when interpreting the ANOVA table: the resulting F value and significance value might be off enough to sway the outcome of the test (see Figure 30). The ANOVA results imply that we would reject our null hypothesis (H0) because of the low significance value; however, we need to verify this with the Welch and Brown-Forsythe tests (Robust Tests of Equality of Means).

Figure 30:


i. If the Welch and Brown-Forsythe tests produced significance values above .05, this would contradict our ANOVA test and cause us to fail to reject our null hypothesis (H0). However, since both of these tests yield the same result as our ANOVA test (significance values of .000), we can use the ANOVA results. See Figure 31.

Figure 31:

j. Now that we have sufficient evidence for rejecting the null hypothesis (H0), we need to determine which groups are different from one another. The Post Hoc table in Figure 32 provides the data needed to see which pairs of groups have sufficient evidence of a difference. Look at Did not complete high school (I) compared to High school degree (J). Notice that the sig value is above .05 and the 95% confidence interval includes 0. This means that there is not sufficient evidence to say that the mean income levels of Did not complete high school and High school degree are different. Conversely, look at Did not complete high school compared to Some college. Here the sig value is below .05 and the confidence interval does not contain 0, so there is sufficient evidence to say that this pair of groups is likely to be different. Notice that the mean differences with an asterisk next to them indicate a significant difference.


Figure 32:

k. We can now look at a graphical representation of the means (Figure 33). Without the ANOVA test we could not judge the validity of this graph, but now we can say which points differ significantly. For example, the graph shows that Did not complete high school and High school degree have different means, but we now know that this difference is not enough evidence to say that they are in fact different (statistically). This illustrates how a graph can be misleading. Furthermore, because the graph zooms in, the axis scaling can make small, non-significant differences appear large, which can lead to faulty analysis of a graph.


Figure 33:
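The F value in the ANOVA table compares how far the group means are from one another with how spread out the values are inside each group. The Python sketch below shows that calculation with three tiny invented groups; it does not use demo.sav, it does not perform the Levene, Welch, Brown-Forsythe, or Tukey steps, and the function name is made up for this illustration.

```python
import statistics

def one_way_anova(groups):
    """Return the F value with its between-group and within-group df."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)

    # Variation of the group means around the overall mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2
        for group in groups
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (value - statistics.mean(group)) ** 2
        for group in groups
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Made-up incomes (in thousands) for three hypothetical education groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova(groups))
```

A large F value means the group means differ by more than the within-group spread would explain, but, as noted above, it does not say which groups differ.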

Hypothesis Testing Super Crash Course!

A hypothesis is a claim that you need to test in order to "accept" [3] something. Whenever you create a hypothesis, there are always two sides to it: the claim that you want to prove and the claim that is already assumed to be true.

The Alternative Hypothesis (H1) is generally what you are trying to prove. For example, if we want to show that trees are taller than shrubs, we would say that the mean height of a shrub is less than the mean height of a tree.

[3] "Accept" is in quotes since the statistical terminology is "fail to reject." The reason for this is that there are no absolutes in statistics.


The Null Hypothesis (H0) is what is assumed to be true. If you have absolutely no prior knowledge of something, this is what you would believe. Using the previous case, if you never knew anything about trees or shrubs, you would not know that there is a difference in their heights, or you could assume that shrubs are taller than trees. That said, the Null Hypothesis is that the mean height of shrubs is equal to (or greater than) [4] the mean height of trees.

The alpha value (α) is the point at which you are convinced that your sample is significant. When in doubt, let α = .05, or if you REALLY want to be sure, let α = .01. If the probability of an event occurring (given the assumption of your null hypothesis, H0) is less than some arbitrary percentage (the alpha value, or α), then it is called significant. This means that there is sufficient evidence to say that H0 is not likely, so you reject H0 in favor of the alternative hypothesis, H1. Another way to find out if a test is significant (when you reject H0) is to use a t-test as explained previously in this document.

[4] It is assumed that there is no difference between the trees and shrubs; however, it is also IMPLIED that the shrubs could be taller, because that outcome would still fail to support the alternative hypothesis.
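The decision rule in this crash course fits in a few lines. The Python sketch below is only an illustration of the rule as described here; the function name is invented, and the default alpha of .05 follows the suggestion above.

```python
def decide(significance, alpha=0.05):
    """Compare a significance (Sig.) value from SPSS with the chosen alpha."""
    if significance < alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.001))             # like the one-sample example in this guide
print(decide(0.20))              # a large Sig. value
print(decide(0.03, alpha=0.01))  # a stricter alpha changes the outcome
```

Note that the function never returns "accept H0"; in keeping with footnote [3], the weaker statement "fail to reject" is used instead.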

