SPSS1 Finding and Managing Data for the Social Sciences

Tutorial Goal: Introduction to the fundamentals of searching and managing data in SPSS for statistical analysis. Participants learn how to find data on the Internet, understand variables, and manipulate data. The workshop covers descriptive statistics, charts, and tables. Different data formats are discussed.

Let's start with the basics of data. For statistics, there are four levels of measurement for a variable. All your analyses extend from what level your variable is. They are NOIR:

(N)ominal
(O)rdinal
(I)nterval
(R)atio

Let's talk about each one.

Nominal means that the number simply represents a category of objects. There is no measured difference among the objects or people. Some examples are giving states numbers (N.Y. 1, Connecticut 2, R.I. 3), assigning a number for gender (male 1, female 2), or designating college major (History 1, Business 2, Sociology 3). You are just assigning a number to something.

Ordinal means the larger number for the object is truly larger in some sort of amount. This typically means rank. Some examples are 1st, 2nd, and 3rd places in a contest, or preferences for different movies. However, there is no exactly measured difference among the objects. We don't know definitively how much larger or better 1st is compared to 2nd. We just know 1st is somehow larger than 2nd.

Interval means, like Ordinal, that there is a rank for the objects or people, but there is also a measurement for the ranking. Some examples are degrees Celsius or Fahrenheit. We know that the difference between 98 and 99 degrees is the difference of the amount of mercury in a thermometer. Also, the difference between 42 and 43 degrees is the same amount as between 98 and 99. However, there is no true zero, which would stand for a complete lack of the quality being measured. 0 degrees does not mean there is no mercury, for example.

Ratio means, like Interval, that there is a measurement for the ranking, but there is also a true zero. A true zero means that there is a complete lack of the quality being measured. Some examples are income, where the difference between $10,000 and $11,000 is known and zero means complete lack of income.

These levels are very important and we will be discussing them more as we go on. Nominal and Ordinal are called Nonparametric Data, and Interval and Ratio are called Parametric Data. Basically, if the data can have a mean, they are parametric. The statistical analyses that you can use depend on what level your data are.

Now, let's get some data from online. When you work with data, you need to remember that they come in different digital formats. SPSS has its own format, but there are others out there, such as Excel and SAS. You need to know how to import data sets from a different format into SPSS. The data come in big spreadsheets, where variables run left to right, and cases up and down. A variable is something being measured, such as income, time, or height. A case is the observation, such as a person, country, or company. Sometimes, when the names are too long, the labels of the variables are given a code, and sometimes these codes are indecipherable. In order to read the variables, you need a codebook to explain what each variable means.

[Figure: a data spreadsheet, with Variables labeling the columns and Cases labeling the rows]

First, be organized. You might download dozens of files and you need to keep everything in order. So, let's create one folder to store everything that you download.

1. Go to your storage device. In my case, I'm going to use my C drive. Right-click on the screen, and in the pop-up menu, select New and left-click on Folder. Call the new folder DATA. Close the storage device and return to the desktop.

STOP! Keep track of how much space your files take up and how much free space your storage device has. If you fill up your storage device, computations won't work.

Ok, let's start a project. Let's say we're interested in survey data about American opinions about obesity. For example, how many calories people consume, whether they diet or not, etc. We can find these data by using the Social Sciences Data Resource Page. Please use Internet Explorer as your browser.

1. In the Library Resource Guides (http://dl.lib.brown.edu/gateway) under Special Subject Guides on the right, go to Statistics and Data.

2. Under Data By Subject, you can search for data by the subject or university department. We want social data, so start looking in Sociology.

Remember, this resource page is constantly being updated. If you can't find what you're looking for, you can contact me, Tom, at Thomas_Stieve@brown.edu or call 863-7978.

3. On the Sociology page, we are interested in the Inter-university Consortium for Political and Social Research (ICPSR), which has a huge archive of 6,000 studies covering many subjects, such as urban studies, the environment, social indicators, etc.

4. On the ICPSR homepage, click the Advanced Search option under Search. This option gives you more possibilities for your search and is a good place to start.

5. On the Search Full Data Holdings page, you can search just like in a library catalog, where you search for words in data records. In the dropdown menu, you can choose what field you're interested in. Let's just leave the default, any field.

6. What we want is data on surveys on obesity done in the United States. In the first empty field type in obesity. In the second empty field type in United States. Click Search.

7. In the Search Results page, you can see what data match your search criteria. You see how many results were found on top. The results are sorted by relevance according to our search criteria, but you can rearrange the list by Sort by Title or Score by Date which weights the relevance by date.

8. If you look closer at the citation, you have a number of options. Let's look at the first one, ABC News/Time Magazine Obesity Poll. Click on description, which contains the metadata for this data. Metadata is data about data. So, it's a detailed description of the data, like a catalog record, and should accompany the data.

Here you have a record of this data set.

If you scroll down, you can see 1) a detailed summary of the file and 2) subject terms, which contain term names that this file is categorized under. If you're not satisfied with this data set, check through the other data sets that share similar subject terms for other comparable files.

9. 1) Scroll down to Access and Availability and 2) click on summary of holdings. Here the metadata say what format the data are in.

10. There can be many different formats, but this file is an SPSS portable file. This file is already prepared to be directly imported into SPSS. So, all we need to do is download, unzip and open it in SPSS.

11. Click back to the Description & Citation page. Scroll to the top of the page and click on the Download tab.

12. On the Log In page, if you're a New User, click on Create Account and follow the procedure. If not, log in as usual.

13. After creating an account, you will be prompted to log on through your email. When you do, you might need to search for the data again. You can search for Study No. 4040 and go to the Download page. Here at Step 1, you can select the files you want. Today leave it as All Files, which means we can get the SPSS Portable and Documentation. Leave Step 2 at All datasets. In Step 3 Add to data cart, click on Add to Data Cart, which prepares the data for download.

14. You can ignore Step 4, which simply reviews what we have already done. In Step 5. Download cart contents, you see that we added 2 files. Click on Download Data Cart.

15. You must click on the I Agree button to accept the Terms of Use agreement.

16. In the File Download dialog, you are prompted to save your file. Click Save. Remember, there are newer versions of zip, so your dialogs might not look the same. Close out of the browser when the file is downloaded.

17. In the Save As window, navigate to our DATA folder and click Save.

18. In the Download complete dialog, click Open Folder. (If the Save As window closes and it doesn't give you a new window, simply go to your DATA folder in your storage device.)

19. You now have the DATA folder window. In it, you see the folder with a number. In this example, 5165062 was randomly assigned. This folder has been zipped, which is a compression data format to squish large amounts of data into a smaller amount. So, we have to unzip it to get to our data. Right-click onto the folder and then left-click onto Extract All.

20. You now get an Extraction Wizard. Click Next.

Wizard: Instructional help in an application or system development environment that guides the user through a series of multiple choice questions to accomplish a task. (http://www.techweb.com/encyclopedia/)

21. In the next window, Select a Destination, you can rename the folder if you wish. Leave it at 5165062 and click Next.

22. In the Extraction Complete window, click Finish.

23. You now have the unzipped folder for our obesity survey data. There are many files within this folder, so let's navigate to the one we need. If you click the folder, you get the 1) ICPSR_04040 folder. Click on the ICPSR_04040 folder, and you have the 2) DS0001 folder. This contains our data, so click on it.

24. In the DS0001 folder, you have two files, the 04040-0001-Codebook and the 04040-0001-Data.por file.

First, you have the 04040-0001-Codebook. Codebooks are extremely important for data sets because they contain the metadata, or the data about data. They sometimes have the instrument, or the original survey document, and an explanation of each variable. Double left-click on the Codebook to open it. You see on the right the Table of Contents (TOC), which should explain every detail of the data. In the TOC, click on Poll

Here you see the actual instrument that was used to collect the data. Spend a minute and look at some of this survey and see how our data was generated. Then you can close the codebook.

For SPSS, we need the 04040-0001-Data.por file. This contains our data. It is a .por file, which is a data format specifically used to transfer survey data into SPSS. You can now close this DATA folder and the ICPSR window.

Now that we're organized for our data, let's start SPSS.

1. Left-click on Start from your Desktop and move your cursor over All Programs, which gives you a menu of all the programs.

2. Put your cursor over SPSS Inc, which brings up a pop-up menu and then move the cursor over the SPSS 16.0 pop-up. To start the program, left-click onto SPSS 17.0 (SPSS is available on the CIS computers at the Rockefeller Library under the Computational menu).

3. You now receive the SPSS Data Editor window. Here you display your data and your variable information. In the SPSS for Windows dialog, Open an existing data source is selected. Make sure More Files is highlighted and click OK.

4. In the Open File dialog, navigate to where you are keeping the obesity survey in the DS0001 folder. It's set by default to .sav, which is another type of file. So, in Files of type, select All Files. Here you see the 04040-0001-Data.por file. Double-click on the .por file to open it.

Files of type is a good place to look if you have a file that you're not sure you can import. If the file type is listed here, you can convert it into an SPSS file.

5. Two windows open quickly, first the Output and then the Data Editor. The Output window is a separate file from the data file that contains the results from our statistical analyses and notes. You need to toggle between the two windows. This file has two parts 1) Table of Contents (TOC), and 2) the View. The TOC will list everything and the View will show the results. A Log is immediately started of all the commands performed.

In the TOC, you can control what's being viewed in the View simply by clicking the - to close the results or the + to open the results. Click the + sign again and open the log in the display. Remember, all your results for the analyses that you are going to do now appear here in the Output. Open the Output book for this tutorial. Be prepared to toggle between the windows, as the Output window will open with the execution of each command.

The Log will keep making notes of every command. You can turn it off by going into the Edit pull down, and left click on Options.

In the Options window, click on the Viewer tab. You see in the bottom left a box for Display the commands in log. Deselect it and press OK.

6. You will come to the Output window later. Save it for now. 1) Click on Save and navigate to where you're keeping the data. 2) Call the file obesity. In the Save as type field, notice that it's being saved as Viewer Files .spv. You will see three types of data files in this tutorial. The spv extension is for results. 3) Click on Save and go back to the Data Editor.
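
Opening the portable file (step 4) can also be written as SPSS syntax, the scripting language that SPSS runs from a syntax window. The lines below are a minimal sketch, not part of the original workshop. The folder path is an assumption pieced together from the DATA folder and the unzipped folder names in this example, so change it to wherever your file actually sits.

```spss
* Read the SPSS portable (.por) file downloaded from ICPSR.
* The path is assumed from this example and will differ on your machine.
IMPORT FILE='C:\DATA\5165062\ICPSR_04040\DS0001\04040-0001-Data.por'.
EXECUTE.
```

IMPORT is the command for portable files. Regular .sav data files are opened with GET FILE instead.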

You now have the new obesity survey data in SPSS. In the SPSS Data Editor, you have two views: the Data View and the Variable View. You can see the tabs for each view on the bottom left. The Data View is the spreadsheet with all the cases and variables. For example, this obesity data set has 1202 cases and 114 variables. The Variable View has the information about each one of our variables. Let's explore some of the data management functions in SPSS. Let's start with the Variable View.

Click on the Variable View tab. In the Variable View you have 10 columns of attribute information about your variables.

If you click onto any of the boxes, you see the border becomes bolded, meaning that box is active and you can change its contents.

But let's just look at some of the important attributes. Name, as the name implies, is the name of the variable. Type defines what kind of variable it is.

1. If you click on a box in the column to modify an entry, you see a little box with three dots appear. This means there is a dialog box, or a window with a number of options, for this information. Click on it.

2. Next you are given the Variable Type dialog. Here you can choose how your numbers are formatted.

Width sets the number of characters before the decimal that are shown in the column. The actual number can be more, but only the specified number shows in the column. When you click on the box, you get a pair of scroll arrows. Simply scroll to the desired amount.

Decimals sets the number of characters after the decimal that are shown in the column. The actual number can be more, but only the specified number shows in the column. Clicking on the box brings up a scroll menu to choose a number.

Label is very important since you can define what the name of the variable actually means. Expand the column to see that this data set is managed very well. They wrote out the whole question that was asked for this variable.

STOP! Believe it or not, sometimes you might get a poorly organized data set that leaves the labels blank. From the data file, you have no idea what any of the variables mean. If that's the case, immediately turn to the codebook or metadata for the explanations.

If you want to expand a column, move the cursor over the border in the label box. The cursor changes into an arrow. Left-click and hold down on the mouse and expand the width.

Values allows you to code your answers. Remember back to the level of measurement section. Everything is assigned a number and you need to keep track of what those numbers mean.

1. Click onto the Values box for the 3rd variable, tzone. This brings up the three dot box for a dialog. Click onto the three dots box.

2. In the Value Labels dialog, you can see what each number means. You can also add a number in the Value field and a definition in the Value Label.

STOP! If you are designing your own data set, please take the time to make proper labels and values for your data. You understand everything now, but when you come back to this data set in a few years' time, will you understand everything then? You never know.

Measure sets the level of measurement for our numbers. This is very important because SPSS reads the measure and only allows you to perform tests that are appropriate for the data.

1. Click on the box and bring up the scroll arrow.

*We are not going to work with the attribute information Missing, Columns and Align in today's lesson, so please explore them on your own.

2. In the dropdown menu, you can choose one of three levels: Scale, Ordinal and Nominal. SPSS has combined Interval and Ratio in the Scale level (Parametric Data).
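
The Measure attribute can also be set in syntax with the VARIABLE LEVEL command. The sketch below is illustrative only: the level chosen for each variable is an assumption based on the level-of-measurement definitions earlier in this tutorial, not something the data file dictates.

```spss
* Gender is just a category code, so it is treated as nominal.
VARIABLE LEVEL q921 (NOMINAL).
* The health rating runs in order from Excellent to Poor, so it is treated as ordinal.
VARIABLE LEVEL q1 (ORDINAL).
* Weight in pounds is a measured amount, so it is treated as scale.
VARIABLE LEVEL q45 (SCALE).
```

Note that q45 also holds codes for answers such as Don't Know, so treating it as scale makes sense only once those codes are handled as missing.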

Ok, that was the Variable View part of SPSS. Now let's look at Data View and see how we can manage and modify our data set. Click on the Data View tab. First, let's save our data set.

1. Left-click on the File menu, and then left-click on Save.

STOP! It's a good habit to save your work often. Remember, if the software crashes, you lose all your work from the last time you saved.

2. In the Save Data As dialog, 1) you see that this file is being saved in the folder we downloaded from ICPSR. 2) In File name, type Obesity as the name of the file. 3) Look in Save as type. This file is being saved as a .sav file, which is a data file for SPSS. 4) Click on Save.
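
Saving the data set has a syntax equivalent as well. This is a sketch under the same assumption about the folder path as the rest of this example; only the file name Obesity comes from the step above.

```spss
* Save the active data set as an SPSS data file (.sav).
* The folder is assumed from this example; adjust it for your machine.
SAVE OUTFILE='C:\DATA\5165062\ICPSR_04040\DS0001\Obesity.sav'.
```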

Now, let's look at some basic functions in SPSS. Value Labels allows you to see what all the numbers in the Data View mean. Simply click on the icon and the numbers are given the verbal explanation from the Values column in Variable View. Click on the icon again to switch it back.

In SPSS, you can also create new variables. This comes in handy especially if you're creating your own data set.

1. In the Data View, scroll right to the end of the variable columns.

2. Right-click onto the label box and bring up the pop-up menu. Left-click onto Insert Variables.

3. You see in the Data View that a new variable has been created. You can populate the variable with the values of whatever variable you create. Type in 1 for case 1, 2 for case 2 and 2 for case 3.

4. If you double left-click very rapidly on a title bar of the variable in the Data View, it immediately switches onto that variable in the Variable View.

5. Let's change the name of the new variable. Double left-click on the entry for Name for variable var0001. Let's name our new variable new. Press Enter. You can double left-click on the gray 115 bar to take you back to the Data View.

Clear allows you to delete a variable.

1. Back in Data View, let's get rid of this unneeded variable. Right-click on the label box and then left-click on Clear, which deletes the variable.

STOP! Remember, if you make a mistake, you can always go back a step and Undo. In the Edit menu, left-click on Undo. It shows you your last step after the word Undo, and it takes the data set back to before that last function.

Move Variables or cases allows you to literally move the variables around on the spreadsheet. This is especially good if you have a large data set. You might only be interested in a few variables and want them close together so you can work more easily with them.

1. In our obesity data set, let's imagine we want to move the variable q45 (the participant's weight), which is near the end of the variables towards the right, closer to the beginning of the spreadsheet near the variable respno (participant number). Left-click on the label box once and you get an arrow. Left-click on the label box twice and you see a box next to the cursor. Be sure to hold the cursor over the title boxes. This means that the variable column can be moved.

2. Keeping your finger on the left side of the mouse, move the cursor left on the spreadsheet. You see a red line appear, which tells you that the variable is going to be placed to the right of that line. Drag the variable all the way over to respno. The line at the right of the respno column turns red when you put the cursor over it. This means the variable is going to be placed in the column immediately to its right.

3. Release your finger on the left and the column drops into the spot.

You can also move a variable by inserting a variable and then cutting and pasting the variable into that column. Let's move gender (q921) next to weight.

1. Select the column where you want the new variable by left-clicking on it. Let's put q921 to the right of q45. Then right-click to bring up the pop-up menu. Left-click on Insert Variable and put in a new variable.

2. 1) Right-click on the label box of q921 to bring up the pop-up menu. Then left-click on Cut and remove the column. 2) Back next to q45, right-click onto the label box in the new empty variable column to bring up the pop-up menu. Left-click on Paste to place the cut variable there.

Sorting the cases by a variable is another useful function. You can sort the cases from high to low or low to high.

1. Right-click on q45 and then left-click on Sort Ascending. This sorts the cases from low at the top down to high. Sort Descending does the reverse and sorts from high at the top down to low.
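
As a hedged illustration, the same sort can be written in syntax with SORT CASES. The sketch assumes the weight variable is still named q45.

```spss
* Sort ascending: lowest weight at the top.
SORT CASES BY q45 (A).
* Sort descending: highest weight at the top.
SORT CASES BY q45 (D).
```

Run only the line you need, since the second command simply re-sorts the file the other way.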

There are also several good data management functions in the Data menu. Let's explore some of them.

Weight Cases makes cases more important to compensate for over- or under-sampled groups. For example, if your sample is small, but you know that a certain region represents 40% of the population, you can weight those cases from that region so the number of those cases is higher. This function counts each case as many times as the value of its weight variable.

1. Left-click the Data menu and then left-click Weight Cases.

2. In the Weight Cases dialog, you can weight the cases by a variable. 1) In the Current Status: display you see the data is already weighted by the variable weight - Weight cases by weight. 2) If you had to do this manually, scroll down in the variable menu to Weight and select it. The researchers calculated the variable weight according to the population of each census region. Click on the Weight cases by radio button. Click the arrow and move the Weight variable into the Frequency Variable field. 3) Since the weight is already set, just click Cancel.

You can also see at the bottom right corner of the Data View that Weight On is displayed.

Remember, all calculations and analyses will be based on the weight. Whenever you modify your data, all further operations work on that modification.
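
For reference, weighting can be switched on and off in syntax too. This is a minimal sketch that assumes the weighting variable is named weight, as the Weight Cases dialog shows for this data set.

```spss
* Turn weighting on, using the researchers' weight variable.
WEIGHT BY weight.
* Turn weighting off again so that each case counts once.
WEIGHT OFF.
```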

Select Cases allows you to select a number of cases within a variable(s) and create a new variable from them. For example, from our two variables for gender and weight, we want to select only men who weigh more than 100 lbs.

1. In the Data menu, left-click on Select Cases.

2. In the Select Cases dialog, select If condition is satisfied. Click the If icon.

3. In the Select Cases: If calculator, we need to build a conditional statement that will select only men who are over 100 lbs. We need to select the variables from the left, move them into the calculator on the right, and then set up the conditional. First, select Q921 on the left and click the arrow icon to move it into the calculator. Click the equal sign and then 1, which is the code for men. So, that says "choose men from the variable gender." Next, we need to add the second part of our statement. Click on the ampersand.

Select Q.45 Respondent WEIGHT IN POUNDS [q45] and click on the arrow to move it to the calculator. Type in the > (greater than) sign and then 100 for weight. This part of the statement says "and choose from weight the cases over 100 lbs." Then click Continue.

Another good data management technique is to right-click on the variable menu in the dialog. This gives you many choices of listing the variables, for example by variable name, label, alphabetically or file order.

4. You see back in the Select Cases dialog, the statement has been set. Click OK.

So, explore the results and scroll down the page. You see that those cases that did not meet the condition are crossed out. If you scroll to the end of the variables to the right, you see a new variable has been created, which has initially been labeled filter_$ (you can change the name in the Variable View). In this variable, 1 means the case met the condition "men over 100 lbs" and 0 means the case did not meet the condition. You can now do statistics with this new variable.
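
The condition built in the calculator can be sketched in syntax as follows. This is an approximation of what the dialog does, assuming 1 is the code for men in q921 as stated above. It creates the 0/1 variable filter_$ and then filters on it.

```spss
* Start from all cases.
USE ALL.
* filter_$ is 1 when the case is a man over 100 lbs, otherwise 0.
COMPUTE filter_$ = (q921 = 1 & q45 > 100).
* Cases with filter_$ = 0 are kept in the file but left out of analyses.
FILTER BY filter_$.
EXECUTE.
```

Filtering can be undone at any time with FILTER OFF followed by USE ALL, which leaves the filter_$ variable in place.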

5. Please go back into the Select Cases dialog and click Reset, which will set it back to before we performed this function. Click OK. The new variable remains, but the crosses from the cases are removed.

Recode allows you to transform your code in the variable without manually changing the number for each case. For example, q1 asks how the participant rates his/her health. The answers were 1 Excellent, 2 Good, 3 Not so good and 4 Poor. Let's say we want to collapse these four answers into two groups, good and bad. So, we need to recode answers 1 and 2 into 1 for Good and answers 3 and 4 into 2 for Bad.

1. In the Transform menu, left-click on Recode into Different Variables. Different Variables makes a new variable with our recoded data and is the safest option if you make a mistake. You can delete the new variable and start again. Recode Into Same Variables changes the actual variable.

2. In the Recode into Different Variables dialog, 1) select q1 in the variable list on the left and move it to Numeric Variable > Output Variable; 2) in Name, type in the name of the new variable, Health; 3) in Label, type in the description Participant's health statement; 4) click Change to set the new variable. Now, we need to set our values. 5) Click on Old and New Values.

3. In the Recode into Different Variables: Old and New Values dialog, 1) select Range and type in the range of the value that is being changed, 1 through 2; 2) in Value, type in the new value 1; 3) click on Add and set the value recoding.

4. Please follow the same procedure for range 3 through 4, set the new value at 2, and add it. Then click Continue.

5. Back in the Recode into Different Variables, click OK.

If you scroll all the way down to the end of the variables, you see there is a new variable with our new code for health.

6. Now, we need to set the values. Click onto the Variable View. 1) In the new Health variable, in the Values column, click on the three dots icon and bring up the Value Labels dialog. 2) Set the first value. In Value, type in 1. 3) Set the label for the new value. In Value Label, type in Good Health. 4) Click Add.

7. Follow the same procedure for value 2. In Value type in 2 and in Value Label type Bad Health. Click Add. Click OK to finalize the new values.
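
The whole recode procedure above can be sketched in a few lines of syntax. The variable name, label, and value labels come from the steps above, and nothing else is assumed.

```spss
* Collapse the four health answers in q1 into two groups in a new variable.
RECODE q1 (1 thru 2=1) (3 thru 4=2) INTO Health.
VARIABLE LABELS Health "Participant's health statement".
VALUE LABELS Health 1 'Good Health' 2 'Bad Health'.
EXECUTE.
```

Because the recode goes INTO a new variable, q1 itself is left unchanged, which matches the Recode into Different Variables choice.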

Computing Variable allows you to calculate a new variable from the values of variables you already have in a data set. For example, in our data set, variables q4_1 through q4_8 deal with the same subject matter; they are all questions on the survey asking how much of a health problem certain things like AIDS, drug abuse, and obesity are in this country according to the participant (explore the questions in the Variable View). The lower the number, the more important the problem. Let's combine these eight variables into one to try to measure the participant's consideration of Public Health issues. We will call our new variable public_health.

1. In the Transform menu, left-click on Compute.

2. In the Compute Variable calculator, 1) type public_health in the Target Variable; 2) construct the following equation in the Numeric Expression box: (q4_1 + q4_2 + q4_3 + q4_4 + q4_5 + q4_6 + q4_7 + q4_8) / 8. You need to start with the parentheses and then move each variable over one at a time while clicking a plus sign in between each. Then add a forward slash, which is division, after the parentheses, followed by an eight. So, in this equation, we are adding up all these variables that make up the Public Health subject on our survey and then dividing them by the number of variables, or, in other words, finding the mean. We are doing this for each case. 3) Click OK.

3. Now we have a new variable, public_health. The lower the number, the more the person is concerned with Public Health issues.
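
The same calculation can be written as a COMPUTE command. The first command below mirrors the equation from the calculator. The second is an optional alternative using the built-in MEAN function, stored under the hypothetical name public_health_mean so the two results can be compared.

```spss
* Sum the eight items and divide by eight, exactly as in the calculator.
COMPUTE public_health = (q4_1 + q4_2 + q4_3 + q4_4 + q4_5 + q4_6 + q4_7 + q4_8) / 8.
* Alternative (hypothetical variable name): MEAN averages whichever items are not missing.
COMPUTE public_health_mean = MEAN(q4_1, q4_2, q4_3, q4_4, q4_5, q4_6, q4_7, q4_8).
EXECUTE.
```

The two versions differ only when a case has missing answers: the sum formula returns a missing result if any of the eight items is missing, while MEAN uses the items that are present.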

Creating a New Variable From Two Variables allows you to subsort two variables into one. We want to change q45 (weight) and q921 (gender) into one variable, newweight. Specifically, we want to make a combined variable that has four values: 1 is men under 200 lbs, 2 is men 200 lbs and over, 3 is women under 200 lbs, and 4 is women 200 lbs and over. In order to do this, we have to write a syntax file and run it. Syntax in SPSS is scripting, and SPSS allows you to do many operations with scripting.

1. We need to create a new syntax file. In the File pulldown menu, select New and then left-click Syntax.

2. In the Syntax file, there are 1) a navigation panel that lists the functions and the conditions, and 2) the view where you type the commands.

3. You have to create our four new values using If statements as given below. There are two parts, the condition and the result. Note: Sometimes as you type these statements, a pop-up menu appears. This offers a list of all commands. If you keep typing, it disappears.

Syntax

if (q45 < 200 & q921 = 1) newweight = 1.
if (q45 >= 200 & q921 = 1) newweight = 2.
if (q45 < 200 & q921 = 2) newweight = 3.
if (q45 >= 200 & q921 = 2) newweight = 4.
execute.

What it's saying

In each statement, if sets up the condition and the parentheses are the parameters of the condition.

First statement: Find the cases in q45 that are under 200 and the cases in q921 that equal 1 (men under 200 lbs). Give the value of 1 in the new variable newweight for the cases that meet those conditions.

Second statement: Find the cases in q45 that are equal to or greater than 200 and the cases in q921 that equal 1 (men 200 lbs and over). Give the value of 2 in the new variable newweight for the cases that meet those conditions.

Third statement: Find the cases in q45 that are under 200 and the cases in q921 that equal 2 (women under 200 lbs). Give the value of 3 in the new variable newweight for the cases that meet those conditions.

Fourth statement: Find the cases in q45 that are equal to or greater than 200 and the cases in q921 that equal 2 (women 200 lbs and over). Give the value of 4 in the new variable newweight for the cases that meet those conditions.

execute: Runs the statements above and denotes the end of the script.

Don't forget the period at the end of the statements!

2) In the pull-down menu Run, left click on All.

3. You see in our new variable newweight, values have been added according to the parameters we set.

4. You can save syntax files, too, and use them later. 1) In the Syntax1 SPSS Syntax Editor, use the pulldown menu File, and left-click Save. 2) In the Save As window, save it as obesity.SPS. SPS is the extension for syntax files. You can close the syntax file.
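
Before closing the syntax file, it may be worth adding labels for the four new codes, in keeping with the earlier advice on labeling values. This is an optional sketch that goes beyond the original steps; the label text is taken from the definitions of the four values given above.

```spss
* Attach a description to each code of the combined variable.
VALUE LABELS newweight
  1 'Men under 200 lbs'
  2 'Men 200 lbs and over'
  3 'Women under 200 lbs'
  4 'Women 200 lbs and over'.
* The codes are categories, not amounts, so the variable is treated as nominal.
VARIABLE LEVEL newweight (NOMINAL).
```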

Now, we have modified our data set and we are interested in some descriptive statistics, which describe or summarize the scores from our data set. Usually descriptives deal with central tendency and variance. Let's explore these essential ideas for a moment.

(Graphic from http://www.maximumiq.com/iq-tests-stats.php)

An IQ test is a perfect example of central tendency and variance. Your result on an IQ test is literally the comparison of your result with everybody else's who has taken the test. Millions of people take these tests. Very few people would score low, and there are very few geniuses around who would score high. The majority of us have average IQs. As seen in the graphic above, IQ results, when plotted out, take a normal distribution, where the majority of results cluster in the middle and results that are lower and higher become less frequent the farther they are from the center.

The central tendency is usually measured by the mean (all cases added and then divided by the number of cases). So, a score of 100 on an IQ test is the mean. It's an average intelligence. Remember, the results of the majority of people bunch around 100.

Variance is how far the scores fall from the mean. If most of the scores cluster around the mean, then there is low variance. It looks like a bell curve, where most of the results are in the middle, taking the shape of a bell. If the variance is high, the curve in the middle is not as high and the results are more spread out.

Now, let's get some descriptive statistics for the variable q45, our participants' weight measurements.

Frequencies gives you the number of cases reporting a certain amount of a variable.

1. In the Analyze menu, select Descriptive Statistics and left-click on Frequencies.

2. In the Frequencies dialog, select our variable, Q.45 Respondent WEIGHT IN POUNDS, and move it into the Variable box with the arrow and click OK.


Scroll down to the Frequency results, where you have a few important numbers. 1) The path to the data file is given at the top of any results. 2) In the Statistics chart, N denotes the number of cases. Valid gives the number of cases that were included in the Frequencies calculation. Missing gives the cases that did not have any value and were not included. 3) In the Q.45 Respondent WEIGHT IN POUNDS chart, the leftmost column lists the values found in the weight variable. That is, it gives you every weight that was reported. 4) Frequency gives you the number of cases reporting this value. 5) Percent gives you the percent of all cases reporting this value. 6) Cumulative Percent gives you the percent of cases that reported this value or a lower one. So, .9 percent of the cases reported Don't Know. The second entry is No Answer with a Cumulative Percent of 4.3, because the percents of cases reporting Don't Know and No Answer add up to 4.3. As you go down the chart, you're adding all the percents together, and the number increases until it reaches 100% at the end.

Please close the results in the TOC after each analysis so you don't get overwhelmed.

STOP! When you do many different analyses, you get a ton of results. Don't get overwhelmed by all the numbers. In the results, there tend to be only a few numbers that you really need to report for the final product of your analyses. In the tutorials, we concentrate only on the numbers you need.
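To see how the Frequency, Percent, and Cumulative Percent columns relate to each other, here is a minimal sketch in Python (outside of SPSS). The function name frequency_table and the sample weights are made up for illustration; they are not taken from the tutorial's data set.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent), lowest value first."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100.0 * counts[value] / total
        cumulative += percent  # running total of the Percent column
        rows.append((value, counts[value], round(percent, 1), round(cumulative, 1)))
    return rows

# Hypothetical weights in pounds; negative numbers stand in for coded
# answers such as Don't Know or No Answer.
sample_weights = [-7, -5, -5, 120, 150, 150, 150, 180, 180, 200]

for row in frequency_table(sample_weights):
    print(row)
```

The last row always ends with a cumulative percent of 100, just like the bottom of the SPSS chart.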

Crosstabs gives a frequency count of one variable by another variable. Let's get the frequency of weight by gender.

1. In the Analyze dropdown menu, select Descriptive Statistics and then left-click Crosstabs.


2. In the Crosstabs dialog, 1) select Q.45 Respondent WEIGHT IN POUNDS and move it to the Row field. 2) Select Q921. GENDER and move it to the Column field. 3) Click OK.

3. As you can see in the second chart, you have a frequency count of weight by gender.
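The idea behind Crosstabs, counting how many cases share each combination of two values, can be sketched in Python as follows. The function name crosstab and the five sample cases are hypothetical and only illustrate the counting.

```python
from collections import Counter

def crosstab(row_values, column_values):
    """Count how often each (row value, column value) pair occurs."""
    return Counter(zip(row_values, column_values))

# Hypothetical cases: one weight and one gender per respondent.
weights = [150, 150, 180, 150, 180]
genders = ["Male", "Female", "Male", "Male", "Female"]

table = crosstab(weights, genders)
for (weight, gender), count in sorted(table.items()):
    print(weight, gender, count)
```

Each printed line corresponds to one cell of a crosstab chart: a weight (row), a gender (column), and the number of cases in that cell.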

Descriptives gives you a lot of information about your variable, such as the mean.

1. In the Analyze menu, select Descriptive Statistics and left-click onto Descriptives.

2. In the Descriptives dialog, select our variable, Q.45 Respondent WEIGHT IN POUNDS, and move it into the Variable box with the arrow and click OK.

3. In Output, you receive the Descriptives results. You have four new numbers here that you don't receive in the Frequencies result. First, you are given the range of your results. 1) Minimum is the lowest case value. 2) Maximum is the highest case value. So, you know all your cases range from -7 to 540. 3) Mean is the mean of all the case values. 4) Std. Deviation, which stands for Standard Deviation, is a measure of variance and is explained below.

Standard Deviation tells you where a certain share of the cases lie. Basically, if you have a normal distribution (a bell-shaped curve as seen in our IQ graphic), about 68% of the cases fall within +/- 1 std. deviation of the mean, about 95% of the cases fall within +/- 2 std. deviations of the mean, and about 99.7% of the cases fall within +/- 3 std. deviations of the mean. So, in our IQ example again, about 68% of people fall within 15 IQ points of the mean (range IQ 85 to 115).

Back to our weight data set. From our descriptives, we see that the mean of our sample is approximately 164.14 lbs with a standard deviation of approximately 55 lbs. So, if the weights were normally distributed, about 68% of our cases would range approximately from 109 lbs to 219 lbs (164 minus 55 and 164 plus 55). You can also look and see how large the standard deviation is. The smaller the number, the less variance.
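The arithmetic behind these ranges is just the mean plus or minus a multiple of the standard deviation. A minimal Python sketch, assuming a normal distribution and using the hypothetical helper name sd_ranges:

```python
def sd_ranges(mean, sd):
    """Return the (low, high) range at 1, 2, and 3 standard deviations from the mean."""
    return {k: (round(mean - k * sd, 2), round(mean + k * sd, 2)) for k in (1, 2, 3)}

# IQ example from the text: mean 100, standard deviation 15.
iq_ranges = sd_ranges(100, 15)

# Weight example from the text: mean about 164.14 lbs, standard deviation about 55 lbs.
weight_ranges = sd_ranges(164.14, 55)

for k in (1, 2, 3):
    print(k, iq_ranges[k], weight_ranges[k])
```

For a normal distribution, the three ranges hold roughly 68%, 95%, and 99.7% of the cases. Real data such as weights are often not perfectly normal, so treat these shares as approximations.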

Ok, we have a problem. Let's look back at our Descriptive Statistics. What's wrong?

Look at the Minimum. The values -7 and -5 (Don't Know and No Answer) are being included in our results as if they were genuinely pounds. So, we need to deselect all the cases that have these values. Refer back to Select Cases on pg. 29. In SPSS, click on Select Cases, select If condition is satisfied, and in the Select Cases: If calculator type the statement q45 > 0, which means select cases in the variable weight only if they're over 0 lbs. Then go back and run the Descriptives for the variable Q.45 Respondent WEIGHT IN POUNDS.
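To see why the coded values distort the results, here is a small Python sketch that applies the same condition (weight over 0) before computing the descriptives. The function name describe and the six sample values are invented for illustration.

```python
import statistics

def describe(values):
    """Return N, minimum, maximum, mean, and sample standard deviation."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
    }

# Hypothetical weights; -7 and -5 are codes for Don't Know and No Answer, not pounds.
raw = [-7, -5, 120, 150, 180, 200]

# Same idea as the condition q45 > 0 in Select Cases.
valid = [w for w in raw if w > 0]

print(describe(raw))
print(describe(valid))
```

With the coded values removed, N drops, the minimum becomes a real weight, the mean rises, and the standard deviation shrinks.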


These results are more accurate. The N, or number of cases, dropped from 1202 to 1150 since the cases with -7 and -5 weren't included. The lowest real weight, 50 lbs, is now our minimum. Also, look at our mean. It jumped more than 7 lbs to 171.78 lbs. The Std. Deviation decreased to 43.131, showing even less variance in our data. Do not reset the Q45 variable; leave it with the cases selected.

Split File allows you to split the cases into groups based on one variable and then run descriptives on another variable for each group. So, for example, we want to run descriptives on the weight of men and women. With Split File, we can split the variable gender into men and women and then run the descriptives to get the results for each.

1. In the Data menu, select Split File.

2. In the Split File dialog, 1) select Compare groups so that the descriptives are presented in one chart. 2) Select the variable that you want to split into groups, Q921 Gender. 3) Click OK.


3. In the Analyze menu, select Descriptive Statistics and left-click Descriptives.

4. In the Descriptives dialog, move the variable that you want descriptives for in the groups of men and women over to the Variable field, in this case Q.45 Respondent Weight. Click OK.

5. In the Descriptives result, you see the statistics for weight by male and female. Please remember to go back to the Split File dialog and reset.
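What Split File does behind the scenes can be sketched in Python as: sort the cases into groups, then compute the same descriptives for each group. The function name describe_by_group and the six sample cases are hypothetical.

```python
from collections import defaultdict
import statistics

def describe_by_group(values, groups):
    """Group the values by their group label, then summarize each group."""
    grouped = defaultdict(list)
    for value, group in zip(values, groups):
        grouped[group].append(value)
    return {
        group: {
            "n": len(members),
            "min": min(members),
            "max": max(members),
            "mean": statistics.mean(members),
        }
        for group, members in grouped.items()
    }

# Hypothetical weights (lbs) and genders, one pair per respondent.
weights = [180, 200, 190, 130, 150, 140]
genders = ["Male", "Male", "Male", "Female", "Female", "Female"]

by_gender = describe_by_group(weights, genders)
for gender, stats in by_gender.items():
    print(gender, stats)
```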


Explore gives you many of the basic descriptives plus some nice graphics.

1. In the Analyze menu, select Descriptive Statistics and then left-click Explore.

2. In the Explore dialog, select our variable, Q.45 Respondent WEIGHT IN POUNDS, and move it into the Variable box with the arrow. Click OK.

3. In the Output Viewer, you get a number of results. The Descriptives chart gives you many of the important statistics that we discussed in Frequencies and Descriptives.


Stop! Positively skewed data has the long tail of the distribution extending toward the high end of your scale. Negatively skewed data has the long tail extending toward the low end. As a rule of thumb, your data are considered skewed when the Skewness result is over 1 (or below -1 for negative skew).
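For readers curious where a Skewness number comes from, here is a sketch in Python using one common sample formula (the adjusted Fisher-Pearson coefficient). Whether it matches the SPSS output digit for digit is an assumption, and the function name and sample data are made up for illustration.

```python
import statistics

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (needs at least 3 values)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    cubed = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed

symmetric = [1, 2, 3, 4, 5]          # no long tail
right_tailed = [1, 1, 1, 2, 10]      # long tail toward high values
left_tailed = [-v for v in right_tailed]  # mirror image: long tail toward low values

print(skewness(symmetric))
print(skewness(right_tailed))
print(skewness(left_tailed))
```

A symmetric set gives a result near 0, a long tail toward high values gives a positive result, and a long tail toward low values gives a negative result.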

The Boxplot is a good graphic as well. It depicts the cases as if they were lined up from lowest to highest. 1) The start and finish of the I figure mark the beginning and the end of the range. 2) The bottom and top of the red box mark the 25th and 75th percentiles. So, 25% of the cases fall below the bottom line of the box and 75% of the cases fall below the top line. 3) The thick line in the middle of the box denotes the median, or the exact middle case if the cases were lined up lowest to highest. The numbers outside the I figure are considered outliers, which are cases that are extreme and don't fit into the normal distribution.
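The numbers behind a boxplot can be sketched in Python as follows. The tutorial does not say how outliers are chosen; this sketch assumes the common convention of flagging cases more than 1.5 box lengths beyond the box, and the percentile method used by Python may differ slightly from the one SPSS uses. The function name and data are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """Return quartiles, median, and outliers (1.5 x box length rule, an assumption)."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # 25th, 50th, 75th percentiles
    box_length = q3 - q1
    low_fence = q1 - 1.5 * box_length
    high_fence = q3 + 1.5 * box_length
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical cases with one extreme value.
cases = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
summary = boxplot_summary(cases)
print(summary)
```

Here the value 100 sits far outside the box, so it would be drawn as a separate point beyond the I figure.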


SPSS offers some other nice functions to visualize data through its Graphs menu. Let's explore some of them. First, let's make a histogram of our weight data.

1. In the Graphs menu, left-click on Chart Builder.


2. If you get a Chart Builder dialog, it's simply reminding you that you need to set the level of measurement correctly or your charts won't look right. Please select Don't show this dialog again and press OK.

A histogram shows the number of cases that fall within each interval.

If you're uncertain about something, go to the Help menu and left-click Topics. Click on the Index tab and then you can type in the subject. The results can give you definitions and info on how to use the function.

3. In the Chart Builder, 1) click on Histogram, the first graphic. 2) Then double-left click on the first graph, Simple Histogram, in the bottom middle of the window.

4. 1) The Element Properties window opens on the left, which contains more controls for the graph. You can move it out of your way. 2) Graphs in SPSS are built by dragging and dropping variables from the Variables menu into the chart preview area. Move Q.45 Respondent WEIGHT IN POUNDS over to the X-Axis? slot at the bottom of the chart.


5. It is also very descriptive to show the Normal Curve for the histogram. 1) In the Element Properties, select Display normal curve. 2) Finish your graph by pressing OK in the Chart Builder.


6. In the Output window, you have a histogram of the respondents' weight.
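A histogram is simply a count of cases per interval. A minimal Python sketch of that counting, with a made-up interval width of 25 lbs and hypothetical weights (SPSS chooses its own intervals):

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count the cases falling in each interval [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical weights in pounds.
weights = [118, 125, 131, 149, 150, 162, 175, 175, 201]
bins = histogram_counts(weights, 25)

# Print a simple text histogram: one * per case in the interval.
for start, count in bins.items():
    print(f"{start}-{start + 24} lbs: " + "*" * count)
```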

Another good graphic is a pie chart, where each pie slice represents a value within a variable. Let's make a pie chart of the percentage of participants by gender.

1. Go back to the Data Editor and open up the Chart Builder. 1) In the Gallery menu, left-click Pie. 2) Double-left click on the Pie Chart icon, and the pie format appears in the chart preview area. 3) Drag and drop the gender variable into the Slice by? slot.


2. You also need to select what values are used for the slices. You want to show only men and women, not values such as No Answer or Don't Know. 1) In the Element Properties window, select GroupColor (PolarInterval1) in the Edit Properties of menu. 2) In the Order menu, you see all the labels appear. Select each of the labels except Male and Female and click the exclude button. 3) Click Apply. 4) Back in the Chart Builder, click OK.

3. In the Output window, you see a pie chart for Male and Female. Let's be more descriptive and show the percentage of the whole that each slice represents.


4. 1) Double-left click on the pie chart and bring up the Chart Editor. 2) Left-click on the Show Data Labels button and bring up the Properties window with Data Value Labels tab selected.


5. 1) In the Not Displayed menu, select the Percent variable and left-click on Move Variable to Contents to move the variable into the Displayed menu. 2) In the Displayed menu, select the Count variable and left-click on exclude to move it down into Not Displayed. 3) Left-click on Apply. The values in each slice of the pie chart are now percentages. Close out of Properties and Chart Editor.
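The percentages on the slices are each slice's count divided by the total of the slices that are shown. A short Python sketch, with invented counts and the hypothetical function name slice_percentages:

```python
def slice_percentages(counts, keep):
    """Percent of the displayed total for each label in keep; other labels are excluded."""
    kept = {label: n for label, n in counts.items() if label in keep}
    total = sum(kept.values())
    return {label: round(100.0 * n / total, 1) for label, n in kept.items()}

# Hypothetical counts; No Answer is excluded from the pie, as in step 2 above.
gender_counts = {"Male": 450, "Female": 550, "No Answer": 20}
shares = slice_percentages(gender_counts, keep={"Male", "Female"})
print(shares)
```

Note that excluding a label changes the total, so the remaining slices always add up to 100%.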


A bar chart is also very useful.

1. Back in the Data View, go to the Graphs menu and left-click on Chart Builder (you can also do this from the Output Viewer).

2. In the Chart Builder, 1) left-click on Bar. 2) Double-left click on Simple Bar. Q921 Gender should still be in the Slice by? slot.

3. In the Element Properties window, 1) in the Statistic pull-down menu, select Mean. Each gender bar will show the mean weight. 2) Click Apply.


4. Now select the values that you want to chart. 1) In Edit Properties of, left-click X-Axis1 (Bar1). 2) In the Order menu, exclude all values except Male and Female. 3) Click Apply. 4) Move the Q45 Respondent WEIGHT into the weight slot. 5) Back in the Chart Builder, click OK.

5. In the Output window, you have your bar chart of men and women showing. Do you remember how to show the mean weight number? (In the Chart Editor)

Another good function of Output is that you can copy and paste results into a Word document. So, if you're writing an essay, you can just place your stats results right into your document.

1. Right-click on our histogram and then left-click on Copy.

2. Open up a Word document.

3. In our Word document, left-click on the screen and then right-click and select Paste. The histogram appears and you can position it as you wish. Please close Document1 and don't save it.

Finally, you can print the results you want to use. Output allows you to close some of the results you don't want, and print the rest. Let's leave the Graph for the histogram open and close the rest. We can then print the results we left open.

1. There are two ways to print. The first way is by clicking on the print icon.


The second way is by 1) going into the File menu and left-clicking on Print.

2) In the Printer pulldown, make sure you're printing to the right printer. Click on OK.

Remember, some of your results can be very long. For example, our Frequency chart for weight is very long and would not fit on one page. Sometimes you need to play with the data and your results, and look at them from different angles to see what works best for you.

In our next lesson, we will explore how to do inferential statistics in SPSS. With this form of statistics, you can form and test hypotheses about the whole population from our sample. For now, you can do some further study on what we have just learned.


Finding and Understanding Data
Milner Library: Finding Statistics - Understanding Statistics (http://www.mlb.ilstu.edu/learn/stat/understanding4.htm)
Baker Library Guide: Statistics: Understanding Statistics (http://www.library.hbs.edu/guides/statistics/understanding.html)

Descriptive Statistics
Introduction to Descriptive Statistics (http://www.mste.uiuc.edu/hill/dstat/dstat.html)
HyperStat Online: Descriptive Statistics (http://davidmlane.com/hyperstat/A28521.html)
School of Psychology, University of New England: Chapter 4: Analysing the Data; Part 4: Descriptive Statistics (http://www.une.edu.au/WebStat/unit_materials/c4_descriptive_statistics/)

SPSS
SPSS Home Site (http://www.spss.com/)
Raynald's SPSS Tools (http://www.spsstools.net/)

If you have any questions, contact me at:
Thomas Stieve
863-7978
Thomas_Stieve@brown.edu
