3 September 2013

Page 1 of 8

Getting Started with SPSS


Stat E100 Fall 2013
The purpose of this tutorial is to learn how to download, install, and use SPSS for data manipulation,
visualization, and simple analysis.

1. Downloading and Installing onto your PC


Downloading
You can get a free site-licensed copy of SPSS for your computer through HUIT. The very first thing
you need to do is activate your Harvard PIN account (if you have not already done so). That can be
done here:
http://www.pin.harvard.edu/
You will also need an FAS email account. If you do not already have one, please go here to get one:
http://www.fas-it.fas.harvard.edu/accounttools
Next, you will need to download the install program from the FAS software download website:
- Go to the software download center (you will need to log in using your Harvard PIN):
http://downloads.fas.harvard.edu/download
(If you are using a Mac, make sure you click on the correct platform for proper installation.)
- Scroll down to the program SPSS Statistics and click on the download button. A window should
pop open to download the install program: save this to a convenient location on your computer, such as
your desktop. (If you have a pop-up blocker installed, you may have to click on the banner that opens
near the top of the screen.) This is a large file (over 2.0 GB for both PCs and Macs), so it may take a
little while to download (especially if you are off campus). Make sure you download the appropriate
version for your operating system: either the Windows version for PCs or the Mac version for Macs.
- You will need to send an email to HUIT Support (help@fas.harvard.edu) and request an
Authorization Code for the installation. Be sure to send this email from your FAS account!

Installing
Once you have downloaded the program as outlined above and have the Authorization Code in hand,
you need to install the program. Open the file you just downloaded: presumably it will be
called SPSS21.exe for PCs and SPSS_Statistics_21.dmg for Macs (unpacking it may take a
few minutes). When the first window opens, click on Install IBM SPSS Statistics 21; this
installation may also take a few minutes. You can click through all of the defaults (which languages
to install besides English, where to save the files on your computer, etc.). Towards
the end of the installation, however, you will be asked whether you want to supply a license for your
product. Be sure to select the options "Site license" and "License my product now." Once installation
is complete, you can delete the install program you downloaded (presumably on your desktop or in your
download folder); it was used for installation only. Now, restart your computer before starting
SPSS for the first time.

2. Start-Up and Data Manipulation


Start-Up
To open SPSS on a computer lab PC (or on your own computer, if you followed the directions
above), click on Start > Programs > IBM SPSS Statistics 21. A screen should pop up that
looks similar to this:

Notice on the right you can Run the tutorial that SPSS provides. Feel free to do that if you like (not
required).
This window is the start window, and SPSS is automatically asking if you would like to open a
dataset to begin with. If you click Cancel, you will be left with a blank dataset in SPSS's
memory. For now, just click Cancel. You should then see a window that looks similar to this:

This is SPSS's Data Editor. A few things to notice in this window:

1) The spreadsheet in the middle of the screen will show the data once you open a dataset.
2) The menus at the top of the screen. We'll be using the Analyze and Graphs menus a lot.
3) The buttons just below the menus. These are just shortcuts for some of the menu options. This
tutorial will stress the menu options more than the shortcuts (as they are easier to explain).
Now, it is time to import/open up a dataset.

Importing Data
Most datasets used in this class will be Excel spreadsheets, where the name of the Excel file
ends with .xls or .xlsx. This tutorial uses the 2012 Election SPSS Tutorial.xls data file found
on the course website here (under the SPSS Material tab):
http://isites.harvard.edu/icb/icb.do?keyword=k97297&pageid=icb.page624056
Save this file on your computer's desktop to start. To open the file in SPSS, click on the menu File
> Open > Data. In the window that pops open, change the Files of type: option to Excel (*.xls,
*.xlsx, *.xlsm), then browse your computer for the file and select it. In the next window that pops
open, be sure the box that reads Read variable names from the first row of data is checked, and then
click OK. Another window, the Output window, will open and show the
commands that SPSS used behind the scenes to read in the dataset. You should also now see the data
values read into SPSS's Data Editor. The beginning of it should look similar to this:

Note: you can manually enter or change data in this Data Editor. You will almost never need to do that
in this class (though you may for your project). I usually do my data manipulation in the original
Excel file, and re-read in the updated dataset if I need to make any changes to the dataset.

3. Data Visualization
Once the dataset is read in, the next concern is how to work with the data. Problem Set 1 discusses
how to use the SPSS menus to produce simple graphs and summary statistics, and that material is
repeated here. Please note: some of the methods illustrated in this tutorial will not be used until
the second week of class or later.

Summary Statistics
To get frequencies of a categorical variable, use the menu Analyze > Descriptive Statistics >
Frequencies. In the window that opens up, choose the region variable by double-clicking on it (or by
dragging it over to the Variable(s) list). Make sure the Display frequency tables box at the bottom
of the window is checked. Click OK.

region

                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid   MW                     13      26.0            26.0                 26.0
        NE                     11      22.0            22.0                 48.0
        [label missing]        13      26.0            26.0                 74.0
        [label missing]        13      26.0            26.0                100.0
        Total                  50     100.0           100.0
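
The Percent and Cumulative Percent columns follow directly from the Frequency column. As a cross-check outside SPSS, a minimal Python sketch is shown here; it is an illustration, not part of the SPSS workflow. The names row_3 and row_4 are placeholders for the two row labels that are missing from the table in this copy.

```python
from itertools import accumulate

# Counts read from the Frequency column of the table above.
# "row_3" and "row_4" are placeholder names: the real labels of those
# two rows are missing from this copy of the handout.
counts = {"MW": 13, "NE": 11, "row_3": 13, "row_4": 13}

total = sum(counts.values())

# Percent = 100 * frequency / total, rounded to one decimal as in SPSS output.
percents = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Cumulative Percent = running total of the frequencies, as a percentage.
running_totals = accumulate(counts.values())
cumulative = [round(100 * c / total, 1) for c in running_totals]

for (label, n), cum in zip(counts.items(), cumulative):
    print(f"{label:<6} {n:>3} {percents[label]:>6.1f} {cum:>6.1f}")
print(f"{'Total':<6} {total:>3}")
```

Because no values are missing in this dataset, Valid Percent equals Percent for every row.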

To get some quick statistics on a quantitative variable, use the same menu: Analyze > Descriptive
Statistics > Frequencies (the Descriptives menu option does not allow you to calculate the
quartiles). Choose the romney and gsp variables by double-clicking on them or by dragging them over
to the Variable(s) list (you may need to remove the variable region from this list first). Click on the
Statistics button, and then make sure to select the following statistics to be calculated: Quartiles,
Mean, Median, Std. deviation, Minimum, and Maximum. Click Continue and then OK (before clicking
OK you may want to uncheck the Display frequency tables box so that you don't get a huge table
that is not needed), and the results should show up in the Output window like this:
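
To see what each of these statistics measures, the same quantities can be sketched in Python with the standard library. The data values below are hypothetical and are not the romney or gsp columns. Quartile conventions differ between programs, so the quartiles from this sketch may not agree with SPSS on every dataset.

```python
import statistics

# Hypothetical values for illustration only; NOT the romney or gsp data.
values = [2, 4, 4, 5, 7, 9, 10, 12]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    # Sample standard deviation (divides by n - 1).
    "std_dev": statistics.stdev(values),
    "minimum": min(values),
    "maximum": max(values),
    # Three cut points (Q1, Q2, Q3) using Python's default "exclusive" method.
    "quartiles": statistics.quantiles(values, n=4),
}

for name, value in summary.items():
    print(name, value)
```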

Histograms
To get a quick view of the distribution of a variable, use the menu Graphs > Chart Builder. In the
window that opens up, click on Histogram in the lower left. In the chart options, double-click on the
first one. Then drag the variable romney to the x-axis at the top of the
window. Then click OK. Your histogram should look like this:
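
A histogram is just a count of how many values fall into each equal-width interval. The sketch below illustrates that counting step in Python; the data, the bin width of 10, and the starting point of 20 are all hypothetical choices, and SPSS picks its own bins automatically.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall into each bin [left, left + bin_width)."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        left = start + index * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical percentages, NOT the romney column from the dataset.
sample = [28, 33, 37, 41, 42, 45, 47, 48, 52, 55, 58, 61, 73]

counts = histogram_counts(sample, bin_width=10, start=20)
for left, n in counts.items():
    # One '#' per observation gives a rough text version of the bars.
    print(f"{left:>3}-{left + 10:<3} {'#' * n}")
```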

*Note: if you double-click on the graph in the Output window, a Chart Editor window will open up.
Here you can edit the chart (change axis names, change chart dimensions, add a legend, remove the
weird statistics in the upper-right of the chart, etc.).

Boxplots
To produce a boxplot of a variable, we'll again use Graphs > Chart Builder. This time
select Boxplot at the bottom of the window, and then double-click on the 3rd option tile.
You can also split a boxplot into different categories. Choose the 1st option tile for boxplots, and put
romney on the y-axis and region on the x-axis. Click OK and the result should look like this:

Scatterplots
To get a quick visual of how two quantitative variables are related, we will again use the Graphs >
Chart Builder menu. Select Scatter/Dot at the bottom of the window that opens up. Choose the first
tile, and then drag romney to the y-axis and gsp to the x-axis. Then click OK.
To add the regression line to the scatterplot, double-click on the scatterplot in the Output window to
open up the Chart Editor. Above the graphic there is a button that adds the Fit line to the graph.
Click that button and make sure Linear is selected, and the linear
regression line should appear on the graph (along with the formula for the line). Exit out of the Chart
Editor, and the graph should be updated in the Output window. It should look like this:

Saving and Printing Graphs


The easiest way to print a histogram, scatterplot, etc. is to right-click on the graph itself (in the
middle), and then copy and paste it into a word processor. From there you can add comments, adjust the
size, etc.

Creating a New Variable


Sometimes you may want to transform a variable (possibly to make it more symmetric). One of the
most common transformations is the log transformation (square root and square transformations are
common too). To create a new variable that is the log of the original values (in this case the percent of
the state population that is non-white), use the menu Transform > Compute Variable. In the window
that pops up, you can type any variable name you want for the Target Variable (I called the new
variable log10nonwhite), and in the Numeric Expression field, type in LG10(nonwhite) as shown
below:

Click OK. Back in the Data Editor window you should now see a new variable (new column) with
the name you gave it, and the first three entries should be 1.46, 1.49, and 1.35 (since 10^1.46 ≈ 28.9,
up to rounding).
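
The arithmetic behind the first entry can be checked in a few lines of Python. This is only a sketch of the calculation: math.log10 plays the role of SPSS's LG10, and 28.9 is the value implied by the tutorial's first entry.

```python
import math

# 28.9 is the nonwhite percentage implied by the first entry in the tutorial.
nonwhite = 28.9

log10_value = math.log10(nonwhite)   # counterpart of LG10(nonwhite)
sqrt_value = math.sqrt(nonwhite)     # square root transformation
square_value = nonwhite ** 2         # square transformation

print(round(log10_value, 2))         # the value shown in the new column
print(round(10 ** log10_value, 1))   # back-transforming recovers the original
```

Raising 10 to the unrounded log recovers 28.9 exactly; starting from the rounded 1.46 gives about 28.8, which is why the relationship in the text holds only up to rounding.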

4. Data Analysis

In Unit 2, we will learn to measure and analyze the association between two variables (correlation and
regression). Later in the course, we will see many more ways to do analysis (confidence intervals,
hypothesis testing, ANOVA, etc.). Let's do some work on what we know for now:

Correlation
To find the correlation coefficient between two (or more) variables, use the menu Analyze > Correlate
> Bivariate. In the window that pops up, drag romney, nonwhite, and log10nonwhite into the
Variables list. Click OK, and your results should look like this:
Correlations
                                       romney    nonwhite    log10nonwhite
romney          Pearson Correlation    1         -.323*      -.206
                Sig. (2-tailed)                  .022        .150
                N                      50        50          50
nonwhite        Pearson Correlation    -.323*    1           .908**
                Sig. (2-tailed)        .022                  .000
                N                      50        50          50
log10nonwhite   Pearson Correlation    -.206     .908**      1
                Sig. (2-tailed)        .150      .000
                N                      50        50          50
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).

Here, we see the correlation (Pearson Correlation) is -0.323 between romney and nonwhite, -0.206
between romney and log10nonwhite, and 0.908 between nonwhite and log10nonwhite.
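
For readers who want to see what the Pearson Correlation row computes, here is a minimal Python sketch of the formula. The x and y lists are hypothetical and are not taken from the election data. The second function computes the t statistic (with N - 2 degrees of freedom) that underlies a two-tailed test of zero correlation, applied to the reported r = -.323 and N = 50; turning it into the Sig. value requires a t distribution, which this sketch does not attempt.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing zero correlation."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical data, NOT taken from the election dataset.
x = [1, 2, 3, 4, 5]
y = [9, 7, 6, 4, 1]
print(round(pearson_r(x, y), 3))

# Values reported in the SPSS table: r = -.323 between romney and nonwhite, N = 50.
print(round(t_statistic(-0.323, 50), 2))
```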

Regression
To get the printout of a regression (to find the estimates for the slope and intercept of a line, among
other things), use the menu Analyze > Regression > Linear. In the window that opens up, drag romney
into the Dependent variable location, and gsp into the Independent(s) variable location. Click
OK, and your results should look like this:
Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .360^a   .130       .112                9.46513
a. Predictors: (Constant), gsp

ANOVA^a
Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   641.049          1    641.049       7.155   .010^b
    Residual     4300.255         48   89.589
    Total        4941.303         49
a. Dependent Variable: romney
b. Predictors: (Constant), gsp

Coefficients^a
                 Unstandardized Coefficients   Standardized Coefficients
Model            B         Std. Error          Beta                        t        Sig.
1   (Constant)   69.790    7.442                                           9.378    .000
    gsp          -.480     .180                -.360                       -2.675   .010
a. Dependent Variable: romney
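
The numbers in these three tables are tied together by a few formulas, which the following Python sketch checks using only values from the output. The regression df of 1 follows from Total df (49) minus Residual df (48). The gsp value of 40 in the last line is a hypothetical input chosen purely to show how the fitted line is used.

```python
import math

# Values copied from the SPSS output.
ss_regression, df_regression = 641.049, 1   # df = 49 - 48
ss_residual, df_residual = 4300.255, 48
ss_total = 4941.303
intercept, slope = 69.790, -0.480

n = df_regression + df_residual + 1          # 50 observations

r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
ms_residual = ss_residual / df_residual
f_statistic = (ss_regression / df_regression) / ms_residual
std_error_estimate = math.sqrt(ms_residual)

def predict_romney(gsp):
    """Fitted value from the estimated line: romney = 69.790 - 0.480 * gsp."""
    return intercept + slope * gsp

print(round(r_squared, 3), round(adjusted_r_squared, 3))
print(round(f_statistic, 3), round(std_error_estimate, 3))
print(predict_romney(40))   # 40 is a hypothetical gsp value, for illustration only
```

The rounded results agree with the R Square, Adjusted R Square, F, and Std. Error of the Estimate entries in the output.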

We often would like to look at the residual plot for a regression to see if the
assumptions are met (that is, to see if there is a pattern in the residuals, like a U-shape). To get the
residual vs. fitted plot (the fitted values being your ŷ_i), go back to the Analyze > Regression >
Linear menu. In the window that opens up, drag romney into the Dependent variable location, and gsp
into the Independent(s) variable location. Click Save on the right. In this window, check Unstandardized
in both the Predicted Values and Residuals panels. Click Continue and then OK, and two new
variables should be available in the Data Editor: PRE_1 and RES_1. Now just create a scatterplot
(from the Graphs > Chart Builder menu) with RES_1 as the dependent (y-axis) variable and PRE_1 as the
independent (x-axis) variable (or use gsp as the independent variable). You can also create a histogram
of the residuals (using the variable RES_1). Those graphs will look like this:
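
To make clear what PRE_1 and RES_1 contain, here is a minimal Python sketch that fits a least-squares line and computes the fitted values and residuals by hand. The x and y lists are hypothetical, and the function name fit_line is an illustrative choice rather than anything from SPSS.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data, NOT the gsp and romney columns.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]        # plays the role of PRE_1
residuals = [yi - pi for yi, pi in zip(y, predicted)]   # plays the role of RES_1

for p, r in zip(predicted, residuals):
    print(f"{p:7.3f} {r:7.3f}")
```

Least-squares residuals always sum to zero (up to floating-point error), so the residual plot is centered on zero; what matters is whether the points show a pattern around that line.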