Using Excel:
Correlation and Regression
To find the correlation coefficient:
1. Click the fx (Insert Function) button and choose the CORREL function. Enter the x data range as Array1 and the y data range as Array2.
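To check the CORREL result outside Excel, the same calculation can be sketched in Python. This is a minimal sketch, not part of the Excel procedure: the function name correl is hypothetical (chosen to mirror the Excel function), and the data are the temperature and chirp values listed in the data table of this handout.

```python
import math

# Temperature (F) and chirps per minute, copied from the data table in this handout.
temperature = [53, 62, 57, 71, 78, 66, 84, 87, 96, 91, 94, 96]
chirps = [20, 32, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220]


def correl(xs, ys):
    """Pearson correlation coefficient (hypothetical helper mirroring Excel's CORREL)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = correl(temperature, chirps)
print(round(r, 4))
```

The printed value should agree with the Multiple R entry (0.9357) in the summary output of this handout.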
To make a scatterplot with (or without) a least-squares line:
1. Select the x and y data columns. Hold the Ctrl key while selecting to pick non-adjacent columns of data.
2. Click Insert on the menu bar and choose Scatter (the version with no lines).
3. Click anywhere on the graph and, under Chart Tools, select Design.
4. Choose a design that has axis titles and edit the titles accordingly.
5. Right-click any point on the graph and select Add Trendline.
6. In the window that opens, check Linear and Display Equation.
7. Edit the resulting line and equation to your preference.
8. Note: You can also start by inserting a scatterplot and then adding the data. This is a little trickier,
but if you follow the directions you can get to the same endpoint.
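The equation that the linear trendline displays is the least-squares line. As a rough check outside Excel, the slope and intercept can be computed as in the sketch below. The function name least_squares_line is hypothetical, and the data are the temperature and chirp values from the data table of this handout.

```python
# Temperature (F) and chirps per minute, copied from the data table in this handout.
temperature = [53, 62, 57, 71, 78, 66, 84, 87, 96, 91, 94, 96]
chirps = [20, 32, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(temperature, chirps)
print(f"y = {slope:.4f} x + {intercept:.4f}")
```

The result should agree with the Coefficients column of the summary output in this handout (slope 4.0669, intercept -204.2138).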
The following screen-shots show the step-by-step process.
Inserting a scatterplot:
Conclusions:
1. If the data appear to be linearly related, and
2. if there are no outliers that could distort the regression equation, and
3. if |r| is greater than the critical value of r from Table 4,
4. then you can use the regression equation (y = mx + b) to make predictions about y given x.
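The numeric part of this checklist (conditions 3 and 4) can be sketched in Python. The linearity and outlier checks are visual and are not covered here. The slope, intercept, and r are taken from the regression output in this handout. The critical value 0.576 is an assumption: it is the usual tabulated value for n = 12 at the 0.05 significance level, and should be checked against Table 4. The function names are hypothetical.

```python
# Values read from the regression output in this handout.
SLOPE = 4.0669        # m in y = m x + b
INTERCEPT = -204.2138  # b in y = m x + b
R = 0.9357            # correlation coefficient

# ASSUMPTION: critical value of r for n = 12 at the 0.05 level,
# as found in standard tables. Confirm against Table 4.
R_CRITICAL = 0.576


def correlation_is_significant(r, r_critical):
    """Condition 3: |r| must exceed the critical value of r."""
    return abs(r) > r_critical


def predict(x, slope=SLOPE, intercept=INTERCEPT):
    """Condition 4: predict y from x using the regression equation."""
    return slope * x + intercept


if correlation_is_significant(R, R_CRITICAL):
    # Predicted chirps per minute at 80 degrees F.
    print(round(predict(80), 1))
else:
    print("Correlation not significant; do not use the regression equation.")
```

Predictions are most trustworthy for x values inside the range of the observed data (here, 53 to 96 degrees F).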
More extensive correlation and regression options are available in the Analysis ToolPak.
The Analysis ToolPak is available with all PC versions of Excel. Here is how to install the
Analysis ToolPak on a PC (see the Mac notes below).
1. Open a blank Excel spreadsheet.
2. Click the Windows icon (pre-2010) or the File tab (2010+).
3. Choose Excel Options (pre-2010) or just Options (2010+).
4. Choose Add-ins.
5. In Manage (bottom of the window), choose Excel Add-ins and click Go.
6. Check the box labeled Analysis ToolPak and click OK.
7. After you load the Analysis ToolPak, the Data Analysis command is available under the Data tab.
It should be the rightmost option.
Mac Notes: As of this writing, if you are running Excel 2008 or later on a Mac, the Analysis ToolPak
is not available. There is an application called StatPlus:Mac LE, which is a free version of the full
StatPlus application. It can handle most of the tasks performed by the Analysis ToolPak. The full
version is probably superior, but it costs money.
The Standard Error in the output is $s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$ and is used in calculating a prediction interval.
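The standard error of the estimate, the square root of the sum of squared residuals divided by n - 2, can be checked with a short Python sketch. This is only a sketch: the function name standard_error_of_estimate is hypothetical, and the data are the temperature and chirp values from the data table of this handout.

```python
import math

# Temperature (F) and chirps per minute, copied from the data table in this handout.
temperature = [53, 62, 57, 71, 78, 66, 84, 87, 96, 91, 94, 96]
chirps = [20, 32, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220]


def standard_error_of_estimate(xs, ys):
    """sqrt( sum (y - y_hat)^2 / (n - 2) ) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Residual = observed y minus the y predicted by the line.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sum_sq = sum(e ** 2 for e in residuals)
    return math.sqrt(sum_sq / (n - 2))


s_e = standard_error_of_estimate(temperature, chirps)
print(round(s_e, 4))
```

The result should agree with the Standard Error entry (25.2188) under Regression Statistics in the summary output.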
temperature (F)    chirps (per minute)
53                 20
62                 32
57                 40
71                 60
78                 80
66                 100
84                 120
87                 140
96                 160
91                 180
94                 200
96                 220
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9357
R Square             0.8755
Adjusted R Square    0.8631
Standard Error       25.2188
Observations         12

ANOVA
              df    SS            MS            F          Significance F
Regression    1     44738.7783    44738.7783    70.3452    0.0000077650
Residual      10    6359.8884     635.9888
Total         11    51098.6667

                   Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept          -204.2138       38.4764           -5.3075    0.00034    -289.9446    -118.4830
temperature (F)    4.0669          0.4849            8.3872     7.8E-06    2.9865       5.1473
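The ANOVA portion of the summary output can be reproduced from the data with a short Python sketch. It splits the total sum of squares into regression and residual parts, then forms R Square, Adjusted R Square, and F. The function name regression_anova and the dictionary keys are hypothetical. The data are the temperature and chirp values from this handout.

```python
# Temperature (F) and chirps per minute, copied from the data table in this handout.
temperature = [53, 62, 57, 71, 78, 66, 84, 87, 96, 91, 94, 96]
chirps = [20, 32, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220]


def regression_anova(xs, ys):
    """Sums of squares, R Square, Adjusted R Square, and F for a one-predictor fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x

    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_regression = ss_total - ss_residual

    df_regression = 1          # one predictor
    df_residual = n - 2        # n observations minus two fitted parameters
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual

    r_square = ss_regression / ss_total
    adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
    return {
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "ss_total": ss_total,
        "r_square": r_square,
        "adjusted_r_square": adjusted_r_square,
        "f": ms_regression / ms_residual,
    }


anova = regression_anova(temperature, chirps)
for name, value in anova.items():
    print(f"{name}: {value:.4f}")
```

The values should agree with the SS column, R Square, Adjusted R Square, and F in the summary output.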
RESIDUAL OUTPUT

Observation    Residuals
1              8.667303367
2              -15.9349333
3              12.39964263
4              -24.53716997
5              -33.00557627
6              35.79740596
7              -17.40706738
8              -9.607812933
9              -26.2100496
10             14.12452633
11             21.92378077
12             33.7899504
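Key values in the summary output:
Multiple R: correlation coefficient
P-value in the temperature (F) row: P-value of the test statistic
Intercept coefficient: y-intercept of the regression line
temperature (F) coefficient: slope of the regression line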
[Figure: residual plot, with Residuals on the vertical axis and temperature (F) on the horizontal axis.]
There is a slight U-shaped pattern here, so a linear fit might not be best.