JHU SON
Spring 2017 course
Essential Statistics, Instructor's Copy, Moore/Notz/Fligner, 2nd edition
Transcribed course lecture by Janna Stephens, PhD, RN
Notes on Correlation:
1. Correlation makes no distinction between explanatory and response
variables (it doesn't matter which variable you call x or y).
2. r has no units and does not change when we change the units of
measurement of x, y, or both (e.g., you can measure weight in pounds or
kilograms, or height in cm or inches, and it won't change the correlation
between height and weight).
3. Positive r indicates positive association between the variables, and
negative r indicates negative association.
4. The correlation r is always a number between -1 and 1.
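The four properties above can be checked numerically. The sketch below is only an illustration: the height and weight values are made up for this example (they are not from the textbook), and correlation() is a helper written here, not a function from any particular package.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation r of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: heights in inches, weights in pounds.
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [115, 120, 140, 155, 160, 180]

r = correlation(height_in, weight_lb)

# Property 1: swapping the roles of x and y gives the same r.
r_swapped = correlation(weight_lb, height_in)

# Property 2: converting to cm and kg does not change r.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]
r_metric = correlation(height_cm, weight_kg)

# Property 3: flipping the direction of the association flips the sign of r.
r_flipped = correlation(height_in, [-w for w in weight_lb])

print(r, r_swapped, r_metric, r_flipped)
```

In this made-up data set taller people weigh more, so r is positive; negating the weights reverses the association and r becomes negative with the same magnitude.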
We want to see how tightly the points are grouped around the regression
line. So we look at each data point and draw a vertical line from the data
point to the regression line. The signed lengths of these lines (observed y
minus predicted y) are the residuals. Then we plot the residuals against x
on a residual plot.
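As a rough sketch of the residual idea described above: fit the least-squares line, then take observed y minus predicted y for each point. The data values and the helper name least_squares_line() are assumptions made for this illustration only.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_line(x, y)

# Residual = observed y minus the y predicted by the line.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

# A residual plot puts x on the horizontal axis and the residual on the
# vertical axis; here the pairs are printed instead of drawn.
for a, e in zip(x, residuals):
    print(f"x = {a}, residual = {e:+.3f}")
```

For a least-squares line the residuals always sum to zero (up to rounding), so a residual plot is centered on the horizontal line at 0.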
Recall that an outlier is an observation that lies far away from the other
observations.
Outliers in the y direction have large residuals.
Outliers in the x direction are often influential for the least-squares
regression line, meaning that the removal of such points would
markedly change the equation of the line.
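The difference between the two kinds of outliers can be seen in a small constructed example. The numbers below are invented for this sketch, and the y outlier is deliberately placed at the mean of x so that it does not move the slope at all; real data will rarely be this clean.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical base data with no outliers.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
base_intercept, base_slope = least_squares_line(x, y)

# Add an outlier in the y direction: ordinary x, unusually large y.
iy, sy = least_squares_line(x + [3], y + [12])
y_outlier_residual = 12 - (iy + sy * 3)

# Add an outlier in the x direction: x far from all the others.
ix, sx = least_squares_line(x + [15], y + [2])
x_outlier_residual = 2 - (ix + sx * 15)

print("base slope:          ", round(base_slope, 3))
print("slope with y outlier:", round(sy, 3), "residual:", round(y_outlier_residual, 3))
print("slope with x outlier:", round(sx, 3), "residual:", round(x_outlier_residual, 3))
```

Here the y outlier has a large residual but leaves the slope where it was, while the x outlier pulls the line toward itself (the slope even changes sign) and ends up with a small residual. That is why an influential point cannot always be spotted from the residual plot alone.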
Also, we discussed previously how the correlation (r) describes the
strength of a straight-line relationship. In the regression setting, this
description takes the form of r^2: the square of the correlation is the
fraction of the variation in the values of y that is explained by the
least-squares regression of y on x.
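To make the "fraction of variation explained" reading concrete, the sketch below computes r^2 two ways on a small made-up data set: by squaring the correlation, and as 1 minus (sum of squared residuals) / (total variation of y about its mean). The data and variable names are assumptions for illustration.

```python
from math import sqrt

# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)  # total variation in y

# Correlation, then its square.
r = sxy / sqrt(sxx * syy)
r_squared = r ** 2

# Least-squares line of y on x and its residuals.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
unexplained = sum(e ** 2 for e in residuals)

# Fraction of the variation in y explained by the regression.
fraction_explained = 1 - unexplained / syy

print(round(r_squared, 4), round(fraction_explained, 4))
```

The two numbers agree, which is the sense in which r^2 is the fraction of the variation in y explained by the least-squares line.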