Anda di halaman 1dari 31

FELIPE REGO

HOME ABOUT SERVICES CASE

23 Oct 2015

QUICK
GUIDE:
INTERPRETI
SIMPLE
LINEAR
MODEL
OUTPUT
IN R
Linear regression models are
a key part of the family of
supervised learning models.

supervised learning models.

In particular, linear

I’ve written on Simple Linear

Regression - An example
using R. In general, statistical

analyst who is starting with

linear regression in R to

cars dataset found in the

datasets package in R

package you can call:

library(help =

"datasets") .

summary(cars)
## speed
## Min. : 4.0 Min.
## 1st Qu.:12.0 1st Q
## Median :15.0 Medi
## Mean :15.4 Mean
## 3rd Qu.:19.0 3rd Q
## Max. :25.0 Max.

speed variable varies from

cars with speed of 4 mph to

stop.

Below is a scatterplot of the

variables:

plot(cars, col='blue', p
xlab="Speed in m
From the plot above, we can

distil and interpret the key

components of the R linear

model but we are more

interested in interpreting

would then allow us to

potentially de ne next

steps in the model

building process.

price drop

Let’s get started by running

one example:
set.seed(122)
speed.c = scale(cars\$spe
mod1 = lm(formula = dist
summary(mod1)

##
## Call:
## lm(formula = dist ~
##
## Residuals:
## Min 1Q Med
## -29.069 -9.525 -2.
##
## Coefficients:
## Estimate
## (Intercept) 42.9800
## speed.c 3.9324
## ---
## Signif. codes: 0 '*
##
## Residual standard er
## Multiple R-squared:
## F-statistic: 89.57 on
The model above is achieved

model.

price drop

Formula Call

Note the simplicity in the

syntax: the formula just

Residuals

model output breaks it down

into 5 summary points. When

Coe cients

and slope terms in the linear

model. If we wanted to

formula. Ultimately, the

analyst wants to nd an
intercept and a slope such

a stop. The second row in the

Coe cients is the slope, or in

3.9324088 feet.

ideally want a lower number

relative to its coe cients. In

hypothesis of the existence of

a relationship between speed

from zero and are large

relative to the standard error,

which could indicate a

relationship exists. In

that it is unlikely we will

observe a relationship

value of 5% or less is a good

cut-o point. In our model

signi cant p-value.

Consequently, a small p-

measure of the quality of a

linear regression t.

deviate from the true

regression line by

approximately 15.3795867

feet, on average. In other

words, given that the mean

with 48 degrees of freedom.

Simplistically, degrees of

these parameters

had 50 data points and two

parameters (intercept and

slope).

Multiple R-squared,

actual data. It takes the form

of a proportion of variance.

R
2
is a measure of the linear

does not explain the variance

in the response variable well

that the answer would almost

certainly be a yes. That why

2
.

Nevertheless, it’s hard to

de ne what level of R
2
is

2

2
is the preferred

measure as it adjusts for the

number of variables

considered.
F-Statistic

F-statistic is a good indicator

of whether there is a

of data points and the

number of predictors.

hypothesis (H0 : There is no

relationship between speed

to ascertain that there may be

a relationship between

variables. In our example the

F-statistic is 89.5671065

above was just an example to

illustrate how a linear model

way we could start to improve

is by transforming our

response variable log-

transformed mod2 =

lm(formula = log(dist) ~

bringing in new variables,

new transformation of

variables and then

subsequent variable

YOU MAY ALSO

LIKE...

3 Things I Learned

Helping Australia’s

Agency Win A Large

Analytics Project In

Data Science and

Analytics Panel @

Big Data

DASAMA™ - Data

Science and Analytics

Maturity Assessment

model

Data Analytics

Initiative

SERVICES CASE STUDIES

TESTIMONIALS BLOG

EMAIL

PHONE +61 2 8005

MESSAGE

EMAIL felipe@felip
SOCIAL  
SEND MESSAGE

© FELIPE REGO INSPIRED BY MASSIVELY THEME

DESIGN BY HTML5 UP JEKYLL INTEGRATION BY SOMIIBO