HW 7

1. Bodyfat Revisited
a)
bodyfat=read.delim("http://sites.williams.edu/rdeveaux/files/2014/09/bodyfat.txt")
cor(bodyfat)
##           Pct.BF      Age   Weight   Height   Neck  Chest  Waist      Hip
## Pct.BF   1.00000  0.29505  0.61730 -0.02939 0.4885 0.7007 0.8237  0.63267
## Age      0.29505  1.00000 -0.01605 -0.24589 0.1187 0.1818 0.2428 -0.05813
## Weight   0.61730 -0.01605  1.00000  0.51291 0.8100 0.8913 0.8737  0.93269
## Height  -0.02939 -0.24589  0.51291  1.00000 0.3247 0.2236 0.1867  0.39672
## Neck     0.48852  0.11874  0.81001  0.32466 1.0000 0.7688 0.7285  0.70752
## Chest    0.70067  0.18181  0.89129  0.22359 0.7688 1.0000 0.9101  0.82491
## Waist    0.82368  0.24278  0.87374  0.18669 0.7285 0.9101 1.0000  0.86052
## Hip      0.63267 -0.05813  0.93269  0.39672 0.7075 0.8249 0.8605  1.00000
## Thigh    0.54855 -0.21608  0.85212  0.34959 0.6688 0.7076 0.7370  0.88113
## Knee     0.49231  0.01719  0.84274  0.51291 0.6482 0.6976 0.7104  0.80915
## Ankle    0.24456 -0.10962  0.58091  0.39455 0.4344 0.4471 0.4075  0.52119
## Bicep    0.48154 -0.04414  0.78521  0.31857 0.7085 0.7069 0.6563  0.72165
## Forearm  0.36471 -0.08512  0.68333  0.32199 0.6608 0.5994 0.5301  0.60327
## Wrist    0.33901  0.21751  0.72510  0.39698 0.7312 0.6445 0.6023  0.62640
##            Thigh    Knee   Ankle    Bicep  Forearm  Wrist
## Pct.BF    0.5485 0.49231  0.2446  0.48154  0.36471 0.3390
## Age      -0.2161 0.01719 -0.1096 -0.04414 -0.08512 0.2175
## Weight    0.8521 0.84274  0.5809  0.78521  0.68333 0.7251
## Height    0.3496 0.51291  0.3945  0.31857  0.32199 0.3970
## Neck      0.6688 0.64817  0.4344  0.70853  0.66079 0.7312
## Chest     0.7076 0.69760  0.4471  0.70689  0.59935 0.6445
## Waist     0.7370 0.71042  0.4075  0.65632  0.53014 0.6023
## Hip       0.8811 0.80915  0.5212  0.72165  0.60327 0.6264
## Thigh     1.0000 0.77747  0.5036  0.74402  0.60430 0.5437
## Knee      0.7775 1.00000  0.5852  0.65416  0.57855 0.6557
## Ankle     0.5036 0.58516  1.0000  0.44904  0.42943 0.5450
## Bicep     0.7440 0.65416  0.4490  1.00000  0.70110 0.6137
## Forearm   0.6043 0.57855  0.4294  0.70110  1.00000 0.5983
## Wrist     0.5437 0.65573  0.5450  0.61367  0.59833 1.0000

with(bodyfat,plot(Pct.BF,Waist,pch=19))
lm.bodyfat=lm(Waist~Pct.BF, data=bodyfat)
abline(lm.bodyfat,col="red")

Waist should do the best job of predicting Pct.BF. Its correlation with Pct.BF (0.824) is
the highest of any variable in the correlation matrix, and the scatterplot shows a positive,
roughly linear association with few outliers, so a linear regression is a reasonable way to
describe the relationship.
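
The choice of Waist can be checked directly from the correlations. A minimal sketch in R, using the Pct.BF row copied from the cor(bodyfat) output; the names r, ranked, best and r_squared are illustrative and not part of the original assignment:

```r
# Correlations of each variable with Pct.BF, copied from the cor(bodyfat) output
r <- c(Age = 0.29505, Weight = 0.61730, Height = -0.02939, Neck = 0.48852,
       Chest = 0.70067, Waist = 0.82368, Hip = 0.63267, Thigh = 0.54855,
       Knee = 0.49231, Ankle = 0.24456, Bicep = 0.48154, Forearm = 0.36471,
       Wrist = 0.33901)

# Rank the candidate predictors by the strength of their linear association
ranked <- sort(abs(r), decreasing = TRUE)
best <- names(ranked)[1]

# For a one-predictor regression, R^2 is the squared correlation
r_squared <- unname(r[best]^2)

print(head(ranked, 3))
cat(best, "explains about", round(100 * r_squared),
    "percent of the variation in Pct.BF\n")
```

A high correlation only measures linear association, so the scatterplot and residual checks are still needed before trusting the fit.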
b)
with(bodyfat,lm(Waist~Pct.BF))
##
## Call:
## lm(formula = Waist ~ Pct.BF)
##
## Coefficients:
## (Intercept)       Pct.BF
##      28.738        0.399

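The fitted equation is Waist = 28.738 + 0.399 Pct.BF, so each additional percentage point
of body fat is associated with an increase of about 0.399 inches in waist size. Because this
model treats Waist as the response, predicting Pct.BF from Waist requires the reverse
regression, lm(Pct.BF~Waist), which is the one used in part d).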
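
The direction of the regression matters when reading the slope. In a simple regression, the slope of y on x multiplied by the slope of x on y equals the squared correlation. Under that identity, the slope for predicting Pct.BF from Waist can be sketched from two numbers already reported: 0.3991 from part b) and the correlation 0.82368 from part a). The variable names are illustrative:

```r
slope_waist_on_bf <- 0.3991   # inches of waist per percentage point of body fat, part b)
r <- 0.82368                  # cor(Pct.BF, Waist), part a)

# slope(y ~ x) * slope(x ~ y) = r^2, so the reverse slope is r^2 / slope
slope_bf_on_waist <- r^2 / slope_waist_on_bf
print(round(slope_bf_on_waist, 2))
```

So one more inch of waist goes with roughly 1.7 more percentage points of body fat, not 0.4.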
c)
with(bodyfat,boxplot(Waist))

with(bodyfat,boxplot(Pct.BF))

plot(lm.bodyfat$residuals)

scatter.smooth(residuals(lm.bodyfat)~predict(lm.bodyfat), col="red")

Unfortunately, the assumptions and conditions of regression are not satisfied. Both
variables are quantitative, and the boxplots show few outliers. However, the residuals
show an underlying pattern, which suggests that the relationship between Waist and Pct.BF
is not linear. That violates the conditions for regression.
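
What a pattern in the residuals looks like can be sketched on simulated data, where the true relationship is known to be curved. Everything below (the data, the seed, and the names fitted_vals and curvature) is made up for illustration and is not the bodyfat data:

```r
set.seed(1)                          # simulated data, not the bodyfat data
x <- runif(200, 0, 10)
y <- x^2 + rnorm(200, sd = 2)        # truly curved relationship

fit <- lm(y ~ x)                     # straight line forced onto a curve
res <- residuals(fit)
fitted_vals <- fitted(fit)

# Least-squares residuals are always uncorrelated with the fitted values ...
print(round(cor(res, fitted_vals), 10))
# ... but a bend shows up as a strong association with the squared, centered fit
curvature <- cor(res, (fitted_vals - mean(fitted_vals))^2)
print(curvature)

# The same check by eye, as in part c)
scatter.smooth(fitted_vals, res, col = "red")
abline(h = 0, lty = 2)
```

If the linear model were adequate, the smooth curve would stay close to the dashed zero line instead of bending away from it.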
d)
lmobj=lm(Pct.BF~Waist, data=bodyfat)
predict(lmobj,newdata=data.frame(Waist=38))
##     1
## 21.86
predict(lmobj,newdata=data.frame(Waist=72))
##     1
## 79.66

A man with a 38-inch waist is predicted to have about 21.86% body fat.
e)
A man with a 72-inch waist is predicted to have about 79.66% body fat. This is unrealistic
because the body is also made up of other things, such as bones. A 72-inch waist is
presumably far beyond the waists in the data, so the prediction is an extrapolation, and a
straight line keeps rising without any upper limit.
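
Why the line breaks down at extreme waist sizes can be sketched from the two predictions reported in part d), since two points determine the fitted line. The variable names are illustrative:

```r
# Predictions reported in part d)
waist <- c(38, 72)
pct_bf <- c(21.8648, 79.66384)

# Two points on the fitted line determine its slope and intercept
slope <- diff(pct_bf) / diff(waist)
intercept <- pct_bf[1] - slope * waist[1]

# Waist sizes at which the line leaves the possible range of 0% to 100%
waist_at_0 <- (0 - intercept) / slope
waist_at_100 <- (100 - intercept) / slope

print(round(c(slope = slope, intercept = intercept), 2))
print(round(c(waist_at_0 = waist_at_0, waist_at_100 = waist_at_100), 1))
```

The recovered line predicts negative body fat for waists below about 25 inches and more than 100% above about 84 inches, so it can only be trusted inside the range of the observed data.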


f)
library(mosaic)  # provides do() and resample()
bootstrap=do(1000)*lm(Pct.BF~Waist,data=resample(bodyfat))
## Loading required package: parallel
predict1=(bootstrap$Waist*38)+(bootstrap$Intercept)
hist(predict1)

predict2=(bootstrap$Waist*72)+(bootstrap$Intercept)
hist(predict2)

mean(predict2)
## [1] 79.65

Bootstrapping shows how much the predictions vary from sample to sample, which lets us
check that the result is not due to chance in this particular sample. The mean of the
bootstrapped predictions for a 72-inch waist is still roughly 79.7%, which remains
biologically unrealistic.
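
The same resampling idea can be sketched in base R, without the do() and resample() helpers. The data below are simulated, and the coefficients -40 and 1.7, the sample size, and the names sim, new_man and boot_pred are assumptions made up for illustration:

```r
set.seed(42)                                   # simulated stand-in for bodyfat
n <- 100
sim <- data.frame(Waist = rnorm(n, mean = 36, sd = 4))
sim$Pct.BF <- -40 + 1.7 * sim$Waist + rnorm(n, sd = 5)

new_man <- data.frame(Waist = 38)
point_pred <- unname(predict(lm(Pct.BF ~ Waist, data = sim), newdata = new_man))

# Resample rows with replacement, refit, and predict again, 1000 times
boot_pred <- replicate(1000, {
  rows <- sample(n, replace = TRUE)
  fit <- lm(Pct.BF ~ Waist, data = sim[rows, ])
  unname(predict(fit, newdata = new_man))
})

# Percentile interval for the mean Pct.BF at Waist = 38
ci <- quantile(boot_pred, c(0.025, 0.975))
print(point_pred)
print(ci)
```

The interval describes sampling variability only. It does not repair a model that is being used outside the range of its data, which is why the bootstrapped prediction at a 72-inch waist stays unrealistic.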
2. Big Mac
a)
bigmac=read.delim("http://sites.williams.edu/rdeveaux/files/2014/09/bigmac.txt")
with(bigmac,plot(BigMac,EngSal))

The relationship between the price of a Big Mac and the average EngSal is curved rather
than linear. It looks roughly quadratic, and could also be described as a negative
exponential relationship.
b)
with(bigmac,plot(BigMac,EngSal))
lm.bigmac=lm(EngSal~BigMac, data=bigmac)
abline(lm.bigmac,col="red")

residuals(lm.bigmac)
##      Amsterdam         Athens         Bogota         Bombay       Brussels
##           6616         -17685          -2239          -3067          12116
##   Buenos Aires        Caracas        Chicago     Copenhagen         Dublin
##           -545         -10843          14927          19010           5118
##     Dusseldorf      Frankfort         Geneva       Helsinki      Hong Kong
##          24224          11124          23624           7609         -21478
##        Houston   Johannesburg   Kuala Lumpur          Lagos         London
##          -9380         -12287         -17805          -6166         -11788
##    Los Angeles     Luxembourg         Madrid         Manila    Mexico City
##          19025          27325           6097           5905          27147
##          Milan       Montreal        Nairobi       New York        Nicosia
##           2815          -7076         -18126           8820         -14691
##           Oslo          Palma          Paris Rio de Janeiro      Sao Paulo
##          -7194         -15113           3510         -13823          -6046
##          Seoul      Singapore      Stockholm         Sydney         Taipei
##         -19183          -2616          -2409         -14373         -12586
##       Tel Aviv          Tokyo        Toronto         Vienna         Zurich
##         -19485          -5776           4525          11517          30725

predict(lm.bigmac)
##      Amsterdam         Athens         Bogota         Bombay       Brussels
##          37684          37085          17639           7767          37684
##   Buenos Aires        Caracas        Chicago     Copenhagen         Dublin
##          15545          16143          41573          35290          38282
##     Dusseldorf      Frankfort         Geneva       Helsinki      Hong Kong
##          40376          40376          40676          34991          39778
##        Houston   Johannesburg   Kuala Lumpur          Lagos         London
##          38880          36487          29905           8066          36188
##    Los Angeles     Luxembourg         Madrid         Manila    Mexico City
##          40975          40975          30803          -2405         -23347
##          Milan       Montreal        Nairobi       New York        Nicosia
##          37085          40676          22426          39180          34991
##           Oslo          Palma          Paris Rio de Janeiro      Sao Paulo
##          34094          27213          35290          23323          15246
##          Seoul      Singapore      Stockholm         Sydney         Taipei
##          37983          26016          28709          41573          36786
##       Tel Aviv          Tokyo        Toronto         Vienna         Zurich
##          37085          40676          40975          37983          40975

plot(lm.bigmac$residuals)

scatter.smooth(residuals(lm.bigmac)~predict(lm.bigmac))

A linear regression is not appropriate here because the relationship between BigMac and
EngSal is not linear, as the underlying pattern in the residuals shows.
c) According to Tukey's circle of transformations, Y should move down the ladder of powers
because the curve bulges toward the third quadrant.
with(bigmac,scatter.smooth(sqrt(EngSal)~BigMac))

with(bigmac,scatter.smooth(log(EngSal)~BigMac))

with(bigmac,scatter.smooth(-1/(EngSal)~BigMac))

with(bigmac,plot(BigMac,-1/(EngSal)))
lm.bigmac1=lm(-1/(EngSal)~BigMac,data=bigmac)
abline(lm.bigmac1,col="red")

I think -1/(EngSal) is the most appropriate transformation because it makes the
relationship with BigMac closest to linear. The line in part b) is forced onto curved data,
whereas the line fitted after the transformation follows the points more closely, although
the relationship is still not perfectly linear.
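
Moving down the ladder of powers can also be compared numerically. The sketch below uses simulated data in which 1/y is linear in x by construction; the data, the seed, and the names ladder and r2 are assumptions for illustration and are not the bigmac data:

```r
set.seed(7)                                     # simulated data, not bigmac
x <- runif(150, 1, 10)
y <- 1 / (0.02 + 0.01 * x + rnorm(150, sd = 0.002))   # 1/y is linear in x

# Candidate re-expressions of y, moving down the ladder of powers
ladder <- list(
  raw        = y,
  sqrt       = sqrt(y),
  log        = log(y),
  reciprocal = -1 / y
)

# R^2 of a straight-line fit for each re-expression
r2 <- sapply(ladder, function(y_t) summary(lm(y_t ~ x))$r.squared)
print(round(r2, 3))
print(names(which.max(r2)))
```

Because each R^2 is computed on a different response scale, the comparison is only a rough guide. The scatter.smooth plots remain the main check of straightness.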
d)
mylm=lm(BigMac~EngSal,data=bigmac)
library(MASS)
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
##     select
boxcox(mylm)

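The Box-Cox plot suggests that the transformation would be most effective with a power of
-6. Note that boxcox() re-expresses the response of the model, which for mylm is BigMac
rather than EngSal, and that by default it only searches powers between -2 and 2, so a
value of -6 could only come from a wider search range.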
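
Rather than reading the power off the plot, it can be extracted from the object that boxcox() returns. A minimal sketch on simulated data where the log (power 0) is the right re-expression; the data, the seed, and the name lambda_hat are assumptions for illustration:

```r
library(MASS)                                   # provides boxcox()

set.seed(3)                                     # simulated data, not bigmac
x <- runif(100, 1, 10)
y <- exp(1 + 0.3 * x + rnorm(100, sd = 0.2))    # log(y) is linear in x

fit <- lm(y ~ x)

# Profile log-likelihood over the default grid of powers, -2 to 2 in steps of 0.1
bc <- boxcox(fit, plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)
```

Here bc$x holds the grid of candidate powers and bc$y the log-likelihood at each one. A different range can be searched by passing the lambda argument, for example lambda = seq(-8, 2, 0.1).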
e)
lmobj=lm(BigMac~EngSal,data=bigmac)
predict(lmobj,newdata=data.frame(EngSal=70000))
##      1
## -7.545

The original model predicts a Big Mac price of about -7.54 when EngSal is 70000, which is
unrealistic because a price cannot be negative.
f)
lmobj=lm(-1/BigMac~EngSal,data=bigmac)
predict(lmobj,newdata=data.frame(EngSal=70000))
##       1
## -0.0522

The prediction after the transformation is -0.0522, but this is on the -1/BigMac scale.
Back-transforming gives BigMac = 1/0.0522, or about 19.16, which is positive and much more
plausible than the untransformed prediction.
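
The back-transformation for a prediction made on the -1/BigMac scale, sketched with the value reported in part f). The variable names are illustrative:

```r
pred_transformed <- -0.05220045      # model output from part f), on the -1/BigMac scale

# Undo the re-expression: if t = -1/BigMac, then BigMac = -1/t
pred_bigmac <- -1 / pred_transformed
print(round(pred_bigmac, 2))
```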
