
Homework #3

Due 09/17/08
ABE 6986
Ramin Shamshiri
UFID # 90213353
Incidence of coronary heart disease with age
Reference: Hosmer, D.W. and S. Lemeshow. 1989. Applied Logistic Regression. John Wiley & Sons, New York, NY.
Table 1: Dependence of coronary heart disease on age

Age group (yr)   Age (yr)   F      Z
20-29            25.0       0.10   -2.079
30-34            32.5       0.13   -1.779
35-39            37.5       0.25   -0.956
40-44            42.5       0.33   -0.547
45-49            47.5       0.46    0.044
50-54            52.5       0.63    0.847
55-59            57.5       0.76    1.692
60-69            65.0       0.80    2.079

Data are given in Table 1 for incidence of coronary heart disease with age group among 100 subjects,
where F is the fraction of the 100 with significant symptoms. Assume the result can be described by the
logistic model given by:
$$F = \frac{A}{1 + \exp(b - c \cdot \mathrm{Age})} \qquad \text{(Eq. 1)}$$

where A is the maximum value of F at high age; b is the intercept parameter; and c is the response coefficient, yr^-1. Eq. 1 can be rearranged to the linearized form

$$Z = -\ln\left(\frac{A}{F} - 1\right) = c \cdot \mathrm{Age} - b$$
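For reference, the rearrangement can be written out step by step. This is only a sketch of the algebra; it assumes 0 < F < A so that the logarithm is defined, and it uses the sign convention in which Z increases with age, as the tabulated Z values do. With that convention the slope of Z vs. Age is c and the intercept is -b.

```latex
\begin{align*}
F &= \frac{A}{1 + \exp(b - c\,\mathrm{Age})} \\
\frac{A}{F} - 1 &= \exp(b - c\,\mathrm{Age}) \\
\ln\left(\frac{A}{F} - 1\right) &= b - c\,\mathrm{Age} \\
Z \equiv -\ln\left(\frac{A}{F} - 1\right) &= c\,\mathrm{Age} - b
\end{align*}
```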


1- Plot F vs. Age on linear-linear graph paper.


Answer:

Age (yr)   F
25.0       0.10
32.5       0.13
37.5       0.25
42.5       0.33
47.5       0.46
52.5       0.63
57.5       0.76
65.0       0.80


2- Calculate values of Z for each age for values of A= 0.90, 0.91, 0.92, 0.93, 0.94 and 0.95.
Answer:

A = 0.90, Age = 25 => F = 0.10:
$$Z = -\ln\left(\frac{0.9}{0.1} - 1\right) = -2.079$$
...
A = 0.95, Age = 65 => F = 0.80:
$$Z = -\ln\left(\frac{0.95}{0.8} - 1\right) = 1.6739$$

Table 2: Values of Z for each age for the given values of A

Age (yr)   Z1         Z2         Z3        Z4         Z5         Z6
           A=0.90     A=0.91     A=0.92    A=0.93     A=0.94     A=0.95
25.0       -2.0794    -2.0919    -2.1041   -2.1163    -2.1282    -2.1401
32.5       -1.7789    -1.7918    -1.8045   -1.8171    -1.8295    -1.8418
37.5       -0.95551   -0.97078   -0.9858   -1.0006    -1.0152    -1.0296
42.5       -0.54654   -0.56394   -0.5810   -0.59784   -0.61437   -0.63063
47.5        0.044452   0.021979   0        -0.02151   -0.04256   -0.06318
52.5        0.8473     0.81093    0.7758    0.74194    0.70915    0.6774
57.5        1.6917     1.6227     1.5581    1.4975     1.4404     1.3863
65.0        2.0794     1.9841     1.8971    1.8171     1.743      1.674

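The Z values for all six trial values of A can be reproduced with a few lines of Python. This is a minimal sketch, not the original MATLAB work: the function names `z_value` and `z_table` are hypothetical, and Z is taken as -ln(A/F - 1), which is the form consistent with the signs of the tabulated values.

```python
import math

AGES = [25.0, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 65.0]
F_VALUES = [0.10, 0.13, 0.25, 0.33, 0.46, 0.63, 0.76, 0.80]
A_VALUES = [0.90, 0.91, 0.92, 0.93, 0.94, 0.95]


def z_value(f, a):
    """Linearized response Z = -ln(A/F - 1); requires 0 < f < a."""
    return -math.log(a / f - 1.0)


def z_table(f_values, a_values):
    """Return {A: [Z for each F]} for every trial value of A."""
    return {a: [z_value(f, a) for f in f_values] for a in a_values}


if __name__ == "__main__":
    table = z_table(F_VALUES, A_VALUES)
    print("Age   " + "  ".join(f"A={a:.2f}" for a in A_VALUES))
    for i, age in enumerate(AGES):
        row = "  ".join(f"{table[a][i]:7.4f}" for a in A_VALUES)
        print(f"{age:4.1f}  {row}")
```

Each column of the printed output corresponds to one of Z1 through Z6.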

3- Estimate values of parameters b and c corresponding to each value of A by linear regression of Z vs. Age. Include the correlation coefficient (r) to 5 decimal places.
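Answer:

Z values from Table 1
Linear model: Z = -ln(A/F - 1), fitted as a straight line in Age:
$$Z = 0.1144 \cdot \mathrm{Age} - 5.235$$
Coefficients (with 95% confidence bounds):
c = 0.1144 (0.09751, 0.1313)
b = -5.235 (-6.022, -4.447)
Goodness of fit:
SSE: 0.3531
R-square: 0.9787
Adjusted R-square: 0.9751
RMSE: 0.2426
r = 0.98929

Here and in the fits below, b is reported as the intercept of the regression line, so it appears with the opposite sign in the exponent of the logistic model (for this fit, 5.235 - 0.1144 Age).

Age (yr)   Z
25.0       -2.079
32.5       -1.779
37.5       -0.956
42.5       -0.547
47.5        0.044
52.5        0.847
57.5        1.692
65.0        2.079
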
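The slope, intercept and correlation coefficient of such a fit can be checked by hand with ordinary least squares. The sketch below is an illustration in plain Python rather than the MATLAB fitting tool used for the reported numbers; `linear_fit` is a hypothetical helper name, and the Z values are the three-decimal values of Table 1.

```python
import math

AGES = [25.0, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 65.0]
Z_TABLE1 = [-2.079, -1.779, -0.956, -0.547, 0.044, 0.847, 1.692, 2.079]


def linear_fit(x, y):
    """Ordinary least squares for y = slope*x + intercept, plus Pearson r."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    slope, intercept, r = linear_fit(AGES, Z_TABLE1)
    print(f"slope (c) = {slope:.4f}, intercept = {intercept:.3f}, r = {r:.5f}")
```

Computing r directly from the sums of squares avoids taking the square root of an already rounded R-square, which matters when r is wanted to 5 decimal places.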

A = 0.90 (Z1)
Linear model: Z = -ln(A/F - 1), fitted as a straight line in Age:
$$Z_1 = 0.1144 \cdot \mathrm{Age} - 5.235$$
Coefficients (with 95% confidence bounds):
c = 0.1144 (0.09754, 0.1312)
b = -5.235 (-6.022, -4.448)
Goodness of fit:
SSE: 0.3523
R-square: 0.9787
Adjusted R-square: 0.9752
RMSE: 0.2423
r = 0.98929

Age (yr)   Z1
25.0       -2.0794
32.5       -1.7789
37.5       -0.95551
42.5       -0.54654
47.5        0.044452
52.5        0.8473
57.5        1.6917
65.0        2.0794

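The goodness-of-fit statistics and the 95% confidence bounds listed for A = 0.90 follow from standard simple-regression formulas. The sketch below is an assumption about how such numbers are typically obtained, not a description of MATLAB internals; `fit_with_bounds` is a hypothetical name, and the Student t quantile for 6 degrees of freedom (2.447) is hard-coded rather than computed.

```python
import math

AGES = [25.0, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 65.0]
F_VALUES = [0.10, 0.13, 0.25, 0.33, 0.46, 0.63, 0.76, 0.80]
T_CRIT_6DF = 2.447  # two-sided 95% Student t quantile, n - 2 = 6 degrees of freedom


def fit_with_bounds(x, y, t_crit):
    """Least-squares line with SSE, RMSE, R-square and confidence bounds."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    dof = n - 2
    rmse = math.sqrt(sse / dof)
    r2 = 1.0 - sse / syy
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof
    half_slope = t_crit * rmse / math.sqrt(sxx)
    half_icpt = t_crit * rmse * math.sqrt(1.0 / n + x_mean ** 2 / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "sse": sse,
        "rmse": rmse,
        "r2": r2,
        "adj_r2": adj_r2,
        "slope_bounds": (slope - half_slope, slope + half_slope),
        "intercept_bounds": (intercept - half_icpt, intercept + half_icpt),
    }


if __name__ == "__main__":
    a = 0.90
    z1 = [-math.log(a / f - 1.0) for f in F_VALUES]
    for name, value in fit_with_bounds(AGES, z1, T_CRIT_6DF).items():
        print(name, value)
```

The same function can be applied to any of the other Z columns by changing `a`.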

A = 0.91 (Z2)
Linear model: Z = -ln(A/F - 1), fitted as a straight line in Age:
$$Z_2 = 0.1123 \cdot \mathrm{Age} - 5.178$$
Coefficients (with 95% confidence bounds):
c = 0.1123 (0.09589, 0.1288)
b = -5.178 (-5.946, -4.41)
Goodness of fit:
SSE: 0.3358
R-square: 0.979
Adjusted R-square: 0.9754
RMSE: 0.2366
r = 0.98944

Age (yr)   Z2
25.0       -2.0919
32.5       -1.7918
37.5       -0.97078
42.5       -0.56394
47.5        0.021979
52.5        0.81093
57.5        1.6227
65.0        1.9841


A = 0.92 (Z3)
Linear model: Z = -ln(A/F - 1), fitted as a straight line in Age:
$$Z_3 = 0.1105 \cdot \mathrm{Age} - 5.127$$
Coefficients (with 95% confidence bounds):
c = 0.1105 (0.09435, 0.1266)
b = -5.127 (-5.88, -4.374)
Goodness of fit:
SSE: 0.3226
R-square: 0.9791
Adjusted R-square: 0.9756
RMSE: 0.2319
r = 0.989797

Age (yr)   Z3
25.0       -2.1041
32.5       -1.8045
37.5       -0.9858
42.5       -0.5810
47.5        0
52.5        0.7758
57.5        1.5581
65.0        1.8971


A = 0.93 (Z4)
Linear model: Z = -ln(A/F - 1), fitted as a straight line in Age:
$$Z_4 = 0.1088 \cdot \mathrm{Age} - 5.082$$
Coefficients (with 95% confidence bounds):
c = 0.1088 (0.09291, 0.1246)
b = -5.082 (-5.823, -4.341)
Goodness of fit:
SSE: 0.312
R-square: 0.9791
Adjusted R-square: 0.9757
RMSE: 0.2281
r = 0.989494

Age (yr)   Z4
25.0       -2.1163
32.5       -1.8171
37.5       -1.0006
42.5       -0.59784
47.5       -0.02151
52.5        0.74194
57.5        1.4975
65.0        1.8171


A = 0.94 (Z5)
Linear model: Z = -ln(A/F - 1), fitted as a straight line in Age:
$$Z_5 = 0.1072 \cdot \mathrm{Age} - 5.041$$
Coefficients (with 95% confidence bounds):
c = 0.1072 (0.09156, 0.1228)
b = -5.041 (-5.772, -4.311)
Goodness of fit:
SSE: 0.3035
R-square: 0.9791
Adjusted R-square: 0.9756
RMSE: 0.2249
r = 0.989494

Age (yr)   Z5
25.0       -2.1282
32.5       -1.8295
37.5       -1.0152
42.5       -0.61437
47.5       -0.04256
52.5        0.70915
57.5        1.4404
65.0        1.743


A = 0.95 (Z6)
Linear model: Z = -ln(A/F - 1), fitted as a straight line in Age:
$$Z_6 = 0.1057 \cdot \mathrm{Age} - 5.004$$
Coefficients (with 95% confidence bounds):
c = 0.1057 (0.09028, 0.1212)
b = -5.004 (-5.726, -4.283)
Goodness of fit:
SSE: 0.2964
R-square: 0.979
Adjusted R-square: 0.9755
RMSE: 0.2223
r = 0.989444

Age (yr)   Z6
25.0       -2.1401
32.5       -1.8418
37.5       -1.0296
42.5       -0.63063
47.5       -0.06318
52.5        0.6774
57.5        1.3863
65.0        1.674


4- Select the values of A, b, and c for the optimum r.

Answer:
Based on the results of problem 3, the r values are:

A      r
0.90   0.98929
0.91   0.98944
0.92   0.98979
0.93   0.98949
0.94   0.98949
0.95   0.98944

The correlation coefficient should be as close to 1.0 as possible. According to the table above, r = 0.98979 is the largest value and the closest to one, so we take it as the optimum r. Note, however, that the R-square reported for A = 0.92 in problem 3 (0.9791) corresponds to r ≈ 0.98949, the same as for A = 0.93 and 0.94, so at the reported precision these three cases are nearly indistinguishable. The selected value corresponds to:
$$Z_3 = 0.1105 \cdot \mathrm{Age} - 5.127$$
A = 0.92
b = -5.127 (-5.88, -4.374)
c = 0.1105 (0.09435, 0.1266)
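The selection step can also be automated by repeating the regression for every trial A and keeping the case with the largest r. The following is a sketch under the same assumptions as the hand calculation (Z = -ln(A/F - 1), ordinary least squares); `scan_a` and `fit_line` are hypothetical names. Because the r values differ only in the fourth or fifth decimal place, the case picked at full floating-point precision may differ from a choice based on rounded values.

```python
import math

AGES = [25.0, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 65.0]
F_VALUES = [0.10, 0.13, 0.25, 0.33, 0.46, 0.63, 0.76, 0.80]
A_VALUES = [0.90, 0.91, 0.92, 0.93, 0.94, 0.95]


def fit_line(x, y):
    """Return (slope, intercept, r) of the least-squares line y vs. x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean, sxy / math.sqrt(sxx * syy)


def scan_a(ages, f_values, a_values):
    """Fit Z vs. Age for each trial A; return a list of (A, slope, intercept, r)."""
    rows = []
    for a in a_values:
        z = [-math.log(a / f - 1.0) for f in f_values]
        slope, intercept, r = fit_line(ages, z)
        rows.append((a, slope, intercept, r))
    return rows


if __name__ == "__main__":
    rows = scan_a(AGES, F_VALUES, A_VALUES)
    for a, slope, intercept, r in rows:
        print(f"A={a:.2f}  slope={slope:.4f}  intercept={intercept:.3f}  r={r:.6f}")
    best = max(rows, key=lambda row: row[3])
    print("largest r at A =", best[0])
```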

5- Plot Z vs. Age for this case on linear-linear graph paper. Plot the regression line as well.
Answer:


6- Plot the estimation equation on part (1)

Answer:
First, using the F and Age data set, I plotted F vs. Age on linear-linear paper together with a regression line (Figure 9). Then, using MATLAB, I plotted Eq. 1 symbolically (Figure 10) with:
A = 0.92
b = -5.127
c = 0.1105
$$F = \frac{0.92}{1 + \exp(5.127 - 0.1105 \cdot \mathrm{Age})}$$

Linear model fitted to F vs. Age:
$$F = 0.02024 \cdot \mathrm{Age} - 0.4784$$
R-square: 0.9605
r = 0.98005

Age (yr)   F
25.0       0.10
32.5       0.13
37.5       0.25
42.5       0.33
47.5       0.46
52.5       0.63
57.5       0.76
65.0       0.80

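To compare the two curves numerically rather than only on the plot, both can be evaluated at the observed ages. This is a rough sketch: the exponent is written as 5.127 - 0.1105 Age, which assumes the regression intercept enters the logistic model with its sign reversed, and `logistic` and `straight_line` are hypothetical helper names.

```python
import math

AGES = [25.0, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 65.0]
F_VALUES = [0.10, 0.13, 0.25, 0.33, 0.46, 0.63, 0.76, 0.80]

A = 0.92
B_EXP = 5.127  # assumption: minus the regression intercept (-5.127)
C = 0.1105     # regression slope, 1/yr


def logistic(age):
    """Logistic estimate F = A / (1 + exp(B_EXP - C*age))."""
    return A / (1.0 + math.exp(B_EXP - C * age))


def straight_line(age):
    """Straight-line fit of F vs. Age reported in the text."""
    return 0.02024 * age - 0.4784


if __name__ == "__main__":
    print("Age    F(data)  F(logistic)  F(line)")
    for age, f in zip(AGES, F_VALUES):
        print(f"{age:4.1f}   {f:6.2f}   {logistic(age):10.4f}   {straight_line(age):7.4f}")
```

Unlike the straight line, the logistic estimate stays between 0 and A for any age.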

7- Discuss your results.

Answer:
I tried to fit Eq. 1 directly to the Age and F data set using MATLAB to find the constants b and c. The resulting fit was:
General model:
F(Age) = 0.92/(1+exp(b-c*Age))
Coefficients (with 95% confidence bounds):
b = 0.402
c = -9.724
Goodness of fit:
SSE: 2.024
R-square: -2.834
These results show that the model fitted in this way does not describe the data set and cannot be used as a prediction model for these data. Transforming the F values into the linearized Z values appears to solve this problem: a simple linear regression with a very good correlation coefficient fits the transformed data.
Using the b and c constants from the linear model, I plotted Eq. 1, which appears to fit the F data better. However, I still did not consider it a good prediction because, for instance, at Age = 65 the given value of F is 0.80 while the value I read from the prediction curve is around 0.92. Note that evaluating Eq. 1 at Age = 65 with the exponent 5.127 - 0.1105 Age gives F ≈ 0.82, so the 0.92 figure is closer to the asymptote A than to the model value at that age.

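As a rough check on this discussion, the sum of squared errors in F can be computed for both parameter sets. The sketch below assumes the model form F = A/(1 + exp(b - c Age)) with A = 0.92; the "linearized" parameters use b = 5.127 (the regression intercept with its sign reversed, an assumption) and c = 0.1105. The helper name `sse_and_r2` is hypothetical. One possible reading of the numbers is that the direct fit converged to a curve that is essentially zero at every age, but the original run does not confirm the cause.

```python
import math

AGES = [25.0, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 65.0]
F_VALUES = [0.10, 0.13, 0.25, 0.33, 0.46, 0.63, 0.76, 0.80]
A = 0.92


def sse_and_r2(b, c):
    """SSE and R-square of F = A / (1 + exp(b - c*Age)) against the data."""
    predicted = [A / (1.0 + math.exp(b - c * age)) for age in AGES]
    f_mean = sum(F_VALUES) / len(F_VALUES)
    sse = sum((f - p) ** 2 for f, p in zip(F_VALUES, predicted))
    sst = sum((f - f_mean) ** 2 for f in F_VALUES)
    return sse, 1.0 - sse / sst


if __name__ == "__main__":
    # Parameters reported for the direct nonlinear fit.
    print("direct fit:     SSE=%.4f  R2=%.4f" % sse_and_r2(0.402, -9.724))
    # Parameters taken from the linearized regression (sign of b is an assumption).
    print("linearized fit: SSE=%.4f  R2=%.4f" % sse_and_r2(5.127, 0.1105))
```

A negative R-square, as in the direct fit, means the fitted curve does worse than simply predicting the mean of F.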
