Assignment No 1: Summary of Data

Assignment No 1
Q1. The file logitsubscibedata.xls gives the number of people in each age group who
subscribe and do not subscribe to a magazine. How does age influence the chance of
subscribing to the magazine?
Solution:
Summary of data:

Mean Age   Age     Gender   No Sub   Sub (Subscribers)   Total
22         20-24   0        52       31                  83
27         25-29   0        61       30                  91
32         30-34   0        57       18                  75
37         35-39   0        73       14                  87
42         40-44   0        56       17                  73
47         45-49   0        84       8                   92
52         50-54   0        57       8                   65
57         55-60   0        87       9                   96
22         20-24   1        44       46                  90
27         25-29   1        53       37                  90
32         30-34   1        57       30                  87
37         35-39   1        54       12                  66
42         40-44   1        56       12                  68
47         45-49   1        83       19                  102
52         50-54   1        77       17                  94
57         55-60   1        74       12                  86
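Before fitting any model, the raw counts already hint at the answer. The sketch below is only an illustration (the variable and function names are ours, not part of the assignment): it computes the observed share of subscribers in each group and pooled over gender, using the counts from the summary table.

```python
# Counts copied from the summary table: (mean_age, gender, no_sub, sub).
GROUPS = [
    (22, 0, 52, 31), (27, 0, 61, 30), (32, 0, 57, 18), (37, 0, 73, 14),
    (42, 0, 56, 17), (47, 0, 84, 8), (52, 0, 57, 8), (57, 0, 87, 9),
    (22, 1, 44, 46), (27, 1, 53, 37), (32, 1, 57, 30), (37, 1, 54, 12),
    (42, 1, 56, 12), (47, 1, 83, 19), (52, 1, 77, 17), (57, 1, 74, 12),
]


def subscription_rate(no_sub, sub):
    """Observed share of subscribers among all people in a group."""
    return sub / (no_sub + sub)


# Rate for every age-by-gender cell.
for age, gender, no_sub, sub in GROUPS:
    print(f"age {age}, gender {gender}: {subscription_rate(no_sub, sub):.3f}")

# Rate per age group, pooled over both gender codes.
rate_by_age = {}
for age in sorted({g[0] for g in GROUPS}):
    no_sub = sum(g[2] for g in GROUPS if g[0] == age)
    sub = sum(g[3] for g in GROUPS if g[0] == age)
    rate_by_age[age] = subscription_rate(no_sub, sub)
    print(f"age {age} (both genders): {rate_by_age[age]:.3f}")
```

The pooled rate is markedly higher in the youngest group than in the oldest one, which is the pattern the regression then quantifies.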
To determine how multiple independent variables, age and gender, predict the outcome
group membership (i.e., the chance that an individual subscribes to the magazine),
we need to use logit (binomial logistic) regression.
Result and interpretation of Logit Regression using SPSS
Omnibus Tests of Model Coefficients

                  Chi-square   df   Sig.
Step 1   Step     94.806       2    .000
         Block    94.806       2    .000
         Model    94.806       2    .000
Variance explained
To understand how much variation in the dependent variable the model explains (the
equivalent of R² in multiple regression), we interpret the "Model Summary" table
below:
Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      1381.112(a)         .068                   .102

a. Estimation terminated at iteration number 4 because parameter estimates changed
by less than .001.
This table contains the Cox & Snell R Square and Nagelkerke R Square values, both of
which are methods of calculating the explained variation. These values are sometimes
referred to as pseudo R² values (and will have lower values than in multiple
regression). They are interpreted in the same manner, but with more caution.
The explained variation in the dependent variable based on our model therefore
ranges from 6.8% to 10.2%, depending on whether we refer to the Cox & Snell R² or
the Nagelkerke R², respectively. Nagelkerke R² is a modification of Cox & Snell R²,
the latter of which cannot achieve a value of 1. For this reason, it is preferable
to report the Nagelkerke R² value.
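Both pseudo R² values can be recovered from numbers already reported above. The sketch below uses the standard Cox & Snell and Nagelkerke formulas; the sample size of 1345 is our own sum of the group totals in the summary table, and the variable names are illustrative.

```python
import math

n_cases = 1345            # assumption: sum of all group totals (1025 + 320)
neg2ll_model = 1381.112   # -2 log likelihood from the Model Summary table
chi_square = 94.806       # model chi-square from the Omnibus Tests table

# The model chi-square is the drop in -2LL from the intercept-only model.
neg2ll_null = neg2ll_model + chi_square

# Cox & Snell: 1 - (L_null / L_model)^(2/n)
cox_snell = 1 - math.exp(-chi_square / n_cases)

# Largest value Cox & Snell can reach with this data: 1 - L_null^(2/n)
cox_snell_max = 1 - math.exp(-neg2ll_null / n_cases)

# Nagelkerke rescales Cox & Snell so that its maximum is 1.
nagelkerke = cox_snell / cox_snell_max

print(f"Cox & Snell R^2: {cox_snell:.3f}")
print(f"Nagelkerke R^2:  {nagelkerke:.3f}")
```

The rescaling step shows why the Nagelkerke value is always at least as large as the Cox & Snell value.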
Category prediction
Binomial logistic regression estimates the probability of an event (in this case,
subscription to the magazine) occurring. If the estimated probability of the event
occurring is greater than or equal to 0.5 (better than even chance), SPSS classifies
the event as occurring (e.g., subscription to the magazine). If the probability is
less than 0.5, SPSS classifies the event as not occurring (e.g., no subscription to
the magazine). It is very common to use binomial logistic regression to predict
whether cases can be correctly classified (i.e., predicted) from the independent
variables. It therefore becomes necessary to have a method for assessing the
predicted classification against the actual classification. There are many methods
to assess this, with their usefulness often depending on the nature of the study
conducted. However, all methods revolve around the observed and predicted
classifications, which are presented in the "Classification Table", as shown below:
Classification Table(a)

                                  Predicted
                                  Subscription        Percentage
Observed                          0         1         Correct
Step 1   Subscription   0         1025      0         100.0
                        1         320       0         .0
         Overall Percentage                           76.2

a. The cut value is .500
This means that if the probability of a case being classified into the "yes" category is
greater than .500, then that particular case is classified into the "yes" category.
Otherwise, the case is classified as in the "no" category (as mentioned previously).
Whilst the classification table appears to be very simple, it actually provides a lot
of important information about the binomial logistic regression result, including:
A. The percentage accuracy in classification (PAC), which reflects the percentage of
all cases that are correctly classified with the independent variables added.
In our case, it is 76.2%.
B. Sensitivity, which is the percentage of cases that had the observed characteristic
(e.g., "yes" for subscription to the magazine) which were correctly predicted by the
model (i.e., true positives).
In our case, it is 0%.
C. Specificity, which is the percentage of cases that did not have the observed
characteristic (e.g., "no" for subscription to the magazine) and were also correctly
predicted as not having the observed characteristic (i.e., true negatives).
In our case, it is 100%.
D. The positive predictive value, which is the percentage of correctly predicted cases
"with" the observed characteristic compared to the total number of cases predicted as
having the characteristic.
In our case, it is undefined, because no case was predicted as subscribing.
E. The negative predictive value, which is the percentage of correctly predicted cases
"without" the observed characteristic compared to the total number of cases predicted
as not having the characteristic.
In our case, it is 76.2% (1025 of 1345).
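The five measures listed above can be computed directly from the four cells of the classification table. The sketch below is illustrative only: the two zero cells are not printed in the source and are inferred here from the reported 100.0% and .0% rows, and the helper name pct is ours.

```python
# Cells of the classification table (observed x predicted).
true_neg = 1025   # observed 0, predicted 0
false_pos = 0     # observed 0, predicted 1 (inferred from the 100.0% row)
false_neg = 320   # observed 1, predicted 0
true_pos = 0      # observed 1, predicted 1 (inferred from the .0% row)


def pct(numerator, denominator):
    """Percentage, or None when the denominator is zero (measure undefined)."""
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator


total = true_neg + false_pos + false_neg + true_pos

accuracy = pct(true_pos + true_neg, total)           # PAC
sensitivity = pct(true_pos, true_pos + false_neg)    # true positive rate
specificity = pct(true_neg, true_neg + false_pos)    # true negative rate
ppv = pct(true_pos, true_pos + false_pos)            # positive predictive value
npv = pct(true_neg, true_neg + false_neg)            # negative predictive value

for name, value in [("PAC", accuracy), ("Sensitivity", sensitivity),
                    ("Specificity", specificity), ("PPV", ppv), ("NPV", npv)]:
    print(name, "undefined" if value is None else f"{value:.1f}%")
```

Because the model never predicts a subscription at the .500 cut value, the overall accuracy simply equals the share of non-subscribers in the sample.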
Variables in the equation

The "Variables in the Equation" table shows the contribution of each independent
variable to the model and its statistical significance. This table is shown below:
Variables in the Equation

                                                                     95% C.I. for EXP(B)
                        B       S.E.    Wald     df   Sig.   Exp(B)   Lower    Upper
Step 1(a)   Age         -.052   .006    78.998   1    .000   .949     .938     .960
            Gender(1)   .407    .134    9.264    1    .002   1.502    1.156    1.952
            Constant    .598    .231    6.701    1    .010   1.818

a. Variable(s) entered on step 1: Age, Gender.
The Wald test ("Wald" column) is used to determine statistical significance for each
of the independent variables. The statistical significance of the test is found in
the "Sig." column. From these results you can see that age (p < .001) and gender
(p = .002) both added significantly to the model/prediction. We can use the
information in the "Variables in the Equation" table to predict the change in the
odds of an event occurring for a one-unit change in an independent variable when all
other independent variables are kept constant. For example, the table shows that the
odds of subscribing to the magazine were 1.50 times higher for females than for males
(Exp(B) = 1.502), and that each additional year of age multiplied the odds of
subscribing by 0.949.
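The Exp(B) column and its confidence limits follow from the B and S.E. columns. The sketch below assumes the usual Wald interval, exp(B ± 1.96 × S.E.); because B and S.E. are only reported to three decimals, the results agree with the SPSS table only up to rounding. The function name and the ten-year example are our own additions.

```python
import math

# (B, S.E.) copied from the "Variables in the Equation" table.
COEFFS = {
    "Age": (-0.052, 0.006),
    "Gender(1)": (0.407, 0.134),
}


def odds_ratio_ci(b, se, z=1.96):
    """Odds ratio exp(B) with an approximate 95% Wald confidence interval."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


results = {name: odds_ratio_ci(b, se) for name, (b, se) in COEFFS.items()}
for name, (ratio, lower, upper) in results.items():
    print(f"{name}: Exp(B) = {ratio:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")

# Effect of a ten-year age difference on the odds, gender held constant.
ten_year_or = math.exp(10 * COEFFS["Age"][0])
print(f"Odds ratio for +10 years of age: {ten_year_or:.3f}")
```

Expressing the age effect per decade makes it easier to read: a person ten years older has roughly 0.59 times the odds of subscribing.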
Regression equation
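Probability of a case taking subscription:
\[
P(\text{subscription}) = \frac{e^{0.407 \cdot \text{Gender} - 0.052 \cdot \text{Age} + 0.598}}{1 + e^{0.407 \cdot \text{Gender} - 0.052 \cdot \text{Age} + 0.598}}
\]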
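To make the regression equation concrete, the sketch below evaluates it for each mean age and gender code in the data, using the reported coefficients (including the negative sign on Age from the "Variables in the Equation" table). The function name is illustrative.

```python
import math

# Coefficients from the "Variables in the Equation" table.
B_CONSTANT = 0.598
B_AGE = -0.052
B_GENDER = 0.407


def subscription_probability(age, gender):
    """Estimated probability of subscribing for a given age and gender code."""
    logit = B_CONSTANT + B_AGE * age + B_GENDER * gender
    return 1.0 / (1.0 + math.exp(-logit))


AGES = [22, 27, 32, 37, 42, 47, 52, 57]
probabilities = {
    (age, gender): subscription_probability(age, gender)
    for gender in (0, 1)
    for age in AGES
}

for (age, gender), p in probabilities.items():
    print(f"age {age}, gender {gender}: {p:.3f}")

highest = max(probabilities.values())
print(f"highest estimated probability: {highest:.3f}")
```

Even the highest estimate (age 22, gender code 1) stays below the .500 cut value, which is consistent with the classification table placing every case in the "no" category.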
Conclusion
A logistic regression was performed to ascertain the effects of age and gender on the
likelihood that participants subscribe to the magazine. The logistic regression
model was statistically significant, χ²(2) = 94.806, p < .0005. The model explained
10.2% (Nagelkerke R²) of the variance in subscription and correctly classified
76.2% of cases. The odds of subscribing to the magazine were 1.50 times higher for
females than for males. Increasing age was associated with a lower likelihood of
subscribing to the magazine.
