
CSC 423 Marcelo Manzo

Assignment 6

1.

a) The boxplots suggest that customer age is related to switching providers.
Customers who do not switch are older, with ages spread over a wide range of
approximately 30 to 50 years and a mean of approximately 40 years. Customers
who do switch are younger, with ages concentrated in a narrow range of
approximately 20 to 25 years and a mean of approximately 25 years. The effect
of the change in bill amount on the probability of switching is less clear
than the effect of age. On average, customers who do not switch provider have
a change in bill amount slightly higher than 1%, while customers who do switch
provider have a change of approximately 1%.

b) log(p/(1-p)) = 7.1459 - 0.6045*TOT_ACTV_SRV_CNT - 0.1719*AGE
- 0.4521*PCT_CHNG_IB_SMS_CNT, where p is the probability that CHURN = 1

c) The log odds log(p/(1-p)) of CHURN decrease by 0.6045 for every additional
active service, holding the other variables constant. Exp(-0.6045) = 0.5463,
so the odds p/(1-p) of CHURN decrease by 45.37% for every additional active
service, as [(0.5463-1)*100] = -45.37%.
The log odds log(p/(1-p)) of CHURN decrease by 0.1719 for every additional
year of age. Exp(-0.1719) = 0.842, so the odds p/(1-p) of CHURN decrease by
15.8% for every additional year of age, as [(0.842-1)*100] = -15.8%.
The log odds log(p/(1-p)) of CHURN decrease by 0.4521 for every unit increase
in PCT_CHNG_IB_SMS_CNT (the percent change of the latest 2 months of incoming
SMS with respect to the previous 4 months of incoming SMS). Exp(-0.4521) =
0.6363, so the odds p/(1-p) of CHURN decrease by 36.37% for every unit
increase in PCT_CHNG_IB_SMS_CNT, as [(0.6363-1)*100] = -36.37%.
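
The odds calculations in part c) can be reproduced with a short DATA step. This is only a sketch and is not part of the submitted program: the data set name odds_check and the variables estimate, odds_ratio and pct_change are made up for illustration, and the inputs are the rounded estimates quoted above.

```sas
/* Illustrative check only: names below are hypothetical, not from the assignment */
data odds_check;
  input variable :$32. estimate;
  odds_ratio = exp(estimate);          /* multiplicative change in the odds of CHURN */
  pct_change = (odds_ratio - 1) * 100; /* percent change in the odds per unit increase */
  datalines;
TOT_ACTV_SRV_CNT -0.6045
AGE -0.1719
PCT_CHNG_IB_SMS_CNT -0.4521
;
run;

proc print data=odds_check noobs;
  format odds_ratio 6.4 pct_change 7.2;
run;
```

A negative pct_change means the odds of churn fall as the predictor increases, which is the case for all three variables here.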

d) Predicted probability = 0.03801


The 95% confidence interval is (0.02379, 0.06021), so we are 95% confident
that the probability of churn for this customer lies between 0.02379 and
0.06021.

e)

data churntrain;
infile 'S:\LabSession1\churn_train.csv' firstobs=2 delimiter=',' MISSOVER;
input GENDER $ EDUCATION LAST_PRICE_PLAN_CHNG_DAY_CNT
TOT_ACTV_SRV_CNT AGE PCT_CHNG_IB_SMS_CNT PCT_CHNG_BILL_AMT CHURN
COMPLAINT;
numgender=(GENDER="M");
run;
proc print;
run;
proc sort DATA=churntrain;
by CHURN;
run;
proc boxplot DATA=churntrain;
PLOT (PCT_CHNG_BILL_AMT AGE)*CHURN;
run;
proc logistic data=churntrain;
model CHURN(event='1')= numgender EDUCATION
LAST_PRICE_PLAN_CHNG_DAY_CNT TOT_ACTV_SRV_CNT AGE
PCT_CHNG_IB_SMS_CNT PCT_CHNG_BILL_AMT COMPLAINT/corrb;
run;
proc logistic data=churntrain;
model CHURN(event='1')= numgender EDUCATION
LAST_PRICE_PLAN_CHNG_DAY_CNT TOT_ACTV_SRV_CNT AGE
PCT_CHNG_IB_SMS_CNT PCT_CHNG_BILL_AMT COMPLAINT/selection=forward;
run;
proc logistic data=churntrain;
model CHURN(event='1')= TOT_ACTV_SRV_CNT AGE
PCT_CHNG_IB_SMS_CNT/corrb influence iplots;
run;
proc logistic data=churntrain;
model CHURN(event='1')= TOT_ACTV_SRV_CNT AGE
PCT_CHNG_IB_SMS_CNT/stb;
run;
data new;
input numgender age LAST_PRICE_PLAN_CHNG_DAY_CNT TOT_ACTV_SRV_CNT
PCT_CHNG_IB_SMS_CNT PCT_CHNG_BILL_AMT COMPLAINT;
datalines;
1 43 0 4 1.04 1.19 1
;
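data pred;
set new churntrain;
/* numgender is already defined in both input data sets; recomputing it here
   would reset the new customer's value to 0 because GENDER is missing */
run;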
proc logistic data=pred;
model CHURN(event='1')= TOT_ACTV_SRV_CNT AGE PCT_CHNG_IB_SMS_CNT;
output out=pred p=phat lower=lcl upper=ucl predprob=(individual);
run;
proc print data=pred;
title2 'Predicted Probabilities and 95% Confidence Limits';
run;

2.

a) The scatterplot shows a curve that rises, falls, and then rises again, so a
cubic polynomial appears to be the best choice for modeling the relationship
between Energy and Temp.
b)

c) Yes, as the p-value for each term is < 0.0001

d) The constant variance assumption appears to be violated. In the residuals
vs predicted plot, the points do not appear to be randomly scattered around
the zero line, and many of them fall almost on a vertical line where the
predicted value equals 120. As for the normality assumption, the points in the
normal probability plot lie close to a straight line, so the residuals appear
to be approximately normal.

e) Energy = -17.03623 + 24.524*temp - 1.49003*temp2 + 0.02928*temp3,
where temp2 = temp**2 and temp3 = temp**3

f) Predicted value = 108.4787 energy units


95% prediction interval is (73.3717, 143.5860)
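
As a quick arithmetic check, the fitted cubic can be evaluated directly at temp = 10, the value supplied in the NEW data set. This is a sketch under two assumptions: it uses the rounded coefficients from part e), and it takes the coefficient on the squared term to be negative. The variable name energy_hat is illustrative only.

```sas
/* Illustrative check only: energy_hat is a hypothetical name */
data _null_;
  temp = 10;
  /* rounded coefficients from part e); sign on temp**2 assumed negative */
  energy_hat = -17.03623 + 24.524*temp - 1.49003*temp**2 + 0.02928*temp**3;
  put energy_hat=;   /* writes the value to the SAS log */
run;
```

With these rounded coefficients the result is about 108.48, in line with the predicted value reported in part f).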
g)

data energy;
infile 'S:\LabSession1\energytemp.txt' firstobs=2;
input energy temp;
run;
proc print;
run;
proc gplot;
plot energy*temp;
run;
data energy;
set energy;
temp2=temp**2;
temp3=temp**3;
run;
/* first-order (straight-line) model */
proc reg;
model energy=temp/r;
plot student.*pred.;
run;
proc reg;
model energy=temp temp2 temp3/r;
plot student.*predicted.;
plot student.*temp;
plot npp.*student.;
run;
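/* re-read the raw data; the file path must be quoted (same path as the first DATA step) */
DATA energy;
infile 'S:\LabSession1\energytemp.txt' firstobs=2;
input energy temp;
run;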
DATA NEW;
INPUT energy temp;
Datalines;
. 10
;
data predict;
set new energy;
temp2=temp**2;
temp3=temp**3;
run;
proc reg data=predict;
model energy=temp temp2 temp3/p cli;
run;
