DSC 2008 Business Analytics: Data and Decisions. Tutorial 4 (answers)

Q1, 3 & 5 are for tutorial discussions; answers for the rest are already included.
Again, BACT offers useful consulting before each tutorial class.
(1) Price of Beef. How is the price of beef related to other factors? The data in
Tut4-Q1-Price_of_Beef.xlsx give information on the price of beef (PBE) and the
possible explanatory variables: consumption of beef per capita (CBE), retail
food price index (PFO), food consumption per capita index (CFO), and an index
of real disposable income per capita (RDINC) for the years 1925 to 1941 in the
United States.
(a) Use computer software to find the regression equation for predicting the
price of beef based on all of the given explanatory variables. What is the
regression equation?
In order of importance of X variables, from Tut4-Q1-Price_of_Beef(answer)&.xlsx (as given by Excel's Data Analysis add-in):
PBE = 347.60 - 2.83 CFO - 0.772 CBE - 0.327 PFO + 0.544 RDINC
As seen, for predicting the price of beef, food consumption is most
important, followed by consumption of beef, retail food price index, and
real disposable income.
Note that it is usually sufficient (and clearer) to give only 3 significant
digits for numbers. In the above, 347.60's 5 significant digits give us 2
decimals to match those of 2.83; i.e. the 3-digit rule needs to be
selectively relaxed.
The plots generated by Data Analysis add-in are not used.
(b) Produce the appropriate residual plots to check Regression assumptions. Is
this inference appropriate for this model? Please explain.
Remember that, practically, there is really no point in doing this, unless
this is the final most-valid model. Nonetheless, there are 3 ways to do
this:
(1) By Excel Data Analysis add-in: Tut4-Q1-Price_of_Beef(answer)&.xlsx
(ignore the Line Fit plots, and the Normal Probability plot).
(2) By RegressTemplate.xlsm: Tut4-Q1-RegressPrice_of_Beef(answer)&.xlsm
(without the Normal probability plot; with axes hand-tuned to achieve
square plots).
(3) By MiniRegressTemplate.xlsx: Tut4-Q1-MiniRegressionTemplatePrice_of_Beef(answer)&.xlsx (axes are scaled automatically, so plots are
mostly not square; Normal probability plot has very slight hint of short
right-tail).
All plots look ok generally, with no clearly discernible pattern.
(c) How much (total) variation in the price of beef can be explained by this
model?
R² = 91.7%. [Adjusted R² (88.9%) is the proportion of the variance in the
price of beef which is explained by the model.]
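The two figures are linked by the usual adjustment, Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). The following is a minimal Python sketch of that arithmetic, assuming n = 17 annual observations (1925 to 1941) and k = 4 explanatory variables; the function name is illustrative and not part of any workbook.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k X variables fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Q1: R^2 = 0.917; n = 17 years and k = 4 X variables are assumptions
# read off the problem statement, not values taken from the Excel output.
beef_adj = adjusted_r_squared(0.917, n=17, k=4)
print(f"Adjusted R^2 = {beef_adj:.3f}")
```

Under these assumptions the result rounds to the 88.9% quoted above.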
(d) Consider the coefficient of beef consumption per capita (CBE). Does it say
that the price of beef goes up when people eat less beef? Explain.
The coefficient is negative, which suggests lower beef price (PBE) is
expected with higher consumption (CBE). But it must be interpreted after
allowing for the effects of the other predictors. CBE (beef consumption)
may be collinear with CFO (food consumption), for example. In addition,
regression coefficients cannot be interpreted causally. Eating less beef (an
X) may occur along with higher beef prices (Y), but it is unlikely to cause
them. For example, the causation (if any) may actually run in the other
direction: more expensive beef (PBE as an X) may lead to lower per capita
consumption (CBE as Y); i.e. the whole regression might have to be
rethought!
So, does having price as Y (instead of as an X) result in a nonsensical
model altogether? Probably.
See also Tut4-Q1-RegressPrice_of_Beef&.xlsm where Year is included, yielding
a model with a smaller Significance F. Year could be standing for factors that
were not included in the original collection of 4 Xs. Some would argue that
PFO (retail food price index) should have captured any trend that Year is here
standing in for, but apparently not. When the Run button is pressed, an
even better model obtains at Tut4-Q1-RegressPrice_of_BeefSol&.xlsm after
RDINC (Real Disposable Income) is deleted, and with variables ordered
according to importance. Year is not even the least important in this model:
PFO (Retail Food Price Index) is worse.
Lesson: don't throw away any data (e.g. Year) if possible; let SSS do any
throwing.
(2) U.S.News March-1999 rankings of the best Business Schools in the U.S.A. was
analysed with Microsoft Excel and output below, where the Overall Scores of
business schools are regressed on the ten published numerical measures.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.993695
R Square            0.98743
Adjusted R Square   0.984122
Standard Error      1.617535
Observations        49

ANOVA
            df  SS           MS     F          Significance F
Regression  10  7809.92298   781    298.49653  6.516E-33
Residual    38  99.42395923  2.616
Total       48  7909.346939

              Coefficients  Standard Error  t Stat  P-value    Lower 95%   Upper 95%
Intercept     -100.447      14.95664546     -6.72   5.975E-08  -130.72502  -70.168723
GMAT          0.12113       0.022724912     5.33    4.712E-06  0.0751262   0.16713456
RecruitRank   -0.16618      0.032646002     -5.09   1.001E-05  -0.232269   -0.1000923
AcadRank      -0.16297      0.033180852     -4.91   1.75E-05   -0.2301368  -0.0957946
GPA           16.28703      3.47755003      4.683   3.55E-05   9.2470973   23.3269618
Salary        0.000225      5.42735E-05     4.147   0.0001819  0.0001152   0.00033496
Employ3Month  25.06349      9.299604199     2.695   0.0104222  6.2374214   43.8895516
Employ0Month  9.404194      3.803458018     2.473   0.0180055  1.7044955   17.1038925
Enroll        0.000208      0.000468628     0.443   0.6600896  -0.000741   0.00115641
AcceptRate    1.319524      3.931272789     0.336   0.7389825  -6.6389221  9.27796985
Fees          -1.4E-06      7.18525E-05     -0.02   0.9844997  -0.0001469  0.00014405
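The entries of the summary output are tied together by standard identities: MS = SS/df, F = MS(Regression)/MS(Residual), Standard Error = sqrt(MS(Residual)), R Square = SS(Regression)/SS(Total), and t Stat = Coefficient/Standard Error. A short Python sketch recomputing them from the printed figures (variable names are illustrative; only the GMAT row is checked):

```python
import math

# Sums of squares and degrees of freedom from the ANOVA block
ss_regression, df_regression = 7809.92298, 10
ss_residual, df_residual = 99.42395923, 38

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual          # F statistic
standard_error = math.sqrt(ms_residual)       # typical size of a residual
r_square = ss_regression / (ss_regression + ss_residual)

# t Stat for GMAT = coefficient / its standard error
t_gmat = 0.12113 / 0.022724912

print(f"F = {f_stat:.2f}")
print(f"Standard Error = {standard_error:.4f}")
print(f"R Square = {r_square:.5f}")
print(f"t Stat (GMAT) = {t_gmat:.2f}")
```

Agreement with the Excel figures is a quick sanity check that the output has been transcribed correctly.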

The correlation matrix for the variables (some with shortened names) is as
follows:

         Overall  Salary   GMAT     Acad     Recruit  Accept   Fees     GPA      Enroll   Employ0   Employ3
Overall  1
Salary   0.9539   1
GMAT     0.8807   0.8289   1
Acad     -0.8732  -0.808   -0.669   1
Recruit  -0.8519  -0.827   -0.598   0.851    1
Accept   -0.785   -0.736   -0.83    0.683    0.5428   1
Fees     0.6448   0.6923   0.5676   -0.61    -0.6011  -0.4148  1
GPA      0.6253   0.5212   0.6555   -0.45    -0.4132  -0.5062  0.224    1
Enroll   0.5573   0.5393   0.43     -0.51    -0.5284  -0.2625  0.514    0.357    1
Employ0  0.5399   0.5606   0.4407   -0.39    -0.3545  -0.5154  0.238    0.022    0.221    1
Employ3  0.4606   0.4487   0.4296   -0.26    -0.2006  -0.4085  0.276    0.119    0.24     0.691967  1

The variables include ranks by recruiters, ranks by academics, grade point
averages, median starting salaries, % employed at graduation as well as 3
months thereafter, total number of students enrolled at each school,
acceptance rates, etc. The data for the rankings, collected for the
year 1998, are as follows.
Overall  Acad  Recruit  GPA   GMAT  Accept  Salary    Employ  Employ3  Fees     Enrol
100      1     5        3.59  722   6.70%   $105,700  80.80%  98.90%   $24,990  726
98       2     2        3.5   689   12.90%  $105,000  95.40%  98.50%   $26,260  1,767
98       5     3        3.45  685   15.50%  $115,000  97.50%  100.00%  $25,872  2,504
98       2     1        3.5   685   13.10%  $100,000  95.00%  100.00%  $26,290  1,557
95       2     10       3.5   690   13.40%  $100,000  97.10%  97.10%   $27,100  708
94       5     7        3.38  695   22.70%  $95,000   94.70%  98.20%   $26,284  2,808
93       9     8        3.45  680   11.50%  $106,000  98.00%  99.10%   $27,770  1,373
93       7     3        3.34  672   22.00%  $97,000   99.70%  100.00%  $25,185  1,927
91       9     6        3.34  664   15.90%  $98,950   98.80%  99.40%   $26,548  671
90       9     16       3.5   683   14.50%  $95,000   97.70%  99.70%   $20,534  1,046
89       13    8        3.36  685   15.00%  $95,000   93.60%  98.30%   $21,479  491
88       9     17       3.4   671   12.00%  $100,000  97.40%  99.00%   $26,100  375
86       14    13       3.4   675   17.70%  $95,000   94.80%  97.80%   $27,923  2,902
84       7     13       3.43  674   11.00%  $90,000   66.80%  87.40%   $19,792  809
83       18    18       3.4   682   25.80%  $95,000   89.70%  97.80%   $25,355  430
81       14    13       3.25  647   28.40%  $92,500   90.60%  94.20%   $25,135  576
81       17    11       3.2   641   22.60%  $92,000   96.00%  98.00%   $16,353  461
79       16    18       3.2   653   29.80%  $88,370   89.10%  95.30%   $24,130  641
79       18    12       3.37  660   23.20%  $85,000   84.70%  88.70%   $15,268  726
73       20    23       3.26  624   25.50%  $76,000   96.70%  99.20%   $15,619  252
72       25    24       3.34  640   30.70%  $77,500   80.50%  99.20%   $24,200  515
72       20    22       3.3   631   40.10%  $80,000   82.00%  91.80%   $17,013  548
71       22    26       3.2   639   27.60%  $80,000   89.90%  94.40%   $23,970  629
71       25    28       3.3   650   25.30%  $87,000   76.40%  97.50%   $24,852  1,229
69       25    24       3.2   630   45.20%  $77,000   85.10%  92.00%   $24,240  433
68       34    21       3.21  637   36.40%  $80,000   73.90%  92.10%   $25,678  519
68       30    31       3.21  641   26.50%  $77,000   83.00%  98.10%   $14,958  438
68       34    29       3.35  653   24.00%  $70,000   80.00%  94.20%   $15,235  867
68       22    35       3.24  620   45.60%  $74,000   96.50%  99.10%   $16,332  1,276
67       22    27       3.13  624   33.10%  $75,000   87.60%  95.20%   $23,800  727
66       34    49       3.3   628   42.80%  $78,500   94.00%  97.00%   $12,411  853
66       34    33       3.3   628   27.20%  $71,600   87.50%  94.80%   $12,182  457
65       52    38       3.2   632   44.60%  $75,500   93.10%  100.00%  $12,865  200
64       30    54       3.4   634   24.90%  $62,000   72.20%  97.20%   $12,411  321
64       44    70       3.31  652   27.00%  $71,500   75.50%  96.20%   $20,577  249
63       52    37       3.2   631   48.70%  $75,000   84.10%  95.50%   $16,300  295
63       42    49       3.2   636   32.10%  $71,000   82.10%  97.30%   $22,598  855
63       42    47       3.3   619   33.60%  $63,000   86.10%  97.70%   $9,125   181
63       34    20       3.4   601   67.90%  $66,000   38.90%  88.90%   $21,670  1,575
63       44    65       3.2   663   30.50%  $65,000   86.40%  97.70%   $19,867  127
63       25    33       3.3   612   47.50%  $67,000   72.60%  93.40%   $19,004  505
63       25    30       3.28  613   45.90%  $62,000   79.30%  92.50%   $16,230  470
63       44    43       3.2   633   42.00%  $65,000   88.40%  99.00%   $20,500  666
62       30    40       3.2   614   40.00%  $74,150   82.60%  89.70%   $20,900  1,139
62       63    43       3.27  630   35.10%  $70,550   90.60%  97.90%   $16,218  390
62       52    45       3.4   632   43.40%  $72,000   74.70%  91.10%   $23,304  389
62       34    40       3.3   647   32.30%  $64,700   71.40%  88.80%   $13,677  349
61       59    54       3.52  634   43.70%  $65,200   78.40%  91.20%   $7,740   271
61       34    35       3.28  618   27.00%  $65,000   77.10%  88.10%   $14,756  247


(a) In the context of the regression model, which variable appeared to be least
related to the overall rankings? Please explain. Comment on whether this
is surprising? What is the practical implication of this result for a typical
U.S.News reader?
Fees is least related: it has the largest P-value. Rather surprising. On the
whole, given other variables that are better predictors of ranking, higher
fees do not imply a better school. If this model is to be believed, a reader
should be aware that high fees might be arbitrary (if it can be argued that
schools charge high fees when they are ranked high).
(b) In the context of the regression model, what is the most important
characteristic of a good business school? Please explain. How would you
interpret this result in a simple English sentence for a typical U.S.News
reader?
GMAT is most important, since it has the smallest P-value. This is not the
final model, since some variables should probably be deleted (and then the
relative importance of variables may change), hence it is premature to
interpret for magazine readers. Nonetheless: for administrators of
business schools, focusing on raising the admission GMAT scores will
probably also require enhancing the other measures of school quality, thus
improving rankings as a final result.
(c) Which variable is the most important as reflected by the correlation
analysis? Which is the least important? How would you interpret these
results in simple English sentences for a typical U.S.News reader?
Salary (having the highest correlation with Overall) is most important,
Employ3 is least important. For interested parties not employed by the
business schools (e.g., potential applicants), a good proxy for quality would
be Salary earned by business school graduates; whereas the percentage of
graduates who find employment 3 months after graduation is of almost no
significance.
(d) Please explain the apparent inconsistencies between the regression and
the correlation analyses. Write in a way that readers of U.S.News would
understand.
Regression shows how variables together can predict Y; correlation
measures each variable's individual affinity to Y without reference to other
variables.
(e) "Clearly, percentage of students employed after 3 months and
undergraduate GPA are the two most important determinants of ranks:
just look at their multipliers for the overall scores!" Comment on this
statement.
This has everything to do with scales of measurement, which affect the
size of the coefficients (multipliers; the largest are 25.1 and 16.3
respectively for Employ3Month & GPA); it has nothing to do with the
importance of the variables (which should be reflected by their p-values).
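To see the scale effect concretely, the coefficients can be re-expressed in different units. The sketch below is in Python and rests on two assumptions that the output itself does not confirm: that Employ3Month was entered as a fraction (so one unit is 100 percentage points) and that Salary is in dollars. The helper name rescale is hypothetical.

```python
def rescale(coefficient: float, std_error: float, units_per_step: float):
    """Re-express a coefficient and its standard error per `units_per_step`
    units of X. Both scale by the same factor, so the t Stat is unchanged."""
    new_coef = coefficient * units_per_step
    new_se = std_error * units_per_step
    return new_coef, new_se, new_coef / new_se


# Employ3Month per 1 percentage point (assumes the X column holds fractions)
emp_coef, emp_se, emp_t = rescale(25.06349, 9.299604199, 0.01)

# Salary per $10,000 instead of per $1 (assumes the X column holds dollars)
sal_coef, sal_se, sal_t = rescale(0.000225, 5.42735e-05, 10_000)

print(f"Employ3Month per point: {emp_coef:.3f} (t = {emp_t:.2f})")
print(f"Salary per $10,000:     {sal_coef:.2f} (t = {sal_t:.2f})")
```

In these units the Salary multiplier (about 2.25) is larger than the Employ3Month multiplier (about 0.25), yet the t Stats, and hence the P-values, are exactly what they were before rescaling.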
(f) Suppose you are the editor of U.S.News. Write a short paragraph, starting
with "Dear Dean", advising the deans of business schools on how they
might improve their U.S.News rankings in future.
Deans are concerned with the drivers in the regression model, since they
might be able to improve future rankings by changing those: 1. Recruit
strong students (GMAT & GPA), 2. Concentrate on placements (RecruitRank
and Salary), 3. Build strong faculty (AcadRank).
Dear Dean, A careful reading of the ranking data suggests that you should
focus on improving student quality, increasing efforts for student
placements, and building a strong faculty. Those have the potential to
substantially elevate your School's ranking.
(g) Suppose that in this 1999 rankings U.S.News managed to uncover the true
(i.e., the most accurate possible) model to rank schools (e.g., Stanford is
indeed the best business school in the world)! Suppose this model will
remain true for future years. Write a short paragraph, starting with "Dear
MBA Applicant", advising an MBA applicant, who has the same criteria for
ranking business schools as U.S.News, on how he might choose the best
among a list of business schools (none of which appears on the U.S.News
list) he had been offered admissions for September 1999. You should not
assume that the MBA applicant would know how U.S.News arrived at the
rankings, or how to do statistical analyses on his own (i.e., don't give him a
formula, just tell him what to look for). Note: a school that did not
participate in the U.S.News survey will not typically have external
measures like RecruitRank, but will normally publish internal data like Fees
and Median Starting Salary.
Applicant cannot (and has no intention to) control many of the drivers in a
regression equation, and is more dependent on correlation analysis: just
aim for school with highest Salary score.
Dear MBA Applicant, Among all the schools that have offered you
admission, you might do well to choose the one with the highest Salary
score. We have identified this as the single best predictor of the value
you are likely to derive from your MBA education.
(h) Explain how you might obtain a good (better?) prediction equation for the
overall score of a business school. Make a guess as to whether your
procedure is likely to yield a good model, giving your reasons.
Step-wise procedure, like the SSS. Since Sig F and adjusted R-square are
already good, step-wise procedure likely to yield even better (more
useable) model.
(i) Find as good a prediction model as you can for predicting the overall score
of a business school. Clearly state your prediction equation and explain
why it might be good.
If we apply variable selection to the original model Tut4-Q2-RegressU.S.NewsMarch1999Rankings.xlsm, we end up with the last 3
variables (Fees, AcceptRate and Enrol) being deleted from the original
regression model: Tut4-Q2-RegressU.S.NewsMarch1999RankingsSol.xlsm.
It is surprising that the relative importance of the original variables was
intact, except that AcadRank and RecruitRank switched adjacent places. This
probably means that the 3 deleted variables were not much related to the
final set of 7 Xs (since deleting collinear Xs may change some remaining
p-values substantially but not others, and thus alter the relative
significance of the remaining Xs).
It makes more sense to base our regression analysis on this final model,
rather than the original model with 10 Xs, since the bigger model wasn't
valid as yet.
(3) 4e: P 558, 5e: P 493, problem 19. Electric utility stocks. Tut4-Q3-P10_19.xlsx
Traditionally, utility stocks are bought by pensioners for their steady
dividends, not for their upside price potentials.

Tut4-Q3-P10_19Regress&.xlsm has the regression output, in the same format
as Excel's Data Analysis add-in, but with live cells. From the regression
output, we see that any change of return on average equity (X1) by 1 would
be associated with a change in stock price of 0.48, in regions where the
annual dividend rate (X2) is the same. Similarly for X2.
The Standard Error of 1.66 gives the estimated standard deviation of the
vertical distances of the Y values from the regression plane, i.e., the standard
deviation of the residuals. Loosely, 1.66 is the typical residual when the
multiple regression equation is used to fit the stock price.
The proportion of variation in stock price which is explained by the regression
is 0.93, the R². The proportion of the variance of stock price which is
explained by the regression is 0.92, the Adjusted R².
Since Significance F is small and Adjusted R² is large, any prediction using the
regression equation promises to be fairly accurate.
(4) 4e: P 560, 5e: P 494, problem 25. Forward stepwise regression. Tut4-Q4-P10_10.xlsx
This problem explores the procedure of forward stepwise regression, which
normally involves finding the best X variable to join an existing subset of Xs.
Thus, with a pool of k Xs, the number of regressions to be run is
k + (k-1) + (k-2) + ... + 3 + 2 + 1 = k(k+1)/2. This compares with backward
stepwise regression (e.g. SSS) which involves k regressions. The best-subset
regression theoretically involves 2^k - 1 regressions.
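The three counts grow at very different rates. A small Python sketch of the formulas quoted above (function names are illustrative; the backward count of k is taken from the text as stated):

```python
def forward_stepwise_runs(k: int) -> int:
    """k + (k-1) + ... + 1 regressions for a pool of k X variables."""
    return k * (k + 1) // 2


def backward_stepwise_runs(k: int) -> int:
    """Number of regressions for backward elimination, as stated in the text."""
    return k


def best_subset_runs(k: int) -> int:
    """Every non-empty subset of the k X variables."""
    return 2 ** k - 1


for k in (5, 10):
    print(k, forward_stepwise_runs(k), backward_stepwise_runs(k), best_subset_runs(k))
```

For the 5 candidate Xs in this problem that is 15, 5 and 31 regressions respectively; for the 10 Xs of Q2 it is 55, 10 and 1023.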
It's easier to open RegressTemplate.xlsm, click on New Size, paste in the full
dataset, and then delete unwanted columns one at a time. An Undo will
recover all the deleted columns, to allow you to start over with a different
order of deletion. In effect, we are simulating the forward stepwise regression
using a template for backward stepwise regression, made possible by Excel's
undo capability.
(a) Starting with Tut4-Q4-P10_10Regress.xlsm (3 sheets; select the 1st sheet,
Regression), which also includes the Home (index) variable for the
home number, the best model would be Tut4-Q4-P10_10RegressSol.xlsm
that includes only Lot Size, Home Size and Number Baths. However, we
instead show the results as if we had added one variable at a time (in the
order they appear in the dataset), as follows:
X variables in model                                            Standard Error  R Square     Adjusted R Square  Significance F
Home Size                                                       29945.59071     0.48556915   0.482093266        4.06E-23
Home Size, Lot Size                                             24552.37187     0.656518175  0.651844953        7.74E-35
Home Size, Lot Size, Number Rooms                               24613.2081      0.657162102  0.650117488        9.08E-34
Home Size, Lot Size, Number Rooms, Number Baths                 23770.43227     0.682428382  0.673667785        3.86E-35
Home Size, Lot Size, Number Rooms, Number Baths, Home (index)   23741.54434     0.685384624  0.674460479        1.87E-34

[Charts: R Square and Adjusted R Square, Standard Error, and Significance F (log scale), each plotted against the number of X variables in the model.]

Note that the vertical axis for Significance F has a log scale. We can see
the same kink at 3 on the horizontal scales of all three graphs: a rise for
Standard Error and Significance F, and a dip for Adjusted R Square. There
is actually also a slight rise for R Square. This represents the introduction
of the 3rd variable (Number Rooms) that does not go well in the presence
of Home Size and Lot Size. We also see the Significance F increases when
the last (5th) variable Home (index) is added, whereas the other 3 statistics
did not react adversely. This is why Significance F is used in the SSS
scheme of variables selection to produce a parsimonious model: it
suggested that 4-variable model is best, whereas the other 3 criteria all
preferred the full 5-variable model.
If we rank the importance of the 5 variables, and delete the worst one in
turn, we get the following instead.
X variables in model                                            Standard Error  R Square     Adjusted R Square  Significance F
Home Size                                                       29945.59        0.48556915   0.482093266        4.06E-23
Home Size, Lot Size                                             24552.37        0.656518175  0.651844953        7.74E-35
Home Size, Lot Size, Number Baths                               23699.0802      0.682155018  0.675623956        3.68E-36
Home Size, Lot Size, Number Baths, Home (index)                 23670.832       0.685084092  0.676396757        2.11E-35
Home Size, Lot Size, Number Baths, Home (index), Number Rooms   23741.54434     0.685384624  0.674460479        1.87E-34

[Charts: R Square and Adjusted R Square, Standard Error, and Significance F (log scale), each plotted against the number of X variables in the model.]

Here we can more clearly see why Significance F is the better criterion for
variables selection. Viewing it as a forward stepwise procedure, R Square
kept increasing with more variables, but all the other statistics turned after
a while: Significance F turned at 3 variables, then Standard Error and
Adjusted R Square turned at 4. Note that the 3-variable model here (with
Number Baths) was not picked up by the earlier sequence (whose 3-variable
model had Number Rooms).
For this and other reasons, Significance F is preferable to both Standard
Error and Adjusted R Square (not to mention R Square) for variables
selection: it prefers models with fewer X variables.
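The comparison of criteria can be replayed directly from the two sequences of statistics reported in (a). The Python sketch below simply finds, for each criterion, the model size it would pick; it is an illustration of the comparison, not a reimplementation of the SSS procedure, and the names are hypothetical.

```python
# Each tuple: (Standard Error, R Square, Adjusted R Square, Significance F)
# for the models with 1, 2, 3, 4 and 5 X variables, in order.
dataset_order = [
    (29945.59071, 0.48556915, 0.482093266, 4.06e-23),
    (24552.37187, 0.656518175, 0.651844953, 7.74e-35),
    (24613.2081, 0.657162102, 0.650117488, 9.08e-34),
    (23770.43227, 0.682428382, 0.673667785, 3.86e-35),
    (23741.54434, 0.685384624, 0.674460479, 1.87e-34),
]
importance_order = [
    (29945.59, 0.48556915, 0.482093266, 4.06e-23),
    (24552.37, 0.656518175, 0.651844953, 7.74e-35),
    (23699.0802, 0.682155018, 0.675623956, 3.68e-36),
    (23670.832, 0.685084092, 0.676396757, 2.11e-35),
    (23741.54434, 0.685384624, 0.674460479, 1.87e-34),
]


def preferred_size(models):
    """Number of X variables in the model that each criterion prefers."""
    sizes = range(1, len(models) + 1)
    return {
        "Standard Error": min(sizes, key=lambda i: models[i - 1][0]),
        "R Square": max(sizes, key=lambda i: models[i - 1][1]),
        "Adjusted R Square": max(sizes, key=lambda i: models[i - 1][2]),
        "Significance F": min(sizes, key=lambda i: models[i - 1][3]),
    }


print(preferred_size(dataset_order))
print(preferred_size(importance_order))
```

In both sequences Significance F settles on a smaller model than the other criteria do, which is the parsimony argument made above.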
(b) The regression coefficients with 5 Xs are as follows:
Y: Price      Coefficients  Standard Error  t Stat    P-value
Intercept     81697.6779    10571.61791     7.72802   1.7095E-12
Lot Size      7688.82242    892.3935895     8.615954  1.1224E-14
Home Size     30.3636612    7.052401732     4.305436  3.0647E-05
Number Baths  14726.9008    4329.243968     3.401726  0.00086703
Home (index)  52.8994104    45.47676299     1.163218  0.2466648
Number Rooms  863.266897    2327.602126     0.370883  0.71127009


Where Home Size, Number Baths, Home (index) and Number Rooms are
the same, an increase of one unit in Lot Size is associated with an increase
of about 7700 in Selling Price, on average. Same for the other variables.
Remember: this shouldn't be the model we are interpreting, since it is not
yet valid, as some P-values are large.
(c) R Square, 68.5%, is the proportion of (total) variation in the selling price
that is explained by the 5 variables.
(5) 4e: P 659, problem 61; 5e: P 581, problem 56. Business admissions. Tut4-Q5-P11_61.xlsx
Freshman: Year 1; Sophomore: Year 2; Junior: Year 3; Senior: Year 4. Some
U.S. colleges (universities) require students to declare their majors after the
sophomore year (e.g. Berkeley); some after the freshman year (e.g. MIT); while
others (especially less well-known colleges/universities) admit students
directly from Year 1 (e.g. Wharton of U Penn, which is, as an exception, very
well known; its graduates include Donald Trump, who launched his run for
U.S. President in 2015).
This problem considers whether to catch students early for a major (before
they settle into some other disciplines), or to accept them later when their
aptitudes can be better determined. This issue doesn't exist in the British
(and Singaporean) system, because students apply to different majors for
university admission, and are thus streamed before Year 1 (just like Wharton).
Tut4-Q5-P11_61(answers)&.xlsx gives the correlations.
a. Correlations all low (rather surprising!): largest is 0.43, with most quite a
bit smaller. Hence, collinearity not likely to be a problem.
b. Starting with Tut4-Q5-P11_61Regress&.xlsm, using SSS (click on Run), we
end with Tut4-Q5-P11_61RegressSol&.xlsm, which has a largest p-value >
0.05 (equivalent to 95% Confidence Interval of -0.0014 to 0.1929 capturing
0). If we eliminate that last variable, we obtain Tut4-Q5-P11_61RegressSol2&.xlsm (although it has a larger Significance F), as
required by the question. The prediction equation is
I-core = -0.30 + 0.15 M119 + 0.20 E270 + 0.14 A202 + 0.14 A201 + 0.12
L201 + 0.12 K201.
The proportion of total variation in I-core that is explained by the freshman
and sophomore grades is only 36%, the R Square value. The standard
error is 0.46, about half a letter grade (commonly for GPA: A is 4, B is 3, C
is 2, D is 1, F is 0).
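As an illustration of how the prediction equation would be used, the Python sketch below applies it to a hypothetical student with a B (3.0) in every prerequisite course. The dictionary layout and function name are assumptions made for the example only.

```python
# Coefficients of the prediction equation in part b
INTERCEPT = -0.30
COEFFICIENTS = {
    "M119": 0.15,
    "E270": 0.20,
    "A202": 0.14,
    "A201": 0.14,
    "L201": 0.12,
    "K201": 0.12,
}


def predict_icore(grades: dict) -> float:
    """Predicted I-core grade from grade points (A = 4, B = 3, ...) per course."""
    return INTERCEPT + sum(COEFFICIENTS[c] * grades[c] for c in COEFFICIENTS)


# Hypothetical student with a B in every course
straight_b = {course: 3.0 for course in COEFFICIENTS}
print(round(predict_icore(straight_b), 2))
```

With a standard error of 0.46, any such prediction is only good to within roughly half a letter grade either way.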
c. Only Calculus (M119) and Computers (K201) among freshman classes are
useful for predicting junior performance (Wait! How is this possible? Are
we not talking about a business program here? Who needs quant stuff for
biz?!); the other variables are sophomore classes.
(1) All coefficient signs as expected.
(2) Calculus (M119) and Statistics (E270; not more quant stuff!) are by far
the best predictors, judging by their small P-values (and large t-stats
close to 4). It seems that quantitative preparation is important.
Another vote for analytics!
(3) Given the small Adjusted R Square of 0.35, the predictions will not be of
much good.
(4) From the plots in Tut4-Q5-P11_61-MiniRegressionTemplate&.xlsx, we
don't see any obvious violation of assumptions.
d. Starting with Tut4-Q5-P11_61(freshman)Regress&.xlsm, we arrive at Tut4-Q5-P11_61(freshman)RegressSol&.xlsm, which is also what the question's
method obtains.
I-core = 0.94 + 0.31 K201 + 0.17 M119.
Only Computers (K201) and Calculus (M119) carried any weight (heaven
forbid, all are quant stuff!!). The proportion of total variation in I-core that
is explained by the freshman grades is only 18%, the R Square value. The
standard error is 0.52, still about half a letter grade.
e. It appears that using the freshman grades alone sacrifices predictive
power too much, judging by Adjusted R² dropping from an already-low 35%
to 17%; i.e., the proportion of variance of I-core explained was (slightly
more than) halved!
It's about securing students early for the Business program, or waiting a
year till the Business Program office can make better selections. Overall, it
seems that choosing business majors after their sophomore year might be
a better bet, provided not too many potentially good students will be lost
to other programs by then.
