GRADING SCHEDULE: Do not distribute the exam or solutions to anybody (Honor Code)
farming 40 points
i. identify the constant as the value for December 10 points
ii. set dummy for September to one and interpret coefficient 10 points
iii. correct variables included in the regression 4 points
perform curvature and heteroskedasticity tests 3 points
set hypothesis and perform test or provide p-value 3 points
iv. correct variables included in the regression 4 points
perform curvature and heteroskedasticity tests 3 points
choice of model 3 points
true/false 15 points
i. correct answer: False 5 points
ii. correct answer: False 5 points
iii. correct answer: True 5 points
autoparts 10 points
i. correct variables included in the regression 5 points
ii. perform curvature and heteroskedasticity tests 2 points
iii. correct interpretation of coefficient for analysis 3 points
Part I.
i. [10 points] The constant is the average weekly sales in December: 83,929.75 bottles.
ii. [10 points] The average weekly total volume in September is (set dummy for September equal to one and
the rest of the dummies to zero):
83929.75 + 6623.875 = 90,553.625 bottles.
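The dummy-variable arithmetic above can be checked with a short sketch. Only the two coefficients come from the solution; the function and variable names are illustrative assumptions, not names from the exam dataset.

```python
# Coefficients quoted in the solution (December is the omitted base month).
constant = 83929.75        # average weekly sales in December
september_coef = 6623.875  # coefficient on the September dummy


def predicted_weekly_sales(september_dummy: int) -> float:
    """Predicted average weekly sales with every other month dummy set to zero."""
    return constant + september_coef * september_dummy


december = predicted_weekly_sales(0)   # all dummies zero: the constant alone
september = predicted_weekly_sales(1)  # September dummy switched on
print(december, september)
```

Setting the dummy to zero returns the constant, which is why the constant is read as the December average.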
Part II.
iii. [10 points] The answer is no. We run the regression of miloprice on ivicaentry:
------------------------------------------------------------------------------
miloprice | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ivicaentry | .2715052 .0818844 3.32 0.001 .1097517 .4332586
_cons | 5.746374 .0530913 108.24 0.000 5.641498 5.851249
------------------------------------------------------------------------------
The model looks linear (rvfplot) and homoskedastic (the Breusch-Pagan test gives p = 0.31 > 0.05). If Milo's
prices had dropped, the true coefficient on ivicaentry would be negative. The estimated coefficient is positive
(0.2715); hence the p-value for the one-sided test whose alternative is that the coefficient is negative will be
large (greater than 0.5, and so greater than 0.05), and we cannot conclude that Milo's average price was lower
after Ivica's entry. (One could get the exact p-value of 0.999 by using klincom ivicaentry and looking at the Ha:
< alternative hypothesis.) Note that simply reading the p-value of 0.001 reported in the regression output gives
the wrong answer, because that p-value is for the two-sided test of whether Milo's average price was different
before and after Ivica's entry; in this case the estimated difference is positive, not negative as we were
trying to show.
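The relationship between the one-sided and two-sided p-values can be sketched as follows. This is a rough check only: the solution does not state the residual degrees of freedom, so the sketch assumes a large sample and uses the standard normal in place of the t distribution. Its numbers will therefore differ slightly from the exact Stata values.

```python
from statistics import NormalDist

# Coefficient and standard error of ivicaentry from the regression output.
coef = 0.2715052
std_err = 0.0818844

t_stat = coef / std_err  # test statistic for H0: coefficient = 0

# ASSUMPTION: normal approximation to the t distribution (degrees of freedom
# are not given in the solution).
std_normal = NormalDist()
p_lower_tail = std_normal.cdf(t_stat)                 # Ha: coefficient < 0
p_two_sided = 2 * (1 - std_normal.cdf(abs(t_stat)))   # Ha: coefficient != 0

print(round(t_stat, 2), p_lower_tail, p_two_sided)
```

Because the estimate is positive, the lower-tail p-value equals one minus half the two-sided p-value, so it is close to 1 even though the two-sided p-value is tiny.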
Part III.
iv. [10 points] Either a semi-log specification or a log-log regression will be appropriate in this case. The results
for the two regressions are given below.
Regression results:
Log-linear:
-------------+----------------------------------------------------------------
lnpancicvolume | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
pancicprice | -.4454663 .024758 -17.99 0.000 -.4943938 -.3965387
miloprice | .0827112 .0234977 3.52 0.001 .0362741 .1291482
gbosprice | .0487792 .0304608 1.60 0.111 -.0114184 .1089768
aldiprice | .1384748 .0331023 4.18 0.000 .0730569 .2038927
ivicaentry | -.7341845 .425805 -1.72 0.087 -1.575674 .1073055
milo_entry | .0220722 .0409245 0.54 0.590 -.0588042 .1029487
pancic_entry | -.0391247 .0366463 -1.07 0.287 -.1115463 .033297
gbos_entry | .0770227 .0767372 1.00 0.317 -.074628 .2286734
aldi_entry | .0684911 .0517738 1.32 0.188 -.0338261 .1708083
_cons | 11.56035 .2020676 57.21 0.000 11.16102 11.95969
------------------------------------------------------------------------------
Log-log specification
--------------------------------------------------------------------------------
lnpancicvolume | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------------+----------------------------------------------------------------
lnpancicprice | -2.452484 .1383394 -17.73 0.000 -2.725875 -2.179093
lnmiloprice | .4863486 .1291666 3.77 0.000 .2310852 .741612
lngbosprice | .2593963 .1696241 1.53 0.128 -.0758205 .5946132
lnaldiprice | .5089998 .1234564 4.12 0.000 .2650211 .7529785
ivicaentry | -1.136476 .7709127 -1.47 0.143 -2.65998 .3870271
lnmilo_entry | .2348183 .2352649 1.00 0.320 -.2301199 .6997565
lnpancic_entry | -.4077246 .2133808 -1.91 0.058 -.8294149 .0139656
lngbos_entry | .4142044 .4639228 0.89 0.373 -.5026152 1.331024
lnaldi_entry | .4268966 .2136244 2.00 0.048 .004725 .8490683
_cons | 12.58364 .3377567 37.26 0.000 11.91615 13.25112
--------------------------------------------------------------------------------
Regression variables:
The dependent variable is pancicvolume.
It is not clear whether ivicaprice should be included, as it was not the focus. The answer above omits it, but
including it was equally acceptable (note that since it was 0 before Ivica's entry, including it is exactly the
same as including the slope dummy ivicaentry × ivicaprice).
We need to include:
miloprice, gbosprice, aldiprice and pancicprice, because their relationship with pancicvolume was the
focus of our investigation.
ivicaentry, because our prediction is for the period after Ivica's entry.
slope dummies: milo_entry, gbos_entry, aldi_entry and pancic_entry (each equal to the corresponding
price variable times ivicaentry), to allow these prices to affect Drago's sales differently before and after
Ivica's entry.
After running a linear regression, we notice that the model is heteroskedastic (the Breusch-Pagan test gives
p = 0.0046). We run a semi-log model or a log-log model. Either model looks linear (run rvfplot)
and homoskedastic (Breusch-Pagan p = 0.70 for semi-log, p = 0.72 for log-log).
Remark: a similar conclusion would have been obtained had you included ivicaprice as an additional
variable: the hettest p-value is 0.0042, and either semi-log or log-log corrects for it.
i. [5 points] The estimated change in lndiabetes when lnsugar increases by 1 in countries where more than
30% of the population is classified as overweight (and thus obese = 1 and obese_lnsguar = lnsugar) is
0.1181605 + (-0.0385298) = 0.0796307.
Thus when sugar consumption per capita increases by 1%, the diabetes rate increases by approximately 0.0796307%;
equivalently, the elasticity of the diabetes rate with respect to sugar consumption is 0.0796307.
ii. [5 points] The predicted lndiabetes is (note that obese = 1 and obese_lnsguar = lnsugar):
0.6201178 + 0.1181605 × ln(285) + 1.2322421 + (-0.0385298) × 1 × ln(285) = 2.3024715
The predicted diabetes rate is therefore exp(2.3024715) = 9.9988641.
Remark: We do not use the correction factor here because we are estimating the prediction for one country,
not the average across countries.
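The elasticity and the prediction above can be reproduced with a short sketch. The coefficient values come from the solution. Two things are assumptions: the variable names, and the negative sign on the interaction coefficient, which is inferred because it is the only sign that reproduces the totals the solution reports. It is likewise assumed that 1.2322421 is the coefficient on obese.

```python
import math

# Coefficients quoted in the solution (names are illustrative).
b_cons = 0.6201178
b_lnsugar = 0.1181605
b_obese = 1.2322421            # ASSUMPTION: coefficient on the obese dummy
b_obese_lnsugar = -0.0385298   # ASSUMPTION: sign inferred from the reported totals

sugar = 285  # sugar consumption per capita used in part ii
obese = 1    # more than 30% of the population classified as overweight

# Part i: slope of lndiabetes with respect to lnsugar when obese = 1.
elasticity = b_lnsugar + b_obese_lnsugar * obese

# Part ii: predicted log rate, then the rate itself (no correction factor,
# since the prediction is for a single country).
ln_diabetes = (b_cons
               + b_lnsugar * math.log(sugar)
               + b_obese * obese
               + b_obese_lnsugar * obese * math.log(sugar))
diabetes_rate = math.exp(ln_diabetes)

print(elasticity, ln_diabetes, diabetes_rate)
```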
Part I. [5 points] You estimate a regression y = b0 + b1x1 + b2x2 and find that both x1 and x2 are statistically
significant.
Once the level of x2 is fixed the change in y associated with a given change in x1 is equal to b1.
True False
Part II. [5 points] Your regression is heteroskedastic and you do not correct the problem.
The reported coefficients are not biased, only the standard errors are biased.
True False
Part III. [5 points] Suppose you run two regressions on the same data and get the following output:
Since b1* = 0.0178 > b1 = 0.000376 we have overestimation, thus it must be the case that b2 × a1 > 0, where a1 is
the correlation coefficient between x and z and b2 = 0.0215 > 0. It must be the case that a1 > 0.
True False
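The sign argument in Part III rests on the omitted variable bias relation b1* = b1 + b2 × a1. The sketch below rearranges it for a1 using the three numbers quoted in the statement. It assumes a1 is the slope from regressing z on x, which has the same sign as the correlation between x and z. The variable names are illustrative.

```python
# Numbers quoted in the Part III statement.
b1_short = 0.0178    # coefficient on x when z is omitted (b1*)
b1_long = 0.000376   # coefficient on x when z is included (b1)
b2 = 0.0215          # coefficient on z in the regression that includes it

# Omitted variable bias: b1* = b1 + b2 * a1, so bias = b2 * a1.
bias = b1_short - b1_long

# ASSUMPTION: a1 is the slope of z on x; its sign matches the correlation's.
a1 = bias / b2

print(bias, a1)
```

A positive bias divided by a positive b2 yields a positive a1, which is consistent with the answer marked in the grading schedule.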
-----------------------------------------------------------------------------
sales | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
inc3mi | .0287468 .010682 2.69 0.010 .0072574 .0502363
pop3mi | .0100729 .0012368 8.14 0.000 .0075848 .012561
endcap | 2026.702 356.865 5.68 0.000 1308.782 2744.622
_cons | -883.1614 831.5062 -1.06 0.294 -2555.936 789.6133
------------------------------------------------------------------------------
The model is linear (rvfplot) and homoskedastic (the Breusch-Pagan test p-value is 0.619). Moving from
non-endcap to endcap, holding income and population fixed (since both locations have the same 3-mile radius), is
estimated to increase average sales by $2,026,702, which is more than the estimated $1.8 million gain we
needed to justify location A. However, we need a formal test:
We calculate the t-statistic = (2,026.702 - 1,800)/356.865 = 0.63525983 and then the right-tail p-value (according
to the alternative) as ttail(47,0.63525983) = 0.264. We cannot reject the null, thus we would suggest location B.
Remark: Notice that, given the information in the question, since the 3-mile radius is the same for the two locations,
we know that inc3mi is the same for the two locations; therefore it must be included in the regression. Furthermore,
we should not include a slope dummy endcap × pop3mi, because we are told that the impact of population on sales is
the same whether or not the store is an endcap; thus we are looking at a difference in levels.
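The formal test above can be checked without Stata. The sketch below computes the t-statistic against the 1,800 threshold and approximates the right-tail probability that ttail(47, t) returns by integrating the Student t density numerically. The helper functions and the Simpson's-rule approach are illustrative assumptions, not the method used in the course.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(df: int, t: float, steps: int = 2000) -> float:
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Values from the regression output (sales measured in thousands of dollars).
endcap_coef = 2026.702
endcap_se = 356.865
threshold = 1800.0   # the $1.8 million gain needed to justify location A
df = 47              # degrees of freedom used in the solution's ttail call

t_stat = (endcap_coef - threshold) / endcap_se
p_right_tail = t_upper_tail(df, t_stat)

print(t_stat, p_right_tail)
```

A right-tail p-value well above 0.05 means the data do not show that the endcap gain exceeds the threshold, which matches the recommendation of location B.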
i. [5 points] Since:
(a) in a more affluent neighborhood you would probably see higher prices for almost all goods,
including yogurt, and
(b) in a more affluent neighborhood you would probably see higher sales of yogurt,
we can safely infer that ovb leads to overestimation.
ii. [5 points] As long as the average income remains constant in each neighborhood, then yes, the fixed effects
regression will help.
iii. [5 points] The difference can be explained by (a) and (b).
(a) Reason: A change in the average income in some of the neighborhoods means that the main
assumption of the fixed effects regression is violated.
(b) Reason: The relation between sales and price changed because of new competitors thus the true
coefficient of price is changed with or without fixed effects.
Remark: Part (c) is the very definition of fixed effects, which means that the method does remove ovb for all
variables having only between-group variation, i.e. fixed for each store across the whole period 2001-2010 but
different across stores.