Chapter 4
Linear Regression with One Regressor
Multiple Choice
When the estimated slope coefficient in the simple regression model, β̂1, is zero, then
a. R² = Ȳ.
b. 0 < R² < 1.
c. R² = 0.
d. R² > (SSR/TSS).
a. ESS/TSS
b. RSS/TSS
c. $\frac{\sum_{i=1}^n (Y_i - \bar Y)(X_i - \bar X)}{\sum_{i=1}^n (Y_i - \bar Y)^2 \sum_{i=1}^n (X_i - \bar X)^2}$
d. SSR/(n − 2)
a. $\frac{1}{n-2}\sum_{i=1}^n \hat u_i^2$
b. SSR
c. 1 − R²
d. $\frac{1}{n-1}\sum_{i=1}^n \hat u_i^2$
(Requires Appendix material) Which of the following statements is correct?
圣才学习网 www.100xuexi.com
Binary variables
The following are all least squares assumptions with the exception of:
a. $\hat Y_i - \hat\beta_0 - \hat\beta_1 X_i$
b. $Y_i - \beta_0 - \beta_1 X_i$
c. $Y_i - \hat Y_i$
d. $(Y_i - \bar Y)^2$
14) The slope estimator, β̂1, has a smaller standard error, other things equal, if
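19) If the three least squares assumptions hold, then the large sample normal
distribution of β̂1 is
a. $N\left(0, \frac{1}{n}\frac{\operatorname{var}[(X_i - \mu_X)u_i]}{[\operatorname{var}(X_i)]^2}\right)$
b. $N\left(\beta_1, \frac{1}{n}\frac{[\operatorname{var}(u_i)]^2}{[\operatorname{var}(X_i)]^2}\right)$
c. $N\left(\beta_1, \frac{\sigma_u^2}{\sum_{i=1}^n (X_i - \bar X)^2}\right)$
d. $N\left(\beta_1, \frac{1}{n}\frac{\operatorname{var}(u_i)}{[\operatorname{var}(X_i)]^2}\right)$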
21) To obtain the slope estimator using the least squares principle, you divide the
as
25) Multiplying the dependent variable by 100 and the explanatory variable by
100,000 leaves the
(c) What is the prediction for the height of a child whose parents have an average
height of 70.06 inches?
(e) Given the positive intercept and the fact that the slope lies between zero and
one, what can you say about the height of students who have quite tall parents?
What about students who have quite short parents?
(f) Galton was concerned about the height of the English aristocracy and referred
to the above result as “regression towards mediocrity.” Can you figure out
what his concern was? Why do you think that we refer to this result today as
“Galton’s Fallacy?”
$\sum_{i=1}^n Y_i = 17{,}375$, $\sum_{i=1}^n X_i = 7{,}665.5$,
$\sum_{i=1}^n y_i^2 = 94{,}228.8$, $\sum_{i=1}^n x_i^2 = 1{,}248.9$, $\sum_{i=1}^n x_i y_i = 7{,}625.9$
(a) Calculate the slope and intercept of the regression and interpret these.
(b) Find the regression R² and explain its meaning. What other factors can you
think of that might have an influence on the weight of an individual?
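A minimal Python sketch of parts (a) and (b). It assumes n = 110 (the sample size given when these sums reappear in the Chapter 5 version of this exercise) and that small letters are deviations from the sample means, with Y = weight in pounds and X = height in inches.

```python
# Sums from the exercise. Small letters are deviations from sample means.
# n = 110 is an assumption taken from the Chapter 5 restatement of the data.
n = 110
sum_Y, sum_X = 17_375, 7_665.5
sum_y2, sum_x2, sum_xy = 94_228.8, 1_248.9, 7_625.9

beta1 = sum_xy / sum_x2                  # slope = sum(x*y) / sum(x^2)
beta0 = sum_Y / n - beta1 * (sum_X / n)  # intercept = Ybar - slope * Xbar
ess = beta1 * sum_xy                     # explained sum of squares
r2 = ess / sum_y2                        # R^2 = ESS / TSS

print(f"slope = {beta1:.3f} lb/inch, intercept = {beta0:.2f} lb, R^2 = {r2:.3f}")
```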
where Earn and Age are measured in dollars and years, respectively.
(c) Why should age matter in the determination of earnings? Do the results
suggest that there is a guarantee for earnings to rise for everyone as they
become older? Do you think that the relationship between age and earnings is
linear?
(d) The average age in this sample is 37.5 years. What is the average annual income in the
sample?
4) The baseball team nearest to your home town is, once again, not doing
well. Given that your knowledge of what it takes to win in baseball is
vastly superior to that of management, you want to find out what it takes
to win in Major League Baseball (MLB). You therefore collect the
winning percentage of all 30 baseball teams in MLB for 1999 and
regress the winning percentage on what you consider the primary
determinant for wins, which is quality pitching (team earned run
average). You find the following information on team performance:
(a) What is your expected sign for the regression slope? Will it make sense to
interpret the intercept? If not, should you omit it from your regression and
force the regression line through the origin?
(b) OLS estimation of the relationship between the winning percentage and the
team ERA yields the following:
had the most wins in 1999 with 103. Teams play a total of 162 games a year.
Given this information, do you consider the slope coefficient to be large or
small?
(d) What would be the effect on the slope, the intercept, and the regression R² if
you measured Winpct in percentage points, i.e., as (Wins/Games) × 100?
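The effect asked about in (d) can be checked on any data set. The sketch below uses made-up ERA and winning-share numbers (not the 1999 MLB data, which are not reproduced here) and a small, illustrative OLS helper named `ols`; rescaling the dependent variable by 100 rescales the intercept and slope by 100 and leaves R² unchanged.

```python
def ols(x, y):
    """Return (intercept, slope, R^2) for a simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    tss = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, 1 - ssr / tss

# Made-up numbers for illustration only (not the 1999 MLB data).
era = [3.8, 4.2, 4.5, 4.9, 5.3]
winpct = [0.60, 0.56, 0.50, 0.47, 0.41]

b0, b1, r2 = ols(era, winpct)                                  # Wins/Games
b0_pct, b1_pct, r2_pct = ols(era, [100 * w for w in winpct])   # percentage points
print(b0, b1, r2)
print(b0_pct, b1_pct, r2_pct)   # intercept and slope scaled by 100, same R^2
```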
(e) Are you impressed with the size of the regression R²? Given that 51% of the
variation in the winning percentage is unexplained, what might some of the
omitted factors be?
5) You have learned in one of your economics courses that one of the
determinants of per capita income (the “Wealth of Nations”) is the
population growth rate. Furthermore you also found out that the Penn
World Tables contain income and population data for 104 countries of
the world. To test this theory, you regress the GDP per worker (relative
to the United States) in 1990 (RelPersInc) on the difference between
the average population growth rate of that country (n) and the U.S.
average population growth rate (n_us) for the years 1980 to 1990. This
results in the following regression output:
(b) What would happen to the slope, intercept, and regression R² if you ran
another regression where the above explanatory variable was replaced by n
only, i.e., the average population growth rate of the country? (The population
growth rate of the United States from 1980 to 1990 was 0.009.) Should this
have any effect on the t-statistic of the slope?
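Because n = (n − n_us) + 0.009, replacing the regressor adds the same constant to every X value. A minimal sketch on hypothetical data (the Penn World Tables values are not reproduced here; `ols_with_se` is an illustrative helper) shows what such a shift does to the slope, intercept, R², and the homoskedasticity-only t-statistic.

```python
def ols_with_se(x, y):
    """Simple OLS: intercept, slope, R^2, and the homoskedasticity-only SE of the slope."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    se_b1 = (ssr / (n - 2) / sxx) ** 0.5
    return b0, b1, 1 - ssr / tss, se_b1

US_GROWTH = 0.009   # U.S. population growth rate 1980-1990, from the exercise

# Hypothetical data: growth differential (n - n_us) and relative income.
diff = [-0.004, 0.002, 0.008, 0.012, 0.020, 0.025]
rel_inc = [0.95, 0.70, 0.45, 0.30, 0.15, 0.10]
own = [d + US_GROWTH for d in diff]   # n = (n - n_us) + n_us

b0_a, b1_a, r2_a, se_a = ols_with_se(diff, rel_inc)
b0_b, b1_b, r2_b, se_b = ols_with_se(own, rel_inc)
print(b1_a, b1_b)                              # same slope
print(b0_b, b0_a - b1_a * US_GROWTH)           # intercept shifts by -slope * 0.009
print(r2_a, r2_b, b1_a / se_a, b1_b / se_b)    # same R^2 and t-statistic
```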
(c) 31 of the 104 countries have a dependent variable of less than 0.10. Does it
therefore make sense to interpret the intercept?
6) The neoclassical growth model predicts that for identical savings rates
and population growth rates, countries should converge to the same per
capita income level. This is referred to as the convergence hypothesis.
One way to test for the presence of convergence is to compare the
growth rates over time to the initial starting level.
(a) If you regressed the average growth rate over a time period (1960-1990) on
the initial level of per capita income, what would the sign of the slope have to
be to indicate this type of convergence? Explain. Would this result confirm or
(b) The results of the regression for 104 countries were as follows:
where g6090 is the average annual growth rate of GDP per worker for
the 1960-1990 sample period, and RelProd60 is GDP per worker
relative to the United States in 1960.
Interpret the results. Is there any evidence of unconditional
convergence between the countries of the world? Is this result
surprising? What other concept could you think about to test for
convergence between countries?
(c) You decide to restrict yourself to the 24 OECD countries in the sample. This
changes your regression output as follows:
7) In 2001, the Arizona Diamondbacks defeated the New York Yankees in the Baseball World
Series in 7 games. Some players, such as Bautista and Finley for the Diamondbacks, had a substantially
higher batting average during the World Series than during the regular season. Others, such as Brosius
and Jeter for the Yankees, did substantially worse. You set out to investigate whether or not the
regular season batting average is a good indicator for the World Series batting average. The results for
11 players who had the most at bats for the two teams are:
where Wsavg and Seasavg indicate the batting average during the
World Series and the regular season respectively.
(b) What can you say about the explanatory power of your equation? What do you
conclude from this?
8) For the simple regression model of Chapter 4, you have been given the
following data:
420 420
(a) How would the slope coefficient change if you decided one day to
measure testscores in 100s, i.e., a testscore of 650 became 6.5? Would
this have an effect on your interpretation?
(b) Do you think the regression R² will change? Why or why not?
(c) Although Chapter 4 in your textbook did not deal with hypothesis testing,
it presented you with the large sample distribution for the slope and the
intercept estimator. Given the change in the units of measurement in (a),
do you think that the variance of the slope estimator will change
numerically? Why or why not?
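A minimal sketch of the unit change in (a) and (c), on hypothetical district data (not the textbook's California data set; `slope_and_var` is an illustrative helper). Dividing the test scores by 100 divides the slope by 100 and its estimated variance by 100², so the t-statistic is unchanged.

```python
def slope_and_var(x, y):
    """Slope and homoskedasticity-only estimated variance of the slope."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    s2_u = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return b1, s2_u / sxx

# Hypothetical districts: student-teacher ratio and test score.
str_ratio = [17.0, 18.5, 19.0, 20.5, 21.0, 22.5]
score = [665.0, 660.0, 657.0, 651.0, 652.0, 645.0]

b1, v1 = slope_and_var(str_ratio, score)
b1_h, v1_h = slope_and_var(str_ratio, [s / 100 for s in score])   # 650 -> 6.5
print(b1_h / b1, v1_h / v1)                  # 1/100 and 1/100**2
print(b1 / v1 ** 0.5, b1_h / v1_h ** 0.5)    # identical t-statistics
```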
The concept of purchasing power parity or PPP (“the idea that similar
foreign and domestic goods … should have the same price in terms of
the same currency,” Abel, A. and B. Bernanke, Macroeconomics, 4th
edition, Boston: Addison Wesley, 476) suggests that the ratio of the Big
Mac priced in the local currency to the U.S. dollar price should equal the
exchange rate between the two countries.
Multiple Choice
With heteroskedastic errors, the weighted least squares estimator is BLUE. You
should use OLS with heteroskedasticity-robust standard errors because
a. not include an intercept because the price of the good is never zero.
b. use a one-sided alternative hypothesis to check the influence of price
on quantity.
c. use a two-sided alternative hypothesis to check the influence of price
on quantity.
d. reject the idea that price determines demand unless the coefficient is
at least 1.96.
6) If the absolute value of your calculated t-statistic exceeds the critical value
from the standard normal distribution, you can
7) Under the least squares assumptions (zero conditional mean for the error term,
Xi and Yi being i.i.d., and Xi and ui having finite fourth moments), the OLS
estimator for the slope and intercept
a. (estimate − hypothesized value) / (standard error of estimate).
b. estimator / (standard error of estimator).
c. (estimator − hypothesized value) / (standard error of estimator).
d. (estimator − hypothesized value) / (standard error of estimator / n).
9) Consider the following regression line: TestScore = 698.9 − 2.28 × STR. You are
told that the t-statistic on the slope coefficient is 4.38. What is the standard error of
the slope coefficient?
a. 0.52
b. 1.96
c. -1.96
d. 4.38
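Since the t-statistic for H0: β1 = 0 is β̂1/SE(β̂1), the standard error can be backed out as |β̂1|/|t|. The question quotes the t-statistic as 4.38; with the negative slope it would strictly be −4.38, which does not change the answer.

```python
slope = -2.28    # estimated coefficient on STR
t_stat = 4.38    # t-statistic as quoted in the question (absolute value)
se = abs(slope) / abs(t_stat)   # t = slope / SE  =>  SE = |slope| / |t|
print(round(se, 2))
```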
10) Imagine that you were told that the t-statistic for the slope coefficient of the
regression line TestScore = 698.9 − 2.28 × STR was 4.38. What are the units of
11) The construction of the t-statistic for a one- and a two-sided hypothesis
d. (β̂1 − 1.96, β̂1 + 1.96).
b. (β̂0 − 1.645 SE(β̂0), β̂0 + 1.645 SE(β̂0)).
d. (β̂0 − 1.96, β̂0 + 1.96).
15) The 95% confidence interval for the predicted effect of a general change in X is
a. $\frac{s_{\hat u}^2}{\sum_{i=1}^n (X_i - \bar X)^2}$
b. $\frac{s_{\hat u}}{\sum_{i=1}^n (X_i - \bar X)^2}$
c. $\frac{s_{\hat u}^2}{\sum_{i=1}^n X_i^2 - \bar X}$
d. $\frac{1}{n} \times \frac{\frac{1}{n-2}\sum_{i=1}^n (X_i - \bar X)^2 \hat u_i^2}{\left[\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2\right]^2}$
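The formulas in options a and d are the homoskedasticity-only and the heteroskedasticity-robust estimators of var(β̂1). A minimal sketch computing both on hypothetical data (`slope_variances` is an illustrative name); the data are built so that the OLS residuals are (2, −1, −2, −1, 2), largest where X is far from its mean, which makes the robust estimate the larger of the two.

```python
def slope_variances(x, y):
    """Return (homoskedasticity-only, heteroskedasticity-robust) estimates of
    var(beta1_hat), following the formulas in options a and d."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    b0 = ybar - b1 * xbar
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]      # OLS residuals
    s2_u = sum(ui * ui for ui in u) / (n - 2)            # SER squared
    var_homo = s2_u / sxx                                 # option a
    num = sum(d * d * ui * ui for d, ui in zip(dx, u)) / (n - 2)
    var_robust = (1 / n) * num / (sxx / n) ** 2           # option d
    return var_homo, var_robust

# Hypothetical data: Y = 1 + 0.5 X + u with u = (2, -1, -2, -1, 2).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [2.0, -0.5, -1.0, 0.5, 4.0]
print(slope_variances(x, y))
```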
17) One of the following steps is not required as a step to test for the null hypothesis:
18) Finding a small value of the p-value (e.g. less than 5%)
19) The only difference between a one- and two-sided hypothesis test is
a. dummy variable
b. dependent variable
c. residual
d. power of a test
b. var(ui | Xi = x) depends on x
c. Xi is normally distributed
d. there are no outliers.
22) In the presence of heteroskedasticity, and assuming that the usual least squares
assumptions hold, the OLS estimator is
a. efficient
b. BLUE
c. unbiased and consistent
d. unbiased but not consistent
23) The proof that OLS is BLUE requires all of the following assumptions with the
exception of:
c. E(ui | Xi) = 0.
d. large outliers are unlikely.
a. OLS is BLUE
b. WLS is BLUE if the conditional variance of the errors is known up to a
constant factor of proportionality
c. LAD is BLUE if the conditional variance of the errors is known up to a
constant factor of proportionality
d. OLS is efficient
25) The homoskedastic normal regression assumptions are all of the following with
the exception of:
where Studenth is the height of students in inches, and Midparh is the average of
the parental heights. Values in parentheses are heteroskedasticity-robust
standard errors. (Following Galton’s methodology, both variables were adjusted so
that the average female height was equal to the average male height.)
(b) If children, on average, were expected to be of the same height as their parents,
then this would imply two hypotheses, one for the slope and one for the
intercept.
(i) What should the null hypothesis be for the intercept? Calculate the
relevant t-statistic and carry out the hypothesis test at the 1% level.
(ii) What should the null hypothesis be for the slope? Calculate the relevant
t-statistic and carry out the hypothesis test at the 5% level.
(c) Can you reject the null hypothesis that the regression R² is zero?
(d) Construct a 95% confidence interval for the effect of a one-inch increase in the
average parental height.
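The regression output for this exercise did not survive extraction, so the coefficients below are placeholders, not the estimates from the exercise. The sketch (helper names are illustrative) shows how the t-statistics for nulls such as those in (b) and the 95% interval in (d) are formed from an estimate and its robust standard error, using large-sample normal critical values from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

def t_stat(estimate, se, null_value):
    """t-statistic for H0: coefficient = null_value."""
    return (estimate - null_value) / se

def conf_int(estimate, se, level=0.95):
    """Large-sample confidence interval: estimate +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return estimate - z * se, estimate + z * se

# Placeholder values, NOT the estimates from this exercise.
b0, se_b0 = 20.0, 8.0    # intercept and its robust SE
b1, se_b1 = 0.70, 0.10   # slope on Midparh and its robust SE

print(t_stat(b0, se_b0, 0.0))   # H0: intercept = 0
print(t_stat(b1, se_b1, 1.0))   # H0: slope = 1
print(conf_int(b1, se_b1))      # 95% CI for a one-inch increase in Midparh
```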
2) (Requires Appendix) (Continuation from Chapter 4.) At a recent county fair, you
observed a stand where people’s weight was being forecast, and you were surprised by
the accuracy of the forecasts (within a range). Curious how the person could have
predicted your weight fairly accurately (despite the fact that she did not know
about your “heavy bones”), you think about how this could have been
accomplished. You remember that medical charts for children contain 5%, 25%,
50%, 75% and 95% lines for a weight/height relationship and decide to conduct an
experiment with 110 of your peers. You collect the data and calculate the following
sums:
$\sum_{i=1}^n Y_i = 17{,}375$, $\sum_{i=1}^n X_i = 7{,}665.5$,
$\sum_{i=1}^n y_i^2 = 94{,}228.8$, $\sum_{i=1}^n x_i^2 = 1{,}248.9$, $\sum_{i=1}^n x_i y_i = 7{,}625.9$
where the height is measured in inches and weight in pounds. (Small letters refer
(c) Calculate the homoskedasticity-only standard errors and, using the resulting
t-statistic, perform a test on the null hypothesis that there is no relationship
between height and weight in the population of college students.
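A minimal sketch of part (c) using the sums above with n = 110: the homoskedasticity-only standard error of the slope follows from SSR = TSS − ESS and SER² = SSR/(n − 2). The resulting t-statistic can then be compared with the usual critical values.

```python
n = 110                                               # sample size from the exercise
sum_y2, sum_x2, sum_xy = 94_228.8, 1_248.9, 7_625.9   # deviation-form sums

beta1 = sum_xy / sum_x2
ssr = sum_y2 - beta1 * sum_xy         # SSR = TSS - ESS
s2_u = ssr / (n - 2)                  # squared standard error of the regression
se_beta1 = (s2_u / sum_x2) ** 0.5     # homoskedasticity-only SE of the slope
t = beta1 / se_beta1                  # t-statistic for H0: beta1 = 0
print(f"SE(slope) = {se_beta1:.3f}, t = {t:.2f}")
```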
(d) What is the alternative hypothesis in the above test, and what level of
(e) Statistics and econometrics textbooks often ask you to calculate critical values
based on some level of significance, say 1%, 5%, or 10%. What sort of criteria
do you think should play a role in determining which level of significance to
choose?
(d) What do you think the relationship is between testing for the significance of
the slope and whether or not the regression R² is zero?
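With homoskedasticity-only standard errors, the slope t-statistic and the regression R² in a simple regression satisfy t² = (n − 2)R²/(1 − R²), which is one way to think about (d); heteroskedasticity-robust t-statistics do not satisfy this identity exactly. A minimal numerical check on hypothetical data (`t_and_r2` is an illustrative helper):

```python
def t_and_r2(x, y):
    """Homoskedasticity-only t-statistic for H0: beta1 = 0, the regression R^2, and n."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    t = b1 / (ssr / (n - 2) / sxx) ** 0.5
    return t, 1 - ssr / tss, n

t, r2, n = t_and_r2([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # hypothetical data
print(t ** 2, (n - 2) * r2 / (1 - r2))                   # the two numbers coincide
```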
3) You have obtained measurements of height in inches of 29 female and 81 male students
(Studenth) at your university. A regression of the height on a constant and a binary variable (BFemme),
which takes a value of one for females and is zero otherwise, yields the following result:
(0.3) (0.57)
(a) What is the interpretation of the intercept? What is the interpretation of the slope?
How tall are females, on average?
(b) Test the hypothesis that females, on average, are shorter than males, at the 1%
level.
where Earn and Age are measured in dollars and years, respectively.
(b) The variance of the error term and the variance of the dependent variable are
related. Given the distribution of earnings, do you think it is plausible that the
distribution of errors is normal?
(c) Construct a 95% confidence interval for both the slope and the intercept.
(a) Is there any reason to believe that the variance of the error terms is
homoskedastic?
6) You recall from one of your earlier lectures in macroeconomics that the per capita
income depends on the savings rate of the country: those who save more end up
with a higher standard of living. To test this theory, you collect data from the Penn
World Tables on GDP per worker relative to the United States (RelProd) in 1990
and the average investment share of GDP from 1980-1990 (sK ), remembering
that investment equals saving. The regression results in the following output:
(0.04) (0.38)
(b) Calculate the t-statistics to determine whether the two coefficients are
significantly different from zero. Justify the use of a one-sided or two-sided
test.
You are delighted to find that the coefficients have not changed at all and that your
results have become even more significant. Why haven’t the coefficients changed?
Are the results really more significant? Explain.
(d) Upon reflection you think about the advantages of OLS with and without
homoskedasticity-only standard errors. What are these advantages? Is it likely
that the error terms would be heteroskedastic in this situation?
8) (Requires Appendix material from Chapters 4 and 5) Shortly before you are due to make a group
presentation on the testscore/student-teacher ratio results, you realize that one of your peers forgot
to type all the relevant information on one of your slides. Here is what you see:
In addition, your group member explains that he ran the regression in a standard spreadsheet
program, and that, as a result, the standard errors in parentheses are homoskedasticity-only standard
errors.
(b) Calculate the t-statistic for the slope and the intercept. Test the hypothesis that the intercept
(c) Should you be concerned that your group member only gave you the result for the
homoskedasticity-only standard error formula, instead of using the heteroskedasticity-robust standard
errors?
The concept of purchasing power parity or PPP (“the idea that similar foreign and
domestic goods … should have the same price in terms of the same currency,” Abel, A. and B.
Bernanke, Macroeconomics, 4th edition, Boston: Addison Wesley, 476) suggests that the ratio of the Big
Mac priced in the local currency to the U.S. dollar price should equal the exchange rate between the
two countries.
After entering the data into your spreadsheet program, you calculate the
predicted exchange rate per U.S. dollar by dividing the price of a Big Mac in local
currency by the U.S. price of a Big Mac ($2.51). To test for PPP, you regress the
actual exchange rate on the predicted exchange rate.
(a) Your spreadsheet program does not allow you to calculate heteroskedasticity-robust
standard errors. Instead, the numbers in parentheses are homoskedasticity-only standard errors. State
the two null hypotheses under which PPP holds. Should you use a one-tailed or two-tailed alternative
hypothesis?
(c) Using a 5% significance level, what is your decision regarding the null hypothesis given the
two t-statistics? What critical values did you use? Are you concerned with the fact that you are testing
the two hypotheses sequentially when they are supposed to hold simultaneously?
(d) What assumptions had to be made for you to use Student’s t-distribution?
10) (Continuation from Chapter 4, number 6) The neoclassical growth model predicts
that for identical savings rates and population growth rates, countries should
converge to the same per capita income level. This is referred to as the convergence
hypothesis. One way to test for the presence of convergence is to compare the
growth rates over time to the initial starting level.
(a) The results of the regression for 104 countries were as follows:
where g6090 is the average annual growth rate of GDP per worker for the
1960-1990 sample period, and RelProd60 is GDP per worker relative to the
United States in 1960. Numbers in parentheses are heteroskedasticity-robust
standard errors.
Why didn’t the estimated coefficients change? Given that the standard error of the
slope is now smaller, can you reject the null hypothesis of no beta convergence?
Are the results in the second equation more reliable than the results in the first
equation? Explain.
(b) You decide to restrict yourself to the 24 OECD countries in the sample. This
changes your regression output as follows (numbers in parentheses are
heteroskedasticity robust standard errors):
Test for evidence of convergence now. If your conclusion is different from that in (a),
speculate why this is the case.
(c) The authors of your textbook have informed you that unless you have more than
100 observations, it may not be plausible to assume that the distribution of your
OLS estimators is normal. What are the implications here for testing the
significance of your theory?
2) For the following estimated slope coefficients and their heteroskedasticity-robust
standard errors, find the t-statistics for the null hypothesis H0: β1 = 0. Assuming
that your sample has more than 100 observations, indicate whether or not you are
able to reject the null hypothesis at the 10%, 5%, and 1% levels, using both a one-sided and a
two-sided alternative hypothesis.
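The table of slopes and standard errors for this exercise did not survive extraction, so the usage line below feeds a hypothetical slope/SE pair into an illustrative helper, `decisions`. It uses standard normal critical values (appropriate for more than 100 observations, as the exercise assumes) and, for the one-sided test, assumes the alternative points in the same direction as the estimated slope.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal, appropriate for large samples

def decisions(slope, se):
    """t-statistic for H0: beta1 = 0 and reject (True) / do not reject (False)
    decisions at the 10%, 5% and 1% levels. The one-sided test assumes the
    alternative points in the same direction as the estimated slope."""
    t = slope / se
    result = {"t": t}
    for alpha in (0.10, 0.05, 0.01):
        result[alpha] = {
            "two-sided": abs(t) > Z.inv_cdf(1 - alpha / 2),   # 1.645, 1.96, 2.576
            "one-sided": abs(t) > Z.inv_cdf(1 - alpha),       # 1.282, 1.645, 2.326
        }
    return result

# Hypothetical slope and robust SE (the exercise's table is not reproduced here).
print(decisions(-1.1, 0.5))
```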
5) Below you are asked to decide on whether or not to use a one-sided alternative or
a two-sided alternative hypothesis for the slope coefficient. Briefly justify your
decision.
(a) $q_i^d = \beta_0 + \beta_1 p_i$, where $q^d$ is the quantity demanded of a good and $p$ is its price.
(b) $p_i^{actual} = \beta_0 + \beta_1 p_i^{assess}$, where $p_i^{actual}$ is the actual house price and $p_i^{assess}$ is the
assessed house price. You want to test whether or not the assessment is correct,
on average.
disposable income.
6) (Requires Appendix material) Your textbook shows that OLS is a linear estimator
$\hat\beta_1 = \sum_{i=1}^n \hat a_i Y_i$, where $\hat a_i = \frac{X_i - \bar X}{\sum_{i=1}^n (X_i - \bar X)^2}$. For OLS to be conditionally unbiased,
the following two conditions must hold: $\sum_{i=1}^n \hat a_i = 0$ and $\sum_{i=1}^n \hat a_i X_i = 1$. Show that
7) (Requires Appendix material and Calculus) Equation (5.36) in your textbook derives
$\operatorname{var}(\tilde\beta_1 \mid X_1, \ldots, X_n) = \sigma_u^2 \sum_{i=1}^n a_i^2$ (where the conditions for conditional
unbiasedness are $\sum_{i=1}^n a_i = 0$ and $\sum_{i=1}^n a_i X_i = 1$). As an alternative to the BLUE
proof presented in your textbook, you recall from one of your calculus courses that
you could minimize the variance subject to the two constraints, thereby making
the variance as small as possible while the constraints hold. Show that in
doing so you get the OLS weights $\hat a_i$. (You may assume that $X_1, \ldots, X_n$ are
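Exercises 6 and 7 can be illustrated numerically, although this does not replace the proofs. For one hypothetical X vector, the sketch checks that the OLS weights satisfy both unbiasedness conditions, then perturbs them along a direction d with ∑d_i = 0 and ∑d_i X_i = 0, which keeps both constraints satisfied. The sum of squared weights, and hence σ_u²∑a_i², rises whenever the perturbation is non-zero.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical regressor values
xbar = sum(x) / len(x)
sxx = sum((xi - xbar) ** 2 for xi in x)
a_ols = [(xi - xbar) / sxx for xi in x]   # OLS weights a_i

# The two unbiasedness conditions from Exercise 6 (approximately 0 and 1).
print(sum(a_ols), sum(ai * xi for ai, xi in zip(a_ols, x)))

# A direction that preserves both constraints: sum(d) = 0 and sum(d * X) = 0.
d = [1.0, -2.0, 1.0, 0.0, 0.0]

def sum_sq(w):
    """Sum of squared weights; var = sigma_u^2 * sum_sq(a) under homoskedasticity."""
    return sum(wi * wi for wi in w)

for eps in (0.0, 0.05, 0.1, -0.1):
    a = [ai + eps * di for ai, di in zip(a_ols, d)]
    print(eps, sum_sq(a))   # smallest at eps = 0, i.e. at the OLS weights
```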
8) Your textbook states that under certain restrictive conditions, the t-statistic has a
Student t-distribution with n-2 degrees of freedom. The loss of two degrees of
freedom is the result of OLS forcing two restrictions onto the data. What are these
two conditions, and when did you impose them onto the data set in your derivation
of the OLS estimator?
$Y_i = \beta_1 X_i + u_i$
i.e., a regression through the origin (no intercept). Under the homoskedastic
normal regression assumptions, the t-statistic will have a Student t distribution
with n-1 degrees of freedom, not n-2 degrees of freedom, as was the case in
Chapter 5 of your textbook. Explain. Do you think that the residuals will still sum to
zero for this case?
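A minimal sketch on hypothetical data for the regression through the origin: OLS imposes only the single restriction ∑X_iû_i = 0 here, and the residuals themselves need not sum to zero.

```python
# Hypothetical data for a regression through the origin, Y_i = beta1 * X_i + u_i.
x = [1.0, 2.0, 3.0]
y = [2.0, 2.0, 4.0]

b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)   # 18/14
resid = [yi - b1 * xi for xi, yi in zip(x, y)]
print(sum(resid))                                  # 2/7: residuals need not sum to zero
print(sum(ri * xi for ri, xi in zip(resid, x)))    # about 0: the single normal equation
```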
10) In many of the cases discussed in your textbook, you test for the significance of
the slope at the 5% level. What is the size of the test? What is the power of the
test? Why is the probability of committing a Type II error so large here?
11) Assume that the homoskedastic normal regression assumptions hold. Using the
Student t-distribution, find the critical value for the following situation:
12) Consider the following two models involving binary variables as explanatory
variables:
where Wage is the hourly wage rate, DFemme is a binary variable that is equal to
1 if the person is a female, and 0 if the person is a male. Male = 1 – DFemme .
Even though you have not learned about regression functions with two
explanatory variables (or regressions without an intercept), assume that you had
estimated both models, i.e., you obtained the estimates for the regression
coefficients.
What is the predicted wage for a male in the two models? What is the predicted
wage for a female in the two models? What is the relationship between the βs
and the φs? Why would you prefer one model over the other?
13) Consider the sample regression function $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$. The table below lists
estimates for the slope ($\hat\beta_1$) and the variance of the slope estimator ($\hat\sigma^2_{\hat\beta_1}$). In
each case calculate the p-value for the null hypothesis $\beta_1 = 0$ and a two-tailed
alternative hypothesis. Indicate in which cases you would reject the null hypothesis
at the 5% significance level.
$\hat\sigma^2_{\hat\beta_1}$:   0.37   0.000003   117.5   0.0000013
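A minimal sketch of the p-value calculation, using the standard normal CDF from Python's `statistics.NormalDist` (a large-sample approximation). The slope row of the table did not survive extraction, so the usage line pairs the first variance (0.37) with a hypothetical slope of 1.0; note that the standard error is the square root of the listed variance.

```python
from statistics import NormalDist

def two_sided_p(slope, var_slope):
    """Large-sample two-sided p-value for H0: beta1 = 0, given the slope
    estimate and the estimated variance of the slope estimator."""
    t = slope / var_slope ** 0.5   # SE is the square root of the variance
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Hypothetical slope paired with the first variance from the table (0.37).
p = two_sided_p(1.0, 0.37)
print(round(p, 3), p < 0.05)   # reject at 5% only if p < 0.05
```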
14) Your textbook discussed the regression model when X is a binary variable:
$Y_i = \beta_0 + \beta_1 D_i + u_i, \quad i = 1, \ldots, n$
Let Y represent wages, and let D be one for females and zero for males. Using the
OLS formula for the slope coefficient, prove that $\hat\beta_1$ is the difference between the
average wage for females and the average wage for males.
15) Your textbook discussed the regression model when X is a binary variable:
$Y_i = \beta_0 + \beta_1 D_i + u_i, \quad i = 1, \ldots, n$
Let Y represent wages, and let D be one for females and zero for males. Using the
OLS formula for the intercept coefficient, prove that $\hat\beta_0$ is the average wage for
males.
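A numerical check of Exercises 14 and 15 on hypothetical wages (this illustrates the results, it does not prove them): with D = 1 for females, the OLS intercept equals the male sample mean and the OLS slope equals the female mean minus the male mean.

```python
# Hypothetical hourly wages; d = 1 for females, 0 for males.
wage = [18.0, 22.0, 20.0, 15.0, 17.0, 16.0]
d = [0, 0, 0, 1, 1, 1]

n = len(wage)
dbar, wbar = sum(d) / n, sum(wage) / n
b1 = (sum((di - dbar) * (wi - wbar) for di, wi in zip(d, wage))
      / sum((di - dbar) ** 2 for di in d))
b0 = wbar - b1 * dbar

male_mean = sum(w for w, di in zip(wage, d) if di == 0) / d.count(0)
female_mean = sum(w for w, di in zip(wage, d) if di == 1) / d.count(1)
print(b0, male_mean)                 # intercept = average male wage
print(b1, female_mean - male_mean)   # slope = female average minus male average
```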
16) Let ui be distributed $N(0, \sigma_u^2)$, i.e., the errors are distributed normally with a
$N(\beta_1, \sigma^2_{\hat\beta_1})$, where $\sigma^2_{\hat\beta_1} = \frac{\sigma_u^2}{\sum_{i=1}^n (X_i - \bar X)^2}$. Statistical inference would be
straightforward if $\sigma_u^2$ were known. One way to deal with this problem is to replace
$\sigma_u^2$ with an estimator $s_{\hat u}^2$. Clearly, since this introduces more uncertainty, you
follows Student’s t distribution. Look at the table for the Student t-distribution and
focus on the 5% two-sided significance level. List the critical values for 10 degrees
of freedom, 30 degrees of freedom, 60 degrees of freedom, and finally ∞ degrees
[Figure: histograms of BETA1_HAT and BETA0_HAT across 1,000 simulated samples (Sample: 1 1000)]

Statistic       BETA1_HAT    BETA0_HAT
Observations    1000         1000
Mean            -0.499857    100.0144
Median          -0.500181    100.0212
Maximum         -0.467636    106.3482
Minimum         -0.538362    93.86240
Std. Dev.       0.010872     1.994086
Skewness        -0.042224    0.012801
Kurtosis        2.986107     3.025797
Jarque-Bera     0.305190     0.055040
Probability     0.858477     0.972855
Using the means listed next to the graphs, you see that the averages are not
exactly 100 and -0.5. However, they are “close.” Test whether the differences between
these averages and the population values are statistically significant.
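A minimal sketch of the test, using the means and standard deviations reported above. It assumes the 1,000 replications are independent, so the standard error of each Monte Carlo mean is the reported Std. Dev. divided by √1000, and it takes the population values to be 100 and −0.5 as stated; `mc_t` is an illustrative name.

```python
import math

REPS = 1000   # number of Monte Carlo samples reported in the output

def mc_t(mc_mean, mc_sd, true_value, reps=REPS):
    """t-statistic for H0: E(estimator) = true_value, treating the replications
    as independent so that SE(mean) = sd / sqrt(reps)."""
    return (mc_mean - true_value) / (mc_sd / math.sqrt(reps))

t_beta1 = mc_t(-0.499857, 0.010872, -0.5)
t_beta0 = mc_t(100.0144, 1.994086, 100.0)
print(round(t_beta1, 2), round(t_beta0, 2))   # compare with +/-1.96
```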
18) In the regression through the origin model $Y_i = \beta_1 X_i + u_i$, the OLS estimator is
$\hat\beta_1 = \frac{\sum_{i=1}^n X_i Y_i}{\sum_{i=1}^n X_i^2}$. Prove that the estimator is a linear function of $Y_1, \ldots, Y_n$ and prove
19) The neoclassical growth model predicts that for identical savings rates and
population growth rates, countries should converge to the same per capita income level.
This is referred to as the convergence hypothesis. One way to test for the
presence of convergence is to compare the growth rates over time to the initial
g6090 is the average annual growth rate of GDP per worker for the 1960-1990
sample period, and RelProd60 is GDP per worker relative to the United States
implying (“beta”) convergence. Using a standard regression package, you get the
following output:
Variable    Coefficient    Std. Error    t-Statistic    Prob.
You are delighted to see that this program has already calculated p-values for you.
However, a peer of yours points out that the correct p-value should be 0.4562.
Who is right?
20) Changing the units of measurement obviously will have an effect on the slope of
easy but tedious to show that $\hat\beta_1^* = \frac{\sum_{i=1}^n x_i^* y_i^*}{\sum_{i=1}^n x_i^{*2}} = \frac{a}{b}\hat\beta_1$. Given this result, how do you
c) Enter the data into your regression analysis program (EViews, Stata,
Excel, SAS, etc.). Calculate the predicted exchange rate per U.S. dollar
by dividing the price of a Big Mac in local currency by the U.S. price of a
Big Mac ($2.51).
(c) Plot the actual exchange rate against the predicted exchange rate.
Include the 45-degree line in your graph. Which observations might
cause the intercept and the slope to differ from zero and one, respectively?
2) You have analyzed the relationship between the weight and height of
individuals. Although you are quite confident about the accuracy of your
measurements, you feel that some of the observations are extreme, say,
more than two standard deviations above or below the mean. You therefore
decide to disregard these individuals. What consequence will this have
on the standard deviation of the OLS estimator of the slope?
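A minimal sketch, assuming the "extreme" observations are defined on the regressor (height) and that σ_u² is unaffected by the trimming. Under homoskedasticity var(β̂1 | X) = σ_u²/∑(X_i − X̄)², so discarding the X values farthest from the mean shrinks the denominator. The heights below are hypothetical.

```python
def sxx(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

# Hypothetical heights in inches (the exercise's data are not reproduced here).
heights = [56, 64, 65, 66, 67, 67, 68, 68, 69, 69, 70, 71, 72, 82]
m = sum(heights) / len(heights)
sd = (sxx(heights) / (len(heights) - 1)) ** 0.5
kept = [h for h in heights if abs(h - m) <= 2 * sd]   # drop the "extreme" ones

# With sigma_u^2 held fixed, the variance of the slope estimator rises by
# the factor sxx(all) / sxx(kept).
print(len(heights), len(kept), sxx(heights) / sxx(kept))
```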
3) In order to calculate the regression R² you need the TSS and either the
SSR or the ESS. The TSS is fairly straightforward to calculate, being
just the variation of Y. However, if you had to calculate the SSR or ESS
by hand (or in a spreadsheet), you would need all fitted values from the
regression function and their deviations from the sample mean, or the
residuals. Can you think of a quicker way to calculate the ESS simply
using terms you have already used to calculate the slope coefficient?
restrictions that OLS places on the data, namely that $\sum_{i=1}^n \hat u_i = 0$ and
$\sum_{i=1}^n \hat u_i X_i = 0$. Show that you get the same formula for the regression
slope and the intercept if you impose these two conditions on the
sample regression function.
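One way to see this numerically: writing û_i = Y_i − b0 − b1X_i, the two restrictions become two linear equations in (b0, b1), which the sketch below solves by Cramer's rule on hypothetical data and then checks against both residual conditions. The helper name is illustrative.

```python
def ols_from_normal_equations(x, y):
    """Solve sum(u_hat) = 0 and sum(u_hat * X) = 0 for (b0, b1), i.e.
        n * b0      + sum(X) * b1   = sum(Y)
        sum(X) * b0 + sum(X^2) * b1 = sum(XY)
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx            # zero only if all X_i are equal
    b0 = (sy * sxx - sx * sxy) / det   # Cramer's rule
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1 = ols_from_normal_equations(x, y)
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
print(b0, b1, sum(resid), sum(r * xi for r, xi in zip(resid, x)))
```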
5) (Requires Appendix material) Show that the two alternative formulae for
the slope given in your textbook are identical.
$\frac{\frac{1}{n}\sum_{i=1}^n X_i Y_i - \bar X \bar Y}{\frac{1}{n}\sum_{i=1}^n X_i^2 - \bar X^2} = \frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^n (X_i - \bar X)^2}$
$Y_i = \beta_0 + u_i$.
$Y_i = \beta_1 X_i + u_i$.
8) Show first that the regression R² is the square of the sample correlation
coefficient. Next, show that the slope of a simple regression of Y on X is
identical to the inverse of the regression slope of X on Y only if the
regression R² equals one.
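A numerical illustration (not a proof) on hypothetical data: the product of the Y-on-X slope and the X-on-Y slope equals the squared sample correlation, which is also the regression R². The Y-on-X slope therefore equals the reciprocal of the X-on-Y slope only when R² = 1. The helper names are illustrative.

```python
def slope(a, b):
    """OLS slope from regressing b on a."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    return (sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))
            / sum((ai - abar) ** 2 for ai in a))

def corr(a, b):
    """Sample correlation coefficient."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    sab = sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))
    saa = sum((ai - abar) ** 2 for ai in a)
    sbb = sum((bi - bbar) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
b_yx, b_xy, r = slope(x, y), slope(y, x), corr(x, y)
print(b_yx * b_xy, r ** 2)   # equal: the product of the two slopes is R^2
print(b_yx, 1 / b_xy)        # different here, because R^2 < 1
```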
$Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat u_i$.
First, take averages on both sides of the equation. Second, subtract the resulting
equation from the above equation to write the sample regression function in
deviations from means. (For simplicity, you may want to use small letters to
two-dimensional diagram with SSR on the vertical axis and the regression
slope on the horizontal axis how you could find the least squares estimator for
the slope by varying its values through trial and error.
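A minimal trial-and-error sketch on hypothetical data: for each trial slope, the intercept is set to Ȳ − b1X̄ (equivalent to working in deviations from means), SSR is computed, and the trial value with the smallest SSR is kept. Plotting the `curve` points would give the U-shaped SSR diagram the exercise describes.

```python
def ssr(b1, x, y):
    """SSR for a trial slope b1, with the intercept set to Ybar - b1 * Xbar
    (the same as fitting the line in deviations from means)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b0 = ybar - b1 * xbar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]

grid = [i / 100 for i in range(-100, 201)]          # trial slopes -1.00 ... 2.00
curve = [(b, ssr(b, x, y)) for b in grid]           # points for the SSR diagram
best = min(curve, key=lambda point: point[1])[0]    # trial slope with smallest SSR
print(best)
```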
10) Given the amount of money and effort that you have spent on your
education, you wonder if it was (is) all worth it. You therefore collect data
from the Current Population Survey (CPS) and estimate a linear
relationship between earnings and the years of education of individuals.
What would be the effect on your regression slope and intercept if you
measured earnings in thousands of dollars rather than in dollars?
Would the regression R² be affected? Should statistical inference be
dependent on the scale of variables? Discuss.
$Y_i^* = \gamma_0 + \gamma_1 X_i^* + u_i$,
where “*” indicates that the variable has been standardized. What are
the units of measurement for the dependent and explanatory variable?
Why would you want to transform both variables in this way? Show that
the OLS estimator for the intercept equals zero. Next prove that the
OLS estimator for the slope in this case is identical to the formula for the
least squares estimator where the variables have not been
standardized, times the ratio of the sample standard deviation of X and
Y, i.e., $\hat\gamma_1 = \hat\beta_1 \times \frac{s_X}{s_Y}$.
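A numerical check on hypothetical data, using `statistics.mean` and `statistics.stdev` (sample standard deviation): after standardizing both variables, the fitted intercept is essentially zero and the slope equals β̂1 × s_X/s_Y, which is also the sample correlation. The helper names are illustrative.

```python
from statistics import mean, stdev

def standardize(v):
    """(v - sample mean) / sample standard deviation."""
    m, s = mean(v), stdev(v)
    return [(vi - m) / s for vi in v]

def ols(a, b):
    """Intercept and slope from regressing b on a."""
    abar, bbar = mean(a), mean(b)
    b1 = (sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))
          / sum((ai - abar) ** 2 for ai in a))
    return bbar - b1 * abar, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]

g0, g1 = ols(standardize(x), standardize(y))
b0, b1 = ols(x, y)
print(g0)                              # essentially zero
print(g1, b1 * stdev(x) / stdev(y))    # the same number
```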
12) The OLS slope estimator is not defined if there is no variation in the
data for the explanatory variable. You are interested in estimating a
regression relating earnings to years of schooling. Imagine that you
had collected data on earnings for different individuals, but that all these
individuals had completed a college education (16 years of education).
Sketch what the data would look like and explain intuitively why the OLS
coefficient does not exist in this situation.
13) Indicate in a scatterplot what the data for your dependent variable and your
explanatory variable would look like in a regression with an R² equal to zero.
How would this change if the regression R² were equal to one?
In the case of the regression R² being one, all observations would lie
on a straight line.
14) Imagine that you had discovered a relationship that would generate a
scatterplot very similar to the relationship $Y_i = X_i^2$, and that you would
try to fit a linear regression through your data points. What do you
expect the slope coefficient to be? What do you think the value of your
regression R² is in this situation? What are the implications of your
answers in terms of fitting a linear regression through a non-linear
relationship?
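A sketch of the point in Exercise 14 with hypothetical X values: what a straight-line fit to Y = X² gives depends on where the X values lie. With X symmetric around zero, the fitted slope and R² are both zero even though Y is an exact function of X; with positive X values the slope is positive and R² is high, although a line still has the wrong shape.

```python
def ols(x, y):
    """Return (intercept, slope, R^2) for a simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return b0, b1, 1 - ssr / tss

sym = [-3, -2, -1, 0, 1, 2, 3]   # X values symmetric around zero
pos = [1, 2, 3, 4, 5, 6, 7]      # X values all positive

b0_s, b1_s, r2_s = ols(sym, [v ** 2 for v in sym])
b0_p, b1_p, r2_p = ols(pos, [v ** 2 for v in pos])
print(b1_s, r2_s)   # slope 0 and R^2 0: no linear relation at all
print(b1_p, r2_p)   # positive slope and high R^2, yet the shape is wrong
```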
$\sum_{i=1}^n \hat u_i = 0$ and $\sum_{i=1}^n \hat u_i X_i = 0$. Show that these conditions imply that
$\sum_{i=1}^n \hat u_i \hat Y_i = 0$.
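A numerical check on hypothetical data: the OLS residuals sum to zero, are orthogonal to X, and are therefore orthogonal to the fitted values, since ∑ûŶ = β̂0∑û + β̂1∑ûX.

```python
x = [1.0, 2.0, 4.0, 7.0, 8.0]   # hypothetical data
y = [3.0, 1.0, 6.0, 9.0, 8.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# sum(u * Yhat) = b0 * sum(u) + b1 * sum(u * X), and both sums are zero.
print(sum(resid),
      sum(r * xi for r, xi in zip(resid, x)),
      sum(r * f for r, f in zip(resid, fitted)))
```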
16) The help function for a commonly used spreadsheet program gives the
following definition for the regression slope it estimates:
$\frac{n\sum_{i=1}^n X_i Y_i - \left(\sum_{i=1}^n X_i\right)\left(\sum_{i=1}^n Y_i\right)}{n\sum_{i=1}^n X_i^2 - \left(\sum_{i=1}^n X_i\right)^2}$
Prove that this formula is the same as the one given in the textbook.
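A numerical check (not a proof) on hypothetical data that three expressions give the same slope: the deviation form, the 1/n moment form from Exercise 5, and the spreadsheet form above. Multiplying the numerator and denominator of the moment form by n² gives the spreadsheet form. The function names are illustrative.

```python
def slope_deviations(x, y):
    """Textbook form: sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
            / sum((xi - xbar) ** 2 for xi in x))

def slope_moments(x, y):
    """Alternative form: (mean(XY) - Xbar*Ybar) / (mean(X^2) - Xbar^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    mxy = sum(xi * yi for xi, yi in zip(x, y)) / n
    mxx = sum(xi * xi for xi in x) / n
    return (mxy - xbar * ybar) / (mxx - xbar ** 2)

def slope_spreadsheet(x, y):
    """Help-file form: (n*sum(XY) - sum(X)sum(Y)) / (n*sum(X^2) - sum(X)^2)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return (n * sxy - sx * sy) / (n * sxx - sx ** 2)

x = [62.0, 65.5, 68.0, 70.0, 73.5]        # hypothetical heights
y = [120.0, 140.0, 150.0, 165.0, 190.0]   # hypothetical weights
print(slope_deviations(x, y), slope_moments(x, y), slope_spreadsheet(x, y))
```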
17) In order to calculate the slope, the intercept, and the regression R² for
a simple sample regression function, list the five sums of data that you
need.
18) A peer of yours, who is a major in another social science, says he is not
interested in the regression slope and/or intercept. Instead he only
cares about correlations. For example, in the testscore/student-teacher
ratio regression, he claims to get all the information he needs from the
negative correlation coefficient corr(X,Y)=-0.226. What response might
you have for your peer?