
Haopeng Lu

Part 1: Explaining Income Patterns


You will explore the determinants of income using data from the National
Longitudinal Survey of Youth 2000 sample. Download the file PS3_Data.dta,
which contains a random sample of 1000 respondents. This data set began
with roughly 12,000 American teenagers in 1979 and has been following them
since. Respondents are classified as white, black or Hispanic.
1. Get to know your data set.
a) Show summary statistics by using the command “sum” (summarize). What
fraction of the sample is female? What is the average age? What are the
minimum, maximum and average monthly incomes in the sample? What
fraction is black and what fraction is Hispanic?
. sum

Variable    Obs      Mean        Std. Dev.   Min         Max
age         1,000    20.261      1.573326    16          23
black       1,000    .086        .2805043    0           1
hispanic    1,000    .057        .2319586    0           1
income      1,000    889.5661    345.7179    258.9612    2618.175
single      1,000    .736        .4410198    0           1
married     1,000    .264        .4410198    0           1
yrs_educ    1,000    11.958      1.502163    4           17
urban       1,000    .8          .4002002    0           1
male        1,000    .538        .4988034    0           1

The mean of male is 0.538, so the mean of female is 1 - 0.538 = 0.462. That is
462 of the 1,000 respondents, so the fraction of the sample that is female is
462/1000 = 46.2%. The average age is 20.261. The minimum monthly income is
258.9612, the maximum monthly income is 2618.175, and the average monthly
income in the sample is 889.5661. The mean of black is 0.086, so 86 of the
1,000 respondents are black and the fraction of the sample that is black is
86/1000 = 8.6%. The mean of hispanic is 0.057, so 57 of the 1,000 respondents
are Hispanic and the fraction of the sample that is Hispanic is 57/1000 = 5.7%.

b) Now generate a new variable called “minority” that is equal to 1 if a person
is black or Hispanic, and 0 otherwise. What fraction of the sample is made up of
whites? What is the 95% Confidence Interval for the fraction that is made up by
minorities? Can we reject the null hypothesis that 50% of the sample is made
up of minorities? Why or why not?
. generate minority = black==1| hispanic==1

. sum

Variable    Obs      Mean        Std. Dev.   Min         Max
age         1,000    20.261      1.573326    16          23
black       1,000    .086        .2805043    0           1
hispanic    1,000    .057        .2319586    0           1
income      1,000    889.5661    345.7179    258.9612    2618.175
single      1,000    .736        .4410198    0           1
married     1,000    .264        .4410198    0           1
yrs_educ    1,000    11.958      1.502163    4           17
urban       1,000    .8          .4002002    0           1
male        1,000    .538        .4988034    0           1
minority    1,000    .143        .350248     0           1

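Whites = 1000 - 143 = 857, so the fraction of the sample made up of whites is
857/1000 = 85.7%.

The standard error of the sample mean of minority is

$$SE(\bar{x}) = \frac{s_{\text{minority}}}{\sqrt{n}} = \frac{0.350248}{\sqrt{1000}} \approx 0.0111$$

and the 95% confidence interval is $\bar{x} \pm 1.96 \times SE(\bar{x})$:

0.143 - 1.96*0.0111 ≈ 0.1212 and 0.143 + 1.96*0.0111 ≈ 0.1648

The 95% Confidence Interval for the fraction that is made up by minorities is
[0.1212, 0.1648].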

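$H_0$: the fraction of the sample made up of minorities = 0.5
$H_1$: the fraction of the sample made up of minorities ≠ 0.5

$$t = \frac{\bar{x} - \mu_{x,0}}{SE(\bar{x})} = \frac{0.143 - 0.5}{0.0111} \approx -32.1622, \qquad |t| \approx 32.1622$$

Since the absolute value of the t-statistic is greater than 1.96, we reject the
null hypothesis that 50% of the sample is made up of minorities at the 5%
significance level. Equivalently, 0.5 lies far outside the 95% confidence
interval.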
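
The confidence interval and t-test above can be checked with a few lines of Python. This is only a sketch that plugs in the mean and standard deviation reported by `sum`; it does not read PS3_Data.dta, and the variable names are illustrative. Because it carries the unrounded standard error, the fourth decimal of the interval and the t-statistic can differ slightly from a hand calculation that rounds the standard error to 0.0111.

```python
import math

# Values copied from the Stata `sum` output for the minority dummy.
n = 1000          # number of observations
x_bar = 0.143     # sample mean (fraction minority)
s = 0.350248      # sample standard deviation

se = s / math.sqrt(n)            # standard error of the sample mean
ci_low = x_bar - 1.96 * se       # lower bound of the 95% confidence interval
ci_high = x_bar + 1.96 * se      # upper bound of the 95% confidence interval
t_stat = (x_bar - 0.5) / se      # t-statistic for H0: fraction = 0.5
reject_5pct = abs(t_stat) > 1.96

print(f"SE = {se:.4f}")
print(f"95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
print(f"t = {t_stat:.2f}, reject H0 at 5%: {reject_5pct}")
```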

2. Regress income on the variables male, minority and years of education. For
this and all subsequent regressions, use the “robust” option. Report your
results. Discuss which coefficients are significant (and at what level) and which
are not.
. reg income male minority yrs_educ, robust

Linear regression                    Number of obs = 1,000
                                     F(3, 996)     = 51.88
                                     Prob > F      = 0.0000
                                     R-squared     = 0.1356
                                     Root MSE      = 321.91

            Robust
income      Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
male        237.22       20.31916     11.67    0.000    197.3467     277.0933
minority    -34.96239    28.8751      -1.21    0.226    -91.62541    21.70063
yrs_educ    44.64197     6.52905      6.84     0.000    31.8297      57.45424
_cons       233.1127     80.11345     2.91     0.004    75.90214     390.3232

The coefficient on male is 237.22. The coefficient on minority is -34.96239. The
coefficient on years of education is 44.64197. The intercept is 233.1127, which
means that predicted income is 233.1127 when male = 0, minority = 0 and years
of education = 0.

$$\widehat{income} = 233.1127 + 237.22\,X_{\text{male}} - 34.9624\,X_{\text{minority}} + 44.64197\,X_{\text{yrs\_educ}}$$
$H_0: \beta_{\text{male}} = 0$ versus $H_1: \beta_{\text{male}} \neq 0$; $t_{\text{male}} = 11.67$, so $|t_{\text{male}}| > 2.58$.
Since $|t_{\text{male}}| > 2.58$ (the critical value for α = 1%), we can reject the null
hypothesis at the 1% significance level. We conclude that being male is
associated with higher income.

$H_0: \beta_{\text{minority}} = 0$ versus $H_1: \beta_{\text{minority}} \neq 0$; $t_{\text{minority}} = -1.21$, so $|t_{\text{minority}}| < 1.96$.
Since $|t_{\text{minority}}| < 1.96$ (the critical value for α = 5%), we cannot reject the null
hypothesis at the 5% significance level. We find no statistically significant
association between minority status and income.

$H_0: \beta_{\text{yrs\_educ}} = 0$ versus $H_1: \beta_{\text{yrs\_educ}} \neq 0$; $t_{\text{yrs\_educ}} = 6.84$, so $|t_{\text{yrs\_educ}}| > 2.58$.
Since $|t_{\text{yrs\_educ}}| > 2.58$ (the critical value for α = 1%), we can reject the null
hypothesis at the 1% significance level. We conclude that years of education is
associated with income.

$H_0: \beta_{\text{intercept}} = 0$ versus $H_1: \beta_{\text{intercept}} \neq 0$; $t_{\text{intercept}} = 2.91$, so $|t_{\text{intercept}}| > 2.58$.
Since $|t_{\text{intercept}}| > 2.58$ (the critical value for α = 1%), we can reject the null
hypothesis at the 1% significance level. We conclude that the intercept is
statistically different from zero.
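
The decision rule used in these tests can be written down once and reused. The sketch below assumes the large-sample two-sided critical values 1.645, 1.96 and 2.58 (the 10% value 1.645 is not used in the text above and is included only for completeness); the function name `significance_level` is a hypothetical helper, not a Stata command.

```python
def significance_level(t_stat):
    """Return the strictest conventional two-sided level at which |t| rejects H0."""
    abs_t = abs(t_stat)
    if abs_t > 2.58:
        return "1%"
    if abs_t > 1.96:
        return "5%"
    if abs_t > 1.645:
        return "10%"
    return "not significant"

# t-statistics copied from the regression of income on male, minority, yrs_educ.
t_stats = {"male": 11.67, "minority": -1.21, "yrs_educ": 6.84, "_cons": 2.91}
levels = {name: significance_level(t) for name, t in t_stats.items()}

for name, level in levels.items():
    print(f"{name}: {level}")
```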

3. Now regress income on the variables male, minority, years of education and
married. Which coefficients are significant and at what level? Also, what is the
estimated change in income associated with marriage, for males, and for
females?

. reg income male minority yrs_educ married,robust

Linear regression                    Number of obs = 1,000
                                     F(4, 995)     = 40.09
                                     Prob > F      = 0.0000
                                     R-squared     = 0.1419
                                     Root MSE      = 320.89

            Robust
income      Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
male        241.1299     20.42648     11.80    0.000    201.046      281.2139
minority    -27.12995    28.82951     -0.94    0.347    -83.70358    29.44368
yrs_educ    46.78304     6.558709     7.13     0.000    33.91255     59.65353
married     62.89816     23.02831     2.73     0.006    17.70854     108.0878
_cons       187.681      81.79813     2.29     0.022    27.16436     348.1976

$H_0: \beta_{\text{male}} = 0$ versus $H_1: \beta_{\text{male}} \neq 0$; $t_{\text{male}} = 11.80$, so $|t_{\text{male}}| > 2.58$.
Since $|t_{\text{male}}| > 2.58$ (the critical value for α = 1%), we can reject the null
hypothesis at the 1% significance level. We conclude that being male is
associated with higher income.

$H_0: \beta_{\text{minority}} = 0$ versus $H_1: \beta_{\text{minority}} \neq 0$; $t_{\text{minority}} = -0.94$, so $|t_{\text{minority}}| < 1.96$.
Since $|t_{\text{minority}}| < 1.96$ (the critical value for α = 5%), we cannot reject the null
hypothesis at the 5% significance level. We find no statistically significant
association between minority status and income.

$H_0: \beta_{\text{yrs\_educ}} = 0$ versus $H_1: \beta_{\text{yrs\_educ}} \neq 0$; $t_{\text{yrs\_educ}} = 7.13$, so $|t_{\text{yrs\_educ}}| > 2.58$.
Since $|t_{\text{yrs\_educ}}| > 2.58$ (the critical value for α = 1%), we can reject the null
hypothesis at the 1% significance level. We conclude that years of education is
associated with income.

$H_0: \beta_{\text{married}} = 0$ versus $H_1: \beta_{\text{married}} \neq 0$; $t_{\text{married}} = 2.73$, so $|t_{\text{married}}| > 2.58$.
Since $|t_{\text{married}}| > 2.58$ (the critical value for α = 1%), we can reject the null
hypothesis at the 1% significance level. We conclude that being married is
associated with income.

$H_0: \beta_{\text{intercept}} = 0$ versus $H_1: \beta_{\text{intercept}} \neq 0$; $t_{\text{intercept}} = 2.29$, so $|t_{\text{intercept}}| > 1.96$.
Since $|t_{\text{intercept}}| > 1.96$ (the critical value for α = 5%) but below 2.58, we can
reject the null hypothesis at the 5% significance level but not at the 1%
level. We conclude that the intercept is statistically different from zero at
the 5% level.
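$$\widehat{income} = 187.681 + 241.1299\,X_{\text{male}} + 46.78304\,X_{\text{yrs\_educ}} + 62.89816\,X_{\text{married}} - 27.13\,X_{\text{minority}}$$

Holding years of education and minority status unchanged:

For a married male:
$$\widehat{income} = 187.681 + 241.1299 + 46.78304\,X_{\text{yrs\_educ}} + 62.89816 - 27.13\,X_{\text{minority}}$$

For an unmarried male:
$$\widehat{income} = 187.681 + 241.1299 + 46.78304\,X_{\text{yrs\_educ}} - 27.13\,X_{\text{minority}}$$

For a married female:
$$\widehat{income} = 187.681 + 46.78304\,X_{\text{yrs\_educ}} + 62.89816 - 27.13\,X_{\text{minority}}$$

For an unmarried female:
$$\widehat{income} = 187.681 + 46.78304\,X_{\text{yrs\_educ}} - 27.13\,X_{\text{minority}}$$

Because this model has no interaction between male and married, the estimated
change in income associated with marriage is the same for males and for
females: 62.89816. From these equations we can also see that men, whether
married or not, are predicted to earn more than women under the same
conditions.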
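
The four predicted incomes can be compared numerically. The following sketch hard-codes the coefficients reported by Stata for this regression; the function `predict` and its default values (non-minority, 12 years of education) are illustrative assumptions chosen only to hold the other regressors fixed.

```python
# Coefficients copied from: reg income male minority yrs_educ married, robust
COEF = {
    "_cons": 187.681,
    "male": 241.1299,
    "minority": -27.12995,
    "yrs_educ": 46.78304,
    "married": 62.89816,
}

def predict(male, married, minority=0, yrs_educ=12):
    """Predicted monthly income for one combination of regressors."""
    return (COEF["_cons"]
            + COEF["male"] * male
            + COEF["minority"] * minority
            + COEF["yrs_educ"] * yrs_educ
            + COEF["married"] * married)

# Change in predicted income associated with marriage, by gender.
marriage_gap_male = predict(male=1, married=1) - predict(male=1, married=0)
marriage_gap_female = predict(male=0, married=1) - predict(male=0, married=0)

print(f"males:   {marriage_gap_male:.5f}")
print(f"females: {marriage_gap_female:.5f}")
```

Without a male × married interaction term, the two differences are identical by construction, whatever values are chosen for education and minority status.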

4. You are the lead advisor to a senator who sees your results and points out
that since married young people earn more than unmarried young people, it
would be good policy to promote marriage among young members of the labor
market. Is this a correct conclusion to draw? Why or why not?
No, this conclusion does not follow. The regression uses observational data,
not a randomized trial, and it is likely to suffer from omitted variable bias:
the coefficient on married can be positive, large in magnitude and
statistically significant simply because the correct variables are not
controlled for. There may also be other omitted variables that we do not have
data on.
Our best estimate allows us to reject the null hypothesis that the coefficient
on married is zero. In other words, we find evidence that marriage is
associated with higher incomes, but an association is not evidence that
promoting marriage would raise incomes.
There are also external validity considerations: we cannot generalize the
conclusions from this sample to all groups. For example, there may be great
differences between regions or ages.
Therefore, a sensible recommendation is to take only cautious action based on
the available data, for example a pilot program, and to ensure that it is
evaluated, preferably using randomized trials.

5. Now generate a new variable that is equal to the interaction of the dummies
for “male” and “yrs_educ” (call it male_educ). Run the same regression as in 3
but include your new variable among the independent variables. Interpret the
coefficient on education. Interpret the coefficient on the interaction of male and
education. Which of these two coefficients are statistically significant? At what
level?
. gen male_educ= male* yrs_educ

. reg income male minority yrs_educ married male_educ,robust

Linear regression                    Number of obs = 1,000
                                     F(5, 994)     = 36.25
                                     Prob > F      = 0.0000
                                     R-squared     = 0.1420
                                     Root MSE      = 321.05

             Robust
income       Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
male         280.1631     145.6308     1.92     0.055    -5.615973    565.9422
minority     -27.46953    28.98964     -0.95    0.344    -84.35744    29.41838
yrs_educ     48.69585     8.236225     5.91     0.000    32.53347     64.85823
married      63.00786     23.01572     2.74     0.006    17.84288     108.1728
male_educ    -3.246325    12.4493      -0.26    0.794    -27.67625    21.1836
_cons        164.302      99.97692     1.64     0.101    -31.88806    360.4921

$$\widehat{income} = 164.302 + 280.1631\,X_{\text{male}} + 48.6959\,X_{\text{yrs\_educ}} + 63.0079\,X_{\text{married}} - 27.4695\,X_{\text{minority}} - 3.2463\,X_{\text{male\_educ}}$$

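In this regression the coefficient on years of education is 48.69585. Because
the model includes the interaction male_educ, this coefficient is the
estimated change in income associated with one additional year of education
for women (male = 0), holding minority and married constant. For men the
corresponding estimate is 48.69585 - 3.246325 = 45.449525.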
$H_0: \beta_{\text{yrs\_educ}} = 0$ versus $H_1: \beta_{\text{yrs\_educ}} \neq 0$; $t_{\text{yrs\_educ}} = 5.91$, so $|t_{\text{yrs\_educ}}| > 2.58$.
Since $|t_{\text{yrs\_educ}}| > 2.58$ (the critical value for α = 1%), we can reject the null
hypothesis at the 1% significance level. We conclude that years of education is
associated with income.

$H_0: \beta_{\text{male\_educ}} = 0$ versus $H_1: \beta_{\text{male\_educ}} \neq 0$; $t_{\text{male\_educ}} = -0.26$, so $|t_{\text{male\_educ}}| < 1.96$.
Since $|t_{\text{male\_educ}}| < 1.96$ (the critical value for α = 5%), we cannot reject the
null hypothesis at the 5% significance level. The coefficient on male_educ is
not statistically significant at any conventional level (p = 0.794), so we find
no evidence that the association between education and income differs by
gender.
The coefficient on male (280.1631) is the predicted difference in income
between men and women with 0 years of education, holding marital status and
minority status fixed. Note that male is a binary variable, while years of
education is a continuous variable.
– The coefficient (-3.246325) on $X_{\text{yrs\_educ}} \times X_{\text{male}}$ is the effect on
income of a one-unit increase in both $X_{\text{yrs\_educ}}$ and $X_{\text{male}}$, above and
beyond the sum of the individual effects of a unit increase in
$X_{\text{yrs\_educ}}$ alone and a unit increase in $X_{\text{male}}$ alone. Equivalently,
this coefficient is the difference between men and women in the
association between years of education and income.
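
To make the interaction concrete, the sketch below computes the education slope for each gender and the male-female income gap at a given level of education, using the coefficients reported above. The function name `gender_gap` is an illustrative helper; minority and married are held fixed and therefore drop out of both quantities.

```python
# Coefficients copied from:
# reg income male minority yrs_educ married male_educ, robust
b_male = 280.1631
b_educ = 48.69585
b_male_educ = -3.246325

# Change in predicted income per additional year of education.
slope_female = b_educ                 # male = 0
slope_male = b_educ + b_male_educ     # male = 1

def gender_gap(yrs_educ):
    """Predicted male minus female income at a given level of education."""
    return b_male + b_male_educ * yrs_educ

print(f"slope for women: {slope_female:.5f}")
print(f"slope for men:   {slope_male:.5f}")
print(f"gender gap at 12 years of education: {gender_gap(12):.4f}")
```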

6. Generate a variable called min_male, the interaction between minority
and male. Regress income on minority, male and min_male. What is the
estimated association with gender for non-minority people? For
minorities? Is there evidence to suggest that this return is different for
non-minorities than for minorities in this sample?
. gen min_male= minority* male

. reg income minority male min_male,robust

Linear regression                    Number of obs = 1,000
                                     F(3, 996)     = 39.03
                                     Prob > F      = 0.0000
                                     R-squared     = 0.1013
                                     Root MSE      = 328.23

            Robust
income      Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
minority    -20.12805    33.26325     -0.61    0.545    -85.40215    45.14604
male        225.5103     21.7923      10.35    0.000    182.7462     268.2744
min_male    -77.42513    57.15255     -1.35    0.176    -189.5784    34.72809
_cons       777.0042     12.18139     63.79    0.000    753.1001     800.9083

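$$\widehat{income} = 777.0042 + 225.5103\,X_{\text{male}} - 20.1281\,X_{\text{minority}} - 77.4251\,X_{\text{min\_male}}$$

Since min_male = minority × male, the interaction term is 1 only for minority
men.

Non-minority male: $\widehat{income} = 777.0042 + 225.5103 \cdot 1 - 20.1281 \cdot 0 - 77.4251 \cdot 0 = 1002.5145$
Non-minority female: $\widehat{income} = 777.0042 + 225.5103 \cdot 0 - 20.1281 \cdot 0 - 77.4251 \cdot 0 = 777.0042$
Minority male: $\widehat{income} = 777.0042 + 225.5103 \cdot 1 - 20.1281 \cdot 1 - 77.4251 \cdot 1 = 904.9613$
Minority female: $\widehat{income} = 777.0042 + 225.5103 \cdot 0 - 20.1281 \cdot 1 - 77.4251 \cdot 0 = 756.8761$

The estimated association with gender for non-minority people is therefore
1002.5145 - 777.0042 = 225.5103, and for minorities it is
904.9613 - 756.8761 = 148.0852 (that is, 225.5103 - 77.4251).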
$H_0: \beta_{\text{min\_male}} = 0$ versus $H_1: \beta_{\text{min\_male}} \neq 0$; $t_{\text{min\_male}} = -1.35$, so $|t_{\text{min\_male}}| < 1.96$.
Since $|t_{\text{min\_male}}| < 1.96$ (the critical value for α = 5%), we cannot reject the null
hypothesis at the 5% significance level. The coefficient on min_male is not
statistically significant.

$H_0: \beta_{\text{minority}} = 0$ versus $H_1: \beta_{\text{minority}} \neq 0$; $t_{\text{minority}} = -0.61$, so $|t_{\text{minority}}| < 1.96$.
Since $|t_{\text{minority}}| < 1.96$ (the critical value for α = 5%), we cannot reject the null
hypothesis at the 5% significance level. We find no statistically significant
association between minority status and income.

Therefore, we cannot reject the hypothesis that the association between gender
and income is the same for non-minorities and for minorities in this sample.
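
The four group predictions and the two gender gaps follow directly from the reported coefficients. The sketch below is a plain arithmetic check; `predict` is an illustrative helper, and the key point it encodes is that min_male = minority × male is zero for all women, including minority women.

```python
# Coefficients copied from: reg income minority male min_male, robust
b_cons = 777.0042
b_minority = -20.12805
b_male = 225.5103
b_min_male = -77.42513

def predict(minority, male):
    """Predicted income; the interaction term is minority * male."""
    return (b_cons
            + b_minority * minority
            + b_male * male
            + b_min_male * minority * male)

# Estimated association with gender within each group.
gap_non_minority = predict(0, 1) - predict(0, 0)
gap_minority = predict(1, 1) - predict(1, 0)

for minority in (0, 1):
    for male in (0, 1):
        print(f"minority={minority}, male={male}: {predict(minority, male):.4f}")
print(f"gender gap, non-minority: {gap_non_minority:.5f}")
print(f"gender gap, minority:     {gap_minority:.5f}")
```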

Part 2: The impact of education on income.


Continue using the same data.
1. Regress income on education and education² (this requires you to
generate a new variable). Use the robust option as usual. Report your
output.
. gen yrs_educ2= yrs_educ* yrs_educ

. reg income yrs_educ yrs_educ2,robust

Linear regression                    Number of obs = 1,000
                                     F(2, 997)     = 13.31
                                     Prob > F      = 0.0000
                                     R-squared     = 0.0200
                                     Root MSE      = 342.59

             Robust
income       Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
yrs_educ     46.26233     45.69802     1.01     0.312    -43.41301    135.9377
yrs_educ2    -.5935477    2.01895      -0.29    0.769    -4.555426    3.368331
_cons        422.5728     258.6426     1.63     0.103    -84.97367    930.1192

$$\widehat{income} = 422.5728 + 46.26233\,X_{\text{yrs\_educ}} - 0.5935477\,X_{\text{yrs\_educ}}^{2}$$

2. Can you interpret the coefficient on yrs_educ? If so, interpret it. If not,
explain why not.
The coefficients in polynomial regressions do not have a simple
interpretation. The best way to interpret polynomial regression is to plot the
estimated regression function and calculate the estimated effect on income
associated with a change in yrs_educ for one or more values of yrs_educ.
3. What is the predicted difference in income between people with 11
and 12 years of educ?
$\widehat{income}_{12} = 892.2499$ and $\widehat{income}_{11} = 859.6392$. The predicted difference
in income between people with 11 and 12 years of educ is
892.2499 - 859.6392 = 32.6107.
4. What is the predicted difference in income between people with 15
and 16 years of educ?
$\widehat{income}_{16} = 1010.8219$ and $\widehat{income}_{15} = 982.9595$.
The predicted difference in income between people with 15 and 16 years
of educ is 1010.8219 - 982.9595 = 27.8624.
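
These predicted values can be recomputed from the quadratic regression function. The sketch below only plugs the reported coefficients into the fitted equation; the function name `predicted_income` is illustrative.

```python
# Coefficients copied from: reg income yrs_educ yrs_educ2, robust
B_CONS = 422.5728
B_EDUC = 46.26233
B_EDUC2 = -0.5935477

def predicted_income(yrs_educ):
    """Fitted quadratic regression function evaluated at yrs_educ."""
    return B_CONS + B_EDUC * yrs_educ + B_EDUC2 * yrs_educ ** 2

diff_11_12 = predicted_income(12) - predicted_income(11)
diff_15_16 = predicted_income(16) - predicted_income(15)

for years in (11, 12, 15, 16):
    print(f"{years} years: {predicted_income(years):.4f}")
print(f"difference 11 -> 12: {diff_11_12:.4f}")
print(f"difference 15 -> 16: {diff_15_16:.4f}")
```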

5. Are the answers to 3 and 4 the same? Different? Discuss.
They are different: 32.6107 > 27.8624.
The predicted difference in income between people with 11 and 12 years
of educ is higher than the predicted difference in income between people with
15 and 16 years of educ. The reason is that the coefficient on education² is
less than 0, so the estimated regression function is concave: each additional
year of education is associated with a smaller predicted increase in income as
education rises. A linear regression function cannot capture this. We need a
nonlinear regression function in which the predicted change in income depends
on the level of education.
6. Why might it make sense to include education² as an explanatory
variable in addition to education?
We need a nonlinear regression function so that the predicted change in
income from an additional year of education can depend on the level of
education. Including education² as an explanatory variable in addition to
education allows this, because the association between education and
income might depend on the level of education itself.
7. You should find that the coefficients on “educ” and “education²” are
statistically insignificant. Does this mean that education is not a significant
predictor of income? Test if education is a significant predictor of income,
report the type of test, how you conduct it, the significance level, and the
result.
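The coefficients on yrs_educ and yrs_educ2 are individually statistically
insignificant, but this does not mean that education has no impact on income.
The two regressors are highly correlated with each other, so the appropriate
test is a joint F-test (heteroskedasticity-robust) of the hypothesis that both
coefficients are zero:
$H_0: \beta_{\text{yrs\_educ}} = 0$ and $\beta_{\text{yrs\_educ2}} = 0$
$H_1: \beta_{\text{yrs\_educ}} \neq 0$ or $\beta_{\text{yrs\_educ2}} \neq 0$
The F-statistic is F(2, 997) = 13.31 with Prob > F = 0.0000.
Therefore, we can reject the null hypothesis at the 5% significance level (and
also at the 1% level). We conclude that at least one of the two coefficients is
nonzero, so education is a significant predictor of income.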

In addition, a regression of income on education alone gives:

. reg income yrs_educ,robust

Linear regression                    Number of obs = 1,000
                                     F(1, 998)     = 24.12
                                     Prob > F      = 0.0000
                                     R-squared     = 0.0199
                                     Root MSE      = 342.43

            Robust
income      Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
yrs_educ    32.4567      6.60851      4.91     0.000    19.48854     45.42487
_cons       501.4488     78.41489     6.39     0.000    347.5718     655.3258

$H_0: \beta_{\text{yrs\_educ}} = 0$ versus $H_1: \beta_{\text{yrs\_educ}} \neq 0$; $t_{\text{yrs\_educ}} = 4.91$, so $|t_{\text{yrs\_educ}}| > 2.58$.
Since $|t_{\text{yrs\_educ}}| > 2.58$ (the critical value for α = 1%), we can reject the null
hypothesis at the 1% significance level. We conclude that years of education is
associated with income.
. test yrs_educ yrs_educ2

( 1) yrs_educ = 0
( 2) yrs_educ2 = 0

F( 2, 997) = 13.31
Prob > F = 0.0000
The test command reproduces the joint F-statistic from the quadratic
regression: F(2, 997) = 13.31 with Prob > F = 0.0000. We again reject the joint
null hypothesis that both coefficients are zero at the 5% significance level
and conclude that education is a significant predictor of income.

Part 3: Explore the determinants of ln(income).


1. Continue using the same data. Make a regression table like those
usually present in academic papers. The dependent variable is always
ln(income) (this requires you to generate a new variable) and just as
before, all of the regressions should use the “robust” option. The
independent variables of interest are male, minority and years of
education; add these sequentially to each of the 3 models (i.e. your
first model should just be a regression of ln(income) on male, the 2nd
should include male and minority as the independent variables and
the 3rd should include all 3 independent variables). Standard errors
should be placed below coefficients in parentheses, along with stars
for statistical significance (* p<.1 ** p<.05 *** p<.01). The R² and
sample size for each regression should be reported at the bottom of
each column.
Dependent variable: ln(income)

Regressor             (1)             (2)             (3)
male                  0.2199552***    0.219715***     0.2448481***
                      (0.0208777)     (0.020847)      (0.0210224)
minority                              -0.0639271**    -0.0348291
                                      (0.0289074)     (0.0282038)
years of education                                    0.0492057***
                                                      (0.0072806)
Intercept             6.607976***     6.617247***     6.011162***
                      (0.013689)      (0.0144097)     (0.0910073)
R²                    0.0972          0.1012          0.1433
n                     1,000           1,000           1,000

Robust standard errors in parentheses. * p<.1 ** p<.05 *** p<.01

2. Using column 3, interpret each of the three coefficients on male, min,
and educ, and discuss whether each is statistically significant at the
5% level.
. reg ln_income male minority yrs_educ ,robust

Linear regression                    Number of obs = 1,000
                                     F(3, 996)     = 51.12
                                     Prob > F      = 0.0000
                                     R-squared     = 0.1433
                                     Root MSE      = .32623

             Robust
ln_income    Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
male         .2448481     .0210224     11.65    0.000    .2035948     .2861014
minority     -.0348291    .0282038     -1.23    0.217    -.0901748    .0205166
yrs_educ     .0492057     .0072806     6.76     0.000    .0349185     .0634929
_cons        6.011162     .0910073     66.05    0.000    5.832574     6.18975

$$\widehat{\ln(income)} = 6.011162 + 0.2448481\,X_{\text{male}} - 0.0348291\,X_{\text{minority}} + 0.0492057\,X_{\text{yrs\_educ}}$$

In a log-linear model, a one-unit change in a regressor is associated with a
$100 \times \beta\%$ change in income, holding the other regressors constant.
The coefficient on male (0.2448481): being male (Δmale = 1) is associated with
a 100 × 0.2448481 ≈ 24.48% higher income.
The coefficient on minority (-0.0348291): being a minority (Δminority = 1) is
associated with a 100 × (-0.0348291) ≈ -3.48% change in income.
The coefficient on years of education (0.0492057): one additional year of
education (Δyears of education = 1) is associated with a
100 × 0.0492057 ≈ 4.92% change in income.
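
The 100 × β rule is an approximation that works best for small coefficients. As a rough check, the sketch below compares it with the exact percentage change implied by a log-linear model, 100 × (exp(β) - 1), using the column 3 coefficients; the dictionary names are illustrative. For the male dummy the two figures differ noticeably because the coefficient is not small.

```python
import math

# Coefficients copied from: reg ln_income male minority yrs_educ, robust
coefs = {"male": 0.2448481, "minority": -0.0348291, "yrs_educ": 0.0492057}

# Approximate percentage change: 100 * beta.
approx_pct = {name: 100 * b for name, b in coefs.items()}

# Exact percentage change implied by the log-linear form: 100 * (exp(beta) - 1).
exact_pct = {name: 100 * math.expm1(b) for name, b in coefs.items()}

for name in coefs:
    print(f"{name}: approx {approx_pct[name]:.2f}%, exact {exact_pct[name]:.2f}%")
```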
$H_0: \beta_{\text{male}} = 0$ versus $H_1: \beta_{\text{male}} \neq 0$; $t_{\text{male}} = 11.65$, so $|t_{\text{male}}| > 1.96$.
Since $|t_{\text{male}}| > 1.96$ (the critical value for α = 5%), we can reject the null
hypothesis at the 5% significance level. We conclude that being male is
associated with higher income.

$H_0: \beta_{\text{minority}} = 0$ versus $H_1: \beta_{\text{minority}} \neq 0$; $t_{\text{minority}} = -1.23$, so $|t_{\text{minority}}| < 1.96$.
Since $|t_{\text{minority}}| < 1.96$ (the critical value for α = 5%), we cannot reject the null
hypothesis at the 5% significance level. We find no statistically significant
association between minority status and income.

$H_0: \beta_{\text{yrs\_educ}} = 0$ versus $H_1: \beta_{\text{yrs\_educ}} \neq 0$; $t_{\text{yrs\_educ}} = 6.76$, so $|t_{\text{yrs\_educ}}| > 1.96$.
Since $|t_{\text{yrs\_educ}}| > 1.96$ (the critical value for α = 5%), we can reject the null
hypothesis at the 5% significance level. We conclude that years of education is
associated with income.

3. Is the return to education different for minorities and non-minorities? If
so, by how much? Explain in detail how you would answer this
question, then answer it (even if it requires running additional
regressions).
. gen min_educ= minority* yrs_educ

. reg ln_income male minority yrs_educ min_educ,robust

Linear regression                    Number of obs = 1,000
                                     F(4, 995)     = 39.30
                                     Prob > F      = 0.0000
                                     R-squared     = 0.1433
                                     Root MSE      = .32638

             Robust
ln_income    Coef.        Std. Err.    t        P>|t|    [95% Conf. Interval]
male         .2450817     .0210115     11.66    0.000    .2038499     .2863136
minority     -.0946558    .1710743     -0.55    0.580    -.4303635    .2410519
yrs_educ     .0482149     .0084522     5.70     0.000    .0316288     .064801
min_educ     .0051723     .0146016     0.35     0.723    -.0234811    .0338258
_cons        6.022967     .1046416     57.56    0.000    5.817624     6.228311

$$\widehat{\ln(income)} = 6.022967 + 0.2450817\,X_{\text{male}} - 0.0946558\,X_{\text{minority}} + 0.0482149\,X_{\text{yrs\_educ}} + 0.0051723\,X_{\text{min\_educ}}$$

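To test whether the return to education differs between the two groups, we
add the interaction min_educ = minority × yrs_educ and test its coefficient:

$H_0: \beta_{\text{min\_educ}} = 0$ versus $H_1: \beta_{\text{min\_educ}} \neq 0$; $t_{\text{min\_educ}} = 0.35$, so $|t_{\text{min\_educ}}| < 1.96$.
Since $|t_{\text{min\_educ}}| < 1.96$ (the critical value for α = 5%), we cannot reject the null
hypothesis at the 5% significance level. The estimated return to one more year
of education is about 4.82% for non-minorities (0.0482149) and about 5.34% for
minorities (0.0482149 + 0.0051723 = 0.0533872), a difference of about 0.52
percentage points, but this difference is not statistically significant.
Based on that, we find no evidence that the return to education is
different for minorities and non-minorities.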
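
The group-specific returns and the t-statistic on the interaction can be recomputed from the reported coefficient and robust standard error. This sketch uses only numbers from the Stata output above; the variable names are illustrative, and the returns use the 100 × β approximation.

```python
# Values copied from: reg ln_income male minority yrs_educ min_educ, robust
b_educ = 0.0482149        # coefficient on yrs_educ
b_min_educ = 0.0051723    # coefficient on min_educ (minority * yrs_educ)
se_min_educ = 0.0146016   # robust standard error of min_educ

return_non_minority = b_educ               # minority = 0
return_minority = b_educ + b_min_educ      # minority = 1
t_min_educ = b_min_educ / se_min_educ      # t-statistic for H0: no difference
differs_at_5pct = abs(t_min_educ) > 1.96

print(f"return, non-minority: {100 * return_non_minority:.2f}% per year")
print(f"return, minority:     {100 * return_minority:.2f}% per year")
print(f"t = {t_min_educ:.2f}, significant at 5%: {differs_at_5pct}")
```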

4. Why might it make more sense to use ln(income) as the dependent
variable than income?
Logarithms provide a way of adding curvature to regressions, and using
logarithms allows us to express relationships in terms of percentage
changes. This suits income, because characteristics such as gender or
education are more plausibly associated with a given percentage difference
in pay than with a fixed amount. In addition, the effect of years of
education on income may exhibit diminishing marginal returns.
