
ACADEMY OF ECONOMIC STUDIES

FACULTY OF BUSINESS ADMINISTRATION – ENGLISH

ECONOMETRICS PROJECT

Prof. Dr. Univ. Daniela Şerban

Student: Barbu Claudiu–Andrei


Group: 136

12/01/2015
TABLE OF CONTENTS

I. INTRODUCTION
II. HYPOTHESIS TESTING
III. LINEAR SIMPLE REGRESSION
IV. LINEAR MULTIPLE REGRESSION
V. CONCLUSION
VI. ANNEXES

I.
INTRODUCTION

Regression analysis is mainly concerned with the study of the
dependence of one variable (the dependent variable) on one or more other
variables (the explanatory variables), in order to estimate and predict the
mean value of the former in terms of the known values of the latter. A simple
linear regression model attempts to explain the relationship between two
variables using a straight line.
Multiple linear regression attempts to model the relationship between
two or more explanatory variables and a response variable by fitting a linear
equation to observed data. Every value of the independent variable x is
associated with a value of the dependent variable y.
The purpose of this project is to determine whether there is a correlation
between the monthly value of GDP in the United States of America (the
dependent variable) and the monthly unemployment rate (the independent
variable), from January 2007 to October 2012, by means of a simple
regression. For the multiple regression, we also add the monthly inflation
rate over the same period as a second independent variable.
We measure the strength of the relationship between these variables, that
is, their degree of linear association. We also test the significance of the
coefficients and the validity of the model through methods such as the F
test and the P-value, and we draw conclusions regarding the autocorrelation
and the heteroscedasticity or homoscedasticity of the model.
The most important step in conducting such a study is gathering the data
needed to run the simple and the multiple regressions. In our case, the data
has been obtained from a single source, http://www.y.charts.com. The study
covers all 70 months between January 2007 and October 2012, with 2007 as
the reference year for the accumulated data. Brief details about the
dependent and independent variables of the study are given below:

• Y = The value of the monthly GDP = dependent variable: it represents
the primary indicator used to gauge the health of a country's economy and
the total U.S. dollar value of all goods and services produced over a specific
time period; it is practically the size of the economy;
• X1 = The value of the monthly Unemployment Rate (%) = independent
variable: it represents the percentage of the total labor force that is
unemployed but actively seeking employment and willing to work. This
indicator is considered a lagging indicator, confirming but not foreshadowing
long-term market trends;
• X2 = The value of the monthly Inflation Rate (%) = independent
variable: it represents the rate at which the general level of prices for goods
and services is rising and, subsequently, purchasing power is falling.

The gathered data helps us assess the relevance of the correlation
between the monthly GDP and influencing factors such as the monthly
unemployment rate and the monthly inflation rate. This relationship is important
for policy makers in order to obtain a sustainable rise in living standards. If
the GDP growth rate is below its natural rate, it is advisable to promote
employment because this rise in total income will generate inflationary
pressures. In contrast, if the GDP growth rate is above its natural level, policy
makers will decide not to intensively promote the creation of new jobs in
order to obtain a sustainable growth rate which will not generate inflation.

II.
HYPOTHESIS TESTING

Hypothesis 1
On the internet, there are rumors that the U.S. monthly average
unemployment rate is 12%. To test this hypothesis, a sample of 70 months is
used, starting in January 2007 and ending in October 2012. The sample mean
was found to be 7.7%, with a standard deviation of 1.96. For a confidence
level of 95%, decide whether the assumption is in accordance with the result.
Step 1: Define the null and alternative hypotheses.
H0: µ = 12%; null hypothesis
H1: µ ≠ 12%; alternative hypothesis; two-tailed test.
The null hypothesis is that the U.S. monthly average unemployment rate is
12%.
The alternative hypothesis is that the U.S. monthly average unemployment
rate is different from 12%.
We are in the case of a two-tailed test.
Step 2: Establish the level of significance.
Because the confidence level is 95%, the significance level is α = 5%. Since
we are using a two-sided test, the probability of committing a Type 1 error
is split between the two tails and we have two cut-off values.
Step 3: Cut off values and Rejection Region
Cut-off values = ± 1.96
Rejection Region = ( - ∞; -1.96) U (1.96; ∞)
Step 4: Compute Z calculated.

$$Z_{calc} = \frac{\text{Sample evidence} - \text{Claim}}{\text{Standard Error}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{7.7 - 12}{1.96 / \sqrt{70}} = \frac{-4.3}{1.96 / 8.3666} \approx -18.36$$

Step 5: Z calculated falls into the Rejection Region because
-18.36 < -1.96.

Step 6: Decision upon H0.
Based on the sample evidence, we reject the null hypothesis in favor of the
alternative hypothesis at the 5% significance level.
Step 7: According to the sample evidence, with 95% confidence, the U.S.
monthly average unemployment rate is different from 12%.
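
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `z_test` and its parameters are our own, the critical value is taken from the standard normal distribution in Python's standard library, and the inputs are the figures quoted in the text (sample mean 7.7, claim 12, standard deviation 1.96, n = 70).

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, claimed_mean, std_dev, n, alpha=0.05, tail="two-sided"):
    """Return (z, cut-off value, reject H0?) for a one-sample z-test.

    Hypothetical helper written for illustration; not part of the Excel output.
    """
    standard_error = std_dev / sqrt(n)
    z = (sample_mean - claimed_mean) / standard_error
    if tail == "two-sided":
        # alpha is split between the two tails
        crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) > crit
    elif tail == "greater":
        # right-sided test: the whole alpha sits in the right tail
        crit = NormalDist().inv_cdf(1 - alpha)
        reject = z > crit
    else:
        raise ValueError("tail must be 'two-sided' or 'greater'")
    return z, crit, reject


# Hypothesis 1: figures quoted in the text
z, crit, reject = z_test(7.7, 12, 1.96, 70)
print(f"z = {z:.2f}, cut-off = +/-{crit:.2f}, reject H0: {reject}")
```

The same helper can be called with `tail="greater"` for a right-sided test, in which case only the upper cut-off value is used.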

Hypothesis 2
The United States believes that the monthly average inflation rate for
all the 70 months, starting from January 2007 and ending in October 2012,
from the database is 2.16%, with a standard deviation of 1.75. There are
rumors that the U.S. monthly average inflation rate is 5%. Does this result
lend support to the United States’ opinion at 5% level of significance?
Step 1: Define the null and alternative hypotheses.
H0: µ = 2.16%
H1: µ > 2.16%
The null hypothesis is that the U.S. monthly average inflation rate
calculated on 70 months is 2.16%.
The alternative hypothesis is that the U.S. monthly average inflation rate
calculated on 70 months is bigger than 2.16%.
We are in the case of a right-sided test.
Step 2: Set the significance level.
We set the significance level at 5%, so the confidence level of the results
is 95%.
Step 3: Establish, according to the significance level, the cut-off
values, the Rejection Region (RR), and the Acceptance Region (AR).
Cut-off values = + 1.645
Rejection Region = (1.645; ∞)
Acceptance Region: (- ∞, 1.645]
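Step 4: Compute Z calculated:

$$Z_{calc} = \frac{\text{Sample evidence} - \text{Claim}}{\text{Standard Error}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{2.16 - 5}{1.75 / \sqrt{70}} = \frac{-2.84}{1.75 / 8.36} = \frac{-2.84}{0.209} = -13.58$$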

Step 5: Z calculated does not fall into the Rejection Region because -13.58
belongs to the Acceptance Region: (- ∞, 1.645].
Step 6: Decision upon H0.
Based on the sample evidence, we do not reject the null hypothesis at the 5%
significance level.
Step 7: According to the sample evidence, at the 5% significance level, there
is not enough evidence that the U.S. monthly average inflation rate is bigger
than 2.16%, so the rumor of a 5% rate is not supported.

III.
LINEAR SIMPLE REGRESSION

The GDP of a country varies considerably because several factors
influence it. With the help of the available data, we will try to determine the
influence of the independent variable (the monthly unemployment rate) on
the dependent variable (the monthly GDP).
The following data has been extracted from an Excel file. By analyzing it,
our purpose is to find the regression equation, which has the following
form: Y = α0 + α1*X + ε.
The first table called 'Summary Output' deals with the analysis of the
coefficient of correlation - Multiple R, the coefficient of determination - R
Square and the adjusted coefficient of determination - Adjusted R Square.

Regression Statistics
Multiple R 0.290097746
R Square 0.084156702
Adjusted R Square 0.070688419
Standard Error 0.567157441
Observations 70
Table 1. Summary Output.

The coefficient of correlation (Multiple R) measures the strength of
the relationship between two or more variables and takes values between -1
and 1: the closer to 1, the stronger the relationship between the variables. In
our case, Multiple R=0.29, which represents a medium to low positive
correlation between the presented variables (<0.5).
The coefficient of determination (R square) is the proportion of
variability in a data set that is accounted for by the statistical model and it
provides a measure of how well future outcomes are likely to be predicted by
the model. In our case, R² = 0.084, which means that 8.4% of the variation in
the value of GDP is explained by the independent variable taken into account
by the model (the monthly unemployment rate), if all the other factors are
constant.
The coefficient of determination adjusted for degrees of
freedom (0.070 in our case) is a statistic that has been adjusted to take into
account the sample size as well as the number of independent variables. In
our case, once this adjustment is made, only about 7% of the monthly GDP
variation is explained by the monthly unemployment rate; the rest is due to
other factors of influence.
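
As a quick check, the adjusted value can be recomputed from R Square, the number of observations, and the number of independent variables using the standard adjustment formula. The function name below is our own; the inputs are the values reported in Table 1.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjust R^2 for the sample size and the number of independent variables."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values from Table 1: R Square, 70 observations, 1 independent variable
adj = adjusted_r_squared(0.084156702, 70, 1)
print(f"Adjusted R Square = {adj:.6f}")
```

The result agrees with the Adjusted R Square reported by Excel in Table 1.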
The average difference between the observed monthly GDP values and the
estimated/predicted monthly GDP values according to the linear function is
0.567 trillion dollars; this is represented in the table by the value of the
Standard Error.
Next, we test the validity of the model by interpreting the data from the
table called 'ANOVA table':

ANOVA
             df   SS            MS         F         Significance F
Regression    1    2.009942898  2.009943   6.24851   0.01484804
Residual     68   21.87339425   0.321668
Total        69   23.88333714
Table 2. ANOVA table for Simple Regression.

The model is valid because Significance F (0.0148) is close to 0 (<0.05).
This probability is the chance of stating that the model is valid when in
reality it is not, that is, the chance of wrongly rejecting the null
hypothesis on validity.
Defining the hypotheses:

H0: All predicted monthly GDP values have the same value.
H1: In 95% of cases there are at least 2 estimated values for the monthly
GDP which are different.

The value of the F statistic is calculated by Excel and can be found
in the above table: F=6.24851.
The decision over the test:
The calculated value of the F test is compared with the theoretical one
for a significance level of 5%.
F calculated > F(5%; 1; 68) => It falls into the rejection region. The
chance to wrongly reject H0 is smaller than 5% and there is enough sample
evidence to reject H0 and to accept H1.
Since the chance to wrongly reject H0 (Significance F = 0.0148) is smaller
than α (0.05), the decision to reject H0 is correct and the model is valid.
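
The F value reported by Excel can be reproduced from the sums of squares and degrees of freedom in the ANOVA table, since F is the ratio of the two mean squares. The sketch below uses a helper name of our own and the figures from Table 2.

```python
def f_statistic(ss_regression, ss_residual, df_regression, df_residual):
    """F = MS Regression / MS Residual, where MS = SS / df."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual


# Figures from Table 2 (simple regression)
f_simple = f_statistic(2.009942898, 21.87339425, 1, 68)
print(f"F = {f_simple:.5f}")
```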

Next, the analysis will be done for the following table's coefficients,
that will help us develop our model's equation:

Table 3

The intercept is 13.9171256. If the monthly unemployment rate is 0, or
if there were no correlation between the variables, then the predicted GDP
would be 13.9171256 trillion dollars.
The slope is 0.086792594. This indicates a positive correlation between
the variables.
Using the 1st degree equation Y = α0 + α1*X + ε, the specific model for
the sample will be:

The predicted monthly GDP = 13.9171256 + 0.0868*Monthly
Unemployment rate.

According to the equation, an increase of 1 percentage point in the monthly
unemployment rate is associated with a monthly GDP that is higher by about
0.087 trillion dollars on average. Also, as the slope is greater than 0, we
can say we have a direct correlation between the variables.
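
The fitted line can be written as a small function for computing predictions. This is an illustrative sketch: the function and constant names are our own, and the coefficients are the intercept and slope reported above.

```python
INTERCEPT = 13.9171256   # alpha0, trillion U.S. dollars
SLOPE = 0.086792594      # alpha1, trillion U.S. dollars per percentage point


def predict_gdp(unemployment_rate):
    """Predicted monthly GDP (trillion U.S. dollars) for a given unemployment rate (%)."""
    return INTERCEPT + SLOPE * unemployment_rate


# Example: prediction for the last observed month (Oct-12, unemployment rate 7.9%)
print(f"Predicted GDP at 7.9%: {predict_gdp(7.9):.2f} trillion U.S. dollars")
```

The difference between two predictions one percentage point apart equals the slope, which is the interpretation given in the text.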
If the U.S. monthly unemployment rate increases, the U.S. monthly
GDP will also increase due to a growth in labor productivity. Also, the
efficiency of the employees grows faster than the monthly unemployment
rate. However, this is not a causal relation: an increase in the monthly GDP
will not generate an increase in the monthly unemployment rate, but it is an
observed stochastic relation. The monthly unemployment rate can increase
by firing inefficient employees.
As far as the confidence interval is concerned, the statements are valid
with 95% confidence. The confidence interval for the slope does not comprise
the value 0, and for this reason the inference is possible. It means that,
with 95% confidence, we correctly reject the null hypothesis that the slope
is equal to 0.
For an increase of 1 percentage point in the monthly unemployment rate, the
monthly GDP is expected to increase by at least 0.0175 and at most 0.1560
trillion dollars.
The P-value represents the computed risk of deciding that the slope is
significantly different from 0 when in reality it is 0. Because the P-value is
lower than 5%, there is a low risk of taking a wrong decision. The probability
of committing a Type 1 error is about 0.015 (the P-value, which in a simple
regression equals Significance F), which is less than 5%.
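
The interval for the slope can be cross-checked from values already reported. The sketch below relies on the fact that, in a simple regression, the squared t statistic of the slope equals F, so the standard error of the slope is slope / sqrt(F). The critical value 1.9955 is an assumption on our part: it is the approximate two-sided 5% value of Student's t with 68 degrees of freedom, taken from a t table rather than from the Excel output.

```python
from math import sqrt

SLOPE = 0.086792594   # from the coefficients table
F_STAT = 6.24851      # from the ANOVA table (simple regression)
T_CRIT = 1.9955       # assumption: approximate t critical value, 68 df, two-sided 5%

# In a simple regression t^2 = F, so SE(slope) = slope / sqrt(F)
std_error = SLOPE / sqrt(F_STAT)

lower = SLOPE - T_CRIT * std_error
upper = SLOPE + T_CRIT * std_error
print(f"95% confidence interval for the slope: ({lower:.4f}, {upper:.4f})")
```

Because the lower bound stays above 0, the interval does not comprise 0, which is the condition used in the text.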

Next, we will analyze the charts:

In the U.S. monthly unemployment rate Residual Plot, no correlation between
the errors is apparent. The plot shows a fan effect, so we compute the
Durbin-Watson statistic, a test used to detect the presence of
autocorrelation in the residuals from a regression analysis.
$$DW = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{SS_{Residual}} = \frac{0.878281005}{21.8733} = 0.04$$

This value belongs to (0; 0.95) => positive autocorrelation.

According to the result, the model is affected by positive autocorrelation.
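
For reference, the statistic above can be computed directly from a list of residuals. The sketch below is a minimal implementation with a function name of our own; the residual values in the example are hypothetical and are not the residuals of this model.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)  # this is SS Residual
    return numerator / denominator


# Hypothetical residuals that change slowly over time (positively autocorrelated)
example = [0.5, 0.4, 0.45, 0.3, -0.2, -0.3, -0.25]
print(f"DW = {durbin_watson(example):.3f}")
```

Values close to 0 indicate positive autocorrelation, values close to 2 indicate no autocorrelation, and values close to 4 indicate negative autocorrelation.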

[Chart: The U.S. Unemployment Rate % Residual Plot. X-axis: The U.S. Unemployment Rate %; Y-axis: Residuals; trendline: f(x) = 0x + 0, R² = 0]

Fig. 1. The U.S. Unemployment Rate – Residual Plot

In the U.S. Unemployment Rate – Line Fit Plot chart, we can see a
typical heteroscedastic model. The variance of the errors increases as the
monthly unemployment rate increases. There is no constant variance.
In practice, another form of relationship between the monthly GDP and the
monthly unemployment rate of a country would be chosen.

[Chart: The U.S. Unemployment Rate % Line Fit Plot. X-axis: The U.S. Unemployment Rate %; Y-axis: GDP U.S; series: GDP U.S, Predicted GDP U.S]

Fig. 2. The U.S. Unemployment Rate – Line Fit Plot

In the chart below, we will analyze the normal distribution of errors. As
shown in the Normal Probability Plot chart, the errors follow an S-shape.
They are fairly well balanced (close to a uniform distribution). As we can
observe in the chart, there is a small skewness to the right (small errors).
The errors are roughly uniform and the normality of errors is not severely
affected.

[Chart: Normal Probability Plot. X-axis: Sample Percentile; Y-axis: GDP U.S; trendline: f(x) = 0.02x + 13.61, R² = 0.94]

Fig. 3. Normal Probability Plot

In conclusion, the model violates two assumptions, according to the Line
Fit Plot chart and the Residual Plot chart.

IV.
LINEAR MULTIPLE REGRESSION

In order to perform the multiple regression analysis, we take into account
two independent variables: the U.S. monthly unemployment rate and the U.S.
monthly inflation rate. We wish to test whether including both variables
changes the model. If it does, the variables contribute decisively to the
change in the value of monthly GDP.
We repeat the same analysis as for the simple regression, based on the
regression output extracted from the Excel file. From the available data, our
purpose is to find the regression equation, which has the following form:
Y = α0 + α1*X1 + α2*X2 + ε.
The first table we analyze is called 'Summary Output' and deals with
the analysis of the coefficient of correlation - Multiple R, the coefficient of
determination - R Square and the adjusted coefficient of determination -
Adjusted R Square.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.59552604
R Square 0.354651264
Adjusted R Square 0.335387123
Standard Error 0.479631099
Observations 70
Table 4. Summary output for Multiple Regression.

The coefficient of correlation (Multiple R) measures the strength of
the relationship between two or more variables and takes values between -1
and 1: the closer to 1, the stronger the relationship between the variables. In
our case, Multiple R=0.59, which is comprised between the values 0.5 and
0.75. This means that there is a medium to strong positive correlation.
The coefficient of determination (R square) is the proportion of
variability in a data set that is accounted for by the statistical model and it
provides a measure of how well future outcomes are likely to be predicted by
the model. In our case, R² = 0.3546, which means that 35.46% of the
variation in the value of the monthly GDP is explained by the independent
variables taken into account by the model (the monthly unemployment rate
and the monthly inflation rate), the other factors being constant.
The coefficient of determination adjusted for degrees of freedom
(0.3353 in our case) is a statistic that has been adjusted to take into account
the sample size as well as the number of independent variables. In our case,
once this adjustment is made, only 33.53% of the monthly GDP variation is
explained by the monthly unemployment rate and by the monthly inflation
rate; the rest is due to other factors of influence.
We can compare the values obtained from the simple regression with the
ones obtained now for the multiple one: Multiple R increased from 0.29 to
0.59, the increase of 0.30 being due to the inclusion of another variable,
the monthly inflation rate. We can state that the second variable had a real
impact on the model.

Next, we wish to test the validity of the model, by interpreting data from
the table called 'ANOVA table':
             df   SS            MS            F             Significance F
Regression    2    8.470255713  4.235127857   18.40991807   4.24732E-07
Residual     67   15.41308143   0.230045991
Total        69   23.88333714
Table 5. ANOVA method for Multiple Regression.

The model is valid because Significance F (4.24732E-07) tends to 0.
This probability is the chance of stating that the model is valid when in
reality it is not, that is, the chance of wrongly rejecting the null
hypothesis on validity.
Defining the hypotheses:

H0: All predicted monthly GDP values have the same value.
H1: In 95% of cases there are at least 2 estimated values for the monthly
GDP which are different.
The calculated value of the F test is compared with the theoretical one
for a significance level of 5%.
F calculated > F(5%; 2; 67) => It falls into the rejection region. The
chance to wrongly reject H0 is smaller than 5% and there is enough sample
evidence to reject H0 and to accept H1.
Since the chance to wrongly reject H0 (Significance F = 4.24732E-07, which
tends to 0) is smaller than α (0.05), the decision to reject H0 is correct and
the model is valid.

Table 6. Coefficients of the multiple regression equation.

The intercept is 12.75164724. If the monthly unemployment rate and
the monthly inflation rate are both 0, or if there were no correlation between
the variables, then the predicted GDP would be 12.75164724 trillion dollars.
The first slope is 0.18079695. This indicates a positive correlation
between the monthly unemployment rate and GDP.
The second slope is 0.203683865. This indicates a positive correlation
between the monthly inflation rate and GDP.

Using the 1st degree equation Y = α0 + α1*X1 + α2*X2 + ε, the specific
model for the sample will be:
The predicted monthly GDP = 12.75 + 0.18*Monthly Unemployment rate
+ 0.20*Monthly Inflation Rate.

According to the equation, an increase of 1 percentage point in the monthly
unemployment rate is associated with a monthly GDP that is higher by 0.18
trillion dollars on average, the inflation rate being held constant. Also,
as the slope is greater than 0, we can say we have a direct correlation
between the variables.
Likewise, an increase of 1 percentage point in the monthly inflation rate is
associated with a monthly GDP that is higher by 0.2 trillion dollars on
average. The slope of 0.2 is bigger than 0, so there is also a direct
correlation between these variables.
A growth in the monthly unemployment rate and a growth in the
monthly inflation rate will both increase the level of the monthly GDP, due to
labor productivity and to an increase in prices, respectively.
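
For readers who want to see how such coefficients are obtained, the sketch below estimates α0, α1 and α2 by ordinary least squares for the two-regressor case, using the closed-form solution of the normal equations on centered data. The function name is our own, and the example uses only six rows from the database (the January values of each year), so it is an illustration and does not reproduce the coefficients in Table 6.

```python
def fit_two_regressors(y, x1, x2):
    """Ordinary least squares for Y = a0 + a1*X1 + a2*X2 (closed form)."""
    n = len(y)
    mean_y, mean_1, mean_2 = sum(y) / n, sum(x1) / n, sum(x2) / n

    # Centered sums of squares and cross-products
    s11 = sum((a - mean_1) ** 2 for a in x1)
    s22 = sum((b - mean_2) ** 2 for b in x2)
    s12 = sum((a - mean_1) * (b - mean_2) for a, b in zip(x1, x2))
    s1y = sum((a - mean_1) * (c - mean_y) for a, c in zip(x1, y))
    s2y = sum((b - mean_2) * (c - mean_y) for b, c in zip(x2, y))

    # Solve the 2x2 normal equations for the slopes
    det = s11 * s22 - s12 ** 2
    a1 = (s22 * s1y - s12 * s2y) / det
    a2 = (s11 * s2y - s12 * s1y) / det
    a0 = mean_y - a1 * mean_1 - a2 * mean_2
    return a0, a1, a2


# Subset of the database: Jan-07, Jan-08, Jan-09, Jan-10, Jan-11, Jan-12
gdp = [13.71, 14.41, 14.07, 14.2, 14.72, 15.42]
unemployment = [4.6, 4.9, 7.6, 9.7, 9.1, 8.3]
inflation = [2.08, 4.28, 0.03, 2.7, 1.5, 3]

a0, a1, a2 = fit_two_regressors(gdp, unemployment, inflation)
print(f"GDP = {a0:.3f} + {a1:.3f}*Unemployment + {a2:.3f}*Inflation")
```

Running the same function on all 70 rows of the database would be the way to compare against the Excel output.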
According to Table 6, neither confidence interval, for the monthly
unemployment rate or for the monthly inflation rate, comprises the value 0.
This means that in both cases the risk of wrongly concluding that the slope
is different from 0 is at most 5%. The inference is valid for both slopes and
we have 2 correlations. The statements about the confidence intervals are
valid with 95% confidence.
For an increase of 1 percentage point in the monthly unemployment rate, the
monthly GDP is expected to increase by at least 0.1123 and at most 0.2492
trillion dollars, while for an increase of 1 percentage point in the monthly
inflation rate, the monthly GDP is expected to increase by at least 0.1269
and at most 0.2804 trillion dollars.
The probability of committing a Type 1 error in the case of the monthly
unemployment rate is 1.56284E-06 (P-value), a value that tends to 0 and is
less than 5%. In the case of the monthly inflation rate, the probability of
committing a Type 1 error is the P-value of 1.3973E-06 (which also tends to
0).
Next, we will analyze the charts:
As in the case of the monthly unemployment rate residual plot, in
the monthly inflation rate residual plot no correlation between the errors
is apparent, so the Durbin-Watson statistic test is applied.

$$DW = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{SS_{Residual}} = \frac{1.5196}{15.413} \approx 0.099$$

This value belongs to (0; 0.95) => positive autocorrelation.
According to the result obtained after applying the Durbin-Watson test,
the model is affected by positive autocorrelation.

[Chart: The U.S. Inflation Rate % Residual Plot. X-axis: The U.S. Inflation Rate %; Y-axis: Residuals]

Fig. 4. The U.S. Inflation Rate – Residual Plot


In the Line Fit Plot chart presented below, we can see that the model is
affected by heteroscedasticity. The variance of the errors increases as
the monthly inflation rate increases, as in the case of the monthly
unemployment rate. So, there is no constant variance.

[Chart: The U.S. Inflation Rate % Line Fit Plot. X-axis: The U.S. Inflation Rate %; Y-axis: GDP U.S; series: GDP U.S, Predicted GDP U.S]

Fig. 5. The U.S. Inflation Rate – Line Fit Plot

We also conducted a Multicollinearity Test. Multicollinearity suggests
that several of the independent variables are closely linked in some way.
Once the collinear variables are identified, it may be helpful to study whether
there is a causal link between the variables. The simplest way to resolve
multicollinearity problems is to reduce the number of collinear variables until
there is only one remaining out of the set. Sometimes, after some study, it
may be possible to identify one of the variables as being extraneous.
Alternatively, it may be possible to combine two or more closely related
variables into a single input.
According to this test, we do not have to worry about multicollinearity
if the R-squared from the multiple regression output exceeds the R-squared
of any independent variable regressed on the other independent variables.
For the sample data of 70 months, the former (0.35) exceeds the latter
(0.26), so multicollinearity is not a concern.
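
With only two independent variables, the R-squared of one regressed on the other is simply their squared correlation coefficient, so the comparison described above can be sketched as follows. The function names are our own; the final call uses the two values quoted in the text.

```python
def auxiliary_r_squared(x1, x2):
    """R-squared of x1 regressed on x2 (equal to their squared correlation)."""
    n = len(x1)
    mean_1, mean_2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - mean_1) ** 2 for a in x1)
    s22 = sum((b - mean_2) ** 2 for b in x2)
    s12 = sum((a - mean_1) * (b - mean_2) for a, b in zip(x1, x2))
    return s12 ** 2 / (s11 * s22)


def multicollinearity_is_a_concern(model_r_squared, aux_r_squared):
    """Rule of thumb from the text: worry only if the auxiliary R-squared
    reaches or exceeds the R-squared of the multiple regression."""
    return aux_r_squared >= model_r_squared


# Values quoted in the text: 0.35 (multiple regression) and 0.26 (auxiliary)
print(multicollinearity_is_a_concern(0.35, 0.26))
```

Passing the unemployment and inflation columns of the database to `auxiliary_r_squared` would give the auxiliary value directly from the data.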

V.
CONCLUSION

In economics, knowing the relationship between the GDP, the
unemployment rate and the inflation rate is a critical issue, as these
indicators are used to gauge the state of an economy.
GDP and the unemployment rate are linked in the sense that a rise in
GDP is significant in the study of macroeconomic trends in a nation. This is
also true of a rise or decrease in unemployment levels. GDP and
unemployment rate usually go together because a rise in the rate of
unemployment is reflected in an increase in GDP, due to the fact that there is
an increase in consumer demand for goods and services. Such a rise in both
GDP and employment levels is an indication that the economy is booming.
GDP and the inflation rate are also influencing each other. For example,
in a situation in which the inflation reaches a high level, the companies are
perceiving their prices to be higher than normal. The GDP goes up even more
because inflation causes the dollar to be worth less meaning everything
costs more and on the top of that more is being produced because firms
think their products have increased in price relative to other prices.
One of the reasons the U.S. is able to claim that GDP is growing is that
the Fed and the U.S. Government are “fudging” the inflation statistics through
the use of “hedonic adjustments”, “substitution” and other methodologies,
which make the inflation rate look smaller than it actually is. Therefore,
because a smaller percentage gets subtracted out of the total value of
exchanged goods and services, it looks as if GDP is growing (or growing
faster than it actually is).

VI.
ANNEXES

Month/Year | GDP U.S (trillion U.S. dollars) | The U.S. Unemployment Rate % | The U.S. Inflation Rate %
Jan-07 13.71 4.6 2.08
Feb-07 13.86 4.5 2.42

Mar-07 13.8 4.4 2.78
Apr-07 13.98 4.5 2.57
May-07 14.02 4.5 2.69
Jun-07 14.03 4.5 2.69
Jul-07 14.04 4.6 2.36
Aug-07 14.15 4.6 1.97
Sep-07 14.29 4.7 2.76
Oct-07 14.22 4.7 3.54
Nov-07 14.27 4.7 4.31
Dec-07 14.38 5 4.08
Jan-08 14.41 4.9 4.28
Feb-08 14.25 4.8 4.03
Mar-08 14.33 5.1 3.98
Apr-08 14.39 5 3.94
May-08 14.43 5.5 4.18
Jun-08 14.6 5.5 5.02
Jul-08 14.58 5.7 5.6
Aug-08 14.45 6.1 5.37
Sep-08 14.42 6.1 4.94
Oct-08 14.31 6.5 3.66
Nov-08 14.28 6.7 1.07
Dec-08 13.99 7.2 0.09
Jan-09 14.07 7.6 0.03
Feb-09 14.06 8.1 0.24
Mar-09 14.02 8.5 -0.38
Apr-09 14.04 8.9 -0.74
May-09 14.06 9.4 -1.28
Jun-09 13.86 9.5 -1.43
Jul-09 13.85 9.4 -2.1
Aug-09 13.94 9.7 -1.48
Sep-09 13.97 9.8 -1.29
Oct-09 14.14 10.1 -1.3
Nov-09 14.08 9.9 -0.2
Dec-09 14.04 9.9 1.8
Jan-10 14.2 9.7 2.7
Feb-10 14.27 9.8 2.6
Mar-10 14.36 9.8 2.1
Apr-10 14.6 9.9 2.3
May-10 14.48 9.6 2.2
Jun-10 14.45 9.4 2
Jul-10 14.54 9.5 1.1
Aug-10 14.55 9.6 1.2
Sep-10 14.64 9.5 1.1

Oct-10 14.71 9.5 1.1
Nov-10 14.69 9.8 1.2
Dec-10 14.81 9.3 1.1
Jan-11 14.72 9.1 1.5
Feb-11 14.74 9 1.6
Mar-11 14.99 8.9 2.1
Apr-11 15.04 9 2.7
May-11 15.04 9 3.2
Jun-11 14.93 9.1 3.6
Jul-11 15.13 9 3.6
Aug-11 15.21 9 3.6
Sep-11 15.14 9 3.8
Oct-11 15.38 8.9 3.9
Nov-11 15.29 8.6 3.5
Dec-11 15.29 8.5 3.4
Jan-12 15.42 8.3 3
Feb-12 15.55 8.3 2.9
Mar-12 15.46 8.2 2.9
Apr-12 15.54 8.1 2.7
May-12 15.59 8.2 2.3
Jun-12 15.65 8.2 1.7
Jul-12 15.81 8.2 1.7
Aug-12 15.77 8.1 1.4
Sep-12 15.86 7.8 1.7
Oct-12 15.81 7.9 2
Table 7. Database
