CHRISTOPHER A. LLONES
Presented to:
Dr. Moises Neil V. Serio
As Partial Requirements in
Econometrics
1st Semester, S.Y. 2015-2016
SEPTEMBER 2015
Table of Contents
Introduction
Data Collection
Data Analysis
Choice of Model and Variables
Scope and Limitation
Regression Analysis
Summary and Descriptive Statistics of the Model
Modifying and Organizing the Variables
Correlation
Regression
Regression Diagnostic Tests
Test for the Normality of Residuals
Test for Heteroscedasticity
Test for Multicollinearity
Test for Specification Error
Test for Autocorrelation
Prais-Winsten Regression
Findings
References
Introduction
The United Arab Emirates (UAE), located in the Middle East, is a member of the Organization of
the Petroleum Exporting Countries (OPEC). Its large crude oil production makes the UAE the
fourth largest supplier of crude oil, accounting for 12% of total world supply according to
Energy Supply Security 2014 of the International Energy Agency (IEA). Crude oil is a
non-renewable resource found in natural underground reservoirs. No close substitute has yet
been found, which makes demand inelastic on the consumer side. Supply is also quite inelastic,
since the oil market is monopolistic in nature.
This paper applies regression analysis to estimate the supply elasticity of UAE crude oil as a
function of the real price of crude oil, crude oil production advanced by one year, and crude
oil exports lagged by one year.
Data Collection
The data used in this paper were collected from the websites of OPEC and the World Bank.
Data Analysis
The study used descriptive statistics to describe the data and Stata to run the regression
analysis and diagnostic tests used in estimating the supply elasticity of crude oil.
Choice of Model and Variables
The study used a basic double-log model with a lagged and a lead variable to estimate the
supply elasticity of crude oil. The dependent variable is the UAE's annual export of crude oil,
which represents supply to markets outside the country. The explanatory variables are the real
crude oil price, crude oil production advanced by one year, and crude oil exports lagged by one
year. Exports are lagged by a year to account for the effect of the country's previous supply on
present exports. Production is advanced by a year to account for the exporter's anticipation of
future oil market conditions, which can affect its present willingness to supply.
Scope and Limitation
The study used annual time-series data for 1960-2014 collected from OPEC and the World Bank.
This paper focuses primarily on conducting and applying regression analysis and gives less
attention to the implications of the estimates generated by the model. The author discourages
the use of the model for any policy recommendation. The paper's primary objective is only to
apply regression methods using Stata as the statistical software.
Regression Analysis
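The model is based on the function export of crude oil = f(real oil price, production of crude
oil advanced by one year, export of crude oil lagged by one year). This can be expressed as:
$$\text{CrudeXport} = \beta_0 + \beta_1\,\text{RealOilPrice} + \beta_2\,\text{LeadProd} + \beta_3\,\text{LagXport} + \varepsilon$$
All variables enter in natural logarithms, so each slope coefficient is an elasticity.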
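Summary and Descriptive Statistics of the Model

    Variable |       Obs        Mean    Std. Dev.         Min         Max
-------------+-----------------------------------------------------------
        year |        55        1987    16.02082         1960        2014
  CrudeXport |        35    7.634224    .3228516     6.975414    8.161945
RealOilPrice |        43    3.024035    .5816263     2.257588     4.23931
   CrudeProd |        53    7.165557    .9962443     2.639057    7.936303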
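
                                                              Obs<.
                                               +------------------------------
              |                                | Unique
     Variable |     Obs=.     Obs>.     Obs<.  | values        Min         Max
 -------------+--------------------------------+------------------------------
         year |                            55  |     55       1960        2014
   CrudeXport |        20                  35  |     34   6.975414    8.161945
 RealOilPrice |        12                  43  |     42   2.257588     4.23931
    CrudeProd |         2                  53  |     51   2.639057    7.936303
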
The summarize command reports, for each variable, the number of observations, the mean, the
standard deviation, and the minimum and maximum values. The variable year runs from 1960 to 2014
(its minimum and maximum in the summary table), giving 55 observations. A variable with fewer
than 55 observations therefore has missing values. The command misstable summarize, all
generates a table with the number of missing observations (Obs=.) and the number of non-missing
observations (Obs<.). Non-missing values count as less than missing because Stata treats a
missing value as larger than any number. The data therefore have 20, 12 and 2 missing values in
export of crude oil, real oil price and crude production, respectively.
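The missing-value counts follow directly from the observation counts. The short Python sketch below (Python is used here only to check arithmetic; the analysis itself was done in Stata) recomputes them from the figures reported in the summary table. The dictionary name is illustrative.

```python
# Observation counts copied from the summarize output.
TOTAL_YEARS = 2014 - 1960 + 1  # 55 annual observations, 1960 through 2014

non_missing = {"CrudeXport": 35, "RealOilPrice": 43, "CrudeProd": 53}

# Missing values = total years minus non-missing observations.
missing = {name: TOTAL_YEARS - n for name, n in non_missing.items()}

for name, count in missing.items():
    print(f"{name}: {count} missing")
```

The results agree with the Obs=. column of the misstable output.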
Modifying and Organizing the Variables

. d CrudeXport RealOilPrice CrudeProd

              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------
CrudeXport      int     %8.0g                 Xport Crude
RealOilPrice    float   %9.0g                 Real Oil Price
CrudeProd       int     %8.0g                 Crude Production

. replace CrudeXport=ln( CrudeXport)
CrudeXport was int now float
(35 real changes made)

. replace RealOilPrice=ln( RealOilPrice)
(43 real changes made)

. replace CrudeProd=ln( CrudeProd)
CrudeProd was int now float
(55 real changes made, 2 to missing)

The d command (short for describe) shows the storage type of each variable before the log
transformation. After the transformation, Stata changes the integer variables to float, as the
replace output notes. Because the data are a time series, the tsset command was used to declare
them as such. This also enables the lag and lead operators, which can only be used with
time-series data.
. tsset year, yearly
time variable:
delta:
. gen lagCrudeXport=L1.CrudeXport
(21 missing values generated)
. gen leadCrudeProd=F1.CrudeProd
(2 missing values generated)
The yearly option tells Stata that the time variable is annual. The time-series operators L.
(lag) and F. (lead) can then be used. The number 1 means that crude exports are lagged by one
year and crude production is advanced by one year. This captures the researcher's hypothesis
that, aside from price, expected production and past exports affect the amount supplied at
present.
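To make the lag and lead operators concrete, here is a minimal Python sketch of what a one-period lag and lead do to a yearly series. The helper names lag and lead and the toy values are hypothetical (not the paper's data), and the sketch assumes consecutive years with no gaps, which tsset handles more generally.

```python
def lag(series, k=1):
    """Value from k periods earlier (k >= 1); None where no earlier value exists."""
    return [None] * k + series[:-k]

def lead(series, k=1):
    """Value from k periods later (k >= 1); None where no later value exists."""
    return series[k:] + [None] * k

# Hypothetical toy series, one value per consecutive year; None marks a missing value.
years = [2010, 2011, 2012, 2013, 2014]
export = [7.1, None, 7.3, 7.4, 7.5]

lag_export = lag(export)    # analogue of L1.export
lead_export = lead(export)  # analogue of F1.export

for row in zip(years, export, lag_export, lead_export):
    print(row)
```

Shifting the series creates one extra missing value at the edge of the sample. This is consistent with the lagged export variable having 21 missing values when exports themselves have 20.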
Correlation
Before running the regression, it is useful to examine the associations among the variables in
the model. The pwcorr command computes pairwise correlations.
. pwcorr CrudeXport RealOilPrice CrudeProd lagCrudeXport leadCrudeProd

             | CrudeX~t RealOi~e CrudeP~d lagCru~t leadCr~d
-------------+---------------------------------------------
  CrudeXport |   1.0000
RealOilPrice |   0.7437   1.0000
   CrudeProd |   0.9844   0.6701   1.0000
lagCrudeXp~t |   0.9551   0.7250   0.9681   1.0000
leadCrudeP~d |   0.9681   0.6385   0.9855   0.9168   1.0000

The pairwise correlations show that every independent variable is positively associated with
the dependent variable. The associations with crude production and with the lagged and lead
terms are strong (above 0.95), while the association with real oil price is more moderate
(0.74).
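A pairwise correlation uses, for each pair of variables, every observation in which both values are present. The Python sketch below illustrates that idea with a hypothetical helper pairwise_corr and toy data (not the paper's series). It is a sketch of the general technique, not Stata's implementation.

```python
import math

def pairwise_corr(x, y):
    """Pearson correlation using only the pairs where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data; None marks a missing value.
x = [1.0, 2.0, 3.0, None, 5.0]
y = [2.0, 4.0, 7.0, 8.0, None]

r = pairwise_corr(x, y)  # only the first three pairs are complete
print(round(r, 4))
```

Because each pair uses its own set of complete observations, the entries of a pairwise correlation matrix can be based on different sample sizes.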
Regression
Using the basic double-log form of the model:
$$\text{CrudeXport} = \beta_0 + \beta_1\,\text{RealOilPrice} + \beta_2\,\text{LeadProd} + \beta_3\,\text{LagXport} + \varepsilon$$
The dependent variable is regressed on its explanatory variables using the regress command in
Stata.
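. regress CrudeXport RealOilPrice leadCrudeProd lagCrudeXport

      Source |       SS       df       MS              Number of obs =      33
-------------+------------------------------           F(  3,    29) =  320.71
       Model |  3.20418579     3  1.06806193           Prob > F      =  0.0000
    Residual |  .096579859    29   .00333034           R-squared     =  0.9707
-------------+------------------------------           Adj R-squared =  0.9677
       Total |  3.30076565    32  .103148927           Root MSE      =  .05771

----------------------------------------------------------------------
  CrudeXport |      Coef.   Std. Err.      t    P>|t|   95% CI upper
-------------+--------------------------------------------------------
RealOilPrice |   .0580351   .0258407     2.25   0.032       .1108853
leadCrudeP~d |   .6313507   .0851185     7.42   0.000       .8054375
lagCrudeXp~t |    .357686   .0882998     4.05   0.000       .5382794
       _cons |  -.0604724    .309896    -0.20   0.847        .573336
----------------------------------------------------------------------
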
The F-test measures the overall significance of the model. The model is significant at the 1%
level, since the p-value (0.0000) is below 0.01. This implies that at least one of the
explanatory variables helps explain the variability in crude exports. Based on the R-squared,
the explanatory variables together explain about 97% of the variability in crude oil exports.
The t-tests, which test whether each independent variable has a linear relationship with the
dependent variable, show that RealOilPrice is significant at the 5% level and that leadCrudeProd
and lagCrudeXport are significant at the 1% level. However, before drawing inferences from
these results, the model must undergo diagnostic tests to determine whether the coefficient
estimates are unbiased and the p-values are valid.
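The summary statistics in the regression header can be recovered from the sums of squares and degrees of freedom. The Python sketch below is only an arithmetic check using the numbers printed in the regress output, with illustrative variable names. It recomputes the F statistic, R-squared, adjusted R-squared, root MSE, and the t statistic for RealOilPrice.

```python
import math

# Sums of squares and degrees of freedom copied from the regress output.
ss_model, df_model = 3.20418579, 3
ss_resid, df_resid = 0.096579859, 29
ss_total, df_total = ss_model + ss_resid, df_model + df_resid

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid

f_stat = ms_model / ms_resid                          # overall F-test
r_squared = ss_model / ss_total                       # share of variance explained
adj_r_squared = 1 - ms_resid / (ss_total / df_total)  # penalizes extra regressors
root_mse = math.sqrt(ms_resid)                        # standard error of the regression

# t statistic for RealOilPrice: coefficient divided by its standard error.
t_price = 0.0580351 / 0.0258407

print(round(f_stat, 2), round(r_squared, 4), round(adj_r_squared, 4),
      round(root_mse, 5), round(t_price, 2))
```

Each value matches the Stata output to the printed precision.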
Regression Diagnostic Tests
Test for the Normality of Residuals
. predict r, residual
(22 missing values generated)
. pnorm r
. kdensity r, normal
The kernel density estimate (kdensity) and the standardized normal probability plot (pnorm) show
a slight deviation from normality. Nonetheless, the residuals are quite close to a normal
distribution. For a clearer result, the Shapiro-Wilk test for normality is applied using the
swilk command.
. swilk r

                   Shapiro-Wilk W test for normal data

    Variable |        Obs       W           V         z       Prob>z
-------------+------------------------------------------------------
           r |         33    0.97667      0.796    -0.473    0.68200

The null hypothesis of the Shapiro-Wilk test is that the data are normally distributed. With a
p-value of 0.682, the test fails to reject the null hypothesis, so there is no evidence that the
residuals depart from normality.
Test for Heteroscedasticity
It is difficult to trace a pattern of heteroscedasticity from the residual plot, since there
are too few points to establish a clear pattern. It can be roughly judged that
heteroscedasticity is not present, since the data points do not noticeably narrow toward the
right. The White test and the Breusch-Pagan test give a formal conclusion based on their
p-values. The commands estat imtest and estat hettest are used for the White test and the
Breusch-Pagan test, respectively; estat imtest reports the White test as the heteroskedasticity
component of the Cameron and Trivedi decomposition.
. estat imtest

Cameron & Trivedi's decomposition of IM-test

              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |       7.90      9    0.5440
            Skewness |       2.00      3    0.5723
            Kurtosis |       0.85      1    0.3552
---------------------+-----------------------------
               Total |      10.76     13    0.6311

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of CrudeXport

         chi2(1)      =     1.19
         Prob > chi2  =   0.2756

Both tests have the null hypothesis that the variance of the residuals is constant
(homoscedastic). Since the p-values of both tests are large (0.5440 and 0.2756), the tests fail
to reject the null hypothesis, and there is no evidence of heteroscedasticity. Had the null
hypothesis been rejected, the vce(robust) option of regress would be used to obtain standard
errors that are robust to heteroscedasticity.
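The Breusch-Pagan p-value can be checked by hand, because a chi-square statistic with 1 degree of freedom is the square of a standard normal variable. The Python sketch below uses that identity with the rounded statistic printed by Stata, so it matches the reported p-value only approximately. The function name and the 5% threshold are illustrative assumptions.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    If Z is standard normal, Z**2 is chi-square(1), so
    P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

ALPHA = 0.05     # conventional significance level (an assumption)

bp_chi2 = 1.19   # Breusch-Pagan statistic as printed (rounded to two decimals)
p_value = chi2_sf_1df(bp_chi2)
reject_homoscedasticity = p_value < ALPHA

print(round(p_value, 3), reject_homoscedasticity)
```

The computed p-value is close to the reported 0.2756 and well above 0.05, so the null hypothesis of constant variance is not rejected.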
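Test for Multicollinearity

    Variable |       VIF       1/VIF
-------------+----------------------
lagCrudeXp~t |      7.09    0.140980
leadCrudeP~d |      6.27    0.159400
RealOilPrice |      1.96    0.510924
-------------+----------------------
    Mean VIF |      5.11
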
As a rule of thumb, a variance inflation factor (VIF) greater than 10, or equivalently a
tolerance (1/VIF) below 0.1, signals problematic multicollinearity. All VIFs here are below 10,
so no variable is a near-perfect linear combination of the others; that is, the variables are
not capturing the same information.
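The relation between VIF and tolerance can be verified from the reported values. The Python sketch below starts from the tolerances in the VIF table and recovers each VIF, the mean VIF, and the implied R-squared of regressing each explanatory variable on the others. Variable names are illustrative.

```python
# Tolerances (1/VIF) copied from the VIF table.
tolerance = {
    "lagCrudeXport": 0.140980,
    "leadCrudeProd": 0.159400,
    "RealOilPrice": 0.510924,
}

# VIF_j = 1 / (1 - R2_j), where R2_j is the R-squared from regressing
# regressor j on the other regressors; the tolerance is 1 - R2_j.
vif = {name: 1.0 / tol for name, tol in tolerance.items()}
aux_r2 = {name: 1.0 - tol for name, tol in tolerance.items()}
mean_vif = sum(vif.values()) / len(vif)

# Rule of thumb from the text: a VIF above 10 signals multicollinearity.
flagged = [name for name, value in vif.items() if value > 10]

print({name: round(value, 2) for name, value in vif.items()},
      round(mean_vif, 2), flagged)
```

No variable is flagged, although the lagged export term shares about 86% of its variation with the other two regressors.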
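Test for Specification Error

      Source |       SS       df       MS              Number of obs =      33
-------------+------------------------------           F(  2,    30) =  498.11
       Model |  3.20427346     2  1.60213673           Prob > F      =  0.0000
    Residual |   .09649219    30  .003216406           R-squared     =  0.9708
-------------+------------------------------           Adj R-squared =  0.9688
       Total |  3.30076565    32  .103148927           Root MSE      =  .05671

----------------------------------------------------------------------
  CrudeXport |      Coef.   Std. Err.      t    P>|t|   95% CI upper
-------------+--------------------------------------------------------
        _hat |   .7272369   1.651505     0.44   0.663        4.10006
      _hatsq |   .0181491   .1098676     0.17   0.870       .2425286
       _cons |   1.022852   6.196664     0.17   0.870       13.67813
----------------------------------------------------------------------
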
The link test creates two variables, _hat (the prediction) and _hatsq (the squared prediction).
The main concern is the test on _hatsq: if the model is specified correctly, _hatsq should not
be significant. Here _hatsq is not significant (p = 0.870), so the test fails to reject the
assumption that the model is specified correctly.
The ovtest command tests whether the model has omitted a variable that should have been
included.
. ovtest

Ramsey RESET test using powers of the fitted values of CrudeXport
       Ho:  model has no omitted variables
                 F(3, 26) =      0.82
                 Prob > F =      0.4969

The test fails to reject the null hypothesis that the model has no omitted variables (p =
0.4969), so there is no evidence of omitted variables in the model.
Test for Autocorrelation
Lastly, a time-series model with lag and lead variables is prone to autocorrelation. The
commands estat bgodfrey (Breusch-Godfrey test) and estat dwatson (Durbin-Watson test) test for
serial correlation in the error term.
. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation

    lags(p)  |          chi2               df
-------------+-------------------------------
       1     |          4.867

Durbin-Watson d-statistic(  4,    33) =  2.55771

Both tests reject the null hypothesis of no serial correlation, so the error term is serially
correlated.
Prais-Winsten Regression
The Prais-Winsten and Cochrane-Orcutt regressions address this problem. According to the Stata
manual, prais uses the generalized least-squares method to estimate the parameters of a linear
regression model in which the errors are serially correlated; specifically, the errors are
assumed to follow a first-order autoregressive process. Prais-Winsten regression therefore
gives new coefficient estimates adjusted for autocorrelation.
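. prais CrudeXport RealOilPrice lagCrudeXport leadCrudeProd, rhotype(theil)

Iteration 0:  rho = 0.0000
Iteration 1:  rho = -0.2901
Iteration 2:  rho = -0.3060
Iteration 3:  rho = -0.3064
Iteration 4:  rho = -0.3064
Iteration 5:  rho = -0.3064

      Source |       SS       df       MS              Number of obs =      33
-------------+------------------------------           F(  3,    29) = 1611.70
       Model |  14.0513271     3   4.6837757           Prob > F      =  0.0000
    Residual |  .084277236    29  .002906112           R-squared     =  0.9940
-------------+------------------------------           Adj R-squared =  0.9934
       Total |  14.1356043    32  .441737635           Root MSE      =  .05391

----------------------------------------------------------------------
  CrudeXport |      Coef.   Std. Err.      t    P>|t|   95% CI upper
-------------+--------------------------------------------------------
RealOilPrice |   .0497158   .0190028     2.62   0.014       .0885809
lagCrudeXp~t |   .3543518   .0688691     5.15   0.000       .4952049
leadCrudeP~d |   .6477729   .0662351     9.78   0.000       .7832388
       _cons |  -.1344191   .2252445    -0.60   0.555       .3262576
-------------+--------------------------------------------------------
         rho |  -.3063733
----------------------------------------------------------------------
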
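As a rough cross-check of the estimated rho, the Durbin-Watson statistic reported earlier can be converted into an approximate first-order autocorrelation using the large-sample relation d ≈ 2(1 - rho). The Python sketch below does this and also computes a p-value for the Breusch-Godfrey statistic, assuming 1 degree of freedom for the single lag. That p-value is computed here and is not part of the reported output. The approximation is only indicative, since the Durbin-Watson statistic is known to be unreliable when a lagged dependent variable is among the regressors.

```python
import math

dw = 2.55771             # Durbin-Watson d statistic reported earlier
bg_chi2 = 4.867          # Breusch-Godfrey LM statistic, 1 lag
rho_prais = -0.3063733   # rho estimated by prais

# Large-sample approximation d ~ 2 * (1 - rho), so rho ~ 1 - d / 2.
rho_from_dw = 1.0 - dw / 2.0

# Chi-square(1) upper-tail probability: P(chi2 > x) = erfc(sqrt(x / 2)).
# Assumes 1 degree of freedom (one lag); not printed in the source output.
bg_p_value = math.erfc(math.sqrt(bg_chi2 / 2.0))

print(round(rho_from_dw, 4), round(bg_p_value, 4))
```

Both the approximation and the prais estimate point to negative first-order autocorrelation of similar size, and the computed Breusch-Godfrey p-value is below 0.05.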
Findings
The p-value of the F-test is below 0.01, so the model is significant at the 1% level. There is
evidence that at least one of the explanatory variables helps explain the variability of UAE
crude oil exports. Based on the R-squared, the independent variables explain about 99% of the
variability of crude oil exports. Furthermore, all the explanatory variables are significant at
the 5% or 1% level based on the p-values of the t-tests. Since the model has undergone
diagnostic tests and the coefficient estimates are adjusted for autocorrelation, inferences can
now be drawn from the estimates.
Based on the classical law of supply, the real oil price is expected to have a positive sign,
as are future production of crude oil and past exports of the country. The coefficient of
RealOilPrice is the supply elasticity of crude oil for the United Arab Emirates. At 0.0497,
supply is inelastic, which means that crude oil exports are not very responsive to changes in
the price of crude oil. Crude oil has no close substitute, so demand for it is also inelastic.
The United Arab Emirates is a member of the Organization of the Petroleum Exporting Countries
(OPEC), which has monopoly-like power to set prices; this is consistent with the estimate that
the supply of crude oil is inelastic. A 1% decrease in price would decrease exports by only
about 0.05%, since prices are dictated by OPEC itself. A 1% increase in the UAE's previous
exports of crude oil increases its present exports by 0.354%. The present increase is smaller
than the previous one because sellers keep supply low to maintain a higher price level, and
because crude oil is a non-renewable resource. Lastly, a 1% increase in anticipated production
increases crude oil exports by 0.65%. The explanation is straightforward: when production of
the good increases, sellers have more to supply.
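The percentage interpretations above rely on the double-log form, in which a coefficient b implies that a change of g percent in a regressor changes exports by roughly b times g percent. The Python sketch below, using the Prais-Winsten coefficients and a hypothetical helper pct_response, computes the exact response (1 + g/100)^b - 1, holding the other regressors fixed.

```python
# Prais-Winsten coefficients copied from the output (double-log model).
elasticity = {
    "RealOilPrice": 0.0497158,
    "lagCrudeXport": 0.3543518,
    "leadCrudeProd": 0.6477729,
}

def pct_response(beta, pct_change):
    """Exact percent change in exports implied by a log-log coefficient."""
    return ((1.0 + pct_change / 100.0) ** beta - 1.0) * 100.0

# Response of exports to a 1% increase in each regressor.
one_pct = {name: pct_response(b, 1.0) for name, b in elasticity.items()}

# Response of exports to a 10% increase in the real oil price.
ten_pct_price = pct_response(elasticity["RealOilPrice"], 10.0)

print({name: round(value, 3) for name, value in one_pct.items()},
      round(ten_pct_price, 3))
```

For small changes the exact response is nearly identical to the coefficient itself, and even a 10% price increase raises exports by less than half a percent.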
References
Chen, X., Ender, P., Mitchell, M. and Wells, C. (2003). Regression with Stata, from
http://www.ats.ucla.edu/stat/stata/webbooks/reg/default.htm
International Energy Agency (IEA) (2014). Energy Supply Security 2014: Emergency Response of
IEA Countries, pp. 502-510.
Organization of the Petroleum Exporting Countries (OPEC) (2015). OPEC Annual Statistical
Bulletin, 50th Edition.
Wooldridge, J. (2009). Introductory Econometrics, Fourth Edition, pp. 339-435.