1 - Autoregressive Models
A time series is a sequence of measurements of the same variable(s) made over time. Usually the
measurements are made at evenly spaced times - for example, monthly or yearly. Let us first consider
the problem in which we have a y-variable measured as a time series. As an example, we might
have y a measure of global temperature, with measurements observed each year. To emphasize that
we have measured values over time, we use "t" as a subscript rather than the usual "i,"
i.e., y_t means y measured in time period t. An autoregressive model is when a value from a time
series is regressed on previous values from that same time series, for example, y_t on y_{t−1}:
y_t = β_0 + β_1 y_{t−1} + ϵ_t.
In this regression model, the response variable in the previous time period has become the predictor
and the errors have our usual assumptions about errors in a simple linear regression model.
The order of an autoregression is the number of immediately preceding values in the series that are
used to predict the value at the present time. So, the preceding model is a first-order autoregression,
written as AR(1).
If we want to predict y this year (y_t) using measurements of global temperature in the previous
two years (y_{t−1}, y_{t−2}), then the autoregressive model for doing so would be:
y_t = β_0 + β_1 y_{t−1} + β_2 y_{t−2} + ϵ_t.
This model is a second-order autoregression, written as AR(2), since the value at time t is predicted
from the values at times t−1 and t−2. More generally, a kth-order autoregression, written as
AR(k), is a multiple linear regression in which the value of the series at any time t is a (linear)
function of the values at times t−1, t−2, …, t−k.
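Because an AR(k) model is just a multiple linear regression on lagged copies of the series, it can be fit by ordinary least squares. A minimal sketch in Python; the series below is an invented noise-free AR(1) used only to check the mechanics:

```python
import numpy as np

def fit_ar(y, k):
    """Fit y_t = b0 + b1*y_(t-1) + ... + bk*y_(t-k) by ordinary least squares."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Response: the observations that have k preceding values available.
    target = y[k:]
    # Design matrix: an intercept column plus one column per lag.
    cols = [np.ones(n - k)] + [y[k - j:n - j] for j in range(1, k + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta  # [b0, b1, ..., bk]

# Check on an exact (noise-free) AR(1) series y_t = 2 + 0.5*y_(t-1):
series = [1.0]
for _ in range(10):
    series.append(2 + 0.5 * series[-1])
b0, b1 = fit_ar(series, 1)  # recovers b0 = 2, b1 = 0.5
```

Since the relation is exactly linear here, OLS recovers the coefficients essentially exactly; on real data the estimates would carry sampling error.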
Autocorrelation and Partial Autocorrelation
The coefficient of correlation between two values in a time series is called the autocorrelation
function (ACF). For example, the ACF for a time series y_t is given by:
Corr(y_t, y_{t−k}).
The value of k is the time gap being considered and is called the lag. A lag 1 autocorrelation
(i.e., k = 1 in the above) is the correlation between values that are one time period apart. More
generally, a lag k autocorrelation is the correlation between values that are k time periods apart.
The ACF is a way to measure the linear relationship between an observation at time t and the
observations at previous times. If we assume an AR(k) model, then we may wish to only measure the
association between y_t and y_{t−k} and filter out the linear influence of the random variables that
lie in between (i.e., y_{t−1}, y_{t−2}, …, y_{t−(k−1)}), which requires a transformation on the
time series. Then by calculating the correlation of the transformed time series we obtain the partial
autocorrelation function (PACF).
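As a concrete illustration, the sample lag-k autocorrelation can be computed directly; libraries such as statsmodels also provide acf and pacf functions, including the partial version. A minimal numpy sketch, with an invented trending series:

```python
import numpy as np

def sample_acf(y, k):
    """Sample autocorrelation at lag k, i.e. an estimate of Corr(y_t, y_(t-k))."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    # Standard estimator: lag-k autocovariance over the lag-0 autocovariance.
    num = np.sum(dev[k:] * dev[:-k]) if k > 0 else np.sum(dev**2)
    return num / np.sum(dev**2)

# A steadily trending series is strongly autocorrelated at lag 1.
trend = np.arange(20.0)
r1 = sample_acf(trend, 1)  # 0.85 for this short series
```

The lag-0 value is 1 by construction, and values shrink toward 0 at lags where the linear association dies out.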
The PACF is most useful for identifying the order of an autoregressive model. Specifically, sample
partial autocorrelations that are significantly different from 0 indicate lagged terms of y that are
useful predictors of y_t. To help differentiate between the ACF and PACF, think of them as analogues
to R² and partial R² values, as discussed previously.
Graphical approaches to assessing the lag of an autoregressive model include looking at the ACF and
PACF values versus the lag. In a plot of ACF versus the lag, if you see large ACF values and a non-
random pattern, then likely the values are serially correlated. In a plot of PACF versus the lag, the
pattern will usually appear random, but large PACF values at a given lag indicate this value as a
possible choice for the order of an autoregressive model. It is important that the choice of the order
makes sense. For example, suppose you have blood pressure readings for every day over the past two
years. You may find that an AR(1) or AR(2) model is appropriate for modeling blood pressure.
However, the PACF may indicate a large partial autocorrelation value at a lag of 17, but such a large
order for an autoregressive model likely does not make much sense.
Consecutive values appear to follow one another fairly closely, suggesting an autoregression model
could be appropriate. We next look at a plot of partial autocorrelations for the data:
To obtain this in Minitab select Stat > Time Series > Partial Autocorrelation. Here we notice that
there is a significant spike at a lag of 1 and much lower spikes for the subsequent lags. Thus, an
AR(1) model would likely be feasible for this data set.
Approximate bounds can also be constructed (as given by the red lines in the plot above) to aid in
determining large values. Approximate (1−α)×100% significance bounds are given by ±z_{1−α/2}/√n.
Values lying outside of either of these bounds are indicative of an autoregressive process.
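For example, with n = 100 observations and α = 0.05 the bounds are roughly ±1.96/√100 = ±0.196. A quick sketch using only the Python standard library:

```python
from math import sqrt
from statistics import NormalDist  # standard library normal quantiles

def pacf_bounds(n, alpha=0.05):
    """Approximate (1 - alpha) significance bounds for ACF/PACF spikes: z_(1-alpha/2)/sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(n)

bound = pacf_bounds(100)  # about 0.196; spikes outside +/-0.196 are flagged as large
```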
We can next create a lag-1 price variable and consider a scatterplot of price versus this lag-1 variable:
There appears to be a moderate linear pattern, suggesting that the first-order autoregression model
y_t = β_0 + β_1 y_{t−1} + ϵ_t
could be useful.
The plot below shows the PACF (partial autocorrelation function), which can be interpreted
to mean that a third-order autoregression may be warranted, since there are notable partial
autocorrelations for lags 1 and 3.
The next step is to do a multiple linear regression with number of quakes as the response variable and
lag-1, lag-2, and lag-3 quakes as the predictor variables. (In Minitab, we used Stat >> Time Series
>> Lag to create the lag variables.) In the results below we see that the lag-3 predictor is significant
at the 0.05 level (and the lag-1 predictor p-value is also relatively small).
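The same lag variables can be built in Python with pandas' shift, and the AR(3) regression fit by OLS. A sketch; the quake counts below are invented placeholders, since the actual data set is not reproduced here:

```python
import numpy as np
import pandas as pd

# Hypothetical yearly quake counts (stand-ins for the real series).
df = pd.DataFrame({"quakes": [10.0, 12, 9, 14, 11, 13, 15, 12, 16, 14, 17, 15]})

# Analogue of Minitab's Stat > Time Series > Lag: one column per lag.
for k in (1, 2, 3):
    df[f"lag{k}"] = df["quakes"].shift(k)
df = df.dropna()  # the first 3 rows lack a full set of lagged predictors

# AR(3) regression: quakes_t on lag-1, lag-2, and lag-3 quakes.
X = np.column_stack([np.ones(len(df)), df["lag1"], df["lag2"], df["lag3"]])
beta, *_ = np.linalg.lstsq(X, df["quakes"].to_numpy(), rcond=None)
b0, b1, b2, b3 = beta
```

With the real data, the significance of each lag coefficient would then be judged from the regression output, as in the Minitab results described above.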
14.2 - Regression with Autoregressive Errors
Next, let us consider the problem in which we have a y-variable and x-variables all measured as a
time series. As an example, we might have y as the monthly highway accidents on an interstate
highway and x as the monthly amount of travel on the interstate, with measurements observed for
120 consecutive months. A multiple (time series) regression model can be written as:
y_t = X_t β + ϵ_t.
The difficulty that often arises in this context is that the errors (ϵ_t) may be correlated with each
other. In other words, we have autocorrelation, or a dependency between the errors.
We may consider situations in which the error at one specific time is linearly related to the error at
the previous time. That is, the errors themselves follow a simple linear regression model that can be
written as
ϵ_t = ρ ϵ_{t−1} + ω_t.
Here, |ρ| < 1 is called the autocorrelation parameter, and the ω_t term is a new error term that
follows the usual assumptions that we make about regression errors: ω_t ~ iid N(0, σ²).
(Here "iid" stands for "independent and identically distributed.") So, this model says that the error at
time t is predictable from a fraction of the error at time t−1 plus some new perturbation ω_t.
Our model for the ϵ_t errors of the original Y versus X regression is an autoregressive model for the
errors, specifically AR(1) in this case. One reason why the errors might have an autoregressive
structure is that the Y and X variables at time t may be (and most likely are) related to
the Y and X measurements at time t – 1. These relationships are being absorbed into the error term of
our multiple linear regression model that only relates Y and X measurements made at concurrent
times. Notice that the autoregressive model for the errors is a violation of the assumption that we
have independent errors and this creates theoretical difficulties for ordinary least squares estimates of
the beta coefficients. There are several different methods for estimating the regression parameters of
the Y versus X relationship when we have errors with an autoregressive structure and we will
introduce a few of these methods later.
The error terms ϵ_t still have mean 0 and constant variance:
E(ϵ_t) = 0, Var(ϵ_t) = σ²/(1 − ρ²).
However, the covariance (a measure of the relationship between two variables) between adjacent
error terms is:
Cov(ϵ_t, ϵ_{t−1}) = ρσ²/(1 − ρ²),
which implies the coefficient of correlation (a unitless measure of the relationship between two
variables) between adjacent error terms is:
Corr(ϵ_t, ϵ_{t−1}) = ρ.
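These moment formulas can be checked by simulation; the sketch below uses arbitrary values ρ = 0.7 and σ = 1:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, sigma, n = 0.7, 1.0, 200_000

# Simulate e_t = rho*e_(t-1) + w_t with w_t ~ N(0, sigma^2).
w = rng.normal(0.0, sigma, n)
e = np.empty(n)
e[0] = w[0] / np.sqrt(1 - rho**2)  # start in the stationary distribution
for t in range(1, n):
    e[t] = rho * e[t - 1] + w[t]

var_theory = sigma**2 / (1 - rho**2)         # = 1/0.51, about 1.96
emp_var = e.var()                            # close to var_theory
emp_corr = np.corrcoef(e[1:], e[:-1])[0, 1]  # close to rho = 0.7
```

The empirical variance and lag-1 correlation of the simulated errors agree with σ²/(1 − ρ²) and ρ up to Monte Carlo error.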
We can use partial autocorrelation function (PACF) plots to help us assess appropriate lags for the
errors in a regression model with autoregressive errors. Specifically, we first fit a multiple linear
regression model to our time series data and store the residuals. Then we can look at a plot of the
PACF for the residuals versus the lag. Large sample partial autocorrelations that are significantly
different from 0 indicate lagged terms of ϵ that may be useful predictors of ϵ_t.
Least Squares Method: We can take any other year as the origin, and for that year X would be 0.
Considerable saving of both time and effort is possible if the origin is taken in the middle of the
whole time span covered by the entire series. The origin would then be located at the mean of the
X values, and the sum of the X values would then equal 0. The two normal equations would then be
simplified to
∑Y = Na, or a = ∑Y/N ...(i)
and ∑XY = b∑X², or b = ∑XY/∑X² ...(ii)
Two cases of the short-cut method are given below. In the first case there is an odd number of
years, while in the second case the number of observations is even.
Illustration: Fit a straight-line trend to the following data:
Year 1996 1997 1998 1999 2000 2001 2002 2003 2004
Y 4 7 7 8 9 11 13 14 17
Solution: Since we have 9 observations, the origin is taken at 2000, for which X is 0.
------------------------------
Year     Y     X     XY    X²
------------------------------
1996     4    –4    –16    16
1997     7    –3    –21     9
1998     7    –2    –14     4
1999     8    –1     –8     1
2000     9     0      0     0
2001    11     1     11     1
2002    13     2     26     4
2003    14     3     42     9
2004    17     4     68    16
------------------------------
Total   90     0     88    60
------------------------------
Thus n = 9, ∑Y = 90, ∑X = 0, ∑XY = 88, and ∑X² = 60.
Substituting these values in the two normal equations, we
get
90 = 9a, or a = 90/9 = 10
88 = 60b, or b = 88/60 = 1.47
Trend equation is: Yc = 10 + 1.47X
Inserting the various values of X, we obtain the trend values as below:
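The trend values follow by substituting X = −4, …, 4 into Yc = 10 + 1.47X; a quick computation:

```python
# Trend values Yc = 10 + 1.47*X for the 1996-2004 data (origin at 2000, so X = year - 2000).
a, b = 10.0, 1.47
trend = {year: round(a + b * (year - 2000), 2) for year in range(1996, 2005)}
# trend[1996] = 4.12, trend[2000] = 10.0, trend[2004] = 15.88
```

The fitted values rise by 1.47 units per year, passing through the mean Y of 10 at the origin year 2000.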
Illustration: Fit a straight-line trend to the following data (even number of observations):
----------------------------------------------
Year      Y      X      XY     X²
----------------------------------------------
2003     6.7    –7    –46.9    49
2004     5.3    –5    –26.5    25
2005     4.3    –3    –12.9     9
2006     6.1    –1     –6.1     1
2007     5.6     1      5.6     1
2008     7.9     3     23.7     9
2009     5.8     5     29.0    25
2010     6.1     7     42.7    49
----------------------------------------------
Total   47.8     0      8.6   168
----------------------------------------------
From the above computations, we get the following values:
n = 8, ∑Y = 47.8, ∑X = 0, ∑XY = 8.6, ∑X² = 168
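Substituting these sums into the simplified normal equations gives the trend coefficients; a quick check of the arithmetic:

```python
# Simplified normal equations (since sum(X) = 0): a = sum(Y)/n, b = sum(XY)/sum(X^2).
sum_y, sum_xy, sum_x2, n = 47.8, 8.6, 168, 8
a = sum_y / n        # 5.975
b = sum_xy / sum_x2  # about 0.0512, per half-year unit of X
# Fitted trend: Yc = 5.975 + 0.0512*X
```

Note that with an even number of years, X is coded in half-year steps (±1, ±3, …), so b measures the change per half-year unit.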
For a second-degree (parabolic) trend Yc = a + bX + cX², the normal equations are
(i) ∑Y = Na + b∑X + c∑X²
(ii) ∑XY = a∑X + b∑X² + c∑X³
(iii) ∑X²Y = a∑X² + b∑X³ + c∑X⁴
When the origin is chosen so that ∑X = 0 and ∑X³ = 0, Eqn. (iii) reduces to
∑X²Y = a∑X² + c∑X⁴
----------------------------------------------------------------------
Year     Y     X    X²    X³    X⁴     XY    X²Y       Yc
----------------------------------------------------------------------
2000   100    –2     4    –8    16   –200    400    97.744
2001   107    –1     1    –1     1   –107    107   110.426
2002   128     0     0     0     0      0      0   126.680
2003   140    +1     1    +1     1   +140    140   146.506
2004   181    +2     4    +8    16   +362    724   169.904
2005   192    +3     9   +27    81   +576   1728   196.874
----------------------------------------------------------------------
N = 6   ∑Y = 848   ∑X = 3   ∑X² = 19   ∑X³ = 27   ∑X⁴ = 115   ∑XY = 771   ∑X²Y = 3099
----------------------------------------------------------------------
848 = 6a + 3b + 19c ...(i)
771 = 3a +19b +27c ...(ii)
3,099 = 19a + 27b +115c ...(iii)
Multiplying Eqn. (ii) by 2 and subtracting Eqn. (i), we get
35b + 35c = 694 ...(iv)
Multiplying Eqn. (ii) by 19 and Eqn. (iii) by 3, and subtracting the second result from the first,
we get
5352 = 280b + 168c ...(v)
Solving Eqns. (iv) and (v), we get
c = 1.786
Substituting the value of c in Eqn. (iv), we get
b = 18.04 [35b + (35 × 1.786) = 694]
Putting the values of b and c in Eqn. (i), we get
a = 126.68 [848 = 6a + (3 × 18.04) + (19 × 1.786)]
Thus a = 126.68, b = 18.04, and c = 1.786. Substituting these values into the trend equation, we get
Yc = 126.68 + 18.04X + 1.786X²
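The hand computation can be cross-checked with numpy's least-squares polynomial fit; np.polyfit on the coded data reproduces the same coefficients (up to the rounding used above):

```python
import numpy as np

# Years 2000-2005 coded as X = -2, ..., 3, with the Y values from the table.
x = np.array([-2, -1, 0, 1, 2, 3], dtype=float)
y = np.array([100, 107, 128, 140, 181, 192], dtype=float)

# Degree-2 fit; polyfit returns coefficients highest power first: [c, b, a].
c, b, a = np.polyfit(x, y, 2)
# b is about 18.043 and c about 1.786 as above; the exact least-squares
# intercept is about 126.66 (the 126.68 above carries a small rounding error).
```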
Illustration: Fit a second-degree parabolic trend to the following data:
Years    1   2   3   4   5   6   7
Values  35  38  40  42  36  39  45
Solution: Taking the origin at year 4 (so X = –3, …, 3), we have N = 7, ∑Y = 275, ∑X = 0,
∑XY = 28, ∑X² = 28, ∑X²Y = 1104, and ∑X⁴ = 196. The normal equations become
275 = 7a + 28c ...(i)
28 = 28b ...(ii)
1104 = 28a + 196c ...(iii)
Multiplying Eqn. (i) by 4 and subtracting Eqn. (iii), we get
– 84c = – 4
c = 4/84 = 0.05
By substituting the value of c in equation (i) we get the
value of a:
7a + 28 × 4/84 = 275
7a = 275 – 1.33
a = 273.67/7 = 39.09
We may get the value of b with the help of equation (ii)
28b = 28
b = 1
The required equation would be:
Yc = 39.09 + 1X + 0.05 X2
= 39.09 + X + 0.05 X2
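Again, numpy's polyfit reproduces these coefficients on the coded data (X = −3, …, 3):

```python
import numpy as np

# Years 1-7 coded as X = -3, ..., 3, with the given values.
x = np.arange(-3, 4, dtype=float)
y = np.array([35, 38, 40, 42, 36, 39, 45], dtype=float)

c, b, a = np.polyfit(x, y, 2)  # coefficients highest power first
# a is about 39.095, b = 1, and c = 4/84, about 0.048,
# matching Yc = 39.09 + X + 0.05*X^2
```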
https://sol.du.ac.in/mod/book/view.php?id=1317&chapterid=1071