
Goodyear and Industry North America Commercial Replacement Tire
Causative Forecasting Models User Guide

2009

Written by: Bowei Zhang

Proofread by:
Steven Miller
Steven Subichin
09/30/2009

Last Revision Date: 11/24/2009

Table of Contents
PROJECT INTRODUCTION
12 Months Rolling Sum and Lagged “Leading” Indicators
Correlation Verified To Be Linear
Market Share Forecast
Modeling Data Geography-US Models extended to include Canada
MODELING EFFORT 1-SIMPLE LINEAR REGRESSION MODELS
Modeling Assumptions and Limitations
How to Obtain Monthly Forecast from 12 Months Rolling Sum Forecasts
Seasonality-Removed By 12 Months Rolling Sum
Outliers-Strike Consideration
Outliers Reduction-Smooth Economic Leading Indicators
Model Utility and Residual Analysis
MODELING EFFORT 2- MULTIPLE LINEAR REGRESSION MODELS
How to Build Multiple Linear Regression Models in Minitab
Reason for Not Using Multiple Regression Models’ Monthly Forecasts
MODELING EFFORT 3- TIME SERIES MODELS
DATA RANGE AND SOURCE
EXPLANATION OF THE STANDARD LINEAR REGRESSION SPREADSHEET
Common Tabs
Unique Tabs
Steps of Searching for New Leading Indicators
MODEL REFRESH AND UPDATE ISSUES
FILES LOCATION AND NAME
FUTURE LOOK
APPENDIX

PROJECT INTRODUCTION
Lean inventory and efficient demand planning are two weapons that are especially important for any
business trying to survive a recession. To achieve these two goals, a powerful demand forecasting system
with a relatively high level of accuracy is necessary. The aim of the project is to build forecasting models
that reveal the relationship between leading economic variables and Goodyear’s business, so that by
looking at the trend of those economic variables, Goodyear can anticipate the future highs and lows of its
relevant business segments.

Key members of the project include Steven Miller, Steven Subichin, Mike Ryan, Greg Tomsho and
Bowei Zhang.

This project is focused on Goodyear and the total industry’s performance in US/North America
commercial replacement tire markets. We split the commercial replacement tire market into four
different segments by tire application and wished to forecast the demand for each segment as well as the
total market as a whole.

The raw data for this project are:

• Monthly data for 74 leading economic variables (US) that may relate to the commercial
replacement tire market, from 01/1996 to 06/2009. (Multiple data sources)
• Industry monthly shipment data for each segment of the US/North America commercial
replacement tire market (Urban/Regional/Long Haul/Mixed Service), from 01/1996 to 06/2009.
(Data source: RMA)
• Goodyear’s monthly billed sales and shipment data for each segment of the US commercial
replacement tire market (Urban/Regional/Long Haul/Mixed Service), from 01/2003 to 07/2009.
(Data source: EDW)

One thing worth noting is that RMA’s and Goodyear’s classifications of the four market segments are
slightly different. We kept Goodyear’s billed sales data for each market segment under Goodyear’s own
classification criteria and regrouped Goodyear’s shipment data using RMA’s criteria. We did it this way
because Goodyear’s billed sales forecast will be used to assist DP, which uses Goodyear’s market
classification criteria, while Goodyear’s shipment forecast will be used together with the Industry
shipment forecast, which uses RMA’s criteria, to calculate Goodyear’s future market share.

12 Months Rolling Sum and Lagged “Leading” Indicators


Initially we wished to find any potential linkage, linear or non-linear, between external economic
variables and the replacement tire business. Monthly billed sales and shipment values are relatively small
and noisy. To reduce this modeling noise and make their correlation with leading economic variables
easier to identify, we replaced each monthly tire sales/shipment data point with the sum of the data for
that month and the previous 11 months. Hereafter this moving “yearly” value is called the 12 months
rolling sum. We calculated the correlation coefficients between the 12 months rolling sums of Goodyear
and Industry billed sales/shipments for each market segment and the monthly data of the 74 economic
variables. We assumed that some of the variables lead the commercial replacement market. To test that,
we simply lagged those variables by a certain number of months when calculating the correlation
coefficients. For example, if we think it takes the replacement tire market 2 months to respond to the
movement of a leading indicator, we use that variable’s data lagged by 2 months to calculate the
correlation coefficient. For Goodyear’s billed sales data, we calculated correlation coefficients with the
74 variables lagged by up to 24 months. The numbers supported our assumption: some variables have
high correlation coefficients at short lags and others at long lags.
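The rolling sum and lag search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, both series are assumed to be plain lists aligned month by month, and the project itself did this work in spreadsheets and Minitab rather than in code.

```python
from math import sqrt


def rolling_sum_12(monthly):
    """Return the 12 months rolling sum, aligned with the input months.

    Entry t is the sum of month t and the previous 11 months. The first
    11 entries are None because a full year of history is not yet available.
    """
    out = [None] * len(monthly)
    for t in range(11, len(monthly)):
        out[t] = sum(monthly[t - 11:t + 1])
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_correlation(indicator, rolling, lag):
    """Correlate the rolling sum of month t with the indicator of month t - lag."""
    pairs = [(indicator[t - lag], rolling[t])
             for t in range(lag, len(rolling))
             if rolling[t] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


def best_lag(indicator, rolling, max_lag=24):
    """Return (lag, correlation) for the lag with the largest absolute correlation."""
    scores = [(lag, lagged_correlation(indicator, rolling, lag))
              for lag in range(max_lag + 1)]
    return max(scores, key=lambda s: abs(s[1]))
```

A high absolute correlation at some lag is only a starting point; as the next section explains, scatter plots are still needed to confirm that the relationship is actually linear.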

Correlation Verified To Be Linear


Although the correlation coefficient measures the strength of a linear relationship between two
variables, its interpretation can be quite arbitrary. There is no set rule about what number is high or
low, and a high number does not necessarily mean a purely linear relationship. So we also drew scatter
plots to study the true relationship between tire sales/shipments and leading economic variables, and
used correlation coefficients as a second reference.

Using these two tools, we were able to identify regular relationship patterns between external
variables and sales/shipment data, meaning patterns that can mostly be captured by mathematical
models. After careful consideration and comprehensive tests, we decided to build only simple linear
regression models (one leading indicator matched to one market segment) for ease of understanding
and use in practice.

We have now built simple linear regression causative models, with some level of confidence, for
Goodyear’s billed sales, Goodyear’s shipments and Industry shipments. They forecast 2 months and 12
months out for each of the four segments of the US only and North America commercial replacement tire
markets. For Industry shipments, we also built time series models to provide an alternative view, and
they all achieved decent forecast accuracy. (Monthly forecast ex-post errors for the US only time series
models range from 7.75% for Urban tires to 22.24% for Mixed Service tires; monthly forecast ex-post
errors for the North America time series models range from 7.05% for Urban tires to 17.75% for Mixed
Service tires.)

Market Share Forecast


Since we applied the same market classification criteria (by vehicle application code) [1] when grouping
Goodyear and Industry shipment data into market segments, we are able to forecast (calculate)
Goodyear’s future market shares.

However, a word of caution: even though we re-grouped Goodyear’s shipment data using RMA’s
criteria, there are still some differences between RMA’s definitions of certain market segments and
Goodyear’s. One indication is that Goodyear’s re-grouped shipment data (North America) for the
Regional and Long Haul segments differ significantly from RMA’s adjusted interpretation of the
shipment data reported by Goodyear. The differences within individual market segments can, however,
offset each other to a certain extent. Also, due to the “re-statement” issue, Goodyear’s total shipment
data in EDW differs from the data sent back by RMA, on average by 7.5% during the period from
06/2007 to 04/2008.

These two facts mean that using different sources for Goodyear shipment data can lead to different
causative models. RMA’s “official” Goodyear shipment data has its value for other analysis endeavors.
However, we chose to use EDW’s Goodyear shipment data to build the causative models and calculate
Goodyear’s future market share because the modeling results from this project are intended for internal
use only.

Modeling Data Geography-US Models extended to include Canada


All the data and models discussed in the following sections are for US sales/shipments only. For the
North America models, we used the same set of external economic variables (US economic variables)
selected for the US only models and still achieved decent forecast accuracy, for two reasons. First,
Goodyear’s commercial replacement tire sales in Canada are only about one tenth of its sales in the US,
and this ratio is quite stable. Second, Canada’s economy is closely tied to that of the US. For these two
reasons, we did not go further and collect Canadian economic variables when building the North
America causative models.

MODELING EFFORT 1-SIMPLE LINEAR REGRESSION MODELS


Modeling Assumptions and Limitations
The fundamental assumption of our simple linear regression forecast models (of the form Y = a + b·X)
is that tire sales/shipments for the commercial replacement business depend linearly on a leading
indicator and that this relationship will persist into the future. Hence by using lagged data of the leading
indicator in the regression models, we can forecast future sales/shipments. Strictly speaking, this
modeling technique might not be the most accurate one. It is quite possible that other variables affect
tire sales/shipments, independently or jointly, and that their relationships with tire sales/shipments are
not linear. But our tests indicated that many of the 74 variables are highly correlated with each other,
and it is not worth spending months developing complex multiple curvilinear regression models that are
difficult to interpret, maintain and use. Hence for this project, we chose the simplest linear regression
model.

Ideally, if the chosen leading indicators’ monthly data were available without a delivery lag, we could
forecast tire sales and shipments 2 and 12 months out without having to forecast those economic
variables. However, since most of our leading indicators come from public/government sources, there is
a lag most of the time. Hence, in order to execute the causative models effectively, we sometimes need
to use time series modeling techniques (moving average, exponential smoothing, etc.) to obtain the
leading indicators’ future values first, if they are not provided by the data source.
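The following Python sketch illustrates the ideal case described above, in which the leading indicator is available up to the current month with no delivery lag. It fits Y = a + b·X by ordinary least squares on the lagged indicator and then forecasts the 12 months rolling sum for the next `lag` months. The function names and the list-based data layout are assumptions made for illustration; they are not taken from the project spreadsheets.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares fit of y = a + b * x. Returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b


def forecast_rolling_sum(indicator, rolling, lag):
    """Fit rolling[t] = a + b * indicator[t - lag], then forecast `lag` months ahead.

    `indicator` and `rolling` must have the same length and be aligned month
    by month. `rolling` may hold None where a full 12 months of history is
    not available. Returns forecasts for the `lag` months that follow the
    last observed month.
    """
    pairs = [(indicator[t - lag], rolling[t])
             for t in range(lag, len(rolling))
             if rolling[t] is not None]
    xs, ys = zip(*pairs)
    a, b = fit_simple_regression(xs, ys)
    n = len(rolling)
    # Future month n + k depends on the indicator of month n + k - lag,
    # which is already observed for k = 0 .. lag - 1.
    return [a + b * indicator[n + k - lag] for k in range(lag)]
```

When the indicator is published late, the last few indicator values would first have to be extended with a time series technique, as the paragraph above notes, before this function can reach the full forecast horizon.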

Another assumption of our regression models relates to the formula we defined to transform 12 months
rolling sum forecasts into monthly forecasts. As mentioned in the “Project Introduction” part, we used
12 months rolling sum history values as the dependent variables in our regression models because they
are more stable and better at absorbing modeling noise than monthly history values. Our tests showed
that we can barely find well correlated external factors for monthly history values, whereas for 12
months rolling sum history values the opposite is true.

How to Obtain Monthly Forecast from 12 Months Rolling Sum Forecasts


Assume we have two consecutive rolling sum forecasts. One is $\sum_{i=2}^{13} F_i$ and the other is
$\sum_{i=1}^{12} F_i$, where $i$ is the month number. For example, $i=13$ means the 13th month.

For illustration purposes, assume $i=1$ is January of the first year and $i=13$ is January of the second
year. We also assume that each 12 months rolling sum forecast is the sum of twelve monthly forecasts
for twelve consecutive months.

So in order to get the monthly forecast for January of the second year, $f_{13}$ in this case, we apply the
following transformation ($H_1$ in the formula represents the monthly history value of January of the
first year):

$$f_{13} = \left(\sum_{i=2}^{13} F_i - \sum_{i=1}^{12} F_i\right) + H_1$$

$$= F_{13} + \sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i - F_1 + H_1$$

$$= H_1 + (F_{13} - F_1) + \left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right)$$

$H_1$ is the true history value of January of the first year. $(F_{13} - F_1)$ is deemed the forecasted
year-over-year monthly increase/decrease, the change from January of the first year to January of the
second year in our example. The two sums over months 2 to 12 come from two different rolling sum
forecasts, so they do not cancel automatically. We assume the forecasted values that the two rolling
sum forecasts assign to the same 11 months are almost the same, namely that the artificial error term
$\left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right)$ is close to 0. If this assumption does not hold, our
forecasted monthly value will deviate from the true monthly forecast, which we would like to obtain but
cannot get directly from 12 months rolling sum forecasts. This is likely to happen when the absolute
percentage forecast errors of the two related 12 months rolling sum forecasts change dramatically,
because that violates the $\sum_{i=2}^{12} F_i \approx \sum_{i=2}^{12} F_i$ assumption. Multiple regression
models violate this assumption more easily and thus generate inaccurate forecasts. A more detailed
discussion follows in the section on multiple linear regression models.
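As a minimal sketch of the transformation above, the Python function below turns a run of consecutive 12 months rolling sum forecasts into monthly forecasts by taking the difference of two consecutive rolling sums and adding back the history value of the same month one year earlier. The function name and argument layout are hypothetical. The sketch is limited to 12 months out, so that the history term is always an actual value.

```python
def monthly_from_rolling(rolling_forecasts, previous_rolling, history_last_12):
    """Convert consecutive 12 months rolling sum forecasts to monthly forecasts.

    rolling_forecasts : rolling sum forecasts for months 13, 14, ... (at most 12)
    previous_rolling  : rolling sum for month 12 (actual or forecast)
    history_last_12   : monthly history values H1 .. H12
    """
    if len(history_last_12) != 12:
        raise ValueError("exactly 12 monthly history values are required")
    if len(rolling_forecasts) > 12:
        raise ValueError("this sketch only covers forecasts up to 12 months out")

    monthly = []
    prev = previous_rolling
    for k, current in enumerate(rolling_forecasts):
        # f_(13+k) = (rolling sum of month 13+k - rolling sum of month 12+k)
        #            + history value of the same month one year earlier
        monthly.append((current - prev) + history_last_12[k])
        prev = current
    return monthly
```

If the rolling sum forecasts happened to be perfect, this calculation would recover the true monthly values exactly. Any inconsistency between two consecutive rolling sum forecasts flows straight into the monthly forecast, which is the sensitivity the error term in the formula describes.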

Seasonality-Removed By 12 Months Rolling Sum


One benefit of using the 12 months rolling sum history as the dependent variable in linear regression
models is that the data has no seasonality. Appendix 2 is a comparison plot of monthly and 12 months
rolling sum industry commercial replacement tire shipment data. As can be seen, the monthly data is
more volatile and shows some seasonality across the history. The 12 months rolling sum, on the other
hand, has no seasonal pattern at all (this applies to every market segment in our analysis). Over the
long term, however, the 12 months rolling sum may show a regular business cyclical pattern, which can
be treated as a sort of seasonality when building time series models. This topic is covered in more
detail later.

Normally for statistical modeling purposes, if the raw data has strong seasonality, we have to
deseasonalize it first, then build the model, and in the end “reseasonalize” the forecast. In our models,
the transformation formula introduced above,
$f_{13} = H_1 + (F_{13} - F_1) + \left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right)$, adds seasonality
back to the monthly forecasts through the monthly history term ($H_1$ in the formula). Hence by using
this formula we avoided the seasonality issue in the raw data while keeping seasonality in the monthly
forecast. The leading indicators we picked for this project are all free of seasonality. However, if in the
future we want to bring in new economic variables with a seasonal pattern, we have to deseasonalize
them before use.

Outliers-Strike Consideration
Outlying data points, in either the independent or the dependent variables of regression and time series
models, can heavily skew forecast results and hence forecast accuracy. Among the many possible
causes, an unusual one-time event can generate abnormal history data. For example, the strike that
began on Oct 5th, 2006 and ended in early 2007 made Goodyear’s commercial replacement tire sales in
each market segment extremely low from 11-2006 to 01-2007. In particular, sales for 12-2006 were
lower than the lowest points of the recent recession. As a result, for Goodyear’s total replacement tire
market we over-forecasted by about 63% and 117% for 11-2006 and 12-2006 respectively. To fix the
problem, we replaced the monthly sales data from 11-2006 to 01-2007 with the average of the same
month from 2003 to 2005 and reran the linear regression model. The total cumulative absolute forecast
error during the model building period (12-2003 to 12-2007) decreased from the original 20.07% to
14.57%. However, the model validation error (ex-post error over the period from 01-2008 to 06-2009)
increased slightly, from the original 9.37% to 10.30%. One possible explanation is that, without the
outlier clean-up, the model “learned” from the dampened sales during the strike period and applied that
learning to forecasts over the ex-post period, when the recession also dampened sales. Hence data
clean-up in this case did not improve the model’s ex-post forecast accuracy. For details about the test,
please refer to the tab named “Industry outlier fix” in the Excel file named “Goodyear Billed Sales
Causative Models 2 months out (US only)”. After careful thought, we think our Goodyear models are
robust enough to tolerate some outlying raw data in the model building period without deteriorating
forecast results. Hence we kept our models as they are.
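The clean-up test described above (replacing strike months with the average of the same calendar month in earlier years) can be sketched as follows. The dictionary layout keyed by (year, month) and the function name are assumptions for illustration only.

```python
def replace_with_same_month_average(sales, outlier_months, reference_years):
    """Return a copy of `sales` with outlier months replaced.

    sales           : dict mapping (year, month) to a monthly sales value
    outlier_months  : list of (year, month) keys to replace
    reference_years : years whose same-month values are averaged
    """
    cleaned = dict(sales)  # leave the original history untouched
    for year, month in outlier_months:
        reference = [sales[(y, month)] for y in reference_years]
        cleaned[(year, month)] = sum(reference) / len(reference)
    return cleaned
```

For the strike period in the text, the call would use `outlier_months=[(2006, 11), (2006, 12), (2007, 1)]` and `reference_years=[2003, 2004, 2005]`. Keeping the original data unchanged makes it easy to rerun the regression on both versions and compare errors, as was done in the test above.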

Outliers Reduction-Smooth Economic Leading Indicators


However, there is one easy way to reduce, at least partially, the outlying forecast points. Before we dive
into that, let’s first look at how we calculate the monthly forecast residual. Assuming $H_{13}$ is the
monthly history value of the 13th month and using the transformation formula mentioned earlier, we
get the following residual formula:

$$H_{13} - f_{13} = H_{13} - H_1 - (F_{13} - F_1) - \left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right)$$

$$= (H_{13} - F_{13}) - (H_1 - F_1) - \left(\sum_{i=2}^{12} F_i - \sum_{i=2}^{12} F_i\right)$$

As can be seen, abnormal values of $H_{13}$ or $H_1$, or an abnormal change in the leading indicators’
monthly values, can produce an outlying residual. (An abnormal change in an indicator may cause two
consecutive 12 months rolling sum forecasts to change dramatically, which will very possibly violate
our assumption that $\sum_{i=2}^{12} F_i \approx \sum_{i=2}^{12} F_i$.) Hence one reasonable remedy for
outlying monthly forecasts is to smooth the leading indicators’ monthly values by replacing each of
them with the average of that month and the previous 11 months. This transformation of the leading
indicators reduces the volatility of the monthly forecasts. For an example, please refer to Appendix 3.
We tested this transformation of leading indicators on the industry shipment 12 months out model for
the North America region. As shown by the plot, the transformation makes the forecasts smoother and
closer to the history values. In fact, the ex-post error over the period from 01/2008 to 06/2009 dropped
from 21.59% to only 9.35% after we took the 12 months rolling average of the leading indicators’
monthly values.

This smoothing technique will not always generate more accurate forecast results. However, it will
help make monthly forecasts less volatile when the leading indicators used are volatile in nature.
We applied this technique to the Industry shipment 12 months out forecast models for US only and
North America data.

Model Utility and Residual Analysis


Appendix 4 is a utility comparison table for the causative models. Key metrics include R-square (both
original and adjusted) and cumulative absolute percentage errors (ex-post period) of both the causative
and the “naïve” models. The naïve models simply assume “what happened yesterday will happen again
tomorrow”. Hence for the naïve models we take the current month’s sales/shipments as the forecast for
2 months and 12 months out. As coded in blue in the rightmost column of this table, the ex-post forecast
errors of the naïve models are all higher than those of the causative models, except for the Regional
Market. We think this could be either a coincidence or a sign that the Regional Market is, relatively
speaking, stable enough to repeat its history values over time. Neither explanation invalidates the
effectiveness of our causative models.

Ex-post errors are cumulative absolute percentage errors. To be specific, this heuristic metric is
calculated by dividing the sum of the absolute values of all monthly forecast residuals by the sum of the
monthly history values over a certain period. We prefer this metric because it evaluates our forecast’s
absolute deviation from history rather than from a constant, as R-square does.
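The ex-post error metric and the naïve benchmark described above can be written down compactly. The sketch below is an illustration with hypothetical function names; it returns the error as a fraction (multiply by 100 for a percentage).

```python
def cumulative_abs_pct_error(history, forecast):
    """Sum of absolute monthly residuals divided by the sum of monthly history."""
    residuals = sum(abs(h - f) for h, f in zip(history, forecast))
    return residuals / sum(history)


def naive_error(history, horizon):
    """Ex-post error of the naive model for a given horizon in months.

    The naive forecast for month t is simply the actual value of
    month t - horizon ("what happened yesterday will happen again").
    """
    actual = history[horizon:]
    forecast = history[:-horizon]
    return cumulative_abs_pct_error(actual, forecast)
```

For example, with history values 100, 110, 90, 100 and forecasts 90, 110, 100, 100, the absolute residuals sum to 20 and the history sums to 400, giving an error of 5%. Comparing a causative model's value with `naive_error(history, 2)` or `naive_error(history, 12)` over the same ex-post months reproduces the kind of comparison shown in the utility table.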

The R-square value is another tool for indicating the effectiveness of regression models. The higher the
R-square (including the adjusted one), the larger the share of the total variation in the n observed values
of the dependent variable that is explained by the overall regression model. However, there is no
absolute standard for what is a “good” value. As can be seen from the table (color coded in yellow), the
Goodyear models have relatively high R-square values and the Industry models lower ones, despite the
fact that the Industry models are as accurate as the Goodyear models in terms of the monthly forecasts’
ex-post errors. This brings up two questions. The first is how long we can keep using the causative
models before we have to revise them. (This question is addressed in the last section of this document.)
The second is whether we can build multiple regression models that generate both small monthly
forecast ex-post errors and a high (adjusted) R-square value. (Addressed below.)

MODELING EFFORT 2- MULTIPLE LINEAR REGRESSION MODELS


How to Build Multiple Linear Regression Models in Minitab
Since our Industry models in general have low R-square values, the “industry total market shipment 2
months out forecast model” was picked for this test. What we wish to build is a multiple regression
model which has a high R-square and a low cumulative absolute percentage error for monthly forecasts
during the ex-post period. Minitab’s automatic model selection function was used to perform the test.

Ideally, we would feed as many variables as possible into Minitab and let the computer generate an
optimal solution for us. However, Minitab can only process a limited number of variables using the
“Stepwise” [5] and “Best subsets” [6] selection methods. So some variables need to be screened out of
the candidate pool as follows:

• Select 12 months rolling sum of total industry shipment and 2 months lagged data of the 74
variables.
• Calculate correlation coefficients between rolling sum values and 74 variables and keep variables
which have a correlation coefficient higher than 40% or lower than -40%. This step reduced the
number of potential variables from 74 to 30.
• Use “Stat-Basic Statistics-Correlation” in Minitab to generate a Correlation Matrix, which includes
the P-value for each correlation coefficient between any pair of variables, including the dependent
variable (shipment). (If the correlation coefficient between two variables is higher than 0.9 or
lower than -0.9, then one of them can be considered redundant for the dependent variable in the
model. If more than two variables are highly correlated with one another, first compare their
P-values against the dependent variable and screen out those with higher P-values; if the P-values
are the same, keep those with the higher absolute correlation coefficient with the dependent variable.)
• Use the Correlation Matrix to eliminate redundant variables. This step reduced the number of
variables from 30 to 20.
• Use “Step-wise” and “Best-subsets” methods in Minitab to generate the best multiple linear
regression models.
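The screening steps above can be approximated in code. The sketch below is a simplification and not the Minitab procedure itself: it keeps variables whose absolute correlation with the dependent variable exceeds 0.4, then drops any variable that correlates above 0.9 in absolute value with a stronger variable already kept. It ranks candidates by absolute correlation instead of comparing P-values, and all names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def screen_variables(candidates, target,
                     keep_threshold=0.4, redundancy_threshold=0.9):
    """Return the names of candidate variables that survive both screens.

    candidates : dict mapping variable name to its (already lagged) series
    target     : 12 months rolling sum series, aligned with the candidates
    """
    # Step 1: keep variables sufficiently correlated with the target.
    strength = {name: abs(pearson(series, target))
                for name, series in candidates.items()}
    kept = [name for name in candidates if strength[name] > keep_threshold]

    # Step 2: visit the strongest variables first and skip any variable that
    # is nearly collinear with one that has already been selected.
    kept.sort(key=lambda name: strength[name], reverse=True)
    selected = []
    for name in kept:
        redundant = any(
            abs(pearson(candidates[name], candidates[other])) > redundancy_threshold
            for other in selected)
        if not redundant:
            selected.append(name)
    return selected
```

The surviving variables would then be passed to a stepwise or best-subsets routine. Those routines are not reproduced here because the text relies on Minitab for them.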

The best two models generated by the “Best-subsets” method are one “10 variables linear regression
model” [7] and one “9 variables linear regression model”. [8]

The best model from the “Step-wise” method is a “5 variables linear regression model”. [9]

These models all have high R-squares (around 90%) and low ex-post forecast errors for 12 months
rolling sum forecasts, both better than our original “one variable linear regression models”. However,
their ex-post forecast errors for monthly forecasts are especially high (around 40%, compared with
around 9% for our original model). During the model building period, on the other hand, the multiple
regression models’ cumulative absolute percentage errors of monthly forecasts (around 12%) are not
very far from those of our original models (8%).

Reason for Not Using Multiple Regression Models’ Monthly Forecasts


We found that the reason the multiple regression models did not perform well for monthly forecasts
during the ex-post period is related to the assumption behind the formula we defined to transform 12
months rolling sum forecasts into monthly forecasts. As mentioned before, we assume that the
forecasted values the two 12 months rolling sum forecasts assign to the same 11 months are almost the
same, namely $\sum_{i=2}^{12} F_i \approx \sum_{i=2}^{12} F_i$.

However, this assumption is not always true and is more easily violated by multiple linear regression
models than by single linear regression models. The multiple linear regression models in our case all
have high R-square values, which means that the variation of the dependent variable (shipments/sales
of tires) is explained to a large extent by the variables we included (bearing in mind that,
mathematically speaking, R-square never decreases as more variables are added to a multiple
regression model). The downside is that multiple regression models have more external factors to
control, and the fluctuation of each one can affect our final transformed monthly forecasts.

Look at the data and plots in Appendix 10. The vertical axis of the plots in Appendix 10 is the
cumulative absolute percentage error of the 12 months rolling sum forecast. The blue line represents
the 12 months rolling sum forecast obtained from our original single linear regression model. The red
and green lines are forecasts from two multiple linear regression models selected by the “Best subsets”
method. As can be seen, before period 49, which is 12-2007, the multiple regression models are more
accurate than the single linear regression model in terms of the 12 months rolling sum forecast.

During the ex-post period, from 01-2008 to 06-2009, the forecast accuracy of the multiple regression
models fluctuates more heavily than that of the single linear regression model. That is because there are
more variables in the multiple regression models, and it is more likely that the recession’s impact on
those variables will skew the 12 months rolling sum forecast. More fluctuation between two consecutive
12 months rolling sum forecasts violates the $\sum_{i=2}^{12} F_i \approx \sum_{i=2}^{12} F_i$ assumption
and causes the related monthly forecast to have a high forecast error.

To sum up, because of the technique we used to transform 12 months rolling sum forecasts into
monthly forecasts, and because multiple regression models are more difficult to control and
maintain, we think simple linear regression models are better for our modeling purposes, even
though they have a smaller R-square than multiple regression models.

It is natural to think that if we can use monthly shipment/sales as dependent variables directly to build
multiple regression models then we can have both high forecast accuracy and high R-square. However,
the monthly data is too volatile compared with 12 months rolling sum values, and as tested, we can barely
find well correlated external indicators for monthly shipment/sales data.

MODELING EFFORT 3- TIME SERIES MODELS


Besides causative models, we also tested time series models for tire sales/shipment data. The
forecasting method we used is exponential smoothing, which weights the observed time series values
unequally: more recent observations are weighted more heavily than more remote observations.
This modeling method studies the level, trend (optional) and seasonality (optional) of the time series
history and projects them into the future to make forecasts. As mentioned earlier, Goodyear’s history
data dates back to 2003 and the Industry’s back to 1996. Unlike causative models, for time series studies
the more data we have, the easier it is to capture trend and seasonality, if there are any. As shown by
the plots in Appendix 11, Goodyear’s history is too short to show obvious trend and seasonality, while
the Industry’s history is long enough to be a good candidate for the Multiplicative Holt-Winters’
method [12]. Strictly speaking, the “seasonality” seen in the industry data is business cyclicality over
the long term, because the 12 months rolling sum values have no seasonality in themselves. But this
cyclical pattern can be modeled as a sort of seasonality.
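To make the weighting idea concrete, here is a minimal sketch of the simplest member of this family, single exponential smoothing (the “level only” model). It is an illustration, not Minitab’s implementation: the level is initialised with the first observation, which is an assumption made here, and Minitab’s own initialisation and optimisation of the smoothing constant may differ.

```python
def single_exponential_smoothing(series, alpha):
    """Return the smoothed level after the last observation.

    Each update gives weight alpha to the newest observation and
    (1 - alpha) to the previous level, so older observations receive
    geometrically decreasing weights.
    """
    level = series[0]  # assumed initialisation: first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def level_only_forecast(series, alpha, periods):
    """A level only model forecasts the same value for every future period."""
    return [single_exponential_smoothing(series, alpha)] * periods
```

Because a level only model repeats one value for every future period, a level only forecast of the 12 months rolling sum implies no change between consecutive rolling sums.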

Due to data availability, we built time series models for industry shipments only, using monthly data
from 01/1996 to 12/2007, and tested each model over the period from 01/2008 to 06/2009. The monthly
history data is very volatile even though it shows trend and seasonality over its history. To make sure
we built the best time series models we could, we tested four models using both the monthly shipment
data and the 12 months rolling sum shipment data for each market segment. The four models are:
Level only; Level + Trend; Level + Trend + Increasing Seasonality (Multiplicative Holt-Winters’
method); Level + Trend + Constant Seasonality (Additive Holt-Winters’ method [13]). Hence in total,
for each market segment, 8 time series models were tested using Minitab. As expected, the model that
generates the smallest monthly forecast ex-post error for all the market segments except Mixed Service
is the Multiplicative Holt-Winters’ method on 12 months rolling sum history data. The fact that Mixed
Service is an exception did not surprise us because of its relatively complex business structure. Mixed
Service’s history plot does not show typical, easily recognizable trend and seasonality patterns either.
The best time series model for this market segment is “level only” using 12 months rolling sum history.
This model generates a monthly forecast ex-post error of 22.24%, which is higher than those of all the
Multiplicative Holt-Winters’ models for the other market segments. The “level only” model means that
if we wish to forecast multiple periods into the future after 06/2009, we get the same 12 months rolling
sum forecast for every future period. In that case, according to the transformation formula introduced
previously, $f_{13} = \left(\sum_{i=2}^{13} F_i - \sum_{i=1}^{12} F_i\right) + H_1$, all future monthly
forecasts will be the same as the monthly history values of one year earlier. This is a kind of naïve
model too.

All time series models were executed in Minitab. After loading the history shipment data into Minitab,
simply go to “Stat-Time series” and choose “Single Exp Smoothing”, “Double Exp Smoothing” or
“Winters’ Method” for the “level only”, “level + trend” and “level + trend + seasonality” models,
respectively. For “level only” and “level + trend” models, Minitab can generate optimal models by
automatically searching for the smoothing constants for the level and trend components that minimize
the “Sum of Square Errors”. For the Holt-Winters’ method, we have to define all three smoothing
constants for level, trend and seasonality manually, although the default value of 0.2 for all three
smoothing constants works well for our project most of the time.

All the Industry shipment modeling results are stored in an Excel file named “Industry Time Series
modeling”. For each market segment, there are three tabs in this Excel file. Take the Urban market as
an example. In the “Urban forecast results” tab, the monthly forecasts, the monthly forecasts’
cumulative absolute percentage errors and the “history vs. forecast” plots for each of the eight time
series models are listed for comparison. In the “Rolling to monthly transform- Urb” tab, 12 months
rolling sum forecasts generated by Minitab can be copied to the column named “Urban 12 months
rolling sum forecast” to generate the monthly forecast in the rightmost column. The transformation
formula defined previously is already embedded in this calculation. In the last tab, called “Error
Calculator- Urban”, monthly forecasts, either transformed or generated directly by Minitab, can be
copied to the related column to get the forecast error statistics color coded in blue.

DATA RANGE AND SOURCE


As the old saying goes, “garbage in, garbage out”. To avoid this outcome in our project, we have to
carefully maintain and process the raw data. All Goodyear billed sales data is available from EDW.
Goodyear’s market group names are slightly different from the RMA names for the four market
segments; for a detailed transformation table, please refer to Appendix 14. All Goodyear shipment data
has to be manually processed in order to apply RMA’s market classification. This job was previously done
by Greg Tomsho using the Materials number × Vehicle Application code table generated by Steven D.
Miller. For a copy of this table, please see the file named “pbu03_all”. All Industry shipment data by
market segment is available from RMA. Contact Krista Liem for the latest industry data.

All the key leading indicators used for our modeling purposes are summarized in the table named “Key
Leading Indicators” [15]. In general, our leading indicators come from three sources: the Federal Reserve
Bank of St. Louis; the Energy Information Administration, US Dept. of Energy; and Freight Transportation
Research (FTR) Associates. The FTR database is updated monthly and can be accessed by Krista Liem.

Another thing worth noting is the data range. For Goodyear’s and Industry’s causative models, we
used sales/shipment history data from 2003 to 2009. For Industry’s time series models, we used data from
1996 to 2009. It makes sense to use more history data when studying purely the trend and seasonality
patterns of a time series. However, since so many macro-economic factors can affect tire sales/shipment
dramatically over a long period of time, it would be risky to use, say, 12 years of tire sales/shipment
history data to build simple linear regression models. As a matter of fact, at the initial stage of our
project, we built causative models for Industry shipment using history data for the past 12.5 years, and
then reduced the data range to the past 6.5 years and re-ran the models. It turned out that with less
shipment history data we got lower monthly forecast ex-post errors, and we had to change some of the
leading indicators selected previously.

Hence, to build effective causative models, we may have to consider dropping some of the oldest data in
the modeling period when new data becomes available, whereas for time series models it is OK to include
new data points while keeping the old data. Also, most of the organizations that supply our economic
indicators revise their published data periodically. As new monthly data becomes available, the data
for past periods may also have changed. If that is the case, all revised data within our modeling data
range should be used to re-run the model and get a new sales/shipment forecast.

EXPLANATION OF THE STANDARD LINEAR REGRESSION SPREADSHEET

All the causative models developed so far have the same standardized Excel spreadsheet structure.

There are six files in total for each category of modeling, named as follows:

• Goodyear Billed Sales Causative Models 2 months out
• Goodyear Billed Sales Causative Models 12 months out
• Goodyear Shipment Causative Models 2 months out
• Goodyear Shipment Causative Models 12 months out
• Industry Shipment Causative Models 2 months out
• Industry Shipment Causative Models 12 months out

There are two sets of models: one for US-only data and another for North America data. Hence there are
12 files in total. Every file contains the following 11 tabs. Take “Goodyear Billed Sales Causative
Models 2 months out (US only)” for example.

Common Tabs
1. ReadMe:
It contains description of the models within the Excel file and description of each indexed tab and
how to use them.
2. Scatter Plots:
For each market segment and each of the 74 economic variables, there is a matching scatter plot
in this tab. All the data used come from the tab “x-months Lagging Data Set”. As long as the
structure of the data in that tab does not change, the scatter plots update automatically when the
data changes. However, if new data is added, the plots have to be changed manually to reflect the
new data points. To do that, right click on a plot and choose “Select data”; you will be directed
to the tab “x-months Lagging Data Set”, where you can re-select the raw data.
3. x-months Lagging Data Set:
“x” can be either 2 or 12, depending on the purpose of the model. Our causative models can
forecast the dependent variables’ future values because we lagged the independent variables
while constructing the linear regression relationships. To forecast 2 months out, we lag the
leading indicators by 2 months; to forecast 12 months out, we lag them by 12 months. Hence in
this tab, billed sales’ 12 months rolling sum data and the 74 economic variables lagged by 2
months are listed from 12-2003 to 06-2009, which covers both the model building period and the
validation (ex-post) period.
4. All Data:
This tab lists the monthly history data of all 74 variables from 01-1996 to 06-2009. Some variables
may have missing data points for the most recent history. This tab was set up to store any history
data used for the project.
5. Correlation Coefficients:
This tab contains the monthly history of tire sales and the automatically calculated 12 months
rolling sum values. The monthly history data of all 74 variables are also listed here. The table
outlined with a red dotted line at the bottom of this tab lists the correlation coefficients
(calculated using the “=Correl()” function in Excel) between each market segment’s 12 months
rolling sum over the period from 12-2003 to 06-2009 (the same period as used in the scatter plots)
and the 74 variables lagged by 2 months. All correlation coefficients whose absolute values are
above 80% are highlighted in color using “Conditional Formatting” in Excel. To obtain updated
correlation coefficients as new data comes in, you may have to add the new monthly sales, drag
down the Excel cells to get the 12 months rolling sums, add the new monthly data for the 74
variables, and reset the embedded formula to include the new 12 months rolling sums and leading
indicators.
This tool is used together with the scatter plots to detect potential linear relationships between
leading indicators and tire sales data.
6. Forecast errors:
This tab lists all the selected variables (chosen using the scatter plots and correlation
coefficients in the previous tabs) and their cumulative absolute percentage errors during the
period 01/2008 to 06/2009 (ex-post errors) for both the 12 months rolling sum forecasts and the
transformed monthly forecasts. The 12 months rolling sum forecasts’ ex-post errors are used to
monitor how effectively our simple linear regression models capture the potential linear
relationship between 12 months rolling sum sales and leading indicators. If the relationship is
close to linear, this ex-post error should be small. The monthly forecast ex-post error is used to
check whether our model can generate a decent monthly forecast for the near future. Normally, the
ex-post error for the 12 months rolling sum forecast should be smaller than that of the monthly
forecast.
7. Urban-x:
Tabs 7 to 11 contain the models we used to generate monthly forecasts. All of them have the same
structure and are largely self-explanatory. For illustration, a detailed explanation is provided
here only for the Urban-2 tab of the Goodyear Billed Sales 2 months out model.
The only two columns that need to be updated from external data sources are the “monthly history”
of tire sales and the column named after the selected leading indicator. You can drag down the
column named “12 months rolling sum history” to get the 12 months rolling sums needed for
modeling.
Then use the “Regression” function in the Excel add-in called “Data Analysis” [16] to select the
dependent variable, which is the 12 months rolling sum history of tire sales, and the independent
variable, which is the monthly history of the leading indicator lagged by 2 months, over the
modeling period. The “Regression” function will generate a detailed ANOVA analysis as shown in
Appendix 17. The two numbers color coded in orange are the coefficients for the constant and the
leading indicator in the simple linear regression model. Copy those two numbers to the
corresponding locations at the top of the table; the monthly forecasts (at the far right of the
table) and the forecast errors (at the top right of the table) are then generated automatically.
As long as the data currently selected for model building is used for forecasting future monthly
sales, you do not have to change the coefficients previously entered at the top of the table. When
new monthly sales data and leading indicator data become available, you can add them and change
the formula for the new ex-post error calculation. If, after a certain period of time, new data
needs to be added to the modeling period, you have to rerun the “Data Analysis” add-in and
reselect the corresponding 12 months rolling sum tire sales and the leading indicator’s monthly
data.
Most of the data for new cells can be obtained by dragging down the cells in Excel.
8. Regional-x:
See tab 7 (Urban-x) for instructions.
9. Long haul-x:
See tab 7 (Urban-x) for instructions.
10. Mixed service-x:
See tab 7 (Urban-x) for instructions.
11. Total Market-x:
See tab 7 (Urban-x) for instructions.
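
The lagging and correlation screening described for the “x-months Lagging Data Set” and “Correlation
Coefficients” tabs above can be sketched in Python. The function names and the sample numbers are
illustrative assumptions; in the spreadsheets the same work is done with cell formulas and Excel’s
“=Correl()” function.

```python
from math import sqrt


def rolling_sum_12(monthly):
    """12 months rolling sums; entry k covers months k..k+11 of the input."""
    return [sum(monthly[k:k + 12]) for k in range(len(monthly) - 11)]


def lag_pairs(sales_rolling, indicator, lag):
    """Pair each rolling sum with the indicator value `lag` months earlier.

    Both lists are assumed to be aligned to the same calendar months.
    """
    return [(indicator[t - lag], sales_rolling[t])
            for t in range(lag, len(sales_rolling))]


def pearson(pairs):
    """Pearson correlation coefficient of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Made-up data: the rolling sum follows the indicator with a 2 months delay.
indicator = [1, 2, 3, 4, 5, 6, 7, 8]
sales_rolling = [0, 0, 10, 20, 30, 40, 50, 60]

pairs = lag_pairs(sales_rolling, indicator, lag=2)
r = pearson(pairs)
is_candidate = abs(r) > 0.80             # same 80% threshold as the color coding
```

Lagging the indicator before computing the correlation is what lets a fitted model forecast: the
indicator value known today is paired with the rolling sum 2 (or 12) months later.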

Unique Tabs
• Goodyear Billed Sales Causative Models 2 months out
○ Industry Outlier Fix: As mentioned in the section “Simple linear regression models:
Outliers”, this tab tests a fix for the outlying monthly forecasts caused by the Goodyear
strike in 2006.
• Goodyear Billed Sales Causative Models 12 months out
○ None
• Goodyear Shipment Causative Models 2 months out
○ None
• Goodyear Shipment Causative Models 12 months out
○ Actual v.s. Forecast: This tab lists the monthly forecasts from 01-2008 to 06-2010,
using “Business Inventory” as the leading indicator for each market segment. Monthly
history values from 01-2008 to 07-2009 for each market segment are also shown here for
comparison. The plots at the bottom of this tab give a clearer picture of how well the
forecasts match the history.
○ Market share calculation: In this tab we used the actual and forecasted Goodyear and
Industry shipments to calculate the “actual and forecasted Goodyear market share” for
each market segment. Industry forecasts come from the file named “Industry Shipment
Causative Models 12 months out”. At the bottom of the tab, we also made “actual v.s.
forecast” plots for each market segment. One reason why the forecasted market share is
not very close to past history lies in its formula (Goodyear shipment forecast divided by
Industry shipment forecast). If the Goodyear and Industry shipment forecasts are both too
high or both too low, their errors partially cancel out in the market share calculation.
However, if the Goodyear and Industry shipment forecast errors go in opposite directions,
the error in the market share calculation is amplified.
• Industry Shipment Causative Models 2 months out
○ None
• Industry Shipment Causative Models 12 months out
○ Cross-verify: In this tab the summed industry forecast (the sum of the forecasts of the
four market segments that form the total market) is compared with the industry total
history, and error statistics are calculated. The summed industry forecast generates a
monthly forecast ex-post error of 10.07%, which is comparable with the 13.31% generated by
using the “Conference Board Index of Consumer Confidence” as the industry total’s leading
indicator. This test is another way of checking the accuracy of the four market segments’
causative forecasting models.
○ Actual v.s. Forecast: This tab lists the monthly forecasts from 01-2008 to 06-2010,
using “Business Inventory” as the leading indicator for each market segment. Monthly
history values from 01-2008 to 07-2009 for each market segment are also shown here for
comparison. The plots at the bottom of this tab give a clearer picture of how well the
forecasts match the history.
○ Long haul backup: This tab shows the second best leading indicator for the Long haul
market: Housing starts.
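
The cancellation and amplification effect described under “Market share calculation” above can be checked
with a small arithmetic sketch. The shipment numbers are made up for illustration and the function name
`share_error` is hypothetical.

```python
def share_error(gy_actual, gy_forecast, ind_actual, ind_forecast):
    """Relative error of the forecasted market share versus the actual share."""
    actual_share = gy_actual / ind_actual
    forecast_share = gy_forecast / ind_forecast
    return (forecast_share - actual_share) / actual_share


# Both forecasts are 10% too high: the errors cancel in the ratio.
same_direction = share_error(100, 110, 1000, 1100)

# Goodyear is 10% too high and Industry 10% too low: the share error
# is larger than either individual forecast error.
opposite_direction = share_error(100, 110, 1000, 900)

print(f"same direction:     {same_direction:.1%}")
print(f"opposite direction: {opposite_direction:.1%}")
```

In this made-up case two 10% errors in opposite directions produce a market share error of roughly 22%.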

Steps of Searching for New Leading Indicators


The logical steps for using the tabs in each model/Excel file to search for the best leading indicator
for each market segment are described below. Take “Goodyear Billed Sales Causative Models 2
months out (US only)” for example.

1. Update the monthly data for tire sales and leading indicators in the tab “Correlation Coefficients”.
2. Adjust the formulas to include the new data when refreshing the correlation coefficient table
in this tab.
3. Copy and paste the new 12 months rolling sum tire sales data (including the ex-post period) and the
leading indicators’ monthly data to the tab “2-months Lagging Data Set”.
4. Go to the tab “Scatter Plots” and, if necessary, update the scatter plots one by one to include the
new data added in the tab “2-months Lagging Data Set”.
5. Observe the scatter plots. If a linear relationship is found, consider that variable a candidate for
testing.
6. If a linear relationship is not obvious, use the tab “Correlation Coefficients” to search for
variables with high correlation coefficients with the 12 months rolling sum tire sales data.
7. To test all the candidate variables for a specific market segment, copy their data to the
corresponding market segment tab one by one, then perform the following test starting from step 8.
8. Update the monthly data for both tire sales and the selected leading indicator in the specific
market segment tab.
9. Click Excel “Data-Data Analysis-Regression” to select the matching 12 months rolling sum
sales and the lagged monthly data for the leading indicator (lagged by 2 months in this case) and
perform the ANOVA analysis.
10. Copy the coefficients for the constant and the variable in the linear regression model from the
ANOVA analysis generated by Excel to the corresponding positions at the top of the market segment tab.
11. Drag down the columns called “12 months rolling sum forecast” and “monthly forecast” if
necessary. All formulas are already embedded.
12. Copy and paste the ex-post forecast errors for both the 12 months rolling sum and monthly forecasts,
automatically generated at the top right of the table in the market segment tab, to the corresponding
positions in the tab named “Forecast errors”.
13. Repeat steps 7 to 12 until the ex-post errors generated by every potential leading indicator are
recorded in the “Forecast errors” tab.
14. Select the variable that does not generate negative monthly forecasts and gives a low monthly
forecast ex-post error.
15. If outlying monthly forecasts are generated by a chosen leading indicator, either manual
adjustment of the forecast is required or a backup leading indicator can be selected from the tab
“Forecast errors”.
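
Steps 7 to 14 above amount to fitting a simple linear regression for each candidate indicator and ranking
the candidates by ex-post error. The sketch below shows that loop in Python under several assumptions: the
names are hypothetical, the data is made up, the error measure is a mean absolute percentage error standing
in for the spreadsheet’s cumulative absolute percentage error, and the rolling-to-monthly transformation
and the negative-forecast check of step 14 are omitted for brevity.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_abs_pct_error(actual, forecast):
    """Mean absolute percentage error over the ex-post period."""
    return sum(abs(f - a) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def screen_indicators(candidates, y_fit, y_expost):
    """Fit each candidate on the modeling period and score it ex-post.

    candidates maps a name to (x_fit, x_expost): the lagged indicator values
    for the modeling period and for the ex-post period, already aligned with
    y_fit and y_expost. Returns (name, error) pairs, lowest error first.
    """
    results = []
    for name, (x_fit, x_expost) in candidates.items():
        intercept, slope = fit_line(x_fit, y_fit)
        forecast = [intercept + slope * xi for xi in x_expost]
        results.append((name, mean_abs_pct_error(y_expost, forecast)))
    return sorted(results, key=lambda item: item[1])


# Made-up example with two hypothetical candidate indicators.
candidates = {
    "indicator_a": ([1, 2, 3], [4]),
    "indicator_b": ([1, 2, 3], [5]),
}
ranking = screen_indicators(candidates, y_fit=[3, 5, 7], y_expost=[9])
```

The first entry of `ranking` corresponds to the preferred leading indicator and the following entries to
possible backups, mirroring the role of the “Forecast errors” tab.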

MODEL REFRESH AND UPDATE ISSUES


To use linear regression models to forecast, one important underlying assumption is that the linear
relationship between the independent variable (the leading indicator in our models) and the dependent
variable (tire sales/shipment) will last into the future. The similar underlying assumption for
exponential smoothing models is that the trend and seasonality will last into the future. In practice,
however, these assumptions will not hold forever, which raises the question of when to revisit the
models. The suggested re-modeling cycle for our project is 6 months. Every six months, when we have 6
more months of new tire sales/shipment data, we can evaluate the effectiveness of each model. If the
leading indicator still works well, the only thing to do might be to add the new data to the modeling
period and, if necessary, drop an equal amount of old data. If the chosen external economic variable
loses its power to lead tire sales/shipment, a backup leading indicator may be found in the “Forecast
errors” tab of each model/Excel file, or a completely new leading indicator should be brought in using
the 15-step approach described above.

All updated information about the leading indicators chosen for this project is stored in the file named
“Key Leading Indicators”. Some of the economic variables for our 2 months out models have a delivery lag
of around 45 to 60 days. That means that, to use some of our causative models effectively, we first need
to obtain forecast values for the leading indicators. Sometimes these forecast values are provided by the
data source organizations; sometimes we need to produce the forecasts ourselves using time series
modeling techniques.

FILES LOCATION AND NAME


All the files related to this project are stored at the following location:

T:\NAT\703 Commercial Demand Planning\Commercial Modeling

For details about all the folders and their contents please see Appendix 18.

FUTURE LOOK
Depending on the effectiveness of the causative models developed for this project as new data becomes
available, we can

• Revise and maintain our current models
• Transfer the modeling technique to Goodyear’s other business segments
• Automate the modeling procedures in Excel using an advanced programming language

APPENDIX
[1] RMA Commercial Truck Tire Classification

Market segment | Vehicle Application Code | Description
Urban | 220 | Light, Medium, and Wide-Base Truck Tires marketed to operate specifically in pickup and delivery service in a local area (e.g. retail and wholesale pick-up and delivery, emergency vehicles, and intracity bus fleets).
Regional | 230 | Medium, Wide Base and Heavy Truck Tires marketed to operate in a limited (150 mile radius) delivery or service related vocation (e.g. State & local government, emergency vehicles, public utility, school bus, food, petroleum and manufacturing goods distribution, and inter-modal “piggy-back” trailers).
Long haul | 240 | Medium, Wide Base and Heavy Truck Tires marketed to operate in long distance, high annual mileage operations (e.g. Less-Than-Trailer-Load, Trailer-Load, and Lease/Rental Fleets, Common Contract Carriers, and Inter-City Bus Fleets).
On-Off/Off Highway (Mixed service) | 250 | All Light, Medium, Wide Base, Heavy and Large-off-the-Road Truck Tires marketed to operate in off and on-off highway applications (e.g. construction, mining, sanitation, and logging)

[2] Comparison of monthly data with 12 months rolling sum data

[3] Using 12 months rolling average to smooth leading indicator will sometimes improve forecast results
[Figure: Industry Regional Market Segment Shipment Forecast. x-axis: months from Jan-08 to Jun-10; y-axis: 0 to 700,000. Series: Actual History; Forecast using leading indicator's monthly data; Forecast using leading indicator's 12 months moving average.]

[4] Causative Models Utility Comparison Table

[5] Stepwise regression removes and adds variables to the regression model for the purpose of identifying a useful
subset of the predictors. Minitab provides three commonly used procedures: standard stepwise regression (adds and
removes variables), forward selection (adds variables), and backward elimination (removes variables).
• When you choose the stepwise method, you can enter a starting set of predictor variables in
Predictors in initial model. These variables are removed if their p-values are greater than the
Alpha to enter value. If you want to keep variables in the model regardless of their p-values,
enter them in Predictors to include in every model in the main dialog box.
• When you choose the stepwise or forward selection method, you can set the value of Alpha
for entering a new variable in the model in Alpha to enter.
• When you choose the stepwise or backward elimination method, you can set the value of
Alpha for removing a variable from the model in Alpha to remove.
[6] Best subsets regression identifies the best-fitting regression models that can be constructed with the predictor
variables you specify. Best subsets regression is an efficient way to identify models that achieve your goals with as
few predictors as possible. Subset models may actually estimate the regression coefficients and predict future
responses with smaller variance than the full model using all predictors.

Minitab examines all possible subsets of the predictors, beginning with all models containing one predictor, and then
all models containing two predictors, and so on. By default, Minitab displays the two best models for each number
of predictors.
For example, suppose you conduct a best subsets regression with three predictors. Minitab will report the best and
second best one-predictor models, followed by the best and second best two-predictor models, followed by the full
model containing all three predictors.
[7] Best multiple regression models by Minitab “Best-subsets” method-10 variables

[Minitab regression equation for TOTAL; coefficients truncated in the source]

[8] Best multiple regression models by Minitab “Best-subsets” method-9 variables

[Minitab regression equation for TOTAL; coefficients truncated in the source]

[9] The best multiple regression model by Minitab “Step-wise” method-5 variables

[Minitab regression equation for TOTAL; coefficients truncated in the source]

[10] 12 months rolling sum forecast’s absolute percentage errors comparison table and plot for single and
multiple linear regression models

[Table indexed by Time (1, 2, 3, ...) and Date (monthly, starting Dec-03); values truncated in the source]

[11] Goodyear’s 12 months rolling sum history plot for Total Market billed sales

[Figure: Time Series Plot of Gyt 12-month rolling sales. x-axis: Month/Year, Dec 2003 to Dec 2007; y-axis: Gyt 12-month rolling sales, 3,000,000 to 4,000,000.]

Industry’s 12 months rolling sum history plot for Total Market shipment

[Figure: Time Series Plot of Industry 12-month rolling Ship. x-axis: Month/Year, Dec 1996 to Dec 2007; y-axis: Industry 12-month rolling Ship, 12,000,000 to 19,000,000.]

[12] Multiplicative Holt Winter’s method

A time series modeling technique that is able to capture increasing seasonal variation.

[13] Additive Holt Winter’s method

A time series modeling technique that is able to capture constant seasonal variation.

[14] Goodyear’s Market Group and RMA name transformation

[15] Key Leading Indicators and their sources


External Econom… (table truncated in the source)

No. | Variable
Var 1 | Industrial Prod…
Var 2 | Civ U-R…
Var 3 | Real Retail Sales-…
Var 4 | Housing…
Var 5 | 2-4 Unit Hou…
Var 6 | Conference Board Index o…
Var 7 | UM Index of Cons…
Var 8 | Diesel P…
Var 9 | WTI Crude…
Var 10 | Diesel Supp…
Var 11 | M1 Money…
Var 12 | ISM P…

[16] Add-in “Data Analysis” in Excel 2007 can be activated as follows:

• Click the Microsoft Office Button, and then click Excel Options.
• Click Add-Ins, and then in the Manage box, select Excel Add-ins.
• Click Go.
• In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
• Tip: If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it.
• If you get prompted that the Analysis ToolPak is not currently installed on your computer, click “Yes” to
install it.
• After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the
Data tab.
[17] ANOVA analysis generated by “Regression” function of “Data analysis” add-in in Excel 2007

SUMMARY OUTPUT (values truncated in the source)

Regression Statistics
Multiple R | 0.8…
R Square | 0.76…
Adjusted R Square | 0.76…
Standard Error | 371…
Observations | …

[18] Project Folders and their contents

[Table truncated in the source. Recoverable entries: folder name “2 months and 12 months out Caus…”; “Commercial Replacement Industry and Goodyear…”; files “Goodyear Billed Sales Causative Mode…” and “Goodyear Shipment Causative Models…”; names Greg Tomsho, James Krein and Mike Ryan.]
