


Automatic Forecasting SnapStat

The Automatic Forecasting SnapStat creates a one-page summary of forecasts generated for a time series. Like the Automatic Forecasting procedure, this SnapStat tries a collection of forecasting models and selects the one that gives the best fit according to a specified criterion. Unlike that procedure, however, the SnapStat output is preformatted to fit on a single page.

Sample StatFolio: autocastsnapstat.sgp

Sample Data:

The file baseball.sf6 contains the leading batting average in U.S. Major League Baseball for each year between 1901 and 2004. Batting averages represent the proportion of times that a player gets a hit out of all at-bats that result in either a hit or an out. The table below shows a partial list of the data from that file. The batting averages are expressed as the number of points out of 1000, such that a player batting 333 would have gotten a hit one-third of the time.

Year    Leading average
1901    422
1902    376
1903    355
1904    381
1905    377
1906    358
1907    350
1908    354
1909    377
1910    385
2004    372

Forecasts are desired for the next several years.

© 2005 by StatPoint, Inc.


STATGRAPHICS Rev. 1/10/2005

Data Input
The data input dialog box requests the name of the column containing the time series data and information about how it was sampled:

Data: numeric column containing n equally spaced numeric observations.

Sampling Interval: defines the interval between successive observations. For example, the baseball data were collected once every year, beginning in 1901.

Seasonality: the length of seasonality s, if any. The data is seasonal if there is a pattern that repeats at a fixed period. For example, monthly data typically have a seasonality of s = 12. Hourly data that repeat every day have a seasonality of s = 24. If no entry is made, the data is assumed to be nonseasonal (s = 1).

Trading Days Adjustment: a numeric variable with n observations used to normalize the original observations, such as the number of working days in a month. The observations in the Data column will be divided by these values before being plotted or analyzed. There must be enough entries in this column to cover both the observed data and the number of periods for which forecasts are requested.

Select: subset selection.

Number of Forecasts: number of periods following the end of the data for which forecasts are desired.
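The Trading Days Adjustment described above can be illustrated with a minimal Python sketch. It is not the program's own code: the function name adjust_for_trading_days and the sample numbers are made up for illustration. It divides each observation by its adjustment value and checks that the adjustment column covers the observed periods plus the forecast periods.

```python
def adjust_for_trading_days(data, trading_days, n_forecasts):
    """Divide each observation by its trading-day value.

    trading_days must cover the n observed periods plus the
    n_forecasts future periods, as the dialog box requires.
    """
    n = len(data)
    if len(trading_days) < n + n_forecasts:
        raise ValueError("trading_days needs at least n + n_forecasts entries")
    adjusted = [y / d for y, d in zip(data, trading_days)]
    future_days = trading_days[n:n + n_forecasts]
    return adjusted, future_days


# Hypothetical monthly totals and working-day counts (made-up numbers).
sales = [2100.0, 1900.0, 2200.0]
days = [21, 19, 22, 20, 21]  # 3 observed months + 2 forecast months
adjusted, future_days = adjust_for_trading_days(sales, days, n_forecasts=2)
```

In this sketch the returned future_days are the values that would be needed to put forecasts of the adjusted series back on the original scale; whether and how the program does this is not stated here.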


The output from the SnapStat consists of a single page of graphs and numerical statistics.
SnapStat: Automatic Forecasting
Data variable: Leading average

RMSE=17.7  MAE=13.96  MAPE=3.81%  ME=-1.077  MPE=-0.48%

Period   Forecast   Lower 95% Limit   Upper 95% Limit
2005     366.743    330.772           402.715
2006     365.715    327.642           403.788
2007     365.591    326.667           404.515
2008     365.58     325.925           405.235
2009     365.579    325.216           405.943
2010     365.579    324.519           406.64

[Figure: SnapStat output page. Panels: Time Series Plot (Leading average; actual, forecast, and 95.0% limits), Residual Autocorrelations (by lag), Residual Plot, Residual Periodogram, Normal Probability Plot.]


Model Statistics and Forecasts (top left)

The top left section of the output summarizes the selected forecasting model, which in this case is an ARIMA(0,1,1) model. Included are:

Summary Statistics: table of summary statistics calculated from the one-period ahead forecast errors (error made in forecasting the value at time t given all data through time t-1). The statistics include the root mean squared error (RMSE), the mean absolute percentage error (MAPE), and the mean absolute error (MAE), all of which measure the variability of the one-period ahead forecast errors. Small values are preferred. The mean error (ME) and mean percentage error (MPE) measure bias and should be close to zero.

Forecasts: table of forecasted values and probability limits. The forecasts are made given all available data. The probability limits are calculated at the level specified on the Forecasting tab of the Preferences dialog box, accessible via the Edit menu.
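The five summary statistics can be sketched in a few lines of Python. This is an illustration using the usual textbook definitions, not the program's own code. The function name and the sample numbers are hypothetical, and the plain divisor n in each average is an assumption (the program may use a different divisor for the RMSE).

```python
import math


def forecast_error_summary(actual, one_step_forecasts):
    """Summarize one-period ahead forecast errors e_t = Y_t - F_{t-1}(1)."""
    errors = [y - f for y, f in zip(actual, one_step_forecasts)]
    n = len(errors)
    # Percentage errors are taken relative to the observed value.
    pct_errors = [100.0 * e / y for e, y in zip(errors, actual)]
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": sum(abs(p) for p in pct_errors) / n,
        "ME": sum(errors) / n,
        "MPE": sum(pct_errors) / n,
    }


# Hypothetical observations and one-step forecasts (made-up numbers).
actual = [100.0, 200.0, 400.0]
forecasts = [110.0, 190.0, 400.0]
summary = forecast_error_summary(actual, forecasts)
```

RMSE, MAE, and MAPE ignore the sign of the errors and so measure their size, while ME and MPE keep the sign and so reveal systematic over- or under-forecasting.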

Time Sequence Plot (top right)

The plot shows:

1. The observed data Y_t, shown as point symbols, including any replacements for missing values.
2. The one-step ahead forecasts F_t(1), displayed as a solid line through the points. These are created using the fitted model, forecasting each time period t+1 using only the information available at time t. The one-step ahead forecast errors e_t are observable as the vertical distance between the observations and the solid line.
3. Forecasts for future values F_n(k) made at time t = n, the last time at which observed data is available. These are shown by the extension of the solid forecast line beyond the last observation.
4. Probability limits for the forecasts at the 100(1-α)% confidence level, calculated assuming that the noise in the system follows a normal distribution.

For mathematical details regarding the calculations, see the documentation for the Forecasting procedure.
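As a rough illustration of normal-theory probability limits, the Python sketch below computes forecast ± z × (standard error), where z is the standard normal quantile for the chosen level. The function name and the standard error of 18.0 are hypothetical. The standard error of a k-step forecast depends on the fitted model and is simply taken as an input here, so this is a sketch of the idea and not necessarily the program's exact calculation.

```python
from statistics import NormalDist


def probability_limits(forecast, std_error, alpha=0.05):
    """Return (lower, upper) 100(1 - alpha)% limits under normal noise."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return forecast - z * std_error, forecast + z * std_error


# Forecast value taken from the sample output; the standard error is made up.
lower, upper = probability_limits(366.743, std_error=18.0, alpha=0.05)
```

Because the forecast standard error generally grows with the lead time k, limits computed this way widen as the forecasts extend further beyond the last observation.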

Residual Autocorrelations (center left)

The residual autocorrelations measure the correlations amongst the residuals from the fitted forecasting model. If the model has captured all of the dynamic structure in the data, then the residuals should be random (white noise). In such a case, all of the estimates should be within the probability limits, as in the above plot.
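The check described above can be sketched in Python. The sample autocorrelation formula is the standard one. The limit ±1.96/√n is a common large-sample approximation for white noise and is an assumption here; the program may compute its probability limits differently. The function names are hypothetical.

```python
import math


def autocorrelations(residuals, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def white_noise_limit(n, z=1.96):
    """Approximate limit for autocorrelations of n white-noise values."""
    return z / math.sqrt(n)


# A strongly alternating series is clearly not white noise.
example = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
r = autocorrelations(example, max_lag=2)
limit = white_noise_limit(len(example))
outside = [k + 1 for k, value in enumerate(r) if abs(value) > limit]
```

Lags listed in outside are those whose estimates exceed the approximate limits, suggesting structure that the model has not captured.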


Residual Plot (center right)

This plot shows the residuals in sequential order. It can be helpful in finding outliers or identifying trends that the forecasting model has missed. Ideally, the residuals should behave like a random set of observations from a normal distribution.

Residual Periodogram (bottom left)

The residual periodogram can be used to identify cyclical components that have not been captured by the forecasting model. The periodogram plots the power remaining at each of the Fourier frequencies. If the residuals are random, there should be approximately equal power at all frequencies, which is why a random time series is often called white noise. Any large spikes could indicate a cycle at a fixed frequency that, if modeled, might improve the forecasts.
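A minimal Python sketch of a periodogram follows. It evaluates the cosine and sine coefficients at the Fourier frequencies f_i = i/n, for i = 1 up to n/2, and reports (n/2)(a² + b²) as the power at each frequency. This scaling is one common convention and is an assumption; the program's ordinates may be scaled differently. The function name is hypothetical.

```python
import math


def periodogram(residuals):
    """Return (frequency, power) pairs at the Fourier frequencies i/n."""
    n = len(residuals)
    mean = sum(residuals) / n
    x = [r - mean for r in residuals]
    result = []
    for i in range(1, n // 2 + 1):
        f = i / n
        a = (2.0 / n) * sum(x[t] * math.cos(2 * math.pi * f * t) for t in range(n))
        b = (2.0 / n) * sum(x[t] * math.sin(2 * math.pi * f * t) for t in range(n))
        result.append((f, (n / 2.0) * (a * a + b * b)))
    return result


# A pure cycle repeating every 8 periods (frequency 0.125) over 16 periods.
cycle = [math.cos(2 * math.pi * 0.125 * t) for t in range(16)]
ordinates = periodogram(cycle)
peak_frequency, peak_power = max(ordinates, key=lambda pair: pair[1])
```

For this artificial series all of the power falls at frequency 0.125, which is the kind of isolated spike that would suggest an unmodeled cycle in real residuals.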

Residual Normal Probability Plot (bottom right)

The normal probability plot is used to determine whether the residuals left behind by the forecasting model follow a normal distribution. If so, they should fall approximately along the reference line. A plot such as that displayed above, which shows some curvature in the tails, is indicative of a situation where the data have some positive skewness. In such cases, it may be helpful to transform the data using Analysis Options.
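The coordinates behind a normal probability plot can be sketched in Python as follows. Each sorted residual is paired with the standard normal quantile of its plotting position. The position (i - 0.5)/n is one common choice and is an assumption here, as are the function name and the sample values; the program may use a different formula.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each sorted residual with its expected standard normal quantile."""
    n = len(residuals)
    std_normal = NormalDist()
    return [
        (value, std_normal.inv_cdf((i - 0.5) / n))
        for i, value in enumerate(sorted(residuals), start=1)
    ]


# Hypothetical residuals with one large positive value (made-up numbers).
points = normal_plot_points([-5.0, 2.0, -1.0, 30.0, 0.5])
```

If the residuals were normal, the pairs would lie close to a straight line. A long right tail, like the value 30.0 here, bends the upper end of the pattern away from the line.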

Analysis Options

Display: if desired, the plot may be limited to the specified number of most recent observations.

Transformation: the transformation to be applied to the data, if any. If Box-Cox is selected, the program will automatically determine an appropriate power transformation to normalize the data, after adding the specified Addend to each data value. Note: the Box-Cox option can be very time-consuming if many models are being compared, since the program will fit every model at each iteration of the Box-Cox optimization algorithm.
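For readers unfamiliar with power transformations, here is a minimal Python sketch of the textbook Box-Cox form applied after adding an addend. The exact form and scaling used by the program are not given in this document, so treat the function below (a hypothetical name) as an illustration only. The power is supplied by the caller; the automatic search for the best power is not shown.

```python
import math


def box_cox(values, power, addend=0.0):
    """Textbook Box-Cox transform of (value + addend) for a given power."""
    shifted = [v + addend for v in values]
    if any(s <= 0 for s in shifted):
        raise ValueError("value + addend must be positive")
    if power == 0:
        # The limiting case of the power family is the natural logarithm.
        return [math.log(s) for s in shifted]
    return [(s ** power - 1.0) / power for s in shifted]


square_root_like = box_cox([1.0, 4.0, 9.0], power=0.5)
shifted_identity = box_cox([0.0, 3.0], power=1.0, addend=1.0)
```

The addend matters when the data contain zeros or negative values, since the transformation is only defined for positive inputs.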


SnapStat Defaults
The defaults used by the Automatic Forecasting SnapStat are set on the Forecasting tab of the Preferences dialog box under the Edit menu:

Models Included: specify the models that should be fit to the data. These are the models from which the best model will be selected. Descriptions of each of the models are given in the Forecasting documentation. For several of the models, additional options are provided:

Random walk model: check Include constant to consider a model containing a constant as well as one without a constant.

Moving average model: select the maximum span to consider. Models will be fit with spans from 2 through the number indicated.

ARIMA AR Terms: specify the maximum order p of the autoregressive terms in the model.

ARIMA MA Terms: specify the maximum order q of the moving average terms in the model. You may elect instead to consider only models for which q = p - 1.

ARIMA Differencing: specify the maximum order of differencing d. Select Include constant to consider models that include a constant term when differencing is performed.


Information Criterion: the criterion used to select the best model.

Forecast Limits: percentage used for the forecast probability limits.
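To give a sense of how the ARIMA settings expand into a set of candidate models, the Python sketch below enumerates (p, d, q) orders up to the specified maximums. How the program actually builds its list is not described here, so the enumeration, the function name, and the handling of the q = p - 1 option (applied for p of 1 or more) are all assumptions made for illustration.

```python
def arima_candidates(max_p, max_q, max_d, q_equals_p_minus_1=False):
    """List candidate (p, d, q) orders up to the given maximums."""
    orders = []
    for d in range(max_d + 1):
        for p in range(max_p + 1):
            if q_equals_p_minus_1:
                # Assumed reading of the option: keep only q = p - 1, p >= 1.
                q_values = [p - 1] if p >= 1 else []
            else:
                q_values = range(max_q + 1)
            for q in q_values:
                orders.append((p, d, q))
    return orders


full_grid = arima_candidates(max_p=2, max_q=2, max_d=1)
restricted = arima_candidates(max_p=2, max_q=2, max_d=0, q_equals_p_minus_1=True)
```

The full grid grows quickly with the maximum orders, which is one reason a restriction such as q = p - 1 can be useful for limiting the number of models fit.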

The procedure fits each of the models indicated and selects the model that gives the smallest value of the selected criterion. There are three criteria to choose from:

Akaike Information Criterion

The Akaike Information Criterion (AIC) is calculated from

$$\mathrm{AIC} = 2\ln(\mathrm{RMSE}) + \frac{2c}{n}$$

where RMSE is the root mean squared error during the estimation period, c is the number of estimated coefficients in the fitted model, and n is the sample size used to fit the model. Notice that the AIC is a function of the variance of the model residuals, penalized by the number of estimated parameters. In general, the model will be selected that minimizes the mean squared error without using too many coefficients (relative to the amount of data available).

Hannan-Quinn Criterion

The Hannan-Quinn Criterion (HQC) is calculated from

$$\mathrm{HQC} = 2\ln(\mathrm{RMSE}) + \frac{2c\,\ln(\ln(n))}{n}$$

with c and n as defined for the AIC. This criterion uses a different penalty for the number of estimated parameters.

Schwarz-Bayesian Information Criterion

The Schwarz-Bayesian Information Criterion (SBIC) is calculated from

$$\mathrm{SBIC} = 2\ln(\mathrm{RMSE}) + \frac{c\,\ln(n)}{n}$$

Again, the penalty for the number of estimated parameters is different from that of the other criteria.
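The three criteria and the selection rule can be sketched in Python as follows. The formulas follow the definitions in this section, with c denoting the number of estimated coefficients in each case. The candidate names, RMSE values, and coefficient counts are made up for illustration and are not results from the baseball data.

```python
import math


def aic(rmse, c, n):
    return 2.0 * math.log(rmse) + 2.0 * c / n


def hqc(rmse, c, n):
    return 2.0 * math.log(rmse) + 2.0 * c * math.log(math.log(n)) / n


def sbic(rmse, c, n):
    return 2.0 * math.log(rmse) + c * math.log(n) / n


def select_model(candidates, criterion, n):
    """Return the name of the candidate with the smallest criterion value."""
    return min(candidates, key=lambda name: criterion(*candidates[name], n))


# Hypothetical fitted models: name -> (RMSE, number of coefficients c).
candidates = {"A": (17.9, 1), "B": (17.7, 2), "C": (17.69, 5)}
n = 104
best_by_aic = select_model(candidates, aic, n)
best_by_hqc = select_model(candidates, hqc, n)
best_by_sbic = select_model(candidates, sbic, n)
```

With these made-up numbers the AIC prefers model B, while the HQC and SBIC, whose penalties per coefficient are larger at this sample size, prefer the simpler model A. This shows how the choice of criterion can change which model is selected.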
