Posted on 1/8/2009. Public Domain.
Chapter 7 Notes, Forecasting

A time series is a numerical sequence of values generated over regular time intervals. The classical time-series model involves four components:
Secular trend (Tt).
Cyclical movement (Ct).
Seasonal fluctuation (St).
Irregular variation (It).

The multiplicative model determines the level of the forecast variable Yt:

Yt = Tt × Ct × St × It

Smoothing

Basic exponential smoothing: historical data are weighted by a decreasing exponential factor.

Ft+1 = αYt + α(1 - α)Yt-1 + (1 - α)²Ft-1 + . . .     0 < α < 1

Single-parameter smoothing: used only for stationary data, with no trend.

Ft+1 = αYt + (1 - α)Ft

Example, with α = 0.2 and startup F2 = Y1:

Period t   Actual Yt   Forecast Ft   Error εt
1          4,890       --            --
2          4,910       4,890.0       20.0
3          4,970       4,894.0       76.0
4          5,010       4,909.2       100.8
5          5,060       4,929.4       130.6
6          5,100       4,955.5       144.5

Tracking forecast error:

εt = Yt - Ft   (forecast error)
MSE = Σ(Yt - Ft)² / (n - m)   (mean squared error, where m is the number of startup periods without a forecast)

Two-parameter smoothing: identifies the trend; 0 < α < 1, 0 < β < 1.

Tt = αYt + (1 - α)(Tt-1 + bt-1)     smooth the data to get the trend
bt = β(Tt - Tt-1) + (1 - β)bt-1     smooth the slope of the trend
Ft+1 = Tt + bt                      forecast
startup: T2 = Y1 ; b2 = Y2 - Y1

Example, with α = 0.2 and β = 0.3:

Period t   Actual Yt   Trend Tt   Slope bt   Forecast Ft   Error εt
1          4,890       --         --         --            --
2          4,910       4,890      20.0       --            --
3          4,970       4,922      23.6       4,910.0       60.0
4          5,010       4,958      27.3       4,945.6       64.4
5          5,060       5,000      31.7       4,985.3       74.7

Three-parameter exponential smoothing: identifies the trend, up or down, and the seasonal factors; 0 < α < 1, 0 < β < 1, 0 < γ < 1, p = number of periods per season (p = 4 for quarterly data).
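Before moving to the three-parameter formulas, the one- and two-parameter recursions above can be sketched in a few lines of Python. This is a minimal sketch assuming only the recursions and startup rules just given; the function names are illustrative, not from any library.

```python
# Single- and two-parameter smoothing, using the series and parameters
# from the tables above (alpha = 0.2, beta = 0.3).

def single_smooth(y, alpha):
    """Single-parameter smoothing: F[t+1] = alpha*Y[t] + (1-alpha)*F[t].

    Returns a 0-based list f where f[i] is the forecast for period i+1;
    startup uses F2 = Y1, so f[0] is None (no forecast for period 1).
    """
    f = [None] * len(y)
    f[1] = y[0]                                  # startup: F2 = Y1
    for t in range(2, len(y)):
        f[t] = alpha * y[t - 1] + (1 - alpha) * f[t - 1]
    return f

def double_smooth(y, alpha, beta):
    """Two-parameter (trend) smoothing; startup T2 = Y1, b2 = Y2 - Y1."""
    n = len(y)
    T, b, F = [None] * n, [None] * n, [None] * n
    T[1] = y[0]
    b[1] = y[1] - y[0]
    for t in range(2, n):
        F[t] = T[t - 1] + b[t - 1]               # forecast made at t - 1
        T[t] = alpha * y[t] + (1 - alpha) * (T[t - 1] + b[t - 1])
        b[t] = beta * (T[t] - T[t - 1]) + (1 - beta) * b[t - 1]
    return T, b, F

y = [4890, 4910, 4970, 5010, 5060, 5100]
f = single_smooth(y, 0.2)          # f[2] = 4894.0, f[3] = 4909.2, ...
T, b, F = double_smooth(y[:5], 0.2, 0.3)
# T[2] = 4922.0, b[2] = 23.6, F[2] = 4910.0, matching the table;
# later rows differ in the last digit because the table rounds intermediates.
```

Note that the printed tables round the trend and slope at each step, so hand calculations drift slightly from the unrounded recursion in the last rows.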
Tt = α(Yt / St-p) + (1 - α)(Tt-1 + bt-1)    smooth the data to get the trend; at startup, use St-p = 1 when St-p is not yet available
bt = β(Tt - Tt-1) + (1 - β)bt-1             smooth the slope of the trend
St = γ(Yt / Tt) + (1 - γ)St-p               smooth the seasonal factor; at startup, use St = Yt / Tt when St-p is not yet available
Ft+1 = (Tt + bt)St-p+1                      forecast, one period ahead
Ft+m = (Tt + bt·m)St-p+m                    forecast, m periods ahead
startup: T2 = Y1 ; b2 = Y2 - Y1

Example, with α = 0.4, β = 0.5, γ = 0.2:

Period t   Actual Yt   Trend Tt   Slope bt   Seasonal St   Forecast Ft
1, Q1      6           --         --         --            --
2, Q2      8           6          2          1.33          --
3, Q3      7           7.6        1.8        0.921
4, Q4      10          9.64       1.92       1.037
5, Q1      7           9.736      1.008      0.719
6, Q2      9           9.153      0.2125     1.261
7, Q3                                                      8.626

Forecasting the trend using regression analysis

A regression equation of the form Y = a + bX can be used to extrapolate a time trend forward. Below, the year (transformed to start at t = 0) becomes the X variable used to forecast annual unit sales by a least-squares equation.

Year    X (year, t = 0)   Y (sales)   XY        X²
1995    0                 72.9        0         0
1996    1                 74.4        74.4      1
1997    2                 75.9        151.8     4
1998    3                 77.9        233.7     9
1999    4                 78.6        314.4     16
2000    5                 79.1        395.5     25
2001    6                 81.7        490.2     36
2002    7                 84.4        590.8     49
2003    8                 85.9        687.2     64
2004    9                 84.8        763.2     81
Sum     45                795.6       3,701.2   285

X̄ = 45/10 = 4.5    Ȳ = 795.6/10 = 79.56

b = (ΣXY - nX̄Ȳ) / (ΣX² - nX̄²) = (3,701.2 - 10(4.5)(79.56)) / (285 - 10(4.5)²) = 1.47

a = Ȳ - bX̄ = 79.56 - 1.47(4.5) = 72.95

So the equation becomes Y = 72.95 + 1.47X. For 2005, when X = 10, the calculation is Y = 72.95 + 1.47(10) = 87.65.

Regression in Causal Models

Regression analysis can also make forecasts from a non-time independent variable. A simple regression employs a straight line:

Y(X) = a + bX

The independent variable is not time periods, but something such as:
store size.
order amount.
weight.

Seasonal analysis can be combined with regression

The classical time-series model provides a rationale for isolating seasonal components. The procedure is the ratio-to-moving-average method (calculations appear in the case, not on the exam):
(1) Compute moving averages from the Ys.
(2) Center the moving averages.
(3) Express the Ys as a percentage of the moving average.
(4) Determine medians by season and adjust.

Regression: The goal is to estimate Ŷ(X) = a + bX. This technique, also known as ordinary least squares, fits a line through the data so that the sum of the squared differences between the estimates and the actual observations is minimized. Consider the example below for highway construction costs. The Y variable is total project cost and the X variable is the length of the highway in miles. The other columns are used in calculating the regression equation: X² is each value of X squared, XY is X times Y, and Y² is each value of Y squared.

X    Y    X²    XY    Y²
1    6    1     6     36
3    14   9     42    196
4    10   16    40    100
5    14   25    70    196
7    26   49    182   676
ΣX = 20   ΣY = 70   ΣX² = 100   ΣXY = 340   ΣY² = 1204

X̄ = 20/5 = 4    Ȳ = 70/5 = 14

To estimate the regression, use the formulas below. First calculate b, then use b to calculate a:

b = (ΣXY - nX̄Ȳ) / (ΣX² - nX̄²) = (340 - 5(4)(14)) / (100 - 5(4)²) = 3
a = Ȳ - bX̄ = 14 - 3(4) = 2

So Ŷ(X) = 2 + 3X, and when X = 6, Ŷ(X) = 2 + 3(6) = 20.

Two formulas for the standard error of the estimate; the first is the definition, the second is designed for calculation:

s_y.x = √[ Σ(Y - Ŷ(X))² / (n - 2) ] = √[ (ΣY² - aΣY - bΣXY) / (n - 2) ] = √[ (1204 - 2(70) - 3(340)) / (5 - 2) ] = 3.83

Use the standard error of the estimate to calculate prediction intervals. There are two formulas. The first is the prediction interval for the conditional mean; use this when calculating the prediction interval for an average value (degrees of freedom = n - 2 in this case):

μ_y.x = Ŷ(X) ± t_{α/2} · s_y.x · √[ 1/n + (X - X̄)² / (ΣX² - nX̄²) ]
      = 20 ± 3.182(3.83)√[ 1/5 + (6 - 4)² / (100 - 5(4)²) ] = 20 ± 7.7

The second is the prediction interval for an individual value of Y; use this when forecasting a specific value:

Y = Ŷ(X) ± t_{α/2} · s_y.x · √[ 1 + 1/n + (X - X̄)² / (ΣX² - nX̄²) ]
  = 20 ± 3.182(3.83)√[ 1 + 1/5 + (6 - 4)² / (100 - 5(4)²) ] = 20 ± 14.4

The sample correlation coefficient, r, measures the degree of association between two variables such as X and Y. The value of r ranges between -1 and 1. If r = 1 there is a perfect positive correlation; if r = 0 there is no correlation; if r = -1 there is a perfect negative correlation (as one variable goes up, the other goes down). Using the data above as an example:

r = (ΣXY - nX̄Ȳ) / √[ (ΣX² - nX̄²)(ΣY² - nȲ²) ] = (340 - 5(4)(14)) / √[ (100 - 5(4)²)(1204 - 5(14)²) ] = 0.896

The sample coefficient of determination, called R-squared, is the ratio of explained variation to total variation. That makes it the percent of the variation in Y explained by X. Using the data above as an example, the first form provides the definition and the second form is used for calculation. The result is that 80.357% of the variation in cost is explained by the number of miles in the road project:

r² = [ Σ(Y - Ȳ)² - Σ(Y - Ŷ(X))² ] / Σ(Y - Ȳ)² = [ aΣY + bΣXY - nȲ² ] / [ ΣY² - nȲ² ] = [ 2(70) + 3(340) - 5(14)² ] / [ 1204 - 5(14)² ] = 0.80357

To do a hypothesis test on the significance of the b coefficient, use the t-statistic. Usually this test is of the null hypothesis H0: B = 0. The formula for calculating t is:

t = b / s_b = b / [ s_y.x / √(ΣX² - nX̄²) ] = 3 / [ 3.83 / √(100 - 5(4)²) ] = 3.5

Since the value of the t-statistic for a two-tailed test at the 95% confidence level with three degrees of freedom (n - 2 = degrees of freedom for the basic regression case) is 3.182 (from the t-table), the calculated value of 3.5 above indicates that the b coefficient is significantly different from zero.

When using a regression equation, the following assumptions apply:
1. All populations have the same standard deviation, independent of X.
2. The data fit a straight line.
3. Successive sample observations are independent.
4.
Values of X are known in advance.

Using the ANOVA with the Excel regression output

The ANOVA tests the hypothesis H0: b1 = b2 = . . . = bn = 0. Significance F provides the area in the tail of the F distribution (see Table E.5, p. 839).

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.981981
R Square            0.964286
Adjusted R Square   0.928571
Standard Error      2
Observations        5

ANOVA
             df   SS    MS    F    Significance F
Regression   2    216   108   27   0.03571429
Residual     2    8     4
Total        4    224

                Coefficients   Standard Error   t Stat     P-value    Lower 95%    Upper 95%
Intercept       2              2.0000           1.0000     0.42265    -6.6053115   10.60531
X Variable 1    2.4            0.4898979        4.898979   0.039231   0.29213779   4.507862
X Variable 2    6              2.0000           3.0000     0.095466   -2.6053115   14.60531

In this example, the Significance F value of 0.0357 indicates that the probability of a Type I error (rejecting the null hypothesis when it is true, that is, when the b values are actually 0) would be about 3.57 percent. So we would reject the null hypothesis if the reliability is set to 95 percent (α = .05), but would accept the null hypothesis if the reliability is set to 99 percent (α = .01). Notice from the diagram in the text on page 839 that F is not normally distributed. Here the F statistic works out to R Square / (1 - R Square), because the regression and residual degrees of freedom are equal (both 2).

NORMAL PROBABILITY PLOT: An output option for regression is the normal probability plot. A straight line in this plot indicates that the assumption of a normally distributed sample is supported. A more formal way of testing for normality is the Kolmogorov-Smirnov test:

Accept H0 if D < Dα (conclude that the normal distribution applies)
Reject H0 if D > Dα (conclude that some other distribution applies)

See Table L on page 639 for critical values of Dα (Excel doesn't generate this statistic). The test calculates by how much the actual results depart from the results expected under a normal distribution.
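The ANOVA arithmetic in the SUMMARY OUTPUT above can be verified directly from the SS and df columns. The sketch below recomputes MS, F, R Square, and Significance F; note that the closed-form tail area 1/(1 + F) is a special-case identity that holds only for an F distribution with (2, 2) degrees of freedom, used here just to avoid a stats library.

```python
# Recompute the ANOVA table entries from the SS and df columns above.
ss_reg, df_reg = 216, 2
ss_res, df_res = 8, 2
ss_total = ss_reg + ss_res                  # 224

ms_reg = ss_reg / df_reg                    # MS = SS/df = 108
ms_res = ss_res / df_res                    # 4
f_stat = ms_reg / ms_res                    # F = 108/4 = 27

r_square = ss_reg / ss_total                # Explained/Total = 0.964286

# In general F = (R^2/df_reg) / ((1 - R^2)/df_res); with df_reg = df_res
# this collapses to R^2 / (1 - R^2), as noted in the text.
f_from_r2 = (r_square / df_reg) / ((1 - r_square) / df_res)

# Special case: for an F distribution with (2, 2) degrees of freedom the
# upper-tail area is exactly 1 / (1 + F).
significance_f = 1 / (1 + f_stat)           # 1/28 = 0.0357142...
```

Running this reproduces the table: F = 27 and Significance F ≈ 0.0357, confirming the Type I error probability quoted in the discussion.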
A common violation of the normality assumption arises with accounting data when fixed and variable costs are constrained to be zero; the normal curve is often distorted when there are boundaries on possible values.

Using the t Stat with the Excel regression output

From the same SUMMARY OUTPUT shown above, the coefficient portion is:

                Coefficients   Standard Error   t Stat     P-value    Lower 95%    Upper 95%
Intercept       2              2.0000           1.0000     0.42265    -6.6053115   10.60531
X Variable 1    2.4            0.4898979        4.898979   0.039231   0.29213779   4.507862
X Variable 2    6              2.0000           3.0000     0.095466   -2.6053115   14.60531

The Coefficients are the estimates for the equation Y(X) = a + b1X1 + b2X2, so a = 2, b1 = 2.4, b2 = 6.

The Standard Error is the standard error of the estimated coefficient, s_b.

The t Stat measures the distance, in units of standard error, of the estimated coefficient from 0. For the intercept, with a coefficient value of 2 and a standard error of 2, the coefficient value is 1 standard error from 0. This is the format of the hypothesis test for the intercept, H0: a = 0; for the X variables the test is H0: b = 0.

The P-value measures the area in the two tails of a distribution with a mean of zero (for a = 0 or b = 0), beyond the reported t-statistic for the appropriate degrees of freedom, in either direction. It is the value associated with a Type I error: the probability of getting the estimated coefficient when the true population value is zero. For X Variable 2, the t-statistic is 3.0; since the degrees of freedom equal 2, look in the body of the t-table on row two for a value close to 3. It is just a little higher than the .05 column, meaning that a little less than .05 occurs in each tail of the distribution 3 standard errors from the mean, for a total area of about 0.0954.
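The P-values in the coefficient table come from a t distribution with 2 degrees of freedom, and for that particular df the two-tailed tail area has a closed form, P = 1 - |t|/√(2 + t²). The sketch below uses that identity (specific to df = 2, not a general formula) to reproduce Excel's P-values without a stats library.

```python
import math

def p_two_tailed_df2(t):
    """Two-tailed P-value for Student's t with 2 degrees of freedom.

    Uses the df = 2 closed form CDF(t) = 1/2 + t / (2*sqrt(2 + t*t)),
    so P = 2*(1 - CDF(|t|)) = 1 - |t| / sqrt(2 + t*t).
    """
    t = abs(t)
    return 1 - t / math.sqrt(2 + t * t)

# t Stats from the coefficient table (coefficient / standard error)
p_intercept = p_two_tailed_df2(2 / 2.0)       # about 0.42265
p_x1 = p_two_tailed_df2(2.4 / 0.4898979)      # about 0.039231
p_x2 = p_two_tailed_df2(6 / 2.0)              # about 0.095466
```

All three match the P-value column above, including the 0.0954 total tail area discussed for X Variable 2.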
Confidence interval for the coefficients: b ± t_{α/2}(s_b). For X Variable 1, with df = 2 and t_{α/2} = 4.303: 2.4 ± 4.303(0.4898979), or 0.292 to 4.508 (Excel reports 0.29213779 to 4.507862).
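As a check on the interval above, this sketch recomputes b ± t_{α/2}·s_b for X Variable 1. The critical value is obtained by inverting the same df = 2 closed form for the t distribution used earlier (t = √(2q²/(1 - q²)) with q = 1 - α); that inversion is again a df = 2 special case, used only to keep the example self-contained.

```python
import math

def t_crit_df2(alpha):
    """Two-tailed critical t value for Student's t with 2 degrees of freedom.

    Inverts the df = 2 closed form P = 1 - t/sqrt(2 + t^2):
    with q = 1 - alpha, t = sqrt(2*q*q / (1 - q*q)).
    """
    q = 1 - alpha
    return math.sqrt(2 * q * q / (1 - q * q))

b, s_b = 2.4, 0.4898979            # X Variable 1 from the Excel output
t_crit = t_crit_df2(0.05)          # 4.30265..., the t-table's 4.303
half_width = t_crit * s_b
lower, upper = b - half_width, b + half_width
# (lower, upper) is approximately (0.2921, 4.5079), matching Excel's
# Lower 95% / Upper 95% columns of 0.29213779 and 4.507862.
```

The tiny discrepancy in the last digits comes from rounding s_b and the critical value; Excel carries full precision internally.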