# t stat definition by abe2


## Chapter 7 Notes, Forecasting

A time series is a numerical sequence of values generated over regular time intervals.
The classical time-series model involves four components:
- Secular trend (Tt)
- Cyclical movement (Ct)
- Seasonal fluctuation (St)
- Irregular variation (It)

The multiplicative model determines the level of the forecast variable Yt: Yt = Tt × Ct × St × It
Smoothing

Basic Exponential Smoothing: historical data weighted by a decreasing exponential factor.

Ft+1 = αYt + α(1 - α)Yt-1 + (1 - α)²Ft-1 + . . .        0 < α < 1

Single-parameter smoothing: used only for stationary data (no trend).

Ft+1 = αYt + (1 - α)Ft        startup: F2 = Y1

Example for α = 0.2:

| Period t | Actual Yt | Forecast Ft | Error εt |
|---|---|---|---|
| 1 | 4,890 | -- | -- |
| 2 | 4,910 | 4,890.0 | 20.0 |
| 3 | 4,970 | 4,894.0 | 76.0 |
| 4 | 5,010 | 4,909.2 | 100.8 |
| 5 | 5,060 | 4,929.4 | 130.6 |
| 6 | 5,100 | 4,955.5 | 144.5 |

Tracking forecast error:

εt = Yt - Ft        (forecast error)

MSE = Σ(Yt - Ft)² / (n - m)        (mean square error, with m the number of startup periods without a forecast)

Two-parameter smoothing: identifies the trend;        0 < α < 1, 0 < β < 1

Tt = αYt + (1 - α)(Tt-1 + bt-1)        smooth the data to get the trend

bt = β(Tt - Tt-1) + (1 - β)bt-1        smooth the slope in the trend

Ft+1 = Tt + bt        forecast

startup: T2 = Y1 ;        b2 = Y2 - Y1

Example assumes α = 0.2, β = 0.3:

| Period t | Actual Yt | Trend Tt | Slope bt | Forecast Ft | Error εt |
|---|---|---|---|---|---|
| 1 | 4,890 | -- | -- | -- | -- |
| 2 | 4,910 | 4,890 | 20.0 | -- | -- |
| 3 | 4,970 | 4,922 | 23.6 | 4,910.0 | 60.0 |
| 4 | 5,010 | 4,958 | 27.3 | 4,945.6 | 64.4 |
| 5 | 5,060 | 5,000 | 31.7 | 4,985.3 | 74.7 |
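The two-parameter recursion can be sketched the same way (names are mine; the notes carry rounded intermediate values, so later periods drift slightly from the table at full precision):

```python
def holt(y, alpha, beta):
    """Two-parameter smoothing with startup T2 = Y1, b2 = Y2 - Y1."""
    T = [None, float(y[0])]        # trend; index i holds period i + 1
    b = [None, float(y[1] - y[0])]
    F = [None, None]               # first forecast is F3 = T2 + b2
    for t in range(2, len(y)):
        F.append(T[-1] + b[-1])
        Tt = alpha * y[t] + (1 - alpha) * (T[-1] + b[-1])
        bt = beta * (Tt - T[-1]) + (1 - beta) * b[-1]
        T.append(Tt)
        b.append(bt)
    return T, b, F

y = [4890, 4910, 4970, 5010, 5060]
T, b, F = holt(y, alpha=0.2, beta=0.3)
```

The period-3 values come out exactly as tabled (T3 = 4,922, b3 = 23.6, F3 = 4,910); period-5 results differ from the table by well under one unit because of rounding in the notes.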
Three-parameter exponential smoothing: identifies up and down trend, and seasonal
factors. 0 < α < 1, 0 < β < 1, 0 < γ < 1, p = # of periods, p = 4 for quarterly data.

Tt = α(Yt / St-p) + (1 - α)(Tt-1 + bt-1)        smooth the data to get the trend

St-p = 1 when St-p not available        startup

bt = β(Tt - Tt-1) + (1 - β)bt-1        smooth the slope in the trend

St = γ(Yt / Tt) + (1 - γ)St-p        smooth the seasonal factor

St = Yt / Tt when St-p not available        startup

Ft+1 = (Tt + bt)St-p+1        forecast, one period ahead

Ft+m = (Tt + bt·m)St-p+m        forecast m periods ahead

startup: T2 = Y1 ;        b2 = Y2 - Y1

Example assumes α = 0.4, β = 0.5, γ = 0.2:

| Period t | Actual Yt | Trend Tt | Slope bt | Seasonal St | Forecast Ft | Error εt |
|---|---|---|---|---|---|---|
| 1, Q1 | 6 | -- | -- | -- | -- | -- |
| 2, Q2 | 8 | 6 | 2 | 1.33 | -- | -- |
| 3, Q3 | 7 | 7.6 | 1.8 | 0.921 | | |
| 4, Q4 | 10 | 9.64 | 1.92 | 1.037 | | |
| 5, Q1 | 7 | 9.736 | 1.008 | 0.719 | | |
| 6, Q2 | 9 | 9.153 | 0.2125 | 1.261 | | |
| 7, Q3 | | | | | 8.626 | |
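A sketch of the three-parameter recursion with the startup rules above (my own function; the notes use rounded seasonal factors such as S2 = 1.33, so the full-precision one-period-ahead forecast lands within about 0.01 of the tabled 8.626):

```python
def holt_winters(y, alpha, beta, gamma, p):
    """Three-parameter smoothing; startup T2 = Y1, b2 = Y2 - Y1, S2 = Y2/T2."""
    T = {2: float(y[0])}
    b = {2: float(y[1] - y[0])}
    S = {2: y[1] / T[2]}
    for t in range(3, len(y) + 1):
        s_old = S.get(t - p, 1.0)            # startup: S(t-p) = 1 when not available
        T[t] = alpha * y[t - 1] / s_old + (1 - alpha) * (T[t - 1] + b[t - 1])
        b[t] = beta * (T[t] - T[t - 1]) + (1 - beta) * b[t - 1]
        if (t - p) in S:
            S[t] = gamma * y[t - 1] / T[t] + (1 - gamma) * S[t - p]
        else:
            S[t] = y[t - 1] / T[t]           # startup: S(t) = Y(t)/T(t)
    t = len(y)
    return (T[t] + b[t]) * S[t - p + 1]      # one-period-ahead forecast F(t+1)

y = [6, 8, 7, 10, 7, 9]                      # quarterly data, so p = 4
f7 = holt_winters(y, alpha=0.4, beta=0.5, gamma=0.2, p=4)
```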
Forecasting the trend using regression analysis

A regression equation of the form Y = a + bX can be used to extrapolate a time trend forward. Below, the year (transformed to start at t = 0) becomes the X variable in a least-squares equation used to forecast annual unit sales.
| Year | X (t = 0) | Y (sales) | XY | X² |
|---|---|---|---|---|
| 1995 | 0 | 72.9 | 0 | 0 |
| 1996 | 1 | 74.4 | 74.4 | 1 |
| 1997 | 2 | 75.9 | 151.8 | 4 |
| 1998 | 3 | 77.9 | 233.7 | 9 |
| 1999 | 4 | 78.6 | 314.4 | 16 |
| 2000 | 5 | 79.1 | 395.5 | 25 |
| 2001 | 6 | 81.7 | 490.2 | 36 |
| 2002 | 7 | 84.4 | 590.8 | 49 |
| 2003 | 8 | 85.9 | 687.2 | 64 |
| Sum | ΣX = 45 | ΣY = 795.6 | ΣXY = 3,701.2 | ΣX² = 285 |

X̄ = 45/10 = 4.5        Ȳ = 795.6/10 = 79.56

b = (ΣXY - nX̄Ȳ) / (ΣX² - nX̄²) = (3,701.2 - 10(4.5)(79.56)) / (285 - 10(4.5)²) = 1.47

a = Ȳ - bX̄ = 79.56 - 1.47(4.5) = 72.95

so the equation becomes: Y = 72.95 + 1.47X
for 2005, when X = 10, the calculation is: Y = 72.95 + 1.47(10) = 87.65
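The same least-squares arithmetic as a quick check (a sketch with my own variable names; the notes round b to 1.47 before computing a, so full precision gives a ≈ 72.96 and a 2005 forecast of about 87.63):

```python
x = list(range(10))                     # years 1995-2004 transformed to t = 0..9
y = [72.9, 74.4, 75.9, 77.9, 78.6, 79.1, 81.7, 84.4, 85.9, 84.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n     # 4.5 and 79.56
b = (sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar) / \
    (sum(xi * xi for xi in x) - n * xbar * xbar)
a = ybar - b * xbar
forecast_2005 = a + b * 10              # X = 10 corresponds to 2005
```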

Regression in Causal Models

Regression analysis can also make forecasts with a non-time independent variable.
A simple regression employs a straight line: Y(X) = a + bX
The independent variable is then not time periods but something such as store size, order amount, or weight.

Seasonal Analysis can be combined with regression
The classical time-series model provides a rationale for isolating seasonal components:

The procedure is the ratio-to-moving-average method (calculations on the case, not on the exam):
1. Compute moving averages from the Ys.
2. Center the moving averages.
3. Express each Y as a % of its centered moving average.
4. Determine medians by season and adjust.
Regression: The goal is to estimate: Ŷ(X) = a + bX

This technique, also known as ordinary least squares, fits a line through the data so that the sum of the squared differences between the estimates and the actual observations is minimized. Consider the example below for highway construction costs. The Y variable is total project cost and the X variable is the length of the highway in miles. The other columns are used in calculating the regression equation: X² is just each value of X squared, XY is X times Y, and Y² is the value of Y squared.
| X | Y | X² | XY | Y² |
|---|---|---|---|---|
| 1 | 6 | 1 | 6 | 36 |
| 3 | 14 | 9 | 42 | 196 |
| 4 | 10 | 16 | 40 | 100 |
| 5 | 14 | 25 | 70 | 196 |
| 7 | 26 | 49 | 182 | 676 |
| ΣX = 20 | ΣY = 70 | ΣX² = 100 | ΣXY = 340 | ΣY² = 1204 |

X̄ = 20/5 = 4        Ȳ = 70/5 = 14

To estimate the regression, use the formulas below. First calculate b, then use b to calculate a.

b = (ΣXY - nX̄Ȳ) / (ΣX² - nX̄²) = (340 - (5)(4)(14)) / (100 - (5)(4)²) = 3        a = Ȳ - bX̄ = 14 - (3)(4) = 2

So:

Ŷ(X) = 2 + 3X        and when X = 6, Ŷ(X) = 2 + 3(6) = 20
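The fit can be verified in a few lines (a sketch; the variable names are mine):

```python
x = [1, 3, 4, 5, 7]                     # length of highway in miles
y = [6, 14, 10, 14, 26]                 # total project cost
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n     # 4 and 14
b = (sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar) / \
    (sum(xi * xi for xi in x) - n * xbar * xbar)
a = ybar - b * xbar
yhat_at_6 = a + b * 6                   # estimate when X = 6
```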

Two formulas for the standard error of the estimate: the first is the definition, the second is designed for calculation:

sy.x = √( Σ(Y - Ŷ(X))² / (n - 2) ) = √( (ΣY² - aΣY - bΣXY) / (n - 2) ) = √( (1204 - (2)(70) - (3)(340)) / (5 - 2) ) = 3.83
Use the standard error of the estimate to calculate prediction intervals. There are two formulas for the prediction interval. The first is the prediction interval for the conditional mean; use this if you are calculating the prediction interval for an average value (degrees of freedom = n - 2 in this case).

Prediction interval for the conditional mean:

μy.x = Ŷ(X) ± tα/2(sy.x)√( 1/n + (X - X̄)²/(ΣX² - nX̄²) ) = 20 ± 3.182(3.83)√( 1/5 + (6 - 4)²/(100 - 5(4)²) ) = 20 ± 7.7

Prediction interval for an individual value of Y; use this when forecasting a specific value:

Y = Ŷ(X) ± tα/2(sy.x)√( 1 + 1/n + (X - X̄)²/(ΣX² - nX̄²) ) = 20 ± 3.182(3.83)√( 1 + 1/5 + (6 - 4)²/(100 - 5(4)²) ) = 20 ± 14.4
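Both half-widths at X = 6, as a sketch reusing the values already computed above:

```python
import math

# Prediction intervals at X = 6 (t for 3 df, two-tailed 95% = 3.182; s_yx = 3.83).
n, xbar, sum_x2 = 5, 4, 100
s_yx, t_crit, x_new = 3.83, 3.182, 6
y_hat = 2 + 3 * x_new                               # point estimate = 20
common = 1 / n + (x_new - xbar) ** 2 / (sum_x2 - n * xbar ** 2)
half_mean = t_crit * s_yx * math.sqrt(common)       # conditional-mean half-width
half_indiv = t_crit * s_yx * math.sqrt(1 + common)  # individual-value half-width
```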

The sample correlation coefficient, r, measures the degree of association between two variables such as X and Y. The value of r ranges between -1 and 1. If r = 1, there is a perfect positive correlation; if r = 0, there is no correlation; if r = -1, there is a perfect negative correlation (as one variable goes up the other goes down). Using the data above as an example:

r = (ΣXY - nX̄Ȳ) / √( (ΣX² - nX̄²)(ΣY² - nȲ²) ) = (340 - 5(4)(14)) / √( (100 - (5)(4)²)(1204 - 5(14)²) ) = 0.896

The sample coefficient of determination, called R-Squared, is the ratio of Explained Variation to the Total Variation. That makes it the percent of variation in Y explained by X. Using the data above as an example, the first formula provides a definition and the second part is used for calculation. The result is that 80.357% of the variation of cost is explained by the number of miles for the road project.

r² = [ Σ(Y - Ȳ)² - Σ(Y - Ŷ(X))² ] / Σ(Y - Ȳ)² = (aΣY + bΣXY - nȲ²) / (ΣY² - nȲ²) = (2(70) + 3(340) - 5(14)²) / (1204 - (5)(14)²) = 0.80357

To do a hypothesis test on the significance of the b coefficient, use the t-statistic. Usually this test is for the null hypothesis Ho: B = 0. The formula for calculating t is:

t = b / ( sy.x / √(ΣX² - nX̄²) ) = 3 / ( 3.83 / √(100 - 5(4)²) ) = 3.5

Since the critical value of the t-statistic for a two-tailed test at the 95% confidence level with three degrees of freedom (n - 2 = degrees of freedom for the basic regression case) is equal to 3.182 (from the t-table), the calculated value of 3.5 above indicates that the b coefficient is significantly different from zero.
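The same test in a short sketch, where s_b denotes the standard error of b:

```python
import math

# t statistic for Ho: B = 0 in the highway-cost regression.
n, xbar = 5, 4
b, s_yx, sum_x2 = 3, 3.83, 100
s_b = s_yx / math.sqrt(sum_x2 - n * xbar ** 2)  # standard error of b
t = b / s_b
significant = abs(t) > 3.182                    # two-tailed critical value, 3 df
```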

When using a regression equation the following assumptions apply:

1.   All populations have the same standard deviation independent of X.
2.   The data fit a straight line.
3.   Successive sample observations are independent.
4.   Values of X are known in advance.
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.981981 |
| R Square | 0.964286 |
| Adjusted R Square | 0.928571 |
| Standard Error | 2 |
| Observations | 5 |

Using the ANOVA with the Excel regression output: the ANOVA tests the hypothesis Ho: b1 = b2 = . . . = bn = 0. Significance F provides the area in the tail of the F distribution; see Table E.5, p. 839.

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 216 | 108 | 27 | 0.03571429 |
| Residual | 2 | 8 | 4 | | |
| Total | 4 | 224 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 2 | 2.0000 | 1.0000 | 0.42265 | -6.6053115 | 10.60531 |
| X Variable 1 | 2.4 | 0.4898979 | 4.898979 | 0.039231 | 0.29213779 | 4.507862 |
| X Variable 2 | 6 | 2.0000 | 3.0000 | 0.095466 | -2.6053115 | 14.60531 |
In this example the value 0.0357 indicates that the probability of a Type I error, rejecting the null hypothesis when the null hypothesis is true (the b values are actually 0), would be about 3.57 percent. Or: we would reject the null hypothesis if the reliability is set to 95 percent (α = .05) but would accept the null hypothesis if the reliability is set to 99 percent (α = .01).

Notice from the diagram in the text on page 839 that F is not normally distributed. With equal regression and residual degrees of freedom (2 and 2 here), the F statistic reduces to R Square/(1 - R Square) = 0.964286/0.035714 = 27.

NORMAL PROBABILITY PLOT: An output option for regression is the normal
probability plot. A straight line for this plot indicates that the assumption of a normally
distributed sample is supported. A more formal way of testing for normality is the
Kolmogorov-Smirnov Test:
Accept Ho if D < Dα (conclude that the normal distribution applies)
Reject Ho if D > Dα (conclude that some other distribution applies)
See Table L on page 639 for critical values of D (Excel doesn't generate this statistic). The
test calculates by how much the actual results depart from the results expected from a
normal distribution.
A common violation of the normality assumption occurs with accounting data when fixed and variable costs are constrained to be zero or greater; the normal curve is often distorted when there are boundaries on possible values.
SUMMARY OUTPUT (the same regression output as above), using the t Stat with the Excel regression output:

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 2 | 2.0000 | 1.0000 | 0.42265 | -6.6053115 | 10.60531 |
| X Variable 1 | 2.4 | 0.4898979 | 4.898979 | 0.039231 | 0.29213779 | 4.507862 |
| X Variable 2 | 6 | 2.0000 | 3.0000 | 0.095466 | -2.6053115 | 14.60531 |

Coefficients are estimates for the equation Y(X) = a + b1X1 + b2X2, so a = 2, b1 = 2.4, b2 = 6

Standard error is the standard error of the estimated coefficient, sb

The t Stat measures the distance, in units of standard error, of the estimated coefficient from 0. For the intercept, with a coefficient value of 2 and a standard error of 2, the coefficient value is 1 standard error from 0. This is the format for the hypothesis test: for the intercept, Ho: a = 0; for the X variables, Ho: b = 0.

The P-value measures the area in the two tails of a distribution with a mean of zero (for a = 0 or b = 0), at a distance from the mean, in either direction, equal to the reported t-statistic for the appropriate degrees of freedom. It is the α value associated with the Type I error: the probability of getting the estimated coefficient when the true population value is zero. For X Variable 2 the t-statistic is 3.0; since degrees of freedom equal 2, look on the inside of the t-table on row two for a value close to 3. It is just a little higher than the .05 column, meaning that a little less than .05 occurs in each tail of the distribution 3 standard errors from the mean, for a total area of 0.0954.

Confidence interval for coefficients: b ± tα/2(sb)
for X Variable 1, with df = 2 and tα/2 = 4.303: 2.4 ± 4.303(0.4898979), or 0.292 to 4.508
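The pieces of the Excel output are internally consistent and easy to re-derive (a sketch; last-digit differences in the interval come from using the rounded t value 4.303):

```python
# Re-derive t Stat, F, and the X Variable 1 confidence interval from the output.
coefs = {"Intercept": (2.0, 2.0),
         "X Variable 1": (2.4, 0.4898979),
         "X Variable 2": (6.0, 2.0)}
t_stats = {name: c / se for name, (c, se) in coefs.items()}  # t Stat = coef / SE
F = 108 / 4                                 # MS Regression / MS Residual
t_crit = 4.303                              # t for alpha/2 = .025, df = 2
c, se = coefs["X Variable 1"]
ci = (c - t_crit * se, c + t_crit * se)     # approx (0.292, 4.508)
```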
