Combining disaggregate forecasts or combining disaggregate information to forecast an aggregate
David F. Hendry and Kirstin Hubrich
Comments by Olivier de Bandt, Banque de France
Main results (1/4)
• Highly relevant paper, provides analytical results to answer questions of major interest to practitioners
• Authors pursue a research agenda already developed in their previous papers: use a forecast error taxonomy to compare 3 methods to forecast an aggregate:
– Univariate model of the aggregate
– Use disaggregated components and aggregate the forecasts
– Use disaggregate variables in the aggregate model
• Monte Carlo simulations and application to US inflation, which favours direct aggregate model
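For reference, the three one-step-ahead forecasts being compared can be written out as below. This is only a sketch in assumed notation: the weights w_i, the lag order p, the selected subset S of components and the coefficient symbols are illustrative and are not taken from the paper.

```latex
% Assumed notation (illustrative only): aggregate y_t of n components x_{i,t}
% with fixed weights w_i, lag order p, S = subset of components kept in (3).
\begin{align}
y_t &= \sum_{i=1}^{n} w_i \, x_{i,t} \\
% (1) Direct: univariate AR(p) model of the aggregate
\hat{y}^{(1)}_{T+1|T} &= \hat{\mu} + \sum_{j=1}^{p} \hat{\rho}_j \, y_{T+1-j} \\
% (2) Indirect: forecast each component, then aggregate the forecasts
\hat{y}^{(2)}_{T+1|T} &= \sum_{i=1}^{n} w_i \, \hat{x}_{i,T+1|T} \\
% (3) Aggregate model augmented with lagged disaggregate variables
\hat{y}^{(3)}_{T+1|T} &= \hat{\mu} + \sum_{j=1}^{p} \hat{\rho}_j \, y_{T+1-j}
  + \sum_{i \in S} \sum_{j=1}^{p} \hat{\gamma}_{i,j} \, x_{i,T+1-j}
\end{align}
```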
Main results (2/4)
• Forecast error taxonomy after a break (intercept and slope) in a VAR at T (extension of Clements and Hendry, 1998, 2006, in order to use exogenous variables), when projecting at T+1 (eqs. (10) and (16)):
– Parameter shift (long run mean and slope)
– Misspecification: since DGP unknown
– Estimation uncertainty (interaction and innovation error)
• Paper provides analytical results to compare the 3 forecasting methods
Main results (3/4)
• Comparison of aggregate versus aggregated disaggregates:
– Long run mean misspecification is unlikely when the in-sample DGP is constant
– Slope misspecification and estimation uncertainty are the main sources of forecast error differences
– Implications:
• Estimation uncertainty dominates if the DGPs are similar across subcomponents (equal eigenvalues)
• If the mean of the aggregate exhibits very little variability, the mean estimate has lower estimation uncertainty for the aggregate than for some of the components, which implies dominance of the aggregate over the disaggregates
• Adding disaggregates improves forecastability if they contribute to the dynamics of the aggregate, but estimation uncertainty might worsen forecastability, so selection of disaggregates is necessary
• Monte Carlo simulations
– Different dimensions:
• Misspecification: estimate an AR(p) instead of a VAR
• Correlation across components
• Sample size (estimation uncertainty)
– Conclusions: including disaggregates helps forecasting the aggregate if the disaggregates follow different stochastic structures and are interdependent
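A small simulation in the spirit of this design can be sketched as follows. Everything here is an assumption made for illustration, not the paper's setup: a zero-mean bivariate VAR(1) with the hypothetical coefficient matrix `A`, equal weights `w`, AR(1) forecasting models, and the helper names `simulate_var1`, `ols_forecast`, `one_step_forecasts` and `monte_carlo_rmsfe`.

```python
import numpy as np

def simulate_var1(A, T, rng, burn=50):
    """Simulate x_t = A x_{t-1} + e_t with e_t ~ N(0, I) (assumed DGP)."""
    n = A.shape[0]
    x = np.zeros((T + burn, n))
    for t in range(1, T + burn):
        x[t] = A @ x[t - 1] + rng.standard_normal(n)
    return x[burn:]  # drop burn-in observations

def ols_forecast(y, X, x_new):
    """Regress y on [1, X] by least squares and predict at x_new."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return float(np.concatenate([[1.0], np.atleast_1d(x_new)]) @ beta)

def one_step_forecasts(x, w):
    """Three one-step forecasts of the aggregate y = x @ w from sample x."""
    y = x @ w
    # (1) Direct: AR(1) fitted to the aggregate itself.
    f_direct = ols_forecast(y[1:], y[:-1], y[-1])
    # (2) Indirect: AR(1) per component, forecasts aggregated with w.
    f_comp = [ols_forecast(x[1:, i], x[:-1, i], x[-1, i])
              for i in range(x.shape[1])]
    f_indirect = float(np.dot(w, f_comp))
    # (3) Aggregate equation with the lagged aggregate and one lagged component.
    X3 = np.column_stack([y[:-1], x[:-1, 0]])
    f_augmented = ols_forecast(y[1:], X3, np.array([y[-1], x[-1, 0]]))
    return np.array([f_direct, f_indirect, f_augmented])

def monte_carlo_rmsfe(A, w, T=100, reps=500, seed=0):
    """RMSFE of the three forecasts over independent replications."""
    rng = np.random.default_rng(seed)
    errors = np.empty((reps, 3))
    for r in range(reps):
        x = simulate_var1(A, T + 1, rng)
        forecasts = one_step_forecasts(x[:-1], w)  # estimate on first T obs
        errors[r] = x[-1] @ w - forecasts          # error at T+1
    return np.sqrt((errors ** 2).mean(axis=0))

# Hypothetical parameters: heterogeneous, interdependent components.
A = np.array([[0.8, 0.2], [0.0, -0.3]])
w = np.array([0.5, 0.5])
rmsfe = monte_carlo_rmsfe(A, w)
print(dict(zip(["direct AR", "indirect AR", "aggregate + disaggregate"],
               rmsfe.round(3))))
```

Varying `A` (similar versus different dynamics, with or without cross-dependence) and `T` (estimation uncertainty) reproduces the dimensions listed above in a stylised way; the ranking obtained depends on these assumed values.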
Main results (4/4)
• Application to US inflation
– Direct forecast of the aggregate is more accurate than the indirect forecast, consistent with other research by the authors (Hubrich, 2005; Hendry and Hubrich, 2006): smaller bias and smaller forecast error variance
– Estimation uncertainty matters
Questions/remarks (1/4)
• Link with the literature: could do more to explain opposing results in the literature / revisit the literature
– Marcellino, Stock and Watson (2003): disaggregate forecasts are better; Devulder & Chauvin (2007): forecasts on components for euro area inflation are better than for the aggregate, but dummy variables are used, which suggests the DGP is not stable in-sample
– Evidence at the national level in favour of disaggregates: France (Bruneau, de Bandt et al), Netherlands (Reijer et al), Austria (Fritzer et al). Does more volatility at the national level explain such a result?
• Extension to the regional dimension (geographic as opposed to sectoral dimension)
– CPI-U for the US is available at the regional level (4 metropolitan regions: West urban, Midwest urban, South urban and Northeast urban), as well as for the euro area
Questions/remarks (2/4)
• Model Specification
Effect of exogenous variable (as stressed in the theory part):
– What kind of exogenous variables are introduced?
– How does it affect the ranking in 2.3, in particular if there are shifts in the slope parameters (e.g. a flattening of the Phillips curve affects the impact of unemployment on inflation)?
Role of disaggregate information:
– Brüggemann, Lütkepohl & Marcellino (2006): German data are better than synthetic euro area data to forecast euro area inflation if there is no major adjustment in the level
– "if heterogeneity in the DGPs, then forecasting the aggregate directly will be less efficient than aggregating disaggregate univariate or disaggregate forecasts"
Questions/remarks (3/4)
• Protocol for forecast comparison
Data used for factor model? Only price data or other variables (see previous question)?
Presentation of results in Monte Carlo simulations:
– RMSFE of VAR sub = RMSFE of VAR agg, sub ("VAR including the aggregate and one disaggregate performs identically to the VAR with both subcomponents"), so why keep it?
– Stress that, in the comparison of the indirect AR with the direct VAR, the AR is always dominated by the VAR when components are correlated: this reflects misspecification
Comparison of forecasting performance: why no formal test of comparison? The comparison depends on the standard deviation of the RMSE
– Small sample size? Why not increase it?
– Use Diebold-Mariano as modified by Harvey, Leybourne and Newbold (IJF, 1997) or Clark and McCracken (JoE, 2001)
– Table 12 seems to indicate that the FEV is relatively large, so that the forecasts can be seen as relatively similar
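To make the suggested test concrete, a minimal sketch of the Diebold-Mariano statistic with the Harvey, Leybourne and Newbold small-sample correction is given below. It assumes squared-error loss and a rectangular-window long-run variance with h-1 autocovariances; the function name `dm_hln_statistic` and the simulated errors are hypothetical. The statistic is usually compared with Student-t critical values with n-1 degrees of freedom. For nested models, such as an AR against a VAR, the Clark and McCracken approach may be the more appropriate choice; it is not implemented here.

```python
import numpy as np

def dm_hln_statistic(e1, e2, h=1):
    """Modified Diebold-Mariano statistic (squared-error loss assumed).

    e1, e2 : forecast errors of two competing methods on the same targets
    h      : forecast horizon; h-1 autocovariances enter the variance
    A negative value means method 1 has the smaller average loss.
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    d = e1 ** 2 - e2 ** 2              # loss differential
    n = d.size
    d_bar = d.mean()
    dc = d - d_bar
    # Long-run variance: gamma_0 + 2 * sum of the first h-1 autocovariances.
    lrv = dc @ dc / n
    for k in range(1, h):
        lrv += 2.0 * (dc[k:] @ dc[:-k]) / n
    dm = d_bar / np.sqrt(lrv / n)
    # Small-sample correction factor of Harvey, Leybourne and Newbold.
    correction = np.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return correction * dm

# Hypothetical forecast errors: method 2 is noisier than method 1.
rng = np.random.default_rng(0)
errors_1 = rng.standard_normal(200)
errors_2 = 2.0 * rng.standard_normal(200)
stat = dm_hln_statistic(errors_1, errors_2, h=1)
print(round(float(stat), 3))
```

For h = 1 the corrected statistic coincides with the ordinary t-statistic of the mean loss differential, which gives a quick sanity check on the implementation.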
• Absolute forecasting performance?
Should the forecast be lagged?