Nonnested Model Testing for World Politics: Assessing Binary Choice Models[1]

Paper prepared for presentation to the American Political Science Association Conference, September 3-6, 1998, Boston, MA

Kevin A. Clarke
University of Michigan, Department of Political Science
7th floor Haven Hall, Ann Arbor, MI 48109
734 998-1532
kclarke@umich.edu

[1] I thank John Jackson, Susan Murphy, Paul Huth and Bob Pahre for helpful comments and advice. I also thank Paul Huth, Christopher Gelpi and D. Scott Bennett for providing their data. Nick Winters provided programming assistance above and beyond the call of friendship. Edward Czilli was kind enough to read a draft. Errors and omissions remain my own.

Two strategies of theory choice exist in quantitative political science: first-order empiricism and what Levey (1996) has termed second-order empiricism. First-order empiricism considers the extent to which a theory is directly confirmed or falsified by data. The procedure is to derive a model from theory and then use statistical methods to estimate the coefficients associated with the model. The general goal of first-order empiricism is to determine whether or not an independent variable or set of independent variables (a model) has a statistically significant non-zero effect upon a dependent variable. If such an effect is established, we consider the model to be confirmed or at least not falsified.[2] A researcher may find, however, that more than one set of variables or models has a significant, non-zero effect on the dependent variable. Throughout the social sciences, numerous “confirmed” theories usually exist for any given phenomenon (McAleer, 1987). First-order empiricism is a strategy of theory choice only insofar as it allows one to test a null model, in which the independent variables have no effect on the dependent variable, against an alternative model.
A first-order strategy is entirely unable to cope with the problem of contending, non-falsified models because a hypothesis test makes no claims about the alternative models. In addition, claims based on the relative effects of the variables or on the size of the tests cannot be sustained (see the next section for an explanation of these claims). In order to choose between theories, a different methodology is necessary.

[2] Note that this procedure does not conform to traditional hypothesis tests, where the only inference concerns the null hypothesis.

Broadening Levey’s (1996) definition somewhat, second-order empiricism refers to the testing of competing theories directly against one another. This broader definition captures the work of philosophers of science such as Sellars (1963), Lakatos (1978) and Laudan (1977). Only by pitting theories against the data and against one another can we choose between rivals. The general idea behind second-order empiricism is that a “true” theory should be able to account for the explanatory success of its rivals. One version of this principle (the one used here) is that a “true” theory should be able to predict the consequences of its rivals (Sellars, 1963; McAleer, 1987).

First- and second-order empiricism answer distinctly different questions, and one cannot replace the other. The relationship between the two strategies is sequential. First-order empiricism comes first and identifies the theories that are serious contenders in the explanation of an empirical problem. These theories are said to have an “empirical grip” on the problem (Levey, 1996). Once these contending theories have been established, they must be tested against one another, and this is where a second-order strategy becomes necessary. Empirical work that does not attempt such second-order testing “quickly degenerates into a sort of mindless instrumentalism” (Blaug, 1980).
Too often, the goals of second-order empiricism have taken a back seat to those of first-order empiricism. Granger (1990) argues that estimation is “a question to which econometricians have paid an undue amount of attention in the past 50 years.” A quick glance at the econometrics texts commonly used to train political scientists bears out the claim that testing competing theories is secondary to estimation (see Hanushek and Jackson, 1977; Kmenta, 1986; King, 1989; Greene, 1993; and Fox, 1997). One reason for this imbalance is that the theory of estimation stands on firmer philosophical and theoretical ground than the theory of hypothesis testing (see Howson and Urbach, 1993).

In those cases where a second-order strategy has been employed, it has almost always been used in the service of nested models, where one model is a special case of the other. The testing of nonnested models, where neither model is a special case of the other, has been almost entirely overlooked.[3] The econometrics texts noted above contain either no discussion or only a brief discussion of nonnested model testing. Furthermore, on those rare occasions where nonnested tests have been used, they have been used exclusively for linear models (McAleer, 1995). When faced with nonnested discrete choice models, empirical researchers have resorted to informal and ad hoc decision criteria.[4]

The major goal of this project is to introduce and develop a methodology of nonnested hypothesis testing that researchers in world politics (and by extension the rest of political science) will find useful. I make use of both the Cox test for nonnested hypotheses and the Vuong test for nonnested model selection. I argue for a sequential approach in which the Vuong test is used depending upon the outcome of the Cox test. In keeping with the goal of making this methodology useful for world politics research, I discuss both tests in the context of binary choice models, specifically probits.
I apply the methodology developed here to the problem of testing alternative models of the escalation of great power militarized disputes.

[3] See the next section for more precise definitions of nested and nonnested models.
[4] These include the use of “supermodels,” F-tests, and individual coefficient tests.

Nonnested Models and Problematic Tests

Two models are nested if one model can be reduced to the other by imposing restrictions on certain parameters. For example,

$$H_0: Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon_0$$
$$H_1: Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon_1$$

are nested models because imposing the restriction β3 = 0 turns H1 into H0. In other words, H1 encompasses H0. Discriminating between these models simply involves testing the restriction on β3. This test can be done with a t-test under ordinary least squares (Greene, 1993) or with a likelihood-ratio test under either least squares or maximum likelihood (King, 1989). If H1 also included a term β4X4, an F-test of the joint restriction β3 = β4 = 0 would be appropriate (Greene, 1993).

Two models are nonnested (or separate) if neither model can be reduced to the other by imposing restrictions on certain parameters.[5] For example,

$$H_0: Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon_0$$
$$H_1: Y = \beta_6 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon_1$$

are nonnested models because even if we impose the restrictions β4 = 0 and β5 = 0, H1 does not become H0. Testing such models is the subject of this project.

Numerous examples of nonnested models exist in the world politics literature. One of the clearest examples of strictly nonnested rival models in political science research can be found in Huth, Gelpi, and Bennett (1993).[6] In this paper, the authors test two rival explanations for the escalation of great power militarized disputes: structural realism and rational deterrence theory. Their operationalization of structural realism includes:[7]

• System uncertainty (size)
• System size*risk
• System uncertainty (diffusion)
• System diffusion*risk
• Risk acceptant

Their operationalization of deterrence theory includes:

• Balance of forces
• Secure 2nd strike
• Defender vital interests
• Challenger vital interests
• Defender backed down
• Challenger backed down
• Defender other dispute
• Challenger other dispute

These models are decidedly nonnested; no set of parametric restrictions will reduce one model to the other. Huth et al. (1993) test these rival theories by combining the variables from both models into one large equation, a “supermodel.” The results seem unambiguous: six of the eight deterrence variables are conventionally significant, while only one of the five structural realist variables is. Nonnested model testing appears to be unnecessary. As we shall see later on, however, the “supermodel” approach in this case leads to mistaken inferences.

[5] Models can also be nonnested in terms of their functional forms and error structures. This paper is concerned with selecting regressors.
[6] Strictly nonnested models have no overlapping variables.
[7] The authors operationalize structural realism in other ways as well.

A similar situation can be found in Bennett (1997). In this paper, Bennett tests four alternative models of alliance duration: capability aggregation, security-autonomy, domestic politics, and institutionalization. Bennett first tests the models individually and finds that each of them has significant first-order empirical support. He is correct in stating that simply testing the models individually is not enough. Only limited information can be drawn from the fact that the null hypothesis is rejected in each of the four individual tests. First, rejecting the null does not imply acceptance of the alternative. The only valid inference that can be made from a significance test concerns the null hypothesis.
The alternative hypothesis is there only to provide high power in a particular direction, and any number of alternative hypotheses could produce the same effect (Dastoor, 1985). Rejection of the null, therefore, does not imply acceptance of the alternative. Second, we cannot compare the individual results in terms of their effects on the dependent variable. Variables may be compared only if they are measured on the same scale (Achen, 1982). To draw an example from Bennett, the effect on the dependent variable of a unit, percent, or standard deviation change in “liberal” cannot be compared with the effect of a unit, percent, or standard deviation change in “capability change.” Third, the size of the tests does not allow any inferences to be made regarding the alternative model. As Howson and Urbach (1993) point out, there is no theory that connects the size of a test to a measure of inductive support. The fact that a null in one test is rejected at the 5% level and the same null in another test is rejected at the 1% level has no legitimate implications for either the null or the alternative models. First-order empiricism simply cannot handle the problem of competing nonnested models.

Bennett then attempts a “simultaneous” test of the models using the supermodel approach. The results of Bennett’s analysis are reproduced in Table 1.

Table 1
________________________________________________________________________
Variable                              Coefficient    S.E.      Significance
________________________________________________________________________
Constant                                  4.81      (0.350)
Capability Aggregation
   Change in security                     0.101     (0.018)     p<0.01
   Alliance Security Improvement          0.011     (0.007)     p<0.1
   Mutual Threat                         -0.059     (1.02)
Security-Autonomy
   Capability Change                    -62.6      (11.2)       p<0.01
   Symmetry                              -0.083     (0.232)
   Capability Concentration               0.929     (0.391)     p<0.01
Domestic Politics
   Liberal                                0.781     (0.284)     p<0.01
   Polity Change                         -0.052     (0.302)
Other Variables
   Number of States                       0.078     (0.038)     p<0.05
   Wartime                               -0.567     (0.400)     p<0.1
   War Termination                       -0.279     (0.190)     p<0.1
Institutionalization
   p (shape parameter)                    1.44      (0.180)     p<0.01
   θ (heterogeneity parameter)            0.194     (0.192)
Log-Likelihood                         -665.1
________________________________________________________________________

In interpreting these results, Bennett notes that “each individual theoretical model has at least one significant variable.” He goes on to argue that three of the four models, as well as the control variables, seem to have some effect on alliance duration. This information is fine as far as it goes. If we want to know whether any of these models is close to the truth, however, nonnested testing is necessary. The hypothesis testing procedures used in the nested case will not test these models. The “comprehensive” model takes the following general form:

$$Y = X\beta + Z\gamma + P\delta + S\psi + T\nu + \varepsilon$$

where X contains the variables from the capability aggregation model, Z the variables from the security-autonomy model, P the variables from the domestic politics model, S the variables from the institutionalization model, and T the control variables. Distinguishing between the models requires direct tests of each model against the others. The problem with an F-test approach is that model 1 is not tested against model 2, model 3, or model 4; rather, model 1 is tested against a hybrid of the remaining models (Kmenta, 1986; Greene, 1993).
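The nested tests that the supermodel approach relies on all reduce, under maximum likelihood, to comparing a restricted fit with an unrestricted one. The sketch below is a standard likelihood-ratio test of q exclusion restrictions (for instance, γ = 0 in the comprehensive model), written in dependency-free Python; the function names are illustrative and are not taken from the paper. Note that it tests one block of variables against everything else in the equation, which is exactly the “hybrid” limitation described above.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2 start the recurrence.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Q_{k+2}(x) = Q_k(x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def lr_test(loglik_restricted: float, loglik_unrestricted: float, n_restrictions: int):
    """Likelihood-ratio test of q exclusion restrictions in nested models.

    LR = 2 (log L_unrestricted - log L_restricted), asymptotically chi-square(q).
    Returns (LR, p-value).
    """
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    return lr, chi2_sf(lr, n_restrictions)
```

For example, `lr_test(-100.0, -95.0, 2)` gives LR = 10 and p = e^(-5), roughly 0.0067. Nothing in this calculation says anything about how the excluded block compares with a rival model on its own.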
Further problems with the F-test include low power and the effects of multicollinearity (Huth et al. have thirteen variables with only two models). The approach more commonly seen in political science research, and demonstrated by both Huth et al. (1993) and Bennett (1997), is the testing of individual coefficients. When testing nested models, we choose the F-test when we want to test a complete model against an alternative one, generally the null (Hanushek and Jackson, 1977). The reasons for testing complete models are the same in the nested and nonnested cases. In neither case can individual coefficient tests establish that β = 0 or γ = 0. In fact, the individually significant variables are often spread across the models, as in Bennett (1997). The common response to this situation, which can be seen throughout the world politics literature, is an inductive decision to combine the models despite having no theoretical justification for doing so.[8]

[8] Bennett (1997) does argue that the four models are not mutually exclusive, but that is not the same as arguing that combining the models is theoretically justified.

It is also possible to discriminate between nonnested models using selection criteria such as R², Akaike’s information criterion (AIC), or Schwarz’s Bayesian information criterion (BIC) (Sawa, 1978). While these methods are adequate when adjusted for the nonnested case, they are not hypothesis tests and hence answer a different question. Selection criteria measure how well the models fit the data, taking parsimony into account. The logic is that the best-predicting model is closest to the true specification (McAleer, 1987). There are two problems with using these criteria. First, these methods do not allow a probabilistic statement to be made regarding model selection, hence the distinction between “absolute” (hypothesis tests) and “relative” (selection criteria) discrimination (McAleer, 1987).
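To make the point concrete, AIC and BIC are simply penalized log-likelihoods, and selecting a model means taking the smallest value. The minimal sketch below assumes each model is summarized by its maximized log-likelihood and parameter count; the function names and the numbers in the usage note are hypothetical. It returns a ranking and a winner, but nothing resembling a probability statement.

```python
import math


def information_criteria(loglik: float, k: int, n: int) -> dict:
    """AIC and BIC for a model with maximized log-likelihood loglik and k parameters."""
    return {
        "AIC": -2.0 * loglik + 2.0 * k,
        "BIC": -2.0 * loglik + k * math.log(n),
    }


def select_model(fits: dict, n: int, criterion: str = "BIC") -> str:
    """Return the model with the smallest criterion value.

    fits maps a model name to (maximized log-likelihood, number of parameters).
    A model is always returned, however badly every candidate fits, and the
    choice carries no probability statement.
    """
    scores = {name: information_criteria(ll, k, n)[criterion] for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get)
```

With hypothetical fits `{"A": (-120.0, 3), "B": (-115.0, 6)}` and n = 200, AIC prefers B (242 against 246) while BIC prefers A (about 255.9 against 261.8). Either way a model is chosen, and nothing indicates whether the choice is reliable or whether both models are poor.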
The second problem, which is a consequence of the first, is that these methods will always choose a model, even if both models are “bad.” If both models are seriously misspecified, we would like to be able to reject both. The Vuong model selection test is a serious improvement over previous selection methods because it does allow a probabilistic statement to be made. It is important to remember, however, that the Vuong test is still a model selection test: if it chooses a model, it will choose the model that is closest to the true specification even if both are far away from that specification.

The Methodology

The literature on nonnested hypothesis testing stems from the seminal work of David Cox (1961, 1962). The philosophy behind Cox’s work is that a true model should be able to predict the performance of specific alternatives. The idea is to compare the actual performance of the alternative model with the expected performance of the alternative model under the null hypothesis (McAleer, 1987). A true null should not distort the actual performance of the alternative model.

The mathematics behind Cox’s innovation is a generalization of the familiar likelihood-ratio test statistic. The modified statistic is the difference between the log-likelihood ratio and the expected log-likelihood ratio under the null hypothesis.[9] That is, if $L_0(\hat\alpha_0)$ is the maximum value of the likelihood of a sample of y values when H0 is postulated and $L_1(\hat\alpha_1)$ is the maximum value of the likelihood of the same sample when H1 is postulated, then the Cox test statistic is:

$$\hat{l}_{10} = \log L_0(\hat\alpha_0) - \log L_1(\hat\alpha_1)$$
$$T_0 = \hat{l}_{10} - \left[E_{\alpha_0}(\hat{l}_{10})\right]_{\alpha_0 = \hat\alpha_0}$$
$$N_0 \equiv \frac{T_0}{[V(T_0)]^{1/2}} \sim N(0, 1)$$

Cox’s statistic is sometimes referred to as a centered likelihood ratio; N0 is asymptotically standard normal.
The major problem with applying the Cox statistic to models other than least squares models is calculating the expected value of the log-likelihood ratio under the null hypothesis. Recent studies (Pesaran, 1987; Pesaran and Pesaran, 1993; Gourieroux and Monfort, 1995) have focused on finding measures of closeness that approximate the expected value under the null. The common choice among these authors is the Kullback-Leibler information criterion (KLIC), which is defined as:

$$\int_R \log\left[\frac{f(y, \theta)}{g(y, \gamma)}\right] f(y, \theta)\, dy$$

where R stands for the range of variation of Y under H0. The KLIC is the mean information for discrimination in favor of f(y, θ) against g(y, γ) (Kullback, 1959). It can be interpreted as the surprise experienced on average when we believe that g(y, γ) is the data generating process (DGP) and then find that f(y, θ) is the DGP (White, 1994).

[9] Both models must serve as the null hypothesis.

Justifications for using the KLIC generally focus on its analytic tractability and important properties (Pesaran, 1987).[10] Where the KLIC cannot be derived analytically, a simulation approach is necessary (Pesaran and Pesaran, 1993).[11] Following the approach used by Pesaran (1987) and Pesaran and Pesaran (1993), the numerator of the test statistic is:

$$L_f(Y, \hat\theta) - L_g(Y, \hat\gamma) - C(\hat\theta, \hat\gamma_*(R))$$

where $C(\hat\theta, \hat\gamma_*(R))$ is the KLIC. Notice that $\hat\gamma_*$ is the maximum likelihood estimator of model 2 assuming that model 1 is the actual data generating process. $\hat\gamma_*$ is therefore a pseudo-maximum likelihood estimator.
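For a binary outcome, the integral in the KLIC collapses to a sum over the two possible outcomes, y = 1 and y = 0, so both pieces of the numerator can be computed from fitted probabilities alone. The sketch below assumes the fitted probabilities of the two probits, and model g’s probabilities at the simulated pseudo-ML estimate, are already available as lists; the function names are hypothetical.

```python
import math


def bernoulli_klic(p: float, q: float) -> float:
    """KLIC of a Bernoulli(q) model when Bernoulli(p) generates the data.

    The integral over y becomes a sum over y = 1 and y = 0:
    p log(p/q) + (1 - p) log((1 - p)/(1 - q)). Requires 0 < p, q < 1.
    """
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def cox_numerator(y, phi1, phi2, phi2_star):
    """Numerator of the Cox statistic for two probits, with model f as the null.

    y: observed 0/1 outcomes; phi1: fitted probabilities from model f;
    phi2: fitted probabilities from model g; phi2_star: model g's probabilities
    at the (simulated) pseudo-ML estimate. All probabilities must lie in (0, 1).
    """
    n = len(y)
    # Average observed log-likelihood ratio, L_f - L_g.
    observed = sum(
        yt * math.log(p1 / p2) + (1 - yt) * math.log((1.0 - p1) / (1.0 - p2))
        for yt, p1, p2 in zip(y, phi1, phi2)
    ) / n
    # Its expectation under model f: the average per-observation KLIC.
    expected = sum(bernoulli_klic(p1, ps) for p1, ps in zip(phi1, phi2_star)) / n
    return observed - expected
```

The per-observation KLIC is nonnegative and equals zero only when the two probabilities coincide, so a large positive numerator signals that g fits the observed data worse than f says it should.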
Written out completely, the numerator for testing two rival nonnested probits is:[12]

$$T_0(R) = N^{-1}\sum_{t=1}^{N}\left[Y_t \log\left(\frac{\hat\Phi_{t1}}{\hat\Phi_{t2}}\right) + (1 - Y_t)\log\left(\frac{1 - \hat\Phi_{t1}}{1 - \hat\Phi_{t2}}\right)\right] - N^{-1}\sum_{t=1}^{N}\left[\hat\Phi_{t1}\log\left(\frac{\hat\Phi_{t1}}{\hat\Phi^{*}_{t2}(R)}\right) + (1 - \hat\Phi_{t1})\log\left(\frac{1 - \hat\Phi_{t1}}{1 - \hat\Phi^{*}_{t2}(R)}\right)\right]$$

where $\hat\Phi^{*}_{t2}(R)$ is model 2’s predicted probability evaluated at the pseudo-ML estimator, and R stands for the number of repetitions used to simulate the estimator. (An informal discussion of the mathematics behind these results can be found in the appendix.) As noted earlier, the resulting statistic, divided by its standard error, has a standard normal distribution asymptotically.

[10] The KLIC is invariant to transformations of θ and γ, is nonnegative, and equals 0 when f(y, θ) and g(y, γ) coincide. The KLIC is also additive for independent random events (Kullback, 1959; Pesaran, 1987).
[11] For binary choice models, an analytical derivation of the KLIC is possible and we need only simulate the pseudo-ML estimator.
[12] See the appendix for the variance.

The results of a nonnested hypothesis test are not always unambiguous. As there is generally no reason to assume that either rival model is the null, each model must, in turn, take that role. Four outcomes are therefore possible: one or the other model may be rejected, both models may be rejected, or neither model may be rejected. It is also important to remember that if the null is rejected, it is not rejected in favor of the alternative. The possibility of rejecting both models without a hint of where to go next has led some to criticize the Cox test (see Granger et al., 1995). The fact that both models may be rejected, or that neither may be, should not, however, be taken as a weakness of the test or as inconsistent with first-order empiricism. The rejection of both models implies that neither model could predict the results of the other. We should conclude, then, that both models are misspecified in some way.
Such a result is not inconsistent with first-order support for the models, as even misspecified models may be strong enough to reject a null hypothesis of no effect. What to do if both models are rejected is, of course, a natural question. It is in this situation that Vuong’s model selection test will prove useful. Short of inventing better theories, we might want to choose the best of a bad lot of models and work at respecifying it. Similarly, if we end up accepting both models and we are convinced that one of them is the true specification, we might use model selection criteria to pick the model that is closest to the true specification.

Vuong’s test (1989) also makes use of the KLIC. Vuong defines the KLIC as:

$$\mathrm{KLIC} \equiv E^0[\log h^0(Y_t \mid X_t)] - E^0[\log f(Y_t \mid X_t; \theta_*)]$$

where $h^0(\cdot \mid \cdot)$ is the true conditional density of $Y_t$ given $X_t$, and $\theta_*$ are the pseudo-true values of θ. Because the first term does not depend on the model, the last term in the above equation serves as an equivalent measure of the KLIC for the purpose of comparing models (Vuong, 1989). The “best” model is therefore the model that maximizes $E^0[\log f(Y_t \mid X_t; \theta_*)]$. The null hypothesis of Vuong’s test is:

$$H_0: E^0\left[\log\frac{f(Y_t \mid X_t; \theta_*)}{g(Y_t \mid Z_t; \gamma_*)}\right] = 0$$

meaning that the two models are equivalent. The alternative hypotheses are:

$$H_f: E^0\left[\log\frac{f(Y_t \mid X_t; \theta_*)}{g(Y_t \mid Z_t; \gamma_*)}\right] > 0$$
$$H_g: E^0\left[\log\frac{f(Y_t \mid X_t; \theta_*)}{g(Y_t \mid Z_t; \gamma_*)}\right] < 0$$

meaning that model f is better than g, or that model f is worse than g, respectively. The expected value in the above hypotheses is unknown. Vuong demonstrates, however, that under fairly general conditions:

$$\frac{1}{n} LR_n(\hat\theta_n, \hat\gamma_n) \xrightarrow{a.s.} E^0\left[\log\frac{f(Y_t \mid X_t; \theta_*)}{g(Y_t \mid Z_t; \gamma_*)}\right]$$

which states that the expected value can be consistently estimated by (1/n) times the likelihood ratio statistic. Vuong’s test is therefore simply the log-likelihood ratio, suitably normalized.
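The statistic needs only each model’s log-likelihood contribution for every observation. The minimal sketch below assumes those contributions have already been computed from the two fitted models; the function name is hypothetical, and the default cutoff of 1.96 corresponds to a two-sided test at the 5% level.

```python
import math


def vuong_test(loglik_f, loglik_g, critical: float = 1.96):
    """Vuong's statistic for nonnested models from per-observation log-likelihoods.

    loglik_f[t] = log f(Y_t | X_t; theta_hat), loglik_g[t] = log g(Y_t | Z_t; gamma_hat).
    Returns (z, two-sided p-value, verdict), where verdict is "f", "g", or "equivalent".
    """
    n = len(loglik_f)
    m = [lf - lg for lf, lg in zip(loglik_f, loglik_g)]  # pointwise log ratios
    lr = sum(m)  # LR_n(theta_hat, gamma_hat)
    mean = lr / n
    # omega_hat^2: mean squared deviation of the pointwise log ratios from their mean.
    omega = math.sqrt(sum((mt - mean) ** 2 for mt in m) / n)
    if omega == 0.0:
        raise ValueError("log ratios do not vary; the statistic is undefined")
    z = lr / (math.sqrt(n) * omega)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal p-value
    if z > critical:
        verdict = "f"
    elif z < -critical:
        verdict = "g"
    else:
        verdict = "equivalent"
    return z, p, verdict
```

Swapping the two arguments flips the sign of z, so the test is symmetric in the two models; neither has to be designated the null.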
If this value is greater than a critical value c taken from the standard normal distribution (for example, 1.96 for a two-sided test at the 5% level), the null of equivalence is rejected in favor of model f. If it is less than -c, the null of equivalence is rejected in favor of model g. The normalization can be obtained from the sum of squared deviations of $\log[f(Y_t \mid X_t; \hat\theta_n)/g(Y_t \mid Z_t; \hat\gamma_n)]$ from its sample mean, which is $\frac{1}{n} LR_n(\hat\theta_n, \hat\gamma_n)$ (Vuong, 1989).

The methodology I propose, then, is a sequential one. Staying with the case of two models, if theoretically well-developed rival models have passed a first-order hurdle, a nonnested hypothesis test should be used to see whether either of these models is consistent with the data. If only one model survives the test, the analysis can end there. If both models are rejected, a model selection test should be done to choose, if possible, the model that is closest to the true specification. If both models are accepted and the researcher believes that the set of models contains the true specification, again a model selection test should be performed. If the null hypothesis of equivalence cannot be rejected, then the data cannot discriminate between the models.

The Application

Realism has been the dominant theoretical position in international relations for the last fifty years. Structural realism (Waltz, 1979) has been the dominant brand of realism for the past twenty years. Rational deterrence theory (Schelling, 1960), however, has been a serious contender for that position. Properly specified, both theories have demonstrated an empirical grip on important problems in world politics (Huth, Bennett, and Gelpi, 1992). Efforts to test these theories against one another have ignored the fact that the theories are nonnested and that standard model selection techniques are therefore inappropriate.
Structural realism and rational deterrence theory are in fact strictly nonnested (no overlapping variables) because the theories are drawn from different levels of analysis (the systemic and the dyadic). The most systematic attempt to test these theories against one another can be found in Huth et al. (1993). In that article, Huth and his co-authors conceptualize structural realism in terms of the amount of uncertainty created by the structure of the international system. To connect the amount of uncertainty in the system to actual decisions taken by state leaders, the authors interact uncertainty with the risk propensities of these decision-makers. When uncertainty is high, risk-acceptant leaders will pursue policies that might spark armed conflict, while risk-averse leaders will likely be more cautious (Huth et al., 1993). Structural realism, then, is operationalized by two composite measures of uncertainty (size and capability diffusion), a measure of risk propensity, and two interaction terms (one for each measure of uncertainty).[13]

As for rational deterrence theory, Huth et al. (1993) argue that “the credibility of the threat is the primary determinant of deterrence success or failure.” Credibility is affected by the balance of military capabilities, the interests at stake for the states involved, the past dispute behavior of the states, and whether either state is engaged in another dispute at the same time. Deterrence is more likely to fail as the balance of capabilities and the interests at stake shift toward the challenger. Deterrence is also more likely to fail if the defender has backed down in a previous dispute or is engaged in a dispute elsewhere. Huth et al. (1993) test their models against one another using the “supermodel” approach.
They conclude that “rational deterrence theory provides a much more compelling explanation of great-power decisions to escalate militarized disputes than does structural realism.” Their conclusion is based upon the results of the individual t-tests of their coefficients. Only one of the coefficients in the structural realist model is significant, and it is in the wrong direction. Their results are reproduced in Table 2.

Table 2
________________________________________________________________________
Variable                              Coefficient    S.E.      Significance
________________________________________________________________________
Constant                                 -0.71      (1.37)
Structural Realism
   System uncertainty 1 (size)            0.21      (0.32)
   System size x risk                    -0.97      (0.36)      p<0.025
   System uncertainty 2 (diffusion)      -0.20      (0.24)
   System diffusion x risk                0.18      (0.32)
   Risk-acceptant                         1.55      (1.54)
Deterrence Theory
   Balance of forces                      1.73      (0.94)      p<0.05
   Secure 2nd strike                     -2.33      (0.83)      p<0.01
   Defender vital interests              -1.29      (0.46)      p<0.01
   Challenger vital interests             1.09      (0.44)      p<0.01
   Defender backed down                   1.23      (0.46)      p<0.01
   Challenger backed down                -0.72      (0.57)      p<0.15
   Defender other dispute                 0.96      (0.42)      p<0.025
   Challenger other dispute               0.05      (0.41)
________________________________________________________________________

As noted previously, however, there is no formal theory that connects the significance level of a hypothesis test with a measure of inductive support. To actually test these models against one another, a different methodology is necessary. The results of the Cox test are presented in Table 3. I cannot reject the null hypothesis in either case. The inference to be drawn is that neither model is seriously misspecified. Each model can predict the consequences of its rival, and therefore both should be accepted. This result calls the main conclusion of Huth et al. (1993) into question (namely, that one model is significantly more compelling than the other).
[13] Huth et al. (1993) actually test five separate models of structural realism. I have focused solely on their most comprehensive model (#5).

Table 3
__________________________________________________________
             |   Model One as Null     |   Model Two as Null
Sim #    n   |  Z Stat   2-tailed sig. |  Z Stat   2-tailed sig.
__________________________________________________________
  1     300  | -0.6747      0.4999     | -0.1028      0.9181
  2     400  | -0.6727      0.5011     | -0.1019      0.9189
  3     500  | -0.6757      0.4992     | -0.1042      0.9170
__________________________________________________________

Whether or not one of these models is significantly closer to the true specification remains a question of interest. If one model is chosen over the other, powerful new evidence has entered the debate. As noted earlier, the Vuong test statistic is:

$$\frac{LR_n(\hat\theta_n, \hat\gamma_n)}{n^{1/2}\hat\omega_n}$$

The numerator of the statistic is the difference in the maximum log-likelihood values for the two models:

$$LR_n(\hat\theta_n, \hat\gamma_n) = -55.933 - (-45.558) = -10.375$$

The denominator is the square root of the sum of squared deviations of $\log[f(Y_t \mid X_t; \hat\theta_n)/g(Y_t \mid Z_t; \hat\gamma_n)]$ from its sample mean, $\frac{1}{n} LR_n(\hat\theta_n, \hat\gamma_n)$ (Vuong, 1989).[14] That is,

$$n^{1/2}\hat\omega_n = \left\{\sum_{t=1}^{n}\left[\log\frac{f(Y_t \mid X_t; \hat\theta_n)}{g(Y_t \mid Z_t; \hat\gamma_n)} - \frac{1}{n} LR_n(\hat\theta_n, \hat\gamma_n)\right]^2\right\}^{1/2} = 20.214$$

The test statistic is therefore:

$$\frac{LR_n(\hat\theta_n, \hat\gamma_n)}{n^{1/2}\hat\omega_n} = \frac{-10.375}{20.214} = -0.513$$

The p-value is 0.608, meaning that the null hypothesis of equivalence cannot be rejected at any conventional significance level. The data, then, do not discriminate between the structural realist model and the rational deterrence model of the escalation of great power militarized disputes. The results of this analysis are important because Huth et al. end their article with recommendations to great powers that turn on the rational deterrence model being the more compelling explanation.
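The arithmetic reported above can be checked directly from the figures given in the text. This short script uses only those reported values (no data) and the standard normal distribution via `math.erfc`; it also converts the Cox Z statistics in Table 3 into two-tailed significance levels.

```python
import math

# Values reported in the text for the Vuong test.
loglik_f, loglik_g = -55.933, -45.558
scaled_omega = 20.214  # n^(1/2) * omega_hat_n, as reported

lr = loglik_f - loglik_g
z = lr / scaled_omega
p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value, standard normal
print(f"LR = {lr:.3f}, z = {z:.3f}, p = {p:.3f}")

# Two-tailed significance levels implied by the Cox Z statistics in Table 3.
table3 = {1: (-0.6747, -0.1028), 2: (-0.6727, -0.1019), 3: (-0.6757, -0.1042)}
for sim, (z_one, z_two) in table3.items():
    p_one = math.erfc(abs(z_one) / math.sqrt(2.0))
    p_two = math.erfc(abs(z_two) / math.sqrt(2.0))
    print(f"Sim {sim}: {p_one:.4f} {p_two:.4f}")
```

The first line printed is `LR = -10.375, z = -0.513, p = 0.608`, matching the text; the Table 3 levels agree with the reported ones up to rounding of the published Z values.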
My results indicate that the models are equivalent in terms of their distance from the true specification. This result is more in keeping with the conclusion of Huth et al. (1992).[15]

[14] Vuong provides two methods of generating the normalization. The results from both methods are essentially the same.
[15] In that paper, the authors concluded that both models are compelling.

Caveats and Directions for Future Research

Two caveats are in order. First, the data to which these tests were applied are binary time-series cross-section data, so the observations are unlikely to be independent. The effect of this lack of independence on these test statistics is unknown and constitutes a fruitful area for further research. Note, however, that the same lack of independence also affects the original Huth et al. analysis. Second, the Cox test is not a useful tool for model building.[16] Since the null is not rejected in favor of the alternative, a rejection carries no model-building advice. Nor is the Cox test a silver bullet; it is simply one method of implementing a second-order empirical strategy when faced with rival models. It will not definitively settle debates in world politics. It will, however, increase the amount of evidence available to those debates.

[16] The Vuong test can be used for this purpose.

Conclusion

The purpose of this research is to provide applied researchers in world politics with a method of testing nonnested binary choice models against one another. The method I have proposed is a sequential one. The Cox test should be used with theoretically well-developed rival models to determine whether any of the models is “close” to the truth. In the case where all models are rejected, the Vuong test should be used to choose the model that is closest to the true specification. Work can then begin on respecifying that model. If the null of equivalence cannot be rejected, it may be necessary to find new models. In the case where more than one model is accepted, the Vuong test should again be used.
If the null of equivalence cannot be rejected, a synthesis of the models may be in order.

Appendix: The Cox Test Using the KLIC

Consider the following univariate binary choice models:

$$H_f: \; P(y_t = 1 \mid x_t) = \Phi(\theta' x_t) = \int_{-\infty}^{\theta' x_t} \phi(v)\,dv$$

$$H_g: \; P(y_t = 1 \mid z_t) = \Phi(\gamma' z_t) = \int_{-\infty}^{\gamma' z_t} \phi(v)\,dv$$

where $\Phi(\theta' x_t) = \Phi_{t1}$ and $\Phi(\gamma' z_t) = \Phi_{t2}$ are values of the standard normal cumulative distribution function and $\phi(\cdot)$ is the density function of the standard normal variate. The average log-likelihood functions under $H_f$ and $H_g$ are:

$$H_f: \; L_f(Y, \theta \mid x) = N^{-1} \sum_{t=1}^{N} \left\{ Y_t \log \Phi_{t1} + (1 - Y_t) \log(1 - \Phi_{t1}) \right\}$$

$$H_g: \; L_g(Y, \gamma \mid z) = N^{-1} \sum_{t=1}^{N} \left\{ Y_t \log \Phi_{t2} + (1 - Y_t) \log(1 - \Phi_{t2}) \right\}$$

where $Y = (Y_1, Y_2, \ldots, Y_N)'$ is the $N \times 1$ vector of observations on $y$. The Cox test statistic when $H_f$ is the null is:

$$N_f \equiv \frac{T_f}{[V(T_f)]^{1/2}}$$

The numerator of the Cox test statistic is:

$$T_f(R) = L_f(Y, \hat\theta) - L_g(Y, \hat\gamma) - C(\hat\theta, \hat\gamma_*(R))$$

where

$$C(\theta, \gamma_*) = \int \log\left[\frac{f(y, \theta_0)}{g(y, \gamma_*)}\right] f(y, \theta_0)\,dy$$

That is, the numerator is the average log-likelihood of the first model minus the average log-likelihood of the second model minus the Kullback-Leibler information criterion (KLIC). Let $d_t$ denote the contribution of observation $t$ to the first two terms of the numerator, so that $L_f(Y, \hat\theta) - L_g(Y, \hat\gamma) = N^{-1} \sum_{t=1}^{N} d_t$. Then

$$\begin{aligned}
d_t &= [Y_t \log \Phi_{t1} + (1 - Y_t)\log(1 - \Phi_{t1})] - [Y_t \log \Phi_{t2} + (1 - Y_t)\log(1 - \Phi_{t2})] \\
&= Y_t \log \Phi_{t1} + (1 - Y_t)\log(1 - \Phi_{t1}) - Y_t \log \Phi_{t2} - (1 - Y_t)\log(1 - \Phi_{t2}) \\
&= Y_t \log \Phi_{t1} - Y_t \log \Phi_{t2} + (1 - Y_t)\log(1 - \Phi_{t1}) - (1 - Y_t)\log(1 - \Phi_{t2}) \\
&= Y_t(\log \Phi_{t1} - \log \Phi_{t2}) + (1 - Y_t)[\log(1 - \Phi_{t1}) - \log(1 - \Phi_{t2})] \\
&= Y_t \log\left(\frac{\Phi_{t1}}{\Phi_{t2}}\right) + (1 - Y_t)\log\left(\frac{1 - \Phi_{t1}}{1 - \Phi_{t2}}\right)
\end{aligned}$$

and therefore

$$N^{-1}\sum_{t=1}^{N} d_t = N^{-1}\sum_{t=1}^{N}\left[Y_t \log\left(\frac{\Phi_{t1}}{\Phi_{t2}}\right) + (1 - Y_t)\log\left(\frac{1 - \Phi_{t1}}{1 - \Phi_{t2}}\right)\right]$$

In calculating the third term of the numerator, we must note that:
1. The expectation of a sum is equal to the sum of the expectations.
2. Each of the summands here is a function of only one observation.
3. Under $H_f$, $P(Y_t = 1) = \Phi_{t1}$ and $P(Y_t = 0) = 1 - \Phi_{t1}$.

Looking at a single observation (say $t = 1$; the argument is the same for every $t$), the expectation of the log-likelihood ratio under the null, $E_f(d_t)$, is the value of $d_t$ at $Y_t = 0$ times the probability that $Y_t = 0$, plus the value of $d_t$ at $Y_t = 1$ times the probability that $Y_t = 1$. When $Y_t = 0$,

$$Y_t \log\left(\frac{\Phi_{t1}}{\Phi_{t2}}\right) + (1 - Y_t)\log\left(\frac{1 - \Phi_{t1}}{1 - \Phi_{t2}}\right)$$

reduces to $\log\left(\frac{1 - \Phi_{t1}}{1 - \Phi_{t2}}\right)$. When $Y_t = 1$, the same expression reduces to $\log\left(\frac{\Phi_{t1}}{\Phi_{t2}}\right)$. Multiplying by the appropriate probabilities and summing gives:

$$E_f(d_t) = \Phi_{t1} \log\left(\frac{\Phi_{t1}}{\Phi_{t2}}\right) + (1 - \Phi_{t1})\log\left(\frac{1 - \Phi_{t1}}{1 - \Phi_{t2}}\right)$$

The Kullback-Leibler information criterion is therefore estimated by:

$$\frac{1}{N}\sum_{t=1}^{N}\left[\hat\Phi_{t1}\log\left(\frac{\hat\Phi_{t1}}{\hat\Phi^*_{t2}}\right) + (1 - \hat\Phi_{t1})\log\left(\frac{1 - \hat\Phi_{t1}}{1 - \hat\Phi^*_{t2}}\right)\right]$$

The only problem left is estimating $\Phi_{t2}$ under the null hypothesis, that is, computing $\hat\Phi^*_{t2}$. A simulation technique used by Pesaran and Pesaran (1993) for this purpose can be found below. The complete numerator of the Cox statistic is therefore built from

$$L_f(Y, \hat\theta) - L_g(Y, \hat\gamma) = N^{-1}\sum_{t=1}^{N}\left[Y_t \log\left(\frac{\hat\Phi_{t1}}{\hat\Phi_{t2}}\right) + (1 - Y_t)\log\left(\frac{1 - \hat\Phi_{t1}}{1 - \hat\Phi_{t2}}\right)\right]$$

$$C(\hat\theta, \hat\gamma_*(R)) = \frac{1}{N}\sum_{t=1}^{N}\left[\hat\Phi_{t1}\log\left(\frac{\hat\Phi_{t1}}{\hat\Phi^*_{t2}(R)}\right) + (1 - \hat\Phi_{t1})\log\left(\frac{1 - \hat\Phi_{t1}}{1 - \hat\Phi^*_{t2}(R)}\right)\right]$$

where $\hat\Phi^*_{t2}(R) = \Phi(z_t' \hat\gamma_*(R))$ and $\hat\Phi_{t1} = \Phi(x_t' \hat\theta)$.

Computing $\hat\gamma_*(R)$ by simulation:
• Artificially simulate $y$ using $f(y, \hat\theta)$ as the DGP (data-generating process).
• Let $Y_j = (Y_{1j}, Y_{2j}, \ldots, Y_{Nj})'$, $j = 1, \ldots, R$, be the independent observations generated artificially according to $H_f$.
• Use $Y_j$ to compute the ML estimate of $\gamma$ under $H_g$ and call it $\hat\gamma_j$.
• Then estimate $\gamma_*$ by

$$\hat\gamma_*(R) = \frac{1}{R}\sum_{j=1}^{R}\hat\gamma_j$$

where $R$ stands for the number of repetitions.
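The simulation steps above can be sketched in Python (standard library only) to show how the three terms of $T_f(R)$ fit together. This is a minimal illustration under stated assumptions, not the author's code: both models are probits whose design rows (here `X` for $x_t$ and `Z` for $z_t$) are assumed to include an intercept; the ML fits use Fisher scoring, which is a swapped-in choice of optimizer; and every function name, the seed, and the default `R` are hypothetical. The sketch computes only the numerator; the variance $V(T_f)$ is not computed.

```python
import math
import random


def norm_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))  # standard normal CDF


def norm_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)  # standard normal density


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def clip(p, eps=1e-12):
    return min(max(p, eps), 1.0 - eps)  # keep probabilities inside (0, 1) so logs stay finite


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small k x k system.
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_probit(y, X, iters=50, tol=1e-10):
    # Hypothetical helper: probit ML estimate by Fisher scoring, starting from zero.
    k = len(X[0])
    b = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for yt, xt in zip(y, X):
            u = dot(b, xt)
            p, w = clip(norm_cdf(u)), norm_pdf(u)
            g = (yt - p) * w / (p * (1.0 - p))  # score weight
            h = w * w / (p * (1.0 - p))  # expected-information weight
            for i in range(k):
                score[i] += g * xt[i]
                for j in range(k):
                    info[i][j] += h * xt[i] * xt[j]
        step = solve(info, score)
        b = [bi + si for bi, si in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


def avg_loglik(y, p):
    # L = N^{-1} sum_t [Y_t log p_t + (1 - Y_t) log(1 - p_t)]
    return sum(yt * math.log(pt) + (1 - yt) * math.log(1 - pt)
               for yt, pt in zip(y, p)) / len(y)


def mean_d(y, p1, p2):
    # N^{-1} sum_t d_t with d_t = Y_t log(p1/p2) + (1 - Y_t) log((1 - p1)/(1 - p2))
    return sum(yt * math.log(a / b) + (1 - yt) * math.log((1 - a) / (1 - b))
               for yt, a, b in zip(y, p1, p2)) / len(y)


def klic(p1, p2):
    # N^{-1} sum_t E_f(d_t) = N^{-1} sum_t [p1 log(p1/p2) + (1 - p1) log((1 - p1)/(1 - p2))]
    return sum(a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))
               for a, b in zip(p1, p2)) / len(p1)


def cox_numerator(y, X, Z, R=50, seed=0):
    # T_f(R) = L_f(Y, theta-hat) - L_g(Y, gamma-hat) - C(theta-hat, gamma-hat_*(R))
    theta, gamma = fit_probit(y, X), fit_probit(y, Z)
    p1 = [clip(norm_cdf(dot(theta, x))) for x in X]  # Phi-hat_{t1}
    p2 = [clip(norm_cdf(dot(gamma, z))) for z in Z]  # Phi-hat_{t2}
    rng = random.Random(seed)
    draws = []
    for _ in range(R):
        yj = [1 if rng.random() < a else 0 for a in p1]  # Y_j drawn under H_f at theta-hat
        draws.append(fit_probit(yj, Z))  # gamma-hat_j: H_g refit to Y_j
    gstar = [sum(g[i] for g in draws) / R for i in range(len(gamma))]
    p2star = [clip(norm_cdf(dot(gstar, z))) for z in Z]  # Phi-hat*_{t2}(R)
    diff = mean_d(y, p1, p2)
    c = klic(p1, p2star)
    return diff - c, diff, c


if __name__ == "__main__":
    # Illustrative simulated data only: H_f is the true DGP and z is a noisy proxy for x.
    rng = random.Random(1)
    X, Z, y = [], [], []
    for _ in range(300):
        x1 = rng.gauss(0.0, 1.0)
        z1 = 0.5 * x1 + rng.gauss(0.0, 1.0)
        X.append([1.0, x1])
        Z.append([1.0, z1])
        y.append(1 if rng.random() < norm_cdf(0.3 + x1) else 0)
    t, diff, c = cox_numerator(y, X, Z, R=20)
    print(f"T_f = {t:.4f} (L_f - L_g = {diff:.4f}, KLIC = {c:.4f})")
```

The value printed depends on the seed and on `R`, so nothing should be read into a particular figure. Writing `mean_d` separately from `avg_loglik` mirrors the derivation: the average of $d_t$ equals $L_f - L_g$, and `klic` is the same expression with $Y_t$ replaced by its expectation $\hat\Phi_{t1}$ under the null.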
The variance of the Cox test is:

$$V(T_f) = N^{-1}\, d' \left[ I_N - R(\hat\beta)\,[R'(\hat\beta) R(\hat\beta)]^{-1} R'(\hat\beta) \right] d$$

where $d$ is the $N \times 1$ vector with typical element

$$d_t = Y_t \log\left(\frac{\hat\Phi_{t1}}{\hat\Phi_{t2}}\right) + (1 - Y_t)\log\left(\frac{1 - \hat\Phi_{t1}}{1 - \hat\Phi_{t2}}\right)$$

and $R(\beta)$ is the $N \times p$ matrix of score contributions of the null model $f$, with parameter vector $\beta = (\beta_1, \ldots, \beta_p)'$:

$$R(\beta) = \begin{bmatrix} \dfrac{\partial \log f(Y_1, \beta)}{\partial \beta_1} & \cdots & \dfrac{\partial \log f(Y_1, \beta)}{\partial \beta_p} \\ \dfrac{\partial \log f(Y_2, \beta)}{\partial \beta_1} & \cdots & \dfrac{\partial \log f(Y_2, \beta)}{\partial \beta_p} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial \log f(Y_N, \beta)}{\partial \beta_1} & \cdots & \dfrac{\partial \log f(Y_N, \beta)}{\partial \beta_p} \end{bmatrix}$$

References

Achen, Christopher H. 1982. Interpreting and Using Regression. Newbury Park: Sage Publications.
Amemiya, Takeshi. 1980. “Selection of Regressors.” International Economic Review 21(2): 331-354.
Bennett, D. Scott. 1997. “Testing Alternative Models of Alliance Duration, 1816-1984.” American Journal of Political Science 41(3): 846-878.
Blaug, M. 1980. The Methodology of Economics. Cambridge: Cambridge University Press.
Boyd, Richard, Philip Gasper, and J.D. Trout, eds. 1993. The Philosophy of Science. Cambridge, MA: The MIT Press.
Cox, D.R. 1962. “Further Results on Tests of Separate Families of Hypotheses.” Journal of the Royal Statistical Society, Series B 24: 406-424.
Cox, D.R. 1961. “Tests of Separate Families of Hypotheses.” Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1: 105-123.
Dastoor, Naorayev. 1985. “A Classical Approach to Cox’s Test for Non-Nested Hypotheses.” Journal of Econometrics 27: 363-370.
Dastoor, Naorayev. 1981. “A Note on the Interpretation of the Cox Procedure for Nonnested Hypotheses.” Economics Letters 8: 113-119.
Fox, John. 1997. Applied Regression Analysis, Linear Models, and Related Methods. Thousand Oaks, CA: Sage Publications.
Gourieroux, Christian, and Alain Monfort. 1995. Statistics and Econometric Models, vols. 1 and 2. Cambridge: Cambridge University Press.
Granger, C.W.J., Maxwell L. King, and Halbert White. 1995. “Comments on Testing Economic Theories and the Use of Model Selection Criteria.” Journal of Econometrics 67: 173-187.
Granger, C.W.J. 1990. “Where Are the Controversies in Econometric Methodology?” In Modelling Economic Series, edited by C.W.J. Granger. Oxford: Oxford University Press.
Greene, William H. 1993. Econometric Analysis, 2nd ed. New York: Macmillan Publishing Company.
Hanushek, Eric A., and John E. Jackson. 1977. Statistical Methods for Social Scientists. San Diego: Academic Press.
Howson, Colin, and Peter Urbach. 1993. Scientific Reasoning: The Bayesian Approach, 2nd ed. Chicago: Open Court.
Huth, Paul, Christopher Gelpi, and D. Scott Bennett. 1993. “The Escalation of Great Power Militarized Disputes: Testing Rational Deterrence Theory and Structural Realism.” American Political Science Review 87: 609-623.
Huth, Paul, Christopher Gelpi, and D. Scott Bennett. 1992. “System Uncertainty, Risk Propensity, and International Conflict Among the Great Powers.” Journal of Conflict Resolution 36: 478-517.
Judge, George G., W.E. Griffiths, R. Carter Hill, Helmut Lütkepohl, and Tsoung-Chao Lee. 1985. The Theory and Practice of Econometrics, 2nd ed. New York: John Wiley and Sons.
King, Gary. 1989. Unifying Political Methodology. Cambridge: Cambridge University Press.
Kmenta, Jan. 1986. Elements of Econometrics, 2nd ed. New York: Macmillan Publishing Company.
Kullback, Solomon. 1959. Information Theory and Statistics. New York: John Wiley & Sons.
Lakatos, Imre. 1978. Philosophical Papers, edited by J. Worrall and G. Currie. Cambridge: Cambridge University Press.
Laudan, Larry. 1977. Progress and Its Problems: Towards a Theory of Scientific Growth. London: Routledge and Kegan Paul.
Levey, Geoffrey Brahm. 1996. “Theory Choice and the Comparison of Rival Theoretical Perspectives in Political Sociology.” Philosophy of the Social Sciences 26(1): 26-60.
McAleer, Michael. 1995. “The Significance of Testing Empirical Non-Nested Models.” Journal of Econometrics 67: 149-171.
McAleer, Michael. 1987. “Specification Tests for Separate Models: A Survey.” In Specification Analysis in the Linear Model, edited by M.L. King and D.E.A. Giles. London: Routledge & Kegan Paul.
Pesaran, M.H. 1987. “Global and Partial Non-Nested Hypotheses and Asymptotic Local Power.” Econometric Theory 3: 69-97.
Pesaran, M.H., and B. Pesaran. 1993. “A Simulation Approach to the Problem of Computing Cox’s Statistic for Testing Nonnested Models.” Journal of Econometrics 57: 377-392.
Sawa, Takamitsu. 1978. “Information Criteria for Discriminating Among Alternative Regression Models.” Econometrica 46(6): 1273-1291.
Schelling, Thomas. 1960. The Strategy of Conflict. Cambridge: Harvard University Press.
Sellars, W. 1963. Science, Perception and Reality. London: Routledge and Kegan Paul.
Vuong, Quang. 1989. “Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses.” Econometrica 57(2): 307-333.
Waltz, Kenneth N. 1979. Theory of International Politics. Reading, MA: Addison-Wesley.
White, Halbert. 1994. Estimation, Inference and Specification Analysis. Cambridge: Cambridge University Press.