ON USING KHMALADZATION IN THE STATISTICAL INFERENCE OF QUANTILE REGRESSION

PUTRA MANGGALA

ABSTRACT. We begin by introducing quantile regression and presenting an empirical example using R. Some paradigms of statistical inference are then introduced, and we point out the nuisance parameter that arises in quantile estimation. Next we discuss in some detail how a similar nuisance parameter in the (quantile) empirical process can be purged by the Khmaladze martingale transform. Finally, we draw some ideas that may help to transform away the nuisance parameter of quantile estimation.

1. MEET QUANTILE REGRESSION

1.1. In a nutshell. Quantile regression is emerging as a comprehensive approach to the statistical analysis of both linear and nonlinear response models. It supplements the classical linear regression methods, which are based on estimating models for the conditional mean, with a general technique for estimating families of conditional quantile functions. In short, this approach allows the analyst to capture more of the distributional properties of the underlying model, properties that are lost if least squares is the only tool used. Historically, perhaps the best motivation for quantile regression is the following remark by Mosteller and Tukey in [MT]:

"What the regression curve does is give a grand summary for the averages of the distributions corresponding to the set of x's. We could go further and compute several different regression curves corresponding to the various percentage points of the distributions and thus get a more complete picture of the set. Ordinarily this is not done, and so regression often gives a rather incomplete picture. Just as the mean gives an incomplete picture of a single distribution, so the regression curve gives a correspondingly incomplete picture for a set of distributions."
Now, instead of building the mathematics of quantile regression from scratch by examining the relationship between quantiles and optimisation [Ko], we will view it as a generalisation of least absolute deviation (LAD) regression.

1.2. Extending the LAD regression. Let us first recall this regression. Suppose that we have a data set consisting of the points $(x_i, y_i)$ for $i \in [n]$, that is, for $i = 1, \dots, n$. We want a function $f$ such that $f(x_i) \approx y_i$. To proceed, we conjecture that $f$ has a particular form containing some parameters which need to be determined. For simplicity, suppose that $f(x) = a + bx$, so that we must determine $a$ and $b$. The LAD regression problem then seeks the $a$ and $b$ that minimise the sum of absolute errors

$$\sum_{i=1}^{n} |y_i - f(x_i)|. \tag{1.1}$$

The mathematics used to solve this differs from that of least squares. Whereas least squares regression, which minimises the sum of squared errors, reduces to a problem in numerical linear algebra, the LAD problem belongs to optimisation (it can be written as a linear program), which leads to machinery such as the simplex and interior point methods. Although the LAD problem is harder to solve, and its solution can be unstable and non-unique, it still finds applications in many areas because it is more robust than least squares.

We can generalise LAD to obtain quantile regression by remembering that it is the median that solves (1.1) in the simplest case, where $f$ is a constant. To see this, note that the absolute value function is piecewise linear and symmetric, so at a minimiser of the sum of absolute errors the numbers of positive and negative residuals must balance; that is, there are as many observations above the solution as below it.
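The median claim can be checked numerically. The following is a minimal base-R sketch, assuming simulated exponential data (not data from this paper) and taking $f(x) = a$ to be a constant; it minimises the sum of absolute errors (1.1) over $a$ with `optimize()`.

```r
# Sketch: with f(x) = a (a constant), the minimiser of the LAD objective (1.1)
# is a sample median. Simulated data, for illustration only.
set.seed(1)
y <- rexp(101)                         # odd n, so the sample median is unique

sad <- function(a) sum(abs(y - a))     # sum of absolute errors at a
a_hat <- optimize(sad, interval = range(y))$minimum

# The numerical minimiser agrees with median(y) up to optimize()'s tolerance.
print(c(a_hat = a_hat, median = median(y)))

# The residual signs balance: as many observations above the median as below.
print(table(sign(y - median(y))))
```

Because the objective is piecewise linear, `optimize()` only approximates the kink at the median; exact LAD solvers use linear programming, as described above.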
Consider "tilting" the absolute value function to get the quantile regression check function $\rho_\tau(u) = u\,(\tau - 1[u < 0])$, where $0 < \tau < 1$. Since $\rho_{0.5}(u) = |u|/2$, the LAD regression problem (1.1) is equivalent to minimising

$$\sum_{i=1}^{n} \rho_{0.5}\big(y_i - f(x_i)\big).$$

Quantile regression varies $\tau$, which gives different weights to the positive and negative errors in this minimisation problem; when $f$ is a constant it yields the $\tau$th sample quantile. Note that the median is just the 0.5th sample quantile.

1.3. A Motivating Example. Although the author is no economist, we present an example from economics, following the treatment in [Ko]. The R programming language is used for the empirical study in this exposition. We exhibit quantile regression fits on the original Engel (1857) data on the relationship between food expenditure and household income. Figure 1 is a scatterplot of food expenditure versus household income from the Engel data. The blue line is the median regression fit, the red line the least squares fit, and the grey lines the quantile regression fits for $\tau \in \{0.05, 0.10, 0.25, 0.75, 0.90, 0.95\}$. For convenience, the features of the data are presented as an R session:

    > dim(engel)
    [1] 235   2
    > names(engel)
    [1] "income"  "foodexp"
    > summary(engel)
         income          foodexp
     Min.   : 377.1   Min.   : 242.3
     1st Qu.: 638.9   1st Qu.: 429.7
     Median : 884.0   Median : 582.5
     Mean   : 982.5   Mean   : 624.2
     3rd Qu.:1164.0   3rd Qu.: 743.9
     Max.   :4957.8   Max.   :2032.7

[Figure 1: Food Expenditure versus Household Income for the Engel data, with the median (blue), least squares (red) and other quantile regression (grey) fits.]

Here the conditional mean and median fits are quite different. Perhaps this is due to the strong influence exerted on the least squares fit by points with high income but low food expenditure.
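The check function and the robustness remark can both be illustrated without the Engel data. The base-R sketch below is only an illustration under stated assumptions: `rq_pairs()` is a hypothetical helper (it is not quantreg's `rq()`) that computes an exact simple-linear quantile regression by trying every line through two observations, which is valid because some optimal fit always interpolates $p = 2$ observations. The data are simulated, with a few high-$x$, low-$y$ points standing in for the high-income, low-expenditure households.

```r
# Check function rho_tau(u) = u * (tau - 1[u < 0]).
rho <- function(u, tau) u * (tau - (u < 0))

# Hypothetical helper: exact quantile regression of y on a + b * x by searching
# all lines through two observations (an optimal fit interpolating two points
# always exists). O(n^3), so only suitable for small n.
rq_pairs <- function(x, y, tau = 0.5) {
  best <- c(NA_real_, NA_real_)
  best_obj <- Inf
  n <- length(x)
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      if (x[i] == x[j]) next
      b <- (y[j] - y[i]) / (x[j] - x[i])
      a <- y[i] - b * x[i]
      obj <- sum(rho(y - a - b * x, tau))
      if (obj < best_obj) {
        best_obj <- obj
        best <- c(a, b)
      }
    }
  }
  best
}

# Simulated data: y = 1 + 2x + noise, plus six high-x points with y = 0.
set.seed(2)
x <- c(runif(60, 0, 10), runif(6, 9, 10))
y <- c(1 + 2 * x[1:60] + rnorm(60), rep(0, 6))

fits <- rbind(least_squares = unname(coef(lm(y ~ x))),
              median        = rq_pairs(x, y, tau = 0.5))
colnames(fits) <- c("intercept", "slope")
print(fits)   # the least squares slope is pulled further from the true value 2

# Without covariates, minimising sum(rho(y - a, tau)) over a picks out the
# tau-th sample quantile; the minimum is attained at a data point.
tau <- 0.25
obj <- sapply(y, function(a) sum(rho(y - a, tau)))
print(c(argmin = y[which.min(obj)],
        quantile = quantile(y, tau, type = 1, names = FALSE)))
```

The brute-force search is chosen only because it is short and exact; for real data one would use a linear programming solver such as the one behind `rq()`.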
From the figure we may also conclude that the least squares fit is less robust than the median fit: the least squares fit gives a poor estimate of the conditional mean for the poorest households in the sample, since the least squares line passes above all of the very low income observations. Whenever time and resources allow, then, LAD regression should supplement least squares regression. But why stop at the median? With each quantile fit we may perform model selection to obtain a more specialised model, and by the law of parsimony such a local fit is desirable.

2. THE ISSUE OF THE NUISANCE PARAMETER

2.1. Paradigms of Statistical Inference. Ideally, we would like an apparatus for statistical inference that is as good as the elegant classical theory of least squares inference under i.i.d. Gaussian errors, together with its asymptotic variants when the errors are not Gaussian. However, it is well known that to say anything about the asymptotic variance of a quantile estimate, and thus about that of the quantile regression parameters, we must deal with the reciprocal of a density function evaluated at the quantile of interest, i.e. $[f(F^{-1}(\tau))]^{-1}$. Tukey calls this the sparsity function [TU]. Intuitively, this quantity measures how thinly the observations are spread in the neighbourhood of the quantile of interest: if only a few observations lie near it, the quantile is hard to estimate, but if many observations lie near it, the quantile is estimated more precisely. This nuisance quantity forces us into the realm of density estimation and smoothing, although resampling methods such as those in [Ha, Ho, BCY, HeHu] may allow us to avoid it.
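To make the sparsity function concrete, the following base-R sketch evaluates $s(\tau) = [f(F^{-1}(\tau))]^{-1}$ for a standard normal $F$ (an illustrative assumption) and checks by simulation the standard asymptotic result that $\sqrt n$ times the error of the sample $\tau$-quantile has variance $\tau(1-\tau)s(\tau)^2$; at the median of the standard normal this equals $\pi/2$.

```r
# Sparsity function s(tau) = 1 / f(F^{-1}(tau)) for the standard normal.
sparsity_normal <- function(tau) 1 / dnorm(qnorm(tau))

taus <- c(0.05, 0.25, 0.5, 0.75, 0.95)
s <- sparsity_normal(taus)
print(round(s, 3))   # smallest at the median, large in the sparse tails

# Monte Carlo check of Var(sqrt(n) * sample median) against tau (1 - tau) s^2.
set.seed(3)
n <- 401
med <- replicate(2000, median(rnorm(n)))
print(c(monte_carlo = n * var(med),
        asymptotic  = 0.5 * 0.5 * sparsity_normal(0.5)^2))   # equals pi / 2
```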
There is also a body of work on asymptotic inference for quantile regression; for example, the rank-based inference of [GJKP] handles various quantile regression inference problems, including the construction of confidence intervals for individual quantile regression parameter estimates. We give short accounts of some of the available paradigms in the following subsections.

2.2. Directly Estimating the Asymptotic Covariance Matrix. This approach is developed in [SID]. Recall the nuisance parameter $s(\tau) = [f(F^{-1}(\tau))]^{-1}$. Since $\frac{d}{dt}F^{-1}(t) = s(t)$, we can estimate $s(t)$ by a difference quotient:

$$\hat s_n(t) = \frac{\hat F_n^{-1}(t + h_n) - \hat F_n^{-1}(t - h_n)}{2h_n},$$

where $\hat F_n^{-1}$ is an estimate of $F^{-1}$ and $h_n$ is a bandwidth tending to zero as $n \to \infty$. What should we plug into this estimator? First, it is natural to use an empirical quantile function $\hat F_n^{-1}$ for $F^{-1}$. A simple approach is to use the residuals from the quantile regression fit, $\hat u_i = y_i - x_i^T \hat\beta(\tau)$, $i \in [n]$, and take their order statistics to get the empirical quantile function $\hat F_n^{-1}(t) = \hat u_{(i)}$ for $t \in [(i-1)/n, i/n)$. Second, Hall and Sheather [HS] suggest the bandwidth

$$h_n = n^{-1/3} z_\alpha^{2/3} \left[\frac{1.5\, s(t)}{s''(t)}\right]^{1/3},$$

where $z_\alpha$ satisfies $\Phi(z_\alpha) = 1 - \alpha/2$ and $\alpha$ is the size of the test. For an explicit application to the Engel data, and for conditions under which these estimators have desirable properties, see [Ko]. More sophisticated covariance matrix estimators are also available, such as the Hendricks–Koenker sandwich [HK] and the Powell sandwich [Po].

2.3. Resampling Methods. A thorough discussion of resampling methods can be found in [Ko]. In general, this nonparametric approach can have smaller approximation error than direct estimation.
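The direct and resampling routes can be compared on a toy problem. The base-R sketch below estimates the sparsity at the median with the difference quotient of Section 2.2, using `quantile(type = 1)` (the inverse of the empirical cdf) as $\hat F_n^{-1}$. For the bandwidth it assumes a Gaussian reference for $s(t)/s''(t)$, a common plug-in; for the normal, $1.5\,s/s'' = 1.5\,\phi(x)^2/(1 + 2x^2)$ with $x = \Phi^{-1}(t)$. It then compares the implied standard error of the sample median with a naive bootstrap one. Simulated normal draws stand in for the residuals $\hat u_i$, and `hs_bandwidth()` and `siddiqui()` are hypothetical helper names.

```r
# Hall-Sheather bandwidth, with s(t)/s''(t) taken from a Gaussian reference.
hs_bandwidth <- function(t, n, alpha = 0.05) {
  x0 <- qnorm(t)
  n^(-1/3) * qnorm(1 - alpha / 2)^(2/3) *
    (1.5 * dnorm(x0)^2 / (1 + 2 * x0^2))^(1/3)
}

# Difference-quotient estimate of s(t); requires 0 < t - h < t + h < 1.
siddiqui <- function(u, t, h = hs_bandwidth(t, length(u))) {
  q <- quantile(u, c(t + h, t - h), type = 1, names = FALSE)
  (q[1] - q[2]) / (2 * h)
}

set.seed(4)
n <- 2000
u <- rnorm(n)                  # stand-in for quantile regression residuals
s_hat <- siddiqui(u, 0.5)

se_direct <- sqrt(0.5 * 0.5 / n) * s_hat                           # plug-in SE
se_boot   <- sd(replicate(500, median(sample(u, replace = TRUE))))  # bootstrap SE
se_true   <- sqrt(0.5 * 0.5 / n) * sqrt(2 * pi)                     # asymptotic SE

print(c(bandwidth = hs_bandwidth(0.5, n), s_hat = s_hat, s_true = sqrt(2 * pi)))
print(c(direct = se_direct, bootstrap = se_boot, asymptotic = se_true))
```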
Judging by the Monte Carlo study in [HeHu], one of the most promising resampling approaches is the Markov chain marginal bootstrap (MCMB) proposed there. It generates a Markov chain resampler based on solutions to a marginal, coefficient-by-coefficient version of the optimality condition for the general M-estimation problem. The approach looks readily extensible, and its low computational complexity makes it attractive for problems of high parametric dimension.

3. BORROWING AN IDEA FROM EMPIRICAL PROCESS THEORY

3.1. Distribution-Free Tests. Let $X = (X_1, \dots, X_n)$ be a sequence of independent, identically distributed random variables with some unknown distribution function $F(x)$. Consider testing the hypothesis $F \in \mathcal F$, where $\mathcal F = \{F_\theta(x) : \theta \in \Theta \subseteq \mathbb R^m\}$ is a family of absolutely continuous distribution functions depending on a finite, $m$-dimensional parameter $\theta$. We then consider the parametric empirical process

$$\hat U_n(x) = \sqrt n\,\big(F_n(x) - F_{\hat\theta}(x)\big), \tag{3.1}$$

where $F_n(x)$ is the empirical distribution function of $X$ and $\hat\theta = \hat\theta(X)$ is an estimate of $\theta$. Before we proceed, let us recall two well-known properties of the empirical distribution function $F_n(x) = \frac1n \sum_{i=1}^n 1[X_i \le x]$:

(1) $F_n(x) \to F(x)$ almost surely for every $x$, where $F$ is the underlying distribution from which the sample is drawn. This consistency property follows from the law of large numbers.
(2) $\sqrt n\,(F_n(x) - F(x))$ is asymptotically $N\big(0, F(x)(1 - F(x))\big)$, by the central limit theorem.

For now let $\Theta$ consist of a single point, i.e. we are testing a simple hypothesis. Then the well-known nonparametric goodness-of-fit tests, such as the $\chi^2$, $\omega^2$ (Cramér–von Mises) and Kolmogorov–Smirnov tests, based on the process (3.1), depend on $F_\theta$, but in such a way that their distributions do not depend on $F_\theta$ when the hypothesis holds.
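The distribution-free property can be seen directly: under a simple null the Kolmogorov–Smirnov statistic depends on the data only through the values $F_\theta(X_i)$, which are uniform. In the base-R sketch below, `ks_stat()` is a hypothetical helper computing $\sqrt n \sup_x |F_n(x) - F_\theta(x)|$; a normal and an exponential sample are generated from the same uniforms (an illustrative construction), and the two statistics coincide.

```r
# sqrt(n) * sup_x |F_n(x) - F(x)|, computed from p_i = F(X_i).
ks_stat <- function(p) {
  n <- length(p)
  p <- sort(p)
  i <- seq_len(n)
  sqrt(n) * max(i / n - p, p - (i - 1) / n)
}

set.seed(5)
u <- runif(50)
x_norm <- qnorm(u)   # sample from N(0, 1)
x_exp  <- qexp(u)    # sample from Exp(1), built from the same uniforms

d_norm <- ks_stat(pnorm(x_norm))   # test of H0: F = N(0, 1)
d_exp  <- ks_stat(pexp(x_exp))     # test of H0: F = Exp(1)
print(c(normal = d_norm, exponential = d_exp))   # equal up to rounding
```

The helper agrees with `ks.test(x, "pnorm")$statistic` multiplied by $\sqrt n$; it is written out only to make the dependence on $F_\theta(X_i)$ explicit.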
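This distribution-free property is very desirable in practice, since it lets us avoid density estimation. It holds because the limiting distribution of $\hat U_n$ is the same for every continuous $F_\theta$: substituting $x = F_\theta^{-1}(y)$ transforms the process into the uniform empirical process

$$\tilde U_n(y) = \sqrt n\,\big(F_n(F_\theta^{-1}(y)) - y\big),$$

which converges in law to a Brownian bridge. For a proof, we prefer the treatment in [VDV].

3.2. In the Case of a Composite Hypothesis. In practice, however, it may be difficult to specify $F_\theta$ completely, i.e. when $\Theta$ contains more than one point and $\theta$ must be estimated. We must now consider

$$\hat V_n(x) = \sqrt n\,\big(F_n(x) - F_{\hat\theta_n}(x)\big). \tag{3.2}$$

Letting $x = F_{\theta_0}^{-1}(y)$, where $\theta_0$ is the true parameter, we get

$$\hat V_n(y) = \sqrt n\,\Big(F_n\big(F_{\theta_0}^{-1}(y)\big) - F_{\hat\theta_n}\big(F_{\theta_0}^{-1}(y)\big)\Big) = \sqrt n\,\big(G_n(y) - G_{\hat\theta_n}(y)\big),$$

where $G_n = F_n \circ F_{\theta_0}^{-1}$ and $G_\theta = F_\theta \circ F_{\theta_0}^{-1}$, so that $G_{\theta_0}(y) = y$. Write $\tilde V_n(y) = \sqrt n\,(G_n(y) - y)$ for the corresponding uniform empirical process. Under mild conditions on the sequence of estimators $\{\hat\theta_n\}$ we have the Bahadur representation

$$\sqrt n\,(\hat\theta_n - \theta_0) = \int_0^1 h_0(s)\, d\tilde V_n(s) + o_p(1).$$

Informally, if $\theta \mapsto G_\theta$ has a derivative $\dot G_\theta$ in an appropriate (Fréchet) sense, so that

$$\sup_t \big| G_{\hat\theta_n}(t) - G_{\theta_0}(t) - (\hat\theta_n - \theta_0)^T \dot G_{\theta_0}(t) \big| = o_p\big(\|\hat\theta_n - \theta_0\|\big),$$

then expanding $G_{\hat\theta_n}$ around $\theta_0$ up to the linear term gives

$$\hat V_n(y) = \tilde V_n(y) - \dot G_{\theta_0}(y)^T \int_0^1 h_0(s)\, d\tilde V_n(s) + r_n(y), \tag{3.3}$$

with $\sup_y |r_n(y)| = o_p(1)$. This converges in law to the Gaussian process

$$\hat V_0(y) = V_0(y) - \dot G_{\theta_0}(y)^T \int_0^1 h_0(s)\, dV_0(s),$$

where $V_0$ is a Brownian bridge. Assuming, as holds for the MLE below, that $\int_0^1 h_0(s)\,ds = 0$, and writing $H_0(s) = \int_0^s h_0(r)\,dr$, this process has covariance function

$$E\big[\hat V_0(s)\hat V_0(t)\big] = s \wedge t - st - \dot G_{\theta_0}(t)^T H_0(s) - \dot G_{\theta_0}(s)^T H_0(t) + \dot G_{\theta_0}(s)^T \left[\int_0^1 h_0(r)\, h_0(r)^T\, dr\right] \dot G_{\theta_0}(t).$$

If $\hat\theta_n$ is the MLE, we can write $h_0(s) = -\big(E_{\theta_0}\dot\psi\big)^{-1}\psi\big(F_{\theta_0}^{-1}(s)\big)$, where $\psi = \nabla_\theta \log f_\theta$ is the score and $\dot\psi$ its derivative in $\theta$. Thus the limiting Gaussian process contains a component that depends on the underlying distribution $F_{\theta_0}$ (through $\dot G_{\theta_0}$ and $h_0$), so test statistics built on $\hat V_n$ are no longer distribution-free. This is known as the Durbin problem.

3.3. Khmaladzation.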
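As motivation for the transform, the Durbin problem of Section 3.2 shows up clearly in a small Monte Carlo experiment. The base-R sketch below, in which `ks_stat()` is a hypothetical helper, approximates the 95% point of $\sqrt n \sup_x |F_n(x) - F_{\hat\theta_n}(x)|$ in three illustrative settings: parameters known, a normal mean and standard deviation estimated, and an exponential rate estimated. The estimated-parameter critical values come out smaller than the simple-null one and differ between the two families, so no single table serves all composite hypotheses.

```r
# sqrt(n) * sup_x |F_n(x) - F(x)|, computed from p_i = F(X_i).
ks_stat <- function(p) {
  n <- length(p)
  p <- sort(p)
  i <- seq_len(n)
  sqrt(n) * max(i / n - p, p - (i - 1) / n)
}

set.seed(6)
n <- 100
sims <- replicate(2000, {
  x <- rnorm(n, mean = 3, sd = 2)
  e <- rexp(n, rate = 0.5)
  c(known      = ks_stat(pnorm(x, 3, 2)),                # simple null
    est_normal = ks_stat(pnorm(x, mean(x), sd(x))),       # mu, sigma estimated
    est_exp    = ks_stat(pexp(e, rate = 1 / mean(e))))    # rate estimated
})

# Approximate 95% critical values under each null.
crit <- apply(sims, 1, quantile, probs = 0.95)
print(round(crit, 3))
```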
The Khmaladze martingale transform purges the term depending on $F_{\theta_0}$, so that the test statistic for the composite hypothesis becomes asymptotically distribution-free. To start, we need the Doob–Meyer decomposition of the empirical process. Let the random variables $X_1, \dots, X_n$ be i.i.d. uniform on $[0, 1]$, with order statistics $X_{(1)}, \dots, X_{(n)}$, and consider the process

$$F_n(t) = \frac1n \sum_{i=1}^n 1[t - X_i \ge 0] = \frac1n \sum_{i=1}^n 1[t - X_{(i)} \ge 0].$$

From the theory of martingales, we consider the history $\mathcal F^{F_n} = \{\mathcal F^{F_n}_t : 0 \le t \le 1\}$ of the process $F_n$. Then $F_n$ is a submartingale and the $X_{(i)}$ are Markov times with respect to this history, since $\{X_{(i)} \le t\} = \{F_n(t) \ge i/n\} \in \mathcal F^{F_n}_t$. Before carrying out the decomposition, we state the distributional property on which it rests.

Fact 3.1. The process $F_n(t)$, with $F_n(0) = 0$, is Markov, and conditionally on $F_n(t)$ the increment $n\,\Delta F_n(t) = n\,[F_n(t + \Delta t) - F_n(t)]$, $\Delta t \ge 0$, is Binomial$\big(n[1 - F_n(t)],\ \Delta t/(1 - t)\big)$. Thus the conditional expectation is

$$E\big[\Delta F_n(t) \mid \mathcal F^{F_n}_t\big] = E\big[\Delta F_n(t) \mid F_n(t)\big] = \frac{1 - F_n(t)}{1 - t}\,\Delta t.$$

Theorem 3.2. We have the decomposition

$$F_n(t) = \int_0^t \frac{1 - F_n(s)}{1 - s}\, ds + m_n(t),$$

where $\{m_n(t), \mathcal F^{F_n}_t, 0 \le t \le 1\}$ is a martingale.

Proof. By Fact 3.1, $E[m_n(t) \mid \mathcal F^{F_n}_s] = m_n(s)$ for $s \le t$. Moreover, $m_n$ is integrable: since $1 - F_n(s) \le 1$ and vanishes for $s \ge X_{(n)}$, we have $\int_0^t \frac{1 - F_n(s)}{1 - s}\,ds \le -\log(1 - X_{(n)})$, which has finite mean, so $m_n(t)$ has finite mean for all $t \le 1$. $\square$

Substituting the empirical process $V_n(t) = \sqrt n\,[F_n(t) - t]$, we get the following result.

Corollary 3.3. $V_n(t) = W_n(t) - \int_0^t \frac{V_n(s)}{1 - s}\, ds$, where $W_n(t) = \sqrt n\, m_n(t)$ converges in law to a standard Brownian motion $w_0(t)$.

Now we consider the general parametric empirical process. Let $g(t) = (t, g_1(t), \dots, g_m(t))^T$ be an $(m+1)$-vector of real-valued functions on $[0, 1]$, with derivative $\dot g(t)$.
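In the simplest case, with no estimated parameters so that $g(t) = t$, Corollary 3.3 can be checked numerically. The integral in Theorem 3.2 has the closed form $\int_0^t \frac{1 - F_n(s)}{1 - s}\,ds = -\frac1n\sum_i \log(1 - \min(X_i, t))$, because each observation contributes $1/(1-s)$ only while $s < X_i$. The base-R sketch below uses this to compute $W_n(t) = \sqrt n\, m_n(t)$ from simulated uniform samples and compares variances: $V_n(t)$ should have the Brownian bridge variance $t(1-t)$, while $W_n(t)$ should have the Brownian motion variance $t$. The helper names `v_n()` and `w_n()` are hypothetical.

```r
# V_n(t) = sqrt(n) (F_n(t) - t): the uniform empirical process.
v_n <- function(x, t) sqrt(length(x)) * (mean(x <= t) - t)

# W_n(t) = sqrt(n) m_n(t) = sqrt(n) (F_n(t) - compensator), with the
# compensator int_0^t (1 - F_n(s)) / (1 - s) ds = -mean(log(1 - pmin(x, t))).
w_n <- function(x, t) sqrt(length(x)) * (mean(x <= t) + mean(log(1 - pmin(x, t))))

set.seed(7)
n <- 50
sims <- replicate(4000, {
  x <- runif(n)
  c(V_0.5 = v_n(x, 0.5), W_0.5 = w_n(x, 0.5),
    V_0.9 = v_n(x, 0.9), W_0.9 = w_n(x, 0.9))
})

# Sample variances; the targets are t (1 - t) for V_n and t for W_n.
print(round(apply(sims, 1, var), 3))
print(c(V_0.5 = 0.25, W_0.5 = 0.5, V_0.9 = 0.09, W_0.9 = 0.9))
```

Near $t = 1$ the bridge variance shrinks towards zero while the martingale part keeps growing like $t$, which is the behaviour the transform exploits.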
Suppose that the components of $\dot g$ are linearly independent in every neighbourhood of 1, so that

$$C(t) = \int_t^1 \dot g(s)\, \dot g(s)^T\, ds$$

is non-singular for $t < 1$, and define the transformation $Q_g$ by

$$Q_g V_n(t) = V_n(t) - \int_0^t \dot g(s)^T C^{-1}(s) \int_s^1 \dot g(r)\, dV_n(r)\, ds.$$

In the special case $g(t) = t$ we have $C(s) = 1 - s$ and $\int_s^1 dV_n(r) = -V_n(s)$ (because $V_n(1) = 0$), so $Q_g V_n = W_n$, which recovers the decomposition of Corollary 3.3. The key property of $Q_g$ is that $Q_g \varphi = 0$ for every $\varphi$ in the linear span of the components of $g$ with $\varphi(0) = 0$. Since $G_\theta(0) = 0$ for every $\theta$, we have $\dot G_{\theta_0}(0) = 0$; hence, if the components of $\dot G_{\theta_0}$ lie in that span, applying the transform to representation (3.3) gives

$$\bar V_n(y) = Q_g \hat V_n(y) = Q_g\big(\tilde V_n + r_n\big)(y),$$

which, under regularity conditions, converges in law to $Q_g V_0$, a standard Brownian motion. Thus the estimation component in (3.3) is purged, and asymptotically distribution-free tests can be based on $\bar V_n$.

4. APPLYING KHMALADZATION TO QUANTILE REGRESSION INFERENCE

4.1. Quantile Two-Sample Treatment Effects. In this section we present an analogue of Section 3 for quantile regression. To do this, we consider the classical two-sample treatment–control problem. The premise is as follows: from a random sample of size $n$, we randomly assign $n_1$ observations to treatment and $n_0$ to control. We then observe a response variable $Y_i$ and assess the effect of the treatment on this response. In [Le], a tractable generalisation is formulated. The treatment response is assumed to be $x + \Delta(x)$ when the response of the untreated subject would be $x$. Then the distribution $G$ of the treatment responses is that of the random variable $X + \Delta(X)$, where $X$ is distributed according to the control distribution $F$. We can write $\Delta(x) = G^{-1}(F(x)) - x$ and, letting $\tau = F(x)$, we get what we will call the quantile treatment effect:

$$\delta(\tau) = \Delta\big(F^{-1}(\tau)\big) = G^{-1}(\tau) - F^{-1}(\tau).$$

This quantity can be estimated by

$$\hat\delta(\tau) = G_{n_1}^{-1}(\tau) - F_{n_0}^{-1}(\tau),$$

where $G_{n_1}$ and $F_{n_0}$ are the empirical distribution functions of the treatment and control observations respectively. We then ask: is the treatment effect significant?
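The estimator $\hat\delta(\tau)$ is simply a difference of empirical quantiles. The base-R sketch below assumes, purely for illustration, normal control responses and treatment responses that are both shifted and rescaled, so that the true effect $\delta(\tau) = 1 + 0.5\,\Phi^{-1}(\tau)$ varies with $\tau$ rather than being a pure location shift; `quantile(type = 1)` serves as the inverse empirical distribution function.

```r
set.seed(8)
n0 <- 2000; n1 <- 2000
control   <- rnorm(n0)              # F = N(0, 1)
treatment <- 1 + 1.5 * rnorm(n1)    # G = N(1, 1.5^2)
taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)

# delta_hat(tau) = G_{n1}^{-1}(tau) - F_{n0}^{-1}(tau)
delta_hat <- quantile(treatment, taus, type = 1, names = FALSE) -
  quantile(control, taus, type = 1, names = FALSE)

# True quantile treatment effect G^{-1}(tau) - F^{-1}(tau)
delta_true <- (1 + 1.5 * qnorm(taus)) - qnorm(taus)

print(round(rbind(tau = taus, estimate = delta_hat, truth = delta_true), 3))
```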
To decide whether the treatment effect is significant, the tests discussed in Section 3 can be used; indeed, when the nonparametric null hypothesis is free of nuisance parameters, we may develop robust distribution-free tests based on the empirical distribution function alone. If this is not the case, however, we may run into problems. For example, suppose that we wish to test the hypothesis that the treatment response distribution $G$ differs from the control distribution $F$ by a pure location shift, i.e. for all $\tau \in (0, 1)$,

$$G^{-1}(\tau) = F^{-1}(\tau) + \delta_0.$$

Here $\delta_0$ is an unknown nuisance parameter, and estimating it destroys the distribution-free property of the test. This leads us to Khmaladzation.

4.2. The Parametric Empirical Quantile Process. Consider a random sample $y_i$, $i \in [n]$, with cdf $F_Y$. We want to test the hypothesis $F_Y^{-1}(\tau) = \mu_0 + \sigma_0 F_0^{-1}(\tau)$, and we denote the right-hand side by $\alpha(\tau)$. We consider the empirical quantile process

$$\hat\alpha(\tau) = \inf\Big\{a \in \mathbb R : \sum_{i=1}^n \rho_\tau(y_i - a) = \min!\Big\},$$

on which we define our test process

$$v_n(\tau) = \sqrt n\, \varphi_0(\tau)\,\big(\hat\alpha(\tau) - \alpha(\tau)\big)/\sigma_0,$$

where $\varphi_0(\tau) = f_0\big(F_0^{-1}(\tau)\big)$; $v_n(\tau)$ converges in law to a Brownian bridge $v_0(\tau)$. However, when the parameters $\theta = (\mu, \sigma)^T$ are unknown, we need Khmaladzation. Let $\xi(t) = \big(1, F_0^{-1}(t)\big)^T$ and $\tilde\alpha(t) = \hat\mu + \hat\sigma F_0^{-1}(t) = \hat\theta_n^T \xi(t)$, where the sequence of estimators $\hat\theta_n$ satisfies a Bahadur representation as in Section 3. We then form the parametric process $\hat v_n(t)$ to which we will apply the martingale transform:

$$\begin{aligned}
\hat v_n(t) &= \sqrt n\,\varphi_0(t)\,\big(\hat\alpha(t) - \tilde\alpha(t)\big)/\sigma_0 \\
&= v_n(t) - \sqrt n\,\varphi_0(t)\,(\hat\theta_n - \theta_0)^T \xi(t)/\sigma_0 \\
&= v_n(t) - \sigma_0^{-1}\, \varphi_0(t)\, \xi(t)^T \int_0^1 h_0(s)\, dv_n(s) + o_p(1).
\end{aligned}$$

Taking $g(t) = \big(t, \varphi_0(t)\,\xi(t)^T\big)^T$, we get

$$\dot g(t) = \Big(1,\ \frac{\dot f_0}{f_0}\big(F_0^{-1}(t)\big),\ 1 + F_0^{-1}(t)\,\frac{\dot f_0}{f_0}\big(F_0^{-1}(t)\big)\Big)^T.$$

Since $\varphi_0(t)\,\xi(t)$ lies in the linear span of $g$, we can apply the transform

$$\tilde v_n(t) = Q_g \hat v_n(t),$$

which converges in law to a standard Brownian motion.
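The effect of estimating $\theta$ can be seen in a small simulation. The base-R sketch below assumes a standard normal null ($F_0 = \Phi$, $\mu_0 = 0$, $\sigma_0 = 1$), uses `quantile(type = 1)` for $\hat\alpha(\tau)$ (the smallest minimiser of $\sum_i \rho_\tau(y_i - a)$), and takes the sample mean and standard deviation as $\hat\theta_n$; all of these are illustrative choices. With $\theta$ known, the variance of $v_n(\tau)$ should be close to the Brownian bridge value $\tau(1-\tau)$; with $\theta$ estimated, $\hat v_n(\tau)$ has a visibly smaller variance, so the Brownian bridge limit, and with it the distribution-free property, is lost.

```r
set.seed(9)
n <- 400; tau <- 0.25
mu0 <- 0; sigma0 <- 1                 # null: F_Y^{-1}(tau) = mu0 + sigma0 * qnorm(tau)
phi0 <- dnorm(qnorm(tau))             # phi_0(tau) = f_0(F_0^{-1}(tau))

sims <- replicate(2000, {
  y <- rnorm(n, mu0, sigma0)
  alpha_hat   <- quantile(y, tau, type = 1, names = FALSE)  # empirical quantile
  alpha_known <- mu0 + sigma0 * qnorm(tau)                  # alpha(tau), theta known
  alpha_tilde <- mean(y) + sd(y) * qnorm(tau)               # theta estimated
  c(v     = sqrt(n) * phi0 * (alpha_hat - alpha_known) / sigma0,
    v_hat = sqrt(n) * phi0 * (alpha_hat - alpha_tilde) / sigma0)
})

print(round(c(var_v = var(sims["v", ]), var_v_hat = var(sims["v_hat", ]),
              bridge = tau * (1 - tau)), 4))
```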
5. FUTURE WORK

It is of course desirable to construct distribution-free variance terms in quantile regression inference. Resampling methods can achieve this, and there is room for research on more powerful tests and computationally more efficient constructions. We can also drive the research the other way. The Khmaladze transform works because of the properties of martingales, and thus may only be viable for the quantile regression process. Perhaps some kind of transformation that removes the sparsity term is available if we are willing to delve further into functional analysis and operator theory.

REFERENCES

[Ko] R. Koenker. Quantile Regression. Econometric Society Monographs.
[MT] F. Mosteller and J. Tukey. Data Analysis and Regression: A Second Course in Statistics. Reading, MA: Addison-Wesley.
[TU] J. Tukey. What Part of the Sample Contains the Information? Proceedings of the National Academy of Sciences, 53 (1965), 127–134.
[GJKP] C. Gutenbrunner, J. Jurečková, R. Koenker and S. Portnoy. Tests of Linear Hypotheses Based on Regression Rank Scores. Journal of Nonparametric Statistics, 2 (1993), 307–333.
[Ha] J. Hahn. Bootstrapping Quantile Regression Models. Econometric Theory, 11 (1995), 105–121.
[Ho] J. Horowitz. Bootstrap Methods for Median Regression Models. Econometrica, 66 (1998), 1327–1352.
[BCY] Y. Bilias, S. Chen and Z. Ying. Simple Resampling Methods for Censored Quantile Regression. Journal of Econometrics, 99 (2000), 373–386.
[HeHu] X. He and F. Hu. Markov Chain Marginal Bootstrap. Journal of the American Statistical Association (2002).
[SID] M. Siddiqui. Distribution of Quantiles from a Bivariate Population. Journal of Research of the National Bureau of Standards, 64 (1960), 145–150.
[HS] P. Hall and S. Sheather. On the Distribution of a Studentized Quantile. Journal of the Royal Statistical Society, Series B, 50 (1988), 381–391.
[HK] W. Hendricks and R. Koenker. Hierarchical Spline Models for Conditional Quantiles and the Demand for Electricity. Journal of the American Statistical Association, 87 (1992), 58–68.
[Po] J. L. Powell. The Asymptotic Normality of Two-Stage Least Absolute Deviations Estimators. Econometrica, 51 (1983), 1569–1576.
[Le] E. Lehmann. Nonparametrics: Statistical Methods Based on Ranks. San Francisco: Holden-Day (1974).
[VDV] A. W. van der Vaart. Asymptotic Statistics.

Note. Here is the code for Section 1:

    library(quantreg)   # R package developed by Roger Koenker

    ## Engel food expenditure data
    data(engel)
    dim(engel)
    names(engel)
    summary(engel)

    ## Scatterplot with the median (blue) and least squares (red) fits
    plot(foodexp ~ income, data = engel, cex = 0.5, type = "n",
         xlab = "Household Income", ylab = "Food Expenditure")
    points(foodexp ~ income, data = engel, pch = 23, col = "orange", bg = "red")
    abline(rq(foodexp ~ income, tau = 0.5, data = engel), col = "blue")
    abline(lm(foodexp ~ income, data = engel), col = "red")

    ## Quantile regression fits for the other tau's (grey)
    taus <- c(0.05, 0.1, 0.25, 0.75, 0.9, 0.95)
    for (tau in taus) {
      abline(rq(foodexp ~ income, tau = tau, data = engel), col = "gray")
    }