
J. R. Statist. Soc. B (1999) 61, Part 4, pp. 793–815

Multivariate bandwidth selection for local linear regression

Lijian Yang, Michigan State University, East Lansing, USA, and Rolf Tschernig, Humboldt-Universität zu Berlin, Germany

[Received December 1997. Final revision January 1999]

Summary. The existence and properties of optimal bandwidths for multivariate local linear regression are established, using either a scalar bandwidth for all regressors or a diagonal bandwidth vector that has a different bandwidth for each regressor. Both involve functionals of the derivatives of the unknown multivariate regression function. Estimating these functionals is difficult primarily because they contain multivariate derivatives. In this paper, an estimator of the multivariate second derivative is obtained via local cubic regression with most cross-terms left out. This estimator has the optimal rate of convergence but is simpler and uses much less computing time than the full local estimator. Using this as a pilot estimator, we obtain plug-in formulae for the optimal bandwidth, both scalar and diagonal, for multivariate local linear regression. As a simpler alternative, we also provide rule-of-thumb bandwidth selectors. All these bandwidths have satisfactory performance in our simulation study.

Keywords: Asymptotic optimality; Blocked quartic fit; Functional estimation; Partial local regression; Plug-in bandwidth; Rule-of-thumb bandwidth; Second derivatives

1. Introduction

Nonparametric estimation in general requires little a priori knowledge of the functions to be estimated. The estimation results, however, depend crucially on the choice of bandwidth. Whereas choosing too large a bandwidth may introduce a large bias, selecting too small a bandwidth may cause large estimation variance. An asymptotically optimal bandwidth usually exists and can be obtained through a bias–variance trade-off.
Such an optimal bandwidth in general involves functionals of the unknown underlying functions, and the selection of the optimal bandwidth has always been a challenge. Bandwidth selection methods with good theoretical properties and practical performance exist for univariate density estimation, e.g. Jones et al. (1996a, b), Cheng (1997) and Grund and Polzehl (1998), and for univariate local least squares regression, e.g. Fan and Gijbels (1995) and Ruppert et al. (1995). Wand and Jones (1993) provided an excellent analysis of bandwidth selection for bivariate density estimation, Sain et al. (1994) looked at multivariate cross-validation for density estimation bandwidth selection and Wand and Jones (1994) developed plug-in (PI) bandwidths for multivariate kernel density estimation. Herrmann et al. (1995) studied bandwidth selection for bivariate regression with fixed regular design, but it is unclear whether their method works for random designs and higher dimensions. Ruppert (1997) proposed the empirical bias bandwidth selector (EBBS), which could be applied in a multivariate setting. However, the theoretical properties of the EBBS are unknown and its practical performance has only been studied in the univariate setting. As pointed out by Ruppert (1997), the difficulty in obtaining a reliable multivariate data-driven bandwidth is essentially due to the complexity of estimating higher order multivariate derivatives. To address this difficulty, we estimate multivariate second-order derivatives by a local cubic fit, where all unnecessary terms are left out. This both solves the theoretical problem and makes a practical implementation feasible.

Address for correspondence: Lijian Yang, Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA. E-mail: yang@stt.msu.edu

© 1999 Royal Statistical Society 1369–7412/99/61793
We are not aware of any other work that makes use of the desirable properties of a partial local polynomial fit. Another new feature is the use of a different bandwidth for each of the regressors, in contrast with the EBBS method, which uses the same bandwidth for all regressors. We have established some general assumptions that ensure the existence of asymptotically optimal bandwidths. This is not trivial because, when using a bandwidth vector, there is no simple closed formula for the solution of the bias–variance trade-off, and the existence of the solution needs to be proven implicitly. This use of a 'diagonal bandwidth' is a good compromise between flexibility and optimality on the one hand (which need more smoothing parameters, such as a bandwidth matrix) and interpretation and simplicity on the other hand (fewer parameters, such as one bandwidth for all explanatory variables). We have also studied the theoretical properties and practical performance of the latter option, which we call the 'scalar optimal bandwidth'.

The computation of our PI bandwidths requires the estimation of functionals that include derivatives up to the fourth order. The idea is to estimate the unknown functionals by blocked polynomial fits, as in Härdle and Marron (1995) and Ruppert et al. (1995), but we had to refine the idea further. A quartic fit of a function of four variables could require up to 70 parameters in each block, which not only makes it impossible to have a sufficient number of blocks to adapt to local features of the function but also introduces much noise into the estimation within each block. Since the computation of one functional does not require all the derivatives included in a complete fourth-order Taylor expansion, we propose to use a partial quartic fit in the blocks, with only 23 parameters for the same function. This advantage in the number of parameters grows drastically as the number of variables increases.
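The parameter counts quoted above are easy to verify. A complete fourth-order polynomial in d variables has one coefficient per monomial of total degree at most 4, i.e. C(d + 4, 4) coefficients (a standard combinatorial count), whereas the partial fit of Section 5 has 6d − 1. The following sketch is our own illustration, not code from the paper:

```python
from math import comb

def full_quartic_terms(d):
    # coefficients of a complete quartic polynomial in d variables:
    # all monomials of total degree <= 4, i.e. C(d + 4, 4)
    return comb(d + 4, 4)

def partial_quartic_terms(d):
    # coefficients of the partial quartic fit used in Section 5,
    # which keeps only the terms needed for the derivatives f_aabb
    return 6 * d - 1

for d in range(1, 7):
    print(d, full_quartic_terms(d), partial_quartic_terms(d))
```

For d = 4 this gives 70 full versus 23 partial parameters, the numbers cited in the text, and the gap widens quickly with d.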
The PI bandwidths are necessarily computationally intensive, so we also investigated rule-of-thumb (ROT) bandwidths as simpler alternatives.

The paper is organized as follows. In the next section, we show the existence of an asymptotically optimal bandwidth vector. Section 3 defines an ROT and a PI bandwidth selector for local linear regression, and asymptotic optimality is established for the PI bandwidth. In Section 4, we describe in detail how functionals of second derivatives are estimated via partial local cubic regression, while parallel results using a scalar bandwidth selector are summarized in Appendix A. Implementation of the methods proposed is discussed in Section 5. In Section 6 we describe the results of a rather comprehensive Monte Carlo study. Section 7 concludes, while all assumptions and proofs are contained in Appendix B. The programs that are used in the analysis can be obtained from http://www.blackwellpublishers.co.uk/rss/

2. Background

To formulate the problem, consider a multivariate regression model

\[ Y = f(X) + g(X)\,\varepsilon \]

where $Y$ is a scalar dependent variable, $X = (X_1, X_2, \ldots, X_d)^T$ is a vector of explanatory variables, $\varepsilon$ is independent of $X$, $E(\varepsilon) = 0$ and $\mathrm{var}(\varepsilon) = 1$. Let $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, be an independent and identically distributed sample. Then the local linear estimator of $f$ at a given point $x = (x_1, x_2, \ldots, x_d)^T$ is obtained by doing a first-order Taylor expansion of the function $f$ at the point $x$ for all the data points $X_i$ and solving a least squares problem locally weighted by kernels: i.e.
the estimator $\hat f(x)$ is the first element $\hat c_0$ of the vector $\{\hat c_0, (\hat c_\alpha)_{1\le\alpha\le d}\}$ that minimizes

\[ \sum_{i=1}^n \Big\{ Y_i - c_0 - \sum_{\alpha=1}^d c_\alpha (X_{i\alpha} - x_\alpha) \Big\}^2 K_h(X_i - x) \]

where $K$ is a symmetric, compactly supported, univariate probability kernel (so that $K$ is non-negative and $\int K(u)\,du = 1$) and

\[ K_h(X_i - x) = \prod_{\alpha=1}^d \frac{1}{h_\alpha}\, K\Big( \frac{X_{i\alpha} - x_\alpha}{h_\alpha} \Big). \]

Denoting by $p(x)$ the density of $X$ and using a weight function $w(x)$, the weighted mean integrated squared error (MISE) $E \int \{\hat f(x) - f(x)\}^2 w(x) p(x)\,dx$ is a function of the bandwidth vector $h = (h_1, \ldots, h_d)^T$, and the $h$ that minimizes this error is called the optimal bandwidth vector. An asymptotic formula for the MISE in this setting is given by

\[ \mathrm{AMISE}(h) = \frac{\sigma_K^4}{4} \sum_{\alpha,\beta=1}^d C_{\alpha\beta}(f)\, h_\alpha^2 h_\beta^2 + \frac{\|K\|_2^{2d}\, B(g)}{n h_1 \cdots h_d} \]

in which $\sigma_K^2 = \int u^2 K(u)\,du$, $\|K\|_2^2 = \int K^2(u)\,du$ and

\[ B(g) = \int g^2(x) w(x)\,dx, \qquad C_{\alpha\beta}(f) = \int f_{\alpha\alpha}(x) f_{\beta\beta}(x) p(x) w(x)\,dx, \qquad \alpha, \beta = 1, \ldots, d, \]

where $f_{\alpha\alpha}(x)$ denotes the second derivative of $f(x)$ with respect to the $\alpha$th variable $x_\alpha$. For the general formula of the AMISE using a bandwidth matrix, see p. 140 of Wand and Jones (1995). We denote by $C(f)$ the matrix whose $(\alpha,\beta)$th entry is $C_{\alpha\beta}(f)$, which is always non-negative definite. Also, we always have $B(g) > 0$ by assumption 2 in Appendix B. We introduce a function $Q: R_+^d \times M^d \times R_+ \to R$,

\[ Q(v, M, a) = \frac{1}{4} \sum_{\alpha,\beta=1}^d M_{\alpha\beta} v_\alpha v_\beta + \frac{a}{\sqrt{v_1 \cdots v_d}} = \frac{1}{4} v^T M v + \frac{a}{\sqrt{v_1 \cdots v_d}} \tag{2.1} \]

where $M^d$ is a set of non-negative definite $d \times d$ matrices whose definition is given by expression (B.1) in Appendix B, $R_+ = (0, \infty)$ and $M_{\alpha\beta}$ is the $(\alpha,\beta)$th entry of a matrix $M$. With $h^2 = (h_1^2, \ldots, h_d^2)^T$, we then have

\[ \mathrm{AMISE}(h) = Q\Big( h^2,\ \sigma_K^4 C(f),\ \frac{\|K\|_2^{2d} B(g)}{n} \Big). \]

Simple algebra gives $Q(v, M, a) = a^{4/(d+4)} c^{d/(d+4)}\, Q\{a^{-2/(d+4)} c^{2/(d+4)} v,\ c^{-1} M,\ 1\}$ and therefore

\[ v(M, a) = a^{2/(d+4)} c^{-2/(d+4)}\, v(c^{-1} M, 1) \tag{2.2} \]

where $v(M, a)$ denotes the vector $v$ that minimizes $Q(v, M, a)$. By theorem 6, $v(M, a)$ is a well-defined and infinitely smooth function of $M$ and $a$. Using equation (2.2), we have the following theorem.

Theorem 1.
Under assumptions (1)–(4), AMISE(h) is minimized by a unique vector which depends on $B(g)$ and $C(f)$ smoothly via

\[ h_{opt} = \Big\{ \frac{\|K\|_2^{2d} B(g)}{n \sigma_K^4} \Big\}^{1/(d+4)} v^{1/2}\{C(f), 1\}. \tag{2.3} \]

The optimal bandwidth $h_{opt}$ given in formula (2.3) contains the unknown quantities $B(g)$ and $C(f)$ and hence cannot be used directly in the estimation of $f(x)$. In Section 4, appropriate estimators of $f_{\alpha\alpha}(x)$ and $C_{\alpha\beta}(f)$ are given based on a partial local cubic fit with many cross-terms left out. These estimators have very simple bias formulae, and their biases and variances are of the same order as those obtained by keeping the cross-terms in.

3. Bandwidth selection

A quick substitute for $h_{opt}$ is the simple ROT bandwidth defined as

\[ \hat h_{ROT} = \Big\{ \frac{\|K\|_2^{2d} \hat B_{ROT}(g)}{n \sigma_K^4} \Big\}^{1/(d+4)} v^{1/2}\{\hat C_{ROT}(f), 1\} \tag{3.1} \]

in which $\hat C_{ROT}(f)$ and $\hat B_{ROT}(g)$ are based on piecewise quartic estimation of $f(x)$, and whose formulae are given by expressions (5.1) and (5.2). The bandwidth vector $\hat h_{ROT}$ is of the same order as $h_{opt}$ but does not estimate $h_{opt}$ efficiently because $f$ may not always be represented by a piecewise fourth-order polynomial. The idea of such an ROT bandwidth exists in the univariate case; see Fan and Gijbels (1995) and Ruppert et al. (1995). The latter additionally used a blocking idea that we have also adopted.

To improve on the ROT idea, we propose a PI bandwidth, so called as it approximates $h_{opt}$ by plugging in consistent estimates of the functionals $B(g)$ and $C(f)$. Specifically, we define

\[ \hat h_{PI} = \Big\{ \frac{\|K\|_2^{2d} \hat B(g)}{n \sigma_K^4} \Big\}^{1/(d+4)} v^{1/2}\{\hat C(f), 1\} \tag{3.2} \]

in which $\hat C(f) = \{\hat C_{\alpha\beta}(f)\} = \{\hat C_{\alpha\beta}(f, h)\}$, where for each $\alpha, \beta = 1, \ldots, d$

\[ \hat C_{\alpha\beta}(f) = \frac{1}{n} \sum_{i=1}^n \hat f_{\alpha\alpha}(X_i)\, \hat f_{\beta\beta}(X_i)\, w(X_i) = \int \hat f_{\alpha\alpha}(x)\, \hat f_{\beta\beta}(x)\, w(x) p(x)\,dx + O_p(n^{-1/2}) \tag{3.3} \]

and $\hat f_{\alpha\alpha}(x)$ is the partial local cubic estimator of $f_{\alpha\alpha}(x)$ given in equation (4.2). The estimator $\hat B(g)$ is defined in equation (5.3).

Using standard results such as those contained in Wand and Jones (1995) or Fan and Gijbels (1996), we have $\hat B(g)/B(g) = 1 + O_p(n^{-2/(d+4)})$ as $n \to \infty$.
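To make formulae (2.1)–(3.2) concrete, the following is a minimal numerical sketch (our own illustration; the function names are ours, and plain gradient descent on the unconstrained scale u = log v, the reparameterization suggested in Section 5, stands in for the Newton-type optimizer one would use in practice):

```python
import numpy as np

def Q(v, M, a):
    # Q(v, M, a) = v' M v / 4 + a / sqrt(v_1 ... v_d), equation (2.1)
    return 0.25 * v @ M @ v + a / np.sqrt(np.prod(v))

def minimize_Q(M, a, n_iter=20000, lr=0.02):
    # minimize Q(exp(u), M, a) over unconstrained u by gradient descent;
    # the gradient in u is (1/2) e^u * (M e^u) - (a/2) / sqrt(prod(e^u))
    u = np.zeros(len(M)) + 0.1
    for _ in range(n_iter):
        v = np.exp(u)
        grad = 0.5 * v * (M @ v) - 0.5 * a / np.sqrt(np.prod(v))
        u -= lr * grad
    return np.exp(u)          # the minimizing vector v(M, a)

def plugin_bandwidth(C, Bg, n, d, sigma2_K, normK2):
    # assemble a bandwidth vector as in (2.3)/(3.2):
    # scale factor times the elementwise square root of v(C, 1);
    # sigma2_K and normK2 are the kernel constants sigma_K^2 and ||K||_2^2
    v = minimize_Q(C, 1.0)
    scale = (normK2 ** d * Bg / (n * sigma2_K ** 2)) ** (1.0 / (d + 4))
    return scale * np.sqrt(v)
```

For the diagonal PI bandwidth (3.2) one would pass in the estimates of $C(f)$ and $B(g)$ instead of the true quantities. The scaling law (2.2) provides a check: for $M = I$ the symmetric minimizer satisfies $v(I, a) = a^{2/(d+4)} v(I, 1)$.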
On the basis of this, of the asymptotics of $\hat C(f)$ as given in corollary 2 and of the smooth dependence of $h_{opt}$ on $C(f)$ as in theorem 1, we have the following theorem.

Theorem 2. Under assumptions (1)–(4), for $n \to \infty$, the bandwidth defined in equation (3.2) is asymptotically optimal. In particular,

\[ \hat h_{PI} = h_{opt}\{1_d + O_p(n^{-2/(d+6)})\}. \]

Since $\hat h_{PI}$ is a consistent substitute for the optimal bandwidth $h_{opt}$, it can be shown that $\hat h_{PI}$ performs asymptotically similarly to $h_{opt}$ in terms of the MISE. $\hat h_{PI}$ will on average work better than $\hat h_{ROT}$, since the latter is an inconsistent bandwidth estimator. The simulation results in Section 6 strongly support these expectations.

In Appendix A, we discuss the use of a scalar bandwidth $\bar h$ instead of $h$, i.e. $h = (\bar h, \ldots, \bar h)^T$. In that case, the corresponding ROT bandwidth $\bar h_{ROT}$ and PI bandwidth $\bar h_{PI}$ are given by equations (A.2) and (A.3).

4. Partial local estimation

In this section, we formulate an estimator $\hat f_{\alpha\alpha}(x)$ of $f_{\alpha\alpha}(x)$. In what follows, denote for any compactly supported function $L$

\[ \mu_r(L) = \int_{-\infty}^{\infty} u^r L(u)\,du; \]

hence in particular $\mu_2(K) = \sigma_K^2$. We write $\mu_4(K) = \kappa_K \sigma_K^4$, where $\kappa_K$ denotes the kurtosis of the kernel $K$, which is always greater than 1. For the derivative functions, we denote

\[ f_{\alpha\beta\gamma}(x) = \frac{\partial^3 f(x)}{\partial x_\alpha\, \partial x_\beta\, \partial x_\gamma}, \]

etc. Denote

\[ Z = \Big[ 1,\ \{(X_{i\beta}-x_\beta)^2\}_{\beta=1,\ldots,d}^T,\ \{(X_{i\alpha}-x_\alpha)(X_{i\beta}-x_\beta)\}_{\beta\ne\alpha}^T,\ \{(X_{i\beta}-x_\beta)\}_{\beta=1,\ldots,d}^T,\ \{(X_{i\alpha}-x_\alpha)^2(X_{i\beta}-x_\beta)\}_{\beta\ne\alpha}^T,\ \{(X_{i\alpha}-x_\alpha)(X_{i\beta}-x_\beta)^2\}_{\beta\ne\alpha}^T,\ (X_{i\alpha}-x_\alpha)^3 \Big]_{i=1}^n \tag{4.1} \]

and then define

\[ \hat f_{\alpha\alpha}(x) = 2 e_\alpha^T (Z^T W Z)^{-1} Z^T W Y \tag{4.2} \]

where $e_\alpha$ is a $(5d-1)$-vector of 0s whose $(\alpha+1)$th element is 1, $Y = (Y_i)_{n\times 1}$ and, with a scalar bandwidth $h = (h, \ldots, h)^T$,

\[ W = \frac{1}{n} \operatorname{diag}\{K_h(X_i - x)\}_{i=1}^n; \]

see Ruppert and Wand (1994) for a general definition of a local polynomial estimator. We then obtain the following theorem.

Theorem 3. Under assumptions (1)–(3), for $\alpha = 1, \ldots, d$, as $h \to 0$ and $nh^{d+4} \to \infty$, we have
\[ \sqrt{nh^{d+4}}\, \{\hat f_{\alpha\alpha}(x) - f_{\alpha\alpha}(x) - b_\alpha(x) h^2\} \to N\{0, \sigma_\alpha^2(x)\} \]

where

\[ b_\alpha(x) = \frac{\mu_6(K) - \kappa_K \sigma_K^6}{12(\kappa_K - 1)\sigma_K^4}\, f_{\alpha\alpha\alpha\alpha}(x) + \frac{\sigma_K^2}{2} \sum_{\beta\ne\alpha} f_{\alpha\alpha\beta\beta}(x), \tag{4.3} \]

\[ \sigma_\alpha^2(x) = \frac{4\|K\|_2^{2(d-1)}\, \{\mu_4(K^2) - 2\mu_2(K^2)\sigma_K^2 + \|K\|_2^2 \sigma_K^4\}}{\sigma_K^8 (\kappa_K - 1)^2}\, \frac{g^2(x)}{p(x)}. \tag{4.4} \]

The formation of the matrix $Z$ is much simpler than that of the corresponding matrix of a full local cubic expansion. This makes the explicit mathematical derivation of $(Z^T W Z)^{-1}$ easier and the computation of expression (4.2) less costly.

On the basis of equation (4.2), the asymptotic properties of $\hat C_{\alpha\beta}(f, h)$ are presented in the next theorem. In the following, we denote $K^*(u) = K(u)(u^2 - \sigma_K^2)$, by $K^* \ast K$ the convolution of $K$ and $K^*$, $K^* \ast K(t) = \int K(t - u) K^*(u)\,du$, which is also $\int K(t + u) K^*(u)\,du$ by symmetry, and by $\bar K_2 = K \ast K$ the self-convolution of $K$. Also, we denote by $\delta_{\alpha\beta}$ the Kronecker index, which equals 1 or 0 depending on whether $\alpha = \beta$ or not.

Theorem 4. Under assumptions (1)–(4), as $h \to 0$ and $nh^{d+4} \to \infty$, we have

\[ \hat C_{\alpha\beta}(f, h) = C_{\alpha\beta}(f) + \{B_{\alpha\beta}(f) + B_{\beta\alpha}(f)\} h^2 + O_p(h^4) + \frac{B(g)\, V_{\alpha\beta}(K)}{n h^{d+4}} + o_p\Big( \frac{1}{n\sqrt{h^{d+8}}} \Big) \]

in which

\[ B_{\alpha\beta}(f) = \int f_{\beta\beta}(x)\, b_\alpha(x)\, w(x) p(x)\,dx, \]

\[ V_{\alpha\beta}(K) = \frac{4\|K\|_2^{2(d-2)} \{\mu_4(K^2)\|K\|_2^2 (1 + \delta_{\alpha\beta}) - 2\mu_2(K^2)^2 - 2\delta_{\alpha\beta}\, \mu_2(K^2)\sigma_K^2\|K\|_2^2 + \sigma_K^4\|K\|_2^4\}}{\sigma_K^8 (\kappa_K - 1)^2}, \]

\[ n\sqrt{h^{d+8}}\, \{\hat C_{\alpha\beta}(f, h) - E\hat C_{\alpha\beta}(f, h)\} \to^D N\{0, \theta_{\alpha\beta}^2(g, K)\}, \]

where

\[ \theta_{\alpha\beta}^2(g, K) = \frac{32 \int g^4(x) w^2(x)\,dx}{\sigma_K^{16} (\kappa_K - 1)^4} \int F_{\alpha\beta}(t)\, F_{\alpha\beta}(t)\,dt \]

and where we define for any $\alpha, \beta = 1, \ldots, d$

\[ F_{\alpha\beta}(t) = (1 - \delta_{\alpha\beta})\, K^*{\ast}K(t_\alpha)\, K^*{\ast}K(t_\beta) \prod_{\gamma \ne \alpha, \beta} \bar K_2(t_\gamma) + \delta_{\alpha\beta}\, \bar K^*_2(t_\alpha) \prod_{\gamma \ne \alpha} \bar K_2(t_\gamma), \]

with $\bar K^*_2 = K^* \ast K^*$. Note here that $\int g^2(x) w(x)\,dx > 0$ by assumption 2, and that therefore $\hat C_{\alpha\beta}(f, h)$ has a standard deviation of order $(n\sqrt{h^{d+8}})^{-1}$, which is of smaller order than one of the bias terms, $(nh^{d+4})^{-1}$. The two bias terms point to a trade-off if both are positive, or to cancellation if the $h^2$-term happens to be negative. Assuming that the $h^2$-term is never 0, we obtain the following results for estimating $C_{\alpha\beta}(f)$ optimally.

Corollary 1.
Under the same assumptions as in theorem 4, the optimal bandwidth for estimating $C_{\alpha\beta}(f)$ by $\hat C_{\alpha\beta}(f, h)$ is

\[ h_{C_{\alpha\beta}(f),opt} = \begin{cases} \Bigg[ \dfrac{-B(g)\, V_{\alpha\beta}(K)}{\{B_{\alpha\beta}(f) + B_{\beta\alpha}(f)\}\, n} \Bigg]^{1/(d+6)} & \text{if } B_{\alpha\beta}(f) + B_{\beta\alpha}(f) < 0, \\[2ex] \Bigg[ \dfrac{(d+4)\, B(g)\, V_{\alpha\beta}(K)}{2\{B_{\alpha\beta}(f) + B_{\beta\alpha}(f)\}\, n} \Bigg]^{1/(d+6)} & \text{if } B_{\alpha\beta}(f) + B_{\beta\alpha}(f) > 0, \end{cases} \tag{4.5} \]

which gives a standard error of order $(n \sqrt{h_{C_{\alpha\beta}(f),opt}^{d+8}})^{-1} = O(n^{-(d+4)/\{2(d+6)\}})$ and a bias of order $h_{C_{\alpha\beta}(f),opt}^4 = O(n^{-4/(d+6)})$ when $B_{\alpha\beta}(f) + B_{\beta\alpha}(f) < 0$, and of order $h_{C_{\alpha\beta}(f),opt}^2 = O(n^{-2/(d+6)})$ when $B_{\alpha\beta}(f) + B_{\beta\alpha}(f) > 0$.

We can now complete the definition of $\hat C_{\alpha\beta}(f)$, and therefore of $\hat C(f)$. Estimate all the derivatives $f_{\alpha\alpha\alpha\alpha}(x)$ etc. by blocked quartic fits and obtain the estimates of $B_{\alpha\beta}(f)$ as described in Section 5. Then plug them into formula (4.5) together with $\hat B(g)$ in place of $B(g)$. Use the resulting bandwidth in the estimator $\hat C_{\alpha\beta}(f)$. Definition (3.3) and theorem 4 yield the following corollary.

Corollary 2. Under assumptions (1)–(4), for $n \to \infty$, the $\hat C(f)$ defined by equation (3.3) has the consistency property

\[ \hat C(f) = C(f)\{I_d + O_p(n^{-2/(d+6)})\}. \]

5. Implementation

In this section we describe how to estimate $C(f)$ and $B(g)$ to obtain the ROT and PI bandwidths (3.1) and (A.2), and (3.2) and (A.3) respectively.

For the ROT estimation of $f_{\alpha\alpha}(X_i)$ and $f_{\beta\beta}(X_i)$, Fan and Gijbels (1995) suggested the use of a quartic Taylor expansion. Ruppert et al. (1995) proposed to separate the data into equal-sized blocks and to use a quartic Taylor expansion on each block in case the function is very wiggly. Since the number of parameters per block increases dramatically with the dimension, and since the wiggliness of the function depends on each variable differently, our flexible blocking scheme sorts the data one variable at a time. Let $N_\gamma$ denote the number of blocks in the $\gamma$th direction and collect them in $N = (N_1, \ldots, N_d)$. Then $\bar N = \prod_{\gamma=1}^d N_\gamma$ is the total number of blocks given the choices of $N_\gamma$, $\gamma = 1, 2, \ldots, d$. To choose the optimal blocks we follow Ruppert et al.
(1995) and use Mallows's $C_p$,

\[ C_p(N) = \frac{\mathrm{RSS}(N)\, \{n - k(d) \lfloor n / 4k(d) \rfloor\}}{\min_N\{\mathrm{RSS}(N)\}} - \{n - 2 k(d) \bar N\} \]

where $\mathrm{RSS}(N)$ denotes the residual sum of squares based on the quartic fit with blocking $N$,

\[ k(d) = 1 + \sum_{i=1}^4 \binom{d+i-1}{i} \]

is the maximum number of parameters in one block and $\lfloor a \rfloor$ denotes the integer part of $a$.

Denoting the block vector chosen with $C_p(N)$ by $N^*$, and by $\hat f_{ROT,N^*}$, $\hat f_{ROT,N^*,\alpha\alpha}$ the corresponding function estimators, we then estimate

\[ \hat B_{ROT}(g) = \frac{1}{n - k(d)\bar N^*} \sum_{i=1}^n \frac{\{Y_i - \hat f_{ROT,N^*}(X_i)\}^2\, w(X_i)}{\hat p_{ROT}(X_i)}, \tag{5.1} \]

\[ \hat C_{ROT,\alpha\beta}(f) = \frac{1}{n} \sum_{i=1}^n \hat f_{ROT,N^*,\alpha\alpha}(X_i)\, \hat f_{ROT,N^*,\beta\beta}(X_i)\, w(X_i) \tag{5.2} \]

where $\hat p_{ROT}(x)$ is the uniform density on the range of the data. These estimates are necessarily inconsistent unless the true $f$ is a piecewise quartic function. This problem is avoided by the PI estimation described in Section 4 and Appendix A. Now we estimate $B(g)$ by

\[ \hat B(g) = \frac{1}{n} \sum_{i=1}^n \frac{w(X_i)\, \{Y_i - \hat f_{\hat h_{ROT}}(X_i)\}^2}{\tilde p_{h_S}(X_i)} \tag{5.3} \]

where $\hat h_{ROT}$ is defined by equations (3.1), (5.1) and (5.2), $h_S$ is the $d$-dimensional normal reference bandwidth (Silverman (1986), pages 86–87) and $\tilde p_{h_S}(x)$ is a kernel density estimator.

To estimate $C_{\alpha\beta}(f)$ we use partial local cubic regression as described in Section 4. This is better done with a bandwidth that is slightly larger than the optimal pilot bandwidth $h_{C_{\alpha\beta}(f),opt}$. The reason is that the bandwidth $h_{C_{\alpha\beta}(f),opt}$ only minimizes the absolute value of the bias, whereas it completely ignores the variance in the estimation of $C_{\alpha\beta}(f)$. Asymptotically this is justified since the standard deviation of $\hat C_{\alpha\beta}(f, h)$ is of higher order than the bias. To use a bandwidth that is not equal to $h_{C_{\alpha\beta}(f),opt}$ would produce in the $\hat C_{\alpha\beta}(f)$ estimate a larger bias than the minimum bias, but not significantly so if the bandwidth is larger. To see this effect we take the bandwidth $\lambda h_{C_{\alpha\beta}(f),opt}$ and plot in Fig. 1 the ratio $r(\lambda)$ of the bias against the minimum bias (achieved at $\lambda = 1$) as a function of $\lambda \in [0.6, 2]$, for $d = 4$.
Here, we assume that $B_{\alpha\beta}(f) + B_{\beta\alpha}(f) > 0$, and the ratio then is

\[ r(\lambda) = \frac{\{(d+4)/2\}^{2/(d+6)}\, \lambda^2 + \{(d+4)/2\}^{-(d+4)/(d+6)}\, \lambda^{-(d+4)}}{\{(d+4)/2\}^{2/(d+6)} + \{(d+4)/2\}^{-(d+4)/(d+6)}}. \]

The broken line has the height $r = 2$. We can see that $r(\lambda)$ is about 3 or less for $\lambda > 1$, whereas there is a dramatic increase for $\lambda < 1$. Also it is easy to verify that $r(\lambda = 2) \le 4$ for any $d$. Meanwhile, for $\lambda = 2$, the ratio of decrease in the standard deviation is $\lambda^{-(d+8)/2}$, which is $1/32$ if $d = 2$ and more for larger $d$. This decrease in standard deviation is drastic, whereas the increase in bias is at most fourfold. We therefore use $2 h_{C_{\alpha\beta}(f),opt}$, which should have very good finite sample performance.

Fig. 1. Ratio $r(\lambda)$ of the increase in bias when estimating $C(f)$ with $\lambda h_{C(f),opt}$, for $d = 4$

The estimation of the unknown functional $B_{\alpha\beta}(f)$ is based on blocked quartic fits of $f$ if $d \le 3$. For larger $d$, the number of parameters can be so large that for medium sample sizes blocking, or even global estimation, becomes impossible. We therefore use a partial quartic fit for the estimation of each $B_{\alpha\beta}(f)$, consisting of only those fourth-order derivatives $f_{\alpha\alpha\beta\beta}$, $\beta = 1, \ldots, d$, in equation (4.3) plus all lower order derivatives that form a subset of those. As an example, such a partial quartic polynomial centred at zero reads

\[ b_0 + \sum_{\beta=1}^d b_{1\beta} X_\beta + \sum_{\beta=1}^d b_{2\beta} X_\alpha X_\beta + \sum_{\beta\ne\alpha} b_{3\beta} X_\beta^2 + \sum_{\beta=1}^d b_{4\beta} X_\alpha X_\beta^2 + \sum_{\beta\ne\alpha} b_{5\beta} X_\alpha^2 X_\beta + \sum_{\beta=1}^d b_{6\beta} X_\alpha^2 X_\beta^2. \]

For this partial expansion the number of parameters, $k(d) = 6d - 1$, grows linearly in $d$. It also includes all the terms needed to estimate $f_{\alpha\alpha\beta\beta}$. For $d = 4$, the number of parameters drops to about a third of that of the complete expansion. The partial expansion allows not only more blocks but also the blocks to focus more on the variable directions that have more curvature. This is a new feature that is not an issue in the univariate case. For efficient partial quartic fits we propose to use at least $4 k(d)$ observations in each block. This amounts to requiring $n \ge 4 k(d) = 4(6d - 1)$. For example, if $d = 4$, partial blocking needs $n \ge 92$ observations.
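The bias-ratio calculation above is easy to reproduce. The following sketch (our own illustration) evaluates $r(\lambda)$ and the standard deviation reduction factor $\lambda^{-(d+8)/2}$ quoted in the text:

```python
def bias_ratio(lam, d):
    # r(lambda): bias at lambda * h_{C(f),opt} relative to the minimum bias,
    # for the case B_ab(f) + B_ba(f) > 0 discussed in Section 5
    c1 = ((d + 4) / 2) ** (2 / (d + 6))
    c2 = ((d + 4) / 2) ** (-(d + 4) / (d + 6))
    return (c1 * lam ** 2 + c2 * lam ** (-(d + 4))) / (c1 + c2)

def sd_reduction(lam, d):
    # factor by which the standard deviation of C-hat shrinks
    # when the pilot bandwidth is inflated by lambda
    return lam ** (-(d + 8) / 2)
```

For $d = 4$ this gives $r(2) \approx 3.2$, consistent with the statement that $r$ is about 3 or less for $\lambda > 1$, while the standard deviation at $\lambda = 2$, $d = 2$ shrinks by the stated factor of $1/32$.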
If $n < 4(6d - 1)$, we cannot recommend any of our methods with confidence, as there are just not enough data to estimate $f$ accurately.

To use standard software for the numerical optimization problem in equations (3.1), (3.2) or (2.3), it may be convenient to reparameterize $v$ as $v = \exp(u)$, i.e. $v_1 = \exp(u_1), \ldots, v_d = \exp(u_d)$, where $u = (u_1, \ldots, u_d)^T \in R^d$, so that there is no constraint on $u$. The gradient and Hessian matrices are then

\[ \frac{\partial Q\{\exp(u), M, a\}}{\partial u} = \frac{1}{2} \begin{pmatrix} \exp(u_1) \sum_{\beta=1}^d M_{1\beta} \exp(u_\beta) \\ \vdots \\ \exp(u_d) \sum_{\beta=1}^d M_{d\beta} \exp(u_\beta) \end{pmatrix} - \frac{a}{2\sqrt{\exp(u_1) \cdots \exp(u_d)}} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}, \]

\[ \frac{\partial^2 Q\{\exp(u), M, a\}}{\partial u\, \partial u^T} = \frac{1}{2} \operatorname{diag}\{\exp(u_1), \ldots, \exp(u_d)\}\, M\, \operatorname{diag}\{\exp(u_1), \ldots, \exp(u_d)\} + \frac{1}{2} \operatorname{diag}\{M_{11}\exp(2u_1), \ldots, M_{dd}\exp(2u_d)\} + \frac{a}{4\sqrt{\exp(u_1) \cdots \exp(u_d)}}\, 1_{d\times d}, \]

which make it easy to find $u(M, a) = \ln\{v(M, a)\}$, the $u$-value that minimizes $Q\{\exp(u), M, a\}$. We then obtain the following substitute for equation (2.3):

\[ h_{opt} = \Big\{ \frac{\|K\|_2^{2d} B(g)}{n\sigma_K^4} \Big\}^{1/(d+4)} \exp\Big[ \frac{1}{2}\, u\{C(f), 1\} \Big]. \]

Similar substitutes are used to compute equations (3.1) and (3.2).

6. Simulation results

In this section we investigate the finite sample performance of the ROT bandwidths $\hat h_{ROT}$

Table 1. Model 1: mean of the ISE for various bandwidth selectors

Bandwidth   Kind                  n = 100    n = 250    n = 500
Diagonal    Asymptotic optimal    0.0326     0.0165     0.00893
            PI                    0.0409     0.0183     0.00931
            ROT                   0.0656     0.0233     0.0112
Scalar      Asymptotic optimal    0.0338     0.0174     0.00934
            PI                    0.0401     0.0180     0.00928
            ROT                   0.0734     0.0244     0.0113

Table 2. Model 2: mean of the ISE for various bandwidth selectors

Bandwidth   Kind                  n = 100    n = 250    n = 500
Diagonal    Asymptotic optimal    0.0505     0.0246     0.0140
            PI                    0.0523     0.0248     0.0139
            ROT                   0.0752     0.0301     0.0168
Scalar      Asymptotic optimal    0.0505     0.0246     0.0140
            PI                    0.0503     0.0240     0.0136
            ROT                   0.0824     0.0314     0.0175

Table 3.
Model 3: mean of the ISE for various bandwidth selectors

Bandwidth   Kind                  n = 100    n = 250    n = 500
Diagonal    Asymptotic optimal    0.0704     0.0327     0.0186
            PI                    0.0711     0.0334     0.0185
            ROT                   0.114      0.0469     0.0242
Scalar      Asymptotic optimal    0.0760     0.0361     0.0208
            PI                    0.0747     0.0354     0.0202
            ROT                   0.137      0.0560     0.0272

Table 4. Model 4: mean of the ISE for various bandwidth selectors

Bandwidth   Kind                  n = 100    n = 250    n = 500
Diagonal    Asymptotic optimal    0.0765     0.0344     0.0198
            PI                    0.0873     0.0378     0.0219
            ROT                   0.230      0.0525     0.0268
Scalar      Asymptotic optimal    0.147      0.0604     0.0349
            PI                    0.105      0.0536     0.0337
            ROT                   0.282      0.0823     0.0431

Table 5. Model 5: mean of the ISE for various bandwidth selectors

Bandwidth   Kind                  n = 100    n = 250    n = 500
Diagonal    Asymptotic optimal    0.0662     0.0324     0.0193
            PI                    0.142      0.0573     0.0297
            ROT                   0.306      0.100      0.0466
Scalar      Asymptotic optimal    0.150      0.0716     0.0425
            PI                    0.145      0.0692     0.0410
            ROT                   0.338      0.116      0.0606

Table 6. Model 6: mean of the ISE for various bandwidth selectors

Bandwidth   Kind                  n = 250    n = 500
Diagonal    Asymptotic optimal    0.459      0.291
            PI                    0.363      0.225
            ROT                   0.641      0.502
Scalar      Asymptotic optimal    0.791      0.502
            PI                    0.447      0.308
            ROT                   0.793      0.747

given by equation (3.1) and $\bar h_{ROT}$ given by equation (A.2), and of the PI bandwidths $\hat h_{PI}$ given by equation (3.2) and $\bar h_{PI}$ given by equation (A.3), for the local linear estimation of $f(x)$. These results are compared with those obtained by using the asymptotic optimal bandwidths $h_{opt}$ as in equation (2.3) and $\bar h_{opt}$ as in equation (A.1), which we can compute since $f(x)$, $g(x)$ and $p(x)$ are explicitly given. For each pseudodata set generated, we compute the integrated squared error (ISE)

\[ \mathrm{ISE} = \frac{1}{n_s} \sum_{i=1}^{n_s} \{\hat f_h(U_i) - f(U_i)\}^2 \]

for the diagonal bandwidths $h = h_{opt}, \hat h_{PI}, \hat h_{ROT}$ and the scalar bandwidths $h = \bar h_{opt}, \bar h_{PI}, \bar h_{ROT}$, where the $U_i$ are taken from an equally spaced grid of roughly 2500 points. In total we consider six models of varying complexity, and in all cases 100 replications are conducted.
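The ISE criterion above can be sketched as follows (our own illustration; for brevity a Nadaraya–Watson kernel smoother with a Gaussian kernel stands in for the paper's local linear estimator, and all names are ours):

```python
import numpy as np

def nw_estimate(x0, X, Y, h):
    # Nadaraya-Watson estimate at x0 with a diagonal bandwidth vector h;
    # a Gaussian product kernel keeps all weights strictly positive
    U = (X - x0) / h
    w = np.exp(-0.5 * (U ** 2).sum(axis=1))
    return (w * Y).sum() / w.sum()

def ise(estimator, f_true, grid):
    # ISE: average squared error of the estimate over an evaluation grid
    err = np.array([estimator(u) - f_true(u) for u in grid])
    return (err ** 2).mean()

# illustration: noiseless data from a constant function gives ISE ~ 0
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 2))
Y = np.full(200, 2.0)
grid = rng.uniform(size=(50, 2))     # small stand-in for the 2500-point grid
val = ise(lambda u: nw_estimate(u, X, Y, np.array([0.2, 0.2])),
          lambda u: 2.0, grid)
```

In a full Monte Carlo study one would repeat this over replications and bandwidth selectors and average, as in Tables 1–6.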
For all the models the random design matrix $X$ is an independent sample from the uniform distribution on $[0, 1]^d$ with sample size $n$, and for the estimation of $C(f)$ we screen off observations $X_i$ that are outside $[a, 1-a]^d$ where $a = (1 - 0.9^{1/d})/2$. The sample size $n$ is 100, 250 or 500, except for model 6 where $n = 250$ and $n = 500$. The models are $Y = f(X) + \varepsilon$, $\varepsilon \sim N(0, 0.25)$, where the regression function is

\[ f(x) = x_1^2 + x_2^4 \]

for model 1,

\[ f(x) = \sin\{\pi(x_1 + x_2)\} \]

for model 2,

\[ f(x) = \sin(\pi x_1) \sin(2\pi x_2) \]

for model 3,

\[ f(x) = \{\sin(\pi x_1) + \sin(4\pi x_2)\}/2 \]

for model 4,

\[ f(x) = \{(x_1 - 0.5)^2 + x_2\} \sin(2\pi x_3) \]

for model 5 and

\[ f(x) = \sin\{\pi(x_1 + 0.5 x_2 + 2 x_3 + 0.5 x_4)\} \]

for model 6.

Fig. 2. Density of the ISE for (a) model 1, (b) model 2, (c) model 3, (d) model 4, (e) model 5 and (f) model 6 (———, $h_{opt}$; - - - - -, $\hat h_{PI}$; — — —, $\hat h_{ROT}$; $n = 500$)

Table 1 reports in its first three rows the mean of the ISE based on $h_{opt}$, $\hat h_{PI}$ and $\hat h_{ROT}$ for each sample size of model 1. The last three rows present the corresponding results for the scalar bandwidths $\bar h_{opt}$, $\bar h_{PI}$ and $\bar h_{ROT}$. Similarly, Tables 2–6 display the results for models 2–6. In addition, Fig. 2 presents plots of the densities of the ISE based on $h_{opt}$ (full curve), $\hat h_{PI}$ (short-broken curve) and $\hat h_{ROT}$ (long-broken curve) for all models, where the sample size is 500.

Inspecting Tables 1–6 and Fig. 2, we find that the PI bandwidth is always superior to the ROT bandwidth, as it delivers function estimates that have smaller ISEs. Furthermore, the mean of the ISE declines with sample size as expected. If we use $C(f)$ to measure complexity, the two-dimensional models 1–4 are of increasing complexity, with $C(f) = 48.8, 194.8, 608.8, 3129.3$ respectively. Since the ROT bandwidth is based on piecewise quartic fits, we might expect the ROT function estimates to perform worse

Fig. 3. Function $f(\cdot)$
of model 3: (a) true function; (b) estimate with $\hat h_{PI}$, $n = 100$; (c) estimate with $\hat h_{ROT}$, $n = 100$; (d) estimate with $h_{opt}$, $n = 500$; (e) estimate with $\hat h_{PI}$, $n = 500$; (f) estimate with $\hat h_{ROT}$, $n = 500$

Fig. 4. Densities of bandwidth ratios for model 3 (- - - - -, PI; — — —, ROT): (a) first elements of $\hat h_{PI}/h_{opt}$ and $\hat h_{ROT}/h_{opt}$, $n = 100$; (b) second elements of $\hat h_{PI}/h_{opt}$ and $\hat h_{ROT}/h_{opt}$, $n = 100$; (c) $\bar h_{PI}/\bar h_{opt}$ and $\bar h_{ROT}/\bar h_{opt}$, $n = 100$; (d) first elements of $\hat h_{PI}/h_{opt}$ and $\hat h_{ROT}/h_{opt}$, $n = 500$; (e) second elements of $\hat h_{PI}/h_{opt}$ and $\hat h_{ROT}/h_{opt}$, $n = 500$; (f) $\bar h_{PI}/\bar h_{opt}$ and $\bar h_{ROT}/\bar h_{opt}$, $n = 500$

than the PI estimates for complex models that are far from being piecewise quartic. This is confirmed by Tables 1–4 and the ISE densities in Figs 2(a)–2(d). Fig. 3 shows the true function of model 3 and its estimates for one replication based on $\hat h_{PI}$ and $\hat h_{ROT}$, $n = 100$, or $h_{opt}$, $\hat h_{PI}$ and $\hat h_{ROT}$, $n = 500$.

For higher dimensions, the advantage of the PI over the ROT estimates becomes more significant. For the three-dimensional model 5 and the four-dimensional model 6, Tables 5 and 6 show a reduction in the MISE of up to 60%. The ISE densities are plotted in Figs 2(e) and 2(f).

For models 1 and 2, there is practically no difference between the diagonal and the scalar bandwidth, whereas for model 3 the diagonal bandwidth has only a slight advantage. This is due to the form of $f$, which does not require different amounts of smoothing in each direction. For models 4–6, the improvement from the diagonal bandwidth becomes significant, as the $f$-function requires very different amounts of smoothing in the various directions. Overall, we clearly prefer the diagonal to the scalar bandwidth.

Although the function estimate is the ultimate goal, it is also informative to analyse how well the PI and ROT bandwidths approximate the asymptotically optimal bandwidth $h_{opt}$. When comparing the densities of ratios, we find that $\hat h_{PI}/h_{opt}$ always lies to the right of $\hat h_{ROT}/h_{opt}$.
The ratio for the PI bandwidth is always closer to 1, except for a few cases where the ROT ratio is also close to 1. The PI ratio $\hat h_{PI}/h_{opt}$ also approaches 1 faster than the ROT ratio $\hat h_{ROT}/h_{opt}$ as the sample size $n$ increases. All these conclusions can be drawn from inspecting the density plots. Fig. 4 presents the density plots for the complicated model 3 based on 100 (Figs 4(a)–4(c)) and 500 (Figs 4(d)–4(f)) observations. The densities of the first and second elements of $\hat h_{PI}/h_{opt}$ and $\hat h_{ROT}/h_{opt}$ are depicted in Figs 4(a), 4(b), 4(d) and 4(e). Figs 4(c) and 4(f) show the densities of $\bar h_{PI}/\bar h_{opt}$ and $\bar h_{ROT}/\bar h_{opt}$. Increasing complexity or dimension makes the bandwidth estimation more difficult.

7. Conclusion

We have shown the existence of both a diagonal and a scalar optimal bandwidth for multivariate regression and have proposed both an ROT and a PI bandwidth selector. The ROT method is simple to use but may perform poorly because of its inconsistency. The PI method is achieved by using partial local cubic regression to estimate the unknown functionals of second derivatives. The partial local cubic expansion does not need many terms and has simple bias and variance formulae. The pilot bandwidths for the functional estimation are determined by blocked partial quartic fits.

In Monte Carlo simulations we investigated the performance of these automatic bandwidth selectors for models of up to four dimensions and various sample sizes. The diagonal PI method is found to be the best in terms of the ISE, although the ROT method can be useful if the model is not too complicated. Furthermore, a bandwidth vector is always preferable to a scalar bandwidth. These conclusions are observed with a sample size as small as 100.

The results of this paper can be generalized to autoregressive time series under conditions that lead to geometric mixing; see, for instance, Härdle et al. (1998) for such conditions.
The scalar plug-in bandwidth was used to obtain a data-driven asymptotic final prediction error for the non-linear time series lag selection developed by Tschernig and Yang (1999).

Acknowledgements

The authors thank Rong Chen and Michael Neumann for their many helpful discussions. The very insightful comments from the referees and the Joint Editor are also gratefully acknowledged. These comments stimulated a substantial extension of the method. This work was supported by Sonderforschungsbereich 373 'Quantifikation und Simulation Ökonomischer Prozesse' of the Deutsche Forschungsgemeinschaft, at Humboldt-Universität zu Berlin.

Appendix A

We demonstrate that a scalar bandwidth $\bar h$ can be used instead of a bandwidth vector $h = (h_1, \ldots, h_d)^T$ for estimating $f(x)$. Estimation based on a single bandwidth $\bar h$ achieves the same convergence rate as that based on multiple bandwidths, but may be less efficient when a different amount of smoothing is appropriate in each variable direction. However, a single bandwidth is more easily computed. When using a single bandwidth $\bar h$ to estimate the function $f(x)$, the AMISE is the following function of $\bar h$:

\[ \mathrm{AMISE}(\bar h) = \frac{\sigma_K^4}{4}\, \bar C(f)\, \bar h^4 + \frac{\|K\|_2^{2d} B(g)}{n \bar h^d} \]

where $\bar C(f) = \sum_{\alpha,\beta=1}^d C_{\alpha\beta}(f)$. The scalar bandwidth $\bar h$ that minimizes $\mathrm{AMISE}(\bar h)$ is

\[ \bar h_{opt} = \Big\{ \frac{d\, \|K\|_2^{2d} B(g)}{4 n \sigma_K^4\, \bar C(f)} \Big\}^{1/(d+4)}. \tag{A.1} \]

Denote $\hat{\bar C}(f, h) = \sum_{\alpha,\beta=1}^d \hat C_{\alpha\beta}(f, h)$; then we have the following theorem.

Theorem 5.
Under assumptions (1)–(4) in Appendix B, as $h \to 0$ and $nh^{d+4} \to \infty$, we have

\[ \hat{\bar C}(f, h) = \bar C(f) + 2h^2 \sum_{\alpha,\beta=1}^d B_{\alpha\beta}(f) + O_p(h^4) + \frac{B(g) \sum_{\alpha,\beta=1}^d V_{\alpha\beta}(K)}{n h^{d+4}} + o_p\Big( \frac{1}{n\sqrt{h^{d+8}}} \Big) \]

in which

\[ n\sqrt{h^{d+8}}\, \{\hat{\bar C}(f, h) - E\hat{\bar C}(f, h)\} \to^D N\{0, \bar\theta^2(g, K)\}, \]

where

\[ \bar\theta^2(g, K) = \frac{32 \int g^4(x) w^2(x)\,dx}{\sigma_K^{16} (\kappa_K - 1)^4} \int \Big\{ \sum_{\alpha,\beta=1}^d F_{\alpha\beta}(t) \Big\}^2 dt. \]

The optimal bandwidth for estimating $\bar C(f)$ by $\hat{\bar C}(f, h)$ is

\[ \bar h_{\bar C(f),opt} = \begin{cases} \Bigg[ \dfrac{-B(g) \sum_{\alpha,\beta=1}^d V_{\alpha\beta}(K)}{2 \sum_{\alpha,\beta=1}^d B_{\alpha\beta}(f)\, n} \Bigg]^{1/(d+6)} & \text{if } \sum_{\alpha,\beta=1}^d B_{\alpha\beta}(f) < 0, \\[3ex] \Bigg[ \dfrac{(d+4)\, B(g) \sum_{\alpha,\beta=1}^d V_{\alpha\beta}(K)}{4 \sum_{\alpha,\beta=1}^d B_{\alpha\beta}(f)\, n} \Bigg]^{1/(d+6)} & \text{if } \sum_{\alpha,\beta=1}^d B_{\alpha\beta}(f) > 0. \end{cases} \]

By using the formulae in theorem 5, we can obtain both the ROT and the PI bandwidths,

\[ \bar h_{ROT} = \Bigg\{ \frac{d\, \|K\|_2^{2d} \hat B_{ROT}(g)}{4 n \sigma_K^4 \sum_{\alpha,\beta=1}^d \hat C_{ROT,\alpha\beta}(f)} \Bigg\}^{1/(d+4)}, \tag{A.2} \]

\[ \bar h_{PI} = \Bigg\{ \frac{d\, \|K\|_2^{2d} \hat B(g)}{4 n \sigma_K^4\, \hat{\bar C}(f)} \Bigg\}^{1/(d+4)} \tag{A.3} \]

in which $\hat B_{ROT}(g)$ and $\hat C_{ROT,\alpha\beta}(f)$ are the same as those used in equation (3.1). It is easy to verify that $\bar h_{PI} = \bar h_{opt}\{1 + O_p(n^{-2/(d+6)})\}$.

Appendix B

We denote by $S$ the support of $w(x)$, and all integrals in the variable $x$ are over $S$. The following assumptions are needed to prove theorems 1, 3 and 4.

(a) The density $p(x)$ of $X$ exists, is continuously differentiable up to order 2 and is bounded below away from 0 on $S$ (assumption 1).
(b) The function $f(x)$ is continuously differentiable on $S$ up to order 4, whereas $g(x)$ is continuous on $S$ and positive on an open subset of $S$ (assumption 2).
(c) The bandwidth $h = h(n)$ is a positive number depending on $n$ such that $h \to 0$ and $nh^{d+4} \to \infty$ (assumption 3).

Denote by $M^d$ the set of all non-negative definite matrices $M$ that are positive definite for non-negative vectors, i.e.

\[ M^d = \Big\{ M: M \text{ non-negative definite and } \inf_{h \in \bar S_+^{d-1}} h^T M h = c(M) > 0 \Big\} \tag{B.1} \]

where $\bar S_+^{d-1} \subset S^{d-1} \subset R^d$ denotes the portion of the $(d-1)$-dimensional unit sphere $S^{d-1}$ with non-negative co-ordinates. For the diagonal bandwidth to exist, we assume that the matrix $C(f) \in M^d$ (assumption 4).

Assumption 4 is not trivial.
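Before the analytical counterexample that follows, here is a quick numerical check (our own code) that assumption 4 can fail. For $f(x_1, x_2) = \exp(x_1)\sin(\pi x_2)$ with $p = w$ the uniform density on the unit square, the sketch approximates $C(f)$ by a midpoint-rule double integral and confirms that the quadratic form vanishes at the non-negative vector $(\pi^2, 1)^T$:

```python
import numpy as np

m = 400
t = (np.arange(m) + 0.5) / m                 # midpoint grid on [0, 1]
x1, x2 = np.meshgrid(t, t, indexing="ij")

f11 = np.exp(x1) * np.sin(np.pi * x2)              # d^2 f / dx1^2
f22 = -np.pi ** 2 * np.exp(x1) * np.sin(np.pi * x2)  # d^2 f / dx2^2

def c_entry(a, b):
    # C_ab(f) = integral of f_aa * f_bb * p * w with p = w = 1 on [0,1]^2,
    # approximated by the mean over the midpoint grid
    return (a * b).mean()

C = np.array([[c_entry(f11, f11), c_entry(f11, f22)],
              [c_entry(f22, f11), c_entry(f22, f22)]])

v = np.array([np.pi ** 2, 1.0])
quad_form = v @ C @ v        # ~ 0 although v has non-negative coordinates
```

The vanishing quadratic form shows that the infimum in (B.1) is 0, so this $C(f)$ does not belong to $M^d$.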
Take the function $f(x_1, x_2) = \exp(x_1)\sin(x_2)$; together with p(x) equal to the uniform density on $[0, 1]^2$ and w(x) = p(x), we have

$C(f) = \Big\{\int_{[0,1]^2} \exp(2x_1)\sin^2(x_2)\,dx\Big\} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$

and therefore $\{2^{-1/2}(1, 1)\}\,C(f)\,\{2^{-1/2}(1, 1)\}^T = 0$ even though $2^{-1/2}(1, 1)^T \in S_+^1$, so that $C(f) \notin M(2)$.

For the proof of theorem 1, we also need lemmas 1 and 2 and theorem 6.

Lemma 1.

$\frac{\partial Q}{\partial v}(v, M, a) = \frac{1}{2}\begin{pmatrix}\sum_{\beta=1}^d M_{1\beta} v_\beta \\ \vdots \\ \sum_{\beta=1}^d M_{d\beta} v_\beta\end{pmatrix} - \frac{a}{2\sqrt{(v_1 \cdots v_d)}}\begin{pmatrix}v_1^{-1} \\ \vdots \\ v_d^{-1}\end{pmatrix},$   (B.2)

$\frac{\partial^2 Q}{\partial v\,\partial v^T}(v, M, a) = \frac{1}{2} M + \frac{a}{4\sqrt{(v_1 \cdots v_d)}}\begin{pmatrix}3v_1^{-2} & v_1^{-1}v_2^{-1} & \cdots & v_1^{-1}v_d^{-1} \\ v_2^{-1}v_1^{-1} & 3v_2^{-2} & \cdots & v_2^{-1}v_d^{-1} \\ \vdots & \vdots & \ddots & \vdots \\ v_d^{-1}v_1^{-1} & v_d^{-1}v_2^{-1} & \cdots & 3v_d^{-2}\end{pmatrix}.$   (B.3)

Proof. Direct matrix calculation plus definition (2.1) yield the results.

Lemma 2. For any fixed M ∈ M(d) and a > 0, Q(v, M, a) is a strictly convex function of $v \in R^d$ and therefore can be minimized by at most one vector v in $R^d$.

Proof. Both terms in equation (B.3) are non-negative definite, whereas the second term is positive definite, so the Hessian of Q(v, M, a) with respect to v is positive definite.

Theorem 6. For any fixed M ∈ M(d) and a > 0, Q(v, M, a) has a unique minimum vector $v_{\mathrm{opt}} = v(M, a)$, which is infinitely continuously differentiable in both M and a.

Proof. M ∈ M(d) implies that

$Q(v, M, a) \geq \frac{1}{4}\sum_{\alpha,\beta=1}^d M_{\alpha\beta} v_\alpha v_\beta \geq \frac{1}{4} c(M) \|v\|^2$

for all $v \in R^d$ with Euclidean norm $\|v\| = (\sum_{\alpha=1}^d v_\alpha^2)^{1/2}$, so

$\lim_{\|v\| \to \infty} Q(v, M, a) \geq \lim_{\|v\| \to \infty} \frac{1}{4} c(M)\|v\|^2 = \infty,$

while it is also clear that we always have

$\lim_{\|v\| \to 0} Q(v, M, a) \geq \lim_{\|v\| \to 0} \frac{a}{\sqrt{(v_1 \cdots v_d)}} = \infty.$

Thus the infimum of Q(v, M, a) is attained at a v whose norm is bounded away from ∞ and 0, and by the previous lemma this v is unique. Equation (B.2) and the implicit function theorem show that v(M, a) is an infinitely continuously differentiable function of M and a.

To prove theorem 3, we need the dispersion matrix of the partial local cubic estimator.

Lemma 3.
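Definition (2.1) of Q is not part of this appendix, but the gradient (B.2) and Hessian (B.3) are consistent with the form $Q(v, M, a) = \frac14 v^T M v + a(v_1 \cdots v_d)^{-1/2}$, which is what the sketch below assumes; the function names and gradient-descent settings are illustrative, not from the paper. Because lemma 2 makes Q strictly convex and theorem 6 guarantees a unique minimizer, plain gradient descent from a positive starting point converges; for $M = I_2$ and $a = 1$ calculus gives the symmetric minimizer $v = (1, 1)$, which the sketch recovers.

```python
import numpy as np

def Q_grad(v, M, a):
    # Equation (B.2): (1/2) M v  -  a / {2 sqrt(v1...vd)} * (1/v1, ..., 1/vd)
    return 0.5 * M @ v - 0.5 * a / np.sqrt(np.prod(v)) / v

def minimize_Q(M, a, v0, step=0.1, iters=5000):
    # Q is strictly convex on positive v (lemma 2), so the unique
    # minimizer of theorem 6 is reached by simple gradient descent.
    v = np.asarray(v0, dtype=float)
    for _ in range(iters):
        v = v - step * Q_grad(v, M, a)
    return v

# d = 2, M = identity, a = 1: the unique minimizer is v = (1, 1).
v_star = minimize_Q(np.eye(2), 1.0, [2.0, 0.5])
```

The barrier term $a(v_1 \cdots v_d)^{-1/2}$ pushes the iterate away from the boundary of the positive orthant, which is why no positivity constraint is needed here.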
Let $Z_\alpha$ be as in equation (4.1) and

$H = \mathrm{diag}(1, h^{-2} I_{2d-1}, h^{-1} I_d, h^{-3} I_{2d-2}, h^{-3}).$

As $n \to \infty$,

$\frac{1}{n} H^T Z_\alpha^T W Z_\alpha H = \mathrm{diag}\{\Lambda, \sigma_K^2 I_{d-1}, D\}\,p(x)\{1 + o_p(1)\},$   (B.4)

where

$\Lambda = \begin{pmatrix} 1 & \sigma_K^2 1_{1 \times d} \\ \sigma_K^2 1_{d \times 1} & \sigma_K^4\{(\mu_4(K)\sigma_K^{-4} - 1) I_d + 1_{d \times d}\} \end{pmatrix},$

and

$\Big(\frac{1}{n} H^T Z_\alpha^T W Z_\alpha H\Big)^{-1} = \mathrm{diag}\{\Lambda^{-1}, \sigma_K^{-2} I_{d-1}, D^{-1}\}\,\frac{1 + o_p(1)}{p(x)},$   (B.5)

uniformly in a compact neighbourhood of x. Here D is the (3d−1) × (3d−1) moment matrix of the retained cubic columns, with blocks A, B and C built from the kernel moments $\sigma_K^2$, $\mu_4(K)$ and $\mu_6(K)$.

Proof. Equation (B.4) follows by the usual array-type central limit theorem, K being a symmetric compact product kernel; see, for example, Wand and Jones (1995). Equation (B.5) follows by direct verification. The exact inverse of D can be obtained by using the tools in Lütkepohl (1996).

Proof of theorem 3. Given any α = 1, ..., d and denoting $Z_\alpha$ by Z, we first note that, with $e_\gamma$ the unit vector selecting the $(X_{i\alpha} - x_\alpha)^2$ column of Z,

$e_\gamma^T H (H^T Z^T W Z H)^{-1} H^T Z^T W Z\,e_0 = e_\gamma^T (Z^T W Z)^{-1} Z^T W Z\,e_0 = 0,$

$e_\gamma^T H (H^T Z^T W Z H)^{-1} H^T Z^T W Z\,e_\gamma = e_\gamma^T (Z^T W Z)^{-1} Z^T W Z\,e_\gamma = 1,$

whereas for any η = 1, ..., 5d − 1 with η ≠ γ,

$e_\gamma^T H (H^T Z^T W Z H)^{-1} H^T Z^T W Z\,e_\eta = e_\gamma^T (Z^T W Z)^{-1} Z^T W Z\,e_\eta = 0.$

Combining these with equation (B.5) of lemma 3, we have

$\hat f_{\alpha\alpha}(x) - f_{\alpha\alpha}(x) = 2 e_\gamma^T H (H^T Z^T W Z H)^{-1} H^T Z^T W Y - f_{\alpha\alpha}(x) = T_1 + T_2 + T_3,$

where

$T_1 = \frac{2\{1 + o_p(1)\}}{\{\mu_4(K) - \sigma_K^4\} p(x) n h^2} \sum_{i=1}^n K_h(X_i - x)\Big\{\frac{(X_{i\alpha} - x_\alpha)^2}{h^2} - \sigma_K^2\Big\}\Big[f(X_i) - f(x) - (X_i - x)^T \nabla f(x) - \frac{1}{2}(X_i - x)^T \nabla^2 f(x)(X_i - x) - \frac{1}{3!} f^{(3)}_{\alpha\alpha\alpha}(x)(X_{i\alpha} - x_\alpha)^3 - \frac{3}{3!} \sum_{\beta \neq \alpha} f^{(3)}_{\alpha\alpha\beta}(x)(X_{i\alpha} - x_\alpha)^2 (X_{i\beta} - x_\beta) - \frac{3}{3!} \sum_{\beta \neq \alpha} f^{(3)}_{\alpha\beta\beta}(x)(X_{i\alpha} - x_\alpha)(X_{i\beta} - x_\beta)^2\Big],$
$T_2 = -\frac{2\{1 + o_p(1)\}}{\{\mu_4(K) - \sigma_K^4\} p(x) n h^2} \sum_{i=1}^n K_h(X_i - x)\Big\{\frac{(X_{i\alpha} - x_\alpha)^2}{h^2} - \sigma_K^2\Big\} \sum_{\beta < \gamma;\,\beta,\gamma \neq \alpha} (X_{i\beta} - x_\beta)(X_{i\gamma} - x_\gamma) f_{\beta\gamma}(x),$

$T_3 = \frac{2\{1 + o_p(1)\}}{\{\mu_4(K) - \sigma_K^4\} p(x) n h^2} \sum_{i=1}^n K_h(X_i - x)\Big\{\frac{(X_{i\alpha} - x_\alpha)^2}{h^2} - \sigma_K^2\Big\} g(X_i)\varepsilon_i.$

The asymptotic variance of $T_3$ is calculated as

$\frac{4\{1 + o(1)\}}{\{\mu_4(K) - \sigma_K^4\}^2 p(x)^2 n h^4} \int K_h(u - x)^2 \Big\{\frac{(u_\alpha - x_\alpha)^2}{h^2} - \sigma_K^2\Big\}^2 g^2(u) p(u)\,du,$

which (using u = x + hv) becomes

$\frac{4\big\{\int K^2(v_1)(v_1^2 - \sigma_K^2)^2\,dv_1\big\} \|K\|_2^{2(d-1)} g^2(x)}{\{\mu_4(K) - \sigma_K^4\}^2 p(x) n h^{d+4}}\{1 + o(1)\} = \frac{\sigma_\alpha^2(x)}{n h^{d+4}}\{1 + o(1)\},$   (B.6)

with $\sigma_\alpha^2(x)$ as given in equation (4.4). A similar procedure applied to $T_1$ (replacing the sum over i by an integral against p and substituting u = x + hv) expands f(x + hv) to fourth order; the odd-order terms vanish by the symmetry of K, and the surviving terms are governed by the moment identities

$\int K(v)(v_\alpha^2 - \sigma_K^2) v_\beta v_\gamma\,dv = 0 \quad (\beta < \gamma;\ \beta, \gamma \neq \alpha),$
$\int K(v)(v_\alpha^2 - \sigma_K^2) v_\alpha^2 v_\beta^2\,dv = \sigma_K^2\{\mu_4(K) - \sigma_K^4\} \quad (\beta \neq \alpha),$
$\int K(v)(v_\alpha^2 - \sigma_K^2) v_\alpha^4\,dv = \mu_6(K) - \sigma_K^2 \mu_4(K).$   (B.7)

Collecting terms gives

$T_1 = b_\alpha(x) h^2 + o(h^2),$   (B.8)

where $b_\alpha(x)$ is as given in equation (4.3). The term $T_2$ is treated as

$T_2 = -\frac{2h^2}{\{\mu_4(K) - \sigma_K^4\}} \sum_{\beta < \gamma;\,\beta,\gamma \neq \alpha} \Big\{\int K(v)(v_\alpha^2 - \sigma_K^2) v_\beta v_\gamma\,dv\Big\} f_{\beta\gamma}(x)\{1 + o_p(1)\} = o_p(h^2),$   (B.9)

since $\int K(v)(v_\alpha^2 - \sigma_K^2) v_\beta v_\gamma\,dv = 0$ for β < γ; β, γ ≠ α, by equation (B.7). Now equations (B.8), (B.9) and (B.6) together prove theorem 3.

Proof of theorem 4.
Since

$\hat f_{\alpha\alpha}(x) = f_{\alpha\alpha}(x) + \{b_\alpha(x) h^2 + T_{3,\alpha}\}\{1 + o_p(h^2)\} + o_p\Big\{\frac{1}{\sqrt{(n h^{d+4})}}\Big\},$

where

$T_{3,\alpha} = \frac{2}{\{\mu_4(K) - \sigma_K^4\} p(x) n h^2} \sum_{i=1}^n K_h(X_i - x)\Big\{\frac{(X_{i\alpha} - x_\alpha)^2}{h^2} - \sigma_K^2\Big\} g(X_i)\varepsilon_i,$

we have

$\hat C_{\alpha\beta}(f, h) = \int \hat f_{\alpha\alpha}(x)\,\hat f_{\beta\beta}(x)\,w(x) p(x)\,dx + O_p(n^{-1/2} h^2) = C_{\alpha\beta}(f) + h^2\{B_\alpha(f) + B_\beta(f)\} + \int T_{3,\alpha} T_{3,\beta}\,w(x) p(x)\,dx + O_p(h^4) + o_p\Big\{\frac{1}{\sqrt{(n^2 h^{d+8})}}\Big\}.$   (B.10)

Thus it remains to compute the term $\int T_{3,\alpha} T_{3,\beta}\,w(x) p(x)\,dx$, which has a simple decomposition as

$\sum_{1 \leq i < j \leq n} 2 H(X_i, X_j)\varepsilon_i \varepsilon_j + \sum_{1 \leq i \leq n} H(X_i, X_i)\varepsilon_i^2,$

where

$H(X_i, X_j) = \frac{4}{\{\mu_4(K) - \sigma_K^4\}^2 n^2 h^4} \int \frac{K_h(X_i - x) K_h(X_j - x)\,w(x)}{p(x)} \Big\{\frac{(X_{i\alpha} - x_\alpha)^2}{h^2} - \sigma_K^2\Big\}\Big\{\frac{(X_{j\beta} - x_\beta)^2}{h^2} - \sigma_K^2\Big\} g(X_i) g(X_j)\,dx.$

Note first that a change of variables y = x + hu shows that

$E\{H(X_1, X_1)\varepsilon_1^2\} = \frac{B(g) V(K)}{n^2 h^{d+4}}\{1 + o(1)\}$

and similarly that

$E\{H(X_1, X_1)^2 \varepsilon_1^4\} = O\Big(\frac{1}{n^4 h^{8+3d}}\Big).$

Hence by an array-type central limit theorem

$\sum_{1 \leq i \leq n} H(X_i, X_i)\varepsilon_i^2 = \frac{B(g) V(K)}{n h^{d+4}} + O_p\Big\{\frac{1}{\sqrt{(n^3 h^{8+3d})}}\Big\}.$   (B.11)

Meanwhile, using a central limit theorem for non-degenerate U-statistics as contained in Hall (1984), one can verify that

$\sum_{1 \leq i < j \leq n} 2 H(X_i, X_j)\varepsilon_i \varepsilon_j$

is asymptotically normal with mean 0 and asymptotic variance

$\frac{1}{2} n^2 E\{4 H(X_1, X_2)^2 \varepsilon_1^2 \varepsilon_2^2\} = 2 n^2 E\{H(X_1, X_2)^2\}.$

Writing out $E\{H(X_1, X_2)^2\}$ as a fourfold integral and applying the changes of variables $y_1 - x_1 = h u_1$, $y_2 - x_2 = h u_2$ and then $x_2 - x_1 = h t_1$ yields

$2 n^2 E\{H(X_1, X_2)^2\} = \frac{32 \int g^4(x) w^2(x)\,dx}{16\{\mu_4(K) - \sigma_K^4\}^4 n^2 h^{d+8}} \int F_\alpha(t) F_\beta(t)\,dt\,\{1 + o(1)\},$

where $F_\alpha(t)$ is as given in theorem 4. Hence $\sum_{1 \leq i < j \leq n} 2 H(X_i, X_j)\varepsilon_i \varepsilon_j$ has mean 0 and asymptotic variance

$\frac{32 \int g^4(x) w^2(x)\,dx}{16\{\mu_4(K) - \sigma_K^4\}^4 n^2 h^{d+8}} \int F_\alpha(t) F_\beta(t)\,dt.$

This, plus equations (B.10) and (B.11), completes the proof of theorem 4.

References

Cheng, M. Y. (1997) A bandwidth selector for local linear density estimators. Ann. Statist., 25, 1001–1013.
Fan, J. and Gijbels, I. (1995) Adaptive order polynomial fitting: bandwidth robustification and bias reduction. J. Comput. Graph. Statist., 4, 213–227.
— (1996) Local Polynomial Modelling and Its Applications. London: Chapman and Hall.
Grund, B. and Polzehl, J. (1998) Bias corrected bootstrap bandwidth selection. J. Nonparam. Statist., 8, 97–126.
Hall, P. (1984) Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multiv. Anal., 14, 1–16.
Härdle, W. and Marron, J. S. (1995) Fast and simple smoothing parameter selection. Comput. Statist. Data Anal., 20, 1–17.
Härdle, W., Tsybakov, A. B. and Yang, L. (1998) Nonparametric vector autoregression. J. Statist. Plann. Inf., 68, 221–245.
Herrmann, E., Wand, M. P., Engel, J. and Gasser, Th. (1995) A bandwidth selector for bivariate kernel regression. J. R. Statist. Soc. B, 57, 171–180.
Jones, M. C., Marron, J. S. and Sheather, S. J. (1996a) A brief survey of bandwidth selection for density estimation. J. Am. Statist. Ass., 91, 401–407.
— (1996b) Progress in data-based bandwidth selection for kernel density estimation. Comput. Statist., 11, 337–381.
Lütkepohl, H. (1996) Handbook of Matrices. Chichester: Wiley.
Ruppert, D. (1997) Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation. J. Am. Statist. Ass., 92, 1049–1062.
Ruppert, D., Sheather, S. J. and Wand, M. P. (1995) An effective bandwidth selector for local least squares regression. J. Am. Statist. Ass., 90, 1257–1270.
Ruppert, D. and Wand, M. P. (1994) Multivariate locally weighted least squares regression. Ann. Statist., 22, 1346–1370.
Sain, S. R., Baggerly, K. A. and Scott, D. W. (1994) Cross-validation of multivariate densities. J. Am. Statist. Ass., 89, 807–817.
Silverman, B. W. (1986) Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Tschernig, R. and Yang, L. (1999) Nonparametric lag selection for time series. J. Time Ser. Anal., to be published.
Wand, M. P. and Jones, M. C. (1993) Comparison of smoothing parametrizations in bivariate kernel density estimation. J. Am. Statist. Ass., 88, 520–528.
— (1994) Multivariate plug-in bandwidth selection. Comput. Statist., 9, 97–117.
— (1995) Kernel Smoothing. London: Chapman and Hall.
