HOMEWORK ON BANDWIDTH CHOICE, due Tue Feb 20

Suppose that we have data $Y_i, x_i$, $i = 1, \ldots, n$ with $Y_i = m(x_i) + \epsilon_i$, with the
$\epsilon_i$'s independent with mean zero and variance $\sigma^2$. We'll estimate $m'$ using a local
quadratic fit with kernel $K$. We can show that, for $x_i$'s equally spaced on $[0, 1]$,
under regularity conditions,

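\[
\mathrm{Bias}(\hat m'(x)) \sim h^2 \, \frac{m'''(x)}{6} \, \frac{\int u^4 K(u)\,du}{\int u^2 K(u)\,du} \equiv h^2 \, \frac{m'''(x)}{6} \, K_b
\]

\[
\mathrm{var}(\hat m'(x)) \sim \frac{\sigma^2}{n h^3} \, \frac{\int u^2 K^2(u)\,du}{\left[\int u^2 K(u)\,du\right]^2} \equiv \frac{\sigma^2}{n h^3} \, K_v
\]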
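
As a reminder of how the two formulae fit together, here is a sketch that assumes the usual decomposition of mean squared error into squared bias plus variance, and that the estimator in question is the derivative estimate $\hat m'(x)$:

```latex
\[
\mathrm{MSE}(\hat m'(x))
  = \mathrm{Bias}^2(\hat m'(x)) + \mathrm{var}(\hat m'(x))
  \sim h^4 \, \frac{[m'''(x)]^2}{36} \, K_b^2 + \frac{\sigma^2}{n h^3} \, K_v
\]
```

The first term grows with $h$ and the second shrinks with $h$, which is the trade-off that the choice of bandwidth has to balance.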

 1a. What is the asymptotic integrated mean squared error? What is $h_{opt}$, the
     value of $h$ that minimizes the asymptotic integrated mean squared error?
     (You don't need to do any fancy theory here - assume that you can just
     go ahead and use the above formulae.)
 1b. Suppose that $K(u) = (1/\sqrt{2\pi}) \exp(-u^2/2)$, that is, $K$ is the standard
     normal kernel. Find the values of $K_b$ and $K_v$ in the formulae for the
     asymptotic bias and variance. (For your calculations, you can reference a
     probability book or Wikipedia or ... for moments of a normal distribution.
     But do state your reference, and exactly what you are getting from the
     reference.)
 1c. Suppose that $m(x) = a \exp(bx) + cx + d$. What is the value of $h_{opt}$ from
     part 1a? (It will possibly depend on $a$, $b$, $c$, $d$.)
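
One way to sanity-check hand calculations for parts 1a-1c is to do the integrals numerically. The sketch below is only an illustration: the values of a, b, sigma2 and n are hypothetical, and the closed form for h assumes that the asymptotic integrated mean squared error has the shape A h^4 + B / h^3, obtained by squaring the bias, adding the variance and integrating over [0, 1].

```r
# Standard normal kernel, as in part 1b
K <- function(u) dnorm(u)

# Kernel integrals by numerical integration over the real line
mu2 <- integrate(function(u) u^2 * K(u), -Inf, Inf)$value
mu4 <- integrate(function(u) u^4 * K(u), -Inf, Inf)$value
nu2 <- integrate(function(u) u^2 * K(u)^2, -Inf, Inf)$value

Kb <- mu4 / mu2      # constant in the bias formula
Kv <- nu2 / mu2^2    # constant in the variance formula

# Hypothetical values, for illustration only (not from any data set)
a <- 2; b <- 3; sigma2 <- 0.25; n <- 100

# Third derivative of m(x) = a exp(bx) + cx + d; the linear part drops out
m3 <- function(x) a * b^3 * exp(b * x)
I3 <- integrate(function(x) m3(x)^2, 0, 1)$value

# Assumed form of the asymptotic integrated mean squared error
amise <- function(h) h^4 * Kb^2 * I3 / 36 + sigma2 * Kv / (n * h^3)

# Numerical minimizer, and the minimizer from setting the derivative to zero
h_num <- optimize(amise, interval = c(1e-3, 1), tol = 1e-8)$minimum
h_closed <- (27 * sigma2 * Kv / (n * Kb^2 * I3))^(1 / 7)

print(c(Kb = Kb, Kv = Kv, h_num = h_num, h_closed = h_closed))
```

If the numerical and closed-form minimizers disagree, that usually points to an algebra slip in the hand derivation rather than a problem with the integrals.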

   Data analysis: For 2a)-2b) use the dataset used in class - onebms.txt - that
has body mass as a function of week (weeks -1 to 60 with some missing). For
ease, rescale to (0,1) and assume that these $x_i$'s are evenly spaced. You'll
estimate the growth rate (the derivative of the regression function) using locpoly,
with local quadratic and normal kernel (assume locpoly uses the standard
normal kernel).

 2a. Find a “first generation” rule of thumb value of $h$ as follows. Estimate $a$,
     $b$, $c$ and $d$ in the model $m(x) = a \exp(bx) + cx + d$ using non-linear least
     squares (see my R code - I had trouble with R’s nls, so wrote this). Use
     the estimated $m$: $\hat m(x) = \hat a \exp(\hat b x) + \hat c x + \hat d$ and Rice’s estimate of $\sigma^2$ to
     plug into the $h_{opt}$ from part 1c above. What is your estimate of $h_{opt}$?
 2b. Use locpoly to estimate $m'$, using the $h$ from 2a.
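
The two ingredients of 2a-2b that are not specific to the class code can be sketched as follows. This is a rough illustration under stated assumptions: the data are simulated (they are not onebms.txt), the bandwidth h is a placeholder rather than the rule-of-thumb value, and rice_sigma2 is a hypothetical helper implementing the first-difference estimate of the error variance usually attributed to Rice.

```r
library(KernSmooth)

set.seed(1)
# Simulated stand-in for the rescaled data (NOT the onebms.txt data)
n <- 200
x <- (1:n) / (n + 1)
y <- 2 * exp(1.5 * x) + 0.5 * x + 1 + rnorm(n, sd = 0.2)

# Hypothetical helper: difference-based estimate of sigma^2,
# sum of squared first differences divided by 2(n - 1)
rice_sigma2 <- function(y) sum(diff(y)^2) / (2 * (length(y) - 1))
sigma2_hat <- rice_sigma2(y)

# Placeholder bandwidth; in 2b this would be the h found in 2a
h <- 0.15

# Local quadratic fit (degree = 2) of the first derivative (drv = 1)
fit <- locpoly(x, y, drv = 1, degree = 2, kernel = "normal", bandwidth = h)

# fit$x is the evaluation grid, fit$y the estimated derivative on that grid
print(sigma2_hat)
print(head(cbind(x = fit$x, deriv = fit$y)))
```

The difference-based estimate is slightly inflated when the regression function changes quickly between neighbouring design points, so it works best when the x's are dense relative to the curvature of m.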

   For 3a)-3c): another data analysis. Use the R data set beav2: body
temperatures of a beaver, taken every 10 minutes.
library(MASS)   # beav2 is in the MASS package
help(beav2)     # gives information
x <- (1:100)/101; y <- beav2$temp   ## gives the temperature
Smooth y in three ways:
 3a. Use KernSmooth’s dpill choice of h in locpoly, for a local linear estimator.
 3b. Use smooth.spline with default choice of smoothing (gcv) - you don’t
     need to code gcv.
 3c. Use the cross-validation R code I wrote for lecture, to choose a bandwidth
     for a local linear estimator.
For 3a)-3c): hand in a plot showing the data and your three estimates (or three
separate plots, if you like). For each estimate, how many effective parameters
did you use? (For some of these, you may need to use the hatmatrix R code.)
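
The three smoothers and their effective numbers of parameters might be set up roughly as below. This is a sketch, not the lecture code: ll_hat and cv_score are hypothetical helpers written for this illustration, the hat matrix is computed exactly at the data points (locpoly itself works on a binned grid, so its fit can differ slightly), and the grid of candidate bandwidths is an arbitrary choice.

```r
library(MASS)        # beav2
library(KernSmooth)  # dpill, locpoly

x <- (1:100) / 101
y <- beav2$temp

# 3a: direct plug-in bandwidth, then a local linear fit
h_pi <- dpill(x, y)
fit_ll <- locpoly(x, y, degree = 1, bandwidth = h_pi)

# 3b: smoothing spline; with cv = FALSE (the default) the criterion is GCV
fit_ss <- smooth.spline(x, y)
df_ss <- fit_ss$df   # effective number of parameters

# Hypothetical helper: hat matrix of a local linear smoother with a
# normal kernel, evaluated at the data points
ll_hat <- function(x, h) {
  n <- length(x)
  H <- matrix(0, n, n)
  for (i in seq_len(n)) {
    w <- dnorm((x - x[i]) / h)           # kernel weights centred at x[i]
    X <- cbind(1, x - x[i])              # local linear design matrix
    XtW <- t(X * w)                      # X'W
    H[i, ] <- solve(XtW %*% X, XtW)[1, ] # row of weights for the intercept
  }
  H
}
H_pi <- ll_hat(x, h_pi)
df_ll <- sum(diag(H_pi))   # trace of the hat matrix

# 3c: leave-one-out cross-validation via the linear-smoother shortcut
cv_score <- function(h) {
  H <- ll_hat(x, h)
  mean(((y - H %*% y) / (1 - diag(H)))^2)
}
hs <- seq(0.01, 0.2, by = 0.005)   # arbitrary candidate grid
h_cv <- hs[which.min(sapply(hs, cv_score))]
df_cv <- sum(diag(ll_hat(x, h_cv)))

print(c(h_pi = h_pi, df_ll = df_ll, df_ss = df_ss, h_cv = h_cv, df_cv = df_cv))
```

For the plot, plot(x, y) followed by lines(fit_ll) and lines(fit_ss) overlays the first two estimates, since both objects carry x and y components.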

