STAT 587 Homework Assignment No. 1

Problem 1:

        Install R on your computer. Read the first 51 pages of "An Introduction to R" (also available at
        http://www.r-project.org/).

        [Remark: you do not need to hand in the results for this problem.]

Problem 2:

    A. Brand preference. In a small-scale experimental study of the relation between degree of brand
       liking (Y) and moisture content (X1) and sweetness (X2) of the product, the following results
       were obtained from the experiment based on a completely randomized design (data are coded):


         i:    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
        X1:    4    4    4    4    6    6    6    6    8    8    8    8   10   10   10   10
        X2:    2    4    2    4    2    4    2    4    2    4    2    4    2    4    2    4
        Yi:   64   73   61   76   72   80   71   83   83   89   86   93   88   95   94  100

    a. Fit the first-order regression model in X1 and X2 to the data. State the estimated regression
       function. How is the estimated coefficient β̂1 interpreted here?
    b. Obtain the residuals and prepare a box plot of the residuals. What information does this plot
       provide?
    c. Plot the residuals against Ŷ, X1, X2, and X1X2 on separate graphs. Also prepare a normal
       probability plot. Analyze the plots and summarize your findings.
    d. Conduct a formal test for lack of fit of the first-order regression function; use α = .01. State the
       alternatives, decision rule, and conclusion.
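
One possible way to organize the computations for parts a and d is sketched below. It is written in Python with no external packages, purely to make the arithmetic explicit; the assignment itself expects R. The helper `solve` and all variable names are illustrative choices, not part of the assignment. The sketch fits the first-order model by solving the normal equations, then splits the error sum of squares into pure-error and lack-of-fit parts using the replicated (X1, X2) combinations.

```python
# Brand preference data, copied from the table in Problem 2A.
x1 = [4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 10, 10, 10, 10]
x2 = [2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4]
y = [64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting (illustrative helper)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Design matrix with an intercept column, then the normal equations X'X b = X'y.
X = [[1.0, a, b] for a, b in zip(x1, x2)]
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
sse = sum(e * e for e in residuals)

# Pure-error sum of squares: variation of Y within each replicated (X1, X2) cell.
cells = {}
for a, b, yi in zip(x1, x2, y):
    cells.setdefault((a, b), []).append(yi)
sspe = sum(
    sum((v - sum(vals) / len(vals)) ** 2 for v in vals) for vals in cells.values()
)

n, p, c = len(y), 3, len(cells)  # cases, model parameters, distinct X combinations
f_star = ((sse - sspe) / (c - p)) / (sspe / (n - c))

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
print(f"SSE = {sse:.2f}, SSPE = {sspe:.2f}, F* = {f_star:.3f}")
```

The statistic F* would then be compared with the F(1 - α; c - p, n - c) percentile from a table or from R's `qf`. The residuals computed here are the ones needed for the plots in parts b and c.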


    B. Refer to Brand preference. The diagonal elements of the hat matrix are:
       h55 = h66 = h77 = h88 = h99 = h10,10 = h11,11 = h12,12 = .137 and
       h11 = h22 = h33 = h44 = h13,13 = h14,14 = h15,15 = h16,16 = .237.

             a. Explain the reason for the pattern in the diagonal elements of the hat matrix.
             b. According to the rule of thumb stated in the chapter, are any of the observations outlying
                with regard to their X values?
             c. Obtain the studentized deleted residuals and identify any outlying Y observations.
             d. Case 14 appears to be a borderline outlying Y observation. Obtain the DFFITS,
                DFBETAS, and Cook’s distance values for this case to assess its influence. What do you
                conclude?
             e. Calculate the average absolute percent difference in the fitted values with and without
                case 14. What does this measure indicate about the influence of case 14?
             f. Calculate Cook’s distance Di for each case. Are any cases influential according to this
                measure?
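
The leverage values quoted in Problem 2B can be checked directly from the design, as in the minimal Python sketch below (standard library only; the assignment itself expects R). It computes each diagonal element as h_ii = x_i' (X'X)^{-1} x_i. The cutoff 2p/n used here is the common rule of thumb for leverage and is an assumption about which rule the chapter states; the helper `solve` and the variable names are illustrative.

```python
# Predictor values from the Brand preference table in Problem 2A.
x1 = [4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 10, 10, 10, 10]
x2 = [2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting (illustrative helper)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


X = [[1.0, a, b] for a, b in zip(x1, x2)]
n, p = len(X), 3
xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]

# h_ii = x_i' (X'X)^{-1} x_i, computed by solving (X'X) z = x_i and taking x_i' z.
leverage = [sum(xi * zi for xi, zi in zip(row, solve(xtx, row))) for row in X]

# ASSUMPTION: the rule of thumb is h_ii > 2p/n.
cutoff = 2 * p / n
flagged = [i + 1 for i, h in enumerate(leverage) if h > cutoff]

for i, h in enumerate(leverage, start=1):
    print(f"case {i:2d}: h = {h:.4f}")
print(f"cutoff 2p/n = {cutoff:.3f}, flagged cases: {flagged}")
```

Because the leverage depends only on the X values, the pattern in the output reflects how far each (X1, X2) combination lies from the center of the design. The leverages should also sum to p, which is a quick sanity check.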
Problem 3:

   A. Car purchase. A marketing research firm was engaged by an automobile manufacturer to
      conduct a pilot study to examine the feasibility of using logistic regression for ascertaining the
      likelihood that a family will purchase a new car during the next year. A random sample of 33
      suburban families was selected. Data on annual family income (X1, in thousands of dollars) and the
      current age of the oldest family automobile (X2, in years) were obtained. A follow-up interview
      conducted 12 months later was used to determine whether the family actually purchased a new
      car (Y = 1) or did not purchase a new car (Y = 0) during the year.

          i:      1      2      3        ...          31     32     33
        Xi1:     32     45     60        ...          21     32     17
        Xi2:      3      2      2        ...           3      5      1
         Yi:      0      0      1        ...           0      1      0


       A multiple logistic regression model with two predictor variables in first-order terms is assumed to
       be appropriate.

       a. Find the maximum likelihood estimates of β0, β1, and β2. State the fitted response function.
       b. Obtain exp(b1) and exp(b2) and interpret these numbers.
       c. What is the estimated probability that a family with annual income of $50 thousand and an
       oldest car of 3 years will purchase a new car next year?
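
Once estimates b0, b1, and b2 are available, parts b and c reduce to evaluating the logistic response function. The Python sketch below shows the form of that calculation. The coefficient values are HYPOTHETICAL placeholders chosen for illustration, not estimates from the car purchase data (the full data set is not reproduced in this assignment), and the function name is an illustrative choice.

```python
import math


def fitted_probability(b0, b1, b2, x1, x2):
    """Fitted logistic response: 1 / (1 + exp(-(b0 + b1*x1 + b2*x2)))."""
    logit = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + math.exp(-logit))


# HYPOTHETICAL coefficients for illustration only; replace with the
# maximum likelihood estimates obtained in part a.
b0, b1, b2 = -4.0, 0.05, 0.5

# exp(b1): multiplicative change in the estimated odds of purchase per
# additional thousand dollars of income, holding car age fixed.
odds_ratio_income = math.exp(b1)
# exp(b2): the same for one additional year of age of the oldest car,
# holding income fixed.
odds_ratio_age = math.exp(b2)

# Estimated purchase probability for income of $50 thousand and a 3-year-old car.
p_hat = fitted_probability(b0, b1, b2, x1=50, x2=3)

print(f"exp(b1) = {odds_ratio_income:.4f}, exp(b2) = {odds_ratio_age:.4f}")
print(f"estimated probability = {p_hat:.4f}")
```

In R, the estimates themselves would typically come from `glm` with `family = binomial`.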
   B. Refer to Car purchase.
       a. To assess the appropriateness of the logistic regression function, form three groups of 11 cases
       each according to their fitted logit values π̂'. Plot the estimated proportions pj against the
       midpoints of the π̂' intervals. Is the plot consistent with a response function of monotonic
       sigmoidal shape? Explain.
       c. Obtain the deviance residuals and present them in an index plot. Do there appear to be any
       outlying cases?
       d. Construct a half-normal probability plot of the absolute deviance residuals. Do any cases here
      appear to be outlying?
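
For the two residual plots, the quantities to be plotted can be computed as in the Python sketch below (standard library only; the assignment itself expects R). The outcomes and fitted probabilities are HYPOTHETICAL values for illustration, not the car purchase data. The expected half-normal values use one common approximation, z((k + n - 1/8) / (2n + 1/2)) for the k-th smallest absolute residual; whether the course uses exactly this formula is an assumption.

```python
import math
from statistics import NormalDist


def deviance_residual(y, pi_hat):
    """Deviance residual for a binary outcome y in {0, 1} with fitted probability pi_hat."""
    loglik = y * math.log(pi_hat) + (1 - y) * math.log(1 - pi_hat)
    sign = 1.0 if y == 1 else -1.0  # sign of (y - pi_hat) for binary y
    return sign * math.sqrt(-2.0 * loglik)


# HYPOTHETICAL outcomes and fitted probabilities, for illustration only.
ys = [0, 0, 1, 1, 0]
pis = [0.20, 0.60, 0.70, 0.10, 0.05]

dev = [deviance_residual(yi, pi) for yi, pi in zip(ys, pis)]

# Index plot: dev[i] against the case number i + 1.
for i, d in enumerate(dev, start=1):
    print(f"case {i}: deviance residual = {d:+.3f}")

# Half-normal plot: ordered |dev| against approximate expected values.
n = len(dev)
ordered_abs = sorted(abs(d) for d in dev)
expected = [
    NormalDist().inv_cdf((k + n - 1 / 8) / (2 * n + 1 / 2)) for k in range(1, n + 1)
]
for a, e in zip(ordered_abs, expected):
    print(f"|residual| = {a:.3f}, expected = {e:.3f}")
```

Cases whose absolute deviance residual sits well above the pattern formed by the other points are the candidates for outlying cases.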


Additional Problem (Required for STAT PhD students; optional for other students)

       Prove the two equations in Remark 3 on page 8 of the lecture notes on GLM
       (http://www.stat.rutgers.edu/~mxie/stat587/lecture2.pdf).

       [Hint: Consider the expectations of the first and the second derivatives (w.r.t. theta_i) of the log-
       likelihood function. ]
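
For orientation, the hint presumably refers to the two standard likelihood identities written out below, which hold under the usual regularity conditions that allow differentiation under the integral sign. The symbol \ell_i for the log-likelihood contribution of observation i is notation assumed here; how the identities map onto the two equations in Remark 3 depends on the notation of the lecture notes, which is not reproduced in this assignment.

```latex
% Assumed notation: \ell_i(\theta_i) is the log-likelihood contribution of observation i.
\begin{align}
  \mathrm{E}\!\left[\frac{\partial \ell_i}{\partial \theta_i}\right] &= 0, \\
  \mathrm{E}\!\left[\frac{\partial^2 \ell_i}{\partial \theta_i^2}\right]
    + \mathrm{E}\!\left[\left(\frac{\partial \ell_i}{\partial \theta_i}\right)^{2}\right] &= 0.
\end{align}
```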

								