# STAT 587 Homework Assignment No - DOC by kol12169

```
STAT 587 Homework Assignment No. 1

Problem 1:

Install R on your computer. Read the first 51 pages of "An Introduction to R" (also available at
http://www.r-project.org/).

[Remark: You do not need to hand in the results for this problem.]

Problem 2:

A. Brand preference. In a small-scale experimental study of the relation between degree of brand
liking (Y) and moisture content (X1) and sweetness (X2) of the product, the following results
were obtained from the experiment based on a completely randomized design (data are coded):

i:      1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
X1:     4    4    4    4    6    6    6    6    8    8    8    8   10   10   10   10
X2:     2    4    2    4    2    4    2    4    2    4    2    4    2    4    2    4
Yi:    64   73   61   76   72   80   71   83   83   89   86   93   88   95   94  100

a. Fit a first-order regression model to the data. State the estimated regression function. How is
$\hat{\beta}_1$ interpreted here?
b. Obtain the residuals and prepare a box plot of the residuals. What information does this plot
provide?
c. Plot the residuals against $\hat{Y}$, $X_1$, $X_2$, and $X_1 X_2$ on separate graphs. Also prepare a normal
probability plot. Analyze the plots and summarize your findings.
d. Conduct a formal test for lack of fit of the first-order regression function; use $\alpha = .01$. State the
alternatives, decision rule, and conclusion.

B. Refer to Brand preference. The diagonal elements of the hat matrix are:
h55  h66  h77  h88  h99  h10,10  h11,11  h12,12  .137 and
h11  h22  h33  h44  h13,13  h14,14  h15,15  h16,16  .237.

a. Explain the reason for the pattern in the diagonal elements of the hat matrix.
b. According to the rule of thumb stated in the chapter, are any of the observations outlying
with regard to their X values?
c. Obtain the studentized deleted residuals and identify any outlying Y observations.
d. Case 14 appears to be a borderline outlying Y observation. Obtain the DFFITS,
DFBETAS, and Cook’s distance values for this case to assess its influence. What do you
conclude?
e. Calculate the average absolute percent difference in the fitted values with and without
case 14. What does this measure indicate about the influence of case 14?
f. Calculate Cook’s distance $D_i$ for each case. Are any cases influential according to this
measure?

Problem 3:

A. Car purchase. A marketing research firm was engaged by an automobile manufacturer to
conduct a pilot study to examine the feasibility of using logistic regression for ascertaining the
likelihood that a family will purchase a new car during the next year. A random sample of 33
suburban families was selected. Data on annual family income (X1, in thousands of dollars) and the
current age of the oldest family automobile (X2, in years) were obtained. A follow-up interview
conducted 12 months later was used to determine whether the family actually purchased a new
car (Y = 1) or did not purchase a new car (Y = 0) during the year.

i:       1    2    3  ...   31   32   33
Xi1:    32   45   60  ...   21   32   17
Xi2:     3    2    2  ...    3    5    1
Yi:      0    0    1  ...    0    1    0

A multiple logistic regression model with two predictor variables in first-order terms is assumed
to be appropriate.

a. Find the maximum likelihood estimates of $\beta_0$, $\beta_1$, and $\beta_2$. State the fitted response function.
b. Obtain exp(b1) and exp(b2) and interpret these numbers.
c. What is the estimated probability that a family with annual income of \$50 thousand and an
oldest car of 3 years will purchase a new car next year?

B. Refer to Car purchase.
a. To assess the appropriateness of the logistic regression function, form three groups of 11 cases
each according to their fitted logit values $\hat{\pi}'$. Plot the estimated proportions $p_j$ against the
midpoints of the $\hat{\pi}'$ intervals. Is the plot consistent with a response function of monotonic
sigmoidal shape? Explain.
c. Obtain the deviance residuals and present them in an index plot. Do there appear to be any
outlying cases?
d. Construct a half-normal probability plot of the absolute deviance residuals. Do any cases here
appear to be outlying?

Additional Problem (required for STAT PhD students; optional for other students)

Prove the two equations in Remark 3 on page 8 of the lecture notes on GLM
(http://www.stat.rutgers.edu/~mxie/stat587/lecture2.pdf).

[Hint: Consider the expectations of the first and second derivatives (w.r.t. theta_i) of the
log-likelihood function.]

```
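The brand preference data in Problem 2 are given in full, so the least squares fit and the hat matrix diagonal can be checked by direct arithmetic. The assignment itself calls for R; the following is only a minimal sketch in plain Python (standard library, no statistics packages) that solves the normal equations for a first-order model in X1 and X2. The helper names `invert` and `fit_first_order` are hypothetical and chosen for illustration.

```python
# Brand preference data, copied from the table in Problem 2 A.
x1 = [4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 10, 10, 10, 10]
x2 = [2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4]
y = [64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100]


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination (hypothetical helper)."""
    n = len(a)
    # Augment the matrix with the identity: [A | I].
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in m]


def fit_first_order(x1, x2, y):
    """Fit Y = b0 + b1*X1 + b2*X2 by least squares.

    Returns the coefficients, the residuals, and the hat matrix diagonal.
    """
    X = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept
    k = 3
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    xtx_inv = invert(xtx)
    # b = (X'X)^{-1} X'y
    coef = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # h_ii = x_i' (X'X)^{-1} x_i
    leverage = [sum(r[i] * xtx_inv[i][j] * r[j]
                    for i in range(k) for j in range(k)) for r in X]
    return coef, resid, leverage


coef, residuals, leverage = fit_first_order(x1, x2, y)
print("b0, b1, b2 =", [round(c, 3) for c in coef])
print("h_ii =", [round(h, 4) for h in leverage])
```

Under these assumptions the coefficients come out at approximately 37.65, 4.425, and 4.375, and the leverages take only two values, about .2375 for cases 1 to 4 and 13 to 16 and about .1375 for cases 5 to 12, which agrees with the .237 and .137 quoted in Problem 2 B. The leverages depend only on the X values, so the two-value pattern reflects the balanced design rather than anything in Y.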
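For Problem 3, it may help to have the first-order logistic response function written out explicitly. The block below is a sketch in the assignment's notation, where $b_0$, $b_1$, $b_2$ stand for the maximum likelihood estimates. No numerical values are given, because the full data table is not reproduced in this handout.

```latex
% Fitted response function and fitted logit for two predictors
\hat{\pi} = \frac{\exp(b_0 + b_1 X_1 + b_2 X_2)}{1 + \exp(b_0 + b_1 X_1 + b_2 X_2)},
\qquad
\hat{\pi}' = \log\frac{\hat{\pi}}{1 - \hat{\pi}} = b_0 + b_1 X_1 + b_2 X_2 .

% Estimated odds ratio for a one-unit increase in X_1 with X_2 held fixed
\frac{\hat{\pi}(X_1 + 1, X_2) \,/\, \bigl(1 - \hat{\pi}(X_1 + 1, X_2)\bigr)}
     {\hat{\pi}(X_1, X_2) \,/\, \bigl(1 - \hat{\pi}(X_1, X_2)\bigr)} = \exp(b_1) .

% Estimated probability at X_1 = 50 (thousand dollars), X_2 = 3 (years)
\hat{\pi}(50, 3) = \frac{\exp(b_0 + 50\, b_1 + 3\, b_2)}{1 + \exp(b_0 + 50\, b_1 + 3\, b_2)} .
```

The same argument with $X_1$ held fixed gives $\exp(b_2)$ as the estimated odds ratio for a one-year increase in the age of the oldest car.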