EXERCISES

Document Sample
EXERCISES Powered By Docstoc
					To access R in Unix:

In the computer-lab click on the Putty window and login in Unity. Then click   #
on an ICON # named XWin32 (A blue X icon)
% add R
% R


EXERCISES:
1: HOW TO CONVERT A SPLUS OBJECT TO R
2: INTRO TO LM (LINEAR MODELS)
3: STATISTICAL MODELS: ANOVA, GLM, Poisson regression

#We will use the CAR library: Companion to Applied Regression
# for more information:
library(car)
library(help=car)



EXERCISE 1:

How to convert a Splus object to R object?


For example, the data object "car.test.frame" will be converted from a Splus
object to R object.

In Splus6:
% add Splus
% Splus6
(license has expired)

> car.test.frame   # this is a file available in Splus6
> dump("car.test.frame")    # car.test.frame.q will be generated in
the current working directory.

Open R in the current working directory
> source("car.test.frame.q")
> car.test.frame   # This would be exactly same as "car.test.frame" in Splus




------------------------

EXERCISE 2: INTRO TO LM


Linear Models

The function lm() returns an R object called a fitted linear
least-squares model object, or lm object for short:

lm(formula, data)

where formula is the structural formula that specifies the model
and data is the data frame containing the data (omitted if the
data is not stored in a data frame).

library(rpart)

>data(car.test.frame)
# get the data file
 The 'car.test.frame' data frame has 60 rows and 8 columns, giving
     data on makes of cars taken from the April, 1990 issue of
     _Consumer Reports_.



>   fuel.fit<-lm(100/Mileage ~ Weight + Disp., car.test.frame)


> fuel.fit
Call:
lm(formula=100/Mileage~Weight + Disp.,data=car.test.frame)

Coefficients:
 (Intercept)      Weight        Disp.
   0.4789733 0.001241421 0.0008543589

Degrees of freedom: 60 total; 57 residual
Residual standard error: 0.3900812

A list of the components of an lm object and of the arguments which can be
passed through the lm() function can be found at the
end of this section.

The lm object fuel.fit can be used as an argument to other functions to
obtain summaries of the data or to extract specific
information about the model. The function summary() gives a more detailed
description of the model than the one obtained by
printing out the lm object. The components of an lm.summary object are listed
at the end of this section.

> fuel.summary<-summary(fuel.fit)
> fuel.summary

Call:
lm(formula=100/Mileage~Weight + Disp.,data = car.test.frame)
Residuals:
      Min     1Q Median      3Q    Max
 -0.8109 -0.2559 0.01971 0.2673 0.9812

Coefficients:
             Value Std. Error   t value   Pr(>|t|)
(Intercept) 0.4790 0.3418       1.4014    0.1665
     Weight 0.0012 0.0002       7.2195    0.0000
      Disp. 0.0009 0.0016       0.5427    0.5895

Residual standard error: 0.3901 on 57 degrees of freedom
Multiple R-Squared: 0.7438
F-statistic: 82.75 on 2 and 57 degrees of freedom,
the p-value is 0
Correlation of Coefficients:
       (Intercept) Weight
Weight -0.8968
 Disp. 0.4720      -0.8033

Specific information about the model can be printed using the functions
coef(), resid() and fitted(). The same information can be
obtained using fuel.fit$coef, fuel.fit$resid, and fuel.fit$fitted.

> coef(fuel.fit)
 (Intercept)      Weight        Disp.
   0.4789733 0.001241421 0.0008543589

> formula(fuel.fit)
100/Mileage ~ Weight + Disp.

              * the formula() function extracts the model
                formula from the lm object fuel.fit


The update() function is used to modify the model formula used in fuel.fit:

> fuel.fit2<-lm(update(fuel.fit, . ~ . + Type))
   #Type is a factor with 6 levels (we need 5 degrees of freedom)
> is.factor(car.test.frame$Type)
[1] T
>
> levels(car.test.frame$Type)
[1] "Compact" "Large"   "Medium" "Small"    "Sporty" "Van"
>

> fuel.fit2
Call:
lm(formula = update(fuel.fit, . ~ . + Type))

Coefficients:
 (Intercept)      Weight       Disp.      Type1      Type2
    2.960617 4.16396e-05 0.007594073 -0.1452804 0.09808411
      Type3       Type4     Type5
 -0.1240942 -0.04358047 0.1915062

Degrees of freedom: 60 total; 52 residual
Residual standard error: 0.3139246



------------------
attach(Ornstein)
attach(Mroz)
attach(Moore)
attach(Prestige)

EXERCISE 3:


STATISTICAL MODELS IN R
  # multiple regression (Prestige data)
  # The 'Prestige' data frame has 102 rows and 6 columns. The
  #   observations are occupations.
  # This data frame contains the following columns:

        # education Average education of occupational incumbents, in   1971.

        # income Average income of incumbents, dollars, in 1971.

        # women Percentage of incumbents who are women.

        # prestige Pineo-Porter prestige score for occupation, from a social
        #     survey conducted in the mid-1960s.

        # census Canadian Census occupational code.

        # type Type of occupation. A factor with levels (note: out of
        #     order): 'bc', Blue Collar; 'prof', Professional, Managerial,
        #     and Technical; 'wc', White Collar.


library(car)
data(Prestige)
names(Prestige)

attach(Prestige)
prestige.mod <- lm(prestige ~ income + education + women,Prestige)
summary(prestige.mod)

       # dummy regression

type       # a factor

detach(Prestige)
Prestige.2 <- na.omit(Prestige) # filter out missing data
attach(Prestige.2)
type
length(type)

class(type)
unclass(type)
        # generating contrasts from factors

options("contrasts")

contrasts(type)

contrasts(type) <- contr.treatment(levels(type), base=2)
    # changing the baseline category

contrasts(type)

contrasts(type) <- 'contr.helmert'     # Helmert contrasts
contrasts(type)

contrasts(type) <- 'contr.sum'     # "deviation" contrasts
contrasts(type)

contrasts(type) <- NULL   # back to default



'cont.helmert' returns Helmert contrasts, which contrast the
     second level with the first, the third with the average of the
     first two, and so on. 'contr.poly' returns contrasts based on
     orthogonal polynomials. 'contr.sum' uses "sum to zero contrasts".

     'contr.treatment' contrasts each level with the baseline level
     (specified by 'base'): the baseline level is omitted. Note that
     this does not produce "contrasts" as defined in the standard
     theory for linear models as they are not orthogonal to the
     constant.


type.ord <- ordered(type, levels=c("bc", "wc", "prof")) # ordered factor
type.ord
round(contrasts(type.ord), 3)   # orthogonal polynomial contrasts

prestige.mod.1 <- lm(prestige ~ income + education + type)
summary(prestige.mod.1)

anova(prestige.mod.1)     # sequential ("type-I") sums of squares

prestige.mod.0 <- lm(prestige ~ income + education)
summary(prestige.mod.0)
anova(prestige.mod.0, prestige.mod.1)   # incremental F-test

Anova(prestige.mod.1)     # "type-II" sums of squares

prestige.mod.2 <- lm(prestige ~ income + education + type - 1)
# suppressing the constant

summary(prestige.mod.2)
Anova(prestige.mod.2)
# note: test for type is that all intercepts are 0 (not sensible)

prestige.mod.3 <- update(prestige.mod.1,
    .~. + income:type + education:type)       # adding interactions to the model
summary(prestige.mod.3)
Anova(prestige.mod.3)

lm(prestige ~ income*type + education*type) # equivalent specification


lm(prestige ~ type + (income+education) %in% type,data=Prestige) # nesting
lm(prestige ~ type + (income+education) %in% type – 1,data=Prestige)
# separate intercepts and slopes

detach(Prestige.2)

    # Anova Models

data(Moore)
attach(Moore)
Moore
    The 'Moore' data frame has 45 rows and 4 columns. The data are for
      subjects in a social-psychological experiment, who were faced with
      manipulated disagreement from a partner of either of low or high
      status. The subjects could either conform to the partner's
      judgment or stick with their own judgment.

fcategory <- factor(fcategory, levels=c("low","medium","high"))
#reorder levels

fcategory

means <- tapply(conformity, list(fcategory, partner.status), mean)
means   # table of means

tapply(conformity, list(fcategory, partner.status), sd) # standard deviations
tapply(conformity, list(fcategory, partner.status), length) # counts


Fcat <- as.numeric(fcategory)   # plot the means

#USEFUL GRAPH

plot(c(0.5, 3.5), range(conformity), xlab="F category",
    ylab="Conformity", type="n", axes=F)
axis(1, at=1:3, labels=c("low", "medium", "high"))
axis(2)
box()
points(jitter(Fcat[partner.status=="low"]),
    conformity[partner.status=="low"], pch="L")
points(jitter(Fcat[partner.status=="high"]),
    conformity[partner.status=="high"], pch="H")
lines(1:3, means[,1], lty=1, lwd=3, type="b", pch=19, cex=2)
lines(1:3, means[,2], lty=3, lwd=3, type="b", pch=1, cex=2)
legend(locator(1), c("high","low"), lty=c(1,3), lwd=c(3,3), pch=c(19,1))

#jitter: to add a little bit of noise
#lty: line type, 1 solid, 3 dotted.
#lwd: The line width.
#pch= Either an integer specifying a symbol or a single character
 #         to be used as the default in plotting points.
#Type: the default plot type desired (“b” for overplotted lines)

identify(Fcat, conformity)   # identify unusual points
#press ESC to finish


options(contrasts=c("contr.sum", "contr.poly"))
# contr.sum = deviation contrasts

moore.mod <- lm(conformity ~ fcategory*partner.status)
summary(moore.mod)
Anova(moore.mod)    # type II sums of squares
Anova(moore.mod, type="III")    # type III sums of squares
    # more on lm

args(lm)

# Subset: observation selection


lm(prestige ~ income + education, data=Prestige, subset=-c(6, 16))

lm(conformity ~ partner.status*fcategory, # specifying contrasts
    contrasts=list(partner.status=contr.sum, fcategory=contr.poly))

lm(100*conformity/40 ~ partner.status*fcategory, data=Moore,subset=-c(6,16))
  # data argument
  # note computation of y




      # Generalized Linear Models (GLM)

      # binary logit model


data(Mroz)
attach(Mroz)
Mroz[sort(sample(753,10)),]
     The 'Mroz' data frame has 753 rows and 8 columns. The
     observations, from the Panel Study of Income Dynamics (PSID), are
     married women.



attach(Mroz)
mod.mroz <- glm(lfp ~ k5 + k618 + age + wc + hc + lwg + inc,
data=Mroz,fam=binomial)

 summary(mod.mroz)

anova(update(mod.mroz, . ~ . - k5), mod.mroz, test="Chisq")
  # likelihood-ratio test

Anova(mod.mroz)     # analysis-of-deviance table

detach(Mroz)

           # Poisson regression

data(Ornstein)
Ornstein[sort(sample(248,10)),]

attach(Ornstein)
tab <- table(interlocks)
tab

x <- sort(unique(interlocks))
par(mfrow=c(1,1))
plot(x, tab, type='h', xlab='Number of Interlocks',
    ylab='Frequency')
points(x, tab, pch=16)

mod.ornstein <- glm(interlocks ~ assets + nation + sector, family=poisson)
summary(mod.ornstein)
Anova(mod.ornstein)

detach(Ornstein)

Assingment: Fit a linear model for the climate dataset,
Having as response rain, and as regressors, lon, lat, and elev.

Assingment: Fit a linear model to the Prestige dataset. Eliminate the
observations with missing values using na.omit.
Response: prestige, and predictors: income and education. Compare models with a
without the variable type using an incremental F-test (using anova). Is Type
categorical? Choose a contrast. Update the model to add second order
interactions with the variable type.
Refit but without using observations #6 and #16 (use subset command).

anova gives type I
Anova gives by default type II (but one can get type III)

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:8
posted:5/18/2012
language:English
pages:8