Data Mining Packages in R: logistic regression and SVM

                       Jiang Du
                     March 2008
Logistic Regression

• lrm in package "Design"
• glm in package "stats"
• …
Logistic Regression: lrm

Usage
    lrm(formula, data, subset,
        na.action=na.delete, method="lrm.fit",
        model=FALSE, x=FALSE, y=FALSE,
        linear.predictors=TRUE, se.fit=FALSE,
        penalty=0, penalty.matrix, tol=1e-7,
        var.penalty=c('simple','sandwich'), weights,
        normwt, ...)
Arguments
• formula
     – a formula object. An offset term can be included. The offset causes
       fitting of a model such as logit(Y=1) = Xβ + W, where W is the offset
       variable having no estimated coefficient. The response variable can be
       any data type; lrm converts it in alphabetic or numeric order to an S
       factor variable and recodes it 0,1,2,...
• data
     – data frame to use. Default is the current …

Usage
    ## S3 method for class 'lrm':
    predict(object, ..., type=c("lp", "fitted", "fitted.ind", "mean", "x",
            "data.frame", "terms", "adjto", "adjto.data.frame", "model.frame"),
            se.fit=FALSE, codes=FALSE)
Arguments
• object
     – an object created by lrm
• ...
     – arguments passed to predict.Design, such as kint and newdata (which is
       used if you are predicting out of data). See predict.Design to see how
       NAs are handled.
• type
     – …
Logistic Regression: lrm

• Fitting training data
   – model = lrm(Class ~ X + Y + Z, data=train)
• Prediction on new data
   – To get logit(Y=1)
      • predict(model, newdata = test, type = "lp")
   – To get Pr(Y=1)
      • predict(model, newdata = test, type = "fitted.ind")
•   The models fit by, e.g., the lm and glm functions are specified in a compact symbolic form.
    The ~ operator is basic in the formation of such models. An expression of the form y ~
    model is interpreted as a specification that the response y is modelled by a linear
    predictor specified symbolically by model. Such a model consists of a series of terms
    separated by + operators. The terms themselves consist of variable and factor names
    separated by : operators. Such a term is interpreted as the interaction of all the
    variables and factors appearing in the term.

•   In addition to + and :, a number of other operators are useful in model formulae. The * operator
    denotes factor crossing: a*b is interpreted as a+b+a:b. The ^ operator indicates crossing
    to the specified degree. For example, (a+b+c)^2 is identical to (a+b+c)*(a+b+c),
    which in turn expands to a formula containing the main effects for a, b and c together with their
    second-order interactions. The %in% operator indicates that the terms on its left are nested
    within those on the right. For example, a + b %in% a expands to the formula a + a:b. The -
    operator removes the specified terms, so that (a+b+c)^2 - a:b is identical to a + b +
    c + b:c + a:c. It can also be used to remove the intercept term: y ~ x - 1 is a line through the
    origin. A model with no intercept can also be specified as y ~ x + 0 or y ~ 0 + x.
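
The expansion rules above can be checked directly in R. The following is a minimal sketch using only base R; the helper name expand_terms and the variable names a, b, c, x, y are placeholders chosen for illustration, not part of the original slides. It asks terms() for the term labels that a formula expands to.

```r
# Hypothetical helper: list the terms a formula expands to.
# terms() only parses the formula, so a, b, c need not exist as data.
expand_terms <- function(f) attr(terms(f), "term.labels")

# a*b should contain the main effects plus the a:b interaction
crossing <- expand_terms(y ~ a * b)

# (a+b+c)^2 should contain main effects and all second-order interactions
squared <- expand_terms(y ~ (a + b + c)^2)

# "-" removes a term: a:b should be absent here
removed <- expand_terms(y ~ (a + b + c)^2 - a:b)

# The "intercept" attribute is 1 when an intercept is present, 0 otherwise
with_int <- attr(terms(y ~ x), "intercept")
no_int_minus <- attr(terms(y ~ x - 1), "intercept")
no_int_zero <- attr(terms(y ~ x + 0), "intercept")

print(crossing)
print(squared)
print(removed)
```

Printing the three vectors is a quick way to confirm what a formula means before passing it to lrm or glm.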
Logistic Regression: glm

• Fitting training data
   – model = glm(Class ~ X + Y + Z, data=train, family = binomial)
• Prediction on new data
   – To get logit(Y=1)
      • predict(model, newdata = test)
   – To get Pr(Y=1)
      • predict(model, newdata = test, type = "response")
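
The glm workflow above can be sketched end to end as follows. This is an illustration under stated assumptions: the data are simulated (the slides do not supply a data set), the train/test split is arbitrary, and family = binomial is assumed because the slide is about logistic regression. Only base R and the stats package are used.

```r
set.seed(42)

# Simulated data (an assumption for illustration): three predictors and a
# 0/1 response whose log-odds depend on X and Y
n <- 200
dat <- data.frame(X = rnorm(n), Y = rnorm(n), Z = rnorm(n))
dat$Class <- rbinom(n, size = 1, prob = plogis(0.5 + dat$X - 0.8 * dat$Y))

train <- dat[1:150, ]
test <- dat[151:200, ]

# Fit a logistic regression; family = binomial selects the logit link
model <- glm(Class ~ X + Y + Z, data = train, family = binomial)

# Default predict() returns the linear predictor, i.e. logit(Y=1)
logit_hat <- predict(model, newdata = test)

# type = "response" returns Pr(Y=1)
prob_hat <- predict(model, newdata = test, type = "response")

# The two are related by the inverse logit
same <- isTRUE(all.equal(unname(plogis(logit_hat)), unname(prob_hat)))
print(same)
```

Applying plogis() to the default prediction reproduces the type = "response" output, which is a convenient sanity check on which scale a prediction is on.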

SVM

• svm in "e1071"
• ksvm in "kernlab"
SVM: svm
• kernel: the kernel used in training and predicting. You might
  consider changing some of the following parameters,
  depending on the kernel type.
   – linear:
       • u'*v
   – polynomial:
       • (gamma*u'*v + coef0)^degree
   – radial basis:
       • exp(-gamma*|u-v|^2)
   – sigmoid:
       • tanh(gamma*u'*v + coef0)
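
To make the four kernel formulas concrete, here is a minimal base R sketch that evaluates each of them for two numeric vectors. The function names and the example values of u, v, gamma, coef0 and degree are chosen for illustration only; this is not how e1071 computes kernels internally.

```r
# Illustrative kernel functions written directly from the formulas above
linear_kernel <- function(u, v) sum(u * v)

polynomial_kernel <- function(u, v, gamma, coef0, degree) {
  (gamma * sum(u * v) + coef0)^degree
}

radial_kernel <- function(u, v, gamma) {
  exp(-gamma * sum((u - v)^2))
}

sigmoid_kernel <- function(u, v, gamma, coef0) {
  tanh(gamma * sum(u * v) + coef0)
}

# Example vectors and parameter values (arbitrary)
u <- c(1, 2)
v <- c(3, 4)

k_lin <- linear_kernel(u, v)
k_poly <- polynomial_kernel(u, v, gamma = 0.5, coef0 = 1, degree = 2)
k_rbf <- radial_kernel(u, v, gamma = 0.5)
k_sig <- sigmoid_kernel(u, v, gamma = 0.5, coef0 = 1)

print(c(linear = k_lin, polynomial = k_poly, radial = k_rbf, sigmoid = k_sig))
```

The sketch shows which parameters matter for which kernel: the linear kernel has none, the radial basis kernel uses only gamma, and the polynomial and sigmoid kernels also use coef0 (plus degree for the polynomial).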
SVM: svm

• Training
  – model = svm(Class ~ X + Y + Z, data=train, type = "C", kernel = "linear")
• Prediction
  – predict(model, newdata = test)
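
A runnable version of the training and prediction steps might look like the sketch below. It assumes the e1071 package is installed and skips the fit otherwise. The data are simulated for illustration, and the type is spelled out as "C-classification", the full name that "C" abbreviates.

```r
set.seed(1)

# Simulated data (an assumption for illustration): the class depends on X + Y
n <- 100
dat <- data.frame(X = rnorm(n), Y = rnorm(n), Z = rnorm(n))
dat$Class <- factor(ifelse(dat$X + dat$Y > 0, "pos", "neg"))

train <- dat[1:70, ]
test <- dat[71:100, ]

if (requireNamespace("e1071", quietly = TRUE)) {
  # Train a C-classification SVM with a linear kernel
  model <- e1071::svm(Class ~ X + Y + Z, data = train,
                      type = "C-classification", kernel = "linear")

  # Predicted class labels for the held-out rows
  svm_pred <- predict(model, newdata = test)

  # Confusion table of predicted vs. actual classes
  print(table(predicted = svm_pred, actual = test$Class))
} else {
  message("Package e1071 is not installed; skipping the SVM example.")
}
```

Switching kernel to "polynomial", "radial" or "sigmoid" makes the corresponding gamma, coef0 and degree arguments of svm relevant.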
