# Logistic regression in R

Marco Baroni

Linear models in R

## Outline

- Logistic regression
  - Introduction
  - The model
  - Estimation
  - Looking at and comparing fitted models
- Logistic regression in R
- Further topics
## Modeling discrete response variables

In a very large number of problems in cognitive science and related fields:

- the response variable is categorical, often binary (yes/no; acceptable/not acceptable; phenomenon takes place/does not take place)
- the potentially explanatory factors (independent variables) are categorical, numerical or both

## Examples: binomial responses

- Is statement X rated as "acceptable" in the following condition(s)?
- Does sentence S, which has features Y, W and Z, display phenomenon X? (linguistic corpus data!)
- Is it common for shoppers to decide to purchase the good X given these conditions?
- Did subjects make more errors in this condition?
- How many people answer YES to question X in the survey?
- Do old women like X more than young men?
- Did the subject feel pain in this condition?
- How often was reaction X triggered by these conditions?
- Do children with characteristics X, Y and Z tend to have autism?
## Examples: multinomial responses

Discrete response variable with a natural ordering of the levels:

- Ratings on a 6-point scale (depending on the number of points on the scale, you might also get away with a standard linear regression)
- Subjects answer YES, MAYBE, NO
- Subject reaction is coded as FRIENDLY, NEUTRAL, ANGRY
- The cochlear data: the experiment is set up so that possible errors are de facto on a 7-point scale

Discrete response variable without a natural ordering:

- Subject decides to buy one of 4 different products
- We have brain scans of subjects seeing 5 different objects, and we want to predict the seen object from features of the scan
- We model the chances of developing 4 different (and mutually exclusive) psychological syndromes in terms of a number of behavioural indicators

## Binomial and multinomial logistic regression models

- Problems with binary (yes/no, success/failure, happens/does not happen) dependent variables are handled by (binomial) logistic regression
- Problems with more than two discrete outputs are handled by:
  - ordinal logistic regression, if the outputs have a natural ordering
  - multinomial logistic regression otherwise
- The output of ordinal and especially multinomial logistic regression tends to be hard to interpret, so whenever possible I try to reduce the problem to a binary choice or a set of binary choices:
  - e.g., if the output is yes/maybe/no, treat "maybe" as "yes" and/or as "no"
  - fit multiple binomial regressions instead of a single multinomial regression
- Here, I focus entirely on the binomial case
## Don't be afraid of logistic regression!

Logistic regression is less popular than linear regression. This might be due in part to historical reasons:

- the formal theory of generalized linear models is relatively recent: it was developed in the early 70s
- the iterative maximum likelihood methods used for fitting logistic regression models require more computational power than solving the least squares equations

Results of logistic regression are not as straightforward to understand and interpret as linear regression results. Finally, there might also be a bit of prejudice against discrete data as less "scientifically credible" than hard-science-like continuous measurements.

Still: if it is natural to cast your problem in terms of a discrete variable, you should go ahead and use logistic regression. Logistic regression might be trickier to work with than linear regression, but it's still much better than pretending that the variable is continuous or artificially re-casting the problem in terms of a continuous response.

## The Machine Learning angle

Classification of a set of observations into 2 or more discrete categories is a central task in Machine Learning. The classic supervised learning setting:

- Data points are represented by a set of features, i.e., discrete or continuous explanatory variables
- The "training" data also have a label indicating the class of the data point, i.e., a discrete binomial or multinomial dependent variable
- A model (e.g., in the form of weights assigned to the features) is fitted to the training data
- The trained model is then used to predict the class of unseen data points (where we know the values of the features, but we do not have the label)

This is the same setting as logistic regression, except that the emphasis is placed on predicting the class of unseen data, rather than on the significance of the effect of the features/independent variables (which are often too many, hundreds or more, to be analyzed singly) in discriminating the classes. Indeed, logistic regression is also a standard technique in Machine Learning, where it is sometimes known as Maximum Entropy.
## Ordinary linear regression

The by now familiar model:

    y = β0 + β1x1 + β2x2 + ... + βnxn + ε

- Why will this not work if the response variable is binary (0/1)?
- Why will it not work if we try to model proportions instead of responses (e.g., the proportion of YES-responses in condition C)?
## Modeling log odds ratios

Following up on the "proportion of YES-responses" idea, let's say that we want to model the probability of one of the two responses (which can be seen as the population proportion of the relevant response for a certain choice of the values of the independent variables).

Probability will range from 0 to 1, but we can look at the logarithm of the odds ratio instead:

    logit(p) = log[p / (1 − p)]

This is the logarithm of the ratio of the probability of a 1-response to the probability of a 0-response. It is arbitrary what counts as a 1-response and what counts as a 0-response, although this might hinge on the ease of interpretation of the model (e.g., treating YES as the 1-response will probably lead to more intuitive results than treating NO as the 1-response).

Log odds ratios are not the most intuitive measure (at least for me), but they range continuously from −∞ to +∞.

[Plot: logit(p) as a function of p, for p ranging from 0 to 1]
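These two mappings are easy to check numerically; a minimal sketch in Python (not part of the original slides):

```python
import math

def logit(p):
    # log odds ratio: log(p / (1 - p)), defined for 0 < p < 1
    return math.log(p / (1 - p))

def inv_logit(x):
    # inverse logit (logistic function): maps (-inf, +inf) back to (0, 1)
    return math.exp(x) / (1 + math.exp(x))

print(logit(0.5))                          # 0.0
print(round(inv_logit(logit(0.9)), 6))     # 0.9
print(logit(0.99) > logit(0.9))            # True: logit keeps growing as p -> 1
```

The round trip confirms that logit and its inverse undo each other, and that the logit is unbounded as p approaches the extremes.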
## The logistic regression model

Predicting log odds ratios:

    logit(p) = β0 + β1x1 + β2x2 + ... + βnxn

From probabilities to log odds ratios:

    logit(p) = log[p / (1 − p)]

Back to probabilities:

    p = e^logit(p) / (1 + e^logit(p))

Thus:

    p = e^(β0 + β1x1 + β2x2 + ... + βnxn) / (1 + e^(β0 + β1x1 + β2x2 + ... + βnxn))

[Plot: the S-shaped curve of p as a function of logit(p), for logit(p) from −10 to 10]
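To see how the model turns a linear predictor into a probability, here is a small Python sketch with made-up coefficients (the beta values are hypothetical, not estimates from any data):

```python
import math

def predict_p(beta, x):
    # linear predictor (systematic component): beta0 + beta1*x1 + ...
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    # inverse logit turns the log odds ratio into a probability
    return math.exp(eta) / (1 + math.exp(eta))

beta = [-1.0, 2.0]             # hypothetical intercept and slope
print(predict_p(beta, [0.5]))  # eta = -1 + 2*0.5 = 0, so p = 0.5
```

Whatever the value of the linear predictor, the result is always squeezed into the (0, 1) range.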

## What is the relation between p and the response?

Given a single response at a specific setting of the independent variables, p is the p parameter of a Bernoulli probability distribution for whether the response is 1 (YES, success, etc.) or 0 (NO, failure, etc.).

Probability of the two possible values for the response random variable (1 and 0):

    P(y) = p^y (1 − p)^(1 − y)

- Expected value: E(y) = p
- Variance: var(y) = p(1 − p)

[Plot "Probabilities and responses": observed 0/1 responses scattered around the S-shaped curve of p as a function of logit(p)]
## What is the relation between p and the response?

Given n responses for a certain setting of the independent variables, p is the p parameter of a binomial distribution bin(p, n) for the number y of 1-responses. Probability of a number y of 1-responses given n responses in the specific setting:

    p(y) = C(n, y) p^y (1 − p)^(n − y)

- Expected value: E(y) = np
- Variance: var(y) = np(1 − p)

The single-response (Bernoulli) scenario is simply a special case of the aggregate binomial setting with n = 1.

## Bernoulli vs. binomial

If the independent variables are categorical, the data might be stored in Bernoulli format (one row per observation, with a 1 or 0 response) or in aggregate binomial format (one row per combination of the independent variables, with one column for the number of successes for that combination, and one column for the number of observations with that combination).

Below, I assume the Bernoulli format, which is more natural when collecting data, and typically more meaningful in the presence of continuous independent variables (when there might be mostly one observation per combination of the independent variables anyway).

Almost nothing changes, but deviance can only be interpreted as a measure of goodness of fit with a χ² distribution when the data are analyzed in the aggregate binomial format.
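As a quick numerical check of these moments (a Python sketch, not from the slides), we can recover E(y) and var(y) by brute-force enumeration over the binomial probability mass function:

```python
from math import comb

def binom_pmf(y, n, p):
    # binomial probability of y successes out of n, success probability p
    return comb(n, y) * p**y * (1 - p)**(n - y)

n, p = 10, 0.3
mean = sum(y * binom_pmf(y, n, p) for y in range(n + 1))
var = sum((y - mean)**2 * binom_pmf(y, n, p) for y in range(n + 1))
# theory says E(y) = np = 3.0 and var(y) = np(1-p) = 2.1
print(mean, var)
```

With n = 1, the same computation reduces to the Bernoulli mean p and variance p(1 − p).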

## Generalized linear models

Logistic regression is an instance of the wider family of generalized linear models. Somewhat brutally, in a generalized linear model:

- the value of a certain quantity η for a certain setting of the independent variables (a row vector x) is computed as a weighted linear combination of the explanatory variables (the systematic component): η = xβ
- the responses are modeled by a probability distribution that has an expected value E(y) (the random component)
- a link function connects the random and systematic components by establishing that η is a function of the expected value E(y), i.e.: η = f(E(y))

This is a general framework that uses the same fitting techniques to estimate models for different kinds of data. More about the general model in Luigi's classes.

## Ordinary linear regression as a generalized linear model

- Systematic component: η = xβ
- Link function: the identity, η = E(y)
- Random component: normal distribution with mean µ and variance σ²

This is why we add the 0-centered error term ε in the linear regression model.
## Logistic regression as a generalized linear model

- Systematic component: η = xβ
- Link function:

      η = f(E(y)) = logit(E(y)) = log[E(y) / (1 − E(y))]

- Random component: Bernoulli distribution with expected value parameter E(y), i.e., p; given E(y), i.e., p, observations have a Bernoulli distribution with variance p(1 − p)

## Variance in the random component

The systematic component is η = xβ for both ordinary and logistic regression (and any other generalized linear model).

In ordinary regression, the random component is normally distributed with mean η and variance σ². Putting the pieces together, a specific response is modeled by xβ + ε, where the ε term comes from the random component (it is sampled from a normal distribution with mean 0 and variance σ²).

## Variance in the random component

In logistic regression, the random component has a Bernoulli distribution with p = logit⁻¹(η) and variance p(1 − p): there is no independent σ² parameter, and no ε term from the random component. The response generation happens by picking 1 with probability p and 0 otherwise; the variance in the responses is entirely determined by p, i.e., by η.

On a related point, the errors, computed as response − p, are not expected to be normally distributed with the same variance across settings of the independent variables: we expect a larger variance (and consequently a larger squared error) when the expected p is close to .5, dropping as we move towards the extremes (p = 0 or p = 1).

[Plot "Bernoulli variance as a function of the expected value p": the curve p(1 − p), peaking at 0.25 for p = 0.5]
## Maximum likelihood estimation

For any generalized linear model, we estimate the values of the β parameters by maximum likelihood estimation, i.e., by searching across possible settings of the β for the combination that makes the sample responses most probable.

Given a set of responses y, we search for the β̂ estimate that maximizes

    p(y | β̂)
## Maximum likelihood estimation

In ordinary linear regression, maximizing the likelihood is equivalent to minimizing the sum of squared errors across the board (and consequently the estimated variance of the errors).

In logistic regression, the errors are not expected to have the same variance: we should have high variance for p near .5, and lower variance towards the extremes. This leads to the (iteratively) (re)weighted least squares (IRWLS) method, where errors are penalized more where we expect less variance around p.

## Iteratively reweighted least squares

We minimize squared error by giving more weight to errors at levels of the independent variables where the variance around p should be low (high and low ps). This requires estimates of p, which in turn require estimates of the β coefficients.

Iterative procedure:

- make a guess at the ps (e.g., set them to the proportions of positive responses at each level)
- use the guesses to estimate the variances and hence the weights (inversely proportional to the variances)
- use the new weights to estimate the βs, derive new p estimates, and go back to the second step until convergence

In the logistic regression setting (and in other generalized linear models), IRWLS is equivalent to the Newton-Raphson optimization method to maximize the likelihood.
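The procedure above can be sketched in a few lines of code. The following is a minimal pure-Python implementation of the IRWLS/Newton-Raphson updates for a single predictor, on made-up toy data (an illustrative sketch, not the fitting code R actually uses):

```python
import math

def irls_logistic(xs, ys, iters=25):
    # Fit logit(p) = b0 + b1*x by iteratively reweighted least squares.
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # current p estimates from the current coefficients
        ps = [1 / (1 + math.exp(-(b0 + b1 * x))) for x in xs]
        # Bernoulli variances p(1-p) serve as the weights
        ws = [p * (1 - p) for p in ps]
        # Newton-Raphson step: solve (X'WX) delta = X'(y - p) for a 2x2 system
        a11 = sum(ws)
        a12 = sum(w * x for w, x in zip(ws, xs))
        a22 = sum(w * x * x for w, x in zip(ws, xs))
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for y, p, x in zip(ys, ps, xs))
        det = a11 * a22 - a12 * a12
        b0 += (a22 * g0 - a12 * g1) / det
        b1 += (a11 * g1 - a12 * g0) / det
    return b0, b1

# toy data: responses more likely to be 1 for larger x
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = irls_logistic(xs, ys)
print(b0, b1)  # the slope should come out positive
```

At convergence the score equations are satisfied, i.e., the fitted ps reproduce the observed response totals.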
## Iteratively reweighted least squares

This is an iterative search for the best parameter tuning instead of a closed-form equation:

- there is no guarantee that estimation will converge on a vector of β values that actually maximizes the likelihood (the "local maximum" problem)
- it is computationally more demanding than ordinary least squares
- it is possibly unstable

Note that ordinary least squares can be interpreted as a special case of weighted least squares with fixed and uniform weights.

## Interpreting the βs

p is not a linear function of the βs: the same β will have a more drastic impact on p̂ towards the center of the p range than near the extremes (recall the S shape of the p curve).

As a rule of thumb (the "divide by 4" rule), β̂/4 is an upper bound on the difference in p brought about by a unit difference in the corresponding explanatory variable.

Asymptotically, |β̂i| / s.e.(β̂i) follows a standard normal distribution if βi is 0. The standard errors are the square roots of the diagonal elements of:

    [X^T diag(var(p̂i)) X]^(−1)

where, if there are m data points in the sample, diag(var(p̂i)) is an m × m matrix with p̂i(1 − p̂i) in the ith diagonal element. (This is more involved than the standard error of ordinary regression coefficients because the variance changes with p.)

Roughly, if a β̂ is more than 2 standard errors away from 0, we can say that the corresponding explanatory variable has an effect that is significantly different from 0 (at α = 0.05).
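The "divide by 4" rule follows from the fact that the slope of the logistic curve with respect to the explanatory variable is β p(1 − p), which is largest at p = .5, where it equals β/4. A quick numerical check in Python (the β value is made up):

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

beta = 1.6  # hypothetical coefficient
# the derivative of p with respect to x is beta * p * (1 - p),
# maximized at p = 0.5, where it equals beta / 4
slopes = [beta * inv_logit(beta * x) * (1 - inv_logit(beta * x))
          for x in [i / 100 - 5 for i in range(1001)]]
print(max(slopes), beta / 4)  # the maximum slope equals beta/4
```

So a unit change in the explanatory variable can shift p by at most about β/4, the bound the rule of thumb exploits.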
## Goodness of fit

Measures such as R² based on residual errors are not very informative. One intuitive measure of fit is the error rate, given by the proportion of data points for which the model assigns p̂ > .5 to 0-responses or p̂ < .5 to 1-responses.

This can be compared to the baseline in which the model always predicts 1 if the majority of data points are 1, or 0 if the majority of data points are 0 (the baseline error rate is given by the proportion of minority responses over the total).

Some information is lost (a .9 and a .6 prediction are treated equally). Other measures of fit have been proposed in the literature, but there is no widely agreed upon standard.

## Binned goodness of fit

Goodness of fit can be inspected visually by grouping the p̂s into equally wide bins (0-0.1, 0.1-0.2, ...) and plotting the average p̂ predicted by the model for the points in each bin vs. the observed proportion of 1-responses for the data points in the bin. We can also compute an R² or other goodness of fit measure on these binned data.

I won't try it here; see the plot.logistic.fit.fnc() function from languageR.
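The binning idea can be sketched as follows (a Python sketch with made-up fitted probabilities and responses; the slides instead use plot.logistic.fit.fnc() in R):

```python
def binned_fit(ps, ys, nbins=10):
    # group model probabilities into equal-width bins and compare the
    # average predicted p with the observed proportion of 1-responses
    bins = [[] for _ in range(nbins)]
    for p, y in zip(ps, ys):
        i = min(int(p * nbins), nbins - 1)
        bins[i].append((p, y))
    out = []
    for b in bins:
        if b:
            avg_p = sum(p for p, _ in b) / len(b)
            obs = sum(y for _, y in b) / len(b)
            out.append((round(avg_p, 3), round(obs, 3)))
    return out

# hypothetical fitted probabilities and observed 0/1 responses
ps = [0.1, 0.15, 0.4, 0.45, 0.8, 0.85, 0.9, 0.95]
ys = [0, 0, 0, 1, 1, 1, 1, 0]
print(binned_fit(ps, ys))
```

For a well-calibrated model, the (average p̂, observed proportion) pairs should lie near the diagonal.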

## Model selection

We explore two ways to select models:

- via a statistical significance test for nested models, based on the notion of deviance (there is a relation between the deviance-based test used in logistic regression, and with other generalized linear models, and the F-test used for ordinary linear regression)
- via a criterion-based method that favours the model with the lowest Akaike Information Criterion (AIC) score

In both approaches, the notion of the log likelihood of a model fitted by maximum likelihood plays a crucial role (it is desirable that a model assigns a high probability to the seen data). Moreover, in both approaches the number of parameters used by a model counterbalances the likelihood (a high likelihood is less impressive if we can tune a lot of parameters, and a model with many parameters is more prone to overfitting).

## Log likelihood

Various measures of absolute or relative fit of logistic regression models (and other generalized linear models) are based on the logarithm of the likelihood of a fitted model, i.e., the maximum likelihood value for the relevant model.

For a data set of m rows with Bernoulli (1 or 0) responses, where yi is the ith response and p̂i is the model-produced p for the ith row, the (maximized) log likelihood of the model M is given by:

    L(y, M) = Σ_{i=1..m} { log(p̂i^yi) + log[(1 − p̂i)^(1 − yi)] }

Note that the log likelihood is always ≤ 0 (each term is the log of a probability, and log 1 = 0).
## Deviance

L(y, M) is the (maximum) log likelihood for a model M with n parameters (βs to estimate) on a sample of m observations with responses y = (y1, y2, ..., ym). L(y, My) is the (maximum) log likelihood for the saturated model, which has one parameter per response (i.e., m parameters), so that it can fit the observed responses perfectly.

The deviance of a maximum-likelihood-fitted model M is −2 times the difference in log likelihoods between M and My (equivalently, −2 times the log likelihood ratio):

    D(y, M) = −2[L(y, M) − L(y, My)]

The better the model is at predicting the actual responses, the lower the deviance will be; conversely, the larger the deviance, the worse the fit.

If the data are arranged with a separate row for each Bernoulli response, so that there are m rows and p̂i is the model-fitted p for the ith row, the computation of the deviance reduces to:

    D(y, M) = −2 Σ_{i=1..m} { p̂i [log(p̂i) − log(1 − p̂i)] + log(1 − p̂i) }

When the data are arranged in this way (rather than by counts of successes and observations per level) and/or some independent variables are continuous, the significance of the deviance from the saturated model cannot be assessed with a χ² test; the χ² test will instead be used, as we do below, when comparing two nested models. Comparing nested models will be the main use of deviance for our purposes.

## Deviance

Consider two nested models, where the larger model Ml uses all the independent variables in the smaller model Ms plus some extra ones; the difference in the number of parameters (βs) to estimate will be nl − ns.

Maximizing the likelihood on the same observations y using a subset of the parameters cannot yield a higher maximum than with the larger set, thus:

    L(y, Ms) ≤ L(y, Ml)

A log likelihood ratio test that the improvement in likelihood with Ml is not due to chance is given by:

    −2[L(y, Ms) − L(y, Ml)] = −2[L(y, Ms) − L(y, My)] + 2[L(y, Ml) − L(y, My)]
                            = D(y, Ms) − D(y, Ml)

The log likelihood ratio, or equivalently the difference in deviance, approximates a χ² distribution with the difference in the number of parameters nl − ns as degrees of freedom (with some caveats we do not worry about here: in general, the approximation is better if the difference in parameter number is not too large).

Intuitively, the larger the difference in deviance between Ms and Ml, the less likely it is that the improvement in likelihood is just due to the fact that Ml has more parameters to play with, although this difference will be the less surprising, the more extra parameters Ml has.

A model can also be compared against the baseline "null" model that always predicts the same p (given by the proportion of 1-responses in the data) and has only one parameter (the fixed predicted value).
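For Bernoulli data the saturated model fits each response perfectly, so its log likelihood is 0 and the deviance is just −2 times the model's log likelihood. A small Python sketch computing the deviance of the null model on made-up responses:

```python
import math

def log_lik(ps, ys):
    # Bernoulli log likelihood: sum of log(p_i^y_i) + log((1-p_i)^(1-y_i))
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for p, y in zip(ps, ys))

ys = [1, 1, 0, 1, 0, 1]            # made-up responses
p_null = sum(ys) / len(ys)         # null model: same p for every row
# saturated log likelihood is 0, so deviance = -2 * log likelihood
null_dev = -2 * log_lik([p_null] * len(ys), ys)
print(round(null_dev, 3))
```

A fitted model's deviance, computed the same way from its row-by-row p̂s, can then be subtracted from this null deviance for the χ² comparison.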
## Akaike Information Criterion (AIC)

The AIC is a measure derived from information-theoretic arguments that provides a trade-off between the fit and the complexity of a model. Given a model M with n parameters achieving maximum log likelihood L(y, M) on some data y, the AIC of the model is:

    AIC = −2L(y, M) + 2n

The lower the AIC, the better the model. The negative log likelihood term penalizes models with a poor fit, whereas the parameter count term penalizes complex models.

AIC does not tell us anything about "significant" differences, but models with lower AIC can be argued to be "better" than those with higher AIC (what matters is the rank, not the absolute difference). AIC can also be applied to choose among non-nested models, as long as the fitted data are the same, of course.
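The formula itself is trivial to compute; a toy Python illustration with hypothetical log likelihoods (not values from any fitted model):

```python
def aic(log_lik, n_params):
    # AIC = -2 * (maximized log likelihood) + 2 * (number of parameters)
    return -2 * log_lik + 2 * n_params

# hypothetical models: the second has a slightly better likelihood
# but two extra parameters
print(aic(-640.0, 8))   # 1296.0
print(aic(-639.5, 10))  # 1299.0 -> the smaller model wins on AIC
```

Here the small gain in likelihood does not justify the extra parameters, so the simpler model gets the lower (better) AIC.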

## Back to the Graffeo et al. discount study

Fields in the discount.txt file:

- subj: unique subject code
- sex: M or F
- age: NB: contains some NAs
- presentation: absdiff (amount of discount), result (price after discount), percent (percentage discount)
- product: pillow, (camping) table, helmet, (bed) net
- choice: the response variable
## Preparing the data

Read the file into an R data-frame, look at the summaries, etc. Note in the summary of age that R "understands" NAs (i.e., it is not treating age as a categorical variable).

We can filter out the rows containing NAs as follows:

    > e <- na.omit(d)

Compare the summaries of d and e. na.omit can also be passed as an option to the modeling functions, but I feel uneasy about that. Attach the NA-free data-frame.

## Logistic regression in R

    > sex_age_pres_prod.glm <- glm(choice ~ sex + age +
      presentation + product, family = "binomial")

    > summary(sex_age_pres_prod.glm)

## Selected lines from the summary() output

Estimated β coefficients, standard errors and z scores (β̂/std. error):

    Coefficients:
                          Estimate Std. Error z value Pr(>|z|)
    sexM                 -0.332060   0.140008  -2.372  0.01771 *
    age                  -0.012872   0.006003  -2.144  0.03201 *
    presentationpercent   1.230082   0.162560   7.567 3.82e-14 ***
    presentationresult    1.516053   0.172746   8.776  < 2e-16 ***

Note the automated creation of binary dummy variables: discounts presented as percentages and as resulting values are significantly more likely to lead to a purchase than discounts expressed as an absolute difference (the default level). Use relevel() to set another level of a categorical variable as the default.

## Selected lines from the summary() output: deviance

For the "null" model and for the current model:

        Null deviance: 1453.6  on 1175  degrees of freedom
    Residual deviance: 1284.3  on 1168  degrees of freedom

The difference in deviance (169.3) is much higher than the difference in parameters (7), suggesting that the current model is significantly better than the null model.
## Deviance: do it yourself

Recall:

    D(y, M) = −2 Σ_{i=1..m} { p̂i [log(p̂i) − log(1 − p̂i)] + log(1 − p̂i) }

    > fitted_ps <- fitted.values(sex_age_pres_prod.glm)
    > -2 * sum(fitted_ps * (log(fitted_ps)
      - log(1-fitted_ps)) + log(1-fitted_ps))
    [1] 1284.251

    > ys <- as.numeric(choice) - 1
    > mean_y <- mean(ys)
    > -2 * length(ys) * (mean_y * (log(mean_y)
      - log(1-mean_y)) + log(1-mean_y))
    [1] 1453.619

## Selected lines from the summary() output: AIC and Fisher scoring iterations

The Akaike Information Criterion score:

    AIC: 1300.3

And how many iterations of the IRWLS procedure were performed before convergence (large numbers might cue a problem):

    Number of Fisher Scoring iterations: 4

## The AIC: do it yourself

    > ys <- as.numeric(choice) - 1
    > fitted_ps <- fitted.values(sex_age_pres_prod.glm)
    > ll <- sum(log(fitted_ps^ys)
      + log((1-fitted_ps)^(1-ys)))
    > -2*ll +
      2*length(coefficients(sex_age_pres_prod.glm))
    [1] 1300.251

## Comparing models with the deviance test

Let us add a sex by presentation interaction term:

    > interaction.glm <- glm(choice ~ sex + age + presentation +
      product + sex:presentation, family = "binomial")

Are the extra parameters justified?

    > anova(sex_age_pres_prod.glm, interaction.glm,
      test = "Chisq")
    ...
      Resid. Df Resid. Dev  Df Deviance P(>|Chi|)
    1      1168    1284.25
    2      1166    1277.68   2     6.57      0.04

Apparently, yes (although summary(interaction.glm) suggests just a marginal interaction between sex and the percentage dummy variable).
## Comparing models by AIC

Again, the model with the interaction looks better:

    > sex_age_pres_prod.glm$aic
    [1] 1300.251
    > interaction.glm$aic
    [1] 1297.682

## Error rate

The model makes an error when it assigns p > .5 to an observation where choice is N, or p < .5 to an observation where choice is Y:

    > sum((fitted(interaction.glm)>.5 & choice=="N") |
      (fitted(interaction.glm)<.5 & choice=="Y")) /
      length(choice)
    [1] 0.2712585

Compare to the error rate of the baseline model that always guesses the majority choice:

    > table(choice)
    choice
      N   Y
    363 813
    > sum(choice=="N")/length(choice)
    [1] 0.3086735

The improvement in error rate is nothing to write home about...

## Bootstrap estimation

Re-fit the logistic model with lrm(), the logistic regression fitting function from the Design package:

    > interaction.glm <- lrm(choice ~ sex + age
      + presentation + product + sex:presentation,
      x = TRUE, y = TRUE)

Validation using the logistic model estimated by lrm() and 1,000 bootstrap iterations:

    > validate(interaction.glm, B = 1000)

When fed a logistic model, validate() returns various measures of fit we have not discussed: see, e.g., Baayen or Harrell. Independently of the interpretation of the measures, the size of the optimism indices gives a general idea of the amount of overfitting (not dramatic in this case).
## The Navarrete et al. Cumulative Within-Category Cost data

This is part of a larger study by Eduardo Navarrete, Brad Mahon and Alfonso Caramazza; we just look at one very specific aspect of their Experiment 1 data.

A well-known and robust effect in picture naming makes subjects slower at naming objects as a (linear) function of how many objects from the same category they saw before (you are slower at naming the second mammal, slower still at naming the third, etc.).

The question here: does this "cumulative within-category cost" effect also impact error rate? Are subjects more likely to mis-label or fail to name an object as a function of the number of objects in the same category they already saw?

## The cwcc.txt data-set

- subj: a unique id for each of the 20 subjects
- item: the specific objects presented in the pictures (pears, pianos, houses, etc.)
- category: 12 superordinate categories (animals, body parts, buildings, etc.)
- ordpos: the ordinal position of the item within its category in the current block (is this the first, second, ..., fifth animal?)
- block: each subject sees 4 repetitions of the whole stimulus set, presented in different orders
- response: the picture naming latency in milliseconds (or error for wrong or anomalous responses)
- error: did the subject give a wrong/anomalous response? (0: no; 1: yes)

## Practice time

Using logistic regression, model the probability of making an error as a function of category, ordinal position within category and block.

- Which effects do you find?
- Do they make sense?
- Is there an interaction, e.g., between ordinal position and block?

## Further topics

All stuff you can do in R:

- More generalized linear models: multinomial and ordinal logistic regression, Poisson regression, ...
- Too many independent variables? Automated selection procedures, ridge/regularized regression, PCA of the independent variables...
- Item and subject issues? Mixed models with random effects
- No dependent variable (unlabeled data)? Clustering, PCA...
- Fancier machine learning: Support Vector Machines...
