R in Psychometrics and Psychometrics in R

Jan de Leeuw, UCLA Statistics

In psychometrics, and in the closely related fields of quantitative
methods for the social and educational sciences, R is not yet used
very often. Traditional mainframe packages such as SAS and SPSS are
still dominant at the user level, Stata has made inroads at the
teaching level, and Matlab is quite prominent at the research level.

In this paper we define the most visible techniques in the
psychometrics area, give an overview of what is available in R, and
discuss what is missing. We then outline a strategy and a project to
fill in the gaps. The outcome will hopefully be a more prominent
position for R in the social and behavioral sciences and, as a result,
less of a gap between these disciplines and mainstream statistics.


1. What is Psychometrics? How is it related to other Foometrics?

2. How much R is there in Psychometrics? Can there be more? Should
there be more?

3. How much Psychometrics is there in R? Will there be more? What is
missing?

A recent overview of what Psychometricians themselves think about
Psychometrics is in Statistica Neerlandica, 60, 2006, 135-144.

If Foo is a science, then Foo often has both an area Foometrics and an
area Mathematical Foo.

Mathematical Foo applies mathematical modeling to the Foo subject
area, while Foometrics develops and studies data analysis techniques
for empirical data collected in Foo.

What we call statistics is the union of the various Foometrics over
all Foo. Not the intersection, but the union.

Each of the social and behavioural sciences has a form of Foometrics,
although they may not all use a name in this family.

Clearly Economics, Psychology, Biology, Archeology, Anthropology, and
Environmental Science have their own Foometrics.

And then there are various recent upstarts such as Cliometrics,
Informetrics, Bibliometrics, Behaviormetrics, Ecolometrics,
Cybermetrics, and Scientometrics.

Sociology would like to have Sociometrics, but the name was already in
use for something quite different. Historiometrics and Archeometrics
are there, but struggling.

Education does not really have Educometrics, but we'll use it anyway.

Social sciences in which data are less prominent usually have books
and conferences with titles such as Statistics in Foo -- they will
have their very own Foometrics in the future.

In this presentation we'll look at Psychometrics and Educometrics,
with a dash of Sociometrics and Econometrics.

Psychometrics and Educometrics have been around for a long time, at
least since Galton. Their development has been very closely linked,
and often the two have been indistinguishable.

So we do not distort reality too much if we simply call the body of
techniques we discuss Psychometrics.

[Figure: diagram relating Psychometrics, Educometrics, and
Sociometrics, with the techniques MDS, FA, SEM, CA, HLM, and LogLin
placed among them.]

R in Psychometrics

Traditionally psychologists doing data analysis use SPSS, some use
SAS.

Psychometricians developing data analysis techniques use Matlab,
sociometricians and econometricians (at least in the US) tend to use
[...]

The situation in France or England may be quite different.

This has mainly historical reasons -- it has to do with where these
packages originated.

But it also has to do with the rather large distance between areas
such as psychometrics and (academic) statistics, which again has
historical reasons, most of them silly. Typically, there is not much
interaction, despite institutions like ETS and Bell Labs.

And thus the R revolution has largely passed psychometrics by.

Psychometric software is often distributed by incorporating it as
modules in the standard packages (SPSS, SAS, Stata), using either
native matrix routines if available or linking in compiled code. This
guarantees good distribution and some money, but certainly not
efficient computation.

Examples are CATEGORIES for CA in SPSS, PROC CALIS for SEM and PROC
GLM for MLA in SAS, and gllamm for SEM and MLA in Stata.

In addition, psychometricians tend to write stand-alone packages for
specific families of techniques. This is often compiled code combined
with a suitable GUI.

The prototypical examples are SEM packages and packages such as HLM
or ML-WIN -- but there are many similar stand-alone packages for IRT
and CA and LLA as well. In fact the number of CA packages in
marketing, for example, is staggering.

Writing stand-alone compiled packages often means that the
psychometrician is a small company, trying to make money. It also
means a certain form of competition, which does not really belong in
academia. And it means proprietary software, which costs money.

More serious, perhaps, is that this approach means black-box
software, in which the machinery is almost completely hidden. This
means the user often will not even try to understand what is going on.

The techniques implemented in the black-box packages are often
complicated (many parameters, complicated optimizations, doubtful
standard errors).

This is necessarily true: simpler techniques are already implemented
in SAS or SPSS and usually the institution has a site license for
those.

Thus we have Deus Ex Machina software: it transforms large datasets
into rather mysterious pictures or tables that are nevertheless
accepted, and often even encouraged, by peers and journals.

Promoting the teaching and the use of R in psychometrics has some
major advantages.

1. The distance to academic statistics becomes smaller.

2. Software is more transparent -- driven by interpreted code.
Reproducible results are more likely.

3. One can teach with R. One can teach SAS, but one cannot teach with
SAS (or LISREL).

4. Software should be free.

Psychometrics in R

We give a quick inventory of the psychometric software now available
or soon to be available in R.

I shall concentrate on CRAN, of course, while mentioning some
additional easily available packages on other servers.

We shall see there is quite an abundance, although in most cases any
form of organization is lacking and duplications abound.

The psychoR project.

I have been writing and planning a substantial number of psychometric
techniques in R. Eventually they will grow up to be packages.

They are not intended to replace existing packages: let a thousand
flowers bloom. They are written following the familiar programming
philosophy that you can write FORTRAN in any language. You can find
them at [...]

JSS is planning a number of special issues, with appropriate guest
editors, and names such as

-- R in Psychometrics
-- R in Econometrics
-- R in Sociometrics

and whatever else anyone suggests along these lines. Of course there
is an inherent risk in actually making constructive suggestions -- you
may wind up as a guest editor.

1. Simple and Multiple Correspondence Analysis.

There is CA and MCA in MASS, in ade4, in FactoMineR, and in homals.
Many variations (Canonical CA, Fuzzy CA, Detrended CA, Multiway CA,
Discriminant CA, Co-CA) are in ade4, PTAk, cocorresp, vegan, and
made4. At least three more CA packages (Greenacre, Beh, De Leeuw) with
various options are currently being prepared.

An embarrassment of riches.

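
As a reminder of what all these CA packages decompose, here is a
minimal sketch of the total inertia of a two-way table, which equals
the Pearson chi-square statistic divided by the table total. It is
written in plain Python rather than R only so that it needs no
packages. The function name total_inertia is hypothetical, the sketch
assumes a table without empty rows or columns, and it is not taken
from any of the packages listed above.

```python
def total_inertia(table):
    """Total inertia of a two-way frequency table (chi-square / n).

    Assumes every row and every column has a positive total.
    """
    n = float(sum(sum(row) for row in table))
    n_cols = len(table[0])
    # Row and column masses: marginal proportions of the table.
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(row[j] for row in table) / n for j in range(n_cols)]
    inertia = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_mass[i] * col_mass[j]  # under independence
            inertia += (count / n - expected) ** 2 / expected
    return inertia
```

A table with proportional rows, such as [[10, 20], [20, 40]], has
total inertia 0, and the diagonal table [[10, 0], [0, 10]] has total
inertia 1. CA then splits this quantity over its dimensions.
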
The homals (soon gifi) package does what SPSS Categories does, and
more. It has many forms of multivariate analysis with optimal scaling,
organized as extensions of MCA. But it is rather poorly documented.

\[
\min_{X'X=I}\ \min_{Y_j\in\mathcal{Y}_j}\ \sum_{j} \operatorname{tr}\,(X-G_jY_j)'(X-G_jY_j)
\]

CA and MCA are extended in the psychoR project with distance
association models (distassoc, scalassoc, singlepeaked, logithom),
which also generalize many common IRT models.

2. Item Response Theory

ltm fits the simple Rasch model, the graded logistic model for
polytomous data, and the linear multidimensional logistic model.

mprobit fits the multivariate binary probit model.

Logistic IRT is related to Gaussian ordination, implemented in
various forms in VGAM.

More Rasch model fitting packages are on their way.
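
For orientation, the simple Rasch model gives the probability of a
correct response as a logistic function of the difference between
person ability and item difficulty. The following is a minimal sketch
in plain Python (not R, so that it stays dependency-free). The
function names are hypothetical. It only evaluates the model for given
parameters and is not how any of the packages above is implemented.

```python
import math


def rasch_probability(theta, b):
    """P(correct) for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_log_likelihood(responses, abilities, difficulties):
    """Joint log-likelihood of a persons-by-items 0/1 response matrix."""
    total = 0.0
    for i, row in enumerate(responses):
        for j, x in enumerate(row):
            p = rasch_probability(abilities[i], difficulties[j])
            # Bernoulli contribution of person i on item j.
            total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total
```

When ability equals difficulty the probability is one half, and it
increases with ability for a fixed item.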

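In psychoR we have

\[
\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{\ell=1}^{k_j} y_{ij\ell}\,
\log\frac{\beta_{j\ell}\exp(\eta(x_i,y_{j\ell}))}
         {\sum_{\nu=1}^{k_j}\beta_{j\nu}\exp(\eta(x_i,y_{j\nu}))}
\]

\[
\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{\ell=1}^{k_j} y_{ij\ell}\,
\log\bigl[\Phi(\tau_{j\ell}-\eta(x_i,y_{j\ell}))
         -\Phi(\tau_{j,\ell-1}-\eta(x_i,y_{j,\ell-1}))\bigr]
\]

\[
\eta(x_i,y_{j\ell}) =
\begin{cases}
x_i'y_{j\ell}, \\
-\|x_i-y_{j\ell}\|, \\
-\|x_i-y_{j\ell}\|^2.
\end{cases}
\]

This covers most IRT models, and then some.

There are also versions for marginal maximum likelihood estimation,
and for cross tables with frequencies in the form

\[
\sum_{i=1}^{n}\sum_{j=1}^{m}\bigl(y_{ij}\log\lambda_{ij}-\lambda_{ij}\bigr),
\qquad
\lambda_{ij}=\alpha_i\beta_j\exp(\eta(x_i,y_j)).
\]

This generalizes CA, the RC model, Quasi-Symmetry, and so on.
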
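
The cross-table criterion can be made concrete with a short sketch
that evaluates the sum of y_ij log(lambda_ij) - lambda_ij for each of
the three choices of eta. This is plain Python rather than R, only to
keep it dependency-free. All function and argument names are
hypothetical and this is not the psychoR code; it evaluates the
criterion for given parameters and does not fit anything.

```python
import math


def eta_inner(x, y):
    """Inner product x'y."""
    return sum(a * b for a, b in zip(x, y))


def eta_distance(x, y):
    """Negative Euclidean distance -||x - y||."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def eta_squared_distance(x, y):
    """Negative squared Euclidean distance -||x - y||^2."""
    return -sum((a - b) ** 2 for a, b in zip(x, y))


def poisson_criterion(table, alpha, beta, X, Y, eta):
    """Sum over cells of y_ij * log(lambda_ij) - lambda_ij,
    with lambda_ij = alpha_i * beta_j * exp(eta(x_i, y_j))."""
    total = 0.0
    for i, row in enumerate(table):
        for j, y_ij in enumerate(row):
            lam = alpha[i] * beta[j] * math.exp(eta(X[i], Y[j]))
            total += y_ij * math.log(lam) - lam
    return total
```

Passing eta_inner gives a bilinear interaction term, while the two
distance versions give the distance association variants.
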
3. Factor Analysis (see also under SEM)

factanal in stats can do exploratory maximum likelihood factor
analysis.

MCMCpack has some options for sampling from the posterior for ordinal
and mixed factor models. These are related to IRT.

homals can do various forms of mixed data principal component
analysis, which the French sometimes call FA. See also FactoMineR.

4. Three-mode Analysis

PTAk has various forms of k-mode component analysis or singular value
decomposition, popular in psychometrics, chemometrics, and fMRI
analysis.

Although there is a three-mode slot in the psychoR project, currently
PTAk seems to cover most of the useful analyses.

5. Structural Equations Models

sem fits SEMs using the RAM specification. This is quite general, and
allows one to specify arbitrary path models with observed and latent
variables.

In order to compete with the stand-alone programs sem may need various
constraints, confirmatory analysis, asymptotically distribution free
methods, ordinal variables, and hierarchical structures.

psychoR has a slot for least squares SEM. Find a patterned matrix A of
coefficients and a matrix of transformed (quantified and standardized)
variables B that solve

\[
\min_{A\in\mathcal{A}}\ \min_{B\in\mathcal{K}\cap\mathcal{S}}\ \operatorname{tr} A'B'BA
\]

Some of the blocks in B can also be "latent variables", which
basically means they are completely missing and are only defined by
the orthogonality constraints.

6. Multidimensional Scaling

There is non-metric MDS in MASS, labdsv, ecodist, vegan and
xgobi/ggobi. These all use Kruskal-type least squares loss functions
with step-size gradient optimization methods.

There is classic (Torgerson) metric MDS in stats, and Principal
Coordinate Analysis (Gower) in ecodist, ade4, labdsv, and vegan.

psychoR has metric and non-metric least squares multidimensional
scaling, including unfolding and individual difference models, using
the SMACOF majorization algorithm.

\[
\sum_{k=1}^{m}\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ijk}\,(\delta_{ijk}-d_{ij}(X_k))^2
\]

It also has least squares squared-distance multidimensional scaling,
using either the ALSCAL or the ELEGANT algorithm.

\[
\sum_{k=1}^{m}\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ijk}\,(\delta_{ijk}^2-d_{ij}^2(X_k))^2
\]
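
To illustrate the majorization idea behind SMACOF, here is a minimal
sketch of the textbook Guttman transform for the simplest special case
of the first loss function: a single dissimilarity matrix (m = 1),
unit weights, and metric scaling. It is plain Python rather than R,
only to stay dependency-free. The function names are hypothetical and
this is not the psychoR implementation.

```python
import math


def euclidean_distances(X):
    """Matrix of Euclidean distances between the rows of X."""
    n = len(X)
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]


def stress(delta, X):
    """Raw stress with unit weights, summed over all pairs (i, j)."""
    d = euclidean_distances(X)
    n = len(X)
    return sum((delta[i][j] - d[i][j]) ** 2
               for i in range(n) for j in range(n))


def guttman_update(delta, X):
    """One Guttman transform: X_new = B(X) X / n for unit weights."""
    n, p = len(X), len(X[0])
    d = euclidean_distances(X)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and d[i][j] > 0.0:
                B[i][j] = -delta[i][j] / d[i][j]
        # The diagonal makes every row of B sum to zero.
        B[i][i] = -sum(B[i][j] for j in range(n) if j != i)
    return [[sum(B[i][k] * X[k][s] for k in range(n)) / n
             for s in range(p)] for i in range(n)]


def smacof(delta, X, iterations=100):
    """Iterate the Guttman transform and record the stress values."""
    history = [stress(delta, X)]
    for _ in range(iterations):
        X = guttman_update(delta, X)
        history.append(stress(delta, X))
    return X, history
```

Each update minimizes a quadratic function that majorizes the stress,
so the recorded stress values never increase. No step size has to be
chosen, in contrast with the gradient methods mentioned above.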

We do not discuss HLM and LogLin because
they are mostly outside Psychometrics.

In any case, it seems that quite a few procedures
(in many cases packages) are available, and more
are coming on line regularly.

It seems that providing more options and better
plots will pay off in the long run, but GUIs and
spreadsheet data editors (for instance, a diagram
editor for SEM) also seem to be a necessary
condition for acceptance.
