PLS‐PM
in
R:
The
plspm package
A
brief
tutorial
1
Introduction
This
package
is
one
of
the
results
derived
from
my
doctoral
research
in
Partial
Least
Squares
Path
Modeling.
Since
the
beginning
of
my
graduate
studies
I
was
encouraged
by
Prof.
Tomàs
Aluja
to
program
in
R
a
package
that
allowed
us
to
estimate
latent
variable
path
models
by
PLS
approach.
Innocently,
I
accepted
that
challenge
and
started
to
figure
out
the
way
to
achieve
that
goal.
It
took
me
about
three
months
to
come
out
with
a
very
poor
initial
version
of
my
package
with
long
lines
of
code
and
many
inefficient
subroutines.
As
I
was
improving
the
functions,
my
doctoral
project
also
started
to
expand
its
scope.
Finally,
it
was
decided
to
use
the
functions
I
had
in
R
as
a
baseline
for
a
more
user‐friendly
package
in
Java
with
a
fancy
look
and
a
graphic
interface
more
manageable
for
inexperienced
users.
The
result
of
that
project
was
the
launch
of
an
academic
software
program
called
Visual Pathmox
(Serch,
2008).
However,
I
felt
that
it
would
be
worthwhile
to
create
an
R
package
and
make
it
available
on
CRAN
for
public
use
while
contributing
to
the
spread
of
PLS
Path
Modeling.
I
hope
the
plspm
package
can
be
of great help to
many
practitioners
and
researchers,
and
that
it
can
be
used
within
the
data
analysis
framework,
plots,
programming‐options,
and
capabilities
provided
by
R.
2
Yet
another
PLS‐PM
program
The
plspm
package
is
a
new addition to
the
list
of
existing
programs
that
perform
PLS‐PM
analysis
such
as
LVPLS
(Lohmöller,
1987),
PLS‐Graph
(Chin,
1993),
PLS‐GUI (Li,
2005),
VisualPLS (Fu,
2006),
SPAD‐PLS
(Test&Go,
2006),
SmartPLS (Ringle
et
al,
2005),
and
XLSTAT‐PLSPM
(Addinsoft,
2008).
Although
plspm
lacks
a
graphic
interface
to
draw
path
diagrams,
its
main
advantage
is
that
one
can
complement
its
use
with
all
the
data
analysis
options
and
programming
capabilities
of
R.
In
addition,
plspm
is
totally
free.
It
is
available
from
the
CRAN
http://cran.r-project.org/
(R
Development
Core
Team
2009).
3
The
R
package
plspm
In
order
to
be
able
to
use
the
package,
one
must
load
it
(after it has been installed):
R> library("plspm")
Gastón
Sánchez
Universitat
Politècnica
de
Catalunya
(UPC),
Spain.
The
main
function
of
the
package
is
also
called
plspm.
This
function
has
seven
arguments:
1. x:
a
matrix
or
data
frame
containing
the
manifest
variables
2. inner.mat:
the
inner
design
matrix
that
indicates
the
relationships
among
latent
variables
3. sets:
a
list
of
vectors
that
contain
the
column
indices
of
x
to
form
the
different
blocks
of
variables
4. modes:
a
character
vector
indicating
the
type
of
measurement
(reflective
or
formative)
for
each
latent
variable
5. scheme:
the
inner
weighting
scheme
to
be
used
6. scaled:
a
logical
value
indicating
whether
the
data
must
be
standardized
7. boot.val:
a
logical
value
indicating
whether
bootstrap
validation
must
be
performed
The
path
model
has
to
be
specified
with
the
use
of
the
arguments
inner.mat, sets,
and
modes;
that
is,
using
a
matrix,
a
list,
and
a
vector.
To
be
exact,
the
structural
part
(inner
model)
is
specified
with
the
argument
inner.mat
while
the
measurement
part
(outer
model)
is
specified
by
means
of
the
arguments
sets
and
modes.
The
different
options
of
the
inner
weighting
schemes
are
set
with
the
argument
scheme.
If
bootstrap
validation
is
required,
this
can
be
performed
by
using
the
argument
boot.val.
The
value
returned
by
the
function
plspm
is
an
object
of
class
“plspm”,
which
can
be
printed
and
summarized
by
the
print
and
summary methods.
The
first
three
arguments
must
be
provided
by
the
user.
The
rest
of
the
arguments
have
default
values
and
the
function
can
be
run
without
the
need
to
specify
them.
• The
argument
x
must
be
a
numeric
matrix
or
data
frame
and
no
missing
values
are
allowed.
• The
inner.mat
argument
is
a
special
kind
of
matrix.
This
is
the
matrix
that
indicates
the
structural
relationships
among
latent
constructs.
The
function
plspm
requires
the
inner.mat
argument
to
be
defined
as
a
square
matrix
with
only
zeros
or
ones.
In
fact,
since
PLS‐PM
only
works
with
recursive
models,
this
matrix
must
be
a
lower
triangular
matrix
which
means
that
the
entries
in
the
diagonal
and
above
the
diagonal
are
zero.
• The
argument
sets
is
a
list
of
length
equal
to
the
number
of
latent
variables.
The
elements
of
sets
are
vectors
which
contain
the
column
indices
of
x;
that
is,
the
indices
of
the
manifest
variables
that
form
the
different
blocks.
• The
argument
modes
is
an
optional
argument.
By
default
it
is
a
character
vector
of
length
equal
to
the
number
of
latent
variables,
and
it
contains
as
many
letters
“A”
as
latent
variables
in
the
model.
This
means
that
the
latent
variables
are
measured
in
a
reflective
way.
If
any
LV
is
supposed
to
be
measured
in
a
formative
way,
the
user
has
to
specify
the
argument
modes
indicating
which
blocks
are
formative
by
a
letter
“B”.
• The
argument
scheme
is
an
optional
parameter.
By
default
it
is
set
to
be
the
character
string
“factor”,
which
means
that
inner
weights
are
calculated
according
to
the
factor
scheme.
The
other
possible
values
are
“centroid”
and
“path”.
• The
argument
scaled
refers
to
a
logical
value
indicating
whether
data
should
be
standardized.
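The rules stated above for the arguments can be checked before calling plspm. The following is a minimal sketch in base R and is not part of the package: the helper name check_path_inputs and the three-block toy model are hypothetical, introduced only to illustrate the constraints (numeric data without missing values, a square 0/1 matrix with zeros on and above the diagonal, one vector of column indices and one mode per latent variable).

```r
# Hypothetical helper (not provided by plspm): verifies the input rules
# described above for x, inner.mat, sets, and modes.
check_path_inputs <- function(x, inner.mat, sets,
                              modes = rep("A", length(sets))) {
  n_lv <- length(sets)
  # x: numeric matrix or data frame with no missing values
  ok_x <- (is.matrix(x) || is.data.frame(x)) &&
    all(vapply(as.data.frame(x), is.numeric, logical(1))) &&
    !any(is.na(x))
  # inner.mat: square, only zeros and ones, zeros on and above the diagonal
  ok_inner <- is.matrix(inner.mat) &&
    nrow(inner.mat) == ncol(inner.mat) &&
    nrow(inner.mat) == n_lv &&
    all(inner.mat %in% c(0, 1)) &&
    all(inner.mat[upper.tri(inner.mat, diag = TRUE)] == 0)
  # sets: column indices that exist in x
  ok_sets <- all(unlist(sets) %in% seq_len(ncol(x)))
  # modes: one "A" (reflective) or "B" (formative) per latent variable
  ok_modes <- length(modes) == n_lv && all(modes %in% c("A", "B"))
  c(x = ok_x, inner.mat = ok_inner, sets = ok_sets, modes = ok_modes)
}

# Toy model with three latent variables and two indicators each (made-up data)
set.seed(123)
toy_x <- matrix(rnorm(60), nrow = 10, ncol = 6)
toy_inner <- rbind(c(0, 0, 0),
                   c(1, 0, 0),
                   c(1, 1, 0))
toy_sets <- list(1:2, 3:4, 5:6)
checks <- check_path_inputs(toy_x, toy_inner, toy_sets)
print(checks)

# A one above the diagonal violates the recursive-model rule
bad_inner <- toy_inner
bad_inner[1, 3] <- 1
bad <- check_path_inputs(toy_x, bad_inner, toy_sets)
print(bad)
```

Because modes defaults to a vector of "A" letters in this sketch, it mirrors the default described above, in which every block is treated as reflective.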
The
function
plspm
produces
a
list
with
the
following
results:
1. unidim:
results
for
checking
the
unidimensionality
of
blocks.
Includes:
first
and
second
eigenvalues,
Cronbach’s
alpha,
and
Dillon‐Goldstein’s
rho.
2. outer.mod:
results
of
the
outer
(measurement)
model.
Includes:
outer
weights,
standard
loadings,
communalities,
and
redundancies.
3. inner.mod:
results
of
the
inner
(structural)
model.
Includes:
path
coefficients
and
R‐squared
for
each
endogenous
latent
variable.
4. latents:
matrix
of
standardized
latent
variables
(variance=1).
5. scores:
matrix
of
re‐scaled
latent
variables
when
scaled=FALSE.
If
scaled=TRUE
then
scores
are
equal
to
latents.
6. out.weights:
vector
of
outer
weights.
7. loadings:
vector
of
standardized
loadings
(i.e.
correlations
with
LVs).
8. path.coefs:
matrix
of
path
coefficients;
this
matrix
has
a
similar
form
as
inner.mat.
9. r.sqr:
vector
of
R‐squared
coefficients.
10. outer.cor:
correlations
between
the
latent
variables
and
the
manifest
variables
(also
called
cross‐loadings).
11. inner.sum:
summarized
results
by
latent
variable.
Includes:
type
of
measurement,
number
of
indicators,
R‐squared,
average
communality,
average
redundancy,
and
average
variance
extracted.
12. gof:
Table
with
indexes
of
Goodness‐of‐Fit.
Includes:
absolute
GoF,
relative
GoF,
outer
model
GoF,
and
inner
model
GoF.
13. effects:
table
of
path
effects
from
the
structural
relationships.
Includes:
direct,
indirect,
and
total
effects.
14. boot.mod:
List
of
bootstrapping
results;
only
available
when
argument
boot.val=TRUE.
4
An
example
with
a
Customer
Satisfaction
study
To
illustrate
the
usage
and
outputs
of
plspm,
we
use
the
satisfaction
dataset
and
the
typical
model
of
the
European
Customer
Satisfaction
Index
(Westlund et al, 2001; Kristensen et al, 2001; and Tenenhaus et al, 2005).
The
dataset
refers
to
customers’
perceptions
about
the
service
provided
by
a
Spanish
credit
institution.
The
data
set
contains
27
variables
observed
on
250
individuals:
Variables of block Image: columns 1 to 5
Variables of block Expectations: columns 6 to 10
Variables of block Quality: columns 11 to 15
Variables of block Value: columns 16 to 19
Variables of block Satisfaction: columns 20 to 23
Variables of block Loyalty: columns 24 to 27
The
last
argument
is
the
optional
logical
argument
boot.val.
It
is
FALSE
by
default,
meaning
that
no
bootstrap
validation
is
performed.
Fig
1
Path
diagram
of
the
ECSI
Model
Since
the
structural
part
of
the
path
model
has
to
be
specified
in
matrix
form,
the
inner
design
matrix
is
defined
as
follows:
       IMAG  EXPE  QUAL  VAL  SAT  LOY
IMAG     0     0     0    0    0    0
EXPE     1     0     0    0    0    0
QUAL     0     1     0    0    0    0
VAL      0     1     1    0    0    0
SAT      1     1     1    1    0    0
LOY      1     0     0    0    1    0
Fig
2
Inner
design
matrix
for
expressing
the
structural
relationships
Note
that
the
inner
design
matrix
is
a
Boolean
lower
triangular
matrix.
It
has
zeros
in
its
diagonal
and
above
it,
which
means
that
the
structural
model
is
a
recursive
model
(i.e.
no
loops
in
the
cause‐effect
flow).
The
zeros
in
the
diagonal
indicate
that
a
latent
variable
cannot
affect
itself.
The
causal
relationships
among
latent
constructs
are
established
in
the
matrix
in
an
up‐down
direction,
that
is:
columns
affecting
rows.
The
first
column
of
the
matrix
implies
that
IMAG
affects
EXPE,
SAT
and
LOY;
the
second
column
implies
that
EXPE
affects
QUAL,
VAL,
and
SAT;
the
third
column
implies
that
QUAL
affects
VAL
and
SAT;
and
so
on.
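The "columns affecting rows" reading can be made explicit with a few lines of base R. The sketch below types in the inner design matrix of the ECSI model and lists every structural relationship it encodes; the object name edges is only an illustrative choice.

```r
# Inner design matrix of the ECSI model: a one in row i, column j means
# that the latent variable in column j affects the one in row i.
inner.mat <- rbind(
  IMAG = c(0, 0, 0, 0, 0, 0),
  EXPE = c(1, 0, 0, 0, 0, 0),
  QUAL = c(0, 1, 0, 0, 0, 0),
  VAL  = c(0, 1, 1, 0, 0, 0),
  SAT  = c(1, 1, 1, 1, 0, 0),
  LOY  = c(1, 0, 0, 0, 1, 0)
)
colnames(inner.mat) <- rownames(inner.mat)

# Positions of the ones, scanned column by column
idx <- which(inner.mat == 1, arr.ind = TRUE)
edges <- data.frame(
  from = colnames(inner.mat)[idx[, "col"]],
  to   = rownames(inner.mat)[idx[, "row"]],
  stringsAsFactors = FALSE
)
print(edges)
```

Each row of edges is one arrow of the path diagram, so the table can be compared directly with figure 1.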
The
measurement
relationships
for
the
latent
variables
are
indicated
with
a
character
vector
containing
A’s
and/or
B’s.
An
“A”
is
used
to
indicate
a
reflective
block
of
manifest
variables
while
a
“B”
is
used
for
a
formative
block
of
indicators
as
in
figure
3.
Fig
3
Reflective
(“A”)
and
Formative
(“B”)
type
of
measurement
relationships
In
this
example
all
the
latent
variables
are
considered
in
reflective
form.
Thus,
the
argument
modes
is
a
character
vector
with
six
elements
given
as:
R> modes <- c("A", "A", "A", "A", "A", "A")
The
selected
scheme
to
calculate
the
inner
weights
is
the
factor
scheme.
In
addition,
since
the
manifest
variables
are
measured
on
the
same
scale,
the
argument
scaled
is
set
as
FALSE
(i.e.
no
standardization
required).
The
code
in
R
to
perform
a
PLS‐PM
analysis
with
the
satisfaction
data
is
shown
below
(figure
4).
Fig
4
Screen
display
in
R
with
the
instructions
to
perform
PLS‐PM
analysis
with
the
satisfaction
data
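The screen display of figure 4 is not reproduced in this text. As a hedged sketch of what such instructions could look like, the code below uses only the arguments listed in section 3 and the block layout given above. It assumes that the satisfaction data can be loaded with data(satisfaction) once the package is attached, and that the installed version of plspm accepts the arguments described in this tutorial; other releases may name them differently, which is why the first four arguments are passed by position and the call is wrapped in tryCatch.

```r
# Structural part: start from a matrix of zeros and switch on the
# relationships of the ECSI model (columns affect rows).
lvs <- c("IMAG", "EXPE", "QUAL", "VAL", "SAT", "LOY")
inner.mat <- matrix(0, 6, 6, dimnames = list(lvs, lvs))
inner.mat["EXPE", "IMAG"] <- 1
inner.mat["QUAL", "EXPE"] <- 1
inner.mat["VAL", c("EXPE", "QUAL")] <- 1
inner.mat["SAT", c("IMAG", "EXPE", "QUAL", "VAL")] <- 1
inner.mat["LOY", c("IMAG", "SAT")] <- 1

# Measurement part: column indices of each block, all blocks reflective
sets <- list(1:5, 6:10, 11:15, 16:19, 20:23, 24:27)
modes <- rep("A", 6)

# Run the analysis only if the package is installed
res <- NULL
if (requireNamespace("plspm", quietly = TRUE)) {
  library("plspm")
  res <- tryCatch({
    data(satisfaction)  # assumption: the dataset ships with the package
    plspm(satisfaction, inner.mat, sets, modes,
          scheme = "factor", scaled = FALSE, boot.val = FALSE)
  }, error = function(e) {
    # The installed release may not match the interface described here
    message("plspm call failed: ", conditionMessage(e))
    NULL
  })
}
if (!is.null(res)) {
  print(res)           # list of available results
  print(summary(res))  # general output of the model
}
```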
The
print
method
gives
the
following
display:
Fig
5
Printed display of an
object
of
class
“plspm”
showing
the
list
of
results
In
order
to
see
a
general
output
of
the
model,
the
summary
method
provides
the
following
results:
The
model
specification
appears first,
showing
the
number
of
cases
(250)
of
the
analyzed
data,
the
number
of
latent
variables
in
the
model
(6),
the
number
of
used
manifest
indicators
(27),
the
scale
of
the
data
(Raw
Data),
the
inner
weighting
scheme
(factor),
and
the
bootstrap
validation
option
(FALSE).
Then,
the
definition
of
the
blocks
shows
the
latent
variables,
their
type
(exogenous
or
endogenous),
the
number
of
indicators
in
each
block
and
the
type
of
measurement
relationships.
Fig
6
Specification
of
the
model
with
the
selected
arguments,
and
definition
of
the
blocks
of
variables
The
next
results
in
the
summary
of
the
object
“plspm”
are
the
indexes
used
to
check
the
unidimensionality
of
the
blocks.
Since
all
measurement
relationships
of
the
model
are
reflective,
it
is
meaningful
to
analyze
whether
the
blocks
can
be
considered
to
be
unidimensional.
In
order
to
assess
the
extent
to
which
a
block
is
unidimensional,
plspm
provides
three
indexes:
the
first
and
second
eigenvalues
of
the
MVs
correlation
matrix,
the
Cronbach’s
alpha,
and
the
Dillon‐Goldstein’s ρ
(see
figure
7). The
use
of
eigen‐analysis
is
based
on
the
importance
of
the
eigenvalues.
If
a
block
is
unidimensional,
then
the
first
eigenvalue
of
the
correlation
matrix
of
the
MVs
should
be
larger
than
one
whereas
the
second
eigenvalue
should
be
smaller
than
1.
Cronbach’s
alpha
coefficient
is
another
criterion
used
to
assess
a
block’s
unidimensionality;
it
evaluates
how
well
a
block
of
indicators
measures its
corresponding
latent
construct.
In
this
case,
the
indicators
are
required
to
be
standardized
and
positively
correlated.
As
a
rule
of
thumb,
a
block
is
considered
as
unidimensional
when
Cronbach’s
alpha
is
larger
than
0.7.
Finally,
the
Dillon‐Goldstein’s ρ is
also
focused
on
the
variance
of
the
sum
of
variables
in
the
block
of
interest.
As
a
rule
of
thumb,
a
block
is
considered
unidimensional
when
Dillon‐Goldstein’s ρ is
larger
than
0.7.
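To make the three indexes concrete, the following base R sketch recomputes them for a single block of indicators. It is not the package's internal code: the function name block_unidim is hypothetical, the data are simulated, and the formulas are the commonly cited ones for standardized indicators (Cronbach's alpha from the sum of the correlation matrix, Dillon‐Goldstein's ρ from the loadings on the first principal component), so the package's own output may differ in detail.

```r
# Illustrative re-computation of the unidimensionality indexes for one
# block of manifest variables (assumed formulas, standardized indicators).
block_unidim <- function(block) {
  block <- scale(as.matrix(block))   # standardize the indicators
  p <- ncol(block)
  R <- cor(block)                    # MVs correlation matrix
  eig <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  # Cronbach's alpha for standardized indicators
  alpha <- (p / (p - 1)) * (1 - p / sum(R))
  # Dillon-Goldstein's rho from the loadings on the first principal component
  pc1 <- prcomp(block)$x[, 1]
  lambda <- as.vector(cor(block, pc1))
  rho <- sum(lambda)^2 / (sum(lambda)^2 + sum(1 - lambda^2))
  c(eig.1st = eig[1], eig.2nd = eig[2], cronbach.alpha = alpha, dg.rho = rho)
}

# Simulated block: four indicators driven by one common factor (made-up data)
set.seed(2009)
common <- rnorm(250)
block <- sapply(1:4, function(j) common + rnorm(250, sd = 0.5))
idx <- block_unidim(block)
print(round(idx, 3))
```

For a block built this way the first eigenvalue should dominate and both alpha and ρ should exceed the 0.7 rule of thumb; a block mixing two unrelated factors would instead show a second eigenvalue close to or above one.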
Fig
7
Indexes
used
to
assess
blocks’
unidimensionality
In
fourth
place,
the
results
of
the
outer
model
are
provided:
outer
weights,
standard
loadings,
communality,
and
redundancy.
Fig
8
Results
of
the
outer
model:
outer
weights,
standard
loadings,
communality,
and
redundancy
Then,
the
correlations
between
latent
variables
and
manifest
variables
are
listed.
Fig
9
Correlations
between
latent
variables
and
manifest
variables
(i.e.
cross‐loadings)
In
sixth
place,
the
output
of
the
inner
model
is
displayed
in
a
list
with
the
results
for
each
endogenous
latent
variable:
R²
coefficient,
intercept
term,
and
path
coefficients.
Fig 10 Inner model results: R²,
intercept
term,
and
path
coefficients
The
next
results
are
the
correlations
between
latent
variables.
Then,
a
summary
table
for
the
inner
model
is
presented
with
the
average
communality,
the
average
redundancy,
and
the
average
variance
extracted
index.
In
ninth
place,
the
Goodness‐of‐Fit (GoF) indexes are displayed (absolute GoF index, relative GoF index, outer model GoF, and inner model GoF).
Fig
11
Inner
model
correlations,
summary
table,
and
GoF
index
Finally,
the
table
with
the
path effects
is
shown
as
in
figure
12.
Fig
12
Table
of
path
effects:
direct,
indirect
and
total
effects
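The relation between direct, indirect, and total effects can be illustrated with a small computation. The sketch below is not the package's code: the function name path_effects, the three latent variables, and the coefficient values are all hypothetical. It assumes a recursive model whose path coefficients are stored like inner.mat (columns affect rows), in which case the total effects are the sum of the successive powers of the coefficient matrix and the indirect effects are the total effects minus the direct ones.

```r
# Made-up path coefficients for a three-variable recursive model
lvs <- c("LV1", "LV2", "LV3")
path.coefs <- matrix(0, 3, 3, dimnames = list(lvs, lvs))
path.coefs["LV2", "LV1"] <- 0.5
path.coefs["LV3", "LV1"] <- 0.2
path.coefs["LV3", "LV2"] <- 0.4

# Sketch: accumulate the effects carried by paths of length 1, 2, ...
path_effects <- function(B) {
  n <- nrow(B)
  total <- matrix(0, n, n, dimnames = dimnames(B))
  power <- diag(n)
  for (k in seq_len(n - 1)) {
    power <- power %*% B   # effects carried by paths of length k
    total <- total + power
  }
  list(direct = B, indirect = total - B, total = total)
}

eff <- path_effects(path.coefs)
print(eff$indirect)
print(eff$total)
```

In this toy model LV1 reaches LV3 directly (0.2) and through LV2 (0.5 × 0.4 = 0.2), so its total effect on LV3 is 0.4.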
Basically,
plspm
covers
the
PLS‐PM
methodology
described
in
Wold
(1982,
1985),
Lohmöller
(1989),
and
Tenenhaus
et al
(2005).
The
package
offers
the
main
results
from
a
PLS‐PM
analysis
as do
most
of
the
available
PLS‐PM
software
programs.
To
conclude,
plspm
provides
a
flexible
easy‐to‐use
function
that allows
practitioners
and
researchers
from
different
disciplines
to
compute,
interpret,
and
analyze
path
models
from
a
PLS
approach.
References
Addinsoft
(2008)
XLSTAT‐PLSPM.
Addinsoft.
Chin,
W.W.
(1993)
PLS Graph – Version 3.0.
Soft
Modeling
Inc.
Fu,
J.‐R.
(2006)
VisualPLS – Partial Least Squares (PLS) Regression – An Enhanced GUI for Lvpls (PLS 1.8 PC) Version 1.04.
National
Kaohsiung
University
of
Applied
Sciences,
Taiwan,
ROC.
Kristensen,
K.,
Juhl,
H.J.,
and
Ostergaard,
P.
(2001)
Customer
satisfaction:
some
results
for
European
Retailing.
Total Quality Management,
12(7&8):
809‐897.
Li,
Y.
(2005)
PLS‐GUI
–
Graphic User Interface for Partial Least Squares (PLS‐PC 1.8) – Version 2.0.1 beta.
University
of
South
Carolina,
Columbia,
SC.
Lohmöller,
J.‐B.
(1987)
PLS‐PC: Latent Variables Path Analysis with Partial Least Squares – Version 1.8 for PC under MS‐DOS.
Lohmöller,
J.‐B.
(1989)
Latent Variable Path Modeling with Partial Least Squares.
Heidelberg:
Physica‐Verlag.
R
Development
Core
Team
(2009).
R: A Language and Environment for Statistical Computing.
R
Foundation
for
Statistical
Computing,
Vienna,
Austria.
ISBN
3‐900051‐07‐0,
URL
http://www.R-project.org.
Serch,
O.
(2008)
Sistema
de
Visualització
de
models
PLS‐PM.
Projecte
Final
de
Carrera.
Facultat
d’Informàtica
de
Barcelona,
Universitat
Politècnica
de
Catalunya.
January,
2008.
Tenenhaus,
M.,
Esposito
Vinzi,
V.,
Chatelin,
Y.M.,
and
Lauro,
C.
(2005)
PLS
path
modeling.
Computational Statistics & Data Analysis,
48:
159‐205.
Test&Go
(2006)
Spad Version 6.0.0.
Paris,
France.
Westlund,
A.H.,
Cassel,
C.M.,
Eklöf,
J.,
and
Hackl,
P.
(2001)
Structural
analysis
and
measurement
of
customer
perceptions,
assuming
measurement
and
specifications
errors.
Total Quality Management,
12(7&8):
873‐881.
Wold,
H.
(1982)
Soft
modeling:
The
Basic
Design
and
Some
Extensions.
In:
Systems under indirect observation: Causality, structure, prediction. Part II,
1‐54.
K.G.
Jöreskog
&
H.
Wold
(Eds).
Amsterdam:
North
Holland.
Wold,
H.
(1985)
Partial
least
squares.
In:
Encyclopedia of Statistical Sciences, Vol. 6,
581‐591.
Kotz,
S.,
and
Johnson,
N.L.
(Eds).
New
York:
Wiley.