# Factor Analysis


Multivariate statistics: factor analysis
Introduction
The situations we have looked at so far where more than two variables are analysed together
have all had an EV - DV distinction: specifically, several EVs and a single DV.
However, in some research, especially of the more 'exploratory' type, one has measured a sample
of cases on a set of variables without there being any recognisable EVs at all. One just wants to
see the relationships among those DVs. E.g. an attitude instrument where one asks people to rate
20 accents of English for 'friendliness', or a questionnaire where there are 30 questions about the
reading strategies learners claim to use. In such instances one may have a single, more or less
homogeneous, group of subjects and the interest is just in seeing which variables are related to
which and which not for such cases. This leads naturally to identifying subsets of the variables
that are mutually highly related among themselves, but not related with those in other subsets.
E.g. it might turn out that people who plan before their reading also tend to predict a lot during it
(but maybe those two do not correlate much with rereading and using a dictionary). Then in
further work in a sense these can be regarded as not two variables any more, but a unitary set or
group of them. Also in later studies one might start with the hypothesis that the same sets of
variables would emerge as related as in previous studies. It is for this reason that the techniques
for analysis in this area are often called data reduction techniques.
One can also use factor analysis where there are clear hypotheses in advance about which
variables are related. Then the term confirmatory factor analysis may be used. However there
are no inferential tests related to this to establish whether the differences between the expected
groups and the groups of variables actually identified are significant. One just has to
judge descriptively.
In this sort of analysis, groups of variables that turn out to be highly correlated with each other
(positively or negatively) are regarded as reflecting a single 'supervariable' called a factor. Note
that this is a different use of that word from that in connection with ANOVA. There, a factor was
an EV that we had manipulated or measured, and it was categorical. Here, a factor is not a variable that we
measured directly, and not an EV necessarily, and it is interval in scale. It is a new underlying
variable reconstructed from variables that we did measure, designed to summarise the shared
variance in a group of actual variables, and so be capable of being used later instead of them.
Another example. Perhaps there are EVs in your study, but so many DVs that one wants to find a
way of reducing their number before looking at their relationship with the EVs, to avoid getting
buried under so much analysis that one will not be able to see the wood for the trees. If you have
EVs like level of student, type of teaching they have had, etc. AND a questionnaire where they
had to rate their interest in 20 topic areas of English, then you might do well to 'reduce' those
topic areas into (maybe three or four) sets which internally were responded to much the same.
You do this by using FA to boil down the 20 to a few factors. Subjects can then be given a
summary score for each set (factor scores) replacing those on each of the individual original
variables so you then have your data on a reduced set of 'supervariables' (factors) that is almost
as good as all the original 20 ones. So if people who were interested in current affairs as a topic
in the English class were also interested in biographies of famous people, each person's score on
the factor that encapsulates this common trend will reflect this. You can then go on to do your
ANOVAs or whatever using your EVs, and these new 'supervariables' as the DVs. Much more
manageable. Note, a key feature of the factors that result from factor analysis is that they have
zero correlation with each other. I.e. the patterns of variation that get picked out of the data are
totally distinct from each other.
The main technique for identifying these relationships in a set of variables and doing the
'reduction' is Factor Analysis or Principal Components Analysis. These are strictly different
and have many variants, but for simplicity we skate over that here and refer to them as FA. They
essentially identify (linear, symmetric) relationships among members of a set of variables: these
variables could be quite disparate and on different scales (e.g. language learning aptitude,
instrumental motivation, integrative motivation, attitude to learning English, intelligence,
parental encouragement, etc., or a set of language tests all scored on different scales), OR they
could be effectively a set of repeated measures, like the interest in 20 different topics mentioned
above (but treated as separate variables here), or acquisition scores for 10 syntactic features, or
strategy use ratings from a questionnaire about 72 strategies. However for FA to be appropriate
they have to be each on an interval scale, or at a pinch just in two exhaustive categories. There
are other, newer facilities in SPSS for dealing with data where the variables are not all interval.
In general, 'multivariate' statistics like FA where not just two but many variables are involved
need larger numbers of cases in the analysis than some other techniques. The recommendation is
that a minimum requirement is to have at least as many cases as you have variables, and really at
least twice as many. Some statisticians say you need four times as many.
While ANOVA can be thought of as a sort of extension of t tests to more variables, FA is
essentially an extension of the Pearson r correlation coefficient to more than two variables.
Hence the relevant relationships between variables are assumed to be linear. If there are strong
non-linear ones, they will not be spotted by FA (they may well be identified as lack of
relationship).
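FA, like Pearson r, only sees straight-line relationships. A quick sketch in Python (numpy assumed available; the data are invented) shows how even a perfect quadratic relation registers as essentially zero correlation:

```python
import numpy as np

# A perfect U-shaped (quadratic) relation that Pearson r barely registers.
x = np.linspace(-3, 3, 61)
y = x ** 2                      # deterministic non-linear relation

r = np.corrcoef(x, y)[0, 1]
print(round(r, 6))              # symmetric U-shape -> r is essentially 0
```

So a variable pair can be strongly related and still look "unrelated" to FA, which is why scatterplot checks are worth doing.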
A final preliminary point is that all this is done effectively as a descriptive procedure. Unlike
many other statistics we have used, FA has no inferential side. Ordinary factor analysis does not
have p/sig values associated with its results: the figures you get at the end are rather like
correlation coefficients, and vary between -1 and +1 in the same way, with values near those
extremes showing high degrees of relationship. We have to use judgement at various stages to
decide what factors are prominent/useful.
(We previously saw a different extension of Pearson r to more than two columns of figures... that
was the alpha reliability coefficient. The difference between that and FA is that alpha quantifies
how far all the columns are positively correlated with each other. I.e. the assumption is that only
one variable/factor is involved, and all correlations are meant to be positive. Any negative
correlations will reduce alpha. FA however does not assume that all the columns correlate
positively/ negatively/at all with each other. It allows you to find which groups of columns
correlate with each other (positively or negatively) and so can be seen as reflecting separate
factors).
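The usual formula for alpha is k/(k-1) x (1 - sum of item variances / variance of the total). As a hedged sketch (invented item data, numpy assumed), the contrast with FA can be seen by computing alpha directly: alpha is high precisely when all the columns share one positively correlated pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: 4 'items' that all tap one underlying trait (plus noise),
# so they correlate positively and alpha should come out reasonably high.
trait = rng.normal(size=100)
items = np.column_stack(
    [trait + rng.normal(scale=0.8, size=100) for _ in range(4)])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 3))   # one shared factor, positive correlations -> high alpha
```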
Overview of doing factor analysis
Purpose: to find distinct patterns of variation that crop up strongly in much the same way in more
than one variable, thus identifying groups of variables that share much the same pattern/are
closely related. These shared patterns of variation are referred to as factors. E.g. if it turns out,
say, that people who plan a lot when reading also predict a lot, then these variables share a
common pattern of variation across the subjects, and factor analysis will identify one 'factor'
underlying both.
There are three main steps, of which the first is really a preliminary to what is usually called
'factor analysis'.
a) Check that there are some reasonable correlations between the variables (and some other
technical things that make the data suitable). Obviously no point in even beginning to look for
groups of variables that are closely related if none are related much to any other. Ideally one
would check for any non-linear relations, by looking at scatterplots, and for normality of
distribution of each variable at this stage also (though it is common to proceed even if the latter
is not met, since this is not an inferential procedure).
b) Decide how many groups of variables should be identified. Put another way, how many
distinct patterns of variation ('factors') are strongly evident in the data? At one extreme they
might all be strongly correlated with each other - one factor, all variables in one group. Or there
might be two very marked patterns of variation, with each variable exhibiting mostly one or the
other type, etc. Or at the other extreme there may be no widely shared patterns so you end up
with as many factors as you had original variables. Some investigator judgment comes in at the
stage of deciding on the number of patterns of variation/factors to identify.
c) Get the variables assigned as efficiently as possible to the number of factors/groups you have
decided on, so that as far as possible each variable you started with belongs in one group only.
That is a bit of an oversimplification, however. In fact factor analysis does not end up putting
each variable in one group or another, it rather shows each variable's affinity to every group, and
leaves it to the researcher to spot which group each variable fits in with best. Suppose three
factors are established, then it tells us how strongly each variable fits in with each of the three
factors. It is possible that a variable might end up belonging moderately well in more than one
group. After this the researcher of course has to interpret the groups to see if they make sense,
either in the light of prior expectations or by 'after the event' explanation.
When you use the Data reduction... Factor command in SPSS it produces a dialog box on
screen with several buttons along the bottom where you can request particular things done. Three
of these correspond closely to the three stages above, respectively (a) Descriptives, (b)
Extraction and (c) Rotation. The best way to perform the analysis therefore is to run the SPSS
Factor command three times, each time making different choices under these three heads.
Strictly there are many forms of factor analysis, using further different options in SPSS. Some
are technically not 'factor analysis' at all, in the strict sense, but 'principal components analysis',
but I shan't distinguish these here. The following glosses over the complexity and uses a
simple-minded general purpose approach.
FA TASK 1. Five interval scale variables, with no very obvious EV among them.
1) A student studied 18 adult learners of English in a class at Essex and measured five variables
via a questionnaire, administered anonymously. Each variable in itself was measured by a set of
statements to each of which the subject had to respond on a five point agree-disagree scale, or in
some other similar fashion. Their score for that variable is then a total of ratings of several items.
Integrative orientation. I.e. how far they were interested in learning the language for the
purposes of communicating with people and fitting into the target culture (as against, for
example, purely for employment purposes).
English class anxiety. Measured via response to statements like 'It embarrasses me to volunteer...'
Motivational intensity. Measured via items like
'When it comes to English homework, I
a) just skim over it
b) put some effort into it, but not as much as I could
c) work very carefully, making sure I understand everything'
Attitude to English teacher. Measured by ratings of the teacher for being boring, polite,
organised etc.
Attitude to English class. Measured by ratings of it for being boring, educational, useful etc.
We could of course criticize the way these variables were measured in the first place, and, since
each is derived from an inventory of items each scored separately and then summed, examine the
internal reliability of each measure. But let's suppose these validity and reliability aspects are OK
and continue.
What reliability measure would you use on each internally, if checking that?
2) The researcher does not regard any of these variables as EVs, but is just interested in which
are related to each other. She does not present any clear expectations either.
Can you think which of these variables you would expect to correlate with any others? E.g.
would you expect people high in class anxiety to have a more favourable attitude to the teacher
etc.? We have to do this from common sense here, but there may well be some past research that
would suggest relationships between some of these, which should properly be looked at before
proceeding.
Relationships among a set of DV variables like this are often looked at in a purely exploratory
mode by researchers, with no stated expectations, just questions (using so-called exploratory
factor analysis). However, it is good practice I think to always sit down and think before you do
it whether you don't really have some expectation/hypothesis about how some of the variables
might be related to each other. Where there are clear hypotheses in advance, the term
confirmatory factor analysis may be used.
3) To explore the example data, load it from the file charl.txt. This is not an SPSS file so has to
be loaded via Read text data. It is in five columns of summary scores corresponding to the five
variables as given above. Name the columns appropriately.
Step (a) PRECHECKS. Before going on to factor analysis proper it is usually a good idea to get
an initial idea of the relationships in the data by looking at the ordinary Pearson r correlations
between each pair of variables:
We did this earlier - in which tasks?
This step also serves as a check on how to proceed.
You could do this via Analyze...Correlation...Bivariate as before, but it is best now to use
options within the SPSS Factor command that also calculate Pearson r. (In SPSS there are often
several ways to arrive at the same information via different commands!). Choose Analyze...
Data Reduction.. Factor from the menus on the top of the screen when you have the data or
output sheet showing. Then highlight the names of all five variables on the left and transfer them
to the Variables box. Click the Descriptives button and deselect Initial Solution. In the
Correlation Matrix section choose Coefficients and KMO and Bartlett's test. Then
Continue..OK.
In the Output sheet you get a correlation matrix of the Pearson r correlation between every pair
of variables, including each variable with itself (on the diagonal). Excluding the diagonal set of
results, the ones of interest all appear in a triangle.
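If you want to see what SPSS's correlation matrix amounts to, here is a minimal numpy sketch (the 18 x 5 data below are random stand-ins, not the real charl.txt scores):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the five questionnaire variables, as an 18 x 5 matrix:
# 18 cases (rows), 5 variables (columns).
data = rng.normal(size=(18, 5))

R = np.corrcoef(data, rowvar=False)   # 5 x 5 matrix, 1s on the diagonal
print(R.shape)                        # (5, 5)

# The distinct pairwise correlations are the lower (or upper) triangle:
pairs = R[np.tril_indices(5, k=-1)]
print(len(pairs))                     # 10 distinct pairs for 5 variables
```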
For each pair of variables you are given the value of r, which we recall can have values between
what and what? What r would show a high degree of relationship - i.e. scores varying in the same
pattern on both variables?
In our table
Does every variable correlate to some degree with at least one other?
'Some degree' is a matter of judgment of course, but if a variable has near zero
correlation with all others it might as well be left out of the later factor analysis.
Are the variables you expected to be related actually related? In a positive or
negative way? Or can you explain them now you see they are related?
You also need to look at the Kaiser-Meyer-Olkin measure and the Bartlett significance. For
technical reasons that I won't go into, as a rule of thumb, one might query if going on to factor
analysis proper is a good idea if the Kaiser-Meyer-Olkin measure is less than .5 or the Bartlett
significance value greater than .05. (Yes, I have checked, with the Bartlett test you want it to be
significant, unlike other similar precheck tests we have used before like Mauchly's and Levene's
that you want to be non-sig).
Does this data pass these tests?
In essence the Bartlett test checks if there are some correlations among the variables or not.
Obviously no point in doing FA unless there are! If Bartlett is significant, that means one can
reject the null hypothesis that there are no correlations in the data.
In essence the KMO test checks if there are some instances of pairs of variables whose
correlation is also shared by a third. Again, no point in doing FA unless some patterns of
relationship spread beyond pairs of variables into larger subsets of them. The larger KMO is,
the more of this widely shared relationship there is.
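For the curious, both prechecks can be computed by hand. The sketch below uses the textbook formulas (Bartlett's sphericity statistic from the determinant of R; KMO from correlations versus partial correlations) on invented data, so treat it as an illustration of the logic rather than SPSS's exact output:

```python
import numpy as np

rng = np.random.default_rng(2)
# Invented data with two underlying patterns spread across 5 variables.
base = rng.normal(size=(60, 2))
data = base[:, [0, 0, 0, 1, 1]] + rng.normal(scale=0.5, size=(60, 5))

n, p = data.shape
R = np.corrcoef(data, rowvar=False)

# Bartlett's test of sphericity: compares R with the identity matrix
# (no correlations at all). Large chi-square -> reject 'no correlations'.
chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
df = p * (p - 1) / 2

# KMO: squared correlations versus squared partial correlations.
S = np.linalg.inv(R)
partial = -S / np.sqrt(np.outer(np.diag(S), np.diag(S)))
off = ~np.eye(p, dtype=bool)
kmo = (R[off] ** 2).sum() / ((R[off] ** 2).sum() + (partial[off] ** 2).sum())
print(round(kmo, 2))   # correlated data -> KMO comfortably above .5
```

With genuinely correlated data like this, chi2 is large (Bartlett significant) and KMO exceeds .5, so FA would be worth pursuing; shuffle the columns independently and both checks collapse.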
You could of course generate the scatterplots for every pair of variables and examine them as we
did before alongside the use of Pearson r. However, this is impractical with more than three
variables as you would have a confusingly large number of such plots to look at. A good idea,
though, is to look at the scatterplots of pairs of variables that seem to be poorly related, just to
check that the relationship is genuinely poor and not rather a good but nonlinear one (e.g. a
quadratic U shape). Remember quadratic relations will not be properly reflected by Pearson r or
FA, but it is as well to be aware of them if any are there.
When you look at a scatterplot, what sort of distribution of spots on the graph
represents 'no relation' as against 'non-linear relation'?
A final check would be the old K-S test of normality of each variable, though experts put more
reliance in things like the KMO test.
4) Step (b) CHOOSING HOW MANY FACTORS. A problem with just looking at
relationships pairwise like this is that broader aspects are not taken into account - e.g. high
correlation shared by sets of three or more variables. The picture is often a bit different when all
the variables are analysed together for mutual relationship. This is where factor analysis (or
principal components analysis) proper comes in.
The general outcome sought from factor analysis is an indication of not which pairs of variables
are strongly related between themselves, but which larger groups (e.g. sets of three or more
variables) share similar variation. Along with this there has also to be an indication of how many
groups of mutually correlating variables it is reasonable to identify in the data - i.e. how many
really distinct factors to identify. We have five variables, so various outcomes are possible. No
variables correlating much with any other (so five groups of one, but we have seen that is
unlikely already), a group of two and one of three, or perhaps they are all varying in parallel with
each other to a considerable degree (one group) etc. This now has to be decided in step (b) -
partly as a matter of investigator judgment.
Run through the relevant SPSS commands again, Analyze... Data Reduction.. Factor. At the
dialog box, the five variables you want are already in the Variables box. You just have to change
some of the requests for types of information. Click Descriptives and deselect KMO, Bartlett
and the Correlation Matrix Coefficients. We don't want that all over again. Instead select Initial
Solution. Click also Extraction and under that click to get the scree plot. Then Continue and
OK.
The aim on this run is to decide how many groups of mutually related variables to identify. There
are various ways of doing this.
i) You might have a reason from past research to select a particular number of groups/factors that
other people found, but you would be rash not to look also at the following criteria to see if your
expectation was confirmed.
ii) You can go by the eigenvalues, given in the Total Variance Explained.... Initial
Eigenvalues table, and recognise as many groups/factors/components as have eigenvalues above
1. One is always the average eigenvalue of all the factors, so you are deciding to take as many as
are 'above average'. SPSS does this by default. The output for our example shows you
five eigenvalues, because there are five variables in the analysis, so the maximum
factors/components is five.
How many eigenvalues are above 1 in our example?
So how many factors/groups would you select in step two on this criterion?
iii) You can go by the pattern of change in eigenvalues, and choose to stop recognising further
groups/factors where recognising additional ones does not improve the efficiency of the analysis
markedly. The scree plot helps with this one. Essentially the scree plot is a graph of the five
eigenvalues in order of descending size, so you always get a falling line. The idea is that you
look to see where the line changes from steeply dropping to a gradual slope: usually there is a
clear 'point of inflexion' or elbow. You count those points (factors) to the left of it, and that is
your optimal number of groups/factors to recognise.
In our example, how many factors does this criterion suggest?
The graph is called a scree plot because it is regarded as looking like a cliff with loose stones
(=scree) at the bottom. You have to look at it to decide where the cliff ends and the pile of scree
that has fallen down it begins. The idea is that the unimportant factors are those whose inclusion
would not increase the amount of variation accounted for markedly, which is reflected in the
shallow sloping line of the 'scree'. The ones to the left of the elbow account for most of the
shared relationship in the variables, and those in the low slope at the bottom are not
cost-effective to add and are not markedly different in their contribution among themselves.
iv) You can try step (c) below repeatedly with several choices of numbers of factors and choose
the one that is easiest to explain/interpret in real terms.
(ii) and (iii) need more explanation. What are these eigenvalues? To understand them
mathematically you need to know matrix algebra, but for our purposes they can be taken as
indicating how much useful information is captured by recognising different numbers of factors.
The procedure works like this. When you run SPSS Factor, the computer first seeks out the kind
of variation that is shared most widely among the five variables. E.g., oversimplifying, perhaps
there is a tendency for cases 3, 4 and 11 to score highly on three of them and for cases 8 and 18
to score low on those same ones. Those three variables share some common variance (or
symmetric relationship). The computer works out how far all the variables reflect this sort of
variation and the size of the first eigenvalue reflects how much of the general variation in the
scores is covered by this strongest tendency (factor/component). The computer then excludes this
shared variation from consideration and looks for the next most prominent tendency - e.g.
perhaps for cases 3, 5 and 8 to score high on another subgroup of the variables. And so on. (Does
that remind you of the 'Stepwise' option in Multiple Regression? It might, as it is all the same
maths in the end.) Since there are no perfect correlations in the data, clearly all the variation
among the scores on all the variables cannot be covered without identifying five factors, but we
are not interested in the last few factors which cover small amounts of shared variation. In most
analyses the first one, two or three factors are all that need be concentrated on: in other words it
is quite common to find no more than three different patterns of variation really strongly
reflected in a set of variables. Criteria like (ii) and (iii) decide the cut-off point. Hence also the
description of all this by SPSS as 'data reduction'.
Closely related to eigenvalue is percent of variance, displayed in the same table. This perhaps
more clearly than the eigenvalue shows how adding each successive kind of variation into the
account 'explains' successively less of the shared variation among the variables.
If one just settled for one factor, and regarded all the variables as basically one
group with one kind of variation parallel across all of them, how much of the
variation in the scores would not be 'covered'?
What if one settled for the first two factors - i.e. two groups of variables, two
patterns of parallel variation across the original variables? Does that cover the
vast majority of the variation in scores?
The 'percent of variation' measure is of course similar to what we saw elsewhere quantified by
eta squared and R squared, which are readily convertible to %.
In short, the procedures of (ii) and (iii) are cost-effectiveness measures - ways of deciding the
smallest number of underlying factors (alias distinct patterns of covariation, alias groups of
variables) that need to be recognised to enable the greatest amount of the shared variation to be
accounted for.
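The eigenvalue arithmetic can be sketched directly (invented data again; numpy assumed). Note that the eigenvalues of a correlation matrix always sum to the number of variables, which is why 1 is the 'average' the Kaiser criterion compares against:

```python
import numpy as np

rng = np.random.default_rng(3)
# Invented data: two shared patterns of variation across 5 variables.
base = rng.normal(size=(60, 2))
data = base[:, [0, 0, 0, 1, 1]] + rng.normal(scale=0.5, size=(60, 5))

R = np.corrcoef(data, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending, as in a scree plot

print(np.round(eigvals, 2))
print(round(eigvals.sum(), 2))    # always equals the number of variables (5)
print(int((eigvals > 1).sum()))   # Kaiser criterion: factors 'above average'
print(np.round(100 * eigvals / eigvals.sum(), 1))  # percent of variance each
```

Plotting the sorted eigenvalues gives the scree plot; here the drop after the second value is the 'elbow'.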
NUMBER OF FACTORS. You again choose Analyze... Data Reduction.. Factor. This time
you click Extraction and enter the Number of Factors you have decided on (let's say two, on
criterion (ii) above, though criterion (iii) suggested only one), and deselect scree plot. Continue.
You click on Rotation and choose Varimax: this is the commonly chosen procedure to
reorganise the data to fit in optimally with the two factors in the light of the other three now not
being included, so that as much shared variation as possible is reflected in those two. In the set of
Rotation choices also choose to be given both the Rotated Solution and the Loading Plot.
(Note, no rotation can be done if you choose only one factor).
We now need to look in the output at the Rotated Factor/Component Matrix table. This shows
how the five original variables relate to the chosen factors (i.e. the strongest shared patterns of
variation in the data). You can interpret the figures in the table like correlation coefficients: you
look for the highest positive or negative figures in each column.
So which of the five variables are related most closely with Factor 1 - i.e. might
be regarded as a group of variables which predominantly vary in parallel with
each other? (The technical jargon often speaks of variables being loaded on
factors, so we are looking for the variables most strongly loaded on Factor 1).
Are they all positively related to the factor (and so each other) or are some
negatively?
Can you see any reason why this subset of variables should be so strongly related
to each other? Never forget to interpret the results! What name would you give
this shared kind of variation?
Which are most heavily loaded on Factor 2 - i.e. form another group with a
different shared pattern of variation from that of Factor 1?
Again, can you see why these ones might be related to each other, and in a
different way from the group loaded on Factor 1? Can you 'interpret' this factor?
Are any of the variables more or less equally (un)related to both factors?
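SPSS's Varimax option implements Kaiser's varimax rotation. The sketch below is a textbook version of that algorithm applied to invented loadings, just to show two things: rotation never changes a variable's communality (its row sum of squared loadings), and after rotation each variable's largest loading identifies its group.

```python
import numpy as np

def varimax(loadings, tol=1e-6, max_iter=100):
    """Kaiser's varimax: orthogonally rotate loadings so each variable
    loads mainly on one factor (standard iterative SVD formulation)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p))
        rotation = u @ vt
        if s.sum() < d * (1 + tol):
            break
        d = s.sum()
    return loadings @ rotation

# Hypothetical unrotated loadings for 5 variables on 2 factors.
A = np.array([[0.7,  0.5],
              [0.8,  0.4],
              [0.6,  0.6],
              [0.5, -0.6],
              [0.4, -0.7]])
B = varimax(A)

# Rotation is orthogonal, so each variable's communality is unchanged:
print(np.allclose((A ** 2).sum(axis=1), (B ** 2).sum(axis=1)))   # True
# Reading the rotated table: the strongest |loading| per row gives the group.
print(np.abs(B).argmax(axis=1))
```

Here the first three variables end up on one factor and the last two on the other, which is exactly the 'simple structure' rotation aims for.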
6) Note that in all this the Factors come to be spoken of as entities in themselves. One can speak
of our original variables being 'correlated with factors' or 'loaded on factors', as if the Factors
were new variables themselves. In fact the factors are effectively new variables, each one
summarising most of the information in the group of variables which load heavily on it. In effect
the whole analysis enables us to talk about our data by reference to the limited number of new
factors chosen, rather than the five original variables, because we have shown that all five are not
really telling a different story - there are only two different patterns of real importance (hence the
reference to this as 'data reduction'). Remember that factor analysis works in such a way that the
factors it identifies are totally distinct. The correlation between them is exactly zero. And being
variables, the 18 cases in the study all have a score on each of the factors, just as on the original
variables. SPSS can be made to produce these if you need them.
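The zero correlation between factors can be checked by hand. The sketch below uses invented 18 x 5 data and takes scores as projections onto the principal axes of the correlation matrix (one simple way of getting component scores, not necessarily SPSS's exact method); the two columns of scores come out exactly uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(4)
base = rng.normal(size=(18, 2))
data = base[:, [0, 0, 0, 1, 1]] + rng.normal(scale=0.5, size=(18, 5))

# Standardize, then project onto the two largest principal axes of the
# correlation matrix to get each case's score on each 'supervariable'.
Z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=0)
R = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
scores = Z @ eigvecs[:, order[:2]]   # 18 cases x 2 factors

# The two sets of factor scores are exactly uncorrelated:
print(round(np.corrcoef(scores, rowvar=False)[0, 1], 10))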
If you had to give each Factor a name, in its capacity as summarising the set of
variables in its group, what would it be?
For linguists an analogy with deep and surface structure is informative. Just as one can show that
apparently different surface phenomena in a language all depend, say, on one single parameter
setting, so factor analysis often shows that what one thought of as separate variables are actually
scored high and low in similar ways by people, so that really one 'supervariable' underlies them.
Hence people speak of factor analysis as revealing the 'hidden structure' of data, the 'underlying
dimensionality' of a set of variables, etc. The downside can be where you do exploratory factor
analysis but cannot see any rhyme or reason in the grouping of variables that emerges associated
with different factors.
7) To reinforce the grouping of variables associated with the factors, a useful graph is the
Component/Factor Plot in Rotated Space. This is a special kind of scatterplot in which the
original variables appear as the 'cases' plotted in relation to the 'new' Factor variables which are
the axes. It works well with two factors, but if you had chosen three then it would not be so
simple.
Can you see how your two groups of variables appear visually on this graph, and
their relationship to the factors?
Which group looks more tight-knit? Why?
Note. If you had chosen just one factor as relevant, not two, you would not succeed in getting
SPSS to do any 'rotation', or produce that last graph. The information you need is in fact there in
the Component Matrix which you get if you choose Factor...Extraction...Unrotated Factor
Solution, since when one factor is picked, no rotation can improve on the analysis beyond that
which is arrived at in the first place.
Does the one factor solution produce a picture of the inter-relations between
variables that makes better or less 'sense' in applied linguistic terms than the two
factor solution?
As you see, there are quite a lot of choices based on investigator judgment in FA. There is no
associated significance test to guide you, e.g. to tell you which groupings of variables would be
likely to hold true in the population of cases you have sampled, I am afraid.
8) In this study it was the groupings of variables that were of interest. We did not get SPSS to
produce the scores of each person on each factor. However if we were interested in how
individual people scored on the elemental patterns reflected in the 'reduced' version of the data,
we would want to see these. Also if we wanted to use a reduced form of the data for some further
analysis we would need them.
Just for illustration, why not get SPSS to produce these scores and look at them. Go through the
commands as usual, leaving the settings as they were except that at the Factor Analysis box
click the Scores option and select Save as variables. You can of course ask for scores on
the assumption that there is only one factor of interest, or two etc. The score of each person on
each factor will appear as extra columns in the Data grid. To understand the nature of these
scores: click Analyze... Descriptives and get the means and SDs of all the variables (both the
original ones and the two new factors).
What do you notice about the means, SDs and ranges of score?
Remember the original variables may often be on all sorts of different scales, as in our example.
In order to look for common patterns of variation and represent these in summary form as the
new factors the statistical procedures standardize all the scores along the way so that they all
have the same average and spread (mean and SD) and all fall usually within the range +3 to -3.
They can then all be dealt with on the same basis when common patterns are sought. Remember
that all correlation methods like Pearson r and Factor Analysis work on similarities in high or
low scoring on a relative basis (i.e. for each person relative to the other scores on the same
variable, not other ones). You see the typical features of 'standardized scores' (also known as 'z
scores') in the columns of scores for the factors.
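The z-score transform behind this is simple enough to sketch (any numbers on any scale will do):

```python
import numpy as np

raw = np.array([12, 15, 9, 20, 14, 11, 17, 13], dtype=float)  # any scale at all

z = (raw - raw.mean()) / raw.std(ddof=0)   # classic z-score transform
print(round(z.mean(), 6))        # mean is forced to (essentially) 0
print(round(z.std(ddof=0), 6))   # SD is forced to 1
```

This is why, whatever the original questionnaire scales were, the saved factor score columns all look alike: centred on 0, spread of about 1, values mostly between -3 and +3.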
Also try getting the Pearson r correlation between the two columns of factor scores.
What is it? Why?

FA TASK 2. Another one like the last
Just to check that you have got the hang of FA, here is another data set for you to try analysing,
this time with less detailed support from me. Refer back to task 1 where you can't recall what to
do.
Gitsaki gathered data from 108 learners of English at school in Greece, aged 10-15. They
answered a questionnaire on how they are taught and learn vocabulary. The data is in git.sav with
columns already labelled. Among the questions were some related to dictionary use, as follows
(translated from the original Greek):
3) If you are looking for a word in the dictionary, you use:
a. bilingual dictionary 3--2--1--0
b. monolingual dictionary 3--2--1--0
c. both kinds of dictionary 3--2--1--0
4) How often do you use a dictionary to:
A. check:
a. the spelling of a word 3--2--1--0
b. the pronunciation of a word 3--2--1--0
c. the meaning of a word 3--2--1--0
d. the part of speech of a word (e.g. adjective, verb, noun etc.) 3--2--1--0
B. find: an example with this word 3--2--1--0
The 3--2--1--0 rating scale was glossed as 'always usually sometimes never'. As usual, you may
be able to think of some non-optimal aspects of these questions, and this form of data gathering
in general. But let us proceed to the results.
How many variables have we got information on from the above questions?
Do any of the variables suggest themselves as obvious EVs?
The research was exploratory, but we might have some common sense expectations about some
aspects of what these variables are measuring.
Which of the following hypotheses could FA provide an answer to, for learners of
the type represented?
a) They look up spelling more often than pronunciation
b) Bilingual dictionary users tend to look up spelling and meaning
but not seek examples
c) There is a relationship between use of monolingual dictionaries
and search for meaning information
d) There are fewer people of this sort who use monolingual
dictionaries than use bilingual ones
Can you think of others?
In fact two of the above hypotheses would require a different statistical test and associated
graphs to settle. FA and associated correlation measures would not help.
Which two?
What would that test be? We had it in task... which one?
Do you get anything on (b) and (c) above or do the emergent groups of variables
reflect something else?
PJS rev slightly 08
(To add: strung out and amalgamated applications)
