

Geographically Weighted Discriminant Analysis and the 2005 British General Election
Ron Johnston 1, Charles Pattie 2
1 School of Geographical Sciences, University of Bristol, Bristol UK. 2 Department of
Geography, University of Sheffield, Sheffield UK

                                 This paper has been
                               submitted for publication

                           NOT TO BE CITED WITHOUT
                           THE AUTHORS’ PERMISSION

       A response to a recent paper in this journal, identifying a substantial error in
       the empirical example used to illustrate GWDA and suggesting that it offers
       no improvement on standard linear discriminant analysis for that problem

Recent statistical developments have enabled substantial advances in the analysis of
spatial data, with novel methods being introduced – such as Geographically Weighted
Regression (GWR) – which allow spatial variations in relationships to be explored.
One such very recent innovation, building on GWR, is Brunsdon, Fotheringham and
Charlton’s (2007; henceforth BFC) paper on Geographically Weighted Discriminant
Analysis (GWDA).

While not disputing the potential value of this approach, this brief note focuses on the
empirical study deployed by BFC to illustrate GWDA’s applicability – the pattern of
seats won at the 2005 UK general election in England and Wales only. Two issues are raised:
    1. Does GWDA provide a better set of predictions than standard Linear
        Discriminant Analysis (LDA)?; and
    2. Is the particular approach adopted consistent with the underlying argument
        regarding GWDA?

Discriminant analysis and voting in England and Wales, 2005

To illustrate GWDA empirically, BFC use the results of the 2005 general election in
England and Wales. Six variables (% of economically active males unemployed; % of
the adult population with no qualifications; % of households in owner-occupied
properties; % population pensioners; % population non-white; and % of households
with a lone parent head), reduced to two principal components, were used to predict
the outcome in the 569 constituencies. According to their Table 1, the Conservative
party won 196 seats, Labour won 314, and a group comprising the Liberal Democrats,
Plaid Cymru and independents won 59: the actual result was – Conservative, 197;
Labour, 315; Liberal Democrat, 51; Plaid Cymru, 3; Independent, 3.
BFC report (p. 386) that using a ‘straightforward (nongeographically weighted)
discriminant [i.e. LDA] analysis to predict the party elected in each constituency’ they
predicted no seats at all for the Liberal Democrats (including Plaid Cymru and the
independents). This is somewhat surprising, and we were unable to replicate their
results in an LDA, using either their dataset (provided to us by Chris Brunsdon) or
another widely used by electoral studies specialists (available at http://ksghome. Instead, we correctly predicted 21 of the 57
seats won by parties other than the Conservatives and Labour, although fewer of the
seats won by those two parties (Table 1).
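
The workflow described above (six census variables reduced to two principal components, then a linear discriminant analysis on the component scores) can be sketched as follows. This is a minimal illustration on synthetic data, not the code or the dataset used by BFC or in this note: the group centres, the random seed and the helper names `pca_scores`, `lda_fit` and `lda_predict` are all assumptions made for the example. Only the group sizes (197, 315, 57) are taken from the actual result in Table 1.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the standardised columns of X onto the leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def lda_fit(X, y):
    """Estimate class means, priors and the pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([(y == c).mean() for c in classes])
    pooled = sum((X[y == c] - m).T @ (X[y == c] - m) for c, m in zip(classes, means))
    pooled = pooled / (len(y) - len(classes))
    return classes, means, priors, pooled

def lda_predict(X, model):
    """Assign each row to the class with the highest linear discriminant score."""
    classes, means, priors, pooled = model
    inv = np.linalg.inv(pooled)
    # Score for class k: x' S^-1 m_k - 0.5 m_k' S^-1 m_k + log(prior_k)
    scores = X @ inv @ means.T - 0.5 * np.sum(means @ inv * means, axis=1) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]

# Synthetic stand-in for the constituency data (hypothetical values throughout).
rng = np.random.default_rng(0)
idx = np.repeat([0, 1, 2], [197, 315, 57])
y = np.array(["C", "L", "LD+"])[idx]
centres = np.array([
    [-1.0, -1.0, 1.0, 1.0, -1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0, 1.0, 1.0],
    [0.0, 0.0, 0.5, 0.5, -0.5, 0.0],
])
X = centres[idx] + rng.normal(size=(569, 6))

scores = pca_scores(X, n_components=2)
pred = lda_predict(scores, lda_fit(scores, y))

# Seats correctly classified per group, the quantity compared in Table 1.
correct = {str(c): int(((pred == c) & (y == c)).sum()) for c in np.unique(y)}
print(correct)
```

Replacing `scores` with `X` in the last two calls gives the six-variable variant, and relabelling `y` with five groups instead of three gives the finer grouping.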

Not only does our application of LDA produce an outcome superior to BFC’s (at least
for the prediction of the geography of minor party victories) it also produces better
results (again, especially for the ‘third parties’) than BFC’s two applications of
GWDA to that data set – as Table 1 shows. Whereas they are only able to successfully
classify 13 and 10 of the 57 Liberal Democrat and other party seats using the adaptive
kernel and fixed kernel GWDA bandwidths respectively, using LDA we successfully
classified 21.
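
The comparison can be expressed as hit rates. The short sketch below copies the counts from Table 1 and, as the text does for the third-party column, reads each entry as the number of seats correctly classified for that party; the function name `hit_rates` is simply a label chosen for this example.

```python
# Counts copied from Table 1 (C = Conservative, L = Labour, LD+ = Liberal Democrats and others).
ACTUAL = {"C": 197, "L": 315, "LD+": 57}
CORRECT = {
    "LDA (BFC)": {"C": 150, "L": 290, "LD+": 0},
    "LDA (JP)": {"C": 131, "L": 236, "LD+": 21},
    "GWDA adaptive (BFC)": {"C": 160, "L": 297, "LD+": 13},
    "GWDA fixed (BFC)": {"C": 161, "L": 291, "LD+": 10},
}

def hit_rates(correct, actual):
    """Share of each party's seats classified correctly, plus the overall share."""
    rates = {party: correct[party] / actual[party] for party in actual}
    rates["all"] = sum(correct.values()) / sum(actual.values())
    return rates

for method, counts in CORRECT.items():
    rates = hit_rates(counts, ACTUAL)
    print(f"{method:22s}" + "  ".join(f"{k}: {v:.1%}" for k, v in rates.items()))
```

On these figures the LDA reported here recovers about 37% of the third-party seats (21 of 57), against about 23% and 18% for the adaptive and fixed kernel GWDAs, while the GWDAs classify a larger share of all 569 seats.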

Given that our LDA application is ‘correct’, then the validity of GWDA – for this
application at least – is open to considerable doubt. Why? One reason, we suggest, is
the approach taken. BFC argue that the advantage of GWDA over LDA is that the
former allows the relationships among the discriminating variables to vary over space
– as is the case with GWR. So why reduce the six variables to two principal
components which are, by their very nature, orthogonal and so – unless very narrow
bandwidths are to be deployed – necessarily invariant in their relationship over space?
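
The orthogonality referred to here is easy to check numerically. The sketch below uses six correlated synthetic variables (hypothetical values, not the election data) and shows that the scores on the first two principal components are uncorrelated over the full set of cases, whatever the correlations among the original variables.

```python
import numpy as np

rng = np.random.default_rng(1)

# Six correlated synthetic variables for 569 hypothetical areas.
latent = rng.normal(size=(569, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.5 * rng.normal(size=(569, 6))

# Principal component scores from the standardised variables.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt[:2].T

# Largest absolute correlation among the original variables (off-diagonal).
corr_X = np.corrcoef(X, rowvar=False)
max_original = np.abs(corr_X - np.eye(6)).max()

# Correlation between the two component scores over all cases.
global_r = np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]

print(f"largest correlation among original variables: {max_original:.2f}")
print(f"correlation between component scores: {global_r:.2e}")
```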

We addressed this by running four separate LDAs distinguished by the number of
separate groups to be identified and whether the original six variables or the two
principal components were used as the discriminating variables. The former
separation was undertaken because the third group in BFC’s analyses – Liberal
Democrats, Plaid Cymru and Independents – forms a chaotic conception; no strong
substantive argument can be made for grouping them together and expecting them to
be similar on the discriminating variables. In particular the three seats won by
Independents have nothing in common: Bethnal Green & Bow was won by the
Respect Party on a largely anti-Iraq War campaign against an incumbent Labour MP
who had voted for the invasion; Wyre Forest was won by a local doctor standing as an
independent who was first elected in 2001 on a campaign focused on retaining
particular facilities at a hospital in the constituency; and the Independent Labour
candidate who won in Blaenau Gwent did so against the Labour party’s candidate
selected through the central party’s policy regarding all-women shortlists in certain

Whichever approach is taken, the results summarised in Table 2 indicate that the
LDAs were superior to GWDA in predicting Liberal Democrat successes (as shown in
Table 1).

By incorporating the three Plaid Cymru won seats plus those won by independents,
BFC are creating difficulties for any discriminant analysis, the independents because
the three cases are so singular in their characteristics and Plaid Cymru because no
variable (such as percentage speaking the Welsh language) is included which could
discriminate their seats from the rest. If we exclude those six constituencies, then we
get the results shown in the second block of data in Table 2; we continue to
outperform GWDA.

The reason why the LDAs outperform the GWDAs reported by BFC can readily be
appreciated by a comparison of the main maps in BFC’s paper – showing the actual
election result (Figure 3), the predicted result using LDA (Figure 4), the predicted
result using fixed bandwidth GWDA (Figure 8), and the predicted result using
adaptive bandwidth GWDA (Figure 9). Although the country can readily be divided
into blocks according to which of the two main parties is likely to win there, LDA
(according to BFC but not our results reported here) cannot identify the block of seats
won by Liberal Democrats and others in far southwest England and in west Wales.
The GWDAs do partially identify the latter (although only three of the eight
constituencies won by neither the Conservatives nor Labour in Wales), but not most
of the isolated seats/small groups of constituencies won by the Liberal Democrats and
others elsewhere (e.g. southwest London and central southern England). In addition,
the GWDAs divide the country into more cohesive blocks of Conservative and
Labour territory than is actually the case.

A further reason for the relative failure of GWDA with regard to the Liberal
Democrats is that although it explores whether the relationships among the variables
vary across different parts of the country it does not also investigate whether the
differences between Liberal Democrat and Conservative and Labour seats vary: the
relationships are the same, but the allocation is not. Many of the seats won by the
Liberal Democrats are in areas where the Conservatives dominate (mainly in the south
and west of England, but also in some suburban areas), and it is very likely that the
Conservatives would win them if there were not a strong Liberal Democrat
performance. But increasingly over recent elections the Liberal Democrats have also
been challenging very strongly in some Labour strongholds – in 2005, for example,
against government policies on the Iraq War and charging top-up fees to University
students. Those constituencies are very different in their population characteristics
from the first group. It is very unlikely indeed that either an LDA or a GWDA could
separately identify both blocks; instead, they are most likely to identify only the
larger of the two. Because of this, no LDA or GWDA can predict most of the Liberal
Democrat seats; as a consequence, because of the local weighting incorporated, the
latter predicts many more of the Conservative and Labour seats than the former – but
also mis-allocates most of the Liberal Democrat seats to those categories as well.
GWDA gets the big picture right, but cannot identify the residuals within it. BFC
claim, in the abstract to their paper, to have shown that ‘similar social conditions can
lead to different voting outcomes in different parts of England and Wales’: this
discussion shows that in fact they have not, that GWDA has not captured that (well-
known) situation, notably with regard to the very different sources of support for the
Liberal Democrats.
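
The difficulty can be illustrated with a toy example in which a small third group is split into two blocks, one resembling each of the two large groups. Everything in the sketch is an assumption made for illustration: the two synthetic variables, the block centres, and the 40/17 split of the third group are invented, and only the group totals (197, 315, 57) echo Table 1. A linear discriminant rule is fitted by hand, using the pooled within-group covariance and the observed group shares as priors.

```python
import numpy as np

rng = np.random.default_rng(2)

def block(centre, n):
    """Draw n hypothetical constituencies around a centre on two synthetic variables."""
    return rng.normal(loc=centre, scale=1.0, size=(n, 2))

X = np.vstack([
    block((-2.0, 0.0), 197),  # group 0: first main party
    block((2.0, 0.0), 315),   # group 1: second main party
    block((-1.5, 0.5), 40),   # group 2, block resembling group 0
    block((1.5, 0.5), 17),    # group 2, block resembling group 1
])
y = np.repeat([0, 1, 2, 2], [197, 315, 40, 17])

# Linear discriminant analysis with a pooled covariance matrix.
classes = np.unique(y)
means = np.array([X[y == c].mean(axis=0) for c in classes])
priors = np.array([(y == c).mean() for c in classes])
pooled = sum((X[y == c] - m).T @ (X[y == c] - m) for c, m in zip(classes, means))
pooled = pooled / (len(y) - len(classes))
inv = np.linalg.inv(pooled)
scores = X @ inv @ means.T - 0.5 * np.sum(means @ inv * means, axis=1) + np.log(priors)
pred = classes[np.argmax(scores, axis=1)]

# Share of each group's cases that the rule assigns to the right group.
recall = {int(c): float((pred[y == c] == c).mean()) for c in classes}
print(recall)
```

In this setup the mean of the split group falls between the two main groups, so most of its cases are allocated to whichever main group they resemble, which is the pattern of mis-allocation described above.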


Is GWDA a necessary sophistication in order to improve the predictive capability of a
discriminant analysis in this example? If support for political parties is spatially
clustered, over and above what one would anticipate from knowledge of the
constituencies’ characteristics, then perhaps reflecting that clustering by inclusion of
regional variables would do the job as well. But the results of the analyses reported
here suggest that is unnecessary; because BFC’s LDAs appear to have been wrongly
conducted, we have found no evidence at all that GWDA is superior to LDA for their
chosen empirical task – estimating the outcome of the 2005 British general election in
England and Wales, in particular for the smallest of the three main parties.

This conclusion, of course, does not negate the potential use of GWDA in other
contexts, where there is a prima facie case for arguing that the relationships among
the discriminating variables vary over space. But this is not the case with British
elections – and in any case would not be if the potential for such variation were
denied (or at least very substantially reduced), as BFC did, by reducing the number of
variables through orthogonalisation.

The new generation of local spatial statistical approaches has very considerable
potential to advance our appreciation of a wide range of geographical patterns – but
only if they are applied to problems where the key argument (spatially-varying
relationships) is valid. Sadly this was not the case with BFC’s example for GWDA,
hence its potential awaits further elucidation.


Brunsdon, C., S. Fotheringham, and M. Charlton (2007) “Geographically Weighted
      Discriminant Analysis”. Geographical Analysis 39, 376-396.
Table 1. Predicted outcome (number of seats won) of the 2005 general election in
England and Wales, using DA and GWDA with two principal components

                                     Seats won by
                           C                L              LD+
Using DA
BFC                      150              290                 0
JP                       131              236                21
Using GWDA
BFC (Adaptive kernel)    160              297                13
BFC (Fixed kernel)       161              291                10
ACTUAL                   197              315                57

Key: C – Conservative; L – Labour; LD+ Liberal Democrats and others.
BFC – Brunsdon, Fotheringham and Charlton; JP – Johnston and Pattie; LDA –
Linear Discriminant Analysis; GWDA – Geographically Weighted Discriminant Analysis.
The BFC results are taken from their Tables 4-6, p. 393.
Table 2. Results of various LDA analyses

                                               Seats won by
                                C          L   LD+      LD      PC        I
All constituencies
6 variables – 3 groups       146      240       28
              5 groups       140      178               21       3        1
2 components – 3 groups      131      236       21
                 5 groups    124       77               11       3        2
Constituencies won by Conservative, Labour and Liberal Democrat only
6 variables                  147      245       30
2 components                 143      227       14

ACTUAL                        197     315        57     51       3        3

Key: C – Conservative; L – Labour; LD+ Liberal Democrats and others; LD – Liberal
Democrats; PC – Plaid Cymru; I – Independents.
