

Assessment of map similarity of categorical maps using kappa statistics
The Case of Sado Estuary


In the past thirty years GIS technology has progressed from computer mapping to spatial database management and, more recently, to quantitative map analysis and modeling. However, most applications still rely on visual analysis to determine similarity within and among maps. The aim of this study is to compare management areas of the Sado estuary (categorical maps) computed with three different interpolation methods. Different kappa statistics and visual map overlays were used for map comparison. The confusion matrix was used to calculate the Kappa coefficients and to assess agreement between the three interpolation methods. These map comparison techniques help to confirm that one of the methods brings no gain in precision for the delineation of homogenous areas, and help to find the main sources of difference between the maps.
Keywords: Comparison Methods, Assessment of Map Similarity, Kappa Statistics

INTRODUCTION

In many GIS applications, environmental ones in particular, comparing categorical maps or detecting differences between them is an essential task. A comparison procedure based on a more reliable and robust approach can markedly improve the ability to detect map change. Map comparison procedures can express the similarity between two maps by looking at simple proportions of areas or by measuring it numerically. This numerical similarity can be assessed by representing the overlay result as a contingency table and statistically analysing the latter with various integral measures of association, loglinear models, among others (9). The result of a map comparison can be an overall value for similarity (e.g. a value between 0 and 1) or a map in its own right, meaning that the result of comparing two maps is a third map which indicates, per location, how strong the similarity is (5). In many situations it is preferable to express the level of agreement in a single number. When the comparison consists of a number of pairwise comparisons, the kappa statistic can be a suitable approach (2). The Kappa index of agreement for categorical data was developed by Cohen (3); it was first used in the context of psychology and psychiatric diagnosis, and was subsequently adopted by the remote sensing community as a useful measure of classification accuracy. The aim of this study is to present some new variants of the Kappa statistic introduced by Pontius (8) and Hagen (5) and to use them to compare three maps. These maps represent different methods of delineating environmental management areas of the Sado Estuary.

METHODS

In order to divide the Sado Estuary into homogenous areas for future environmental management of this ecosystem, multivariate geostatistical techniques were used.
Three maps of final homogenous/management areas were computed from three sediment characterization indicators, using: 1) cluster analysis of dissimilarity matrix function of geographical separation followed by indicator kriging of the cluster data, 2) discriminant analysis of kriged values of the three sediment attributes, 3) combination of methods 1 and 2 (fig. 1) (1).

[Figure 1: three maps - Method 1, Method 2, Method 3]

Figure 1 – Maps representing the 3 methodologies for Sado estuary management areas delineation.

The aim of a pairwise post-classification comparison is to identify areas of categorical disagreement between two maps by determining the pixels with a difference in theme. For that purpose the maps were overlaid on a pixel-by-pixel basis to produce a map and an attribute table of site-specific differences, using simple map algebra operations in the "map calculator" and reclassification within ArcView®.
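The pixel-by-pixel overlay described above can be sketched in a few lines of code. This is not the ArcView "map calculator" itself, only an illustration of the same map-algebra operation; the two small category grids and the function name are hypothetical:

```python
# Binary comparison of two categorical rasters: 1 where the maps agree,
# 0 where they differ (the equivalent of the "map calculator" overlay).
def binary_agreement(map_a, map_b):
    return [[1 if a == b else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(map_a, map_b)]

# Hypothetical 2 x 3 grids of category labels (not the Sado estuary data)
map_a = [[1, 1, 2],
         [2, 3, 3]]
map_b = [[1, 2, 2],
         [2, 3, 1]]

agreement = binary_agreement(map_a, map_b)
# agreement -> [[1, 0, 1], [1, 1, 0]]
```

Reclassifying the resulting 0/1 grid is what produces the binary comparison maps of Figure 2.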
To express the level of agreement of the three maps in a single number, Kappa statistics were used, based upon the so-called contingency table (or confusion matrix) - Table 1. This table details how the distribution of categories in map A differs from map B: piT is the proportion of cells of category i in map A, pTi is the proportion of cells of category i in map B, and pij is the proportion of cells of category i of map A in category j of map B.

Table 1 - The contingency table (adapted from (7)).

                           Map A categories
                     1     2    ...   i    ...   c    Total
  Map B        1    p11   p21        pi1        pc1    pT1
  categories   2    p12   p22        pi2        pc2    pT2
              ...
               j    p1j   p2j        pij        pcj    pTj
              ...
               c    p1c   p2c        pic        pcc    pTc
             Total  p1T   p2T        piT        pcT     1
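The proportions of Table 1 can be computed directly from two co-registered lists of cell categories. A minimal sketch; the function name and the cell lists are hypothetical, not the estuary data:

```python
from collections import Counter

def contingency(cells_a, cells_b, categories):
    """Proportions p_ij plus the marginals p_iT (map A) and p_Ti (map B)."""
    n = len(cells_a)
    counts = Counter(zip(cells_a, cells_b))
    p = {(i, j): counts[(i, j)] / n for i in categories for j in categories}
    p_iT = {i: sum(p[(i, j)] for j in categories) for i in categories}
    p_Ti = {j: sum(p[(i, j)] for i in categories) for j in categories}
    return p, p_iT, p_Ti

# Hypothetical flattened rasters with categories 1 and 2
p, p_iT, p_Ti = contingency([1, 1, 1, 2], [1, 1, 2, 2], [1, 2])
# p[(1, 1)] = 0.5, p_iT[1] = 0.75, p_Ti[1] = 0.5
```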

Three statistics derived from the contingency table were used (4). P(A) stands for the Fraction of Agreement and is calculated according to eq. 1:

P(A) = ∑ pii , with the sum over i = 1, ..., c   (eq. 1)

P(E) stands for the Expected Fraction of Agreement subject to the observed distribution, and is calculated according to eq. 2:

P(E) = ∑ piT ∗ pTi , with the sum over i = 1, ..., c   (eq. 2)

P(max) stands for the Maximum Fraction of Agreement subject to the observed distribution, that is, the maximum agreement that could be attained if the locations of the cells in one of the maps were rearranged; it is calculated according to eq. 3:

P(max) = ∑ min(piT, pTi) , with the sum over i = 1, ..., c   (eq. 3)

These statistics were then used for the Kappa calculation, defined according to eq. 4 (3):

Kappa = (P(A) − P(E)) / (1 − P(E))   (eq. 4)

Kappa is the proportion of agreement P(A) after chance agreement P(E) has been removed. If Kappa = 1, there is perfect agreement; if Kappa = 0, the agreement is the same as would be expected by randomly arranging the cells. The stronger the agreement, the higher the value of Kappa. Negative values occur when agreement is weaker than expected by chance, but this rarely happens (Table 2).

Table 2 – Strength of agreement of map comparisons according to Kappa values (6).

  Kappa value      Strength of agreement
  < 0.00           poor
  0.00 – 0.20      slight
  0.21 – 0.40      fair
  0.41 – 0.60      moderate
  0.61 – 0.80      substantial
  0.81 – 1.00      almost perfect
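Equations 1-4 translate directly into code. A minimal sketch using a hypothetical 2-category contingency table (the proportions below are illustrative, not the estuary maps):

```python
categories = [1, 2]
# Hypothetical proportions p_ij and marginals p_iT (map A), p_Ti (map B)
p = {(1, 1): 0.5, (1, 2): 0.25, (2, 1): 0.0, (2, 2): 0.25}
p_iT = {1: 0.75, 2: 0.25}
p_Ti = {1: 0.5, 2: 0.5}

PA = sum(p[(i, i)] for i in categories)                # eq. 1
PE = sum(p_iT[i] * p_Ti[i] for i in categories)        # eq. 2
Pmax = sum(min(p_iT[i], p_Ti[i]) for i in categories)  # eq. 3
kappa = (PA - PE) / (1 - PE)                           # eq. 4
```

For these numbers P(A) = 0.75, P(E) = 0.5, P(max) = 0.75 and Kappa = 0.5, which the scale of Table 2 would read as moderate agreement.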

The reason to apply Kappa is that the total number of cells taken in by the individual categories can explain part of the cell-by-cell agreement between two maps. Notwithstanding the usefulness of Kappa, Pontius (8) points out that the Kappa statistic confounds quantification error with location error, and introduces two statistics that separately consider similarity of location and similarity of quantity.
Klocation compares the actual success rate to the expected success rate, relative to the maximum success rate attainable given that the total number of cells of each category does not change. The maximum success rate is calculated according to eq. 3 and Klocation according to eq. 5. The maximum value for Klocation is 1; there is no minimum value. Its advantage over Kappa is that Klocation is independent of the total number of cells in each category.

Kloc = (P(A) − P(E)) / (P(max) − P(E))   (eq. 5)
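Eq. 5 is a one-liner once the three fractions of agreement are known. A sketch with hypothetical values P(A) = 0.75, P(E) = 0.5, P(max) = 0.75 (illustrative only, not the estuary maps):

```python
# Hypothetical fractions of agreement
PA, PE, Pmax = 0.75, 0.5, 0.75

Kloc = (PA - PE) / (Pmax - PE)   # eq. 5
# Here Kloc = 1.0: given the category totals, the cell locations agree
# as well as they possibly can; all remaining disagreement is quantity.
```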

Kquantity is a statistic for disagreement due to quantitative differences. It is more complex than its location counterpart because it is not possible to change the quantities of certain categories without also changing locations. It is necessary to correct both for random success and for success due to good location specification (eq. 6).


Kquantity = (P(A) − M(n)) / (M(p) − M(n))   (eq. 6)

where M(n) = (1 − Kloc)/c + Kloc ∗ ∑ min(1/c, piT) is the expected agreement given the observed location skill Kloc but no quantity information (each of the c categories taken to occupy a fraction 1/c of the map), and M(p) = (1 − Kloc) ∗ ∑ piT² + Kloc is the corresponding agreement with perfect quantity information; both sums run over i = 1, ..., c.

The comparison is not symmetrical: comparing map A to map B yields different results than comparing map B to map A. This means that one map has to be designated as the 'real' or template map; the other map is the 'model' or comparison map. After experimenting with the statistics introduced by Pontius, it is recommended not to use the Kquantity statistic, for the following three reasons (5):
1. The statistic is incomprehensible; it is not possible to give a reasonable explanation of what the formula signifies. For example, it is not clear why the formulation of Kquantity involves Klocation, while the objective is to find a measure of similarity that does not depend on the spatial arrangement.



2. The range of values for the statistic is not the usual Kappa range between −1 and 1; Kquantity can be larger than 1 in cases where Klocation is low and the best overall agreement does not coincide with identical quantitative distributions of the two maps.
3. The statistic is not stable; a minor change in the maps can lead to a major change in the statistic. This problem arises in situations where the denominator has a value close to 0.

Hagen (5) proposes an alternative expression for quantitative similarity, based on the maximal similarity that can be found given the total number of cells taken in by each category. This has already been calculated as P(max). P(max) can be put in the context of Kappa and Klocation by scaling it to P(E). The resulting statistic is called Khisto, because it can be calculated directly from the histograms of the two maps. Khisto is defined by eq. 7.

Khisto = (P(max) − P(E)) / (1 − P(E))   (eq. 7)

The definition of Khisto has the important property that Kappa is now defined as the product of Klocation and Khisto (eq. 8). Klocation is a measure of the similarity of the spatial allocation of categories of the two compared maps, and Khisto is a measure of their quantitative similarity.

Kappa = Khisto ∗ Klocation   (eq. 8)
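The decomposition of eq. 8 can be checked numerically. A sketch with the same hypothetical fractions of agreement used above (illustrative values, not the estuary maps):

```python
# Hypothetical fractions of agreement
PA, PE, Pmax = 0.75, 0.5, 0.75

kappa = (PA - PE) / (1 - PE)       # eq. 4
Kloc = (PA - PE) / (Pmax - PE)     # eq. 5
Khisto = (Pmax - PE) / (1 - PE)    # eq. 7

assert abs(kappa - Khisto * Kloc) < 1e-12   # eq. 8 holds
```

The identity follows algebraically: the factor P(max) − P(E) cancels when the two fractions are multiplied.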

RESULTS AND DISCUSSION

Figures 2 and 3 show the visual map comparison for methods 1, 2 and 3 using, respectively, a binary classification which states for each cell whether or not the maps are identical at that location, and class differences.

[Figure 2: three binary comparison maps - Method 1 - 2, Method 1 - 3, Method 2 - 3]
Figure 2 – Map comparison for methods 1, 2 and 3 using Binary classification.




[Figure 3: three class-difference maps (Method 1 - 2, Method 1 - 3, Method 2 - 3), each with a bar graph of the proportion of cells in each difference class]
Figure 3 - Map comparison for methods 1, 2 and 3 using class differences and graphs with proportions of cells for each class differences.

The results of the three different Kappa calculations are presented in Table 3. Analysis of the Kappa values and of Figures 2 and 3 for the three map comparisons shows an almost perfect agreement (according to (6), see Table 2) between maps 2 and 3, confirmed not only for quantity but also for location similarity. This result was expected, since method 3 is a refinement of the discriminant analysis applied in method 2, using the indicator kriging probabilities developed in method 1 (1). Method 3 is moderately similar to method 1 (Kappa = 0.55) because, although these maps are computed using different multivariate geostatistics, method 3 uses results from method 1. Maps 1 and 2 have the lowest strength of agreement (Kappa = 0.42), since their homogenous areas were computed using independent interpolation techniques. Looking at the Kloc value for maps 1-2 (0.51), the differences between these two maps should be due more to spatial location than to quantitative dissimilarities (see Table 3). The comparison of maps 1-2 in Figures 2 and 3 also confirms this location difference, visible in the smaller areas of identical classification classes. This major location difference also holds for the maps 1-3 comparison, since the Klocation value is further from its maximum than Khisto. In contrast, the small difference between maps 2 and 3 should be due to category quantity values, since the Kloc value is very near maximum similarity. Nevertheless, all Klocation values show agreement substantially greater than the agreement expected by chance (8). This means that although the three methods of delineating homogenous areas were computed with different statistical techniques, their results are not completely different.

Table 3 – The Kappa, Kloc and Khisto results for the 3 map comparisons.

  Maps   Kappa (−1 < K < 1)   Kloc (max = 1)   Khisto (max = 1)
  1-2          0.42                0.51              0.83
  1-3          0.55                0.63              0.87
  2-3          0.85                0.95              0.89
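As a consistency check, the reported values satisfy the decomposition Kappa = Khisto ∗ Klocation (eq. 8) up to the two-decimal rounding used in Table 3:

```python
# (Kappa, Kloc, Khisto) for each map pair, as reported in Table 3
rows = {"1-2": (0.42, 0.51, 0.83),
        "1-3": (0.55, 0.63, 0.87),
        "2-3": (0.85, 0.95, 0.89)}

for pair, (kappa, kloc, khisto) in rows.items():
    # eq. 8 should hold within rounding error
    assert abs(kappa - kloc * khisto) < 0.01
```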

Despite the good information Kappa statistics provide, contingency tables reduce the overlaid maps to a summary by categories, thus losing information about neighborhood, directional and distance relationships, and map pattern (9). It is also a cell-by-cell comparison in which cells are either identical or non-identical; there are no intermediate similarities. Moreover, although Kloc gives an indication of the similarity of the spatial distribution of categories, the statistic does not distinguish between a category that is dislocated over the distance of one cell and one that is dislocated over the whole map (5). Nevertheless, Kappa statistics and their variants give a quick and simple indication of the level of agreement between two maps, and guidelines on the source and magnitude of the differences between them.

CONCLUSION

In this work the advantages of using the Kappa statistics and their new variants to compare maps were demonstrated. The similarity was analyzed not only in terms of location but also in terms of quantity. The Kappa statistics and visual overlay map comparison helped to confirm that there is no gain of precision in using method 3 for homogenous area delineation, and helped to find the sources of difference between the maps. In future developments, fuzzy set theory and fuzzy Kappa statistics will also be used for map comparison. This approach takes both proximity relations and categorical dependencies into account while assessing the similarity between two maps (4, 5).

ACKNOWLEDGMENT

The authors would like to thank Prof. Gill Pontius, Jr. from Clark University, USA, for his advice on the Kappa calculations and the discussion of results. The research was financed by the Portuguese Science and Technology Foundation (Research Project
POCTI/BSE 35137/99).


REFERENCES

1. Caeiro, S., Goovaerts, P., Painho, M. and Costa, M. H. Delineation of estuarine management areas using multivariate geostatistics: the case of Sado estuary. Submitted to Journal of Environmental Science and Technology.
2. Carletta, J. Assessing agreement on classification tasks: the kappa statistic. Computational Linguistics, 1996, 22, 249-254.
3. Cohen, J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 1960, 20, 37-46.
4. Hagen, A. Multi-method assessment of map similarity. Proceedings of the 5th AGILE Conference on Geographic Information Science, 25-27 April, Spain, 2002, 171-182.
5. Hagen, A. Technical Report: Comparison of maps containing nominal data. National Institute for Public Health and the Environment, Maastricht, 2002.
6. Landis, J. R. and Koch, G. G. The measurement of observer agreement for categorical data. Biometrics, 1977, 33, 159-174.
7. Monserud, R. A. and Leemans, R. Comparing global vegetation maps with the kappa statistic. Ecological Modelling, 1992, 62, 275-293.
8. Pontius Jr., R. G. Quantification error versus location error in comparison of categorical maps. Photogrammetric Engineering & Remote Sensing, 2000, 66, 1011-1016.
9. Zaslavsky, I. Analysis of association between categorical maps in multi-layer GIS. Proceedings of GIS/LIS '95, 14-16 November 1995, vol. 2, 1066-1074.
1) ISEGI – Instituto Superior de Estatística e Gestão de Informação, Campus de Campolide, 1070-312 Lisboa. Tel: +351 21 387 04 13, Fax: +351 21 387 21 40

2) Universidade Aberta, Departamento de Ciências Exactas e Tecnológicas, R. Escola Politécnica, nº 147, 1269-001 Lisboa. Tel: +351 21 391 63 00, Fax: +351 21 396 92 93
