Classification or Cluster Analysis
Discrimination and Classification

Discrimination
Situation:
We have two or more populations p1, p2, etc.
(possibly p-variate normal).
The populations are known (or we have data
from each population).
We have data for a new case (population
unknown) and we want to identify the
population to which the new case belongs.
              The Basic Problem
Suppose that the data from a new case x1, … , xp
has joint density function either:
      p1: g(x1, … , xp) or
      p2: h(x1, … , xp)
We want to make the decision
   D1: Classify the case in p1 (g is the
       correct distribution) or
   D2: Classify the case in p2 (h is the
       correct distribution)
             The Two Types of Errors

1.  Misclassifying the case in p1 when it actually lies
    in p2.
Let P[1|2] = P[D1|p2] = probability of this type of error


2.  Misclassifying the case in p2 when it actually lies
    in p1.
Let P[2|1] = P[D2|p1] = probability of this type of error

 This is similar to Type I and Type II errors in hypothesis
 testing.
Note:
A discrimination scheme is defined by splitting p-dimensional
space into two regions.

1.    C1 = the region where we make the decision D1
      (the decision to classify the case in p1).

2.    C2 = the region where we make the decision D2
      (the decision to classify the case in p2).
There are several approaches to determining the
regions C1 and C2. All are concerned with taking into
account the probabilities of misclassification P[2|1] and
P[1|2].


1.   Set up the regions C1 and C2 so that one of the
     probabilities of misclassification, P[2|1] say, is at
     some low acceptable value α. Accept the resulting
     level of the other probability of misclassification,
     P[1|2] = β.
2.   Set up the regions C1 and C2 so that the total
     probability of misclassification:

      P[Misclassification] = P[1] P[2|1] + P[2]P[1|2]
      is minimized

      P[1] = P[the case belongs to p1]

      P[2] = P[the case belongs to p2]
3.   Set up the regions C1 and C2 so that the total
     expected cost of misclassification:
     E[Cost of Misclassification] = ECM
       = c2|1P[1] P[2|1] + c1|2 P[2]P[1|2]
     is minimized
       P[1] = P[the case belongs to p1]
       P[2] = P[the case belongs to p2]
       c2|1= the cost of misclassifying the case in p2
             when the case belongs to p1.
        c1|2= the cost of misclassifying the case in p1
              when the case belongs to p2.
        The Optimal Classification Rule
         The Neyman-Pearson Lemma
Suppose that the data x1, … , xp has joint density
function
      f(x1, … , xp; θ)
where θ is either θ1 or θ2.
Let
      g(x1, … , xp) = f(x1, … , xp; θ1) and
      h(x1, … , xp) = f(x1, … , xp; θ2)

We want to make the decision
     D1: θ = θ1 (g is the correct distribution) against
     D2: θ = θ2 (h is the correct distribution)
then the optimal regions (minimizing ECM, the expected
cost of misclassification) for making the decisions D1
and D2 respectively are C1 and C2:

           
      C1 = { (x1, … , xp) : L(θ1)/L(θ2) = g(x1, … , xp)/h(x1, … , xp) ≥ k }

and

      C2 = { (x1, … , xp) : L(θ1)/L(θ2) = g(x1, … , xp)/h(x1, … , xp) < k }

where
      k = ( c1|2 P[2] ) / ( c2|1 P[1] )
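To make the rule concrete, here is a minimal Python sketch of the minimum-ECM classifier (not part of the original notes). The densities g and h, the prior probabilities P[1] and P[2], and the costs c2|1 and c1|2 are all supplied by the caller; the univariate normal densities at the end are purely illustrative.

```python
import numpy as np
from scipy.stats import norm

def classify_min_ecm(x, g, h, p1, p2, c_2_1, c_1_2):
    """Minimum-ECM rule: decide D1 (classify in p1) when the likelihood
    ratio g(x)/h(x) is at least k = (c1|2 P[2]) / (c2|1 P[1])."""
    k = (c_1_2 * p2) / (c_2_1 * p1)
    return "D1 (p1)" if g(x) / h(x) >= k else "D2 (p2)"

# Illustrative univariate-normal densities for the two populations.
g = norm(loc=0.0, scale=1.0).pdf     # density under p1
h = norm(loc=2.0, scale=1.0).pdf     # density under p2

# Equal priors and equal costs give k = 1 (classify by the larger density).
print(classify_min_ecm(0.4, g, h, p1=0.5, p2=0.5, c_2_1=1.0, c_1_2=1.0))
```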
Proof:    ECM = E[Cost of Misclassification]
            = c2|1 P[1] P[2|1] + c1|2 P[2] P[1|2]

      P[1|2] = ∫C1 h(x1, … , xp) dx1 ⋯ dxp

      P[2|1] = ∫C2 g(x1, … , xp) dx1 ⋯ dxp
             = 1 − ∫C1 g(x1, … , xp) dx1 ⋯ dxp

Hence

      ECM = c2|1 P[1] [ 1 − ∫C1 g(x1, … , xp) dx1 ⋯ dxp ]
              + c1|2 P[2] ∫C1 h(x1, … , xp) dx1 ⋯ dxp

Therefore

      ECM = ∫C1 [ c1|2 P[2] h(x1, … , xp) − c2|1 P[1] g(x1, … , xp) ] dx1 ⋯ dxp
              + c2|1 P[1]

Thus ECM is minimized if C1 contains all of the points
(x1, … , xp) such that the integrand is negative:

      c1|2 P[2] h(x1, … , xp) − c2|1 P[1] g(x1, … , xp) < 0

that is,

      g(x1, … , xp) / h(x1, … , xp) > ( c1|2 P[2] ) / ( c2|1 P[1] ) = k
          The Neyman-Pearson Lemma
                 (another proof)
Suppose that the data x1, … , xn has joint density
function
      f(x1, … , xn; θ)
where θ is either θ1 or θ2.
Let
      g(x1, … , xn) = f(x1, … , xn; θ1) and
      h(x1, … , xn) = f(x1, … , xn; θ2)

We want to test
     H0: θ = θ1 (g is the correct distribution) against
     HA: θ = θ2 (h is the correct distribution)
The Neyman-Pearson Lemma states that the Uniformly
Most Powerful (UMP) test of size α is to reject H0 if:

      L(θ2)/L(θ1) = h(x1, … , xn)/g(x1, … , xn) ≥ kα

and accept H0 if:

      L(θ2)/L(θ1) = h(x1, … , xn)/g(x1, … , xn) < kα

where kα is chosen so that the test is of size α.
Proof: Let C be the critical region of any test of size α.
Let
      C* = { (x1, … , xn) : h(x1, … , xn)/g(x1, … , xn) ≥ kα }

Then
      ∫C* g(x1, … , xn) dx1 ⋯ dxn = ∫C g(x1, … , xn) dx1 ⋯ dxn = α

We want to show that
      ∫C* h(x1, … , xn) dx1 ⋯ dxn ≥ ∫C h(x1, … , xn) dx1 ⋯ dxn

Note (writing C̄ for the complement of C):
      C* = (C* ∩ C) ∪ (C* ∩ C̄)   and   C = (C* ∩ C) ∪ (C̄* ∩ C)

hence
      ∫C* g dx1 ⋯ dxn = ∫C*∩C g dx1 ⋯ dxn + ∫C*∩C̄ g dx1 ⋯ dxn = α

and
      ∫C g dx1 ⋯ dxn = ∫C*∩C g dx1 ⋯ dxn + ∫C̄*∩C g dx1 ⋯ dxn = α

Thus
      ∫C*∩C̄ g dx1 ⋯ dxn = ∫C̄*∩C g dx1 ⋯ dxn

Also
      ∫C*∩C̄ g dx1 ⋯ dxn ≤ (1/kα) ∫C*∩C̄ h dx1 ⋯ dxn

since g(x1, … , xn) ≤ (1/kα) h(x1, … , xn) in C*,

and
      ∫C̄*∩C g dx1 ⋯ dxn ≥ (1/kα) ∫C̄*∩C h dx1 ⋯ dxn

since g(x1, … , xn) > (1/kα) h(x1, … , xn) in C̄*.

Combining these with the equality above,

      (1/kα) ∫C*∩C̄ h dx1 ⋯ dxn ≥ ∫C*∩C̄ g dx1 ⋯ dxn
                                = ∫C̄*∩C g dx1 ⋯ dxn
                                ≥ (1/kα) ∫C̄*∩C h dx1 ⋯ dxn

Thus
      ∫C*∩C̄ h dx1 ⋯ dxn ≥ ∫C̄*∩C h dx1 ⋯ dxn

and
      ∫C* h dx1 ⋯ dxn ≥ ∫C h dx1 ⋯ dxn

when we add the common quantity
      ∫C*∩C h dx1 ⋯ dxn
to both sides.                                                    Q.E.D.
Fisher's Linear Discriminant Function
Suppose that x = (x1, … , xp)′ is data from a p-variate
Normal distribution with mean vector either
                   μ1 (population p1) or μ2 (population p2).

The covariance matrix Σ is the same for both
populations p1 and p2:

      g(x) = [1 / ( (2π)^(p/2) |Σ|^(1/2) )] exp{ −½ (x − μ1)′ Σ⁻¹ (x − μ1) }

      h(x) = [1 / ( (2π)^(p/2) |Σ|^(1/2) )] exp{ −½ (x − μ2)′ Σ⁻¹ (x − μ2) }
The Neyman-Pearson Lemma states that we should
classify into populations p1 and p2 using the likelihood ratio:

      λ = g(x) / h(x)
        = exp{ −½ (x − μ1)′ Σ⁻¹ (x − μ1) } / exp{ −½ (x − μ2)′ Σ⁻¹ (x − μ2) }
        = exp{ ½ [ (x − μ2)′ Σ⁻¹ (x − μ2) − (x − μ1)′ Σ⁻¹ (x − μ1) ] }
That is, make the decision
         D1 : population is p1
if λ ≥ k,

or    ln λ = ½ [ (x − μ2)′ Σ⁻¹ (x − μ2) − (x − μ1)′ Σ⁻¹ (x − μ1) ] ≥ ln k

or    (x − μ2)′ Σ⁻¹ (x − μ2) − (x − μ1)′ Σ⁻¹ (x − μ1) ≥ 2 ln k

or    x′Σ⁻¹x − 2 μ2′Σ⁻¹x + μ2′Σ⁻¹μ2 − x′Σ⁻¹x + 2 μ1′Σ⁻¹x − μ1′Σ⁻¹μ1 ≥ 2 ln k

and
      (μ1 − μ2)′ Σ⁻¹ x ≥ ln k + ½ [ μ1′Σ⁻¹μ1 − μ2′Σ⁻¹μ2 ]
Finally we make the decision
        D1 : population is p1

  if    a′x ≥ K

  where

      a = Σ⁻¹ (μ1 − μ2)   and   K = ln k + ½ [ μ1′Σ⁻¹μ1 − μ2′Σ⁻¹μ2 ]

  and   k = ( c1|2 P[2] ) / ( c2|1 P[1] )

Note: k = 1 and ln k = 0 if c1|2 = c2|1 and P[1] = P[2],

and then
      K = ½ [ μ1′Σ⁻¹μ1 − μ2′Σ⁻¹μ2 ] = ½ (μ1 + μ2)′ Σ⁻¹ (μ1 − μ2)

The function
      a′x = (μ1 − μ2)′ Σ⁻¹ x
is called Fisher's linear discriminant function.
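To see the rule numerically, here is a small NumPy sketch (not from the original notes) of the decision a′x ≥ K in the equal-cost, equal-prior case k = 1; the mean vectors and covariance matrix are made-up values used only for illustration.

```python
import numpy as np

# Hypothetical population parameters (for illustration only).
mu1 = np.array([3.0, 5.0])
mu2 = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Fisher's linear discriminant: a = Sigma^{-1}(mu1 - mu2),
# cutoff K = (1/2)(mu1 + mu2)' Sigma^{-1} (mu1 - mu2) when k = 1.
a = np.linalg.solve(Sigma, mu1 - mu2)
K = 0.5 * (mu1 + mu2) @ a

def classify(x):
    """Decide D1 (population p1) when a'x >= K, otherwise D2 (p2)."""
    return "p1" if a @ x >= K else "p2"

print(classify(np.array([2.5, 4.0])))   # a point near mu1
print(classify(np.array([0.5, 1.5])))   # a point near mu2
```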
[Figure: the two populations p1 (centred at μ1) and p2 (centred at μ2),
separated by the boundary a′x = (μ1 − μ2)′ Σ⁻¹ x = K]
In the case where the population parameters are unknown
but estimated from data, Fisher's linear discriminant
function becomes

      â′x = (x̄1 − x̄2)′ S⁻¹ x

where x̄1 and x̄2 are the sample mean vectors of the two
samples and S is the pooled sample covariance matrix.
[Figure: a pictorial representation of Fisher's procedure for two populations —
the (x1, x2) plane is divided by the estimated discriminant line into a
"classify as p1" region and a "classify as p2" region]
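A rough sketch (not part of the original notes) of estimating the discriminant from two samples follows; the arrays sample1 and sample2 are hypothetical stand-ins for data from p1 and p2, and S is taken to be the usual degrees-of-freedom-weighted pooled covariance matrix.

```python
import numpy as np

def fisher_ldf(sample1, sample2):
    """Estimate Fisher's discriminant a_hat = S^{-1}(xbar1 - xbar2) and the
    cutoff K_hat = (1/2)(xbar1 + xbar2)' S^{-1} (xbar1 - xbar2)."""
    xbar1, xbar2 = sample1.mean(axis=0), sample2.mean(axis=0)
    n1, n2 = len(sample1), len(sample2)
    # Pooled covariance matrix (a common covariance is assumed for both populations).
    S = ((n1 - 1) * np.cov(sample1, rowvar=False) +
         (n2 - 1) * np.cov(sample2, rowvar=False)) / (n1 + n2 - 2)
    a_hat = np.linalg.solve(S, xbar1 - xbar2)
    K_hat = 0.5 * (xbar1 + xbar2) @ a_hat
    return a_hat, K_hat

# Hypothetical samples from p1 and p2 (two variables, 20 cases each).
rng = np.random.default_rng(0)
sample1 = rng.normal([3.0, 5.0], 1.0, size=(20, 2))
sample2 = rng.normal([1.0, 2.0], 1.0, size=(20, 2))

a_hat, K_hat = fisher_ldf(sample1, sample2)
new_case = np.array([2.0, 4.0])
print("classify as p1" if a_hat @ new_case >= K_hat else "classify as p2")
```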
Example 1


              p1 : Riding-mower owners          p2 : Nonowners

        x1 (Income        x2 (Lot size     x1 (Income     x2 (Lot size
        in $1000s)        in 1000 sq ft)   in $1000s)     in 1000 sq ft)

            20.0              9.2            25.0             9.8
            28.5              8.4            17.6            10.4
            21.6             10.8            21.6             8.6
            20.5             10.4            14.4            10.2
            29.0             11.8            28.0             8.8
            36.7              9.6            16.4             8.8
            36.0              8.8            19.8             8.0
            27.6             11.2            22.0             9.2
            23.0             10.0            15.8             8.2
            31.0             10.4            11.0             9.4
            17.0             11.0            17.0             7.0
            27.0             10.0            21.0             7.4
[Figure: scatter plot of Lot Size (in thousands of square feet) against
Income (in thousands of dollars), with riding-mower owners and nonowners
plotted as separate groups]
Example 2
  Annual financial data are collected for firms
  approximately 2 years prior to bankruptcy and for
  financially sound firms at about the same point in
  time. Four variables are recorded:
• x1 = CF/TD = (cash flow)/(total debt)
• x2 = NI/TA = (net income)/(total assets)
• x3 = CA/CL = (current assets)/(current liabilities)
• x4 = CA/NS = (current assets)/(net sales)
The data are given in the following table:
          Bankrupt Firms                            Nonbankrupt Firms
       x1      x2      x3          x4               x1     x2       x3          x4
Firm   CF/TD     NI/TA     CA/CL    CA/NS    Firm   CF/TD     NI/TA     CA/CL    CA/NS
   1     -0.4485   -0.4106   1.0865   0.4526    1      0.5135    0.1001   2.4871   0.5368
   2     -0.5633   -0.3114   1.5314   0.1642    2      0.0769    0.0195   2.0069   0.5304
   3      0.0643    0.0156   1.0077   0.3978    3      0.3776    0.1075   3.2651   0.3548
   4     -0.0721   -0.0930   1.4544   0.2589    4      0.1933    0.0473   2.2506   0.3309
   5     -0.1002   -0.0917   1.5644   0.6683    5      0.3248    0.0718   4.2401   0.6279
   6     -0.1421   -0.0651   0.7066   0.2794    6      0.3132    0.0511   4.4500   0.6852
   7      0.0351    0.0147   1.5046   0.7080    7      0.1184    0.0499   2.5210   0.6925
   8     -0.6530   -0.0566   1.3737   0.4032    8     -0.0173    0.0233   2.0538   0.3484
   9      0.0724   -0.0076   1.3723   0.3361    9      0.2169    0.0779   2.3489   0.3970
  10     -0.1353   -0.1433   1.4196   0.4347   10      0.1703    0.0695   1.7973   0.5174
  11     -0.2298   -0.2961   0.3310   0.1824   11      0.1460    0.0518   2.1692   0.5500
  12      0.0713    0.0205   1.3124   0.2497   12     -0.0985   -0.0123   2.5029   0.5778
  13      0.0109    0.0011   2.1495   0.6969   13      0.1398   -0.0312   0.4611   0.2643
  14     -0.2777   -0.2316   1.1918   0.6601   14      0.1379    0.0728   2.6123   0.5151
  15      0.1454    0.0500   1.8762   0.2723   15      0.1486    0.0564   2.2347   0.5563
  16      0.3703    0.1098   1.9914   0.3828   16      0.1633    0.0486   2.3080   0.1978
  17     -0.0757   -0.0821   1.5077   0.4215   17      0.2907    0.0597   1.8381   0.3786
  18      0.0451    0.0263   1.6756   0.9494   18      0.5383    0.1064   2.3293   0.4835
  19      0.0115   -0.0032   1.2602   0.6038   19     -0.3330   -0.0854   3.0124   0.4730
  20      0.1227    0.1055   1.1434   0.1655   20      0.4875    0.0910   1.2444   0.1847
  21     -0.2843   -0.2703   1.2722   0.5128   21      0.5603    0.1112   4.2918   0.4443
                                               22      0.2029    0.0792   1.9936   0.3018
                                               23      0.4746    0.1380   2.9166   0.4487
                                               24      0.1661    0.0351   2.4527   0.1370
                                               25      0.5808    0.0371   5.0594   0.1268
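The notes analyse these data with SPSS; as a rough alternative sketch (not part of the original notes), the same linear discriminant analysis can be run in Python with scikit-learn. Only the first five firms of each group are typed out below; the full table above supplies the remaining rows.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# (CF/TD, NI/TA, CA/CL, CA/NS) for the first five firms of each group above.
bankrupt = np.array([[-0.4485, -0.4106, 1.0865, 0.4526],
                     [-0.5633, -0.3114, 1.5314, 0.1642],
                     [ 0.0643,  0.0156, 1.0077, 0.3978],
                     [-0.0721, -0.0930, 1.4544, 0.2589],
                     [-0.1002, -0.0917, 1.5644, 0.6683]])
sound = np.array([[ 0.5135,  0.1001, 2.4871, 0.5368],
                  [ 0.0769,  0.0195, 2.0069, 0.5304],
                  [ 0.3776,  0.1075, 3.2651, 0.3548],
                  [ 0.1933,  0.0473, 2.2506, 0.3309],
                  [ 0.3248,  0.0718, 4.2401, 0.6279]])

X = np.vstack([bankrupt, sound])
y = np.array([0] * len(bankrupt) + [1] * len(sound))   # 0 = bankrupt, 1 = sound

lda = LinearDiscriminantAnalysis()    # assumes a common covariance matrix
lda.fit(X, y)
print(lda.coef_)                                   # discriminant coefficients
print(lda.predict([[0.15, 0.05, 2.0, 0.40]]))      # classify a hypothetical new firm
```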
Examples using SPSS
Classification or Cluster Analysis


     Have data from one or several
             populations
                 Situation
• Have multivariate (or univariate) data from
  one or several populations (the number of
  populations is unknown)
• Want to determine the number of populations
  and identify the populations
                                           Example
Table: Numerals in eleven languages

English  Norwegian  Danish  Dutch   German  French  Spanish  Italian  Polish    Hungarian  Finnish

  one    en         en      een     ein     un      uno      uno      jeden     egy        yksi
  two    to         to      twee    zwei    deux    dos      due      dwa       ketto      kaksi
three    tre        tre     drie    drei    trois   tres     tre      trzy      harom      kolme
 four    fire       fire    vier    vier    quatre  cuatro   quattro  cztery    negy       nelja
 five    fem        fem     vijf    funf    cinq    cinco    cinque   piec      ot         viisi
  six    seks       seks    zes     sechs   six     seis     sei      szesc     hat        kuusi
seven    sju        syv     zeven   sieben  sept    siete    sette    siedem    het        seitseman
eight    atte       otte    acht    acht    huit    ocho     otto     osiem     nyolc      kahdeksan
 nine    ni         ni      negen   neun    neuf    nueve    nove     dziewiec  kilenc     yhdeksan
  ten    ti         ti      tien    zehn    dix     diez     dieci    dziesiec  tiz        kymmenen
Distance Matrix
               Distance = # of numerals (1 to 10) differing in first letter

                     E   N   Da  Du  G   Fr  Sp  I   P   H   Fi
               E     0
               N     2   0
               Da    2   1   0
               Du    7   5   6   0
               G     6   4   5   5   0
               Fr    6   6   6   9   7   0
               Sp    6   6   5   9   7   2   0
               I     6   6   5   9   7   1   1   0
               P     7   7   6   10  8   5   3   4   0
               H     9   8   8   8   9   10  10  10  10  0
               Fi    9   9   9   9   9   9   9   9   9   8   0
            Hierarchical Clustering Methods
The following are the steps in the agglomerative Hierarchical
clustering algorithm for grouping N objects (items or variables).
1.   Start with N clusters, each consisting of a single entity and
     an N X N symmetric matrix (table) of distances (or
     similarities) D = (dij).
2.   Search the distance matrix for the nearest (most similar)
     pair of clusters. Let the distance between the "most
     similar" clusters U and V be dUV.
3.   Merge clusters U and V. Label the newly formed cluster
     (UV). Update the entries in the distance matrix by

       a)   deleting the rows and columns corresponding to
            clusters U and V and
       b)   adding a row and column giving the distances
            between cluster (UV) and the remaining clusters.
4.   Repeat steps 2 and 3 a total of N-1 times. (All objects
     will be in a single cluster at termination of this algorithm.)
     Record the identity of the clusters that are merged and the
     levels (distances or similarities) at which the mergers
     take place. (A short code sketch of this procedure is given below.)
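The following Python sketch (not part of the original notes) implements these steps directly for a small distance matrix; the helper agglomerate and its link argument are hypothetical names, and link controls how the distance from a merged cluster to the others is updated (min gives single linkage, max gives complete linkage).

```python
import numpy as np

def agglomerate(D, labels, link=min):
    """Agglomerative hierarchical clustering on a symmetric distance matrix D.
    Returns the list of merges as (cluster, cluster, merge distance)."""
    D = D.astype(float).copy()
    clusters = [(lab,) for lab in labels]
    merges = []
    while len(clusters) > 1:
        n = len(clusters)
        # Step 2: find the closest (most similar) pair of clusters.
        i, j = min(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: D[ij])
        merges.append((clusters[i], clusters[j], D[i, j]))
        # Step 3: merge them and update the distance matrix.
        new_row = np.array([link(D[i, m], D[j, m]) for m in range(n)])
        keep = [m for m in range(n) if m not in (i, j)]
        D = np.vstack([np.column_stack([D[np.ix_(keep, keep)], new_row[keep]]),
                       np.append(new_row[keep], 0.0)])
        clusters = [clusters[m] for m in keep] + [clusters[i] + clusters[j]]
    return merges

# Tiny example with three items:
D3 = np.array([[0., 2., 6.],
               [2., 0., 5.],
               [6., 5., 0.]])
print(agglomerate(D3, ["A", "B", "C"]))   # merges A and B at 2, then with C at 5
```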
Different methods of computing inter-cluster distance

[Figure: two clusters {1, 2} and {3, 4, 5} illustrating the three linkage rules —
   Single linkage:    cluster distance = the smallest pairwise distance, d24
   Complete linkage:  cluster distance = the largest pairwise distance, d15
   Average linkage:   cluster distance = the average of all pairwise distances,
                      (d13 + d14 + d15 + d23 + d24 + d25)/6 ]
Example
To illustrate the single linkage algorithm, we consider the
hypothetical distance matrix between pairs of five objects given
below:

                         1    2    3    4    5
                 1       0
                 2       9    0
    D = {dik} =  3       3    7    0
                 4       6    5    9    0
                 5      11   10    2    8    0
Treating each object as a cluster, the clustering
begins by merging the two closest items (3 & 5).
To implement the next level of clustering we
need to compute the distances between cluster
(35) and the remaining objects:
      d(35)1 = min{3,11} = 3
      d(35)2 = min{7,10} = 7
      d(35)4 = min{9,8} = 8
The new distance matrix becomes:

             (35)   1    2    4
       (35)    0
         1     3    0
         2     7    9    0
         4     8    6    5    0

   The next two closest clusters ((35) & 1) are
   merged to form cluster (135). Distances between
   this cluster and the remaining clusters become:
                  d(135)2 = min{7,9} = 7
                  d(135)4 = min{8,6} = 6
      The distance matrix now becomes:

            (135)   2    4
      (135)    0
          2    7    0
          4    6    5    0
Continuing, the next two closest clusters (2 & 4)
are merged to form cluster (24).
  Distances between this cluster and the remaining
  clusters become:
                    d(135)(24) = min{d(135)2, d(135)4} = min{7,6} = 6
       The final distance matrix now becomes:

            (135)  (24)
      (135)    0
       (24)    6    0

At the final step clusters (135) and (24) are merged to
form the single cluster (12345) of all five items.
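As a quick check (not part of the original notes), SciPy's hierarchical clustering routines reproduce this merge sequence and can draw the dendrogram referred to below; squareform converts the square distance matrix into the condensed form that linkage expects.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

# The 5 x 5 distance matrix from the example above.
D = np.array([[ 0,  9,  3,  6, 11],
              [ 9,  0,  7,  5, 10],
              [ 3,  7,  0,  9,  2],
              [ 6,  5,  9,  0,  8],
              [11, 10,  2,  8,  0]], dtype=float)

Z = linkage(squareform(D), method="single")   # single linkage
print(Z)   # merge heights 2, 3, 5, 6 -- matching the worked example above

dendrogram(Z, labels=["1", "2", "3", "4", "5"])
plt.show()
```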
The results of this algorithm can be summarized
graphically in the following "dendrogram".

         Dendrograms
for clustering the 11 languages on the
       basis of the ten numerals
Example 2: Public Utility data

                                                                       variables

                   Company                         X1     X2 X3 X4            X5    X6      X7        X8

         1      Arizona Public Service            1.06    9.2   151    54.4 1.6     9077     0.0      0.628
         2      Boston Edison Co                  0.89   10.3   202    57.9 2.2     5088    25.3      1.555
         3      Central Louisiana Electric Co     1.43   15.4   113    53.0 3.4     9212     0.0      1.058
         4      Commonwealth Edison Co            1.02   11.2   168    56.0 0.3     6423    34.3      0.700
         5      Consolidated Edison Co (NY)       1.49    8.8   192    51.2 1.0     3300    15.6      2.044
         6      Florida Power & Light Co          1.32   13.5   111    60.0 -2.2   11127    22.5      1.241
         7      Hawaiian Electric Co              1.22   12.2   175    67.6 2.2     7642     0.0      1.652
         8      Idaho Power Co                    1.10    9.2   245    57.0 3.3    13082     0.0      0.309
         9      Kentucky Utilities Co             1.34   13.0   168    60.4 7.2     8406     0.0      0.862
         10     Madison Gas & Electric Co         1.12   12.4   197    53.0 2.7     6455    39.2      0.623
         11     Nevada Power Co                   0.75    7.5   173    51.5 6.5    17441     0.0      0.768
         12     New England Electric Co           1.13   10.9   178    62.0 3.7     6154     0.0      1.897
         13     Northern States Power Co          1.15   12.7   199    53.7 6.4     7179    50.2      0.527
         14     Oklahoma Gas & Electric Co        1.09   12.0    96    49.8 1.4     9673     0.0      0.588
         15     Pacific Gas & Electric Co         0.96    7.6   164    62.2 -0.1    6468     0.9      1.400
         16     Puget Sound Power & Light Co      1.16    9.9   252    56.0 9.2    15991     0.0      0.620
         17     San Diego Gas & Electric Co       0.76    6.4   136    61.9 9.0     5714     8.3      1.920
         18     The Southern Co                   1.05   12.6   150    56.7 2.7    10140     0.0      1.108
         19     Texas Utilities Co                1.16   11.7   104    54.0 -2.1   13507     0.0      0.636
         20     Wisconsin Electric Power Co       1.20   11.8   148    59.9 3.5     7287    41.1      0.702
         21     United Illuminating Co            1.04    8.6   204    61.0 3.5     6650     0.0      2.116
         22     Virginia Electric & Power Co      1.07    9.3   174    54.3 5.9    10093    26.6      1.306

              X1: Fixed charge coverage ratio (income/debt)           X2: Rate of return on capital
              X3: Cost per KW capacity in place                       X4: Annual load factor
              X5: Peak KWH demand growth from 1974 to 1975            X6: Sales (KWH per year)
              X7: Percent nuclear                                     X8: Total fuel costs (cents per KWH)
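The distance table that follows can be approximated with standard tools; the sketch below (not from the original notes) assumes the distances are Euclidean distances computed on the standardized variables, with X holding the 22 × 8 data matrix above (only the first two companies are typed out here).

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# X is assumed to hold the 22 x 8 matrix of utility data (X1, ..., X8) above.
X = np.array([
    [1.06,  9.2, 151, 54.4, 1.6, 9077,  0.0, 0.628],   # Arizona Public Service
    [0.89, 10.3, 202, 57.9, 2.2, 5088, 25.3, 1.555],   # Boston Edison Co
    # ... remaining 20 companies from the table above
])

# Standardize each variable, then compute pairwise Euclidean distances.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
D = squareform(pdist(Z, metric="euclidean"))
print(np.round(D, 2))
```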
  Table: Distances between 22 Utilities

Firm
number   1      2      3      4      5      6      7      8      9      10      11      12      13      14      15      16      17      18      19   20   21   22

 1       0.00
 2       3.10   0.00
 3       3.68   4.92   0.00
 4       2.46   2.16   4.11   0.00
 5       4.12   3.85   4.47   4.13   0.00
 6       3.61   4.22   2.99   3.20   4.60   0.00
 7       3.90   3.45   4.22   3.97   4.60   3.35   0.00
 8       2.74   3.89   4.99   3.69   5.16   4.91   4.36   0.00
 9       3.25   3.96   2.75   3.75   4.49   3.73   2.80   3.59   0.00
 10      3.10   2.71   3.93   1.49   4.05   3.83   4.51   3.67   3.57    0.00
 11      3.49   4.79   5.90   4.86   6.46   6.00   6.00   3.46   5.18    5.08    0.00
 12      3.22   2.43   4.03   3.50   3.60   3.74   1.66   4.06   2.74    3.94    5.21    0.00
 13      3.96   3.43   4.39   2.58   4.76   4.55   5.01   4.14   3.66    1.41    5.31    4.50    0.00
 14      2.11   4.32   2.74   3.23   4.82   3.47   4.91   4.34   3.82    3.61    4.32    4.34    4.39    0.00
 15      2.59   2.50   5.16   3.19   4.26   4.07   2.93   3.85   4.11    4.26    4.74    2.33    5.10    4.24    0.00
 16      4.03   4.84   5.26   4.97   5.82   5.84   5.04   2.20   3.63    4.53    3.43    4.62    4.41    5.17    5.18    0.00
 17      4.40   3.62   6.36   4.89   5.63   6.10   4.58   5.43   4.90    5.48    4.75    3.50    5.61    5.56    3.40    5.56    0.00
 18      1.88   2.90   2.72   2.65   4.34   2.85   2.95   3.24   2.43    3.07    3.95    2.45    3.78    2.30    3.00    3.97    4.43    0.00
 19      2.41   4.63   3.18   3.46   5.13   2.58   4.52   4.11   4.11    4.13    4.52    4.41    5.01    1.88    4.03    5.23    6.09    2.47    0.00
 20      3.17   3.00   3.73   1.82   4.39   2.91   3.54   4.09   2.95    2.05    5.35    3.43    2.23    3.74    3.78    4.82    4.87    2.92    3.90 0.00
 21      3.45   2.32   5.09   3.88   3.64   4.63   2.68   3.98   3.74    4.36    4.88    1.38    4.94    4.93    2.10    4.57    3.10    3.19    4.97 4.15 0.00
 22      2.51   2.42   4.11   2.58   3.77   4.03   4.00   3.24   3.21    2.56    3.44    3.00    2.74    3.51    3.35    3.46    3.63    2.55    3.97 2.62 3.01 0.00
               Dendrogram
Cluster Analysis of N=22 Utility companies
   Euclidean distance, Average Linkage

               Dendrogram
Cluster Analysis of N=22 Utility companies
    Euclidean distance, Single Linkage

				