Testing the spatial adjacency match of the Intiendo address matching tool
for geocoding of addresses with misleading suburb or place names

by Serena Coetzee (scoetzee@cs.up.ac.za) and Magnus Rademeyer (magnus@afrigis.co.za)

Presented at the ICC 2009, Santiago, Chile, November 2009

Overview

  • Why Geocode?
  • The Address Lifecycle
  • Problem statement
  • Address matching with a spatial adjacency match
  • Test runs
  • Results
  • Conclusion




 Testing the spatial adjacency match of the Intiendo address matching tool for geocoding of addresses with misleading suburb or place names,
                     Serena Coetzee and Magnus Rademeyer, presented at the ICC 2009, Santiago, Chile, November 2009
Why Geocode?


  We geocode addresses to link attribute data to physical positions
  for purposes such as logistics, governance (elections, rates and taxes),
  customer database analysis (risk, trade area analytics) and many more.




The Address Lifecycle

  1. Address Capturing
  2. Address Cleaning
  3. Address Geocoding & Verification
  4. Address Delivery & Analysis




Problem statement

Alphanumeric matching
101 Rubida Street, Murrayfield incorrectly matched to 110 Rubida Street, Murrayfield
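
To see why string similarity alone can prefer the wrong house number, the sketch below scores the two addresses from this slide with Python's standard-library `difflib.SequenceMatcher`. This is only an illustrative stand-in: the slides do not describe Intiendo's actual similarity measure, and the `similarity` helper is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 string similarity (stand-in for an alphanumeric match score)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


input_address = "101 Rubida Street, Murrayfield"
reference_address = "110 Rubida Street, Murrayfield"

score = similarity(input_address, reference_address)
print(f"similarity = {score:.2f}")
# The two strings differ only in the order of two digits, so the score is
# high even though 101 and 110 are different street numbers.
```

A purely string-based matcher with a permissive threshold would therefore accept 110 Rubida Street as a match for 101 Rubida Street.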




                                                                                  

                                       
Problem statement



 • Alphanumeric matching by itself can cause errors
     (previous slide)
 • Potential solution: attribute relaxation (i.e. ignore suburb)
 • Most common cause of errors (Goldberg et al. 2007)




With spatial adjacency match

 •     Intiendo = alphanumeric matching + spatial adjacency match
 •     Improves geocoding results


 Alphanumeric match: propose a matched address from the reference dataset

 Above threshold?
   • Yes: the proposed matched address is an acceptable result
   • No: search for the street number in a radius around the proposed address
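
The two-stage logic above can be sketched in Python as follows. Everything here is an assumption made for illustration, not Intiendo's implementation: the field weights, the 0.9 threshold, the 500 m radius, the planar coordinates and all function names are hypothetical. The address values are borrowed from the specific example later in this presentation.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from math import hypot
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Address:
    number: str
    street: str
    suburb: str
    x: float = 0.0  # made-up planar coordinates, in metres
    y: float = 0.0


def similarity(a: str, b: str) -> float:
    """0..1 string similarity; a stand-in for the real alphanumeric measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical weights: suburb and street name count more than the number.
WEIGHTS = {"suburb": 0.4, "street": 0.4, "number": 0.2}


def score(query: Address, ref: Address) -> float:
    """Weighted sum of per-field similarities."""
    return sum(w * similarity(getattr(query, f), getattr(ref, f))
               for f, w in WEIGHTS.items())


def alphanumeric_match(query: Address,
                       reference: List[Address]) -> Tuple[Address, float]:
    """Propose the best-scoring reference address and return it with its score."""
    best = max(reference, key=lambda ref: score(query, ref))
    return best, score(query, best)


def spatial_adjacency_match(query: Address, proposed: Address,
                            reference: List[Address],
                            radius_m: float) -> Optional[Address]:
    """Look for the queried street number within radius_m of the proposal."""
    def distance(ref: Address) -> float:
        return hypot(ref.x - proposed.x, ref.y - proposed.y)

    nearby = [ref for ref in reference
              if ref.number == query.number
              and ref.street.lower() == query.street.lower()
              and distance(ref) <= radius_m]
    # Nearest qualifying address, or None if there is none in range.
    return min(nearby, key=distance, default=None)


def geocode(query: Address, reference: List[Address],
            threshold: float = 0.9, radius_m: float = 500.0) -> Optional[Address]:
    proposed, s = alphanumeric_match(query, reference)
    if s >= threshold:
        return proposed  # alphanumeric match is acceptable as-is
    return spatial_adjacency_match(query, proposed, reference, radius_m)


REFERENCE = [
    Address("35", "Voortrekker Road", "New Redruth", 0.0, 0.0),
    Address("16", "Voortrekker Road", "South Crest", 300.0, 50.0),
]
query = Address("16", "Voortrekker Road", "New Redruth")
result = geocode(query, REFERENCE)
print(result)
```

With these made-up weights the alphanumeric step proposes number 35 in New Redruth, because the suburb name matches. The score falls below the threshold, so the radius search then finds number 16 on the same street in the neighbouring suburb.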


With spatial adjacency match
 1. Structure input address according to hierarchy
 2. Alphanumerical match
 3. Above threshold?
      YES: final geocoding result
      NO: spatial adjacency match (step 4)
 4. Spatial adjacency match: satisfactory match?
      YES: final geocoding result
      NO: no geocoding result



With spatial adjacency match

 1. Geocode without SpatialAdjacencyMatch (Non-spatial run)
 2. Geocode with SpatialAdjacencyMatch enabled (Spatial run)

 Compare results




With spatial adjacency match

 • Sample input address data
      14,670 address records
      Test for misleading names
      Therefore include only addresses for which province, suburb,
         street name and street number are populated


 Province   Town           Suburb           Street Name       Street Number
 Gauteng    Johannesburg   Saxonwold        Engelwold Road    19
 Gauteng    Pretoria       Atteridgeville   Sekukuni Street   104
 Gauteng    Midrand        Noordwyk         Sagewood Avenue   637
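
The selection rule above (keep only records where province, suburb, street name and street number are populated) could be applied as in this minimal sketch. The dictionary keys and the `is_complete` helper are hypothetical, and the fourth record is made up to show a rejected row.

```python
REQUIRED_FIELDS = ("province", "suburb", "street_name", "street_number")


def is_complete(record: dict) -> bool:
    """True if every required field is present and not blank."""
    return all(str(record.get(field) or "").strip() for field in REQUIRED_FIELDS)


records = [
    {"province": "Gauteng", "town": "Johannesburg", "suburb": "Saxonwold",
     "street_name": "Engelwold Road", "street_number": "19"},
    {"province": "Gauteng", "town": "Pretoria", "suburb": "Atteridgeville",
     "street_name": "Sekukuni Street", "street_number": "104"},
    {"province": "Gauteng", "town": "Midrand", "suburb": "Noordwyk",
     "street_name": "Sagewood Avenue", "street_number": "637"},
    # Made-up incomplete record (blank suburb), for illustration only.
    {"province": "Gauteng", "town": "Pretoria", "suburb": "",
     "street_name": "Example Street", "street_number": "1"},
]

sample = [r for r in records if is_complete(r)]
print(f"{len(sample)} of {len(records)} records kept")
```

The town is deliberately not a required field here, since the slide lists only province, suburb, street name and street number.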




With spatial adjacency match

 Intiendo hierarchy database
 Reference dataset: AfriGIS address data




Test runs

  Intiendo settings




Results



                                Spatial run    Non-spatial run
 Customer address records       14,670         14,670
 Matched address records        8,905 (61%)    8,514 (58%)
 Non-matched address records    5,765 (39%)    6,156 (42%)




 The 3% improvement is low, but the improvement on bigger address sets
 can be significant (next slide), e.g. for addresses on different sides
 of a highway.
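
As a quick check of the table, the match rates can be recomputed from the record counts. This is a small sketch using only the figures shown above; the variable names are arbitrary.

```python
total = 14_670
matched = {"spatial": 8_905, "non_spatial": 8_514}

# Match rate per run, as a fraction of all customer address records.
rates = {run: count / total for run, count in matched.items()}
for run, rate in rates.items():
    not_matched = total - matched[run]
    print(f"{run}: {matched[run]:,} matched ({rate:.1%}), {not_matched:,} not matched")

extra_records = matched["spatial"] - matched["non_spatial"]
gain_points = (rates["spatial"] - rates["non_spatial"]) * 100
print(f"extra matches: {extra_records}, gain: {gain_points:.1f} percentage points")
```

The unrounded gain comes out slightly below 3 percentage points, which is consistent with the difference between the rounded rates (61% versus 58%).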

Results


       Subsequent real-life implementations on bigger datasets have
       yielded significantly improved results.

       In a dataset recently analysed for a major credit bureau, 21
       million records were examined. Without the spatial adjacency
       match, 3.87 million were successfully geocoded automatically.
       With the spatial adjacency match enabled, an additional 0.95
       million were geocoded, for a total of 4.82 million. Thus the
       spatial adjacency match yielded a 24.5% improvement.
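
The 24.5% figure is a relative improvement: the additional records divided by the number geocoded without the spatial adjacency match. The sketch below reproduces it and, for comparison, applies the same measure to the sample test run counts (8,514 and 8,905 matched records). The helper name is hypothetical.

```python
def relative_improvement(baseline: float, additional: float) -> float:
    """Additional matches as a fraction of the baseline matches."""
    return additional / baseline


# Credit bureau dataset, in millions of records.
total_examined = 21.0
without_spatial = 3.87
additional = 0.95
with_spatial = without_spatial + additional

improvement = relative_improvement(without_spatial, additional)
print(f"geocoded with spatial adjacency: {with_spatial:.2f} million")
print(f"relative improvement: {improvement:.1%}")
print(f"share of all records geocoded: "
      f"{without_spatial / total_examined:.1%} -> {with_spatial / total_examined:.1%}")

# Same relative measure for the sample test run.
sample_improvement = relative_improvement(8_514, 8_905 - 8_514)
print(f"sample relative improvement: {sample_improvement:.1%}")
```

Computed this way, the sample run's improvement is somewhat larger than the 3 percentage points read off the rounded match rates, so the two headline figures are not on exactly the same basis.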

Results

     Specific example

      Source   Province         Town              Suburb               Street Name               Street Number
 1    Input    Gauteng          Alberton          New Redruth          Voortrekker Road          16
 2    NSR      Gauteng (100%)   Alberton (100%)   New Redruth (100%)   Voortrekker Road (100%)   35 (96%)
 3    SR       Gauteng (100%)   Alberton (100%)   South Crest (44%)    Voortrekker Road (100%)   16 (100%)

      NSR = non-spatial run, SR = spatial run




Results




  [Map: locations of 16 Voortrekker Road and 35 Voortrekker Road]




Results

  If there are misleading suburb names in
      addresses, alphanumeric match by itself can
      cause errors.

  • Intiendo = alphanumeric + spatial adjacency match
  • More input addresses are matched more accurately
  • Improves quality of results
  • Sample test runs: 3% improvement
  • Real life example: 24.5% improvement
Conclusion

 • Intiendo address matching
   = alphanumeric string matching + spatial adjacency match
 • Improves quality of results
 • More addresses matched more accurately

 • This work
      Specific sample dataset showed improvement
 • Future
      More tests to understand average percentage improvement



Acknowledgements




  Christopher Ueckermann from AfriGIS,
  for running the geocoding tests with Intiendo

								