
									Active Learning on Spatial Data



          Christine Körner
      Fraunhofer AIS, Uni Bonn
Outline



     •    Active Learning

     •    FAW-Project

     •    Spatial Data

     •    Experiment Outline




    Active Learning


•    Difficult / expensive to obtain labelled data
        – manual preparation of documents for text mining
        – analysis of drugs or molecules


•    Active learning strategies actively select which data
     points to query in order to
        – minimize the number of training examples for a
          given classification quality
        – maximize the quality of results for a given number
          of data points



Selective Sampling
[Diagram: the learner selects an Instance and asks the ORACLE for its label ("Label?"); the labelled instance is added to the training set.]

Which instance should be chosen next? One where we
   •   have no data?
   •   perform poorly?
   •   have low confidence?
   •   expect our model to change?
   •   previously found data that improved quality?
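
As an illustration of the "low confidence" criterion, here is a minimal sketch of a pool-based selective sampling loop. It is not taken from the slides: the 1-D toy data, the k-NN majority vote used as confidence measure, and the names `knn_vote`, `selective_sampling` and `oracle` are all assumptions made for the example.

```python
def knn_vote(train, x, k=3):
    """Majority vote of the k nearest labelled points; returns (label, confidence)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    label, count = max(votes.items(), key=lambda kv: kv[1])
    return label, count / len(nearest)


def selective_sampling(pool, oracle, train, budget, k=3):
    """Repeatedly query the pool instance whose prediction has the lowest confidence."""
    pool, train = list(pool), list(train)
    for _ in range(min(budget, len(pool))):
        query = min(pool, key=lambda x: knn_vote(train, x, k)[1])
        pool.remove(query)
        train.append((query, oracle(query)))  # ask the oracle, add to training set
    return train, pool


def oracle(x):
    """Hypothetical oracle: stands in for the expensive labelling step."""
    return x > 0.5


seed = [(0.0, False), (1.0, True)]          # two initially labelled points
pool = [i / 10 for i in range(1, 10)]       # unlabelled candidates
labelled, remaining = selective_sampling(pool, oracle, seed, budget=4)
```

Any of the other criteria in the list could be swapped in by replacing the scoring expression inside `min(...)`.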


 The FAW-Project

FAW:       Association to regulate outdoor advertising
Goal:      Prediction of traffic frequencies for
           82 major German cities
Samples:   ~ 400-1500 poster sites measured per city




Data Characteristics, Prediction
•   street name
•   segment ID
•   speed class
•   street type
•   sidewalks
•   one-way road
•   POIs
     • no. of restaurants
     • no. of public buildings
     • …
•   spatial coordinates

[Figure: street map with street names and segment IDs (e.g. 51.912.591, 51.912.737)]

KNN:
    •   similarity calculated based on scalar attributes and spatial
        coordinates
    •   applies weights according to (spatial) distance of
        neighbors
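
The two KNN bullets can be sketched as follows. This is only an illustration under stated assumptions: the slides do not give the distance function or the weighting scheme, so the Euclidean similarity, the inverse-distance weights, and the names `similarity_distance`, `spatial_distance` and `knn_predict` are hypothetical, as is the record layout (`attrs`, `xy`, `frequency`).

```python
import math


def similarity_distance(a, b, spatial_weight=1.0):
    """Combined distance over scalar attributes and spatial coordinates.
    Assumes the attributes are already scaled to comparable ranges."""
    attr = sum((p - q) ** 2 for p, q in zip(a["attrs"], b["attrs"]))
    spat = sum((p - q) ** 2 for p, q in zip(a["xy"], b["xy"]))
    return math.sqrt(attr + spatial_weight * spat)


def spatial_distance(a, b):
    """Plain Euclidean distance between the coordinates of two records."""
    return math.dist(a["xy"], b["xy"])


def knn_predict(query, measured, k=3, eps=1e-9):
    """Predict a frequency as the inverse-spatial-distance weighted mean
    of the k most similar measured sites."""
    nearest = sorted(measured, key=lambda s: similarity_distance(query, s))[:k]
    weights = [1.0 / (spatial_distance(query, s) + eps) for s in nearest]
    total = sum(w * s["frequency"] for w, s in zip(weights, nearest))
    return total / sum(weights)


# Made-up example data, not from the project.
sites = [
    {"attrs": [1.0], "xy": (0.0, 0.0), "frequency": 100.0},
    {"attrs": [1.0], "xy": (2.0, 0.0), "frequency": 200.0},
    {"attrs": [1.0], "xy": (50.0, 50.0), "frequency": 900.0},
]
query = {"attrs": [1.0], "xy": (1.0, 0.0)}
prediction = knn_predict(query, sites, k=2)
```

The query lies midway between the first two sites, so both get the same weight and the far-away third site is ignored.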
   Spatial Data
  Spatial Data:
      •    spatial covariance between data points
      •    high autocorrelation and concentrated linkage* on
           street name bias the test accuracy
            – 1:n relationship between street name and segments
            – frequencies within one street are alike
[Figure: frequency per segment (axis 0 to 2000), with segments grouped by street, e.g. Nordstraße and Riesenweg]

      •    here: complete instance space is known
                 (all street segments of a city)
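
One common way to limit this bias, not necessarily the one used in the project, is to keep all segments of a street on the same side of the train/test split. The sketch below assumes each segment is a record with a `street` field; the function name `split_by_street` and the sample values are made up for illustration.

```python
import random


def split_by_street(segments, test_fraction=0.3, seed=0):
    """Assign whole streets to either train or test so that highly
    correlated segments of one street never end up on both sides."""
    streets = sorted({s["street"] for s in segments})
    rng = random.Random(seed)
    rng.shuffle(streets)
    n_test = max(1, round(test_fraction * len(streets)))
    test_streets = set(streets[:n_test])
    train = [s for s in segments if s["street"] not in test_streets]
    test = [s for s in segments if s["street"] in test_streets]
    return train, test


# Illustrative values only; the frequencies are not from the project data.
segments = [
    {"street": "Nordstraße", "frequency": 1500},
    {"street": "Nordstraße", "frequency": 1450},
    {"street": "Riesenweg", "frequency": 400},
    {"street": "Riesenweg", "frequency": 420},
    {"street": "Hypothetical Allee", "frequency": 900},
]
train, test = split_by_street(segments)
```

With a per-segment random split, a test segment would often have a near-identical neighbour from the same street in the training set, which inflates the measured accuracy.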

*David Jensen, Jennifer Neville: Autocorrelation and Linkage Cause
 Bias in Evaluation of Relational Learners
    Active Learning in FAW

Usage:
• additional samples at ~50 places per city
• KNN needs cross product of street segments with
  all poster places
          – Cologne: 50 GB, 5 days

Strategy:
• Data density
     •   mean distance to the
         k nearest neighbors
•    Model differences
     •   Build Model Tree with predicted frequencies
     •   Disagreement between models?
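
Both strategy ideas can be expressed as ranking scores. The following is a rough sketch under assumptions: coordinates are plain `(x, y)` tuples, predictions are dictionaries keyed by a segment ID, and the function names are hypothetical. How the model tree is built is not shown here, since the slides do not specify it.

```python
import math


def mean_knn_distance(candidate, measured_xy, k=3):
    """Data density score: mean distance to the k nearest measured sites.
    A large value means the candidate lies in a sparsely sampled area."""
    nearest = sorted(math.dist(candidate, xy) for xy in measured_xy)[:k]
    return sum(nearest) / len(nearest)


def rank_by_sparsity(candidates, measured_xy, k=3):
    """Candidates in the sparsest regions come first."""
    return sorted(candidates,
                  key=lambda c: mean_knn_distance(c, measured_xy, k),
                  reverse=True)


def rank_by_disagreement(knn_pred, tree_pred):
    """Segments where the two models differ most come first."""
    return sorted(knn_pred,
                  key=lambda seg: abs(knn_pred[seg] - tree_pred[seg]),
                  reverse=True)


measured = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
candidates = [(0.5, 0.5), (10.0, 10.0)]
sparsity_ranking = rank_by_sparsity(candidates, measured)

# Hypothetical predictions for two segments "a" and "b".
disagreement_ranking = rank_by_disagreement({"a": 100.0, "b": 500.0},
                                            {"a": 110.0, "b": 900.0})
```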


     Experiment Outline
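[Diagram: the Samples are divided into Training, Test and Oracle sets; KNN predicts Frequencies; a Model Tree and a Distance measure produce the Ranking for AL; the cycle repeats over several Iterations.]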
 •           Comparison of the accuracy increase when samples are
                  added in ranked order vs. random order
 •           Alternatives
              • iterative ranking (realistic? is greedy search optimal?)
              • rank once, remove similar objects (e.g. exclude
                segments of the same street, …)
 •           Possible problems:
              • KNN is not very stable
              • few samples, so the Oracle has little choice in
                providing the requested data sets
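
The comparison of ranked vs. random order could be set up roughly as below. This is a toy sketch, not the project's experiment: it uses a 1-D nearest-neighbour predictor, a made-up oracle, and a hand-picked ranking, and every name in it is an assumption.

```python
import random


def nearest_neighbour_predict(train, x):
    """1-NN regression on 1-D inputs (stand-in for the real KNN model)."""
    return min(train, key=lambda item: abs(item[0] - x))[1]


def mean_abs_error(preds, truths):
    return sum(abs(p - t) for p, t in zip(preds, truths)) / len(truths)


def learning_curve(order, train, test, oracle):
    """Add samples in the given order and record the test error after each one."""
    train = list(train)
    curve = []
    for x in order:
        train.append((x, oracle(x)))
        preds = [nearest_neighbour_predict(train, t) for t, _ in test]
        curve.append(mean_abs_error(preds, [y for _, y in test]))
    return curve


def oracle(x):
    """Hypothetical oracle returning the true value for a requested sample."""
    return 10 * x


initial = [(0.0, oracle(0.0))]
test_set = [(t, oracle(t)) for t in (0.1, 0.5, 0.9)]
ranked_order = [0.9, 0.5, 0.1]                      # assumed output of a ranking
random_order = random.Random(0).sample(ranked_order, len(ranked_order))

ranked_curve = learning_curve(ranked_order, initial, test_set, oracle)
random_curve = learning_curve(random_order, initial, test_set, oracle)
```

Comparing the two curves position by position shows how quickly each ordering reduces the error. In a real run the random baseline would be averaged over many shuffles.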
Thank you!



      Ideas


                   Questions


     Suggestions



