Mining Approximate User Moving Patterns

              Presented by
              Chih-Chieh Hung
              2004 / 4 / 9
Outline
   Introduction
   Preliminary
   Mining Approximate User Moving Patterns
   Conclusions
Introduction
   User moving patterns
       Referring to areas in which mobile users frequently
        travel
   Usefulness of user moving patterns
       Developing efficient location management
       Designing data allocation
       Providing more precise query for location
        dependent data services
Utilizing Moving Patterns in Location Management
   [Figure: a network of cells numbered 1 to 10, split into Partition 1 and Partition 2, with points A, B, C, D, G, H and the pair (A, B)]
Prior Work
   Procedure for incremental mining of moving
    patterns
       Step 1. (Data collection phase). Employing algorithm MM
        to determine maximal moving sequences from log data
       Step 2. (Incremental mining phase). Employing algorithm
        LM to determine large moving sequences for every w
        maximal moving sequences obtained in Step 1
       Step 3. (Pattern generation phase). Determining moving
        patterns from large moving sequences
Introduction (cont’d)
   Moving log is generated to keep track of
    every movement of mobile users
       Large storage space is needed
       Precise moving patterns are not always necessary
   Exploring existing log to mine moving
    patterns
       Call data records (referred to as CDRs)
Characteristics of CDRs

   Uid   Date       Time       Cellid
   1     01/02/01   14:00:28   A
   1     01/02/01   16:01:53   F
Problem
   Given CDRs, mining approximate user
    moving patterns
   [Figure: a 4x4 grid of cells A to P (rows A-D, E-H, I-L, M-P) annotated with the call times 12:30, 15:30, and 16:00]
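  Preliminary
   Rows are moving sequences (MS) and columns are moving sections (S).
   Each cell of the table is a moving record; the number after a cell id is its inter-count.

          S1        S2     S3     S4
   MS1    A:14      A:3           F:1
   MS2    B:1              C:8    C:1,D:1,F:1
   MS3    A:2,C:3   C:1           D:1,F:1
   MS4    A:2,B:3   E:2    F:9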
          Preliminary (cont’d)
   Time projection sequence TPw
   [Figure: moving sequences MS1 to MS4 laid out against moving sections S1 to S6]

        TP4 = { 1, 4, 5 }
Mining Approximate User Moving Pattern
   Original CDRs in database
     -> Data Collection Phase
     -> Time Clustering Phase
     -> Partial Regression Phase
     -> Generation Phase
     -> Approximate User Moving Pattern
         Data Collection Phase
   Main steps for data collection
     Collecting w moving sequences
     Finding large moving sequence
     Obtaining regular moving sequences
Finding Large Moving Sequence
  vertical_min_sup = 2

          S1        S2     S3     S4            S5
   MS1    A:14      A:3           F:1           I:2
   MS2    B:1              C:8    C:1,D:1,F:1   H:1,G:4
   MS3    A:2,C:3   C:1           D:1,F:1       H:1
   MS4    A:2,B:3   A:2    F:9

   L = { {A,B}, {A}, φ, {D,F}, {H} }
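The vertical support count can be sketched in Python. This is a minimal sketch, assuming that a cell's support in a moving section is the number of moving sequences whose record in that section contains the cell (an assumption that reproduces L for the table on this slide). The names MOVING_SEQUENCES and large_moving_sequence are hypothetical.

```python
from collections import Counter

# One dict per moving section (S1..S5): cell id -> inter-count.
# Values copied from the slide's table; an empty dict means no record.
MOVING_SEQUENCES = {
    "MS1": [{"A": 14}, {"A": 3}, {}, {"F": 1}, {"I": 2}],
    "MS2": [{"B": 1}, {}, {"C": 8}, {"C": 1, "D": 1, "F": 1}, {"H": 1, "G": 4}],
    "MS3": [{"A": 2, "C": 3}, {"C": 1}, {}, {"D": 1, "F": 1}, {"H": 1}],
    "MS4": [{"A": 2, "B": 3}, {"A": 2}, {"F": 9}, {}, {}],
}


def large_moving_sequence(sequences, vertical_min_sup):
    """Return, per moving section, the set of cells that reach the support threshold.

    Assumption: support = number of moving sequences containing the cell
    in that section (inter-counts are ignored at this step).
    """
    n_sections = max(len(records) for records in sequences.values())
    large = []
    for i in range(n_sections):
        support = Counter()
        for records in sequences.values():
            if i < len(records):
                support.update(records[i].keys())  # +1 per sequence containing the cell
        large.append({cell for cell, n in support.items() if n >= vertical_min_sup})
    return large
```

Calling large_moving_sequence(MOVING_SEQUENCES, 2) yields one set per section, with an empty set playing the role of φ.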
        Obtaining Regular Moving Sequences
        Match_min_sup = 3
        L = { {A,B}, {A}, φ, {D,F}, {H} }

          S1        S2     S3     S4            S5        Matches   Class
   MS1    A:14      A:3           F:1           I:2       3         regular
   MS2    B:1              C:8    C:1,D:1,F:1   H:1,G:4   3         regular
   MS3    A:2,C:3   C:1           D:1,F:1       H:1       3         regular
   MS4    A:2,B:3   A:2    F:9                            2         irregular
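The match counting on this slide can be sketched as follows. It is a minimal sketch, assuming that a moving section counts as a match when the moving record shares at least one cell with the large set of that section (this assumption reproduces the counts 3, 3, 3, 2 on the slide). The names match_count and split_regular are hypothetical.

```python
# Large moving sequence L from the slide (set() stands for φ).
LARGE = [{"A", "B"}, {"A"}, set(), {"D", "F"}, {"H"}]

# One dict per moving section (S1..S5): cell id -> inter-count.
MOVING_SEQUENCES = {
    "MS1": [{"A": 14}, {"A": 3}, {}, {"F": 1}, {"I": 2}],
    "MS2": [{"B": 1}, {}, {"C": 8}, {"C": 1, "D": 1, "F": 1}, {"H": 1, "G": 4}],
    "MS3": [{"A": 2, "C": 3}, {"C": 1}, {}, {"D": 1, "F": 1}, {"H": 1}],
    "MS4": [{"A": 2, "B": 3}, {"A": 2}, {"F": 9}, {}, {}],
}


def match_count(records, large):
    """Count sections whose record shares at least one cell with L (assumed rule)."""
    return sum(1 for record, cells in zip(records, large)
               if not cells.isdisjoint(record))


def split_regular(sequences, large, match_min_sup):
    """Partition sequence names into (regular, irregular) by match count."""
    regular, irregular = [], []
    for name, records in sequences.items():
        if match_count(records, large) >= match_min_sup:
            regular.append(name)
        else:
            irregular.append(name)
    return regular, irregular
```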
The Output of Data Collection Phase
      L = { {A,B}, {A}, φ, {D,F}, {H} }

   Spatial information:
                            S1          S2     S3     S4         S5
      Conditional support   A:16, B:1   A:1    φ:0    D:2, F:3   H:2

   Temporal information:
      TP4 = { 1, 2, 4, 5 }
      Time Clustering Phase
 Features of time clustering
   Sequential
   Two measures for clustering
       δ: traditional distance-based threshold
       σ²: variance threshold
   The number of clusters does not need to be specified in advance
Quality of Clustering
   [Figure: the time points 1, 4, 5, 9, 12 grouped in two ways, labeled "Low quality clusters" and "High quality clusters"]


 (*) C.-R. Lin and M.-S. Chen. On the Optimal Clustering of Sequential Data. In
 Proceedings of the SIAM International Conference on Data Mining, 2002.
Algorithm TC
   Definition
      A difference sequence D has elements $D_i = TP_w^{i+1} - TP_w^{i}$, where
      $TP_w^{i}$ is the i-th element of the time projection sequence TPw.
Ex:
  TPw = {1,2,3,4,5,9,10,14,17,18,20,24}
  D = {1,1,1,1,4,1,4,3,1,2,4}
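The example can be checked with a short Python sketch; the function name difference_sequence is hypothetical.

```python
def difference_sequence(tp):
    """Return the gaps between consecutive elements of a time projection sequence."""
    return [later - earlier for earlier, later in zip(tp, tp[1:])]


TP = [1, 2, 3, 4, 5, 9, 10, 14, 17, 18, 20, 24]
D = difference_sequence(TP)
```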
 Algorithm TC
Given a distance threshold δ, a variance threshold σ², and a time projection sequence
Step 1: Perform distance-based clustering with threshold δ;
Step 2: Mark all clusters whose variance S² > σ²;
Step 3: For all marked clusters,
              decrease δ and re-cluster;
Step 4: Repeat from Step 2 until δ=0 or no cluster
              is marked;
Step 5: If a marked cluster still exists,
              then call Algorithm TCBK;
An Illustrative Example
     Let δ=3, σ²=1.6, and TPw={1,2,3,4,5,9,10,14,17,18,20,24}

     Clustering with δ=3 gives {{1,2,3,4,5}{9,10}{14,17,18,20}{24}}
      ∵ S²({1,2,3,4,5})=2 and S²({14,17,18,20})=4.69 both exceed σ²
      ∴ set δ=2 and re-cluster the marked clusters {1,2,3,4,5} and {14,17,18,20}

     Result: {{1,2,3,4,5}{9,10}{14}{17,18,20}{24}}
       ∵ S²({1,2,3,4,5})=2 still exceeds σ²
      ∴ set δ=1 and re-cluster the marked cluster {1,2,3,4,5}
                         ……………………
     {1,2,3,4,5} can’t be refined by lowering δ
       ∴ call Algorithm TCBK
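Steps 1 to 4 of Algorithm TC can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: two neighbouring time points stay in one cluster when their gap is at most the distance threshold, the threshold is lowered by 1 per round, no re-clustering happens once it reaches 0, and S² is the population variance (which matches the values 2 and 4.69 in the example). All function names are hypothetical.

```python
def variance(cluster):
    """Population variance of a cluster of time points."""
    mean = sum(cluster) / len(cluster)
    return sum((t - mean) ** 2 for t in cluster) / len(cluster)


def distance_clusters(times, delta):
    """Split a sorted sequence wherever the gap between neighbours exceeds delta."""
    clusters = [[times[0]]]
    for prev, cur in zip(times, times[1:]):
        if cur - prev > delta:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters


def time_clustering(times, delta, sigma2):
    """Return (clusters, marked); marked clusters still have variance > sigma2."""
    clusters = distance_clusters(times, delta)            # Step 1
    while any(variance(c) > sigma2 for c in clusters):    # Step 2
        delta -= 1                                        # Step 3: tighten threshold
        if delta <= 0:                                    # Step 4: assumed stop rule
            break
        refined = []
        for c in clusters:
            if variance(c) > sigma2:
                refined.extend(distance_clusters(c, delta))  # re-cluster marked only
            else:
                refined.append(c)
        clusters = refined
    marked = [c for c in clusters if variance(c) > sigma2]
    return clusters, marked  # Step 5 (TCBK) would process `marked`


clusters, marked = time_clustering(
    [1, 2, 3, 4, 5, 9, 10, 14, 17, 18, 20, 24], delta=3, sigma2=1.6)
```

With the example input, the cluster {1,2,3,4,5} is the only one left marked, which is the case handed over to Algorithm TCBK.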
Algorithm TCBK

    Divide cluster of contiguous number
     sequence into smaller clusters

    Example:
     {1,2,3,4,5,6,7,8,9,10}
       May lose spatial locality.
Steps of Algorithm TCBK
   [Figure: a cluster laid out from start to end, with candidate break points bk/2, bk-1, bk, and bk+1 and six numbered steps (1) to (6)]

   •If bk=start then
           {start,…,end} → {start}{start+1,…,end}
Example for Algorithm TCBK
   Since S²({1,2,3,4,5})=2 > σ²=1.6
       Set bk=3
   Since S²({1,2,3})=0.67 ≤ σ²
       then divide {1,2,3,4,5} into {1,2,3}{4,5}
       Recursively process {4,5}
   Since S²({4,5})=0.25 ≤ σ² and all elements are processed
       then terminate

   The final result:
    { {1,2,3}{4,5}{9,10}{14}{17,18,20}{24} }
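The splitting in this example can be sketched in Python. The slides do not spell out the full break-point search, so this is a simplified sketch under assumptions: the break point starts at the middle element, moves left while the prefix variance exceeds the variance threshold, and the remainder is processed recursively. The names variance and tcbk_split are hypothetical.

```python
def variance(cluster):
    """Population variance of a cluster of time points."""
    mean = sum(cluster) / len(cluster)
    return sum((t - mean) ** 2 for t in cluster) / len(cluster)


def tcbk_split(cluster, sigma2):
    """Split a cluster into pieces whose variance does not exceed sigma2.

    Assumed search rule (a simplification, not the slide's six-step diagram):
    start with the prefix ending at the middle element and shrink it until
    its variance is acceptable; a prefix of one element is always accepted,
    which corresponds to the rule {start,...,end} -> {start}{start+1,...,end}.
    """
    if len(cluster) <= 1 or variance(cluster) <= sigma2:
        return [cluster]
    bk = (len(cluster) + 1) // 2          # prefix length up to the middle element
    while bk > 1 and variance(cluster[:bk]) > sigma2:
        bk -= 1
    return [cluster[:bk]] + tcbk_split(cluster[bk:], sigma2)


pieces = tcbk_split([1, 2, 3, 4, 5], sigma2=1.6)
```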
Partial Regression Phase
     Dividing the movements into two dimensions
     [Figure: a trajectory in the x-y plane decomposed into two projections, Pxt (x against t) and Pyt (y against t)]
Spatial Information

   Coordinates of stations:
      Station   X   Y
      A         1   1
      B         1   1
      D         4   2
      F         3   4
      H         5   2

   Conditional support (used as weight):
      S1          S2     S3     S4         S5
      A:16, B:1   A:1    φ:0    D:2, F:3   H:2
Partial Regression
     Apply weighted least squares (WLS) to Pxt and Pyt of every cluster

     For Pxt, linear estimation function: $\hat{x}(t) = \alpha_0 + \alpha_1 t$

     For Pyt, linear estimation function: $\hat{y}(t) = \beta_0 + \beta_1 t$

     For whole space,
         integral estimation function is:

              $Z_i(t) = (\hat{x}(t), \hat{y}(t))$
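  Partial Regression (cont’d)

$$
T_{mat} = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_n \end{pmatrix}, \quad
W = \begin{pmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots & \\ & & & w_n \end{pmatrix}, \quad
b_x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad
\alpha^* = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix}
$$

Solve normal equation:

$$
(W T_{mat})^T (W T_{mat}) \, \alpha^* = (W T_{mat})^T b_x
$$

to get $\alpha_0$ and $\alpha_1$.
Similarly, we can get $\beta_0$ and $\beta_1$.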
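A weighted straight-line fit can be sketched in plain Python. This is a minimal sketch of textbook weighted least squares, minimizing the weighted sum of squared residuals with the conditional supports taken as weights; that weighting is an assumption and may differ from the matrix form on the slide, so the sketch is not claimed to reproduce the coefficients of the slides' example. The names weighted_line_fit, STATIONS, and OBSERVATIONS are hypothetical.

```python
def weighted_line_fit(ts, vs, ws):
    """Fit v = c0 + c1 * t by minimizing sum(w * (v - c0 - c1 * t) ** 2).

    Returns (c0, c1). Raises ValueError when all time points coincide.
    """
    sw = sum(ws)
    swt = sum(w * t for w, t in zip(ws, ts))
    swv = sum(w * v for w, v in zip(ws, vs))
    swtt = sum(w * t * t for w, t in zip(ws, ts))
    swtv = sum(w * t * v for w, t, v in zip(ws, ts, vs))
    denom = sw * swtt - swt ** 2
    if denom == 0:
        raise ValueError("need at least two distinct time points")
    c1 = (sw * swtv - swt * swv) / denom
    c0 = (swv - c1 * swt) / sw
    return c0, c1


# Station coordinates and (time, station, weight) triples read off the slides.
STATIONS = {"A": (1, 1), "B": (1, 1), "D": (4, 2), "F": (3, 4), "H": (5, 2)}
OBSERVATIONS = [(1, "A", 16), (1, "B", 1), (2, "A", 1),
                (4, "D", 2), (4, "F", 3), (5, "H", 2)]

ts = [t for t, _, _ in OBSERVATIONS]
ws = [w for _, _, w in OBSERVATIONS]
alpha = weighted_line_fit(ts, [STATIONS[s][0] for _, s, _ in OBSERVATIONS], ws)  # x(t)
beta = weighted_line_fit(ts, [STATIONS[s][1] for _, s, _ in OBSERVATIONS], ws)   # y(t)
```

Fitting x and y separately against t mirrors the split into Pxt and Pyt; the two coefficient pairs together define the estimation function of one cluster.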
Example - Pxt

     $P_{xt}: \hat{x}(t) = 0.1768 + 0.7221t, \quad t \in [1,5]$
Example - Pyt

     $P_{yt}: \hat{y}(t) = 0.6142 + 0.2954t, \quad t \in [1,5]$
Example - Z(t)

     $Z(t) = (\hat{x}(t), \hat{y}(t)), \quad t \in [1,5]$
The Output of Partial Regression Phase
      Z1(t),…, Zk(t) are the integral estimation functions of the
      corresponding clusters CL1,…,CLk
     Estimation sequence:

        $Z(t) = \{Z_1(t), Z_2(t), \ldots, Z_k(t)\}$
     Estimation time interval:

        $T = \bigcup_{i=1,2,\ldots,k} \{\, t \mid t \in CL_i \,\}$
    Generation Phase
    Approximate Moving Patterns
       = estimation functions + linking linear functions
    [Figure: estimation functions of adjacent clusters joined by a linking linear function]
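One way to read "linking linear function" is a straight segment that joins the end of one cluster's estimation function to the start of the next. The sketch below follows that reading, which is an assumption since the slides do not define the linking function; the name linking_function and the sample functions z1 and z2 are hypothetical.

```python
def linking_function(z_prev, t_end, z_next, t_start):
    """Return a function of t that linearly joins z_prev(t_end) to z_next(t_start).

    z_prev and z_next map a time to an (x, y) pair; t_end is the last time of
    the earlier cluster and t_start the first time of the later one.
    """
    x0, y0 = z_prev(t_end)
    x1, y1 = z_next(t_start)

    def link(t):
        r = (t - t_end) / (t_start - t_end)  # 0 at t_end, 1 at t_start
        return (x0 + r * (x1 - x0), y0 + r * (y1 - y0))

    return link


# Hypothetical estimation functions for two adjacent clusters.
z1 = lambda t: (t, 1.0)
z2 = lambda t: (10.0, t)
link = linking_function(z1, 5, z2, 9)
```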
How to Use Approximate User
Moving Patterns?
   Once the approximate user moving pattern is obtained, a
    transformation between the curve and the base stations lets us
    predict which base stations the user will pass by.

   An easy way: find the base station whose coordinates are
    “closest” to the curve.
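   [Figure: an estimated trajectory passing near base stations A to F, with the times 1:00, 1:30, and 2:00 marked along the curve]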
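The "closest base station" lookup can be sketched in Python. This is a minimal sketch, assuming Euclidean distance, the station coordinates from the Spatial Information slide, and the example coefficients with both signs taken as positive (the operators are not legible in the source). The names STATIONS, z, and closest_station are hypothetical.

```python
import math

# Station coordinates as listed on the Spatial Information slide.
STATIONS = {"A": (1, 1), "B": (1, 1), "D": (4, 2), "F": (3, 4), "H": (5, 2)}


def z(t):
    """Estimated position at time t, for t in [1, 5] (signs assumed positive)."""
    return (0.1768 + 0.7221 * t, 0.6142 + 0.2954 * t)


def closest_station(point, stations):
    """Return the name of the station nearest to point (Euclidean distance)."""
    return min(stations, key=lambda name: math.dist(point, stations[name]))


predicted = [closest_station(z(t), STATIONS) for t in (1, 5)]
```

Sampling z(t) at several times and collecting the closest station at each gives a predicted sequence of base stations along the curve.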
Conclusions
   Mining approximate user moving patterns
       Utilize existing log (CDRs)
       Develop a procedure of mining user moving
        patterns
   Future work
       Conducting experiments to verify our proposed
        methods

				