Mining Approximate User Moving Patterns
Presented by Chih-Chieh Hung
2004/4/9

Outline
- Introduction
- Preliminary
- Mining Approximate User Moving Patterns
- Conclusions

Introduction
- User moving patterns: the areas in which mobile users frequently travel
- Uses of user moving patterns:
  - developing efficient location management
  - designing data allocation
  - providing more precise queries for location-dependent data services

Utilizing Moving Patterns in Location Management
[Figure: cells grouped into Partition 1 and Partition 2 using the moving pattern (A, B)]

Prior Work
Procedure for incremental mining of moving patterns:
Step 1 (data collection phase). Use algorithm MM to determine maximal moving sequences from the log data.
Step 2 (incremental mining phase). Use algorithm LM to determine large moving sequences for every w maximal moving sequences obtained in Step 1.
Step 3 (pattern generation phase). Determine moving patterns from the large moving sequences.

Introduction (cont'd)
- A moving log is generated to keep track of every movement of mobile users:
  - a large storage space is needed
  - precise moving patterns are not always necessary
- Instead, explore an existing log to mine moving patterns: call data records (referred to as CDRs)

Characteristics of CDRs
Uid | Date     | Time     | Cellid
1   | 01/02/01 | 14:00:28 | A
1   | 01/02/01 | 16:01:53 | F

Problem
Given CDRs, mine approximate user moving patterns.
[Figure: cells A-P with calls recorded at 12:30, 15:30 and 16:00]

Preliminary
Rows are moving sequences MS1-MS4 and columns are moving sections S1-S4; each entry is a moving record written as cell:inter-count (e.g. A:14). Non-empty sections only, in order:
MS1: A:14 | A:3 | F:1
MS2: B:1 | C:8 | C:1, D:1, F:1
MS3: A:2, C:3 | C:1 | D:1, F:1
MS4: A:2, B:3 | E:2 | F:9

Preliminary (cont'd)
Time projection sequence $TP_w$
[Figure: moving sequences MS1-MS4 over moving sections S1-S6]
TP4 = {1, 4, 5}

Mining Approximate User Moving Patterns
Original CDRs in database → Data Collection Phase → Time Clustering Phase → Partial Regression Phase → Generation Phase → Approximate user moving pattern

Data Collection Phase
Main steps for data collection:
1. Collect w moving sequences
2. Find the large moving sequence
3. Obtain regular moving sequences

Finding Large Moving Sequence (vertical_min_sup = 2)
Moving records of MS1-MS4 over sections S1-S5 (non-empty sections only, in order):
MS1: A:14 | A:3 | F:1 | I:2
MS2: B:1 | C:8 | C:1, D:1, F:1 | H:1, G:4
MS3: A:2, C:3 | C:1 | D:1, F:1 | H:1
MS4: A:2, B:3 | A:2 | F:9
L = { {A, B}, {A}, φ, {D, F}, {H} }

Obtaining Regular Moving Sequences (Match_min_sup = 3)
Matching the sequences above against L = { {A, B}, {A}, φ, {D, F}, {H} }:
- MS1, MS2, MS3: ×3 matches → regular
- MS4: ×2 matches → irregular

The Output of Data Collection Phase
L = { {A, B}, {A}, φ, {D, F}, {H} }
Spatial information (conditional support): S1: A:16, B:1; S2: A:1; S3: φ:0; S4: D:2, F:3; S5: H:2
Temporal information: TP4 = {1, 2, 4, 5}

Time Clustering Phase
Features of time clustering:
- the data are sequential
- two measures for clustering: δ, a traditional distance-based threshold, and σ², a variance threshold
- the number of clusters does not need to be specified

Quality of Clustering
[Figure: the points 1, 4, 5, 9, 12 grouped into low-quality clusters and into high-quality clusters (*)]
(*) M.-S. Chen and C.-R. Lin. On the Optimal Clustering of Sequential Data. In Proceedings of the SIAM International Conference on Data Mining, 2002.

Algorithm TC
Definition: the difference sequence of $TP_w$ is $D_i = TP_w^{i+1} - TP_w^{i}$, where $TP_w^{i}$ is the $i$-th element of the time projection sequence $TP_w$.
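The data collection steps and the difference sequence can be sketched in Python. This is a minimal sketch under assumptions, not the authors' code: the function names are hypothetical, a cell is taken to be large in a section when it appears in at least vertical_min_sup of the w moving sequences, and $TP_w$ is taken to list the sections whose large cell set is non-empty. The section in which each record sits is an assumed reading of the flattened table, chosen so that it reproduces L and TP4 = {1, 2, 4, 5} above. The regular-sequence filter (Match_min_sup) is left out because the slides do not spell out the matching rule.

```python
from collections import Counter
from typing import Dict, List, Set

# A moving sequence is a list of moving sections; each section maps a cell id
# to its inter-count, e.g. {"A": 14}. An empty dict is an empty section.
MovingSequence = List[Dict[str, int]]


def large_moving_sequence(sequences: List[MovingSequence],
                          vertical_min_sup: int) -> List[Set[str]]:
    """Per section, keep the cells that occur in at least vertical_min_sup sequences."""
    large = []
    for i in range(len(sequences[0])):
        # Count how many sequences contain each cell in section i (not the call counts).
        support = Counter(cell for seq in sequences for cell in seq[i])
        large.append({cell for cell, n in support.items() if n >= vertical_min_sup})
    return large


def time_projection(large: List[Set[str]]) -> List[int]:
    """1-based indices of the sections whose large cell set is non-empty."""
    return [i + 1 for i, cells in enumerate(large) if cells]


def difference_sequence(tp: List[int]) -> List[int]:
    """D_i = TP^{i+1} - TP^i."""
    return [b - a for a, b in zip(tp, tp[1:])]


# Records of MS1..MS4 over S1..S5; the section of each record is an assumed
# (hypothetical) placement of the flattened slide table.
MS = [
    [{"A": 14}, {"A": 3}, {}, {"F": 1}, {"I": 2}],
    [{"B": 1}, {"C": 8}, {}, {"C": 1, "D": 1, "F": 1}, {"H": 1, "G": 4}],
    [{"A": 2, "C": 3}, {}, {"C": 1}, {"D": 1, "F": 1}, {"H": 1}],
    [{"A": 2, "B": 3}, {"A": 2}, {}, {"F": 9}, {}],
]

L = large_moving_sequence(MS, vertical_min_sup=2)
TP = time_projection(L)
print(TP)                       # [1, 2, 4, 5]
print(difference_sequence(TP))  # [1, 2, 1]
```

Counting sequences rather than summing inter-counts is what makes A:14 in MS1 count once toward the vertical support of A in S1.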
Example: $TP_w$ = {1, 2, 3, 4, 5, 9, 10, 14, 17, 18, 20, 24} gives D = {1, 1, 1, 1, 4, 1, 4, 3, 1, 2, 4}.

Algorithm TC
Given δ, σ² and a time projection sequence:
Step 1: distance-based clustering;
Step 2: mark all clusters with S² > σ²;
Step 3: for all marked clusters, decrease δ and re-cluster;
Step 4: repeat from Step 2 until δ = 0 or no cluster is marked;
Step 5: if marked clusters remain, call Algorithm TCBK.

An Illustrative Example
Let δ = 3 and σ² = 1.6.
{1,2,3,4,5,9,10,14,17,18,20,24} → {{1,2,3,4,5}{9,10}{14,17,18,20}{24}}
∵ S²({1,2,3,4,5}) = 2 and S²({14,17,18,20}) = 4.69, ∴ set δ = 2:
{{1,2,3,4,5}{9,10}{14,17,18,20}{24}} → {{1,2,3,4,5}{9,10}{14}{17,18,20}{24}}
∵ S²({1,2,3,4,5}) = 2, ∴ set δ = 1:
{{1,2,3,4,5}{9,10}{14}{17,18,20}{24}} (unchanged)
……
{1,2,3,4,5} can't be refined, ∴ call Algorithm TCBK.

Algorithm TCBK
Divides a cluster that is a contiguous number sequence (e.g. {1,2,3,4,5,6,7,8,9,10}) into smaller clusters; this may lose spatial locality.

Steps of Algorithm TCBK
[Figure: steps (1)-(6) moving the breakpoint bk between start and end, with positions start, bk/2, bk-1, bk, bk+1 and end marked]
- If bk = start, then {start, …, end} → {start}{start+1, …, end}.

Example for Algorithm TCBK
Since S²({1,2,3,4,5}) = 2, set bk = 3.
Since S²({1,2,3}) = 0.67, divide {1,2,3,4,5} into {1,2,3}{4,5}.
Recursively process {4,5}: since S²({4,5}) = 0.25 and all elements are processed, terminate.
Final result: {{1,2,3}{4,5}{9,10}{14}{17,18,20}{24}}

Partial Regression Phase
Divide the movements into two dimensions.
[Figure: the movement projected onto $P_{xt}$ (x against t) and $P_{yt}$ (y against t)]

Spatial information (coordinates of stations and weights):
Station | X | Y
A       | 1 | 1
B       | 1 | 1
D       | 4 | 2
F       | 3 | 4
H       | 5 | 2
Weights (conditional support): S1: A:16, B:1; S2: A:1; S3: φ:0; S4: D:2, F:3; S5: H:2

Partial Regression
Apply WLS (weighted least squares) to $P_{xt}$ and $P_{yt}$ of every cluster.
- For $P_{xt}$, the linear estimation function is $\hat{x}(t) = \beta_0 + \beta_1 t$.
- For $P_{yt}$, the linear estimation function is $\hat{y}(t) = \gamma_0 + \gamma_1 t$.
- For the whole space, the integral estimation function is $Z_i(t) = (\hat{x}(t), \hat{y}(t))$.

Partial Regression (cont'd)
$T_{mat} = \begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_n \end{bmatrix}$, $W = \mathrm{diag}(w_1, w_2, \dots, w_n)$, $b_x = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$, $x = (x_1, x_2, \dots, x_n)^\top$
Solve the normal equation $(W T_{mat})^\top (W T_{mat})\, b_x = (W T_{mat})^\top W x$ to get $\beta_0$ and $\beta_1$.
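Algorithm TC, the TCBK fallback and the weighted least-squares fit can be sketched in Python. This is a sketch under stated assumptions, not the authors' implementation: S² is taken as the population variance (it reproduces S²({1,2,3,4,5}) = 2 and S²({14,17,18,20}) ≈ 4.69), Step 1 starts a new cluster whenever $D_i > \delta$, δ drops by 1 per round, and the TCBK breakpoint rule (start at the middle element, halve toward start while the prefix variance exceeds σ²) is an assumed reading of the steps figure. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def s2(cluster: Sequence[float]) -> float:
    """Population variance S^2, e.g. S^2({1,2,3,4,5}) = 2."""
    mean = sum(cluster) / len(cluster)
    return sum((v - mean) ** 2 for v in cluster) / len(cluster)


def distance_cluster(tp: Sequence[int], delta: int) -> List[List[int]]:
    """Start a new cluster wherever consecutive time points differ by more than delta."""
    clusters = [[tp[0]]]
    for prev, cur in zip(tp, tp[1:]):
        if cur - prev > delta:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters


def tcbk(cluster: List[int], sigma2: float) -> List[List[int]]:
    """Split a cluster whose variance exceeds sigma2 at a breakpoint (assumed rule)."""
    if len(cluster) == 1 or s2(cluster) <= sigma2:
        return [cluster]
    bk = (len(cluster) + 1) // 2          # start at the middle element
    while bk > 1 and s2(cluster[:bk]) > sigma2:
        bk //= 2                           # move the breakpoint toward start
    # bk == 1 is the "bk = start" case: {start}{start+1, ..., end}
    return [cluster[:bk]] + tcbk(cluster[bk:], sigma2)


def tc(tp: Sequence[int], delta: int, sigma2: float) -> List[List[int]]:
    clusters = distance_cluster(tp, delta)             # Step 1
    while any(s2(c) > sigma2 for c in clusters):       # Steps 2 and 4
        delta -= 1                                     # Step 3
        if delta <= 0:
            break
        refined: List[List[int]] = []
        for c in clusters:
            refined.extend(distance_cluster(c, delta) if s2(c) > sigma2 else [c])
        clusters = refined
    # Step 5: clusters that are still marked are split by TCBK
    return [part for c in clusters for part in tcbk(c, sigma2)]


def wls_line(ts, xs, ws) -> Tuple[float, float]:
    """Solve (W T)^T (W T) b = (W T)^T W x for x(t) = b0 + b1*t, W = diag(ws)."""
    # Scaling row i of T = [1, t_i] and x_i by w_i gives squared-error weight w_i^2.
    a00 = sum(w * w for w in ws)
    a01 = sum(w * w * t for t, w in zip(ts, ws))
    a11 = sum(w * w * t * t for t, w in zip(ts, ws))
    r0 = sum(w * w * x for x, w in zip(xs, ws))
    r1 = sum(w * w * t * x for t, x, w in zip(ts, xs, ws))
    det = a00 * a11 - a01 * a01
    if det == 0:
        raise ValueError("need two distinct time points with non-zero weight")
    return (r0 * a11 - a01 * r1) / det, (a00 * r1 - a01 * r0) / det


print(tc([1, 2, 3, 4, 5, 9, 10, 14, 17, 18, 20, 24], delta=3, sigma2=1.6))
# [[1, 2, 3], [4, 5], [9, 10], [14], [17, 18, 20], [24]]
print(wls_line([1, 2, 4, 5], [3, 5, 9, 11], [16, 1, 2, 2]))
# (1.0, 2.0): points on x = 1 + 2t are fitted exactly
```

With δ = 3 and σ² = 1.6 the sketch reproduces the final result of the illustrative example. wls_line follows the normal equation literally, so each weight $w_i$ enters the squared error as $w_i^2$; passing square roots of the conditional supports instead would give ordinary weighted least squares with the supports as weights.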
Similarly, $\gamma_0$ and $\gamma_1$ are obtained from $P_{yt}$.

Example - $P_{xt}$
$\hat{x}(t) = 0.1768 + 0.7221t$, $t \in [1, 5]$

Example - $P_{yt}$
$\hat{y}(t) = 0.6142 + 0.2954t$, $t \in [1, 5]$

Example - $Z(t)$
$Z(t) = (\hat{x}(t), \hat{y}(t))$, $t \in [1, 5]$

The Output of Partial Regression Phase
- $Z_1(t), \dots, Z_k(t)$ are the integral estimation functions of the corresponding clusters $CL_1, \dots, CL_k$
- Estimation sequence: $Z(t) = \{Z_1(t), Z_2(t), \dots, Z_k(t)\}$
- Estimation time interval: $T_i = \{t \mid t \in CL_i\}$, $i = 1, 2, \dots, k$

Generation Phase
Approximate moving pattern = estimation functions + linking linear functions
[Figure: estimation functions of consecutive clusters joined by linking linear functions]

How to Use Approximate User Moving Patterns?
Once the approximate user moving pattern is obtained, mapping the curve onto the base stations lets us predict which base stations the user will pass. An easy way is to pick the base station whose coordinates are closest to the curve.
[Figure: the pattern curve over base stations A-F, with time labels 1:00, 1:30 and 2:00]

Conclusions
- Mining approximate user moving patterns:
  - utilizes an existing log (CDRs)
  - develops a procedure for mining user moving patterns
- Future work: conducting experiments to verify the proposed methods
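The generation phase and the closest-base-station lookup can be sketched in Python. Assumptions, none of them stated by the slides: each estimation function $Z_i$ is valid on $[\min CL_i, \max CL_i]$, a linking linear function joins $Z_i(\max CL_i)$ to $Z_{i+1}(\min CL_{i+1})$ with a straight line, and "closest" means Euclidean distance. The class and function names and the second cluster in the usage are hypothetical; the first estimation function and the station coordinates come from the examples above.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Estimation:
    """Z_i(t) = (beta0 + beta1*t, gamma0 + gamma1*t) on [t_start, t_end]."""
    t_start: float
    t_end: float
    bx: Tuple[float, float]   # (beta0, beta1) of x(t)
    by: Tuple[float, float]   # (gamma0, gamma1) of y(t)

    def at(self, t: float) -> Point:
        return (self.bx[0] + self.bx[1] * t, self.by[0] + self.by[1] * t)


def approximate_pattern(estimations: List[Estimation]) -> Callable[[float], Point]:
    """Join consecutive estimation functions with linking linear functions."""
    segs = sorted(estimations, key=lambda e: e.t_start)

    def z(t: float) -> Point:
        for i, cur in enumerate(segs):
            if cur.t_start <= t <= cur.t_end:
                return cur.at(t)
            nxt = segs[i + 1] if i + 1 < len(segs) else None
            if nxt is not None and cur.t_end < t < nxt.t_start:
                # Linking linear function: straight line from Z_i(end) to Z_{i+1}(start).
                (x0, y0), (x1, y1) = cur.at(cur.t_end), nxt.at(nxt.t_start)
                f = (t - cur.t_end) / (nxt.t_start - cur.t_end)
                return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
        raise ValueError("t lies outside the pattern's time range")

    return z


def closest_station(p: Point, stations: Dict[str, Point]) -> str:
    """Pick the base station whose coordinates are nearest (Euclidean) to p."""
    return min(stations, key=lambda name: math.dist(p, stations[name]))


stations = {"A": (1, 1), "B": (1, 1), "D": (4, 2), "F": (3, 4), "H": (5, 2)}
first = Estimation(1, 5, (0.1768, 0.7221), (0.6142, 0.2954))  # example Z(t)
second = Estimation(9, 10, (1.0, 0.5), (0.0, 0.25))           # hypothetical cluster
z = approximate_pattern([first, second])
print(closest_station(z(5), stations))  # D
```

Between t = 5 and t = 9 the sketch returns points on the linking line, so a station can be predicted even at times not covered by any cluster.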