A Short Introduction to Sequential Data Mining

Koji IWANUMA, Hidetomo NABESHIMA
University of Yamanashi
The First Franco-Japanese Symposium on Knowledge Discovery in System Biology, September 17, Aix-en-Provence

Two Main Frameworks of Sequential Mining
- Sequential pattern mining for multiple data sequences
    Sequence ID: 1, 2, 3, 4, 5
    Purchase data records: <(wheat, milk), bread, (berry, sausage)>, <(bread, pumpkin, sausage)>
- Sequential pattern mining for a single data sequence
    [Figure: data sequence]

What Is Sequential Pattern Mining?
Source: J. Han and M. Kamber. Data Mining: Concepts and Techniques, www.cs.uiuc.edu/~hanji
- Given a set of sequences, find the complete set of frequent subsequences.
- A sequence: <(ef)(ab)(df)cb>
- A sequence database with SIDs 10, 20, 30, 40; its sequences include <(ad)c(bc)(ae)> and <(ef)(ab)(df)cb>.
- An element may contain a set of items. Items within an element are unordered and are listed alphabetically.
- [...] is a subsequence of [...]
- Given support threshold min_sup = 2, <(ab)c> is a sequential pattern.

Challenges on Sequential Pattern Mining
- A huge number of possible sequential patterns are hidden in databases.
- A mining algorithm should
    - find the complete set of patterns, when possible, satisfying the minimum support (frequency) threshold
    - be highly efficient and scalable, involving only a small number of database scans
    - be able to incorporate various kinds of user-specific constraints

Sequential Pattern Mining Algorithms for Multiple Data Sequences
- Apriori-based method: GSP (Generalized Sequential Patterns: Srikant & Agrawal @ EDBT’96)
- Pattern-growth methods: FreeSpan & PrefixSpan (Han et al. @ KDD’00; Pei et al. @ ICDE’01)
- Vertical format-based mining: SPADE (Zaki @ Machine Learning’00)
- Constraint-based sequential pattern mining (SPIRIT: Garofalakis, Rastogi, Shim @ VLDB’99; Pei, Han, Wang @ CIKM’02)
- Mining closed sequential patterns: CloSpan (Yan, Han & Afshar @ SDM’03)
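The definitions in "What Is Sequential Pattern Mining?" (a sequence of elements, each element a set of items; a subsequence; a support count) can be illustrated with a minimal Python sketch. The function names `parse`, `is_subsequence`, and `support` are hypothetical, and the database holds only the two sequences that appear in the text above, so the support values it gives need not match those of the full four-sequence database on the slide.

```python
from typing import FrozenSet, List, Sequence

Element = FrozenSet[str]   # one element: an unordered set of items
Seq = List[Element]        # a sequence: an ordered list of elements


def parse(elements: Sequence[str]) -> Seq:
    """Turn ["ad", "c", "bc"] into <(ad)c(bc)>; each string is one element."""
    return [frozenset(e) for e in elements]


def is_subsequence(pattern: Seq, sequence: Seq) -> bool:
    """True if every pattern element is contained in a distinct element of
    `sequence`, with the order of elements preserved (greedy earliest match)."""
    pos = 0
    for element in sequence:
        if pos < len(pattern) and pattern[pos] <= element:
            pos += 1
    return pos == len(pattern)


def support(pattern: Seq, database: List[Seq]) -> int:
    """Number of database sequences that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in database)


# Only the two sequences legible in the text; the slide's database has four.
database = [
    parse(["ad", "c", "bc", "ae"]),       # <(ad)c(bc)(ae)>
    parse(["ef", "ab", "df", "c", "b"]),  # <(ef)(ab)(df)cb>
]

print(support(parse(["c", "b"]), database))   # support of <cb>
print(support(parse(["ab", "c"]), database))  # support of <(ab)c>
```

A pattern is reported as frequent when its support reaches min_sup. Because this count can only stay the same or shrink as a pattern is extended, Apriori-style methods such as GSP can prune any candidate that has an infrequent subsequence.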
Mining Sequential Patterns from a Very-Long Single Sequence
- A series of daily newspaper articles: < typhoon flood, landslide typhoon flood, landslide >

Sequential Pattern Mining Algorithms for a Single Data Sequence
- Discovery of frequent episodes in event sequences, based on a sliding-window system [Mannila 1998]: the frequency measure is anti-monotonic, but it has a problem: a single occurrence can be counted more than once (duplicate counting).
- Asynchronous periodic pattern mining [Yang et al. 2000, Huang 2004]: no anti-monotonic frequency measure is investigated.
- On-line approximation algorithm for mining frequent items, not frequent subsequences: the lossy counting algorithm [Manku and Motwani, VLDB’02]

Research in Our Laboratory
- Sequential data mining from a very large single data sequence.
- Main target: sequential textual data, especially newspaper-article corpora
    - Objective: to generate a robust and useful large-scale event-sequence corpus.
    - Application 1: topic tracking/detection in information retrieval.
    - Application 2: automated content tracking on the Web.
    - Application 3: semi-automatic scenario/story creation.
- Ordinary temporal data analysis: various log data in computer systems, genetic information, etc.

Technical Topics (1/2)
- A new framework for extracting frequent subsequences from a single long data sequence, presented at the IEEE International Conference on Data Mining 2005 (ICDM2005):
    - A new rational frequency measure, which satisfies the Apriori (anti-monotonic) property and has no duplicate counting.
    - A fast on-line algorithm for a limited case.

Technical Topics (2/2)
Ongoing and future work:
- On-line rational filters, based on confidence criteria and/or information gain, for eliminating redundant and valueless sequences from system output.
- Methods for finding meta-structures embedded in the huge number of frequent sequences generated by a system.
    - A method using compression based on context-free grammar inference/learning.
- A faster extraction algorithm based on a method for simultaneously searching multiple strings over compressed data.

References:
Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques (Chapter 8). www.cs.uiuc.edu/~hanj

Thanks for your attention!!
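The duplicate-counting problem noted for the sliding-window frequency measure [Mannila 1998] can be illustrated with a small Python sketch. It is a simplified reading of the window-based idea, not the published algorithm: the integer timestamps are invented for illustration, the event names follow the typhoon/flood/landslide example, and `contains_serial` and `window_count` are hypothetical names.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (time, event type); events are assumed sorted by time


def contains_serial(window: List[Event], episode: List[str]) -> bool:
    """True if the event types of `episode` occur in `window` in this order."""
    pos = 0
    for _, kind in window:
        if pos < len(episode) and kind == episode[pos]:
            pos += 1
    return pos == len(episode)


def window_count(events: List[Event], episode: List[str], width: int) -> int:
    """Count the windows [start, start + width) that contain the episode.
    Windows slide one time unit at a time and may overlap the sequence ends."""
    if not events:
        return 0
    first, last = events[0][0], events[-1][0]
    count = 0
    for start in range(first - width + 1, last + 1):
        window = [e for e in events if start <= e[0] < start + width]
        if contains_serial(window, episode):
            count += 1
    return count


# Hypothetical timestamps: the three-event story occurs exactly twice.
events = [
    (1, "typhoon"), (2, "flood"), (3, "landslide"),
    (8, "typhoon"), (9, "flood"), (10, "landslide"),
]

short = window_count(events, ["typhoon", "flood"], width=4)
longer = window_count(events, ["typhoon", "flood", "landslide"], width=4)
print(short, longer)  # prints "6 4"
```

With these assumed timestamps, "typhoon then flood" really occurs twice but is counted in six windows, because each occurrence falls inside several overlapping windows. The count is still anti-monotonic: the longer episode is found in four windows, no more than its sub-episode.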
