Shared by: ramesh muvvala
posted:
1/1/2009
Spatial Data Mining

Yang Yubin
Joint Laboratory for Geoinformation Science
The Chinese University of Hong Kong
yangyubin@cuhk.edu.hk

2004/09/09 Hong Kong Observatory Hong Kong Meteorological Society

Agenda
• Motivation and General Description
• Data Mining: Basic Concepts
• Data Mining Techniques
• Spatial Data Mining
• Spatial Data Mining Scenarios in Meteorology and Weather Forecasting
• Conclusions
• Questions & Discussions

Why do we need Data Mining?
• Large number of records (cases) (10^8 to 10^12 bytes)
  – One thousand (10^3) bytes = 1 kilobyte (KB)
  – One million (10^6) bytes = 1 megabyte (MB)
  – One billion (10^9) bytes = 1 gigabyte (GB)
  – One trillion (10^12) bytes = 1 terabyte (TB)
• High-dimensional data (variables)
  – 10 to 10^4 attributes
• Only a small portion, typically 5% to 10%, of the collected data is ever analyzed
• We are drowning in data, but starving for knowledge!

Scientific Viewpoint
• Data collected and stored at enormous speeds (Gbyte/hour)
  – remote sensor on a satellite
  – telescope scanning the skies
  – scientific simulations generating terabytes of data
• Classical modeling techniques are infeasible
• Data reduction
• Cataloging, classifying, segmenting data
• Helps scientists in hypothesis formation

Current Situations (1)
• Great efforts go into the construction and maintenance of large information databases
• Data cannot be analyzed by standard statistical methods
  – numerous missing records
  – data are qualitative rather than quantitative
• We do not always know what information might be represented or how relevant it might be to the questions

Current Situations (2)
• The ways and means of using all this data lag far behind the increase in available data
  – Information can only be found with a lot of coincidence (internet)
  – Information is not explicitly available (company databases)
  – Information is only accessible to human eyes by using lots of processing power (astronomical, meteorological and earth observation data)
• This leads to a clear demand for means of uncovering the information and knowledge hidden in the massive quantities of data

What is Data Mining?
• Data mining is concerned with solving problems by analyzing existing data
• "Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from huge amounts of data"
• Alternative name: Knowledge Discovery in Databases (KDD)
  – A term that originated in the Artificial Intelligence (AI) field
  – KDD consists of several steps (one of which is Data Mining)

Data Mining vs. KDD
• Knowledge Discovery in Databases (KDD): the whole process of finding useful information and patterns in data
• Data Mining: the use of algorithms to extract the information and patterns derived by the KDD process
• Data mining is the core of the knowledge discovery process

KDD Process
• Selection: Obtain data from various sources.
• Preprocessing: Cleanse data.
• Transformation: Convert to common format. Transform to new format.
• Data Mining: Obtain desired results.
• Interpretation/Evaluation: Present results to the user in a meaningful manner.

Data Mining: A KDD Process
– Data mining: core of knowledge discovery process
[Figure: KDD pipeline. Databases → Data Cleaning / Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]

Typical Data Mining Architecture
[Figure: layered architecture with the labels Graphical user interface, Pattern evaluation, Data mining engine, Knowledge-base, Database or data warehouse server, Data cleaning & data integration, Filtering, Databases, Data Warehouse]

Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the centre, surrounded by Database Systems, Statistics, Machine Learning, Visualization, Information Theory, Algorithms, …, Other Disciplines]

Data Mining is:
• A "hot" word for a class of techniques that find patterns in data
• A user-centric, interactive process which leverages analysis technologies and computing power
• A group of techniques that find relationships that have not previously been discovered
• Not reliant on an existing database
• A relatively easy task that requires knowledge of the business problem/subject matter expertise

Experts and clients are needed to:
• Define and redefine problems
• Determine relevant aspects of the problem
• Supply the data
• Remove errors from the data
• Provide constraints on possible patterns
• Interpret patterns and possibly reject implausible ones
• Evaluate predicted effects…

Primary Data Mining Tasks (1)
• Descriptive Modeling
  – Finding a compact description for a large dataset [Concept Description]
  – Clustering people or things into groups based on their attributes [Clustering]
  – Associating what events are likely to occur together [Association Rule]
  – Sequencing what events are likely to lead to later events [Sequential Pattern Analysis]
  – Discovering the most significant changes [Deviation Detection]

Primary Data Mining Tasks (2)
• Predictive Modeling
  – Classifying people or things into groups by recognizing patterns [Classification]
  – Forecasting what may happen in the future by mapping a data item to a real-valued prediction variable [Regression]

Concept Description
• Characterization: provides a concise and succinct summarization of the given collection of data
• Discrimination: provides descriptions comparing two or more collections of data
• Can handle complex data types of the attributes
• A more automated process

Concept description: Characterization

Initial Relation
Name | Gender | Major | Birth-Place | Birth_date | Residence | Phone # | GPA
Jim Woodman | M | CS | Vancouver, BC, Canada | 8-12-76 | 3511 Main St., Richmond | 687-4598 | 3.67
Scott Lachance | M | CS | Montreal, Que, Canada | 28-7-75 | 345 1st Ave., Richmond | 253-9106 | 3.70
Laura Lee | F | Physics | Seattle, WA, USA | 25-8-70 | 125 Austin Ave., Burnaby | 420-5232 | 3.83
… | … | … | … | … | … | … | …
Generalization: Removed | Retained | Sci, Eng, Bus | Country | Age range | City | Removed | Excl, VG, ..

Generalized Relation
Gender | Major | Birth_region | Age_range | Residence | GPA | Count
M | Science | Canada | 20-25 | Richmond | Very-good | 16
F | Science | Foreign | 25-30 | Burnaby | Excellent | 22
… | … | … | … | … | … | …

Count by Gender and Birth_Region
Gender | Canada | Foreign | Total
M | 16 | 14 | 30
F | 10 | 22 | 32
Total | 26 | 36 | 62

Clustering
• Cluster: a collection of data objects
  – Similar to one another within the same cluster
  – Dissimilar to the objects in other clusters
• Clustering
  – Grouping a set of data objects into clusters based on the principle: maximizing the intra-class similarity and minimizing the inter-class similarity
• Example
  – Land use: identification of areas of similar land use in an earth observation database
  – City planning: identifying groups of houses according to their house type, value, and geographical location

Association rule
• Association (correlation and causality)
  – age(X, "20..29") ^ income(X, "20..29K") → buys(X, "PC") [support = 2%, confidence = 60%]
• Association rule mining
  – Finding frequent patterns, associations, correlations among sets of items or objects in transaction databases, relational databases, and other information repositories
  – Frequent pattern: a pattern (set of items, sequence, etc.) that occurs frequently in a database
• Motivation: finding regularities in data
  – What products were often purchased together?

Example: Association rule
Transaction-id | Items bought
10 | a1, a2, a3
20 | a1, a3
30 | a1, a4
40 | a2, a5, a6

• Itemset A1, A2 = {a1, …, ak}
• Find all the rules A1 → A2 with minimum confidence and support
  – support, s: probability that a transaction contains A1 ∪ A2
  – confidence, c: conditional probability that a transaction having A1 also contains A2
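The two definitions can be checked mechanically on the four transactions in the example. What follows is a minimal Python sketch, not the presenter's method: it brute-forces rules between single items only, assumes 50% thresholds for both measures, and the function names (`support`, `confidence`, `single_item_rules`) are illustrative choices that do not appear in the slides.

```python
from itertools import permutations

# Transactions from the example table (transaction-id -> items bought).
transactions = {
    10: {"a1", "a2", "a3"},
    20: {"a1", "a3"},
    30: {"a1", "a4"},
    40: {"a2", "a5", "a6"},
}


def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= items for items in db.values()) / len(db)


def confidence(lhs, rhs, db):
    """Conditional probability that a transaction with `lhs` also has `rhs`."""
    return support(set(lhs) | set(rhs), db) / support(lhs, db)


def single_item_rules(db, min_support=0.5, min_conf=0.5):
    """Brute-force all rules x -> y between single items (toy sizes only)."""
    items = sorted(set().union(*db.values()))
    rules = []
    for x, y in permutations(items, 2):
        s = support({x, y}, db)
        if s < min_support:
            continue  # the pair is not frequent enough
        c = confidence({x}, {y}, db)
        if c >= min_conf:
            rules.append((x, y, s, c))
    return rules


rules = single_item_rules(transactions)
for x, y, s, c in rules:
    print(f"{x} -> {y}  support={s:.1%}  confidence={c:.1%}")
```

Exhaustive enumeration is only workable for a toy table like this one; practical association rule miners prune the search using the fact that every subset of a frequent itemset must itself be frequent.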
Let min_support = 50%, min_conf = 50%:
  a1 → a3 (50%, 66.7%)
  a3 → a1 (50%, 100%)

Sequential Pattern Analysis
• Given a set of sequences, find the complete set of frequent subsequences
  SIDs: 10, 20, 30, 40
  Sequences: <(ad)c(bc)(ae)>, <(ef)(ab)(df)cb>, …
• Given support threshold min_sup = 2, <(ab)c> is a sequential pattern
• Applications of sequential patterns
  – Customer shopping sequences: first buy a computer, then a CD-ROM, and then a digital camera, within 3 months
  – Weblog click streams
  – Telephone calling patterns

Deviation Detection
• Outlier analysis
  – Outlier: a data object that does not comply with the general behavior of the data
  – It can be considered as noise or an exception, but is quite useful in fraud detection and rare events analysis
• Trend and evolution analysis
  – Trend and deviation: regression analysis
  – Periodicity analysis
  – Similarity-based analysis

Classification and Regression
• Classification:
  – constructs a model (classifier) based on the training set and uses it in classifying new data
  – Example: climate classification, …
• Regression:
  – models continuous-valued functions, i.e., predicts unknown or missing values
  – Example: stock trend prediction, …

Classification (1): Model Construction
Training Data → Classification Algorithms → Classifier (Model)

NAME | RANK | YEARS | TENURED
Mike | Assistant Prof | 3 | no
Mary | Assistant Prof | 7 | yes
Bill | Professor | 2 | yes
Jim | Associate Prof | 7 | yes
Dave | Assistant Prof | 6 | no
Anne | Associate Prof | 3 | no

Classifier (Model): IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification (2): Prediction Using the Model
Testing Data → Classifier

NAME | RANK | YEARS | TENURED
Tom | Assistant Prof | 2 | no
Merlisa | Associate Prof | 7 | no
George | Professor | 5 | yes
Joseph | Assistant Prof | 7 | yes

Unseen Data: (Jeff, Professor, 4) → Tenured?

Classification Techniques
• Decision Tree Induction
• Bayesian Classification
• Neural Networks
• Genetic Algorithms
• Fuzzy Set and Logic

Regression
• Regression is similar to classification
  – First, construct a model
  – Second, use the model to predict unknown values
• Methods
  – Linear and multiple regression
  – Non-linear regression
• Regression is different from classification
  – Classification predicts categorical class labels
  – Regression models continuous-valued functions

Are All the "Discovered" Patterns Interesting?
• A data mining task may generate thousands of patterns; not all of them are interesting.
• Interestingness measures:
  – A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm
  – Objective vs. subjective interestingness measures:
    • Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
    • Subjective: based on the user's belief in the data, e.g., unexpectedness, novelty, executability, etc.
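As a concrete check of the two classification steps described earlier (model construction, then prediction), the tenure rule can be applied to both tables. This is a small Python sketch under stated assumptions: the rule is transcribed from the model-construction slide, while the helper names `predict_tenured` and `accuracy` are hypothetical and introduced only for illustration.

```python
# Rule transcribed from the model-construction slide:
#   IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
def predict_tenured(rank, years):
    """Apply the slide's rule to one (rank, years) record."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


# (name, rank, years, tenured) rows from the training table.
training_data = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]

# (name, rank, years, tenured) rows from the testing table.
testing_data = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def accuracy(rows):
    """Share of rows whose predicted label matches the recorded label."""
    hits = sum(predict_tenured(rank, years) == label
               for _name, rank, years, label in rows)
    return hits / len(rows)


train_accuracy = accuracy(training_data)
test_accuracy = accuracy(testing_data)
jeff = predict_tenured("Professor", 4)  # the unseen record (Jeff, Professor, 4)

print(f"training accuracy: {train_accuracy:.0%}")
print(f"testing accuracy:  {test_accuracy:.0%}")
print(f"Jeff tenured? {jeff}")
```

The rule fits every training row but misclassifies one testing row (Merlisa, with 7 years but no tenure), which illustrates why a model is evaluated on held-out data before it is applied to unseen records such as Jeff's.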

Spatial Data Mining
• Spatial Patterns
  – Spatial outliers
  – Location prediction
  – Associations, co-locations
  – Hotspots, clustering, trends, …
• Primary Tasks
  – Mining Spatial Association Rules
  – Spatial Classification and Prediction
  – Spatial Data Clustering Analysis
  – Spatial Outlier Analysis
• Example: Unusual warming of the Pacific Ocean (El Niño) affects weather in the USA…

Spatial Data Mining Results
• Understanding spatial data, discovering relationships between spatial and nonspatial data, construction of spatial knowledge bases, etc.
• In various forms
  – The description of the general weather patterns in a set of geographic regions is a spatial characteristic rule.
  – The comparison of two weather patterns in two geographic regions is a spatial discriminant rule.
  – A rule like "most cities in Canada are close to the Canada-US border" is a spatial association rule
    • near(x, coast) ^ southeast(x, USA) ⇒ hurricane(x) (70%)
  – Others: spatial clusters, …

What is Spatial Data?
• The data related to objects that occupy space
  – traffic, bird habitats, global climate, logistics, ...
• Object types:
  – Points, Lines, Polygons, etc.
• Used in/for:
  – GIS (Geographic Information Systems)
  – Meteorology
  – Astronomy
  – Environmental studies, etc.

Basic Concepts (1)
• Spatial data mining performs the same functions as general data mining, with the end objective of finding patterns in geography, meteorology, etc.
• The main difference: spatial autocorrelation
  – the neighbors of a spatial object may have an influence on it and therefore have to be considered as well
• Spatial attributes
  – Topological: adjacency or inclusion information
  – Geometric: position (longitude/latitude), area, perimeter, boundary polygon

Basic Concepts (2)
• Spatial neighborhood
  – Topological relation: "intersect", "overlap", "disjoint", …
  – Distance relation: "close_to", "far_away", …
  – Direction/orientation relation: "left_of", "west_of", …
• A global model might be inconsistent with regional models
[Figure: Global Model vs. Local Model]

Applications
• NASA Earth Observing System (EOS): Earth science data
• National Institute of Justice: crime mapping
• Census Bureau, Dept. of Commerce: census data
• Dept. of Transportation (DOT): traffic data
• National Institutes of Health (NIH): cancer clusters
• ……

Example: What Kind of Houses Are Highly Valued? (Associative Classification)

Meteorological Data Mining
• Motivation
  – Many analysis methods must be applied to fast-growing data for climate studies
• Result
  – Appropriate presentation instruments (graphs, maps, reports, etc.) must be applied
• Examples
  – Spatial outliers can be associated with disastrous natural events such as tornadoes, hurricanes, and forest fires
  – Associations between disaster events and certain meteorological observations

Case Studies (1): Astronomy
• SKICAT (SKy Image Cataloging and Analysis Tool) (Caltech, US)
• The Palomar Observatory discovered 22 quasars with the help of data mining
• The Second Palomar Observatory Sky Survey (POSS-II)
  – decision tree methods
  – classification of galaxies, stars and other stellar objects
• About 3 TB of sky images were analyzed

Case Studies (2): NCAR & UCAR
• National Center for Atmospheric Research (NCAR) & University Corporation for Atmospheric Research (UCAR), US
  – http://www.ucar.edu/
• "Automatic Fuzzy Logic-based systems now compete with human forecasts"
  – Richard Wagoner, Deputy Director at Research Applications Program (RAP), NCAR
• Intelligent Weather System (IWS)
  – Detection and forecast in the areas of en-route turbulence, en-route icing, ceiling/visibility, and convective hazards in the aviation community
  – Road winter maintenance, airport operations, and flash flood forecasting

Operational Application
• Prediction System: WIND-2
  – WIND: "Weather Is Not Discrete"
• Consists of three parts:
  – Data
    • Past airport weather observations: 30 years of hourly observations, a time series of 300,000 detailed observations
    • Recent and current observations (METARs)
    • Model-based guidance (knowledge of near-term changes, e.g., imminent wind shift, onset/cessation of precipitation)
  – Fuzzy similarity-measuring algorithm
  – Prediction composition: predictions based on k nearest neighbors (k-NN)

Operational Application (cont’d)
• Hybrid methods are used to predict weather
  – Dynamical approach: based upon equations of the atmosphere, uses finite element techniques
  – Empirical approach: similar weather situations lead to similar outcomes
• WIND runs in real time for meteorologically different sites
• The data-mining/forecast process takes about one second

Case Studies (3): CrossGrid (EU)
• Objective
  – To develop, implement and exploit new Grid components for interactive compute- and data-intensive applications such as flooding crisis team decision support systems and air pollution combined with weather forecasting
• Main tasks in the meteorological applications package
  – Data mining for atmospheric circulation patterns
    • Find a set of representative prototypes of the atmospheric patterns in a region of interest
  – Weather forecasting for maritime applications
  – Ocean wave forecasting by models of various complexity
• Data
  – ERA-15 using a T106L31 model (from 1978 to 1994) with 1.125° resolution
  – Terabytes
  – Comprises data from approx. 20 variables (such as temperature, humidity, pressure, etc.) at 30 pressure levels of a 360x360 nodes grid
[Figure: SOM application for data mining (adaptive competitive learning); downscaling weather forecasts, where sub-grid details escape from numerical models. Dept. of Applied Mathematics, Universidad de Cantabria, Santander, Spain]

Case Studies (4): Typhoon Image Data Mining
• Objective
  – To establish algorithms and database models for the discovery of information and knowledge useful for typhoon analysis and prediction
  – Content-based image retrieval technology to search for similar cloud patterns in the past
  – Data mining technology to extract spatio-temporal pattern information that is meaningful from the meteorological viewpoint
• Result
  – Alignment of Multiple Typhoons, Explore by Projection to 2D Plane, Diurnal Analysis

Methods
• Archive of approximately 34,000 typhoon images for the northern and southern hemispheres
• Various data mining approaches
  – Principal component analysis (PCA), K-means clustering, self-organizing map (SOM), wavelet transform
• Retrieval of historical similar patterns from image databases to perform instance-based typhoon analysis and prediction
• Extracting the eigenvectors of the whole typhoon image collection

Case Studies (5): LEAD
• Linked Environments for Atmospheric Discovery
  – To accommodate the real-time, on-demand, and dynamically adaptive nature of mesoscale problems
• Complexities: vastly disparate, high-volume and high-bandwidth data
• Tremendous computational demands
  – Used in accessing, preparing, assimilating, predicting, managing, mining/analyzing, and displaying a broad array of meteorological and related information
• Data Mining Solution Center: ITSC, The Univ. of Alabama in Huntsville, US
  – http://datamining.itsc.uah.edu/index.jsp

ADaM
• Algorithm Development and Mining (ADaM)
  – Component architecture data mining toolkit
  – For geophysical phenomena detection and feature extraction
• Applications
  – Detecting tropical cyclones and estimating their maximum sustained wind speed
  – Mesocyclone Identification from RADAR
  – Detecting Cumulus Cloud Fields in GOES Images

ADaM (cont’d)
  – Mesoscale Convective Systems Detection
    • EOS Special Sensor Microwave/Imager (SSM/I) Brightness Temperature Swaths from DMSP F13 and F14
  – Rain Detection Using SSM/I
  – Lightning Detection Using OLS
  – Rain Accumulation Study

Case Studies (6): Rainfall Classification
University of Oklahoma, Norman
• To classify significant and interesting features within a two-dimensional spatial field of meteorological data
  – Observed or predicted rainfall
• Data source
  – Estimates of hourly accumulated rainfall
  – Using radar and rain gauge data
• "Attributes" for classification
  – Statistical parameters representing the distribution of rainfall amounts across the region
• Classification Method
  – Hierarchical cluster analysis

Many Others…
• JARtool Project (Fayyad et al., NASA)
  – Identifying volcanoes on the surface of Venus from images transmitted by the Magellan spacecraft
  – More than 30,000 high-resolution Synthetic Aperture Radar (SAR) images of the surface of Venus from different angles
  – The obtained accuracy was about 80%

What can we learn from these scenarios?
• Data mining is a promising approach to meteorological analysis
• Very strong interaction between scientists and the knowledge discovery system is necessary
• The users define features of the meteorological phenomena based on their expert knowledge
• The system extracts the instances of such phenomena
• Then, further analysis of the phenomena is possible

Conclusions
• Data mining: discovering interesting patterns from large amounts of data
• A natural evolution of database technology, in great demand, with wide applications
• A KDD process includes data mining and other steps
• Data mining can be performed in a variety of information repositories
• Data mining tasks: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.

And now discussion
