Geographical Information System (GIS) to Knowledge
April 2003
Peter Bajcsy, Ph.D.
Research Scientist
Adjunct Assistant Professor, CS Department, UIUC
Automated Learning Group
National Center for Supercomputing Applications
University of Illinois
pbajcsy@ncsa.uiuc.edu
Outline
• Problem Statement
• Top Level Overview
• Input Information Extraction and Representation
• Georeferencing and Raster Information Extraction
• Feature Driven Boundary Aggregation and Evaluation
• Error Evaluation of New Boundary Aggregations and Decision Making
• Summary
alg | Automated Learning Group
Acknowledgement
• Project Team Members: Peter Bajcsy, Peter Groves, Sunayana Saha, Tyler Alumbaugh
• Support: Michael Welge, Loretta Auvil, Dora Cai, Tom Redman, David Clutter, Duane Searsmith, Lisa Gatzke, Andrew Shirk, Ruth Aydt, Greg Pape, David Tcheng, Chris Navaro, Marquita Miller.
Problem Statement
• Problem Statement: search for the best partition of any geographical area that is
  (a) based on raster or point information,
  (b) formed by aggregations of known boundaries,
  (c) constrained or unconstrained by spatial locations of known boundaries, and
  (d) minimizing an error metric.
• Raster or Point Information:
  • Grid-based information, e.g., from satellite or air-borne sensors
  • Geographical point information, e.g., from GPS or an address database
• Boundaries (Vector Data):
  • Man-made, e.g., Counties, US Census Bureau Territories
  • Defined by environmental characteristics, e.g., Eco-regions, Historical isocontours
• Spatial Constraints and Error Metric:
  • Defined by applications
Top Level Overview
References:
• ALG Technical Reports: TR-20030226-1.doc, TR-20030211-1.doc, TR-20021011-1.doc
• Conferences:
  • Peter Bajcsy and Tyler Jeffrey Alumbaugh, "Georeferencing Maps With Contours," Proceedings of the 7th World Multiconference on Systemics, Cybernetics and Informatics (SCI 2003), Orlando, Florida, July 27-30, 2003.
  • Peter Bajcsy, "Automatic Extraction Of Isocontours From Historical Maps," Proceedings of the 7th World Multiconference on Systemics, Cybernetics and Informatics (SCI 2003), Orlando, Florida, July 27-30, 2003.
Input Information Extraction and Representation
Data Types and Representation: Examples
• Raster Information: GeoImage Object
• Boundary Information: Shape Object
• Tabular Information: Table Object
• Neighborhood Information: NBH Object
Raster Data: File Formats
• USGS Digital Elevation Data (DEM) Files
  • Header file with georeferencing information
  • Floating point values, 30 m spatial resolution, IL coverage, published in 2002
• TIFF Files
  • Georeferencing information from:
    – One or more standardized files are distributed along with TIFF image data as .tfw and/or .txt files.
    – The metadata is encoded in the image file using private TIFF tags.
    – An extension of the TIFF format called GeoTIFF is used.
  • Forest labels, 1 km spatial resolution:
    – Forest Cover Types: 29 labels, USA coverage, published in 2000
    – Forest Fragmentation Index Map of North America, 8 labels, USA coverage, published in 1993
  • Land use labels, 1 km spatial resolution, worldwide coverage, published in 2001
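As an illustration of the .tfw option, a world file holds six numbers, one per line, that define an affine map from pixel (column, row) to map coordinates. The sketch below assumes the standard line order A, D, B, E, C, F; the function names and the sample values are hypothetical and are not taken from the ALG tools or from any data set named here.

```python
def parse_world_file(text):
    """Parse the six world-file coefficients (line order: A, D, B, E, C, F)."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    a, d, b, e, c, f = values
    return {"A": a, "D": d, "B": b, "E": e, "C": c, "F": f}


def pixel_to_map(coeffs, col, row):
    """Map a pixel (col, row) to map coordinates with the affine transform."""
    x = coeffs["A"] * col + coeffs["B"] * row + coeffs["C"]
    y = coeffs["D"] * col + coeffs["E"] * row + coeffs["F"]
    return x, y


# Made-up example: 30 m pixels, no rotation, north-up image.
SAMPLE_TFW = """30.0
0.0
0.0
-30.0
500015.0
4400985.0
"""

coeffs = parse_world_file(SAMPLE_TFW)
print(pixel_to_map(coeffs, 0, 0))    # center of the upper-left pixel
print(pixel_to_map(coeffs, 10, 20))  # 10 columns right, 20 rows down
```

The negative E coefficient reflects that image rows grow downward while northing grows upward.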
Vector Data: File Formats
• Computational Tradeoffs Between Vector Information Retrieval and Data Storage
  – US Census Bureau TIGER Files
    – Elaboration of the chain file structure (CFS)
    – Used record files 1, 2, I, S, P
  – Environmental Systems Research Institute (ESRI) Shapefiles
    – Location list data structure (LLS)
    – shp, shx, dbf files
• TIGER to ESRI Shapefiles
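The tradeoff can be sketched with toy data: a chain structure stores each shared edge once, tagged with the polygons on its left and right, while a location list stores a complete vertex ring per polygon. The record layout below is a simplified assumption for illustration, not the actual TIGER record format, and `chains_to_rings` is a hypothetical helper that handles only polygons with a single ring.

```python
from collections import defaultdict

# Hypothetical, simplified chains: two unit squares A and B sharing one edge.
# None means "outside the data set".
CHAINS = [
    {"points": [(0, 0), (1, 0)], "left": "A", "right": None},
    {"points": [(1, 0), (1, 1)], "left": "A", "right": "B"},  # shared edge
    {"points": [(1, 1), (0, 1)], "left": "A", "right": None},
    {"points": [(0, 1), (0, 0)], "left": "A", "right": None},
    {"points": [(1, 0), (2, 0)], "left": "B", "right": None},
    {"points": [(2, 0), (2, 1)], "left": "B", "right": None},
    {"points": [(2, 1), (1, 1)], "left": "B", "right": None},
]


def chains_to_rings(chains):
    """Assemble one closed vertex list per polygon from left/right chains."""
    pieces = defaultdict(list)
    for chain in chains:
        points = chain["points"]
        if chain["left"] is not None:
            pieces[chain["left"]].append(points)
        if chain["right"] is not None:
            # Reverse the chain so that the polygon lies on its left side.
            pieces[chain["right"]].append(points[::-1])

    rings = {}
    for polygon_id, segments in pieces.items():
        by_start = {segment[0]: segment for segment in segments}
        ring = list(segments[0])
        while ring[-1] != ring[0]:
            # Follow the segment that starts where the ring currently ends.
            ring.extend(by_start[ring[-1]][1:])
        rings[polygon_id] = ring
    return rings


rings = chains_to_rings(CHAINS)
for polygon_id, ring in rings.items():
    print(polygon_id, ring)
```

In the chain form the shared edge is stored once; in the ring form it appears in both polygons, which costs storage but lets each boundary be retrieved without any stitching.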
Point Data: File Formats
• FBI Crime Reports
  • United States Crimes Database, years 94-98, USA states, reports per county, published in 2001
  • United States Crimes Database, years 98-00, IL state, reports per county, published in 2002
• Entries
  • Theme_Keyword: crime, arrests, murder, forcible rape, rape, robbery, aggravated assault, assault, burglary, larceny, motor vehicle theft, theft, arson
• Challenges
  • Multiple Files
  • Varying notation
  • Association with geographical boundary information
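One way to approach the varying-notation and boundary-association challenges is to reduce every county label to a comparison key before joining tables. The normalization rules, the function name `county_key`, and all values below are made-up assumptions for illustration; real data would likely need more rules.

```python
import re


def county_key(name):
    """Reduce a county label to a comparison key (illustrative rules only)."""
    key = name.strip().lower()
    key = re.sub(r"\bcounty\b", "", key)       # drop the word "county"
    key = re.sub(r"\bst\.?\s", "saint ", key)  # expand "St." or "St"
    key = re.sub(r"[^a-z]", "", key)           # drop spaces and punctuation
    return key


# Made-up labels as they might appear in a boundary file and a crime table.
boundary_names = ["St. Clair County", "DeWitt County"]
crime_rows = [("SAINT CLAIR", 120), ("De Witt", 7), ("Cook", 300)]

lookup = {county_key(name): name for name in boundary_names}
matched = {
    lookup[county_key(name)]: value
    for name, value in crime_rows
    if county_key(name) in lookup
}
unmatched = [name for name, _ in crime_rows if county_key(name) not in lookup]

print(matched)
print(unmatched)
```

Keeping the unmatched labels visible makes it easier to spot notation variants that the rules do not yet cover.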
Data Size
Data size driven operations:
• Sub-setting
• Sub-sampling
• Cropping
• Zooming
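Two of these operations, cropping and sub-sampling, can be sketched on a raster held as a list of rows. The function names and the tiny raster are hypothetical; sub-sampling here simply keeps every `step`-th cell and does no averaging.

```python
def crop(raster, row0, col0, height, width):
    """Return the height x width window whose upper-left cell is (row0, col0)."""
    return [row[col0:col0 + width] for row in raster[row0:row0 + height]]


def subsample(raster, step):
    """Keep every step-th row and column (nearest-cell decimation)."""
    return [row[::step] for row in raster[::step]]


# Made-up 4 x 6 raster whose cell value encodes its position (row * 10 + col).
raster = [[r * 10 + c for c in range(6)] for r in range(4)]

print(crop(raster, 1, 2, 2, 3))  # rows 1-2, columns 2-4
print(subsample(raster, 2))      # rows 0 and 2, columns 0, 2 and 4
```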
Formation of Vector Data
• Iso-contour extraction from historical maps
• Segmentation and clustering of raster data into homogeneous regions
Georeferencing Data Sets and Raster Information Extraction
Georeferencing Based on Data Types
• Raster and Raster
• Vector and Vector
• Raster and Vector
Georeferencing Based on Coordinate Systems
Raster Information Extraction: Categorical Variable
Frequency of Occurrence
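A minimal sketch of frequency-of-occurrence extraction, assuming the boundaries have already been rasterized into a region-id grid aligned cell by cell with the label raster. The function name, the labels, and the grids are made up for illustration.

```python
from collections import Counter, defaultdict


def label_frequencies(labels, regions):
    """Return, per region id, the fraction of its cells carrying each label."""
    counts = defaultdict(Counter)
    for label_row, region_row in zip(labels, regions):
        for label, region in zip(label_row, region_row):
            counts[region][label] += 1
    return {
        region: {
            label: n / sum(counter.values()) for label, n in counter.items()
        }
        for region, counter in counts.items()
    }


# Made-up 2 x 4 label raster and matching region-id raster (two regions).
labels = [
    ["oak", "oak", "oak", "pine"],
    ["pine", "pine", "oak", "pine"],
]
regions = [
    [1, 1, 1, 1],
    [2, 2, 2, 2],
]

frequencies = label_frequencies(labels, regions)
print(frequencies)
```

Each region ends up with one feature per label, which can then be stored in a table alongside the boundary it belongs to.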
Raster Information Extraction: Continuous Variable
Elevation Statistics Per County (figure panels):
• Sample Mean
• Standard Deviation
• Skew
• Kurtosis
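The four statistics can be sketched per region as follows. The slides do not say which estimators were used, so this sketch assumes population (divide by n) central moments, with skew defined as m3 / m2^1.5 and kurtosis as the non-excess ratio m4 / m2^2. The function name and the elevation values are made up.

```python
import math
from collections import defaultdict


def region_statistics(values, regions):
    """Mean, standard deviation, skew and kurtosis of the cells in each region."""
    samples = defaultdict(list)
    for value_row, region_row in zip(values, regions):
        for value, region in zip(value_row, region_row):
            samples[region].append(value)

    stats = {}
    for region, xs in samples.items():
        n = len(xs)
        mean = sum(xs) / n
        # Second, third and fourth central moments (population form).
        m2, m3, m4 = (sum((x - mean) ** k for x in xs) / n for k in (2, 3, 4))
        std = math.sqrt(m2)
        stats[region] = {
            "mean": mean,
            "std": std,
            "skew": m3 / std ** 3 if std > 0 else 0.0,
            "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
        }
    return stats


# Made-up elevation raster and matching region-id raster (two regions).
elevation = [
    [1.0, 2.0, 3.0, 10.0],
    [2.0, 2.0, 2.0, 2.0],
]
regions = [
    [1, 1, 1, 1],
    [2, 2, 2, 2],
]

stats = region_statistics(elevation, regions)
for region, values in stats.items():
    print(region, values)
```

Region 1 has one high outlier, so its skew is positive; region 2 is constant, so the guarded ratios fall back to zero.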
Feature Driven Boundary Aggregation and Evaluation
Spatially Unconstrained Boundary Aggregation
• Hierarchical clustering of crime data, with the number of clusters as the exit criterion and “auto theft in 2000” as the clustered feature, leads to six aggregations.
Displays: Boundaries; Boundary Aggregations (Geographical Display, Tabular Display)
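A minimal sketch of hierarchical (agglomerative) clustering of boundaries by a single feature, stopping when a requested number of clusters remains. The slides do not state the similarity measure, so the distance between cluster means used here is an assumption, as are the function name, the boundary ids, and the feature values.

```python
def cluster_unconstrained(features, n_clusters):
    """Merge the two clusters with the closest means until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [{"members": {bid}, "total": v} for bid, v in features.items()]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mean_i = clusters[i]["total"] / len(clusters[i]["members"])
                mean_j = clusters[j]["total"] / len(clusters[j]["members"])
                gap = abs(mean_i - mean_j)
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        _, i, j = best
        # Spatial position plays no role: any two clusters may merge.
        clusters[i]["members"] |= clusters[j]["members"]
        clusters[i]["total"] += clusters[j]["total"]
        del clusters[j]
    return [cluster["members"] for cluster in clusters]


# Made-up feature values for five hypothetical boundaries.
auto_theft = {"A": 10, "B": 12, "C": 50, "D": 55, "E": 11}

aggregations = cluster_unconstrained(auto_theft, 2)
print(aggregations)
```

Because adjacency is ignored, an aggregation produced this way may consist of boundaries that do not touch each other.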
Spatially Constrained Boundary Aggregation
• Hierarchical segmentation and hierarchical clustering of the oak-hickory feature, with an exit criterion of 18 county aggregations
Displays: Boundaries; Boundary Aggregations (With Spatial Constraint, Without Spatial Constraint)
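A spatial constraint can be sketched by allowing merges only between aggregations that touch. The code below assumes neighborhood information is available as a mapping from each boundary id to the ids adjacent to it, and again assumes distance between cluster means as the similarity. The function name, ids, and values are hypothetical.

```python
def cluster_constrained(features, neighbors, n_clusters):
    """Merge the closest pair of adjacent clusters until n_clusters remain."""
    clusters = [{"members": {bid}, "total": v} for bid, v in features.items()]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Adjacent if any member of a has a neighbor inside b.
                if not any(neighbors[m] & b["members"] for m in a["members"]):
                    continue
                gap = abs(
                    a["total"] / len(a["members"])
                    - b["total"] / len(b["members"])
                )
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        if best is None:
            break  # no adjacent pair is left to merge
        _, i, j = best
        clusters[i]["members"] |= clusters[j]["members"]
        clusters[i]["total"] += clusters[j]["total"]
        del clusters[j]
    return [cluster["members"] for cluster in clusters]


# Made-up data: five boundaries in a row, A-B-C-D-E.
oak_hickory = {"A": 10, "B": 50, "C": 12, "D": 11, "E": 52}
neighbors = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

aggregations = cluster_constrained(oak_hickory, neighbors, 3)
print(aggregations)
```

With the adjacency check, A stays on its own because its only neighbor, B, has a very different value. Without the check, the same data and similarity would group A with C and D, although A touches neither of them.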
Boundary Aggregation With Hierarchical Output
• Hierarchical segmentation of extracted forest statistics (oak hickory occurrence) with two output partitions.
Displays: Boundaries; Boundary Aggregations (43 aggregations, 21 aggregations)
Error Evaluations of New Territorial Partitions
• Error evaluation of partitions obtained by clustering and segmentation of the mean elevation feature per Illinois county, using the Variance error metric
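The slides name a Variance error metric without defining it. One plausible reading, assumed here and not confirmed by the source, is the size-weighted mean of the within-aggregation variances of the feature. The function name and the elevation values are made up.

```python
def variance_error(partition, features):
    """Size-weighted mean of within-aggregation variances (assumed definition)."""
    total_squared_deviation = 0.0
    count = 0
    for group in partition:
        values = [features[bid] for bid in group]
        mean = sum(values) / len(values)
        total_squared_deviation += sum((v - mean) ** 2 for v in values)
        count += len(values)
    return total_squared_deviation / count


# Made-up mean elevations for four hypothetical counties.
mean_elevation = {"A": 100.0, "B": 110.0, "C": 200.0, "D": 210.0}

similar_grouped = [{"A", "B"}, {"C", "D"}]
dissimilar_grouped = [{"A", "C"}, {"B", "D"}]

print(variance_error(similar_grouped, mean_elevation))
print(variance_error(dissimilar_grouped, mean_elevation))
```

Under this definition a partition that groups counties with similar elevation scores far lower than one that mixes low and high counties.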
Geographical Error Evaluations and Decision Making
• Geographical error evaluation of partitions obtained by clustering and segmentation of the mean elevation feature per Illinois county, using the Variance error metric
Partition Index: Eval#0, Eval#1, Eval#2, Eval#3
Decision Making
• Which global partition minimizes a chosen error metric?
• Which partition minimizes a chosen error metric at a selected fundamental area definition?
• What is the geographical error distribution given a territorial partition?
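These three questions can be sketched with a per-area error map. The error assumed here is each area's squared deviation from the mean of its aggregation, which is an illustrative choice rather than the metric used in the original work. The function names, area ids, and values are hypothetical.

```python
def per_area_error(partition, features):
    """Squared deviation of each area from the mean of its aggregation."""
    errors = {}
    for group in partition:
        mean = sum(features[bid] for bid in group) / len(group)
        for bid in group:
            errors[bid] = (features[bid] - mean) ** 2
    return errors


def best_partition(partitions, features, area=None):
    """Index of the partition with the lowest global error, or lowest at one area."""
    def score(partition):
        errors = per_area_error(partition, features)
        if area is not None:
            return errors[area]
        return sum(errors.values()) / len(errors)

    return min(range(len(partitions)), key=lambda i: score(partitions[i]))


# Made-up feature values and three candidate partitions.
mean_elevation = {"A": 100.0, "B": 110.0, "C": 200.0, "D": 210.0}
partitions = [
    [{"A", "B", "C", "D"}],
    [{"A", "B"}, {"C", "D"}],
    [{"A"}, {"B", "C", "D"}],
]

print(best_partition(partitions, mean_elevation))            # global question
print(best_partition(partitions, mean_elevation, area="A"))  # question at area A
print(per_area_error(partitions[1], mean_elevation))         # error distribution
```

The global answer and the answer at a single area can differ: the third partition isolates A and so has zero error there, even though the second partition is better overall.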
Documentation
Summary
• Applications of GIS tools
  – Remote Sensing
  – Agriculture
  – Hydrology
  – Water Quality Survey
  – Atmospheric Science
  – Military
  – Socio-Economics
• Interested? Useful? Let us know.