POSTED ON: 8/4/2012
Thematic Information Extraction: Pattern Recognition
Chapter 9

Classification
Multispectral classification may be performed using a variety of methods, including:
• algorithms based on parametric and nonparametric statistics that use ratio- and interval-scaled data and nonmetric methods that can also incorporate nominal scale data;
• the use of supervised or unsupervised classification logic;
• the use of hard or soft (fuzzy) set classification logic to create hard or fuzzy thematic output products;
• the use of per-pixel or object-oriented classification logic, and hybrid approaches.

• Parametric methods such as maximum likelihood classification and unsupervised clustering assume normally distributed remote sensor data and knowledge about the forms of the underlying class density functions.
• Nonparametric methods such as nearest-neighbor classifiers, fuzzy classifiers, and neural networks may be applied to remote sensor data that are not normally distributed and without the assumption that the forms of the underlying densities are known.
• Nonmetric methods such as rule-based decision tree classifiers can operate on both real-valued data (e.g., reflectance values from 0 to 100%) and nominal scaled data (e.g., class 1 = forest; class 2 = agriculture).

Supervised Classification
In a supervised classification, the identity and location of some of the land-cover types (e.g., urban, agriculture, or wetland) are known a priori through a combination of fieldwork, interpretation of aerial photography, map analysis, and personal experience. The analyst attempts to locate specific sites in the remotely sensed data that represent homogeneous examples of these known land-cover types. These areas are commonly referred to as training sites because the spectral characteristics of these known areas are used to train the classification algorithm for eventual land-cover mapping of the remainder of the image.
Multivariate statistical parameters (means, standard deviations, covariance matrices, correlation matrices, etc.) are calculated for each training site. Every pixel both within and outside the training sites is then evaluated and assigned to the class of which it has the highest likelihood of being a member.

Unsupervised Classification
In an unsupervised classification, the identities of land-cover types to be specified as classes within a scene are not generally known a priori because ground reference information is lacking or surface features within the scene are not well defined. The computer is required to group pixels with similar spectral characteristics into unique clusters according to some statistically determined criteria. The analyst then re-labels and combines the spectral clusters into information classes.

Hard vs. Fuzzy Classification
• Supervised and unsupervised classification algorithms typically use hard classification logic to produce a classification map that consists of hard, discrete categories (e.g., forest, agriculture).
• Conversely, it is also possible to use fuzzy set classification logic, which takes into account the heterogeneous and imprecise nature of the real world.

Per-pixel vs. Object-oriented Classification
• In the past, most digital image classification was based on processing the entire scene pixel by pixel. This is commonly referred to as per-pixel classification.
• Object-oriented classification techniques allow the analyst to decompose the scene into many relatively homogeneous image objects (referred to as patches or segments) using a multi-resolution image segmentation process. The various statistical characteristics of these homogeneous image objects in the scene are then subjected to traditional statistical or fuzzy logic classification.
• Object-oriented classification based on image segmentation is often used for the analysis of high-spatial-resolution imagery (e.g., 1 × 1 m Space Imaging IKONOS and 0.61 × 0.61 m DigitalGlobe QuickBird).

Land-use and Land-cover Classification Schemes
• Land cover refers to the type of material present on the landscape (e.g., water, sand, crops, forest, wetland, human-made materials such as asphalt).
• Land use refers to what people do on the land surface (e.g., agriculture, commerce, settlement).

• Mutually exclusive means that there is no taxonomic overlap (or fuzziness) of any classes (i.e., deciduous forest and evergreen forest are distinct classes).
• Exhaustive means that all land-cover classes present in the landscape are accounted for and none have been omitted.
• Hierarchical means that sublevel classes (e.g., single-family residential, multiple-family residential) may be hierarchically combined into a higher-level category (e.g., residential) that makes sense. This allows simplified thematic maps to be produced when required.

It is also important for the analyst to realize that there is a fundamental difference between information classes and spectral classes.
• Information classes are those that human beings define.
• Spectral classes are those that are inherent in the remote sensor data and must be identified and then labeled by the analyst.

Certain hard classification schemes can readily incorporate land-use and/or land-cover data obtained by interpreting remotely sensed data, including the:
• American Planning Association Land-Based Classification System, which is oriented toward detailed land-use classification;
• United States Geological Survey Land-Use/Land-Cover Classification System for Use with Remote Sensor Data and its adaptation for the U.S.
National Land Cover Dataset and the NOAA Coastal Change Analysis Program (C-CAP);
• U.S. Department of the Interior Fish & Wildlife Service Classification of Wetlands and Deepwater Habitats of the United States;
• U.S. National Vegetation Classification System;
• International Geosphere-Biosphere Program (IGBP) Land Cover Classification System modified for the creation of MODIS land-cover products.

U.S. Geological Survey Land-Use/Land-Cover Classification System
Four Levels of the U.S. Geological Survey Land-Use/Land-Cover Classification System for Use with Remote Sensor Data and the type of remotely sensed data typically used to provide the information.

Selecting the Optimum Bands for Image Classification: Feature Selection
• Once the training statistics have been systematically collected from each band for each class of interest, a judgment must be made to determine the bands (channels) that are most effective in discriminating each class from all others.
• This process is commonly called feature selection. The goal is to delete from the analysis the bands that provide redundant spectral information. In this way the dimensionality (i.e., the number of bands to be processed) in the dataset may be reduced.
• This minimizes the cost of the digital image classification process (but should not affect the accuracy). Feature selection may involve both statistical and graphical analysis to determine the degree of between-class separability in the remote sensor training data.
• Using statistical methods, combinations of bands are normally ranked according to their potential ability to discriminate each class from all others using n bands at a time.

Select the Appropriate Classification Algorithm
• Various supervised classification algorithms may be used to assign an unknown pixel to one of m possible classes. The choice of a particular classifier or decision rule depends on the nature of the input data and the desired output.
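As a rough illustration of what "assigning an unknown pixel to one of m possible classes" can look like, the sketch below applies a minimum-distance-to-means decision rule. It is only one possible decision rule, not a recommended one; the function name, the class labels, and the two-band mean vectors are hypothetical values invented for the example.

```python
import math

def minimum_distance_classify(pixel, class_means):
    """Return the label whose mean vector is closest (Euclidean) to the pixel."""
    best_label = None
    best_dist = math.inf
    for label, mean in class_means.items():
        dist = math.dist(pixel, mean)  # distance in multispectral feature space
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical training means for m = 3 classes in two bands (red, near-infrared).
class_means = {
    "water": (10.0, 5.0),
    "forest": (20.0, 60.0),
    "urban": (50.0, 45.0),
}

# An unknown pixel measured in the same two bands.
print(minimum_distance_classify((22.0, 55.0), class_means))
```

The rule uses only the class means from the training statistics, so it ignores the spread of each class; a maximum likelihood rule would also use the covariance information.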
Parametric classification algorithms assume that the observed measurement vectors Xc obtained for each class in each spectral band during the training phase of the supervised classification are Gaussian; that is, they are normally distributed. Nonparametric classification algorithms make no such assumption.
• Several widely adopted nonparametric classification algorithms include:
  • one-dimensional density slicing,
  • parallelepiped,
  • minimum distance,
  • nearest-neighbor, and
  • neural network and expert system analysis.
• The most widely adopted parametric classification algorithm is maximum likelihood.

Unsupervised Classification
Unsupervised classification (commonly referred to as clustering) is an effective method of partitioning remote sensor image data in multispectral feature space and extracting land-cover information. Compared to supervised classification, unsupervised classification normally requires only a minimal amount of initial input from the analyst. This is because clustering does not normally require training data.

• Unsupervised classification is the process where numerical operations are performed that search for natural groupings of the spectral properties of pixels, as examined in multispectral feature space.
• The clustering process results in a classification map consisting of m spectral classes. The analyst then attempts a posteriori (after the fact) to assign or transform the spectral classes into thematic information classes of interest (e.g., forest, agriculture).
• This may be difficult. Some spectral clusters may be meaningless because they represent mixed classes of Earth surface materials. The analyst must understand the spectral characteristics of the terrain well enough to be able to label certain clusters as specific information classes.

Hundreds of clustering algorithms have been developed.
Two examples of conceptually simple but not necessarily efficient clustering algorithms will be used to demonstrate the fundamental logic of unsupervised classification of remote sensor data:
• clustering using the Chain Method
• clustering using the Iterative Self-Organizing Data Analysis Technique (ISODATA).

Clustering Using the Chain Method
The Chain Method clustering algorithm operates in a two-pass mode (i.e., it passes through the multispectral dataset two times).
Pass #1: The program reads through the dataset and sequentially builds clusters (groups of points in spectral space). A mean vector is then associated with each cluster.
Pass #2: A minimum distance to means classification algorithm is applied to the whole dataset on a pixel-by-pixel basis whereby each pixel is assigned to one of the mean vectors created in pass 1. The first pass, therefore, automatically creates the cluster signatures (class mean vectors) to be used by the minimum distance to means classifier.

ISODATA Clustering
• The Iterative Self-Organizing Data Analysis Technique (ISODATA) represents a comprehensive set of heuristic (rule of thumb) procedures that have been incorporated into an iterative classification algorithm. Many of the steps incorporated into the algorithm are a result of experience gained through experimentation.
• The ISODATA algorithm is a modification of the k-means clustering algorithm, which includes a) merging clusters if their separation distance in multispectral feature space is below a user-specified threshold and b) rules for splitting a single cluster into two clusters.
• ISODATA is iterative because it makes a large number of passes through the remote sensing dataset until specified results are obtained, instead of just two passes.
• ISODATA does not allocate its initial mean vectors based on the analysis of pixels in the first line of data the way the two-pass algorithm does.
Rather, an initial arbitrary assignment of all Cmax clusters takes place along an n-dimensional vector that runs between very specific points in feature space. The region in feature space is defined using the mean, µk, and standard deviation, σk, of each band in the analysis. This method of automatically seeding the original Cmax vectors makes sure that the first few lines of data do not bias the creation of clusters.

ISODATA is self-organizing because it requires relatively little human input. A sophisticated ISODATA algorithm normally requires the analyst to specify the following criteria:
• Cmax: the maximum number of clusters to be identified by the algorithm (e.g., 20 clusters). However, it is not uncommon for fewer to be found in the final classification map after splitting and merging take place.
• T: the maximum percentage of pixels whose class values are allowed to be unchanged between iterations. When this number is reached, the ISODATA algorithm terminates. Some datasets may never reach the desired percentage unchanged. If this happens, it is necessary to interrupt processing and edit the parameter.
• M: the maximum number of times ISODATA is to classify pixels and recalculate cluster mean vectors. The ISODATA algorithm terminates when this number is reached.
• Minimum members in a cluster (%): If a cluster contains fewer than the minimum percentage of members, it is deleted and the members are assigned to an alternative cluster. This also affects whether a class is going to be split (see maximum standard deviation). The default minimum percentage of members is often set to 0.01.

Analog and Digital Image Analysis Tasks

Object-oriented Image Segmentation
• This need has given rise to the creation of image classification algorithms based on object-oriented image segmentation. The algorithms incorporate both spectral and spatial information in the image segmentation phase.
• The result is the creation of image objects defined as individual areas with shape and spectral homogeneity which one may recognize as segments or patches in the landscape. In many instances, carefully extracted image objects can provide a greater number of meaningful features for image classification.
• In addition, objects don’t have to be derived from just image data but can also be developed from any spatially distributed variable (e.g., elevation, slope, aspect, population density).
• Homogeneous image objects are then analyzed using traditional classification algorithms (e.g., nearest-neighbor, minimum distance, maximum likelihood) or knowledge-based approaches and fuzzy classification logic.

• There are many algorithms that can be used to segment an image into relatively homogeneous image objects. Most can be grouped into two classes:
  • edge-based algorithms, and
  • area-based algorithms.
• Unfortunately, the majority do not incorporate both spectral and spatial information, and very few have been used for remote sensing digital image classification.

One of the most promising approaches to remote sensing image segmentation was developed by Baatz and Schäpe (2000). The image segmentation involves looking at individual pixel values and their neighbors to compute (Baatz et al., 2001):
• a color criterion (hcolor), and
• a shape or spatial criterion (hshape).

• The object-oriented classification of a segmented image is substantially different from performing a per-pixel classification.
• First, the analyst is not constrained to using just spectral information. He or she may choose to use a) the mean spectral information in conjunction with b) various shape measures associated with each image object (polygon) in the dataset.
• This introduces flexibility and robustness.
Once selected, the spectral and spatial attributes of each polygon can be input to a variety of classification algorithms for analysis (e.g., nearest-neighbor, minimum distance, maximum likelihood).
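
To make the last point concrete, here is a minimal sketch of feeding per-object attributes to a nearest-neighbor classifier. The attribute names (mean red, mean near-infrared, compactness), the training objects, and the assumption that every attribute has already been scaled to the 0 to 1 range are all hypothetical; they are not taken from any particular segmentation package.

```python
import math

# Hypothetical labeled image objects: (mean_red, mean_nir, compactness) -> class.
# All attributes are assumed to be pre-scaled to the 0-1 range so that no single
# attribute dominates the distance calculation.
training_objects = [
    ((0.10, 0.05, 0.30), "water"),
    ((0.15, 0.70, 0.40), "forest"),
    ((0.45, 0.40, 0.90), "building"),
]

def nearest_neighbor_label(features, training):
    """Return the label of the training object closest in attribute space."""
    nearest = min(training, key=lambda item: math.dist(features, item[0]))
    return nearest[1]

# An unlabeled image object described by the same three attributes.
unknown_object = (0.42, 0.38, 0.85)
print(nearest_neighbor_label(unknown_object, training_objects))
```

Because the shape measure sits in the same feature vector as the spectral means, two objects with similar spectra but different shapes can still be separated, which a per-pixel classifier using spectral values alone could not do.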