Posted on: 11/23/2011. Public Domain.
Introduction to Hyperspectral Imaging
HSI Feature Extraction Methods
Dr. Richard B. Gomez, Instructor

Outline
• What is Hyperspectral Image Data?
• Interpretation of Digital Image Data
• Pixel Classification
• HSI Data Processing Techniques - Methods and Algorithms (Continued)
  – Principal Component Analysis
  – Unmixing Pixel Problem
  – Spectral Mixing Analysis
  – Other
• Feature Extraction Techniques
  – N-dimensional Exploitation
  – Cluster Analysis

What is Hyperspectral Image Data?
Hyperspectral image data is image data that is:
• In digital form, i.e., a picture that a computer can read, manipulate, store, and display
• Spatially quantized into picture elements (pixels)
• Radiometrically quantized into discrete brightness levels
• It can be in the form of Radiance, Apparent Reflectance, True Reflectance, or Digital Number

Difference Between Radiance and Reflectance
• Radiance is the variable directly measured by remote sensing instruments
• Radiance has units of watt/steradian/square meter
• Reflectance is the ratio of the amount of light leaving a target to the amount of light striking the target
• Reflectance has no units
• Reflectance is a property of the material being observed
• Radiance depends on the illumination (both its intensity and direction), the orientation and position of the target, and the path of the light through the atmosphere
• Atmospheric effects and the solar illumination can be compensated for in digital remote sensing data.
This yields what is called "apparent reflectance"; it differs from true reflectance in that shadows and directional effects on reflectance have not been dealt with.

Interpretation of Digital Image Data
• Qualitative Approach: Photointerpretation by a human analyst/interpreter
  – On a scale large relative to pixel size
  – Limited multispectral analysis
  – Inaccurate area estimates
  – Limited use of brightness levels
• Quantitative Approach: Analysis by computer
  – At the individual pixel level
  – Accurate area estimates possible
  – Exploits all brightness levels
  – Can perform true multidimensional analysis

Data Space Representations
• Image Space - Geographic Orientation
• Spectral Signatures - Physical Basis for Response
• N-Dimensional Space - For Use in Pattern Analysis

Hyperspectral Imaging Barriers
• Scene - The most complex and dynamic part
• Sensor - Also not under the analyst's control
• Processing System - The analyst's choices
[Diagram: Scene → Sensor (ephemeris, calibration, etc.) → On-Board Processing → Data Preprocessing → Analysis → Information Utilization, with human participation and ancillary data]

HSI Data Analysis Scheme*
Finding Optimal Feature Subspaces
• Feature Selection (FS)
• Discriminant Analysis Feature Extraction (DAFE)
• Decision Boundary Feature Extraction (DBFE)
• Projection Pursuit (PP)
Available in MultiSpec via WWW at: http://dynamo.ecn.purdue.edu/~biehl/MultiSpec/
Additional documentation via WWW at: http://dynamo.ecn.purdue.edu/~landgreb/publications.html
*After David Landgrebe, Purdue University

Dimension Space Reduction

Pixel Classification
• Labeling the pixels as belonging to particular spectral classes using the spectral data available
• The terms classification, allocation, categorization, and labeling are generally used synonymously
• The two broad classes of classification procedure are supervised classification and unsupervised classification
• Hybrid supervised/unsupervised methods are available

Classification
Techniques
• Unsupervised Classification
• Supervised Classification
• Hybrid Classification

Classifier Options
• Correlation Classifier
  g_i(X) = (X^T x_i) / (||X|| ||x_i||)
• Spectral Angle Mapper
  g_i(X) = cos^-1 [ (X^T x_i) / (||X|| ||x_i||) ]
• Matched Filter - Constrained Energy Minimization
  g_i(X) = (X^T C_b^-1 x_i) / (x_i^T C_b^-1 x_i)
• Other types - "Nonparametric"
  – Parzen Window Estimators
  – Fuzzy Set - based
  – Neural Network implementations
  – K Nearest Neighbor (K-NN), etc.

Classification Algorithms
• Linear Spectral Unmixing (LSU) - Generates maps of the fraction of each endmember in a pixel
• Orthogonal Subspace Projection (OSP) - Suppresses background signatures and generates fraction maps like the LSU algorithm
• Spectral Angle Mapper (SAM) - Treats a spectrum like a vector; finds the angle between spectra
• Minimum Distance (MD) - A simple Gaussian Maximum Likelihood algorithm that does not use class probabilities
• Binary Encoding (BE) and Spectral Signature Matching (SSM) - Bit-compare simple binary codes calculated from spectra

Unsupervised Classification
K-Means
• Use of statistical techniques to group n-dimensional data into their natural spectral classes
• The K-Means unsupervised classifier uses a cluster analysis approach that requires the analyst to select the number of clusters to be located in the data, arbitrarily locates this number of cluster centers, then iteratively repositions them until optimal spectral separability is achieved

ISODATA (Iterative Self-Organizing Data Analysis Technique)
• ISODATA unsupervised classification calculates class means evenly distributed in the data space and then iteratively clusters the remaining pixels using minimum distance techniques
• Each iteration recalculates means and reclassifies pixels with respect to the new means
• This process continues until the number of pixels in each class changes by less than the selected pixel change threshold or
the maximum number of iterations is reached

Supervised Classification
• Supervised classification requires that the user select training areas for use as the basis for classification
• Various comparison methods are then used to determine if a specific pixel qualifies as a class member
• A broad range of classification methods, such as Parallelepiped, Maximum Likelihood, Minimum Distance, Mahalanobis Distance, Binary Encoding, and Spectral Angle Mapper, can be used

Parallelepiped
• Parallelepiped classification uses a simple decision rule to classify multidimensional spectral data
• The decision boundaries form an n-dimensional parallelepiped in the image data space
• The dimensions of the parallelepiped are defined based upon a standard deviation threshold from the mean of each selected class

Maximum Likelihood
• Maximum likelihood classification assumes that the statistics for each class in each band are normally distributed
• The probability that a given pixel belongs to a specific class is then calculated
• Unless a probability threshold is selected, all pixels are classified
• Each pixel is assigned to the class that has the highest probability (i.e., the "maximum likelihood")

Minimum Distance
• The minimum distance classification uses the mean vectors of each region of interest (ROI)
• It calculates the Euclidean distance from each unknown pixel to the mean vector for each class
• All pixels are classified to the closest ROI class unless the user
specifies standard deviation or distance thresholds, in which case some pixels may be unclassified if they do not meet the selected criteria

Euclidean Distance

Mahalanobis Distance
• The Mahalanobis Distance classification is a direction-sensitive distance classifier that uses statistics for each class
• It is similar to the Maximum Likelihood classification, but assumes all class covariances are equal and is therefore a faster method
• All pixels are classified to the closest ROI class unless the user specifies a distance threshold, in which case some pixels may be unclassified if they do not meet the threshold

Bhattacharyya Distance
B = (1/8) (μ1 - μ2)^T [ (Σ1 + Σ2)/2 ]^-1 (μ1 - μ2) + (1/2) Ln { |(Σ1 + Σ2)/2| / sqrt(|Σ1| |Σ2|) }
(first term: mean difference term; second term: covariance term)

Binary Encoding Classification
• The binary encoding classification technique encodes the data and endmember spectra into 0s and 1s based on whether a band falls below or above the spectrum mean
• An exclusive OR function is used to compare each encoded reference spectrum with the encoded data spectra, and a classification image is produced
• All pixels are classified to the endmember with the greatest number of matching bands unless the user specifies a minimum match threshold, in which case some pixels may be unclassified if they do not meet the criteria

Spectral Angle Mapper (SAM) Classification
• The Spectral Angle Mapper (SAM) is a physically based spectral classification that uses the n-dimensional angle to match pixels to reference spectra
• The SAM algorithm determines the spectral similarity between two spectra by calculating the angle between the spectra, treating them as vectors in a space with dimensionality equal to the number of bands
• The SAM algorithm assumes that hyperspectral image data have been reduced to "apparent reflectance", with all dark current and path radiance biases removed

Spectral Angle Mapper (SAM) Algorithm
The SAM algorithm uses a reference spectrum, r, and the spectrum found at each pixel, t. The basic comparison algorithm to find the angle α (where nb = number of bands in the image) is:

α = cos^-1 [ Σ t_i r_i / ( sqrt(Σ t_i^2) sqrt(Σ r_i^2) ) ], sums over i = 1, ..., nb

OR, equivalently, α = cos^-1 [ (t · r) / (||t|| ||r||) ]

Minimum Noise Fraction (MNF) Transformation
• The minimum noise fraction (MNF) transformation is used to determine the inherent dimensionality of image data, to segregate noise in the data, and to reduce the computational requirements for subsequent processing
• The MNF transformation consists essentially of two cascaded Principal Components transformations
• The first transformation, based on an estimated noise covariance matrix, decorrelates and rescales the noise in the data. This first step results in transformed data in which the noise has unit variance and no band-to-band correlations
• The second step is a standard Principal Components transformation of the noise-whitened data
• For further spectral processing, the inherent dimensionality of the data is determined by examination of the final eigenvalues and the associated images
• The data space can be divided into two parts: one part associated with large eigenvalues and coherent eigenimages, and a complementary part with near-unity eigenvalues and noise-dominated images. By using only the coherent portions, the noise is separated from the data, thus improving spectral processing results.
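The two-cascade MNF transform described above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular package's implementation; the function name `mnf` and the assumption that a positive-definite noise covariance estimate `Cn` is already available are mine.

```python
import numpy as np

def mnf(X, Cn):
    """Minimum Noise Fraction sketch.

    X  : (pixels, bands) data matrix.
    Cn : (bands, bands) estimated noise covariance (assumed positive definite).
    Returns the MNF-transformed data and the eigenvalues in decreasing order.
    """
    # Step 1: noise whitening. Factor Cn and rescale so the transformed
    # noise has unit variance and no band-to-band correlations.
    d, E = np.linalg.eigh(Cn)
    W = E @ np.diag(1.0 / np.sqrt(d))          # whitening matrix
    Xw = (X - X.mean(axis=0)) @ W              # noise-whitened, mean-centered data

    # Step 2: standard principal components of the whitened data.
    C = np.cov(Xw, rowvar=False)
    lam, V = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]              # sort by decreasing eigenvalue
    return Xw @ V[:, order], lam[order]
```

Bands with large leading eigenvalues correspond to coherent signal; the trailing near-unity eigenvalues are noise-dominated and can be discarded before further spectral processing.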
N-Dimensional Visualization
• Spectra can be thought of as points in an n-dimensional scatterplot, where n is the number of bands
• The coordinates of the points in n-space consist of n values that are simply the spectral radiance or reflectance values in each band for a given pixel
• The distribution of these points in n-space can be used to estimate the number of spectral endmembers and their pure spectral signatures

Pixel Purity Index (PPI)
• The "Pixel Purity Index" (PPI) is a means of finding the most "spectrally pure," or extreme, pixels in multispectral and hyperspectral images
• PPI is computed by repeatedly projecting n-dimensional scatterplots onto a random unit vector
• The extreme pixels in each projection are recorded, and the total number of times each pixel is marked as extreme is noted
• A PPI image is created in which the DN of each pixel corresponds to the number of times that pixel was recorded as extreme

Matched Filter Technique
Matched filtering maximizes the response of a known endmember and suppresses the response of the composite unknown background, thus "matching" the known signature
• Provides a rapid means of detecting specific minerals based on matches to specific library or image endmember spectra
• Produces images similar to the unmixing technique, but with significantly less computation
• Results (values from 0 to 1) provide a means of estimating the relative degree of match to the reference spectrum, where "1" is a perfect match

Spectral Mixing
• Natural surfaces are rarely composed of a single uniform material
• Spectral mixing occurs when materials with different spectral properties are represented by a single image pixel
• Researchers who have investigated mixing scales and linearity have found that, if the scale of the mixing is large (macroscopic), mixing occurs in a linear fashion
• For microscopic or intimate mixtures, the mixing is generally nonlinear

Mixed Spectra Models
Mixed spectra effects can be formalized in three
ways:
• A physical model
• A mathematical model
• A geometric model

Mixture Tuned Matched Filtering (MTMF)
• MTMF constrains the Matched Filtering result to physically feasible mixtures of the composite unknown background and the known target
• MTMF produces the standard Matched Filter score images plus an additional set of "infeasibility images" for each endmember
• The best match to a target is obtained when the Matched Filter score is high (near 1) and the "infeasibility" score is low (near 0)

Principal Component Analysis (PCA)
• Calculation of new transformed variables (components) by a coordinate rotation
• Components are uncorrelated and ordered by decreasing variance
• First component axis aligned in the direction of the highest percentage of the total variance in the data
• Component axes are mutually orthogonal
• Maximum SNR and largest percentage of total variance in the first component

Principal Component Analysis (PCA) (Cont)
• The mean of the original data is the origin of the transformed system, with the transformed axes of each component mutually orthogonal
• To begin the transformation, the covariance matrix, C, is found.
Using the covariance matrix, the eigenvalues, λi, are obtained from
|C - λi I| = 0, where i = 1, 2, ..., n
(n is the total number of original images and I is an identity matrix)

Principal Component Analysis (PCA) (Cont)
• The eigenvalues, λi, are equal to the variance of each corresponding component image
• The eigenvectors, ei, define the axes of the components and are obtained from (C - λi I) ei = 0
• The principal components are then given as PC = T · DN, where DN is the digital number matrix of the original data and T is the (n x n) transformation matrix with matrix elements eij, i, j = 1, 2, 3, ..., n

A Matrix Equation Problem:
Find the value of vector x from measurement of a different vector y, where they are related by the matrix equation
y = Ax, or yi = Σj aij xj
Note 1: If both A and x are known, it is trivial to find y
Note 2: In our problem, y is the measurement, A is determined from the physics of the problem, and we want to retrieve the value of x from y

Mean and Variance
Mean: x̄ = (1/N) Σ xk
Variance: var(x) = (1/N) Σ (xk - x̄)² = σx²
where the sums are over k = 1, 2, ..., N

Covariance
cov(x,y) = (1/N) Σ (xk - x̄)(yk - ȳ) = (1/N) Σ xk yk - x̄ ȳ
Note 1: cov(x,x) = var(x)
Note 2: If the mean values of x and y are zero, then cov(x,y) = (1/N) Σ xk yk
Note 3: Sums are over k = 1, 2, ..., N

Covariance Matrix
• Let x = (x1, x2, ..., xn) be a random vector with n components
• The covariance matrix of x is defined to be C = ⟨(x - μ)(x - μ)^T⟩, where μ = (μ1, μ2, ..., μn)^T and μk = (1/N) Σm xmk, with the summation over m = 1, 2, ..., N

Gaussian Probability Distributions
• Many physical processes are well represented by Gaussian distributions, given by:
  P(x) = (1 / (√(2π) σx)) exp[ -(x - ⟨x⟩)² / (2σx²) ]
• Given the mean and variance of a Gaussian random variable, it is possible to evaluate all of its higher moments
• The form of the Gaussian is analytically simple

Normal (Gaussian) Distribution

Scatterplots
Spectral Signatures
Laboratory Data: Two classes of vegetation
Discrete (Feature) Space
[Figure: scatterplot of samples from two classes; % reflectance at 0.69 µm vs. % reflectance at 0.67 µm]

Hughes Effect
G.F. Hughes, "On the mean accuracy of statistical pattern recognizers," IEEE Trans. Inform. Theory, Vol. IT-14, pp. 55-63, 1968.

Higher Dimensional Space Implications
• High dimensional space is mostly empty
• Data in high dimensional space lies mostly in a lower dimensional structure
• Normally distributed data will have a tendency to concentrate in the tails; uniformly distributed data will concentrate in the corners

Higher Dimensional Space Geometry
• The number of labeled samples needed for supervised classification increases rapidly with dimensionality
  – In a specific instance, it has been shown that the number of samples required increases linearly for a linear classifier and as the square for a quadratic classifier
  – It has been estimated that the number increases exponentially for a non-parametric classifier
• For most high dimensional data sets, lower dimensional linear projections tend to be normal or a combination of normals

HSI Data Analysis Scheme*
[Diagram: 200-dimensional data → class-conditional feature extraction → feature selection → classifier/analyzer, using class-specific information]
*After David Landgrebe, Purdue University

HSI Image of Washington DC Mall*
Define Desired Classes
Training areas designated by polygons outlined in white
*After David Landgrebe, Purdue University

Thematic Map of Washington DC Mall*
Legend classes: Roofs, Streets, Grass, Trees, Paths, Water, Shadows

Operation                   CPU Time (sec.)     Analyst Time
Display Image               18
Define Classes                                  < 20 min.
Feature Extraction          12
Reformat                    67
Initial Classification      34
Inspect and Mod. Training                       ≈ 5 min.
Final Classification        33
Total                       164 sec = 2.7 min.  ≈ 25 min.

(No preprocessing involved)
*After David Landgrebe, Purdue University

Hyperspectral Imaging Barriers
• Scene - Varies from hour to hour and sq. km to sq.
km
• Sensor - Spatial Resolution, Spectral Bands, S/N
• Processing System
  – Classes to be labeled
  – Number of samples to define the classes
  – Features to be used
  – Complexity of the Classifier

Operating Scenario
• Remote sensing by airborne or spaceborne hyperspectral sensors
• Finite flux reaching the sensor causes a spatial-spectral resolution trade-off
• Hyperspectral data has hundreds of bands of spectral information
• Spectrum characterization allows subpixel analysis and material identification

Spectral Mixture Analysis
Assumes reflectance from each pixel is caused by a linear mixture of subpixel materials

Mixed Spectra Example
[Figure: digital count vs. wavelength (0.4-2.4 microns) for Parking Lot, Vegetation, and a 1:1 Mixture]

Mixed Pixels and Material Maps
[Figure: red and green input images with corresponding fraction maps; pure pixels map to fractions of 1.0 or 0.0, mixed pixels to intermediate fractions]

Traditional Linear Unmixing
Ri = Σe Ri,e fe + εi, for bands i = 1, ..., k, summed over the N endmembers e

Constraint Conditions
• Unconstrained: fe unrestricted
• Partially Constrained: Σe fe = 1.0
• Fully Constrained: Σe fe = 1.0 and 0.0 ≤ fe ≤ 1.0
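The linear mixing model above can be solved per pixel as a least-squares problem. The sketch below is illustrative only: it assumes the endmember spectra are already collected into a matrix `E` (bands × endmembers), and it imposes the partially constrained sum-to-one condition softly via an appended, heavily weighted equation, which is one simple numerical trick rather than the method the slides describe.

```python
import numpy as np

def unmix(E, r, weight=100.0):
    """Partially constrained linear unmixing of one pixel spectrum.

    E      : (bands, endmembers) endmember spectra matrix.
    r      : (bands,) observed pixel spectrum.
    weight : strength of the soft sum-to-one constraint.
    Returns the endmember fraction vector f with sum(f) ≈ 1.
    """
    nb, ne = E.shape
    A = np.vstack([E, weight * np.ones((1, ne))])   # append sum-to-one row
    b = np.concatenate([r, [weight]])               # target for that row is 1.0
    f, *_ = np.linalg.lstsq(A, b, rcond=None)
    return f
```

A fully constrained solver would additionally clip or project the fractions into [0, 1], typically via a quadratic-programming step.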
Hierarchical Linear Unmixing Method
• Unmixes broad material classes first
• Proceeds to a group's constituents only if the unmixed fraction is greater than a given threshold

Example Materials Hierarchy
Full Library:
• Concrete
• Metal
• Water
• Vegetation
  – Trees
    • Deciduous Trees
    • Coniferous Trees
  – Grass
[Diagram: a mixed pixel unmixed first into Man-Made, Water, and Vegetation; Man-Made splits into Concrete and Metal; Vegetation splits into Trees (Deciduous, Coniferous) and Grass]

Stepwise Unmixing Method
• Employs linear unmixing to find fractions
• Uses iterative regressions to accept only the endmembers that improve a statistics-based model
• Shown to be superior to the classic linear method
  – Has better accuracy
  – Can handle more endmembers
• Quantitatively tested only on synthetic data

Performance Evaluation
Error Metric: SE = (1/N) Σpixels Σmaterials ( f_truth - f_test )²
• Compare squared error from the traditional, stepwise, and hierarchical methods
• Visually assess fraction maps for accuracy

Endmember Selection
• Endmembers are simply material types
  – Broad classification: road, grass, trees…
  – Fine classification: dry soil, moist soil...
• Use image-derived endmembers to produce a spectral library
  – Average reference spectra from "pure" sample pixels
  – Choose a specific number of distinct endmembers

Materials Hierarchy
• Grouped similar materials into a 3-level hierarchy
  – Level 1
  – Level 2
  – Level 3

Squared Error Results

Stepwise Unmixing Comparisons
• Linear unmixing does poorly, forcing fractions for all materials
• The hierarchical approach performs better but requires extensive user involvement
• The stepwise routine succeeds using adaptive endmember selection without extra preparation

HSI Image of Washington DC Mall
• HYDICE Airborne System
• 1208 Scan Lines, 307 Pixels/Scan Line
• 210 Spectral Bands in the 0.4-2.4 µm Region
• 155 Megabytes of Data (Not yet Geometrically Corrected)

Hyperspectral Imaging Potential
• Assume 10-bit data in a 100-dimensional space
• That is (1024)^100 ≈ 10^300 discrete locations
• Even for a data set of 10^6 pixels, the probability of any two pixels lying in the same discrete location is extremely small
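The "potential" figures above can be checked with a short back-of-the-envelope script. It applies a union (birthday-problem) bound on the collision probability, under the optimistic assumption that pixels fall uniformly and independently over the discrete cells; the variable names are illustrative.

```python
import math

# 10-bit data in 100 bands: count the discrete cells in the data space.
cells = 1024 ** 100                 # = 2**1000, roughly 10**300 locations
pixels = 10 ** 6                    # size of the data set

# Union bound: P(any two pixels share a cell) <= C(pixels, 2) / cells.
# Work in log10 because the probability underflows a float.
pairs = pixels * (pixels - 1) / 2
log10_p = math.log10(pairs) - 100 * math.log10(1024)
```

`log10_p` comes out near -289, i.e. the collision probability is astronomically small, which is the slide's point: real hyperspectral data occupies a vanishing fraction of its nominal data space.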