Principal Component Analysis of Cas A
Jessica S. Warren, John P. Hughes
Rutgers University
PCA Technique
Goal: To look at spectral variations in SNRs
Unbiased technique for identifying spectral variations in a statistically quantifiable way
Specific to Cas A:
Look for Si, Fe, non-thermal regions
Use to quantify differences in Si/Fe regions
Use to find most Fe-rich regions
Cas A – 3 Color
Hughes et al. 2000 – 5 ks image
Red: 0.6 – 1.65 keV
Green: 1.65 – 2.25 keV
Blue: 2.25 – 7.50 keV
Cas A – Line-to-Continuum Ratio
Hwang & Laming 2003
Fe L image overlaid with contours of Fe L line-to-continuum ratios
What is PCA?
PCA is a statistical technique often used to reduce the dimensionality of a dataset.
Represent the data in matrix form
Rows: spatial regions; Columns: spectral channels
PCA finds the eigenvalues & eigenvectors of the covariance (or correlation) matrix computed from the data matrix
Eigenvectors: axes that maximize the variance of the data
Eigenvalues: quantify the amount of variation accounted for by the corresponding eigenvector
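The eigen-decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Murtagh & Heck code: it assumes the covariance matrix is used (a correlation matrix is another common choice), and the function name `pca_eig` and the toy data are hypothetical.

```python
import numpy as np

def pca_eig(data):
    """Eigen-decompose the covariance matrix of `data` (rows: regions, columns: channels)."""
    centered = data - data.mean(axis=0)            # subtract the mean spectrum
    cov = centered.T @ centered / (len(data) - 1)  # m x m covariance matrix
    evals, evecs = np.linalg.eigh(cov)             # symmetric solver, ascending eigenvalues
    order = np.argsort(evals)[::-1]                # re-sort so the largest variance comes first
    return evals[order], evecs[:, order]

# Toy data (not Cas A): 200 "regions" with 4 "channels" of very different scatter.
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
evals, evecs = pca_eig(toy)
```

Each column of `evecs` is one principal axis, and the matching entry of `evals` is the variance of the data along that axis.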
PCA - Picture
Data: n objects (spatial regions), each with m attributes (spectral channels)
Each object represented by a vector in space of m attributes
Eigenvectors are new axes in m-space that:
Maximize the variance of the objects' projections onto the axis (C)
Equivalent to: Minimize the summed squared distances of all objects from the axis (B)
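Why maximizing variance and minimizing distance are equivalent follows from the Pythagorean theorem. The notation is assumed here for illustration: $\mathbf{x}_i$ is the mean-subtracted vector of object $i$, $\mathbf{u}$ is a unit vector along the candidate axis, and $d_i$ is the perpendicular distance of object $i$ from that axis.

```latex
\begin{align}
\|\mathbf{x}_i\|^2 &= (\mathbf{x}_i \cdot \mathbf{u})^2 + d_i^2, \\
\sum_{i=1}^{n} d_i^2 &= \sum_{i=1}^{n} \|\mathbf{x}_i\|^2
                      - \sum_{i=1}^{n} (\mathbf{x}_i \cdot \mathbf{u})^2 .
\end{align}
```

The first sum on the right is fixed by the data, so the axis that minimizes the summed squared distances is the one that maximizes the summed squared projections, which is proportional to the variance along $\mathbf{u}$.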
How do we use PCA for SNRs?
Divide SNR into many spatial regions
Each region contains at least a user-determined minimum number of counts
50 ks observation
1000 minimum counts
Spatial binning: 4”x4” to 16”x16”
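One way to get regions of varying size that all meet a count threshold is a quadtree-style subdivision. The slides do not say how the regions were actually constructed, so the sketch below is only an assumed scheme: base cells are taken to be 4” on a side, a 16” block is split into 8” and then 4” regions only while every quadrant still meets the threshold, and the name `adaptive_bins` is hypothetical.

```python
import numpy as np

def adaptive_bins(counts, min_counts=1000, max_size=4):
    """Return (row, col, size) regions, in units of base cells, each with >= min_counts.

    `counts` is a 2D array of counts per base cell (assumed 4" x 4");
    `max_size` = 4 base cells corresponds to the assumed 16" x 16" upper limit.
    """
    regions = []

    def subdivide(r, c, size):
        if counts[r:r + size, c:c + size].sum() < min_counts:
            return                                   # too faint even at this size: skip
        if size > 1:
            half = size // 2
            kids = [(r + dr, c + dc) for dr in (0, half) for dc in (0, half)]
            # Split only if every quadrant meets the threshold on its own.
            if all(counts[kr:kr + half, kc:kc + half].sum() >= min_counts
                   for kr, kc in kids):
                for kr, kc in kids:
                    subdivide(kr, kc, half)
                return
        regions.append((r, c, size))

    # Tile the image with the largest blocks; partial blocks at the edges are ignored.
    for r in range(0, counts.shape[0] - max_size + 1, max_size):
        for c in range(0, counts.shape[1] - max_size + 1, max_size):
            subdivide(r, c, max_size)
    return regions

# Toy image: 300 counts in every base cell, so 2x2 cells (8") is the smallest valid region.
toy_regions = adaptive_bins(np.full((8, 8), 300), min_counts=1000)
```

Bright areas end up with small regions and faint areas with large ones, which keeps the statistical quality of each spectrum roughly comparable.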
Using PCA
Get spectrum of each region
Bin each spectrum in energy in the same way
Normalize each to the total number of counts in that region
Input to PCA code (Murtagh & Heck 1987)
Outputs will be eigenvectors and eigenvalues
Energy Binning – First Try!
Bin   Name                 Energy (keV)
1     Absorbed             0.2 – 0.45
2     Oxygen               0.45 – 0.7
3     Iron L               0.7 – 1.2
4     Magnesium            1.2 – 1.4
5     Continuum/Mg         1.4 – 1.6
6     Silicon              1.6 – 2.2
7     Sulphur              2.2 – 2.6
8     Argon                2.6 – 3.4
9     Calcium              3.4 – 5.0
10    Iron K/Continuum     5.0 – 9.0
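Binning a region's photons into these ten bands and normalizing the result could look like the sketch below. The band edges come from the table above; the function name `normalized_band_spectrum` is hypothetical. The sketch normalizes to the counts that fall inside 0.2 – 9.0 keV, which is an assumption about what "total number of counts" means.

```python
import numpy as np

# Band edges in keV, taken from the ten-bin table above.
BAND_EDGES_KEV = np.array([0.2, 0.45, 0.7, 1.2, 1.4, 1.6, 2.2, 2.6, 3.4, 5.0, 9.0])

def normalized_band_spectrum(photon_energies_kev):
    """Histogram photon energies into the ten bands and divide by the in-band total."""
    counts, _ = np.histogram(photon_energies_kev, bins=BAND_EDGES_KEV)
    total = counts.sum()                 # photons outside 0.2-9.0 keV are not counted
    if total == 0:
        raise ValueError("region has no counts between 0.2 and 9.0 keV")
    return counts / total

# Five toy photons: one below O, one in O, two in Si, one near Fe K.
toy_spectrum = normalized_band_spectrum([0.3, 0.5, 1.8, 1.9, 6.4])
```

Stacking one such row per spatial region gives the data matrix that is passed to PCA.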
Results: A Work in Progress
Outputs from PCA are eigenvectors, or principal components
As many components as spectral channels
Rank the components based on eigenvalues
Consider only the first two here
Represent 38% and 31% of variation in spectra
Still to do: quantify significance of remaining components
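Percentages of this kind are conventionally each eigenvalue divided by the sum of all eigenvalues; that convention is assumed here. The eigenvalues in the sketch are toy numbers, not the Cas A results.

```python
import numpy as np

def explained_fraction(eigenvalues):
    """Fraction of the total variance carried by each component."""
    ev = np.asarray(eigenvalues, dtype=float)
    return ev / ev.sum()

toy_eigenvalues = [4.0, 3.0, 2.0, 1.0]       # illustrative values only
fractions = explained_fraction(toy_eigenvalues)
cumulative = np.cumsum(fractions)            # running total over ranked components
```

The cumulative sum shows how much of the spectral variation is retained when only the leading components are kept.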
Eigenvectors
Soft vs. hard spectra – perhaps column density
Si-rich vs. Si-poor spectra
Projections of Spectra in PC1-PC2 Plane
Project each spectrum onto the new axes (eigenvectors).
Distinct features in this plot indicate different types of spectra.
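Projecting spectra onto the leading components amounts to a dot product between each mean-subtracted spectrum and the eigenvectors. This is a minimal sketch under the same covariance-matrix assumption as before; `project_onto_components` is a hypothetical name and the input is random toy data.

```python
import numpy as np

def project_onto_components(spectra, n_components=2):
    """Return the coordinates of each spectrum along the top principal components."""
    centered = spectra - spectra.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    top = evecs[:, np.argsort(evals)[::-1][:n_components]]   # leading eigenvectors
    return centered @ top                                    # shape: (n_regions, n_components)

# Toy normalized spectra: 50 regions, 10 bands, each row summing to 1.
rng = np.random.default_rng(1)
toy_spectra = rng.dirichlet(np.ones(10), size=50)
scores = project_onto_components(toy_spectra)
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives a PC1-PC2 scatter plot, and painting each region with one column of `scores` gives a projection map.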
Map of Projections – PC1
But dark regions don’t just represent nonthermal emission . . .
Projections
Look at regions in PC1-PC2 plane: degeneracy is broken
Non-thermal hard spectra
Softer spectra
Thermal hard spectra
Map of Projections – PC2
But light regions don’t just represent iron-rich material . . .
Projections
Look at regions in PC1-PC2 plane: degeneracy is broken
Non-thermal Fe rich
Si rich
Conclusions . . .
Able to separate out major differences in spectra
Need to:
Go from PC1-PC2 plot to the spectra to ID regions of interest
Is 3rd component useful?
Optimize spectral binning vs. spatial binning for 1 Ms observation
Figure labels: Non-thermal, Fe rich, Thermal hard, Soft, Si rich
Simulated Data
Need to determine significance of results
Calculate mean spectrum
Input to Poisson random number generator
Only variations then due to Poisson noise
Input simulations to PCA
Tells which eigenvectors are significant
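The simulation procedure above can be sketched as follows. Several details are assumptions for illustration: every simulated region gets the same expected number of counts, the covariance matrix is used, and an observed eigenvalue is called significant if it exceeds a chosen percentile of the noise-only eigenvalues of the same rank. The function names are hypothetical.

```python
import numpy as np

def noise_eigenvalues(mean_spectrum, counts_per_region, n_regions, n_sims=100, seed=0):
    """Eigenvalues (descending) from PCA of spectra that differ only by Poisson noise.

    `mean_spectrum` holds the fraction of counts per band and should sum to 1.
    """
    rng = np.random.default_rng(seed)
    expected = np.asarray(mean_spectrum, dtype=float) * counts_per_region
    out = np.empty((n_sims, expected.size))
    for i in range(n_sims):
        sim = rng.poisson(expected, size=(n_regions, expected.size)).astype(float)
        sim /= sim.sum(axis=1, keepdims=True)        # normalize like the real spectra
        out[i] = np.linalg.eigvalsh(np.cov(sim, rowvar=False))[::-1]
    return out

def significant_components(observed_eigenvalues, noise, percentile=99):
    """True where an observed eigenvalue exceeds the noise percentile at the same rank."""
    return np.asarray(observed_eigenvalues) > np.percentile(noise, percentile, axis=0)

# Toy check: a flat ten-band mean spectrum with 1000 counts in each of 200 regions.
noise = noise_eigenvalues(np.full(10, 0.1), counts_per_region=1000, n_regions=200, n_sims=20)
```

Components whose eigenvalues are no larger than the noise-only ones cannot be distinguished from Poisson fluctuations about the mean spectrum.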
Scree Test
2 significant eigenvalues
This is the variance explained by a particular eigenvector.