                Databases@MPA,
                access methods and plans

                       With contributions from
        • JHU: Alex Szalay, Jan Vandenberg
        • MPA: Jeremy Blaizot, Jarle Brinchmann,
               Guinevere Kauffmann, Anja von der Linden,
               Ben Panter, Guo Qi, Volker Springel,
               Vivienne Wild



Toledo, 2006-02-25         Databases @ MPA
                     Last year, Budapest
• Presented milli-Millennium halo merger tree
  database
• Requests:
     – More properties (lambda, ...): not yet
     – Galaxies: done
     – Correlation with environment (galaxies in voids): done
     – Millennium


• Why use databases? Ask Alex.



                       Current status
• milli-Millennium
     – Galaxies added: merger trees, links to their parent halos
     – Density field at various smoothings
     – Updated web site (demo)
• Millennium subset
     – Subset (~2%, 10x milli-Mil) of halo and galaxy trees
     – z=0 density field
• Millennium
     – Halo trees in database (proprietary)
     – SAM galaxies under way (model still to be settled, etc.)
     – Density fields at all z will be added: 1056964608 rows
• Durham
     – milli-Millennium mirror (Postgres)
     – Durham halo tree and galaxy catalogues


                     Other databases
• ROSAT: source catalogues and RASS photons (~100
  million)
• SDSS Peripherals
     – SDSS_MPA (Brinchmann, Kauffmann, Tremonti et al.)
     – MOPED (Ben Panter)
     – SDSS_PCA (Vivienne Wild et al.)
• GalICS (Jeremy Blaizot)
• HEALPix all sky maps (Alex Szalay, Tony Banday)
     – WMAP (3-year data soon!)
     – extinction maps
     – radio maps (Bonn)
     – ROSAT background (hopefully)



                     Access
• Public: http://www.g-vo.org/mpasims
• Local web apps to Millennium, BESTDR3 and
  peripherals: http://www.g-vo.org/sdssdr3/
• Public web browser queries are limited (1 min,
  10000 rows)
• Local databases and web apps are less limited




                             Streaming
• Query results are normally buffered temporarily on the
  server, which costs memory
• Streaming queries: faster, less limited (only a
  timeout)
• Access:
     – IDL (with Ben Panter)
          • wget --http-user=*** --http-password=*** -O localfile.csv
            "http://www.g-vo.org/sdssdr3/DBQueryStream?SQL=select * from moped..agebin"
          • GUI asking for username/password
          • Interprets the CSV stream and turns it into IDL components

     – TOPCAT
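
The streaming access described above can also be scripted outside IDL. The following is a minimal Python sketch, not the MPA client: only the endpoint URL and the example query come from the slide. It assumes the stream is CSV with a header line, and the column names in the stand-in response are invented for illustration.

```python
import csv
import io
from urllib.parse import quote

# Endpoint taken from the slide; everything else here is illustrative.
STREAM_ENDPOINT = "http://www.g-vo.org/sdssdr3/DBQueryStream"


def build_stream_url(sql, endpoint=STREAM_ENDPOINT):
    """Return the endpoint URL with the SQL text percent-encoded."""
    return endpoint + "?SQL=" + quote(sql, safe="")


def iter_rows(text_stream):
    """Yield one dict per CSV row, assuming the first line is a header."""
    for row in csv.DictReader(text_stream):
        yield row


url = build_stream_url("select * from moped..agebin")

# Stand-in for the HTTP response body; these column names are made up.
fake_response = io.StringIO("bin,age\n0,0.01\n1,0.1\n")
rows = list(iter_rows(fake_response))
print(url)
print(rows)
```

Against the real service, the stand-in stream would be replaced by an authenticated HTTP response, for example one opened through `urllib.request` with an `HTTPBasicAuthHandler`. Because rows are consumed one at a time, nothing forces the full result set into client memory.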



                       Plans: Millennium
• Millennium:
     – Tune database
          • 750000000 halos
          • N x 1000000000 galaxies
          • 63 x 256^3 density field grid cells
     – More halo properties (shape, λ, ...)
     – More galaxy catalogues
          • different parameters
          • different algorithms (GalICS, Durham, ...)
     –   Light cone mock catalogues
     –   Galaxy spectra (+ PCA)
     –   Links to SDSS mirror and peripherals
     – Proper metadata handling (à la SkyServer)
     – "SAM online"
     – Move webapps to MPA
     – Use JHU services, install CasJobs
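
As a quick check on the table sizes quoted above, the density-field row count follows directly from the grid dimensions. The sketch below assumes one database row per grid cell, which is how the figure in the status slide appears to be derived.

```python
# Numbers from the slides: 63 density grids of 256^3 cells each.
n_grids = 63
cells_per_side = 256

# Assumption: one database row per grid cell.
cells_per_grid = cells_per_side ** 3
n_rows = n_grids * cells_per_grid

print(cells_per_grid)  # 16777216
print(n_rows)          # 1056964608
```

The result matches the 1056964608 rows quoted for the density fields.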



         Plans: SDSS mirror + peripherals
• Make mirror web site public
• Upgrade SDSS mirror to DR4 …
• Stabilize, document, publish SDSS
  peripherals
• Proper metadata handling
• Links to Millennium
• Personal databases: MyDB (à la SkyServer)




                     Theory VO: spectra
• Combine theory and observations
• Example: query-by-example on theory
  spectra
• Find similar spectra and, from these, the actual
  galaxy formation history
• Chi-squared against all stored spectra? Slow, and
  requires storing all of them
• Idea (not original, see HVO/JHU talks): use
  PCA to compress data



                          PCA
• Need training sample of theory spectra to
  create eigenspectra
• Project all spectra
• Store PCA amplitudes in DB
• Provide web service:
     – Upload (observational) spectrum (IVOA SSA/SED)
     – Project onto theory eigenspectra
     – Use amplitudes as parameters in query for
       “nearby” amplitudes
     – Return corresponding theory spectra
     – Return corresponding galaxy formation histories,
       or their halos, or their environment …
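
The projection and lookup steps of such a service can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the planned web service: the eigenspectra are taken to be orthonormal, the four-bin spectra and the `model_A`/`model_B`/`model_C` identifiers are invented, and a brute-force distance sort stands in for whatever index the database would use.

```python
import math


def project(flux, mean, eigenspectra):
    """PCA amplitudes a_i = sum over bins of (f - m) * e_i, for orthonormal e_i."""
    residual = [f - m for f, m in zip(flux, mean)]
    return [sum(r * e for r, e in zip(residual, eig)) for eig in eigenspectra]


def nearest(query, stored, k=1):
    """Return the ids of the k stored amplitude vectors closest to `query`."""
    def distance(item):
        _, amps = item
        return math.sqrt(sum((q - a) ** 2 for q, a in zip(query, amps)))

    ranked = sorted(stored.items(), key=distance)
    return [name for name, _ in ranked[:k]]


# Hypothetical theory basis on four wavelength bins (orthonormal by construction).
mean = [1.0, 1.0, 1.0, 1.0]
eigenspectra = [[0.5, 0.5, -0.5, -0.5], [0.5, -0.5, 0.5, -0.5]]

# Hypothetical amplitudes stored in the database, keyed by theory spectrum id.
stored = {"model_A": [1.0, 0.0], "model_B": [0.0, 1.0], "model_C": [-1.0, 0.0]}

uploaded = [1.4, 1.5, 0.6, 0.5]  # made-up "observed" spectrum
amps = project(uploaded, mean, eigenspectra)
matches = nearest(amps, stored, k=2)
print(amps, matches)
```

The returned ids are what the service would then use to fetch the corresponding theory spectra, formation histories, halos or environments.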


                         Issues
• Dealing with errors, gaps: “gappy PCA”
  (Connolly & Szalay)
• Normalization:
     – incoming spectrum in general from very different
       dataset, needs common normalization
     – Incoming set will have gaps, errors
     – Ad hoc normalization possible (and works quite
       well)
• Indexing of complex multi-dimensional point
  set for quick k-nearest-neighbour search
  (Voronoi? See Laszlo's work)
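
To make the gap handling concrete, here is a minimal Python sketch of the gappy-PCA idea as it is commonly described: bins that are missing get zero weight, and the amplitudes come from a weighted least-squares fit instead of a plain dot product. The four-bin basis, the test spectrum and the helper names are assumptions for illustration only, and the normalization fit is left out.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination
    with partial pivoting (raises ZeroDivisionError if singular)."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def gappy_amplitudes(flux, weights, mean, eigenspectra):
    """Minimize sum_l w_l * (f_l - m_l - sum_i a_i e_il)^2 over the a_i."""
    resid = [f - m for f, m in zip(flux, mean)]
    # Normal equations M a = F, built only from bins with non-zero weight.
    M = [[sum(w * ei * ej for w, ei, ej in zip(weights, e_i, e_j))
          for e_j in eigenspectra] for e_i in eigenspectra]
    F = [sum(w * r * e for w, r, e in zip(weights, resid, e_i))
         for e_i in eigenspectra]
    return solve(M, F)


# Hypothetical orthonormal basis on four wavelength bins.
mean = [1.0, 1.0, 1.0, 1.0]
eigenspectra = [[0.5, 0.5, -0.5, -0.5], [0.5, -0.5, 0.5, -0.5]]

# Spectrum built as mean + 0.9*e1 + 0.2*e2, with the last bin missing.
flux = [1.55, 1.35, 0.65, 0.0]
weights = [1.0, 1.0, 1.0, 0.0]  # zero weight marks the gap

amps = gappy_amplitudes(flux, weights, mean, eigenspectra)
print(amps)
```

With real data the weights would typically be inverse variances, so noisy bins count less and masked bins not at all.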

                     Normalized gappy PCA
• Fit the normalization factor at the same time as the
  PCA amplitudes. Model:




• Minimize (over a_i and N):
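
The slide's own equations were images and are not reproduced here. One formulation consistent with the two bullets is sketched below as an assumption, not as the authors' exact expressions. Here f is the incoming spectrum, m the mean spectrum of the training set, e_i the eigenspectra, and w a per-bin weight (zero in gaps, for example inverse variance elsewhere). All of these symbols are introduced for illustration.

```latex
% Assumed model: the observed spectrum is a rescaled PCA reconstruction
f(\lambda) \simeq N \Big[ m(\lambda) + \sum_i a_i \, e_i(\lambda) \Big]

% Assumed objective, minimized over the amplitudes a_i and the normalization N
\chi^2(a_i, N) = \sum_\lambda w(\lambda)
  \Big[ f(\lambda) - N \, m(\lambda) - N \sum_i a_i \, e_i(\lambda) \Big]^2
```

With the substitution b_i = N a_i the objective is quadratic in (N, b_i), so under this assumed form the fit reduces to a single weighted linear least-squares solve.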




                       So far
• Ran PCA on BC03 stochastic bursts
  (Vivienne)
• Also on the first GalICS + milli-Millennium spectra
  (Jeremy)
• Projected SDSS spectra on both
• Defined a PCA data model/schema
• Stored PCAs in database
• TOPCAT




PCA data model (RDB schema available)
[UML class diagram; classes and attributes recoverable from the figure:]
• PCADecompositionAlgorithm
• PCARun
     – restRedshift : double
• SpectrumCatalogue
• PCAPreProcessing
     – lambda
     – mean
     – variance
     – wavelengthMask
• PCAEigenSpectrum
     – pcaRank : int
• PCASpectrum
     – assumedRedshift : double
     – featureMask
• Spectrum
     – redshift
     – target
• PhotometryPoint
     – lambda
     – bin
     – flux
     – error
• PCAProjectionRun
• PCAAmplitudes
     – normalization : double
     – redshiftShift : double
     – amplitudes : double
• PCAProjectionAlgrithm
• Association role names shown: algorithm, pcaDecomposition,
  catalogue, preprocessing, eigenSpectra, inputSpectra,
  spectrum, spectra, amplitudes


                        milliMil-GalICS
                PC1 vs PC2 Voronoi tessellation




              Issues for query-by-example
•   Overlap is quite good, but is it good enough?
•   GalICS spreads less than SDSS.
•   BC03 is comparable with SDSS, but has a different slope.
•   Systematics
     – Model:
          • physics very preliminary (see Blaizot & de Lucia?)
          • resolution effects
     – Preprocessing SDSS galaxies
          • Rebinning: different algorithms give comparable results
          • (Slightly) wrong redshift? Can easily be simulated
     – Projection algorithm: normalization does not affect outcome
     – Observational systematics: use virtual telescope (+ virtual
       spectrograph) to test on the theory spectra.
       It is easier to blow up the simulation cloud than to
       shrink the observation cloud


                     Comments
• Millennium database being used for science
  projects (Guo Qi)
• SDSS peripherals used for science projects
  (see Vivienne's talk, Ben Panter)
• Use of MyDB for debugging and testing
  (Jeremy)

• Please give comments, feedback.



