
             The Sloan Digital Sky Survey


           Alex Szalay
Department of Physics and Astronomy
   The Johns Hopkins University
         The Sloan Digital Sky Survey

A project run by the Astrophysical Research Consortium (ARC)


            The University of Chicago
            Princeton University
            The Johns Hopkins University
            The University of Washington
            Fermi National Accelerator Laboratory
            US Naval Observatory
            The Japanese Participation Group
            The Institute for Advanced Study
            Max Planck Inst, Heidelberg
            SLOAN Foundation, NSF, DOE, NASA



Goal: To create a detailed multicolor map of the Northern Sky
          over 5 years, with a budget of approximately $80M
Data Size: 40 TB raw, 2 TB processed



                             Alex Szalay, JHU
             Scientific Motivation


Create the ultimate map of the Universe:
        The Cosmic Genome Project!
Study the distribution of galaxies:
        What is the origin of fluctuations?
        What is the topology of the distribution?
Measure the global properties of the Universe:
        How much dark matter is there?
Local census of the galaxy population:
        How did galaxies form?
Find the most distant objects in the Universe:
        What are the highest quasar redshifts?


                             Cosmology Primer


The Universe is expanding:
  the galaxies move away from us
  spectral lines are redshifted
  Hubble's law: v = H0 r

The fate of the universe depends on the balance between gravity
  and the expansion velocity
  Ω = density / critical density
  if Ω < 1, expand forever

Most of the mass in the Universe is dark matter, and it may be cold (CDM)
  Ω_d > Ω_*

The spatial distribution of galaxies is correlated, due to small ripples
  in the early Universe
  P(k): power spectrum
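
As a rough worked example of the first two relations on this slide, the sketch below evaluates Hubble's law and the density parameter. The value H0 = 65 km/s/Mpc is only an illustrative assumption, the function names are hypothetical, and the critical-density formula rho_c = 3 H0^2 / (8 pi G) is the standard textbook one rather than something stated on the slide.

```python
import math

G = 6.674e-11           # gravitational constant, m^3 kg^-1 s^-2
MPC_IN_M = 3.0857e22    # one megaparsec in metres


def recession_velocity(h0_km_s_mpc, r_mpc):
    """Hubble's law v = H0 r, returned in km/s."""
    return h0_km_s_mpc * r_mpc


def critical_density(h0_km_s_mpc):
    """Critical density rho_c = 3 H0^2 / (8 pi G), in kg/m^3."""
    h0_si = h0_km_s_mpc * 1000.0 / MPC_IN_M   # convert H0 to 1/s
    return 3.0 * h0_si ** 2 / (8.0 * math.pi * G)


def omega(density_kg_m3, h0_km_s_mpc):
    """Density parameter: density divided by the critical density."""
    return density_kg_m3 / critical_density(h0_km_s_mpc)


H0 = 65.0                               # assumed value, km/s/Mpc
v = recession_velocity(H0, 100.0)       # galaxy at 100 Mpc: 6500 km/s
rho_c = critical_density(H0)            # roughly 8e-27 kg/m^3
print(f"v = {v:.0f} km/s, rho_c = {rho_c:.2e} kg/m^3")
```

With these definitions, a universe whose mean density is half of rho_c has Ω = 0.5 and, by the criterion on the slide, expands forever.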


                      The ‘Naught’ Problem


What are the global parameters of the Universe?

        H0       the Hubble constant                  55-75 km/s/Mpc
        Ω0       the density parameter                0.25-1
        Λ0       the cosmological constant            0 - 0.7

Their values are still quite uncertain today...

Goal:   measure these parameters with an accuracy of a few percent



                   High Precision Cosmology!




The Cosmic Genome Project

The SDSS will create the ultimate map
of the Universe, with much more detail
than any other measurement before

[Figure: maps from successive redshift surveys. Panels: Gregory and Thompson 1978;
 de Lapparent, Geller and Huchra 1986; da Costa et al. 1995; SDSS Collaboration 2002]




                                 Area and Size of Redshift Surveys

[Figure: log-log plot of number of objects (1.00E+03 to 1.00E+09) against
 survey volume in Mpc^3 (1.00E+04 to 1.00E+11). Surveys shown: SAPM, QDOT, LCRS,
 CfA+SSRS, 2dF, 2dFR, SDSS main, SDSS red, SDSS abs line, SDSS photo-z]



                          Clustering of Galaxies


We will measure the spectrum of the
 density fluctuations to high precision
 even on very large scales




                                 The error in the amplitude of
                                   the fluctuation spectrum

                                            1970      x100
                                            1990      x2
                                            1995      ±0.4
                                            1998      ±0.2
                                            1999      ±0.1
                                            2002      ±0.05


                              Relevant Scales

       Distances measured in Mpc [megaparsec]
                      1 Mpc           = 3 x 10^24 cm
                      5 Mpc           = distance between galaxies
                   3000 Mpc           = scale of the Universe


if scale > 200 Mpc
          fluctuations have a PRIMORDIAL shape

if scale < 100 Mpc
          gravity creates sharp features, like walls,
          filaments and voids

Biasing
          conversion of mass into light is nonlinear
          light is much more clumpy than the mass




         The Topology of Local Universe


Measure the Topology of the Universe
   Does it consist of walls and voids
   or is it randomly distributed?




Finding the Most Distant Objects




             Intermediate and high redshift QSOs
                   Multicolor selection function.
                   Luminosity functions and spatial clustering.
                   High redshift QSOs (z>5).



                     Features of the SDSS

Special 2.5m telescope, located at Apache Point, NM
           3 degree field of view.
           Zero distortion focal plane.
Two surveys in one:
           Photometric survey in 5 bands.
           Spectroscopic redshift survey.
Huge CCD Mosaic
           30 CCDs 2K x 2K (imaging)
           22 CCDs 2K x 400 (astrometry)
Two high resolution spectrographs
           2 x 320 fibers, with 3 arcsec diameter.
           R=2000 resolution with 4096 pixels.
           Spectral coverage from 3900Å to 9200Å.
Automated data reduction
           Over 100 man-years of development effort.
           (Fermilab + collaboration scientists)
Very high data volume
           Expect over 40 TB of raw data.
           About 2 TB processed products
           Data made available to the public


                 Apache Point Observatory


Located in New Mexico,
near White Sands National Monument




                                 The Telescope

Special 2.5m telescope
  3 degree field of view
  Zero distortion focal plane
  Wind screen moved separately




                     The Photometric Survey


Northern Galactic Cap
  5 broad-band filters (u', g', r', i', z')
  limiting magnitudes (22.3, 23.3, 23.1, 22.3, 20.8)
  drift scan of 10,000 square degrees
  55 sec exposure time
  40 TB raw imaging data -> pipeline ->
           100,000,000 galaxies
           50,000,000 stars
  calibration to 2% at r'=19.8
  only done in the best seeing (20 nights/yr)
  pixel size is 0.4 arcsec,
  astrometric precision is 60 milliarcsec

Southern Galactic Cap
  multiple scans (> 30 times) of the same stripe

Continuous data rate of 8 Mbytes/sec
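
Several numbers on this slide can be cross-checked with simple arithmetic. The sketch below assumes the scan runs at the sidereal rate (about 15.04 arcsec of sky per second at the celestial equator), that a "2K" CCD is exactly 2048 pixels along the scan direction, and that TB and Mbytes are decimal units. These are illustrative assumptions, not figures taken from the slides.

```python
SIDEREAL_RATE = 15.04    # arcsec of sky per second at the equator (assumed)
PIXEL_SCALE = 0.4        # arcsec per pixel, from the slide
CCD_PIXELS = 2048        # pixels along the scan direction (assumed "2K")

# In a drift scan the exposure time is the time an object needs
# to cross one CCD.
crossing_time = CCD_PIXELS * PIXEL_SCALE / SIDEREAL_RATE   # about 54.5 s

RAW_DATA_BYTES = 40e12   # 40 TB of raw imaging data
DATA_RATE = 8e6          # 8 Mbytes/sec continuous rate

# Scanning time needed to accumulate the raw data at that rate.
imaging_hours = RAW_DATA_BYTES / DATA_RATE / 3600.0        # about 1389 h

print(f"CCD crossing time: {crossing_time:.1f} s")
print(f"Scanning time for 40 TB: {imaging_hours:.0f} h")
```

Under these assumptions the crossing time comes out close to the quoted 55 sec exposure, so the pixel size, CCD format and exposure time are mutually consistent.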




                               Survey Strategy

Overlapping 2.5 degree wide stripes
Avoiding the Galactic Plane (dust)
Multiple exposures on the three
   Southern stripes




                 The Spectroscopic Survey



Measure redshifts of objects => distance

SDSS Redshift Survey:
         1 million galaxies
         100,000 quasars
         100,000 stars

Two high throughput spectrographs
         spectral range 3900-9200 Å.
         640 spectra simultaneously.
         R=2000 resolution.

Automated reduction of spectra
Very high sampling density and completeness
Objects in other catalogs also targeted



                            Optimal Tiling

Fields have 3 degree diameter
Centers determined by an
    optimization procedure
A total of 2200 pointings
640 fibers assigned simultaneously
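
A quick consistency check of the tiling numbers, as a sketch only. It uses a flat-sky circle for the field area and takes the target counts (1 million galaxies, 100,000 quasars, 100,000 stars) and the 10,000 square degree area from the earlier survey slides. Any fibers reserved for sky or calibration are not stated on the slides and are ignored here.

```python
import math

FIELD_DIAMETER_DEG = 3.0
POINTINGS = 2200
FIBERS_PER_POINTING = 640
SURVEY_AREA_SQ_DEG = 10_000
TARGETS = 1_000_000 + 100_000 + 100_000   # galaxies + quasars + stars

# Area of one circular field (flat-sky approximation).
field_area = math.pi * (FIELD_DIAMETER_DEG / 2.0) ** 2

# Total fiber slots available over the whole survey.
fiber_slots = POINTINGS * FIBERS_PER_POINTING

# Summed field area relative to the survey area; values above 1
# mean neighbouring fields overlap.
coverage_ratio = POINTINGS * field_area / SURVEY_AREA_SQ_DEG

print(f"field area     = {field_area:.2f} sq deg")
print(f"fiber slots    = {fiber_slots:,} for {TARGETS:,} targets")
print(f"coverage ratio = {coverage_ratio:.2f}")
```

The fiber count exceeds the target count, and the coverage ratio above 1 is what one would expect from circular fields that must overlap to cover the sky without gaps.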




The Mosaic Camera




                    Photometric Calibrations


The SDSS will create a new
  photometric system:
      u' g' r' i' z'

Primary standards:
  observed with the USNO
  40-inch telescope in Flagstaff

Secondary standards:
  observed with the SDSS
  20-inch telescope at Apache
  Point – calibrating the SDSS
  imaging data




                       The Spectrographs

Two double spectrographs
  very high throughput
  two 2048x2048 CCD detectors
  mounted on the telescope
  light fed through slithead




                   The Fiber Feed System

Galaxy images are captured by optical fibers
  lined up on the spectrograph slit
Manually plugged during the day into Al plugboards
640 fibers in each bundle
The largest fiber system today




First Light Images

                    Telescope:
                           First light May 9th 1998
                           Equatorial scans




                             The First Stripes

Camera:
  5 color imaging of >100 square degrees
  Multiple scans across the same fields
  Photometric limits as expected




NGC 2068




UGC 3214




NGC 6070




                       The First Quasars


   The four highest redshift quasars have been found
   in the first SDSS test data!




                   Methane/T Dwarf



Discovery of several new objects by SDSS & 2MASS

[Figure label: SDSS T-dwarf (June 1999)]




                 Detection of Gravitational Lensing

28,000 foreground galaxies and 2,045,000 background galaxies in test data
(McKay et al. 1999)




SDSS Data Flow




                    Distributed Collaboration

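[Diagram: collaboration sites and the networks linking them.
 Sites: Fermilab, U. Chicago, U. Washington, Institute for Advanced Study,
 Princeton U., JHU, Japan, Apache Point Observatory, NMSU, USNO.
 Networks: ESNET, VBNS]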



Data Processing Pipelines




           Concept of the SDSS Archive




[Diagram components:
 Operational Archive (raw + processed data)
 Science Archive (products accessible to users)
 Other Archives]




       SDSS Data Products


Object catalog                           400 GB
 parameters of >10^8 objects
Redshift Catalog                            1 GB
 parameters of 10^6 objects
Atlas Images                              1.5 TB
 5 color cutouts of >10^8 objects
Spectra                                   60 GB
 in a one-dimensional form
Derived Catalogs                          20 GB
 - clusters
 - QSO absorption lines
4x4 Pixel All-Sky Map                     60 GB
 heavily compressed


   All raw data saved in a tape vault at Fermilab
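
The product sizes listed above can be summed to check the "about 2 TB processed" figure quoted earlier in the talk. This is a small sketch with decimal units assumed (1 TB = 1000 GB); the dictionary keys are just labels for the slide's rows.

```python
# Sizes in GB, copied from the slide (1.5 TB taken as 1500 GB).
products_gb = {
    "Object catalog": 400,
    "Redshift catalog": 1,
    "Atlas images": 1500,
    "Spectra": 60,
    "Derived catalogs": 20,
    "4x4 pixel all-sky map": 60,
}

total_gb = sum(products_gb.values())            # 2041 GB, about 2 TB

# Catalog bytes per object if the catalog holds exactly 10^8 objects;
# with more objects the per-object figure would be smaller.
bytes_per_object = products_gb["Object catalog"] * 1e9 / 1e8

print(f"total processed products: {total_gb} GB")
print(f"object catalog: about {bytes_per_object:.0f} bytes per object")
```

The atlas images dominate the total, while the object catalog works out to a few kilobytes of parameters per object.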


        Who will be using the archive?

Power Users
        sophisticated, with lots of resources
        research is centered around the archive data
                   moderate number of very intensive queries
                   mostly statistical, large output sizes
General Astronomy Public
        frequent, but casual lookup of objects/regions
        the archives help their research, but not central to it
                   large number of small queries
                   a lot of cross-identification requests
Wide Public
        browsing a ‘Virtual Telescope’
        can have large public appeal
        need special packaging
                   could be a very large number of requests




          How will the data be analyzed?


The data are inherently multidimensional
          => positions, colors, size, redshift

Improved classifications result in complex N-dimensional volumes
         => complex constraints, not ranges

Spatial relations will be investigated
           => nearest neighbors
           => other objects within a radius

Data Mining: finding the ‘needle in the haystack’
         => separate typical from rare
         => recognize patterns in the data

Output size can be prohibitively large for intermediate files
          => import output directly into analysis tools




                    Geometric Approach


The Main Problem:
    • fast, indexed, complex searches of Terabytes in k-dim space
    • searches are not necessarily parallel to the axes
          => traditional indexing (b-tree) does not work



Geometric Approach:
   • Use the geometric nature of the k-dimensional data
   • Quantize data into containers of ‘friends’:
       objects of similar colors
       close on the sky
       stored together
       => efficient cache performance
   • Containers represent a coarse grained density map of the data
       multidimensional index tree: k-d tree + r-tree



                Geometric Indexing

 “Divide and Conquer” Partitioning

           Attributes                         Number

           Sky Position                           3
           Multiband Fluxes                   N = 5+
           Other                              M = 100+

                   3  N  M

           Sky Position:      Hierarchical Triangular Mesh
           Multiband Fluxes:  Split as k-d tree, stored as r-tree of bounding boxes
           Other:             Using regular indexing techniques




                                Sky coordinates

Stored as Cartesian coordinates:
         projected onto a unit sphere
Longitude and Latitude lines:
         intersections of planes and the sphere
Boolean combinations:
         query polyhedron
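
The idea on this slide can be sketched in a few lines: positions become unit vectors, a latitude line or a circular search region is a plane cutting the sphere, and a query is a Boolean AND of such half-space tests. The function names and the example query below are hypothetical and only illustrate the geometry; they are not the archive's actual interface.

```python
import math


def radec_to_xyz(ra_deg, dec_deg):
    """Project a sky position onto the unit sphere as Cartesian (x, y, z)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def in_halfspace(p, normal, offset):
    """True if point p lies on the side of the plane where p . normal >= offset."""
    return sum(a * b for a, b in zip(p, normal)) >= offset


def cone(ra_deg, dec_deg, radius_deg):
    """A circular region on the sky is a half-space cut through the sphere."""
    return (radec_to_xyz(ra_deg, dec_deg), math.cos(math.radians(radius_deg)))


def in_query(p, halfspaces):
    """AND of half-space constraints: a convex 'query polyhedron'."""
    return all(in_halfspace(p, n, d) for n, d in halfspaces)


# Example query: declination above 30 degrees AND within 2 degrees
# of (RA, Dec) = (180, 35).
north_of_30 = ((0.0, 0.0, 1.0), math.sin(math.radians(30.0)))
near_target = cone(180.0, 35.0, 2.0)
query = [north_of_30, near_target]

print(in_query(radec_to_xyz(180.5, 35.5), query))   # inside both constraints
print(in_query(radec_to_xyz(10.0, 35.0), query))    # far from the cone centre
```

One reason for choosing this representation is that every test is a dot product, with no trigonometry per object and no special cases at the poles or at RA = 0.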




                Sky Partitioning

Hierarchical Triangular Mesh - based on octahedron




                     Hierarchical Subdivision


Hierarchical subdivision of spherical triangles
        represented as a quadtree
In SDSS the tree is 5 levels deep - 8192 triangles
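
The triangle count follows from the construction: the octahedron gives 8 spherical triangles and each subdivision level splits every triangle into 4, so 5 levels give 8 x 4^5 = 8192. The sketch below builds such a mesh by splitting at normalized edge midpoints. The vertex ordering and the flat list of triangles are simplifications chosen here for brevity, not the numbering scheme or quadtree storage of the SDSS implementation.

```python
import math


def normalize(v):
    """Scale a vector to unit length so it lies on the sphere."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def midpoint(a, b):
    """Midpoint of an edge, pushed back out onto the unit sphere."""
    return normalize(tuple(x + y for x, y in zip(a, b)))


def octahedron():
    """The 8 root triangles: 4 around the north pole, 4 around the south."""
    xp, xm = (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)
    yp, ym = (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)
    zp, zm = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)
    return [(zp, xp, yp), (zp, yp, xm), (zp, xm, ym), (zp, ym, xp),
            (zm, yp, xp), (zm, xm, yp), (zm, ym, xm), (zm, xp, ym)]


def subdivide(tri):
    """Split one spherical triangle into 4 children (one quadtree step)."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]


def mesh(levels):
    """All triangles at the given depth below the octahedron."""
    tris = octahedron()
    for _ in range(levels):
        tris = [child for t in tris for child in subdivide(t)]
    return tris


triangles = mesh(5)
print(len(triangles))   # 8 * 4**5 = 8192
```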




Result of the Query




               Magnitudes and Multicolor Searches

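Galaxy fluxes
    • large dynamic range
          m = -2.5 \log_{10}(f / f_0) = -2.5 \log_{10} x
    • errors
          \langle m \rangle \approx -2.5 \log_{10} \langle x \rangle + \frac{2.5}{\ln 10} \frac{\sigma_x^2}{2 \langle x \rangle^2}
          \sigma_m^2 \approx \left( \frac{2.5}{\ln 10} \right)^2 \frac{\sigma_x^2}{\langle x \rangle^2}
         divergent as x \to 0 !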

For multicolor magnitudes
    the error contours can be
    very anisotropic and skewed,
         extremely poor localization!



            But: this is an artifact of the logarithm at zero flux,
                in flux space the object is well localized




                        Novel Magnitude Scale


     \mu = -\frac{2.5}{\ln 10} \left[ \sinh^{-1}\left( \frac{f}{b} \right) + c \right]

b: softness
c: set to match normal magnitudes

Advantages:
    monotonic
    degrades gracefully
    objects have small error ellipse
    unified handling of detections
        and upper limits!

Disadvantages:
     unusual

(Lupton, Gunn and Szalay, AJ 99)
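
A minimal numerical sketch of the behaviour listed above, assuming the definition mu = -(2.5 / ln 10) [asinh(f / b) + c] with the flux f and the softening b both measured in units of the zero-point flux f0. The choice c = ln(b / 2) is derived here from the large-flux limit asinh(y) -> ln(2y), so that mu approaches -2.5 log10(f / f0). The softening value B below is purely hypothetical.

```python
import math

A = 2.5 / math.log(10)   # the 2.5 / ln 10 prefactor


def pogson_mag(x):
    """Conventional magnitude, defined only for positive flux x = f / f0."""
    return -2.5 * math.log10(x)


def asinh_mag(x, b):
    """Asinh magnitude with softening b (same units as x)."""
    c = math.log(b / 2.0)   # makes the two scales agree for x >> b
    return -A * (math.asinh(x / b) + c)


B = 1e-3   # hypothetical softening, in units of f0
for x in (1.0, 1e-2, 1e-3, 0.0, -1e-3):
    classic = pogson_mag(x) if x > 0 else float("nan")
    print(f"x = {x:8.1e}   classic = {classic:7.3f}   asinh = {asinh_mag(x, B):7.3f}")
```

For bright objects the two columns agree closely, while at zero or negative measured flux the conventional magnitude is undefined and the asinh magnitude stays finite and monotonic, which is what allows detections and upper limits to be handled the same way.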

                              Flux Indexing


Split along alternating flux directions
Create balanced partitions
Store bounding boxes at each step
Build a 10-12 level tree in each triangle
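
The four steps above can be sketched as a small k-d tree that stores a bounding box at every node. This is an illustrative sketch, not the archive's code: the points are random 5-dimensional stand-ins for fluxes, the leaf size is an arbitrary choice, and the query is a simple axis-aligned box. A more general query region would be tested against the stored boxes in the same way.

```python
import random


def bounding_box(points):
    """Smallest axis-aligned box (lower corner, upper corner) holding the points."""
    dims = range(len(points[0]))
    return (tuple(min(p[d] for p in points) for d in dims),
            tuple(max(p[d] for p in points) for d in dims))


def build(points, depth=0, max_depth=10, leaf_size=4):
    """Split at the median along alternating axes, keeping a box per node."""
    node = {"box": bounding_box(points)}
    if depth >= max_depth or len(points) <= leaf_size:
        node["points"] = points
        return node
    axis = depth % len(points[0])            # alternate the split direction
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                  # median split: balanced halves
    node["left"] = build(ordered[:mid], depth + 1, max_depth, leaf_size)
    node["right"] = build(ordered[mid:], depth + 1, max_depth, leaf_size)
    return node


def overlaps(box, lo, hi):
    """True if the node's box intersects the query box [lo, hi]."""
    return all(b_lo <= h and b_hi >= l
               for b_lo, b_hi, l, h in zip(box[0], box[1], lo, hi))


def search(node, lo, hi):
    """Return the points inside [lo, hi], skipping subtrees whose box misses it."""
    if not overlaps(node["box"], lo, hi):
        return []
    if "points" in node:
        return [p for p in node["points"]
                if all(l <= v <= h for v, l, h in zip(p, lo, hi))]
    return search(node["left"], lo, hi) + search(node["right"], lo, hi)


random.seed(0)
pts = [tuple(random.random() for _ in range(5)) for _ in range(500)]
tree = build(pts)
lo, hi = (0.2,) * 5, (0.8,) * 5
found = search(tree, lo, hi)
brute = [p for p in pts if all(l <= v <= h for v, l, h in zip(p, lo, hi))]
print(len(found), len(brute))   # the two counts agree
```

Storing the boxes is what lets a search discard whole subtrees with one cheap overlap test instead of visiting every object.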




                  How to build compact cells?

The SDSS will measure fluxes in 5 bands
       => asinh magnitudes
Axis-parallel splits in median flux,
   in 8 separate zones in Galactic latitude
          => 5 dimensional bounding boxes

    The fluxes are strongly correlated
             => 2 + dimensional distribution of typical objects
             => widely scattered rare objects
                       => large density contrasts

        Therefore:
          first create a local density and split on its value (Csabai et al. 96)
                  typical (98%)                           rare (2%)




       Coarse Grained Design




[Diagram components: User Interface; Analysis Engine; Archive;
 Query Support; Data Warehouse]




              Distributed Implementation


[Diagram components: User Interface; Analysis Engine;
 Master (SX Engine, Objectivity Federation, Objectivity);
 four Slaves, each with Objectivity and RAID]




                      JHU Contributions


Fiber spectrographs
    P. Feldman
    A. Uomoto
    S. Friedman
    S. Smee

Management
    T. Heckman
    T. Poehler
    A. Davidsen
    A. Uomoto
    A. Szalay

Science Archive
    A. Szalay
    A. Thakar
    P. Kunszt
    I. Csabai
    Gy. Szokoly
    A. Connolly
    A. Chaudhaury

A lot of help from
    Jim Gray, Microsoft




                  Processing Platforms


At Fermilab:
 2 AlphaServer 8200          data processing
 1 SGI Origin 2000           databases
Archive at JHU:
 1 AlphaServer 1000A (development)
 10 Intel based servers w. LVD RAID
  software verified on
       Digital Unix, IRIX, Solaris, Linux




                      Exploring new methods

New spectral classification techniques
        galaxy spectra can be expressed as a superposition
        of a few (<5) principal components
                   => objective classification of 1 million spectra!


Photometric redshifts
        galaxy colors systematically change with redshift,
        the SDSS photometry works like a 5-pixel spectrograph
                 => Δz=0.05, but with 100 million objects!


Measuring cosmological parameters
        before: data analysis was limited by small number statistics
        after:  dominant errors are systematic (extinction)
                => new analysis methods are required!




                    Photometric redshifts


Multicolor photometry maps physical parameters
         luminosity L
         redshift z                       observed fluxes
         spectral type T
Inversion: u', g', r', i', z' => z, L, T




        Redshifts are statistical, with large errors: Δz ≈ 0.05
        The data set is huge, more than 100 million galaxies
        Easy to subdivide into coarse z bins, and by type
                 => study evolution
                 => enormous volume - 1 Gpc^3




                                       Measuring P(k)

   Karhunen-Loeve transform:
            Signal-to-noise eigenmodes of the redshift survey
            Optimal extraction of clustering signal
            Maximal rejection of systematic errors
   (Vogeley and Szalay 96, Matsubara, Szalay and Landy 99)



                           8            
  North     0.480..20
                
                  0 22
                         0.820..06
                             
                               0 06
                                      0.150..05
                                          
                                            0 05


  South     0.310..19
                
                  0 22
                         0.750..05
                             
                               0 05
                                      0.140..05
                                          
                                            0 05


Combined 0.400..14
             
               0 15
                         0.780..04
                             
                               0 04
                                      0.140..03
                                          
                                            0 03



        Pilot project using the Las
        Campanas Redshift Survey
       We simultaneously measure the values of
        withredshift-distortion parameter (=0.6/b),
         the 22,000 galaxies
          the normalization (8 ) and
          the CDM shape parameter (  = h).



                                               Trends


  • Future dominated by detector improvements
  • Moore’s Law growth in CCD capabilities
  • Gigapixel arrays on the horizon
  • Improvements in computing and storage will track growth in data volume
  • Investment in software is critical, and growing

[Figure: log-scale plot, 1970 to 2000, with two curves labelled CCDs and Glass]
Total area of 3m+ telescopes in the world in m^2, total number
of CCD pixels in Megapix, as a function of time. Growth over
25 years is a factor of 30 in glass, 3000 in pixels.
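
The two growth factors quoted in the caption can be turned into doubling times, which makes the comparison with Moore's Law concrete. This sketch assumes steady exponential growth over the 25 years, which the slide does not state; the function name is hypothetical.

```python
import math


def doubling_time(growth_factor, years):
    """Years per doubling, assuming steady exponential growth."""
    return years / math.log2(growth_factor)


glass = doubling_time(30, 25)      # telescope collecting area
pixels = doubling_time(3000, 25)   # total number of CCD pixels

print(f"glass doubles every {glass:.1f} years")
print(f"CCD pixels double every {pixels:.1f} years")
```

Under that assumption the pixel count doubles roughly every two years, while the collecting area takes about five years per doubling.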
              The Age of Mega-Surveys

The next generation of astronomical archives with
  Terabyte catalogs will dramatically change astronomy
         top-down design
         large sky coverage
         built on sound statistical plans
         uniform, homogeneous, well calibrated
         well controlled and documented systematics

The technology to acquire, store and index the data is here
         we are riding Moore’s Law

Data mining in such vast archives will be a challenge,
   but possibilities are quite unimaginable

Integrating these archives into a single entity is a
   project for the whole community
         => National Virtual Observatory

             New Astronomy – Different!

Systematic Data Exploration
    will have a central role in the New Astronomy
Digital Archives of the Sky
    will be the main access to data
Data “Avalanche”
    the flood of Terabytes of data is already happening,
    whether we like it or not!
Transition to the new
    may be organized or chaotic



                NVO: The Challenges

Size of the archived data
      •   40,000 square degrees is 2 trillion pixels
      •   One band:                       4 Terabytes
      •   Multi-wavelength:       10-100 Terabytes
      •   Time dimension:             few Petabytes
The development of
      • new archival methods
      • new analysis tools
      • new standards
        (metadata, interchange formats)
Hardware/networking requirements
Training the next generation!
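
The size estimates at the top of this slide can be reproduced with a short calculation. The sketch assumes 2 bytes per pixel and decimal Terabytes, neither of which is stated on the slide; the pixel scale is simply what the quoted area and pixel count imply.

```python
import math

ARCSEC2_PER_DEG2 = 3600 ** 2                  # square arcsec per square degree
area_arcsec2 = 40_000 * ARCSEC2_PER_DEG2      # 40,000 sq deg, roughly the whole sky
pixels = 2e12                                 # 2 trillion pixels, from the slide

# Pixel scale implied by covering that area with that many pixels.
pixel_scale = math.sqrt(area_arcsec2 / pixels)    # about 0.51 arcsec

BYTES_PER_PIXEL = 2                           # assumption: 16-bit pixels
one_band_tb = pixels * BYTES_PER_PIXEL / 1e12     # 4.0 TB

print(f"implied pixel scale: {pixel_scale:.2f} arcsec")
print(f"one band: {one_band_tb:.0f} TB")
```

With 16-bit pixels the single-band figure of 4 Terabytes follows directly, and a handful of bands or wavelengths brings the total into the quoted 10-100 Terabyte range.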

                              Summary


The SDSS project combines astronomy, physics, and computer science


     It promises to fundamentally change our view of the universe

It will determine how the largest structures in the universe were formed


 It will serve as the standard astronomy reference for several decades

 Its ‘virtual universe’ can be explored by both scientists and the public

    Through its archive it will create a new paradigm in astronomy



 www.sdss.org
www.sdss.jhu.edu



								