


             The Sloan Digital Sky Survey

           Alex Szalay
Department of Physics and Astronomy
   The Johns Hopkins University
         The Sloan Digital Sky Survey

A project run by the Astrophysical Research Consortium (ARC)

            The University of Chicago
            Princeton University
            The Johns Hopkins University
            The University of Washington
            Fermi National Accelerator Laboratory
            US Naval Observatory
            The Japanese Participation Group
            The Institute for Advanced Study
            Max Planck Inst, Heidelberg
            SLOAN Foundation, NSF, DOE, NASA

Goal: To create a detailed multicolor map of the Northern Sky
          over 5 years, with a budget of approximately $80M
Data Size: 40 TB raw, 2 TB processed

                             Alex Szalay, JHU
             Scientific Motivation

Create the ultimate map of the Universe:
        The Cosmic Genome Project!
Study the distribution of galaxies:
        What is the origin of fluctuations?
        What is the topology of the distribution?
Measure the global properties of the Universe:
        How much dark matter is there?
Local census of the galaxy population:
        How did galaxies form?
Find the most distant objects in the Universe:
        What are the highest quasar redshifts?

                             Cosmology Primer

The Universe is expanding:
  the galaxies move away from us
  spectral lines are redshifted
  Hubble’s law: $v = H_0 r$

The fate of the universe depends on the balance between gravity
  and the expansion velocity
  $\Omega$ = density/critical
  if $\Omega < 1$, expand forever

Most of the mass in the Universe is dark matter,
  and it may be cold (CDM)
  $\Omega_d > \Omega_*$

The spatial distribution of galaxies is correlated,
  due to small ripples in the early Universe
  $P(k)$: power spectrum
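
As a rough numerical illustration of the relations on this slide, the sketch below applies Hubble's law and evaluates the critical density for a given Hubble constant. The function names are made up for this example, and the approximation v = cz is an assumption that only holds for redshifts much smaller than 1.

```python
import math

C_KM_S = 299_792.458          # speed of light in km/s
MPC_IN_M = 3.0857e22          # one megaparsec in metres
G_SI = 6.674e-11              # gravitational constant, m^3 kg^-1 s^-2


def hubble_distance_mpc(z, h0_km_s_mpc):
    """Distance r = v / H0 with v = c z (ASSUMPTION: valid only for z << 1)."""
    velocity_km_s = C_KM_S * z
    return velocity_km_s / h0_km_s_mpc


def critical_density_kg_m3(h0_km_s_mpc):
    """Critical density rho_c = 3 H0^2 / (8 pi G), in kg per cubic metre."""
    h0_si = h0_km_s_mpc * 1000.0 / MPC_IN_M   # H0 converted to 1/s
    return 3.0 * h0_si ** 2 / (8.0 * math.pi * G_SI)


def omega(density_kg_m3, h0_km_s_mpc):
    """Density parameter: density divided by the critical density."""
    return density_kg_m3 / critical_density_kg_m3(h0_km_s_mpc)


if __name__ == "__main__":
    for h0 in (55.0, 65.0, 75.0):
        r = hubble_distance_mpc(0.05, h0)
        rho_c = critical_density_kg_m3(h0)
        print(f"H0={h0:.0f}: z=0.05 -> r={r:.0f} Mpc, rho_c={rho_c:.2e} kg/m^3")
```

The spread in the printed distances shows directly how an uncertain Hubble constant turns into an uncertain distance scale.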

                      The ‘Naught’ Problem

What are the global parameters of the Universe?

        $H_0$         the Hubble constant                  55-75 km/s/Mpc
        $\Omega_0$    the density parameter                0.25-1
        $\Lambda_0$   the cosmological constant            0 - 0.7

Their values are still quite uncertain today...

Goal:   measure these parameters with an accuracy of a few percent

                   High Precision Cosmology!

The Cosmic Genome Project

The SDSS will create the ultimate map of the Universe,
with much more detail than any other measurement before

[Figure panels, earliest to latest: Gregory and Thompson 1978;
 de Lapparent, Geller and Huchra 1986; da Costa et al. 1995;
 SDSS Collaboration 2002]

                                 Area and Size of Redshift Surveys



[Figure: number of objects (log scale, 1.00E+05 to 1.00E+06) versus volume
 in Mpc^3 (log scale, 1.00E+04 to 1.00E+11); labeled surveys: CfA+SSRS, 2dF,
 2dFR, abs line]

                          Clustering of Galaxies

We will measure the spectrum of the
 density fluctuations to high precision
 even on very large scales

                                 The error in the amplitude of
                                   the fluctuation spectrum

                                            1970      x100
                                            1990      x2
                                            1995      ±0.4
                                            1998      ±0.2
                                            1999      ±0.1
                                            2002      ±0.05

                              Relevant Scales

       Distances measured in Mpc [megaparsec]
                      1 Mpc           = 3 x 10^24 cm
                      5 Mpc           = distance between galaxies
                   3000 Mpc           = scale of the Universe

On scales > 200 Mpc
          fluctuations have a PRIMORDIAL shape

On scales < 100 Mpc
          gravity creates sharp features, like walls,
          filaments and voids

          conversion of mass into light is nonlinear
          light is much more clumpy than the mass

         The Topology of Local Universe

Measure the Topology of the Universe
   Does it consist of walls and voids
   or is it randomly distributed?

Finding the Most Distant Objects

             Intermediate and high redshift QSOs
                   Multicolor selection function.
                   Luminosity functions and spatial clustering.
                   High redshift QSOs (z>5).

                     Features of the SDSS

Special 2.5m telescope, located at Apache Point, NM
           3 degree field of view.
           Zero distortion focal plane.
Two surveys in one:
           Photometric survey in 5 bands.
           Spectroscopic redshift survey.
Huge CCD Mosaic
           30 CCDs 2K x 2K (imaging)
           22 CCDs 2K x 400 (astrometry)
Two high resolution spectrographs
           2 x 320 fibers, with 3 arcsec diameter.
           R=2000 resolution with 4096 pixels.
           Spectral coverage from 3900Å to 9200Å.
Automated data reduction
           Over 100 man-years of development effort.
           (Fermilab + collaboration scientists)
Very high data volume
           Expect over 40 TB of raw data.
           About 2 TB processed products
           Data made available to the public

                 Apache Point Observatory

Located in New Mexico,
near White Sands National Monument

                                 The Telescope

Special 2.5m telescope
  3 degree field of view
  Zero distortion focal plane
  Wind screen moved separately

                     The Photometric Survey

Northern Galactic Cap
  5 broad-band filters (u', g', r', i', z')
  limiting magnitudes (22.3, 23.3, 23.1, 22.3, 20.8)
  drift scan of 10,000 square degrees
  55 sec exposure time
  40 TB raw imaging data -> pipeline ->
           100,000,000 galaxies
           50,000,000 stars
  calibration to 2% at r'=19.8
  only done in the best seeing (20 nights/yr)
  pixel size is 0.4 arcsec,
  astrometric precision is 60 milliarcsec

Southern Galactic Cap
  multiple scans (> 30 times) of the same stripe

Continuous data rate of 8 Mbytes/sec
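
The quoted data rate can be cross-checked against the CCD counts listed under Features of the SDSS. The sketch below assumes 16-bit pixels and that every chip is clocked at the same row rate, with one full 2K-row imaging chip crossed per 55 sec exposure. Both are assumptions made for illustration; they are not stated in the slides.

```python
IMAGING_CCDS = 30        # 2K x 2K imaging chips
ASTROMETRIC_CCDS = 22    # 2K x 400 astrometric chips
COLUMNS = 2048           # pixels per row on every chip
IMAGING_ROWS = 2048      # rows crossed during one exposure
EXPOSURE_S = 55.0        # drift-scan exposure time in seconds
PIXEL_ARCSEC = 0.4       # pixel size on the sky
BYTES_PER_PIXEL = 2      # ASSUMPTION: 16-bit raw pixels


def row_rate_per_s():
    """Rows clocked out per second so that a star crosses one imaging chip per exposure."""
    return IMAGING_ROWS / EXPOSURE_S


def data_rate_bytes_per_s():
    """Raw byte rate, ASSUMING all chips are read out at the same row rate."""
    chips = IMAGING_CCDS + ASTROMETRIC_CCDS
    return chips * COLUMNS * row_rate_per_s() * BYTES_PER_PIXEL


def drift_rate_arcsec_per_s():
    """Sky drift rate implied by the pixel size and the row rate."""
    return PIXEL_ARCSEC * row_rate_per_s()


if __name__ == "__main__":
    print(f"data rate : {data_rate_bytes_per_s() / 1e6:.1f} MB/s")
    print(f"drift rate: {drift_rate_arcsec_per_s():.1f} arcsec/s")
```

Under these assumptions the rate comes out close to the 8 Mbytes/sec on the slide, and the implied drift rate is close to the sidereal rate of about 15 arcsec/s on the celestial equator.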

                               Survey Strategy

Overlapping 2.5 degree wide stripes
Avoiding the Galactic Plane (dust)
Multiple exposures on the three
   Southern stripes

                 The Spectroscopic Survey

Measure redshifts of objects => distance

SDSS Redshift Survey:
         1 million galaxies
         100,000 quasars
         100,000 stars

Two high throughput spectrographs
         spectral range 3900-9200 Å.
         640 spectra simultaneously.
         R=2000 resolution.

Automated reduction of spectra
Very high sampling density and completeness
Objects in other catalogs also targeted

                            Optimal Tiling

Fields have 3 degree diameter
Centers determined by an
    optimization procedure
A total of 2200 pointings
640 fibers assigned simultaneously
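
A quick consistency check of the tiling numbers, combining this slide with the survey area and target counts quoted on the photometric and spectroscopic survey slides. This is a back-of-the-envelope sketch using a flat-sky approximation for the field area; it says nothing about how the actual optimization procedure works.

```python
import math

FIELD_DIAMETER_DEG = 3.0
POINTINGS = 2200
FIBERS_PER_POINTING = 640
SURVEY_AREA_SQ_DEG = 10_000                 # from the photometric survey slide
TARGETS = 1_000_000 + 100_000 + 100_000     # galaxies + quasars + stars


def field_area_sq_deg():
    """Area of one circular field (flat-sky approximation)."""
    return math.pi * (FIELD_DIAMETER_DEG / 2.0) ** 2


def mean_coverage():
    """Average number of fields covering a point of the survey area."""
    return POINTINGS * field_area_sq_deg() / SURVEY_AREA_SQ_DEG


def total_fiber_slots():
    """Fiber placements available over all pointings."""
    return POINTINGS * FIBERS_PER_POINTING


if __name__ == "__main__":
    print(f"field area      : {field_area_sq_deg():.2f} sq deg")
    print(f"mean coverage   : {mean_coverage():.2f}")
    print(f"fiber slots     : {total_fiber_slots():,} for {TARGETS:,} targets")
```

The pointings overlap by roughly a factor of 1.5 on average, and the number of fiber placements exceeds the number of planned targets.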

The Mosaic Camera

                    Photometric Calibrations

The SDSS will create a new
  photometric system:
      u' g' r' i' z'

Primary standards:
  observed with the USNO
  40-inch telescope in Flagstaff

Secondary standards:
  observed with the SDSS
  20-inch telescope at Apache
  Point – calibrating the SDSS
  imaging data

                       The Spectrographs

Two double spectrographs
  very high throughput
  two 2048x2048 CCD detectors
  mounted on the telescope
  light fed through slithead

                   The Fiber Feed System

Galaxy images are captured by optical fibers
  lined up on the spectrograph slit
Manually plugged during the day into Al plugboards
640 fibers in each bundle
The largest fiber system today

First Light Images

                           First light May 9th 1998
                           Equatorial scans

                             The First Stripes

  5 color imaging of >100 square degrees
  Multiple scans across the same fields
  Photometric limits as expected

NGC 2068

UGC 3214

NGC 6070

                       The First Quasars

   The four highest redshift
quasars have been found in the
     first SDSS test data !

                   Methane/T Dwarf

Discovery of several new objects by SDSS & 2MASS

[Image: SDSS T-dwarf (June 1999)]

                 Detection of Gravitational Lensing

28,000 foreground galaxies and 2,045,000 background galaxies in test data
(McKay etal 1999)

SDSS Data Flow

                    Distributed Collaboration



[Diagram: collaboration sites linked by network; nodes shown include Japan,
 VBNS, I. Advanced, Princeton U., Apache Point Observatory, NMSU, USNO]

Data Processing Pipelines

           Concept of the SDSS Archive

  Archive (raw + processed data)
  Science Archive (products accessible to users)
  Other Archives

       SDSS Data Products

Object catalog                           400 GB
 parameters of >10^8 objects
Redshift Catalog                            1 GB
 parameters of 10^6 objects
Atlas Images                              1.5 TB
 5 color cutouts of >10^8 objects
Spectra                                   60 GB
 in a one-dimensional form
Derived Catalogs                          20 GB
 - clusters
 - QSO absorption lines
4x4 Pixel All-Sky Map                     60 GB
 heavily compressed

   All raw data saved in a tape vault at Fermilab
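
The product sizes on this slide can be summed and compared with the "2 TB processed" figure quoted at the start of the talk. The sketch below takes 1 TB as 1000 GB and treats the object counts as round numbers (10^8 objects, 10^6 redshifts); since the slide gives the counts only as lower bounds, the per-object figures are rough upper limits.

```python
# Sizes in gigabytes, copied from the slide (ASSUMPTION: 1 TB = 1000 GB).
PRODUCTS_GB = {
    "Object catalog": 400,
    "Redshift catalog": 1,
    "Atlas images": 1500,
    "Spectra": 60,
    "Derived catalogs": 20,
    "4x4 pixel all-sky map": 60,
}


def total_tb(products=PRODUCTS_GB):
    """Total size of all listed products in terabytes."""
    return sum(products.values()) / 1000.0


def bytes_per_object(size_gb, n_objects):
    """Average storage per object for one product."""
    return size_gb * 1e9 / n_objects


if __name__ == "__main__":
    print(f"total: {total_tb():.2f} TB")
    print(f"object catalog: {bytes_per_object(400, 1e8):.0f} bytes/object")
    print(f"atlas images  : {bytes_per_object(1500, 1e8):.0f} bytes/object")
```

The total is about 2 TB, and the object catalog works out to a few kilobytes per object.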

        Who will be using the archive?

Power Users
        sophisticated, with lots of resources
        research is centered around the archive data
                   moderate number of very intensive queries
                   mostly statistical, large output sizes
General Astronomy Public
        frequent, but casual lookup of objects/regions
        the archives help their research, but not central to it
                   large number of small queries
                   a lot of cross-identification requests
Wide Public
        browsing a ‘Virtual Telescope’
        can have large public appeal
        need special packaging
                   could be a very large number of requests

          How will the data be analyzed?

The data are inherently multidimensional
          => positions, colors, size, redshift

Improved classifications result in complex N-dimensional volumes
         => complex constraints, not ranges

Spatial relations will be investigated
           => nearest neighbors
           => other objects within a radius

Data Mining: finding the ‘needle in the haystack’
         => separate typical from rare
         => recognize patterns in the data

Output size can be prohibitively large for intermediate files
          => import output directly into analysis tools

                    Geometric Approach

The Main Problem:
    • fast, indexed, complex searches of Terabytes in k-dim space
    • searches are not necessarily parallel to the axes
          => traditional indexing (b-tree) does not work

Geometric Approach:
   •Use the geometric nature of the k-dimensional data
   •Quantize data into containers of ‘friends’:
       objects of similar colors
       close on the sky
       stored together
       => efficient cache performance
   •Containers represent a coarse grained density map of the data
       multidimensional index tree: k-d tree + r-tree

                Geometric Indexing

 “Divide and Conquer” Partitioning

 Attributes           Number      Partitioning
 Sky Position         3           Hierarchical Triangular Mesh
 Multiband Fluxes     N = 5+      Split as k-d tree, stored as r-tree of bounding boxes
 Other                M = 100+    Using regular indexing techniques

                                Sky coordinates

Stored as Cartesian coordinates:
         projected onto a unit sphere
Longitude and Latitude lines:
         intersections of planes and the sphere
Boolean combinations:
         query polyhedron
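
A minimal sketch of this idea: positions become unit vectors, each constraint is a plane cutting the sphere (a half-space), and a query is a Boolean AND of such half-spaces. The function names and the (normal, offset) representation are illustrative choices, not the archive's actual interface.

```python
import math


def radec_to_unit_vector(ra_deg, dec_deg):
    """Convert longitude/latitude in degrees to a Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def halfspace(normal, offset):
    """Constraint 'normal . p >= offset': one plane cutting the unit sphere."""
    return (normal, offset)


def cone(ra_deg, dec_deg, radius_deg):
    """All points within radius_deg of a centre: a plane normal to the centre vector."""
    return halfspace(radec_to_unit_vector(ra_deg, dec_deg),
                     math.cos(math.radians(radius_deg)))


def dec_above(dec_deg):
    """A latitude line is the intersection of the plane z = sin(dec) with the sphere."""
    return halfspace((0.0, 0.0, 1.0), math.sin(math.radians(dec_deg)))


def inside(point, constraints):
    """Boolean AND of half-spaces: True if the point lies in the query polyhedron."""
    return all(sum(n * p for n, p in zip(normal, point)) >= offset
               for normal, offset in constraints)


if __name__ == "__main__":
    query = [cone(180.0, 0.0, 2.0), dec_above(0.5)]
    for ra, dec in [(180.5, 1.0), (180.5, 0.2), (185.0, 1.0)]:
        print(ra, dec, inside(radec_to_unit_vector(ra, dec), query))
```

Each test is a single dot product, with no trigonometry per object once the Cartesian coordinates are stored.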

                Sky Partitioning

Hierarchical Triangular Mesh - based on octahedron

                     Hierarchical Subdivision

Hierarchical subdivision of spherical triangles
        represented as a quadtree
In SDSS the tree is 5 levels deep - 8192 triangles

Result of the Query

               Magnitudes and Multicolor Searches

Galaxy fluxes
    • large dynamic range:
          $m = -2.5 \log_{10}(f/f_0) = -2.5 \log_{10} x$
    • errors:
          $\sigma_m^2 = \left(\frac{\partial m}{\partial x}\right)^2 \sigma_x^2 \propto \frac{\sigma_x^2}{x^2}$
          divergent as $x \to 0$!

For multicolor magnitudes
    the error contours can be
    very anisotropic and skewed,
         extremely poor localization!

            But: this is an artifact of the logarithm at zero flux,
                in flux space the object is well localized

                        Novel Magnitude Scale

$\mu = -\frac{2.5}{\ln 10}\left[\sinh^{-1}\left(\frac{f}{b}\right) + c\right]$

b: softness
c: set to match normal magnitudes

    monotonic
    degrades gracefully
    objects have small error ellipse
    unified handling of detections
        and upper limits!

     unusual

(Lupton, Gunn and Szalay, AJ 99)
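
The properties listed above can be checked numerically. In the sketch below, x is the flux in units of f0, and the constant is taken as c = ln(b/2), which is derived here from the requirement that the new scale agree with the classical one for x >> b. That choice and the function names are assumptions of this example, not values read off the slide.

```python
import math

A = 2.5 / math.log(10.0)   # Pogson factor, equal to 2.5 log10(e)


def classical_mag(x):
    """m = -2.5 log10(x) for x = f/f0; undefined for x <= 0."""
    return -2.5 * math.log10(x)


def classical_mag_error(x, sigma_x):
    """First-order error propagation: sigma_m = A * sigma_x / x (diverges as x -> 0)."""
    return A * sigma_x / x


def asinh_mag(x, b):
    """mu = -A [asinh(x/b) + c], with ASSUMED c = ln(b/2) so that mu -> m for x >> b."""
    return -A * (math.asinh(x / b) + math.log(b / 2.0))


def asinh_mag_error(x, sigma_x, b):
    """sigma_mu = A * sigma_x / sqrt(x^2 + b^2): finite even at x = 0."""
    return A * sigma_x / math.sqrt(x * x + b * b)


if __name__ == "__main__":
    b = 1e-3
    for x in (1.0, 1e-2, 1e-3, 0.0, -1e-3):
        m = f"{classical_mag(x):8.3f}" if x > 0 else "     n/a"
        print(f"x={x:8.4f}  m={m}  mu={asinh_mag(x, b):8.3f}")
```

At high flux the two scales agree closely, while at zero or negative flux only the asinh magnitude remains defined, which is what allows detections and upper limits to be handled in the same way.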

                              Flux Indexing

Split along alternating flux directions
Create balanced partitions
Store bounding boxes at each step
Build a 10-12 level tree in each triangle
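
The steps above can be sketched as a small k-d tree that splits at the median along alternating axes and keeps a bounding box at every node. This is an illustrative in-memory version with made-up names and parameters (leaf size, default depth); it does not reproduce how the archive stores its containers.

```python
import random


class Node:
    def __init__(self, bbox, points=None, axis=None, left=None, right=None):
        self.bbox = bbox          # (mins, maxs) over all flux dimensions
        self.points = points      # list of points for a leaf, else None
        self.axis = axis          # split axis for an inner node
        self.left, self.right = left, right


def bounding_box(points):
    dims = range(len(points[0]))
    return (tuple(min(p[d] for p in points) for d in dims),
            tuple(max(p[d] for p in points) for d in dims))


def build(points, depth=0, max_depth=10, leaf_size=8):
    """Split at the median along alternating axes; store a bounding box per node."""
    bbox = bounding_box(points)
    if depth >= max_depth or len(points) <= leaf_size:
        return Node(bbox, points=points)
    axis = depth % len(points[0])                # alternate the flux direction
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                      # balanced partition
    return Node(bbox, axis=axis,
                left=build(ordered[:mid], depth + 1, max_depth, leaf_size),
                right=build(ordered[mid:], depth + 1, max_depth, leaf_size))


def size(node):
    """Number of points stored below a node."""
    if node.points is not None:
        return len(node.points)
    return size(node.left) + size(node.right)


def boxes_overlap(a, b):
    return all(a[0][d] <= b[1][d] and b[0][d] <= a[1][d]
               for d in range(len(a[0])))


def box_query(node, query):
    """Points inside the axis-aligned query box, pruning subtrees by bounding box."""
    if not boxes_overlap(node.bbox, query):
        return []
    if node.points is not None:
        lo, hi = query
        return [p for p in node.points
                if all(lo[d] <= p[d] <= hi[d] for d in range(len(p)))]
    return box_query(node.left, query) + box_query(node.right, query)


if __name__ == "__main__":
    rng = random.Random(0)
    fluxes = [tuple(rng.random() for _ in range(5)) for _ in range(2000)]
    tree = build(fluxes)
    hits = box_query(tree, ((0.2,) * 5, (0.7,) * 5))
    print(len(hits), "objects in the query box")
```

Subtrees whose bounding box misses the query are skipped entirely, which is where the savings over a full scan come from.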

                  How to build compact cells?

The SDSS will measure fluxes in 5 bands
       => asinh magnitudes
Axis-parallel splits in median flux,
   in 8 separate zones in Galactic latitude
          => 5 dimensional bounding boxes

    The fluxes are strongly correlated
             => 2 + dimensional distribution of typical objects
             => widely scattered rare objects
                       => large density contrasts

          first create a local density and split on its value (Csabai etal 96)
                  typical (98%)                           rare (2%)

       Coarse Grained Design

User Interface                    Analysis Engine


      Query Support

                         Data Warehouse

              Distributed Implementation

   User Interface                                 Analysis Engine


                       SX Engine                          Objectivity Federation


Objectivity                                                             Slave
  RAID                                      Objectivity
                      RAID                                            Objectivity

                      JHU Contributions

Fiber spectrographs
    P. Feldman
    A. Uomoto
    S. Friedman
    S. Smee

Science Archive
    A. Szalay
    A. Thakar
    P. Kunszt
    I. Csabai
    Gy. Szokoly
    A. Connolly
    A. Chaudhaury

    T. Heckman
    T. Poehler
    A. Davidsen
    A. Uomoto
    A. Szalay

A lot of help from
    Jim Gray, Microsoft

                  Processing Platforms

At Fermilab:
 2 AlphaServer 8200          data processing
 1    SGI Origin 2000        data bases
Archive at JHU:
 1 AlphaServer 1000A (development)
 10 Intel based servers w. LVD RAID
  software verified on
       Digital Unix, IRIX, Solaris, Linux

                      Exploring new methods

New spectral classification techniques
        galaxy spectra can be expressed as a superposition
        of a few (<5) principal components
                   => objective classification of 1 million spectra!

Photometric redshifts
        galaxy colors systematically change with redshift,
        the SDSS photometry works like a 5-pixel spectrograph
                 => $\Delta z = 0.05$, but with 100 million objects!

Measuring cosmological parameters
        before: data analysis was limited by small number statistics
        after:  dominant errors are systematic (extinction)
                => new analysis methods are required!

                    Photometric redshifts

Multicolor photometry maps physical parameters
         luminosity L
         redshift z                       observed fluxes
         spectral type T
Inversion: u’,g’,r’,i’,z’ => z, L, T

        Redshifts are statistical, with large errors: $\Delta z \approx 0.05$
        The data set is huge, more than 100 million galaxies
        Easy to subdivide into coarse z bins, and by type
                 => study evolution
                 => enormous volume - 1 Gpc^3

                                       Measuring P(k)

   Karhunen-Loeve transform:
            Signal-to-noise eigenmodes of the redshift survey
            Optimal extraction of clustering signal
            Maximal rejection of systematic errors
   (Vogeley and Szalay 96, Matsubara, Szalay and Landy 99)

                           8            
  North     0.480..20
                  0 22
                               0 06
                                            0 05

  South     0.310..19
                  0 22
                               0 05
                                            0 05

Combined 0.400..14
               0 15
                               0 04
                                            0 03

        Pilot project using the Las Campanas Redshift Survey
        with 22,000 galaxies

        We simultaneously measure the values of
          the redshift-distortion parameter ($\beta = \Omega^{0.6}/b$),
          the normalization ($\sigma_8$) and
          the CDM shape parameter ($\Gamma = \Omega h$).


  • Future dominated by detector improvements
  • Moore’s Law growth in CCD capabilities
  • Gigapixel arrays on the horizon
  • Improvements in computing and storage will track growth in data volume
  • Investment in software is critical, and growing

[Figure: curves labeled Glass and CCDs, log scale, 1970 to 1995]
Total area of 3m+ telescopes in the world in m^2, total number
of CCD pixels in Megapix, as a function of time. Growth over
25 years is a factor of 30 in glass, 3000 in pixels.
              The Age of Mega-Surveys

The next generation of astronomical archives with
  Terabyte catalogs will dramatically change astronomy
         top-down design
         large sky coverage
         built on sound statistical plans
         uniform, homogeneous, well calibrated
         well controlled and documented systematics

The technology to acquire, store and index the data is here
         we are riding Moore’s Law

Data mining in such vast archives will be a challenge,
   but possibilities are quite unimaginable

Integrating these archives into a single entity is a
   project for the whole community
         => National Virtual Observatory

             New Astronomy – Different!

Systematic Data Exploration
    will have a central role in the New Astronomy
Digital Archives of the Sky
    will be the main access to data
Data “Avalanche”
    the flood of Terabytes of data is already happening,
    whether we like it or not!
Transition to the new
    may be organized or chaotic

                NVO: The Challenges

Size of the archived data
      •   40,000 square degrees is 2 trillion pixels
      •   One band:                       4 Terabytes
      •   Multi-wavelength:       10-100 Terabytes
      •   Time dimension:             few Petabytes
The development of
      • new archival methods
      • new analysis tools
      • new standards
        (metadata, interchange formats)
Hardware/networking requirements
Training the next generation!
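
The archive sizes quoted above follow from simple arithmetic. The sketch below assumes 0.5 arcsec pixels and 2 bytes per pixel; neither value is given on the slide, and they are chosen here only because they reproduce the quoted figures.

```python
ARCSEC_PER_DEG = 3600.0


def pixel_count(area_sq_deg, pixel_arcsec):
    """Number of pixels needed to tile an area at a given pixel size."""
    pixels_per_sq_deg = (ARCSEC_PER_DEG / pixel_arcsec) ** 2
    return area_sq_deg * pixels_per_sq_deg


def size_terabytes(n_pixels, bytes_per_pixel):
    """Storage for one band, in terabytes (10^12 bytes)."""
    return n_pixels * bytes_per_pixel / 1e12


if __name__ == "__main__":
    n = pixel_count(40_000, 0.5)     # ASSUMPTION: 0.5 arcsec pixels
    tb = size_terabytes(n, 2)        # ASSUMPTION: 2 bytes per pixel
    print(f"{n:.2e} pixels, {tb:.1f} TB per band")
```

With these inputs, 40,000 square degrees comes to about 2 trillion pixels and about 4 Terabytes per band.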


The SDSS project combines astronomy, physics, and computer science

     It promises to fundamentally change our view of the universe

It will determine how the largest structures in the universe were formed

 It will serve as the standard astronomy reference for several decades

 Its ‘virtual universe’ can be explored by both scientists and the public

    Through its archive it will create a new paradigm in astronomy
