Introduction to Grid Computing


   Midwest Grid Workshop Module 1



Scaling up Social Science:
Citation Network Analysis




[Figure: snapshots of a growing citation network in 1975, 1980, 1985, 1990, 1995, 2000 and 2002. Work of James Evans, University of Chicago, Department of Sociology]
Scaling up the analysis

    Database queries of 25+ million citations
    Work started on small workstations
    Queries grew to month-long duration
    With database distributed across
     U of Chicago TeraPort cluster:
        50 (faster) CPUs gave 100 X speedup
        Many more methods and hypotheses can be tested!
    Higher throughput and capacity enables deeper
     analysis and broader community access.

  Computing clusters have commoditized
  supercomputing
[Diagram: a typical cluster: a cluster-management "frontend"; I/O servers (typically RAID fileservers) with disk arrays and tape backup robots; a few headnodes, gatekeepers and other service nodes; and lots of worker nodes]
   PUMA: Analysis of Metabolism
The PUMA Knowledge Base holds information about proteins analyzed against ~2 million gene sequences.

[Diagram: the analysis runs on the Grid and involves millions of BLAST, BLOCKS, and other processes]

Natalia Maltsev et al., http://compbio.mcs.anl.gov/puma2
    Initial driver: High Energy Physics
[Diagram: the LHC tiered computing model. The online system reads the detector at ~PBytes/sec; there is a "bunch crossing" every 25 nsecs, there are 100 "triggers" per second, and each triggered event is ~1 MByte in size. An offline processor farm (~20 TIPS; 1 TIPS is approximately 25,000 SpecInt95 equivalents) feeds the Tier 0 CERN Computer Centre at ~100 MBytes/sec. Tier 1 regional centres (France, Germany, Italy, and FermiLab at ~4 TIPS) connect at ~622 Mbits/sec (or air freight, deprecated). Tier 2 centres (e.g., Caltech, each ~1 TIPS) connect onward at ~622 Mbits/sec to institute servers (~0.25 TIPS). Physicists work on analysis "channels"; each institute will have ~10 physicists working on one or more channels, and data for those channels should be cached by the institute server. Tier 4 physicist workstations pull from the physics data cache at ~1 MByte/sec.]

Image courtesy Harvey Newman, Caltech
Grids Provide Global Resources
To Enable e-Science




Ian Foster’s Grid Checklist

   A Grid is a system that:
       Coordinates resources that are not subject to
        centralized control
       Uses standard, open, general-purpose protocols
        and interfaces
       Delivers non-trivial qualities of service




    Virtual Organizations
   Groups of organizations that use the Grid to share resources
    for specific purposes
   Support a single community
   Deploy compatible technology and agree on working policies
      Agreeing on security policies is often the difficult part

   Deploy different network accessible services:
       Grid Information
       Grid Resource Brokering
       Grid Monitoring
       Grid Accounting



    What is a grid made of ? Middleware.
[Diagram: a Grid client (an application user interface; grid middleware; and resource, workflow and data catalogs) speaks grid protocols to three kinds of sites, each running its own grid middleware, grid storage and computing cluster: resources dedicated by UC, IU and Boston; resources shared by OSG, LCG and NorduGRID; and resource time purchased from a commercial provider.]
                  Security to control access and protect communication (GSI)
                  Directory to locate grid sites and services: (VORS, MDS)
                  Uniform interface to computing sites (GRAM)
                  Facility to maintain and schedule queues of work (Condor-G)
                  Fast and secure data set mover (GridFTP, RFT)
                  Directory to track where datasets live (RLS)


We will focus on Globus and Condor
   Condor provides both client & server scheduling
       In grids, Condor provides an agent to queue, schedule
        and manage work submission
   Globus Toolkit provides the lower-level
    middleware
       Client tools which you can use from a command line
       APIs (scripting languages, C, C++, Java, …) to build
        your own tools, or use direct from applications
       Web service interfaces
       Higher level tools built from these basic components,
        e.g. Reliable File Transfer (RFT)
Grid components form a “stack”
                   Grid Application (M6,M7,M8)
                     (often includes a Portal)

            Workflow system (explicit or ad-hoc) (M6)

    Condor Grid Client (M3)


      Job                   Data                   Information
 Management (M3)       Management (M4)           (config & status)
                                                  Services (M5)

                 Grid Security Infrastructure (M2)


                      Core Globus Services


                   Standard Network Protocols



Grid architecture is evolving to a
Service-Oriented approach.
...but this is beyond our workshop’s scope.
See "Service-Oriented Science" by Ian Foster.

   Service-oriented applications
       Wrap applications as services
       Compose applications into workflows
   Service-oriented Grid infrastructure
       Provision physical resources to support application workloads

[Diagram: Users → Composition → Workflows → Invocation → Application Services → Provisioning]

          "The Many Faces of IT as Service", Foster, Tuecke, 2005
Security

      Workshop Module 2




Grid security is a crucial component
   Resources are typically valuable
   Problems being solved might be sensitive
   Resources are located in distinct administrative
    domains
       Each resource has own policies, procedures, security
        mechanisms, etc.
   Implementation must be broadly available &
    applicable
       Standard, well-tested, well-understood protocols;
        integrated with wide variety of tools

Security Services
   Forms the underlying communication medium for
    all the services
   Secure Authentication and Authorization
   Single Sign-on
       User explicitly authenticates only once – then single
        sign-on works for all service requests
   Uniform Credentials
   Example: GSI (Grid Security Infrastructure)



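In practice, single sign-on with GSI looks like the following sketch. It uses the Globus Toolkit proxy commands and assumes you already hold a grid certificate, so it needs a grid-enabled host to actually run:

```shell
# Authenticate once: create a short-lived proxy credential from your
# long-lived certificate.  You type your passphrase only here.
grid-proxy-init -valid 12:00

# Every subsequent grid request (job submission, file transfer, ...)
# picks up the proxy automatically -- no further passwords.
grid-proxy-info    # shows the proxy's subject, issuer and time left
```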
Authentication means proving
that you are who you claim to be
   Authentication stops imposters
   Examples of authentication:
       Username and password
       Passport
       ID card
       Public keys or certificates
       Fingerprint



Authorization controls what you are
allowed to do.

   Is this device allowed to access this service?
   Read, write, execute permissions in Unix
   Access control lists (ACLs) provide more flexible
    control
   Special “callouts” in the grid stack in job and data
    management perform authorization checks.



Job and resource management

     Workshop Module 3




Job Management Services provide a
standard interface to remote resources
   Includes CPU, Storage and Bandwidth
   Globus component is Globus Resource Allocation
    Manager (GRAM)
   The primary Condor grid client component is Condor-G
   Other needs:
       scheduling
       monitoring
       job migration
       notification



GRAM provides a uniform interface to
diverse resource scheduling systems.


[Diagram: through GRAM, a user reaches sites running different local schedulers: Condor at Site A, PBS at Site B, LSF at Site C, and UNIX fork() at Site D. Each site serves its VOs, and together the sites form the Grid.]
GRAM: What is it?
   Globus Resource Allocation Manager
   Given a job specification:
       Create an environment for a job
       Stage files to and from the environment
       Submit a job to a local resource manager
       Monitor a job
       Send notifications of the job state change
       Stream a job’s stdout/err during execution



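As a concrete sketch, submission through the pre-WS GRAM command-line client looks like this. The gatekeeper hostname is made up, and the commands need a live grid site (and a valid proxy) to run against:

```shell
# Run /bin/hostname on a remote site through its PBS jobmanager:
globus-job-run gatekeeper.example.edu/jobmanager-pbs /bin/hostname

# The same, but staging a local executable into the remote environment first:
globus-job-run gatekeeper.example.edu/jobmanager-pbs -stage ./my_analysis
```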
A “Local Resource Manager” is a batch system
for running jobs across a computing cluster
   GRAM hands submitted jobs to the site's local resource manager
   Examples:
       Condor
       PBS
       LSF
       Sun Grid Engine
   Most systems allow you to access “fork”
       Default behavior
       It runs on the gatekeeper:
           A bad idea in general, but okay for testing

Managing your jobs
   We need something more than just the basic functionality
    of the globus job submission commands
   Some desired features
       Job tracking
        Submission of a set of inter-dependent jobs
        Checkpointing and job resubmission capability
        Matchmaking for selecting an appropriate resource for executing
         the job
   Options: Condor, PBS, LSF, …
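A minimal Condor-G submit description gives a flavour of this (the hostname and executable name are illustrative):

```shell
# Write a Condor-G submit file: "universe = grid" tells Condor to hand
# the job to a remote GRAM service instead of running it locally.
cat > analysis.sub <<'EOF'
universe      = grid
grid_resource = gt2 gatekeeper.example.edu/jobmanager-pbs
executable    = my_analysis
output        = analysis.out
error         = analysis.err
log           = analysis.log
queue
EOF
# Submit with: condor_submit analysis.sub   (then condor_q to track it)
```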




Data Management

     Workshop Module 4




Data management services provide the
mechanisms to find, move and share data
   Fast
   Flexible
   Secure
   Ubiquitous
   Grids are used for analyzing and manipulating
    large amounts of data
       Metadata (data about data): What is the data?
       Data location: Where is the data?
       Data transport: How to move the data?

GridFTP is a secure, efficient and
standards-based data transfer protocol

   Robust, fast and widely accepted
   Globus GridFTP server
   Globus globus-url-copy GridFTP client
   Other clients exist (e.g., uberftp)




GridFTP is secure, reliable and fast
   Security through GSI
       Authentication and authorization
       Can also provide encryption
   Reliability by restarting failed transfers
   Fast
       Can set TCP buffers for optimal performance
       Parallel transfers
       Striping (multiple endpoints)
   Not all features are accessible from the basic client
       Can be embedded in higher level and application-specific
        frameworks
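For example, a third-party transfer between two GridFTP servers with parallel streams and an enlarged TCP buffer looks like this (hostnames and paths are illustrative; a valid proxy is required):

```shell
# -vb:     show performance figures as the transfer runs
# -p 4:    use four parallel TCP streams
# -tcp-bs: TCP buffer size in bytes (here 2 MB)
globus-url-copy -vb -p 4 -tcp-bs 2097152 \
    gsiftp://se1.example.edu/data/run042.dat \
    gsiftp://se2.example.edu/store/run042.dat
```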

File catalogues tell you where the data is
   Replica Location Service (RLS)
   PhEDEx
   RefDB / PupDB




Requirements from a File Catalogue
   Abstract out the logical file name (LFN) for a
    physical file
       maintain the mappings between the LFNs and the PFNs
        (physical file names)
   Maintain the location information of a file




In order to avoid “hotspots”, replicate
data files in more than one location
   Effective use of the grid resources
   Each LFN can have more than 1 PFN
   Avoids single point of failure
   Manual or automatic replication
       Automatic replication considers the demand for a file,
        transfer bandwidth, etc.




                The Globus-Based
                LIGO Data Grid
                LIGO Gravitational Wave Observatory



[Map: LIGO Data Grid sites, including Birmingham, Cardiff and AEI/Golm]

Replicating >1 Terabyte/day to 8 sites
>40 million replicas so far
                 www.globus.org/solutions
National Grid
Cyberinfrastructure


   Workshop Module 5



TeraGrid provides vast resources via a
number of huge computing facilities.




Open Science Grid (OSG) provides shared computing
resources, benefiting a broad set of disciplines
       A consortium of universities and national
       laboratories, building a sustainable grid
       infrastructure for science.




   OSG incorporates advanced networking and focuses on general services, operations, end-to-end
    performance
   Composed of a large number (>50 and growing) of shared computing facilities, or “sites”

                                                         http://www.opensciencegrid.org/
         Open Science Grid
 50 sites (15,000 CPUs) & growing
 400 to >1000 concurrent jobs
 Many applications + CS experiments;
  includes long-running production operations
 Up since October 2003; few FTEs central ops




[Chart: concurrent jobs running on OSG, 2004]

                                www.opensciencegrid.org
To efficiently use a Grid, you must
locate and monitor its resources.
   Check the availability of different grid sites
   Discover different grid services
   Check the status of “jobs”
   Make better scheduling decisions with
    information maintained on the “health” of sites




OSG Resource Selection Service: VORS




    Monitoring and Discovery Service (MDS)

[Diagram: MDS-Index services run inside GT4 containers and register through WS-ServiceGroup, with WSRF/WSN access. Services such as GRAM, GridFTP and RFT register automatically in their container's MDS-Index; non-WSRF entities are reached through custom protocol adapters. Clients (e.g., WebMDS) query an aggregating MDS-Index.]
Grid Workflow



   Workshop Module 6



  A typical workflow pattern in image
  analysis runs many filtering apps.
[Workflow diagram: input image pairs (3a, 4a, ref, 5a, 6a; each with .h and .i files) feed align_warp jobs 1, 3, 5 and 7; their warped outputs (3a.w … 6a.w) feed reslice jobs 2, 4, 6 and 8; softmean/9 combines the resliced images into atlas.h and atlas.i; slicer jobs 10, 12 and 14 cut these into atlas_x/y/z.ppm, which convert jobs 11, 13 and 15 turn into atlas_x/y/z.jpg.]

Workflow courtesy James Dobson, Dartmouth Brain Imaging Center
    Workflows can process vast datasets.
       Many HEP and Astronomy experiments consist of:
           Large datasets as inputs (find datasets)
           “Transformations” which work on the input datasets (process)
           The output datasets (store and publish)
       The emphasis is on the sharing of the large datasets
       Transformations are usually independent and can be
        parallelized. But they can vary greatly in duration.




[Diagram: Montage workflow run on TeraGrid to create a mosaic of M42: ~1200 jobs in 7 levels, mixing data-transfer and compute jobs]
Virtual data model enables workflow to
abstract grid details.




An important application pattern:
Grid Data Mining



     Workshop Module 7



Mining Seismic data for hazard analysis
(Southern Calif. Earthquake Center).
[Diagram: many data sources feed a Seismic Hazard Model: seismicity, paleoseismology, local site effects, geologic structure, faults, stress transfer, crustal motion, crustal deformation, seismic velocity structure and rupture dynamics. Inset: a satellite-generated Interferometric Synthetic Aperture Radar (InSAR) image of the 1999 Hector Mine earthquake shows the displacement field in the direction of radar imaging; each fringe (e.g., from red to red) corresponds to a few centimeters of displacement.]
Data mining in the grid
    Decompose across network
    Clients integrate dynamically
        Select & compose services
        Select “best of breed” providers
        Publish result as a new service
    Decouple resource & service providers

[Diagram: users reach discovery tools and analysis tools layered over data archives. Fig: S. G. Djorgovski]
Conclusion: Why Grids?
   New approaches to inquiry based on
       Deep analysis of huge quantities of data
       Interdisciplinary collaboration
       Large-scale simulation and analysis
       Smart instrumentation
       Dynamically assemble the resources to tackle a new
        scale of problem
   Enabled by access to resources & services without
    regard for location & other barriers


Grids: Because Science Takes a Village …
   Teams organized around common goals
       People, resource, software, data, instruments…
   With diverse membership & capabilities
       Expertise in multiple areas required
   And geographic and political distribution
       No location/organization possesses all required skills
        and resources
   Must adapt as a function of the situation
       Adjust membership, reallocate responsibilities,
        renegotiate resources



Based on:
Grid Intro and Fundamentals Review


                 Dr Gabrielle Allen
                 Center for Computation & Technology
                 Department of Computer Science
                 Louisiana State University
                 gallen@cct.lsu.edu

                              Grid Summer Workshop
                              June 26-30, 2006

				