Introduction to Grid Computing

   What is a Grid – an integrated advanced cyber
    infrastructure that delivers:
    o Computing capacity
    o Data capacity
    o Communication capacity
   Why? There are many applications that are
    characterized as follows:
    o Large, varied, distributed collaborations need to work together
    o Need lots of cycles, storage (we are talking about
      teraflops, terabytes)
    o Need to share results, codes, parameter files, …
            Grid Motivation

 Grid Computing was originally about
  extending scientific computing on single
  machines to distributed systems
 Despite improvements in raw computing
  power, storage capacity, and communication,
  it is difficult to keep up with the increased
  demand from the types of applications
  being developed.
               Grid Motivation

   Scientific Applications
    o Analysis of large data volumes from different sources
    o Lots of computation needed to model an aspect
      of the natural world
    o Often requires substantially different types of
      computational resources
    o Projected data is measured in petabytes; Lots
      of storage
                 Grid Motivation

   Astronomy
    o Digital sky surveys
   Medical data
    o X-Ray, mammography data, etc. (many petabytes)
    o Digitizing patient records (ditto)
   Molecular genomics and related disciplines
    o Human Genome, other genome databases
    o Proteomics (protein structure, activities, …)
    o Protein interactions, drug delivery
   Virtual Population Laboratory (proposed)
    o Simulate likely spread of disease outbreaks
   Brain scans (3-D, time dependent)
   Climate studies
               Grid Motivation

   In the business world, companies want to
    integrate, manage and analyze large
    volumes of data
    o Example: An insurance company mines data
      from partner hospitals for fraud detection
                Grid Motivation

   Could buy additional machines
   There is a lot of computing power that is
    unutilized or underutilized most of the time
   How can applications take advantage of the
    multiple resources available in an effective way?
   A grid is intended for allowing the sharing,
    selection and aggregation of a wide variety of
    geographically dispersed resources owned by
    different organizations (virtual organizations)
      Emergence of the Virtual Organization

“Resource sharing & coordinated
problem solving in dynamic … virtual
organizations”

    “The Anatomy of the Grid”, Foster, Kesselman, Tuecke, 2001
Other Distributed Infrastructures

 Road, rail, telephones, power, banking,
  water, electricity
 All started locally, then regionally, then
  nationally, and then internationally
 Provide reliable relatively low cost access
  to a standardized service
 Available to the masses
             Electrical Power Grid

   Single entity providing power
   Relatively efficient, low cost, reliable
   US Grid links 10K generators
   Complex physical connections and trading mechanisms
   Components heterogeneous and operated/owned by
    different companies
   Consumers differ in amount of power they use, the quality of
    service they require, and the price they will pay
   Economics important: grid driven by economic factors.
    Reserve capacities, trading power.
   Politics important: success depended on regulatory, political
    and institutional developments as much as technical
   Control important: infrastructure for monitoring,
    management and control
       Emergence of the Virtual Organization
   Commonalities
    o Need to discover and share resources
    o Do not necessarily trust all other participants
    o Not just about document exchange; Also about
      remote software, computers, data, sensors, etc;
    o Resource sharing is conditional and the
      conditions are dynamic
       —Can only use resources for a limited class of problems
        or at certain times of the day.
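The idea of conditional, time-dependent sharing can be made concrete with a small sketch. Everything here is hypothetical (the `SharingPolicy` class, its field names, and the example problem class are illustrations, not part of any Grid toolkit); it only shows how an owner might express "a limited class of problems, at certain times of the day".

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class SharingPolicy:
    """Hypothetical owner policy: which problem classes may run, and when."""
    allowed_classes: frozenset
    window_start: time
    window_end: time

    def permits(self, problem_class: str, at: time) -> bool:
        # Condition 1: only a limited class of problems is accepted.
        if problem_class not in self.allowed_classes:
            return False
        # Condition 2: only within the owner's time window.
        if self.window_start <= self.window_end:
            return self.window_start <= at < self.window_end
        # The window wraps past midnight (for example 20:00 to 06:00).
        return at >= self.window_start or at < self.window_end


# Example: a cluster shared for climate models, overnight only.
policy = SharingPolicy(frozenset({"climate-model"}), time(20, 0), time(6, 0))
```

Because the conditions are dynamic, a real system would re-evaluate such a policy at request time rather than cache the answer.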
What is a Grid Checklist (Foster)

   Coordinates distributed resources using
    non-centralized control mechanisms.
    o A grid integrates and coordinates resources
      and users that live within different
      administrative domains
       —E.g., different administrative units of the same
        company or different companies
    o Addresses the issue of security, policy,
      payment, membership
What is a Grid Checklist (Foster)

   Uses standard, open, general-purpose
    protocols and interfaces
    o A grid is built from multi-purpose protocols and
      interfaces that address such fundamental
      issues as authentication, authorization,
      resource discovery, and resource access.
   Delivers nontrivial qualities of service
    o Resources should be used in a coordinated
      fashion to deliver various qualities of service
    o Quality of service is usually defined in metrics
      such as response time, throughput, availability,
      and security
              Grid vs. Internet?

 We’ve had computers connected by
  networks for 20 years
 The Grid brings additional notions
  o Virtual Organizations
  o Infrastructure to enable computation to be
    carried out across these
    —Authentication, monitoring, information, resource
     discovery, status, coordination, etc
 Can I just plug my application into the Grid?
  o No! Much work to do to get there!
              Are these Grids?

   Cluster Management Systems
    o Examples: Sun’s Sun Grid Engine, Platform’s
      Load Sharing Facility
    o These can be installed on a parallel computer or
      in a local area network
    o Can deliver a quality of service
    o Each may be an important component of a Grid,
      but by itself does not constitute a Grid
                Are these Grids?

   Multi-site scheduler
    o Example: Platform’s multicluster scheduler
    o Yes: Not terribly sophisticated but it is a grid
   Gnutella
    o Maybe. Is it too specialized?
    o Is it open or is it a standard?
   WWW
   Foster’s checklist more clearly applies to large-
    scale Grid deployments:
    o Data Grid: GriPhyN, PPDG, EU DataGrid, iVDGL,
      DataTAG, NASA’s Information Power Grid
    o TeraGrid: Used to link major US academic sites
    Advantages of Grid Computing

   Uses resources scattered across the world
    o Access to more computing power
    o Better access to data
    o Utilize unused cycles
   Facilitates Virtual Organizations (VO)
    o Groups of organizations that use the Grid to
      share resources
                 Online Access to
               Scientific Instruments
[Figure: Advanced Photon Source workflow]
   real-time collection
   tomographic reconstruction
   archival storage
   desktop & VR clients with shared controls
   DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
                 Data Grids for
               High Energy Physics

[Figure: tiered data grid for high energy physics. Image courtesy Harvey Newman, Caltech]
   Online System
    o There is a “bunch crossing” every 25 nsecs.
    o There are 100 “triggers” per second
    o Each triggered event is ~1 MByte in size
    o ~100 MBytes/sec to the Offline Processor Farm (~20 TIPS)
   Tier 0: CERN Computer Centre, fed at ~100 MBytes/sec
   Tier 1: France, Germany and Italy Regional Centres, FermiLab ~4 TIPS
    o Linked to Tier 0 at ~622 Mbits/sec or Air Freight (deprecated)
   Tier 2: Caltech and other Tier2 Centres, ~1 TIPS each
    o Linked at ~622 Mbits/sec
   Institutes (~0.25 TIPS), linked at ~622 Mbits/sec
    o Physics data cache on the institute server
    o Physicists work on analysis “channels”. Each institute will
      have ~10 physicists working on one or more channels; data for
      these channels should be cached by the institute server
   Tier 4: Physicist workstations, ~1 MBytes/sec
   1 TIPS is approximately 25,000 SpecInt95 equivalents
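The rates quoted in the diagram above can be cross-checked with simple arithmetic. The sketch below is only a back-of-the-envelope check: it assumes decimal units (1 TByte = 1,000,000 MBytes), treats the "~" figures as exact, and assumes the detector runs continuously for a full day, none of which the slide states.

```python
# Figures taken from the slide.
BUNCH_CROSSING_NS = 25          # one bunch crossing every 25 ns
TRIGGERS_PER_SECOND = 100       # events kept per second
EVENT_SIZE_MBYTES = 1           # ~1 MByte per triggered event
LINK_MBITS_PER_SECOND = 622     # wide-area link rate
FARM_TIPS = 20                  # offline processor farm
SPECINT95_PER_TIPS = 25_000     # "1 TIPS is approximately 25,000 SpecInt95"

# Crossings per second: 10^9 ns divided by 25 ns.
crossings_per_second = 1_000_000_000 // BUNCH_CROSSING_NS

# How many crossings occur for every event the trigger keeps.
crossings_per_trigger = crossings_per_second // TRIGGERS_PER_SECOND

# 100 triggers/sec at ~1 MByte each gives the ~100 MBytes/sec on the slide.
trigger_rate_mbytes = TRIGGERS_PER_SECOND * EVENT_SIZE_MBYTES

# Assumption: continuous running for 86,400 seconds, decimal TBytes.
tbytes_per_day = trigger_rate_mbytes * 86_400 / 1_000_000

# A 622 Mbits/sec link expressed in MBytes/sec (8 bits per byte).
link_mbytes_per_second = LINK_MBITS_PER_SECOND / 8

# The ~20 TIPS farm expressed in SpecInt95 equivalents.
farm_specint95 = FARM_TIPS * SPECINT95_PER_TIPS

print(crossings_per_second, crossings_per_trigger, trigger_rate_mbytes)
print(tbytes_per_day, link_mbytes_per_second, farm_specint95)
```

Under these assumptions the trigger keeps one crossing in 400,000, and the resulting stream still amounts to several terabytes per day, which is why the data is distributed across tiers instead of being kept at one site.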
NEES (Network for Earthquake Engineering Simulation) Collaboration

[Figure: NEES collaboration sites, including U.Nevada Reno]

                  Home Computers
                 Evaluate AIDS Drugs
Community =
   o 1000s of home computer users
   o Philanthropic computing vendor (Entropia)
   o Research group
Common goal = advance AIDS research
SHARCNET is a high performance scientific computing project involving
the University of Western Ontario, University of Guelph, McMaster
University, the University of Windsor and Wilfrid Laurier University.

SHARCNET provides UWO researchers with world-class computing
capabilities. As of November 2001, the computer cluster at the
University of Western Ontario was the fastest computer at a Canadian
University and the 12th fastest in any University in North America.

[Figure: SHARCnet, a cluster of clusters or “Super Cluster” in South
Western Ontario, connected by ultra high speed fiber optic networking.
Sites shown: Wilfrid Laurier University, University of Guelph, McMaster
University, University of Windsor, The University of Western Ontario.
Map labels: Waterloo, London]
                Example Grids


       Ideal Grid-based Scientific
   User submits request through GUI
    o Application
    o Operating System and other requirements
    o Input data
   Grid finds and allocates resources to satisfy the request
   Grid monitors request processing
    o Moves job when resources fail or are too busy
   Grid notifies user when results are available
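The monitoring step ("moves job when resources fail or are too busy") can be sketched as a single decision function. This is a minimal illustration under stated assumptions: the `Resource` class, the `load` field, and the 0.9 busy threshold are all hypothetical, and a real Grid would also have to checkpoint and transfer job state.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical view of a machine as seen by the monitor."""
    name: str
    alive: bool = True
    load: float = 0.0  # fraction of capacity in use, 0.0 to 1.0


def is_usable(resource, busy_threshold=0.9):
    """A resource is usable if it has not failed and is not too busy."""
    return resource.alive and resource.load < busy_threshold


def choose_resource(resources, busy_threshold=0.9):
    """Return the least-loaded usable resource, or None if there is none."""
    usable = [r for r in resources if is_usable(r, busy_threshold)]
    return min(usable, key=lambda r: r.load, default=None)


def monitor_step(current, resources, busy_threshold=0.9):
    """One monitoring pass: keep the job where it is, or pick a new home."""
    if is_usable(current, busy_threshold):
        return current
    return choose_resource(resources, busy_threshold)
```

Returning `None` when nothing is usable leaves the caller to decide whether to queue the job or notify the user.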
   Assume a source file Main.F on machine A and
    an input file on machine B. Main.F is written
    using MPI. It will need around 4 GB of core
    memory to run, will take several hours to
    complete, and will produce a large output file.

   What functionality is needed?

   How to select a machine to run it on?
   How to provide an executable which can run on
    that machine?
   How to move the input file?
   How to start the executable?
   How to monitor the job? When does it start?
    When does it finish?
   How to move the output file back?
   What about security?
   How do we know if it didn’t work and how it failed?
          How to Select a Machine

   What properties of a machine are we interested in?
    o What resources does my executable require?
         — 4 GB memory, “several hours of compute time”
         — Enough diskspace for the output
    o What kind of environment do I need on the machine?
         — OS limitations?
         — MPI? (Which version?), Fortran?
    o   What resources am I authorized to run on?
    o   How quickly will it run?
    o   How much will it cost/what is my allocation there?
    o   How to find all this information? What should the user
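The questions above amount to a matchmaking problem: filter machines by what the job needs and what the user may use, then rank what is left. The sketch below is one hypothetical way to express that; the `Job` and `Machine` fields and the `relative_speed` score are assumptions made for illustration, not an existing scheduler's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    memory_gb: float        # e.g. 4 GB for Main.F
    wall_hours: float       # "several hours of compute time"
    output_gb: float        # disk space needed for the output
    software: frozenset     # e.g. {"mpi", "fortran"}


@dataclass(frozen=True)
class Machine:
    name: str
    memory_gb: float
    max_wall_hours: float
    free_disk_gb: float
    software: frozenset
    authorized_users: frozenset
    relative_speed: float   # hypothetical score; higher means faster


def candidates(job, user, machines):
    """Machines the user may use that satisfy the job, fastest first."""
    ok = [
        m for m in machines
        if user in m.authorized_users
        and m.memory_gb >= job.memory_gb
        and m.max_wall_hours >= job.wall_hours
        and m.free_disk_gb >= job.output_gb
        and job.software <= m.software   # every required package is installed
    ]
    return sorted(ok, key=lambda m: m.relative_speed, reverse=True)
```

Cost and allocation limits could be added as further filter terms in the same way; they are left out here to keep the sketch short.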
             More Complicated

   What if the program might need to read in data
    kept on machine C while it is running?
   What about distributing across processors on
    different machines?
   What if I have a lot of interconnected programs?
   How do I find the output file afterwards?
   What if it doesn’t work?
       Common Features Needed by Grid

   Resource registry is an information source that
    allows entities to publish and update information
    about the resources they wish to share

Figure from Sean Norman’s reading course presentation
Common Features Needed by Grid

   Client is typically an agent acting on behalf of the user
    o Acquires resources requested by the user by
      consulting the resource registry
    o Submits an allocation request to the resource
      manager(s) responsible for the desired resources
Common Features Needed by Grid

   If request can be accommodated, resource manager(s)
    update status information for acquired resources in
    the resource registry
   Client then sends the appropriate executables and
    input data to the allocated resources and receives a
    reference to the execution in return
Common Features Needed by Grid

   Reference allows the client to monitor the
    execution of a job and inquire about its status
   Client may also receive the results of the job
    once its execution is complete
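The registry, client, and resource manager interaction described on these slides can be sketched as a small in-memory model. All class and method names below are hypothetical and chosen for illustration; the sketch is not the API of Globus, Condor, or any other toolkit, and it runs the job locally when `finish` is called instead of on a remote machine.

```python
import itertools


class ResourceRegistry:
    """Information source where resources publish and update their status."""

    def __init__(self):
        self._entries = {}

    def publish(self, name, info):
        self._entries[name] = dict(info)

    def update(self, name, **changes):
        self._entries[name].update(changes)

    def info(self, name):
        return dict(self._entries[name])

    def find(self, predicate):
        return [n for n, i in self._entries.items() if predicate(i)]


class ResourceManager:
    """Owns one resource, grants allocations, and tracks submitted jobs."""

    _ids = itertools.count(1)

    def __init__(self, name, registry, cpus):
        self.name, self.registry, self.jobs = name, registry, {}
        registry.publish(name, {"free_cpus": cpus})

    def allocate(self, cpus):
        free = self.registry.info(self.name)["free_cpus"]
        if cpus > free:
            return False  # request cannot be accommodated
        self.registry.update(self.name, free_cpus=free - cpus)
        return True

    def submit(self, executable, input_data):
        ref = f"{self.name}/job-{next(self._ids)}"  # reference to the execution
        self.jobs[ref] = {"status": "running", "result": None,
                          "work": (executable, input_data)}
        return ref

    def finish(self, ref):
        """Stand-in for remote execution: run the job and record the result."""
        executable, data = self.jobs[ref]["work"]
        self.jobs[ref].update(status="done", result=executable(data))

    def status(self, ref):
        return self.jobs[ref]["status"]

    def result(self, ref):
        return self.jobs[ref]["result"]


# Client side: discover, allocate, submit, monitor, collect.
registry = ResourceRegistry()
manager = ResourceManager("cluster-a", registry, cpus=8)
found = registry.find(lambda info: info["free_cpus"] >= 4)
granted = manager.allocate(4)
job_ref = manager.submit(sum, [1, 2, 3])
status_before = manager.status(job_ref)
manager.finish(job_ref)
status_after, job_result = manager.status(job_ref), manager.result(job_ref)
```

The reference returned by `submit` is the only handle the client keeps, which mirrors the slide's point that monitoring and result retrieval both go through it.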
                 Some Solutions

Middleware Toolkits (not all speak, or spoke, Globus):
   o Condor
   o Globus Toolkit
   o Legion/Avaki
   o Codine (now Sun Grid Engine)

Higher Level Toolkits (build on Globus):
   o JavaCoG
   o GridPortal Toolkit, Grid Portal Development Toolkit (GPDK)
   o Condor-G
   o SGE
