        Grid Computing: Concepts, Applications, and Technologies


            Dheeraj Bhardwaj
Department of Computer Science and Engineering
      Indian Institute of Technology, Delhi


                          Outline
        The technology landscape
        Grid computing
        The Globus Toolkit
        Applications and technologies
         – Data-intensive; distributed computing;
           collaborative; remote access to facilities
        Grid infrastructure
        Open Grid Services Architecture
        Global Grid Forum
        Summary and conclusions
Dheerajb@cse.iitd.ac.in                             IIT  DELHI
         Living in an Exponential World
            (1) Computing & Sensors
Moore’s Law: transistor count doubles every 18 months




    [Figure: magnetohydrodynamics simulation of star formation]


         Living in an Exponential World:
                    (2) Storage
        Storage density doubles every 12 months
        Dramatic growth in online data (1 petabyte
         = 1000 terabytes = 1,000,000 gigabytes)
         – 2000      ~0.5 petabyte
         – 2005      ~10 petabytes
         – 2010      ~100 petabytes
         – 2015      ~1000 petabytes?
        Transforming entire disciplines in physical
         and, increasingly, biological sciences;
         humanities next?
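
The growth figures above can be checked against the 12-month doubling claim with a little arithmetic. The following is a minimal sketch, not part of the original slides: the function names are illustrative, and the petabyte values are the rough estimates listed on the slide.

```python
import math

# Online-data estimates from the slide, in petabytes (rough figures).
ONLINE_DATA_PB = {2000: 0.5, 2005: 10, 2010: 100, 2015: 1000}


def implied_doubling_months(start_year, end_year, data=ONLINE_DATA_PB):
    """Doubling time in months implied by the growth between two listed years."""
    growth = data[end_year] / data[start_year]
    doublings = math.log2(growth)
    return 12 * (end_year - start_year) / doublings


def growth_factor(years, doubling_months=12):
    """Growth after `years` if a quantity doubles every `doubling_months`."""
    return 2 ** (12 * years / doubling_months)


if __name__ == "__main__":
    years = sorted(ONLINE_DATA_PB)
    for start, end in zip(years, years[1:]):
        months = implied_doubling_months(start, end)
        print(f"{start}-{end}: data doubles about every {months:.1f} months")
    print(f"12-month doubling sustained for 5 years: x{growth_factor(5):.0f}")
```

The listed online-data estimates grow somewhat more slowly than storage density itself: a strict 12-month doubling gives a factor of 32 over five years, while the slide's figures imply factors of 20 and 10.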
          Data Intensive Physical Sciences

    High energy & nuclear physics
     – Including new experiments at CERN
    Gravitational wave searches
     – LIGO, GEO, VIRGO
    Time-dependent 3-D systems (simulation, data)
     – Earth Observation, climate modeling
     – Geophysics, earthquake modeling
     – Fluids, aerodynamic design
     – Pollutant dispersal scenarios
    Astronomy: Digital sky surveys


     Ongoing Astronomical Mega-Surveys
    Large number of new surveys
     – Multi-TB in size, 100M objects or larger
     – In databases
     – Individual archives planned and under way
    Multi-wavelength view of the sky
     – > 13 wavelength coverage within 5 years
    Impressive early discoveries
     – Finding exotic objects by unusual colors
         > L,T dwarfs, high redshift quasars
     – Finding objects by time variability
         > Gravitational micro-lensing
    Surveys: MACHO, 2MASS, SDSS, DPOSS, GSC-II, COBE, MAP, NVSS,
     FIRST, GALEX, ROSAT, OGLE, ...




      Coming Floods of Astronomy Data
        The planned Large Synoptic Survey
         Telescope will produce over 10 petabytes
         per year by 2008!
         – All-sky survey every few days, so will have
           fine-grain time series for the first time




         Data Intensive Biology and Medicine
   Medical data
    – X-Ray, mammography data, etc. (many petabytes)
    – Digitizing patient records (ditto)
   X-ray crystallography
   Molecular genomics and related disciplines
    – Human Genome, other genome databases
    – Proteomics (protein structure, activities, …)
    – Protein interactions, drug delivery
   Virtual Population Laboratory (proposed)
    – Simulate likely spread of disease outbreaks
   Brain scans (3-D, time dependent)

         A Brain is a Lot of Data!
           (Mark Ellisman, UCSD)

    And comparisons must be made among many

    We need to get to one micron to know the location of every cell. We’re just
    now starting to get to 10 microns. Grids will help get us there and further.

      An Exponential World: (3) Networks
          (Or, Coefficients Matter …)
          Network vs. computer performance
            – Computer speed doubles every 18 months
            – Network speed doubles every 9 months
            – Difference = order of magnitude per 5 years
          1986 to 2000
            – Computers: x 500
            – Networks: x 340,000
          2001 to 2010
            – Computers: x 60
            – Networks: x 4000
[Figure: Moore’s Law vs. storage improvements vs. optical improvements. Graph from
Scientific American (Jan. 2001) by Cleo Vilett, source Vinod Khosla, Kleiner, Caufield and Perkins.]
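
The "order of magnitude per 5 years" claim follows directly from the two doubling times. A minimal sketch of the arithmetic, with illustrative function and variable names that are not from the slides:

```python
def speedup(months, doubling_months):
    """Improvement factor after `months` when performance doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


FIVE_YEARS_IN_MONTHS = 60

computer_gain = speedup(FIVE_YEARS_IN_MONTHS, 18)   # computers: doubling every 18 months
network_gain = speedup(FIVE_YEARS_IN_MONTHS, 9)     # networks: doubling every 9 months
gap = network_gain / computer_gain                  # how far networks pull ahead

if __name__ == "__main__":
    print(f"Computers over 5 years: x{computer_gain:.1f}")
    print(f"Networks over 5 years:  x{network_gain:.1f}")
    print(f"Network advantage:      x{gap:.1f}")
```

Over five years computers improve roughly tenfold and networks roughly a hundredfold, so the gap between them is itself about a factor of ten.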



      Evolution of the Scientific Process
        Pre-electronic
         – Theorize &/or experiment, alone or in small
           teams; publish paper
        Post-electronic
         – Construct and mine very large databases of
           observational or simulation data
         – Develop computer simulations & analyses
         – Exchange information quasi-instantaneously
           within large, distributed, multidisciplinary
           teams




                  Evolution of Business
        Pre-Internet
         – Central corporate data processing facility
         – Business processes not compute-oriented
        Post-Internet
         – Enterprise computing is highly distributed,
           heterogeneous, inter-enterprise (B2B)
         – Outsourcing becomes feasible => service
           providers of various sorts
         – Business processes increasingly computing-
           and data-rich




                          The Grid
       “Resource sharing & coordinated problem
        solving in dynamic, multi-institutional
        virtual organizations”






                                    A Comparison
   SERIAL          PARALLEL               GRID
    Fetch/Store     Fetch/Store            Fetch/Store
    Compute         Compute/communicate    Discovery of resources
                    Cooperative game       Interaction with remote application
                                           Authentication / Authorization
                                           Security
                                           Compute/Communicate
                                           Etc.


                  Distributed Computing
                         vs. GRID
             Grid is an evolution of distributed
              computing
              –   Dynamic
              –   Geographically independent
              –   Built around standards
              –   Internet backbone


             Distributed computing is an “older term”
              – Typically built around proprietary
                software and network
              – Tightly coupled systems/organizations


                         Web vs. GRID
      Web
       – Uniform naming access to documents (http://)

      Grid
       – Uniform, high performance access to computational resources
       – [Figure labels: software catalogs, sensor nets, colleges/R&D labs]
                    Is the World Wide Web a Grid?
              Seamless naming?                        Yes
              Uniform security and authentication?    No
              Information service?                    Yes or No
              Co-scheduling?                          No
              Accounting & authorization?             No
              User services?                          No
              Event services?                         No
              Is the browser a global shell?          No





          What does the World Wide Web
                bring to the Grid ?
            Uniform Naming
            A seamless, scalable information
             service
            A powerful new meta-data language:
             XML
             – XML will be standard language for
               describing information in the grid
             – SOAP: Simple Object Access Protocol
                 > Uses XML for encoding, HTTP for protocol
             – SOAP may become a standard RPC
               mechanism for Grid services
            Portal Ideas
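
To make the SOAP bullet concrete, the sketch below builds a SOAP 1.1 request (XML encoding) and wraps it in an HTTP POST. It is an illustration only: the service namespace, the SubmitJob operation, its parameters, and the URL are hypothetical and do not correspond to any real Grid service. Nothing is sent over the network.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:grid-job-service"                # hypothetical service namespace


def build_soap_request(method, params):
    """Encode a remote call as a SOAP envelope (XML bytes)."""
    ET.register_namespace("soap", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def build_http_request(url, payload, action):
    """Wrap the XML payload in an HTTP POST; the request is built but not sent."""
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{action}"',
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


if __name__ == "__main__":
    # Hypothetical operation and parameters, for illustration only.
    payload = build_soap_request("SubmitJob", {"executable": "/bin/hostname", "count": 4})
    request = build_http_request("http://grid.example.org/jobs", payload,
                                 SERVICE_NS + "#SubmitJob")
    print(request.get_method(), request.full_url)
    print(payload.decode("utf-8"))
```

The split mirrors the slide: XML carries the call and its arguments, and HTTP carries the XML.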



                               The Ultimate Goal


                   In the future I will not know or
                    care where my application is
                    executed, because I will acquire
                    and pay for resources as I
                    need them







                          Why Grids?
               Large-scale science and engineering are done
                through the interaction of people,
                heterogeneous computing resources,
                information systems, and instruments, all of
                which are geographically and organizationally
                dispersed.


               The overall motivation for “Grids” is to
                facilitate the routine interactions of these
                resources in order to support large-scale
                science and engineering.



       An Example Virtual Organization:
        CERN’s Large Hadron Collider
     1800 Physicists, 150 Institutes, 32 Countries




        100 PB of data by 2010; 50,000 CPUs?
     Grid Communities & Applications:
     Data Grids for High Energy Physics

     Event selection
      – There is a “bunch crossing” every 25 nsecs
      – There are 100 “triggers” per second
      – Each triggered event is ~1 MByte in size
      – 1 TIPS is approximately 25,000 SpecInt95 equivalents

     Data flow (from the slide's tier diagram)
      – Into the Online System: ~PBytes/sec
      – Online System to Offline Processor Farm (~20 TIPS): ~100 MBytes/sec
      – Offline Processor Farm to Tier 0 (CERN Computer Centre): ~100 MBytes/sec
      – Tier 0 to Tier 1: ~622 Mbits/sec, or Air Freight (deprecated)
      – Tier 1 to Tier 2: ~622 Mbits/sec
      – Tier 2 to Institutes: ~622 Mbits/sec
      – Institutes to Tier 4: ~1 MBytes/sec

     Tiers
      – Tier 0: CERN Computer Centre
      – Tier 1: France Regional Centre, Germany Regional Centre,
        Italy Regional Centre, FermiLab (~4 TIPS)
      – Tier 2: Caltech and other Tier2 Centres (~1 TIPS each)
      – Institutes (~0.25 TIPS each), with a physics data cache
      – Tier 4: Physicist workstations

     Physicists work on analysis “channels”. Each institute will have ~10
     physicists working on one or more channels; data for these channels
     should be cached by the institute server.

     www.griphyn.org     www.ppdg.net     www.eu-datagrid.org
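
The rates on this slide are linked by simple arithmetic, sketched below. The function names are illustrative, decimal units are assumed (1 PB = 10^9 MB, as on the storage slide), and the 10^7 seconds of data taking per year is an assumption made here for illustration, not a figure from the slides.

```python
BUNCH_CROSSING_NS = 25        # from the slide: one bunch crossing every 25 ns
TRIGGERS_PER_SEC = 100        # from the slide
EVENT_SIZE_MB = 1.0           # from the slide: ~1 MByte per triggered event
TIER_LINK_MBITS = 622         # from the slide: ~622 Mbits/sec between tiers
RUN_SECONDS_PER_YEAR = 1e7    # ASSUMPTION for illustration, not from the slides


def crossing_rate_hz(interval_ns):
    """Bunch crossings per second for a given crossing interval."""
    return 1e9 / interval_ns


def trigger_data_rate_mb_per_sec(triggers_per_sec, event_size_mb):
    """Data rate leaving the trigger, in MBytes/sec."""
    return triggers_per_sec * event_size_mb


def link_mb_per_sec(link_mbits_per_sec):
    """Convert a link speed from Mbits/sec to MBytes/sec (8 bits per byte)."""
    return link_mbits_per_sec / 8


def yearly_volume_pb(rate_mb_per_sec, seconds_per_year):
    """Yearly data volume in petabytes, with 1 PB = 1e9 MB."""
    return rate_mb_per_sec * seconds_per_year / 1e9


if __name__ == "__main__":
    crossings = crossing_rate_hz(BUNCH_CROSSING_NS)
    rate = trigger_data_rate_mb_per_sec(TRIGGERS_PER_SEC, EVENT_SIZE_MB)
    print(f"Bunch crossings per second: {crossings:.0f}")
    print(f"Crossings per kept event:   {crossings / TRIGGERS_PER_SEC:.0f}")
    print(f"Triggered data rate:        {rate:.0f} MBytes/sec")
    print(f"One tier link carries:      {link_mb_per_sec(TIER_LINK_MBITS):.2f} MBytes/sec")
    print(f"Yearly volume (assumed run time): {yearly_volume_pb(rate, RUN_SECONDS_PER_YEAR):.1f} PB")
```

The trigger output of 100 triggers/sec at ~1 MByte each matches the ~100 MBytes/sec shown leaving the Online System, and a single ~622 Mbits/sec link carries somewhat less than that full stream.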
           Intelligent Infrastructure:
        Distributed Servers and Services

                            The Grid:
                          A Brief History
        Early 90s
         – Gigabit testbeds, metacomputing
        Mid to late 90s
         – Early experiments (e.g., I-WAY), academic
           software projects (e.g., Globus, Legion),
           application experiments
        2002
         – Dozens of application communities & projects
         – Major infrastructure deployments
         – Significant technology base (esp. Globus Toolkit™)
         – Growing industrial interest
         – Global Grid Forum: ~500 people, 20+ countries


         The Grid World: Current Status
        Dozens of major Grid projects in scientific &
         technical computing/research & education
         – www.mcs.anl.gov/~foster/grid-projects
        Considerable consensus on key concepts
         and technologies
         – Open source Globus Toolkit™ a de facto
           standard for major protocols & services
        Industrial interest emerging rapidly
         – IBM, Platform, Microsoft, Sun, Compaq, …
        Opportunity: convergence of eScience and
         eBusiness requirements & technologies
            Grid Technologies:
   Resource Sharing Mechanisms That …

        Address security and policy concerns of
         resource owners and users
        Are flexible enough to deal with many
         resource types and sharing modalities
        Scale to large number of resources, many
         participants, many program components
        Operate efficiently when dealing with large
         amounts of data & computation





                Aspects of the Problem
     1) Need for interoperability when different
         groups want to share resources
         – Diverse components, policies, mechanisms
         – E.g., standard notions of identity, means of
           communication, resource descriptions
     2) Need for shared infrastructure services to
         avoid repeated development, installation
         – E.g., one port/service/protocol for remote
           access to computing, not one per tool/application
         – E.g., Certificate Authorities: expensive to run
        A common need for protocols & services


                  The Hourglass Model

    Focus on architecture issues
     – Propose set of core services as basic infrastructure
     – Use to construct high-level, domain-specific solutions
    Design principles
     – Keep participation cost low
     – Enable local control
     – Support for adaptation
     – “IP hourglass” model
    [Figure: hourglass, top to bottom: Applications; Diverse global
     services; Core services; Local OS]


        Layered Grid Architecture
   (By Analogy to Internet Architecture)

   Grid layer     Role                                                  Internet layer
   Application                                                          Application
   Collective     “Coordinating multiple resources”: ubiquitous         Application
                  infrastructure services, app-specific
                  distributed services
   Resource       “Sharing single resources”: negotiating access,       Application
                  controlling use
   Connectivity   “Talking to things”: communication (Internet          Transport, Internet
                  protocols) & security
   Fabric         “Controlling things locally”: access to, &            Link
                  control of, resources





                          Globus Toolkit™
        A software toolkit addressing key technical
         problems in the development of Grid-enabled
         tools, services, and applications
         – Offer a modular set of orthogonal services
         – Enable incremental development of grid-
           enabled tools and applications
         – Implement standard Grid protocols and APIs
         – Available under liberal open source license
         – Large community of developers & users
         – Commercial support




                     General Approach
        Define Grid protocols & APIs
         – Protocol-mediated access to remote resources
         – Integrate and extend existing standards
         – “On the Grid” = speak “Intergrid” protocols
        Develop a reference implementation
         – Open source Globus Toolkit
         – Client and server SDKs, services, tools, etc.
        Grid-enable wide variety of tools
         – Globus Toolkit, FTP, SSH, Condor, SRB, MPI, …
        Learn through deployment and applications



                          Key Protocols
        The Globus Toolkit™ centers around four
         key protocols
         – Connectivity layer:
             > Security: Grid Security Infrastructure (GSI)
         – Resource layer:
             > Resource Management: Grid Resource Allocation
               Management (GRAM)
             > Information Services: Grid Resource Information
               Protocol (GRIP) and Index Information Protocol (GIIP)
             > Data Transfer: Grid File Transfer Protocol (GridFTP)

        Also key collective layer protocols
         – Info Services, Replica Management, etc.
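
The layer-to-protocol assignment above can be written down as a small lookup table. This is only a sketch of the slide's content as a data structure; the dictionary layout and helper names are illustrative and are not part of the Globus Toolkit.

```python
# Layer -> function -> protocol acronyms, exactly as listed on the slide.
KEY_PROTOCOLS = {
    "Connectivity": {
        "Security": ["GSI"],
    },
    "Resource": {
        "Resource Management": ["GRAM"],
        "Information Services": ["GRIP", "GIIP"],
        "Data Transfer": ["GridFTP"],
    },
}


def protocols_for_layer(layer):
    """All protocol acronyms the slide lists for one architecture layer."""
    return [name for names in KEY_PROTOCOLS[layer].values() for name in names]


def layer_of(protocol):
    """Architecture layer the slide assigns a protocol to, or None if not listed."""
    for layer, functions in KEY_PROTOCOLS.items():
        if any(protocol in names for names in functions.values()):
            return layer
    return None


if __name__ == "__main__":
    for layer in KEY_PROTOCOLS:
        print(layer, "->", ", ".join(protocols_for_layer(layer)))
    print("GridFTP belongs to the", layer_of("GridFTP"), "layer")
```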