					              Grid Computing

• Outline
    – Introduction
    – Using the grid
    – Ongoing research

This presentation draws on material from the web, especially the work of Faisal N. Abu-Khzam &
   Michael A. Langston (University of Tennessee)

•   What is Grid Computing?
•   Who Needs It?
•   An Illustrative Example
•   Grid Users
•   Current Grids
         What is Grid Computing?

• Computational Grids
   – Homogeneous (e.g., Clusters)
   – Heterogeneous (e.g., with one-of-a-kind instruments)
• Cousins of Grid Computing
• Methods of Grid Computing
             Computational Grids

• A network of geographically distributed resources
  including computers, peripherals, switches, instruments,
  and data.
• Each user should have a single login account to access all
  resources.
• Resources may be owned by diverse organizations.
• Grids are typically managed by gridware.
• Gridware can be viewed as a special type of middleware
  that enables sharing and manages grid components based on
  user requirements and resource attributes (e.g., capacity,
  performance, availability).
         Cousins of Grid Computing

•   Parallel Computing
•   Distributed Computing
•   Peer-to-Peer Computing
•   Many others: Cluster Computing, Network Computing,
    Client/Server Computing, Internet Computing, etc.
          Distributed Computing

• People often ask: Is Grid Computing a fancy new name for
  the concept of distributed computing?
• In general, the answer is “no.” Distributed Computing is
  most often concerned with distributing the load of a
  program across two or more processes.
                 P2P Computing

• Sharing of computer resources and services by direct
  exchange between systems.
• Computers can act as clients or servers depending on what
  role is most efficient for the network.
       Methods of Grid Computing

•   Distributed Supercomputing
•   High-Throughput Computing
•   On-Demand Computing
•   Data-Intensive Computing
•   Collaborative Computing
•   Logistical Networking
      Distributed Supercomputing

• Combining multiple high-capacity resources on a
  computational grid into a single, virtual distributed
  supercomputer.
• Tackle problems that cannot be solved on a single system.
      High-Throughput Computing

• Uses the grid to schedule large numbers of loosely coupled
  or independent tasks, with the goal of putting unused
  processor cycles to work.
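As a rough illustration of the high-throughput pattern above, the sketch below farms a bag of independent tasks out to a pool of workers and collects the results in whatever order they finish. It is a minimal local stand-in under stated assumptions: Python threads play the role of idle grid machines (in CPython they do not speed up CPU-bound work; they only show the scheduling structure), and `screen_compound` is a hypothetical placeholder task echoing the compound-screening example in this presentation. A real system such as Condor would dispatch the tasks to remote processors instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def screen_compound(compound_id):
    """Hypothetical independent task: compute a placeholder score for one compound."""
    score = sum((compound_id * k) % 7 for k in range(1, 1000)) / 1000.0
    return compound_id, score


def run_bag_of_tasks(task_ids, max_workers=4):
    """Submit every task up front, then gather results as each worker finishes.

    The tasks share no state, so they can run in any order on any free
    worker; that independence is what makes the workload high-throughput.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(screen_compound, t) for t in task_ids]
        for future in as_completed(futures):
            compound_id, score = future.result()
            results[compound_id] = score
    return results


if __name__ == "__main__":
    scores = run_bag_of_tasks(range(100))
    print(f"screened {len(scores)} compounds")
```

Because results are keyed by task id, the order in which workers finish does not matter, which is the property that lets such jobs soak up whatever cycles happen to be free.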
           On-Demand Computing

• Uses grid capabilities to meet short-term requirements for
  resources that are not locally accessible.
• Models real-time computing demands.
        Data-Intensive Computing

• The focus is on synthesizing new information from data
  that is maintained in geographically distributed
  repositories, digital libraries, and databases.
• Particularly useful for distributed data mining.
         Collaborative Computing

• Concerned primarily with enabling and enhancing human-
  to-human interactions.
• Applications are often structured in terms of a virtual
  shared space.
              Logistical Networking

• Global scheduling and optimization of data movement.
• Contrasts with traditional networking, which does not explicitly model
  storage resources in the network.
• Called "logistical" because of the analogy it bears with the systems of
  warehouses, depots, and distribution channels.
      Who Needs Grid Computing?

• A chemist may utilize hundreds of processors to screen
  thousands of compounds per hour.
• Teams of engineers worldwide pool resources to analyze
  terabytes of structural data.
• Meteorologists seek to visualize and analyze petabytes of
  climate data with enormous computational demands.
         An Illustrative Example

• Tiffany Moisan, a NASA research scientist, collected
  microbiological samples in the tidewaters around Wallops
  Island, Virginia.
• She needed the high-performance microscope located at
  the National Center for Microscopy and Imaging Research
  (NCMIR), University of California, San Diego.
              Example (continued)

• She sent the samples to San Diego and used NPACI’s
  Telescience Grid and NASA’s Information Power Grid
  (IPG) to view and control the output of the microscope
  from her desk on Wallops Island. Thus, in addition to
  viewing the samples, she could move the platform holding
  them and make adjustments to the microscope.
• The microscope produced a huge dataset of images.
   – This dataset was stored using a storage resource broker on
     NASA’s IPG.
• Moisan was able to run algorithms on this very dataset
  while watching the results in real time.
                      Grid Users

•   Grid developers
•   Tool developers
•   Application developers
•   End Users
•   System Administrators
                Grid Developers

• Very small group.
• Implementers of a grid “protocol” who provide the basic
  services required to construct a grid.
                   Tool Developers

• Implement the programming models used by application
  developers.
• Implement basic services similar to conventional
  computing services:
   – User authentication/authorization
   – Process management
   – Data access and communication
• Also implement new (grid) services such as:
   –   Resource location
   –   Fault detection
   –   Security
   –   Electronic payment
          Application Developers

• Construct grid-enabled applications for end-users who
  should be able to use these applications without concern
  for the underlying grid.
• Provide programming models that are appropriate for grid
  environments and services that programmers can rely on
  when developing (higher-level) applications.
          System Administrators

• Balance local and global concerns.
• Manage grid components and infrastructure.
• Some tasks are still not well delineated due to the high degree
  of sharing required.
         Some Highly-Visible Grids

•   The NSF PACI/NCSA Alliance Grid.
•   The NASA Information Power Grid (IPG).
•   The Distributed Terascale Facility (DTF) Project.

• The DTF is currently being built by NSF’s Partnerships for Advanced
  Computational Infrastructure (PACI).
• A collaboration: NCSA, SDSC, Argonne, and Caltech will
  work in conjunction with IBM, Intel, Qwest
  Communications, Myricom, Sun Microsystems, and …
• DTF Expectations
   – A 40-billion-bits-per-second optical network (called TeraGrid) is
     to link computers, visualization systems, and data at four sites.
   – To perform 11.6 trillion calculations per second.
   – To store more than 450 trillion bytes of data.
               Using the Grid

•   Globus
•   Condor
•   Harness
•   Legion
•   IBP
•   NetSolve
•   Others

• Globus is a collaboration of Argonne National Laboratory’s
  Mathematics and Computer Science Division, the
  University of Southern California’s Information Sciences
  Institute, and the University of Chicago's Distributed
  Systems Laboratory.
• Started in 1996 and is gaining popularity year after year.
• A project to develop the underlying technologies needed
  for the construction of computational grids.
• Focuses on execution environments for integrating widely-
  distributed computational platforms, data resources,
  displays, special instruments and so forth.
               The Globus Toolkit

• The Globus Resource Allocation Manager (GRAM)
   – Creates, monitors, and manages services.
   – Maps requests to local schedulers and computers.
• The Grid Security Infrastructure (GSI)
   – Provides authentication services.
• The Monitoring and Discovery Service (MDS)
   – Provides information about system status, including server
     configurations, network status, and locations of replicated datasets.
• Nexus and globus_io
   – Provide communication services for heterogeneous environments.
• Global Access to Secondary Storage (GASS)
   – Provides data movement and access mechanisms that enable
     remote programs to manipulate local data.
• Heartbeat Monitor (HBM)
   – Used by both system administrators and ordinary users to detect
     failure of system components or processes.
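The heartbeat idea behind HBM can be illustrated with a simple timeout-based failure detector. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the "silent longer than the timeout means failed" rule is the simplest possible policy, and the actual HBM protocol and reporting interfaces are not reproduced. The clock is injectable so the behavior can be checked without waiting in real time.

```python
import time


class HeartbeatMonitor:
    """Timeout-based failure detector (hypothetical sketch, not the Globus HBM protocol).

    Each monitored component is expected to report periodically; any component
    silent for longer than `timeout` seconds is presumed to have failed.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock      # injectable so the logic can be tested deterministically
        self.last_seen = {}     # component name -> time of its last heartbeat

    def heartbeat(self, component):
        """Record that `component` reported in just now."""
        self.last_seen[component] = self.clock()

    def failed(self):
        """Return, sorted, the components whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(name for name, seen in self.last_seen.items()
                      if now - seen > self.timeout)


class FakeClock:
    """Manually advanced clock for demonstration purposes."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


if __name__ == "__main__":
    clock = FakeClock()
    hbm = HeartbeatMonitor(timeout=10.0, clock=clock)
    hbm.heartbeat("server-a")
    hbm.heartbeat("server-b")
    clock.now = 8.0
    hbm.heartbeat("server-a")   # server-b stays silent
    clock.now = 15.0
    print(hbm.failed())         # ['server-b']
```

A timeout detector can only ever *suspect* failure: a slow network looks the same as a dead process, so the timeout trades detection speed against false alarms.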

• The Condor project started in 1988 at the University of
  Wisconsin-Madison.
• The main goal is to develop tools to support High-
  Throughput Computing on large collections of
  distributively owned computing resources.
• Runs on a cluster of workstations to glean wasted CPU cycles.
• A “Condor pool” consists of any number of machines, of possibly
  different architectures and operating systems, that are connected by a
  network.
• Condor pools can share resources through a feature of Condor called
  flocking.
         The Condor Pool Software

• Job management services:
   –   Answers queries about the job queue.
   –   Puts a job on hold.
   –   Enables the submission of new jobs.
   –   Provides information about jobs that are already finished.
• A machine with job management installed is called a
  submit machine.
• Resource management:
   – Keeps track of available machines.
   – Performs resource allocation and scheduling.
• Machines with resource management installed are called
  execute machines.
• A machine can be both a “submit” and an “execute” machine.

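The division of labor between submit and execute machines can be sketched as a matchmaking step: queued jobs state their requirements, execute machines advertise their attributes, and resource management pairs them. The sketch below is a deliberately simplified greedy version under stated assumptions; the attribute names (`arch`, `opsys`, `memory_mb`) are hypothetical, and Condor's actual matchmaker evaluates ClassAd requirement and ranking expressions, which this code does not reproduce.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """An execute machine advertising its attributes (hypothetical fields)."""
    name: str
    arch: str
    opsys: str
    memory_mb: int
    idle: bool = True


@dataclass
class Job:
    """A queued job on a submit machine, with its minimum requirements."""
    job_id: int
    arch: str
    opsys: str
    min_memory_mb: int


def satisfies(machine, job):
    """True if the machine is idle and meets every requirement of the job."""
    return (machine.idle
            and machine.arch == job.arch
            and machine.opsys == job.opsys
            and machine.memory_mb >= job.min_memory_mb)


def matchmake(jobs, machines):
    """Greedily assign each job, in queue order, to the first suitable idle machine.

    Returns {job_id: machine_name}; jobs with no match are left out and stay queued.
    """
    assignments = {}
    for job in jobs:
        for machine in machines:
            if satisfies(machine, job):
                machine.idle = False    # the machine is now busy with this job
                assignments[job.job_id] = machine.name
                break
    return assignments


if __name__ == "__main__":
    pool = [Machine("ws1", "INTEL", "LINUX", 512),
            Machine("ws2", "SUN4u", "SOLARIS", 1024),
            Machine("ws3", "INTEL", "LINUX", 2048)]
    queue = [Job(1, "INTEL", "LINUX", 1024),
             Job(2, "SUN4u", "SOLARIS", 256),
             Job(3, "INTEL", "LINUX", 256),
             Job(4, "INTEL", "LINUX", 256)]
    print(matchmake(queue, pool))   # {1: 'ws3', 2: 'ws2', 3: 'ws1'}
```

Job 4 finds no idle machine and simply remains in the queue until one frees up, which mirrors how a pool absorbs more jobs than it has machines.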
• Condor-G: a version of Condor that uses Globus to submit jobs to
  remote resources.
• Allows users to monitor jobs submitted through the Globus
  Toolkit.
• Can be installed on a single machine, so there is no need to have
  a Condor pool installed.

• Legion: an object-based metasystems software project designed at
  the University of Virginia to support millions of hosts and
  trillions of objects linked together with high-speed links.
• Allows groups of users to construct shared virtual work
  spaces, to collaborate on research and exchange information.
• An open system designed to encourage third party
  development of new or updated applications, run-time
  library implementations, and core components.
• The key feature of Legion is its object-oriented approach.

• Harness: a Heterogeneous Adaptable Reconfigurable Networked
  SystemS.
• A collaboration between Oak Ridge National Lab, the
  University of Tennessee, and Emory University.
• Conceived as a natural successor of the PVM project.

• An experimental system based on a highly customizable,
  distributed virtual machine (DVM) that can run on
  anything from a Supercomputer to a PDA.
• Built on three key areas of research: Parallel Plug-in
  Interface, Distributed Peer-to-Peer Control, and Multiple
  DVM Collaboration.

• The Internet Backplane Protocol (IBP) is middleware for
  managing and using remote storage.
• It was devised at the University of Tennessee to support
  Logistical Networking in large scale, distributed systems
  and applications.

• Named because it was designed to enable applications to
  treat the Internet as if it were a processor backplane.
• On a processor backplane, the user has access to memory
  and peripherals, and can direct communication between
  them with DMA.

• IBP gives the user access to remote storage and
  standard Internet resources (e.g. content servers
  implemented with standard sockets) and can direct
  communication between them with the IBP API.

• By providing a uniform, application-independent interface
  to storage in the network, IBP makes it possible for
  applications of all kinds to use logistical networking to
  exploit data locality and more effectively manage buffer
  resources.

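The capability-based style of access to network storage can be sketched as an in-memory depot. This is a minimal sketch under stated assumptions: the `Depot` class and its method names are hypothetical, and only the idea of handing out separate read, write and manage capabilities for each allocation is borrowed from IBP; the real IBP client API, its network protocol and its allocation policies are not modeled.

```python
import secrets


class Depot:
    """In-memory storage depot, loosely modeled on IBP's capability idea.

    Hypothetical sketch: allocating space returns three unguessable
    capabilities (read, write, manage), and every later operation must
    present the capability that grants the right it needs. This is not
    the real IBP client API.
    """

    def __init__(self, capacity):
        self.capacity = capacity    # total bytes this depot is willing to lend
        self.used = 0
        self._allocs = {}           # allocation id -> (size, bytearray)
        self._caps = {}             # capability -> (allocation id, right)

    def allocate(self, size):
        """Reserve `size` bytes and return {'read': ..., 'write': ..., 'manage': ...}."""
        if self.used + size > self.capacity:
            raise MemoryError("depot has insufficient free space")
        alloc_id = secrets.token_hex(8)
        self._allocs[alloc_id] = (size, bytearray())
        self.used += size
        caps = {}
        for right in ("read", "write", "manage"):
            cap = secrets.token_hex(16)
            self._caps[cap] = (alloc_id, right)
            caps[right] = cap
        return caps

    def _resolve(self, cap, right):
        """Map a capability to its allocation, checking that it grants `right`."""
        alloc_id, granted = self._caps.get(cap, (None, None))
        if granted != right:
            raise PermissionError(f"capability does not grant {right} access")
        return alloc_id

    def store(self, write_cap, data):
        """Append bytes to the allocation, refusing to exceed its size."""
        size, buf = self._allocs[self._resolve(write_cap, "write")]
        if len(buf) + len(data) > size:
            raise ValueError("write would exceed the allocation size")
        buf.extend(data)

    def load(self, read_cap, offset, length):
        """Return up to `length` stored bytes starting at `offset`."""
        _, buf = self._allocs[self._resolve(read_cap, "read")]
        return bytes(buf[offset:offset + length])

    def free(self, manage_cap):
        """Release the allocation and invalidate all of its capabilities."""
        alloc_id = self._resolve(manage_cap, "manage")
        size, _ = self._allocs.pop(alloc_id)
        self.used -= size
        self._caps = {c: v for c, v in self._caps.items() if v[0] != alloc_id}


if __name__ == "__main__":
    depot = Depot(capacity=1024)
    caps = depot.allocate(64)
    depot.store(caps["write"], b"climate-chunk-0001")
    print(depot.load(caps["read"], 0, 7))   # b'climate'
    depot.free(caps["manage"])
    print(depot.used)                       # 0
```

Splitting rights into separate capabilities lets an application hand a reader only the read capability, so it can pass data locations around freely without also granting the right to overwrite or release the storage.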
• NetSolve: a client-server-agent system.
• Designed for solving complex scientific problems in a
  loosely-coupled heterogeneous environment.
           The NetSolve Agent

• A “resource broker” that represents the gateway to
  the NetSolve system
• Maintains an index of the available computational
  resources and their characteristics, in addition to
  usage statistics.
• Accepts requests for computational services from
  the client API and dispatches them to the best-suited
  server.
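The agent's brokering role can be illustrated with a small selection routine: filter the indexed servers down to those that offer the requested problem, estimate how long each would take, and dispatch to the cheapest. This is a sketch under stated assumptions; the `Server` fields, the problem names and especially the cost model (load-scaled compute time plus transfer time) are hypothetical stand-ins, not NetSolve's actual scheduling heuristics.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """What the agent knows about one server (hypothetical fields)."""
    name: str
    problems: set           # problem names the server can solve
    mflops: float           # assumed sustained speed, in millions of flops per second
    load: float             # current workload; 0.0 means idle
    bandwidth_mbps: float   # assumed network bandwidth to the client


def estimate_seconds(server, flops, data_mb):
    """Hypothetical cost model: load-scaled compute time plus data transfer time."""
    compute = flops / (server.mflops * 1e6) * (1.0 + server.load)
    transfer = data_mb * 8.0 / server.bandwidth_mbps
    return compute + transfer


def pick_server(servers, problem, flops, data_mb):
    """Return the server offering `problem` with the lowest estimated completion time."""
    candidates = [s for s in servers if problem in s.problems]
    if not candidates:
        raise LookupError(f"no registered server offers {problem!r}")
    return min(candidates, key=lambda s: estimate_seconds(s, flops, data_mb))


if __name__ == "__main__":
    index = [Server("cluster", {"linear_solve", "fft"}, 2000.0, 3.0, 100.0),
             Server("smp", {"linear_solve"}, 800.0, 0.0, 1000.0),
             Server("workstation", {"fft"}, 300.0, 0.0, 100.0)]
    best = pick_server(index, "linear_solve", flops=1e10, data_mb=80)
    print(best.name)   # smp
```

In this example the fastest machine loses: the heavily loaded cluster is estimated at about 26 s against about 13 s for the idle SMP with a faster link, which is why the index also tracks load and network characteristics rather than raw speed alone.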
• Runs on Linux and UNIX.
             The NetSolve Client

• Provides access to remote resources through simple and
  intuitive APIs.
• Runs on a user’s local system.
• Contacts the NetSolve system through the agent, which in
  turn returns the server that can best service the request.
• Runs on Linux, UNIX, and Windows.
            The NetSolve Server

• The computational backbone of the system.
• A daemon process that awaits client requests.
• Runs on different platforms: a single workstation, cluster
  of workstations, symmetric multiprocessors (SMPs), or
  massively parallel processors (MPPs).
• A key component of the server is the Problem Description
  File (PDF).
• With the PDF, routines local to a given server are made
  available to clients throughout the NetSolve system.
                The PDF Template

PROBLEM Program Name
LIB Supporting Library Information
INPUT specifications
OUTPUT specifications
        Network Weather Service

• Supports grid technologies.
• Uses sensor processes to monitor CPU loads and network
  performance.
• Uses statistical models on the collected data to generate a
  forecast of future behavior.
• NetSolve is currently integrating NWS into its agent.
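One way to turn such measurements into forecasts is to run several cheap predictors side by side and, at each step, trust whichever has been most accurate so far; adaptive selection of this kind is the general idea associated with NWS forecasting. The sketch below assumes a hypothetical set of three predictors (last value, running mean, sliding median) and cumulative squared error as the accuracy measure; NWS's actual predictor family and error metrics may differ.

```python
from statistics import mean, median


def last_value(history):
    return history[-1]


def running_mean(history):
    return mean(history)


def sliding_median(history, window=5):
    return median(history[-window:])


DEFAULT_FORECASTERS = {"last": last_value, "mean": running_mean, "median5": sliding_median}


class AdaptiveForecaster:
    """Run several simple predictors side by side and trust whichever has the
    lowest cumulative squared error on the measurements seen so far."""

    def __init__(self, forecasters=None):
        self.forecasters = dict(forecasters or DEFAULT_FORECASTERS)
        self.history = []
        self.sq_error = {name: 0.0 for name in self.forecasters}

    def observe(self, value):
        """Score every predictor against the new measurement, then record it."""
        if self.history:
            for name, predict in self.forecasters.items():
                self.sq_error[name] += (predict(self.history) - value) ** 2
        self.history.append(value)

    def forecast(self):
        """Return (predictor name, predicted next value) from the current best predictor."""
        if not self.history:
            raise ValueError("no measurements observed yet")
        best = min(self.sq_error, key=self.sq_error.get)
        return best, self.forecasters[best](self.history)


if __name__ == "__main__":
    cpu_load = AdaptiveForecaster()
    for sample in [1, 2, 3, 4, 5, 6]:   # a steadily rising load
        cpu_load.observe(sample)
    print(cpu_load.forecast())          # ('last', 6)
```

No single predictor suits every resource: "last value" tracks a steady trend well, while a mean smooths out a rapidly oscillating load, so letting the measured error pick the predictor adapts to whichever behavior the sensor actually sees.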
             Gridware Collaborations

• NetSolve is using Globus' "Heartbeat Monitor" to detect failed servers.
• A NetSolve client is now in testing that allows access to Globus.
• Legion has adopted NetSolve’s client-user interface to leverage its
  metacomputing resources.
• The NetSolve client uses Legion’s data-flow graphs to keep track of
  data dependencies.
• NetSolve can access Condor pools among its
  computational resources.
• IBP-enabled clients and servers allow NetSolve to allocate
  and schedule storage resources as part of its resource
  brokering. This improves fault tolerance.
                Ongoing Research

• Motivation
• Special Projects
   – Ongoing work at Tennessee
• General Issues
   – Open questions of interest to the entire research community

• Computer speed doubles every 18 months.
• Network speed doubles every 9 months.

[Graph from Scientific American (Jan. 2001) by Cleo Vilett; source: Vinod Khosla, Kleiner, Caufield and Perkins]
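To see what these doubling times imply, assume (purely as an illustration) that both trends hold steadily for five years, i.e. 60 months. Growth after t months with doubling time T is 2^(t/T):

```latex
\[
\text{computers: } 2^{60/18} = 2^{10/3} \approx 10,
\qquad
\text{networks: } 2^{60/9} = 2^{20/3} \approx 102,
\qquad
\frac{2^{60/9}}{2^{60/18}} = 2^{10/3} \approx 10.
\]
```

Under these assumptions, network capacity would outgrow computing speed by roughly a factor of ten over the period, which is one argument for spreading computation across the network.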
                 Special Projects

• The SInRG Project.
   – Grid Service Clusters (GSCs)
   – Data Switches
• Incorporating Hardware Acceleration.
• Unbridled Parallelism
   – SETI@home and Folding@home
   – The Vertex Cover Solver
• Security.
The SInRG Project
        The Grid Service Cluster

• The basic grid building block.
• Each GSC will use the same software infrastructure as is
  now being deployed on the national Grid, but tuned to take
  advantage of the highly structured and controlled design of
  the cluster.
• Some GSCs are general-purpose and some are special-purpose.
       An advanced data switch

• The components that make up a GSC must be able
  to access each other at very high speeds and with
  guaranteed Quality of Service (QoS).
• Links of at least 1 Gbps assure QoS in many
  circumstances simply by overprovisioning.
     Computational Ecology GSC

•   Collaboration between computer science and
    mathematical ecology.
•   8-processor Symmetric Multi-Processor (SMP).
•   Initial in-core memory (RAM) is approximately 4
    gigabytes.
•   Out-of-core data storage unit provides a minimum of
    450 gigabytes.
          Medical Imaging GSC

•   Collaboration between computer science and the
    medical school.
•   High-end graphics workstations.
•   Distinguished by the need to have these workstations
    attached as directly as possible to the switch to facilitate
    interactive manipulation of the reconstructed images.
           Molecular Design GSC

• Collaboration between computer science and chemical
• Data visualization laboratory
• 32 dual processors
• High performance switch
          Machine Design GSC

• Collaboration between computer science and
  electrical engineering.
• 12 Unix-based CAD workstations.
• 8 Linux boxes with Pilchard boards.
• Investigating the potential of reconfigurable
  computing in grid environments.
            Types of Hardware

• General-purpose hardware – can implement any function
• ASICs – hardware that can implement only a specific function
• FPGAs – reconfigurable hardware that can implement any function
                     The FPGA

• FPGAs offer reprogrammability
• Allows an optimal logic design to be implemented for each
  function
• Hardware implementations offer acceleration over
  software implementations that run on general-purpose
  processors
       The Pilchard Environment

• Developed at the Chinese University of Hong Kong.
• Plugs into a 133 MHz RAM DIMM slot and is an example of
  “programmable active memory.”
• Pilchard is accessed through memory read/write
  operations.
• Higher bandwidth and lower latency than other
  interfaces.

• Evaluate utility of NetSolve gridware.
• Determine the effectiveness of hardware acceleration in this
  environment.
• Provide an interface for the remote use of FPGAs.
• Allow users to experiment and gauge whether a given
  problem would benefit from hardware acceleration.
         Sample Implementations

•   Fast Fourier Transform (FFT)
•   Data Encryption Standard algorithm (DES)
•   Image backprojection algorithm
•   A variety of combinatorial algorithms
       Implementation Techniques

•   Two types of functions are implemented
•   Software version - runs on the PC’s processor
•   Hardware version - runs in the FPGA
•   To implement the hardware version of the function, VHDL
    code is needed
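Using the FFT from the sample implementations, the sketch below shows one way a server could keep both versions of a function behind a single entry point and fall back to the software version when the hardware is unavailable. It is a minimal sketch under stated assumptions: `HybridFunction` and the hardware callable are hypothetical, the FPGA is simulated by a function that raises an error, and no real Pilchard or NetSolve interface is used.

```python
import cmath


def fft_software(x):
    """Software version: recursive radix-2 Cooley-Tukey FFT on the host CPU."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("input length must be a power of two")
    even = fft_software(x[0::2])
    odd = fft_software(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


class HybridFunction:
    """Expose a software version and an optional hardware version under one name.

    `hardware` is a hypothetical callable standing in for an FPGA-backed
    implementation; it is assumed to raise RuntimeError when the board is
    unavailable (for example, when no configuration file has been loaded).
    """

    def __init__(self, name, software, hardware=None):
        self.name = name
        self.software = software
        self.hardware = hardware

    def __call__(self, *args, prefer_hardware=True):
        """Run the function, returning (result, which version ran)."""
        if prefer_hardware and self.hardware is not None:
            try:
                return self.hardware(*args), "hardware"
            except RuntimeError:
                pass    # fall back to the software version
        return self.software(*args), "software"


if __name__ == "__main__":
    def fpga_fft_unavailable(x):
        raise RuntimeError("FPGA not configured")   # simulated hardware failure

    fft = HybridFunction("fft", fft_software, hardware=fpga_fft_unavailable)
    result, version = fft([1, 2, 3, 4])
    print(version, [complex(round(v.real, 6), round(v.imag, 6)) for v in result])
```

Returning which version ran makes it easy to time both paths on the same input, which is the kind of comparison users need when gauging whether a problem benefits from hardware acceleration.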
         The Hardware Function

• Implemented in VHDL or some other hardware description
  language.
• The VHDL code is then mapped onto the FPGA
• CAD tools help make mapping decisions based on
  constraints such as: chip area, I/O pin counts, routing
  resources and topologies, partitioning, resource usage
         The Hardware Function

• Result of synthesis is a configuration file (bit stream).
• This file defines how the FPGA is to be reprogrammed in
  order to implement the new desired functionality.
• To run, a copy of the configuration file must be loaded on
  the FPGA.
             Behind the Scenes

[Diagram: VHDL programmer → VHDL code → synthesis → configuration file; software programmer → software and hardware functions → server administrator; client → NetSolve]

• Hardware acceleration is offered to both local and remote
  users.
• Resources are available through an efficient and easy-to-
  use interface.
• A development environment is provided for devising and
  testing a wide variety of software, hardware and hybrid
  functions.
• Unbridled parallelism
   – Sometimes the overhead of gridware is unneeded
   – Well known examples include SETI@home and Folding@home
