Centre for Advanced Computing and Emerging Technologies


									                      The Grid

            Presented by: Prof Mark Baker

               ACET, University of Reading
                   Tel: +44 118 378 8615

April 29, 2012
•   Characterisation of the Grid.
•   Evolution of the Grid.
•   The Grid’s Architecture.
•   E-Science.
•   Utility Computing.
•   Lies, Damn Lies, and …
•   Aspects of Executing Applications.
•   Virtualisation.
•   Application tasks and workload.
•   Summary and conclusions.

      Characterisation of the Grid
• In 2001, Foster, Kesselman and Tuecke
  refined their original definition of a grid:

  "co-ordinated resource sharing and problem
  solving in dynamic, multi-institutional virtual
  organisations".

• This definition is the one most commonly
  used today to abstractly define a grid.

       Characterisation of the Grid
• Foster later produced a checklist that
  could be used to help understand exactly
  what can be identified as a grid system, in
  three parts:
1. Co-ordinated resource sharing with no centralised
   point of control, where the users reside within
   different administrative domains:
   –   If this is not true, it is probably not a grid.
2. Standard, open, general-purpose protocols and
   interfaces:
   –   If not, it is unlikely that system components will be able
       to communicate or inter-operate, and it is likely that we
       are dealing with an application-specific system, and not
       the Grid.

        Characterisation of the Grid
3. Delivering non-trivial qualities of service - here we
   are considering how the components that make up a
   grid can be used in a co-ordinated way to deliver
   combined services, which are appreciably greater
   than the sum of the individual components:
   –   These services may be associated with throughput,
       response time, mean time between failures, security, or
       many other facets.

     Characterisation of the Grid
• From a commercial viewpoint, IBM defines
  a grid as:

 “a standards-based application/resource sharing
 architecture that makes it possible for
 heterogeneous systems and applications to share
 compute and storage resources transparently”

                    Evolution of the Grid
• The early to mid 1990s marked the emergence of the
  early metacomputing or grid environments.
• The I-WAY from ANL was first demonstrated at SC95 in
  San Diego.
• Typically, the objective of these early metacomputing
  projects was to provide computational resources to a
  range of high-performance applications.
• Over time this has changed to providing a virtual
  distributed environment for all manner of
  application types.
• There is now a vast array of grid software:
   – Middleware - Globus, UNICORE, gLite, OMII, Crown…
   – Tools - Cactus, GridSAM, Condor, SGE, SRB…
   – Standards efforts (OGF, OASIS) - OGSA, JSDL…
                    Virtual Organisations
• Resource sharing and coordinated problem solving
  in dynamic, multi-institutional virtual organisations.

   • Security via PKI – X.509 certificates, and MyProxy.

                  What is not a Grid!
• A cluster, a network attached storage device, a desktop PC, a
  scientific instrument, a network; these are not grids:
   – Each might be an important component of a grid, but by itself, it
     does not constitute a grid.
• Screen savers/cycle stealers:
   – SETI@home, Folding@home, etc.,
   – Other application-specific distributed computing.
• Most of the current “Grid” providers:
   – Proprietary technology with a closed model of operation.
• Globus:
   – It is a toolkit to build a system that might work as, or within, a
     grid.
• Sun Grid Engine, Platform LSF and related.
• Almost anything referred to as a grid by marketeers, e.g.
  Oracle 10g!
                    The Grid’s Architecture
• Moved to a Service Oriented Architecture (SOA), Web
  Services-based Grid infrastructure back in early 2003 - the
  Open Grid Services Architecture (OGSA):
   –   Huge effort to standardise everything Grid-related!
   –   Out popped OGSI, quickly dropped,
   –   Then WSRF (Jan 2004+),
   –   More recently WS-Resource Transfer (Oct 2006)!
• There is also a great debate about services and state!
• Funnily enough, Globus 2.4 is still very popular and a key
  part of many Grid projects.
• Far too many changes, standards and specifications, all
  very confusing and complicated…
• Mutterings in the Grid community for several years now,
  due to the fact that no one really knows which standards and
  specs to use:
   – And whether the one chosen will actually still be used in the future?

                  And today!

• Should we be using standards IF they:
   – Are new and just emerging:
         • Develop on the “bleeding edge”!
   –   Are changing frequently, for example UDDI!
   –   Enhance interoperability, but potentially cripple performance,
   –   Are not widely adopted,
   –   Are not easy to understand and complicated to implement.
• What are the alternatives?
   – Web 2.0,
   – REST.

• Moved toward Cyber-Infrastructure and e-
  Research; e-Science was one of the first:
  – “e-Science is about global collaboration in key areas
    of science, and the next generation of
    infrastructure that will enable it.”
  – “e-Science will change the dynamics of the way
    science is undertaken.”
  – John Taylor, Director General of Research Councils, Office of
    Science and Technology

            The Drivers for e-Science
• More data:
   – Instrument resolution and laboratory automation,
   – Storage capacity and data sources.
• More computation:
   – Computations available and simulations: doubling every year.
• Faster networks:
   – Bandwidth,
   – Need to schedule.
• More inter-play and collaboration:
   – Between scientists, engineers, computer scientists etc.,
   – Between computation and data.

         The Drivers for e-Science
• Collaboration,
• Data Deluge,
• Digital Technology:
   – Ubiquity,
   – Cost reduction,
   – Performance increase.

In summary:

    Shared data, information and computation by
        geographically dispersed communities.

                  Utility Computing
• Most researchers in the Grid arena believe that computing
  services will, in the future, be provided in a similar
  fashion to telephone, electricity and other utilities:
   – In this case there will be a market with different companies
     competing and co-operating to serve customers.
• Companies will have to set prices to attract customers
  and make profits.
• For instance, both IBM and Sun are selling Grid
  access by the hour.
• Soon we will see brokers re-selling Grid services they
  bought in bulk from providers.
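As a sketch of the kind of pricing question such a market raises, the break-even calculation below compares renting compute by the hour against buying a machine outright. The function name and all figures are illustrative assumptions, not actual IBM or Sun rates:

```python
# Hypothetical break-even sketch: renting Grid access by the hour
# versus buying a machine outright. All prices are illustrative.

def break_even_hours(purchase_cost, hourly_rate, hourly_running_cost=0.0):
    """Hours of use at which buying becomes cheaper than renting."""
    margin = hourly_rate - hourly_running_cost
    if margin <= 0:
        return float("inf")  # owning never pays off per hour
    return purchase_cost / margin

# e.g. a $6,000 server versus renting at $1.00/hour, with $0.25/hour
# power/admin cost when owned: buying wins beyond 8,000 hours of use.
print(break_even_hours(6000, 1.00, 0.25))  # → 8000.0
```

A broker re-selling bulk capacity faces the same arithmetic from the other side: its resale price must sit between its bulk rate and the customer's break-even point.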

                  Utility Computing
• For vendors who may wish to provide Grid access
  as an efficient alternative to buying computers, it is
  important to have market models to be able to test
  various pricing schemes.
• This will be particularly relevant once futures and
  options for Grid access are sold.
• Customers of a Grid market are likely to be banks,
  insurance and engineering companies, games vendors,
  as well as universities and other research institutions.
• Significant efforts have been expended on understanding
  and modelling financial markets; much less is known
  about modelling commodity markets.

      Lies, damned lies, and statistics
• This well-known saying is part of a phrase
  attributed to Benjamin Disraeli and
  popularised in the U.S. by Mark Twain:
   – There are three kinds of lies: lies, damned lies, and
     statistics.
• The semi-ironic statement refers to the
  persuasive power of numbers, and succinctly
  describes how even accurate statistics can be
  used to bolster inaccurate arguments.

       Lies, Damn Lies, and Benchmarks
• The level of media attention is a reflection of how
  computer performance has become a growing concern
  for virtually everyone.
• Computers are becoming ubiquitous, and as such they
  are becoming a significant part of any company's budget
  -- and in today's competitive climate every significant
  budget item is being closely monitored.
• Buying too little computing power can seriously limit the
  ability to get the job done.
• However, buying too much can raise the cost of the job
  above where it is effective.
• Thus, there is great interest in determining just how
  much performance can be expected from any given
  computer system.
                           Alexander Carlton, Hewlett-Packard, Cupertino, Calif., Dec 1994
       Aspects of Executing Applications
•   Want to successfully execute sequential and parallel jobs.
•   Maximise the utilisation of the machine(s).
•   Maximise the throughput of the machine(s).
•   Fairness of resource allocation (maybe).
•   Multiple queues and policies.
•   Large system (HPC platform) fragmentation.
•   Minimise response time.
•   Maximising throughput may starve large jobs.
•   Knowing the desired job run-time.
•   Scheduler queues and failures.
•   Workload, rigid versus flexible.
•   Shared versus single resource use.

                  Aspects of Job Scheduling
• Scheduling algorithms (examples):
   –    Backfill,
   –    Destination Hashing,
   –    Dynamic Feedback Load Balancing,
   –    Least-Connection,
   –    Locality-Based Least-Connection,
   –    Locality-Based Least-Connection with Replication,
   –    Never Queue,
   –    Round-Robin,
   –    Shortest Expected Delay,
   –    Source Hashing Scheduling,
   –    Weighted Least-Connection,
   –    Weighted Round-Robin…
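As a flavour of the simplest entries in the list above, the sketch below implements a naive weighted round-robin dispatcher. The node names and weights are made up, and production schedulers (e.g. LVS) use a smoother interleaving than this repeat-expansion trick:

```python
from itertools import cycle

def weighted_round_robin(servers):
    """Yield server names in proportion to their integer weights.

    `servers` is a list of (name, weight) pairs: a node with weight 2
    receives twice as many jobs per cycle as a node with weight 1.
    """
    expanded = [name for name, weight in servers for _ in range(weight)]
    return cycle(expanded)

# Dispatch six jobs across two hypothetical nodes.
rr = weighted_round_robin([("node-a", 2), ("node-b", 1)])
print([next(rr) for _ in range(6)])
# → ['node-a', 'node-a', 'node-b', 'node-a', 'node-a', 'node-b']
```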

                   Virtualisation
• Virtualisation has recently had a big impact!
• A problem with many of the current middleware stacks
  is that they mandate a certain OS and versions of
  software (e.g. Java, Tomcat, Axis…).
• For example - Mono, VMWare, and Xen.
• Virtualisation provides the ability to create DOMs
  that match the software needs of each middleware
  stack, thus providing the ability to work with multiple
  stacks.
• Virtualisation does present some issues:
   – An emerging technology, so not always as robust as desired,
   – Inter-operation between Microsoft and UNIX is still a
     potential issue,
   – It imposes extra overheads and resource use.
                  Application Tasks
• Applications typically need particular platforms,
  operating systems and libraries!
• Types:
   – Simple sequential,
   – Workflows – simple to complex,
   – Parameter sweeps – running the same task 100s/1000s of times
     with different input parameters,
   – Parallel applications – tasks running concurrently with peer
     communication,
   – Applications that need particular resources – DB,
     visualisation, or other special kit,
   – Service oriented – loosely to tightly coupled…
   – And so on…
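A parameter sweep, for instance, can be sketched as fanning the same task out over every combination of inputs. `run_simulation` and its parameters are hypothetical stand-ins for a real application binary:

```python
from itertools import product
from concurrent.futures import ThreadPoolExecutor

def run_simulation(temperature, pressure):
    # Hypothetical stand-in for a real simulation code.
    return temperature * pressure

def parameter_sweep(temperatures, pressures):
    """Run the same task over every combination of input parameters.

    On a grid, each pair would become an independent job submission;
    locally the tasks are simply fanned out to a worker pool.
    """
    params = list(product(temperatures, pressures))
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda p: run_simulation(*p), params))
    return dict(zip(params, results))

sweep = parameter_sweep([250, 300, 350], [1.0, 2.0])
print(len(sweep))  # → 6
```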

                  Application Workload
• There’s been a lot of work in this area, especially to
  create ideal scheduling algorithms:
   – Normally many assumptions are made; typically the run time
     and resources needed are known “exactly”!
• In reality, the resources needed and the time taken by an
  application depend on the underlying resources
  (hardware – CPU/memory/communications/disk – and the OS):
   – Without prior executions you will not know.
• It is also typically assumed that a scientist knows how
  long their application takes to run when using a
  batch scheduling system!
   – Example: using the NGS, a user wanting 8 TBytes of disk space
     also had to state how many CPU hours were needed!
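One pragmatic response to unknown run times is to derive the batch-queue request from prior executions. The sketch below is an illustrative heuristic (mean plus a safety margin), not a method from any particular scheduler:

```python
from statistics import mean, stdev

def requested_runtime(past_runtimes_hours, safety_sigmas=2.0):
    """Estimate a batch-queue run-time request from prior executions.

    With no history the run time must be guessed; with one run we
    simply double it; otherwise request the mean plus a safety margin
    so the scheduler does not kill the job just before it finishes.
    """
    if not past_runtimes_hours:
        raise ValueError("no prior executions - run time must be guessed")
    if len(past_runtimes_hours) == 1:
        return 2 * past_runtimes_hours[0]
    return mean(past_runtimes_hours) + safety_sigmas * stdev(past_runtimes_hours)

print(requested_runtime([4.0, 5.0, 4.5]))  # mean 4.5 + 2 * stdev 0.5 → 5.5
```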

                    Some Questions
• Given that the Grid is about resource pooling, is it always
  true that participating is better than self-sufficiency?
   – Do we gain from Grid participation?
• Should sharing policies that maximise total performance
  be preferred?
   – Egalitarian sharing versus prioritised?
• How crucial are sharing policies for the sustainability
  of Grid infrastructures?
   – Stability issues?
• How do we really provide SLAs and QoS in a Grid
  environment?
• How to enable pervasive participation:
   – Trust and security between consumers and providers!?

                    Economic Incentives
• Economic Challenges:
   – Incentives to share resources,
   – Allocations that guarantee consumers a high value.
• Why engineer a Grid market?
   – Idea of applying Markets to distributed systems is old …
        • Ferguson et al. [1988]: Market based load balancing, Regev, Nisan
          [1998]: Popcorn, Market for CPU scheduling., Buyya et al. [2002]:
          Grid economy.
   – None of the proposed mechanisms is applied in commercial
     systems.
• System Design Challenges:
   – What is traded?
        • Often just CPU, but there is also RAM, disk space, and communication.
   – What are the technical requirements?
   – How can they be realised in a market?

                  Economic Incentives
• Economic Design Challenges:
   – What is the objective of the allocation?
   – What are the economic requirements?

     Resource Sharing - Assumptions
• Examples – fair share, proportional share, and pay-as-
  you-go.
• All three mechanisms have a common drawback - they
  can only be used in scenarios where one resource
  provider serves several consumers, that is, there is no
  competition among providers.
   – In cases where all resources are under fully centralised
     control, this condition is not a problem.
• However, one idea of a Grid is to cross administrative
  domains.
• Consequently, there is a need for market mechanisms
  that support multiple resource providers.
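For reference, the proportional-share principle itself is a one-liner: each consumer's allocation is its bid divided by the sum of all bids. A minimal sketch, with illustrative user names and bids:

```python
def proportional_share(bids, capacity=1.0):
    """Split a divisible resource in proportion to consumers' bids.

    Each consumer receives capacity * bid / sum(bids) - the principle
    behind proportional-share schedulers. User names and bids below
    are illustrative.
    """
    total = sum(bids.values())
    if total == 0:
        return {user: 0.0 for user in bids}
    return {user: capacity * bid / total for user, bid in bids.items()}

print(proportional_share({"alice": 3.0, "bob": 1.0}))
# → {'alice': 0.75, 'bob': 0.25}
```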

                    Thesis Contribution
• Problem - how statistical demand forecasting methods can
  be integrated into a large-scale compute farm
  infrastructure to allow both resource consumers and
  resource providers to make economically and
  computationally efficient allocation decisions.
• Contribution - a set of methods to predict demand in
  computational markets based on the proportional share
  resource allocation principle:
   – A model encompassing these methods: the Proportional Share
     Market Prediction and Access Control (PS-MP/AC) model.
   – PS-MP/AC includes:
       • Collecting and summarising resource prices,
       • Algorithms to estimate bounds on future demand with statistical
         methods,
       • A risk probing interface for resource consumers,
       • An access control mechanism for resource providers.
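The bound-estimation step can be illustrated with a deliberately crude stand-in for the PS-MP/AC predictors: treat demand (proxied here by price) as a roughly stationary series and bound it by the mean plus a multiple of the standard deviation. The function name and data are illustrative assumptions:

```python
from statistics import mean, stdev

def demand_upper_bound(price_history, sigmas=1.65):
    """Bound near-future demand by mean + k * standard deviation.

    A deliberately simple proxy for statistical demand predictors:
    ~1.65 sigmas gives about 95% one-sided coverage under a
    normality assumption.
    """
    if len(price_history) < 2:
        return price_history[0] if price_history else 0.0
    return mean(price_history) + sigmas * stdev(price_history)

# A consumer's risk probe: is my budget above the likely price ceiling?
bound = demand_upper_bound([0.8, 1.0, 1.2, 0.9, 1.1])
print(bound)
```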
• Trace Analysis:
   – Job traces as well as load traces from a large-scale shared
     computational network were pre-processed to represent time
     series of global demand.
   – Traces were also used to evaluate predictor models, and to
     drive simulations and experiments.
• Mathematical Modeling:
   – Probabilistic models were designed based on the trace
     analyses and the distributional properties discovered.
• Prototype Implementation:
   – All models were implemented both in prototype
     simulations and as more robust implementations in full-scale
     systems.
   – Many of the implementations were also tested with real users
     and under real workloads.

• Simulation Evaluation:
   – Simulations were designed to test initial implementations and
     to narrow down the problem and parameter space that was
     most interesting to test in a live system.
• Experimental Evaluation:
   – The second-best option: running simulated user loads with real
     applications in the real system, and measuring sporadic usage
     from real users.
   – All experiments were conducted in the HP Labs Tycoon
     cluster of 80 nodes in Palo Alto.

• What are the real incentives to use a market-place
  like this when local resources are getting
  progressively cheaper?
• Can the analysis of job traces and loads on various
  computational resources really help predict the usual
  application workloads over a Grid environment?
• Have the statistical analysis and the model created
  produced a true picture of workload over a Grid
  environment?
• What about the question of workload prediction?
• What types of applications can be used in the Tycoon
  system?

• Does the risk prober really help the consumer know
  that they are getting good value for money?
• How will the system produced be used across the
  multiple grid middleware stacks?
• What is the real overhead of using a system like this
  in a world-wide market?
   – How would one partition such a system?
• What about the Microsoft resources?

• The various Grid initiatives have helped make huge
  strides in creating globally shared distributed systems.
• One of the key ideas behind the Grid has been sharing
  common protocols and APIs; this has not always
  happened:
   – Following the Web Services route has produced many wins, but
     also made environments increasingly complicated:
        • Too many specifications and standards, which are not that stable
          or mature,
        • Example: WSRF::Lite and Apache WSRF cannot work together, because each
          uses a different encoding – “Literal” versus “Section 5”.

   – OGF and OASIS have taken far too long to push
     standards out.
   – People are looking at alternatives – REST is now
     increasingly being used.
• A question that is still relevant: starting now, what
  standards and specifications would I actually use to
  create a Grid environment?
   – Globus, UNICORE, gLite, CROWN, OMII,…
        • All work, but are not interoperable.

• Virtualisation is one way ahead: create DOMs with all
  the grid middleware stacks, and load them on demand.
• Commercial entities are not really using grid
  technologies:
   – Google has a REST-based storage system,
   – Amazon Elastic Compute Cloud is a web service that provides
     resizable compute capacity in the cloud.
• There has been greater interest recently (including within
  the OGF) in cloud computing.