
                      The Grid

            Presented by: Prof Mark Baker



               ACET, University of Reading
                   Tel: +44 118 378 8615
             E-mail: Mark.Baker@computer.org
             Web: http://acet.rdg.ac.uk/~mab



                        Outline
•   Characterisation of the Grid.
•   Evolution of the Grid.
•   The Grid’s Architecture.
•   E-Science.
•   Utility Computing.
•   Lies, Damn Lies, and …
•   Aspects of Executing Applications.
•   Virtualisation.
•   Application tasks and workload.
•   Summary and conclusions.



      Characterisation of the Grid
• In 2001, Foster, Kesselman and Tuecke
  refined their original definition of a grid
  to…

  "co-ordinated resource sharing and problem
  solving in dynamic, multi-institutional virtual
  organizations"

• This definition is the one most commonly
  used today to abstractly define a grid.


       Characterisation of the Grid
• Foster later produced a checklist that
  could be used to help understand exactly
  what can be identified as a grid system; it
  has three parts:
1. Co-ordinated resource sharing with no centralised
   point of control, where users reside within
   different administrative domains:
   –   If this is not true, it is probably not a grid
       system!
2. Standard, open, general-purpose protocols and
   interfaces:
   –   If not, it is unlikely that system components will be able
       to communicate or inter-operate, and it is likely that we
       are dealing with an application-specific system, and not
       the Grid.

        Characterisation of the Grid
3. Delivering non-trivial qualities of service - here we
   are considering how the components that make up a
   grid can be used in a co-ordinated way to deliver
   combined services that are appreciably greater
   than the sum of the individual components:
   –   These services may be associated with throughput,
       response time, mean time between failures, security, or
       many other facets.




     Characterisation of the Grid
• From a commercial viewpoint, IBM defines
  a grid as:

 “a standards-based application/resource sharing
 architecture that makes it possible for
 heterogeneous systems and applications to share
 compute and storage resources transparently”




                    Evolution of the Grid
• The early to mid 1990s marked the emergence of the
  early metacomputing or grid environments.
• I-WAY from ANL was first demonstrated at SC95 in
  San Diego.
• Typically, the objective of these early metacomputing
  projects was to provide computational resources to a
  range of high-performance applications.
• Over time this changed to providing a virtual
  distributed environment for all manner of
  application types.
• There is now a vast array of grid software:
   – Middleware - Globus, UNICORE, gLite, OMII, Crown…
   – Tools - Cactus, GridSAM, Condor, SGE, SRB…
   – Standards efforts (OGF, OASIS) - OGSA, JSDL,
     DRMAA…
                    Virtual Organisations
• Resource sharing and coordinated problem solving
  in dynamic multi-institutional virtual organisations.




   • Security via PKI – X.509 Certs, and myProxy.

                  What is not a Grid!
• A cluster, a network attached storage device, a desktop PC, a
  scientific instrument, a network; these are not grids:
   – Each might be an important component of a grid, but by itself, it
     does not constitute a grid.
• Screen savers/cycle stealers:
   – SETI@home, Folding@home, etc.,
   – Other application-specific distributed computing.
• Most of the current “Grid” providers:
   – Proprietary technology with closed model of operation.
• Globus:
   – It is a toolkit to build a system that might work as or within a
     grid.
• Sun Grid Engine, Platform LSF and related.
• Almost anything referred to as a grid by marketeers, e.g.
  Oracle 10g!
                    The Grid’s Architecture
• The Grid moved to a Service-Oriented Architecture (SOA),
  Web Services-based infrastructure back in early 2003 -
  the Open Grid Services Architecture (OGSA):
   –   Huge effort to standardise everything Grid-related!
   –   Out popped OGSI, which was quickly dropped,
   –   Then WSRF (Jan 2004+),
   –   More recently WS-Resource Transfer (Oct 2006)!
• There is also a great debate about services and state!
• Funnily enough, Globus 2.4 is still very popular and a key
  part of many Grid projects.
• Far too many changes, standards and specifications - all
  very confusing and complicated…
• There have been mutterings in the Grid community for several
  years now, due to the fact that no one really knows which
  standards and specs to use:
   – And whether the one chosen will actually still be in use in the future?

                  And today!
                          Thoughts
• Should we be using standards IF they:
   – Are new and just emerging:
         • Develop on the “bleeding edge”!
   –   Are changing frequently, for example UDDI!
   –   Enhance interoperability, but potentially cripple performance,
   –   Are not widely adopted,
   –   Are not easy to understand and complicated to implement.
• What are the alternatives?
   – Web 2.0,
   – REST.




                       e-Science
• The field has moved toward Cyber-Infrastructure and e-
  Research; e-Science was one of the first
  drivers:
  – “e-Science is about global collaboration in key areas
    of science, and the next generation of
    infrastructure that will enable it.”
  – “e-Science will change the dynamics of the way
    science is undertaken.”
  – John Taylor, Director General of Research Councils, Office of
    Science and Technology




            The Drivers for e-Science
• More data:
   – Instrument resolution and laboratory automation,
   – Storage capacity and data sources.
• More computation:
   – Computations available and simulations are doubling every year.
• Faster networks:
   – Bandwidth,
   – Need to schedule.
• More inter-play and collaboration:
   – Between scientists, engineers, computer scientists etc.,
   – Between computation and data.




         The Drivers for e-Science
• Collaboration,
• Data Deluge,
• Digital Technology:
   – Ubiquity,
   – Cost reduction,
   – Performance increase.


In summary:

    Shared data, information and computation by
        geographically dispersed communities.



                  Utility Computing
• Most researchers in the Grid arena believe that computing
  services will, in the future, be provided in a similar
  fashion to telephone, electricity and other utilities
  today:
   – In this case there will be a market, with different companies
     competing and co-operating to serve customers.
• Companies will have to set prices to attract customers
  and make profits.
• For instance, both IBM and Sun are selling Grid
  access by the hour.
• Soon we will see brokers re-selling Grid services they
  have bought in bulk from providers.



                  Utility Computing
• For the vendors who may wish to provide Grid access
  as an efficient alternative to buying computers, it is
  important to have market models to be able to test
  various pricing schemes.
• This will be particularly relevant once futures and
  options for Grid access are sold.
• Customers of a Grid market are likely to be banks,
  insurance and engineering companies, games vendors,
  as well as universities and other research
  organisations.
• Significant efforts have been expended on understanding
  and modelling financial markets; much less is known
  about modelling commodity markets.

      Lies, damned lies, and statistics
• This well-known saying is part of a phrase
  attributed to Benjamin Disraeli and
  popularised in the U.S. by Mark Twain:
   – There are three kinds of lies: lies, damned lies, and
     statistics!
• The semi-ironic statement refers to the
  persuasive power of numbers, and succinctly
  describes how even accurate statistics can be
  used to bolster inaccurate arguments.




       Lies, Damn Lies, and Benchmarks
• The level of media attention is a reflection of how
  computer performance has become a growing concern
  for virtually everyone.
• Computers are becoming ubiquitous, and as such they
  are becoming a significant part of any company's budget
  -- and in today's competitive climate every significant
  budget item is being closely monitored.
• Buying too little computing power can seriously limit the
  ability to get the job done.
• However, buying too much can raise the cost of the job
  above where it is effective.
• Thus, there is great interest in determining just how
  much performance can be expected from any given
  computer system.
                           Alexander Carlton,
                           Hewlett-Packard, Cupertino, Calif., Dec 1994
       Aspects of Executing Applications
•   Want to successfully execute sequential and parallel jobs.
•   Maximise the utilisation of the machine(s).
•   Maximise the throughput of the machine(s).
•   Fairness of resource allocation (maybe).
•   Multiple queues and policies.
•   Large system (HPC platform) fragmentation.
•   Minimise response time.
•   Maximising throughput may mean large jobs never get scheduled.
•   Knowing the desired job run-time.
•   Scheduler queues and failures.
•   Workload, rigid versus flexible.
•   Shared versus single resource use.

                  Aspects of Job Scheduling
• Scheduling algorithms (examples; a sketch of one follows
  this list):
   –    Backfill,
   –    Destination Hashing,
   –    Dynamic Feedback Load Balancing,
   –    Least-Connection,
   –    Locality-Based Least-Connection,
   –    Locality-Based Least-Connection with Replication,
   –    Never Queue,
   –    Round-Robin,
   –    Shortest Expected Delay,
   –    Source Hashing Scheduling,
   –    Weighted Least-Connection,
   –    Weighted Round-Robin…
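
As a minimal, hedged sketch of one of the simpler policies above
- weighted round-robin - the following Python dispatches jobs to
servers in proportion to their weights (node names, weights and
job IDs are illustrative, not taken from any particular scheduler):

    from itertools import cycle

    def weighted_round_robin(servers):
        # servers: list of (name, weight) pairs; each name appears
        # 'weight' times per full cycle through the sequence.
        expanded = [name for name, weight in servers for _ in range(weight)]
        return cycle(expanded)

    # node-a should receive three jobs for every one node-b gets.
    dispatch = weighted_round_robin([("node-a", 3), ("node-b", 1)])
    for i in range(8):
        print(f"job-{i} -> {next(dispatch)}")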



                     Virtualisation
• Virtualisation has recently had a big impact!
• A problem with many of the current middleware stacks
  is that they mandate a certain OS and particular versions of
  software (e.g. Java, Tomcat, Axis…).
• Examples include Mono, VMWare, and Xen.
• Virtualisation provides the ability to create DOMs
  that follow the software needs of each middleware
  stack, thus providing the ability to work with multiple
  stacks (see the sketch after this list).
• Virtualisation does present some issues:
   – It is an emerging technology, so not always as robust as desired,
   – Inter-operation between Microsoft and UNIX is still a
     potential issue,
   – It imposes extra overheads and resource use.
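
A hedged sketch of the DOM-per-stack idea (all stack names and
image paths below are hypothetical, not from the talk): keep one
pre-built virtual machine image per middleware stack, each
bundling the OS and library versions that stack mandates, and
boot the matching image on demand:

    # Hypothetical stack-to-image map; each image bundles the OS
    # and library versions (Java, Tomcat, Axis...) its stack needs.
    STACK_IMAGES = {
        "globus-2.4": "images/rhel3-java1.4-globus2.4.img",
        "glite": "images/sl3-glite.img",
        "unicore": "images/suse9-unicore.img",
    }

    def image_for(stack):
        # Pick the pre-built domain (DOM) image for the requested stack.
        return STACK_IMAGES[stack]

    print(image_for("glite"))  # -> images/sl3-glite.img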
                  Application Tasks
• Applications typically need particular platforms,
  operating systems and libraries!
• Types:
   – Simple sequential,
   – Workflows – simple to complex,
   – Parameter sweeps – running the same task 100s/1000s of times
     with different input parameters (see the sketch below),
   – Parallel applications – tasks running concurrently with peer
     communications,
   – Applications that need particular resources – DB,
     visualisation, or other special kit,
   – Service-oriented – loosely to tightly coupled…
   – And so on…
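
For instance, a parameter sweep can be sketched as follows
(run_task and the parameter names are placeholders for the real
simulation code, not part of any particular middleware):

    from itertools import product

    def run_task(temperature, pressure):
        # Placeholder for the real simulation binary or a job
        # submission call to the middleware.
        return temperature * pressure

    # Run the same task over the full cross-product of inputs.
    results = {}
    for temperature, pressure in product(range(250, 351, 25), range(1, 6)):
        results[(temperature, pressure)] = run_task(temperature, pressure)

    print(len(results), "sweep points completed")  # 5 x 5 = 25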



                  Application Workload
• There’s been a lot of work in this area, especially to
  create ideal scheduling algorithms:
   – Normally many assumptions are made; typically the run time
     and resources needed are known “exactly”!
• In reality, the resources needed and the time taken by an
  application depend on the underlying resources (the
  hardware - CPU/memory/communications/disk - the OS, and
  libraries):
   – Without prior executions you will not know them.
• It is also typically assumed that a scientist knows how
  long their application takes to run when using a batch
  scheduling system (a toy illustration follows below)!
   – Example: when using the NGS and requesting 8 Tbytes of disk
     space, we also had to state how many CPU hours were needed!
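
To make that point concrete, a toy illustration (the trace values
are invented purely for illustration) of how far user-supplied
run-time estimates can sit from actual run times:

    # Invented trace: (job, requested hours, actual hours).
    trace = [
        ("job-1", 12.0, 3.2),
        ("job-2", 24.0, 23.5),
        ("job-3", 8.0, 0.9),
    ]
    for job, requested, actual in trace:
        print(f"{job}: requested {requested}h, ran {actual}h "
              f"(overestimated {requested / actual:.1f}x)")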

                    Some Questions
• Given that the Grid is about resource pooling, is it
  always true that participating is better than self-
  provisioning?
   – Do we gain from Grid participation?
• Should sharing policies that maximise total performance
  be preferred?
   – Egalitarian sharing versus prioritised?
• How crucial are sharing policies for the sustainability
  of Grid infrastructures?
   – Stability issues?
• How do we really provide SLAs and QoS in a Grid
  environment?
• How to enable pervasive participation:
   – Trust and security between consumers and providers!?

                    Economic Incentives
• Economic Challenges:
   – Incentives to share resources,
   – Allocations that guarantee consumers a high value.
• Why engineer a Grid market?
   – The idea of applying markets to distributed systems is old…
        • Ferguson et al. [1988]: market-based load balancing; Regev and
          Nisan [1998]: Popcorn, a market for CPU scheduling; Buyya et al.
          [2002]: the Grid economy.
   – None of the proposed mechanisms is applied in commercial
     applications.
• System Design Challenges:
   – What is traded?
         • Often just CPU, but there is also RAM, disk space, and communication.
   – What are the technical requirements?
   – How can they be realised in a market?

                  Economic Incentives
• Economic Design Challenges:
   – What is the objective of the allocation?
   – What are the economic requirements?




     Resource Sharing - Assumptions
• Examples – fair share, proportional share, and pay-as-
  bid (a sketch of proportional share follows below).
• All three mechanisms have a common drawback - they
  can only be used in scenarios where one resource
  provider serves several consumers, that is, where there is
  no competition among providers.
   – In cases where all resources are under fully centralised
     control, this condition is not a problem.
• However, one idea of a Grid is to cross administrative
  boundaries.
• Consequently, there is a need for market mechanisms
  that support multiple resource providers.
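
A minimal sketch of the proportional-share idea (assumed
semantics and illustrative names, not any specific system's code):
each consumer receives the fraction of the resource given by its
bid over the sum of all bids:

    def proportional_share(bids, capacity=1.0):
        # Each user's share is bid / sum(bids), scaled by capacity.
        total = sum(bids.values())
        return {user: capacity * bid / total for user, bid in bids.items()}

    print(proportional_share({"alice": 6.0, "bob": 3.0, "carol": 1.0}))
    # -> {'alice': 0.6, 'bob': 0.3, 'carol': 0.1}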



                    Thesis Contribution
• Problem - how statistical demand-forecasting methods can
  be integrated into a large-scale compute farm
  infrastructure to allow both resource consumers and
  resource providers to make economically and
  computationally efficient allocation decisions.
• Contribution - a set of methods to predict demand in
  computational markets based on the proportional-share
  resource allocation principle:
   – A model encompassing these methods: the Proportional Share
     Market Prediction and Access Control (PS-MP/AC) model.
   – PS-MP/AC includes:
       • Collecting and summarising resource prices,
       • Algorithms to estimate bounds on future demand with statistical
         claims (a sketch follows below),
       • A risk-probing interface for resource consumers,
       • An access control mechanism for resource providers.
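
As a hedged sketch of what a statistical bound on future demand
might look like in the simplest case (my illustrative reading of
the slide, not the thesis's actual algorithm): take an empirical
quantile of recent demand observations as the bound:

    def demand_upper_bound(history, confidence=0.95):
        # Empirical quantile: past demand stayed at or below this
        # value in roughly 'confidence' of the observations.
        ordered = sorted(history)
        index = min(int(confidence * len(ordered)), len(ordered) - 1)
        return ordered[index]

    recent_demand = [0.42, 0.55, 0.48, 0.61, 0.50, 0.58, 0.47, 0.53]
    print("95% empirical bound:", demand_upper_bound(recent_demand))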
                     Methodology
• Trace Analysis:
   – Job traces as well as load traces from a large-scale shared
     computational network were pre-processed to represent time
     series of global demand.
   – Traces were also used to evaluate predictor models, and to
     drive simulations and experiments.
• Mathematical Modelling:
   – Probabilistic models were designed based on the trace
     analyses and the distributional properties discovered.
• Prototype Implementation:
   – All models were implemented both in prototype
     simulations and in more robust implementations in full-scale
     systems.
   – Many of the implementations were also tested with real users
     and under real workloads.

                     Methodology
• Simulation Evaluation:
   – Simulations were designed to test initial implementations and
     to narrow down the problem and parameter space that was
     most interesting to test in a live system.
• Experimental Evaluation:
   – The second-best option: running simulated user loads with real
     applications in the real system, and measuring sporadic usage
     from real users.
   – All experiments were conducted in the HP Labs Tycoon
     cluster of 80 nodes in Palo Alto.




                     Questions
• What are the real incentives to use a market-place
  like this when local resources are getting
  progressively cheaper?
• Can the analysis of job and load traces from various
  computational resources really help predict the usual
  application workloads over a Grid environment?
• Have the statistical analysis and the model created
  produced a true picture of workload over a Grid
  environment?
• What about the question of workload prediction?
• What types of applications can be used in the Tycoon
  environment?


                       Questions
• Does the risk prober really help the consumer know
  that they are getting good value for money?
• How will the system produced be used across the
  multiple grid middleware stacks?
• What is the real overhead of using a system like this
  in a world-wide market?
   – How would one partition such a system?
• What about the Microsoft resources?




                            Conclusions
• The various Grid initiatives have helped make huge
  strides in creating globally shared distributed systems.
• One of the key ideas behind the Grid has been sharing
  common protocols and APIs; this has not always
  happened:
   – Following the Web Services route has produced many wins, but
     also made environments increasingly complicated:
        • Too many specifications and standards, which are not that stable
          and keep changing.
        • Example: WSRF::Lite and Apache WSRF cannot work together, because
          each uses a different encoding – “Literal” versus “Section 5”.
   – OGF and OASIS have taken far too long to push
     standards out.
   – People are looking at alternatives – REST is now
     increasingly being used.
                               Conclusions
• A question that is still relevant: starting now, what
  standards and specifications would I actually use to
  create a Grid environment?
   – Globus, UNICORE, gLite, CROWN, OMII,…
        • All work, but are not interoperable.

• Virtualisation is one way ahead: create DOMs with all
  the grid middleware stacks, and load them on demand.
• Commercial entities are not really using grid
  technologies:
   – Google has a REST-based storage system,
   – Amazon Elastic Compute Cloud is a web service that provides
     resizable compute capacity in the cloud.
• There has been greater interest recently (including at OGF)
  in cloud computing…