Scalability

					                                                                   ADSaT




Scalability & Availability

              Paul Greenfield
                  CSIRO



Advanced Distributed Software Architectures and Technology group    1


    Building Real Systems
• Scalable
  – Fast enough to handle expected load
  – Grow easily when load grows
• Available
  – Available enough of the time
• Performance and availability cost
  – Aim for ‘enough’ of each but not more



                       Scalable
• Scale-up
  – Bigger and faster systems
• Scale-out
  – Several systems working together to handle load
  – Server farms
  – Clusters
• Implications for application design



                      Available
• Goal is 100% availability
  – 24x7 operations
• Redundancy is the key
  – No single points of failure
  – Spare everything
     • Disks, disk channels, processors, power
       supplies, fans, memory, …
• Automated fail-over and recovery
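
The effect of "spare everything" can be estimated with a little arithmetic. The sketch below assumes component failures are independent (real systems only approximate this) and that any one spare can carry the full load; the function names are illustrative, not from the slides.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one can do the job.

    Assumes independent failures: the group is down only when every copy is down.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return 1.0 - (1.0 - component_availability) ** copies


def series_availability(*availabilities: float) -> float:
    """Availability of a chain in which every component is a single point of failure."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


if __name__ == "__main__":
    # One, two and three copies of a component that is up 99% of the time.
    for copies in (1, 2, 3):
        print(copies, round(parallel_availability(0.99, copies), 6))
```

Under these assumptions a second copy of a 99% component takes the pair to 99.99%, while two 99% components in series drop to about 98%, which is why single points of failure matter so much.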


                 Performance
• How fast is this system?
  – Not the same as scalability but related
     • Scalability is concerned with the limits to
       possible performance
  – Measured by response time and
    throughput
  – Aim for enough performance
     • Have a performance target
     • Tune and add hardware until target hit
     • Then worry about tomorrow…


   Performance Measures
• Response time
  – What delay does the user see?
  – Instantaneous is ideal, but 95% of responses
    under 2 seconds is acceptable
  – Response time varies with ‘heaviness’
    of transactions
     • Fast read-only transactions
     • Slower update transactions
     • Effects of database contention
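
A target such as "95% under 2 seconds" is a percentile check, not an average. The following is a minimal sketch using the nearest-rank percentile definition; the function names and the sample data in any usage are assumptions for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_target(samples, pct=95, limit_seconds=2.0):
    """True when pct% of the measured response times are at or under the limit."""
    return percentile(samples, pct) <= limit_seconds


if __name__ == "__main__":
    # Hypothetical measurements: mostly fast reads plus a few slow updates.
    measured = [0.1] * 95 + [5.0] * 5
    print(percentile(measured, 95), meets_target(measured))
```

Measuring light read-only and heavy update transactions separately is usually more informative than pooling them, since one class can hide the other.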



        Response Times






                      Throughput
• How many transactions can be handled in
  some period of time
  – Transactions/second or tpm, tph or tpd
  – A measure of overall capacity
• Transaction Processing Performance Council (TPC)
  –   Standard benchmarks for TP systems
  –   TPC-C for a typical transaction system
  –   www.tpc.org
  –   Current record is 227,000 tpmC


                   Throughput
• Throughput increases until some
  resource limit is hit
  – Adding more clients just increases the
    response time
  – Run out of processor, disk bandwidth,
    network bandwidth
  – Some resources overload badly
     • Ethernet network performance degrades
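
The "throughput rises until a resource limit is hit, then response time rises instead" behaviour can be sketched with the standard asymptotic bounds from operational analysis for a closed system. This is a rough model, not a measurement: the service demands and think time below are assumed numbers, and it does not capture resources that degrade under overload.

```python
def throughput_bound(clients, total_demand, bottleneck_demand, think_time=0.0):
    """Upper bound on throughput (requests/second) for `clients` concurrent users.

    total_demand      -- seconds of service one request needs across all resources
    bottleneck_demand -- seconds one request needs at the busiest resource
    think_time        -- seconds a user waits between requests
    """
    return min(clients / (total_demand + think_time), 1.0 / bottleneck_demand)


def response_time_bound(clients, total_demand, bottleneck_demand, think_time=0.0):
    """Lower bound on response time (seconds) for `clients` concurrent users."""
    return max(total_demand, clients * bottleneck_demand - think_time)


if __name__ == "__main__":
    # Assumed figures: 50 ms total work per request, 20 ms of it on the
    # bottleneck resource, and users who pause 1 second between requests.
    for clients in (10, 50, 100, 200):
        x = throughput_bound(clients, 0.05, 0.02, 1.0)
        r = response_time_bound(clients, 0.05, 0.02, 1.0)
        print(f"{clients:4d} clients: <= {x:5.1f} req/s, >= {r:4.2f} s")
```

With these assumed figures the bottleneck caps throughput at 50 requests per second; past roughly 52 clients, extra clients only add to response time.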



           System Capacity
• How many clients can you support?
  – Name an acceptable response time
  – Average 95% under 2 secs is common
     • And what is ‘average’?
  – Plot response time vs # of clients
• Great if you can run benchmarks
  – Reason for prototyping and proving
    proposed architectures before leaping
    into full-scale implementation
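
Once benchmark runs are available, reading capacity off the response time vs. clients plot is straightforward. A minimal sketch, assuming each run is recorded as a (clients, 95th-percentile seconds) pair; the figures in the usage example are invented for illustration.

```python
def max_supported_clients(measurements, limit_seconds=2.0):
    """Largest tested client count that meets the limit, stopping at the first failure.

    measurements -- iterable of (clients, p95_seconds) pairs from benchmark runs
    Returns None when even the smallest run misses the target.
    """
    supported = None
    for clients, p95_seconds in sorted(measurements):
        if p95_seconds > limit_seconds:
            break  # response time has passed the target; larger runs do not count
        supported = clients
    return supported


if __name__ == "__main__":
    # Hypothetical benchmark results, not real data.
    runs = [(50, 0.4), (100, 0.7), (200, 1.6), (300, 3.9)]
    print(max_supported_clients(runs))
```

The answer is only as fine-grained as the runs: here the real limit lies somewhere between 200 and 300 clients, so more runs in that range would be needed to pin it down.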


            Load Balancing I
• A few different but related meanings
• 1. Balancing across server processes
  – CORBA-style where clients use objects
    that live inside server processes
  – Want all server processes to be busy
  – Client calls have to go to the process
    containing their object, even if this
    process is busy and others are idle



            Load Balancing I
• Client calls on name server to find
  the location of a suitable server
• Name server can spread client
  objects across multiple servers
  – Often ‘round robin’
• Client is bound to server and stays
  bound forever
  – Can lead to performance problems
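
One way the performance problem can arise is sketched below. It assumes 500 clients initially bound in blocks of 100 to five server objects, after which clients 201 to 500 drop their references and are handed new ones round-robin while clients 1 to 200 stay bound. The scenario and function names are assumptions chosen for illustration.

```python
from collections import Counter

SERVERS = 5


def initial_binding(client: int) -> int:
    """Clients 1-500 bound in blocks of 100: clients 1-100 to server 1, and so on."""
    return (client - 1) // 100 + 1


def rebind_round_robin(clients, servers=SERVERS):
    """Hand out servers 1..n in turn to each client that asks for a new reference."""
    return {client: index % servers + 1 for index, client in enumerate(clients)}


def load_after_rebind():
    """Per-server client counts after only some of the clients have rebound."""
    binding = {client: initial_binding(client) for client in range(1, 501)}
    # Assumption: clients 201-500 ask the name server again; 1-200 never do.
    binding.update(rebind_round_robin(range(201, 501)))
    return Counter(binding.values())


if __name__ == "__main__":
    for server, count in sorted(load_after_rebind().items()):
        print(f"server {server}: {count} clients")
```

Round-robin is fair only to the clients that ask. Servers 1 and 2 keep their original 100 clients and pick up a share of the newcomers, ending with 160 each against 60 on the others.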



               Load Balancing I
Initial allocation

Server object    Client       Total clients per
reference        numbers      server object
1                1-100        100
2                101-200      100
3                201-300      100
4                301-400      100
5                401-500      100

Later allocation

Server object    Client                            Total clients per
reference        numbers                           server object
1                1-100, 201, 206, 211, …, 496      160
2                101-200, 202, 207, 212, …, 497    160
3                203, 208, 213, …, 498             60
4                204, 209, 214, …, 499             60
5                205, 210, 215, …, 500             60



            Load Balancing I
• Solution to static allocation
  problem is for clients to throw
  away their server objects and get
  new ones every now and again
• Application coding problem
  – And can objects be discarded?
  – What kind of ‘objects’ are they if they
    can be discarded?
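
The "throw away and get a new one" approach might look like the sketch below on the client side. It is a hypothetical wrapper, not a CORBA API: `resolve` stands in for a name server lookup, servers are modelled as plain callables, and the rebind interval is an arbitrary choice.

```python
import itertools


class RebindingClient:
    """Holds a server reference and replaces it after a fixed number of calls."""

    def __init__(self, resolve, calls_per_binding=100):
        self._resolve = resolve                  # callable returning a server reference
        self._calls_per_binding = calls_per_binding
        self._calls = 0
        self._server = resolve()

    def call(self, request):
        if self._calls >= self._calls_per_binding:
            self._server = self._resolve()       # discard the old reference
            self._calls = 0
        self._calls += 1
        return self._server(request)


if __name__ == "__main__":
    # Two stand-in servers handed out alternately by a stand-in name server.
    servers = itertools.cycle([lambda r: ("A", r), lambda r: ("B", r)])
    client = RebindingClient(lambda: next(servers), calls_per_binding=2)
    print([client.call(i)[0] for i in range(5)])
```

This only works when the server object holds nothing the client needs between calls, which is exactly the question raised above: an object that can be discarded at any time is closer to a stateless service than to an object with identity.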


               Name Servers
• Server processes call name server
  when they come up
  – Advertising their services
• Clients call name server to find the
  location of a server process
  – Up to the name server to match
    clients to servers
• Client calls server process to
  create objects
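
The advertise and lookup steps above can be sketched as a small in-memory registry. This is an illustration of the idea only, not the CORBA Naming Service interface; the class, method names and addresses are assumptions, and round-robin is just one possible matching policy.

```python
from collections import defaultdict


class NameServer:
    """Maps service names to the server processes that have advertised them."""

    def __init__(self):
        self._servers = defaultdict(list)   # service name -> server addresses
        self._next = defaultdict(int)       # service name -> round-robin position

    def advertise(self, service, address):
        """Called by a server process when it comes up."""
        if address not in self._servers[service]:
            self._servers[service].append(address)

    def withdraw(self, service, address):
        """Called when a server process shuts down cleanly."""
        if address in self._servers[service]:
            self._servers[service].remove(address)

    def resolve(self, service):
        """Called by a client; returns the next server for the service in turn."""
        servers = self._servers.get(service)
        if not servers:
            raise LookupError(f"no server advertises {service!r}")
        index = self._next[service] % len(servers)
        self._next[service] = index + 1
        return servers[index]


if __name__ == "__main__":
    names = NameServer()
    names.advertise("orders", "host1:9000")
    names.advertise("orders", "host2:9000")
    print([names.resolve("orders") for _ in range(3)])
```

The client would then contact the returned address directly to create its objects, so the name server sees one request per binding rather than one per method call.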


                 Load Balancing I
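Diagram labels:
  Server process -> Name Server: advertise service
  Client -> Name Server: request server reference
  Name Server -> Client: return server reference
  Client -> Server process: get server object reference
  Client -> Server process: call server object's methods
  (three clients, two server processes)
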
   Load balancing across processes within a server


           Load Balancing II
• What happens when our single
  system is full?
  – Use faster systems
     • Scale-up
  – Use additional systems
     • Scale-out
     • Now load-balancing is used to spread load
       across systems



           Load Balancing II
• CORBA world…
  – Name server can distribute across
    server processes running on different
    systems
  – Scales well…
     • Name server only involved when handing
       out a reference to a server, not on every
       method call




                 Load Balancing II
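Diagram labels:
  Server process -> Name Server: advertise service
  Client -> Name Server: request server reference
  Name Server -> Client: return server reference
  Client -> Server process: get server object reference
  Client -> Server process: call server object's methods
  (three clients, two server processes on separate systems)
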
    Load balancing across
      multiple systems


           Load Balancing II
• COM+ world…
  – No need for load-balancing within a
    system
     • Multithreaded server process
     • All objects live in a single process space
  – Component load balancing across systems
     • Client calls router when creating object
     • Router returns reference to an object in a
       COM+ server process
     • Load balanced at time of object creation

         Load Balancing II
Diagram labels:
  Clients -> DCOM/MTS -> MTS process
  MTS process contains: thread pool, shared object space,
  application code (App DLL)

 COM+/MTS using thread pools rather than
   load balancing within a single system
          COM+ Component
           Load Balancing
Diagram labels:
  Client -> Router: create object
  Router (with response time tracker) -> Server: pass request to server
  Server -> Client: create object and pass back reference
  Client -> Server: call object's methods
  (three clients)

COM+ CLB balancing load across multiple systems


           Load Balancing II
• COM+ scales well…
  – Router only involved when object is
    created
     • May change in later release to support
       dynamic re-balancing as server load changes
  – Method calls direct from client to server
  – Allocation based on response time rather
    than round-robin
     • Allocate to least-loaded server
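
Allocation by response time rather than round-robin can be sketched as follows. This is not the COM+ CLB implementation or API: the class is hypothetical, and the use of an exponentially weighted moving average to track response times is an assumption made for the example.

```python
class ResponseTimeRouter:
    """Picks the server with the lowest tracked response time at object creation."""

    def __init__(self, servers, smoothing=0.2):
        self._average = {server: 0.0 for server in servers}
        self._smoothing = smoothing   # weight given to the newest sample

    def record(self, server, seconds):
        """Fold a new response time sample into the server's moving average."""
        old = self._average[server]
        self._average[server] = old + self._smoothing * (seconds - old)

    def choose(self):
        """Return the server that currently looks least loaded."""
        return min(self._average, key=self._average.get)


if __name__ == "__main__":
    router = ResponseTimeRouter(["app1", "app2"])
    router.record("app1", 1.0)   # app1 is answering slowly
    router.record("app2", 0.1)
    print(router.choose())
```

The router is consulted only when an object is created; after that the client calls the chosen server directly, so a binding made while a server was lightly loaded stays in place even if that server later becomes busy.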



           Load Balancing II
• No name server in COM world?
  – COM/MTS clients ‘know’ the name of
    the server
     • Set at client installation time
     • Can change using GUI tools
     • Admin problem if server app is moved
  – COM+ uses Active Directory to find
    services



           Load Balancing II
• Some systems involve the router in
  every method call/request
  – Request goes to a router process, which
    then passes it on to a server process
  – Scales poorly as the router can be a
    major bottleneck
  – Some availability concerns as well
     • What happens if the router fails?



          Load Balancing II

Diagram labels:
  Clients -> Router -> Server processes
  (three clients, one router, two server processes)




   Load balancing with router in main call path



                      Scale-up
• No need for load-balancing across
  systems
• Just use a bigger box
  – Add processors, memory, ….
  – SMP (symmetric multiprocessing)
• Runs into limits eventually
• Could be less available



                      Scale-up
• Example from the Web
  – Large auction site
  – Server farm of NT boxes (scale-out)
  – Single database server (scale-up)
     • 64-processor SUN box
  – More capacity needed?
     • Add more NT boxes easily
     • SUN box is full so have to shift some
       databases to another box



                          Clusters
• A group of independent computers
  acting like a single system
  –   Shared disks
  –   Single IP address
  –   Single set of services
  –   Fail-over to other members of cluster
  –   Load sharing within the cluster
  –   DEC, IBM, MS, …


                               Clusters
Diagram labels:
  Client PCs connect to Server A and Server B
  Heartbeat and cluster management link between Server A and Server B
  Disk cabinet A and Disk cabinet B attached to the servers





                       Clusters
• Address scalability
  – Add more boxes to the cluster
• Address availability
  – Fail-over
  – Add & remove boxes from the cluster
    for upgrades and maintenance
• Can be used as one element of a
  highly-available system
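
Heartbeat-driven fail-over can be sketched as a monitor that treats a node as failed when its heartbeat goes quiet. This is a simplified illustration, not how any particular cluster product works: the class, the timeout value and the "first live node takes over" rule are all assumptions. Time is passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat from each cluster node and decides who runs a service."""

    def __init__(self, nodes, timeout_seconds=5.0):
        self._timeout = timeout_seconds
        self._last_seen = {node: 0.0 for node in nodes}

    def heartbeat(self, node, now):
        """Record that `node` was alive at time `now` (seconds)."""
        self._last_seen[node] = now

    def live_nodes(self, now):
        """Nodes whose last heartbeat is within the timeout."""
        return [node for node, seen in self._last_seen.items()
                if now - seen <= self._timeout]

    def owner(self, preferred, now):
        """The preferred node if it is live, otherwise the first live node, else None."""
        live = self.live_nodes(now)
        if preferred in live:
            return preferred
        return live[0] if live else None


if __name__ == "__main__":
    monitor = HeartbeatMonitor(["A", "B"])
    monitor.heartbeat("A", 0.0)
    monitor.heartbeat("B", 0.0)
    print(monitor.owner("A", 3.0))   # A still within the timeout
    monitor.heartbeat("B", 6.0)      # A has gone quiet
    print(monitor.owner("A", 8.0))   # service moves to B
```

The same mechanism covers planned maintenance: taking a node out of the cluster for an upgrade looks like a failure to the rest of the cluster, and its services move to the remaining members.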



         Web Server Farms
• Web servers are highly scalable
  – Web applications are normally stateless
     • Next request can go to any Web server
     • State comes from client or database
  – Just need to spread incoming requests
     • IP sprayers (hardware, software)
     • More than one Web server listening on the same IP
       address with some coordination (see MS WLB docs)
  – Same technique for other network apps
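
Why statelessness makes spreading requests easy can be shown with a toy example. In the sketch below a round-robin sprayer sends each request to the next Web server, and the handler keeps nothing in server memory between requests: the session id comes from the client and the cart lives in a shared store. The dict standing in for the database, the request fields and the class names are all assumptions.

```python
import itertools


def handle_request(request, database):
    """Stateless handler: everything it needs comes from the request or the database."""
    cart = database.setdefault(request["session_id"], [])
    if "add_item" in request:
        cart.append(request["add_item"])
    return list(cart)


class Sprayer:
    """Spreads incoming requests across Web servers in round-robin order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def dispatch(self, request, database):
        server = next(self._cycle)   # any server will do
        return server, handle_request(request, database)


if __name__ == "__main__":
    database = {}   # stand-in for the shared database
    sprayer = Sprayer(["web1", "web2"])
    print(sprayer.dispatch({"session_id": "s1", "add_item": "book"}, database))
    print(sprayer.dispatch({"session_id": "s1", "add_item": "pen"}, database))
```

Both requests belong to the same session yet land on different servers, and the second still sees the item added by the first. If the cart were held in a Web server's memory instead, the sprayer would have to send each client back to the same server every time.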



          Available System
Tiers, from front to back:
  Web clients
  Web servers: load balanced using Convoy
  App servers: use COM+ LB, with a COM+ LBS router node
  Database: installed on Wolfpack cluster for high availability






                    Availability
• How much?
  – 99%       87.6 hours of downtime a year
  – 99.9%     8.76 hours of downtime a year
  – 99.99%    0.876 hours (about 53 minutes) of downtime a year
• Need to consider operations as well
  – Maintenance, software upgrades,
    backups, application changes
  – Not just faults and recovery time
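
The downtime figures above follow directly from the number of hours in a year. A small sketch of the arithmetic, assuming a 365-day year; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a system is unavailable at the given availability level."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


if __name__ == "__main__":
    for level in (99.0, 99.9, 99.99):
        print(f"{level}%: {downtime_hours_per_year(level):.3f} hours a year")
```

The budget has to cover planned work as well as faults: at 99.9%, a single four-hour maintenance window uses up almost half of the year's allowance.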



  Availability and Scalability
• Often a question of application design
  – Stateful vs stateless
     • What happens if a server fails?
     • Can requests go to any server?
  – What language and database API
     • Balance cost vs speed: VB or C++, ODBC or ADO
  – Synchronous method calls or
    asynchronous messaging?
     • Reduce dependency between components
     • Failure tolerant designs
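
How asynchronous messaging reduces dependency between components can be sketched with a queue between a sender and a receiver. This uses Python's standard `queue` and `threading` modules as a stand-in for a message queuing product; the message contents and the processing step are invented for the example.

```python
import queue
import threading


def worker(inbox, results):
    """Receiver: processes messages whenever it is running, until told to stop."""
    while True:
        message = inbox.get()
        if message is None:          # sentinel: no more messages
            inbox.task_done()
            break
        results.append(message.upper())
        inbox.task_done()


def run_demo():
    inbox = queue.Queue()
    results = []
    # The sender queues its messages while the receiver is not yet running.
    # With a synchronous method call these requests would simply have failed.
    for order in ("order-1", "order-2"):
        inbox.put(order)
    consumer = threading.Thread(target=worker, args=(inbox, results))
    consumer.start()                 # the receiver comes up later and catches up
    inbox.put(None)
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

The sender never waits for the receiver, so a slow or restarting receiver delays the work rather than failing it. The price is that the sender gets no immediate result, which the application design has to allow for.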


                    Next Week
• Distributed application architectures
  – How to design systems that will work,
    scale and be available
  – Web-based systems
  – Web technology



