									National Sun Yat-Sen University
System Support for Scalable, Reliable and
  Highly Manageable Internet Services




                Chu-Sing Yang


  Department of Computer Science and Engineering
       National Sun Yat-Sen University
                   Outline
   Introduction
   Proposed System
    – Request Routing Mechanism
    – Management System
   Content-aware Intelligence
   Work in Progress
   Conclusion

                  Background
   The Internet has become the most important client-server
    application platform
   More and more Internet services are emerging
   The exponential growth of Internet users continues at an
    ever faster pace
   The Internet is becoming a mission-critical business
    delivery infrastructure
   The desire to use the Web to conduct business
    transactions and deliver services is growing rapidly
   Result: huge demand for high-performance, scalable,
    highly reliable Internet servers

                 Meeting the challenge
     A new Moore's law?
      – Server capacity must double every 18 months to keep up
        with the explosive growth
   [Figure: more content, more servers, faster feeds; more users,
    more demand, faster modems; providers increase the network
    capacity]

               Web has new requirements
   Before
    – An economical platform for information sharing and
      publishing (non-critical information)
    – 90 percent of information represented by text and images
    – Infrequent maintenance and updates
    – Security is not important
    – No guarantee on service availability
    – Highly available performance
   Now
    – An important platform for critical services
    – More sophisticated content, e.g., a larger percentage of
      dynamic content or streaming data
    – Content changes frequently
    – Security has become a great concern
    – Companies are evaluated even on the basis of their websites
    – Explosive growth of the user population

                  Meeting the challenge
   The first-generation Web infrastructure was never
    designed to handle the unique traffic patterns of the
    Web, which today accounts for 80% of Internet usage.
   Most current medium and large Web service providers
    are suffering from server overload
    – Yahoo, Google, AltaVista, CNN, Microsoft, ...
    – A single monolithic server system can hardly cope with these
      challenges.

   We need a scalable Internet service architecture that
    meets the service expectations of all Web services!

Essential Requirements of Internet Servers
     High performance
     Scalability
     High reliability
     Robustness
     QoS

         Scalable Server Architecture
   Feasible solution: server cluster (server farm)
   A collection of independent computer systems working
    together as a single system
   Advantages
    – Scalable: grows on demand
    – Highly available: redundancy
    – Cost-effective
   Most current medium and large Web service providers
    adopt this architecture, and the trend is accelerating!

                   Design Issues
   [Figure: clients send HTTP requests across the Internet and
    through a router to a server farm; the design issues are
    request routing, communication and system management]
 We need an integrated system to support a successful
  Internet service on such a distributed server system

                      Our Solutions
         --Content-aware Web Cluster System
   Content-aware distributor (server load balancer)
    –   Content-aware routing (layer-7 routing)
    –   Sophisticated load balancing
    –   Service differentiation
    –   QoS
    –   Fault resilience
    –   Transaction support
   Distributed server management system
    –   System management
    –   Intelligent content placement and management
    –   Supporting differentiated services
    –   Supporting QoS
    –   Fault management

                   Outline
   Introduction
   Proposed System
    – Request Routing Mechanism
    – Management System
   Content-aware Intelligence
   Work in Progress
   Conclusion

                    Motivation
   Key challenge: how to dispatch and route incoming
    requests to the server best suited to respond

             Desirable Properties
   User transparency
   Backward compatibility
   Fast access
   Scalability
   Robustness
   Availability
   Reliability
   QoS support

                  Client-side Approach
   Customized browser (e.g., Netscape)
   Java applet
    – HAWA (from AT&T)
    – Smart Client (from U.C. Berkeley)
   Advantages
    – Low overhead
    – Global (wide-area) solution
   Problems
    – Increased network traffic
           applet transmission
           extra queries between the applet and the servers for state information
    – Insensitive to the servers' state

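       DNS-based Approach
   [Figure: (1) a client asks its DNS "Who is www.nsysu.edu.tw?";
    (2)-(3) the query is resolved by the server farm's customized DNS,
    which consults its database; (4) the DNS returns the IP address of
    www.nsysu.edu.tw; (5) the client sends its request across the
    Internet and through the routers to the server farm]
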
                 DNS-based Approach
   Advantages
    – Ease of implementation
    – Low overhead
    – Global (wide-area) solution
   Problems
    – The host-name-to-IP-address mapping can be cached by
      intermediate DNS servers
           this can lead to significant load imbalance
           changes in DNS information propagate slowly through the Internet;
            if a back-end server fails or is removed, the Internet as a whole
            may not be aware of it
    – It is difficult to obtain failure and load information about
      back-end nodes

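To make the caching problem concrete, here is a toy simulation (not from the original work). It assumes a customized DNS that hands out server addresses round-robin, while each client domain's local DNS server caches the answer for a TTL, so every client behind a busy resolver ends up on the same server. The domain names, client counts, TTL and server names are all made-up assumptions.

```python
import itertools
from collections import Counter

SERVERS = ["SVR.1", "SVR.2", "SVR.3"]  # hypothetical back-end servers


def simulate(domain_clients, ttl, duration):
    """Count requests per server when local DNS servers cache round-robin answers.

    domain_clients: number of clients per domain; each client sends one request per tick.
    ttl: number of ticks a local DNS server keeps a cached mapping.
    """
    rr = itertools.cycle(SERVERS)  # the customized DNS rotates its answers
    cache = {}                     # domain -> (server, expiry tick)
    load = Counter()
    for t in range(duration):
        for domain, clients in domain_clients.items():
            server, expiry = cache.get(domain, (None, -1))
            if t >= expiry:        # cache expired: ask the authoritative DNS again
                server = next(rr)
                cache[domain] = (server, t + ttl)
            load[server] += clients  # all clients of the domain use the cached answer
    return load


if __name__ == "__main__":
    # One large ISP resolver and two small ones (made-up numbers).
    print(simulate({"big-isp": 1000, "campus": 10, "home": 1}, ttl=300, duration=300))
```

Even though the DNS itself balances perfectly (each server is handed out once), the server behind the large resolver receives two orders of magnitude more requests than the others.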
                 HTTP Redirection
   HTTP defines special response codes, called redirection
    codes, that can be used to direct a request elsewhere.
   With HTTP redirection, a server instructs the client to
    send its request to another location instead of
    returning the requested data.
   Problems
    – A request may need two or more connections to obtain
      the desired service, which increases response time and
      network traffic.
    – The node performing the redirection may become a
      bottleneck that limits the scalability of the server.

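A minimal sketch of what the redirecting node sends back, assuming a 302 status and a hypothetical back-end host name chosen by some load-balancing policy (the slide does not say which 3xx code or policy is used):

```python
def build_redirect(request_path: str, target_host: str) -> bytes:
    """Build a minimal HTTP/1.1 302 response pointing the client at target_host.

    target_host is a hypothetical back-end name picked by the redirector's policy.
    """
    location = f"http://{target_host}{request_path}"
    headers = [
        "HTTP/1.1 302 Found",
        f"Location: {location}",
        "Content-Length: 0",
        "Connection: close",
    ]
    # Header lines end with CRLF; an empty line terminates the header block.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    print(build_redirect("/job/", "svr2.example.com").decode("ascii"))
```

The client must then open a second connection to the `Location` URL, which is exactly the extra round trip and traffic listed under Problems.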
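                    Single IP address
   [Figure: a client requests http://pds.cse.nsysu.edu.tw/job/;
    the request crosses the Internet and a router to a Web switch
    that holds the virtual IP address of the server farm and forwards
    it over the subnet to one of the servers SVR.1, SVR.2, ..., SVR.n]
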
              Layer-4 based approach
   Routes requests based on source and destination IP
    addresses, port numbers and TCP flags (SYN/FIN)
   Packets belonging to the same connection must be
    routed to the same server
   Fine-grained control over request routing
   Scalability limited by the Internet access bandwidth
   Based on simple algorithms:
    – Round-robin, weighted round-robin, least connections, ...
   Examples:
    –   Cisco LocalDirector
    –   IBM Network Dispatcher
    –   Foundry ServerIron
    –   F5 Networks
    –   HydraWeb

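The layer-4 bookkeeping can be sketched as a connection table keyed by the address/port 4-tuple: a SYN picks a server, later packets of the same connection follow the table entry, and a FIN releases it. This is a simplified model, assuming packets are plain tuples rather than raw IP/TCP headers and using least connections for new flows; a naive weighted round-robin generator is included for comparison.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class L4Switch:
    """Toy layer-4 dispatcher: a connection table plus a least-connection policy."""
    servers: list
    table: dict = field(default_factory=dict)   # (src_ip, src_port, dst_ip, dst_port) -> server
    active: dict = field(default_factory=dict)  # server -> number of open connections

    def __post_init__(self):
        self.active = {s: 0 for s in self.servers}

    def route(self, src_ip, src_port, dst_ip, dst_port, flag=""):
        key = (src_ip, src_port, dst_ip, dst_port)
        if flag == "SYN" and key not in self.table:
            # New connection: pick the server with the fewest open connections
            # (ties go to the first server in the list).
            server = min(self.servers, key=lambda s: self.active[s])
            self.table[key] = server
            self.active[server] += 1
        server = self.table.get(key)  # packets of one connection stick to one server
        if server is not None and flag == "FIN":
            del self.table[key]
            self.active[server] -= 1
        return server                 # None for packets of unknown connections


def weighted_round_robin(weights):
    """Simplest weighted round-robin: {"A": 2, "B": 1} yields A, A, B, A, A, B, ..."""
    return itertools.cycle([s for s, w in weights.items() for _ in range(w)])
```

Note that nothing here looks past the TCP/IP headers, so the choice of server cannot depend on what is being requested.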
Issues Ignored by Existing Schemes
    Session integrity
    Sophisticated load balancing
    Quality of service
    Fault resilience
    Content deployment and management

    These observations lead to an inevitable conclusion:
      the request routing mechanism should factor in the
           content of each request when making decisions
                   [Yang and Luo, IWI'99].

    Design of Content-aware Distributor
   Basic idea: route requests based on their URLs
   Major challenge: the connection-oriented semantics of TCP
    [Figure: (1) the user establishes a TCP connection with the
     dispatcher; (2) the user sends the HTTP request; the dispatcher
     performs the content-based routing decision and relays the HTTP
     request to the selected server]
    How can a TCP connection or HTTP request be migrated to the
    selected server?
   Our design
    – Bridge two TCP connections
           client to distributor
           distributor to selected server
    – Pre-fork and reuse server-side connections
    – Seamlessly relay the HTTP request from the client connection to the
      server connection

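The "pre-fork and reuse" idea can be sketched as a per-server pool of idle connections. This is a user-level illustration only, assuming a connection is an opaque handle returned by a caller-supplied `open_conn` function (a hypothetical name); the actual distributor does this inside the kernel.

```python
from collections import deque


class PreforkPool:
    """Hypothetical pool of idle, pre-established distributor->server connections.

    A connection is treated as an opaque handle. One is handed out per request
    and returned after the response has been relayed, so the server-side TCP
    handshake is not paid on every request.
    """

    def __init__(self, server, size, open_conn):
        self.server = server
        self.open_conn = open_conn  # callable that establishes a new connection
        # Pre-fork: open `size` connections before any request arrives.
        self.idle = deque(open_conn(server) for _ in range(size))

    def acquire(self):
        # Reuse an idle connection if one exists; otherwise open a new one.
        return self.idle.popleft() if self.idle else self.open_conn(self.server)

    def release(self, conn):
        # Return the connection for reuse (the server must keep it alive).
        self.idle.append(conn)
```

A FIFO deque is used so that idle connections are reused in rotation; a LIFO stack would instead keep a few "hot" connections busy and let the rest age.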
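Operation of Content-aware Distributor
   [Figure: message sequence among Client, Layer-7 Switch and Server]
   (1) Pre-forked connection (switch and server, set up in advance)
    – Switch -> Server: SYN(PISN)
    – Server -> Switch: SYN(SISN), ACK(PISN+1)
    – Switch -> Server: ACK(SISN+1), HTTP Keep-Alive(PISN+1)
    – Server -> Switch: Data(SSN'), ACK(PSN)
    – Switch -> Server: ACK(SSN = SSN'+x+1)
   (2) Connection setup (client and switch)
    – Client -> Switch: SYN(CISN)
    – Switch -> Client: SYN(DISN), ACK(CISN+1)
    – Client -> Switch: ACK(DISN+1)
   (3) Client sends HTTP request
    – Client -> Switch: HTTP request(CISN+1), ACK(DISN+1)
   (4) Connection reuse and binding
    – Switch -> Server: HTTP request(PSN), ACK(SSN), Option(bind)
    – Server -> Switch: Data(SSN), ACK(PSN+len+1)
    – Switch rewrites the packet -> Client: Data(DISN+1), ACK(CISN+len+1)
    – Client -> Switch: ACK(DISN+len+1)
    – Switch rewrites the packet -> Server: ACK(SSN+len+1)
   (5) End of data
    – Server -> Switch: end of data
    – Switch -> Client: end of data, FIN
    – Client -> Switch: ACK; Switch -> Server: ACK
    – The server-side connection is kept for reuse
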
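The "rewrite packet" steps amount to adding fixed offsets to sequence and acknowledgement numbers once the two connections are bound. A minimal sketch, assuming (as in step 4) that client byte CISN+1 is relayed as server-side byte PSN and server byte SSN is relayed as client-side byte DISN+1; the class name is hypothetical, and a real implementation must also fix the TCP and IP checksums, which this sketch leaves out.

```python
MOD = 1 << 32  # TCP sequence numbers wrap around modulo 2**32


class SplicedConnection:
    """Sequence-number translation for a bound client/server connection pair.

    cisn: client's initial sequence number
    disn: distributor's initial sequence number toward the client
    psn:  next sequence number the distributor uses on the pre-forked connection
    ssn:  next sequence number the server uses on the pre-forked connection
    """

    def __init__(self, cisn, disn, psn, ssn):
        # Client byte CISN+1 is relayed to the server as byte PSN.
        self.c2s = (psn - (cisn + 1)) % MOD
        # Server byte SSN is relayed to the client as byte DISN+1.
        self.s2c = (disn + 1 - ssn) % MOD

    def to_server(self, seq, ack):
        """Rewrite (seq, ack) of a client->server packet for the server connection."""
        return (seq + self.c2s) % MOD, (ack - self.s2c) % MOD

    def to_client(self, seq, ack):
        """Rewrite (seq, ack) of a server->client packet for the client connection."""
        return (seq + self.s2c) % MOD, (ack - self.c2s) % MOD
```

Because the offsets are fixed at bind time, the per-packet work is two additions, which is what makes relaying inside the kernel cheap compared with copying data through a user-level proxy.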
             Make Routing Decision
   Parse HTTP Request
   Make routing decision
    – Select a destination server
    – Select a pre-forked connection

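The parsing step can be sketched as follows, assuming the whole request head arrives in one buffer and that routing rules are a hypothetical dictionary of host+path prefixes. The linear prefix scan is deliberately naive: it is the kind of string matching whose cost the following slides set out to reduce.

```python
def parse_request(raw: bytes):
    """Return (method, path, host) from the head of an HTTP/1.x request.

    Only the request line and the Host header are examined; a real distributor
    must also cope with request heads split across several TCP segments.
    """
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    host = ""
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
            break
    return method, path, host


def route(raw, rules, default):
    """Pick a server by the longest matching host+path prefix in `rules`."""
    _method, path, host = parse_request(raw)
    key = host + path
    matches = [prefix for prefix in rules if key.startswith(prefix)]
    return rules[max(matches, key=len)] if matches else default
```

For example, with rules `{"www.nsysu.edu.tw/": "SVR.1", "www.nsysu.edu.tw/pds/": "SVR.2"}`, a request for `/pds/index.html` on host `www.nsysu.edu.tw` goes to SVR.2.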
              Content-aware Distributor
   [Figure: architecture of the content-aware distributor. Components:
    upper layer (administrator's commands); socket layer; URL table,
    cluster table and workload manager; TCP; dispatcher, packet
    rewriter and packet transmitter; IP input, timer, mapping table,
    handshake handler and IP output; packet analyzer; network interface]

            Implementation Status
   The content-aware distributor is implemented as a
    loadable kernel module
   The distributor module inserts itself between the network
    interface (NIC) driver and the TCP/IP stack
   We have extended the Linux kernel (version 2.2.16)
    with this module
   Because the ideas and mechanisms adopted in the
    content-aware distributor are generic, they should be
    applicable to other systems (e.g., BSD or Windows NT)
    as well

Challenges of Content-aware Routing
   How can we build content-aware intelligence into
    the distributor for making routing decisions?
    – Content type, size, priority, location, ...
    – Should be configurable, extensible and comprehensive
   How can the distributor perform request distribution
    based on this content-aware intelligence?
    – Parsing the HTTP header of each request
    – Should be fast and efficient

                   The idea of URL table
   [Figure: a client requests http://www.nsysu.edu.tw/pds/; the request
    crosses the Internet, a router and the subnet to the distributor,
    which consults its URL table to forward it to one of the servers
    SVR.1, SVR.2, ..., SVR.n in the server farm]
   URL table (example entries)
      URL                              Server    Priority
      http://www.yahoo.com             SVR.1     Low
      http://www.nsysu.edu.tw/pds/     SVR.2     High
      http://www.nctu.edu.tw           SVR.4

                  Design of URL Table
   The URL table holds the content-related information that enables the
    distributor to make intelligent routing decisions
     – E.g., content type, size, priority, location, ...
   The URL table is a multi-level hash tree that models the
    content tree
   The idea is based on the observation that people generally
    organize content in a directory-based hierarchical structure
   Files in the same directory usually have the same
    attributes
   For example, the files under the /CGI-bin/ directory
    are generally CGI scripts that generate dynamic content
   To reduce the search time and the size of the table, we use an
    aggregation mechanism to specify a set of items that have the
    same properties

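A minimal sketch of such a multi-level hash tree, under assumptions that are ours rather than the slides': each path component is one dictionary level, a lookup returns the attributes of the deepest node that defines any (so files inherit their directory's properties), and aggregation is modelled only as numeric ranges such as the "1020 ~ 1030" shown in the content-tree figure. The class names and attribute keys are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    attrs: Optional[dict] = None                  # e.g. {"server": "SVR.2", "priority": "High"}
    children: dict = field(default_factory=dict)  # exact-name children: one hash level
    ranges: list = field(default_factory=list)    # (lo, hi, Node) for aggregated numeric names


class URLTable:
    """Multi-level hash tree mirroring the directory hierarchy (sketch)."""

    def __init__(self):
        self.root = Node()

    def insert(self, path, attrs):
        node = self.root
        for name in filter(None, path.split("/")):
            if "~" in name:  # aggregated range such as "1020~1030"
                lo, hi = (int(x) for x in name.split("~"))
                for rlo, rhi, child in node.ranges:
                    if (rlo, rhi) == (lo, hi):
                        node = child
                        break
                else:  # no existing range node: create one
                    child = Node()
                    node.ranges.append((lo, hi, child))
                    node = child
            else:
                node = node.children.setdefault(name, Node())
        node.attrs = attrs

    def lookup(self, path):
        """Return the attributes of the deepest matching node that has any."""
        node, best = self.root, self.root.attrs
        for name in filter(None, path.split("/")):
            nxt = node.children.get(name)  # one hash probe per level
            if nxt is None and name.isdigit():
                n = int(name)
                nxt = next((c for lo, hi, c in node.ranges if lo <= n <= hi), None)
            if nxt is None:
                break
            node = nxt
            if node.attrs is not None:
                best = node.attrs
        return best
```

Here the host name is stored as the first path level, e.g. `t.insert("/www.nsysu.edu.tw/pds/", {"server": "SVR.2", "priority": "High"})`, so the cost of a lookup grows with the depth of the path rather than with the number of table entries.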
                         Modeling the Content Tree
          Ex: http://foo.com/special/dancer/img/main/01/21.jpg
   [Figure: the content tree of foo.com and the corresponding URL table.
    Top-level directories include businessweekly, smart, chatroom and
    special; special contains star and dancer; aggregated ranges such as
    1020 ~ 1030 and 900201 ~ 900301 group sibling directories; leaves are
    HTML files and image files (e.g., under img, main and submit)]

                 Request Routing
   An example of an incoming HTTP request:
     http://www.pds.nsysu.edu.tw/nsysu/personal/engineering/~8834601/index_logo2.jpg
   The problem is...

         URL Parsing is expensive!!
   Performing content-aware routing implies that some
    kind of string searching and matching algorithm is
    required.
    – Such a time-consuming function is expensive at a
      heavy-traffic Web site.
   Our experience showed that system performance
    is severely degraded if URL parsing functions are
    implemented in the distributor.
   "You will lose 7/8ths of your Web switch's
    performance if you turn on its URL parsing
    function."                              ~~F5 Lab

The Idea of URL Formalization

   Generally, variable-length strings are used to name
    files and directories only because they are mnemonic,
    which makes them easier for humans to remember.
   In most cases, an HTTP request is issued when the
    browser follows a link: either explicitly, when the user
    clicks on an anchor, or implicitly, via an embedded
    image or object.
   Most URLs are invisible to users, who do not care
    what names they have.
   The name is meaningful only to the content provider.
    Therefore, we can convert the original name into a
    formalized form.

                  URL Formalization
   Convert user-friendly names into routing-friendly names.
   Basic idea: convert the original name of each file or
    directory into a fixed-length, formalized name.
   The procedure of URL formalization
    – Convert the original name of every directory and file into a
      fixed-length, formatted name.
    – Parse all HTML files and modify the embedded hyperlinks to
      conform to the new names.
    – The new path name of each embedded link will be:
      /preamble/formalized host name/formalized path name/…

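The procedure can be sketched as below. Assumptions not stated on the slides: names are 4 lowercase letters generated from a counter (the placement figure shows names such as /sxtn but does not say how they are produced), the preamble is "!!" as in that figure, and only absolute-path `href`/`src` attributes are rewritten. The class and method names are hypothetical.

```python
import re
import string

LETTERS = string.ascii_lowercase
WIDTH = 4  # assumed fixed length of every formalized name


class Formalizer:
    """Hypothetical URL formalizer mapping each host/path component to a fixed-length name."""

    def __init__(self, preamble="!!"):
        self.preamble = preamble
        self.names = {}  # (parent formalized path, original name) -> fixed-length name

    def _fixed(self, n):
        # Encode a counter as WIDTH base-26 letters: 0 -> "aaaa", 1 -> "aaab", ...
        out = []
        for _ in range(WIDTH):
            n, r = divmod(n, 26)
            out.append(LETTERS[r])
        return "".join(reversed(out))

    def formalize(self, host, path):
        """Return /preamble/<host name>/<dir name>/.../<file name>, all fixed-length."""
        parts = [host] + [p for p in path.split("/") if p]
        current = "/" + self.preamble
        for part in parts:
            key = (current, part)
            if key not in self.names:
                self.names[key] = self._fixed(len(self.names))
            current += "/" + self.names[key]
        return current

    def rewrite_links(self, host, html):
        # Rewrite absolute-path href/src attributes; relative links are not handled here.
        def repl(m):
            return f'{m.group(1)}="{self.formalize(host, m.group(2))}"'
        return re.sub(r'\b(href|src)="(/[^"]*)"', repl, html)
```

For example, `Formalizer().formalize("www.pds.nsysu.edu.tw", "/Image/logo.gif")` yields a path made of the preamble followed by three 4-letter names, and the same input always maps to the same output, so links rewritten in different pages stay consistent.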
After URL Formalization

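                        Content Placement
   [Figure: the customer uploads content to the home server
    (www.pds.nsysu.edu.tw, with directories such as /Document,
    /Image and /Research); the home server parses the pages to build
    an object dependence graph, then transforms the names and places the
    content on the server nodes: the preamble /!!, the transformed host
    name /sxtn, and formalized directories such as /tpvz, /gngr and /wukl]
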
                          Content Management
   [Figure: when the customer updates content on the home server
    (www.pds.nsysu.edu.tw), the update triggers a lookup in the object
    dependence graph, and the corresponding objects in the formalized
    tree on the server nodes (/!!, /sxtn, /tpvz, /gngr, /wukl, ...)
    are updated]

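The slides do not describe how the dependence graph is queried. One plausible reading, sketched here as an assumption, is that the graph records which pages link to or embed each object, so that renaming or moving an object only requires rewriting the pages that reference it. The class and method names are hypothetical.

```python
from collections import defaultdict


class DependenceGraph:
    """Hypothetical object dependence graph built while parsing uploaded pages."""

    def __init__(self):
        self.referrers = defaultdict(set)  # object -> pages that reference it

    def add_page(self, page, links):
        """Record that `page` links to or embeds every object in `links`."""
        for obj in links:
            self.referrers[obj].add(page)

    def affected_by(self, obj):
        """Pages whose embedded links must be rewritten if `obj` is renamed or moved."""
        return sorted(self.referrers.get(obj, ()))
```

Keeping the reverse edges (object to referrers) makes the lookup triggered by an update a single dictionary access instead of a scan over every page on the site.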
    Advantages of URL Formalization
   Fixed-length formalized names are easier for the
    distributor to process.
    – The routing function could even be implemented in hardware
      to boost performance.
   Placing the host name at the first level of the path
    name further speeds up the routing decision.
   Combined with the well-designed URL table, the
    dispatcher can quickly retrieve the information it needs
    to make a routing decision.
   Particularly useful in a Web hosting service
    environment

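To illustrate why fixed-length names help, the sketch below routes a formalized path by slicing at fixed offsets instead of scanning for separators or matching strings. It assumes the preamble "/!!/" and 4-character names from the placement figure, and a hypothetical table keyed by (host name, first-level directory), where an empty directory means "any directory on that host".

```python
PREAMBLE = "/!!/"  # assumed preamble, as in the placement figure
W = 4              # assumed fixed length of every formalized name


def route_fixed(path, table, default):
    """Route a formalized path using fixed offsets only (no string scanning)."""
    if not path.startswith(PREAMBLE):
        return default  # not a formalized URL
    start = len(PREAMBLE)
    host = path[start:start + W]                 # host name sits at a fixed offset
    first = path[start + W + 1:start + 2 * W + 1]  # skip the "/" after the host name
    return table.get((host, first)) or table.get((host, "")) or default
```

Because the offsets are constants, the same extraction could be done by simple hardware comparators, which is the point made in the first bullet.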
                   Outline
   Introduction
   Proposed System
    – Request Routing Mechanism
    – Management System
   Content-aware Intelligence
   Work in Progress
   Conclusion

          Why Do We Need a Management System?
     Load management? Configuration? Content management? Failure?
       Monitoring...
   [Figure: Web content spread across many Web servers]

                 Required Functions
   System configuration
    – ease of configuration
    – status visualization
   Content placement and management
    – deploy content on each node according to its
      capability
    – service differentiation
    – dynamically change the content placement according to the
      load
    – support content-aware routing
   Monitoring
    – real-time statistics
    – log analysis
    – site usage statistics

           Required Functions (cont.)
   Performance management
    –   monitoring
    –   analysis and statistics
    –   events (poor performance or overload)
    –   automatic tuning
   Failure management
    –   diagnosis
    –   server failure identification
    –   network analysis and protocol analysis
    –   content verification (object/link analysis)
    –   monitoring and alarms

Our Management System
     (Overview)

Our Management System
   (Implementation)
   [Figure: the user's Java-enabled browser loads Web pages containing
    the remote console applet and sends administrative operations as HTTP
    requests to the HTTP daemon on the distributor node. There, the
    controller performs the administration functions on top of the
    modified kernel (distributor) and invokes agents from the agent code
    base. Across the network, each Web server node runs an agent, an HTTP
    daemon and a broker on a common operating system]

          Our Management System
   Controller (Java application)
    – Communicates with the distributor
    – Control center
   Broker (Java application)
    – Runs on each Web server node
    – Monitoring
    – Executes downloaded agents
   Agent (Java class)
    – Each administrative function is implemented as a
      Java class termed an agent
   Remote console (Java applet)
    – An easy-to-use GUI for the Web site manager to maintain and
      manage the system

Web Server Configuration

Content Management

Performance Monitor Options

Performance Monitor

Performance Monitor (cont.)

Performance Monitor (cont.)

Features of Our Management System
   Platform independent
    – Implementing the daemon in Java relieves concerns
      about the heterogeneity of the target platforms
   Supports comprehensive management functions
   Enables complete management of a Web site via
    a standard browser (from any location)
    – Supports tracking and visualization of the system's
      configuration and state
    – Produces a single, coherent view of the partitioned content
   Extensibility
   Supports URL formalization

                   Outline
   Introduction
   Proposed System
    – Request Routing Mechanism
    – Management System
   Content-aware Intelligence
   Work in Progress
   Conclusion

                  Current Status
   We have implemented the following content-aware
    intelligence in our system:
   Affinity-based request routing
   Content placement and management
    – Dispersed content placement
    – Content segregation
   Fault resilience

      Affinity-Based Request Routing
   An important factor to consider: serving a request
    from disk is far slower than serving it from the
    memory cache.
   With the content-aware mechanism, it is possible to
    direct requests for a given item of content to the
    server that already has the data cached in main
    memory.
   Achieves both load balancing and locality

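A minimal sketch of the trade-off between locality and load balance. It is a hash-based stand-in, not the authors' mechanism: instead of tracking which server actually caches an item, each URL gets a "home" server chosen by CRC32 (used instead of Python's `hash()`, which is randomized per process), and an overloaded home server is bypassed in favour of the least-loaded one. The `max_load` threshold is a hypothetical parameter.

```python
import zlib


def affinity_route(url, servers, load, max_load):
    """Send a URL to its home server, unless that server is overloaded.

    url:      requested path
    servers:  list of server names
    load:     dict server -> current load (e.g. open connections)
    max_load: hypothetical threshold above which affinity is ignored
    """
    # Same URL -> same home server, so repeated requests hit a warm memory cache.
    home = servers[zlib.crc32(url.encode()) % len(servers)]
    if load[home] < max_load:
        return home
    # Fall back to load balancing when locality would overload the home server.
    return min(servers, key=lambda s: load[s])
```

Each server's cache then only has to hold its share of the document set, which matters when the whole set does not fit in a single node's memory.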
    Benefits of Affinity-Based Routing
             -- Test Environment
   Heterogeneous server cluster
    – 4 Pentium II machines
           350 MHz CPU, 128 MB RAM, 8 GB SCSI hard disk
           100 Mbps Fast Ethernet
           Windows NT + IIS 4.0
    – 3 Pentium Pro machines
           200 MHz CPU, 64 MB RAM, 4 GB SCSI hard disk
           100 Mbps Fast Ethernet
           Linux + Apache
    – 2 Pentium Pro machines
           150 MHz CPU, 64 MB RAM, 4 GB IDE hard disk
           100 Mbps Fast Ethernet
           Linux + Apache
   Workload generated by WebBench
    – 8 Pentium II machines serve as WebBench clients
           350 MHz CPU, 128 MB RAM
           100 Mbps Fast Ethernet
           Windows NT
    – Each machine runs four WebBench client programs that emit a
      stream of Web requests and measure the system response
    – The stream of requests is called the workload

    Benefits of Affinity-Based Routing
            -- Workload description
   Workload 1
                        Number of Files   Average File Size (bytes)   Request Percentage
      CLASS_1 (gif)          301                    223                      16
      CLASS_2 (gif)          200                    735                       7
      CLASS_3 (gif)          361                    1522                     12
      CLASS_4 (jpg)          665                    2895                     20
      CLASS_5 (htm)          1865                   6040                     16
      CLASS_6 (htm)          1705                  11426                     15
      CLASS_7 (htm)          721                   22132                      6
      CLASS_8 (htm)          265                   41518                      3
      CLASS_9 (exe)           53                   529k                       3
     CLASS_10 (Video)         27                   1024k                      2

   Total size of the document set >> memory size of each
    server node

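A WebBench-style client chooses each request's class according to the Request Percentage column. The sketch below does not reproduce WebBench itself; it only samples request classes from the mix in the table above and computes the mean response size the mix implies. The helper names are hypothetical, and "529k"/"1024k" are taken as 529 × 1024 and 1024 × 1024 bytes, which is an assumption about the table's units.

```python
from __future__ import annotations

import random

# (class, number of files, average size in bytes, request percentage), from the table
WORKLOAD_1 = [
    ("CLASS_1", 301, 223, 16),
    ("CLASS_2", 200, 735, 7),
    ("CLASS_3", 361, 1522, 12),
    ("CLASS_4", 665, 2895, 20),
    ("CLASS_5", 1865, 6040, 16),
    ("CLASS_6", 1705, 11426, 15),
    ("CLASS_7", 721, 22132, 6),
    ("CLASS_8", 265, 41518, 3),
    ("CLASS_9", 53, 529 * 1024, 3),
    ("CLASS_10", 27, 1024 * 1024, 2),
]


def request_stream(n: int, seed: int = 0) -> list[str]:
    """Sample n request classes in proportion to their request percentage."""
    rng = random.Random(seed)
    names = [c[0] for c in WORKLOAD_1]
    weights = [c[3] for c in WORKLOAD_1]
    return rng.choices(names, weights=weights, k=n)


def mean_response_size() -> float:
    """Expected bytes per request implied by the mix."""
    total_pct = sum(c[3] for c in WORKLOAD_1)
    return sum(size * pct for _, _, size, pct in WORKLOAD_1) / total_pct
```

Under these assumptions the mix averages about 43 KB per request, and although the exe and video classes account for only 5% of requests, they contribute roughly 86% of the bytes transferred.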
    Benefits of Affinity-Based Routing
               --Results (1/3)
   Layer-4 based dispatcher + Weighted Round Robin

    [Figure: Overall WebBench throughput (bytes/second, axis 0 to 14,000,000) vs. number of clients (1 to 84)]

    Benefits of Affinity-Based Routing
               --Results (2/3)
   Layer-4 based dispatcher + Weighted Least
    Connection
    [Figure: Overall WebBench throughput (bytes/second, axis 0 to 14,000,000) vs. number of clients (1 to 84)]

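For reference, the two Layer-4 baselines can be sketched with their common textbook definitions: smooth weighted round robin (the variant popularized by nginx) and weighted least connections (fewest active connections per unit of weight). The dispatcher used in the experiments may implement them differently, and the server weights in the example are hypothetical. Neither policy looks at the requested URL, so repeated requests for the same file are scattered across the nodes' memory caches.

```python
from __future__ import annotations


class SmoothWRR:
    """Smooth weighted round robin: add weights, pick the max, subtract the total."""

    def __init__(self, weights: dict[str, int]) -> None:
        self.weights = weights
        self.current = {s: 0 for s in weights}

    def pick(self) -> str:
        total = sum(self.weights.values())
        for s, w in self.weights.items():
            self.current[s] += w
        best = max(self.current, key=self.current.get)
        self.current[best] -= total
        return best


def weighted_least_connection(active: dict[str, int], weights: dict[str, int]) -> str:
    """Choose the server with the fewest active connections per unit of weight."""
    return min(active, key=lambda s: active[s] / weights[s])
```

With weights `{"a": 2, "b": 1}`, the round-robin picker cycles `a, b, a`, interleaving the heavier server's turns instead of bunching them together.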
    Benefits of Affinity-Based Routing
               --Results (3/3)
   Affinity-Based Routing

    [Figure: Overall WebBench throughput (bytes/second, axis 0 to 14,000,000) vs. number of clients (1 to 84)]

Content Placement and Management
   An important factor in using a distributed server
    efficiently and achieving better performance is the
    ability to deploy content on each node according to
    its capability, and then direct clients to the
    best-suited server.
   Challenge: how to place and manage content in such
    a distributed server system, particularly since such
    servers tend to be heterogeneous

    Existing Content Placement Schemes
   Place all content on a shared network file system
     – Advantages:
           easy to maintain
     – Disadvantages:
           suffers from a single point of failure
           increases user-perceived latency
           unable to support dynamic content

   Replicate all content on each server node
     – Advantages:
           avoids the significant overhead associated with the previous scheme
           high availability due to data redundancy
     – Disadvantages:
           expensive in terms of disk utilization
           imposes a heavy administrative burden on content management

    Issues Ignored by Existing Schemes
   Variety of Web content
   Heterogeneity of server configurations
   Variety of access patterns (e.g., flash crowds)
   Need to differentiate content according to its
    priority or importance

Neither of the two schemes is a satisfactory solution for
  a heterogeneous distributed Web server

                       Our Solution
   Basic idea: content-aware routing + a content
    placement and management system, which lets the
    administrator freely decide which node does what
    – content partitioning
    – partial replication for performance or availability
    – can be combined with the two traditional schemes
   Advantages
    – better resource utilization and scalability
    – ability to specialize some nodes to host certain content types
    – content segregation to prevent interference between
      different types of requests
    – ability to exert explicit control over resource allocation
      policies

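A minimal sketch of how a content-aware distributor might consult a URL table that maps URL prefixes to the nodes holding that content. This lets one partition live on a single node while other content is partially replicated. The table format, the longest-prefix rule, the node names, and the least-loaded choice among replicas are hypothetical illustrations, not the system's actual data structures.

```python
from __future__ import annotations


class URLTable:
    """Maps URL path prefixes to the nodes that hold that content."""

    def __init__(self, default_nodes: list[str]) -> None:
        self.entries: dict[str, list[str]] = {}
        self.default_nodes = default_nodes  # e.g., content replicated everywhere

    def place(self, prefix: str, nodes: list[str]) -> None:
        self.entries[prefix] = nodes

    def lookup(self, path: str) -> list[str]:
        """Longest-prefix match; fall back to the default node set."""
        matches = [p for p in self.entries if path.startswith(p)]
        if not matches:
            return self.default_nodes
        return self.entries[max(matches, key=len)]


def route(table: URLTable, path: str, load: dict[str, int]) -> str:
    """Send the request to the least-loaded node that holds the content."""
    candidates = table.lookup(path)
    return min(candidates, key=lambda n: load.get(n, 0))
```

Example: with `/video/` placed only on `n4` and `/cgi-bin/` replicated on `n1` and `n2`, a video request always goes to `n4`, a CGI request goes to whichever of `n1`/`n2` is less busy, and everything else can go to any node.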
Features of Content Management System

    Support tracking and visualization of the system’s
     configuration and state
    Produce a single, coherent view of the partitioned
     content
    Implementing the daemon in Java relieves concerns
     about the heterogeneity of the target platforms
    Can be tailored or extended to meet the requirements
     of different systems
    Automatic content rearrangement facility to further
     ensure an even load distribution
     – Skewed access patterns may cause load imbalance

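The rearrangement facility is not specified in detail here. One simple possibility, sketched below as a hypothetical greedy heuristic, is to measure per-file request rates on each node and repeatedly move a file from the busiest node to the idlest node whenever the move narrows the gap between them. A real facility would also have to respect disk capacity and the placement rules of the content management system; this sketch ignores both.

```python
from __future__ import annotations


def node_load(placement: dict[str, dict[str, float]], node: str) -> float:
    """Total request rate of the files currently placed on a node."""
    return sum(placement[node].values())


def rebalance(placement: dict[str, dict[str, float]], max_moves: int = 100) -> list[tuple[str, str, str]]:
    """Greedily move files from the busiest to the idlest node.

    placement maps node -> {file: requests/sec} and is modified in place.
    Returns the (file, from_node, to_node) moves performed.
    """
    moves = []
    for _ in range(max_moves):
        busiest = max(placement, key=lambda n: node_load(placement, n))
        idlest = min(placement, key=lambda n: node_load(placement, n))
        gap = node_load(placement, busiest) - node_load(placement, idlest)
        # Moving a file with rate r leaves a gap of |gap - 2r|: it helps only if 0 < r < gap.
        candidates = [(r, f) for f, r in placement[busiest].items() if 0 < r < gap]
        if not candidates:
            break
        # Prefer the file that brings the two nodes closest to equal load.
        _, f = min(candidates, key=lambda c: abs(gap - 2 * c[0]))
        placement[idlest][f] = placement[busiest].pop(f)
        moves.append((f, busiest, idlest))
    return moves
```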
Benefits of Content Management System
          -- Workload description
    Workload 2 (static content + dynamic content)
                             Number of Files   Average File Size (bytes)   Request Percentage
                ASP                37                     -                        8
                CGI                14                     -                       12
           CLASS_1 (gif)          301                    223                      12
           CLASS_2 (gif)          200                    735                       6
           CLASS_3 (gif)          361                    1522                      8
           CLASS_4 (jpg)          665                    2895                     15
           CLASS_5 (htm)          1865                   6040                     12
           CLASS_6 (htm)          1705                  11426                     14
           CLASS_7 (htm)          721                   22132                      8
           CLASS_8 (htm)          265                   41518                      1
           CLASS_9 (exe)           53                   529k                       2
          CLASS_10 (Video)         27                   1024k                      1

      Benefits of the Content Partition
               --Configuration
   We ran WebBench with Workload 1 on the following
    three configurations:
    – the entire set of files replicated on each server
    – the entire set of files shared via NFS
    – the document set dispersed across the servers, with
      content-aware routing

   In configuration 3, we roughly partitioned the document
    tree by content type.
   We also placed large video files on the nodes with
    large, fast disks.

      Benefits of the Content Partition
                   --Results
    [Figure: Throughput (MB/sec) vs. number of clients (1 to 96) for NFS, full replication, and our system]


   Because of content partitioning, each server holds only part of the
    content, so it sees a smaller set of distinct requests and its working
    set shrinks, which improves the cache hit rate

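The working-set argument can be checked with a small simulation. The sketch rests on simplifying assumptions that do not come from the experiment: a Zipf-like popularity distribution, an LRU cache holding a fixed number of files per node (file sizes ignored), round-robin dispatching as a stand-in for content-blind placement, and `file_id % n_nodes` as a stand-in for content partitioning. All names are hypothetical.

```python
from __future__ import annotations

import random
from collections import OrderedDict


class LRUCache:
    """Fixed number of slots; file sizes are ignored for simplicity."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key: int) -> bool:
        """Return True on a hit; on a miss, insert the key and evict the LRU entry if full."""
        if key in self.items:
            self.items.move_to_end(key)
            return True
        self.items[key] = None
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)
        return False


def zipf_stream(n_files: int, n_requests: int, s: float = 1.0, seed: int = 1) -> list[int]:
    """Request stream in which file i is requested with probability ~ 1 / (i + 1)**s."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** s) for rank in range(1, n_files + 1)]
    return rng.choices(range(n_files), weights=weights, k=n_requests)


def hit_rate(stream: list[int], n_nodes: int, capacity: int, partitioned: bool) -> float:
    caches = [LRUCache(capacity) for _ in range(n_nodes)]
    hits = 0
    for i, file_id in enumerate(stream):
        # Content-blind: round robin. Content-aware: each file has one home node.
        node = file_id % n_nodes if partitioned else i % n_nodes
        hits += caches[node].access(file_id)
    return hits / len(stream)
```

With, for example, 2000 files, 40000 requests, 4 nodes and 100 cache slots per node, the partitioned run should report a clearly higher hit rate, because the four caches together hold 400 distinct files instead of four copies of the same 100.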
      Benefits of Content Segregation
              -- Configuration
   We ran WebBench with Workload 2 on the following two
    configurations:
    – the entire set of files replicated on each server
    – the document set dispersed across the servers, with
      content-aware routing

   In our content-smart cluster (configuration 2):
    – dynamic and static content are placed on different
      servers
    – dynamic content (CGI scripts and ASP) goes on the
      servers with powerful CPUs, and plain HTML content on
      the nodes with slower processors and disks
    – large files (e.g., video files) go on the server nodes
      with fast disks

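The segregation rules above can be expressed as a small placement policy. In this hypothetical sketch, the node attributes (`cpu_mhz`, `fast_disk`), the 500 KB cut-off for a "large" file, and the node names are assumptions, loosely modeled on the heterogeneous test cluster described earlier.

```python
from __future__ import annotations

from dataclasses import dataclass

LARGE_FILE_BYTES = 500 * 1024  # assumed cut-off for "large" files (exe, video)


@dataclass
class Node:
    name: str
    cpu_mhz: int
    fast_disk: bool


def choose_nodes(kind: str, size: int, nodes: list[Node]) -> list[str]:
    """Return the nodes that should host an item, following the segregation rules."""
    fastest_cpu = max(n.cpu_mhz for n in nodes)
    if kind in ("cgi", "asp"):
        # Dynamic content goes to the nodes with the most powerful CPUs.
        return [n.name for n in nodes if n.cpu_mhz == fastest_cpu]
    if size >= LARGE_FILE_BYTES:
        # Large files go to nodes with fast disks.
        return [n.name for n in nodes if n.fast_disk]
    # Plain static content goes to the slower nodes; fall back to all nodes.
    slow = [n.name for n in nodes if n.cpu_mhz < fastest_cpu and not n.fast_disk]
    return slow or [n.name for n in nodes]
```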
            Benefits of Content Segregation
                        --Result
    [Figure: Throughput (requests/sec, axis 0 to 2500) vs. number of clients (1 to 96) for full replication and our system]


   The full-replication placement scheme does not take the
    differing capabilities of the nodes into account, which
    results in poor performance

          Benefits of Content Segregation
                      --Result
   The results show the throughput when the server was
    saturated by 120 concurrent WebBench clients.
                                     Number of requests
    Request type                  (a) NAT Router with      (b) Content-aware Router
                                  Content Replication      with Content Segregation
    CGI request                          176                       256
    ASP request                          138                       196
    Request for static content           962                      1524


   This experiment demonstrates the performance benefits
    of combining content-aware routing with content segregation.

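The improvement for each request type can be read directly off the two charts. A short calculation using only the numbers above:

```python
# Requests completed at saturation, from the two charts above.
replication = {"CGI": 176, "ASP": 138, "static": 962}
segregation = {"CGI": 256, "ASP": 196, "static": 1524}

# Ratio of content-aware/segregated throughput to NAT/replicated throughput.
speedup = {k: segregation[k] / replication[k] for k in replication}
total = sum(segregation.values()) / sum(replication.values())

for kind, s in speedup.items():
    print(f"{kind}: {s:.2f}x")
print(f"all requests: {total:.2f}x")
```

Every request type improves by roughly 1.4 to 1.6 times, and the total number of requests served at saturation rises by about 55%.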
                  Fault Resilience
   Existing server-clustering solutions are not highly
    reliable, merely highly available.
   They offer no fault-resilience guarantee for the
    service.
    – Although a server failure can be easily detected and the
      failed server transparently replaced by an available
      redundant component, any requests in progress on the
      failed server are lost.
   In addition to detecting and masking failures, an
    ideal fault-tolerant Internet server should enable the
    outstanding requests on the failed node to be
    smoothly migrated to and then recovered on another
    working node.

                           Analysis
   To support fault resilience, we think the routing
    mechanism should be extended with two important
    capabilities: checkpointing and fault recovery.
   Challenges:
    – logging every incoming request for checkpointing is
      very expensive
      The request routing mechanism should be content-aware,
       so that it can differentiate types of requests and provide
       a corresponding fault-resilience guarantee for each.
    – how to recover a Web request from a failed server node so
      that it continues executing on another working node
      The request and its TCP connection should be migrated
       smoothly and transparently to another node

             FT-capable Distributor
   Goal: enable the outstanding requests on the failed
    node to be smoothly migrated to and then recovered
    on another working node.
   We believe the request routing mechanism, which a
    server cluster needs anyway, is the right place to
    implement the fault-resilience capability.
   We combine content-aware routing, checkpointing, and
    fault recovery in a new mechanism named the
    Fault-Tolerance-capable (FT-capable) distributor.

                    Fault Recovery
   We divide Web requests into two types, stateless and
    stateful, and provide a corresponding solution for
    each category.
   Stateless requests
    – static content
    – dynamic content
   Stateful requests
    – transaction-based services
    – at the heart of many important Web services (e.g.,
      e-commerce)

     Fault Recovery—Static Requests
   The majority of Web requests are for static objects,
    such as HTML files, images, and videos.
   If a server node fails in the middle of a static
    request, we use the following mechanism to recover
    the request on another node:
    – select a new server
    – select an idle pre-forked connection to the target
      server
    – infer how many bytes have been successfully received
      by the client (from information in the mapping table)
    – issue a range request on the new server-side
      connection to the selected server node

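The four recovery steps can be sketched as follows. Everything here is a hypothetical illustration: the fields of the mapping-table entry (in particular, tracking how many response bytes the client has acknowledged and how long the original response header was), the representation of the pre-forked connection pool as lists of labels, and the function names. The `Range: bytes=N-` header itself is standard HTTP/1.1. The sketch also assumes the replacement node holds an identical copy of the file.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MappingEntry:
    """Hypothetical per-connection record kept in the distributor's mapping table."""
    path: str
    host: str
    header_len: int   # bytes of the original response header already sent
    bytes_acked: int  # response bytes the client has acknowledged


def resume_offset(entry: MappingEntry) -> int:
    """Number of body bytes the client already has."""
    return max(0, entry.bytes_acked - entry.header_len)


def build_range_request(entry: MappingEntry) -> bytes:
    """HTTP/1.1 request for the rest of the object."""
    lines = [
        f"GET {entry.path} HTTP/1.1",
        f"Host: {entry.host}",
        f"Range: bytes={resume_offset(entry)}-",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def recover_static(entry: MappingEntry, idle_pool: dict[str, list[str]], failed: str) -> tuple[str, str, bytes]:
    """Pick a surviving server with an idle pre-forked connection; build the range request."""
    for server, conns in idle_pool.items():
        if server != failed and conns:
            conn = conns.pop()
            return server, conn, build_range_request(entry)
    raise RuntimeError("no surviving server with an idle connection")
```

The new server should answer `206 Partial Content`; the distributor would then strip that response's header and splice only the body bytes onto the client's existing connection, so the client sees one continuous response.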
Fault Recovery—Dynamic Requests
   Dynamic content: response pages are created on
    demand (e.g., by CGI scripts or ASP), mostly based on
    client-provided arguments.
   The distributor logs the user arguments carried in
    dynamic requests.
   Recovery mechanism
    – select a new server
    – select an idle pre-forked connection to the target
      server
    – replay the request with the logged arguments

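A minimal sketch of argument logging and replay, assuming the distributor keeps one log entry per client connection. The classification rule in `is_dynamic` (CGI or ASP paths, or any query string), the entry layout, and the class names are hypothetical. Headers are logged along with the arguments because a replayed POST also needs its original `Host` and `Content-Type`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlsplit


def is_dynamic(target: str) -> bool:
    """Assumed rule: CGI/ASP paths or any request carrying a query string."""
    parts = urlsplit(target)
    return parts.path.startswith("/cgi-bin/") or parts.path.endswith(".asp") or bool(parts.query)


@dataclass
class LoggedRequest:
    method: str
    target: str                                  # path plus query string
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""                            # POST arguments, empty for GET


class ArgumentLog:
    """Distributor-side log of dynamic-request arguments, keyed by client connection."""

    def __init__(self) -> None:
        self.entries: dict[int, LoggedRequest] = {}

    def record(self, conn_id: int, method: str, target: str,
               headers: dict[str, str], body: bytes = b"") -> None:
        if is_dynamic(target):                   # static requests need no log
            self.entries[conn_id] = LoggedRequest(method, target, dict(headers), body)

    def replay(self, conn_id: int) -> bytes:
        """Rebuild the logged request so it can be re-sent to another server."""
        r = self.entries[conn_id]
        lines = [f"{r.method} {r.target} HTTP/1.1"]
        for name, value in r.headers.items():
            if name.lower() != "content-length":
                lines.append(f"{name}: {value}")
        if r.body:
            lines.append(f"Content-Length: {len(r.body)}")
        return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1") + r.body

    def forget(self, conn_id: int) -> None:
        """Drop the entry once the response has been delivered."""
        self.entries.pop(conn_id, None)
```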
Fault Recovery—Dynamic Requests
              (cont’)
   We found that the previous approach is problematic in
    some situations.
   The major problem is that some dynamic requests
    are not “idempotent”:
    – two successive requests with the same arguments may
      produce different results.
   Recovery would then require forcing the client to discard
    the data it has already received and receive the new
    response page instead.
    – this is neither transparent to the user nor compatible
      with existing browsers.
   We tackle this problem by making the distributor node
    act as a reverse proxy that stores the complete response
    page and then forwards it (“store-and-then-forward”).

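The store-and-then-forward idea can be sketched in a few lines. Here a backend is modeled, hypothetically, as a function that streams the response in chunks and raises `ConnectionError` if the server dies mid-response. Because nothing is released to the client until a response is complete, a failure part-way only costs the partial buffer, and the replayed request's (possibly different) page is the only one the client ever sees.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

# Hypothetical backend model: streams response chunks, raises ConnectionError on failure.
Backend = Callable[[bytes], Iterator[bytes]]


def store_and_forward(request: bytes, backends: Iterable[Backend]) -> bytes:
    """Buffer the whole response before releasing it to the client.

    If a backend fails part-way, the partial buffer is discarded and the request
    is replayed on the next backend.
    """
    for backend in backends:
        buffer = bytearray()
        try:
            for chunk in backend(request):
                buffer.extend(chunk)
        except ConnectionError:
            continue              # discard the partial response, try another server
        return bytes(buffer)      # complete: now safe to forward to the client
    raise RuntimeError("all backends failed")
```

The design choice costs latency: the client receives nothing until the page is complete, which is tolerable for dynamically generated pages that are typically small, but would not suit large static files (those are recovered with range requests instead).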
                 Stateful Requests
   In some cases, the user does not browse a number
    of independent, statically or dynamically generated
    pages, but is guided through a session controlled by
    a server-side program (e.g., a CGI script) associated
    with some shared state.
   These session-based services are generally built on
    the so-called three-tier architecture.
   Recovering a session in the three-tier architecture is
    a more challenging problem.

Fault Recovery—Stateful Requests
   First, the Web site manager defines the sessions for
    which fault resilience is required
    – via the GUI of the management system
    – the configuration information is stored in the URL table
   When the distributor finds a request belonging to such a
    session, it “tags” the client and then directs all
    subsequent requests from that client to one of the
    “twin servers”, until it finds a request conveying the
    “end” action.

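A minimal sketch of the tagging logic, assuming each session is described in the URL table by a start prefix, an end prefix, and a twin-server pair, and that clients are identified by a string such as an IP address or cookie. All of these names and the rule format are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SessionRule:
    """Hypothetical URL-table entry marking a fault-resilient session."""
    start_prefix: str          # request that opens the session
    end_prefix: str            # request that conveys the "end" action
    twins: tuple[str, str]     # (primary, backup) twin-server pair


@dataclass
class SessionRouter:
    rules: list[SessionRule]
    default_node: str
    tagged: dict[str, SessionRule] = field(default_factory=dict)  # client -> session

    def route(self, client: str, path: str) -> str:
        rule = self.tagged.get(client)
        if rule is None:
            rule = next((r for r in self.rules if path.startswith(r.start_prefix)), None)
            if rule is None:
                return self.default_node  # ordinary request, routed as usual
            self.tagged[client] = rule    # "tag" the client
        if path.startswith(rule.end_prefix):
            del self.tagged[client]       # session ends with this request
        return rule.twins[0]              # primary of the twin pair
```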
Protocol of Twin Server
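   Participants: Client, Primary, Backup, Database
   Message sequence (top to bottom, as drawn):
    – Client → Primary: Request
    – Primary → Backup: Request
    – Processing request; processing and logging the request
    – Go for It
    – Start of two-phase commit: Yes, Log, Ack
    – Commit, Ack, Ack
    – Primary → Client: Result
    – Ack between Client and Primary
    – Release logged data
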
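The diagram can be read as a primary/backup protocol wrapped around a two-phase commit. The sketch below is a hypothetical, in-process simplification of that reading, not the authors' implementation: the backup logs each request and answers "Go for It", the primary processes it, the database prepares (votes "Yes") and commits, the result goes to the client, and the client's acknowledgment lets the backup release its log entry. If the primary dies before completing, the backup can finish the request from its log; keying commits by request ID keeps a takeover from committing twice.

```python
from __future__ import annotations


def process(request: str) -> str:
    return f"result of {request}"  # stand-in for the real service logic


class Database:
    def __init__(self) -> None:
        self.prepared: dict[str, str] = {}
        self.committed: dict[str, str] = {}

    def prepare(self, req_id: str, update: str) -> bool:
        self.prepared[req_id] = update
        return True                                    # vote "Yes"

    def commit(self, req_id: str) -> str:
        if req_id not in self.committed:               # idempotent commit
            self.committed[req_id] = self.prepared.pop(req_id)
        return "Ack"


class Backup:
    def __init__(self) -> None:
        self.log: dict[str, str] = {}

    def log_request(self, req_id: str, request: str) -> str:
        self.log[req_id] = request
        return "Go for It"

    def release(self, req_id: str) -> None:
        """Called after the client acknowledges the result."""
        self.log.pop(req_id, None)

    def take_over(self, db: Database) -> dict[str, str]:
        """Finish every logged request the primary did not complete."""
        results = {req_id: process(req) for req_id, req in self.log.items()}
        for req_id, result in results.items():
            if req_id not in db.committed:
                db.prepare(req_id, result)
                db.commit(req_id)
        return results


def primary_handle(req_id: str, request: str, backup: Backup, db: Database) -> str:
    if backup.log_request(req_id, request) != "Go for It":
        raise RuntimeError("backup did not acknowledge")
    result = process(request)
    if not db.prepare(req_id, result):                 # phase 1
        raise RuntimeError("database voted No")
    db.commit(req_id)                                  # phase 2
    return result                                      # sent to the client
```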
                  Workload Description
   We created a workload that models the characteristics
    (e.g., file size, request distribution, and file access
    frequency) of representative Web servers
                               Number of Files   Average File Size (bytes)   Request Percentage
                  ASP                37                     -                        8
                  CGI                14                     -                       12
             CLASS_1 (gif)          301                    223                      12
             CLASS_2 (gif)          200                    735                       6
             CLASS_3 (gif)          361                    1522                      8
             CLASS_4 (jpg)          665                    2895                     15
             CLASS_5 (htm)          1865                   6040                     12
             CLASS_6 (htm)          1705                  11426                     14
             CLASS_7 (htm)          721                   22132                      8
             CLASS_8 (htm)          265                   41518                      1
             CLASS_9 (exe)           53                   529k                       2
            CLASS_10 (Video)         27                   1024k                      1

   About 6,000 unique files with a total size of about 116 MB.

                      Fail-over Time
   We implemented a fault-injection program, running on
    each server node, that shuts down and restarts the
    system or the HTTP daemon to simulate failures and
    repairs.
   Static requests
          Request size (KB)        4K        8K       32K       64K      256K     1024K
          Failed request (ms)    887.64    954.74   1076.98   1241.61   2172.23   6915.01
          Baseline (ms)           24.12     33.48    172.42    312.38   1143.97   5325.45
          Fail-over time (ms)    863.52    921.26    904.56    929.23   1028.26   1589.56

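The fail-over time is the difference between the latency of a request whose server failed and the normal (baseline) latency for the same size. Recomputing it from the measurements in the table:

```python
# Measurements from the fail-over table (milliseconds).
sizes = ["4K", "8K", "32K", "64K", "256K", "1024K"]
failed = [887.64, 954.74, 1076.98, 1241.61, 2172.23, 6915.01]
baseline = [24.12, 33.48, 172.42, 312.38, 1143.97, 5325.45]

# Fail-over time = latency with a server failure - normal latency.
failover = [round(f - b, 2) for f, b in zip(failed, baseline)]
for size, t in zip(sizes, failover):
    print(f"{size:>6}: {t:8.2f} ms")
```

The recomputed values match the table: about 0.86 to 1.03 s for objects up to 256 KB, and about 1.6 s at 1024 KB.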
                     Fail-over Time (Cont’)
   We think that the measured fail-over time may be
    overestimated.
   Instrumentation (request migration timeline)
      Failure occurrence
      Td    → failure is detected
      Tr    → distributor sends the partial request
      Ts    → request arrives
      Tpars → parsing complete
      Tproc → processing complete
      Tnet  → data received by the distributor
   Result of instrumentation
             Request size   Tr (ms)   Ts (ms)   Tpars (ms)   Tproc (ms)   Tnet (ms)
                  4k          3.23      2.58       8.38          9.53        2.25
                  8k          3.12      2.35       8.52          8.89        3.01
                 32k          4.18      2.21       9.86         10.23        2.05
                 64k          3.89      2.54       8.25         10.56        2.16
                256k          3.56      2.39       7.89         42.56        2.38
               1024k          3.09      2.64       8.69        225.23        2.53

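If the intervals follow one another as drawn in the timeline, the time from failure detection until the distributor has the data again is Tr + Ts + Tpars + Tproc + Tnet; the detection delay Td is not reported in the table. Summing the measured components:

```python
# Instrumented intervals (Tr, Ts, Tpars, Tproc, Tnet) in milliseconds.
intervals = {
    "4k":    (3.23, 2.58, 8.38, 9.53, 2.25),
    "8k":    (3.12, 2.35, 8.52, 8.89, 3.01),
    "32k":   (4.18, 2.21, 9.86, 10.23, 2.05),
    "64k":   (3.89, 2.54, 8.25, 10.56, 2.16),
    "256k":  (3.56, 2.39, 7.89, 42.56, 2.38),
    "1024k": (3.09, 2.64, 8.69, 225.23, 2.53),
}

# Migration time after detection (Td excluded).
migration = {size: round(sum(t), 2) for size, t in intervals.items()}
for size, ms in migration.items():
    print(f"{size:>6}: {ms:7.2f} ms")
```

The migration path takes roughly 26 to 29 ms up to 64k and grows with Tproc for larger objects, far below the fail-over times measured on the previous slide for the same sizes.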
                            Overhead
   Compared with a server cluster built with a Server
    Load Balancer (Layer-4 routing)
   Distributor node
    – Pentium-II 350 MHz
    – 64 MB RAM
    – 100 Mbps Fast Ethernet

   Overhead associated with the fault-tolerance
    mechanism (user-perceived latency)
    – Static content
          Request size (KB)      4K       8K      32K      64K      256K     1024K
          Our system (ms)      27.19    36.07   174.58   312.90   1151.04   4824.10
          Baseline (ms)        23.58    32.25   170.24   308.39   1145.62   4815.17
          Overhead (ms)         3.61     3.82     4.34     4.51      5.42      8.93

                   Overhead (cont’)
   Dynamic content
               Type       Baseline     Our system
               Light      0.842 sec    0.851 sec
               Moderate   3.128 sec    3.149 sec
               Heavy      5.432 sec    1.246 sec

   For session-based requests, our protocol introduces an
    overhead of about 8% over a baseline system that offers
    no guarantee.
   The experiment was performed on a local area network,
    where high-speed connections are the norm; the short
    observed response times make the relative overhead
    look large.
   The overhead would be insignificant compared with the
    latency over wide-area networks.

                        Overhead
                      (Throughput)
   Peak throughput
    – Layer-4 cluster: 2489 requests/sec
    – Our system: 2378 requests/sec
    – Our fault-tolerance mechanism therefore costs about 4.5%
      of peak throughput, not a significant degradation.
   At peak throughput, the CPU utilization of the
    distributor is 67%, and our system consumes only
    slightly more memory (2.3 MB) than the Layer-4
    dispatcher.
   This indicates that our mechanism is not a performance
    bottleneck.
    – In fact, the bottleneck in our experiment was the
      network interface of the distributor node.

                         Long-running Test


   [Figure: Throughput (requests/second, axis 900 to 1300) over a 25-second timeline; annotations: one server fails, three servers fail, spawn two new server nodes]

                 Service Reliability
Our system guarantees service reliability at three levels:
 The management system provides a status-detection
  mechanism that can detect and mask server failures.
 A request-failover mechanism enables an ongoing Web
  request to be smoothly migrated to and recovered on
  another server node in the presence of server failure
  or overload.
 A mechanism to prevent a single point of failure.

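The first level, detecting and masking server failures, can be illustrated with a timeout-based heartbeat detector. The management system's actual detection protocol is not described here, so the class name, the 3-second timeout, and the explicit `now` argument (used instead of a real clock so the logic is easy to test) are assumptions. The distributor would mask a failure simply by routing only to the nodes returned by `alive()`.

```python
from __future__ import annotations


class HeartbeatMonitor:
    """Timeout-based failure detector (hypothetical sketch)."""

    def __init__(self, nodes: list[str], timeout: float = 3.0) -> None:
        self.timeout = timeout  # assumed seconds of silence before a node is declared failed
        self.last_seen = {n: 0.0 for n in nodes}

    def heartbeat(self, node: str, now: float) -> None:
        """Record a status report received from `node` at time `now`."""
        self.last_seen[node] = now

    def alive(self, now: float) -> list[str]:
        return [n for n, t in self.last_seen.items() if now - t <= self.timeout]

    def failed(self, now: float) -> list[str]:
        return [n for n, t in self.last_seen.items() if now - t > self.timeout]
```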
                    Work in progress
   Load balancing
     – Sophisticated load balancing in the distributor
     – Dynamic content-rearrangement facility to further ensure
       an even load distribution or meet QoS requirements
   Security
   Quality of Service support
   Service Level Agreement
     – enable content owners to specify their requirements,
       such as bandwidth usage, content type, number or placement
       of content replicas, or the required degree of service
       reliability.
   We are implementing mechanisms for configuring the
    management policy to meet the complex requirements of
    different customers.
   Hardware support

                  Contributions
   Content-aware request routing mechanism
   Java-based management system
   Idea of the URL table
   Performance speedup by URL formalization
   Enabling fault resilience for Web services
   Enhanced reliability for Internet services
   Content-aware load-balancing algorithm
   QoS support
   Transaction-based services support
   System robustness
   Service Level Agreement and system policy

                             Conclusion
   Web service providers must gradually move to more
    sophisticated services as the content of a Web site or
    e-business operations become more complex.
   The Internet service supported by our system will be
    – Scalable
           Cluster-based architecture
           Efficient content-aware routing
    – Reliable
           Failure detection
           Request failover
           System robustness
    – Highly manageable
           Java-based management system


