Web Servers: Implementation and Performance

Erich Nahum
IBM T.J. Watson Research Center
www.research.ibm.com/people/n/nahum
nahum@us.ibm.com
         Contents of This Tutorial
   Introduction to HTTP
   HTTP Servers:
    • Outline of an HTTP Server Transaction
    • Server Models: Processes, Threads, Events
    • Event Notification: Asynchronous I/O
   HTTP Server Workloads:
    • Workload Characteristics
    • Workload Generation
   Server TCP Issues:
    • Introduction to TCP
    • Server TCP Dynamics
    • Server TCP Implementation Issues


       Things Not Covered in Tutorial

     Clusters
     Client-side issues: DNS, HTML rendering
     Proxies: some similarities, many differences
     Dynamic Content: CGI, PHP, JSP, etc.
     QoS for Web Servers
     SSL/TLS and HTTPS
     Content Distribution Networks (CDN’s)
     Security and Denial of Service


    Assumptions and Expectations

   Some familiarity with WWW as a user
    (Has anyone here not used a browser?)
   Some familiarity with networking concepts
    (e.g., unreliability, reordering, race conditions)
   Familiarity with systems programming
    (e.g., know what sockets, hashing, caching are)
   Examples will be based on C & Unix
    taken from BSD, Linux, AIX, and real servers
    (sorry, Java and Windows fans)


        Objectives and Takeaways

After this tutorial, hopefully we will all know:

   Basics of server implementation & performance
   Pros and cons of various server architectures
   Difficulties in workload generation
   Interactions between HTTP and TCP
   Design loop of implement, measure, profile, debug,
    and fix

Many lessons should be applicable to any networked
server, e.g., files, mail, news, DNS, LDAP, etc.

                               Timeline
Section                                                      Min
Introduction to HTTP                                         20
Outline of an HTTP Server Transaction                        25
Server Models: Processes, Threads, Events                    20
Event Notification: Asynchronous I/O                         20
Workload Characteristics                                     35
Break                                                        15
Workload Generation                                          40
Introduction to TCP                                          35
Server TCP Dynamics                                          20
Server TCP Implementation                                    25



                  Acknowledgements
         Many people contributed comments and
         suggestions to this tutorial, including:


  Abhishek Chandra
  Suresh Chari
  Mark Crovella
  Peter Druschel
  Balachander Krishnamurthy
  Jim Kurose
  Vivek Pai
  Jennifer Rexford
  Anees Shaikh


                 Errors are all mine, of course.



Chapter 1: Introduction to HTTP




              Introduction to HTTP

    [Diagram: a laptop running Netscape and a desktop running Explorer
     each exchange HTTP requests and responses with a server running
     Apache]
     HTTP: Hypertext Transfer Protocol
       • Communication protocol between clients and servers
       • Application layer protocol for WWW
     Client/Server model:
       • Client: browser that requests, receives, displays object
       • Server: receives requests and responds to them
     Protocol consists of various operations
       • Few for HTTP 1.0 (RFC 1945, 1996)
       • Many more in HTTP 1.1 (RFC 2616, 1999)

    How are Requests Generated?

   User clicks on something
   Uniform Resource Locator (URL):
    •   http://www.nytimes.com
    •   https://www.paymybills.com
    •   ftp://ftp.kernel.org
    •   news://news.deja.com
    •   telnet://gaia.cs.umass.edu
    •   mailto:nahum@us.ibm.com
   Different URL schemes map to different services
   Hostname is converted from a name to a 32-bit IP
    address (DNS resolve)
   Connection is established to server
        Most browser requests are HTTP requests.

              What Happens Then?
 Client downloads HTML document
   • Sometimes called "container page"
   • Typically in text format (ASCII)
   • Contains instructions for rendering
     (e.g., background color, frames)
   • Links to other pages
 Many have embedded objects:
   • Images: GIF, JPG (logos, banner ads)
   • Usually automatically retrieved
       • i.e., without user involvement
       • can sometimes be controlled
         (e.g., browser options, junkbusters)

Sample HTML file:

     <html>
     <head>
     <meta name="Author" content="Erich Nahum">
     <title> Linux Web Server Performance </title>
     </head>
     <body text="#00000">
     <img width=31 height=11 src="ibmlogo.gif">
     <img src="images/new.gif">
     <h1>Hi There!</h1>
     Here's lots of cool linux stuff!
     <a href="more.html">Click here</a> for more!
     </body>
     </html>
        So What’s a Web Server Do?
     Respond to client requests, typically a browser
       • Can be a proxy, which aggregates client requests (e.g., AOL)
       • Could be search engine spider or custom (e.g., Keynote)
     May have work to do on client’s behalf:
       •   Is the client’s cached copy still good?
       •   Is client authorized to get this document?
       •   Is client a proxy on someone else’s behalf?
       •   Run an arbitrary program (e.g., stock trade)
     Hundreds or thousands of simultaneous clients
      Hard to predict how many will show up on a given day
     Many requests are in progress concurrently

             Server capacity planning is non-trivial.

What do HTTP Requests Look Like?
           GET /images/penguin.gif HTTP/1.0
           User-Agent: Mozilla/0.9.4 (Linux 2.2.19)
           Host: www.kernel.org
           Accept: text/html, image/gif, image/jpeg
           Accept-Encoding: gzip
           Accept-Language: en
           Accept-Charset: iso-8859-1,*,utf-8
           Cookie: B=xh203jfsf; Y=3sdkfjej
           <cr><lf>


    Messages are in ASCII (human-readable)
    Carriage-return and line-feed indicate end of headers
    Headers may communicate private information
      • (e.g., browser, OS, cookie information, etc.)
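
As an illustration (not from the original tutorial), the request line
above can be split into its three tokens with a single sscanf();
reqBuffer is assumed to hold the raw request, and real servers parse
far more defensively:

     #include <stdio.h>

     char method[16], uri[1024], version[16];
     /* "GET /images/penguin.gif HTTP/1.0" -> method, URI, version */
     if (sscanf(reqBuffer, "%15s %1023s %15s", method, uri, version) != 3) {
         /* malformed request line: respond with 400 Bad Request */
     }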

What Kind of Requests are there?

Called Methods:
 GET: retrieve a file (95% of requests)
 HEAD: just get meta-data (e.g., mod time)
 POST: submitting a form to a server
 PUT: store enclosed document as URI
 DELETE: remove named resource
 LINK/UNLINK: in 1.0, gone in 1.1
 TRACE: http “echo” for debugging (added in 1.1)
 CONNECT: used by proxies for tunneling (1.1)
 OPTIONS: request for server/proxy options (1.1)




  What Do Responses Look Like?
    HTTP/1.0 200 OK
    Server: Tux 2.0
    Content-Type: image/gif
    Content-Length: 43
    Last-Modified: Fri, 15 Apr 1994 02:36:21 GMT
    Expires: Wed, 20 Feb 2002 18:54:46 GMT
    Date: Mon, 12 Nov 2001 14:29:48 GMT
    Cache-Control: no-cache
    Pragma: no-cache
    Connection: close
    Set-Cookie: PA=wefj2we0-jfjf
    <cr><lf>
    <data follows…>

       • Similar format to requests (i.e., ASCII)


          What Responses are There?
     1XX: Informational (def’d in 1.0, used in 1.1)
       100 Continue, 101 Switching Protocols
     2XX: Success
       200 OK, 206 Partial Content
     3XX: Redirection
       301 Moved Permanently, 304 Not Modified
     4XX: Client error
       400 Bad Request, 403 Forbidden, 404 Not Found
     5XX: Server error
        500 Internal Server Error, 503 Service
         Unavailable, 505 HTTP Version Not Supported




    What are all these Headers?
     Specify capabilities and properties:

   General:
     Connection, Date
   Request:
     Accept-Encoding, User-Agent
   Response:
     Location, Server type
   Entity:
     Content-Encoding, Last-Modified
   Hop-by-hop:
     Proxy-Authenticate, Transfer-Encoding


     Server must pay attention to respond properly.

    Summary: Introduction to HTTP

   The major application on the Internet
     • Majority of traffic is HTTP (or HTTP-related)
   Client/server model:
     • Clients make requests, servers respond to them
     • Done mostly in ASCII text (helps debugging!)
   Various headers and commands
     • Too many to go into detail here
     • We’ll focus on common server ones
     • Many web books/tutorials exist (e.g., Krishnamurthy &
       Rexford 2001)




  Chapter 2: Outline of a Typical
        HTTP Transaction




    Outline of an HTTP Transaction

 In this section we go over the basics of servicing an HTTP GET
  request from user space
 For this example, we'll assume a single process running in user
  space, similar to Apache 1.3
 At each stage, see what the costs/problems can be
 Also try to think of where costs can be optimized
 We'll describe relevant socket operations as we go

The server in a nutshell:

     initialize;
     forever do {
        get request;
        process;
        send response;
        log request;
     }
                  Readying a Server
     s = socket();             /* allocate listen socket         */
     bind(s, 80);              /* bind to TCP port 80            */
     listen(s);                /* indicate willingness to accept */
     while (1) {
         newconn = accept(s);  /* accept new connection          */


 The first thing a server does is notify the OS that it is interested
  in WWW requests; these typically arrive on TCP port 80. Other
  services use different ports (e.g., SSL is on 443)
 It allocates a socket and bind()s it to the address (port 80)
 The server calls listen() on the socket to indicate willingness to
  receive requests
 It calls accept() to wait for a request to come in (and blocks)
 When accept() returns, we have a new socket which represents a new
  connection to a client
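
A minimal compilable version of the fragment above, filling in the
details the pseudocode elides (address structure, byte ordering,
backlog); error handling is omitted for brevity:

     #include <string.h>
     #include <sys/socket.h>
     #include <netinet/in.h>

     int s = socket(AF_INET, SOCK_STREAM, 0);      /* allocate listen socket */
     struct sockaddr_in addr;
     memset(&addr, 0, sizeof(addr));
     addr.sin_family      = AF_INET;
     addr.sin_addr.s_addr = htonl(INADDR_ANY);
     addr.sin_port        = htons(80);             /* TCP port 80            */
     bind(s, (struct sockaddr *)&addr, sizeof(addr));
     listen(s, 128);                               /* willingness to accept  */
     while (1) {
         int newconn = accept(s, NULL, NULL);      /* blocks until a client  */
         /* service newconn as in the following slides */
     }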


               Processing a Request
        remoteIP = getpeername(newconn);
        remoteHost = gethostbyaddr(remoteIP);
        gettimeofday(currentTime);
        read(newconn, reqBuffer, sizeof(reqBuffer));
        reqInfo = serverParse(reqBuffer);


 getpeername() called to get the remote host's address
     • for logging purposes (optional, but done by most)
 gethostbyaddr() called to map that address to a host name
     • again for logging purposes
   gettimeofday() is called to get time of request
    • both for Date header and for logging
   read() is called on new socket to retrieve request
   request is determined by parsing the data
    • “GET /images/jul4/flag.gif”
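
A sketch of the same sequence with real signatures (using
getpeername()/gethostbyaddr() as corrected above; error handling
elided):

     #include <sys/time.h>
     #include <sys/socket.h>
     #include <netinet/in.h>
     #include <netdb.h>
     #include <unistd.h>

     struct sockaddr_in peer;
     socklen_t plen = sizeof(peer);
     getpeername(newconn, (struct sockaddr *)&peer, &plen); /* remote address */
     struct hostent *h = gethostbyaddr(&peer.sin_addr,      /* IP -> name,    */
                                       sizeof(peer.sin_addr),/* for the log   */
                                       AF_INET);
     struct timeval now;
     gettimeofday(&now, NULL);                /* for Date header and the log  */
     char reqBuffer[8192];
     ssize_t n = read(newconn, reqBuffer, sizeof(reqBuffer) - 1);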

       Processing a Request (cont)
     fileName = parseOutFileName(requestBuffer);
     fileAttr = stat(fileName);
     serverCheckFileStuff(fileName, fileAttr);
     open(fileName);

   stat() called to test file path
    • to see if file exists/is accessible
    • may not be there, may only be available to certain people
    • "/microsoft/top-secret/plans-for-world-domination.html"
 stat() also used for file meta-data
     • e.g., size of file, last modified time
     • "Have plans changed since last time I checked?"
 might have to stat() multiple files just to get to the end of a path
     • e.g., 4 stats for the four path components in the example above
   assuming all is OK, open() called to open the file
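
With real signatures the check looks roughly as follows; send_error()
is a hypothetical helper, not part of the tutorial's code:

     #include <sys/stat.h>
     #include <fcntl.h>

     struct stat sb;
     if (stat(fileName, &sb) < 0)
         return send_error(newconn, 404);   /* hypothetical helper          */
     /* sb.st_size feeds Content-Length; sb.st_mtime feeds Last-Modified
        and the If-Modified-Since (304) comparison                          */
     int fd = open(fileName, O_RDONLY);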
           Responding to a Request
    read(fileName, fileBuffer);
    headerBuffer = serverFigureHeaders(fileName, reqInfo);
    write(newSock, headerBuffer);
    write(newSock, fileBuffer);
    close(newSock);
    close(fileName);
    write(logFile, requestInfo);

    read() called to read the file into user space
    write() is called to send HTTP headers on socket
     (early servers called write() for each header!)
    write() is called to write the file on the socket
    close() is called to close the socket
    close() is called to close the open file descriptor
    write() is called on the log file



    Optimizing the Basic Structure


   As we will see, a great deal of locality exists in
    web requests and web traffic.

   Much of the work described above doesn't really
    need to be performed each time.

   Optimizations fall under 2 categories: caching and
    custom OS primitives.



                 Optimizations: Caching
       Idea is to exploit locality in client requests. Many
       files are requested over and over (e.g., index.html).

 Why open and close files over and over again? Instead, cache open
  file FDs and manage them LRU:

      fileDescriptor = lookInFDCache(fileName);

 Why stat them again and again? Cache path name access
  characteristics:

      metaInfo = lookInMetaInfoCache(fileName);

 Again, cache HTTP header info on a per-URL basis, rather than
  re-generating it over and over:

      headerBuffer = lookInHTTPHeaderCache(fileName);
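
One possible shape for the FD cache, as a sketch: fdcache_lookup(),
fdcache_touch(), and fdcache_insert() are hypothetical helpers over a
hash table with an LRU list, not code from the tutorial:

     #include <fcntl.h>

     int cached_open(const char *name)
     {
         fdcache_entry *e = fdcache_lookup(name);  /* hypothetical hash lookup */
         if (e != NULL) {
             fdcache_touch(e);                     /* move to front of LRU     */
             return e->fd;
         }
         int fd = open(name, O_RDONLY);
         if (fd >= 0)
             fdcache_insert(name, fd);             /* may evict the LRU victim */
         return fd;
     }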

         Optimizations: Caching (cont)

 Instead of reading and writing the data, cache data, as well as
  meta-data, in user space:

      fileData = lookInFileDataCache(fileName);

 Even better, mmap() the file so that two copies don't exist in
  both user and kernel space:

      fileData = lookInMMapCache(fileName);

 Since we see the same clients over and over, cache the reverse
  name lookups (or better yet, don't do resolves at all and log
  only IP addresses):

      remoteHostName = lookInRemoteHostCache(remoteIP);

     Optimizations: OS Primitives
 Rather than calling accept(), getpeername() & read(), add a new
  primitive, acceptExtended(), which combines the three:

      acceptExtended(listenSock, &newSock, readBuffer, &remoteInfo);

 Instead of calling gettimeofday(), use a memory-mapped counter that
  is cheap to access (a few instructions rather than a system call):

      currentTime = *mappedTimePointer;

 Instead of calling write() many times, use a single writev(), as
  shown below:

      buffer[0] = firstHTTPHeader;
      buffer[1] = secondHTTPHeader;
      buffer[2] = fileDataBuffer;
      writev(newSock, buffer, 3);
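
In the real API, the "buffer" array above is an array of struct
iovec; a faithful version of the sketch (buffer pointers and lengths
assumed to be computed elsewhere) would be:

     #include <sys/uio.h>

     struct iovec iov[3];
     iov[0].iov_base = firstHTTPHeader;  iov[0].iov_len = firstHdrLen;
     iov[1].iov_base = secondHTTPHeader; iov[1].iov_len = secondHdrLen;
     iov[2].iov_base = fileDataBuffer;   iov[2].iov_len = fileDataLen;
     writev(newSock, iov, 3);            /* one system call instead of three */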


                       OS Primitives (cont)
 Rather than calling read() & write(), or write() with an mmap()'ed
  file, use a new primitive called sendfile() (or transmitfile()).
  Bytes stay in the kernel.
 While we're at it, add a header option to sendfile() so that we
  don't have to call write() at all:

      httpInfo = cacheLookup(reqBuffer);
      sendfile(newConn,
               httpInfo->headers,
               httpInfo->fileDescriptor,
               OPT_CLOSE_WHEN_DONE);

 Also add an option to close the connection so that we don't have to
  call close() explicitly.

All this assumes proper OS support. Most OSes have it these days.
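
For reference, the Linux form of the primitive takes neither headers
nor a close option (FreeBSD's sendfile() is closer to the extended
form sketched above, accepting header and trailer iovecs). A minimal
Linux sketch:

     #include <sys/sendfile.h>

     off_t off = 0;
     /* headers still sent separately, e.g., with the writev() shown earlier */
     sendfile(newConn, fileDescriptor, &off, fileLength);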
    An Accelerated Server Example
    acceptex(socket, newConn, reqBuffer, remoteHostInfo);
    httpInfo = cacheLookup(reqBuffer);
    sendfile(newConn, httpInfo->headers,
           httpInfo->fileDescriptor, OPT_CLOSE_WHEN_DONE);
    write(logFile, requestInfo);


   acceptex() is called
    • gets new socket, request, remote host IP address
   string match in hash table is done to parse request
    • hash table entry contains relevant meta-data, including
      modification times, file descriptors, permissions, etc.
   sendfile() is called
    • pre-computed header, file descriptor, and close option
   log written back asynchronously (buffered write()).
                                 That’s it!
                        Complications

   Custom APIs have problems:
    • Additional test coverage is required
    • API may not be sufficiently general to be worth it
   Take, for example, acceptex():
    • Work has shown it doesn’t make a big difference
    • Some server applications write before reading (e.g., FTP)
    • Result is no OS’s outside of MS have it
   Counter-example is sendfile():
    • Useful for many types of servers (Web, FTP, SMB, NFS)
    • Result is available on virtually every OS




                Complications (cont)

   Much of this assumes sharing is easy:
    • but, this is dependent on the server architectural model
    • if multiple processes are being used, as in Apache, it is
      difficult to share data structures.
   Take, for example, mmap():
    • mmap() maps a file into the address space of a process.
    • a file mmap'ed in one address space can’t be re-used for
      a request for the same file served by another process.
    • Apache 1.3 does use mmap() instead of read().
    • in this case, mmap() eliminates one data copy versus a
      separate read() & write() combination, but process will
      still need to open() and close() the file.
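
A minimal sketch of that mmap()-based send path (assumes fd and sb
come from an earlier open()/stat(); error handling elided):

     #include <sys/mman.h>
     #include <unistd.h>

     void *data = mmap(NULL, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
     write(newSock, data, sb.st_size);  /* one copy (page cache -> socket)    */
     munmap(data, sb.st_size);          /* vs. two copies with read()+write() */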




                Complications (cont)
   Similarly, meta-data info needs to be shared:
    • e.g., file size, access permissions, last modified time, etc.
   While locality is high, cache misses can and do
    happen sometimes:
    • if previously unseen file requested, process can block
      waiting for disk.
   OS can impose other restrictions:
    • e.g., limits on number of open file descriptors.
    • e.g., sockets typically allow buffering about 64 KB of data.
      If a process tries to write() a 1 MB file, it will block until
      other end receives the data.
   Need to be able to cope with the misses without
    slowing down the hits


    Summary: Outline of a Typical
         HTTP Transaction
   A server can perform many steps in the process of
    servicing a request
   Different actions depending on many factors:
    • e.g., 304 not modified if client's cached copy is good
    • e.g., 404 not found, 401 unauthorized
   Most requests are for small subset of data:
    • we’ll see more about this in the Workload section
    • we can leverage that fact for performance
   Architectural model affects possible optimizations
    • we’ll go into this in more detail in the next section




             Chapter 3:
     Server Architectural Models




     Server Architectural Models

Several approaches to server structure:
 Process based: Apache, NCSA
 Thread-based: JAWS, IIS
 Event-based: Flash, Zeus
 Kernel-based: Tux, AFPA, ExoKernel


We will describe the advantages and disadvantages
  of each.
Fundamental tradeoffs exist between performance,
  protection, sharing, robustness, extensibility, etc.



      Process Model (ex: Apache)



   Process created to handle each new request:
    • Process can block on appropriate actions,
      (e.g., socket read, file read, socket write)
    • Concurrency handled via multiple processes
   Quickly becomes unwieldy:
    • Process creation is expensive.
    • Instead, pre-forked pool is created.
    • Upper limit on # of processes is enforced
         • First by the server, eventually by the operating system.
          • Concurrency is limited by this upper bound

     Process Model: Pros and Cons

   Advantages:
    • Most importantly, consistent with programmer's way of
      thinking. Most programmers think in terms of linear
      series of steps to accomplish task.
    • Processes are protected from one another; can't nuke
      data in some other address space. Similarly, if one
      crashes, others unaffected.
   Disadvantages:
    • Slow. Forking is expensive, allocating stack, VM data
      structures for each process adds up and puts pressure on
      the memory system.
    • Difficulty in sharing info across processes.
    • Have to use locking.
    • No control over scheduling decisions.


       Thread Model (Ex: JAWS)




   Use threads instead of processes. Threads
    consume fewer resources than processes (e.g.,
    stack, VM allocation).
   Forking and deleting threads is cheaper than
    processes.
   Similarly, pre-forked thread pool is created. May
    be limits to numbers but hopefully less of an issue
    than with processes since fewer resources
    required.
     Thread Model: Pros and Cons
   Advantages:
    • Faster than processes. Creating/destroying cheaper.
    • Maintains programmer's way of thinking.
    • Sharing is enabled by default.
   Disadvantages:
    • Less robust. Threads not protected from each other.
     • Requires proper OS support; otherwise, if one thread blocks on
       a file read, the whole address space blocks.
    • Can still run out of threads if servicing many clients
      concurrently.
    • Can exhaust certain per-process limits not encountered
      with processes (e.g., number of open file descriptors).
    • Limited or no control over scheduling decisions.



           Event Model (Ex: Flash)
         while (1) {
           accept new connections until none remaining;
           call select() on all active file descriptors;
           for each FD:
             if (fd ready for reading) call read();
             if (fd ready for writing) call write();
           }




 Use a single process and deal with requests in an event-driven
  manner, like a giant switchboard.
 Use the non-blocking option (O_NDELAY) on sockets, do everything
  asynchronously, never block on anything, and have the OS notify us
  when something is ready.
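
Setting that option is one fcntl() call per descriptor; O_NONBLOCK is
the POSIX spelling of the older O_NDELAY flag:

     #include <fcntl.h>

     int flags = fcntl(fd, F_GETFL, 0);
     fcntl(fd, F_SETFL, flags | O_NONBLOCK);   /* POSIX name for O_NDELAY */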


        Event-Driven: Pros and Cons
   Advantages:
    •   Very fast.
    •   Sharing is inherent, since there’s only one process.
    •   Don't even need locks as in thread models.
    •   Can maximize concurrency in request stream easily.
    •   No context-switch costs or extra memory consumption.
    •   Complete control over scheduling decisions.

   Disadvantages:
    • Less robust. Failure can halt whole server.
    • Pushes per-process resource limits (like file descriptors).
    • Not every OS has full asynchronous I/O, so can still
      block on a file read. Flash uses helper processes to deal
      with this (AMPED architecture).


        In-Kernel Model (Ex: Tux)

               HTTP
          -------------- user/kernel        -------------- user/kernel
               SOCK      boundary                HTTP      boundary
               TCP                               TCP
               IP                                IP
               ETH                               ETH

          user-space server                 kernel-space server


   Dedicated kernel thread for HTTP requests:
     • One option: put whole server in kernel.
     • More likely, just deal with static GET requests in kernel to
       capture majority of requests.
     • Punt dynamic requests to full-scale server in user space,
       such as Apache.

     In-Kernel Model: Pros and Cons

     In-Kernel Event Model:
       • Avoids transitions to user space, copies across u-k boundary, etc.
       • Leverages already existing asynchronous primitives in the kernel
         (kernel doesn't block on a file read, etc.).
     Advantages:
       • Extremely fast. Tight integration with kernel.
       • Small component without full server optimizes common case.
     Disadvantages:
       • Less robust. Bugs can crash whole machine, not just server.
       • Harder to debug and extend, since kernel programming required,
         which is not as well-known as sockets.
       • Similarly, harder to deploy. APIs are OS-specific (Linux, BSD,
         NT), whereas sockets & threads are (mostly) standardized.
       • HTTP evolving over time, have to modify kernel code in response.



     So What’s the Performance?




   Graph shows server throughput for Tux, Flash, and Apache.
   Experiments done on 400 MHz P/II, gigabit Ethernet, Linux
    2.4.16, 8 client machines, WaspClient workload generator
   Tux is fastest, but Flash close behind
    Summary: Server Architectures
   Many ways to code up a server
    • Tradeoffs in speed, safety, robustness, ease of
      programming and extensibility, etc.
   Multiple servers exist for each kind of model
    • Not clear that a consensus exists.
   Better case for in-kernel servers as devices
    • e.g. reverse proxy accelerator, Akamai CDN node
   User-space servers have a role:
     • OS should provide proper primitives for efficiency
    • Leave HTTP-protocol related actions in user-space
    • In this case, event-driven model is attractive
   Key pieces to a fast event-driven server:
    • Minimize copying
    • Efficient event notification mechanism


    Chapter 4: Event Notification




    Event Notification Mechanisms
   Recall how Flash works:
    • One process, many FD's, calling select() on all active
      socket descriptors.
    • All sockets are set using O_NDELAY flag (non-blocking)
    • Single address space aids sharing for performance
    • File reads and writes don't have non-blocking support,
      thus helper processes (AMPED architecture)
   Point is to exploit concurrency/parallelism:
    • Can read one socket while waiting to write on another
   Event notification:
    • Mechanism for kernel and application to notify each
      other of interesting/important events
    • E.g., connection arrivals, socket closes, data available to
      read, space available for writing


       State-Based: Select & Poll
   select() and poll():
    • State-based: Is socket ready for reading/writing?
    • select() interface has FD_SET bitmasks turned on/off
      based on interest
     • poll() uses a simple array: a larger structure, but a simpler
       implementation
   Performance costs:
    • Kernel scans O(N) descriptors to set bits
    • User application scans O(N) descriptors
    • select() bit manipulation can be expensive
   Problems:
    • Traffic is bursty, connections not active all at once
         • # (active connections) << # (open connections).
         • Costs are O(total connections), not O(active connections)
    • Application keeps specifying interest set repeatedly
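
The O(N) costs above are visible in the shape of a typical select()
loop; interested[] and handle_read() here are hypothetical stand-ins
for the application's bookkeeping:

     #include <sys/select.h>

     fd_set rset;
     FD_ZERO(&rset);
     for (int fd = 0; fd <= maxfd; fd++)      /* interest set re-declared */
         if (interested[fd])                  /* on every single call     */
             FD_SET(fd, &rset);
     select(maxfd + 1, &rset, NULL, NULL, NULL);
     for (int fd = 0; fd <= maxfd; fd++)      /* kernel and application   */
         if (FD_ISSET(fd, &rset))             /* both scan all N FDs      */
             handle_read(fd);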

         Event-Based Notification
          Banga, Mogul & Druschel (USENIX 99)
   Propose an event based approach, rather than
    state-based:
    • Something just happened on socket X, rather than socket
      X is ready for reading or writing
    • Server takes event as indication socket might be ready
    • Multiple events can happen on a single socket (e.g.,
      packets draining (implying writeable) or accumulating
      (readable))
   API has following:
    • Application notifies kernel by calling declare_interest()
      once per file descriptor (e.g., after accept()), rather
      than multiple times like in select()/poll()
    • Kernel queues events internally
    • Application calls get_next_event() to see changes
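
A purely illustrative sketch of how such a loop might look; the
function names come from the paper, but these signatures and
constants are invented for illustration:

     /* once per descriptor, e.g., right after accept(): */
     declare_interest(newSock, INTEREST_READ | INTEREST_WRITE);

     for (;;) {
         event_t ev;                /* illustrative event record        */
         get_next_event(&ev, 1);    /* 1 = block until an event arrives */
         handle(ev.fd, ev.type);    /* hypothetical dispatcher          */
     }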

    Event-Based Notification (cont)
   Problems:
    • Kernel has to allocate storage for event queue. Little's
      law says it needs to be proportional to the event rate
    • Bursty applications could overflow queue
    • Can address multiple events by coalescing based on FD
    • Results in storage O(total connections).
   Application has to change the way it thinks:
    • Respond to events, instead of checking state.
    • If events are missed, connections might get stuck.
   Evaluation shows it scales nicely:
    • cost is O(active) not O(total)
   Windows NT has something similar:
    • called IO completion ports



    Notification in the Real World
   POSIX Real-Time Signals:
     • Different concept: a Unix signal is raised when something is
       ready on a file descriptor.
    • Signals are expensive and difficult to control (e.g., no
      ordering), so applications can suppress signals and then
      retrieve them via sigwaitinfo()
    • If signal queue fills up, events will be dropped. A
      separate signal is raised to notify application about
      signal queue overflow.
   Problems:
    • If signal queue overflows, then app must fall back on
      state-based approach. Chandra and Mosberger propose
      signal-per-fd (coalescing events per file descriptor).
    • Only one event is retrieved at a time: Provos and Lever
      propose sigtimedwait4() to retrieve multiple signals at
      once

    Notification in the Real World
   Sun's /dev/poll:
    • App notifies kernel by writing to special file /dev/poll to
      express interest
    • App does IOCTL on /dev/poll for list of ready FD's
    • App and kernel are still both state based
    • Kernel still pays O(total connections) to create FD list
   Libenzi’s /dev/epoll (patch for Linux 2.4):
    • Uses /dev/epoll as interface, rather than /dev/poll
    • Application writes interest to /dev/epoll and IOCTL's to
      get events
    • Events are coalesced on a per-FD basis
    • Semantically identical to RT signals with sig-per-fd &
      sigtimedwait4().
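
The same declare-once, coalesce-per-FD model later became Linux's
epoll system calls (merged in the 2.6 kernel); a sketch for
comparison, with handle_event() as a hypothetical dispatcher:

     #include <sys/epoll.h>

     int ep = epoll_create(1024);                 /* kernel-side event queue */
     struct epoll_event ev = { .events = EPOLLIN, .data.fd = newSock };
     epoll_ctl(ep, EPOLL_CTL_ADD, newSock, &ev);  /* declare interest once   */

     struct epoll_event ready[64];
     int n = epoll_wait(ep, ready, 64, -1);       /* O(active), not O(total) */
     for (int i = 0; i < n; i++)
         handle_event(ready[i].data.fd);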



       Real File Asynchronous I/O

   Like setting O_NDELAY (non-blocking) on file
    descriptors:
    • Application can queue reads and writes on FDs and pick
      them up later (like dry cleaning)
    • Requires support in the file system (e.g., callbacks)
   Currently doesn't exist on many OS's:
    • POSIX specification exists
    • Solaris has non-standard version
    • Linux has it slated for 2.5 kernel
   Two current candidates on Linux:
     • SGI's /dev/kaio and Ben LaHaise's /dev/aio
   Proper implementation would allow Flash to
    eliminate helpers

     Summary: Event Notification
   Goal is to exploit concurrency
    • Concurrency in user workloads means host CPU can
      overlap multiple events to maximize parallelism
    • Keep network, disk busy; never block
   Event notification changes applications:
    • state-based to event-based
    • requires a change in thinking
   Goal is to minimize costs:
    • user/kernel crossings and testing idle socket descriptors
   Event-based notification not yet fully deployed:
    • Most mechanisms only support network I/O, not file I/O
    • Full deployment of Asynchronous I/O spec should fix this



               Chapter 5:
        Workload Characterization




        Workload Characterization

   Why Characterize Workloads?
    • Gives an idea about traffic behavior
      ("Which documents are users interested in?")
    • Aids in capacity planning
      ("Is the number of clients increasing over time?")
    • Aids in implementation
      ("Does caching help?")

    How do we capture them?
    • Through server logs (typically enabled)
    • Through packet traces (harder to obtain and to process)




                Factors to Consider


    [Diagram: client? -- proxy? -- server?  (where along the path do we collect logs?)]

    Where do I get logs from?
      • Client logs give us an idea, but not necessarily the same
      • Same for proxy logs
      • What we care about is the workload at the server
     Is trace representative?
      • Corporate POP vs. News vs. Shopping site
    What kind of time resolution?
      • e.g., second, millisecond, microsecond
    Does trace/log capture all the traffic?
      • e.g., incoming link only, or one node out of a cluster
              Probability Refresher
   Lots of variability in workloads
    •   Use probability distributions to express
    •   Want to consider many factors

   Some terminology/jargon:
     • Mean: average of samples
     • Median: half are bigger, half are smaller
     • Percentiles: the value below which a given percentage of
       samples fall (the median is the 50th percentile)

   Heavy-tailed: $\Pr[X > x] \sim c\,x^{-a}$ as $x \to \infty$


            Important Distributions
Some frequently-seen distributions:

   Normal (mean $\mu$, standard deviation $\sigma$):

       $f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$

   Lognormal ($x \ge 0$; $\sigma > 0$):

       $f(x) = \frac{1}{x \sigma \sqrt{2\pi}}\, e^{-(\ln(x)-\mu)^2 / (2\sigma^2)}$

   Exponential ($x \ge 0$; rate $\lambda$):

       $f(x) = \lambda e^{-\lambda x}$

   Pareto ($x \ge k$; shape $a$, scale $k$):

       $f(x) = a k^a / x^{a+1}$
                    More Probability




        • Graph shows 3 distributions with average = 2.
        • Note average ≠ median in all cases!
        • Different distributions have different "weight" in the tail.
              What Info is Useful?

   Request methods
    • GET, POST, HEAD, etc.
   Response codes
    • success, failure, not-modified, etc.
   Size of requested files
   Size of transferred objects
   Popularity of requested files
   Numbers of embedded objects
   Inter-arrival time between requests
   Protocol support (1.0 vs. 1.1)



        Sample Logs for Illustration
Name:                    Chess         Olympics             IBM           IBM
                          1997            1998              1998          2001
Description:         Kasparov-     Nagano 1998         Corporate     Corporate
                     Deep Blue        Olympics          Presence      Presence
                    Event Site      Event Site

Period:              2 weeks in        2 days in         1 day in      1 day in
                      May 1997         Feb 1998        June 1998      Feb 2001
Hits:                1,586,667       11,485,600        5,800,000    12,445,739

Bytes:               14,171,711      54,697,108       10,515,507    28,804,852
Clients:               256,382           86,021           80,921       319,698
URLS:                    2,293           15,788           30,465        42,874


    We’ll use statistics generated from these logs as examples.


                   Request Methods
                  Chess     Olympics   IBM      IBM
                  1997      1998       1998     2001
     GET          96%       99.6%      99.3%    97%
     HEAD         04%       00.3%      00.08%   02%
     POST         00.007%   00.04%     00.02%   00.2%

     Others:      noise     noise      noise    noise


   KR01: "overwhelming majority" are GETs, few POSTs
   IBM2001 trace starts seeing a few 1.1 methods (CONNECT,
    OPTIONS, LINK), but still very small (1/10^5 %)


                     Response Codes
    Code   Meaning                Chess       Olympics   IBM        IBM
                                  1997        1998       1998       2001

    200    OK                     85.32       76.02      75.28      67.72
    204    NO_CONTENT             --.--       --.--      00.00001   --.--
    206    PARTIAL_CONTENT        00.25       --.--      --.--      --.--
    301    MOVED_PERMANENTLY      00.05       --.--      --.--      --.--
    302    MOVED_TEMPORARILY      00.05       00.05      01.18      15.11
    304    NOT_MODIFIED           13.73       23.24      22.84      16.26
    400    BAD_REQUEST            00.001      00.0001    00.003     00.001
    401    UNAUTHORIZED           --.--       00.001     00.0001    00.001
    403    FORBIDDEN              00.01       00.02      00.01      00.009
    404    NOT_FOUND              00.55       00.64      00.65      00.79
    407    PROXY_AUTH             --.--       --.--      --.--      00.002
    500    SERVER_ERROR           --.--       00.003     00.006     00.07
    501    NOT_IMPLEMENTED        --.--       00.0001    00.0005    00.006
    503    SERVICE_UNAVAIL        --.--       --.--      00.0001    00.0003
    ???    UNKNOWN                00.0003     00.00004   00.005     00.0004


              Table shows percentage of responses.
              Majority are OK and NOT_MODIFIED.
              Consistent with numbers from AW96, KR01.
               Resource (File) Sizes




   Shows file/memory usage (not weighted by frequency!)
   Lognormal body, consistent with results from AW96, CB96, KR01.
   AW96, CB96: sizes have Pareto tail; Downey01: Sizes are lognormal.

           Tails from the File Size




       Shows the complementary CDF (CCDF) of file sizes.
       Haven’t done the curve fitting but looks Pareto-ish.
          Response (Transfer) Sizes




         Shows network usage (weighted by frequency of requests)
         Lognormal body, pareto tail, consistent with CBC95,
          AW96, CB96, KR01
             Tails of Transfer Size




    Shows the complementary CDF (CCDF) of transfer sizes.
   Looks somewhat Pareto-like; certainly some big transfers.
                Resource Popularity




            Follows a Zipf model: p(r) ∝ r^{-alpha}
                 (alpha = 1 is true Zipf; other values are "Zipf-like")
           Consistent with CBC95, AW96, CB96, PQ00, KR01
           Shows that caching popular documents is very effective
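
In normalized form (a standard formulation, not something fitted to
these traces), the probability of the r-th most popular of N
documents is

     $p(r) = \frac{r^{-\alpha}}{\sum_{i=1}^{N} i^{-\alpha}}$

With alpha = 1 the denominator is the harmonic number H_N; for the
IBM 2001 trace's N = 42,874 URLs, H_N ≈ 11.2, so the single most
popular document alone would draw roughly 9% of all requests.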
    Number of Embedded Objects

   Mah97: avg 3, 90% are 5 or less
   BC98: pareto distr, median 0.8, mean 1.7
   Arlitt98 World Cup study: median 15 objects, 90%
    are 20 or less
   MW00: median 7-17, mean 11-18, 90% 40 or less
   STA00: median 5,30 (2 traces), 90% 50 or less
   Mah97, BC98, SCJO01: embedded objects tend to
    be smaller than container objects
   KR01: median is 8-20, pareto distribution


Trend seems to be that number is increasing over time.

             Session Inter-Arrivals

   Inter-arrival time between successive requests
    • “Think time"
    • difference between user requests vs. ALL requests
    • partly depends on definition of boundary
   CB96: variability across multiple timescales, "self-
    similarity", average load very different from peak
    or heavy load
   SCJO01: log-normal, 90% less than 1 minute.
   AW96: independent and exponentially distributed
   KR01: pareto with a=1.5, session arrivals follow
    poisson distribution, but requests follow pareto



                       Protocol Support

     IBM.com 2001 logs:
       • Show roughly 53% of client requests are 1.1
     KA01 study:
       • 92% of servers claim to support 1.1 (as of Sep 00)
       • Only 31% actually do; most fail to comply with spec
     SCJO01 show:
       • Avg 6.5 requests per persistent connection
       • 65% have 2 connections per page, rest more.
       • 40-50% of objects downloaded by persistent connections



Appears that we are in the middle of a slow transition to 1.1

                Summary: Workload
                 Characterization

   Traffic is variable:
    • Responses vary across multiple orders of magnitude
   Traffic is bursty:
    • Peak loads much larger than average loads
   Certain files more popular than others
    • Zipf-like distribution captures this well
   Two-sided aspect of transfers:
    • Most responses are small (zero pretty common)
    • Most of the bytes are from large transfers
   Controversy over Pareto/log-normal distribution
   Non-trivial for workload generators to replicate


Chapter 6: Workload Generators




           Why Workload Generators?
   Allows stress-testing and bug-finding
   Gives us some idea of server capacity
   Allows us a scientific process to compare approaches
     • e.g., server models, gigabit adaptors, OS implementations
   Assumption is that a difference in the testbed translates to
    some difference in the real world
   Allows the performance debugging cycle:

      [Diagram: the Performance Debugging Cycle, linking Measure,
       Reproduce, Find Problem, and Fix and/or Improve]

               Problems with Workload
                     Generators


   Only as good as our understanding of the traffic
   Traffic may change over time
     • generators must too
   May not be representative
     • e.g., are file size distributions from IBM.com similar to mine?
   May be ignoring important factors
     • e.g., browser behavior, WAN conditions, modem connectivity
   Still, useful for diagnosing and treating problems



    How does Workload Generation Work?
   Many clients, one server
    • match asymmetry of Internet
   Server is populated with some
    kind of synthetic content
   Simulated clients produce
    requests for server
   Master process to control
    clients, aggregate results
 Goal is to measure the server
     • not the client or network
   Must be robust to conditions
    • e.g., if server keeps sending 404
      not found, will clients notice?


               Evolution: WebStone
   The original workload generator from SGI in 1995
   Process based workload generator, implemented in C
   Clients talk to master via sockets
   Configurable: # client machines, # client processes, run time
   Measured several metrics: avg + max connect time, response
    time, throughput rate (bits/sec), # pages, # files
   1.0 only does GETs; CGI support added in 2.0
   Static requests, 5 different file sizes:

               Percentage          Size
                    35.00         500 B
                    50.00          5 KB
                    14.00         50 KB
                     0.90        500 KB
                     0.10          5 MB

                  www.mindcraft.com/webstone


             Evolution: SPECWeb96

   Developed by SPEC
     • Standard Performance Evaluation Corporation
    • Non-profit group with many benchmarks (CPU, FS)
   Attempt to get more representative
    • Based on logs from NCSA, HP, Hal Computers
   4 classes of files:
                  Percentage             Size
                       35.00           0-1 KB
                       50.00          1-10 KB
                       14.00       10-100 KB
                         1.00 100 KB – 1 MB

   Poisson distribution between each class

                 SPECWeb96 (cont)


   Notion of scaling versus load:
     • number of directories in the data set doubles as expected
       throughput quadruples: #directories = sqrt(throughput/5) * 10
       (e.g., a target of 500 ops/sec gives sqrt(500/5) * 10 = 100
       directories)
     • requests spread evenly across all application directories
   Process based WG
   Clients talk to master via RPC's (less robust)
   Still only does GETS, no keep-alive

                    www.spec.org/osg/web96


                   Evolution: SURGE
   Scalable URL Reference GEnerator
    • Barford & Crovella at Boston University CS Dept.
   Much more worried about representativeness,
    captures:
    •   server file size distributions,
    •   request size distribution,
    •   relative file popularity
    •   embedded file references
    •   temporal locality of reference
    •   idle periods ("think times") of users
   Process/thread based WG




                        SURGE (cont)

   Notion of “user-equivalent”:
    • statistical model of a user
    • active “off” time (between URLS),
    • inactive “off” time (between pages)
   Captures various levels of burstiness
   Not validated against real traces, but shows that the load it
    generates differs from SPECWeb96's, with more burstiness in CPU
    usage and in the number of active connections

                         www.cs.wisc.edu/~pb


                  Evolution: S-client
   Almost all workload generators are closed-loop:
    • client submits a request, waits for server, maybe thinks
      for some time, repeat as necessary
   Problem with the closed-loop approach:
    • client can't generate requests faster than the server can
      respond
    • limits the generated load to the capacity of the server
    • in the real world, arrivals don’t depend on server state
         • i.e., real users have no idea about load on the server when
           they click on a site, although successive clicks may have this
           property
    • in particular, can't overload the server
   s-client tries to be open-loop:
    • by generating connections at a particular rate
    • independent of server load/capacity

                      S-Client (cont)
   How is s-client open-loop?
    • connecting asynchronously at a particular rate
    • using non-blocking connect() socket call
   Connect complete within a particular time?
    • if yes, continue normally.
    • if not, socket is closed and new connect initiated.
   Other details:
    • uses single-address space event-driven model like Flash
    • calls select() on large numbers of file descriptors
    • can generate large loads
   Problems:
    • client capacity is still limited by active FD's
    • “arrival” is a TCP connect, not an HTTP request

    www.cs.rice.edu/CS/Systems/Web-measurement
             Evolution: SPECWeb99

   In response to people "gaming" benchmark, now
    includes rules:
    • IP maximum segment lifetime (MSL) must be at least 60
      seconds (more on this later!)
    • Link-layer maximum transmission unit (MTU) must not be
      larger than 1460 bytes (Ethernet frame size)
    • Dynamic content may not be cached
         • not clear that this is followed
    • Servers must log requests.
         • W3C common log format is sufficient but not mandatory.
    • Resulting workload must be within 10% of target.
    • Error rate must be below 1%.
   Metric has changed:
     • now "number of simultaneous conforming connections":
       the rate of a connection must be greater than 320 Kbps
                 SPECWeb99 (cont)
   Directory size has changed:
       #directories = (25 + (400000/122000) * simultaneous_conns) / 5.0
    (e.g., 100 simultaneous connections gives (25 + 3.28 * 100) / 5 ≈ 70
    directories)
   Improved HTTP 1.0/1.1 support:
    • Keep-alive requests (client closes after N requests)
    • Cookies
   Back-end notion of user demographics
    • Used for ad rotation
    • Request includes user_id and last_ad
   Request breakdown:
    •    70.00 % static GET
    •    12.45 % dynamic GET
    •    12.60 % dynamic GET with custom ad rotation
    •    04.80 % dynamic POST
    •    00.15 % dynamic GET calling CGI code

                 SPECWeb99 (cont)
   Other breakdowns:
    •   30 % HTTP 1.0 with no keep-alive or persistence
    •   70 % HTTP 1.0 with keep-alive to "model" persistence
    •   still has 4 classes of file size with Poisson distribution
    •   supports Zipf popularity
   Client implementation details:
    • Master-client communication now uses sockets
    • Code includes sample Perl code for CGI
    • Client configurable to use threads or processes
   Much more info on setup, debugging, tuning
   All results posted to web page,
    • including configuration & back end code

                    www.spec.org/osg/web99

 So how realistic is SPECWeb99?

   We’ll compare a few characteristics:
     •   File size distribution (body)
     •   File size distribution (tail)
     •   Transfer size distribution (body)
     •   Transfer size distribution (tail)
     •   Document popularity
   Visual comparison only
     • No curve-fitting, r-squared plots, etc.
     • Point is to give a feel for accuracy



        SpecWeb99 vs. File Sizes




   SpecWeb99: In the ballpark, but not very smooth
    SpecWeb99 vs. File Size Tail




   SpecWeb99 tail isn’t as long as real logs (900 KB max)

   SpecWeb99 vs. Transfer Sizes




         Doesn’t capture 304 (not modified) responses
         Coarser distribution than real logs (i.e., not smooth)
    Spec99 vs. Transfer Size Tails




   SpecWeb99 does OK, although tail drops off rapidly (and in
    fact, no file is greater than 1 MB in SpecWeb99!).

  Spec99 vs. Resource Popularity




           SpecWeb99 seems to do a good job, although tail
            isn’t long enough

                   Evolution: TPC-W
   From the Transaction Processing Performance Council (TPC)
    •   More known for database workloads like TPC-D
    •   Metrics include dollars/transaction (unlike SPEC)
    •   Provides specification, not source
    •   Meant to capture a large e-commerce site
   Models online bookstore
    •   web serving, searching, browsing, shopping carts
    •   online transaction processing (OLTP)
    •   decision support (DSS)
    •   secure purchasing (SSL), best sellers, new products
    •   customer registration, administrative updates
   Has notion of scaling per user
    • 5 MB of DB tables per user
    • 1 KB per shopping item, 25 KB per item in static images


                       TPC-W (cont)
   Remote browser emulator (RBE)
    • emulates a single user
     • send HTTP request, parse response, wait (think time), repeat
   Metrics:
    • WIPS: shopping
    • WIPSb: browsing
    • WIPSo: ordering
   Setups tend to be very large:
    • multiple image servers, application servers, load balancer
    • DB back end (typically SMP)
     • Example: IBM 12-way SMP w/DB2, 9 PCs w/IIS: $1M

                          www.tpc.org/tpcw


Web Servers: Implementation and Performance   Erich Nahum          96
    Summary: Workload Generators
    This is only the beginning; many other workload generators exist:
    •   httperf from HP
    •   WAGON from IBM
    •   WaspClient from IBM
    •   Others?
   Both workloads and generators change over time:
    • Both started simple, got more complex
    • As workload changes, so must generators
   No one single "good" generator
    • SpecWeb99 seems the favorite (2002 rumored in the works)
   Implementation issues similar to servers:
     • They are network-based request producers
       (i.e., produce GET's instead of 200 OK's).
    • Implementation affects capacity planning of clients!
      (want to make sure clients are not bottleneck)



Web Servers: Implementation and Performance   Erich Nahum        97
  Chapter 7: Introduction to TCP




Web Servers: Implementation and Performance   Erich Nahum   98
                    Introduction to TCP
    Layering is a common principle in network protocol design
    TCP is the major transport protocol in the Internet
    Since HTTP runs on top of TCP, much interaction between
     the two
    Asymmetry in client-server model puts strain on server-side
     TCP implementations
    Thus, major issue in web servers is TCP implementation and
     behavior

   [Figure: the classic five-layer stack — application, transport,
    network, link, physical — with TCP at the transport layer]

    Web Servers: Implementation and Performance   Erich Nahum                 99
                   The TCP Protocol

   Connection-oriented, point-to-point protocol:
    • Connection establishment and teardown phases
    • ‘Phone-like’ circuit abstraction
    • One sender, one receiver
   Originally optimized for certain kinds of transfer:
    • Telnet (interactive remote login)
    • FTP (long, slow transfers)
    • Web is like neither of these
   Lots of work on TCP, beyond scope of this tutorial
     • e.g., we know of 3 separate TCP tutorials!




Web Servers: Implementation and Performance   Erich Nahum   100
                            TCP Protocol (cont)
   [Figure: sender and receiver stacks — the application writes
    data through the socket layer into the TCP send buffer; data
    segments flow to the receiver's TCP receive buffer, and ACK
    segments flow back; the receiving application reads the data]

    Provides a reliable, in-order, byte stream abstraction:
     •    Recover lost packets and detect/drop duplicates
     •    Detect and drop bad packets
     •    Preserve order in byte stream, no “message boundaries”
     •    Full-duplex: bi-directional data flow in same connection
    Flow and congestion controlled:
     •    Flow control: sender will not overwhelm receiver
     •    Congestion control: sender will not overwhelm network!
     •    Send and receive buffers
     •    Congestion and flow control windows

    Web Servers: Implementation and Performance          Erich Nahum                             101
                    The TCP Header
   Fields enable the following:
    Uniquely identifying a connection
     (4-tuple of client/server IP addresses and port numbers)
    Identifying a byte range within that connection
     (sequence and acknowledgement numbers)
    Checksum value to detect corruption
    Identifying protocol transitions (SYN, FIN)
    Informing other side of your state (ACK)

   [Figure: TCP header layout, 32 bits wide — source port #,
    dest port #, sequence number, acknowledgement number, header
    length, flags (URG, ACK, PSH, RST, SYN, FIN), receiver window
    size, checksum, urgent data pointer, options (variable
    length), application data (variable length)]

Web Servers: Implementation and Performance         Erich Nahum                         102
    Establishing a TCP Connection

    Client sends SYN with initial sequence number (ISN)
    Server responds with its own SYN w/seq number and ACK of
     client (ISN+1) (next expected byte)
    Client ACKs server's ISN+1
    The ‘3-way handshake’
    All modulo 32-bit arithmetic

   [Figure: timeline — client calls connect() and sends SYN;
    server, listening on port 80 via listen(), responds with
    SYN+ACK; client ACKs; the server's accept() then returns and
    it can read()]

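   A minimal sketch of the socket calls behind this figure
   (C/Unix; error handling omitted, port 80 as shown):

   #include <netinet/in.h>
   #include <string.h>
   #include <sys/socket.h>
   #include <unistd.h>

   /* Server side: listen() arms the kernel to answer SYNs with
      SYN+ACK; accept() returns once a handshake has completed. */
   int server_accept_one(void)
   {
       int lfd = socket(AF_INET, SOCK_STREAM, 0);
       struct sockaddr_in addr;
       memset(&addr, 0, sizeof(addr));
       addr.sin_family = AF_INET;
       addr.sin_port = htons(80);
       addr.sin_addr.s_addr = htonl(INADDR_ANY);
       bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
       listen(lfd, 128);
       return accept(lfd, NULL, NULL);  /* blocks until 3-way handshake done */
   }

   /* Client side: connect() sends the SYN and blocks until the
      handshake completes (or fails). */
   int client_connect(const struct sockaddr_in *srv)
   {
       int fd = socket(AF_INET, SOCK_STREAM, 0);
       connect(fd, (const struct sockaddr *)srv, sizeof(*srv));
       return fd;
   }
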
Web Servers: Implementation and Performance      Erich Nahum                       103
                           Sending Data
   [Figure: same picture as before — the application writes into
    the TCP send buffer, data segments flow to the receiver's TCP
    receive buffer, and ACK segments flow back]

    Sender puts data on the wire:
     • Holds copy in case of loss
     • Sender must observe receiver flow control window
     • Sender can discard data when ACK is received
    Receiver sends acknowledgments (ACKs)
     • ACKs can be piggybacked on data going the other way
     • Protocol says receiver should ACK every other packet in an
       attempt to reduce ACK traffic (delayed ACKs)
     • Delay should not be more than 500 ms. (typically 200)
     • We’ll see how this causes problems later

Web Servers: Implementation and Performance          Erich Nahum                             104
              Preventing Congestion
   Sender may not only overrun receiver, but may
    also overrun intermediate routers:
    • No way to explicitly know router buffer occupancy,
      so we need to infer it from packet losses
    • Assumption is that losses stem from congestion, namely,
      that intermediate routers have no available buffers
   Sender maintains a congestion window:
    • Never have more than CW of un-acknowledged data
      outstanding (or RWIN data; min of the two)
    • Successive ACKs from receiver cause CW to grow.
    How CW grows depends on which of 2 phases we are in:
    • Slow-start: initial state.
    • Congestion avoidance: steady-state.
    • Switch between the two when CW > slow-start threshold

Web Servers: Implementation and Performance   Erich Nahum       105
      Congestion Control Principles

     Lack of congestion control would lead to
      congestion collapse (Jacobson 88).
     Idea is to be a “good network citizen”.
     Would like to transmit as fast as possible
      without loss.
     Probe network to find available bandwidth.
     In steady-state: linear increase in CW per RTT.
     After loss event: CW is halved.
      This is called additive increase / multiplicative
       decrease (AIMD).
     Various papers on why AIMD leads to network
      stability.

Web Servers: Implementation and Performance   Erich Nahum   106
                           Slow Start
    Initial CW = 1.
    After each ACK, CW += 1;
    Continue until:
     • Loss occurs OR
     • CW > slow start threshold
    Then switch to congestion avoidance
    If we detect loss, cut CW in half
    Exponential increase in window size per RTT

   [Figure: sender/receiver timeline — one segment in the first
    RTT, two in the second, four in the third, and so on]

Web Servers: Implementation and Performance    Erich Nahum                    107
                Congestion Avoidance

Until (loss) {
 after CW packets ACKed:
  CW += 1;
}
ssthresh = CW/2;
Depending on loss type:
  SACK/Fast Retransmit:
   CW /= 2; continue;
  Coarse-grained timeout:
   CW = 1; go to slow start.

(This is for TCP Reno/SACK: TCP
Tahoe always sets CW=1 after a loss)
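
A minimal C sketch of this AIMD logic (window counted in
segments; names are ours — real stacks work in bytes and track
far more state):

   /* Hypothetical per-connection congestion state. */
   struct cc_state {
       unsigned cwnd;      /* congestion window, in segments */
       unsigned ssthresh;  /* slow-start threshold */
       unsigned acked;     /* ACKs counted in the current window */
   };

   /* Per new ACK: exponential growth in slow start, one segment
      per window (linear) in congestion avoidance. */
   void on_ack(struct cc_state *c)
   {
       if (c->cwnd < c->ssthresh) {
           c->cwnd++;                     /* slow start */
       } else if (++c->acked >= c->cwnd) {
           c->cwnd++;                     /* congestion avoidance */
           c->acked = 0;
       }
   }

   /* Reno-style reactions to the two loss types. */
   void on_fast_retransmit(struct cc_state *c)
   {
       c->ssthresh = c->cwnd / 2;
       c->cwnd = c->ssthresh;             /* multiplicative decrease */
   }

   void on_timeout(struct cc_state *c)
   {
       c->ssthresh = c->cwnd / 2;
       c->cwnd = 1;                       /* back to slow start */
   }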




  Web Servers: Implementation and Performance   Erich Nahum   108
          How are losses recovered?
Say a packet is lost (data or ACK!)

 Coarse-grained Timeout:
   • Sender does not receive ACK after some period of time
   • Event is called a retransmission time-out (RTO)
   • RTO value is based on estimated round-trip time (RTT)
   • RTT is adjusted over time using an exponentially weighted
     moving average:
     RTT = (1-x)*RTT + (x)*sample
     (x is typically 0.1)
 First done in TCP Tahoe

   [Figure: sender/receiver timeline for the lost-ACK scenario —
    the ACK is lost, the sender's timeout fires, and the segment
    is retransmitted]


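   A minimal sketch of the RTT estimator above (the classic
   smoother with x = 0.1; names are ours — modern stacks also
   track the RTT variance when computing the RTO):

   /* Exponentially weighted moving average of the round-trip time. */
   struct rtt_est {
       double srtt;                       /* smoothed RTT, in seconds */
   };

   void rtt_sample(struct rtt_est *e, double sample)
   {
       const double x = 0.1;              /* gain, per the slide */
       e->srtt = (1.0 - x) * e->srtt + x * sample;
   }

   /* A simple derived RTO; 2x is the historical multiplier
      (Jacobson's srtt + 4*rttvar is what stacks use today). */
   double rto(const struct rtt_est *e)
   {
       return 2.0 * e->srtt;
   }
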
  Web Servers: Implementation and Performance   Erich Nahum                           109
                         Fast Retransmit

    Receiver expects N, gets N+1:
     •   Immediately sends ACK(N)
     •   This is called a duplicate ACK
     •   Does NOT delay ACKs here!
     •   Continue sending dup ACKs for each subsequent packet (not N)
    Sender gets 3 duplicate ACKs:
     • Infers N is lost and resends
     • 3 chosen so out-of-order packets don’t trigger Fast
       Retransmit accidentally
     • Called “fast” since we don’t need to wait for a full RTT
     Introduced in TCP Reno

   [Figure: timeline — segment N is lost, each later segment
    elicits a dup ACK, and the third dup ACK triggers the resend]
    Web Servers: Implementation and Performance   Erich Nahum              110
     Other loss recovery methods

   Selective Acknowledgements (SACK):
    • Returned ACKs contain option w/SACK block
     • Block says, "got up to N-1 AND got N+1 through N+3"
    • A single ACK can generate a retransmission
   New Reno partial ACKs:
    • New ACK during fast retransmit may not ACK all
      outstanding data. Ex:
         • Have ACK of 1, waiting for 2-6, get 3 dup acks of 1
         • Retransmit 2, get ACK of 3, can now infer 4 lost as well
   Other schemes exist (e.g., Vegas)
   Reno has been prevalent; SACK now catching on



Web Servers: Implementation and Performance   Erich Nahum             111
    How about Connection Teardown?
      Either side may terminate a connection. (In fact, the
       connection can stay half-closed.) Let's say the server
       closes (typical in WWW)
      Server sends FIN with seq number (SN+1) (i.e., FIN is
       a byte in sequence)
      Client ACKs the FIN with SN+2 ("next expected")
      Client sends its own FIN when ready
      Server ACKs client's FIN as well with SN+1

     [Figure: teardown timeline — the server calls close() and
      sends its FIN; the client ACKs, later calls close() and
      sends its own FIN; after the final ACK the connection is
      closed, with the side that closed first holding a timed
      wait]


    Web Servers: Implementation and Performance   Erich Nahum                          112
           The TCP State Machine

    TCP uses a Finite State Machine, kept by each side of a
     connection, to keep track of what state a connection is in.
    State transitions reflect inherent races that can happen in
     the network, e.g., two FIN's passing each other.
    Certain things can go wrong along the way, e.g., packets can
     be dropped or corrupted. In fact, the machine is not perfect;
     certain problems can arise that were not anticipated in the
     original RFC.
    This is where timers will come in, which we will discuss
     more later.


Web Servers: Implementation and Performance   Erich Nahum   113
           TCP State Machine:
         Connection Establishment
    CLOSED: more implied than actual, i.e., no connection
    LISTEN: willing to receive connections (accept call)
    SYN-SENT: sent a SYN, waiting for SYN-ACK
    SYN-RECEIVED: received a SYN, waiting for an ACK of our SYN
    ESTABLISHED: connection ready for data transfer

   [Figure: establishment states — from CLOSED, the server
    application's listen() leads to LISTEN, and the client
    application's connect() (send SYN) leads to SYN_SENT;
    receiving a SYN (send SYN+ACK) moves LISTEN to SYN_RCVD;
    receiving SYN & ACK (send ACK) moves SYN_SENT to ESTABLISHED;
    receiving the ACK moves SYN_RCVD to ESTABLISHED]


Web Servers: Implementation and Performance   Erich Nahum                                       114
              TCP State Machine:
              Connection Teardown
    FIN-WAIT-1: we closed first, waiting for ACK of our FIN
     (active close)
    FIN-WAIT-2: we closed first, other side has ACKed our FIN,
     but not yet FIN'ed
    CLOSING: other side closed before it received our FIN
    TIME-WAIT: we closed, other side closed, got ACK of our FIN
    CLOSE-WAIT: other side sent FIN first, not us (passive close)
    LAST-ACK: other side sent FIN, then we did, now waiting for
     ACK

   [Figure: teardown states — close() (send FIN) moves
    ESTABLISHED to FIN_WAIT_1; receiving the ACK of our FIN leads
    to FIN_WAIT_2, and a later FIN (send ACK) to TIME_WAIT;
    receiving a FIN first (send ACK) goes through CLOSING instead.
    On the passive side, receiving a FIN (send ACK) moves
    ESTABLISHED to CLOSE_WAIT, close() (send FIN) to LAST_ACK, and
    the final ACK to CLOSED. TIME_WAIT waits 2*MSL (240 seconds)
    before reaching CLOSED]
Web Servers: Implementation and Performance      Erich Nahum                                     115
            Summary: TCP Protocol

   Protocol provides reliability in face of complex
    network behavior
   Tries to trade off efficiency with being "good
    network citizen"
   Vast majority of bytes transferred on Internet
    today are TCP-based:
    •   Web
    •   Mail
    •   News
    •   Peer-to-peer (Napster, Gnutella, FreeNet, KaZaa)



Web Servers: Implementation and Performance   Erich Nahum   116
         Chapter 8: TCP Dynamics




Web Servers: Implementation and Performance   Erich Nahum   117
                       TCP Dynamics
   In this section we'll describe some of the
    problems you can run into as a WWW server
    interacting with TCP.
   Most of these affect the response as seen by the
    client, not the throughput generated by the
    server.
   Ideally, a server developer shouldn't have to
    worry about this stuff, but in practice, we'll see
    that's not the case.
   Examples we'll look at include:
    • The initial window size
    • The delayed ACK problem
    • Nagle and its interaction with delayed ack
    • Small receive windows interfering with loss recovery

Web Servers: Implementation and Performance   Erich Nahum    118
         TCP’s Initial Window Problem
      Recall congestion control:
        • sender's initial congestion window is set to 1
      Recall delayed ACKs:
        • ack every other packet
        • set 200 ms. delayed ack timer
      Short-term deadlock:
        • sender is waiting for ACK since it sent 1 segment
        • receiver is waiting for 2nd segment before ACKing
      Problem worse than it seems:
        • multiple objects per web page
        • IE does not do pipelining!

     [Figure: timeline — the first segment takes one RTT, then
      both sides stall for the 200 ms. delayed-ACK timer before
      the exchange continues]

    Web Servers: Implementation and Performance   Erich Nahum                     119
            Solving the IW Problem

 Solution: set IW = 2-4
     • RFC 2414
     • Didn't affect many BSD systems since they (incorrectly)
       counted the connection setup in the congestion window
       calculation
     • Delayed ACK still happens, but now out of the critical
       path of response time for the download

   [Figure: timeline — with IW=2 the first two segments go out
    together, so the 200 ms. delayed-ACK timer fires outside the
    critical path of the download]

 Web Servers: Implementation and Performance   Erich Nahum                     120
     Receive Window Size Problem

 Recall Fast Retransmit:
  Amount of data in flight:
     • MIN(cong win, recv win)
     • can't ever have more than that outstanding
    In order for FR to work:
     • enough data has to be in flight
     • after lost packet, 3 more segments must arrive
     • 4.5 KB of receive-side buffer space must be available
     • note many web documents are less than 4.5 KB!

   [Figure: timeline — a segment is lost, but too few subsequent
    segments fit in the window to generate 3 dup ACKs]

Web Servers: Implementation and Performance   Erich Nahum                121
           Receive Window Size (cont)

      Previous discussion assumes large enough receive windows!
        • Early versions of MS Windows had 16 KB default recv. window
      Balakrishnan et al. 1998:
        • Study server TCP traces from 1996 Olympic Web Server
        • show over 50% of clients have receive window < 10K
        • Many suffer coarse-grained retransmission timeouts (RTOs)
        • Even SACK would not have helped!

     [Figure: timeline — after the loss it is illegal for the
      sender to send more, so the connection stalls until the RTO
      timeout fires]

    Web Servers: Implementation and Performance   Erich Nahum                         122
      Fixing Receive Window Problem
    Balakrishnan et al. 98
     • "Right-edge recovery"
     • Also proposed by Lin & Kung 98
     • Now an RFC (3042)
    How does it work?
     • Arrival of dup ack means a segment has left the network
     • When dup ACK is received, send next segment (not a
       retransmission)
     • Continue with 2nd and 3rd dup acks
     • Idea is to "keep the ACK clock flowing" by forcing more
       duplicate acks to be generated
     • Claim is that it would have avoided 25% of coarse-grained
       timeouts in 96 Olympics trace

   [Figure: timeline — each dup ACK releases one new segment,
    until the 3rd dup ACK arrives and triggers Fast Retransmit]
    Web Servers: Implementation and Performance   Erich Nahum                123
               The Nagle Algorithm

   Different types of TCP traffic exist:
    • Some apps (e.g., telnet) send one byte of data, then wait
      for ACK
    • Others (e.g., FTP) use full-size segments
   Recall server can write() to a socket at any time
     • Once written, should the host stack send? Or should it wait
       and hope to get more data?
   May send many small packets, which is bad for 2
    reasons:
    • Uses more network bandwidth (raises ratio of headers to
      content)
    • Uses more CPU (many costs are per-packet, not per-byte)




Web Servers: Implementation and Performance   Erich Nahum         124
                   The Nagle Algorithm


Solution is the Nagle algorithm:

     If full-size segment of data is available, just send
     If small segment available, and there is no
      unacknowledged data outstanding, send
     Otherwise, wait until either:
       • More data arrives from above (can coalesce packet), or
       • ACK arrives acknowledging outstanding data
      Idea is to have at most one small packet outstanding
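
   A minimal C sketch of Nagle's send test (names are ours; a
   real stack folds this into its output routine with many more
   conditions):

   #include <stdbool.h>
   #include <stddef.h>

   /* Hypothetical connection state for the Nagle decision. */
   struct conn {
       size_t queued;    /* bytes written but not yet sent */
       size_t unacked;   /* bytes sent but not yet ACKed */
       size_t mss;       /* maximum segment size */
   };

   /* Send a full segment immediately; send a small segment only
      if nothing is outstanding; otherwise hold it back and wait
      for more data or an ACK. */
   bool nagle_ok_to_send(const struct conn *c)
   {
       if (c->queued >= c->mss)
           return true;                   /* full-size segment ready */
       if (c->unacked == 0)
           return true;                   /* wire idle: send the runt */
       return false;                      /* at most one small packet out */
   }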




    Web Servers: Implementation and Performance   Erich Nahum     125
Interaction of Nagle & Delayed ACK
    Nagle and delayed ACKs cause (temporary) deadlock:
      • Sender wants to send 1.5 segments, sends first full one
      • Nagle prevents second from being sent (since not full
        size, and now we have unacked data outstanding)
      • Sender waits for delayed ACK from receiver
      • Receiver is waiting for 2nd segment before sending ACK
      • Similar to IW=1 problem earlier
 Result: Many disable Nagle.
      • via setsockopt() call

   [Figure: timeline — after the first write() the full segment
    goes out; Nagle forbids the sender from sending the runt, so
    the receiver's 200 ms. delayed-ACK timer must expire before
    the exchange continues]

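   For example, disabling Nagle on a connected socket (standard
   sockets API; error handling omitted):

   #include <netinet/in.h>
   #include <netinet/tcp.h>   /* TCP_NODELAY */
   #include <sys/socket.h>

   /* Turn Nagle off: small segments go out immediately instead
      of waiting behind unacknowledged data. */
   void disable_nagle(int fd)
   {
       int on = 1;
       setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
   }
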
    Web Servers: Implementation and Performance     Erich Nahum                        126
Interaction of Nagle & Delayed ACK

   For example, in WWW servers:
     •   original NCSA server issued a write() for every header
     •   Apache does its own buffering to do a single write() call
     •   other servers use writev() (e.g., Flash)
     •   if not careful you can flood the network with packets
   More of an issue when using persistent connections:
     • closing the connection forces data out with the FIN bit
     • but persistent connections or 1.0 “keep-alives” affected
    Mogul and Minshall 2001 evaluate a number of modifications
     to Nagle to deal with this
   Linux has similar "TCP_CORK" option
     • suppresses any non-full segment
     • application has to remember to disable TCP_CORK when
       finished.
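
   A sketch of both techniques above: gathering header and body
   into a single writev(), and Linux's TCP_CORK around a
   multi-write response (error handling and partial-write loops
   omitted):

   #include <netinet/in.h>
   #include <netinet/tcp.h>   /* TCP_CORK is Linux-specific */
   #include <string.h>
   #include <sys/socket.h>
   #include <sys/uio.h>       /* writev */
   #include <unistd.h>

   /* Flash-style: one writev() carries header + body together,
      so the stack never emits a lone small header segment. */
   void send_response_writev(int fd, const char *hdr,
                             const char *body, size_t blen)
   {
       struct iovec iov[2];
       iov[0].iov_base = (void *)hdr;  iov[0].iov_len = strlen(hdr);
       iov[1].iov_base = (void *)body; iov[1].iov_len = blen;
       writev(fd, iov, 2);
   }

   /* TCP_CORK: suppress non-full segments across several writes;
      the application must remember to uncork when done. */
   void send_response_corked(int fd, const char *hdr,
                             const char *body, size_t blen)
   {
       int on = 1, off = 0;
       setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
       write(fd, hdr, strlen(hdr));
       write(fd, body, blen);
       setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
   }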
 Web Servers: Implementation and Performance   Erich Nahum           127
          Summary: TCP Dynamics


   Many ways in which an HTTP transfer can interact
    with TCP
   Interaction of factors can cause delays in
    response time as seen by clients
   Hard to shield server developers from having to
    understand these issues
   Mistakes can cause problems such as flood of
    small packets




Web Servers: Implementation and Performance   Erich Nahum   128
  Chapter 9: TCP Implementation




Web Servers: Implementation and Performance   Erich Nahum   129
       Server TCP Implementation

   In this section we look at ways in which the host
    TCP implementation is stressed under large web
    server workloads. Most of these techniques deal
    with large numbers of connections:
    • Looking up arriving TCP segments with large numbers of
      connections
    • Dealing with the TIME-WAIT state caused by closing
      large number of connections
    • Managing large numbers of timers to support connections
    • Dealing with memory consumption of connection state
   Removing data-touching operations
    • byte copying and checksums



Web Servers: Implementation and Performance   Erich Nahum       130
        In the beginning…BSD 4.3
    Recall how demultiplexing works:
     • given a packet, want to find connection state (PCB in BSD)
     • 4-tuple of source, destination port & IP addresses
    Original BSD:
     • used one-behind cache with linear search to match 4-tuple
     • assumption was "next segment very likely is from the same
       connection"
     • assumed solitary, long-lived, FTP-like transfer
     • average miss time is O(N/2) (N = length of PCB list)

   [Figure: an arriving packet (IP: 10.1.1.2, port: 5194) is
    checked against the one-behind cache, then walked down the
    linear PCB list entry by entry until it matches the last
    entry]




Web Servers: Implementation and Performance   Erich Nahum                               131
                        PCB Hash Tables
     McKenney & Dove SigComm 92:
       • linear search with one-behind cache doesn't work well for
         transaction workloads
       • hashing does much better
       • hash based on 4-tuple
       • cost: O(1) (constant time)
     BSD adds hash table in 90's
       • other BSD Unixes (such as AIX) quickly followed
     Algorithmic work on hash tables:
       • e.g., CLR book, “perfect” hash tables
       • none specific to web workloads
       • hash table sizing problematic

     [Figure: an arriving packet's 4-tuple hashes directly to the
      bucket holding its PCB]

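   A minimal sketch of 4-tuple demultiplexing with a hash table
   (chaining and a toy mixing function; names are ours, not
   BSD's actual inpcb code):

   #include <stddef.h>
   #include <stdint.h>

   #define PCB_HASH_SIZE 4096             /* power of two: cheap masking */

   struct pcb {
       uint32_t laddr, faddr;             /* local/foreign IP address */
       uint16_t lport, fport;             /* local/foreign port */
       struct pcb *next;                  /* hash chain */
       /* ... TCP state, buffers, timers ... */
   };

   static struct pcb *pcb_hash[PCB_HASH_SIZE];

   static unsigned hash4(uint32_t la, uint32_t fa, uint16_t lp, uint16_t fp)
   {
       return (la ^ fa ^ ((uint32_t)lp << 16) ^ fp) & (PCB_HASH_SIZE - 1);
   }

   /* O(1) expected: hash to a bucket, walk its short chain. */
   struct pcb *pcb_lookup(uint32_t la, uint32_t fa, uint16_t lp, uint16_t fp)
   {
       struct pcb *p = pcb_hash[hash4(la, fa, lp, fp)];
       for (; p != NULL; p = p->next)
           if (p->laddr == la && p->faddr == fa &&
               p->lport == lp && p->fport == fp)
               return p;
       return NULL;                       /* no such connection */
   }
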
    Web Servers: Implementation and Performance   Erich Nahum                                   132
        Problem of Old Duplicates
    Recall in the Internet:
     • packets may be arbitrarily duplicated, delayed, and
       reordered
     • while rare, case must be accounted for
    Consider the following:
     • two hosts connect, transfer data, close
     • client starts new connection using same 4-tuple
     • duplicate packet arrives from first connection
     • connection has been closed, state is gone
     • how can you distinguish?

   [Figure: timeline — a delayed duplicate from the first
    connection arrives after a new connection with the same
    4-tuple has started]
Web Servers: Implementation and Performance   Erich Nahum                  133
    Role of the TIME-WAIT State
    Solution: don’t do that!
     • prevent same 4-tuple from being used
     • one side must remember 4-tuple for period of time to
       reject old packets
     • spec says whoever closes the connection must do this (in
       the TW state)
     • period is 2 times maximum segment lifetime (MSL), after
       which it is assumed no packet from previous conversation
       will still be alive
     • MSL defined as 2 minutes in RFC 1122

   [Figure: timeline — the closing side holds the connection in
    TIME-WAIT for 2 * MSL and rejects a late duplicate from the
    old conversation]
Web Servers: Implementation and Performance   Erich Nahum                              134
    TIME-WAIT Problem in Servers
   Recall in a WWW server, server closes connection!
     • asymmetry of client/server model means many clients
     • PCB sticks around for 2*MSL units of time
    Mogul 1995 CA Election server study:
      • shows large numbers (90%) of PCB's in TIME-WAIT
      • would have been 97% if it had followed the proper MSL!
   Example: doing 1000 connections/sec.
     • Assume MSL is 120 seconds, request takes 1 second.
     • Have 1000 connections in ESTABLISHED state.
     • 240,000 connections in TIME-WAIT state!
    FTY99 propose & evaluate 3 schemes:
      •   require client to close (requires changing HTTP)
      •   have client use new TCP option (client close) (requires changing TCP)
      •   do client reset (browser; MS did this for a while)
      •   claim 50% improvement in throughput, 85% in memory use
Web Servers: Implementation and Performance   Erich Nahum         135
        Dealing with TIME-WAIT
    Sorting hash table entries (Aron & Druschel 99)
     • Demultiplexing requires that all PCB's be examined (for
       some hash bucket) before you can give up on that PCB and
       say it was not found
     • Since most lookups are for existing connections, most
       connections will be in ESTABLISHED state rather than
       TIME-WAIT
     • Can sort PCB chain such that TW entries are at the end.
       Thus, ESTABLISHED entries are at front of chain

   [Figure: a hash bucket's chain with its one ESTABLISHED PCB
    first, followed by the TIME_WAIT entries]

Web Servers: Implementation and Performance   Erich Nahum                                   136
            Server Timer Management
     Each TCP connection can have up to 5 timers associated
      with it:
       • delayed ack, retransmission, persistence, keep-alive,
         time-wait
     Original BSD:
       • linear linked list of PCB's
       • fast timer (200 ms): walk all PCB's for delayed ACK timer
       • slow timer (500 ms): walk all PCB's for all other timers
       • time kept in relative form, so have to subtract time from
         PCB (500 ms) for 4 larger timers
       • costs: O(#PCBs), not O(#active timers)

     [Figure: the PCB list with each entry's pending timer, e.g.,
      RTO in 2 secs, TIME-WAIT in 30 secs, delayed ACK in 100 ms,
      keep-alive in 1 sec, persist in 10 secs]

    Web Servers: Implementation and Performance   Erich Nahum                                 137
        Server Timer Management
    Can again exploit semantics of the TIME-WAIT state:
     • If PCB's are sorted by state, delayed ACK timer can stop
       after it encounters a PCB in TIME-WAIT, since ACKs are not
       delayed for connections in TIME-WAIT state
     • Aron and Druschel show 25 percent HTTP throughput
       improvement using this technique
     • Attribute most of win to reduced timer processing, but
       probably helps PCB lookup as well

   [Figure: the same sorted hash chain — the fast-timer walk
    stops at the first TIME_WAIT entry]

Web Servers: Implementation and Performance   Erich Nahum                                   138
                Customized PCB Tables
     Maintain 2 sets of PCBs: normal and TIME-WAIT
       • first done in BSDI in 96
       • still must search both tables
     Aron & Druschel 99:
       • can compress TW PCBs, since only port and sequence
         numbers needed
       • normal table still has full PCB state
       • show you can save a lot of kernel pinned RAM (from 31 MB
         to 5 MB, an 82% reduction)
       • results in more RAM available for disk cache, which
         leads to better performance

     [Figure: a regular PCB hash table holding full entries
      (e.g., 192.123.168.40: ESTABLISHED) beside a separate
      TIME-WAIT hash table holding compressed entries]
    Web Servers: Implementation and Performance   Erich Nahum                                 139
    Scalable Timers: Timing Wheels
    Varghese SOSP 1987:
     • use a hash-table-like structure called a timing wheel
     • events are ordered by relative time in the future
     • given event at future time T, put in slot (T mod N)
     • list sorted by time (scheme 5)
    Each clock tick:
     • wheel “turns” one slot (mod N)
     • look at first item in chain:
          • if ready, fire, check next
          • if empty or not ready to fire, all done
     • continue until non-ready item is encountered (or end of
       list)

   [Figure: a timing wheel of N = 10 slots at current time 12;
    the wheel pointer sits on the slot whose chain holds events
    expiring at times 12, 22, 42]

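   A minimal C sketch of the scheme-6 variant described on the
   next slide (unsorted slots, each checked on its tick; names
   and sizes are ours):

   #define WHEEL_SLOTS 256                /* one slot per clock tick */

   struct timer {
       unsigned long expires;             /* absolute tick of expiry */
       void (*fire)(void *arg);
       void *arg;
       struct timer *next;
   };

   static struct timer *wheel[WHEEL_SLOTS];
   static unsigned long now;              /* current tick */

   /* O(1) insert: hash the expiry time into slot (T mod N). */
   void timer_add(struct timer *t)
   {
       unsigned slot = t->expires % WHEEL_SLOTS;
       t->next = wheel[slot];
       wheel[slot] = t;
   }

   /* Per-tick: advance one slot and fire only the timers that
      are due; the rest wait a full revolution of the wheel. */
   void timer_tick(void)
   {
       struct timer **pp;
       now++;
       pp = &wheel[now % WHEEL_SLOTS];
       while (*pp) {
           struct timer *t = *pp;
           if (t->expires <= now) {
               *pp = t->next;             /* unlink, then fire */
               t->fire(t->arg);
           } else {
               pp = &t->next;             /* not due yet; skip */
           }
       }
   }
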
Web Servers: Implementation and Performance   Erich Nahum                                      140
               Timing Wheels (cont)
   Variant (scheme 6 in paper):
    • just insert into wheel slot, don’t bother to sort
    • check all timers in slot on each tick
   Original SOSP 1987 paper
    • premise was more for large-scale simulations
    • have lots of events happening "in the future"
   Algorithmic Costs (assuming good hash function):
    • O(1) average time for basic dictionary operations
         • insertion, cancellation, per-tick bookkeeping
    • O(N) (N = number timers) worst-case for scheme 6
    • O(log(N)) worst-case for scheme 5
   Deployment:
    • Used in FreeBSD as of release 3.4 (scheme 6)
    • Variant in Linux 2.4 (hierarchy of timers with cascade)
    • Aron claims "about the same perf" as his approach
Web Servers: Implementation and Performance   Erich Nahum       141
        Data-Touching Operations
   Lots of research in high-speed network community
    about how touching data is bad
    • especially as CPU speeds increase relative to memory
   Several ways to avoid data copying:
    • Use mmap() as described earlier to cut to one copy
    • Use I/O lite primitives (new API) to move buffers around
    • Use sendfile() API combined with integrated zero-copy
      I/O system in kernel
    Also a cost to reading the data via checksums:
     • Jacobson showed how it can be folded into the copy for
       free, with some complexity on the receive side
     • I/O-Lite / exokernel use checksum caches
     • Advanced network cards do checksum for you
          • Originally on SGI FDDI card (1995)
          • Now on all gigabit adaptors, some 100baseT adaptors

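   For instance, a zero-copy static-file send with sendfile()
   might look like this (Linux-style signature; FreeBSD's
   differs, and error/partial-send handling is omitted):

   #include <fcntl.h>
   #include <sys/sendfile.h>   /* Linux */
   #include <sys/stat.h>
   #include <unistd.h>

   /* Send a whole file to a connected socket without copying it
      through user space: the kernel moves pages straight from
      the file cache into the network stack. */
   void send_static_file(int sock, const char *path)
   {
       int fd = open(path, O_RDONLY);
       struct stat st;
       off_t off = 0;
       fstat(fd, &st);
       sendfile(sock, fd, &off, st.st_size); /* out_fd, in_fd, offset, count */
       close(fd);
   }
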
Web Servers: Implementation and Performance   Erich Nahum        142
    Summary: Implementation Issues

   Scaling problems happen in large WWW Servers:
     • Asymmetry of client/server model
     • Large numbers of connections
     • Large amounts of data transferred
   Approaches fall into one or more categories:
     •   Hashing
     •   Caching
     •   Exploit common-case behavior
     •   Exploiting semantic information
     •   Don't touch the data
    Most OS's have added support for these techniques over the
     last 3 years


Web Servers: Implementation and Performance   Erich Nahum   143
