Servers: Concurrency and Performance
Jeff Chase
Duke University
HTTP Server
• HTTP server setup (sketched in C below):
  – Creates a socket (socket)
  – Binds to an address (bind)
  – Listens to set up the accept backlog (listen)
  – Can call accept to block waiting for connections
  – (Can call select to check for data on multiple sockets)
• Handle request:
  – GET /index.html HTTP/1.0\n
    <optional body, multiple lines>\n
    \n
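A minimal C sketch of these setup calls using the standard sockets API; the port, the backlog of 128, and the missing error handling are illustrative choices, not part of the slide:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);          /* create the socket */
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(80);                          /* HTTP port */
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));  /* bind to an address */
    listen(lfd, 128);                                   /* set up the accept backlog */
    for (;;) {
        int cfd = accept(lfd, NULL, NULL);   /* block waiting for a connection */
        /* ... read the request and send a response ... */
        close(cfd);
    }
}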
Inside your server

[Figure: requests flow through kernel packet queues, the listen queue, and the accept queue into the server application (Apache, Tomcat/Java, etc.). Measures of interest: offered load, response time, throughput, utilization.]
Example: Video On Demand

Client() {
   fd = connect("server");
   write(fd, "video.mpg");
   while (!eof(fd)) {
      read(fd, buf);
      display(buf);
   }
}

Server() {
   while (1) {
      cfd = accept();
      read(cfd, name);
      fd = open(name);
      while (!eof(fd)) {
         read(fd, block);
         write(cfd, block);
      }
      close(cfd); close(fd);
   }
}

How many clients can the server support?
Suppose, say, 200 Kbit/s video on a 100 Mbit/s network link?

[MIT/Morris]
Performance “analysis”
• Server capacity:
  – Network (100 Mbit/s)
  – Disk (20 Mbyte/s)
• Obtained performance: one client stream
• Server is limited by software structure
• If a video is 200 Kbit/s, the server should be able to support more than one client.
  – 100 Mbit/s ÷ 200 Kbit/s = 500 streams?

[MIT/Morris]
WebServer Flow

Create ServerSocket
connSocket = accept()
read request from connSocket
read local file
write file to connSocket
close connSocket

[Figure: TCP socket space on a host with addresses 128.36.232.5 and 128.36.230.2]
  – listening socket: state: listening; address: {*.6789, *.*}; completed connection queue; sendbuf; recvbuf
  – connected socket: state: established; address: {128.36.232.5:6789, 198.69.10.10:1500}; sendbuf; recvbuf
  – another listening socket: state: listening; address: {*.25, *.*}; completed connection queue; sendbuf; recvbuf

Discussion: what does each step do and how long does it take?
Web Server Processing Steps

Accept Client Connection
Read HTTP Request Header
Find File
Send HTTP Response Header
Read File / Send Data

Each step may block: waiting on the network, or waiting on disk I/O.

Want to be able to process requests concurrently. (A sequential C sketch of these steps follows below.)
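One sequential pass through these steps might look like this sketch; the hard-wired file name and the minimal response header are placeholders, and real request parsing is omitted:

#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

void serve_one(int lfd) {
    char req[4096], buf[8192];
    ssize_t n;
    int cfd = accept(lfd, NULL, NULL);           /* may block waiting on network */
    n = read(cfd, req, sizeof req);              /* may block on a slow client */
    /* parse the request line to find the file name (omitted) */
    int fd = open("index.html", O_RDONLY);       /* find file */
    write(cfd, "HTTP/1.0 200 OK\r\n\r\n", 19);   /* send HTTP response header */
    while ((n = read(fd, buf, sizeof buf)) > 0)  /* may block waiting on disk I/O */
        write(cfd, buf, n);                      /* may block on a slow client */
    close(fd);
    close(cfd);
}

Each read/write above is a point where a single-threaded server stalls entirely, which is why the slide calls for concurrency.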
Process States and Transitions

[Figure: process state diagram]
  – running (user) ↔ running (kernel): interrupt, exception, trap/return
  – running (kernel) → blocked: Sleep
  – blocked → ready: Wakeup
  – ready → running (kernel): Run
  – running (kernel) → ready: Yield
Server Blocking
• accept() when no connect requests are waiting on the listen queue
  – What if the server has multiple ports to listen on?
     • E.g., 80 for HTTP, 443 for HTTPS
• open/read/write on server files
• read() on a socket, if the client is sending too slowly
• write() on a socket, if the client is receiving too slowly
  – Yup, TCP has flow control like pipes

What if the server blocks while serving one client, and another client has work to do? (One escape hatch, nonblocking sockets, is sketched below.)
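Sketched below: marking a socket nonblocking with fcntl, a standard POSIX mechanism (not specific to any server above), so a read() or write() that would block instead returns -1 with errno set to EWOULDBLOCK/EAGAIN:

#include <fcntl.h>

/* Put fd into nonblocking mode; later read()/write() calls that would
   block instead fail immediately with errno == EWOULDBLOCK/EAGAIN. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETF­L, flags | O_NONBLOCK);
}

The server then needs some way to learn when the descriptor is ready again, e.g., select (later slides).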
Under the Hood

[Figure: queueing network — requests start at arrival rate λ, visit the CPU, may issue an I/O request to an I/O device and return to the CPU on I/O completion, then exit; throughput is λ until some center saturates.]
Concurrency and Pipelining

[Figure: CPU, DISK, and NET activity timelines, before and after pipelining: overlapping the stages of multiple requests keeps all three resources busy.]
Better single-server performance
• Goal: run at the server’s hardware speed
  – Disk or network should be the bottleneck
• Method:
  – Pipeline blocks of each request
  – Multiplex requests from multiple clients
• Two implementation approaches:
  – Multithreaded server
  – Asynchronous I/O

[MIT/Morris]
Concurrent threads or processes
• Use multiple threads or processes
  – so that only the flow processing a particular request is blocked
  – Java: extend Thread or implement the Runnable interface

Example: a multithreaded web server, which creates a thread for each request
Multiple Process Architecture

Process 1: Accept Conn → Read Request → Find File → Send Header → Read File / Send Data
  …
Process N: Accept Conn → Read Request → Find File → Send Header → Read File / Send Data
(separate address spaces)

• Advantages
  – Simple programming while addressing the blocking issue
• Disadvantages
  – Many processes; large context switch overheads
  – Consumes a lot of memory
  – Optimizations that share information among processes (e.g., caching) are harder
Using Threads

Thread 1: Accept Conn → Read Request → Find File → Send Header → Read File / Send Data
  …
Thread N: Accept Conn → Read Request → Find File → Send Header → Read File / Send Data
(one shared address space)

• Advantages
  – Lower context switch overheads
  – Shared address space simplifies optimizations (e.g., caches)
• Disadvantages
  – Need kernel-level threads (why?)
  – Some extra memory needed to support multiple stacks
  – Need thread-safe programs, synchronization
Multithreaded server

server() {
   while (1) {
      cfd = accept();
      read(cfd, name);
      fd = open(name);
      while (!eof(fd)) {
         read(fd, block);
         write(cfd, block);
      }
      close(cfd); close(fd);
   }
}

for (i = 0; i < 10; i++)
   threadfork(server);

• When waiting for I/O, the thread scheduler runs another thread
• What about references to shared data?
• Synchronization

(A pthreads version of threadfork is sketched below.)
[MIT/Morris]
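threadfork is pseudocode; with POSIX threads the same structure might look like this, assuming server has been adapted to the void *(*)(void *) signature:

#include <pthread.h>

void *server(void *arg);   /* the accept/read/write loop from the slide */

int main(void) {
    pthread_t tid[10];
    for (int i = 0; i < 10; i++)               /* "threadfork(server)" x10 */
        pthread_create(&tid[i], NULL, server, NULL);
    for (int i = 0; i < 10; i++)
        pthread_join(tid[i], NULL);            /* the servers loop forever */
    return 0;
}

All ten threads can block in accept() on the same listening socket; the kernel hands each new connection to exactly one of them. References to shared data (e.g., a shared cache) must be protected, e.g., with a pthread_mutex_t.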
Event-Driven Programming
• One execution stream: no CPU concurrency.
• Register interest in events (callbacks).
• Event loop waits for events, invokes handlers.
• No preemption of event handlers.
• Handlers generally short-lived.

[Figure: an event loop dispatching to event handlers]

[Ousterhout 1995]
Single Process Event Driven (SPED)

[Figure: a single process in which an event dispatcher drives the stages Accept Conn → Read Request → Find File → Send Header → Read File / Send Data]

• Single threaded
• Asynchronous (non-blocking) I/O
• Advantages
  – Single address space
  – No synchronization
• Disadvantages
  – In practice, disk reads still block
Asynchronous Multi-Process Event Driven (AMPED)

[Figure: as in SPED, an event dispatcher drives the stages Accept Conn → Read Request → Find File → Send Header → Read File / Send Data, with helper processes for disk I/O]

• Like SPED, but use helper processes/threads for disk I/O
• Use IPC to communicate with helper processes
• Advantages
  – Shared address space for most web server functions
  – Concurrency for disk I/O
• Disadvantages
  – IPC between main thread and helper threads

This hybrid model is used by the “Flash” web server.
   Event-Based Concurrent
Servers Using I/O Multiplexing

• Maintain a pool of connected descriptors.
• Repeat the following forever:
   – Use the Unix select function to block until:
      • (a) New connection request arrives on the listening
         descriptor.
      • (b) New data arrives on an existing connected descriptor.
   – If (a), add the new connection to the pool of connections.
   – If (b), read any available data from the connection
      • Close the connection on EOF and remove it from the pool.
(This loop is sketched in C below.)




                                                         [CMU 15-213]
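A concrete C sketch of that loop; it assumes the listening descriptor lfd is already set up, and the handling of the data read is simplified:

#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

void event_loop(int lfd) {
    fd_set pool;                    /* the pool of connected descriptors */
    FD_ZERO(&pool);
    FD_SET(lfd, &pool);
    int maxfd = lfd;
    for (;;) {
        fd_set ready = pool;        /* select() overwrites its argument */
        select(maxfd + 1, &ready, NULL, NULL, NULL);  /* block until (a) or (b) */
        for (int fd = 0; fd <= maxfd; fd++) {
            if (!FD_ISSET(fd, &ready))
                continue;
            if (fd == lfd) {        /* (a) new connection request */
                int cfd = accept(lfd, NULL, NULL);
                FD_SET(cfd, &pool);
                if (cfd > maxfd) maxfd = cfd;
            } else {                /* (b) data on an existing connection */
                char buf[4096];
                ssize_t n = read(fd, buf, sizeof buf);
                if (n <= 0) {       /* EOF (or error): close and remove */
                    close(fd);
                    FD_CLR(fd, &pool);
                }
                /* else: process the n bytes just read */
            }
        }
    }
}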
Select
• If a server has many open sockets, how does it know when one of them is ready for I/O?

  int select(int n, fd_set *readfds, fd_set *writefds,
             fd_set *exceptfds, struct timeval *timeout);

• select has scalability issues; alternative event interfaces have been offered (e.g., epoll on Linux, kqueue on BSD).
Asynchronous I/O

• Code is structured as a collection of handlers
• Handlers are nonblocking
• Create new handlers for blocking operations
• When an operation completes, call the handler

struct callback {
  bool (*is_ready)();
  void (*cb)(void *arg);
  void *arg;
};

main() {
  while (1) {
    for (c = each callback) {
      if (c->is_ready())
        c->cb(c->arg);
    }
  }
}

[MIT/Morris]
Asynchronous server

init() {
   on_accept(accept_cb);
}
accept_cb(cfd) {
   on_readable(cfd, name_cb);
}
name_cb(cfd) {
   read(cfd, name);
   fd = open(name);
   on_readable(fd, read_cb);
}
read_cb(cfd, fd) {
   read(fd, block);
   on_writable(cfd, write_cb);
}
write_cb(cfd, fd) {
   write(cfd, block);
   on_readable(fd, read_cb);
}
on_readable(fd, fn) {
   c = new callback(test_readable, fn, fd);
   add c to callback list;
}

[MIT/Morris]
Multithreaded vs. Async

Multithreaded:
• Hard to program
  – Locking code
  – Need to know what blocks
• Coordination explicit
• State stored on the thread’s stack
  – Memory allocation implicit
• Context switch may be expensive
• Multiprocessors

Async:
• Hard to program
  – Callback code
  – Need to know what blocks
• Coordination implicit
• State passed around explicitly
  – Memory allocation explicit
• Lightweight context switch
• Uniprocessors

[MIT/Morris]
Coordination example
• Threaded server:
  – Thread for the network interface
  – Interrupt wakes up the network thread
  – Shared buffer between server threads and the network thread, protected by locks and condition variables
• Asynchronous I/O:
  – Poll for packets
     • How often to poll?
  – Or, an interrupt generates an event
     • Be careful: disable interrupts when manipulating the callback queue.

[MIT/Morris]
One View




  Threads!
Should You Abandon Threads?
• No: important for high-end servers (e.g., databases).
• But, avoid threads wherever possible:
  – Use events, not threads, for GUIs, distributed systems, low-end servers.
  – Only use threads where true CPU concurrency is needed.
  – Where threads are needed, isolate their use in a threaded application kernel: keep most of the code single-threaded.

[Figure: event-driven handlers running above a threaded kernel]

[Ousterhout 1995]
Another view
• Events obscure control flow
  – For programmers and tools

Threads:
thread_main(int sock) {
  struct session s;
  accept_conn(sock, &s);
  read_request(&s);
  pin_cache(&s);
  write_response(&s);
  unpin(&s);
}
pin_cache(struct session *s) {
  pin(s);
  if( !in_cache(s) )
    read_file(s);
}

Events:
AcceptHandler(event e) {
  struct session *s = new_session(e);
  RequestHandler.enqueue(s);
}
RequestHandler(struct session *s) {
  …; CacheHandler.enqueue(s);
}
CacheHandler(struct session *s) {
  pin(s);
  if( !in_cache(s) ) ReadFileHandler.enqueue(s);
  else               ResponseHandler.enqueue(s);
}
...
ExitHandler(struct session *s) {
  …; unpin(s); free_session(s);
}

[Figure: web server stages — Accept Conn. → Read Request → Pin Cache → Read File → Write Response → Exit]
[von Behren]
Control Flow
• Events obscure control flow
  – For programmers and tools

Threads:
thread_main(int sock) {
  struct session s;
  accept_conn(sock, &s);
  read_request(&s);
  pin_cache(&s);
  write_response(&s);
  unpin(&s);
}
pin_cache(struct session *s) {
  pin(s);
  if( !in_cache(s) )
    read_file(s);
}

Events (the handlers appear in no particular order):
CacheHandler(struct session *s) {
  pin(s);
  if( !in_cache(s) ) ReadFileHandler.enqueue(s);
  else               ResponseHandler.enqueue(s);
}
RequestHandler(struct session *s) {
  …; CacheHandler.enqueue(s);
}
...
ExitHandler(struct session *s) {
  …; unpin(s); free_session(s);
}
AcceptHandler(event e) {
  struct session *s = new_session(e);
  RequestHandler.enqueue(s);
}

[Figure: web server stages — Accept Conn. → Read Request → Pin Cache → Read File → Write Response → Exit]
[von Behren]
Exceptions
• Exceptions complicate control flow
  – Harder to understand program flow
  – Cause bugs in cleanup code

Threads:
thread_main(int sock) {
  struct session s;
  accept_conn(sock, &s);
  if( !read_request(&s) )
    return;
  pin_cache(&s);
  write_response(&s);
  unpin(&s);
}
pin_cache(struct session *s) {
  pin(s);
  if( !in_cache(s) )
    read_file(s);
}

Events:
CacheHandler(struct session *s) {
  pin(s);
  if( !in_cache(s) ) ReadFileHandler.enqueue(s);
  else               ResponseHandler.enqueue(s);
}
RequestHandler(struct session *s) {
  …; if( error ) return; CacheHandler.enqueue(s);
}
...
ExitHandler(struct session *s) {
  …; unpin(s); free_session(s);
}
AcceptHandler(event e) {
  struct session *s = new_session(e);
  RequestHandler.enqueue(s);
}

[Figure: web server stages — Accept Conn. → Read Request → Pin Cache → Read File → Write Response → Exit]
[von Behren]
State Management
• Events require manual state management
• Hard to know when to free
  – Use GC or risk bugs

(Threads and Events code as on the previous slide.)

[Figure: web server stages — Accept Conn. → Read Request → Pin Cache → Read File → Write Response → Exit]
[von Behren]
Internet Growth and Scale

[Figure: the Internet, connecting many clients to your server]

How to handle all those client requests raining on your server?
Servers Under Stress

[Figure: performance vs. load (concurrent requests, or arrival rate). An ideal server's performance rises with load and then holds flat at capacity; a real server peaks when some resource is at its maximum, then degrades under overload as some resource thrashes.]

[Von Behren]
Response Time

Components:
• Wire time (request) +
• Queuing time +
• Service demand +
• Wire time (response)

Depends on:
• Cost/length of request
• Load conditions at server

[Figure: latency vs. offered load]
Queuing Theory for Busy People

[Figure: offered load — a request stream at arrival rate λ — waits in a queue, then is served by a process with mean service demand D: an M/M/1 service center.]

• Big Assumptions
  – Queue is First-Come-First-Served (FIFO, FCFS).
  – Request arrivals are independent (Poisson arrivals).
  – Requests have independent service demands.
  – i.e., arrival intervals and service demands are exponentially distributed (denoted “M”).
Utilization
• What is the probability that the center is busy?
  – Answer: some number between 0 and 1.
• What percentage of the time is the center busy?
  – Answer: some number between 0 and 100.
• These are interchangeable: called utilization U.
• If the center is not saturated, i.e., it completes all its requests in some bounded time, then:
  – U = λD (arrivals per time unit × mean service demand)
  – the “Utilization Law”
• The probability that the service center is idle is 1 − U.
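Worked example (numbers invented for illustration): if λ = 10 requests/second and D = 50 ms, then U = λD = 10 × 0.05 = 0.5, i.e., the center is busy half the time and idle half the time.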
                 Little’s Law
• For an unsaturated queue in steady state, mean
  response time R and mean queue length N are
  governed by:


             Little’s Law: N = λR

• Suppose a task T is in the system for R time units.
• During that time:
   – λR new tasks arrive.
   – N tasks depart (all tasks ahead of T).
• But in steady state, the flow in balances flow out.
   – Note: this means that throughput X = λ.
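Worked example (numbers invented for illustration): if requests arrive at λ = 100 per second and each spends R = 0.2 seconds in the system, then on average N = λR = 20 requests are queued or in service.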
Inverse Idle Time “Law”

[Figure: R vs. U — the curve bends sharply upward as U approaches 1 (100%). The service center saturates as 1/λ approaches D: small increases in λ cause large increases in the expected response time R.]

Little’s Law gives response time R = D/(1 − U).

Intuitively, each task T’s response time R = D + DN.
Substituting λR for N: R = D + DλR.
Substituting U for λD: R = D + UR.
R − UR = D → R(1 − U) = D → R = D/(1 − U).
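Worked numbers (invented for illustration), with D = 100 ms: R = 200 ms at U = 0.5, R = 1 second at U = 0.9, and R = 10 seconds at U = 0.99. R grows without bound as U approaches 1.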
Why Little’s Law Is Important
1. Intuitive understanding of FCFS queue behavior.
   • Compute response time from demand parameters (λ, D).
   • Compute N: how much storage is needed for the queue.
2. Notion of a saturated service center.
   • Response times rise rapidly with load and are unbounded.
   • At 50% utilization, a 10% increase in load increases R by about 10%.
   • At 90% utilization, a 10% increase in load increases R by 10x.
3. Basis for predicting performance of queuing networks.
   • Cheap and easy “back of napkin” estimates of system performance based on observed behavior and proposed changes, e.g., capacity planning, “what if” questions.
What does this tell us about server behavior at saturation?

Under the Hood

[Figure: queueing network — requests start at arrival rate λ, visit the CPU, may issue an I/O request to an I/O device and return to the CPU on I/O completion, then exit; throughput is λ until some center saturates.]
         Common Bottlenecks
•   No more File Descriptors
•   Sockets stuck in TIME_WAIT
•   High Memory Use (swapping)
•   CPU Overload
•   Interrupt (IRQ) Overload




                                 [Aaron Bannert]
Scaling Server Sites: Clustering

[Figure: clients connect via virtual IP addresses (VIPs) to a smart switch in front of a server array; the switch can operate at L4 (TCP) or L7 (HTTP, SSL, etc.).]

Goals:
• server load balancing
• failure detection
• access control filtering
• priorities/QoS
• request locality
• transparent caching

What to switch/filter on?
• L3 source IP and/or VIP
• L4 (TCP) ports, etc.
• L7 URLs and/or cookies
• L7 SSL session IDs
Scaling Services: Replication

[Figure: Site A and Site B replicate the service; a client reaches one of them across the Internet.]

Distribute service load across multiple sites.
How to select a server site for each client or request?
Is it scalable?
       Extra Slides

(Any new information on the following
      slides will not be tested.)
   Event-Based Concurrent
Servers Using I/O Multiplexing

• Maintain a pool of connected descriptors.
• Repeat the following forever:
   – Use the Unix select function to block until:
      • (a) New connection request arrives on the listening
         descriptor.
      • (b) New data arrives on an existing connected descriptor.
   – If (a), add the new connection to the pool of connections.
   – If (b), read any available data from the connection
      • Close connection on EOF and remove it from the pool.




                                                         [CMU 15-213]
Problems of Multi-Threaded Servers

• High resource usage, context switch overhead, contended locks
• Too many threads → throughput meltdown, response time explosion
• Solution: bound the total number of threads (a worker-pool sketch follows below)
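One common way to bound the thread count is a fixed pool of worker threads fed by a bounded queue of accepted connections. A minimal pthreads sketch; the queue size and function names are illustrative:

#include <pthread.h>

#define QSIZE 128

static int queue[QSIZE];            /* accepted connection descriptors */
static int head, tail, count;
static pthread_mutex_t m        = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  nonfull  = PTHREAD_COND_INITIALIZER;

void enqueue_conn(int cfd) {        /* called by the acceptor thread */
    pthread_mutex_lock(&m);
    while (count == QSIZE)          /* queue full: apply back-pressure */
        pthread_cond_wait(&nonfull, &m);
    queue[tail] = cfd;
    tail = (tail + 1) % QSIZE;
    count++;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&m);
}

int dequeue_conn(void) {            /* called by each worker thread */
    pthread_mutex_lock(&m);
    while (count == 0)
        pthread_cond_wait(&nonempty, &m);
    int cfd = queue[head];
    head = (head + 1) % QSIZE;
    count--;
    pthread_cond_signal(&nonfull);
    pthread_mutex_unlock(&m);
    return cfd;
}

A fixed number of workers loop over dequeue_conn() and serve each connection in turn; the total thread count never grows with load.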
Event-Driven Programming

• Event-driven programming, also called asynchronous I/O
• Use finite state machines (FSMs) to monitor the progress of requests (sketched below)
• Yields efficient and scalable concurrency
• Many examples: Click router, Flash web server, TP Monitors, etc.
• Java: asynchronous I/O
  – For an example, see: http://www.cafeaulait.org/books/jnp3/examples/12/
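A sketch of the per-request FSM state such a server might keep, in C; the states and fields are invented for illustration:

/* One record per in-flight request; the event loop advances a
   request's state whenever its descriptor becomes ready. */
typedef enum { READ_REQUEST, SEND_HEADER, SEND_FILE, DONE } req_state;

struct request {
    int cfd;            /* client socket */
    int fd;             /* file being served */
    req_state state;    /* progress of this request */
};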
         Traditional Processes
•   Expensive and “heavyweight”
•   One system call per process
•   Fork overhead
•   Coordination
                       Events
•   Need async I/O
•   Need select
•   Wasn’t originally available
•   Not standardized
•   Immature
•   But efficient
•   Code is distributed all through the program
•   Harder to debug and understand
                     Threads
•   Separate interface and implementation
•   Pthreads interface
•   Implementation is user-level or kernel (native)
•   If user-level, needs async I/O
•   But hide the abstraction behind the thread interface
Reference

Valeria Cardellini, Emiliano Casalicchio, Michele Colajanni, and Philip S. Yu. “The State of the Art in Locally Distributed Web-server Systems.”

				