Servers: Concurrency and Performance
Jeff Chase
Duke University
HTTP Server
• HTTP Server
– Creates a socket (socket)
– Binds to an address
– Listens to set up the accept backlog
– Can call accept to block waiting for connections
– (Can call select to check for data on multiple sockets)
• Handle request
– GET /index.html HTTP/1.0\r\n
<optional headers, one per line>\r\n
\r\n
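A minimal C sketch of the setup steps and request line above, assuming a POSIX system with BSD sockets. The helper names make_listener and parse_request_line are hypothetical; the listener binds to 127.0.0.1 only, port 0 asks the kernel for any free port, and the parser only splits the request line without validating the method or version.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: socket() + bind() + listen() on 127.0.0.1:port.
 * Port 0 lets the kernel choose a free port. Returns the listening fd or -1. */
int make_listener(unsigned short port, int backlog) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);                  /* create a socket */
    if (fd < 0) return -1;
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||  /* bind to an address */
        listen(fd, backlog) < 0) {                              /* set up the accept backlog */
        close(fd);
        return -1;
    }
    return fd;  /* the server then blocks in accept(fd, NULL, NULL) */
}

/* Hypothetical helper: split "GET /index.html HTTP/1.0\r\n" into method and path.
 * Returns 0 on success, -1 if the line is malformed or a buffer is too small. */
int parse_request_line(const char *line, char *method, size_t mlen,
                       char *path, size_t plen) {
    const char *sp1 = strchr(line, ' ');
    const char *sp2 = sp1 ? strchr(sp1 + 1, ' ') : NULL;
    if (!sp2) return -1;
    size_t m = (size_t)(sp1 - line);
    size_t p = (size_t)(sp2 - sp1 - 1);
    if (m == 0 || p == 0 || m >= mlen || p >= plen) return -1;
    memcpy(method, line, m);
    method[m] = '\0';
    memcpy(path, sp1 + 1, p);
    path[p] = '\0';
    return 0;
}
```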
Inside your server
Measures
Server application offered load
(Apache,
Tomcat/Java, etc)
response time
throughput
utilization
accept
queue
packet
queues
listen
queue
Example: Video On Demand
Server() {
  while (1) {
    cfd = accept();
    read (cfd, name);
    fd = open (name);
    while (!eof(fd)) {
      read(fd, block);
      write (cfd, block);
    }
    close (cfd); close (fd);
  }
}

Client() {
  fd = connect("server");
  write (fd, "video.mpg");
  while (!eof(fd)) {
    read (fd, buf);
    display (buf);
  }
}
How many clients can the server support?
Suppose, say, 200 kb/s video on a 100 Mb/s network link?
[MIT/Morris]
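A sketch of the inner loop of the serial Server() above, assuming POSIX read() and write(); copy_fd and write_all are hypothetical names and the 4 KB block size is arbitrary. Every call here may block, and while it does no other client is served.

```c
#include <errno.h>
#include <unistd.h>

/* Write all n bytes, retrying on short writes (a socket may accept fewer bytes). */
static int write_all(int fd, const char *buf, size_t n) {
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Serial server core: copy `in` to `out` block by block until EOF.
 * Returns the total number of bytes copied, or -1 on error. */
long copy_fd(int in, int out) {
    char block[4096];
    long total = 0;
    for (;;) {
        ssize_t r = read(in, block, sizeof block);   /* may block on disk or network */
        if (r == 0) return total;                    /* EOF: the slide's !eof(fd) test */
        if (r < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (write_all(out, block, (size_t)r) < 0) return -1;  /* may block on the client */
        total += r;
    }
}
```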
Performance “analysis”
• Server capacity:
– Network (100 Mbit/s)
– Disk (20 Mbyte/s)
• Obtained performance: one client stream
• Server is limited by software structure
• If a video is 200 Kbit/s, server should be able to
support more than one client.
500?
[MIT/Morris]
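Worked arithmetic behind the 500 estimate, assuming decimal units (1 Mbit = 1000 Kbit), that each stream needs its full 200 Kbit/s, and ignoring protocol overhead and seek costs:

```latex
\begin{aligned}
\text{network:}\quad & \frac{100\ \text{Mbit/s}}{200\ \text{Kbit/s}} = 500 \text{ streams} \\
\text{disk:}\quad & \frac{20\ \text{Mbyte/s} \times 8\ \text{bit/byte}}{200\ \text{Kbit/s}}
  = \frac{160\ \text{Mbit/s}}{200\ \text{Kbit/s}} = 800 \text{ streams} \\
\text{capacity:}\quad & \min(500,\ 800) = 500 \text{ streams}
\end{aligned}
```

Under these assumptions the network, not the disk, would be the bottleneck at about 500 streams.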
WebServer Flow
Create ServerSocket
connSocket = accept()
read request from connSocket
read local file
write file to connSocket
close connSocket
Figure: TCP socket space on hosts 128.36.232.5 and 128.36.230.2. A listening socket {*.6789, *.*} has a completed connection queue; accept() yields an established socket {128.36.232.5:6789, 198.69.10.10:1500}; another listening socket {*.25, *.*} is also shown. Each socket has a sendbuf and a recvbuf.
Discussion: what does each step do and how long does it take?
Web Server Processing Steps
Accept Client Connection
Read HTTP Request Header
Find File
Send HTTP Response Header
Read File
Send Data
Accepting and reading the request may block waiting on the network; finding and reading the file may block waiting on disk I/O.
Want to be able to process requests concurrently.
Process States and Transitions
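Figure: process states running (user), running (kernel), ready, and blocked. An interrupt, trap, or exception moves a process from running (user) to running (kernel), and return moves it back; Sleep moves it from running (kernel) to blocked; Wakeup moves it from blocked to ready; Run moves it from ready to running (kernel); Yield moves it from running (kernel) to ready.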
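One way to encode the transitions of the process-state figure as a function, following our reading of the diagram; the enum and event names are hypothetical labels for the arcs, not an OS API.

```c
/* Hypothetical labels for the states and arcs in the figure. */
enum pstate { RUNNING_USER, RUNNING_KERNEL, READY, BLOCKED };
enum pevent { TRAP, RETURN, SLEEP, WAKEUP, RUN, YIELD };

/* Returns the next state, or -1 if the event is not a legal arc from state s. */
int next_state(enum pstate s, enum pevent e) {
    switch (s) {
    case RUNNING_USER:                       /* interrupt, trap, or exception */
        return e == TRAP ? RUNNING_KERNEL : -1;
    case RUNNING_KERNEL:
        if (e == RETURN) return RUNNING_USER;
        if (e == SLEEP)  return BLOCKED;     /* e.g., accept() with an empty listen queue */
        if (e == YIELD)  return READY;
        return -1;
    case BLOCKED:                            /* e.g., I/O completion */
        return e == WAKEUP ? READY : -1;
    case READY:                              /* scheduler dispatches the process */
        return e == RUN ? RUNNING_KERNEL : -1;
    }
    return -1;
}
```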
Server Blocking
• accept() when no connect requests are waiting on the
listen queue
– What if server has multiple ports to listen from?
• E.g., 80 for HTTP, 443 for HTTPS
• open/read/write on server files
• read() on a socket, if the client is sending too slowly
• write() on socket, if the client is receiving too slowly
– Yup, TCP has flow control like pipes
What if the server blocks while serving one client, and
another client has work to do?
Under the Hood
Figure: requests start at arrival rate λ, alternate between the CPU and an I/O device (I/O request, I/O completion), and exit; throughput equals λ until some center saturates.
Concurrency and Pipelining
Figure: CPU, DISK, and NET activity over time, Before and After concurrency and pipelining.
Better single-server performance
• Goal: run at server’s hardware speed
– Disk or network should be bottleneck
• Method:
– Pipeline blocks of each request
– Multiplex requests from multiple clients
• Two implementation approaches:
– Multithreaded server
– Asynchronous I/O
[MIT/Morris]
Concurrent threads or processes
• Using multiple threads/processes
– so that only the flow processing
a particular request is blocked
– Java: extends Thread or
implements Runnable interface
Example: a Multi-threaded WebServer, which creates a thread for each request
Multiple Process Architecture
Figure: Process 1 … Process N, in separate address spaces, each running Accept Conn, Read Request, Find File, Send Header, Read File, Send Data.
• Advantages
– Simple programming while addressing blocking issue
• Disadvantages
– Many processes; large context switch overheads
– Consumes much memory
– Optimizations involving sharing information among processes
(e.g., caching) harder
Using Threads
Figure: Thread 1 … Thread N, each running Accept Conn, Read Request, Find File, Send Header, Read File, Send Data.
• Advantages
– Lower context switch overheads
– Shared address space simplifies optimizations (e.g., caches)
• Disadvantages
– Need kernel level threads (why?)
– Some extra memory needed to support multiple stacks
– Need thread-safe programs, synchronization
Multithreaded server
server() {
  while (1) {
    cfd = accept();
    read (cfd, name);
    fd = open (name);
    while (!eof(fd)) {
      read(fd, block);
      write (cfd, block);
    }
    close (cfd); close (fd);
  }
}

for (i = 0; i < 10; i++)
  threadfork (server);

• When waiting for I/O, thread scheduler runs another thread
• What about references to shared data?
• Synchronization
[MIT/Morris]
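The question about shared data can be illustrated with POSIX threads: several workers, standing in for the threadfork loop, update one shared counter under a mutex. requests_served, count_request, and run_workers are hypothetical; without the lock, concurrent increments could be lost.

```c
#include <pthread.h>

/* Hypothetical shared statistic updated by every server thread. */
static long requests_served = 0;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;

/* The lock makes the read-modify-write of the shared counter atomic. */
void count_request(void) {
    pthread_mutex_lock(&stats_lock);
    requests_served++;
    pthread_mutex_unlock(&stats_lock);
}

static void *worker(void *arg) {
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        count_request();              /* stands in for serving one request */
    return NULL;
}

/* Start nthreads workers (at most 64), each "serving" per_thread requests. */
long run_workers(int nthreads, int per_thread) {
    pthread_t tids[64];
    if (nthreads > 64) nthreads = 64;
    requests_served = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, &per_thread);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);  /* per_thread stays valid until all joins finish */
    return requests_served;
}
```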
Event-Driven Programming
• One execution stream: no CPU concurrency.
• Register interest in events (callbacks).
• Event loop waits for events, invokes handlers.
• No preemption of event handlers.
• Handlers generally short-lived.
Figure: an Event Loop dispatching to Event Handlers.
[Ousterhout 1995]
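A minimal single-threaded event loop in C, assuming POSIX poll(). Interest is registered as (fd, callback) pairs, the loop waits for readiness, and each ready handler runs to completion without preemption. on_readable, event_loop_once, and count_bytes are hypothetical names, and unregistering a handler is omitted for brevity.

```c
#include <poll.h>
#include <unistd.h>

#define MAX_HANDLERS 64

typedef void (*handler_fn)(int fd, void *arg);

/* Registered interest: when fd becomes readable, call fn(fd, arg). */
struct registration { int fd; handler_fn fn; void *arg; };

static struct registration regs[MAX_HANDLERS];
static int nregs = 0;

int on_readable(int fd, handler_fn fn, void *arg) {
    if (nregs == MAX_HANDLERS) return -1;
    regs[nregs++] = (struct registration){ fd, fn, arg };
    return 0;
}

/* One pass of the loop: wait up to timeout_ms, then run each ready handler.
 * Returns the number of handlers invoked. */
int event_loop_once(int timeout_ms) {
    struct pollfd pfds[MAX_HANDLERS];
    int n = nregs;                     /* snapshot: handlers may register more */
    for (int i = 0; i < n; i++) {
        pfds[i].fd = regs[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) <= 0) return 0;
    int ran = 0;
    for (int i = 0; i < n; i++) {
        if (pfds[i].revents & (POLLIN | POLLHUP)) {
            regs[i].fn(regs[i].fd, regs[i].arg);   /* runs to completion */
            ran++;
        }
    }
    return ran;
}

/* Example handler: read what is available and add the byte count to *(long *)arg. */
void count_bytes(int fd, void *arg) {
    char buf[512];
    ssize_t r = read(fd, buf, sizeof buf);
    if (r > 0) *(long *)arg += r;
}
```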
Single Process Event Driven (SPED)
Figure: a single Event Dispatcher drives Accept Conn, Read Request, Find File, Send Header, Read File, Send Data.
• Single threaded
• Asynchronous (non-blocking) I/O
• Advantages
– Single address space
– No synchronization
• Disadvantages
– In practice, disk reads still block
Asynchronous Multi-Process Event Driven (AMPED)
Figure: as in SPED, an Event Dispatcher drives Accept Conn, Read Request, Find File, Send Header, Read File, Send Data, with helpers handling disk I/O.
• Like SPED, but use helper processes/threads for disk I/O
• Use IPC to communicate with helper processes
• Advantages
– Shared address space for most web server functions
– Concurrency for disk I/O
• Disadvantages
– IPC between main thread and helper threads
This hybrid model is used by the “Flash” web server.
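A sketch of the AMPED hand-off, assuming POSIX threads with pipes as the IPC channel. The main loop writes a request to a helper and later reads the reply, so only the helper blocks on the file system. The helper, its fixed-size message format, and stat() standing in for the disk work are all assumptions for illustration, not Flash's actual protocol.

```c
#include <pthread.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define REQ_LEN 256   /* hypothetical fixed-size request message */

/* One helper: requests flow main -> helper on req[], replies helper -> main on resp[]. */
struct helper {
    int req[2];
    int resp[2];
    pthread_t tid;
};

/* Helper thread: does the blocking file-system work so the main loop never does. */
static void *helper_main(void *arg) {
    struct helper *h = arg;
    char path[REQ_LEN];
    /* Pipe writes of REQ_LEN (<= PIPE_BUF) bytes are atomic, so each read gets one
     * request; read() returns 0 once the main loop closes its end of the pipe. */
    while (read(h->req[0], path, sizeof path) == (ssize_t)sizeof path) {
        struct stat st;
        long size = stat(path, &st) == 0 ? (long)st.st_size : -1L;  /* may block on disk */
        if (write(h->resp[1], &size, sizeof size) != (ssize_t)sizeof size) break;
    }
    return NULL;
}

int helper_start(struct helper *h) {
    if (pipe(h->req) < 0 || pipe(h->resp) < 0) return -1;
    return pthread_create(&h->tid, NULL, helper_main, h) == 0 ? 0 : -1;
}

/* Main-loop side: hand off the request. A full server would add h->resp[0]
 * to its select()/poll() set and handle the reply when it becomes readable. */
int helper_submit(struct helper *h, const char *path) {
    char msg[REQ_LEN] = {0};
    strncpy(msg, path, sizeof msg - 1);
    return write(h->req[1], msg, sizeof msg) == (ssize_t)sizeof msg ? 0 : -1;
}

void helper_stop(struct helper *h) {
    close(h->req[1]);              /* helper sees EOF and exits */
    pthread_join(h->tid, NULL);
}
```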
Event-Based Concurrent Servers Using I/O Multiplexing
• Maintain a pool of connected descriptors.
• Repeat the following forever:
– Use the Unix select function to block until:
• (a) New connection request arrives on the listening
descriptor.
• (b) New data arrives on an existing connected descriptor.
– If (a), add the new connection to the pool of connections.
– If (b), read any available data from the connection
• Close connection on EOF and remove it from the pool.
[CMU 15-213]
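The loop above can be sketched in C with POSIX select(); pool_step performs one iteration so the loop can be driven step by step. The pool size, the struct pool type, and the function names are assumptions for illustration, and the server only counts bytes instead of parsing requests.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#define POOL_MAX 32

/* Pool of connected descriptors; -1 marks a free slot. */
struct pool { int fds[POOL_MAX]; };

void pool_init(struct pool *p) {
    for (int i = 0; i < POOL_MAX; i++) p->fds[i] = -1;
}

/* One iteration: block up to timeout_ms until (a) a connection request arrives
 * on listenfd or (b) data arrives on a pooled descriptor.
 * Returns the bytes read in this iteration, or -1 if select() fails. */
long pool_step(int listenfd, struct pool *p, int timeout_ms) {
    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(listenfd, &rset);
    int maxfd = listenfd;
    for (int i = 0; i < POOL_MAX; i++) {
        if (p->fds[i] >= 0) {
            FD_SET(p->fds[i], &rset);
            if (p->fds[i] > maxfd) maxfd = p->fds[i];
        }
    }
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
    if (select(maxfd + 1, &rset, NULL, NULL, &tv) < 0) return -1;

    if (FD_ISSET(listenfd, &rset)) {            /* (a) add new connection to pool */
        int cfd = accept(listenfd, NULL, NULL);
        for (int i = 0; cfd >= 0 && i < POOL_MAX; i++)
            if (p->fds[i] < 0) { p->fds[i] = cfd; cfd = -1; }
        if (cfd >= 0) close(cfd);               /* pool full: refuse */
    }
    long total = 0;
    for (int i = 0; i < POOL_MAX; i++) {        /* (b) read available data */
        int fd = p->fds[i];
        if (fd < 0 || !FD_ISSET(fd, &rset)) continue;
        char buf[1024];
        ssize_t r = read(fd, buf, sizeof buf);
        if (r <= 0) { close(fd); p->fds[i] = -1; }   /* EOF: close and remove */
        else total += r;
    }
    return total;
}
```

select() is limited to descriptors below FD_SETSIZE and rescans the whole set on every call, one source of the scalability issues noted below.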
Select
• If a server has many open sockets, how does it know
when one of them is ready for I/O?
int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
struct timeval *timeout);
• Issues with scalability: alternative event interfaces have been offered (e.g., poll(), Linux epoll, BSD kqueue).
Asynchronous I/O
struct callback {
  bool (*is_ready)();
  void (*cb)(void *arg);
  void *arg;
}

main() {
  while (1) {
    for (c = each callback) {
      if (c->is_ready())
        c->cb(c->arg);
    }
  }
}

• Code is structured as a collection of handlers
• Handlers are nonblocking
• Create new handlers for blocking operations
• When operation completes, call handler
[MIT/Morris]
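A runnable C version of the callback loop, with a bounded number of rounds so it terminates. Passing arg to is_ready and the countdown example event source are assumptions; like the slide, it busy-polls every is_ready() predicate, which burns CPU when nothing is ready.

```c
#include <stdbool.h>
#include <stddef.h>

struct callback {
    bool (*is_ready)(void *arg);
    void (*cb)(void *arg);
    void *arg;
};

/* Poll every callback for up to max_rounds rounds (the slide loops forever).
 * Returns the number of handler invocations. */
int run_callbacks(struct callback *cbs, size_t n, int max_rounds) {
    int calls = 0;
    for (int round = 0; round < max_rounds; round++) {
        for (size_t i = 0; i < n; i++) {
            if (cbs[i].is_ready(cbs[i].arg)) {
                cbs[i].cb(cbs[i].arg);   /* handlers must not block */
                calls++;
            }
        }
    }
    return calls;
}

/* Hypothetical event source: becomes ready after `ticks` polls, then fires once. */
struct countdown { int ticks; int fired; };

static bool countdown_ready(void *arg) {
    struct countdown *c = arg;
    if (c->fired) return false;
    return --c->ticks <= 0;
}

static void countdown_fire(void *arg) {
    ((struct countdown *)arg)->fired = 1;
}
```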
Asynchronous server
init() {
  on_accept(accept_cb);
}
accept_cb(cfd) {
  on_readable(cfd, name_cb);
}
on_readable(fd, fn) {
  c = new callback(test_readable, fn, fd);
  add c to callback list;
}
name_cb(cfd) {
  read(cfd, name);
  fd = open(name);
  on_readable(fd, read_cb);
}
read_cb(cfd, fd) {
  read(fd, block);
  on_writeable(cfd, write_cb);
}
write_cb(cfd, fd) {
  write(cfd, block);
  on_readable(fd, read_cb);
}
[MIT/Morris]
Multithreaded vs. Async
Multithreaded:
• Hard to program
– Locking code
– Need to know what blocks
• Coordination explicit
• State stored on thread's stack
– Memory allocation implicit
• Context switch may be expensive
• Multiprocessors
Async:
• Hard to program
– Callback code
– Need to know what blocks
• Coordination implicit
• State passed around explicitly
– Memory allocation explicit
• Lightweight context switch
• Uniprocessors
[MIT/Morris]
Coordination example
• Threaded server:
– Thread for network interface
– Interrupt wakes up network thread
– Protected (locks and condition variables) shared buffer shared between server threads and network thread
• Asynchronous I/O:
– Poll for packets
• How often to poll?
– Or, interrupt generates an event
• Be careful: disable interrupts when manipulating callback queue.
[MIT/Morris]
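The threaded side of this example is a classic bounded buffer. A minimal sketch with POSIX mutexes and condition variables, where int "packets" and network_thread are hypothetical stand-ins for the network interface thread and its packets:

```c
#include <pthread.h>

#define BUF_SLOTS 8

/* Shared buffer between a network (producer) thread and server (consumer) threads. */
struct bounded_buffer {
    int items[BUF_SLOTS];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

void bb_init(struct bounded_buffer *b) {
    b->head = b->count = 0;
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->not_empty, NULL);
    pthread_cond_init(&b->not_full, NULL);
}

void bb_put(struct bounded_buffer *b, int item) {
    pthread_mutex_lock(&b->lock);
    while (b->count == BUF_SLOTS)              /* sleep, releasing the lock, until space */
        pthread_cond_wait(&b->not_full, &b->lock);
    b->items[(b->head + b->count) % BUF_SLOTS] = item;
    b->count++;
    pthread_cond_signal(&b->not_empty);        /* wake one waiting server thread */
    pthread_mutex_unlock(&b->lock);
}

int bb_get(struct bounded_buffer *b) {
    pthread_mutex_lock(&b->lock);
    while (b->count == 0)
        pthread_cond_wait(&b->not_empty, &b->lock);
    int item = b->items[b->head];
    b->head = (b->head + 1) % BUF_SLOTS;
    b->count--;
    pthread_cond_signal(&b->not_full);
    pthread_mutex_unlock(&b->lock);
    return item;
}

/* Hypothetical network thread: deliver packets 1..n into the buffer. */
struct producer_args { struct bounded_buffer *b; int n; };

static void *network_thread(void *arg) {
    struct producer_args *a = arg;
    for (int i = 1; i <= a->n; i++) bb_put(a->b, i);
    return NULL;
}
```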
One View
Threads!
Should You Abandon Threads?
• No: important for high-end servers (e.g. databases).
• But, avoid threads wherever possible:
– Use events, not threads, for GUIs, distributed systems, low-end servers.
– Only use threads where true CPU concurrency is needed.
– Where threads needed, isolate usage in threaded application kernel: keep most of code single-threaded.
Figure: Event-Driven Handlers and a Threaded Kernel.
[Ousterhout 1995]
Another view
• Events obscure control flow
– For programmers and tools
Figure: Web Server stages: Accept Conn., Read Request, Pin Cache, Read File, Write Response, Exit.
Threads:
  thread_main(int sock) {
    struct session s;
    accept_conn(sock, &s);
    read_request(&s);
    pin_cache(&s);
    write_response(&s);
    unpin(&s);
  }

  pin_cache(struct session *s) {
    pin(s);
    if( !in_cache(s) )
      read_file(s);
  }
Events:
  AcceptHandler(event e) {
    struct session *s = new_session(e);
    RequestHandler.enqueue(s);
  }
  RequestHandler(struct session *s) {
    …; CacheHandler.enqueue(s);
  }
  CacheHandler(struct session *s) {
    pin(s);
    if( !in_cache(s) ) ReadFileHandler.enqueue(s);
    else ResponseHandler.enqueue(s);
  }
  ...
  ExitHandler(struct session *s) {
    …; unpin(s); free_session(s);
  }
[von Behren]
Control Flow
• Events obscure control flow
– For programmers and tools
Figure: Web Server stages: Accept Conn., Read Request, Pin Cache, Read File, Write Response, Exit.
Threads:
  thread_main(int sock) {
    struct session s;
    accept_conn(sock, &s);
    read_request(&s);
    pin_cache(&s);
    write_response(&s);
    unpin(&s);
  }
  ...
  pin_cache(struct session *s) {
    pin(s);
    if( !in_cache(s) )
      read_file(s);
  }
Events:
  CacheHandler(struct session *s) {
    pin(s);
    if( !in_cache(s) ) ReadFileHandler.enqueue(s);
    else ResponseHandler.enqueue(s);
  }
  RequestHandler(struct session *s) {
    …; CacheHandler.enqueue(s);
  }
  ExitHandler(struct session *s) {
    …; unpin(s); free_session(s);
  }
  AcceptHandler(event e) {
    struct session *s = new_session(e);
    RequestHandler.enqueue(s);
  }
[von Behren]
Exceptions
• Exceptions complicate control flow
– Harder to understand program flow
– Cause bugs in cleanup code
Figure: Web Server stages: Accept Conn., Read Request, Pin Cache, Read File, Write Response, Exit.
Threads:
  thread_main(int sock) {
    struct session s;
    accept_conn(sock, &s);
    if( !read_request(&s) )
      return;
    pin_cache(&s);
    write_response(&s);
    unpin(&s);
  }
  ...
  pin_cache(struct session *s) {
    pin(s);
    if( !in_cache(s) )
      read_file(s);
  }
Events:
  CacheHandler(struct session *s) {
    pin(s);
    if( !in_cache(s) ) ReadFileHandler.enqueue(s);
    else ResponseHandler.enqueue(s);
  }
  RequestHandler(struct session *s) {
    …; if( error ) return; CacheHandler.enqueue(s);
  }
  ExitHandler(struct session *s) {
    …; unpin(s); free_session(s);
  }
  AcceptHandler(event e) {
    struct session *s = new_session(e);
    RequestHandler.enqueue(s);
  }
[von Behren]
State Management
• Events require manual state management
• Hard to know when to free
– Use GC or risk bugs
Figure: Web Server stages: Accept Conn., Read Request, Pin Cache, Read File, Write Response, Exit.
Threads:
  thread_main(int sock) {
    struct session s;
    accept_conn(sock, &s);
    if( !read_request(&s) )
      return;
    pin_cache(&s);
    write_response(&s);
    unpin(&s);
  }
  ...
  pin_cache(struct session *s) {
    pin(s);
    if( !in_cache(s) )
      read_file(s);
  }
Events:
  CacheHandler(struct session *s) {
    pin(s);
    if( !in_cache(s) ) ReadFileHandler.enqueue(s);
    else ResponseHandler.enqueue(s);
  }
  RequestHandler(struct session *s) {
    …; if( error ) return; CacheHandler.enqueue(s);
  }
  ExitHandler(struct session *s) {
    …; unpin(s); free_session(s);
  }
  AcceptHandler(event e) {
    struct session *s = new_session(e);
    RequestHandler.enqueue(s);
  }
[von Behren]
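A small C sketch of the manual state management problem, assuming a heap-allocated session handed from stage to stage. For brevity the handlers call each other directly instead of going through queues, and all names are hypothetical. Every exit path, including the error path, has to free the session explicitly; a live-session counter makes a leak visible.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical session object handed from handler to handler. */
struct session { int id; bool pinned; };

static int live_sessions = 0;           /* counts allocated sessions, to expose leaks */

static struct session *new_session(int id) {
    struct session *s = calloc(1, sizeof *s);
    if (s) { s->id = id; live_sessions++; }
    return s;
}

static void free_session(struct session *s) {
    free(s);
    live_sessions--;
}

static void exit_handler(struct session *s) {
    s->pinned = false;                  /* unpin */
    free_session(s);                    /* the last stage must release the state */
}

static void cache_handler(struct session *s) {
    s->pinned = true;                   /* pin; read file / write response elided */
    exit_handler(s);
}

/* On error this handler must free the session itself: a bare "return" would
 * drop the only pointer to it, because no stack frame owns the session. */
static void request_handler(struct session *s, bool error) {
    if (error) { free_session(s); return; }
    cache_handler(s);
}

void accept_handler(int id, bool error) {
    struct session *s = new_session(id);
    if (s) request_handler(s, error);
}
```

In the threaded version the session lives in thread_main's stack frame, so an early return releases it automatically (though any pin still has to be undone).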
Internet Growth and Scale
The Internet
How to handle all those client requests raining on your server?
Servers Under Stress
Figure: performance vs. load (concurrent requests, or arrival rate), showing the Ideal curve, a Peak where some resource is at max, and Overload where some resource is thrashing.
[Von Behren]
Response Time Components
• Wire time +
• Queuing time +
• Service demand +
• Wire time (response)
Depends on
• Cost/length of request
• Load conditions at server
Figure: latency as a function of offered load.
Queuing Theory for Busy People
Figure: a request stream (offered load) arrives at rate λ, waits in a queue ("wait here"), then is processed with mean service demand D.
M/M/1 Service Center
• Big Assumptions
– Queue is First-Come-First-Served (FIFO, FCFS).
– Request arrivals are independent (Poisson arrivals).
– Requests have independent service demands.
– i.e., arrival interval and service demand are
exponentially distributed (noted as “M”).
Utilization
• What is the probability that the center is busy?
– Answer: some number between 0 and 1.
• What percentage of the time is the center busy?
– Answer: some number between 0 and 100
• These are interchangeable: called utilization U
• If the center is not saturated, i.e., it completes all its
requests in some bounded time, then:
• U = λD = (arrivals/T * service demand)
• “Utilization Law”
• The probability that the service center is idle is 1-U.
Little’s Law
• For an unsaturated queue in steady state, mean
response time R and mean queue length N are
governed by:
Little’s Law: N = λR
• Suppose a task T is in the system for R time units.
• During that time:
– λR new tasks arrive.
– N tasks depart (all tasks ahead of T).
• But in steady state, the flow in balances flow out.
– Note: this means that throughput X = λ.
Inverse Idle Time “Law”
Service center saturates as 1/λ approaches D: small increases in λ cause large increases in the expected response time R.
Figure: R vs. U, growing without bound as U approaches 1 (100%).
Little’s Law gives response time R = D/(1 - U).
Intuitively, each task T’s response time R = D + DN.
Substituting λR for N: R = D + D λR
Substituting U for λD: R = D + UR
R - UR = D --> R(1 - U) = D --> R = D/(1 - U)
Why Little’s Law Is Important
1. Intuitive understanding of FCFS queue behavior.
• Compute response time from demand parameters (λ, D).
• Compute N: how much storage is needed for the queue.
2. Notion of a saturated service center.
– Response times rise rapidly with load and are unbounded.
• At 50% utilization, a 10% increase in load increases R by 10%.
• At 90% utilization, a 10% increase in load increases R by 10x.
3. Basis for predicting performance of queuing networks.
• Cheap and easy “back of napkin” estimates of system
performance based on observed behavior and proposed
changes, e.g., capacity planning, “what if” questions.
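The formulas can be checked numerically. A sketch in C, assuming an illustrative service demand of D = 10 ms; the function names are hypothetical. Going from λ = 50/s (U = 50%) to 55/s raises R from 20 ms to about 22.2 ms (about 11%), while going from 90/s to 99/s raises R from 100 ms to 1 s.

```c
/* M/M/1 formulas from the slides (hypothetical helper names).
 * lambda: arrival rate in requests/second; D: mean service demand in seconds.
 * Valid only for an unsaturated center (U < 1); -1 signals saturation. */
double utilization(double lambda, double D) {
    return lambda * D;                        /* Utilization Law: U = lambda * D */
}

double response_time(double lambda, double D) {
    double U = utilization(lambda, D);
    return U < 1.0 ? D / (1.0 - U) : -1.0;    /* R = D / (1 - U) */
}

double queue_length(double lambda, double D) {
    double R = response_time(lambda, D);
    return R < 0 ? -1.0 : lambda * R;         /* Little's Law: N = lambda * R */
}
```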
What does this tell us about
server behavior at saturation?
Common Bottlenecks
• No more File Descriptors
• Sockets stuck in TIME_WAIT
• High Memory Use (swapping)
• CPU Overload
• Interrupt (IRQ) Overload
[Aaron Bannert]
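Running out of file descriptors is often just the default per-process limit, since every connection holds one descriptor. A sketch for POSIX systems that raises the soft RLIMIT_NOFILE limit to the hard limit (an unprivileged process may not exceed the hard limit); raise_fd_limit is a hypothetical name, and the sketch leaves an unlimited hard limit alone.

```c
#include <sys/resource.h>

/* Raise the soft open-file limit to the hard limit.
 * Returns the resulting soft limit, or -1 on error. */
long raise_fd_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) < 0) return -1;
    if (rl.rlim_max != RLIM_INFINITY && rl.rlim_cur < rl.rlim_max) {
        rl.rlim_cur = rl.rlim_max;       /* allowed without privileges */
        if (setrlimit(RLIMIT_NOFILE, &rl) < 0) return -1;
    }
    return (long)rl.rlim_cur;
}
```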
Scaling Server Sites: Clustering
Goals:
• server load balancing
• failure detection
• access control filtering
• priorities/QoS
• request locality
• transparent caching
Figure: Clients reach virtual IP addresses (VIPs) on a smart switch (L4: TCP; L7: HTTP, SSL, etc.) in front of a server array.
What to switch/filter on?
• L3 source IP and/or VIP
• L4 (TCP) ports etc.
• L7 URLs and/or cookies
• L7 SSL session IDs
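A sketch of switching on the L3 source IP, assuming IPv4 addresses held as 32-bit integers. Hashing the address keeps each client on one server, which helps request locality. The FNV-1a hash and pick_server are illustrative choices, not a particular switch's algorithm; with plain modulo, changing the number of servers remaps most clients, which consistent hashing would avoid.

```c
#include <stdint.h>

/* Map a client's IPv4 source address to one of nservers back ends. */
unsigned pick_server(uint32_t src_ip, unsigned nservers) {
    uint32_t h = 2166136261u;                 /* FNV-1a offset basis */
    for (int i = 0; i < 4; i++) {
        h ^= (src_ip >> (8 * i)) & 0xffu;     /* mix in one address byte */
        h *= 16777619u;                       /* FNV prime */
    }
    return h % nservers;
}
```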
Scaling Services: Replication
Distribute service load across multiple sites.
How to select a server site for each client or request?
Is it scalable?
Figure: a Client reaching Site A or Site B across the Internet.
Extra Slides
(Any new information on the following
slides will not be tested.)
Problems of Multi-Thread Server
• High resource usage, context switch overhead, contended
locks
• Too many threads lead to throughput meltdown and response time explosion
• Solution: bound total number of threads
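One way to bound the thread count, assuming POSIX unnamed semaphores (available on Linux): the dispatcher takes a semaphore slot before spawning a handler thread, and the handler returns the slot when done. All names are hypothetical, and nanosleep() stands in for real request work.

```c
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

static sem_t slots;                          /* free handler-thread slots */
static pthread_mutex_t stats = PTHREAD_MUTEX_INITIALIZER;
static int active = 0, max_active = 0, completed = 0;

static void *handle_request(void *arg) {
    (void)arg;
    pthread_mutex_lock(&stats);
    if (++active > max_active) max_active = active;
    pthread_mutex_unlock(&stats);

    struct timespec work = { 0, 1000000 };   /* 1 ms of pretend request work */
    nanosleep(&work, NULL);

    pthread_mutex_lock(&stats);
    active--;
    completed++;
    pthread_mutex_unlock(&stats);
    sem_post(&slots);                        /* give the slot back */
    return NULL;
}

/* Dispatch nrequests requests, never running more than limit handlers at once. */
int dispatch_bounded(int nrequests, int limit) {
    if (sem_init(&slots, 0, (unsigned)limit) < 0) return -1;
    for (int i = 0; i < nrequests; i++) {
        sem_wait(&slots);                    /* blocks while limit handlers are busy */
        pthread_t t;
        if (pthread_create(&t, NULL, handle_request, NULL) != 0) return -1;
        pthread_detach(t);
    }
    for (int i = 0; i < limit; i++)          /* wait until every slot is returned */
        sem_wait(&slots);
    sem_destroy(&slots);
    return 0;
}
```

A fixed pool of worker threads pulling from a queue is the other common way to impose the same bound.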
Event-Driven Programming
• Event-driven programming, also called asynchronous I/O
• Using Finite State Machines (FSM) to monitor the progress of requests
• Yields efficient and scalable concurrency
• Many examples: Click router, Flash web server, TP Monitors, etc.
• Java: asynchronous I/O
– for an example see: http://www.cafeaulait.org/books/jnp3/examples/12/
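A sketch of the per-request FSM idea in C: each event advances one connection's state, and the struct holds what a thread would keep on its stack. The states, the blank-line test for a complete request header, and the fixed 1000-byte response size are simplifying assumptions.

```c
#include <stddef.h>
#include <string.h>

enum conn_state { READING_REQUEST, SENDING_RESPONSE, DONE };

/* Per-connection state that survives between events. */
struct conn {
    enum conn_state state;
    char req[256];
    size_t req_len;
    size_t sent, resp_len;
};

/* Event: n bytes of request data arrived. The header ends at the blank line. */
void on_data(struct conn *c, const char *data, size_t n) {
    if (c->state != READING_REQUEST) return;
    if (n > sizeof c->req - 1 - c->req_len) n = sizeof c->req - 1 - c->req_len;
    memcpy(c->req + c->req_len, data, n);
    c->req_len += n;
    c->req[c->req_len] = '\0';
    if (strstr(c->req, "\r\n\r\n")) {
        c->state = SENDING_RESPONSE;
        c->resp_len = 1000;             /* hypothetical response size */
        c->sent = 0;
    }
}

/* Event: the socket can take `room` more bytes. */
void on_writable(struct conn *c, size_t room) {
    if (c->state != SENDING_RESPONSE) return;
    size_t left = c->resp_len - c->sent;
    c->sent += room < left ? room : left;
    if (c->sent == c->resp_len) c->state = DONE;
}
```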
Traditional Processes
• Expensive and “heavyweight”
• One system call per process
• Fork overhead
• Coordination
Events
• Need async I/O
• Need select
• Wasn’t originally available
• Not standardized
• Immature
• But efficient
• Code is distributed all through the program
• Harder to debug and understand
Threads
• Separate interface and implementation
• Pthreads interface
• Implementation is user-level or kernel (native)
• If user-level, needs async I/O
• But hide the abstraction behind the thread interface
Reference
Valeria Cardellini, Emiliano Casalicchio, Michele Colajanni and Philip S. Yu, "The State of the Art in Locally Distributed Web-server Systems."