The Web: some jargon
Web page:
  consists of "objects"
  addressed by a URL
Most Web pages consist of:
  base HTML page, and several referenced objects
URL has two components: host name and path name
User agent for Web is called a browser:
  MS Internet Explorer
Server for Web is called Web server:
  Apache (public domain)
  MS Internet Information Server
The Web: the http protocol
http: hypertext transfer protocol
Web's application layer protocol
client/server model:
  client: browser that requests, receives, "displays" Web objects
  server: Web server sends objects in response to requests
http1.0: RFC 1945
http1.1: RFC 2068
[Figure: PCs and Macs running browsers, server running Web server software, exchanging http requests and responses]
The http protocol: more
http: TCP transport service:
  client initiates TCP connection (creates socket) to server, port 80
  server accepts TCP connection from client
  http messages (application-layer protocol messages) exchanged between browser (http client) and Web server (http server)
  TCP connection closed
http is "stateless":
  server maintains no information about past client requests
Protocols that maintain "state" are complex!
  past history (state) must be maintained
  if server/client crashes, their views of "state" may be inconsistent, must be reconciled
http example
Suppose user enters URL www.someSchool.edu/someDepartment/home.index
(page contains text and references to 10 jpeg images)
1a. http client initiates TCP connection to http server at www.someSchool.edu. Port 80 is default for http server.
1b. http server at host www.someSchool.edu, waiting for TCP connection at port 80, "accepts" connection, notifying client.
2. http client sends http request message (containing URL) into TCP connection socket.
3. http server receives request message, forms response message containing requested object, sends message into socket.
http example (cont.)
4. http server closes TCP connection.
5. http client receives response message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects.
6. Steps 1-5 repeated for each of 10 jpeg objects.
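A minimal Python sketch of steps 1-5 as raw socket operations (the host and path are the example's placeholders and do not point at a real server):

    # Steps 1a-2: open a TCP connection to port 80 and send a GET request.
    import socket

    host, path = "www.someSchool.edu", "/someDepartment/home.index"
    sock = socket.create_connection((host, 80))          # 1a/1b: TCP setup
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    sock.sendall(request.encode("ascii"))                # 2: request message

    # Steps 3-5: read the response until the server closes the connection.
    response = b""
    while chunk := sock.recv(4096):
        response += chunk
    sock.close()
    print(response.split(b"\r\n\r\n", 1)[0].decode())    # status line + headers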
Non-persistent and persistent connections
Non-persistent (HTTP/1.0):
  server parses request, responds, and closes TCP connection
  2 RTTs to fetch each object
  Each object transfer suffers from slow start
  But most 1.0 browsers use parallel TCP connections.
Persistent (default for HTTP/1.1):
  on same TCP connection: server parses request, responds, parses new request, ...
  Client sends requests for all referenced objects as soon as it receives base HTML.
  Fewer RTTs and less slow start.
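A sketch of the persistent case with Python's http.client, which speaks HTTP/1.1 and reuses one TCP connection across requests (host and object names are placeholders):

    import http.client

    conn = http.client.HTTPConnection("www.example.com")  # one TCP connection
    for path in ["/index.html", "/img/photo1.jpg", "/img/photo2.jpg"]:
        conn.request("GET", path)        # HTTP/1.1 keep-alive by default
        resp = conn.getresponse()
        body = resp.read()               # drain fully before reusing connection
        print(path, resp.status, len(body), "bytes")
    conn.close()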
http message format: request
two types of http messages: request, response
http request message:
  ASCII (human-readable format)
request line (GET, POST, HEAD commands):
  GET /somedir/page.html HTTP/1.0
header lines:
  User-agent: Mozilla/4.0
  Accept: text/html, image/gif, image/jpeg
(extra carriage return, line feed indicates end of message)
http request message: general format
http message format: response
status line (status code, status phrase):
  HTTP/1.0 200 OK
header lines:
  Date: Thu, 06 Aug 1998 12:00:15 GMT
  Server: Apache/1.3.0 (Unix)
  Last-Modified: Mon, 22 Jun 1998 ...
data:
  data data data data data ...
http response status codes
In first line in server->client response message.
A few sample codes:
200 OK
  request succeeded, requested object later in this message
301 Moved Permanently
  requested object moved, new location specified later in this message (Location:)
400 Bad Request
  request message not understood by server
404 Not Found
  requested document not found on this server
505 HTTP Version Not Supported
Trying out http (client side) for yourself
1. Telnet to your favorite Web server:
   telnet www.eurecom.fr 80
   Opens TCP connection to port 80 (default http server port) at www.eurecom.fr.
   Anything typed in is sent to port 80 at www.eurecom.fr.
2. Type in a GET http request:
   GET /~ross/index.html HTTP/1.0
   By typing this in (hit carriage return twice), you send this minimal (but complete) GET request to the http server.
3. Look at response message sent by http server!
User-server interaction: authentication
Authentication goal: control access to server documents
stateless: client must present authorization in each request
authorization: typically name, password
  authorization: header line in request
  if no authorization presented, server refuses access, sends WWW authenticate: header line in response
Typical message sequence (client, server):
  usual http request msg
  401: authorization req., WWW authenticate:
  usual http request msg + Authorization: line
  usual http response msg
  (later requests repeat the Authorization: line and get the usual response)
Browser caches name & password so that user does not have to repeatedly enter it.
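A sketch of this retry-with-authorization exchange, assuming the server uses the Basic scheme (the slide does not name one); host, path, and credentials are placeholders:

    import base64
    import http.client

    conn = http.client.HTTPConnection("www.example.com")
    conn.request("GET", "/protected/doc.html")       # no authorization yet
    resp = conn.getresponse()
    resp.read()

    if resp.status == 401:                           # server refused access
        credentials = base64.b64encode(b"name:password").decode("ascii")
        conn.request("GET", "/protected/doc.html",
                     headers={"Authorization": "Basic " + credentials})
        resp = conn.getresponse()                    # usual http response msg
    print(resp.status)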
User-server interaction: cookies
server sends "cookie" to client in response msg:
  Set-cookie: 1678453
client presents cookie in later http request msgs:
  cookie: 1678453
server matches presented-cookie with server-stored info:
  remembering user preferences, previous choices
Typical message sequence (client, server):
  usual http request msg
  usual http response + Set-cookie: #
  usual http request msg + cookie: #  → cookie-specific action
  usual http request msg + cookie: #  → cookie-specific action
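A sketch of the cookie exchange: take whatever Set-cookie value the first response carries and present it on the next request (host is a placeholder):

    import http.client

    conn = http.client.HTTPConnection("www.example.com")
    conn.request("GET", "/")
    resp = conn.getresponse()
    resp.read()
    cookie = resp.getheader("Set-Cookie")        # e.g. "id=1678453; ..."

    if cookie:
        # Present the cookie so the server can match it with stored info.
        conn.request("GET", "/", headers={"Cookie": cookie.split(";")[0]})
        conn.getresponse().read()
    conn.close()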
User-server interaction: conditional GET
Goal: don't send object if client has up-to-date stored (cached) version
client: specify date of cached copy in http request:
  If-modified-since: <date>
server: response contains no object if cached copy is up-to-date:
  HTTP/1.0 304 Not Modified
Message sequence, object not modified (client, server):
  http request msg with If-modified-since: <date>
  http response: HTTP/1.0 304 Not Modified
Message sequence, object modified:
  http request msg with If-modified-since: <date>
  http response: HTTP/1.1 200 OK + <data>
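A sketch of a conditional GET with http.client; host, path, and the cached copy's date are placeholders:

    import http.client

    conn = http.client.HTTPConnection("www.example.com")
    conn.request("GET", "/doc.html",
                 headers={"If-Modified-Since": "Thu, 06 Aug 1998 12:00:15 GMT"})
    resp = conn.getresponse()
    body = resp.read()

    if resp.status == 304:      # not modified: response carries no object
        print("use cached copy")
    else:                       # 200 OK: response carries the (new) object
        print("object modified,", len(body), "bytes")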
Web Caches (proxy server)
Goal: satisfy client request without involving origin server
user sets browser: Web accesses via web cache
client sends all http requests to web cache
  if object at web cache, web cache returns object in http response
  else requests object from origin server, then returns http response to client
[Figure: clients, proxy server, origin servers]
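A toy sketch of that decision logic: serve a hit locally, otherwise fetch from the origin server and keep a copy (it ignores cacheability, expirations, and cache size):

    import http.client

    cache = {}  # url -> body

    def handle_request(host, path):
        url = host + path
        if url in cache:                            # hit: origin not involved
            return cache[url]
        conn = http.client.HTTPConnection(host)     # miss: go to origin
        conn.request("GET", path)
        body = conn.getresponse().read()
        conn.close()
        cache[url] = body                           # keep for future requests
        return body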
Why Web Caching?
Assume: cache is "close" to client (e.g., in same network)
smaller response time: cache "closer" to client
decrease traffic to distant servers
  link out of institutional/local network often a bottleneck
[Figure: origin servers reached over the public Internet via a 1.5 Mbps access link; institutional network with 10 Mbps LAN and institutional cache]
Reference 1: Web Caching
Large-Scale Web Caching and Content Delivery
Caching for a Better Web
Performance is a major concern in the Web
Proxy caching is the most widely used method to improve Web performance
  Duplicate requests to the same document are served from cache
  Hits reduce latency, network utilization, and server load
  Misses increase latency (extra hops)
[Figure: Clients, Proxy Cache, Servers]
[Source: Geoff Voelker]
Previous work has shown that hit rate increases with population size [Duska et al. 97, Breslau et al. 98]
However, single proxy caches have practical limits
  Load, network topology, organizational constraints
One technique to scale the client population is to have proxy caches cooperate
[Source: Geoff Voelker]
Cooperative Web Proxy Caching
Sharing and/or coordination of cache state among multiple Web proxy cache nodes
Effectiveness of proxy cooperation depends on:
  Inter-proxy communication distance
  Size of client population served
  Proxy utilization and load balance
[Source: Geoff Voelker]
Hierarchical Caches
Idea: place caches at exchange or switching points in the network, and cache at each level of the hierarchy.
Resolve misses through the parent.
[Figure: origin Web site (e.g., U.S. Congress) upstream; clients behind levels of caches downstream]
Content-Sharing Among Peers
Idea: Since siblings are “close” in the network, allow
them to share their cache contents directly.
Harvest-Style ICP Hierarchies
Idea: multicast probes within each "family": pick first hit response or wait for all miss responses.
Examples:
  Harvest [Schwartz96]
  Squid (NLANR)
[Figure: client, query, response among sibling caches]
Issues for Cache Hierarchies
With ICP: query traffic within "families" (size n)
  Inter-sibling ICP traffic (and aggregate overhead) is quadratic in n.
  Query-handling overhead grows linearly with n.
Object passes through every cache from origin to client: deeper hierarchies scale better, but impose higher latencies.
A recently-fetched object is replicated at every level of the hierarchy.
Interior cache benefits are limited by capacity if objects are not likely to live there long (e.g., LRU).
Hashing: Cache Array Routing Protocol (CARP)
Microsoft Proxy Server
[Figure: "GET www.hotsite.com" fed through a hash function that selects one proxy in the array]
1. single-hop request resolution
2. no redundant caching of objects
3. allows client-side implementation
4. no new cache-cache protocols
Issues for CARP
no way to exploit network locality at each level
• e.g., relies on local browser caches to absorb repeats
hash can be balanced and/or weighted with a load factor
reflecting the capacity/power of each server
must rebalance on server failures
• Reassigns (1/n)th of cached URLs for array size n.
• URLs from failed server are evenly distributed among the
remaining n-1 servers.
miss penalty and cost to compute the hash
• In CARP, hash cost is linear in n: hash with each node and
pick the “winner”.
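A sketch of "hash with each node and pick the winner" (highest-score routing). The proxy names, the MD5 choice, and the simple multiplicative load factor are illustrative; real CARP uses its own specific weighting:

    import hashlib

    proxies = {"proxy-a": 1.0, "proxy-b": 1.0, "proxy-c": 2.0}  # name -> load factor

    def score(proxy, url):
        digest = hashlib.md5((proxy + url).encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def route(url):
        # Linear in n: score every member, pick the highest.
        return max(proxies, key=lambda p: proxies[p] * score(p, url))

    print(route("http://www.hotsite.com/"))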
Directory-based: Summary Cache for ICP
Idea: each caching server replicates the cache directory ("summary") of each of its peers (e.g., siblings).
  [Cao et. al. Sigcomm98]
  Query a peer only if its local summary indicates a hit.
To reduce storage overhead for summaries, implement the summaries compactly using Bloom Filters.
  May yield false hits (e.g., 1%), but not false misses.
  Each summary is three orders of magnitude smaller than the cache itself, and can be updated by multicasting just the flipped bits.
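A minimal Bloom-filter sketch of such a summary (bit-array size and hash count are illustrative); membership tests may report false hits but never false misses:

    import hashlib

    class BloomFilter:
        def __init__(self, m_bits=8192, k=4):
            self.m, self.k = m_bits, k
            self.bits = bytearray(m_bits // 8)

        def _positions(self, key):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{key}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m

        def add(self, key):
            for pos in self._positions(key):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def might_contain(self, key):      # False means definitely not cached
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(key))

    summary = BloomFilter()
    summary.add("http://peer.example.com/page.html")
    print(summary.might_contain("http://peer.example.com/page.html"))  # True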
A Summary-ICP Hierarchy
e.g., Squid configured miss Summary caches at each level of the hierarchy
reduce inter-sibling miss queries by 95+%.
to use cache digests
client query response
Issues for Directory-Based Caches
Servers update their summaries lazily.
  Update when "new" entries exceed some threshold
  Update delays may yield false hits and/or false misses
Other ways to reduce directory size?
  Vicinity cache [Gadde/Chase/Rabinovich98]
  Subsetting by popularity [Gadde/Chase/Rabinovich97]
What are the limits to scalability?
  If we grow the number of peers?
  If we grow the cache sizes?
On the Scale and Performance of Cooperative Web Proxy Caching
[Wolman/Voelker/.../Levy99] is a key paper in this area from the last few years.
  first negative result in SOSP (?)
  illustrates tools for evaluating wide-area systems:
    simulation and analytical modeling
  illustrates fundamental limits of caching:
    benefits dictated by reference patterns and object rate of change
    forget about capacity, and assume ideal cooperation
  ties together previous work in the field:
    wide-area cooperative caching strategies
    analytical models for Web workloads
UW Trace Characteristics
  Duration:           7 days
  HTTP objects:       18.4 million
  HTTP requests:      82.8 million
  Avg. requests/sec:  137
  Total Bytes:        677 GB
[Source: Geoff Voelker]
A Multi-Organization Trace
University of Washington (UW) is a large and diverse client population
  Approximately 50K people
UW client population contains 200 independent campus organizations
  Museums of Art and Natural History
  Schools of Medicine, Dentistry, Nursing
  Departments of Computer Science, History, and Music
A trace of UW is effectively a simultaneous trace of 200 diverse client organizations
  Key: Tagged clients according to their organization in trace
[Source: Geoff Voelker]
Cooperation Across Organizations
Treat each UW organization as an independent organization
Evaluate cooperative caching among these organizations:
  How much Web document reuse is there among these organizations?
    Place a proxy cache in front of each organization.
  What is the benefit of cooperative caching among these 200 proxies?
[Source: Geoff Voelker]
Ideal Hit Rates for UW proxies
Ideal hit rate: infinite storage, ignore cacheability
Average ideal local hit rate: 43%
[Source: Geoff Voelker]
Ideal Hit Rates for UW proxies (cont.)
Explore benefits of cooperation rather than a particular algorithm
Average ideal hit rate increases from 43% to ...% with cooperative caching
[Source: Geoff Voelker]
Sharing Due to Affiliation
UW organizational sharing vs. random organizations
Difference in weighted averages across all orgs is ~5%
[Source: Geoff Voelker]
Cacheable Hit Rates for UW proxies
Cacheable hit rate: same as ideal, but uncacheable objects are excluded
Cacheable hit rates are much lower than ideal (average is 20%)
Average cacheable hit rate increases from 20% to 41% with cooperative caching
[Source: Geoff Voelker]
Scaling Cooperative Caching
Organizations of this size can benefit significantly from cooperative caching
But... we don't need cooperative caching to handle the entire UW population size
  A single proxy (or small cluster) can handle this load
  No technical reason to use cooperative caching for this population
  In the real world, decisions of proxy placement are often political or geographical
How effective is cooperative caching at scales where a single cache cannot be used?
[Source: Geoff Voelker]
Hit Rate vs. Client Population
Curves similar to other studies [e.g., Duska97, Breslau98]
Significant increase in hit rate as client population increases
  The reason why cooperative caching is effective for UW
Marginal increase in hit rate as client population increases further
[Source: Geoff Voelker]
In the Paper...
1. Do we believe this? What are some possible sources of error in this tracing/simulation study?
  What impact might they have?
2. Why are "ideal" hit rates so much higher for the MS trace, but the cacheable hit rates are the same?
  What is the correlation between sharing and cacheability?
3. Why report byte hit rates as well as object hit rates?
  Is the difference significant? What does this tell us?
4. How can it be that byte hit rate increases with population, while bandwidth consumed is linear?
Sources of Error
1. End effects: is the trace interval long enough?
  Need adequate time for steady-state behavior to emerge
2. Sample size: is the population large enough?
  Is it representative?
3. Completeness: does the sample accurately capture the client reference streams?
  What about browser caches and lower-level proxies? How would they affect the results?
4. Client subsets: how to select clients to represent a subpopulation?
5. Is the simulation accurate/realistic?
  cacheability, capacity/replacement, expiration, latency
What about Latency?
From the client's perspective, latency matters far more than hit rate
How does latency change with caching?
Median latencies improve only a few 100 ms with ideal caching compared to no caching.
[Source: Geoff Voelker]
1. How did they obtain these reported latencies?
2. Why report median latency instead of mean?
  Is the difference significant? What does this tell us? Is it consistent with the reported byte hit ratios?
3. Why does the magnitude of the possible error decrease with population?
4. What about the future?
  What changes in Web behavior might lead to different conclusions in the future?
  Will latency be as important? Bandwidth?
Large Organization Cooperation
What is the benefit of cooperative caching among large organizations?
Explore three ways:
  Linear extrapolation of UW trace
  Simultaneous trace of two large organizations (UW and MS)
  Analytic model for populations beyond trace
[Source: Geoff Voelker]
Extrapolation to Larger Client Populations
Use least squares fit to create a linear extrapolation of hit rates
Hit rate increases logarithmically with client population, e.g., to increase hit rate by 10%:
  Need 8 UWs (ideal)
  Need 11 UWs (cacheable)
"Low ceiling", though: 61% at 2.1M clients (UW cacheable)
A city-wide cooperative cache would get all the benefit
[Source: Geoff Voelker]
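A sketch of that extrapolation: an ordinary least-squares fit of hit rate against log10(population), then evaluation beyond the trace. The data points are invented for illustration, not the paper's measurements:

    import math

    points = [(1_000, 0.30), (10_000, 0.37), (100_000, 0.44)]  # illustrative

    xs = [math.log10(n) for n, _ in points]
    ys = [h for _, h in points]
    k = len(points)
    a = (k * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / \
        (k * sum(x * x for x in xs) - sum(xs) ** 2)    # slope
    b = (sum(ys) - a * sum(xs)) / k                    # intercept

    print(f"predicted hit rate at 2.1M clients: {a * math.log10(2_100_000) + b:.2f}")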
UW & Microsoft Cooperation
Use traces of two large organizations to evaluate caching systems at medium scale
We collected a Microsoft proxy trace during same time period as the UW trace
Combined population is ~80K clients
  Increases the UW population by a factor of 3.6
  Increases the MS population by a factor of 1.4
Cooperation among UW & MS proxies...
  Gives marginal benefit: 2-4%
  Benefit matches "hit rate vs. population" curve
[Source: Geoff Voelker]
UW & Microsoft Traces
  Trace              UW            MS
  Duration           7 days        6.25 days
  HTTP objects       18.4 million  15.3 million
  HTTP requests      82.8 million  107.7 million
  Avg. requests/sec  137           199
  Total Bytes        677 GB        N/A
  Servers            244,211       360,586
  Clients            22,984        60,233
  Population         ~50,000       ~40,000
[Source: Geoff Voelker]
UW & MS Cooperative Caching
[Figure from Geoff Voelker]
Is this worth it?
Use an analytic model to evaluate caching systems at very large client populations
  Parameterize with trace data, extrapolate beyond trace
  Assumes caches are in steady state, do not start cold
  Accounts for document rate of change
Explore growth of Web, variation in document popularity, rate of change
Results agree with trace extrapolations
  95% of maximum benefit achieved at the scale of a medium-large city (500,000)
[Source: Geoff Voelker]
Inside the Model
[Wolman/Voelker/Levy et. al., SOSP 1999] refines [Breslau/Cao et. al., 1999], and others
Approximates asymptotic cache behavior assuming:
  Zipf-like object popularity
  caches have sufficient capacity
Parameters:
  λ   = per-client request rate
  μ   = rate of object change
  p_c = percentage of objects that are cacheable
  α   = Zipf parameter (object popularity)
[Breslau/Cao99] and others observed that Web accesses can be modeled using Zipf-like probability distributions:
  Rank objects by popularity: lower rank i ==> more popular.
  The probability that any given reference is to the ith most popular object is p_i.
    Not to be confused with p_c, the percentage of cacheable objects.
Zipf says: "p_i is proportional to 1/i^α, for some α with 0 < α < 1".
  Higher α gives more skew: popular objects are way popular.
  Lower α gives a more heavy-tailed distribution.
  In the Web, α ranges from 0.6 to 0.8 [Breslau/Cao99].
  With α = 0.8, 0.3% of the objects get 40% of requests.
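A quick numeric sketch of this skew: the share of requests going to the most popular 0.3% of objects under a Zipf-like distribution with α = 0.8 (the universe size n is an assumption here, and the exact fraction depends on it):

    alpha, n = 0.8, 1_000_000

    weights = [1 / i**alpha for i in range(1, n + 1)]  # unnormalized p_i
    top = int(0.003 * n)                               # most popular 0.3%
    print(sum(weights[:top]) / sum(weights))           # their share of requests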
Cacheable Hit Ratio: the Formula
C_N is the hit ratio for cacheable objects achievable by a population of size N with a universe of n objects:

  C_N = ∫[1..n] 1/(C x^α) · 1/(1 + μ C x^α/(N λ)) dx
Inside the Hit Ratio Formula
Approximates a sum over a universe of n objects...
...of the probability of access to each object x...
...times the probability x was accessed since its last change:

  C_N = ∫[1..n] 1/(C x^α) · 1/(1 + μ C x^α/(N λ)) dx

C is just a normalizing constant for the Zipf-like popularity distribution, which must sum to 1; C is not to be confused with C_N:

  C = ∫[1..n] 1/x^α dx    (C = 1/Ω in [Breslau/Cao 99])
Inside the Hit Ratio Formula, Part 2
What is the probability that object i was accessed since its last invalidate (change)?
  = (rate of accesses to i) / (rate of accesses or changes to i)
  = N λ p_i / (N λ p_i + μ)
Divide through by N λ p_i, and note that by Zipf p_i = 1/(C i^α), so μ/(N λ p_i) = μ C i^α/(N λ), which is the second factor in the integrand above.
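A numerical sketch of the C_N formula via midpoint integration; every parameter value here (n, α, λ, μ) is illustrative, not taken from the paper:

    def c_n(N, n=1_000_000, alpha=0.8, lam=0.01, mu=1e-4, steps=200_000):
        """Approximate hit ratio for cacheable objects, population N."""
        C = (n**(1 - alpha) - 1) / (1 - alpha)   # closed form of the C integral
        total, dx = 0.0, (n - 1) / steps
        for i in range(steps):
            x = 1 + (i + 0.5) * dx               # midpoint of the i-th slice
            p = 1 / (C * x**alpha)               # Zipf probability of object x
            fresh = 1 / (1 + mu * C * x**alpha / (N * lam))  # accessed since change
            total += p * fresh * dx
        return total

    for N in (1_000, 100_000, 10_000_000):
        print(N, round(c_n(N), 3))               # grows with N, then flattens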
Hit Rates From Model
Focus on cacheable objects, with different rates of change
Believe even Slow and Mid-Slow are generous
Knee at 500K – 1M client population
[Source: Geoff Voelker]
Extrapolating UW & MS Hit Rates
These are from the simulation results, ignoring rate of change (compare to the graphs from the analytic model).
What is the significance of the slope?
[Graph from Geoff Voelker]
Latency From Model
Latency calculation from the hit rate results
[Source: Geoff Voelker]
Rate of Change
What is more important, the rate of change of popular objects or the rate of change of unpopular objects?
  Separate popular from unpopular objects
  Look at sensitivity of hit rate to variations in rate of change
[Source: Geoff Voelker]
Rate of Change Sensitivity
Popular docs sensitivity:
  Unpopular docs held at a low rate of change
  Issue is minutes to hours
Unpopular docs sensitivity:
  Popular docs held at a low rate of change
  Days to weeks to a month
Unpopular objects are more sensitive!
  Compare differences in hit rates between curves A and C
[Source: Geoff Voelker]
Reference 2: Content Distribution Networks and Quality of Service
What is a CDN?
A system (often an overlay network on the Internet) for high-performance delivery of content.
A CDN maintains multiple locations with mirrors of the same content (known as surrogates) and redirects users to the most appropriate content location.
This distributes the load and also moves the content closer to the user, avoiding potential congestion and reducing response time.
Need for CDNs
Multimedia such as videoconferences and streaming video:
  Sensitive to response-time delays
  Require large amounts of bandwidth
CDNs address these requirements by minimizing the number of backbone routers that content must traverse and distributing the bandwidth load
As an example of CDN scalability:
Once a year the Victoria's Secret Lingerie Company broadcasts their Fashion Parade.
  1,000,000+ viewers watching live @ 25 Kbps
The first year they tried it, the enormous load crashed their centralized servers and many missed the show.
Since then they have started using Yahoo and Akamai for their CDN.
  As many as 2 million watched the show in 2001 without any hiccups.
CDNs and Cache
Caches are used in the Internet to move content closer to the user.
  Reduces load on origin servers
  Eliminates redundant data traversal
CDNs make heavy use of caches:
  Origin servers are fully or partially cached at surrogate servers close to the users.
A CDN consists of:
  Request routing: initiates communication between client and a surrogate.
  Distribution: mechanisms that move content from origin servers to surrogates.
  Delivery: consists of surrogate servers to deliver copies of content to users.
How CDN Routing Works
1. Client requests content from a site.
2. Site uses a CDN as their provider. Client gets redirected to the CDN.
3. Client gets redirected to most appropriate cache.
4. If the CDN has a cache at the Client's ISP, the Client gets redirected to that cache.
5. The CDN cache serves the content to the client.
6. If content is served from the ISP's cache, performance improves due to close proximity to client.
Request Routing
Direct a client's request for objects served by a CDN to the most appropriate cache.
Two commonly used methods:
  1. DNS Redirection
  2. URL Rewriting
DNS Redirection
Authoritative DNS server redirects client request by resolving the CDN server name to the IP address of one content server.
A number of factors determine which content server is used in final resolution:
  Availability of resources, network conditions
Load balancing can be implemented by specifying a low TTL field in a DNS reply.
Two types of CDNs using DNS redirection:
  1. Full site content delivery
  2. Partial site content delivery
Full Site Content Delivery
All requests for the origin server are redirected by DNS to a CDN server.
e.g., UniTech Networks' IntelliDNS
Partial Site Content Delivery
Origin site alters an object's URL so that it's resolved by the CDN's DNS server.
e.g.: www.foo.com/bar.gif becomes ...
URL Rewriting
Origin server dynamically generates pages to redirect clients to different content servers.
  Page is dynamically rewritten with the IP address of a mirror server.
Clearway CDN is one such company.
Distribution
Mechanisms that move content from origin servers to surrogates.
Two different methods to get content to the surrogates:
  Push: content is delivered to the cache before requests are generated.
    Used for highly distributed usage.
    Caches can be updated during off-hours to reduce network load.
  Pull: content is pulled from the origin server to the cache when a request is received from a client.
    The object is delivered to the client and simultaneously stored on the cache for future requests.
Can implement multicasting for efficient content transfer between caches.
Leased lines may be used between servers to ensure QoS.
CDN Benefits
Consists of servers placed around the Internet with each server caching the central content.
Transparent to end user: looks like it's coming from the central server.
Distributed structure means less load on all servers.
Can support QoS for customers with differing needs.
  e.g., Gold class, Silver class, Bronze class scheme
Costs become cheaper:
  Cost of buying new servers is relatively cheaper than trying to obtain higher output from just one server.
CDN Usage in the Real World
Allow organisations to put on large multimedia events
  Internal: company announcements, video conferencing meetings, instructor-led staff training
  External: event hosting such as concerts, fashion shows
Allow organisations to improve internal data flows
  Decentralised intranet system to reduce WAN traffic
Companies can choose to outsource or build their own network
  Outsource: setup and maintenance costs much lower, no need for inhouse experts, providers may also have additional services such as live event management and extensive usage statistics
  Own network: greater control, privacy
Some of the largest companies include Akamai and Yahoo (mainly streaming media)
Extensive networks covering large areas
  Akamai has over 13000 servers in more than 60 countries
How good are these networks?
Largest companies tested for streaming live broadcast capabilities as well as on-demand delivery
Each provider sent a 1 hour MPEG-2 stream via satellite and needed to encode at 100kbps in real-time before transmission
Yahoo achieved average packet loss rate of 0.006%
Another study found internet packet loss of > 9% for similar bandwidth and distance
  However, this is the upper end of results
Tested Performance (cont)
After September 11 many web sites were flooded
  Typical sites that experienced massive increases in traffic were airline and news sites
  Akamai used to serve 80% of MSNBC.com's traffic, including around 12.5 million streams
  Akamai also used by MSNBC.com for Winter Olympics coverage
Many design aspects can affect performance
  Capabilities of infrastructure
  Location of equipment
DNS Redirection is crucial in obtaining optimal performance, but is also one of the hardest areas to perfect
DNS Redirection Issues
Study found that neither Akamai nor Digital Island could redirect the client to the optimal server in their content distribution network
In a small fraction of cases performance was far from optimal
  Due to difficulty in determining user's exact location and the best server at the time
Summary
Using a number of mechanisms including load balancing and caching servers, content delivery networks aim to distribute internet content towards the network edge.
This avoids bottlenecks involved in a centralized architecture, and reduces latency between end user and content.
Common uses for these networks are supporting a large number of users accessing popular web sites, and serving as a delivery means for streaming media.
Hierarchical Caches and CDNs
What are the implications of this study for hierarchical caches and Content Delivery Networks (e.g., Akamai)?
Demand-side proxy caches are widely deployed and are likely to become ubiquitous.
What is the marginal benefit from a supply-side CDN cache given ubiquitous demand-side proxy caching?
What effect would we expect to see in a trace gathered at an interior cache?
CDN interior caches can be modeled as upstream caches in a hierarchy, given some simplifying assumptions.
An Idealized Hierarchy
[Figure: Level 1 (Root) above level-2 caches, each subtree covering N2 clients]
Assume the trees are symmetric to simplify the math.
Ignore individual caches and solve for each level.
Hit Ratio at Interior Level i
C_N gives us the hit ratio for a complete subtree covering population N.
The hit ratio predicted at level i (or at any cache in level i) over R requests is h_i / r_i, where:

  hits at level i:      h_i = R p_c (C_{N_i} - C_{N_{i+1}})
  requests to level i:  r_i = r_{i+1} - h_{i+1}

"the hits for N_i (at level i) minus the hits captured by level i+1, over the miss stream from level i+1"
Root Hit Ratio
Predicted hit ratio for cacheable objects, observed at the root of a two-level cache hierarchy (i.e., where r_2 = R p_c):

  h_1 / r_1 = (C_{N_1} - C_{N_2}) / (1 - C_{N_2})
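A self-contained numerical sketch of the root hit ratio, using the same illustrative C_N integration as in the model sketch earlier (all parameter values made up):

    def c_n(N, n=1_000_000, alpha=0.8, lam=0.01, mu=1e-4, steps=200_000):
        C = (n**(1 - alpha) - 1) / (1 - alpha)
        dx = (n - 1) / steps
        return sum(dx / (C * x**alpha) / (1 + mu * C * x**alpha / (N * lam))
                   for x in (1 + (i + 0.5) * dx for i in range(steps)))

    def root_hit_ratio(N1, N2):
        # (C_N1 - C_N2) / (1 - C_N2): hits left over for the root, over
        # the miss stream escaping the level-2 caches.
        return (c_n(N1) - c_n(N2)) / (1 - c_n(N2))

    print(root_hit_ratio(1_000_000, 10_000))   # N1 = whole tree, N2 = one subtree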
Generalizing to CDNs
A request routing function ƒ(leaf, object, state) directs each request to an interior cache (supply side "reverse proxy") covering N_I clients; each leaf covers N_L clients.
Symmetry assumption: ƒ is stable and "balanced".
Hit ratio in CDN caches
Given the symmetry and balance assumptions, the cacheable hit ratio at the interior (CDN) nodes is:

  (C_{N_I} - C_{N_L}) / (1 - C_{N_L})

N_I is the covered population at each CDN cache.
N_L is the population at each leaf cache.
Cacheable interior hit ratio
[Figure: interior hit ratio vs. increasing N_I and N_L, at fixed fanout N_I/N_L]
Interior hit rates improve as leaf populations increase...
[Figure: interior hit ratio as percentage of all cacheable requests, vs. increasing N_I and N_L]
...but the interior cache sees a declining marginal share of traffic.