HTTP and the Dynamic Web

How does the Web work?

The canonical example in your Web browser



Click here





“here” is a Uniform Resource Locator (URL)





http://www-cse.ucsd.edu





It names the location of an object on a server.







[courtesy of Geoff Voelker]

voelker@cs.ucsd.edu

In Action…



[Diagram: Client and Server exchange HTTP messages for http://www-cse.ucsd.edu]



• Client uses DNS to resolve the name of the server (www-cse.ucsd.edu)

• Establishes an HTTP connection with the server over TCP/IP

• Sends the server the name of the object (null here, since the URL has no path)

• Server returns the object
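
The four steps above can be sketched in Java as follows. This is a minimal illustration, assuming a plain HTTP/1.0 exchange over a raw socket; the class and method names are hypothetical, and the Host header is an addition that is not on the slide (many servers expect it).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class SimpleFetch {
    /** Builds an HTTP/1.0 request; an empty (null) object name means the root object "/". */
    static String buildRequest(String host, String objectName) {
        String path = objectName.isEmpty() ? "/" : objectName;
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"   // assumption: not part of the slide
             + "\r\n";
    }

    /** Resolve the name, connect, send the object name, and read what the server returns. */
    static byte[] fetch(String host, int port, String objectName) throws IOException {
        InetAddress address = InetAddress.getByName(host);   // 1. DNS resolves the server name
        try (Socket socket = new Socket(address, port)) {     // 2. connection over TCP/IP
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, objectName)
                    .getBytes(StandardCharsets.US_ASCII));    // 3. send the object name
            out.flush();
            InputStream in = socket.getInputStream();         // 4. server returns the object
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {                // HTTP/1.0: server closes when done
                response.write(buf, 0, n);
            }
            return response.toByteArray();                    // status line, headers, and body
        }
    }
}
```

A call such as SimpleFetch.fetch("www-cse.ucsd.edu", 80, "") would carry out the exchange shown on the slide, provided the host is reachable.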





[Voelker]

Naming and URLs

How should objects be named?

• URLs name objects and the virtual locations for those objects.

Location is a DNS name, so there are two more levels of naming and indirection under there.

Before hypertext we used to worry about access transparency.

• Object name interpretation is up to the server, but it’s often a

location in the local file tree.

If an object moves, the URL breaks (dangling reference).
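
As a rough illustration of how a URL splits into a location and an object name, here is a small Java sketch using the standard java.net.URI class. The UrlParts class and its field names are hypothetical, chosen only for this example.

```java
import java.net.URI;

final class UrlParts {
    final String scheme;      // how to talk to the server, e.g. "http"
    final String location;    // DNS name of the server
    final String objectName;  // interpreted by the server, often a path in its file tree

    private UrlParts(String scheme, String location, String objectName) {
        this.scheme = scheme;
        this.location = location;
        this.objectName = objectName;
    }

    static UrlParts parse(String url) {
        URI uri = URI.create(url);
        String path = uri.getPath();
        // An empty path names the server's root object.
        String objectName = (path == null || path.isEmpty()) ? "/" : path;
        return new UrlParts(uri.getScheme(), uri.getHost(), objectName);
    }

    public static void main(String[] args) {
        UrlParts parts = parse("http://www-cse.ucsd.edu");
        System.out.println(parts.scheme + " | " + parts.location + " | " + parts.objectName);
    }
}
```

Both parts are location-dependent: changing either the DNS name or the path on the server yields a different URL for the same object, which is why a moved object leaves a dangling reference.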

Location-independent names seem like the obvious way to go

• Why don’t we use them (e.g., URNs)?

• How do we make them work, esp. in the face of mobility?





[from Voelker, with additions]

Protocols

What kind of transport protocol should the Web use?

HTTP 1.0

• One TCP connection/object

• Complaints: inefficient, slow, burdensome…

HTTP 1.1

• One TCP connection/many objects (persistent connections)

• Solves all problems, right? Huge amount of complexity

Clients, proxies, servers

How do they compare?

• Protocol differences [Krishnamurthy99], performance comparison

[Nielsen97], effects on servers [Manley97], overhead of TCP connections

[Caceres98]

HTTPS: HTTP with encryption



[Voelker]

HTTP in a Nutshell



Client → Server(s): GET /path/to/file/index.html HTTP/1.0

Server(s) → Client: Content-Type: text/html, Content-Length: 5000, ...



HTTP supports request/response message exchanges of arbitrary length.

Small number of request types: basically GET and POST, with supplements.

object name, + content for POST

optional query string

optional request headers

Responses are self-typed objects (documents) with various attributes and tags.

optional cookies

optional response headers
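
To make the request structure concrete, the sketch below splits a request line into the request type, the object name, and the optional query string. It is a simplified illustration, not a full HTTP parser; the RequestLine class is hypothetical.

```java
final class RequestLine {
    final String method;      // request type, e.g. GET or POST
    final String objectName;  // path portion of the request target
    final String query;       // optional query string, null when absent
    final String version;     // e.g. HTTP/1.0

    private RequestLine(String method, String objectName, String query, String version) {
        this.method = method;
        this.objectName = objectName;
        this.query = query;
        this.version = version;
    }

    static RequestLine parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed request line: " + line);
        }
        String target = parts[1];
        int q = target.indexOf('?');
        String name = (q < 0) ? target : target.substring(0, q);
        String query = (q < 0) ? null : target.substring(q + 1);
        return new RequestLine(parts[0], name, query, parts[2]);
    }

    public static void main(String[] args) {
        RequestLine r = parse("GET /path/to/file/index.html HTTP/1.0");
        System.out.println(r.method + " " + r.objectName + " " + r.version);
    }
}
```

Request headers, and the content of a POST, would follow the request line; they are left out here to keep the example short.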

Scalable Servers









Server









• Of course, you are not the only person accessing the server…

Web Caching









[Diagram: Clients, Proxy Cache, Servers]







• Gee, is there some way to offload those busy servers?

• Use caches to exploit reference locality among clients
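
One way to picture a shared proxy cache is a small map in front of the origin server. The sketch below assumes a least-recently-used replacement policy and models the origin as a plain function; both are illustrative choices, not something the slides prescribe, and the ProxyCache class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class ProxyCache {
    private final Function<String, String> origin;  // stand-in for fetching from the server
    private final Map<String, String> cache;
    private int hits;
    private int misses;

    ProxyCache(final int capacity, Function<String, String> origin) {
        this.origin = origin;
        // An access-ordered LinkedHashMap drops the least recently used entry when full.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Serves a request from any client; only misses reach the origin server. */
    String get(String url) {
        String object = cache.get(url);
        if (object != null) {
            hits++;
            return object;
        }
        misses++;
        object = origin.apply(url);
        cache.put(url, object);
        return object;
    }

    int hits() { return hits; }

    int misses() { return misses; }
}
```

When several clients behind the proxy request the same objects, the hit count grows while the origin sees only the misses, which is the offloading effect the slide refers to.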









[Voelker]

Caching

How should we build caching systems for the Web?

• Seminal paper [Chankhunthod96]

• Proxy caches [Duska97]

• Akamai hack [Karger99]

• Cooperative caching [Tewari99, Fan98, Wolman99]

• Popularity distributions [Breslau99]









[Voelker]

Issues for Web Caching



• binding clients to proxies, handling failover

manual configuration, router-based “transparent caching”, WPAD

(Web Proxy Automatic Discovery)

• proxy may confuse/obscure interactions between server and client

• consistency management

At first approximation the Web is a wide-area read-only file service...but

it is much more than that.

caching responses vs. caching documents

deltas [Mogul+Bala/Douglis/Misha/others@research.att.com]

• prefetching, request routing, scale, performance

Web caching vs. content distribution (e.g., Akamai)

A few weeks from now...

HTTP 1.1

Specification effort started in W3C, finished in IETF....much later.

A number of research works influenced the specification.

HTTP 1.0 shows the importance of careful specification.

• performance

persistent connections with pipelining

range requests, incremental update, deltas

• caching

cache control headers

• negotiation of content attributes and encodings

• content attributes vs. transport attributes

transport encodings for transmission through proxies

• Trailer header and trailer headers

Persistent Connections

There are three key performance reasons for persistent connections:

• connection setup overhead

• TCP slow start: just do it and get it over with

• pipelining as an alternative to multiple connections

And some new complexities resulting from their use, e.g.:

• request/response framing and pairing

• unexpected connection breakage

Just ask anyone from Akamai...

• large numbers of active connections

How long to keep connections around?

These motivations and issues manifest in HTTP, but they are

fundamental for request/response messaging over TCP.
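
The framing problem can be illustrated with a reader that separates back-to-back responses arriving on one connection. This sketch assumes every response carries a Content-Length header and ignores chunked encoding; the ResponseFramer class is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ResponseFramer {
    /** Reads one line without its CRLF; returns null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return sb.toString();
            }
            if (b != '\r') {
                sb.append((char) b);
            }
        }
        return sb.length() == 0 ? null : sb.toString();
    }

    /** Returns the bodies of all responses on the stream, in arrival order. */
    static List<String> readBodies(InputStream in) throws IOException {
        List<String> bodies = new ArrayList<>();
        while (readLine(in) != null) {                 // status line of the next response
            int length = 0;
            String header;
            while ((header = readLine(in)) != null && !header.isEmpty()) {
                int colon = header.indexOf(':');
                if (colon > 0 && header.substring(0, colon).trim()
                        .equalsIgnoreCase("Content-Length")) {
                    length = Integer.parseInt(header.substring(colon + 1).trim());
                }
            }
            byte[] body = new byte[length];
            int off = 0;
            while (off < length) {                     // read exactly the framed body
                int n = in.read(body, off, length - off);
                if (n == -1) {
                    throw new IOException("connection broke mid-response");
                }
                off += n;
            }
            bodies.add(new String(body, StandardCharsets.US_ASCII));
        }
        return bodies;
    }
}
```

Pairing then relies on order: with pipelining, the i-th response on the connection answers the i-th request sent, and a connection that breaks mid-body surfaces here as an IOException that the client must handle.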

Cookies

HTTP cookies (RFC2109) have brought us a better Web.

• S optionally includes arbitrary state as a cookie in a response.

• Cookie is opaque to C, but C saves the cookie.

• C sends the saved cookie in future requests to S, and possibly to

other servers as well.

• Allows stateful servers for sessions, personalized content, etc.
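
The client side of this exchange can be sketched as a small cookie jar. This is a deliberate simplification: it keeps one opaque value per server and ignores cookie attributes such as path, domain, and lifetime. The CookieJar class is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class CookieJar {
    // The client never interprets the value; it only stores it and sends it back.
    private final Map<String, String> cookiesByServer = new HashMap<>();

    /** Called when a response from the given server carries a cookie. */
    void store(String server, String cookieValue) {
        cookiesByServer.put(server, cookieValue);
    }

    /** Returns the Cookie header line for a request to the server, or null if none is saved. */
    String cookieHeaderFor(String server) {
        String cookie = cookiesByServer.get(server);
        return (cookie == null) ? null : "Cookie: " + cookie;
    }

    public static void main(String[] args) {
        CookieJar jar = new CookieJar();
        jar.store("S", "session=abc123");              // S includes state in a response
        System.out.println(jar.cookieHeaderFor("S"));  // C returns it on the next request
    }
}
```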

But: cookies raise privacy and security issues.

• What did S put in that cookie? Can anyone else see it? How much

space does it take up on my disk that I paid soooo much for?

• Cookies may allow third parties who are friends of S1,..., SN to

observe C’s movements among S1,..., SN.

Unverifiable transactions, e.g., DoubleClick and other ad services.

Unverifiable Transactions

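[Diagram: the Client talks to amazon.com, mycfo.com, and an ad server (doubleclick, akamai, etc.). Messages shown: GET x and GET y to the content sites; "GET ad" with "Referer mycfo.com", answered with "ad, cookie c"; then "GET ad, cookie c" with "Referer amazon.com/x", answered with "ad".]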



• Users may not know that they are interacting with DoubleClick.

Amazon and MyCFO trust DoubleClick, but client is ignorant.

• The user visits pages at many sites that reference DoubleClick.

• DoubleClick’s cookie allows it to associate all the requests from a given user.

• If the browser sends Referer headers, DoubleClick may gather information

about all the sites the user visits that reference DoubleClick.
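
To show why the cookie and the Referer header together are enough for tracking, here is a hypothetical sketch of the bookkeeping an ad server could do. It is not a description of any real service; the AdServerLog class and the cookie format are invented for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class AdServerLog {
    private final Map<String, Set<String>> sitesByCookie = new HashMap<>();
    private int nextId = 1;

    /**
     * Handles one "GET ad". A request without a cookie gets a fresh one;
     * the Referer, when present, is recorded under that cookie.
     * Returns the cookie the client will present on later requests.
     */
    String handleAdRequest(String cookie, String referer) {
        String id = (cookie != null) ? cookie : "c" + nextId++;
        Set<String> sites = sitesByCookie.computeIfAbsent(id, k -> new LinkedHashSet<>());
        if (referer != null) {
            sites.add(referer);
        }
        return id;
    }

    /** All referring pages observed for one cookie, i.e. for one user. */
    Set<String> sitesSeen(String cookie) {
        Set<String> sites = sitesByCookie.get(cookie);
        return (sites == null)
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(sites);
    }
}
```

After the two ad requests in the diagram, the entry for cookie c would contain both mycfo.com and amazon.com/x, even though the user never visited the ad server directly.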

Web Cache Consistency



“Requirements of performance, availability, and disconnected operation

require us to relax the goal of semantic transparency.”

- HTTP 1.1 specification



Any caching/replication framework must take steps to ensure that

the cache does not deliver old copies of modified objects.

Issues for cache consistency in the Web:

• large number of clients/proxies

• most static objects don’t change very often

• weaker consistency requirements

Stale information might be OK, as long as it is “not too stale”.

Cache Expiration and Validation

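[Diagram: Clients, Proxy, Origin Server. The proxy forwards a client's GET x to the origin server, which returns x with Last-Modified m and Expires t. A later GET x can be answered from the proxy's copy. Once the copy has expired, the proxy handles GET x by sending GET x with If-Modified-Since m to the origin server, which replies 304: Not Modified.]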

HTTP 1.0 cache control

• Origin server may add a “freshness date” (Expires) response header.

...or the cache could determine expiration time heuristically.

• Proxy must revalidate cache entry if it has expired.

Last-Modified and If-Modified-Since

• Whose clock do we use for absolute expiration times?
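
The expiration and validation rules can be sketched as the proxy-side logic below. It is a simplified model under stated assumptions: times are plain numbers, the origin is a hypothetical interface, and a reply with a null body stands for "304: Not Modified".

```java
import java.util.HashMap;
import java.util.Map;

final class ValidatingProxy {
    /** Hypothetical stand-in for the origin server. */
    interface Origin {
        /** ifModifiedSince is null for an unconditional GET; a null body in the reply means 304. */
        Response get(String name, Long ifModifiedSince);
    }

    static final class Response {
        final String body;        // null models "304: Not Modified"
        final long lastModified;
        final long expires;

        Response(String body, long lastModified, long expires) {
            this.body = body;
            this.lastModified = lastModified;
            this.expires = expires;
        }
    }

    private final Origin origin;
    private final Map<String, Response> cache = new HashMap<>();

    ValidatingProxy(Origin origin) {
        this.origin = origin;
    }

    /** now is the proxy's clock; Expires was stamped using the origin's clock. */
    String get(String name, long now) {
        Response cached = cache.get(name);
        if (cached != null && now < cached.expires) {
            return cached.body;                        // still fresh: no origin contact
        }
        Long since = (cached == null) ? null : Long.valueOf(cached.lastModified);
        Response reply = origin.get(name, since);      // fetch, or revalidate if expired
        if (reply.body == null && cached != null) {
            // 304: keep the cached body, adopt the new expiration time.
            reply = new Response(cached.body, cached.lastModified, reply.expires);
        }
        cache.put(name, reply);
        return reply.body;
    }
}
```

The comparison now < cached.expires mixes two clocks, which is exactly the question the last bullet raises: if the proxy's clock and the origin's clock disagree, entries expire too early or too late.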

Expiration and Validation in HTTP 1.1



[Diagram: the proxy forwards GET x to the origin server, which returns x with ETag v and max-age t; a later GET x; Age [...]]

[...]

        [...] Hello world!");
    } else {
        output.println("Hello world from "
                + fromWho + "");
    }
}

Example 1: Invoking a Servlet by URL



Most servers allow a servlet to be invoked directly by URL.

• client issues HTTP GET

e.g., http://www.yourhost/servlet/HelloWorld

• servlet specified by HTTP POST

e.g., with form data







[Form with a text field labeled "From:"]







generates a URL-encoded query

string, e.g., “?from=me”
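
As an illustration of how form data becomes a URL-encoded query string, the sketch below uses the standard java.net.URLEncoder class. The FormQuery class is hypothetical, and the field name "from" is taken from the example above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class FormQuery {
    /** Encodes form fields as "?name=value&name=value"; returns "" when there are none. */
    static String encode(Map<String, String> fields) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            query.append(query.length() == 0 ? '?' : '&');
            query.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return query.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("from", "me");
        System.out.println("http://www.yourhost/servlet/HelloWorld" + encode(fields));
    }
}
```

Characters with a special meaning in a query string, such as spaces and "&", are escaped by the encoder so the servlet can recover the original field values.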

