HTTP messages
Entities and Encoding
Herng-Yow Chen
1
Outline
The format and behavior of HTTP
message entities as HTTP containers
How HTTP describes the size of entity
bodies, and what HTTP requires in the
way of sizing
The entity headers used to describe the
format, alphabet, and language of content,
so clients can process it properly
2
Reversible content encoding transforms
data format to take up less space or be
more secure
Transfer encoding modifies how HTTP
ships data to enhance the communication
of some kinds of data
Chunked encoding chops data into
multiple pieces to deliver content of
unknown length safely
3
The assortment of tags, labels, times, and
checksums helps clients get the latest
version of requested content
Ranges are useful for continuing aborted
downloads where they left off
Delta encoding extensions allow clients to
request just those parts of a web page
that have actually changed since a
previously viewed revision
4
Checksums of entity bodies are used to
detect changes in entity content as it
passes through proxies
5
A message is made up of headers and a body
HTTP/1.0 200 OK
Server: Netscape-Enterprise/3.6
Date: Sun, 17 Sep 2000 00:01:05 GMT
Content-Type: text/plain         (entity header)
Content-Length: 18               (entity header)

Hi! I'm a message!               (entity body)
The entity headers and the entity body together form the entity.
6
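HTTP/1.1 defines 10 entity headers
Content-Type        Content-MD5
Content-Length      Last-Modified
Content-Language    Expires
Content-Encoding    Allow
Content-Location    Content-Range
Two more headers also describe the entity, although RFC 2616
classifies ETag as a response header and Cache-Control as a general header:
ETag                Cache-Control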
7
Entity Bodies
8
Why is Content-Length important?
Detecting truncation
Problems caused by an incorrect Content-Length
On a persistent connection, it tells the receiver where one entity body
ends and the next message begins.
Chunked encoding is an alternative: it sends the data in
a series of chunks, each with a specified chunk size.
When content encoding is applied
Content-Length refers to the length of the encoded body, not the
length of the original, unencoded body.
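A minimal sketch of the persistent-connection point: the receiver relies on Content-Length to find where one body ends and the next message begins, and to notice truncation. The function name `split_messages` and the sample byte stream are illustrative assumptions; the parser ignores chunked encoding and every other framing rule.

```python
def split_messages(stream: bytes):
    """Split a byte stream holding back-to-back HTTP messages.

    Assumes every message carries a Content-Length header (no chunking).
    Returns a list of (header_block, body) tuples.
    """
    messages = []
    pos = 0
    while pos < len(stream):
        head_end = stream.index(b"\r\n\r\n", pos)       # blank line ends the headers
        head = stream[pos:head_end].decode("iso-8859-1")
        length = 0
        for line in head.split("\r\n")[1:]:              # skip the start line
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value.strip())
        body_start = head_end + 4
        body = stream[body_start:body_start + length]
        if len(body) < length:                            # fewer bytes than promised
            raise ValueError("truncated body")
        messages.append((head, body))
        pos = body_start + length                         # next message starts here
    return messages


# Two responses sent back to back on one connection (hypothetical data).
stream = (
    b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 18\r\n\r\n"
    b"Hi! I'm a message!"
    b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 2\r\n\r\n"
    b"ok"
)
parsed = split_messages(stream)
```

Without the length, the receiver could not tell the first body apart from the start line of the second response.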
9
Entity Digest
Content-MD5
Is used to check message integrity
Also can be used as a key into a hash
table to quickly locate documents and
reduce duplicate storage of content.
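A short sketch of both uses, assuming the header value is the base64 encoding of the body's MD5 digest as specified in RFC 1864. The helper names and the `store` dictionary are illustrative, not part of any library.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return a Content-MD5 value: base64 of the 128-bit MD5 digest of the body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def verify_content_md5(body: bytes, header_value: str) -> bool:
    """Check a received body against the Content-MD5 header that came with it."""
    return content_md5(body) == header_value.strip()


body = b"Hi! I'm a message!"
digest = content_md5(body)

# The digest can also serve as a hash-table key, so identical bodies
# are stored only once (hypothetical in-memory store).
store = {}
store.setdefault(digest, body)
store.setdefault(content_md5(b"Hi! I'm a message!"), b"duplicate ignored")
```

MD5 detects accidental modification only; it offers no protection against deliberate tampering.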
10
Media type and Charset
Content-Type refers to the type of the original
entity body, before any content encoding.
It supports optional parameters that further
specify the content type.
Character encodings for text media are given by the charset parameter:
Content-Type: text/html; charset=iso-8859-4
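One way a client might split the media type from its charset parameter, sketched with Python's standard `email.message.Message` header parser. The helper name `parse_content_type` and the sample byte are assumptions for illustration.

```python
from email.message import Message


def parse_content_type(value: str):
    """Return (media_type, charset) from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


media_type, charset = parse_content_type("text/html; charset=iso-8859-4")

# The same byte decodes to different characters under different charsets,
# which is why the client needs the parameter to process the body properly.
raw = b"\xe0"
text = raw.decode(charset)
as_latin1 = raw.decode("iso-8859-1")
```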
11
Common media types
Media type                      Description
text/html                       Entity body is an HTML document
text/plain                      Entity body is a document in plain text
image/gif                       Entity body is an image of type GIF
image/jpeg                      Entity body is an image of type JPEG
audio/x-wav                     Entity body contains WAV sound data
model/vrml                      Entity body is a three-dimensional VRML model
application/vnd.ms-powerpoint   Entity body is a Microsoft PowerPoint presentation
multipart/byteranges            Entity body has multiple parts, each containing a different range (in bytes) of the full document
message/http                    Entity body contains a complete HTTP message (see TRACE)
12
Multipart Media Types
MIME “multipart” email messages contain
multiple messages stuck together and sent as a
single, complex message.
Each component is self-contained, with its own
headers describing its contents; the different
components are concatenated together and
delimited by a string.
HTTP also supports multipart bodies; however,
they are typically used in only two cases: fill-in
form submissions and range responses carrying
pieces of a document.
13
Multipart Form Submissions
Your Name?
Your File to send?
14
If the user enters “John” and selects
the text file “hello.txt”
Content-Type: multipart/form-data; boundary=AaBo3x

--AaBo3x
Content-Disposition: form-data; name="submit-name"

John
--AaBo3x
Content-Disposition: form-data; name="files"; filename="hello.txt"
Content-Type: text/plain

… contents of hello.txt …
--AaBo3x--
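A rough sketch of how a client could assemble such a body: each part starts with "--" plus the boundary, a blank line separates part headers from part data, and the final delimiter carries a trailing "--". The function name `build_form_data` and the file contents are hypothetical; the field names and boundary come from the slide.

```python
def build_form_data(fields, files, boundary):
    """Encode text fields and files as a multipart/form-data body.

    fields: dict of name -> str
    files:  dict of name -> (filename, content_type, bytes)
    Returns (headers, body).
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",                                   # blank line ends the part headers
            value.encode("utf-8"),
        ]
    for name, (filename, ctype, data) in files.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode("utf-8"),
            f"Content-Type: {ctype}".encode("ascii"),
            b"",
            data,
        ]
    lines += [delimiter + b"--", b""]              # closing delimiter
    body = b"\r\n".join(lines)
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_form_data(
    {"submit-name": "John"},
    {"files": ("hello.txt", "text/plain", b"placeholder file contents\n")},
    "AaBo3x",
)
```

The boundary must not occur inside any part; a real client would pick a random string and check for collisions, which this sketch does not do.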
15
If the user selects the text file “hello.txt” and
a second file, the image “image.gif”
Content-Type: multipart/form-data; boundary=AaBo3x

--AaBo3x
Content-Disposition: form-data; name="submit-name"

John
--AaBo3x
Content-Disposition: form-data; name="files"
Content-Type: multipart/mixed; boundary=BbC04y

--BbC04y
Content-Disposition: file; filename="hello.txt"
Content-Type: text/plain

… contents of hello.txt …
--BbC04y
Content-Disposition: file; filename="image.gif"
Content-Type: image/gif
Content-Transfer-Encoding: binary

… contents of image.gif …
--BbC04y--
--AaBo3x--
16
Multipart Range Response
HTTP/1.0 206 Partial Content
Server: Microsoft-IIS/5.0
Content-Location: http://xxx/hello.txt
Content-Type: multipart/x-byteranges; boundary=--[abcdefghik…z]--

----[abcdefghik…z]--
Content-Type: text/plain
Content-Range: bytes 0-174/1441

… Part I content …
----[abcdefghik…z]--
Content-Type: text/plain
Content-Range: bytes 1344-1440/1441

… Part II content …
----[abcdefghik…z]----
17
Content-Encoding
HTTP applications sometimes want to
encode content before sending it, to help
lessen the time it takes to transmit the
data.
Content-Type is the type of the original
format, before encoding
Content-Length is the length of the
encoded body
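A small sketch of the two header rules, using Python's standard `gzip` module. The helper names and the sample HTML are assumptions for illustration.

```python
import gzip


def content_encode(body: bytes, content_type: str):
    """Gzip a body and build the entity headers that describe the result."""
    encoded = gzip.compress(body)
    headers = {
        "Content-Type": content_type,           # type of the ORIGINAL body
        "Content-Encoding": "gzip",
        "Content-Length": str(len(encoded)),    # length of the ENCODED body
    }
    return headers, encoded


def content_decode(headers, encoded: bytes) -> bytes:
    """Undo the content encoding named in the headers (gzip or identity only)."""
    if headers.get("Content-Encoding", "identity") == "gzip":
        return gzip.decompress(encoded)
    return encoded


original = b"<html><body>" + b"<p>hello</p>" * 500 + b"</body></html>"
headers, encoded = content_encode(original, "text/html")
restored = content_decode(headers, encoded)
```

The encoding is reversible, so the receiver recovers exactly the original bytes and still knows from Content-Type how to interpret them.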
18
Content Encoding
[Figure: the original content (Content-Type: text/html, Content-Length: 17571) passes through a gzip content encoder. The content-encoded content is sent with Content-Type: text/html, Content-Length: 5746, and Content-Encoding: gzip. A gzip content decoder at the receiver restores the original content (Content-Type: text/html, Content-Length: 17571).]
19
Content-encoding tokens
Content-Encoding value   Description
gzip                     Uses the GNU zip encoding (RFC 1952)
compress                 Uses the UNIX file compression program
deflate                  Uses the zlib format (RFC 1950) for deflate compression (RFC 1951)
identity                 No encoding has been performed. When a Content-Encoding header is not present, this can be assumed.
20
Accept-Encoding Headers
Request message (client to server)
GET /logo.gif HTTP/1.1
Accept-Encoding: gzip
[…]
Response message (server to client)
HTTP/1.1 200 OK
Content-Type: image/gif
Content-Encoding: gzip
[…]
The server compresses the image with gzip to transport a smaller file over the thin
network connection between itself and the client; the client gunzips it on arrival.
This saves network bandwidth and reduces the time the client waits for the transfer,
although the client has to spend time decompressing the image once it is served.
21
Clients can indicate preferred
encodings by attaching q values
Accept-Encoding: compress, gzip
Accept-Encoding:
Accept-Encoding: *
Accept-Encoding: compress;q=0.5, gzip;q=1.0
Accept-Encoding: gzip;q=1.0, identity;q=0.5, *;q=0
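A hedged sketch of how a server might read these headers and pick an encoding. The function names are illustrative, and one detail is an assumption of this sketch rather than a rule from the slides: an unlisted identity encoding is treated as acceptable but least preferred (quality 0.001).

```python
def parse_accept_encoding(value: str):
    """Map each listed encoding token to its q value (default 1.0)."""
    prefs = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        token, _, params = item.partition(";")
        params = params.strip()
        q = float(params[2:]) if params.lower().startswith("q=") else 1.0
        prefs[token.strip().lower()] = q
    return prefs


def choose_encoding(value: str, supported):
    """Pick the supported encoding the client prefers most, or None."""
    prefs = parse_accept_encoding(value)

    def quality(enc):
        if enc in prefs:
            return prefs[enc]
        if "*" in prefs:                  # wildcard covers everything unlisted
            return prefs["*"]
        # Assumption: identity stays acceptable unless explicitly refused.
        return 0.001 if enc == "identity" else 0.0

    best = max(supported, key=quality)    # on ties, the server's order wins
    return best if quality(best) > 0 else None


server_supports = ["gzip", "compress", "identity"]
picked = choose_encoding("compress;q=0.5, gzip;q=1.0", server_supports)
```

With the last header on the slide, `*;q=0` refuses every unlisted encoding, so a server offering only compress and identity falls back to identity.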
22
Transfer Encoding
Content encodings transform the entity
content itself, to save space or for security
reasons, and are tightly associated with
the content format.
In comparison, transfer encodings are
applied for architectural reasons and are
independent of the content format.
23
Content encoding vs. transfer encoding
Content-encoded response
HTTP/1.0 200 OK                  (normal header block)
Content-Encoding: gzip
Content-Type: text/html
[…]
[encoded message]                (normal entity, just encoded)
Transfer-encoded response
HTTP/1.1 200 OK                  (basic header)
Transfer-Encoding: chunked
b                                (encoded blocks: chunk size in hexadecimal, then chunk data)
abcdefghijk
1
a
A content-encoded message just encodes the entity section of the message.
With transfer-encoded messages, the encoding is a function of the entire
message, changing the structure of the message itself.
24
Transfer-Encoding Headers
TE
Used in the request header to tell the server
which extension transfer encodings are okay to
use.
Transfer-Encoding
Used in the response header to tell the
receiver (client) what encoding has been
performed
25
Example
GET /1.html HTTP/1.1
Host: www.csie.ncnu.edu.tw
User-Agent: Mozilla/4.61
TE: trailers, chunked

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Server: Apache/3.0
26
Chunked Encoding
27
Chunked Encoding (continued)
Chunking and Persistent connection
Trailers in chunked messages
Combining Content and Transfer Encoding
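A minimal sketch of the chunked format, assuming the usual wire layout: each chunk is a hexadecimal size line, the data, and a CRLF; a zero-size chunk ends the body; optional trailer headers follow. The function names and the trailer value are hypothetical, and chunk extensions are simply ignored.

```python
def encode_chunked(pieces, trailers=None):
    """Encode an iterable of byte strings as a chunked message body."""
    out = bytearray()
    for piece in pieces:
        if piece:                          # a zero-length chunk would end the body
            out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n"                        # last chunk
    for name, value in (trailers or {}).items():
        out += f"{name}: {value}\r\n".encode("ascii")
    out += b"\r\n"
    return bytes(out)


def decode_chunked(data: bytes):
    """Decode a complete chunked body; return (body, trailers)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";")[0]    # drop chunk extensions
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2                                    # skip data and its CRLF
    trailers = {}
    for line in data[pos:].split(b"\r\n"):
        if line:
            name, _, value = line.decode("ascii").partition(":")
            trailers[name.strip()] = value.strip()
    return bytes(body), trailers


# The sender does not need to know the total length in advance.
wire = encode_chunked([b"abcdefghijk", b"a"], {"Content-MD5": "hypothetical-digest"})
body, trailers = decode_chunked(wire)
```

Because the end of the body is marked explicitly, the connection can stay open for the next message, and values known only after the body is generated can travel in the trailer.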
28
Combining Content and Transfer Encodings
[Figure: the original body (Content-Type: text/html) is first content-encoded with gzip (Content-Type: text/html, Content-Encoding: gzip). Transfer encoding then splits the compressed body into chunks for sending (Content-Type: text/html, Content-Encoding: gzip, Transfer-Encoding: chunked).]
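The ordering can be sketched as follows: the sender applies the content encoding first and the transfer encoding second, and the receiver undoes them in reverse order. The helper names, the chunk size of 64 bytes, and the sample HTML are assumptions for illustration; trailers and chunk extensions are not handled.

```python
import gzip


def chunk(data: bytes, size: int) -> bytes:
    """Split data into chunks of at most `size` bytes, in chunked wire format."""
    out = b""
    for i in range(0, len(data), size):
        piece = data[i:i + size]
        out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    return out + b"0\r\n\r\n"


def unchunk(wire: bytes) -> bytes:
    """Reassemble the body from chunked wire format (no trailers expected)."""
    body, pos = b"", 0
    while True:
        end = wire.index(b"\r\n", pos)
        n = int(wire[pos:end].decode("ascii"), 16)
        if n == 0:
            return body
        body += wire[end + 2:end + 2 + n]
        pos = end + 2 + n + 2


html = b"<html><body>" + b"<p>news</p>" * 200 + b"</body></html>"

compressed = gzip.compress(html)          # step 1: content encoding
wire = chunk(compressed, 64)              # step 2: transfer encoding
headers = {
    "Content-Type": "text/html",
    "Content-Encoding": "gzip",
    "Transfer-Encoding": "chunked",
}

# Receiver: remove the transfer encoding first, then the content encoding.
restored = gzip.decompress(unchunk(wire))
```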
29
Time-Varying Instance
Web objects usually are not static.
The same URL can, over time, point to
different versions of an object.
For example, the home page of a media
company such as CNN or the BBC.
30
Time-Varying Instances
31
Validators and Freshness
In the previous CNN example, the client got the
initial resource V1 and can cache this copy, but
for how long?
Once the document has “expired” at the client, it
must request a fresh copy from the server.
The client can use a “conditional request” to tell
the server which version it currently has, by means
of a validator, and ask for a copy to be sent only if
its current copy is no longer valid.
32
Cache-Control header directives
Directive         Message type
no-cache          Request
no-store          Request
max-age           Request
max-stale         Request
min-fresh         Request
no-transform      Request
only-if-cached    Request
public            Response
private           Response
33
Cache-Control header directives
Directive          Message type
no-cache           Response
no-store           Response
no-transform       Response
must-revalidate    Response
proxy-revalidate   Response
max-age            Response
s-maxage           Response
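A simplified sketch of how a cache might read response directives to decide whether a stored copy is still fresh. The function names are illustrative; the parser assumes no commas inside quoted arguments, and the freshness check models only no-store, no-cache, and max-age (not s-maxage, Expires, or heuristic freshness).

```python
def parse_cache_control(value: str):
    """Parse a Cache-Control header into {directive: argument or True}."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives


def is_fresh(response_cache_control: str, age_seconds: int) -> bool:
    """Return True if a cached response may be served without revalidation."""
    cc = parse_cache_control(response_cache_control)
    if "no-store" in cc or "no-cache" in cc:
        return False
    if "max-age" in cc:
        return age_seconds < int(cc["max-age"])
    return False    # no explicit lifetime: this sketch always revalidates


directives = parse_cache_control("max-age=3600, must-revalidate")
```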
34
Conditional request types
Request type          Validator
If-Modified-Since     Last-Modified
If-Unmodified-Since   Last-Modified
If-Match              ETag
If-None-Match         ETag
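A rough sketch of the server side of a conditional GET, covering only the two most common request types from the table. The function name is hypothetical, entity tags are compared as exact strings (weak validators with a W/ prefix are not treated specially), and dates are parsed with the standard `email.utils.parsedate_to_datetime`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def evaluate_conditional(request_headers, etag, last_modified):
    """Return 304 if the client's copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if "*" in tags or etag in tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        if last_modified <= parsedate_to_datetime(if_modified_since):
            return 304                    # not modified since the client's copy
    return 200                            # send the full, current entity


# Hypothetical resource state on the server.
current_etag = '"v2"'
modified = datetime(2000, 9, 17, 0, 1, 5, tzinfo=timezone.utc)

status = evaluate_conditional({"If-None-Match": '"v1"'}, current_etag, modified)
```

A 304 response carries no body, so the client keeps using its cached copy and only the headers cross the network.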
35
Range Request
HTTP allows clients to actually request just
part or a range of a document.
Applications:
Request RoI (Region of Interest)
Media Indexing and Access
Streaming applications
36
Range Requests
Request message (client to server www.csie.ncnu.edu.tw)
GET /bigfile.html HTTP/1.1
[…]
Response message
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 65537
Accept-Ranges: bytes
[…]
Range request message
GET /bigfile.html HTTP/1.1
Range: bytes=20224-
[…]
Range response message
HTTP/1.1 206 Partial Content
Content-Type: text/html
Content-Range: bytes 20224-65536/65537
Accept-Ranges: bytes
[…]
The client's original request was interrupted, but a second request
for the part of the message that was not received allows the
client to resume from the point of the interruption.
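A minimal sketch of how a server might answer such a request, assuming a single byte range per header (multiple ranges, which would need a multipart/byteranges body, are not handled). The function name `serve_range` is hypothetical.

```python
def serve_range(content: bytes, range_header):
    """Return (status, headers, body) for a request with an optional Range header."""
    total = len(content)
    if not range_header or not range_header.startswith("bytes="):
        headers = {"Content-Length": str(total), "Accept-Ranges": "bytes"}
        return 200, headers, content
    first_s, _, last_s = range_header[len("bytes="):].partition("-")
    if first_s == "":                          # suffix form: the last N bytes
        first, last = max(total - int(last_s), 0), total - 1
    else:
        first = int(first_s)
        last = min(int(last_s), total - 1) if last_s else total - 1
    if first > last or first >= total:         # nothing satisfiable
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    body = content[first:last + 1]             # byte positions are inclusive
    headers = {
        "Content-Range": f"bytes {first}-{last}/{total}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body


# Resume a 65537-byte download from byte 20224 (placeholder content).
bigfile = bytes(65537)
status, headers, body = serve_range(bigfile, "bytes=20224-")
```

Byte positions start at 0 and the last position is inclusive, so the final byte of a 65537-byte entity is number 65536.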
37
Delta Encoding
An extension to the HTTP protocol that
optimizes transfer by communicating
changes instead of entire objects.
RFC 3229 describes delta encoding.
38
Delta Encoding
39
Delta Encoding
40
Delta-encoding headers
ETag
If-None-Match
A-IM
IM
Delta-Base
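A hedged sketch of how these headers could fit together on the server, following the exchange described in RFC 3229: the client names the version it holds in If-None-Match and the manipulations it accepts in A-IM, and the server answers 226 IM Used with IM and Delta-Base when it can send a delta. The function names are hypothetical, and so is the token "x-unidiff": it is not a registered instance manipulation, only a stand-in backed by a unified diff from Python's `difflib`.

```python
import difflib


def unidiff(base: str, current: str) -> str:
    """Stand-in delta format: a unified diff between two text instances."""
    return "".join(difflib.unified_diff(
        base.splitlines(keepends=True), current.splitlines(keepends=True)))


def delta_response(request_headers, versions, current_etag, encoders):
    """Return (status, headers, body) for a delta-capable GET.

    versions: dict of ETag -> text for each instance the server still stores
    encoders: dict of instance-manipulation token -> function(base, current)
    """
    current = versions[current_etag]
    base_etag = request_headers.get("If-None-Match")
    if base_etag == current_etag:
        return 304, {"ETag": current_etag}, ""           # client is up to date
    accepted = [t.strip() for t in request_headers.get("A-IM", "").split(",")
                if t.strip()]
    if base_etag in versions:
        for token in accepted:
            if token in encoders:
                delta = encoders[token](versions[base_etag], current)
                headers = {"ETag": current_etag, "IM": token,
                           "Delta-Base": base_etag}
                return 226, headers, delta               # 226 IM Used
    return 200, {"ETag": current_etag}, current          # fall back to full entity


versions = {'"v1"': "headline A\nstory one\n", '"v2"': "headline A\nstory two\n"}
encoders = {"x-unidiff": unidiff}                        # hypothetical token
request = {"If-None-Match": '"v1"', "A-IM": "x-unidiff"}
status, headers, body = delta_response(request, versions, '"v2"', encoders)
```

The server can only compute a delta if it still holds the base version the client names, which is why it falls back to a plain 200 response otherwise.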
41
IANA registered types of instance
manipulations
Type       Description
vcdiff     Delta using the vcdiff algorithm
diffe      Delta using the Unix diff -e command
gdiff      Delta using the gdiff algorithm
gzip       Compression using the gzip algorithm
deflate    Compression using the deflate algorithm
range      Used in a server response to indicate that the response is partial content as the result of a range selection
identity   Used in a client request's A-IM header to indicate that the client is willing to accept an identity instance manipulation
42
For More Information
http://www.ietf.org/rfc/rfc2616.txt
Hypertext Transfer Protocol -- HTTP/1.1
http://www.ietf.org/rfc/rfc3229.txt
Delta Encoding in HTTP
http://www.ietf.org/rfc/rfc1521.txt
MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies
http://www.ietf.org/rfc/rfc2045.txt
Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
http://www.ietf.org/rfc/rfc1864.txt
The Content-MD5 Header Field
http://www.ietf.org/rfc/rfc3230.txt
Instance Digests in HTTP
43