csci5211 Computer Networks and Data Communications by chenmeixiu


									Internet Engineering Course
Web Servers
   A company needs to provide various web services
    ◦ Hosting intranet applications
    ◦ The company web site
    ◦ Various Internet applications
   Therefore there is a need to provide HTTP service
    ◦ First we look at what the HTTP protocol is
    ◦ Then we talk about Web servers, and Apache
      as the leading Web server application
The World Wide Web (WWW)
   Global hypertext system
   Initially developed in 1989
     ◦ By Tim Berners-Lee at the European Laboratory for Particle
       Physics, CERN in Switzerland.
     ◦ To facilitate an easy way of sharing and editing research
       documents among geographically dispersed groups of
       researchers
   In 1993, the Web started to grow rapidly
     ◦ Mainly due to the NCSA developing a Web browser called
       Mosaic (an X Window-based application)
       First graphical interface to the Web
       More convenient browsing
       A flexible way for people to navigate through worldwide
        resources on the Internet and retrieve them
    Web Browsers
 Provides access to a
  Web server
 Basic components
    ◦ HTML interpreter
    ◦ HTTP client used to
      retrieve HTML pages
   Some also support
    ◦ FTP, NNTP, POP, SMTP, …
Web Servers
   Definitions
    ◦ A computer responsible for accepting HTTP
      requests from clients and serving them Web
      pages
    ◦ A computer program that provides the above-
      mentioned functionality.
   Common features
    ◦ Accepting HTTP requests from the network
    ◦ Providing an HTTP response to the requester
       Typically consists of an HTML document
    ◦ Usually capable of logging
      Client requests/Server responses
Web Servers cont.
   Returned content
    ◦ Static
       Comes from an existing file
    ◦ Dynamic
       Dynamically generated by some other
        program/script called by the Web server.
   Path translation
    ◦ Translate the path component of a URL into a
      local file system resource
       The path specified by the client is relative to the server’s
        root directory
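The path-translation step can be sketched in Python. This is a minimal illustration, not any particular server's code: the function name `translate_path` is hypothetical, and serving `index.html` for a directory URL is an assumed (though common) convention.

```python
import posixpath
from pathlib import Path
from typing import Optional
from urllib.parse import unquote, urlsplit


def translate_path(doc_root: str, url: str) -> Optional[Path]:
    """Map the path component of a request URL to a file under doc_root."""
    # Keep only the path component (drop query string and fragment),
    # decode %-escapes, and treat an empty path as the root.
    path = unquote(urlsplit(url).path) or "/"
    # Collapse "." and ".." segments; ".." cannot climb above "/".
    normalized = posixpath.normpath("/" + path).lstrip("/")
    root = Path(doc_root).resolve()
    candidate = (root / normalized).resolve()
    # Reject anything that still resolves outside the root (e.g. via symlinks).
    if candidate != root and root not in candidate.parents:
        return None
    # Assumed convention: a directory request is served by its index.html.
    if path.endswith("/"):
        candidate = candidate / "index.html"
    return candidate
```

Because the client's path is interpreted relative to the root directory, a request such as `/../etc/passwd` is collapsed to `etc/passwd` under the document root instead of escaping it.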
 Basic Client/Server Architecture of the Web
   Overall organization of the Web.

• Basic function operation is to fetch documents
     – Client issues requests, browser displays document
     – Server responsible for retrieving document from local file system
• Client/server communications based on HTTP protocol
    Dynamic Content
Parts of documents may be specified via
 Client-side (executed on client machine, e.g., within
  the browser)
    ◦ Client-side script - Script embedded in html document
    ◦ Applet - pre-compiled program passed to client
   Server-side (executed on server machine)
    ◦ Server-side script embedded in document
    ◦ Servlet - precompiled program executed within the
      server’s address space
    ◦ CGI scripts
Common Gateway Interface (CGI)

   The principle of using server-side CGI programs.
    • Allows documents to be generated dynamically “on-the-fly”
    • Provides a standard way for web server to execute a program
      using user-provided data as input
    • To the server, CGI program appears as program responsible for
      fetching the requested document
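A minimal CGI program in Python might look like the sketch below. It assumes the server runs it with the request's query string in the `QUERY_STRING` environment variable (as CGI provides for GET requests) and uses a hypothetical `name` parameter. The program writes header lines, a blank line, and the generated document to standard output, and the server relays that to the client.

```python
#!/usr/bin/env python3
import html
import os
import sys
from urllib.parse import parse_qs


def cgi_response(environ: dict) -> str:
    """Build a CGI response from the environment the web server provides."""
    # For a GET request the server passes the user's input in QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    # Escape user input before embedding it in HTML.
    body = f"<html><body><h1>Hello, {html.escape(name)}!</h1></body></html>"
    # CGI output: header lines, a blank line, then the document body.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(cgi_response(dict(os.environ)))
```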
Architectural Overview
   Architectural details of a client and server in the Web.

• Document fetch (and possibly server-side script): 2b-3b
• Execute CGI Script (separate process): 2c-3c-4c
• Execute servlet program (run within server): 2a-3a-4a
HTTP protocol
 Defines the communication between a web
  server and a client
 Used to deliver virtually all files and other
  data (collectively called resources) on the
  World Wide Web
 A browser is an HTTP client because it sends
  requests to an HTTP server (Web server)
 The standard (and default) port for HTTP
  servers to listen on is 80, though they can
  use any port.
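How a client picks the port can be shown with Python's standard `urllib.parse`. This is a small sketch; the helper name `server_address` is hypothetical and it only handles `http://` URLs.

```python
from urllib.parse import urlsplit


def server_address(url: str) -> tuple:
    """Return the (host, port) a client should connect to for an http:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// URLs")
    # No explicit port in the URL means the standard HTTP port, 80.
    return parts.hostname, parts.port or 80
```

So `http://www.example.com/` connects to port 80, while `http://www.example.com:8080/` connects to port 8080.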
Structure of HTTP transactions
 A request/response, text-based protocol
 Format of an HTTP message:
    <initial line, different for request vs. response>
    Header1: value1
    Header2: value2
    Header3: value3
    <optional message body goes here, like file contents
     or query data; it can be many lines long, or even
     binary data >
                The Format of a Request

 method SP URL SP version CR LF
 header : value CR LF
 header : value CR LF
 CR LF

 Entity Body

 (SP = space; CR LF = carriage return, line feed)
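The request format maps directly onto code. A minimal sketch, assuming a hypothetical helper `build_request` with headers supplied as a dict; it only serializes the message and does not send it.

```python
def build_request(method: str, url: str, headers: dict,
                  body: bytes = b"", version: str = "HTTP/1.1") -> bytes:
    """Serialize a request: request line, header lines, blank line, body."""
    lines = [f"{method} {url} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends in CR LF, and an empty line (a lone CR LF) ends the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```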

Request Example
GET /index.html HTTP/1.1 [CRLF]
Accept: image/gif, image/jpeg [CRLF]
User-Agent: Mozilla/4.0 [CRLF]
Host: [CRLF]
Connection: Keep-Alive [CRLF]

Request Example (annotated)
 GET /index.html HTTP/1.1          <- method, request URL, version
 Accept: image/gif, image/jpeg     <- headers
 User-Agent: Mozilla/4.0
 Connection: Keep-Alive
 [blank line here]
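Going the other way, a server must split such a request back into its parts. A simplified parser sketch, assuming the whole request is already in memory and that a repeated header may simply overwrite an earlier one (real servers handle both more carefully); `parse_request` is a hypothetical name.

```python
def parse_request(raw: bytes):
    """Split a raw request into method, URL, version, headers, and body."""
    # The header section ends at the first empty line (CR LF CR LF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: method SP URL SP version
    method, url, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, url, version, headers, body
```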
                 The Format of a Response

 version SP status code SP phrase CR LF      (status line)
 header : value CR LF
 header : value CR LF
 CR LF

 Entity Body

Response Example
 HTTP/1.0 200 OK
 Date: Fri, 31 Dec 1999 23:59:59 GMT
 Content-Type: text/html
 Content-Length: 1354

 <h1>Hello World</h1>
 (more file contents) . . .

Response Example (annotated)
 HTTP/1.0 200 OK                        <- version, status code, reason phrase
 Date: Fri, 31 Dec 1999 23:59:59 GMT    <- headers
 Content-Type: text/html
 Content-Length: 1354

 <h1>Hello World</h1>                   <- message body
 (more file contents) . . .
 </html>
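A client reading a response first splits its initial line. A minimal sketch with a hypothetical function name; note that the reason phrase may itself contain spaces.

```python
def parse_status_line(line: str):
    """Split a response's initial line into (version, status code, reason phrase)."""
    parts = line.rstrip("\r\n").split(" ", 2)
    version, code = parts[0], int(parts[1])
    # The reason phrase may contain spaces ("Not Found") or be empty.
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason
```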
    Initial line
   A typical initial request line:
     ◦ GET /path/to/file/index.html HTTP/1.0
   Initial response line:
     ◦ HTTP/1.0 200 OK
     ◦ HTTP/1.0 404 Not Found
   Status code:
     ◦ 1xx indicates an informational message only
     ◦ 2xx indicates success of some kind
     ◦ 3xx redirects the client to another URL
     ◦ 4xx indicates an error on the client's part
     ◦ 5xx indicates an error on the server's part
   Common status codes:
     ◦ 200 OK
     ◦ 404 Not Found
     ◦ 301 Moved Permanently
     ◦ 302 Moved Temporarily
     ◦ 303 See Other (HTTP 1.1 only)
     ◦ 500 Internal Server Error
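The first digit alone tells a client what kind of answer it received. The sketch below (hypothetical names `STATUS_CLASSES` and `describe_status`) uses Python's standard `http.HTTPStatus` for the phrase; that enum uses current phrase names, e.g. 302 is "Found" rather than HTTP/1.0's "Moved Temporarily".

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Name a status code and the class given by its first digit."""
    category = STATUS_CLASSES.get(code // 100)
    if category is None:
        raise ValueError(f"not an HTTP status code: {code}")
    # HTTPStatus raises ValueError for codes it does not know.
    return f"{code} {HTTPStatus(code).phrase} ({category})"
```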
Header lines
   Typical request headers:
    ◦ From: email address of requester
    ◦ User-Agent: for example User-Agent: Mozilla/3.0Gold
   Typical response headers:
    ◦ Server: for example Server: Apache/1.2b3-
    ◦ Last-Modified: for example Last-Modified: Sun,
      19 Feb 2006 23:59:59 GMT
    Message body
 In a response, this is where the requested
  resource is returned to the client (the most
  common use of the message body), or perhaps
  explanatory text if there's an error.
 In a request, this is where user-entered data or
  uploaded files are sent to the server.
 If an HTTP message includes a body, there are
  usually header lines in the message that
  describe the body. In particular,
    ◦ The Content-Type: header gives the MIME type of
      the data in the body, such as text/html
    ◦ The Content-Length: header gives the number of
      bytes in the body.
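Since Content-Length counts bytes, it must be computed from the encoded body, not from the text. A minimal sketch, assuming a hypothetical `build_response` helper that always answers with HTTP/1.0:

```python
def build_response(status: int, reason: str, body: bytes,
                   content_type: str) -> bytes:
    """Serialize a response whose headers describe its message body."""
    head = (
        f"HTTP/1.0 {status} {reason}\r\n"
        f"Content-Type: {content_type}\r\n"
        # Content-Length counts bytes, which can exceed the character count.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

For example, `"<h1>Héllo</h1>"` is 14 characters, but its UTF-8 encoding is 15 bytes, so the header must say 15.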
    MIME Media types
 Multipurpose Internet Mail Extensions
 HTTP sends the media type of the file using the
  Content-Type: header
 Some important media types are
    ◦   text/plain, text/html
    ◦   image/gif, image/jpeg
    ◦   audio/basic, audio/wav
    ◦   model/vrml
    ◦   video/mpeg, video/quicktime
    ◦   application/*, application-specific data that does not fall
        under any other MIME category, e.g. application/octet-stream
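A server usually chooses the Content-Type from the file name extension. One way to sketch this is with Python's standard `mimetypes` module; the helper name is hypothetical, and falling back to `application/octet-stream` for unknown types is an assumed policy.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Pick the Content-Type header value for a file about to be served."""
    # guess_type() looks only at the file name's extension.
    ctype, _encoding = mimetypes.guess_type(filename)
    # Unknown extensions fall back to generic binary data.
    return ctype or "application/octet-stream"
```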
Sample HTTP exchange
   To retrieve the file at the URL
   Request:
    GET /path/file.html HTTP/1.0
    User-Agent: HTTPTool/1.0
    [blank line here]
   Response:
    HTTP/1.0 200 OK
    Date: Fri, 31 Dec 1999 23:59:59 GMT
    Content-Type: text/html
    Content-Length: 1354
    <html> <body> <h1>Happy New Millennium!</h1> (more
      file contents) . . . </body> </html>
HTTP methods
   GET: request a resource by URL
   HEAD
    ◦ is just like a GET request, except it asks the server to return the
      response headers only, and not the actual resource (i.e. no
      message body).
    ◦ This is useful to check characteristics of a resource without
      actually downloading it, thus saving bandwidth.
   POST
    ◦ A POST request is used to send data to the server to be
      processed in some way, like by a CGI script.
    ◦ There's a block of data sent with the request, in the message
      body. There are usually extra headers to describe this message
      body, like Content-Type: and Content-Length:.
    ◦ The request URI is not a resource to retrieve; it's usually a
      program to handle the data you're sending.
    ◦ The HTTP response is normally program output, not a static file.
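The difference between GET and HEAD can be shown with a toy dispatcher for a static resource. This is a sketch under stated assumptions: the function `respond` is hypothetical, and it answers any other method (including POST, which would need a program to process the body) with 501 Not Implemented.

```python
def respond(method: str, resource: bytes, content_type: str = "text/html"):
    """Return (status, headers, body) for a request on a static resource."""
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(len(resource)),
    }
    if method == "GET":
        return 200, headers, resource
    if method == "HEAD":
        # Same headers as GET would send, but no message body.
        return 200, headers, b""
    # POST and other methods need a program to handle the request body.
    return 501, {"Content-Length": "0"}, b""
```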
    HTTP 1.1
   It is a superset of HTTP 1.0. Improvements include:
    ◦ Faster response, by allowing multiple transactions
      to take place over a single persistent connection.
    ◦ Faster response and great bandwidth savings, by
      adding cache support.
    ◦ Faster response for dynamically-generated pages,
      by supporting chunked encoding, which allows a
      response to be sent before its total length is known.
    ◦ Efficient use of IP addresses, by allowing multiple
      domains to be served from a single IP address.
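With chunked encoding the response carries `Transfer-Encoding: chunked` instead of `Content-Length:`, and the body is a series of size-prefixed chunks. A minimal encoder, assuming the body arrives as a sequence of byte strings; it omits optional chunk extensions and trailers.

```python
def encode_chunked(pieces) -> bytes:
    """Encode an iterable of byte strings with HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for piece in pieces:
        if not piece:
            continue  # a zero-size chunk would mark the end of the body
        # Each chunk: size in hexadecimal, CR LF, the data, CR LF.
        out += f"{len(piece):X}\r\n".encode("ascii") + piece + b"\r\n"
    # A final zero-size chunk followed by an empty line ends the body.
    out += b"0\r\n\r\n"
    return bytes(out)
```

The server can thus send each piece of a dynamically generated page as soon as it is produced.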
Manually Experimenting
with HTTP
>telnet 80
Connected to
Escape character is '^]'.

Sending a Request
> GET /~ladani/index.htm HTTP/1.0
[blank line]

The Response
HTTP/1.1 200 OK
Date: Fri, 29 Feb 2008 08:23:33 GMT
Server: Apache/2.0.52 (CentOS)
Last-Modified: Wed, 07 Nov 2007 12:27:44 GMT
ETag: "6ccb6-741c-43e55e05a5000"
Accept-Ranges: bytes
Content-Length: 29724
Connection: close
Content-Type: text/html; charset=WINDOWS-1256
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
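The telnet experiment can also be reproduced with a plain TCP socket. This sketch (hypothetical function `fetch`) sends the same HTTP/1.0 request with no headers and assumes the server closes the connection after the response, as HTTP/1.0 servers do by default.

```python
import socket


def fetch(host: str, port: int, path: str) -> bytes:
    """Send the same GET request typed into telnet and return the raw response."""
    # The blank line (final CR LF) tells the server the request is complete.
    request = f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        chunks = []
        # Read until the server closes the connection (recv returns b"").
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch(host, 80, "/~ladani/index.htm")` for the server used above would return the status line, headers, blank line, and HTML shown in the response.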


GET /~ladani/index.htm HTTP/1.0
   Response: HTTP/1.1 200 OK, followed by the HTML code

GET /~ladani/no-such-page.htm HTTP/1.0
   Response: HTTP/1.1 404 Not Found, followed by the HTML code

GET /index.html HTTP/1.1
   Response: HTTP/1.1 400 Bad Request, followed by the HTML code

Why is it a Bad Request?
   HTTP/1.1 requires a Host header, and this request sends none.
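The Host header is also what lets one IP address serve several domains. A sketch of that check, assuming header names were already lower-cased by the parser; the `SITES` table, paths, and function name are all hypothetical.

```python
# Hypothetical virtual-host table: several domains share one IP address.
SITES = {
    "www.example.com": "/var/www/example-com",
    "www.example.org": "/var/www/example-org",
}
DEFAULT_ROOT = "/var/www/default"


def select_document_root(version: str, headers: dict):
    """Return (status, document root) for a parsed request's headers."""
    host = headers.get("host")
    if host is None:
        # HTTP/1.1 requires Host; HTTP/1.0 clients may omit it.
        if version == "HTTP/1.1":
            return 400, None
        return 200, DEFAULT_ROOT
    # Drop an optional ":port" suffix (this sketch ignores IPv6 literals).
    name = host.split(":", 1)[0].lower()
    return 200, SITES.get(name, DEFAULT_ROOT)
```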

Session-persistent State
   What does session-persistent state mean?
    ◦ State information that is preserved between browsing
      sessions
    ◦ Information that is stored semi-permanently (i.e., on disk)
      for later access.
   Why was the calculator example not session-persistent?
    ◦ The sum, current display, etc. were not preserved if we went to a
      different website and came back to the calculator.
Why session-persistence?
   User-based customizations.
    ◦ MyYahoo, E*Trade, etc.
   Long transactions.
    ◦ Electronic shopping carts.
    ◦ Order preparation
   Server-side state maintenance.
    ◦ Large amounts of state info that you don’t
      want to pass back and forth.
Cookie Overview
 HTTP cookies are a mechanism for creating and
  using session-persistent state.
 Cookies are simple string values that are
  associated with a set of URLs.
 Servers set cookies using an HTTP header.
 Client transmits the cookie as part of HTTP
  request whenever an associated URL is visited
  in the future.
Anatomy of a cookie.
   A cookie has six parts:
    ◦   Name
    ◦   Value
    ◦   Domain
    ◦   Path
    ◦   Expiration
    ◦   Security flag
   Name and Value are required; the others have
    default values.
Setting a cookie.
 A cookie is set using the “Set-Cookie”
  header in an HTTP response.
 The string value of the Set-Cookie header is
  parsed into semicolon-separated fields
  that define the different parts of the cookie.
 The cookie is stored by the client.
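Parsing a Set-Cookie value into the six parts can be sketched as follows. This is a simplified illustration with a hypothetical function name: it follows the semicolon-separated layout described above and does not implement every rule a real browser applies.

```python
def parse_set_cookie(header_value: str) -> dict:
    """Split a Set-Cookie header value into the six parts of a cookie."""
    # Fields are separated by semicolons; empty fields are ignored.
    fields = [f.strip() for f in header_value.split(";") if f.strip()]
    # The first field is always NAME=VALUE.
    name, _, value = fields[0].partition("=")
    cookie = {
        "name": name.strip(),
        "value": value.strip(),
        # None means the attribute was absent and the client applies its default.
        "domain": None,
        "path": None,
        "expires": None,
        "secure": False,
    }
    for field in fields[1:]:
        key, _, val = field.partition("=")
        key = key.strip().lower()
        if key == "secure":
            cookie["secure"] = True
        elif key in ("domain", "path", "expires"):
            cookie[key] = val.strip()
    return cookie
```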
Sending cookies
 Every time a client makes an HTTP request, it
  tests every cookie for a match.
 Cookies match if…
    ◦   The cookie domain is a suffix of the URL’s server host name.
    ◦   The cookie expiration time has not passed.
    ◦   The cookie path is a prefix of the URL path.
    ◦   If the cookie security flag is on, the connection is secure.
   If a match is made, then the name/value pair of the
    cookie is sent in the “Cookie” header of the request.
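The client-side matching rules translate into a short filter. This is a sketch with hypothetical names (`Cookie`, `matches`, `cookie_header`); it uses the plain suffix test described above, which is looser than what modern browsers apply.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Cookie:
    name: str
    value: str
    domain: str
    path: str = "/"
    expires: Optional[datetime] = None  # None: no expiration set
    secure: bool = False


def matches(cookie: Cookie, host: str, path: str, secure_conn: bool,
            now: datetime) -> bool:
    """Apply the four matching rules to one stored cookie."""
    # Plain suffix test; RFC 6265 browsers also require a dot boundary,
    # so "badexample.com" would not match the domain "example.com".
    if not host.endswith(cookie.domain):
        return False
    if cookie.expires is not None and cookie.expires <= now:
        return False
    if not path.startswith(cookie.path):
        return False
    if cookie.secure and not secure_conn:
        return False
    return True


def cookie_header(jar: List[Cookie], host: str, path: str, secure_conn: bool,
                  now: Optional[datetime] = None) -> Optional[str]:
    """Build the Cookie request header from every matching cookie, if any."""
    now = now or datetime.now(timezone.utc)
    pairs = [f"{c.name}={c.value}" for c in jar
             if matches(c, host, path, secure_conn, now)]
    return ("Cookie: " + "; ".join(pairs)) if pairs else None
```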
Setting a Cookie
   Full cookie:
    Set-Cookie: my_cookie=This is my cookie value;;
     path=/~ladani; expires=Thu, 06-March-08
     12:00:00 GMT
   A response can have more than one Set-Cookie
    header, or can combine more than one
    cookie in one header by separating them with commas
Cookie Matching
   Biggest misunderstanding:
    ◦ Servers do not RETRIEVE cookies!!!!
    ◦ Servers RECEIVE cookies previously planted.
   Step 1:
    ◦ Some response by server installs cookie with
      “Set-cookie” header.
    ◦ Client saves cookie to disk.
Cookie Matching
   Step 2:
    ◦ Browser goes to some page which matches
      previously received cookie.
    ◦ Cookie name and value sent in request as
      “Cookie” HTTP header.
   Step 3:
    ◦ CGI program detects presence of cookie and
      uses it.
      Where is the cookie info?
        Environment variable HTTP_COOKIE
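Inside a CGI program the cookies can be read from `HTTP_COOKIE` with Python's standard `http.cookies.SimpleCookie`. A minimal sketch; the function name is hypothetical, and in a real CGI script it would be called as `read_cookies(os.environ)`.

```python
from http.cookies import SimpleCookie


def read_cookies(environ) -> dict:
    """Return the name/value pairs the browser sent in its Cookie header."""
    jar = SimpleCookie()
    # The server copies the request's Cookie header into HTTP_COOKIE.
    jar.load(environ.get("HTTP_COOKIE", ""))
    return {name: morsel.value for name, morsel in jar.items()}
```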
Where are cookies stored on client?
   Client-specific locations.
   No standard.
   The latest IE stores cookies in a folder called
    “Temporary Internet Files”
      ◦ Each cookie is stored in a separate file.
     Netscape stores them in “cookies.txt”
Typical Cookie Usages
   Cookies as Database Index
    ◦ Most common use of cookies.
    ◦ State information is kept in some sort of
      database and the cookie acts as an index.
   Cookies as State Variables
    ◦ Name of cookie is like variable name.
    ◦ Value of cookie is state information.
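The "cookie as database index" pattern can be sketched with an in-memory dict standing in for the server's database. Everything here is an assumption for illustration: the `SESSIONS` store, the cookie name `session_id`, and the shopping-cart layout.

```python
import secrets
from typing import Dict

# Stands in for the server-side database; keyed by the cookie value.
SESSIONS: Dict[str, dict] = {}


def start_session() -> str:
    """Create server-side state and return the Set-Cookie header that indexes it."""
    session_id = secrets.token_hex(16)  # unguessable random index
    SESSIONS[session_id] = {"cart": []}
    return f"Set-Cookie: session_id={session_id}; path=/"


def add_to_cart(session_id: str, item: str) -> None:
    """Later requests carry only the index; the state stays on the server."""
    SESSIONS[session_id]["cart"].append(item)
```

Only a short random string travels back and forth, which is why this is preferred when the state is large.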
Cookie Security
   Security flag restricts when browser will
    send a cookie back to server.
    ◦ Requires “secure” connection.
      For example: https in effect.
   What does this mean about when the
    cookie was set?
First Web Server
   Berners-Lee wrote two programs
    ◦ A browser called WorldWideWeb
    ◦ The world’s first Web server, which ran on a NeXT computer
      The machine is on display in CERN’s public exhibition
Most Famous Web Servers
 Apache HTTP Server from Apache
  Software Foundation
 Internet Information Services (IIS) from Microsoft
 Google Web Server (GWS)
    ◦ Started from May 2007
   Lighttpd
    ◦ powers several popular Web 2.0 sites like
      YouTube, Wikipedia and meebo
Web Servers Usage – Statistics
   The most popular Web servers, used for
    public Web sites, are tracked by Netcraft
    Web Server Survey
    ◦ Details given by Netcraft Web Server Reports
 Apache has been the most popular since April 1996
 Currently (February 2008) about
    ◦   50.93%  Apache
    ◦   35.56 %  Microsoft (IIS, PWS, etc.)
    ◦   5.16 %  Google
    ◦   0.99%  Lighttpd
Web Servers Usage – Statistics cont.
   [Chart: Total Sites Across All Domains, August 1995 - February 2008]
   [Chart: Market Share for Top Servers Across All Domains, August 1995 - February 2008]
   [Chart: Totals for Active Servers Across All Domains, June 2000 - February 2008]
Apache (A PAtCHy) Web Server
 Origins: NCSA (Univ. of Illinois, Urbana-Champaign)
 Now: Apache Software Foundation,
  developers world-wide
 Most widely used web server today [NetCraft web
  survey, 2/2008]
 Open source software
    ◦ Geographically distributed developers
    ◦ Modular, extensible design needed where third-party developers
      could override or extend basic characteristics
Web Server Processing Steps
    ◦ Accept client
    ◦ Read HTTP request header
    ◦ Send HTTP response header
    ◦ Read file
    ◦ Send data
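These processing steps can be sketched as one request handler. This is a minimal illustration, not Apache's code: `serve_one` and `OneShotHandler` are hypothetical names, the document root `"."` is an assumption, and the sketch does not guard against `..` in the path as a real server must. Python's standard `socketserver` performs the "accept client" step and hands the handler file-like `rfile`/`wfile` objects.

```python
import socketserver
from pathlib import Path


def serve_one(rfile, wfile, doc_root: str) -> None:
    """Read one request from rfile and write the response to wfile."""
    # Read HTTP request header: the request line, then header lines
    # up to the blank line (this sketch ignores the header values).
    request_line = rfile.readline().decode("iso-8859-1").strip()
    while rfile.readline() not in (b"\r\n", b"\n", b""):
        pass
    parts = request_line.split(" ")
    if len(parts) != 3:
        wfile.write(b"HTTP/1.0 400 Bad Request\r\nContent-Length: 0\r\n\r\n")
        return
    method, path, _version = parts
    if method != "GET":
        wfile.write(b"HTTP/1.0 501 Not Implemented\r\nContent-Length: 0\r\n\r\n")
        return
    file_path = Path(doc_root) / path.lstrip("/")  # no ".." check here
    if not file_path.is_file():
        wfile.write(b"HTTP/1.0 404 Not Found\r\nContent-Length: 0\r\n\r\n")
        return
    # Send HTTP response header, then read the file and send the data.
    size = file_path.stat().st_size
    wfile.write(f"HTTP/1.0 200 OK\r\nContent-Length: {size}\r\n\r\n".encode("ascii"))
    wfile.write(file_path.read_bytes())


class OneShotHandler(socketserver.StreamRequestHandler):
    """socketserver accepts the client and provides rfile/wfile."""
    doc_root = "."  # hypothetical document root

    def handle(self) -> None:
        serve_one(self.rfile, self.wfile, self.doc_root)
```

Running `socketserver.TCPServer(("", 8080), OneShotHandler).serve_forever()` (port 8080 chosen arbitrarily) would serve files from the current directory.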
Apache HTTP Server

   Apache Core
    ◦   Receives client request
    ◦   Typically, allocates a new process for each incoming request
    ◦   Allocates request record
    ◦   Invokes handlers on individual modules in sequence
   Modules register handlers during configuration
   Handler
    ◦ Request record passed as single parameter
    ◦ Each handler reads/modifies the request record
 Web Server Phases
Apache core invokes a handler for each phase
 Resolve document reference (URI) to a local file
  name (or CGI program+parameters)
 Client authentication (verify client identity)
 Client access control (determine access rights)
 Request access control (check if access allowed)
 MIME type determination of the response
 General phase for handling leftovers (e.g., check
  syntax of returned response, build up user profile)
 Transmission of the response to client
 Logging data on the processing of the request
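The core/module/phase design can be modeled in a few lines. This toy sketch is not Apache's C module API: the phase names, the `Core` class, and the convention that a handler returns a status code to stop processing are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical phase names mirroring the phase list; not Apache's API.
PHASES = [
    "translate_uri", "authenticate", "client_access", "request_access",
    "mime_type", "fixup", "send_response", "log",
]

Handler = Callable[[dict], Optional[int]]


class Core:
    """Toy server core: modules register handlers, the core runs them per phase."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Handler]] = {p: [] for p in PHASES}

    def register(self, phase: str, handler: Handler) -> None:
        self.handlers[phase].append(handler)

    def _run(self, phase: str, request: dict) -> Optional[int]:
        # Each handler gets the request record as its single parameter and may
        # modify it; returning a status code ends processing of the request.
        for handler in self.handlers[phase]:
            status = handler(request)
            if status is not None:
                return status
        return None

    def process(self, request: dict) -> dict:
        request["status"] = 200
        for phase in PHASES[:-1]:
            status = self._run(phase, request)
            if status is not None:
                request["status"] = status
                break
        self._run("log", request)  # logging runs even after an error
        return request


# Two hypothetical modules.
def translate(request: dict) -> None:
    request["filename"] = "/var/www" + request["uri"]


def deny_private(request: dict) -> Optional[int]:
    return 403 if request["uri"].startswith("/private/") else None
```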
References
   TCP/IP Tutorial and Technical Overview,
    Rodriguez, Gatrell, Karas, Peschke, IBM Redbooks,
    August 2001
   Wikipedia, the free encyclopedia
   Apache: The Definitive Guide, 2nd edition, Ben
    Laurie, Peter Laurie, O’Reilly, February 1999
   Webmaster in a Nutshell, 1st edition, Stephen
    Spainhour, Valerie Quercia, O’Reilly, October 1996
   Netcraft: February 2006 Web Server Survey
